Commit 4211266

symat authored and busbey committed
HBASE-21606 document meta table load metrics
Closes #369

Signed-off-by: Xu Cang <xcang@apache.org>
Signed-off-by: Sakthi <sakthivel.azhaku@gmail.com>
Signed-off-by: Sean Busbey <busbey@apache.org>

(cherry picked from commit e5f05bf)
1 parent 1092533 commit 4211266

1 file changed

Lines changed: 94 additions & 0 deletions

src/main/asciidoc/_chapters/ops_mgt.adoc
@@ -1537,6 +1537,100 @@ hbase.regionserver.authenticationFailures::
hbase.regionserver.mutationsWithoutWALCount ::
Count of writes submitted with a flag indicating they should bypass the write ahead log

[[rs_meta_metrics]]
=== Meta Table Load Metrics

The HBase meta table metrics collection feature is available in HBase 1.4+, but it is disabled by default because it can
affect the performance of the cluster. When enabled, it helps you monitor client access patterns by collecting
the following statistics:

* number of get, put and delete operations on the `hbase:meta` table
* number of get, put and delete operations made by the top-N clients
* number of operations related to each table
* number of operations related to the top-N regions

When to use the feature::
This feature can help identify hot spots in the meta table by showing the regions or tables whose meta information is
modified (e.g. by creating, dropping, splitting or moving tables) or retrieved most frequently. It can also help find misbehaving
client applications by showing which clients use the meta table most heavily, which can suggest, for example, a
lack of meta table buffering or a failure to reuse open client connections in the client application.

.Possible side-effects of enabling this feature
[WARNING]
====
A large number of clients and regions in the cluster can cause a large number of metrics to be registered and tracked,
which can increase the memory and CPU footprint of the HBase RegionServer handling the `hbase:meta` table.
It can also significantly increase the size of the JMX dump, which can affect the monitoring or log aggregation
system you use alongside HBase. It is recommended to turn on this feature only during debugging.
====

Where to find the metrics in JMX::
Each metric attribute name starts with the `MetaTable_` prefix. For each metric you will see five different
JMX attributes: count, mean rate, 1 minute rate, 5 minute rate and 15 minute rate. You will find these metrics in JMX
under the following MBean:
`Hadoop -> HBase -> RegionServer -> Coprocessor.Region.CP_org.apache.hadoop.hbase.coprocessor.MetaTableMetrics`.

.Examples: some Meta Table metrics you can see in your JMX dump
[source,json]
----
{
  "MetaTable_get_request_count": 77309,
  "MetaTable_put_request_mean_rate": 0.06339092997186495,
  "MetaTable_table_MyTestTable_request_15min_rate": 1.1020599841623246,
  "MetaTable_client_/172.30.65.42_lossy_request_count": 1786,
  "MetaTable_client_/172.30.65.45_put_request_5min_rate": 0.6189810954855728,
  "MetaTable_region_1561131112259.c66e4308d492936179352c80432ccfe0._lossy_request_count": 38342,
  "MetaTable_region_1561131043640.5bdffe4b9e7e334172065c853cf0caa6._lossy_request_1min_rate": 0.04925099917433935
}
----

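To pull these attributes out of a saved dump, a small script can filter on the `MetaTable_` prefix. The following is a minimal sketch, assuming the dump uses the `{"beans": [...]}` layout that the RegionServer `/jmx` endpoint usually returns. The function name `meta_table_metrics` and the sample beans are illustrative and are not part of HBase.

```python
import json

PREFIX = "MetaTable_"


def meta_table_metrics(jmx_dump):
    """Collect every attribute whose name starts with the MetaTable_ prefix."""
    found = {}
    # Assumption: the dump is a dict with a top-level "beans" list.
    for bean in jmx_dump.get("beans", []):
        for attribute, value in bean.items():
            if attribute.startswith(PREFIX):
                found[attribute] = value
    return found


# Illustrative sample only; a real dump contains many more beans and attributes.
SAMPLE_DUMP = json.loads("""
{
  "beans": [
    {"name": "illustrative-bean-1",
     "MetaTable_get_request_count": 77309,
     "MetaTable_client_/172.30.65.42_lossy_request_count": 1786},
    {"name": "illustrative-bean-2", "unrelatedAttribute": 1}
  ]
}
""")

for attribute, value in sorted(meta_table_metrics(SAMPLE_DUMP).items()):
    print(attribute, value)
```

Filtering on the prefix rather than on a fixed list of names keeps the script usable as the set of top-N clients and regions changes over time.
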
Configuration::
To turn on this feature, enable a custom coprocessor by adding the following section to `hbase-site.xml`.
This coprocessor will run on all the HBase RegionServers, but will be active (i.e. consume memory / CPU) only on
the server where the `hbase:meta` table is located. It will produce JMX metrics which can be downloaded from the
web UI of the given RegionServer or by a simple REST call. These metrics will not be present in the JMX dump of the
other RegionServers.

.Enabling the Meta Table Metrics feature
[source,xml]
----
<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>org.apache.hadoop.hbase.coprocessor.MetaTableMetrics</value>
</property>
----

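Once the coprocessor is enabled, the REST call mentioned in the configuration notes can be scripted. This is a minimal sketch, assuming the RegionServer hosting `hbase:meta` serves its JMX dump as JSON at `/jmx` on its info port. Port 16030 is assumed here as the default, so adjust it to match your `hbase.regionserver.info.port` setting. The helper names and the host name in the comment are hypothetical.

```python
import json
import urllib.request


def jmx_url(host, info_port=16030):
    """Build the URL of the JMX JSON dump (assumed path: /jmx)."""
    return f"http://{host}:{info_port}/jmx"


def fetch_jmx(host, info_port=16030, timeout=10):
    """Download and parse the JMX dump of one RegionServer."""
    with urllib.request.urlopen(jmx_url(host, info_port), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (hypothetical host name, requires a reachable RegionServer):
#   dump = fetch_jmx("regionserver.example.com")
```

Only the RegionServer that currently hosts `hbase:meta` will return the `MetaTable_` attributes, so the script has to be pointed at that server.
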
.How are the top-N metrics calculated?
[NOTE]
====
The 'top-N' type of metrics are counted using the Lossy Counting Algorithm (as defined in
link:http://www.vldb.org/conf/2002/S10P03.pdf[Motwani, R; Manku, G.S (2002). "Approximate frequency counts over data streams"]),
which is designed to identify elements in a data stream whose frequency count exceeds a user-given threshold.
The frequency computed by this algorithm is not always accurate, but it has an error threshold that can be specified by the
user as a configuration parameter. The run-time space required by the algorithm is inversely proportional to the
specified error threshold; hence the larger the error parameter, the smaller the footprint and the less accurate the
metrics.

You can specify the error rate of the algorithm as a floating-point value between 0 and 1 (exclusive); its default
value is 0.02. With the error rate set to `E` and `N` being the total number of meta table operations
(and assuming a uniform distribution of the activity of low-frequency elements), at most `7 / E` meters will be kept and
each kept element will have a frequency higher than `E * N`.

An example: Let's assume we are interested in the HBase clients that are most active in accessing the meta table.
If there have been 1,000,000 operations on the meta table so far and the error rate parameter is set to 0.02, then we can
assume that at most 350 client IP address related counters will be present in JMX and that each of these clients
has accessed the meta table at least 20,000 times.

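The arithmetic behind this example can be checked with a few lines of code. This is only a sketch of the two bounds stated above (`7 / E` and `E * N`); the function name `lossy_counting_bounds` is hypothetical and nothing here calls into HBase.

```python
def lossy_counting_bounds(error_rate, total_operations):
    """Return (maximum number of meters kept, minimum frequency of a kept element)."""
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must be strictly between 0 and 1")
    max_meters = 7 / error_rate                     # at most 7 / E meters are kept
    min_frequency = error_rate * total_operations   # each kept element exceeds E * N
    return max_meters, min_frequency


# Values from the example: E = 0.02, N = 1,000,000 meta table operations.
max_meters, min_frequency = lossy_counting_bounds(0.02, 1_000_000)
print(f"at most {max_meters:.0f} meters, each seen about {min_frequency:.0f} times or more")
```
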
[source,xml]
----
<property>
  <name>hbase.util.default.lossycounting.errorrate</name>
  <value>0.02</value>
</property>
----
====

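To make the trade-off between error rate and footprint more concrete, the following sketch implements the textbook form of Lossy Counting from the cited paper. It is not HBase's own implementation, and the class name `LossyCounter` and the demo stream are hypothetical. The stream is split into buckets of `ceil(1 / E)` items, and at every bucket boundary the entries that cannot be frequent are dropped.

```python
import math


class LossyCounter:
    """Textbook Lossy Counting: approximate counts with a bounded number of entries."""

    def __init__(self, error_rate):
        if not 0.0 < error_rate < 1.0:
            raise ValueError("error_rate must be strictly between 0 and 1")
        self.bucket_width = math.ceil(1.0 / error_rate)
        self.total = 0
        self.entries = {}  # key -> [observed count, maximum possible undercount]

    def add(self, key):
        self.total += 1
        # 1-based index of the bucket that the current item falls into.
        bucket = (self.total + self.bucket_width - 1) // self.bucket_width
        entry = self.entries.get(key)
        if entry is None:
            self.entries[key] = [1, bucket - 1]
        else:
            entry[0] += 1
        # At a bucket boundary, drop entries that cannot exceed the error bound.
        if self.total % self.bucket_width == 0:
            self.entries = {
                k: v for k, v in self.entries.items() if v[0] + v[1] > bucket
            }

    def counts(self):
        return {key: entry[0] for key, entry in self.entries.items()}


# Demo stream: one busy client interleaved with 50 clients seen only once.
demo = LossyCounter(0.1)
for i in range(50):
    demo.add("busy-client")
    demo.add(f"one-off-client-{i}")
print(demo.counts())
```

In this demo only the busy client survives the pruning, which mirrors how the top-N client and region meters stay bounded even when many distinct clients touch the meta table. A smaller error rate means wider buckets, so more low-frequency entries are held in memory between prunings.
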
[[ops.monitoring]]
== HBase Monitoring
