src/main/asciidoc/_chapters/configuration.adoc (15 additions, 11 deletions)
@@ -29,7 +29,7 @@
 This chapter expands upon the <<getting_started>> chapter to further explain configuration of Apache HBase.
 Please read this chapter carefully, especially the <<basic.prerequisites,Basic Prerequisites>>
-to ensure that your HBase testing and deployment goes smoothly, and prevent data loss.
+to ensure that your HBase testing and deployment goes smoothly.
 Familiarize yourself with <<hbase_supported_tested_definitions>> as well.

 == Configuration Files
@@ -164,9 +164,9 @@ It is recommended to raise the ulimit to at least 10,000, but more likely 10,240

 For example, assuming that a schema had 3 ColumnFamilies per region with an average of 3 StoreFiles per ColumnFamily, and there are 100 regions per RegionServer, the JVM will open `3 * 3 * 100 = 900` file descriptors, not counting open JAR files, configuration files, and others. Opening a file does not take many resources, and the risk of allowing a user to open too many files is minimal.

-Another related setting is the number of processes a user is allowed to run at once. In Linux and Unix, the number of processes is set using the `ulimit -u` command. This should not be confused with the `nproc` command, which controls the number of CPUs available to a given user. Under load, a `ulimit -u` that is too low can cause OutOfMemoryError exceptions. See Jack Levin's major HDFS issues thread on the hbase-users mailing list, from 2011.
+Another related setting is the number of processes a user is allowed to run at once. In Linux and Unix, the number of processes is set using the `ulimit -u` command. This should not be confused with the `nproc` command, which controls the number of CPUs available to a given user. Under load, a `ulimit -u` that is too low can cause OutOfMemoryError exceptions.

-Configuring the maximum number of file descriptors and processes for the user who is running the HBase process is an operating system configuration, rather than an HBase configuration. It is also important to be sure that the settings are changed for the user that actually runs HBase. To see which user started HBase, and that user's ulimit configuration, look at the first line of the HBase log for that instance. A useful read setting config on your hadoop cluster is Aaron Kimball's Configuration Parameters: What can you just ignore?
+Configuring the maximum number of file descriptors and processes for the user who is running the HBase process is an operating system configuration, rather than an HBase configuration. It is also important to be sure that the settings are changed for the user that actually runs HBase. To see which user started HBase, and that user's ulimit configuration, look at the first line of the HBase log for that instance.

 .`ulimit` Settings on Ubuntu
 ====
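As a quick sketch of the file-descriptor arithmetic and the two `ulimit` checks this hunk discusses (the 3/3/100 figures come straight from the text; everything else here is illustrative, not part of the patch):

```shell
# Estimate store file descriptors per RegionServer, using the figures
# from the text: 3 ColumnFamilies x 3 StoreFiles x 100 regions.
families=3
storefiles=3
regions=100
fds=$((families * storefiles * regions))
echo "store file descriptors: ${fds}"   # 900, before JARs, config files, etc.

# Inspect the limits for the current user/shell.
ulimit -n   # max open file descriptors (raise to at least 10,000)
ulimit -u   # max user processes (distinct from the nproc command)
```

Remember that these limits must be raised for the user that actually runs the HBase process, via operating system configuration, not HBase configuration.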
@@ -201,7 +201,8 @@ See link:https://wiki.apache.org/hadoop/Distributions%20and%20Commercial%20Suppo
 .Hadoop 2.x is recommended.
 [TIP]
 ====
-Hadoop 2.x is faster and includes features, such as short-circuit reads, which will help improve your HBase random read profile.
+Hadoop 2.x is faster and includes features, such as short-circuit reads (see <<perf.hdfs.configs.localread>>),
+which will help improve your HBase random read profile.
 Hadoop 2.x also includes important bug fixes that will improve your overall HBase experience. HBase does not support running with
 earlier versions of Hadoop. See the table below for requirements specific to different HBase versions.
@@ -226,7 +227,8 @@ Use the following legend to interpret this table:
 |Hadoop-2.7.0 | X | X | X
 |Hadoop-2.7.1+ | S | S | S
 |Hadoop-2.8.[0-1] | X | X | X
-|Hadoop-2.8.2+ | NT | NT | NT
+|Hadoop-2.8.2 | NT | NT | NT
+|Hadoop-2.8.3+ | NT | NT | S
 |Hadoop-2.9.0 | X | X | X
 |Hadoop-3.0.0 | X | X | NT
 |===
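Before consulting the support matrix above, you need the Hadoop version actually deployed on the cluster. One hedged way to extract the comparable `major.minor` prefix from `hadoop version` output (the sample line below is a stand-in, since output formatting can vary by distribution):

```shell
# Stand-in for the first line of: hadoop version | head -n 1
version_line="Hadoop 2.8.3"

# Extract the major.minor prefix to match against rows like Hadoop-2.8.3+.
minor=$(echo "$version_line" | sed 's/^Hadoop \([0-9]*\.[0-9]*\).*/\1/')
echo "$minor"   # 2.8
```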
@@ -252,18 +254,20 @@ data loss. This patch is present in Apache Hadoop releases 2.6.1+.
 .Hadoop 2.y.0 Releases
 [TIP]
 ====
-Starting around the time of Hadoop version 2.7.0, the Hadoop PMC got into the habit of calling out new minor releases on their major version 2 release line as not stable / production ready. As such, HBase expressly advises downstream users to avoid running on top of these releases. Note that additionally the 2.8.1 was release was given the same caveat by the Hadoop PMC. For reference, see the release announcements for link:https://s.apache.org/hadoop-2.7.0-announcement[Apache Hadoop 2.7.0], link:https://s.apache.org/hadoop-2.8.0-announcement[Apache Hadoop 2.8.0], link:https://s.apache.org/hadoop-2.8.1-announcement[Apache Hadoop 2.8.1], and link:https://s.apache.org/hadoop-2.9.0-announcement[Apache Hadoop 2.9.0].
+Starting around the time of Hadoop version 2.7.0, the Hadoop PMC got into the habit of calling out new minor releases on their major version 2 release line as not stable / production ready. As such, HBase expressly advises downstream users to avoid running on top of these releases. Note that additionally the 2.8.1 release was given the same caveat by the Hadoop PMC. For reference, see the release announcements for link:https://s.apache.org/hadoop-2.7.0-announcement[Apache Hadoop 2.7.0], link:https://s.apache.org/hadoop-2.8.0-announcement[Apache Hadoop 2.8.0], link:https://s.apache.org/hadoop-2.8.1-announcement[Apache Hadoop 2.8.1], and link:https://s.apache.org/hadoop-2.9.0-announcement[Apache Hadoop 2.9.0].
 ====

 .Replace the Hadoop Bundled With HBase!
 [NOTE]
 ====
-Because HBase depends on Hadoop, it bundles an instance of the Hadoop jar under its _lib_ directory.
-The bundled jar is ONLY for use in standalone mode.
+Because HBase depends on Hadoop, it bundles Hadoop jars under its _lib_ directory.
+The bundled jars are ONLY for use in standalone mode.
 In distributed mode, it is _critical_ that the version of Hadoop that is out on your cluster match what is under HBase.
-Replace the hadoop jar found in the HBase lib directory with the hadoop jar you are running on your cluster to avoid version mismatch issues.
-Make sure you replace the jar in HBase across your whole cluster.
-Hadoop version mismatch issues have various manifestations but often all look like its hung.
+Replace the hadoop jars found in the HBase lib directory with the equivalent hadoop jars from the version you are running
+on your cluster to avoid version mismatch issues.
+Make sure you replace the jars under HBase across your whole cluster.
+Hadoop version mismatch issues have various manifestations. Check for mismatch if
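The jar-replacement step the NOTE describes can be sketched as below. This demo runs against throwaway directories so it is safe to execute; a real install would operate on the actual HBase `lib/` directory and the cluster's Hadoop distribution, and all paths and version numbers here are stand-ins, not values from the patch:

```shell
# Hedged sketch: swap the Hadoop jars bundled with HBase for the ones
# matching the cluster's Hadoop version. Demonstrated on temp directories.
demo=$(mktemp -d)
mkdir -p "$demo/hbase/lib" "$demo/hadoop"
touch "$demo/hbase/lib/hadoop-common-2.5.1.jar"   # stand-in: bundled jar
touch "$demo/hadoop/hadoop-common-2.7.4.jar"      # stand-in: cluster's jar

# 1. Remove the hadoop jars shipped under HBase's lib directory.
rm -f "$demo"/hbase/lib/hadoop-*.jar

# 2. Copy in the equivalent jars from the Hadoop version running on the cluster.
cp "$demo"/hadoop/hadoop-*.jar "$demo/hbase/lib/"

ls "$demo/hbase/lib"   # now lists only the cluster-version jar
```

Repeat the same replacement on every node: a single node left with the bundled jars reintroduces the version-mismatch problem the NOTE warns about.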