Software configuration

Last Updated: Dec 06, 2017

Purpose of software configuration

Hadoop, Hive, Pig, and other relevant software contain numerous configuration parameters that can be changed through the software configuration function. For example, the number of HDFS NameNode service threads, dfs.namenode.handler.count, defaults to 10 and can be increased to 50; the HDFS file block size, dfs.blocksize, defaults to 128 MB and can be decreased to 64 MB when the system contains mostly small files.

This function can be performed only once, during the startup of a cluster.
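As a sketch of what such a change looks like (the two example values above, expressed in the JSON format described later in this article; 67108864 bytes is 64 MB):

```json
{
    "configurations": [
        {
            "classification": "hdfs-site",
            "properties": {
                "dfs.namenode.handler.count": "50",
                "dfs.blocksize": "67108864"
            }
        }
    ]
}
```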

How to use

  1. Log on to the E-MapReduce console.

  2. Select a region. The clusters created in that region are listed.

  3. Click Create cluster to enter the cluster creation page.

  4. In the software configuration step of cluster creation, you can see all included software and their corresponding versions. To change the cluster's configuration, select a JSON-format configuration file in the (Optional) software configuration box, which overrides or adds to the cluster's default parameters. A sample .json file is as follows:

    {
        "configurations": [
            {
                "classification": "core-site",
                "properties": {
                    "fs.trash.interval": "61"
                }
            },
            {
                "classification": "hadoop-log4j",
                "properties": {
                    "hadoop.log.file": "hadoop1.log",
                    "hadoop.root.logger": "INFO",
                    "a.b.c": "ABC"
                }
            },
            {
                "classification": "hdfs-site",
                "properties": {
                    "dfs.namenode.handler.count": "12"
                }
            },
            {
                "classification": "mapred-site",
                "properties": {
                    "mapreduce.task.io.sort.mb": "201"
                }
            },
            {
                "classification": "yarn-site",
                "properties": {
                    "hadoop.security.groups.cache.secs": "251",
                    "yarn.nodemanager.remote-app-log-dir": "/tmp/logs1"
                }
            },
            {
                "classification": "httpsfs-site",
                "properties": {
                    "a.b.c.d": "200"
                }
            },
            {
                "classification": "capacity-scheduler",
                "properties": {
                    "yarn.scheduler.capacity.maximum-am-resource-percent": "0.2"
                }
            },
            {
                "classification": "hadoop-env",
                "properties": {
                    "BC": "CD"
                },
                "configurations": [
                    {
                        "classification": "export",
                        "properties": {
                            "AB": "${BC}",
                            "HADOOP_CLIENT_OPTS": "\"-Xmx512m -Xms512m $HADOOP_CLIENT_OPTS\""
                        }
                    }
                ]
            },
            {
                "classification": "httpfs-env",
                "properties": {
                },
                "configurations": [
                    {
                        "classification": "export",
                        "properties": {
                            "HTTPFS_SSL_KEYSTORE_PASS": "passwd"
                        }
                    }
                ]
            },
            {
                "classification": "mapred-env",
                "properties": {
                },
                "configurations": [
                    {
                        "classification": "export",
                        "properties": {
                            "HADOOP_JOB_HISTORYSERVER_HEAPSIZE": "1001"
                        }
                    }
                ]
            },
            {
                "classification": "yarn-env",
                "properties": {
                },
                "configurations": [
                    {
                        "classification": "export",
                        "properties": {
                            "HADOOP_YARN_USER": "${HADOOP_YARN_USER:-yarn1}"
                        }
                    }
                ]
            },
            {
                "classification": "pig",
                "properties": {
                    "pig.tez.auto.parallelism": "false"
                }
            },
            {
                "classification": "pig-log4j",
                "properties": {
                    "log4j.logger.org.apache.pig": "error, A"
                }
            },
            {
                "classification": "hive-env",
                "properties": {
                    "BC": "CD"
                },
                "configurations": [
                    {
                        "classification": "export",
                        "properties": {
                            "AB": "${BC}",
                            "HADOOP_CLIENT_OPTS1": "\"-Xmx512m -Xms512m $HADOOP_CLIENT_OPTS1\""
                        }
                    }
                ]
            },
            {
                "classification": "hive-site",
                "properties": {
                    "hive.tez.java.opts": "-Xmx3900m"
                }
            },
            {
                "classification": "hive-exec-log4j",
                "properties": {
                    "log4j.logger.org.apache.zookeeper.ClientCnxnSocketNIO": "INFO,FA"
                }
            },
            {
                "classification": "hive-log4j",
                "properties": {
                    "log4j.logger.org.apache.zookeeper.server.NIOServerCnxn": "INFO,DRFA"
                }
            }
        ]
    }
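Because the file must be valid JSON with this specific shape, a quick structural check can catch mistakes before you submit it at cluster creation. The helper below is a hypothetical sketch, not part of E-MapReduce; it validates the shape only and cannot tell whether a classification or key is actually recognized by the service:

```python
import json

def validate_emr_config(text):
    """Sanity-check the shape of a software-configuration JSON document.

    Returns a list of problem descriptions; an empty list means the
    structure looks OK.
    """
    problems = []
    try:
        doc = json.loads(text)
    except ValueError as e:
        return ["not valid JSON: %s" % e]

    def check_entries(entries, path):
        if not isinstance(entries, list):
            problems.append("%s must be a list" % path)
            return
        for i, entry in enumerate(entries):
            where = "%s[%d]" % (path, i)
            if not isinstance(entry, dict):
                problems.append("%s must be an object" % where)
                continue
            if not isinstance(entry.get("classification"), str):
                problems.append("%s is missing a string 'classification'" % where)
            if not isinstance(entry.get("properties"), dict):
                problems.append("%s is missing a 'properties' object" % where)
            # env-style entries may nest a second layer of configurations
            if "configurations" in entry:
                check_entries(entry["configurations"], where + ".configurations")

    check_entries(doc.get("configurations"), "configurations")
    return problems

if __name__ == "__main__":
    sample = ('{"configurations": [{"classification": "core-site", '
              '"properties": {"fs.trash.interval": "61"}}]}')
    print(validate_emr_config(sample))  # prints [] -- the shape is valid
```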

The classification parameter designates the configuration file to change, and the properties parameter holds the key-value pairs to be changed. When the default configuration file already contains a key, its value is overridden; otherwise, the key-value pair is added.
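For instance, a single entry can do both at once. In the sketch below, fs.trash.interval is a key that Hadoop's core-site already defines by default, so its value is overridden, while io.custom.example is a hypothetical key chosen for illustration, so it would simply be added to core-site.xml:

```json
{
    "configurations": [
        {
            "classification": "core-site",
            "properties": {
                "fs.trash.interval": "61",
                "io.custom.example": "200"
            }
        }
    ]
}
```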

The correspondence between configuration files and classification values is shown in the following tables.

Hadoop

File name                   Classification
core-site.xml               core-site
log4j.properties            hadoop-log4j
hdfs-site.xml               hdfs-site
mapred-site.xml             mapred-site
yarn-site.xml               yarn-site
httpsfs-site.xml            httpsfs-site
capacity-scheduler.xml      capacity-scheduler
hadoop-env.sh               hadoop-env
httpfs-env.sh               httpfs-env
mapred-env.sh               mapred-env
yarn-env.sh                 yarn-env

Pig

File name                   Classification
pig.properties              pig
log4j.properties            pig-log4j

Hive

File name                   Classification
hive-env.sh                 hive-env
hive-site.xml               hive-site
hive-exec-log4j.properties  hive-exec-log4j
hive-log4j.properties       hive-log4j

Flat XML files such as core-site have only one layer, so all configurations are put in properties. Script files such as hadoop-env.sh can have a two-layer structure and are set using the embedded configurations mode. See hadoop-env in the example, where the -Xmx512m -Xms512m setting is added to the HADOOP_CLIENT_OPTS property of export.
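To make the two-layer form concrete, the hadoop-env entry from the sample reduces to the fragment below; the shell line that follows is my reading of how the export classification is rendered, not verified output:

```json
{
    "classification": "hadoop-env",
    "properties": {},
    "configurations": [
        {
            "classification": "export",
            "properties": {
                "HADOOP_CLIENT_OPTS": "\"-Xmx512m -Xms512m $HADOOP_CLIENT_OPTS\""
            }
        }
    ]
}
```

This would correspond to a line in hadoop-env.sh along the lines of export HADOOP_CLIENT_OPTS="-Xmx512m -Xms512m $HADOOP_CLIENT_OPTS".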

After completing the settings, confirm and click Next step.
