Ambari allows you to install, manage, and monitor Hadoop components and clusters. This topic describes how to use Ambari together with LindormDFS of ApsaraDB for Lindorm (Lindorm), which replaces the underlying Hadoop Distributed File System (HDFS) storage. You can use Ambari together with LindormDFS to set up an open source, cloud-native big data system in which Lindorm decouples computing and storage.
Prerequisites
- Your Lindorm instance and Ambari are deployed in the same virtual private cloud (VPC).
- The IP address of a node on which Ambari is deployed is added to the whitelist of the Lindorm instance. For information about how to add an IP address to a whitelist, see Configure a whitelist.
Set the file engine of LindormDFS as the default storage engine
- Activate LindormDFS. For more information, see Activate LindormDFS.
- Configure the endpoint of LindormDFS.
- Log on to Ambari. In the left-side navigation pane, click HDFS. Then, click the CONFIGS tab and click ADVANCED. You are redirected to the HDFS configuration pages.
- By default, the Hadoop distributed file system that is built into Ambari uses a primary/secondary architecture, in which NameNodes are deployed in primary/secondary mode to ensure high availability (HA). Therefore, deploy HDFS in HA mode when you initialize Ambari.
- In the Custom hdfs-site section, click Add Property. Configure Key, Value, and Property Type to add the configuration items for LindormDFS. The following table describes the parameters.
| Key | Value | Property Type | Description |
| --- | --- | --- | --- |
| dfs.client.failover.proxy.provider.{Instance ID} | org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider | TEXT | None |
| dfs.ha.namenodes.{Instance ID} | nn1,nn2 | TEXT | The names of the two NameNodes of LindormDFS in HA mode. |
| dfs.namenode.rpc-address.{Instance ID}.nn1 | {Instance ID}-master1-001.lindorm.rds.aliyuncs.com:8020 | TEXT | None |
| dfs.namenode.rpc-address.{Instance ID}.nn2 | {Instance ID}-master2-001.lindorm.rds.aliyuncs.com:8020 | TEXT | None |
| dfs.nameservices | {Instance ID} | TEXT | None |

- Click ADD to add each configuration item.
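Outside the Ambari UI, these settings correspond to entries in hdfs-site.xml. The following fragment is a sketch of that equivalent configuration, assuming the endpoints and NameNode names shown in this topic; replace {Instance ID} with your Lindorm instance ID throughout.

```xml
<!-- Sketch of the equivalent hdfs-site.xml entries.
     Replace {Instance ID} with your Lindorm instance ID. -->
<property>
  <name>dfs.nameservices</name>
  <value>{Instance ID}</value>
</property>
<property>
  <name>dfs.ha.namenodes.{Instance ID}</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.{Instance ID}.nn1</name>
  <value>{Instance ID}-master1-001.lindorm.rds.aliyuncs.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.{Instance ID}.nn2</name>
  <value>{Instance ID}-master2-001.lindorm.rds.aliyuncs.com:8020</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.{Instance ID}</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```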
- Set the file engine of LindormDFS as the default storage engine for Ambari.
- Log on to Ambari. In the left-side navigation pane, click HDFS. Then, click the CONFIGS tab and click ADVANCED. You are redirected to HDFS Configuration Pages.
- In the Advanced core-site section, enter the endpoint of LindormDFS in the fs.defaultFS field. For information about how to obtain an endpoint, see Activate LindormDFS.
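In plain Hadoop terms, this step sets the fs.defaultFS property in core-site.xml. A minimal sketch, assuming the nameservice is the Lindorm instance ID as configured above:

```xml
<!-- core-site.xml sketch; replace {Instance ID} with your Lindorm instance ID. -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://{Instance ID}</value>
</property>
```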
- Modify the configuration file hdfs-site.
- In the left-side navigation pane of the Ambari homepage, click HDFS. Then, click the CONFIGS tab and click ADVANCED. Find the Custom hdfs-site section.
- Set dfs.internal.nameservices to the ID of the Lindorm instance.
- Add the configuration item dfs.namenode.http-address.{Instance ID}.nn1 and set the value to {Instance ID}-master1-001.lindorm.rds.aliyuncs.com:50070.
- Add the configuration item dfs.namenode.http-address.{Instance ID}.nn2 and set the value to {Instance ID}-master2-001.lindorm.rds.aliyuncs.com:50070.
- Click SAVE to save the settings.
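The hdfs-site changes in this step can be sketched as the following XML; replace {Instance ID} with your Lindorm instance ID.

```xml
<!-- Sketch of the hdfs-site.xml additions described above. -->
<property>
  <name>dfs.internal.nameservices</name>
  <value>{Instance ID}</value>
</property>
<property>
  <name>dfs.namenode.http-address.{Instance ID}.nn1</name>
  <value>{Instance ID}-master1-001.lindorm.rds.aliyuncs.com:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.{Instance ID}.nn2</name>
  <value>{Instance ID}-master2-001.lindorm.rds.aliyuncs.com:50070</value>
</property>
```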
- In the upper-right corner of the page, choose ACTIONS > Restart All to restart LindormDFS.
Notice: After you modify the configuration, perform this step to apply the configuration to each node in Ambari. The restart of LindormDFS fails because of the configuration changes. Therefore, disable LindormDFS as described in the next step. To check whether you can use Ambari to connect to LindormDFS, log on to a node on which Ambari is deployed and run the command in Step 4. If the expected command output appears, the endpoint of LindormDFS is configured.
- In the upper-right corner of the page, choose ACTIONS > Stop to disable LindormDFS.
- Check whether you can use Ambari to connect to LindormDFS.
- Log on to a node on which Ambari is deployed. Then, run the following command:
$ hadoop fs -ls /
- Verify the result. If the command returns a directory listing of the LindormDFS root, the endpoint of LindormDFS is configured.
Notice: If the configuration in Step 3 does not take effect, log on to Ambari and choose ACTIONS > Restart All in the upper-right corner. Then, click Stop to disable LindormDFS. This validates the new configuration.
- If each component is running as expected, LindormDFS replaces the underlying HDFS storage, as shown in the following figure.
Install YARN
- Log on to Ambari. In the left-side navigation pane, click the icon next to Services and click Add Service. On the page that appears, select the YARN + MapReduce2 check box.
- On the Add Service Wizard page, follow the steps in the wizard to install YARN. After you configure the settings,
click DEPLOY. Wait until YARN is installed.
- Check whether YARN is running as expected.
After you use Ambari to install YARN, a file named hadoop-mapreduce-examples-3.1.1.3.1.4.0-315.jar is generated in the /usr/hdp/3.1.4.0-315/hadoop-mapreduce directory. Use this file to test whether YARN is running as expected.
- Log on to a node on which Ambari is deployed. Then, run the following command to generate a 128 MB test file in the /tmp/randomtextwriter directory:
$ yarn jar /usr/hdp/3.1.4.0-315/hadoop-mapreduce/hadoop-mapreduce-examples-3.1.1.3.1.4.0-315.jar randomtextwriter -D mapreduce.randomtextwriter.totalbytes=134217728 -D mapreduce.job.maps=4 -D mapreduce.job.reduces=4 /tmp/randomtextwriter
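As a quick sanity check on the parameters, mapreduce.randomtextwriter.totalbytes=134217728 is exactly 128 MB (128 × 1024 × 1024 bytes), which can be confirmed with shell arithmetic:

```shell
# Confirm that 134217728 bytes equals 128 MiB.
echo $((128 * 1024 * 1024))   # prints 134217728
```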
Notice: In the command, hadoop-mapreduce-examples-3.1.1.3.1.4.0-315.jar specifies the file that is generated when you install YARN. Replace the value with the actual file name.
- Check whether the job is submitted to YARN.
Log on to a node on which Ambari is deployed. Then, run the following command:
$ yarn application -list
If the following command output appears, YARN is running as expected.
Install Hive
- Log on to Ambari. In the left-side navigation pane, click the
icon next to Services and click Add Service. On the page that appears, select the Hive check box.
- On the Add Service Wizard page, follow the steps in the wizard to install Hive. After you configure the settings,
click DEPLOY. Wait until Hive is installed.
- If the following information appears, Hive is installed.
- Add proxy users. Hive uses a proxy to connect to LindormDFS. Before you use Hive to connect to LindormDFS, grant permissions to users. The steps to connect Spark to LindormDFS are similar to the steps described in this topic. If you need to add other users, contact Expert Service. For more information, see Expert support.
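The proxy mechanism mentioned here is standard Hadoop user impersonation. As a sketch only, and not a Lindorm-confirmed setting, core-site.xml entries of the following shape allow the hive user to act as a proxy; the user name, hosts, and groups are assumptions that depend on your cluster.

```xml
<!-- Assumed Hadoop proxy-user settings; adjust the user, hosts,
     and groups to match your cluster before use. -->
<property>
  <name>hadoop.proxyuser.hive.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hive.groups</name>
  <value>*</value>
</property>
```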
- Restart YARN. The Tez component is built on top of YARN and can accelerate jobs that run in Hive. Therefore, when you install Hive, some YARN configuration is added to Ambari. To validate the configuration, restart YARN.
In the left-side navigation pane, click YARN. In the upper-right corner of the page, choose ACTIONS > Restart All to restart YARN.
- Check whether Hive is running as expected.
- Log on to a node on which Ambari is deployed. Then, run the following command:
# su - hive  # Log on to the Hive client.
[hive@ambaritest2 ~]$ hive
Beeline version 3.1.0.3.1.4.0-315 by Apache Hive
0: jdbc:hive2://ambaritest1:2181,ambaritest2:> create table foo (id int, name string);
INFO : Compiling command(queryId=hive_20201111193943_5471ede8-e51f-44b8-a91a-b6fde9f58b49): create table foo (id int, name string)
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=hive_20201111193943_5471ede8-e51f-44b8-a91a-b6fde9f58b49); Time taken: 1.337 seconds
INFO : Executing command(queryId=hive_20201111193943_5471ede8-e51f-44b8-a91a-b6fde9f58b49): create table foo (id int, name string)
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=hive_20201111193943_5471ede8-e51f-44b8-a91a-b6fde9f58b49); Time taken: 0.814 seconds
INFO : OK
No rows affected (2.596 seconds)
0: jdbc:hive2://ambaritest1:2181,ambaritest2:> insert into table foo select * from (select 12,"xyz")a;
- Run the following command to query data on Hive:
0: jdbc:hive2://ambaritest1:2181,ambaritest2:> select * from foo;
If the following command output appears, Hive is installed and running as expected.
Install Spark
- Log on to Ambari. In the left-side navigation pane, click the
icon next to Services and click Add Service. On the page that appears, select the Spark2 check box.
- Follow the steps in the wizard to install Spark 2. After you configure the settings,
click DEPLOY. Wait until Spark 2 is installed.
- Check whether Spark is running as expected.
After you use Ambari to install Spark, a file named spark-examples_2.11-x.x.x.x.x.x.0-315.jar is generated in the /usr/hdp/3.1.4.0-315/spark2/examples/jars/ directory. Use this file to test whether Spark is running as expected.
- Log on to a node on which Ambari is deployed. Then, run the following command to generate a 128 MB test file in the /tmp/randomtextwriter directory. Skip this step if the test file already exists.
$ yarn jar /usr/hdp/3.1.4.0-315/hadoop-mapreduce/hadoop-mapreduce-examples-3.1.1.3.1.4.0-315.jar randomtextwriter -D mapreduce.randomtextwriter.totalbytes=134217728 -D mapreduce.job.maps=4 -D mapreduce.job.reduces=4 /tmp/randomtextwriter
Note: In the command, hadoop-mapreduce-examples-3.1.1.3.1.4.0-315.jar specifies the file that is generated when you install YARN. Replace the value with the actual file name.
- Log on to a node on which Ambari is deployed. Then, run the following command to use Spark to query the test file on LindormDFS and return the WordCount result:
$ spark-submit --master yarn --executor-memory 2G --executor-cores 2 --class org.apache.spark.examples.JavaWordCount /usr/hdp/3.1.4.0-315/spark2/examples/jars/spark-examples_2.11-2.3.2.3.1.4.0-315.jar /tmp/randomtextwriter
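The JavaWordCount example splits its input on whitespace and counts the occurrences of each word. A minimal local sketch of the same computation, using standard shell tools on sample text rather than Spark:

```shell
# Split sample text on whitespace and count each word,
# mirroring what the Spark JavaWordCount example computes.
# The output lists each distinct word with its count (bar once, foo twice).
printf 'foo bar foo\n' | tr -s ' ' '\n' | sort | uniq -c
```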
If the following command output appears, Spark is running as expected.
Install HBase
- Log on to Ambari. In the left-side navigation pane, click the
icon next to Services and click Add Service. On the page that appears, select the HBase check box.
- Follow the steps in the wizard to install HBase. After you configure the settings,
click DEPLOY. Wait until HBase is installed.
- Check whether HBase is running as expected.
- Log on to a node on which Ambari is deployed. Run the following command to open HBase Shell:
[spark@ambaritest1 ~]$ hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.4.0-315/phoenix/phoenix-5.0.0.3.1.4.0-315-server.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.4.0-315/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
For Reference, please visit: http://hbase.apache.org/2.0/book.html#shell
Version 2.0.2.3.1.4.0-315, r, Fri Aug 23 05:15:48 UTC 2019
Took 0.0023 seconds
hbase(main):001:0>
- Run the following commands to create a test table on HBase:
[hive@ambaritest2 ~]$ hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.4.0-315/phoenix/phoenix-5.0.0.3.1.4.0-315-server.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.4.0-315/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
For Reference, please visit: http://hbase.apache.org/2.0/book.html#shell
Version 2.0.2.3.1.4.0-315, r, Fri Aug 23 05:15:48 UTC 2019
Took 0.0023 seconds
hbase(main):001:0> create 'hbase_test','info'
Created table hbase_test
Took 1.9513 seconds
=> Hbase::Table - hbase_test
hbase(main):002:0> put 'hbase_test','1', 'info:name' ,'Sariel'
Took 0.2576 seconds
hbase(main):003:0> put 'hbase_test','1', 'info:age' ,'22'
Took 0.0078 seconds
hbase(main):004:0> put 'hbase_test','1', 'info:industry' ,'IT'
Took 0.0077 seconds
hbase(main):005:0> scan 'hbase_test'
ROW    COLUMN+CELL
 1     column=info:age, timestamp=1605097177701, value=22
 1     column=info:industry, timestamp=1605097181758, value=IT
 1     column=info:name, timestamp=1605097174400, value=Sariel
1 row(s)
Took 0.0230 seconds
hbase(main):006:0>
- Run the following command to query the /apps/hbase/data/data/default directory on LindormDFS:
$ hadoop fs -ls /apps/hbase/data/data/default
If the hbase_test directory exists in /apps/hbase/data/data/default, HBase is running as expected.
Install other services
You can install other services based on the methods described in this topic.