An elastic network interface (ENI) is a virtual network interface controller (NIC) that can be bound to an ECS instance that is deployed in a VPC. You can use ENIs to deploy high availability clusters and perform low-cost failovers and fine-grained network management. This topic describes how to use the serverless Spark engine of DLA to access ApsaraDB for HBase clusters in your VPC by using an ENI.
Prerequisites
Notice: You can use the vSwitch and security group of the existing ENI in your ApsaraDB for HBase cluster.
Procedure
- Add the CIDR block of the vSwitch where the ENI resides to a whitelist or security group of your ApsaraDB for HBase cluster. If you want to access an ApsaraDB for HBase cluster where X-Pack is deployed, add the CIDR block of the vSwitch where the ENI resides to a whitelist of the cluster in the ApsaraDB for HBase console.
- Obtain the parameters that need to be configured for the serverless Spark engine. For ApsaraDB for HBase clusters where X-Pack is deployed: Log on to the ApsaraDB for HBase console. In the left-side navigation pane, click Clusters, and then click the name of the specific instance in the ID / Name column. In the left-side navigation pane, click Database Connection and view ThriftServer Access.
- Edit code in the Spark application file to access the ApsaraDB for HBase cluster.
- Sample code:
    package com.aliyun.spark

    import org.apache.spark.sql.SparkSession

    object SparkHbase {
      def main(args: Array[String]): Unit = {
        // The ZooKeeper address of the ApsaraDB for HBase cluster. The ZooKeeper address in the sample code
        // is for reference only. Replace the address with the ZooKeeper address of your ApsaraDB for HBase cluster.
        // Format: xxx-002.hbase.rds.aliyuncs.com:2181,xxx-001.hbase.rds.aliyuncs.com:2181,xxx-003.hbase.rds.aliyuncs.com:2181
        val zkAddress = args(0)
        // The name of the table in your ApsaraDB for HBase cluster. You must create a table in advance.
        // For more information about how to create a table in your ApsaraDB for HBase cluster, click the
        // following link: https://www.alibabacloud.com/help/doc-detail/52051.htm
        val hbaseTableName = args(1)
        // The name of the table in the serverless Spark engine.
        val sparkTableName = args(2)

        val sparkSession = SparkSession
          .builder()
          // After enableHiveSupport is called, use the Java Database Connectivity (JDBC) driver of the
          // serverless Spark engine to query the table you created from the code.
          // .enableHiveSupport()
          .appName("scala spark on HBase test")
          .getOrCreate()

        import sparkSession.implicits._

        // If the table exists, delete it.
        sparkSession.sql(s"drop table if exists $sparkTableName")

        val createCmd =
          s"""CREATE TABLE ${sparkTableName} USING org.apache.hadoop.hbase.spark
             |    OPTIONS ('catalog'=
             |    '{"table":{"namespace":"default", "name":"${hbaseTableName}"},"rowkey":"rowkey",
             |    "columns":{
             |    "col0":{"cf":"rowkey", "col":"rowkey", "type":"string"},
             |    "col1":{"cf":"cf", "col":"col1", "type":"String"}}}',
             |    'hbase.zookeeper.quorum' = '${zkAddress}'
             |    )""".stripMargin

        println(s" the create sql cmd is: \n $createCmd")
        sparkSession.sql(createCmd)
        val querySql = "select * from " + sparkTableName + " limit 10"
        sparkSession.sql(querySql).show
      }
    }
- POM file that contains the dependencies of ApsaraDB for HBase:
    <dependency>
        <groupId>com.aliyun.apsaradb</groupId>
        <artifactId>alihbase-spark</artifactId>
        <version>1.1.3_2.4.3-1.0.4</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>com.aliyun.hbase</groupId>
        <artifactId>alihbase-client</artifactId>
        <version>1.1.3</version>
        <scope>provided</scope>
        <exclusions>
            <exclusion>
                <groupId>io.netty</groupId>
                <artifactId>netty-all</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>com.aliyun.hbase</groupId>
        <artifactId>alihbase-protocol</artifactId>
        <version>1.1.3</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>com.aliyun.hbase</groupId>
        <artifactId>alihbase-server</artifactId>
        <version>1.1.3</version>
        <scope>provided</scope>
        <exclusions>
            <exclusion>
                <groupId>io.netty</groupId>
                <artifactId>netty-all</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
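The catalog option in the sample code is a JSON string that maps HBase column families and qualifiers to Spark columns. As a minimal sketch, the following standalone Scala snippet builds that string and the CREATE TABLE statement from a list of columns, which may be easier to maintain than a hand-written literal when a table has many columns. The names CatalogSketch, Column, catalogJson, and createTableSql are hypothetical helpers introduced here for illustration and are not part of the connector. Only the catalog layout and the hbase.zookeeper.quorum option are taken from the sample code.

```scala
// Hypothetical helpers for illustration only; not part of the HBase connector API.
object CatalogSketch {
  // One Spark column mapped to an HBase column family and qualifier.
  final case class Column(sparkName: String, family: String, qualifier: String, dataType: String)

  // Builds the catalog JSON in the same layout as the sample code.
  // Assumes the row key field is named "rowkey", as in the sample code.
  def catalogJson(namespace: String, table: String, columns: Seq[Column]): String = {
    val cols = columns
      .map(c => s""""${c.sparkName}":{"cf":"${c.family}", "col":"${c.qualifier}", "type":"${c.dataType}"}""")
      .mkString(", ")
    s"""{"table":{"namespace":"$namespace", "name":"$table"},"rowkey":"rowkey","columns":{$cols}}"""
  }

  // Builds the CREATE TABLE statement that registers the mapping in Spark SQL.
  def createTableSql(sparkTable: String, catalog: String, zkAddress: String): String =
    s"""CREATE TABLE $sparkTable USING org.apache.hadoop.hbase.spark
       |OPTIONS ('catalog' = '$catalog',
       |         'hbase.zookeeper.quorum' = '$zkAddress')""".stripMargin
}
```

With such helpers, the mapping in the sample code could be expressed as two Column values, one for the row key (column family rowkey) and one for cf:col1, and the resulting statement passed to sparkSession.sql.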
- Upload the JAR file of the Spark application and dependencies to Object Storage Service (OSS). For more information, see Upload objects.
  Note: The region where OSS resides must be the same as the region where the serverless Spark engine resides.
- Submit a job in the serverless Spark engine and perform data computations.
- Use HBase Shell of your ApsaraDB for HBase cluster to prepare data.
    bin/hbase shell
    hbase(main):001:0> create 'mytable', 'cf'
    hbase(main):001:0> put 'mytable', 'rowkey1', 'cf:col1', 'this is value'
- In the DLA console, submit a job to access your ApsaraDB for HBase cluster. For more
information, see Create and run Spark jobs.
    {
        "args": [
            "xxx:2181,xxx1:2181,xxx2:2181",
            "mytable",
            "spark_on_hbase_job"
        ],
        "name": "spark-on-hbase",
        "className": "com.aliyun.spark.SparkHbase",
        "conf": {
            "spark.dla.eni.vswitch.id": "{ID of the vSwitch that you selected}",
            "spark.dla.eni.security.group.id": "{ID of the security group that you selected}",
            "spark.driver.resourceSpec": "medium",
            "spark.dla.eni.enable": "true",
            "spark.dla.connectors": "hbase",
            "spark.executor.instances": 2,
            "spark.executor.resourceSpec": "medium"
        },
        "file": "oss://{OSS directory in which your JAR file is stored}"
    }
The following table describes the parameters.

| Parameter | Description | Remarks |
| --- | --- | --- |
| xxx:2181,xxx1:2181,xxx2:2181 | The ZooKeeper address of your ApsaraDB for HBase cluster. | You can perform Step 2 to obtain this address in the ApsaraDB for HBase console. |
| mytable | The table in your ApsaraDB for HBase cluster. In this topic, the table name is mytable. HBase Shell is used to prepare table data. | None |
| spark_on_hbase_job | The name of the table in the serverless Spark engine. This table is mapped to the table created in your ApsaraDB for HBase cluster. | None |
| spark.dla.connectors | Specifies whether to include the built-in JAR package of the serverless Spark engine in classpath. This JAR package may contain the dependency that allows you to read the tables in your ApsaraDB for HBase cluster. | If the JAR package does not contain this dependency, you must configure this parameter. If the JAR package contains this dependency, you do not need to configure this parameter. |
| spark.dla.eni.vswitch.id | The ID of the vSwitch where your ENI resides. | None |
| spark.dla.eni.security.group.id | The ID of the security group where your ENI resides. | None |
| spark.dla.eni.enable | Specifies whether to enable the ENI. | None |

After the job succeeds, find the job and click Log in the Operation column to view the logs of the job.
Note: If you want to upload a custom JAR package of the ApsaraDB for HBase connector, you do not need to set spark.dla.connectors to hbase. Instead, you can use jars:["<oss://path/to/your/hbase/connector/jar>"] in ConfigJson to upload the JAR package that contains the dependency of the ApsaraDB for HBase cluster.
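Before you submit the job, it can help to sanity-check the values that go into the configuration. The following is a minimal sketch, assuming that ENI-based access needs the three spark.dla.eni.* keys shown in the sample configuration and that the ZooKeeper address is a comma-separated list of host:port entries, as in the sample code. JobConfCheck, missingEniKeys, and looksLikeQuorum are hypothetical names introduced for illustration. They are not part of DLA or of the serverless Spark engine, and the checks only inspect strings locally.

```scala
// Hypothetical local checks for illustration only; not part of DLA or Spark.
object JobConfCheck {
  // Keys taken from the sample job configuration in this topic.
  val requiredEniKeys: Seq[String] = Seq(
    "spark.dla.eni.enable",
    "spark.dla.eni.vswitch.id",
    "spark.dla.eni.security.group.id"
  )

  // Returns the ENI keys that are missing from conf or have a blank value.
  def missingEniKeys(conf: Map[String, String]): Seq[String] =
    requiredEniKeys.filterNot(k => conf.get(k).exists(_.trim.nonEmpty))

  // Returns true if every comma-separated entry has the form host:port with a numeric port.
  def looksLikeQuorum(zkAddress: String): Boolean = {
    val entries = zkAddress.split(",").map(_.trim)
    entries.nonEmpty && entries.forall { entry =>
      entry.split(":") match {
        case Array(host, port) => host.nonEmpty && port.nonEmpty && port.forall(_.isDigit)
        case _                 => false
      }
    }
  }
}
```

For example, JobConfCheck.looksLikeQuorum can be applied to the first entry of args, and JobConfCheck.missingEniKeys to the conf map. A non-empty result from missingEniKeys indicates which ENI settings still need a value.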