How to add a Spark data source

You can add a Spark data source to enable fast batch data imports. This topic describes how to add one.

Prerequisites

You have a Lindorm instance with the Lindorm Tunnel Service (LTS) engine.
You have a Lindorm instance with Lindorm Distributed Processing System (LDPS) activated. For more information, see Create an instance.

Log on to the Lindorm console.
On the Instances page, click the ID of the instance that uses the LTS engine.
In the left-side navigation pane, click Data Sources.
On the Compute Engine Data Source tab, click Add Data Source.

In the Add Data Source dialog box, configure the parameters described in the following table.

Parameter	Description
Instance Type	Select Lindorm.
Region	Select the region of the target Lindorm instance.
Instance ID	Select the ID of the target Lindorm instance. Note: LDPS must be activated for the target Lindorm instance. For more information, see Activate the service. The target Lindorm instance and the Lindorm instance that uses the LTS engine must be in the same virtual private cloud (VPC). To connect instances in different VPCs, see Connect VPCs.

Click OK. If the status is Associated, the Spark data source has been added.

Log on to LTS. For more information, see Activate and log on to LTS.
In the left-side navigation pane, choose Data Sources > Add Data Source.

On the Add Data Source page, configure the parameters described in the following table.

Parameter	Description
Name	Enter lts_bulkload_spark.
Data source type	Select Spark.
Data source parameters	Configure the parameters for the Spark data source in JSON format: `{ "virtualClusterName":"token", "hdfsUri":"hdfs://nn1:8020,nn2:8020", "sparkEndpoint":"http://192.168.XX.XX:10099" }` `virtualClusterName`: The token for the LDPS JAR address. To obtain the token, go to the Database Connections page in the Lindorm console, click the Compute Engine tab, and copy the value of the Token field. `hdfsUri`: The HDFS connection address of the Lindorm instance, in the format `hdfs://nn1:8020,nn2:8020`. Note: To obtain the `nn1` and `nn2` values, see Connect to and use LindormDFS with an open-source HDFS client. The hdfs-site information contains the `nn1` and `nn2` addresses. `sparkEndpoint`: The LDPS JAR VPC address. To obtain this address, go to the Database Connections page in the Lindorm console, click the Compute Engine tab, and copy the value of the JAR VPC Address field.
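
Because the Data source parameters value is a small JSON document assembled from three values copied from the console, it can help to generate it with a script instead of editing it by hand. The following Python sketch is one way to do that. The function name `build_spark_source_params` and its validation rules are hypothetical and are not part of LTS or Lindorm; they only check that the values have the shapes described in the table above. LTS may apply its own, different checks.

```python
import json
from urllib.parse import urlparse


def build_spark_source_params(token, namenodes, spark_endpoint):
    """Return the JSON text for the 'Data source parameters' field.

    Hypothetical helper: the checks below are assumptions based on the
    example values in this topic, not rules enforced by LTS.
    """
    if not token:
        raise ValueError("token must not be empty")

    if not namenodes:
        raise ValueError("at least one NameNode address is required")
    for address in namenodes:
        # Each NameNode address is expected to look like host:port.
        host, sep, port = address.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {address!r}")

    # The endpoint is expected to be a URL with a scheme, host, and port.
    parsed = urlparse(spark_endpoint)
    if not parsed.scheme or not parsed.hostname or parsed.port is None:
        raise ValueError(f"expected scheme://host:port, got {spark_endpoint!r}")

    params = {
        "virtualClusterName": token,
        # All NameNode addresses share one hdfs:// prefix, separated by commas.
        "hdfsUri": "hdfs://" + ",".join(namenodes),
        "sparkEndpoint": spark_endpoint,
    }
    return json.dumps(params, indent=2)


if __name__ == "__main__":
    # Placeholder values; replace them with the values from your console.
    print(
        build_spark_source_params(
            token="token",
            namenodes=["nn1:8020", "nn2:8020"],
            spark_endpoint="http://192.168.0.1:10099",
        )
    )
```

Paste the printed JSON into the Data source parameters field. The token, NameNode addresses, and endpoint in the example are placeholders only.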