This topic describes how to use the serverless Spark engine to access data in your VPC. Supported data sources include ApsaraDB RDS, AnalyticDB for MySQL, PolarDB, ApsaraDB for MongoDB, Elasticsearch, ApsaraDB for HBase, E-MapReduce Hadoop, and self-managed data services hosted on ECS instances. The configurations in this topic are not required when you use DLA to access data in Object Storage Service (OSS), MaxCompute, or Tablestore. To access data in these services, you must instead configure the AccessKey ID and AccessKey secret.

Principles

The Driver and Executor processes of the serverless Spark engine run in a secure container. You can attach an Elastic Network Interface (ENI) of your VPC to the container so that the container operates inside the VPC in the same way as an ECS instance. The lifecycle of an ENI is the same as that of the process it is attached to. After the job is complete, all ENIs are released.

To attach an ENI of your VPC to the serverless Spark engine, you must specify the security group and VSwitch of your VPC in the job configuration. You must also make sure that the ENI can reach the service that stores the destination data over the network. You establish this connectivity in the same way as for an ordinary virtual machine. If one of your ECS instances can already access the destination data, you only need to specify the security group and VSwitch of that ECS instance in the job configuration.

Note The Driver process and each Executor process of a job occupy one IP address each in the specified VSwitch. Before you submit a job, make sure that the CIDR block of the VSwitch has enough available IP addresses.
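
The following Python sketch illustrates the check described in the preceding note: one IP address for the Driver plus one for each Executor, compared against the size of the VSwitch CIDR block. The function name, the addresses_in_use parameter, and the default of four reserved addresses per VSwitch are assumptions made for this illustration and do not come from this topic. Check the actual number of available IP addresses in the VPC console before you rely on the result.

```python
import ipaddress


def has_enough_addresses(vswitch_cidr, executor_count, addresses_in_use=0, reserved=4):
    """Return True if the VSwitch CIDR block can hold one Driver plus all Executors.

    addresses_in_use: IP addresses already taken by other resources (assumed input).
    reserved: addresses the platform keeps for itself in each VSwitch
              (assumed to be 4 here; verify this value for your environment).
    """
    network = ipaddress.ip_network(vswitch_cidr)
    free = network.num_addresses - reserved - addresses_in_use
    needed = 1 + executor_count  # one address for the Driver, one per Executor
    return free >= needed


if __name__ == "__main__":
    # Hypothetical VSwitch CIDR block and Executor count.
    print(has_enough_addresses("192.168.0.0/24", executor_count=100))
```

If the function returns False, reduce the number of Executors or use a VSwitch that has a larger CIDR block.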

Configure the VSwitch and security group

If one of your ECS instances can already access the destination data over your VPC, we recommend that you use Method 1 and reuse the security group and VSwitch of that ECS instance. Otherwise, use Method 2 to create a security group and VSwitch.

Method 1: Select the existing security group and VSwitch

If one of your ECS instances can already access data in a data service, such as ApsaraDB RDS, you can use the security group and VSwitch of that ECS instance.

  1. Log on to the ECS console. Enter the instance name in the search box to find the instance.
  2. On the Instance Details page of the ECS instance, view the VSwitch ID.
  3. On the security group page, view the security group ID.

Method 2: Create a security group and VSwitch

  1. Create a VSwitch.

    For more information, see Create a VSwitch.

  2. Create a security group.

    For more information, see Create a security group.

  3. Configure the outbound rules of the security group that is created in Step 2.

    Log on to the ECS console. In the left-side navigation pane, choose Network & Security > Security Groups. On the Security Groups page, configure outbound rules to allow access to the destination data.

  4. Allow access to the destination data.

    If the destination data is stored in an instance of an Alibaba Cloud service, such as ApsaraDB RDS or ApsaraDB for MongoDB, you can configure a whitelist in the console of the service. Typically, a whitelist can include CIDR blocks or security groups. To configure a whitelist of CIDR blocks, add the CIDR blocks of the VSwitch that is created in Step 1 to the whitelist. To configure a whitelist of security groups, add the security group that is created in Step 2 to the whitelist.

    If the destination data is stored in a self-managed database hosted on an ECS instance, you must configure inbound rules for the security group of that ECS instance to allow access from the new security group or VSwitch. You configure these rules on the same page as in Step 3, except that you add inbound rules instead of outbound rules.

Note
  1. For more information about how to configure a security group, see Add security group rules.
  2. In most cases, access failures are caused by invalid security group or whitelist configurations. If a failure occurs, check the outbound rules of the security group used by the Spark job, the inbound rules of the security group of the ECS instance that stores the destination data, and the whitelist of the service instance that stores the destination data.
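
When you troubleshoot an access failure, it can help to first confirm basic TCP reachability from an ECS instance that uses the same VSwitch and security group as the Spark job. The following Python sketch is one way to do that with the standard library only. The function name is an assumption made for this illustration, and the host and port in the usage comment are hypothetical placeholders. A successful connection only shows that the security group rules and whitelist allow the traffic. It does not verify credentials or database permissions.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example usage with a hypothetical endpoint and port:
# can_connect("your-database-endpoint.example.com", 3306)
```

If the call returns False from an ECS instance in the same VSwitch and security group, the Spark job is likely to fail for the same reason.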

Submit a job

Write a Spark-Submit script in the serverless Spark engine.
Note
  1. Set spark.dla.eni.enable to true to enable VPC access.
  2. Set spark.dla.eni.vswitch.id to the VSwitch ID obtained in Configure the VSwitch and security group.
  3. Set spark.dla.eni.security.group.id to the security group ID obtained in Configure the VSwitch and security group.
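
As an illustration of how the three parameters fit together, the following Python sketch assembles a job configuration and prints it as JSON. Only the three spark.dla.eni.* keys come from this topic. The overall layout (name, file, className, and conf), the function name, and all values are hypothetical placeholders assumed for this example, so adapt them to the job format that your serverless Spark engine expects.

```python
import json


def build_job_config(vswitch_id, security_group_id):
    """Return a job configuration dict with VPC access enabled."""
    if not vswitch_id or not security_group_id:
        raise ValueError("both a VSwitch ID and a security group ID are required")
    return {
        # Hypothetical job fields; replace them with your own values.
        "name": "vpc-access-example",
        "file": "oss://your-bucket/path/to/your-job.jar",
        "className": "com.example.YourMainClass",
        "conf": {
            # Parameters described in this topic.
            "spark.dla.eni.enable": "true",
            "spark.dla.eni.vswitch.id": vswitch_id,
            "spark.dla.eni.security.group.id": security_group_id,
        },
    }


if __name__ == "__main__":
    # Placeholder IDs; use the IDs obtained in Configure the VSwitch and security group.
    config = build_job_config("vsw-xxxxxxxx", "sg-xxxxxxxx")
    print(json.dumps(config, indent=4))
```

Generating the configuration from a function keeps the VSwitch ID and security group ID in one place when several jobs share the same VPC settings.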