This topic describes how to access instances in an Alibaba Cloud virtual private cloud (VPC) from Spark on MaxCompute.

Directly access instances in a VPC

You can access instances in a VPC or custom private domain names from Spark on MaxCompute. Instances in a VPC include Elastic Compute Service (ECS) instances, ApsaraDB for HBase instances, and ApsaraDB RDS instances.

To access instances in a VPC from Spark on MaxCompute, add the spark.hadoop.odps.cupid.vpc.domain.list parameter to the spark-defaults.conf file of MaxCompute or to the related configuration file of DataWorks, and use it to specify one or more instances that you want to access. The value of this parameter is in JSON format. Before you set the parameter, remove all spaces and line feeds from the JSON text so that the value is on a single line.
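
The merge into one line can be scripted. The following is a minimal sketch in Python, assuming that you keep the value as formatted JSON while you edit it. The helper name to_single_line is hypothetical and is not part of MaxCompute. The sketch relies only on the standard json module, and the key=value output line follows the form used for the other Spark parameters in this topic.

```python
import json


def to_single_line(json_text: str) -> str:
    """Re-serialize json_text on one line without spaces or line feeds."""
    parsed = json.loads(json_text)  # raises an error if the JSON is malformed
    # separators=(",", ":") drops the ", " and ": " padding that json.dumps adds by default.
    return json.dumps(parsed, separators=(",", ":"))


formatted = """
{
  "regionId": "cn-beijing",
  "vpcs": [
    {
      "vpcId": "vpc-2zeaeq21mb1dmkqh0****",
      "zones": [
        {"urls": [{"domain": "rm-2zem49k73c54z****.mysql.rds.aliyuncs.com", "port": 3306}]}
      ]
    }
  ]
}
"""

single_line = to_single_line(formatted)
# Line to add to the configuration file.
print("spark.hadoop.odps.cupid.vpc.domain.list=" + single_line)
```

Parsing the text before re-serializing it also catches malformed JSON, such as a missing bracket, before the job is submitted.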

The following examples show the configurations of the spark.hadoop.odps.cupid.vpc.domain.list parameter when you access different types of instances. The values of the regionId, vpcId, domain, and port parameters in the following examples are for reference only. For information about the ID of each region, see Project operations.
Important: You must add the CIDR block 100.104.0.0/16 to the whitelist of the instance that you want to access.
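
If connections are refused, it can help to confirm that the whitelist really covers this CIDR block. The following is a rough sketch in Python, assuming that you have copied the whitelist entries of the instance into a list of strings. The function name whitelist_covers_required_block is hypothetical, and the sketch uses only the standard ipaddress module.

```python
import ipaddress

REQUIRED_BLOCK = ipaddress.ip_network("100.104.0.0/16")


def whitelist_covers_required_block(entries):
    """Return True if any whitelist entry contains 100.104.0.0/16."""
    for entry in entries:
        # strict=False also accepts single addresses and entries with host bits set.
        network = ipaddress.ip_network(entry.strip(), strict=False)
        # subnet_of requires both networks to use the same IP version.
        if network.version == REQUIRED_BLOCK.version and REQUIRED_BLOCK.subnet_of(network):
            return True
    return False


print(whitelist_covers_required_block(["127.0.0.1", "100.104.0.0/16"]))  # True
print(whitelist_covers_required_block(["100.104.12.0/24"]))  # False
```

An entry that is wider than the required block, such as 100.64.0.0/10, also passes the check because it contains every address in 100.104.0.0/16.
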
Examples
  • Access ApsaraDB for MongoDB instances
    The following code shows the value of spark.hadoop.odps.cupid.vpc.domain.list when you access ApsaraDB for MongoDB instances. In this example, a primary instance and a secondary instance are specified.
    {
      "regionId":"cn-beijing",
      "vpcs":[
        {
          "vpcId":"vpc-2zeaeq21mb1dmkqh0****",
          "zones":[
            {
              "urls":[
                {
                  "domain":"dds-2ze3230cfea08****.mongodb.rds.aliyuncs.com",
                  "port": 3717
                },
                {
                  "domain":"dds-2ze3230cfea08****.mongodb.rds.aliyuncs.com",
                  "port":3717
                }
              ]
            }
          ]
        }
      ]
    }
    Results of merging JSON text into one line:
    {"regionId":"cn-beijing","vpcs":[{"vpcId":"vpc-2zeaeq21mb1dmkqh0****","zones":[{"urls":[{"domain":"dds-2ze3230cfea08****.mongodb.rds.aliyuncs.com","port":3717},{"domain":"dds-2ze3230cfea08****.mongodb.rds.aliyuncs.com","port":3717}]}]}]}
  • Access an ApsaraDB RDS instance
    The following code shows the value of spark.hadoop.odps.cupid.vpc.domain.list when you access an ApsaraDB RDS instance.
    {
      "regionId":"cn-beijing",
      "vpcs":[
        {
          "vpcId":"vpc-2zeaeq21mb1dmkqh0****",
          "zones":[
            {
              "urls":[
                {
                  "domain":"rm-2zem49k73c54z****.mysql.rds.aliyuncs.com",
                  "port": 3306
                }
              ]
            }
          ]
        }
      ]
    }
    Results of merging JSON text into one line:
    
    {"regionId":"cn-beijing","vpcs":[{"vpcId":"vpc-2zeaeq21mb1dmkqh0****","zones":[{"urls":[{"domain":"rm-2zem49k73c54z****.mysql.rds.aliyuncs.com","port":3306}]}]}]}
  • Access ApsaraDB for HBase instances
    The following code shows the value of spark.hadoop.odps.cupid.vpc.domain.list when you access ApsaraDB for HBase instances.
    {
      "regionId":"cn-beijing",
      "vpcs":[
        {
          "vpcId":"vpc-2zeaeq21mb1dmkqh0exox",
          "zones":[
            {
              "urls":[
                {
                  "domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com",
                  "port":2181
                },
                {
                  "domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com",
                  "port":16000
                },
                {
                  "domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com",
                  "port":16020
                },
                {
                  "domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com",
                  "port":2181
                },
                {
                  "domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com",
                  "port":16000
                },
                {
                  "domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com",
                  "port":16020
                },
                {
                  "domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com",
                  "port":2181
                },
                {
                  "domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com",
                  "port":16000
                },
                {
                  "domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com",
                  "port":16020
                },
                {
                  "domain":"hb-2zecxg2ltnpeg8me4-cor*-***.hbase.rds.aliyuncs.com",
                  "port":16020
                },
                {
                  "domain":"hb-2zecxg2ltnpeg8me4-cor*-***.hbase.rds.aliyuncs.com",
                  "port":16020
                }, 
                {
                  "domain":"hb-2zecxg2ltnpeg8me4-cor*-***.hbase.rds.aliyuncs.com",
                  "port":16020
                }
              ]
            }
          ]
        }
      ]
    }
    Results of merging JSON text into one line:
    
    {"regionId":"cn-beijing","vpcs":[{"vpcId":"vpc-2zeaeq21mb1dmkqh0exox","zones":[{"urls":[{"domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com","port":2181},{"domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com","port":16000},{"domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com","port":16020},{"domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com","port":2181},{"domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com","port":16000},{"domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com","port":16020},{"domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com","port":2181},{"domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com","port":16000},{"domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com","port":16020},{"domain":"hb-2zecxg2ltnpeg8me4-cor*-***.hbase.rds.aliyuncs.com","port":16020},{"domain":"hb-2zecxg2ltnpeg8me4-cor*-***.hbase.rds.aliyuncs.com","port":16020},{"domain":"hb-2zecxg2ltnpeg8me4-cor*-***.hbase.rds.aliyuncs.com","port":16020}]}]}]}
  • Access an ApsaraDB for Redis instance
    The following code shows the value of spark.hadoop.odps.cupid.vpc.domain.list when you access an ApsaraDB for Redis instance.
    {
      "regionId":"cn-beijing",
      "vpcs":[
        {
          "vpcId":"vpc-2zeaeq21mb1dmkqh0****",
          "zones":[
            {
              "urls":[
                {
                  "domain":"r-2zebda0d3c05****.redis.rds.aliyuncs.com",
                  "port":6379
                }
              ]
            }
          ]
        }
      ]
    }
    Results of merging JSON text into one line:
    
    {"regionId":"cn-beijing","vpcs":[{"vpcId":"vpc-2zeaeq21mb1dmkqh0****","zones":[{"urls":[{"domain":"r-2zebda0d3c05****.redis.rds.aliyuncs.com","port":6379}]}]}]}
  • Access a LogHub instance
    The following code shows the value of spark.hadoop.odps.cupid.vpc.domain.list when you access a LogHub instance.
    {
      "regionId":"cn-beijing",
      "vpcs":[
        {
          "zones":[
            {
              "urls":[
                {
                  "domain":"cn-beijing-intranet.log.aliyuncs.com",
                  "port":80
                }
              ]
            }
          ]
        }
      ]
    }
    Results of merging JSON text into one line:
    
    {"regionId":"cn-beijing","vpcs":[{"zones":[{"urls":[{"domain":"cn-beijing-intranet.log.aliyuncs.com","port":80}]}]}]}

    Set the domain parameter to the classic network endpoint or the VPC endpoint of the LogHub instance. For the endpoint of each region, see Endpoints.

  • Access a DataHub instance
    The following code shows the value of spark.hadoop.odps.cupid.vpc.domain.list when you access a DataHub instance.
    {
      "regionId":"cn-beijing",
      "vpcs":[
        {
          "zones":[
            {
              "urls":[
                {
                  "domain":"dh-cn-beijing.aliyun-inc.com",
                  "port":80
                }
              ]
            }
          ]
        }
      ]
    }
    Results of merging JSON text into one line:
    
    {"regionId":"cn-beijing","vpcs":[{"zones":[{"urls":[{"domain":"dh-cn-beijing.aliyun-inc.com","port":80}]}]}]}

    Set the domain parameter to the ECS endpoint on the classic network.

  • Access a custom domain name
    In this example, the custom domain name example.aliyundoc.com is configured in a VPC. Spark on MaxCompute accesses the domain name by using example.aliyundoc.com:80, which is a combination of the domain name and a port number. Perform the following operations before you access the domain name:
    1. Associate a zone with the VPC in PrivateZone.
    2. On the Cloud Resource Access Authorization page in the RAM console, click Confirm Authorization Policy to grant MaxCompute the read-only permissions on PrivateZone.
    3. Add the following parameters to the configurations of the Spark node:
      spark.hadoop.odps.cupid.pvtz.rolearn=acs:ram::xxxxxxxxxxx:role/aliyunodpsdefaultrole 
      spark.hadoop.odps.cupid.vpc.usepvtz=true

      The spark.hadoop.odps.cupid.pvtz.rolearn parameter specifies the Alibaba Cloud Resource Name (ARN) of the RAM role. You can obtain the ARN from the RAM console.

    4. Add the spark.hadoop.odps.cupid.vpc.domain.list parameter to the configuration file of your Spark job. The following code shows the value of this parameter:
      {
        "regionId":"cn-beijing",
        "vpcs":[
          {
            "vpcId":"vpc-2zeaeq21mb1dmkqh0****",
            "zones":[
              {
                "urls":[
                  {
                    "domain":"example.aliyundoc.com",
                    "port":80
                  }
                ],
                "zoneId":"9b7ce89c6a6090e114e0f7c415ed****"
              }
            ]
          }
        ]
      }
      Results of merging JSON text into one line:
      
      {"regionId":"cn-beijing","vpcs":[{"vpcId":"vpc-2zeaeq21mb1dmkqh0****","zones":[{"urls":[{"domain":"example.aliyundoc.com","port":80}],"zoneId":"9b7ce89c6a6090e114e0f7c415ed****"}]}]}
  • Access an HDFS instance
    • Add the hdfs-site.xml file to enable HDFS support. Sample configurations in the file:
      <?xml version="1.0"?>
      <configuration>
          <property>
              <name>fs.defaultFS</name>
              <value>dfs://DfsMountpointDomainName:10290</value>
          </property>
          <property>
              <name>fs.dfs.impl</name>
              <value>com.alibaba.dfs.DistributedFileSystem</value>
          </property>
          <property>
              <name>fs.AbstractFileSystem.dfs.impl</name>
              <value>com.alibaba.dfs.DFS</value>
          </property>
      </configuration>
    • Add the spark.hadoop.odps.cupid.vpc.domain.list parameter to the configuration file of your Spark job. The following code shows the value of this parameter:
      {
          "regionId": "cn-shanghai",
          "vpcs": [{
              "vpcId": "vpc-xxxxxx",
              "zones": [{
                  "urls": [{
                      "domain": "DfsMountpointDomainName",
                      "port": 10290
                  }]
              }]
          }]
      }
      Results of merging JSON text into one line:
      
      {"regionId":"cn-shanghai","vpcs":[{"vpcId":"vpc-xxxxxx","zones":[{"urls":[{"domain":"DfsMountpointDomainName","port":10290}]}]}]}
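
The examples above share one structure: a region, one VPC entry, one zone entry, and a list of domain and port pairs. When an instance exposes many endpoints, as in the ApsaraDB for HBase example, the value can be generated instead of written by hand. The following is a minimal sketch in Python under that assumption. The helper build_domain_list and the host names in the sample call are hypothetical placeholders, not values from a real instance.

```python
import json
from typing import Iterable, Optional, Tuple


def build_domain_list(
    region_id: str,
    endpoints: Iterable[Tuple[str, int]],
    vpc_id: Optional[str] = None,
    zone_id: Optional[str] = None,
) -> str:
    """Build the single-line JSON value from (domain, port) pairs."""
    urls = []
    for domain, port in endpoints:
        if not 1 <= port <= 65535:
            raise ValueError(f"invalid port for {domain}: {port}")
        urls.append({"domain": domain, "port": port})

    zone = {"urls": urls}
    if zone_id is not None:  # only the custom domain name example sets zoneId
        zone["zoneId"] = zone_id

    vpc = {}
    if vpc_id is not None:  # the LogHub and DataHub examples omit vpcId
        vpc["vpcId"] = vpc_id
    vpc["zones"] = [zone]

    # Compact separators keep the value free of spaces and line feeds.
    return json.dumps({"regionId": region_id, "vpcs": [vpc]}, separators=(",", ":"))


# Placeholder hosts; replace them with the real endpoints of your instance.
hosts = ["master1.example.aliyundoc.com", "master2.example.aliyundoc.com"]
ports = [2181, 16000, 16020]
value = build_domain_list(
    "cn-beijing",
    [(host, port) for host in hosts for port in ports],
    vpc_id="vpc-2zeaeq21mb1dmkqh0****",
)
print(value)
```

Called with example.aliyundoc.com, port 80, the same vpc_id, and a zone_id, the helper produces the single-line value shown in the custom domain name example.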

Access instances over VPCs

Compared with the direct access method described in Directly access instances in a VPC, this access method provides higher stability and better performance. In addition, this access method supports Internet access.

When you use this access method, take note of the following points:
  • You can use this access method to access instances in one VPC. If your Spark job needs to access instances in multiple VPCs at the same time, establish connections between the VPC that is already connected and the other VPCs.
  • For a Spark job that runs in a MaxCompute project, the user ID (UID) of the Alibaba Cloud account that owns the MaxCompute project must be the same as the UID of the Alibaba Cloud account that owns the VPC. Otherwise, the following error message appears: You are not allowed to use this vpc - vpc owner and project owner must be the same person.

For more information about how to establish a VPC connection, see Network connection process.