This topic describes how to access instances in a VPC from Spark on MaxCompute.
VPC access
You can access instances (for example, ECS, ApsaraDB for HBase, and ApsaraDB for RDS instances) in a VPC, or access user-defined private domain names, from Spark on MaxCompute.
When you access instances in a VPC from Spark on MaxCompute, add the spark.hadoop.odps.cupid.vpc.domain.list
parameter to the spark-defaults.conf file or to the DataWorks configuration to specify one or more instances. The value of this parameter is in JSON
format. When you configure this parameter, you must remove the spaces and line breaks
from the JSON text so that the whole value is on one line.
Set spark.hadoop.odps.cupid.vpc.domain.list
to the required value, as described in the following examples. You must replace the
RegionID, VPCID, instance domain name, and port number with actual values. For information
about the RegionID of a region, see Regions and zones.
- You must add the CIDR block 100.104.0.0/16 to the whitelist of the service that you want to access.
- If RegionID is cn-shanghai or cn-beijing, you must set spark.hadoop.odps.cupid.smartnat.enable to true.
- You can access only the services in one VPC in the current region from Spark on MaxCompute.
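The merging step described above can be done programmatically instead of by hand. The following Python sketch compacts a multi-line JSON value and prints the matching configuration lines. The helper names to_single_line and render_conf_lines and the key=value line layout are assumptions made for illustration; they are not part of Spark on MaxCompute.

```python
import json
from typing import List

# Regions for which the notes above require smartnat to be enabled.
SMARTNAT_REGIONS = {"cn-shanghai", "cn-beijing"}


def to_single_line(json_text: str) -> str:
    """Parse multi-line JSON and re-serialize it without spaces or line breaks."""
    return json.dumps(json.loads(json_text), separators=(",", ":"))


def render_conf_lines(json_text: str) -> List[str]:
    """Return configuration lines for a domain-list JSON value (hypothetical helper)."""
    compact = to_single_line(json_text)
    region = json.loads(compact)["regionId"]
    lines = ["spark.hadoop.odps.cupid.vpc.domain.list=" + compact]
    if region in SMARTNAT_REGIONS:
        lines.append("spark.hadoop.odps.cupid.smartnat.enable=true")
    return lines


if __name__ == "__main__":
    # Sample value; replace the region, VPC ID, domain, and port with actual values.
    pretty = """
    {
      "regionId": "cn-beijing",
      "vpcs": [
        {
          "vpcId": "vpc-2zeaeq21mb1dmkqh0****",
          "zones": [
            {"urls": [{"domain": "rm-2zem49k73c54z****.mysql.rds.aliyuncs.com", "port": 3306}]}
          ]
        }
      ]
    }
    """
    for line in render_conf_lines(pretty):
        print(line)
```

Parsing the text before re-serializing it also catches malformed JSON early, because json.loads raises an error on invalid input.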
Example 1: access ApsaraDB for MongoDB
Set spark.hadoop.odps.cupid.vpc.domain.list, as shown in the following example. This ApsaraDB for MongoDB instance has a primary
instance and a secondary instance.
{
"regionId":"cn-beijing",
"vpcs":[
{
"vpcId":"vpc-2zeaeq21mb1dmkqh0****",
"zones":[
{
"urls":[
{
"domain":"dds-2ze3230cfea08****.mongodb.rds.aliyuncs.com",
"port": 3717
},
{
"domain":"dds-2ze3230cfea08****.mongodb.rds.aliyuncs.com",
"port":3717
}
]
}
]
}
]
}
Results of merging JSON text into one line:
{"regionId":"cn-beijing","vpcs":[{"vpcId":"vpc-2zeaeq21mb1dmkqh0****","zones":[{"urls":[{"domain":"dds-2ze3230cfea08****.mongodb.rds.aliyuncs.com","port":3717},{"domain":"dds-2ze3230cfea08****.mongodb.rds.aliyuncs.com","port":3717}]}]}]}
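Before a merged value is pasted into the configuration, it can be checked for the most common mistakes. The following Python sketch is one possible check. The function name check_domain_list is hypothetical, and the structure it expects (regionId, vpcs, zones, urls, domain, port) is inferred only from the examples in this topic, not from a published schema.

```python
import json
from typing import List


def check_domain_list(value: str) -> List[str]:
    """Return a list of problems found in a domain-list value (empty if none)."""
    problems = []
    if "\n" in value or "\r" in value:
        problems.append("value contains line breaks")
    try:
        doc = json.loads(value)
    except json.JSONDecodeError as exc:
        return problems + ["not valid JSON: " + exc.msg]
    if not isinstance(doc, dict):
        return problems + ["top-level value is not a JSON object"]
    if "regionId" not in doc:
        problems.append("missing regionId")
    for vpc in doc.get("vpcs", []):
        for zone in vpc.get("zones", []):
            for url in zone.get("urls", []):
                if not isinstance(url.get("domain"), str):
                    problems.append("url entry without a string domain")
                port = url.get("port")
                # bool is a subclass of int in Python, so it is excluded explicitly.
                if (not isinstance(port, int) or isinstance(port, bool)
                        or not 1 <= port <= 65535):
                    problems.append("invalid port: " + repr(port))
    return problems


if __name__ == "__main__":
    sample = ('{"regionId":"cn-beijing","vpcs":[{"vpcId":"vpc-2zeaeq21mb1dmkqh0****",'
              '"zones":[{"urls":[{"domain":"a.b.com","port":80}]}]}]}')
    print(check_domain_list(sample) or "no problems found")
```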
Example 2: access ApsaraDB for RDS
Set spark.hadoop.odps.cupid.vpc.domain.list, as shown in the following example:
{
"regionId":"cn-beijing",
"vpcs":[
{
"vpcId":"vpc-2zeaeq21mb1dmkqh0****",
"zones":[
{
"urls":[
{
"domain":"rm-2zem49k73c54z****.mysql.rds.aliyuncs.com",
"port": 3306
}
]
}
]
}
]
}
Results of merging JSON text into one line:
{"regionId":"cn-beijing","vpcs":[{"vpcId":"vpc-2zeaeq21mb1dmkqh0****","zones":[{"urls":[{"domain":"rm-2zem49k73c54z****.mysql.rds.aliyuncs.com","port":3306}]}]}]}
Example 3: access ApsaraDB for HBase
Set spark.hadoop.odps.cupid.vpc.domain.list, as shown in the following example:
{
"regionId":"cn-beijing",
"vpcs":[
{
"vpcId":"vpc-2zeaeq21mb1dmkqh0exox",
"zones":[
{
"urls":[
{
"domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com",
"port":2181
},
{
"domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com",
"port":16000
},
{
"domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com",
"port":16020
},
{
"domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com",
"port":2181
},
{
"domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com",
"port":16000
},
{
"domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com",
"port":16020
},
{
"domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com",
"port":2181
},
{
"domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com",
"port":16000
},
{
"domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com",
"port":16020
},
{
"domain":"hb-2zecxg2ltnpeg8me4-cor*-***.hbase.rds.aliyuncs.com",
"port":16020
},
{
"domain":"hb-2zecxg2ltnpeg8me4-cor*-***.hbase.rds.aliyuncs.com",
"port":16020
},
{
"domain":"hb-2zecxg2ltnpeg8me4-cor*-***.hbase.rds.aliyuncs.com",
"port":16020
}
]
}
]
}
]
}
Results of merging JSON text into one line:
{"regionId":"cn-beijing","vpcs":[{"vpcId":"vpc-2zeaeq21mb1dmkqh0exox","zones":[{"urls":[{"domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com","port":2181},{"domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com","port":16000},{"domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com","port":16020},{"domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com","port":2181},{"domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com","port":16000},{"domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com","port":16020},{"domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com","port":2181},{"domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com","port":16000},{"domain":"hb-2zecxg2ltnpeg8me4-master*-***.hbase.rds.aliyuncs.com","port":16020},{"domain":"hb-2zecxg2ltnpeg8me4-cor*-***.hbase.rds.aliyuncs.com","port":16020},{"domain":"hb-2zecxg2ltnpeg8me4-cor*-***.hbase.rds.aliyuncs.com","port":16020},{"domain":"hb-2zecxg2ltnpeg8me4-cor*-***.hbase.rds.aliyuncs.com","port":16020}]}]}]}
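With this many host and port combinations, writing the value by hand is error-prone. The following Python sketch generates a value of the same shape from lists of hosts and ports. The function name build_domain_list and the host names in the demo are hypothetical placeholders; the port layout (2181, 16000, and 16020 for master nodes, 16020 for core nodes) simply mirrors the example above.

```python
import json
from itertools import product


def build_domain_list(region_id, vpc_id, endpoints):
    """Build the one-line JSON value from (domain, port) pairs (hypothetical helper)."""
    urls = [{"domain": domain, "port": port} for domain, port in endpoints]
    doc = {
        "regionId": region_id,
        "vpcs": [{"vpcId": vpc_id, "zones": [{"urls": urls}]}],
    }
    # Compact separators keep the result free of spaces and line breaks.
    return json.dumps(doc, separators=(",", ":"))


if __name__ == "__main__":
    # Placeholder host names; replace them with the actual master and core node domains.
    masters = ["master1.example.internal", "master2.example.internal"]
    cores = ["core1.example.internal"]
    endpoints = list(product(masters, [2181, 16000, 16020]))
    endpoints += [(core, 16020) for core in cores]
    print(build_domain_list("cn-beijing", "vpc-2zeaeq21mb1dmkqh0****", endpoints))
```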
Example 4: access ApsaraDB for Redis
Set spark.hadoop.odps.cupid.vpc.domain.list, as shown in the following example:
{
"regionId":"cn-beijing",
"vpcs":[
{
"vpcId":"vpc-2zeaeq21mb1dmkqh0****",
"zones":[
{
"urls":[
{
"domain":"r-2zebda0d3c05****.redis.rds.aliyuncs.com",
"port":3717
}
]
}
]
}
]
}
Results of merging JSON text into one line:
{"regionId":"cn-beijing","vpcs":[{"vpcId":"vpc-2zeaeq21mb1dmkqh0****","zones":[{"urls":[{"domain":"r-2zebda0d3c05****.redis.rds.aliyuncs.com","port":3717}]}]}]}
Example 5: access LogHub
Set spark.hadoop.odps.cupid.vpc.domain.list, as shown in the following example:
{
"regionId":"cn-beijing",
"vpcs":[
{
"zones":[
{
"urls":[
{
"domain":"cn-beijing-intranet.log.aliyuncs.com",
"port":80
}
]
}
]
}
]
}
Results of merging JSON text into one line:
{"regionId":"cn-beijing","vpcs":[{"zones":[{"urls":[{"domain":"cn-beijing-intranet.log.aliyuncs.com","port":80}]}]}]}
For the domain parameter, use the LogHub endpoint of the classic network or the VPC endpoint. For the endpoint of each region, see Endpoints.
Example 6: access DataHub
Set spark.hadoop.odps.cupid.vpc.domain.list, as shown in the following example:
{
"regionId":"cn-beijing",
"vpcs":[
{
"zones":[
{
"urls":[
{
"domain":"dh-cn-beijing.aliyun-inc.com",
"port":80
}
]
}
]
}
]
}
Results of merging JSON text into one line:
{"regionId":"cn-beijing","vpcs":[{"zones":[{"urls":[{"domain":"dh-cn-beijing.aliyun-inc.com","port":80}]}]}]}
For the domain parameter, use the ECS endpoint of the classic network as the DataHub endpoint.
Example 7: access a user-defined domain name
Assume that a user-defined domain name a.b.com exists in a VPC and that Spark on MaxCompute accesses it by using the domain name
and port a.b.com:80. You must complete the following configurations before the access:
- Associate a zone with a VPC in PrivateZone.
- Click Authorize to grant MaxCompute the read-only access permissions on PrivateZone.
- In the Spark node configuration, add the following two parameters:
spark.hadoop.odps.cupid.pvtz.rolearn=acs:ram::xxxxxxxxxxx:role/aliyunodpsdefaultrole
spark.hadoop.odps.cupid.vpc.usepvtz=true
The spark.hadoop.odps.cupid.pvtz.rolearn parameter specifies the ARN of your role. You can obtain the ARN from the RAM console.
- In the Spark configuration file, set spark.hadoop.odps.cupid.vpc.domain.list, as shown in the following example:
{
"regionId":"cn-beijing",
"vpcs":[
{
"vpcId":"vpc-2zeaeq21mb1dmkqh0****",
"zones":[
{
"urls":[
{
"domain":"a.b.com",
"port":80
}
],
"zoneId":"9b7ce89c6a6090e114e0f7c415ed****"
}
]
}
]
}
Results of merging JSON text into one line:
{"regionId":"cn-beijing","vpcs":[{"vpcId":"vpc-2zeaeq21mb1dmkqh0****","zones":[{"urls":[{"domain":"a.b.com","port":80}],"zoneId":"9b7ce89c6a6090e114e0f7c415ed****"}]}]}
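The properties used in this example can also be assembled in one place. The following Python sketch returns them as key=value lines. The function name pvtz_conf and its arguments are hypothetical, and the account ID in the demo ARN is a placeholder; only the property names and the JSON layout come from this example.

```python
import json


def pvtz_conf(region_id, vpc_id, zone_id, domain, port, role_arn):
    """Return the three properties from this example as key=value lines."""
    doc = {
        "regionId": region_id,
        "vpcs": [{
            "vpcId": vpc_id,
            "zones": [{
                "urls": [{"domain": domain, "port": port}],
                "zoneId": zone_id,
            }],
        }],
    }
    return [
        "spark.hadoop.odps.cupid.pvtz.rolearn=" + role_arn,
        "spark.hadoop.odps.cupid.vpc.usepvtz=true",
        # Compact separators keep the JSON value on one line without spaces.
        "spark.hadoop.odps.cupid.vpc.domain.list=" + json.dumps(doc, separators=(",", ":")),
    ]


if __name__ == "__main__":
    for line in pvtz_conf(
        region_id="cn-beijing",
        vpc_id="vpc-2zeaeq21mb1dmkqh0****",
        zone_id="9b7ce89c6a6090e114e0f7c415ed****",
        domain="a.b.com",
        port=80,
        role_arn="acs:ram::xxxxxxxxxxx:role/aliyunodpsdefaultrole",
    ):
        print(line)
```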
Example 8: access HDFS
- To use HDFS, add hdfs-site.xml. The file contains the following content:
<?xml version="1.0"?>
<configuration>
<property>
<name>fs.defaultFS</name>
<value>dfs://DfsMountpointDomainName:10290</value>
</property>
<property>
<name>fs.dfs.impl</name>
<value>com.alibaba.dfs.DistributedFileSystem</value>
</property>
<property>
<name>fs.AbstractFileSystem.dfs.impl</name>
<value>com.alibaba.dfs.DFS</value>
</property>
</configuration>
- In the Spark configuration file, set spark.hadoop.odps.cupid.vpc.domain.list, as shown in the following example:
{
"regionId":"cn-shanghai",
"vpcs":[
{
"vpcId":"vpc-xxxxxx",
"zones":[
{
"urls":[
{
"domain":"DfsMountpointDomainName",
"port":10290
}
]
}
]
}
]
}
Results of merging JSON text into one line:
{"regionId":"cn-shanghai","vpcs":[{"vpcId":"vpc-xxxxxx","zones":[{"urls":[{"domain":"DfsMountpointDomainName","port":10290}]}]}]}