This solution applies to scenarios where you access a target IP address or domain name on the public network from a MaxCompute user-defined function (UDF), Spark, MapReduce (MR), PyODPS, or Mars job.
Scope
Supported top-level domains: aliyuncs.com, aliyun.com, amap.com, dingtalk.com, alicloudapi.com, cainiao.com, alicdn.com, taobao.com, alibaba.com, alipaydev.com, and alibabadns.com.
IPv6 addresses are not supported. There is no limit on the number of public IP addresses.
If an external network address fails automatic verification, delete the address and resubmit it. If you still need to use the address, submit a ticket to request its configuration. For more information, see Network connection process.
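Before you submit an address, you may want to pre-check it locally against the scope rules above. The following Python sketch is only an illustration: the function name `is_supported_target` is hypothetical, and the check does not reproduce the automatic verification that MaxCompute performs. It accepts public IPv4 literals and hosts under the listed domains, and rejects IPv6 literals.

```python
import ipaddress

# Domains listed in the Scope section of this document.
SUPPORTED_DOMAINS = (
    "aliyuncs.com", "aliyun.com", "amap.com", "dingtalk.com",
    "alicloudapi.com", "cainiao.com", "alicdn.com", "taobao.com",
    "alibaba.com", "alipaydev.com", "alibabadns.com",
)


def is_supported_target(host: str) -> bool:
    """Hypothetical local pre-check; not the service's own verification."""
    host = host.strip().lower().rstrip(".")
    try:
        ip = ipaddress.ip_address(host.strip("[]"))
    except ValueError:
        # Not an IP literal: treat it as a domain name and match by suffix.
        return any(host == d or host.endswith("." + d) for d in SUPPORTED_DOMAINS)
    # IPv6 is not supported; an IPv4 address must be publicly routable.
    return ip.version == 4 and ip.is_global
```

A result of `False` only suggests that the address is likely to fail automatic verification and may need a ticket.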
Procedure
Step 1: Prepare accounts and projects
Before you establish network connectivity between MaxCompute and the target service, make sure that the following prerequisites are met.
Create a MaxCompute project. In a data lakehouse scenario, set the data type of the MaxCompute project to a Hive-compatible type.
If you want to access a target service in a VPC, ensure that the following accounts belong to the same Alibaba Cloud account: the VPC owner's account, the Alibaba Cloud account used to access the MaxCompute project, and the administrator account for the target service environment or cluster.
Step 2: Edit external network addresses in project management
You can add and delete common public IP addresses or domain names, such as aliyun.com, on the Project Management page in the MaxCompute console:
Log on to the MaxCompute console and select a region in the top-left corner.
In the navigation pane on the left, open the Projects page.
On the Projects page, find the target project and click Manage in its Actions column.
On the Project Settings page, click the Parameter Configuration tab.
In the MaxCompute External Network section, click Edit.
Specify the external network addresses that MaxCompute can access.
Click Submit.
Step 3: Access public network addresses
When you use a SQL UDF, Spark, or MaxFrame job to access the public network, add the following configurations.
For other types of jobs, adjust the configuration information based on the job type.
SQL user-defined function (UDF) job
The parameter settings are as follows:
-- Set the public IP address or domain name and port that you configured in the network connection request form.
-- This is the address that the following SQL statement accesses.
-- To access multiple domain names or ports, separate them with commas (,).
SET odps.internet.access.list=<ip_address:port|domain_name:port>;
-- Execute the SQL statement to call the UDF.
SELECT <UDF_name>("<http://ip_address|domain_name>");

ip_address:port | domain_name:port: Required. Specifies the target public IP address or domain name and port.
UDF_name: The UDF that accesses the public IP address or domain name.
The following code provides an example job:
package com.aliyun.odps.test.udf;

import com.aliyun.odps.udf.UDF;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;

public class <UDF_name> extends UDF {
    public String evaluate(String urlStr) throws IOException {
        URL url = new URL(urlStr);
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(url.openStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }
}

For example, if the UDF created from the sample code is named url_fetch, run the following commands after your network connection request is approved:

SET odps.internet.access.list=www.aliyun.com:80;
SELECT url_fetch("http://www.aliyun.com");
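The same setting can presumably be supplied from a PyODPS script as a SQL hint instead of a `SET` statement. The sketch below assumes that `odps.internet.access.list` is honored when passed through the `hints` argument of PyODPS `execute_sql`. The helper names `build_access_list` and `run_with_internet_access` are hypothetical, and `o` stands for an already-configured `odps.ODPS` entry object.

```python
from typing import Iterable, Tuple


def build_access_list(targets: Iterable[Tuple[str, int]]) -> str:
    """Join (host, port) pairs into the comma-separated host:port format."""
    entries = []
    for host, port in targets:
        if not 0 < port < 65536:
            raise ValueError(f"invalid port for {host}: {port}")
        entries.append(f"{host}:{port}")
    return ",".join(entries)


def run_with_internet_access(o, sql: str, targets):
    """Run `sql` with the access list passed as a hint (assumed to be honored)."""
    hints = {"odps.internet.access.list": build_access_list(targets)}
    # `o` is expected to expose execute_sql(sql, hints=...), as odps.ODPS does.
    return o.execute_sql(sql, hints=hints)
```

For the url_fetch example, a call might look like `run_with_internet_access(o, 'SELECT url_fetch("http://www.aliyun.com");', [("www.aliyun.com", 80)])`.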
Spark on MaxCompute job
Parameter settings: Add the following configuration items to the conf file of the Spark client, or to the configuration items when you submit a Spark job from DataWorks.
spark.hadoop.odps.cupid.smartnat.enable=true
spark.hadoop.odps.cupid.internet.access.list=<ip_address:port>
MaxFrame job
The parameter settings are as follows:
from maxframe import options
options.sql.settings = {
    "odps.internet.access.list": "<host>:80,<host>:443",
}
(Optional) Step 4: Add to the whitelist
If access control is enabled on your server, you must add the MaxCompute egress IP addresses to your service whitelist. If a public IP address or domain name fails automatic verification, perform the following steps:
Submit a ticket to request that the public IP addresses or domain names and ports be added to the whitelist.
The MaxCompute technical support team reviews your request and completes the configuration. This process typically takes three business days. After your request is processed, you can proceed with the subsequent steps. If you have any objections to the review result, you can contact the support team again by submitting a ticket.