MaxCompute: Network connection process

Last Updated: Jan 31, 2024

By default, MaxCompute cannot access a service over the Internet or in a virtual private cloud (VPC). To allow access to the service, you must establish a network connection between MaxCompute and the specified object, such as an IP address, a domain name, an ApsaraDB RDS instance, an ApsaraDB for HBase cluster, or a Hadoop cluster. This topic describes the network architecture between MaxCompute and the object that you want to access, and the supported network connection schemes.

Disclaimer

You can use MaxCompute to establish network connections with services over the Internet or in a VPC free of charge. Before you use MaxCompute to establish a network connection, take note of the following limits:

  • MaxCompute ensures network connectivity. If your network-related code triggers a failover, MaxCompute may rerun tasks on nodes. Optimize your code to tolerate reruns. We recommend that you perform only read operations. If you perform write operations, make sure that repeated writes do not generate dirty data.

  • Access requests must be forwarded by a proxy, and the number of requests that a proxy can forward is limited. We recommend that you use persistent connections and control the number of nodes. Excessive concurrency or a large number of connections may cause network requests to fail.

  • MaxCompute does not provide guaranteed bandwidth and is not responsible if jobs run at a slow speed.

  • The number of outbound proxy IP addresses is limited. If a connection exception occurs due to the limited number of outbound proxy IP addresses, contact Alibaba Cloud technical support.

  • Outbound proxy IP addresses may change. We recommend that you do not enable access control for the service that you want to access. If you configure an IP address whitelist for the service, access to the service may be denied due to changes of outbound proxy IP addresses.
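The first limit above warns that a rerun can repeat write operations. The following Java sketch illustrates one possible way to keep reruns harmless: write each record under a deterministic key so that a repeated write overwrites the same row instead of adding a duplicate. The IdempotentSink class, its in-memory map, and the sample keys are hypothetical stand-ins for a destination that supports upsert by key. They are not part of MaxCompute.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a destination that supports upsert by key.
class IdempotentSink {
    private final Map<String, String> rows = new LinkedHashMap<>();

    // Writing the same key again overwrites the row instead of appending a duplicate.
    synchronized void upsert(String key, String value) {
        rows.put(key, value);
    }

    synchronized int size() {
        return rows.size();
    }

    synchronized String get(String key) {
        return rows.get(key);
    }

    // Simulates one task attempt that writes a fixed batch of records.
    static void runTask(IdempotentSink sink) {
        sink.upsert("order-1", "paid");
        sink.upsert("order-2", "shipped");
    }
}
```

If runTask is executed twice against the same sink, which is what a rerun amounts to, the sink still holds two rows. An append-only write would hold four.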

Important

After you establish a network connection between MaxCompute and the destination service, you may still fail to access the destination service from MaxCompute. This may be caused by network restrictions of the tool in which MaxCompute jobs are run. For example, when you use MaxCompute in the DataWorks console to synchronize or cleanse data, you must also establish a network connection between a DataWorks resource group and the destination service, and make sure that DataWorks allows access to the destination service. If restrictions are configured, you must add the IP address or CIDR block of the destination service to the sandbox whitelist of DataWorks. For more information about network connection and sandbox configurations of DataWorks, see Establish a network connection between a resource group and a data source.

Network connection schemes

The following figure shows the network architecture between MaxCompute and the service that you want to access and the supported network connection schemes.

You can use one of the following schemes to allow MaxCompute to access a service:

  • Scheme 1: Access over the Internet

    You can use this scheme if you want to access an IP address or a domain name over the Internet by using user-defined functions (UDFs), Spark, MapReduce, PyODPS, or Mars in MaxCompute. If you use a public IP address or domain name that is commonly used, such as aliyun.com, you can directly add or remove the public IP address or domain name on the Projects page in the MaxCompute console. If the public IP address or domain name fails to pass automatic verification, you can fill out the form to apply for access to the IP address or domain name. If no security limits are imposed on the IP address or domain name that you want to access, you can access the destination IP address or domain name after your application is approved. The review takes approximately three weekdays.

    Note

    If security limits are imposed on the IP address or domain name that you want to access over the Internet, contact the owner of your organization to resolve the issue based on the security limits.

  • Scheme 2: Access over a VPC (dedicated connection)

    You can use this scheme if you want to access an ApsaraDB RDS instance, an ApsaraDB for HBase cluster, or a Hadoop cluster that resides in a VPC by using SQL statements, UDFs, Spark, MapReduce, PyODPS, Mars, external tables, or the data lakehouse solution on MaxCompute. You must authorize MaxCompute to create elastic network interfaces (ENIs) by using the Alibaba Cloud account to which the VPC belongs, and establish a connection between MaxCompute and the VPC in the MaxCompute console. Before you establish the connection, you must configure a security group to allow the connection between MaxCompute and the destination service. The security group specifies the access rules of the ENIs created by MaxCompute. You can view the created ENIs in the MaxCompute console.

    Note
    • If an access control policy is configured for the destination service, you must add the IP addresses of the ENIs or the CIDR block of the vSwitch to the IP address whitelist of the destination service.

    • MaxCompute can access only the VPC that is specified for the dedicated connection. If you want to access a VPC in another region, or another VPC in the same region, you must establish a network connection between the VPC specified for the dedicated connection and the other VPC.

  • Scheme 3: Access to specific Alibaba Cloud services

    You can use this scheme if you want to access Alibaba Cloud services such as Object Storage Service (OSS), Data Lake Formation (DLF), Tablestore, and Hologres by using SQL statements, UDFs, Spark, MapReduce, PyODPS, Mars, external tables, or the data lakehouse solution in MaxCompute. In this scheme, the classic network endpoints of Alibaba Cloud services are used.

    • If you created an OSS or Tablestore external table, you can access OSS or Tablestore by using the internal endpoint of OSS or Tablestore.

    • If you call a UDF to access OSS or Tablestore, you can access OSS or Tablestore by using only the public endpoint of OSS or Tablestore.

    For more information about the configurations and endpoint-based access in different scenarios, see Access to specific Alibaba Cloud services in this topic.

Prerequisites

Before you apply for establishing a network connection between MaxCompute and a service, make sure that the following conditions are met:

  • A MaxCompute project is created. If a MaxCompute project exists, you can use the project without the need to create another project. If you use the data lakehouse solution, we recommend that you set the data type edition for your MaxCompute project to the Hive-compatible data type edition. For more information about how to create a MaxCompute project, see Create a MaxCompute project.

  • If you want to access an object in a VPC, you must make sure that the account of the VPC owner, the account that is used to access the MaxCompute project, and the administrator account of the destination object are the same Alibaba Cloud account or are RAM users that belong to the same Alibaba Cloud account.

Supported regions

The following table describes the regions where a network connection can be established between MaxCompute and an object over the Internet or a VPC.

Scheme: Access over the Internet

  Supported regions: China (Beijing), China (Shanghai), China (Zhangjiakou), China (Hangzhou), China (Shenzhen), China (Chengdu), China (Hong Kong), Singapore, Malaysia (Kuala Lumpur), Germany (Frankfurt), US (Virginia)

  Connected object: Public IP address or domain name

Scheme: Access over a VPC (dedicated connection)

  Supported regions: China (Beijing), China (Shanghai), China (Zhangjiakou), China (Hangzhou), China (Shenzhen), China (Hong Kong), Singapore, Malaysia (Kuala Lumpur), Germany (Frankfurt), US (Virginia)

  Connected objects:

  • IP address or domain name of a VPC

  • ApsaraDB RDS

  • ApsaraDB for HBase cluster

  • Hadoop cluster

Access over the Internet

Manage a public IP address or domain name on the Projects page

If you use a public IP address or domain name that is commonly used, such as aliyun.com, you can directly add or remove the public IP address or domain name on the Projects page in the MaxCompute console. To manage a public IP address or domain name, perform the following steps:

  1. Log on to the MaxCompute console. In the top navigation bar, select a region.

  2. In the left-side navigation pane, click Projects.

  3. On the Projects page, find the desired project and click Manage in the Actions column.

  4. In the Outbound Internet section of the Parameter Configuration tab, add the desired public IP address or domain name.

  5. Click Submit.

Note
  • The following domains are supported: aliyuncs.com, aliyun.com, amap.com, dingtalk.com, alicloudapi.com, cainiao.com, alicdn.com, taobao.com, alibaba.com, alipaydev.com, and alibabadns.com.

  • You cannot configure IPv6 addresses. The number of public IP addresses is not limited.

  • If the public IP address or domain name fails to pass automatic verification, you must remove the public IP address or domain name and add another public IP address or domain name. If you still need to use the public IP address or domain name that fails to pass automatic verification, you must fill out an application form. For more information, see Access application.
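Before you add a domain name on the Projects page, you may want a quick local check of whether it falls under one of the domains listed in the note above. The following Java sketch shows one way to do that. The console's actual verification logic is not documented here, so the suffix matching (a host qualifies if it equals a listed domain or is a subdomain of one) is an assumption, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class DirectAddDomains {
    // Domains taken from the note above.
    static final List<String> SUPPORTED = Arrays.asList(
        "aliyuncs.com", "aliyun.com", "amap.com", "dingtalk.com", "alicloudapi.com",
        "cainiao.com", "alicdn.com", "taobao.com", "alibaba.com", "alipaydev.com",
        "alibabadns.com");

    // Assumption: a host qualifies if it equals a listed domain or is a subdomain of one.
    static boolean isLikelyDirectlyAddable(String host) {
        String h = host.trim().toLowerCase(Locale.ROOT);
        for (String domain : SUPPORTED) {
            if (h.equals(domain) || h.endsWith("." + domain)) {
                return true;
            }
        }
        return false;
    }
}
```

For example, isLikelyDirectlyAddable("restapi.amap.com") returns true, and isLikelyDirectlyAddable("example.org") returns false. A false result only suggests that you may need to submit an application form.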

Access application

To allow MaxCompute to access a public IP address or domain name that fails to pass automatic verification, perform the following steps:

  1. Submit an application form to apply for configuring an IP address whitelist.

  2. Enter the destination IP address or domain name and the port number in the form. If you want to add multiple IP addresses or domain names and port numbers, separate them with commas (,). For example, if you want to access an Alibaba Cloud domain name, provide the network configuration information www.aliyun.com:80. If you want to access the AMAP service, provide the network configuration information restapi.amap.com:443,restapi.amap.com:80.

  3. The MaxCompute technical support team reviews the application and completes the network configuration. The review takes approximately three weekdays. After the application is approved, you can proceed with the subsequent steps. If you have a question about the review result, you can click the application link or search for the DingTalk group ID 11782920 to join the DingTalk group of the MaxCompute developer community to provide feedback.
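The comma-separated host:port format from step 2 is also the format of the odps.internet.access.list value, so it can help to build and check it in one place. The following Java sketch is a minimal example under stated assumptions: the class name is hypothetical, the port range check (1 to 65535) is a general TCP rule rather than a documented MaxCompute rule, and the rejection of hosts that contain a colon extends the "no IPv6" note for the Projects page to this form, which is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

class AccessListBuilder {
    // Validates each host:port entry and joins the entries with commas.
    static String build(List<String> entries) {
        List<String> checked = new ArrayList<>();
        for (String raw : entries) {
            String entry = raw.trim();
            int idx = entry.lastIndexOf(':');
            if (idx <= 0 || idx == entry.length() - 1) {
                throw new IllegalArgumentException("Expected host:port, got: " + entry);
            }
            String host = entry.substring(0, idx);
            if (host.contains(":")) {
                // Assumption: IPv6 literals are not accepted.
                throw new IllegalArgumentException("IPv6 addresses are not supported: " + entry);
            }
            // NumberFormatException is a subclass of IllegalArgumentException.
            int port = Integer.parseInt(entry.substring(idx + 1));
            if (port < 1 || port > 65535) {
                throw new IllegalArgumentException("Port out of range: " + entry);
            }
            checked.add(host + ":" + port);
        }
        return String.join(",", checked);
    }
}
```

Passing the two AMAP entries from step 2 yields restapi.amap.com:443,restapi.amap.com:80. An entry without a port, such as www.aliyun.com, is rejected with an IllegalArgumentException.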

SQL statements that call UDFs

  • Perform the following configurations:

    -- Set the odps.internet.access.list property to the public IP address or domain name and port number that are specified in the network connection application form. Specify the public IP address or domain name and port number in the following SQL statement. 
    -- If you want to add multiple IP addresses or domain names and port numbers, separate them with commas (,).
    set odps.internet.access.list=<ip_address:port|realm_name:port>;
    -- Execute the following SQL statement to call a UDF: 
    select <UDF_name>("<http://ip_address|realm_name>");
  • ip_address:port | realm_name:port: required. The public IP address or domain name and port number that you want to access.

  • UDF_name: the name of the UDF that you use to access the public IP address or domain name.

  • The following sample code provides an example of the UDF code.

    package com.aliyun.odps.test.udf;

    import com.aliyun.odps.udf.UDF;
    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.URL;

    // UrlFetch is an example class name. Register the function as url_fetch to match the statements that follow.
    public class UrlFetch extends UDF {
        // Fetches the content at urlStr and returns it as a single string.
        public String evaluate(String urlStr) throws IOException {
            URL url = new URL(urlStr);
            StringBuilder sb = new StringBuilder();
            // try-with-resources closes the reader and the underlying stream.
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(url.openStream()))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    sb.append(line).append('\n');
                }
            }
            return sb.toString();
        }
    }
  • The UDF that is created based on the sample UDF code is named url_fetch. After the network connection application is approved, execute the following statements:

    set odps.internet.access.list=www.aliyun.com:80;
    select url_fetch("http://www.aliyun.com");

MaxCompute Spark task

Add the following configuration to the conf file of the Spark client or the configuration of the Spark job that is submitted by DataWorks.

spark.hadoop.odps.cupid.smartnat.enable=true
spark.hadoop.odps.cupid.internet.access.list=<ip_address:port>

IP address whitelist

If access control is configured for the service that you want to access, you must add the public outbound IP address of MaxCompute to the IP address whitelist of the service. You can search for the DingTalk group ID 11782920 to join the DingTalk group of the MaxCompute technical team and obtain the outbound IP address.

Access over a VPC (dedicated connection)

  1. Establish a dedicated network connection

    Procedure

    1. Perform authorization.

      • Grant the logon user the permission to establish network connections. The logon user must be the project owner or the user that is assigned the Super_Administrator role or Admin role at the tenant level. For more information about network connections, see Network connection. For more information about the roles, see Role planning. For more information about authorization, see Permissions on objects in a tenant.

      • Authorize MaxCompute to create ENIs in the VPC. This way, MaxCompute can be connected to the VPC. To grant the permissions, you need to use an Alibaba Cloud account to log on to the Alibaba Cloud Management Console, visit the Cloud Resource Access Authorization page, and click Confirm Authorization Policy.

        Note

        For more information about the billing rules when you access a VPC from MaxCompute, see Billing and pricing.

    2. Configure a security group. Create a security group for MaxCompute to manage the access of MaxCompute to various resources in the VPC.

      1. Log on to the VPC console. On the VPCs page, click the ID of the destination VPC. On the page that appears, click the Resources tab.

      2. In the VPC Resources section of the Resources tab, move the pointer over the value of Security Group and click Add. On the Security Groups page, click Create Security Group to create a security group for MaxCompute, and record the ID of the security group. The security group must belong to the same VPC as the object that MaxCompute needs to access. You must create a basic security group instead of an advanced security group. By default, basic security groups allow outbound traffic and advanced security groups do not. If you use an advanced security group, MaxCompute cannot access any objects in the VPC. For more information about how to create a security group, see Create a security group.

        • Create a security group

        • Configure a security group

      Note

      By default, MaxCompute automatically creates two ENIs based on the bandwidth requirements. You can use the two ENIs free of charge. The ENIs that are created by MaxCompute belong to the security group that you created. If you want to establish a connection between MaxCompute and an ApsaraDB for HBase cluster but the security group is not allowed to access the cluster, you can add the IP addresses of the ENIs to a whitelist of the ApsaraDB for HBase cluster. The IP addresses of the ENIs may change, so we recommend that you add the CIDR block of the vSwitch to the whitelist instead. To obtain the IP addresses of the ENIs, log on to the Elastic Compute Service (ECS) console. In the left-side navigation pane, click ENIs in the Network & Security section.

    3. Establish a network connection between MaxCompute and the destination VPC

      An Alibaba Cloud account or a RAM user that is assigned the tenant-level Super_Administrator or Admin role can establish a connection between MaxCompute and a VPC in the MaxCompute console. For more information about the roles, see Role planning. To establish a connection between MaxCompute and a VPC, perform the following steps:

      1. Log on to the MaxCompute console.

      2. In the left-side navigation pane, choose Tenants > Network Connection > Add Network Connection.

      3. In the Add Network Connection dialog box, configure the parameters and click OK. The following table describes the parameters.

        Parameter

        Description

        Network Connection Name

        The name of the custom network connection. The name must meet the following format requirements:

        • Start with a letter.

        • Contain only letters, underscores (_), and digits.

        • Be 1 to 63 characters in length.

        Type

        The network connection type. Default value: Passthrough.

        Note

        The default value indicates that the VPC connection scheme is used.

        Region

        The region in which you can use the VPC connection scheme to establish a network connection between MaxCompute and the specified service. For more information, see Supported regions.

        Selected VPC

        The ID of the VPC.

        To obtain the ID of the VPC, perform the following operations:

        • If you want to establish a network connection between MaxCompute and an ApsaraDB for HBase cluster or a Hadoop cluster, you can obtain the VPC ID in the network connection information in the console of the related service.

        • In other cases, you can perform the following operations: Log on to the VPC console. On the VPCs page, view the ID of the desired VPC in the Instance ID/Name column.

        vSwitch ID

        The ID of a vSwitch in the VPC.

        To obtain the ID of the vSwitch, perform the following operations:

        • If you want to establish a network connection between MaxCompute and an ApsaraDB for HBase cluster or a Hadoop cluster, you can obtain the vSwitch ID in the network connection information in the console of the related service.

        • In other cases, you can perform the following operations: Log on to the VPC console. In the left-side navigation pane, click vSwitch. On the vSwitch page, click the name of the desired vSwitch. On the page that appears, view the vSwitch ID in the vSwitch Basic Information section.

        Security Group

        The ID of the security group that you recorded in Step 2 (Configure a security group).
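The naming rules for Network Connection Name in the table above can be checked before you open the dialog box. The following Java sketch encodes the three rules as one regular expression. It assumes that "letters" means ASCII letters, which the table does not state, and the class name is hypothetical.

```java
import java.util.regex.Pattern;

class NetworkLinkName {
    // Starts with a letter, contains only letters, digits, and underscores,
    // and is 1 to 63 characters long in total. ASCII letters are an assumption.
    private static final Pattern VALID = Pattern.compile("[A-Za-z][A-Za-z0-9_]{0,62}");

    static boolean isValid(String name) {
        return name != null && VALID.matcher(name).matches();
    }
}
```

For example, isValid("testLink") returns true, while isValid("1link") and isValid("my-link") return false.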

    4. Configure the security group of the service that MaxCompute needs to access.

      After the ENI is enabled, you must add rules to the security group of the destination service to allow the MaxCompute security group created in Step 2 to access the destination service over specific ports, such as port 9200 and port 31000. For example, if you want to access an ApsaraDB RDS instance, add rules to the security group of the ApsaraDB RDS instance to allow access from the security group created in Step 2. If the service that you want to access does not support security groups and accepts only IP addresses, add the CIDR block of the vSwitch that is used by the destination service to the IP address whitelist of the service.

      • Configure the security group of the Hadoop cluster that MaxCompute needs to access.

        • To ensure that MaxCompute can access a Hadoop cluster, add inbound rules to the security group of the Hadoop cluster with the following settings:

          • Set the authorization object to the security group to which the ENIs belong. This is the security group that you created in Step 2.

          • Allow access to port 9083, which is used by the Hive metastore service.

          • Allow access to port 8020, which is used by the NameNode of Hadoop Distributed File System (HDFS).

          • Allow access to port 50010, which is used by the DataNodes of HDFS.

        • For example, if you want to allow MaxCompute to access a Hadoop cluster that is deployed on Alibaba Cloud E-MapReduce (EMR), you must configure the security group rules that are shown in the following figure. For more information, see Add security group rules.

      • Configure the security group of an ApsaraDB for HBase cluster.

        • Add the security group that is created for MaxCompute to the security group of the ApsaraDB for HBase cluster or add the IP addresses of the ENIs that are created by MaxCompute to a whitelist of the ApsaraDB for HBase cluster.

        • For example, if you want to allow MaxCompute to access an ApsaraDB for HBase cluster, you can perform the following operations: Log on to the ApsaraDB for HBase console. On the Clusters page, click the name of the ApsaraDB for HBase cluster that you want to access in the ID / Name column. In the left-side navigation pane, click Access Control. Then, add the security group of MaxCompute on the Security Group tab or add the IP addresses of the ENIs created by MaxCompute to a whitelist of the ApsaraDB for HBase cluster on the Whitelist Setting tab. For more information about how to add a security group or add an IP address to a whitelist, see Configure a whitelist and a security group.

          Note

          If the security group of MaxCompute cannot be added, you can add the IP addresses of the ENIs that are created by MaxCompute to a whitelist of the ApsaraDB for HBase cluster on the Whitelist Setting tab. If the MaxCompute configuration is changed, the IP addresses of the ENIs may change. We recommend that you add the CIDR block of the vSwitch to the whitelist.

      • Configure the security group of an ApsaraDB RDS instance.

        • Add the security group that is created for MaxCompute to the security group of the ApsaraDB RDS instance or add the IP addresses of the ENIs that are created by MaxCompute to a whitelist of the ApsaraDB RDS instance.

        • For example, if you want to allow MaxCompute to access an ApsaraDB RDS instance, you can perform the following operations: Log on to the ApsaraDB RDS console. On the Instances page, click the name of the ApsaraDB RDS instance that you want to access in the Instance ID/Name column. In the left-side navigation pane, click Whitelist and SecGroup. Then, add a security group on the Security Group tab or configure an IP address whitelist on the Whitelist Settings tab. For more information about how to add a security group or configure an IP address whitelist, see Configure a security group for an ApsaraDB RDS for MySQL instance or Configure an IP address whitelist for an ApsaraDB RDS for MySQL instance.

        Note

        If the MaxCompute configuration is changed, the IP addresses of the ENIs may change. We recommend that you add the CIDR block of the vSwitch to the whitelist.
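The notes in this step recommend adding the CIDR block of the vSwitch to a whitelist because the IP addresses of the ENIs may change. The following Java sketch shows why a CIDR entry keeps working: any IPv4 address inside the block matches it. The class name and the sample addresses are hypothetical, and only IPv4 is handled.

```java
class CidrCheck {
    // Converts dotted IPv4 text to a 32-bit value held in a long.
    static long toLong(String ip) {
        String[] parts = ip.trim().split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 address: " + ip);
        }
        long value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("Octet out of range in: " + ip);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    // Returns true if ip falls inside the CIDR block, for example 192.168.1.0/24.
    static boolean contains(String cidr, String ip) {
        String[] parts = cidr.trim().split("/");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected address/prefix, got: " + cidr);
        }
        int prefix = Integer.parseInt(parts[1]);
        if (prefix < 0 || prefix > 32) {
            throw new IllegalArgumentException("Prefix out of range in: " + cidr);
        }
        // A /0 block matches everything; otherwise keep the top `prefix` bits.
        long mask = prefix == 0 ? 0L : (0xFFFFFFFFL << (32 - prefix)) & 0xFFFFFFFFL;
        return (toLong(parts[0]) & mask) == (toLong(ip) & mask);
    }
}
```

With a hypothetical vSwitch CIDR block of 192.168.1.0/24, an ENI at 192.168.1.25 and a replacement ENI at 192.168.1.77 both match the same whitelist entry, whereas 192.168.2.25 does not.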

  2. SQL job

    Add configuration items

    • For more information about the configurations for MaxCompute to access resources in a VPC by using UDFs, see Use UDFs to access resources in VPCs. The following code shows an example:

      -- Configure the name of the network connection that you established based on the VPC connection scheme. This setting is valid only for the current session.
      set odps.session.networklink=testLink;
    • Access resources in a VPC by using external tables. The following code shows an example:

      -- Configure parameters in the table creation statement.
      TBLPROPERTIES(
        'networklink'='<networklink_name>')
    • For more information about how to establish a network connection by using the data lakehouse solution, see Lakehouse of MaxCompute.

  3. MaxCompute Spark job

    • After you establish a dedicated network connection between MaxCompute and a VPC, add the following configurations to the Spark job configuration before you run the Spark job. The configurations allow MaxCompute to access objects in the destination VPC by using an ENI. For more information, see Access instances in a VPC from Spark on MaxCompute.

      • spark.hadoop.odps.cupid.eni.enable=true

      • spark.hadoop.odps.cupid.eni.info=<region_id>:<vpc_id>
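The value of spark.hadoop.odps.cupid.eni.info joins the region ID and the VPC ID with a colon. The following Java sketch renders the two ENI settings as configuration lines. The class name is hypothetical, and the sample values cn-beijing and vpc-example123 are placeholders rather than real resources.

```java
class SparkEniConf {
    // Renders the two ENI settings as key=value lines.
    static String render(String regionId, String vpcId) {
        if (regionId == null || regionId.isEmpty() || vpcId == null || vpcId.isEmpty()) {
            throw new IllegalArgumentException("regionId and vpcId are required");
        }
        return "spark.hadoop.odps.cupid.eni.enable=true\n"
            + "spark.hadoop.odps.cupid.eni.info=" + regionId + ":" + vpcId + "\n";
    }
}
```

Calling render("cn-beijing", "vpc-example123") produces the two lines with the second value set to cn-beijing:vpc-example123.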

  4. IP address whitelist

    If access control is configured for the object that you want to access, you must add the security group for the established dedicated network connection to the IP address whitelist of the object.

Access to specific Alibaba Cloud services

You can use this scheme if you want to access Alibaba Cloud services such as Object Storage Service (OSS), Data Lake Formation (DLF), Tablestore, and Hologres by using SQL statements, UDFs, Spark, MapReduce, PyODPS, Mars, external tables, or the data lakehouse solution in MaxCompute. In this scheme, the classic network endpoints of Alibaba Cloud services are used.

Access OSS or Tablestore by using external tables

If you created an OSS or Tablestore external table, you can access OSS or Tablestore by using the internal endpoint of OSS or Tablestore.

  • For more information about the internal endpoints of OSS in each region, see the Internal endpoint column in Regions and endpoints.

  • For more information about the internal endpoints of Tablestore in each region, see Classic network endpoints in Endpoints.

For more information about how to use an external table to access OSS or Tablestore, see Hologres external tables.

Access OSS or Tablestore by calling UDFs

If you want to access OSS or Tablestore by calling UDFs, you must use the public endpoint of OSS or Tablestore and add the public endpoint of OSS or Tablestore to the whitelist of MaxCompute.

  1. Add the public endpoint of OSS or Tablestore to the whitelist of MaxCompute.

    You can submit an application to add the public endpoint of OSS or Tablestore to the whitelist of MaxCompute. References for the public endpoint of OSS or Tablestore in each region:

    • Public endpoint of OSS in each region: Public endpoint column in Regions and endpoints.

    • Public endpoint of Tablestore in each region: Public endpoints in Endpoints.

  2. Use the public endpoint to access OSS or Tablestore.

    For more information about examples of Spark jobs in MaxCompute, see Access OSS from Spark on MaxCompute.