By default, MaxCompute cannot access a service over the Internet or over a virtual private cloud (VPC). To allow the access, you must establish a network connection between MaxCompute and the specified object, such as an IP address, an endpoint, an ApsaraDB RDS instance, an ApsaraDB for HBase cluster, or a Hadoop cluster. This topic describes the network architecture between MaxCompute and the object that you want to access and the supported network connection schemes.

Background information

If you want to access an object over the Internet or over a VPC by calling a UDF, by using an external table, or by using the data lakehouse solution, you must apply for a network connection between MaxCompute and the object.
The following figure shows the network architecture between MaxCompute and the object that you want to access and the supported network connection schemes.
Figure: Network architecture
You can use one of the following schemes to allow MaxCompute to access an object:
  • Service mapping scheme
    The service mapping scheme varies based on the type of the network in which the object you want to access resides.
    • Service mapping scheme (Internet)
      You can use this scheme if you want to access a public IP address or a public endpoint by calling a UDF or by using an external table. To use this scheme, you must submit a ticket to apply for the network connection. If no security limits are imposed on the destination IP address or endpoint, you can access the IP address or endpoint after the application is approved.
      Note If security limits are imposed on the destination IP address or endpoint, contact the MaxCompute technical support team to handle this issue based on the security limits.
    • Service mapping scheme (VPC)

      You can use this scheme if you want to access an IP address or endpoint in a VPC by calling a UDF or by using an external table when MaxCompute is connected to the VPC. You only need to add the CIDR blocks of the region in which your MaxCompute project resides to the security group of the VPC and add the VPC in which the destination IP address or endpoint resides to your MaxCompute project. After you grant mutual access between MaxCompute and the VPC, MaxCompute can access the destination IP address or endpoint in the VPC.

  • VPC connection scheme

    You can use this scheme if you want to access an ApsaraDB RDS instance, an ApsaraDB for HBase cluster, or a Hadoop cluster that resides in a VPC by using an external table, calling a UDF, or using the data lakehouse solution. To use this scheme, you must perform authorization and configure security groups in the VPC console, establish a network connection between MaxCompute and the VPC in the MaxCompute console, and then configure the security group of the destination service, such as an ApsaraDB RDS instance, an ApsaraDB for HBase cluster, or a Hadoop cluster, to allow access from MaxCompute.

  • Direct connection scheme
    You can use this scheme if you want to access Alibaba Cloud Object Storage Service (OSS) or Tablestore by calling a UDF or by using an external table. To establish a network connection between MaxCompute and OSS or Tablestore, you do not need to apply for activation of the VPC service.
    Note
    • If you have created an OSS or Tablestore external table, you can access OSS or Tablestore by using the internal endpoint of OSS or Tablestore.
    • If you call a UDF to access OSS or Tablestore, you can access OSS or Tablestore only by using the public endpoint of OSS or Tablestore. Before you access OSS or Tablestore, you must add the public IP addresses of OSS or Tablestore to an IP address whitelist of MaxCompute to allow MaxCompute to access OSS or Tablestore over the Internet. For more information, see Configure an IP address whitelist. For more information about the endpoints of OSS, see Regions and endpoints. For more information about the endpoints of Tablestore, see Endpoint.
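
The scheme descriptions above amount to a lookup from the kind of object to the connection scheme. The following Java sketch restates that lookup so that the choice is explicit. The class, enum, and method names are hypothetical and exist only for illustration. They are not part of any MaxCompute SDK.

```java
/** Hypothetical helper that restates the scheme descriptions in this topic. */
final class SchemeChooser {

    /** Kinds of objects that this topic mentions. */
    enum Target {
        PUBLIC_ENDPOINT, VPC_ENDPOINT, RDS_IN_VPC, HBASE_IN_VPC, HADOOP_IN_VPC, OSS, TABLESTORE
    }

    /** Connection schemes that this topic describes. */
    enum Scheme {
        SERVICE_MAPPING_INTERNET, SERVICE_MAPPING_VPC, VPC_CONNECTION, DIRECT_CONNECTION
    }

    private SchemeChooser() {
    }

    /** Returns the scheme that this topic describes for the given kind of object. */
    static Scheme schemeFor(Target target) {
        switch (target) {
            case PUBLIC_ENDPOINT:
                // Public IP address or public endpoint: apply by submitting a ticket.
                return Scheme.SERVICE_MAPPING_INTERNET;
            case VPC_ENDPOINT:
                // IP address or endpoint in a VPC: grant mutual access between MaxCompute and the VPC.
                return Scheme.SERVICE_MAPPING_VPC;
            case RDS_IN_VPC:
            case HBASE_IN_VPC:
            case HADOOP_IN_VPC:
                // Managed services and clusters in a VPC: authorization, security groups, network link.
                return Scheme.VPC_CONNECTION;
            case OSS:
            case TABLESTORE:
                // OSS and Tablestore: no network connection application is described for these.
                return Scheme.DIRECT_CONNECTION;
            default:
                throw new IllegalArgumentException("Unknown target: " + target);
        }
    }
}
```

For example, `SchemeChooser.schemeFor(SchemeChooser.Target.HADOOP_IN_VPC)` returns `VPC_CONNECTION`.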

Prerequisites

Before you use the service mapping scheme (Internet) or VPC connection scheme to apply for a network connection between MaxCompute and an object, make sure that the following conditions are met:
  • A MaxCompute project is created.

    If a MaxCompute project already exists, you can use the project without the need to create another project. If you use the data lakehouse solution, we recommend that you set the data type edition for your MaxCompute project to the Hive-compatible data type edition. For more information about how to create a MaxCompute project, see Create a MaxCompute project.

  • If you want to access a service in a VPC, you must obtain the Alibaba Cloud account to which the VPC belongs. You must also obtain the Alibaba Cloud account that is used to access the MaxCompute project and the administrator account of the destination service or cluster.

Limits

  • If you use the data lakehouse solution, only the Alibaba Cloud account to which the VPC belongs can create MaxCompute projects.
  • When you use MaxCompute to access an ApsaraDB RDS instance, an ApsaraDB for HBase cluster, or a Hadoop cluster that resides in a VPC, make sure that the MaxCompute project is in the same region as this VPC.

Supported regions

The following table describes the regions in which you can use the service mapping scheme or VPC connection scheme to establish a network connection between MaxCompute and the specified object.
| Scheme | Region | Connected object |
| --- | --- | --- |
| Service mapping scheme (Internet) | All regions at the China site (aliyun.com) | Public IP address or endpoint |
| Service mapping scheme (VPC) | China (Beijing), China (Shanghai) | IP address or endpoint of a VPC |
| VPC connection scheme | China (Beijing), China (Shanghai), China (Zhangjiakou), China (Hangzhou), China (Shenzhen) | IP address or endpoint of a VPC, ApsaraDB RDS instance, ApsaraDB for HBase cluster, Hadoop cluster |

Service mapping scheme (Internet)

To allow MaxCompute to access a public IP address or a public endpoint, perform the following operations:
  1. Submit an application.

    Submit a ticket to apply for the configuration of an IP address whitelist that contains the public IP addresses or endpoints and the port numbers you want to add.

    You must enter the destination IP address or endpoint and the port number in the ticket. If you want to add multiple IP addresses or endpoints and port numbers, separate them with commas (,). For example, if you want to access an Alibaba Cloud endpoint, provide the network configuration information www.aliyun.com:80. If you want to access the AMAP service, provide the network configuration information restapi.amap.com:443,restapi.amap.com:80.

    After the MaxCompute technical support team receives the application, the team reviews and completes the network configuration. After the ticket is processed, proceed with subsequent steps.

  2. Test network connectivity.

    Log on to the MaxCompute client, configure the odps.internet.access.list property, and execute the SQL statement that calls a UDF to access the destination IP address or endpoint. Syntax:

    -- Set the odps.internet.access.list property to the public IP address or endpoint and port number that are specified in the network connection application form. Include the public IP address or endpoint and port number in the following SQL statement. 
    set odps.internet.access.list=<ip_address:port | realm_name:port>; 
    -- Execute the following SQL statement to call a UDF. 
    select <UDF_name>("<http://ip_address | realm_name>");
    • ip_address:port | realm_name:port: required. The public IP address or endpoint and port number that you want to access.
    • UDF_name: the name of the UDF that you use to access the public IP address or endpoint.

      The following example shows the UDF code.

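      package com.aliyun.odps.test.udf;

      import com.aliyun.odps.udf.UDF;
      import java.io.BufferedReader;
      import java.io.IOException;
      import java.io.InputStreamReader;
      import java.net.URL;
      import java.nio.charset.StandardCharsets;

      // Replace <UDF_name> with the class name of your UDF.
      public class <UDF_name> extends UDF {
        // Fetches the content at urlStr and returns it as a single string.
        public String evaluate(String urlStr) throws IOException {
          URL url = new URL(urlStr);
          StringBuilder sb = new StringBuilder();
          // Assumes that the response is UTF-8 encoded, instead of relying on the platform default charset.
          try (BufferedReader reader = new BufferedReader(
              new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            // Read the response line by line and keep the line breaks.
            while ((line = reader.readLine()) != null) {
              sb.append(line).append('\n');
            }
          }
          return sb.toString();
        }
      }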
    In this example, the UDF that is created based on the sample UDF code is named url_fetch. After the network connection application is approved, execute the following statements:
    set odps.internet.access.list=www.aliyun.com:80;
    select url_fetch("http://www.aliyun.com");
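
The ticket in step 1 expects entries in the host:port format, separated by commas, and the set statement in step 2 expects the same host:port format. The following Java sketch checks each entry before you paste it into the ticket or the statement. The class and method names are hypothetical. The port range check (1 to 65535) is an assumption based on valid TCP port numbers and is not stated in this topic.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that formats host:port entries for the ticket and the set statement. */
final class AccessListBuilder {

    private AccessListBuilder() {
    }

    /** Validates a single host:port entry and returns it trimmed. */
    static String checkEntry(String entry) {
        String trimmed = entry.trim();
        int colon = trimmed.lastIndexOf(':');
        if (colon <= 0 || colon == trimmed.length() - 1) {
            throw new IllegalArgumentException("Expected host:port but got: " + entry);
        }
        int port;
        try {
            port = Integer.parseInt(trimmed.substring(colon + 1));
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Port is not a number in: " + entry, e);
        }
        // Assumption: any valid TCP port number is acceptable.
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("Port is out of range in: " + entry);
        }
        return trimmed;
    }

    /** Joins validated entries with commas, as described for the ticket. */
    static String ticketValue(String... entries) {
        if (entries.length == 0) {
            throw new IllegalArgumentException("At least one host:port entry is required");
        }
        List<String> checked = new ArrayList<>();
        for (String entry : entries) {
            checked.add(checkEntry(entry));
        }
        return String.join(",", checked);
    }

    /** Builds the set statement for one host:port entry. */
    static String setStatement(String entry) {
        return "set odps.internet.access.list=" + checkEntry(entry) + ";";
    }
}
```

For example, `AccessListBuilder.ticketValue("restapi.amap.com:443", "restapi.amap.com:80")` produces the AMAP value that is shown in step 1.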

Service mapping scheme (VPC)

To allow MaxCompute to access a service in a VPC, perform the following operations:
  1. Configure a whitelist.
    1. Add MaxCompute CIDR blocks to a security group of the VPC that you want to access.

      You must add the MaxCompute CIDR blocks of the region in which you want to establish a network connection to the security group of the VPC to allow the IP addresses in the MaxCompute CIDR blocks to access the services in the VPC. The following table lists the CIDR blocks. For more information about how to configure VPC security groups, see VPC security groups.

      | Region | MaxCompute CIDR block |
      | --- | --- |
      | China (Shanghai) | 100.104.49.64/26, 100.104.212.192/26, 100.104.244.0/26, 100.104.94.0/26 |
      | China (Beijing) | 100.104.218.0/26, 100.104.120.0/26, 100.104.156.192/26, 100.104.149.0/26, 100.104.49.64/26, 100.104.212.192/26, 100.104.244.0/26, 100.104.94.0/26 |
    2. Log on to the MaxCompute client. Then, add the VPC to the whitelist of MaxCompute to allow the services in the VPC to access MaxCompute. Syntax:
      -- Add the destination VPC to the whitelist of MaxCompute. 
      setproject odps.security.outbound.destination=<RegionID>_<VPC ID>[*];  
      • RegionID: required. The ID of the region to which the VPC belongs. For more information about how to obtain region IDs, see Obtain the ID of the region to which the VPC belongs and the ID of the VPC.
      • VPC ID: required. The ID of the VPC. You can log on to the VPC console and obtain the ID of the destination VPC from the Instance ID/Name column on the VPCs page.
      • [*]: required. A wildcard, which indicates that all the IP addresses and port numbers under the VPC are added to a whitelist of MaxCompute. The brackets ([]) in the preceding command cannot be omitted.
      For example, if the ID of the destination VPC is vpc-bp1e4p7feyvwt103j**** and the region is China (Shanghai), you can run the following command to add all IP addresses and port numbers in the VPC to the whitelist of MaxCompute.
      setproject odps.security.outbound.destination=cn-shanghai_vpc-bp1e4p7feyvwt103j****[*];
  2. Test network connectivity.

    Use the MaxCompute client to configure the following properties and submit the SQL statement that is used to call a UDF to access the destination IP address in the VPC.

    For example, if the IP address that you want to access in the VPC is 192.0.2.0 and the port number is 80, you can run the following commands to test network connectivity.

    -- Specify the ID of the VPC that you want to access. 
    set odps.vpc.id=vpc-bp1e4p7feyvwt103j****; 
    -- Specify the IP address and port number of the VPC that you want to access. 
    set odps.vpc.access.ips=192.0.2.0:80; 
    -- Call the UDF to access the destination IP address and port number.      
    select url_fetch("http://192.0.2.0:80");   
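
Two details in this procedure are easy to get wrong: the exact format of the setproject value, including the [*] suffix, and whether a given source IP address falls within a MaxCompute CIDR block. The following Java sketch covers both. The class and method names are hypothetical. The CIDR check handles only IPv4 addresses, and the constant lists only the China (Shanghai) blocks from the preceding table.

```java
/** Hypothetical helper for the service mapping scheme (VPC). */
final class VpcMappingSketch {

    /** MaxCompute CIDR blocks for China (Shanghai), copied from the table in this section. */
    static final String[] SHANGHAI_CIDR_BLOCKS = {
        "100.104.49.64/26", "100.104.212.192/26", "100.104.244.0/26", "100.104.94.0/26"
    };

    private VpcMappingSketch() {
    }

    /** Builds the setproject command in the <RegionID>_<VPC ID>[*] format. */
    static String outboundDestination(String regionId, String vpcId) {
        return "setproject odps.security.outbound.destination=" + regionId + "_" + vpcId + "[*];";
    }

    /** Converts a dotted IPv4 address to a 32-bit integer. */
    static int toInt(String ipv4) {
        String[] parts = ipv4.trim().split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 address: " + ipv4);
        }
        int value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("Not an IPv4 address: " + ipv4);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    /** Returns true if the IPv4 address is inside the CIDR block, for example 100.104.49.64/26. */
    static boolean inCidr(String ip, String cidr) {
        String[] parts = cidr.trim().split("/");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Not a CIDR block: " + cidr);
        }
        int prefix = Integer.parseInt(parts[1]);
        if (prefix < 0 || prefix > 32) {
            throw new IllegalArgumentException("Invalid prefix length in: " + cidr);
        }
        // A prefix of 0 matches every address. Shifting an int by 32 bits would be a no-op in Java.
        int mask = prefix == 0 ? 0 : -1 << (32 - prefix);
        return (toInt(ip) & mask) == (toInt(parts[0]) & mask);
    }

    /** Returns true if the IPv4 address is inside any of the given CIDR blocks. */
    static boolean inAny(String ip, String[] cidrBlocks) {
        for (String cidr : cidrBlocks) {
            if (inCidr(ip, cidr)) {
                return true;
            }
        }
        return false;
    }
}
```

For example, `VpcMappingSketch.outboundDestination("cn-shanghai", "vpc-bp1e4p7feyvwt103j****")` reproduces the setproject command that is shown in step 1.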

VPC connection scheme

To use the VPC connection scheme, perform the following operations:

  1. Perform authorization.

    Log on to the Alibaba Cloud Management Console by using the Alibaba Cloud account to which the VPC belongs, click Authorization, and then click Confirm Authorization Policy to complete the authorization.

    This authorization grants MaxCompute the permissions to create and bind elastic network interfaces (ENIs) in a VPC security group. After the authorization is complete, MaxCompute automatically creates ENIs in the VPC.

  2. Configure a security group.

    You can create a security group for MaxCompute to manage the access of MaxCompute to various resources in the VPC.

    1. Log on to the VPC console. On the VPCs page, click the ID of the destination VPC. On the page that appears, click the Resources tab.
    2. In the VPC Resources section of the Resources tab, move the pointer over the value of Security Group and click Add. On the Security Groups page, click Create Security Group to create a security group for MaxCompute and record the security group ID.

      The security group that you create must be a basic security group and must belong to the same VPC as the service that MaxCompute needs to access. For more information about how to create a security group, see Create a security group.

      • Create a security group
      • Configure a security group

        You can add inbound and outbound rules to the security group based on the default configurations.
      Note
      • You must create a basic security group, instead of an advanced security group. By default, basic security groups allow outbound traffic. By default, advanced security groups do not allow outbound traffic. If you use an advanced security group, no services in the VPC can be accessed.
      • By default, MaxCompute automatically creates two ENIs based on the bandwidth requirements. You can use the two ENIs free of charge. The ENIs that are created by MaxCompute belong to the security group that you created. If you want to establish a connection between MaxCompute and an ApsaraDB for HBase cluster but the security group of MaxCompute cannot be added to the security group of the ApsaraDB for HBase cluster, you can add the IP addresses of the ENIs that are created by MaxCompute to a whitelist of the ApsaraDB for HBase cluster. The IP addresses of the ENIs may change. Therefore, we recommend that you add the CIDR block of the vSwitch that belongs to the VPC to the whitelist.

        To obtain the IP addresses of the ENIs, perform the following operations: Log on to the Elastic Compute Service (ECS) console. In the left-side navigation pane, click ENIs in the Network & Security section to view the IP addresses of the ENIs.

  3. Establish a network connection.
    An Alibaba Cloud account or a RAM user that is assigned the tenant-level Super_Administrator role can establish a connection between MaxCompute and a VPC in the MaxCompute console. You can perform the following operations to establish a connection between MaxCompute and a VPC.
    Note You can assign the tenant-level Super_Administrator role to a user on the Users tab of the MaxCompute console. Only the Alibaba Cloud account or a RAM user that is assigned the tenant-level Super_Administrator role can assign roles to users. For more information, see Add a user (tenant-level).
    1. Log on to the MaxCompute console.
    2. On the Network Links tab of the MaxCompute console, click Create Network Link.
    3. In the Create Network Link dialog box, configure the parameters and click OK. The following table describes the parameters.
      Parameter Description
      Connection Name The custom name of the network connection. The name must meet the following requirements:
      • Starts with a letter.
      • Contains only letters, underscores (_), and digits.
      • Is 1 to 63 characters in length.
      Type The network connection type. Default value: Passthrough.
      Note The default value indicates that the VPC connection scheme is used.
      Region The region in which you can use the VPC connection scheme to establish a network connection between MaxCompute and the specified object. For more information about the supported regions, see Supported regions.
      VPC ID The ID of the VPC.
      To obtain the ID of the VPC, perform the following operations:
      • If you want to establish a network connection between MaxCompute and an ApsaraDB for HBase cluster or a Hadoop cluster, you can obtain the VPC ID from the network connection information in the console of the service that MaxCompute needs to access.
        For example, if you want to allow MaxCompute to access an ApsaraDB for HBase cluster, you can perform the following operations: Log on to the ApsaraDB for HBase console. On the Clusters page, click the name of the ApsaraDB for HBase cluster that you want to access in the ID / Name column. In the left-side navigation pane, click Database Connection and view the VPC ID in the Connection Information section.
      • In other cases, you can perform the following operations: Log on to the VPC console. On the VPCs page, view the ID of the desired VPC in the Instance ID/Name column.
      vSwitch ID The ID of the vSwitch that belongs to the VPC.
      To obtain the ID of the vSwitch, perform the following operations:
      • If you want to establish a network connection between MaxCompute and an ApsaraDB for HBase cluster or a Hadoop cluster, you can obtain the vSwitch ID in the network connection information in the console of the related service.

        For example, if you want to allow MaxCompute to access an ApsaraDB for HBase cluster, you can perform the following operations: Log on to the ApsaraDB for HBase console. On the Clusters page, click the name of the ApsaraDB for HBase cluster that you want to access in the ID / Name column. In the left-side navigation pane, click Database Connection and view the vSwitch ID in the Connection Information section.

      • In other cases, you can perform the following operations: Log on to the VPC console. In the left-side navigation pane, click vSwitch. On the vSwitch page, click the name of the desired VPC. On the page that appears, view the vSwitch ID in the vSwitch Basic Information section.
      Security Group The ID of the security group that is recorded in Step 2.
  4. Configure the security group of the service that MaxCompute needs to access.
    • Configure the security group of the Hadoop cluster that MaxCompute needs to access.
      To ensure that MaxCompute can access a Hadoop cluster, configure the following information for the security group of the Hadoop cluster:
      • Add inbound rules to the security group of the Hadoop cluster.
      • Set the authorization object to the security group to which the ENIs belong. The security group is the one you created in Step 2.
      • Set the port number for the Hive metastore service to 9083.
      • Set the port number for NameNode of Hadoop Distributed File System (HDFS) to 8020.
      • Set the port number for DataNode of HDFS to 50010.
      For example, if you want to allow MaxCompute to access a Hadoop cluster that is deployed on Alibaba Cloud E-MapReduce (EMR), you must configure the security group rules that are shown in the following figure. For more information about how to configure security group rules, see Add a security group rule.
      Figure: Configure a firewall or security group
    • Configure the security group of an ApsaraDB for HBase cluster.

      Add the security group that is created for MaxCompute to the security group of the ApsaraDB for HBase cluster or add the IP addresses of the ENIs that are created by MaxCompute to a whitelist of the ApsaraDB for HBase cluster.

      For example, if you want to allow MaxCompute to access an ApsaraDB for HBase cluster, you can perform the following operations: Log on to the ApsaraDB for HBase console. On the Clusters page, click the name of the ApsaraDB for HBase cluster that you want to access in the ID / Name column. In the left-side navigation pane, click Access Control. Then, add the security group of MaxCompute on the Security Group tab or add the IP addresses of the ENIs created by MaxCompute to a whitelist of the ApsaraDB for HBase cluster on the Whitelist Setting tab. For more information about how to add a security group or add an IP address to a whitelist, see Configure IP address whitelists and security groups.

      Note If the security group of MaxCompute cannot be added, you can add the IP addresses of the ENIs that are created by MaxCompute to a whitelist of the ApsaraDB for HBase cluster on the Whitelist Setting tab. If the MaxCompute configuration is changed, the IP addresses of the ENIs may change. Therefore, we recommend that you add the CIDR block of the vSwitch that belongs to the VPC to the whitelist.
    • Configure the security group of an ApsaraDB RDS instance.

      Add the security group that is created for MaxCompute to the security group of the ApsaraDB RDS instance or add the IP addresses of the ENIs that are created by MaxCompute to a whitelist of the ApsaraDB RDS instance.

      For example, if you want to allow MaxCompute to access an ApsaraDB RDS instance, you can perform the following operations: Log on to the ApsaraDB RDS console. On the Instances page, click the name of the ApsaraDB RDS instance that you want to access in the Instance ID/Name column. In the left-side navigation pane, click Data Security. Then, add a security group on the Security Group tab or create an IP address whitelist on the Whitelist Settings tab. For more information about how to add a security group or configure an IP address whitelist, see Configure a security group for an ApsaraDB RDS for MySQL instance or Configure an IP address whitelist for an ApsaraDB RDS for MySQL instance.

      Note If the MaxCompute configuration is changed, the IP addresses of the ENIs may change. Therefore, we recommend that you add the CIDR block of the vSwitch that belongs to the VPC to the IP address whitelist.
  5. Test network connectivity.

    You can test the network connectivity by following the instructions in Lakehouse of MaxCompute.
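
Two constraints from the preceding steps can be checked before you open the console: the Connection Name rules in step 3 and the Hadoop ports in step 4. The following Java sketch validates a name and probes a TCP port. The class and method names are hypothetical. The sketch assumes that "letters" means ASCII letters, which this topic does not state. The port probe is meaningful only when you run it from a host in the same VPC as the cluster, such as an ECS instance. It does not test the path from MaxCompute itself.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.regex.Pattern;

/** Hypothetical pre-checks for the VPC connection scheme. */
final class NetworkLinkChecks {

    /** A letter first, then letters, digits, or underscores, 1 to 63 characters in total. */
    private static final Pattern CONNECTION_NAME = Pattern.compile("[A-Za-z][A-Za-z0-9_]{0,62}");

    /** Ports listed in step 4: Hive metastore, HDFS NameNode, HDFS DataNode. */
    static final int[] HADOOP_PORTS = {9083, 8020, 50010};

    private NetworkLinkChecks() {
    }

    /** Returns true if the name meets the Connection Name requirements in step 3. */
    static boolean isValidConnectionName(String name) {
        return name != null && CONNECTION_NAME.matcher(name).matches();
    }

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host unreachable.
            return false;
        }
    }
}
```

For example, you can loop over `NetworkLinkChecks.HADOOP_PORTS` and call `canConnect` with the private IP address of a cluster node to confirm that the inbound rules are in effect for hosts in the VPC.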