This topic describes the data stores that support connectivity testing and how to troubleshoot common connectivity testing failures.

Support for connectivity testing

The table in this section describes the support for connectivity testing by various data stores. If the connectivity test fails for a data store connection added in connection string mode, the possible causes are as follows:
  • The data store is not started. Check whether the data store is started.
  • DataWorks cannot access the network where the data store resides. Make sure that the network where the data store resides is connected to Alibaba Cloud.
  • A network firewall prohibits DataWorks from accessing the network where the data store resides. Add the IP addresses or Classless Inter-Domain Routing (CIDR) blocks used by DataWorks to a whitelist. For more information, see Configure a whitelist.
  • The domain name of the data store cannot be resolved. Make sure that the domain name of the data store can be resolved properly.
  • The default resource group is used but the data store is deployed in a Virtual Private Cloud (VPC) or an Internet data center (IDC). Use a custom resource group or an exclusive resource group to guarantee network connectivity. In this case, connectivity testing is not supported. Whether a sync node can be run depends on the selected resource group.
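The first four causes above can be roughly distinguished with a basic name-resolution and TCP check. The following is a minimal sketch, not a DataWorks tool: the function name probe_endpoint and its return labels are assumptions made for illustration. It reflects reachability only from the machine that runs it, so the result is meaningful for DataWorks only if that machine sits in the same network position as the resource group.

```python
import socket


def probe_endpoint(host: str, port: int, timeout: float = 5.0) -> str:
    """Return 'dns_failure', 'unreachable', or 'ok' for a host and port."""
    try:
        # Step 1: resolve the name. A failure here points to the
        # "domain name cannot be resolved" cause.
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"
    try:
        # Step 2: open a TCP connection. A refusal or timeout suggests that
        # the data store is not started, the network is not connected, or a
        # firewall or whitelist blocks the caller.
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except OSError:
        return "unreachable"
```

An 'unreachable' result does not say which of the remaining causes applies. Checking the data store status and the whitelist settings is still required.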
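Data store | Data store type | Network type | Connectivity testing
-----------|-----------------|--------------|---------------------
MySQL | ApsaraDB | Classic network | Supported
MySQL | ApsaraDB | VPC | Supported
MySQL | Connection string mode | - | Not supported if an internal endpoint is used
MySQL | User-created data store hosted on Elastic Compute Service (ECS) | Classic network | Supported
MySQL | User-created data store hosted on Elastic Compute Service (ECS) | VPC | Not supported
SQL Server | ApsaraDB | Classic network | Supported
SQL Server | ApsaraDB | VPC | Supported
SQL Server | Connection string mode | - | Not supported if an internal endpoint is used
SQL Server | User-created data store hosted on ECS | Classic network | Supported
SQL Server | User-created data store hosted on ECS | VPC | Not supported
PostgreSQL | ApsaraDB | Classic network | Supported
PostgreSQL | ApsaraDB | VPC | Supported
PostgreSQL | Connection string mode | - | Not supported if an internal endpoint is used
PostgreSQL | User-created data store hosted on ECS | Classic network | Supported
PostgreSQL | User-created data store hosted on ECS | VPC | Not supported
Oracle | Connection string mode | - | Not supported if an internal endpoint is used
Oracle | User-created data store hosted on ECS | Classic network | Supported
Oracle | User-created data store hosted on ECS | VPC | Not supported
DRDS | ApsaraDB | Classic network | Supported
DRDS | ApsaraDB | VPC | Coming soon
HybridDB for MySQL | ApsaraDB | Classic network | Supported
HybridDB for MySQL | ApsaraDB | VPC | Supported
AnalyticDB for PostgreSQL | ApsaraDB | Classic network | Supported
AnalyticDB for PostgreSQL | ApsaraDB | VPC | Supported
MaxCompute | ApsaraDB | Classic network | Supported
AnalyticDB for MySQL 2.0 | ApsaraDB | Classic network | Supported
AnalyticDB for MySQL 2.0 | ApsaraDB | VPC | Not supported
OSS | ApsaraDB | Classic network | Supported
OSS | ApsaraDB | VPC | Supported
Hadoop Distributed File System (HDFS) | Connection string mode | - | Not supported if an internal endpoint is used
Hadoop Distributed File System (HDFS) | User-created data store hosted on ECS | Classic network | Supported
Hadoop Distributed File System (HDFS) | User-created data store hosted on ECS | VPC | Not supported
FTP | Connection string mode | - | Not supported if an internal endpoint is used
FTP | User-created data store hosted on ECS | Classic network | Supported
FTP | User-created data store hosted on ECS | VPC | Not supported
MongoDB | ApsaraDB | Classic network | Supported
MongoDB | ApsaraDB | VPC | Not supported
MongoDB | Connection string mode | - | Not supported if an internal endpoint is used
MongoDB | User-created data store hosted on ECS | Classic network | Supported
MongoDB | User-created data store hosted on ECS | VPC | Not supported
Memcache | ApsaraDB | Classic network | Supported
Memcache | ApsaraDB | VPC | Not supported
Redis | ApsaraDB | Classic network | Supported
Redis | ApsaraDB | VPC | Not supported
Redis | Connection string mode | - | Not supported if an internal endpoint is used
Redis | User-created data store hosted on ECS | Classic network | Supported
Redis | User-created data store hosted on ECS | VPC | Not supported
Table Store | ApsaraDB | Classic network | Supported
Table Store | ApsaraDB | VPC | Not supported
Datahub | ApsaraDB | Classic network | Supported
Datahub | ApsaraDB | VPC | Not supported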

If you specify an internal endpoint in a VPC when adding a connection for a data store, connectivity testing is not supported. You can save the settings without clicking Test Connection. You must select a custom resource group or an exclusive resource group when creating a sync node that uses the connection. In addition, you can create the sync node only in the code editor. For more information, see Exclusive resource groups for Data Integration instances and Add a custom resource group.

Connectivity testing scenarios

This section uses relational databases as an example to describe the connectivity testing scenarios.
  • Data store deployed in a local IDC
    • Public network access available: Connectivity testing is supported for a connection to such a data store. You must add the connection based on the JDBC URL. When you do so, check the network reachability and whitelist settings to make sure that the resource group for running sync nodes can access the data store over the public network. If you use a public endpoint, take the data transfer cost for the public network into account. For more information, see Public network traffic of Data Integration instances.
    • Public network access unavailable: Connectivity testing is not supported for a connection to such a data store. You can create sync nodes that use the connection only in the code editor. You can add the connection based on the JDBC URL. If you have connected the local IDC to a VPC, purchase an exclusive resource group for data integration and submit a ticket. For more information, see Exclusive resource groups for Data Integration instances. You can also upgrade your DataWorks to Professional Edition and run sync nodes on a custom resource group. For more information, see Add a custom resource group.
  • User-created data store hosted on ECS
    • Public network access available: Connectivity testing is supported for a connection to such a data store. You must add the connection based on the JDBC URL. When you do so, check the network reachability and whitelist settings to make sure that the resource group for running sync nodes can access the data store over the public network. If you use a public endpoint, take the data transfer cost for the public network into account. For more information, see Public network traffic of Data Integration instances.
    • Data store that resides on a classic network:
      • If the DataWorks workspace and the data store are in the same region, connectivity testing is supported, and you can add the connection based on the JDBC URL. You can run sync nodes on the default resource group, which is not recommended.
      • If the DataWorks workspace and the data store are in different regions, connectivity testing is not supported. You can add the connection based on the JDBC URL. In this case, you must run the sync nodes that use the connection on a custom resource group and can create the sync nodes only in the code editor.
      • If the user-created data store is hosted on ECS instances that reside on a classic network, network connectivity is not guaranteed when sync nodes are run on the default resource group. We recommend that you run the sync nodes on a custom resource group. If you use a custom resource group or the connectivity test fails, you must create the sync nodes in the code editor.
      • We recommend that you migrate the data store to a VPC.
    • Data store that resides in a VPC and uses an internal endpoint: Connectivity testing is not supported. You can add the connection based on the JDBC URL. In this case, you must run the sync nodes that use the connection on a custom resource group or an exclusive resource group for data integration and can create the sync nodes only in the code editor.
  • Alibaba Cloud services
    • Connection added in instance mode:
      • DataWorks automatically delivers the endpoints for connections added in instance mode for ApsaraDB for POLARDB, Distributed Relational Database Service (DRDS), HybridDB for MySQL, AnalyticDB for PostgreSQL, AnalyticDB for MySQL 3.0, ApsaraDB RDS for MySQL, ApsaraDB RDS for PostgreSQL, and ApsaraDB RDS for SQL Server, according to the running status and environment of sync nodes. Connectivity testing is supported for such connections, and you can run the sync nodes on the default resource group.
      • You can also add connections in instance mode for ApsaraDB for Redis, ApsaraDB for MongoDB, and AnalyticDB for MySQL 2.0. However, such connections do not support reverse VPC access or connectivity testing. You must run the sync nodes that use such connections on a custom resource group and can create the sync nodes only in the code editor.
    • Public network access available: Connectivity testing is supported for a connection to such a data store. You must add the connection based on the JDBC URL. Where instance mode is available, we recommend that you use it instead. When you add the connection based on the JDBC URL, check the network reachability and whitelist settings to make sure that the resource group for running sync nodes can access the data store over the public network. If you use a public endpoint, take the data transfer cost for the public network into account.
    • Data store that resides on a classic network:
      • If the DataWorks workspace and the data store are in the same region, connectivity testing is supported. You must add the connection based on the JDBC URL.
      • If the DataWorks workspace and the data store are in different regions, connectivity testing is not supported. You can add the connection based on the JDBC URL. In this case, you must run the sync nodes that use the connection on a custom resource group and can create the sync nodes only in the code editor.
      • Where instance mode is available, we recommend that you add the connection in instance mode.
    • Data store that resides in a VPC and uses an internal endpoint: Connectivity testing is not supported. You can add the connection based on the JDBC URL. In this case, you must run the sync nodes that use the connection on a custom resource group or an exclusive resource group for data integration and can create the sync nodes only in the code editor. Where instance mode is available, we recommend that you add the connection in instance mode.

    Three types of endpoints are available for centralized services, such as MaxCompute, Object Storage Service (OSS), and LogHub. You can select one according to your needs.

Note
  • The constraints on endpoints for other data stores such as HDFS, Redis, and MongoDB are the same as those for relational databases.
  • When you select an endpoint for a connection, you must check the node configuration mode (codeless UI or code editor) and selected resource group (default, custom, or exclusive) to make sure that the resource group can access the data store.
  • Considering the characteristics of HBase and HDFS, we recommend that you use a custom resource group or an exclusive resource group for data integration to run sync nodes for these data stores.
  • Connectivity testing is supported for connections to data stores in Finance Cloud, and you can add such connections in instance mode. If the connectivity test fails, run sync nodes on a custom resource group.
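The rules for a user-created relational database hosted on ECS can be summarized as a small lookup. The following sketch only restates the scenarios listed in this section. The names Plan and plan_for_ecs_store and the labels 'public', 'classic', and 'vpc_internal' are hypothetical and are not part of DataWorks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    test_supported: bool    # whether connectivity testing is supported
    code_editor_only: bool  # whether sync nodes must be created in the code editor
    resource_group: str     # resource group guidance taken from the text


def plan_for_ecs_store(access: str, same_region: bool = True) -> Plan:
    """Map an access path to the rules for a user-created store on ECS.

    access is one of the hypothetical labels 'public', 'classic', or
    'vpc_internal'. same_region only matters for 'classic'.
    """
    if access == "public":
        return Plan(True, False, "any group that can reach the public endpoint")
    if access == "classic":
        if same_region:
            # The default group can be used but is not recommended.
            return Plan(True, False, "default possible; custom recommended")
        return Plan(False, True, "custom")
    if access == "vpc_internal":
        return Plan(False, True, "custom or exclusive")
    raise ValueError(f"unknown access path: {access!r}")
```

For example, plan_for_ecs_store("classic", same_region=False) reports that connectivity testing is not supported and that the sync nodes must be created in the code editor and run on a custom resource group.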

Application scenarios of exclusive resources

  • Scenario 1: The data store in a VPC and the DataWorks workspace are in different regions.
    An exclusive resource group for data integration cannot access data stores across VPCs. If your data store and the DataWorks workspace are in different regions, follow these steps:
    1. Create a VPC in the region where the DataWorks workspace resides.
    2. Connect the VPC created in the previous step to the VPC where the data store resides through Cloud Enterprise Network.
    3. Purchase an exclusive resource group for data integration in the same zone as that of the data store and bind the resource group to the created VPC.
    4. Submit a ticket to enable network access.
  • Scenario 2: The data store in a VPC and the DataWorks workspace are in the same region.

    To synchronize data from or to data stores in a VPC, you must purchase an exclusive resource group for data integration in the same zone as that of the data stores and bind the resource group to the VPC where the data stores reside. If the synchronization fails after binding, add the CIDR block of the VPC to the whitelist or security group of the data stores.

Services for enabling network access

  • For more information about how to enable network access through Cloud Enterprise Network, see Cloud Enterprise Network.
  • For more information about how to enable network access through Express Connect, see Express Connect.
  • For more information about how to enable network access through VPN Gateway, see VPN Gateway.

Note on the scheduling cluster

  • Currently, Alibaba Cloud has deployed scheduling clusters in the China (Hangzhou), China (Shenzhen), China (Hong Kong), and Singapore regions. DataWorks assumes that the scheduling cluster deployed in the China (Hangzhou) region is used when checking the network connectivity to your data store. For example, if your MongoDB data store is deployed on a classic network in the China (Beijing) region, DataWorks determines that the scheduling cluster cannot access the data store due to the region difference.
  • The OXS cluster and the ECS cluster cannot communicate with each other through the internal network.

    The scheduling cluster for RDS databases is an OXS cluster. The OXS cluster can communicate with RDS databases in all regions in mainland China through the internal network. An ECS cluster on a classic network serves as the scheduling cluster for other data stores.

    For example, when you synchronize data from an RDS database to a user-created database, the connectivity test can be passed for the connections to both databases. However, during node scheduling, the RDS database uses the OXS cluster to schedule the sync node, whereas the user-created database uses the ECS cluster to schedule the sync node. The ECS cluster cannot access the RDS database, and the synchronization fails. We recommend that you add the connection for the RDS database as a MySQL connection in JDBC URL mode. This guarantees that both databases can be accessed by the ECS cluster, and the synchronization is successful.

View the resource group on which a sync node is run

  • A sync node for an RDS database is scheduled in the OXS cluster.
    To determine the resource group running the sync node, check the log details.
    • If the logs contain information similar to the following, the sync node is run on the default resource group:
      running in Pipeline[basecommon_group_xxxxxxxxx]
    • If the logs contain information similar to the following, the sync node is run on a custom resource group:
      running in Pipeline[basecommon_xxxxxxxxx]
    • If the logs contain information similar to the following, the sync node is run on an exclusive resource group for data integration:
      running in Pipeline[basecommon_S_res_group_xxx]
  • When a sync node for other types of data stores is scheduled in the ECS cluster, the log information is as shown in the following figure. [Figure: Log]
  • When a custom scheduling resource is used as the scheduling cluster, the log information is as shown in the following figure. You can determine whether a custom resource group is used based on this log information. [Figure: Log]
  • You can go to the testing page of Data Integration and click Run for a sync node to schedule it in the ECS cluster. Sync nodes for an RDS database must be scheduled in the OXS cluster. Therefore, an RDS-related sync node may succeed when run manually but fail when run as scheduled. In this case, you must click Test Run on the Scheduling Maintenance page.
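The three Pipeline patterns listed above for the OXS cluster can be checked programmatically. The following is a minimal sketch that assumes the log line has exactly the "running in Pipeline[basecommon_...]" shape shown in this section. The function name resource_group_type is hypothetical, and the check is a heuristic: a custom resource group whose identifier happens to start with "group_" would be reported as the default resource group.

```python
import re

# Captures the part of the pipeline name that follows "basecommon_".
# Whitespace directly after "basecommon_" is tolerated defensively.
_PIPELINE = re.compile(r"running in Pipeline\[basecommon_\s*([^\]\s]+)\]")


def resource_group_type(log_line: str):
    """Return 'exclusive', 'default', 'custom', or None if no match."""
    match = _PIPELINE.search(log_line)
    if match is None:
        return None
    name = match.group(1)
    # Check the most specific prefix first.
    if name.startswith("S_res_group_"):
        return "exclusive"
    if name.startswith("group_"):
        return "default"
    return "custom"
```

Passing each line of a sync node log to resource_group_type and keeping the first result that is not None gives the resource group type for that run.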

Common connectivity test failures

When a connectivity test fails, verify that the region, network type, RDS whitelist, database name, and username are properly configured for the connection. You can first troubleshoot the failure based on common Data Integration failures. The common connectivity test failures are as follows:
  • The database password is incorrect.
  • The network connection fails, as shown in the following figure. [Figure: Network connection failure]
  • A network error occurs during synchronization.

    Check the logs and determine which resource group incurs the issue. Check whether the problematic resource group is a custom one.

    If so, check whether the CIDR block of the custom resource group is added to the whitelist of the corresponding data store, such as the ApsaraDB for RDS instance.
    Note: The CIDR block of the custom resource group must be added to the whitelist of the MongoDB data store.

    Check whether the connectivity test is passed for both connections and whether whitelists of RDS and MongoDB are complete.

    Note: If the whitelists are incomplete, sync nodes may fail intermittently. A sync node runs successfully only if it is delivered to a scheduling server whose IP address has been added to the whitelists. Otherwise, the sync node fails.
  • A sync node is run but an error is reported, indicating that the connection to port 8000 failed.

    This issue occurs because a custom resource group is used and no inbound rule is configured for the IP address 10.116.134.123 and port 8000 in the security group. To resolve the issue, add an inbound rule for the IP address and port to the security group and run the sync node again.
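The whitelist checks described above amount to testing whether each IP address used by the resource group is covered by an entry in the data store whitelist. The following is a minimal sketch that uses the Python standard library. The function name missing_from_whitelist is hypothetical, and the whitelist values in the usage note are made up for illustration.

```python
import ipaddress


def missing_from_whitelist(addresses, whitelist):
    """Return the addresses that no whitelist entry covers.

    Whitelist entries may be single IP addresses or CIDR blocks.
    """
    # strict=False accepts entries whose host bits are set, such as 10.1.2.3/16.
    networks = [ipaddress.ip_network(entry, strict=False) for entry in whitelist]
    missing = []
    for addr in addresses:
        ip = ipaddress.ip_address(addr)
        if not any(ip in net for net in networks):
            missing.append(addr)
    return missing
```

For example, with a whitelist of ["10.116.0.0/16"], the address 10.116.134.123 is covered, whereas an address such as 192.168.1.5 is returned as missing and must be added before sync nodes delivered to that server can run.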

Examples of connectivity test failures

Example 1
  • Symptom

    A data store failed the connectivity test. Database URL: jdbc:mysql://xx.xx.xx.x:xxxx/t_uoer_bradef. Username: xxxx_test. Error message: Access denied for user 'xxxx_test'@'%' to database 'yyyy_demo'.

  • Troubleshooting method
    1. Check whether the information you entered is correct.
    2. Check whether the password is correct, the whitelist is properly configured, and your account has permission to access the database. You can grant the required permissions in the RDS console.
Example 2
  • Symptom
    A data store failed the connectivity test.
    Error message: Timed out after 5000 ms while waiting for a server that matches ReadPreferenceServerSelector{readPreference=primary}. Client view of cluster state is {type=UNKNOWN, servers=[(xxxxxxxxxx), type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoSocketReadException: Prematurely reached end of stream}}]
  • Troubleshooting method

    Before testing the connectivity to a MongoDB data store that is not deployed in a VPC, you must add the related CIDR blocks to the whitelist of the data store. For more information, see Configure a whitelist.