Realtime Compute for Apache Flink: Network connectivity

Last Updated: May 14, 2025

By default, Realtime Compute for Apache Flink cannot access the Internet. This topic provides answers to frequently asked questions about Internet access, cross-VPC access, domain name resolution, and network connectivity testing.

How do I troubleshoot network issues?

A Realtime Compute for Apache Flink workspace is deployed in a VPC. You cannot change the VPC that you select when you purchase a Realtime Compute for Apache Flink workspace. If the source or sink is not in the same VPC as the Realtime Compute for Apache Flink workspace, the source or sink is disconnected from the workspace and data cannot be read from the source or written to the sink. If data cannot be read from the source or written to the sink, perform the following steps to check whether a network issue exists:

  1. Check the network connectivity between the upstream and downstream storage services and the Flink workspace. You can test the network connectivity in the Flink console. For more information, see How do I perform network detection?.

    By default, Realtime Compute for Apache Flink can access only services that are deployed in the same region and the same VPC as the workspace. To access resources in another VPC or over the Internet, see How does Realtime Compute for Apache Flink access a service across VPCs? and How does Realtime Compute for Apache Flink access the Internet?.

  2. Check whether a whitelist is configured for the upstream and downstream storage services. For more information, see How do I configure a whitelist?.

  3. If a network timeout error persists after you verify the connectivity and the whitelist, the connection may be timing out before it is established. Increase the value of the connect.timeout parameter in the WITH clause of the data definition language (DDL) statement. The default value of this parameter is 30 seconds.

How do I perform network detection?

Realtime Compute for Apache Flink supports the network detection feature. To use this feature, perform the following steps in the development console of Realtime Compute for Apache Flink:

  1. Log on to the Realtime Compute for Apache Flink console.

  2. Click Console in the Actions column of the target workspace.

  3. Click the Network Detection icon in the upper-right corner of the top navigation bar.

  4. In the Network detection dialog box, configure the Host parameter to specify an IP address or endpoint to check whether the running environment of a Realtime Compute for Apache Flink deployment is connected to the upstream and downstream storage services.

    Important

    When you enter an endpoint, remove :<port> from the endpoint and enter the port number in the Port field of the Network detection dialog box.

    If the connect timed out error message appears, check whether the domain name that you want to access is a public domain name or a domain name in another VPC. By default, Realtime Compute for Apache Flink can access only services that are deployed in the same VPC as the workspace. To access resources in another VPC or over the Internet, see How does Realtime Compute for Apache Flink access a service across VPCs? and How does Realtime Compute for Apache Flink access the Internet?.
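As a rough complement to the Network Detection feature, the following Python sketch splits an endpoint into a host and a port, as the Important note above requires, and then attempts a TCP connection. It is an illustration only: split_endpoint and can_connect are hypothetical helper names, and the result reflects connectivity from the machine that runs the script (for example, a server in the same VPC), not from the Flink runtime itself. The console check remains the authoritative test.

```python
import socket
from typing import Tuple


def split_endpoint(endpoint: str, default_port: int) -> Tuple[str, int]:
    """Split 'host:port' into (host, port).

    Falls back to default_port when the endpoint carries no numeric port.
    """
    host, sep, port = endpoint.rpartition(":")
    if sep and port.isdigit():
        return host, int(port)
    return endpoint, default_port


def can_connect(host: str, port: int, timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS resolution failures.
        return False
```

For example, calling split_endpoint with a made-up endpoint such as "db.example.internal:3306" yields the host and port that you would enter in the Host and Port fields, and can_connect(host, port) gives a quick pass or fail from the machine on which the script runs.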

How do I obtain the endpoint address of Hologres?

  1. Log on to the Hologres console. On the Instances page, click the target instance.

  2. On the Instance Details page, you can obtain the endpoint address in the Network Information section.

    You can obtain the endpoint address based on your network type.

    Network type: Specified VPC (recommended)

    Scenario: A private network that is connected to the specified VPC.

    • Same VPC (recommended): If the Hologres instance and the Realtime Compute for Apache Flink workspace are in the same VPC, they can be directly connected.

    • Different VPCs: If the Hologres instance and the Realtime Compute for Apache Flink workspace are in different VPCs, you need to configure the network to access resources across VPCs. For more information, see Services across VPCs.

    Network type: Internet

    Scenario: A public network that can be used in scenarios without network access restrictions. Compared with internal networks, public networks have uncertain latency.

    You need to use a NAT gateway to connect a VPC to the Internet. For more information, see Configure Internet access.

  3. (Optional) Test the network connectivity in the Flink console. For more information, see How do I perform network detection?.

    • If the network detection is successful, the endpoint address that you obtained is correct.

    • If the network detection fails, check whether the Hologres instance and the Realtime Compute for Apache Flink workspace are in different VPCs or whether Internet access is required. You need to complete the corresponding configuration before you can access the Hologres instance. For more information, see Services across VPCs and Configure Internet access.

How does Realtime Compute for Apache Flink access the Internet?

Compared with internal networks, public networks have uncertain latency. If your business scenario has high requirements for network latency and stability, we recommend that you use internal networks.

Alibaba Cloud provides NAT gateways to connect VPCs to the Internet. This allows Realtime Compute for Apache Flink to access data sources over the Internet. For more information, see Internet data sources.

How do I view the public bandwidth?

If the metric values of the deployment are normal and no backpressure exists in the deployment during data reading or writing over the Internet, you can view the public bandwidth to check whether a bottleneck issue occurs. Perform the following steps:

  1. On the Workspace Details page of the Realtime Compute for Apache Flink console, obtain the VPC ID.

  2. Log on to the VPC console. In the left-side navigation pane, click VPC. On the VPC page, find the desired VPC and click its ID.

  3. In the Resource Management section, click the number next to Internet NAT Gateway.

    Note

    If the number next to Internet NAT Gateway is 0, you need to create an Internet NAT gateway. For more information, see Create and manage an Internet NAT gateway.

  4. On the Internet NAT Gateway page, click the ID of the Internet NAT gateway.

  5. Click the Associated Elastic IP Addresses tab and click the instance name.

  6. On the Internet Access > Elastic IP Addresses page, click Monitoring to view the public bandwidth information.

How does Realtime Compute for Apache Flink access a service across VPCs?

If other services are in the planning stage or can be replaced, we recommend that you directly purchase services in the same VPC as Realtime Compute for Apache Flink. Alternatively, you can release the current Flink workspace and purchase a new workspace in the same VPC as other services.

You can choose an appropriate method to access services across VPCs based on your business requirements. For more information about how to select a solution, see Services across VPCs.

How do I configure a whitelist?

In most cases, the upstream and downstream storage services that are supported by Realtime Compute for Apache Flink do not allow access from external systems. Therefore, you must perform the following steps to add the CIDR block of the vSwitch of Realtime Compute for Apache Flink to the whitelist of the storage system that Realtime Compute for Apache Flink needs to access:

  1. Log on to the Realtime Compute for Apache Flink console.

  2. In the Actions column of the target Workspace, choose More > Workspace Details.

  3. In the Workspace Details dialog box, view the CIDR Block information of the vSwitch of the workspace.

  4. Add the CIDR Block of the vSwitch of the Flink workspace to the whitelist of your target upstream and downstream storage services.

    For example, if you want to configure a whitelist for an ApsaraDB RDS for MySQL database, see Configure an IP address whitelist.

    Note
    • If you add a vSwitch later, you must also add the CIDR block of the new vSwitch to the whitelist of the storage service that Realtime Compute for Apache Flink needs to access.

    • Even if your vSwitch is in a different zone from the upstream and downstream storage services, the network is connected after you add the CIDR block of the vSwitch to the whitelist.
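When a workspace uses several vSwitches, it is easy to miss one CIDR block in a whitelist. The following Python sketch, which uses only the standard ipaddress module, reports the vSwitch CIDR blocks that no whitelist entry covers. The function name uncovered_cidrs and the sample addresses are hypothetical, and the sketch assumes that the whitelist consists of plain IP addresses or CIDR blocks that you copied from the storage service.

```python
import ipaddress
from typing import Iterable, List


def uncovered_cidrs(vswitch_cidrs: Iterable[str], whitelist: Iterable[str]) -> List[str]:
    """Return the vSwitch CIDR blocks that are not contained in any whitelist entry."""
    # A single IP address such as "10.0.0.1" is treated as a /32 (or /128) network.
    allowed = [ipaddress.ip_network(entry, strict=False) for entry in whitelist]
    missing = []
    for cidr in vswitch_cidrs:
        net = ipaddress.ip_network(cidr, strict=False)
        # subnet_of() raises TypeError across IP versions, so compare versions first.
        covered = any(
            net.version == entry.version and net.subnet_of(entry)
            for entry in allowed
        )
        if not covered:
            missing.append(cidr)
    return missing
```

An empty result means that every vSwitch CIDR block is covered by the whitelist entries that you passed in. Any block that is returned still needs to be added to the whitelist.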

How do I resolve the domain name of the service on which a Flink job depends?

If your Realtime Compute for Apache Flink deployment depends on the domain name of the service, a domain name resolution failure is reported when you migrate the service data to Realtime Compute for Apache Flink. To solve this issue, you can use one of the following methods based on the scenario:

  • You have a self-managed DNS. Flink can connect to the self-managed DNS service over a VPC, and the self-managed DNS can normally resolve domain names.

    In this case, you can perform DNS resolution by using the deployment template of Realtime Compute for Apache Flink. For example, the IP address of your self-managed DNS is 192.168.0.1. Perform the following steps:

    1. Log on to the Realtime Compute for Apache Flink console.

    2. Click Console in the Actions column of the target workspace.

    3. On the Configuration Management page, click the Job Default Configuration tab. In the Other Configuration field, add the following code:

      env.java.opts: >-
        -Dsun.net.spi.nameservice.provider.1=default
        -Dsun.net.spi.nameservice.provider.2=dns,sun
        -Dsun.net.spi.nameservice.nameservers=192.168.0.1

      Note

      If your self-managed DNS has multiple IP addresses, we recommend that you separate the IP addresses with commas (,).

    4. Click Save Changes.

    5. Create and run a job in the development console of Realtime Compute for Apache Flink.

      • If the UnknownHostException error persists, domain names cannot be resolved. In this case, contact Alibaba Cloud for technical support.

      • If the deployment frequently fails with the error message "JobManager heartbeat timeout" after you configure self-managed DNS, see Error: JobManager heartbeat timeout.

  • You do not have a self-managed DNS, or Realtime Compute for Apache Flink cannot connect to the self-managed DNS over a VPC.

    In this case, you must use Alibaba Cloud DNS PrivateZone to resolve domain names. For example, the VPC in which Realtime Compute for Apache Flink resides is named vpc-flinkxxxxxxx, and the domain names that your Flink job needs to access and their IP addresses are aaa.test.com (127.0.0.1), bbb.test.com (127.0.0.2), and ccc.test.com (127.0.0.3). To resolve the domain names, perform the following steps:

    1. Activate Alibaba Cloud DNS PrivateZone. For more information, see Activate PrivateZone.

    2. Add a zone and use the common suffix of the service that your Flink job needs to access as the zone name. For more information, see Add a zone.

    3. Associate the zone with the VPC in which Realtime Compute for Apache Flink resides. For more information, see Associate or disassociate a VPC.

    4. Add DNS records to the zone. For more information, see Add a DNS record to a PrivateZone zone.

    5. Create and run a job in the development console of Realtime Compute for Apache Flink, or stop and restart a historical job.

      If the UnknownHostException error persists, domain names cannot be resolved. In this case, contact Alibaba Cloud for technical support.
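For the self-managed DNS scenario above, the value of env.java.opts must list all DNS server IP addresses separated by commas. The following Python sketch generates that configuration text from a list of addresses and rejects entries that are not valid IP addresses. The function name build_dns_java_opts is hypothetical. The three -D options are copied from the example in this topic, and nothing else about the configuration is assumed.

```python
import ipaddress
from typing import Sequence


def build_dns_java_opts(nameservers: Sequence[str]) -> str:
    """Build the env.java.opts text for one or more self-managed DNS servers."""
    if not nameservers:
        raise ValueError("at least one DNS server IP address is required")
    cleaned = [ip.strip() for ip in nameservers]
    for ip in cleaned:
        # Raises ValueError if the entry is not a valid IPv4 or IPv6 address.
        ipaddress.ip_address(ip)
    joined = ",".join(cleaned)
    return "\n".join([
        "env.java.opts: >-",
        "  -Dsun.net.spi.nameservice.provider.1=default",
        "  -Dsun.net.spi.nameservice.provider.2=dns,sun",
        f"  -Dsun.net.spi.nameservice.nameservers={joined}",
    ])
```

You can paste the returned text into the Other Configuration field. With a single address, the output matches the example in this topic.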

Note

If a network connection is established between Realtime Compute for Apache Flink and Kafka but a timeout error occurs, see Why does the error message "timeout expired while fetching topic metadata" appear even if a network connection is established between Flink and Kafka?.

Error: JobManager heartbeat timeout

  • Error message

    After you configure self-managed DNS, the deployment frequently fails, and the error message "JobManager heartbeat timeout" appears.

  • Causes

    The network latency to self-managed DNS is high.

  • Solution

    You need to disable domain name resolution for TaskManager (TM) in the job by configuring jobmanager.retrieve-taskmanager-hostname: false. This configuration does not affect the job's ability to connect to external services through domain names. For more information about how to configure this parameter, see How do I configure custom job running parameters?.

Why does the error message "timeout expired while fetching topic metadata" appear even if a network connection is established between Flink and Kafka?

A network connection between Flink and Kafka does not mean that data can be read. Only the endpoint described in the cluster metadata returned by the Kafka broker during the bootstrap process can be used to connect Flink to Kafka and read data from Kafka. For more information, see Flink-cannot-connect-to-Kafka. To check the network connection, perform the following steps:

  1. Use zkCli.sh or zookeeper-shell.sh to log on to the ZooKeeper service that is used by the Kafka cluster.

  2. Run the ls /brokers/ids command to list all Kafka broker IDs.

  3. Run the get /brokers/ids/{your_broker_id} command to view the broker metadata information.

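    The endpoints are listed in the endpoints field of the broker metadata. The listener_security_protocol_map field shows the security protocol that each listener uses.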

  4. Check whether Realtime Compute for Apache Flink can connect to the endpoint.

    If the endpoint contains a domain name, configure the DNS service for Realtime Compute for Apache Flink. For more information about how to resolve domain names, see How do I resolve the domain name of the service on which a Flink job depends?.
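To make step 3 and step 4 concrete, the following Python sketch extracts the listener name, host, and port from the broker metadata that the get command returns. It assumes that the metadata is the JSON document that Kafka registers in ZooKeeper and that it contains an endpoints array. The function name extract_endpoints is hypothetical, and any host names that you pass in for testing are your own sample values.

```python
import json
from typing import List, Tuple


def extract_endpoints(broker_metadata: str) -> List[Tuple[str, str, int]]:
    """Return (listener, host, port) for each entry in the 'endpoints' array."""
    data = json.loads(broker_metadata)
    result = []
    for endpoint in data.get("endpoints", []):
        # Each entry looks like "LISTENER://host:port". Split manually because
        # listener names may contain underscores (for example, SASL_SSL).
        listener, _, host_port = endpoint.partition("://")
        host, _, port = host_port.rpartition(":")
        result.append((listener, host, int(port)))
    return result
```

Each returned host and port is what Realtime Compute for Apache Flink must be able to reach. If a host is a domain name, it must also be resolvable from the Flink workspace.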