DataHub: FAQ

Last Updated: Aug 08, 2023

Configure the following whitelists in the ApsaraDB RDS console to use DataHub to access an ApsaraDB RDS instance

Region | Whitelist of IP addresses from classic networks | Whitelist of IP addresses from VPCs
China (Hangzhou) | 11.197.14.0/24, 11.197.15.0/24 | 100.104.191.0/24
China (Shanghai) | 11.217.75.0/24, 11.222.38.0/24, 11.222.93.0/24, 11.223.69.0/24 | 100.104.136.0/24
China (Beijing) | 11.204.155.0/24, 11.204.158.0/24, 11.204.161.0/24, 11.204.162.0/24, 11.218.245.0/24, 11.220.203.0/24, 11.220.204.0/24, 11.220.216.0/24, 11.220.217.0/24, 11.220.237.0/24, 11.220.238.0/24, 11.220.240.0/24, 11.220.242.0/24, 11.223.107.0/24 | 100.104.33.0/24
China (Shenzhen) | 11.216.113.0/24, 11.217.52.0/24, 11.220.54.0/24, 11.220.56.0/24 | 100.104.55.0/24
Singapore | 11.216.101.0/24, 11.219.129.0/24 | 100.104.163.0/24
China North 2 Ali Gov | 11.199.246.0/24, 11.199.247.0/24 | 100.104.254.0/26
China (Zhangjiakou) | 11.218.202.0/24, 11.218.203.0/24 | 100.104.195.0/26
India (Mumbai) | 11.207.230.0/24, 11.207.231.0/24, 11.207.248.0/24 | 100.104.254.0/26
Malaysia (Kuala Lumpur) | 11.204.39.0/24, 11.204.40.0/24, 11.204.41.0/24, 11.48.249.0/24, 11.48.250.0/24 | 100.104.13.0/24
China (Hong Kong) | 11.195.192.0/24 | 100.104.166.0/24
US (Silicon Valley) | 11.199.218.0/24, 11.199.219.0/24, 11.199.229.0/24 | 100.104.235.0/24
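
As a rough illustration, the entries for one region can be checked and joined into a single string before you paste them into the whitelist. The sketch below uses only Python's standard ipaddress module. The WHITELISTS dictionary and the build_whitelist helper are hypothetical names, only two regions are included, and the comma-separated output format is an assumption about what the whitelist field accepts.

```python
import ipaddress

# Hypothetical lookup table; values copied from the China (Hangzhou)
# and China (Hong Kong) entries above.
WHITELISTS = {
    "China (Hangzhou)": {
        "classic": ["11.197.14.0/24", "11.197.15.0/24"],
        "vpc": ["100.104.191.0/24"],
    },
    "China (Hong Kong)": {
        "classic": ["11.195.192.0/24"],
        "vpc": ["100.104.166.0/24"],
    },
}


def build_whitelist(region: str, network: str) -> str:
    """Return the CIDR blocks for a region as one comma-separated string."""
    blocks = WHITELISTS[region][network]
    for block in blocks:
        # Raises ValueError if a block is not a well-formed network address.
        ipaddress.ip_network(block)
    return ",".join(blocks)


print(build_whitelist("China (Hangzhou)", "classic"))
```

Validating each block first catches copy-and-paste damage, such as a missing prefix length, before the value reaches the console.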

Error related to permissions

Error message:

com.aliyun.datahub.exception.NoPermissionException: No permission, authentication failed in ram

The error message indicates that the Resource Access Management (RAM) user does not have the required permissions. For more information about how to grant permissions to a RAM user, see Access control.

Error related to an ApsaraDB RDS instance in a VPC

Error message:

InvalidInstanceId.NotFound:The instance not in current vpc

Solution:

  1. Call the DescribeDBInstanceAttribute operation to query the details of the ApsaraDB RDS instance.

  2. Click Debug. On the right side of the page that appears, select a region from Region and enter the ID of the instance that you want to manage in DBInstanceId.

  3. Click Call. Find VpcCloudInstanceId in the returned results.

  4. Go to the panel for synchronizing data from DataHub to ApsaraDB RDS. Then, enter the obtained VpcCloudInstanceId value in the Instance ID field.
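
If you save the returned results as JSON, step 3 can be scripted. The following is a minimal sketch that searches the parsed response for the VpcCloudInstanceId key wherever it is nested, so it does not depend on the exact response layout. The find_key helper, the sample response body, and the rm-example values are hypothetical.

```python
import json


def find_key(node, key):
    """Depth-first search for the first value stored under `key` in parsed JSON."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


# Hypothetical, heavily trimmed response body; real responses contain more fields.
sample_response = """
{
  "Items": {
    "DBInstanceAttribute": [
      {"DBInstanceId": "rm-example", "VpcCloudInstanceId": "rm-example-vpc"}
    ]
  }
}
"""

vpc_instance_id = find_key(json.loads(sample_response), "VpcCloudInstanceId")
print(vpc_instance_id)
```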

Errors related to JAR package conflicts

If you use DataHub SDK for Java, the following JAR package conflicts may occur:

  • InjectionManagerFactory not found

    • By default, DataHub SDK for Java depends on Jersey client V2.22.1. If you use a Jersey client later than V2.22.1, you must add the following dependency to your project. Replace xxx with the version that matches your Jersey client.

<dependency>
    <groupId>org.glassfish.jersey.inject</groupId>
    <artifactId>jersey-hk2</artifactId>
    <version>xxx</version>
</dependency>
  • java.lang.NoSuchFieldError: EXCLUDE_EMPTY

    • The version of the jersey-common library is earlier than V2.22.1. We recommend that you use jersey-common library V2.22.1 or later.

  • Error reading entity from input stream

    • Cause 1: The version of the HTTP client is earlier than V4.5.2. Upgrade the version of the HTTP client to V4.5.2 or later.

    • Cause 2: The SDK of the current version does not support specific data types. Upgrade the SDK.

  • Versions of jersey-apache-connector later than V2.22.1 contain bugs related to TCP connections.

    • Use V2.22.1.

  • java.lang.NoSuchMethodError: okhttp3.HttpUrl.get(Ljava/lang/String;)Lokhttp3/HttpUrl;

    • Run the mvn dependency:tree command to check whether another dependency pulls in a conflicting version of the OkHttp client.

  • javax/ws/rs/core/Response$Status$Family

    • Check the dependencies of the javax.ws.rs package. For example, check whether the javax.ws.rs package depends on jsr311-api.
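
To make the mvn dependency:tree check less manual, you can scan its output for artifacts that appear with more than one version. The sketch below is one possible approach, not part of any SDK. The find_version_conflicts helper and the sample tree (including every coordinate and version in it) are hypothetical. It assumes coordinates of the form groupId:artifactId:type:version:scope and does not handle classifiers. Lines marked "omitted for conflict" typically appear only in verbose output (for example, mvn dependency:tree -Dverbose), and whether verbose output is available depends on your plugin version.

```python
import re
from collections import defaultdict

# Matches groupId:artifactId:type:version:scope. Classifiers are not handled.
COORD = re.compile(
    r"([\w.\-]+):([\w.\-]+):([\w\-]+):([\w.\-]+):(compile|runtime|test|provided|system)"
)


def find_version_conflicts(tree_output: str) -> dict:
    """Map each groupId:artifactId to its versions if more than one version appears."""
    versions = defaultdict(set)
    for line in tree_output.splitlines():
        for group, artifact, _type, version, _scope in COORD.findall(line):
            versions[f"{group}:{artifact}"].add(version)
    return {name: sorted(found) for name, found in versions.items() if len(found) > 1}


# Hypothetical verbose output; the artifacts and versions are made up for illustration.
SAMPLE_TREE = r"""
[INFO] com.example:demo:jar:1.0
[INFO] +- com.example:datahub-wrapper:jar:1.0:compile
[INFO] |  \- com.squareup.okhttp3:okhttp:jar:3.14.9:compile
[INFO] \- com.example:other-lib:jar:1.0:compile
[INFO]    \- (com.squareup.okhttp3:okhttp:jar:3.12.0:compile - omitted for conflict with 3.14.9)
"""

print(find_version_conflicts(SAMPLE_TREE))
```

Any artifact reported here is a candidate for an explicit version pin or an exclusion in your pom.xml.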

Other errors

  • Parse body failed, Offset: 0

    • In most cases, this error occurs when data is being written. Earlier versions of Apsara Stack DataHub do not support binary data transmission of protocol buffers. However, binary data transmission is enabled in some SDKs by default. In these cases, you must manually disable binary data transmission.

    • In SDK for Java, pass false as the last DatahubConfig argument.

datahubClient = DatahubClientBuilder.newBuilder()
    .setDatahubConfig(
        new DatahubConfig(endpoint,
            new AliyunAccount(accessId, accessKey),
            // enableBinary: false disables binary (protocol buffers) transmission.
            // Binary transmission is supported only by DataHub server V2.12 and later.
            false))
    .build();
    • In SDK for Python, set enable_pb to False.

# JSON mode: for DataHub server versions <= 2.11
dh = DataHub(access_id, access_key, endpoint, enable_pb=False)
    • In SDK for Go, set EnableBinary to false.

config := &datahub.Config{
    EnableBinary: false,
}
dh := datahub.NewClientWithConfig(accessId, accessKey, endpoint, config)
    • In Logstash, set the value of enable_pb to false.

  • Request body size exceeded

    • The error message indicates that the size of the request body exceeds the upper limit. For more information, see Limits.

  • Record field size not match

    • The error message indicates that the specified schema does not match the schema of the topic. We recommend that you call the getTopic method to obtain the schema.

  • The limit of query rate is exceeded.

    • To ensure efficient use of resources, DataHub limits the number of queries per second (QPS) that it processes. This error occurs if the frequency of read or write requests exceeds the limit. We recommend that you read and write data in batches. For example, write one batch of data every minute and read 1,000 records at a time.
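
The batching advice can be sketched without any SDK: group records into fixed-size chunks and issue one request per chunk instead of one request per record. The batched generator and the write_batch function below are hypothetical. write_batch only stands in for whichever SDK write call you use, and the batch size of 1,000 is taken from the example above rather than from a documented limit.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(records: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size records."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[T] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch


def write_batch(batch: list) -> None:
    # Hypothetical stand-in for a single SDK write request.
    print(f"writing {len(batch)} records in one request")


for chunk in batched(range(2500), 1000):
    write_batch(chunk)
```

With 2,500 records and a batch size of 1,000, this issues three requests instead of 2,500.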

  • Num of topics exceed limit

    • In the latest version of DataHub, the maximum number of topics that can be contained in a project is set to 20.

  • SeekOutOfRange

    • The offset parameter is invalid or the offset has expired.

  • Offset session has changed

    • A subscription cannot be consumed by multiple consumers at the same time. Check whether a subscription is consumed by multiple consumers in the program.

  • Can I synchronize data of the DECIMAL type to MaxCompute?

    • MaxCompute supports data of the DECIMAL type without a specified precision. By default, a DECIMAL value can contain up to 18 digits on each side of the decimal point.
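
To check values against that default before synchronization, you can count the digits on each side of the decimal point. The sketch below uses Python's standard decimal module. The fits_default_decimal helper is a hypothetical name, and the sketch assumes that trailing zeros count toward the limit, which is a conservative reading of the rule above.

```python
from decimal import Decimal

MAX_DIGITS = 18  # per-side limit quoted above


def fits_default_decimal(value: Decimal) -> bool:
    """Return True if value has at most MAX_DIGITS digits on each side of the point."""
    if not value.is_finite():
        return False  # NaN and Infinity have no digit counts
    _sign, digits, exponent = value.as_tuple()
    integer_digits = max(len(digits) + exponent, 0)
    fractional_digits = max(-exponent, 0)
    return integer_digits <= MAX_DIGITS and fractional_digits <= MAX_DIGITS


print(fits_default_decimal(Decimal("123456789012345678.123456789012345678")))
print(fits_default_decimal(Decimal("1234567890123456789")))
```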

  • What does the addAttribute method do?

    • You can use the addAttribute() method to add additional attributes to a record based on your business requirements. Additional attributes are optional.

  • How do I delete data from a topic?

    • DataHub does not allow you to delete data from a topic. However, you can reset offsets to invalidate data.

  • The data within a shard is stored in a file located at the specified Object Storage Service (OSS) path. The name of the file is randomly generated. If the file size exceeds 5 GB, another file is created to store the data from the shard. Can I modify the file size?

    • No, you cannot modify the file size.

  • What can I do if my AnalyticDB for MySQL instance cannot access a public endpoint?

    • You must apply for an internal endpoint in AnalyticDB for MySQL. Log on to your AnalyticDB for MySQL instance and execute the alter database set intranet_vip = true statement to apply for the internal endpoint. Then, execute the select internal_domain, internal_port from information_schemata statement to query the endpoint.