Tunnel Service is built on Tablestore APIs to provide tunnels that are used to export and consume data in full, incremental, and differential modes. After you create tunnels, you can consume historical and incremental data exported from the specified table.

A tunnel client is the automatic data consumption framework of Tunnel Service. Based on regular heartbeat detection, the tunnel client performs the following operations:
  • Detect active channels.
  • Update the statuses of channels and channel connections.
  • Initialize, run, and terminate data processing tasks.

You can use TunnelWorkerConfig to configure the tunnel client as follows. A basic usage sketch is shown after this list.

  • Configure the interval to detect heartbeats and the timeout period to receive heartbeats.
  • Configure the interval at which checkpoints for consumed data are recorded.
  • Customize the client tag.
  • Customize the callback to process data.
  • Configure the thread pool to read and process data.
  • Configure the memory control. This configuration applies to Tablestore SDK V5.4.0 and later.
  • Configure the maximum backoff time. This configuration applies to Tablestore SDK V5.4.0 and later.
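
The following sketch shows a minimal tunnel client setup with the Tablestore Java SDK: a custom IChannelProcessor is registered through TunnelWorkerConfig, and a TunnelWorker drives heartbeat detection and data consumption. The endpoint, credentials, instance name, and tunnel ID are placeholders, and the exact package paths may differ slightly between SDK versions.

import com.alicloud.openservices.tablestore.TunnelClient;
import com.alicloud.openservices.tablestore.tunnel.worker.IChannelProcessor;
import com.alicloud.openservices.tablestore.tunnel.worker.ProcessRecordsInput;
import com.alicloud.openservices.tablestore.tunnel.worker.TunnelWorker;
import com.alicloud.openservices.tablestore.tunnel.worker.TunnelWorkerConfig;

public class TunnelQuickStart {
    // Custom callback: process receives the exported records, and shutdown is
    // called when the data processing task terminates.
    private static class SimpleProcessor implements IChannelProcessor {
        @Override
        public void process(ProcessRecordsInput input) {
            System.out.println("Received " + input.getRecords().size() + " records.");
        }

        @Override
        public void shutdown() {
            System.out.println("Channel processor shut down.");
        }
    }

    public static void main(String[] args) {
        // Placeholder connection information for your Tablestore instance.
        TunnelClient tunnelClient = new TunnelClient(
                "<your endpoint>", "<your access key id>",
                "<your access key secret>", "<your instance name>");

        // Register the custom callback; all other parameters keep their defaults.
        TunnelWorkerConfig config = new TunnelWorkerConfig(new SimpleProcessor());

        // The worker regularly detects heartbeats, keeps channel statuses up to
        // date, and runs the data processing tasks.
        TunnelWorker worker = new TunnelWorker("<your tunnel id>", config);
        try {
            worker.connectAndWorking();
        } catch (Exception e) {
            e.printStackTrace();
            config.shutdown();
            worker.shutdown();
            tunnelClient.shutdown();
        }
    }
}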

Configurations

  • Heartbeat
    • heartbeatTimeoutInSec: the timeout period for heartbeats. Default value: 300s. If a heartbeat timeout occurs, the tunnel server considers the current tunnel client unavailable, and the tunnel client must reconnect to the tunnel server.
    • heartbeatIntervalInSec: the interval at which heartbeats are detected. Default value: 30s. Minimum value: 5s. Heartbeat detection is used to detect active channels, update the statuses of channels, and automatically initialize data processing tasks. For an example that sets these and the other parameters in this section, see the sketch after this list.
  • Interval between checkpoints

    checkpointIntervalInMillis: the interval at which checkpoints for consumed data are recorded on the tunnel server. Unit: ms. Default value: 5000.

    Note
    • Data to read is stored on different servers, and various errors may occur while processes run. For example, a server may restart due to environmental factors. Therefore, the tunnel server regularly records checkpoints after data is processed, and a restarted task resumes processing from the last recorded checkpoint. In exceptional conditions, Tunnel Service may synchronize the same data more than once, in sequence. If specific data is reprocessed, pay attention to the business processing logic.
    • To reduce the amount of data that is reprocessed when errors occur, record checkpoints more frequently. However, checkpoints that are recorded too frequently compromise system throughput. We recommend that you record checkpoints at an interval that suits your workload.
  • The client tag

    clientTag: the custom client tag that is used to generate a tunnel client ID. You can customize this parameter to uniquely identify TunnelWorkers.

  • The custom callback to process data

    channelProcessor: the callback that you register to process data, including the process and shutdown methods.

  • The configuration of the thread pool to read and process data
    • readRecordsExecutor: the thread pool to read data. Use the default configuration if there are no special requirements.
    • processRecordsExecutor: the thread pool to process data. Use the default configuration if there are no special requirements.
    Note
    • When you customize the thread pools, we recommend that you set the number of threads in each pool to the number of channels in the tunnel. This way, compute resources such as CPU time can be quickly allocated to each channel.
    • By default, the thread pools are configured as follows to ensure throughput:
      • 32 core threads are allocated in advance to guarantee real-time throughput when the amount of data or the number of channels is small.
      • A short work queue is used so that, when the amount of data or the number of channels is large, the pool creates additional threads sooner and quickly allocates more compute resources.
      • The thread keep-alive time is set to 60s so that thread resources can be reclaimed when the amount of data to process decreases.
  • Memory control

    maxChannelParallel: the maximum number of channels that can concurrently read and process data, which is used for memory control. Default value: -1, which indicates that the concurrency level is not limited.

    Note This configuration applies to Tablestore SDK V5.4.0 and later.
  • Maximum backoff time

    maxRetryIntervalInMillis: the reference value used to calculate the maximum backoff time for the tunnel. Unit: ms. Minimum value: 200. Default value: 2000. The actual maximum backoff time is a random value close to the reference value, in the range of 0.75 × maxRetryIntervalInMillis to 1.25 × maxRetryIntervalInMillis.

    Note
    • This configuration applies to Tablestore SDK V5.4.0 and later.
    • When a small amount of data is exported at a time (less than 900 KB or 500 records), the tunnel client uses exponential backoff for the tunnel until the backoff time reaches the maximum backoff time.
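
The following sketch illustrates how the parameters described above can be set on TunnelWorkerConfig. The setter names are assumed to follow the parameter names listed in this topic (for example, setHeartbeatTimeoutInSec and setMaxRetryIntervalInMillis) and should be verified against your SDK version. The thread pool sizes, client tag, and concurrency values are illustrative only, and maxChannelParallel and maxRetryIntervalInMillis require Tablestore SDK V5.4.0 or later.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

import com.alicloud.openservices.tablestore.tunnel.worker.IChannelProcessor;
import com.alicloud.openservices.tablestore.tunnel.worker.TunnelWorkerConfig;

public class TunnelWorkerConfigExample {
    public static TunnelWorkerConfig buildConfig(IChannelProcessor processor) {
        // Register the custom data processing callback (process and shutdown).
        TunnelWorkerConfig config = new TunnelWorkerConfig(processor);

        // Heartbeat: 300s timeout and 30s detection interval (the defaults).
        config.setHeartbeatTimeoutInSec(300);
        config.setHeartbeatIntervalInSec(30);

        // Record a checkpoint on the tunnel server every 5,000 ms (the default).
        config.setCheckpointIntervalInMillis(5000);

        // Custom client tag that is used to generate the tunnel client ID.
        config.setClientTag("my-tunnel-worker-1");

        // Custom thread pools for reading and processing data. Keep the defaults
        // unless you have special requirements; if you customize the pools, match
        // the number of threads to the number of channels in the tunnel.
        ThreadPoolExecutor readExecutor = new ThreadPoolExecutor(
                8, 16, 60, TimeUnit.SECONDS, new ArrayBlockingQueue<Runnable>(16));
        ThreadPoolExecutor processExecutor = new ThreadPoolExecutor(
                8, 16, 60, TimeUnit.SECONDS, new ArrayBlockingQueue<Runnable>(16));
        config.setReadRecordsExecutor(readExecutor);
        config.setProcessRecordsExecutor(processExecutor);

        // Memory control: limit the number of channels that concurrently read and
        // process data (-1 means no limit). Requires Tablestore SDK V5.4.0 or later.
        config.setMaxChannelParallel(4);

        // Reference value for the maximum backoff time, in milliseconds.
        // Requires Tablestore SDK V5.4.0 or later.
        config.setMaxRetryIntervalInMillis(2000);

        return config;
    }
}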