
Dataphin: Offline integration pipeline channel configuration

Last Updated: Jan 23, 2025

The channel configuration of an offline integration pipeline defines how its offline integration tasks behave at runtime: fault tolerance, concurrency, JVM resources, database settings, traffic monitoring, and time zone. This topic describes how to configure it.

Procedure

  1. Navigate to the Dataphin home page and select Development -> Data Development.

  2. Open the Runtime Configuration drawer:

    Choose a project (in Dev-Prod mode, also select the environment) -> Offline Integration -> the offline pipeline to configure -> Properties -> Channel Configuration.


  3. In the Runtime Configuration drawer, set the following parameters:

    Parameter

    Description

    Fault Tolerance Configuration

    Number of Errors

    Defines the maximum number of errors allowed when a pipeline task runs. By default, Dataphin offline integration allows no errors, that is, the maximum is 0. You can raise this threshold by setting Number of Errors.

    When an integration task with an error threshold runs, one of two outcomes applies:

    • If the cumulative number of errors across nodes exceeds the configured fault tolerance, the current pipeline task will fail.

    • If the cumulative number of errors is within the fault tolerance range, the errors will be skipped, and the task will continue to run.

    Causes of errors: Errors typically occur when a record cannot be transferred from the source to the target data source because of an exception. For example, writing a VARCHAR value from the source into an INT column in the target can fail because of format incompatibility. The record cannot be written and becomes dirty data.

    Concurrency Configuration

    Expected Maximum Concurrency

    Determines the maximum number of threads for parallel reading from the source or parallel writing to the target in the current pipeline script.

    JVM Configuration

    JVM Parameters

    Specifies the JVM resources allocated to the current pipeline script, encompassing CPU and memory parameters.

    • CPU: at most 4.0 cores. Negative values are not supported.

    • Memory: at most 16384 MB (16 GB). Decimals and negative values are not supported.

    Database Configuration

    SQL Execution Timeout

    The execution timeout for preparation statement and completion statement SQL. If execution takes longer than this limit, the task fails. The default is 30 minutes, and valid values range from 1 to 2880 minutes (48 hours).

    Important

    The query timeout set for the output component takes precedence over the pipeline's query timeout.

    Database Connection Retry Count

    Defines the number of retries for database connection attempts in case of timeouts. The default is 1 retry, with a range from 0 to 10 retries. If the maximum retry count is reached without success, the task will be marked as failed.

    Important
    • This setting applies only to data source instances in the pipeline that support retry count configuration.

    • By default, the task inherits the retry count from the data source configuration, but it can be overridden at the task level, following the hierarchy: task-level configuration > data source configuration.

    • If not configured in the data source, the default retry count is 1.

    • For integration tasks with multiple relational data sources, you can adjust the retry count for each data source instance separately within the pipeline. The pipeline's configuration will take precedence after submission.

    Traffic Monitoring

    No Traffic Time Threshold

    Specifies how long data reading and result transmission can go without any data traffic. If no data flows for longer than this threshold, for example because of a long-running query or high database load, the task fails. The default is 30 minutes, and valid values range from 5 to 2880 minutes (48 hours).

    Time Zone Configuration

    Time Zone

    Select the time zone that matches the database configuration. The default time zone for data integration in China is GMT+8, a fixed offset that does not account for daylight saving time. If the database uses a region-based time zone whose rules include daylight saving time periods, such as Asia/Shanghai, select that same region-based time zone instead of the fixed offset. Otherwise, timestamps that fall within daylight saving periods can differ from the database data by one hour.

    Supported time zones include: GMT+1, GMT+2, GMT+3, GMT+5:30, GMT+8, GMT+9, GMT+10, GMT-5, GMT-6, GMT-8, Africa/Cairo, America/Chicago, America/Denver, America/Los_Angeles, America/New_York, America/Sao_Paulo, Asia/Bangkok, Asia/Dubai, Asia/Kolkata, Asia/Shanghai, Asia/Tokyo, Atlantic/Azores, Australia/Sydney, Europe/Berlin, Europe/London, Europe/Moscow, Europe/Paris, Pacific/Auckland, Pacific/Honolulu.

  4. Click OK to save the channel configuration.
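The Number of Errors rule in step 3 can be modeled as follows. This is a minimal single-process sketch, not Dataphin's implementation: `run_with_error_limit` and `PipelineFailed` are hypothetical names, and a `ValueError` raised by a per-record conversion stands in for a dirty record (such as a VARCHAR value that cannot be written to an INT column). In an actual pipeline the count is cumulative across nodes.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

S = TypeVar("S")
T = TypeVar("T")


class PipelineFailed(Exception):
    """Hypothetical: raised when dirty records exceed the configured Number of Errors."""


def run_with_error_limit(
    records: Iterable[S],
    convert: Callable[[S], T],
    max_errors: int = 0,  # mirrors the documented default: no errors allowed
) -> Tuple[List[T], int]:
    """Convert records, skipping dirty ones while the error count stays within max_errors."""
    written: List[T] = []
    errors = 0
    for record in records:
        try:
            written.append(convert(record))
        except ValueError:
            errors += 1  # the dirty record is skipped, not written
            if errors > max_errors:
                # The cumulative count exceeded the threshold: fail the task.
                raise PipelineFailed(f"{errors} errors exceed the limit of {max_errors}")
    return written, errors


# VARCHAR -> INT: "x" cannot be converted, so it counts as one error.
print(run_with_error_limit(["1", "x", "3"], int, max_errors=1))  # ([1, 3], 1)
```

With the default `max_errors=0`, the same input raises `PipelineFailed` on the first dirty record, which matches the "no errors by default" behavior.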
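Expected Maximum Concurrency caps how many threads read from the source or write to the target in parallel. A bounded thread pool expresses the same idea. In this sketch, dividing the source into `splits` and the `read_split` callback are assumptions made for illustration; they do not describe Dataphin's internal design.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

S = TypeVar("S")
R = TypeVar("R")


def read_in_parallel(
    splits: Iterable[S],
    read_split: Callable[[S], R],
    max_concurrency: int,
) -> List[R]:
    """Read every split (hypothetical unit of work) with at most max_concurrency threads."""
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be at least 1")
    # The pool never runs more than max_workers tasks at the same time.
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # map() returns results in the same order as the input splits.
        return list(pool.map(read_split, splits))
```

Raising the limit lets more splits run at once, which can shorten the run but also increases the load placed on the source and target databases.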
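The JVM Parameters limits from step 3 can be checked before saving, as in this sketch. `validate_jvm_resources` is a hypothetical helper, not part of Dataphin. The topic only rules out negative values, so this sketch accepts 0 as a lower bound; whether Dataphin accepts 0 is an assumption.

```python
def validate_jvm_resources(cpu_cores: float, memory_mb: int) -> None:
    """Raise ValueError for JVM values outside the limits stated in this topic."""
    # CPU: at most 4.0 cores; negative values are not supported.
    if not 0 <= cpu_cores <= 4.0:
        raise ValueError(f"CPU must be between 0 and 4.0 cores, got {cpu_cores}")
    # Memory: whole megabytes only (no decimals); bool is excluded because it subclasses int.
    if isinstance(memory_mb, bool) or not isinstance(memory_mb, int):
        raise ValueError(f"memory must be a whole number of MB, got {memory_mb!r}")
    # Memory: at most 16384 MB (16 GB); negative values are not supported.
    if not 0 <= memory_mb <= 16384:
        raise ValueError(f"memory must be between 0 and 16384 MB, got {memory_mb}")


validate_jvm_resources(4.0, 16384)  # accepted: both values at their maximum
```

For example, `validate_jvm_resources(2.0, 1024.5)` is rejected because the memory value is a decimal.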
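The SQL Execution Timeout rules in step 3 (default 30 minutes, range 1 to 2880, output component setting takes precedence) can be summarized as a resolution function. This is a sketch under assumptions: `effective_sql_timeout` is a hypothetical name, and it assumes the output component timeout is also expressed in minutes. Its valid range is not stated in this topic, so only the pipeline value is range-checked.

```python
from typing import Optional

SQL_TIMEOUT_DEFAULT_MINUTES = 30
SQL_TIMEOUT_MIN_MINUTES, SQL_TIMEOUT_MAX_MINUTES = 1, 2880  # 2880 minutes = 48 hours


def effective_sql_timeout(
    pipeline_minutes: Optional[int] = None,
    output_component_minutes: Optional[int] = None,
) -> int:
    """Return the timeout (minutes) applied to preparation/completion statement SQL."""
    if pipeline_minutes is not None and not (
        SQL_TIMEOUT_MIN_MINUTES <= pipeline_minutes <= SQL_TIMEOUT_MAX_MINUTES
    ):
        raise ValueError(f"pipeline SQL timeout must be 1-2880 minutes, got {pipeline_minutes}")
    # A timeout set on the output component takes precedence over the pipeline's.
    if output_component_minutes is not None:
        return output_component_minutes
    if pipeline_minutes is not None:
        return pipeline_minutes
    return SQL_TIMEOUT_DEFAULT_MINUTES
```

For example, a pipeline timeout of 60 minutes combined with an output component timeout of 10 minutes resolves to 10 minutes.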
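The Database Connection Retry Count hierarchy in step 3 (task-level configuration > data source configuration > default of 1) and the retry-then-fail behavior can be sketched as follows. `effective_retry_count` and `connect_with_retries` are hypothetical helpers. This sketch interprets a retry count of N as one initial attempt plus up to N retries, and uses Python's `TimeoutError` to stand in for a connection timeout.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

RETRY_DEFAULT = 1
RETRY_MIN, RETRY_MAX = 0, 10


def effective_retry_count(
    task_level: Optional[int] = None,
    data_source: Optional[int] = None,
) -> int:
    """Resolve the retry count: task level > data source > default of 1."""
    if task_level is not None:
        if not RETRY_MIN <= task_level <= RETRY_MAX:
            raise ValueError(f"retry count must be 0-10, got {task_level}")
        return task_level
    if data_source is not None:
        return data_source
    return RETRY_DEFAULT


def connect_with_retries(connect: Callable[[], T], retries: int) -> T:
    """Call connect(); on TimeoutError, retry up to `retries` more times, then fail."""
    attempt = 0
    while True:
        try:
            return connect()
        except TimeoutError:
            if attempt >= retries:
                raise  # retries exhausted: the task is marked as failed
            attempt += 1
```

With multiple relational data sources, the resolution would run once per data source instance, since each instance can carry its own retry count in the pipeline.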
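The No Traffic Time Threshold in step 3 behaves like an idle watchdog. The sketch below is an illustration only: `TrafficWatchdog` and `NoTrafficTimeout` are hypothetical names, the injectable `clock` exists so the behavior can be tested, and treating "beyond this threshold" as strictly greater than the threshold is an assumption.

```python
import time
from typing import Callable


class NoTrafficTimeout(Exception):
    """Hypothetical: raised when no data has flowed for longer than the threshold."""


class TrafficWatchdog:
    """Fail when reading/transmission is idle longer than the no-traffic threshold."""

    def __init__(self, threshold_minutes: int = 30,
                 clock: Callable[[], float] = time.monotonic) -> None:
        # Documented default is 30 minutes; valid range is 5 to 2880 minutes.
        if not 5 <= threshold_minutes <= 2880:
            raise ValueError(f"threshold must be 5-2880 minutes, got {threshold_minutes}")
        self.threshold_seconds = threshold_minutes * 60
        self._clock = clock
        self._last_traffic = clock()

    def record_traffic(self) -> None:
        """Call whenever a batch of data is read or transmitted."""
        self._last_traffic = self._clock()

    def check(self) -> None:
        """Call periodically; raises once the idle time exceeds the threshold."""
        idle = self._clock() - self._last_traffic
        if idle > self.threshold_seconds:
            raise NoTrafficTimeout(f"no data traffic for {idle:.0f} s")
```

A long-running query that returns no rows keeps `record_traffic` from being called, so `check` eventually raises, which corresponds to the task failing.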

What to do next

After you complete the channel configuration, click Submit to send the task to the publishing center or the Operation Center, depending on the development mode.

  • In Dev-Prod mode, you must publish the task after submitting it. For more information, see Manage published tasks or .

  • In Basic mode, a successfully submitted task is scheduled in the production environment. To view published tasks, go to the Operation Center.