All Products
Search
Document Center

DataWorks:TiDB

Last Updated:Feb 28, 2026

The TiDB data source enables you to read data offline. This topic describes how DataWorks supports data synchronization for TiDB.

Supported TiDB versions

  • Offline read: TiDB database versions 7.x and 8.x are supported.

  • Offline write: Not supported.

  • Real-time read and write: Not supported.

Note

TiDB is highly compatible with the MySQL protocol and supports the common features and syntax of MySQL 5.7 and MySQL 8.0. DataWorks uses the MySQL protocol to read data from TiDB for offline data synchronization. For more information about the compatibility between TiDB and MySQL, see Compatibility with MySQL.

Supported field types

For more information about all supported TiDB field types, see Data Type Overview. The following table lists the support for major field types.

Field type

Offline read (TiDB Reader)

TINYINT

Supported

SMALLINT

Supported

MEDIUMINT

Support

INTEGER

Support

BIGINT

Supported

FLOAT

Supported

DOUBLE

Support

DECIMAL

Supported

CHAR

Supported

VARCHAR

Support

JSON

Support

TEXT

Support

TINYTEXT

Support

MEDIUMTEXT

Support

LONGTEXT

Support

VARBINARY

Support

BINARY

Support

BLOB

Support

TINYBLOB

Supported

MEDIUMBLOB

Support

LONGBLOB

Support

ENUM

Support

SET

Support

BOOLEAN

Supported

BIT

Support

DATE

Support

DATETIME

Support

TIMESTAMP

Support

TIME

Support

YEAR

Support

Prerequisites: Prepare the TiDB environment

Before you synchronize data in DataWorks, you must prepare the TiDB environment. This ensures that you can correctly configure and run TiDB data synchronization tasks. The following sections describe the preparation steps.

Confirm the TiDB version

Data Integration requires TiDB version 7.x or 8.x. Ensure that your TiDB database meets this version requirement. To check the current version, you can run the following statement in your TiDB database.

SELECT TIDB_VERSION()\G

*************************** 1. row ***************************
TIDB_VERSION(): Release Version: v8.1.1
Edition: Community
Git Commit Hash: 821e491a20fbab36604b36b647b5bae26a2c1418
Git Branch: HEAD
UTC Build Time: 2024-08-27 19:16:25
GoVersion: go1.21.10
Race Enabled: false
Check Table Before Drop: false
Store: tikv
1 row in set (0.00 sec)

Configure account permissions

Create a dedicated TiDB account for DataWorks to access the data source.

  1. (Optional) Create an account. For more information, see User Account Management.

  2. Configure permissions.

    For offline data reads from TiDB, this account must have the read (SELECT) permission on the tables that you want to synchronize.

    You can run the following commands to grant permissions to the account or grant the SUPER permission directly. In the following statements, replace 'sync_account' with the account that you created.

    -- CREATE USER 'sync_account'@'%' IDENTIFIED BY 'password'; // Create a sync account and set a password. This allows the account to log on to the database from any host. The percent sign (%) indicates any host.
    GRANT SELECT ON *.* TO 'sync_account'@'%'; // Grant the SELECT permission on the database to the sync account.

    *.* grants the account the specified permissions on all tables in all databases. You can also grant the account the permissions on a specific table in a target database. For example, to grant the account the permissions on the user table in the test database, you can run the GRANT SELECT ON test.user TO 'sync_account'@'%'; statement.

    Note

    The REPLICATION SLAVE statement grants a global permission. You cannot use this statement to grant permissions on a specific table in a target database to the sync account.

Add a data source

Before you develop a synchronization task in DataWorks, you must add the required data source to DataWorks by following the instructions in Data source management. You can view parameter descriptions in the DataWorks console to understand the meanings of the parameters when you add a data source.

Appendix: TiDB script demo and parameter descriptions

Configure a batch synchronization task by using the code editor

If you want to configure a batch synchronization task by using the code editor, you must configure the related parameters in the script based on the unified script format requirements. For more information, see Configure a task in the code editor. The following information describes the parameters that you must configure for data sources when you configure a batch synchronization task by using the code editor.

Reader script demo

The following code provides a sample script for reading data:

Note

The comments in the following JSON sample are for demonstration purposes only. You must remove the comments when you configure the script.

{
  "type": "job",
  "version": "2.0",
  "steps":
  [
    {
      "stepType": "tidb",
      "parameter":
      {
        "column":
        [
          "id",
          "name"
        ],
        "where": "",
        "splitPk": "id",
        "connection":
        [
          {
            "selectedDatabase": "test_database",
            "datasource": "test_datasource",
            "table":
            [
              "test_table"
            ]
          }
        ]
      },
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "odps",
      "parameter":
      {
      },
      "name": "Writer",
      "category": "writer"
    }
  ],
  "setting":
  {
    "errorLimit":
    {
      "record": "0"
    },
    "speed":
    {
      "throttle": false,
      "concurrent": 3
    }
  },
  "order":
  {
    "hops":
    [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  }
}

Reader script parameters

Script parameter name

Description

Required

Default value

datasource

The name of the data source. The code editor supports adding data sources. The value of this parameter must be the same as the name of the added data source.

Yes

None

table

The name of the table from which to synchronize data.

The following examples show advanced uses of the table parameter to configure a range:

  • You can read data from sharded tables by configuring a range. For example, 'table_[0-99]' specifies that data is read from 'table_0', 'table_1', 'table_2', up to 'table_99'.

  • If the numeric suffixes of your table names have the same length, such as 'table_000', 'table_001', 'table_002', up to 'table_999', you can set the parameter to '"table": ["table_00[0-9]", "table_0[10-99]", "table_[100-999]"]'.

Yes

None

column

The columns to synchronize from the specified table. Use a JSON array to describe the fields. To synchronize all columns, set this parameter to ["*"].

  • Column pruning: You can select specific columns to export.

  • Column reordering: You can export columns in an order different from the table schema.

  • Constant configuration: You must follow the MySQL SQL syntax. Example: ["id","table","1","'test_constant'","'null'","to_char(a+1)","2.3","true"] .

    • id is a regular column name.

    • table is a column name that is also a reserved word.

    • 1 is an integer constant.

    • 'test_constant' is a string constant. Note that it must be enclosed in single quotation marks.

    • About null:

      • "" indicates an empty string.

      • null indicates a null value.

      • 'null' indicates the string "null".

    • The to_char(a+1) function calculates string length.

    • 2.3 is a floating-point number.

    • true is a Boolean value.

  • The column parameter must explicitly specify the set of columns to synchronize. It cannot be empty.

Yes

None

splitPk

When TiDB Reader extracts data, if you specify splitPk, the system uses the specified field to partition the data. This enables concurrent tasks for data synchronization and improves efficiency.

  • Use the primary key of the table for splitPk. Primary keys are usually distributed evenly. This helps prevent data hot spots in the resulting shards.

  • Currently, splitPk supports only integer-based data partitioning. It does not support other data types such as strings, floating-point numbers, or dates. If you specify an unsupported data type, the splitPk feature is ignored and data is synchronized through a single channel.

  • If you do not specify splitPk, or if its value is empty, data is synchronized from the table through a single channel.

No

None

where

The filter condition. In many business scenarios, you might synchronize data for the current day. To do this, set the where condition to gmt_create>$bizdate.

  • The where condition allows for efficient incremental data synchronization. If you do not specify a where clause, or if the key or value is not provided, all data is synchronized.

  • The limit syntax is not supported in the where condition.

No

None