The TiDB data source enables you to read data offline. This topic describes how DataWorks supports data synchronization for TiDB.
Supported TiDB versions
Offline read: TiDB database versions 7.x and 8.x are supported.
Offline write: Not supported.
Real-time read and write: Not supported.
TiDB is highly compatible with the MySQL protocol and supports the common features and syntax of MySQL 5.7 and MySQL 8.0. DataWorks uses the MySQL protocol to read data from TiDB for offline data synchronization. For more information about the compatibility between TiDB and MySQL, see Compatibility with MySQL.
Supported field types
For more information about all supported TiDB field types, see Data Type Overview. The following table lists the support for major field types.
Field type | Offline read (TiDB Reader) |
TINYINT | Supported |
SMALLINT | Supported |
MEDIUMINT | Support |
INTEGER | Support |
BIGINT | Supported |
FLOAT | Supported |
DOUBLE | Support |
DECIMAL | Supported |
CHAR | Supported |
VARCHAR | Support |
JSON | Support |
TEXT | Support |
TINYTEXT | Support |
MEDIUMTEXT | Support |
LONGTEXT | Support |
VARBINARY | Support |
BINARY | Support |
BLOB | Support |
TINYBLOB | Supported |
MEDIUMBLOB | Support |
LONGBLOB | Support |
ENUM | Support |
SET | Support |
BOOLEAN | Supported |
BIT | Support |
DATE | Support |
DATETIME | Support |
TIMESTAMP | Support |
TIME | Support |
YEAR | Support |
Prerequisites: Prepare the TiDB environment
Before you synchronize data in DataWorks, you must prepare the TiDB environment. This ensures that you can correctly configure and run TiDB data synchronization tasks. The following sections describe the preparation steps.
Confirm the TiDB version
Data Integration requires TiDB version 7.x or 8.x. Ensure that your TiDB database meets this version requirement. To check the current version, you can run the following statement in your TiDB database.
SELECT TIDB_VERSION()\G
*************************** 1. row ***************************
TIDB_VERSION(): Release Version: v8.1.1
Edition: Community
Git Commit Hash: 821e491a20fbab36604b36b647b5bae26a2c1418
Git Branch: HEAD
UTC Build Time: 2024-08-27 19:16:25
GoVersion: go1.21.10
Race Enabled: false
Check Table Before Drop: false
Store: tikv
1 row in set (0.00 sec)Configure account permissions
Create a dedicated TiDB account for DataWorks to access the data source.
(Optional) Create an account. For more information, see User Account Management.
Configure permissions.
For offline data reads from TiDB, this account must have the read (
SELECT) permission on the tables that you want to synchronize.You can run the following commands to grant permissions to the account or grant the
SUPERpermission directly. In the following statements, replace'sync_account'with the account that you created.-- CREATE USER 'sync_account'@'%' IDENTIFIED BY 'password'; // Create a sync account and set a password. This allows the account to log on to the database from any host. The percent sign (%) indicates any host. GRANT SELECT ON *.* TO 'sync_account'@'%'; // Grant the SELECT permission on the database to the sync account.*.*grants the account the specified permissions on all tables in all databases. You can also grant the account the permissions on a specific table in a target database. For example, to grant the account the permissions on the user table in the test database, you can run theGRANT SELECT ON test.user TO 'sync_account'@'%';statement.NoteThe
REPLICATION SLAVEstatement grants a global permission. You cannot use this statement to grant permissions on a specific table in a target database to the sync account.
Add a data source
Before you develop a synchronization task in DataWorks, you must add the required data source to DataWorks by following the instructions in Data source management. You can view parameter descriptions in the DataWorks console to understand the meanings of the parameters when you add a data source.
Appendix: TiDB script demo and parameter descriptions
Configure a batch synchronization task by using the code editor
If you want to configure a batch synchronization task by using the code editor, you must configure the related parameters in the script based on the unified script format requirements. For more information, see Configure a task in the code editor. The following information describes the parameters that you must configure for data sources when you configure a batch synchronization task by using the code editor.
Reader script demo
The following code provides a sample script for reading data:
The comments in the following JSON sample are for demonstration purposes only. You must remove the comments when you configure the script.
{
"type": "job",
"version": "2.0",
"steps":
[
{
"stepType": "tidb",
"parameter":
{
"column":
[
"id",
"name"
],
"where": "",
"splitPk": "id",
"connection":
[
{
"selectedDatabase": "test_database",
"datasource": "test_datasource",
"table":
[
"test_table"
]
}
]
},
"name": "Reader",
"category": "reader"
},
{
"stepType": "odps",
"parameter":
{
},
"name": "Writer",
"category": "writer"
}
],
"setting":
{
"errorLimit":
{
"record": "0"
},
"speed":
{
"throttle": false,
"concurrent": 3
}
},
"order":
{
"hops":
[
{
"from": "Reader",
"to": "Writer"
}
]
}
}
Reader script parameters
Script parameter name | Description | Required | Default value |
datasource | The name of the data source. The code editor supports adding data sources. The value of this parameter must be the same as the name of the added data source. | Yes | None |
table | The name of the table from which to synchronize data. The following examples show advanced uses of the table parameter to configure a range:
| Yes | None |
column | The columns to synchronize from the specified table. Use a JSON array to describe the fields. To synchronize all columns, set this parameter to
| Yes | None |
splitPk | When TiDB Reader extracts data, if you specify splitPk, the system uses the specified field to partition the data. This enables concurrent tasks for data synchronization and improves efficiency.
| No | None |
where | The filter condition. In many business scenarios, you might synchronize data for the current day. To do this, set the where condition to
| No | None |