This topic describes the data types and parameters that PolarDB Reader supports and how to configure it by using the codeless user interface (UI) and code editor.
PolarDB Reader connects to a remote PolarDB database by using Java Database Connectivity (JDBC), generates a SELECT statement based on your configurations, and then sends the statement to the database. The PolarDB database runs the statement and returns the result. Then, PolarDB Reader assembles the returned data to abstract datasets of custom data types that are supported by Data Integration, and passes the datasets to a writer.
In short, PolarDB Reader connects to a remote PolarDB database and runs a SELECT statement to select and read data from the database.
PolarDB Reader can read tables and views. For table fields, you can specify all or some of the columns in sequence, adjust the column order, specify constant fields, and configure PolarDB functions, such as now().
Data types
Category | PolarDB data type |
---|---|
Integer | INT, TINYINT, SMALLINT, MEDIUMINT, and BIGINT |
Floating point | FLOAT, DOUBLE, and DECIMAL |
String | VARCHAR, CHAR, TINYTEXT, TEXT, MEDIUMTEXT, and LONGTEXT |
Date and time | DATE, DATETIME, TIMESTAMP, TIME, and YEAR |
Boolean | BIT and BOOLEAN |
Binary | TINYBLOB, MEDIUMBLOB, BLOB, LONGBLOB, and VARBINARY |
- PolarDB Reader supports only the data types that are described in the preceding table.
- PolarDB Reader classifies tinyint(1) as the INTEGER type.
Parameters
Parameter | Description | Required | Default value |
---|---|---|---|
datasource | The connection name. It must be the same as the name of the created connection. You can create connections in the code editor. | Yes | N/A |
table | The name of the source table. | Yes | N/A |
column | The columns to be synchronized from the source table. The columns are described in
a JSON array. The default value is [ * ], which indicates all columns.
|
Yes | N/A |
splitPk | The field that is used for data sharding when PolarDB Reader reads data. If you specify
the splitPk parameter, the table is sharded based on the shard key that is specified by this
parameter. Data Integration then runs concurrent threads to synchronize data. This
way, data can be synchronized more efficiently.
|
No | N/A |
where | The WHERE clause. For example, set this parameter to gmt_create>$bizdate.
|
No | N/A |
querySql (available only in the code editor) | The SELECT statement that is used for refined data filtering. If you specify this
parameter, Data Integration filters data based on this parameter. For example, if you want to join multiple tables for data synchronization, set this
parameter to select a,b from table_a join table_b on table_a.id = table_b.id . The priority of the querySql parameter is higher than those of the column, table,
where, and splitPk parameters. When you specify the querySql parameter, PolarDB Reader
ignores the column, table, where, and splitPk parameters that you have configured. The data store that is specified
by the datasource parameter parses information, including the username and password,
from this parameter.
|
No | N/A |
singleOrMulti (applicable only to database and table sharding) | Specifies whether to shard the database or table. After you switch from the codeless
UI to the code editor, the following configuration is automatically generated: "singleOrMulti":"multi" . However, if you use the code editor since the beginning, the configuration is not
automatically generated and you must manually specify this parameter. If you do not
specify this parameter or leave it empty, PolarDB Reader can only read data from the
first shard.
Note The singleOrMulti parameter is used only by the frontend, but not by the backend.
|
Yes | multi |
Configure PolarDB Reader by using the codeless UI
- Configure the connections.
Configure the connections to the source and destination data stores for the sync node.
Parameter Description Connection The datasource parameter in the preceding parameter description. Select a connection type and select the name of a connection that you have configured in DataWorks. Table The table parameter in the preceding parameter description. Filter The condition for filtering the data to be synchronized. PolarDB Reader cannot filter data based on the limit keyword. The SQL syntax is determined by the selected connection. Shard Key The shard key. You can specify a column in the source table as the shard key. We recommend that you use the primary key or an indexed column as the shard key. Only integer fields are supported. If data sharding is performed based on the configured shard key, data can be read concurrently. This way, data can be synchronized more efficiently. Note The Shard Key parameter is displayed only after you select the connection to the source data store for the sync node. - Configure field mapping, that is, the column parameter in the preceding parameter description.
Fields in the source table on the left have a one-to-one mapping with fields in the destination table on the right. You can click Add to add a field. To delete a field, move the pointer over the field and click theDelete icon.
GUI element Description Map Fields with the Same Name Click Map Fields with the Same Name to establish a mapping between fields with the same name. The data types of the fields must match. Map Fields in the Same Line Click Map Fields in the Same Line to establish a mapping between fields in the same row. The data types of the fields must match. Delete All Mappings Click Delete All Mappings to remove mappings that have been established. Auto Layout Click Auto Layout to sort the fields based on specified rules. - Configure channel control policies.
Parameter Description Expected Maximum Concurrency The maximum number of concurrent threads that the sync node uses to read data from or write data to data stores. You can configure the concurrency for the node on the codeless UI. Bandwidth Throttling Specifies whether to enable bandwidth throttling. You can enable bandwidth throttling and set a maximum transmission rate to avoid heavy read workload of the source. We recommend that you enable bandwidth throttling and set the maximum transmission rate to a proper value. Dirty Data Records Allowed The maximum number of dirty data records allowed.
Configure PolarDB Reader by using the code editor
{
"type": "job",
"steps": [
{
"parameter": {
"datasource": "test_005",// The connection name.
"column": [// The columns to be synchronized from the source table.
"id",
"name",
"age",
"sex",
"salary",
"interest"
],
"where": "id=1001",// The WHERE clause.
"splitPk": "id",// The shard key.
"table": "polardb_person"// The source table name.
},
"name": "Reader",
"category": "reader"
},
{
"parameter": {}
],
"version": "2.0",// The version number.
"order": {
"hops": [
{
"from": "Reader",
"to": "Writer"
}
]
},
"setting": {
"errorLimit": {// The maximum number of dirty data records allowed.
"record": ""
},
"speed": {
"concurrent": 6,// The number of concurrent threads.
"throttle": false,// Specifies whether to enable bandwidth throttling.
}
}
}