This topic describes the data types and parameters supported by PolarDB Reader and how to configure it by using the codeless user interface (UI) and code editor.

PolarDB Reader connects to a remote PolarDB database through Java Database Connectivity (JDBC), generates a SELECT statement based on your configurations, and then sends the statement to the database. The PolarDB database runs the statement and returns the result. Then, PolarDB Reader assembles the returned data to abstract datasets in custom data types supported by Data Integration, and sends the datasets to a writer.

In short, PolarDB Reader connects to a remote PolarDB database and runs a SELECT statement to select and read data from the database.

PolarDB Reader can read tables and views. For table fields, you can specify all or some of the columns in sequence, adjust the column order, specify constant fields, and configure PolarDB functions, such as now().

Data types

The following table lists the data types supported by PolarDB Reader.

Category PolarDB data type
Integer INT, TINYINT, SMALLINT, MEDIUMINT, and BIGINT
Floating point FLOAT, DOUBLE, and DECIMAL
String VARCHAR, CHAR, TINYTEXT, TEXT, MEDIUMTEXT, and LONGTEXT
Date and time DATE, DATETIME, TIMESTAMP, TIME, and YEAR
Boolean BIT and BOOLEAN
Binary TINYBLOB, MEDIUMBLOB, BLOB, LONGBLOB, and VARBINARY
Note
  • Except for the preceding data types, other types are not supported.
  • PolarDB Reader classifies tinyint(1) as the INTEGER type.

Parameters

Parameter Description Required Default value
datasource The connection name. It must be identical to the name of the added connection. You can add connections in the code editor. Yes None
table The name of the table to be synchronized. Yes None
column The columns to be synchronized from the source table. The columns are described in a JSON array. The default value is [ * ], which indicates all columns.
  • Column pruning is supported. You can select and export specific columns.
  • Change of the column order is supported. You can export the columns in an order different from that specified in the schema of the table.
  • Constants are supported. The column names must be arranged in compliance with the SQL syntax supported by PolarDB, for example, ["id", "table","1","'mingya.wmy'","'null'", "to_char(a+1)","2.3","true"].
    • id: a column name.
    • table: the name of a column that contains reserved keywords.
    • 1: an integer constant.
    • 'mingya.wmy': a string constant, which is enclosed in single quotation marks (' ').
    • 'null': the string null.
    • to_char(a+1): a function expression.
    • 2.3: a floating-point constant.
    • true: a Boolean value.
  • The column parameter must explicitly specify a set of columns to be synchronized. The parameter cannot be left empty.
Yes None
splitPk The field used for data sharding when PolarDB Reader extracts data. If you specify the splitPk parameter, the table is sharded based on the shard key specified by this parameter. Data Integration then runs concurrent threads to synchronize data. This improves efficiency.
  • We recommend that you set the splitPk parameter to the primary key of the table. Based on the primary key, data can be well distributed to different shards, but not intensively distributed to certain shards.
  • Currently, the splitPk parameter supports data sharding only for integers but not for other data types such as string, floating point, and date. If you specify this parameter to a column of an unsupported type, PolarDB Reader ignores the splitPk parameter and synchronizes data through a single thread.
  • If you do not specify the splitPk parameter or leave it empty, Data Integration synchronizes data through a single thread.
No None
where The WHERE clause. For example, set this parameter to gmt_create>$bizdate.
  • You can use the WHERE clause to synchronize incremental data. If you do not specify the where parameter or leave it empty, all data is synchronized.
  • Do not set the where parameter to limit 10, which does not conform to the constraints of PolarDB on the SQL WHERE clause.
No None
querySql (only available in the code editor) The SELECT statement used for refined data filtering. If you specify this parameter, Data Integration directly filters data based on this parameter. For example, if you want to join multiple tables for data synchronization, set this parameter to select a,b from table_a join table_b on table_a.id = table_b.id. The priority of the querySql parameter is higher than those of the column, table, and where parameters. When you specify the querySql parameter, PolarDB Reader ignores the column, table, and where parameters that you have configured. The data store specified by the datasource parameter parses information, including the username and password, from this parameter. No None
singleOrMulti (only applicable to database and table sharding) Specifies whether to shard the database or table. After you switch from the codeless UI to the code editor, the following configuration is automatically generated: "singleOrMulti":"multi". However, if you use the code editor since the beginning, the configuration is not automatically generated and you must manually specify this parameter. If you do not specify this parameter or leave it empty, PolarDB Reader can only read data from the first shard.
Note The singleOrMulti parameter is only used by the front end, but not by the back end.
Yes multi

Configure PolarDB Reader by using the codeless UI

  1. Configure the connections.
    Configure the source and destination connections for the sync node.Connections
    Parameter Description
    Connection The datasource parameter in the preceding parameter description. Select a connection type, and enter the name of a connection that has been configured in DataWorks.
    Table The table parameter in the preceding parameter description.
    Filter The filter condition for the data to be synchronized. Currently, filtering based on the limit keyword is not supported. The SQL syntax is determined by the selected connection.
    Shard Key The shard key. You can use a column in the source table as the shard key. We recommend that you use the primary key or an indexed column. Only integer fields are supported. If data sharding is performed based on the configured shard key, data can be read concurrently to improve data synchronization efficiency.
    Note The Shard Key parameter is displayed only when you configure the source connection for a sync node.
  2. Configure field mapping, that is, the column parameter in the preceding parameter description.
    Fields in the source table on the left have a one-to-one mapping with fields in the destination table on the right. You can click Add to add a field, or move the pointer over a field and click the Delete icon to delete the field.Mappings
    Parameter Description
    Map Fields with the Same Name Click Map Fields with the Same Name to establish a mapping between fields with the same name. Note that the data types of the fields must match.
    Map Fields in the Same Line Click Map Fields in the Same Line to establish a mapping for fields in the same row. Note that the data types of the fields must match.
    Delete All Mappings Click Delete All Mappings to remove mappings that have been established.
    Auto Layout Click Auto Layout. The fields are automatically sorted based on specified rules.
  3. Configure channel control policies.Channel
    Parameter Description
    Expected Maximum Concurrency The maximum number of concurrent threads to read and write data to data storage within the sync node. You can configure the concurrency for a node on the codeless UI.
    Bandwidth Throttling Specifies whether to enable bandwidth throttling. You can enable bandwidth throttling and set a maximum transmission rate to avoid heavy read workload of the source. We recommend that you enable bandwidth throttling and set the maximum transmission rate to a proper value.
    Dirty Data Records Allowed The maximum number of dirty data records allowed.

Configure POLARBD Reader by using the code editor

In the following code, a node is configured to read data from a database or table that is not sharded. For more information about the parameters, see the preceding parameter description.
{
    "type": "job",
    "steps": [
        {
            "parameter": {
                "datasource": "test_005",// The connection name.
                "column": [// The columns to be synchronized.
                    "id",
                    "name",
                    "age",
                    "sex",
                    "salary",
                    "interest"
                ],
                "where": "id=1001",// The WHERE clause.
                "splitPk": "id",// The shard key.
                "table": "PolarDB_person"// The source table name.
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "parameter": {}
    ],
    "version": "2.0",// The version number.
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    },
    "setting": {
        "errorLimit": {// The maximum number of dirty data records allowed.
            "record": ""
        },
        "speed": {
            "concurrent": 6,// The number of concurrent threads.
            "throttle": false // Specifies whether to enable bandwidth throttling.
        }
    }
}