This topic describes the data types and parameters that HybridDB for MySQL Reader supports and how to configure it by using the codeless user interface (UI) and code editor.

HybridDB for MySQL Reader can read tables and views. For table fields, you can specify all or some of the columns in sequence, adjust the column order, specify constant fields, and configure HybridDB for MySQL functions, such as now().

HybridDB for MySQL Reader allows you to read data from HybridDB for MySQL. HybridDB for MySQL Reader connects to a remote HybridDB for MySQL database and runs a SELECT statement to select and read data from the database.

Specifically, HybridDB for MySQL Reader connects to a remote HybridDB for MySQL database by using Java Database Connectivity (JDBC), generates a SELECT statement based on your configurations, and then sends the statement to the database. The HybridDB for MySQL database runs the statement and returns the result. Then, HybridDB for MySQL Reader assembles the returned data to abstract datasets of custom data types that are supported by Data Integration, and passes the datasets to a writer.

Data types

The following table describes the data types that HybridDB for MySQL Reader supports.

Category HybridDB for MySQL data type
Integer INT, TINYINT, SMALLINT, MEDIUMINT, and BIGINT
Floating point FLOAT, DOUBLE, and DECIMAL
String VARCHAR, CHAR, TINYTEXT, TEXT, MEDIUMTEXT, and LONGTEXT
Date and time DATE, DATETIME, TIMESTAMP, TIME, and YEAR
Boolean BIT and BOOLEAN
Binary TINYBLOB, MEDIUMBLOB, BLOB, LONGBLOB, and VARBINARY
Note
  • HybridDB for MySQL Reader supports only the data types that are described in the preceding table.
  • HybridDB for MySQL Reader considers tinyint(1) as the INTEGER type.

Parameters

Parameter Description Required Default value
datasource The connection name. It must be the same as the name of the created connection. You can create connections in the code editor. Yes N/A
table The name of the source table from which HybridDB for MySQL Reader reads data. You can select only one source table for each sync node. Yes N/A
column The columns to be synchronized from the source table. The columns are described in a JSON array. The default value is [ * ], which indicates all columns.
  • Column pruning is supported. You can select specific columns to export.
  • The column order can be changed. You can configure HybridDB for MySQL Reader to export the specified columns in an order different from that specified in the schema of the table.
  • Constants are supported. The column names must be arranged in compliance with the SQL syntax that HybridDB for MySQL supports, for example, ["id", "table","1","'mingya.wmy'","'null'", "to_char(a+1)","2.3","true"].
    • id: a column name.
    • table: the name of a column that contains reserved keywords.
    • 1: an integer constant.
    • 'mingya.wmy': a string constant, which is enclosed in single quotation marks (' ').
    • 'null': the string null.
    • to_char(a+1): a function expression.
    • 2.3: a floating-point constant.
    • true: a Boolean value.
  • The column parameter must explicitly specify a set of columns to be synchronized. The parameter cannot be left empty.
Yes N/A
splitPk The field that is used for data sharding when HybridDB for MySQL Reader reads data. If you specify the splitPk parameter, the table is sharded based on the shard key that is specified by this parameter. Data Integration then runs concurrent threads to synchronize data. This way, data can be synchronized more efficiently.
  • We recommend that you set the splitPk parameter to the primary key of the table. Based on the primary key, data can be well distributed to different shards, but not intensively distributed to specific shards.
  • The splitPk parameter supports data sharding only for integers but not for other data types such as string, floating point, and date. If you specify this parameter to a column of an unsupported type, HybridDB for MySQL Reader ignores the splitPk parameter and synchronizes data by using a single thread.
  • If you do not specify the splitPk parameter or leave it empty, Data Integration synchronizes data by using a single thread.
No N/A
where The WHERE clause. For example, set this parameter to gmt_create>$bizdate.
  • You can use the WHERE clause to synchronize incremental data. If you do not specify the where parameter or leave it empty, all data is read.
  • Do not set this parameter to limit 10, which does not conform to the constraints of HybridDB for MySQL on the SQL WHERE clause.
No N/A
querySql (available only in the code editor) The SELECT statement that is used for refined data filtering. If you specify this parameter, Data Integration filters data based on this parameter. For example, if you want to join multiple tables for data synchronization, set this parameter to ["id","table","1",'"mingya.wmy'","'null"',"to_char(a+1)","2.3","true"]. The priority of the querySql parameter is higher than those of the table, column, where, and splitPk parameters. If you specify the querySql parameter, HybridDB for MySQL Reader ignores the table, column, where, and splitPk parameters that you have configured. The data store that is specified by the datasource parameter parses information, including the username and password, from this parameter. No N/A
singleOrMulti (applicable only to database and table sharding) Specifies whether to shard the database or table. After you switch from the codeless UI to the code editor, the following configuration is automatically generated: "singleOrMulti":"multi". However, if you use the code editor since the beginning, the configuration is not automatically generated and you must manually specify this parameter. If you do not specify this parameter or leave it empty, HybridDB for MySQL Reader can only read data from the first shard. The singleOrMulti parameter is used only by the frontend, but not by the backend. Yes multi

Configure HybridDB for MySQL Reader by using the codeless UI

  1. Configure the connections.
    Configure the connections to the source and destination data stores for the sync node.Connections section
    Parameter Description
    Connection The datasource parameter in the preceding parameter description. Select a connection type and select the name of a connection that you have configured in DataWorks.
    Table The table parameter in the preceding parameter description.
    Filter The condition for filtering the data to be synchronized. HybridDB for MySQL Reader cannot filter data based on the limit keyword. The SQL syntax is determined by the selected connection.
    Shard Key The shard key. You can specify a column in the source table as the shard key. We recommend that you use the primary key or an indexed column as the shard key. Only integer fields are supported. If data sharding is performed based on the configured shard key, data can be read concurrently. This way, data can be synchronized more efficiently.
    Note The Shard Key parameter is displayed only after you select the connection to the source data store for the sync node.
  2. Configure field mapping, that is, the column parameter in the preceding parameter description.
    Fields in the source table on the left have a one-to-one mapping with fields in the destination table on the right. You can click Add to add a field. To delete a field, move the pointer over the field and click theDelete icon.Mappings section
    GUI element Description
    Map Fields with the Same Name Click Map Fields with the Same Name to establish a mapping between fields with the same name. The data types of the fields must match.
    Map Fields in the Same Line Click Map Fields in the Same Line to establish a mapping between fields in the same row. The data types of the fields must match.
    Delete All Mappings Click Delete All Mappings to remove mappings that have been established.
    Auto Layout Click Auto Layout to sort the fields based on specified rules.
    Change Fields Click the Change Fields icon. In the Change Fields dialog box, you can manually edit the fields in the source table. Each field occupies a row. The first and the last blank rows are included, whereas other blank rows are ignored.
    Add Click Add to add a field. Take note of the following rules when you add a field:
    • You can enter constants. Each constant must be enclosed in single quotation marks (' '), for example, 'abc' and '123'.
    • You can use scheduling parameters such as ${bizdate}.
    • You can enter functions that are supported by relational databases, for example, now() and count(1).
    • Fields that cannot be parsed are indicated by Unidentified.
  3. Configure channel control policies.Channel section
    Parameter Description
    Expected Maximum Concurrency The maximum number of concurrent threads that the sync node uses to read data from or write data to data stores. You can configure the concurrency for the node on the codeless UI.
    Bandwidth Throttling Specifies whether to enable bandwidth throttling. You can enable bandwidth throttling and set a maximum transmission rate to avoid heavy read workload of the source. We recommend that you enable bandwidth throttling and set the maximum transmission rate to a proper value.
    Dirty Data Records Allowed The maximum number of dirty data records allowed.
    Resource Group The resource group that is used to run the sync node. If a large number of nodes including this sync node are deployed on the default resource group, the sync node may need to wait for resources. We recommend that you purchase an exclusive resource group for Data Integration or add a custom resource group. For more information, see DataWorks exclusive resources and Add a custom resource group.

Configure HybridDB for MySQL Reader by using the code editor

The following example shows how to configure a sync node to read data from a database or table that is not sharded. For more information about the parameters, see the preceding parameter description.
{
    "type": "job",
    "steps": [
        {
            "parameter": {
                "datasource": "px_aliyun_hymysql",// The connection name.
                "column": [// The columns to be synchronized from the source table.
                    "id",
                    "name",
                    "sex",
                    "salary",
                    "age",
                    "pt"
                ],
                "where": "id=10001",// The WHERE clause.
                "splitPk": "id",// The shard key.
                "table": "person"// The name of the source table.
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "parameter": {}
    ],
    "version": "2.0",// The version number.
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    },
    "setting": {
        "errorLimit": {// The maximum number of dirty data records allowed.
            "record": ""
        },
        "speed": {
            "concurrent": 7,// The maximum number of concurrent threads.
            "throttle": true,// Specifies whether to enable bandwidth throttling.
            "mbps": 1,// The maximum transmission rate.
        }
    }
}