MySQL Reader plug-in can connect to remote MySQL database by the JDBC connector. SQL query statements are generated and sent to the remote MySQL database based on your configuration. Then, the SQL statements are run and the returned results are assembled into abstract datasets using custom data types of data synchronization. Datasets are passed to the downstream writer for processing.
In short, With the JDBC connector, MySQL Reader connects to the remote MySQL database and runs SQL statements to select data from MySQL database and read data from MySQL database from the underlying level.
MySQL Reader supports table and view reading. In table field, you can specify all columns in sequence, specify certain columns, adjust the column order, specify constant fields, and configure MySQL functions, such as now().
MySQL Reader supports the following MySQL data types:
Category | MySQL data type |
---|---|
Integer | int, tinyint, smallint, mediumint, int, bigint |
Floating point | float, double, decimal |
String | varchar, char, tinytext, text, mediumtext, longtext |
Date and time | date, datetime, timestamp, time, year |
Boolean | bit, bool |
Binary | tinyblob, mediumblob, blob, longblob, varbinary |
Note:
- Apart from the field types listed here, other types are not supported.
- MySQL Reader classifies tinyint(1) as the integer type.
Parameter description
Parameter | Description | Required | Default value |
---|---|---|---|
datasource | Data source name. It must be identical to the data source name added. Adding data source is supported in script mode. | Yes | None |
Table | The name of the table selected for synchronization. One data integration job can only synchronize one table. | Yes | None |
column | The column name set to be synchronized in the configured table. Field information is described with arrays in JSON. [ * ] indicates all columns by default. - Column pruning is supported, which means you can select some columns to export. - Change of column order is supported, which means you can export the columns in an order different from the schema order of the table. - Constant configuration is supported. You must follow MySQL SQL syntax format, for example [“id”, “ table “, “1”, “‘mingya.wmy’”, “‘null’”, “to_char(a + 1)”, “2.3” , “true”], where id refers to the ordinary column name, table is the column name including reserved fields, 1 is an integer constant, ‘Mingya.wmy’ is a string constant (note that a pair of single quotation marks must be added), null refers to the null pointer, CHAR_LENGTH(s) is the function to calculate the string length, 2.3 is a floating point number, and true is a Boolean value.- Column must contain the specified column set to be synchronized and it cannot be blank. |
Yes | None |
splitPk | If you specify the splitPk when using the MySQL Reader to extract data, it means that you want to use the fields represented by splitPk for data sharding. Then, the data synchronization starts concurrent tasks to synchronize data, which greatly improves the efficiency of data synchronization. - We recommend splitPk users to use table main key which is usually very even and therefore, data hotpoints seldom occur for data shards. - Currently, splitPk only supports data sharding for integer data types. Other types such as string, floating point, and date are not supported. If you specify an unsupported data type, the splitPk is ignored and the data is synchronized using a single channel. - If the splitPk is not specified (for example, splitPk is not provided or splitPk value is null), the table data is synchronized using a single channel. |
No | None |
where | Filtering condition. In actual service scenarios, incremental synchronization typically synchronizes the data generated on the current day. You can specify the where condition as gmt_create > $bizdate. - The where condition can be effectively used for incremental synchronization. If the where is not specified (for example, the key or value of the where is not provided), full synchronization is performed. - You cannot specify where condition as limit 10, which does not conform to requirements for MySQL SQL where clause. |
No | None |
querySql (in advanced mode, unavailable in wizard mode) |
In some service scenarios, the where condition is insufficient for filtering. In such cases, you can customize a SQL filter using this parameter. When querySql is configured, data synchronization system filters data using this configuration item directly instead of such configuration items as tables, columns, and splitPk. For example, for data synchronization after multi-table join, use select a,b from table_a join table_b on table_a.id = table_b.id. When querySql is configured, MySQL Reader directly ignores the configuration of table, column, where, splitPk conditions. The priority of querySql is higher than table, column, where, and splitPk. Datasource is used all the time for parsing such information as user name and password |
No | None |
singleOrMulti (only suitable for database/table sharding) |
It is used for database/table sharding. When the wizard mode is converted to the script mode, “singleOrMulti”: “multi” is automatically generated. However, this configuration cannot be directly generated by configuring script task template and it must be added manually. Otherwise, only the first data source is identified. singleOrMulti is only used by the frontend. It cannot be used to determine database/table sharding on the backend | Yes | multi |
Development in wizard mode
Parameter description:
- Data source: datasource in the preceding parameter description. Select MySQL.
Table: table in the preceding parameter description. Select the table to be synchronized.
Data filtering: where in the preceding parameter description. You can enter a WHERE filtering statement (with WHERE keyword not required) based on the corresponding SQL syntax. This filtering condition can be used as an incremental synchronization criterion.
Splitting key: splitPk in the preceding parameter description. Supports only integer fields. During data reading, the data is split based on the configured fields to achieve concurrent reading, improving data synchronization efficiency. The configuration of splitting key is related to the source selection in data synchronization. The splitting key configuration item is displayed only when you configure the data source.
Note:
- You can enter constants. The value must be embraced with a pair of single quotation marks, such as abc, 123.
- Scheduling parameters can be used together, such as ${bdp.system.bizdate}.
- You can enter functions supported by relational database, such as now() and count(1).
- If the value you entered cannot be parsed, the type is displayed as -.
Field mapping: column in the preceding parameter description. The Source Table Field on the left maps with the Target Table Field on the right. Click Add Line, and then a field is added. Hover the cursor over a line, click Delete, and then the line is deleted.
Development in script mode
The script configuration example is shown as follows (single database/table). For more information about how to enter parameters, see Parameter Description:
{
"type": "job",
"version": "1.0",
"configuration": {
"reader": {
"plugin": "mysql",
"parameter": {
"datasource": "datasourceName",
"table": "`table`",
"splitPk": "id",
"column": [
"id", "name", "age"
],
"where": "1 < id and id < 100"
}
},
"writer": {
}
}
}
Script configuration examples are provided as follows (database/table sharding). For more information about how to enter parameters, see Parameter Description:
{
"type": "job",
"version": "1.0",
"configuration": {
"reader": {
"plugin": "mysql",
"parameter": {
"connection": [
{
"table": [
"tbl1",
"tbl2",
"tbl3"
],
"datasource": "datasourceName1"
},
{
"table": [
"tbl4",
"tbl5",
"tbl6"
],
"datasource": "datasourceName2"
}
],
"singleOrMulti": "multi",
"splitPk": "db_id",
"column": [
"id", "name", "age"
],
"where": "1 < id and id < 100"
}
},
"writer": {
}
}
}