edit-icon download-icon

Configure MySQL Reader

Last Updated: Oct 31, 2017

MySQL Reader plug-in can connect to remote MySQL database by the JDBC connector. SQL query statements are generated and sent to the remote MySQL database based on your configuration. Then, the SQL statements are run and the returned results are assembled into abstract datasets using custom data types of data synchronization. Datasets are passed to the downstream writer for processing.

In short, With the JDBC connector, MySQL Reader connects to the remote MySQL database and runs SQL statements to select data from MySQL database and read data from MySQL database from the underlying level.

MySQL Reader supports table and view reading. In table field, you can specify all columns in sequence, specify certain columns, adjust the column order, specify constant fields, and configure MySQL functions, such as now().

MySQL Reader supports the following MySQL data types:

Category MySQL data type
Integer int, tinyint, smallint, mediumint, int, bigint
Floating point float, double, decimal
String varchar, char, tinytext, text, mediumtext, longtext
Date and time date, datetime, timestamp, time, year
Boolean bit, bool
Binary tinyblob, mediumblob, blob, longblob, varbinary

Note:

  • Apart from the field types listed here, other types are not supported.
  • MySQL Reader classifies tinyint(1) as the integer type.

Parameter description

Parameter Description Required Default value
datasource Data source name. It must be identical to the data source name added. Adding data source is supported in script mode. Yes None
Table The name of the table selected for synchronization. One data integration job can only synchronize one table. Yes None
column The column name set to be synchronized in the configured table. Field information is described with arrays in JSON. [ * ] indicates all columns by default.
- Column pruning is supported, which means you can select some columns to export.
- Change of column order is supported, which means you can export the columns in an order different from the schema order of the table.
- Constant configuration is supported. You must follow MySQL SQL syntax format, for example [“id”, “table“, “1”, “‘mingya.wmy’”, “‘null’”, “to_char(a + 1)”, “2.3” , “true”], where id refers to the ordinary column name, table is the column name including reserved fields, 1 is an integer constant, ‘Mingya.wmy’ is a string constant (note that a pair of single quotation marks must be added), null refers to the null pointer, CHAR_LENGTH(s) is the function to calculate the string length, 2.3 is a floating point number, and true is a Boolean value.
- Column must contain the specified column set to be synchronized and it cannot be blank.
Yes None
splitPk If you specify the splitPk when using the MySQL Reader to extract data, it means that you want to use the fields represented by splitPk for data sharding. Then, the data synchronization starts concurrent tasks to synchronize data, which greatly improves the efficiency of data synchronization.
- We recommend splitPk users to use table main key which is usually very even and therefore, data hotpoints seldom occur for data shards.
- Currently, splitPk only supports data sharding for integer data types. Other types such as string, floating point, and date are not supported. If you specify an unsupported data type, the splitPk is ignored and the data is synchronized using a single channel.
- If the splitPk is not specified (for example, splitPk is not provided or splitPk value is null), the table data is synchronized using a single channel.
No None
where Filtering condition. In actual service scenarios, incremental synchronization typically synchronizes the data generated on the current day. You can specify the where condition as gmt_create > $bizdate.
- The where condition can be effectively used for incremental synchronization. If the where is not specified (for example, the key or value of the where is not provided), full synchronization is performed.
- You cannot specify where condition as limit 10, which does not conform to requirements for MySQL SQL where clause.
No None
querySql
(in advanced mode, unavailable in wizard mode)
In some service scenarios, the where condition is insufficient for filtering. In such cases, you can customize a SQL filter using this parameter. When querySql is configured, data synchronization system filters data using this configuration item directly instead of such configuration items as tables, columns, and splitPk. For example, for data synchronization after multi-table join, use select a,b from table_a join table_b on table_a.id = table_b.id.
When querySql is configured, MySQL Reader directly ignores the configuration of table, column, where, splitPk conditions. The priority of querySql is higher than table, column, where, and splitPk. Datasource is used all the time for parsing such information as user name and password
No None
singleOrMulti
(only suitable for database/table sharding)
It is used for database/table sharding. When the wizard mode is converted to the script mode, “singleOrMulti”: “multi” is automatically generated. However, this configuration cannot be directly generated by configuring script task template and it must be added manually. Otherwise, only the first data source is identified. singleOrMulti is only used by the frontend. It cannot be used to determine database/table sharding on the backend Yes multi

Development in wizard mode

Parameter description:

  • Data source: datasource in the preceding parameter description. Select MySQL.
  • Table: table in the preceding parameter description. Select the table to be synchronized.

  • Data filtering: where in the preceding parameter description. You can enter a WHERE filtering statement (with WHERE keyword not required) based on the corresponding SQL syntax. This filtering condition can be used as an incremental synchronization criterion.

  • Splitting key: splitPk in the preceding parameter description. Supports only integer fields. During data reading, the data is split based on the configured fields to achieve concurrent reading, improving data synchronization efficiency. The configuration of splitting key is related to the source selection in data synchronization. The splitting key configuration item is displayed only when you configure the data source.

    Note:

    • You can enter constants. The value must be embraced with a pair of single quotation marks, such as abc, 123.
    • Scheduling parameters can be used together, such as ${bdp.system.bizdate}.
    • You can enter functions supported by relational database, such as now() and count(1).
    • If the value you entered cannot be parsed, the type is displayed as -.
  • Field mapping: column in the preceding parameter description. The Source Table Field on the left maps with the Target Table Field on the right. Click Add Line, and then a field is added. Hover the cursor over a line, click Delete, and then the line is deleted.

Development in script mode

The script configuration example is shown as follows (single database/table). For more information about how to enter parameters, see Parameter Description:

  1. {
  2. "type": "job",
  3. "version": "1.0",
  4. "configuration": {
  5. "reader": {
  6. "plugin": "mysql",
  7. "parameter": {
  8. "datasource": "datasourceName",
  9. "table": "`table`",
  10. "splitPk": "id",
  11. "column": [
  12. "id", "name", "age"
  13. ],
  14. "where": "1 < id and id < 100"
  15. }
  16. },
  17. "writer": {
  18. }
  19. }
  20. }

Script configuration examples are provided as follows (database/table sharding). For more information about how to enter parameters, see Parameter Description:

  1. {
  2. "type": "job",
  3. "version": "1.0",
  4. "configuration": {
  5. "reader": {
  6. "plugin": "mysql",
  7. "parameter": {
  8. "connection": [
  9. {
  10. "table": [
  11. "tbl1",
  12. "tbl2",
  13. "tbl3"
  14. ],
  15. "datasource": "datasourceName1"
  16. },
  17. {
  18. "table": [
  19. "tbl4",
  20. "tbl5",
  21. "tbl6"
  22. ],
  23. "datasource": "datasourceName2"
  24. }
  25. ],
  26. "singleOrMulti": "multi",
  27. "splitPk": "db_id",
  28. "column": [
  29. "id", "name", "age"
  30. ],
  31. "where": "1 < id and id < 100"
  32. }
  33. },
  34. "writer": {
  35. }
  36. }
  37. }
Thank you! We've received your feedback.