edit-icon download-icon

Configure sqlServer reader

Last Updated: Dec 04, 2017

The SqlServer Reader plug-in provides the ability to read data from SqlServer. At the underlying implementation level, SqlServer Reader connects to a remote SqlServer database through JDBC and runs SELECT statements to extract data from the database.

In short, SqlServer Reader connects to a remote SqlServer database through the JDBC connector. The SELECT SQL query statements are generated and sent to the remote SqlServer database based on your configuration. Then, the SQL statements are run and the returned results are assembled into abstract datasets using the custom data types of data synchronization. Datasets are passed to the downstream writer for processing.

  • SqlServer Reader concatenates the table, column, and WHERE information you configured into SQL statements and sends them to the SqlServer database.

  • SqlServer directly sends the querySql information you configured to the SqlServer database.

SqlServer Reader supports most data types in SqlServer. Check whether your data type is supported.

SqlServer Reader converts SqlServer data types as follows:

Category SqlServer data type
Integer bigint, int, smallint, tinyint
Floating point float, decimal, real, numeric
String char,nchar,ntext,nvarchar,text,varchar,nvarchar(MAX),varchar(MAX)
Date and time date, datetime, time
Boolean bit
Binary binary,varbinary,varbinary(MAX),timestamp

Parameter description

Parameter Description Required Default value
datasource Data source name. It must be identical to the data source name added. Adding data source is supported in script mode.

You must configure the data source before configuring the SqlServer Reader plug‑in. For more information, see Configure the SqlServer Data Source.
Yes None
table The table selected for synchronization. One job can only synchronize one table. Yes None
column The column name set to be synchronized in the configured table. Field information is described with arrays in JSON. [““] indicates all columns by default.
- Column pruning is supported, which means you can select some columns to export.
- Change of column order is supported, which means you can export the columns in an order different from the schema order of the table.
- Constant configuration is supported. You must follow the MySQL SQL syntax format, for example [“id”, “table“, “1”, “‘mingya.wmy’”, “‘null’”, “to_char(a + 1)”, “2.3” , “true”], where id refers to the ordinary column name, table is the name of the column containing reserved words, 1 is an integer constant, ‘Mingya.wmy’ is a string constant (note that a pair of single quotation marks must be added), null refers to the null pointer, to_char(a + 1) is a function expression, 2.3 is a floating point number, and true is a Boolean value.
- Column must contain the specified column set to be synchronized and it cannot be blank.
Yes None
splitPk If you specify the splitPk when using SqlServer Reader to extract data, it means that you want to use the fields represented by splitPk for data sharding. Then, the data synchronization system starts concurrent tasks to synchronize data, which greatly improves the efficiency of data synchronization.
We recommend the splitPk users to the primary keys of tables, because the primary keys are generally even and data hot spots are less prone to split data fragments.
Currently, splitPk only supports data sharding for integer data types. Other types such as floating point, string, and date are not supported. If you specify an unsupported data type, SqlServer Reader reports an error.
No Null
where Filtering condition. SqlServer Reader concatenates an SQL command based on the specified column, table, and where conditions and extracts data according to the SQL. For example, you can specify the where condition as limit 10 during a test. In actual business scenarios, the data on the current day are usually required to be synchronized. You can specify the where condition as gmt_create > $bizdate.
The where condition can be effectively used for incremental synchronization. If the value is null, it means synchronizing all the information in the table.
No None
querySql In some business scenarios, the where condition is insufficient for filtering. In such cases, you can customize a SQL filter using this parameter. When this item is configured, the data synchronization system filters data using this configuration item directly instead of such configuration items as table and column. For example, for data synchronization after multi-table join, use select a,b from table_a join table_b on table_a.id = table_b.id.
When querySql is configured, SqlServer Reader directly ignores the configuration of table, column, and where conditions.
No None
fetchSize It defines the pieces of batch data that the plug-in and database server can fetch each time. The value determines the number of network interactions between the data synchronization system and the server, which can greatly improve the data extraction performance.
Note: A value greater than 2048 may lead to OOM for data synchronization.
No 1024

Development in wizard mode

sqlServer

  • Data Sources: datasource in the preceding parameter description. Select SqlServer.

  • Table: table in the preceding parameter description. Select the table to be synchronized.

  • Data Filtering: where in the preceding parameter description. You can enter a WHERE filtering statement (with WHERE keyword not required) based on the corresponding SQL syntax. This filtering condition can be used as an incremental synchronization criterion.

  • the split button: splitPk in the preceding parameter description. Supports only integer fields. During data reading, the data is split based on the configured fields to achieve concurrent reading, improving data synchronization efficiency. The splitting key configuration item is displayed only when the synchronization task is for importing RDS or Oracle data to MaxCompute.

  • Field Mapping: column in the preceding parameter description. The Source Table Field on the left maps with the Target Table Field on the right. Click Add Line, and then a field is added. Hover the cursor over a line, click Delete, and then the field is deleted.

    Notes:

    • You can enter constants. The value must be embraced with a pair of single quotation marks, such as abc, 123.

    • Scheduling parameters can be used together, such as ${bdp.system.bizdate}.

    • You can enter functions supported by relational databases, such as now() and count(1).

    • If the value you entered cannot be parsed, the type is displayed as ‘-‘.

Development in script mode

Configure a job to synchronously extract data from an SqlServer database:

  1. {
  2. "type": "job",
  3. "traceId": "You can enter a trace ID for your job here. We recommend you to organize your trace ID in the following format: service name + job ID",
  4. "version": "1.0",
  5. "configuration": {
  6. "reader": {
  7. "plugin": "sqlserver",
  8. "parameter": {
  9. "datasource": "datasourceName",
  10. "table": "table",
  11. "column": [
  12. "id", "name", "age"
  13. ],
  14. "fetchSize": 1024,
  15. "splitPk": "id",
  16. "where": "id > 100"
  17. }
  18. },
  19. "writer": {
  20. }
  21. }
  22. }

Configure a job to synchronize tasks from a database with custom SQLs to MaxCompute:

  1. {
  2. "type": "job",
  3. "traceId": "You can enter a trace ID for your job here. We recommend you to organize your trace ID in the following format: service name + job ID",
  4. "version": "1.0",
  5. "configuration": {
  6. "reader": {
  7. "plugin": "sqlserver",
  8. "parameter": {
  9. "datasource": "datasourceName",
  10. "querySql": "SELECT * from tableName",
  11. "fetchSize": 1024
  12. }
  13. },
  14. "writer": {
  15. }
  16. }
  17. }

Supplemental instructions

Data restoration of master/slave synchronization

Master/slave synchronization means that SqlServer uses a master/slave disaster recovery mode in which the slave database continuously restores data from the master database through binlog. Due to the time difference in master/slave data synchronization, especially in some special situations such as network latency, the restored data in the slave database after synchronization are significantly different from the data of the master database, that is to say, the data synchronized in the slave database are not a full image of the master database at the current time.

If CDP synchronizes data of the RDS provided by Alibaba Cloud, the data is directly read from the master database without data restoration concerns. However, this may cause concerns on the master database load. Configure it properly for throttling.

Consistency constraints

SqlServer is an RDBMS system in terms of data storage, which can provide APIs for querying strong consistency data. For example, if another data writer writes data to the database during a synchronization task, SqlServer Reader does not get the newly written data because of the snapshot features of the database. For the snapshot features of databases, see MVCC Wikipedia.

These are the features of data synchronization consistency in the single-threaded model of SqlServer Reader. Because SqlServer Reader uses concurrent data extraction according to your configuration information, robust data consistency cannot be guaranteed. After SqlServer Reader completes data sharding based on splitPk, multiple concurrent tasks are successively enabled to synchronize data. Since multiple concurrent tasks do not belong to the same read transaction and time intervals exist between the concurrent tasks, the data is not complete and consistent data snapshot information.

Currently, the consistency snapshot demands in the multi-threaded model cannot be met technically, which can only be solved from the engineering point of view. The engineering approaches have both advantages and disadvantages. The following solutions are provided for your consideration:

  • Use single-threaded synchronization without data sharding. This is slow but can ensure robust data consistency.

  • Close other data writers to ensure the current data is static. For example, you can lock the table or close slave database synchronization. The disadvantage is that the online businesses may be affected.

Database encoding

SqlServer Reader extracts data using JDBC at the underlying level. JDBC is applicable to all types of encodings and can complete transcoding at the underlying level. Therefore, SqlServer Reader can identify the encoding and complete transcoding automatically without the need to specify the encoding.

Incremental data synchronization

Since SqlServer Reader extracts data using JDBC SELECT statements, you can extract incremental data using the SELECT…WHERE…conditions in either of the following ways:

  • When database online applications write data into the database, the modify field is filled with the modification timestamp, including addition, update, and deletion (logical deletion). For this type of applications, SqlServer Reader only requires the WHERE condition followed by the timestamp of the last synchronization phase.

  • For new streamline data, SqlServer Reader requires the WHERE condition followed by the maximum auto-increment ID of the last synchronization phase.

In case that no field is provided for the business to identify the addition or modification of data, SqlServer Reader cannot perform incremental data synchronization and can only perform full data synchronization.

SQL security

SqlServer Reader provides querySql statements for you to SELECT data by yourself. SqlServer Reader conducts no security verification on querySql. The security during use is ensured by the data synchronization users.

Thank you! We've received your feedback.