This topic describes the parameters that are supported by GBase8a Reader and how to configure GBase8a Reader by using the code editor. Configuration by using the codeless user interface (UI) is not supported.

Background information

GBase 8a is a column-oriented analytical database. GBase8a Reader reads data from GBase 8a databases.
Notice GBase8a Reader supports only exclusive resource groups for Data Integration. It does not support the shared resource group or custom resource groups for Data Integration. For more information, see Create and use an exclusive resource group for Data Integration and Create a custom resource group for Data Integration.
GBase8a Reader connects to a remote GBase 8a database by using Java Database Connectivity (JDBC), generates an SQL statement based on your configurations, and sends the statement to the database. The database executes the statement and returns the result. GBase8a Reader then converts the returned data into abstract datasets of the data types supported by Data Integration and passes the datasets to a writer.
  • If you specify the table, column, and where parameters, GBase8a Reader generates an SQL statement from these settings and sends it to the GBase 8a database.
  • If you specify the querySql parameter, GBase8a Reader sends the value of this parameter to the GBase 8a database.
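The two bullets above describe a simple precedence rule. The following Java class is a minimal sketch of that rule for a single table. It is not the GBase8a Reader source: the class and method names are hypothetical, and the exact SQL text that the reader generates (for example, quoting or parentheses around the where condition) may differ.

```java
import java.util.List;

// Hypothetical sketch: chooses the statement that a reader sends to GBase 8a.
// Not the GBase8a Reader implementation; names are illustrative only.
final class ReadQueryBuilder {
    private ReadQueryBuilder() {}

    static String build(String querySql, List<String> columns, String table, String where) {
        // If querySql is set, it is sent as-is and table, column, and where are ignored.
        if (querySql != null && !querySql.isBlank()) {
            return querySql;
        }
        // The column parameter must not be empty.
        if (columns == null || columns.isEmpty()) {
            throw new IllegalArgumentException("column must list at least one column");
        }
        StringBuilder sql = new StringBuilder("SELECT ")
                .append(String.join(", ", columns))
                .append(" FROM ")
                .append(table);
        // An empty where condition means that all rows are read.
        if (where != null && !where.isBlank()) {
            sql.append(" WHERE ").append(where);
        }
        return sql.toString();
    }
}
```

For example, build(null, List.of("id", "name"), "orders", "id > 100") returns SELECT id, name FROM orders WHERE id > 100, while any non-empty querySql is returned unchanged.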
GBase8a Reader uses the MySQL database driver to access a GBase 8a database. Make sure that your GBase 8a database is compatible with this driver version. GBase8a Reader uses the following version of the driver:
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.22</version>
</dependency>
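Because the driver is MySQL Connector/J, a connection is opened through the standard java.sql.DriverManager API with a jdbc:mysql:// URL. The jdbcUrl parameter also accepts several URLs that are checked in sequence until one is valid. The following Java class is a minimal sketch of that selection logic, assuming that "valid" means a connection can be opened. The class name, the injected check, and the 5-second validation timeout are hypothetical; the check that GBase8a Reader actually performs may differ.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of "try each JDBC URL in order and use the first valid one".
final class JdbcUrlSelector {
    private JdbcUrlSelector() {}

    static String firstValid(List<String> jdbcUrls, Predicate<String> isValid) {
        for (String url : jdbcUrls) {
            if (isValid.test(url)) {
                return url; // The first URL that passes the check is used.
            }
        }
        // No URL passed the check: report an error.
        throw new IllegalStateException("No valid JDBC URL among " + jdbcUrls);
    }

    // A check that opens a real connection. Requires mysql-connector-java on the classpath.
    static Predicate<String> canConnect(String user, String password) {
        return url -> {
            try (Connection conn = DriverManager.getConnection(url, user, password)) {
                return conn.isValid(5); // Wait at most 5 seconds for the validation.
            } catch (SQLException e) {
                return false;
            }
        };
    }
}
```

Against a live database, call firstValid(urls, canConnect(user, password)). Injecting the check as a Predicate keeps the ordering logic testable with a stub instead of a real server.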

Parameters

Parameter Description Required Default value
datasource The name of the data source. If the edition of DataWorks that you activated supports GBase 8a data sources, you can add a GBase 8a data source and specify its name in this parameter.

Alternatively, you can connect to the GBase 8a database by specifying the jdbcUrl and username parameters instead of a data source.

Required: No. Default value: none.
jdbcUrl The JDBC URL that is used to connect to the source database. You can specify multiple JDBC URLs for a database in a JSON array.

If you specify multiple JDBC URLs, GBase8a Reader checks the connectivity of the URLs in sequence and uses the first valid URL. If none of the URLs is valid, GBase8a Reader returns an error.
Note The jdbcUrl parameter must be included in the connection parameter.

The value must follow the standard JDBC URL format supported by GBase 8a. You can also append connection properties to the URL. Example: jdbc:mysql://127.0.0.1:3306/database. If you do not specify the datasource parameter, you must specify the jdbcUrl and username parameters.

Required: No. Default value: none.
username The username that is used to connect to the source database. Required: No. Default value: none.
password The password that is used to connect to the source database. Required: No. Default value: none.
table The name of the table from which you want to read data. GBase8a Reader can read data from multiple tables. Specify the table names in a JSON array.
If you specify multiple tables, make sure that the tables have the same schema. GBase8a Reader does not check whether the schemas match.
Note The table parameter must be included in the connection parameter.
Required: Yes. Default value: none.
column The names of the columns from which you want to read data, specified in a JSON array. To read all the columns of the source table, set this parameter to ["*"].
  • You can select specific columns to read.
  • You can specify columns in an order different from the order in the schema of the source table.
  • Constants are supported, such as '123'.
  • Functions are supported, such as date('now').
  • The column parameter must explicitly specify the columns from which you want to read data and cannot be left empty.
Required: Yes. Default value: none.
splitPk The field that is used to shard data when GBase8a Reader reads data. If you specify this parameter, the source table is sharded based on this field, and Data Integration runs parallel threads to read the shards. This way, data is synchronized more efficiently.
  • We recommend that you set the splitPk parameter to the primary key column of the table. This way, data is evenly distributed across shards instead of being concentrated in a few specific shards.
  • The splitPk parameter supports only integer data types. If you set it to a field of another data type, such as a string, floating-point, or date data type, the setting is ignored and a single thread is used to read data.
  • If you leave the splitPk parameter empty, a single thread is used to read data.
Required: No. Default value: none.
where The WHERE clause. GBase8a Reader generates an SQL statement based on the settings of the column, table, and where parameters and uses the statement to read data.
For example, you can set the where parameter to limit 10 when you perform a test. To read the data that is generated on the current day, you can set the where parameter to gmt_create > $bizdate.
  • You can use the where parameter to read incremental data.
  • If the where parameter is not specified or is left empty, GBase8a Reader reads all data.
Required: No. Default value: none.
querySql A custom SQL statement that is used for advanced data filtering. If you specify this parameter, GBase8a Reader reads data by using only this statement and ignores the settings of the table, column, where, and splitPk parameters.

Required: No. Default value: none.
fetchSize The number of data records that GBase8a Reader fetches from the source database in each batch. This value determines the number of round trips between GBase8a Reader and the source database and affects read efficiency.
Note If you set this parameter to a value greater than 2048, an out of memory (OOM) error may occur during data synchronization.
Required: No. Default value: 1024.
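Sharding by splitPk can be pictured as splitting the integer key range into contiguous intervals, one per read task. The following Java class is a minimal sketch of one common approach: read MIN and MAX of the splitPk column, then divide the range into equal-width intervals. It is an assumption that GBase8a Reader shards exactly this way; the class name and the extra IS NULL condition are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of integer range sharding on a splitPk column.
// min and max are assumed to come from SELECT MIN(pk), MAX(pk) FROM the table,
// and the key range is assumed not to overflow a long.
final class SplitPkSharder {
    private SplitPkSharder() {}

    static List<String> shardConditions(String pk, long min, long max, int shards) {
        if (shards < 1 || min > max) {
            throw new IllegalArgumentException("require shards >= 1 and min <= max");
        }
        long keyCount = max - min + 1;                // integer keys in [min, max]
        long step = (keyCount + shards - 1) / shards; // ceiling division, at least 1
        List<String> conditions = new ArrayList<>();
        long lo = min;
        while (true) {
            long hi = Math.min(max, lo + step - 1);
            conditions.add(pk + " >= " + lo + " AND " + pk + " <= " + hi);
            if (hi == max) {
                break;                                // last interval reached
            }
            lo = hi + 1;
        }
        // Rows with a NULL key fall outside every interval, so read them separately.
        conditions.add(pk + " IS NULL");
        return conditions;
    }
}
```

For example, shardConditions("id", 1, 10, 3) yields id >= 1 AND id <= 4, id >= 5 AND id <= 8, id >= 9 AND id <= 10, and id IS NULL. Each condition would serve as the filter of one read task. Because the intervals have equal width rather than equal row counts, skewed key values produce unbalanced shards, which is why the primary key is the recommended splitPk.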

Configure GBase8a Reader by using the codeless UI

This method is not supported.

Configure GBase8a Reader by using the code editor

In the following code, a synchronization node is configured to read data from a GBase 8a database. For more information about how to configure a synchronization node by using the code editor, see Create a sync node by using the code editor.
{
    "type": "job",
    "steps": [
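        {
            "stepType": "gbase8a", // The reader type.
            "parameter": {
                "datasource": "", // The name of the data source.
                "username": "", // The username that is used to connect to the database.
                "password": "", // The password that is used to connect to the database.
                "where": "", // The filter condition. An empty value reads all data.
                "column": [ // The names of the columns from which you want to read data.
                    "id",
                    "name"
                ],
                "splitPk": "id", // The integer field that is used to shard data.
                "connection": [
                    {
                        "table": [ // The name of the table from which you want to read data.
                            "table"
                        ],
                        "jdbcUrl": [ // One or more JDBC URLs, checked in sequence.
                            "jdbc:mysql://host:port/database"
                        ]
                    }
                ]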
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "stream",
            "parameter": {
                "print": false,
                "fieldDelimiter": ","
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "version": "2.0",
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    },
    "setting": {
        "errorLimit": {
            "record": "0" // The maximum number of dirty data records allowed.
        },
        "speed": {
            "throttle": true, // Specifies whether to enable bandwidth throttling. The value true enables it, and the value false disables it.
            "concurrent": 1, // The maximum number of parallel threads. splitPk shards are read in parallel only if this value is greater than 1.
            "mbps": "12" // The maximum transmission rate. Takes effect only when throttle is set to true.
        }
    }
}