This topic describes the data types and parameters supported by Gbase8a Reader and how to configure it by using the code editor.

Gbase8a is a new type of column-oriented analytical database. Gbase8a Reader allows you to read data from Gbase8a databases.
Notice: Currently, Gbase8a Reader supports only exclusive resource groups for Data Integration. It does not support the default resource group or custom resource groups.
Gbase8a Reader connects to a remote Gbase8a database through Java Database Connectivity (JDBC), generates a SELECT statement based on your configurations, and sends the statement to the database. The Gbase8a database executes the statement and returns the result. Gbase8a Reader then assembles the returned data into abstract datasets that use the data types supported by Data Integration and passes the datasets to a writer.
  • Gbase8a Reader generates the SELECT statement based on the table, column, and where parameters that you set, and sends the generated SELECT statement to the Gbase8a database.
  • If you specify the querySql parameter, Gbase8a Reader directly sends the value of this parameter to the Gbase8a database.
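The two cases above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not the reader's actual implementation. The function name build_select and its arguments are hypothetical, and the exact SQL text that Gbase8a Reader produces may differ.

```python
def build_select(table, columns, where=None, query_sql=None):
    """Return the SQL text that would be sent to the database (illustrative only)."""
    # A user-supplied querySql is sent as is; table, column, and where are not used.
    if query_sql:
        return query_sql
    if not columns:
        raise ValueError("column must list at least one column")
    sql = "SELECT {} FROM {}".format(", ".join(columns), table)
    # A missing or empty where value means that all rows are read.
    if where and where.strip():
        sql += " WHERE " + where.strip()
    return sql


print(build_select("orders", ["id", "name"], where="id > 100"))
print(build_select("orders", ["id"], query_sql="SELECT id FROM orders WHERE id < 100"))
```

In the second call, the table and column arguments have no effect because the querySql value takes precedence.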
Gbase8a Reader accesses a Gbase8a database through the MySQL database driver. Make sure that the driver version is compatible with your Gbase8a database. Gbase8a Reader uses the following version of the MySQL database driver:
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.22</version>
</dependency>

Parameters

Parameter Description Required Default value
datasource The name of the connection. If the edition of the DataWorks service that you activated supports Gbase8a connections, you can add a Gbase8a connection and reference the connection in this parameter.

You can also connect to the Gbase8a database by directly setting parameters such as jdbcUrl and username.

No None
jdbcUrl The JDBC URL for connecting to the Gbase8a database. You can specify multiple JDBC URLs in a JSON array for a database.

If you specify multiple JDBC URLs, Gbase8a Reader verifies the connectivity of the URLs in sequence to find a valid URL.

If no URL is valid, Gbase8a Reader returns an error.
Note: The jdbcUrl parameter must be included in the connection parameter.

The value of the jdbcUrl parameter must comply with the standard format supported by Gbase8a. You can also append additional connection control information to the URL. Example: jdbc:mysql://127.0.0.1:3306/database. You must specify either the jdbcUrl parameter or the datasource parameter.

No None
username The username for connecting to the Gbase8a database. No None
password The password for connecting to the Gbase8a database. No None
table The name of the source table from which Gbase8a Reader reads data. Gbase8a Reader can read data from multiple tables. The tables are described in a JSON array.
If you specify multiple tables, make sure that the tables have the same schema. Gbase8a Reader does not check whether the tables have the same schema.
Note: The table parameter must be included in the connection parameter.
Yes None
column The columns that Gbase8a Reader reads from the source table. The columns are described in a JSON array. To read all columns in the source table, set this parameter to [ * ].
  • Column pruning is supported. You can specify specific columns for Gbase8a Reader to export.
  • The column order can be changed. You can configure Gbase8a Reader to export columns in an order different from that specified in the schema of the table.
  • Constants are supported. Example: '123'.
  • Functions are supported. Example: date('now').
  • The column parameter must explicitly specify a set of columns to read. The parameter cannot be left empty.
Yes None
splitPk The field used for data sharding when Gbase8a Reader reads data. If you specify the splitPk parameter, the table is sharded based on the shard key specified by this parameter. Data Integration then runs concurrent threads to synchronize data. This improves efficiency.
  • We recommend that you set the splitPk parameter to the primary key of the table. Primary key values are usually evenly distributed, which helps prevent data from being concentrated in a few shards.
  • Currently, the splitPk parameter supports data sharding only for integer columns. Other data types, such as strings, floating points, and dates, are not supported. If you set this parameter to a column of an unsupported type, Gbase8a Reader ignores the splitPk parameter and reads data through a single thread.
  • If you leave the splitPk parameter empty, Gbase8a Reader reads data in the source table through a single thread.
No None
where The WHERE clause. Gbase8a Reader generates a SELECT statement based on the column, table, and where parameters that you set, and uses the generated SELECT statement to select and read data.
For example, you can set the where parameter to limit 10 during testing. To read data generated on the current day, set the where parameter to gmt_create > $bizdate.
  • You can use the WHERE clause to read incremental data.
  • If you do not specify the where parameter or leave it empty, all data is read.
No None
querySql The SELECT statement used for refined data filtering. If you specify this parameter, Data Integration directly filters data based on this parameter.

If you specify the querySql parameter, Gbase8a Reader ignores the table, column, where, and splitPk parameters that you set.

No None
fetchSize The number of data records to read at a time. This parameter determines the number of interactions between Data Integration and the database and affects data reading efficiency.
Note: A value greater than 2048 may cause an out-of-memory (OOM) error during the data reading process.
No 1024
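
To picture how sharding on an integer splitPk column can work, the following Python sketch divides the key range into contiguous ranges of nearly equal size and turns each range into a filter condition that a separate thread could use. This is a sketch under stated assumptions: the function names are hypothetical, the minimum and maximum key values are assumed to be known in advance, and the actual sharding strategy of Gbase8a Reader is not documented in this topic and may differ.

```python
def split_ranges(min_pk, max_pk, shards):
    """Split the closed integer range [min_pk, max_pk] into at most `shards` closed ranges."""
    if shards < 1:
        raise ValueError("shards must be at least 1")
    if min_pk > max_pk:
        return []
    total = max_pk - min_pk + 1
    # Never create more shards than there are key values.
    shards = min(shards, total)
    base, extra = divmod(total, shards)
    ranges = []
    start = min_pk
    for i in range(shards):
        # The first `extra` shards receive one additional key value.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def shard_conditions(split_pk, min_pk, max_pk, shards):
    """Build one filter condition per shard for the given split key column."""
    return [
        "{0} >= {1} AND {0} <= {2}".format(split_pk, low, high)
        for low, high in split_ranges(min_pk, max_pk, shards)
    ]


for condition in shard_conditions("id", 1, 10, 3):
    print(condition)
```

Evenly distributed key values, as is typical for a primary key, give each range a similar number of rows. This is why the primary key is the recommended split key.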

Configure Gbase8a Reader by using the codeless UI

Currently, the codeless user interface (UI) is not supported for Gbase8a Reader.

Configure Gbase8a Reader by using the code editor

The following example provides sample code for configuring Gbase8a Reader. For more information, see Create a sync node by using the code editor.
{
    "type": "job",
    "steps": [
        {
            "stepType": "gbase8a", // The reader name.
            "parameter": {
                "datasource": "", // The connection name.
                "username": "",
                "password": "",
                "where": "",
                "column": [ // The columns to be read.
                    "id",
                    "name"
                ],
                "splitPk": "id",
                "connection": [
                    {
                        "table": [ // The name of the table whose data is to be read.
                            "table"
                        ],
                        "jdbcUrl": [
                            "jdbc:mysql://host:port/database"
                        ]
                    }
                ]
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "stream",
            "parameter": {
                "print": false,
                "fieldDelimiter": ","
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "version": "2.0",
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    },
    "setting": {
        "errorLimit": {
            "record": "0" // The maximum number of dirty data records allowed.
        },
        "speed": {
            "throttle": false, // Specifies whether to enable bandwidth throttling. The value false indicates that the bandwidth is not throttled. The value true indicates that the bandwidth is throttled. The maximum transmission rate takes effect only if you set this parameter to true.
            "concurrent": 1 // The maximum number of concurrent threads.
        }
    }
}
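
Before you submit a configuration, you may want to check the reader's parameter object against the rules in the Parameters section. The following Python sketch shows one possible way to do so. The function check_reader_parameter is hypothetical and is not part of DataWorks. It covers only the rules stated in this topic (datasource or jdbcUrl, table, column, and fetchSize) and does not model the querySql parameter.

```python
def check_reader_parameter(parameter):
    """Return a list of problems found in a Gbase8a Reader parameter dict."""
    problems = []
    connections = parameter.get("connection") or []
    # Either a connection name or at least one JDBC URL must be provided.
    has_jdbc_url = any(conn.get("jdbcUrl") for conn in connections)
    if not parameter.get("datasource") and not has_jdbc_url:
        problems.append("specify datasource or connection.jdbcUrl")
    # The table parameter is required and belongs in the connection parameter.
    if not any(conn.get("table") for conn in connections):
        problems.append("connection.table is required")
    # The column parameter must list the columns to read and cannot be empty.
    if not parameter.get("column"):
        problems.append("column is required and cannot be empty")
    # Values above 2048 may cause out-of-memory errors; the default is 1024.
    if parameter.get("fetchSize", 1024) > 2048:
        problems.append("fetchSize above 2048 may cause out-of-memory errors")
    return problems


# The reader parameters from the sample above, written as a Python dict.
example = {
    "datasource": "",
    "column": ["id", "name"],
    "splitPk": "id",
    "connection": [
        {"table": ["table"], "jdbcUrl": ["jdbc:mysql://host:port/database"]}
    ],
}
print(check_reader_parameter(example))
```

An empty list means that none of the checked rules is violated. The check does not verify that the database is reachable or that the listed columns exist.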