Vertica is a column-oriented database that uses the massively parallel processing (MPP) architecture. Vertica Reader allows you to read data from Vertica. This topic describes how Vertica Reader works, the parameters that Vertica Reader supports, and how to configure it by using the code editor.

Notice Vertica Reader supports only exclusive resource groups for Data Integration, but not the default resource group or custom resource groups. For more information, see Exclusive resource group for Data Integration and Create a custom resource group for Data Integration.

How it works

Vertica Reader connects to a remote Vertica database by using Java Database Connectivity (JDBC) and executes a SELECT statement to select and read data from the database.

Specifically, Vertica Reader connects to a remote Vertica database by using JDBC, generates a SELECT statement based on your configurations, and then sends the statement to the database. The Vertica database executes the statement and returns the results. Then, Vertica Reader assembles the returned data to abstract datasets of custom data types that are supported by Data Integration, and sends the datasets to a writer.
  • Vertica Reader generates the SELECT statement based on the table, column, and where parameters that you specify, and sends the generated SELECT statement to the Vertica database.
  • If you specify the querySql parameter, Vertica Reader directly sends the value of this parameter to the Vertica database.

Vertica Reader connects to a Vertica database by using the Vertica JDBC driver. You must confirm the compatibility between the driver version and your Vertica database. Vertica Reader uses the following version of the Vertica JDBC driver:

<dependency>
    <groupId>com.vertica</groupId>
    <artifactId>vertica-jdbc</artifactId>
    <version>7.1.2</version>
</dependency>

Parameters

Parameter Description Required Default value
datasource The connection name. It must be the same as the name of the created connection. You can create connections in the code editor. Yes N/A
jdbcUrl The JDBC URL of the Vertica database. You can specify multiple JDBC URLs for a database. The JDBC URLs are described in a JSON array.

If you specify multiple JDBC URLs, Vertica Reader verifies the connectivity of the URLs in sequence to find a valid URL. If no URL is valid, Vertica Reader returns an error.

Note The jdbcUrl parameter must be included in the connection parameter.

The value of the jdbcUrl parameter must be in compliance with the standard format that is supported by Vertica. You can also specify the information of the attachment facility. Example: jdbc:vertica://1**.0.0.1:3306/database.

No N/A
username The username that you can use to connect to the database. No N/A
password The password that you can use to connect to the database. No N/A
table The name of the source table from which Vertica Reader reads data. Vertica Reader can read data from multiple tables. The tables are described in a JSON array.

If you specify multiple tables, make sure that the tables have the same schema. Vertica Reader does not check whether the tables have the same schema.

Note The table parameter must be included in the connection parameter.
Yes N/A
column The columns to be synchronized from the source table. The columns are described in a JSON array. The default value is [*], which indicates all columns in the source table.
  • Column pruning is supported. You can select specific columns to export.
  • The column order can be changed. You can configure Vertica Reader to export the specified columns in an order different from that specified in the schema of the table.
  • Constants are supported.
  • The column parameter must explicitly specify a set of columns to be synchronized. The parameter cannot be left empty.
Yes N/A
splitPk The field that is used for data sharding when Vertica Reader reads data. If you specify the splitPk parameter, the table is sharded based on the shard key that is specified by this parameter. Data Integration then runs concurrent threads to synchronize data. This way, data can be synchronized more efficiently.
  • We recommend that you set the splitPk parameter to the primary key of the table. Based on the primary key, data can be well distributed to different shards, but not intensively distributed to specific shards.
  • The splitPk parameter supports data sharding only for integers but not for other data types such as string, floating point, and date. If you set the splitPk parameter to a column of an unsupported type, Vertica Reader returns an error.
  • If you leave the splitPk parameter empty, Vertica Reader reads data from the source table by using a single thread.
No N/A
where The WHERE clause. Vertica Reader generates a SELECT statement based on the table, column, and where parameters that you specify, and uses the generated SELECT statement to select and read data.

For example, you can specify the where parameter during testing. To synchronize data that is generated on the current day, set the where parameter to gmt_create > $bizdate.

  • You can use the WHERE clause to read incremental data.
  • If you do not specify the where parameter or leave it empty, all data is read.
No N/A
querySql The SELECT statement that is used for refined data filtering. If you specify this parameter, Data Integration filters data based on this parameter.

If you specify the querySql parameter, Vertica Reader ignores the table, column, and where parameters that you specify.

No N/A
fetchSize The number of data records to read at a time. This parameter determines the number of interactions between Data Integration and the database and affects reading efficiency.
Note A value greater than 2048 may lead to out of memory (OOM) during the data synchronization process.
No 1,024

Configure Vertica Reader by using the codeless UI

The codeless user interface (UI) is not supported for Vertica Reader.

Configure Vertica Reader by using the code editor

The following example shows how to configure a sync node to read data from Vertica in the code editor. For more information, see Create a sync node by using the code editor.
{
    "type": "job",
    "steps": [
        {
            "stepType":"vertica", // The reader type.
             "parameter": {
                "datasource": "",// The connection name.
                 "username": "",
                "password": "",
                "where": "",
                "column":[// The columns to be synchronized from the source table.
                     "id",
                    "name"
                ],
                "splitPk": "id",
                "connection": [
                    {
                        "table": [ // The name of the source table.
                             "table"
                        ],
                        "jdbcUrl": [
                            "jdbc:vertica://host:port/database"
                        ]
                    }
                ]
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "stream",
            "parameter": {
                "print": false,
                "fieldDelimiter": ","
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "version": "2.0",
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    },
    "setting": {
        "errorLimit": {
            "record":"0" // The maximum number of dirty data records allowed.
         },
        "speed": {
            "throttle": false, // Specifies whether to enable bandwidth throttling. A value of false indicates that the bandwidth is not throttled. A value of true indicates that the bandwidth is throttled. The maximum transmission rate takes effect only if you set this parameter to true.
             "concurrent":1, // The maximum number of concurrent threads.
         }
    }
}