Vertica is a column-oriented database that uses the massively parallel processing (MPP) architecture. Vertica Reader reads data from Vertica. This topic describes how Vertica Reader works, the parameters that are supported by Vertica Reader, and how to configure Vertica Reader by using the codeless user interface (UI) and code editor.
How it works
- Vertica Reader generates the SQL statement based on the settings of the table, column, and where parameters and sends the generated statement to the Vertica database.
- If you specify the querySql parameter, Vertica Reader directly sends the value of this parameter to the Vertica database.
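The two cases above can be sketched in a few lines of Python. This is a minimal illustration and not Vertica Reader's actual implementation: the function name `build_query` and the exact shape of the generated statement are assumptions made for the example.

```python
from typing import Optional, Sequence


def build_query(
    table: str,
    columns: Sequence[str],
    where: Optional[str] = None,
    query_sql: Optional[str] = None,
) -> str:
    """Return the SQL text that would be sent to the database (hypothetical helper)."""
    # When querySql is set, it is sent as-is and the other settings are ignored.
    if query_sql:
        return query_sql
    # Otherwise the statement is assembled from table, column, and where.
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if where:
        sql += f" WHERE {where}"
    return sql


if __name__ == "__main__":
    print(build_query("orders", ["id", "name"], where="id > 100"))
    print(build_query("orders", ["*"], query_sql="SELECT id FROM orders LIMIT 10"))
```

The table name `orders` and the filter `id > 100` are placeholders chosen for the example only.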
Vertica Reader connects to a Vertica database by using the Vertica JDBC driver. You must make sure that the driver version is compatible with your Vertica database. Vertica Reader uses the Vertica JDBC driver of the following version:
<dependency>
    <groupId>com.vertica</groupId>
    <artifactId>vertica-jdbc</artifactId>
    <version>7.1.2</version>
</dependency>
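The jdbcUrl parameter described under Parameters accepts several URLs, which are checked in sequence until one is valid. The sketch below illustrates that selection logic only. The function `first_valid_url` and the `can_connect` callback are hypothetical names; a real check would open a connection through the JDBC driver instead of calling a stub.

```python
from typing import Callable, Sequence


def first_valid_url(urls: Sequence[str], can_connect: Callable[[str], bool]) -> str:
    """Return the first URL for which can_connect reports success."""
    for url in urls:
        # URLs are probed in the order in which they are configured.
        if can_connect(url):
            return url
    # Mirrors the documented behavior: no valid URL results in an error.
    raise ConnectionError("none of the configured JDBC URLs is valid")


if __name__ == "__main__":
    # Hypothetical hosts; the stub pretends that only the second one is reachable.
    urls = ["jdbc:vertica://host-a:5433/db", "jdbc:vertica://host-b:5433/db"]
    print(first_valid_url(urls, lambda url: "host-b" in url))
```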
Parameters
Parameter | Description | Required | Default value |
---|---|---|---|
datasource | The name of the data source. It must be the same as the name of the added data source. You can add data sources by using the code editor. | Yes | No default value |
jdbcUrl | The JDBC URL that is used to connect to the Vertica database. You can specify multiple JDBC URLs for a database. Specify JDBC URLs in a JSON array. If you specify multiple JDBC URLs, Vertica Reader verifies the connectivity of the URLs in sequence to find a valid URL. If no URL is valid, Vertica Reader returns an error. Note: The jdbcUrl parameter must be included in the connection parameter. The value of the jdbcUrl parameter must comply with the standard format that is supported by Vertica. You can also specify the information of the attachment facility. Example: | No | No default value |
username | The username that is used to connect to the database. | No | No default value |
password | The password that is used to connect to the database. | No | No default value |
table | The name of the table from which you want to read data. Vertica Reader can read data from multiple tables. Specify the table names in a JSON array. If you specify multiple tables, make sure that the tables have the same schema. Vertica Reader does not check whether the tables have the same schema. Note: The table parameter must be included in the connection parameter. | Yes | No default value |
column | The names of the columns from which you want to read data. Specify the names in a JSON array. The default value is [ * ], which indicates all the columns in the source table. | Yes | No default value |
splitPk | The field that is used for data sharding when Vertica Reader reads data. If you specify this parameter, the source table is sharded based on the value of this parameter. Data Integration then runs parallel threads to read data. This improves data synchronization efficiency. | No | No default value |
where | The WHERE clause. Vertica Reader generates an SQL statement based on the settings of the table, column, and where parameters and uses the generated statement to read data. For example, when you perform a test, you can specify the where parameter to filter data. In actual business scenarios, you can set the where parameter to | No | No default value |
querySql | The SQL statement that is used for refined data filtering. If you specify this parameter, Data Integration filters data based only on the value of this parameter. If you specify the querySql parameter, Vertica Reader ignores the settings of the table, column, and where parameters. | No | No default value |
fetchSize | The number of data records to read at a time. This parameter determines the number of interactions between Data Integration and Vertica and affects read efficiency. Note: If you set this parameter to a value greater than 2048, an out of memory (OOM) error may occur during data synchronization. | No | 1,024 |
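To make the splitPk description more concrete, the sketch below shows one way a table could be divided into ranges that parallel threads read independently. The source does not state how Vertica Reader computes its shards, so this is an assumption-laden illustration: it assumes an integer key with a known minimum and maximum, and the helper name `split_ranges` is hypothetical.

```python
from typing import List


def split_ranges(pk: str, min_val: int, max_val: int, shards: int) -> List[str]:
    """Split the closed range [min_val, max_val] into non-overlapping WHERE predicates."""
    if shards < 1:
        raise ValueError("shards must be at least 1")
    if min_val > max_val:
        raise ValueError("min_val must not exceed max_val")
    total = max_val - min_val + 1
    # Never create more shards than there are distinct key values.
    shards = min(shards, total)
    base, extra = divmod(total, shards)
    predicates = []
    start = min_val
    for i in range(shards):
        # The first `extra` shards take one additional key value each.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        predicates.append(f"{pk} >= {start} AND {pk} <= {end}")
        start = end + 1
    return predicates


if __name__ == "__main__":
    for predicate in split_ranges("id", 1, 10, 3):
        print(predicate)
```

Each predicate could be appended to the generated query so that one thread reads one range. Note that range predicates of this kind do not match rows whose key is NULL, so a real implementation would need to handle those rows separately.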
Configure Vertica Reader by using the codeless UI
This method is not supported.
Configure Vertica Reader by using the code editor
{
    "type": "job",
    "steps": [
        {
            "stepType": "vertica", // The reader type.
            "parameter": {
                "datasource": "", // The name of the data source.
                "username": "",
                "password": "",
                "where": "",
                "column": [ // The names of the columns from which you want to read data.
                    "id",
                    "name"
                ],
                "splitPk": "id",
                "connection": [
                    {
                        "table": [ // The name of the table from which you want to read data.
                            "table"
                        ],
                        "jdbcUrl": [
                            "jdbc:vertica://host:port/database"
                        ]
                    }
                ]
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "stream",
            "parameter": {
                "print": false,
                "fieldDelimiter": ","
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "version": "2.0",
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    },
    "setting": {
        "errorLimit": {
            "record": "0" // The maximum number of dirty data records allowed.
        },
        "speed": {
            "throttle": true, // Specifies whether to enable bandwidth throttling. The value false indicates that bandwidth throttling is disabled, and the value true indicates that bandwidth throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true.
            "concurrent": 1, // The maximum number of parallel threads.
            "mbps": "12" // The maximum transmission rate.
        }
    }
}
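Before you submit a script like the one above, it can help to check the reader's parameter block against the rules in the Parameters table. The following sketch is a simplified, hypothetical checker and not part of Data Integration: the function name `check_reader_parameter` and the message texts are assumptions, and it covers only the rules on required parameters, the placement of table and jdbcUrl inside connection, and the fetchSize note.

```python
from typing import Any, Dict, List


def check_reader_parameter(parameter: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a Vertica Reader parameter block."""
    problems = []
    if not parameter.get("datasource"):
        problems.append("datasource is required")
    if not parameter.get("column"):
        problems.append("column is required")
    connections = parameter.get("connection") or []
    if not any(conn.get("table") for conn in connections):
        problems.append("table is required inside connection")
    # table and jdbcUrl belong inside connection, not at the top level.
    for key in ("table", "jdbcUrl"):
        if key in parameter:
            problems.append(f"{key} must be placed inside connection")
    # The documented default is 1,024; values above 2048 risk an OOM error.
    if parameter.get("fetchSize", 1024) > 2048:
        problems.append("fetchSize above 2048 may cause an OOM error")
    return problems


if __name__ == "__main__":
    parameter = {
        "datasource": "my_vertica",  # hypothetical data source name
        "column": ["id", "name"],
        "splitPk": "id",
        "connection": [
            {"table": ["table"], "jdbcUrl": ["jdbc:vertica://host:port/database"]}
        ],
    }
    print(check_reader_parameter(parameter))
```

An empty list means that none of the checked rules is violated. The checker takes the parameter block as a Python dictionary, so the `//` comments in the sample script must be removed before the script is parsed as JSON.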