All Products
Search
Document Center

DataWorks:Vertica

Last Updated:Mar 26, 2026

Vertica is a column-oriented database that uses a massively parallel processing (MPP) architecture. The Vertica data source provides bidirectional channels to read from and write to Vertica. This topic describes the data synchronization capabilities of the Vertica data source in DataWorks.

Supported versions

Vertica Reader connects to Vertica through the Vertica JDBC driver. Make sure the driver version is compatible with your Vertica service. DataWorks uses JDBC driver version 7.1.2:

<dependency>
  <groupId>com.vertica</groupId>
  <artifactId>vertica-jdbc</artifactId>
  <version>7.1.2</version>
</dependency>

Limits

Supported field types

Common Vertica data types are supported: integer, float, string, and time. Support for advanced data types is limited.

Add a data source

Before developing a synchronization task, add the Vertica data source to DataWorks. For instructions, see Data source management.

Parameter descriptions are also available in the DataWorks console when you add the data source.

Develop a data synchronization task

Vertica data synchronization tasks must be configured in the code editor. The following sections cover the script format, parameters, and examples.

Configure an offline synchronization task for a single table

Appendix: Script examples and parameter descriptions

Configure a batch synchronization task by using the code editor

The script must follow the unified script format for batch synchronization tasks. For format requirements, see Configure a task in the code editor.

Reader script example

{
  "type": "job",
  "version": "2.0",
  "steps": [
    {
      "stepType": "vertica",
      "parameter": {
        "datasource": "",        // The data source name.
        "column": [              // The columns to read.
          "id",
          "name"
        ],
        "where": "",
        "splitPk": "id",
        "connection": [
          {
            "table": [           // The source table name.
              "table"
            ]
          }
        ]
      },
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "stream",
      "parameter": {
        "print": false,
        "fieldDelimiter": ","
      },
      "name": "Writer",
      "category": "writer"
    }
  ],
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  },
  "setting": {
    "errorLimit": {
      "record": "0"              // Maximum number of error records allowed.
    },
    "speed": {
      "throttle": true,          // Set to true to enable rate limiting; set to false to disable.
      "concurrent": 1,           // Number of concurrent jobs.
      "mbps": "12"               // Maximum transmission rate. 1 mbps = 1 MB/s.
    }
  }
}

Reader script parameters

ParameterDescriptionRequiredDefault valueExample
datasourceThe data source name. Must match the name of the data source added in the code editor.YesNonemy_vertica_source
tableThe source tables to read from, specified as a JSON array. Multiple tables can be read simultaneously, but all tables must have the same schema. Vertica Reader does not verify schema consistency. The table parameter must be placed inside the connection block.YesNone["orders", "order_items"]
columnThe columns to read from the source tables, specified as a JSON array. Use ["*"] to read all columns. Supports column pruning, column reordering, and constants. Cannot be left blank.YesNone["id", "name", "created_at"]
splitPkThe column used to partition data for concurrent reads. Use the primary key for even distribution and to avoid hot spots. Only integer columns are supported — string, float, and date columns are not. If left blank, data is read through a single channel without partitioning.NoNone"id"
whereA filter condition. Vertica Reader uses the column, table, and where parameters to construct the SQL query. For incremental synchronization of daily data, set this to gmt_create > $bizdate. If not configured, all data in the table is read.NoNone"gmt_create > $bizdate"
querySqlA custom SQL query for advanced filtering scenarios where where alone is insufficient. When querySql is configured, Vertica Reader ignores the table, column, and where parameters.NoNone"SELECT id, name FROM orders WHERE status = 'active'"
fetchSizeThe number of records fetched from the database in each batch. Increasing this value reduces network round trips and improves extraction performance. Setting fetchSize above 2048 may cause an out-of-memory (OOM) error.No1024512

Writer script example

{
  "type": "job",
  "version": "2.0",
  "steps": [
    {
      "stepType": "stream",
      "parameter": {},
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "vertica",
      "parameter": {
        "datasource": "data_source_name",
        "column": [                    // The destination columns.
          "id",
          "name"
        ],
        "connection": [
          {
            "table": [                 // The destination table name.
              "vertica_table"
            ]
          }
        ],
        "preSql": [                    // SQL to run before the write task starts.
          "delete from @table where db_id = -1"
        ],
        "postSql": [                   // SQL to run after the write task completes.
          "update @table set db_modify_time = now() where db_id = 1"
        ]
      },
      "name": "Writer",
      "category": "writer"
    }
  ],
  "setting": {
    "errorLimit": {
      "record": "0"                    // Maximum number of error records allowed.
    },
    "speed": {
      "throttle": true,                // Set to true to enable rate limiting; set to false to disable.
      "concurrent": 1,                 // Number of concurrent jobs.
      "mbps": "12"                     // Maximum transmission rate. 1 mbps = 1 MB/s.
    }
  },
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  }
}

Writer script parameters

ParameterDescriptionRequiredDefault valueExample
datasourceThe data source name. Must match the name of the data source added in the code editor.YesNonemy_vertica_dest
jdbcUrlThe JDBC URL of the destination Vertica database, specified inside the connection block. Only one value is supported — multiple primary databases for the same database instance are not supported (such as in bidirectional data import scenarios).YesNonejdbc:vertica://127.0.0.1:3306/database
usernameThe username for authenticating to the data source.YesNonedbadmin
passwordThe password for the specified username.YesNone********
tableThe destination tables to write to, specified as a JSON array inside the connection block.YesNone["vertica_table"]
columnThe destination columns to write data to, separated by commas.YesNone["id", "name", "age"]
preSqlA SQL statement to run before data is written to the destination table. Use @table as a placeholder for the table name — it is replaced with the actual table name at runtime.NoNone"delete from @table where db_id = -1"
postSqlA SQL statement to run after data is written to the destination table.NoNone"update @table set db_modify_time = now() where db_id = 1"
batchSizeThe number of records committed in each batch. Larger values reduce network round trips and improve throughput, but setting this too high may cause an OOM error.No1024512