The Datagen connector is used for debugging. It generates random test data for source tables, letting you validate business logic during development and testing without connecting to an external data source. It periodically produces random values that match your table schema.
The connector also supports the computed column syntax for flexible data generation.
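As an illustration of the computed column syntax, the following sketch derives two extra columns from generated data. The table name, the column names, and the pass threshold of 60 are hypothetical and chosen only for this example.

```sql
CREATE TABLE datagen_scores (
  name VARCHAR,
  score INT,
  -- Computed column: evaluated from the generated score, not produced by the connector.
  passed AS (score >= 60),
  -- Computed column: processing-time attribute, useful for testing time-based logic.
  proc_time AS PROCTIME()
) WITH (
  'connector' = 'datagen',
  'fields.score.min' = '0',
  'fields.score.max' = '100'
);
```

Only name and score are generated by the connector. The computed columns are evaluated for each row when the table is queried.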
Connector capabilities
| Item | Description |
|---|---|
| Table type | Source table |
| Running mode | Batch mode and streaming mode |
| Data format | N/A |
| Metric | N/A |
| API type | SQL API |
| Data update or deletion in a sink table | N/A |
Limits
Only Realtime Compute for Apache Flink that uses Ververica Runtime (VVR) 2.0.0 or later supports the Datagen connector.
Syntax
CREATE TABLE datagen_source (
name VARCHAR,
score BIGINT
) WITH (
'connector' = 'datagen'
);
Parameters in the WITH clause
| Parameter | Data type | Required | Default value | Description |
|---|---|---|---|---|
| connector | String | Yes | None | Must be set to datagen. |
| rows-per-second | Long | No | 10000 | Rate at which rows are generated, in rows per second. |
| number-of-rows | Long | No | None | Total number of rows to generate. By default, the source is unbounded. If a sequence generator is used, the source becomes bounded automatically after the sequence ends. |
| fields.<field>.kind | String | No | random | Generator type for the field. Valid values: random or sequence. For details, see Generators. |
| fields.<field>.min | Same as <field> data type | No | Minimum value of the field's data type | Minimum value for a random generator. Only applies to numeric types. |
| fields.<field>.max | Same as <field> data type | No | Maximum value of the field's data type | Maximum value for a random generator. Only applies to numeric types. |
| fields.<field>.max-past | Duration | No | 0 | Maximum past offset relative to the current local machine timestamp when generating random timestamp values. Only applies to TIMESTAMP and TIMESTAMP_LTZ types. |
| fields.<field>.length | Integer | No | 100 | Length of the random string, or size of the generated collection. Supported types: CHAR, VARCHAR, BINARY, VARBINARY, STRING, ARRAY, MAP, MULTISET. |
| fields.<field>.start | Same as <field> data type | No | None | Start value for a sequence generator. |
| fields.<field>.end | Same as <field> data type | No | None | End value for a sequence generator. |
Replace <field> with the field name defined in your DDL statement.
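The following sketch shows how the table-level and field-level parameters might be combined. The table name, column names, and all values are hypothetical. It assumes that the row count and rate shown are suitable for a quick test run.

```sql
CREATE TABLE datagen_users (
  user_name VARCHAR,
  age INT
) WITH (
  'connector' = 'datagen',
  -- Generate 5 rows per second instead of the default 10000.
  'rows-per-second' = '5',
  -- Stop after 100 rows, which makes the source bounded.
  'number-of-rows' = '100',
  -- <field> is replaced with the column name user_name: random strings of length 8.
  'fields.user_name.length' = '8',
  -- <field> is replaced with the column name age: random values in the configured range.
  'fields.age.min' = '18',
  'fields.age.max' = '65'
);
```

At 5 rows per second, this source would be expected to finish after roughly 20 seconds.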
Generators
The Datagen connector supports two generator types:
- Random generator: generates random values within the configured range. Specify fields.<field>.min and fields.<field>.max to control the range for numeric types.
- Sequence generator: generates ordered values from a start value to an end value. When the sequence is exhausted, data generation stops and a bounded table is produced. Specify fields.<field>.start and fields.<field>.end to set the range.
The following table lists the generator type supported for each data type.
| Data type | Supported generator | Notes |
|---|---|---|
| BOOLEAN | Random | |
| CHAR | Random and sequence | |
| VARCHAR | Random and sequence | |
| BINARY | Random and sequence | |
| VARBINARY | Random and sequence | |
| STRING | Random and sequence | |
| DECIMAL | Random and sequence | |
| TINYINT | Random and sequence | |
| SMALLINT | Random and sequence | |
| INT | Random and sequence | |
| BIGINT | Random and sequence | |
| FLOAT | Random and sequence | |
| DOUBLE | Random and sequence | |
| DATE | Random | Always resolves to the current date of the local machine. |
| TIME | Random | Always resolves to the current time of the local machine. |
| TIMESTAMP | Random | Generates values within the past offset range relative to the current local machine timestamp. Configure the offset with fields.<field>.max-past. |
| TIMESTAMP_LTZ | Random | Generates values within the past offset range relative to the current local machine timestamp. Configure the offset with fields.<field>.max-past. |
| ROW | Random | Generates random subfields. |
| ARRAY | Random | Generates random elements. |
| MAP | Random | Generates random key-value pairs. |
| MULTISET | Random | Generates random elements. |
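As a sketch of how the random generator might be configured for timestamp and collection types, the following example uses fields.<field>.max-past and fields.<field>.length as described in the parameter table. The table name, column names, and values are hypothetical, and the duration string '1 h' assumes the standard Flink duration format.

```sql
CREATE TABLE datagen_events (
  event_time TIMESTAMP(3),
  tags ARRAY<INT>
) WITH (
  'connector' = 'datagen',
  'rows-per-second' = '1',
  -- Random timestamps no more than one hour before the current local machine time.
  'fields.event_time.max-past' = '1 h',
  -- Each generated array contains 3 random elements.
  'fields.tags.length' = '3'
);
```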
Examples
Minimal configuration
Create a source table with default settings:
CREATE TABLE datagen_source (
name VARCHAR,
score BIGINT
) WITH (
'connector' = 'datagen'
);
Mix sequence and random generators
In most cases, the Datagen connector is used together with the LIKE clause to simulate an existing table. The following example instead defines the schema inline and generates a bounded dataset by using a sequence generator for id and a random generator for score:
CREATE TABLE datagen_source (
id INT,
score INT
) WITH (
'connector' = 'datagen',
'fields.id.kind' = 'sequence',
'fields.id.start' = '1',
'fields.id.end' = '50',
'fields.score.kind' = 'random',
'fields.score.min' = '70',
'fields.score.max' = '100'
);
The sequence generator produces id values from 1 to 50 and then stops, which makes the source bounded. The random generator produces score values between 70 and 100.
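The LIKE clause usage mentioned above can be sketched as follows. This assumes that a table named orders already exists in the current catalog. Both orders and orders_mock are hypothetical names.

```sql
-- Reuse the column definitions of the existing (hypothetical) orders table.
-- EXCLUDING ALL drops the original connector options, so only the
-- Datagen options in the WITH clause apply to the new table.
CREATE TABLE orders_mock
WITH (
  'connector' = 'datagen',
  'rows-per-second' = '10'
)
LIKE orders (EXCLUDING ALL);
```

A query written against orders can then be pointed at orders_mock during testing, without connecting to the external data source.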