Realtime Compute for Apache Flink: Datagen

Last Updated: Mar 26, 2026

The Datagen connector is a source connector intended for debugging. It continuously generates test data that matches your table schema, using random values by default, so you can validate business logic during development and testing without connecting to an external data source.

The connector also supports the computed column syntax for flexible data generation.
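As a minimal sketch of how computed columns can shape the generated data, the following table derives two extra columns from generated fields. The table name datagen_users, the column names, and the expressions are hypothetical and chosen only for illustration.

```sql
-- Hypothetical table: id and score are generated by the connector,
-- user_name and grade are computed from them for every row.
CREATE TABLE datagen_users (
  id INT,
  score INT,
  -- Computed column: builds a readable name from the generated id.
  user_name AS CONCAT('user_', CAST(id AS VARCHAR)),
  -- Computed column: maps the random score to a coarse label.
  grade AS CASE WHEN score >= 90 THEN 'A' ELSE 'B' END
) WITH (
  'connector' = 'datagen',
  'fields.id.kind' = 'sequence',
  'fields.id.start' = '1',
  'fields.id.end' = '20',
  'fields.score.min' = '60',
  'fields.score.max' = '100'
);
```

Only id and score accept fields.<field>.* options here. The computed columns are evaluated from the generated values and take no generator settings of their own.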

Connector capabilities

| Item | Description |
| --- | --- |
| Table type | Source table |
| Running mode | Batch mode and streaming mode |
| Data format | N/A |
| Metric | N/A |
| API type | SQL API |
| Data update or deletion in a sink table | N/A |

Limits

Only Realtime Compute for Apache Flink that uses Ververica Runtime (VVR) 2.0.0 or later supports the Datagen connector.

Syntax

CREATE TABLE datagen_source (
  name VARCHAR,
  score BIGINT
) WITH (
  'connector' = 'datagen'
);

Parameters in the WITH clause

| Parameter | Data type | Required | Default value | Description |
| --- | --- | --- | --- | --- |
| connector | String | Yes | None | Must be set to datagen. |
| rows-per-second | Long | No | 10000 | Rate at which rows are generated, in rows per second. |
| number-of-rows | Long | No | None | Total number of rows to generate. By default, the source is unbounded. If a sequence generator is used, the source becomes bounded automatically after the sequence ends. |
| fields.<field>.kind | String | No | random | Generator type for the field. Valid values: random or sequence. For details, see Generators. |
| fields.<field>.min | Same as <field> data type | No | Minimum value of the field's data type | Minimum value for a random generator. Only applies to numeric types. |
| fields.<field>.max | Same as <field> data type | No | Maximum value of the field's data type | Maximum value for a random generator. Only applies to numeric types. |
| fields.<field>.max-past | Duration | No | 0 | Maximum past offset relative to the current local machine timestamp when generating random timestamp values. Only applies to TIMESTAMP and TIMESTAMP_LTZ types. |
| fields.<field>.length | Integer | No | 100 | Length of the random string, or size of the generated collection. Supported types: CHAR, VARCHAR, BINARY, VARBINARY, STRING, ARRAY, MAP, MULTISET. |
| fields.<field>.start | Same as <field> data type | No | None | Start value for a sequence generator. |
| fields.<field>.end | Same as <field> data type | No | None | End value for a sequence generator. |

Replace <field> with the field name defined in your DDL statement.
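The parameters above can be combined in one WITH clause. The following sketch assumes a hypothetical table datagen_events and illustrative values. Whether fields.<field>.max-past is available may depend on your VVR version.

```sql
-- Hypothetical bounded source: 100 random rows emitted at 5 rows per second.
CREATE TABLE datagen_events (
  event_id VARCHAR,
  amount DOUBLE,
  event_time TIMESTAMP(3)
) WITH (
  'connector' = 'datagen',
  'rows-per-second' = '5',                 -- throttle the generation rate
  'number-of-rows' = '100',                -- stop after 100 rows (bounded source)
  'fields.event_id.length' = '8',          -- random strings of length 8
  'fields.amount.min' = '0',               -- lower bound for the random DOUBLE
  'fields.amount.max' = '500',             -- upper bound for the random DOUBLE
  'fields.event_time.max-past' = '1 h'     -- timestamps within the past hour
);
```

At 5 rows per second, the 100 rows should take roughly 20 seconds to emit. Lowering rows-per-second in this way can make the output easier to inspect while debugging.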

Generators

The Datagen connector supports two generator types:

  • Random generator: generates random values within the configured range. Specify fields.<field>.min and fields.<field>.max to control the range for numeric types.

  • Sequence generator: generates ordered values from a start value to an end value. When the sequence is exhausted, data generation stops and a bounded table is produced. Specify fields.<field>.start and fields.<field>.end to set the range.

The following table lists the generator type supported for each data type.

| Data type | Supported generator | Notes |
| --- | --- | --- |
| BOOLEAN | Random | |
| CHAR | Random and sequence | |
| VARCHAR | Random and sequence | |
| BINARY | Random and sequence | |
| VARBINARY | Random and sequence | |
| STRING | Random and sequence | |
| DECIMAL | Random and sequence | |
| TINYINT | Random and sequence | |
| SMALLINT | Random and sequence | |
| INT | Random and sequence | |
| BIGINT | Random and sequence | |
| FLOAT | Random and sequence | |
| DOUBLE | Random and sequence | |
| DATE | Random | Always resolves to the current date of the local machine. |
| TIME | Random | Always resolves to the current time of the local machine. |
| TIMESTAMP | Random | Generates values within the past offset range relative to the current local machine timestamp. Configure the offset with fields.<field>.max-past. |
| TIMESTAMP_LTZ | Random | Generates values within the past offset range relative to the current local machine timestamp. Configure the offset with fields.<field>.max-past. |
| ROW | Random | Generates random subfields. |
| ARRAY | Random | Generates random elements. |
| MAP | Random | Generates random key-value pairs. |
| MULTISET | Random | Generates random elements. |
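
To illustrate the composite types in the table, the following sketch declares ARRAY, MAP, and ROW columns and sets the collection size with fields.<field>.length. The table name datagen_nested and its columns are hypothetical.

```sql
-- Hypothetical table with composite columns; all values are generated randomly.
CREATE TABLE datagen_nested (
  tags ARRAY<INT>,
  attrs MAP<VARCHAR, INT>,
  buyer ROW<first_name VARCHAR, last_name VARCHAR>
) WITH (
  'connector' = 'datagen',
  'number-of-rows' = '10',       -- keep the sample small and bounded
  'fields.tags.length' = '3',    -- each array holds 3 random elements
  'fields.attrs.length' = '2'    -- each map is generated with 2 random entries
);
```

Composite types support only the random generator, so fields.<field>.kind is left at its default here.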

Examples

Minimal configuration

Create a source table with default settings:

CREATE TABLE datagen_source (
  name VARCHAR,
  score BIGINT
) WITH (
  'connector' = 'datagen'
);
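
One way to inspect the generated rows is to write them to a debugging sink. The sketch below assumes that the print connector is available in your environment. The sink table name print_sink is hypothetical.

```sql
-- Assumption: the print connector is available for debugging output.
CREATE TABLE print_sink (
  name VARCHAR,
  score BIGINT
) WITH (
  'connector' = 'print'
);

-- Copy every generated row from the Datagen source into the sink.
INSERT INTO print_sink
SELECT name, score
FROM datagen_source;
```

With the default settings, the source is unbounded and generates 10000 rows per second, so consider setting number-of-rows or lowering rows-per-second before running such a job.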

Mix sequence and random generators

In most cases, the Datagen connector is used together with the LIKE clause to simulate an existing table. The following example instead declares the schema directly and generates a bounded dataset, with a sequence generator for id and a random generator for score:

CREATE TABLE datagen_source (
  id INT,
  score INT
) WITH (
  'connector' = 'datagen',
  'fields.id.kind' = 'sequence',
  'fields.id.start' = '1',
  'fields.id.end' = '50',
  'fields.score.kind' = 'random',
  'fields.score.min' = '70',
  'fields.score.max' = '100'
);

The sequence generator produces id values from 1 to 50 and then stops, so the source is bounded and emits 50 rows in total. The random generator produces score values between 70 and 100.
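
The LIKE clause mentioned earlier can be sketched as follows. The example assumes that a table named orders already exists in the current catalog. The names orders and orders_mock are hypothetical.

```sql
-- Assumption: a table named orders already exists, for example with columns
-- order_id BIGINT, price DECIMAL(10, 2), and buyer VARCHAR.
-- orders_mock reuses its physical columns but reads from Datagen instead.
CREATE TEMPORARY TABLE orders_mock
WITH (
  'connector' = 'datagen',
  'number-of-rows' = '10'
)
-- EXCLUDING ALL keeps the physical columns but drops the original table's
-- connector options, along with other inherited features such as
-- constraints and watermarks.
LIKE orders (EXCLUDING ALL);
```

Queries written against orders can then be pointed at orders_mock during testing without repeating the column definitions.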