Realtime Compute for Apache Flink: Datagen connector

Last Updated: Sep 13, 2023

This topic describes how to use the Datagen connector.

Background information

The Datagen connector is used for debugging. It periodically generates random data that matches the column types declared in the Datagen source table. If you need test data to check business logic during development or testing, you can use the Datagen connector to generate it.

The Datagen connector supports the computed column syntax to generate data in a flexible manner.
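As a minimal sketch of how computed columns can shape the generated data, the following DDL derives two extra columns from randomly generated fields. The table name datagen_users and the column names are hypothetical and are chosen only for illustration.

```sql
-- Hypothetical table: id and score are generated by the connector,
-- user_name and grade are computed from them for each row.
CREATE TEMPORARY TABLE datagen_users (
  id INT,
  score INT,
  -- Computed column: builds a readable name from the random id.
  user_name AS CONCAT('user_', CAST(id AS VARCHAR)),
  -- Computed column: maps the random score to a coarse grade.
  grade AS CASE WHEN score >= 90 THEN 'A' ELSE 'B' END
) WITH (
  'connector' = 'datagen',
  'fields.id.min' = '1',
  'fields.id.max' = '1000',
  'fields.score.min' = '0',
  'fields.score.max' = '100'
);
```

Because the computed columns are evaluated on top of the generated values, no fields.<field>.* parameters are set for them.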

The following table describes the capabilities supported by the Datagen connector.

| Item | Description |
| --- | --- |
| Table type | Source table |
| Running mode | Batch mode and streaming mode |
| Data format | N/A |
| Metric | N/A |
| API type | SQL API |
| Data update or deletion in a result table | N/A |

Limits

The Datagen connector is supported only in Realtime Compute for Apache Flink that uses Ververica Runtime (VVR) 2.0.0 or later.

Syntax

CREATE TABLE datagen_source (
  name VARCHAR,
  score BIGINT
) WITH (
  'connector' = 'datagen'
);

Parameters in the WITH clause

| Parameter | Description | Data type | Required | Default value | Remarks |
| --- | --- | --- | --- | --- | --- |
| connector | The type of the source table. | STRING | Yes | No default value | Set the value to datagen. |
| rows-per-second | The rate at which random data is generated. | LONG | No | 10000 (rows of data per second) | N/A |
| number-of-rows | The total number of rows of data that can be generated. | LONG | No | No default value | If this parameter is not specified, the source table is unbounded. If any field uses a sequence generator, the source table is bounded, and data generation ends when the sequence of a field reaches its end value. |
| fields.<field>.kind | The type of the generator that generates data for <field>. | STRING | No | random | Valid values: random (a random generator) and sequence (a sequence generator). For more information about generators, see Generators. |
| fields.<field>.min | The minimum random value that can be generated. | Same as the data type of <field> | No | Minimum value for the data type of <field> | This parameter takes effect when the fields.<field>.kind parameter is set to random. Only numeric data types are supported. |
| fields.<field>.max | The maximum random value that can be generated. | Same as the data type of <field> | No | Maximum value for the data type of <field> | This parameter takes effect when the fields.<field>.kind parameter is set to random. Only numeric data types are supported. |
| fields.<field>.max-past | The maximum offset into the past, relative to the current timestamp of the local machine, for a randomly generated timestamp. | DURATION | No | 0 | Only timestamp types are supported. |
| fields.<field>.length | The length of the generated random string, or the number of elements in the generated collection. | INTEGER | No | 100 | Supported data types: CHAR, VARCHAR, BINARY, VARBINARY, STRING, ARRAY, MAP, and MULTISET. |
| fields.<field>.start | The start value of the sequence generator. | Same as the data type of <field> | No | No default value | This parameter takes effect when the fields.<field>.kind parameter is set to sequence. |
| fields.<field>.end | The end value of the sequence generator. | Same as the data type of <field> | No | No default value | This parameter takes effect when the fields.<field>.kind parameter is set to sequence. |

Note

Replace <field> in the parameter with the name of the field that you defined in the DDL statement.
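The following sketch shows how several of these parameters might be combined in one table. The table name datagen_events, its columns, and all values are hypothetical. The duration literal '1 h' assumes the standard Flink duration format.

```sql
-- Hypothetical table that emits 5 rows per second and stops after 100 rows.
CREATE TEMPORARY TABLE datagen_events (
  event_id BIGINT,
  user_name VARCHAR,
  amount DOUBLE,
  event_time TIMESTAMP(3)
) WITH (
  'connector' = 'datagen',
  'rows-per-second' = '5',              -- generation rate
  'number-of-rows' = '100',             -- makes the source bounded
  'fields.event_id.kind' = 'random',    -- default kind, stated explicitly
  'fields.event_id.min' = '1',
  'fields.event_id.max' = '1000',
  'fields.user_name.length' = '8',      -- random strings of length 8
  'fields.amount.min' = '0',
  'fields.amount.max' = '500',
  'fields.event_time.max-past' = '1 h'  -- timestamps up to one hour in the past
);
```

Each fields.<field>.* key uses the column name exactly as it is declared in the DDL statement, for example fields.user_name.length for the user_name column.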

Generators

For each field, the Datagen connector uses one of the following types of generators to generate data:

  • Random generator: generates random values. You can specify the maximum and minimum values for data that is randomly generated.

  • Sequence generator: generates ordered values within a specific range. When the generated sequence reaches the end value, the data generation process ends. Therefore, if a sequence generator is used, a bounded table is generated. You can specify the start and end values of the sequence.

The following table describes the generator types that are supported for each data type.

| Data type | Supported generator | Remarks |
| --- | --- | --- |
| BOOLEAN | Random | N/A |
| CHAR | Random and sequence | N/A |
| VARCHAR | Random and sequence | N/A |
| BINARY | Random and sequence | N/A |
| VARBINARY | Random and sequence | N/A |
| STRING | Random and sequence | N/A |
| DECIMAL | Random and sequence | N/A |
| TINYINT | Random and sequence | N/A |
| SMALLINT | Random and sequence | N/A |
| INT | Random and sequence | N/A |
| BIGINT | Random and sequence | N/A |
| FLOAT | Random and sequence | N/A |
| DOUBLE | Random and sequence | N/A |
| DATE | Random | Uses the current date of the local machine. |
| TIME | Random | Uses the current time of the local machine. |
| TIMESTAMP | Random | Generates values within the maximum past time range relative to the current timestamp of the local machine. |
| TIMESTAMP_LTZ | Random | Generates values within the maximum past time range relative to the current timestamp of the local machine. |
| ROW | Random | Generates random subfields. |
| ARRAY | Random | Generates random elements. |
| MAP | Random | Generates random key-value pairs. |
| MULTISET | Random | Generates random elements. |
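As an illustration of the composite types listed above, the following sketch declares ROW, ARRAY, and MAP columns and limits the size of the generated collections. The table and column names are hypothetical, and the sketch assumes that the runtime version in use accepts fields.<field>.length for ARRAY and MAP columns, as the parameter description states.

```sql
-- Hypothetical table with composite columns; all values are random.
CREATE TEMPORARY TABLE datagen_nested (
  buyer ROW<first_name VARCHAR, last_name VARCHAR>,  -- random subfields
  tags ARRAY<INT>,                                   -- random elements
  attrs MAP<VARCHAR, INT>                            -- random key-value pairs
) WITH (
  'connector' = 'datagen',
  'number-of-rows' = '10',
  'fields.tags.length' = '3',   -- assumed: 3 elements per array
  'fields.attrs.length' = '2'   -- assumed: 2 entries per map
);
```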

Example

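In most cases, the Datagen connector is used together with the LIKE clause to simulate a physical table. The following sample code defines a standalone Datagen source table in which the id field is generated as a sequence from 1 to 50 and the score field is a random value between 70 and 100: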

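CREATE TABLE datagen_source (
  id INT,
  score INT
) WITH (
  'connector' = 'datagen',
  -- id: sequence generator that runs from 1 to 50, so the table is bounded.
  'fields.id.kind' = 'sequence',
  'fields.id.start' = '1',
  'fields.id.end' = '50',
  -- score: random generator limited to the range 70 to 100.
  'fields.score.kind' = 'random',
  'fields.score.min' = '70',
  'fields.score.max' = '100'
);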
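The LIKE usage mentioned above could look like the following sketch, which assumes standard Flink SQL CREATE TABLE ... LIKE syntax. The orders table, its columns, and the Kafka option values are hypothetical placeholders. Only the second statement matters for the pattern.

```sql
-- Hypothetical physical table; the Kafka options are placeholders.
CREATE TEMPORARY TABLE orders (
  order_id BIGINT,
  price DECIMAL(10, 2),
  order_time TIMESTAMP(3)
) WITH (
  'connector' = 'kafka',
  'topic' = 'orders',
  'properties.bootstrap.servers' = 'localhost:9092',
  'format' = 'json'
);

-- Mock table: reuses the column definitions of orders, drops its
-- connector options, and generates 10 random rows instead.
CREATE TEMPORARY TABLE orders_mock
WITH (
  'connector' = 'datagen',
  'number-of-rows' = '10'
)
LIKE orders (EXCLUDING ALL);
```

A query under test can then read from orders_mock instead of orders, so the business logic can be checked without access to the real data source.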