This topic provides the DDL syntax that is used to create a Faker dimension table, describes the limits on the use of the Faker connector and the parameters in the WITH clause, and provides sample code.

What is the Faker connector?

Faker is a built-in connector of fully managed Flink. The connector generates test data based on the Java Faker expression that you provide for each field in a table. If you need test data to verify business logic during job development or testing, we recommend that you use the Faker connector.

Limits

  • The Faker connector is supported only by Flink that uses Ververica Runtime (VVR) 4.0.12 or later.
  • The Faker connector supports the following data types: CHAR(n), VARCHAR(n), STRING, TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL, BOOLEAN, TIMESTAMP, ARRAY, MAP, MULTISET, and ROW.

DDL syntax

CREATE TABLE faker_dim (
  `name` STRING,
  `age` INT
) WITH (
  'connector' = 'faker',
  'fields.name.expression' = '#{superhero.name}',
  'fields.age.expression' = '#{number.numberBetween ''0'',''1000''}'
);

Parameters in the WITH clause

Parameter | Description | Data type | Required | Remarks
connector | The type of the dimension table. | STRING | Yes | Set the value to faker.
fields.<field>.expression | The Java Faker expression that is used to generate the value of the field. | STRING | Yes | For more information, see the description of field expressions in the following section.
fields.<field>.null-rate | The rate at which the value of the field is null. | FLOAT | No | Default value: 0.0.
fields.<field>.length | The length of the field of the ARRAY, MAP, or MULTISET data type. | INTEGER | No | Default value: 1.
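
The optional parameters can be combined with the required ones in a single WITH clause. The following statement is a minimal sketch, not a definitive reference: the table name faker_options_demo and its fields are hypothetical, the superhero.prefix and harry_potter.location expressions are taken from the Java Faker Superhero and HarryPotter classes, and it assumes that the expression of an ARRAY field generates each element of the array.

```sql
-- Hypothetical table and field names, used for illustration only.
CREATE TEMPORARY TABLE faker_options_demo (
  `name` STRING,
  `nickname` STRING,
  `locations` ARRAY<STRING>
) WITH (
  'connector' = 'faker',
  'fields.name.expression' = '#{superhero.name}',
  'fields.nickname.expression' = '#{superhero.prefix}',
  -- Roughly 20% of the generated nickname values are expected to be null.
  'fields.nickname.null-rate' = '0.2',
  -- Assumption: for an ARRAY field, the expression generates each element.
  'fields.locations.expression' = '#{harry_potter.location}',
  -- Each generated array is expected to contain three elements.
  'fields.locations.length' = '3'
);
```

Fields that omit null-rate and length use the default values 0.0 and 1.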

Field expression

  • Operation
    When you use the Faker connector, you must define an expression in the WITH clause for each field in the DDL statement. The expression is in the 'fields.<field>.expression' = '#{className.methodName ''parameter'', ...}' format. The following table describes the parameters in the expression.

    Parameter | Description
    field | The name of a field in the DDL statement.
    className | The name of a Faker class. Java Faker provides about 80 Faker classes that you can use in field expressions. You can select the classes based on your business requirements. Note: Faker class names are not case-sensitive.
    methodName | The name of a method. Note: Method names are not case-sensitive.
    parameter | The input parameters of a method. Note: Each input parameter of a method must be enclosed in single quotation marks ('). Separate multiple input parameters with commas (,).
  • Example
    This example shows how to write the expression for a field in a DDL statement based on the Java Faker API documentation. It uses the 'fields.age.expression' = '#{number.numberBetween ''0'',''1000''}' expression that is defined for the age field in the DDL syntax section.
    1. Find the Number class in the Java Faker API documentation.
    2. Find the numberBetween method in the Number class and view the method description.

      The numberBetween method returns a number that falls between the two values that are passed as input parameters.

    3. Combine the class name, the method name, and the input parameters 0 and 1000 to obtain the expression 'fields.age.expression' = '#{number.numberBetween ''0'',''1000''}' for the age field.

      This expression indicates that the generated value of the age field is in the range of 0 to 1000.
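
The same procedure applies to methods that take no input parameters or more than two. The following statement is a sketch under stated assumptions: the table name faker_expr_demo and its fields are hypothetical, and the fullName method of the Name class and the randomDouble method of the Number class are taken from the Java Faker API rather than from this topic. Inside the SQL string literal, each single quotation mark around an input parameter is written as two single quotation marks.

```sql
-- Hypothetical table and field names, used for illustration only.
CREATE TEMPORARY TABLE faker_expr_demo (
  `full_name` STRING,
  `score` DOUBLE,
  `age` INT
) WITH (
  'connector' = 'faker',
  -- No input parameters: class Name, method fullName.
  'fields.full_name.expression' = '#{name.fullName}',
  -- Three input parameters: maximum number of decimal places, minimum, maximum.
  'fields.score.expression' = '#{number.randomDouble ''2'',''0'',''100''}',
  -- Two input parameters, as in the preceding example.
  'fields.age.expression' = '#{number.numberBetween ''0'',''1000''}'
);
```

Because class names and method names are not case-sensitive, '#{Name.fullName}' is expected to be resolved in the same way as '#{name.fullName}'.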

Sample code

CREATE TEMPORARY TABLE datagen_source (
  `character_id` INT,
  `location` STRING,
  `proctime` AS PROCTIME()
) WITH (
  'connector' = 'datagen'
);

CREATE TEMPORARY TABLE faker_dim (
  `character_id` INT,
  `name` STRING
) WITH (
  'connector' = 'faker',
  'fields.character_id.expression' = '#{number.numberBetween ''0'',''100''}',
  'fields.name.expression' = '#{harry_potter.characters}'
);

SELECT
  c.character_id,
  l.location,
  c.name
FROM datagen_source AS l
JOIN faker_dim FOR SYSTEM_TIME AS OF l.proctime AS c
ON l.character_id = c.character_id;