This topic describes how to use the Faker connector.

Background information

The Faker connector is a built-in connector of fully managed Flink. It generates test data from the Java Faker expression that you specify for each field of a table. We recommend that you use the Faker connector when you need test data to verify business logic during draft development or deployment testing.

The following table describes the capabilities supported by the Faker connector.
| Item | Description |
|---|---|
| Table type | Source table and dimension table |
| Running mode | Batch mode and streaming mode |
| Data format | N/A |
| Metric | N/A |
| API type | SQL API |
| Data update or deletion in a result table | N/A |
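
Batch mode requires a bounded source. As a minimal sketch, the following statements set the number-of-rows parameter so that the Faker source stops after a fixed number of rows. The table name faker_batch_source and the row count are hypothetical, and selecting batch mode with the open-source Flink SQL SET statement is an assumption; in the console, the running mode may instead be chosen when you create the draft or deployment.

```sql
-- Assumption: batch mode is selected with the open-source Flink SQL SET statement.
SET 'execution.runtime-mode' = 'batch';

-- Hypothetical table name. number-of-rows makes the source bounded,
-- so the source finishes after 1000 rows are generated.
CREATE TEMPORARY TABLE faker_batch_source (
  `name` STRING,
  `age` INT
) WITH (
  'connector' = 'faker',
  'number-of-rows' = '1000',
  'fields.name.expression' = '#{superhero.name}',
  'fields.age.expression' = '#{number.numberBetween ''0'',''1000''}'
);

-- An aggregation that returns a single final result in batch mode.
SELECT COUNT(*) AS row_count FROM faker_batch_source;
```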

Prerequisites

N/A

Limits

  • Only Realtime Compute for Apache Flink that uses Ververica Runtime (VVR) 4.0.12 or later supports the Faker connector.
  • The Faker connector supports only the following data types: CHAR(n), VARCHAR(n), STRING, TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL, BOOLEAN, TIMESTAMP, ARRAY, MAP, MULTISET, and ROW.
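
To show how expressions map to some of the supported data types, the following sketch declares BOOLEAN, BIGINT, and TIMESTAMP fields. The table name, field names, and the Java Faker methods bool.bool, number.randomNumber, and date.past are illustrative choices. It is an assumption that the connector converts the value returned by date.past into a TIMESTAMP value, so verify this in your VVR version.

```sql
-- Hypothetical table that exercises several supported data types.
CREATE TEMPORARY TABLE faker_types (
  `is_active` BOOLEAN,
  `account_id` BIGINT,
  `created_at` TIMESTAMP(3)
) WITH (
  'connector' = 'faker',
  -- Bool.bool() returns true or false.
  'fields.is_active.expression' = '#{bool.bool}',
  -- Number.randomNumber() returns a random long value.
  'fields.account_id.expression' = '#{number.randomNumber}',
  -- DateAndTime.past(atMost, unit): a date at most 15 seconds in the past.
  -- Assumption: the connector converts this value to TIMESTAMP.
  'fields.created_at.expression' = '#{date.past ''15'',''SECONDS''}'
);
```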

Syntax

CREATE TABLE faker_source (
  `name` STRING,
  `age` INT
) WITH (
  'connector' = 'faker',
  'fields.name.expression' = '#{superhero.name}',
  'fields.age.expression' = '#{number.numberBetween ''0'',''1000''}'
);

Parameters in the WITH clause

| Category | Parameter | Description | Data type | Required | Default value | Remarks |
|---|---|---|---|---|---|---|
| Common parameters | connector | The connector type. | STRING | Yes | No default value | Set the value to faker. |
| Common parameters | fields.<field>.expression | The Java Faker expression that generates the values of the field. | STRING | Yes | No default value | For more information, see Field expression. |
| Common parameters | fields.<field>.null-rate | The proportion of generated values of the field that are null. | FLOAT | No | 0.0 | For example, 0.05 means that about 5% of the values are null. |
| Common parameters | fields.<field>.length | The size of the collection that is generated for a field of the ARRAY, MAP, or MULTISET data type. | INTEGER | No | 1 | N/A |
| Parameters only for source tables | number-of-rows | The total number of rows to generate. | INTEGER | No | -1 | If you configure this parameter, the source table is bounded. If you do not configure this parameter, the source table is unbounded. |
| Parameters only for source tables | rows-per-second | The rate at which data is generated. | INTEGER | No | 10000 | Unit: rows per second. |
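
The following sketch combines the optional parameters in the table above. The table name, the ARRAY field, and the chosen values are hypothetical. It also assumes that, for an ARRAY field, the expression generates each element and fields.<field>.length sets the number of elements. Because number-of-rows is not set, this source is unbounded.

```sql
-- Hypothetical table that combines the optional parameters.
CREATE TEMPORARY TABLE faker_params_demo (
  `name` STRING,
  `powers` ARRAY<STRING>
) WITH (
  'connector' = 'faker',
  -- Generate 100 rows per second instead of the default 10000.
  'rows-per-second' = '100',
  'fields.name.expression' = '#{superhero.name}',
  -- About 10% of the generated name values are NULL.
  'fields.name.null-rate' = '0.1',
  -- Assumption: the expression generates each array element,
  -- and length sets the number of elements (3 instead of the default 1).
  'fields.powers.expression' = '#{superhero.power}',
  'fields.powers.length' = '3'
);
```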

Sample code

  • Sample code for a source table
    CREATE TEMPORARY TABLE heros_source (
      `name` STRING,
      `power` STRING,
      `age` INT
    ) WITH (
      'connector' = 'faker',
      'fields.name.expression' = '#{superhero.name}',
      'fields.power.expression' = '#{superhero.power}',
      'fields.power.null-rate' = '0.05',
      'fields.age.expression' = '#{number.numberBetween ''0'',''1000''}'
    );
    
    CREATE TEMPORARY TABLE blackhole_sink (
      `name` STRING,
      `power` STRING,
      `age` INT
    ) WITH (
      'connector' = 'blackhole'
    );
    
    INSERT INTO blackhole_sink SELECT * FROM heros_source;
  • Sample code for a dimension table
    CREATE TEMPORARY TABLE datagen_source (
      `character_id` INT,
      `location` STRING,
      `proctime` AS PROCTIME()
    ) WITH (
      'connector' = 'datagen'
    );
    
    CREATE TEMPORARY TABLE faker_dim (
      `character_id` INT,
      `name` STRING
    ) WITH (
      'connector' = 'faker',
      'fields.character_id.expression' = '#{number.numberBetween ''0'',''100''}',
      'fields.name.expression' = '#{harry_potter.characters}'
    );
    
    SELECT
      c.character_id,
      l.location,
      c.name
    FROM datagen_source AS l
    JOIN faker_dim FOR SYSTEM_TIME AS OF l.proctime AS c
    ON l.character_id = c.character_id;
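
The dimension table sample runs a SELECT query. As a sketch, the same lookup join can write its results to a sink table instead, reusing the datagen_source and faker_dim tables from that sample. The sink name join_sink is hypothetical; it uses the blackhole connector from the source table sample, which discards all rows it receives.

```sql
-- Hypothetical sink for the join results; the blackhole connector discards all rows.
CREATE TEMPORARY TABLE join_sink (
  `character_id` INT,
  `location` STRING,
  `name` STRING
) WITH (
  'connector' = 'blackhole'
);

-- Processing-time lookup join against the Faker dimension table.
INSERT INTO join_sink
SELECT
  c.character_id,
  l.location,
  c.name
FROM datagen_source AS l
JOIN faker_dim FOR SYSTEM_TIME AS OF l.proctime AS c
ON l.character_id = c.character_id;
```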

Field expression

  • Usage
    When you use the Faker connector, you must define an expression in the WITH clause for each field in the DDL statement. The expression uses the 'fields.<field>.expression' = '#{className.methodName ''parameter'', ...}' format. The following table describes the parameters in the expression.

    | Parameter | Description |
    |---|---|
    | field | The name of a field in the DDL statement. |
    | className | The name of a Faker class. Java Faker provides about 80 Faker classes. Select a class based on the data that you need. Note: Class names are not case-sensitive. |
    | methodName | The name of a method of the Faker class. Note: Method names are not case-sensitive. |
    | parameter | The input parameters of the method. Note: Enclose each input parameter in single quotation marks (') and separate multiple input parameters with commas (,). Because the expression is itself a SQL string literal, each of these single quotation marks must be escaped as two single quotation marks, as in ''0'',''1000''. |
  • Example
    This example shows how to derive the expression for a field from the Java Faker API documentation. It uses the expression 'fields.age.expression' = '#{number.numberBetween ''0'',''1000''}' for the age field in Syntax.
    1. Find the Number class in the Java Faker API documentation.
    2. Find the numberBetween method in the Number class and read the method description.

      The numberBetween method returns a random number between the lower and upper bounds that you pass as parameters.

    3. Combine the class name (number), the method name (numberBetween), and the bounds 0 and 1000 into the expression 'fields.age.expression' = '#{number.numberBetween ''0'',''1000''}' for the age field.

      This expression generates values for the age field from 0 (inclusive) to 1000 (exclusive). According to the Java Faker documentation of numberBetween, the upper bound is not included.
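
Applying the same steps to other classes and methods gives expressions with zero, one, or several parameters. The following sketch is illustrative: the table name and fields are hypothetical, and the expressions assume the Java Faker methods Internet.emailAddress(), Number.digits(int count), and Number.randomDouble(int maxNumberOfDecimals, long min, long max). Each parameter is written as ''value'' because the expression sits inside a SQL string literal.

```sql
-- Hypothetical table; each expression follows the
-- #{className.methodName ''parameter'', ...} format.
CREATE TEMPORARY TABLE faker_expressions (
  `email` STRING,
  `zip_code` STRING,
  `score` DOUBLE
) WITH (
  'connector' = 'faker',
  -- No parameters: Internet.emailAddress().
  'fields.email.expression' = '#{internet.emailAddress}',
  -- One parameter: Number.digits(count) returns a string of 5 random digits.
  'fields.zip_code.expression' = '#{number.digits ''5''}',
  -- Three comma-separated parameters: at most 2 decimal places,
  -- with the value between 0 and 100.
  'fields.score.expression' = '#{number.randomDouble ''2'',''0'',''100''}'
);
```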