This topic describes how to use the Faker connector.
Background information
The Faker connector is a built-in connector of fully managed Flink. The connector generates test data based on Java Faker expressions that are provided for each field in a table. If you want to use test data to verify the business logic during draft development or deployment testing, we recommend that you use the Faker connector.
Item | Description |
---|---|
Table type | Source table and dimension table |
Running mode | Batch mode and streaming mode |
Data format | N/A |
Metric | N/A |
API type | SQL API |
Data update or deletion in a result table | N/A |
Prerequisites
N/A
Limits
- Only Realtime Compute for Apache Flink that uses Ververica Runtime (VVR) 4.0.12 or later supports the Faker connector.
- The Faker connector supports only the following data types: CHAR(n), VARCHAR(n), STRING, TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL, BOOLEAN, TIMESTAMP, ARRAY, MAP, MULTISET, and ROW.
Syntax
CREATE TABLE faker_source (
`name` STRING,
`age` INT
) WITH (
'connector' = 'faker',
'fields.name.expression' = '#{superhero.name}',
'fields.age.expression' = '#{number.numberBetween ''0'',''1000''}'
);
Parameters in the WITH clause
Category | Parameter | Description | Data type | Required | Default value | Remarks |
---|---|---|---|---|---|---|
Common parameters | connector | The type of the table. | STRING | Yes | No default value | Set the value to faker. |
Common parameters | fields.<field>.expression | The Java Faker expression that generates the value of the field. | STRING | Yes | No default value | For more information, see Field expression. |
Common parameters | fields.<field>.null-rate | The fraction of generated values for this field that are null. | FLOAT | No | 0.0 | N/A |
Common parameters | fields.<field>.length | The number of elements generated for a field of the ARRAY, MAP, or MULTISET data type. | INTEGER | No | 1 | N/A |
Parameters only for source tables | number-of-rows | The number of rows of data that is generated. | INTEGER | No | -1 | If you configure this parameter, the source table is bounded. If you do not configure this parameter, the source table is unbounded. |
Parameters only for source tables | rows-per-second | The rate at which random data is generated. | INTEGER | No | 10000 | Unit: records per second. |
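As an illustration of the optional parameters above, the following sketch defines a bounded source table. The table name, field names, and parameter values are hypothetical. The way the expression is applied to each element of the ARRAY field is an assumption based on the parameter descriptions, not confirmed behavior.

```sql
-- Hypothetical table: names and values are for illustration only.
CREATE TEMPORARY TABLE faker_bounded_source (
  `name` STRING,
  `power` STRING,
  `friends` ARRAY<STRING>
) WITH (
  'connector' = 'faker',
  'fields.name.expression' = '#{superhero.name}',
  'fields.power.expression' = '#{superhero.power}',
  -- Roughly 10% of the generated power values are expected to be NULL.
  'fields.power.null-rate' = '0.1',
  -- Assumption: the expression is evaluated once per array element.
  'fields.friends.expression' = '#{superhero.name}',
  'fields.friends.length' = '3',
  -- Configuring number-of-rows makes the source bounded.
  'number-of-rows' = '100',
  -- Throttle generation to about 10 rows per second.
  'rows-per-second' = '10'
);
```

With these settings, the source should stop after 100 rows, which are emitted at about 10 rows per second.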
Sample code
- Sample code for a source table
CREATE TEMPORARY TABLE heros_source (
  `name` STRING,
  `power` STRING,
  `age` INT
) WITH (
  'connector' = 'faker',
  'fields.name.expression' = '#{superhero.name}',
  'fields.power.expression' = '#{superhero.power}',
  'fields.power.null-rate' = '0.05',
  'fields.age.expression' = '#{number.numberBetween ''0'',''1000''}'
);

CREATE TEMPORARY TABLE blackhole_sink (
  `name` STRING,
  `power` STRING,
  `age` INT
) WITH (
  'connector' = 'blackhole'
);

INSERT INTO blackhole_sink
SELECT * FROM heros_source;
- Sample code for a dimension table
CREATE TEMPORARY TABLE datagen_source (
  `character_id` INT,
  `location` STRING,
  `proctime` AS PROCTIME()
) WITH (
  'connector' = 'datagen'
);

CREATE TEMPORARY TABLE faker_dim (
  `character_id` INT,
  `name` STRING
) WITH (
  'connector' = 'faker',
  'fields.character_id.expression' = '#{number.numberBetween ''0'',''100''}',
  'fields.name.expression' = '#{harry_potter.characters}'
);

SELECT
  c.character_id,
  l.location,
  c.name
FROM datagen_source AS l
JOIN faker_dim FOR SYSTEM_TIME AS OF proctime AS c
ON l.character_id = c.character_id;
Field expression
- Operation
When you use the Faker connector, you must define an expression in the WITH clause for each field in the DDL statement. The expression is in the 'fields.<field>.expression' = '#{className.methodName ''parameter'', ...}' format. The following table describes the parameters in the expression.
Parameter | Description |
---|---|
field | The name of a field in the DDL statement. |
className | The name of a Faker class. Java Faker provides about 80 Faker classes to generate field expressions. You can select the classes based on your business requirements. Note: Faker class names are not case-sensitive. |
methodName | The name of a method. Note: Method names are not case-sensitive. |
parameter | The input parameters of a method. Note: Each input parameter of a method must be enclosed in single quotation marks ('). Separate multiple input parameters with commas (,). |
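The quoting rules can be easier to see in a concrete statement. Because the whole expression is itself an SQL string literal, each single quotation mark around a parameter is written twice (''), as in the Syntax section. The sketch below reuses expressions from this topic. The table and field names are hypothetical, and the capitalized class name relies on the note that class names are not case-sensitive.

```sql
-- Hypothetical table: names are for illustration only.
CREATE TEMPORARY TABLE faker_expression_demo (
  `hero` STRING,
  `score` INT
) WITH (
  'connector' = 'faker',
  -- className = superhero, methodName = name, no parameters.
  'fields.hero.expression' = '#{superhero.name}',
  -- className = Number (class names are not case-sensitive),
  -- methodName = numberBetween, parameters '1' and '100' separated by a comma.
  -- Each quotation mark is doubled because the expression sits inside
  -- an SQL string literal.
  'fields.score.expression' = '#{Number.numberBetween ''1'',''100''}'
);
```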
- Example
This example describes how to derive the SQL expression for a field in a DDL statement from the Java Faker API documentation. The 'fields.age.expression' = '#{number.numberBetween ''0'',''1000''}' expression for the age field in Syntax is used in this example.
- Find the Number class in the Java Faker API documentation.
- Find the numberBetween method in the Number class and view the method description.
The numberBetween method returns a number that falls between the two values that you specify.
- Combine the Number class, the numberBetween method, and the input values 0 and 1000 to obtain the SQL expression 'fields.age.expression' = '#{number.numberBetween ''0'',''1000''}' for the age field.
This expression indicates that the generated value of the age field is in the range of 0 to 1000.
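One way to check the resulting range is to generate a bounded set of rows and aggregate the values. This is a minimal sketch: the table name and row count are hypothetical, and it assumes that the query runs where its result can be inspected, for example during draft debugging.

```sql
-- Hypothetical table used only to inspect the generated range.
CREATE TEMPORARY TABLE faker_age_check (
  `age` INT
) WITH (
  'connector' = 'faker',
  'fields.age.expression' = '#{number.numberBetween ''0'',''1000''}',
  -- Bound the source so that the aggregation can complete.
  'number-of-rows' = '10000'
);

-- Both values are expected to fall within the range of 0 to 1000.
SELECT MIN(age) AS min_age, MAX(age) AS max_age
FROM faker_age_check;
```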