This topic describes how to use the Faker connector.
Background information
The Faker connector is a built-in connector of fully managed Flink. The connector generates test data based on Java Faker expressions that are provided for each field in a table. If you want to use test data to verify the business logic during draft development or deployment testing, we recommend that you use the Faker connector.
The following table describes the capabilities supported by the Faker connector.
Item | Description |
---|---|
Table type | Source table and dimension table |
Running mode | Batch mode and streaming mode |
Data format | N/A |
Metric | N/A |
API type | SQL API |
Data update or deletion in a result table | N/A |
Prerequisites
N/A
Limits
- Only Realtime Compute for Apache Flink that uses Ververica Runtime (VVR) 4.0.12 or later supports the Faker connector.
- The Faker connector supports only the following data types: CHAR(n), VARCHAR(n), STRING, TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL, BOOLEAN, TIMESTAMP, ARRAY, MAP, MULTISET, and ROW.
Syntax
CREATE TABLE faker_source (
`name` STRING,
`age` INT
) WITH (
'connector' = 'faker',
'fields.name.expression' = '#{superhero.name}',
'fields.age.expression' = '#{number.numberBetween ''0'',''1000''}'
);
Parameters in the WITH clause
Category | Parameter | Description | Data type | Required | Default value | Remarks |
---|---|---|---|---|---|---|
Common parameters | connector | The type of the table. | STRING | Yes | No default value | Set the value to faker. |
Common parameters | fields.<field>.expression | The Java Faker expression that generates the value of the field. | STRING | Yes | No default value | For more information, see Field expression. |
Common parameters | fields.<field>.null-rate | The rate at which the value in this field is null. | FLOAT | No | 0.0 | N/A |
Common parameters | fields.<field>.length | The length of the field of the ARRAY, MAP, or MULTISET data type. | INTEGER | No | 1 | N/A |
Parameters only for source tables | number-of-rows | The number of rows of data that is generated. | INTEGER | No | -1 | If you configure this parameter, the source table is bounded. If you do not configure this parameter, the source table is unbounded. |
Parameters only for source tables | rows-per-second | The rate at which random data is generated. | INTEGER | No | 10000 | Unit: records per second. |
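The optional parameters above can be combined in a single table definition. The following is a minimal sketch, not a reference configuration: the table name bounded_heroes and the powers field are hypothetical, and the sketch assumes that the expression of an ARRAY field is applied to each element of the array.

```sql
-- Hypothetical bounded source table that combines the optional parameters.
CREATE TEMPORARY TABLE bounded_heroes (
  `name` STRING,
  `powers` ARRAY<STRING>
) WITH (
  'connector' = 'faker',
  -- Stop after 100 rows, which makes the source bounded.
  'number-of-rows' = '100',
  -- Generate 10 records per second instead of the default 10000.
  'rows-per-second' = '10',
  'fields.name.expression' = '#{superhero.name}',
  -- About 10% of the generated name values are null.
  'fields.name.null-rate' = '0.1',
  -- Assumption: the expression is evaluated once per array element.
  'fields.powers.expression' = '#{superhero.power}',
  -- Each generated array contains 3 elements.
  'fields.powers.length' = '3'
);
```

If number-of-rows is omitted, the same table becomes an unbounded source that keeps emitting rows at the configured rate.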
Sample code
- Sample code for a source table
CREATE TEMPORARY TABLE heros_source (
  `name` STRING,
  `power` STRING,
  `age` INT
) WITH (
  'connector' = 'faker',
  'fields.name.expression' = '#{superhero.name}',
  'fields.power.expression' = '#{superhero.power}',
  'fields.power.null-rate' = '0.05',
  'fields.age.expression' = '#{number.numberBetween ''0'',''1000''}'
);

CREATE TEMPORARY TABLE blackhole_sink (
  `name` STRING,
  `power` STRING,
  `age` INT
) WITH (
  'connector' = 'blackhole'
);

INSERT INTO blackhole_sink
SELECT * FROM heros_source;
- Sample code for a dimension table
CREATE TEMPORARY TABLE datagen_source (
  `character_id` INT,
  `location` STRING,
  `proctime` AS PROCTIME()
) WITH (
  'connector' = 'datagen'
);

CREATE TEMPORARY TABLE faker_dim (
  `character_id` INT,
  `name` STRING
) WITH (
  'connector' = 'faker',
  'fields.character_id.expression' = '#{number.numberBetween ''0'',''100''}',
  'fields.name.expression' = '#{harry_potter.characters}'
);

SELECT
  c.character_id,
  l.location,
  c.name
FROM datagen_source AS l
JOIN faker_dim FOR SYSTEM_TIME AS OF proctime AS c
ON l.character_id = c.character_id;
Field expression
- Operation
When you use the Faker connector, you must define an expression in the WITH clause for each field in the DDL statement. The expression is in the 'fields.<field>.expression' = '#{className.methodName ''parameter'', ...}' format. The following table describes the parameters in the expression.
Parameter | Description |
---|---|
field | The name of a field in the DDL statement. |
className | The name of a Faker class. Java Faker provides about 80 Faker classes to generate field expressions. You can select the classes based on your business requirements. Note: Faker class names are not case-sensitive. |
methodName | The name of a method. Note: Method names are not case-sensitive. |
parameter | The input parameters of a method. Note: Each input parameter of a method must be enclosed in single quotation marks ('). Because the expression is itself an SQL string literal, each of these quotation marks is written as two single quotation marks ('') in the WITH clause. Separate multiple input parameters with commas (,). |
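To make the format concrete, the following sketch pairs a method that takes no parameters with one that takes two. The table name expression_demo and its fields are hypothetical; the sketch assumes that the Name and Address classes of Java Faker are available in the connector in the same way as the Superhero and Number classes used elsewhere in this topic.

```sql
-- Hypothetical table that shows the className.methodName pattern.
CREATE TEMPORARY TABLE expression_demo (
  `full_name` STRING,
  `city` STRING,
  `score` INT
) WITH (
  'connector' = 'faker',
  -- className = name, methodName = fullName, no parameters.
  'fields.full_name.expression' = '#{name.fullName}',
  -- className = address, methodName = city, no parameters.
  'fields.city.expression' = '#{address.city}',
  -- Two parameters: each is wrapped in doubled single quotation marks
  -- and the parameters are separated by a comma.
  'fields.score.expression' = '#{number.numberBetween ''1'',''100''}'
);
```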
- Example
This example describes how to generate an SQL expression for a field in a DDL statement based on the Java Faker API documentation. The 'fields.age.expression' = '#{number.numberBetween ''0'',''1000''}' expression for the age field in Syntax is used in this example.
- Find the Number class in the Java Faker API documentation.
- Find the numberBetween method in the Number class and view the method description.
The numberBetween method takes two numbers as input parameters and returns a value that falls between them.
- Obtain the SQL expression 'fields.age.expression' = '#{number.numberBetween ''0'',''1000''}' for the age field based on the Number class, the numberBetween method, and the values 0 and 1000 that are passed to the method.
This expression indicates that the generated value of the age field is in the range of 0 to 1000.
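The same steps can be repeated for a method with more parameters. The sketch below assumes the randomDouble method of the Number class in Java Faker, which takes a maximum number of decimal places, a minimum, and a maximum. The table name price_source and the price field are hypothetical.

```sql
-- Hypothetical table: the three steps above applied to Number.randomDouble.
CREATE TEMPORARY TABLE price_source (
  `price` DOUBLE
) WITH (
  'connector' = 'faker',
  -- Class: number. Method: randomDouble.
  -- Parameters: at most 2 decimal places, minimum 0, maximum 100.
  'fields.price.expression' = '#{number.randomDouble ''2'',''0'',''100''}'
);
```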