This topic describes how to use AI_MASK to perform data masking.
Limitations
This feature requires Ververica Runtime (VVR) 11.4+.
The throughput of AI_MASK operators is subject to the rate limits of Alibaba Cloud Model Studio. When the rate limit for a model is reached, the Flink job is backpressured, with the AI_MASK operators as the bottleneck. In some cases, this triggers timeout errors and job restarts.
Syntax
AI_MASK(
  MODEL => MODEL <MODEL NAME>,
  INPUT => <INPUT COLUMN NAME>,
  MASK_ENTITIES => <MASK ENTITIES>
)
Input parameters
Parameter | Data type | Description |
MODEL <MODEL NAME> | MODEL | The name of the registered model. For information about how to register a model service, see Model settings. Note: The output type of this model must be VARIANT. |
<INPUT COLUMN NAME> | STRING | The original text for the model to analyze. |
<MASK ENTITIES> | ARRAY<STRING> | The entities to be masked. Note: This input parameter must be a constant. |
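The constant requirement on the entity list can be illustrated with a short sketch. It assumes a registered model named general_model and a table or view named infos with id and content columns, as in the example in this topic. The 'email' label and the entity_types column are hypothetical and used only for illustration; this topic does not list the supported entity labels.

```sql
-- Sketch: MASK_ENTITIES is passed as a constant array literal.
-- 'email' is an assumed entity label, not one confirmed by this topic.
SELECT id, masked_text, detected_entities
FROM infos,
LATERAL TABLE(
  AI_MASK(
    MODEL => MODEL general_model,
    INPUT => content,
    MASK_ENTITIES => ARRAY['name', 'email']
  ));

-- Not a constant, so it does not satisfy the requirement above
-- (entity_types is a hypothetical column):
--   MASK_ENTITIES => entity_types
```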
Return values
Parameter | Data type | Description |
masked_text | STRING | The masked text. |
detected_entities | ARRAY<STRING> | The detected entities. |
Example
Test data
id | content |
1 | Timmo loves reading and always does so in his spare time. |
Test statements
The sample SQL statements use the Qwen-Plus model and AI_MASK to perform data masking.
CREATE TEMPORARY MODEL general_model
INPUT (`input` STRING)
OUTPUT (`content` VARIANT)
WITH (
  'provider' = 'openai-compat',
  'endpoint' = '<YOUR ENDPOINT>',
  'apiKey' = '<YOUR KEY>',
  'model' = 'qwen-plus'
);
CREATE TEMPORARY VIEW infos(id, content)
AS VALUES (1, 'Timmo loves reading and always does so in his spare time.');
-- Call AI_MASK with positional arguments
SELECT id, masked_text, detected_entities
FROM infos,
LATERAL TABLE(
  AI_MASK(
    MODEL general_model,
    content,
    ARRAY['name']
  ));
-- Call AI_MASK with named arguments
SELECT id, masked_text, detected_entities
FROM infos,
LATERAL TABLE(
  AI_MASK(
    MODEL => MODEL general_model,
    INPUT => content,
    MASK_ENTITIES => ARRAY['name']
  ));
Result
id | masked_text | detected_entities |
1 | [NAME] loves reading and always does so in his spare time. | [{"entity":"Timmo","type":"name"}] |
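To persist the masked output instead of only querying it, the same call can feed an INSERT statement. The following is a minimal sketch, assuming it runs in the same draft as the general_model and infos definitions above. The sink name masked_sink is hypothetical, and the print connector is used only as a stand-in for a real sink. The column types follow the return types listed in this topic.

```sql
-- Hypothetical sink table; 'print' writes each row to the TaskManager logs.
CREATE TEMPORARY TABLE masked_sink (
  id INT,
  masked_text STRING,
  detected_entities ARRAY<STRING>
) WITH (
  'connector' = 'print'
);

-- Write the masked text and the detected entities to the sink.
INSERT INTO masked_sink
SELECT id, masked_text, detected_entities
FROM infos,
LATERAL TABLE(
  AI_MASK(
    MODEL => MODEL general_model,
    INPUT => content,
    MASK_ENTITIES => ARRAY['name']
  ));
```

Because AI_MASK throughput depends on the model's rate limits, a job like this may show backpressure at the AI_MASK operator rather than at the sink.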