DataWorks: AI-assisted processing

Last Updated: Mar 28, 2026

AI-assisted processing integrates large language model (LLM) inference directly into the DataWorks Data Integration pipeline. Instead of moving data as-is, the pipeline calls AI models in real time to analyze, enrich, or transform each record as it travels from the source to the destination—unlocking value from unstructured text without writing any AI invocation code.

This feature is designed for enterprise users who need to perform advanced analysis and processing on data during data synchronization. It is especially useful for companies that want to use AI to improve data quality and extract value from data.

Supported NLP tasks

| Task | What it does |
| --- | --- |
| Sentiment analysis | Classifies the sentiment of text |
| Summary generation | Condenses long documents into key points |
| Keyword extraction | Pulls out the most relevant terms and phrases |
| Text translation | Converts text from one language to another |

Use cases

| Industry | How AI processing helps |
| --- | --- |
| Customer service / E-commerce | Analyze sentiment in user comments and support tickets; extract core issues and key feedback points |
| Compliance / Legal / Scientific research | Generate summaries and extract key information from policy documents, legal contracts, and research papers during sync |
| Manufacturing / Supply chain / Healthcare | Analyze device logs, supply chain feedback, and doctor-patient records to surface threat alerts and service quality signals |
| Cross-language collaboration | Translate social media comments, news articles, or business documents into a single language for centralized analysis |

Prerequisites

Before you begin, ensure that you have:

Billing

In addition to DataWorks subscription fees and resource group fees, AI-assisted processing incurs model inference fees.

Example: translate customer feedback during a Hologres-to-Hologres sync

This example walks through configuring AI-assisted processing in an offline sync task that copies data from one Hologres table to another. The task translates each value in the feedback_info column into English and writes the result to the destination table.

What the AI processing does in this example:

| Source field | Processing description | Output field |
| --- | --- | --- |
| feedback_info | Translate '#{feedback_info}' into English | feedback_processed |

Step 1: Prepare source data

Create the source table and insert sample records:

CREATE TABLE customer_feedback (
    id BIGINT PRIMARY KEY,
    device STRING,
    feedback_info STRING,
    pt INT
)
PARTITIONED BY (pt)
DISTRIBUTED BY HASH(id)
WITH (table_type='Duplication');

INSERT INTO customer_feedback (id, device, feedback_info, pt)
VALUES
(8, 'Huawei MateBook D14', 'Affordable, suitable for students, performance is adequate', 2020),
(1, 'iphone', 'This product is okay, I have used it for 1 year', 2013),
(10, 'Bose QuietComfort 35 II', 'A classic among noise-canceling headphones, maximum comfort', 2021);

Step 2: Create an offline sync task

  1. Go to the Workspaces page in the DataWorks console. In the top navigation bar, select a region. Find the target workspace and choose Shortcuts > Data Studio in the Actions column.

  2. In the left navigation pane, click the Data Studio icon to open Data Studio. To the right of Workspace Directories, click the icon and choose Create Node > Data Integration > Batch Synchronization. The New Node dialog box opens.

  3. Set the Path, Data Source and Destination, and Name for the node, then click OK.

Step 3: Configure the sync task

After the node is created, the task configuration page opens. Configure each section in order.

Data source

  • Type — Set during node creation and cannot be changed. To use a different data source type, create a new node.

  • Data Source — Select an existing data source from the drop-down list, or click Add Data Source to create one.

Runtime resource

  1. Select a Resource Group for the sync task. For a serverless resource group, specify the number of CUs in the Resource Usage(CU) field.

  2. After selecting a resource group, Data Integration checks connectivity to the source and destination automatically. Click Connectivity Check to run the check manually.

Source

Configure the source table settings: Schema, Table, Partition, and Data Filtering conditions. Click Data Preview to preview the records that will be synced.

Data processing

  1. In the data processing section, toggle Enable to turn on data processing. This requires additional computing resources.

  2. Click Add Node and select AI Process.

  3. Configure the AI Process node. The key parameters are:

    | Parameter | Description |
    | --- | --- |
    | Model Provider | Select the provider: DataWorks Model Service, Aliyun Bailian, or PAI Model Gallery. |
    | Model Endpoint | Applies only when Model Provider is PAI Model Gallery. Enter the model invocation endpoint. To get the endpoint, see Test service invocation. |
    | Model Name | The model used for intelligent data processing. Select one from the list. |
    | API Key | The credential used to access the model. For Alibaba Cloud Model Studio (listed as Aliyun Bailian under Model Provider), see Obtain a Model Studio API key. For PAI Model Gallery, go to the deployed EAS task, start online debugging, and use the access token as the API key. |
    | Processing Description | A natural language instruction describing how to process the source field. Reference field names using the #{column_name} format. For this example, enter Translate '#{feedback_info}' into English. |
    | Output Field | The field where the processed result is stored. If the field does not exist, it is created automatically. For this example, enter feedback_processed. |

  4. Click Data Output Preview in the upper-right corner of the data processing section to preview the processed output.

  5. (Optional) Add more processing nodes. Multiple data processing nodes execute sequentially.

Destination

  1. Configure the destination table: Schema, Table, and Partition.

    • Click Generate Target Table Schema to generate the schema automatically.

    • Select an existing table if one already exists.

  2. Set the Write Mode and Write Conflict Strategy.

  3. Configure whether to delete existing data in the Hologres table before the sync.

  4. (Optional) Set Maximum Connections.

    Maximum Connections applies only when Write Mode is SQL(INSERT INTO). A single task can use up to nine connections. Make sure the Hologres instance has enough idle connections before starting the task.
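The connection guidance above comes down to simple arithmetic. As a rough sketch (the function name and the sample numbers are illustrative, not values taken from DataWorks or Hologres), a pre-flight check could compare the instance's idle connections with the nine-connection ceiling per task:

```python
MAX_CONNECTIONS_PER_TASK = 9  # upper bound per task stated in this section


def has_enough_idle_connections(
    instance_limit: int, in_use: int, concurrent_tasks: int = 1
) -> bool:
    """Return True if idle connections cover the worst case for the given tasks."""
    if instance_limit < 0 or in_use < 0 or concurrent_tasks < 1:
        raise ValueError("limits must be non-negative and at least one task is required")
    idle = max(instance_limit - in_use, 0)
    return idle >= MAX_CONNECTIONS_PER_TASK * concurrent_tasks


# Made-up numbers: 28 idle connections against a need of 9, then of 36.
print(has_enough_idle_connections(instance_limit=128, in_use=100))
print(has_enough_idle_connections(instance_limit=128, in_use=100, concurrent_tasks=4))
```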

Destination field mapping

After configuring the source, data processing, and destination, the field mapping between the source and destination tables is shown. Fields are mapped by name and position by default.

For this example, keep the default mapping for id, device, feedback_info, and pt, and manually map feedback_processed (the output field produced by the AI Process node) to translate_feedback in the destination table.
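The mapping for this example can be written out as data. The sketch below is illustrative only: `build_mapping` is a hypothetical helper, it covers same-name matching but not positional matching, and the destination column list is inferred from the fields named in this section.

```python
from typing import Dict, List, Optional


def build_mapping(
    source_fields: List[str],
    destination_fields: List[str],
    manual: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Map same-named fields, then apply manual source-to-destination overrides."""
    mapping = {name: name for name in source_fields if name in destination_fields}
    for source, destination in (manual or {}).items():
        if source not in source_fields:
            raise ValueError(f"unknown source field: {source}")
        if destination not in destination_fields:
            raise ValueError(f"unknown destination field: {destination}")
        mapping[source] = destination
    return mapping


# Source fields include the AI output field added by the AI Process node.
source_fields = ["id", "device", "feedback_info", "pt", "feedback_processed"]
# Assumed destination columns for this example.
destination_fields = ["id", "device", "feedback_info", "pt", "translate_feedback"]

mapping = build_mapping(
    source_fields,
    destination_fields,
    manual={"feedback_processed": "translate_feedback"},
)
for source, destination in mapping.items():
    print(f"{source} -> {destination}")
```

Without the manual entry, feedback_processed would have no same-named destination column and would be left unmapped in this sketch.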

Step 4: Test the task

  1. In the right panel, click Run Configuration. Set the Resource Group and any Script Parameters for this test run.

  2. In the toolbar, click Save and then Run. After the task finishes, confirm that the run succeeded, then check the destination table to verify that the data is correct.
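What counts as "correct" depends on the task. One possible set of checks is sketched below, assuming the source and destination rows have already been fetched as dictionaries (how they are fetched is out of scope here). `find_problems` is a hypothetical helper, and the destination rows in the usage example are made up to show a passing row, an empty output, and a missing row.

```python
from typing import Dict, List


def find_problems(
    source_rows: List[Dict[str, object]],
    destination_rows: List[Dict[str, object]],
    output_column: str = "translate_feedback",
) -> List[str]:
    """Return one message per source row that did not sync as expected."""
    problems = []
    destination_by_id = {row["id"]: row for row in destination_rows}
    for source in source_rows:
        target = destination_by_id.get(source["id"])
        if target is None:
            problems.append(f"id {source['id']}: missing from destination")
            continue
        if target.get("feedback_info") != source["feedback_info"]:
            problems.append(f"id {source['id']}: feedback_info changed during sync")
        if not str(target.get(output_column) or "").strip():
            problems.append(f"id {source['id']}: {output_column} is empty")
    return problems


source_rows = [
    {"id": 1, "feedback_info": "This product is okay, I have used it for 1 year"},
    {"id": 8, "feedback_info": "Affordable, suitable for students, performance is adequate"},
    {"id": 10, "feedback_info": "A classic among noise-canceling headphones, maximum comfort"},
]
# Made-up destination rows: id 8 has an empty AI output and id 10 is absent.
destination_rows = [
    {"id": 1, "feedback_info": source_rows[0]["feedback_info"], "translate_feedback": "placeholder text"},
    {"id": 8, "feedback_info": source_rows[1]["feedback_info"], "translate_feedback": ""},
]
for problem in find_problems(source_rows, destination_rows):
    print(problem)
```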

Step 5: Configure scheduling

To run the sync task on a schedule, set the Scheduling Policies in the Scheduling section on the right side of the page and configure the node scheduling properties.

Step 6: Publish the node

Click the Publish icon in the toolbar to start the publishing flow. This publishes the task to the production environment. Periodic scheduling takes effect only after the node is published.

What's next

After the node is published, you can backfill historical data or monitor and manage the task in Operation Center.