Data push configuration - DataWorks - Alibaba Cloud Documentation Center

Overview

You can schedule periodic tasks to push data to a target Webhook or email address.

Supported data sources and channels

Supported data source types:
- MySQL (compatible with StarRocks and Doris)
- PostgreSQL (compatible with Snowflake and Redshift)
- Hologres
- MaxCompute (ODPS)
- ClickHouse
Supported push channels include DingTalk, Lark, WeCom, email, and Teams.

Limitations

Each SELECT statement in the data push service can return a maximum of 10,000 rows.
Data size limits for different destinations:
- For DingTalk, the pushed data size must not exceed 20 KB.
- For Lark, the pushed data size must not exceed 20 KB, and each image must be smaller than 10 MB.
- For WeCom, each bot can send up to 20 messages per minute.
- For Teams, the pushed content must not exceed 28 KB.
- For email, each data push task supports only one email body. For more limits, see the SMTP restrictions of your email service.
The data push feature is available only in DataWorks workspaces in the following regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), China (Chengdu), China (Hong Kong), Singapore, Japan (Tokyo), US (Silicon Valley), US (Virginia), and Germany (Frankfurt).
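
The size limits above can be checked before a task is configured. The following is a minimal sketch, not part of the data push service: it assumes that 1 KB means 1024 bytes and that size is measured on the UTF-8 encoded content, neither of which is stated in this topic. The names SIZE_LIMIT_BYTES, payload_size, and fits_channel are hypothetical.

```python
# Limits copied from the list above. Treating 1 KB as 1024 bytes and measuring
# the UTF-8 encoded length are assumptions, not documented behavior.
SIZE_LIMIT_BYTES = {
    "dingtalk": 20 * 1024,
    "lark": 20 * 1024,
    "teams": 28 * 1024,
}
MAX_ROWS_PER_SELECT = 10_000


def payload_size(content: str) -> int:
    """Return the size of the message content in UTF-8 bytes."""
    return len(content.encode("utf-8"))


def fits_channel(content: str, channel: str) -> bool:
    """Return True if the content is within the size limit listed for the channel.

    Channels without a listed size limit (WeCom, email) always pass this check.
    """
    limit = SIZE_LIMIT_BYTES.get(channel.lower())
    return limit is None or payload_size(content) <= limit
```

For example, fits_channel(text, "teams") accepts content that fits_channel(text, "lark") rejects when the content is between 20 KB and 28 KB.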

Prerequisites

Ensure that a data source is created. For details, see Data Source Management.
Ensure that public network access is enabled for your resource group. For details, see Network connectivity solution overview.

Step 1: Create a push task

Go to Data Service.

Log on to the DataWorks console. In the top navigation bar, select the region where your data source resides. In the left-side navigation pane, choose Data Analysis and Service > Data Services. Select the desired workspace from the drop-down list and click Go to Data Service.

Create a data push task.

In the left-side navigation pane of Data Services, choose Service Development > Data Push to go to the Data Push page. Click the icon, select Create Data Push Task, enter a name for the task, and click OK. The task configuration page opens.

Step 2: Configure the push task

Preparation (optional)

This topic uses an example that pushes query results from a MaxCompute table. The example sends data from a table named sales to a specified channel. The data includes the daily sales amount for each department and the change in sales amount compared to the previous day. To follow the steps in this example, first create the sales table in your environment. The following statements create the sales table and insert sample data into it. For more information about how to create a table, see Create and use MaxCompute tables.

CREATE TABLE IF NOT EXISTS sales (
    id BIGINT COMMENT 'Unique identifier',
    department STRING COMMENT 'Department name',
    revenue DOUBLE COMMENT 'Revenue amount'
) PARTITIONED BY (ds STRING);
-- Insert sample data into partitions
INSERT INTO TABLE sales PARTITION(ds='20240101')(id, department, revenue) VALUES (1, 'Department 1', 12000.00);
INSERT INTO TABLE sales PARTITION(ds='20240101')(id, department, revenue) VALUES (2, 'Department 2', 21000.00);
INSERT INTO TABLE sales PARTITION(ds='20240101')(id, department, revenue) VALUES (3, 'Department 3', 5000.00);
INSERT INTO TABLE sales PARTITION(ds='20240102')(id, department, revenue) VALUES (1, 'Department 1', 11000.00);
INSERT INTO TABLE sales PARTITION(ds='20240102')(id, department, revenue) VALUES (2, 'Department 2', 20000.00);
INSERT INTO TABLE sales PARTITION(ds='20240102')(id, department, revenue) VALUES (3, 'Department 3', 10000.00);

Select a data source

Select the Data Source Type, Data Source Name, and Data Source Environment to determine the environment of the data table for the data push. You can select the data source environment based on whether the data push is for a development table or a production table. If you are performing a hands-on exercise, confirm the environment where the sales table you created during the preparation phase is located.

For example, set Data Source Type to odps, Data Source Name to MaxCompute_Source, and Data Source Environment to the production environment. To create a new data source, click the link below the Data Source Name field.

Note

For a list of supported data source types, see Supported data sources and channels.

Write query SQL

Define the data scope and retrieve data.

In the Edit Query SQL section, use single-table or multiple-table SQL queries to define the data to be pushed. For example:

-- Get the sales revenue for each department on 20240102
SELECT id, department, revenue FROM sales WHERE ds='20240102';
-- Get the change in sales revenue compared to the previous day
SELECT a.revenue - b.revenue AS diff FROM sales a LEFT JOIN sales b ON a.id = b.id AND a.ds > b.ds WHERE a.ds = '20240102' AND b.ds = '20240101';

After you write the SQL, the result fields are automatically populated in the Parameters > Output Parameters section. If the output parameters cannot be parsed or are incorrect, disable Automatically Parse Parameters and click Add Parameter to add them manually.
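
To check the logic of the two example queries above without a MaxCompute environment, you can replay them against an in-memory SQLite database. This is only a sketch under stated assumptions: SQLite has no PARTITIONED BY clause, so ds is modeled as an ordinary column, and an id column and ORDER BY clause are added to the second query so that the result order is deterministic. It does not reproduce MaxCompute behavior in general.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite has no PARTITIONED BY clause, so ds is modeled as an ordinary column.
conn.execute(
    "CREATE TABLE sales (id INTEGER, department TEXT, revenue REAL, ds TEXT)"
)
conn.executemany(
    "INSERT INTO sales (id, department, revenue, ds) VALUES (?, ?, ?, ?)",
    [
        (1, "Department 1", 12000.00, "20240101"),
        (2, "Department 2", 21000.00, "20240101"),
        (3, "Department 3", 5000.00, "20240101"),
        (1, "Department 1", 11000.00, "20240102"),
        (2, "Department 2", 20000.00, "20240102"),
        (3, "Department 3", 10000.00, "20240102"),
    ],
)

# Sales revenue for each department on 20240102.
daily = conn.execute(
    "SELECT id, department, revenue FROM sales WHERE ds = '20240102' ORDER BY id"
).fetchall()

# Change in revenue compared to the previous day, one row per department id.
diffs = conn.execute(
    """
    SELECT a.id, a.revenue - b.revenue AS diff
    FROM sales a
    LEFT JOIN sales b ON a.id = b.id AND a.ds > b.ds
    WHERE a.ds = '20240102' AND b.ds = '20240101'
    ORDER BY a.id
    """
).fetchall()

print(daily)
print(diffs)  # [(1, -1000.0), (2, -1000.0), (3, 5000.0)]
```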

You can also define custom variables in SQL by using the ${variable_name} format. Each variable is treated as an Assignment Parameter, which can be assigned a time expression or a constant, so that values are passed into your code dynamically at run time. For more information, see Configure push content.

-- Use scheduling parameters to dynamically assign time variables.
-- Get the latest daily sales revenue for each department
SELECT id, department, revenue FROM sales WHERE ds='${date}';
-- Get the change in sales revenue compared to the previous day
SELECT a.revenue - b.revenue AS diff FROM sales a LEFT JOIN sales b ON a.id = b.id AND a.ds > b.ds WHERE a.ds = '${date}' AND b.ds = '${previous_date}';
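
The effect of the ${variable_name} placeholders can be imitated locally to preview the SQL that would run for a given set of values. The sketch below uses Python's string.Template, which happens to accept the same ${name} syntax. It is an illustration only and says nothing about how DataWorks performs the substitution. The render_sql helper is hypothetical.

```python
from string import Template


def render_sql(sql: str, params: dict) -> str:
    """Replace ${name} placeholders with the given values.

    Raises KeyError if the SQL references a variable that has no value.
    """
    return Template(sql).substitute(params)


QUERY = "SELECT id, department, revenue FROM sales WHERE ds='${date}';"
rendered = render_sql(QUERY, {"date": "20240102"})
print(rendered)
```

Because substitute raises KeyError for a missing variable, the sketch also surfaces placeholders that were never assigned a value.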

Paginated query.

For large tables, data push supports paginated queries using a Next Token. Click Code Help > Code Template > Next Token on the code editor toolbar for usage instructions.

Configure push content

In the Content to Push section, you can edit the message content using Markdown and Table formats. This content will be pushed to the Webhook.

After you customize the message title in the Title field, click Add in the body area. Then, choose Markdown, Table, or Email Body to edit the content. The following example shows a sample configuration. You can click Preview on the toolbar to see the message format.

Note

If the push destination is an email address, the content customized in the Markdown and Table sections is sent as attachments. The email body is rendered and displayed in the email message.
If the push destination is not an email address, the content customized in the Markdown and Table sections will be displayed as the main body of the Webhook message. The Email Body will be hidden in the Webhook push message.

Markdown content

Use parameter variables: When composing the push content, you can add Assignment Parameters and Output Parameters to the rich text using the ${parameter_name} format. These variables are replaced with the corresponding assigned data or SQL query results when the data push task runs.
- Assignment Parameters: You need to assign a Constant or a scheduling parameter's Time Expression to the variable in the Parameters > Assignment Parameters section.
- Output Parameters: These parameters correspond to the field names or aliases from your SQL query, such as A, B, ... in a statement like SELECT A, B, ... FROM TABLE. They represent the queried data.
@mention members: You can configure this when pushing to a Lark Webhook to automatically @mention specific users.
- By default, Markdown mode uses rich text to configure message content. When pushing to Lark, you can use the @mention feature to notify relevant personnel. You can click the icon to switch to Markdown source mode and then use <at id="all" /> or <at email="username@example.com" /> to achieve this.
In addition to the features above, Markdown also supports functions like Add Image and inserting DingTalk Emoji.
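
If you generate the Markdown source outside the editor, the two @mention forms shown above can be produced by a small helper. This is a sketch: the lark_at function is hypothetical, and only the two tag forms quoted in this topic are covered.

```python
from html import escape
from typing import Optional


def lark_at(email: Optional[str] = None) -> str:
    """Build a Lark @mention tag: everyone if no email is given, else one user."""
    if email is None:
        return '<at id="all" />'
    # Escape the address so that quotes cannot break out of the attribute.
    return f'<at email="{escape(email, quote=True)}" />'


print(lark_at())
print(lark_at("username@example.com"))
```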

In the push content area, select Markdown as the template type. In the body, use the ${parameter_name} syntax to reference parameters defined in the Input Parameters panel on the right. For example, if you write ${creator} and ${subscriber} in the body, and set creator to "admin" and subscriber to "user" on the Input Parameters tab, the variables are replaced with those values when the task runs. Input parameters also support scheduling time variables. For example, you can set date to ${yyyymmdd} and previous_date to ${yyyymmdd-1}. The SQL in the Write query SQL section can reference the same input parameters for dynamic values, for example: SELECT id, department, revenue FROM sales WHERE ds='${date}';
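
The two time expressions mentioned above, ${yyyymmdd} and ${yyyymmdd-1}, can be previewed with the following sketch. It assumes that the expression resolves relative to a base date that you pass in explicitly; which date the scheduler actually uses as the base is not stated in this topic. The resolve_date_expr function is hypothetical and handles only these two forms.

```python
import re
from datetime import date, timedelta

# Matches ${yyyymmdd} and ${yyyymmdd-N}, where N is a number of days.
_EXPR = re.compile(r"^\$\{yyyymmdd(?:-(\d+))?\}$")


def resolve_date_expr(expr: str, base: date) -> str:
    """Resolve a ${yyyymmdd} or ${yyyymmdd-N} expression against a base date."""
    match = _EXPR.match(expr.strip())
    if match is None:
        raise ValueError(f"unsupported expression: {expr!r}")
    offset_days = int(match.group(1) or 0)
    return (base - timedelta(days=offset_days)).strftime("%Y%m%d")


base_date = date(2024, 1, 2)  # assumed base date for the preview
print(resolve_date_expr("${yyyymmdd}", base_date))    # 20240102
print(resolve_date_expr("${yyyymmdd-1}", base_date))  # 20240101
```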

Table content

Click Add Column to increase the number of columns in the table. You can then associate Parameters with the corresponding columns.
When the push destination is a Lark Webhook, click the icon to the right of a created table column to open the Modify Field dialog box. In this dialog box, you can adjust the Field, Display Name, Display Style, and Condition to create diverse display effects for the pushed content.
- Field: Switch to another Output Parameters field.
- Display Name: The name you want to show in the table header when pushing to collaboration tools.
- Display Style: Add a fixed prefix or suffix before or after the Value in the table.
- Condition: Compares the Value in the table with a configured comparison value. You can enable conditional logic and set an operator (such as >=) and a threshold (such as 60). If the condition is met, you can select Change to green. If the condition is not met, you can select Change to red. You can also configure an Appended Identifier (an additional Unicode character).
Note
- The method for authoring tables varies by channel. Table content support for different channels is as follows:
  - DingTalk: Supports Markdown tables and the built-in tables of data push. It does not support rendering the Display Style and Condition settings configured in the Modify Field dialog box. Also, DingTalk mobile does not support displaying tables.
  - Lark: Supports both Markdown and built-in tables, including the rendering of custom display styles and conditions.
  - WeCom: Supports pushing Markdown tables but does not render them.
  - Teams mobile: Supports pushing Markdown tables and can render them.
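
The Display Style and Condition settings described above amount to a simple rule per cell: add a prefix or suffix, compare the value with a threshold, and pick a color. The sketch below illustrates that rule. The render_cell function is hypothetical, and the set of operators other than >= is an assumption.

```python
import operator

# Only >= is mentioned in this topic; the other operators are assumed.
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


def render_cell(value, op, threshold, prefix="", suffix=""):
    """Return the displayed text and the color chosen by the condition."""
    compare = OPERATORS[op]
    color = "green" if compare(value, threshold) else "red"
    return f"{prefix}{value}{suffix}", color


print(render_cell(75, ">=", 60))              # ('75', 'green')
print(render_cell(42, ">=", 60, suffix="%"))  # ('42%', 'red')
```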

Email body

DataWorks data push supports adding an email body to the push content. When you edit the email body, note the following:

Each data push task supports only one email body.
The email body is rendered only when the push destination is an email address. If the push destination is not an email address, the Email Body is hidden in the Webhook push message.

Step 3: Configure push settings

Before you configure Push Settings, click the icon in the lower-left corner of the Service Development page to open the settings panel. Switch to the Destination Management tab, and click Create Destination to create a destination. Supported channel types include DingTalk, Lark, WeCom, Teams, and Email.

Create a Webhook destination

When you click Create Destination, configure the following parameters:

Type: Select a channel type. Options include DingTalk, Lark, WeCom, and Teams.
Destination Name: Enter a custom name for the new push destination.
Webhook: The Webhook URL of the selected push channel.

Note

For how to obtain a Lark bot Webhook, see Configure a Lark Webhook trigger.
For how to obtain a Teams Webhook, see Use Microsoft Teams workflows to create an incoming Webhook.

The Type drop-down list also supports the Email channel. After you complete the configuration, click OK.

Create an email destination

Before you configure Push Settings, click the icon in the lower-left corner of the Service Development page to open the settings panel. Switch to the Destination Management tab, and click Create Destination to create a destination.

When you click Create Destination, you must configure the following parameters:

Type: Select Email.
Destination Name: Enter a custom name for the new push destination.
SMTP Host: The address of the mail server.
SMTP Port: The port number of the mail server. The default value is 465. You can change it as needed.
Sender Address: The email sending address.
SMTP Account: The full email account.
SMTP Password: The password for the email account.
Receiver Address: The destination email address.
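
Before you enter these values, it can help to confirm that the SMTP settings work from a machine you control. The following sketch maps the parameters above onto Python's standard smtplib and email modules. It assumes that port 465 is served over implicit TLS, which is the common convention but is not stated in this topic. The EmailDestination class and all host names and addresses are hypothetical.

```python
import smtplib
from dataclasses import dataclass
from email.message import EmailMessage


@dataclass
class EmailDestination:
    smtp_host: str        # SMTP Host
    sender: str           # Sender Address
    account: str          # SMTP Account
    password: str         # SMTP Password
    receiver: str         # Receiver Address
    smtp_port: int = 465  # SMTP Port (default from this topic)


def build_message(dest: EmailDestination, subject: str, body: str) -> EmailMessage:
    """Build a plain-text message with a single body."""
    msg = EmailMessage()
    msg["From"] = dest.sender
    msg["To"] = dest.receiver
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send(dest: EmailDestination, msg: EmailMessage) -> None:
    """Send the message over implicit TLS (assumed for port 465)."""
    with smtplib.SMTP_SSL(dest.smtp_host, dest.smtp_port, timeout=10) as server:
        server.login(dest.account, dest.password)
        server.send_message(msg)


# Hypothetical values for illustration only.
dest = EmailDestination(
    smtp_host="smtp.example.com",
    sender="reports@example.com",
    account="reports@example.com",
    password="app-password",
    receiver="team@example.com",
)
message = build_message(dest, "Daily sales", "Test message")
# send(dest, message)  # uncomment to send with real credentials
```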

Push settings

Click Push Settings on the right side to configure the task's scheduling cycle, scheduling resources, and push destinations. The specific configuration items are as follows:

Scheduling cycle and run time configuration: Configure the scheduling cycle and specific time for the data push service to push the edited content.

Scheduling cycle	Specified time	Scheduling time	Example
Month	Specify the days of the month on which to run the push task.	The scheduling time for the data push task on the push day.	Scheduling Frequency: Month; Specified Time: 1st of every month; Data Timestamp: 08:00. Actual run time: The push task runs at 08:00 on the 1st of every month.
Week	Specify the days of the week on which to run the push task.	The scheduling time for the data push task on the push day.	Scheduling Frequency: Week; Specified Time: Monday; Data Timestamp: 09:00. Actual run time: The push task runs at 09:00 every Monday.
Day	The daily cycle schedules the task to run every day.	The scheduling time for the data push task on the push day.	Scheduling Frequency: Day; Data Timestamp: 08:00. Actual run time: The push task runs at 08:00 every day.
Hour	Choose one of two push modes: push at a specified hourly interval, or push at specified hours and minutes.		Push at an hourly interval: Start Time: 00:00; Time Interval: 1 hour; End Time: 23:59. Actual run time: Pushes once every hour from 00:00 to 23:59 daily. Push at specified hours and minutes: Hour: 0, 1; Specified Minute: 10. Actual run time: Pushes at 00:10 and 01:10 daily.
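
The two hourly push modes in the table can be sketched as follows to preview the run times that a configuration would produce. The function names are hypothetical, and treating the end time as inclusive is an assumption.

```python
from datetime import datetime, timedelta
from typing import Iterable, List

TIME_FORMAT = "%H:%M"


def interval_runs(start: str, end: str, interval_hours: int) -> List[str]:
    """Run times for the hourly interval mode (end time treated as inclusive)."""
    if interval_hours < 1:
        raise ValueError("interval_hours must be at least 1")
    current = datetime.strptime(start, TIME_FORMAT)
    stop = datetime.strptime(end, TIME_FORMAT)
    runs = []
    while current <= stop:
        runs.append(current.strftime(TIME_FORMAT))
        current += timedelta(hours=interval_hours)
    return runs


def specified_runs(hours: Iterable[int], minute: int) -> List[str]:
    """Run times for the specified hours and minutes mode."""
    return [f"{hour:02d}:{minute:02d}" for hour in sorted(set(hours))]


print(len(interval_runs("00:00", "23:59", 1)))  # 24
print(specified_runs([0, 1], 10))               # ['00:10', '01:10']
```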

Timeout Definition: Sets a time limit for task execution. The task is terminated if it exceeds this limit.
- Default Value: With the Default Value setting, the task timeout is dynamically adjusted based on the system load, with a value ranging from 3 to 7 days. Timed-out tasks are terminated.
- Example: If you set a Custom timeout of 1 hour, the push task is terminated if it runs for more than 1 hour after its scheduled start time.
Valid From: Configure the time range during which the data push task is active.
- Permanent: The data push task remains effective permanently and is not limited by an effective date range.
- Example: If you configure a Specified Time range from 2024-01-01 to 2024-12-31, the push task runs according to the configured scheduling cycle within this period.
Resource Group for Scheduling: You can configure an Exclusive resource group for scheduling or a serverless resource group (general-purpose resource group) to provide scheduling resources for the data push task. For more information about resource groups, see Resource Group Management.
Push Every Time: Controls whether to send a push notification when the SQL query returns no data.
- Enabled (default): The push is executed on every scheduled run, regardless of whether the query returns data.
- Disabled: If all variables used in the push content, except for input parameters, are empty, the message is not sent. You can use WHERE or HAVING clauses in SQL to filter data. If the filter conditions are not met and the query result is empty, the push task is automatically skipped and no message is sent.
Destination: Select the destination to which the configured content is pushed. You can choose only from existing push destinations, which are created on the Destination Management tab.
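
The Push Every Time rule described above can be summarized in a few lines. This is a sketch of the documented rule, not the service's implementation. The should_push and is_empty functions are hypothetical, and the definition of an empty value (None, an empty string, or an empty list) is an assumption.

```python
def is_empty(value) -> bool:
    """Assumed definition of an empty output value."""
    return value is None or value == "" or value == []


def should_push(push_every_time: bool, output_values: dict) -> bool:
    """Decide whether a message is sent for one scheduled run.

    output_values maps each output parameter used in the push content to the
    value returned by the SQL query. Input parameters are not included.
    """
    if push_every_time:
        return True
    return any(not is_empty(value) for value in output_values.values())


print(should_push(True, {"revenue": None}))        # True
print(should_push(False, {"revenue": None}))       # False
print(should_push(False, {"revenue": [11000.0]}))  # True
```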

Note
When pushing to a DingTalk Webhook, you must add a keyword in the Security Settings > Custom Keywords section of the bot's configuration. Ensure that the push content includes this keyword for the push to succeed.
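
A quick local check can catch a missing keyword before the task is tested. The sketch below assumes that the push succeeds when the content contains at least one of the configured custom keywords. The has_required_keyword function is hypothetical.

```python
from typing import Iterable


def has_required_keyword(content: str, keywords: Iterable[str]) -> bool:
    """Return True if the content contains at least one non-empty keyword."""
    return any(keyword in content for keyword in keywords if keyword)


print(has_required_keyword("Daily sales report for 20240102", ["sales"]))  # True
print(has_required_keyword("Daily report for 20240102", ["sales"]))        # False
```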

Step 4: Test the push task

After creating the data push task, click the Save button on the toolbar to save the current configuration. Then, click Test to perform a development-stage test to verify that the data push works correctly. You must manually assign constant values to the variables for the test.

Note

A data push task must pass a test push in the development environment before you can Submit and Publish it.

Step 5: Publish the push task

Manage task versions

After you confirm that the tests during development are successful, click Submit. If the push task is not submitted, it remains in a draft state and no new version is generated.

After you submit the task, a new version is generated. In the Version panel on the right, find the submitted version whose status is Can Be Published and click Publish. Publishing the task activates its schedule as defined in the Push Settings.

In the Version panel, manage the data push task as follows.

Status	Actions	Description
Publish	Data Push Task Management	Goes to the Data Push Task Management page, where you can view detailed information about published tasks. For more information, see Manage data push tasks.
Can Be Published	Publish	Publishes the corresponding version of the task.
Can Be Published	Abandoned	Discards the corresponding version of the task and changes its status to Abandoned.
Off-Line, Abandoned	Version Details	View the configuration information and corresponding push content for that version of the data push task.
Off-Line, Abandoned	Roll Back	Restores this version, making it the current configuration.

Note

The Version Details and Roll Back operations are available and function identically for tasks in all statuses.

Manage push tasks

After a data push task is successfully published, click Data Push Task Management in the Operation column of the Version panel, or navigate to the Data Push Tasks list page via the Service Management > Data Push Task Management path.

This page lists all published Data Push Tasks and displays details such as their ID, Name, Data Source Name, Data Source Environment, Node Mode, Resource Group for Scheduling, Owner, Deployer, and Published Time. In the Operation column, perform the following operations on published data push tasks:

Actions	Description
Unpublish	Takes the selected task offline.
Test	Goes to the Test Data Push Task page, where you can test a published task.

Note

Clicking the icon in the Name column takes you to the Version Details page for the selected task.

Test a published task

Go to the Data Push Test page in either of the following ways:

Method 1: Choose Service Management > Test Data Push Task.
Method 2: Choose Service Management > Data Push Task Management, find the task in the Data Push Tasks list, and click Test in the Operation column.

Testing a published task confirms that it runs correctly and that the destination receives the data as expected.

On the Data Push Test page, select or search for the target data push task from the drop-down list, select the Push to Destination check box as needed, and then click Start Test.

FAQ

Q: Does data push support on-demand pushes?

A: For occasional, on-demand pushes, use the Test function with the Push to Destination option selected. For conditional recurring pushes, disable the Push Every Time setting. The message is then sent only if your SQL query returns data.