This topic describes how to compute and analyze collected data by using DataWorks.
Prerequisites
The required data is collected. For more information, see Collect data.
Create tables
- Go to the DataStudio page.
- Log on to the DataWorks console.
- In the left-side navigation pane, click Workspaces.
- In the top navigation bar, select the region where the workspace resides. On the Workspaces page, find the workspace in which you want to create tables, and click DataStudio in the Actions column.
- In the Scheduled Workflow pane of the DataStudio page, find the created workflow and click the workflow name. Right-click MaxCompute and select Create Table.
- In the Create Table dialog box, configure the Name parameter and click Create. Create a table named ods_log_info_d at the operational data store (ODS) layer, a table named dw_user_info_all_d at the common data model (CDM) layer, and a table named rpt_user_info_d at the application data store (ADS) layer.
- Create tables by executing DDL statements.
- Create the ods_log_info_d table. Double-click the name of the ods_log_info_d table. On the configuration tab of the table, click DDL Statement and enter the following table creation statement:
-- Create a table at the ODS layer.
CREATE TABLE IF NOT EXISTS ods_log_info_d (
    ip STRING COMMENT 'The IP address',
    uid STRING COMMENT 'The ID of the user',
    time STRING COMMENT 'The time in the format of yyyymmddhh:mi:ss',
    status STRING COMMENT 'The status code that is returned by the server',
    bytes STRING COMMENT 'The number of bytes that are returned to the client',
    region STRING COMMENT 'The region, which is obtained based on the IP address',
    method STRING COMMENT 'The type of the HTTP request',
    url STRING COMMENT 'url',
    protocol STRING COMMENT 'The version number of HTTP',
    referer STRING COMMENT 'The source URL',
    device STRING COMMENT 'The terminal type',
    identity STRING COMMENT 'The access type, which can be crawler, feed, user, or unknown'
)
PARTITIONED BY (
    dt STRING
);
- Create the dw_user_info_all_d table. Double-click the name of the dw_user_info_all_d table. On the configuration tab of the table, click DDL Statement and enter the following table creation statement:
-- Create a table at the CDM layer.
CREATE TABLE IF NOT EXISTS dw_user_info_all_d (
    uid STRING COMMENT 'The ID of the user',
    gender STRING COMMENT 'The gender',
    age_range STRING COMMENT 'The age range',
    zodiac STRING COMMENT 'The zodiac sign',
    region STRING COMMENT 'The region, which is obtained based on the IP address',
    device STRING COMMENT 'The terminal type',
    identity STRING COMMENT 'The access type, which can be crawler, feed, user, or unknown',
    method STRING COMMENT 'The type of the HTTP request',
    url STRING COMMENT 'url',
    referer STRING COMMENT 'The source URL',
    time STRING COMMENT 'The time in the format of yyyymmddhh:mi:ss'
)
PARTITIONED BY (
    dt STRING
);
- Create the rpt_user_info_d table. Double-click the name of the rpt_user_info_d table. On the configuration tab of the table, click DDL Statement and enter the following table creation statement:
-- Create a table at the ADS layer.
CREATE TABLE IF NOT EXISTS rpt_user_info_d (
    uid STRING COMMENT 'The ID of the user',
    region STRING COMMENT 'The region, which is obtained based on the IP address',
    device STRING COMMENT 'The terminal type',
    pv BIGINT COMMENT 'pv',
    gender STRING COMMENT 'The gender',
    age_range STRING COMMENT 'The age range',
    zodiac STRING COMMENT 'The zodiac sign'
)
PARTITIONED BY (
    dt STRING
);
- After you enter the CREATE TABLE statements, click Generate Table Schema. In the Confirm message, click OK.
- On the configuration tab of each table, enter a display name in the General section.
- After the tables are configured, click Commit to DEV and Commit to PROD in sequence. Note: If you use a workspace that is in basic mode, only Commit to PROD is available. You can optionally verify a committed table with the check sketched below.
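If you want to confirm that a committed table exists and has the expected schema, you can run an optional check such as the following as an ad hoc query. This is only a minimal verification sketch and is not part of the tutorial steps:
-- Optional check: view the schema of a committed table.
DESC rpt_user_info_d;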
Design the workflow
For more information about how to configure the dependencies among nodes in a workflow, see Collect data.
In the Scheduled Workflow pane of the DataStudio page, find the created workflow and double-click the workflow name. On the configuration tab of the workflow, click ODPS SQL or drag ODPS SQL to the configuration tab on the right. In the Create Node dialog box, configure the Name parameter and click Commit.

Create a user-defined function (UDF)
- Create a resource.
- Register a function. (An SQL-based alternative to console registration is sketched after this list.)
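The tutorial registers the function in the DataWorks console. As an illustration only, MaxCompute also lets you register a UDF with a CREATE FUNCTION statement. The function name getregion, the class name Ip2Region, and the resource name ip2region.py below are assumed example values, not names taken from this tutorial:
-- Illustration only: register a Python UDF. All names are example values.
-- 'ip2region.Ip2Region' refers to the Ip2Region class in the uploaded ip2region.py resource.
CREATE FUNCTION getregion AS 'ip2region.Ip2Region' USING 'ip2region.py';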
Configure the ODPS SQL nodes
- Configure the ods_log_info_d node.
- Configure the dw_user_info_all_d node.
- Configure the rpt_user_info_d node. (A hedged sketch of what this node's SQL could look like follows this list.)
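The SQL for each node is entered on the node's configuration tab. As a minimal sketch only, assuming the table schemas created earlier in this topic, the rpt_user_info_d node could aggregate page views (pv) per user from dw_user_info_all_d into the partition for the data timestamp. ${bdp.bizdate} is the built-in scheduling parameter that resolves to the data timestamp:
-- Sketch only: aggregate page views per user for the data timestamp.
INSERT OVERWRITE TABLE rpt_user_info_d PARTITION (dt='${bdp.bizdate}')
SELECT uid
    , MAX(region)
    , MAX(device)
    , COUNT(1) AS pv
    , MAX(gender)
    , MAX(age_range)
    , MAX(zodiac)
FROM dw_user_info_all_d
WHERE dt = '${bdp.bizdate}'
GROUP BY uid;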
Commit the workflow
- On the configuration tab of the workflow, click the Submit icon in the top toolbar to commit the nodes that are configured in the workflow.
- In the Commit dialog box, select the nodes that you want to commit and select Ignore I/O Inconsistency Alerts.
- Click Commit.
Run the workflow
- On the configuration tab of the workflow, click the Run icon in the top toolbar to verify the logic of the node code.
- After all nodes are run and a green check mark (✓) appears, click Ad Hoc Query in the left-side navigation pane.
- In the Ad Hoc Query pane, right-click Ad Hoc Query and choose Create Node > ODPS SQL.
- Write and execute an SQL statement to query the node execution result and check whether the required data is generated. Execute the following SQL statement. By default, the data timestamp of a node is one day before the date on which the node is run.
-- View data in the rpt_user_info_d table. Replace <data timestamp> with the actual data timestamp.
select * from rpt_user_info_d where dt='<data timestamp>' limit 10;
Deploy the workflow
- Before you deploy the nodes to the production environment, test the node code to ensure that the code is correct.
- In a workspace in basic mode, the Deploy icon is unavailable. After you commit a node, click the Operation Center icon in the top toolbar to go to the Operation Center page.
- On the configuration tab of the workflow, click the Deploy icon in the top toolbar to go to the Deploy page.
- Select the nodes that you want to deploy and click Add to List.
- Click Nodes to Deploy in the upper-right corner. In the Nodes to Deploy panel, click Deploy All.
- In the Create Deploy Task dialog box, click Deploy.
- In the left-side navigation pane of the Deploy page, click Release Package to view the deployment status.
Run the nodes in the production environment
- After the nodes are deployed, click Operation Center in the upper part of the Deploy page. You can also click Operation Center in the top toolbar on the configuration tab of the workflow to go to the Operation Center page.
- In the left-side navigation pane of the Operation Center page, choose Cycle Task Maintenance > Cycle Task to go to the Cycle Task page. Then, click the root node of the workshop workflow.
- Double-click the zero-load node, which is the root node of the workflow, in the directed acyclic graph (DAG) to show the whole workflow. Right-click the workshop_start node and choose the data backfill option from the shortcut menu.
- Select the nodes for which you want to backfill data, specify the data timestamp, and then click OK. The Patch Data page appears.
- Click Refresh until all SQL nodes are successfully run.
What to do next
You have learned how to create SQL nodes and process raw log data. You can now proceed with the next tutorial to learn how to monitor and ensure the quality of the data that is generated by the developed nodes. For more information, see Configure rules to monitor data quality.