This topic uses the creation of a business flow as an example to describe how to create nodes and configure dependencies, so that you can design and present the steps and sequence of a data analysis. It also briefly explains how to use the data development feature to further analyze and compute workspace data.
The DataWorks data development feature lets you set dependencies between nodes by dragging and dropping them in a business flow. Data flows and dependencies are implemented by operating on the business flow. Multiple task types are currently supported, such as MaxCompute SQL, data synchronization, open_mr, shell, machine learning, and virtual nodes. For details on how to use each task type, see Node type overview.
Before you begin, make sure that you have created the tables and uploaded the data: the business data table bank_data and its data, as well as the result table result_table, must be ready in the workspace.
- After you create the workspace, click Enter workspace for the corresponding project.
- Go to the DataStudio page and select .
- Enter the name and description of the business flow.
Create a node and dependency on the flow canvas
- The virtual node is a control-type node that does not affect data during business flow operation and is only used to control O&M of downstream nodes.
- When a virtual node depends on other nodes and the status is manually set to error by the O&M personnel, downstream nodes that have not run yet cannot be triggered. This prevents further propagation of erroneous upstream data during the O&M flow. For more information, see the section on virtual nodes in Node type overview.
- The upstream task of a virtual node in a business flow is typically set to the root node of the project. The project root node is named in the format Project name_root.
- Double-click the virtual node and enter the node name start.
- Double-click MaxCompute SQL and enter the node name insert_data.
- Click the start node, and draw a line from start to insert_data so that insert_data depends on start, as shown in the following figure:
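The dependency behavior described in this list can be illustrated with a small model. The following is a minimal Python sketch, not DataWorks code: the function `runnable_nodes`, the `FLOW` edge list, and the project name `my_project` are all hypothetical names introduced for illustration. It assumes a node is triggered only after every upstream node has succeeded, which is how the text above describes a virtual node in the error state blocking downstream nodes.

```python
from collections import deque

# Hypothetical in-memory model of a business flow; none of these names are DataWorks APIs.
def runnable_nodes(edges, failed):
    """Return the nodes that can still be triggered, in dependency order.

    edges:  list of (upstream, downstream) pairs
    failed: set of nodes whose status was set to error
    """
    nodes = {n for edge in edges for n in edge}
    children = {n: [] for n in nodes}
    pending = {n: 0 for n in nodes}  # number of upstream nodes not yet succeeded
    for up, down in edges:
        children[up].append(down)
        pending[down] += 1

    # Start from nodes without upstream nodes, skipping any that failed.
    queue = deque(sorted(n for n in nodes if pending[n] == 0 and n not in failed))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in sorted(children[node]):
            pending[child] -= 1
            # A child is triggered only after every upstream node has succeeded.
            if pending[child] == 0 and child not in failed:
                queue.append(child)
    return order

# Edges from this tutorial: project root -> start -> insert_data.
# "my_project_root" assumes a project named my_project.
FLOW = [("my_project_root", "start"), ("start", "insert_data")]

if __name__ == "__main__":
    print(runnable_nodes(FLOW, failed=set()))
    print(runnable_nodes(FLOW, failed={"start"}))
```

With no failed nodes, all three nodes run in order. When start is marked as failed, only the root node runs and insert_data is never triggered, which mirrors how the virtual node stops erroneous upstream data from propagating.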
Editing code in the MaxCompute SQL Node
This section describes how to use SQL code in the MaxCompute SQL node insert_data to count the single individuals who have a housing loan, grouped by education level, and to save the results so that downstream nodes can analyze or display them.
The SQL statements are as follows. For more information about the syntax, see MaxCompute SQL.
INSERT OVERWRITE TABLE result_table  --Insert data to result_table
SELECT education
    , COUNT(marital) AS num
FROM bank_data
WHERE housing = 'yes'
    AND marital = 'single'
GROUP BY education;
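To see what the statement produces before running it in DataWorks, the same aggregation can be tried locally. The following is a minimal Python sketch that uses the standard sqlite3 module as a stand-in for MaxCompute, so it is not the MaxCompute engine. The sample rows, the function `build_result`, and the reduced three-column schema are assumptions made for illustration. SQLite has no INSERT OVERWRITE, so the sketch clears result_table first to imitate it.

```python
import sqlite3

# Made-up sample rows (education, marital, housing); only the columns
# used by the query are modelled here.
SAMPLE_ROWS = [
    ("bachelor", "single", "yes"),
    ("bachelor", "single", "yes"),
    ("bachelor", "married", "yes"),
    ("high_school", "single", "yes"),
    ("high_school", "single", "no"),
]

def build_result(rows):
    """Load rows into bank_data, fill result_table, and return its contents."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bank_data (education TEXT, marital TEXT, housing TEXT)")
    conn.execute("CREATE TABLE result_table (education TEXT, num INTEGER)")
    conn.executemany("INSERT INTO bank_data VALUES (?, ?, ?)", rows)
    # SQLite has no INSERT OVERWRITE, so clear the table first to imitate it.
    conn.execute("DELETE FROM result_table")
    conn.execute(
        """
        INSERT INTO result_table
        SELECT education, COUNT(marital) AS num
        FROM bank_data
        WHERE housing = 'yes' AND marital = 'single'
        GROUP BY education
        """
    )
    result = conn.execute(
        "SELECT education, num FROM result_table ORDER BY education"
    ).fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    for education, num in build_result(SAMPLE_ROWS):
        print(education, num)
```

Only rows that are both single and have housing set to yes are counted, so the married bachelor row and the high_school row without a housing loan are excluded from the totals.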
- After editing the SQL statements in the insert_data node, click Save to prevent code loss.
- Click Run to view the operation logs and results.
Save and submit business flows
Now you have learned how to create, save, and submit a business flow. The next topic shows how to create a synchronization task that exports data to different types of data sources. For more information, see Create synchronization task export results.