
Data processing analysis

Last Updated: Mar 27, 2018

Create a workflow

Procedure

Go to the Data Development page, and create the workflow coolshell_log.

Step 1: Create a directory for the workflow files. In the file directory tree, switch to Data Development and create a new directory named Web_log_analysis:

[Figure: creating the Web_log_analysis directory]

Step 2: Right-click the folder in the directory to create a workflow, or click New in the upper-right corner of the workspace and select Create task from the drop-down list.

[Figure: creating a workflow task]

Enter the workflow name (coolshell_log) and a description, and select Periodic Scheduling as the scheduling type. Periodic Scheduling is selected because the workflow processes the daily logs through automatic scheduling.

Click Create to create the workflow.

Step 3: Configure the workflow properties. The scheduling time properties are shown in the following figure. The dependency properties are not required in this case. In practical scenarios, dependency properties are required when data is imported into the workflow from upstream tasks.

[Figure: workflow scheduling properties]

Design workflow nodes

After the workflow is created, open the workflow design panel to add and design its nodes.

Step 1: Double-click the node widget or drag it to the canvas on the right, and add the following nodes in order:

■ Data processing (ODPS SQL): The node name is dw_log_parser. After the data is imported, this node performs the ETL process (splitting the request field) on the data and writes the result into the dw_log_parser table (an illustrative SQL sketch follows the note below).

In practical scenarios, if the data is not already imported by another workflow, you must add an import node. In this case, the data import step is omitted.

■ Data analysis (ODPS SQL): The node name is dw_log_detail. This node performs further analysis and processing on the dw_log_parser table to produce the dw_log_detail table.

■ Data analysis (ODPS SQL): Build a user dimension table (dim_user_info) and a website access fact table (dw_log_fact) based on the dw_log_detail table. The names of the nodes required are dim_user_info and dw_log_fact.

■ Data application (ODPS SQL): Based on the preceding user dimension table and website access fact table, these nodes meet the business needs specified in the Requirements Analysis for this test: a PV/UV table (adm_user_measueres), which measures website traffic by user device type, and a website access source table (adm_refer_info).

Note: Data applications are designed to meet specific business needs. Under normal conditions, this layer may be developed by a member of another team, in another project, or in another workflow. To provide a complete walkthrough, the two tasks adm_user_measueres and adm_refer_info in this layer are also placed in this workflow.
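The exact SQL statements for each node are provided in the attachment. For orientation only, the following is a minimal ODPS SQL sketch of the dw_log_parser processing step and of the adm_user_measueres PV/UV aggregation. The source table ods_log_tracker, the column names (ip, uid, request, status, referer, agent, device), and the dt partition are assumptions made for this illustration, not the actual schema:

-- Sketch of the ETL step: split the raw request field and write the result
-- into the partitioned dw_log_parser table (table and column names assumed).
INSERT OVERWRITE TABLE dw_log_parser PARTITION (dt='${bdp.system.bizdate}')
SELECT
    ip,
    uid,
    SPLIT_PART(request, ' ', 1) AS method,    -- e.g. GET or POST
    SPLIT_PART(request, ' ', 2) AS url,       -- requested path
    SPLIT_PART(request, ' ', 3) AS protocol,  -- e.g. HTTP/1.1
    status,
    referer,
    agent
FROM ods_log_tracker
WHERE dt = '${bdp.system.bizdate}';

-- Sketch of the application layer: count page views (PV) and unique
-- visitors (UV) per device type from the fact and dimension tables.
INSERT OVERWRITE TABLE adm_user_measueres PARTITION (dt='${bdp.system.bizdate}')
SELECT
    u.device,
    COUNT(*)              AS pv,
    COUNT(DISTINCT f.uid) AS uv
FROM dw_log_fact f
JOIN dim_user_info u
  ON f.uid = u.uid
WHERE f.dt = '${bdp.system.bizdate}'
  AND u.dt = '${bdp.system.bizdate}'
GROUP BY u.device;

Here ${bdp.system.bizdate} stands for the scheduling business-date parameter; in the actual nodes, use whatever partition variable the attached SQL defines.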

The following figure shows how the nodes are arranged on the canvas:

[Figure: nodes arranged on the canvas]

You can connect these nodes with lines according to their internal logic to show the relationships among them. When you hover the cursor over a node, a small semicircle appears in the middle of the node. Move the cursor toward the semicircle until it turns into a cross, press the left mouse button to start drawing a line, and drag the pointer to the next node to connect the two nodes. The line reflects the dependency between nodes, and the arrow reflects the execution order. Click Save after connecting the nodes to save your design. The connected nodes are shown in the following figure:

[Figure: connected workflow nodes]

Configure the node

In the overall view of the development panel, double-click each node to enter the node code editing area, enter the corresponding SQL statements (see the attachment for the specific SQL statements), and complete the corresponding parameter configuration. Several of the SQL nodes do not use custom variables, so you do not have to configure their parameters; the default values are sufficient.
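If a node's SQL does use a custom variable, declare it in the node's parameter configuration and reference it in the code with the ${...} placeholder syntax. As a hedged illustration, assuming a custom variable named bizdate assigned the scheduling expression $[yyyymmdd] and the same illustrative columns as in the earlier sketch:

-- Assumed parameter configuration of the node: bizdate=$[yyyymmdd]
-- The node's SQL then references the variable as ${bizdate}; the real
-- processing logic for dw_log_detail is in the attached SQL.
INSERT OVERWRITE TABLE dw_log_detail PARTITION (dt='${bizdate}')
SELECT
    ip,
    uid,
    method,
    url,
    status,
    referer,
    agent
FROM dw_log_parser
WHERE dt = '${bizdate}';

When the scheduler runs the node, ${bizdate} is replaced with the business date of that instance.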

After the code and parameters are configured, save the nodes.

Run workflow nodes

To run the entire workflow, you must click Save and Submit.

[Figure: saving and submitting the workflow]

After the workflow is submitted, you can test the running status of all nodes in the entire workflow through scheduling.

[Figure: testing the workflow through scheduling]

The original data used in this test is the daily data for February 24, 2018; therefore, we selected 2014/02/12 as the business date. After creating the smoke test, you can go to the O&M center to check the details of the workflow test.

Double-click the workflow to display the specific node instances, check the running status of each node, and finally confirm that all table data is output successfully.
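To double-check the results outside the O&M center, you can also run an ad hoc query against the output tables. A minimal check, assuming the tables are partitioned by dt (replace the partition value with the business date you selected):

-- Ad hoc verification that the application-layer tables were produced;
-- non-zero counts indicate that the nodes wrote data successfully.
SELECT COUNT(*) FROM adm_user_measueres WHERE dt = '20180224';
SELECT COUNT(*) FROM adm_refer_info WHERE dt = '20180224';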

[Figure: node instance running status]

Next: Prepare a BI report
