This tutorial demonstrates how to analyze home buyer groups to help you master DataWorks Data Development and Data Analysis.
Case introduction
This tutorial analyzes purchasing behavior based on home buyer data. You will use DataWorks to upload local data to a MaxCompute bank_data table, analyze user groups using a MaxCompute SQL node to generate a result_table, and visualize the results to create group profiles.
This tutorial uses simulated data. In actual scenarios, replace this with your own business data.
The following flowchart illustrates the data flow and development process.
The analysis yields the following profile: Single home buyers with loans primarily hold university.degree or high.school diplomas.

Prerequisites
Activate DataWorks
Create a workspace
Create and associate resources
Associate MaxCompute resources
Procedure
In this tutorial, you will use DataWorks to upload test data to a MaxCompute project. Then, you will create a DataStudio workflow to clean and write data, debug the workflow, and verify the results using SQL.
Step 1: Create a table
First, use Data Catalog in DataWorks to create a bank_data table in MaxCompute.
Log on to the DataWorks Console. Switch to the target region, click in the left navigation pane, select the corresponding workspace from the drop-down list, and then click Go to Data Studio.
Click the icon in the left navigation pane to go to the Data Catalog page.
(Optional) If your MaxCompute project is missing from Data Catalog, click the icon, go to DataWorks Data Sources, and add the project.
Expand the MaxCompute directory, select the target MaxCompute project, and create a MaxCompute table in the Table folder.
Note: If the schema feature is enabled for your MaxCompute project, you must select the target schema after selecting the project before you create the MaxCompute table in the Table folder.
This example uses a standard mode workspace, so create the bank_data table in the development environment only. If you are using a simple mode workspace, you only need to create the bank_data table in the MaxCompute project corresponding to the production environment.
Click the icon to open the table editing page. Enter the following SQL statement in the DDL section. The system automatically generates the table information.
CREATE TABLE IF NOT EXISTS bank_data (
    age             BIGINT COMMENT 'Age',
    job             STRING COMMENT 'Job type',
    marital         STRING COMMENT 'Marital status',
    education       STRING COMMENT 'Education level',
    `default`       STRING COMMENT 'Has credit card',
    housing         STRING COMMENT 'Housing loan',
    loan            STRING COMMENT 'Loan',
    contact         STRING COMMENT 'Contact method',
    month           STRING COMMENT 'Month',
    day_of_week     STRING COMMENT 'Day of week',
    duration        STRING COMMENT 'Duration',
    campaign        BIGINT COMMENT 'Number of contacts in this campaign',
    pdays           DOUBLE COMMENT 'Interval from last contact',
    previous        DOUBLE COMMENT 'Number of previous contacts',
    poutcome        STRING COMMENT 'Outcome of previous marketing campaign',
    emp_var_rate    DOUBLE COMMENT 'Employment variation rate',
    cons_price_idx  DOUBLE COMMENT 'Consumer price index',
    cons_conf_idx   DOUBLE COMMENT 'Consumer confidence index',
    euribor3m       DOUBLE COMMENT 'Euribor 3 month rate',
    nr_employed     DOUBLE COMMENT 'Number of employees',
    y               BIGINT COMMENT 'Has term deposit'
);
On the editing page, click Deploy to create the bank_data table in the MaxCompute project corresponding to the development environment.
After the bank_data table is created, you can click the table name in Data Catalog to view the table details.
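If you prefer to check the new table from a query editor instead of Data Catalog, a quick schema check (not part of the tutorial itself) can be run in any MaxCompute SQL editor:
-- Optional check: confirm that bank_data exists and lists the expected columns
DESC bank_data;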
Step 2: Upload data
Download the banking.csv file. Use the DataWorks upload feature to upload it to the bank_data table.
Ensure that a Scheduling Resource Group and a Data Integration Resource Group are configured before uploading. For details, see Data upload Limitations.
Click the icon and go to the Upload & Download page. Click Upload Data and configure the following settings:
Specify Data to Be Uploaded
- Data Source Type: Local file.
- Data Source: Upload the local banking.csv file.
Configure Destination Table
- Target Engine: MaxCompute.
- MaxCompute Project Name: Select the project containing the bank_data table.
- Select Destination Table: Select the bank_data table as the target table.
Preview Data of Uploaded File
- Click Mapping by Order to map data to table fields.
Note: Local files support .csv, .xls, .xlsx, and .json formats. For spreadsheet files, the first sheet is uploaded by default. The maximum size for .csv files is 5 GB. For other file types, the limit is 100 MB.
Click Upload Data to upload the data from the downloaded CSV file to the bank_data table in the MaxCompute computing resource.
Verify the upload: check the data in the bank_data table by using SQL Query (Legacy).
Click the icon in the upper-left corner, and then open SQL Query from the pop-up page. Click the icon next to My Files, customize the File Name, and click OK.
On the SQL Query page, enter the following SQL:
SELECT * FROM bank_data LIMIT 10;
In the upper-right corner, select the workspace and the MaxCompute data source where the bank_data table resides, and then click OK.
Note: This example uses a standard mode workspace and the bank_data table is created only in the development environment, so you must select the MaxCompute data source for the development environment. If you are using a simple mode workspace, you can select the MaxCompute data source for the production environment.
Click Run (confirm the cost estimation if prompted). The bottom pane displays the first 10 records, confirming that the upload succeeded.
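If you want a sanity check beyond the 10-row preview, a row count (not part of the tutorial) helps confirm that the whole file was loaded; compare it with the number of data rows in banking.csv:
-- Optional check: total number of rows loaded into bank_data
SELECT COUNT(*) AS row_cnt FROM bank_data;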

Step 3: Process data
Use a MaxCompute SQL node to filter the bank_data table for the education levels of single home buyers with loans, and then write the results to the result_table.
Build the data processing pipeline
Click the icon in the upper-left corner. At the top of the page, switch to the workspace created in this tutorial, and then click the icon in the left navigation pane to go to Data Studio.
In Workspace Directories, click the icon and choose Create Workflow. Name it dw_basic_case and click OK.
Drag a Zero Load Node and two MaxCompute SQL nodes onto the canvas and rename them.
The node names and functions used in this tutorial are as follows:
- Zero Load node workshop_start: Manages the workflow structure. This is a no-op task that requires no code.
- MaxCompute SQL node ddl_result_table: Creates the result_table table that stores the cleaned data from bank_data.
- MaxCompute SQL node insert_result_table: Filters bank_data and writes the results to result_table.
Connect the nodes so that workshop_start runs first, followed by ddl_result_table and then insert_result_table.
Note: Workflows support configuring upstream and downstream dependencies via manual connection or by automatically identifying dependencies through code parsing. This tutorial uses the manual connection method. For more information, see Automatic dependency parsing.
Click Save in the node toolbar.
Configure data processing nodes
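The exact node code is not reproduced here. A minimal sketch of the two MaxCompute SQL nodes could look like the following, assuming result_table only needs the education level and a head count, and that the values 'single' and 'yes' in bank_data mark single buyers and housing-loan holders:
-- ddl_result_table node (sketch): create the table that stores the cleaned result.
-- The column set is an assumption; adjust it to the fields you want to keep.
CREATE TABLE IF NOT EXISTS result_table (
    education STRING COMMENT 'Education level',
    num       BIGINT COMMENT 'Number of people'
);

-- insert_result_table node (sketch): count single home buyers with housing loans by education level.
-- The filter values 'yes' and 'single' are assumptions about the sample data.
INSERT OVERWRITE TABLE result_table
SELECT education,
       COUNT(*) AS num
FROM   bank_data
WHERE  housing = 'yes'
AND    marital = 'single'
GROUP BY education;
Each statement lives in its own node so that the table is guaranteed to exist before the insert runs, which is also why ddl_result_table is placed upstream of insert_result_table in the workflow.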
Step 4: Debug and run
Click the icon to execute the workflow. Check the logs if any failures occur.

Step 5: Data query and display
Data processing is complete. Query the result_table and analyze the data in SQL Query (Legacy).
Click the icon in the upper-left corner, and then open SQL Query from the pop-up page. Click the icon next to My Files, customize the File Name, and click OK.
On the SQL Query page, enter the following SQL:
SELECT * FROM result_table;
In the upper-right corner, select the workspace and the MaxCompute data source where the result_table table resides, and then click OK.
Note: This example uses a standard mode workspace and result_table exists only in the development environment, so select the corresponding data source. If you are using a simple mode workspace, you can select the MaxCompute data source for the production environment.
Click Run at the top. On the cost estimation page, click Run.
Click the icon in the query results to view the visualized chart. You can click the icon in the upper-right corner of the chart to customize the chart style. You can also click Save in the upper-right corner of the chart to save the chart as a card, and then click Card in the left navigation pane to view it.
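If you prefer to read the profile directly from the query results instead of the chart, a simple ordering query (not part of the tutorial) shows which education levels dominate:
-- Optional: list education levels by number of matching buyers.
-- MaxCompute requires LIMIT together with ORDER BY by default.
SELECT education, num
FROM   result_table
ORDER BY num DESC
LIMIT  10;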
Next steps
For details on modules and parameters, see Data Studio (new version) and Data Analysis.
In addition to the modules introduced in this tutorial, DataWorks also supports multiple modules such as Data Modeling, Data Quality, Data Security Guard, DataService Studio, Data Integration, and Node scheduling configuration, providing you with one-stop data monitoring and O&M.
You can also experience more DataWorks practice tutorials. For specific content, see More use cases and tutorials.

