DataWorks serves as a unified end-to-end big data development and governance platform based on compute engines such as MaxCompute. This topic describes how to use MaxCompute in DataWorks.
Background information
DataWorks allows you to associate compute engines with a DataWorks workspace. After you associate a compute engine with a DataWorks workspace as a compute engine instance, you can create nodes of the same compute engine type in the DataWorks console and then enable the system to periodically schedule the nodes. You can use MaxCompute in DataWorks in one of the following ways:
Use the SQL query feature of DataAnalysis
You can use this feature to perform operations such as editing MaxCompute SQL statements, querying data, analyzing data by using workbooks, and sharing and downloading data online. For more information about the SQL query feature, see SQL query.
Use ODPS nodes in DataStudio
DataWorks encapsulates different types of compute engine tasks into different types of nodes to define data development tasks. You can use resources, functions, and related logic processing nodes to develop more complex tasks. ODPS nodes include ODPS SQL nodes, ODPS Spark nodes, PyODPS 2 nodes, PyODPS 3 nodes, ODPS Script nodes, and ODPS MR nodes.
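As a rough illustration of what the body of a PyODPS node might contain, the sketch below counts the rows of a table through a PyODPS-style entry object. It assumes the entry object offers `execute_sql` and that the returned instance offers `open_reader`, as in the PyODPS library. The function name `count_rows` and the table name `sales_orders` are hypothetical and do not come from this topic.

```python
def count_rows(entry, table_name):
    """Return the row count of a table via a PyODPS-style entry object.

    `entry` is assumed to behave like a PyODPS ODPS object. The table name
    is inserted into the SQL text as-is, so pass only trusted names.
    """
    sql = "SELECT COUNT(*) AS cnt FROM {}".format(table_name)
    instance = entry.execute_sql(sql)  # assumed to wait until the job finishes
    with instance.open_reader() as reader:
        for record in reader:
            return record[0]  # first column of the first row
    return None  # the query produced no rows
```

Inside a PyODPS node, this might be called as `count_rows(o, "sales_orders")`, assuming the node exposes the entry object under the name `o`.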
Scenarios
Scenarios for DataAnalysis
You can use the SQL query feature of DataAnalysis to query data and then use Web Excel in analysis mode to analyze the query results online. Analyzing results online reduces how often data is transferred and helps keep the data secure. You can also download the query results to your on-premises machine for analysis.
Scenarios for ODPS nodes
To run a MaxCompute job periodically, use Data Studio in the DataWorks console to develop an auto triggered node for the job and configure settings such as time properties and scheduling dependencies for the node. Then, commit the node to DataWorks Operation Center for periodic scheduling.
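A periodically scheduled job usually processes a different slice of data on each run. The sketch below is plain Python, not a DataWorks API: it derives the previous day's date from a run date and builds the MaxCompute SQL text for that slice. The table names `ods_orders` and `dws_order_daily`, the columns `buyer_id` and `ds`, and both helper names are hypothetical.

```python
from datetime import date, timedelta


def previous_day(run_date):
    """Return the day before run_date, formatted as yyyymmdd."""
    return (run_date - timedelta(days=1)).strftime("%Y%m%d")


def build_daily_sql(run_date):
    """Build the SQL text that one daily run might execute.

    Table and column names are placeholders for illustration only.
    """
    ds = previous_day(run_date)
    return (
        "INSERT OVERWRITE TABLE dws_order_daily PARTITION (ds='{ds}') "
        "SELECT buyer_id, COUNT(*) AS order_cnt "
        "FROM ods_orders WHERE ds='{ds}' GROUP BY buyer_id;"
    ).format(ds=ds)
```

In DataWorks itself, the date would more likely come from the time properties configured for the node than from code like this; the sketch only shows how one run maps to one partition.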
Procedure
Associate MaxCompute computing resources with the workspace or create a MaxCompute data source.
The subsequent operations depend on whether you turned on Participate in Public Preview of Data Studio when you created the workspace.
To check whether Participate in Public Preview of Data Studio is turned on, find the desired workspace on the Workspaces page in the DataWorks console and perform the following operations:
Participate in Public Preview of Data Studio not turned on
Choose Shortcuts > Data Development in the Actions column.
The old-version DataStudio page appears, as shown in the following figure.
For more information about old-version DataStudio, see Overview.
Participate in Public Preview of Data Studio turned on
Choose Shortcuts > DataStudio (new version) in the Actions column.
The new-version Data Studio page appears, as shown in the following figure.
For more information about new-version Data Studio, see Data Studio (new version).
If you participated in the public preview, associate MaxCompute computing resources with the workspace. For more information, see Associate a computing resource with a workspace (Participate in Public Preview of Data Studio turned on).
If you did not participate in the public preview, create a MaxCompute data source and bind the data source to DataStudio (old version). For more information, see Add a data source or register a cluster (Participate in Public Preview of Data Studio not turned on).
Note: If a MaxCompute data source is created but is not bound to DataStudio (old version), only data synchronization operations can be performed. Operations such as data development, task scheduling, and data analysis cannot be performed based on MaxCompute.
Use MaxCompute in DataWorks.
Use DataAnalysis
You can use one of the following methods to go to the SQL Query page in DataAnalysis:
In the left-side navigation pane of the MaxCompute console, click Data Analytics. On the DataAnalysis page in the DataWorks console, click SQL Query. The SQL Query page appears.
In the Shortcuts section on the homepage of DataAnalysis, click SQL Query. The SQL Query page appears.
In the left-side navigation pane of the DataAnalysis page, click SQL Query to go to the SQL Query page.
For more information about how to perform operations such as creating SQL queries and executing query statements, see SQL query.
Use ODPS nodes
For information about how to create an ODPS node, see Overview.
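To recap step 1 of the procedure, the sketch below encodes the setup rules from this topic as two small Python functions. It is only a model of those rules, not a DataWorks API; the function names, parameter names, and returned strings are made up for illustration.

```python
def setup_step(preview_on):
    """Return the step 1 action for a workspace.

    preview_on: True if Participate in Public Preview of Data Studio
    is turned on for the workspace.
    """
    if preview_on:
        return "Associate MaxCompute computing resources with the workspace"
    return "Create a MaxCompute data source and bind it to DataStudio (old version)"


def old_version_operations(bound_to_datastudio):
    """Return the MaxCompute operations available in the old version.

    An unbound data source supports data synchronization only.
    """
    operations = ["data synchronization"]
    if bound_to_datastudio:
        operations += ["data development", "task scheduling", "data analysis"]
    return operations
```

For example, `old_version_operations(False)` returns only data synchronization, which mirrors the note in step 1 about a data source that is created but not bound.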