MaxCompute:DataWorks

Last Updated: Jun 25, 2023

DataWorks is a unified, end-to-end big data development and governance platform that runs on compute engines such as MaxCompute. This topic describes how to use MaxCompute in DataWorks.

Background information

DataWorks allows you to associate compute engines with a DataWorks workspace. After you associate a compute engine with a workspace as a compute engine instance, you can create nodes of that compute engine type in the DataWorks console and have the system schedule the nodes periodically. You can use one of the following methods to connect DataWorks to MaxCompute:

  • Use the SQL query feature of DataAnalysis

    You can use this feature to perform operations such as editing MaxCompute SQL statements, querying data, analyzing data by using workbooks, and sharing and downloading data online. For more information about the SQL query feature, see SQL query.

  • Use ODPS nodes in DataStudio

    DataWorks encapsulates different types of compute engine tasks into different types of nodes to define data development tasks. You can use resources, functions, and related logic processing nodes to develop more complex tasks. ODPS nodes include ODPS SQL nodes, ODPS Spark nodes, PyODPS 2 nodes, PyODPS 3 nodes, ODPS Script nodes, and ODPS MR nodes.

Scenarios

Use scenarios of DataAnalysis

You can use the SQL query feature of DataAnalysis to query data and then use Web Excel in analysis mode to analyze the query results. To reduce the frequency at which data is transferred and ensure data security, you can also download the query results to your on-premises machine for analysis.

Use scenarios of ODPS nodes

If you want to run a MaxCompute job periodically, you can use DataStudio in the DataWorks console to develop an auto triggered node for the job and configure settings such as time properties and scheduling dependencies for the node. You can then commit the node to DataWorks Operation Center for periodic scheduling.
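The role of scheduling dependencies can be illustrated with a small conceptual sketch. It is not DataWorks code and does not call any DataWorks API. It assumes only that a node runs after every node it depends on, and the node names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical node names. Each key maps to the set of nodes it depends on.
dependencies = {
    "load_orders": set(),
    "clean_orders": {"load_orders"},
    "daily_report": {"clean_orders"},
}


def run_order(deps):
    """Return an order in which every node comes after the nodes it depends on."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)
```

Because the three hypothetical nodes form a single chain, only one order is valid. In a graph with independent branches, several orders would satisfy the same dependencies.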

Instructions

  1. Create a DataWorks workspace.

    For more information, see Create and manage workspaces.

  2. Associate a MaxCompute compute engine with the DataWorks workspace.

    For more information, see Associate a MaxCompute compute engine with a workspace.

    Note

    If you use a workspace in basic mode, you can associate existing MaxCompute compute engines with the workspace. If you use a workspace in standard mode, you cannot associate existing MaxCompute compute engines with the workspace.

  3. (Optional) Add a MaxCompute data source to DataWorks.

    DataWorks has two kinds of data sources: those that are automatically generated when you associate a compute engine with a workspace, and those that you add on the Data Source page. If you want to perform operations on a data source that was not automatically generated for the compute engine, make sure that the data source has been added first. For more information, see Create and manage data sources.

  4. Use MaxCompute in DataWorks.

    • DataAnalysis

      You can use one of the following methods to go to the SQL Query page in DataAnalysis:

      • In the left-side navigation pane of the MaxCompute console, click Data Analytics. On the DataAnalysis page in the DataWorks console, click SQL Query. The SQL Query page appears.

      • In the Shortcuts section on the homepage of DataAnalysis, click SQL Query. The SQL Query page appears.

      • In the left-side navigation pane of the DataAnalysis page, click SQL Query. The SQL Query page appears.

      For more information about how to perform operations such as creating SQL queries and executing query statements, see SQL query.

    • ODPS nodes

      For information about how to create an ODPS node, see Overview.
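As a minimal sketch of what the code in a PyODPS 3 node might look like, the helper below counts the rows of a table through a MaxCompute entry object. It assumes the entry object exposes the PyODPS methods `execute_sql` and `open_reader`. The function name `count_rows` and the table name in the usage comment are hypothetical and are used only for illustration.

```python
def count_rows(o, table_name):
    """Run a COUNT(*) query through a MaxCompute entry object and return the result.

    `o` is assumed to behave like a PyODPS entry object: execute_sql() runs the
    statement and returns an instance, and open_reader() yields result records.
    """
    sql = f"SELECT COUNT(*) AS cnt FROM {table_name}"
    instance = o.execute_sql(sql)  # blocks until the SQL job finishes
    with instance.open_reader() as reader:
        for record in reader:
            return record[0]  # first column of the first result row
    return None  # the query returned no rows


# Hypothetical usage inside a PyODPS node, where the entry object is
# already available as the global variable `o`:
# print(count_rows(o, "my_table"))
```

Passing the entry object as a parameter keeps the helper independent of the node environment, so it can be exercised with a stand-in object outside DataWorks.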