Overview - MaxCompute - Alibaba Cloud Documentation Center

If you want to periodically run a MaxCompute job, you can use DataStudio in the DataWorks console to develop the job that runs on an auto triggered node and configure related parameters for the node. The related parameters include time properties and scheduling dependencies. Then, you can submit the MaxCompute job to DataWorks Operation Center for periodic scheduling. This topic describes how to develop a periodically scheduled job in the DataWorks console.
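As a rough illustration of what a scheduling dependency means, the following Python sketch orders a few nodes so that each node runs only after its ancestor nodes. The node names are hypothetical, and the sketch uses the Python standard library rather than any DataWorks API. It shows only the ordering idea, not how DataWorks resolves dependencies internally.

```python
from graphlib import TopologicalSorter

# Hypothetical node names. Each key maps to the set of nodes it depends on.
dependencies = {
    "sync_orders": set(),
    "clean_orders": {"sync_orders"},
    "daily_report": {"clean_orders"},
}


def run_order(deps):
    """Return an execution order in which every node follows its ancestors."""
    return list(TopologicalSorter(deps).static_order())


print(run_order(dependencies))
```

In this sketch, a dependency cycle causes `TopologicalSorter` to raise an error, which loosely mirrors the requirement that scheduling dependencies must not form a loop.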

Development workflow

The following figure shows the basic development workflow of a MaxCompute job in the DataWorks console.

Note: A DataWorks workspace can work in basic mode or standard mode. In standard mode, one DataWorks workspace is associated with a MaxCompute project in the production environment and a MaxCompute project in the development environment. In the preceding figure, a DataWorks workspace in standard mode is used.

Take note of the following points:

Different types of jobs can run on different types of nodes, such as ODPS SQL nodes, ODPS Spark nodes, and PyODPS nodes.
In the DataWorks console, you can create MaxCompute tables, register MaxCompute functions with DataWorks, and create MaxCompute resources, which improves job development efficiency.
You can also use general nodes in the DataWorks console to implement complex job logic, such as looping and traversal.

Billing

When you create MaxCompute-related synchronization nodes and data processing nodes in DataStudio and enable periodic scheduling of these nodes in Operation Center, you are charged for DataWorks resources and for the resources of other Alibaba Cloud services.

Fees for DataWorks resources

This section describes the fees that are included in your DataWorks bill. For information about the billable items of DataWorks, see Billing overview.

Fees for the DataWorks edition that you use:
You must activate DataWorks before you can develop nodes in DataWorks. If you activate an advanced edition such as DataWorks Enterprise Edition, you are charged the related fees when you purchase the edition.
Fees for the scheduling resources that you use to schedule nodes:
After nodes are developed, scheduling resources are required to schedule the nodes. You can purchase resource groups for scheduling, such as subscription exclusive resource groups for scheduling and the pay-as-you-go shared resource group for scheduling, based on your business requirements, and pay for the resource groups.
Fees for the resources that you use to synchronize data:
A data synchronization node consumes scheduling resources and synchronization resources. You can purchase resource groups for Data Integration, such as subscription exclusive resource groups for Data Integration and the pay-as-you-go shared resource group for Data Integration (debugging), based on your business requirements, and pay for the resource groups.

Note

You are not charged scheduling fees if you run nodes by clicking Run or Run with Parameters in the top toolbar on the DataStudio page.
You are not charged scheduling fees for failed nodes or dry-run nodes.

For more information that helps you understand the billing details, see Issuing logic of scheduling nodes in DataWorks.
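The exemptions in the preceding note can be read as a simple rule. The following Python sketch encodes only those exemptions. The `NodeRun` type and its field names are hypothetical and exist only for this illustration. Actual charges depend on the resource group and the billing rules in the DataWorks billing documentation.

```python
from dataclasses import dataclass


@dataclass
class NodeRun:
    """Hypothetical record of one node run, used only for this illustration."""
    trigger: str      # "scheduled", or "manual" for Run / Run with Parameters in DataStudio
    dry_run: bool     # True if the node was a dry-run node
    succeeded: bool   # False if the node failed


def incurs_scheduling_fee(run: NodeRun) -> bool:
    """Apply the exemptions from the note: manual, failed, and dry-run nodes are not charged."""
    if run.trigger == "manual":
        return False
    if run.dry_run or not run.succeeded:
        return False
    return True


runs = [
    NodeRun(trigger="scheduled", dry_run=False, succeeded=True),
    NodeRun(trigger="manual", dry_run=False, succeeded=True),
    NodeRun(trigger="scheduled", dry_run=True, succeeded=True),
    NodeRun(trigger="scheduled", dry_run=False, succeeded=False),
]
print([incurs_scheduling_fee(r) for r in runs])
```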

Fees for the resources of other Alibaba Cloud services

This section describes the fees that are not included in your DataWorks bill.

Important: You are charged for the resources of other Alibaba Cloud services based on the billing rules of those services. For more information, see the billing documentation of each service. For example, for the billing details of a MaxCompute compute engine that you use, see Billable items of MaxCompute.

You may also be charged for the resources of other Alibaba Cloud services that are used to develop and run nodes in DataWorks. The fees include but are not limited to the following items:

Database fees:
When you run data synchronization nodes to read data from and write data to databases, database fees may be generated.
Computing and storage fees:
When you run nodes of a specific type of compute engine, computing and storage fees of this type of compute engine may be generated. For example, if you run an ODPS SQL node to create a MaxCompute table and write data to the MaxCompute table, you may be charged for computing and storage resources of a MaxCompute compute engine.
Network service fees:
When you establish network connections between DataWorks and other related services, network service fees may be generated. For example, if you use services, such as Express Connect, Elastic IP Address (EIP), and EIP Bandwidth Plan, to establish network connections between DataWorks and other related services, you may be charged network service fees.

Permission management

DataWorks provides a comprehensive permission management system for you to manage product-level permissions and module-level permissions. In the DataWorks console, you can request permissions on MaxCompute compute engine resources and process requests for accessing MaxCompute compute engine resources.

Management of data access permissions

You can use an ODPS SQL node or an ad hoc query node to query data in MaxCompute tables. A DataWorks workspace in basic mode does not support fine-grained permission management or isolation of data between the development and production environments. In this topic, a DataWorks workspace in standard mode is used.

Description of permissions of built-in workspace-level roles on MaxCompute

The following table describes the permissions of RAM users on MaxCompute after the RAM users are added to a workspace as members and are assigned workspace-level roles.


Permission type	Description
Permissions on a MaxCompute project in the development environment	After you assign a RAM user a built-in workspace-level role in your workspace and associate a MaxCompute project with the workspace in the development environment, the RAM user is automatically granted the permissions of the mapped role in that MaxCompute project. By default, the RAM user therefore has permissions on the MaxCompute project in the development environment, but has no permissions on the MaxCompute project that is associated with the workspace in the production environment.
Permissions on a MaxCompute project in the production environment	The RAM user that is used as the scheduling access identity has extensive permissions on the MaxCompute project in the production environment. Other RAM users do not have permissions on the MaxCompute project in the production environment. To perform operations on MaxCompute tables in the production environment, these users must go to Security Center to request the required permissions. DataWorks provides a default request processing procedure. DataWorks also allows users that are granted management permissions to customize request processing procedures.

For more information about permission management for MaxCompute, see Manage permissions on data in a MaxCompute compute engine instance.

Description of data access behaviors

MaxCompute allows you to query tables across projects. You can query data in a MaxCompute project that is associated with a workspace in the production environment by specifying the project name on the DataStudio page. The following table describes the methods to query tables across projects and the accounts that can be used to access the tables in different environments.

Note

In the Compute Engine Information section of the Workspace Management page, you can view the MaxCompute projects that are associated with the workspace in the development and production environments and the accounts that are used to configure environments for the MaxCompute projects. For more information, see Associate a MaxCompute compute engine with a workspace.
In the MaxCompute project that is associated with the workspace in the development environment, the personal identity of a node executor is used to run nodes by default. In the MaxCompute project that is associated with the workspace in the production environment, an Alibaba Cloud account is used to run nodes. The Alibaba Cloud account is used as the scheduling access identity. For more information, see Associate a MaxCompute compute engine with a workspace.


Sample code	Execution account in the development environment (DataStudio and Operation Center in the development environment)	Execution account in the production environment (Operation Center in the production environment)
Access tables in the MaxCompute project in the development environment: `select col1 from projectname_dev.tablename;`	The personal Alibaba Cloud account of a node executor is used to access tables in the MaxCompute project in the development environment. If a RAM user runs a node, the personal Alibaba Cloud account of the RAM user is used to access tables in the MaxCompute project in the development environment. If an Alibaba Cloud account is used to run a node, the Alibaba Cloud account is used to access tables in the MaxCompute project in the development environment.	The scheduling access identity is used to access tables in the MaxCompute project in the development environment.
Access tables in the MaxCompute project in the production environment: `select col1 from projectname.tablename;`	The personal Alibaba Cloud account of a node executor is used for the query. Note: Because of security controls on data in the production environment, a personal Alibaba Cloud account cannot access tables in the MaxCompute project in the production environment by default. To use a personal Alibaba Cloud account to access these tables, go to Security Center to request the permissions. DataWorks provides a default request processing procedure. DataWorks also allows users that are granted management permissions to customize request processing procedures.	The scheduling access identity is used to access tables in the MaxCompute project in the production environment.
Execute the following statement in the MaxCompute project in the desired environment such as the development environment to access tables in the MaxCompute project: `select col1 from tablename;`	If the statement is executed in the MaxCompute project in the development environment, you can use the personal Alibaba Cloud account of a node executor to access tables in the MaxCompute project in the development environment.	If the statement is executed in the MaxCompute project in the production environment, you can use the scheduling access identity to access tables in the MaxCompute project in the production environment.
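The preceding table can be summarized in a short Python sketch: the identity depends on the environment in which the statement runs, not on the project that the statement targets. The function names are hypothetical, and the project and table names are the sample names from the table. The sketch does not call MaxCompute or DataWorks.

```python
def build_query(column, table, project=None):
    """Build the sample statement; omit the project to use the current project."""
    target = f"{project}.{table}" if project else table
    return f"select {column} from {target};"


def execution_identity(environment):
    """Return the identity that runs a statement in the given environment."""
    identities = {
        "development": "personal identity of the node executor",
        "production": "scheduling access identity",
    }
    if environment not in identities:
        raise ValueError(f"unknown environment: {environment}")
    return identities[environment]


# A statement that targets the development project but runs in production
# still uses the scheduling access identity.
print(build_query("col1", "tablename", project="projectname_dev"))
print(execution_identity("production"))
```

Whether the resolved identity is actually allowed to read the target table is a separate question. For example, a personal account needs permissions requested in Security Center before it can read production tables.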

Management of permissions on services and features

Before a RAM user can develop data in DataWorks, you must assign a workspace-level role to the RAM user to grant the required permissions. For more information, see Best practices for managing permissions of RAM users.

You can use RAM policy-based authorization to manage permissions on DataWorks service modules, for example to prohibit DataWorks users from accessing DataMap. You can also use it to manage permissions on operations in the DataWorks console, for example to allow DataWorks users to delete a workspace.
You can use role-based access control (RBAC) to manage permissions on DataWorks workspace-level service modules, for example to allow DataWorks users to access DataStudio to perform development-related operations. You can also use it to manage permissions on DataWorks global-level service modules, for example to prohibit DataWorks users from accessing Data Security Guard.

For more information, see Manage permissions on services and features.