Data development overview

Last Updated: Apr 04, 2018

The Data Development feature helps you design data computing processes according to your business needs and run mutually dependent tasks automatically in the scheduling system.

Objects

In the data development stage, DataWorks provides four types of objects to choose from according to your needs: tasks, scripts, resources, and functions. The relationship among these objects within a project is as follows.

[Figure: Objects]

Object description:

  • Tasks: Tasks are the main objects of data development. They carry periodic properties and dependencies and serve as the main carrier of data computing. Various types of tasks and nodes are supported for different scenarios. For more information, see Task type overview.

  • Scripts: Scripts are the auxiliary objects of data development. They have no periodic properties or dependencies. Scripts are mainly used to process non-periodic temporary data, such as adding, deleting, and modifying temporary tables. For more information, see Script development.

  • Functions and resources: Files and computing functions that the code in a task references must be uploaded to the computing space (MaxCompute) before the task runs. For more information, see Resource management and Function management.
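
The distinction between the object types can be pictured with a small Python sketch. This is only an illustration of the descriptions above, not the DataWorks API: the class names, the `period`, `depends_on`, and `resources` fields, and the `missing_resources` helper are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Script:
    """Auxiliary object: code only, no periodic properties or dependencies."""
    name: str
    code: str


@dataclass
class Task:
    """Main object: code plus periodic properties and dependencies."""
    name: str
    code: str
    period: str = "daily"                            # hypothetical periodic property
    depends_on: list = field(default_factory=list)   # names of upstream tasks
    resources: list = field(default_factory=list)    # files/functions the code references


def missing_resources(task, uploaded):
    """Return the referenced resources that have not been uploaded yet."""
    return [r for r in task.resources if r not in uploaded]


# Hypothetical example objects.
report = Task(
    name="daily_report",
    code="SELECT my_udf(col) FROM src;",
    depends_on=["load_src"],
    resources=["my_udf.jar"],
)
cleanup = Script(name="drop_tmp", code="DROP TABLE IF EXISTS tmp_t;")

# A task should only run once this list is empty.
print(missing_resources(report, uploaded=set()))
print(missing_resources(report, uploaded={"my_udf.jar"}))
```

In this sketch a script has no scheduling fields at all, which mirrors its role as a one-off helper, while a task cannot be considered ready until every resource it references has been uploaded.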

Process

The following figure shows how a task is developed and used.

[Figure: TaskProcess]

For more information, see Guide description.

Instructions on running a task

As the preceding process shows, DataWorks provides four running modes for executing the computing statements in a task. Their use cases and limits are as follows.

| Procedure | Trigger mode | If instances are generated in the Operation center | Scheduling property | Use case | Note |
| --- | --- | --- | --- | --- | --- |
| Page direct run | Manual | No | Not subject to scheduling period and dependency | Suitable for the code debugging stage. Saving or submitting is not required | Scripts and tasks are supported. Supported task types only include ODPS_SQL, OPEN_MR, ODPS_MR, and SHELL. |
| Test Run | Manual | Yes | Subject to scheduling period but not to dependency | Suitable for checking parameter replacements and code running | Only tasks are supported, and the latest submitted version is used. |
| System automatic run | Automatic | Yes | Subject to scheduling period and dependency | A main method to automatically compute data by using DataWorks. Maintainers in the O&M center are required to maintain all the periodic instances and make sure they run in a sequence | Only tasks are supported, and the latest submitted version is used. |
| Data completing run | Manual | Yes | Subject to scheduling period and dependency | A supplement to system automatic run. Used when some newly created or wrong tasks must trigger the data computing for a period before today | Only tasks are supported, and the latest submitted version is used. |
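
The four running modes above can also be read as a small lookup structure. The following Python sketch encodes only what the table states. The `RunMode` class, its field names, the mode keys, and the `modes_for` helper are hypothetical names chosen for illustration and are not part of DataWorks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunMode:
    trigger: str               # "manual" or "automatic"
    generates_instances: bool  # whether instances are generated
    honors_period: bool        # subject to the scheduling period
    honors_dependency: bool    # subject to dependencies
    supports_scripts: bool     # scripts allowed in addition to tasks


RUN_MODES = {
    "page_direct_run": RunMode("manual", False, False, False, True),
    "test_run": RunMode("manual", True, True, False, False),
    "system_automatic_run": RunMode("automatic", True, True, True, False),
    "data_completing_run": RunMode("manual", True, True, True, False),
}

# Task types the table lists as supported for page direct run.
PAGE_DIRECT_TASK_TYPES = {"ODPS_SQL", "OPEN_MR", "ODPS_MR", "SHELL"}


def modes_for(kind, task_type=None):
    """Return the run modes usable for a 'script' or for a 'task' of a given type."""
    allowed = []
    for name, mode in RUN_MODES.items():
        if kind == "script":
            if mode.supports_scripts:
                allowed.append(name)
        elif name == "page_direct_run":
            # Page direct run accepts only the listed task types.
            if task_type in PAGE_DIRECT_TASK_TYPES:
                allowed.append(name)
        else:
            allowed.append(name)
    return allowed


print(modes_for("script"))
print(modes_for("task", "ODPS_SQL"))
```

Written this way, the limits in the table become simple checks: a script can only be run directly from the page, while a submitted task can use every mode, provided its type is one of the listed ones when it is run directly from the page.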