A task performs an operation on data. Typical uses of the different task types are as follows:
A data synchronization node task is used to copy data from RDS to MaxCompute.
A MaxCompute SQL node task is used to run MaxCompute SQL for data conversion.
A flow task chains several inner SQL nodes together to perform a series of data conversions.
Each task takes zero or more data tables (data sets) as input and generates one or more data tables (data sets) as output.
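The input/output rule above can be sketched as a small data model. This is a hypothetical illustration, not a DataWorks API: the `Task` class, the `upstream_of` helper, and the table names are invented here, and treating "task B reads a table that task A writes" as an upstream relationship is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical task: zero or more input tables, one or more output tables."""
    name: str
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Every task must generate at least one output table (data set).
        if not self.outputs:
            raise ValueError(f"task {self.name!r} must declare at least one output table")


def upstream_of(task: Task, all_tasks: list[Task]) -> set[str]:
    """Return the names of tasks that write a table that `task` reads."""
    wanted = set(task.inputs)
    return {t.name for t in all_tasks if t is not task and wanted & set(t.outputs)}


# A sync task with no input tables, followed by two SQL tasks.
sync = Task("rds_to_odps", inputs=[], outputs=["ods_orders"])
clean = Task("clean_orders", inputs=["ods_orders"], outputs=["dwd_orders"])
report = Task("daily_report", inputs=["dwd_orders"], outputs=["ads_report"])

tasks = [sync, clean, report]
print(upstream_of(report, tasks))  # {'clean_orders'}
```

A task with zero inputs, such as the sync task here, is valid; a task with no outputs is rejected when it is constructed.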
Tasks are divided into node tasks, flow tasks, and inner nodes. The following figure shows the relationships among them:
A node task is an operation performed on data. It can be configured to depend on other node tasks and flow tasks, and these dependencies together form a directed acyclic graph (DAG).
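The DAG requirement can be illustrated with Python's standard-library `graphlib` module (Python 3.9 or later). This is a sketch of the concept only, not how the DataWorks scheduler is implemented; the task names and the `run_order` helper are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical dependency map: task name -> set of upstream task names it depends on.
dependencies = {
    "rds_to_odps": set(),
    "clean_orders": {"rds_to_odps"},
    "user_dim": {"rds_to_odps"},
    "daily_report": {"clean_orders", "user_dim"},
}


def run_order(deps: dict[str, set[str]]) -> list[str]:
    """Return an order in which every task runs after all of its upstream tasks.

    Raises CycleError if the dependencies do not form a DAG.
    """
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)

# A cycle (a depends on b, b depends on a) has no valid run order, so it is rejected.
try:
    run_order({"a": {"b"}, "b": {"a"}})
except CycleError as err:
    print("not a DAG, cycle:", err.args[1])
```

Tasks with no dependency between them, such as `clean_orders` and `user_dim`, may appear in either order; only upstream-before-downstream is guaranteed.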
A flow task is a group of inner nodes that together handle a small business process. We recommend that you use fewer than 10 flow tasks. Other flow tasks and node tasks cannot depend on the inner nodes of a flow task. A flow task itself can be configured to depend on other flow tasks and node tasks to form a DAG.
An inner node is a node inside a flow task. It provides essentially the same capabilities as a node task. Its scheduling frequency is inherited from the flow task and cannot be configured independently. Dependencies between inner nodes can only be configured by dragging connections between them.
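The two flow-task rules above (inner nodes inherit the flow's frequency, and outside tasks cannot depend on inner nodes) can be sketched as simple checks. The `FlowTask` class, `check_dependency` function, and names are hypothetical and only model the rules as stated; they are not part of DataWorks.

```python
from dataclasses import dataclass, field


@dataclass
class FlowTask:
    """Hypothetical flow task: a named group of inner nodes sharing one schedule."""
    name: str
    frequency: str  # for example "daily"; inner nodes cannot override this
    inner_nodes: list[str] = field(default_factory=list)

    def frequency_of(self, inner_node: str) -> str:
        # Inner nodes have no frequency of their own; they inherit the flow's.
        if inner_node not in self.inner_nodes:
            raise KeyError(inner_node)
        return self.frequency


def check_dependency(downstream: str, upstream: str, flows: list[FlowTask]) -> None:
    """Reject a node or flow task depending on an inner node of any flow task.

    This sketch covers only dependencies between node and flow tasks, not the
    links drawn between inner nodes inside the same flow.
    """
    for flow in flows:
        if upstream in flow.inner_nodes:
            raise ValueError(
                f"{downstream!r} cannot depend on {upstream!r}: "
                f"it is an inner node of flow {flow.name!r}"
            )


orders_flow = FlowTask("orders_flow", frequency="daily", inner_nodes=["stage_sql", "merge_sql"])
print(orders_flow.frequency_of("merge_sql"))  # daily

# Allowed: depending on the flow task as a whole.
check_dependency("daily_report", "orders_flow", [orders_flow])

# Rejected: depending on an inner node of the flow.
try:
    check_dependency("daily_report", "merge_sql", [orders_flow])
except ValueError as err:
    print(err)
```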
For more information about data operation types, see Task type description.
When a task is scheduled by the system or triggered manually, an instance is generated. An instance is a snapshot of a task for a run at a specific moment. It records the task's run time, run status, run logs, and other information. For example:
Assume that Task 1 is configured to run at 02:00 each day. In this case, at 23:30 each day the scheduling system automatically generates an instance of Task 1 that is scheduled to run at 02:00 the next day. At 02:00 the next day, once the system detects that the upstream tasks are complete, it runs the Task 1 instance automatically.
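The timeline in this example can be sketched in Python. The 23:30 generation time and the 02:00 run time come from the example above; the `Instance` fields, the status strings, and the `generate_instance` and `try_run` helpers are hypothetical simplifications, not the DataWorks data model.

```python
from dataclasses import dataclass, field
from datetime import date, datetime, time, timedelta


@dataclass
class Instance:
    """Simplified snapshot of a task for one scheduled run (fields are assumptions)."""
    task: str
    generated_at: datetime
    scheduled_run: datetime
    status: str = "waiting"
    logs: list[str] = field(default_factory=list)


def generate_instance(task: str, run_at: time, day: date) -> Instance:
    """Create, at 23:30 on `day`, the instance that runs at `run_at` the next day."""
    generated_at = datetime.combine(day, time(23, 30))
    scheduled_run = datetime.combine(day + timedelta(days=1), run_at)
    return Instance(task, generated_at, scheduled_run)


def try_run(inst: Instance, now: datetime, upstream_done: bool) -> bool:
    """Start the instance only if its run time has arrived and upstream tasks are complete."""
    if now >= inst.scheduled_run and upstream_done:
        inst.status = "running"
        inst.logs.append(f"{now.isoformat()} started")
        return True
    return False


inst = generate_instance("Task 1", time(2, 0), date(2024, 5, 1))
print(inst.generated_at, inst.scheduled_run)  # 2024-05-01 23:30:00 2024-05-02 02:00:00

# At 02:00 the instance keeps waiting while the upstream task is unfinished.
print(try_run(inst, datetime(2024, 5, 2, 2, 0), upstream_done=False))  # False
print(try_run(inst, datetime(2024, 5, 2, 2, 5), upstream_done=True))   # True
```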
You can query task instance information on the O&M Center > Task O&M page.
Submitting is the process of releasing a developed node task or flow task from the development environment to the scheduling system. After a task is submitted, its code and scheduling configuration are synchronized to the scheduling system, which then schedules the task according to that configuration.
Node tasks and flow tasks that have not been submitted do not enter the scheduling system.
A script is a storage space for code used in data analysis. Script code cannot be released to the scheduling system, and scheduling parameters cannot be configured for it. Scripts can be used only for data query and analysis.
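The submit rules and the script restriction can be modeled with a toy scheduler. Everything here (`DevItem`, `Scheduler`, the `kind` values, and the `run_at` setting) is hypothetical. The sketch also assumes that submitting takes a copy of the code and configuration, so later edits in the development environment reach the scheduler only after the task is submitted again.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class DevItem:
    """Hypothetical item in the development environment."""
    name: str
    kind: str  # "node", "flow", or "script"
    code: str
    schedule: dict = field(default_factory=dict)


class Scheduler:
    """Toy scheduling system that only knows about submitted tasks."""

    def __init__(self) -> None:
        self.submitted: dict[str, DevItem] = {}

    def submit(self, item: DevItem) -> None:
        # Scripts are for query and analysis only and cannot be released.
        if item.kind == "script":
            raise ValueError(f"script {item.name!r} cannot be submitted for scheduling")
        # Synchronize a copy of the code and scheduling configuration at submit time.
        self.submitted[item.name] = copy.deepcopy(item)

    def is_scheduled(self, name: str) -> bool:
        return name in self.submitted


sched = Scheduler()
task = DevItem("clean_orders", "node", "SELECT 1;", {"run_at": "02:00"})
print(sched.is_scheduled("clean_orders"))  # False: not submitted yet

sched.submit(task)
task.code = "SELECT 2;"  # a later edit in the development environment
print(sched.submitted["clean_orders"].code)  # SELECT 1;

try:
    sched.submit(DevItem("adhoc_check", "script", "SELECT COUNT(*) FROM t;"))
except ValueError as err:
    print(err)
```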
DataWorks provides interfaces for managing resources and functions. Resources and functions that are managed through other MaxCompute methods cannot be queried in DataWorks.