This topic introduces the basic concepts related to DataWorks, including the workflow, solution, SQL script template, node, instance, commit operation, script, resource, function, and output name.
Workflow
- Nodes in a workflow are organized by type.
- A hierarchical directory structure is supported. We recommend that you create at most four levels of subfolders.
- You can view and optimize each workflow from a business perspective.
- You can deploy and manage each workflow as a whole.
- You can view each workflow on a dashboard to efficiently develop code.
Solution
A solution contains one or more workflows.
- A solution can contain multiple workflows.
- A workflow can be added to multiple solutions.
- Workspace members can collaboratively develop and manage all solutions in a workspace.
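The many-to-many relationship between solutions and workflows can be pictured with a small data structure. The following is a minimal sketch, not DataWorks code: the solution and workflow names are hypothetical, and the mapping is a plain Python dictionary rather than any DataWorks API.

```python
from collections import defaultdict

# Hypothetical names, for illustration only.
# Each solution lists the workflows it contains.
solution_workflows = {
    "retail_solution": ["orders_etl", "customer_profile"],
    "finance_solution": ["orders_etl", "billing_report"],
}


def solutions_by_workflow(mapping):
    """Invert the solution -> workflows mapping into workflow -> solutions."""
    index = defaultdict(list)
    for solution, workflows in mapping.items():
        for workflow in workflows:
            index[workflow].append(solution)
    return dict(index)


index = solutions_by_workflow(solution_workflows)
# "orders_etl" appears in both solutions; "billing_report" in only one.
print(index["orders_etl"])
print(index["billing_report"])
```

The inverted index makes it easy to see which solutions are affected when a shared workflow changes.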
SQL script template
SQL script templates are reusable blocks of general logic abstracted from SQL scripts. Reusing them makes code development more efficient.
Each SQL script template involves one or more source tables. You can filter the source table data, join the source tables, and aggregate the results to generate a result table that meets new business requirements. An SQL script template includes multiple input and output parameters.
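The idea of a parameterized template can be sketched outside DataWorks. The example below assumes Python's standard `string.Template`; the `${...}` placeholder syntax belongs to that class, not to DataWorks, and the table names, column names, and the `render` helper are hypothetical.

```python
from string import Template

# Hypothetical template: one input table, one output table, and a few
# parameters that control the filter and the aggregation.
AGG_TEMPLATE = Template(
    "INSERT OVERWRITE TABLE ${output_table}\n"
    "SELECT ${group_key}, SUM(${metric}) AS total\n"
    "FROM ${input_table}\n"
    "WHERE ${filter_condition}\n"
    "GROUP BY ${group_key};"
)


def render(template, **params):
    """Fill in the template; substitute() raises KeyError if a parameter is missing."""
    return template.substitute(**params)


sql = render(
    AGG_TEMPLATE,
    input_table="sales_detail",
    output_table="sales_by_region",
    group_key="region",
    metric="amount",
    filter_condition="ds = '20240101'",
)
print(sql)
```

The same template can be rendered again with different input and output tables, which is the reuse that the template concept is meant to provide.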
Node
A node performs a data operation. For example:
- A sync node is used to synchronize data from ApsaraDB for RDS to MaxCompute.
- An ODPS SQL node is used to run MaxCompute SQL for data conversion.
Each node has zero or more input tables or datasets and generates one or more output tables or datasets.
|Node task||A node task is a data operation. You can configure the dependencies between a node task and other node tasks or flow tasks to form a directed acyclic graph (DAG).|
|Flow task||A flow task contains a group of inner nodes that process a workflow. We recommend that you create fewer than 10 flow tasks. Inner nodes in a flow task cannot be used as dependencies of other flow tasks or node tasks. You can configure the dependencies between a flow task and other flow tasks or node tasks to form a DAG. Note: In DataWorks V2.0 and later, you can still find the flow tasks that were created in DataWorks V1.0, but you cannot create new flow tasks. You can create workflows instead to perform similar operations.|
|Inner node||An inner node is a node within a flow task. It has basically the same features as a node task. You can configure dependencies between inner nodes in a flow task by using drag-and-drop operations. However, you cannot configure a recurrence for inner nodes because they follow the recurrence configuration of the flow task.|
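The dependencies described in the table form a directed acyclic graph, and a valid run order is any topological order of that graph. The following is a minimal sketch that assumes Python 3.9 or later for the standard `graphlib` module; the node names are hypothetical, and this is not how the DataWorks scheduler is implemented.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical node names. Each key maps to the set of nodes it depends on.
dependencies = {
    "sync_rds_to_maxcompute": set(),
    "clean_orders": {"sync_rds_to_maxcompute"},
    "aggregate_orders": {"clean_orders"},
    "export_report": {"aggregate_orders", "clean_orders"},
}


def run_order(deps):
    """Return the nodes in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)

# A circular dependency is not a DAG and is rejected.
try:
    run_order({"a": {"b"}, "b": {"a"}})
except CycleError as err:
    print("cycle detected:", err.args[1])
```

Because a cycle has no valid run order, the dependencies must stay acyclic for scheduling to be possible.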
Instance
An instance is a snapshot of a node at a specific point in time. It is generated when the node is scheduled by the scheduling system or triggered manually. The instance contains information such as the time when the node ran, the running status of the node, and operational logs.
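The relationship between a node and its instances can be sketched as a record created per run. This is an illustrative model only: the field names, the status values, and the `create_instance` helper are assumptions and do not reflect the DataWorks data model.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Instance:
    """One run of a node. All field names here are hypothetical."""
    node_name: str
    scheduled_time: datetime
    trigger: str                      # "scheduled" or "manual"
    status: str = "not_run"           # illustrative status value
    logs: list = field(default_factory=list)


def create_instance(node_name, scheduled_time, trigger="scheduled"):
    """Create a new instance for a node; each run gets its own record."""
    if trigger not in ("scheduled", "manual"):
        raise ValueError(f"unknown trigger: {trigger}")
    return Instance(node_name, scheduled_time, trigger)


# The same node produces separate instances for a scheduled run and a manual rerun.
daily = create_instance("clean_orders", datetime(2024, 1, 1))
rerun = create_instance("clean_orders", datetime(2024, 1, 1), trigger="manual")
rerun.status = "success"
rerun.logs.append("finished without errors")
```

Updating the rerun leaves the scheduled instance untouched, which mirrors the idea that each instance records one run of the node.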
Script
A script stores the code for data analysis. The code in a script can be used only for data query and analysis. It cannot be deployed to the scheduling system and cannot be scheduled.
Resource and function
The DataWorks console allows you to manage resources and functions. Note that you cannot query resources and functions in DataWorks if they were uploaded through other services such as MaxCompute.
Output name
Under an Alibaba Cloud account, each node has an output name that is used to connect to its descendant nodes.
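One way to picture how output names connect nodes is to treat each output name as a key that descendant nodes reference. The sketch below is illustrative only: the node names, the output name format, and the rule that output names must not repeat are assumptions of this example, not statements about DataWorks behavior.

```python
# Hypothetical nodes. Each node declares one output name and the output
# names of the nodes it depends on.
nodes = {
    "sync_orders": {
        "output": "my_project.sync_orders_out",
        "inputs": [],
    },
    "clean_orders": {
        "output": "my_project.clean_orders_out",
        "inputs": ["my_project.sync_orders_out"],
    },
    "report_orders": {
        "output": "my_project.report_orders_out",
        "inputs": ["my_project.clean_orders_out"],
    },
}


def resolve_parents(node_specs):
    """Map each node to its parent nodes by matching inputs to output names."""
    by_output = {}
    for name, spec in node_specs.items():
        output = spec["output"]
        # Assumption of this sketch: an output name identifies exactly one node.
        if output in by_output:
            raise ValueError(f"duplicate output name: {output}")
        by_output[output] = name
    return {
        name: [by_output[i] for i in spec["inputs"]]
        for name, spec in node_specs.items()
    }


parents = resolve_parents(nodes)
print(parents["clean_orders"])
print(parents["report_orders"])
```

Resolving dependencies through output names rather than node names lets a descendant refer to the data a node produces without knowing how that node is defined.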