DataWorks offers two task types and seven node types:
Task types: Node task and workflow task.
Node types: Virtual node, ODPS_SQL node, SHELL node, data synchronization node, machine learning node, ODPS_MR node, and OPEN_MR node.
A node task uses a single node type, chosen to match the business requirement.
A workflow task combines nodes of different types, together with the dependencies between them, to complete a complex data computing task.
This article describes how to create a node task and edit its code, using an ODPS_SQL node task as an example. For the usage of other task types, see Task type overview.
Go to Data Development > New > New task to create a new task.
Complete the following configurations in the New Task window.
- Task Type: Select Node Task.
- Type: Select ODPS_SQL.
- Scheduling Type: Select Periodic Scheduling.
For a node task, Periodic Scheduling is the only Scheduling Type that can be selected. It means that, once the task is submitted successfully, the code enters the scheduling system and runs periodically according to the scheduling properties.
After an ODPS_SQL task is created, you can write MaxCompute SQL statements in the code editor. MaxCompute SQL syntax differs from standard SQL syntax. For more information about the differences, see Basic differences with standard SQL and solutions.
After writing and debugging the MaxCompute SQL statements, click Save so that you can continue editing the node when you open it later.
Because a node task is scheduled periodically, we recommend that you use it for computing statements, and use other functions such as Table creation and Script development for statements that operate on and maintain tables.
For example, you can write a MaxCompute SQL statement as follows.
select * from bank_data
DataWorks also provides the following shortcut keys for code writing and debugging.
| Function | PC shortcut key | Mac shortcut key |
|---|---|---|
| Delete a row | Ctrl+Shift+K | Cmd+Shift+K |
| Select same words | Ctrl+D | Cmd+D |
| Highlight same words in batch | Ctrl+Alt+G | Cmd+Alt+G |
For more information about MaxCompute SQL syntax, see Summary.
For MaxCompute SQL limits, see SQL restrictions.
For the differences in syntax between MaxCompute SQL and standard SQL, see Basic differences with standard SQL and solutions.
The query result displays a maximum of 10,000 records. To download all the data records, see Tunnel commands.
SET statements can be used directly in SQL nodes.
Multiple SQL statements are run in sequence.
Note: If the code selected to run contains SET statements and is run on the page, these SET statements are run in sequence before each non-SET statement. The same applies when all the code in the task is run.
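The behavior described in the note can be modeled with a small sketch. This is not DataWorks code; it is a hypothetical Python illustration that assumes every SET statement in the selected code is issued before each non-SET statement, and that statements are separated by semicolons. The function name `plan_runs` and the flag `odps.sql.allow.fullscan` are used here for illustration only.

```python
def plan_runs(code: str) -> list[list[str]]:
    """Model which statements are issued for each non-SET statement.

    Assumption: all SET statements in the selected code are run, in order,
    before every non-SET statement. This is an interpretation of the note
    above, not the scheduler's actual implementation.
    """
    # Naive split on ';' (assumes no semicolons inside literals or comments).
    statements = [s.strip() for s in code.split(";") if s.strip()]

    def is_set(statement: str) -> bool:
        # A SET statement starts with the keyword SET, in any letter case.
        return statement.split()[0].lower() == "set"

    set_statements = [s for s in statements if is_set(s)]
    other_statements = [s for s in statements if not is_set(s)]

    # Each non-SET statement is preceded by the full list of SET statements.
    return [set_statements + [s] for s in other_statements]


sample = """
set odps.sql.allow.fullscan=true;
select * from bank_data;
select count(*) from bank_data;
"""

for run in plan_runs(sample):
    print(run)
```

In this model, both queries in `sample` are issued together with the same SET statement, so a setting written once at the top of the code applies to every statement that follows.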
To make sure the node runs periodically and can adapt to the context for each run, you must configure the scheduling period and parameters for the node.
DataWorks provides a wide selection of options for Scheduling configuration and Dependency, and various System scheduling parameters. See the relevant articles to choose the configuration that suits your business requirements.
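To illustrate how scheduling parameters let the same code adapt to each run, the following Python sketch mimics parameter substitution. It is a simplified model, not the scheduler's implementation. The placeholder names `${bdp.system.bizdate}` and `${bdp.system.cyctime}`, their formats, and the partition column `ds` are assumptions that do not appear in this article; confirm the actual names and semantics in the System scheduling parameters article.

```python
from datetime import datetime, timedelta


def render(sql: str, scheduled_time: datetime) -> str:
    """Replace assumed system parameters with values for one scheduled run."""
    # Assumption: the business date is the day before the scheduled run,
    # formatted as yyyymmdd.
    bizdate = (scheduled_time - timedelta(days=1)).strftime("%Y%m%d")
    # Assumption: the cycle time is the scheduled run time,
    # formatted as yyyymmddhhmmss.
    cyctime = scheduled_time.strftime("%Y%m%d%H%M%S")
    return (
        sql.replace("${bdp.system.bizdate}", bizdate)
        .replace("${bdp.system.cyctime}", cyctime)
    )


# Hypothetical query: `ds` is an assumed partition column on bank_data.
template = "select * from bank_data where ds = '${bdp.system.bizdate}'"

print(render(template, datetime(2024, 3, 1, 0, 30)))
print(render(template, datetime(2024, 3, 2, 0, 30)))
```

The same template produces a different statement for each scheduled run, which is why the parameters must be configured before the task is submitted.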
After the code and parameters are configured and debugged, submit the task. The scheduling system generates running instances and runs the code on schedule, according to the configured period, only after the periodic task is submitted successfully.
For more information, see Submit a task.