ODPS SQL nodes can process terabytes of data in distributed scenarios that do not require real-time processing by using the SQL-like syntax. This topic describes how to create an ODPS SQL node.
Prerequisites
A MaxCompute compute engine instance is associated with your workspace. The MaxCompute service is available in a workspace only after you associate a MaxCompute compute engine instance with the workspace on the Workspace Management page. For more information, see Create and manage workspaces.Background information
In most cases, a long period of time is required from preparation to job committing. To reduce the time required, you can use ODPS SQL nodes to process thousands to tens of thousands of transactions. ODPS SQL nodes are online analytical processing (OLAP) applications that are designed to process large amounts of data..
Limits
ODPS SQL nodes have the following limits:
- You cannot use a SET or USE statement separately in the code of an ODPS SQL node. They must be executed with other SQL statements. For example, you can use a SET statement together with a CREATE TABLE statement.
set a=b; create table name(id string);
- You cannot add comments to statements that contain keywords in the code of an ODPS SQL node. The statements include the SET and USE statements. For example, a comment is not allowed in the following code:
create table name(id string); set a=b; --Comment // You cannot add a comment. create table name1(id string);
- You cannot add comments to the end of a complete statement in the code of an ODPS SQL node. Examples: Note If a semicolon (;) is added to the end of an SQL statement, the SQL statement is considered complete.
select * --Comment // This statement is incomplete. You can add a comment. from dual;--Comment // This statement is complete. You cannot add a comment. show tables;
- The running of an ODPS SQL node during data development and the scheduled running of an ODPS SQL node have the following differences:
- Running during data development: All the statements that contain keywords are combined in the node code and are executed before you execute other SQL statements.
- Scheduled running: All SQL statements are executed in sequence.
set a=b; create table name1(id string); set c=d; create table name2(id string);
The following table describes the differences between the two execution modes for the preceding SQL statements.SQL statement Running during data development Scheduled running First SQL statement set a=b; set c=d; create table name1(id string);
set a=b; create table name1(id string);
Second SQL statement set a=b; set c=d; create table name2(id string);
set c=d; create table name2(id string);
- You must configure a scheduling parameter in the format of
key=value
. Do not add spaces before or after the equal sign (=). Examples:time={yyyymmdd hh:mm:ss} -- The format is invalid. a =b -- The format is invalid.
- If you use keywords such as bizdate and date as scheduling parameters, you must specify the values in the format of yyyymmdd. If you want to use other time formats, do not use the preceding keywords as scheduling parameters. Example:
bizdate=201908 -- The format is invalid.
- You can use only statements that start with SELECT, READ, or WITH to query the result data of a node during data development. Otherwise, no results are returned.
- If you want to execute multiple SQL statements, separate the SQL statements with semicolons (;). Specify each SQL statement in a separate line.
- Incorrect example
create table1;create table2
- Correct example
create table1; create table2;
- Incorrect example
- If new data types are used for the additional functions of MaxCompute V2.0, you must add
set odps.sql.type.system.odps2=true;
before the SQL statements that use the functions, and commit and execute the code together with the SQL statements. - If you want to add comments to SQL statements, do not use semicolons (;) in comments. Incorrect example:
create table1; -- Create a table named table1; then, create a table named table2. create table2;
- The code of an ODPS SQL node can be a maximum of 128 KB in size and can contain a maximum of 200 SQL statements.
Procedure
- Go to the DataStudio page.
- Log on to the DataWorks console.
- In the left-side navigation pane, click Workspaces.
- In the top navigation bar, select the region where the workspace resides. On the Workspaces page, find the workspace in which you want to create tables, and click DataStudio in the Actions column.
- In the Scheduled Workflow pane, move the pointer over the icon and choose . Alternatively, you can click a workflow in the Scheduled Workflow pane, right-click MaxCompute, and then choose .
- In the Create Node dialog box, configure the Name and Path parameters. Note The node name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
- Click Confirm.
- On the configuration tab of the ODPS SQL node, write and run code in the code editor. After the node is created, write the code of the ODPS SQL node. The code must conform to the SQL syntax. For more information about SQL syntax, see Overview of MaxCompute SQL.Note Due to the adjustment made by the International Organization for Standardization (ISO) on the UTC+8 time zone, differences exist between the actual time and the output time when you execute related SQL statements in DataWorks. In a year from 1900 to 1928, the time difference is 352 seconds. Before the year of 1900, the time difference is 9 seconds.You cannot use a
SET
statement separately in the node code. The SET statement must be executed together with other SQL statements. For example, you can execute a SET statement together with a SELECT statement.set odps.sql.allow.fullscan=true; select 1;
For more information about SET statements, see SET operations.
Example: Create a table, insert data into the table, and then query data in the table.
- On the node configuration tab, click the Properties tab in the right-side navigation pane and configure scheduling properties for the node. For more information, see Configure basic properties.
- Commit the node. Important You can commit the node only after you configure the Rerun and Parent Nodes parameters.In a workspace in standard mode, you must click Deploy in the upper-right corner after you commit a node. For more information, see Deploy nodes.
- Perform O&M operations on the node. For more information, see Perform basic O&M operations on auto triggered nodes.