All Products
Search
Document Center

DataWorks:SQL coding guidelines and specifications

Last Updated:Mar 27, 2026

Consistent SQL formatting reduces review time, lowers the risk of logic errors, and makes it easier for teammates to understand and maintain your code. This document defines the formatting rules and naming conventions for SQL written in DataWorks (MaxCompute).

Coding guidelines

Apply these principles when writing SQL:

  • Write comprehensive code that covers all necessary conditions and edge cases.

  • Keep code clear, neat, and well-organized.

  • Consider execution performance during SQL design.

  • Add comments where the logic is not self-evident.

  • Write all SQL keywords and reserved words in consistent case — either all uppercase or all lowercase. For example: select or SELECT, from or FROM, where or WHERE. Mixed case (for example, Select or seLECT) is not allowed.

  • Use four spaces as one indentation unit. All indentation must be a multiple of this unit.

  • Never use SELECT *. Always specify column names explicitly to avoid unexpected results when the table schema changes.

  • Place matching opening and closing parentheses in the same column.

  • These guidelines are non-mandatory. Reasonable deviations are allowed as long as the general rules are followed.

SQL coding specifications

Code header

Add a header at the top of each SQL script. The header includes the subject, description, author, and creation date. Reserve a row for change logs and a title line so that future changes can be recorded. Each line must not exceed 80 characters.

Template:

-- MaxCompute(ODPS) SQL
--**************************************************************************
-- ** Subject:      Transaction
-- ** Description:  Transaction refund analysis
-- ** Author:       Youma
-- ** Created on:   20170616
-- ** Change log:
-- ** Modified on   Modified by   Content
-- yyyymmdd          name          comment
-- 20170831          Wuma          Add a comment on the biz_type=1234 transaction
--**************************************************************************

Column arrangement

Each selected column in a SELECT statement goes on its own line. This makes it easy to add, remove, or reorder columns without affecting adjacent lines.

Follow these formatting rules:

  • Leave one indentation unit between SELECT and the first column.

  • Start each subsequent column on a new line with two indentation units, preceded by a comma.

  • Place the comma before the column name (not after the preceding column). This way, commenting out any single column does not break the syntax of adjacent lines.

  • Place the AS keyword on the same line as its column, and align all AS keywords in the same column.

Good:

SELECT
        order_id
    ,   user_id        AS uid
    ,   amount         AS total_amount
    ,   created_at     AS create_time
FROM orders

Bad:

SELECT order_id, user_id AS uid, amount AS total_amount, created_at AS create_time FROM orders

INSERT statement

Arrange all clauses of an INSERT statement on the same line.

SELECT statement clause arrangement

Clauses such as FROM, WHERE, GROUP BY, HAVING, ORDER BY, JOIN, and UNION must follow these formatting rules:

  • Place each clause on its own line.

  • Left-align all clauses with the SELECT keyword.

  • Separate the first keyword of a clause from its content with two indentation units.

  • If the clause name is longer than two indentation units (such as ORDER BY or GROUP BY), add a single space between the clause name and its content.

  • Place AND and OR at the beginning of the line, left-aligned with WHERE. Starting each condition with its logical operator makes each condition independently scannable and immediately shows whether conditions are additive or alternative.

Good:

SELECT
        order_id
    ,   user_id
    ,   amount
FROM    orders
WHERE   status = 'completed'
AND     amount > 100
AND     created_at >= '2024-01-01'
GROUP BY
        order_id
    ,   user_id
    ,   amount
ORDER BY amount DESC

Bad:

SELECT order_id, user_id, amount FROM orders WHERE status = 'completed' AND amount > 100

Operator spacing

Place one space before and after each arithmetic operator and logical operator. Keep all operators on the same line unless the line exceeds 80 characters.

Good:

WHERE amount > 100 AND status = 'completed'

Bad:

WHERE amount>100 AND status='completed'

CASE expressions

Follow these rules when writing CASE expressions:

  • Write the WHEN clause one indentation unit after CASE, on the same line.

  • Put each WHEN clause on its own line. Wrap long clauses.

  • Always include an ELSE clause to handle unexpected values. Align ELSE with WHEN.

Good:

SELECT
        order_id
    ,   CASE status
            WHEN 'completed'  THEN 'Success'
            WHEN 'cancelled'  THEN 'Cancelled'
            ELSE 'Pending'
        END AS order_status
FROM    orders

Bad:

SELECT order_id, CASE status WHEN 'completed' THEN 'Success' WHEN 'cancelled' THEN 'Cancelled' END AS order_status FROM orders

The "Bad" example omits the ELSE clause, which leaves unexpected values unhandled.

Subqueries

Subqueries are commonly used for extract, transform, load (ETL) operations in data warehouse systems. When nesting subqueries, use the following naming convention for aliases at each level:

Level Full name Abbreviation
1 part P (or a)
2 segment S (or b)
3 unit U (or c)
4 detail D (or d)

To distinguish multiple subqueries at the same level, append a number to the alias (for example, P1, P2). Add comments to aliases as needed.

Example:

SELECT
        P.order_id
    ,   P.user_id
    ,   P.total_amount
FROM (
    -- P: Aggregate completed orders by user
    SELECT
            S.user_id
        ,   SUM(S.amount) AS total_amount
    FROM (
        -- S: Filter completed orders
        SELECT
                order_id
            ,   user_id
            ,   amount
        FROM    orders
        WHERE   status = 'completed'
    ) S
    GROUP BY S.user_id
) P
WHERE P.total_amount > 1000

Table aliases

Follow these rules when using table aliases:

  • Assign an alias to every table in a SELECT statement.

  • Once an alias is assigned, use it consistently throughout the statement.

  • Define aliases in alphabetical order.

  • Do not use SQL keywords as aliases.

  • In nested queries, use the level-based naming convention (P/S/U/D or a/b/c/d) described in the Subqueries section.

  • Add comments to aliases when their meaning is not immediately obvious.

SQL comments

Follow these rules when adding comments:

  • Add a comment to each SQL statement.

  • Place statement-level comments on a separate line, before the statement.

  • Place column-level comments immediately after the column.

  • Add comments to clauses that are difficult to understand.

  • Comment on any important or business-critical code.

  • For long statements, add comments by segment based on purpose.

  • Describing a constant or variable is required. Commenting on the valid value range is optional.

Adding comments with keyboard shortcuts:

  • Windows: Press Ctrl+/ to comment or uncomment the current line. Select multiple lines first to comment them all at once.

  • macOS: Press cmd+/ to comment or uncomment the current line. Select multiple lines first to comment them all at once.

End-to-end example

The following example demonstrates all the specifications above in a single SQL script: code header, column arrangement, clause formatting, CASE expression, subquery, aliases, and comments.

-- MaxCompute(ODPS) SQL
--**************************************************************************
-- ** Subject:      Sales Analysis
-- ** Description:  Monthly revenue by user segment for completed orders
-- ** Author:       Youma
-- ** Created on:   20240101
-- ** Change log:
-- ** Modified on   Modified by   Content
-- yyyymmdd          name          comment
--**************************************************************************

-- P: Final output -- monthly revenue by user segment
SELECT
        S.order_month
    ,   S.user_segment
    ,   S.monthly_revenue
FROM (
    -- S: Aggregate revenue and classify users
    SELECT
            DATE_FORMAT(U.created_at, '%Y-%m') AS order_month   -- truncate to month
        ,   CASE
                WHEN U.total_amount >= 10000 THEN 'High-value'
                WHEN U.total_amount >= 1000  THEN 'Mid-value'
                ELSE 'Standard'
            END                                AS user_segment
        ,   SUM(U.total_amount)                AS monthly_revenue
    FROM (
        -- U: Filter completed orders within the target period
        SELECT
                order_id
            ,   user_id
            ,   amount   AS total_amount
            ,   created_at
        FROM    orders
        WHERE   status     = 'completed'
        AND     created_at >= '2024-01-01'
        AND     created_at <  '2025-01-01'
    ) U
    GROUP BY
            DATE_FORMAT(U.created_at, '%Y-%m')
        ,   CASE
                WHEN U.total_amount >= 10000 THEN 'High-value'
                WHEN U.total_amount >= 1000  THEN 'Mid-value'
                ELSE 'Standard'
            END
) S
ORDER BY S.order_month ASC
       , S.monthly_revenue DESC