Consistent SQL formatting reduces review time, lowers the risk of logic errors, and makes it easier for teammates to understand and maintain your code. This document defines the formatting rules and naming conventions for SQL written in DataWorks (MaxCompute).
Coding guidelines
Apply these principles when writing SQL:
-
Write comprehensive code that covers all necessary conditions and edge cases.
-
Keep code clear, neat, and well-organized.
-
Consider execution performance during SQL design.
-
Add comments where the logic is not self-evident.
-
Write all SQL keywords and reserved words in consistent case — either all uppercase or all lowercase. For example:
selectorSELECT,fromorFROM,whereorWHERE. Mixed case (for example,SelectorseLECT) is not allowed. -
Use four spaces as one indentation unit. All indentation must be a multiple of this unit.
-
Never use
SELECT *. Always specify column names explicitly to avoid unexpected results when the table schema changes. -
Place matching opening and closing parentheses in the same column.
-
These guidelines are non-mandatory. Reasonable deviations are allowed as long as the general rules are followed.
SQL coding specifications
Code header
Add a header at the top of each SQL script. The header includes the subject, description, author, and creation date. Reserve a row for change logs and a title line so that future changes can be recorded. Each line must not exceed 80 characters.
Template:
-- MaxCompute(ODPS) SQL
--**************************************************************************
-- ** Subject: Transaction
-- ** Description: Transaction refund analysis
-- ** Author: Youma
-- ** Created on: 20170616
-- ** Change log:
-- ** Modified on Modified by Content
-- yyyymmdd name comment
-- 20170831 Wuma Add a comment on the biz_type=1234 transaction
--**************************************************************************
Column arrangement
Each selected column in a SELECT statement goes on its own line. This makes it easy to add, remove, or reorder columns without affecting adjacent lines.
Follow these formatting rules:
-
Leave one indentation unit between
SELECTand the first column. -
Start each subsequent column on a new line with two indentation units, preceded by a comma.
-
Place the comma before the column name (not after the preceding column). This way, commenting out any single column does not break the syntax of adjacent lines.
-
Place the
ASkeyword on the same line as its column, and align allASkeywords in the same column.
Good:
SELECT
order_id
, user_id AS uid
, amount AS total_amount
, created_at AS create_time
FROM orders
Bad:
SELECT order_id, user_id AS uid, amount AS total_amount, created_at AS create_time FROM orders
INSERT statement
Arrange all clauses of an INSERT statement on the same line.
SELECT statement clause arrangement
Clauses such as FROM, WHERE, GROUP BY, HAVING, ORDER BY, JOIN, and UNION must follow these formatting rules:
-
Place each clause on its own line.
-
Left-align all clauses with the
SELECTkeyword. -
Separate the first keyword of a clause from its content with two indentation units.
-
If the clause name is longer than two indentation units (such as
ORDER BYorGROUP BY), add a single space between the clause name and its content. -
Place
ANDandORat the beginning of the line, left-aligned withWHERE. Starting each condition with its logical operator makes each condition independently scannable and immediately shows whether conditions are additive or alternative.
Good:
SELECT
order_id
, user_id
, amount
FROM orders
WHERE status = 'completed'
AND amount > 100
AND created_at >= '2024-01-01'
GROUP BY
order_id
, user_id
, amount
ORDER BY amount DESC
Bad:
SELECT order_id, user_id, amount FROM orders WHERE status = 'completed' AND amount > 100
Operator spacing
Place one space before and after each arithmetic operator and logical operator. Keep all operators on the same line unless the line exceeds 80 characters.
Good:
WHERE amount > 100 AND status = 'completed'
Bad:
WHERE amount>100 AND status='completed'
CASE expressions
Follow these rules when writing CASE expressions:
-
Write the
WHENclause one indentation unit afterCASE, on the same line. -
Put each
WHENclause on its own line. Wrap long clauses. -
Always include an
ELSEclause to handle unexpected values. AlignELSEwithWHEN.
Good:
SELECT
order_id
, CASE status
WHEN 'completed' THEN 'Success'
WHEN 'cancelled' THEN 'Cancelled'
ELSE 'Pending'
END AS order_status
FROM orders
Bad:
SELECT order_id, CASE status WHEN 'completed' THEN 'Success' WHEN 'cancelled' THEN 'Cancelled' END AS order_status FROM orders
The "Bad" example omits the ELSE clause, which leaves unexpected values unhandled.
Subqueries
Subqueries are commonly used for extract, transform, load (ETL) operations in data warehouse systems. When nesting subqueries, use the following naming convention for aliases at each level:
| Level | Full name | Abbreviation |
|---|---|---|
| 1 | part | P (or a) |
| 2 | segment | S (or b) |
| 3 | unit | U (or c) |
| 4 | detail | D (or d) |
To distinguish multiple subqueries at the same level, append a number to the alias (for example, P1, P2). Add comments to aliases as needed.
Example:
SELECT
P.order_id
, P.user_id
, P.total_amount
FROM (
-- P: Aggregate completed orders by user
SELECT
S.user_id
, SUM(S.amount) AS total_amount
FROM (
-- S: Filter completed orders
SELECT
order_id
, user_id
, amount
FROM orders
WHERE status = 'completed'
) S
GROUP BY S.user_id
) P
WHERE P.total_amount > 1000
Table aliases
Follow these rules when using table aliases:
-
Assign an alias to every table in a
SELECTstatement. -
Once an alias is assigned, use it consistently throughout the statement.
-
Define aliases in alphabetical order.
-
Do not use SQL keywords as aliases.
-
In nested queries, use the level-based naming convention (P/S/U/D or a/b/c/d) described in the Subqueries section.
-
Add comments to aliases when their meaning is not immediately obvious.
SQL comments
Follow these rules when adding comments:
-
Add a comment to each SQL statement.
-
Place statement-level comments on a separate line, before the statement.
-
Place column-level comments immediately after the column.
-
Add comments to clauses that are difficult to understand.
-
Comment on any important or business-critical code.
-
For long statements, add comments by segment based on purpose.
-
Describing a constant or variable is required. Commenting on the valid value range is optional.
Adding comments with keyboard shortcuts:
-
Windows: Press
Ctrl+/to comment or uncomment the current line. Select multiple lines first to comment them all at once. -
macOS: Press
cmd+/to comment or uncomment the current line. Select multiple lines first to comment them all at once.
End-to-end example
The following example demonstrates all the specifications above in a single SQL script: code header, column arrangement, clause formatting, CASE expression, subquery, aliases, and comments.
-- MaxCompute(ODPS) SQL
--**************************************************************************
-- ** Subject: Sales Analysis
-- ** Description: Monthly revenue by user segment for completed orders
-- ** Author: Youma
-- ** Created on: 20240101
-- ** Change log:
-- ** Modified on Modified by Content
-- yyyymmdd name comment
--**************************************************************************
-- P: Final output -- monthly revenue by user segment
SELECT
S.order_month
, S.user_segment
, S.monthly_revenue
FROM (
-- S: Aggregate revenue and classify users
SELECT
DATE_FORMAT(U.created_at, '%Y-%m') AS order_month -- truncate to month
, CASE
WHEN U.total_amount >= 10000 THEN 'High-value'
WHEN U.total_amount >= 1000 THEN 'Mid-value'
ELSE 'Standard'
END AS user_segment
, SUM(U.total_amount) AS monthly_revenue
FROM (
-- U: Filter completed orders within the target period
SELECT
order_id
, user_id
, amount AS total_amount
, created_at
FROM orders
WHERE status = 'completed'
AND created_at >= '2024-01-01'
AND created_at < '2025-01-01'
) U
GROUP BY
DATE_FORMAT(U.created_at, '%Y-%m')
, CASE
WHEN U.total_amount >= 10000 THEN 'High-value'
WHEN U.total_amount >= 1000 THEN 'Mid-value'
ELSE 'Standard'
END
) S
ORDER BY S.order_month ASC
, S.monthly_revenue DESC