MaxCompute SQL shares most syntax with standard SQL but has specific behavioral differences—such as restrictions on Cartesian products, subquery placement, and ORDER BY usage—that can cause unexpected errors when you migrate queries from other systems. This topic demonstrates seven common query patterns using a concrete employee dataset, so you can learn the syntax and constraints together in context.
Prerequisites
Before you begin, ensure that you have:
A MaxCompute project with read and write permissions
Access to a SQL execution tool, such as the MaxCompute console or odpscmd
Set up the dataset
The examples use two tables: emp (employees) and dept (departments).
Step 1: Create the tables
Run the following statements to create the emp and dept tables in your MaxCompute project.
emp table:
CREATE TABLE IF NOT EXISTS emp (
EMPNO STRING,
ENAME STRING,
JOB STRING,
MGR BIGINT,
HIREDATE DATETIME,
SAL DOUBLE,
COMM DOUBLE,
DEPTNO BIGINT);dept table:
CREATE TABLE IF NOT EXISTS dept (
DEPTNO BIGINT,
DNAME STRING,
LOC STRING);Step 2: Prepare the data files
Create two local files with the following content.
emp.txt
7369,SMITH,CLERK,7902,1980-12-17 00:00:00,800,,20
7499,ALLEN,SALESMAN,7698,1981-02-20 00:00:00,1600,300,30
7521,WARD,SALESMAN,7698,1981-02-22 00:00:00,1250,500,30
7566,JONES,MANAGER,7839,1981-04-02 00:00:00,2975,,20
7654,MARTIN,SALESMAN,7698,1981-09-28 00:00:00,1250,1400,30
7698,BLAKE,MANAGER,7839,1981-05-01 00:00:00,2850,,30
7782,CLARK,MANAGER,7839,1981-06-09 00:00:00,2450,,10
7788,SCOTT,ANALYST,7566,1987-04-19 00:00:00,3000,,20
7839,KING,PRESIDENT,,1981-11-17 00:00:00,5000,,10
7844,TURNER,SALESMAN,7698,1981-09-08 00:00:00,1500,0,30
7876,ADAMS,CLERK,7788,1987-05-23 00:00:00,1100,,20
7900,JAMES,CLERK,7698,1981-12-03 00:00:00,950,,30
7902,FORD,ANALYST,7566,1981-12-03 00:00:00,3000,,20
7934,MILLER,CLERK,7782,1982-01-23 00:00:00,1300,,10dept.txt
10,ACCOUNTING,NEW YORK
20,RESEARCH,DALLAS
30,SALES,CHICAGO
40,OPERATIONS,BOSTONStep 3: Upload data to the tables
Upload emp.txt and dept.txt to the corresponding tables. For supported upload methods, see Data upload scenarios and tools.
Examples
Example 1: Query departments with at least one employee
Use a JOIN clause rather than a subquery filter to avoid scanning all rows. A subquery with DISTINCT extracts unique department numbers from emp, and the outer JOIN matches them against dept.
SELECT d.*
FROM dept d
JOIN (
SELECT DISTINCT deptno AS no
FROM emp
) e
ON d.deptno = e.no;Result:
+------------+------------+------------+
| deptno | dname | loc |
+------------+------------+------------+
| 10 | ACCOUNTING | NEW YORK |
| 20 | RESEARCH | DALLAS |
| 30 | SALES | CHICAGO |
+------------+------------+------------+Department 40 (OPERATIONS) has no employees and is excluded.
MaxCompute SQL does not support Cartesian products in JOIN clauses. Use MAPJOIN as an alternative when one table in the join is small. See Example 2 for MAPJOIN usage.
Example 2: Query employees with higher salaries than SMITH
This query uses the MAPJOIN hint to enable a non-equi join condition (e.sal > a.sal).
SELECT /*+ MapJoin(a) */ e.empno
, e.ename
, e.sal
FROM emp e
JOIN (
SELECT MAX(sal) AS sal
FROM `emp`
WHERE `ENAME` = 'SMITH'
) a
ON e.sal > a.sal;Result:
+------------+------------+------------+
| empno | ename | sal |
+------------+------------+------------+
| 7499 | ALLEN | 1600.0 |
| 7521 | WARD | 1250.0 |
| 7566 | JONES | 2975.0 |
| 7654 | MARTIN | 1250.0 |
| 7698 | BLAKE | 2850.0 |
| 7782 | CLARK | 2450.0 |
| 7788 | SCOTT | 3000.0 |
| 7839 | KING | 5000.0 |
| 7844 | TURNER | 1500.0 |
| 7876 | ADAMS | 1100.0 |
| 7900 | JAMES | 950.0 |
| 7902 | FORD | 3000.0 |
| 7934 | MILLER | 1300.0 |
+------------+------------+------------+The SELECT expression does not support subqueries directly. Rewrite subquery logic as a JOIN clause, as shown above.
Example 3: Query employees and their direct managers
A self-join on emp matches each employee's MGR field against another employee's EMPNO. Use LEFT OUTER JOIN so that employees without a manager—such as KING—still appear in the results.
SELECT a.ename
, b.ename
FROM emp a
LEFT OUTER JOIN emp b
ON b.empno = a.mgr;Result:
+------------+------------+
| ename | ename2 |
+------------+------------+
| SMITH | FORD |
| ALLEN | BLAKE |
| WARD | BLAKE |
| JONES | KING |
| MARTIN | BLAKE |
| BLAKE | KING |
| CLARK | KING |
| SCOTT | JONES |
| KING | NULL |
| TURNER | BLAKE |
| ADAMS | SCOTT |
| JAMES | BLAKE |
| FORD | JONES |
| MILLER | CLARK |
+------------+------------+KING has no manager record in the table, so ename2 is NULL.
Example 4: Query job types with minimum salaries above 1,500 USD
GROUP BY groups rows by job, and HAVING filters the grouped results to keep only jobs where the minimum salary exceeds 1,500 USD.
SELECT emp.`JOB`
, MIN(emp.sal) AS sal
FROM `emp`
GROUP BY emp.`JOB`
HAVING MIN(emp.sal) > 1500;Result:
+------------+------------+
| job | sal |
+------------+------------+
| MANAGER | 2450.0 |
| ANALYST | 3000.0 |
| PRESIDENT | 5000.0 |
+------------+------------+When usingGROUP BY, the SELECT list can include only aggregate functions and columns that appear in theGROUP BYclause.
Example 5: Query employee count, average salary, and average tenure per department
This query combines COUNT, AVG, and ROUND with the datediff and getdate built-in functions to calculate each department's headcount, average salary, and average days of service.
SELECT COUNT(empno) AS cnt_emp
, ROUND(AVG(sal), 2) AS avg_sal
, ROUND(AVG(datediff(getdate(), hiredate, 'dd')), 2) AS avg_hire
FROM `emp`
GROUP BY `DEPTNO`;Result:
+------------+------------+------------+
| cnt_emp | avg_sal | avg_hire |
+------------+------------+------------+
| 5 | 2175.0 | 14886.2 |
| 6 | 1566.67 | 15715.33 |
| 3 | 2916.67 | 15606.33 |
+------------+------------+------------+Theavg_hirevalues depend on the current date becausegetdate()returns the system date at query time. Your results will differ from the values shown above.
Example 6: Query the top three earners per department
Window functions compute a value for each row relative to a partition of rows, without collapsing the partition into a single output row. This is different from GROUP BY, which produces one row per group.
ROW_NUMBER() OVER (...)assigns a sequential integer to each row within a partition.PARTITION BY deptnoresets the counter for each department.ORDER BY sal DESCranks employees from highest to lowest salary within each partition.The outer
WHERE emp1.nums < 4keeps only the top three rows per department.
SELECT *
FROM (
SELECT deptno
, ename
, sal
, ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY sal DESC) AS nums
FROM emp
) emp1
WHERE emp1.nums < 4;Result:
+------------+------------+------------+------------+
| deptno | ename | sal | nums |
+------------+------------+------------+------------+
| 10 | KING | 5000.0 | 1 |
| 10 | CLARK | 2450.0 | 2 |
| 10 | MILLER | 1300.0 | 3 |
| 20 | SCOTT | 3000.0 | 1 |
| 20 | FORD | 3000.0 | 2 |
| 20 | JONES | 2975.0 | 3 |
| 30 | BLAKE | 2850.0 | 1 |
| 30 | ALLEN | 1600.0 | 2 |
| 30 | TURNER | 1500.0 | 3 |
+------------+------------+------------+------------+ORDER BYmust be followed byLIMIT Nwhen used at the top level of a query. In this example,ORDER BYappears inside a window function'sOVERclause, so noLIMITis required.
Example 7: Query employee count and clerk proportion per department
CASE WHEN inside SUM converts a condition to a numeric value (1 for clerks, 0 for others). Dividing the sum by COUNT(empno) gives the proportion.
SELECT deptno
, COUNT(empno) AS cnt
, ROUND(SUM(CASE
WHEN job = 'CLERK' THEN 1
ELSE 0
END) / COUNT(empno), 2) AS rate
FROM `EMP`
GROUP BY deptno;Result:
+------------+------------+------------+
| deptno | cnt | rate |
+------------+------------+------------+
| 20 | 5 | 0.4 |
| 30 | 6 | 0.17 |
| 10 | 3 | 0.33 |
+------------+------------+------------+MaxCompute SQL constraints
The following constraints apply to MaxCompute SQL. Understanding them before writing queries prevents common errors.
| Clause or feature | Constraint |
|---|---|
GROUP BY | The SELECT list can include only aggregate functions and columns in the GROUP BY clause. |
ORDER BY | Must be followed by LIMIT N when used at the top level of a query. |
| SELECT expression | Does not support subqueries. Rewrite as a JOIN clause. |
JOIN | Does not support Cartesian products. Use MAPJOIN as an alternative. |
UNION ALL | Must be used with subqueries. |
IN / NOT IN subquery | Can reference only one column and return up to 1,000 rows. For larger result sets, use a JOIN clause. |