Write MaxCompute SQL for common query examples - MaxCompute

MaxCompute SQL shares most syntax with standard SQL but has specific behavioral differences—such as restrictions on Cartesian products, subquery placement, and ORDER BY usage—that can cause unexpected errors when you migrate queries from other systems. This topic demonstrates seven common query patterns using a concrete employee dataset, so you can learn the syntax and constraints together in context.

Prerequisites

Before you begin, ensure that you have:

A MaxCompute project with read and write permissions
Access to a SQL execution tool, such as the MaxCompute console or odpscmd

Set up the dataset

The examples use two tables: emp (employees) and dept (departments).

Step 1: Create the tables

Run the following statements to create the emp and dept tables in your MaxCompute project.

emp table:

CREATE TABLE IF NOT EXISTS emp (
  EMPNO STRING,
  ENAME STRING,
  JOB STRING,
  MGR BIGINT,
  HIREDATE DATETIME,
  SAL DOUBLE,
  COMM DOUBLE,
  DEPTNO BIGINT);

dept table:

CREATE TABLE IF NOT EXISTS dept (
  DEPTNO BIGINT,
  DNAME STRING,
  LOC STRING);

Step 2: Prepare the data files

Create two local files with the following content.

emp.txt

7369,SMITH,CLERK,7902,1980-12-17 00:00:00,800,,20
7499,ALLEN,SALESMAN,7698,1981-02-20 00:00:00,1600,300,30
7521,WARD,SALESMAN,7698,1981-02-22 00:00:00,1250,500,30
7566,JONES,MANAGER,7839,1981-04-02 00:00:00,2975,,20
7654,MARTIN,SALESMAN,7698,1981-09-28 00:00:00,1250,1400,30
7698,BLAKE,MANAGER,7839,1981-05-01 00:00:00,2850,,30
7782,CLARK,MANAGER,7839,1981-06-09 00:00:00,2450,,10
7788,SCOTT,ANALYST,7566,1987-04-19 00:00:00,3000,,20
7839,KING,PRESIDENT,,1981-11-17 00:00:00,5000,,10
7844,TURNER,SALESMAN,7698,1981-09-08 00:00:00,1500,0,30
7876,ADAMS,CLERK,7788,1987-05-23 00:00:00,1100,,20
7900,JAMES,CLERK,7698,1981-12-03 00:00:00,950,,30
7902,FORD,ANALYST,7566,1981-12-03 00:00:00,3000,,20
7934,MILLER,CLERK,7782,1982-01-23 00:00:00,1300,,10

dept.txt

10,ACCOUNTING,NEW YORK
20,RESEARCH,DALLAS
30,SALES,CHICAGO
40,OPERATIONS,BOSTON

Step 3: Upload data to the tables

Upload emp.txt and dept.txt to the corresponding tables. For supported upload methods, see Data upload scenarios and tools.

Examples

Example 1: Query departments with at least one employee

Use a JOIN clause rather than a subquery filter to avoid scanning all rows. A subquery with DISTINCT extracts unique department numbers from emp, and the outer JOIN matches them against dept.

SELECT d.*
FROM dept d
JOIN (
    SELECT DISTINCT deptno AS no
    FROM emp
) e
ON d.deptno = e.no;

Result:

+------------+------------+------------+
| deptno     | dname      | loc        |
+------------+------------+------------+
| 10         | ACCOUNTING | NEW YORK   |
| 20         | RESEARCH   | DALLAS     |
| 30         | SALES      | CHICAGO    |
+------------+------------+------------+

Department 40 (OPERATIONS) has no employees and is excluded.

MaxCompute SQL does not support Cartesian products in JOIN clauses. Use MAPJOIN as an alternative when one table in the join is small. See Example 2 for MAPJOIN usage.

Example 2: Query employees with higher salaries than SMITH

This query uses the MAPJOIN hint to enable a non-equi join condition (e.sal > a.sal).

SELECT /*+ MapJoin(a) */ e.empno
    , e.ename
    , e.sal
FROM emp e
JOIN (
    SELECT MAX(sal) AS sal
    FROM `emp`
    WHERE `ENAME` = 'SMITH'
) a
ON e.sal > a.sal;

Result:

+------------+------------+------------+
| empno      | ename      | sal        |
+------------+------------+------------+
| 7499       | ALLEN      | 1600.0     |
| 7521       | WARD       | 1250.0     |
| 7566       | JONES      | 2975.0     |
| 7654       | MARTIN     | 1250.0     |
| 7698       | BLAKE      | 2850.0     |
| 7782       | CLARK      | 2450.0     |
| 7788       | SCOTT      | 3000.0     |
| 7839       | KING       | 5000.0     |
| 7844       | TURNER     | 1500.0     |
| 7876       | ADAMS      | 1100.0     |
| 7900       | JAMES      | 950.0      |
| 7902       | FORD       | 3000.0     |
| 7934       | MILLER     | 1300.0     |
+------------+------------+------------+

The SELECT expression does not support subqueries directly. Rewrite subquery logic as a JOIN clause, as shown above.

Example 3: Query employees and their direct managers

A self-join on emp matches each employee's MGR field against another employee's EMPNO. Use LEFT OUTER JOIN so that employees without a manager—such as KING—still appear in the results.

SELECT a.ename
    , b.ename
FROM emp a
LEFT OUTER JOIN emp b
ON b.empno = a.mgr;

Result:

+------------+------------+
| ename      | ename2     |
+------------+------------+
| SMITH      | FORD       |
| ALLEN      | BLAKE      |
| WARD       | BLAKE      |
| JONES      | KING       |
| MARTIN     | BLAKE      |
| BLAKE      | KING       |
| CLARK      | KING       |
| SCOTT      | JONES      |
| KING       | NULL       |
| TURNER     | BLAKE      |
| ADAMS      | SCOTT      |
| JAMES      | BLAKE      |
| FORD       | JONES      |
| MILLER     | CLARK      |
+------------+------------+

KING has no manager record in the table, so ename2 is NULL.

Example 4: Query job types with minimum salaries above 1,500 USD

GROUP BY groups rows by job, and HAVING filters the grouped results to keep only jobs where the minimum salary exceeds 1,500 USD.

SELECT emp.`JOB`
    , MIN(emp.sal) AS sal
FROM `emp`
GROUP BY emp.`JOB`
HAVING MIN(emp.sal) > 1500;

Result:

+------------+------------+
| job        | sal        |
+------------+------------+
| MANAGER    | 2450.0     |
| ANALYST    | 3000.0     |
| PRESIDENT  | 5000.0     |
+------------+------------+

When using GROUP BY, the SELECT list can include only aggregate functions and columns that appear in the GROUP BY clause.

Example 5: Query employee count, average salary, and average tenure per department

This query combines COUNT, AVG, and ROUND with the datediff and getdate built-in functions to calculate each department's headcount, average salary, and average days of service.

SELECT COUNT(empno) AS cnt_emp
    , ROUND(AVG(sal), 2) AS avg_sal
    , ROUND(AVG(datediff(getdate(), hiredate, 'dd')), 2) AS avg_hire
FROM `emp`
GROUP BY `DEPTNO`;

Result:

+------------+------------+------------+
| cnt_emp    | avg_sal    | avg_hire   |
+------------+------------+------------+
| 5          | 2175.0     | 14886.2    |
| 6          | 1566.67    | 15715.33   |
| 3          | 2916.67    | 15606.33   |
+------------+------------+------------+

The avg_hire values depend on the current date because getdate() returns the system date at query time. Your results will differ from the values shown above.

Example 6: Query the top three earners per department

Window functions compute a value for each row relative to a partition of rows, without collapsing the partition into a single output row. This is different from GROUP BY, which produces one row per group.

ROW_NUMBER() OVER (...) assigns a sequential integer to each row within a partition.
PARTITION BY deptno resets the counter for each department.
ORDER BY sal DESC ranks employees from highest to lowest salary within each partition.
The outer WHERE emp1.nums < 4 keeps only the top three rows per department.

SELECT *
FROM (
  SELECT deptno
    , ename
    , sal
    , ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY sal DESC) AS nums
  FROM emp
) emp1
WHERE emp1.nums < 4;

Result:

+------------+------------+------------+------------+
| deptno     | ename      | sal        | nums       |
+------------+------------+------------+------------+
| 10         | KING       | 5000.0     | 1          |
| 10         | CLARK      | 2450.0     | 2          |
| 10         | MILLER     | 1300.0     | 3          |
| 20         | SCOTT      | 3000.0     | 1          |
| 20         | FORD       | 3000.0     | 2          |
| 20         | JONES      | 2975.0     | 3          |
| 30         | BLAKE      | 2850.0     | 1          |
| 30         | ALLEN      | 1600.0     | 2          |
| 30         | TURNER     | 1500.0     | 3          |
+------------+------------+------------+------------+

ORDER BY must be followed by LIMIT N when used at the top level of a query. In this example, ORDER BY appears inside a window function's OVER clause, so no LIMIT is required.

Example 7: Query employee count and clerk proportion per department

CASE WHEN inside SUM converts a condition to a numeric value (1 for clerks, 0 for others). Dividing the sum by COUNT(empno) gives the proportion.

SELECT deptno
    , COUNT(empno) AS cnt
    , ROUND(SUM(CASE
      WHEN job = 'CLERK' THEN 1
      ELSE 0
    END) / COUNT(empno), 2) AS rate
FROM `EMP`
GROUP BY deptno;

Result:

+------------+------------+------------+
| deptno     | cnt        | rate       |
+------------+------------+------------+
| 20         | 5          | 0.4        |
| 30         | 6          | 0.17       |
| 10         | 3          | 0.33       |
+------------+------------+------------+

MaxCompute SQL constraints

The following constraints apply to MaxCompute SQL. Understanding them before writing queries prevents common errors.

Clause or feature	Constraint
`GROUP BY`	The SELECT list can include only aggregate functions and columns in the `GROUP BY` clause.
`ORDER BY`	Must be followed by `LIMIT N` when used at the top level of a query.
SELECT expression	Does not support subqueries. Rewrite as a JOIN clause.
`JOIN`	Does not support Cartesian products. Use MAPJOIN as an alternative.
`UNION ALL`	Must be used with subqueries.
`IN` / `NOT IN` subquery	Can reference only one column and return up to 1,000 rows. For larger result sets, use a JOIN clause.