This topic describes the common scenarios of using MaxCompute SQL statements and how to write them.

Prepare a dataset

The emp and dept tables are used as the dataset in this example. You can create a table on a MaxCompute project and upload data to the table. For more information about how to import data, see Overview.

Download data files of the emp table and data files of the dept table.

Execute the following statement to create the emp table:

CREATE TABLE IF NOT EXISTS emp (
  EMPNO STRING,
  ENAME STRING,
  JOB STRING,
  MGR BIGINT,
  HIREDATE DATETIME,
  SAL DOUBLE,
  COMM DOUBLE,
  DEPTNO BIGINT);

Execute the following statement to create the dept table:

CREATE TABLE IF NOT EXISTS dept (
  DEPTNO BIGINT,
  DNAME STRING,
  LOC STRING);

Examples

  • Example 1: Query all departments that have at least one employee.
    We recommend that you use the JOIN clause to avoid large amounts of data in the query. Execute the following SQL statement:
    SELECT d. *
    FROM dept d
    JOIN (
        SELECT DISTINCT deptno AS no
        FROM emp
    ) e
    ON d.deptno = e.no;
  • Example 2: Query all employees who have higher salaries than Smith.
    The following code shows how to use MAPJOIN in the SQL statement for this scenario:
    SELECT /*+ MapJoin(a) */ e.empno
        , e.ename
        , e.sal
    FROM emp e
    JOIN (
        SELECT MAX(sal) AS sal
        FROM `emp`
        WHERE `ENAME` = 'SMITH'
    ) a
    ON e.sal > a.sal;
  • Example 3: Query the names of all employees and the names of their immediate superiors.
    The following code shows how to use EQUI JOIN in the SQL statement for this scenario:
    SELECT a.ename
        , b.ename
    FROM emp a
    LEFT OUTER JOIN emp b
    ON b.empno = a.mgr;
  • Example 4: Query all jobs that have basic salaries higher than USD 1,500.
    The following code shows how to use the HAVING clause in the SQL statement for this scenario:
    SELECT emp. `JOB`
        , MIN(emp.sal) AS sal
    FROM `emp`
    GROUP BY emp. `JOB`
    HAVING MIN(emp.sal) > 1500;
  • Example 5: Query the number of employees in each department, the average salary, and the average length of service.
    The following code shows how to use built-in functions in the SQL statement for this scenario:
    SELECT COUNT(empno) AS cnt_emp
        , ROUND(AVG(sal), 2) AS avg_sal
        , ROUND(AVG(datediff(getdate(), hiredate, 'dd')), 2) AS avg_hire
    FROM `emp`
    GROUP BY `DEPTNO`;
  • Example 6: Query the names and the sorting order of the first three employees who have the highest salaries.
    The following code shows how to use the TOP N clause in the SQL statement for this scenario:
    SELECT *
    FROM (
      SELECT deptno
        , ename
        , sal
        , ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY sal DESC) AS nums
      FROM emp
    ) emp1
    WHERE emp1.nums < 4;
  • Example 7: Query the number of employees in each department and the proportion of clerks in these departments.
    SELECT deptno
        , COUNT(empno) AS cnt
        , ROUND(SUM(CASE 
          WHEN job = 'CLERK' THEN 1
          ELSE 0
        END) / COUNT(empno), 2) AS rate
    FROM `EMP`
    GROUP BY deptno;

Notes

  • When you use the GROUP BY clause, the SELECT list can only consist of aggregate functions or columns that are part of the GROUP BY clause.
  • ORDER BY must be followed by LIMIT N.
  • The SELECT expression does not support subqueries. To use subqueries, you can rewrite the code to include a JOIN clause.
  • The JOIN clause does not support Cartesian projects. You can replace the JOIN clause with MAPJOIN.
  • UNION ALL must be replaced with subqueries.
  • The subquery that is specified in the IN or NOT IN clause must contain only one column and return a maximum of 1,000 rows. Otherwise, use the JOIN clause.