PIVOT and UNPIVOT - MaxCompute - Alibaba Cloud Documentation Center

MaxCompute supports the PIVOT and UNPIVOT keywords. PIVOT lets you convert one or more value-filled rows with specified values into columns by aggregating data, while UNPIVOT lets you convert one or more columns into rows. This topic describes the PIVOT and UNPIVOT keywords and provides examples on how to use the keywords.

PIVOT keyword

The PIVOT keyword is used to convert value-filled rows into columns. The PIVOT keyword is specified in the FROM clause and can be used with other keywords such as JOIN.

Note

The PIVOT keyword is in canary release and may not be available to some users.

Syntax

SELECT ... 
FROM ... 
PIVOT ( 
    <aggregate function> [AS <alias>] [, <aggregate function> [AS <alias>]] ... 
    FOR (<column> [, <column>] ...) 
    IN ( 
        (<value> [, <value>] ...) AS <new column> 
        [, (<value> [, <value>] ...) AS <new column>] 
        ... 
       ) 
    ) 
[...]

Parameter description

Parameter	Required	Description
aggregate function	Yes	The aggregate function that is used for row-to-column conversion. For more information about supported aggregate functions, see Aggregate functions.
alias	No	The alias of the aggregate function. The alias is a part of the name for the generated column. For more information, see Limits.
column	Yes	The name of the column in the source table where the row value that you want to aggregate resides.
value	Yes	The row value that you want to aggregate for row-to-column conversion.
new column	No	The name of the column after the conversion.

Limits

Limits on aggregate functions:
- You cannot nest an aggregate function in another function.
- You can specify aggregate functions that have Scalar functions and grouped columns.
- You cannot specify aggregate functions that contain window functions or aggregate functions.
- The columns that are specified in an aggregate function can only be the columns in the source table that you want to aggregate.
An alias can be used only as a part of the generated column. You cannot specify an expression.
A value can be an expression. The expression can reference only columns in the source table that you want to aggregate. The expression can contain Scalar functions but not aggregate functions or window functions.
The aliases that you specify in the PIVOT keyword determine the column names of the new table that is generated after the PIVOT function is executed. The following examples show how aliases affect column names.
- PIVOT (agg1 for axis1 in ('1', '2', '3', ...)):
  If values are constants and no alias is specified, values are used as the column names. In this example, the column names are 1,2,3....
- PIVOT (agg1 as a for axis1 in ('1', '2', '3', ...)):
  If values are constants and an alias is specified, the columns are named in the value_alias in the aggregate function format. In this example, the column names are '1'_a, '2'_a, ....
- PIVOT (agg1 as a, agg2 as b for axis1 in ('1', '2', '3', ...)):
  If values are constants and multiple aliases are specified, the columns are named in the value_alias of the aggregate function format. In this example, the column names are '1'_a, '2'_a, ..., '1'_b, '2'_b, ....
- PIVOT (agg1 as a, agg2 for axis1 in (expr1, expr2, '3', ...)):
  If some values are expressions, MaxCompute generates aliases for the expressions and for the aggregate functions that are not specified with aliases. In this example, MaxCompute generates aliases for expr1, expr2, and agg2. The query is translated into PIVOT (agg1 as a, agg2 as generated_alias1 for axis1 in (expr1 as generated_alias2, expr2 as generated_alias3, '3', ...)). As a result, the columns are named in the value/expression alias_alias of the aggregate function format. In this example, the column names are generated_alias2_a, generated_alias3_a,'3'_a ..., generated_alias2_generated_alias1, generated_alias3_generated_alias1,'3'_ generated_alias1....

Usage notes

The PIVOT keyword is equivalent to a combination of a GROUP BY clause, aggregate functions, and FILTER functions. Example:

SELECT ...
FROM ...
PIVOT (
  agg1 AS a, agg2 AS b, ...
  FOR (axis1, ..., axisN)
  IN (
      (v11, ..., v1N) AS label1,
      (v21, ..., v2N) AS label2, 
      ...)
)

The preceding statement is equivalent to the following statement:

select 
k1, ... kN, 
agg1 AS label1_a filter (where axis1 = v11 and ... and axisN = v1N), 
agg2 AS label1_b filter (where axis1 = v21 and ... and axisN = v2N), 
..., 
agg1 AS label2_a filter (where axis1 = v11 and ... and axisN = v1N),
agg2 AS label2_b filter (where axis1 = v21 and ... and axisN = v2N), 
..., 
from xxxxxx
group by k1, ... kN

The table that is specified in the FROM clause is the source table of the PIVOT keyword. k1, ... kN are a set of columns that are not specified in agg1, agg2, ... or axis1, ..., axisN.

Example

In this example, the fruit sales data of a company in each quarter of this year is used. The following data definition language (DDL) statement is used to create a table:

-- Create a table.
create table mf_cop_sales (tran_id bigint,
                           productID string,
                           tran_amt decimal,
                           season string);
insert into table mf_cop_sales values(1,'apple',100,'Q1'),
                                     (2,'orange',200,'Q1'),
                                     (3,'banana',300,'Q1'),
                                     (4,'apple',400,'Q2'),
                                     (5,'orange',500,'Q2'),
                                     (6,'banana',600,'Q2'),
                                     (7,'apple',700,'Q3'),
                                     (8,'orange',800,'Q3'),
                                     (9,'banana',700,'Q3'),
                                     (10,'apple',500,'Q4'),
                                     (11,'orange',400,'Q4'),
                                     (12,'banana',200,'Q4');
-- Execute the following statement to view sales details:
select * from mf_cop_sales;
+------------+------------+------------+------------+
| tran_id    | productid  | tran_amt   | season     | 
+------------+------------+------------+------------+
| 1          | apple      | 100        | Q1         | 
| 2          | orange     | 200        | Q1         | 
| 3          | banana     | 300        | Q1         | 
| 4          | apple      | 400        | Q2         | 
| 5          | orange     | 500        | Q2         | 
| 6          | banana     | 600        | Q2         | 
| 7          | apple      | 700        | Q3         | 
| 8          | orange     | 800        | Q3         | 
| 9          | banana     | 700        | Q3         | 
| 10         | apple      | 500        | Q4         | 
| 11         | orange     | 400        | Q4         | 
| 12         | banana     | 200        | Q4         | 
+------------+------------+------------+------------+

Query sales data of each quarter in this year.

select * from (select season, 
                      tran_amt 
                 from mf_cop_sales) 
                      pivot (sum(tran_amt) for season in ('Q1' as spring, 
                                              						'Q2' as summer, 
                                              						'Q3' as autumn, 
                                              						'Q4' as winter) 
                ); 
-- The following result is returned:
+--------+--------+--------+--------+
| spring | summer | autumn | winter |
+--------+--------+--------+--------+
| 600    | 1500   | 2200   | 1100   |
+--------+--------+--------+--------+

Query the sales data of each product in this year.

select * from (select productid, 
                      tran_amt 
                 from mf_cop_sales) 
              pivot (sum(tran_amt) as sumbypro 
                 for productid in ('apple', 'orange', 'banana')); 
-- The following result is returned:
+------------------+-------------------+-------------------+
| 'apple'_sumbypro | 'orange'_sumbypro | 'banana'_sumbypro |
+------------------+-------------------+-------------------+
| 1700             | 1900              | 1800              |
+------------------+-------------------+-------------------+

Query the product with the maximum sales amount in the fourth quarter of this year.

select * from (select season, 
                      tran_amt 
                 from mf_cop_sales) 
         pivot (max(tran_amt) for season in ('Q4')); 

-- The following result is returned:
+------+
| 'Q4' |
+------+
| 500  |
+------+

UNPIVOT keyword

The UNPIVOT keyword is used to convert columns into rows. The UNPIVOT keyword is specified in the FROM clause and can be used with other keywords such as JOIN.

Syntax

SELECT ...
FROM ...
UNPIVOT (
  <new column of value> [, <new column of value>] ...
  FOR (<new column of name> [, <new column of name>] ...)
  IN (
      (<column> [, <column>] ...) [AS (<column value> [, <column value>] ...)]
      [, (<column> [, <column>] ...) [AS (<column value> [, <column value>] ...)]]
      ...
    )
)
[...]

Parameter description

Parameter	Required	Description
new column of value	Yes	The name of a new column that is generated after the conversion. The values in this column are populated with the values of the columns that you want to convert.
new column of name	Yes	The name of a new column that is generated after the conversion. The values in this column are populated with the names of the columns that you want to convert.
column	Yes	The name of a column that you want to convert. The name of this column is used to populate new column of name, and the values in this column are used to populate new column of value.
column value	No	The alias of a column that you want to convert.

Limits

A new column that is specified by new column of value must correspond to a group of to-be-converted columns that are specified by (<column1> [, <column2>] ...). Therefore, the number of the new column of value parameters must be the same as the number of groups of the to-be-converted columns. new column of value1, ..., new column of valueM corresponds to the following code:
```
(column11, ..., column1N) AS (column value11, ..., column value1N),
(column21, ... column2N) AS (column value21, ... column value2N), 
...
(columnM1, ..., columnMN) AS (column valueM1, ..., column valueMN)
```
A new column that is specified by new column of name must correspond to a group of the aliases of to-be-converted columns that are specified by (<column value1> [, <column value2>] ...). Therefore, the number of the new column of name parameters must be the same as the number of groups of the aliases for columns that you want to convert. new column of name1, ..., new column of nameM corresponds to the following code:
```
(column value11, ..., column value1N), 
(column value21, ... column value2N),  
...
(column valueM1, ... column valueMN)
```
Note
In actual practice, you do not need to specify (<column value> [, <column value>] ...). MaxCompute automatically generates aliases for specified columns. If you want to specify the aliases, you must ensure the number consistency.
The new column of value and new column of name parameters specify unique column names of the new table that is generated after the conversion. The parameter values cannot contain any expression.
The column parameter specifies the name of the column in the source table that you want to perform column-to-row conversion.
The column value can be a constant or an expression. If the column value parameter is set to an expression, the expression cannot contain column names. You must make sure that the result returned by the expression is a constant.
You can specify up to 100 columns in (<column1> [, <column2>] ...). If more than 100 columns are specified, data bloat occurs.
If you do not configure (<column value> [, <column value>] ...) to specify aliases for specified columns, MaxCompute automatically generates a group of strings as the aliases based on the following rules:
- UNPIVOT (measure1 for axis in (c1, c2, c3, ...)): In the UNPIVOT parameter, MaxCompute generates the aliases (c1, c2, c3, ...). This syntax is changed to UNPIVOT (measure1 for axis in (c1 as c1, c2 as c2, c3 as c3, ...)).
- In other scenarios in which aliases are not specified, MaxCompute automatically generates aliases for specified columns.
- If you specify aliases for only some columns, you must make sure that the aliases that you configure are of the STRING type to be compatible with the aliases that are automatically generated by MaxCompute. If the aliases that you configure are not of the STRING type, you must specify aliases for all columns.

Usage notes

The UNPIVOT keyword is equivalent to a combination of the CROSS JOIN clause and FILTER clause (CASE WHEN expression). Example:

SELECT ...
FROM ...
UNPIVOT (
    (measure1, ..., measureM)
    FOR (axis1, ..., axisN)
    IN ((c11, ..., c1M) AS (value11, ..., value1N),
        (c21, ..., c2M) AS (value21, ..., value2N), ...))
[...]

The preceding statement is equivalent to the following statement:

select 
    k1, ... kN,
    case 
    when axis1 = value11 and ... and axisN = value1N then c11
    when axis1 = value21 and ... and axisN = value2N then c21
    ...
    else null as measure1,
    ..., 
    case 
    when axis1 = value11 and ... and axisN = value1N then c1M
    when axis1 = value21 and ... and axisN = value2N then c2M
    else null as measureM, 
    axis1, ..., axisN
    from xxxx 
    join (values (value11, ..., value1N),(value21, ..., value2N), ... as generated_table_name(axis1, ..., axisN))

Example

In this example, the product sales data of each shop in each year is used. The following DDL statement is used to create a table:

-- Create a table. 
create table mf_shops(item_id bigint, 
                      year string, 
                      shop1 decimal, 
                      shop2 decimal, 
                      shop3 decimal, 
                      shop4 decimal); 
-- Insert data into the table.
with shops_table as  
		 (select * from values(1, 2020, 100, 200, 300, 400), 
                          (1, 2021, 100, 200, 200, 100), 
                          (2, 2020, 300, 400, 300, 200), 
                          (2, 2021, 400, 300, 100, 100)
                shops(item_id, year, shop1, shop2, shop3, shop4) 
     ) 
insert overwrite table mf_shops 
select * from shops_table; 
-- Query data from the table.
select * from mf_shops;
-- The following result is returned:
+------------+------+-------+-------+-------+-------+
| item_id    | year | shop1 | shop2 | shop3 | shop4 |
+------------+------+-------+-------+-------+-------+
| 1          | 2020 | 100   | 200   | 300   | 400   |
| 1          | 2021 | 100   | 200   | 200   | 100   |
| 2          | 2020 | 300   | 400   | 300   | 200   |
| 2          | 2021 | 400   | 300   | 100   | 100   |
+------------+------+-------+-------+-------+-------+

Consolidate the sales amount of each shop, and specify a new column named sales to display the sales amount.

-- Consolidate sales amount of each shop.
select * from mf_shops
unpivot (sales for shop in (shop1, shop2, shop3, shop4));

-- The following result is returned:
+------------+------------+------------+------+
| item_id    | year       | sales      | shop |
+------------+------------+------------+------+
| 1          | 2020       | 100        | shop1 |
| 1          | 2020       | 200        | shop2 |
| 1          | 2020       | 300        | shop3 |
| 1          | 2020       | 400        | shop4 |
| 1          | 2021       | 100        | shop1 |
| 1          | 2021       | 200        | shop2 |
| 1          | 2021       | 200        | shop3 |
| 1          | 2021       | 100        | shop4 |
| 2          | 2020       | 300        | shop1 |
| 2          | 2020       | 400        | shop2 |
| 2          | 2020       | 300        | shop3 |
| 2          | 2020       | 200        | shop4 |
| 2          | 2021       | 400        | shop1 |
| 2          | 2021       | 300        | shop2 |
| 2          | 2021       | 100        | shop3 |
| 2          | 2021       | 100        | shop4 |
+------------+------------+------------+------+

Specify an alias for each shop. The alias can be the same as a value in the source table or a custom string.

select * from mf_shops
unpivot (sales for shop in (shop1 as 'shop_name_1', shop2 as 'shop_name_2', shop3 as 'shop_name_3', shop4 as 'shop_name_4'));

-- The following result is returned:
+------------+------------+------------+------+
| item_id    | year       | sales      | shop |
+------------+------------+------------+------+
| 1          | 2020       | 100        | shop_name_1 |
| 1          | 2020       | 200        | shop_name_2 |
| 1          | 2020       | 300        | shop_name_3 |
| 1          | 2020       | 400        | shop_name_4 |
| 1          | 2021       | 100        | shop_name_1 |
| 1          | 2021       | 200        | shop_name_2 |
| 1          | 2021       | 200        | shop_name_3 |
| 1          | 2021       | 100        | shop_name_4 |
| 2          | 2020       | 300        | shop_name_1 |
| 2          | 2020       | 400        | shop_name_2 |
| 2          | 2020       | 300        | shop_name_3 |
| 2          | 2020       | 200        | shop_name_4 |
| 2          | 2021       | 400        | shop_name_1 |
| 2          | 2021       | 300        | shop_name_2 |
| 2          | 2021       | 100        | shop_name_3 |
| 2          | 2021       | 100        | shop_name_4 |
+------------+------------+------------+------+

Define shop1 and shop2 as east shops, and shop3 and shop4 as west shops. Generate a table in which the shop column indicates whether a shop is an east shop or a west shop, and the sales1 or sales2 column indicates the sales amount of each east or west shop.

select * from mf_shops
unpivot ((sales1, sales2) for shop in ((shop1, shop2) as 'east_shop', (shop3, shop4) as 'west_shop'));

-- The following result is returned:
+------------+------------+------------+------------+------+
| item_id    | year       | sales1     | sales2     | shop |
+------------+------------+------------+------------+------+
| 1          | 2020       | 100        | 200        | east_shop |
| 1          | 2020       | 300        | 400        | west_shop |
| 1          | 2021       | 100        | 200        | east_shop |
| 1          | 2021       | 200        | 100        | west_shop |
| 2          | 2020       | 300        | 400        | east_shop |
| 2          | 2020       | 300        | 200        | west_shop |
| 2          | 2021       | 400        | 300        | east_shop |
| 2          | 2021       | 100        | 100        | west_shop |
+------------+------------+------------+------------+------+

You can specify multiple aliases. The columns that correspond to the aliases are included in the output.

select * from mf_shops
unpivot ((sales1, sales2) for (shop_name, location) in ((shop1, shop2) as ('east_shop', 'east'), (shop3, shop4) as ('west_shop', 'west')));

+------------+------------+------------+------------+-----------+----------+
| item_id    | year       | sales1     | sales2     | shop_name | location |
+------------+------------+------------+------------+-----------+----------+
| 1          | 2020       | 100        | 200        | east_shop | east     |
| 1          | 2020       | 300        | 400        | west_shop | west     |
| 1          | 2021       | 100        | 200        | east_shop | east     |
| 1          | 2021       | 200        | 100        | west_shop | west     |
| 2          | 2020       | 300        | 400        | east_shop | east     |
| 2          | 2020       | 300        | 200        | west_shop | west     |
| 2          | 2021       | 400        | 300        | east_shop | east     |
| 2          | 2021       | 100        | 100        | west_shop | west     |
+------------+------------+------------+------------+-----------+----------+

You can add exclude nulls after the UNPIVOT keyword to exclude rows that contain null values in the sales1 and sales2 columns.

with 
shops as (select * from values
          (1, 2020, 100, 200, 300, 400),
          (1, 2021, 100, 200, 200, 100),
          (2, 2020, 300, 400, 300, 200),
          (2, 2021, 400, 300, 100, 100),
          (3, 2020, null, null, null, null) 
          shops(item_id, year, shop1, shop2, shop3, shop4)) 
select * from shops 
unpivot exclude nulls ((sales1, sales2) for (shop_name, location) in ((shop1, shop2) as ('east_shop', 'east'), (shop3, shop4) as ('west_shop', 'west')));

-- The following result is returned:
+------------+------------+------------+------------+-----------+----------+
| item_id    | year       | sales1     | sales2     | shop_name | location |
+------------+------------+------------+------------+-----------+----------+
| 1          | 2020       | 100        | 200        | east_shop | east     |
| 1          | 2020       | 300        | 400        | west_shop | west     |
| 1          | 2021       | 100        | 200        | east_shop | east     |
| 1          | 2021       | 200        | 100        | west_shop | west     |
| 2          | 2020       | 300        | 400        | east_shop | east     |
| 2          | 2020       | 300        | 200        | west_shop | west     |
| 2          | 2021       | 400        | 300        | east_shop | east     |
| 2          | 2021       | 100        | 100        | west_shop | west     |
+------------+------------+------------+------------+-----------+----------+