Transpose columns to rows or rows to columns in MaxCompute - MaxCompute

Data analysis often requires transforming data by pivoting rows into columns or unpivoting columns into rows. For most standard scenarios, MaxCompute's native PIVOT and UNPIVOT operators are the recommended best practice for their conciseness and performance. This topic details the manual methods, which are powerful alternatives for edge cases where the native operators may not be flexible enough for highly complex aggregation logic.

Background

The following diagram illustrates concepts of pivoting and unpivoting. 行转列与列转行

Pivot (Rows to Columns): Transforms a "long" format table into a "wide" format. It rotates values from a single column (e.g., 'subject') into multiple new columns.
Unpivot (Columns to Rows): Transforms a "wide" format table into a "long" format. It collapses multiple columns (e.g., 'chinese', 'mathematics') into a single value column, using a new column to label their original source.

Sample data

We will use the following two tables to demonstrate the transformations.

Create the source table rowtocolumn and insert data for the pivoting example.

CREATE TABLE rowtocolumn (name string, subject string, result bigint);
INSERT INTO TABLE rowtocolumn VALUES 
('Bob' , 'chinese' , 74),
('Bob' , 'mathematics' , 83),
('Bob' , 'physics' , 93),
('Alice' , 'chinese' , 74),
('Alice' , 'mathematics' , 84),
('Alice' , 'physics' , 94);

Query the data in the rowtocolumn table:

SELECT * FROM rowtocolumn;

-- The following result is returned.
+------------+-------------+------------+
| name       | subject     | result     |
+------------+-------------+------------+
| Alice      | chinese     | 74         |
| Alice      | mathematics | 83         |
| Alice      | physics     | 93         |
| Bob        | chinese     | 74         |
| Bob        | mathematics | 84         |
| Bob        | physics     | 94         |
+------------+-------------+------------+

Create the source table columntorow and insert data for the unpivoting example.

CREATE TABLE columntorow (name string, chinese bigint, mathematics bigint, physics bigint);
INSERT INTO TABLE columntorow VALUES 
('Bob' , 74, 83, 93),
('Alice', 74, 84, 94);

Query the data in the columntorow table:

SELECT * FROM columntorow;

-- The following result is returned: 
+------------+------------+-------------+------------+
| name       | chinese    | mathematics | physics    |
+------------+------------+-------------+------------+
| Bob        | 74         | 83          | 93         |
| Alice      | 74         | 84          | 94         |
+------------+------------+-------------+------------+

Pivot using CASE WHEN or built-in functions

The following methods pivot data from rows to columns:

Method 1: Use CASE WHEN (SQL Standard)
This method uses conditional aggregation. It uses a CASE WHEN expression to extract the score for each subject and an aggregate function like MAX() to consolidate the results into a single row for each student.
When to use this method:
- Portability is key: As this is standard SQL, the logic works across almost any database system.
- Readability matters: The logic is explicit and easy for any SQL developer to understand.
- The number of pivot columns is fixed and manageable.
Drawbacks:
- Becomes verbose and hard to maintain if you need to pivot into a large number of columns.
```
SELECT 
    name,
    MAX(CASE subject WHEN 'chinese' THEN result END) AS chinese,
    MAX(CASE subject WHEN 'mathematics' THEN result END) AS mathematics,
    MAX(CASE subject WHEN 'physics' THEN result END) AS physics 
FROM rowtocolumn 
GROUP BY name;
```
The following result is returned:
```
+--------+------------+------------+------------+
| name   | chinese    | mathematics| physics    |
+--------+------------+------------+------------+
| Bob    | 74         | 83         | 93         |
| Alice  | 74         | 84         | 94         |
+--------+------------+------------+------------+
```
Method 2: Use built-in functions (MaxCompute-specific)
This method first uses CONCAT and WM_CONCAT to aggregate subjects and scores into a single key-value string for each student. The KEYVALUE function then parses this string to extract the score for each subject into a separate column.
When to use this method:
- Code conciseness is a priority, especially with many pivot values.
- You are committed to the MaxCompute ecosystem and do not require cross-platform compatibility.
Drawbacks:
- Platform Lock-in: This code is not portable to other SQL databases.
- Potential Performance Bottleneck: String manipulation on very large datasets can be less efficient than direct aggregation.
```
SELECT 
    name,
    KEYVALUE(key_value_string, 'chinese') AS chinese,
    KEYVALUE(key_value_string, 'mathematics') AS mathematics,
    KEYVALUE(key_value_string, 'physics') AS physics
FROM (
    SELECT 
        name, 
        WM_CONCAT(';', CONCAT(subject, ':', result)) AS key_value_string
    FROM rowtocolumn
    GROUP BY name
) AS source_with_concat;
```
The following result is returned:
```
+--------+------------+------------+------------+
| name   | chinese    | mathematics| physics    |
+--------+------------+------------+------------+
| Bob    | 74         | 83         | 93         |
| Alice  | 74         | 84         | 94         |
+--------+------------+------------+------------+
```

Note

For complex data transformations that require expanding a single row into multiple rows, use Lateral View with functions such as EXPLODE, INLINE, or TRANS_ARRAY.

Unpivot using UNION ALL or built-in functions

The following methods unpivot data from columns to rows:

Method 1: Use UNION ALL (SQL Standard)
This method uses UNION ALL to combine SELECT statements, where each statement retrieves a different subject. This effectively transforms the subject columns (chinese, mathematics, physics) into a single subject column.
When to use this method:
- Guaranteed Correctness and Portability: This is the most robust and universally compatible method. It handles NULL values gracefully without extra logic.
- Clarity over Performance: The intent of the code is crystal clear, even if it's not the most performant.
Drawbacks:
- Performance Penalty on Large Tables: Scanning a large table multiple times can be very inefficient and costly in a distributed environment.
- Verbosity: The code length grows linearly with the number of columns to unpivot.
```
SELECT name, subject, result 
FROM (
    SELECT name, 'chinese' AS subject, chinese AS result FROM columntorow 
    UNION ALL 
    SELECT name, 'mathematics' AS subject, mathematics AS result FROM columntorow 
    UNION ALL 
    SELECT name, 'physics' AS subject, physics AS result FROM columntorow
) unpivoted_data
ORDER BY name;
```
The following result is returned:
```
+--------+--------+------------+
| name   | subject| result     |
+--------+--------+------------+
| Bob    | chinese| 74         |
| Bob    | mathematics| 83         |
| Bob    | physics| 93         |
| Alice  | chinese| 74         |
| Alice  | mathematics| 84         |
| Alice  | physics| 94         |
+--------+--------+------------+
```
Method 2: Use built-in functions (MaxCompute-specific)
This method first uses CONCAT to create a delimited string from the subject columns. The TRANS_ARRAY and SPLIT_PART functions then explode this string into multiple rows and parse the subject and score for each.
When to use this method:
- Performance is critical on large tables, as this avoids multiple table scans.
- You are comfortable with MaxCompute-specific syntax and functions.
Drawbacks:
- Platform Lock-in: The code is highly specific to MaxCompute and not portable.
- Complex Syntax: The combination of TRANS_ARRAY and nested subqueries can be harder to read and debug than UNION ALL.
Note
This method relies on CONCAT. If any source column (chinese, mathematics, physics) contains a NULL, the entire concatenated string becomes NULL, leading to silent data loss for that row. To prevent this, you must wrap columns in NVL() (e.g., NVL(chinese, 0)) before concatenation, or revert to the safer UNION ALL method. For more information about the NVL function, see NVL.
```
SELECT name,
       SPLIT_PART(subject,':',1) AS subject,
       SPLIT_PART(subject,':',2) AS result
FROM (
       SELECT TRANS_ARRAY(1,';',name,subject) AS (name,subject) 
       FROM (
            SELECT name,
                   CONCAT('chinese',':',chinese,';','mathematics',':',mathematics,';','physics',':',physics) AS subject 
            FROM columntorow )
);
```
The following result is returned:
```
+--------+--------+------------+
| name   | subject   | result       |
+--------+--------+------------+
| Bob    | chinese   | 74         |
| Bob    | mathematics   | 83         |
| Bob    | physics   | 93         |
| Alice  | chinese   | 74         |
| Alice  | mathematics   | 84         |
| Alice  | physics   | 94         |
+--------+--------+------------+
```

MaxCompute:Manual pivoting and unpivoting in SQL

Background

Sample data

Pivot using CASE WHEN or built-in functions

Unpivot using UNION ALL or built-in functions

Related documents