MaxCompute Unleashed - Part 13: Global Zorder

By Haiqing

MaxCompute (formerly known as ODPS) is a leading distributed big data processing platform developed by Alibaba Cloud. It is widely utilized, especially within the Alibaba Group, and supports the core businesses of multiple business units (BUs). ODPS V2.0, continuously optimizing performance, aims to enhance the user experience and expression capabilities of SQL, as well as improve the productivity of ODPS developers.

Building upon the SQL engine of ODPS V2.0, MaxCompute simplifies SQL compilation and enhances the language's expressiveness. We present a series of articles titled Unleash the Power of MaxCompute to explore the capabilities of MaxCompute (ODPS V2.0).

This article will introduce the support of Global Z-Order in MaxCompute.

Background of Global Z-Ordering

In typical big data analytics scenarios, queries often include one or more filter conditions. For this type of query, especially when filtering on high cardinality fields, you can utilize file-level Min/Max indexes to skip irrelevant data and only read a small portion of the required data. This technology is commonly known as Data Clustering and Data Skipping. Data Clustering refers to how data is clustered, while Data Skipping refers to the task of skipping irrelevant data during data reading. The effectiveness of Data Skipping is determined by the Data Clustering method and the filter conditions in the query, which affects the query's execution efficiency and runtime. The figure below illustrates the Data Skipping effect of a regular table under different Data Clustering. From the figure on the right, you can observe that with appropriate Data Clustering, the task can filter out the majority of files.

Currently, MaxCompute not only supports traditional ETL jobs but also a large number of analytical jobs. In practical business scenarios, we often face the challenge of achieving the most efficient Data Skipping during queries, rather than scanning the entire table, in order to greatly improve query execution efficiency. The key point here is Data Clustering. In this case, we can easily think of global or local sorting for filter fields. While this approach is effective for one-dimensional analysis queries, it is not as effective for multi-dimensional analysis queries. For instance, when sorting using fields a, b, and c, data clustering is best for field a and becomes worse for subsequent fields. If the cardinality of field a is high, there may be no clustering for the following fields, thereby failing to achieve filter effectiveness.

To optimize multi-dimensional analysis queries, MaxCompute provides clustered indexes (Hash Clustering and Range Clustering). However, these two methods have their limitations:

Hash Clustering: Data can only be filtered when all Cluster Keys are included in the filter condition.
Range Clustering: It adheres to the leftmost matching principle. A good filtering effect can only be achieved if the filter condition contains the Cluster Key prefix. Without this prefix, the filtering effect is significantly reduced.

In response to these challenges, MaxCompute supports local Z-Ordering to skip irrelevant data in multi-dimensional analysis scenarios. However, this method only sorts data within an individual file based on Z-Ordering and doesn't redistribute data globally. Consequently, if data is dispersed across files, the degree of clustering might be low, preventing the most efficient Data Skipping. To address this issue, MaxCompute has introduced support for global Z-Ordering. This allows data within a table or partition to be globally sorted by specified fields using Z-Ordering, leading to better data clustering.

The Use of MaxCompute Global Z-Ordering

Syntax Format

set odps.sql.default.zorder.type=global;
-- Required. If this flag is not added, the default value is local Z-Ordering.
set odps.sql.zorder.add.noise=true; 
-- Optional. You can add this parameter for skewed data to scatter the data. The scatter may affect the data clustering effect.
INSERT OVERWRITE|INTOTABLE table_name [PARTITION (partcol1=val1, partcol2=val2 ...)] [(col1,col2 ...)]
select_statement
FROM from_statement
[ZORDER BY zcol1 [, zcol2 ...]]
[SORT BY zol1 [, zcol2 ...]]
;

There is no difference between global Z-Ordering and local Z-Ordering in syntax. It needs to be distinguished by flag. The default is local Z-Ordering. Therefore, to use global Z-Ordering, you need to add flag: set odps.sql.default.zorder.type=global before the script.
For the partition table, only one partition can be sorted by zorder by at a time.
The number of zorder by fields can be between 2 and 4. If the number of fields exceeds 4, an error is reported.
The sort by statement is used to specify the internal sorting method of a single file. If this parameter is not specified, a single file is sorted by using local Z-ordering according to specified fields.
If the destination table is a clustered table, the zorder by clause is not supported.

Examples of Performing ZORDER BY Sorting on a Newly Created Table

-- Create the destination table named mf_src. 
create table mf_src (key string, value string);
insert overwrite table mf_src
select a, b from values ('1', '1'),('3', '3'),('2', '2')
as t(a, b);
select * from mf_src;-- The following result is returned:
+-----+-------+
| key | value |
+-----+-------+
| 1   | 1     |
| 3   | 3     |
| 2   | 2     |
+-----+-------+
-- Create the table named mf_zorder_src that has the same schema as the mf_src table. 
create table mf_zorder_src like mf_src;
-- Sort data in global Z-Ordering mode. 
set odps.sql.default.zorder.type=global;
insert overwrite table mf_zorder_src
select key, value from mf_src 
zorder by key, value;
select * from mf_zorder_src;
-- The following result is returned:
+-----+-------+
| key | value |
+-----+-------+
| 1   | 1     |
| 2   | 2     |
| 3   | 3     |
+-----+-------+

Examples of Performing ZORDER BY Sorting on Existing Tables

-- The target table is an existing table.
set odps.sql.default.zorder.type=global;
insert overwrite table target
select key, value from target 
zorder by key, value;

Technical Analysis

This section focuses on the implementation of global Z-Ordering in the industry and MaxCompute.

Implementation Research

Multidimensional data lacks natural order and needs to be mapped to one-dimensional data for comparison. Z-Ordering is a way to map multidimensional data without natural order to one-dimensional data for comparison. The mapped one-dimensional data ensures the clustering of each original dimension to the same degree. Implementing global Z-Ordering generally involves two steps:

Step 1: Calculate the z-value and flatten the values of multiple columns into one-dimensional space.
Step 2: Sort them globally by the z-value.

In big data systems, range partitions are commonly used for global sorting, and there are two methods to calculate the z-value:

Method 1: Generating a z-value based on the mapping strategy

1) Directly convert the data type to bits. The disadvantage of this method is that it incurs a high cost and may have poor results due to uneven data distribution. For example, if the values of the x field are (0, 1, 2) and the values of the y field are (100, 200, 300), the z-value generated by sorting x and y is only a part of the complete z-curve, and the sorting effect of the z-values is the same as sorting directly by x. Another example is when the cardinality of x is much lower than that of y, the sorting effect of the above strategy is basically the same as sorting by y value, but it is not as effective as sorting by x and then by y.

2) Use a global dictionary. This method requires obtaining the distinct values of the field and performing global sorting. The sorted sequence numbers are consecutive positive integers starting from 0. Then, the sorted sequence numbers are used to calculate the z-value. However, this method incurs a large cost and is not feasible in practice.

Method 2: Generating z-values based on boundary indexes (as exemplified by Delta and Hudi)

Sample several values for each field participating in Z-Ordering and sort them to calculate the boundary of each field. In the actual mapping process, each field mapping uses the subscript of the boundary where the data is located, and then the subscript is used to calculate the z-value.

The Implementation of MaxCompute Global Z-Order

The implementation of MaxCompute Global Z-Order is similar to method 2. It maps the Z-Ordering field to the subscript of the boundary for calculating the z-value. However, unlike Spark, which uses shuffle to sample fields and calculate partition boundaries, MaxCompute utilizes built-in functions based on its engine characteristics to complete the calculation, as shown below:

The three functions involved are as follows:

boundary

Function declaration: Array boundary(T ref, bigint bucketNum);

Return several boundaries of the specified column. In the current implementation, bucketNum is 1000.

array_lower_bound_index

Function declaration: Long ARRAY_LOWER_BOUND_INDEX(Array array, T value);

Return the index of the first element in the array that is not less than the value.

zvalue

Function declaration: Long ZVALUE(Integer a, Integer b.. Integer d);

Return the z-value of the given column.

Summary

The Z-Ordering feature is not a one-size-fits-all solution, and there are no fixed rules to guide its usage. It is often necessary to try different approaches on a case-by-case basis. When considering the additional computational costs of using the Z-Ordering feature for data sorting, it is important to evaluate whether the savings in storage, downstream consumption, and computation outweigh these costs. The following suggestions are based on practical experience and provide guidance on using the Z-Ordering feature effectively:

Scenarios where clustered indexing should be used instead of Z-Ordering:
- If filter conditions are predominantly combinations of prefixes, such as a, or the combination of a and b, or a, b, and c, then clustered indexing (ORDER BY a, b, c) is more effective than using ZORDER BY. This is because ORDER BY provides excellent sorting for the first field, with diminishing impact on subsequent fields, whereas ZORDER BY gives equal weight to all fields, making its specific field sorting less effective than ORDER BY's first field.
- If certain fields are commonly used in JOIN KEY, Hash or Range Clustering is more suitable. MaxCompute's Z-Ordering is only applied within individual files, and the SQL engine does not perceive the data distribution of Z-Ordered data as it does with clustered indexing, which can be optimized for JOIN performance during the query planning stage.
- If particular fields are frequently involved in GROUP BY or ORDER BY clauses, clustered indexing is likely to deliver better performance.
- Suggestions for using Z-Ordering:
Suggestions for using Z-Ordering:
- Choose fields that frequently appear in filter conditions, especially those often used together for filtering.
- Limit the number of fields used for ZORDER BY to no more than four. Sorting performance degrades with each additional field; use clustered indexing instead of Z-Ordering if there's only one field to sort.
- Ensure the data volume of the table to be sorted is neither too small nor too large. Sorting with Z-Ordering has negligible effects on small data volumes, while the cost of sorting large data volumes can be prohibitive, particularly affecting the output timing of baseline tasks.
- Global Z-Ordering offers better clustering compared to Local Z-Ordering, but at a higher cost, so choose the appropriate method based on the characteristics of the task.

Community

MaxCompute Unleashed - Part 13: Global Zorder

Background of Global Z-Ordering

The Use of MaxCompute Global Z-Ordering

Syntax Format

Examples of Performing ZORDER BY Sorting on a Newly Created Table

Examples of Performing ZORDER BY Sorting on Existing Tables

Technical Analysis

Implementation Research

Method 1: Generating a z-value based on the mapping strategy

Method 2: Generating z-values based on boundary indexes (as exemplified by Delta and Hudi)

The Implementation of MaxCompute Global Z-Order

Summary

Read previous post:

Read next post:

Alibaba Cloud MaxCompute

You may also like

Comments

Alibaba Cloud MaxCompute

Related Products

MaxCompute

Big Data Consulting for Data Technology Solution

Big Data Consulting Services for Retail Solution

Realtime Compute for Apache Flink