MaxCompute MapReduce jobs are described with four terms: Map/Reduce, Sort, Partition, and Combiner. Each has distinct configuration options. Understanding these terms helps you design jobs correctly and debug unexpected results.
How it works
Data flows through a MapReduce job in this order:
(input) → map → [combine] → shuffle & sort → reduce → (output)
Each term in this topic corresponds to a stage or configuration option in this pipeline.
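The flow above can be sketched as a small in-memory simulation. This is a framework-independent illustration, not MaxCompute code: `map_fn`, `combine_fn`, `reduce_fn`, and `run_job` are hypothetical names, and each inner list in `splits` stands in for the input of one map task.

```python
from collections import defaultdict
from itertools import groupby


def map_fn(line):
    # map: emit one (word, 1) pair per word in the input line.
    for word in line.split():
        yield word, 1


def combine_fn(pairs):
    # combine (optional): pre-aggregate one map task's output by key.
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())


def reduce_fn(key, values):
    # reduce: produce the final count for one key.
    return key, sum(values)


def run_job(splits, use_combiner=True):
    shuffled = []
    for split in splits:  # each split stands in for one map task
        pairs = [kv for line in split for kv in map_fn(line)]
        if use_combiner:
            pairs = combine_fn(pairs)
        shuffled.extend(pairs)
    # shuffle & sort: bring records with the same key together, in key order.
    shuffled.sort(key=lambda kv: kv[0])
    return [
        reduce_fn(key, [value for _, value in group])
        for key, group in groupby(shuffled, key=lambda kv: kv[0])
    ]


splits = [["a b a"], ["b c"]]
result = run_job(splits)
print(result)  # [('a', 2), ('b', 2), ('c', 1)]
```

Running the job with `use_combiner=False` gives the same result in this sketch, which reflects the combiner's role as an optimization rather than a change in semantics.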
Map/Reduce
A map or reduce task runs three methods in sequence:
- setup(): runs once per worker before processing begins
- map() or reduce(): runs for each input record
- cleanup(): runs once per worker after all records are processed
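The call order of these three methods can be illustrated with a minimal sketch. `CountingMapper` and `run_task` are hypothetical names used only for illustration; they mimic the lifecycle and are not part of the MaxCompute SDK.

```python
class CountingMapper:
    """Hypothetical mapper that records the order in which its methods run."""

    def setup(self):
        # Runs once, before any record is processed.
        self.calls = ["setup"]
        self.seen = 0

    def map(self, record):
        # Runs once for each input record.
        self.calls.append("map")
        self.seen += 1

    def cleanup(self):
        # Runs once, after all records have been processed.
        self.calls.append("cleanup")


def run_task(task, records):
    # Stand-in for the framework driving one task (an assumption of this sketch).
    task.setup()
    for record in records:
        task.map(record)
    task.cleanup()
    return task


mapper = run_task(CountingMapper(), ["r1", "r2", "r3"])
print(mapper.calls)  # ['setup', 'map', 'map', 'map', 'cleanup']
```

One-time work such as opening resources or initializing counters fits in `setup()`, and flushing buffered results fits in `cleanup()`.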
For runnable examples, see Example programs.
Sort
MaxCompute lets you control sort behavior through two column types:
| Column type | Role | Custom comparator |
|---|---|---|
| Sort columns | Determine the sort order. Chosen from the columns of the key records that a mapper generates. | Not supported |
| Group columns | A subset of sort columns. Used for secondary sorting. | Not supported |
For an example of secondary sorting, see Secondary sorting source code.
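How sort columns and group columns interact can be sketched with plain Python. This is an illustration only, not the linked example: the key layout `(user, ts)`, the constants `SORT_COLUMNS` and `GROUP_COLUMNS`, and the helper `project` are assumptions made for this sketch.

```python
from itertools import groupby

# Mapper output: key = (user, ts), value = event name.
mapper_output = [
    (("bob", 2), "logout"),
    (("alice", 3), "logout"),
    (("bob", 1), "login"),
    (("alice", 1), "login"),
    (("alice", 2), "click"),
]

SORT_COLUMNS = (0, 1)  # order records by user, then by ts
GROUP_COLUMNS = (0,)   # group by user only; a subset of the sort columns


def project(key, columns):
    # Keep only the selected key columns, in the given order.
    return tuple(key[i] for i in columns)


# Sort by the sort columns, then hand each group to one "reduce" call.
ordered = sorted(mapper_output, key=lambda kv: project(kv[0], SORT_COLUMNS))
grouped = {
    group_key: [value for _, value in records]
    for group_key, records in groupby(
        ordered, key=lambda kv: project(kv[0], GROUP_COLUMNS)
    )
}

print(grouped)
# {('alice',): ['login', 'click', 'logout'], ('bob',): ['login', 'logout']}
```

Because the records are sorted on more columns than they are grouped on, each group arrives with its values already ordered by `ts`, which is the effect secondary sorting is used for.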
Partition
Partitioners route data generated by a mapper to different reducers based on partitioning logic. MaxCompute supports two mechanisms:
| Mechanism | Description |
|---|---|
| Partition columns | Partitioning based on designated columns. |
| Custom partitioners | User-defined logic for routing records to reducers. |
When both are configured, partition columns take precedence over custom partitioners.
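The two mechanisms and the precedence rule can be sketched as follows. All names here (`stable_hash`, `partition_by_columns`, `region_partitioner`, `choose_reducer`) are hypothetical, and the hash function and the fallback used when neither mechanism is configured are assumptions of this sketch, not documented MaxCompute behavior.

```python
import zlib


def stable_hash(values):
    # crc32 gives the same result on every run, unlike Python's salted hash() for strings.
    joined = "\x1f".join(str(v) for v in values)
    return zlib.crc32(joined.encode("utf-8"))


def partition_by_columns(key, columns, num_reducers):
    # Partition columns: only the designated columns decide the target reducer.
    return stable_hash(key[c] for c in columns) % num_reducers


def region_partitioner(key, num_reducers):
    # Hypothetical custom rule: "eu" records go to reducer 0, all others to reducer 1.
    return 0 if key["region"] == "eu" else (1 % num_reducers)


def choose_reducer(key, num_reducers, partition_columns=None, partitioner=None):
    if partition_columns:
        # Partition columns take precedence when both are configured.
        return partition_by_columns(key, partition_columns, num_reducers)
    if partitioner is not None:
        return partitioner(key, num_reducers)
    # Assumed fallback for this sketch: hash every key column.
    return stable_hash(key.values()) % num_reducers


k1 = {"region": "eu", "user": "alice"}
k2 = {"region": "eu", "user": "bob"}

# Same value in the partition column means the same reducer.
same = choose_reducer(k1, 3, partition_columns=["region"]) == choose_reducer(
    k2, 3, partition_columns=["region"]
)
print(same)  # True
```

Whichever mechanism is used, the routing must be deterministic so that all records that belong together reach the same reducer.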
Combiner
The combiner is an optional optimization of the MapReduce computing framework that runs at the shuffle stage. It combines adjacent records, reducing the volume of data transferred from mappers to reducers.
Whether to use a combiner depends on your business logic. The combiner logic is identical to the reducer logic: after a mapper generates data, the framework applies the combiner to that mapper's records that share the same key.
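The data reduction can be illustrated with a minimal sketch in which the combiner reuses the reducer's logic on a single mapper's output. `reduce_sum` and `combine` are hypothetical names, and the sample pairs are made up for illustration.

```python
def reduce_sum(key, values):
    # Reducer logic: total of all values for one key.
    return key, sum(values)


def combine(mapper_pairs):
    # Apply the reducer logic locally to one mapper's records that share a key.
    by_key = {}
    for key, value in mapper_pairs:
        by_key.setdefault(key, []).append(value)
    return [reduce_sum(key, values) for key, values in by_key.items()]


mapper_pairs = [("a", 1), ("b", 1), ("a", 1), ("a", 1), ("b", 1)]
combined = combine(mapper_pairs)

print(len(mapper_pairs), "->", len(combined))  # 5 -> 2
print(combined)  # [('a', 3), ('b', 2)]
```

Summing works here because partial sums can be summed again without changing the result. An operation without that property, such as averaging, would give a different answer if applied twice, which is one reason the choice depends on the business logic.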
For runnable examples, see Example programs.