This topic introduces the basic concepts of MaxCompute MapReduce.

Map/Reduce

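When a map or reduce task runs, the framework calls the setup() method, then the map() or reduce() method, and finally the cleanup() method. map() is called for each input record, and reduce() is called for each group of records that share a key. setup() is called before the first call to map() or reduce(), and each worker calls it only once.

The cleanup() method is called after the last call to map() or reduce(). Each worker also calls it only once.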
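The call order described above can be illustrated with a small simulation. This is a framework-free sketch, not the MaxCompute SDK: the class WordCountMapper and the driver function run_map_task are hypothetical names chosen for illustration. The driver plays the role of one worker, calling setup() once, map() once per record, and cleanup() once.

```python
from typing import Iterable, List, Tuple


class WordCountMapper:
    """Hypothetical mapper that mirrors the setup()/map()/cleanup() lifecycle."""

    def __init__(self) -> None:
        self.calls: List[str] = []  # records the order of lifecycle calls

    def setup(self) -> None:
        # Typically used for one-time initialization, such as reading configuration.
        self.calls.append("setup")

    def map(self, record: str) -> List[Tuple[str, int]]:
        # Called once per input record; emits (word, 1) pairs.
        self.calls.append("map")
        return [(word, 1) for word in record.split()]

    def cleanup(self) -> None:
        # Typically used to release resources or flush state.
        self.calls.append("cleanup")


def run_map_task(mapper: WordCountMapper, records: Iterable[str]) -> List[Tuple[str, int]]:
    """Simulate one worker: setup() once, map() per record, cleanup() once."""
    output: List[Tuple[str, int]] = []
    mapper.setup()
    for record in records:
        output.extend(mapper.map(record))
    mapper.cleanup()
    return output


mapper = WordCountMapper()
pairs = run_map_task(mapper, ["a b", "b c"])
print(mapper.calls)  # ['setup', 'map', 'map', 'cleanup']
print(pairs)         # [('a', 1), ('b', 1), ('b', 1), ('c', 1)]
```

Because setup() and cleanup() run once per worker rather than once per record, they are the natural place for expensive initialization and teardown.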

For usage examples, see Example programs.

Sort

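You can specify some of the columns in the key records that a mapper outputs as sort columns. Custom comparators are not supported for sort columns. You can also select some of the sort columns as grouping columns. Custom group comparators are not supported for grouping columns either. Sort columns determine the order of the data, and grouping columns determine which records are passed to the same reduce() call. Sorting on more columns than you group on is what enables secondary sorting: within each group, the values arrive already ordered by the remaining sort columns.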
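The interaction between sort columns and grouping columns can be sketched without the framework. In this hypothetical example, the key record has two columns, user_id and timestamp. Both are treated as sort columns and only user_id as the grouping column; the column names and data are illustrative assumptions. Sorting is done with Python's sorted() and grouping with itertools.groupby, standing in for what the framework does during the shuffle.

```python
from itertools import groupby
from typing import Dict, List, Tuple

# Hypothetical mapper output: key record (user_id, timestamp) and a value.
Record = Tuple[Tuple[str, int], str]
mapped: List[Record] = [
    (("u2", 30), "logout"),
    (("u1", 20), "click"),
    (("u2", 10), "login"),
    (("u1", 5), "login"),
]

# Sort columns: (user_id, timestamp). Records are ordered by both columns.
sorted_records = sorted(mapped, key=lambda rec: rec[0])


def grouping_column(rec: Record) -> str:
    # Grouping column: user_id only.
    return rec[0][0]


# Each group corresponds to one reduce() call, and its values arrive
# already ordered by timestamp: this is the secondary sort.
reduced: Dict[str, List[str]] = {}
for user_id, group in groupby(sorted_records, key=grouping_column):
    reduced[user_id] = [value for _, value in group]

print(reduced)  # {'u1': ['login', 'click'], 'u2': ['login', 'logout']}
```

Note that groupby only merges adjacent records, which is why the data must be sorted on the grouping column before it is grouped.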

For a usage example, see Secondary sorting source code.

Partition

MaxCompute supports both partition columns and custom partitioners. If both are specified, partition columns take precedence over the custom partitioner.

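A partitioner assigns each record that a mapper outputs to a reducer according to the partitioning logic. For example, if the logic hashes the partition-column values, all records that share the same values are sent to the same reducer.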
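A hash-based partitioner can be sketched as follows. This is an illustrative simulation, not MaxCompute's built-in partitioning algorithm: the functions hash_partition and assign are hypothetical, and CRC-32 is an assumed choice of hash. CRC-32 is used instead of Python's built-in hash() because string hashing in Python is randomized per process, and a partitioner must give the same answer on every worker.

```python
import zlib
from typing import Dict, List, Tuple

Key = Tuple[str, ...]
Record = Tuple[Key, int]


def hash_partition(partition_columns: Key, num_reducers: int) -> int:
    """Map a record's partition-column values to a reducer index in [0, num_reducers)."""
    # Join with a separator that is unlikely to appear in the values,
    # so ("a", "bc") and ("ab", "c") do not collide trivially.
    data = "\x1f".join(partition_columns).encode("utf-8")
    return zlib.crc32(data) % num_reducers


def assign(records: List[Record], num_reducers: int) -> Dict[int, List[Record]]:
    """Distribute mapper output records across reducers."""
    buckets: Dict[int, List[Record]] = {i: [] for i in range(num_reducers)}
    for key, value in records:
        buckets[hash_partition(key, num_reducers)].append((key, value))
    return buckets


records: List[Record] = [(("beijing",), 1), (("hangzhou",), 2), (("beijing",), 3)]
buckets = assign(records, num_reducers=4)
```

Whatever the hash function, the essential property is determinism: records with equal partition-column values always land on the same reducer, so that reducer sees every record for those values.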

Combiner

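A combiner merges adjacent records that have the same key during the shuffle stage. Whether you can use a combiner depends on your business logic: merging partial results must not change the final output. This holds for operations such as sums and counts, but not, for example, for averages.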

A combiner is an optimization provided by the MapReduce framework, and its logic is the same as that of a reducer. After a mapper outputs data, the framework merges records that have the same key on the map side, which reduces the amount of data transferred to the reducers.
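The effect of a combiner can be sketched with a word-count style sum. This simulation does not use the MaxCompute API; reduce_sum and the mapper outputs are hypothetical. The same summing logic is applied first to each mapper's output (the combiner) and then to the shuffled data (the reducer).

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, int]


def reduce_sum(pairs: List[Pair]) -> List[Pair]:
    """Reducer logic, reused as the combiner: sum the values for each key."""
    totals: Dict[str, int] = {}
    for key, value in pairs:
        totals[key] = totals.get(key, 0) + value
    return sorted(totals.items())


# Output of two hypothetical mappers before the shuffle.
mapper_outputs: List[List[Pair]] = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("b", 1), ("b", 1), ("c", 1)],
]

# Without a combiner, every pair is shuffled to the reducers.
without_combiner = [p for out in mapper_outputs for p in out]

# With a combiner, each mapper's output is pre-aggregated on the map side.
with_combiner = [p for out in mapper_outputs for p in reduce_sum(out)]

print(len(without_combiner), len(with_combiner))  # 7 4
print(reduce_sum(with_combiner))  # [('a', 3), ('b', 3), ('c', 1)]
```

Both paths produce the same final result because addition is associative and commutative. A reducer that computes an average could not be reused unchanged as a combiner, because an average of partial averages is generally not the overall average.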

For usage examples, see Example programs.