Basic concepts

Last Updated: Oct 31, 2017

Map and Reduce each support a corresponding map/reduce method, plus a setup method and a cleanup method. The setup method is called before the map/reduce method; each worker calls it exactly once. The cleanup method is called after the map/reduce method; each worker calls it exactly once.
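The call order described above can be sketched in a few lines of plain Python. This is only an illustration of the lifecycle, not the framework's API: the class WordLengthMapper and the driver run_worker are hypothetical names invented for this example.

```python
class WordLengthMapper:
    """Hypothetical mapper; class and method names are illustrative only."""

    def setup(self):
        # Runs once per worker, before any record is processed.
        self.calls = ["setup"]
        self.output = []

    def map(self, record):
        # Runs once for every input record.
        self.calls.append("map")
        self.output.append((record, len(record)))

    def cleanup(self):
        # Runs once per worker, after the last record has been processed.
        self.calls.append("cleanup")


def run_worker(task, records):
    """Drive one worker through the setup -> map -> cleanup lifecycle."""
    task.setup()
    for record in records:
        task.map(record)
    task.cleanup()
    return task


worker = run_worker(WordLengthMapper(), ["a", "bb", "ccc"])
print(worker.calls)
```

A Reduce worker would follow the same pattern, with reduce called in place of map.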

Note:

Sort/Group

Some columns of the output key record can be designated as sort columns; a user-defined comparator is not supported. Users can select several of the sort columns as group columns; a user-defined group comparator is not supported. Sort columns determine the order of the user's data, while group columns are used for secondary sort.
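As a rough illustration of how sort columns and group columns interact, the following plain-Python sketch simulates a secondary sort. The key layout (user, ts), the sample data, and the helper project are assumptions made for this example and are not part of the framework.

```python
from itertools import groupby

# Each map output is (key_record, value); the key record has columns (user, ts).
map_output = [
    (("bob", 3), "c"),
    (("alice", 2), "y"),
    (("bob", 1), "a"),
    (("alice", 1), "x"),
]

sort_columns = (0, 1)   # sort by user, then by ts
group_columns = (0,)    # group by user only; a subset of the sort columns


def project(columns):
    """Return a function that extracts the given key columns from a (key, value) pair."""
    return lambda pair: tuple(pair[0][c] for c in columns)


# Records are ordered by all sort columns...
ordered = sorted(map_output, key=project(sort_columns))

# ...and then handed to reduce one group at a time, keyed by the group columns.
groups = {
    group_key: [value for _, value in pairs]
    for group_key, pairs in groupby(ordered, key=project(group_columns))
}
print(groups)
```

Each group holds all values for one user, and within the group the values arrive ordered by ts, which is the effect a secondary sort is meant to achieve.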

Note:

Partitioner

Both a partition column and a user-defined partitioner can be set. If both are set, the partition column takes priority over the partitioner. The partitioner distributes the output data of the Map side to different Reduce workers according to hash logic.
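The priority rule can be sketched as follows. This is a simplified model under stated assumptions: the function choose_reducer, the CRC32-based stable_hash, and the fallback of hashing the whole key when nothing is configured are all hypothetical and may differ from what the framework actually does.

```python
import zlib


def stable_hash(value):
    """Deterministic hash for the sketch (Python's built-in hash() is salted per process)."""
    return zlib.crc32(repr(value).encode("utf-8"))


def choose_reducer(key_record, num_reducers, partition_columns=None, partitioner=None):
    """Pick a Reduce worker index for one map output key record."""
    if partition_columns:
        # Partition columns take priority over a user-defined partitioner.
        projected = tuple(key_record[c] for c in partition_columns)
        return stable_hash(projected) % num_reducers
    if partitioner is not None:
        return partitioner(key_record, num_reducers)
    # Assumed fallback for this sketch: hash the whole key record.
    return stable_hash(key_record) % num_reducers


def last_reducer(key_record, num_reducers):
    """Hypothetical user-defined partitioner: send everything to the last worker."""
    return num_reducers - 1


by_column = choose_reducer(("alice", 1), 4, partition_columns=(0,))
both_set = choose_reducer(("alice", 1), 4, partition_columns=(0,), partitioner=last_reducer)
custom_only = choose_reducer(("alice", 1), 4, partitioner=last_reducer)
print(by_column, both_set, custom_only)
```

Because only column 0 is hashed when it is set as the partition column, every key record for the same user lands on the same Reduce worker regardless of its other columns.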

Combiner

The Combiner combines adjacent records during the Shuffle stage. Users can choose whether to use a Combiner depending on their business logic. The Combiner is an optimization of the MapReduce computing framework, and its logic is usually the same as that of Reduce. After Map outputs its data, the framework performs a local combine operation on the Map side for records that have the same key value.
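The effect of a local combine can be illustrated with a word-count style sketch. The functions combine and reduce_all and the two sample map outputs are hypothetical; the sketch merges records by key with a dictionary rather than modelling adjacency in a sorted stream, and it assumes the reduce logic (a sum) is safe to apply in stages.

```python
from collections import defaultdict


def combine(map_output):
    """Locally sum the values of records that share a key on one map worker."""
    totals = defaultdict(int)
    for key, value in map_output:
        totals[key] += value
    return sorted(totals.items())


def reduce_all(shuffled):
    """Final reduce: the same summing logic, applied to everything that was shuffled."""
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals)


# Output of two separate map workers.
mapper_a = [("x", 1), ("y", 1), ("x", 1)]
mapper_b = [("x", 1), ("y", 1), ("y", 1)]

without_combiner = mapper_a + mapper_b
with_combiner = combine(mapper_a) + combine(mapper_b)

# Fewer records cross the shuffle, but the final result is unchanged.
print(len(without_combiner), len(with_combiner))
print(reduce_all(with_combiner) == reduce_all(without_combiner))
```

A combiner of this kind is only appropriate when applying the logic locally and then again in Reduce gives the same answer as applying it once, as is the case for sums and counts.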

Note:
