MapReduce: A Concise Tutorial
MapReduce - Partitioner
A partitioner works like a condition applied while processing an input dataset: it decides which reducer receives each intermediate record. The partition phase takes place after the Map phase and before the Reduce phase.
The number of partitions equals the number of reducers, so the partitioner divides the data according to the number of reducers. All data in a single partition is therefore processed by a single Reducer.
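The relationship between partitions and reducers can be sketched in plain Python. This is only a toy simulation, not Hadoop code: `run_job`, the word-count functions, and the length-based partition rule are all hypothetical names and choices made for illustration.

```python
from collections import defaultdict

def run_job(records, map_fn, partition_fn, reduce_fn, num_reducers):
    """Toy single-process simulation of Map -> Partition -> Reduce."""
    # Map phase: each input record yields zero or more (key, value) pairs.
    intermediate = [pair for record in records for pair in map_fn(record)]

    # Partition phase: exactly one bucket per reducer.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in intermediate:
        index = partition_fn(key, value, num_reducers)
        partitions[index][key].append(value)

    # Reduce phase: reducer i only ever sees the data in partition i.
    return [
        {key: reduce_fn(key, values) for key, values in bucket.items()}
        for bucket in partitions
    ]

# Hypothetical word-count job with two reducers.
words = ["apple", "banana", "apple", "cherry", "banana", "apple"]
outputs = run_job(
    words,
    map_fn=lambda word: [(word, 1)],
    # Illustrative rule only: route by key length.
    partition_fn=lambda key, value, n: len(key) % n,
    reduce_fn=lambda key, values: sum(values),
    num_reducers=2,
)
print(outputs)
```

With two reducers the job produces two output dictionaries, one per partition, and every key appears in exactly one of them.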
Partitioner
A partitioner partitions the key-value pairs of the intermediate Map output. It assigns each pair to a partition using a user-defined condition, which works like a hash function. The total number of partitions is the same as the number of Reducer tasks for the job. Let us take an example to understand how the partitioner works.
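To make "works like a hash function" concrete, here is a minimal sketch of a hash-style partition function in plain Python. The name `hash_partition` is hypothetical, and `zlib.crc32` is used only because it gives the same result on every run (Python's built-in `hash()` for strings does not); a real Hadoop job would rely on the key's own hash code instead.

```python
import zlib

def hash_partition(key, num_reduce_tasks):
    """Map a string key to a partition index in [0, num_reduce_tasks)."""
    # crc32 is deterministic across processes, so a given key always
    # lands in the same partition, and therefore at the same reducer.
    return zlib.crc32(key.encode("utf-8")) % num_reduce_tasks

for key in ["Male", "Female", "Male"]:
    print(key, "->", hash_partition(key, 3))
```

The two properties that matter are that the result always falls within the range of reducer indices and that equal keys always produce the same index.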
MapReduce Partitioner Implementation
For convenience, assume we have a small table called Employee with the following data. We will use this sample as the input dataset to demonstrate how the partitioner works.
| Id | Name | Age | Gender | Salary |
|------|----------|-----|--------|--------|
| 1201 | gopal | 45 | Male | 50,000 |
| 1202 | manisha | 40 | Female | 50,000 |
| 1203 | khalil | 34 | Male | 30,000 |
| 1204 | prasanth | 30 | Male | 30,000 |
| 1205 | kiran | 20 | Male | 40,000 |
| 1206 | laxmi | 25 | Female | 35,000 |
| 1207 | bhavya | 20 | Female | 15,000 |
| 1208 | reshma | 19 | Female | 15,000 |
| 1209 | kranthi | 22 | Male | 22,000 |
| 1210 | Satish | 24 | Male | 25,000 |
| 1211 | Krishna | 25 | Male | 25,000 |
| 1212 | Arshad | 28 | Male | 20,000 |
| 1213 | lavanya | 18 | Female | 8,000 |
We have to write an application that processes the input dataset to find the highest-salaried employee by gender in each age group (for example, 20 or below, 21 to 30, and above 30).
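As a rough illustration of this task, the sketch below partitions the Employee rows by age group and then reports the highest salary per gender within each partition. It is plain Python rather than Hadoop code. The function names are hypothetical, and the group boundaries are an assumption: age 20 is placed in the first group so that every employee falls into exactly one group.

```python
from collections import defaultdict

# (Id, Name, Age, Gender, Salary) rows from the Employee table.
EMPLOYEES = [
    (1201, "gopal", 45, "Male", 50000),
    (1202, "manisha", 40, "Female", 50000),
    (1203, "khalil", 34, "Male", 30000),
    (1204, "prasanth", 30, "Male", 30000),
    (1205, "kiran", 20, "Male", 40000),
    (1206, "laxmi", 25, "Female", 35000),
    (1207, "bhavya", 20, "Female", 15000),
    (1208, "reshma", 19, "Female", 15000),
    (1209, "kranthi", 22, "Male", 22000),
    (1210, "Satish", 24, "Male", 25000),
    (1211, "Krishna", 25, "Male", 25000),
    (1212, "Arshad", 28, "Male", 20000),
    (1213, "lavanya", 18, "Female", 8000),
]

def age_group_partition(age, num_reduce_tasks):
    """Assumed boundaries: <=20 -> 0, 21..30 -> 1, >30 -> 2."""
    if age <= 20:
        return 0
    if age <= 30:
        return 1 % num_reduce_tasks
    return 2 % num_reduce_tasks

def highest_salary_by_gender(employees, num_reduce_tasks=3):
    # Partition phase: gender is the key, salary the value.
    partitions = [defaultdict(list) for _ in range(num_reduce_tasks)]
    for _id, _name, age, gender, salary in employees:
        index = age_group_partition(age, num_reduce_tasks)
        partitions[index][gender].append(salary)
    # Reduce phase: one reducer per age group takes the maximum per gender.
    return [
        {gender: max(salaries) for gender, salaries in bucket.items()}
        for bucket in partitions
    ]

results = highest_salary_by_gender(EMPLOYEES)
for group, result in zip(["<=20", "21-30", ">30"], results):
    print(group, result)
```

Under these assumed boundaries the three partitions yield a maximum of 40,000 (Male) and 15,000 (Female) for the youngest group, 30,000 and 35,000 for the middle group, and 50,000 for both genders in the oldest group.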