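The task asks for each student's highest score. In the Mapper stage, therefore, emit each student's name as the key and the score as the value. In the Reducer stage, compare all the scores that share the same name and take the maximum as that student's best score.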
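The idea can be checked without a cluster. The following is a minimal plain-Java sketch, not the Hadoop job itself: it assumes each input line has the form `name score`, and the class name `BestScoreSketch`, the method `bestScores`, and the sample records are hypothetical, chosen only for illustration. `Map.merge` with `Math::max` stands in for the Reducer, which sees all values for one key and keeps the largest.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BestScoreSketch {

    // Groups "name score" lines by name and keeps the largest score per name.
    public static Map<String, Integer> bestScores(List<String> lines) {
        Map<String, Integer> best = new TreeMap<>();
        for (String line : lines) {
            // "Map" step: turn one line into a (name, score) pair.
            String[] tokens = line.trim().split("\\s+");
            String name = tokens[0];
            int score = Integer.parseInt(tokens[1]);
            // "Reduce" step: for an existing key, keep the larger score.
            best.merge(name, score, Math::max);
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical sample input, one record per line.
        List<String> lines = Arrays.asList("Alice 80", "Bob 72", "Alice 95", "Bob 72");
        System.out.println(bestScores(lines)); // prints {Alice=95, Bob=72}
    }
}
```

A `TreeMap` is used here only so the printed order is deterministic; the real job's output is likewise sorted by key within each reducer.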
The modified code is as follows:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BestScore {

    // Emits (name, score) for every input line of the form "name score".
    public static class ScoreMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {

        private final Text name = new Text();
        private final IntWritable score = new IntWritable();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Split on any run of whitespace so tabs or repeated spaces also work.
            String[] tokens = value.toString().trim().split("\\s+");
            name.set(tokens[0]);
            score.set(Integer.parseInt(tokens[1]));
            context.write(name, score);
        }
    }

    // Receives all scores for one name and emits the largest one.
    public static class BestScoreReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int bestScore = Integer.MIN_VALUE; // start from the smallest possible value
            for (IntWritable val : values) {
                bestScore = Math.max(bestScore, val.get()); // keep the maximum
            }
            result.set(bestScore);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Best Score");
        job.setJarByClass(BestScore.class);
        job.setMapperClass(ScoreMapper.class);
        job.setReducerClass(BestScoreReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Set the input and output paths
        FileInputFormat.addInputPath(job, new Path("/user/test/input"));
        FileOutputFormat.setOutputPath(job, new Path("/user/test/output"));
        // Submit the job and wait for it to finish
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
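Note that a student may have more than one score record, so the Reducer has to compare every value in values rather than just take the first one. Also, because each output record pairs a name with a score, the output key type is Text and the output value type is IntWritable.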
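The Mapper above assumes every line holds exactly a name and an integer score; a blank or malformed line would make it throw and fail the task. As a hedged sketch of one way to guard against that, the hypothetical helper below (the class `ScoreLineParser` and its `parse` method are not part of the original code) returns an empty `Optional` for lines it cannot parse, so a Mapper could check `isPresent()` and return early instead of writing a record.

```java
import java.util.AbstractMap;
import java.util.Map;
import java.util.Optional;

public class ScoreLineParser {

    // Returns (name, score) for a well-formed "name score" line,
    // or Optional.empty() if the line is blank or malformed.
    public static Optional<Map.Entry<String, Integer>> parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length != 2) {
            return Optional.empty(); // wrong number of fields
        }
        try {
            int score = Integer.parseInt(tokens[1]);
            return Optional.of(new AbstractMap.SimpleEntry<>(tokens[0], score));
        } catch (NumberFormatException e) {
            return Optional.empty(); // score is not an integer
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("Alice 95").isPresent());     // prints true
        System.out.println(parse("Alice").isPresent());        // prints false
        System.out.println(parse("Alice ninety").isPresent()); // prints false
    }
}
```

Whether to skip bad lines silently, count them, or fail the job is a design choice that depends on how clean the input is expected to be.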