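`Job.getInstance()` belongs to Hadoop MapReduce and returns an instance of the `Job` class, which Hadoop provides for setting up and configuring a MapReduce job. Its basic usage is as follows:
Basic usage
Import the necessary packages: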
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
Create the `Job` instance:
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "My Job Name");
Set the job JAR, Mapper, and Reducer:
job.setJarByClass(MyMainClass.class);
job.setMapperClass(MyMapper.class);
job.setReducerClass(MyReducer.class);
Set the input and output paths and the output key/value types:
```java
FileInputFormat.addInputPath(job, new Path("input_path"));
FileOutputFormat.setOutputPath(job, new Path("output_path"));
job.setOutputKeyClass(Text.class);          // set the output key type as needed
job.setOutputValueClass(IntWritable.class); // set the output value type as needed
```
Submit the job:
```java
System.exit(job.waitForCompletion(true) ? 0 : 1);
```
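One detail about the "create the `Job` instance" step above: according to the Hadoop Javadoc, the `Job` makes its own copy of the `Configuration` it is given, so changes made to `conf` afterwards are not seen by the job. The following is a minimal sketch of that behavior, assuming the Hadoop client libraries are on the classpath. The class name `JobInstanceSketch`, the method `inspect`, and the property name `example.key` are made up for illustration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class JobInstanceSketch {

    // Returns {job name, value of example.key as seen by the job}.
    public static String[] inspect() throws Exception {
        Configuration conf = new Configuration();
        conf.set("example.key", "before");           // hypothetical property, set before the Job exists

        Job job = Job.getInstance(conf, "demo job"); // the Job copies conf at this point

        conf.set("example.key", "after");            // only changes the caller's Configuration

        return new String[] {
            job.getJobName(),
            job.getConfiguration().get("example.key")
        };
    }

    public static void main(String[] args) throws Exception {
        String[] seen = inspect();
        System.out.println(seen[0]);
        System.out.println(seen[1]);
    }
}
```

Given the copy semantics described in the Javadoc, this should print `demo job` and then `before`. To change a setting after the job has been created, go through `job.getConfiguration()` rather than the original `conf`.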
Complete example
Below is a complete MapReduce job example (a word count):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
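import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;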
import java.io.IOException;
public class MyMapReduce {

    // Mapper: emits (word, 1) for every whitespace-separated token in a line.
    public static class MyMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            String[] words = value.toString().split("\\s+");
            for (String w : words) {
                if (w.isEmpty()) {
                    continue; // leading whitespace produces one empty token; skip it
                }
                word.set(w);
                context.write(word, one);
            }
        }
    }

    // Reducer: sums the counts collected for each word.
    public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(MyMapReduce.class);
        job.setMapperClass(MyMapper.class);
        job.setCombinerClass(MyReducer.class); // optional: a Combiner pre-aggregates map output to cut shuffle traffic
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input path
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output path; must not already exist
        System.exit(job.waitForCompletion(true) ? 0 : 1);       // exit code 0 on success, 1 on failure
    }
}
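To see what the job above computes without a cluster, the following plain-Java sketch mimics the map, shuffle, and reduce phases in memory. It uses no Hadoop classes and is only an illustration of the data flow, not how Hadoop implements it. The class `LocalWordCount` and its methods `map`, `shuffle`, `reduce`, and `run` are hypothetical names.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the MapReduce data flow; no Hadoop involved.
public class LocalWordCount {

    // "Map" phase: emit one (word, 1) pair per whitespace-separated token.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.split("\\s+")) {
            if (!token.isEmpty()) { // a line with leading whitespace yields one empty token
                pairs.add(new AbstractMap.SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // "Shuffle" phase: group the emitted values by key, sorted by key.
    public static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // "Reduce" phase: sum the values collected for one key.
    public static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs map, shuffle, and reduce over all lines and returns word -> count.
    public static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        shuffle(pairs).forEach((word, ones) -> counts.put(word, reduce(ones)));
        return counts;
    }

    public static void main(String[] args) {
        // prints {hadoop=1, hello=2, world=1}
        System.out.println(run(Arrays.asList("hello world", " hello hadoop")));
    }
}
```

Because `reduce` here is a plain sum, which is associative and commutative, summing partial sums gives the same total as summing all the ones at once. That property is what allows the same reducer class to double as a Combiner in the Hadoop job.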
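Summary
- `Job.getInstance(Configuration conf)` creates a new `Job` instance, which you then configure and prepare to run as a MapReduce job.
- Once the Mapper, Reducer, and the input/output paths and formats are set, you can submit the job.
Make sure the Hadoop environment is correctly configured before running the job, and adjust the code to your own needs.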
Content provided by the 零声教学 AI assistant; the question was submitted by a student.