Setting Up a Hadoop 2.6.0 + Eclipse Development and Debugging Environment

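The previous article set up a Hadoop 2.6.0 pseudo-distributed environment in a virtual machine on Win7 (see http://www.linuxidc.com/Linux/2015-08/120942.htm). To make development and debugging easier, this article describes how to set up a development environment in Eclipse that connects to the Hadoop cluster and submits jobs to it.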

Eclipse version: Luna 4.4.1

Install the hadoop-eclipse-plugin-2.6.0.jar plugin: download it and place it in the eclipse/plugins directory.

Extract hadoop-2.6.0.tar.gz to C:\Downloads\hadoop-2.6.0, then set this installation directory under Hadoop Map/Reduce in Eclipse's Window -> Preferences.

Open the Map/Reduce perspective from Window -> Open Perspective. Hadoop programs are developed in this perspective.

Open the Map/Reduce Locations view from Window -> Show View, then right-click in it and choose New Hadoop location… to create a new Hadoop connection.

After you confirm the settings, Eclipse connects to the Hadoop cluster.

If the connection succeeds, the files in the HDFS cluster appear under DFS Locations in the Project Explorer.

Develop a Sort example that sorts the input integers. Each line of the input file contains one integer.

package com.ccb;

/**
 * Created by hp on 2015-7-20.
 */

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Sort {

    // Each record is one integer per line. The Text line is converted to an
    // IntWritable and used as the map output key.
    public static class Map extends Mapper<Object, Text, IntWritable, IntWritable> {
        private final IntWritable data = new IntWritable();

        // Implementation of the map function
        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString().trim();
            if (line.isEmpty()) {
                return; // skip blank lines instead of failing in parseInt
            }
            data.set(Integer.parseInt(line));
            context.write(data, new IntWritable(1));
        }
    }

    // Before reduce runs, the Hadoop framework shuffles and sorts the map output,
    // so the reducer only needs to write out each key.
    public static class Reduce extends Reducer<IntWritable, IntWritable, IntWritable, Text> {

        // Implementation of the reduce function: write the key once per occurrence
        @Override
        public void reduce(IntWritable key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            for (IntWritable v : values) {
                context.write(key, new Text(""));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Specify the JobTracker address. The execution log reports this key as
        // deprecated in favor of mapreduce.jobtracker.address.
        conf.set("mapred.job.tracker", "192.168.62.129:9001");
        if (args.length != 2) {
            System.err.println("Usage: Data Sort <in> <out>");
            System.exit(2);
        }
        System.out.println(args[0]);
        System.out.println(args[1]);

        Job job = Job.getInstance(conf, "Data Sort");
        job.setJarByClass(Sort.class);

        // Set the Map and Reduce classes
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);

        // Set the output types. The map output value (IntWritable) differs from
        // the reduce output value (Text), so both pairs are declared explicitly.
        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(Text.class);

        // Set the input and output directories
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
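
The job relies on the framework sorting the map output keys before they reach the reducer. The following is a minimal plain-Java sketch of that data flow with no Hadoop dependency: a TreeMap stands in for the shuffle-and-sort phase. The names SortFlowSketch and sortLikeJob are hypothetical and exist only for this illustration, and the sample input is made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the Sort job; it only mimics the data flow.
class SortFlowSketch {

    static List<Integer> sortLikeJob(List<String> lines) {
        // "Map" phase: parse each line into an integer key with the value 1.
        // "Shuffle" phase: the TreeMap groups equal keys in ascending key order.
        Map<Integer, List<Integer>> groups = new TreeMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            int key = Integer.parseInt(trimmed);
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(1);
        }

        // "Reduce" phase: emit the key once for every value in its group,
        // so duplicate integers survive in the output.
        List<Integer> output = new ArrayList<>();
        for (Map.Entry<Integer, List<Integer>> group : groups.entrySet()) {
            for (Integer ignored : group.getValue()) {
                output.add(group.getKey());
            }
        }
        return output;
    }

    public static void main(String[] args) {
        // Made-up sample input, not the files used in the article.
        List<String> input = Arrays.asList("5", "3", "12", "3", "-7");
        System.out.println(sortLikeJob(input)); // prints [-7, 3, 3, 5, 12]
    }
}
```

In the real job this global ordering holds because the run uses a single reduce task. With several reducers and the default partitioner, each output file would be sorted only within itself.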

Add log4j.properties and the Hadoop cluster's core-site.xml to the classpath. My sample project is organized with Maven, so both files go in the src/main/resources directory.

At run time the program reads the HDFS address from core-site.xml.
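
To make that lookup concrete, here is a rough sketch that reads one property from a core-site.xml-style document using only the JDK's XML parser. It is not how Hadoop's Configuration class is implemented. The class name CoreSiteSketch is hypothetical, fs.defaultFS is assumed to be the key that holds the HDFS address (the usual Hadoop 2 name), and the address is the one that appears in the execution log.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, for illustration only.
class CoreSiteSketch {

    // Returns the <value> of the <property> whose <name> matches, or null if absent.
    static String readProperty(String xml, String wantedName) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList properties = doc.getElementsByTagName("property");
            for (int i = 0; i < properties.getLength(); i++) {
                Element property = (Element) properties.item(i);
                if (wantedName.equals(childText(property, "name"))) {
                    return childText(property, "value");
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse configuration XML", e);
        }
    }

    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        // Assumed file content; the real core-site.xml comes from the cluster.
        String xml = "<configuration>"
                + "<property><name>fs.defaultFS</name>"
                + "<value>hdfs://192.168.62.129:9000</value></property>"
                + "</configuration>";
        System.out.println(readProperty(xml, "fs.defaultFS"));
    }
}
```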

Right-click and choose Run As -> Run Configurations…, enter the input and output directories as program arguments, and click Run.

Execution log:

hdfs://192.168.62.129:9000/user/vm/sort_in
hdfs://192.168.62.129:9000/user/vm/sort_out
15/07/27 16:21:36 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
15/07/27 16:21:36 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
15/07/27 16:21:36 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/07/27 16:21:36 WARN mapreduce.JobSubmitter: No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
15/07/27 16:21:36 INFO input.FileInputFormat: Total input paths to process : 3
15/07/27 16:21:36 INFO mapreduce.JobSubmitter: number of splits:3
15/07/27 16:21:36 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
15/07/27 16:21:37 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1592166400_0001
15/07/27 16:21:37 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
15/07/27 16:21:37 INFO mapreduce.Job: Running job: job_local1592166400_0001
15/07/27 16:21:37 INFO mapred.LocalJobRunner: OutputCommitter set in config null
15/07/27 16:21:37 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
15/07/27 16:21:37 INFO mapred.LocalJobRunner: Waiting for map tasks
15/07/27 16:21:37 INFO mapred.LocalJobRunner: Starting task: attempt_local1592166400_0001_m_000000_0
15/07/27 16:21:37 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
15/07/27 16:21:37 INFO mapred.Task:  Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@4c90dbc4
15/07/27 16:21:37 INFO mapred.MapTask: Processing split: hdfs://192.168.62.129:9000/user/vm/sort_in/file1:0+25
15/07/27 16:21:37 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
15/07/27 16:21:37 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
15/07/27 16:21:37 INFO mapred.MapTask: soft limit at 83886080
15/07/27 16:21:37 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
15/07/27 16:21:37 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
15/07/27 16:21:37 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
15/07/27 16:21:38 INFO mapred.LocalJobRunner: 
15/07/27 16:21:38 INFO mapred.MapTask: Starting flush of map output
15/07/27 16:21:38 INFO mapred.MapTask: Spilling map output
15/07/27 16:21:38 INFO mapred.MapTask: bufstart = 0; bufend = 56; bufvoid = 104857600
15/07/27 16:21:38 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214372(104857488); length = 25/6553600
15/07/27 16:21:38 INFO mapred.MapTask: Finished spill 0
15/07/27 16:21:38 INFO mapred.Task: Task:attempt_local1592166400_0001_m_000000_0 is done. And is in the process of committing
15/07/27 16:21:38 INFO mapred.LocalJobRunner: map
15/07/27 16:21:38 INFO mapred.Task: Task 'attempt_local1592166400_0001_m_000000_0' done.
15/07/27 16:21:38 INFO mapred.LocalJobRunner: Finishing task: attempt_local1592166400_0001_m_000000_0
15/07/27 16:21:38 INFO mapred.LocalJobRunner: Starting task: attempt_local1592166400_0001_m_000001_0
15/07/27 16:21:38 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
15/07/27 16:21:38 INFO mapred.Task:  Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@69e4d7d
15/07/27 16:21:38 INFO mapred.MapTask: Processing split: hdfs://192.168.62.129:9000/user/vm/sort_in/file2:0+15
15/07/27 16:21:38 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
15/07/27 16:21:38 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
15/07/27 16:21:38 INFO mapred.MapTask: soft limit at 83886080
15/07/27 16:21:38 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
15/07/27 16:21:38 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
15/07/27 16:21:38 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
15/07/27 16:21:38 INFO mapred.LocalJobRunner: 
15/07/27 16:21:38 INFO mapred.MapTask: Starting flush of map output
15/07/27 16:21:38 INFO mapred.MapTask: Spilling map output
15/07/27 16:21:38 INFO mapred.MapTask: bufstart = 0; bufend = 32; bufvoid = 104857600
15/07/27 16:21:38 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214384(104857536); length = 13/6553600
15/07/27 16:21:38 INFO mapred.MapTask: Finished spill 0
15/07/27 16:21:38 INFO mapred.Task: Task:attempt_local1592166400_0001_m_000001_0 is done. And is in the process of committing
15/07/27 16:21:38 INFO mapred.LocalJobRunner: map
15/07/27 16:21:38 INFO mapred.Task: Task 'attempt_local1592166400_0001_m_000001_0' done.
15/07/27 16:21:38 INFO mapred.LocalJobRunner: Finishing task: attempt_local1592166400_0001_m_000001_0
15/07/27 16:21:38 INFO mapred.LocalJobRunner: Starting task: attempt_local1592166400_0001_m_000002_0
15/07/27 16:21:38 INFO mapreduce.Job: Job job_local1592166400_0001 running in uber mode : false
15/07/27 16:21:38 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
15/07/27 16:21:38 INFO mapreduce.Job:  map 100% reduce 0%
15/07/27 16:21:38 INFO mapred.Task:  Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@4e931efa
15/07/27 16:21:38 INFO mapred.MapTask: Processing split: hdfs://192.168.62.129:9000/user/vm/sort_in/file3:0+8
15/07/27 16:21:39 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
15/07/27 16:21:39 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
15/07/27 16:21:39 INFO mapred.MapTask: soft limit at 83886080
15/07/27 16:21:39 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
15/07/27 16:21:39 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
15/07/27 16:21:39 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
15/07/27 16:21:39 INFO mapred.LocalJobRunner: 
15/07/27 16:21:39 INFO mapred.MapTask: Starting flush of map output
15/07/27 16:21:39 INFO mapred.MapTask: Spilling map output
15/07/27 16:21:39 INFO mapred.MapTask: bufstart = 0; bufend = 24; bufvoid = 104857600
15/07/27 16:21:39 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214388(104857552); length = 9/6553600
15/07/27 16:21:39 INFO mapred.MapTask: Finished spill 0
15/07/27 16:21:39 INFO mapred.Task: Task:attempt_local1592166400_0001_m_000002_0 is done. And is in the process of committing
15/07/27 16:21:39 INFO mapred.LocalJobRunner: map
15/07/27 16:21:39 INFO mapred.Task: Task 'attempt_local1592166400_0001_m_000002_0' done.
15/07/27 16:21:39 INFO mapred.LocalJobRunner: Finishing task: attempt_local1592166400_0001_m_000002_0
15/07/27 16:21:39 INFO mapred.LocalJobRunner: map task executor complete.
15/07/27 16:21:39 INFO mapred.LocalJobRunner: Waiting for reduce tasks
15/07/27 16:21:39 INFO mapred.LocalJobRunner: Starting task: attempt_local1592166400_0001_r_000000_0
15/07/27 16:21:39 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
15/07/27 16:21:39 INFO mapred.Task:  Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@49250068
15/07/27 16:21:39 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@2129404b
15/07/27 16:21:39 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=652528832, maxSingleShuffleLimit=163132208, mergeThreshold=430669056, ioSortFactor=10, memToMemMergeOutputsThreshold=10
15/07/27 16:21:39 INFO reduce.EventFetcher: attempt_local1592166400_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
15/07/27 16:21:40 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local1592166400_0001_m_000002_0 decomp: 32 len: 36 to MEMORY
15/07/27 16:21:40 INFO reduce.InMemoryMapOutput: Read 32 bytes from map-output for attempt_local1592166400_0001_m_000002_0
15/07/27 16:21:40 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 32, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->32
15/07/27 16:21:40 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local1592166400_0001_m_000000_0 decomp: 72 len: 76 to MEMORY
15/07/27 16:21:40 INFO reduce.InMemoryMapOutput: Read 72 bytes from map-output for attempt_local1592166400_0001_m_000000_0
15/07/27 16:21:40 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 72, inMemoryMapOutputs.size() -> 2, commitMemory -> 32, usedMemory ->104
15/07/27 16:21:40 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local1592166400_0001_m_000001_0 decomp: 42 len: 46 to MEMORY
15/07/27 16:21:40 INFO reduce.InMemoryMapOutput: Read 42 bytes from map-output for attempt_local1592166400_0001_m_000001_0
15/07/27 16:21:40 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 42, inMemoryMapOutputs.size() -> 3, commitMemory -> 104, usedMemory ->146
15/07/27 16:21:40 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
15/07/27 16:21:40 INFO mapred.LocalJobRunner: 3 / 3 copied.
15/07/27 16:21:40 INFO reduce.MergeManagerImpl: finalMerge called with 3 in-memory map-outputs and 0 on-disk map-outputs
15/07/27 16:21:40 INFO mapred.Merger: Merging 3 sorted segments
15/07/27 16:21:40 INFO mapred.Merger: Down to the last merge-pass, with 3 segments left of total size: 128 bytes
15/07/27 16:21:40 INFO reduce.MergeManagerImpl: Merged 3 segments, 146 bytes to disk to satisfy reduce memory limit
15/07/27 16:21:40 INFO reduce.MergeManagerImpl: Merging 1 files, 146 bytes from disk
15/07/27 16:21:40 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
15/07/27 16:21:40 INFO mapred.Merger: Merging 1 sorted segments
15/07/27 16:21:40 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 136 bytes
15/07/27 16:21:40 INFO mapred.LocalJobRunner: 3 / 3 copied.
15/07/27 16:21:40 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
15/07/27 16:21:40 INFO mapred.Task: Task:attempt_local1592166400_0001_r_000000_0 is done. And is in the process of committing
15/07/27 16:21:40 INFO mapred.LocalJobRunner: 3 / 3 copied.
15/07/27 16:21:40 INFO mapred.Task: Task attempt_local1592166400_0001_r_000000_0 is allowed to commit now
15/07/27 16:21:40 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1592166400_0001_r_000000_0' to hdfs://192.168.62.129:9000/user/vm/sort_out/_temporary/0/task_local1592166400_0001_r_000000
15/07/27 16:21:40 INFO mapred.LocalJobRunner: reduce > reduce
15/07/27 16:21:40 INFO mapred.Task: Task 'attempt_local1592166400_0001_r_000000_0' done.
15/07/27 16:21:40 INFO mapred.LocalJobRunner: Finishing task: attempt_local1592166400_0001_r_000000_0
15/07/27 16:21:40 INFO mapred.LocalJobRunner: reduce task executor complete.
15/07/27 16:21:40 INFO mapreduce.Job:  map 100% reduce 100%
15/07/27 16:21:41 INFO mapreduce.Job: Job job_local1592166400_0001 completed successfully
15/07/27 16:21:41 INFO mapreduce.Job: Counters: 38
    File System Counters
        FILE: Number of bytes read=3834
        FILE: Number of bytes written=1017600
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=161
        HDFS: Number of bytes written=62
        HDFS: Number of read operations=41
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=10
    Map-Reduce Framework
        Map input records=14
        Map output records=14
        Map output bytes=112
        Map output materialized bytes=158
        Input split bytes=339
        Combine input records=0
        Combine output records=0
        Reduce input groups=13
        Reduce shuffle bytes=158
        Reduce input records=14
        Reduce output records=14
        Spilled Records=28
        Shuffled Maps =3
        Failed Shuffles=0
        Merged Map outputs=3
        GC time elapsed (ms)=10
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
        Total committed heap usage (bytes)=1420296192
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=48
    File Output Format Counters 
        Bytes Written=62
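
The counters at the end of the log are plain name=value lines, so they are easy to check programmatically, for example to confirm that the number of reduce output records matches the number of map input records. The sketch below is a hypothetical helper (LogCounterSketch is not a Hadoop class) that assumes the log text is passed in line by line and that counter lines, unlike ordinary log lines, do not begin with a timestamp.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only.
class LogCounterSketch {

    // Collects "name=value" lines whose value is an integer. Counter group
    // headings and timestamped log lines are ignored.
    static Map<String, Long> parseCounters(List<String> lines) {
        Map<String, Long> counters = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.matches("\\d{2}/\\d{2}/\\d{2} .*")) {
                continue; // ordinary log line such as "15/07/27 16:21:36 INFO ..."
            }
            int eq = line.lastIndexOf('=');
            if (eq < 0) {
                continue; // group heading such as "Map-Reduce Framework"
            }
            String name = line.substring(0, eq).trim();
            String value = line.substring(eq + 1).trim();
            if (name.isEmpty() || !value.matches("-?\\d+")) {
                continue;
            }
            counters.put(name, Long.parseLong(value));
        }
        return counters;
    }

    public static void main(String[] args) {
        // Three lines copied from the counters above.
        Map<String, Long> counters = parseCounters(Arrays.asList(
                "        Map input records=14",
                "        Reduce input groups=13",
                "        Reduce output records=14"));
        boolean allRecordsWritten =
                counters.get("Map input records").equals(counters.get("Reduce output records"));
        System.out.println("All input records written: " + allRecordsWritten);
    }
}
```

In this run, Reduce input groups=13 against Reduce input records=14 indicates that one integer occurs twice in the input. The reducer writes a key once per occurrence, which is why there are still 14 output records.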

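If HDFS cannot be accessed because of a permission problem, edit hdfs-site.xml on the cluster to turn off the Hadoop cluster's permission checking. In Hadoop 2.x the key dfs.permissions is a deprecated alias of dfs.permissions.enabled.

<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>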

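If a NullPointerException occurs, set the %HADOOP_HOME% environment variable to C:\Download\hadoop-2.6.0\

Then download winutils.exe and hadoop.dll into C:\Download\hadoop-2.6.0\bin

Note: many online sources say to download hadoop-common-2.2.0-bin-master.zip, but many of those builds do not support Hadoop 2.6.0. Download binaries that support Hadoop 2.6.0.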
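
Before launching the job it can help to verify these prerequisites from code. The following sketch is a hypothetical pre-flight check (the class HadoopHomeCheck and its method are not part of Hadoop) that reports which of the two native files are missing from the bin directory of a given Hadoop home.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
class HadoopHomeCheck {

    // Returns the names of the required files that are absent from <hadoopHome>/bin.
    static List<String> missingNativeFiles(Path hadoopHome) {
        List<String> missing = new ArrayList<>();
        for (String name : new String[] {"winutils.exe", "hadoop.dll"}) {
            if (!Files.isRegularFile(hadoopHome.resolve("bin").resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String home = System.getenv("HADOOP_HOME");
        if (home == null || home.isEmpty()) {
            System.out.println("HADOOP_HOME is not set");
            return;
        }
        System.out.println("Missing from bin: " + missingNativeFiles(Paths.get(home)));
    }
}
```

An empty list means both files are in place. Note that a newly set environment variable is only visible to Eclipse after Eclipse is restarted.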

If the program fails to run, make sure it is launched with Run on Hadoop rather than as a Java Application.

Permanent link to this article: http://www.linuxidc.com/Linux/2015-08/120943.htm

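Article outline

1. Environment
2. Configuring the plugin
2.1 Setting the Hadoop home directory
2.2 Configuring the plugin
3. Developing a Hadoop program
3.1 Writing the program
3.2 Configuration files
3.3 Running the program
4. Possible problems
4.1 Permission problem: HDFS cannot be accessed
4.2 NullPointerException is thrown
4.3 The program fails to run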
