
Setting Up a Hadoop 2.6.0 + Eclipse Development and Debugging Environment


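The previous article set up a Hadoop 2.6.0 pseudo-distributed environment in a virtual machine on Win7 (see http://www.linuxidc.com/Linux/2015-08/120942.htm). To make development and debugging easier, this article describes how to set up a development environment in Eclipse that connects to the Hadoop cluster and submits jobs to it.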

1. Environment

Eclipse version: Luna 4.4.1


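Install the plugin hadoop-eclipse-plugin-2.6.0.jar: after downloading it, place it in the eclipse/plugins directory, then restart Eclipse so that the plugin is loaded.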

2. Configuring the plugin

2.1 Setting the Hadoop home directory

Unpack hadoop-2.6.0.tar.gz to C:\Downloads\hadoop-2.6.0, then set this installation directory under Hadoop Map/Reduce in Eclipse's Window->Preferences.


2.2 Configuring the Hadoop connection

Open the Map/Reduce perspective from Window->Open Perspective. Hadoop programs are developed in this perspective.


Open Map/Reduce Locations from Window->Show View, then right-click in the view and choose New Hadoop location… to create a new Hadoop connection.


After you confirm the settings, Eclipse connects to the Hadoop cluster.


If the connection succeeds, the files in the HDFS cluster appear under DFS Locations in the Project Explorer.


3. Developing a Hadoop program

3.1 Writing the program

We develop a Sort example that sorts the input integers. The input files contain one integer per line.

 
package com.ccb;

/**
 * Created by hp on 2015-7-20.
 */

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Sort {

    // Each input record is one integer. Parse the Text line into an
    // IntWritable and use it as the map output key.
    public static class Map extends Mapper<Object, Text, IntWritable, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final IntWritable data = new IntWritable();

        // The map function
        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString().trim();
            data.set(Integer.parseInt(line));
            context.write(data, ONE);
        }
    }

    // Before reduce runs, the Hadoop framework shuffles and sorts the map output
    // by key, so the reducer only has to write out each key.
    public static class Reduce extends Reducer<IntWritable, IntWritable, IntWritable, Text> {
        private static final Text EMPTY = new Text("");

        // The reduce function: write the key once per value so duplicates are kept
        @Override
        public void reduce(IntWritable key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            for (IntWritable v : values) {
                context.write(key, EMPTY);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // JobTracker address. The log in section 3.3 reports this key as
        // deprecated in favor of mapreduce.jobtracker.address.
        conf.set("mapred.job.tracker", "192.168.62.129:9001");
        if (args.length != 2) {
            System.err.println("Usage: Data Sort <in> <out>");
            System.exit(2);
        }
        System.out.println(args[0]);
        System.out.println(args[1]);

        Job job = Job.getInstance(conf, "Data Sort");
        job.setJarByClass(Sort.class);

        // Set the Map and Reduce classes
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);

        // The map output value type (IntWritable) differs from the final
        // output value type (Text), so declare both pairs explicitly.
        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(Text.class);

        // Set the input and output directories
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
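
The reducer above relies on the framework sorting keys between map and reduce. Here is a minimal sketch of that idea in plain Java, with no Hadoop dependency: a TreeMap stands in for the shuffle and sort step. The class name LocalSortSketch and the sample input are made up for illustration. The sketch mirrors the job only under the assumption of a single reduce task, as in the run logged in section 3.3; with several reducers, each output file would be sorted only within itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LocalSortSketch {

    // "Map" step: parse each line into an integer key and emit (key, 1).
    // "Shuffle" step: group the emitted values by key; TreeMap keeps keys ascending.
    static TreeMap<Integer, List<Integer>> mapAndShuffle(List<String> lines) {
        TreeMap<Integer, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            int key = Integer.parseInt(line.trim());
            grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(1);
        }
        return grouped;
    }

    // "Reduce" step: keys arrive in sorted order; write each key once per value
    // so that duplicate input numbers are preserved.
    static List<Integer> reduce(TreeMap<Integer, List<Integer>> grouped) {
        List<Integer> out = new ArrayList<>();
        for (Map.Entry<Integer, List<Integer>> entry : grouped.entrySet()) {
            for (Integer ignored : entry.getValue()) {
                out.add(entry.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical input: one integer per line, with one duplicate.
        List<String> lines = Arrays.asList("5", "3", "42", "3", "-7");
        System.out.println(reduce(mapAndShuffle(lines)));
    }
}
```

Running main prints [-7, 3, 3, 5, 42]: the duplicate 3 survives because the reduce step writes the key once per value.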

3.2 Configuration files

Add log4j.properties and the core-site.xml from the Hadoop cluster to the classpath. My sample project is organized with Maven, so I put them in the src/main/resources directory.


At run time the program reads the HDFS address from core-site.xml.
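
As a rough illustration of what reading the HDFS address from core-site.xml means, the sketch below looks up a property by name in a Hadoop-style configuration document using only the JDK's XML parser. It is not Hadoop's Configuration class, and the class name CoreSiteLookup is hypothetical. The file contents are also an assumption: fs.defaultFS is the standard Hadoop 2.x key for the default file system, and the value is inferred from the HDFS paths that appear in the execution log.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class CoreSiteLookup {

    // Returns the <value> of the <property> whose <name> equals the given name,
    // or null if no such property exists. Assumes every <property> element
    // contains both a <name> and a <value> child.
    static String lookup(String xml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                String propName = prop.getElementsByTagName("name").item(0).getTextContent().trim();
                if (propName.equals(name)) {
                    return prop.getElementsByTagName("value").item(0).getTextContent().trim();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse configuration XML", e);
        }
    }

    public static void main(String[] args) {
        // Assumed core-site.xml contents; adjust the address to your own cluster.
        String coreSite =
                "<configuration>"
              + "  <property>"
              + "    <name>fs.defaultFS</name>"
              + "    <value>hdfs://192.168.62.129:9000</value>"
              + "  </property>"
              + "</configuration>";
        System.out.println(lookup(coreSite, "fs.defaultFS"));
    }
}
```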

3.3 Running the program

Right-click and choose Run As -> Run Configurations…, fill in the input and output directories as program arguments, and click Run.


Execution log:

hdfs://192.168.62.129:9000/user/vm/sort_in
hdfs://192.168.62.129:9000/user/vm/sort_out
15/07/27 16:21:36 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
15/07/27 16:21:36 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
15/07/27 16:21:36 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/07/27 16:21:36 WARN mapreduce.JobSubmitter: No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
15/07/27 16:21:36 INFO input.FileInputFormat: Total input paths to process : 3
15/07/27 16:21:36 INFO mapreduce.JobSubmitter: number of splits:3
15/07/27 16:21:36 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
15/07/27 16:21:37 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1592166400_0001
15/07/27 16:21:37 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
15/07/27 16:21:37 INFO mapreduce.Job: Running job: job_local1592166400_0001
15/07/27 16:21:37 INFO mapred.LocalJobRunner: OutputCommitter set in config null
15/07/27 16:21:37 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
15/07/27 16:21:37 INFO mapred.LocalJobRunner: Waiting for map tasks
15/07/27 16:21:37 INFO mapred.LocalJobRunner: Starting task: attempt_local1592166400_0001_m_000000_0
15/07/27 16:21:37 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
15/07/27 16:21:37 INFO mapred.Task:  Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@4c90dbc4
15/07/27 16:21:37 INFO mapred.MapTask: Processing split: hdfs://192.168.62.129:9000/user/vm/sort_in/file1:0+25
15/07/27 16:21:37 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
15/07/27 16:21:37 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
15/07/27 16:21:37 INFO mapred.MapTask: soft limit at 83886080
15/07/27 16:21:37 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
15/07/27 16:21:37 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
15/07/27 16:21:37 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
15/07/27 16:21:38 INFO mapred.LocalJobRunner:
15/07/27 16:21:38 INFO mapred.MapTask: Starting flush of map output
15/07/27 16:21:38 INFO mapred.MapTask: Spilling map output
15/07/27 16:21:38 INFO mapred.MapTask: bufstart = 0; bufend = 56; bufvoid = 104857600
15/07/27 16:21:38 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214372(104857488); length = 25/6553600
15/07/27 16:21:38 INFO mapred.MapTask: Finished spill 0
15/07/27 16:21:38 INFO mapred.Task: Task:attempt_local1592166400_0001_m_000000_0 is done. And is in the process of committing
15/07/27 16:21:38 INFO mapred.LocalJobRunner: map
15/07/27 16:21:38 INFO mapred.Task: Task 'attempt_local1592166400_0001_m_000000_0' done.
15/07/27 16:21:38 INFO mapred.LocalJobRunner: Finishing task: attempt_local1592166400_0001_m_000000_0
15/07/27 16:21:38 INFO mapred.LocalJobRunner: Starting task: attempt_local1592166400_0001_m_000001_0
15/07/27 16:21:38 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
15/07/27 16:21:38 INFO mapred.Task:  Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@69e4d7d
15/07/27 16:21:38 INFO mapred.MapTask: Processing split: hdfs://192.168.62.129:9000/user/vm/sort_in/file2:0+15
15/07/27 16:21:38 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
15/07/27 16:21:38 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
15/07/27 16:21:38 INFO mapred.MapTask: soft limit at 83886080
15/07/27 16:21:38 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
15/07/27 16:21:38 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
15/07/27 16:21:38 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
15/07/27 16:21:38 INFO mapred.LocalJobRunner:
15/07/27 16:21:38 INFO mapred.MapTask: Starting flush of map output
15/07/27 16:21:38 INFO mapred.MapTask: Spilling map output
15/07/27 16:21:38 INFO mapred.MapTask: bufstart = 0; bufend = 32; bufvoid = 104857600
15/07/27 16:21:38 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214384(104857536); length = 13/6553600
15/07/27 16:21:38 INFO mapred.MapTask: Finished spill 0
15/07/27 16:21:38 INFO mapred.Task: Task:attempt_local1592166400_0001_m_000001_0 is done. And is in the process of committing
15/07/27 16:21:38 INFO mapred.LocalJobRunner: map
15/07/27 16:21:38 INFO mapred.Task: Task 'attempt_local1592166400_0001_m_000001_0' done.
15/07/27 16:21:38 INFO mapred.LocalJobRunner: Finishing task: attempt_local1592166400_0001_m_000001_0
15/07/27 16:21:38 INFO mapred.LocalJobRunner: Starting task: attempt_local1592166400_0001_m_000002_0
15/07/27 16:21:38 INFO mapreduce.Job: Job job_local1592166400_0001 running in uber mode : false
15/07/27 16:21:38 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
15/07/27 16:21:38 INFO mapreduce.Job:  map 100% reduce 0%
15/07/27 16:21:38 INFO mapred.Task:  Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@4e931efa
15/07/27 16:21:38 INFO mapred.MapTask: Processing split: hdfs://192.168.62.129:9000/user/vm/sort_in/file3:0+8
15/07/27 16:21:39 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
15/07/27 16:21:39 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
15/07/27 16:21:39 INFO mapred.MapTask: soft limit at 83886080
15/07/27 16:21:39 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
15/07/27 16:21:39 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
15/07/27 16:21:39 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
15/07/27 16:21:39 INFO mapred.LocalJobRunner:
15/07/27 16:21:39 INFO mapred.MapTask: Starting flush of map output
15/07/27 16:21:39 INFO mapred.MapTask: Spilling map output
15/07/27 16:21:39 INFO mapred.MapTask: bufstart = 0; bufend = 24; bufvoid = 104857600
15/07/27 16:21:39 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214388(104857552); length = 9/6553600
15/07/27 16:21:39 INFO mapred.MapTask: Finished spill 0
15/07/27 16:21:39 INFO mapred.Task: Task:attempt_local1592166400_0001_m_000002_0 is done. And is in the process of committing
15/07/27 16:21:39 INFO mapred.LocalJobRunner: map
15/07/27 16:21:39 INFO mapred.Task: Task 'attempt_local1592166400_0001_m_000002_0' done.
15/07/27 16:21:39 INFO mapred.LocalJobRunner: Finishing task: attempt_local1592166400_0001_m_000002_0
15/07/27 16:21:39 INFO mapred.LocalJobRunner: map task executor complete.
15/07/27 16:21:39 INFO mapred.LocalJobRunner: Waiting for reduce tasks
15/07/27 16:21:39 INFO mapred.LocalJobRunner: Starting task: attempt_local1592166400_0001_r_000000_0
15/07/27 16:21:39 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
15/07/27 16:21:39 INFO mapred.Task:  Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@49250068
15/07/27 16:21:39 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@2129404b
15/07/27 16:21:39 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=652528832, maxSingleShuffleLimit=163132208, mergeThreshold=430669056, ioSortFactor=10, memToMemMergeOutputsThreshold=10
15/07/27 16:21:39 INFO reduce.EventFetcher: attempt_local1592166400_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
15/07/27 16:21:40 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local1592166400_0001_m_000002_0 decomp: 32 len: 36 to MEMORY
15/07/27 16:21:40 INFO reduce.InMemoryMapOutput: Read 32 bytes from map-output for attempt_local1592166400_0001_m_000002_0
15/07/27 16:21:40 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 32, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->32
15/07/27 16:21:40 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local1592166400_0001_m_000000_0 decomp: 72 len: 76 to MEMORY
15/07/27 16:21:40 INFO reduce.InMemoryMapOutput: Read 72 bytes from map-output for attempt_local1592166400_0001_m_000000_0
15/07/27 16:21:40 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 72, inMemoryMapOutputs.size() -> 2, commitMemory -> 32, usedMemory ->104
15/07/27 16:21:40 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local1592166400_0001_m_000001_0 decomp: 42 len: 46 to MEMORY
15/07/27 16:21:40 INFO reduce.InMemoryMapOutput: Read 42 bytes from map-output for attempt_local1592166400_0001_m_000001_0
15/07/27 16:21:40 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 42, inMemoryMapOutputs.size() -> 3, commitMemory -> 104, usedMemory ->146
15/07/27 16:21:40 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
15/07/27 16:21:40 INFO mapred.LocalJobRunner: 3 / 3 copied.
15/07/27 16:21:40 INFO reduce.MergeManagerImpl: finalMerge called with 3 in-memory map-outputs and 0 on-disk map-outputs
15/07/27 16:21:40 INFO mapred.Merger: Merging 3 sorted segments
15/07/27 16:21:40 INFO mapred.Merger: Down to the last merge-pass, with 3 segments left of total size: 128 bytes
15/07/27 16:21:40 INFO reduce.MergeManagerImpl: Merged 3 segments, 146 bytes to disk to satisfy reduce memory limit
15/07/27 16:21:40 INFO reduce.MergeManagerImpl: Merging 1 files, 146 bytes from disk
15/07/27 16:21:40 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
15/07/27 16:21:40 INFO mapred.Merger: Merging 1 sorted segments
15/07/27 16:21:40 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 136 bytes
15/07/27 16:21:40 INFO mapred.LocalJobRunner: 3 / 3 copied.
15/07/27 16:21:40 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
15/07/27 16:21:40 INFO mapred.Task: Task:attempt_local1592166400_0001_r_000000_0 is done. And is in the process of committing
15/07/27 16:21:40 INFO mapred.LocalJobRunner: 3 / 3 copied.
15/07/27 16:21:40 INFO mapred.Task: Task attempt_local1592166400_0001_r_000000_0 is allowed to commit now
15/07/27 16:21:40 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1592166400_0001_r_000000_0' to hdfs://192.168.62.129:9000/user/vm/sort_out/_temporary/0/task_local1592166400_0001_r_000000
15/07/27 16:21:40 INFO mapred.LocalJobRunner: reduce > reduce
15/07/27 16:21:40 INFO mapred.Task: Task 'attempt_local1592166400_0001_r_000000_0' done.
15/07/27 16:21:40 INFO mapred.LocalJobRunner: Finishing task: attempt_local1592166400_0001_r_000000_0
15/07/27 16:21:40 INFO mapred.LocalJobRunner: reduce task executor complete.
15/07/27 16:21:40 INFO mapreduce.Job:  map 100% reduce 100%
15/07/27 16:21:41 INFO mapreduce.Job: Job job_local1592166400_0001 completed successfully
15/07/27 16:21:41 INFO mapreduce.Job: Counters: 38
    File System Counters
        FILE: Number of bytes read=3834
        FILE: Number of bytes written=1017600
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=161
        HDFS: Number of bytes written=62
        HDFS: Number of read operations=41
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=10
    Map-Reduce Framework
        Map input records=14
        Map output records=14
        Map output bytes=112
        Map output materialized bytes=158
        Input split bytes=339
        Combine input records=0
        Combine output records=0
        Reduce input groups=13
        Reduce shuffle bytes=158
        Reduce input records=14
        Reduce output records=14
        Spilled Records=28
        Shuffled Maps =3
        Failed Shuffles=0
        Merged Map outputs=3
        GC time elapsed (ms)=10
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
        Total committed heap usage (bytes)=1420296192
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=48
    File Output Format Counters
        Bytes Written=62

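4. Possible problems

4.1 Permission problem: HDFS cannot be accessed

Edit hdfs-site.xml on the cluster to turn off the Hadoop cluster's permission checking. In Hadoop 2.x this property is named dfs.permissions.enabled; dfs.permissions, used here, is its deprecated alias.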

<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>

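4.2 A NullPointerException is thrown

Set the %HADOOP_HOME% environment variable to C:\Download\hadoop-2.6.0\

Download winutils.exe and hadoop.dll into C:\Download\hadoop-2.6.0\bin

Note: many online guides say to download hadoop-common-2.2.0-bin-master.zip, but many of those packages do not support Hadoop 2.6.0. You need to download binaries that support Hadoop 2.6.0.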
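
Before launching the job, it can help to verify this setup. The sketch below is a hypothetical preflight check (the class HadoopHomeCheck is not part of Hadoop): it reads the HADOOP_HOME environment variable and reports which of the two files named above are missing from its bin directory. It only checks that the files exist, not that they were built for Hadoop 2.6.0.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class HadoopHomeCheck {

    // Returns the names of the required native helper files that are
    // missing from <hadoopHome>/bin (empty list if both are present).
    static List<String> missingFiles(File hadoopHome) {
        List<String> missing = new ArrayList<>();
        File binDir = new File(hadoopHome, "bin");
        for (String name : new String[] {"winutils.exe", "hadoop.dll"}) {
            if (!new File(binDir, name).isFile()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String home = System.getenv("HADOOP_HOME");
        if (home == null || home.isEmpty()) {
            System.err.println("HADOOP_HOME is not set");
            return;
        }
        List<String> missing = missingFiles(new File(home));
        if (missing.isEmpty()) {
            System.out.println("OK: winutils.exe and hadoop.dll found under " + home);
        } else {
            System.err.println("Missing from the bin directory of " + home + ": " + missing);
        }
    }
}
```

Eclipse reads environment variables when it starts, so restart it after changing HADOOP_HOME.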

4.3 The program fails to run

Run the program with Run on Hadoop, not as a Java Application.



Permanent link to this article: http://www.linuxidc.com/Linux/2015-08/120943.htm

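Copyright notice: original article on this site, published by 星锅 on 2022-01-20.
Reposting: unless otherwise stated, articles on this site are released under the CC-4.0 license; please credit the source when reposting.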