A Hadoop cluster deployment means deploying in cluster mode. This article is based on JDK 1.7.0_79 and Hadoop 2.7.5.
1. Hadoop consists of the following daemons:
HDFS daemons: NameNode, SecondaryNameNode, DataNode
YARN daemons: ResourceManager, NodeManager, WebAppProxy
MapReduce Job History Server
The distributed environment used for this test: 1 master (test166) and 1 slave (test167).
2.1 Install the JDK and download and extract Hadoop
For JDK installation see http://www.linuxidc.com/Linux/2017-01/139874.htm, or, for installing JDK 1.7 on CentOS 7.2, http://www.linuxidc.com/Linux/2016-11/137398.htm
Download Hadoop 2.7.5 from the official site.
[hadoop@hadoop-master ~]$ su - hadoop
[hadoop@hadoop-master ~]$ cd /usr/hadoop/
[hadoop@hadoop-master ~]$ wget http://mirrors.shu.edu.cn/apache/hadoop/common/hadoop-2.7.5/hadoop-2.7.5.tar.gz
Extract Hadoop into /usr/hadoop/:
[hadoop@hadoop-master ~]$ tar zxvf /root/hadoop-2.7.5.tar.gz
Result:
[hadoop@hadoop-master ~]$ ll
total 211852
drwxr-xr-x. 2 hadoop hadoop 6 Jan 31 23:41 Desktop
drwxr-xr-x. 2 hadoop hadoop 6 Jan 31 23:41 Documents
drwxr-xr-x. 2 hadoop hadoop 6 Jan 31 23:41 Downloads
drwxr-xr-x. 10 hadoop hadoop 4096 Feb 22 01:36 hadoop-2.7.5
-rw-rw-r--. 1 hadoop hadoop 216929574 Dec 16 12:03 hadoop-2.7.5.tar.gz
drwxr-xr-x. 2 hadoop hadoop 6 Jan 31 23:41 Music
drwxr-xr-x. 2 hadoop hadoop 6 Jan 31 23:41 Pictures
drwxr-xr-x. 2 hadoop hadoop 6 Jan 31 23:41 Public
drwxr-xr-x. 2 hadoop hadoop 6 Jan 31 23:41 Templates
drwxr-xr-x. 2 hadoop hadoop 6 Jan 31 23:41 Videos
[hadoop@hadoop-master ~]$
2.2 Set the hostnames and create the hadoop group and user on every node
All nodes (master and slave):
[root@hadoop-master ~]# su - root
[root@hadoop-master ~]# vi /etc/hosts
10.86.255.166 hadoop-master
10.86.255.167 slave1
Note: changes to /etc/hosts take effect immediately; no source (or .) is needed. A quick resolution check is shown below.
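A minimal check, on any node, that the new names resolve (optional; not part of the original transcript):
getent hosts hadoop-master slave1
ping -c 1 slave1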
As root, first create the hadoop group.
Then create the user with useradd -d /usr/hadoop -g hadoop -m hadoop (this creates the hadoop user with home directory /usr/hadoop and primary group hadoop).
Finally set its password with passwd hadoop (here the password is set to hadoop).
[root@hadoop-master ~]# groupadd hadoop
[root@hadoop-master ~]# useradd -d /usr/hadoop -g hadoop -m hadoop
[root@hadoop-master ~]# passwd hadoop
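Optionally, confirm the account and its group were created as expected (the uid/gid values will differ per system):
[root@hadoop-master ~]# id hadoop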
2.3 Set up passwordless SSH login between the nodes
The end goal: running ssh hadoop@slave1 on the master node must not prompt for a password; only passwordless access from master to slave1 needs to be configured here.
su - hadoop
Change into the ~/.ssh directory.
Run ssh-keygen -t rsa and press Enter at every prompt.
This generates two files, a private key and a public key; on the master, run cp id_rsa.pub authorized_keys
[hadoop@hadoop-master ~]$ su - hadoop
[hadoop@hadoop-master ~]$ pwd
/usr/hadoop
[hadoop@hadoop-master ~]$ cd .ssh
[hadoop@hadoop-master .ssh]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/usr/hadoop/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /usr/hadoop/.ssh/id_rsa.
Your public key has been saved in /usr/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
11:b2:23:8c:e7:32:1d:4c:2f:00:32:1a:15:43:bb:de hadoop@hadoop-master
The key's randomart image is:
+--[RSA 2048]----+
|=+*.. . . |
|oo O . o . |
|. o B + . |
| = + . . |
| + o S |
| . + |
| . E |
| |
| |
+-----------------+
[hadoop@hadoop-master .ssh]$
[hadoop@hadoop-master .ssh]$ cp id_rsa.pub authorized_keys
[hadoop@hadoop-master .ssh]$ ll
total 16
-rwx------. 1 hadoop hadoop 1230 Jan 31 23:27 authorized_keys
-rwx------. 1 hadoop hadoop 1675 Feb 23 19:07 id_rsa
-rwx------. 1 hadoop hadoop 402 Feb 23 19:07 id_rsa.pub
-rwx------. 1 hadoop hadoop 874 Feb 13 19:40 known_hosts
[hadoop@hadoop-master .ssh]$
2.3.1 Passwordless login to the local machine
[hadoop@hadoop-master ~]$ pwd
/usr/hadoop
[hadoop@hadoop-master ~]$ chmod -R 700 .ssh
[hadoop@hadoop-master ~]$ cd .ssh
[hadoop@hadoop-master .ssh]$ chmod 600 authorized_keys
[hadoop@hadoop-master .ssh]$ ll
total 16
-rwx------. 1 hadoop hadoop 1230 Jan 31 23:27 authorized_keys
-rwx------. 1 hadoop hadoop 1679 Jan 31 23:26 id_rsa
-rwx------. 1 hadoop hadoop 410 Jan 31 23:26 id_rsa.pub
-rwx------. 1 hadoop hadoop 874 Feb 13 19:40 known_hosts
Verify:
If no password prompt appears, passwordless login to the local machine works; if this step fails, the HDFS startup scripts later on will keep asking for a password.
[hadoop@hadoop-master ~]$ ssh hadoop@hadoop-master
Last login: Fri Feb 23 18:54:59 2018 from hadoop-master
[hadoop@hadoop-master ~]$
2.3.2 Passwordless login from master to the other nodes
Push the master's public key to slave1 with ssh-copy-id, which appends id_rsa.pub to ~/.ssh/authorized_keys on the remote machine (you will be prompted for slave1's password):
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@slave1
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@slave1's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'hadoop@slave1'" and check to make sure that only the key(s) you wanted were added.
[hadoop@hadoop-master .ssh]$
Alternatively, distribute authorized_keys from the master to each node directly (again, enter slave1's password when prompted):
scp /usr/hadoop/.ssh/authorized_keys hadoop@slave1:~/.ssh/
Then, on every node, run chmod 600 authorized_keys on the file (this step is required, otherwise SSH will reject the key).
Make sure .ssh has permissions 700 and .ssh/authorized_keys has 600, as in the sketch below.
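A minimal sketch of that permission fix on slave1 (assuming the hadoop user's home directory there is also /usr/hadoop):
[hadoop@slave1 ~]$ chmod 700 ~/.ssh
[hadoop@slave1 ~]$ chmod 600 ~/.ssh/authorized_keys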
Test it as follows (the first ssh connection asks yes/no; answer yes):
[hadoop@hadoop-master ~]$ ssh hadoop@slave1
Last login: Fri Feb 23 18:40:10 2018
[hadoop@slave1 ~]$
[hadoop@slave1 ~]$ exit
logout
Connection to slave1 closed.
[hadoop@hadoop-master ~]$
2.4 Set the Hadoop environment variables
Both master and slave1 need this.
[root@hadoop-master ~]# su - root
[root@hadoop-master ~]# vi /etc/profile and append the following at the end, so the hadoop command can be run from any directory:
JAVA_HOME=/usr/java/jdk1.7.0_79
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
PATH=/usr/hadoop/hadoop-2.7.5/bin:$JAVA_HOME/bin:$PATH
export JAVA_HOME CLASSPATH PATH
Apply the settings:
[root@hadoop-master ~]# source /etc/profile
or
[root@hadoop-master ~]# . /etc/profile
On the master, set the Hadoop environment:
su - hadoop
# vi etc/hadoop/hadoop-env.sh and add the following lines:
export JAVA_HOME=/usr/java/jdk1.7.0_79
export HADOOP_HOME=/usr/hadoop/hadoop-2.7.5
At this point the Hadoop installation itself is complete and the hadoop command can be run; the remaining steps set up the cluster.
[hadoop@hadoop-master ~]$ hadoop
Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
CLASSNAME run the class named CLASSNAME
or
where COMMAND is one of:
fs run a generic filesystem user client
version print the version
jar <jar> run a jar file
note: please use “yarn jar” to launch
YARN applications, not this command.
checknative [-a|-h] check native hadoop and compression libraries availability
distcp <srcurl> <desturl> copy file or directories recursively
archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
classpath prints the class path needed to get the
Hadoop jar and the required libraries
credential interact with credential providers
daemonlog get/set the log level for each daemon
trace view and modify Hadoop tracing settings
Most commands print help when invoked w/o parameters.
[hadoop@hadoop-master ~]$
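An additional quick check that the PATH and JAVA_HOME settings are picked up (optional; the build details in the output will differ):
[hadoop@hadoop-master ~]$ hadoop version
It should report Hadoop 2.7.5.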
2.5 Hadoop configuration
2.5.0 Open port 50070 and the other required ports
Note: CentOS 7 tightened its firewall handling; it no longer uses the old iptables service and enables firewalld instead.
Master node:
su - root
firewall-cmd --state                      check the status (if the firewall is stopped, start it first with systemctl start firewalld)
firewall-cmd --list-ports                 list the ports that are already open
Open a port, e.g. 8000: firewall-cmd --zone=public --add-port=8000/tcp --permanent (--zone is the scope, --add-port gives the port and protocol, --permanent makes the rule persistent)
firewall-cmd --zone=public --add-port=1521/tcp --permanent
firewall-cmd --zone=public --add-port=3306/tcp --permanent
firewall-cmd --zone=public --add-port=50070/tcp --permanent
firewall-cmd --zone=public --add-port=8088/tcp --permanent
firewall-cmd --zone=public --add-port=19888/tcp --permanent
firewall-cmd --zone=public --add-port=9000/tcp --permanent
firewall-cmd --zone=public --add-port=9001/tcp --permanent
firewall-cmd --reload                     reload the firewall so the rules take effect
firewall-cmd --list-ports                 list the open ports again to confirm
systemctl stop firewalld.service          stop the firewall
systemctl disable firewalld.service       keep the firewall from starting at boot
To close a port: firewall-cmd --zone=public --remove-port=8000/tcp --permanent
Slave1 node:
su - root
systemctl stop firewalld.service          stop the firewall
systemctl disable firewalld.service       keep the firewall from starting at boot
2.5.1 List the slave nodes in the master's configuration file
[hadoop@hadoop-master hadoop]$ pwd
/usr/hadoop/hadoop-2.7.5/etc/hadoop
[hadoop@hadoop-master hadoop]$ vi slaves
slave1
2.5.2 Specify where HDFS stores its files on each node (the default is /tmp)
Master node: namenode
Create the directory and set its permissions:
su - root
# mkdir -p /usr/local/hadoop-2.7.5/tmp/dfs/name
# chmod -R 777 /usr/local/hadoop-2.7.5/tmp
# chown -R hadoop:hadoop /usr/local/hadoop-2.7.5
Slave node: datanode
Create the directory, set its permissions, and change the owner:
su - root
# mkdir -p /usr/local/hadoop-2.7.5/tmp/dfs/data
# chmod -R 777 /usr/local/hadoop-2.7.5/tmp
# chown -R hadoop:hadoop /usr/local/hadoop-2.7.5
2.5.3 Edit the configuration files on the master (including YARN)
su - hadoop
# vi etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://hadoop-master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop-2.7.5/tmp</value>
</property>
</configuration>
# vi etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/usr/local/hadoop-2.7.5/tmp/dfs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/usr/local/hadoop-2.7.5/tmp/dfs/data</value>
</property>
</configuration>
# cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
# vi etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
YARN settings
YARN components (master node: ResourceManager; slave node: NodeManager)
The following is done only on the master; a later step distributes everything to slave1.
# vi etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop-master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
2.5.4 Distribute the master's files to the slave1 node.
cd /usr/hadoop
scp -r hadoop-2.7.5 hadoop@slave1:/usr/hadoop
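A quick, optional check that the copy landed on slave1:
ssh hadoop@slave1 'ls /usr/hadoop/hadoop-2.7.5'
The listing should show the same bin, etc, sbin and share directories as on the master.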
2.5.5 Start the Job History Server on the master and point the slave nodes at it
Step 2.5.5 can be skipped.
Master:
Start the jobhistory daemon:
# sbin/mr-jobhistory-daemon.sh start historyserver
Confirm:
# jps
Visit the Job History Server web page:
http://localhost:19888/
Slave node:
# vi etc/hadoop/mapred-site.xml
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop-master:10020</value>
</property>
2.5.6 Format HDFS (on the master)
# hadoop namenode -format
Result on the master: the output should include a line saying that the storage directory /usr/local/hadoop-2.7.5/tmp/dfs/name has been successfully formatted.
2.5.7 Start the daemons on the master; the services on the slave start along with them
Start:
[hadoop@hadoop-master hadoop-2.7.5]$ pwd
/usr/hadoop/hadoop-2.7.5
[hadoop@hadoop-master hadoop-2.7.5]$ sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [hadoop-master]
hadoop-master: starting namenode, logging to /usr/hadoop/hadoop-2.7.5/logs/hadoop-hadoop-namenode-hadoop-master.out
slave1: starting datanode, logging to /usr/hadoop/hadoop-2.7.5/logs/hadoop-hadoop-datanode-slave1.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/hadoop/hadoop-2.7.5/logs/hadoop-hadoop-secondarynamenode-hadoop-master.out
starting yarn daemons
starting resourcemanager, logging to /usr/hadoop/hadoop-2.7.5/logs/yarn-hadoop-resourcemanager-hadoop-master.out
slave1: starting nodemanager, logging to /usr/hadoop/hadoop-2.7.5/logs/yarn-hadoop-nodemanager-slave1.out
[hadoop@hadoop-master hadoop-2.7.5]$
Confirm:
Master node:
[hadoop@hadoop-master hadoop-2.7.5]$ jps
81209 NameNode
81516 SecondaryNameNode
82052 Jps
81744 ResourceManager
Slave node:
[hadoop@slave1 ~]$ jps
58913 NodeManager
59358 Jps
58707 DataNode
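As an extra optional check that the DataNode on slave1 has registered with the NameNode, run the following on the master as the hadoop user (the capacity figures will differ):
hdfs dfsadmin -report
It should report one live datanode (slave1).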
Stop (only when needed; the following steps require the cluster to be running):
[hadoop@hadoop-master hadoop-2.7.5]$ sbin/stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Stopping namenodes on [hadoop-master]
hadoop-master: stopping namenode
slave1: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
stopping yarn daemons
stopping resourcemanager
slave1: stopping nodemanager
no proxyserver to stop
2.5.8 Create HDFS directories
# hdfs dfs -mkdir /user
# hdfs dfs -mkdir /user/test22
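The next step (2.5.9) uploads several files into /user/test22/input. Because the put command names multiple source files, the target directory should already exist; if it does not, create it first (an extra step not shown in the original transcript):
# hdfs dfs -mkdir /user/test22/input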
2.5.9 Copy the input files into the HDFS directory
# hdfs dfs -put etc/hadoop/*.sh /user/test22/input
View them:
# hdfs dfs -ls /user/test22/input
2.5.10 Run a Hadoop job
This is the word-count example; output here is a directory inside HDFS, which can be inspected with hdfs dfs -ls.
# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.5.jar wordcount /user/test22/input output
Confirm the result:
# hdfs dfs -cat output/*
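To list the files the job produced (word count normally writes a _SUCCESS marker plus one or more part-r-NNNNN files):
# hdfs dfs -ls output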
2.5.11 Check the error logs
Note: the relevant logs are in the *.log files on slave1, not on the master and not in the *.out files.
2.6 Q&A
1. hdfs dfs -put fails with the error below; the fix is to turn off the firewall on both master and slave.
hdfs.DFSClient: Exception in createBlockOutputStream
java.net.NoRouteToHostException: No route to host
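A quick way to check for and clear this on both nodes, repeating the commands from section 2.5.0 (run as root):
firewall-cmd --state
systemctl stop firewalld.service
systemctl disable firewalld.service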