Hadoop Cluster Installation with CDH5 (3-server cluster)
CDH5 package downloads: http://archive.cloudera.com/cdh5/
Host plan:

IP | Host | Deployed modules | Processes
192.168.107.82 | Hadoop-NN-01 | NameNode ResourceManager | NameNode DFSZKFailoverController ResourceManager
192.168.107.83 | Hadoop-DN-01 Zookeeper-01 | DataNode NodeManager Zookeeper | DataNode NodeManager JournalNode QuorumPeerMain
192.168.107.84 | Hadoop-DN-02 Zookeeper-02 | DataNode NodeManager Zookeeper | DataNode NodeManager JournalNode QuorumPeerMain
What each process does:
- NameNode
- ResourceManager
- DFSZKFC: the DFS Zookeeper Failover Controller, which activates the Standby NameNode on failover
- DataNode
- NodeManager
- JournalNode: node service for the NameNode shared edit log (if NFS shared storage is used instead, this process and all of its related configuration can be omitted)
- QuorumPeerMain: the main ZooKeeper process
Directory plan:

Name | Path
$HADOOP_HOME | /home/hadoopuser/hadoop-2.6.0-cdh5.6.0
Data | $HADOOP_HOME/data
Log | $HADOOP_HOME/logs
Configuration:
I. Disable the firewall (it can be configured properly later)
II. Install the JDK (omitted)
III. Change each hostname and configure the hosts file (all 3 servers)
Add the following entries to /etc/hosts on every server:
192.168.107.82 Hadoop-NN-01
192.168.107.83 Hadoop-DN-01 Zookeeper-01
192.168.107.84 Hadoop-DN-02 Zookeeper-02
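A quick way to sanity-check the mapping is to parse a hosts-style file. The sketch below works on a temporary copy of the entries rather than the real /etc/hosts, so it is safe to run anywhere:

```shell
# Sketch: write the planned entries to a temp file and look up one host's IP.
HOSTS=$(mktemp)
cat > "$HOSTS" <<'EOF'
192.168.107.82 Hadoop-NN-01
192.168.107.83 Hadoop-DN-01 Zookeeper-01
192.168.107.84 Hadoop-DN-02 Zookeeper-02
EOF
# Print the IP mapped to Hadoop-NN-01 (on a live server you would use: getent hosts Hadoop-NN-01)
awk '/Hadoop-NN-01/ {print $1}' "$HOSTS"
```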
IV. For security, create a dedicated login user for Hadoop (all servers)
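A minimal sketch of the user creation, assuming the standard useradd/chpasswd tools. The commands are written to a small script (printed below) that you would review and then run as root on every server:

```shell
# Generate the user-creation script; run it later as root, e.g.: sudo sh /tmp/add-hadoopuser.sh
# The password value is a placeholder assumption - change it before use.
cat > /tmp/add-hadoopuser.sh <<'EOF'
useradd -m hadoopuser
echo 'hadoopuser:CHANGE_ME' | chpasswd
EOF
cat /tmp/add-hadoopuser.sh
```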
V. Configure passwordless SSH login (the 2 NameNodes)
[hadoopuser@Linux05 hadoop-2.6.0-cdh5.6.0]$ ssh-keygen  # generate the key pair
[hadoopuser@Linux05 hadoop-2.6.0-cdh5.6.0]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoopuser@Hadoop-NN-01
-i specifies which identity (public key) file to install
~/.ssh/id_rsa.pub is that public key file
Or, in abbreviated form:
[hadoopuser@Linux05 hadoop-2.6.0-cdh5.6.0]$ ssh-copy-id Hadoop-NN-01  # copy the public key to the remote server (an IP such as 10.10.51.231 also works)
[hadoopuser@Linux05 hadoop-2.6.0-cdh5.6.0]$ ssh-copy-id "-p 6000" Hadoop-NN-01  # use this form if SSH listens on a non-default port
Note: if you changed the SSH port, also update the Hadoop configuration file hadoop-env.sh:
export HADOOP_SSH_OPTS="-p 6000"
[hadoopuser@Linux05 hadoop-2.6.0-cdh5.6.0]$ ssh Hadoop-NN-01  # verify (leave the session with exit or logout)
[hadoopuser@Linux05 hadoop-2.6.0-cdh5.6.0]$ ssh Hadoop-NN-01 -p 6000  # with a non-default port
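To avoid repeating ssh-copy-id by hand for each host, the key distribution can be scripted. A dry-run sketch, with the hostnames taken from the plan above (delete the leading echo to actually execute):

```shell
# Dry run: print one ssh-copy-id command per node.
NODES="Hadoop-NN-01 Hadoop-DN-01 Hadoop-DN-02"
for node in $NODES; do
  echo ssh-copy-id -i ~/.ssh/id_rsa.pub "hadoopuser@$node"
done
```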
VI. Configure environment variables: vi ~/.bashrc, then source ~/.bashrc (all servers)
[hadoopuser@Linux01 ~]$ vi ~/.bashrc
# hadoop cdh5
export HADOOP_HOME=/home/hadoopuser/hadoop-2.6.0-cdh5.6.0
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
[hadoopuser@Linux01 ~]$ source ~/.bashrc  # apply the changes
VII. Install ZooKeeper (the 2 DataNodes)
1. Unpack
2. Configure environment variables: vi ~/.bashrc
[hadoopuser@Linux01 ~]$ vi ~/.bashrc
# zookeeper cdh5
export ZOOKEEPER_HOME=/home/hadoopuser/zookeeper-3.4.5-cdh5.6.0
export PATH=$PATH:$ZOOKEEPER_HOME/bin
[hadoopuser@Linux01 ~]$ source ~/.bashrc  # apply the changes
3. Change the log output location
[hadoopuser@Linux01 ~]$ vi $ZOOKEEPER_HOME/libexec/zkEnv.sh
Around line 56, change the assignment to: ZOO_LOG_DIR="$ZOOKEEPER_HOME/logs"
4. Edit the configuration file
[hadoopuser@Linux01 ~]$ vi $ZOOKEEPER_HOME/conf/zoo.cfg
# zookeeper
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/hadoopuser/zookeeper-3.4.5-cdh5.6.0/data
clientPort=2181
# cluster
server.1=Zookeeper-01:2888:3888
server.2=Zookeeper-02:2888:3888
5. Set the myid
(1) On Hadoop-DN-01:
mkdir $ZOOKEEPER_HOME/data
echo 1 > $ZOOKEEPER_HOME/data/myid
(2) On Hadoop-DN-02:
mkdir $ZOOKEEPER_HOME/data
echo 2 > $ZOOKEEPER_HOME/data/myid
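The two per-node steps above can be folded into one snippet that derives the id from the hostname. A sketch that uses a scratch directory so it is safe to try anywhere; the hostname-to-id mapping mirrors the server.N lines in zoo.cfg:

```shell
# Pick this node's ZooKeeper id from its hostname, then write the myid file.
ZK_DATA=$(mktemp -d)   # stand-in for $ZOOKEEPER_HOME/data
case "$(hostname)" in
  Hadoop-DN-01) ID=1 ;;
  Hadoop-DN-02) ID=2 ;;
  *)            ID=1 ;;  # fallback so the sketch runs on any machine
esac
echo "$ID" > "$ZK_DATA/myid"
cat "$ZK_DATA/myid"
```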
6. Start it on each node:
[hadoopuser@Linux01 ~]$ zkServer.sh start
7. Verify:
[hadoopuser@Linux01 ~]$ jps
3051 Jps
2829 QuorumPeerMain
8. Check the status:
[hadoopuser@Linux01 ~]$ zkServer.sh status
JMX enabled by default
Using config: /home/zero/zookeeper/zookeeper-3.4.5-cdh5.0.1/bin/../conf/zoo.cfg
Mode: follower
9. Appendix: zoo.cfg settings explained

Property | Meaning
tickTime | The basic time unit (ms), used for heartbeats; the minimum session timeout is twice the tickTime
dataDir | Where data is stored: in-memory snapshots and the transaction update log
clientPort | The port clients connect to
initLimit | The maximum number of heartbeat intervals (tickTime) ZooKeeper tolerates while a client connects initially. The "clients" here are not user clients of the ZooKeeper service, but the Follower servers in the ensemble connecting to the Leader. If the Leader has received no response after more than 10 heartbeat intervals, the connection is considered failed; with tickTime=2000 that is 10 * 2000 ms = 20 seconds in total
syncLimit | The maximum number of heartbeat intervals allowed for a request and its reply between the Leader and a Follower; with syncLimit=5 that is 5 * 2000 ms = 10 seconds
server.id=host:port:port (server.A=B:C:D) | The list of cluster nodes. A is a number, the server's id; B is the server's IP address; C is the port this server uses to exchange information with the ensemble's Leader; D is the port used to hold a new election should the Leader die, i.e. the port over which the servers talk to each other during an election. In a pseudo-cluster, B is the same for every entry, so each ZooKeeper instance must be given different port numbers
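With the values used in this guide, the effective timeouts work out as a simple arithmetic check (the limits are counted in tickTime units):

```shell
# Effective timeouts for the zoo.cfg above.
tickTime=2000   # ms
initLimit=10
syncLimit=5
init_s=$(( tickTime * initLimit / 1000 ))
sync_s=$(( tickTime * syncLimit / 1000 ))
echo "initLimit timeout: ${init_s} s"   # 20 s
echo "syncLimit timeout: ${sync_s} s"   # 10 s
```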
VIII. Install and configure Hadoop (install on 1 server only; once configured, distribute it to the other nodes)
1. Unpack
2. Edit the configuration files
(1) Edit $HADOOP_HOME/etc/hadoop/masters
Hadoop-NN-01
(2) Edit $HADOOP_HOME/etc/hadoop/slaves
Hadoop-DN-01
Hadoop-DN-02
(3) Edit $HADOOP_HOME/etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://Hadoop-NN-01:9000</value>
<description>URI and port of the Hadoop master (the HDFS NameNode)</description>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
<description>Read/write buffer size used when processing sequence files</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/hadoopuser/hadoop-2.6.0-cdh5.6.0/tmp</value>
<description>Base directory for temporary data</description>
</property>
</configuration>
(4) Edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoopuser/hadoop-2.6.0-cdh5.6.0/dfs/name</value>
<description>Local directory where the NameNode stores the name table (fsimage); adjust to your environment</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoopuser/hadoop-2.6.0-cdh5.6.0/dfs/data</value>
<description>Local directory where DataNodes store blocks; adjust to your environment</description>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Number of file replicas; the default is 3</description>
</property>
</configuration>
(5) Edit $HADOOP_HOME/etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.address</name>
<value>Hadoop-NN-01:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>Hadoop-NN-01:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>Hadoop-NN-01:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>Hadoop-NN-01:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>Hadoop-NN-01:8088</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
(6) Edit $HADOOP_HOME/etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>Hadoop-NN-01:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>Hadoop-NN-01:19888</value>
</property>
</configuration>
(7) Edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh
# -------------------- Java Env ------------------------------
export JAVA_HOME="/usr/java/jdk1.8.0_73"
# -------------------- Hadoop Env ----------------------------
export HADOOP_PID_DIR=${HADOOP_PID_DIR}
export HADOOP_PREFIX="/home/hadoopuser/hadoop-2.6.0-cdh5.6.0"
# -------------------- Hadoop Daemon Options -----------------
export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"
# -------------------- Hadoop Logs ---------------------------
export HADOOP_LOG_DIR=${HADOOP_LOG_DIR}/$USER
# -------------------- SSH Port ------------------------------
export HADOOP_SSH_OPTS="-p 6000"  # if you changed the SSH login port, you must set this accordingly
(8) Edit $HADOOP_HOME/etc/hadoop/yarn-env.sh
# Yarn Daemon Options
export YARN_RESOURCEMANAGER_OPTS
export YARN_NODEMANAGER_OPTS
export YARN_PROXYSERVER_OPTS
export HADOOP_JOB_HISTORYSERVER_OPTS
# Yarn Logs
export YARN_LOG_DIR="/home/hadoopuser/hadoop-2.6.0-cdh5.6.0/logs"
3. Distribute the installation
scp -r /home/hadoopuser/hadoop-2.6.0-cdh5.6.0 hadoopuser@Hadoop-DN-01:/home/hadoopuser
scp -r /home/hadoopuser/hadoop-2.6.0-cdh5.6.0 hadoopuser@Hadoop-DN-02:/home/hadoopuser
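With more DataNodes, the two scp commands generalize to a loop. A dry-run sketch (delete the leading echo to actually copy):

```shell
# Dry run: print one scp command per destination node.
DEST_NODES="Hadoop-DN-01 Hadoop-DN-02"
for node in $DEST_NODES; do
  echo scp -r /home/hadoopuser/hadoop-2.6.0-cdh5.6.0 "hadoopuser@$node:/home/hadoopuser"
done
```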
4. Format the NameNode
[hadoopuser@Linux01 hadoop-2.6.0-cdh5.6.0]$ hadoop namenode -format
(In newer releases this form is deprecated in favor of: hdfs namenode -format, but both still work here.)
5. Start the JournalNode:
[hadoopuser@Linux01 hadoop-2.6.0-cdh5.6.0]$ hadoop-daemon.sh start journalnode
starting journalnode, logging to /home/hadoopuser/hadoop-2.6.0-cdh5.6.0/logs/hadoop-puppet-journalnode-BigData-03.out
Verify the JournalNode:
[hadoopuser@Linux01 hadoop-2.6.0-cdh5.6.0]$ jps
9076 Jps
9029 JournalNode
6. Start HDFS
Cluster start (on Hadoop-NN-01): start-dfs.sh
[hadoopuser@Linux01 hadoop-2.6.0-cdh5.6.0]$ start-dfs.sh
Per-process start:
<1> NameNode (Hadoop-NN-01, Hadoop-NN-02): hadoop-daemon.sh start namenode
<2> DataNode (Hadoop-DN-01, Hadoop-DN-02, Hadoop-DN-03): hadoop-daemon.sh start datanode
<3> JournalNode (Hadoop-DN-01, Hadoop-DN-02, Hadoop-DN-03): hadoop-daemon.sh start journalnode
7. Start YARN
<1> Cluster start (on Hadoop-NN-01); the scripts live in $HADOOP_HOME/sbin:
[hadoopuser@Linux01 hadoop-2.6.0-cdh5.6.0]$ start-yarn.sh
<2> Per-process start:
ResourceManager (Hadoop-NN-01, Hadoop-NN-02): yarn-daemon.sh start resourcemanager
NodeManager (Hadoop-DN-01, Hadoop-DN-02, Hadoop-DN-03): yarn-daemon.sh start nodemanager
Verification (omitted)