Hadoop version: 2.2.0, the stable release.
Spark version: spark-0.9.1-bin-hadoop2, downloaded from http://spark.apache.org/downloads.html
Three prebuilt Spark packages are offered there:
For Hadoop 1 (HDP1, CDH3): find an Apache mirror or direct file download
For CDH4: find an Apache mirror or direct file download
For Hadoop 2 (HDP2, CDH5): find an Apache mirror or direct file download
My Hadoop version is 2.2.0, so I downloaded the package for Hadoop 2.
For an introduction to Spark, see http://spark.apache.org/
Apache Spark is a fast and general engine for large-scale data processing.
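Spark needs a Scala environment at runtime. Here I downloaded the latest Scala release from http://www.scala-lang.org/ (note that Spark 0.9.x itself is built against Scala 2.10, so a 2.10.x release is the safer choice if you plan to compile your own Spark applications).
Scala (short for "scalable language") is a multi-paradigm programming language, similar in feel to Java, designed to integrate features of object-oriented and functional programming. It runs on the JVM and is a pure object-oriented language that also blends imperative and functional programming styles seamlessly.
OK, let's start configuring Spark:
I installed everything under the Hadoop installation user, so I edit /home/hadoop/.bashrc directly.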
[hadoop@localhost ~]$ cat .bashrc
# .bashrc
# Source global definitions
if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi
# User specific aliases and functions
export HADOOP_HOME=/home/hadoop/hadoop
export HBASE_HOME=/home/hadoop/hbase
export HIVE_HOME=/home/hadoop/hive
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_HOME=$HADOOP_HOME
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SCALA_HOME=/home/hadoop/scala
export SPARK_HOME=/home/hadoop/spark
export PATH=${PATH}:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$HIVE_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin
export CLASSPATH=$CLASSPATH:$HADOOP_HOME/lib:$HBASE_HOME/lib
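A mistyped path in a .bashrc like the one above tends to surface only later as a confusing "command not found". One way to catch it early is to check that each variable points at an existing directory. This is a minimal sketch; the helper name check_dir_var is hypothetical and is not part of Hadoop or Spark.

```shell
# check_dir_var NAME VALUE: report whether VALUE (the content of the
# environment variable NAME) is non-empty and an existing directory.
# Hypothetical helper, for illustration only.
check_dir_var() {
  if [ -z "$2" ]; then
    echo "$1 is not set"
    return 1
  elif [ ! -d "$2" ]; then
    echo "$1=$2 is not a directory"
    return 1
  fi
  echo "$1=$2 ok"
}

# Usage, after `source ~/.bashrc`:
#   check_dir_var HADOOP_HOME "$HADOOP_HOME"
#   check_dir_var SPARK_HOME "$SPARK_HOME"
```

The name is passed separately from the value so the function stays portable across POSIX shells and needs no indirect expansion.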
1. Installing Scala:
Extract Scala into the hadoop user's home directory.
ln -s scala-2.11.0 scala  # create a symbolic link
lrwxrwxrwx. 1 hadoop hadoop 12 May 21 09:15 scala -> scala-2.11.0
drwxrwxr-x. 6 hadoop hadoop 4096 Apr 17 16:10 scala-2.11.0
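The symlink lets SCALA_HOME stay fixed at /home/hadoop/scala while the versioned directory behind it changes. A minimal sketch of switching versions follows; link_version is a hypothetical helper, and the scala-2.10.4 directory in the usage line is only an example name, not something installed in this article.

```shell
# link_version TARGET LINK: point the symlink LINK at the directory TARGET.
# -f replaces an existing link; -n stops ln from descending into the
# directory the old link points at (supported by GNU and BSD ln).
link_version() {
  if [ ! -d "$1" ]; then
    echo "no such directory: $1" >&2
    return 1
  fi
  ln -sfn "$1" "$2"
}

# Usage, from /home/hadoop:
#   link_version scala-2.11.0 scala
#   link_version scala-2.10.4 scala   # later switch, if that directory exists
```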
Edit .bashrc and add:
export SCALA_HOME=/home/hadoop/scala
export PATH=${PATH}:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$HIVE_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin
Save the file and run source .bashrc to apply the environment variables.
Verify the installation:
[hadoop@localhost ~]$ scala -version
Scala code runner version 2.11.0 -- Copyright 2002-2013, LAMP/EPFL
If the version is displayed correctly, the installation succeeded.
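If you want a script to make the same check, the version number can be pulled out of that line. This is a minimal sketch with a hypothetical helper name, scala_version_of. It assumes the line has the format shown above, and that scala -version writes to stderr (hence the 2>&1 in the usage line).

```shell
# scala_version_of LINE: print the version number found after the word
# "version" in LINE; prints nothing if the line does not match.
scala_version_of() {
  echo "$1" | sed -n 's/.*version \([0-9][0-9.]*\).*/\1/p'
}

# Usage:
#   scala_version_of "$(scala -version 2>&1)"
```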
2. Configuring Spark:
tar -xzvf spark-0.9.1-bin-hadoop2.tgz
ln -s spark-0.9.1-bin-hadoop2 spark
Then configure .bashrc:
export SPARK_HOME=/home/hadoop/spark
export PATH=${PATH}:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$HIVE_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin
After editing, run source .bashrc to apply the environment variables.
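Because the export PATH line appends to the existing PATH, every additional source .bashrc in the same shell adds the same directories again. That is harmless but makes PATH hard to read. A minimal sketch of an append that skips duplicates follows; path_append is a hypothetical helper, not something Hadoop or Spark provides.

```shell
# path_append DIR: append DIR to PATH unless it is already listed.
path_append() {
  case ":$PATH:" in
    *":$1:"*) ;;              # already present, nothing to do
    *) PATH="$PATH:$1" ;;
  esac
}

# Usage in .bashrc, replacing the long export PATH=... line:
#   path_append "$SCALA_HOME/bin"
#   path_append "$SPARK_HOME/bin"
#   export PATH
```

Wrapping PATH in colons before matching means a directory is only treated as present when it appears as a whole entry, not as a substring of another one.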
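Configuring spark-env.sh:
spark-env.sh does not exist by default. Create it from the template shipped in Spark's conf directory:
cp spark-env.sh.template spark-env.sh
Then edit spark-env.sh and add the following: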
export SCALA_HOME=/home/hadoop/scala
export JAVA_HOME=/usr/java/jdk
export SPARK_MASTER=localhost
export SPARK_LOCAL_IP=localhost
export HADOOP_HOME=/home/hadoop/hadoop
export SPARK_HOME=/home/hadoop/spark
export SPARK_LIBRARY_PATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$HADOOP_HOME/lib/native
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
Save and exit.
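Creating spark-env.sh from the template is a one-time step; running it again should not overwrite or duplicate settings you have already added. A minimal sketch of a guarded version follows. The function name init_spark_env is hypothetical, and the conf directory path in the usage line is assumed from the SPARK_HOME used in this article.

```shell
# init_spark_env CONF_DIR: create CONF_DIR/spark-env.sh from the template,
# but only if it does not exist yet, so existing edits are preserved.
init_spark_env() {
  if [ -f "$1/spark-env.sh" ]; then
    echo "spark-env.sh already exists, leaving it untouched"
    return 0
  fi
  cp "$1/spark-env.sh.template" "$1/spark-env.sh"
}

# Usage:
#   init_spark_env /home/hadoop/spark/conf
```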
3. Starting Spark
As in Hadoop's directory layout, the start and stop shell scripts are in the sbin directory under Spark:
-rwxrwxr-x. 1 hadoop hadoop 2504 Mar 27 13:44 slaves.sh
-rwxrwxr-x. 1 hadoop hadoop 1403 Mar 27 13:44 spark-config.sh
-rwxrwxr-x. 1 hadoop hadoop 4503 Mar 27 13:44 spark-daemon.sh
-rwxrwxr-x. 1 hadoop hadoop 1176 Mar 27 13:44 spark-daemons.sh
-rwxrwxr-x. 1 hadoop hadoop 965 Mar 27 13:44 spark-executor
-rwxrwxr-x. 1 hadoop hadoop 1263 Mar 27 13:44 start-all.sh
-rwxrwxr-x. 1 hadoop hadoop 2384 Mar 27 13:44 start-master.sh
-rwxrwxr-x. 1 hadoop hadoop 1520 Mar 27 13:44 start-slave.sh
-rwxrwxr-x. 1 hadoop hadoop 2258 Mar 27 13:44 start-slaves.sh
-rwxrwxr-x. 1 hadoop hadoop 1047 Mar 27 13:44 stop-all.sh
-rwxrwxr-x. 1 hadoop hadoop 1124 Mar 27 13:44 stop-master.sh
-rwxrwxr-x. 1 hadoop hadoop 1427 Mar 27 13:44 stop-slaves.sh
[hadoop@localhost sbin]$ pwd
/home/hadoop/spark/sbin
Here you only need to run start-all.sh:
[hadoop@localhost sbin]$ ./start-all.sh
rsync from localhost
rsync: change_dir "/home/hadoop/spark-0.9.1-bin-hadoop2/sbin/localhost" failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1039) [sender=3.0.6]
starting org.apache.spark.deploy.master.Master, logging to /home/hadoop/spark/logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-localhost.out
localhost: rsync from localhost
localhost: rsync: change_dir "/home/hadoop/spark-0.9.1-bin-hadoop2/localhost" failed: No such file or directory (2)
localhost: rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1039) [sender=3.0.6]
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-localhost.out
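The rsync errors above do not stop the daemons from starting. They appear to come from the SPARK_MASTER=localhost line in spark-env.sh: spark-daemon.sh treats a non-empty SPARK_MASTER as a location to rsync the Spark directory from.
Use jps to check whether startup succeeded: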
[hadoop@localhost sbin]$ jps
4706 Jps
3692 DataNode
3876 SecondaryNameNode
4637 Worker
4137 NodeManager
4517 Master
4026 ResourceManager
3587 NameNode
You can see one Master and one Worker process, which means startup succeeded.
You can check the state of the Spark cluster at http://localhost:8080/
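The jps check can also be scripted. This is a minimal sketch that reads jps output on stdin and succeeds only when both a Master and a Worker are listed; has_spark_daemons is a hypothetical helper name.

```shell
# has_spark_daemons: read `jps` output ("PID Name" per line) on stdin and
# return success only if both a Master and a Worker process appear.
has_spark_daemons() {
  out=$(cat)
  echo "$out" | awk '$2 == "Master"' | grep -q . &&
    echo "$out" | awk '$2 == "Worker"' | grep -q .
}

# Usage:
#   jps | has_spark_daemons && echo "Spark standalone daemons are running"
```

The process name is compared exactly rather than by substring, so an HBase HMaster process on the same machine is not mistaken for the Spark Master.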
4. Running the programs bundled with Spark
First go to the bin directory under Spark:
[hadoop@localhost sbin]$ ll ../bin/
total 56
-rw-rw-r--. 1 hadoop hadoop 2601 Mar 27 13:44 compute-classpath.cmd
-rwxrwxr-x. 1 hadoop hadoop 3330 Mar 27 13:44 compute-classpath.sh
-rwxrwxr-x. 1 hadoop hadoop 2070 Mar 27 13:44 pyspark
-rw-rw-r--. 1 hadoop hadoop 1827 Mar 27 13:44 pyspark2.cmd
-rw-rw-r--. 1 hadoop hadoop 1000 Mar 27 13:44 pyspark.cmd
-rwxrwxr-x. 1 hadoop hadoop 3055 Mar 27 13:44 run-example
-rw-rw-r--. 1 hadoop hadoop 2046 Mar 27 13:44 run-example2.cmd
-rw-rw-r--. 1 hadoop hadoop 1012 Mar 27 13:44 run-example.cmd
-rwxrwxr-x. 1 hadoop hadoop 5151 Mar 27 13:44 spark-class
-rwxrwxr-x. 1 hadoop hadoop 3212 Mar 27 13:44 spark-class2.cmd
-rw-rw-r--. 1 hadoop hadoop 1010 Mar 27 13:44 spark-class.cmd
-rwxrwxr-x. 1 hadoop hadoop 3184 Mar 27 13:44 spark-shell
-rwxrwxr-x. 1 hadoop hadoop 941 Mar 27 13:44 spark-shell.cmd
run-example org.apache.spark.examples.SparkLR spark://localhost:7077
run-example org.apache.spark.examples.SparkPi spark://localhost:7077
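SparkPi reports its result in a line starting with "Pi is roughly", which is easy to miss among the log messages. A minimal sketch of filtering it out follows. pi_estimate is a hypothetical helper, and the usage line assumes the log messages go to stderr, which may differ if the log4j configuration has been changed.

```shell
# pi_estimate: read program output on stdin and print only the number
# from the "Pi is roughly ..." result line.
pi_estimate() {
  sed -n 's/^Pi is roughly \(.*\)$/\1/p'
}

# Usage:
#   run-example org.apache.spark.examples.SparkPi spark://localhost:7077 2>/dev/null | pi_estimate
```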