
CentOS 6.4 + Hadoop 2.2.0: Spark Pseudo-Distributed Installation


Hadoop version: the stable 2.2.0 release (download link).
Spark version: spark-0.9.1-bin-hadoop2, download page: http://spark.apache.org/downloads.html
The download page offers three Spark builds:

    For Hadoop 1 (HDP1, CDH3): find an Apache mirror or direct file download
    For CDH4: find an Apache mirror or direct file download
    For Hadoop 2 (HDP2, CDH5): find an Apache mirror or direct file download
My Hadoop version is 2.2.0, so I downloaded the "For Hadoop 2" build.
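For example, the Hadoop 2 build can be pulled straight from the Apache release archive (this URL is assumed from the standard archive layout; any mirror listed on the downloads page works equally well):

[hadoop@localhost ~]$ wget http://archive.apache.org/dist/spark/spark-0.9.1/spark-0.9.1-bin-hadoop2.tgz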

For an introduction to Spark, see http://spark.apache.org/:
Apache Spark is a fast and general engine for large-scale data processing.

Spark needs a Scala environment at runtime; download the latest Scala from http://www.scala-lang.org/

Scala ("scalable language") is a multi-paradigm programming language, similar to Java, designed to integrate the features of object-oriented and functional programming. It runs on the JVM: a pure object-oriented language that seamlessly combines imperative and functional styles.

OK, let's configure Spark.

I installed everything under the hadoop user, so I edit /home/hadoop/.bashrc directly:

[hadoop@localhost ~]$ cat .bashrc
# .bashrc

# Source global definitions
if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi

# User specific aliases and functions
export HADOOP_HOME=/home/hadoop/hadoop
export HBASE_HOME=/home/hadoop/hbase
export HIVE_HOME=/home/hadoop/hive
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_HOME=$HADOOP_HOME
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SCALA_HOME=/home/hadoop/scala
export SPARK_HOME=/home/hadoop/spark

export PATH=${PATH}:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$HIVE_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin
export CLASSPATH=$CLASSPATH:$HADOOP_HOME/lib:$HBASE_HOME/lib
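To apply these settings to the current shell and sanity-check them:

[hadoop@localhost ~]$ source ~/.bashrc
[hadoop@localhost ~]$ echo $SPARK_HOME
/home/hadoop/spark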

1. Scala installation:
Unpack Scala into the hadoop user's home directory and create a symlink:
tar -xzvf scala-2.11.0.tgz  # tarball name assumed from the unpacked directory below
ln -s scala-2.11.0 scala    # create the symlink
lrwxrwxrwx.  1 hadoop hadoop        12 May 21 09:15 scala -> scala-2.11.0
drwxrwxr-x.  6 hadoop hadoop      4096 Apr 17 16:10 scala-2.11.0

Edit .bashrc and add:
export SCALA_HOME=/home/hadoop/scala
export PATH=${PATH}:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$HIVE_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin
Save, then apply the change with source .bashrc
Verify the installation:
[hadoop@localhost ~]$ scala -version
Scala code runner version 2.11.0 — Copyright 2002-2013, LAMP/EPFL
If the version prints correctly, the installation succeeded.

2. Spark configuration:
tar -xzvf  spark-0.9.1-bin-hadoop2.tgz
ln -s spark-0.9.1-bin-hadoop2 spark
Then add to .bashrc:
export SPARK_HOME=/home/hadoop/spark
export PATH=${PATH}:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$HIVE_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin

After editing, run source .bashrc to apply the changes.

Configuring spark-env.sh:
spark-env.sh does not exist by default; it is generated from the bundled template with cat spark-env.sh.template >> spark-env.sh
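Both the template and the generated file live in $SPARK_HOME/conf, so in full:

[hadoop@localhost ~]$ cd /home/hadoop/spark/conf
[hadoop@localhost conf]$ cat spark-env.sh.template >> spark-env.sh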

Then edit spark-env.sh and add the following:
export SCALA_HOME=/home/hadoop/scala
export JAVA_HOME=/usr/java/jdk
export SPARK_MASTER=localhost
export SPARK_LOCAL_IP=localhost
export HADOOP_HOME=/home/hadoop/hadoop
export SPARK_HOME=/home/hadoop/spark
export SPARK_LIBRARY_PATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$HADOOP_HOME/lib/native
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop

Save and exit.
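Before starting the daemons it is worth checking that the paths referenced in spark-env.sh actually exist; /usr/java/jdk is this machine's JDK location, so substitute your own:

[hadoop@localhost ~]$ ls /usr/java/jdk/bin/java /home/hadoop/scala/bin/scala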

3. Starting Spark
Spark's directory layout resembles Hadoop's: the start and stop shell scripts live under sbin:
-rwxrwxr-x. 1 hadoop hadoop 2504 Mar 27 13:44 slaves.sh
-rwxrwxr-x. 1 hadoop hadoop 1403 Mar 27 13:44 spark-config.sh
-rwxrwxr-x. 1 hadoop hadoop 4503 Mar 27 13:44 spark-daemon.sh
-rwxrwxr-x. 1 hadoop hadoop 1176 Mar 27 13:44 spark-daemons.sh
-rwxrwxr-x. 1 hadoop hadoop  965 Mar 27 13:44 spark-executor
-rwxrwxr-x. 1 hadoop hadoop 1263 Mar 27 13:44 start-all.sh
-rwxrwxr-x. 1 hadoop hadoop 2384 Mar 27 13:44 start-master.sh
-rwxrwxr-x. 1 hadoop hadoop 1520 Mar 27 13:44 start-slave.sh
-rwxrwxr-x. 1 hadoop hadoop 2258 Mar 27 13:44 start-slaves.sh
-rwxrwxr-x. 1 hadoop hadoop 1047 Mar 27 13:44 stop-all.sh
-rwxrwxr-x. 1 hadoop hadoop 1124 Mar 27 13:44 stop-master.sh
-rwxrwxr-x. 1 hadoop hadoop 1427 Mar 27 13:44 stop-slaves.sh
[hadoop@localhost sbin]$ pwd
/home/hadoop/spark/sbin

For a pseudo-distributed setup, simply run start-all.sh:
[hadoop@localhost sbin]$ ./start-all.sh
rsync from localhost
rsync: change_dir "/home/hadoop/spark-0.9.1-bin-hadoop2/sbin/localhost" failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1039) [sender=3.0.6]
starting org.apache.spark.deploy.master.Master, logging to /home/hadoop/spark/logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-localhost.out
localhost: rsync from localhost
localhost: rsync: change_dir "/home/hadoop/spark-0.9.1-bin-hadoop2/localhost" failed: No such file or directory (2)
localhost: rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1039) [sender=3.0.6]
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-localhost.out

The rsync warnings can be ignored: spark-daemon.sh interprets a non-empty SPARK_MASTER variable as a host to rsync the installation from, so setting SPARK_MASTER=localhost in spark-env.sh triggers that code path; the Master and Worker start regardless. Use jps to check that startup succeeded:
[hadoop@localhost sbin]$ jps
4706 Jps
3692 DataNode
3876 SecondaryNameNode
4637 Worker
4137 NodeManager
4517 Master
4026 ResourceManager
3587 NameNode

A Master and a Worker process are both present, so the startup succeeded.
The cluster status is visible in the web UI at http://localhost:8080/
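On a machine without a browser, you can confirm the Master's web UI responds directly from the shell (8080 is the standalone Master's default UI port):

[hadoop@localhost ~]$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080/
200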

4. Running the bundled Spark examples
First, switch to Spark's bin directory:
[hadoop@localhost sbin]$ ll ../bin/
total 56
-rw-rw-r--. 1 hadoop hadoop 2601 Mar 27 13:44 compute-classpath.cmd
-rwxrwxr-x. 1 hadoop hadoop 3330 Mar 27 13:44 compute-classpath.sh
-rwxrwxr-x. 1 hadoop hadoop 2070 Mar 27 13:44 pyspark
-rw-rw-r--. 1 hadoop hadoop 1827 Mar 27 13:44 pyspark2.cmd
-rw-rw-r--. 1 hadoop hadoop 1000 Mar 27 13:44 pyspark.cmd
-rwxrwxr-x. 1 hadoop hadoop 3055 Mar 27 13:44 run-example
-rw-rw-r--. 1 hadoop hadoop 2046 Mar 27 13:44 run-example2.cmd
-rw-rw-r--. 1 hadoop hadoop 1012 Mar 27 13:44 run-example.cmd
-rwxrwxr-x. 1 hadoop hadoop 5151 Mar 27 13:44 spark-class
-rwxrwxr-x. 1 hadoop hadoop 3212 Mar 27 13:44 spark-class2.cmd
-rw-rw-r--. 1 hadoop hadoop 1010 Mar 27 13:44 spark-class.cmd
-rwxrwxr-x. 1 hadoop hadoop 3184 Mar 27 13:44 spark-shell
-rwxrwxr-x. 1 hadoop hadoop  941 Mar 27 13:44 spark-shell.cmd

run-example takes the example class and the master URL as arguments; for instance, the logistic regression and Pi-estimation examples:

run-example org.apache.spark.examples.SparkLR spark://localhost:7077

run-example org.apache.spark.examples.SparkPi spark://localhost:7077
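As a further smoke test, you can attach the interactive shell to the standalone master and run a tiny job; in Spark 0.9 the master URL is passed to spark-shell via the MASTER environment variable (the job below is purely illustrative):

[hadoop@localhost spark]$ MASTER=spark://localhost:7077 ./bin/spark-shell
scala> sc.parallelize(1 to 1000).count()
res0: Long = 1000
scala> exit

When finished, stop the Master and Worker with ./sbin/stop-all.sh.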

