Ubuntu 14.04 LTS
java version "1.7.0_60"
export HIVE_HOME=/home/fulong/Hive/apache-hive-0.13.1-bin
export PATH=$HIVE_HOME/bin:$PATH
Change into the conf directory and copy the template configuration files to their working names
fulong@FBI006:~/Hive/apache-hive-0.13.1-bin/conf$ ls
hive-default.xml.template hive-exec-log4j.properties.template
hive-env.sh.template hive-log4j.properties.template
fulong@FBI006:~/Hive/apache-hive-0.13.1-bin/conf$ cp hive-env.sh.template hive-env.sh
fulong@FBI006:~/Hive/apache-hive-0.13.1-bin/conf$ cp hive-default.xml.template hive-site.xml
fulong@FBI006:~/Hive/apache-hive-0.13.1-bin/conf$ ls
hive-default.xml.template hive-env.sh.template hive-log4j.properties.template
hive-env.sh hive-exec-log4j.properties.template hive-site.xml
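The copy step above can be sketched as a small shell helper. The function names copy_if_absent and init_hive_conf are hypothetical and not part of Hive. The sketch assumes the template file names shown in the listing and leaves any existing target alone, so rerunning it does not overwrite a file that has already been edited.

```shell
# Copy src to dst only when dst does not exist yet.
# copy_if_absent is a hypothetical helper name.
copy_if_absent() {
    src=$1
    dst=$2
    if [ -e "$dst" ]; then
        echo "keeping existing $dst"
    else
        cp "$src" "$dst"
    fi
}

# Create hive-env.sh and hive-site.xml from their templates.
# init_hive_conf is a hypothetical helper name.
init_hive_conf() {
    conf_dir=$1
    copy_if_absent "$conf_dir/hive-env.sh.template" "$conf_dir/hive-env.sh"
    copy_if_absent "$conf_dir/hive-default.xml.template" "$conf_dir/hive-site.xml"
}

# Usage (path as used in this article):
# init_hive_conf "$HIVE_HOME/conf"
```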
Edit the following settings in hive-env.sh to specify the Hadoop root directory and Hive's conf and lib directories
# Set HADOOP_HOME to point to a specific hadoop install directory
# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/home/fulong/Hive/apache-hive-0.13.1-bin/conf
# Folder containing extra libraries required for hive compilation/execution can be controlled by:
export HIVE_AUX_JARS_PATH=/home/fulong/Hive/apache-hive-0.13.1-bin/lib
Edit the following Oracle connection parameters in hive-site.xml
<description>JDBC connect string for a JDBC metastore</description>
<description>Driver class name for a JDBC metastore</description>
<description>username to use against metastore database</description>
<description>password to use against metastore database</description>
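The four descriptions above belong to Hive's standard javax.jdo.option.* properties. As a rough sketch, the helper below prints what the edited entries might look like. The function name print_metastore_props, the host, port, SID, user name and password are all placeholders and not values from this article. The driver class oracle.jdbc.OracleDriver is the one shipped in Oracle's JDBC jar; confirm it against the driver version you actually use.

```shell
# Print the four metastore connection properties for hive-site.xml.
# print_metastore_props is a hypothetical helper; the arguments are
# placeholders supplied by the caller, not values from the article.
print_metastore_props() {
    url=$1
    user=$2
    password=$3
    cat <<EOF
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>$url</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>oracle.jdbc.OracleDriver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>$user</value>
  <description>username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>$password</value>
  <description>password to use against metastore database</description>
</property>
EOF
}

# Usage (every value here is a placeholder):
# print_metastore_props 'jdbc:oracle:thin:@dbhost:1521:orcl' hive_user hive_password
```

Because hive-site.xml was copied from the default template, these properties already exist in the file; only their value elements need to be changed.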
Create a log4j directory under $HIVE_HOME to hold the log files
fulong@FBI006:~/Hive/apache-hive-0.13.1-bin/conf$ cp hive-log4j.properties.template hive-log4j.properties
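The article does not show which setting points Hive at the new directory. A minimal sketch, assuming the copied hive-log4j.properties contains a hive.log.dir line as the stock template does, is shown below. The function name set_hive_log_dir is hypothetical.

```shell
# Create the log directory and point hive.log.dir at it.
# set_hive_log_dir is a hypothetical helper; it assumes the properties
# file already has a line starting with "hive.log.dir=".
set_hive_log_dir() {
    props=$1
    log_dir=$2
    mkdir -p "$log_dir"
    # Rewrite only the hive.log.dir line; write to a temp file first so a
    # failed sed does not truncate the original.
    sed "s|^hive\.log\.dir=.*|hive.log.dir=$log_dir|" "$props" > "$props.tmp" &&
        mv "$props.tmp" "$props"
}

# Usage (paths as used in this article):
# set_hive_log_dir "$HIVE_HOME/conf/hive-log4j.properties" "$HIVE_HOME/log4j"
```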
Copy the Oracle JDBC jar
Copy the JDBC jar that matches your Oracle version into $HIVE_HOME/lib
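As a small sketch of this step, the helper below copies a driver jar into the lib directory and fails loudly if the jar is missing. The function name install_jdbc_jar and the jar file name in the usage line are assumptions; the article does not say which driver jar was used.

```shell
# Copy a JDBC driver jar into Hive's lib directory.
# install_jdbc_jar is a hypothetical helper name.
install_jdbc_jar() {
    jar=$1
    lib_dir=$2
    if [ ! -f "$jar" ]; then
        echo "driver jar not found: $jar" >&2
        return 1
    fi
    cp "$jar" "$lib_dir/"
}

# Usage (ojdbc6.jar is an assumed file name):
# install_jdbc_jar ~/Downloads/ojdbc6.jar "$HIVE_HOME/lib"
```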
Start Hive
fulong@FBI006:~/Hive/apache-hive-0.13.1-bin$ hive
14/08/20 17:14:05 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/08/20 17:14:05 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
14/08/20 17:14:05 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
14/08/20 17:14:05 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
14/08/20 17:14:05 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/08/20 17:14:05 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
14/08/20 17:14:05 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
14/08/20 17:14:05 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
14/08/20 17:14:05 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
Logging initialized using configuration in file:/home/fulong/Hive/apache-hive-0.13.1-bin/conf/hive-log4j.properties
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /home/fulong/Hadoop/hadoop-2.2.0/lib/native/libhadoop.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
hive> create table searchlog (time string,id string,sword string,rank int,clickrank int,url string) row format delimited fields terminated by '\t' lines terminated by '\n' stored as textfile;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:javax.jdo.JDODataStoreException: An exception was thrown while adding/validating class(es) : ORA-01754: a table may contain only one column of type LONG
Open hive-metastore-0.13.1.jar in ${HIVE_HOME}/lib with an archive tool, locate the file named package.jdo, open it, and find the following entries.
<field name="viewOriginalText" default-fetch-group="false">
<column name="VIEW_ORIGINAL_TEXT" jdbc-type="LONGVARCHAR"/>
<field name="viewExpandedText" default-fetch-group="false">
<column name="VIEW_EXPANDED_TEXT" jdbc-type="LONGVARCHAR"/>
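Both VIEW_ORIGINAL_TEXT and VIEW_EXPANDED_TEXT are declared as LONGVARCHAR, which maps to LONG in Oracle. Oracle allows at most one LONG column per table, so creating the metastore table fails with ORA-01754.
Following the recommendation on the Hive website, change the jdbc-type of both columns to CLOB. The modified entries look like this.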
<field name="viewOriginalText" default-fetch-group="false">
<column name="VIEW_ORIGINAL_TEXT" jdbc-type="CLOB"/>
<field name="viewExpandedText" default-fetch-group="false">
<column name="VIEW_EXPANDED_TEXT" jdbc-type="CLOB"/>
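The same edit can be scripted instead of made by hand. The sketch below changes the jdbc-type only on the two view text columns and leaves every other LONGVARCHAR column untouched. The function name patch_package_jdo is hypothetical, and the commented usage assumes package.jdo sits at the root of the metastore jar and that the JDK's jar tool is on the PATH. Back up the jar before updating it.

```shell
# Switch VIEW_ORIGINAL_TEXT and VIEW_EXPANDED_TEXT from LONGVARCHAR to CLOB.
# patch_package_jdo is a hypothetical helper name.
patch_package_jdo() {
    jdo=$1
    # The address restricts the substitution to the two view text columns.
    sed -E '/name="VIEW_(ORIGINAL|EXPANDED)_TEXT"/ s/jdbc-type="LONGVARCHAR"/jdbc-type="CLOB"/' \
        "$jdo" > "$jdo.tmp" && mv "$jdo.tmp" "$jdo"
}

# Usage (assumes package.jdo is at the root of the jar):
# cd "$HIVE_HOME/lib"
# metastore_jar=$(ls hive-metastore-*.jar)
# cp "$metastore_jar" "$metastore_jar.bak"      # keep a backup
# jar xf "$metastore_jar" package.jdo           # extract the mapping file
# patch_package_jdo package.jdo
# jar uf "$metastore_jar" package.jdo           # write it back into the jar
```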
After making the change, restart hive.
hive> create table searchlog (time string,id string,sword string,rank int,clickrank int,url string) row format delimited fields terminated by '\t' lines terminated by '\n' stored as textfile;
Time taken: 0.986 seconds
hive> load data local inpath '/home/fulong/Downloads/SogouQ.reduced' overwrite into table searchlog;
Copying data from file:/home/fulong/Downloads/SogouQ.reduced
Copying file: file:/home/fulong/Downloads/SogouQ.reduced
Loading data to table default.searchlog
rmr: DEPRECATED: Please use 'rm -r' instead.
Deleted hdfs://fulonghadoop/user/hive/warehouse/searchlog
Table default.searchlog stats: [numFiles=1, numRows=0, totalSize=152006060, rawDataSize=0]
Time taken: 25.705 seconds
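Hive applies the table schema when the data is read, so a line with the wrong number of tab-separated fields loads without complaint and only shows up later as NULL columns. As an optional check, the sketch below counts such lines in a local file. The function name count_bad_rows is hypothetical, and the field count of 6 comes from the six columns in the create table statement.

```shell
# Print the number of lines whose tab-separated field count differs from
# the expected value. count_bad_rows is a hypothetical helper name.
count_bad_rows() {
    file=$1
    expected=$2
    awk -F'\t' -v n="$expected" 'NF != n { bad++ } END { print bad + 0 }' "$file"
}

# Usage (file path as used in this article):
# count_bad_rows /home/fulong/Downloads/SogouQ.reduced 6
```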
hive> show tables;
Time taken: 0.139 seconds, Fetched: 1 row(s)
hive> select count(*) from searchlog;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1407233914535_0001, Tracking URL = http://FBI003:8088/proxy/application_1407233914535_0001/
Kill Command = /home/fulong/Hadoop/hadoop-2.2.0/bin/hadoop job -kill job_1407233914535_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2014-08-20 18:03:17,667 Stage-1 map = 0%, reduce = 0%
2014-08-20 18:04:05,426 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.46 sec
2014-08-20 18:04:27,317 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4.74 sec
MapReduce Total cumulative CPU time: 4 seconds 740 msec
Ended Job = job_1407233914535_0001
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 4.74 sec HDFS Read: 152010455 HDFS Write: 8 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 740 msec
Time taken: 103.154 seconds, Fetched: 1 row(s)