Sqoop is the bridge between relational databases and the Hadoop storage stack, supporting imports and exports in both directions between a variety of relational sources and Hive, HDFS, and HBase. In a typical setup the relational tables live in a backup copy of the production database and need to be imported every day. When the daily volume is small, a full-table import is perfectly workable; for larger or steadily growing tables, Sqoop also provides an incremental import mechanism.
The following sections introduce the most commonly used Sqoop commands along with their parameters:
# | Command | Class | Description |
--- | --- | --- | --- |
1 | import | ImportTool | Import data from a relational database (a table or a query) into HDFS |
2 | export | ExportTool | Export data from HDFS into a relational database |
3 | codegen | CodeGenTool | Generate Java classes for a database table and package them into a jar |
4 | create-hive-table | CreateHiveTableTool | Create a Hive table |
5 | eval | EvalSqlTool | Evaluate a SQL statement and display the results |
6 | import-all-tables | ImportAllTablesTool | Import all tables of a database into HDFS |
7 | job | JobTool | Create and work with saved jobs |
8 | list-databases | ListDatabasesTool | List all databases on a server |
9 | list-tables | ListTablesTool | List all tables in a database |
10 | merge | MergeTool | Merge the results of two datasets (typically incremental imports) |
11 | metastore | MetastoreTool | Run a standalone Sqoop metastore (shared job repository) |
12 | help | HelpTool | Display help information |
13 | version | VersionTool | Display version information |
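To make the most common of these concrete, the sketch below lists the databases on a MySQL server, does a full-table import into HDFS, and exports the same data back into another table. The host, credentials, table names, and HDFS paths (localhost, sqoop_datas, users, users_backup, /user/hadoop/users) are placeholders, not values from this article.

```bash
# List databases on the server (list-databases / ListDatabasesTool)
sqoop list-databases \
  --connect jdbc:mysql://localhost:3306/ \
  --username sqoop --password sqoop_pwd

# Full-table import of `users` into HDFS (import / ImportTool)
sqoop import \
  --connect jdbc:mysql://localhost:3306/sqoop_datas \
  --username sqoop --password sqoop_pwd \
  --table users \
  --target-dir /user/hadoop/users

# Export the imported files back into a relational table (export / ExportTool)
sqoop export \
  --connect jdbc:mysql://localhost:3306/sqoop_datas \
  --username sqoop --password sqoop_pwd \
  --table users_backup \
  --export-dir /user/hadoop/users
```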
Next come Sqoop's generic parameters, followed by the parameters specific to each of the 13 commands above. The generic parameters fall into the following groups:
- Common arguments
- Incremental import arguments
- Output line formatting arguments
- Input parsing arguments
- Hive arguments
- HBase arguments
- Generic Hadoop command-line arguments
1. Common arguments: general parameters, mainly for configuring the connection to the relational database
# | Parameter | Description | Example |
--- | --- | --- | --- |
1 | --connect | JDBC URL of the relational database | jdbc:mysql://localhost/sqoop_datas |
2 | --connection-manager | Connection manager class; rarely needed | |
3 | --driver | JDBC driver class to use | |
4 | --hadoop-home | Hadoop home directory | /home/hadoop |
5 | --help | Display help information | |
6 | --password | Password for the database connection | |
7 | --username | Username for the database connection | |
8 | --verbose | Print more information (effectively lowers the log threshold) | Takes no value |
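A minimal sketch of how these connection parameters are combined, assuming a MySQL server at db.example.com and a database named sqoop_datas (both placeholders):

```bash
# List the tables of one database, spelling out the common connection arguments
sqoop list-tables \
  --connect jdbc:mysql://db.example.com:3306/sqoop_datas \
  --username sqoop \
  --password sqoop_pwd \
  --verbose        # takes no value; only raises logging verbosity
```

For MySQL URLs Sqoop picks its connection manager automatically, so --driver and --connection-manager normally only need to be supplied for databases Sqoop does not recognize from the JDBC URL.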
Import control arguments:
Argument | Description |
--- | --- |
--append | Append data to an existing dataset in HDFS |
--as-avrodatafile | Imports data to Avro Data Files |
--as-sequencefile | Imports data to SequenceFiles |
--as-textfile | Imports data as plain text (default) |
--boundary-query <statement> | Boundary query to use for creating splits |
--columns <col,col,col…> | Columns to import from table |
--direct | Use direct import fast path |
--direct-split-size <n> | Split the input stream every n bytes when importing in direct mode |
--inline-lob-limit <n> | Set the maximum size for an inline LOB |
-m,--num-mappers <n> | Use n map tasks to import in parallel |
-e,--query <statement> | Import the results of statement |
--split-by <column-name> | Column of the table used to split work units |
--table <table-name> | Table to read |
--target-dir <dir> | HDFS destination dir |
--warehouse-dir <dir> | HDFS parent for table destination |
--where <where clause> | WHERE clause to use during import |
-z,--compress | Enable compression |
--compression-codec <c> | Use Hadoop codec (default gzip) |
--null-string <null-string> | The string to be written for a null value for string columns |
--null-non-string <null-string> | The string to be written for a null value for non-string columns |
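The import-control options are typically combined as in the sketch below: pick columns and rows, choose a split column and the degree of parallelism, and compress the output. The table orders, its columns, and the HDFS paths are hypothetical.

```bash
# Selective, parallel, compressed import of one table
sqoop import \
  --connect jdbc:mysql://localhost:3306/sqoop_datas \
  --username sqoop --password sqoop_pwd \
  --table orders \
  --columns "id,customer_id,amount,created_at" \
  --where "created_at >= '2013-01-01'" \
  --split-by id \
  -m 4 \
  --target-dir /user/hadoop/orders \
  -z --compression-codec org.apache.hadoop.io.compress.GzipCodec

# Free-form query import: keep $CONDITIONS in single quotes so the shell
# does not expand it; Sqoop replaces it with its own split predicates
sqoop import \
  --connect jdbc:mysql://localhost:3306/sqoop_datas \
  --username sqoop --password sqoop_pwd \
  --query 'SELECT id, amount FROM orders WHERE $CONDITIONS' \
  --split-by id \
  --target-dir /user/hadoop/orders_by_query
```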
Incremental import arguments:
Argument | Description |
--- | --- |
--check-column (col) | Specifies the column to be examined when determining which rows to import. |
--incremental (mode) | Specifies how Sqoop determines which rows are new. Legal values for mode include append and lastmodified. |
--last-value (value) | Specifies the maximum value of the check column from the previous import. |
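A sketch of both incremental modes against a hypothetical orders table; the --last-value figures stand in for whatever the previous run reported:

```bash
# append mode: only rows whose id is greater than the last imported value
sqoop import \
  --connect jdbc:mysql://localhost:3306/sqoop_datas \
  --username sqoop --password sqoop_pwd \
  --table orders \
  --target-dir /user/hadoop/orders \
  --incremental append \
  --check-column id \
  --last-value 100000

# lastmodified mode: only rows updated since the given timestamp
sqoop import \
  --connect jdbc:mysql://localhost:3306/sqoop_datas \
  --username sqoop --password sqoop_pwd \
  --table orders \
  --target-dir /user/hadoop/orders \
  --incremental lastmodified \
  --check-column updated_at \
  --last-value "2013-06-01 00:00:00" \
  --append
```

At the end of an incremental run Sqoop prints the value to pass as --last-value next time; saving the command as a sqoop job lets the metastore remember it automatically.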
Output line formatting arguments:
Argument | Description |
--- | --- |
--enclosed-by <char> | Sets a required field enclosing character |
--escaped-by <char> | Sets the escape character |
--fields-terminated-by <char> | Sets the field separator character |
--lines-terminated-by <char> | Sets the end-of-line character |
--mysql-delimiters | Uses MySQL's default delimiter set: fields: , lines: \n escaped-by: \ optionally-enclosed-by: ' |
--optionally-enclosed-by <char> | Sets a field enclosing character |
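As a sketch, here is a tab-separated layout with explicit NULL markers, a common choice when the files will later be read by Hive; the table and path are placeholders:

```bash
sqoop import \
  --connect jdbc:mysql://localhost:3306/sqoop_datas \
  --username sqoop --password sqoop_pwd \
  --table users \
  --target-dir /user/hadoop/users_tsv \
  --fields-terminated-by '\t' \
  --lines-terminated-by '\n' \
  --escaped-by '\\' \
  --optionally-enclosed-by '\"' \
  --null-string '\\N' \
  --null-non-string '\\N'
```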
Hive arguments:
Argument | Description |
--- | --- |
--hive-home <dir> | Override $HIVE_HOME |
--hive-import | Import tables into Hive (uses Hive's default delimiters if none are set) |
--hive-overwrite | Overwrite existing data in the Hive table. |
--create-hive-table | If set, the job will fail if the target Hive table already exists. By default this property is false. |
--hive-table <table-name> | Sets the table name to use when importing to Hive. |
--hive-drop-import-delims | Drops \n, \r, and \01 from string fields when importing to Hive. |
--hive-delims-replacement | Replaces \n, \r, and \01 in string fields with a user-defined string when importing to Hive. |
--hive-partition-key | Name of the Hive field on which partitions are sharded |
--hive-partition-value <v> | String value that serves as the partition key for this import job. |
--map-column-hive <map> | Override default mapping from SQL type to Hive type for configured columns. |
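A sketch of an import that lands directly in a Hive table, assuming Hive is available on the machine running Sqoop; the table names and partition values are placeholders:

```bash
sqoop import \
  --connect jdbc:mysql://localhost:3306/sqoop_datas \
  --username sqoop --password sqoop_pwd \
  --table users \
  --hive-import \
  --hive-table users_hive \
  --hive-overwrite \
  --hive-drop-import-delims \
  --hive-partition-key dt \
  --hive-partition-value '2013-06-01' \
  --null-string '\\N' --null-non-string '\\N'
```

--hive-drop-import-delims (or --hive-delims-replacement) matters because stray \n or \01 characters inside string columns would otherwise corrupt Hive's row and field boundaries.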
HBase arguments:
Argument | Description |
--- | --- |
--column-family <family> | Sets the target column family for the import |
--hbase-create-table | If specified, create missing HBase tables |
--hbase-row-key <col> | Specifies which input column to use as the row key |
--hbase-table <table-name> | Specifies an HBase table to use as the target instead of HDFS |
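A sketch of writing the same table straight into HBase instead of HDFS; the HBase table name, column family, and row-key column are placeholders:

```bash
sqoop import \
  --connect jdbc:mysql://localhost:3306/sqoop_datas \
  --username sqoop --password sqoop_pwd \
  --table users \
  --hbase-table users \
  --column-family info \
  --hbase-row-key id \
  --hbase-create-table
```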
Code generation arguments:
Argument | Description |
--- | --- |
--bindir <dir> | Output directory for compiled objects |
--class-name <name> | Sets the generated class name. This overrides --package-name. When combined with --jar-file, sets the input class. |
--jar-file <file> | Disable code generation; use specified jar |
--outdir <dir> | Output directory for generated code |
--package-name <name> | Put auto-generated classes in this package |
--map-column-java <m> | Override default mapping from SQL type to Java type for configured columns. |
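A sketch of the codegen command, which only generates and compiles the record class without moving any data; the class name and output directories are placeholders:

```bash
sqoop codegen \
  --connect jdbc:mysql://localhost:3306/sqoop_datas \
  --username sqoop --password sqoop_pwd \
  --table users \
  --class-name com.example.sqoop.User \
  --outdir /tmp/sqoop/src \
  --bindir /tmp/sqoop/classes
```

The resulting jar can later be reused on an import with --jar-file and --class-name, which skips code generation for that run.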