Sqoop

1 What Is Sqoop

Sqoop is an Apache tool for "transferring data between Hadoop and relational database servers". At its core, Sqoop is a command-line tool.

2 Why Sqoop Exists

Early on, data was stored in traditional relational databases. As data volumes grew, that approach could no longer keep up, and solutions such as the HDFS distributed file system appeared. Sqoop solves the problem of migrating data from traditional relational databases onto the big data platform.

3 Directions of Data Transfer

Directions are named from the point of view of the big data platform:

Import: relational databases such as MySQL or Oracle ----> Hadoop platform (HDFS, Hive, HBase, etc.)

Export: Hadoop platform data ----> MySQL or Oracle

4 How Sqoop Works

將導入或?qū)С雒罘g成 MapReduce 程序來實現(xiàn)，在翻譯出的MapReduce 中主要是對 InputFormat 和 OutputFormat 進行定制

(1) Sqoop import: read data from MySQL and write it to HDFS

Map side:

            The default FileInputFormat is designed for text files, so the input is redefined as DBInputFormat

            Rows are read from the database one at a time

            map() { output the row }

The input class (InputFormat) is redefined, and only map tasks are needed.

(2) Sqoop export: read data from HDFS and write it out to MySQL

Map side:

            The input is unchanged: FileInputFormat reads the files line by line

            map() {

                        write the data to MySQL

                        the OutputFormat is redefined as DBOutputFormat

            }
The OutputFormat is redefined, and again only map tasks are needed.
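To make the "map tasks only" point concrete, here is a toy simulation in plain bash. No Sqoop, Hadoop or database is involved; the function name and the CSV input standing in for a table are hypothetical. Each simulated map task takes a contiguous slice of rows and writes its own part-m-NNNNN file, which is how map-only MapReduce output is named. With no reduce phase nothing merges these files, so an import run with -m N normally leaves N output files.

```shell
#!/usr/bin/env bash
# Toy simulation of a map-only import: no Sqoop, Hadoop or database involved.
# Input: a CSV file of "id,name" rows (hypothetical stand-in for a table).
# Each simulated map task gets a contiguous slice of rows and writes its own
# part-m-NNNNN file (tab-separated); there is no reduce step to merge them.
simulate_map_only_import() {
  local input=$1 outdir=$2 mappers=$3
  local total
  total=$(wc -l < "$input")
  mkdir -p "$outdir"
  awk -F',' -v m="$mappers" -v total="$total" -v out="$outdir" '{
    task = int((NR - 1) * m / total)          # which map task owns this row
    dest = sprintf("%s/part-m-%05d", out, task)
    print $1 "\t" $2 > dest
  }' "$input"
}
```

For example, six rows split across 3 simulated map tasks produce part-m-00000, part-m-00001 and part-m-00002, each holding two rows.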

5 Installing Sqoop

(1) Upload and extract the package, then configure the environment variables

(2) In Sqoop's conf directory, rename the template and then edit it: mv sqoop-env-template.sh sqoop-env.sh

#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/home/refuel/opt/module/hadoop-2.7.7
#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/home/refuel/opt/module/hadoop-2.7.7
#set the path to where bin/hbase is available
export HBASE_HOME=/home/refuel/opt/module/hbase-1.2.6
#Set the path to where bin/hive is available
export HIVE_HOME=/home/refuel/opt/module/apache-hive-2.3.2-bin
#Set the path for where zookeper config dir is
export ZOOCFGDIR=/home/refuel/opt/module/zookeeper-3.4.10/conf

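In CDH distributions, Hadoop and YARN are installed separately, which is why two Hadoop directories (HADOOP_COMMON_HOME and HADOOP_MAPRED_HOME) have to be configured; with a plain Apache Hadoop install like the one above, both point to the same directory.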
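A wrong path in sqoop-env.sh usually only shows up later as a warning or a failed job. The small bash helper below is hypothetical (not shipped with Sqoop): it sources the file inside a subshell and reports any of the five variables above that is unset or points to a directory that does not exist.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Sqoop: sanity-check a sqoop-env.sh file.
# The function body is a subshell ( ... ), so sourcing the file does not
# leak its variables into the calling shell.
check_sqoop_env() (
  env_file=$1
  status=0
  # shellcheck source=/dev/null
  . "$env_file"
  for var in HADOOP_COMMON_HOME HADOOP_MAPRED_HOME HBASE_HOME HIVE_HOME ZOOCFGDIR; do
    value=${!var:-}                     # indirect lookup of the variable named $var
    if [ -z "$value" ]; then
      echo "UNSET   $var"
      status=1
    elif [ ! -d "$value" ]; then
      echo "MISSING $var=$value"
      status=1
    fi
  done
  exit "$status"
)
```

Run it as check_sqoop_env "$SQOOP_HOME/conf/sqoop-env.sh"; a non-zero exit status means at least one path needs attention. HBASE_HOME, HIVE_HOME and ZOOCFGDIR only matter if you use those integrations, so a MISSING line for them may be harmless on a cluster without HBase or Hive.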

(3) Copy the MySQL JDBC driver jar into Sqoop's lib directory

(4) Test

List all databases in MySQL:
sqoop list-databases --connect jdbc:mysql://bigdata01:3306/ --username root --password 123456

6 Using Sqoop

6.1 Importing Data

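Parameters:
--connect                 JDBC connection string for MySQL
--password                MySQL password
--username                MySQL user name
-m                        number of map tasks
--target-dir              output directory on HDFS
--fields-terminated-by    field delimiter in the files written to HDFS
--columns                 columns to import
--where                   filter condition
--split-by                column used to divide the work among map tasks
--table                   MySQL table to import into the big data platform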
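When these parameters are put together in a script, quoting values such as '\t' is easy to get wrong. One common bash pattern, shown here as a sketch (the function and variable names are hypothetical; the flags are the ones listed above), is to collect the arguments in an array, print them for a dry run, and only then execute them.

```shell
#!/usr/bin/env bash
# Sketch: build a Sqoop import command as a bash array so each argument
# (including the literal '\t') reaches Sqoop intact, without re-splitting.
build_import_cmd() {
  local table=$1 target_dir=$2 mappers=$3
  IMPORT_CMD=(
    sqoop import
    --connect "jdbc:mysql://bigdata01:3306/mysql"
    --username root
    --password 123456
    --table "$table"
    --target-dir "$target_dir"
    --fields-terminated-by '\t'
    -m "$mappers"
  )
}

# Dry run: print a command in a form that can be pasted back into a shell.
print_cmd() {
  printf '%q ' "$@"
  printf '\n'
}
```

Typical use: build_import_cmd help_keyword /user/data01/mydata/help_keyword 1, then print_cmd "${IMPORT_CMD[@]}" to inspect it, and finally "${IMPORT_CMD[@]}" to actually run Sqoop.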

(1) MySQL to HDFS

① Basic import

sqoop import --connect jdbc:mysql://bigdata01:3306/mysql --username root --password 123456 --table help_keyword -m 1
Default output directory: /user/<username>/<table name>; default field delimiter: a comma (,)

② Specifying the output directory and delimiter

sqoop import --connect jdbc:mysql://bigdata01:3306/mysql --username root --password 123456 --table help_keyword --target-dir /user/data01/mydata/help_keyword --fields-terminated-by '\t' -m 1

③ Selecting the columns to import

Import only the specified column:
sqoop import --connect jdbc:mysql://bigdata01:3306/mysql --username root --password 123456 --columns "name" --table help_keyword --target-dir /user/sqoop/outputdata01 -m 1

④ Specifying a filter condition

sqoop import   --connect jdbc:mysql://bigdata01:3306/mysql   --username root  --password 123456   --columns "name" --where "help_keyword_id>100" --table help_keyword  --target-dir /user/sqoop/outputdata02  -m 1

⑤ Importing the result of an SQL query

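sqoop import --connect jdbc:mysql://bigdata01:3306/mysql --username root --password 123456 --query 'select * from help_keyword where help_keyword_id>200 and $CONDITIONS' --target-dir /user/sqoop/outputdata03 -m 1
Notes:
① --query cannot be used together with --where, --columns or --table.
② Leaving out $CONDITIONS fails with:
    Query [select * from help_keyword where help_keyword_id>200] must contain '$CONDITIONS' in WHERE clause
   $CONDITIONS means nothing in your own query, but Sqoop requires it: at run time Sqoop replaces the token with each map task's split condition.
③ Wrap the query in single quotes. Inside double quotes you would have to write \$CONDITIONS so that the shell does not expand it.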
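Both halves of note ② and ③ can be seen in a few lines of bash. The first function shows what the local shell does to the query string under each kind of quoting; the second is a toy model (hypothetical function name, illustrative predicate text, not Sqoop's exact generated SQL) of how each map task's copy of the query gets its own condition in place of the literal $CONDITIONS token.

```shell
#!/usr/bin/env bash
# Part 1: why the --query string is single-quoted. Inside double quotes the
# local shell expands $CONDITIONS (normally an unset variable) to an empty
# string before Sqoop ever sees the query.
show_quoting() {
  unset CONDITIONS
  local single='select * from help_keyword where help_keyword_id>200 and $CONDITIONS'
  local double="select * from help_keyword where help_keyword_id>200 and $CONDITIONS"
  printf '%s\n' "$single" "$double"
}

# Part 2: toy model of the substitution. Each map task receives a copy of the
# query with the literal token replaced by its own range predicate.
substitute_conditions() {
  local query=$1 predicate=$2
  local token='$CONDITIONS'
  local replacement="($predicate)"
  printf '%s\n' "${query//"$token"/$replacement}"
}
```

show_quoting prints the query twice: the single-quoted copy still ends in $CONDITIONS, while the double-quoted copy ends in "and " with nothing after it, which is exactly the query Sqoop would reject.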

⑥ Running multiple map tasks

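sqoop import --connect jdbc:mysql://bigdata01:3306/ --username root --password 123456 --target-dir /user/sqoop/outputdata04 --query 'select * from mysql.help_keyword where help_keyword_id > 100 and $CONDITIONS' --split-by help_keyword_id --fields-terminated-by '\t' -m 3
Notes: with a --query import like this one, -m greater than 1 must be combined with --split-by, otherwise the job fails with an error. The split column should be an integer, for example an auto-increment primary key. To divide the work, Sqoop first gets the minimum and the maximum id; each map task is then assigned a range of about (max id - min id + 1) / number of map tasks id values, which it reads in order. Because the ranges are equal in id values rather than in row counts, gaps or clusters in the ids can leave some map tasks with far more rows than others (data skew).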
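The range arithmetic in the note can be sketched in bash. This is a simplification based on the formula above (each task gets the ceiling of (max - min + 1) / m id values), not a copy of Sqoop's own splitter, and split_ranges is a hypothetical name. In a real job Sqoop finds min and max itself by querying the split column.

```shell
#!/usr/bin/env bash
# Simplified sketch of range splitting on an integer --split-by column.
# Prints one inclusive "low high" pair per map task.
split_ranges() {
  local min=$1 max=$2 tasks=$3
  local total=$(( max - min + 1 ))
  local size=$(( (total + tasks - 1) / tasks ))   # ceiling division
  local lo=$min hi i
  for (( i = 0; i < tasks && lo <= max; i++ )); do
    hi=$(( lo + size - 1 ))
    if (( hi > max )); then hi=$max; fi
    echo "$lo $hi"
    lo=$(( hi + 1 ))
  done
}
```

split_ranges 101 200 3 yields the ranges 101-134, 135-168 and 169-200. The skew problem is easy to reproduce: if the table holds ids 1 to 10 plus a single id 1000000, split_ranges 1 1000000 3 still cuts three equal ranges of about 333,334 values, so the first task gets ten rows, the second none and the third one.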

(2) MySQL to Hive

Steps: ① the data is first imported into the default HDFS path; ② a table is created in Hive; ③ the HDFS files are loaded into the Hive table.

Note: when importing into Hive, Sqoop creates the table by default, but it does not create the database; the database has to be created manually.

Parameters:
--hive-import       import into Hive
--hive-overwrite    overwrite the existing data in the Hive table
--hive-database     target Hive database
--hive-table        target Hive table

① Basic import

sqoop import   --connect jdbc:mysql://bigdata01:3306/mysql   --username root  --password 123456   --table help_keyword   --hive-import -m 1

② Specifying the target Hive database and table

sqoop import  --connect jdbc:mysql://bigdata01:3306/mysql  --username root  --password 123456  --table help_keyword  --fields-terminated-by "\t"  --lines-terminated-by "\n"  --hive-import  --hive-overwrite  --create-hive-table  --delete-target-dir --hive-database  test_sqoop --hive-table new_help_keyword

(3) MySQL to HBase

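Parameters:
--hbase-table       HBase table name
--column-family     column family the imported columns are written under
--hbase-row-key     column whose value becomes the HBase row key
Note: the HBase table must be created manually, for example in the HBase shell:
create "new_help_keyword","info"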

sqoop import --connect jdbc:mysql://bigdata01:3306/mysql --username root --password 123456 --table help_keyword --hbase-table new_help_keyword --column-family info --hbase-row-key help_keyword_id
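Conceptually, each MySQL row becomes one HBase row: the value of the --hbase-row-key column is the row key, and each remaining column becomes a cell in the info family, qualified by the column name. The awk sketch below prints the equivalent HBase shell put commands for tab-separated help_keyword_id/name rows. It only illustrates the mapping (the function name and sample data are hypothetical, and quotes inside values are not escaped); Sqoop itself writes the rows through the HBase client API.

```shell
#!/usr/bin/env bash
# Illustration only: show how "help_keyword_id<TAB>name" rows map onto HBase
# cells when help_keyword_id is the row key and FAMILY is the column family.
# Values are not escaped, so names containing double quotes are not handled.
rows_to_hbase_puts() {
  local table=$1 family=$2
  awk -F'\t' -v t="$table" -v cf="$family" '{
    # row key = first column; the remaining column becomes cell <family>:name
    printf "put \"%s\", \"%s\", \"%s:name\", \"%s\"\n", t, $1, cf, $2
  }'
}
```

Piping two rows through rows_to_hbase_puts new_help_keyword info prints one put per row, for example put "new_help_keyword", "1", "info:name", "join".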

(4) Incremental imports

Full import: every run imports all of the data

Incremental import: every run imports only the newly added data

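Three key parameters (from Sqoop's "Incremental import arguments" help):
--check-column <column>        Source column to check for incremental change. This is the check column; usually the primary key is chosen, because it auto-increments.
--incremental <import-type>    Define an incremental import of type 'append' or 'lastmodified'. This sets the incremental mode: append adds newly inserted rows; lastmodified is based on a last-modified timestamp column.
--last-value <value>           Last imported value in the incremental check column. This is the largest check-column value from the previous import; this run imports the rows whose value comes after it.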

sqoop import   --connect jdbc:mysql://bigdata01:3306/mysql   --username root  --password 123456   --table help_keyword  --target-dir /user/outputdata06  --incremental  append  --check-column  help_keyword_id --last-value 200  -m 1
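The append rule can be modelled in a few lines of bash: only rows whose check column is greater than --last-value are imported, and the largest imported value becomes the --last-value for the next run. This is a toy model with a hypothetical function name and no database. In practice Sqoop prints the value to use next at the end of an incremental import, and a saved job (sqoop job --create) can keep track of it for you.

```shell
#!/usr/bin/env bash
# Toy model of "--incremental append": given the previous --last-value and the
# current values of the check column, pick the rows to import and compute the
# --last-value for the next run. No database is involved.
incremental_append() {
  local last=$1
  shift
  local id next=$last
  local -a imported=()
  for id in "$@"; do
    if (( id > last )); then            # append mode: check column > last-value
      imported+=("$id")
      if (( id > next )); then next=$id; fi
    fi
  done
  echo "imported: ${imported[*]}"
  echo "next --last-value: $next"
}
```

With --last-value 200 and check-column values 198 199 200 201 202, only 201 and 202 are imported and the next run should use --last-value 202; if nothing new arrives, the value stays unchanged.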

6.2 Exporting Data

(1) HDFS to MySQL

sqoop export --connect jdbc:mysql://bigdata01:3306/sqoopdb --username root --password 123456 --table sqoopfur --export-dir /user/outputdata03/help_keyword --fields-terminated-by '\t'
--export-dir              HDFS path to export
--fields-terminated-by    field delimiter of the HDFS files
--table                   target MySQL table
Both the MySQL database and the table must be created manually:
create database sqoopdb;
use sqoopdb;
CREATE TABLE sqoopfur (
    id INT,
    name VARCHAR(60)
);
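In its default insert mode, sqoop export turns each parsed line into a row inserted into the target table. The awk sketch below mimics that per-line mapping for the sqoopfur(id, name) table by printing one INSERT statement per tab-separated line. It is an illustration with a hypothetical function name; Sqoop itself batches rows and binds the values through JDBC rather than generating SQL text like this.

```shell
#!/usr/bin/env bash
# Illustration only: mimic insert-mode export for the sqoopfur(id, name) table.
# Each tab-separated input line becomes one INSERT statement.
tsv_to_inserts() {
  local table=$1
  awk -F'\t' -v t="$table" -v q="'" '{
    name = $2
    gsub(q, q q, name)                  # escape single quotes for SQL
    printf "INSERT INTO %s (id, name) VALUES (%s, %s%s%s);\n", t, $1, q, name, q
  }'
}
```

It also shows why the delimiter matters: if the file were not tab-separated, $2 would be empty and every row would be wrong.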

(2) Hive to MySQL

sqoop export --connect jdbc:mysql://bigdata01:3306/sqoopdb --username root --password 123456 --table uv_info --export-dir /user/hive/warehouse/test_sqoop.db/new_help_keyword --input-fields-terminated-by '\t'
--export-dir                    HDFS path where the Hive table's files are stored
--input-fields-terminated-by    field delimiter of the Hive table's files
The target MySQL table must be created manually:
CREATE TABLE uv_info (
    id INT,
    name VARCHAR(60)
);
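A common reason for a failed export is a delimiter that does not match the files, so each line splits into the wrong number of columns. Hive's default field delimiter is \001 (Ctrl-A); the command above can use '\t' only because the table was imported with --fields-terminated-by "\t". A hypothetical pre-flight check on a local copy of one file (for example fetched with hdfs dfs -get) could look like this:

```shell
#!/usr/bin/env bash
# Hypothetical pre-export check: every line of FILE must split into exactly
# EXPECTED fields when cut on DELIM. Prints offending lines; exits 1 if any.
check_field_count() {
  local delim=$1 expected=$2 file=$3
  awk -F"$delim" -v n="$expected" '
    NF != n { bad++; printf "line %d: %d fields (expected %d)\n", NR, NF, n }
    END { exit (bad > 0) }
  ' "$file"
}
```

For the uv_info table, check_field_count $'\t' 2 part-m-00000 should pass; on a table stored with Hive's default delimiter, the same call reports one field per line, while check_field_count $'\001' 2 part-m-00000 passes.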

(3) HBase to MySQL

There is no direct way to export HBase data into MySQL. One option is to integrate HBase with Hive and then export the data from Hive to MySQL.

Source: https://www.icode9.com/content-4-406851.html

