Several ways to import and export data with Sqoop and DataX
I. Sqoop
1. List databases
```
sqoop list-databases \
  --connect jdbc:mysql://hadoop01:3306 \
  --username root --password 123456
```

2. List tables
```
sqoop list-tables \
  --connect jdbc:mysql://hadoop01:3306/test \
  --username root --password 123456
```

3. Using a password file
The password can be kept in a local file (e.g. file:///home/user/password) or in an HDFS file (e.g. /user/user/password).

Note: editing the password file directly in vim causes authentication errors (vim appends a trailing newline, so the password no longer matches). Write the file with shell redirection (append or overwrite) instead; `echo -n` suppresses the trailing newline:

```
echo -n "password" >> /home/taojiamin/data/passwd.pwd && chmod 400 /home/taojiamin/data/passwd.pwd
```

Import using the password file:

```
bin/sqoop import \
  --connect jdbc:mysql://com.hadoop05:3306/testdb \
  --username root \
  --password-file file:///home/hadoop/mypasswd \
  --table user \
  --target-dir /sqoop/input \
  -m 1 \
  --fields-terminated-by '\t' \
  --delete-target-dir
```

4. Importing from MySQL to HDFS
1) Full import
```
# Use --table <table> to import a whole table, or --query for a free-form query;
# a --query must contain \$CONDITIONS in its WHERE clause.
sqoop import --connect jdbc:mysql://hadoop01:3306/test \
  --driver com.mysql.jdbc.Driver \
  --username root --password 123456 \
  --query "select * from user where \$CONDITIONS" \
  -m 1 \
  --split-by user_id \
  --fields-terminated-by '\t' \
  --compress \
  --compression-codec com.hadoop.compression.lzo.LzoCodec \
  --hive-drop-import-delims \
  --null-string '\\N' \
  --null-non-string '\\N' \
  --columns 'user_id,user_name,trade_time' \
  --delete-target-dir \
  --target-dir hdfs://hadoop01:9000/sqooptest/mysql2hdfs
```

| Option | Description |
| --- | --- |
| --connect | JDBC connection string (host, port and database name) |
| --driver | JDBC driver class for MySQL |
| --table | MySQL table name |
| --query | free-form query; its WHERE clause must contain \$CONDITIONS, which Sqoop replaces with split conditions at run time |
| -m | number of mappers; defaults to 4 |
| --split-by | column used to split the data between mappers; defaults to the primary key and must be given when the table has none |
| --fields-terminated-by | field delimiter in the output; defaults to a comma |
| --compress | enable compression (gzip by default); output is uncompressed if the flag is omitted |
| --compression-codec | compression codec to use |
| --hive-drop-import-delims | strip Hive's default delimiters (^A, ^B, \n) from string fields |
| --null-string | replacement value for NULL in string columns |
| --null-non-string | replacement value for NULL in non-string columns |
| --columns | columns to import; optional |
| --delete-target-dir | delete the target directory first if it already exists |
| --target-dir | target directory for the import |
2) Incremental import
① Append mode
```
sqoop import \
  --connect jdbc:mysql://hadoop01:3306/test \
  --username root \
  --password 123456 \
  --query "select order_id, name from user where \$CONDITIONS" \
  --target-dir hdfs://hadoop01:9000/sqooptest/mysql2hdfs \
  --split-by user_id \
  -m 1 \
  --incremental append \
  --check-column user_id \
  --last-value 0
```

| Option | Description |
| --- | --- |
| --incremental append | incremental import driven by a monotonically increasing column; every row whose value in that column exceeds the threshold is appended |
| --check-column | the increasing column (typically an int id) |
| --last-value | the threshold (int); only rows above it are imported |
② lastmodified mode
```
sqoop import \
  --connect jdbc:mysql://192.168.xxx.xxx:3316/testdb \
  --username root \
  --password transwarp \
  --query "select order_id, name from order_table where \$CONDITIONS" \
  --target-dir /user/root/order_all \
  --split-by id \
  -m 4 \
  --incremental lastmodified \
  --merge-key order_id \
  --check-column time \
  --last-value "2014-11-09 21:00:00"   # remember this date!
```

| Option | Description |
| --- | --- |
| --incremental lastmodified | incremental import driven by a timestamp column; every row whose timestamp is greater than or equal to the threshold is imported |
| --check-column | the timestamp column |
| --last-value | the threshold timestamp |
| --merge-key | merge key (the primary key; records with the same key are merged) |
5. Exporting plain (uncompressed) HDFS files to MySQL
```
sqoop export \
  --connect jdbc:mysql://hadoop01:3306/test \
  --username root \
  --password 123456 \
  --table u2 \
  --input-fields-terminated-by ',' \
  --export-dir hdfs://hadoop01:9000/sqooptest/mysql2hdfs2/*
```

6. Importing from MySQL to Hive (uncompressed)
```
sqoop import --connect jdbc:mysql://hadoop01:3306/test \
  --driver com.mysql.jdbc.Driver \
  --username root --password 123456 \
  --table user \
  --columns 'user_id,user_name,trade_time' \
  --null-string '\\N' \
  --null-non-string 0 \
  --delete-target-dir \
  --fields-terminated-by '\t' \
  --hive-import \
  --hive-overwrite \
  --hive-table qf24.sq1
```

7. Incremental import
append:
```
sqoop import --connect jdbc:mysql://hadoop01:3306/test \
  --driver com.mysql.jdbc.Driver \
  --username root --password 123456 \
  --table user \
  -m 1 \
  --incremental append \
  --check-column id \
  --last-value 0 \
  --target-dir hdfs://hadoop01:9000/sqooptest/mysql2hdfs
```

lastmodified (using a timestamp column):
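A minimal sketch of a lastmodified import, assuming the same user table with its trade_time column, user_id as the merge key, and an illustrative --last-value (these choices are assumptions, not from the original example):

```
sqoop import --connect jdbc:mysql://hadoop01:3306/test \
  --driver com.mysql.jdbc.Driver \
  --username root --password 123456 \
  --table user \
  -m 1 \
  --incremental lastmodified \
  --check-column trade_time \
  --merge-key user_id \
  --last-value "2019-01-01 00:00:00" \
  --target-dir hdfs://hadoop01:9000/sqooptest/mysql2hdfs
```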
8. Saving the import as a sqoop job
sqoop job:
```
--create <job-id>            Create a new saved job
--delete <job-id>            Delete a saved job
--exec <job-id>              Run a saved job
--help                       Print usage instructions
--list                       List saved jobs
--meta-connect <jdbc-uri>    Specify JDBC connect string for the metastore
--show <job-id>              Show the parameters for a saved job
--verbose                    Print more information while working
```

`sqoop job --list` shows the list of saved jobs.

Create a job:
```
sqoop job --create myjob -- import --connect jdbc:mysql://hadoop01:3306/test \
  --driver com.mysql.jdbc.Driver \
  --username root --password 123456 \
  --table user \
  -m 1 \
  --incremental append \
  --check-column id \
  --last-value 0 \
  --target-dir hdfs://hadoop01:9000/sqooptest/mysql2hdfs
```

Run the job:
```
sqoop job --exec myjob
```

The following two export options are used to update (upsert) rows in tables that have a primary key; a sketch of a full command follows the flags below:
```
--update-key <primary-key-column>
--update-mode allowinsert
```
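A minimal sketch of such an upsert-style export, reusing the u2 table from section 5 and assuming user_id is its primary key (an assumption, not stated in the original):

```
sqoop export \
  --connect jdbc:mysql://hadoop01:3306/test \
  --username root \
  --password 123456 \
  --table u2 \
  --update-key user_id \
  --update-mode allowinsert \
  --input-fields-terminated-by ',' \
  --export-dir hdfs://hadoop01:9000/sqooptest/mysql2hdfs2/*
```

With --update-mode allowinsert, rows whose key already exists are updated and rows with new keys are inserted.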
II. DataX

Note: the target HDFS directories and database tables must be created in advance.
To run a job: `python /home/hadoop/app/datax/datax/bin/datax.py mysql2hive.json`
1. MySQL to HDFS
vi mysql2hdfs.json
```json
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "column": ["user_id", "user_name", "trade_time"],
                        "connection": [
                            {
                                "jdbcUrl": ["jdbc:mysql://hadoop02:3306/test"],
                                "table": ["user"]
                            }
                        ],
                        "username": "root",
                        "password": "123456"
                    }
                },
                "writer": {
                    "name": "hdfswriter",
                    "parameter": {
                        "defaultFS": "hdfs://hadoop02:9000",
                        "fileType": "orc",
                        "path": "/datax/mysql2hdfs/userorc",
                        "fileName": "m2huser",
                        "column": [
                            {"name": "col1", "type": "BIGINT"},
                            {"name": "col2", "type": "STRING"},
                            {"name": "col3", "type": "STRING"}
                        ],
                        "writeMode": "append",
                        "fieldDelimiter": "\t",
                        "compress": "NONE"
                    }
                }
            }
        ],
        "setting": {
            "speed": {"channel": "1"}
        }
    }
}
```

2. HDFS to MySQL
For MySQL numeric columns (int, bigint, long, etc.), declare the corresponding hdfsreader column as type LONG when loading from HDFS into MySQL.
vi hdfs2mysql.json:
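A minimal sketch of what this file might contain (assumed configuration, not the author's original): it reads back the ORC files written in step 1 and loads them into the u2 table, declaring the numeric user_id column as long per the note above.

```json
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "hdfsreader",
                    "parameter": {
                        "defaultFS": "hdfs://hadoop02:9000",
                        "path": "/datax/mysql2hdfs/userorc/*",
                        "fileType": "orc",
                        "column": [
                            {"index": 0, "type": "long"},
                            {"index": 1, "type": "string"},
                            {"index": 2, "type": "string"}
                        ]
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "writeMode": "insert",
                        "username": "root",
                        "password": "123456",
                        "column": ["user_id", "user_name", "trade_time"],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://hadoop01:3306/test",
                                "table": ["u2"]
                            }
                        ]
                    }
                }
            }
        ],
        "setting": {
            "speed": {"channel": "1"}
        }
    }
}
```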
3. MySQL to Hive
vi mysql2hive.json:
```json
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "column": ["user_id", "user_name", "trade_time"],
                        "connection": [
                            {
                                "jdbcUrl": ["jdbc:mysql://hadoop01:3306/test"],
                                "table": ["u2"]
                            }
                        ],
                        "username": "root",
                        "password": "123456"
                    }
                },
                "writer": {
                    "name": "hdfswriter",
                    "parameter": {
                        "defaultFS": "hdfs://hadoop02:9000",
                        "fileType": "text",
                        "path": "/user/hive/warehouse/qf24.db/dataxhive1",
                        "fileName": "user",
                        "column": [
                            {"name": "col1", "type": "BIGINT"},
                            {"name": "col2", "type": "STRING"},
                            {"name": "col3", "type": "STRING"}
                        ],
                        "writeMode": "append",
                        "fieldDelimiter": "\t"
                    }
                }
            }
        ],
        "setting": {
            "speed": {"channel": "1"}
        }
    }
}
```

4. Hive to MySQL
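Since a managed Hive table is just files under its warehouse directory, one way to move it back to MySQL is to point hdfsreader at that directory and pair it with mysqlwriter, mirroring the job in section 2. The following is a minimal sketch with assumed names and paths (the dataxhive1 table written in step 3 and the user table as an illustrative target):

```json
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "hdfsreader",
                    "parameter": {
                        "defaultFS": "hdfs://hadoop02:9000",
                        "path": "/user/hive/warehouse/qf24.db/dataxhive1/*",
                        "fileType": "text",
                        "encoding": "UTF-8",
                        "fieldDelimiter": "\t",
                        "column": [
                            {"index": 0, "type": "long"},
                            {"index": 1, "type": "string"},
                            {"index": 2, "type": "string"}
                        ]
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "writeMode": "insert",
                        "username": "root",
                        "password": "123456",
                        "column": ["user_id", "user_name", "trade_time"],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://hadoop01:3306/test",
                                "table": ["user"]
                            }
                        ]
                    }
                }
            }
        ],
        "setting": {
            "speed": {"channel": "1"}
        }
    }
}
```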