Getting Started with DataX
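DataX is the open-source version of Alibaba Cloud DataWorks Data Integration, an offline data synchronization tool/platform widely used within Alibaba Group. DataX provides efficient data synchronization between heterogeneous data sources, including MySQL, Oracle, OceanBase, SqlServer, Postgre, HDFS, Hive, ADS, HBase, TableStore (OTS), MaxCompute (ODPS), Hologres, and DRDS.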
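1. DataX requires a Python environment, so install Python first
 Open the download page on the official site: https://www.python.org/downloads/windows/
 Version 2.6.5 is downloaded and installed here.
 After installation, run python -V to check that Python was installed successfully.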
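As a complement to python -V, the same check can be made from inside the interpreter. The following is only a minimal sketch: the function name describe_interpreter is hypothetical, and it relies on the point made later in this article that the bundled scripts target Python 2 by default.

```python
import sys


def describe_interpreter(version_info=sys.version_info):
    """Return a short note on how this interpreter relates to DataX's Python 2 default."""
    major, minor = version_info[0], version_info[1]
    if major == 2:
        return "Python %d.%d: matches the Python 2 default of the DataX scripts" % (major, minor)
    # Assumption: on Python 3 the scripts in bin may need to be adapted first.
    return "Python %d.%d: the DataX scripts may need to be adapted for Python 3" % (major, minor)


print(describe_interpreter())
```

The function takes the version tuple as a parameter so that it can be exercised with other versions, for example describe_interpreter((2, 6, 5)).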
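2. Download DataX
 Method 1: download the DataX toolkit directly from the DataX download address:
 http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz
After downloading, extract it to a local directory and enter the bin directory to run a synchronization job:
$ cd {YOUR_DATAX_HOME}/bin
$ python datax.py {YOUR_JOB.json}
 Method 2: download the DataX source code and build it yourself:
 https://github.com/alibaba/DataX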
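The download-and-extract step of Method 1 can also be scripted. The sketch below uses only the Python 3 standard library; the function names are hypothetical, and the top-level directory name "datax" inside the archive is an assumption that should be checked against the actual tarball.

```python
import os
import tarfile
import urllib.request

# Download address quoted in this article.
DATAX_URL = "http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz"


def download_datax(archive_path, url=DATAX_URL):
    """Download the DataX tarball to archive_path and return that path."""
    urllib.request.urlretrieve(url, archive_path)
    return archive_path


def extract_datax(archive_path, dest_dir):
    """Extract the tarball into dest_dir and return the expected bin directory."""
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(dest_dir)
    # Assumption: the archive unpacks into a top-level "datax" directory.
    return os.path.join(dest_dir, "datax", "bin")
```

A possible use would be extract_datax(download_datax("datax.tar.gz"), "."), after which the returned directory should contain datax.py.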
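DataX directory structure
 The bin directory contains the Python scripts, which are mainly used to run job files (they depend on Python 2 by default, but can be modified to work with Python 3).
The conf directory holds configuration files.
The job directory holds a sample job file for testing (temporary job files generated through datax-web are not placed here; their storage directory is configured in datax-web itself).
The lib directory holds the JAR dependencies.
The log directory holds the execution logs of job files.
The plugin directory holds the plugins that read from (Reader) and write to (Writer) the different data sources.
If the Reader or Writer you need is not in the plugin directory, you have to install it manually (for example, the ES Reader and Writer).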
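To see which Readers and Writers are already present before writing a job, the plugin directory can be listed with a few lines of Python. This is a sketch under the assumption that plugins live in plugin/reader/{name} and plugin/writer/{name}; the writer half of that layout appears in a path later in this article, while the reader half is assumed by symmetry. The function names are hypothetical.

```python
import os


def list_plugins(datax_home):
    """Return {"reader": [...], "writer": [...]} with the plugin directory names found."""
    found = {}
    for kind in ("reader", "writer"):
        kind_dir = os.path.join(datax_home, "plugin", kind)
        if os.path.isdir(kind_dir):
            found[kind] = sorted(
                name for name in os.listdir(kind_dir)
                if os.path.isdir(os.path.join(kind_dir, name))
            )
        else:
            # A missing directory is reported as "no plugins" rather than an error.
            found[kind] = []
    return found


def has_plugin(datax_home, kind, name):
    """Check whether a given reader or writer plugin directory exists."""
    return name in list_plugins(datax_home).get(kind, [])
```

For the job used in this article, has_plugin(datax_home, "reader", "txtfilereader") and has_plugin(datax_home, "writer", "mysqlwriter") would both need to be true.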
Running a job file with DataX
python datax.py {YOUR_JOB.json}
The job file template for loading a txt file into MySQL is as follows (using MySQL as the example):
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "txtfilereader",
          "parameter": {
            "column": [
              {"index": 0, "type": "long"},
              {"index": 1, "type": "string"},
              {"index": 2, "type": "string"},
              {"index": 3, "type": "string"},
              {"index": 4, "type": "string"},
              {"index": 5, "type": "string"},
              {"index": 6, "type": "string"},
              {"index": 7, "type": "string"},
              {"index": 8, "type": "string"},
              {"index": 9, "type": "date", "format": "yyyy-MM-dd HH:mm:ss"},
              {"index": 10, "type": "string"},
              {"index": 11, "type": "date", "format": "yyyy-MM-dd HH:mm:ss"},
              {"index": 12, "type": "long"}
            ],
            "encoding": "UTF-8",
            "fieldDelimiter": ",",
            "path": ["C:/Users/jxk/Desktop/tst.txt"]
          }
        },
        "writer": {
          "name": "mysqlwriter",
          "parameter": {
            "column": [
              "id", "project_type", "attach_type", "attach_name", "attach_url",
              "attach_key", "attach_hash", "attach_size", "created_by", "created_date",
              "last_updated_by", "last_updated_date", "version"
            ],
            "connection": [
              {
                "jdbcUrl": "jdbc:mysql://8.68.24.3:3306/testkettle?characterEncoding=utf-8&serverTimezone=Asia/Shanghai",
                "table": ["comm_attachment"]
              }
            ],
            "password": "274100",
            "preSql": ["delete from comm_attachment"],
            "session": [],
            "username": "root",
            "writeMode": "insert"
          }
        }
      }
    ],
    "setting": {
      "speed": {"channel": "5"}
    }
  }
}
The content of C:/Users/jxk/Desktop/tst.txt is as follows:
1,sunnyDay,image/png,ttt.png,http://qyn6nlamm.hd-bkt.clouddn.com/Frv7wnlpCWpjlUq-qWFPrjQdm1A, tst,Frv7wnlpCWpjlUq-qWFPrjQdm1AI,44kb,anonymous,2021-09-16 16:52:38,anonymous,2021-09-16 16:52:38,0
2,sunnyDay,image/png,ttb.png,http://qyn6nlamm.hd-bkt.clouddn.com/Frv7wnlpCWpjlUq-qWFPrjQdm1A, tsb,Frv7wnlpCWpjlUq-qWFPrjQdm1AI,44kb,anonymous,2021-09-16 16:52:38,anonymous,2021-09-16 16:52:38,0
The table creation script is as follows:
CREATE TABLE `comm_attachment` (
  `id` int NOT NULL AUTO_INCREMENT COMMENT 'Primary key',
  `project_type` varchar(100) COLLATE utf8_unicode_ci DEFAULT NULL COMMENT 'Project name: which project the attachment belongs to',
  `attach_type` varchar(100) COLLATE utf8_unicode_ci DEFAULT NULL COMMENT 'Attachment type',
  `attach_name` varchar(200) COLLATE utf8_unicode_ci DEFAULT NULL COMMENT 'Attachment name',
  `attach_url` varchar(500) COLLATE utf8_unicode_ci DEFAULT NULL COMMENT 'Attachment download URL',
  `attach_key` varchar(500) COLLATE utf8_unicode_ci DEFAULT NULL COMMENT 'Attachment key',
  `attach_hash` varchar(500) COLLATE utf8_unicode_ci DEFAULT NULL COMMENT 'Attachment hash',
  `attach_size` varchar(100) COLLATE utf8_unicode_ci DEFAULT NULL COMMENT 'Attachment size',
  `created_by` varchar(100) COLLATE utf8_unicode_ci DEFAULT NULL COMMENT 'Created by',
  `created_date` timestamp NULL DEFAULT NULL COMMENT 'Creation time',
  `last_updated_by` varchar(100) COLLATE utf8_unicode_ci DEFAULT NULL COMMENT 'Last modified by',
  `last_updated_date` timestamp NULL DEFAULT NULL COMMENT 'Last modification time',
  `version` int DEFAULT NULL COMMENT 'Optimistic lock: version number',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8mb3 COLLATE=utf8_unicode_ci COMMENT='Attachment table';
Command to run the job:
 
python datax.py C:\Users\jxk\Desktop\abc.json
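In the sample job, the 13 reader columns are paired by position with the 13 writer columns, so a mismatch in their counts is an easy mistake to make when adapting the template. The sketch below checks this before the job is submitted; check_job and check_job_file are hypothetical helper names, not part of DataX, and the check covers only the column counts.

```python
import json


def check_job(job):
    """Return a list of problems found in a DataX job dictionary (empty if none)."""
    problems = []
    contents = job.get("job", {}).get("content", [])
    for position, item in enumerate(contents):
        reader_columns = item["reader"]["parameter"].get("column", [])
        writer_columns = item["writer"]["parameter"].get("column", [])
        # Columns are matched by position, so the two lists should be the same length.
        if len(reader_columns) != len(writer_columns):
            problems.append(
                "content[%d]: reader has %d columns, writer has %d"
                % (position, len(reader_columns), len(writer_columns))
            )
    return problems


def check_job_file(path):
    """Load a job file (which must be valid JSON) and run check_job on it."""
    with open(path, encoding="utf-8") as handle:
        return check_job(json.load(handle))
```

Because json.load raises an error on malformed input, check_job_file also catches basic JSON syntax mistakes in the job file.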
Execution result: (screenshot omitted)
 Check the data in the database: (screenshot omitted)
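Problems you may encounter while running a job:
 Problem: when using the DataX plugin to import data from Hive into MySQL, writing to MySQL fails with the error: Could not retrieve transation read-only status server
 Solution: make the MySQL driver version used by DataX match the database version (the error is caused by a MySQL driver version mismatch).
 - Check the MySQL version (for example with SELECT VERSION();).
 - Check the MySQL driver version in the DataX plugin:
/datax/plugin/writer/mysqlwriter/libs$ ls mysql-connector*
mysql-connector-java-5.1.34.jar
 - Download the matching MySQL driver version: https://static.runoob.com/download/mysql-connector-java-8.0.16.jar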
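The version comparison described above can be sketched in Python. The helper names are hypothetical, and comparing only the major version is a rough heuristic chosen for illustration, not an official compatibility rule; the server version string is assumed to look like the output of SELECT VERSION().

```python
import re


def connector_version(jar_name):
    """Parse (major, minor, patch) from a name like mysql-connector-java-5.1.34.jar."""
    match = re.search(r"mysql-connector-java-(\d+)\.(\d+)\.(\d+)", jar_name)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def same_major(jar_name, server_version):
    """Heuristic: does the driver's major version equal the server's major version?"""
    driver = connector_version(jar_name)
    server = re.match(r"(\d+)", server_version)
    if driver is None or server is None:
        return False
    return driver[0] == int(server.group(1))
```

With the jar shown above, connector_version("mysql-connector-java-5.1.34.jar") gives (5, 1, 34), which would not match a server whose version string starts with 8.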
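 Problem: Illegal/unsupported escape sequence near index 3
 Pay attention to how the path is written in the JSON file.
 Correct form (forward slashes, as in the job file above):
C:/Users/jxk/Desktop/tst.txt
 Incorrect form:
C:\\Users\\jxk\\Desktop\\tst.txt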
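When job files are generated by a script, the path can be normalized to forward slashes before it is written out. The sketch below shows one way to do that; to_job_path is a hypothetical helper. A plausible explanation for the error, which the source does not confirm, is that the doubled backslashes decode to single backslashes when the JSON is parsed and are then treated as escape characters further down the line.

```python
import json


def to_job_path(windows_path):
    """Rewrite a Windows path with forward slashes so it contains no backslashes."""
    return windows_path.replace("\\", "/")


raw = r"C:\Users\jxk\Desktop\tst.txt"
# Prints {"path": ["C:/Users/jxk/Desktop/tst.txt"]}
print(json.dumps({"path": [to_job_path(raw)]}))
```

The resulting "path" value has the same form as the one in the working job file shown earlier in this article.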
 
                            
