如何在阿里云上使用Data Lake Analytics分析Table Store数据
0. Data Lake Analytics(簡稱DLA)介紹
數(shù)據(jù)湖(Data Lake)是時下熱門的概念,更多閱讀可以參考:
https://en.wikipedia.org/wiki/Data_lake
基于數(shù)據(jù)湖,可以不用做任何ETL、數(shù)據(jù)搬遷等過程,實現(xiàn)跨各種異構(gòu)數(shù)據(jù)源進行大數(shù)據(jù)關(guān)聯(lián)分析,從而極大的節(jié)省成本和提升用戶體驗。
以及AWS和Azure關(guān)于Data Lake的解讀:
https://amazonaws-china.com/big-data/datalakes-and-analytics/what-is-a-data-lake/
https://azure.microsoft.com/en-us/solutions/data-lake/
終于,阿里云現(xiàn)在也有了自己的數(shù)據(jù)湖分析產(chǎn)品:https://www.aliyun.com/product/datalakeanalytics
可以點擊申請使用(目前公測階段還屬于邀測模式),體驗本教程分析OTS數(shù)據(jù)之旅。
產(chǎn)品文檔:https://help.aliyun.com/product/70174.html
1. Table Store(簡稱OTS)介紹
關(guān)于Table Store的詳細介紹,請看:https://help.aliyun.com/document_detail/27280.html
2. DLA和OTS等存儲引擎間的關(guān)系
- DLA是上層MPP架構(gòu)的SQL執(zhí)行引擎,以MySQL語法作為語法API而實現(xiàn),解決實時OLAP分析需求;
- OTS是下層數(shù)據(jù)存儲引擎,基于LSM實現(xiàn),與HBase、BigTable有類似的設(shè)計和實現(xiàn);
- DLA支持連接多種存儲引擎,除了OTS,還有OSS、ADS、RDS等,并且這些引擎可以做實時混合分析;
- DLA在計算一個Query時,通過OTS核心接口,查詢下層數(shù)據(jù)并參與上層分析;
- DLA是大小寫不敏感的,而OTS是大小寫敏感的;
3. OTS和DLA元信息映射邏輯
- 庫和表等概念映射
| 實例(instance) | schema或database,不同的用戶不同的叫法 |
| 表(table) | table |
| 主鍵列(pk) | column,isPrimaryKey=true,isNullable=false |
| 非主鍵列(column) | column,isPrimaryKey=false,isNullable=<看用戶的DDL定義> |
- 字段的映射關(guān)系
| INTEGER(8bytes) | bigint(8bytes) |
| STRING | varchar |
| BINARY | varbinary |
| DOUBLE | double |
| BOOLEAN | byte |
4. 購買OTS的實例,并寫入數(shù)據(jù)
下面,我們開始真正的操作:
- 開通OTS服務(wù),并購買OTS實例:https://help.aliyun.com/document_detail/27293.html
- 進入OTS的管理控制臺,選擇合適的region,創(chuàng)建實例和表:https://ots.console.aliyun.com/index
- 當然,也可以選擇通過SDK來創(chuàng)建表并寫入數(shù)據(jù):https://help.aliyun.com/document_detail/43005.html
5. 查看OTS的實例,獲取關(guān)鍵信息
下面,我就以我們的測試數(shù)據(jù),來開啟整個過程(跳過具體的申請步驟):
1. 查看目前DLA已開通的Region,并確保與你的OTS在同一個Region:
2. 進入OTS管理控制臺,選擇杭州Region,查看我的實例(標準TPC-H生成的測試集,有8張表;已提前建好庫表,并通過SDK寫入了數(shù)據(jù)):
3. 查看實例信息,看到相關(guān)的__endpoint(DLA目前支持公網(wǎng),所以請選擇私網(wǎng))__,這里以hz-tpch-1x-vol作測試:
4.查看nation表定義(表名、主鍵名、主鍵類型、多主鍵順序等)和數(shù)據(jù),用作后續(xù)對比測試:
6. 用戶開通DLA賬號步驟:
- 用戶具備了阿里云賬號(主賬號);
- 用戶進入產(chǎn)品介紹頁,開通DLA并進入控制臺:https://www.aliyun.com/product/datalakeanalytics
- 等用戶開通之后,會在你的短信、站內(nèi)信、郵箱收到賬號相關(guān)的信息(內(nèi)容模板可能會升級):
- 用戶通過在頁面上查看一下,得到如下的訪問入口信息:
- 如下是基于mysql/jdbc方式通過公網(wǎng)經(jīng)典endpoint連接到dla杭州集群:
7. DLA和OTS網(wǎng)絡(luò)連通性問題
目前DLA和OTS服務(wù)之間,通過VPC相關(guān)的策略,是直接為用戶打通網(wǎng)絡(luò)環(huán)境的,用戶無需擔心這個過程。但DLA目前不支持公網(wǎng)訪問,請__務(wù)必使用OTS的VPC Endpoint!__
8. 使用DLA,連接你的OTS,進行查詢和分析
注:我們是多租戶場景的,所以新用戶剛進去時看不到任何庫表;
1)創(chuàng)建自己的DLA庫(相關(guān)信息從上述過程中查找):
mysql> create database hangzhou_ots_test with dbproperties (catalog = 'ots',location = 'https://hz-tpch-1x-vol.cn-hangzhou.vpc.tablestore.aliyuncs.com',instance = 'hz-tpch-1x-vol' );Query OK, 0 rows affected (0.23 sec)#hangzhou_ots_test ---請注意庫名,允許字母、數(shù)字、下劃線 #catalog = 'ots', ? ---指定為ots,是為了區(qū)分其他數(shù)據(jù)源,比如oss、rds等 #location = 'https://xxx' ? ---ots的endpoint,從實例上可以看到 #instance = 'hz-tpch-1x-vol' ? ---指定instance名,因為endpoint可以不帶實例名;最終映射到DLA的schema2)查看自己創(chuàng)建的庫:
mysql> show databases; +------------------------------+ | Database | +------------------------------+ | hangzhou_ots_test | +------------------------------+ 1 rows in set (0.22 sec)mysql> show create database hangzhou_ots_test; +-------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Database | Create Database | +-------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | hangzhou_ots_test | CREATE DATABASE `hangzhou_ots_test` WITH DBPROPERTIES (CATALOG = 'ots',LOCATION = 'https://hz-tpch-1x-vol.cn-hangzhou.vpc.tablestore.aliyuncs.com',INSTANCE = 'hz-tpch-1x-vol' ) | +-------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.31 sec)3)查看自己的DLA表:
mysql> use hangzhou_ots_test; Database changedmysql> show tables; Empty set (0.30 sec)4)創(chuàng)建DLA表,映射到OTS的表:
mysql> CREATE EXTERNAL TABLE `nation` (`N_NATIONKEY` int not NULL ,`N_COMMENT` varchar(100) NULL ,`N_NAME` varchar(100) NULL ,`N_REGIONKEY` int NULL ,PRIMARY KEY (`N_NATIONKEY`) ); Query OK, 0 rows affected (0.36 sec)## `N_NATIONKEY` int not NULL ---- 如果是主鍵的話,必須要not null ## PRIMARY KEY (`N_NATIONKEY`) ?---- 務(wù)必與ots中的主鍵順序相同;名稱的話也要對應(yīng)5)查看自己創(chuàng)建的表和相關(guān)的DDL語句:
mysql> show tables; +------------+ | Table_Name | +------------+ | nation | +------------+ 1 row in set (0.35 sec)mysql> show create table nation; +--------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Table | Create Table | +--------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | nation | CREATE EXTERNAL TABLE `nation` (`n_nationkey` int not NULL COMMENT '',`n_comment` varchar(100) NULL COMMENT '',`n_name` varchar(100) NULL COMMENT '',`n_regionkey` int NULL COMMENT '',PRIMARY KEY (`n_nationkey`) ) TBLPROPERTIES (COLUMN_MAPPING = 'n_nationkey,N_NATIONKEY; n_comment,N_COMMENT; n_name,N_NAME; n_regionkey,N_REGIONKEY; ') COMMENT '' | +--------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.30 sec)6)開始查詢和分析(沒有做太復(fù)雜的query;用戶可以分析自己的數(shù)據(jù),符合mysql的語法)
mysql> select count(*) from nation; +-------+ | count(*) | +-------+ | 25 | +-------+ 1 row in set (1.19 sec)mysql> select * from nation; +-------------+--------------------------------------------------------------------------------------------------------------------+----------------+-------------+ | n_nationkey | n_comment | n_name | n_regionkey | +-------------+--------------------------------------------------------------------------------------------------------------------+----------------+-------------+ | 0 | haggle. carefully final deposits detect slyly agai | ALGERIA | 0 | | 1 | al foxes promise slyly according to the regular accounts. bold requests alon | ARGENTINA | 1 | | 2 | y alongside of the pending deposits. carefully special packages are about the ironic forges. slyly special | BRAZIL | 1 | | 3 | eas hang ironic, silent packages. slyly regular packages are furiously over the tithes. fluffily bold | CANADA | 1 | | 4 | y above the carefully unusual theodolites. final dugouts are quickly across the furiously regular d | EGYPT | 4 | | 5 | ven packages wake quickly. regu | ETHIOPIA | 0 | | 6 | refully final requests. regular, ironi | FRANCE | 3 | | 7 | l platelets. regular accounts x-ray: unusual, regular acco | GERMANY | 3 | | 8 | ss excuses cajole slyly across the packages. deposits print aroun | INDIA | 2 | | 9 | slyly express asymptotes. regular deposits haggle slyly. carefully ironic hockey players sleep blithely. carefull | INDONESIA | 2 | | 10 | efully alongside of the slyly final dependencies. | IRAN | 4 | | 11 | nic deposits boost atop the quickly final requests? quickly regula | IRAQ | 4 | | 12 | ously. final, express gifts cajole a | JAPAN | 2 | | 13 | ic deposits are blithely about the carefully regular pa | JORDAN | 4 | | 14 | pending excuses haggle furiously deposits. pending, express pinto beans wake fluffily past t | KENYA | 0 | | 15 | rns. blithely bold courts among the closely regular packages use furiously bold platelets? | MOROCCO | 0 | | 16 | s. ironic, unusual asymptotes wake blithely r | MOZAMBIQUE | 0 | | 17 | platelets. blithely pending dependencies use fluffily across the even pinto beans. carefully silent accoun | PERU | 1 | | 18 | c dependencies. furiously express notornis sleep slyly regular accounts. ideas sleep. depos | CHINA | 2 | | 19 | ular asymptotes are about the furious multipliers. express dependencies nag above the ironically ironic account | ROMANIA | 3 | | 20 | ts. silent requests haggle. closely express packages sleep across the blithely | SAUDI ARABIA | 4 | | 21 | hely enticingly express accounts. even, final | VIETNAM | 2 | | 22 | requests against the platelets use never according to the quickly regular pint | RUSSIA | 3 | | 23 | eans boost carefully special requests. accounts are. carefull | UNITED KINGDOM | 3 | | 24 | y final packages. slow foxes cajole quickly. quickly silent platelets breach ironic accounts. unusual pinto be | UNITED STATES | 1 | +-------------+--------------------------------------------------------------------------------------------------------------------+----------------+-------------+ 25 rows in set (1.63 sec)從圖中的id,可以看到,與ots中的數(shù)據(jù)相同:
9. 其他相關(guān)的文檔參考:
- Data Lake Analytics使用場景:https://help.aliyun.com/document_detail/70380.html
- OLAP on TableStore——基于Data Lake Analytics的Serverless SQL大數(shù)據(jù)分析https://yq.aliyun.com/articles/618501
- 使用Data Lake Analytics從OSS清洗數(shù)據(jù)到AnalyticDB:https://yq.aliyun.com/articles/623401
- 使用Data Lake Analytics 分析OSS數(shù)據(jù):https://help.aliyun.com/document_detail/70387.html
- Data Lake Analytics數(shù)據(jù)庫的連接方式:https://help.aliyun.com/document_detail/71074.html
總結(jié)
以上是生活随笔為你收集整理的如何在阿里云上使用Data Lake Analytics分析Table Store数据的全部內(nèi)容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: python学习中的bug
- 下一篇: 我是如何使用python控制迅雷自动下载