MapReduce基础开发之七Hive外部表分区
MapReduce輸出的數據在實際應用中會加載在不同子目錄下,比如按照日期、地區等,然后統一到外部表,這就需要用到hive表的分區。
假設輸出的數據放在/tmp/fjs/dpi父目錄下,下面分別有三個類型的數據作為子目錄保存,分別是/tmp/fjs/dpi/adsl、/tmp/fjs/dpi/3g、/tmp/fjs/dpi/4g,具體分區如下:
Hive表:tmp_fjs_dpi
1)?建表腳本:
設置分區為DPI類型,分別是adsl、3g、4g
CREATE EXTERNAL TABLE`tmp_fjs_dpi`(
? `account` string,
`date` string,
`city` string,
`domain` string,
? `para` string,
? `value` string)
PARTITIONED BY (
? `dpitype` string)
ROW FORMAT DELIMITED
? FIELDS TERMINATED BY '|'
STORED AS INPUTFORMAT
? 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
?'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
? 'hdfs://nameservice-ha/tmp/fjs/dpi';
2)分區腳本:
查看分區:show partitionstmp_fjs_dpi;
增加分區:
alter table tmp_fjs_dpiadd partition (dpitype ='adsl') location 'adsl';
alter table tmp_fjs_dpiadd partition (dpitype ='3g') location '3g';
alter table tmp_fjs_dpiadd partition (dpitype ='4g') location '4g';
刪除分區:
alter table tmp_fjs_dpidrop partition (dpitype ='4g');
3)查詢結果:
select count(*) fromtmp_fjs_dpi where dpitype='adsl';
結果:92564960
select count(*) fromtmp_fjs_dpi where dpitype='3g';
結果:636277
select count(*) fromtmp_fjs_dpi where dpitype='4g';
結果:3828847
select para, count(*)from tmp_fjs_dpi group by para;
結果:
imei??? 97030084
?
總結
以上是生活随笔為你收集整理的MapReduce基础开发之七Hive外部表分区的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 算法导论之动态规划(最长公共子序列和最优
- 下一篇: centos配置yum本地源