Featured image of post Hive笔记

Hive笔记

取前X条记录相关:行编号

如果一个表的列A有多条记录,想取对于按列B排序下的前X条记录。一般使用row_number() over(partition by xxx order by yyy)

1
2
3
4
5
6
select xxxx
from (select xxxx,
    row_number() over(partition by A order by B) as row_num
    from tab_yyy
    )
where row_num <= X

字符串截取

从某处开始截取字符串,并且截取多少长度: substr(str, start, length)。值得注意的是这里的start的index是从1开始!而第二个参数是截取长度而非结束位置!第二个参数为可选,如果不填则截取到字符率结束

1
2
3
4
select substr('I love u', 3, 2) as res1,
       substr('I love u', 4) as res2

>>输出: res1: 'lo', res2: 'ove u'

Json相关

获取json中某个key的值,使用:.key。其中$符号代表json根节点

1
2
3
select get_json_object('{"viewdata": {"clickdata": 3}}', '$.viewdata.clickdata') as res

>>输出: 3

获取json list中index为x的元素的值,使用下标符号:[x]。如果是在根节点取index,在某些引擎可能需要对$加上\\转义(即$[0]\\$[0])来防止执行错误

1
2
3
select get_json_object('[{"viewdata": [{"clickdata": 1}, {"clickdata": 3}]}, {"viewdata": [{"clickdata": 18}]}]', 'S[0].viewdata[1].clickdata') as res

>>输出: 3

获取json list中每个元素中某个key的值的list,使用下标符号并且里面填*代表遍历所有元素:[*]。值得注意的是输出是string,而非list!如果有list需要可用substr去除最开始和结束的[]然后用split函数分割。如果是在根节点取遍历,在某些引擎可能需要对$加上\\转义(即$[0]\\$[0])来防止执行错误

1
2
3
select get_json_object{'[{"viewdata": {"clickdata": 3}},{"viewdata": {"clickdata": 18}}]', '$[*].viewdata.clickdata') as res

>>输出: [3,18]

日期格式转换

将时间str转为时间戳:unix_timestamp(dt, dt_format)

将时间戳转为时间str:from_unixtime(time_stamp, dt_format)

1
2
3
select from_unixtime(unix_timestamp('20230909', 'yyyyMMdd'), 'yyyy-MM-dd') as res

>>输出: 2023-09-09

列转行

explode: 将一个array或map字段转成多行

1
2
3
4
5
6
select explode(array('A', 'B', 'C')) as res

>>输出:
A
B
C
1
2
3
4
5
6
select explode(map('A',10, 'B',8, 'C',4)) as (k,v)

>>输出:
A 10
B 8
C 4

posexplode:同explode,但只能转array,相对的其会返回两个值:list中元素的位置n及元素本身

1
2
3
4
5
6
select posexplode(array('A', 'B', 'C')) as (n, res1)

>>输出:
0 A
1 B
2 C

lateral view:一般会和explode,posexplode这些列转行操作符同时使用,用来将列换行后的列和其他字段进行join,从而产生多行。格式一般类似lateral view explode(col_name) tmp_table_name as out_col_name

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
with example as (
    select 'aa' as col1,
    array('A', 'B', 'C') as co12,
    array('1', '2', '3') as col3,
    1 as co14
)

select col1, ele
from example
lateral view explode(col2) tmp_table as ele
where co14 = 1

>>输出:
aa A
aa B
aa C
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
with example as (
    select 'aa' as col1,
    array('A', 'B', 'C') as co12,
    array('1', '2', '3') as col3,
    1 as co14
)

select coll, ele, col3[pos] as ele2
from example
lateral view posexplode(co12) tmp_table as pos, ele
where co14 = 1

>>输出:
aa A 1
aa B 2
aa C 3
Licensed under CC BY-NC-SA 4.0
Built with Hugo
Theme Stack designed by Jimmy