• 如果您觉得本站非常有看点,那么赶紧使用Ctrl+D 收藏吧

Hive_语法_行列转换

hive 来源:lang 3次浏览

[TOC]

用Hive处理数据的时候经常会遇到行列互相转换的需求,总结并记录一下行列转行的常见场景和操作语法

其中所有的操作都可以直接复制语句去自己的hive执行,查看结果

前置知识

执行hive或beeline进入,执行
desc function explode;查看函数说明;

explode(a) - separates the elements of array a into multiple rows, or the elements of a map into multiple rows and columns
    可以将数组炸开成多行,或者将map炸开成多行多列,是Hive内置的UDTF

split(str, regex) - Splits str around occurances that match regex
    按照正则规则去切割字符串
    
collect_list(x) - Returns a list of objects with duplicates
    返回不去重的集合
    
collect_set(x) - Returns a set of objects with duplicate elements eliminated
    返回一个去重的集合

concat_ws(separator, [string | array(string)]+) - returns the concatenation of the strings separated by the separator
    返回一个特定分隔符的拼接字符串

max(expr) - Returns the maximum value of expr
    返回表达式的最大值

准备数据

创建一个name,subject,score的期末考试成绩表,代表每个同学每个学科的成绩

导入数据也可以自定义文本然后load也行,sql语句一步到位,测试简单方便

  • 用于行专列的数据,后称表1
create table school_final_test as
select 'jack' as name, 'english' as subject, 70 as score union all
select 'jack' as name, 'math' as subject, 80 as score union all
select 'jack' as name, 'chinese' as subject, 90 as score union all
select 'tim' as name, 'english' as subject, 10 as score union all
select 'tim' as name, 'math' as subject, 20 as score union all
select 'tim' as name, 'chinese' as subject, 30 as score;

表1数据:

name subject score
jack english 70
jack math 80
jack chinese 90
tim english 10
tim math 20
tim chinese 30
  • 用于列转行的数据,后称表2
create table school_final_test1 as
select 'jack' as name, 70 as english,80 as math, 90 as chinese union all
select 'tim' as name, 10 as english,20 as math, 30 as chinese;

表2数据:

name english math chinese
jack 70 80 90
tim 10 20 30

开始测试

需求1

多行转多列,数据源是表1

结果表:

name english math chinese
jack 70 80 90
tim 10 20 30
  • group by + max + case when语法
  • 巧妙的使用了数值型和0之间的大小关系
select name,
max(case subject when 'english' then score else 0 end) as english,
max(case subject when 'math' then score else 0 end) as math,
max(case subject when 'chinese' then score else 0 end) as chinese
from school_final_test
group by name;
  • 疑问: 如果需要多行转多列的不是数值型,而是str类型是否仍可行呢?
select max(str) from
    (select 'str' as str union all
    select 'sts' as str union all
    select null as str)t1; -- result : sts
  • 很明显,如果是字符串,可以使用null作为默认值,那么max仍然可行

需求2

多行转单列,数据源是表1

结果表:

name scores
jack english:70,math:80,chinese:90
tim english:10,math:20,chinese:30
  • group by + collect_list + concat_ws语法
  • 注意:函数中要求的类型
select name,concat_ws(',',
    collect_list(
        concat_ws(':',subject,cast(score as string))
        )
    ) as scores
from school_final_test
group by name;

需求3

多列转多行,数据源是表2

结果表:

name subject score
jack english 70
jack math 80
jack chinese 90
tim english 10
tim math 20
tim chinese 30
  • 将每列需要的数据单独提取出来合并即可
  • 注意: union和union all的区别是是否去重
select name,'english' as subject,english as score from school_final_test1 
union all 
select name,'math' as subject,math as score from school_final_test1 
union all 
select name,'chinese' as subject,chinese as score from school_final_test1;

需求4

单行转多列

数据源:

create table school_final_test2 as
select name,concat_ws(',',
    collect_list(
        concat_ws(':',subject,cast(score as string))
        )
    ) as scores
from school_final_test
group by name;
name scores
jack english:70,math:80,chinese:90
tim english:10,math:20,chinese:30

结果表:

name scores
jack english:70
jack math:80
jack chinese:90
tim english:10
tim math:20
tim chinese:30
  • split + explode语法

Lateral View是一个连接用户自定义函数(UDFs)的连接词,可以有多个;如果炸裂的列中有null值但是需要显示可以在view后面加outer,类似表的左外连接;

  • 其中table1表示开窗得到的新表别名,必须写但可不使用(不使用时必须保证没有重复的列名)
select name,table1.scores as scores
from school_final_test2
lateral view explode(split(scores,',')) table1 as scores;
  • 如果需要将结果表转换成表1的形式,可以继续对scores字段按照:进行拆分,然后用数组角标方式{{{arrya[index]}}}获取对应字段
select name,split(scores,':')[0] as subject,split(scores,':')[1] as score from (
    select name,table1.scores as scores
    from school_final_test2
    lateral view explode(split(scores,',')) table1 as scores
)t1;

需求5

将字符串"keyl=valuel&key2=value2...keyn=valuen"进行分割放到一个字段里面,可以查出任意一个keyx对应的valuex值

  • 核心思路就是将字符串内数据疯转在map结构中,可以使用内置函数str_to_map,如果有其他更复杂的操作可自定义UDF;
  • str_to_map(text, delimiter1, delimiter2) – Creates a map by parsing text; Split text into key-value pairs using two delimiters. The first delimiter seperates pairs, and the second delimiter sperates key and value. If only one parameter is given, default delimiters are used: ‘,’ as delimiter1 and ‘=’ as delimiter2.*
select result['key1'] from 
    (select str_to_map('key1=value1&key2=value2&keyn=valuen','&','=') as result)t1;

版权声明:本文转自网络文章,转载此文章仅为分享知识,如有侵权,请联系管理员进行删除。
喜欢 (0)