hive里经常需要将字符串转化为date或timestamp 或者转化为日期格式的string
先说一个简单的 cast(xx as date/string/timestamp) 这个大多情况都可以用
1.to_date
to_date只保留年月日,参数必须是string类型的yyyy-MM-dd HH:mm:ss或者date或timestamp类型,返回值是date类型,注意这个返回类型,这个是少数返回date类型的函数
---2023-03-21----补充下
这个函数还有一个特别好用的用法
select to_date('2022-1-2') ->也是可以转化为date的。 由此引申
select to_date(replace('2022/1/2','/','-')) 得到标准的date 由此再引申
select date_format(to_date(replace('2022/1/2','/','-')),'yyyy-MM-dd') 转化非标准时间为标准
select "to_date('2009-07-30 04:17:52')",to_date('2009-07-30 04:17:52') union all
select "to_date('2009-07-30 04')",to_date('2009-07-30 04') union all
select "to_date(current_date)",to_date(current_date) union all
select "to_date(current_timestamp)",to_date(current_timestamp)
2.date_format
date_format(date/timestamp/string, fmt) - converts a date/timestamp/string to a value of string in the format specified by the date format fmt.
date_format 参数1可以是date timestamp 和string类型,第二个是format格式(yyyy-MM-dd hh:mm:ss),返回值是string。
select "date_format('2009-07-30 12:13:14','yyyyMMddHHmmss')", date_format('2009-07-30 12:13:14','yyyyMMddHHmmss') union all
select "date_format(current_date,'yyyy-MM-dd HH:mm:ss')", date_format(current_date,'yyyy-MM-dd HH:mm:ss') union all
select "date_format(current_timestamp,'yyyy-MM-dd HH:mm:ss') ",date_format(current_timestamp,'yyyy-MM-dd HH:mm:ss')
3.unix_timestamp 和
unix_timestamp(date[, pattern]) - Converts the time to a number
这个是把时间转化为时间戳的 也就是我们常说的 时间到1970-01-01 过了多少ms,返回值bigint
from_unixtime(unix_time, format) - returns unix_time in the specified format
这个就是把时间戳转化为时间格式
select unix_timestamp(current_date) --1671379200
select unix_timestamp('20221219','yyyyMMdd') --1671379200
select unix_timestamp('20221219','yyyyMM') --4843814400 注意这里错了,格式一定要对
select from_unixtime(1671379200,'yyyyMM')--202212
这两个函数一般联合使用。
select "from_unixtime(UNIX_TIMESTAMP('20221219','yyyyMMdd'))",from_unixtime(UNIX_TIMESTAMP('20221219','yyyyMMdd'))union all
select "from_unixtime(UNIX_TIMESTAMP('2022/12/19','yyyy/MM/dd'))",from_unixtime(UNIX_TIMESTAMP('2022/12/19','yyyy/MM/dd'))
4.date_add
date_add(start_date, num_days) - Returns the date that is num_days after start_date.
start_date is a string in the format 'yyyy-MM-dd HH:mm:ss' or 'yyyy-MM-dd'. num_days is a number. The time part of start_date is ignored.
可以是string类型的日期,或者date类型或timestamp类型,返回值为date类型
select "date_add('2022-12-12',1)",date_add('2022-12-12',1) union all
select "date_add('2022-12-12 12:00:00', 1)",date_add('2022-12-12 12:00:00', 1)union all
select "date_add('2022/12/12 12:00:00', 1)",date_add('2022/12/12 12:00:00', 1) union all
select "date_add(CURRENT_DATE,1)",date_add(CURRENT_DATE,1) union all
select "date_add(CURRENT_timestamp,1)",date_add(CURRENT_timestamp,1)
5.add_months
add_months(start_date, num_months, output_date_format) - Returns the date that is num_months after start_date.
start_date is a string or timestamp indicating a valid date. num_months is a number. output_date_format is an optional String which specifies the format for output.
The default output format is 'YYYY-MM-dd'.
这个函数第一个入参是string类型的日期,或date,或者timestamp,第二个是增加的月份可以为负数,第三个是转化的类型,最后返回值是string类型
select "add_months('2009-07-30',1)",add_months('2009-07-30',1) union all
select "add_months(current_date,1)",add_months(current_date,1) union all
select "add_months(current_timestamp,1)",add_months(current_timestamp,1) union all
select "add_months(current_date,1,'yyyy-MM-dd HH:mm:ss')",add_months(current_date,1,'yyyy-MM-dd HH:mm:ss') union all
select "add_months('2009-07-30 12:13:14',1,'yyyyMMdd')",add_months('2009-07-30 12:13:14',1,'yyyyMMdd')
其实这个函数应该是分为两部分,第一部分计算增加月份,第二部分格式化date_format
---------------这么快有人收藏那我继续更新下-----------------------
6.datediff
datediff(date1, date2) - Returns the number of days between date1 and date2
date1 and date2 are strings in the format 'yyyy-MM-dd HH:mm:ss' or 'yyyy-MM-dd'. The time parts are ignored.If date1 is earlier than date2, the result is negative.
返回值是int 类型,比较的是天数,忽略时分秒
select 'datediff(current_date(),current_timestamp())',datediff(current_date(),current_timestamp()) union all
select "datediff('2022-01-02','2022-01-03')",datediff('2022-01-02','2022-01-03') union all
select "datediff('2022-01-02',current_date)",datediff('2022-01-02',current_date)
7.month_between()
months_between(date1, date2, roundOff) - returns number of months between dates date1 and date2
If date1 is later than date2, then the result is positive. If date1 is earlier than date2, then the result is negative. If date1 and date2 are either the same days of the month or both last days of months, then the result is always an integer. Otherwise the UDF calculates the fractional portion of the result based on a 31-day month and considers the difference in time components date1 and date2.
date1 and date2 type can be date, timestamp or string in the format 'yyyy-MM-dd' or 'yyyy-MM-dd HH:mm:ss'. The result is rounded to 8 decimal places by default. Set roundOff=false otherwise.
月份比较,入参可以是string,date和timestamp,返回的是一个8精度的decimal
注意如果都是第一天或者都是最后一天,返回的是整数,否则就按差值除以31.
select "months_between('2022-12-20','2022-01-20')",months_between('2022-12-20','2022-01-20') -- 都是20号所以是整数 11个月
union all
select "months_between('2022-12-20','2022-12-05')",months_between('2022-12-20','2022-12-05') -- =0.48387097 =15/31
union all
select "months_between('2022-12-20','2022-10-05')",months_between('2022-12-20','2022-10-05') -- =2.48387097=2+15/31
8.day/month/year
"day(param) - Returns the day of the month of date/timestamp, or day component of interval"
Synonyms: dayofmonth
param can be one of:
1. A string in the format of 'yyyy-MM-dd HH:mm:ss' or 'yyyy-MM-dd'.
2. A date value
3. A timestamp value
4. A day-time interval valueExample:
> SELECT day('2009-07-30') FROM src LIMIT 1;
30
Function class:org.apache.hadoop.hive.ql.udf.UDFDayOfMonth
Function type:BUILTIN
说了哪些内容。
1.和dayofmonth 相同
2.可以是string date timestamp
3.注意返回值是int
month year 这两个函数和day都差不多
9.dayofmonth/DAYOFWEEK
同8
10.floor_day/floor_month/floor_quarter/floor_year
floor_day(param) - Returns the timestamp at a day granularity
param needs to be a timestamp value
Example:
> SELECT floor_day(CAST('yyyy-MM-dd HH:mm:ss' AS TIMESTAMP)) FROM src;
yyyy-MM-dd 00:00:00
Function class:org.apache.hadoop.hive.ql.udf.UDFDateFloorDay
Function type:BUILTIN
注意事项
1. 入参必须是timestamp,必须! 其他函数date ts string 都可以就这类函数不行
2.返回值也是timestamp
其实很好记住。因为有floor_second 你不是timestamp,没有毫秒 怎么floor_second
select 'current_timestamp`()' ,`current_timestamp`() union all select 'floor_year(current_timestamp())',floor_year(current_timestamp()) union all select 'floor_quarter(`current_timestamp`())', floor_quarter(`current_timestamp`()) union all select 'floor_month(`current_timestamp`())', floor_month(`current_timestamp`()) union all select 'floor_week(`current_timestamp`())' ,floor_week(`current_timestamp`()) union all select 'floor_day(`current_timestamp`())', floor_day(`current_timestamp`()) union all select 'floor_hour(`current_timestamp`())', floor_hour(`current_timestamp`()) union all select 'floor_minute(`current_timestamp`())', floor_minute(`current_timestamp`()) union all select 'floor_second(`current_timestamp`())', floor_second(`current_timestamp`())
其实trunc函数也可以上述的作用
但是注意
1.入参比较随便string date ts 都行
2.返回值为string
3.功能比较简单只有年月季度,且只有如下几种写法
select 'current_date`()' ,date_format(`current_date`(),'yyyy-MM-dd') union all select "trunc(`current_date`(),'YYYY')", trunc(`current_date`(),'YYYY') union all select "trunc(`current_date`(),'YY')", trunc(`current_date`(),'YY') union all select "trunc(`current_date`(),'YEAR')", trunc(`current_date`(),'YEAR') union all select "trunc(`current_date`(),'MONTH')", trunc(`current_date`(),'MONTH') union all select "trunc(`current_date`(),'MM')", trunc(`current_date`(),'MM') union all select "trunc(`current_date`(),'MON')", trunc(`current_date`(),'MON') union all select "trunc(`current_date`(),'QUARTER')", trunc(`current_date`(),'QUARTER') union all select "trunc(`current_date`(),'Q')", trunc(`current_date`(),'Q')