Without further ado, straight to the screenshots. The Hive version is 2.1.2-cdh6.3.2.
An nvl(decimal, 1) whose result = 0? How did a bug show up here???
By rights the two column values should be equal. I'm speechless. How can there be so many bugs? Straight to Baidu first, and the top hit turned out to be my own article... am I supposed to borrow from myself?
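The original screenshots are not reproduced here, so here is a minimal sketch of the kind of query involved, using the table and column names that show up in the explain plans further down; any other detail of the screenshots is an assumption:

-- Sketch only: in the screenshots, the nvl(...) column came back as 0 even though pay_standard_rate itself was not 0
select T.PAY_STANDARD_RATE,
       nvl(T.PAY_STANDARD_RATE, 1)
from ODSCOSTDATA.UPAY_PAYMENT T;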
——————————————————————————————————————————
Converting decimal to string
Kept digging. Based on the issue I looked into last time, decimal is prone to precision loss and similar problems, and I remember that casting to string fixed it back then. So I tested that: the result was OK, but the original decimal problem was still unsolved.
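A rough sketch of that string workaround, assuming the same table and column as above; the exact expressions from the screenshot are not reproduced:

-- Illustrative only: casting the decimal to string before nvl gave the expected value in my test
select T.PAY_STANDARD_RATE,
       nvl(cast(T.PAY_STANDARD_RATE as string), '1')
from ODSCOSTDATA.UPAY_PAYMENT T;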
More testing
This round of testing turned up something really strange.
Adding a value to the nvl suddenly made the result "correct", but it no longer matched the original value. So it is a wrong kind of "correct" value.
This one is basically the same as the one above, except t.pay_standard_rate is not cast to decimal, because it is already decimal.
And the resulting values are completely different...
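Since the screenshots are missing, here is a very rough sketch of what those two tests looked like; the + 0 and the decimal(38,18) precision are guesses based on the description above, not the actual SQL:

-- Sketch: version that casts to decimal first
select nvl(cast(T.PAY_STANDARD_RATE as decimal(38,18)), 1) + 0
from ODSCOSTDATA.UPAY_PAYMENT T;

-- Sketch: same thing without the cast, since the column is already decimal
select nvl(T.PAY_STANDARD_RATE, 1) + 0
from ODSCOSTDATA.UPAY_PAYMENT T;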
Now for the most interesting part of the testing.
Look closely: this SQL is exactly the same as the previous one, except it adds one extra check on whether the value = 0.
And suddenly the correct result appeared!!!
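Roughly like this (a sketch reconstructed from the description, not the exact screenshot):

-- Identical to the earlier query, plus one extra comparison column
select T.PAY_STANDARD_RATE,
       nvl(T.PAY_STANDARD_RATE, 1),
       T.PAY_STANDARD_RATE = 0
from ODSCOSTDATA.UPAY_PAYMENT T;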
At this point the question is: what does selecting another column have to do with selecting nvl(T.PAY_STANDARD_RATE, 1)?
What does arresting Zhou Shuren have to do with Lu Xun? You said Blind Monk is trolling, not that my Lee Sin plays badly. I'm called Garen, not Demacia.
The big gun
Time to fall back on explain. Since you claim Zhou Shuren and Lu Xun are not the same person, let me give the two of you a proper examination.
explain select
T.PAY_STANDARD_RATE,
nvl(T.PAY_STANDARD_RATE , 1),
T.PAY_STANDARD_RATE=1
from ODSCOSTDATA.UPAY_PAYMENT T
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Spark
      DagName: hive_20220525095616_c2aec1ed-f5ba-4625-a587-e323d8ac3e1c:3954
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: t
                  Statistics: Num rows: 600 Data size: 67284 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: pay_standard_rate (type: decimal(38,18)), NVL(pay_standard_rate,1) (type: decimal(38,18)), (pay_standard_rate = 1) (type: boolean)
                    outputColumnNames: _col0, _col1, _col2
                    Statistics: Num rows: 600 Data size: 67284 Basic stats: COMPLETE Column stats: NONE
                    File Output Operator
                      compressed: false
                      Statistics: Num rows: 600 Data size: 67284 Basic stats: COMPLETE Column stats: NONE
                      table:
                          input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                          output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                          serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
explain
select T.PAY_STANDARD_RATE,
nvl(T.PAY_STANDARD_RATE , 1)
from ODSCOSTDATA.UPAY_PAYMENT T
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Spark
      DagName: hive_20220525095658_abf11a23-3e89-47fa-ad1b-8e84b3988910:3955
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: t
                  Statistics: Num rows: 600 Data size: 67284 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: pay_standard_rate (type: decimal(38,18)), NVL(pay_standard_rate,1) (type: decimal(38,18))
                    outputColumnNames: _col0, _col1
                    Statistics: Num rows: 600 Data size: 67284 Basic stats: COMPLETE Column stats: NONE
                    File Output Operator
                      compressed: false
                      Statistics: Num rows: 600 Data size: 67284 Basic stats: COMPLETE Column stats: NONE
                      table:
                          input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                          output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                          serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
            Execution mode: vectorized

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
Notice that the second plan has one extra line: Execution mode: vectorized.
So what is that? I had dug into vectorized execution before, so I knew what it was...
A tip for anyone following along: running set lists all property values.
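For example, in a Hive / beeline session:

-- List every configuration property and its current value
set;
-- Or check just the one property we care about
set hive.vectorized.execution.enabled;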
Conclusion
After testing: changing hive.vectorized.execution.enabled from true to false, i.e. not using vectorized mode, fixes the bug!!!
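So the session-level fix is simply:

set hive.vectorized.execution.enabled = false;
-- rerun the query afterwards; in my test nvl(T.PAY_STANDARD_RATE, 1) then matched T.PAY_STANDARD_RATE again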
Further reading
So what is this vectorization? Why is it enabled by default, why does enabling it bring so many bugs, and what are the benefits of enabling it?
Vectorized Query Execution - Apache Hive - Apache Software Foundation: https://cwiki.apache.org/confluence/display/Hive/Vectorized+Query+Execution
Vectorized query execution is a Hive feature that greatly reduces the CPU usage for typical query operations like scans, filters, aggregates, and joins. A standard query execution system processes one row at a time. This involves long code paths and significant metadata interpretation in the inner loop of execution. Vectorized query execution streamlines operations by processing a block of 1024 rows at a time. Within the block, each column is stored as a vector (an array of a primitive data type). Simple operations like arithmetic and comparisons are done by quickly iterating through the vectors in a tight loop, with no or very few function calls or conditional branches inside the loop. These loops compile in a streamlined way that uses relatively few instructions and finishes each instruction in fewer clock cycles, on average, by effectively using the processor pipeline and cache memory. A detailed design document is attached to the vectorized query execution JIRA, at [HIVE-4160] Vectorized Query Execution in Hive - ASF JIRA.
In short: it lowers CPU usage by processing rows in batches, which speeds up queries.
How to enable it (a sketch follows this list):
1. set hive.vectorized.execution.enabled = true
2. The Hive table format must be ORC
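A minimal sketch of what that looks like (the table name demo_orc is made up for illustration):

-- Vectorized execution works on ORC data
create table demo_orc (pay_standard_rate decimal(38,18))
stored as orc;

-- Enable vectorized execution for the session
set hive.vectorized.execution.enabled = true;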
Supported data types and operations
The following data types are currently supported for vectorized execution:
tinyint
smallint
int
bigint
boolean
float
double
decimal
date
timestamp (see Limitations below)
string
Using other data types will cause your query to execute using standard, row-at-a-time execution.
Using other data types means vectorized mode is not used and execution falls back to one row at a time...
The following expressions can be vectorized when used on supported types:
- arithmetic: +, -, *, /, %
- AND, OR, NOT
- comparisons <, >, <=, >=, =, !=, BETWEEN, IN ( list-of-constants ) as filters
- Boolean-valued expressions (non-filters) using AND, OR, NOT, <, >, <=, >=, =, !=
- IS [NOT] NULL
- all math functions (SIN, LOG, etc.)
- string functions SUBSTR, CONCAT, TRIM, LTRIM, RTRIM, LOWER, UPPER, LENGTH
- type casts
- Hive user-defined functions, including standard and generic UDFs
- date functions (YEAR, MONTH, DAY, HOUR, MINUTE, SECOND, UNIX_TIMESTAMP)
- the IF conditional expression
Note: everything listed above is about comparisons and arithmetic in the WHERE clause, which still run in vectorized mode.
But in my case the column = 1 expression was in the SELECT list, and that kind of expression drops the query out of vectorized mode.
Finally, for anyone who runs into the same problem:
set hive.vectorized.execution.enabled = false
Or add a column like column = 1 to the SELECT (any constant works); the point is just to drop out of vectorized mode (see the sketch below).
You can also use explain to check whether the current SQL runs in vectorized mode.
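Putting the two workarounds together as a sketch (the extra comparison column is only there to force the plan out of vectorized mode):

-- Workaround 1: turn vectorization off for the session
set hive.vectorized.execution.enabled = false;

-- Workaround 2: keep vectorization on but add a dummy comparison column to the SELECT
select T.PAY_STANDARD_RATE,
       nvl(T.PAY_STANDARD_RATE, 1),
       T.PAY_STANDARD_RATE = 1   -- any constant works; the point is to leave vectorized mode
from ODSCOSTDATA.UPAY_PAYMENT T;

-- Verify with explain: the plan for the query you actually run should no longer contain "Execution mode: vectorized"
explain select T.PAY_STANDARD_RATE,
               nvl(T.PAY_STANDARD_RATE, 1),
               T.PAY_STANDARD_RATE = 1
from ODSCOSTDATA.UPAY_PAYMENT T;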