TPCH 22条查询语句功能及特点:登录 石原子科技 · 石原子科技
在进行MySQL和StoneDB TPCH对比中发现有五条查询query 相比于MySQL性能较差,对这五条查询性能较差SQL进行分析结果如下
总结
本次TPCH测试过程中出现五条执行性能比MySQL低的语句,主要原因是子查询效果不大好,具体原因如下;
1)(co-related subquery) 语句子查询使用上层表字段进行查询,导致语句轮巡对比子查询,这块原因初步判断是优化器上执行方式不同,加上数据存储格式不同,执行计划相差较大,具体分析测试方案查看Q2、Q4、Q17
2)(IN subqery) IN条件采用SQL子查询,SQL子查询结果集越大,StoneDB性能比MySQL越差,当替换IN条件为子查询结果集常量后,子查询结果集常量越大,StoneDB性能比MySQL性能越好,具体测试分析方案查看Q16 和Q18对比结果
另外还有Q20 Q21 执行不出来,是因为Q20和Q21两个性能低的原因都涉及了,具体SQL查看
-- using default substitutions select s_name, s_address from supplier, nation where s_suppkey in ( select ps_suppkey from partsupp where ps_partkey in ( select p_partkey from part where p_name like 'forest%' ) and ps_availqty > ( select 0.5 * sum(l_quantity) from lineitem where l_partkey = ps_partkey and l_suppkey = ps_suppkey and l_shipdate >= date '1994-01-01' and l_shipdate < date '1994-01-01' + interval '1' year ) ) and s_nationkey = n_nationkey and n_name = 'CANADA' order by s_name; -- limit -1;
-- using default substitutions select s_name, count(*) as numwait from supplier, lineitem l1, orders, nation where s_suppkey = l1.l_suppkey and o_orderkey = l1.l_orderkey and o_orderstatus = 'F' and l1.l_receiptdate > l1.l_commitdate and exists ( select * from lineitem l2 where l2.l_orderkey = l1.l_orderkey and l2.l_suppkey <> l1.l_suppkey ) and not exists ( select * from lineitem l3 where l3.l_orderkey = l1.l_orderkey and l3.l_suppkey <> l1.l_suppkey and l3.l_receiptdate > l3.l_commitdate ) and s_nationkey = n_nationkey and n_name = 'SAUDI ARABIA' group by s_name order by numwait desc, s_name limit 100;
Q22涉及到子查询调用外层表字段,orders 子查询调用上层customer 的c_custkey字段,触发了1)问题,具体SQL查看
-- using default substitutions select cntrycode, count(*) as numcust, sum(c_acctbal) as totacctbal from ( select substring(c_phone from 1 for 2) as cntrycode, c_acctbal from customer where substring(c_phone from 1 for 2) in ('13', '31', '23', '29', '30', '18', '17') and c_acctbal > ( select avg(c_acctbal) from customer where c_acctbal > 0.00 and substring(c_phone from 1 for 2) in ('13', '31', '23', '29', '30', '18', '17') ) and not exists ( select * from orders where o_custkey = c_custkey ) ) as custsale group by cntrycode order by cntrycode; -- limit -1;
Q2
-- using default substitutions select s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment from part, supplier, partsupp, nation, region where p_partkey = ps_partkey and s_suppkey = ps_suppkey and p_size = 15 and p_type like '%BRASS' and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'EUROPE' and ps_supplycost = ( select min(ps_supplycost) from partsupp, supplier, nation, region where p_partkey = ps_partkey and s_suppkey = ps_suppkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'EUROPE' ) order by s_acctbal desc, n_name, s_name, p_partkey limit 100;
分析原因:这条语句使用的是子查询语句,子查询中where 条件的 p_partkey使用的是外层part 表数据,需要经过外层过滤后再逐步于子查询where条件进行匹配过滤,所以导致改查询速度较慢
StoneDB 执行计划
id |
select_type |
table |
type |
possible_keys |
key |
key_len |
ref |
rows |
Extra |
1 |
PRIMARY |
partsupp |
ALL |
PRIMARY |
8000000 |
Using temporary; Using filesort |
|||
1 |
PRIMARY |
supplier |
eq_ref |
PRIMARY |
PRIMARY |
4 |
ztpch.partsupp.ps_suppkey |
1 |
|
1 |
PRIMARY |
nation |
eq_ref |
PRIMARY |
PRIMARY |
4 |
ztpch.supplier.s_nationkey |
1 |
|
1 |
PRIMARY |
part |
eq_ref |
PRIMARY |
PRIMARY |
4 |
ztpch.partsupp.ps_partkey |
1 |
Using where with pushed condition (t0) Pckrows: 1, susp. 1 (0 empty 0 full). Conditions: 1 |
1 |
PRIMARY |
region |
ALL |
PRIMARY |
5 |
Using where with pushed condition (t0) Pckrows: 1, susp. 1 (0 empty 0 full). Conditions: 1; Using join buffer (Block Nested Loop) |
|||
2 |
DEPENDENT SUBQUERY |
supplier |
ALL |
PRIMARY |
100000 |
||||
2 |
DEPENDENT SUBQUERY |
nation |
eq_ref |
PRIMARY |
PRIMARY |
4 |
ztpch.supplier.s_nationkey |
1 |
|
2 |
DEPENDENT SUBQUERY |
partsupp |
eq_ref |
PRIMARY |
PRIMARY |
8 |
ztpch.part.p_partkey,ztpch.supplier.s_suppkey |
1 |
|
2 |
DEPENDENT SUBQUERY |
region |
ALL |
PRIMARY |
5 |
Using where with pushed condition (t0) Pckrows: 1, susp. 1 (0 empty 0 full). Conditions: 1; Using join buffer (Block Nested Loop) |
MySQL 执行计划
id |
select_type |
table |
type |
possible_keys |
key |
key_len |
ref |
rows |
Extra |
1 |
PRIMARY |
region |
ALL |
PRIMARY |
5 |
Using where; Using temporary; Using filesort |
|||
1 |
PRIMARY |
nation |
ref |
PRIMARY,nation_fk1 |
nation_fk1 |
4 |
ztpch.region.r_regionkey |
2 |
|
1 |
PRIMARY |
supplier |
ref |
PRIMARY,supplier_fk1 |
supplier_fk1 |
4 |
ztpch.nation.n_nationkey |
2057 |
|
1 |
PRIMARY |
partsupp |
ref |
PRIMARY,partsupp_fk1 |
partsupp_fk1 |
4 |
ztpch.supplier.s_suppkey |
37 |
|
1 |
PRIMARY |
part |
eq_ref |
PRIMARY |
PRIMARY |
4 |
ztpch.partsupp.ps_partkey |
1 |
Using where |
2 |
DEPENDENT SUBQUERY |
partsupp |
ref |
PRIMARY,partsupp_fk1 |
PRIMARY |
4 |
ztpch.part.p_partkey |
1 |
|
2 |
DEPENDENT SUBQUERY |
supplier |
eq_ref |
PRIMARY,supplier_fk1 |
PRIMARY |
4 |
ztpch.partsupp.ps_suppkey |
1 |
|
2 |
DEPENDENT SUBQUERY |
nation |
eq_ref |
PRIMARY,nation_fk1 |
PRIMARY |
4 |
ztpch.supplier.s_nationkey |
1 |
|
2 |
DEPENDENT SUBQUERY |
region |
eq_ref |
PRIMARY |
PRIMARY |
4 |
ztpch.nation.n_regionkey |
1 |
Using where |
结果优化:尝试过改写SQL进行优化,改写的SQL语句后执行结果与原本TPCH结果不一致,无法满足TPCH规范,可能需要优化器上针对子查询的执行计划进行一些优化
Q4
select o_orderpriority, count(*) as order_count from orders where o_orderdate >= date '1993-07-01' and o_orderdate < date '1993-07-01' + interval '3' month and exists ( select * from lineitem where l_orderkey = o_orderkey and l_commitdate < l_receiptdate ) group by o_orderpriority order by o_orderpriority;
分析原因:本条query执行慢的原因和Q2的类似,也是子查询内使用了外层表的orders表的o_orderkey字段作为where 条件进行过滤
结果优化:优化结果如Q2,需要优化器上针对子查询的执行计划进行一些优化
Q16
-- using default substitutions select p_brand, p_type, p_size, count(distinct ps_suppkey) as supplier_cnt from partsupp, part where p_partkey = ps_partkey and p_brand <> 'Brand#45' and p_type not like 'MEDIUM POLISHED%' and p_size in (49, 14, 23, 45, 19, 3, 36, 9) and ps_suppkey not in ( select s_suppkey from supplier where s_comment like '%Customer%Complaints%' ) group by p_brand, p_type, p_size order by supplier_cnt desc, p_brand, p_type, p_size; -- limit -1;
分析原因
IN中子查询共产生56条结果集
MySQL原生语句执行效果:
StoneDB原生语句执行效果:
StoneDB IN子查询替换为结果集常量进行IN查询
MySQL IN子查询替换为结果集常量进行IN查询
通过测试发现子查询内容替换为子查询的结果集进行查询非常快,仅需1.429s,如果子查询采用SQL语句,查询性能立马下降,
结果优化:StoneDB对IN (SQL)类型的子查询支持不是很好,IN (SQL)子查询SQL结果集越多,性能越低
例如:select * from b where b_c_id IN (select c_id from c)
建议使用IN (SQL结果集)
例如:先执行select c_id from c 获得 结果集(1,2,3,4)
再把结果集放入到IN中
select * from b where b_c_id IN (1,2,3,4)
-- using default substitutions select p_brand, p_type, p_size, SUM(p_size), count(distinct ps_suppkey) as supplier_cnt from partsupp, part where p_partkey = ps_partkey and p_brand <> 'Brand#45' and p_type not like 'MEDIUM POLISHED%' and p_size in (49, 14, 23, 45, 19, 3, 36, 9) and ps_suppkey not in ( 358,2820,3804,9504,13746,13941,14836,15640,17960,19073,23541,25307,27538,28899,28942,30605,32235,34572,35391,35444,36739,39534,41266,45061,45337,49187,50346,51086,55645,55714,56086,58086,58537,59087,65630,66438,67752,67939,68272,69707,73881,75374,77067,77713,79552,79763,80884,84217,84705,88766,91288,91327,92353,93461,95523,99857 ) group by p_brand, p_type, p_size order by supplier_cnt desc, p_brand, p_type, p_size; -- limit -1;
Q17
select sum(l_extendedprice) / 7.0 as avg_yearly from lineitem, part where p_partkey = l_partkey and p_brand = 'Brand#23' and p_container = 'MED BOX' and l_quantity < ( select 0.2 * avg(l_quantity) from lineitem where l_partkey = p_partkey );
原因分析:与Q2 和Q4 一样,子查询中where 条件调用外层part表p_partkey字段作为条件进行过滤
结果优化:如上Q2和Q4,需要优化器上针对子查询的执行计划进行一些优化
Q18
-- using default substitutions select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity) from customer, orders, lineitem where o_orderkey in ( select l_orderkey from lineitem group by l_orderkey having sum(l_quantity) > 300 ) and c_custkey = o_custkey and o_orderkey = l_orderkey group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice order by o_totalprice desc, o_orderdate limit 100;
原因分析
由于原生语句子查询有600多个结果,执行时间比较长,所以本次分析修改了sum(l_quantity) > 320,获得27个子查询结果集,
提取子查询获得结果
StoneDB原生SQL 27个结果集子查询效果
MySQL 27个结果集子查询效果
把子查询替换为结果集常量优化测试
去除group 采用sum字段判断数据一致性
MySQL原生语句执行结果
StoneDB原生语句执行结果
MySQL替换常量IN执行结果
StoneDB 替换常量执行结果
和Q16类似,IN中采用SQL子查询,查询结果为600多条结果集,
结果优化
优化方案如Q16
-- using default substitutions
select
c_name,
c_custkey,
o_orderkey,
o_orderdate,
o_totalprice,
sum(l_quantity)
from
customer,
orders,
lineitem
where
o_orderkey in (
2199712,4722021,4806726,7235014,7810823,11141606,12314084,20971013,21213895,23245603,26465922,28184576,29122912,29294434,34201984,35542498,36667107,38914978,39702437,43609317,44143908,46685344,48881602,48943904,55799200,58000519,59138305
)
and c_custkey = o_custkey
and o_orderkey = l_orderkey
group by
c_name,
c_custkey,
o_orderkey,
o_orderdate,
o_totalprice
order by
o_totalprice desc,
o_orderdate
limit 100;