半连接和反连接,其实就是in,exists,之前我对这几个玩意其实也很迷茫,看网上都有的说exists比in好,等等之类的,其实并不然,等了解了他的一些内部机制,就不会再这么盲目了。
那么我先看几个相关参数:
_always_semi_join=CHOOSE 这个说明是按最小成本选择半连接
这个参数可以通过select NAME_KSPVLD_VALUES name, VALUE_KSPVLD_VALUES value
from X$KSPVLD_VALUES
where NAME_KSPVLD_VALUES like nvl('&name',NAME_KSPVLD_VALUES);
Enter value for name: _always_semi_join来查看都有那些值。
HASH JOIN SEMI 也叫哈希半连接
做个实验:
create table filter (sex varchar2(2));
insert into filter values ('男');
insert into filter values ('女');
insert into filter values ('男');
insert into filter values ('男');
insert into filter values ('女');
commit;
create table emp1(emp_no number,sex varchar2(2));
insert into emp1 select rownum ,sex from filter order by 2;
insert into emp1 select rownum ,sex from filter order by 2;
commit;
execute dbms_stats.gather_table_stats(ownname=>user, tabname=>'FILTER')
execute dbms_stats.gather_table_stats(ownname=>user, tabname=>'EMP1')
首先测试HASH半连接:
首先hash半连接_always_semi_join=CHOOSE 这个参数一定是choose,也就是根据最小成本选择半连接。
SELECT /*+ gather_plan_statistics */ * FROM EMP1 WHERE EXISTS
(SELECT 0 FROM FILTER
WHERE FILTER.SEX=EMP1.SEX);
SELECT * FROM table(dbms_xplan.display_cursor(null,null,'iostats last'));
看结果:
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID bkvqwm45n1fdb, child number 0
-------------------------------------
SELECT /*+ gather_plan_statistics */ * FROM EMP1 WHERE EXISTS (SELECT 0 FROM
FILTER WHERE FILTER.SEX=EMP1.SEX)
Plan hash value: 1392637843
---------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------
|* 1 | HASH JOIN SEMI | | 1 | 10 | 10 |00:00:00.01 | 14 |
| 2 | TABLE ACCESS FULL| EMP1 | 1 | 10 | 10 |00:00:00.01 | 7 |
| 3 | TABLE ACCESS FULL| FILTER | 1 | 5 | 2 |00:00:00.01 | 7 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("FILTER"."SEX"="EMP1"."SEX")
好,来解释一下这个执行计划,现在可以看到1是hash 半连接,首先扫描emp1表,再扫描filter表,emp1这里是驱动表,取出10行,生成hash表,每取出一行就去另一个hash表filter表过滤一下,emp1性别2种,所以一种性别去过滤一次就结束,不用再往下匹配,这里也就是'男'匹配一次,'女'匹配一次,总共两次。上面A-Rows为2,说明这个表过滤作用的就两个值,有几个值就有几个A-Rows.
上面是用exists,下面用in看一下:
SELECT /*+ gather_plan_statistics */ * FROM EMP1 WHERE sex in
(SELECT /*+ USE_HASH */ sex FROM FILTER);
结果和上面的执行计划是一样的。这里in和exists就是一样的,都是hash半连接。
下面看一下不用hash半连接的:
alter session set "_always_semi_join" = OFF;
SELECT /*+ gather_plan_statistics */ * FROM EMP1 WHERE sex in
(SELECT /*+ USE_HASH */ sex FROM FILTER);
SELECT * FROM table(dbms_xplan.display_cursor(null,null,'iostats last'));
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 1p9a7sq3tdb4p, child number 0
-------------------------------------
SELECT /*+ gather_plan_statistics */ * FROM EMP1 WHERE sex in (SELECT /*+
USE_HASH */ sex FROM FILTER)
Plan hash value: 3840124480
---------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------
|* 1 | FILTER | | 1 | | 10 |00:00:00.01 | 20 |
| 2 | TABLE ACCESS FULL| EMP1 | 1 | 10 | 10 |00:00:00.01 | 8 |
|* 3 | TABLE ACCESS FULL| FILTER | 2 | 1 | 2 |00:00:00.01 | 12 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( IS NOT NULL)
3 - filter("SEX"=:B1)
这里可以看到用的filter,性能比半连接的性能还好。
嵌套半连接:
alter session set "_always_semi_join" = 'CHOOSE';先修改回来
SELECT /*+ gather_plan_statistics */ * FROM EMP1 WHERE EXISTS
(SELECT /*+ NL_SJ */ 0 FROM FILTER
WHERE FILTER.SEX=EMP1.SEX);
SELECT * FROM table(dbms_xplan.display_cursor(null,null,'iostats last'));
Plan hash value: 2921932404
---------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------
| 1 | NESTED LOOPS SEMI | | 1 | 10 | 10 |00:00:00.01 | 20 |
| 2 | TABLE ACCESS FULL| EMP1 | 1 | 10 | 10 |00:00:00.01 | 8 |
|* 3 | TABLE ACCESS FULL| FILTER | 2 | 5 | 2 |00:00:00.01 | 12 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("FILTER"."SEX"="EMP1"."SEX")
NL_SJ需要放到IN子句,或者EXISTS和NOT EXISTS语句中,是强制走嵌套半连接的意思,这里看到1就走了嵌套半连接了。
所以starts为什么是2呢?其实是emp1驱动表中sex只有男和女,所以需要比两次。
insert into emp1 values(100,'中') ;
commit;
SELECT /*+ gather_plan_statistics */ * FROM EMP1 WHERE EXISTS
(SELECT /*+ NL_SJ */ 0 FROM FILTER
WHERE FILTER.SEX=EMP1.SEX);
SELECT * FROM table(dbms_xplan.display_cursor(null,null,'iostats last'));
插入一个'中'的性别,再查看计划:
Plan hash value: 2921932404
---------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------
| 1 | NESTED LOOPS SEMI | | 1 | 10 | 10 |00:00:00.01 | 27 |
| 2 | TABLE ACCESS FULL| EMP1 | 1 | 10 | 11 |00:00:00.01 | 8 |
|* 3 | TABLE ACCESS FULL| FILTER | 3 | 5 | 2 |00:00:00.01 | 19 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("FILTER"."SEX"="EMP1"."SEX")
大家看到这里starts是3了。对filter表操作了3次,也就是比对了3次。
如果我们再加一个条件:
SELECT /*+ gather_plan_statistics */ * FROM EMP1 WHERE sex='女' and EXISTS
(SELECT /*+ NL_SJ */ 0 FROM FILTER
WHERE FILTER.SEX=EMP1.SEX);
SELECT * FROM table(dbms_xplan.display_cursor(null,null,'iostats last'));
Plan hash value: 2921932404
---------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------
| 1 | NESTED LOOPS SEMI | | 1 | 5 | 4 |00:00:00.01 | 14 |
|* 2 | TABLE ACCESS FULL| EMP1 | 1 | 5 | 4 |00:00:00.01 | 8 |
|* 3 | TABLE ACCESS FULL| FILTER | 1 | 3 | 1 |00:00:00.01 | 6 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("SEX"='女')
3 - filter(("FILTER"."SEX"='女' AND "FILTER"."SEX"="EMP1"."SEX"))
那么这里有了女的过滤,那么filter表只操作一次就可以了,所以starts是1.
insert into filter values('中') ;
commit;
这里我给filter表里也插入一个,再次执行一次:
SELECT /*+ gather_plan_statistics */ * FROM EMP1 WHERE EXISTS
(SELECT /*+ NL_SJ */ 0 FROM FILTER
WHERE FILTER.SEX=EMP1.SEX);
SELECT * FROM table(dbms_xplan.display_cursor(null,null,'iostats last'));
Plan hash value: 2921932404
---------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------
| 1 | NESTED LOOPS SEMI | | 1 | 10 | 11 |00:00:00.01 | 26 |
| 2 | TABLE ACCESS FULL| EMP1 | 1 | 10 | 11 |00:00:00.01 | 8 |
|* 3 | TABLE ACCESS FULL| FILTER | 3 | 5 | 3 |00:00:00.01 | 18 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("FILTER"."SEX"="EMP1"."SEX")
看到3的A-Rows变化了,是3,所以A-Rows就是实际返回的行数。
嵌套全连接:
delete emp1 where sex='中';
commit;
SELECT /*+ gather_plan_statistics USE_NL(EMP1,FILTER) */ EMP1.* FROM EMP1,FILTER
WHERE FILTER.SEX=EMP1.SEX;
SELECT * FROM table(dbms_xplan.display_cursor(null,null,'iostats last'));
Plan hash value: 3269263915
---------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------
| 1 | NESTED LOOPS | | 1 | 25 | 26 |00:00:00.01 | 53 |
| 2 | TABLE ACCESS FULL| FILTER | 1 | 5 | 6 |00:00:00.01 | 9 |
|* 3 | TABLE ACCESS FULL| EMP1 | 6 | 5 | 26 |00:00:00.01 | 44 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("FILTER"."SEX"="EMP1"."SEX")
看到3的starts是6了吧,因为是嵌套全连接,不是半连接,所以2的所以行都的比对一次。这里是根据行数取最少,成本最低的作为驱动表,所以看到filter是驱动表,emp1被驱动。这里3所以就操作了6次,返回26行。
这样我们就很清晰的了解了in和exists其实那个性能都不一定好,今天就记录到这里。