一、索引推荐
1、测试数据导入
gsql -d database_test -p 26000 -U joe -W Mysql@123456 -r
CREATE TABLE tab_ysl_1 (col1 int, col2 int, col3 text);
INSERT INTO tab_ysl_1 VALUES(generate_series(1, 3000),generate_series(1, 3000),repeat( chr(int4(random()*26)+65),4));
ANALYZE tab_ysl_1;
CREATE TABLE tab_ysl_2 (col1 int, col2 int);
INSERT INTO tab_ysl_2 VALUES(generate_series(1, 1000),generate_series(1, 1000));
ANALYZE tab_ysl_2;
2、gs_index_advise函数
1.测试where
SELECT * FROM gs_index_advise('SELECT * FROM tab_ysl_1 WHERE col1 = 10');
schema | table | column | indextype
--------+-----------+--------+-----------
joe | tab_ysl_1 | col1 |
(1 row)
2.测试join
SELECT * FROM gs_index_advise('SELECT * FROM tab_ysl_1 join tab_ysl_2 on tab_ysl_1.col1 = tab_ysl_2.col1');
schema | table | column | indextype
--------+-----------+--------+-----------
joe | tab_ysl_1 | col1 |
joe | tab_ysl_2 | |
(2 rows)
3.测试多表
SELECT * FROM gs_index_advise('SELECT count(*), tab_ysl_2.col1 FROM tab_ysl_1 join tab_ysl_2 on tab_ysl_1.col2 =
tab_ysl_2.col2 WHERE tab_ysl_2.col2 > 2 GROUP BY tab_ysl_2.col1 ORDER BY tab_ysl_2.col1');
schema | table | column | indextype
--------+-----------+--------+-----------
joe | tab_ysl_1 | col2 |
joe | tab_ysl_2 | col1 |
(2 rows)
4.测试order by
SELECT * FROM gs_index_advise('SELECT *, col2 FROM tab_ysl_1 ORDER BY 1, 3');
schema | table | column | indextype
--------+-----------+-----------+-----------
joe | tab_ysl_1 | col1,col3 |
(1 row)
SELECT * FROM gs_index_advise('SELECT * FROM tab_ysl_1 WHERE col1 > 10 ORDER BY 1,col2');
schema | table | column | indextype
--------+-----------+-----------+-----------
joe | tab_ysl_1 | col1,col2 |
(1 row)
5.测试过长字符串
SELECT * FROM gs_index_advise('SELECT * FROM tab_ysl_1 where col3 in (''aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaa'',''bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb'',''cccccccccccccccccccccccccccccccccccccc
c'',''ddddddddddddddddddddddddddddddddddddddd'',''ffffffffffffffffffffffffffffffffffffffff'',''gggggggggggggggggggg
ggggggggggggggggggggggggggggggg'',''ttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt'',''vvv
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv'',''ggmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm
mmm'')');
schema | table | column | indextype
--------+-----------+--------+-----------
joe | tab_ysl_1 | col3 |
(1 row)
3、Workload级别索引推荐
这种方式可以针对多条SQL,可以将待优化的SQL写到文件里,通过脚本获得推荐索引。
脚本目录在安装目录的bin/dbmind/index_advisor下边,我的目录为
/opt/gaussdb/app/bin/dbmind/index_advisor/index_advisor_workload.py
将待优化的SQL放到文件里
[omm@node1 index_advisor]$ cat 1.sql
SELECT * FROM tab_ysl_1 WHERE col1 = 10;
SELECT count(*), tab_ysl_2.col1 FROM tab_ysl_1 join tab_ysl_2 on tab_ysl_1.col2 = tab_ysl_2.col2 WHERE tab_ysl_2.col2 > 2 GROUP BY tab_ysl_2.col1 ORDER BY tab_ysl_2.col1;
SELECT * FROM tab_ysl_1 join tab_ysl_2 on tab_ysl_1.col1 = tab_ysl_2.col1;
使用如下方式调用脚本,可以批量获取推荐索引,26000为我的数据库端口,ysla为我的数据库名,1.sql为我待优化的SQL存放的文件
[omm@node1 index_advisor]$ pwd
/opt/gaussdb/app/bin/dbmind/index_advisor
[omm@node1 index_advisor]$ python3 ./index_advisor_workload.py 26000 database_test 1.sql
############################################################## Generate candidate indexes
table: tab_ysl_1 columns: col1
table: tab_ysl_1 columns: col2
table: tab_ysl_2 columns: col1
############################################################### Determine optimal indexes
create index ind0 on tab_ysl_1(col1);
4、索引效率查看
1、未优化的情况下
cat 1.sql
SELECT * FROM tab_ysl_1 WHERE col1 = 10;
[omm@gsdb01 ~]$ time gsql -d database_test -p 26000 -U joe -W Mysql@123456 -f 1.sql
col1 | col2 | col3
------+------+------
10 | 10 | HHHH
(1 row)
total time: 1 ms
real 0m0.020s
user 0m0.007s
sys 0m0.001s
2、分析并创建索引
[omm@node1 index_advisor]$ python3 ./index_advisor_workload.py 26000 database_test 1.sql
############################################################## Generate candidate indexes
table: tab_ysl_1 columns: col1
############################################################### Determine optimal indexes
create index ind0 on tab_ysl_1(col1);
通过Index-advisor获取推荐索引。并创建索引
gsql -d database_test -p 26000 -U joe -W Mysql@123456 -r -c "create index ind0 on tab_ysl_1(col1);"
[omm@gsdb01 ~]$ time gsql -d database_test -p 26000 -U joe -W Mysql@123456 -f 1.sql
col1 | col2 | col3
------+------+------
10 | 10 | HHHH
(1 row)
total time: 0 ms
real 0m0.019s
user 0m0.006s
sys 0m0.002s
#短了一丢丢,可能是数据量太少了,不明显
5、总结
1、未添加索引的查询效率
time gsql -d tpch -p 26000 -c "select * from lineitem where 1_orderkey < 100 and 1_suppkey > 50;"
real 0m4.916s
user 0m0.014s
sys 0m0.001s
2、使用索引推荐函数gs_index_advise获取优化建议
select * from gs_index_advise('select * from lineitem where 1_orderkey < 100 and 1_suppkey > 50;')
table | column
lineitem | (1_orderkey)
3、创建索引
create index idx1 on lineitem(1_orderkey);
4、查看优化结果
time gsql -d tpch -p 26000 -c "select * from lineitem where 1_orderkey < 100 and 1_suppkey > 50;"
单索引推荐:适用于表中数据量大的情况,数据量过小不会进行推荐。
(1)当where中的查询条件只有一个的时候,推荐单一索引,如只有id在where中,只推荐id为索引;当where中的查询条件有多个的时候推荐多重索
引,如id,name在where中被当做条件,则一起被推荐为联合索引,但是如果同时存在id、person_id则默认推荐id( 目前不知道原因)
(2)当query中除了where这个语句时,还存在order by 和 group by等条件时,将where、order by、group by中的属性全部作为联合索引进行
推荐。
(3)使用like模糊查询时或精准查询都不对该属性进行索引建立,并且=属性一定给予索引推荐建立。
(4)当query中的条件过多时存在的属性也超过三个时,依旧推荐联合属性是在三个以上,会不会导致推荐索引过多从而性能下降,这个有待商榷,最
好推荐索引中的属性在三个以内最好。不过这个可能需要通过DRL来学习,判断究竟选择一条query中的哪几个属性来建立索引。
二、虚拟索引
一般在加索引时,会堵塞DML(不过PG支持并发加索引,不堵塞DML) 。只有索引真正能起到优化作用,我们建立索引才是有意义的。虚拟索引是一个很有用的东西,没有副作用,只是虚拟的索引,建立虚拟索引后,可以通过EXPLAIN来查看加索引后的成本估算,判断是否加索引COST会降低。
可以用虚拟索引检验索引的效果,根据效果可选择是否创建真实的索引优化查询。
1、清理之前的索引
gsql -d database_test -p 26000 -U joe -W Mysql@123456 -r
\d+ tab_ysl_1
drop index ind0;
\d+ tab_ysl_1
Table "joe.tab_ysl_1"
Column | Type | Modifiers | Storage | Stats target | Description
--------+---------+-----------+----------+--------------+-------------
col1 | integer | | plain | |
col2 | integer | | plain | |
col3 | text | | extended | |
Has OIDs: no
Options: orientation=row, compression=no
2、#测试建立虚拟索引(hypopg_create_index)
SELECT * FROM hypopg_create_index('CREATE INDEX ON tab_ysl_1(col1)');
indexrelid | indexname
------------+-----------------------------
54410 | <54410>btree_tab_ysl_1_col1
(1 row)
3、显示所有创建的虚拟索引信息(enable_hypo_index)
select * from hypopg_display_index();
indexname | indexrelid | table | column
-----------------------------+------------+-----------+--------
<54410>btree_tab_ysl_1_col1 | 54410 | tab_ysl_1 | (col1)
(1 row)
4、执行分析
set enable_hypo_index = on;explain SELECT * FROM tab_ysl_1 WHERE col1 = 100;
SET
QUERY PLAN
----------------------------------------------------------------------------------------------
Index Scan using <54410>btree_tab_ysl_1_col1 on tab_ysl_1 (cost=0.00..8.27 rows=1 width=13)
Index Cond: (col1 = 100)
(2 rows)
5、测试删除指定虚拟索引(hypopg_display_index)
使用函数hypopg_drop_index删除指定oid的虚拟索引
select * from hypopg_drop_index(54410);
hypopg_drop_index
-------------------
t
(1 row)
6、再次查看虚拟索引
database_test=> select * from hypopg_display_index();
indexname | indexrelid | table | column
-----------+------------+-------+--------
(0 rows)
7、#使用函数hypopg_reset_index一次性清除所有创建的虚拟索引
SELECT * FROM hypopg_reset_index();
hypopg_reset_index
--------------------