1. 索引概述

1.1. 作用

索引是数据库中的一种快速查询数据的实现方法，记录了表中的一列或多列值与其物理位置之间的对应关系，类似于书本目录。
索引的作用不限于加快表记录（元组）的查询和排序速度，也可以实现唯一性约束（如唯一索引）。
索引的代价：

也是文件，会消耗一些存储空间
随着表记录（元组）的改变，索引也需要更新，消耗更多的时间和资源

1.2. 分类

BTree

创建索引默认的是BTree
支持范围、模糊等查询

Hash

处理简单的等于比较操作，非等于操作hash索引就失效了
hash索引的列如果重复度比较高，容易产生严重的hash冲突，查询效率较低，这种情况下也不适用

GiST (Generalized Search Tree)

通用搜索树，底层是一种平衡树，对比BTree，适用特殊的操作符（"@>","<<"等）
ip、几何、位置等更适用GIST索引，int和String也能使用，但性价比较低

SP-GIST

和GiST类似，但是是一棵不平衡树，支持多维和海量数据，把空间分割成互不相交的部分

倒排索引，适合于包含多个组成值的数据，比如数组，全文检索等。
与GIST类似，可以使用不同的操作符。

BRIN

块级索引，不是以行号为单位记录索引明细，而是记录每个数据块或者每段连续的数据块的统计信息。
当被索引列的值与物理存储相关性很强时，BRIN索引的效果比较好（例如时序数据，在时间或序列字段创建BRIN索引）。

2. 索引实践

2.1. 索引增删改

UNIQUE 指定唯一索引
USING method 指定索引类型
CONCURRENTLY 并发创建索引

在创建索引时，PG会锁定表，并全表扫描，用户的更新、删除等操作被阻塞。
对于更新频繁和较大的表，这会导致阻塞时间较长，此时可考虑用并发创建索引。

2.2. 选择好的索引

列的相关性

表的全表扫描次数

索引使用次数

2.3. 执行计划

使用工具猫生成百万条student数据
使用CopyManager导入生成的数据
Explain语句的用法

Seq Scan全表扫描

索引扫描

索引中找出需要的数据行的物理位置，然后再到表的数据块中把相应的数据读出来的过程（Index Only Scan 不回表、 Index Scan 回表）。

位图扫描

“Bitmap Index Scan”先在索引中找到符合条件的行，然后在内存中创建位图，再到表中扫描，也就是我们看到的“Bitmap Heap Scan”。
扫描索引，块中符合条件的数据将bit标为1，在内存中建立位图，扫描完索引后，再根据位图到表的数据文件中把相应的数据读出来，（一般执行计划的结果较多、非等值查询、多条件会走这个索引）。

3. 索引原理（Btree为主）

3.1. B-tree背景

3.1.1. 回顾B+树

所有的key + value 数据都保存在叶子节点，root 以及中间节点仅保存key 用作索引。
所有的叶子节点之间都维护一个单向/双向指针，方便顺序遍历。
这种结构，使得每次查找都会从root节点到叶子节点，稳定性较好（避免了Btree遍历节点的回溯）。

B+数的结构图（图片来源于PG数据库内核分析书籍）

3.1.2. B-Link-Tree

B-Link-Tree的设计来源于论文 1981 ACM论文 “Efficient Locking for Concurrent Operations on B-Trees”，即大名鼎鼎的 L & Y 论文。纯粹的 B+Tree 数据结构无法高效得解决并发操作的问题，如下时序场景：

时序场景图1（图片来源于csdn博客）

语法解释：

C <--- read(x) 表示从磁盘page x 读取数据到内存块 C中，用变量x表示其内存数据。
examin C；解析内存块C。
get child[1] to y，从解析的内存块中提取child[1]的磁盘page地址，并分配变量y表示这个child[1]指针指向的数据区域。
put (B, y’)，将内存块B 中 y’变量的数据写入到磁盘。

时序场景图2（图片来源于csdn博客）

即在 search (15) 操作和 Insert(12) 并发操作场景中，两种先后要准备读取y page的数据，在此期间 insert(12)先完成并发生了节点分裂以及leveup，修改了 search操作要访问的磁盘page y 的数据（将15 分裂到了磁盘page y’ ），导致 search(15)操作最后从磁盘读取的 y page没有找到y。
基于上述B+树在并发操作上的问题，B-Link-Tree 在B+Tree的基础上又增加了两个性质：

为每一个内部节点新增了一个指向兄弟节点的右向指针。
为每一个内部节点引入一个额外的key (high-key)，它是当前节点以及所有子节点中最大的key。

通过这两个性质的约束，实际的实现过程中可以保证正确性的情况下只需要加少量的写锁以及读无锁；能够提升 B+Tree 的并发访问性能。

B-Link-Tree图（图片来源于csdn博客）

3.1.3. B-tree的组织架构

B-tree的页面组织结构图

Page header存储一些元信息，索引类型、版本、页面层级等（linp0也在其中，为了表述方便画出来的）。
itup1、itup2、itup3是有序的索引元组。
linp1、linp2、linp3指向索引元组，存储了元组在页面内的实际位置。
Special space存储指向左右兄弟节点的指针、页面类型等信息。

索引页面组织结构图1（图片来源于PG数据库内核分析书籍）

假设页面填充三个元组就不允许插入新的元组了，此时linp1 2 3分别指向itup 1 2 3，现在继续插入元组会分如下情况：

节点不为最右节点：.首先将itup3（最大的索引元组）复制到当前节点的右兄弟节点，然后将linp0指向itup3（HK）。去掉linp3。使用linp0来指向页面中的HK。（HK不是一个关键字，只是给出该节点的范围）。

索引页面组织结构图2（图片来源于PG数据库内核分析书籍）

节点为最右节点：对于最右节点，其并不需要HK，故将每个linp递减一个位置，linp3不再使用。

索引页面组织结构图3（图片来源于PG数据库内核分析书籍）

根据上面的描述，可以得出一个完整的索引组织结构图：

索引整体组织结构图（图片来源于PG数据库内核分析书籍）

虚线上方为索引结构：

根节点和内部节点层：索引元组指向索引节点。
叶子节点：索引元组指向表元组的存放的实际位置。
根据BTree的特性，索引元组为有序，第一个叶子节点中itup3实际为最大索引元组，即HK，第二个叶子节点中itup1实际为最小索引元组，两者相同，故指向了同一物理存储位置。

虚线下方为表元组

3.2. B-tree构建过程

3.2.1. 数据结构介绍

IndexAmRoutine 结构体规定了制作索引需要定义的属性，以及实现的方法
BtHandler的实现如下
BTBuildState结构体主要保存索引元组
BTWriteState结构体主要保存块号信息等
BTPageState主要保存节点层次信息，以及指向父节点层次的指针

3.2.1.

/*
* API struct for an index AM.  Note this must be stored in a single palloc'd
* chunk of memory.
*/
typedef struct IndexAmRoutine
{
    NodeTag		type;
    
    /*
* Total number of strategies (operators) by which we can traverse/search
* this AM.  Zero if AM does not have a fixed set of strategy assignments.
*/
    uint16		amstrategies;
    /* total number of support functions that this AM uses */
    uint16		amsupport;
    /* opclass options support function number or 0 */
    uint16		amoptsprocnum;
    /* does AM support ORDER BY indexed column's value? */
    bool		amcanorder;
    /* does AM support ORDER BY result of an operator on indexed column? */
    bool		amcanorderbyop;
    /* does AM support backward scanning? */
    bool		amcanbackward;
    /* does AM support UNIQUE indexes? */
    bool		amcanunique;
    /* does AM support multi-column indexes? */
    bool		amcanmulticol;
    /* does AM require scans to have a constraint on the first index column? */
    bool		amoptionalkey;
    /* does AM handle ScalarArrayOpExpr quals? */
    bool		amsearcharray;
    /* does AM handle IS NULL/IS NOT NULL quals? */
    bool		amsearchnulls;
    /* can index storage data type differ from column data type? */
    bool		amstorage;
    /* can an index of this type be clustered on? */
    bool		amclusterable;
    /* does AM handle predicate locks? */
    bool		ampredlocks;
    /* does AM support parallel scan? */
    bool		amcanparallel;
    /* does AM support columns included with clause INCLUDE? */
    bool		amcaninclude;
    /* does AM use maintenance_work_mem? */
    bool		amusemaintenanceworkmem;
    /* OR of parallel vacuum flags.  See vacuum.h for flags. */
    uint8		amparallelvacuumoptions;
    /* type of data stored in index, or InvalidOid if variable */
    Oid			amkeytype;

    /*
* If you add new properties to either the above or the below lists, then
* they should also (usually) be exposed via the property API (see
* IndexAMProperty at the top of the file, and utils/adt/amutils.c).
*/

    /* interface functions */
    ambuild_function ambuild;
    ambuildempty_function ambuildempty;
    aminsert_function aminsert;
    ambulkdelete_function ambulkdelete;
    amvacuumcleanup_function amvacuumcleanup;
    amcanreturn_function amcanreturn;	/* can be NULL */
    amcostestimate_function amcostestimate;
    amoptions_function amoptions;
    amproperty_function amproperty; /* can be NULL */
    ambuildphasename_function ambuildphasename; /* can be NULL */
    amvalidate_function amvalidate;
    amadjustmembers_function amadjustmembers;	/* can be NULL */
    ambeginscan_function ambeginscan;
    amrescan_function amrescan;
    amgettuple_function amgettuple; /* can be NULL */
    amgetbitmap_function amgetbitmap;	/* can be NULL */
    amendscan_function amendscan;
    ammarkpos_function ammarkpos;	/* can be NULL */
    amrestrpos_function amrestrpos; /* can be NULL */

    /* interface functions to support parallel index scans */
    amestimateparallelscan_function amestimateparallelscan; /* can be NULL */
    aminitparallelscan_function aminitparallelscan; /* can be NULL */
    amparallelrescan_function amparallelrescan; /* can be NULL */
} IndexAmRoutine;

/*
 * Btree handler function: return IndexAmRoutine with access method parameters
 * and callbacks.
 */
Datum
bthandler(PG_FUNCTION_ARGS)
{
	IndexAmRoutine *amroutine = makeNode(IndexAmRoutine);

	amroutine->amstrategies = BTMaxStrategyNumber;
	amroutine->amsupport = BTNProcs;
	amroutine->amoptsprocnum = BTOPTIONS_PROC;
	amroutine->amcanorder = true;
	amroutine->amcanorderbyop = false;
	amroutine->amcanbackward = true;
	amroutine->amcanunique = true;
	amroutine->amcanmulticol = true;
	amroutine->amoptionalkey = true;
	amroutine->amsearcharray = true;
	amroutine->amsearchnulls = true;
	amroutine->amstorage = false;
	amroutine->amclusterable = true;
	amroutine->ampredlocks = true;
	amroutine->amcanparallel = true;
	amroutine->amcaninclude = true;
	amroutine->amusemaintenanceworkmem = false;
	amroutine->amparallelvacuumoptions =
		VACUUM_OPTION_PARALLEL_BULKDEL | VACUUM_OPTION_PARALLEL_COND_CLEANUP;
	amroutine->amkeytype = InvalidOid;

	amroutine->ambuild = btbuild;
	amroutine->ambuildempty = btbuildempty;
	amroutine->aminsert = btinsert;
	amroutine->ambulkdelete = btbulkdelete;
	amroutine->amvacuumcleanup = btvacuumcleanup;
	amroutine->amcanreturn = btcanreturn;
	amroutine->amcostestimate = btcostestimate;
	amroutine->amoptions = btoptions;
	amroutine->amproperty = btproperty;
	amroutine->ambuildphasename = btbuildphasename;
	amroutine->amvalidate = btvalidate;
	amroutine->amadjustmembers = btadjustmembers;
	amroutine->ambeginscan = btbeginscan;
	amroutine->amrescan = btrescan;
	amroutine->amgettuple = btgettuple;
	amroutine->amgetbitmap = btgetbitmap;
	amroutine->amendscan = btendscan;
	amroutine->ammarkpos = btmarkpos;
	amroutine->amrestrpos = btrestrpos;
	amroutine->amestimateparallelscan = btestimateparallelscan;
	amroutine->aminitparallelscan = btinitparallelscan;
	amroutine->amparallelrescan = btparallelrescan;

	PG_RETURN_POINTER(amroutine);
}

/*
 * Working state for btbuild and its callback.
 *
 * When parallel CREATE INDEX is used, there is a BTBuildState for each
 * participant.
 */
typedef struct BTBuildState
{
	bool		isunique;
	bool		nulls_not_distinct;
	bool		havedead;
	Relation	heap;
	BTSpool    *spool;

	/*
	 * spool2 is needed only when the index is a unique index. Dead tuples are
	 * put into spool2 instead of spool in order to avoid uniqueness check.
	 */
	BTSpool    *spool2;
	double		indtuples;

	/*
	 * btleader is only present when a parallel index build is performed, and
	 * only in the leader process. (Actually, only the leader has a
	 * BTBuildState.  Workers have their own spool and spool2, though.)
	 */
	BTLeader   *btleader;
} BTBuildState;

/*
 * Overall status record for index writing phase.
 */
typedef struct BTWriteState
{
	Relation	heap;
	Relation	index;
	BTScanInsert inskey;		/* generic insertion scankey */
	bool		btws_use_wal;	/* dump pages to WAL? 是否开启wal */
	BlockNumber btws_pages_alloced; /* # pages allocated 记录下一次申请page的块号*/
	BlockNumber btws_pages_written; /* # pages written out 已经写入文件的页面块号*/
	Page		btws_zeropage;	/* workspace for filling zeroes */
} BTWriteState;

/*
 * Status record for a btree page being built.  We have one of these
 * for each active tree level.
 */
typedef struct BTPageState
{
	Page		btps_page;		/* workspace for page building */
	BlockNumber btps_blkno;		/* block # to write this page at */
	IndexTuple	btps_lowkey;	/* page's strict lower bound pivot tuple */
	OffsetNumber btps_lastoff;	/* last item offset loaded */
	Size		btps_lastextra; /* last item's extra posting list space */
	uint32		btps_level;		/* tree level (0 = leaf) */
	Size		btps_full;		/* "full" if less than this much free space */
	struct BTPageState *btps_next;	/* link to parent level, if any */
} BTPageState;

3.2.2. 构建流程

整体过程：首先将待索引的表元组封装为索引元组，并对索引元组进行排序，然后将排好序的索引元组填充到当前叶子节点中，若当前叶子节点填充满，则新申请一个叶子节点作为当前叶子节点的右兄弟节点，然后在父节点中添加指向新叶子节点的指针（没有父节点则创建），接下来把新申请的节点当作叶子节点。重复这个过程，直到添加完成所有的索引元组为止。
创建索引流程图

indexBuildHeapScan函数为BTBuildState结构体中的spool1 2赋值。

构建流程图1（图片来源于PG数据库内核分析书籍）

生成索引结构流程图

构建流程图2（图片来源于PG数据库内核分析书籍）

插入索引元组流程和实例图（假设一个页面最多两个元组）

add函数插入时发现空间不够，则申请新的页面作为右兄弟节点，然后设置旧节点HK。
接着通过当前层次的BTPageState.btps_next找到父节点（若没有则创建），将旧节点的min key插入到父节点（同样使用add函数，递归调用），然后设置新旧节点的兄弟关系。
旧节点填充完成不会再修改，此时调用_bt_blwritepage将旧节点信息写入索引文件，并修改BTWriteState结构。
最后调用_bt_sortaddtup将待插入的索引元组插入到节点（若右有足够空间直接调用次函数）

构建流程图3（图片来源于PG数据库内核分析书籍）

上述得到了初步的索引结构，但每层最右节点与父节点的连接还没完成，因此使用_uppershutdown函数进行调整

BTPageState结构中有指向上一层的BTPageState结构，通过这个结构依次调整最右节点和父节点

构建流程图4（图片来源于PG数据库内核分析书籍）

3.2.3. 插入流程

插入背景：表已经创建索引，现在插入新的元组，索引也需要调整，使用btinsert实现。
btinsert过程：往下找到合适的插入节点（若是唯一索引，则验证），将其插入到索引中。

找合适节点的过程中，会利用BTStackData保存查询过程中的父节点（保存查询路径），便于插入操作需要分裂叶子节点时，可以根据次结构找到父节点，并对父节点做出调整。
当需要分裂时，需要找到合适的分裂位置，保证左右节点的剩余空间合理（PG规定，若是最右节点，分裂后产生的右节点应为左节点剩余空间的两倍，否则相等，也可以在创建索引时指定fillfactor填充因子）。

插入索引元组的流程图。

插入流程图（图片来源于PG数据库内核分析书籍）

_bt_moveright是根据L&Y的论文实现的，通过 _bt_moveright 检测当前操作的页面是否有并发insert 导致 page-split 的情况。如果发现查找的key 大于页面的 high-key，则通过页面的右指针移动到其兄弟节点。
_bt_findinsertloc不会局限于当前的节点，还会到右兄弟节点中查找。
_bt_insertonpg执行插入操作，检查是否有足够的剩余空间，如果有直接插入，如果没有，则调用_bt_findsplitloc寻找最合适的分裂位置，然后进行分裂操作，最后插入元组。

4. 总结

本篇简单介绍了索引概念、分类，并展示了索引的相关实践。
关于索引的原理，主要是以Btree为主，阐述了PG中Btree的背景以及构建过程。
篇幅和时间原因，没有描述函数具体实现和delete元组的操作。

1. 索引概述

1.1. 作用

索引是数据库中的一种快速查询数据的实现方法，记录了表中的一列或多列值与其物理位置之间的对应关系，类似于书本目录。
索引的作用不限于加快表记录（元组）的查询和排序速度，也可以实现唯一性约束（如唯一索引）。
索引的代价：

也是文件，会消耗一些存储空间
随着表记录（元组）的改变，索引也需要更新，消耗更多的时间和资源

1.2. 分类

BTree

创建索引默认的是BTree
支持范围、模糊等查询

Hash

处理简单的等于比较操作，非等于操作hash索引就失效了
hash索引的列如果重复度比较高，容易产生严重的hash冲突，查询效率较低，这种情况下也不适用

GiST (Generalized Search Tree)

通用搜索树，底层是一种平衡树，对比BTree，适用特殊的操作符（"@>","<<"等）
ip、几何、位置等更适用GIST索引，int和String也能使用，但性价比较低

SP-GIST

和GiST类似，但是是一棵不平衡树，支持多维和海量数据，把空间分割成互不相交的部分

倒排索引，适合于包含多个组成值的数据，比如数组，全文检索等。
与GIST类似，可以使用不同的操作符。

BRIN

块级索引，不是以行号为单位记录索引明细，而是记录每个数据块或者每段连续的数据块的统计信息。
当被索引列的值与物理存储相关性很强时，BRIN索引的效果比较好（例如时序数据，在时间或序列字段创建BRIN索引）。

2. 索引实践

2.1. 索引增删改

UNIQUE 指定唯一索引
USING method 指定索引类型
CONCURRENTLY 并发创建索引

在创建索引时，PG会锁定表，并全表扫描，用户的更新、删除等操作被阻塞。
对于更新频繁和较大的表，这会导致阻塞时间较长，此时可考虑用并发创建索引。

2.2. 选择好的索引

列的相关性

表的全表扫描次数

索引使用次数

2.3. 执行计划

使用工具猫生成百万条student数据
使用CopyManager导入生成的数据
Explain语句的用法

Seq Scan全表扫描

索引扫描

索引中找出需要的数据行的物理位置，然后再到表的数据块中把相应的数据读出来的过程（Index Only Scan 不回表、 Index Scan 回表）。

位图扫描

“Bitmap Index Scan”先在索引中找到符合条件的行，然后在内存中创建位图，再到表中扫描，也就是我们看到的“Bitmap Heap Scan”。
扫描索引，块中符合条件的数据将bit标为1，在内存中建立位图，扫描完索引后，再根据位图到表的数据文件中把相应的数据读出来，（一般执行计划的结果较多、非等值查询、多条件会走这个索引）。

3. 索引原理（Btree为主）

3.1. B-tree背景

3.1.1. 回顾B+树

所有的key + value 数据都保存在叶子节点，root 以及中间节点仅保存key 用作索引。
所有的叶子节点之间都维护一个单向/双向指针，方便顺序遍历。
这种结构，使得每次查找都会从root节点到叶子节点，稳定性较好（避免了Btree遍历节点的回溯）。

B+数的结构图（图片来源于PG数据库内核分析书籍）

3.1.2. B-Link-Tree

B-Link-Tree的设计来源于论文 1981 ACM论文 “Efficient Locking for Concurrent Operations on B-Trees”，即大名鼎鼎的 L & Y 论文。纯粹的 B+Tree 数据结构无法高效得解决并发操作的问题，如下时序场景：

时序场景图1（图片来源于csdn博客）

语法解释：

C <--- read(x) 表示从磁盘page x 读取数据到内存块 C中，用变量x表示其内存数据。
examin C；解析内存块C。
get child[1] to y，从解析的内存块中提取child[1]的磁盘page地址，并分配变量y表示这个child[1]指针指向的数据区域。
put (B, y’)，将内存块B 中 y’变量的数据写入到磁盘。

时序场景图2（图片来源于csdn博客）

即在 search (15) 操作和 Insert(12) 并发操作场景中，两种先后要准备读取y page的数据，在此期间 insert(12)先完成并发生了节点分裂以及leveup，修改了 search操作要访问的磁盘page y 的数据（将15 分裂到了磁盘page y’ ），导致 search(15)操作最后从磁盘读取的 y page没有找到y。
基于上述B+树在并发操作上的问题，B-Link-Tree 在B+Tree的基础上又增加了两个性质：

为每一个内部节点新增了一个指向兄弟节点的右向指针。
为每一个内部节点引入一个额外的key (high-key)，它是当前节点以及所有子节点中最大的key。

通过这两个性质的约束，实际的实现过程中可以保证正确性的情况下只需要加少量的写锁以及读无锁；能够提升 B+Tree 的并发访问性能。

B-Link-Tree图（图片来源于csdn博客）

3.1.3. B-tree的组织架构

B-tree的页面组织结构图

Page header存储一些元信息，索引类型、版本、页面层级等（linp0也在其中，为了表述方便画出来的）。
itup1、itup2、itup3是有序的索引元组。
linp1、linp2、linp3指向索引元组，存储了元组在页面内的实际位置。
Special space存储指向左右兄弟节点的指针、页面类型等信息。

索引页面组织结构图1（图片来源于PG数据库内核分析书籍）

假设页面填充三个元组就不允许插入新的元组了，此时linp1 2 3分别指向itup 1 2 3，现在继续插入元组会分如下情况：

节点不为最右节点：.首先将itup3（最大的索引元组）复制到当前节点的右兄弟节点，然后将linp0指向itup3（HK）。去掉linp3。使用linp0来指向页面中的HK。（HK不是一个关键字，只是给出该节点的范围）。

索引页面组织结构图2（图片来源于PG数据库内核分析书籍）

节点为最右节点：对于最右节点，其并不需要HK，故将每个linp递减一个位置，linp3不再使用。

索引页面组织结构图3（图片来源于PG数据库内核分析书籍）

根据上面的描述，可以得出一个完整的索引组织结构图：

索引整体组织结构图（图片来源于PG数据库内核分析书籍）

虚线上方为索引结构：

根节点和内部节点层：索引元组指向索引节点。
叶子节点：索引元组指向表元组的存放的实际位置。
根据BTree的特性，索引元组为有序，第一个叶子节点中itup3实际为最大索引元组，即HK，第二个叶子节点中itup1实际为最小索引元组，两者相同，故指向了同一物理存储位置。

虚线下方为表元组

3.2. B-tree构建过程

3.2.1. 数据结构介绍

IndexAmRoutine 结构体规定了制作索引需要定义的属性，以及实现的方法
BtHandler的实现如下
BTBuildState结构体主要保存索引元组
BTWriteState结构体主要保存块号信息等
BTPageState主要保存节点层次信息，以及指向父节点层次的指针

3.2.1.

/*
* API struct for an index AM.  Note this must be stored in a single palloc'd
* chunk of memory.
*/
typedef struct IndexAmRoutine
{
    NodeTag		type;
    
    /*
* Total number of strategies (operators) by which we can traverse/search
* this AM.  Zero if AM does not have a fixed set of strategy assignments.
*/
    uint16		amstrategies;
    /* total number of support functions that this AM uses */
    uint16		amsupport;
    /* opclass options support function number or 0 */
    uint16		amoptsprocnum;
    /* does AM support ORDER BY indexed column's value? */
    bool		amcanorder;
    /* does AM support ORDER BY result of an operator on indexed column? */
    bool		amcanorderbyop;
    /* does AM support backward scanning? */
    bool		amcanbackward;
    /* does AM support UNIQUE indexes? */
    bool		amcanunique;
    /* does AM support multi-column indexes? */
    bool		amcanmulticol;
    /* does AM require scans to have a constraint on the first index column? */
    bool		amoptionalkey;
    /* does AM handle ScalarArrayOpExpr quals? */
    bool		amsearcharray;
    /* does AM handle IS NULL/IS NOT NULL quals? */
    bool		amsearchnulls;
    /* can index storage data type differ from column data type? */
    bool		amstorage;
    /* can an index of this type be clustered on? */
    bool		amclusterable;
    /* does AM handle predicate locks? */
    bool		ampredlocks;
    /* does AM support parallel scan? */
    bool		amcanparallel;
    /* does AM support columns included with clause INCLUDE? */
    bool		amcaninclude;
    /* does AM use maintenance_work_mem? */
    bool		amusemaintenanceworkmem;
    /* OR of parallel vacuum flags.  See vacuum.h for flags. */
    uint8		amparallelvacuumoptions;
    /* type of data stored in index, or InvalidOid if variable */
    Oid			amkeytype;

    /*
* If you add new properties to either the above or the below lists, then
* they should also (usually) be exposed via the property API (see
* IndexAMProperty at the top of the file, and utils/adt/amutils.c).
*/

    /* interface functions */
    ambuild_function ambuild;
    ambuildempty_function ambuildempty;
    aminsert_function aminsert;
    ambulkdelete_function ambulkdelete;
    amvacuumcleanup_function amvacuumcleanup;
    amcanreturn_function amcanreturn;	/* can be NULL */
    amcostestimate_function amcostestimate;
    amoptions_function amoptions;
    amproperty_function amproperty; /* can be NULL */
    ambuildphasename_function ambuildphasename; /* can be NULL */
    amvalidate_function amvalidate;
    amadjustmembers_function amadjustmembers;	/* can be NULL */
    ambeginscan_function ambeginscan;
    amrescan_function amrescan;
    amgettuple_function amgettuple; /* can be NULL */
    amgetbitmap_function amgetbitmap;	/* can be NULL */
    amendscan_function amendscan;
    ammarkpos_function ammarkpos;	/* can be NULL */
    amrestrpos_function amrestrpos; /* can be NULL */

    /* interface functions to support parallel index scans */
    amestimateparallelscan_function amestimateparallelscan; /* can be NULL */
    aminitparallelscan_function aminitparallelscan; /* can be NULL */
    amparallelrescan_function amparallelrescan; /* can be NULL */
} IndexAmRoutine;

/*
 * Btree handler function: return IndexAmRoutine with access method parameters
 * and callbacks.
 */
Datum
bthandler(PG_FUNCTION_ARGS)
{
	IndexAmRoutine *amroutine = makeNode(IndexAmRoutine);

	amroutine->amstrategies = BTMaxStrategyNumber;
	amroutine->amsupport = BTNProcs;
	amroutine->amoptsprocnum = BTOPTIONS_PROC;
	amroutine->amcanorder = true;
	amroutine->amcanorderbyop = false;
	amroutine->amcanbackward = true;
	amroutine->amcanunique = true;
	amroutine->amcanmulticol = true;
	amroutine->amoptionalkey = true;
	amroutine->amsearcharray = true;
	amroutine->amsearchnulls = true;
	amroutine->amstorage = false;
	amroutine->amclusterable = true;
	amroutine->ampredlocks = true;
	amroutine->amcanparallel = true;
	amroutine->amcaninclude = true;
	amroutine->amusemaintenanceworkmem = false;
	amroutine->amparallelvacuumoptions =
		VACUUM_OPTION_PARALLEL_BULKDEL | VACUUM_OPTION_PARALLEL_COND_CLEANUP;
	amroutine->amkeytype = InvalidOid;

	amroutine->ambuild = btbuild;
	amroutine->ambuildempty = btbuildempty;
	amroutine->aminsert = btinsert;
	amroutine->ambulkdelete = btbulkdelete;
	amroutine->amvacuumcleanup = btvacuumcleanup;
	amroutine->amcanreturn = btcanreturn;
	amroutine->amcostestimate = btcostestimate;
	amroutine->amoptions = btoptions;
	amroutine->amproperty = btproperty;
	amroutine->ambuildphasename = btbuildphasename;
	amroutine->amvalidate = btvalidate;
	amroutine->amadjustmembers = btadjustmembers;
	amroutine->ambeginscan = btbeginscan;
	amroutine->amrescan = btrescan;
	amroutine->amgettuple = btgettuple;
	amroutine->amgetbitmap = btgetbitmap;
	amroutine->amendscan = btendscan;
	amroutine->ammarkpos = btmarkpos;
	amroutine->amrestrpos = btrestrpos;
	amroutine->amestimateparallelscan = btestimateparallelscan;
	amroutine->aminitparallelscan = btinitparallelscan;
	amroutine->amparallelrescan = btparallelrescan;

	PG_RETURN_POINTER(amroutine);
}

/*
 * Working state for btbuild and its callback.
 *
 * When parallel CREATE INDEX is used, there is a BTBuildState for each
 * participant.
 */
typedef struct BTBuildState
{
	bool		isunique;
	bool		nulls_not_distinct;
	bool		havedead;
	Relation	heap;
	BTSpool    *spool;

	/*
	 * spool2 is needed only when the index is a unique index. Dead tuples are
	 * put into spool2 instead of spool in order to avoid uniqueness check.
	 */
	BTSpool    *spool2;
	double		indtuples;

	/*
	 * btleader is only present when a parallel index build is performed, and
	 * only in the leader process. (Actually, only the leader has a
	 * BTBuildState.  Workers have their own spool and spool2, though.)
	 */
	BTLeader   *btleader;
} BTBuildState;

/*
 * Overall status record for index writing phase.
 */
typedef struct BTWriteState
{
	Relation	heap;
	Relation	index;
	BTScanInsert inskey;		/* generic insertion scankey */
	bool		btws_use_wal;	/* dump pages to WAL? 是否开启wal */
	BlockNumber btws_pages_alloced; /* # pages allocated 记录下一次申请page的块号*/
	BlockNumber btws_pages_written; /* # pages written out 已经写入文件的页面块号*/
	Page		btws_zeropage;	/* workspace for filling zeroes */
} BTWriteState;

/*
 * Status record for a btree page being built.  We have one of these
 * for each active tree level.
 */
typedef struct BTPageState
{
	Page		btps_page;		/* workspace for page building */
	BlockNumber btps_blkno;		/* block # to write this page at */
	IndexTuple	btps_lowkey;	/* page's strict lower bound pivot tuple */
	OffsetNumber btps_lastoff;	/* last item offset loaded */
	Size		btps_lastextra; /* last item's extra posting list space */
	uint32		btps_level;		/* tree level (0 = leaf) */
	Size		btps_full;		/* "full" if less than this much free space */
	struct BTPageState *btps_next;	/* link to parent level, if any */
} BTPageState;

3.2.2. 构建流程

整体过程：首先将待索引的表元组封装为索引元组，并对索引元组进行排序，然后将排好序的索引元组填充到当前叶子节点中，若当前叶子节点填充满，则新申请一个叶子节点作为当前叶子节点的右兄弟节点，然后在父节点中添加指向新叶子节点的指针（没有父节点则创建），接下来把新申请的节点当作叶子节点。重复这个过程，直到添加完成所有的索引元组为止。
创建索引流程图

indexBuildHeapScan函数为BTBuildState结构体中的spool1 2赋值。

构建流程图1（图片来源于PG数据库内核分析书籍）

生成索引结构流程图

构建流程图2（图片来源于PG数据库内核分析书籍）

插入索引元组流程和实例图（假设一个页面最多两个元组）

add函数插入时发现空间不够，则申请新的页面作为右兄弟节点，然后设置旧节点HK。
接着通过当前层次的BTPageState.btps_next找到父节点（若没有则创建），将旧节点的min key插入到父节点（同样使用add函数，递归调用），然后设置新旧节点的兄弟关系。
旧节点填充完成不会再修改，此时调用_bt_blwritepage将旧节点信息写入索引文件，并修改BTWriteState结构。
最后调用_bt_sortaddtup将待插入的索引元组插入到节点（若右有足够空间直接调用次函数）

构建流程图3（图片来源于PG数据库内核分析书籍）

上述得到了初步的索引结构，但每层最右节点与父节点的连接还没完成，因此使用_uppershutdown函数进行调整

BTPageState结构中有指向上一层的BTPageState结构，通过这个结构依次调整最右节点和父节点

构建流程图4（图片来源于PG数据库内核分析书籍）

3.2.3. 插入流程

插入背景：表已经创建索引，现在插入新的元组，索引也需要调整，使用btinsert实现。
btinsert过程：往下找到合适的插入节点（若是唯一索引，则验证），将其插入到索引中。

找合适节点的过程中，会利用BTStackData保存查询过程中的父节点（保存查询路径），便于插入操作需要分裂叶子节点时，可以根据次结构找到父节点，并对父节点做出调整。
当需要分裂时，需要找到合适的分裂位置，保证左右节点的剩余空间合理（PG规定，若是最右节点，分裂后产生的右节点应为左节点剩余空间的两倍，否则相等，也可以在创建索引时指定fillfactor填充因子）。

插入索引元组的流程图。

插入流程图（图片来源于PG数据库内核分析书籍）

_bt_moveright是根据L&Y的论文实现的，通过 _bt_moveright 检测当前操作的页面是否有并发insert 导致 page-split 的情况。如果发现查找的key 大于页面的 high-key，则通过页面的右指针移动到其兄弟节点。
_bt_findinsertloc不会局限于当前的节点，还会到右兄弟节点中查找。
_bt_insertonpg执行插入操作，检查是否有足够的剩余空间，如果有直接插入，如果没有，则调用_bt_findsplitloc寻找最合适的分裂位置，然后进行分裂操作，最后插入元组。

4. 总结

本篇简单介绍了索引概念、分类，并展示了索引的相关实践。
关于索引的原理，主要是以Btree为主，阐述了PG中Btree的背景以及构建过程。
篇幅和时间原因，没有描述函数具体实现和delete元组的操作。

活动

智算服务

应用商城

合作伙伴

开发者

支持与服务

了解天翼云

PG数据库中索引构建过程分析（一）

1. 索引概述

1.1. 作用

1.2. 分类

2. 索引实践

2.1. 索引增删改

2.2. 选择好的索引

2.3. 执行计划

3. 索引原理（Btree为主）

3.1. B-tree背景

3.1.1. 回顾B+树

3.1.2. B-Link-Tree

3.1.3. B-tree的组织架构

3.2. B-tree构建过程

3.2.1. 数据结构介绍

3.2.1.

3.2.2. 构建流程

3.2.3. 插入流程

4. 总结

PG数据库中索引构建过程分析（一）

1. 索引概述

1.1. 作用

1.2. 分类

2. 索引实践

2.1. 索引增删改

2.2. 选择好的索引

2.3. 执行计划

3. 索引原理（Btree为主）

3.1. B-tree背景

3.1.1. 回顾B+树

3.1.2. B-Link-Tree

3.1.3. B-tree的组织架构

3.2. B-tree构建过程

3.2.1. 数据结构介绍

3.2.1.

3.2.2. 构建流程

3.2.3. 插入流程

4. 总结