PostgreSQL索引浅析-天翼云开发者社区

概述

PG索引类型有Btree、Hash、GiST、GIN、SPGiST、BRIN等。支持的索引方式包括唯一索引、主键索引、多属性索引、部分索引和表达式索引。其中部分索引是对表的子集建立索引（例如一个表中id在1~100的元祖的属性建立索引）。表达式索引可建立在一个函数或表中属性计算出来的表达式上。

索引相关系统表

索引也是表，需通过系统表进行管理。各系统表作用如下：

pg_am：PG支持的索引结构都存储在pg_am中。

pg_opclass：索引可以运用在不同数据类型的字段上，索引结构并不直接管理目标字段的类型，这些信息由pg_opclass系统表维护。pg_opclass定义索引访问方法操作符类。

pg_opfamily：管理不同索引下的数据库的操作符集合。以Btree为例，共有42种opclass。其中，int2_ops、int4_ops、int8_ops三个opclass属于同一个opfamily，oid为1976。
pg_index：存储创建元组的部分信息，创建索引时，PG会找出每个索引列对应的opclass，并且把它们存在pg_index系统表的indclass列中。

pg_class：保存所有表、视图、索引等信息。每个创建的索引都会作为一个元组保存在pg_class中，可通过索引名获取其oid。

pg_amproc：存储具体操作符的过程函数信息，对btree来说，主要与比较和排序相关。

pg_amop：存储具体操作符的关联信息。因为pg_opfamily只是提供了操作符集合，具体的操作符实现存储在这里。可以看到一个操作符类包含哪些操作符，使得条件中包含这个操作符时可以使用这个索引。根据操作符左右两边的数据类型，相同的操作符也有不同的oid。

索引代码结构

在PG中，索引接口被设计为上层操作函数与下层接口函数。上层操作函数统一了不同索引类型的操作。下层接口函数实现了不同索引类型对此的操作。

索引的操作函数：每种索引类型都在pg_am中注册了操作函数，不同索引支持的操作函数的数量并不相同。

PG索引的下层接口函数。

ambuild	构建索引
ambuildempty	构造空索引
aminsert	向现有索引插入新的索引元组
ambulkdelete	从索引中删除元组
amvacuumcleanup	在一次VACUUM后执行，进行额外的清理，一般用于批量删除后
amcanreturn	判断能否返回一个索引元组
amcostestimate	估算一个索引扫描的代价
amoptions	分析和验证一个索引的reloptions数组
amproperty	给出索引的字段属性
ambuildphasename	返回amproperty所属阶段
amvalidate	验证运算符类的定义
ambeginscan	开始一个新的扫描
amrescan	重新扫描
amgettuple	在给出的扫描描述符里获取下一个元组
amgetbitmap	获取索引位图
amendscan	结束扫描，释放资源
ammarkpos	标记当前扫描位置
amrestrpos	把扫描恢复到最近标记的位置
amestimateparallelscan	估算索引并行扫描运算符的大小
aminitparallelscan	初始化并行扫描
amparallelrescan	重新并行扫描

以B树结构为例，btree对应的amhandler为bthandler，其中定义了B树结构索引需要实现得下层操作函数。btree索引的ambuild是btbuild，通过调用btbuild实现btree索引构建逻辑。

Datum
bthandler(PG_FUNCTION_ARGS)
{
    IndexAmRoutine *amroutine = makeNode(IndexAmRoutine);

    amroutine->amstrategies = BTMaxStrategyNumber;
    amroutine->amsupport = BTNProcs;
    amroutine->amcanorder = true;
    amroutine->amcanorderbyop = false;
    amroutine->amcanbackward = true;
    amroutine->amcanunique = true;
    amroutine->amcanmulticol = true;
    amroutine->amoptionalkey = true;
    amroutine->amsearcharray = true;
    amroutine->amsearchnulls = true;
    amroutine->amstorage = false;
    amroutine->amclusterable = true;
    amroutine->ampredlocks = true;
    amroutine->amcanparallel = true;
    amroutine->amcaninclude = true;
    amroutine->amkeytype = InvalidOid;

    amroutine->ambuild = btbuild;
    amroutine->ambuildempty = btbuildempty;
    amroutine->aminsert = btinsert;
    amroutine->ambulkdelete = btbulkdelete;
    amroutine->amvacuumcleanup = btvacuumcleanup;
    amroutine->amcanreturn = btcanreturn;
    amroutine->amcostestimate = btcostestimate;
    amroutine->amoptions = btoptions;
    amroutine->amproperty = btproperty;
    amroutine->ambuildphasename = btbuildphasename;
    amroutine->amvalidate = btvalidate;
    amroutine->ambeginscan = btbeginscan;
    amroutine->amrescan = btrescan;
    amroutine->amgettuple = btgettuple;
    amroutine->amgetbitmap = btgetbitmap;
    amroutine->amendscan = btendscan;
    amroutine->ammarkpos = btmarkpos;
    amroutine->amrestrpos = btrestrpos;
    amroutine->amestimateparallelscan = btestimateparallelscan;
    amroutine->aminitparallelscan = btinitparallelscan;
    amroutine->amparallelrescan = btparallelrescan;

    PG_RETURN_POINTER(amroutine);
}

以创建索引为例，当执行“create index xxx_idx on table(xxx);”时，索引创建流程如图所示。

DefineIndex函数会创建索引相关结构体并校验相关参数合法性，其内部执行index_create函数。

Oid
index_create(Relation heapRelation,
             const char *indexRelationName,
             ...
             ...
             bool allow_system_table_mods,
             bool is_internal,
             Oid *constraintId)
{
    Oid         heapRelationId = RelationGetRelid(heapRelation);
    ...
    ...
    char        relkind;
    TransactionId relfrozenxid;
    MultiXactId relminmxid;

    /* 参数检查初始化、操作符类检查、参数合法检查 */
    Assert((constr_flags == 0) ||
           ((flags & INDEX_CREATE_ADD_CONSTRAINT) != 0));
    Assert(!partitioned || (flags & INDEX_CREATE_SKIP_BUILD));
    ...
    ...
    /*
     * 构建用于描述索引元祖信息的结构
     */
    indexTupDesc = ConstructTupleDescriptor(heapRelation,
                                            indexInfo,
                                            indexColNames,
                                            accessMethodObjectId,
                                            collationObjectId,
                                            classObjectId);

    /*
     * OID检查，为索引生成新的OID
     */
    if (!OidIsValid(indexRelationId))
    {
        if (IsBinaryUpgrade)
        {
            if (!OidIsValid(binary_upgrade_next_index_pg_class_oid))
                ereport(ERROR,
                        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                         errmsg("pg_class index OID value not set when in binary upgrade mode")));

            indexRelationId = binary_upgrade_next_index_pg_class_oid;
            binary_upgrade_next_index_pg_class_oid = InvalidOid;
        }
        else
        {
            indexRelationId =
                GetNewRelFileNode(tableSpaceId, pg_class, relpersistence);
        }
    }

    /*
     * 在磁盘上创建索引文件，新建索引时relfilenode设置为和OID相同
     */
    indexRelation = heap_create(indexRelationName,
                                namespaceId,
                                tableSpaceId,
                                ...
                                &relfrozenxid,
                                &relminmxid);
    ...
    ...

    /*
     * 维护系统表pg_class、pg_attribute、pg_index、pg_constraint等
     */
    InsertPgClassTuple(pg_class, indexRelation,
                       RelationGetRelid(indexRelation),
                       (Datum) 0,
                       reloptions);
    ...
    ...
    
    /* 进入index_build真正构建索引结构 */
    if (IsBootstrapProcessingMode())
    {
        index_register(heapRelationId, indexRelationId, indexInfo);
    }
    else if ((flags & INDEX_CREATE_SKIP_BUILD) != 0)
    {
        index_update_stats(heapRelation,
                           true,
                           -1.0);
        CommandCounterIncrement();
    }
    else
    {
        index_build(heapRelation, indexRelation, indexInfo, false, true);
    }
    index_close(indexRelation, NoLock);

    return indexRelationId;
}

调用index_build之前，索引相关元数据及系统表已插入，索引文件已创建，build根据pg_am中ambuild指定的索引构建函数执行创建流程。

void
index_build(Relation heapRelation,
            Relation indexRelation,
            IndexInfo *indexInfo,
            bool isreindex,
            bool parallel)
{
    IndexBuildResult *stats;
    Oid         save_userid;
    int         save_sec_context;
    int         save_nestlevel;

    /*
     * 安全检查、环境检查、信息初始化等
     */
    Assert(RelationIsValid(indexRelation));
    Assert(PointerIsValid(indexRelation->rd_indam));
    Assert(PointerIsValid(indexRelation->rd_indam->ambuild));
    Assert(PointerIsValid(indexRelation->rd_indam->ambuildempty));

    ...
    ...

    /*
     * 调用下层接口ambuild创建索引
     */
    stats = indexRelation->rd_indam->ambuild(heapRelation, indexRelation,
                                             indexInfo);
    Assert(PointerIsValid(stats));

    ...
    ...

    AtEOXact_GUC(false, save_nestlevel);
    SetUserIdAndSecContext(save_userid, save_sec_context);
}

索引扫描

查询SQL执行

用户对数据库表格数据进行增删查改操作时，PG会解析这些语言后会生成一颗计划树用于执行查询。普通的增删改查命令由exec_simple_query函数处理，该函数会创建PortalData数据结构，Portal记录了该SQL执行的所有信息。对于普通的查询语句，Portal按照执行策略会选择Executor进行处理。

exec_simple_query函数中的PortalStart会调用ChoosePortalStrategy选取执行策略，并创建维护queryDesc信息，该结构中包含PlanState节点。实际上，PG的查询计划会被组织为树形PlanState，树的每个节点代表一个执行任务。

PG按功能将这些节点分为了控制节点、扫描节点、连接节点、物化节点。

扫描节点

所有扫描节点都使用Scan作为公共父类。

typedef struct Scan
{
    Plan        plan;
    Index       scanrelid;      /* relid is index into the range table */
} Scan;

PG中的扫描包含以下Scan类型。

T_ScanState,
T_SeqScanState,
T_SampleScanState,
T_IndexScanState,
T_IndexOnlyScanState,
T_BitmapIndexScanState,
T_BitmapHeapScanState,
T_TidScanState,
T_SubqueryScanState,
T_FunctionScanState,
T_TableFuncScanState,
T_ValuesScanState,
T_CteScanState,
T_NamedTuplestoreScanState,
T_WorkTableScanState,
T_ForeignScanState,
T_CustomScanState

扫描节点有各自的执行函数，这些函数都由ExecScan来实现，ExecScan会迭代的进行扫描，使用accessMtd获取元祖。

IndexScan

当查询条件的属性存在索引时，则生成的查询计划中涉及的扫描会使用indexScan节点。

以下SQL为例，属性id存在索引id_idx。

select * from demo where id=xxx;

执行器中ExecutePlan函数处理查询计划，其主体是一个循环，直到检索到指定数量的元组。通过ExecProcNode执行任务，该函数会递归调用获取元祖。对于需要进行索引扫描的节点会调用ExecIndexScan执行。实际上ExecIndexScan会调用IndexNext检索元祖，IndexNext会调用至index_beginscan_internal函数，最终由ambeginscan选择对应结构的索引访问方法进行真正的扫描。其调用过程如图所示。

索引路径生成

当满足条件的元组较大时，索引扫描的代价会高于全表扫描，这时PG会选择全表扫描执行。

实际上，对查询SQL中的计划命令处理，无非是获取一个或多个元组，然后将这些元组返回给用户。对于一个执行计划来说，最重要的是如何获取元组。元组来源于一个基本表或一系列基本表连接而成的连接表。由于单表的访问方式、两个表的连接方式以及多表之间的连接顺序不同，因此可能有多条路径。需选取效率最高的路径作为执行计划。

exec_simple_query函数生成Portal之前会通过pg_plan_queries进行查询规划。该函数调用链如下。

exec_simple_query
    -> pg_plan_queries
        -> pg_plan_query 
            -> planner  //优化器的入口函数，输入为经过重写后的查询树，输出最优的计划树
                -> standard_planner //标准优化器入口
                    -> subquery_planner // 优化处理主体函数
                        -> grouping_planner // 不存在继承表的规划处理

pg_plan_queries会将查询树链表转换为执行计划链表，它调用pg_plan_query对每个查询树进行处理。planner函数开始会进入执行计划的生成阶段，通过standard_planner进入标准查询规划处理。subquery_planner递归调用为子查询生成相应子计划。后进入grouping_planner开始查询路径生成。

PG有三种算法生成路径：动态规划、遗传算法、用户自定义。

在生成路径时会遍历所有的关系生成不同的访问路径。query_planner会为查询生成路径。

-> query_planner // 为查询生成路径
    // 获取RelOptInfo
     -> add_base_rels_to_query // 递归找出基本表并生成基本关系
        -> build_simple_rel // 构建表信息结构RelOptInfo
            -> get_relation_info //补全RelOptInfo
                ...
     -> make_one_rel
         -> set_base_rel_pathlists

set_base_rel_pathlists会为每个关系生成不同的访问路径。

static void
set_base_rel_pathlists(PlannerInfo *root)
{
         ...
         ...
        set_rel_pathlist(root, rel, rti, root->simple_rte_array[rti]);
}

static void
set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
                 Index rti, RangeTblEntry *rte)
{
        ......
        switch (rel->rtekind)
        {
            //不同关系有不同的生成路径
            case RTE_RELATION:
                if (rte->relkind == RELKIND_FOREIGN_TABLE)
                {
                    /* Foreign table */
                    set_foreign_pathlist(root, rel, rte);
                }
                else if (rte->tablesample != NULL)
                {
                    /* Sampled relation */
                    set_tablesample_rel_pathlist(root, rel, rte);
                }
                else
                {
                    /* 普通关系 */
                    set_plain_rel_pathlist(root, rel, rte);
                }
                break;
            ......
        }
    }
    ......
    
    /* Now find the cheapest of the paths for this rel */
    set_cheapest(rel);
}

set_base_rel_pathlists实际调用set_rel_pathlis处理各种关系的路径，对于非子查询、无继承关系的查询称为普通关系，由set_plain_rel_pathlist处理。

static void
set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
{
    Relids      required_outer;
    required_outer = rel->lateral_relids;

    /* 添加顺序扫描路径 */
    add_path(rel, create_seqscan_path(root, rel, required_outer, 0));

    if (rel->consider_parallel && required_outer == NULL)
        create_plain_partial_paths(root, rel);

    /* 添加索引扫描路径 */
    create_index_paths(root, rel);

    create_tidscan_paths(root, rel);
}

首先会添加顺序扫描路径至pathlist中，若关系涉及索引将会建立索引扫描路径并加入到pathlist。但最终生成的路径不一定会保留在pathlist中，除非在代价或排序上更有优势。

以下为索引路径创建过程。

void
create_index_paths(PlannerInfo *root, RelOptInfo *rel)
{
    ...
    foreach(lc, rel->indexlist)
    {
        IndexOptInfo *index = (IndexOptInfo *) lfirst(lc);
        Assert(index->nkeycolumns <= INDEX_MAX_KEYS);
        if (index->indpred != NIL && !index->predOK)
            continue;
              
        MemSet(&rclauseset, 0, sizeof(rclauseset));
        match_restriction_clauses_to_index(root, index, &rclauseset);
        
        get_index_paths(root, rel, index, &rclauseset,
                        &bitindexpaths);

        MemSet(&jclauseset, 0, sizeof(jclauseset));
        match_join_clauses_to_index(root, rel, index,
                                    &jclauseset, &joinorclauses);

       ...
              
      }
    ...
    }

在生成索引路径之前通过match_restriction_clauses_to_index函数将索引与约束进行匹配。

create_index_paths
    -> match_restriction_clauses_to_index
        -> match_clauses_to_index
            -> match_clause_to_index
                -> match_clause_to_indexcol
                    -> match_opclause_to_indexcol

match_opclause_to_indexcol负责将一个条件与索引列进行匹配。在生成路径前，查询中的条件操作符会被解析，通过操作符oid和opfamily的oid对pg_amop系统表进行查询。若查询为空则当前条件不能使用索引。

get_index_paths

static void
get_index_paths(PlannerInfo *root, RelOptInfo *rel,
                IndexOptInfo *index, IndexClauseSet *clauses,
                List **bitindexpaths)
{
    ...
    indexpaths = build_index_paths(root, rel,
                                   index, clauses,
                                   index->predOK,
                                   ST_ANYSCAN,
                                   &skip_nonnative_saop,
                                   &skip_lower_saop);
     ...
     
    foreach(lc, indexpaths)
    {
        IndexPath  *ipath = (IndexPath *) lfirst(lc);

        if (index->amhasgettuple)
            add_path(rel, (Path *) ipath);

        if (index->amhasgetbitmap &&
            (ipath->path.pathkeys == NIL ||
             ipath->indexselectivity < 1.0))
            *bitindexpaths = lappend(*bitindexpaths, ipath);
    }

    ...
}

位图扫描：当满足条件的数据行数很少时，使用索引扫描十分高效。但当满足条件的行数增加时，可能需要在多次访问同一个页面，并且在多个页面之间来回跳跃，而不是一次性将某个基表页面上满足条件的数据行全部读取出来，导致查询性能非最优。此时，优化器转而会使用位图扫描。

build_index_paths

static List *
build_index_paths(PlannerInfo *root, RelOptInfo *rel,
                  IndexOptInfo *index, IndexClauseSet *clauses,
                  bool useful_predicate,
                  ScanTypeControl scantype,
                  bool *skip_nonnative_saop,
                  bool *skip_lower_saop)
{
    ...

    /* 1.处理SAOP */
    for (indexcol = 0; indexcol < index->nkeycolumns; indexcol++)
    {
        ...
        if (index_clauses == NIL && !index->amoptionalkey)
            return NIL;
    }
    ...

    /* 2.计算排序相关的优化 */
    pathkeys_possibly_useful = (scantype != ST_BITMAPSCAN &&
                                !found_lower_saop_clause &&
                                has_useful_pathkeys(root, rel));
    index_is_ordered = (index->sortopfamily != NULL);
    if (index_is_ordered && pathkeys_possibly_useful)
    {
        index_pathkeys = build_index_pathkeys(root, index,
                                              ForwardScanDirection);
     ...

    /* 3. 覆盖索引 */
    index_only_scan = (scantype != ST_BITMAPSCAN &&
                       check_index_only(rel, index));

    /* 4.生成索引路径 */
    if (index_clauses != NIL || useful_pathkeys != NIL || useful_predicate ||
        index_only_scan)
    {
        ipath = create_index_path(root, index,
                                  index_clauses,
                                  ...
                                  false);
        result = lappend(result, ipath);
        ...
        
        }
    }

    /*
     * 5. backwards scan 个人理解升序索引的降序排序处理。
     */
    if (index_is_ordered && pathkeys_possibly_useful)
    {
        index_pathkeys = build_index_pathkeys(root, index,BackwardScanDirection);
        useful_pathkeys = truncate_useless_pathkeys(root, rel,index_pathkeys);
        if (useful_pathkeys != NIL)
        {
            ipath = create_index_path(root, index,
                                      index_clauses,
                                      NIL,
            ...
        }
    }

    return result;
}

create_index_path

IndexPath *
create_index_path(PlannerInfo *root,
                  IndexOptInfo *index,
                  List *indexclauses,
                  List *indexorderbys,
                  List *indexorderbycols,
                  List *pathkeys,
                  ScanDirection indexscandir,
                  bool indexonly,
                  Relids required_outer,
                  double loop_count,
                  bool partial_path)
{
    IndexPath  *pathnode = makeNode(IndexPath);
    RelOptInfo *rel = index->rel;
    /* 维护参数化路径的参数信息 */
    pathnode->path.pathtype = indexonly ? T_IndexOnlyScan : T_IndexScan;
    pathnode->path.parent = rel;
    pathnode->path.pathtarget = rel->reltarget;
    pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
                                                          required_outer);
    pathnode->path.parallel_aware = false;
    pathnode->path.parallel_safe = rel->consider_parallel;
    pathnode->path.parallel_workers = 0;
    pathnode->path.pathkeys = pathkeys;

    pathnode->indexinfo = index;
    pathnode->indexclauses = indexclauses;
    pathnode->indexorderbys = indexorderbys;
    pathnode->indexorderbycols = indexorderbycols;
    pathnode->indexscandir = indexscandir;
    /* 计算索引代价 */
    cost_index(pathnode, root, loop_count, partial_path);

    return pathnode;
}

代价估计需从I/O和CPU开销两个方面考虑。

void
cost_index(IndexPath *path, PlannerInfo *root, double loop_count,
           bool partial_path)
{
    IndexOptInfo *index = path->indexinfo;
    ...
    /* 调用索引自身的代价估计函数计算启动代价、总代价、选择度、相关度 */
    amcostestimate = (amcostestimate_function) index->amcostestimate;
    amcostestimate(root, path, loop_count,
                   &indexStartupCost, &indexTotalCost,
                   &indexSelectivity, &indexCorrelation,
                   &index_pages);
    /* 记录索引启动代价与运行代价 */
    startup_cost += indexStartupCost;
    run_cost += indexTotalCost - indexStartupCost;

    /* 估算基表要取出的元组数 */
    tuples_fetched = clamp_row_est(indexSelectivity * baserel->tuples);

    if (loop_count > 1)
        ...
        /* 估算最大最小IO代价 */
        rand_heap_pages = pages_fetched;
        max_IO_cost = (pages_fetched * spc_random_page_cost) / loop_count;
        pages_fetched = ceil(indexSelectivity * (double) baserel->pages);
        pages_fetched = index_pages_fetched(pages_fetched * loop_count,
                                            baserel->pages,
                                            (double) index->pages,
                                            root);
        min_IO_cost = (pages_fetched * spc_random_page_cost) / loop_count;
    }
    else
    {
        ...
    }

    ...

    /* 根据最大最小代价和索引相关度计算总代价并加到运行代价中 */
    run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost);

    /* 估算CPU代价 */
    cost_qual_eval(&qpqual_cost, qpquals, root);

    startup_cost += qpqual_cost.startup;
    cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
    cpu_run_cost += cpu_per_tuple * tuples_fetched;
    
    ...
    /* 叠加出总代价并保存在path中 */
    run_cost += cpu_run_cost;
    path->path.startup_cost = startup_cost;
    path->path.total_cost = startup_cost + run_cost;
}

create_index_paths函数会通过add_path将索引路径添加至pathlist中。在添加时会将新路径与老路径进行比较，若新路径更优，则会保留新路径，删除老路径。

当普通关系路径构建完成后，通过set_cheapest函数设置代价最低路径。接着在外层make_one_rel中借由动态规划等算法生成最终路径。

总结以上流程。

实际上在许多场景下，索引有着更复杂的优化，比如涉及表连接、子查询、排序、覆盖索引、位图扫描转换等。此处仅以一个简单的例子对索引的优化过程进行了分析。

概述

索引相关系统表

索引也是表，需通过系统表进行管理。各系统表作用如下：

pg_am：PG支持的索引结构都存储在pg_am中。

pg_opclass：索引可以运用在不同数据类型的字段上，索引结构并不直接管理目标字段的类型，这些信息由pg_opclass系统表维护。pg_opclass定义索引访问方法操作符类。

pg_opfamily：管理不同索引下的数据库的操作符集合。以Btree为例，共有42种opclass。其中，int2_ops、int4_ops、int8_ops三个opclass属于同一个opfamily，oid为1976。
pg_index：存储创建元组的部分信息，创建索引时，PG会找出每个索引列对应的opclass，并且把它们存在pg_index系统表的indclass列中。

pg_class：保存所有表、视图、索引等信息。每个创建的索引都会作为一个元组保存在pg_class中，可通过索引名获取其oid。

pg_amproc：存储具体操作符的过程函数信息，对btree来说，主要与比较和排序相关。

pg_amop：存储具体操作符的关联信息。因为pg_opfamily只是提供了操作符集合，具体的操作符实现存储在这里。可以看到一个操作符类包含哪些操作符，使得条件中包含这个操作符时可以使用这个索引。根据操作符左右两边的数据类型，相同的操作符也有不同的oid。

索引代码结构

在PG中，索引接口被设计为上层操作函数与下层接口函数。上层操作函数统一了不同索引类型的操作。下层接口函数实现了不同索引类型对此的操作。

索引的操作函数：每种索引类型都在pg_am中注册了操作函数，不同索引支持的操作函数的数量并不相同。

PG索引的下层接口函数。

ambuild	构建索引
ambuildempty	构造空索引
aminsert	向现有索引插入新的索引元组
ambulkdelete	从索引中删除元组
amvacuumcleanup	在一次VACUUM后执行，进行额外的清理，一般用于批量删除后
amcanreturn	判断能否返回一个索引元组
amcostestimate	估算一个索引扫描的代价
amoptions	分析和验证一个索引的reloptions数组
amproperty	给出索引的字段属性
ambuildphasename	返回amproperty所属阶段
amvalidate	验证运算符类的定义
ambeginscan	开始一个新的扫描
amrescan	重新扫描
amgettuple	在给出的扫描描述符里获取下一个元组
amgetbitmap	获取索引位图
amendscan	结束扫描，释放资源
ammarkpos	标记当前扫描位置
amrestrpos	把扫描恢复到最近标记的位置
amestimateparallelscan	估算索引并行扫描运算符的大小
aminitparallelscan	初始化并行扫描
amparallelrescan	重新并行扫描

Datum
bthandler(PG_FUNCTION_ARGS)
{
    IndexAmRoutine *amroutine = makeNode(IndexAmRoutine);

    amroutine->amstrategies = BTMaxStrategyNumber;
    amroutine->amsupport = BTNProcs;
    amroutine->amcanorder = true;
    amroutine->amcanorderbyop = false;
    amroutine->amcanbackward = true;
    amroutine->amcanunique = true;
    amroutine->amcanmulticol = true;
    amroutine->amoptionalkey = true;
    amroutine->amsearcharray = true;
    amroutine->amsearchnulls = true;
    amroutine->amstorage = false;
    amroutine->amclusterable = true;
    amroutine->ampredlocks = true;
    amroutine->amcanparallel = true;
    amroutine->amcaninclude = true;
    amroutine->amkeytype = InvalidOid;

    amroutine->ambuild = btbuild;
    amroutine->ambuildempty = btbuildempty;
    amroutine->aminsert = btinsert;
    amroutine->ambulkdelete = btbulkdelete;
    amroutine->amvacuumcleanup = btvacuumcleanup;
    amroutine->amcanreturn = btcanreturn;
    amroutine->amcostestimate = btcostestimate;
    amroutine->amoptions = btoptions;
    amroutine->amproperty = btproperty;
    amroutine->ambuildphasename = btbuildphasename;
    amroutine->amvalidate = btvalidate;
    amroutine->ambeginscan = btbeginscan;
    amroutine->amrescan = btrescan;
    amroutine->amgettuple = btgettuple;
    amroutine->amgetbitmap = btgetbitmap;
    amroutine->amendscan = btendscan;
    amroutine->ammarkpos = btmarkpos;
    amroutine->amrestrpos = btrestrpos;
    amroutine->amestimateparallelscan = btestimateparallelscan;
    amroutine->aminitparallelscan = btinitparallelscan;
    amroutine->amparallelrescan = btparallelrescan;

    PG_RETURN_POINTER(amroutine);
}

以创建索引为例，当执行“create index xxx_idx on table(xxx);”时，索引创建流程如图所示。

DefineIndex函数会创建索引相关结构体并校验相关参数合法性，其内部执行index_create函数。

Oid
index_create(Relation heapRelation,
             const char *indexRelationName,
             ...
             ...
             bool allow_system_table_mods,
             bool is_internal,
             Oid *constraintId)
{
    Oid         heapRelationId = RelationGetRelid(heapRelation);
    ...
    ...
    char        relkind;
    TransactionId relfrozenxid;
    MultiXactId relminmxid;

    /* 参数检查初始化、操作符类检查、参数合法检查 */
    Assert((constr_flags == 0) ||
           ((flags & INDEX_CREATE_ADD_CONSTRAINT) != 0));
    Assert(!partitioned || (flags & INDEX_CREATE_SKIP_BUILD));
    ...
    ...
    /*
     * 构建用于描述索引元祖信息的结构
     */
    indexTupDesc = ConstructTupleDescriptor(heapRelation,
                                            indexInfo,
                                            indexColNames,
                                            accessMethodObjectId,
                                            collationObjectId,
                                            classObjectId);

    /*
     * OID检查，为索引生成新的OID
     */
    if (!OidIsValid(indexRelationId))
    {
        if (IsBinaryUpgrade)
        {
            if (!OidIsValid(binary_upgrade_next_index_pg_class_oid))
                ereport(ERROR,
                        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                         errmsg("pg_class index OID value not set when in binary upgrade mode")));

            indexRelationId = binary_upgrade_next_index_pg_class_oid;
            binary_upgrade_next_index_pg_class_oid = InvalidOid;
        }
        else
        {
            indexRelationId =
                GetNewRelFileNode(tableSpaceId, pg_class, relpersistence);
        }
    }

    /*
     * 在磁盘上创建索引文件，新建索引时relfilenode设置为和OID相同
     */
    indexRelation = heap_create(indexRelationName,
                                namespaceId,
                                tableSpaceId,
                                ...
                                &relfrozenxid,
                                &relminmxid);
    ...
    ...

    /*
     * 维护系统表pg_class、pg_attribute、pg_index、pg_constraint等
     */
    InsertPgClassTuple(pg_class, indexRelation,
                       RelationGetRelid(indexRelation),
                       (Datum) 0,
                       reloptions);
    ...
    ...
    
    /* 进入index_build真正构建索引结构 */
    if (IsBootstrapProcessingMode())
    {
        index_register(heapRelationId, indexRelationId, indexInfo);
    }
    else if ((flags & INDEX_CREATE_SKIP_BUILD) != 0)
    {
        index_update_stats(heapRelation,
                           true,
                           -1.0);
        CommandCounterIncrement();
    }
    else
    {
        index_build(heapRelation, indexRelation, indexInfo, false, true);
    }
    index_close(indexRelation, NoLock);

    return indexRelationId;
}

调用index_build之前，索引相关元数据及系统表已插入，索引文件已创建，build根据pg_am中ambuild指定的索引构建函数执行创建流程。

void
index_build(Relation heapRelation,
            Relation indexRelation,
            IndexInfo *indexInfo,
            bool isreindex,
            bool parallel)
{
    IndexBuildResult *stats;
    Oid         save_userid;
    int         save_sec_context;
    int         save_nestlevel;

    /*
     * 安全检查、环境检查、信息初始化等
     */
    Assert(RelationIsValid(indexRelation));
    Assert(PointerIsValid(indexRelation->rd_indam));
    Assert(PointerIsValid(indexRelation->rd_indam->ambuild));
    Assert(PointerIsValid(indexRelation->rd_indam->ambuildempty));

    ...
    ...

    /*
     * 调用下层接口ambuild创建索引
     */
    stats = indexRelation->rd_indam->ambuild(heapRelation, indexRelation,
                                             indexInfo);
    Assert(PointerIsValid(stats));

    ...
    ...

    AtEOXact_GUC(false, save_nestlevel);
    SetUserIdAndSecContext(save_userid, save_sec_context);
}

索引扫描

查询SQL执行

PG按功能将这些节点分为了控制节点、扫描节点、连接节点、物化节点。

扫描节点

所有扫描节点都使用Scan作为公共父类。

typedef struct Scan
{
    Plan        plan;
    Index       scanrelid;      /* relid is index into the range table */
} Scan;

PG中的扫描包含以下Scan类型。

T_ScanState,
T_SeqScanState,
T_SampleScanState,
T_IndexScanState,
T_IndexOnlyScanState,
T_BitmapIndexScanState,
T_BitmapHeapScanState,
T_TidScanState,
T_SubqueryScanState,
T_FunctionScanState,
T_TableFuncScanState,
T_ValuesScanState,
T_CteScanState,
T_NamedTuplestoreScanState,
T_WorkTableScanState,
T_ForeignScanState,
T_CustomScanState

扫描节点有各自的执行函数，这些函数都由ExecScan来实现，ExecScan会迭代的进行扫描，使用accessMtd获取元祖。

IndexScan

当查询条件的属性存在索引时，则生成的查询计划中涉及的扫描会使用indexScan节点。

以下SQL为例，属性id存在索引id_idx。

select * from demo where id=xxx;

索引路径生成

当满足条件的元组较大时，索引扫描的代价会高于全表扫描，这时PG会选择全表扫描执行。

exec_simple_query函数生成Portal之前会通过pg_plan_queries进行查询规划。该函数调用链如下。

exec_simple_query
    -> pg_plan_queries
        -> pg_plan_query 
            -> planner  //优化器的入口函数，输入为经过重写后的查询树，输出最优的计划树
                -> standard_planner //标准优化器入口
                    -> subquery_planner // 优化处理主体函数
                        -> grouping_planner // 不存在继承表的规划处理

PG有三种算法生成路径：动态规划、遗传算法、用户自定义。

在生成路径时会遍历所有的关系生成不同的访问路径。query_planner会为查询生成路径。

-> query_planner // 为查询生成路径
    // 获取RelOptInfo
     -> add_base_rels_to_query // 递归找出基本表并生成基本关系
        -> build_simple_rel // 构建表信息结构RelOptInfo
            -> get_relation_info //补全RelOptInfo
                ...
     -> make_one_rel
         -> set_base_rel_pathlists

set_base_rel_pathlists会为每个关系生成不同的访问路径。

static void
set_base_rel_pathlists(PlannerInfo *root)
{
         ...
         ...
        set_rel_pathlist(root, rel, rti, root->simple_rte_array[rti]);
}

static void
set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
                 Index rti, RangeTblEntry *rte)
{
        ......
        switch (rel->rtekind)
        {
            //不同关系有不同的生成路径
            case RTE_RELATION:
                if (rte->relkind == RELKIND_FOREIGN_TABLE)
                {
                    /* Foreign table */
                    set_foreign_pathlist(root, rel, rte);
                }
                else if (rte->tablesample != NULL)
                {
                    /* Sampled relation */
                    set_tablesample_rel_pathlist(root, rel, rte);
                }
                else
                {
                    /* 普通关系 */
                    set_plain_rel_pathlist(root, rel, rte);
                }
                break;
            ......
        }
    }
    ......
    
    /* Now find the cheapest of the paths for this rel */
    set_cheapest(rel);
}

set_base_rel_pathlists实际调用set_rel_pathlis处理各种关系的路径，对于非子查询、无继承关系的查询称为普通关系，由set_plain_rel_pathlist处理。

static void
set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
{
    Relids      required_outer;
    required_outer = rel->lateral_relids;

    /* 添加顺序扫描路径 */
    add_path(rel, create_seqscan_path(root, rel, required_outer, 0));

    if (rel->consider_parallel && required_outer == NULL)
        create_plain_partial_paths(root, rel);

    /* 添加索引扫描路径 */
    create_index_paths(root, rel);

    create_tidscan_paths(root, rel);
}

以下为索引路径创建过程。

void
create_index_paths(PlannerInfo *root, RelOptInfo *rel)
{
    ...
    foreach(lc, rel->indexlist)
    {
        IndexOptInfo *index = (IndexOptInfo *) lfirst(lc);
        Assert(index->nkeycolumns <= INDEX_MAX_KEYS);
        if (index->indpred != NIL && !index->predOK)
            continue;
              
        MemSet(&rclauseset, 0, sizeof(rclauseset));
        match_restriction_clauses_to_index(root, index, &rclauseset);
        
        get_index_paths(root, rel, index, &rclauseset,
                        &bitindexpaths);

        MemSet(&jclauseset, 0, sizeof(jclauseset));
        match_join_clauses_to_index(root, rel, index,
                                    &jclauseset, &joinorclauses);

       ...
              
      }
    ...
    }

在生成索引路径之前通过match_restriction_clauses_to_index函数将索引与约束进行匹配。

create_index_paths
    -> match_restriction_clauses_to_index
        -> match_clauses_to_index
            -> match_clause_to_index
                -> match_clause_to_indexcol
                    -> match_opclause_to_indexcol

get_index_paths

static void
get_index_paths(PlannerInfo *root, RelOptInfo *rel,
                IndexOptInfo *index, IndexClauseSet *clauses,
                List **bitindexpaths)
{
    ...
    indexpaths = build_index_paths(root, rel,
                                   index, clauses,
                                   index->predOK,
                                   ST_ANYSCAN,
                                   &skip_nonnative_saop,
                                   &skip_lower_saop);
     ...
     
    foreach(lc, indexpaths)
    {
        IndexPath  *ipath = (IndexPath *) lfirst(lc);

        if (index->amhasgettuple)
            add_path(rel, (Path *) ipath);

        if (index->amhasgetbitmap &&
            (ipath->path.pathkeys == NIL ||
             ipath->indexselectivity < 1.0))
            *bitindexpaths = lappend(*bitindexpaths, ipath);
    }

    ...
}

build_index_paths

static List *
build_index_paths(PlannerInfo *root, RelOptInfo *rel,
                  IndexOptInfo *index, IndexClauseSet *clauses,
                  bool useful_predicate,
                  ScanTypeControl scantype,
                  bool *skip_nonnative_saop,
                  bool *skip_lower_saop)
{
    ...

    /* 1.处理SAOP */
    for (indexcol = 0; indexcol < index->nkeycolumns; indexcol++)
    {
        ...
        if (index_clauses == NIL && !index->amoptionalkey)
            return NIL;
    }
    ...

    /* 2.计算排序相关的优化 */
    pathkeys_possibly_useful = (scantype != ST_BITMAPSCAN &&
                                !found_lower_saop_clause &&
                                has_useful_pathkeys(root, rel));
    index_is_ordered = (index->sortopfamily != NULL);
    if (index_is_ordered && pathkeys_possibly_useful)
    {
        index_pathkeys = build_index_pathkeys(root, index,
                                              ForwardScanDirection);
     ...

    /* 3. 覆盖索引 */
    index_only_scan = (scantype != ST_BITMAPSCAN &&
                       check_index_only(rel, index));

    /* 4.生成索引路径 */
    if (index_clauses != NIL || useful_pathkeys != NIL || useful_predicate ||
        index_only_scan)
    {
        ipath = create_index_path(root, index,
                                  index_clauses,
                                  ...
                                  false);
        result = lappend(result, ipath);
        ...
        
        }
    }

    /*
     * 5. backwards scan 个人理解升序索引的降序排序处理。
     */
    if (index_is_ordered && pathkeys_possibly_useful)
    {
        index_pathkeys = build_index_pathkeys(root, index,BackwardScanDirection);
        useful_pathkeys = truncate_useless_pathkeys(root, rel,index_pathkeys);
        if (useful_pathkeys != NIL)
        {
            ipath = create_index_path(root, index,
                                      index_clauses,
                                      NIL,
            ...
        }
    }

    return result;
}

create_index_path

IndexPath *
create_index_path(PlannerInfo *root,
                  IndexOptInfo *index,
                  List *indexclauses,
                  List *indexorderbys,
                  List *indexorderbycols,
                  List *pathkeys,
                  ScanDirection indexscandir,
                  bool indexonly,
                  Relids required_outer,
                  double loop_count,
                  bool partial_path)
{
    IndexPath  *pathnode = makeNode(IndexPath);
    RelOptInfo *rel = index->rel;
    /* 维护参数化路径的参数信息 */
    pathnode->path.pathtype = indexonly ? T_IndexOnlyScan : T_IndexScan;
    pathnode->path.parent = rel;
    pathnode->path.pathtarget = rel->reltarget;
    pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
                                                          required_outer);
    pathnode->path.parallel_aware = false;
    pathnode->path.parallel_safe = rel->consider_parallel;
    pathnode->path.parallel_workers = 0;
    pathnode->path.pathkeys = pathkeys;

    pathnode->indexinfo = index;
    pathnode->indexclauses = indexclauses;
    pathnode->indexorderbys = indexorderbys;
    pathnode->indexorderbycols = indexorderbycols;
    pathnode->indexscandir = indexscandir;
    /* 计算索引代价 */
    cost_index(pathnode, root, loop_count, partial_path);

    return pathnode;
}

代价估计需从I/O和CPU开销两个方面考虑。

void
cost_index(IndexPath *path, PlannerInfo *root, double loop_count,
           bool partial_path)
{
    IndexOptInfo *index = path->indexinfo;
    ...
    /* 调用索引自身的代价估计函数计算启动代价、总代价、选择度、相关度 */
    amcostestimate = (amcostestimate_function) index->amcostestimate;
    amcostestimate(root, path, loop_count,
                   &indexStartupCost, &indexTotalCost,
                   &indexSelectivity, &indexCorrelation,
                   &index_pages);
    /* 记录索引启动代价与运行代价 */
    startup_cost += indexStartupCost;
    run_cost += indexTotalCost - indexStartupCost;

    /* 估算基表要取出的元组数 */
    tuples_fetched = clamp_row_est(indexSelectivity * baserel->tuples);

    if (loop_count > 1)
        ...
        /* 估算最大最小IO代价 */
        rand_heap_pages = pages_fetched;
        max_IO_cost = (pages_fetched * spc_random_page_cost) / loop_count;
        pages_fetched = ceil(indexSelectivity * (double) baserel->pages);
        pages_fetched = index_pages_fetched(pages_fetched * loop_count,
                                            baserel->pages,
                                            (double) index->pages,
                                            root);
        min_IO_cost = (pages_fetched * spc_random_page_cost) / loop_count;
    }
    else
    {
        ...
    }

    ...

    /* 根据最大最小代价和索引相关度计算总代价并加到运行代价中 */
    run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost);

    /* 估算CPU代价 */
    cost_qual_eval(&qpqual_cost, qpquals, root);

    startup_cost += qpqual_cost.startup;
    cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
    cpu_run_cost += cpu_per_tuple * tuples_fetched;
    
    ...
    /* 叠加出总代价并保存在path中 */
    run_cost += cpu_run_cost;
    path->path.startup_cost = startup_cost;
    path->path.total_cost = startup_cost + run_cost;
}

create_index_paths函数会通过add_path将索引路径添加至pathlist中。在添加时会将新路径与老路径进行比较，若新路径更优，则会保留新路径，删除老路径。

当普通关系路径构建完成后，通过set_cheapest函数设置代价最低路径。接着在外层make_one_rel中借由动态规划等算法生成最终路径。

总结以上流程。

智算服务

应用商城

合作伙伴

开发者

支持与服务

了解天翼云

PostgreSQL索引浅析

概述

索引相关系统表

索引代码结构

索引扫描

查询SQL执行

扫描节点

IndexScan

索引路径生成

PostgreSQL索引浅析

概述

索引相关系统表

索引代码结构

索引扫描

查询SQL执行

扫描节点

IndexScan

索引路径生成

活动

智算服务

应用商城

合作伙伴

开发者

支持与服务

了解天翼云

PostgreSQL索引浅析

概述

索引相关系统表

索引代码结构

索引扫描

查询SQL执行

扫描节点

IndexScan

索引路径生成

PostgreSQL索引浅析

概述

索引相关系统表

索引代码结构

索引扫描

查询SQL执行

扫描节点

IndexScan

索引路径生成