概念介绍
1.关系(Relation): "Relation"是一个表示表(Table)的数学术语。在PostgreSQL中,"Relation"表示表或索引。其他数据库一般表示为Table,索引为Index,而PostgreSQL有时会将它们统称为关系。表用来存数据,一个表通常由若干行组成。
2.元组(Tuple):表示表中的行(Row),有的数据库将元组称为记录(Record)。一个元组一般由多个域(Field)组成,域有时也被称为字段(Field)或列(Columu)。
1 系统表的访问
-
用户想要创建一个自己的表,是需要访问多个不同的系统表,如果没有缓存,每次需要读磁盘,且读多次,这个性能显然是接受不到了的。所以就需要有系统表的缓存机制来加速对其访问。
-
缓存设计为什么样的形态能保证性能收益最大化呢?首先PG 是进程模型,就是每一个用户连接到 Postmaster,都会为这个client 连接 fork一个 backend子进程用作服务交互。PG 之所以是多进程还是因为历史原因(1996年PostgreSQL 6.0发布时 linux kernel还不支持 multi-thread),当然良好多进程架构的设计和多线程架构设计的性能是没有差异的。对于缓存的设计,因为系统表的数据量并不大,且DDL 操作并不是高频操作,对于系统表的访问 应该尽可能得让数据靠近CPU,即最大程度得利用好CPU-cache。 所以 PG采用的缓存设计形态可以理解为 per-process-cache,类似thread-cache。每一个backend 进程都维护一个自己进程级别的 cache,每一个backend进程在有访问系统表的需求时可以尽可能得利用好cpu-cache。
-
因为是 per-process-cache, cache需要在 某一个backend 对系统表发生更改时其他的 backend 进程能够感知到。所以需要有一套维护cache 一致性的机制,也就是 PG 的 InvalidMessage机制。
-
有了per-process-cache 能够保障对系统表数据的访问性能,但是多个backend对同一个系统表的同一行的更新则需要有安全性的保障,这个时候就需要有锁 + MVCC 机制来做保障。
综上,系统表访问的核心 就是 cache + 锁|MVCC。一个保障访问高效,一个保障访问安全
由于系统表保存了数据库的所有元数据,所以系统运行时对系统表的访问是非常频繁的。为了 提高系统性能,在内存中建立了共享的系统表CACHE,使用 Hash 函数和 Hash 表提高查询效率。
cache种类
cache的相关代码在src/backend/utils/cache目录下
attribute cache代码:attoptcache.c
系统表cache代码:catcache.c
事件触发器cache代码:evtcache.c
计划cache代码:plancache.c
普通表描述符cache代码:relcache.c
系统表oid映射cache代码:relmapper.c
普通表oid映射cache代码:relfilenodemap.c
表空间cache代码:spccache.c
系统表cache代码:syscache.c
系统表检索函数:lsyscache.c
typecache代码:typcache.c
文本检索cache代码:ts_cache.c
此文主要对catcache和relcache进行说明。
2 SysCache(CatCache
)
系统表缓存,也叫
CatCache
(catalog cache),这个 Cache用来缓存系统表的数据。从实现上看SysCache就是一个数组,数组的长度为预定义的系统表的个数。在PG 8.4.1中实现了54个系统表,因此SysCache数组具有54个元素。每个元素的数据结构为CatCache,该结构体内使用Hash来存储被缓存的系统表元组,每个系统表唯一地对应一个SysCache数组中的CatCache结构。
2.1 SysCache中的数据结构
SysCache在PG中以数组的形式存在,并且数组的长度与预定义的系统表的个数相等。
static CatCache *SysCache[SysCacheSize];
SysCache数组中的元素数据类型为CatCache
CatCache的数据结构
typedef struct catcache
{
int id; /* cache identifier --- see syscache.h */
int cc_nbuckets; /* # 该catcache的hash桶数量 */
TupleDesc cc_tupdesc; /* tuple descriptor (copied from reldesc) */
dlist_head *cc_bucket; /* 该catcache的hash桶 */
CCHashFN cc_hashfunc[CATCACHE_MAXKEYS]; /* hash function for each key */
CCFastEqualFN cc_fastequal[CATCACHE_MAXKEYS]; /* fast equal function for
* each key */
int cc_keyno[CATCACHE_MAXKEYS]; /* AttrNumber of each key */
dlist_head cc_lists; /* CatCList 链表,用于缓存部分匹配的元组,每一个CatCList结构保持一次部分匹配元组的结果*/
int cc_ntup; /* # 缓存在该catcahe的元组数量 */
int cc_nkeys; /* # 查找关键字个数*/
const char *cc_relname; /* 系统表名称 */
Oid cc_reloid; /* 系统表OID */
Oid cc_indexoid; /* 索引OID */
bool cc_relisshared; /* 是否多个数据库共享 */
slist_node cc_next; /* 单向链表指针 */
ScanKeyData cc_skey[CATCACHE_MAXKEYS]; /* 堆保存时使用的关键字 */
/*
* Keep these at the end, so that compiling catcache.c with CATCACHE_STATS
* doesn't break ABI for other modules
*/
#ifdef CATCACHE_STATS
long cc_searches; /* total # searches against this cache */
long cc_hits; /* # of matches against existing entry */
long cc_neg_hits; /* # of matches against negative entry */
long cc_newloads; /* # of successful loads of new entry */
/*
* cc_searches - (cc_hits + cc_neg_hits + cc_newloads) is number of failed
* searches, each of which will result in loading a negative entry
*/
long cc_invals; /* # of entries invalidated from cache */
long cc_lsearches; /* total # list-searches */
long cc_lhits; /* # of matches against existing lists */
#endif
} CatCache;
由CatCache的成员变量可以看到,CatCache中的存在一个双向链表,与四个哈希函数,并拥有四个哈希关键字。实际上,CatCache中含有两种双向链表:cc_lists(CatCList)与cc_bucket(CatCTup),分别用于部分匹配查找,与精准匹配查找。其中cc_bucket也可以由cc_lists管理。CatCache使用双向链表cc_lists(CatCList)将不同的哈希桶串在了一起。
catcacheheader数据结构
catcacheheader用于管理所有缓存(cache)的信息。CatCacheHeader 是 catcache 链表的头,将每一个 catcache 初始化后,就将这个 cacatche 的结构链接到此链表。
typedef struct catcacheheader
{
slist_head ch_caches; /* 指向catcache链表的头部*/
int ch_ntup; /* # 所有catcahe中缓存元组的总数 */
} CatCacheHeader;
SysCacheIdentifier结构
SysCacheIdentifier 枚举了所有的 catcache 的类型
enum SysCacheIdentifier
{
AGGFNOID = 0,
AMNAME, //1
AMOID, //2
AMOPOPID, //3
AMOPSTRATEGY,
AMPROCNUM,
ATTNAME,
ATTNUM,
#ifdef __AUDIT__
AUDITSTMTCONF,
AUDITSTMTCONFOID,
AUDITUSERCONF,
AUDITUSERCONFOID,
AUDITOBJCONF,
AUDITOBJCONFOID,
AUDITOBJDEFAULT,
AUDITOBJDEFAULTOID,
AUDITFGAPOLICYCONF,
AUDITFGAOBJPOLICYCONF,
#endif
......
......
USERMAPPINGOID,
USERMAPPINGUSERSERVER
#define SysCacheSize (USERMAPPINGUSERSERVER + 1) //109
};
cacheinfo结构
穷举了每一个 cache 的结构。在SysCache.c文件中已经将所有系统表的CatCache信息存储在一个名为cacheinfo的静态数组中。
static const struct cachedesc cacheinfo[] = {
{AggregateRelationId, /* AGGFNOID */ //AggregateRelationId=2600
AggregateFnoidIndexId, //AggregateFnoidIndexId=2650
1,
{
Anum_pg_aggregate_aggfnoid,
0,
0,
0
},
16
},
{AccessMethodRelationId, /* AMNAME */ //catcache对应的系统表OID,AccessMethodRelationId=2601
AmNameIndexId, //catcache用到的索引OID,AmNameIndexId=2651
1, //查询关键字的个数
{ //查询关键字的属性号
Anum_pg_am_amname,
0,
0,
0
},
4 //该catcache需要的Hash桶数
},
......
}
cachedesc结构
在SysCache.c文件中已经将所有系统表的CatCache信息存储在一个名为cacheinfo的静态数组 中,每个系统表的CatCache信息用一个数组元素来描述,其数据类型为cachedesc:
/*
* struct cachedesc: information defining a single syscache
*/
struct cachedesc
{
Oid reloid; /*catcache对应的系统表OID*/
Oid indoid; /*catcache用到的索引OID*/
int nkeys; /* 查询关键字的个数 */
int key[4]; /* 查询关键字的属性号 */
int nbuckets; /* 该catcache需要的Hash桶数 */
};
CatCList数据结构
CatCList
用于部分匹配查找,即仅使用缓存的前K个键列进行搜索,而不使用所有N个键列。这在某些情况下可以提高搜索效率。CatList 结构以链表的方式存放了在 Cache 中找到的元组。 CatCList 中所包含的元组实际通过 members 字段表示的变长数据来记录,该数组的实际长度由 n_members 字段记录。
typedef struct catclist
{
int cl_magic; /* for identifying CatCList entries */
#define CL_MAGIC 0x52765103
uint32 hash_value; /* 查找键的哈希值*/
dlist_node cache_elem; /* list member of per-catcache list */
/*
* Lookup keys for the entry, with the first nkeys elements being valid.
* All by-reference are separately allocated.
*/
Datum keys[CATCACHE_MAXKEYS];
int refcount; /* 引用计数 */
bool dead; /* dead but not yet removed? */
bool ordered; /* members listed in index order? */
short nkeys; /* number of lookup keys specified */
int n_members; /* CatCList中元组的个数 */
CatCache *my_cache; /* 指向所在的catcahe */
CatCTup *members[FLEXIBLE_ARRAY_MEMBER]; /* CatCList中元组的数组 */
} CatCList;
catctup数据结构
catctup用于缓存(cache)中的单个元组(tuples)。
typedef struct catctup
{
int ct_magic; /* for identifying CatCTup entries */
#define CT_MAGIC 0x57261502
uint32 hash_value; /* 用于查找此元组的哈希值*/
/*
keys存储了用于查找此元组的键值。对于正元组,这些键值是对元组的引用,
而对于负元组,这些键值通常是单独分配的。
*/
Datum keys[CATCACHE_MAXKEYS];
/*
CatCTup 的元组通常是哈希桶(bucket)中的一部分,这个属性用于将元组链接到哈希桶中的双向链表。
这些链表以LRU(最近最少使用)的顺序维护,以加快重复查找的速度。
*/
dlist_node cache_elem; /* list member of per-bucket list */
/*如果元组被标记为 "dead",不会被删除,但是在后续搜索中不会返回它,
直到引用计数(refcount)为零时从缓存中删除。*/
int refcount; /* 活跃引用计数 */
bool dead; /* 是否dead */
/*如果 negative 为 true,则表示这是一个负元组,就是实际并不存在于系统表中,
但是其键值曾经用于在 CatCache 中进行查找的元组。负元组只有键值,其他属性均为空。
负元组的存在是为了避免反复到物理表中去查找不存在的元组所带来的I/O开销*/
bool negative; /* 是否是负元组*/
HeapTupleData tuple; /* 元组管理的头部 */
/*
* The tuple may also be a member of at most one CatCList. (If a single
* catcache is list-searched with varying numbers of keys, we may have to
* make multiple entries for the same tuple because of this restriction.
* Currently, that's not expected to be common, so we accept the potential
* inefficiency.)
*/
struct catclist *c_list; /* containing CatCList, or NULL if none */
CatCache *my_cache; /* link to owning catcache */
/* properly aligned tuple data follows, unless a negative entry */
} CatCTup;
CatCache 中缓存元组的组织
-
SysCacheIdentifier :枚举了所有的 catcache 的类型。
-
cacheinfo :穷举了每一个 cache 的结构。
-
struct catcacheheader:用于管理所有缓存(cache)的信息,是 catcache 链表的头。
-
struct catcache:用于管理缓存(cache)的信息,描述了每一个 catcache 的信息。
-
struct catctup:缓存(cache)中的单个元组(tuples)。
-
struct catclist:与分部键匹配的元组(tuples)列表。
-
SysCacheIdentifier 枚举了所有的 catcache 的类型,而 cacheinfo 则是穷举了每一个 cache 的结构。
-
CatCacheHeader 是 catcache 链表的头,将每一个 catcache 初始化后,就将这个 cacatche 的结构链接到此链表。catcache 结构体中描述了每一个 cacatche 的信息。
-
缓存的系统表 tuple 数据就存储在 cc_bucket 中,在 catche 初始化时将 cc_bucket 初始化为一个 hash 桶数组。在每一个 hash 桶中存储着一个 CatTup 链表,每一个 CatTup 结构就对应着一个系统表的 tuple。
-
在 catcache 结构中还有一个 cc_lists 链表,此链表是为了缓存一个多结果的系统表扫描。比如使用表名作为条件扫描 pg_class,可能返回多条记录。返回的记录还是存储在 cc_bucket 哈希桶数组中,同时将在哈希桶中的位置存储在 CatCList 结构的 members 数组中。
以上结构体按照逻辑大小排序的顺序为:
CatCacheHeader->SysCache->CatCList->CatCTup->HeapTupleData
2.2 SysCache 初始化
2.2.1 InitCatalogCache-初始化第一阶段
PG 在初始化 backend进程时 会通过
InitPostgres
--> InitCatalogCache
完成对 SysCache
的初始化。//syscache第一阶段初始化的调用栈
InitCatCache
InitCatalogCache
InitPostgres
SysCache 的初始化实际上是填充 SysCache 数组中每个元素的 CatCache 结构的过程,主要任务是将查找系统表元组的关键字信息写入 SysCache 数组元素中。这样通过指定的关键字可以快速定位到系统表元组的存储位置。
InitCatalogCache
初始化的基本过程如下:/*
* InitCatalogCache - initialize the caches
*
* Note that no database access is done here; we only allocate memory
* and initialize the cache structure. Interrogation of the database
* to complete initialization of a cache happens upon first use
* of that cache.
*/
void
InitCatalogCache(void)
{
int cacheId;
int i,
j;
StaticAssertStmt(SysCacheSize == (int) lengthof(cacheinfo),
"SysCacheSize does not match syscache.c's array");
Assert(!CacheInitialized);
SysCacheRelationOidSize = SysCacheSupportingRelOidSize = 0;
//1.循环初始化缓存,逐个拿 syscache.h 预定义好的 SysCacheIdentifier 作为 cacheid
for (cacheId = 0; cacheId < SysCacheSize; cacheId++)
{
//2. 将预定义好的 cacheinfo 数组中的 CatCache 数据结构填充到 SysCache[cacheId]中
SysCache[cacheId] = InitCatCache(cacheId,
cacheinfo[cacheId].reloid,
cacheinfo[cacheId].indoid,
cacheinfo[cacheId].nkeys,
cacheinfo[cacheId].key,
cacheinfo[cacheId].nbuckets);
if (!PointerIsValid(SysCache[cacheId]))
elog(ERROR, "could not initialize cache %u (%d)",
cacheinfo[cacheId].reloid, cacheId);
/* Accumulate data for OID lists, too */
SysCacheRelationOid[SysCacheRelationOidSize++] =
cacheinfo[cacheId].reloid;
SysCacheSupportingRelOid[SysCacheSupportingRelOidSize++] =
cacheinfo[cacheId].reloid;
SysCacheSupportingRelOid[SysCacheSupportingRelOidSize++] =
cacheinfo[cacheId].indoid;
/* see comments for RelationInvalidatesSnapshotsOnly */
Assert(!RelationInvalidatesSnapshotsOnly(cacheinfo[cacheId].reloid));
}
Assert(SysCacheRelationOidSize <= lengthof(SysCacheRelationOid));
Assert(SysCacheSupportingRelOidSize <= lengthof(SysCacheSupportingRelOid));
/* Sort and de-dup OID arrays, so we can use binary search. */
//接下来,对OID数组进行排序和去重。这使得可以使用二分查找来查找特定的OID
pg_qsort(SysCacheRelationOid, SysCacheRelationOidSize,
sizeof(Oid), oid_compare);
for (i = 1, j = 0; i < SysCacheRelationOidSize; i++)
{
if (SysCacheRelationOid[i] != SysCacheRelationOid[j])
SysCacheRelationOid[++j] = SysCacheRelationOid[i];
}
SysCacheRelationOidSize = j + 1;
pg_qsort(SysCacheSupportingRelOid, SysCacheSupportingRelOidSize,
sizeof(Oid), oid_compare);
for (i = 1, j = 0; i < SysCacheSupportingRelOidSize; i++)
{
if (SysCacheSupportingRelOid[i] != SysCacheSupportingRelOid[j])
SysCacheSupportingRelOid[++j] = SysCacheSupportingRelOid[i];
}
SysCacheSupportingRelOidSize = j + 1;
CacheInitialized = true; //初始化完设成true
}
/* Sorted array of OIDs of tables that have caches on them */
static Oid SysCacheRelationOid[SysCacheSize];
static int SysCacheRelationOidSize;
/* Sorted array of OIDs of tables and indexes used by caches */
static Oid SysCacheSupportingRelOid[SysCacheSize * 2];
static int SysCacheSupportingRelOidSize;
RelationHasSysCache//测试relation是否有system cache
GetNonHistoricCatalogSnapshot
RelationSupportsSysCache//测试relation是否支持system cache
RelationIdIsInInitFile//确定给定关系(通过OID标识)是否应该存储在本地的关系缓存初始化文件中
RegisterRelcacheInvalidation
write_relcache_init_file
-
拿SysCacheIdentifier作为cacheid
逐个拿
syscache.h
预定义好的 SysCacheIdentifier
作为 cacheid。for (cacheId = 0; cacheId < SysCacheSize; cacheId++)
{
....
}
enum SysCacheIdentifier
{
AGGFNOID = 0,
AMNAME, //1
AMOID, //2
AMOPOPID, //3
AMOPSTRATEGY,
AMPROCNUM,
ATTNAME,
ATTNUM,
#ifdef __AUDIT__
AUDITSTMTCONF,
AUDITSTMTCONFOID,
AUDITUSERCONF,
AUDITUSERCONFOID,
AUDITOBJCONF,
AUDITOBJCONFOID,
AUDITOBJDEFAULT,
AUDITOBJDEFAULTOID,
AUDITFGAPOLICYCONF,
AUDITFGAOBJPOLICYCONF,
#endif
......
......
USERMAPPINGOID,
USERMAPPINGUSERSERVER
#define SysCacheSize (USERMAPPINGUSERSERVER + 1)
};
-
将cacheinfo数组中的 CatCache 数据结构填充到 SysCache[cacheId]
将预定义好的
cacheinfo
数组中的 CatCache
数据结构填充到 SysCache[cacheId]
中。 cacheinfo
中预定义好了每一个 cacheid 对应的 CatCache
的属性,比如: cacheid 为 AMNAME=1
的 cacheinfo 内容如下:for (cacheId = 0; cacheId < SysCacheSize; cacheId++)
{
//2. 将预定义好的 cacheinfo 数组中的 CatCache 数据结构填充到 SysCache[cacheId]中
SysCache[cacheId] = InitCatCache(cacheId,
cacheinfo[cacheId].reloid,
cacheinfo[cacheId].indoid,
cacheinfo[cacheId].nkeys,
cacheinfo[cacheId].key,
cacheinfo[cacheId].nbuckets);
}
static const struct cachedesc cacheinfo[] = {
........
{AccessMethodRelationId, /* AMNAME=1 */ //catcache对应的系统表OID,AccessMethodRelationId=2601
AmNameIndexId, //catcache用到的索引OID,AmNameIndexId=2651
1, //查询关键字的个数
{ //查询关键字的属性号
Anum_pg_am_amname,
0,
0,
0
},
4 //该catcache需要的Hash桶数
},
......
}
-
InitCatCache
InitCatcache每调用一次将处理一个cachedesc结构。该函数根据cachedesc中要求的Hash桶的数量为即将建立的CatCache结构分配内存,并根据cachedesc结构中的信息填充CatCache的各个字段, 包括
CatCache->cc_reloid
, CatCache->cc_indexoid
, CatCache->cc_nbuckets
等信息。最后将生成的CatCache链接在CacheHdr所指向的链表的头部。CatCache *
InitCatCache(int id,
Oid reloid,
Oid indexoid,
int nkeys,
const int *key,
int nbuckets)
{
CatCache *cp;
MemoryContext oldcxt;
size_t sz;
int i;
/*
* nbuckets is the initial number of hash buckets to use in this catcache.
* It will be enlarged later if it becomes too full.
*
* nbuckets must be a power of two. We check this via Assert rather than
* a full runtime check because the values will be coming from constant
* tables.
*
* If you're confused by the power-of-two check, see comments in
* bitmapset.c for an explanation.
*/
//检查 nbuckets:首先,它通过Assert 确保 nbuckets 大于0,并且是2的幂。
//这是因为哈希桶的数量通常应该是2的幂,以获得较好的哈希性能。
Assert(nbuckets > 0 && (nbuckets & -nbuckets) == nbuckets);
/*
* first switch to the cache context so our allocations do not vanish at
* the end of a transaction
*/
if (!CacheMemoryContext)
CreateCacheMemoryContext();
oldcxt = MemoryContextSwitchTo(CatlogMemoryContext);
/*
* if first time through, initialize the cache group header
如果 CacheHdr 为 NULL,表示第一次创建缓存,它会分配内存并初始化缓存组头部 CacheHdr。这个头部包括缓存链表和其他信息。
*/
if (CacheHdr == NULL)
{
CacheHdr = (CatCacheHeader *) palloc(sizeof(CatCacheHeader));
slist_init(&CacheHdr->ch_caches);
CacheHdr->ch_ntup = 0;
#ifdef CATCACHE_STATS
/* set up to dump stats at backend exit */
on_proc_exit(CatCachePrintStats, 0);
#endif
}
/*
* allocate a new cache structure
* Note: we rely on zeroing to initialize all the dlist headers correctly
*/
//根据cachedesc中要求的Hash桶的数量为即将建立的CatCache结构分配内存
sz = sizeof(CatCache) + PG_CACHE_LINE_SIZE;
cp = (CatCache *) CACHELINEALIGN(palloc0(sz));
cp->cc_bucket = palloc0(nbuckets * sizeof(dlist_head));
/*
* initialize the cache's relation information for the relation
* corresponding to this cache, and initialize some of the new cache's
* other internal fields. But don't open the relation yet.
* 根据cachedesc结构中的信息填充CatCache的各个字段
包括 CatCache->cc_reloid, CatCache->cc_indexoid , CatCache->cc_nbuckets 等信息。
*/
cp->id = id;
cp->cc_relname = "(not known yet)";
cp->cc_reloid = reloid;
cp->cc_indexoid = indexoid;
cp->cc_relisshared = false; /* temporary */
cp->cc_tupdesc = (TupleDesc) NULL;
cp->cc_ntup = 0;
cp->cc_nbuckets = nbuckets;
cp->cc_nkeys = nkeys;
for (i = 0; i < nkeys; ++i)
cp->cc_keyno[i] = key[i];
/*
* new cache is initialized as far as we can go for now. print some
* debugging information, if appropriate.
*/
InitCatCache_DEBUG2;
/*
* add completed cache to top of group header's list
最后将生成的CatCache链接在CacheHdr所指向的链表的头部。
*/
slist_push_head(&CacheHdr->ch_caches, &cp->cc_next);
/*
* back to the old context before we return...
*/
MemoryContextSwitchTo(oldcxt);
return cp;
}
2.2.2 增加一个系统表
如果想要增加更多的 syscache,首先需要确保该cacheid 能够唯一标识一个系统表的一行(该系统表能够在该属性列上建立唯一索引),在
SysCacheIdentifier
添加一行,同时 cacheinfo 数组相应的偏移位置上需要添加该属性列的声明。2.2.3 InitCatcachePhase2--初始化第二阶段
InitCatalogCache 函数中实际只完成了 SysCache 初始化的第 1个阶段,在稍后被调用的函数 RelationCachelnitializePhase3 (负责 RelCache 的初始化)来完成最后的 SysCache 初始化工作。
//初始化第二阶段调用栈
CatalogCacheInitializeCache
InitCatcachePhase2
InitCatalogCachePhase2
RelationCacheInitializePhase3
InitPostgres
第2阶段:RelationCacheInitializePhase3-->InitCatalogCachePhase2->InitCatcachePhase2。InitCatcachePhase2将依次完善SysCache数组中的CatCache结构,主要是根据对应的系统表填充CatCache结构中的元组描述符(cc_tupledesc)、系统表名(cc_relname)、查找关键字的相关字段(cc_hashfunc、cc_isname、cc_skey)等。
/*
* InitCatalogCachePhase2 - finish initializing the caches
*
* Finish initializing all the caches, including necessary database
* access.
*
* This is *not* essential; normally we allow syscaches to be initialized
* on first use. However, it is useful as a mechanism to preload the
* relcache with entries for the most-commonly-used system catalogs.
* Therefore, we invoke this routine when we need to write a new relcache
* init file.
*/
void
InitCatalogCachePhase2(void)
{
int cacheId;
Assert(CacheInitialized);
for (cacheId = 0; cacheId < SysCacheSize; cacheId++)
InitCatCachePhase2(SysCache[cacheId], true);
}
/*
* InitCatCachePhase2 -- external interface for CatalogCacheInitializeCache
*
* One reason to call this routine is to ensure that the relcache has
* created entries for all the catalogs and indexes referenced by catcaches.
* Therefore, provide an option to open the index as well as fixing the
* cache itself. An exception is the indexes on pg_am, which we don't use
* (cf. IndexScanOK).
*/
void
InitCatCachePhase2(CatCache *cache, bool touch_index)
{
//初始化目录缓存
if (cache->cc_tupdesc == NULL)
CatalogCacheInitializeCache(cache);
//如果设置了touch_index参数为true,并且缓存不是对pg_am表的索引,则打开目录的索引
if (touch_index &&
cache->id != AMOID &&
cache->id != AMNAME)
{
Relation idesc;
/*
* We must lock the underlying catalog before opening the index to
* avoid deadlock, since index_open could possibly result in reading
* this same catalog, and if anyone else is exclusive-locking this
* catalog and index they'll be doing it in that order.
*/
LockRelationOid(cache->cc_reloid, AccessShareLock);
idesc = index_open(cache->cc_indexoid, AccessShareLock);
/*
* While we've got the index open, let's check that it's unique (and
* not just deferrable-unique, thank you very much). This is just to
* catch thinkos in definitions of new catcaches, so we don't worry
* about the pg_am indexes not getting tested.
*/
#ifdef __TELEDBX__
/*
* we have already added some non-unique indexes into syscache,
* so we can not do this assert.
*/
#else
Assert(idesc->rd_index->indisunique &&
idesc->rd_index->indimmediate);
#endif
index_close(idesc, AccessShareLock);
UnlockRelationOid(cache->cc_reloid, AccessShareLock);
}
}
//将依次完善SysCache数组中的CatCache结构,主要是根据对应的系统表填充
//CatCache结构中的元组描述符(cc_tupledesc)、系统表名(cc_relname)、
//查找关键字的相关字段(cc_hashfunc、cc_isname、cc_skey)
static void
CatalogCacheInitializeCache(CatCache *cache)
{
Relation relation;
MemoryContext oldcxt;
TupleDesc tupdesc;
int i;
CatalogCacheInitializeCache_DEBUG1;
//查relcache:在relcache初始化第三阶段最后调用,这个时候关键系统表的relation结构已经加载到relcache了
relation = heap_open(cache->cc_reloid, AccessShareLock);
/* 切换内存上下文
* switch to the cache context so our allocations do not vanish at the end
* of a transaction
*/
Assert(CatlogMemoryContext != NULL);
oldcxt = MemoryContextSwitchTo(CatlogMemoryContext);
/*将关系缓存的元组描述符复制到永久cache存储中。
* copy the relcache's tuple descriptor to permanent cache storage
*/
tupdesc = CreateTupleDescCopyConstr(RelationGetDescr(relation));
/*
* save the relation's name and relisshared flag, too (cc_relname is used
* only for debugging purposes)
*/
cache->cc_relname = pstrdup(RelationGetRelationName(relation));
cache->cc_relisshared = RelationGetForm(relation)->relisshared;
/*
* return to the caller's memory context and close the rel
*/
MemoryContextSwitchTo(oldcxt);
heap_close(relation, AccessShareLock);
CACHE3_elog(DEBUG2, "CatalogCacheInitializeCache: %s, %d keys",
cache->cc_relname, cache->cc_nkeys);
/* 初始化cache的key信息
* initialize cache's key information
*/
for (i = 0; i < cache->cc_nkeys; ++i)
{
Oid keytype;
RegProcedure eqfunc;
CatalogCacheInitializeCache_DEBUG2;
if (cache->cc_keyno[i] > 0)
{
Form_pg_attribute attr = TupleDescAttr(tupdesc,
cache->cc_keyno[i] - 1);
keytype = attr->atttypid;
/* cache key columns should always be NOT NULL */
Assert(attr->attnotnull);
}
else
{
if (cache->cc_keyno[i] != ObjectIdAttributeNumber)
elog(FATAL, "only sys attr supported in caches is OID");
keytype = OIDOID;
}
GetCCHashEqFuncs(keytype,
&cache->cc_hashfunc[i],
&eqfunc,
&cache->cc_fastequal[i]);
/*
* Do equality-function lookup (we assume this won't need a catalog
* lookup for any supported type)
*/
fmgr_info_cxt(eqfunc,
&cache->cc_skey[i].sk_func,
CatlogMemoryContext);
/* Initialize sk_attno suitably for HeapKeyTest() and heap scans */
cache->cc_skey[i].sk_attno = cache->cc_keyno[i];
/* Fill in sk_strategy as well --- always standard equality */
cache->cc_skey[i].sk_strategy = BTEqualStrategyNumber;
cache->cc_skey[i].sk_subtype = InvalidOid;
/* Currently, there are no catcaches on collation-aware data types */
cache->cc_skey[i].sk_collation = InvalidOid;
CACHE4_elog(DEBUG2, "CatalogCacheInitializeCache %s %d %p",
cache->cc_relname,
i,
cache);
}
/* 标志着这个cache完全初始化完成了
* mark this cache fully initialized
*/
cache->cc_tupdesc = tupdesc;
}
SysCache数组初始化后,CatCache内是没有任何元组的,但是随着系统运行,对于系统表元组的访问,其中的系统表元组也会逐渐增多。
2.3 CatCache查找元组(保存和读取缓存数据)
经过 catcache 的初始化后,已经为所有的 catcache 初始化了用来缓存扫描数据的 hash 桶。下面就说明 hash 桶如何存取数据的。
CatCache中查找元组有两种方式:精确匹配和部分匹配。
SearchCatCache 是为了精确匹配,给定 CatCache 所需的所有键值,并返回 CatCache 中能完全匹配这个键值的一个元组。
SearchCatCacheList 则是部分匹配,只需要给出部分键值,并将部分匹配的一组元组以一个 CatCList 的方式返回。
精准匹配-SearchCatCache
-
精确匹配查找函数SearchCatCache 函数实现,其函数原型如下:
其中, vl v2 v3 和v4叫都用于查找元组的键值,分别对应该 Cache 中记录的元组搜索键。可以看到, SearchCatcache 最多可以使用4 个属性的键值进行查询, 4个参数分别对应该CatCache 数据结构中 cc_key 字段定义的查找键。
SearchCatCache 需要在1个给定的 CatCache 中查找元组,为了确定要在哪个 CatCache 中进行查找,还需要先通过 CacheHdr 遍历 SysCache 中所有的 CatCache 结构体,并根据查询的系统表名或系统表 OID 找到对应的 CatCache。
//精准匹配查找元组
HeapTuple
SearchCatCache(CatCache *cache,
Datum v1,
Datum v2,
Datum v3,
Datum v4)
{
return SearchCatCacheInternal(cache, cache->cc_nkeys, v1, v2, v3, v4);
}
//返回一个HeapTuple元组
static inline HeapTuple
SearchCatCacheInternal(CatCache *cache,
int nkeys,
Datum v1,
Datum v2,
Datum v3,
Datum v4)
{
Datum arguments[CATCACHE_MAXKEYS];
uint32 hashValue;
Index hashIndex;
dlist_iter iter;
dlist_head *bucket;
CatCTup *ct;
......
// 1. 如果cache未初始化,调用CatalogCacheInitializeCache初始化
if (unlikely(cache->cc_tupdesc == NULL))
CatalogCacheInitializeCache(cache);
.......
//2.对所找元组的键值进行哈希处理,得到哈希索引hashIndex。
hashValue = CatalogCacheComputeHashValue(cache, nkeys, v1, v2, v3, v4);
hashIndex = HASH_INDEX(hashValue, cache->cc_nbuckets);
//3. 根据hashIndex找到cc_bucket中的哈希桶
bucket = &cache->cc_bucket[hashIndex];
//4. 遍历哈希桶找到满足查询需求的CatCTup元组
//CatCTup中的HeapTupleData中就是要查找的元组数据的头部(指针)。
dlist_foreach(iter, bucket)
{
ct = dlist_container(CatCTup, cache_elem, iter.cur);
if (ct->dead)
continue; /* ignore dead entries */
if (ct->hash_value != hashValue)
continue; /* quickly skip entry if wrong hash val */
if (!CatalogCacheCompareTuple(cache, nkeys, ct->keys, arguments))
continue;
//4.1将该元组放到hash桶的首位
dlist_move_head(bucket, &ct->cache_elem);
if (!ct->negative)
{//4.2 在hash同种检索到了结果,将此正元组返回
ResourceOwnerEnlargeCatCacheRefs(CurrentResourceOwner);
ct->refcount++;
ResourceOwnerRememberCatCacheRef(CurrentResourceOwner, &ct->tuple);
CACHE3_elog(DEBUG2, "SearchCatCache(%s): found in bucket %d",
cache->cc_relname, hashIndex);
#ifdef CATCACHE_STATS
cache->cc_hits++;
#endif
return &ct->tuple;
}
else
{//4.3 此时检索到的是一个负元组,则返回null
CACHE3_elog(DEBUG2, "SearchCatCache(%s): found neg entry in bucket %d",
cache->cc_relname, hashIndex);
#ifdef CATCACHE_STATS
cache->cc_neg_hits++;
#endif
return NULL;
}
}
//5. hash桶找不到,则打开系统表来查找该元组
return SearchCatCacheMiss(cache, nkeys, hashValue, hashIndex, v1, v2, v3, v4);
}
扫描物理系统表来查找该元组:
static pg_noinline HeapTuple
SearchCatCacheMiss(CatCache *cache,
int nkeys,
uint32 hashValue,
Index hashIndex,
Datum v1,
Datum v2,
Datum v3,
Datum v4)
{
...........
//1. 打开并扫描系统表
relation = heap_open(cache->cc_reloid, AccessShareLock);
scandesc = systable_beginscan(relation,
cache->cc_indexoid,
IndexScanOK(cache, cur_skey),
NULL,
nkeys,
cur_skey);
ct = NULL;
while (HeapTupleIsValid(ntp = systable_getnext(scandesc)))
{
//2. 找到相应的元组,将元组插入到cache->cc_bucket[hashIndex]哈希桶中
ct = CatalogCacheCreateEntry(cache, ntp, arguments,
hashValue, hashIndex,
false);
/* immediately set the refcount to 1 */
ResourceOwnerEnlargeCatCacheRefs(CurrentResourceOwner);
ct->refcount++;
ResourceOwnerRememberCatCacheRef(CurrentResourceOwner, &ct->tuple);
break; /* assume only one match */
}
systable_endscan(scandesc);
heap_close(relation, AccessShareLock);
//3. 如果没有扫描到任何结果,则创建一个负元组
if (ct == NULL)
{
if (IsBootstrapProcessingMode())
return NULL;
#ifdef __TELEDBX__
if (cache->cc_reloid != PgxcKeyValueRelationId)
{
#endif
ct = CatalogCacheCreateEntry(cache, NULL, arguments,
hashValue, hashIndex,
true);
CACHE4_elog(DEBUG2, "SearchCatCache(%s): Contains %d/%d tuples",
cache->cc_relname, cache->cc_ntup, CacheHdr->ch_ntup);
CACHE3_elog(DEBUG2, "SearchCatCache(%s): put neg entry in bucket %d",
cache->cc_relname, hashIndex);
#ifdef __TELEDBX__
}
#endif
/*
* We are not returning the negative entry to the caller, so leave its
* refcount zero.
*/
return NULL;
}
.......
return &ct->tuple;
}
部分匹配-SearchCatCacheList
-
部分匹配:给出部分键值,并将以CatCList的形式返回所有的buckets。 该过程需要调用SearchCatCacheList()函数。该函数首先也会对搜索的值进行哈希处理,并搜索已有的cc_lists查看是否存在搜索记录。如果搜索历史没有该哈希值,那么要搜索所有的CatCTup判断是否匹配。并将所有匹配的CatCTup组织成一条CatCList的形式返回。
/*
* List-search interface
*/
struct catclist *
SearchSysCacheList(int cacheId, int nkeys,
Datum key1, Datum key2, Datum key3)
{
if (cacheId < 0 || cacheId >= SysCacheSize ||
!PointerIsValid(SysCache[cacheId]))
elog(ERROR, "invalid cache ID: %d", cacheId);
return SearchCatCacheList(SysCache[cacheId], nkeys,
key1, key2, key3);
}
//返回一个CatCList
CatCList *
SearchCatCacheList(CatCache *cache,
int nkeys,
Datum v1,
Datum v2,
Datum v3)
{
Datum v4 = 0; /* dummy last-column value */
Datum arguments[CATCACHE_MAXKEYS];
uint32 lHashValue;
dlist_iter iter;
CatCList *cl;
CatCTup *ct;
List *volatile ctlist;
ListCell *ctlist_item;
int nmembers;
bool ordered;
HeapTuple ntp;
MemoryContext oldcxt;
int i;
// 1. 如果cache未初始化,调用CatalogCacheInitializeCache初始化
if (cache->cc_tupdesc == NULL)
CatalogCacheInitializeCache(cache);
.......
//2. 计算查找键的hash值
lHashValue = CatalogCacheComputeHashValue(cache, nkeys, v1, v2, v3, v4);
//3. 在cache的cc_lists中查找CatCList
dlist_foreach(iter, &cache->cc_lists)
{
cl = dlist_container(CatCList, cache_elem, iter.cur);
if (cl->dead)
continue; /* ignore dead entries */
if (cl->hash_value != lHashValue)
continue; /* quickly skip entry if wrong hash val */
/*
* see if the cached list matches our key.
*/
if (cl->nkeys != nkeys)
continue;
if (!CatalogCacheCompareTuple(cache, nkeys, cl->keys, arguments))
continue;
//3.1 为了加速后续的查询,将找到的CatCLisl放在cc lists 的最前面,然后将CatCList的refcount加1
dlist_move_head(&cache->cc_lists, &cl->cache_elem);
/* Bump the list's refcount and return it */
ResourceOwnerEnlargeCatCacheListRefs(CurrentResourceOwner);
cl->refcount++;
ResourceOwnerRememberCatCacheListRef(CurrentResourceOwner, cl);
CACHE2_elog(DEBUG2, "SearchCatCacheList(%s): found list",
cache->cc_relname);
#ifdef CATCACHE_STATS
cache->cc_lhits++;
#endif
return cl;//返回一个CatCList,CatCList里面有一个members数组,每个数组存储一个CatCtup
}
/*
4. 如果在CatCache中找不到 CatCList ,则扫描物理系统表并构建相应的 CatCList
并将它加入到 cc_list 所指向链表的头部。
*/
ResourceOwnerEnlargeCatCacheListRefs(CurrentResourceOwner);
ctlist = NIL;
PG_TRY();
{
.....
//4.1打开系统表,扫描
relation = heap_open(cache->cc_reloid, AccessShareLock);
scandesc = systable_beginscan(relation,
cache->cc_indexoid,
IndexScanOK(cache, cur_skey),
NULL,
nkeys,
cur_skey);
/* The list will be ordered iff we are doing an index scan */
ordered = (scandesc->irel != NULL);
while (HeapTupleIsValid(ntp = systable_getnext(scandesc)))
{
uint32 hashValue;
Index hashIndex;
bool found = false;
dlist_head *bucket;
/*4.1 查看是否已经存在这个元组
* See if there's an entry for this tuple already.
*/
ct = NULL;
hashValue = CatalogCacheComputeTupleHashValue(cache, cache->cc_nkeys, ntp);
hashIndex = HASH_INDEX(hashValue, cache->cc_nbuckets);
bucket = &cache->cc_bucket[hashIndex];
dlist_foreach(iter, bucket)
{
ct = dlist_container(CatCTup, cache_elem, iter.cur);
if (ct->dead || ct->negative)
continue; /* ignore dead and negative entries */
if (ct->hash_value != hashValue)
continue; /* quickly skip entry if wrong hash val */
if (!ItemPointerEquals(&(ct->tuple.t_self), &(ntp->t_self)))
continue; /* not same tuple */
/*
已经找到了一个匹配的元素,但如果该元素已经属于另一个列表,就不能使用它。
*/
if (ct->c_list)
continue;
found = true;
break; /* A-OK */
}
if (!found)
{
/* 没找到元组,创建一个元组 */
ct = CatalogCacheCreateEntry(cache, ntp, arguments,
hashValue, hashIndex,
false);
}
/* 将ct元组添加到 ctlist中,然后增加它的引用计数。
这种操作方式是为了确保如果 lappend由于内存不足而失败,
仍然保持状态的正确性。*/
ctlist = lappend(ctlist, ct);
ct->refcount++;
}
systable_endscan(scandesc);
heap_close(relation, AccessShareLock);
/* 创建一个CatCList*/
oldcxt = MemoryContextSwitchTo(CatlogMemoryContext);
nmembers = list_length(ctlist);
cl = (CatCList *)
palloc(offsetof(CatCList, members) + nmembers * sizeof(CatCTup *));
/* Extract key values */
CatCacheCopyKeys(cache->cc_tupdesc, nkeys, cache->cc_keyno,
arguments, cl->keys);
MemoryContextSwitchTo(oldcxt);
/*
* We are now past the last thing that could trigger an elog before we
* have finished building the CatCList and remembering it in the
* resource owner. So it's OK to fall out of the PG_TRY, and indeed
* we'd better do so before we start marking the members as belonging
* to the list.
*/
}
PG_CATCH();
{
foreach(ctlist_item, ctlist)
{
ct = (CatCTup *) lfirst(ctlist_item);
Assert(ct->c_list == NULL);
Assert(ct->refcount > 0);
ct->refcount--;
if (
#ifndef CATCACHE_FORCE_RELEASE
ct->dead &&
#endif
ct->refcount == 0 &&
(ct->c_list == NULL || ct->c_list->refcount == 0))
CatCacheRemoveCTup(cache, ct);
}
PG_RE_THROW();
}
PG_END_TRY();
cl->cl_magic = CL_MAGIC;
cl->my_cache = cache;
cl->refcount = 0; /* for the moment */
cl->dead = false;
cl->ordered = ordered;
cl->nkeys = nkeys;
cl->hash_value = lHashValue;
cl->n_members = nmembers;
i = 0;
foreach(ctlist_item, ctlist)
{
cl->members[i++] = ct = (CatCTup *) lfirst(ctlist_item);
Assert(ct->c_list == NULL);
ct->c_list = cl;
/* release the temporary refcount on the member */
Assert(ct->refcount > 0);
ct->refcount--;
/* mark list dead if any members already dead */
if (ct->dead)
cl->dead = true;
}
Assert(i == nmembers);
//并将CatCList加入到 cc_lists 所指向链表的头部。
dlist_push_head(&cache->cc_lists, &cl->cache_elem);
/* Finally, bump the list's refcount and return it */
cl->refcount++;
ResourceOwnerRememberCatCacheListRef(CurrentResourceOwner, cl);
CACHE3_elog(DEBUG2, "SearchCatCacheList(%s): made list of %d members",
cache->cc_relname, nmembers);
return cl;
}
SearchCatcacheList 函数也会先计算查找键的 Hash 值,不过该函数首先会在 CatCache的 cc_lists
字段中记录的 CatCList 链表中查找以前是否缓存了该查找键的结果,该查找过程将使用 CatCList中
tuple 字段指向的元组与查找键进行 Hash 值比较。如果能够找到匹配该 Hash 值的 CatCList ,则 cc_lhits 加1并将该 CatCList 移到 cc_lists 所指向链表的头部,然后返回找到的 CatCList 。如果在CatCache 中找不到 CatCList ,则扫描物理系统表并构建相应的 CatCList 并将它加入到 cc_lists所指向链表的头部。