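Abstract:
Starting from the physical layer, this note explains how the engine encapsulates column data, how it uses that data, and how column storage relates to the other modules.
Data directory:
Taking the lineitem table as an example: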
[root@htap lineitem.tianmu]# tree
.
├── columns
│ ├── 0 -> /stonedb57/install/data/tianmu_data/16.0
│ ├── 1 -> /stonedb57/install/data/tianmu_data/16.1
│ ├── 10 -> /stonedb57/install/data/tianmu_data/16.10
│ ├── 11 -> /stonedb57/install/data/tianmu_data/16.11
│ ├── 12 -> /stonedb57/install/data/tianmu_data/16.12
│ ├── 13 -> /stonedb57/install/data/tianmu_data/16.13
│ ├── 14 -> /stonedb57/install/data/tianmu_data/16.14
│ ├── 15 -> /stonedb57/install/data/tianmu_data/16.15
│ ├── 2 -> /stonedb57/install/data/tianmu_data/16.2
│ ├── 3 -> /stonedb57/install/data/tianmu_data/16.3
│ ├── 4 -> /stonedb57/install/data/tianmu_data/16.4
│ ├── 5 -> /stonedb57/install/data/tianmu_data/16.5
│ ├── 6 -> /stonedb57/install/data/tianmu_data/16.6
│ ├── 7 -> /stonedb57/install/data/tianmu_data/16.7
│ ├── 8 -> /stonedb57/install/data/tianmu_data/16.8
│ └── 9 -> /stonedb57/install/data/tianmu_data/16.9
├── TABLE_DESC
├── V.630c75a90000274c
└── VERSION -> V.630c75a90000274c
17 directories, 3 files
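Notes:
- columns directory: each entry is named after a column's ordinal number; every column is stored separately
- TABLE_DESC file: the table descriptor, mainly the table's storage engine, encoding, and related attributes
- VERSION: the ID of the table's most recent transaction; a symlink pointing to V.xid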
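The VERSION link resolves to a file named V.xid. The following is a minimal sketch of recovering the transaction ID from such a file name, assuming the suffix is the ID written in hexadecimal (the listing suggests this, but it is not stated). ParseVersionName is a hypothetical helper, not an engine function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: parse "V.<hex digits>" into a 64-bit transaction ID.
// Assumption: the suffix is the ID in hexadecimal, at most 16 digits.
inline bool ParseVersionName(const std::string& name, uint64_t* xid) {
  assert(xid != nullptr);
  if (name.size() < 3 || name.size() > 18 || name.compare(0, 2, "V.") != 0) {
    return false;
  }
  uint64_t value = 0;
  for (std::size_t i = 2; i < name.size(); ++i) {
    const char c = name[i];
    unsigned digit = 0;
    if (c >= '0' && c <= '9') {
      digit = static_cast<unsigned>(c - '0');
    } else if (c >= 'a' && c <= 'f') {
      digit = static_cast<unsigned>(c - 'a') + 10;
    } else if (c >= 'A' && c <= 'F') {
      digit = static_cast<unsigned>(c - 'A') + 10;
    } else {
      return false;  // not a hex digit, so not a version file name
    }
    value = (value << 4) | digit;
  }
  *xid = value;
  return true;
}
```

With the listing above, the name V.630c75a90000274c would parse to the 64-bit value 0x630c75a90000274c, while names such as TABLE_DESC are rejected.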
The columns directory:
[root@htap columns]# ll
total 0
lrwxrwxrwx. 1 mysql mysql 40 Aug 29 16:16 0 -> /stonedb57/install/data/tianmu_data/16.0
lrwxrwxrwx. 1 mysql mysql 40 Aug 29 16:16 1 -> /stonedb57/install/data/tianmu_data/16.1
lrwxrwxrwx. 1 mysql mysql 41 Aug 29 16:16 10 -> /stonedb57/install/data/tianmu_data/16.10
lrwxrwxrwx. 1 mysql mysql 41 Aug 29 16:16 11 -> /stonedb57/install/data/tianmu_data/16.11
lrwxrwxrwx. 1 mysql mysql 41 Aug 29 16:16 12 -> /stonedb57/install/data/tianmu_data/16.12
lrwxrwxrwx. 1 mysql mysql 41 Aug 29 16:16 13 -> /stonedb57/install/data/tianmu_data/16.13
lrwxrwxrwx. 1 mysql mysql 41 Aug 29 16:16 14 -> /stonedb57/install/data/tianmu_data/16.14
lrwxrwxrwx. 1 mysql mysql 41 Aug 29 16:16 15 -> /stonedb57/install/data/tianmu_data/16.15
lrwxrwxrwx. 1 mysql mysql 40 Aug 29 16:16 2 -> /stonedb57/install/data/tianmu_data/16.2
lrwxrwxrwx. 1 mysql mysql 40 Aug 29 16:16 3 -> /stonedb57/install/data/tianmu_data/16.3
lrwxrwxrwx. 1 mysql mysql 40 Aug 29 16:16 4 -> /stonedb57/install/data/tianmu_data/16.4
lrwxrwxrwx. 1 mysql mysql 40 Aug 29 16:16 5 -> /stonedb57/install/data/tianmu_data/16.5
lrwxrwxrwx. 1 mysql mysql 40 Aug 29 16:16 6 -> /stonedb57/install/data/tianmu_data/16.6
lrwxrwxrwx. 1 mysql mysql 40 Aug 29 16:16 7 -> /stonedb57/install/data/tianmu_data/16.7
lrwxrwxrwx. 1 mysql mysql 40 Aug 29 16:16 8 -> /stonedb57/install/data/tianmu_data/16.8
lrwxrwxrwx. 1 mysql mysql 40 Aug 29 16:16 9 -> /stonedb57/install/data/tianmu_data/16.9
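Notes:
- Each symlink is named after the column's ordinal number
- Each symlink points to the directory that actually holds that column's data
Compare this with the column fields in the table definition: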
mysql> desc lineitem;
+-----------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+---------------+------+-----+---------+-------+
| l_orderkey | int(11) | NO | PRI | NULL | |
| l_partkey | int(11) | NO | | NULL | |
| l_suppkey | int(11) | NO | | NULL | |
| l_linenumber | int(11) | NO | PRI | NULL | |
| l_quantity | decimal(15,2) | NO | | NULL | |
| l_extendedprice | decimal(15,2) | NO | | NULL | |
| l_discount | decimal(15,2) | NO | | NULL | |
| l_tax | decimal(15,2) | NO | | NULL | |
| l_returnflag | char(1) | NO | | NULL | |
| l_linestatus | char(1) | NO | | NULL | |
| l_shipdate | date | NO | | NULL | |
| l_commitdate | date | NO | | NULL | |
| l_receiptdate | date | NO | | NULL | |
| l_shipinstruct | char(25) | NO | | NULL | |
| l_shipmode | char(10) | NO | | NULL | |
| l_comment | varchar(44) | NO | | NULL | |
+-----------------+---------------+------+-----+---------+-------+
16 rows in set (0.05 sec)
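The two listings can be tied together: a field's position in the DESC output gives its column ordinal, and the ordinal gives the link name under columns. The sketch below shows that lookup under two assumptions: the prefix 16 in the link targets is a per-table number, and ordinals follow DESC order starting at 0. ColumnIndex and ColumnDir are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: position of a field in DESC order, or -1 if absent.
inline int ColumnIndex(const std::vector<std::string>& fields,
                       const std::string& name) {
  for (std::size_t i = 0; i < fields.size(); ++i) {
    if (fields[i] == name) return static_cast<int>(i);
  }
  return -1;
}

// Hypothetical helper: build "<data_root>/<table_no>.<column>" as seen in the
// symlink targets. Assumption: table_no is the per-table prefix (16 here).
inline std::string ColumnDir(const std::string& data_root, unsigned table_no,
                             unsigned column) {
  return data_root + "/" + std::to_string(table_no) + "." +
         std::to_string(column);
}
```

For example, l_shipdate is the eleventh field, so its ordinal is 10 and its data would sit behind the link columns/10, which points to /stonedb57/install/data/tianmu_data/16.10.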
A single column directory:
Taking column 0 as an example:
[root@htap 0]# tree
.
├── DATA
├── DN
├── filters
│ ├── bloom
│ ├── cmap
│ └── hist
├── META
└── v
└── 630c75a90000274c
5 directories, 4 files
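Notes:
MVCC (multi-version concurrency control):
Column read logic:
Class structure:
Sequence of operations when creating column data:
Data relationship: DPN and PACK:
Data relationship: VER (transaction version) and DPN:
Core data structures: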
DPN
// Data Pack Node. Same layout on disk and in memory
struct DPN final {
public:
uint8_t used : 1; // occupied or not
uint8_t local : 1; // owned by a write transaction, thus to-be-commit
uint8_t synced : 1; // if the pack data in memory is up to date with the
// version on disk
uint8_t null_compressed : 1;
uint8_t data_compressed : 1;
uint8_t no_compress : 1;
uint8_t padding[3];
uint32_t base; // index of the DPN from which we copied, used by local pack
uint64_t addr; // data start address
uint64_t len; // data length
uint32_t nr; // number of records
uint32_t nn; // number of nulls
common::TX_ID xmin; // creation trx id
common::TX_ID xmax; // delete trx id
union {
int64_t min_i;
double min_d;
char min_s[8];
};
union {
int64_t max_i;
double max_d;
char max_s[8];
};
union {
int64_t sum_i;
double sum_d;
uint64_t maxlen;
};
private:
// a tagged pointer, 16 bits as ref count.
// Only read-only dpn uses it for ref counting; local dpn is managed only by
// one write session
std::atomic_ulong tagged_ptr;
};
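Per-pack statistics such as nr, nn, min_i and max_i make it possible to decide, without reading a pack's data, whether a range predicate can match any of its rows. The sketch below illustrates that general idea only; it is not the engine's code. PackStats is a simplified stand-in for the integer fields of DPN, and ClassifyPack and PacksToRead are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for the integer statistics of a DPN (hypothetical type).
struct PackStats {
  uint32_t nr;    // number of records
  uint32_t nn;    // number of nulls
  int64_t min_i;  // smallest non-null value in the pack
  int64_t max_i;  // largest non-null value in the pack
};

enum class PackMatch { kNone, kSome, kAll };

// Classify a pack against lo <= value <= hi using statistics only.
inline PackMatch ClassifyPack(const PackStats& p, int64_t lo, int64_t hi) {
  if (p.nr == p.nn) return PackMatch::kNone;  // only NULLs, nothing matches
  if (p.max_i < lo || p.min_i > hi) return PackMatch::kNone;
  if (p.nn == 0 && p.min_i >= lo && p.max_i <= hi) return PackMatch::kAll;
  return PackMatch::kSome;  // range overlaps: the pack data must be inspected
}

// Indexes of the packs whose data actually has to be read for the predicate.
inline std::vector<std::size_t> PacksToRead(const std::vector<PackStats>& packs,
                                            int64_t lo, int64_t hi) {
  std::vector<std::size_t> out;
  for (std::size_t i = 0; i < packs.size(); ++i) {
    if (ClassifyPack(packs[i], lo, hi) == PackMatch::kSome) out.push_back(i);
  }
  return out;
}
```

Packs classified as kNone can be skipped and packs classified as kAll can be answered from nr alone for a COUNT, so only the kSome packs need to be read.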
COL_META
struct COL_META {
uint32_t magic;
uint32_t ver; // file version
uint8_t pss; // pack size shift
common::CT type; // type
common::PackFmt fmt; // data format: LZ4, snappy, lookup, raw, etc
uint8_t flag;
uint32_t precision;
uint32_t scale;
};
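The pss field is commented as the pack size shift. Assuming it is the base-2 logarithm of the number of rows per pack, a row number splits into a pack index and an offset inside that pack with a shift and a mask. The helper names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Position of a row inside a column: which pack, and where in that pack.
struct PackPos {
  uint64_t pack;
  uint64_t offset;
};

// Assumption: a pack holds 2^pss rows.
inline uint64_t RowsPerPack(uint8_t pss) { return uint64_t{1} << pss; }

// Split a zero-based row number into (pack index, offset within the pack).
inline PackPos LocateRow(uint64_t row, uint8_t pss) {
  return PackPos{row >> pss, row & (RowsPerPack(pss) - 1)};
}

// Number of packs needed to hold nr rows (the last pack may be partly filled).
inline uint64_t PacksNeeded(uint64_t nr, uint8_t pss) {
  return (nr + RowsPerPack(pss) - 1) >> pss;
}
```

With pss = 16, a pack holds 65536 rows, so row 65541 falls in pack 1 at offset 5, and 65537 rows need two packs.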
COL_VER_HDR_V3
struct alignas(128) COL_VER_HDR_V3 {
uint64_t nr; // no. of records
uint64_t nn; // no. of nulls
uint64_t np; // no. of packs
uint64_t auto_inc_next;
int64_t min;
int64_t max;
uint32_t dict_ver; // dict file version name. 0 means n/a
uint32_t unique : 1;
uint32_t unique_updated : 1;
uint64_t natural_size;
uint64_t compressed_size;
};
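The counters in this header are enough to derive simple column-level figures. The sketch below computes a compression ratio and a null fraction, assuming natural_size and compressed_size are byte counts before and after compression. ColumnCounters is a hypothetical mirror of four header fields, not the on-disk structure.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of four counters from the version header.
struct ColumnCounters {
  uint64_t nr;               // number of records
  uint64_t nn;               // number of nulls
  uint64_t natural_size;     // assumed: uncompressed size in bytes
  uint64_t compressed_size;  // assumed: size on disk in bytes
};

// natural_size / compressed_size, or 0.0 when nothing has been written yet.
inline double CompressionRatio(const ColumnCounters& c) {
  if (c.compressed_size == 0) return 0.0;
  return static_cast<double>(c.natural_size) /
         static_cast<double>(c.compressed_size);
}

// Share of NULL values among all records, or 0.0 for an empty column.
inline double NullFraction(const ColumnCounters& c) {
  if (c.nr == 0) return 0.0;
  return static_cast<double>(c.nn) / static_cast<double>(c.nr);
}
```

Both helpers guard against division by zero, since a freshly created column has no records and no stored bytes.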