机器信息
[ck@nor ~]$ cat /proc/cpuinfo | grep name | cut -f2 -d: | uniq -c
96 Intel(R) Xeon(R) Gold 6248R CPU @ 3.00GHz
[ck@nor ~]$
[ck@nor ~]$ free -m
total used free shared buff/cache available
Mem: 385290 14033 17124 1395 354131 356301
Swap: 16383 0 16383
[ck@nor ~]$
[ck@nor ~]$ grep ^ /sys/block/*/queue/rotational | grep nvme1n1
/sys/block/nvme1n1/queue/rotational:0
[ck@nor ~]$
[ck@nor ~]$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 893.1G 0 disk
├─sda1 8:1 0 300M 0 part /boot/efi
├─sda2 8:2 0 2G 0 part /boot
├─sda3 8:3 0 10G 0 part /
├─sda4 8:4 0 100G 0 part /home
├─sda5 8:5 0 16G 0 part [SWAP]
├─sda6 8:6 0 14G 0 part /tmp
├─sda7 8:7 0 10G 0 part /usr
└─sda8 8:8 0 740.9G 0 part /var
nvme0n1 259:0 0 2.9T 0 disk
└─nvme0n1p1 259:2 0 2.9T 0 part /data1
nvme1n1 259:1 0 2.9T 0 disk
└─nvme1n1p1 259:3 0 2.9T 0 part /data2
压测数据
根据 ClickHouse 官网压测方法、数据生成方式:clickhouse.com/docs/en/getting-started/example-datasets/star-schema/
压测参数 `-s 1000`, 总共生成 60 亿条数据,670G 数据文件。
| 文件名 | 大小 | 行数 | 表磁盘占用空间 |
| ------------- | ---- | -------------------- | -------------- |
| customer.tbl | 3.2G | 30000000(3000万) | 1.2G |
| lineorder.tbl | 680G | 5999989709(近60亿) | 168G |
| part.tbl | 193M | 2000000(200万) | 35M |
| supplier.tbl | 188M | 2000000(200万) | 76M |
表统计信息
SELECT
table AS `表名`,
sum(rows) AS `总行数`,
formatReadableSize(sum(data_uncompressed_bytes)) AS `原始大小`,
formatReadableSize(sum(data_compressed_bytes)) AS `压缩大小`,
round((sum(data_compressed_bytes) / sum(data_uncompressed_bytes)) * 100, 0) AS `压缩率%`
FROM system.parts
WHERE database = 'bench'
GROUP BY table
ORDER BY table;
+----------------+------------+--------------+--------------+------------+
| 表名 | 总行数 | 原始大小 | 压缩大小 | 压缩率% |
+----------------+------------+--------------+--------------+------------+
| customer | 30000000 | 1.65 GiB | 1.12 GiB | 68 |
| lineorder | 5999989709 | 240.32 GiB | 167.26 GiB | 70 |
| lineorder_flat | 7684711169 | 1.21 TiB | 694.26 GiB | 56 |
| part | 2000000 | 48.97 MiB | 34.39 MiB | 70 |
| supplier | 2000000 | 110.64 MiB | 75.33 MiB | 68 |
+----------------+------------+--------------+--------------+------------+
5 rows in set (0.00 sec)
1.压缩大小,就是实际占用的磁盘大小,稍微有些偏差是因为目录中还包含其他一些文件
2.总行数是统计值,不是精确值
3.原始大小不知怎么计算的,跟文件大小差别很大,例如 lineorder 文件大小 680G,原始大小只有 240.35 GiB。
通过列统计表大小
SELECT
table AS `表名`,
formatReadableSize(sum(column_data_uncompressed_bytes)) AS `原始大小`,
formatReadableSize(sum(column_data_compressed_bytes)) AS `压缩大小`,
round((sum(column_data_compressed_bytes) / sum(column_data_uncompressed_bytes)) * 100, 0) AS `压缩率%`
FROM system.parts_columns
WHERE database = 'bench'
GROUP BY table
ORDER BY table;
+----------------+--------------+--------------+------------+
| 表名 | 原始大小 | 压缩大小 | 压缩率% |
+----------------+--------------+--------------+------------+
| customer | 1.65 GiB | 1.12 GiB | 68 |
| lineorder | 240.30 GiB | 167.25 GiB | 70 |
| lineorder_flat | 1.20 TiB | 688.05 GiB | 56 |
| part | 48.97 MiB | 34.39 MiB | 70 |
| supplier | 110.64 MiB | 75.33 MiB | 68 |
+----------------+--------------+--------------+------------+
5 rows in set (0.04 sec)
通过列统计表大小,跟通过表统计值是一样的。
列统计信息
SELECT
table AS `表名`,
column AS `列名`,
any(type) AS `类型`,
sum(rows) AS `总行数`,
formatReadableSize(sum(column_data_uncompressed_bytes)) AS `原始大小`,
formatReadableSize(sum(column_data_compressed_bytes)) AS `压缩大小`,
round((sum(column_data_compressed_bytes) / sum(column_data_uncompressed_bytes)) * 100, 0) AS `压缩率%`
FROM system.parts_columns
WHERE database = 'bench' and table != 'lineorder_flat'
GROUP BY table, column
ORDER BY table, column ASC;
+-----------+------------------+------------------------+------------+--------------+--------------+------------+
| 表名 | 列名 | 类型 | 总行数 | 原始大小 | 压缩大小 | 压缩率% |
+-----------+------------------+------------------------+------------+--------------+--------------+------------+
| customer | C_ADDRESS | String | 30000000 | 457.73 MiB | 459.61 MiB | 100 |
| customer | C_CITY | LowCardinality(String) | 30000000 | 28.73 MiB | 28.81 MiB | 100 |
| customer | C_CUSTKEY | UInt32 | 30000000 | 114.44 MiB | 114.94 MiB | 100 |
| customer | C_MKTSEGMENT | LowCardinality(String) | 30000000 | 28.71 MiB | 17.27 MiB | 60 |
| customer | C_NAME | String | 30000000 | 543.59 MiB | 120.19 MiB | 22 |
| customer | C_NATION | LowCardinality(String) | 30000000 | 28.71 MiB | 28.80 MiB | 100 |
| customer | C_PHONE | String | 30000000 | 457.76 MiB | 360.34 MiB | 79 |
| customer | C_REGION | LowCardinality(String) | 30000000 | 28.71 MiB | 17.27 MiB | 60 |
| lineorder | LO_COMMITDATE | Date | 5999989709 | 11.17 GiB | 8.37 GiB | 75 |
| lineorder | LO_CUSTKEY | UInt32 | 5999989709 | 22.35 GiB | 8.98 GiB | 40 |
| lineorder | LO_DISCOUNT | UInt8 | 5999989709 | 5.59 GiB | 4.53 GiB | 81 |
| lineorder | LO_EXTENDEDPRICE | UInt32 | 5999989709 | 22.35 GiB | 22.45 GiB | 100 |
| lineorder | LO_LINENUMBER | UInt8 | 5999989709 | 5.59 GiB | 2.80 GiB | 50 |
| lineorder | LO_ORDERDATE | Date | 5999989709 | 11.17 GiB | 51.28 MiB | 0 |
| lineorder | LO_ORDERKEY | UInt32 | 5999989709 | 22.35 GiB | 7.34 GiB | 33 |
| lineorder | LO_ORDERPRIORITY | LowCardinality(String) | 5999989709 | 5.61 GiB | 2.54 GiB | 45 |
| lineorder | LO_ORDTOTALPRICE | UInt32 | 5999989709 | 22.35 GiB | 9.01 GiB | 40 |
| lineorder | LO_PARTKEY | UInt32 | 5999989709 | 22.35 GiB | 22.45 GiB | 100 |
| lineorder | LO_QUANTITY | UInt8 | 5999989709 | 5.59 GiB | 5.61 GiB | 100 |
| lineorder | LO_REVENUE | UInt32 | 5999989709 | 22.35 GiB | 22.45 GiB | 100 |
| lineorder | LO_SHIPMODE | LowCardinality(String) | 5999989709 | 5.61 GiB | 4.24 GiB | 76 |
| lineorder | LO_SHIPPRIORITY | UInt8 | 5999989709 | 5.59 GiB | 25.49 MiB | 0 |
| lineorder | LO_SUPPKEY | UInt32 | 5999989709 | 22.35 GiB | 22.45 GiB | 100 |
| lineorder | LO_SUPPLYCOST | UInt32 | 5999989709 | 22.35 GiB | 19.71 GiB | 88 |
| lineorder | LO_TAX | UInt8 | 5999989709 | 5.59 GiB | 4.25 GiB | 76 |
| part | P_BRAND | LowCardinality(String) | 2000000 | 3.84 MiB | 3.84 MiB | 100 |
| part | P_CATEGORY | LowCardinality(String) | 2000000 | 1.91 MiB | 1.92 MiB | 100 |
| part | P_COLOR | LowCardinality(String) | 2000000 | 1.91 MiB | 1.92 MiB | 100 |
| part | P_CONTAINER | LowCardinality(String) | 2000000 | 1.91 MiB | 1.92 MiB | 100 |
| part | P_MFGR | LowCardinality(String) | 2000000 | 1.91 MiB | 1.16 MiB | 61 |
| part | P_NAME | String | 2000000 | 26.03 MiB | 12.14 MiB | 47 |
| part | P_PARTKEY | UInt32 | 2000000 | 7.63 MiB | 7.66 MiB | 100 |
| part | P_SIZE | UInt8 | 2000000 | 1.91 MiB | 1.92 MiB | 100 |
| part | P_TYPE | LowCardinality(String) | 2000000 | 1.92 MiB | 1.92 MiB | 100 |
| supplier | S_ADDRESS | String | 2000000 | 30.52 MiB | 30.65 MiB | 100 |
| supplier | S_CITY | LowCardinality(String) | 2000000 | 1.92 MiB | 1.92 MiB | 100 |
| supplier | S_NAME | String | 2000000 | 36.24 MiB | 8.01 MiB | 22 |
| supplier | S_NATION | LowCardinality(String) | 2000000 | 1.91 MiB | 1.92 MiB | 100 |
| supplier | S_PHONE | String | 2000000 | 30.52 MiB | 24.02 MiB | 79 |
| supplier | S_REGION | LowCardinality(String) | 2000000 | 1.91 MiB | 1.15 MiB | 60 |
| supplier | S_SUPPKEY | UInt32 | 2000000 | 7.63 MiB | 7.66 MiB | 100 |
+-----------+------------------+------------------------+------------+--------------+--------------+------------+
41 rows in set (0.00 sec)
感觉列统计信息跟表统计信息,使用了很多共同的统计值。
数据导入性能
主要记录 lineorder.tbl 导入时候性能:
[ck@nor ~]$ iostat -x 10
avg-cpu: %user %nice %system %iowait %steal %idle
5.85 0.00 0.87 0.34 0.00 92.93
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme0n1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
nvme1n1 0.00 54993.30 2488.20 2213.80 318016.00 228832.80 232.60 60.04 12.64 0.32 26.49 0.21 96.76
sda 0.00 0.00 2.80 1.00 48.40 5.20 28.21 0.00 0.13 0.18 0.00 0.11 0.04
clickhouser srv 进程资源(top 命令)
[ck@nor ~]$ top
94850 ck 20 0 158.9g 6.3g 256264 S 215.6 1.7 9472:34 clickhouse-serv
[ck@nor ~]$ iostat 10
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
nvme0n1 0.00 0.00 0.00 0 0
nvme1n1 4700.90 322771.60 223831.20 3227716 2238312
sda 0.20 0.00 1.60 0 16
写入过程中,cpu、内存资源消耗很低,磁盘读达到 320MB/s,写达到 220MB/s。
中间过程,可能是由于 compaction 或者其他,写能达到 1.1GB/s。
耗时 37 分钟,268 万条每秒。
在构建 lineorder_flat 时候,设置 max_memory_usage 怎么都不生效,一直限制在 10G,不知道为啥。
Query OK, 5999989709 rows affected (1 hour 41 min 47.27 sec)
测试结果
mysql> select count(1) from lineorder_flat;
+------------+
| count() |
+------------+
| 5999989709 |
+------------+
1 row in set (0.00 sec)
执行结果是多次执行后,后几次的平均值,往往第一次执行较慢,后面几次很快,后面几次执行时间差不多。
| 查询 | 耗时 |
| ---- | -------- |
| Q1.1 | 0.51 sec |
| Q1.2 | 0.06 sec |
| Q1.3 | 0.04 sec |
| Q2.1 | 2.79 sec |
| Q2.2 | 2.29 sec |
| Q2.3 | 2.13 sec |
| Q3.1 | 5.12 sec |
| Q3.2 | 4.12 sec |
| Q3.3 | 3.65 sec |
| Q3.4 | 0.06 sec |
| Q4.1 | 3.95 sec |
| Q4.2 | 1.60 sec |
| Q4.3 | 1.56 sec |