teledbx压测tpmc教程
本教程适用于采用benchmarksql_teledbx-5.0版本作为客户端压测6.11版本的teledbx服务端,客户端和服务端分别部署在不同机器上,且客户端通过千兆网口连接服务端进行压测实践。
1.压测准备
1.1 客户端
客户端采用改造版的benchmarksql-5.0,其文件名: benchmarksql_teledbx-5.0.tar.gz
1.1.1 简介
benchmarkSQL是一个开源的基准测试工具,支持多种数据库系统,可以用于测试各种关系型数据库系统的性能,主要用于TPC-C测试。它是一个 Java 应用程序,可以模拟多用户并发访问数据库,并测量数据库系统的吞吐量和响应时间。
TPC-C 是针对联机交易处理系统(OLTP)进行测试的规范。使用一个商品销售模型对 OLTP 系统进行测试,其中包含五类事务:
-
NewOrder:新订单的生成
-
Payment:订单付款
-
OrderStatus:最近订单查询
-
Delivery:配送
-
StockLevel:库存缺货状态分析
TPC-C 使用 tpmC 值(每分钟运行的事务数,Transactions per Minute)来衡量系统最大有效吞吐量(Max Qualified Throughput,MQTh),其中 Transactions 以 NewOrder Transaction 为准,即最终衡量单位为每分钟处理的新订单数。
1.1.2 配置工作
如果存在benchmarksql_teledbx-5.0.tar.gz这个文件,解压即可使用;否则按照下述步骤更新benchmarksql。
安装java
# 1.下载解压jdk
$ tar -zxvf jdk-8u202-linux-x64.tar.gz
$ mv jdk-8u202-linux-x64 jdk1.8.0_202
# 2.配置环境变量
$ export JAVA_HOME=/yourpath/jdk1.8.0_202
$ export PATH=$JAVA_HOME/bin:$PATH
安装ant
##方法1
#安装
$ sudo yum install -y ant
##方法2
#在计划安装的路径获取ant
$ tar -zxvf apache-ant-1.10.13-bin.tar.gz
#环境配置
vim ~/.bashrc
export ANT_HOME=/yourpath/apache-ant-1.10.13
export PATH=$PATH:$ANT_HOME/bin
source ~/.bashrc
#验证
ant -version
Apache Ant(TM) version 1.10.13 compiled on January 4 2023
安装benchmarksql
# 切换到5.0分支
$ cd benchmarksql
$ git checkout v5.0
# 更新pgjdbc
$ cd lib/postgres
$ mv postgresql-9.3.jar postgresql-9.3.jar.bak
$ cd ../run/sql.common
# 注释掉foreignkeys.sql内容
vi foreignkeys.sql
# 编译生成benchmarksql.jar包
$ cd benchmarksql && ant clean && ant
1.2 服务端
集群1通过管控面开出。
1.2.1 集群实例规格
集群1:2cn(1主1备),4dn(2主2备)
gtm:16C128G
cn:16C128G
dn:8C128G
节点名称 | IP | port | 数据目录 |
---|---|---|---|
gtm主 | 192.168.26.4 | 21000 | /data/teledbx/tdata_03/peformance/data/gtm1 |
gtm备 | 192.168.26.5 | 21000 | /data/teledbx/tdata_03/peformance/data/gtm2 |
cn001主 | 192.168.26.4 | 11345 | /data/teledbx/tdata_02/peformance/data/cn001 |
cn001备 | 192.168.26.5 | 11345 | 同上 |
dn001主 | 192.168.26.4 | 11000 | /data/teledbx/tdata_00/peformance/data/dn001 |
dn002主 | 192.168.26.4 | 11002 | 同上 |
dn001备 | 192.168.26.5 | 11000 | /data/teledbx/tdata_01/peformance/data/dn002 |
dn002备 | 192.168.26.5 | 11002 | 同上 |
内存和cpu
cpu: 112
内存总数: 1007GB
1.2.2 参数配置
-
集群2的开通参数。然后修改下述几个参数重启集群2:
init_pool_size=1
min_free_size=1
min_pool_size=1
pooler_scale_factor = 1 #该参数设置为64会更高
max_connections=5000
2.压测
2.1 压测配置修改
cp prop.pg prop.teledbx
###########props.teledbx内容如下###############
db=postgres
driver=org.postgresql.Driver
conn=jdbc:postgresql://192.168.26.4:11345/tpcc
user=teledbx
password=密码
warehouses=100
loadWorkers=95
terminals=200
//To run specified transactions per terminal- runMins must equal zero
runTxnsPerTerminal=0
//To run for specified minutes- runTxnsPerTerminal must equal zero
runMins=10
//Number of total transactions per minute
limitTxnsPerMin=0
//Set to true to run in 4.x compatible mode. Set to false to use the
//entire configured database evenly.
terminalWarehouseFixed=true
//The following five values must add up to 100
//The default percentages of 45, 43, 4, 4 & 4 match the TPC-C spec
newOrderWeight=45
paymentWeight=43
orderStatusWeight=4
deliveryWeight=4
stockLevelWeight=4
// Directory name to create for collecting detailed result data.
// Comment this out to suppress.
resultDirectory=my_result_%tY-%tm-%td_%tH%tM%tS
//osCollectorScript=./misc/os_collector_linux.py
//osCollectorInterval=1
//osCollectorSSHAddr=user@dbhost
//osCollectorDevices=net_eth0 blk_sda
步骤
#导入数据
nohup ./runDatabaseBuild.sh props.teledb > out_build.log 2>&1 &
#运行压测
nohup ./runBenchmark.sh props.teledb > out_run.log 2>&1 &
#卸载数据
./runDatabaseDestroy.sh props.teledbx &
2.2 压测结果
集群1压测结果:
并发数 | tpmc平均 | 事务总数 | 总消耗内存(cn) | 压测cpu占比(cn) |
---|---|---|---|---|
100 | 15.9w | 150.3w | 23GB | 16-26% |
200 | 25.9w | 251.6w | 32GB | 35-43% |
300 | 47.1w | 409.8w | 50GB | 45-55% |
500 | 51.8w | 457.9w | 70GB | 75-90% |
1000 | 36.2w | 35.3w | 90GB | 75-84% |
3.问题
3.1 问题1
基于管控开出示例,如果ulimit -n为1024,则benchmarksql压测300-1000并发抛出java堆栈,对应连接的cn日志抛出错误:
2023-10-16 17:18:15.852 CST,,,2203938,coord(2203938,0),,652cfc2f.21a122,coord(2203938,0),28151682,,2023-10-16 17:02:39 CST,,0,LOG,08006,"PoolMgr: failed to connect to Datanode:[port=11004 dbname=tpcc user=teledbx application_name='pgxc:cn001' sslmode=disable options='-c remotetype=coordinator -c parentnode=cn001 -c DateStyle=iso,ymd -c timezone=asia/shanghai -c geqo=on -c intervalstyle=postgres -c lc_monetary=zh_CN.utf8 -c inner_conn=no' connect_timeout=60], validSize:0, size:1, errmsg:could not create socket: 打开的文件过多
",,,,,,,,,""
已解决
1.管控agent代码修改sysctl.conf和/etc/security/limits.conf
3.2 问题2
压测高并发:500或者1000并发,cn报事务rollback,dn节点日志报大量如下日志
process 2335253 still waiting for ShareLock on transaction 608333 after 1000.058 ms","Process holding the lock: 0. Wait queue:
暂未解决
该问题主要是主从节点集群部署的高可用agent程序引起,因此需要彻底停止主从的oss_observer服务进程,下面步骤主从都执行:
#1.取消定时任务
crontab -e
#*/1 * * * * (/home/teledbx/install/teledbx_oss/script/start_monitor.sh &> /tmp/start_monitor.sh.log)
#2.停止agent程序,oss_observer.
cd /home/teledbx/install/teledbx_oss/tools/op
sh stop.sh
进一步确定
该定时任务主要去执行函数:
语句1: select pg_unlock_check_deadlock();
postgres=# select pg_unlock_check_deadlock();
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
Failed to read a valid object file image from memory.
Core was generated by `postgres: tangyujie postgres [local] '.
Program terminated with signal 11, Segmentation fault.
#0 addBlockinginfo (txn_index=<error reading variable: Cannot access memory at address 0x7ffd1bafeea8>, blocking_pid=18192,
blocking_pid@entry=<error reading variable: Cannot access memory at address 0x7ffd1bafeea8>, blocking_gid=0x0,
blocking_gid@entry=<error reading variable: Cannot access memory at address 0x7ffd1bafeea8>) at pg_unlock.c:1368
1368 memcpy(block_temp[count].blocking_gid, blocking_gid, sizeof(char) * MAX_GID);
Missing separate debuginfos, use: debuginfo-install glibc-2.17-260.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-55.el7_9.x86_64 libcom_err-1.42.9-19.el7.x86_64 libselinux-2.5-15.el7.x86_64 openssl-libs-1.0.2k-26.el7_9.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-21.el7_9.x86_64
(gdb) bt
#0 addBlockinginfo (txn_index=<error reading variable: Cannot access memory at address 0x7ffd1bafeea8>, blocking_pid=18192,
blocking_pid@entry=<error reading variable: Cannot access memory at address 0x7ffd1bafeea8>, blocking_gid=0x0,
blocking_gid@entry=<error reading variable: Cannot access memory at address 0x7ffd1bafeea8>) at pg_unlock.c:1368
Cannot access memory at address 0x7ffd1bafeea8
语句2: select pg_unlock_execute()
4.长稳一致性检验
总共包含9个sql语句,该sql语句保存在benchmarksql-5.0/run/sql.common/test.sql中。
语句验证
语句1:
-- Condition 1: W_YTD = sum(D_YTD)
SELECT * FROM (SELECT w.w_id, w.w_ytd, d.sum_d_ytd
FROM bmsql_warehouse w,
(SELECT d_w_id, SUM(d_ytd) sum_d_ytd
FROM bmsql_district
GROUP BY d_w_id) d
WHERE w.w_id = d.d_w_id) as x
WHERE w_ytd != sum_d_ytd;
tpcc=# SELECT * FROM (SELECT w.w_id, w.w_ytd, d.sum_d_ytd
tpcc(# FROM bmsql_warehouse w,
tpcc(# (SELECT d_w_id, SUM(d_ytd) sum_d_ytd
tpcc(# FROM bmsql_district
tpcc(# GROUP BY d_w_id) d
tpcc(# WHERE w.w_id = d.d_w_id) as x
tpcc-# WHERE w_ytd != sum_d_ytd;
w_id | w_ytd | sum_d_ytd
------+-------+-----------
(0 rows)
语句2:
-- Condition 2: D_NEXT_O_ID - 1 = max(O_ID) = max(NO_O_ID)
SELECT * FROM (SELECT d.d_w_id, d.d_id, d.d_next_o_id, o.max_o_id, no.max_no_o_id
FROM bmsql_district d,
(SELECT o_w_id, o_d_id, MAX(o_id) max_o_id
FROM bmsql_oorder
GROUP BY o_w_id, o_d_id) o,
(SELECT no_w_id, no_d_id, MAX(no_o_id) max_no_o_id
FROM bmsql_new_order
GROUP BY no_w_id, no_d_id) no
WHERE d.d_w_id = o.o_w_id AND d.d_w_id = no.no_w_id AND
d.d_id = o.o_d_id AND d.d_id = no.no_d_id) as x
WHERE d_next_o_id - 1 != max_o_id OR d_next_o_id - 1 != max_no_o_id;
tpcc=# SELECT * FROM (SELECT d.d_w_id, d.d_id, d.d_next_o_id, o.max_o_id, no.max_no_o_id
tpcc(# FROM bmsql_district d,
tpcc(# (SELECT o_w_id, o_d_id, MAX(o_id) max_o_id
tpcc(# FROM bmsql_oorder
tpcc(# GROUP BY o_w_id, o_d_id) o,
tpcc(# (SELECT no_w_id, no_d_id, MAX(no_o_id) max_no_o_id
tpcc(# FROM bmsql_new_order
tpcc(# GROUP BY no_w_id, no_d_id) no
tpcc(# WHERE d.d_w_id = o.o_w_id AND d.d_w_id = no.no_w_id AND
tpcc(# d.d_id = o.o_d_id AND d.d_id = no.no_d_id) as x
tpcc-# WHERE d_next_o_id - 1 != max_o_id OR d_next_o_id - 1 != max_no_o_id;
d_w_id | d_id | d_next_o_id | max_o_id | max_no_o_id
--------+------+-------------+----------+-------------
(0 rows)
语句3:
-- Condition 3: max(NO_O_ID) - min(NO_O_ID) + 1
-- = [number of rows in the NEW-ORDER table for this bmsql_district]
SELECT * FROM (SELECT no_w_id, no_d_id, MAX(no_o_id) max_no_o_id,
MIN(no_o_id) min_no_o_id, COUNT(*) count_no
FROM bmsql_new_order
GROUP BY no_w_id, no_d_Id) as x
WHERE max_no_o_id - min_no_o_id + 1 != count_no;
tpcc=# SELECT * FROM (SELECT no_w_id, no_d_id, MAX(no_o_id) max_no_o_id,
tpcc(# MIN(no_o_id) min_no_o_id, COUNT(*) count_no
tpcc(# FROM bmsql_new_order
tpcc(# GROUP BY no_w_id, no_d_Id) as x
tpcc-# WHERE max_no_o_id - min_no_o_id + 1 != count_no;
no_w_id | no_d_id | max_no_o_id | min_no_o_id | count_no
---------+---------+-------------+-------------+----------
(0 rows)
语句4:(这个有点久)
-- Condition 4: sum(O_OL_CNT)
-- = [number of rows in the ORDER-LINE table for this bmsql_district]
SELECT * FROM (SELECT o.o_w_id, o.o_d_id, o.sum_o_ol_cnt, ol.count_ol
FROM (SELECT o_w_id, o_d_id, SUM(o_ol_cnt) sum_o_ol_cnt
FROM bmsql_oorder
GROUP BY o_w_id, o_d_id) o,
(SELECT ol_w_id, ol_d_id, COUNT(*) count_ol
FROM bmsql_order_line
GROUP BY ol_w_id, ol_d_id) ol
WHERE o.o_w_id = ol.ol_w_id AND
o.o_d_id = ol.ol_d_id) as x
WHERE sum_o_ol_cnt != count_ol;
tpcc=# -- Condition 4: sum(O_OL_CNT)
tpcc=# -- = [number of rows in the ORDER-LINE table for this bmsql_district]
tpcc=# SELECT * FROM (SELECT o.o_w_id, o.o_d_id, o.sum_o_ol_cnt, ol.count_ol
tpcc(# FROM (SELECT o_w_id, o_d_id, SUM(o_ol_cnt) sum_o_ol_cnt
tpcc(# FROM bmsql_oorder
tpcc(# GROUP BY o_w_id, o_d_id) o,
tpcc(# (SELECT ol_w_id, ol_d_id, COUNT(*) count_ol
tpcc(# FROM bmsql_order_line
tpcc(# GROUP BY ol_w_id, ol_d_id) ol
tpcc(# WHERE o.o_w_id = ol.ol_w_id AND
tpcc(# o.o_d_id = ol.ol_d_id) as x
tpcc-# WHERE sum_o_ol_cnt != count_ol;
o_w_id | o_d_id | sum_o_ol_cnt | count_ol
--------+--------+--------------+----------
(0 rows)
语句5:
-- Condition 5: For any row in the ORDER table, O_CARRIER_ID is set to a null
-- value if and only if there is a corresponding row in the
-- NEW-ORDER table
SELECT * FROM (SELECT o.o_w_id, o.o_d_id, o.o_id, o.o_carrier_id, no.count_no
FROM bmsql_oorder o,
(SELECT no_w_id, no_d_id, no_o_id, COUNT(*) count_no
FROM bmsql_new_order
GROUP BY no_w_id, no_d_id, no_o_id) no
WHERE o.o_w_id = no.no_w_id AND
o.o_d_id = no.no_d_id AND
o.o_id = no.no_o_id) as x
WHERE (o_carrier_id IS NULL AND count_no = 0) OR
(o_carrier_id IS NOT NULL AND count_no != 0);
tpcc=# SELECT * FROM (SELECT o.o_w_id, o.o_d_id, o.o_id, o.o_carrier_id, no.count_no
tpcc(# FROM bmsql_oorder o,
tpcc(# (SELECT no_w_id, no_d_id, no_o_id, COUNT(*) count_no
tpcc(# FROM bmsql_new_order
tpcc(# GROUP BY no_w_id, no_d_id, no_o_id) no
tpcc(# WHERE o.o_w_id = no.no_w_id AND
tpcc(# o.o_d_id = no.no_d_id AND
tpcc(# o.o_id = no.no_o_id) as x
tpcc-# WHERE (o_carrier_id IS NULL AND count_no = 0) OR
tpcc-# (o_carrier_id IS NOT NULL AND count_no != 0);
o_w_id | o_d_id | o_id | o_carrier_id | count_no
--------+--------+------+--------------+----------
(0 rows)
语句6:(比较耗时)
-- Condition 6: For any row in the ORDER table, O_OL_CNT must equal the number
-- of rows in the ORDER-LINE table for the corresponding order
SELECT * FROM (SELECT o.o_w_id, o.o_d_id, o.o_id, o.o_ol_cnt, ol.count_ol
FROM bmsql_oorder o,
(SELECT ol_w_id, ol_d_id, ol_o_id, COUNT(*) count_ol
FROM bmsql_order_line
GROUP BY ol_w_id, ol_d_id, ol_o_id) ol
WHERE o.o_w_id = ol.ol_w_id AND
o.o_d_id = ol.ol_d_id AND
o.o_id = ol.ol_o_id) as x
WHERE o_ol_cnt != count_ol;
tpcc=# -- Condition 6: For any row in the ORDER table, O_OL_CNT must equal the number
tpcc=# -- of rows in the ORDER-LINE table for the corresponding order
tpcc=# SELECT * FROM (SELECT o.o_w_id, o.o_d_id, o.o_id, o.o_ol_cnt, ol.count_ol
tpcc(# FROM bmsql_oorder o,
tpcc(# (SELECT ol_w_id, ol_d_id, ol_o_id, COUNT(*) count_ol
tpcc(# FROM bmsql_order_line
tpcc(# GROUP BY ol_w_id, ol_d_id, ol_o_id) ol
tpcc(# WHERE o.o_w_id = ol.ol_w_id AND
tpcc(# o.o_d_id = ol.ol_d_id AND
tpcc(# o.o_id = ol.ol_o_id) as x
tpcc-# WHERE o_ol_cnt != count_ol;
o_w_id | o_d_id | o_id | o_ol_cnt | count_ol
--------+--------+------+----------+----------
(0 rows)
语句7:
-- Condition 7: For any row in the ORDER-LINE table, OL_DELIVERY_D is set to
-- a null date/time if and only if the corresponding row in the
-- ORDER table has O_CARRIER_ID set to a null value
SELECT * FROM (SELECT ol.ol_w_id, ol.ol_d_id, ol.ol_o_id, ol.ol_delivery_d,
o.o_carrier_id
FROM bmsql_order_line ol,
bmsql_oorder o
WHERE ol.ol_w_id = o.o_w_id AND
ol.ol_d_id = o.o_d_id AND
ol.ol_o_id = o.o_id) as x
WHERE (ol_delivery_d IS NULL AND o_carrier_id IS NOT NULL) OR
(ol_delivery_d IS NOT NULL AND o_carrier_id IS NULL);
tpcc=# SELECT * FROM (SELECT ol.ol_w_id, ol.ol_d_id, ol.ol_o_id, ol.ol_delivery_d,
tpcc(# o.o_carrier_id
tpcc(# FROM bmsql_order_line ol,
tpcc(# bmsql_oorder o
tpcc(# WHERE ol.ol_w_id = o.o_w_id AND
tpcc(# ol.ol_d_id = o.o_d_id AND
tpcc(# ol.ol_o_id = o.o_id) as x
tpcc-# WHERE (ol_delivery_d IS NULL AND o_carrier_id IS NOT NULL) OR
tpcc-# (ol_delivery_d IS NOT NULL AND o_carrier_id IS NULL);
ol_w_id | ol_d_id | ol_o_id | ol_delivery_d | o_carrier_id
---------+---------+---------+---------------+--------------
(0 rows)
语句8:
-- Condition 8: W_YTD = sum(H_AMOUNT)
SELECT *
FROM (SELECT w.w_id, w.w_ytd, h.sum_h_amount
FROM bmsql_warehouse w,
(SELECT h_w_id, SUM(h_amount) sum_h_amount FROM bmsql_history GROUP BY h_w_id) h
WHERE w.w_id = h.h_w_id) as x
WHERE w_ytd != sum_h_amount;
tpcc=# SELECT *
tpcc-# FROM (SELECT w.w_id, w.w_ytd, h.sum_h_amount
tpcc(# FROM bmsql_warehouse w,
tpcc(# (SELECT h_w_id, SUM(h_amount) sum_h_amount FROM bmsql_history GROUP BY h_w_id) h
tpcc(# WHERE w.w_id = h.h_w_id) as x
tpcc-# WHERE w_ytd != sum_h_amount;
w_id | w_ytd | sum_h_amount
------+-------+--------------
(0 rows)
语句9:
-- Condition 9: D_YTD = sum(H_AMOUNT)
SELECT *
FROM (SELECT d.d_w_id, d.d_id, d.d_ytd, h.sum_h_amount
FROM bmsql_district d,
(SELECT h_w_id, h_d_id, SUM(h_amount) sum_h_amount
FROM bmsql_history
GROUP BY h_w_id, h_d_id) h
WHERE d.d_w_id = h.h_w_id
AND d.d_id = h.h_d_id) as x
WHERE d_ytd != sum_h_amount;
tpcc=# SELECT *
tpcc-# FROM (SELECT d.d_w_id, d.d_id, d.d_ytd, h.sum_h_amount
tpcc(# FROM bmsql_district d,
tpcc(# (SELECT h_w_id, h_d_id, SUM(h_amount) sum_h_amount
tpcc(# FROM bmsql_history
tpcc(# GROUP BY h_w_id, h_d_id) h
tpcc(# WHERE d.d_w_id = h.h_w_id
tpcc(# AND d.d_id = h.h_d_id) as x
tpcc-# WHERE d_ytd != sum_h_amount;
d_w_id | d_id | d_ytd | sum_h_amount
--------+------+-------+--------------
(0 rows)
结论
性能:100-1000的并发连接中,500并发表现性能更为优秀,提高pooler_scale_factor = 64,修改参数、通过绑核、设置建表填充因子会使得tpmc更上一个台阶。
稳定性:基于teledbx-6.11.03的版本进行7*24小时压测,基于一致性检验的语句输出为空或0,长稳测试通过,可应用于生产。