searchusermenu
  • 发布文章
  • 消息中心
点赞
收藏
评论
分享
原创

teledbx压测tpmc教程

2024-10-17 09:34:34
9
0

teledbx压测tpmc教程

本教程适用于采用benchmarksql_teledbx-5.0版本作为客户端压测6.11版本的teledbx服务端,客户端和服务端分别部署在不同机器上,且客户端通过千兆网口连接服务端进行压测实践。

1.压测准备

1.1 客户端

客户端采用改造版的benchmarksql-5.0,其文件名: benchmarksql_teledbx-5.0.tar.gz

1.1.1 简介

benchmarkSQL是一个开源的基准测试工具,支持多种数据库系统,可以用于测试各种关系型数据库系统的性能,主要用于TPC-C测试。它是一个 Java 应用程序,可以模拟多用户并发访问数据库,并测量数据库系统的吞吐量和响应时间。

TPC-C 是针对联机交易处理系统(OLTP)进行测试的规范。使用一个商品销售模型对 OLTP 系统进行测试,其中包含五类事务:

  • NewOrder:新订单的生成

  • Payment:订单付款

  • OrderStatus:最近订单查询

  • Delivery:配送

  • StockLevel:库存缺货状态分析

TPC-C 使用 tpmC 值(每分钟运行的事务数,Transactions per Minute)来衡量系统最大有效吞吐量(Max Qualified Throughput,MQTh),其中 Transactions 以 NewOrder Transaction 为准,即最终衡量单位为每分钟处理的新订单数。

1.1.2 配置工作

如果存在benchmarksql_teledbx-5.0.tar.gz这个文件,解压即可使用;否则按照下述步骤更新benchmarksql。

安装java

# 1.下载解压jdk
$ tar -zxvf jdk-8u202-linux-x64.tar.gz
$ mv jdk-8u202-linux-x64 jdk1.8.0_202
# 2.配置环境变量
$ export JAVA_HOME=/yourpath/jdk1.8.0_202
$ export PATH=$JAVA_HOME/bin:$PATH

安装ant

##方法1
#安装
$ sudo yum install -y ant

##方法2
#在计划安装的路径获取ant
$ tar -zxvf apache-ant-1.10.13-bin.tar.gz
#环境配置
vim ~/.bashrc
export ANT_HOME=/yourpath/apache-ant-1.10.13
export PATH=$PATH:$ANT_HOME/bin
source ~/.bashrc
#验证
ant -version
Apache Ant(TM) version 1.10.13 compiled on January 4 2023

安装benchmarksql

# 切换到5.0分支
$ cd benchmarksql
$ git checkout v5.0

# 更新pgjdbc
$ cd lib/postgres
$ mv postgresql-9.3.jar postgresql-9.3.jar.bak
$ cd ../run/sql.common
# 注释掉foreignkeys.sql内容
vi foreignkeys.sql
# 编译生成benchmarksql.jar包
$ cd benchmarksql && ant clean && ant

1.2 服务端

集群1通过管控面开出。

1.2.1 集群实例规格

集群1:2cn(1主1备),4dn(2主2备)

gtm:16C128G
cn:16C128G
dn:8C128G
节点名称 IP port 数据目录
gtm主 192.168.26.4 21000 /data/teledbx/tdata_03/peformance/data/gtm1
gtm备 192.168.26.5 21000 /data/teledbx/tdata_03/peformance/data/gtm2
cn001主 192.168.26.4 11345 /data/teledbx/tdata_02/peformance/data/cn001
cn001备 192.168.26.5 11345 同上
dn001主 192.168.26.4 11000 /data/teledbx/tdata_00/peformance/data/dn001
dn002主 192.168.26.4 11002 同上
dn001备 192.168.26.5 11000 /data/teledbx/tdata_01/peformance/data/dn002
dn002备 192.168.26.5 11002 同上

内存和cpu

cpu: 112
内存总数: 1007GB

1.2.2 参数配置

  • 集群2的开通参数。然后修改下述几个参数重启集群2:

    init_pool_size=1
    min_free_size=1
    min_pool_size=1
    pooler_scale_factor = 1 #该参数设置为64会更高
    max_connections=5000

     

2.压测

2.1 压测配置修改

cp prop.pg prop.teledbx
###########props.teledbx内容如下###############
db=postgres
driver=org.postgresql.Driver
conn=jdbc:postgresql://192.168.26.4:11345/tpcc
user=teledbx
password=密码

warehouses=100
loadWorkers=95

terminals=200
//To run specified transactions per terminal- runMins must equal zero
runTxnsPerTerminal=0
//To run for specified minutes- runTxnsPerTerminal must equal zero
runMins=10
//Number of total transactions per minute
limitTxnsPerMin=0

//Set to true to run in 4.x compatible mode. Set to false to use the
//entire configured database evenly.
terminalWarehouseFixed=true

//The following five values must add up to 100
//The default percentages of 45, 43, 4, 4 & 4 match the TPC-C spec
newOrderWeight=45
paymentWeight=43
orderStatusWeight=4
deliveryWeight=4
stockLevelWeight=4

// Directory name to create for collecting detailed result data.
// Comment this out to suppress.
resultDirectory=my_result_%tY-%tm-%td_%tH%tM%tS
//osCollectorScript=./misc/os_collector_linux.py
//osCollectorInterval=1
//osCollectorSSHAddr=user@dbhost
//osCollectorDevices=net_eth0 blk_sda

步骤

#导入数据
nohup ./runDatabaseBuild.sh props.teledb > out_build.log 2>&1 &
#运行压测
nohup ./runBenchmark.sh props.teledb > out_run.log 2>&1 &
#卸载数据
./runDatabaseDestroy.sh props.teledbx &

2.2 压测结果

集群1压测结果

并发数 tpmc平均 事务总数 总消耗内存(cn) 压测cpu占比(cn)
100 15.9w 150.3w 23GB 16-26%
200 25.9w 251.6w 32GB 35-43%
300 47.1w 409.8w 50GB 45-55%
500 51.8w 457.9w 70GB 75-90%
1000 36.2w 35.3w 90GB 75-84%

3.问题

3.1 问题1

基于管控开出示例,如果ulimit -n为1024,则benchmarksql压测300-1000并发抛出java堆栈,对应连接的cn日志抛出错误:

2023-10-16 17:18:15.852 CST,,,2203938,coord(2203938,0),,652cfc2f.21a122,coord(2203938,0),28151682,,2023-10-16 17:02:39 CST,,0,LOG,08006,"PoolMgr: failed to connect to Datanode:[port=11004 dbname=tpcc user=teledbx application_name='pgxc:cn001' sslmode=disable options='-c remotetype=coordinator -c parentnode=cn001  -c DateStyle=iso,ymd -c timezone=asia/shanghai -c geqo=on -c intervalstyle=postgres -c lc_monetary=zh_CN.utf8 -c inner_conn=no' connect_timeout=60], validSize:0, size:1, errmsg:could not create socket: 打开的文件过多
",,,,,,,,,""

已解决

1.管控agent代码修改sysctl.conf和/etc/security/limits.conf

3.2 问题2

压测高并发:500或者1000并发,cn报事务rollback,dn节点日志报大量如下日志

process 2335253 still waiting for ShareLock on transaction 608333 after 1000.058 ms","Process holding the lock: 0. Wait queue: 

暂未解决

该问题主要是主从节点集群部署的高可用agent程序引起,因此需要彻底停止主从的oss_observer服务进程,下面步骤主从都执行:

#1.取消定时任务
crontab -e
#*/1 * * * * (/home/teledbx/install/teledbx_oss/script/start_monitor.sh &> /tmp/start_monitor.sh.log)

#2.停止agent程序,oss_observer.
cd /home/teledbx/install/teledbx_oss/tools/op
sh stop.sh

进一步确定

该定时任务主要去执行函数:

语句1: select pg_unlock_check_deadlock();
postgres=# select pg_unlock_check_deadlock();
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Failed to read a valid object file image from memory.
Core was generated by `postgres: tangyujie postgres [local]                                         '.
Program terminated with signal 11, Segmentation fault.
#0 addBlockinginfo (txn_index=<error reading variable: Cannot access memory at address 0x7ffd1bafeea8>, blocking_pid=18192,
  blocking_pid@entry=<error reading variable: Cannot access memory at address 0x7ffd1bafeea8>, blocking_gid=0x0,
  blocking_gid@entry=<error reading variable: Cannot access memory at address 0x7ffd1bafeea8>) at pg_unlock.c:1368
1368 memcpy(block_temp[count].blocking_gid, blocking_gid, sizeof(char) * MAX_GID);
Missing separate debuginfos, use: debuginfo-install glibc-2.17-260.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-55.el7_9.x86_64 libcom_err-1.42.9-19.el7.x86_64 libselinux-2.5-15.el7.x86_64 openssl-libs-1.0.2k-26.el7_9.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-21.el7_9.x86_64
(gdb) bt
#0 addBlockinginfo (txn_index=<error reading variable: Cannot access memory at address 0x7ffd1bafeea8>, blocking_pid=18192,
  blocking_pid@entry=<error reading variable: Cannot access memory at address 0x7ffd1bafeea8>, blocking_gid=0x0,
  blocking_gid@entry=<error reading variable: Cannot access memory at address 0x7ffd1bafeea8>) at pg_unlock.c:1368
Cannot access memory at address 0x7ffd1bafeea8


语句2: select pg_unlock_execute()

4.长稳一致性检验

总共包含9个sql语句,该sql语句保存在benchmarksql-5.0/run/sql.common/test.sql中。

语句验证

语句1:

-- Condition 1: W_YTD = sum(D_YTD)
SELECT * FROM (SELECT w.w_id, w.w_ytd, d.sum_d_ytd
          FROM bmsql_warehouse w,
              (SELECT d_w_id, SUM(d_ytd) sum_d_ytd
                FROM bmsql_district
                GROUP BY d_w_id) d
          WHERE w.w_id = d.d_w_id) as x
WHERE w_ytd != sum_d_ytd;

tpcc=# SELECT * FROM (SELECT w.w_id, w.w_ytd, d.sum_d_ytd
tpcc(#            FROM bmsql_warehouse w,
tpcc(#                 (SELECT d_w_id, SUM(d_ytd) sum_d_ytd
tpcc(#                  FROM bmsql_district
tpcc(#                  GROUP BY d_w_id) d
tpcc(#            WHERE w.w_id = d.d_w_id) as x
tpcc-# WHERE w_ytd != sum_d_ytd;
w_id | w_ytd | sum_d_ytd
------+-------+-----------
(0 rows)

语句2:

-- Condition 2: D_NEXT_O_ID - 1 = max(O_ID) = max(NO_O_ID)
SELECT * FROM (SELECT d.d_w_id, d.d_id, d.d_next_o_id, o.max_o_id, no.max_no_o_id
          FROM bmsql_district d,
              (SELECT o_w_id, o_d_id, MAX(o_id) max_o_id
                FROM bmsql_oorder
                GROUP BY o_w_id, o_d_id) o,
              (SELECT no_w_id, no_d_id, MAX(no_o_id) max_no_o_id
                FROM bmsql_new_order
                GROUP BY no_w_id, no_d_id) no
          WHERE d.d_w_id = o.o_w_id AND d.d_w_id = no.no_w_id AND
                d.d_id = o.o_d_id AND d.d_id = no.no_d_id) as x
WHERE d_next_o_id - 1 != max_o_id OR d_next_o_id - 1 != max_no_o_id;

tpcc=# SELECT * FROM (SELECT d.d_w_id, d.d_id, d.d_next_o_id, o.max_o_id, no.max_no_o_id
tpcc(#            FROM bmsql_district d,
tpcc(#                 (SELECT o_w_id, o_d_id, MAX(o_id) max_o_id
tpcc(#                  FROM bmsql_oorder
tpcc(#                  GROUP BY o_w_id, o_d_id) o,
tpcc(#                 (SELECT no_w_id, no_d_id, MAX(no_o_id) max_no_o_id
tpcc(#                  FROM bmsql_new_order
tpcc(#                  GROUP BY no_w_id, no_d_id) no
tpcc(#            WHERE d.d_w_id = o.o_w_id AND d.d_w_id = no.no_w_id AND
tpcc(#                 d.d_id = o.o_d_id AND d.d_id = no.no_d_id) as x
tpcc-# WHERE d_next_o_id - 1 != max_o_id OR d_next_o_id - 1 != max_no_o_id;
d_w_id | d_id | d_next_o_id | max_o_id | max_no_o_id
--------+------+-------------+----------+-------------
(0 rows)

语句3:

-- Condition 3: max(NO_O_ID) - min(NO_O_ID) + 1
-- = [number of rows in the NEW-ORDER table for this bmsql_district]
SELECT * FROM (SELECT no_w_id, no_d_id, MAX(no_o_id) max_no_o_id,
                MIN(no_o_id) min_no_o_id, COUNT(*) count_no
          FROM bmsql_new_order
          GROUP BY no_w_id, no_d_Id) as x
WHERE max_no_o_id - min_no_o_id + 1 != count_no;

tpcc=# SELECT * FROM (SELECT no_w_id, no_d_id, MAX(no_o_id) max_no_o_id,
tpcc(#                   MIN(no_o_id) min_no_o_id, COUNT(*) count_no
tpcc(#            FROM bmsql_new_order
tpcc(#            GROUP BY no_w_id, no_d_Id) as x
tpcc-# WHERE max_no_o_id - min_no_o_id + 1 != count_no;
no_w_id | no_d_id | max_no_o_id | min_no_o_id | count_no
---------+---------+-------------+-------------+----------
(0 rows)

语句4:(这个有点久)

-- Condition 4: sum(O_OL_CNT)
-- = [number of rows in the ORDER-LINE table for this bmsql_district]
SELECT * FROM (SELECT o.o_w_id, o.o_d_id, o.sum_o_ol_cnt, ol.count_ol
          FROM (SELECT o_w_id, o_d_id, SUM(o_ol_cnt) sum_o_ol_cnt
                FROM bmsql_oorder
                GROUP BY o_w_id, o_d_id) o,
              (SELECT ol_w_id, ol_d_id, COUNT(*) count_ol
                FROM bmsql_order_line
                GROUP BY ol_w_id, ol_d_id) ol
          WHERE o.o_w_id = ol.ol_w_id AND
                o.o_d_id = ol.ol_d_id) as x
WHERE sum_o_ol_cnt != count_ol;

tpcc=# -- Condition 4: sum(O_OL_CNT)
tpcc=# -- = [number of rows in the ORDER-LINE table for this bmsql_district]
tpcc=# SELECT * FROM (SELECT o.o_w_id, o.o_d_id, o.sum_o_ol_cnt, ol.count_ol
tpcc(#            FROM (SELECT o_w_id, o_d_id, SUM(o_ol_cnt) sum_o_ol_cnt
tpcc(#                  FROM bmsql_oorder
tpcc(#                  GROUP BY o_w_id, o_d_id) o,
tpcc(#                 (SELECT ol_w_id, ol_d_id, COUNT(*) count_ol
tpcc(#                  FROM bmsql_order_line
tpcc(#                  GROUP BY ol_w_id, ol_d_id) ol
tpcc(#            WHERE o.o_w_id = ol.ol_w_id AND
tpcc(#                 o.o_d_id = ol.ol_d_id) as x
tpcc-# WHERE sum_o_ol_cnt != count_ol;
o_w_id | o_d_id | sum_o_ol_cnt | count_ol
--------+--------+--------------+----------
(0 rows)

语句5:

-- Condition 5: For any row in the ORDER table, O_CARRIER_ID is set to a null
-- value if and only if there is a corresponding row in the
-- NEW-ORDER table
SELECT * FROM (SELECT o.o_w_id, o.o_d_id, o.o_id, o.o_carrier_id, no.count_no
          FROM bmsql_oorder o,
              (SELECT no_w_id, no_d_id, no_o_id, COUNT(*) count_no
                FROM bmsql_new_order
                GROUP BY no_w_id, no_d_id, no_o_id) no
          WHERE o.o_w_id = no.no_w_id AND
                o.o_d_id = no.no_d_id AND
                o.o_id = no.no_o_id) as x
WHERE (o_carrier_id IS NULL AND count_no = 0) OR
    (o_carrier_id IS NOT NULL AND count_no != 0);
     
     
tpcc=# SELECT * FROM (SELECT o.o_w_id, o.o_d_id, o.o_id, o.o_carrier_id, no.count_no
tpcc(#            FROM bmsql_oorder o,
tpcc(#                 (SELECT no_w_id, no_d_id, no_o_id, COUNT(*) count_no
tpcc(#                  FROM bmsql_new_order
tpcc(#                  GROUP BY no_w_id, no_d_id, no_o_id) no
tpcc(#            WHERE o.o_w_id = no.no_w_id AND
tpcc(#                 o.o_d_id = no.no_d_id AND
tpcc(#                 o.o_id = no.no_o_id) as x
tpcc-# WHERE (o_carrier_id IS NULL AND count_no = 0) OR
tpcc-#       (o_carrier_id IS NOT NULL AND count_no != 0);
o_w_id | o_d_id | o_id | o_carrier_id | count_no
--------+--------+------+--------------+----------
(0 rows)

语句6:(比较耗时)

-- Condition 6: For any row in the ORDER table, O_OL_CNT must equal the number
-- of rows in the ORDER-LINE table for the corresponding order
SELECT * FROM (SELECT o.o_w_id, o.o_d_id, o.o_id, o.o_ol_cnt, ol.count_ol
          FROM bmsql_oorder o,
              (SELECT ol_w_id, ol_d_id, ol_o_id, COUNT(*) count_ol
                FROM bmsql_order_line
                GROUP BY ol_w_id, ol_d_id, ol_o_id) ol
          WHERE o.o_w_id = ol.ol_w_id AND
                o.o_d_id = ol.ol_d_id AND
                o.o_id = ol.ol_o_id) as x
WHERE o_ol_cnt != count_ol;

tpcc=# -- Condition 6: For any row in the ORDER table, O_OL_CNT must equal the number
tpcc=# -- of rows in the ORDER-LINE table for the corresponding order
tpcc=# SELECT * FROM (SELECT o.o_w_id, o.o_d_id, o.o_id, o.o_ol_cnt, ol.count_ol
tpcc(#            FROM bmsql_oorder o,
tpcc(#                 (SELECT ol_w_id, ol_d_id, ol_o_id, COUNT(*) count_ol
tpcc(#                  FROM bmsql_order_line
tpcc(#                  GROUP BY ol_w_id, ol_d_id, ol_o_id) ol
tpcc(#            WHERE o.o_w_id = ol.ol_w_id AND
tpcc(#                 o.o_d_id = ol.ol_d_id AND
tpcc(#                 o.o_id = ol.ol_o_id) as x
tpcc-# WHERE o_ol_cnt != count_ol;
o_w_id | o_d_id | o_id | o_ol_cnt | count_ol
--------+--------+------+----------+----------
(0 rows)

语句7:

-- Condition 7: For any row in the ORDER-LINE table, OL_DELIVERY_D is set to
-- a null date/time if and only if the corresponding row in the
-- ORDER table has O_CARRIER_ID set to a null value
SELECT * FROM (SELECT ol.ol_w_id, ol.ol_d_id, ol.ol_o_id, ol.ol_delivery_d,
                o.o_carrier_id
          FROM bmsql_order_line ol,
              bmsql_oorder o
          WHERE ol.ol_w_id = o.o_w_id AND
                ol.ol_d_id = o.o_d_id AND
                ol.ol_o_id = o.o_id) as x
WHERE (ol_delivery_d IS NULL AND o_carrier_id IS NOT NULL) OR
    (ol_delivery_d IS NOT NULL AND o_carrier_id IS NULL);
     
tpcc=# SELECT * FROM (SELECT ol.ol_w_id, ol.ol_d_id, ol.ol_o_id, ol.ol_delivery_d,
tpcc(#                   o.o_carrier_id
tpcc(#            FROM bmsql_order_line ol,
tpcc(#                 bmsql_oorder o
tpcc(#            WHERE ol.ol_w_id = o.o_w_id AND
tpcc(#                 ol.ol_d_id = o.o_d_id AND
tpcc(#                 ol.ol_o_id = o.o_id) as x
tpcc-# WHERE (ol_delivery_d IS NULL AND o_carrier_id IS NOT NULL) OR
tpcc-#       (ol_delivery_d IS NOT NULL AND o_carrier_id IS NULL);
ol_w_id | ol_d_id | ol_o_id | ol_delivery_d | o_carrier_id
---------+---------+---------+---------------+--------------
(0 rows)

语句8:

-- Condition 8: W_YTD = sum(H_AMOUNT)
SELECT *
FROM (SELECT w.w_id, w.w_ytd, h.sum_h_amount
     FROM bmsql_warehouse w,
          (SELECT h_w_id, SUM(h_amount) sum_h_amount FROM bmsql_history GROUP BY h_w_id) h
     WHERE w.w_id = h.h_w_id) as x
WHERE w_ytd != sum_h_amount;

tpcc=# SELECT *
tpcc-# FROM (SELECT w.w_id, w.w_ytd, h.sum_h_amount
tpcc(#       FROM bmsql_warehouse w,
tpcc(#           (SELECT h_w_id, SUM(h_amount) sum_h_amount FROM bmsql_history GROUP BY h_w_id) h
tpcc(#       WHERE w.w_id = h.h_w_id) as x
tpcc-# WHERE w_ytd != sum_h_amount;
w_id | w_ytd | sum_h_amount
------+-------+--------------
(0 rows)

语句9:

-- Condition 9: D_YTD = sum(H_AMOUNT)
SELECT *
FROM (SELECT d.d_w_id, d.d_id, d.d_ytd, h.sum_h_amount
     FROM bmsql_district d,
          (SELECT h_w_id, h_d_id, SUM(h_amount) sum_h_amount
           FROM bmsql_history
           GROUP BY h_w_id, h_d_id) h
     WHERE d.d_w_id = h.h_w_id
       AND d.d_id = h.h_d_id) as x
WHERE d_ytd != sum_h_amount;

tpcc=# SELECT *
tpcc-# FROM (SELECT d.d_w_id, d.d_id, d.d_ytd, h.sum_h_amount
tpcc(#       FROM bmsql_district d,
tpcc(#           (SELECT h_w_id, h_d_id, SUM(h_amount) sum_h_amount
tpcc(#             FROM bmsql_history
tpcc(#             GROUP BY h_w_id, h_d_id) h
tpcc(#       WHERE d.d_w_id = h.h_w_id
tpcc(#         AND d.d_id = h.h_d_id) as x
tpcc-# WHERE d_ytd != sum_h_amount;
d_w_id | d_id | d_ytd | sum_h_amount
--------+------+-------+--------------
(0 rows)

结论

性能:100-1000的并发连接中,500并发表现性能更为优秀,提高pooler_scale_factor = 64,修改参数、通过绑核、设置建表填充因子会使得tpmc更上一个台阶。

稳定性:基于teledbx-6.11.03的版本进行7*24小时压测,基于一致性检验的语句输出为空或0,长稳测试通过,可应用于生产。

0条评论
0 / 1000
唐****杰
3文章数
1粉丝数
唐****杰
3 文章 | 1 粉丝
唐****杰
3文章数
1粉丝数
唐****杰
3 文章 | 1 粉丝
原创

teledbx压测tpmc教程

2024-10-17 09:34:34
9
0

teledbx压测tpmc教程

本教程适用于采用benchmarksql_teledbx-5.0版本作为客户端压测6.11版本的teledbx服务端,客户端和服务端分别部署在不同机器上,且客户端通过千兆网口连接服务端进行压测实践。

1.压测准备

1.1 客户端

客户端采用改造版的benchmarksql-5.0,其文件名: benchmarksql_teledbx-5.0.tar.gz

1.1.1 简介

benchmarkSQL是一个开源的基准测试工具,支持多种数据库系统,可以用于测试各种关系型数据库系统的性能,主要用于TPC-C测试。它是一个 Java 应用程序,可以模拟多用户并发访问数据库,并测量数据库系统的吞吐量和响应时间。

TPC-C 是针对联机交易处理系统(OLTP)进行测试的规范。使用一个商品销售模型对 OLTP 系统进行测试,其中包含五类事务:

  • NewOrder:新订单的生成

  • Payment:订单付款

  • OrderStatus:最近订单查询

  • Delivery:配送

  • StockLevel:库存缺货状态分析

TPC-C 使用 tpmC 值(每分钟运行的事务数,Transactions per Minute)来衡量系统最大有效吞吐量(Max Qualified Throughput,MQTh),其中 Transactions 以 NewOrder Transaction 为准,即最终衡量单位为每分钟处理的新订单数。

1.1.2 配置工作

如果存在benchmarksql_teledbx-5.0.tar.gz这个文件,解压即可使用;否则按照下述步骤更新benchmarksql。

安装java

# 1.下载解压jdk
$ tar -zxvf jdk-8u202-linux-x64.tar.gz
$ mv jdk-8u202-linux-x64 jdk1.8.0_202
# 2.配置环境变量
$ export JAVA_HOME=/yourpath/jdk1.8.0_202
$ export PATH=$JAVA_HOME/bin:$PATH

安装ant

##方法1
#安装
$ sudo yum install -y ant

##方法2
#在计划安装的路径获取ant
$ tar -zxvf apache-ant-1.10.13-bin.tar.gz
#环境配置
vim ~/.bashrc
export ANT_HOME=/yourpath/apache-ant-1.10.13
export PATH=$PATH:$ANT_HOME/bin
source ~/.bashrc
#验证
ant -version
Apache Ant(TM) version 1.10.13 compiled on January 4 2023

安装benchmarksql

# 切换到5.0分支
$ cd benchmarksql
$ git checkout v5.0

# 更新pgjdbc
$ cd lib/postgres
$ mv postgresql-9.3.jar postgresql-9.3.jar.bak
$ cd ../run/sql.common
# 注释掉foreignkeys.sql内容
vi foreignkeys.sql
# 编译生成benchmarksql.jar包
$ cd benchmarksql && ant clean && ant

1.2 服务端

集群1通过管控面开出。

1.2.1 集群实例规格

集群1:2cn(1主1备),4dn(2主2备)

gtm:16C128G
cn:16C128G
dn:8C128G
节点名称 IP port 数据目录
gtm主 192.168.26.4 21000 /data/teledbx/tdata_03/peformance/data/gtm1
gtm备 192.168.26.5 21000 /data/teledbx/tdata_03/peformance/data/gtm2
cn001主 192.168.26.4 11345 /data/teledbx/tdata_02/peformance/data/cn001
cn001备 192.168.26.5 11345 同上
dn001主 192.168.26.4 11000 /data/teledbx/tdata_00/peformance/data/dn001
dn002主 192.168.26.4 11002 同上
dn001备 192.168.26.5 11000 /data/teledbx/tdata_01/peformance/data/dn002
dn002备 192.168.26.5 11002 同上

内存和cpu

cpu: 112
内存总数: 1007GB

1.2.2 参数配置

  • 集群2的开通参数。然后修改下述几个参数重启集群2:

    init_pool_size=1
    min_free_size=1
    min_pool_size=1
    pooler_scale_factor = 1 #该参数设置为64会更高
    max_connections=5000

     

2.压测

2.1 压测配置修改

cp prop.pg prop.teledbx
###########props.teledbx内容如下###############
db=postgres
driver=org.postgresql.Driver
conn=jdbc:postgresql://192.168.26.4:11345/tpcc
user=teledbx
password=密码

warehouses=100
loadWorkers=95

terminals=200
//To run specified transactions per terminal- runMins must equal zero
runTxnsPerTerminal=0
//To run for specified minutes- runTxnsPerTerminal must equal zero
runMins=10
//Number of total transactions per minute
limitTxnsPerMin=0

//Set to true to run in 4.x compatible mode. Set to false to use the
//entire configured database evenly.
terminalWarehouseFixed=true

//The following five values must add up to 100
//The default percentages of 45, 43, 4, 4 & 4 match the TPC-C spec
newOrderWeight=45
paymentWeight=43
orderStatusWeight=4
deliveryWeight=4
stockLevelWeight=4

// Directory name to create for collecting detailed result data.
// Comment this out to suppress.
resultDirectory=my_result_%tY-%tm-%td_%tH%tM%tS
//osCollectorScript=./misc/os_collector_linux.py
//osCollectorInterval=1
//osCollectorSSHAddr=user@dbhost
//osCollectorDevices=net_eth0 blk_sda

步骤

#导入数据
nohup ./runDatabaseBuild.sh props.teledb > out_build.log 2>&1 &
#运行压测
nohup ./runBenchmark.sh props.teledb > out_run.log 2>&1 &
#卸载数据
./runDatabaseDestroy.sh props.teledbx &

2.2 压测结果

集群1压测结果

并发数 tpmc平均 事务总数 总消耗内存(cn) 压测cpu占比(cn)
100 15.9w 150.3w 23GB 16-26%
200 25.9w 251.6w 32GB 35-43%
300 47.1w 409.8w 50GB 45-55%
500 51.8w 457.9w 70GB 75-90%
1000 36.2w 35.3w 90GB 75-84%

3.问题

3.1 问题1

基于管控开出示例,如果ulimit -n为1024,则benchmarksql压测300-1000并发抛出java堆栈,对应连接的cn日志抛出错误:

2023-10-16 17:18:15.852 CST,,,2203938,coord(2203938,0),,652cfc2f.21a122,coord(2203938,0),28151682,,2023-10-16 17:02:39 CST,,0,LOG,08006,"PoolMgr: failed to connect to Datanode:[port=11004 dbname=tpcc user=teledbx application_name='pgxc:cn001' sslmode=disable options='-c remotetype=coordinator -c parentnode=cn001  -c DateStyle=iso,ymd -c timezone=asia/shanghai -c geqo=on -c intervalstyle=postgres -c lc_monetary=zh_CN.utf8 -c inner_conn=no' connect_timeout=60], validSize:0, size:1, errmsg:could not create socket: 打开的文件过多
",,,,,,,,,""

已解决

1.管控agent代码修改sysctl.conf和/etc/security/limits.conf

3.2 问题2

压测高并发:500或者1000并发,cn报事务rollback,dn节点日志报大量如下日志

process 2335253 still waiting for ShareLock on transaction 608333 after 1000.058 ms","Process holding the lock: 0. Wait queue: 

暂未解决

该问题主要是主从节点集群部署的高可用agent程序引起,因此需要彻底停止主从的oss_observer服务进程,下面步骤主从都执行:

#1.取消定时任务
crontab -e
#*/1 * * * * (/home/teledbx/install/teledbx_oss/script/start_monitor.sh &> /tmp/start_monitor.sh.log)

#2.停止agent程序,oss_observer.
cd /home/teledbx/install/teledbx_oss/tools/op
sh stop.sh

进一步确定

该定时任务主要去执行函数:

语句1: select pg_unlock_check_deadlock();
postgres=# select pg_unlock_check_deadlock();
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Failed to read a valid object file image from memory.
Core was generated by `postgres: tangyujie postgres [local]                                         '.
Program terminated with signal 11, Segmentation fault.
#0 addBlockinginfo (txn_index=<error reading variable: Cannot access memory at address 0x7ffd1bafeea8>, blocking_pid=18192,
  blocking_pid@entry=<error reading variable: Cannot access memory at address 0x7ffd1bafeea8>, blocking_gid=0x0,
  blocking_gid@entry=<error reading variable: Cannot access memory at address 0x7ffd1bafeea8>) at pg_unlock.c:1368
1368 memcpy(block_temp[count].blocking_gid, blocking_gid, sizeof(char) * MAX_GID);
Missing separate debuginfos, use: debuginfo-install glibc-2.17-260.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-55.el7_9.x86_64 libcom_err-1.42.9-19.el7.x86_64 libselinux-2.5-15.el7.x86_64 openssl-libs-1.0.2k-26.el7_9.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-21.el7_9.x86_64
(gdb) bt
#0 addBlockinginfo (txn_index=<error reading variable: Cannot access memory at address 0x7ffd1bafeea8>, blocking_pid=18192,
  blocking_pid@entry=<error reading variable: Cannot access memory at address 0x7ffd1bafeea8>, blocking_gid=0x0,
  blocking_gid@entry=<error reading variable: Cannot access memory at address 0x7ffd1bafeea8>) at pg_unlock.c:1368
Cannot access memory at address 0x7ffd1bafeea8


语句2: select pg_unlock_execute()

4.长稳一致性检验

总共包含9个sql语句,该sql语句保存在benchmarksql-5.0/run/sql.common/test.sql中。

语句验证

语句1:

-- Condition 1: W_YTD = sum(D_YTD)
SELECT * FROM (SELECT w.w_id, w.w_ytd, d.sum_d_ytd
          FROM bmsql_warehouse w,
              (SELECT d_w_id, SUM(d_ytd) sum_d_ytd
                FROM bmsql_district
                GROUP BY d_w_id) d
          WHERE w.w_id = d.d_w_id) as x
WHERE w_ytd != sum_d_ytd;

tpcc=# SELECT * FROM (SELECT w.w_id, w.w_ytd, d.sum_d_ytd
tpcc(#            FROM bmsql_warehouse w,
tpcc(#                 (SELECT d_w_id, SUM(d_ytd) sum_d_ytd
tpcc(#                  FROM bmsql_district
tpcc(#                  GROUP BY d_w_id) d
tpcc(#            WHERE w.w_id = d.d_w_id) as x
tpcc-# WHERE w_ytd != sum_d_ytd;
w_id | w_ytd | sum_d_ytd
------+-------+-----------
(0 rows)

语句2:

-- Condition 2: D_NEXT_O_ID - 1 = max(O_ID) = max(NO_O_ID)
SELECT * FROM (SELECT d.d_w_id, d.d_id, d.d_next_o_id, o.max_o_id, no.max_no_o_id
          FROM bmsql_district d,
              (SELECT o_w_id, o_d_id, MAX(o_id) max_o_id
                FROM bmsql_oorder
                GROUP BY o_w_id, o_d_id) o,
              (SELECT no_w_id, no_d_id, MAX(no_o_id) max_no_o_id
                FROM bmsql_new_order
                GROUP BY no_w_id, no_d_id) no
          WHERE d.d_w_id = o.o_w_id AND d.d_w_id = no.no_w_id AND
                d.d_id = o.o_d_id AND d.d_id = no.no_d_id) as x
WHERE d_next_o_id - 1 != max_o_id OR d_next_o_id - 1 != max_no_o_id;

tpcc=# SELECT * FROM (SELECT d.d_w_id, d.d_id, d.d_next_o_id, o.max_o_id, no.max_no_o_id
tpcc(#            FROM bmsql_district d,
tpcc(#                 (SELECT o_w_id, o_d_id, MAX(o_id) max_o_id
tpcc(#                  FROM bmsql_oorder
tpcc(#                  GROUP BY o_w_id, o_d_id) o,
tpcc(#                 (SELECT no_w_id, no_d_id, MAX(no_o_id) max_no_o_id
tpcc(#                  FROM bmsql_new_order
tpcc(#                  GROUP BY no_w_id, no_d_id) no
tpcc(#            WHERE d.d_w_id = o.o_w_id AND d.d_w_id = no.no_w_id AND
tpcc(#                 d.d_id = o.o_d_id AND d.d_id = no.no_d_id) as x
tpcc-# WHERE d_next_o_id - 1 != max_o_id OR d_next_o_id - 1 != max_no_o_id;
d_w_id | d_id | d_next_o_id | max_o_id | max_no_o_id
--------+------+-------------+----------+-------------
(0 rows)

语句3:

-- Condition 3: max(NO_O_ID) - min(NO_O_ID) + 1
-- = [number of rows in the NEW-ORDER table for this bmsql_district]
SELECT * FROM (SELECT no_w_id, no_d_id, MAX(no_o_id) max_no_o_id,
                MIN(no_o_id) min_no_o_id, COUNT(*) count_no
          FROM bmsql_new_order
          GROUP BY no_w_id, no_d_Id) as x
WHERE max_no_o_id - min_no_o_id + 1 != count_no;

tpcc=# SELECT * FROM (SELECT no_w_id, no_d_id, MAX(no_o_id) max_no_o_id,
tpcc(#                   MIN(no_o_id) min_no_o_id, COUNT(*) count_no
tpcc(#            FROM bmsql_new_order
tpcc(#            GROUP BY no_w_id, no_d_Id) as x
tpcc-# WHERE max_no_o_id - min_no_o_id + 1 != count_no;
no_w_id | no_d_id | max_no_o_id | min_no_o_id | count_no
---------+---------+-------------+-------------+----------
(0 rows)

语句4:(这个有点久)

-- Condition 4: sum(O_OL_CNT)
-- = [number of rows in the ORDER-LINE table for this bmsql_district]
SELECT * FROM (SELECT o.o_w_id, o.o_d_id, o.sum_o_ol_cnt, ol.count_ol
          FROM (SELECT o_w_id, o_d_id, SUM(o_ol_cnt) sum_o_ol_cnt
                FROM bmsql_oorder
                GROUP BY o_w_id, o_d_id) o,
              (SELECT ol_w_id, ol_d_id, COUNT(*) count_ol
                FROM bmsql_order_line
                GROUP BY ol_w_id, ol_d_id) ol
          WHERE o.o_w_id = ol.ol_w_id AND
                o.o_d_id = ol.ol_d_id) as x
WHERE sum_o_ol_cnt != count_ol;

tpcc=# -- Condition 4: sum(O_OL_CNT)
tpcc=# -- = [number of rows in the ORDER-LINE table for this bmsql_district]
tpcc=# SELECT * FROM (SELECT o.o_w_id, o.o_d_id, o.sum_o_ol_cnt, ol.count_ol
tpcc(#            FROM (SELECT o_w_id, o_d_id, SUM(o_ol_cnt) sum_o_ol_cnt
tpcc(#                  FROM bmsql_oorder
tpcc(#                  GROUP BY o_w_id, o_d_id) o,
tpcc(#                 (SELECT ol_w_id, ol_d_id, COUNT(*) count_ol
tpcc(#                  FROM bmsql_order_line
tpcc(#                  GROUP BY ol_w_id, ol_d_id) ol
tpcc(#            WHERE o.o_w_id = ol.ol_w_id AND
tpcc(#                 o.o_d_id = ol.ol_d_id) as x
tpcc-# WHERE sum_o_ol_cnt != count_ol;
o_w_id | o_d_id | sum_o_ol_cnt | count_ol
--------+--------+--------------+----------
(0 rows)

语句5:

-- Condition 5: For any row in the ORDER table, O_CARRIER_ID is set to a null
-- value if and only if there is a corresponding row in the
-- NEW-ORDER table
SELECT * FROM (SELECT o.o_w_id, o.o_d_id, o.o_id, o.o_carrier_id, no.count_no
          FROM bmsql_oorder o,
              (SELECT no_w_id, no_d_id, no_o_id, COUNT(*) count_no
                FROM bmsql_new_order
                GROUP BY no_w_id, no_d_id, no_o_id) no
          WHERE o.o_w_id = no.no_w_id AND
                o.o_d_id = no.no_d_id AND
                o.o_id = no.no_o_id) as x
WHERE (o_carrier_id IS NULL AND count_no = 0) OR
    (o_carrier_id IS NOT NULL AND count_no != 0);
     
     
tpcc=# SELECT * FROM (SELECT o.o_w_id, o.o_d_id, o.o_id, o.o_carrier_id, no.count_no
tpcc(#            FROM bmsql_oorder o,
tpcc(#                 (SELECT no_w_id, no_d_id, no_o_id, COUNT(*) count_no
tpcc(#                  FROM bmsql_new_order
tpcc(#                  GROUP BY no_w_id, no_d_id, no_o_id) no
tpcc(#            WHERE o.o_w_id = no.no_w_id AND
tpcc(#                 o.o_d_id = no.no_d_id AND
tpcc(#                 o.o_id = no.no_o_id) as x
tpcc-# WHERE (o_carrier_id IS NULL AND count_no = 0) OR
tpcc-#       (o_carrier_id IS NOT NULL AND count_no != 0);
o_w_id | o_d_id | o_id | o_carrier_id | count_no
--------+--------+------+--------------+----------
(0 rows)

语句6:(比较耗时)

-- Condition 6: For any row in the ORDER table, O_OL_CNT must equal the number
-- of rows in the ORDER-LINE table for the corresponding order
SELECT * FROM (SELECT o.o_w_id, o.o_d_id, o.o_id, o.o_ol_cnt, ol.count_ol
          FROM bmsql_oorder o,
              (SELECT ol_w_id, ol_d_id, ol_o_id, COUNT(*) count_ol
                FROM bmsql_order_line
                GROUP BY ol_w_id, ol_d_id, ol_o_id) ol
          WHERE o.o_w_id = ol.ol_w_id AND
                o.o_d_id = ol.ol_d_id AND
                o.o_id = ol.ol_o_id) as x
WHERE o_ol_cnt != count_ol;

tpcc=# -- Condition 6: For any row in the ORDER table, O_OL_CNT must equal the number
tpcc=# -- of rows in the ORDER-LINE table for the corresponding order
tpcc=# SELECT * FROM (SELECT o.o_w_id, o.o_d_id, o.o_id, o.o_ol_cnt, ol.count_ol
tpcc(#            FROM bmsql_oorder o,
tpcc(#                 (SELECT ol_w_id, ol_d_id, ol_o_id, COUNT(*) count_ol
tpcc(#                  FROM bmsql_order_line
tpcc(#                  GROUP BY ol_w_id, ol_d_id, ol_o_id) ol
tpcc(#            WHERE o.o_w_id = ol.ol_w_id AND
tpcc(#                 o.o_d_id = ol.ol_d_id AND
tpcc(#                 o.o_id = ol.ol_o_id) as x
tpcc-# WHERE o_ol_cnt != count_ol;
o_w_id | o_d_id | o_id | o_ol_cnt | count_ol
--------+--------+------+----------+----------
(0 rows)

语句7:

-- Condition 7: For any row in the ORDER-LINE table, OL_DELIVERY_D is set to
-- a null date/time if and only if the corresponding row in the
-- ORDER table has O_CARRIER_ID set to a null value
SELECT * FROM (SELECT ol.ol_w_id, ol.ol_d_id, ol.ol_o_id, ol.ol_delivery_d,
                o.o_carrier_id
          FROM bmsql_order_line ol,
              bmsql_oorder o
          WHERE ol.ol_w_id = o.o_w_id AND
                ol.ol_d_id = o.o_d_id AND
                ol.ol_o_id = o.o_id) as x
WHERE (ol_delivery_d IS NULL AND o_carrier_id IS NOT NULL) OR
    (ol_delivery_d IS NOT NULL AND o_carrier_id IS NULL);
     
tpcc=# SELECT * FROM (SELECT ol.ol_w_id, ol.ol_d_id, ol.ol_o_id, ol.ol_delivery_d,
tpcc(#                   o.o_carrier_id
tpcc(#            FROM bmsql_order_line ol,
tpcc(#                 bmsql_oorder o
tpcc(#            WHERE ol.ol_w_id = o.o_w_id AND
tpcc(#                 ol.ol_d_id = o.o_d_id AND
tpcc(#                 ol.ol_o_id = o.o_id) as x
tpcc-# WHERE (ol_delivery_d IS NULL AND o_carrier_id IS NOT NULL) OR
tpcc-#       (ol_delivery_d IS NOT NULL AND o_carrier_id IS NULL);
ol_w_id | ol_d_id | ol_o_id | ol_delivery_d | o_carrier_id
---------+---------+---------+---------------+--------------
(0 rows)

语句8:

-- Condition 8: W_YTD = sum(H_AMOUNT)
SELECT *
FROM (SELECT w.w_id, w.w_ytd, h.sum_h_amount
     FROM bmsql_warehouse w,
          (SELECT h_w_id, SUM(h_amount) sum_h_amount FROM bmsql_history GROUP BY h_w_id) h
     WHERE w.w_id = h.h_w_id) as x
WHERE w_ytd != sum_h_amount;

tpcc=# SELECT *
tpcc-# FROM (SELECT w.w_id, w.w_ytd, h.sum_h_amount
tpcc(#       FROM bmsql_warehouse w,
tpcc(#           (SELECT h_w_id, SUM(h_amount) sum_h_amount FROM bmsql_history GROUP BY h_w_id) h
tpcc(#       WHERE w.w_id = h.h_w_id) as x
tpcc-# WHERE w_ytd != sum_h_amount;
w_id | w_ytd | sum_h_amount
------+-------+--------------
(0 rows)

语句9:

-- Condition 9: D_YTD = sum(H_AMOUNT)
SELECT *
FROM (SELECT d.d_w_id, d.d_id, d.d_ytd, h.sum_h_amount
     FROM bmsql_district d,
          (SELECT h_w_id, h_d_id, SUM(h_amount) sum_h_amount
           FROM bmsql_history
           GROUP BY h_w_id, h_d_id) h
     WHERE d.d_w_id = h.h_w_id
       AND d.d_id = h.h_d_id) as x
WHERE d_ytd != sum_h_amount;

tpcc=# SELECT *
tpcc-# FROM (SELECT d.d_w_id, d.d_id, d.d_ytd, h.sum_h_amount
tpcc(#       FROM bmsql_district d,
tpcc(#           (SELECT h_w_id, h_d_id, SUM(h_amount) sum_h_amount
tpcc(#             FROM bmsql_history
tpcc(#             GROUP BY h_w_id, h_d_id) h
tpcc(#       WHERE d.d_w_id = h.h_w_id
tpcc(#         AND d.d_id = h.h_d_id) as x
tpcc-# WHERE d_ytd != sum_h_amount;
d_w_id | d_id | d_ytd | sum_h_amount
--------+------+-------+--------------
(0 rows)

结论

性能:100-1000的并发连接中,500并发表现性能更为优秀,提高pooler_scale_factor = 64,修改参数、通过绑核、设置建表填充因子会使得tpmc更上一个台阶。

稳定性:基于teledbx-6.11.03的版本进行7*24小时压测,基于一致性检验的语句输出为空或0,长稳测试通过,可应用于生产。

文章来自个人专栏
teledbx压测tpcc
1 文章 | 1 订阅
0条评论
0 / 1000
请输入你的评论
1
0