在优化表结构前后,请测试和记录以下详细信息以对比系统性能差异:
- 数据加载时间。
- 表占用的存储空间大小。
- 查询性能。
本次实践中的示例基于使用8节点的dws.d2.xlarge集群。因为系统性能受到许多因素的影响,即使您使用相同的集群配置,结果也会有所不同。
项目 规格及参数 机器型号
dws.d2.xlarge VM
CPU
4*CPU E5-2680 v2 @ 2.80GHZ
内存
32GB
网络
1GB
磁盘
1.63TB
节点数目
8
请使用下面的基准表来记录结果。
基准 优化前 优化后 加载时间(11张表)
341584 ms
-
占用存储
Store_Sales
-
-
Date_Dim
-
-
Store
-
-
Item
-
-
Time_Dim
-
-
Promotion
-
-
Customer_Demographics
-
-
Customer_Address
-
-
Household_Demographics
-
-
Customer
-
-
Income_Band
-
-
总存储空间
-
-
查询执行时间
查询1
-
-
查询2
-
-
查询3
-
-
总执行时间
-
-
执行以下步骤测试优化前的系统性能,以建立基准。
1.将上一节记下的所有11张表的累计加载时间填入基准表的“优化前”一列。
2.记录各表的存储使用情况。
使用pg_size_pretty函数查询每张表使用的磁盘空间,并将结果记录到基准表中。
SELECT T_NAME, PG_SIZE_PRETTY(PG_RELATION_SIZE(t_name)) FROM (VALUES('store_sales'),('date_dim'),('store'),('item'),('time_dim'),('promotion'),('customer_demographics'),('customer_address'),('household_demographics'),('customer'),('income_band')) AS names1(t_name);
显示结果如下:
t_name | pg_size_pretty
------------------------+----------------
store_sales | 42 GB
date_dim | 11 MB
store | 232 kB
item | 110 MB
time_dim | 11 MB
promotion | 256 kB
customer_demographics | 171 MB
customer_address | 170 MB
household_demographics | 504 kB
customer | 441 MB
income_band | 88 kB
(11 rows)
3.测试查询性能。
运行如下三个查询,并记录每个查询的耗费时间。考虑到操作系统缓存的影响,同一查询在每次执行时耗时会有不同属正常现象,建议多测试几次,取一组平均值。
\timing on
SELECT * FROM (SELECT COUNT(*)
FROM store_sales
,household_demographics
,time_dim, store
WHERE ss_sold_time_sk = time_dim.t_time_sk
AND ss_hdemo_sk = household_demographics.hd_demo_sk
AND ss_store_sk = s_store_sk
AND time_dim.t_hour = 8
AND time_dim.t_minute >= 30
AND household_demographics.hd_dep_count = 5
AND store.s_store_name = 'ese'
ORDER BY COUNT(*)
) LIMIT 100;
SELECT * FROM (SELECT i_brand_id brand_id, i_brand brand, i_manufact_id, i_manufact,
SUM(ss_ext_sales_price) ext_price
FROM date_dim, store_sales, item,customer,customer_address,store
WHERE d_date_sk = ss_sold_date_sk
AND ss_item_sk = i_item_sk
AND i_manager_id=8
AND d_moy=11
AND d_year=1999
AND ss_customer_sk = c_customer_sk
AND c_current_addr_sk = ca_address_sk
AND substr(ca_zip,1,5) <> substr(s_zip,1,5)
AND ss_store_sk = s_store_sk
GROUP BY i_brand
,i_brand_id
,i_manufact_id
,i_manufact
ORDER BY ext_price desc
,i_brand
,i_brand_id
,i_manufact_id
,i_manufact
) LIMIT 100;
SELECT * FROM (SELECT s_store_name, s_store_id,
SUM(CASE WHEN (d_day_name='Sunday') THEN ss_sales_price ELSE null END) sun_sales,
SUM(CASE WHEN (d_day_name='Monday') THEN ss_sales_price ELSE null END) mon_sales,
SUM(CASE WHEN (d_day_name='Tuesday') THEN ss_sales_price ELSE null END) tue_sales,
SUM(CASE WHEN (d_day_name='Wednesday') THEN ss_sales_price ELSE null END) wed_sales,
SUM(CASE WHEN (d_day_name='Thursday') THEN ss_sales_price ELSE null END) thu_sales,
SUM(CASE WHEN (d_day_name='Friday') THEN ss_sales_price ELSE null END) fri_sales,
SUM(CASE WHEN (d_day_name='Saturday') THEN ss_sales_price ELSE null END) sat_sales
FROM date_dim, store_sales, store
WHERE d_date_sk = ss_sold_date_sk AND
s_store_sk = ss_store_sk AND
s_gmt_offset = -5 AND
d_year = 2000
GROUP BY s_store_name, s_store_id
ORDER BY s_store_name, s_store_id,sun_sales,mon_sales,tue_sales,wed_sales,thu_sales,fri_sales,sat_sales
) LIMIT 100;
经过上面的统计后,记录的基准表信息如下:
基准 优化前 优化后 加载时间(11张表)
341584ms
-
占用存储
Store_Sales
42GB
-
Date_Dim
11MB
-
Store
232kB
-
Item
110MB
-
Time_Dim
11MB
-
Promotion
256kB
-
Customer_Demographics
171MB
-
Customer_Address
170MB
-
Household_Demographics
504kB
-
Customer
441MB
-
Income_Band
88kB
-
总存储空间
42GB
-
查询执行时间
查询1
14552.05ms
-
查询2
27952.36ms
-
查询3
17721.15ms
-
总执行时间
60225.56ms
-