重新创建了具有存储方式、压缩级别、分布方式和分布列的测试数据集后,重新测试系统性能。
1.记录各表的存储使用情况。
使用pg_size_pretty函数查询每张表使用的磁盘空间,并将结果记录到基准表中。
SELECT T_NAME, PG_SIZE_PRETTY(PG_RELATION_SIZE(t_name)) FROM (VALUES('store_sales'),('date_dim'),('store'),('item'),('time_dim'),('promotion'),('customer_demographics'),('customer_address'),('household_demographics'),('customer'),('income_band')) AS names1(t_name);
         t_name         | pg_size_pretty 
------------------------+---------------- 
 store_sales            | 14 GB 
 date_dim               | 27 MB 
 store                  | 4352 kB 
 item                   | 259 MB 
 time_dim               | 14 MB 
 promotion              | 3200 kB 
 customer_demographics  | 11 MB 
 customer_address       | 27 MB 
 household_demographics | 1280 kB 
 customer               | 111 MB 
 income_band            | 896 kB 
(11 rows)
2.测试查询性能,并将性能数据录入基准表中。
再次运行如下三个查询,并记录每个查询的耗费时间。
\timing on 
SELECT * FROM (SELECT  COUNT(*)   
FROM store_sales  
    ,household_demographics   
    ,time_dim, store  
WHERE ss_sold_time_sk = time_dim.t_time_sk   
    AND ss_hdemo_sk = household_demographics.hd_demo_sk   
    AND ss_store_sk = s_store_sk  
    AND time_dim.t_hour = 8  
    AND time_dim.t_minute >= 30  
    AND household_demographics.hd_dep_count = 5  
    AND store.s_store_name = 'ese'  
ORDER BY COUNT(*)  
 ) LIMIT 100; 
 
SELECT * FROM (SELECT  i_brand_id brand_id, i_brand brand, i_manufact_id, i_manufact, 
SUM(ss_ext_sales_price) ext_price 
 FROM date_dim, store_sales, item,customer,customer_address,store 
 WHERE d_date_sk = ss_sold_date_sk 
   AND ss_item_sk = i_item_sk 
   AND i_manager_id=8 
   AND d_moy=11 
   AND d_year=1999 
   AND ss_customer_sk = c_customer_sk  
   AND c_current_addr_sk = ca_address_sk 
   AND substr(ca_zip,1,5) <> substr(s_zip,1,5)  
   AND ss_store_sk = s_store_sk  
 GROUP BY i_brand 
      ,i_brand_id 
      ,i_manufact_id 
      ,i_manufact 
 ORDER BY ext_price desc 
         ,i_brand 
         ,i_brand_id 
         ,i_manufact_id 
         ,i_manufact 
 ) LIMIT 100; 
 
SELECT * FROM (SELECT  s_store_name, s_store_id, 
        SUM(CASE WHEN (d_day_name='Sunday') THEN ss_sales_price ELSE null END) sun_sales, 
        SUM(CASE WHEN (d_day_name='Monday') THEN ss_sales_price ELSE null END) mon_sales, 
        SUM(CASE WHEN (d_day_name='Tuesday') THEN ss_sales_price ELSE  null END) tue_sales, 
        SUM(CASE WHEN (d_day_name='Wednesday') THEN ss_sales_price ELSE null END) wed_sales, 
        SUM(CASE WHEN (d_day_name='Thursday') THEN ss_sales_price ELSE null END) thu_sales, 
        SUM(CASE WHEN (d_day_name='Friday') THEN ss_sales_price ELSE null END) fri_sales, 
        SUM(CASE WHEN (d_day_name='Saturday') THEN ss_sales_price ELSE null END) sat_sales 
 FROM date_dim, store_sales, store 
 WHERE d_date_sk = ss_sold_date_sk AND 
       s_store_sk = ss_store_sk AND 
       s_gmt_offset = -5 AND 
       d_year = 2000  
 GROUP BY s_store_name, s_store_id 
 ORDER BY s_store_name, s_store_id,sun_sales,mon_sales,tue_sales,wed_sales,thu_sales,fri_sales,sat_sales 
  ) LIMIT 100;
下面的基准表显示了本次实践中所用集群的验证结果。您的结果可能会因多方面的原因而有所变化,但规律性应该相差不大。考虑到操作系统缓存的影响,相同表结构的同一查询在每次执行时耗时会有不同属正常现象,建议多测试几次,取一组平均值。
基准 优化前 优化后 加载时间(11张表)
341584ms
257241ms
占用存储
Store_Sales
42GB
14GB
Date_Dim
11MB
27MB
Store
232kB
4352kB
Item
110MB
259MB
Time_Dim
11MB
14MB
Promotion
256kB
3200kB
Customer_Demographics
171MB
11MB
Customer_Address
170MB
27MB
Household_Demographics
504kB
1280kB
Customer
441MB
111MB
Income_Band
88kB
896kB
总存储空间
42GB
15GB
查询执行时间
查询1
14552.05ms
1783.353ms
查询2
27952.36ms
14247.803ms
查询3
17721.15ms
11441.659ms
总执行时间
60225.56ms
27472.815ms
3.如果对表设计后的性能还有更高期望,可以运行EXPLAIN PERFORMANCE以查看执行计划进行调优。
关于执行计划的更详细介绍及查询优化请参考《开发指南》中的“SQL执行计划介绍”及“优化查询性能概述” 。
