选择存储方式
此实践中所使用的样例表为典型的TPC-DS表,是典型的多字段表,统计分析类查询场景多,因此选择列存存储方式。
WITH (ORIENTATION = column)
选择压缩级别
在步骤1:“创建初始表并加装样例数据”中没有指定压缩比,DWS默认为用户选择LOW级别压缩比。在这一步中我们把压缩比调整为MIDDLE级别,进行验证对比。
增加存储方式和压缩比后的建表样例如下:
CREATE TABLE store_sales
(
ss_sold_date_sk integer ,
ss_sold_time_sk integer ,
ss_item_sk integer not null,
ss_customer_sk integer ,
ss_cdemo_sk integer ,
ss_hdemo_sk integer ,
ss_addr_sk integer ,
ss_store_sk integer ,
ss_promo_sk integer ,
ss_ticket_number bigint not null,
ss_quantity integer ,
ss_wholesale_cost decimal(7,2) ,
ss_list_price decimal(7,2) ,
ss_sales_price decimal(7,2) ,
ss_ext_discount_amt decimal(7,2) ,
ss_ext_sales_price decimal(7,2) ,
ss_ext_wholesale_cost decimal(7,2) ,
ss_ext_list_price decimal(7,2) ,
ss_ext_tax decimal(7,2) ,
ss_coupon_amt decimal(7,2) ,
ss_net_paid decimal(7,2) ,
ss_net_paid_inc_tax decimal(7,2) ,
ss_net_profit decimal(7,2)
)
WITH (ORIENTATION = column,COMPRESSION=middle);
选择分布方式
依据步骤2:“测试初始表结构下的系统性能并建立基线”中所建立基线的各表大小,分布方式设置如下:
表名 | 行数 | 分布方式 |
---|---|---|
Store_Sales | 287997024 | Hash |
Date_Dim | 73049 | Replication |
Store | 402 | Replication |
Item | 204000 | Replication |
Time_Dim | 86400 | Replication |
Promotion | 1000 | Replication |
Customer_Demographics | 1920800 | Hash |
Customer_Address | 1000000 | Hash |
Household_Demographics | 7200 | Replication |
Customer | 1981703 | Hash |
Income_Band | 20 | Replication |
选择分布列
当表的分布方式选择了Hash分布策略时,分布列选取至关重要。在这一步中,建议按照“表结构设计”章节中的“选择分布列”选择分布键:
选择各表的主键作为Hash表分布键。
表名 | 记录数 | 分布方式 | 分布键 |
---|---|---|---|
Store_Sales | 287997024 | Hash | ss_item_sk |
Date_Dim | 73049 | Replication | - |
Store | 402 | Replication | - |
Item | 204000 | Replication | - |
Time_Dim | 86400 | Replication | - |
Promotion | 1000 | Replication | - |
Customer_Demographics | 1920800 | Hash | cd_demo_sk |
Customer_Address | 1000000 | Hash | ca_address_sk |
Household_Demographics | 7200 | Replication | - |
Customer | 1981703 | Hash | c_customer_sk |
Income_Band | 20 | Replication | - |