一般我们表都是堆表,按行存储的,Greenplum支持列存储,并且支持数据压缩
如果当你的表大到一个度,那么就需要进行逻辑上的拆分,将一张大表拆分为多张小表,这样可以提升相关表操作的效率,一般我们的行存储在返回多列的效率上比较高,所以一般OLTP都是使用行存储,列存储更适合用于统计的相关操作,所以都是用在OLAP系统下
创建一个范围分区表
CREATE TABLE sales (id int, date date, amt decimal(10,2))
DISTRIBUTED BY (id)
PARTITION BY RANGE (date)
( PARTITION Jan16 START (date '2016-01-01') INCLUSIVE ,
PARTITION Feb16 START (date '2016-02-01') INCLUSIVE ,
PARTITION Mar16 START (date '2016-03-01') INCLUSIVE ,
PARTITION Apr16 START (date '2016-04-01') INCLUSIVE ,
PARTITION May16 START (date '2016-05-01') INCLUSIVE ,
PARTITION Jun16 START (date '2016-06-01') INCLUSIVE ,
PARTITION Jul16 START (date '2016-07-01') INCLUSIVE ,
PARTITION Aug16 START (date '2016-08-01') INCLUSIVE ,
PARTITION Sep16 START (date '2016-09-01') INCLUSIVE ,
PARTITION Oct16 START (date '2016-10-01') INCLUSIVE ,
PARTITION Nov16 START (date '2016-11-01') INCLUSIVE ,
PARTITION Dec16 START (date '2016-12-01') INCLUSIVE
END (date '2017-01-01') EXCLUSIVE );
insert into sales values (1,'2016-01-31',1.02);
select * from sales_1_prt_jan16;
id | date | amt
----+------------+------
1 | 2016-01-31 | 1.02
insert into sales values (1,'2016-03-02',1.02);
select * from sales_1_prt_mar16;
id | date | amt
----+------------+------
1 | 2016-03-02 | 1.02
这里没有定义end的值,那么结束值就是下一个开始值,但是不包含这个值
创建一个list分区
CREATE TABLE rank (id int, rank int, year int, gender
char(1), count int )
DISTRIBUTED BY (id)
PARTITION BY LIST (gender)
( PARTITION girls VALUES ('F'),
PARTITION boys VALUES ('M'),
DEFAULT PARTITION other );
list之前一篇文章详细介绍过,不再细讲
创建Multi-level分区表
CREATE TABLE sales (trans_id int, date date, amount
decimal(9,2), region text)
DISTRIBUTED BY (trans_id)
PARTITION BY RANGE (date)
SUBPARTITION BY LIST (region)
SUBPARTITION TEMPLATE
( SUBPARTITION usa VALUES ('usa'),
SUBPARTITION asia VALUES ('asia'),
SUBPARTITION europe VALUES ('europe'),
DEFAULT SUBPARTITION other_regions)
(START (date '2011-01-01') INCLUSIVE
END (date '2012-01-01') EXCLUSIVE
EVERY (INTERVAL '1 month'),
DEFAULT PARTITION outlying_dates );
这里先按date范围分区,间隔按月,然后又按region进行list分区
warehouse=# \dt+ sales_1_prt_3* --经过查证,后面插入22年2月的数据都是落在这个范围分区
List of relations
Schema | Name | Type | Owner | Storage | Description
--------+-----------------------------------+-------+---------+---------+-------------
public | sales_1_prt_3 | table | gpadmin | heap |
public | sales_1_prt_3_2_prt_asia | table | gpadmin | heap |
public | sales_1_prt_3_2_prt_europe | table | gpadmin | heap |
public | sales_1_prt_3_2_prt_other_regions | table | gpadmin | heap |
public | sales_1_prt_3_2_prt_usa | table | gpadmin | heap |
插入2011年不同region举例
warehouse=# insert into sales VALUES (1,'2011-02-22',3.96,'hank');
INSERT 0 1
warehouse=# select * from sales;
trans_id | date | amount | region
----------+------------+--------+--------
1 | 2011-02-22 | 3.96 | hank
(1 row)
warehouse=# insert into sales VALUES (1,'2011-02-22',3.96,'usa');
INSERT 0 1
warehouse=# select * from sales_1_prt_3;
trans_id | date | amount | region
----------+------------+--------+--------
1 | 2011-02-22 | 3.96 | usa
1 | 2011-02-22 | 3.96 | hank
(2 rows)
warehouse=# select * from sales_1_prt_3_2_prt_usa;
trans_id | date | amount | region
----------+------------+--------+--------
1 | 2011-02-22 | 3.96 | usa
warehouse=# select * from sales_1_prt_3_2_prt_other_regions;
trans_id | date | amount | region
----------+------------+--------+--------
1 | 2011-02-22 | 3.96 | hank
大于2011年的数据举例:
warehouse=# insert into sales VALUES (3,'2012-03-31',3.96,'asia');
warehouse=# select * from sales_1_prt_outlying_dates;
trans_id | date | amount | region
----------+------------+--------+--------
3 | 2012-03-31 | 3.96 | asia
(1 row)
warehouse=# select * from sales_1_prt_outlying_dates_2_prt_asia;
trans_id | date | amount | region
----------+------------+--------+--------
3 | 2012-03-31 | 3.96 | asia
创建three_level分区表
CREATE TABLE p3_sales (id int, year int, month int, day int,
region text)
DISTRIBUTED BY (id)
PARTITION BY RANGE (year)
SUBPARTITION BY RANGE (month)
SUBPARTITION TEMPLATE (
START (1) END (13) EVERY (1),
DEFAULT SUBPARTITION other_months )
SUBPARTITION BY LIST (region)
SUBPARTITION TEMPLATE (
SUBPARTITION usa VALUES ('usa'),
SUBPARTITION europe VALUES ('europe'),
SUBPARTITION asia VALUES ('asia'),
DEFAULT SUBPARTITION other_regions )
( START (2002) END (2012) EVERY (1),
DEFAULT PARTITION outlying_years );
以上例子是三层分区,就不再插入数据进行举例,是按year,再按month,最后按region进行的分区,但是这样会创建很多自分区,所以在选择分区键和策略的时候尽量减少没有必要的分区,否则也会增加相关查询语句的时间和内存占用。
如果想要将一个普通表转换为分区表
1.创建分区表
2.从普通表将数据导入
3.删除原始表
4.修改分区表名为原始表名
5.赋予原始表的相关权限
CREATE TABLE sales2 (LIKE sales)
PARTITION BY RANGE (date)
( START (date 2016-01-01') INCLUSIVE
END (date '2017-01-01') EXCLUSIVE
EVERY (INTERVAL '1 month') );
INSERT INTO sales2 SELECT * FROM sales;
DROP TABLE sales;
ALTER TABLE sales2 RENAME TO sales;
GRANT ALL PRIVILEGES ON sales TO admin;
GRANT SELECT ON sales TO guest;
分区表最大可以有32767个分区
分区表如果是等值查询,会选择性的找对应的分区,以提升效率
warehouse=# EXPLAIN SELECT * FROM sales WHERE date='01-07-12' AND
warehouse-# region='usa';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Gather Motion 64:1 (slice1; segments: 64) (cost=0.00..0.00 rows=2 width=53)
-> Append (cost=0.00..0.00 rows=1 width=53)
-> Seq Scan on sales_1_prt_outlying_dates_2_prt_usa sales (cost=0.00..0.00 rows=1 width=53)
Filter: date = '2012-01-07'::date AND region = 'usa'::text
-> Seq Scan on sales_1_prt_outlying_dates_2_prt_other_regions sales (cost=0.00..0.00 rows=1 width=53)
Filter: date = '2012-01-07'::date AND region = 'usa'::text
Settings: effective_cache_size=4GB; random_page_cost=2
可以通过一下数据字典查询分区表的相关信息
SELECT partitionboundary, partitiontablename, partitionname,
partitionlevel, partitionrank
FROM pg_partitions
WHERE tablename='sales';
pg_partition -跟踪分区表子分区的级别依赖关系以及分区表信息.
pg_partition_templates - Shows the subpartitions created using a subpartition template.
pg_partition_columns - Shows the partition key columns used in a partition design.