本文共 14590 字,大约阅读时间需要 48 分钟。
PostgreSQL , Greenplum , 交换分区 , 清理垃圾 , 存储格式变更
1、在使用Greenplum的过程中,如果业务会经常对数据进行更新,删除,插入,可能导致数据出现膨胀。
2、用户在建表的时候,存储模式选择不当,但是业务已经运行了一段时间,表里面已经写入了大量的数据。
3、用户在建表的时候,分布键选得不对,但是业务已经运行了一段时间,表里面已经写入了大量的数据。
4、用户在建表的时候表分区做得不对,但是业务已经运行了一段时间,表里面已经写入了大量的数据。
5、数据分区在初期选择了行存储,为了提到明细查询的效率。但是到数据冷却后,明细查询的需求下降,更多的是统计查询的需求,那么需要将这样的冷分区的行存储转换成列存储,提高统计的效率。
6、数据在使用过程中,由于分布键的数据在业务层发生了一些变化,导致了数据存储的倾斜。
以上情况发生时,我们就有必要进行相应的行动:
清理垃圾、调整分布键、分区键、修改存储模式等。
那么,有什么方法可以让这些操作尽量的减少对业务的影响呢?
答案是 交换数据、交换分区。
postgres=# select relname, relkind, relstorage from pg_class where relkind='r'; relname | relkind | relstorage -------------------------------+---------+------------ sql_languages | r | h sql_packages | r | h sql_parts | r | h sql_sizing | r | h sql_sizing_profiles | r | h gp_san_configuration | r | h
h表示heap 行存储表。
c表示append only column 存储表。
a表示表示append only 行存储表。
postgres=# select relname, relkind, relstorage from pg_class where relname='a'; relname | relkind | relstorage ---------+---------+------------ a | r | c (1 row) Time: 0.518 ms
1、查看分布键
SELECT attrnums FROM pg_catalog.gp_distribution_policy t WHERE localoid = '679926' ; attrnums ---------- {1} (1 row)
2、查看分布键名称
SELECT attname FROM pg_attribute WHERE attrelid = '679926' AND attnum = '1' ; attname --------- uid (1 row)
postgres=# select * from pg_partitions where tablename='tbl_pos';
对sales表进行交换。
CREATE TABLE sales2 (LIKE sales) WITH (appendonly=true, compresstype=quicklz, compresslevel=1, orientation=column); INSERT INTO sales2 SELECT * FROM sales; DROP TABLE sales; ALTER TABLE sales2 RENAME TO sales; GRANT ALL PRIVILEGES ON sales TO admin; GRANT SELECT ON sales TO guest;
对于分区表,Greenplum提供了交换分区的语法
EXCHANGE [DEFAULT] PARTITION Exchanges another table into the partition hierarchy into the place of an existing partition. In a multi-level partition design, you can only exchange the lowest level partitions (those that contain data). The Greenplum Database server configuration parameter gp_enable_exchange_default_partition controls availability of the EXCHANGE DEFAULT PARTITION clause. The default value for the parameter is off. The clause is not available and Greenplum Database returns an error if the clause is specified in an ALTER TABLE command. For information about the parameter, see Server Configuration Parameters. Warning: Before you exchange the default partition, you must ensure the data in the table to be exchanged, the new default partition, is valid for the default partition. For example, the data in the new default partition must not contain data that would be valid in other leaf child partitions of the partitioned table. Otherwise, queries against the partitioned table with the exchanged default partition that are executed by GPORCA might return incorrect results. WITH TABLE table_name - The name of the table you are swapping into the partition design. You can exchange a table where the table data is stored in the database. For example, the table is created with the CREATE TABLE command. With the EXCHANGE PARTITION clause, you can also exchange a readable external table (created with the CREATE EXTERNAL TABLE command) into the partition hierarchy in the place of an existing leaf child partition. If you specify a readable external table, you must also specify the WITHOUT VALIDATION clause to skip table validation against the CHECK constraint of the partition you are exchanging. Exchanging a leaf child partition with an external table is not supported in these cases: The partitioned table is created with the SUBPARTITION clause or if a partition has a subpartition. The partitioned table contains a column with a check constraint or a NOT NULL constraint. WITH | WITHOUT VALIDATION - Validates that the data in the table matches the CHECK constraint of the partition you are exchanging. The default is to validate the data against the CHECK constraint. Warning: If you specify the WITHOUT VALIDATION clause, you must ensure that the data in table that you are exchanging for an existing child leaf partition is valid against the CHECK constraints on the partition. Otherwise, queries against the partitioned table might return incorrect results. SET SUBPARTITION TEMPLATE Modifies the subpartition template for an existing partition. After a new subpartition template is set, all new partitions added will have the new subpartition design (existing partitions are not modified). SPLIT DEFAULT PARTITION Splits a default partition. Only a range partition can be split, not a list partition. In a multi-level partition design, you can only split the lowest level default partitions (those that contain data). Splitting a default partition creates a new partition containing the values specified and leaves the default partition containing any values that do not match to an existing partition. AT - For list partitioned tables, specifies a single list value that should be used as the criteria for the split. START - For range partitioned tables, specifies a starting value for the new partition. END - For range partitioned tables, specifies an ending value for the new partition. INTO - Allows you to specify a name for the new partition. When using the INTO clause to split a default partition, the second partition name specified should always be that of the existing default partition. If you do not know the name of the default partition, you can look it up using the pg_partitions view. SPLIT PARTITION Splits an existing partition into two partitions. Only a range partition can be split, not a list partition. In a multi-level partition design, you can only split the lowest level partitions (those that contain data). AT - Specifies a single value that should be used as the criteria for the split. The partition will be divided into two new partitions with the split value specified being the starting range for the latter partition. INTO - Allows you to specify names for the two new partitions created by the split. partition_name The given name of a partition. FOR (RANK(number)) For range partitions, the rank of the partition in the range. FOR ('value') Specifies a partition by declaring a value that falls within the partition boundary specification. If the value declared with FOR matches to both a partition and one of its subpartitions (for example, if the value is a date and the table is partitioned by month and then by day), then FOR will operate on the first level where a match is found (for example, the monthly partition). If your intent is to operate on a subpartition, you must declare so as follows: ALTER TABLE name ALTER PARTITION FOR ('2016-10-01') DROP PARTITION FOR ('2016-10-01');
例子
postgres=# select oid from pg_class where relname='tbl_pos'; oid -------- 679926 (1 row) postgres=# SELECT c.oid::pg_catalog.regclass FROM pg_catalog.pg_class c, pg_catalog.pg_inherits i WHERE c.oid=i.inhrelid AND i.inhparent = '679926' ORDER BY c.relname; oid ------------------- tbl_pos_1_prt_p1 tbl_pos_1_prt_p10 tbl_pos_1_prt_p2 tbl_pos_1_prt_p3 tbl_pos_1_prt_p4 tbl_pos_1_prt_p5 tbl_pos_1_prt_p6 tbl_pos_1_prt_p7 tbl_pos_1_prt_p8 tbl_pos_1_prt_p9 (10 rows) SELECT attrnums FROM pg_catalog.gp_distribution_policy t WHERE localoid = '679926' ; attrnums ---------- {1} (1 row) SELECT attname FROM pg_attribute WHERE attrelid = '679926' AND attnum = '1' ; attname --------- uid (1 row) postgres=# select * from pg_partitions where partitiontablename='tbl_pos_1_prt_p5'; -[ RECORD 1 ]------------+-------------------------------------------------------------------------------------------------- schemaname | postgres tablename | tbl_pos partitionschemaname | postgres partitiontablename | tbl_pos_1_prt_p5 partitionname | p5 parentpartitiontablename | parentpartitionname | partitiontype | list partitionlevel | 0 partitionrank | partitionposition | 5 partitionlistvalues | 5::smallint partitionrangestart | partitionstartinclusive | partitionrangeend | partitionendinclusive | partitioneveryclause | partitionisdefault | f partitionboundary | PARTITION p5 VALUES(5) WITH (appendonly=true, orientation=row, compresstype=zlib, checksum=false) parenttablespace | pg_default partitiontablespace | pg_default
查看这个表的膨胀率发现已经膨胀了100%
postgres=# select * from gp_toolkit.__gp_aovisimap_compaction_info('tbl_pos_1_prt_p5'::regclass); NOTICE: gp_appendonly_compaction_threshold = 10 content | datafile | compaction_possible | hidden_tupcount | total_tupcount | percent_hidden ---------+----------+---------------------+-----------------+----------------+---------------- 2 | 1 | t | 20833264 | 20833264 | 100.00 1 | 1 | t | 20833546 | 20833546 | 100.00 6 | 1 | t | 20833383 | 20833383 | 100.00 7 | 1 | t | 20833287 | 20833287 | 100.00 0 | 1 | t | 20833180 | 20833180 | 100.00 5 | 1 | t | 20833205 | 20833205 | 100.00 10 | 1 | t | 20833420 | 20833420 | 100.00 12 | 1 | t | 20833193 | 20833193 | 100.00 4 | 1 | t | 20833417 | 20833417 | 100.00 9 | 1 | t | 20833259 | 20833259 | 100.00 8 | 1 | t | 20833455 | 20833455 | 100.00 3 | 1 | t | 20833441 | 20833441 | 100.00 17 | 1 | t | 20833182 | 20833182 | 100.00 13 | 1 | t | 20833436 | 20833436 | 100.00 15 | 1 | t | 20833418 | 20833418 | 100.00 14 | 1 | t | 20833197 | 20833197 | 100.00 11 | 1 | t | 20833306 | 20833306 | 100.00 16 | 1 | t | 20833304 | 20833304 | 100.00 19 | 1 | t | 20833285 | 20833285 | 100.00 26 | 1 | t | 20833215 | 20833215 | 100.00 25 | 1 | t | 20833440 | 20833440 | 100.00 24 | 1 | t | 20833193 | 20833193 | 100.00 28 | 1 | t | 20833394 | 20833394 | 100.00 30 | 1 | t | 20833336 | 20833336 | 100.00 32 | 1 | t | 20833516 | 20833516 | 100.00 31 | 1 | t | 20833313 | 20833313 | 100.00 35 | 1 | t | 20833274 | 20833274 | 100.00 39 | 1 | t | 20833460 | 20833460 | 100.00 47 | 1 | t | 20833269 | 20833269 | 100.00 34 | 1 | t | 20833406 | 20833406 | 100.00 23 | 1 | t | 20833253 | 20833253 | 100.00 45 | 1 | t | 20833305 | 20833305 | 100.00 42 | 1 | t | 20833365 | 20833365 | 100.00 36 | 1 | t | 20833176 | 20833176 | 100.00 41 | 1 | t | 20833301 | 20833301 | 100.00 20 | 1 | t | 20833407 | 20833407 | 100.00 22 | 1 | t | 20833333 | 20833333 | 100.00 29 | 1 | t | 20833259 | 20833259 | 100.00 37 | 1 | t | 20833547 | 20833547 | 100.00 38 | 1 | t | 20833285 | 20833285 | 100.00 44 | 1 | t | 20833413 | 20833413 | 100.00 40 | 1 | t | 20833352 | 20833352 | 100.00 27 | 1 | t | 20833464 | 20833464 | 100.00 33 | 1 | t | 20833272 | 20833272 | 100.00 43 | 1 | t | 20833230 | 20833230 | 100.00 18 | 1 | t | 20833330 | 20833330 | 100.00 46 | 1 | t | 20833467 | 20833467 | 100.00 21 | 1 | t | 20833247 | 20833247 | 100.00 (48 rows)
主备回收空间,新建表,写入数据,交换分区
postgres=# create table tbl_pos_1_prt_p5_exchange(like tbl_pos_1_prt_p5) with (appendonly=true, orientation=row, compresstype=zlib, checksum=false) distributed by (uid); CREATE TABLE postgres=# begin; BEGIN postgres=# lock table tbl_pos_1_prt_p5 in ACCESS EXCLUSIVE mode; LOCK TABLE postgres=# insert into tbl_pos_1_prt_p5_exchange select * from tbl_pos_1_prt_p5; -- 注意只调用一次,多次调用会被多次交换(交互偶数次就等于没有改变了)。 postgres=# alter table tbl_pos alter partition p5 EXCHANGE PARTITION p5 WITH TABLE tbl_pos_1_prt_p5_exchange with VALIDATION; NOTICE: exchanged partition "p5" of partition "p5" of relation "tbl_pos" with relation "tbl_pos_1_prt_p5_exchange" ALTER TABLE postgres=# drop table tbl_pos_1_prt_p5_exchange; DROP TABLE postgres=# end;
Greenplum提供给了一种方法:交换数据、交换分区。可以平滑的处理 清理垃圾、调整分布键、分区键、修改存储模式等。
转载地址:http://kmgul.baihongyu.com/