Monday 14 September 2009

Partitioning by dates: the quick how-to

Partitioning by dates: the quick how-to: "There is thorough documentation about the Partitioning feature in MySQL 5.1. There are also nice articles like this one by Robin. However, I thought it would be useful to have a quick 'how-to' guide to partitioning by dates. I'll use the world schema to keep it easy.



Partitioning a table by a range of dates is quite popular. Unfortunately, the PARTITION BY RANGE only accepts an integer (or a function that evaluates to an integer) as the partition expression. That's fine if you want to partition by numeric types like this:



ALTER TABLE City PARTITION BY RANGE(id) (

PARTITION p0 VALUES LESS THAN (1000),

PARTITION p1 VALUES LESS THAN (2000),

PARTITION p2 VALUES LESS THAN (3000),

PARTITION p3 VALUES LESS THAN (4000),

PARTITION p4 VALUES LESS THAN (MAXVALUE));





Side node: when partitioning using range, the partitions must be defined using 'VALUES LESS THAN' and the ranges must be listed in increasing order.



If you'd rather partition by a date or datetime, it's not as easy. You cannot use a date directly (e.g., this syntax fails: RANGE(date_column)..PARTITION p0 VALUES LESS THAN ('2000-01-01')). There are plenty of workarounds, but my favorite is the to_days function.



First I need a date column, so let's add one to the City table:

ALTER TABLE City ADD citydate DATE;



# Now populate it with random days in the last ~3 years

UPDATE City SET citydate = current_date - INTERVAL truncate(rand()*1000,0) DAY;



#Remove the old partitioning

ALTER TABLE City REMOVE PARTITIONING;



#Remove the PRIMARY KEY constraint and replace it with an index (I'll explain below)

ALTER TABLE City DROP PRIMARY KEY, ADD INDEX(id);



#Partition by the date col:

ALTER TABLE City PARTITION BY RANGE (to_days(citydate)) (

PARTITION p0 VALUES LESS THAN (to_days('2007-01-01')),

PARTITION p1 VALUES LESS THAN (to_days('2008-01-01')),

PARTITION p2 VALUES LESS THAN (to_days('2009-01-01')),

PARTITION p3 VALUES LESS THAN (to_days('2009-03-01')),

PARTITION p4 VALUES LESS THAN (to_days('2009-06-01')),

PARTITION p5 VALUES LESS THAN MAXVALUE);




Notice the partitions do not have to be even. This is very handy; you may want to put older records that aren't accessed often in a larger partition and keep the recent data in small, fast partitions.



Let's see partition pruning in action. If I run a query that only needs rows from a few partitions, the optimizer will only read from those partitions:

EXPLAIN PARTITIONS SELECT count(*) FROM City WHERE citydate BETWEEN '2009-01-01' AND '2009-08-01';



+----+-------------+-------+------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |

+----+-------------+-------+------------+------+---------------+------+---------+------+------+-------------+

| 1 | SIMPLE | City | p3,p4,p5 | ALL | NULL | NULL | NULL | NULL | 4079 | Using where |

+----+-------------+-------+------------+------+---------------+------+---------+------+------+-------------+




Notice the optimizer realizes it only needs to use certain partitions to find the results, shown by the partitions column.



Now let me explain why I removed the primary key. There is a rule about using unique constraints with partitioning. The short answer is: you can't have a unique constraint on something that you didn't partition by. The reason is that when you insert a record and the uniqueness needs to be checked, we don't want to search through every partition to verify uniqueness. Indexes are local to the partition (global indexes may be implemented in the future). So you can only have a unique constraint if all the columns in the constraint are used in the partitioning expression.
PlanetMySQL Voting:
Vote UP /
Vote DOWN"

No comments:

Sike's shared items