In my previous post I’ve
introduced pg_pathman as an
extension which accelerate query planning over partitioned tables. In this post
I would like to covert another aspect of pg_pathman: it not only produce plans
faster, but also produce better plans. Thanks to it query execution with
pg_pathman becomes much faster in some cases.
When you search partitioned table with some filter conditions, pg_pathman adopts
this filter to each individual partition. Therefore, each partition receives
the only filter conditions which are useful to check against it.
Let me illustrate this on the example. At first, let’s see what’s happening
with filter conditions while dealing with PostgreSQL inheritance mechanism.
Let us make some partitioned table using inheritance.
We can see that filter condition was passed to each partition as is. But
actually it could be simplified a lot. For instance, table test_2 could be scan
without filter condition at all because all its rows are matching. Filter
condition to test_3 could be simplified to ts < '2015-03-15', therefore
BitmapOr is not necessary.
Let’s try the same example with
pg_pathman. Firstly create test
table and its partitions.
We can see that pg_pathman selects the same partitions, but query plan becomes
way simpler. Now, test_2 is scanned without useless filter condition. test_3
is scanned using just ts < '2015-03-15' filter condition. Thanks to it, plain
Index Scan is used instead of BitmapOr. And similar advances was applied to
rest of partitions.
How was this simplification possible? The common fear here is that such
simplification could be computational expensive in general case. But since
pg_pathman is intended to decrease query planning time, it’s very important to
keep all transformations cheap and simple. And this cheap and simple algorithm
of transformation really exists.
Let’s see how it works on simple example. The filter condition (ts >=
'2015-02-01' AND ts < '2015-03-15') OR (ts >= '2015-05-15' AND ts <
'2015-07-01') have following tree representation.
Leaf nodes of tree are simple conditions. Non-leaf nodes are logical operators
which forms complex conditions. For particular partition each filter condition
(either simple or complex) could be treated into one of three classes.
Filter condition is always true for rows of this partition (t). For
instance, condition ts >= '2015-04-15' is always true for partition ts >=
2015-05-01 AND ts < 2015-06-01.
Filter condition could be either true or false for rows of this partition
(m). For instance, condition ts >= '2015-03-15' could be either true or
false for partition ts >= 2015-03-01 AND ts < 2015-03-01.
Filter condition is always false for rows of this partition (f). For
instance, condition ts <= '2015-02-01' is always false for partition ts >=
2015-04-01 AND ts < 2015-04-01.
We can mark each tree node with vector of classes which corresponding condition
is treated against each partition. These vectors could be filled upwards: for
leaf nodes first, and then for non-leaf nodes using tri-state logic.
It’s evident that only conditions which could be either true or false (m) are
useful for filtering. Conditions which are always true or always false
shouldn’t be presented in the partitions filter. Using produced three we can
now produce filter conditions for each partition.
For ts >= 2015-01-01 AND ts < 2015-02-01 partition, whole filter condition
is false. So, skip it.
For ts >= 2015-02-01 AND ts < 2015-03-01 partition, whole filter condition
is true. So, scan it without filter.
For ts >= 2015-03-01 AND ts < 2015-04-01 partition, filter condition tree
would be reduced into following tree.
Therefore, this partition will be scan with ts < '2015-03-15' filter.
For ts >= 2015-04-01 AND ts < 2015-05-01 partition, whole filter condition
is false. So, skip it.
For ts >= 2015-05-01 AND ts < 2015-06-01 partition, filter condition tree
would be reduced into following tree.
Therefore, this partition will be scan with ts >= '2015-05-15' filter.
For ts >= 2015-06-01 AND ts < 2015-07-01 partition, whole filter condition
is true. So, scan it without filter.
This is how filter conditions processing works in
pg_pathman. The explanation could
be a bit exhausting for reading, but I hope you feel enlighten by getting how it
works. I remember that pg_pathman is open source extension for PostgreSQL 9.5
in beta-release stage. I appeal to everyone interested for trying it and
sharing a feedback.