How Does Pg_pathman Handle Filter Conditions?

In my previous post I’ve introduced pg_pathman as an extension which accelerate query planning over partitioned tables. In this post I would like to covert another aspect of pg_pathman: it not only produce plans faster, but also produce better plans. Thanks to it query execution with pg_pathman becomes much faster in some cases.

When you search partitioned table with some filter conditions, pg_pathman adopts this filter to each individual partition. Therefore, each partition receives the only filter conditions which are useful to check against it.

Let me illustrate this on the example. At first, let’s see what’s happening with filter conditions while dealing with PostgreSQL inheritance mechanism.

Let us make some partitioned table using inheritance.

CREATE TABLE test (ts timestamp NOT NULL, title text);
CREATE INDEX test_ts_idx ON test (ts);
CREATE TABLE test_1 (LIKE test INCLUDING INDEXES, CHECK ( ts >= '2015-01-01' AND ts < '2015-02-01' )) INHERITS (test);
CREATE TABLE test_2 (LIKE test INCLUDING INDEXES, CHECK ( ts >= '2015-02-01' AND ts < '2015-03-01' )) INHERITS (test);
CREATE TABLE test_3 (LIKE test INCLUDING INDEXES, CHECK ( ts >= '2015-03-01' AND ts < '2015-04-01' )) INHERITS (test);
CREATE TABLE test_4 (LIKE test INCLUDING INDEXES, CHECK ( ts >= '2015-04-01' AND ts < '2015-05-01' )) INHERITS (test);
CREATE TABLE test_5 (LIKE test INCLUDING INDEXES, CHECK ( ts >= '2015-05-01' AND ts < '2015-06-01' )) INHERITS (test);
CREATE TABLE test_6 (LIKE test INCLUDING INDEXES, CHECK ( ts >= '2015-06-01' AND ts < '2015-07-01' )) INHERITS (test);

And them fill it with test data.

INSERT INTO test_1 (SELECT '2015-01-01'::timestamp + i * interval '1 minute', md5(i::text) FROM generate_series(0, 1440 * 31 - 1) i);
INSERT INTO test_2 (SELECT '2015-02-01'::timestamp + i * interval '1 minute', md5(i::text) FROM generate_series(0, 1440 * 28 - 1) i);
INSERT INTO test_3 (SELECT '2015-03-01'::timestamp + i * interval '1 minute', md5(i::text) FROM generate_series(0, 1440 * 31 - 1) i);
INSERT INTO test_4 (SELECT '2015-04-01'::timestamp + i * interval '1 minute', md5(i::text) FROM generate_series(0, 1440 * 30 - 1) i);
INSERT INTO test_5 (SELECT '2015-05-01'::timestamp + i * interval '1 minute', md5(i::text) FROM generate_series(0, 1440 * 31 - 1) i);
INSERT INTO test_6 (SELECT '2015-06-01'::timestamp + i * interval '1 minute', md5(i::text) FROM generate_series(0, 1440 * 30 - 1) i);

Then let’s try to select rows from two time intervals.

# EXPLAIN SELECT * FROM test WHERE (ts >= '2015-02-01' AND ts < '2015-03-15') OR (ts >= '2015-05-15' AND ts < '2015-07-01');
                                                                                                                                    QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.00..5028.22 rows=128059 width=41)
   ->  Seq Scan on test  (cost=0.00..0.00 rows=1 width=40)
         Filter: (((ts >= '2015-02-01 00:00:00'::timestamp without time zone) AND (ts < '2015-03-15 00:00:00'::timestamp without time zone)) OR ((ts >= '2015-05-15 00:00:00'::timestamp without time zone) AND (ts < '2015-07-01 00:00:00'::timestamp without time zone)))
   ->  Seq Scan on test_2  (cost=0.00..1183.40 rows=40320 width=41)
         Filter: (((ts >= '2015-02-01 00:00:00'::timestamp without time zone) AND (ts < '2015-03-15 00:00:00'::timestamp without time zone)) OR ((ts >= '2015-05-15 00:00:00'::timestamp without time zone) AND (ts < '2015-07-01 00:00:00'::timestamp without time zone)))
   ->  Bitmap Heap Scan on test_3  (cost=444.46..1266.02 rows=20178 width=41)
         Recheck Cond: (((ts >= '2015-02-01 00:00:00'::timestamp without time zone) AND (ts < '2015-03-15 00:00:00'::timestamp without time zone)) OR ((ts >= '2015-05-15 00:00:00'::timestamp without time zone) AND (ts < '2015-07-01 00:00:00'::timestamp without time zone)))
         ->  BitmapOr  (cost=444.46..444.46 rows=20178 width=0)
               ->  Bitmap Index Scan on test_3_ts_idx  (cost=0.00..430.07 rows=20178 width=0)
                     Index Cond: ((ts >= '2015-02-01 00:00:00'::timestamp without time zone) AND (ts < '2015-03-15 00:00:00'::timestamp without time zone))
               ->  Bitmap Index Scan on test_3_ts_idx  (cost=0.00..4.30 rows=1 width=0)
                     Index Cond: ((ts >= '2015-05-15 00:00:00'::timestamp without time zone) AND (ts < '2015-07-01 00:00:00'::timestamp without time zone))
   ->  Seq Scan on test_5  (cost=0.00..1310.80 rows=24360 width=41)
         Filter: (((ts >= '2015-02-01 00:00:00'::timestamp without time zone) AND (ts < '2015-03-15 00:00:00'::timestamp without time zone)) OR ((ts >= '2015-05-15 00:00:00'::timestamp without time zone) AND (ts < '2015-07-01 00:00:00'::timestamp without time zone)))
   ->  Seq Scan on test_6  (cost=0.00..1268.00 rows=43200 width=41)
         Filter: (((ts >= '2015-02-01 00:00:00'::timestamp without time zone) AND (ts < '2015-03-15 00:00:00'::timestamp without time zone)) OR ((ts >= '2015-05-15 00:00:00'::timestamp without time zone) AND (ts < '2015-07-01 00:00:00'::timestamp without time zone)))
(16 rows)

We can see that filter condition was passed to each partition as is. But actually it could be simplified a lot. For instance, table test_2 could be scan without filter condition at all because all its rows are matching. Filter condition to test_3 could be simplified to ts < '2015-03-15', therefore BitmapOr is not necessary.

Let’s try the same example with pg_pathman. Firstly create test table and its partitions.

CREATE TABLE test (ts timestamp NOT NULL, title text);
CREATE INDEX test_ts_idx ON test (ts);
SELECT create_range_partitions('test', 'ts', '2015-01-01'::timestamp, '1 month'::interval, 6);

Then insert test data into table. pg_pathman automatically creates trigger which distribute data between partitions. Just like pg_partman does.

INSERT INTO test (SELECT '2015-01-01'::timestamp + i * interval '1 minute', md5(i::text) FROM generate_series(0, 1440 * 181 - 1) i);

And finally try the same query with pg_pathman.

# EXPLAIN SELECT * FROM test WHERE (ts >= '2015-02-01' AND ts < '2015-03-15') OR (ts >= '2015-05-15' AND ts < '2015-07-01');
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Append  (cost=0.00..3248.59 rows=0 width=0)
   ->  Seq Scan on test_2  (cost=0.00..780.20 rows=0 width=0)
   ->  Index Scan using test_3_ts_idx on test_3  (cost=0.29..767.99 rows=0 width=0)
         Index Cond: (ts < '2015-03-15 00:00:00'::timestamp without time zone)
   ->  Seq Scan on test_5  (cost=0.00..864.40 rows=0 width=0)
         Filter: (ts >= '2015-05-15 00:00:00'::timestamp without time zone)
   ->  Seq Scan on test_6  (cost=0.00..836.00 rows=0 width=0)
(7 rows)

We can see that pg_pathman selects the same partitions, but query plan becomes way simpler. Now, test_2 is scanned without useless filter condition. test_3 is scanned using just ts < '2015-03-15' filter condition. Thanks to it, plain Index Scan is used instead of BitmapOr. And similar advances was applied to rest of partitions.

How was this simplification possible? The common fear here is that such simplification could be computational expensive in general case. But since pg_pathman is intended to decrease query planning time, it’s very important to keep all transformations cheap and simple. And this cheap and simple algorithm of transformation really exists.

Let’s see how it works on simple example. The filter condition (ts >= '2015-02-01' AND ts < '2015-03-15') OR (ts >= '2015-05-15' AND ts < '2015-07-01') have following tree representation.

Leaf nodes of tree are simple conditions. Non-leaf nodes are logical operators which forms complex conditions. For particular partition each filter condition (either simple or complex) could be treated into one of three classes.

Filter condition is always true for rows of this partition (t). For instance, condition ts >= '2015-04-15' is always true for partition ts >= 2015-05-01 AND ts < 2015-06-01.
Filter condition could be either true or false for rows of this partition (m). For instance, condition ts >= '2015-03-15' could be either true or false for partition ts >= 2015-03-01 AND ts < 2015-03-01.
Filter condition is always false for rows of this partition (f). For instance, condition ts <= '2015-02-01' is always false for partition ts >= 2015-04-01 AND ts < 2015-04-01.

We can mark each tree node with vector of classes which corresponding condition is treated against each partition. These vectors could be filled upwards: for leaf nodes first, and then for non-leaf nodes using tri-state logic.

It’s evident that only conditions which could be either true or false (m) are useful for filtering. Conditions which are always true or always false shouldn’t be presented in the partitions filter. Using produced three we can now produce filter conditions for each partition.

For ts >= 2015-01-01 AND ts < 2015-02-01 partition, whole filter condition is false. So, skip it.
For ts >= 2015-02-01 AND ts < 2015-03-01 partition, whole filter condition is true. So, scan it without filter.
For ts >= 2015-03-01 AND ts < 2015-04-01 partition, filter condition tree would be reduced into following tree.

Therefore, this partition will be scan with ts < '2015-03-15' filter.
For ts >= 2015-04-01 AND ts < 2015-05-01 partition, whole filter condition is false. So, skip it.
For ts >= 2015-05-01 AND ts < 2015-06-01 partition, filter condition tree would be reduced into following tree.

Therefore, this partition will be scan with ts >= '2015-05-15' filter.
For ts >= 2015-06-01 AND ts < 2015-07-01 partition, whole filter condition is true. So, scan it without filter.

This is how filter conditions processing works in pg_pathman. The explanation could be a bit exhausting for reading, but I hope you feel enlighten by getting how it works. I remember that pg_pathman is open source extension for PostgreSQL 9.5 in beta-release stage. I appeal to everyone interested for trying it and sharing a feedback.

Comments