Long story short, with PostgreSQL 11 and a RUM index you can run both TOP-N and COUNT(*) queries for non-selective FTS queries without fetching all the matching rows from the heap, which is certainly much faster. If you’re interested in the details, please read the description below.

On November 1st, 2017, Tom Lane committed a patch enabling bitmap scans to behave like index-only scans when possible. In particular, since PostgreSQL 11, COUNT(*) queries can be evaluated using bitmap scans without accessing the heap when the corresponding bit in the visibility map is set.

commit 7c70996ebf0949b142a99c9445061c3c83ce62b3
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Wed Nov 1 17:38:12 2017 -0400

    Allow bitmap scans to operate as index-only scans when possible.

    If we don't have to return any columns from heap tuples, and there's
    no need to recheck qual conditions, and the heap page is all-visible,
    then we can skip fetching the heap page altogether.

    Skip prefetching pages too, when possible, on the assumption that the
    recheck flag will remain the same from one page to the next.  While that
    assumption is hardly bulletproof, it seems like a good bet most of the
    time, and better than prefetching pages we don't need.

    This commit installs the executor infrastructure, but doesn't change
    any planner cost estimates, thus possibly causing bitmap scans to
    not be chosen in cases where this change renders them the best choice.
    I (tgl) am not entirely convinced that we need to account for this
    behavior in the planner, because I think typically the bitmap scan would
    get chosen anyway if it's the best bet.  In any case the submitted patch
    took way too many shortcuts, resulting in too many clearly-bad choices,
    to be committable.

    Alexander Kuzmenkov, reviewed by Alexey Chernyshov, and whacked around
    rather heavily by me.

    Discussion: https://postgr.es/m/239a8955-c0fc-f506-026d-c837e86c827b@postgrespro.ru
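
A minimal sketch for trying this out follows. The table docs, its columns and the index name are hypothetical, and enable_seqscan is switched off only to steer the planner toward a bitmap scan in this toy setup. Whether heap fetches are actually skipped depends on the visibility map being up to date (hence the VACUUM) and on the index scan not requesting a recheck.

```sql
-- Hypothetical demo table: half of the rows contain the word "postgres".
CREATE TABLE docs AS
SELECT i AS id,
       to_tsvector('english',
                   CASE WHEN i % 2 = 0 THEN 'postgres ' ELSE 'database ' END || i) AS fts
FROM generate_series(1, 100000) AS i;

CREATE INDEX docs_fts_idx ON docs USING gin (fts);

-- VACUUM sets the all-visible bits in the visibility map.
VACUUM ANALYZE docs;

-- For this demo only: discourage a sequential scan.
SET enable_seqscan = off;

-- No heap columns are needed for count(*), so the heap may be skipped.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM docs WHERE fts @@ to_tsquery('english', 'postgres');
```

Comparing the buffer counts reported with and without a preceding VACUUM is one way to see how many pages the query touches.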

It’s not widely known, but PostgreSQL gathers statistics for indexed expressions. See the following example.

CREATE TABLE test AS (SELECT random() x, random() y FROM generate_series(1,1000000));
ANALYZE test;

EXPLAIN ANALYZE SELECT * FROM test WHERE x + y < 0.01;
                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..20406.00 rows=333333 width=16) (actual time=1.671..113.693 rows=56 loops=1)
   Filter: ((x + y) < '0.01'::double precision)
   Rows Removed by Filter: 999944

We created a table with two columns, x and y, whose values are independently and uniformly distributed from 0 to 1. Even though we analyzed the table, the PostgreSQL optimizer estimates the selectivity of the x + y < 0.01 qual as 1/3. You can see that this estimate is not even close to reality: we actually selected 56 rows instead of the 333333 rows estimated. The estimate comes from a rough assumption that the < operator selects 1/3 of rows unless something more precise is known. Of course, the planner could do something better in this case. For instance, it could try to calculate a histogram for x + y from the separate histograms for x and y. However, the PostgreSQL optimizer doesn’t perform such costly and complex computations for now.
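
As a sanity check on the observed row count, the true selectivity can be worked out by hand. For independent x and y uniform on [0, 1] and a threshold t ≤ 1, the region x + y < t is a right triangle with legs of length t, so:

```latex
\[
P(x + y < t) = \frac{t^2}{2},
\qquad
P(x + y < 0.01) = \frac{0.01^2}{2} = 5 \times 10^{-5}
\]
\[
10^{6} \cdot 5 \times 10^{-5} = 50 \ \text{expected rows}
\]
```

That is in line with the 56 rows actually returned, and several thousand times smaller than the default estimate of 333333.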

The situation changes once we define an index on x + y.
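
A minimal sketch of that step, reusing the test table from above. The index name is an assumption made for illustration. ANALYZE has to be run again so that statistics for the indexed expression are collected.

```sql
-- Expression index; note the extra parentheses around the expression.
CREATE INDEX test_x_plus_y_idx ON test ((x + y));

-- Re-analyze so that statistics on (x + y) are gathered.
ANALYZE test;

-- Statistics for indexed expressions are listed under the index name.
SELECT tablename, attname, n_distinct
FROM pg_stats
WHERE tablename = 'test_x_plus_y_idx';

-- Re-run the query and compare the estimated row count with the actual one.
EXPLAIN ANALYZE SELECT * FROM test WHERE x + y < 0.01;
```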

Today I gave a talk “Our answer to Uber” at United Dev Conf, Minsk. The slides can be found on SlideShare. In my talk I attempted to review Uber’s notes and summarize community efforts to overcome the highlighted shortcomings.

United Dev Conf is quite a big IT conference with more than 700 attendees. I’d like to note that interest in PostgreSQL is quite high. The room was almost full during my talk. Also, after the talk I spent about an hour answering questions from the people around me.

I think that Minsk is a very attractive place for IT events. It has everything required: lovely venues, good and inexpensive hotels, and developed infrastructure. Additionally, Belarus introduced 5-day visa-free travel for 80 countries, which made conference attendance much easier for many people. It would be nice to have PGDay.Minsk one day.

A new major release of PostgreSQL is approaching. PostgreSQL 9.6 is expected to be released later today. This is a great release which provides users with a set of outstanding new features. I’m especially happy that Postgres Professional made a substantial contribution to this release. In particular, full-text search for phrases and scalability improvements are listed as major enhancements of this new PostgreSQL release.

The full list of Postgres Professional contributions includes:

Faceted search is a very popular buzzword nowadays. In short, the specialty of faceted search is that its results are organized per category. Popular search engines are getting special support for faceted search.

Let’s see what PostgreSQL can do in this field. First, let’s formalize our task. For each category which has matching documents we want to obtain:

  • Total number of matching documents;
  • TOP N matching documents.

Of course, it’s possible to query such data using multiple per-category SQL queries. But we’ll do it in a single SQL query, which would also be faster in the majority of cases. The query below implements faceted search over the PostgreSQL mailing list archives using window functions and a CTE. The window functions are essential, while the CTE is used for better query readability.
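
A minimal sketch of this technique, assuming a hypothetical table messages(id, list, subject, fts) where list is the category and fts is a tsvector column. The table, its columns, the search terms and the choice of N = 5 are all illustrative assumptions.

```sql
WITH matched AS (
    -- All documents matching the full-text query, with a relevance score.
    SELECT m.list, m.id, m.subject,
           ts_rank(m.fts, q) AS rank
    FROM messages AS m,
         to_tsquery('english', 'index & scan') AS q
    WHERE m.fts @@ q
),
ranked AS (
    SELECT list, id, subject, rank,
           -- Total number of matching documents in the category.
           count(*) OVER (PARTITION BY list) AS total,
           -- Position of the document within its category.
           row_number() OVER (PARTITION BY list ORDER BY rank DESC) AS pos
    FROM matched
)
SELECT list, total, id, subject
FROM ranked
WHERE pos <= 5          -- TOP N per category, here N = 5
ORDER BY list, pos;
```

Both window functions share the same partitioning, so the per-category total and the per-category ranking come out of a single pass over the matched rows.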

When dealing with partitioned tables, we can’t always select the relevant partitions during query planning. Naturally, during query planning you can’t know values which come from a subquery or from the outer part of a nested loop join. Nevertheless, it would be ridiculous to scan all the partitions in such cases.

This is why my Postgres Professional colleague Dmitry Ivanov developed a new custom executor node for pg_pathman: RuntimeAppend. This node behaves like the regular Append node: it contains a set of child nodes which should be appended. However, RuntimeAppend has one distinction: on each run it selects only the relevant children to append, based on parameter values.

People who actively work with psql frequently want to draw a graph for the table they’re currently looking at. Typically, that means a cycle of actions: exporting the data, importing it into a graph drawing tool, and drawing the graph itself. It turns out that this process can be automated: a graph can be drawn by typing a single command directly in psql. See an example in the screenshot below.

PostgreSQL scalability on multicore and multisocket machines became a subject of optimization long ago, once such machines became widely used. This blog post shows a brief history of vertical scalability improvements between versions 8.0 and 8.4. PostgreSQL 9.2 had a very noticeable scalability improvement. Thanks to fast path locking and other optimizations, it became possible to achieve more than 350 000 TPS in the select-only pgbench test. The latest stable release, PostgreSQL 9.5, also contains significant scalability advancements, including an LWLock improvement which allows achieving about 400 000 TPS in the select-only pgbench test.

Postgres Professional also became involved in scalability optimization. In partnership with IBM, we researched PostgreSQL scalability on modern Power8 servers. The results of this research were published in the popular Russian blog habrahabr (Google translated version). As a brief result of this research, we identified two ways to improve PostgreSQL scalability on Power8:

  1. Implement Pin/UnpinBuffer() using CAS operations instead of buffer header spinlock;
  2. Optimize LWLockAttemptLock() in assembly to make fewer loops for changing lwlock state.

Optimization #1 appears to give a huge benefit on big Intel servers as well, while optimization #2 is Power-specific. After long rounds of optimization, cleanup and testing, #1 was finally committed by Andres Freund.

PostgreSQL 9.6 receives proper support for extensible index access methods. And that’s good news, because Postgres was initially designed to support them.

“It is imperative that a user be able to construct new access methods to provide efficient access to instances of nontraditional base types”

Michael Stonebraker, Jeff Anton, Michael Hirohama. Extendability in POSTGRES , IEEE Data Eng. Bull. 10 (2) pp.16-23, 1987

That was a huge piece of work consisting of multiple steps.
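
From the user’s side, the result can be sketched as follows, assuming PostgreSQL 9.6 or later with the contrib modules installed. The table am_demo and the index name are hypothetical; bloom is the contrib access method shipped as an extension.

```sql
-- Index access methods known to the server, with their handler functions.
SELECT amname, amhandler, amtype FROM pg_am;

-- An access method provided by an extension rather than by the core.
CREATE EXTENSION IF NOT EXISTS bloom;

-- Hypothetical demo table with two integer columns.
CREATE TABLE am_demo AS
SELECT i AS a, i % 100 AS b
FROM generate_series(1, 10000) AS i;

-- The extension-provided access method is used like a built-in one.
CREATE INDEX am_demo_bloom_idx ON am_demo USING bloom (a, b);
```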

Recently Robert Haas committed a patch which allows seeing more detailed information about the current wait event of a process. In particular, the user will be able to see if a process is waiting for a heavyweight lock, a lightweight lock (either individual or tranche) or a buffer pin. The full list of wait events is available in the documentation. Hopefully, there will be more wait events in future releases.
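
A minimal sketch of how this information can be inspected, assuming PostgreSQL 9.6 or later, where pg_stat_activity exposes the wait_event_type and wait_event columns:

```sql
-- Backends that are waiting right now, and what they are waiting for.
SELECT pid, wait_event_type, wait_event, state, query
FROM pg_stat_activity
WHERE wait_event IS NOT NULL;

-- The same snapshot aggregated by wait event.
SELECT wait_event_type, wait_event, count(*) AS backends
FROM pg_stat_activity
WHERE wait_event IS NOT NULL
GROUP BY wait_event_type, wait_event
ORDER BY backends DESC;
```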

It’s nice to see the current wait event of a process, but a single snapshot is not very descriptive and definitely not enough to draw any conclusions. However, we can use sampling to collect suitable statistics. This is why I’d like to present pg_wait_sampling, which automates gathering sampling statistics of wait events. pg_wait_sampling enables you to gather statistics for graphs like the one below.