<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title><![CDATA[Category: Pg | Alexander Korotkov's blog]]></title>
  <link href="https://akorotkov.github.io/blog/categories/pg/atom.xml" rel="self"/>
  <link href="https://akorotkov.github.io/"/>
  <updated>2021-05-28T22:52:41+03:00</updated>
  <id>https://akorotkov.github.io/</id>
  <author>
    <name><![CDATA[Alexander Korotkov]]></name>
    
  </author>
  <generator uri="http://octopress.org/">Octopress</generator>

  
  <entry>
    <title type="html"><![CDATA[PostgreSQL 14: Substantial Change to Fulltext Query Parsing]]></title>
    <link href="https://akorotkov.github.io/blog/2021/05/22/pg-14-query-parsing/"/>
    <updated>2021-05-22T14:12:00+03:00</updated>
    <id>https://akorotkov.github.io/blog/2021/05/22/pg-14-query-parsing</id>
    <content type="html"><![CDATA[<p><img class="no-border 2x" src="/images/fts.png" width="490" height="235"></p>

<p>Long story short: starting with PostgreSQL 14, <code>to_tsquery('pg_class')</code> compiles into
<code>'pg' &lt;-&gt; 'class'</code> instead of <code>'pg' &amp; 'class'</code>
(<a href="http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=0c4f355c6a">commit 0c4f355c6a</a>).  For instance,
in PostgreSQL 13 and earlier, <code>to_tsquery('pg_class')</code> matches
<code>to_tsvector('a class of pg')</code>.  Since PostgreSQL 14, it no longer does, but it
still matches <code>to_tsvector('pg_class')</code> and <code>to_tsvector('pg*class')</code>.
This is an incompatible change that affects full-text search users, but we had to make it
in order to fix design problems in phrase search.</p>

<p>The story started with a
<a href="https://www.postgresql.org/message-id/16592-70b110ff9731c07d@postgresql.org">bug</a>
report: <code>to_tsvector('pg_class pg')</code> didn’t match
<code>websearch_to_tsquery('"pg_class pg"')</code>.</p>

<p><figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
</pre></td><td class='code'><pre><code class='sql'><span class='line'><span class="o">#</span> <span class="k">select</span> <span class="n">to_tsvector</span><span class="p">(</span><span class="s1">&#39;pg_class pg&#39;</span><span class="p">)</span> <span class="o">@@</span>
</span><span class='line'>         <span class="n">websearch_to_tsquery</span><span class="p">(</span><span class="s1">&#39;"pg_class pg"&#39;</span><span class="p">);</span>
</span><span class='line'> <span class="o">?</span><span class="k">column</span><span class="o">?</span>
</span><span class='line'>----------
</span><span class='line'> <span class="n">f</span>
</span></code></pre></td></tr></table></div></figure></p>

<p>Looks strange!  When you search for some text in quotes, you naturally expect it
to match at least the very same text in the document.
But it doesn’t. My first thought was that it’s just a bug in the <code>websearch_to_tsquery()</code>
function, but <code>to_tsquery()</code> turned out to have the same problem:
<code>to_tsquery('pg_class &lt;-&gt; pg')</code> doesn’t match <code>to_tsvector('pg_class pg')</code>
either.</p>

<p><figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
</pre></td><td class='code'><pre><code class='sql'><span class='line'><span class="o">#</span> <span class="k">select</span> <span class="n">to_tsvector</span><span class="p">(</span><span class="s1">&#39;pg_class pg&#39;</span><span class="p">)</span> <span class="o">@@</span>
</span><span class='line'>         <span class="n">to_tsquery</span><span class="p">(</span><span class="s1">&#39;pg_class &lt;-&gt; pg&#39;</span><span class="p">);</span>
</span><span class='line'> <span class="o">?</span><span class="k">column</span><span class="o">?</span>
</span><span class='line'>----------
</span><span class='line'> <span class="n">f</span>
</span></code></pre></td></tr></table></div></figure></p>

<p>I was surprised that such basic things don’t work, even though phrase search
arrived many years ago.</p>

<!--more-->

<p>Looking under the hood, both <code>websearch_to_tsquery('"pg_class pg"')</code> and
<code>to_tsquery('pg_class &lt;-&gt; pg')</code> compile into <code>( 'pg' &amp; 'class' ) &lt;-&gt; 'pg'</code>.</p>

<p><figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
</pre></td><td class='code'><pre><code class='sql'><span class='line'><span class="o">#</span> <span class="k">select</span> <span class="n">websearch_to_tsquery</span><span class="p">(</span><span class="s1">&#39;"pg_class pg"&#39;</span><span class="p">),</span>
</span><span class='line'>         <span class="n">to_tsquery</span><span class="p">(</span><span class="s1">&#39;pg_class &lt;-&gt; pg&#39;</span><span class="p">);</span>
</span><span class='line'>    websearch_to_tsquery     |         to_tsquery
</span><span class='line'>-----------------------------+-----------------------------
</span><span class='line'> ( 'pg' &amp; 'class' ) &lt;-&gt; 'pg' | ( 'pg' &amp; 'class' ) &lt;-&gt; 'pg'
</span></code></pre></td></tr></table></div></figure></p>

<p>This tsquery requires both <code>pg</code> and <code>class</code> to be located one position to the left of another
<code>pg</code>.  That means <code>pg</code> and <code>class</code> must reside at the same position.
In principle, that’s possible: for instance, when a full-text dictionary splits a single word
into two synonyms.  But that’s not our case.  When we parse the text
<code>pg_class pg</code>, each word gets its own sequential position, so no two of them
share a position.</p>

<p><figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
</pre></td><td class='code'><pre><code class='sql'><span class='line'><span class="o">#</span> <span class="k">select</span> <span class="n">to_tsvector</span><span class="p">(</span><span class="s1">&#39;pg_class pg&#39;</span><span class="p">);</span>
</span><span class='line'>    to_tsvector
</span><span class='line'>--------------------
</span><span class='line'> 'class':2 'pg':1,3
</span><span class='line'><span class="p">(</span><span class="mi">1</span> <span class="k">row</span><span class="p">)</span>
</span></code></pre></td></tr></table></div></figure></p>
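<p>To make the position argument concrete, here is a toy model in Python. It is only an illustration, not PostgreSQL’s actual matching code. It assumes a tsvector is a mapping from lexeme to a set of positions, treats <code>&amp;</code> inside a phrase as “same position” and <code>&lt;-&gt;</code> as “next position”, and the helper names <code>positions()</code> and <code>matches()</code> are hypothetical.</p>

```python
# Toy model of tsvector/tsquery matching (hypothetical helpers, not PostgreSQL code).
# A tsvector is modeled as {lexeme: set of 1-based positions}.
# A query node is either a lexeme string or a tuple (op, left, right),
# where op is "&" (AND) or "<->" (followed by, distance 1).


def positions(vec, node):
    """Positions where `node` matches, as evaluated inside a phrase operator."""
    if isinstance(node, str):
        return set(vec.get(node, set()))
    op, left, right = node
    lpos = positions(vec, left)
    rpos = positions(vec, right)
    if op == "&":
        # Inside a phrase, AND requires both operands at the same position.
        return lpos.intersection(rpos)
    if op == "<->":
        # The right operand must immediately follow the left one;
        # report the position of the right operand.
        return {p for p in rpos if p - 1 in lpos}
    raise ValueError("unsupported operator: " + op)


def matches(vec, node):
    """Top-level match: AND is a plain boolean, a phrase needs some matching position."""
    if isinstance(node, tuple) and node[0] == "&":
        return matches(vec, node[1]) and matches(vec, node[2])
    return bool(positions(vec, node))


# to_tsvector('pg_class pg') gives 'class':2 'pg':1,3
doc = {"pg": {1, 3}, "class": {2}}

old_query = ("<->", ("&", "pg", "class"), "pg")    # ( 'pg' & 'class' ) <-> 'pg'
new_query = ("<->", ("<->", "pg", "class"), "pg")  # 'pg' <-> 'class' <-> 'pg'

print(matches(doc, old_query))  # False: pg and class never share a position
print(matches(doc, new_query))  # True: positions 1, 2, 3 are consecutive
```

<p>In this model, the first result corresponds to the <code>f</code> that psql returns for the old compilation, because the intersection of <code>{1, 3}</code> and <code>{2}</code> is empty.</p>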

<p>Why does tsquery parsing work this way?  Historically, PostgreSQL full-text search
compiles <code>to_tsquery('pg_class')</code> into <code>'pg' &amp; 'class'</code>.  Therefore, <code>pg</code> and
<code>class</code> don’t have to appear together.  Before phrase search existed, that was the
only way to process this query once <code>pg_class</code> is split into <code>pg</code> and
<code>class</code>.  Thus, querying compound words was somewhat relaxed.  But combined
with phrase search, the same compilation becomes unreasonably strict.</p>

<p>My original intention was to compile <code>pg_class</code> differently depending
on the context: next to a phrase search operator, <code>pg_class</code> would become
<code>'pg' &lt;-&gt; 'class'</code>, and <code>'pg' &amp; 'class'</code> in all other cases.  But this
approach required invasive refactoring of tsquery processing, which would take more time than
I could spend on this bug.</p>

<p>Fortunately, <a href="https://www.postgresql.org/message-id/10026.1609953512%40sss.pgh.pa.us">Tom Lane came up with a proposal</a>
to always compile <code>pg_class</code> into <code>'pg' &lt;-&gt; 'class'</code>.  Thus, now both
<code>websearch_to_tsquery('"pg_class pg"')</code> and <code>to_tsquery('pg_class &lt;-&gt; pg')</code>
compile into <code>'pg' &lt;-&gt; 'class' &lt;-&gt; 'pg'</code>.  And both of them match
<code>to_tsvector('pg_class pg')</code>.  That is a win!</p>

<p><figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
</pre></td><td class='code'><pre><code class='sql'><span class='line'><span class="o">#</span> <span class="k">select</span> <span class="n">websearch_to_tsquery</span><span class="p">(</span><span class="s1">&#39;"pg_class pg"&#39;</span><span class="p">),</span>
</span><span class='line'>         <span class="n">to_tsquery</span><span class="p">(</span><span class="s1">&#39;pg_class &lt;-&gt; pg&#39;</span><span class="p">);</span>
</span><span class='line'>   websearch_to_tsquery    │        to_tsquery
</span><span class='line'>───────────────────────────┼───────────────────────────
</span><span class='line'> 'pg' &lt;-&gt; 'class' &lt;-&gt; 'pg' │ 'pg' &lt;-&gt; 'class' &lt;-&gt; 'pg'
</span></code></pre></td></tr></table></div></figure></p>

<p><figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
</pre></td><td class='code'><pre><code class='sql'><span class='line'><span class="o">#</span> <span class="k">select</span> <span class="n">to_tsvector</span><span class="p">(</span><span class="s1">&#39;pg_class pg&#39;</span><span class="p">)</span> <span class="o">@@</span> <span class="n">websearch_to_tsquery</span><span class="p">(</span><span class="s1">&#39;"pg_class pg"&#39;</span><span class="p">),</span>
</span><span class='line'>         <span class="n">to_tsvector</span><span class="p">(</span><span class="s1">&#39;pg_class pg&#39;</span><span class="p">)</span> <span class="o">@@</span> <span class="n">to_tsquery</span><span class="p">(</span><span class="s1">&#39;pg_class &lt;-&gt; pg&#39;</span><span class="p">);</span>
</span><span class='line'> ?column? │ ?column?
</span><span class='line'>──────────┼──────────
</span><span class='line'> t        │ t
</span></code></pre></td></tr></table></div></figure></p>

<p>This approach makes
all queries involving compound words stricter. But, first, it appears to be
the only easy way to fix this design bug. Second, it is probably a better
way to handle compound words anyway.</p>
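<p>The stricter semantics can be sketched with another toy Python model, an assumption-laden illustration rather than PostgreSQL code. The hypothetical <code>split_compound()</code> only imitates how the parser breaks <code>pg_class</code> into <code>pg</code> and <code>class</code>, and the tsvector positions are assumed: <code>to_tsvector('a class of pg')</code> is taken to yield <code>'class':2 'pg':4</code>. The hypothetical <code>match_and()</code> models the old <code>'pg' &amp; 'class'</code>, while <code>match_phrase()</code> models the new <code>'pg' &lt;-&gt; 'class'</code>.</p>

```python
import re

# Toy model (assumption): split a compound word roughly the way the parser splits
# pg_class, then match the parts with AND (PostgreSQL 13 and earlier) or as a
# phrase (PostgreSQL 14). Real parsing goes through the text search parser and
# dictionaries; this only mirrors the resulting query shape.


def split_compound(word):
    """Hypothetical helper: split on any run of non-alphanumeric characters."""
    return [part for part in re.split(r"[^A-Za-z0-9]+", word) if part]


def match_and(vec, parts):
    """Old behavior, 'pg' & 'class': every part present anywhere in the document."""
    return all(part in vec for part in parts)


def match_phrase(vec, parts):
    """New behavior, 'pg' <-> 'class': parts must occupy consecutive positions."""
    return any(
        all(start + i in vec.get(part, set()) for i, part in enumerate(parts))
        for start in vec.get(parts[0], set())
    )


# Assumed tsvector positions, modeled as {lexeme: set of positions}.
docs = {
    "a class of pg": {"class": {2}, "pg": {4}},
    "pg_class": {"pg": {1}, "class": {2}},
    "pg*class": {"pg": {1}, "class": {2}},
}

parts = split_compound("pg_class")  # ['pg', 'class']
for text, vec in docs.items():
    print(text, match_and(vec, parts), match_phrase(vec, parts))
```

<p>This prints <code>a class of pg True False</code>, <code>pg_class True True</code>, and <code>pg*class True True</code>: only the document where the parts are adjacent keeps matching once the compound word becomes a phrase.</p>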

<p>As far as I can see, this approach is the right one.  Thanks to it, yet another
<a href="https://www.postgresql.org/message-id/CA%2B0DEqiZs7gdOd4ikmg%3D0UWG%2BSwWOLxPsk_JW-sx9WNOyrb0KQ%40mail.gmail.com">phrase search bug</a>
turned out to be quite <a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=eb086056fec44516efdd5db71244a079fed65c7f">easy to fix</a>.</p>

<p>Happy phrase searching in PostgreSQL 14!  Hopefully, we’ll manage without further
incompatible changes :)</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Rainbow Your Psql Output]]></title>
    <link href="https://akorotkov.github.io/blog/2021/05/17/rainbow-psql-output/"/>
    <updated>2021-05-17T23:30:00+03:00</updated>
    <id>https://akorotkov.github.io/blog/2021/05/17/rainbow-psql-output</id>
    <content type="html"><![CDATA[<p><img class="no-border 2x" src="/images/rainbow-psql.png" width="850" height="655">
It seems like a good idea to change the grey psql output into a lovely rainbow in honor
of <a href="https://en.wikipedia.org/wiki/International_Day_Against_Homophobia,_Transphobia_and_Biphobia">IDAHOT</a> day.
Thankfully, there is the <a href="https://github.com/busyloop/lolcat">lolcat</a> utility,
which is very easy to install on Linux and Mac OS.</p>

<p>Linux
<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'><span class="nv">$ </span>sudo snap install lolcat
</span></code></pre></td></tr></table></div></figure>
Mac OS
<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'><span class="nv">$ </span>brew install lolcat
</span></code></pre></td></tr></table></div></figure></p>

<p>Once lolcat is installed, you can set it up as the psql pager and get lovely rainbow
psql output!</p>

<p><figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
</pre></td><td class='code'><pre><code class='sql'><span class='line'><span class="err">\</span><span class="n">pset</span> <span class="n">pager</span> <span class="n">always</span>
</span><span class='line'><span class="err">\</span><span class="n">setenv</span> <span class="n">PAGER</span> <span class="s1">&#39;lolcat -f | less -iMSx4R -FX&#39;</span>
</span></code></pre></td></tr></table></div></figure></p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Jsonpath: ** Operator and Lax Mode Aren't Meant to Be Together.]]></title>
    <link href="https://akorotkov.github.io/blog/2021/05/06/jsonpath-double-asterisk-lax/"/>
    <updated>2021-05-06T18:10:00+03:00</updated>
    <id>https://akorotkov.github.io/blog/2021/05/06/jsonpath-double-asterisk-lax</id>
    <content type="html"><![CDATA[<p><img class="no-border 2x" src="/images/double_asterisk_lax.png" width="529" height="221"></p>

<p>PostgreSQL has an extension to jsonpath: the <code>**</code> operator, which explores
the document to arbitrary depth, finding your values everywhere.  At the same time, there is
the <code>lax</code> mode, defined by the standard, which provides a “relaxed” way of working
with json.  In <code>lax</code> mode, accessors automatically unwrap arrays, missing
keys don’t trigger errors, and so on.  In short, it appears that the <code>**</code> operator
and <code>lax</code> mode aren’t designed to be used together :)</p>

<!--more-->

<p>The story started with <a href="https://www.postgresql.org/message-id/16828-2b0229babfad2d8c%40postgresql.org">the bug report</a>.
A simplified version is below.  The jsonpath query is intended to select the
value of the key <code>"x"</code> everywhere, but it selects each of these values twice.</p>

<p><figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
</pre></td><td class='code'><pre><code class='sql'><span class='line'><span class="o">#</span> <span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">jsonb_path_query</span><span class="p">(</span><span class="s1">&#39;[{"x": "a", "y": [{"x":"b"}]}]&#39;</span><span class="p">::</span><span class="n">jsonb</span><span class="p">,</span>
</span><span class='line'>                                 <span class="s1">&#39;$.**.x&#39;</span><span class="p">);</span>
</span><span class='line'> jsonb_path_query
</span><span class='line'>------------------
</span><span class='line'> "a"
</span><span class='line'> "a"
</span><span class='line'> "b"
</span><span class='line'> "b"
</span><span class='line'><span class="p">(</span><span class="mi">4</span> <span class="k">rows</span><span class="p">)</span>
</span></code></pre></td></tr></table></div></figure></p>

<p>This case looks like a bug. But is it? Let’s dig into the details by splitting
the jsonpath query into two parts: one containing the <code>**</code> operator and another
containing the key accessor.</p>

<p><figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
</pre></td><td class='code'><pre><code class='sql'><span class='line'><span class="o">#</span> <span class="k">SELECT</span> <span class="n">var</span><span class="p">,</span>
</span><span class='line'>         <span class="n">jsonb_path_query_array</span><span class="p">(</span><span class="n">var</span><span class="p">,</span> <span class="s1">&#39;$.x&#39;</span><span class="p">)</span> <span class="n">key_x</span>
</span><span class='line'>  <span class="k">FROM</span> <span class="n">jsonb_path_query</span><span class="p">(</span><span class="s1">&#39;[{"x": "a", "y": [{"x":"b"}]}]&#39;</span><span class="p">::</span><span class="n">jsonb</span><span class="p">,</span>
</span><span class='line'>                        <span class="s1">&#39;$.**&#39;</span><span class="p">)</span> <span class="n">var</span><span class="p">;</span>
</span><span class='line'>               var               | key_x
</span><span class='line'>---------------------------------+-------
</span><span class='line'> [{"x": "a", "y": [{"x": "b"}]}] | ["a"]
</span><span class='line'> {"x": "a", "y": [{"x": "b"}]}   | ["a"]
</span><span class='line'> "a"                             | []
</span><span class='line'> [{"x": "b"}]                    | ["b"]
</span><span class='line'> {"x": "b"}                      | ["b"]
</span><span class='line'> "b"                             | []
</span><span class='line'><span class="p">(</span><span class="mi">6</span> <span class="k">rows</span><span class="p">)</span>
</span></code></pre></td></tr></table></div></figure></p>

<p>As you can see, the <code>**</code> operator selects every item in the json document,
starting with the document itself, as expected. The key accessor extracts the corresponding
values both from the objects themselves and from their wrapping arrays, which is also expected
in <code>lax</code> mode. So, there is no bug: everything works as designed, although the result
is surprising for users.</p>
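<p>The mechanism can be reproduced with a toy Python model, a sketch rather than PostgreSQL’s jsonpath engine. The hypothetical <code>descendants()</code> mimics <code>.**</code> by yielding the value itself and then every nested item depth first, and the hypothetical <code>member()</code> mimics <code>.x</code>: in lax mode it unwraps one level of array, while in strict mode this toy simply skips anything that isn’t an object.</p>

```python
import json

# Toy model of `$.**.x` over a parsed JSON value (not PostgreSQL's jsonpath engine).


def descendants(value):
    """Like `.**`: yield the value itself, then every nested item, depth first."""
    yield value
    if isinstance(value, dict):
        for child in value.values():
            yield from descendants(child)
    elif isinstance(value, list):
        for child in value:
            yield from descendants(child)


def member(value, key, lax):
    """Like `.key`: lax mode unwraps one level of array; non-objects yield nothing."""
    if isinstance(value, dict):
        return [value[key]] if key in value else []
    if lax and isinstance(value, list):
        return [item[key] for item in value if isinstance(item, dict) and key in item]
    return []


doc = json.loads('[{"x": "a", "y": [{"x": "b"}]}]')

lax_result = [v for item in descendants(doc) for v in member(item, "x", lax=True)]
strict_result = [v for item in descendants(doc) for v in member(item, "x", lax=False)]
print(lax_result)     # ['a', 'a', 'b', 'b']
print(strict_result)  # ['a', 'b']
```

<p>The lax variant returns each value twice, once from the object and once from its wrapping array, while the strict variant corresponds to the strict-mode output shown in this post.</p>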

<p>Finally, I’ve <a href="http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=b41645460a">committed a paragraph</a> to the
<a href="https://www.postgresql.org/docs/devel/functions-json.html#STRICT-AND-LAX-MODES">docs</a>,
which explicitly clarifies this issue.
It seems that <code>lax</code> mode and the <code>**</code> operator just aren’t designed to be
used together.  If you need the <code>**</code> operator, use <code>strict</code> mode, and
everything works intuitively.</p>

<p><figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
</pre></td><td class='code'><pre><code class='sql'><span class='line'><span class="o">#</span> <span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">jsonb_path_query</span><span class="p">(</span><span class="s1">&#39;[{"x": "a", "y": [{"x":"b"}]}]&#39;</span><span class="p">::</span><span class="n">jsonb</span><span class="p">,</span>
</span><span class='line'>                                 <span class="s1">&#39;strict $.**.x&#39;</span><span class="p">);</span>
</span><span class='line'> jsonb_path_query
</span><span class='line'>------------------
</span><span class='line'> "a"
</span><span class='line'> "b"
</span><span class='line'><span class="p">(</span><span class="mi">2</span> <span class="k">rows</span><span class="p">)</span>
</span></code></pre></td></tr></table></div></figure></p>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Dramatic Effect of LSE Instructions for PostgreSQL on Graviton2 Instances]]></title>
    <link href="https://akorotkov.github.io/blog/2021/04/30/arm/"/>
    <updated>2021-04-30T03:10:00+03:00</updated>
    <id>https://akorotkov.github.io/blog/2021/04/30/arm</id>
    <content type="html"><![CDATA[<p>The world changes. ARM architecture is breaking into new areas of computing. Just
a decade ago, only your mobile phone, router, or other specialized device could be
ARM-based, while your desktop and server were typically x86-based. Nowadays,
your new MacBook is ARM-based, and your EC2 instance could be ARM as well.</p>

<p>In mid-2020, Amazon made Graviton2 instances publicly available. They offer up to
64 CPU cores, which is where checking PostgreSQL scalability becomes
interesting. It’s especially exciting because ARM traditionally implements atomic
operations using a pair of load/store instructions. So, in a sense,
ARM is just like Power, where
<a href="https://www.postgresql.org/message-id/CAPpHfdsKrh7c7P8-5eG-qW3VQobybbwqH%3DgL5Ck%2BdOES-gBbFg%40mail.gmail.com">I’ve previously seen</a>
a significant effect of platform-specific atomics optimizations.</p>

<p>On the other hand, ARMv8.1 defines a set of LSE (Large System Extensions) instructions,
which, in particular, provide a way to implement an atomic operation in a single
instruction (just like x86). Which would be better: a special optimization
that puts custom logic between the load and store instructions, or just a simple
loop of LSE CAS instructions? I’ve tried both.</p>

<p>The graphs below show the results of read-only and read-write pgbench runs
(details on the experiments are <a href="https://www.postgresql.org/message-id/CAPpHfdsGqVd6EJ4mr_RZVE5xSiCNBy4MuSvdTrKmTpM0eyWGpg%40mail.gmail.com">here</a>).
<code>pg14-devel-lwlock-ldrex-strex</code> is PostgreSQL patched with a special
load/store optimization for LWLock; <code>pg14-devel-lse</code> is PostgreSQL compiled
with LSE support enabled.</p>

<p><img class="no-border 2x" src="/images/arm-ro.png" width="720" height="432"></p>

<p><img class="no-border 2x" src="/images/arm-rw.png" width="720" height="432"></p>

<p>You can see that the load/store optimization gives a substantial positive effect, but
LSE rocks here!</p>

<p>So, if you’re running PostgreSQL on a Graviton2 instance, make sure your
binaries are compiled with LSE support (see <a href="https://github.com/aws/aws-graviton-getting-started/blob/master/c-c++.md">the instructions</a>),
because the effect is dramatic.</p>
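<p>One rough way to see how an existing server was built is the <code>pg_config</code>
system view, as in the sketch below.  This is only a heuristic of my own, not a method from the
linked guide.  It looks for compiler flags that usually mean LSE atomics may be used, such as
<code>-march=armv8.1-a</code> or later, <code>+lse</code>, or <code>-moutline-atomics</code>.
A compiler that enables LSE by default, or a flag like <code>-mcpu=neoverse-n1</code>, would not
be detected, so a <code>false</code> result doesn’t prove that LSE is unused.  By default, only
superusers can read <code>pg_config</code>.</p>

<p><figure class='code'><div class="highlight"><pre><code class='sql'>-- Heuristic sketch: inspect the CFLAGS PostgreSQL was built with.
-- The regex is an assumption about which flags imply LSE support.
SELECT setting AS cflags,
       setting ~ '(-march=armv8\.[1-9]|\+lse|-moutline-atomics)' AS lse_flags_found
FROM pg_config
WHERE name = 'CFLAGS';
</code></pre></div></figure></p>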

<p>BTW, it appears that <a href="https://www.postgresql.org/message-id/1367116.1606802480%40sss.pgh.pa.us">none of these optimizations have a noticeable effect on the performance of Apple M1</a>.
Probably, M1 has an internal optimizer smart enough to recognize these different
implementations as equivalent.  It was also surprising that LSE usage might
give <a href="https://www.postgresql.org/message-id/CAB10pyYgh%2BKM4rY6XYbj3NNHkUQVV9UNpqaVmb9_fLbsUW%2BVyg%40mail.gmail.com">a small negative effect on Kunpeng 920</a>.
It was discouraging to learn of an ARM processor where a single-instruction
operation is slower than its multiple-instruction equivalent. Hopefully,
processor architects will fix this in new Kunpeng processors.</p>

<p>In general, we see that different ARM implementations now have different
performance characteristics and respond differently to optimizations. Hopefully,
these are growing pains, and they will be overcome soon.</p>

<p><strong>Update:</strong> As Krunal Bauskar pointed out in the comments, LSE instructions are still
faster than the load/store option on Kunpeng 920. The regression might be caused by
different timings. For instance, with LSE instructions, we might simply reach a
bottleneck elsewhere sooner, and that other bottleneck causes the regression.</p>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Full Text Search Done (Almost) Right in PostgreSQL 11]]></title>
    <link href="https://akorotkov.github.io/blog/2018/02/17/fulltext-search-made-almost-right/"/>
    <updated>2018-02-17T18:20:00+03:00</updated>
    <id>https://akorotkov.github.io/blog/2018/02/17/fulltext-search-made-almost-right</id>
    <content type="html"><![CDATA[<p>Long story short, using PostgreSQL 11 with <a href="https://github.com/postgrespro/rum">RUM index</a>
you can run both TOP-N queries and COUNT(*) for non-selective FTS queries without
fetching all the results from the heap (which makes them much faster).  Are you bored yet?
If not, please read the detailed description below.</p>

<p>On November 1st, 2017, Tom Lane committed a <a href="http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=7c70996e">patch</a>
enabling bitmap scans to behave like index-only scans when possible.
In particular, since PostgreSQL 11, COUNT(*) queries can be evaluated using
bitmap scans without accessing the heap when the corresponding bit in the visibility map
is set.  This patch was written by Alexander Kuzmenkov and reviewed by
Alexey Chernyshov (both are my Postgres Pro colleagues), and it was heavily
revised by Tom Lane.</p>
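<p>Because the heap can be skipped only for pages whose visibility map bit is set, it may be worth
checking how much of a table is marked all-visible.  Below is a minimal sketch using the
<code>pgmail</code> table from the examples in this post.  <code>VACUUM</code> updates the
visibility map, and <code>pg_class.relallvisible</code> is the planner’s estimate of the number of
all-visible pages, refreshed by <code>VACUUM</code> and <code>ANALYZE</code>.</p>

<p><figure class='code'><div class="highlight"><pre><code class='sql'>-- Refresh the visibility map and planner statistics.
VACUUM (ANALYZE) pgmail;

-- Share of heap pages marked all-visible (a planner estimate, not an exact count).
SELECT relpages,
       relallvisible,
       round(100.0 * relallvisible / greatest(relpages, 1), 1) AS pct_all_visible
FROM pg_class
WHERE relname = 'pgmail';
</code></pre></div></figure></p>

<p>The closer <code>pct_all_visible</code> is to 100, the fewer heap pages a bitmap-scan
COUNT(*) should have to visit.</p>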

<!--more-->

<p>This commit might seem to be just one more planner and executor optimization:
nice, but not deserving much attention.  However, on closer consideration,
this patch turns out to be a significant step toward doing full text
search in PostgreSQL the right way.</p>

<p>I started working on FTS improvements in 2012.  At that time, I realized that the GIN
index is good for selective FTS queries, when the number of matching results is low.
See the example below: GIN did a great job for us by returning just a few thousand
matching rows very fast.  The remaining operations, including relevance calculation
and sorting, are also fast because they are performed over a small row set.</p>

<p><figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
<span class='line-number'>23</span>
</pre></td><td class='code'><pre><code class='sql'><span class='line'><span class="k">EXPLAIN</span> <span class="p">(</span><span class="k">ANALYZE</span><span class="p">,</span> <span class="n">BUFFERS</span><span class="p">)</span>
</span><span class='line'><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">pgmail</span>
</span><span class='line'><span class="k">WHERE</span> <span class="n">fts</span> <span class="o">@@</span> <span class="n">plainto_tsquery</span><span class="p">(</span><span class="s1">&#39;english&#39;</span><span class="p">,</span> <span class="s1">&#39;exclusion constraint&#39;</span><span class="p">)</span>
</span><span class='line'><span class="k">ORDER</span> <span class="k">BY</span> <span class="n">ts_rank_cd</span><span class="p">(</span><span class="n">fts</span><span class="p">,</span> <span class="n">plainto_tsquery</span><span class="p">(</span><span class="s1">&#39;english&#39;</span><span class="p">,</span> <span class="s1">&#39;exclusion constraint&#39;</span><span class="p">))</span> <span class="k">DESC</span>
</span><span class='line'><span class="k">LIMIT</span> <span class="mi">10</span><span class="p">;</span>
</span><span class='line'>                                                               <span class="n">QUERY</span> <span class="n">PLAN</span>
</span><span class='line'>----------------------------------------------------------------------------------------------------------------------------------
</span><span class='line'> Limit  (cost=144.26..144.28 rows=10 width=784) (actual time=320.142..320.149 rows=10 loops=1)
</span><span class='line'>   Buffers: shared hit=7138 read=7794
</span><span class='line'>   -&gt;  Sort  (cost=144.26..144.32 rows=25 width=784) (actual time=320.141..320.147 rows=10 loops=1)
</span><span class='line'>         Sort Key: (ts_rank_cd(fts, '''exclus'' &amp; ''constraint'''::tsquery)) DESC
</span><span class='line'>         Sort Method: top-N heapsort  Memory: 38kB
</span><span class='line'>         Buffers: shared hit=7138 read=7794
</span><span class='line'>         -&gt;  Bitmap Heap Scan on pgmail  (cost=44.20..143.72 rows=25 width=784) (actual time=5.232..315.302 rows=3357 loops=1)
</span><span class='line'>               Recheck Cond: (fts @@ '''exclus'' &amp; ''constraint'''::tsquery)
</span><span class='line'>               Heap Blocks: exact=2903
</span><span class='line'>               Buffers: shared hit=7138 read=7794
</span><span class='line'>               -&gt;  Bitmap Index Scan on pgmail_fts_idx  (cost=0.00..44.19 rows=25 width=0) (actual time=3.689..3.689 rows=3357 loops=1)
</span><span class='line'>                     Index Cond: (fts @@ '''exclus'' &amp; ''constraint'''::tsquery)
</span><span class='line'>                     Buffers: shared hit=11 read=23
</span><span class='line'> Planning time: 0.176 ms
</span><span class='line'> Execution time: 320.213 ms
</span><span class='line'>(15 rows)
</span></code></pre></td></tr></table></div></figure></p>

<p>But the situation is different if the FTS query is not selective and the number of matching
rows is high.  Then we have to fetch all those rows from the heap, calculate relevance
for each of them, and sort them.  And even though we only need the TOP-10 rows, this
query takes a lot of time: about 18 seconds in the example below, almost all of it spent in
the bitmap heap scan.</p>

<p><figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
<span class='line-number'>23</span>
<span class='line-number'>24</span>
</pre></td><td class='code'><pre><code class='sql'><span class='line'><span class="k">EXPLAIN</span> <span class="p">(</span><span class="k">ANALYZE</span><span class="p">,</span> <span class="n">BUFFERS</span><span class="p">)</span>
</span><span class='line'><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">pgmail</span>
</span><span class='line'><span class="k">WHERE</span> <span class="n">fts</span> <span class="o">@@</span> <span class="n">plainto_tsquery</span><span class="p">(</span><span class="s1">&#39;english&#39;</span><span class="p">,</span> <span class="s1">&#39;Tom Lane&#39;</span><span class="p">)</span>
</span><span class='line'><span class="k">ORDER</span> <span class="k">BY</span> <span class="n">ts_rank_cd</span><span class="p">(</span><span class="n">fts</span><span class="p">,</span> <span class="n">plainto_tsquery</span><span class="p">(</span><span class="s1">&#39;english&#39;</span><span class="p">,</span> <span class="s1">&#39;Tom Lane&#39;</span><span class="p">))</span> <span class="k">DESC</span>
</span><span class='line'><span class="k">LIMIT</span> <span class="mi">10</span><span class="p">;</span>
</span><span class='line'>                                                                 <span class="n">QUERY</span> <span class="n">PLAN</span>
</span><span class='line'>------------------------------------------------------------------------------------------------------------------------------------
</span><span class='line'> Limit  (cost=144.26..144.28 rows=10 width=784) (actual time=18110.231..18110.236 rows=10 loops=1)
</span><span class='line'>   Buffers: shared hit=1358323 read=399077
</span><span class='line'>   -&gt;  Sort  (cost=144.26..144.32 rows=25 width=784) (actual time=18110.229..18110.231 rows=10 loops=1)
</span><span class='line'>         Sort Key: (ts_rank_cd(fts, '''tom'' &amp; ''lane'''::tsquery)) DESC
</span><span class='line'>         Sort Method: top-N heapsort  Memory: 44kB
</span><span class='line'>         Buffers: shared hit=1358323 read=399077
</span><span class='line'>         -&gt;  Bitmap Heap Scan on pgmail  (cost=44.20..143.72 rows=25 width=784) (actual time=70.267..17895.628 rows=224568 loops=1)
</span><span class='line'>               Recheck Cond: (fts @@ '''tom'' &amp; ''lane'''::tsquery)
</span><span class='line'>               Rows Removed by Index Recheck: 266782
</span><span class='line'>               Heap Blocks: exact=39841 lossy=79307
</span><span class='line'>               Buffers: shared hit=1358323 read=399077
</span><span class='line'>               -&gt;  Bitmap Index Scan on pgmail_fts_idx  (cost=0.00..44.19 rows=25 width=0) (actual time=63.914..63.914 rows=224568 loops=1)
</span><span class='line'>                     Index Cond: (fts @@ '''tom'' &amp; ''lane'''::tsquery)
</span><span class='line'>                     Buffers: shared hit=41 read=102
</span><span class='line'> Planning time: 0.131 ms
</span><span class='line'> Execution time: 18110.376 ms
</span><span class='line'>(16 rows)
</span></code></pre></td></tr></table></div></figure></p>

<p>How can we improve this situation?  If we could get results from the index
pre-ordered by relevance, then we would be able to evaluate a TOP-N query
without fetching the whole set of matching rows from the heap.  Unfortunately,
that appears to be impossible for the GIN index, which stores only the fact that
specific terms occur in a document.  But if the index also stored additional information
about term positions, then it might work: that information
would be enough to calculate relevance from the index alone.</p>
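<p>The difference between knowing only which terms occur and also knowing their positions can
be seen with plain <code>tsvector</code> functions.  The sample sentence below is made up for
illustration.  <code>strip()</code> removes positional information, which is roughly analogous
to what a GIN index keeps.  <code>ts_rank_cd()</code> needs positions to compute cover density,
so it returns zero for a stripped vector.</p>

<p><figure class='code'><div class="highlight"><pre><code class='sql'>-- Positions are kept: 'commit':3 'lane':2 'patch':5 'tom':1
SELECT to_tsvector('english', 'Tom Lane committed the patch');

-- strip() drops positions: 'commit' 'lane' 'patch' 'tom'
SELECT strip(to_tsvector('english', 'Tom Lane committed the patch'));

-- Cover density ranking needs positions; the stripped vector ranks as 0.
SELECT ts_rank_cd(to_tsvector('english', 'Tom Lane committed the patch'),
                  plainto_tsquery('english', 'Tom Lane')) AS with_positions,
       ts_rank_cd(strip(to_tsvector('english', 'Tom Lane committed the patch')),
                  plainto_tsquery('english', 'Tom Lane')) AS without_positions;
</code></pre></div></figure></p>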

<p><img class="no-border center 2x" src="/images/gin2rum.png" width="614" height="134"></p>

<p>Thus, I <a href="https://www.postgresql.org/message-id/CAPpHfdtSt47PpRQBK6OawHePLJk8PF-wNhswaUpre7_%2Bcc_kmA%40mail.gmail.com">proposed</a>
a set of patches to the GIN index.  Some improvements were committed, including
<a href="http://www.sai.msu.su/~megera/postgres/talks/329_PGCon2014-GIN.pdf">index compression and index search optimization</a>.  However, additional information storage for the GIN
index wasn’t committed, because it alters the GIN index structure too much.</p>

<p>Fortunately, PostgreSQL 9.6 introduced
<a href="blog/2016/04/06/extensible-access-methods/">extensible index access methods</a>.
That enables us to implement the things that weren’t
committed to GIN, and more, as a separate index access method,
<a href="https://github.com/postgrespro/rum">RUM</a>.  Using RUM, one can execute a TOP-N
FTS query much faster, without fetching all the matching rows from the heap.</p>
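<p>The index definitions behind these plans aren’t shown.  A plausible sketch could look like
the following, assuming the <code>rum</code> extension is available and <code>fts</code> is a
<code>tsvector</code> column.  The index names match the plans, but the exact definitions are
assumptions.  The <code>rum_tsvector_ops</code> operator class stores lexeme positions in the index
and supports ordering by the <code>&lt;=&gt;</code> distance operator.</p>

<p><figure class='code'><div class="highlight"><pre><code class='sql'>-- GIN index, as used by the Bitmap Index Scan plans above (assumed definition).
CREATE INDEX pgmail_fts_idx ON pgmail USING gin (fts);

-- RUM index, as used by the Index Scan plan below (assumed definition).
CREATE EXTENSION IF NOT EXISTS rum;
CREATE INDEX pgmail_idx ON pgmail USING rum (fts rum_tsvector_ops);
</code></pre></div></figure></p>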

<p><figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
</pre></td><td class='code'><pre><code class='sql'><span class='line'><span class="k">EXPLAIN</span> <span class="p">(</span><span class="k">ANALYZE</span><span class="p">,</span> <span class="n">BUFFERS</span><span class="p">)</span>
</span><span class='line'><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">pgmail</span>
</span><span class='line'><span class="k">WHERE</span> <span class="n">fts</span> <span class="o">@@</span> <span class="n">plainto_tsquery</span><span class="p">(</span><span class="s1">&#39;english&#39;</span><span class="p">,</span> <span class="s1">&#39;Tom Lane&#39;</span><span class="p">)</span>
</span><span class='line'><span class="k">ORDER</span> <span class="k">BY</span> <span class="n">fts</span> <span class="o">&lt;=&gt;</span> <span class="n">plainto_tsquery</span><span class="p">(</span><span class="s1">&#39;english&#39;</span><span class="p">,</span> <span class="s1">&#39;Tom Lane&#39;</span><span class="p">)</span>
</span><span class='line'><span class="k">LIMIT</span> <span class="mi">10</span><span class="p">;</span>
</span><span class='line'>                                                                <span class="n">QUERY</span> <span class="n">PLAN</span>
</span><span class='line'>-----------------------------------------------------------------------------------------------------------------------------------
</span><span class='line'> Limit  (cost=48.00..83.25 rows=10 width=1523) (actual time=242.974..248.366 rows=10 loops=1)
</span><span class='line'>   Buffers: shared hit=809 read=25, temp read=187 written=552
</span><span class='line'>   -&gt;  Index Scan using pgmail_idx on pgmail  (cost=48.00..193885.14 rows=54984 width=1523) (actual time=242.972..248.358 rows=10 loops=1)
</span><span class='line'>         Index Cond: (fts @@ '''tom'' &amp; ''lane'''::tsquery)
</span><span class='line'>         Order By: (fts &lt;=&gt; '''tom'' &amp; ''lane'''::tsquery)
</span><span class='line'>         Buffers: shared hit=809 read=25, temp read=187 written=552
</span><span class='line'> Planning time: 14.709 ms
</span><span class='line'> Execution time: 312.794 ms
</span><span class='line'>(8 rows)
</span></code></pre></td></tr></table></div></figure></p>

<p>However, the problem persisted if you needed the total count of matching rows.
Then the PostgreSQL executor still had to fetch all the matching rows from the
heap in order to check their visibility.  So, if you needed the total number of
resulting rows for pagination, the query could still be very slow.</p>

<p><figure class='code'><div class="highlight"><table><tr><td class='code'><pre><code class='sql'><span class='line'><span class="k">EXPLAIN</span> <span class="p">(</span><span class="k">ANALYZE</span><span class="p">,</span> <span class="n">BUFFERS</span><span class="p">)</span>
</span><span class='line'><span class="k">SELECT</span> <span class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span> <span class="k">FROM</span> <span class="n">pgmail</span>
</span><span class='line'><span class="k">WHERE</span> <span class="n">fts</span> <span class="o">@@</span> <span class="n">plainto_tsquery</span><span class="p">(</span><span class="s1">&#39;english&#39;</span><span class="p">,</span> <span class="s1">&#39;Tom Lane&#39;</span><span class="p">);</span>
</span><span class='line'>                                                              <span class="n">QUERY</span> <span class="n">PLAN</span>
</span><span class='line'><span class="c1">----------------------------------------------------------------------------------------------------------------------------------------</span>
</span><span class='line'> <span class="k">Aggregate</span>  <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">118931</span><span class="p">.</span><span class="mi">46</span><span class="p">..</span><span class="mi">118931</span><span class="p">.</span><span class="mi">47</span> <span class="k">rows</span><span class="o">=</span><span class="mi">1</span> <span class="n">width</span><span class="o">=</span><span class="mi">8</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="n">time</span><span class="o">=</span><span class="mi">36263</span><span class="p">.</span><span class="mi">708</span><span class="p">..</span><span class="mi">36263</span><span class="p">.</span><span class="mi">709</span> <span class="k">rows</span><span class="o">=</span><span class="mi">1</span> <span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
</span><span class='line'>   <span class="n">Buffers</span><span class="p">:</span> <span class="n">shared</span> <span class="n">hit</span><span class="o">=</span><span class="mi">800692</span> <span class="k">read</span><span class="o">=</span><span class="mi">348338</span>
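</span><span class='line'>   <span class="o">-&gt;</span>  <span class="n">Bitmap</span> <span class="n">Heap</span> <span class="n">Scan</span> <span class="k">on</span> <span class="n">pgmail</span>  <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">530</span><span class="p">.</span><span class="mi">19</span><span class="p">..</span><span class="mi">118799</span><span class="p">.</span><span class="mi">14</span> <span class="k">rows</span><span class="o">=</span><span class="mi">52928</span> <span class="n">width</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="n">time</span><span class="o">=</span><span class="mi">74</span><span class="p">.</span><span class="mi">724</span><span class="p">..</span><span class="mi">36195</span><span class="p">.</span><span class="mi">946</span> <span class="k">rows</span><span class="o">=</span><span class="mi">224568</span> <span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
</span><span class='line'>         <span class="k">Recheck</span> <span class="n">Cond</span><span class="p">:</span> <span class="p">(</span><span class="n">fts</span> <span class="o">@@</span> <span class="s1">&#39;&#39;&#39;tom&#39;&#39; &amp; &#39;&#39;lane&#39;&#39;&#39;</span><span class="p">::</span><span class="n">tsquery</span><span class="p">)</span>
</span><span class='line'>         <span class="n">Rows</span> <span class="n">Removed</span> <span class="k">by</span> <span class="k">Index</span> <span class="k">Recheck</span><span class="p">:</span> <span class="mi">266782</span>
</span><span class='line'>         <span class="n">Heap</span> <span class="n">Blocks</span><span class="p">:</span> <span class="n">exact</span><span class="o">=</span><span class="mi">39841</span> <span class="n">lossy</span><span class="o">=</span><span class="mi">79307</span>
</span><span class='line'>         <span class="n">Buffers</span><span class="p">:</span> <span class="n">shared</span> <span class="n">hit</span><span class="o">=</span><span class="mi">800692</span> <span class="k">read</span><span class="o">=</span><span class="mi">348338</span>
</span><span class='line'>         <span class="o">-&gt;</span>  <span class="n">Bitmap</span> <span class="k">Index</span> <span class="n">Scan</span> <span class="k">on</span> <span class="n">pgmail_fts_idx</span>  <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">00</span><span class="p">..</span><span class="mi">516</span><span class="p">.</span><span class="mi">96</span> <span class="k">rows</span><span class="o">=</span><span class="mi">52928</span> <span class="n">width</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="n">time</span><span class="o">=</span><span class="mi">67</span><span class="p">.</span><span class="mi">467</span><span class="p">..</span><span class="mi">67</span><span class="p">.</span><span class="mi">467</span> <span class="k">rows</span><span class="o">=</span><span class="mi">224568</span> <span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
</span><span class='line'>               <span class="k">Index</span> <span class="n">Cond</span><span class="p">:</span> <span class="p">(</span><span class="n">fts</span> <span class="o">@@</span> <span class="s1">&#39;&#39;&#39;tom&#39;&#39; &amp; &#39;&#39;lane&#39;&#39;&#39;</span><span class="p">::</span><span class="n">tsquery</span><span class="p">)</span>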
</span><span class='line'>               <span class="n">Buffers</span><span class="p">:</span> <span class="n">shared</span> <span class="n">hit</span><span class="o">=</span><span class="mi">41</span> <span class="k">read</span><span class="o">=</span><span class="mi">102</span>
</span><span class='line'> <span class="n">Planning</span> <span class="n">time</span><span class="p">:</span> <span class="mi">0</span><span class="p">.</span><span class="mi">210</span> <span class="n">ms</span>
</span><span class='line'> <span class="n">Execution</span> <span class="n">time</span><span class="p">:</span> <span class="mi">36263</span><span class="p">.</span><span class="mi">790</span> <span class="n">ms</span>
</span><span class='line'><span class="p">(</span><span class="mi">12</span> <span class="k">rows</span><span class="p">)</span>
</span></code></pre></td></tr></table></div></figure></p>

<p>Admittedly, some modern UIs use techniques like continuous scrolling, which
don’t require showing the total number of results to the user.  Also, one can
use the planner’s estimate of the number of resulting rows, which typically
matches the actual number in order of magnitude.  Nevertheless, slow counting of
the total number of results was a problem for many RUM users.</p>

<p><figure class='code'><div class="highlight"><table><tr><td class='code'><pre><code class='sql'><span class='line'><span class="k">EXPLAIN</span> <span class="p">(</span><span class="k">ANALYZE</span><span class="p">,</span> <span class="n">BUFFERS</span><span class="p">)</span>
</span><span class='line'><span class="k">SELECT</span> <span class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span> <span class="k">FROM</span> <span class="n">pgmail</span>
</span><span class='line'><span class="k">WHERE</span> <span class="n">fts</span> <span class="o">@@</span> <span class="n">plainto_tsquery</span><span class="p">(</span><span class="s1">&#39;english&#39;</span><span class="p">,</span> <span class="s1">&#39;Tom Lane&#39;</span><span class="p">);</span>
</span><span class='line'>                                                              <span class="n">QUERY</span> <span class="n">PLAN</span>
</span><span class='line'><span class="c1">----------------------------------------------------------------------------------------------------------------------------------------</span>
</span><span class='line'> <span class="k">Aggregate</span>  <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">121794</span><span class="p">.</span><span class="mi">28</span><span class="p">..</span><span class="mi">121794</span><span class="p">.</span><span class="mi">29</span> <span class="k">rows</span><span class="o">=</span><span class="mi">1</span> <span class="n">width</span><span class="o">=</span><span class="mi">8</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="n">time</span><span class="o">=</span><span class="mi">132</span><span class="p">.</span><span class="mi">336</span><span class="p">..</span><span class="mi">132</span><span class="p">.</span><span class="mi">336</span> <span class="k">rows</span><span class="o">=</span><span class="mi">1</span> <span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
</span><span class='line'>   <span class="n">Buffers</span><span class="p">:</span> <span class="n">shared</span> <span class="n">hit</span><span class="o">=</span><span class="mi">404</span>
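</span><span class='line'>   <span class="o">-&gt;</span>  <span class="n">Bitmap</span> <span class="n">Heap</span> <span class="n">Scan</span> <span class="k">on</span> <span class="n">pgmail</span>  <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">558</span><span class="p">.</span><span class="mi">13</span><span class="p">..</span><span class="mi">121656</span><span class="p">.</span><span class="mi">82</span> <span class="k">rows</span><span class="o">=</span><span class="mi">54984</span> <span class="n">width</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="n">time</span><span class="o">=</span><span class="mi">83</span><span class="p">.</span><span class="mi">676</span><span class="p">..</span><span class="mi">116</span><span class="p">.</span><span class="mi">889</span> <span class="k">rows</span><span class="o">=</span><span class="mi">224568</span> <span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
</span><span class='line'>         <span class="k">Recheck</span> <span class="n">Cond</span><span class="p">:</span> <span class="p">(</span><span class="n">fts</span> <span class="o">@@</span> <span class="s1">&#39;&#39;&#39;tom&#39;&#39; &amp; &#39;&#39;lane&#39;&#39;&#39;</span><span class="p">::</span><span class="n">tsquery</span><span class="p">)</span>
</span><span class='line'>         <span class="n">Heap</span> <span class="n">Blocks</span><span class="p">:</span> <span class="n">exact</span><span class="o">=</span><span class="mi">119148</span>
</span><span class='line'>         <span class="n">Buffers</span><span class="p">:</span> <span class="n">shared</span> <span class="n">hit</span><span class="o">=</span><span class="mi">404</span>
</span><span class='line'>         <span class="o">-&gt;</span>  <span class="n">Bitmap</span> <span class="k">Index</span> <span class="n">Scan</span> <span class="k">on</span> <span class="n">pgmail_idx</span>  <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">00</span><span class="p">..</span><span class="mi">544</span><span class="p">.</span><span class="mi">38</span> <span class="k">rows</span><span class="o">=</span><span class="mi">54984</span> <span class="n">width</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="n">time</span><span class="o">=</span><span class="mi">61</span><span class="p">.</span><span class="mi">459</span><span class="p">..</span><span class="mi">61</span><span class="p">.</span><span class="mi">459</span> <span class="k">rows</span><span class="o">=</span><span class="mi">224568</span> <span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
</span><span class='line'>               <span class="k">Index</span> <span class="n">Cond</span><span class="p">:</span> <span class="p">(</span><span class="n">fts</span> <span class="o">@@</span> <span class="s1">&#39;&#39;&#39;tom&#39;&#39; &amp; &#39;&#39;lane&#39;&#39;&#39;</span><span class="p">::</span><span class="n">tsquery</span><span class="p">)</span>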
</span><span class='line'>               <span class="n">Buffers</span><span class="p">:</span> <span class="n">shared</span> <span class="n">hit</span><span class="o">=</span><span class="mi">398</span>
</span><span class='line'> <span class="n">Planning</span> <span class="n">time</span><span class="p">:</span> <span class="mi">0</span><span class="p">.</span><span class="mi">183</span> <span class="n">ms</span>
</span><span class='line'> <span class="n">Execution</span> <span class="n">time</span><span class="p">:</span> <span class="mi">133</span><span class="p">.</span><span class="mi">885</span> <span class="n">ms</span>
</span><span class='line'><span class="p">(</span><span class="mi">11</span> <span class="k">rows</span><span class="p">)</span>
</span></code></pre></td></tr></table></div></figure></p>
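
<p>If an exact count is not required, the planner-estimate approach mentioned
above can be wrapped into a small PL/pgSQL function.  The sketch below is only
an illustration under stated assumptions: <code>count_estimate</code> is a
hypothetical helper (it is not part of PostgreSQL or RUM), and it assumes that
the top-level node of <code>EXPLAIN (FORMAT JSON)</code> output reports the
estimate in its <code>Plan Rows</code> field.  Since the query is only planned,
not executed, no heap pages are fetched.</p>

<p><figure class='code'><div class="highlight"><table><tr><td class='code'><pre><code class='sql'>-- Hypothetical helper, not part of PostgreSQL or RUM: returns the
-- planner's row estimate for a query instead of counting rows exactly.
CREATE OR REPLACE FUNCTION count_estimate(query text)
RETURNS bigint
LANGUAGE plpgsql
AS $$
DECLARE
    plan json;
BEGIN
    -- EXPLAIN without ANALYZE only plans the query, so no heap pages
    -- are read for visibility checks.
    EXECUTE 'EXPLAIN (FORMAT JSON) ' || query INTO plan;
    -- The output is a one-element JSON array; take the "Plan Rows"
    -- field of its top-level plan node.
    RETURN json_extract_path_text(plan, '0', 'Plan', 'Plan Rows')::numeric::bigint;
END;
$$;
</code></pre></td></tr></table></div></figure></p>

<p>Pass the row-producing query itself rather than a <code>COUNT(*)</code> wrapper,
because a plain aggregate is always estimated at one row, e.g.
<code>SELECT count_estimate($q$SELECT 1 FROM pgmail WHERE fts @@ plainto_tsquery('english', 'Tom Lane')$q$);</code>.
The query text is run as dynamic SQL, so only trusted input should be passed.
The result is only as good as the table statistics: in the plans above the
planner estimated 52928 rows for the bitmap heap scan, while 224568 rows actually matched.</p>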

]]></content>
  </entry>
  
</feed>
