diff options
author | Tomas Vondra <tomas.vondra@postgresql.org> | 2020-04-06 21:33:28 +0200 |
---|---|---|
committer | Tomas Vondra <tomas.vondra@postgresql.org> | 2020-04-06 21:35:10 +0200 |
commit | d2d8a229bc58a2014dce1c7a4fcdb6c5ab9fb8da (patch) | |
tree | 66e2560c49ee43d13c29bd9f5760731001312738 /doc/src | |
parent | 3c8553547b1493c4afdb80393f4a47dbfa019a79 (diff) | |
download | postgresql-d2d8a229bc58a2014dce1c7a4fcdb6c5ab9fb8da.tar.gz postgresql-d2d8a229bc58a2014dce1c7a4fcdb6c5ab9fb8da.zip |
Implement Incremental Sort
Incremental Sort is an optimized variant of multikey sort for cases when
the input is already sorted by a prefix of the requested sort keys. For
example when the relation is already sorted by (key1, key2) and we need
to sort it by (key1, key2, key3) we can simply split the input rows into
groups having equal values in (key1, key2), and only sort/compare the
remaining column key3.
This has a number of benefits:
- Reduced memory consumption, because only a single group (determined by
values in the sorted prefix) needs to be kept in memory. This may also
eliminate the need to spill to disk.
- Lower startup cost, because Incremental Sort produce results after each
prefix group, which is beneficial for plans where startup cost matters
(like for example queries with LIMIT clause).
We consider both Sort and Incremental Sort, and decide based on costing.
The implemented algorithm operates in two different modes:
- Fetching a minimum number of tuples without check of equality on the
prefix keys, and sorting on all columns when safe.
- Fetching all tuples for a single prefix group and then sorting by
comparing only the remaining (non-prefix) keys.
We always start in the first mode, and employ a heuristic to switch into
the second mode if we believe it's beneficial - the goal is to minimize
the number of unnecessary comparions while keeping memory consumption
below work_mem.
This is a very old patch series. The idea was originally proposed by
Alexander Korotkov back in 2013, and then revived in 2017. In 2018 the
patch was taken over by James Coleman, who wrote and rewrote most of the
current code.
There were many reviewers/contributors since 2013 - I've done my best to
pick the most active ones, and listed them in this commit message.
Author: James Coleman, Alexander Korotkov
Reviewed-by: Tomas Vondra, Andreas Karlsson, Marti Raudsepp, Peter Geoghegan, Robert Haas, Thomas Munro, Antonin Houska, Andres Freund, Alexander Kuzmenkov
Discussion: https://postgr.es/m/CAPpHfdscOX5an71nHd8WSUH6GNOCf=V7wgDaTXdDd9=goN-gfA@mail.gmail.com
Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com
Diffstat (limited to 'doc/src')
-rw-r--r-- | doc/src/sgml/config.sgml | 14 | ||||
-rw-r--r-- | doc/src/sgml/perform.sgml | 42 |
2 files changed, 55 insertions, 1 deletions
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 114db38116c..f68c9922136 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -4573,6 +4573,20 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class=" </listitem> </varlistentry> + <varlistentry id="guc-enable-incrementalsort" xreflabel="enable_incrementalsort"> + <term><varname>enable_incrementalsort</varname> (<type>boolean</type>) + <indexterm> + <primary><varname>enable_incrementalsort</varname> configuration parameter</primary> + </indexterm> + </term> + <listitem> + <para> + Enables or disables the query planner's use of incremental sort steps. + The default is <literal>on</literal>. + </para> + </listitem> + </varlistentry> + <varlistentry id="guc-enable-indexscan" xreflabel="enable_indexscan"> <term><varname>enable_indexscan</varname> (<type>boolean</type>) <indexterm> diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml index 58477ac83a8..0dfc3e80e29 100644 --- a/doc/src/sgml/perform.sgml +++ b/doc/src/sgml/perform.sgml @@ -291,7 +291,47 @@ EXPLAIN SELECT * FROM tenk1 WHERE unique1 = 42; often see this plan type for queries that fetch just a single row. It's also often used for queries that have an <literal>ORDER BY</literal> condition that matches the index order, because then no extra sorting step is needed - to satisfy the <literal>ORDER BY</literal>. + to satisfy the <literal>ORDER BY</literal>. In this example, adding + <literal>ORDER BY unique1</literal> would use the same plan because the + index already implicitly provides the requested ordering. + </para> + + <para> + The planner may implement an <literal>ORDER BY</literal> clause in several + ways. The above example shows that such an ordering clause may be + implemented implicitly. The planner may also add an explicit + <literal>sort</literal> step: + +<screen> +EXPLAIN SELECT * FROM tenk1 ORDER BY unique1; + QUERY PLAN +------------------------------------------------------------------- + Sort (cost=1109.39..1134.39 rows=10000 width=244) + Sort Key: unique1 + -> Seq Scan on tenk1 (cost=0.00..445.00 rows=10000 width=244) +</screen> + + If the a part of the plan guarantess an ordering on a prefix of the + required sort keys, then the planner may instead decide to use an + <literal>incremental sort</literal> step: + +<screen> +EXPLAIN SELECT * FROM tenk1 ORDER BY four, ten LIMIT 100; + QUERY PLAN +------------------------------------------------------------------------------------------------------ + Limit (cost=521.06..538.05 rows=100 width=244) + -> Incremental Sort (cost=521.06..2220.95 rows=10000 width=244) + Sort Key: four, ten + Presorted Key: four + -> Index Scan using index_tenk1_on_four on tenk1 (cost=0.29..1510.08 rows=10000 width=244) +</screen> + + Compared to regular sorts, sorting incrementally allows returning tuples + before the entire result set has been sorted, which particularly enables + optimizations with <literal>LIMIT</literal> queries. It may also reduce + memory usage and the likelihood of spilling sorts to disk, but it comes at + the cost of the increased overhead of splitting the result set into multiple + sorting batches. </para> <para> |