diff options
author | Peter Geoghegan <pg@bowt.ie> | 2019-03-20 10:12:19 -0700 |
---|---|---|
committer | Peter Geoghegan <pg@bowt.ie> | 2019-03-20 10:12:19 -0700 |
commit | fab2502433870d98271ba8751f3794e2ed44140a (patch) | |
tree | 8c453424bb5406a50232cd58982f6fdbd04710c8 /src/backend/access/nbtree/nbtutils.c | |
parent | dd299df8189bd00fbe54b72c64f43b6af2ffeccd (diff) | |
download | postgresql-fab2502433870d98271ba8751f3794e2ed44140a.tar.gz postgresql-fab2502433870d98271ba8751f3794e2ed44140a.zip |
Consider secondary factors during nbtree splits.
Teach nbtree to give some consideration to how "distinguishing"
candidate leaf page split points are. This should not noticeably affect
the balance of free space within each half of the split, while still
making suffix truncation truncate away significantly more attributes on
average.
The logic for choosing a leaf split point now uses a fallback mode in
the case where the page is full of duplicates and it isn't possible to
find even a minimally distinguishing split point. When the page is full
of duplicates, the split should pack the left half very tightly, while
leaving the right half mostly empty. Our assumption is that logical
duplicates will almost always be inserted in ascending heap TID order
with v4 indexes. This strategy leaves most of the free space on the
half of the split that will likely be where future logical duplicates of
the same value need to be placed.
The number of cycles added is not very noticeable. This is important
because deciding on a split point takes place while at least one
exclusive buffer lock is held. We avoid using authoritative insertion
scankey comparisons to save cycles, unlike suffix truncation proper. We
use a faster binary comparison instead.
Note that even pg_upgrade'd v3 indexes make use of these optimizations.
Benchmarking has shown that even v3 indexes benefit, despite the fact
that suffix truncation will only truncate non-key attributes in INCLUDE
indexes. Grouping relatively similar tuples together is beneficial in
and of itself, since it reduces the number of leaf pages that must be
accessed by subsequent index scans.
Author: Peter Geoghegan
Reviewed-By: Heikki Linnakangas
Discussion: https://postgr.es/m/CAH2-WzmmoLNQOj9mAD78iQHfWLJDszHEDrAzGTUMG3mVh5xWPw@mail.gmail.com
Diffstat (limited to 'src/backend/access/nbtree/nbtutils.c')
-rw-r--r-- | src/backend/access/nbtree/nbtutils.c | 55 |
1 files changed, 55 insertions, 0 deletions
diff --git a/src/backend/access/nbtree/nbtutils.c b/src/backend/access/nbtree/nbtutils.c index 2f9f6e76015..6b59e16c4d5 100644 --- a/src/backend/access/nbtree/nbtutils.c +++ b/src/backend/access/nbtree/nbtutils.c @@ -22,6 +22,7 @@ #include "access/relscan.h" #include "miscadmin.h" #include "utils/array.h" +#include "utils/datum.h" #include "utils/lsyscache.h" #include "utils/memutils.h" #include "utils/rel.h" @@ -2296,6 +2297,60 @@ _bt_keep_natts(Relation rel, IndexTuple lastleft, IndexTuple firstright, } /* + * _bt_keep_natts_fast - fast bitwise variant of _bt_keep_natts. + * + * This is exported so that a candidate split point can have its effect on + * suffix truncation inexpensively evaluated ahead of time when finding a + * split location. A naive bitwise approach to datum comparisons is used to + * save cycles. + * + * The approach taken here usually provides the same answer as _bt_keep_natts + * will (for the same pair of tuples from a heapkeyspace index), since the + * majority of btree opclasses can never indicate that two datums are equal + * unless they're bitwise equal (once detoasted). Similarly, result may + * differ from the _bt_keep_natts result when either tuple has TOASTed datums, + * though this is barely possible in practice. + * + * These issues must be acceptable to callers, typically because they're only + * concerned about making suffix truncation as effective as possible without + * leaving excessive amounts of free space on either side of page split. + * Callers can rely on the fact that attributes considered equal here are + * definitely also equal according to _bt_keep_natts. + */ +int +_bt_keep_natts_fast(Relation rel, IndexTuple lastleft, IndexTuple firstright) +{ + TupleDesc itupdesc = RelationGetDescr(rel); + int keysz = IndexRelationGetNumberOfKeyAttributes(rel); + int keepnatts; + + keepnatts = 1; + for (int attnum = 1; attnum <= keysz; attnum++) + { + Datum datum1, + datum2; + bool isNull1, + isNull2; + Form_pg_attribute att; + + datum1 = index_getattr(lastleft, attnum, itupdesc, &isNull1); + datum2 = index_getattr(firstright, attnum, itupdesc, &isNull2); + att = TupleDescAttr(itupdesc, attnum - 1); + + if (isNull1 != isNull2) + break; + + if (!isNull1 && + !datumIsEqual(datum1, datum2, att->attbyval, att->attlen)) + break; + + keepnatts++; + } + + return keepnatts; +} + +/* * _bt_check_natts() -- Verify tuple has expected number of attributes. * * Returns value indicating if the expected number of attributes were found |