aboutsummaryrefslogtreecommitdiff
path: root/src/backend/access/gist
diff options
context:
space:
mode:
authorTom Lane <tgl@sss.pgh.pa.us>2010-12-03 23:49:06 -0500
committerTom Lane <tgl@sss.pgh.pa.us>2010-12-03 23:49:06 -0500
commitb576757d7ee064ada5351c2e6a36c2f7234aa1d4 (patch)
tree5ca6b09b224f7e5a970bed4a9a58f59a47824521 /src/backend/access/gist
parent04910a3ad5cd2901558da2a4fad9a2e2819348aa (diff)
downloadpostgresql-b576757d7ee064ada5351c2e6a36c2f7234aa1d4.tar.gz
postgresql-b576757d7ee064ada5351c2e6a36c2f7234aa1d4.zip
Add external documentation for KNNGIST.
Diffstat (limited to 'src/backend/access/gist')
-rw-r--r--src/backend/access/gist/README94
1 files changed, 49 insertions, 45 deletions
diff --git a/src/backend/access/gist/README b/src/backend/access/gist/README
index 66d559d33ba..03069f02685 100644
--- a/src/backend/access/gist/README
+++ b/src/backend/access/gist/README
@@ -20,6 +20,7 @@ The current implementation of GiST supports:
* Variable length keys
* Composite keys (multi-key)
+ * Ordered search (nearest-neighbor search)
* provides NULL-safe interface to GiST core
* Concurrency
* Recovery support via WAL logging
@@ -32,8 +33,8 @@ Marcel Kornaker:
The original algorithms were modified in several ways:
-* They should be adapted to PostgreSQL conventions. For example, the SEARCH
- algorithm was considerably changed, because in PostgreSQL function search
+* They had to be adapted to PostgreSQL conventions. For example, the SEARCH
+ algorithm was considerably changed, because in PostgreSQL the search function
should return one tuple (next), not all tuples at once. Also, it should
release page locks between calls.
* Since we added support for variable length keys, it's not possible to
@@ -41,12 +42,12 @@ The original algorithms were modified in several ways:
defined function picksplit doesn't have information about size of tuples
(each tuple may contain several keys as in multicolumn index while picksplit
could work with only one key) and pages.
-* We modified original INSERT algorithm for performance reason. In particular,
+* We modified original INSERT algorithm for performance reasons. In particular,
it is now a single-pass algorithm.
* Since the papers were theoretical, some details were omitted and we
- have to find out ourself how to solve some specific problems.
+ had to find out ourself how to solve some specific problems.
-Because of the above reasons, we have to revised interaction of GiST
+Because of the above reasons, we have revised the interaction of GiST
core and PostgreSQL WAL system. Moreover, we encountered (and solved)
a problem of uncompleted insertions when recovering after crash, which
was not touched in the paper.
@@ -54,46 +55,49 @@ was not touched in the paper.
Search Algorithm
----------------
-Function gettuple finds a tuple which satisfies the search
-predicate. It store their state and returns next tuple under
-subsequent calls. Stack contains page, its LSN and LSN of parent page
-and currentposition is saved between calls.
-
-gettuple(search-pred)
- if ( firsttime )
- push(stack, [root, 0, 0]) // page, LSN, parentLSN
- currentposition=0
- end
- ptr = top of stack
- while(true)
- latch( ptr->page, S-mode )
- if ( ptr->page->lsn != ptr->lsn )
- ptr->lsn = ptr->page->lsn
- currentposition=0
- if ( ptr->parentlsn < ptr->page->nsn )
- add to stack rightlink
- else
- currentposition++
- end
-
- while(true)
- currentposition = find_first_match( currentposition )
- if ( currentposition is invalid )
- unlatch( ptr->page )
- pop stack
- ptr = top of stack
- if (ptr is NULL)
- return NULL
- break loop
- else if ( ptr->page is leaf )
- unlatch( ptr->page )
- return tuple
- else
- add to stack child page
- end
- currentposition++
- end
- end
+The search code maintains a queue of unvisited items, where an "item" is
+either a heap tuple known to satisfy the search conditions, or an index
+page that is consistent with the search conditions according to inspection
+of its parent page's downlink item. Initially the root page is searched
+to find unvisited items in it. Then we pull items from the queue. A
+heap tuple pointer is just returned immediately; an index page entry
+causes that page to be searched, generating more queue entries.
+
+The queue is kept ordered with heap tuple items at the front, then
+index page entries, with any newly-added index page entry inserted
+before existing index page entries. This ensures depth-first traversal
+of the index, and in particular causes the first few heap tuples to be
+returned as soon as possible. That is helpful in case there is a LIMIT
+that requires only a few tuples to be produced.
+
+To implement nearest-neighbor search, the queue entries are augmented
+with distance data: heap tuple entries are labeled with exact distance
+from the search argument, while index-page entries must be labeled with
+the minimum distance that any of their children could have. Then,
+queue entries are retrieved in smallest-distance-first order, with
+entries having identical distances managed as stated in the previous
+paragraph.
+
+The search algorithm keeps an index page locked only long enough to scan
+its entries and queue those that satisfy the search conditions. Since
+insertions can occur concurrently with searches, it is possible for an
+index child page to be split between the time we make a queue entry for it
+(while visiting its parent page) and the time we actually reach and scan
+the child page. To avoid missing the entries that were moved to the right
+sibling, we detect whether a split has occurred by comparing the child
+page's NSN to the LSN that the parent had when visited. If it did, the
+sibling page is immediately added to the front of the queue, ensuring that
+its items will be scanned in the same order as if they were still on the
+original child page.
+
+As is usual in Postgres, the search algorithm only guarantees to find index
+entries that existed before the scan started; index entries added during
+the scan might or might not be visited. This is okay as long as all
+searches use MVCC snapshot rules to reject heap tuples newer than the time
+of scan start. In particular, this means that we need not worry about
+cases where a parent page's downlink key is "enlarged" after we look at it.
+Any such enlargement would be to add child items that we aren't interested
+in returning anyway.
Insert Algorithm