author Jeff Davis <jdavis@postgresql.org> 2014-05-04 13:18:55 -0700
committer Jeff Davis <jdavis@postgresql.org> 2014-05-06 19:27:43 -0700
commit 35c0cd3b05b0be18dc2d049c33b38a2d13993ffe
tree 5af0e7df971771484e8b7f976867ca89edf9889a /src/backend/executor
parent 3a9d430af515e9dd8a9d34a4011367e667a66521
Improve comment for tricky aspect of index-only scans.
Index-only scans avoid taking a lock on the VM buffer, which would cause a lot of contention. To be correct, that requires some intricate assumptions that weren't completely documented in the previous comment. Reviewed by Robert Haas.
Diffstat (limited to 'src/backend/executor')
-rw-r--r-- src/backend/executor/nodeIndexonlyscan.c | 34
1 files changed, 25 insertions, 9 deletions
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index c55723608d6..afcd1ff353e 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -88,15 +88,31 @@ IndexOnlyNext(IndexOnlyScanState *node)
* Note on Memory Ordering Effects: visibilitymap_test does not lock
* the visibility map buffer, and therefore the result we read here
* could be slightly stale. However, it can't be stale enough to
- * matter. It suffices to show that (1) there is a read barrier
- * between the time we read the index TID and the time we test the
- * visibility map; and (2) there is a write barrier between the time
- * some other concurrent process clears the visibility map bit and the
- * time it inserts the index TID. Since acquiring or releasing a
- * LWLock interposes a full barrier, this is easy to show: (1) is
- * satisfied by the release of the index buffer content lock after
- * reading the TID; and (2) is satisfied by the acquisition of the
- * buffer content lock in order to insert the TID.
+ * matter.
+ *
+ * We need to detect clearing a VM bit due to an insert right away,
+ * because the tuple is present in the index page but not visible. The
+ * reading of the TID by this scan (using a shared lock on the index
+ * buffer) is serialized with the insert of the TID into the index
+ * (using an exclusive lock on the index buffer). Because the VM bit
+ * is cleared before updating the index, and locking/unlocking of the
+ * index page acts as a full memory barrier, we are sure to see the
+ * cleared bit if we see a recently-inserted TID.
+ *
+ * Deletes do not update the index page (only VACUUM will clear out
+ * the TID), so the clearing of the VM bit by a delete is not
+ * serialized with this test below, and we may see a value that is
+ * significantly stale. However, we don't care about the delete right
+ * away, because the tuple is still visible until the deleting
+ * transaction commits or the statement ends (if it's our
+ * transaction). In either case, the lock on the VM buffer will have
+ * been released (acting as a write barrier) after clearing the
+ * bit. And for us to have a snapshot that includes the deleting
+ * transaction (making the tuple invisible), we must have acquired
+ * ProcArrayLock after that time, acting as a read barrier.
+ *
+ * It's worth going through this complexity to avoid needing to lock
+ * the VM buffer, which could cause significant contention.
*/
if (!visibilitymap_test(scandesc->heapRelation,
ItemPointerGetBlockNumber(tid),