From 93ee38eade1b2b4964354b95b01b09e17d6f098d Mon Sep 17 00:00:00 2001 From: Peter Geoghegan Date: Sat, 29 Feb 2020 12:10:17 -0800 Subject: Teach pageinspect about nbtree deduplication. Add a new bt_metap() column to display the metapage's allequalimage field. Also add three new columns to contrib/pageinspect's bt_page_items() function: * Add a boolean column ("dead") that displays the LP_DEAD bit value for each non-pivot tuple. * Add a TID column ("htid") that displays a single heap TID value for each tuple. This is the TID that is returned by BTreeTupleGetHeapTID(), so comparable values are shown for pivot tuples, plain non-pivot tuples, and posting list tuples. * Add a TID array column ("tids") that displays TIDs from each tuple's posting list, if any. This works just like the "tids" column from pageinspect's gin_leafpage_items() function. No version bump for the pageinspect extension, since there hasn't been a stable Postgres release since the last version bump (the last bump was part of commit 58b4cb30). Author: Peter Geoghegan Discussion: https://postgr.es/m/CAH2-WzmSMmU2eNvY9+a4MNP+z02h6sa-uxZvN3un6jY02ZVBSw@mail.gmail.com --- doc/src/sgml/pageinspect.sgml | 120 ++++++++++++++++++++++++++++++------------ 1 file changed, 85 insertions(+), 35 deletions(-) (limited to 'doc/src') diff --git a/doc/src/sgml/pageinspect.sgml b/doc/src/sgml/pageinspect.sgml index 7e2e1487d79..fedc94e8b22 100644 --- a/doc/src/sgml/pageinspect.sgml +++ b/doc/src/sgml/pageinspect.sgml @@ -300,13 +300,14 @@ test=# SELECT t_ctid, raw_flags, combined_flags test=# SELECT * FROM bt_metap('pg_cast_oid_index'); -[ RECORD 1 ]-----------+------- magic | 340322 -version | 3 +version | 4 root | 1 level | 0 fastroot | 1 fastlevel | 0 oldest_xact | 582 last_cleanup_num_tuples | 1000 +allequalimage | f @@ -329,11 +330,11 @@ test=# SELECT * FROM bt_page_stats('pg_cast_oid_index', 1); -[ RECORD 1 ]-+----- blkno | 1 type | l -live_items | 256 +live_items | 224 dead_items | 0 -avg_item_size | 12 +avg_item_size | 16 page_size | 8192 -free_size | 4056 +free_size | 3668 btpo_prev | 0 btpo_next | 0 btpo | 0 @@ -356,33 +357,75 @@ btpo_flags | 3 bt_page_items returns detailed information about all of the items on a B-tree index page. For example: -test=# SELECT * FROM bt_page_items('pg_cast_oid_index', 1); - itemoffset | ctid | itemlen | nulls | vars | data -------------+---------+---------+-------+------+------------- - 1 | (0,1) | 12 | f | f | 23 27 00 00 - 2 | (0,2) | 12 | f | f | 24 27 00 00 - 3 | (0,3) | 12 | f | f | 25 27 00 00 - 4 | (0,4) | 12 | f | f | 26 27 00 00 - 5 | (0,5) | 12 | f | f | 27 27 00 00 - 6 | (0,6) | 12 | f | f | 28 27 00 00 - 7 | (0,7) | 12 | f | f | 29 27 00 00 - 8 | (0,8) | 12 | f | f | 2a 27 00 00 + test=# SELECT itemoffset, ctid, itemlen, nulls, vars, data, dead, htid, tids[0:2] AS some_tids + FROM bt_page_items(get_raw_page('tenk2_hundred', 5)); + itemoffset | ctid | itemlen | nulls | vars | data | dead | htid | some_tids +------------+-----------+---------+-------+------+-------------------------+------+--------+--------------------- + 1 | (16,1) | 16 | f | f | 30 00 00 00 00 00 00 00 | | | + 2 | (16,8292) | 616 | f | f | 24 00 00 00 00 00 00 00 | f | (1,6) | {"(1,6)","(10,22)"} + 3 | (16,8292) | 616 | f | f | 25 00 00 00 00 00 00 00 | f | (1,18) | {"(1,18)","(4,22)"} + 4 | (16,8292) | 616 | f | f | 26 00 00 00 00 00 00 00 | f | (4,18) | {"(4,18)","(6,17)"} + 5 | (16,8292) | 616 | f | f | 27 00 00 00 00 00 00 00 | f | (1,2) | {"(1,2)","(1,19)"} + 6 | (16,8292) | 616 | f | f | 28 00 00 00 00 00 00 00 | f | (2,24) | {"(2,24)","(4,11)"} + 7 | (16,8292) | 616 | f | f | 29 00 00 00 00 00 00 00 | f | (2,17) | {"(2,17)","(11,2)"} + 8 | (16,8292) | 616 | f | f | 2a 00 00 00 00 00 00 00 | f | (0,25) | {"(0,25)","(3,20)"} + 9 | (16,8292) | 616 | f | f | 2b 00 00 00 00 00 00 00 | f | (0,10) | {"(0,10)","(0,14)"} + 10 | (16,8292) | 616 | f | f | 2c 00 00 00 00 00 00 00 | f | (1,3) | {"(1,3)","(3,9)"} + 11 | (16,8292) | 616 | f | f | 2d 00 00 00 00 00 00 00 | f | (6,28) | {"(6,28)","(11,1)"} + 12 | (16,8292) | 616 | f | f | 2e 00 00 00 00 00 00 00 | f | (0,27) | {"(0,27)","(1,13)"} + 13 | (16,8292) | 616 | f | f | 2f 00 00 00 00 00 00 00 | f | (4,17) | {"(4,17)","(4,21)"} +(13 rows) - In a B-tree leaf page, ctid points to a heap tuple. - In an internal page, the block number part of ctid - points to another page in the index itself, while the offset part - (the second number) is ignored and is usually 1. + This is a B-tree leaf page. All tuples that point to the table + happen to be posting list tuples (all of which store a total of + 100 6 byte TIDs). There is also a high key tuple + at itemoffset number 1. + ctid is used to store encoded + information about each tuple in this example, though leaf page + tuples often store a heap TID directly in the + ctid field instead. + tids is the list of TIDs stored as a + posting list. + + + In an internal page (not shown), the block number part of + ctid is a downlink, + which is a block number of another page in the index itself. + The offset part (the second number) of + ctid stores encoded information about + the tuple, such as the number of columns present (suffix + truncation may have removed unneeded suffix columns). Truncated + columns are treated as having the value minus + infinity. + + + htid shows a heap TID for the tuple, + regardless of the underlying tuple representation. This value + may match ctid, or may be decoded + from the alternative representations used by posting list tuples + and tuples from internal pages. Tuples in internal pages + usually have the implementation level heap TID column truncated + away, which is represented as a NULL + htid value. Note that the first item on any non-rightmost page (any page with a non-zero value in the btpo_next field) is the page's high key, meaning its data serves as an upper bound on all items appearing on the page, while - its ctid field is meaningless. Also, on non-leaf - pages, the first real data item (the first item that is not a high - key) is a minus infinity item, with no actual value - in its data field. Such an item does have a valid - downlink in its ctid field, however. + its ctid field does not point to + another block. Also, on internal pages, the first real data + item (the first item that is not a high key) reliably has every + column truncated away, leaving no actual value in its + data field. Such an item does have a + valid downlink in its ctid field, + however. + + + For more details about the structure of B-tree indexes, see + . For more details about + deduplication and posting lists, see . @@ -402,17 +445,24 @@ test=# SELECT * FROM bt_page_items('pg_cast_oid_index', 1); with get_raw_page should be passed as argument. So the last example could also be rewritten like this: -test=# SELECT * FROM bt_page_items(get_raw_page('pg_cast_oid_index', 1)); - itemoffset | ctid | itemlen | nulls | vars | data -------------+---------+---------+-------+------+------------- - 1 | (0,1) | 12 | f | f | 23 27 00 00 - 2 | (0,2) | 12 | f | f | 24 27 00 00 - 3 | (0,3) | 12 | f | f | 25 27 00 00 - 4 | (0,4) | 12 | f | f | 26 27 00 00 - 5 | (0,5) | 12 | f | f | 27 27 00 00 - 6 | (0,6) | 12 | f | f | 28 27 00 00 - 7 | (0,7) | 12 | f | f | 29 27 00 00 - 8 | (0,8) | 12 | f | f | 2a 27 00 00 + test=# SELECT itemoffset, ctid, itemlen, nulls, vars, data, dead, htid, tids[0:2] AS some_tids + FROM bt_page_items(get_raw_page('tenk2_hundred', 5)); + itemoffset | ctid | itemlen | nulls | vars | data | dead | htid | some_tids +------------+-----------+---------+-------+------+-------------------------+------+--------+--------------------- + 1 | (16,1) | 16 | f | f | 30 00 00 00 00 00 00 00 | | | + 2 | (16,8292) | 616 | f | f | 24 00 00 00 00 00 00 00 | f | (1,6) | {"(1,6)","(10,22)"} + 3 | (16,8292) | 616 | f | f | 25 00 00 00 00 00 00 00 | f | (1,18) | {"(1,18)","(4,22)"} + 4 | (16,8292) | 616 | f | f | 26 00 00 00 00 00 00 00 | f | (4,18) | {"(4,18)","(6,17)"} + 5 | (16,8292) | 616 | f | f | 27 00 00 00 00 00 00 00 | f | (1,2) | {"(1,2)","(1,19)"} + 6 | (16,8292) | 616 | f | f | 28 00 00 00 00 00 00 00 | f | (2,24) | {"(2,24)","(4,11)"} + 7 | (16,8292) | 616 | f | f | 29 00 00 00 00 00 00 00 | f | (2,17) | {"(2,17)","(11,2)"} + 8 | (16,8292) | 616 | f | f | 2a 00 00 00 00 00 00 00 | f | (0,25) | {"(0,25)","(3,20)"} + 9 | (16,8292) | 616 | f | f | 2b 00 00 00 00 00 00 00 | f | (0,10) | {"(0,10)","(0,14)"} + 10 | (16,8292) | 616 | f | f | 2c 00 00 00 00 00 00 00 | f | (1,3) | {"(1,3)","(3,9)"} + 11 | (16,8292) | 616 | f | f | 2d 00 00 00 00 00 00 00 | f | (6,28) | {"(6,28)","(11,1)"} + 12 | (16,8292) | 616 | f | f | 2e 00 00 00 00 00 00 00 | f | (0,27) | {"(0,27)","(1,13)"} + 13 | (16,8292) | 616 | f | f | 2f 00 00 00 00 00 00 00 | f | (4,17) | {"(4,17)","(4,21)"} +(13 rows) All the other details are the same as explained in the previous item. -- cgit v1.2.3