diff options
author | Peter Geoghegan <pg@bowt.ie> | 2020-02-29 12:10:17 -0800 |
---|---|---|
committer | Peter Geoghegan <pg@bowt.ie> | 2020-02-29 12:10:17 -0800 |
commit | 93ee38eade1b2b4964354b95b01b09e17d6f098d (patch) | |
tree | 5aabe12b03978f78d1364753844171d68294bbfb /doc/src | |
parent | 58c47ccfff20b8c125903482725c1dbfd30beade (diff) | |
download | postgresql-93ee38eade1b2b4964354b95b01b09e17d6f098d.tar.gz postgresql-93ee38eade1b2b4964354b95b01b09e17d6f098d.zip |
Teach pageinspect about nbtree deduplication.
Add a new bt_metap() column to display the metapage's allequalimage
field. Also add three new columns to contrib/pageinspect's
bt_page_items() function:
* Add a boolean column ("dead") that displays the LP_DEAD bit value for
each non-pivot tuple.
* Add a TID column ("htid") that displays a single heap TID value for
each tuple. This is the TID that is returned by BTreeTupleGetHeapTID(),
so comparable values are shown for pivot tuples, plain non-pivot tuples,
and posting list tuples.
* Add a TID array column ("tids") that displays TIDs from each tuple's
posting list, if any. This works just like the "tids" column from
pageinspect's gin_leafpage_items() function.
No version bump for the pageinspect extension, since there hasn't been a
stable Postgres release since the last version bump (the last bump was
part of commit 58b4cb30).
Author: Peter Geoghegan
Discussion: https://postgr.es/m/CAH2-WzmSMmU2eNvY9+a4MNP+z02h6sa-uxZvN3un6jY02ZVBSw@mail.gmail.com
Diffstat (limited to 'doc/src')
-rw-r--r-- | doc/src/sgml/pageinspect.sgml | 120 |
1 files changed, 85 insertions, 35 deletions
diff --git a/doc/src/sgml/pageinspect.sgml b/doc/src/sgml/pageinspect.sgml index 7e2e1487d79..fedc94e8b22 100644 --- a/doc/src/sgml/pageinspect.sgml +++ b/doc/src/sgml/pageinspect.sgml @@ -300,13 +300,14 @@ test=# SELECT t_ctid, raw_flags, combined_flags test=# SELECT * FROM bt_metap('pg_cast_oid_index'); -[ RECORD 1 ]-----------+------- magic | 340322 -version | 3 +version | 4 root | 1 level | 0 fastroot | 1 fastlevel | 0 oldest_xact | 582 last_cleanup_num_tuples | 1000 +allequalimage | f </screen> </para> </listitem> @@ -329,11 +330,11 @@ test=# SELECT * FROM bt_page_stats('pg_cast_oid_index', 1); -[ RECORD 1 ]-+----- blkno | 1 type | l -live_items | 256 +live_items | 224 dead_items | 0 -avg_item_size | 12 +avg_item_size | 16 page_size | 8192 -free_size | 4056 +free_size | 3668 btpo_prev | 0 btpo_next | 0 btpo | 0 @@ -356,33 +357,75 @@ btpo_flags | 3 <function>bt_page_items</function> returns detailed information about all of the items on a B-tree index page. For example: <screen> -test=# SELECT * FROM bt_page_items('pg_cast_oid_index', 1); - itemoffset | ctid | itemlen | nulls | vars | data -------------+---------+---------+-------+------+------------- - 1 | (0,1) | 12 | f | f | 23 27 00 00 - 2 | (0,2) | 12 | f | f | 24 27 00 00 - 3 | (0,3) | 12 | f | f | 25 27 00 00 - 4 | (0,4) | 12 | f | f | 26 27 00 00 - 5 | (0,5) | 12 | f | f | 27 27 00 00 - 6 | (0,6) | 12 | f | f | 28 27 00 00 - 7 | (0,7) | 12 | f | f | 29 27 00 00 - 8 | (0,8) | 12 | f | f | 2a 27 00 00 + test=# SELECT itemoffset, ctid, itemlen, nulls, vars, data, dead, htid, tids[0:2] AS some_tids + FROM bt_page_items(get_raw_page('tenk2_hundred', 5)); + itemoffset | ctid | itemlen | nulls | vars | data | dead | htid | some_tids +------------+-----------+---------+-------+------+-------------------------+------+--------+--------------------- + 1 | (16,1) | 16 | f | f | 30 00 00 00 00 00 00 00 | | | + 2 | (16,8292) | 616 | f | f | 24 00 00 00 00 00 00 00 | f | (1,6) | {"(1,6)","(10,22)"} + 3 | (16,8292) | 616 | f | f | 25 00 00 00 00 00 00 00 | f | (1,18) | {"(1,18)","(4,22)"} + 4 | (16,8292) | 616 | f | f | 26 00 00 00 00 00 00 00 | f | (4,18) | {"(4,18)","(6,17)"} + 5 | (16,8292) | 616 | f | f | 27 00 00 00 00 00 00 00 | f | (1,2) | {"(1,2)","(1,19)"} + 6 | (16,8292) | 616 | f | f | 28 00 00 00 00 00 00 00 | f | (2,24) | {"(2,24)","(4,11)"} + 7 | (16,8292) | 616 | f | f | 29 00 00 00 00 00 00 00 | f | (2,17) | {"(2,17)","(11,2)"} + 8 | (16,8292) | 616 | f | f | 2a 00 00 00 00 00 00 00 | f | (0,25) | {"(0,25)","(3,20)"} + 9 | (16,8292) | 616 | f | f | 2b 00 00 00 00 00 00 00 | f | (0,10) | {"(0,10)","(0,14)"} + 10 | (16,8292) | 616 | f | f | 2c 00 00 00 00 00 00 00 | f | (1,3) | {"(1,3)","(3,9)"} + 11 | (16,8292) | 616 | f | f | 2d 00 00 00 00 00 00 00 | f | (6,28) | {"(6,28)","(11,1)"} + 12 | (16,8292) | 616 | f | f | 2e 00 00 00 00 00 00 00 | f | (0,27) | {"(0,27)","(1,13)"} + 13 | (16,8292) | 616 | f | f | 2f 00 00 00 00 00 00 00 | f | (4,17) | {"(4,17)","(4,21)"} +(13 rows) </screen> - In a B-tree leaf page, <structfield>ctid</structfield> points to a heap tuple. - In an internal page, the block number part of <structfield>ctid</structfield> - points to another page in the index itself, while the offset part - (the second number) is ignored and is usually 1. + This is a B-tree leaf page. All tuples that point to the table + happen to be posting list tuples (all of which store a total of + 100 6 byte TIDs). There is also a <quote>high key</quote> tuple + at <literal>itemoffset</literal> number 1. + <structfield>ctid</structfield> is used to store encoded + information about each tuple in this example, though leaf page + tuples often store a heap TID directly in the + <structfield>ctid</structfield> field instead. + <structfield>tids</structfield> is the list of TIDs stored as a + posting list. + </para> + <para> + In an internal page (not shown), the block number part of + <structfield>ctid</structfield> is a <quote>downlink</quote>, + which is a block number of another page in the index itself. + The offset part (the second number) of + <structfield>ctid</structfield> stores encoded information about + the tuple, such as the number of columns present (suffix + truncation may have removed unneeded suffix columns). Truncated + columns are treated as having the value <quote>minus + infinity</quote>. + </para> + <para> + <structfield>htid</structfield> shows a heap TID for the tuple, + regardless of the underlying tuple representation. This value + may match <structfield>ctid</structfield>, or may be decoded + from the alternative representations used by posting list tuples + and tuples from internal pages. Tuples in internal pages + usually have the implementation level heap TID column truncated + away, which is represented as a NULL + <structfield>htid</structfield> value. </para> <para> Note that the first item on any non-rightmost page (any page with a non-zero value in the <structfield>btpo_next</structfield> field) is the page's <quote>high key</quote>, meaning its <structfield>data</structfield> serves as an upper bound on all items appearing on the page, while - its <structfield>ctid</structfield> field is meaningless. Also, on non-leaf - pages, the first real data item (the first item that is not a high - key) is a <quote>minus infinity</quote> item, with no actual value - in its <structfield>data</structfield> field. Such an item does have a valid - downlink in its <structfield>ctid</structfield> field, however. + its <structfield>ctid</structfield> field does not point to + another block. Also, on internal pages, the first real data + item (the first item that is not a high key) reliably has every + column truncated away, leaving no actual value in its + <structfield>data</structfield> field. Such an item does have a + valid downlink in its <structfield>ctid</structfield> field, + however. + </para> + <para> + For more details about the structure of B-tree indexes, see + <xref linkend="btree-structure"/>. For more details about + deduplication and posting lists, see <xref + linkend="btree-deduplication"/>. </para> </listitem> </varlistentry> @@ -402,17 +445,24 @@ test=# SELECT * FROM bt_page_items('pg_cast_oid_index', 1); with <function>get_raw_page</function> should be passed as argument. So the last example could also be rewritten like this: <screen> -test=# SELECT * FROM bt_page_items(get_raw_page('pg_cast_oid_index', 1)); - itemoffset | ctid | itemlen | nulls | vars | data -------------+---------+---------+-------+------+------------- - 1 | (0,1) | 12 | f | f | 23 27 00 00 - 2 | (0,2) | 12 | f | f | 24 27 00 00 - 3 | (0,3) | 12 | f | f | 25 27 00 00 - 4 | (0,4) | 12 | f | f | 26 27 00 00 - 5 | (0,5) | 12 | f | f | 27 27 00 00 - 6 | (0,6) | 12 | f | f | 28 27 00 00 - 7 | (0,7) | 12 | f | f | 29 27 00 00 - 8 | (0,8) | 12 | f | f | 2a 27 00 00 + test=# SELECT itemoffset, ctid, itemlen, nulls, vars, data, dead, htid, tids[0:2] AS some_tids + FROM bt_page_items(get_raw_page('tenk2_hundred', 5)); + itemoffset | ctid | itemlen | nulls | vars | data | dead | htid | some_tids +------------+-----------+---------+-------+------+-------------------------+------+--------+--------------------- + 1 | (16,1) | 16 | f | f | 30 00 00 00 00 00 00 00 | | | + 2 | (16,8292) | 616 | f | f | 24 00 00 00 00 00 00 00 | f | (1,6) | {"(1,6)","(10,22)"} + 3 | (16,8292) | 616 | f | f | 25 00 00 00 00 00 00 00 | f | (1,18) | {"(1,18)","(4,22)"} + 4 | (16,8292) | 616 | f | f | 26 00 00 00 00 00 00 00 | f | (4,18) | {"(4,18)","(6,17)"} + 5 | (16,8292) | 616 | f | f | 27 00 00 00 00 00 00 00 | f | (1,2) | {"(1,2)","(1,19)"} + 6 | (16,8292) | 616 | f | f | 28 00 00 00 00 00 00 00 | f | (2,24) | {"(2,24)","(4,11)"} + 7 | (16,8292) | 616 | f | f | 29 00 00 00 00 00 00 00 | f | (2,17) | {"(2,17)","(11,2)"} + 8 | (16,8292) | 616 | f | f | 2a 00 00 00 00 00 00 00 | f | (0,25) | {"(0,25)","(3,20)"} + 9 | (16,8292) | 616 | f | f | 2b 00 00 00 00 00 00 00 | f | (0,10) | {"(0,10)","(0,14)"} + 10 | (16,8292) | 616 | f | f | 2c 00 00 00 00 00 00 00 | f | (1,3) | {"(1,3)","(3,9)"} + 11 | (16,8292) | 616 | f | f | 2d 00 00 00 00 00 00 00 | f | (6,28) | {"(6,28)","(11,1)"} + 12 | (16,8292) | 616 | f | f | 2e 00 00 00 00 00 00 00 | f | (0,27) | {"(0,27)","(1,13)"} + 13 | (16,8292) | 616 | f | f | 2f 00 00 00 00 00 00 00 | f | (4,17) | {"(4,17)","(4,21)"} +(13 rows) </screen> All the other details are the same as explained in the previous item. </para> |