aboutsummaryrefslogtreecommitdiff
path: root/doc/src
diff options
context:
space:
mode:
authorPeter Geoghegan <pg@bowt.ie>2020-02-29 12:10:17 -0800
committerPeter Geoghegan <pg@bowt.ie>2020-02-29 12:10:17 -0800
commit93ee38eade1b2b4964354b95b01b09e17d6f098d (patch)
tree5aabe12b03978f78d1364753844171d68294bbfb /doc/src
parent58c47ccfff20b8c125903482725c1dbfd30beade (diff)
downloadpostgresql-93ee38eade1b2b4964354b95b01b09e17d6f098d.tar.gz
postgresql-93ee38eade1b2b4964354b95b01b09e17d6f098d.zip
Teach pageinspect about nbtree deduplication.
Add a new bt_metap() column to display the metapage's allequalimage field. Also add three new columns to contrib/pageinspect's bt_page_items() function: * Add a boolean column ("dead") that displays the LP_DEAD bit value for each non-pivot tuple. * Add a TID column ("htid") that displays a single heap TID value for each tuple. This is the TID that is returned by BTreeTupleGetHeapTID(), so comparable values are shown for pivot tuples, plain non-pivot tuples, and posting list tuples. * Add a TID array column ("tids") that displays TIDs from each tuple's posting list, if any. This works just like the "tids" column from pageinspect's gin_leafpage_items() function. No version bump for the pageinspect extension, since there hasn't been a stable Postgres release since the last version bump (the last bump was part of commit 58b4cb30). Author: Peter Geoghegan Discussion: https://postgr.es/m/CAH2-WzmSMmU2eNvY9+a4MNP+z02h6sa-uxZvN3un6jY02ZVBSw@mail.gmail.com
Diffstat (limited to 'doc/src')
-rw-r--r--doc/src/sgml/pageinspect.sgml120
1 files changed, 85 insertions, 35 deletions
diff --git a/doc/src/sgml/pageinspect.sgml b/doc/src/sgml/pageinspect.sgml
index 7e2e1487d79..fedc94e8b22 100644
--- a/doc/src/sgml/pageinspect.sgml
+++ b/doc/src/sgml/pageinspect.sgml
@@ -300,13 +300,14 @@ test=# SELECT t_ctid, raw_flags, combined_flags
test=# SELECT * FROM bt_metap('pg_cast_oid_index');
-[ RECORD 1 ]-----------+-------
magic | 340322
-version | 3
+version | 4
root | 1
level | 0
fastroot | 1
fastlevel | 0
oldest_xact | 582
last_cleanup_num_tuples | 1000
+allequalimage | f
</screen>
</para>
</listitem>
@@ -329,11 +330,11 @@ test=# SELECT * FROM bt_page_stats('pg_cast_oid_index', 1);
-[ RECORD 1 ]-+-----
blkno | 1
type | l
-live_items | 256
+live_items | 224
dead_items | 0
-avg_item_size | 12
+avg_item_size | 16
page_size | 8192
-free_size | 4056
+free_size | 3668
btpo_prev | 0
btpo_next | 0
btpo | 0
@@ -356,33 +357,75 @@ btpo_flags | 3
<function>bt_page_items</function> returns detailed information about
all of the items on a B-tree index page. For example:
<screen>
-test=# SELECT * FROM bt_page_items('pg_cast_oid_index', 1);
- itemoffset | ctid | itemlen | nulls | vars | data
-------------+---------+---------+-------+------+-------------
- 1 | (0,1) | 12 | f | f | 23 27 00 00
- 2 | (0,2) | 12 | f | f | 24 27 00 00
- 3 | (0,3) | 12 | f | f | 25 27 00 00
- 4 | (0,4) | 12 | f | f | 26 27 00 00
- 5 | (0,5) | 12 | f | f | 27 27 00 00
- 6 | (0,6) | 12 | f | f | 28 27 00 00
- 7 | (0,7) | 12 | f | f | 29 27 00 00
- 8 | (0,8) | 12 | f | f | 2a 27 00 00
+ test=# SELECT itemoffset, ctid, itemlen, nulls, vars, data, dead, htid, tids[0:2] AS some_tids
+ FROM bt_page_items(get_raw_page('tenk2_hundred', 5));
+ itemoffset | ctid | itemlen | nulls | vars | data | dead | htid | some_tids
+------------+-----------+---------+-------+------+-------------------------+------+--------+---------------------
+ 1 | (16,1) | 16 | f | f | 30 00 00 00 00 00 00 00 | | |
+ 2 | (16,8292) | 616 | f | f | 24 00 00 00 00 00 00 00 | f | (1,6) | {"(1,6)","(10,22)"}
+ 3 | (16,8292) | 616 | f | f | 25 00 00 00 00 00 00 00 | f | (1,18) | {"(1,18)","(4,22)"}
+ 4 | (16,8292) | 616 | f | f | 26 00 00 00 00 00 00 00 | f | (4,18) | {"(4,18)","(6,17)"}
+ 5 | (16,8292) | 616 | f | f | 27 00 00 00 00 00 00 00 | f | (1,2) | {"(1,2)","(1,19)"}
+ 6 | (16,8292) | 616 | f | f | 28 00 00 00 00 00 00 00 | f | (2,24) | {"(2,24)","(4,11)"}
+ 7 | (16,8292) | 616 | f | f | 29 00 00 00 00 00 00 00 | f | (2,17) | {"(2,17)","(11,2)"}
+ 8 | (16,8292) | 616 | f | f | 2a 00 00 00 00 00 00 00 | f | (0,25) | {"(0,25)","(3,20)"}
+ 9 | (16,8292) | 616 | f | f | 2b 00 00 00 00 00 00 00 | f | (0,10) | {"(0,10)","(0,14)"}
+ 10 | (16,8292) | 616 | f | f | 2c 00 00 00 00 00 00 00 | f | (1,3) | {"(1,3)","(3,9)"}
+ 11 | (16,8292) | 616 | f | f | 2d 00 00 00 00 00 00 00 | f | (6,28) | {"(6,28)","(11,1)"}
+ 12 | (16,8292) | 616 | f | f | 2e 00 00 00 00 00 00 00 | f | (0,27) | {"(0,27)","(1,13)"}
+ 13 | (16,8292) | 616 | f | f | 2f 00 00 00 00 00 00 00 | f | (4,17) | {"(4,17)","(4,21)"}
+(13 rows)
</screen>
- In a B-tree leaf page, <structfield>ctid</structfield> points to a heap tuple.
- In an internal page, the block number part of <structfield>ctid</structfield>
- points to another page in the index itself, while the offset part
- (the second number) is ignored and is usually 1.
+ This is a B-tree leaf page. All tuples that point to the table
+ happen to be posting list tuples (all of which store a total of
+ 100 6 byte TIDs). There is also a <quote>high key</quote> tuple
+ at <literal>itemoffset</literal> number 1.
+ <structfield>ctid</structfield> is used to store encoded
+ information about each tuple in this example, though leaf page
+ tuples often store a heap TID directly in the
+ <structfield>ctid</structfield> field instead.
+ <structfield>tids</structfield> is the list of TIDs stored as a
+ posting list.
+ </para>
+ <para>
+ In an internal page (not shown), the block number part of
+ <structfield>ctid</structfield> is a <quote>downlink</quote>,
+ which is a block number of another page in the index itself.
+ The offset part (the second number) of
+ <structfield>ctid</structfield> stores encoded information about
+ the tuple, such as the number of columns present (suffix
+ truncation may have removed unneeded suffix columns). Truncated
+ columns are treated as having the value <quote>minus
+ infinity</quote>.
+ </para>
+ <para>
+ <structfield>htid</structfield> shows a heap TID for the tuple,
+ regardless of the underlying tuple representation. This value
+ may match <structfield>ctid</structfield>, or may be decoded
+ from the alternative representations used by posting list tuples
+ and tuples from internal pages. Tuples in internal pages
+ usually have the implementation level heap TID column truncated
+ away, which is represented as a NULL
+ <structfield>htid</structfield> value.
</para>
<para>
Note that the first item on any non-rightmost page (any page with
a non-zero value in the <structfield>btpo_next</structfield> field) is the
page's <quote>high key</quote>, meaning its <structfield>data</structfield>
serves as an upper bound on all items appearing on the page, while
- its <structfield>ctid</structfield> field is meaningless. Also, on non-leaf
- pages, the first real data item (the first item that is not a high
- key) is a <quote>minus infinity</quote> item, with no actual value
- in its <structfield>data</structfield> field. Such an item does have a valid
- downlink in its <structfield>ctid</structfield> field, however.
+ its <structfield>ctid</structfield> field does not point to
+ another block. Also, on internal pages, the first real data
+ item (the first item that is not a high key) reliably has every
+ column truncated away, leaving no actual value in its
+ <structfield>data</structfield> field. Such an item does have a
+ valid downlink in its <structfield>ctid</structfield> field,
+ however.
+ </para>
+ <para>
+ For more details about the structure of B-tree indexes, see
+ <xref linkend="btree-structure"/>. For more details about
+ deduplication and posting lists, see <xref
+ linkend="btree-deduplication"/>.
</para>
</listitem>
</varlistentry>
@@ -402,17 +445,24 @@ test=# SELECT * FROM bt_page_items('pg_cast_oid_index', 1);
with <function>get_raw_page</function> should be passed as argument. So
the last example could also be rewritten like this:
<screen>
-test=# SELECT * FROM bt_page_items(get_raw_page('pg_cast_oid_index', 1));
- itemoffset | ctid | itemlen | nulls | vars | data
-------------+---------+---------+-------+------+-------------
- 1 | (0,1) | 12 | f | f | 23 27 00 00
- 2 | (0,2) | 12 | f | f | 24 27 00 00
- 3 | (0,3) | 12 | f | f | 25 27 00 00
- 4 | (0,4) | 12 | f | f | 26 27 00 00
- 5 | (0,5) | 12 | f | f | 27 27 00 00
- 6 | (0,6) | 12 | f | f | 28 27 00 00
- 7 | (0,7) | 12 | f | f | 29 27 00 00
- 8 | (0,8) | 12 | f | f | 2a 27 00 00
+ test=# SELECT itemoffset, ctid, itemlen, nulls, vars, data, dead, htid, tids[0:2] AS some_tids
+ FROM bt_page_items(get_raw_page('tenk2_hundred', 5));
+ itemoffset | ctid | itemlen | nulls | vars | data | dead | htid | some_tids
+------------+-----------+---------+-------+------+-------------------------+------+--------+---------------------
+ 1 | (16,1) | 16 | f | f | 30 00 00 00 00 00 00 00 | | |
+ 2 | (16,8292) | 616 | f | f | 24 00 00 00 00 00 00 00 | f | (1,6) | {"(1,6)","(10,22)"}
+ 3 | (16,8292) | 616 | f | f | 25 00 00 00 00 00 00 00 | f | (1,18) | {"(1,18)","(4,22)"}
+ 4 | (16,8292) | 616 | f | f | 26 00 00 00 00 00 00 00 | f | (4,18) | {"(4,18)","(6,17)"}
+ 5 | (16,8292) | 616 | f | f | 27 00 00 00 00 00 00 00 | f | (1,2) | {"(1,2)","(1,19)"}
+ 6 | (16,8292) | 616 | f | f | 28 00 00 00 00 00 00 00 | f | (2,24) | {"(2,24)","(4,11)"}
+ 7 | (16,8292) | 616 | f | f | 29 00 00 00 00 00 00 00 | f | (2,17) | {"(2,17)","(11,2)"}
+ 8 | (16,8292) | 616 | f | f | 2a 00 00 00 00 00 00 00 | f | (0,25) | {"(0,25)","(3,20)"}
+ 9 | (16,8292) | 616 | f | f | 2b 00 00 00 00 00 00 00 | f | (0,10) | {"(0,10)","(0,14)"}
+ 10 | (16,8292) | 616 | f | f | 2c 00 00 00 00 00 00 00 | f | (1,3) | {"(1,3)","(3,9)"}
+ 11 | (16,8292) | 616 | f | f | 2d 00 00 00 00 00 00 00 | f | (6,28) | {"(6,28)","(11,1)"}
+ 12 | (16,8292) | 616 | f | f | 2e 00 00 00 00 00 00 00 | f | (0,27) | {"(0,27)","(1,13)"}
+ 13 | (16,8292) | 616 | f | f | 2f 00 00 00 00 00 00 00 | f | (4,17) | {"(4,17)","(4,21)"}
+(13 rows)
</screen>
All the other details are the same as explained in the previous item.
</para>