From 62d712ecfd940f60e68bde5b6972b6859937c412 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81lvaro=20Herrera?=
Date: Tue, 18 Mar 2025 18:56:11 +0100
Subject: Introduce squashing of constant lists in query jumbling
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

pg_stat_statements produces multiple entries for queries like

    SELECT something FROM table WHERE col IN (1, 2, 3, ...)

depending on the number of parameters, because every element of
ArrayExpr is individually jumbled. Most of the time that's undesirable,
especially if the list becomes too large.

Fix this by introducing a new GUC query_id_squash_values which modifies
the node jumbling code to only consider the first and last element of a
list of constants, rather than each list element individually. This
affects both the query_id generated by query jumbling, as well as
pg_stat_statements query normalization so that it suppresses printing of
the individual elements of such a list.

The default value is off, meaning the previous behavior is maintained.

Author: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Sergey Dudoladov (mysterious, off-list)
Reviewed-by: David Geier
Reviewed-by: Robert Haas
Reviewed-by: Álvaro Herrera
Reviewed-by: Sami Imseih
Reviewed-by: Sutou Kouhei
Reviewed-by: Tom Lane
Reviewed-by: Michael Paquier
Reviewed-by: Marcos Pegoraro
Reviewed-by: Julien Rouhaud
Reviewed-by: Zhihong Yu
Tested-by: Yasuo Honda
Tested-by: Sergei Kornilov
Tested-by: Maciek Sakrejda
Tested-by: Chengxi Sun
Tested-by: Jakub Wartak
Discussion: https://postgr.es/m/CA+q6zcWtUbT_Sxj0V6HY6EZ89uv5wuG5aefpe_9n0Jr3VwntFg@mail.gmail.com
---
 doc/src/sgml/config.sgml           | 30 ++++++++++++++++++++++++++++++
 doc/src/sgml/pgstatstatements.sgml | 24 +++++++++++++++++++++---
 2 files changed, 51 insertions(+), 3 deletions(-)

(limited to 'doc/src')

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index cd889142773..9e9c02cde83 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8701,6 +8701,36 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
+
+ query_id_squash_values (bool)
+
+ query_id_squash_values configuration parameter
+
+
+
+
+ Specifies how a list of constants (e.g., for an IN
+ clause) contributes to the query identifier computation.
+ Normally, every element of such a list contributes to the query
+ identifier separately, which means that two queries that only differ
+ in the number of elements in such a list would get different query
+ identifiers.
+ If this parameter is on, a list of constants will not contribute
+ to the query identifier. This means that two queries whose only
+ difference is the number of constants in such a list are going to get the
+ same query identifier.
+
+
+ Only constants are affected; bind parameters do not benefit from this
+ functionality. The default value is off.
+
+
+ This parameter also affects how pg_stat_statements
+ generates normalized query texts.
+
+
+
+
 log_statement_stats (boolean)
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index e2ac1c2d501..f4e384e95ae 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -630,9 +630,27 @@
 In some cases, queries with visibly different texts might get merged into a
- single pg_stat_statements entry. Normally this will happen
- only for semantically equivalent queries, but there is a small chance of
- hash collisions causing unrelated queries to be merged into one entry.
+ single pg_stat_statements entry; as explained above,
+ this is expected to happen for semantically equivalent queries.
+ In addition, if query_id_squash_values is enabled
+ and the only difference between queries is the number of elements in a list
+ of constants, the list will get squashed down to a single element but shown
+ with a commented-out list indicator:
+
+
+=# SET query_id_squash_values = on;
+=# SELECT pg_stat_statements_reset();
+=# SELECT * FROM test WHERE a IN (1, 2, 3, 4, 5, 6, 7);
+=# SELECT * FROM test WHERE a IN (1, 2, 3, 4, 5, 6, 7, 8);
+=# SELECT query, calls FROM pg_stat_statements
+ WHERE query LIKE 'SELECT%';
+-[ RECORD 1 ]------------------------------
+query | SELECT * FROM test WHERE a IN ($1 /*, ... */)
+calls | 2
+
+
+ In addition to these cases, there is a small chance of hash collisions
+ causing unrelated queries to be merged into one entry.
 (This cannot happen for queries belonging to different users or databases, however.)
-- 
cgit v1.2.3
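
Illustrative note (not part of the patch): the effect described in the commit message can be sketched with a toy model. This is not PostgreSQL's jumbling code. The function names `toy_query_id` and `toy_normalized_text` are hypothetical, and hashing the query shape plus each constant's type name with SHA-256 is an assumption made only for illustration. The sketch shows why considering only the first and last element of a constant list makes two queries that differ only in list length share an identifier.

```python
import hashlib


def toy_query_id(shape: str, constants: list, squash: bool) -> str:
    """Toy fingerprint (hypothetical): hash the query shape plus constant types."""
    if squash and constants:
        # Squashing: only the first and last element of the list are considered.
        considered = [constants[0], constants[-1]]
    else:
        # Default behavior: every list element contributes individually.
        considered = list(constants)
    h = hashlib.sha256(shape.encode())
    for c in considered:
        # Assumption for this toy: a constant contributes its type, not its value.
        h.update(b"|" + type(c).__name__.encode())
    return h.hexdigest()[:16]


def toy_normalized_text(shape: str, constants: list, squash: bool) -> str:
    """Toy normalization imitating the output format shown in the docs hunk."""
    if squash and constants:
        placeholders = "$1 /*, ... */"
    else:
        placeholders = ", ".join(f"${i}" for i in range(1, len(constants) + 1))
    return shape.format(placeholders)


shape = "SELECT * FROM test WHERE a IN ({})"
seven = [1, 2, 3, 4, 5, 6, 7]
eight = seven + [8]

# With squashing, both list lengths map to the same toy identifier and text.
same_id = toy_query_id(shape, seven, True) == toy_query_id(shape, eight, True)
# Without squashing, the extra element changes the toy identifier.
diff_id = toy_query_id(shape, seven, False) != toy_query_id(shape, eight, False)
print(same_id, diff_id)
print(toy_normalized_text(shape, eight, True))
```

Single-element lists, non-constant list members, and bind parameters are deliberately left out of this toy; the documentation text in the patch is the reference for those cases.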