diff options
author | Tom Lane <tgl@sss.pgh.pa.us> | 2024-04-07 16:26:47 -0400 |
---|---|---|
committer | Tom Lane <tgl@sss.pgh.pa.us> | 2024-04-07 16:26:47 -0400 |
commit | 2daeba6a4e49f2ac4cc202fc9352c2a59e928a6c (patch) | |
tree | 1bfda8e54b3f7c645dd1119707fb0c24f2eee80e /doc/src | |
parent | 1973d9fb318316df16b0da89e6fe5f2c53473b38 (diff) | |
download | postgresql-2daeba6a4e49f2ac4cc202fc9352c2a59e928a6c.tar.gz postgresql-2daeba6a4e49f2ac4cc202fc9352c2a59e928a6c.zip |
Doc: show how to get the equivalent of LIMIT for UPDATE/DELETE.
Add examples showing use of a CTE and a self-join to perform
partial UPDATEs and DELETEs.
Corey Huinker, reviewed by Laurenz Albe
Discussion: https://postgr.es/m/CADkLM=caNEQsUwPWnfi2jR4ix99E0EJM_3jtcE-YjnEQC7Rssw@mail.gmail.com
Diffstat (limited to 'doc/src')
-rw-r--r-- | doc/src/sgml/ref/delete.sgml | 24 | ||||
-rw-r--r-- | doc/src/sgml/ref/update.sgml | 40 |
2 files changed, 61 insertions, 3 deletions
diff --git a/doc/src/sgml/ref/delete.sgml b/doc/src/sgml/ref/delete.sgml index 1b81b4e7d74..0b6fa005123 100644 --- a/doc/src/sgml/ref/delete.sgml +++ b/doc/src/sgml/ref/delete.sgml @@ -260,12 +260,32 @@ DELETE FROM tasks WHERE status = 'DONE' RETURNING *; </programlisting> </para> - <para> + <para> Delete the row of <structname>tasks</structname> on which the cursor <literal>c_tasks</literal> is currently positioned: <programlisting> DELETE FROM tasks WHERE CURRENT OF c_tasks; -</programlisting></para> +</programlisting> + </para> + + <para> + While there is no <literal>LIMIT</literal> clause + for <command>DELETE</command>, it is possible to get a similar effect + using the same method described in <link linkend="update-limit">the + documentation of <command>UPDATE</command></link>: +<programlisting> +WITH delete_batch AS ( + SELECT l.ctid FROM user_logs AS l + WHERE l.status = 'archived' + ORDER BY l.creation_date + FOR UPDATE + LIMIT 10000 +) +DELETE FROM user_logs AS dl + USING delete_batch AS del + WHERE dl.ctid = del.ctid; +</programlisting> + </para> </refsect1> <refsect1> diff --git a/doc/src/sgml/ref/update.sgml b/doc/src/sgml/ref/update.sgml index 2ab24b0523e..babb34fa511 100644 --- a/doc/src/sgml/ref/update.sgml +++ b/doc/src/sgml/ref/update.sgml @@ -441,7 +441,45 @@ COMMIT; <literal>c_films</literal> is currently positioned: <programlisting> UPDATE films SET kind = 'Dramatic' WHERE CURRENT OF c_films; -</programlisting></para> +</programlisting> + </para> + + <para id="update-limit"> + Updates affecting many rows can have negative effects on system + performance, such as table bloat, increased replica lag, and increased + lock contention. In such situations it can make sense to perform the + operation in smaller batches, possibly with a <command>VACUUM</command> + operation on the table between batches. While there is + no <literal>LIMIT</literal> clause for <command>UPDATE</command>, it is + possible to get a similar effect through the use of + a <link linkend="queries-with">Common Table Expression</link> and a + self-join. With the standard <productname>PostgreSQL</productname> + table access method, a self-join on the system + column <link linkend="ddl-system-columns-ctid">ctid</link> is very + efficient: +<programlisting> +WITH exceeded_max_retries AS ( + SELECT w.ctid FROM work_item AS w + WHERE w.status = 'active' AND w.num_retries > 10 + ORDER BY w.retry_timestamp + FOR UPDATE + LIMIT 5000 +) +UPDATE work_item SET status = 'failed' + FROM exceeded_max_retries AS emr + WHERE work_item.ctid = emr.ctid; +</programlisting> + This command will need to be repeated until no rows remain to be updated. + Use of an <literal>ORDER BY</literal> clause allows the command to + prioritize which rows will be updated; it can also prevent deadlock + with other update operations if they use the same ordering. + If lock contention is a concern, then <literal>SKIP LOCKED</literal> + can be added to the <acronym>CTE</acronym> to prevent multiple commands + from updating the same row. However, then a + final <command>UPDATE</command> without <literal>SKIP LOCKED</literal> + or <literal>LIMIT</literal> will be needed to ensure that no matching + rows were overlooked. + </para> </refsect1> <refsect1> |