diff options
author | Heikki Linnakangas <heikki.linnakangas@iki.fi> | 2011-02-07 23:46:51 +0200 |
---|---|---|
committer | Heikki Linnakangas <heikki.linnakangas@iki.fi> | 2011-02-08 00:09:08 +0200 |
commit | dafaa3efb75ce1aae2e6dbefaf6f3a889dea0d21 (patch) | |
tree | 93271101a38832fce7a6864e96fc9de65b0acff4 /doc/src | |
parent | c18f51da17d8cf01d62218e0404e18ba246bde54 (diff) | |
download | postgresql-dafaa3efb75ce1aae2e6dbefaf6f3a889dea0d21.tar.gz postgresql-dafaa3efb75ce1aae2e6dbefaf6f3a889dea0d21.zip |
Implement genuine serializable isolation level.
Until now, our Serializable mode has in fact been what's called Snapshot
Isolation, which allows some anomalies that could not occur in any
serialized ordering of the transactions. This patch fixes that using a
method called Serializable Snapshot Isolation, based on research papers by
Michael J. Cahill (see README-SSI for full references). In Serializable
Snapshot Isolation, transactions run like they do in Snapshot Isolation,
but a predicate lock manager observes the reads and writes performed and
aborts transactions if it detects that an anomaly might occur. This method
produces some false positives, ie. it sometimes aborts transactions even
though there is no anomaly.
To track reads we implement predicate locking, see storage/lmgr/predicate.c.
Whenever a tuple is read, a predicate lock is acquired on the tuple. Shared
memory is finite, so when a transaction takes many tuple-level locks on a
page, the locks are promoted to a single page-level lock, and further to a
single relation level lock if necessary. To lock key values with no matching
tuple, a sequential scan always takes a relation-level lock, and an index
scan acquires a page-level lock that covers the search key, whether or not
there are any matching keys at the moment.
A predicate lock doesn't conflict with any regular locks or with another
predicate locks in the normal sense. They're only used by the predicate lock
manager to detect the danger of anomalies. Only serializable transactions
participate in predicate locking, so there should be no extra overhead for
for other transactions.
Predicate locks can't be released at commit, but must be remembered until
all the transactions that overlapped with it have completed. That means that
we need to remember an unbounded amount of predicate locks, so we apply a
lossy but conservative method of tracking locks for committed transactions.
If we run short of shared memory, we overflow to a new "pg_serial" SLRU
pool.
We don't currently allow Serializable transactions in Hot Standby mode.
That would be hard, because even read-only transactions can cause anomalies
that wouldn't otherwise occur.
Serializable isolation mode now means the new fully serializable level.
Repeatable Read gives you the old Snapshot Isolation level that we have
always had.
Kevin Grittner and Dan Ports, reviewed by Jeff Davis, Heikki Linnakangas and
Anssi Kääriäinen
Diffstat (limited to 'doc/src')
-rw-r--r-- | doc/src/sgml/catalogs.sgml | 9 | ||||
-rw-r--r-- | doc/src/sgml/config.sgml | 70 | ||||
-rw-r--r-- | doc/src/sgml/high-availability.sgml | 9 | ||||
-rw-r--r-- | doc/src/sgml/indexam.sgml | 13 | ||||
-rw-r--r-- | doc/src/sgml/lobj.sgml | 2 | ||||
-rw-r--r-- | doc/src/sgml/mvcc.sgml | 542 | ||||
-rw-r--r-- | doc/src/sgml/ref/begin.sgml | 9 | ||||
-rw-r--r-- | doc/src/sgml/ref/lock.sgml | 6 | ||||
-rw-r--r-- | doc/src/sgml/ref/pg_dump.sgml | 35 | ||||
-rw-r--r-- | doc/src/sgml/ref/select.sgml | 2 | ||||
-rw-r--r-- | doc/src/sgml/ref/set_transaction.sgml | 60 | ||||
-rw-r--r-- | doc/src/sgml/ref/start_transaction.sgml | 11 | ||||
-rw-r--r-- | doc/src/sgml/spi.sgml | 2 |
13 files changed, 591 insertions, 179 deletions
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index eda82c5f34e..be132f2eb7b 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -491,6 +491,13 @@ </row> <row> + <entry><structfield>ampredlocks</structfield></entry> + <entry><type>bool</type></entry> + <entry></entry> + <entry>Does an index of this type manage fine-grained predicate locks?</entry> + </row> + + <row> <entry><structfield>amkeytype</structfield></entry> <entry><type>oid</type></entry> <entry><literal><link linkend="catalog-pg-type"><structname>pg_type</structname></link>.oid</literal></entry> @@ -6577,7 +6584,7 @@ <entry><type>text</type></entry> <entry></entry> <entry>Name of the lock mode held or desired by this process (see <xref - linkend="locking-tables">)</entry> + linkend="locking-tables"> and <xref linkend="xact-serializable">)</entry> </row> <row> <entry><structfield>granted</structfield></entry> diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index d2a6445af31..2d8396e4e99 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -4456,6 +4456,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv; <varlistentry id="guc-default-transaction-isolation" xreflabel="default_transaction_isolation"> <indexterm> <primary>transaction isolation level</primary> + <secondary>setting default</secondary> </indexterm> <indexterm> <primary><varname>default_transaction_isolation</> configuration parameter</primary> @@ -4481,6 +4482,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv; <varlistentry id="guc-default-transaction-read-only" xreflabel="default_transaction_read_only"> <indexterm> <primary>read-only transaction</primary> + <secondary>setting default</secondary> </indexterm> <indexterm> <primary><varname>default_transaction_read_only</> configuration parameter</primary> @@ -4500,6 +4502,41 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv; </listitem> </varlistentry> + <varlistentry id="guc-default-transaction-deferrable" xreflabel="default_transaction_deferrable"> + <indexterm> + <primary>deferrable transaction</primary> + <secondary>setting default</secondary> + </indexterm> + <indexterm> + <primary><varname>default_transaction_deferrable</> configuration parameter</primary> + </indexterm> + + <term><varname>default_transaction_deferrable</varname> (<type>boolean</type>)</term> + <listitem> + <para> + When running at the <literal>serializable</> isolation level, + a deferrable read-only SQL transaction may be delayed before + it is allowed to proceed. However, once it begins executing + it does not incur any of the overhead required to ensure + serializability; so serialization code will have no reason to + force it to abort because of concurrent updates, making this + option suitable for long-running read-only transactions. + </para> + + <para> + This parameter controls the default deferrable status of each + new transaction. It currently has no effect on read-write + transactions or those operating at isolation levels lower + than <literal>serializable</>. The default is <literal>off</>. + </para> + + <para> + Consult <xref linkend="sql-set-transaction"> for more information. + </para> + </listitem> + </varlistentry> + + <varlistentry id="guc-session-replication-role" xreflabel="session_replication_role"> <term><varname>session_replication_role</varname> (<type>enum</type>)</term> <indexterm> @@ -5125,6 +5162,39 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir' </listitem> </varlistentry> + <varlistentry id="guc-max-predicate-locks-per-transaction" xreflabel="max_predicate_locks_per_transaction"> + <term><varname>max_predicate_locks_per_transaction</varname> (<type>integer</type>)</term> + <indexterm> + <primary><varname>max_predicate_locks_per_transaction</> configuration parameter</primary> + </indexterm> + <listitem> + <para> + The shared predicate lock table tracks locks on + <varname>max_predicate_locks_per_transaction</varname> * (<xref + linkend="guc-max-connections"> + <xref + linkend="guc-max-prepared-transactions">) objects (e.g., tables); + hence, no more than this many distinct objects can be locked at + any one time. This parameter controls the average number of object + locks allocated for each transaction; individual transactions + can lock more objects as long as the locks of all transactions + fit in the lock table. This is <emphasis>not</> the number of + rows that can be locked; that value is unlimited. The default, + 64, has generally been sufficient in testing, but you might need to + raise this value if you have clients that touch many different + tables in a single serializable transaction. This parameter can + only be set at server start. + </para> + + <para> + Increasing this parameter might cause <productname>PostgreSQL</> + to request more <systemitem class="osname">System V</> shared + memory than your operating system's default configuration + allows. See <xref linkend="sysvipc"> for information on how to + adjust those parameters, if necessary. + </para> + </listitem> + </varlistentry> + </variablelist> </sect1> diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml index 94d5ae8d358..a89296905b4 100644 --- a/doc/src/sgml/high-availability.sgml +++ b/doc/src/sgml/high-availability.sgml @@ -1916,6 +1916,15 @@ LOG: database system is ready to accept read only connections your setting of <varname>max_prepared_transactions</> is 0. </para> </listitem> + <listitem> + <para> + The Serializable transaction isolation level is not yet available in hot + standby. (See <xref linkend="xact-serializable"> and + <xref linkend="serializable-consistency"> for details.) + An attempt to set a transaction to the serializable isolation level in + hot standby mode will generate an error. + </para> + </listitem> </itemizedlist> </para> diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml index 241064a40fb..c7e997793db 100644 --- a/doc/src/sgml/indexam.sgml +++ b/doc/src/sgml/indexam.sgml @@ -705,6 +705,19 @@ amrestrpos (IndexScanDesc scan); it is only safe to use such scans with MVCC-compliant snapshots. </para> + <para> + When the <structfield>ampredlocks</> flag is not set, any scan using that + index access method within a serializable transaction will acquire a + non-blocking predicate lock on the full index. This will generate a + read-write conflict with the insert of any tuple into that index by a + concurrent serializable transaction. If certain patterns of read-write + conflicts are detected among a set of concurrent serializable + transactions, one of those transactions may be cancelled to protect data + integrity. When the flag is set, it indicates that the index access + method implements finer-grained predicate locking, which will tend to + reduce the frequency of such transaction cancellations. + </para> + </sect1> <sect1 id="index-unique-checks"> diff --git a/doc/src/sgml/lobj.sgml b/doc/src/sgml/lobj.sgml index eebc0d6aca1..0e6b7058353 100644 --- a/doc/src/sgml/lobj.sgml +++ b/doc/src/sgml/lobj.sgml @@ -256,7 +256,7 @@ int lo_open(PGconn *conn, Oid lobjId, int mode); from a descriptor opened with <symbol>INV_WRITE</symbol> returns data that reflects all writes of other committed transactions as well as writes of the current transaction. This is similar to the behavior - of <literal>SERIALIZABLE</> versus <literal>READ COMMITTED</> transaction + of <literal>REPEATABLE READ</> versus <literal>READ COMMITTED</> transaction modes for ordinary SQL <command>SELECT</> commands. </para> diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml index 579425dad64..f71f978ef20 100644 --- a/doc/src/sgml/mvcc.sgml +++ b/doc/src/sgml/mvcc.sgml @@ -21,9 +21,21 @@ <title>Introduction</title> <indexterm> + <primary>Multiversion Concurrency Control</primary> + </indexterm> + + <indexterm> <primary>MVCC</primary> </indexterm> + <indexterm> + <primary>Serializable Snapshot Isolation</primary> + </indexterm> + + <indexterm> + <primary>SSI</primary> + </indexterm> + <para> <productname>PostgreSQL</productname> provides a rich set of tools for developers to manage concurrent access to data. Internally, @@ -37,7 +49,7 @@ could be caused by (other) concurrent transaction updates on the same data rows, providing <firstterm>transaction isolation</firstterm> for each database session. <acronym>MVCC</acronym>, by eschewing - explicit locking methodologies of traditional database systems, + the locking methodologies of traditional database systems, minimizes lock contention in order to allow for reasonable performance in multiuser environments. </para> @@ -48,12 +60,17 @@ <acronym>MVCC</acronym> locks acquired for querying (reading) data do not conflict with locks acquired for writing data, and so reading never blocks writing and writing never blocks reading. + <productname>PostgreSQL</productname> maintains this guarantee + even when providing the strictest level of transaction + isolation through the use of an innovative <firstterm>Serializable + Snapshot Isolation</firstterm> (<acronym>SSI</acronym>) level. </para> <para> Table- and row-level locking facilities are also available in - <productname>PostgreSQL</productname> for applications that cannot - adapt easily to <acronym>MVCC</acronym> behavior. However, proper + <productname>PostgreSQL</productname> for applications which don't + generally need full transaction isolation and prefer to explicitly + manage particular points of conflict. However, proper use of <acronym>MVCC</acronym> will generally provide better performance than locks. In addition, application-defined advisory locks provide a mechanism for acquiring locks that are not tied @@ -70,9 +87,21 @@ <para> The <acronym>SQL</acronym> standard defines four levels of - transaction isolation in terms of three phenomena that must be - prevented between concurrent transactions. These undesirable - phenomena are: + transaction isolation. The most strict is Serializable, + which is defined by the standard in a paragraph which says that any + concurrent execution of a set of Serializable transactions is guaranteed + to produce the same effect as running them one at a time in some order. + The other three levels are defined in terms of phenomena, resulting from + interaction between concurrent transactions, which must not occur at + each level. The standard notes that due to the definition of + Serializable, none of these phenomena are possible at that level. (This + is hardly surprising -- if the effect of the transactions must be + consistent with having been run one at a time, how could you see any + phenomena caused by interactions?) + </para> + + <para> + The phenomena which are prohibited are various levels are: <variablelist> <varlistentry> @@ -211,15 +240,16 @@ <para> In <productname>PostgreSQL</productname>, you can request any of the four standard transaction isolation levels. But internally, there are - only two distinct isolation levels, which correspond to the levels Read - Committed and Serializable. When you select the level Read - Uncommitted you really get Read Committed, and when you select - Repeatable Read you really get Serializable, so the actual + only three distinct isolation levels, which correspond to the levels Read + Committed, Repeatable Read, and Serializable. When you select the level Read + Uncommitted you really get Read Committed, and phantom reads are not possible + in the <productname>PostgreSQL</productname> implementation of Repeatable + Read, so the actual isolation level might be stricter than what you select. This is permitted by the SQL standard: the four isolation levels only define which phenomena must not happen, they do not define which phenomena must happen. The reason that <productname>PostgreSQL</> - only provides two isolation levels is that this is the only + only provides three isolation levels is that this is the only sensible way to map the standard isolation levels to the multiversion concurrency control architecture. The behavior of the available isolation levels is detailed in the following subsections. @@ -238,6 +268,10 @@ <secondary>read committed</secondary> </indexterm> + <indexterm> + <primary>read committed</primary> + </indexterm> + <para> <firstterm>Read Committed</firstterm> is the default isolation level in <productname>PostgreSQL</productname>. When a transaction @@ -345,39 +379,46 @@ COMMIT; </para> </sect2> - <sect2 id="xact-serializable"> - <title>Serializable Isolation Level</title> + <sect2 id="xact-repeatable-read"> + <title>Repeatable Read Isolation Level</title> <indexterm> <primary>transaction isolation level</primary> - <secondary>serializable</secondary> + <secondary>repeatable read</secondary> + </indexterm> + + <indexterm> + <primary>repeatable read</primary> </indexterm> <para> - The <firstterm>Serializable</firstterm> isolation level provides the strictest transaction - isolation. This level emulates serial transaction execution, - as if transactions had been executed one after another, serially, - rather than concurrently. However, applications using this level must - be prepared to retry transactions due to serialization failures. + The <firstterm>Repeatable Read</firstterm> isolation level only sees + data committed before the transaction began; it never sees either + uncommitted data or changes committed during transaction execution + by concurrent transactions. (However, the query does see the + effects of previous updates executed within its own transaction, + even though they are not yet committed.) This is a stronger + guarantee than is required by the <acronym>SQL</acronym> standard + for this isolation level, and prevents all of the phenomena described + in <xref linkend="mvcc-isolevel-table">. As mentioned above, this is + specifically allowed by the standard, which only describes the + <emphasis>minimum</emphasis> protections each isolation level must + provide. </para> <para> - When a transaction is using the serializable level, - a <command>SELECT</command> query only sees data committed before the - transaction began; it never sees either uncommitted data or changes - committed - during transaction execution by concurrent transactions. (However, - the query does see the effects of previous updates - executed within its own transaction, even though they are not yet - committed.) This is different from Read Committed in that - a query in a serializable transaction - sees a snapshot as of the start of the <emphasis>transaction</>, - not as of the start + This level is different from Read Committed in that a query in a + repeatable read transaction sees a snapshot as of the start of the + <emphasis>transaction</>, not as of the start of the current query within the transaction. Thus, successive <command>SELECT</command> commands within a <emphasis>single</> transaction see the same data, i.e., they do not see changes made by other transactions that committed after their own transaction started. - (This behavior can be ideal for reporting applications.) + </para> + + <para> + Applications using this level must be prepared to retry transactions + due to serialization failures. </para> <para> @@ -386,22 +427,21 @@ COMMIT; behave the same as <command>SELECT</command> in terms of searching for target rows: they will only find target rows that were committed as of the transaction start time. However, such a - target - row might have already been updated (or deleted or locked) by + target row might have already been updated (or deleted or locked) by another concurrent transaction by the time it is found. In this case, the - serializable transaction will wait for the first updating transaction to commit or + repeatable read transaction will wait for the first updating transaction to commit or roll back (if it is still in progress). If the first updater rolls back, - then its effects are negated and the serializable transaction can proceed + then its effects are negated and the repeatable read transaction can proceed with updating the originally found row. But if the first updater commits (and actually updated or deleted the row, not just locked it) - then the serializable transaction will be rolled back with the message + then the repeatable read transaction will be rolled back with the message <screen> ERROR: could not serialize access due to concurrent update </screen> - because a serializable transaction cannot modify or lock rows changed by - other transactions after the serializable transaction began. + because a repeatable read transaction cannot modify or lock rows changed by + other transactions after the repeatable read transaction began. </para> <para> @@ -419,39 +459,70 @@ ERROR: could not serialize access due to concurrent update </para> <para> - The Serializable mode provides a rigorous guarantee that each - transaction sees a wholly consistent view of the database. However, - the application has to be prepared to retry transactions when concurrent - updates make it impossible to sustain the illusion of serial execution. - Since the cost of redoing complex transactions can be significant, - serializable mode is recommended only when updating transactions contain logic - sufficiently complex that they might give wrong answers in Read - Committed mode. Most commonly, Serializable mode is necessary when - a transaction executes several successive commands that must see - identical views of the database. + The Repeatable Read mode provides a rigorous guarantee that each + transaction sees a completely stable view of the database. However, + this view will not necessarily always be consistent with some serial + (one at a time) execution of concurrent transactions of the same level. + For example, even a read only transaction at this level may see a + control record updated to show that a batch has been completed but + <emphasis>not</emphasis> see one of the detail records which is logically + part of the batch because it read an earlier revision of the control + record. Attempts to enforce business rules by transactions running at + this isolation level are not likely to work correctly without careful use + of explicit locks to block conflicting transactions. </para> - <sect3 id="mvcc-serializability"> - <title>Serializable Isolation Versus True Serializability</title> + <note> + <para> + Prior to <productname>PostgreSQL</productname> version 9.1, a request + for the Serializable transaction isolation level provided exactly the + same behavior described here. To retain the legacy Serializable + behavior, Repeatable Read should now be requested. + </para> + </note> + </sect2> + + <sect2 id="xact-serializable"> + <title>Serializable Isolation Level</title> + + <indexterm> + <primary>transaction isolation level</primary> + <secondary>serializable</secondary> + </indexterm> <indexterm> - <primary>serializability</primary> + <primary>serializable</primary> </indexterm> <indexterm> <primary>predicate locking</primary> </indexterm> + <indexterm> + <primary>serialization anomaly</primary> + </indexterm> + <para> - The intuitive meaning (and mathematical definition) of - <quote>serializable</> execution is that any two successfully committed - concurrent transactions will appear to have executed strictly serially, - one after the other — although which one appeared to occur first might - not be predictable in advance. It is important to realize that forbidding - the undesirable behaviors listed in <xref linkend="mvcc-isolevel-table"> - is not sufficient to guarantee true serializability, and in fact - <productname>PostgreSQL</productname>'s Serializable mode <emphasis>does - not guarantee serializable execution in this sense</>. As an example, + The <firstterm>Serializable</firstterm> isolation level provides the strictest transaction + isolation. This level emulates serial transaction execution, + as if transactions had been executed one after another, serially, + rather than concurrently. However, like the Repeatable Read level, + applications using this level must + be prepared to retry transactions due to serialization failures. + In fact, this isolation level works exactly the same as Repeatable + Read except that it monitors for conditions which could make + execution of a concurrent set of serializable transactions behave + in a manner inconsistent with all possible serial (one at a time) + executions of those transactions. This monitoring does not + introduce any blocking beyond that present in repeatable read, but + there is some overhead to the monitoring, and detection of the + conditions which could cause a + <firstterm>serialization anomaly</firstterm> will trigger a + <firstterm>serialization failure</firstterm>. + </para> + + <para> + As an example, consider a table <structname>mytab</>, initially containing: <screen> class | value @@ -472,48 +543,137 @@ SELECT SUM(value) FROM mytab WHERE class = 1; SELECT SUM(value) FROM mytab WHERE class = 2; </screen> and obtains the result 300, which it inserts in a new row with - <structfield>class</><literal> = 1</>. Then both transactions commit. None of - the listed undesirable behaviors have occurred, yet we have a result - that could not have occurred in either order serially. If A had + <structfield>class</><literal> = 1</>. Then both transactions try to commit. + If either transaction were running at the Repeatable Read isolation level, + both would be allowed to commit; but since there is no serial order of execution + consistent with the result, using Serializable transactions will allow one + transaction to commit and and will roll the other back with this message: + +<screen> +ERROR: could not serialize access due to read/write dependencies among transactions +</screen> + + This is because if A had executed before B, B would have computed the sum 330, not 300, and similarly the other order would have resulted in a different sum computed by A. </para> <para> - To guarantee true mathematical serializability, it is necessary for - a database system to enforce <firstterm>predicate locking</>, which - means that a transaction cannot insert or modify a row that would - have matched the <literal>WHERE</> condition of a query in another concurrent - transaction. For example, once transaction A has executed the query - <literal>SELECT ... WHERE class = 1</>, a predicate-locking system - would forbid transaction B from inserting any new row with class 1 - until A has committed. - <footnote> - <para> - Essentially, a predicate-locking system prevents phantom reads - by restricting what is written, whereas MVCC prevents them by - restricting what is read. - </para> - </footnote> - Such a locking system is complex to - implement and extremely expensive in execution, since every session must - be aware of the details of every query executed by every concurrent - transaction. And this large expense is mostly wasted, since in - practice most applications do not do the sorts of things that could - result in problems. (Certainly the example above is rather contrived - and unlikely to represent real software.) For these reasons, - <productname>PostgreSQL</productname> does not implement predicate - locking. + To guarantee true serializability <productname>PostgreSQL</productname> + uses <firstterm>predicate locking</>, which means that it keeps locks + which allow it to determine when a write would have had an impact on + the result of a previous read from a concurrent transaction, had it run + first. In <productname>PostgreSQL</productname> these locks do not + cause any blocking and therefore can <emphasis>not</> play any part in + causing a deadlock. They are used to identify and flag dependencies + among concurrent serializable transactions which in certain combinations + can lead to serialization anomalies. In contrast, a Read Committed or + Repeatable Read transaction which wants to ensure data consistency may + need to take out a lock on an entire table, which could block other + users attempting to use that table, or it may use <literal>SELECT FOR + UPDATE</literal> or <literal>SELECT FOR SHARE</literal> which not only + can block other transactions but cause disk access. + </para> + + <para> + Predicate locks in <productname>PostgreSQL</productname>, like in most + other database systems, are based on data actually accessed by a + transaction. These will show up in the + <link linkend="view-pg-locks"><structname>pg_locks</structname></link> + system view with a <literal>mode</> of <literal>SIReadLock</>. The + particular locks + acquired during execution of a query will depend on the plan used by + the query, and multiple finer-grained locks (e.g., tuple locks) may be + combined into fewer coarser-grained locks (e.g., page locks) during the + course of the transaction to prevent exhaustion of the memory used to + track the locks. A <literal>READ ONLY</> transaction may be able to + release its SIRead locks before completion, if it detects that no + conflicts can still occur which could lead to a serialization anomaly. + In fact, <literal>READ ONLY</> transactions will often be able to + establish that fact at startup and avoid taking any predicate locks. + If you explicitly request a <literal>SERIALIZABLE READ ONLY DEFERRABLE</> + transaction, it will block until it can establish this fact. (This is + the <emphasis>only</> case where Serializable transactions block but + Repeatable Read transactions don't.) On the other hand, SIRead locks + often need to be kept past transaction commit, until overlapping read + write transactions complete. + </para> + + <para> + Consistent use of Serializable transactions can simplify development. + The guarantee that any set of concurrent serializable transactions will + have the same effect as if they were run one at a time means that if + you can demonstrate that a singe transaction, as written, will do the + right thing when run by itself, you can have confidence that it will + do the right thing in any mix of serializable transactions, even without + any information about what those other transactions might do. It is + important that an environment which uses this technique have a + generalized way of handling serialization failures (which always return + with a SQLSTATE value of '40001'), because it will be very hard to + predict exactly which transactions might contribute to the read/write + dependencies and need to be rolled back to prevent serialization + anomalies. The monitoring of read/write dependences has a cost, as does + the restart of transactions which are terminated with a serialization + failure, but balanced against the cost and blocking involved in use of + explicit locks and <literal>SELECT FOR UPDATE</> or <literal>SELECT FOR + SHARE</>, Serializable transactions are the best performance choice + for some environments. </para> <para> - In cases where the possibility of non-serializable execution - is a real hazard, problems can be prevented by appropriate use of - explicit locking. Further discussion appears in the following - sections. + For optimal performance when relying on Serializable transactions for + concurrency control, these issues should be considered: + + <itemizedlist> + <listitem> + <para> + Declare transactions as <literal>READ ONLY</> when possible. + </para> + </listitem> + <listitem> + <para> + Control the number of active connections, using a connection pool if + needed. This is always an important performance consideration, but + it can be paricularly important in a busy system using Serializable + transactions. + </para> + </listitem> + <listitem> + <para> + Don't put more into a single transaction than needed for integrity + purposes. + </para> + </listitem> + <listitem> + <para> + Don't leave connections dangling <quote>idle in transaction</quote> + longer than necessary. + </para> + </listitem> + <listitem> + <para> + Eliminate explicit locks, <literal>SELECT FOR UPDATE</>, and + <literal>SELECT FOR SHARE</> where no longer needed due to the + protections automatically provided by Serializable transactions. + </para> + </listitem> + </itemizedlist> </para> - </sect3> + + <warning> + <para> + Support for the Serializable transaction isolation level has not yet + been added to Hot Standby replication targets (described in + <xref linkend="hot-standby">). The strictest isolation level currently + supported in hot standby mode is Repeatable Read. While performing all + permanent database writes within Serializable transactions on the + master will ensure that all standbys will eventually reach a consistent + state, a Repeatable Read transaction run on the standby can sometimes + see a transient state which in inconsistent with any serial execution + of serializable transactions on the master. + </para> + </warning> </sect2> </sect1> @@ -1109,80 +1269,148 @@ SELECT pg_advisory_lock(q.id) FROM <title>Data Consistency Checks at the Application Level</title> <para> - Because readers in <productname>PostgreSQL</productname> - do not lock data, regardless of - transaction isolation level, data read by one transaction can be - overwritten by another concurrent transaction. In other words, - if a row is returned by <command>SELECT</command> it doesn't mean that - the row is still current at the instant it is returned (i.e., sometime - after the current query began). The row might have been modified or - deleted by an already-committed transaction that committed after - the <command>SELECT</command> started. - Even if the row is still valid <quote>now</>, it could be changed or - deleted - before the current transaction does a commit or rollback. + It is very difficult to enforce business rules regarding data integrity + using Read Committed transactions because the view of the data is + shifting with each statement, and even a single statement may not + restrict itself to the statement's snapshot if a write conflict occurs. </para> <para> - Another way to think about it is that each - transaction sees a snapshot of the database contents, and concurrently - executing transactions might very well see different snapshots. So the - whole concept of <quote>now</quote> is somewhat ill-defined anyway. - This is not normally - a big problem if the client applications are isolated from each other, - but if the clients can communicate via channels outside the database - then serious confusion might ensue. + While a Repeatable Read transaction has a stable view of the data + throughout its execution, there is a subtle issue with using + <acronym>MVCC</acronym> snapshots for data consistency checks, involving + something known as <firstterm>read/write conflicts</firstterm>. + If one transaction writes data and a concurrent transaction attempts + to read the same data (whether before or after the write), it cannot + see the work of the other transaction. The reader then appears to have + executed first regardless of which started first or which committed + first. If that is as far as it goes, there is no problem, but + if the reader also writes data which is read by a concurrent transaction + there is now a transaction which appears to have run before either of + the previously mentioned transactions. If the transaction which appears + to have executed last actually commits first, it is very easy for a + cycle to appear in a graph of the order of execution of the transactions. + When such a cycle appears, integrity checks will not work correctly + without some help. </para> <para> - To ensure the current validity of a row and protect it against - concurrent updates one must use <command>SELECT FOR UPDATE</command>, - <command>SELECT FOR SHARE</command>, or an appropriate <command>LOCK - TABLE</command> statement. (<command>SELECT FOR UPDATE</command> - and <command>SELECT FOR SHARE</command> lock just the - returned rows against concurrent updates, while <command>LOCK - TABLE</command> locks the whole table.) This should be taken into - account when porting applications to - <productname>PostgreSQL</productname> from other environments. + As mentioned in <xref linkend="xact-serializable">, Serializable + transactions are just Repeatable Read transactions which add + non-blocking monitoring for dangerous patterns of read/write conflicts. + When a pattern is detected which could cause a cycle in the apparent + order of execution, one of the transactions involved is rolled back to + break the cycle. </para> - <para> - Global validity checks require extra thought under <acronym>MVCC</acronym>. - For example, a banking application might wish to check that the sum of - all credits in one table equals the sum of debits in another table, - when both tables are being actively updated. Comparing the results of two - successive <literal>SELECT sum(...)</literal> commands will not work reliably in - Read Committed mode, since the second query will likely include the results - of transactions not counted by the first. Doing the two sums in a - single serializable transaction will give an accurate picture of only the - effects of transactions that committed before the serializable transaction - started — but one might legitimately wonder whether the answer is still - relevant by the time it is delivered. If the serializable transaction - itself applied some changes before trying to make the consistency check, - the usefulness of the check becomes even more debatable, since now it - includes some but not all post-transaction-start changes. In such cases - a careful person might wish to lock all tables needed for the check, - in order to get an indisputable picture of current reality. A - <literal>SHARE</> mode (or higher) lock guarantees that there are no - uncommitted changes in the locked table, other than those of the current - transaction. - </para> + <sect2 id="serializable-consistency"> + <title>Enforcing Consistency With Serializable Transactions</title> - <para> - Note also that if one is relying on explicit locking to prevent concurrent - changes, one should either use Read Committed mode, or in Serializable - mode be careful to obtain - locks before performing queries. A lock obtained by a - serializable transaction guarantees that no other transactions modifying - the table are still running, but if the snapshot seen by the - transaction predates obtaining the lock, it might predate some now-committed - changes in the table. A serializable transaction's snapshot is actually - frozen at the start of its first query or data-modification command - (<literal>SELECT</>, <literal>INSERT</>, - <literal>UPDATE</>, or <literal>DELETE</>), so - it is possible to obtain locks explicitly before the snapshot is - frozen. - </para> + <para> + If the Serializable transaction isolation level is used for all writes + and for all reads which need a consistent view of the data, no other + effort is required to ensure consistency. Software from other + environments which is written to use serializable transactions to + ensure consistency should <quote>just work</quote> in this regard in + <productname>PostgreSQL</productname>. + </para> + + <para> + When using this technique, it will avoid creating an unnecessary burden + for application programmers if the application software goes through a + framework which automatically retries transactions which are rolled + back with a serialization failure. It may be a good idea to set + <literal>default_transaction_isolation</> to <literal>serializable</>. + It would also be wise to take some action to ensure that no other + transaction isolation level is used, either inadvertently or to + subvert integrity checks, through checks of the transaction isolation + level in triggers. + </para> + + <para> + See <xref linkend="xact-serializable"> for performance suggestions. + </para> + + <warning> + <para> + This level of integrity protection using Serializable transactions + does not yet extend to hot standby mode (<xref linkend="hot-standby">). + Because of that, those using hot standby may want to use Repeatable + Read and explicit locking.on the master. + </para> + </warning> + </sect2> + + <sect2 id="non-serializable-consistency"> + <title>Enforcing Consistency With Explicit Blocking Locks</title> + + <para> + When non-serializable writes are possible, + to ensure the current validity of a row and protect it against + concurrent updates one must use <command>SELECT FOR UPDATE</command>, + <command>SELECT FOR SHARE</command>, or an appropriate <command>LOCK + TABLE</command> statement. (<command>SELECT FOR UPDATE</command> + and <command>SELECT FOR SHARE</command> lock just the + returned rows against concurrent updates, while <command>LOCK + TABLE</command> locks the whole table.) This should be taken into + account when porting applications to + <productname>PostgreSQL</productname> from other environments. + </para> + + <para> + Also of note to those converting from other environments is the fact + that <command>SELECT FOR UPDATE</command> does not ensure that a + concurrent transaction will not update or delete a selected row. + To do that in <productname>PostgreSQL</productname> you must actually + update the row, even if no values need to be changed. + <command>SELECT FOR UPDATE</command> <emphasis>temporarily blocks</emphasis> + other transactions from acquiring the same lock or executing an + <command>UPDATE</command> or <command>DELETE</command> which would + affect the locked row, but once the transaction holding this lock + commits or rolls back, a blocked transaction will proceed with the + conflicting operation unless an actual <command>UPDATE</command> of + the row was performed while the lock was held. + </para> + + <para> + Global validity checks require extra thought under + non-serializable <acronym>MVCC</acronym>. + For example, a banking application might wish to check that the sum of + all credits in one table equals the sum of debits in another table, + when both tables are being actively updated. Comparing the results of two + successive <literal>SELECT sum(...)</literal> commands will not work reliably in + Read Committed mode, since the second query will likely include the results + of transactions not counted by the first. Doing the two sums in a + single repeatable read transaction will give an accurate picture of only the + effects of transactions that committed before the repeatable read transaction + started — but one might legitimately wonder whether the answer is still + relevant by the time it is delivered. If the repeatable read transaction + itself applied some changes before trying to make the consistency check, + the usefulness of the check becomes even more debatable, since now it + includes some but not all post-transaction-start changes. In such cases + a careful person might wish to lock all tables needed for the check, + in order to get an indisputable picture of current reality. A + <literal>SHARE</> mode (or higher) lock guarantees that there are no + uncommitted changes in the locked table, other than those of the current + transaction. + </para> + + <para> + Note also that if one is relying on explicit locking to prevent concurrent + changes, one should either use Read Committed mode, or in Repeatable Read + mode be careful to obtain + locks before performing queries. A lock obtained by a + repeatable read transaction guarantees that no other transactions modifying + the table are still running, but if the snapshot seen by the + transaction predates obtaining the lock, it might predate some now-committed + changes in the table. A repeatable read transaction's snapshot is actually + frozen at the start of its first query or data-modification command + (<literal>SELECT</>, <literal>INSERT</>, + <literal>UPDATE</>, or <literal>DELETE</>), so + it is possible to obtain locks explicitly before the snapshot is + frozen. + </para> + </sect2> </sect1> <sect1 id="locking-indexes"> diff --git a/doc/src/sgml/ref/begin.sgml b/doc/src/sgml/ref/begin.sgml index c4d90ef8c6b..4c6a2b4a2ce 100644 --- a/doc/src/sgml/ref/begin.sgml +++ b/doc/src/sgml/ref/begin.sgml @@ -27,6 +27,7 @@ BEGIN [ WORK | TRANSACTION ] [ <replaceable class="parameter">transaction_mode</ ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED } READ WRITE | READ ONLY + [ NOT ] DEFERRABLE </synopsis> </refsynopsisdiv> @@ -57,7 +58,7 @@ BEGIN [ WORK | TRANSACTION ] [ <replaceable class="parameter">transaction_mode</ </para> <para> - If the isolation level or read/write mode is specified, the new + If the isolation level, read/write mode, or deferrable mode is specified, the new transaction has those characteristics, as if <xref linkend="sql-set-transaction"> was executed. @@ -136,6 +137,12 @@ BEGIN; </para> <para> + The <literal>DEFERRABLE</literal> + <replaceable class="parameter">transaction_mode</replaceable> + is a <productname>PostgreSQL</productname> language extension. + </para> + + <para> Incidentally, the <literal>BEGIN</literal> key word is used for a different purpose in embedded SQL. You are advised to be careful about the transaction semantics when porting database applications. diff --git a/doc/src/sgml/ref/lock.sgml b/doc/src/sgml/ref/lock.sgml index 86cd744ea4b..9b464580aae 100644 --- a/doc/src/sgml/ref/lock.sgml +++ b/doc/src/sgml/ref/lock.sgml @@ -67,10 +67,12 @@ LOCK [ TABLE ] [ ONLY ] <replaceable class="PARAMETER">name</replaceable> [, ... </para> <para> - To achieve a similar effect when running a transaction at the Serializable + To achieve a similar effect when running a transaction at the + <literal>REPEATABLE READ</> or <literal>SERIALIZABLE</> isolation level, you have to execute the <command>LOCK TABLE</> statement before executing any <command>SELECT</> or data modification statement. - A serializable transaction's view of data will be frozen when its first + A <literal>REPEATABLE READ</> or <literal>SERIALIZABLE</> transaction's + view of data will be frozen when its first <command>SELECT</> or data modification statement begins. A <command>LOCK TABLE</> later in the transaction will still prevent concurrent writes — but it won't ensure that what the transaction reads corresponds to diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml index f90d6699955..25dc2a70141 100644 --- a/doc/src/sgml/ref/pg_dump.sgml +++ b/doc/src/sgml/ref/pg_dump.sgml @@ -647,6 +647,41 @@ PostgreSQL documentation </varlistentry> <varlistentry> + <term><option>--serializable-deferrable</option></term> + <listitem> + <para> + Use a <literal>serializable</literal> transaction for the dump, to + ensure that the snapshot used is consistent with later database + states; but do this by waiting for a point in the transaction stream + at which no anomalies can be present, so that there isn't a risk of + the dump failing or causing other transactions to roll back with a + <literal>serialization_failure</literal>. See <xref linkend="mvcc"> + for more information about transaction isolation and concurrency + control. + </para> + + <para> + This option is not beneficial for a dump which is intended only for + disaster recovery. It could be useful for a dump used to load a + copy of the database for reporting or other read-only load sharing + while the original database continues to be updated. Without it the + dump may reflect a state which is not consistent with any serial + execution of the transactions eventually committed. For example, if + batch processing techniques are used, a batch may show as closed in + the dump without all of the items which are in the batch appearing. + </para> + + <para> + This option will make no difference if there are no read-write + transactions active when pg_dump is started. If read-write + transactions are active, the start of the dump may be delayed for an + indeterminate length of time. Once running, performance with or + without the switch is the same. + </para> + </listitem> + </varlistentry> + + <varlistentry> <term><option>--no-tablespaces</option></term> <listitem> <para> diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml index 24f82497139..92e47d12791 100644 --- a/doc/src/sgml/ref/select.sgml +++ b/doc/src/sgml/ref/select.sgml @@ -1144,7 +1144,7 @@ FOR SHARE [ OF <replaceable class="parameter">table_name</replaceable> [, ...] ] has already locked a selected row or rows, <command>SELECT FOR UPDATE</command> will wait for the other transaction to complete, and will then lock and return the updated row (or no row, if the - row was deleted). Within a <literal>SERIALIZABLE</> transaction, + row was deleted). Within a <literal>REPEATABLE READ</> or <literal>SERIALIZABLE</> transaction, however, an error will be thrown if a row to be locked has changed since the transaction started. For further discussion see <xref linkend="mvcc">. diff --git a/doc/src/sgml/ref/set_transaction.sgml b/doc/src/sgml/ref/set_transaction.sgml index 57ab38b6858..2c57f45511b 100644 --- a/doc/src/sgml/ref/set_transaction.sgml +++ b/doc/src/sgml/ref/set_transaction.sgml @@ -15,6 +15,21 @@ <primary>SET TRANSACTION</primary> </indexterm> + <indexterm> + <primary>transaction isolation level</primary> + <secondary>setting</secondary> + </indexterm> + + <indexterm> + <primary>read-only transaction</primary> + <secondary>setting</secondary> + </indexterm> + + <indexterm> + <primary>deferrable transaction</primary> + <secondary>setting</secondary> + </indexterm> + <refsynopsisdiv> <synopsis> SET TRANSACTION <replaceable class="parameter">transaction_mode</replaceable> [, ...] @@ -24,6 +39,7 @@ SET SESSION CHARACTERISTICS AS TRANSACTION <replaceable class="parameter">transa ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED } READ WRITE | READ ONLY + [ NOT ] DEFERRABLE </synopsis> </refsynopsisdiv> @@ -42,8 +58,8 @@ SET SESSION CHARACTERISTICS AS TRANSACTION <replaceable class="parameter">transa <para> The available transaction characteristics are the transaction - isolation level and the transaction access mode (read/write or - read-only). + isolation level, the transaction access mode (read/write or + read-only), and the deferrable mode. </para> <para> @@ -62,7 +78,7 @@ SET SESSION CHARACTERISTICS AS TRANSACTION <replaceable class="parameter">transa </varlistentry> <varlistentry> - <term><literal>SERIALIZABLE</literal></term> + <term><literal>REPEATABLE READ</literal></term> <listitem> <para> All statements of the current transaction can only see rows committed @@ -71,14 +87,27 @@ SET SESSION CHARACTERISTICS AS TRANSACTION <replaceable class="parameter">transa </para> </listitem> </varlistentry> + + <varlistentry> + <term><literal>SERIALIZABLE</literal></term> + <listitem> + <para> + All statements of the current transaction can only see rows committed + before the first query or data-modification statement was executed in + this transaction. If a pattern of reads and writes among concurrent + serializable transactions would create a situation which could not + have occurred for any serial (one-at-a-time) execution of those + transactions, one of them will be rolled back with a + <literal>serialization_failure</literal> <literal>SQLSTATE</literal>. + </para> + </listitem> + </varlistentry> </variablelist> - The SQL standard defines two additional levels, <literal>READ - UNCOMMITTED</literal> and <literal>REPEATABLE READ</literal>. + The SQL standard defines one additional level, <literal>READ + UNCOMMITTED</literal>. In <productname>PostgreSQL</productname> <literal>READ - UNCOMMITTED</literal> is treated as - <literal>READ COMMITTED</literal>, while <literal>REPEATABLE - READ</literal> is treated as <literal>SERIALIZABLE</literal>. + UNCOMMITTED</literal> is treated as <literal>READ COMMITTED</literal>. </para> <para> @@ -127,8 +156,9 @@ SET SESSION CHARACTERISTICS AS TRANSACTION <replaceable class="parameter">transa <para> The session default transaction modes can also be set by setting the - configuration parameters <xref linkend="guc-default-transaction-isolation"> - and <xref linkend="guc-default-transaction-read-only">. + configuration parameters <xref linkend="guc-default-transaction-isolation">, + <xref linkend="guc-default-transaction-read-only">, and + <xref linkend="guc-default-transaction-deferrable">. (In fact <command>SET SESSION CHARACTERISTICS</command> is just a verbose equivalent for setting these variables with <command>SET</>.) This means the defaults can be set in the configuration file, via @@ -146,9 +176,7 @@ SET SESSION CHARACTERISTICS AS TRANSACTION <replaceable class="parameter">transa isolation level in the standard. In <productname>PostgreSQL</productname> the default is ordinarily <literal>READ COMMITTED</literal>, but you can change it as - mentioned above. Because of lack of predicate locking, the - <literal>SERIALIZABLE</literal> level is not truly - serializable. See <xref linkend="mvcc"> for details. + mentioned above. </para> <para> @@ -159,6 +187,12 @@ SET SESSION CHARACTERISTICS AS TRANSACTION <replaceable class="parameter">transa </para> <para> + The <literal>DEFERRABLE</literal> + <replaceable class="parameter">transaction_mode</replaceable> + is a <productname>PostgreSQL</productname> language extension. + </para> + + <para> The SQL standard requires commas between successive <replaceable class="parameter">transaction_modes</replaceable>, but for historical reasons <productname>PostgreSQL</productname> allows the commas to be diff --git a/doc/src/sgml/ref/start_transaction.sgml b/doc/src/sgml/ref/start_transaction.sgml index ffa49762793..f25a3e9536e 100644 --- a/doc/src/sgml/ref/start_transaction.sgml +++ b/doc/src/sgml/ref/start_transaction.sgml @@ -27,6 +27,7 @@ START TRANSACTION [ <replaceable class="parameter">transaction_mode</replaceable ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED } READ WRITE | READ ONLY + [ NOT ] DEFERRABLE </synopsis> </refsynopsisdiv> @@ -34,8 +35,8 @@ START TRANSACTION [ <replaceable class="parameter">transaction_mode</replaceable <title>Description</title> <para> - This command begins a new transaction block. If the isolation level or - read/write mode is specified, the new transaction has those + This command begins a new transaction block. If the isolation level, + read/write mode, or deferrable mode is specified, the new transaction has those characteristics, as if <xref linkend="sql-set-transaction"> was executed. This is the same as the <xref linkend="sql-begin"> command. </para> @@ -65,6 +66,12 @@ START TRANSACTION [ <replaceable class="parameter">transaction_mode</replaceable </para> <para> + The <literal>DEFERRABLE</literal> + <replaceable class="parameter">transaction_mode</replaceable> + is a <productname>PostgreSQL</productname> language extension. + </para> + + <para> The SQL standard requires commas between successive <replaceable class="parameter">transaction_modes</replaceable>, but for historical reasons <productname>PostgreSQL</productname> allows the commas to be diff --git a/doc/src/sgml/spi.sgml b/doc/src/sgml/spi.sgml index fcee74f6053..e2dec39244a 100644 --- a/doc/src/sgml/spi.sgml +++ b/doc/src/sgml/spi.sgml @@ -340,7 +340,7 @@ SPI_execute("INSERT INTO foo SELECT * FROM bar", false, 5); <function>SPI_execute</function> increments the command counter and computes a new <firstterm>snapshot</> before executing each command in the string. The snapshot does not actually change if the - current transaction isolation level is <literal>SERIALIZABLE</>, but in + current transaction isolation level is <literal>SERIALIZABLE</> or <literal>REPEATABLE READ</>, but in <literal>READ COMMITTED</> mode the snapshot update allows each command to see the results of newly committed transactions from other sessions. This is essential for consistent behavior when the commands are modifying |