Be more predictable about reporting "lock timeout" vs "statement timeout".

If both timeout indicators are set when we arrive at ProcessInterrupts, we've historically just reported "lock timeout". However, some buildfarm members have been observed to fail isolationtester's timeouts test by reporting "lock timeout" when the statement timeout was expected to fire first. The cause seems to be that the process is allowed to sleep longer than expected (probably due to heavy machine load) so that the lock timeout happens before we reach the point of reporting the error, and then this arbitrary tiebreak rule does the wrong thing. We can improve matters by comparing the scheduled timeout times to decide which error to report.

I had originally proposed greatly reducing the 1-second window between the two timeouts in the test cases. On reflection that is a bad idea, at least for the case where the lock timeout is expected to fire first, because that would assume that it takes negligible time to get from statement start to the beginning of the lock wait. Thus, this patch doesn't completely remove the risk of test failures on slow machines. Empirically, however, the case this handles is the one we are seeing in the buildfarm. The explanation may be that the other case requires the scheduler to take the CPU away from a busy process, whereas the case fixed here only requires the scheduler to not give the CPU back right away to a process that has been woken from a multi-second sleep (and, perhaps, has been swapped out meanwhile).

Back-patch to 9.3 where the isolationtester timeouts test was added.

Discussion: <8693.1464314819@sss.pgh.pa.us>
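
For illustration only, the selection rule described in this message can be sketched as a standalone C function. This is a simplified model, not the patched code: the names `TimeoutReport` and `choose_timeout_report`, and the use of `int64_t` as a stand-in for the backend's timestamp type, are assumptions made for this example. The real change reads the flags with get_timeout_indicator() and the scheduled times with get_timeout_finish_time() inside ProcessInterrupts().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the backend's timestamp type. */
typedef int64_t SketchTimestamp;

/* Hypothetical result type: which error the sketch would report. */
typedef enum
{
	REPORT_NONE,
	REPORT_LOCK_TIMEOUT,
	REPORT_STMT_TIMEOUT
} TimeoutReport;

/*
 * Decide which timeout to report, given whether each indicator was set
 * and the time at which each timeout was scheduled to finish.
 *
 * If both fired, report whichever was scheduled earlier; a tie is
 * arbitrarily broken in favor of the lock timeout.
 */
TimeoutReport
choose_timeout_report(bool lock_timeout_occurred,
					  SketchTimestamp lock_finish_time,
					  bool stmt_timeout_occurred,
					  SketchTimestamp stmt_finish_time)
{
	/* Both set and statement timeout was due strictly first: prefer it. */
	if (lock_timeout_occurred && stmt_timeout_occurred &&
		stmt_finish_time < lock_finish_time)
		lock_timeout_occurred = false;

	if (lock_timeout_occurred)
		return REPORT_LOCK_TIMEOUT;
	if (stmt_timeout_occurred)
		return REPORT_STMT_TIMEOUT;
	return REPORT_NONE;
}
```

Under this model, a call with both flags set and a statement finish time earlier than the lock finish time yields REPORT_STMT_TIMEOUT, whereas the pre-patch rule would have reported the lock timeout regardless of the scheduled times.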
author: Tom Lane <tgl@sss.pgh.pa.us> 2016-05-27 10:40:20 -0400
committer: Tom Lane <tgl@sss.pgh.pa.us> 2016-05-27 10:40:20 -0400
commit: 9dd4178cec3ffd825a4bef558632b7cba3e426c5 (patch)
tree: 5009128d92f7b99bb444884136c4d8e0a3d7cec3 /src/backend/tcop/postgres.c
parent: d74048defcb1f48c5cc5a1b2a8aa0f7da8663394 (diff)
download: postgresql-9dd4178cec3ffd825a4bef558632b7cba3e426c5.tar.gz
postgresql-9dd4178cec3ffd825a4bef558632b7cba3e426c5.zip
1 files changed, 19 insertions, 4 deletions
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 68811f1f217..b185c1b5eb6 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -2909,6 +2909,9 @@ ProcessInterrupts(void)
 
 	if (QueryCancelPending)
 	{
+		bool		lock_timeout_occurred;
+		bool		stmt_timeout_occurred;
+
 		/*
 		 * Don't allow query cancel interrupts while reading input from the
 		 * client, because we might lose sync in the FE/BE protocol.  (Die
@@ -2929,17 +2932,29 @@ ProcessInterrupts(void)
 
 		/*
 		 * If LOCK_TIMEOUT and STATEMENT_TIMEOUT indicators are both set, we
-		 * prefer to report the former; but be sure to clear both.
+		 * need to clear both, so always fetch both.
 		 */
-		if (get_timeout_indicator(LOCK_TIMEOUT, true))
+		lock_timeout_occurred = get_timeout_indicator(LOCK_TIMEOUT, true);
+		stmt_timeout_occurred = get_timeout_indicator(STATEMENT_TIMEOUT, true);
+
+		/*
+		 * If both were set, we want to report whichever timeout completed
+		 * earlier; this ensures consistent behavior if the machine is slow
+		 * enough that the second timeout triggers before we get here.  A tie
+		 * is arbitrarily broken in favor of reporting a lock timeout.
+		 */
+		if (lock_timeout_occurred && stmt_timeout_occurred &&
+			get_timeout_finish_time(STATEMENT_TIMEOUT) < get_timeout_finish_time(LOCK_TIMEOUT))
+			lock_timeout_occurred = false;		/* report stmt timeout */
+
+		if (lock_timeout_occurred)
 		{
-			(void) get_timeout_indicator(STATEMENT_TIMEOUT, true);
 			LockErrorCleanup();
 			ereport(ERROR,
 					(errcode(ERRCODE_LOCK_NOT_AVAILABLE),
 					 errmsg("canceling statement due to lock timeout")));
 		}
-		if (get_timeout_indicator(STATEMENT_TIMEOUT, true))
+		if (stmt_timeout_occurred)
 		{
 			LockErrorCleanup();
 			ereport(ERROR,