Restructure libpq's handling of send failures.

Originally, if libpq got a failure (e.g., ECONNRESET) while trying to send data to the server, it would just report that and wash its hands of the matter. It was soon found that that wasn't a very pleasant way of coping with server-initiated disconnections, so we introduced a hack (pqHandleSendFailure) in the code that sends queries to make it peek ahead for server error reports before reporting the send failure. It now emerges that related cases can occur during connection setup; in particular, as of TLS 1.3 it's unsafe to assume that SSL connection failures will be reported by SSL_connect rather than during our first send attempt. We could have fixed that in a hacky way by applying pqHandleSendFailure after a startup packet send failure, but (a) pqHandleSendFailure explicitly disclaims suitability for use in any state except query startup, and (b) the problem still potentially exists for other send attempts in libpq. Instead, let's fix this in a more general fashion by eliminating pqHandleSendFailure altogether, and instead arranging to postpone all reports of send failures in libpq until after we've made an attempt to read and process server messages. The send failure won't be reported at all if we find a server message or detect input EOF. (Note: this removes one of the reasons why libpq typically overwrites, rather than appending to, conn->errorMessage: pqHandleSendFailure needed that behavior so that the send failure report would be replaced if we got a server message or read failure report. Eventually I'd like to get rid of that overwrite behavior altogether, but today is not that day. For the moment, pqSendSome is assuming that its callees will overwrite not append to conn->errorMessage.) Possibly this change should get back-patched someday; but it needs testing first, so let's not consider that till after v12 beta. Discussion: https://postgr.es/m/CAEepm=2n6Nv+5tFfe8YnkUm1fXgvxR0Mm1FoD+QKG-vLNGLyKg@mail.gmail.com
author: Tom Lane <tgl@sss.pgh.pa.us> 2019-03-19 16:20:20 -0400
committer: Tom Lane <tgl@sss.pgh.pa.us> 2019-03-19 16:20:28 -0400
commit: 1f39a1c0641531e0462a4822f2dba904c5d4d699 (patch)
tree: 125928ffda2fbd24e09508e4def3bbc1cc15b70c /src/interfaces/libpq/fe-misc.c
parent: 5e28b778bf9a5835e702277119c5f92b4dbab45e (diff)
download: postgresql-1f39a1c0641531e0462a4822f2dba904c5d4d699.tar.gz
postgresql-1f39a1c0641531e0462a4822f2dba904c5d4d699.zip
1 files changed, 45 insertions, 8 deletions
diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index e5ef8d44bd4..ea4c9d2ee0c 100644
--- a/src/interfaces/libpq/fe-misc.c
+++ b/src/interfaces/libpq/fe-misc.c
@@ -824,6 +824,13 @@ definitelyFailed:
  *
  * Return 0 on success, -1 on failure and 1 when not all data could be sent
  * because the socket would block and the connection is non-blocking.
+ *
+ * Upon write failure, conn->write_failed is set and the error message is
+ * saved in conn->write_err_msg, but we clear the output buffer and return
+ * zero anyway; this is because callers should soldier on until it's possible
+ * to read from the server and check for an error message.  write_err_msg
+ * should be reported only when we are unable to obtain a server error first.
+ * (Thus, a -1 result is returned only for an internal *read* failure.)
  */
 static int
 pqSendSome(PGconn *conn, int len)
@@ -832,13 +839,32 @@ pqSendSome(PGconn *conn, int len)
 	int			remaining = conn->outCount;
 	int			result = 0;
 
+	/*
+	 * If we already had a write failure, we will never again try to send data
+	 * on that connection.  Even if the kernel would let us, we've probably
+	 * lost message boundary sync with the server.  conn->write_failed
+	 * therefore persists until the connection is reset, and we just discard
+	 * all data presented to be written.
+	 */
+	if (conn->write_failed)
+	{
+		/* conn->write_err_msg should be set up already */
+		conn->outCount = 0;
+		return 0;
+	}
+
 	if (conn->sock == PGINVALID_SOCKET)
 	{
 		printfPQExpBuffer(&conn->errorMessage,
 						  libpq_gettext("connection not open\n"));
+		conn->write_failed = true;
+		/* Transfer error message to conn->write_err_msg, if possible */
+		/* (strdup failure is OK, we'll cope later) */
+		conn->write_err_msg = strdup(conn->errorMessage.data);
+		resetPQExpBuffer(&conn->errorMessage);
 		/* Discard queued data; no chance it'll ever be sent */
 		conn->outCount = 0;
-		return -1;
+		return 0;
 	}
 
 	/* while there's still data to send */
@@ -876,17 +902,24 @@ pqSendSome(PGconn *conn, int len)
 
 				default:
 					/* pqsecure_write set the error message for us */
+					conn->write_failed = true;
 
 					/*
-					 * We used to close the socket here, but that's a bad idea
-					 * since there might be unread data waiting (typically, a
-					 * NOTICE message from the backend telling us it's
-					 * committing hara-kiri...).  Leave the socket open until
-					 * pqReadData finds no more data can be read.  But abandon
-					 * attempt to send data.
+					 * Transfer error message to conn->write_err_msg, if
+					 * possible (strdup failure is OK, we'll cope later).
+					 *
+					 * Note: this assumes that pqsecure_write and its children
+					 * will overwrite not append to conn->errorMessage.  If
+					 * that's ever changed, we could remember the length of
+					 * conn->errorMessage at entry to this routine, and then
+					 * save and delete just what was appended.
 					 */
+					conn->write_err_msg = strdup(conn->errorMessage.data);
+					resetPQExpBuffer(&conn->errorMessage);
+
+					/* Discard queued data; no chance it'll ever be sent */
 					conn->outCount = 0;
-					return -1;
+					return 0;
 			}
 		}
 		else
@@ -921,6 +954,9 @@ pqSendSome(PGconn *conn, int len)
 			 * can do, and works pretty well in practice.  (The documentation
 			 * used to say that you only need to wait for write-ready, so
 			 * there are still plenty of applications like that out there.)
+			 *
+			 * Note that errors here don't result in write_failed becoming
+			 * set.
 			 */
 			if (pqReadData(conn) < 0)
 			{
@@ -956,6 +992,7 @@ pqSendSome(PGconn *conn, int len)
  *
  * Return 0 on success, -1 on failure and 1 when not all data could be sent
  * because the socket would block and the connection is non-blocking.
+ * (See pqSendSome comments about how failure should be handled.)
  */
 int
 pqFlush(PGconn *conn)
author	Tom Lane <tgl@sss.pgh.pa.us>	2019-03-19 16:20:20 -0400
committer	Tom Lane <tgl@sss.pgh.pa.us>	2019-03-19 16:20:28 -0400
commit	1f39a1c0641531e0462a4822f2dba904c5d4d699 (patch)
tree	125928ffda2fbd24e09508e4def3bbc1cc15b70c /src/interfaces/libpq/fe-misc.c
parent	5e28b778bf9a5835e702277119c5f92b4dbab45e (diff)
download	postgresql-1f39a1c0641531e0462a4822f2dba904c5d4d699.tar.gz postgresql-1f39a1c0641531e0462a4822f2dba904c5d4d699.zip