Roman Arutyunyan [Thu, 17 Feb 2022 19:38:42 +0000 (22:38 +0300)]
QUIC: fixed insertion at the end of buffer.
Previously, last buffer was tracked by keeping a pointer to the previous
chain link "next" field. When the previous buffer was split and then removed,
the pointer was no longer valid. Writing at this pointer resulted in broken
data chains.
Now last buffer is tracked by keeping a direct pointer to it.
Roman Arutyunyan [Mon, 14 Feb 2022 12:27:59 +0000 (15:27 +0300)]
QUIC: ngx_quic_buffer_t object.
The object is used instead of ngx_chain_t pointer for buffer operations like
ngx_quic_write_chain() and ngx_quic_read_chain(). These functions are renamed
to ngx_quic_write_buffer() and ngx_quic_read_buffer().
Now ngx_quic_stream_t is decoupled from ngx_connection_t in a way that it
can persist after connection is closed by application. During this period,
server is expecting stream final size from client for correct flow control.
Also, buffered output is sent to client as more flow control credit is granted.
Sergey Kandaurov [Tue, 15 Feb 2022 11:12:34 +0000 (14:12 +0300)]
QUIC: optimized datagram expansion with half-RTT tickets.
As shown in RFC 8446, section 2.2, Figure 3, and further specified in
section 4.6.1, BoringSSL releases session tickets in Application Data
(along with Finished) early, based on a precalculated client Finished
transcript, once client signalled early data in extensions.
Initially, frames are generated and stored in ctx->frames.
Next, ngx_quic_output() collects frames to be sent in ctx->sending.
On failure, ngx_quic_revert_send() returns frames into ctx->frames.
On success, ngx_quic_commit_send() moves ack-eliciting frames into
ctx->sent and frees non-ack-eliciting frames.
This function also updates in-flight bytes counter, so only actually sent
frames are accounted.
The counter is decremented in the following cases:
- acknowledgment is received
- packet was declared lost
- we are discarding context completely
In each of these cases the frame is removed from the ctx->sent queue and the
in-flight counter is decremented accordingly.
The patch fixes the case of discarding context - only removing frames
from ctx->sent must be followed by in-flight bytes counter decrement,
otherwise cg->in_flight could experience type underflow.
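The accounting rule above can be illustrated with a minimal sketch. The types
and names below (frame_t, send_ctx_t, discard_ctx) are hypothetical stand-ins,
not the nginx structures. The sketch only shows that frames which were never
sent must not be subtracted from the in-flight counter when a context is
discarded.

```c
#include <assert.h>
#include <stddef.h>

typedef struct frame_s  frame_t;

/* Hypothetical frame: only its length and list linkage matter here. */
struct frame_s {
    size_t    len;
    frame_t  *next;
};

/* Hypothetical send context with two of the queues described above. */
typedef struct {
    frame_t  *frames;   /* generated, never sent: not counted in flight */
    frame_t  *sent;     /* sent, awaiting ack: counted in flight */
} send_ctx_t;

/* Discard a context and return the updated in-flight byte counter. */
static size_t
discard_ctx(send_ctx_t *ctx, size_t in_flight)
{
    frame_t  *f;

    /* unsent frames were never added to the counter, so just drop them */
    ctx->frames = NULL;

    /* only frames removed from the sent queue decrement the counter */
    for (f = ctx->sent; f != NULL; f = f->next) {
        assert(in_flight >= f->len);    /* would underflow otherwise */
        in_flight -= f->len;
    }

    ctx->sent = NULL;

    return in_flight;
}
```

Subtracting the lengths of the unsent frames as well would make an unsigned
counter wrap around, which is the underflow described above.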
The cd8018bc81a5 fixed unintended send of non-padded initial packets,
but failed to restore context properly: only processed contexts need
to be restored. As a consequence, a packet number could be restored
from uninitialized value.
Previously, the flag could be reset after send_chain() with a limit, even
though there was room for more data. The application then started waiting for
a write event notification, which never happened.
Now the wev->ready flag is only reset when flow control is exhausted.
SSL: logging level of "application data after close notify".
Such fatal errors are reported by OpenSSL 1.1.1, and similarly by BoringSSL,
if application data is encountered during SSL shutdown, which started to be
observed on the second SSL_shutdown() call after SSL shutdown fixes made in
09fb2135a589 (1.19.2). The error means that the client continues to send
application data after receiving the "close_notify" alert (ticket #2318).
Previously it was reported as SSL_shutdown() error of SSL_ERROR_SYSCALL.
With large http2_max_concurrent_streams or http2_max_concurrent_pushes, more
than 255 ngx_http_v2_node_t structures might be allocated, eventually leading
to h2c->closed_nodes overflow when closing corresponding streams. This will
in turn result in additional allocations in ngx_http_v2_get_node_by_id().
While mostly harmless, it can result in excessive memory usage by an HTTP/2
connection, notably in configurations with many keepalive_requests allowed.
Fix is to use ngx_uint_t for h2c->closed_nodes instead of unsigned:8.
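The wraparound can be reproduced in isolation. This is a minimal sketch with
hypothetical structures (narrow_counter_t, wide_counter_t), not the actual
h2c definition. The wide variant uses uintptr_t, which is what ngx_uint_t is
defined as in nginx.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the counter before the fix. */
typedef struct {
    unsigned   closed_nodes:8;   /* 8-bit field: 255 + 1 wraps to 0 */
} narrow_counter_t;

/* Hypothetical stand-in for the counter after the fix. */
typedef struct {
    uintptr_t  closed_nodes;     /* full-width counter */
} wide_counter_t;

/* Increment the 8-bit counter n times and return its final value. */
static unsigned
count_narrow(unsigned n)
{
    narrow_counter_t  c = { 0 };
    unsigned          i;

    for (i = 0; i < n; i++) {
        c.closed_nodes++;
    }

    return c.closed_nodes;
}

/* Increment the full-width counter n times and return its final value. */
static uintptr_t
count_wide(unsigned n)
{
    wide_counter_t  c = { 0 };
    unsigned        i;

    for (i = 0; i < n; i++) {
        c.closed_nodes++;
    }

    return c.closed_nodes;
}
```

With 256 closed nodes the 8-bit field reads as zero again, so code relying on
it would conclude that no closed nodes are available for reuse.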
The switch happens when received byte counter reaches stream final size.
Previously, this state was skipped. The stream went from SIZE_KNOWN to
DATA_READ when all bytes were read by application.
The change prevents STOP_SENDING frames from being sent when all data is
received from client, but not yet fully read by application.
QUIC: improved size calculation in ngx_quic_write_chain().
Previously, size was calculated based on the number of input bytes processed
by the function. Now only the copied bytes are considered. This prevents
overlapping buffers from contributing twice to the overall written size.
Maxim Dounin [Wed, 2 Feb 2022 22:44:38 +0000 (01:44 +0300)]
HTTP/2: made it possible to flush response headers (ticket #1743).
Response headers can be buffered in the SSL buffer. But stream's fake
connection buffered flag did not reflect this, so any attempts to flush
the buffer without sending additional data were stopped by the write filter.
It does not seem to be possible to reflect this in fc->buffered though, as
we never know if main connection's c->buffered corresponds to the particular
stream or not. As such, fc->buffered might prevent request finalization
due to sending data on some other stream.
Fix is to implement handling of flush buffers when the c->need_flush_buf
flag is set, similarly to the existing last buffer handling. The same
flag is now used for UDP sockets in the stream module instead of explicit
checking of c->type.
QUIC: fixed padding of initial packets in case of limited path.
Previously, non-padded initial packet could be sent as a result of the
following situation:
- initial queue is not empty (so padding to 1200 is required)
- handshake queue is not empty (so padding is to be added after h/s packet)
- path is limited
If serializing handshake packet would violate path limit, such packet was
omitted, and the non-padded initial packet was sent.
The fix is to avoid sending the packet at all in such case. This follows the
original intention introduced in c5155a0cb12f.
Maxim Dounin [Tue, 1 Feb 2022 13:29:28 +0000 (16:29 +0300)]
Cache: fixed race in ngx_http_file_cache_forced_expire().
During configuration reload two cache managers might exist for a short
time. If both tried to delete the same cache node, the "ignore long locked
inactive cache entry" alert appeared in logs. Additionally,
ngx_http_file_cache_forced_expire() might be also called by worker
processes, with similar results.
Fix is to ignore cache nodes being deleted, similarly to how it is
done in ngx_http_file_cache_expire() since 3755:76e3a93821b1. This
was somehow missed in 7002:ab199f0eb8e8, when ignoring long locked
cache entries was introduced in ngx_http_file_cache_forced_expire().
- wording in log->action is adjusted to match function names.
- connection close steps are made obvious and start with "quic close" prefix:
*1 quic close initiated rc:-4
*1 quic close silent drain:0 timedout:1
*1 quic close resumed rc:-1
*1 quic close resumed rc:-1
*1 quic close resumed rc:-4
*1 quic close completed
this makes it easy to understand whether a particular "close" record is the
initial cause, part of a lasting process, or the final one.
- cases of close without quic connection now logged as "packet rejected":
*14 quic run
*14 quic packet rx long flags:ec version:1
*14 quic packet rx hs len:61
*14 quic packet rx dcid len:20 00000000000002c32f60e4aa2b90a64a39dc4228
*14 quic packet rx scid len:8 81190308612cd019
*14 quic expected initial, got handshake
*14 quic packet done rc:-1 level:hs decr:0 pn:0 perr:0
*14 quic packet rejected rc:-1, cleanup connection
*14 reusable connection: 0
this makes it easy to spot early packet rejection and avoid confusing it with
quic connection closing (the connection in fact was not even created).
- packet processing summary now uses same prefix "quic packet done rc:"
- added debug to places where packet was rejected without any reason logged
Roman Arutyunyan [Mon, 31 Jan 2022 06:46:30 +0000 (09:46 +0300)]
HTTP/3: proper uni stream closure detection.
Previously, closure detection for server-initiated uni streams was not properly
implemented. Instead, HTTP/3 code relied on QUIC code posting the read event
and setting rev->error when it needed to close the stream. Then, regular
uni stream read handler called c->recv() and received error, which closed the
stream. This was an ad-hoc solution. If, for whatever reason, the read
handler was called earlier, c->recv() would return 0, which would also close
the stream.
Now server-initiated uni streams have a separate read event handler for
tracking stream closure. The handler calls c->recv(), which normally returns
0, but may return error in case of closure.
Sending the instruction is delayed until the end of the current event cycle.
Delaying the instruction is allowed by quic-qpack-21, section 2.2.2.3.
The goal is to reduce the amount of data sent back to client by accumulating
several inserts in one instruction and sometimes not sending the instruction at
all, if Section Acknowledgement was sent just before it.
Roman Arutyunyan [Mon, 31 Jan 2022 06:16:47 +0000 (09:16 +0300)]
QUIC: allowed main QUIC connection for some operations.
Operations like ngx_quic_open_stream(), ngx_http_quic_get_connection(),
ngx_http_v3_finalize_connection(), ngx_http_v3_shutdown_connection() used to
receive a QUIC stream connection. Now they can receive the main QUIC
connection as well. This is useful when calling them from a stream context.
Sergey Kandaurov [Wed, 26 Jan 2022 11:15:40 +0000 (14:15 +0300)]
QUIC: set to standard TLS codepoint after draft versions removal.
This is to ease transition with oldish BoringSSL versions,
the default for SSL_set_quic_use_legacy_codepoint() has been
flipped in BoringSSL a1d3bfb64fd7ef2cb178b5b515522ffd75d7b8c5.
Maxim Dounin [Mon, 24 Jan 2022 14:18:50 +0000 (17:18 +0300)]
SSL: always renewing tickets with TLSv1.3 (ticket #1892).
Chrome only uses TLS session tickets once with TLS 1.3, likely following
RFC 8446 Appendix C.4 recommendation. With OpenSSL, this works fine with
built-in session tickets, since these are explicitly renewed in case of
TLS 1.3 on each session reuse, but results in only two connections being
reused after an initial handshake when using ssl_session_ticket_key.
Fix is to always renew TLS session tickets in case of TLS 1.3 when using
ssl_session_ticket_key, similarly to how it is done by OpenSSL internally.
Maxim Dounin [Fri, 21 Jan 2022 21:28:51 +0000 (00:28 +0300)]
Contrib: vim syntax adjusted to save cpoptions (ticket #2276).
Line continuation as used in the syntax file might be broken if "compatible"
is set or "C" is added to cpoptions. Fix is to set the "cpoptions" option
to vim default value at script start and restore it later, see
":help use-cpo-save".
Roman Arutyunyan [Tue, 25 Jan 2022 06:45:50 +0000 (09:45 +0300)]
QUIC: fixed chain returned from ngx_quic_write_chain().
Previously, when input ended on a QUIC buffer boundary, input chain was not
advanced to the next buffer. As a result, ngx_quic_write_chain() returned
a chain with an empty buffer instead of NULL. This broke HTTP write filter,
preventing it from closing the HTTP request, which eventually timed out.
Now input chain is always advanced to a buffer that has data, before checking
QUIC buffer boundary condition.
Vladimir Homutov [Thu, 20 Jan 2022 19:00:25 +0000 (22:00 +0300)]
QUIC: additional limit for probing packets.
RFC 9000, 9.3. Responding to Connection Migration:
An endpoint only changes the address to which it sends packets in
response to the highest-numbered non-probing packet.
The patch extends this requirement to probing packets. Although it may
seem excessive, it helps with mitigation of replay attacks (when an off-path
attacker has copied packet with PATH_CHALLENGE and uses different
addresses to exhaust available connection ids).
Vladimir Homutov [Wed, 19 Jan 2022 19:39:24 +0000 (22:39 +0300)]
QUIC: reworked migration handling.
The quic connection now holds active, backup and probe paths instead
of sockets. The number of migration paths is now limited and cannot
be inflated by a bad client or an attacker.
The client id is now associated with path rather than socket. This
simplifies processing of output and connection ids handling.
New migration abandons any previously started migrations. This allows
freeing consumed client ids and requesting new ones for use in future
migrations, and making progress when the connection id limit is hit during
migration.
A path now can be revalidated without losing its state.
The patch also fixes various issues with NAT rebinding case handling:
- paths are now validated (previously, there was no validation
and paths were left in limited state)
- attempt to reuse id on different path is now again verified
(this was broken in 40445fc7c403)
- former path is now validated in case of apparent migration
Roman Arutyunyan [Mon, 17 Jan 2022 11:39:04 +0000 (14:39 +0300)]
QUIC: introduced function ngx_quic_split_chain().
The function splits a buffer at given offset. The function is now
called from ngx_quic_read_chain() and ngx_quic_write_chain(), which
simplifies both functions.
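Splitting a buffer at an offset can be sketched as follows. This is only an
illustration under stated assumptions: link_t and split_link() are
hypothetical, the new link is caller-provided instead of being allocated, and
the real ngx_quic_split_chain() may differ in signature and in how it links
the resulting parts.

```c
#include <assert.h>
#include <stddef.h>

typedef struct link_s  link_t;

/* Hypothetical chain link describing a contiguous range of bytes. */
struct link_s {
    const char  *pos;    /* start of data */
    const char  *last;   /* end of data (one past the last byte) */
    link_t      *next;
};

/*
 * Split "cl" at "off" bytes from its start.  "tail" is caller-provided
 * storage for the new link.  Returns the link holding the bytes after
 * the offset, or NULL if the offset does not fall inside the buffer.
 */
static link_t *
split_link(link_t *cl, size_t off, link_t *tail)
{
    size_t  size;

    assert(cl != NULL && tail != NULL);

    size = (size_t) (cl->last - cl->pos);

    if (off == 0 || off >= size) {
        return NULL;
    }

    /* the tail covers [pos + off, last) and inherits the old successor */
    tail->pos = cl->pos + off;
    tail->last = cl->last;
    tail->next = cl->next;

    /* the head is truncated to [pos, pos + off) */
    cl->last = cl->pos + off;
    cl->next = tail;

    return tail;
}
```

No data is copied: both links point into the same underlying memory, which is
why a shared helper of this kind is convenient for both readers and writers.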
Sergey Kandaurov [Thu, 13 Jan 2022 12:57:21 +0000 (15:57 +0300)]
QUIC: removed ngx_send_lowat() check for QUIC connections.
After 9ae239d2547d, ngx_quic_handle_write_event() no longer runs into
ngx_send_lowat() for QUIC connections, so the check became excessive.
It is assumed that external modules operating with SO_SNDLOWAT
(I'm not aware of any) should do this check on their own.
Roman Arutyunyan [Thu, 13 Jan 2022 08:23:53 +0000 (11:23 +0300)]
QUIC: fixed handling stream input buffers.
Previously, ngx_quic_write_chain() treated each input buffer as a memory
buffer, which is not always the case. Special buffers were not skipped, which
is especially important when hitting the input byte limit.
The issue manifested itself with ngx_quic_write_chain() returning a non-empty
chain consisting of a special last_buf buffer when called from QUIC stream
send_chain(). In order for this to happen, input byte limit should be equal to
the chain length, and the input chain should end with an empty last_buf buffer.
An easy way to achieve this is the following:
    location /empty {
        return 200;
    }
When this non-empty chain was returned from send_chain(), it signalled to the
caller that input was blocked, while in fact it wasn't. This prevented HTTP
request from finalization, which prevented QUIC from sending STREAM FIN to
the client. The QUIC stream was then reset after a timeout.
Now special buffers are skipped and send_chain() returns NULL in the case
above, which signals to the caller a successful operation.
Also, original byte limit is now passed to ngx_quic_write_chain() from
send_chain() instead of actual chain length to make sure it's never zero.
Roman Arutyunyan [Tue, 11 Jan 2022 15:57:02 +0000 (18:57 +0300)]
QUIC: fixed handling STREAM FIN.
Previously, when a STREAM FIN frame with no data bytes was received after all
prior stream data were already read by the application layer, the frame was
ignored and eof was not reported to the application.
Maxim Dounin [Mon, 10 Jan 2022 23:23:49 +0000 (02:23 +0300)]
Avoid sending "Connection: keep-alive" when shutting down.
When a worker process is shutting down, keepalive is not used: this is checked
before the ngx_http_set_keepalive() call in ngx_http_finalize_connection().
Yet the "Connection: keep-alive" header was still sent, even if we know that
the worker process is shutting down, potentially resulting in additional
requests being sent to the connection which is going to be closed anyway.
While clients are expected to be able to handle asynchronous close events
(see ticket #1022), it is certainly possible to send the "Connection: close"
header instead, informing the client that the connection is going to be closed
and potentially saving some unneeded work.
With this change, we additionally check for worker process shutdown just
before sending response headers, and disable keepalive accordingly.
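The resulting decision can be sketched as a small function. The function and
parameter names are hypothetical and do not correspond to nginx internals.
The sketch only shows that the shutdown state overrides the client's
keepalive request before the header value is chosen.

```c
#include <assert.h>
#include <string.h>

/*
 * Choose the "Connection" header value.  "keepalive_requested" is nonzero
 * if keepalive would otherwise be used, "worker_exiting" is nonzero if the
 * worker process is shutting down.  Both parameters are hypothetical.
 */
static const char *
connection_header(int keepalive_requested, int worker_exiting)
{
    /* a worker that is shutting down never keeps the connection open */
    if (worker_exiting) {
        keepalive_requested = 0;
    }

    return keepalive_requested ? "keep-alive" : "close";
}
```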
Maxim Dounin [Wed, 29 Dec 2021 22:08:46 +0000 (01:08 +0300)]
Events: fixed balancing between workers with EPOLLEXCLUSIVE.
Linux with EPOLLEXCLUSIVE usually notifies only the process which was first
to add the listening socket to the epoll instance. As a result most of the
connections are handled by the first worker process (ticket #2285). To fix
this, we re-add the socket periodically, so other workers will get a chance
to accept connections.
Maxim Dounin [Mon, 27 Dec 2021 16:49:26 +0000 (19:49 +0300)]
Support for sendfile(SF_NOCACHE).
The SF_NOCACHE flag, introduced in FreeBSD 11 along with the new non-blocking
sendfile() implementation by glebius@, makes it possible to use sendfile()
along with the "directio" directive.
Maxim Dounin [Mon, 27 Dec 2021 16:48:33 +0000 (19:48 +0300)]
Simplified sendfile(SF_NODISKIO) usage.
Starting with FreeBSD 11, there is no need to use AIO operations to preload
data into cache for sendfile(SF_NODISKIO) to work. Instead, sendfile()
handles non-blocking loading data from disk by itself. It still can, however,
return EBUSY if a page is already being loaded (for example, by a different
process). If this happens, we now post an event for the next event loop
iteration, so sendfile() is retried "after a short period", as manpage
recommends.
The limit of the number of EBUSY tolerated without any progress is preserved,
but now it does not result in an alert, since on an idle system event loop
iteration might be very short and EBUSY can happen many times in a row.
Instead, SF_NODISKIO is simply disabled for one call once the limit is
reached.
With this change, sendfile(SF_NODISKIO) is now used automatically as long as
sendfile() is enabled, and no longer requires "aio on;".
Vladimir Homutov [Mon, 27 Dec 2021 10:49:56 +0000 (13:49 +0300)]
QUIC: got rid of ngx_quic_create_temp_socket().
It was mostly a copy of ngx_quic_listen(). Now ngx_quic_listen() no
longer generates server id and increments seqnum. Instead, the server
id is generated when the socket is created.
The ngx_quic_alloc_socket() function is renamed to ngx_quic_create_socket().
Maxim Dounin [Fri, 24 Dec 2021 22:07:18 +0000 (01:07 +0300)]
Core: added NGX_REGEX_MULTILINE for 3rd party modules.
Notably, NAXSI is known to misuse ngx_regex_compile() with rc.options set
to PCRE_CASELESS | PCRE_MULTILINE. With PCRE2 support, and notably binary
compatibility changes, it is no longer possible to set PCRE[2]_MULTILINE
option without using proper interface. To facilitate correct usage,
this change adds the NGX_REGEX_MULTILINE option.
Maxim Dounin [Fri, 24 Dec 2021 22:07:16 +0000 (01:07 +0300)]
PCRE2 and PCRE binary compatibility.
With this change, dynamic modules using nginx regex interface can be used
regardless of the variant of the PCRE library nginx was compiled with.
If a module is compiled with different PCRE library variant, in case of
ngx_regex_exec() errors it will report wrong function name in error
messages. This is believed to be tolerable, given that fixing this will
require interface changes.
Maxim Dounin [Fri, 24 Dec 2021 22:07:15 +0000 (01:07 +0300)]
PCRE2 library support.
The PCRE2 library is now used by default if found, instead of the
original PCRE library. If needed for some reason, this can be disabled
with the --without-pcre2 configure option.
To make it possible to specify paths to the library and include files
via --with-cc-opt / --with-ld-opt, the library is first tested without
any additional paths and options. If this fails, the pcre2-config script
is used.
Similarly to the original PCRE library, it is now possible to build PCRE2
from sources with nginx configure, by using the --with-pcre= option.
It automatically detects if PCRE or PCRE2 sources are provided.
Note that compiling PCRE2 10.33 and later requires inttypes.h. When
compiling on Windows with MSVC, inttypes.h is only available starting
with MSVC 2013. In older versions some replacement needs to be provided
("echo '#include <stdint.h>' > pcre2-10.xx/src/inttypes.h" is good enough
for MSVC 2010).
Maxim Dounin [Fri, 24 Dec 2021 22:07:10 +0000 (01:07 +0300)]
Core: fixed ngx_pcre_studies cleanup.
If a configuration parsing fails for some reason, ngx_regex_module_init()
is not called, and ngx_pcre_studies remained set despite the fact that
the pool it was allocated from is already freed. This might result in
a segmentation fault during runtime regular expression compilation, such
as in SSI, for example, in the single process mode, or if a worker process
died and was respawned from a master process in such an inconsistent state.
Fix is to clear ngx_pcre_studies from the pool cleanup handler (which is
anyway used to free JIT-compiled patterns).
Roman Arutyunyan [Fri, 24 Dec 2021 15:39:22 +0000 (18:39 +0300)]
QUIC: refactored buffer allocation, splitting and freeing.
Previously, buffer lists were used to track used buffers. Now a reference
counter is used instead. The new implementation is simpler and faster with
many buffer clones.
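Reference counting of shared buffers can be sketched as follows. This is a
generic illustration using malloc(), with hypothetical names (shared_buf_t and
the shared_buf_*() functions). It is not the nginx implementation, which
allocates from connection-specific storage.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical buffer shared between several chain links. */
typedef struct {
    unsigned   refs;    /* number of links referencing the data */
    char      *data;
} shared_buf_t;

/* Allocate a buffer with a single reference; returns NULL on failure. */
static shared_buf_t *
shared_buf_create(size_t size)
{
    shared_buf_t  *b;

    b = malloc(sizeof(shared_buf_t));
    if (b == NULL) {
        return NULL;
    }

    b->data = malloc(size);
    if (b->data == NULL) {
        free(b);
        return NULL;
    }

    b->refs = 1;

    return b;
}

/* Cloning is O(1): only the counter changes, no list is maintained. */
static shared_buf_t *
shared_buf_clone(shared_buf_t *b)
{
    b->refs++;
    return b;
}

/* Drop one reference; returns 1 if the storage was actually released. */
static int
shared_buf_free(shared_buf_t *b)
{
    assert(b->refs > 0);

    if (--b->refs > 0) {
        return 0;
    }

    free(b->data);
    free(b);

    return 1;
}
```

A clone costs a single increment regardless of how many clones already exist,
which is consistent with the speedup noted above for many buffer clones.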