Roman Arutyunyan [Mon, 17 Jan 2022 11:39:04 +0000 (14:39 +0300)]
QUIC: introduced function ngx_quic_split_chain().
The function splits a buffer at a given offset. It is now called from
ngx_quic_read_chain() and ngx_quic_write_chain(), which simplifies both
functions.
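Here is a minimal sketch of what splitting at an offset means. It assumes hypothetical simplified `buf_t`/`chain_t` types instead of nginx's `ngx_buf_t`/`ngx_chain_t`, and plain malloc() instead of pool allocation. The name `split_chain()` and its signature are illustrative only, not the real ngx_quic_split_chain() interface.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for ngx_buf_t / ngx_chain_t. */
typedef struct {
    unsigned char  *pos;    /* first unread byte */
    unsigned char  *last;   /* one past the last byte */
} buf_t;

typedef struct chain_s {
    buf_t           *buf;
    struct chain_s  *next;
} chain_t;

/*
 * Split the buffer in link "cl" at "offset" bytes from its start.
 * The original link keeps [pos, pos + offset); a new link inserted
 * right after it covers [pos + offset, last).  Both buffers share
 * the same memory, so no data is copied.  Returns the new link,
 * or NULL on allocation failure.
 */
static chain_t *
split_chain(chain_t *cl, size_t offset)
{
    buf_t    *b;
    chain_t  *tail;

    assert(offset <= (size_t) (cl->buf->last - cl->buf->pos));

    b = malloc(sizeof(buf_t));
    tail = malloc(sizeof(chain_t));
    if (b == NULL || tail == NULL) {
        free(b);
        free(tail);
        return NULL;
    }

    b->pos = cl->buf->pos + offset;
    b->last = cl->buf->last;
    cl->buf->last = b->pos;       /* head now ends at the split point */

    tail->buf = b;
    tail->next = cl->next;
    cl->next = tail;

    return tail;
}
```

Both halves still point into the original memory, so the caller can pass the head on and keep the tail without copying any data.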
Sergey Kandaurov [Thu, 13 Jan 2022 12:57:21 +0000 (15:57 +0300)]
QUIC: removed ngx_send_lowat() check for QUIC connections.
After 9ae239d2547d, ngx_quic_handle_write_event() no longer runs into
ngx_send_lowat() for QUIC connections, so the check became excessive.
It is assumed that external modules operating with SO_SNDLOWAT
(I'm not aware of any) should do this check on their own.
Roman Arutyunyan [Thu, 13 Jan 2022 08:23:53 +0000 (11:23 +0300)]
QUIC: fixed handling stream input buffers.
Previously, ngx_quic_write_chain() treated each input buffer as a memory
buffer, which is not always the case. Special buffers were not skipped, which
is especially important when hitting the input byte limit.
The issue manifested itself with ngx_quic_write_chain() returning a non-empty
chain consisting of a special last_buf buffer when called from QUIC stream
send_chain(). For this to happen, the input byte limit must be equal to
the chain length, and the input chain must end with an empty last_buf buffer.
An easy way to achieve this is the following:
    location /empty {
        return 200;
    }
When this non-empty chain was returned from send_chain(), it signalled to the
caller that output was blocked, while in fact it wasn't. This prevented the
HTTP request from being finalized, which in turn prevented QUIC from sending
STREAM FIN to the client. The QUIC stream was then reset after a timeout.
Now special buffers are skipped and send_chain() returns NULL in the case
above, which signals a successful operation to the caller.
Also, the original byte limit is now passed to ngx_quic_write_chain() from
send_chain() instead of the actual chain length, to make sure it's never zero.
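The fix can be illustrated with a simplified sketch. It assumes hypothetical `sbuf_t`/`slink_t` types in which any buffer without data (such as an empty last_buf) counts as special. The hypothetical `write_chain()` does not have the real ngx_quic_write_chain() signature. It only shows why skipping special buffers lets the whole chain be consumed when the limit equals the data length and the chain ends with an empty last_buf.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified buffer: a memory range plus a last_buf flag. */
typedef struct {
    const char  *pos;
    const char  *last;
    unsigned     last_buf:1;   /* marker only, carries no data */
} sbuf_t;

typedef struct slink_s {
    sbuf_t          *buf;
    struct slink_s  *next;
} slink_t;

/* In this sketch, any buffer without data counts as "special". */
static int
is_special(const sbuf_t *b)
{
    assert(b->pos <= b->last);

    return b->pos == b->last;
}

/*
 * Consume up to "limit" bytes from "in" and return the unconsumed part,
 * or NULL if everything was consumed.  Special buffers are skipped even
 * after the limit is exhausted, so a trailing empty last_buf does not
 * make the caller think output is blocked.
 */
static slink_t *
write_chain(slink_t *in, size_t limit)
{
    size_t  n;

    for ( /* void */ ; in; in = in->next) {

        if (is_special(in->buf)) {
            continue;
        }

        if (limit == 0) {
            break;             /* real data left, but no budget */
        }

        n = (size_t) (in->buf->last - in->buf->pos);
        if (n > limit) {
            n = limit;
        }

        in->buf->pos += n;
        limit -= n;

        if (in->buf->pos != in->buf->last) {
            break;             /* partially consumed buffer stays queued */
        }
    }

    return in;
}
```

With 5 bytes of data followed by an empty last_buf and a limit of 5, the function returns NULL, which a send_chain() caller treats as success.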
Roman Arutyunyan [Tue, 11 Jan 2022 15:57:02 +0000 (18:57 +0300)]
QUIC: fixed handling STREAM FIN.
Previously, when a STREAM FIN frame with no data bytes was received after all
prior stream data were already read by the application layer, the frame was
ignored and eof was not reported to the application.
Maxim Dounin [Mon, 27 Dec 2021 16:49:26 +0000 (19:49 +0300)]
Support for sendfile(SF_NOCACHE).
The SF_NOCACHE flag, introduced in FreeBSD 11 along with the new non-blocking
sendfile() implementation by glebius@, makes it possible to use sendfile()
along with the "directio" directive.
Maxim Dounin [Mon, 27 Dec 2021 16:48:33 +0000 (19:48 +0300)]
Simplified sendfile(SF_NODISKIO) usage.
Starting with FreeBSD 11, there is no need to use AIO operations to preload
data into cache for sendfile(SF_NODISKIO) to work. Instead, sendfile()
handles non-blocking loading of data from disk by itself. It can still,
however, return EBUSY if a page is already being loaded (for example, by a
different process). If this happens, we now post an event for the next event
loop iteration, so sendfile() is retried "after a short period", as the
manpage recommends.
The limit on the number of EBUSY errors tolerated without any progress is
preserved, but reaching it no longer results in an alert, since on an idle
system an event loop iteration might be very short and EBUSY can happen many
times in a row. Instead, SF_NODISKIO is simply disabled for one call once the
limit is reached.
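The retry policy described above can be sketched as follows. The sketch assumes a hypothetical per-connection counter and an `EBUSY_LIMIT` of 3 chosen only for illustration; the actual nginx limit and field names may differ. It does not call sendfile() itself. It only decides whether to pass SF_NODISKIO and whether to post an event for the next event loop iteration.

```c
#include <assert.h>

/* Hypothetical tolerance, for illustration only. */
#define EBUSY_LIMIT  3

typedef struct {
    int  busy_count;   /* consecutive EBUSY results without progress */
} sf_state_t;

/* Returns 1 if the next sendfile() call should use SF_NODISKIO. */
static int
use_nodiskio(sf_state_t *st)
{
    assert(st->busy_count >= 0);

    if (st->busy_count > EBUSY_LIMIT) {
        /* limit reached: one call without SF_NODISKIO, then start over */
        st->busy_count = 0;
        return 0;
    }

    return 1;
}

/*
 * Record the outcome of a sendfile() call.  Returns 1 if the caller
 * should post an event and retry on the next event loop iteration.
 */
static int
sendfile_result(sf_state_t *st, int got_ebusy, long sent)
{
    assert(sent >= 0);

    if (sent > 0) {
        st->busy_count = 0;        /* progress resets the counter */
    }

    if (!got_ebusy) {
        return 0;
    }

    if (sent == 0) {
        st->busy_count++;          /* EBUSY without any progress */
    }

    return 1;                      /* retry "after a short period" */
}
```

Reaching the limit is not logged as an alert here, which matches the change: it simply yields one call without SF_NODISKIO.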
With this change, sendfile(SF_NODISKIO) is now used automatically as long as
sendfile() is enabled, and no longer requires "aio on;".
Vladimir Homutov [Mon, 27 Dec 2021 10:49:56 +0000 (13:49 +0300)]
QUIC: got rid of ngx_quic_create_temp_socket().
It was mostly a copy of ngx_quic_listen(). Now ngx_quic_listen() no
longer generates the server id and increments seqnum. Instead, the server
id is generated when the socket is created.
The ngx_quic_alloc_socket() function is renamed to ngx_quic_create_socket().
Maxim Dounin [Fri, 24 Dec 2021 22:07:18 +0000 (01:07 +0300)]
Core: added NGX_REGEX_MULTILINE for 3rd party modules.
Notably, NAXSI is known to misuse ngx_regex_compile() with rc.options set
to PCRE_CASELESS | PCRE_MULTILINE. With PCRE2 support, and notably binary
compatibility changes, it is no longer possible to set the PCRE[2]_MULTILINE
option without using the proper interface. To facilitate correct usage,
this change adds the NGX_REGEX_MULTILINE option.
Maxim Dounin [Fri, 24 Dec 2021 22:07:16 +0000 (01:07 +0300)]
PCRE2 and PCRE binary compatibility.
With this change, dynamic modules using nginx regex interface can be used
regardless of the variant of the PCRE library nginx was compiled with.
If a module is compiled with a different PCRE library variant, in case of
ngx_regex_exec() errors it will report the wrong function name in error
messages. This is believed to be tolerable, given that fixing this would
require interface changes.
Maxim Dounin [Fri, 24 Dec 2021 22:07:15 +0000 (01:07 +0300)]
PCRE2 library support.
The PCRE2 library is now used by default if found, instead of the
original PCRE library. If needed for some reason, this can be disabled
with the --without-pcre2 configure option.
To make it possible to specify paths to the library and include files
via --with-cc-opt / --with-ld-opt, the library is first tested without
any additional paths and options. If this fails, the pcre2-config script
is used.
Similarly to the original PCRE library, it is now possible to build PCRE2
from sources with nginx configure, by using the --with-pcre= option.
It automatically detects if PCRE or PCRE2 sources are provided.
Note that compiling PCRE2 10.33 and later requires inttypes.h. When
compiling on Windows with MSVC, inttypes.h is only available starting
with MSVC 2013. In older versions some replacement needs to be provided
("echo '#include <stdint.h>' > pcre2-10.xx/src/inttypes.h" is good enough
for MSVC 2010).
Maxim Dounin [Fri, 24 Dec 2021 22:07:10 +0000 (01:07 +0300)]
Core: fixed ngx_pcre_studies cleanup.
If configuration parsing fails for some reason, ngx_regex_module_init()
is not called, and ngx_pcre_studies remained set even though the pool it
was allocated from had already been freed. This might result in a
segmentation fault during runtime regular expression compilation, such as
in SSI: for example, in single process mode, or if a worker process died
and was respawned from a master process in such an inconsistent state.
Fix is to clear ngx_pcre_studies from the pool cleanup handler (which is
anyway used to free JIT-compiled patterns).
Roman Arutyunyan [Fri, 24 Dec 2021 15:39:22 +0000 (18:39 +0300)]
QUIC: refactored buffer allocation, splitting and freeing.
Previously, buffer lists were used to track used buffers. Now a reference
counter is used instead. The new implementation is simpler and faster with
many buffer clones.
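Here is a minimal reference-counting sketch. It assumes hypothetical `shared_mem_t`/`rbuf_t` types and malloc()/free() rather than nginx pools and the actual QUIC buffer code. Cloning a buffer only increments the counter. The memory is released when the last reference goes away, so no list of used buffers has to be maintained or walked.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical shared storage with a reference counter. */
typedef struct {
    size_t         refs;
    size_t         size;
    unsigned char  data[];     /* flexible array member */
} shared_mem_t;

/* Hypothetical buffer view referencing part of the shared storage. */
typedef struct {
    shared_mem_t   *mem;
    unsigned char  *pos;
    unsigned char  *last;
} rbuf_t;

static int
rbuf_alloc(rbuf_t *b, size_t size)
{
    shared_mem_t  *m;

    m = malloc(sizeof(shared_mem_t) + size);
    if (m == NULL) {
        return -1;
    }

    m->refs = 1;
    m->size = size;

    b->mem = m;
    b->pos = m->data;
    b->last = m->data + size;

    return 0;
}

/* A clone shares memory with the original; only the counter changes. */
static void
rbuf_clone(rbuf_t *dst, const rbuf_t *src)
{
    *dst = *src;
    dst->mem->refs++;
}

/* Returns 1 if this was the last reference and memory was released. */
static int
rbuf_free(rbuf_t *b)
{
    assert(b->mem->refs > 0);

    if (--b->mem->refs == 0) {
        free(b->mem);
        b->mem = NULL;
        return 1;
    }

    b->mem = NULL;
    return 0;
}
```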
Roman Arutyunyan [Fri, 24 Dec 2021 15:17:23 +0000 (18:17 +0300)]
QUIC: refactored ngx_quic_order_bufs() and ngx_quic_split_bufs().
They are replaced with ngx_quic_write_chain() and ngx_quic_read_chain().
These functions represent the API to data buffering.
The first function adds data of a given size at a given offset to the buffer.
Now it returns the unwritten part of the chain, similar to c->send_chain().
The second function returns data of a given size from the beginning of the buffer.
Its second argument and return value are swapped compared to
ngx_quic_split_bufs() to better match ngx_quic_write_chain().
Added, returned and stored data are regular ngx_chain_t/ngx_buf_t chains.
Missing data is marked with b->sync flag.
The functions are now used in both send and recv data chains in QUIC streams.
Roman Arutyunyan [Fri, 24 Dec 2021 15:13:51 +0000 (18:13 +0300)]
QUIC: avoid excessive buffer allocations in stream output.
Previously, when a few bytes were sent to a QUIC stream by the application, a
4K buffer was allocated for these bytes. Then a STREAM frame was created and
that entire buffer was used as data for that frame. The frame and its buffer
remained in use until the frame was acked by the client. Meanwhile, when more
bytes were sent to the stream, more buffers were allocated and assigned as
data to newer STREAM frames. In this scenario most buffer memory is unused.
Now the unused part of the stream output buffer is available for further
stream output while earlier parts of the buffer are waiting to be acked.
This is achieved by splitting the output buffer.
Vladimir Homutov [Mon, 13 Dec 2021 06:48:33 +0000 (09:48 +0300)]
QUIC: decoupled path state and limitation status.
The path validation status and the anti-amplification limit status are
actually two different variables. It is possible that a path being validated
should not be limited (for example, when re-validating a former path).
Vladimir Homutov [Mon, 13 Dec 2021 14:27:29 +0000 (17:27 +0300)]
QUIC: improved path validation.
Previously, a path was considered valid during an arbitrarily selected
10-minute timeout after validation. This is not quite what RFC 9000 says;
the relevant part is:
An endpoint MAY skip validation of a peer address if that
address has been seen recently.
The patch considers a path to be 'recently seen' if packets were received
on it within the idle timeout. If a packet is received from a path that was
not seen recently, the path is considered new, and anti-amplification
restrictions apply.
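The "recently seen" rule can be sketched as a simple timestamp check. The `path_t` type, millisecond clock and field names are assumptions made for illustration; they are not nginx's ngx_quic_path_t.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t msec_t;   /* hypothetical millisecond clock type */

/* Hypothetical per-path state. */
typedef struct {
    msec_t    last_seen;   /* time a packet was last received on the path */
    unsigned  validated:1;
} path_t;

/* A path is "recently seen" if a packet arrived within the idle timeout. */
static int
path_recently_seen(const path_t *p, msec_t now, msec_t idle_timeout)
{
    assert(now >= p->last_seen);

    return now - p->last_seen <= idle_timeout;
}

static void
path_on_packet(path_t *p, msec_t now, msec_t idle_timeout)
{
    if (!path_recently_seen(p, now, idle_timeout)) {
        /* treat as a new path: revalidate, amplification limit applies */
        p->validated = 0;
    }

    p->last_seen = now;
}
```

A packet 30 seconds after the previous one, with a 30-second idle timeout, keeps the path validated; one 39 seconds later does not.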
Roman Arutyunyan [Fri, 10 Dec 2021 16:43:50 +0000 (19:43 +0300)]
QUIC: simplified stream initialization.
After creation, a client stream is added to qc->streams.uninitialized queue.
After initialization it's removed from the queue. If a stream is never
initialized, it is freed in ngx_quic_close_streams(). Stream initializer
is now set as read event handler in stream connection.
Previously qc->streams.uninitialized was used only for delayed stream
initialization.
The change makes it unnecessary to handle the case of a new stream separately
in stream-related frame handlers. It makes these handlers simpler, since new
streams and existing streams are now handled by the same code.
Maxim Dounin [Thu, 25 Nov 2021 19:02:10 +0000 (22:02 +0300)]
HTTP/2: fixed sendfile() aio handling.
With sendfile() in threads ("aio threads; sendfile on;"), client connection
can block on writing, waiting for sendfile() to complete. In HTTP/2 this
might result in a request hang, since an attempt to continue processing
in thread event handler will call request's write event handler, which
is usually stopped by ngx_http_v2_send_chain(): it does nothing if there
are no additional data and stream->queued is set. Further, HTTP/2 resets
stream's c->write->ready to 0 if writing blocks, so just fixing
ngx_http_v2_send_chain() is not enough.
The following tests currently fail: h2_keepalive.t, h2_priority.t,
h2_proxy_max_temp_file_size.t, h2.t, h2_trailers.t.
Similarly, sendfile() with AIO preloading on FreeBSD can block as well,
with similar results. This is, however, harder to reproduce, especially
on modern FreeBSD systems, since sendfile() usually does not return EBUSY.
Fix is to modify ngx_http_v2_send_chain() so it actually tries to send
data to the main connection when called, and to make sure that
c->write->ready is set by the relevant event handlers.
Maxim Dounin [Thu, 25 Nov 2021 19:02:05 +0000 (22:02 +0300)]
HTTP/2: fixed "task already active" with sendfile in threads.
With sendfile in threads, "task already active" alerts might appear in logs
if a write event happens on the main HTTP/2 connection, triggering a sendfile
in threads while another thread operation is already running. Observed
with "aio threads; aio_write on; sendfile on;" and with thread event handlers
modified to post a write event to the main HTTP/2 connection (though it can
happen without any modifications).
Similarly, sendfile() with AIO preloading on FreeBSD can trigger duplicate
aio operation, resulting in "second aio post" alerts. This is, however,
harder to reproduce, especially on modern FreeBSD systems, since sendfile()
usually does not return EBUSY.
Fix is to avoid starting a sendfile operation if another thread operation
is active, by checking r->aio in the thread handler (and, similarly, in the
aio preload handler). The added check also makes the protection against
duplicate calls redundant, so it is removed.
QUIC: clear SSL_OP_ENABLE_MIDDLEBOX_COMPAT on SSL context switch.
The SSL_OP_ENABLE_MIDDLEBOX_COMPAT option is provided by QuicTLS and enabled
by default in the newly created SSL contexts. SSL_set_quic_method() is used
to clear it, which is required for SSL handshake to work on QUIC connections.
Switching context in the ngx_http_ssl_servername() SNI callback overrides SSL
options from the new SSL context. This results in the option being set again.
Fix is to explicitly clear it when switching to another SSL context.
Initially reported here (in Russian):
http://mailman.nginx.org/pipermail/nginx-ru/2021-November/063989.html
Directives that set transport parameters are removed from the configuration.
Corresponding values are derived from the quic configuration or initialized
to their defaults. Whenever possible, quic configuration parameters are taken from
higher-level protocol settings, i.e. HTTP/3.
QUIC: fixed using of retired connection id (ticket #2289).
RFC 9000 19.16
The sequence number specified in a RETIRE_CONNECTION_ID frame MUST NOT
refer to the Destination Connection ID field of the packet in which the
frame is contained.
Before the patch, the RETIRE_CONNECTION_ID frame was sent before switching
to the new client id. If the retired client id was currently in use, this
led to a violation of the spec.
c->udp->dgram may be NULL only if the quic connection was just created:
ngx_event_udp_recvmsg() passes information about datagrams to existing
connections via c->udp.
In case of a new connection, c->udp is allocated by the QUIC code during
creation of the quic connection (it uses c->sockaddr to initialize qsock->path).
Thus the check for qsock->path is excessive and can be misread as implying
that other cases are possible, which leads to warnings from the clang static
analyzer.
Sergey Kandaurov [Tue, 30 Nov 2021 11:30:59 +0000 (14:30 +0300)]
QUIC: simplified ngx_quic_send_alert() callback.
Removed sending CONNECTION_CLOSE directly to avoid duplicate frames,
since it is sent again later in SSL_do_handshake() error handling.
Accordingly, removed redundant settings of error fields that are set elsewhere.
While here, improved debug message.
Vladimir Homutov [Thu, 18 Nov 2021 11:33:21 +0000 (14:33 +0300)]
QUIC: removed unnecessary closing of active/backup sockets.
All open sockets are stored in a queue. There is no need to close some
of them separately. If active and backup happen to point to the same
socket, a double close may occur (possibly leading to a segfault).
Vladimir Homutov [Mon, 29 Nov 2021 08:51:14 +0000 (11:51 +0300)]
QUIC: fixed migration during NAT rebinding.
RFC 9000 allows a packet with a known CID to arrive from an unknown path:
These requirements regarding connection ID reuse apply only to the
sending of packets, as unintentional changes in path without a change
in connection ID are possible. For example, after a period of
network inactivity, NAT rebinding might cause packets to be sent on a
new path when the client resumes sending.
Before the patch, such packets were rejected with an error in the
ngx_quic_check_migration() function. Removing the check makes the
separate function excessive: the remaining checks are the early migration
check and the "disable_active_migration" check. The latter is a transport
parameter sent to the client, and it should not be used by the server.
The server should send "disable_active_migration" "if the endpoint does
not support active connection migration" (18.2). The support status depends
on nginx configuration: to have migration working with multiple workers,
you need the bpf helper, available on recent Linux systems. The patch does
not set "disable_active_migration" automatically and leaves it to the
administrator. By default, active migration is enabled.
RFC 9000 says that it is OK to migrate if the peer violates the
"disable_active_migration" flag requirements:
If the peer violates this requirement,
the endpoint MUST either drop the incoming packets on that path without
generating a Stateless Reset
OR
proceed with path validation and allow the peer to migrate. Generating a
Stateless Reset or closing the connection would allow third parties in the
network to cause connections to close by spoofing or otherwise manipulating
observed traffic.
So, nginx adheres to the second option and proceeds to path validation.
Note:
ngtcp2 may be used for testing both active migration and NAT rebinding:
Vladimir Homutov [Mon, 29 Nov 2021 08:49:09 +0000 (11:49 +0300)]
QUIC: refactored multiple QUIC packets handling.
A single UDP datagram may contain multiple QUIC packets. To facilitate
handling of such cases, a 'first' flag is introduced in the
ngx_quic_header_t structure.
Vladimir Homutov [Thu, 18 Nov 2021 11:19:36 +0000 (14:19 +0300)]
QUIC: fixed handling of RETIRE_CONNECTION_ID frame.
Previously, the retired socket was not closed if it didn't match
active or backup.
New sockets could not be created (due to the count limit), since the retired
socket was not closed before calling ngx_quic_create_sockets().
When replacing a retired socket, a new socket is only requested after closing
the old one, to avoid hitting the limit on the number of active connection ids.
Together with the added restrictions, this fixes an issue where the current
socket could be closed during migration, recreated and erroneously reused,
leading to a null pointer dereference.
Roman Arutyunyan [Wed, 17 Nov 2021 20:07:51 +0000 (23:07 +0300)]
QUIC: handle DATA_BLOCKED frame from client.
Previously, the frame was not handled and the connection was closed with an
error. Now, after receiving this frame, global flow control is updated and new
flow control credit is sent to the client.
Roman Arutyunyan [Wed, 17 Nov 2021 20:07:38 +0000 (23:07 +0300)]
QUIC: update stream flow control credit on STREAM_DATA_BLOCKED.
Previously, after receiving STREAM_DATA_BLOCKED, the current flow control limit
was sent to the client. Now, if the limit can be updated to the full window
size, it is updated and the new value is sent to the client; otherwise nothing
is sent. The change lets the client update flow control credit on demand. It
also saves traffic by not sending MAX_STREAM_DATA with the same value twice.
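The new STREAM_DATA_BLOCKED logic is sketched below. It uses a hypothetical `stream_fc_t` that holds the number of bytes consumed by the application, the last advertised limit and the window size. The real nginx fields and function names differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical receive-side flow control state of one stream. */
typedef struct {
    uint64_t  read;     /* bytes consumed by the application */
    uint64_t  limit;    /* MAX_STREAM_DATA value last sent to the peer */
    uint64_t  window;   /* configured receive window */
} stream_fc_t;

/*
 * On STREAM_DATA_BLOCKED: if the limit can be raised to read + window,
 * update it and return 1 (send MAX_STREAM_DATA with the new value).
 * Otherwise return 0 and send nothing; the peer already knows the limit.
 */
static int
on_stream_data_blocked(stream_fc_t *fc)
{
    uint64_t  new_limit;

    assert(fc->read <= fc->limit);

    new_limit = fc->read + fc->window;

    if (new_limit <= fc->limit) {
        return 0;
    }

    fc->limit = new_limit;
    return 1;
}
```

A second STREAM_DATA_BLOCKED with no new data consumed sends nothing, which is where the traffic saving comes from.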
Roman Arutyunyan [Thu, 11 Nov 2021 16:07:00 +0000 (19:07 +0300)]
QUIC: reject streams which we could not create.
The reasons why a stream may not be created by the server currently include
hitting the worker_connections limit and memory allocation errors. Previously,
in these cases the entire QUIC connection was closed and all its streams were
shut down. Now the new stream is rejected and existing streams continue working.
To reject an HTTP/3 request stream, RESET_STREAM and STOP_SENDING with the
H3_REQUEST_REJECTED error code are sent to the client. HTTP/3 uni streams and
streams of the Stream module are not rejected.
The variable contains the negotiated curve used for the handshake key
exchange process. Known curves are listed by their names, unknown
ones are shown in hex.
Note that for resumed sessions in TLSv1.2 and older protocols,
$ssl_curve contains the curve used during the initial handshake,
while in TLSv1.3 it contains the curve used during the session
resumption (see the SSL_get_negotiated_group manual page for
details).
The variable is only meaningful when using OpenSSL 3.0 and above.
With older versions the variable is empty.
Maxim Dounin [Fri, 29 Oct 2021 23:39:19 +0000 (02:39 +0300)]
Changed ngx_chain_update_chains() to test tag first (ticket #2248).
Without this change, aio used with HTTP/2 can result in a connection hang,
as observed with "aio threads; aio_write on;" and proxying (ticket #2248).
The problem is that HTTP/2 updates buffers outside of the output filters
(notably, marks them as sent), and then posts a write event to call
output filters. If a filter does not call the next one for some reason
(for example, because of an AIO operation in progress), this might
result in a state when the owner of a buffer already called
ngx_chain_update_chains() and can reuse the buffer, while the same buffer
is still sitting in the busy chain of some other filter.
In this particular case, a buffer was sitting in the output chain's ctx->busy
and was reused by the event pipe. The output chain's ctx->busy was permanently
blocked by it, and this resulted in a connection hang.
Fix is to change ngx_chain_update_chains() to skip buffers from other
modules unconditionally, without trying to wait for these buffers to
become empty.
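Here is a simplified sketch of the changed loop. It assumes hypothetical `tbuf_t`/`tlink_t` types in which `size` stands for the unsent byte count. Unlike the real ngx_chain_update_chains(), it does not return skipped links to a pool or reset buffer pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified buffer and chain link. */
typedef struct {
    size_t       size;   /* unsent bytes */
    const void  *tag;    /* owning module */
} tbuf_t;

typedef struct tlink_s {
    tbuf_t          *buf;
    struct tlink_s  *next;
} tlink_t;

/*
 * Move fully sent buffers owned by "tag" from *busy to *free_cl.
 * Buffers owned by other modules are dropped from *busy immediately,
 * whatever their size, so they can never block the busy chain.
 */
static void
update_chains(tlink_t **free_cl, tlink_t **busy, const void *tag)
{
    tlink_t  *cl;

    assert(free_cl != NULL && busy != NULL);

    while (*busy) {
        cl = *busy;

        if (cl->buf->tag != tag) {
            *busy = cl->next;        /* not ours: skip unconditionally */
            continue;
        }

        if (cl->buf->size != 0) {
            break;                   /* ours and still in use: stop */
        }

        *busy = cl->next;            /* ours and sent: move to free */
        cl->next = *free_cl;
        *free_cl = cl;
    }
}
```

Testing the tag first means a foreign buffer with unsent data, like the one sitting in ctx->busy in the ticket, no longer stops the loop.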
Maxim Dounin [Fri, 29 Oct 2021 17:21:57 +0000 (20:21 +0300)]
Changed default value of sendfile_max_chunk to 2m.
The "sendfile_max_chunk" directive is important to prevent worker
monopolization by fast connections. The 2m value implies a maximum delay of
200ms with 100 Mbps links, 20ms with 1 Gbps links, and 2ms with 10 Gbps
links. It also seems to be a good value for disks.
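The figures above are the chunk size divided by the link speed. In nginx size notation, 2m is 2 MiB (2097152 bytes). The hypothetical helper below computes the exact transmission time and ignores protocol overhead.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Time, in microseconds, needed to transmit "bytes" over a link of
 * "bits_per_sec", ignoring protocol overhead.  For one chunk this
 * bounds how long a single connection can occupy a worker.
 */
static uint64_t
chunk_time_usec(uint64_t bytes, uint64_t bits_per_sec)
{
    assert(bits_per_sec > 0);

    return bytes * 8 * 1000000 / bits_per_sec;
}
```

This gives about 168ms, 16.8ms and 1.7ms respectively, which the message rounds up to 200ms, 20ms and 2ms.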
Maxim Dounin [Fri, 29 Oct 2021 17:21:54 +0000 (20:21 +0300)]
Upstream: sendfile_max_chunk support.
Previously, connections to upstream servers used sendfile() if it was
enabled, but never honored sendfile_max_chunk. This might result
in worker monopolization for a long time if large request bodies
are allowed.
Maxim Dounin [Fri, 29 Oct 2021 17:21:51 +0000 (20:21 +0300)]
Fixed sendfile() limit handling on Linux.
On Linux starting with 2.6.16, sendfile() silently limits all operations
to MAX_RW_COUNT, defined as (INT_MAX & PAGE_MASK). This incorrectly
triggered the interrupt check, and resulted in a 0-sized writev() on the
next loop iteration.
Fix is to make sure the limit is always checked, so we return from the
loop if the limit is already reached, even if the number of bytes sent is
not exactly equal to the number of bytes we've tried to send.
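On a system with 4 KiB pages (an assumption, because PAGE_MASK depends on the page size), MAX_RW_COUNT is 0x7ffff000, or 2147479552 bytes. The sketch below computes this value. It also shows the fixed order of checks in a hypothetical helper `retry_after_short_send()`: the limit is tested before a short result is treated as an interrupted call.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumption: 4 KiB pages. */
#define SKETCH_PAGE_SIZE     4096L
#define SKETCH_PAGE_MASK     (~(SKETCH_PAGE_SIZE - 1))
#define SKETCH_MAX_RW_COUNT  (INT_MAX & SKETCH_PAGE_MASK)

/*
 * After a short sendfile() result, decide whether to loop again.
 * Checking the limit first prevents the extra iteration that used
 * to end in a 0-sized writev().
 */
static int
retry_after_short_send(size_t total_sent, size_t limit, int interrupted)
{
    assert(limit > 0);

    if (total_sent >= limit) {
        return 0;          /* limit reached: return from the loop */
    }

    return interrupted;    /* retry only a genuinely interrupted call */
}
```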
Maxim Dounin [Fri, 29 Oct 2021 17:21:48 +0000 (20:21 +0300)]
Simplified sendfile_max_chunk handling.
Previously, it was checked that sendfile_max_chunk was enabled and
almost whole sendfile_max_chunk was sent (see e67ef50c3176), to avoid
delaying connections where sendfile_max_chunk wasn't reached (for example,
when sending responses smaller than sendfile_max_chunk). Now we instead
check if there are unsent data, and the connection is still ready for writing.
Additionally we also check c->write->delayed to ignore connections already
delayed by limit_rate.
This approach is believed to be more robust, and correctly handles
not only sendfile_max_chunk, but also internal limits of c->send_chain(),
such as sendfile() maximum supported length (ticket #1870).
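The new condition can be sketched as a predicate over a hypothetical `wev_t` that holds the write event's ready and delayed flags. The real code checks c->write->ready and c->write->delayed on the nginx connection.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of a connection's write event state. */
typedef struct {
    unsigned  ready:1;     /* socket still writable */
    unsigned  delayed:1;   /* already delayed by limit_rate */
} wev_t;

/*
 * After c->send_chain() returns, retry on the next event loop iteration
 * only if data remain unsent while the socket is still writable (so some
 * internal limit stopped the send), and the connection is not already
 * delayed by limit_rate.
 */
static int
should_post_next(const wev_t *wev, size_t unsent)
{
    assert(wev != NULL);

    return unsent > 0 && wev->ready && !wev->delayed;
}
```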
Maxim Dounin [Fri, 29 Oct 2021 17:21:43 +0000 (20:21 +0300)]
Switched to using posted next events after sendfile_max_chunk.
Previously, a 1 millisecond delay was used instead. In certain edge cases
this might result in noticeable performance degradation though, notably on
Linux with typical CONFIG_HZ=250 (so 1ms delay becomes 4ms),
sendfile_max_chunk 2m, and link speed above 2.5 Gbps.
Using posted next events removes the artificial delay and makes processing
fast in all cases.
Roman Arutyunyan [Thu, 28 Oct 2021 11:14:25 +0000 (14:14 +0300)]
Mp4: mp4_start_key_frame directive.
The directive enables including in the result all frames from the most recent
key frame preceding the start time up to the start time. Those frames are
removed from the presentation timeline using mp4 edit lists.
Edit lists are currently supported by popular players and browsers such as
Chrome, Safari, QuickTime and ffmpeg. Among those not supporting them properly
is Firefox[1].
Based on a patch by Tracey Jaquith, Internet Archive.
The function updates the duration field of the mdhd atom. Previously it was
updated in ngx_http_mp4_read_mdhd_atom(). The change makes it possible to
alter track duration as a result of processing track frames.
Roman Arutyunyan [Mon, 18 Oct 2021 11:48:11 +0000 (14:48 +0300)]
HTTP/3: send Stream Cancellation instruction.
As per quic-qpack-21:
When a stream is reset or reading is abandoned, the decoder emits a
Stream Cancellation instruction.
Previously, the instruction was not sent. Now it's sent when closing a QUIC
stream connection if the dynamic table capacity is non-zero and eof was not
received from the client. The latter condition means that a trailers section
may still be on its way from the client and the stream needs to be cancelled.
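Per RFC 9204 (Section 4.4.2), a Stream Cancellation instruction starts with the 2-bit pattern 01, followed by the stream ID encoded as a 6-bit prefix integer (RFC 7541, Section 5.1). The sketch below encodes the instruction and also expresses the sending condition described above. All function names are hypothetical; they are not nginx's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Encode "value" as a prefixed integer with an N-bit prefix
 * (RFC 7541, Section 5.1).  "first" holds the bits above the prefix.
 * Returns the number of bytes written (at most 10 for 62-bit values).
 */
static size_t
encode_prefixed_int(uint8_t *out, uint8_t first, unsigned prefix_bits,
    uint64_t value)
{
    size_t    n = 0;
    uint64_t  max = (1u << prefix_bits) - 1;

    assert(prefix_bits >= 1 && prefix_bits <= 8);

    if (value < max) {
        out[n++] = (uint8_t) (first | value);
        return n;
    }

    out[n++] = (uint8_t) (first | max);
    value -= max;

    while (value >= 128) {
        out[n++] = (uint8_t) ((value % 128) | 0x80);   /* continuation */
        value /= 128;
    }

    out[n++] = (uint8_t) value;
    return n;
}

/* Stream Cancellation: '01' pattern, then a 6-bit prefix stream ID. */
static size_t
encode_stream_cancellation(uint8_t *out, uint64_t stream_id)
{
    return encode_prefixed_int(out, 0x40, 6, stream_id);
}

/* Send only if the dynamic table is in use and no eof was received. */
static int
need_stream_cancellation(uint64_t dyn_table_capacity, int eof_received)
{
    return dyn_table_capacity != 0 && !eof_received;
}
```

For stream 4 this produces the single byte 0x44. Stream IDs of 63 and above need continuation bytes.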
Roman Arutyunyan [Mon, 18 Oct 2021 12:47:06 +0000 (15:47 +0300)]
HTTP/3: allowed QUIC stream connection reuse.
A QUIC stream connection is treated as reusable until the first bytes of a
request arrive, which is also when the request object is now allocated. A
connection closed as a result of draining is reset with the error code
H3_REQUEST_REJECTED. Such behavior is allowed by quic-http-34:
Once a request stream has been opened, the request MAY be cancelled
by either endpoint. Clients cancel requests if the response is no
longer of interest; servers cancel requests if they are unable to or
choose not to respond.
When the server cancels a request without performing any application
processing, the request is considered "rejected." The server SHOULD
abort its response stream with the error code H3_REQUEST_REJECTED.
The client can treat requests rejected by the server as though they had
never been sent at all, thereby allowing them to be retried later.
Roman Arutyunyan [Mon, 18 Oct 2021 12:22:33 +0000 (15:22 +0300)]
HTTP/3: adjusted QUIC connection finalization.
When an HTTP/3 function returns an error in context of a QUIC stream, it's
this function's responsibility now to finalize the entire QUIC connection
with the right code, if required. Previously, QUIC connection finalization
could be done both outside and inside such functions. The new rule follows
a similar rule for logging, leads to cleaner code, and allows providing more
details about the error.
While here, a few error cases are no longer treated as fatal, and the QUIC
connection is no longer finalized in these cases. A few other cases now lead
to a stream reset instead of connection finalization.