If a client packet carrying a stream data frame is not acked due to packet loss,
the stream data is retransmitted later by client. It's also possible that the
retransmitted range is bigger than before due to more stream data being
available by then. If the original data was read out by the application,
there would be no read event triggered by the retransmitted frame, even though
it contains new data.
Sergey Kandaurov [Tue, 22 Nov 2022 14:05:35 +0000 (18:05 +0400)]
QUIC: avoid using C99 designated initializers.
They are not supported by MSVC till 2012.
SSL_QUIC_METHOD initialization is moved to run-time to preserve portability
among SSL library implementations, which allows to reduce its visibility.
Note using of a static storage to keep SSL_set_quic_method() reference valid.
Sergey Kandaurov [Tue, 22 Nov 2022 14:05:35 +0000 (18:05 +0400)]
QUIC: moved variable declaration to fix build with MSVC 2010.
Previously, ngx_quic_hkdf_t variables used declaration with assignment
in the middle of a function, which is not supported by MSVC 2010.
Fixing this also required to rewrite the ngx_quic_hkdf_set macro
and to switch to an explicit array size.
Previously, HTTP/3 stream connection didn't inherit the servername regex
from the main QUIC connection saved when processing SNI and using regular
expressions in server names. As a result, it didn't execute to set regex
captures when choosing the virtual server while parsing HTTP/3 headers.
The type field was added in 7999d3fbb765 at early stages of QUIC implementation
and was not initialized for default listen. Missing initialization resulted in
default listen socket creation error.
Sergey Kandaurov [Thu, 20 Oct 2022 12:21:07 +0000 (16:21 +0400)]
QUIC: removed compatibility with older BoringSSL API.
SSL_CIPHER_get_protocol_id() appeared in BoringSSL somewhere between
BORINGSSL_API_VERSION 12 and 13 for compatibility with OpenSSL 1.1.1.
It was adopted without a proper macro test, which remained unnoticed.
This justifies that such old BoringSSL API isn't widely used and its
support can be dropped.
While here, removed SSL_set_quic_use_legacy_codepoint() that became
useless after the default was flipped in BoringSSL over a year ago.
Sergey Kandaurov [Thu, 20 Oct 2022 12:21:06 +0000 (16:21 +0400)]
QUIC: support for setting QUIC methods with LibreSSL.
Setting QUIC methods is converted to use C99 designated initializers
for simplicity, as LibreSSL 3.6.0 has different SSL_QUIC_METHOD layout.
Additionally, only set_read_secret/set_write_secret callbacks are set.
Although they are preferred in LibreSSL over set_encryption_secrets,
better be on a safe side as LibreSSL has unexpectedly incompatible
set_encryption_secrets calling convention expressed in passing read
and write secrets split in separate calls, unlike this is documented
in old BoringSSL sources. To avoid introducing further changes for
the old API, it is simply disabled.
Sergey Kandaurov [Thu, 20 Oct 2022 12:21:06 +0000 (16:21 +0400)]
QUIC: using SSL_set_quic_early_data_enabled() only with QuicTLS.
This function is present in QuicTLS only. After SSL_READ_EARLY_DATA_SUCCESS
became visible in LibreSSL together with experimental QUIC API, this required
to revise the conditional compilation test to use more narrow macros.
Sergey Kandaurov [Thu, 20 Oct 2022 12:21:05 +0000 (16:21 +0400)]
QUIC: using native TLSv1.3 cipher suite constants.
After BoringSSL aligned[1] with OpenSSL on TLS1_3_CK_* macros, and
LibreSSL uses OpenSSL naming, our own variants can be dropped now.
Compatibility is preserved with libraries that lack these macros.
Additionally, transition to SSL_CIPHER_get_id() fixes build error
with LibreSSL that doesn't implement SSL_CIPHER_get_protocol_id().
Roman Arutyunyan [Wed, 19 Oct 2022 07:53:17 +0000 (10:53 +0300)]
Mp4: disabled duplicate atoms.
Most atoms should not appear more than once in a container. Previously,
this was not enforced by the module, which could result in worker process
crash, memory corruption and disclosure.
Maxim Dounin [Wed, 12 Oct 2022 17:14:57 +0000 (20:14 +0300)]
SSL: workaround for session timeout handling with TLSv1.3.
OpenSSL with TLSv1.3 updates the session creation time on session
resumption and keeps the session timeout unmodified, making it possible
to maintain the session forever, bypassing client certificate expiration
and revocation. To make sure session timeouts are actually used, we
now update the session creation time and reduce the session timeout
accordingly.
BoringSSL with TLSv1.3 ignores configured session timeouts and uses a
hardcoded timeout instead, 7 days. So we update session timeout to
the configured value as soon as a session is created.
Maxim Dounin [Wed, 12 Oct 2022 17:14:55 +0000 (20:14 +0300)]
SSL: optimized rotation of session ticket keys.
Instead of syncing keys with shared memory on each ticket operation,
the code now does this only when the worker is going to change expiration
of the current key, or going to switch to a new key: that is, usually
at most once per second.
To do so without races, the code maintains 3 keys: current, previous,
and next. If a worker will switch to the next key earlier, other workers
will still be able to decrypt new tickets, since they will be encrypted
with the next key.
Maxim Dounin [Wed, 12 Oct 2022 17:14:53 +0000 (20:14 +0300)]
SSL: automatic rotation of session ticket keys.
As long as ssl_session_cache in shared memory is configured, session ticket
keys are now automatically generated in shared memory, and rotated
periodically. This can be beneficial from forward secrecy point of view,
and also avoids increased CPU usage after configuration reloads.
This also helps BoringSSL to properly resume sessions in configurations
with multiple worker processes and no ssl_session_ticket_key directives,
as BoringSSL tries to automatically rotate session ticket keys and does
this independently in different worker processes, thus breaking session
resumption between worker processes.
Maxim Dounin [Wed, 12 Oct 2022 17:14:40 +0000 (20:14 +0300)]
SSL: single allocation in session cache on 32-bit platforms.
Given the present typical SSL session sizes, on 32-bit platforms it is
now beneficial to store all data in a single allocation, since rbtree
node + session id + ASN1 representation of a session takes 256 bytes of
shared memory (36 + 32 + 150 = about 218 bytes plus SNI server name).
Storing all data in a single allocation is beneficial for SNI names up to
about 40 characters long and makes it possible to store about 4000 sessions
in one megabyte (instead of about 3000 sessions now). This also slightly
simplifies the code.
Maxim Dounin [Wed, 12 Oct 2022 17:14:39 +0000 (20:14 +0300)]
SSL: explicit session id length checking.
Session ids are not expected to be longer than 32 bytes, but this is
theoretically possible with TLSv1.3, where session ids are essentially
arbitrary and sent as session tickets. Since on 64-bit platforms we
use fixed 32-byte buffer for session ids, added an explicit length check
to make sure the buffer is large enough.
Maxim Dounin [Wed, 12 Oct 2022 17:14:36 +0000 (20:14 +0300)]
SSL: reduced logging of session cache failures (ticket #621).
Session cache allocations might fail as long as the new session is different
in size from the one least recently used (and freed when the first allocation
fails). In particular, it might not be possible to allocate space for
sessions with client certificates, since they are noticeably bigger than
normal sessions.
To ensure such allocation failures won't clutter logs, logging level changed
to "warn", and logging is now limited to at most one warning per second.
Maxim Dounin [Wed, 12 Oct 2022 17:14:34 +0000 (20:14 +0300)]
SSL: disabled saving tickets to session cache.
OpenSSL tries to save TLSv1.3 sessions into session cache even when using
tickets for stateless session resumption, "because some applications just
want to know about the creation of a session". To avoid trashing session
cache with useless data, we do not save such sessions now.
Roman Arutyunyan [Wed, 12 Oct 2022 12:58:16 +0000 (16:58 +0400)]
PROXY protocol v2 TLV variables.
The variables have prefix $proxy_protocol_tlv_ and are accessible by name
and by type. Examples are: $proxy_protocol_tlv_0x01, $proxy_protocol_tlv_alpn.
Roman Arutyunyan [Mon, 10 Oct 2022 09:57:31 +0000 (13:57 +0400)]
Log only the first line of user input on PROXY protocol v1 error.
Previously, all received user input was logged. If a multi-line text was
received from client and logged, it could reduce log readability and also make
it harder to parse nginx log by scripts. The change brings to PROXY protocol
the same behavior that exists for HTTP request line in
ngx_http_log_error_handler().
Win32: fixed build on Windows with OpenSSL 3.0.x (ticket #2379).
SSL_sendfile() expects integer file descriptor as an argument, but nginx
uses OS file handles (HANDLE) to work with files on Windows, and passing
HANDLE instead of an integer correctly results in build failure. Since
SSL_sendfile() is not expected to work on Windows anyway, the code is now
disabled on Windows with appropriate compile-time checks.
Multiple C4306 warnings (conversion from 'type1' to 'type2' of greater size)
appear during 64-bit compilation with MSVC 2010 (and older) due to extensively
used constructs like "(void *) -1", so they were disabled.
In newer MSVC versions C4306 warnings were replaced with C4312 ones, and
these are not generated for such trivial type casts.
SSL: fixed incorrect usage of #if instead of #ifdef.
In 2014ed60f17f, "#if SSL_CTRL_SET_ECDH_AUTO" test was incorrectly used
instead of "#ifdef SSL_CTRL_SET_ECDH_AUTO". There is no practical
difference, since SSL_CTRL_SET_ECDH_AUTO evaluates to a non-zero numeric
value when defined, but anyway it's better to correctly test if the value
is defined.
Murilo Andrade [Tue, 9 Aug 2022 20:13:46 +0000 (17:13 -0300)]
SSL: logging level of "bad record type" errors.
The SSL_R_BAD_RECORD_TYPE ("bad record type") errors are reported by
OpenSSL 1.1.1 or newer when using TLSv1.3 if the client sends a record
with unknown or unexpected type. These errors are now logged at the
"info" level.
HTTP/3: skip empty request body buffers (ticket #2374).
When client DATA frame header and its content come in different QUIC packets,
it may happen that only the header is processed by the first
ngx_http_v3_request_body_filter() call. In this case an empty request body
buffer is added to r->request_body->bufs, which is later reused in a
subsequent ngx_http_v3_request_body_filter() call without being removed from
the body chain. As a result, rb->request_body->bufs ends up with two copies of
the same buffer.
The fix is to avoid adding empty request body buffers to r->request_body->bufs.
Events: fixed EPOLLRDHUP with FIONREAD (ticket #2367).
When reading exactly rev->available bytes, rev->available might become 0
after FIONREAD usage introduction in efd71d49bde0. On the next call of
ngx_readv_chain() on systems with EPOLLRDHUP this resulted in return without
any actions, that is, with rev->ready set, and this in turn resulted in no
timers set in event pipe, leading to socket leaks.
Fix is to reset rev->ready in ngx_readv_chain() when returning due to
rev->available being 0 with EPOLLRDHUP, much like it is already done in
ngx_unix_recv(). This ensures that if rev->available will become 0, on
systems with EPOLLRDHUP support appropriate EPOLLRDHUP-specific handling
will happen on the next ngx_readv_chain() call.
While here, also synced ngx_readv_chain() to match ngx_unix_recv() and
reset rev->ready when returning due to rev->available being 0 with kqueue.
This is mostly cosmetic change, as rev->ready is anyway reset when
rev->available is set to 0.
Range filter: clearing of pre-existing Content-Range headers.
Some servers might emit Content-Range header on 200 responses, and this
does not seem to contradict RFC 9110: as per RFC 9110, the Content-Range
header has no meaning for status codes other than 206 and 416. Previously
this resulted in duplicate Content-Range headers in nginx responses handled
by the range filter. Fix is to clear pre-existing headers.
SSL: logging levels of various errors added in OpenSSL 1.1.1.
Starting with OpenSSL 1.1.1, various additional errors can be reported
by OpenSSL in case of client-related issues, most notably during TLSv1.3
handshakes. In particular, SSL_R_BAD_KEY_SHARE ("bad key share"),
SSL_R_BAD_EXTENSION ("bad extension"), SSL_R_BAD_CIPHER ("bad cipher"),
SSL_R_BAD_ECPOINT ("bad ecpoint"). These are now logged at the "info"
level.
Maxim Dounin [Tue, 28 Jun 2022 23:47:45 +0000 (02:47 +0300)]
Upstream: optimized use of SSL contexts (ticket #1234).
To ensure optimal use of memory, SSL contexts for proxying are now
inherited from previous levels as long as relevant proxy_ssl_* directives
are not redefined.
Further, when no proxy_ssl_* directives are redefined in a server block,
we now preserve plcf->upstream.ssl in the "http" section configuration
to inherit it to all servers.
Similar changes made in uwsgi, grpc, and stream proxy.
Aleksei Bavshin [Thu, 2 Jun 2022 03:17:23 +0000 (20:17 -0700)]
Resolver: make TCP write timer event cancelable.
Similar to 70e65bf8dfd7, the change is made to ensure that the ability to
cancel resolver tasks is fully controlled by the caller. As mentioned in the
referenced commit, it is safe to make this timer cancelable because resolve
tasks can have their own timeouts that are not cancelable.
The scenario where this may become a problem is a periodic background resolve
task (not tied to a specific request or a client connection), which receives a
response with short TTL, large enough to warrant fallback to a TCP query.
With each event loop wakeup, we either have a previously set write timer
instance or schedule a new one. The non-cancelable write timer can delay or
block graceful shutdown of a worker even if the ngx_resolver_ctx_t->cancelable
flag is set by the API user, and there are no other tasks or connections.
We use the resolver API in this way to maintain the list of upstream server
addresses specified with the 'resolve' parameter, and there could be third-party
modules implementing similar logic.
Aleksei Bavshin [Mon, 23 May 2022 18:29:44 +0000 (11:29 -0700)]
Stream: don't flush empty buffers created for read errors.
When we generate the last_buf buffer for an UDP upstream recv error, it does
not contain any data from the wire. ngx_stream_write_filter attempts to forward
it anyways, which is incorrect (e.g., UDP upstream ECONNREFUSED will be
translated to an empty packet).
This happens because we mark the buffer as both 'flush' and 'last_buf', and
ngx_stream_write_filter has special handling for flush with certain types of
connections (see d127837c714f, 32b0ba4855a6). The flags are meant to be
mutually exclusive, so the fix is to ensure that flush and last_buf are not set
at the same time.
Reproduction:
stream {
upstream unreachable {
server 127.0.0.1:8880;
}
server {
listen 127.0.0.1:8998 udp;
proxy_pass unreachable;
}
}
Maxim Dounin [Tue, 7 Jun 2022 18:58:52 +0000 (21:58 +0300)]
Mp4: fixed potential overflow in ngx_http_mp4_crop_stts_data().
Both "count" and "duration" variables are 32-bit, so their product might
potentially overflow. It is used to reduce 64-bit start_time variable,
and with very large start_time this can result in incorrect seeking.
Upstream: handling of certificates specified as an empty string.
Now, if the directive is given an empty string, such configuration cancels
loading of certificates, in particular, if they would be otherwise inherited
from the previous level. This restores previous behaviour, before variables
support in certificates was introduced (3ab8e1e2f0f7).
Maxim Dounin [Mon, 6 Jun 2022 21:07:12 +0000 (00:07 +0300)]
Upstream: fixed X-Accel-Expires/Cache-Control/Expires handling.
Previously, if caching was disabled due to Expires in the past, nginx
failed to cache the response even if it was cacheable as per subsequently
parsed Cache-Control header (ticket #964).
Similarly, if caching was disabled due to Expires in the past,
"Cache-Control: no-cache" or "Cache-Control: max-age=0", caching was not
used if it was cacheable as per subsequently parsed X-Accel-Expires header.
Fix is to avoid disabling caching immediately after parsing Expires in
the past or Cache-Control, but rather set flags which are later checked by
ngx_http_upstream_process_headers() (and cleared by "Cache-Control: max-age"
and X-Accel-Expires).
Additionally, now X-Accel-Expires does not prevent parsing of cache control
extensions, notably stale-while-revalidate and stale-if-error. This
ensures that order of the X-Accel-Expires and Cache-Control headers is not
important.
Maxim Dounin [Mon, 30 May 2022 18:25:56 +0000 (21:25 +0300)]
Multiple WWW-Authenticate headers with "satisfy any;".
If a module adds multiple WWW-Authenticate headers (ticket #485) to the
response, linked in r->headers_out.www_authenticate, all headers are now
cleared if another module later allows access.
This change is a nop for standard modules, since the only access module which
can add multiple WWW-Authenticate headers is the auth request module, and
it is checked after other standard access modules. Though this might
affect some third party access modules.
Note that if a 3rd party module adds a single WWW-Authenticate header
and not yet modified to set the header's next pointer to NULL, attempt to
clear such a header with this change will result in a segmentation fault.
When using auth_request with an upstream server which returns 401
(Unauthorized), multiple WWW-Authenticate headers from the upstream server
response are now properly copied to the response.
When using proxy_intercept_errors and an error page for error 401
(Unauthorized), multiple WWW-Authenticate headers from the upstream server
response are now properly copied to the response.
Maxim Dounin [Mon, 30 May 2022 18:25:49 +0000 (21:25 +0300)]
Upstream: duplicate headers ignored or properly linked.
Most of the known duplicate upstream response headers are now ignored
with a warning.
If syntax permits multiple headers, these are now properly linked to
the lists, notably Vary and WWW-Authenticate. This makes it possible
to further handle such lists where it makes sense.
Maxim Dounin [Mon, 30 May 2022 18:25:48 +0000 (21:25 +0300)]
Upstream: header handlers can now return parsing errors.
With this change, duplicate Content-Length and Transfer-Encoding headers
are now rejected. Further, responses with invalid Content-Length or
Transfer-Encoding headers are now rejected, as well as responses with both
Content-Length and Transfer-Encoding.
Maxim Dounin [Mon, 30 May 2022 18:25:42 +0000 (21:25 +0300)]
Upstream: simplified Content-Encoding handling.
Since introduction of offset handling in ngx_http_upstream_copy_header_line()
in revision 573:58475592100c, the ngx_http_upstream_copy_content_encoding()
function is no longer needed, as its behaviour is exactly equivalent to
ngx_http_upstream_copy_header_line() with appropriate offset. As such,
the ngx_http_upstream_copy_content_encoding() function was removed.
Further, the u->headers_in.content_encoding field is not used anywhere,
so it was removed as well.
Further, Content-Encoding handling no longer depends on NGX_HTTP_GZIP,
as it can be used even without any gzip handling compiled in (for example,
in the charset filter).
Maxim Dounin [Mon, 30 May 2022 18:25:36 +0000 (21:25 +0300)]
Perl: all known input headers are handled identically.
As all known input headers are now linked lists, these are now handled
identically. In particular, this makes it possible to access properly
combined values of headers not specifically handled previously, such
as "Via" or "Connection".
Maxim Dounin [Mon, 30 May 2022 18:25:35 +0000 (21:25 +0300)]
All non-unique input headers are now linked lists.
The ngx_http_process_multi_header_lines() function is removed, as it is
exactly equivalent to ngx_http_process_header_line(). Similarly,
ngx_http_variable_header() is used instead of ngx_http_variable_headers().
Maxim Dounin [Mon, 30 May 2022 18:25:33 +0000 (21:25 +0300)]
Reworked multi headers to use linked lists.
Multi headers are now using linked lists instead of arrays. Notably,
the following fields were changed: r->headers_in.cookies (renamed
to r->headers_in.cookie), r->headers_in.x_forwarded_for,
r->headers_out.cache_control, r->headers_out.link, u->headers_in.cache_control
u->headers_in.cookies (renamed to u->headers_in.set_cookie).
The r->headers_in.cookies and u->headers_in.cookies fields were renamed
to r->headers_in.cookie and u->headers_in.set_cookie to match header names.
The ngx_http_parse_multi_header_lines() and ngx_http_parse_set_cookie_lines()
functions were changed accordingly.
With this change, multi headers are now essentially equivalent to normal
headers, and following changes will further make them equivalent.
Maxim Dounin [Mon, 30 May 2022 18:25:32 +0000 (21:25 +0300)]
Combining unknown headers during variables lookup (ticket #1316).
Previously, $http_*, $sent_http_*, $sent_trailer_*, $upstream_http_*,
and $upstream_trailer_* variables returned only the first header (with
a few specially handled exceptions: $http_cookie, $http_x_forwarded_for,
$sent_http_cache_control, $sent_http_link).
With this change, all headers are returned, combined together. For
example, $http_foo variable will be "a, b" if there are "Foo: a" and
"Foo: b" headers in the request.
Note that $upstream_http_set_cookie will also return all "Set-Cookie"
headers (ticket #1843), though this might not be what one want, since
the "Set-Cookie" header does not follow the list syntax (see RFC 7230,
section 3.2.2).
Maxim Dounin [Mon, 30 May 2022 18:25:30 +0000 (21:25 +0300)]
Uwsgi: combining headers with identical names (ticket #1724).
The uwsgi specification states that "The uwsgi block vars represent a
dictionary/hash". This implies that no duplicate headers are expected.
Further, provided headers are expected to follow CGI specification,
which also requires to combine headers (RFC 3875, section "4.1.18.
Protocol-Specific Meta-Variables"): "If multiple header fields with
the same field-name are received then the server MUST rewrite them
as a single value having the same semantics".