Events: fixed EPOLLRDHUP with FIONREAD (ticket #2367).
When reading exactly rev->available bytes, rev->available might become 0
after FIONREAD usage introduction in efd71d49bde0. On the next call of
ngx_readv_chain() on systems with EPOLLRDHUP this resulted in return without
any actions, that is, with rev->ready set, and this in turn resulted in no
timers set in event pipe, leading to socket leaks.
Fix is to reset rev->ready in ngx_readv_chain() when returning due to
rev->available being 0 with EPOLLRDHUP, much like it is already done in
ngx_unix_recv(). This ensures that if rev->available will become 0, on
systems with EPOLLRDHUP support appropriate EPOLLRDHUP-specific handling
will happen on the next ngx_readv_chain() call.
While here, also synced ngx_readv_chain() to match ngx_unix_recv() and
reset rev->ready when returning due to rev->available being 0 with kqueue.
This is mostly cosmetic change, as rev->ready is anyway reset when
rev->available is set to 0.
Range filter: clearing of pre-existing Content-Range headers.
Some servers might emit Content-Range header on 200 responses, and this
does not seem to contradict RFC 9110: as per RFC 9110, the Content-Range
header has no meaning for status codes other than 206 and 416. Previously
this resulted in duplicate Content-Range headers in nginx responses handled
by the range filter. Fix is to clear pre-existing headers.
SSL: logging levels of various errors added in OpenSSL 1.1.1.
Starting with OpenSSL 1.1.1, various additional errors can be reported
by OpenSSL in case of client-related issues, most notably during TLSv1.3
handshakes. In particular, SSL_R_BAD_KEY_SHARE ("bad key share"),
SSL_R_BAD_EXTENSION ("bad extension"), SSL_R_BAD_CIPHER ("bad cipher"),
SSL_R_BAD_ECPOINT ("bad ecpoint"). These are now logged at the "info"
level.
Maxim Dounin [Tue, 28 Jun 2022 23:47:45 +0000 (02:47 +0300)]
Upstream: optimized use of SSL contexts (ticket #1234).
To ensure optimal use of memory, SSL contexts for proxying are now
inherited from previous levels as long as relevant proxy_ssl_* directives
are not redefined.
Further, when no proxy_ssl_* directives are redefined in a server block,
we now preserve plcf->upstream.ssl in the "http" section configuration
to inherit it to all servers.
Similar changes made in uwsgi, grpc, and stream proxy.
Aleksei Bavshin [Thu, 2 Jun 2022 03:17:23 +0000 (20:17 -0700)]
Resolver: make TCP write timer event cancelable.
Similar to 70e65bf8dfd7, the change is made to ensure that the ability to
cancel resolver tasks is fully controlled by the caller. As mentioned in the
referenced commit, it is safe to make this timer cancelable because resolve
tasks can have their own timeouts that are not cancelable.
The scenario where this may become a problem is a periodic background resolve
task (not tied to a specific request or a client connection), which receives a
response with short TTL, large enough to warrant fallback to a TCP query.
With each event loop wakeup, we either have a previously set write timer
instance or schedule a new one. The non-cancelable write timer can delay or
block graceful shutdown of a worker even if the ngx_resolver_ctx_t->cancelable
flag is set by the API user, and there are no other tasks or connections.
We use the resolver API in this way to maintain the list of upstream server
addresses specified with the 'resolve' parameter, and there could be third-party
modules implementing similar logic.
Aleksei Bavshin [Mon, 23 May 2022 18:29:44 +0000 (11:29 -0700)]
Stream: don't flush empty buffers created for read errors.
When we generate the last_buf buffer for an UDP upstream recv error, it does
not contain any data from the wire. ngx_stream_write_filter attempts to forward
it anyways, which is incorrect (e.g., UDP upstream ECONNREFUSED will be
translated to an empty packet).
This happens because we mark the buffer as both 'flush' and 'last_buf', and
ngx_stream_write_filter has special handling for flush with certain types of
connections (see d127837c714f, 32b0ba4855a6). The flags are meant to be
mutually exclusive, so the fix is to ensure that flush and last_buf are not set
at the same time.
Reproduction:
stream {
upstream unreachable {
server 127.0.0.1:8880;
}
server {
listen 127.0.0.1:8998 udp;
proxy_pass unreachable;
}
}
Maxim Dounin [Tue, 7 Jun 2022 18:58:52 +0000 (21:58 +0300)]
Mp4: fixed potential overflow in ngx_http_mp4_crop_stts_data().
Both "count" and "duration" variables are 32-bit, so their product might
potentially overflow. It is used to reduce 64-bit start_time variable,
and with very large start_time this can result in incorrect seeking.
Upstream: handling of certificates specified as an empty string.
Now, if the directive is given an empty string, such configuration cancels
loading of certificates, in particular, if they would be otherwise inherited
from the previous level. This restores previous behaviour, before variables
support in certificates was introduced (3ab8e1e2f0f7).
Maxim Dounin [Mon, 6 Jun 2022 21:07:12 +0000 (00:07 +0300)]
Upstream: fixed X-Accel-Expires/Cache-Control/Expires handling.
Previously, if caching was disabled due to Expires in the past, nginx
failed to cache the response even if it was cacheable as per subsequently
parsed Cache-Control header (ticket #964).
Similarly, if caching was disabled due to Expires in the past,
"Cache-Control: no-cache" or "Cache-Control: max-age=0", caching was not
used if it was cacheable as per subsequently parsed X-Accel-Expires header.
Fix is to avoid disabling caching immediately after parsing Expires in
the past or Cache-Control, but rather set flags which are later checked by
ngx_http_upstream_process_headers() (and cleared by "Cache-Control: max-age"
and X-Accel-Expires).
Additionally, now X-Accel-Expires does not prevent parsing of cache control
extensions, notably stale-while-revalidate and stale-if-error. This
ensures that order of the X-Accel-Expires and Cache-Control headers is not
important.
Maxim Dounin [Mon, 30 May 2022 18:25:56 +0000 (21:25 +0300)]
Multiple WWW-Authenticate headers with "satisfy any;".
If a module adds multiple WWW-Authenticate headers (ticket #485) to the
response, linked in r->headers_out.www_authenticate, all headers are now
cleared if another module later allows access.
This change is a nop for standard modules, since the only access module which
can add multiple WWW-Authenticate headers is the auth request module, and
it is checked after other standard access modules. Though this might
affect some third party access modules.
Note that if a 3rd party module adds a single WWW-Authenticate header
and not yet modified to set the header's next pointer to NULL, attempt to
clear such a header with this change will result in a segmentation fault.
When using auth_request with an upstream server which returns 401
(Unauthorized), multiple WWW-Authenticate headers from the upstream server
response are now properly copied to the response.
When using proxy_intercept_errors and an error page for error 401
(Unauthorized), multiple WWW-Authenticate headers from the upstream server
response are now properly copied to the response.
Maxim Dounin [Mon, 30 May 2022 18:25:49 +0000 (21:25 +0300)]
Upstream: duplicate headers ignored or properly linked.
Most of the known duplicate upstream response headers are now ignored
with a warning.
If syntax permits multiple headers, these are now properly linked to
the lists, notably Vary and WWW-Authenticate. This makes it possible
to further handle such lists where it makes sense.
Maxim Dounin [Mon, 30 May 2022 18:25:48 +0000 (21:25 +0300)]
Upstream: header handlers can now return parsing errors.
With this change, duplicate Content-Length and Transfer-Encoding headers
are now rejected. Further, responses with invalid Content-Length or
Transfer-Encoding headers are now rejected, as well as responses with both
Content-Length and Transfer-Encoding.
Maxim Dounin [Mon, 30 May 2022 18:25:42 +0000 (21:25 +0300)]
Upstream: simplified Content-Encoding handling.
Since introduction of offset handling in ngx_http_upstream_copy_header_line()
in revision 573:58475592100c, the ngx_http_upstream_copy_content_encoding()
function is no longer needed, as its behaviour is exactly equivalent to
ngx_http_upstream_copy_header_line() with appropriate offset. As such,
the ngx_http_upstream_copy_content_encoding() function was removed.
Further, the u->headers_in.content_encoding field is not used anywhere,
so it was removed as well.
Further, Content-Encoding handling no longer depends on NGX_HTTP_GZIP,
as it can be used even without any gzip handling compiled in (for example,
in the charset filter).
Maxim Dounin [Mon, 30 May 2022 18:25:36 +0000 (21:25 +0300)]
Perl: all known input headers are handled identically.
As all known input headers are now linked lists, these are now handled
identically. In particular, this makes it possible to access properly
combined values of headers not specifically handled previously, such
as "Via" or "Connection".
Maxim Dounin [Mon, 30 May 2022 18:25:35 +0000 (21:25 +0300)]
All non-unique input headers are now linked lists.
The ngx_http_process_multi_header_lines() function is removed, as it is
exactly equivalent to ngx_http_process_header_line(). Similarly,
ngx_http_variable_header() is used instead of ngx_http_variable_headers().
Maxim Dounin [Mon, 30 May 2022 18:25:33 +0000 (21:25 +0300)]
Reworked multi headers to use linked lists.
Multi headers are now using linked lists instead of arrays. Notably,
the following fields were changed: r->headers_in.cookies (renamed
to r->headers_in.cookie), r->headers_in.x_forwarded_for,
r->headers_out.cache_control, r->headers_out.link, u->headers_in.cache_control
u->headers_in.cookies (renamed to u->headers_in.set_cookie).
The r->headers_in.cookies and u->headers_in.cookies fields were renamed
to r->headers_in.cookie and u->headers_in.set_cookie to match header names.
The ngx_http_parse_multi_header_lines() and ngx_http_parse_set_cookie_lines()
functions were changed accordingly.
With this change, multi headers are now essentially equivalent to normal
headers, and following changes will further make them equivalent.
Maxim Dounin [Mon, 30 May 2022 18:25:32 +0000 (21:25 +0300)]
Combining unknown headers during variables lookup (ticket #1316).
Previously, $http_*, $sent_http_*, $sent_trailer_*, $upstream_http_*,
and $upstream_trailer_* variables returned only the first header (with
a few specially handled exceptions: $http_cookie, $http_x_forwarded_for,
$sent_http_cache_control, $sent_http_link).
With this change, all headers are returned, combined together. For
example, $http_foo variable will be "a, b" if there are "Foo: a" and
"Foo: b" headers in the request.
Note that $upstream_http_set_cookie will also return all "Set-Cookie"
headers (ticket #1843), though this might not be what one want, since
the "Set-Cookie" header does not follow the list syntax (see RFC 7230,
section 3.2.2).
Maxim Dounin [Mon, 30 May 2022 18:25:30 +0000 (21:25 +0300)]
Uwsgi: combining headers with identical names (ticket #1724).
The uwsgi specification states that "The uwsgi block vars represent a
dictionary/hash". This implies that no duplicate headers are expected.
Further, provided headers are expected to follow CGI specification,
which also requires to combine headers (RFC 3875, section "4.1.18.
Protocol-Specific Meta-Variables"): "If multiple header fields with
the same field-name are received then the server MUST rewrite them
as a single value having the same semantics".
Maxim Dounin [Mon, 30 May 2022 18:25:28 +0000 (21:25 +0300)]
SCGI: combining headers with identical names (ticket #1724).
SCGI specification explicitly forbids headers with duplicate names
(section "3. Request Format"): "Duplicate names are not allowed in
the headers".
Further, provided headers are expected to follow CGI specification,
which also requires to combine headers (RFC 3875, section "4.1.18.
Protocol-Specific Meta-Variables"): "If multiple header fields with
the same field-name are received then the server MUST rewrite them
as a single value having the same semantics".
Maxim Dounin [Mon, 30 May 2022 18:25:27 +0000 (21:25 +0300)]
FastCGI: combining headers with identical names (ticket #1724).
FastCGI responder is expected to receive CGI/1.1 environment variables
in the parameters (see section "6.2 Responder" of the FastCGI specification).
Obviously enough, there cannot be multiple environment variables with
the same name.
Further, CGI specification (RFC 3875, section "4.1.18. Protocol-Specific
Meta-Variables") explicitly requires to combine headers: "If multiple
header fields with the same field-name are received then the server MUST
rewrite them as a single value having the same semantics".
Maxim Dounin [Mon, 30 May 2022 18:25:25 +0000 (21:25 +0300)]
Perl: fixed $r->header_in("Connection").
Previously, the r->header_in->connection pointer was never set despite
being present in ngx_http_headers_in, resulting in incorrect value returned
by $r->header_in("Connection") in embedded perl.
Marcus Ball [Sun, 29 May 2022 23:38:07 +0000 (02:38 +0300)]
Fixed runtime handling of systems without EPOLLRDHUP support.
In 7583:efd71d49bde0 (nginx 1.17.5) along with introduction of the
ioctl(FIONREAD) support proper handling of systems without EPOLLRDHUP
support in the kernel (but with EPOLLRDHUP in headers) was broken.
Before the change, rev->available was never set to 0 unless
ngx_use_epoll_rdhup was also set (that is, runtime test for EPOLLRDHUP
introduced in 6536:f7849bfb6d21 succeeded). After the change,
rev->available might reach 0 on systems without runtime EPOLLRDHUP
support, stopping further reading in ngx_readv_chain() and ngx_unix_recv().
And, if EOF happened to be already reported along with the last event,
it is not reported again by epoll_wait(), leading to connection hangs
and timeouts on such systems.
This affects Linux kernels before 2.6.17 if nginx was compiled
with newer headers, and, more importantly, emulation layers, such as
DigitalOcean's App Platform's / gVisor's epoll emulation layer.
Fix is to explicitly check ngx_use_epoll_rdhup before the corresponding
rev->pending_eof tests in ngx_readv_chain() and ngx_unix_recv().
SSL: logging level of "application data after close notify".
Such fatal errors are reported by OpenSSL 1.1.1, and similarly by BoringSSL,
if application data is encountered during SSL shutdown, which started to be
observed on the second SSL_shutdown() call after SSL shutdown fixes made in 09fb2135a589 (1.19.2). The error means that the client continues to send
application data after receiving the "close_notify" alert (ticket #2318).
Previously it was reported as SSL_shutdown() error of SSL_ERROR_SYSCALL.
With large http2_max_concurrent_streams or http2_max_concurrent_pushes, more
than 255 ngx_http_v2_node_t structures might be allocated, eventually leading
to h2c->closed_nodes overflow when closing corresponding streams. This will
in turn result in additional allocations in ngx_http_v2_get_node_by_id().
While mostly harmless, it can result in excessive memory usage by a HTTP/2
connection, notably in configurations with many keepalive_requests allowed.
Fix is to use ngx_uint_t for h2c->closed_nodes instead of unsigned:8.
Maxim Dounin [Wed, 2 Feb 2022 22:44:38 +0000 (01:44 +0300)]
HTTP/2: made it possible to flush response headers (ticket #1743).
Response headers can be buffered in the SSL buffer. But stream's fake
connection buffered flag did not reflect this, so any attempts to flush
the buffer without sending additional data were stopped by the write filter.
It does not seem to be possible to reflect this in fc->buffered though, as
we never known if main connection's c->buffered corresponds to the particular
stream or not. As such, fc->buffered might prevent request finalization
due to sending data on some other stream.
Fix is to implement handling of flush buffers when the c->need_flush_buf
flag is set, similarly to the existing last buffer handling. The same
flag is now used for UDP sockets in the stream module instead of explicit
checking of c->type.
Maxim Dounin [Tue, 1 Feb 2022 13:29:28 +0000 (16:29 +0300)]
Cache: fixed race in ngx_http_file_cache_forced_expire().
During configuration reload two cache managers might exist for a short
time. If both tried to delete the same cache node, the "ignore long locked
inactive cache entry" alert appeared in logs. Additionally,
ngx_http_file_cache_forced_expire() might be also called by worker
processes, with similar results.
Fix is to ignore cache nodes being deleted, similarly to how it is
done in ngx_http_file_cache_expire() since 3755:76e3a93821b1. This
was somehow missed in 7002:ab199f0eb8e8, when ignoring long locked
cache entries was introduced in ngx_http_file_cache_forced_expire().
Maxim Dounin [Mon, 24 Jan 2022 14:18:50 +0000 (17:18 +0300)]
SSL: always renewing tickets with TLSv1.3 (ticket #1892).
Chrome only uses TLS session tickets once with TLS 1.3, likely following
RFC 8446 Appendix C.4 recommendation. With OpenSSL, this works fine with
built-in session tickets, since these are explicitly renewed in case of
TLS 1.3 on each session reuse, but results in only two connections being
reused after an initial handshake when using ssl_session_ticket_key.
Fix is to always renew TLS session tickets in case of TLS 1.3 when using
ssl_session_ticket_key, similarly to how it is done by OpenSSL internally.
Maxim Dounin [Fri, 21 Jan 2022 21:28:51 +0000 (00:28 +0300)]
Contrib: vim syntax adjusted to save cpoptions (ticket #2276).
Line continuation as used in the syntax file might be broken if "compatible"
is set or "C" is added to cpoptions. Fix is to set the "cpoptions" option
to vim default value at script start and restore it later, see
":help use-cpo-save".
Maxim Dounin [Mon, 10 Jan 2022 23:23:49 +0000 (02:23 +0300)]
Avoid sending "Connection: keep-alive" when shutting down.
When a worker process is shutting down, keepalive is not used: this is checked
before the ngx_http_set_keepalive() call in ngx_http_finalize_connection().
Yet the "Connection: keep-alive" header was still sent, even if we know that
the worker process is shutting down, potentially resulting in additional
requests being sent to the connection which is going to be closed anyway.
While clients are expected to be able to handle asynchronous close events
(see ticket #1022), it is certainly possible to send the "Connection: close"
header instead, informing the client that the connection is going to be closed
and potentially saving some unneeded work.
With this change, we additionally check for worker process shutdown just
before sending response headers, and disable keepalive accordingly.
Maxim Dounin [Wed, 29 Dec 2021 22:08:46 +0000 (01:08 +0300)]
Events: fixed balancing between workers with EPOLLEXCLUSIVE.
Linux with EPOLLEXCLUSIVE usually notifies only the process which was first
to add the listening socket to the epoll instance. As a result most of the
connections are handled by the first worker process (ticket #2285). To fix
this, we re-add the socket periodically, so other workers will get a chance
to accept connections.
Maxim Dounin [Mon, 27 Dec 2021 16:49:26 +0000 (19:49 +0300)]
Support for sendfile(SF_NOCACHE).
The SF_NOCACHE flag, introduced in FreeBSD 11 along with the new non-blocking
sendfile() implementation by glebius@, makes it possible to use sendfile()
along with the "directio" directive.
Maxim Dounin [Mon, 27 Dec 2021 16:48:33 +0000 (19:48 +0300)]
Simplified sendfile(SF_NODISKIO) usage.
Starting with FreeBSD 11, there is no need to use AIO operations to preload
data into cache for sendfile(SF_NODISKIO) to work. Instead, sendfile()
handles non-blocking loading data from disk by itself. It still can, however,
return EBUSY if a page is already being loaded (for example, by a different
process). If this happens, we now post an event for the next event loop
iteration, so sendfile() is retried "after a short period", as manpage
recommends.
The limit of the number of EBUSY tolerated without any progress is preserved,
but now it does not result in an alert, since on an idle system event loop
iteration might be very short and EBUSY can happen many times in a row.
Instead, SF_NODISKIO is simply disabled for one call once the limit is
reached.
With this change, sendfile(SF_NODISKIO) is now used automatically as long as
sendfile() is enabled, and no longer requires "aio on;".
Maxim Dounin [Fri, 24 Dec 2021 22:07:18 +0000 (01:07 +0300)]
Core: added NGX_REGEX_MULTILINE for 3rd party modules.
Notably, NAXSI is known to misuse ngx_regex_compile() with rc.options set
to PCRE_CASELESS | PCRE_MULTILINE. With PCRE2 support, and notably binary
compatibility changes, it is no longer possible to set PCRE[2]_MULTILINE
option without using proper interface. To facilitate correct usage,
this change adds the NGX_REGEX_MULTILINE option.
Maxim Dounin [Fri, 24 Dec 2021 22:07:16 +0000 (01:07 +0300)]
PCRE2 and PCRE binary compatibility.
With this change, dynamic modules using nginx regex interface can be used
regardless of the variant of the PCRE library nginx was compiled with.
If a module is compiled with different PCRE library variant, in case of
ngx_regex_exec() errors it will report wrong function name in error
messages. This is believed to be tolerable, given that fixing this will
require interface changes.
Maxim Dounin [Fri, 24 Dec 2021 22:07:15 +0000 (01:07 +0300)]
PCRE2 library support.
The PCRE2 library is now used by default if found, instead of the
original PCRE library. If needed for some reason, this can be disabled
with the --without-pcre2 configure option.
To make it possible to specify paths to the library and include files
via --with-cc-opt / --with-ld-opt, the library is first tested without
any additional paths and options. If this fails, the pcre2-config script
is used.
Similarly to the original PCRE library, it is now possible to build PCRE2
from sources with nginx configure, by using the --with-pcre= option.
It automatically detects if PCRE or PCRE2 sources are provided.
Note that compiling PCRE2 10.33 and later requires inttypes.h. When
compiling on Windows with MSVC, inttypes.h is only available starting
with MSVC 2013. In older versions some replacement needs to be provided
("echo '#include <stdint.h>' > pcre2-10.xx/src/inttypes.h" is good enough
for MSVC 2010).
Maxim Dounin [Fri, 24 Dec 2021 22:07:10 +0000 (01:07 +0300)]
Core: fixed ngx_pcre_studies cleanup.
If a configuration parsing fails for some reason, ngx_regex_module_init()
is not called, and ngx_pcre_studies remained set despite the fact that
the pool it was allocated from is already freed. This might result in
a segmentation fault during runtime regular expression compilation, such
as in SSI, for example, in the single process mode, or if a worker process
died and was respawned from a master process in such an inconsistent state.
Fix is to clear ngx_pcre_studies from the pool cleanup handler (which is
anyway used to free JIT-compiled patterns).
Maxim Dounin [Thu, 25 Nov 2021 19:02:10 +0000 (22:02 +0300)]
HTTP/2: fixed sendfile() aio handling.
With sendfile() in threads ("aio threads; sendfile on;"), client connection
can block on writing, waiting for sendfile() to complete. In HTTP/2 this
might result in the request hang, since an attempt to continue processing
in thread event handler will call request's write event handler, which
is usually stopped by ngx_http_v2_send_chain(): it does nothing if there
are no additional data and stream->queued is set. Further, HTTP/2 resets
stream's c->write->ready to 0 if writing blocks, so just fixing
ngx_http_v2_send_chain() is not enough.
The following tests currently fail: h2_keepalive.t, h2_priority.t,
h2_proxy_max_temp_file_size.t, h2.t, h2_trailers.t.
Similarly, sendfile() with AIO preloading on FreeBSD can block as well,
with similar results. This is, however, harder to reproduce, especially
on modern FreeBSD systems, since sendfile() usually does not return EBUSY.
Fix is to modify ngx_http_v2_send_chain() so it actually tries to send
data to the main connection when called, and to make sure that
c->write->ready is set by the relevant event handlers.
Maxim Dounin [Thu, 25 Nov 2021 19:02:05 +0000 (22:02 +0300)]
HTTP/2: fixed "task already active" with sendfile in threads.
With sendfile in threads, "task already active" alerts might appear in logs
if a write event happens on the main HTTP/2 connection, triggering a sendfile
in threads while another thread operation is already running. Observed
with "aio threads; aio_write on; sendfile on;" and with thread event handlers
modified to post a write event to the main HTTP/2 connection (though can
happen without any modifications).
Similarly, sendfile() with AIO preloading on FreeBSD can trigger duplicate
aio operation, resulting in "second aio post" alerts. This is, however,
harder to reproduce, especially on modern FreeBSD systems, since sendfile()
usually does not return EBUSY.
Fix is to avoid starting a sendfile operation if other thread operation
is active by checking r->aio in the thread handler (and, similarly, in
aio preload handler). The added check also makes duplicate calls protection
redundant, so it is removed.
The variable contains a negotiated curve used for the handshake key
exchange process. Known curves are listed by their names, unknown
ones are shown in hex.
Note that for resumed sessions in TLSv1.2 and older protocols,
$ssl_curve contains the curve used during the initial handshake,
while in TLSv1.3 it contains the curve used during the session
resumption (see the SSL_get_negotiated_group manual page for
details).
The variable is only meaningful when using OpenSSL 3.0 and above.
With older versions the variable is empty.
Maxim Dounin [Fri, 29 Oct 2021 23:39:19 +0000 (02:39 +0300)]
Changed ngx_chain_update_chains() to test tag first (ticket #2248).
Without this change, aio used with HTTP/2 can result in connection hang,
as observed with "aio threads; aio_write on;" and proxying (ticket #2248).
The problem is that HTTP/2 updates buffers outside of the output filters
(notably, marks them as sent), and then posts a write event to call
output filters. If a filter does not call the next one for some reason
(for example, because of an AIO operation in progress), this might
result in a state when the owner of a buffer already called
ngx_chain_update_chains() and can reuse the buffer, while the same buffer
is still sitting in the busy chain of some other filter.
In the particular case a buffer was sitting in output chain's ctx->busy,
and was reused by event pipe. Output chain's ctx->busy was permanently
blocked by it, and this resulted in connection hang.
Fix is to change ngx_chain_update_chains() to skip buffers from other
modules unconditionally, without trying to wait for these buffers to
become empty.
Maxim Dounin [Fri, 29 Oct 2021 17:21:57 +0000 (20:21 +0300)]
Changed default value of sendfile_max_chunk to 2m.
The "sendfile_max_chunk" directive is important to prevent worker
monopolization by fast connections. The 2m value implies maximum 200ms
delay with 100 Mbps links, 20ms delay with 1 Gbps links, and 2ms on
10 Gbps links. It also seems to be a good value for disks.
Maxim Dounin [Fri, 29 Oct 2021 17:21:54 +0000 (20:21 +0300)]
Upstream: sendfile_max_chunk support.
Previously, connections to upstream servers used sendfile() if it was
enabled, but never honored sendfile_max_chunk. This might result
in worker monopolization for a long time if large request bodies
are allowed.
Maxim Dounin [Fri, 29 Oct 2021 17:21:51 +0000 (20:21 +0300)]
Fixed sendfile() limit handling on Linux.
On Linux starting with 2.6.16, sendfile() silently limits all operations
to MAX_RW_COUNT, defined as (INT_MAX & PAGE_MASK). This incorrectly
triggered the interrupt check, and resulted in 0-sized writev() on the
next loop iteration.
Fix is to make sure the limit is always checked, so we will return from
the loop if the limit is already reached even if number of bytes sent is
not exactly equal to the number of bytes we've tried to send.
Maxim Dounin [Fri, 29 Oct 2021 17:21:48 +0000 (20:21 +0300)]
Simplified sendfile_max_chunk handling.
Previously, it was checked that sendfile_max_chunk was enabled and
almost whole sendfile_max_chunk was sent (see e67ef50c3176), to avoid
delaying connections where sendfile_max_chunk wasn't reached (for example,
when sending responses smaller than sendfile_max_chunk). Now we instead
check if there are unsent data, and the connection is still ready for writing.
Additionally we also check c->write->delayed to ignore connections already
delayed by limit_rate.
This approach is believed to be more robust, and correctly handles
not only sendfile_max_chunk, but also internal limits of c->send_chain(),
such as sendfile() maximum supported length (ticket #1870).
Maxim Dounin [Fri, 29 Oct 2021 17:21:43 +0000 (20:21 +0300)]
Switched to using posted next events after sendfile_max_chunk.
Previously, 1 millisecond delay was used instead. In certain edge cases
this might result in noticeable performance degradation though, notably on
Linux with typical CONFIG_HZ=250 (so 1ms delay becomes 4ms),
sendfile_max_chunk 2m, and link speed above 2.5 Gbps.
Using posted next events removes the artificial delay and makes processing
fast in all cases.
Roman Arutyunyan [Thu, 28 Oct 2021 11:14:25 +0000 (14:14 +0300)]
Mp4: mp4_start_key_frame directive.
The directive enables including all frames from start time to the most recent
key frame in the result. Those frames are removed from presentation timeline
using mp4 edit lists.
Edit lists are currently supported by popular players and browsers such as
Chrome, Safari, QuickTime and ffmpeg. Among those not supporting them properly
is Firefox[1].
Based on a patch by Tracey Jaquith, Internet Archive.
The function updates the duration field of mdhd atom. Previously it was
updated in ngx_http_mp4_read_mdhd_atom(). The change makes it possible to
alter track duration as a result of processing track frames.
Alexey Radkov [Thu, 19 Aug 2021 17:51:27 +0000 (20:51 +0300)]
Core: removed unnecessary restriction in hash initialization.
Hash initialization ignores elements with key.data set to NULL.
Nevertheless, the initial hash bucket size check didn't skip them,
resulting in unnecessary restrictions on, for example, variables with
long names and with the NGX_HTTP_VARIABLE_NOHASH flag.
Fix is to update the initial hash bucket size check to skip elements
with key.data set to NULL, similarly to how it is done in other parts
of the code.
Maxim Dounin [Thu, 21 Oct 2021 15:44:07 +0000 (18:44 +0300)]
SSL: SSL_sendfile() support with kernel TLS.
Requires OpenSSL 3.0 compiled with "enable-ktls" option. Further, KTLS
needs to be enabled in kernel, and in OpenSSL, either via OpenSSL
configuration file or with "ssl_conf_command Options KTLS;" in nginx
configuration.
On FreeBSD, kernel TLS is available starting with FreeBSD 13.0, and
can be enabled with "sysctl kern.ipc.tls.enable=1" and "kldload ktls_ocf"
to load a software backend, see man ktls(4) for details.
On Linux, kernel TLS is available starting with kernel 4.13 (at least 5.2
is recommended), and needs kernel compiled with CONFIG_TLS=y (with
CONFIG_TLS=m, which is used at least on Ubuntu 21.04 by default,
the tls module needs to be loaded with "modprobe tls").
Maxim Dounin [Thu, 21 Oct 2021 15:38:38 +0000 (18:38 +0300)]
Removed CLOCK_MONOTONIC_COARSE support.
While clock_gettime(CLOCK_MONOTONIC_COARSE) is faster than
clock_gettime(CLOCK_MONOTONIC), the latter is fast enough on Linux for
practical usage, and the difference is negligible compared to other costs
at each event loop iteration. On the other hand, CLOCK_MONOTONIC_COARSE
causes various issues with typical CONFIG_HZ=250, notably very inaccurate
limit_rate handling in some edge cases (ticket #1678) and negative difference
between $request_time and $upstream_response_time (ticket #1965).