Ruslan Ermilov [Sun, 27 Sep 2020 20:21:09 +0000 (23:21 +0300)]
Proxy: strengthen syntax checking for some directives.
The "false" parameter of the proxy_redirect directive is deprecated.
Warning has been emitted since c2230102df6f (0.7.54).
The "off" parameter of the proxy_redirect, proxy_cookie_domain, and
proxy_cookie_path directives tells nginx not to inherit the
configuration from the previous configuration level.
Previously, after specifying the directive with the "off" parameter,
any other directives were ignored, and syntax checking was disabled.
The syntax was enforced to allow either one directive with the "off"
parameter, or several directives with other parameters.
Also, specifying "proxy_redirect default foo" no longer works like
"proxy_redirect default".
In rare cases, such as memory allocation failure, SSL_set_SSL_CTX() returns
NULL, which could mean that a different SSL configuration has not been set.
Note that this new behaviour seemingly originated in OpenSSL-1.1.0 release.
HTTP/2 code failed to run posted requests after calling the request body
handler, and this resulted in connection hang if a subrequest was created
in the body handler and no other actions were made.
HTTP/2: fixed segfault on DATA frames after 400 errors.
If 400 errors were redirected to an upstream server using the error_page
directive, DATA frames from the client might cause segmentation fault
due to null pointer dereference. The bug had appeared in 6989:2c4dbcd6f2e4
(1.13.0).
Fix is to skip such frames in ngx_http_v2_state_read_data() (similarly
to 7561:9f1f9d6e056a). With the fix, behaviour of 400 errors in HTTP/2
is now similar to one in HTTP/1.x, that is, nginx doesn't try to read the
request body.
Note that proxying 400 errors, as well as other early stage errors, to
upstream servers might not be a good idea anyway. These errors imply
that reading and processing of the request (and the request headers)
wasn't complete, and proxying of such incomplete request might lead to
various errors.
SSL: disabled shutdown when there are buffered data.
This fixes "SSL_shutdown() failed (SSL: ... bad write retry)" errors
as observed on the second SSL_shutdown() call after SSL shutdown fixes in 09fb2135a589 (1.19.2), notably when HTTP/2 connections are closed due
to read timeouts while there are incomplete writes.
This fixes "SSL_shutdown() failed (SSL: ... bad write retry)" errors
as observed on the second SSL_shutdown() call after SSL shutdown fixes in 09fb2135a589 (1.19.2), notably when sending fails in ngx_http_test_expect(),
similarly to ticket #1194.
Note that there are some places where c->error is misused to prevent
further output, such as ngx_http_v2_finalize_connection() if there
are pending streams, or in filter finalization. These places seem
to be extreme enough to don't care about missing shutdown though.
For example, filter finalization currently prevents keepalive from
being used.
The c->read->ready and c->write->ready flags need to be cleared to ensure
that appropriate read or write events will be reported by kernel. Without
this, SSL shutdown might wait till the timeout after blocking on writing
or reading even if there is a socket activity.
SSL: workaround for incorrect SSL_write() errors in OpenSSL 1.1.1.
OpenSSL 1.1.1 fails to return SSL_ERROR_SYSCALL if an error happens
during SSL_write() after close_notify alert from the peer, and returns
SSL_ERROR_ZERO_RETURN instead. Broken by this commit, which removes
the "i == 0" check around the SSL_RECEIVED_SHUTDOWN one:
In particular, if a client closed the connection without reading
the response but with properly sent close_notify alert, this resulted in
unexpected "SSL_write() failed while ..." critical log message instead
of correct "SSL_write() failed (32: Broken pipe)" at the info level.
Since SSL_ERROR_ZERO_RETURN cannot be legitimately returned after
SSL_write(), the fix is to convert all SSL_ERROR_ZERO_RETURN errors
after SSL_write() to SSL_ERROR_SYSCALL.
Cache: keep c->body_start when Vary changes (ticket #2029).
If the variant hash doesn't match one we used as a secondary cache key,
we switch back to the original key. In this case, c->body_start was kept
updated from an existing cache node overwriting the new response value.
After file cache update, it led to discrepancy between a cache node and
cache file seen as critical errors "file cache .. has too long header".
Cache: reset c->body_start when reading a variant on Vary mismatch.
Previously, a variant not present in shared memory and stored on disk using a
secondary key was read using c->body_start from a variant stored with a main
key. This could result in critical errors "cache file .. has too long header".
Roman Arutyunyan [Wed, 29 Jul 2020 10:28:04 +0000 (13:28 +0300)]
Cache: ignore stale-if-error for 4xx and 5xx codes.
Previously the stale-if-error extension of the Cache-Control upstream header
triggered the return of a stale response for all error conditions that can be
specified in the proxy_cache_use_stale directive. The list of these errors
includes both network/timeout/format errors, as well as some HTTP codes like
503, 504, 403, 429 etc. The latter prevented a cache entry from being updated
by a response with any of these HTTP codes during the stale-if-error period.
Now stale-if-error only works for network/timeout/format errors and ignores
the upstream HTTP code. The return of a stale response for certain HTTP codes
is still possible using the proxy_cache_use_stale directive.
This change also applies to the stale-while-revalidate extension of the
Cache-Control header, which triggers stale-if-error if it is missing.
Reported at
http://mailman.nginx.org/pipermail/nginx/2020-July/059723.html.
Maxim Dounin [Mon, 10 Aug 2020 15:53:07 +0000 (18:53 +0300)]
Core: reusing connections in advance.
Reworked connections reuse, so closing connections is attempted in
advance, as long as number of free connections is less than 1/16 of
worker connections configured. This ensures that new connections can
be handled even if closing a reusable connection requires some time,
for example, for a lingering close (ticket #2017).
The 1/16 ratio is selected to be smaller than 1/8 used for disabling
accept when working with accept mutex, so nginx will try to balance
new connections to different workers first, and will start reusing
connections only if this won't help.
Maxim Dounin [Mon, 10 Aug 2020 15:52:59 +0000 (18:52 +0300)]
Core: added a warning about reusing connections.
Previously, reusing connections happened silently and was only
visible in monitoring systems. This was shown to be not very user-friendly,
and administrators often didn't realize there were too few connections
available to withstand the load, and configured timeouts (keepalive_timeout
and http2_idle_timeout) were effectively reduced to keep things running.
To provide at least some information about this, a warning is now logged
(at most once per second, to avoid flooding the logs).
Maxim Dounin [Mon, 10 Aug 2020 15:52:34 +0000 (18:52 +0300)]
SSL: disabled sending shutdown after ngx_http_test_reading().
Sending shutdown when ngx_http_test_reading() detects the connection is
closed can result in "SSL_shutdown() failed (SSL: ... bad write retry)"
critical log messages if there are blocked writes.
Fix is to avoid sending shutdown via the c->ssl->no_send_shutdown flag,
similarly to how it is done in ngx_http_keepalive_handler() for kqueue
when pending EOF is detected.
Reported by Jan Prachař
(http://mailman.nginx.org/pipermail/nginx-devel/2018-December/011702.html).
Maxim Dounin [Mon, 10 Aug 2020 15:52:20 +0000 (18:52 +0300)]
HTTP/2: fixed c->timedout flag on timed out connections.
Without the flag, SSL shutdown is attempted on such connections,
resulting in useless work and/or bogus "SSL_shutdown() failed
(SSL: ... bad write retry)" critical log messages if there are
blocked writes.
Maxim Dounin [Mon, 10 Aug 2020 15:52:09 +0000 (18:52 +0300)]
SSL: fixed shutdown handling.
Previously, bidirectional shutdown never worked, due to two issues
in the code:
1. The code only tested SSL_ERROR_WANT_READ and SSL_ERROR_WANT_WRITE
when there was an error in the error queue, which cannot happen.
The bug was introduced in an attempt to fix unexpected error logging
as reported with OpenSSL 0.9.8g
(http://mailman.nginx.org/pipermail/nginx/2008-January/003084.html).
2. The code never called SSL_shutdown() for the second time to wait for
the peer's close_notify alert.
This change fixes both issues.
Note that after this change bidirectional shutdown is expected to work for
the first time, so c->ssl->no_wait_shutdown now makes a difference. This
is not a problem for HTTP code which always uses c->ssl->no_wait_shutdown,
but might be a problem for stream and mail code, as well as 3rd party
modules.
To minimize the effect of the change, the timeout, which was used to be 30
seconds and not configurable, though never actually used, is now set to
3 seconds. It is also expanded to apply to both SSL_ERROR_WANT_READ and
SSL_ERROR_WANT_WRITE, so timeout is properly set if writing to the socket
buffer is not possible.
Maxim Dounin [Thu, 6 Aug 2020 02:02:55 +0000 (05:02 +0300)]
Request body: allowed large reads on chunk boundaries.
If some additional data from a pipelined request happens to be
read into the body buffer, we copy it to r->header_in or allocate
an additional large client header buffer for it.
Maxim Dounin [Thu, 6 Aug 2020 02:02:22 +0000 (05:02 +0300)]
Added size check to ngx_http_alloc_large_header_buffer().
This ensures that copying won't write more than the buffer size
even if the buffer comes from hc->free and it is smaller than the large
client header buffer size in the virtual host configuration. This might
happen if size of large client header buffers is different in name-based
virtual hosts, similarly to the problem with number of buffers fixed
in 6926:e662cbf1b932.
FastCGI: fixed zero size buf alerts on extra data (ticket #2018).
After 05e42236e95b (1.19.1) responses with extra data might result in
zero size buffers being generated and "zero size buf" alerts in writer
(if f->rest happened to be 0 when processing additional stdout data).
Roman Arutyunyan [Wed, 22 Jul 2020 19:16:19 +0000 (22:16 +0300)]
Xslt: disabled ranges.
Previously, the document generated by the xslt filter was always fully sent
to client even if a range was requested and response status was 206 with
appropriate Content-Range.
The xslt module is unable to serve a range because of suspending the header
filter chain. By the moment full response xml is buffered by the xslt filter,
range header filter is not called yet, but the range body filter has already
been called and did nothing.
The fix is to disable ranges by resetting the r->allow_ranges flag much like
the image filter that employs a similar technique.
The slice filter allows ranges for the response by setting the r->allow_ranges
flag, which enables the range filter. If the range was not requested, the
range filter adds an Accept-Ranges header to the response to signal the
support for ranges.
Previously, if an Accept-Ranges header was already present in the first slice
response, client received two copies of this header. Now, the slice filter
removes the Accept-Ranges header from the response prior to setting the
r->allow_ranges flag.
As long as the "Content-Length" header is given, we now make sure
it exactly matches the size of the response. If it doesn't,
the response is considered malformed and must not be forwarded
(https://tools.ietf.org/html/rfc7540#section-8.1.2.6). While it
is not really possible to "not forward" the response which is already
being forwarded, we generate an error instead, which is the closest
equivalent.
Previous behaviour was to pass everything to the client, but this
seems to be suboptimal and causes issues (ticket #1695). Also this
directly contradicts HTTP/2 specification requirements.
Note that the new behaviour for the gRPC proxy is more strict than that
applied in other variants of proxying. This is intentional, as HTTP/2
specification requires us to do so, while in other types of proxying
malformed responses from backends are well known and historically
tolerated.
FastCGI: protection from responses with wrong length.
Previous behaviour was to pass everything to the client, but this
seems to be suboptimal and causes issues (ticket #1695). Fix is to
drop extra data instead, as it naturally happens in most clients.
Additionally, we now also issue a warning if the response is too
short, and make sure the fact it is truncated is propagated to the
client. The u->error flag is introduced to make it possible to
propagate the error to the client in case of unbuffered proxying.
For responses to HEAD requests there is an exception: we do allow
both responses without body and responses with body matching the
Content-Length header.
Previous behaviour was to pass everything to the client, but this
seems to be suboptimal and causes issues (ticket #1695). Fix is to
drop extra data instead, as it naturally happens in most clients.
This change covers generic buffered and unbuffered filters as used
in the scgi and uwsgi modules. Appropriate input filter init
handlers are provided by the scgi and uwsgi modules to set corresponding
lengths.
Note that for responses to HEAD requests there is an exception:
we do allow any response length. This is because responses to HEAD
requests might be actual full responses, and it is up to nginx
to remove the response body. If caching is enabled, only full
responses matching the Content-Length header will be cached
(see b779728b180c).
Previously, additional data after final chunk was either ignored
(in the same buffer, or during unbuffered proxying) or sent to the
client (in the next buffer already if it was already read from the
socket). Now additional data are properly detected and ignored
in all cases. Additionally, a warning is now logged and keepalive
is disabled in the connection.
Previous behaviour was to pass everything to the client, but this
seems to be suboptimal and causes issues (ticket #1695). Fix is to
drop extra data instead, as it naturally happens in most clients.
If a memcached response was followed by a correct trailer, and then
the NUL character followed by some extra data - this was accepted by
the trailer checking code. This in turn resulted in ctx->rest underflow
and caused negative size buffer on the next reading from the upstream,
followed by the "negative size buf in writer" alert.
Fix is to always check for too long responses, so a correct trailer cannot
be followed by extra data.
Ruslan Ermilov [Fri, 3 Jul 2020 13:16:47 +0000 (16:16 +0300)]
HTTP/2: lingering close after GOAWAY.
After sending the GOAWAY frame, a connection is now closed using
the lingering close mechanism.
This allows for the reliable delivery of the GOAWAY frames, while
also fixing connection resets observed when http2_max_requests is
reached (ticket #1250), or with graceful shutdown (ticket #1544),
when some additional data from the client is received on a fully
closed connection.
For HTTP/2, the settings lingering_close, lingering_timeout, and
lingering_time are taken from the "server" level.
Using SSL_CTX_set_verify(SSL_VERIFY_PEER) implies that OpenSSL will
send a certificate request during an SSL handshake, leading to unexpected
certificate requests from browsers as long as there are any client
certificates installed. Given that ngx_ssl_trusted_certificate()
is called unconditionally by the ngx_http_ssl_module, this affected
all HTTPS servers. Broken by 699f6e55bbb4 (not released yet).
Fix is to set verify callback in the ngx_ssl_trusted_certificate() function
without changing the verify mode.
Maxim Dounin [Mon, 22 Jun 2020 15:03:00 +0000 (18:03 +0300)]
Cache: introduced min_free cache clearing.
Clearing cache based on free space left on a file system is
expected to allow better disk utilization in some cases, notably
when disk space might be also used for something other than nginx
cache (including nginx own temporary files) and while loading
cache (when cache size might be inaccurate for a while, effectively
disabling max_size cache clearing).
Maxim Dounin [Mon, 22 Jun 2020 15:02:59 +0000 (18:02 +0300)]
Too large st_blocks values are now ignored (ticket #157).
With XFS, using "allocsize=64m" mount option results in large preallocation
being reported in the st_blocks as returned by fstat() till the file is
closed. This in turn results in incorrect cache size calculations and
wrong clearing based on max_size.
To avoid too aggressive cache clearing on such volumes, st_blocks values
which result in sizes larger than st_size and eight blocks (an arbitrary
limit) are no longer trusted, and we use st_size instead.
The ngx_de_fs_size() counterpart is intentionally not modified, as
it is used on closed files and hence not affected by this problem.
Maxim Dounin [Mon, 22 Jun 2020 15:02:58 +0000 (18:02 +0300)]
Large block sizes on Linux are now ignored (ticket #1168).
NFS on Linux is known to report wsize as a block size (in both f_bsize
and f_frsize, both in statfs() and statvfs()). On the other hand,
typical file system block sizes on Linux (ext2/ext3/ext4, XFS) are limited
to pagesize. (With FAT, block sizes can be at least up to 512k in
extreme cases, but this doesn't really matter, see below.)
To avoid too aggressive cache clearing on NFS volumes on Linux, block
sizes larger than pagesize are now ignored.
Note that it is safe to ignore large block sizes. Since 3899:e7cd13b7f759
(1.0.1) cache size is calculated based on fstat() st_blocks, and rounding
to file system block size is preserved mostly for Windows.
Note well that on other OSes valid block sizes seen are at least up
to 65536. In particular, UFS on FreeBSD is known to work well with block
and fragment sizes set to 65536.
Roman Arutyunyan [Mon, 15 Jun 2020 17:17:16 +0000 (20:17 +0300)]
OCSP: fixed use-after-free on error.
When validating second and further certificates, ssl callback could be called
twice to report the error. After the first call client connection is
terminated and its memory is released. Prior to the second call and in it
released connection memory is accessed.
Errors triggering this behavior:
- failure to create the request
- failure to start resolving OCSP responder name
- failure to start connecting to the OCSP responder
The fix is to rearrange the code to eliminate the second call.
Quantum [Mon, 15 Jun 2020 21:35:26 +0000 (17:35 -0400)]
Correctly flush request body to uwsgi with SSL.
The flush flag was not set when forwarding the request body to the uwsgi
server. When using uwsgi_pass suwsgi://..., this causes the uwsgi server
to wait indefinitely for the request body and eventually time out due to
SSL buffering.
This is essentially the same change as 4009:3183165283cc, which was made
to ngx_http_proxy_module.c.
This will fix the uwsgi bug https://github.com/unbit/uwsgi/issues/1490.
Maxim Dounin [Wed, 3 Jun 2020 16:11:32 +0000 (19:11 +0300)]
SSL: added verify callback to ngx_ssl_trusted_certificate().
This ensures that certificate verification is properly logged to debug
log during upstream server certificate verification. This should help
with debugging various certificate issues.
Ruslan Ermilov [Mon, 1 Jun 2020 19:31:23 +0000 (22:31 +0300)]
Fixed SIGQUIT not removing listening UNIX sockets (closes #753).
Listening UNIX sockets were not removed on graceful shutdown, preventing
the next runs. The fix is to replace the custom socket closing code in
ngx_master_process_cycle() by the ngx_close_listening_sockets() call.
Ruslan Ermilov [Mon, 1 Jun 2020 17:19:27 +0000 (20:19 +0300)]
Fixed removing of listening UNIX sockets when "changing binary".
When changing binary, sending a SIGTERM to the new binary's master process
should not remove inherited UNIX sockets unless the old binary's master
process has exited.
Previously, invalid connection preface errors were only logged at debug
level, providing no visible feedback, in particular, when a plain text
HTTP/2 listening socket is erroneously used for HTTP/1.x connections.
Now these are explicitly logged at the info level, much like other
client-related errors.
Roman Arutyunyan [Fri, 22 May 2020 14:30:12 +0000 (17:30 +0300)]
SSL: client certificate validation with OCSP (ticket #1534).
OCSP validation for client certificates is enabled by the "ssl_ocsp" directive.
OCSP responder can be optionally specified by "ssl_ocsp_responder".
When session is reused, peer chain is not available for validation.
If the verified chain contains certificates from the peer chain not available
at the server, validation will fail.
Ruslan Ermilov [Thu, 23 Apr 2020 12:10:26 +0000 (15:10 +0300)]
gRPC: WINDOW_UPDATE after END_STREAM handling (ticket #1797).
As per https://tools.ietf.org/html/rfc7540#section-6.9,
WINDOW_UPDATE received after a frame with the END_STREAM flag
should be handled and not treated as an error.
As per https://tools.ietf.org/html/rfc7540#section-8.1,
: A server can send a complete response prior to the client
: sending an entire request if the response does not depend on
: any portion of the request that has not been sent and
: received. When this is true, a server MAY request that the
: client abort transmission of a request without error by
: sending a RST_STREAM with an error code of NO_ERROR after
: sending a complete response (i.e., a frame with the
: END_STREAM flag). Clients MUST NOT discard responses as a
: result of receiving such a RST_STREAM, though clients can
: always discard responses at their discretion for other
: reasons.
Previously, RST_STREAM(NO_ERROR) received from upstream after
a frame with the END_STREAM flag was incorrectly treated as an
error. Now, a single RST_STREAM(NO_ERROR) is properly handled.
This fixes problems observed with modern grpc-c [1], as well
as with the Go gRPC module.
Ruslan Ermilov [Tue, 7 Apr 2020 22:02:17 +0000 (01:02 +0300)]
The new auth_delay directive for delaying unauthorized requests.
The request processing is delayed by a timer. Since nginx updates
internal time once at the start of each event loop iteration, this
normally ensures constant time delay, adding a mitigation from
time-based attacks.
A notable exception to this is the case when there are no additional
events before the timer expires. To ensure constant-time processing
in this case as well, we trigger an additional event loop iteration
by posting a dummy event for the next event loop iteration.
When "aio" or "aio threads" is used while processing the response body of an
in-memory background subrequest, the subrequest could be finalized with an aio
operation still in progress. Upon aio completion either parent request is
woken or the old r->write_event_handler is called again. The latter may result
in request errors. In either case post_subrequest handler is never called with
the full response body, which is typically expected when using in-memory
subrequests.
Currently in nginx background subrequests are created by the upstream module
and the mirror module. The issue does not manifest itself with these
subrequests because they are header-only. But it can manifest itself with
third-party modules which create in-memory background subrequests.
Maxim Dounin [Fri, 28 Feb 2020 14:21:18 +0000 (17:21 +0300)]
Added default overwrite in error_page 494.
We used to have default error_page overwrite for 495, 496, and 497, so
a configuration like
error_page 495 /error;
will result in error 400, much like without any error_page configured.
The 494 status code was introduced later (in 3848:de59ad6bf557, nginx 0.9.4),
and relevant changes to ngx_http_core_error_page() were missed, resulting
in inconsistent behaviour of "error_page 494" - with error_page configured
it results in 494 being returned instead of 400.
Reported by Frank Liu,
http://mailman.nginx.org/pipermail/nginx/2020-February/058957.html.
Roman Arutyunyan [Wed, 26 Feb 2020 12:10:46 +0000 (15:10 +0300)]
Mp4: fixed possible chunk offset overflow.
In "co64" atom chunk start offset is a 64-bit unsigned integer. When trimming
the "mdat" atom, chunk offsets are casted to off_t values which are typically
64-bit signed integers. A specially crafted mp4 file with huge chunk offsets
may lead to off_t overflow and result in negative trim boundaries.
The consequences of the overflow are:
- Incorrect Content-Length header value in the response.
- Negative left boundary of the response file buffer holding the trimmed "mdat".
This leads to pread()/sendfile() errors followed by closing the client
connection.
On rare systems where off_t is a 32-bit integer, this scenario is also feasible
with the "stco" atom.
The fix is to add checks which make sure data chunks referenced by each track
are within the mp4 file boundaries. Additionally a few more checks are added to
ensure mp4 file consistency and log errors.
Maxim Dounin [Thu, 20 Feb 2020 13:51:07 +0000 (16:51 +0300)]
Disabled duplicate "Host" headers (ticket #1724).
Duplicate "Host" headers were allowed in nginx 0.7.0 (revision b9de93d804ea)
as a workaround for some broken Motorola phones which used to generate
requests with two "Host" headers[1]. It is believed that this workaround
is no longer relevant.
Maxim Dounin [Thu, 20 Feb 2020 13:19:34 +0000 (16:19 +0300)]
Removed "Transfer-Encoding: identity" support.
The "identity" transfer coding has been removed in RFC 7230. It is
believed that it is not used in real life, and at the same time it
provides a potential attack vector.
Maxim Dounin [Thu, 20 Feb 2020 13:19:29 +0000 (16:19 +0300)]
Disabled multiple Transfer-Encoding headers.
We anyway do not support more than one transfer encoding, so accepting
requests with multiple Transfer-Encoding headers doesn't make sense.
Further, we do not handle multiple headers, and ignore anything but
the first header.
HTTP/2: fixed socket leak with an incomplete HEADERS frame.
A connection could get stuck without timers if a client has partially sent
the HEADERS frame such that it was split on the individual header boundary.
In this case, it cannot be processed without the rest of the HEADERS frame.
The fix is to call ngx_http_v2_state_headers_save() in this case. Normally,
it would be called from the ngx_http_v2_state_header_block() handler on the
next iteration, when there is not enough data to continue processing. This
isn't the case if recv_buffer became empty and there's no more data to read.
Daniil Bondarev [Tue, 14 Jan 2020 11:20:08 +0000 (14:20 +0300)]
HTTP/2: removed ngx_debug_point() call.
With the recent change to prevent frames flood in d4448892a294,
nginx will finalize the connection with NGX_HTTP_V2_INTERNAL_ERROR
whenever flood is detected, causing nginx aborting or stopping if
the debug_points directive is used in nginx config.