Maxim Dounin [Thu, 24 Jan 2019 18:51:21 +0000 (21:51 +0300)]
Win32: added WSAPoll() support.
WSAPoll() is only available with Windows Vista and newer (and only
available during compilation if _WIN32_WINNT >= 0x0600). To make
sure the code works with Windows XP, we do not redefine _WIN32_WINNT,
but instead load WSAPoll() dynamically if it is not available during
compilation.
Also, sockets are not guaranteed to be small integers on Windows.
So an index array is used instead of NGX_USE_FD_EVENT to map
events to connections.
Maxim Dounin [Thu, 24 Jan 2019 18:51:00 +0000 (21:51 +0300)]
Win32: properly enabled select on Windows.
Previously, select was compiled in by default, but the NGX_HAVE_SELECT
macro was not set, resulting in iocp being used by default unless
the "--with-select_module" configure option was explicitly specified.
Since the iocp module is not finished and does not work properly, this
effectively meant that the "--with-select_module" option was mandatory.
With the change NGX_HAVE_SELECT is properly set, making "--with-select_module"
optional. Accordingly, it is removed from misc/GNUmakefile win32 target.
Roman Arutyunyan [Thu, 27 Dec 2018 16:37:34 +0000 (19:37 +0300)]
Stream: do not split datagrams when limiting proxy rate.
Previously, when using proxy_upload_rate and proxy_download_rate, the buffer
size for reading from a socket could be reduced as a result of rate limiting.
For connection-oriented protocols this behavior is normal since unread data will
normally be read at the next iteration. But for datagram-oriented protocols
this is not the case, and unread part of the datagram is lost.
Now buffer size is not limited for datagrams. Rate limiting still works in this
case by delaying the next reading event.
Roman Arutyunyan [Mon, 14 Jan 2019 17:36:23 +0000 (20:36 +0300)]
Prevented scheduling events on a shared connection.
A shared connection does not own its file descriptor, which means that
ngx_handle_read_event/ngx_handle_write_event calls should do nothing for it.
Currently the c->shared flag is checked in several places in the stream proxy
module prior to calling these functions. However it was not done everywhere.
Missing checks could lead to calling
ngx_handle_read_event/ngx_handle_write_event on shared connections.
The problem manifested itself when using proxy_upload_rate and resulted in
either duplicate file descriptor error (e.g. with epoll) or incorrect further
udp packet processing (e.g. with kqueue).
The fix is to set and reset the event active flag in a way that prevents
ngx_handle_read_event/ngx_handle_write_event from scheduling socket events.
Maxim Dounin [Mon, 24 Dec 2018 18:07:05 +0000 (21:07 +0300)]
Win32: removed NGX_DIR_MASK concept.
Previous interface of ngx_open_dir() assumed that passed directory name
has a room for NGX_DIR_MASK at the end (NGX_DIR_MASK_LEN bytes). While all
direct users of ngx_dir_open() followed this interface, this also implied
similar requirements for indirect uses - in particular, via ngx_walk_tree().
Currently none of ngx_walk_tree() uses provides appropriate space, and
fixing this does not look like a right way to go. Instead, ngx_dir_open()
interface was changed to not require any additional space and use
appropriate allocations instead.
Sergey Kandaurov [Tue, 18 Dec 2018 12:15:15 +0000 (15:15 +0300)]
SSL: avoid reading on pending SSL_write_early_data().
If SSL_write_early_data() returned SSL_ERROR_WANT_WRITE, stop further reading
using a newly introduced c->ssl->write_blocked flag, as otherwise this would
result in SSL error "ssl3_write_bytes:bad length". Eventually, normal reading
will be restored by read event posted from successful SSL_write_early_data().
While here, place "SSL_write_early_data: want write" debug on the path.
Roman Arutyunyan [Tue, 11 Dec 2018 16:41:22 +0000 (19:41 +0300)]
Resolver: report SRV resolve failure if all A resolves failed.
Previously, if an SRV record was successfully resolved, but all of its A
records failed to resolve, NXDOMAIN was returned to the caller, which is
considered a successful resolve rather than an error. This could result in
losing the result of a previous successful resolve by the caller.
Now NXDOMAIN is only returned if at least one A resolve completed with this
code. Otherwise the error state of the first A resolve is returned.
Roman Arutyunyan [Tue, 11 Dec 2018 10:09:00 +0000 (13:09 +0300)]
Copy regex unnamed captures to cloned subrequests.
Previously, unnamed regex captures matched in the parent request, were not
available in a cloned subrequest. Now 3 fields related to unnamed captures
are copied to a cloned subrequest: r->ncaptures, r->captures and
r->captures_data. Since r->captures cannot be changed by either request after
creating a clone, a new flag r->realloc_captures is introduced to force
reallocation of r->captures.
The issue was reported as a proxy_cache_background_update misbehavior in
http://mailman.nginx.org/pipermail/nginx/2018-December/057251.html.
Maxim Dounin [Mon, 26 Nov 2018 15:29:56 +0000 (18:29 +0300)]
Negative size buffers detection.
In the past, there were several security issues which resulted in
worker process memory disclosure due to buffers with negative size.
It looks reasonable to check for such buffers in various places,
much like we already check for zero size buffers.
While here, removed "#if 1 / #endif" around zero size buffer checks.
It looks highly unlikely that we'll disable these checks anytime soon.
Maxim Dounin [Wed, 21 Nov 2018 17:23:16 +0000 (20:23 +0300)]
Mp4: fixed possible pointer overflow on 32-bit platforms.
On 32-bit platforms mp4->buffer_pos might overflow when a large
enough (close to 4 gigabytes) atom is being skipped, resulting in
incorrect memory addesses being read further in the code. In most
cases this results in harmless errors being logged, though may also
result in a segmentation fault if hitting unmapped pages.
To address this, ngx_mp4_atom_next() now only increments mp4->buffer_pos
up to mp4->buffer_end. This ensures that overflow cannot happen.
Maxim Dounin [Wed, 21 Nov 2018 15:56:50 +0000 (18:56 +0300)]
Limit req: "delay=" parameter.
This parameter specifies an additional "soft" burst limit at which requests
become delayed (but not yet rejected as it happens if "burst=" limit is
exceeded). Defaults to 0, i.e., all excess requests are delayed.
Originally inspired by Vladislav Shabanov
(http://mailman.nginx.org/pipermail/nginx-devel/2016-April/008126.html).
Further improved based on a patch by Peter Shchuchkin
(http://mailman.nginx.org/pipermail/nginx-devel/2018-October/011522.html).
Vladimir Homutov [Wed, 21 Nov 2018 10:40:40 +0000 (13:40 +0300)]
Upstream: revised upstream response time variables.
Variables now do not depend on presence of the HTTP status code in response.
If the corresponding event occurred, variables contain time between request
creation and the event, and "-" otherwise.
Previously, intermediate value of the $upstream_response_time variable held
unix timestamp.
Vladimir Homutov [Mon, 12 Nov 2018 13:29:30 +0000 (16:29 +0300)]
Stream: proxy_requests directive.
The directive allows to drop binding between a client and existing UDP stream
session after receiving a specified number of packets. First packet from the
same client address and port will start a new session. Old session continues
to exist and will terminate at moment defined by configuration: either after
receiving the expected number of responses, or after timeout, as specified by
the "proxy_responses" and/or "proxy_timeout" directives.
Ruslan Ermilov [Tue, 6 Nov 2018 13:29:49 +0000 (16:29 +0300)]
HTTP/2: limit the number of idle state switches.
An attack that continuously switches HTTP/2 connection between
idle and active states can result in excessive CPU usage.
This is because when a connection switches to the idle state,
all of its memory pool caches are freed.
This change limits the maximum allowed number of idle state
switches to 10 * http2_max_requests (i.e., 10000 by default).
This limits possible CPU usage in one connection, and also
imposes a limit on the maximum lifetime of a connection.
Initially reported by Gal Goldshtein from F5 Networks.
Ruslan Ermilov [Tue, 6 Nov 2018 13:29:35 +0000 (16:29 +0300)]
HTTP/2: flood detection.
Fixed uncontrolled memory growth in case peer is flooding us with
some frames (e.g., SETTINGS and PING) and doesn't read data. Fix
is to limit the number of allocated control frames.
Previously there was no validation for the size of a 64-bit atom
in an mp4 file. This could lead to a CPU hog when the size is 0,
or various other problems due to integer underflow when calculating
atom data size, including segmentation fault or worker process
memory disclosure.
Maxim Dounin [Wed, 31 Oct 2018 13:49:39 +0000 (16:49 +0300)]
Cache: fixed minimum cache keys zone size limit.
Size of a shared memory zones must be at least two pages - one page
for slab allocator internal data, and another page for actual allocations.
Using 8192 instead is wrong, as there are systems with page sizes other
than 4096.
Note well that two pages is usually too low as well. In particular, cache
is likely to use two allocations of different sizes for global structures,
and at least four pages will be needed to properly allocate cache nodes.
Except in a few very special cases, with keys zone of just two pages nginx
won't be able to start. Other uses of shared memory impose a limit
of 8 pages, which provides some room for global allocations. This patch
doesn't try to address this though.
Maxim Dounin [Tue, 23 Oct 2018 19:11:48 +0000 (22:11 +0300)]
SSL: explicitly set maximum version (ticket #1654).
With maximum version explicitly set, TLSv1.3 will not be unexpectedly
enabled if nginx compiled with OpenSSL 1.1.0 (without TLSv1.3 support)
will be run with OpenSSL 1.1.1 (with TLSv1.3 support).
Maxim Dounin [Tue, 2 Oct 2018 14:46:18 +0000 (17:46 +0300)]
SSL: fixed segfault on renegotiation (ticket #1646).
In e3ba4026c02d (1.15.4) nginx own renegotiation checks were disabled
if SSL_OP_NO_RENEGOTIATION is available. But since SSL_OP_NO_RENEGOTIATION
is only set on a connection, not in an SSL context, SSL_clear_option()
removed it as long as a matching virtual server was found. This resulted
in a segmentation fault similar to the one fixed in a6902a941279 (1.9.8),
affecting nginx built with OpenSSL 1.1.0h or higher.
To fix this, SSL_OP_NO_RENEGOTIATION is now explicitly set in
ngx_http_ssl_servername() after adjusting options. Additionally, instead
of c->ssl->renegotiation we now check c->ssl->handshaked, which seems
to be a more correct flag to test, and will prevent the segmentation fault
from happening even if SSL_OP_NO_RENEGOTIATION is not working.
SSL: logging level of "no suitable signature algorithm".
The "no suitable signature algorithm" errors are reported by OpenSSL 1.1.1
when using TLSv1.3 if there are no shared signature algorithms. In
particular, this can happen if the client limits available signature
algorithms to something we don't have a certificate for, or to an empty
list. For example, the following command:
will always result in the "no suitable signature algorithm" error
as the "rsa_pkcs1_sha1" algorithm refers solely to signatures which
appear in certificates and not defined for use in TLS 1.3 handshake
messages.
The SSL_R_NO_COMMON_SIGNATURE_ALGORITHMS error is what BoringSSL returns
in the same situation.
The "no suitable key share" errors are reported by OpenSSL 1.1.1 when
using TLSv1.3 if there are no shared groups (that is, elliptic curves).
In particular, it is easy enough to trigger by using only a single
curve in ssl_ecdh_curve:
On the client side it is seen as "sslv3 alert handshake failure",
"SSL alert number 40":
0:error:14094410:SSL routines:ssl3_read_bytes:sslv3 alert handshake failure:ssl/record/rec_layer_s3.c:1528:SSL alert number 40
It can be also triggered with default ssl_ecdh_curve by using a curve
which is not in the default list (X25519, prime256v1, X448, secp521r1,
secp384r1):
Given that many clients hardcode prime256v1, these errors might become
a common problem with TLSv1.3 if ssl_ecdh_curve is redefined. Previously
this resulted in not using ECDH with such clients, but with TLSv1.3 it
is no longer possible and will result in a handshake failure.
The SSL_R_NO_SHARED_GROUP error is what BoringSSL returns in the same
situation.
Nova DasSarma [Wed, 19 Sep 2018 14:26:47 +0000 (09:26 -0500)]
Removed bgcolor attribute on body in error pages and autoindex.
The bgcolor attribute overrides compatibility settings in browsers
and leads to undesirable behavior when the default font color is set
to white in the browser, since font-color is not also overridden.
SSL: disabled renegotiation checks with SSL_OP_NO_RENEGOTIATION.
Following 7319:dcab86115261, as long as SSL_OP_NO_RENEGOTIATION is
defined, it is OpenSSL library responsibility to prevent renegotiation,
so the checks are meaningless.
Additionally, with TLSv1.3 OpenSSL tends to report SSL_CB_HANDSHAKE_START
at various unexpected moments - notably, on KeyUpdate messages and
when sending tickets. This change prevents unexpected connection
close on KeyUpdate messages and when finishing handshake with upcoming
early data changes.
Rewrite: removed r->err_status special handling (ticket #1634).
Trying to look into r->err_status in the "return" directive
makes it behave differently than real errors generated in other
parts of the code, and is an endless source of various problems.
This behaviour was introduced in 726:7b71936d5299 (0.4.4) with
the comment "fix: "return" always overrode "error_page" response code".
It is not clear if there were any real cases this was expected to fix,
but there are several cases which are broken due to this change, some
previously fixed (4147:7f64de1cc2c0).
In ticket #1634, the problem is that when r->err_status is set to
a non-special status code, it is not possible to return a response
by simply returning r->err_status. If this is the case, the only
option is to return script's e->status instead. An example
configuration:
Fixed socket leak with "return 444" in error_page (ticket #274).
Socket leak was observed in the following configuration:
error_page 400 = /close;
location = /close {
return 444;
}
The problem is that "return 444" triggers termination of the request,
and due to error_page termination thinks that it needs to use a posted
request to clear stack. But at the early request processing where 400
errors are generated there are no ngx_http_run_posted_requests() calls,
so the request is only terminated after an external event.
Variants of the problem include "error_page 497" instead (ticket #695)
and various other errors generated during early request processing
(405, 414, 421, 494, 495, 496, 501, 505).
The same problem can be also triggered with "return 499" and "return 408"
as both codes trigger ngx_http_terminate_request(), much like "return 444".
To fix this, the patch adds ngx_http_run_posted_requests() calls to
ngx_http_process_request_line() and ngx_http_process_request_headers()
functions, and to ngx_http_v2_run_request() and ngx_http_v2_push_stream()
functions in HTTP/2.
Since the ngx_http_process_request() function is now only called via
other functions which call ngx_http_run_posted_requests(), the call
there is no longer needed and was removed.
It is possible that after SSL_read() will return SSL_ERROR_WANT_WRITE,
further calls will return SSL_ERROR_WANT_READ without reading any
application data. We have to call ngx_handle_write_event() and
switch back to normal write handling much like we do if there are some
application data, or the write there will be reported again and again.
Similarly, we have to switch back to normal read handling if there
is saved read handler and SSL_write() returns SSL_ERROR_WANT_WRITE.
While SSL_read() most likely to return SSL_ERROR_WANT_WRITE (and SSL_write()
accordingly SSL_ERROR_WANT_READ) during an SSL renegotiation, it is
not necessary mean that a renegotiation was started. In particular,
it can never happen during a renegotiation or can happen multiple times
during a renegotiation.
Because of the above, misleading "peer started SSL renegotiation" info
messages were replaced with "SSL_read: want write" and "SSL_write: want read"
debug ones.
Additionally, "SSL write handler" and "SSL read handler" are now logged
by the SSL write and read handlers, to make it easier to understand that
temporary SSL handlers are called instead of normal handlers.
The "do { c->recv() } while (c->read->ready)" form used in the
ngx_http_lingering_close_handler() is not really correct, as for
example with SSL c->read->ready may be still set when returning NGX_AGAIN
due to SSL_ERROR_WANT_WRITE. Therefore the above might be an infinite loop.
This doesn't really matter in lingering close, as we shutdown write side
of the socket anyway and also disable renegotiation (and even without shutdown
and with renegotiation it requires using very large certificate chain and
tuning socket buffers to trigger SSL_ERROR_WANT_WRITE). But for the sake of
correctness added an NGX_AGAIN check.
gRPC: disabled keepalive when sending control frames was blocked.
If sending request body was not completed (u->request_body_sent is not set),
the upstream keepalive module won't save such a connection. However, it
is theoretically possible (though highly unlikely) that sending of some
control frames can be blocked after the request body was sent. The
ctx->output_blocked flag introduced to disable keepalive in such cases.
The code is now able to parse additional control frames after
the response is received, and can send control frames as well.
This fixes keepalive problems as observed with grpc-c, which can
send window update and ping frames after the response, see
http://mailman.nginx.org/pipermail/nginx/2018-August/056620.html.
Roman Arutyunyan [Wed, 29 Aug 2018 12:56:42 +0000 (15:56 +0300)]
Stream: avoid potential infinite loop at preread phase.
Previously the preread phase code ignored NGX_AGAIN value returned from
c->recv() and relied only on c->read->ready. But this flag is not reliable and
should only be checked for optimization purposes. For example, when using
SSL, c->read->ready may be set when no input is available. This can lead to
calling preread handler infinitely in a loop.
The problem does not manifest itself currently, because in case of
non-buffered reading, chain link created by u->create_request method
consists of a single element.
Maxim Dounin [Fri, 10 Aug 2018 18:54:46 +0000 (21:54 +0300)]
Upstream keepalive: keepalive_requests directive.
The directive configures maximum number of requests allowed on
a connection kept in the cache. Once a connection reaches the number
of requests configured, it is no longer saved to the cache.
The default is 100.
Much like keepalive_requests for client connections, this is mostly
a safeguard to make sure connections are closed periodically and the
memory allocated from the connection pool is freed.
Maxim Dounin [Fri, 10 Aug 2018 18:54:23 +0000 (21:54 +0300)]
Upstream keepalive: keepalive_timeout directive.
The directive configures maximum time a connection can be kept in the
cache. By configuring a time which is smaller than the corresponding
timeout on the backend side one can avoid the race between closing
a connection by the backend and nginx trying to use the same connection
to send a request at the same time.
Maxim Dounin [Fri, 10 Aug 2018 17:49:06 +0000 (20:49 +0300)]
SSL: fixed build with LibreSSL 2.8.0 (ticket #1605).
LibreSSL 2.8.0 "added const annotations to many existing APIs from OpenSSL,
making interoperability easier for downstream applications". This includes
the const change in the SSL_CTX_sess_set_get_cb() callback function (see 9dd43f4ef67e), which breaks compilation.
To fix this, added a condition on how we redefine OPENSSL_VERSION_NUMBER
when working with LibreSSL (see 382fc7069e3a). With LibreSSL 2.8.0,
we now set OPENSSL_VERSION_NUMBER to 0x1010000fL (OpenSSL 1.1.0), so the
appropriate conditions in the code will use "const" as it happens with
OpenSSL 1.1.0 and later versions.
Maxim Dounin [Thu, 9 Aug 2018 17:12:17 +0000 (20:12 +0300)]
HTTP/2: workaround for clients which fail on table size updates.
There are clients which cannot handle HPACK's dynamic table size updates
as added in 12cadc4669a7 (1.13.6). Notably, old versions of OkHttp library
are known to fail on it (ticket #1397).
This change makes it possible to work with such clients by only sending
dynamic table size updates in response to SETTINGS_HEADER_TABLE_SIZE. As
a downside, clients which do not use SETTINGS_HEADER_TABLE_SIZE will
continue to maintain default 4k table.
Maxim Dounin [Thu, 9 Aug 2018 09:15:42 +0000 (12:15 +0300)]
Skipping spaces in configuration files (ticket #1557).
Previously, a chunk of spaces larger than NGX_CONF_BUFFER (4096 bytes)
resulted in the "too long parameter" error during parsing such a
configuration. This was because the code only set start and start_line
on non-whitespace characters, and hence adjacent whitespace characters
were preserved when reading additional data from the configuration file.
Fix is to always move start and start_line if the last character was
a space.
Maxim Dounin [Mon, 6 Aug 2018 23:16:07 +0000 (02:16 +0300)]
SSL: support for TLSv1.3 early data with BoringSSL.
Early data AKA 0-RTT mode is enabled as long as "ssl_early_data on" is
specified in the configuration (default is off).
The $ssl_early_data variable evaluates to "1" if the SSL handshake
isn't yet completed, and can be used to set the Early-Data header as
per draft-ietf-httpbis-replay-04.
Maxim Dounin [Mon, 6 Aug 2018 23:15:28 +0000 (02:15 +0300)]
SSL: enabled TLSv1.3 with BoringSSL.
BoringSSL currently requires SSL_CTX_set_max_proto_version(TLS1_3_VERSION)
to be able to enable TLS 1.3. This is because by default max protocol
version is set to TLS 1.2, and the SSL_OP_NO_* options are merely used
as a blacklist within the version range specified using the
SSL_CTX_set_min_proto_version() and SSL_CTX_set_max_proto_version()
functions.
With this change, we now call SSL_CTX_set_max_proto_version() with an
explicit maximum version set. This enables TLS 1.3 with BoringSSL.
As a side effect, this change also limits maximum protocol version to
the newest protocol we know about, TLS 1.3. This seems to be a good
change, as enabling unknown protocols might have unexpected results.
Additionally, we now explicitly call SSL_CTX_set_min_proto_version()
with 0. This is expected to help with Debian system-wide default
of MinProtocol set to TLSv1.2, see
http://mailman.nginx.org/pipermail/nginx-ru/2017-October/060411.html.
Note that there is no SSL_CTX_set_min_proto_version macro in BoringSSL,
so we call SSL_CTX_set_min_proto_version() and SSL_CTX_set_max_proto_version()
as long as the TLS1_3_VERSION macro is defined.
Dav: changed ngx_copy_file() to preserve access and mtime.
This fixes wrong permissions and file time after cross-device MOVE
in the DAV module (ticket #1577). Broken in 8101d9101ed8 (0.8.9) when
cross-device copying was introduced in ngx_ext_rename_file().
With this change, ngx_copy_file() always calls ngx_set_file_time(),
either with the time provided, or with the time from the original file.
This is considered acceptable given that copying the file is costly anyway,
and optimizing cases when we do not need to preserve time will require
interface changes.
Dav: fixed ngx_copy_file() to truncate destination file.
Previously, ngx_open_file(NGX_FILE_CREATE_OR_OPEN) was used, resulting
in destination file being partially rewritten if exists. Notably,
this affected WebDAV COPY command (ticket #1576).
Fixed NGX_TID_T_FMT format specification for uint64_t.
Previously, "%uA" was used, which corresponds to ngx_atomic_uint_t.
Size of ngx_atomic_uint_t can be easily different from uint64_t,
leading to undefined results.
SSL: save sessions for upstream peers using a callback function.
In TLSv1.3, NewSessionTicket messages arrive after the handshake and
can come at any time. Therefore we use a callback to save the session
when we know about it. This approach works for < TLSv1.3 as well.
The callback function is set once per location on merge phase.
Since SSL_get_session() in BoringSSL returns an unresumable session for
TLSv1.3, peer save_session() methods have been updated as well to use a
session supplied within the callback. To preserve API, the session is
cached in c->ssl->session. It is preferably accessed in save_session()
methods by ngx_ssl_get_session() and ngx_ssl_get0_session() wrappers.
SSL: fixed SSL_clear_options() usage with OpenSSL 1.1.0+.
In OpenSSL 1.1.0 the SSL_CTRL_CLEAR_OPTIONS macro was removed, so
conditional compilation test on it results in SSL_clear_options()
and SSL_CTX_clear_options() not being used. Notably, this caused
"ssl_prefer_server_ciphers off" to not work in SNI-based virtual
servers if server preference was switched on in the default server.
It looks like the only possible fix is to test OPENSSL_VERSION_NUMBER
explicitly.
SSL: logging levels of "unsupported protocol", "version too low".
Starting with OpenSSL 1.1.0, SSL_R_UNSUPPORTED_PROTOCOL instead of
SSL_R_UNKNOWN_PROTOCOL is reported when a protocol is disabled via
an SSL_OP_NO_* option.
Additionally, SSL_R_VERSION_TOO_LOW is reported when using MinProtocol
or when seclevel checks (as set by @SECLEVEL=n in the cipher string)
rejects a protocol, and this is what happens with SSLv3 and @SECLEVEL=1,
which is the default.
There is also the SSL_R_VERSION_TOO_HIGH error code, but it looks like
it is not possible to trigger it.
Events: added configuration check on the number of connections.
There should be at least one worker connection for each listening socket,
plus an additional connection for channel between worker and master,
or starting worker processes will fail.