Maxim Dounin [Thu, 15 Feb 2018 16:06:22 +0000 (19:06 +0300)]
HTTP/2: precalculate hash for "Cookie".
There is no need to calculate hashes of static strings at runtime. The
ngx_hash() macro can be used to do it during compilation instead, similarly
to how it is done in ngx_http_proxy_module.c for "Server" and "Date" headers.
Ruslan Ermilov [Thu, 15 Feb 2018 14:51:37 +0000 (17:51 +0300)]
HTTP/2: fixed ngx_http_v2_push_stream() allocation error handling.
In particular, if a stream object allocation failed, and a client sent
the PRIORITY frame for this stream, ngx_http_v2_set_dependency() could
dereference a null pointer while trying to re-parent a dependency node.
Ruslan Ermilov [Fri, 9 Feb 2018 20:20:08 +0000 (23:20 +0300)]
HTTP/2: fixed null pointer dereference with server push.
r->headers_in.host can be NULL in ngx_http_v2_push_resource().
This happens when a request is terminated with 400 before the :authority
or Host header is parsed, and either pushing is enabled on the server{}
level or error_page 400 redirects to a location with pushes configured.
Ruslan Ermilov [Thu, 8 Feb 2018 06:55:03 +0000 (09:55 +0300)]
HTTP/2: server push.
Resources to be pushed are configured with the "http2_push" directive.
Also, preload links from the Link response headers, as described in
https://www.w3.org/TR/preload/#server-push-http-2, can be pushed, if
enabled with the "http2_push_preload" directive.
Only relative URIs with absolute paths can be pushed.
The number of concurrent pushes is normally limited by a client, but
cannot exceed a hard limit set by the "http2_max_concurrent_pushes"
directive.
Previously, when request body was not available or was previously read in
memory rather than a file, client received HTTP 500 error, but no explanation
was logged in error log. This could happen, for example, if request body was
read or discarded prior to error_page redirect, or if mirroring was enabled
along with dav.
Sergey Kandaurov [Tue, 30 Jan 2018 14:46:31 +0000 (17:46 +0300)]
SSL: using default server context in session remove (closes #1464).
This fixes segfault in configurations with multiple virtual servers sharing
the same port, where a non-default virtual server block misses certificate.
Maxim Dounin [Thu, 11 Jan 2018 18:43:49 +0000 (21:43 +0300)]
Upstream: fixed "header already sent" alerts on backend errors.
Following ad3f342f14ba046c (1.9.13), it is possible that a request where
header was already sent will be finalized with NGX_HTTP_BAD_GATEWAY,
triggering an attempt to return additional error response and the
"header already sent" alert as a result.
In particular, it is trivial to reproduce the problem with a HEAD request
and caching enabled. With caching enabled nginx will change HEAD to GET
and will set u->pipe->downstream_error to suppress sending the response
body to the client. When a backend-related error occurs (for example,
proxy_read_timeout expires), ngx_http_finalize_upstream_request() will
be called with NGX_HTTP_BAD_GATEWAY. After ad3f342f14ba046c this will
result in ngx_http_finalize_request(NGX_HTTP_BAD_GATEWAY).
Fix is to move u->pipe->downstream_error handling to a later point,
where all special response codes are changed to NGX_ERROR.
Reported by Jan Prachar,
http://mailman.nginx.org/pipermail/nginx-devel/2018-January/010737.html.
Roman Arutyunyan [Thu, 21 Dec 2017 10:29:40 +0000 (13:29 +0300)]
Allowed configuration token to start with a variable.
Specifically, it is now allowed to start with a variable expression with braces:
${name}. The opening curly bracket in such a token was previously considered
the start of a new block. Variables located anywhere else in a token worked
fine: foo${name}.
Roman Arutyunyan [Tue, 19 Dec 2017 16:00:27 +0000 (19:00 +0300)]
Fixed capabilities version.
Previously, capset(2) was called with the 64-bit capabilities version
_LINUX_CAPABILITY_VERSION_3. With this version Linux kernel expected two
copies of struct __user_cap_data_struct, while only one was submitted. As a
result, random stack memory was accessed and random capabilities were requested
by the worker. This sometimes caused capset() errors. Now the 32-bit version
_LINUX_CAPABILITY_VERSION_1 is used instead. This is OK since CAP_NET_RAW is
a 32-bit capability (CAP_NET_RAW = 13).
Roman Arutyunyan [Mon, 18 Dec 2017 18:09:39 +0000 (21:09 +0300)]
Improved the capabilities feature detection.
Previously included file sys/capability.h mentioned in capset(2) man page,
belongs to the libcap-dev package, which may not be installed on some Linux
systems when compiling nginx. This prevented the capabilities feature from
being detected and compiled on that systems.
Now linux/capability.h system header is included instead. Since capset()
declaration is located in sys/capability.h, now capset() syscall is defined
explicitly in code using the SYS_capset constant, similarly to other
Linux-specific features in nginx.
Roman Arutyunyan [Wed, 13 Dec 2017 17:40:53 +0000 (20:40 +0300)]
Retain CAP_NET_RAW capability for transparent proxying.
The capability is retained automatically in unprivileged worker processes after
changing UID if transparent proxying is enabled at least once in nginx
configuration.
Maxim Dounin [Thu, 7 Dec 2017 14:09:33 +0000 (17:09 +0300)]
Configure: moved IP_BIND_ADDRESS_NO_PORT test.
In 2c7b488a61fb, IP_BIND_ADDRESS_NO_PORT test was accidentally placed
between SO_BINDANY, IP_TRANSPARENT, and IP_BINDANY tests. Moved it after
these tests.
Roman Arutyunyan [Mon, 20 Nov 2017 17:50:35 +0000 (20:50 +0300)]
Proxy: escape explicit space in URI in default cache key.
If the flag space_in_uri is set, the URI in HTTP upstream request is escaped to
convert space to %20. However this flag is not checked while creating the
default cache key. This leads to different cache keys for requests
'/foo bar' and '/foo%20bar', while the upstream requests are identical.
Additionally, the change fixes background cache updates when the client URI
contains unescaped space. Default cache key in a subrequest is always based on
escaped URI, while the main request may not escape it. As a result, background
cache update subrequest may update a different cache entry.
Roman Arutyunyan [Mon, 20 Nov 2017 18:11:19 +0000 (21:11 +0300)]
Inherit valid_unparsed_uri in cloned subrequests (ticket #1430).
Inheriting this flag will make the cloned subrequest behave consistently with
the parent. Specifically, the upstream HTTP request and cache key created by
the proxy module may depend directly on unparsed_uri if valid_unparsed_uri flag
is set. Previously, the flag was zero for cloned requests, which could make
background update proxy a request different than its parent and cache the result
with a different key. For example, if client URI contained the escaped slash
character %2F, it was used as is by the proxy module in the main request, but
was unescaped in the subrequests.
Roman Arutyunyan [Mon, 20 Nov 2017 10:47:17 +0000 (13:47 +0300)]
Proxy: simplified conditions of using unparsed uri.
Previously, the unparsed uri was explicitly allowed to be used only by the main
request. However the valid_unparsed_uri flag is nonzero only in the main
request, which makes the main request check pointless.
If the data to write is bigger than what the socket can send, and the
reminder is smaller than NGX_SSL_BUFSIZE, then SSL_write() fails with
SSL_ERROR_WANT_WRITE. The reminder of payload however is successfully
copied to the low-level buffer and all the output chain buffers are
flushed. This means that retry logic doesn't work because
ngx_http_upstream_process_non_buffered_request() checks only if there's
anything in the output chain buffers and ignores the fact that something
may be buffered in low-level parts of the stack.
Roman Arutyunyan [Tue, 28 Nov 2017 11:00:00 +0000 (14:00 +0300)]
Upstream keepalive: clean read delayed flag in stored connections.
If a connection with the read delayed flag set was stored in the keepalive
cache, and after picking it from the cache a read timer was set on that
connection, this timer was considered a delay timer rather than a socket read
event timer as expected. The latter timeout is usually much longer than the
former, which caused a significant delay in request processing.
The issue manifested itself with proxy_limit_rate and upstream keepalive
enabled and exists since 973ee2276300 (1.7.7) when proxy_limit_rate was
introduced.
Ruslan Ermilov [Tue, 28 Nov 2017 09:00:24 +0000 (12:00 +0300)]
Fixed "changing binary" when reaper is not init.
On some systems, it's possible that reaper of orphaned processes is
set to something other than "init" process. On such systems, the
changing binary procedure did not work.
The fix is to check if PPID has changed, instead of assuming it's
always 1 for orphaned processes.
Maxim Dounin [Thu, 23 Nov 2017 13:33:40 +0000 (16:33 +0300)]
Configure: fixed clang detection on MINIX.
As per POSIX, basic regular expressions have no alternations, and the
interpretation of the "\|" construct is undefined. At least on MINIX
and Solaris grep interprets "\|" as literal "|", and not as an alternation
as GNU grep does. Removed such constructs introduced in f1daa0356a1d.
This fixes clang detection on MINIX.
Maxim Dounin [Mon, 20 Nov 2017 13:31:07 +0000 (16:31 +0300)]
Fixed worker_shutdown_timeout in various cases.
The ngx_http_upstream_process_upgraded() did not handle c->close request,
and upgraded connections do not use the write filter. As a result,
worker_shutdown_timeout did not affect upgraded connections (ticket #1419).
Fix is to handle c->close in the ngx_http_request_handler() function, thus
covering most of the possible cases in http handling.
Additionally, mail proxying did not handle neither c->close nor c->error,
and thus worker_shutdown_timeout did not work for mail connections. Fix is
to add c->close handling to ngx_mail_proxy_handler().
Also, added explicit handling of c->close to stream proxy,
ngx_stream_proxy_process_connection(). This improves worker_shutdown_timeout
handling in stream, it will no longer wait for some data being transferred
in a connection before closing it, and will also provide appropriate
logging at the "info" level.
Maxim Dounin [Sat, 18 Nov 2017 01:03:27 +0000 (04:03 +0300)]
Gzip: support for a zlib variant from Intel.
A zlib variant from Intel as available from https://github.com/jtkukunas/zlib
uses 64K hash instead of scaling it from the specified memory level, and
also uses 16-byte padding in one of the window-sized memory buffers, and can
force window bits to 13 if compression level is set to 1 and appropriate
compile options are used. As a result, nginx complained with "gzip filter
failed to use preallocated memory" alerts.
This change improves deflate_state allocation detection by testing that
items is 1 (deflate_state is the only allocation where items is 1).
Additionally, on first failure to use preallocated memory we now assume
that we are working with the Intel's modified zlib, and switch to using
appropriate preallocations. If this does not help, we complain with the
usual alerts.
Previous version of this patch was published at
http://mailman.nginx.org/pipermail/nginx/2014-July/044568.html.
The zlib variant in question is used by default in ClearLinux from Intel,
see http://mailman.nginx.org/pipermail/nginx-ru/2017-October/060421.html,
http://mailman.nginx.org/pipermail/nginx-ru/2017-November/060544.html.
Maxim Dounin [Thu, 9 Nov 2017 12:35:20 +0000 (15:35 +0300)]
FastCGI: adjust buffer position when parsing incomplete records.
Previously, nginx failed to move buffer position when parsing an incomplete
record header, and due to this wasn't be able to continue parsing once
remaining bytes of the record header were received.
This can affect response header parsing, potentially generating spurious errors
like "upstream sent unexpected FastCGI request id high byte: 1 while reading
response header from upstream". While this is very unlikely, since usually
record headers are written in a single buffer, this still can happen in real
life, for example, if a record header will be split across two TCP packets
and the second packet will be delayed.
This does not affect non-buffered response body proxying, due to "buf->pos =
buf->last;" at the start of the ngx_http_fastcgi_non_buffered_filter()
function. Also this does not affect buffered response body proxying, as
each input buffer is only passed to the filter once.
Maxim Dounin [Tue, 17 Oct 2017 16:52:16 +0000 (19:52 +0300)]
Core: free shared memory zones only after reconfiguration.
This is what usually happens for zones no longer used in the new
configuration, but zones where size or tag were changed were freed
when creating new memory zones. If reconfiguration failed (for
example, due to a conflicting listening socket), this resulted in a
segmentation fault in the master process.
Reported by Zhihua Cao,
http://mailman.nginx.org/pipermail/nginx-devel/2017-October/010536.html.
In particular, if ngx_http_postpone_filter_add() fails in ngx_chain_add_copy(),
the output chain of the postponed request was left in an invalid state.
Roman Arutyunyan [Wed, 11 Oct 2017 14:38:21 +0000 (17:38 +0300)]
Upstream: disabled upgrading in subrequests.
Upgrading an upstream connection is usually followed by reading from the client
which a subrequest is not allowed to do. Moreover, accessing the header_in
request field while processing upgraded connection ends up with a null pointer
dereference since the header_in buffer is only created for the the main request.
Ruslan Ermilov [Wed, 11 Oct 2017 19:04:28 +0000 (22:04 +0300)]
Upstream: fixed $upstream_status when upstream returns 503/504.
If proxy_next_upstream includes http_503/http_504, and upstream
returns 503/504, $upstream_status converted this to 502 for any
values except the last one.
Upstream: fixed error handling of stale and revalidated cache send.
The NGX_DONE value returned from ngx_http_upstream_cache_send() indicates
that upstream was already finalized in ngx_http_upstream_process_headers().
It was treated as a generic error which resulted in duplicate finalization.
Handled NGX_HTTP_UPSTREAM_INVALID_HEADER from ngx_http_upstream_cache_send().
Previously, it could return within ngx_http_upstream_finalize_request(), and
since it's below NGX_HTTP_SPECIAL_RESPONSE, a client connection could stuck.
Maxim Dounin [Mon, 9 Oct 2017 12:59:10 +0000 (15:59 +0300)]
Upstream: even better handling of invalid headers in cache files.
When parsing of headers in a cache file fails, already parsed headers
need to be cleared, and protocol state needs to be reinitialized. To do
so, u->request_sent is now set to ensure ngx_http_upstream_reinit() will
be called.
This change complements improvements in 46ddff109e72.
Maxim Dounin [Thu, 5 Oct 2017 14:43:05 +0000 (17:43 +0300)]
Upstream hash: reordered peer checks.
This slightly reduces cost of selecting a peer if all or almost all peers
failed, see ticket #1030. There should be no measureable difference with
other workloads.
Maxim Dounin [Thu, 5 Oct 2017 14:42:59 +0000 (17:42 +0300)]
Upstream hash: limited number of tries in consistent case.
While this may result in non-ideal distribution of requests if nginx
won't be able to select a server in a reasonable number of attempts,
this still looks better than severe performance degradation observed
if there is no limit and there are many points configured (ticket #1030).
This is also in line with what we do for other hash balancing methods.
Maxim Dounin [Wed, 4 Oct 2017 18:19:42 +0000 (21:19 +0300)]
Fixed handling of unix sockets in $binary_remote_addr.
Previously, unix sockets were treated as AF_INET ones, and this may
result in buffer overread on Linux, where unbound unix sockets have
2-byte addresses.
Note that it is not correct to use just sun_path as a binary representation
for unix sockets. This will result in an empty string for unbound unix
sockets, and thus behaviour of limit_req and limit_conn will change when
switching from $remote_addr to $binary_remote_addr. As such, normal text
representation is used.
Maxim Dounin [Wed, 4 Oct 2017 18:19:38 +0000 (21:19 +0300)]
Fixed handling of non-null-terminated unix sockets.
At least FreeBSD, macOS, NetBSD, and OpenBSD can return unix sockets
with non-null-terminated sun_path. Additionally, the address may become
non-null-terminated if it does not fit into the buffer provided and was
truncated (may happen on macOS, NetBSD, and Solaris, which allow unix socket
addresess larger than struct sockaddr_un). As such, ngx_sock_ntop() might
overread the sockaddr provided, as it used "%s" format and thus assumed
null-terminated string.
To fix this, the ngx_strnlen() function was introduced, and it is now used
to calculate correct length of sun_path.
Maxim Dounin [Wed, 4 Oct 2017 18:19:33 +0000 (21:19 +0300)]
Fixed buffer overread with unix sockets after accept().
Some OSes (notably macOS, NetBSD, and Solaris) allow unix socket addresses
larger than struct sockaddr_un. Moreover, some of them (macOS, Solaris)
return socklen of the socket address before it was truncated to fit the
buffer provided. As such, on these systems socklen must not be used without
additional check that it is within the buffer provided.
Appropriate checks added to ngx_event_accept() (after accept()),
ngx_event_recvmsg() (after recvmsg()), and ngx_set_inherited_sockets()
(after getsockname()).
We also obtain socket addresses via getsockname() in
ngx_connection_local_sockaddr(), but it does not need any checks as
it is only used for INET and INET6 sockets (as there can be no
wildcard unix sockets).
HTTP/2: enforce writing the sync request body buffer to file.
The sync flag of HTTP/2 request body buffer is used when the size of request
body is unknown or bigger than configured "client_body_buffer_size". In this
case the buffer points to body data inside the global receive buffer that is
used for reading all HTTP/2 connections in the worker process. Thus, when the
sync flag is set, the buffer must be flushed to a temporary file, otherwise
the request body data can be overwritten.
Previously, the sync buffer wasn't flushed to a temporary file if the whole
body was received in one DATA frame with the END_STREAM flag and wasn't
copied into the HTTP/2 body preread buffer. As a result, the request body
might be corrupted (ticket #1384).
Now, setting r->request_body_in_file_only enforces writing the sync buffer
to a temporary file in all cases.
Maxim Dounin [Tue, 3 Oct 2017 15:19:27 +0000 (18:19 +0300)]
Cache: fixed caching of intercepted errors (ticket #1382).
When caching intercepted errors, previous behaviour was to use
proxy_cache_valid times specified, regardless of various cache control
headers present in the response. Fix is to check u->cacheable and
use u->cache->valid_sec as set by various cache control response headers,
similar to how we do this in the normal caching code path.
Maxim Dounin [Mon, 2 Oct 2017 16:10:20 +0000 (19:10 +0300)]
Upstream: better handling of invalid headers in cache files.
If cache file is truncated, it is possible that u->process_header()
will return NGX_AGAIN. Added appropriate handling of this case by
changing the error to NGX_HTTP_UPSTREAM_INVALID_HEADER.
Also, added appropriate logging of this and NGX_HTTP_UPSTREAM_INVALID_HEADER
cases at the "crit" level. Note that this will result in duplicate logging
in case of NGX_HTTP_UPSTREAM_INVALID_HEADER. While this is something better
to avoid, it is considered to be an overkill to implement cache-specific
error logging in u->process_header().
Additionally, u->buffer.start is now reset to be able to receive a new
response, and u->cache_status set to MISS to provide the value in the
$upstream_cache_status variable, much like it happens on other cache file
errors detected by ngx_http_file_cache_read(), instead of HIT, which is
believed to be misleading.
Ruslan Ermilov [Fri, 22 Sep 2017 19:49:42 +0000 (22:49 +0300)]
Modules compatibility: down flag promoted to a bitmask.
It is to be used as a bitmask with various bits set/reset when appropriate. 63b8b157b776 made a similar change to ngx_http_upstream_rr_peer_t.down and
ngx_stream_upstream_rr_peer_t.down.
Previously, "get indexed header" message was logged when in fact only
header name was obtained using an index, and "get indexed header name"
was logged when full header representation (name and value) was obtained
using an index. Fixed version logs "get indexed name" and "get indexed
header" respectively.
Roman Arutyunyan [Tue, 12 Sep 2017 10:44:04 +0000 (13:44 +0300)]
Stream: fixed logging UDP upstream timeout.
Previously, when the first UDP response packet was not received from the
proxied server within proxy_timeout, no error message was logged before
switching to the next upstream. Additionally, when one of succeeding response
packets was not received within the timeout, the timeout error had low severity
because it was logged as a client connection error as opposed to upstream
connection error.
Introduced time truncation to December 31, 9999 (ticket #1368).
Various buffers are allocated in an assumption that there would be
no more than 4 year digits. This might not be true on platforms
with 64-bit time_t, as 64-bit time_t is able to represent more than that.
Such dates with more than 4 year digits hardly make sense though, as
various date formats in use do not allow them anyway.
As such, all dates are now truncated by ngx_gmtime() to December 31, 9999.
This should have no effect on valid dates, though will prevent potential
buffer overflows on invalid ones.
Fixed ngx_gmtime() on 32-bit platforms with 64-bit time_t.
In ngx_gmtime(), instead of casting to ngx_uint_t we now work with
time_t directly. This allows using dates after 2038 on 32-bit platforms
which use 64-bit time_t, notably NetBSD and OpenBSD.
As the code is not able to work with negative time_t values, argument
is now set to 0 for negative values. As a positive side effect, this
results in Epoch being used for such values instead of a date in distant
future.
Piotr Sikora [Wed, 30 Aug 2017 21:52:11 +0000 (14:52 -0700)]
HTTP/2: signal 0-byte HPACK's dynamic table size.
This change lets NGINX talk to clients with SETTINGS_HEADER_TABLE_SIZE
smaller than the default 4KB. Previously, NGINX would ACK the SETTINGS
frame with a small dynamic table size, but it would never send dynamic
table size update, leading to a connection-level COMPRESSION_ERROR.
Also, it allows clients to release 4KB of memory per connection, since
NGINX doesn't use HPACK's dynamic table when encoding headers, however
clients had to maintain it, since NGINX never signaled that it doesn't
use it.
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
Roman Arutyunyan [Mon, 11 Sep 2017 12:32:31 +0000 (15:32 +0300)]
Stream: relaxed next upstream condition (ticket #1317).
When switching to a next upstream, some buffers could be stuck in the middle
of the filter chain. A condition existed that raised an error when this
happened. As it turned out, this condition prevented switching to a next
upstream if ssl preread was used with the TCP protocol (see the ticket).
In fact, the condition does not make sense for TCP, since after successful
connection to an upstream switching to another upstream never happens. As for
UDP, the issue with stuck buffers is unlikely to happen, but is still possible.
Specifically, if a filter delays sending data to upstream.
The condition can be relaxed to only check the "buffered" bitmask of the
upstream connection. The new condition is simpler and fixes the ticket issue
as well. Additionally, the upstream_out chain is now reset for UDP prior to
connecting to a new upstream to prevent repeating the client data twice.