Maxim Dounin [Sat, 17 Mar 2018 20:04:26 +0000 (23:04 +0300)]
gRPC: special handling of "trailer only" responses.
The gRPC protocol makes a distinction between HEADERS frame with
the END_STREAM flag set, and a HEADERS frame followed by an empty
DATA frame with the END_STREAM flag. The latter is not permitted,
and results in errors not being propagated through nginx. Instead,
gRPC clients complain that "server closed the stream without sending
trailers" (seen in grpc-go) or "13: Received RST_STREAM with error
code 2" (seen in grpc-c).
To fix this, nginx now returns HEADERS with the END_STREAM flag if
the response length is known to be 0, and we are not expecting
any trailer headers to be added. And the response length is
explicitly set to 0 in the gRPC proxy if we see initial HEADERS frame
with the END_STREAM flag set.
Maxim Dounin [Sat, 17 Mar 2018 20:04:25 +0000 (23:04 +0300)]
gRPC: special handling of the TE request header.
According to the gRPC protocol specification, the "TE" header is used
to detect incompatible proxies, and at least grpc-c server rejects
requests without "TE: trailers".
To preserve the logic, we have to pass "TE: trailers" to the backend if
and only if the original request contains "trailers" in the "TE" header.
Note that no other TE values are allowed in HTTP/2, so we have to remove
anything else.
Maxim Dounin [Sat, 17 Mar 2018 20:04:24 +0000 (23:04 +0300)]
The gRPC proxy module.
The module allows passing requests to upstream gRPC servers.
The module is built by default as long as HTTP/2 support is compiled in.
Example configuration:
grpc_pass 127.0.0.1:9000;
Alternatively, the "grpc://" scheme can be used:
grpc_pass grpc://127.0.0.1:9000;
Keepalive support is available via the upstream keepalive module. Note
that keepalive connections won't currently work with grpc-go as it fails
to handle SETTINGS_HEADER_TABLE_SIZE.
To use with SSL:
grpc_pass grpcs://127.0.0.1:9000;
SSL connections use ALPN "h2" when available. At least grpc-go works fine
without ALPN, so if ALPN is not available we just establish a connection
without it.
Maxim Dounin [Sat, 17 Mar 2018 20:04:23 +0000 (23:04 +0300)]
Upstream: u->conf->preserve_output flag.
The flag can be used to continue sending request body even after we've
got a response from the backend. In particular, this is needed for gRPC
proxying of bidirectional streaming RPCs, and also to send control frames
in other forms of RPCs.
Maxim Dounin [Sat, 17 Mar 2018 20:04:22 +0000 (23:04 +0300)]
Upstream: u->request_body_blocked flag.
The flag indicates whether last ngx_output_chain() returned NGX_AGAIN
or not. If the flag is set, we arm the u->conf->send_timeout timer.
The flag complements c->write->ready test, and allows to stop sending
the request body in an output filter due to protocol-specific flow
control.
Basic trailer headers support allows one to access response trailers
via the $upstream_trailer_* variables.
Additionally, the u->conf->pass_trailers flag was introduced. When the
flag is set, trailer headers from the upstream response are passed to
the client. Like normal headers, trailer headers will be hidden
if present in u->conf->hide_headers_hash.
Maxim Dounin [Thu, 1 Mar 2018 17:25:50 +0000 (20:25 +0300)]
Core: ngx_current_msec now uses monotonic time if available.
When clock_gettime(CLOCK_MONOTONIC) (or faster variants, _FAST on FreeBSD,
and _COARSE on Linux) is available, we now use it for ngx_current_msec.
This should improve handling of timers if system time changes (ticket #189).
The r->out chain link could be left uninitialized in case of error.
A segfault could happen if the subrequest handler accessed it.
The issue was introduced in commit 20f139e9ffa8.
Roman Arutyunyan [Wed, 28 Feb 2018 13:56:58 +0000 (16:56 +0300)]
Generic subrequests in memory.
Previously, only the upstream response body could be accessed with the
NGX_HTTP_SUBREQUEST_IN_MEMORY feature. Now any response body from a subrequest
can be saved in a memory buffer. It is available as a single buffer in r->out
and the buffer size is configured by the subrequest_output_buffer_size
directive.
Upstream, proxy and fastcgi code used to handle the old-style feature is
removed.
Roman Arutyunyan [Thu, 22 Feb 2018 10:16:21 +0000 (13:16 +0300)]
Generate error for unsupported IPv6 transparent proxy.
On some platforms (for example, Linux with glibc 2.12-2.25) IPv4 transparent
proxying is available, but IPv6 transparent proxying is not. The entire feature
is enabled in this case and NGX_HAVE_TRANSPARENT_PROXY macro is set to 1.
Previously, an attempt to enable transparency for an IPv6 socket was silently
ignored in this case and was usually followed by a bind(2) EADDRNOTAVAIL error
(ticket #1487). Now the error is generated for unavailable IPv6 transparent
proxy.
If during configuration parsing of the geo directive the memory
allocation has failed, pool used to parse configuration inside
the block, and sometimes the temporary pool were not destroyed.
Maxim Dounin [Thu, 15 Feb 2018 16:06:22 +0000 (19:06 +0300)]
HTTP/2: precalculate hash for "Cookie".
There is no need to calculate hashes of static strings at runtime. The
ngx_hash() macro can be used to do it during compilation instead, similarly
to how it is done in ngx_http_proxy_module.c for "Server" and "Date" headers.
Ruslan Ermilov [Thu, 15 Feb 2018 14:51:37 +0000 (17:51 +0300)]
HTTP/2: fixed ngx_http_v2_push_stream() allocation error handling.
In particular, if a stream object allocation failed, and a client sent
the PRIORITY frame for this stream, ngx_http_v2_set_dependency() could
dereference a null pointer while trying to re-parent a dependency node.
Ruslan Ermilov [Fri, 9 Feb 2018 20:20:08 +0000 (23:20 +0300)]
HTTP/2: fixed null pointer dereference with server push.
r->headers_in.host can be NULL in ngx_http_v2_push_resource().
This happens when a request is terminated with 400 before the :authority
or Host header is parsed, and either pushing is enabled on the server{}
level or error_page 400 redirects to a location with pushes configured.
Ruslan Ermilov [Thu, 8 Feb 2018 06:55:03 +0000 (09:55 +0300)]
HTTP/2: server push.
Resources to be pushed are configured with the "http2_push" directive.
Also, preload links from the Link response headers, as described in
https://www.w3.org/TR/preload/#server-push-http-2, can be pushed, if
enabled with the "http2_push_preload" directive.
Only relative URIs with absolute paths can be pushed.
The number of concurrent pushes is normally limited by a client, but
cannot exceed a hard limit set by the "http2_max_concurrent_pushes"
directive.
Previously, when request body was not available or was previously read in
memory rather than a file, client received HTTP 500 error, but no explanation
was logged in error log. This could happen, for example, if request body was
read or discarded prior to error_page redirect, or if mirroring was enabled
along with dav.
Sergey Kandaurov [Tue, 30 Jan 2018 14:46:31 +0000 (17:46 +0300)]
SSL: using default server context in session remove (closes #1464).
This fixes segfault in configurations with multiple virtual servers sharing
the same port, where a non-default virtual server block misses certificate.
Maxim Dounin [Thu, 11 Jan 2018 18:43:49 +0000 (21:43 +0300)]
Upstream: fixed "header already sent" alerts on backend errors.
Following ad3f342f14ba046c (1.9.13), it is possible that a request where
header was already sent will be finalized with NGX_HTTP_BAD_GATEWAY,
triggering an attempt to return additional error response and the
"header already sent" alert as a result.
In particular, it is trivial to reproduce the problem with a HEAD request
and caching enabled. With caching enabled nginx will change HEAD to GET
and will set u->pipe->downstream_error to suppress sending the response
body to the client. When a backend-related error occurs (for example,
proxy_read_timeout expires), ngx_http_finalize_upstream_request() will
be called with NGX_HTTP_BAD_GATEWAY. After ad3f342f14ba046c this will
result in ngx_http_finalize_request(NGX_HTTP_BAD_GATEWAY).
Fix is to move u->pipe->downstream_error handling to a later point,
where all special response codes are changed to NGX_ERROR.
Reported by Jan Prachar,
http://mailman.nginx.org/pipermail/nginx-devel/2018-January/010737.html.
Roman Arutyunyan [Thu, 21 Dec 2017 10:29:40 +0000 (13:29 +0300)]
Allowed configuration token to start with a variable.
Specifically, it is now allowed to start with a variable expression with braces:
${name}. The opening curly bracket in such a token was previously considered
the start of a new block. Variables located anywhere else in a token worked
fine: foo${name}.
Roman Arutyunyan [Tue, 19 Dec 2017 16:00:27 +0000 (19:00 +0300)]
Fixed capabilities version.
Previously, capset(2) was called with the 64-bit capabilities version
_LINUX_CAPABILITY_VERSION_3. With this version Linux kernel expected two
copies of struct __user_cap_data_struct, while only one was submitted. As a
result, random stack memory was accessed and random capabilities were requested
by the worker. This sometimes caused capset() errors. Now the 32-bit version
_LINUX_CAPABILITY_VERSION_1 is used instead. This is OK since CAP_NET_RAW is
a 32-bit capability (CAP_NET_RAW = 13).
Roman Arutyunyan [Mon, 18 Dec 2017 18:09:39 +0000 (21:09 +0300)]
Improved the capabilities feature detection.
Previously included file sys/capability.h mentioned in capset(2) man page,
belongs to the libcap-dev package, which may not be installed on some Linux
systems when compiling nginx. This prevented the capabilities feature from
being detected and compiled on that systems.
Now linux/capability.h system header is included instead. Since capset()
declaration is located in sys/capability.h, now capset() syscall is defined
explicitly in code using the SYS_capset constant, similarly to other
Linux-specific features in nginx.
Roman Arutyunyan [Wed, 13 Dec 2017 17:40:53 +0000 (20:40 +0300)]
Retain CAP_NET_RAW capability for transparent proxying.
The capability is retained automatically in unprivileged worker processes after
changing UID if transparent proxying is enabled at least once in nginx
configuration.
Maxim Dounin [Thu, 7 Dec 2017 14:09:33 +0000 (17:09 +0300)]
Configure: moved IP_BIND_ADDRESS_NO_PORT test.
In 2c7b488a61fb, IP_BIND_ADDRESS_NO_PORT test was accidentally placed
between SO_BINDANY, IP_TRANSPARENT, and IP_BINDANY tests. Moved it after
these tests.
Roman Arutyunyan [Mon, 20 Nov 2017 17:50:35 +0000 (20:50 +0300)]
Proxy: escape explicit space in URI in default cache key.
If the flag space_in_uri is set, the URI in HTTP upstream request is escaped to
convert space to %20. However this flag is not checked while creating the
default cache key. This leads to different cache keys for requests
'/foo bar' and '/foo%20bar', while the upstream requests are identical.
Additionally, the change fixes background cache updates when the client URI
contains unescaped space. Default cache key in a subrequest is always based on
escaped URI, while the main request may not escape it. As a result, background
cache update subrequest may update a different cache entry.
Roman Arutyunyan [Mon, 20 Nov 2017 18:11:19 +0000 (21:11 +0300)]
Inherit valid_unparsed_uri in cloned subrequests (ticket #1430).
Inheriting this flag will make the cloned subrequest behave consistently with
the parent. Specifically, the upstream HTTP request and cache key created by
the proxy module may depend directly on unparsed_uri if valid_unparsed_uri flag
is set. Previously, the flag was zero for cloned requests, which could make
background update proxy a request different than its parent and cache the result
with a different key. For example, if client URI contained the escaped slash
character %2F, it was used as is by the proxy module in the main request, but
was unescaped in the subrequests.
Roman Arutyunyan [Mon, 20 Nov 2017 10:47:17 +0000 (13:47 +0300)]
Proxy: simplified conditions of using unparsed uri.
Previously, the unparsed uri was explicitly allowed to be used only by the main
request. However the valid_unparsed_uri flag is nonzero only in the main
request, which makes the main request check pointless.
If the data to write is bigger than what the socket can send, and the
reminder is smaller than NGX_SSL_BUFSIZE, then SSL_write() fails with
SSL_ERROR_WANT_WRITE. The reminder of payload however is successfully
copied to the low-level buffer and all the output chain buffers are
flushed. This means that retry logic doesn't work because
ngx_http_upstream_process_non_buffered_request() checks only if there's
anything in the output chain buffers and ignores the fact that something
may be buffered in low-level parts of the stack.
Roman Arutyunyan [Tue, 28 Nov 2017 11:00:00 +0000 (14:00 +0300)]
Upstream keepalive: clean read delayed flag in stored connections.
If a connection with the read delayed flag set was stored in the keepalive
cache, and after picking it from the cache a read timer was set on that
connection, this timer was considered a delay timer rather than a socket read
event timer as expected. The latter timeout is usually much longer than the
former, which caused a significant delay in request processing.
The issue manifested itself with proxy_limit_rate and upstream keepalive
enabled and exists since 973ee2276300 (1.7.7) when proxy_limit_rate was
introduced.
Ruslan Ermilov [Tue, 28 Nov 2017 09:00:24 +0000 (12:00 +0300)]
Fixed "changing binary" when reaper is not init.
On some systems, it's possible that reaper of orphaned processes is
set to something other than "init" process. On such systems, the
changing binary procedure did not work.
The fix is to check if PPID has changed, instead of assuming it's
always 1 for orphaned processes.
Maxim Dounin [Thu, 23 Nov 2017 13:33:40 +0000 (16:33 +0300)]
Configure: fixed clang detection on MINIX.
As per POSIX, basic regular expressions have no alternations, and the
interpretation of the "\|" construct is undefined. At least on MINIX
and Solaris grep interprets "\|" as literal "|", and not as an alternation
as GNU grep does. Removed such constructs introduced in f1daa0356a1d.
This fixes clang detection on MINIX.
Maxim Dounin [Mon, 20 Nov 2017 13:31:07 +0000 (16:31 +0300)]
Fixed worker_shutdown_timeout in various cases.
The ngx_http_upstream_process_upgraded() did not handle c->close request,
and upgraded connections do not use the write filter. As a result,
worker_shutdown_timeout did not affect upgraded connections (ticket #1419).
Fix is to handle c->close in the ngx_http_request_handler() function, thus
covering most of the possible cases in http handling.
Additionally, mail proxying did not handle neither c->close nor c->error,
and thus worker_shutdown_timeout did not work for mail connections. Fix is
to add c->close handling to ngx_mail_proxy_handler().
Also, added explicit handling of c->close to stream proxy,
ngx_stream_proxy_process_connection(). This improves worker_shutdown_timeout
handling in stream, it will no longer wait for some data being transferred
in a connection before closing it, and will also provide appropriate
logging at the "info" level.
Maxim Dounin [Sat, 18 Nov 2017 01:03:27 +0000 (04:03 +0300)]
Gzip: support for a zlib variant from Intel.
A zlib variant from Intel as available from https://github.com/jtkukunas/zlib
uses 64K hash instead of scaling it from the specified memory level, and
also uses 16-byte padding in one of the window-sized memory buffers, and can
force window bits to 13 if compression level is set to 1 and appropriate
compile options are used. As a result, nginx complained with "gzip filter
failed to use preallocated memory" alerts.
This change improves deflate_state allocation detection by testing that
items is 1 (deflate_state is the only allocation where items is 1).
Additionally, on first failure to use preallocated memory we now assume
that we are working with the Intel's modified zlib, and switch to using
appropriate preallocations. If this does not help, we complain with the
usual alerts.
Previous version of this patch was published at
http://mailman.nginx.org/pipermail/nginx/2014-July/044568.html.
The zlib variant in question is used by default in ClearLinux from Intel,
see http://mailman.nginx.org/pipermail/nginx-ru/2017-October/060421.html,
http://mailman.nginx.org/pipermail/nginx-ru/2017-November/060544.html.
Maxim Dounin [Thu, 9 Nov 2017 12:35:20 +0000 (15:35 +0300)]
FastCGI: adjust buffer position when parsing incomplete records.
Previously, nginx failed to move buffer position when parsing an incomplete
record header, and due to this wasn't be able to continue parsing once
remaining bytes of the record header were received.
This can affect response header parsing, potentially generating spurious errors
like "upstream sent unexpected FastCGI request id high byte: 1 while reading
response header from upstream". While this is very unlikely, since usually
record headers are written in a single buffer, this still can happen in real
life, for example, if a record header will be split across two TCP packets
and the second packet will be delayed.
This does not affect non-buffered response body proxying, due to "buf->pos =
buf->last;" at the start of the ngx_http_fastcgi_non_buffered_filter()
function. Also this does not affect buffered response body proxying, as
each input buffer is only passed to the filter once.
Maxim Dounin [Tue, 17 Oct 2017 16:52:16 +0000 (19:52 +0300)]
Core: free shared memory zones only after reconfiguration.
This is what usually happens for zones no longer used in the new
configuration, but zones where size or tag were changed were freed
when creating new memory zones. If reconfiguration failed (for
example, due to a conflicting listening socket), this resulted in a
segmentation fault in the master process.
Reported by Zhihua Cao,
http://mailman.nginx.org/pipermail/nginx-devel/2017-October/010536.html.
In particular, if ngx_http_postpone_filter_add() fails in ngx_chain_add_copy(),
the output chain of the postponed request was left in an invalid state.
Roman Arutyunyan [Wed, 11 Oct 2017 14:38:21 +0000 (17:38 +0300)]
Upstream: disabled upgrading in subrequests.
Upgrading an upstream connection is usually followed by reading from the client
which a subrequest is not allowed to do. Moreover, accessing the header_in
request field while processing upgraded connection ends up with a null pointer
dereference since the header_in buffer is only created for the the main request.
Ruslan Ermilov [Wed, 11 Oct 2017 19:04:28 +0000 (22:04 +0300)]
Upstream: fixed $upstream_status when upstream returns 503/504.
If proxy_next_upstream includes http_503/http_504, and upstream
returns 503/504, $upstream_status converted this to 502 for any
values except the last one.
Upstream: fixed error handling of stale and revalidated cache send.
The NGX_DONE value returned from ngx_http_upstream_cache_send() indicates
that upstream was already finalized in ngx_http_upstream_process_headers().
It was treated as a generic error which resulted in duplicate finalization.
Handled NGX_HTTP_UPSTREAM_INVALID_HEADER from ngx_http_upstream_cache_send().
Previously, it could return within ngx_http_upstream_finalize_request(), and
since it's below NGX_HTTP_SPECIAL_RESPONSE, a client connection could stuck.
Maxim Dounin [Mon, 9 Oct 2017 12:59:10 +0000 (15:59 +0300)]
Upstream: even better handling of invalid headers in cache files.
When parsing of headers in a cache file fails, already parsed headers
need to be cleared, and protocol state needs to be reinitialized. To do
so, u->request_sent is now set to ensure ngx_http_upstream_reinit() will
be called.
This change complements improvements in 46ddff109e72.
Maxim Dounin [Thu, 5 Oct 2017 14:43:05 +0000 (17:43 +0300)]
Upstream hash: reordered peer checks.
This slightly reduces cost of selecting a peer if all or almost all peers
failed, see ticket #1030. There should be no measureable difference with
other workloads.
Maxim Dounin [Thu, 5 Oct 2017 14:42:59 +0000 (17:42 +0300)]
Upstream hash: limited number of tries in consistent case.
While this may result in non-ideal distribution of requests if nginx
won't be able to select a server in a reasonable number of attempts,
this still looks better than severe performance degradation observed
if there is no limit and there are many points configured (ticket #1030).
This is also in line with what we do for other hash balancing methods.