BUG/MEDIUM: mux-h1: Force close mode for bodyless message announcing a C-L
When dealing with the EOH block, we must be sure to force the close
mode for messages with no payload but announcing a non-null
content-length.
It is mainly an issue on the server side but it could be encountered on
client side too. Without this fix, a request can be switched to the DONE
state while the server is still expecting the payload. In an ideal world,
this case should not happen. But in conjunction with other bugs, it may lead
to a desynchronization between haproxy and the server.
Now, when a non-null content-length is announced but we know we reached the
end of the message, we force the close mode. The only exception is for
bodyless responses (204s, 304s and responses to HEAD requests).
Thanks to Martino Spagnuolo (r3verii) for his detailed report on this issue.
This patch must be backported to all stable versions.
BUG/MAJOR: mux-h2: detect incomplete transfers on HEADERS frames as well
Checks are already made on H2 to detect inconsistencies between
advertised content-length and transferred data (excess of data or
premature END_STREAM flag on DATA frame). However, as found by
Martino Spagnuolo (r3verii), a subtle case remains: if the END_STREAM
appears on the HEADERS frame (i.e. a regular request for example),
then the check is not made. In this case it is possible to advertise
more contents than will really be transferred. If the other side uses
HTTP/1.1 and the server responds before the end of the transfer, the
advertised bytes that will never be transferred will be drained by the
server from the next request, effectively hiding part of its headers.
In practice this can be used to force subsequent requests to fail, or
when running with "http-reuse never" or when running with a totally
idle server, to perform request smuggling by constructing specially
crafted request pairs, where the first one triggers an early response
and hides part or all of the second one's headers, so that an embedded
request that was not subject to analysis is processed instead.
The risk remains moderate given the low prevalence of "http-reuse never"
in production environments, and of idle servers.
The fix consists in detecting if advertised content-length remains when
processing an END_STREAM flag on a HEADERS frame. It also does it for
trailers, which turn out to be another way to abuse the bug. However it
takes great care not to break bodyless responses (204, 304 and responses
to HEAD requests) that may present a content-length that doesn't reflect
the presence of a body in the response.
A temporary alternative to the fix is to disable HTTP/2 by specifying
"alpn http/1.1" on "bind" lines, and adding "option disable-h2-upgrade"
in HTTP frontends.
BUG/MINOR: server: fix a possible leak of an error message in dynamic servers
In 3.4-dev6, commit de5fc2f515 ("BUG/MINOR: server: set auto SNI for
dynamic servers") allowed the SNI to be properly set and an error
message to be returned. However, the error message is leaked after
being printed on the CLI.
BUG/MINOR: debug: properly mark the entire libs archive read-only
In 3.4-dev7, commit e1738b665d ("MINOR: debug: read all libs in memory
when set-dumpable=libs") reads dependencies into memory to store them as
a tar archive for later debugging. There was an attempt to mark the whole
archive read-only, except that the size passed in argument to mprotect()
is wrong: lib_size is only assigned after the operation and is still zero
at the moment this is done. new_size ought to be used instead.
This needs to be backported wherever the commit above is backported, at
least 3.2.
BUG/MINOR: stream: add the newly added SF_TXN_* flags to strm_show_flags()
Three new enum values and a mask were added in the latest -dev with
commit 24e05fe33a ("MINOR: stream: Use a pcli transaction to replace
pcli_* members"), but unfortunately the entries needed by the "flags"
command were forgotten.
DOC: config: fix typo introduced in max-threads-per-group documentation
Since commit 0af603f46f ("MEDIUM: threads: change the default
max-threads-per-group value to 16"), it was written "Tha minimum" instead
of "The minimum". No backport needed, this is only in latest -dev.
BUG/MINOR: servers: fix last_sess date calculation
In 3.4-dev8, commit e264523112 ("MINOR: servers: Don't update last_sess
if it did not change") adjusted the last_sess date to avoid writing to
the same cache line all the time, however a typo makes it pick the wrong
second because it uses now_ms instead of now_ns (so the date would roughly
change every 12 days).
BUG/MINOR: compression: properly disable request when setting response
In 2.8, commit ead43fe4f2 ("MEDIUM: compression: Make it so we can
compress requests as well.") added the ability to independently enable
compression on request and/or response. However there's a bug in the
"compression direction response" case, which preserves only the request
flag and adds the response direction instead of clearing the request
flag, so this directive would clear offload and make it impossible to
disable request if it was already previously enabled.
This can be backported to stable releases as far as 2.8.
BUG/MEDIUM: tcpcheck: Release temporary small chunk when retrying on http-check
When an HTTP request is sent during an HTTP health check, if an error
is triggered while the output buffer is a small buffer, another attempt
is made with a larger one. When this happens, the temporary chunk used
to format headers must be released.
BUG/MINOR: tcpcheck: Don't release ruleset when parsing 'spop-check' ruleset
Rulesets are stored in a global tree and released on the deinit stage.
All errors are fatal and abort the configuration parsing. So the
current ruleset must not be released here.
BUG/MINOR: mux-h1: Fix test to skip trailers from chunked messages
The test to remove trailers from chunked messages was inverted and is
thus ineffective: the flag for requests was tested on the client side
and the flag for responses on the server side. It should be the
opposite.
BUG/MINOR: mux-h1: Fix condition to send null-chunk for bodyless message
When the EOH block is processed, before sending message headers, there is a
test to know if there is no payload. In case of a chunked message, a
null-chunk is emitted, except for bodyless responses. For instance, a
response to a HEAD request has no payload at all and no null-chunk.
However, the test for bodyless responses is not correct: only the
H1S_F_BODYLESS_RESP flag is tested, but this flag can be set on the
server side when we are processing the request. To fix the issue, the
test was adapted: the null-chunk is added if a message with no payload
is chunked and it is a request or a non-bodyless response.
This patch must be backported to all stable versions.
BUG/MINOR: log: also wait for the response when logging response headers
A typo in commit e51be30f78 ("BUG/MINOR: log: consider format expression
dependencies to decide when to log") made HRSHP appear twice (persistent
response) while the second one ought to be HRSHV (volatile response, e.g.
header values). This is harmless in practice since logs always wait for
at least headers.
This should be backported wherever the patch above was backported.
BUG/MINOR: H2: Don't forget to free shared_rx_bufs on failure
In h2_init(), if we have a failure while creating the h2c, and we
allocated shared_rx_bufs, don't forget to free them, otherwise we'll
have a memory leak.
This was introduced in 3.1 by commit a891534bfd ("MINOR: mux-h2: allocate
the array of shared rx bufs in the h2c"), so the fix should be backported
as far as 3.2.
BUG/MINOR: h2: Don't look at the exclusive bit for PRIORITY frame
When receiving a PRIORITY frame, when checking whether the stream ID it
provides is our own, ignore bit 31, as it is the exclusive bit and not
part of the stream ID. Without this, a peer sending a PRIORITY frame
with its own ID and the exclusive bit set would not be considered in
error, although it should be per the RFC.
The impact is basically non-existent since we don't use PRIORITY frames,
it's only that we would ignore such an invalid frame instead of breaking
the connection.
The bug was introduced in 1.9 with commit 92153fccd3 ("BUG/MINOR: h2:
properly check PRIORITY frames") so the fix must be backported to all
versions.
BUG/MINOR: h2: make tune.h2.log-errors actually work
Commit e67e36c9eb35eb1477ae0e425a660ee0c631cecd introduced
tune.h2.log-errors, which lets you pick whether you want to know about
stream errors, connection errors, or no errors.
However, a logic error made it so that no errors were picked for any
value except "none", in which case connection errors were picked. Fix
that by just checking the strcmp() return value correctly.
BUG/MEDIUM: tasks: Make sure we don't schedule a task already running
In task_schedule(), before attempting to set the new task expiration
date, make sure it is not running by trying to set the TASK_RUNNING
flag, and waiting if it is already there. Having the flag set will
ensure that the task won't be running while we're modifying it.
There is a very rare race condition, where the expire would be set by
task_schedule(), then the running task might set it to something else,
and if it sets it to TICK_ETERNITY before task_schedule() calls
__task_queue(), then we will hit a BUG_ON() there.
This is very hard to reproduce, but has been reported a few times,
including in GitHub issue #3327, which should now be fixed.
This should be backported as far back as 2.8.
WIP: Make sure the task is not running before changing expire
BUG/MINOR: mux-h2: count a proto error when rejecting a stream on parsing error
The proxy error counter was not updated in h2c_frt_handle_headers() in
case of failure to decode a HEADERS frame. Make sure to keep it updated.
This can be backported to all stable versions.
BUG/MINOR: mux-h2: count a protocol error when failing to parse a trailer
Commit aab1a60977 ("BUG/MEDIUM: h2/htx: always fail on too large trailers")
explicitly returned an RST_STREAM on failure to decode some trailers, and
used the code H2_ERR_INTERNAL_ERROR. However there are multiple possible
causes for this failure to happen, and it turns out that it's much more
likely to be related to a protocol error than a decompression error. So
let's change this to PROTOCOL_ERROR, and count a protocol error on the
proxy and in the session.
This can be backported to all stable versions (with adjustments related
to these versions, maybe focusing on 3.2 max is reasonable).
CLEANUP: applet: Remove useless shadow pointer from appctx
This pointer was used during the appctx refactoring performed in 2.6. The
ctx union was still there and this pointer was used as the "shadow" of the
svcctx pointer used by most commands. In 2.7, the union was removed, making
the shadow pointer useless. Let's remove it now.
MINOR: stream: Use a pcli transaction to replace pcli_* members
A new type of transaction was introduced for master-cli streams. So
SF_TXN_PCLI flag and functions to allocate and destroy PCLI transactions
were added.
In the stream structure, all pcli_* members were moved into the pcli
transaction and the txn union was updated accordingly.
When it was ambiguous, a test on the transaction type was performed,
for instance to destroy the transaction.
MINOR: stream: Add flags to identify the stream transaction when allocated
To be able to deal with different types of transaction for a stream,
new stream flags were added to know the transaction type when
allocated. For now only HTTP transactions can be allocated, so only
SF_TXN_HTTP was introduced. The mask SF_TXN_MASK must be used to get
the transaction type.
The transaction type is set when it is allocated and removed when it is
destroyed.
The HTTP transaction is moved into a union. For now, it is the only
possible transaction that can be allocated, but that will change.
Thanks to this commit and the next one, it will be possible to deal
with different kinds of transactions for a stream.
This patch looks quite huge, but it is more or less a renaming of all
accesses to the "txn" field to "txn.http".
MEDIUM: cli: increase the payload pattern up to 64 bytes
The maximum size allowed for the payload pattern was increased to 64
bytes (65 bytes because of the trailing \0), to be able to use a sha256
of random data for instance. This can be useful to prevent any data
smuggling in the payload.
Note that on the CLI, it could be possible to have only the buffer size as a
limit, because the command line is only consumed once all commands are
executed. The payload pattern is only a pointer in the buffer where the
command line was copied. However, for the master CLI, the data are streamed
to the worker, so we must keep a copy of the payload pattern. This is why we
must limit its size.
MEDIUM: cli: Add support for dynamically allocated payloads
It is now possible to deal with a payload too big to fit in a buffer,
without changing the buffer size. By default, a payload up to 128 KB can be
dynamically allocated. "tune.cli.max-payload-size" global parameter can be
used to change this value, with some caution for huge values.
For CLI command handler functions, there is no change at all. A pointer on
the payload is still passed as parameter. Internally, an area is allocated
for the payload only if it is too big.
The payload pattern used to detect the end of the payload is part of
the allocated area.
MEDIUM: cli: Make a buffer for the command payload
The payload is now saved as a buffer in the CLI context instead of a simple
pointer. It is mandatory to be able to reallocate the payload if it is too
big.
MINOR: cli: Handle the payload pattern as a pointer in the cmdline buffer
Instead of copying the payload pattern in the CLI context, we now only save
a pointer on this pattern. It is possible because the command line is copied
in the CLI context. Arguments are already handled this way when the command
is processed.
BUILD: config: also set DEF_MAX_THREADS_PER_GROUP when not using threads
The single-threaded build is currently broken in development since commit 0af603f46f ("MEDIUM: threads: change the default max-threads-per-group
value to 16") because it doesn't set the default for the non-threaded
build. Let's set it to 1.
BUG/MAJOR: sched: protect task->expire on 32-bit platforms
Commit 7d40b3134 ("MEDIUM: sched: do not run a same task multiple times
in series") required to slightly reorder a few fields in struct tasklet
and task in order to reuse an existing hole and keep tree nodes aligned.
The problem is that nice+expire were placed in struct task just before
rq, and that a 48-bit hole replaces them in struct tasklet on 64-bit
platforms, just before the struct list. However, on 32-bit platforms,
the hole is only 16-bit and preserves nice, but expire is overwritten
by the first pointer of the list element. This is not a problem for
real tasklets which do not use these fields, but it definitely is a
problem for tasks that are cast to tasklets in the run queues, because
the expire field can be overwritten when the task is woken up, and if
requeued as-is, it will expire at a completely random date.
This is what caused certain regtests to fail on i386 and 32-bit arm
machines.
This fix needs to be backported wherever the patch above was backported.
The bug has no effect on 64-bit platforms. The fix doesn't inflate
structs on 64-bit, but will raise struct tasklet from 40 to 44 bytes on
32-bit platforms.
Thanks to William for spotting the problem, bisecting it and providing
a working reproducer.
CLEANUP: mux-h1: remove the unneeded test on conn->owner in h1s_finish_detach()
There was a test below the "release" label on conn->owner to decide
whether to kill the connection or not. But this test is not needed,
because:
- for frontends, it's always set so the test never matches
- for backends, it was NULL on the second stream once a request
was being reused from an idle pool, so it couldn't be used to
discriminate between connections. In practice, the goal was to
try to detect certain dead connections but all cases leading to
such connections are either already handled in the tests before
(which don't reach this label), or are handled by the other
conditions.
CLEANUP: mux-h1: avoid using conn->owner in uncertain areas
Some places use conn->owner to retrieve the session. It's valid because
each time it is done, it's on the frontend, though it's not always 100%
obvious and sometimes requires deep code analysis. Let's clarify these
points and even rely on an intermediary variable to make it clearer. One
case where the owner couldn't differ from the session without being NULL
was also eliminated.
MEDIUM: session: always reset the conn->owner on backend when installing mux
When installing a mux on the backend, unless we have a good reason for
keeping the session set in conn->owner, we must reset it. Having the
session there just hides potential bugs and prevents certain tests from
being properly done.
Now it is much clearer: conn->owner remains set to the session on
frontend connections, is set to the session when the connection is
private or assimilated private and belongs to the session list, or
is NULL.
MEDIUM: muxes: always set conn->owner to the session that owns the connection
When an idle connection is private or considered private, session_add_conn()
is called to add it to the list of connections owned by the session. But
in case of allocation failure, the session is not set, which results in
a long list of possible situations that are all corner cases which are
difficult to test (and debug).
This commit relies on the fact that it is already permitted to have
conn->owner pointing to a session even if the connection couldn't be
added to the session's list, as this was already the case in
conn_backend_get() when dealing with HOL_RISK. Also as seen in commit 3aab17bd566 added in 2.4, it is already possible to have conn->owner
set with the connection not being in a list, and only the list element
is checked for this.
This commit modifies session_add_conn() to always set conn->owner, even
if the list element couldn't be allocated. This way it's possible to
always refer to conn->owner to find the session owning a private conn
even in case of failure to allocate an entry. This requires to change
the checks on conn->owner to a check of the list element to see if the
connection belongs to a session, the pre-assignment of sess to
conn->owner in conn_backend_get() is no longer needed, same for the
pre-assignment in http_wait_for_response(), and that's all.
The H1 mux remained unchanged because since it cannot multiplex, in
case it fails to allocate a pconn, it instantly kills the connection.
BUG/MINOR: sample: adjust dependencies for channel output bytes counters
The bytes_in, bytes_out, {req,res}.bytes_{in,out} sample fetch functions
are marked as internal dependencies only. But that's not exact, they are
statistics. Request traffic (bytes_in, req.bytes*) is usable starting
from the request, while response traffic (bytes_out, res.bytes*) is usable
as soon as a response begins to be received, and all are valid till the
end of the transaction.
This patch marks the request stats RQFIN and the response stats RSFIN,
so that they're valid at any moment and the logs backend knows it must
wait for the latest moment to emit such a line. With this change, the
line above now correctly produces:
MINOR: sample: make RQ/RS stats available everywhere
Sample fetch functions working on the request/response stats were marked
as being only compatible with the log phase. This is a mistake because
by definition, stats can be consulted anywhere from the moment they
start to appear. It's only that they are valid as far as the logs. At the
moment, no sample fetch function depends on RQFIN, and only res.timer.data
depends on RSFIN. But this will be needed to relax certain sample fetch
functions (and will need to be backported along with a few other patches).
BUG/MINOR: log: consider format expression dependencies to decide when to log
Log-format properly takes into account the LW_* flags set by the log
aliases, however its consideration for the sample fetch expressions is
very minimalistic (HTTP y/n). It poses a problem because logging some
statistics doesn't work unless some log aliases are involved to force
the log to wait till the end.
Before this change, the following log-format:
log-format "res.timer.data=%[res.timer.data]"
would log "res.timer.data=0" regardless of the time taken to transfer
data, and the log would be emitted instantly. However, this line:
log-format "res.timer.data=%[res.timer.data] %B"
would properly log the time taken to transfer the data because %B which
carries the log flag LW_BYTES forces the log to wait till the end.
This patch makes sure that anything requiring response (headers or body)
waits for at least the response, and that anything requiring response body
or end of transfer (req/res) waits till the end (LW_BYTES). Thanks to
this, the log above is now correct even without the "%B" hack.
This should be backported at least till the latest LTS.
The ACME Profiles extension (draft-ietf-acme-profiles) allows a client
to request a specific certificate profile by including a "profile" field
in the newOrder request. This lets the CA select the appropriate
certificate issuance policy (e.g. "classic", "shortlived") for a given
order.
A new "profile" keyword is added to the acme section. When set, its
value is included in the newOrder JSON payload sent to the CA.
BUG/MEDIUM: checks: Don't forget to set the "alt_proto" field
The target address type has been added to checks in commit
d759e60a3292f425aee66384e87ae227ce191c08, but part of that address type
is the "alt_proto" field, which was not properly set for dynamic
servers. That could lead to checks not working for any protocol that
uses a non-zero alt_proto, such as QUIC. So set it properly.
BUILD: ssl/sample: potential null pointer dereference in sample_conv_aes
gcc flags aead_tag_trash as potentially NULL at the chunk_memcpy call
inside the (!dec && gcm) block, because it cannot correlate the
condition with the allocation that only happens in that same branch. Add
an explicit NULL check to silence the warning.
This was caught by cross-zoo.yml:
In file included from include/haproxy/connection.h:28,
from src/ssl_sample.c:27:
In function ‘b_orig’,
inlined from ‘sample_conv_aes’ at src/ssl_sample.c:540:23:
include/haproxy/buf.h:80:17: error: potential null pointer dereference [-Werror=null-dereference]
80 | return b->area;
| ~^~~~~~
In function ‘b_data’,
inlined from ‘sample_conv_aes’ at src/ssl_sample.c:540:3:
include/haproxy/buf.h:100:17: error: potential null pointer dereference [-Werror=null-dereference]
100 | return b->data;
| ~^~~~~~
BUG/MINOR: xprt_qstrm: reduce max record length check
When trying to read the QMux transport parameters frame, the record length
is checked to ensure it is not bigger than the buffer size. The
objective is to detect as soon as possible when receiving data that
cannot be handled and to close the connection.
In fact, this check was not accurate, as it did not take into account
the size of the Record length field itself. This patch fixes the
comparison by subtracting the size of the decoded varint.
BUG/MINOR: mux_quic: convert QCC rx.rlen to 64bits
This patch is related to the issue reported in the previous patch about
QMux record length parsing.
QCC rx.rlen is used to store the decoded record length. Convert it into
a plain 64bits integer instead of a size_t. This ensures it is
sufficient to decode record length, even with an increase of
max_record_length value (not currently implemented).
This should fix github build issue #3334 for 32bits architecture.
BUG/MINOR: xprt_qstrm: read record length in 64bits
QMux record lengths are encoded as a QUIC varint. Thus in theory, it
requires a 64bits integer to be able to read the whole value. In
practice, if the record is bigger than bufsize, read operation cannot be
completed and an error must be reported.
This patch fixes record length decoding in the xprt_qstrm layer, which
is now performed in two steps. The value is first read into a 64bits
integer instead of a size_t, whose size depends on the architecture.
The result is then checked against bufsize and, if smaller, stored in
the previously used variable (xprt ctx rxrlen member).
This should partially fix build issue reported on github #3334.
Tim Duesterhus [Mon, 13 Apr 2026 19:23:47 +0000 (21:23 +0200)]
CI: Build halog as part of contrib.yml
halog does not make use of any of the "fancy" build flags that HAProxy does and
except for itself only includes ebtree. There is no need to build it as part of
the VTest workflows.
In commit 7640d794 ("CI: Integrate Musl build into vtest.yml"), the
alpine job was integrated into VTest.yml. However, most of the tasks are
still duplicated and changes in the workflow need edits of copy/paste
code in two places because of that.
This commit deduplicates the code by making the alpine job part of the
matrix, like it was done for macOS.
Miroslav Zagorac [Thu, 16 Apr 2026 01:09:09 +0000 (03:09 +0200)]
MINOR: otel: test: unified run scripts into a single symlinked script
Replaced SH_ARGS variables with 'set --' and "${@}" to ensure proper
quoting of haproxy command-line arguments. Then replaced individual
per-config run scripts with a single generic run-test-config.sh that
derives the configuration directory from its own filename. The former
scripts became symlinks, and a new run-empty.sh symlink was added.
Removed the insecure-fork-wanted runtime check from the OTel filter parser
and all related mentions from documentation and test configuration.
The OpenTelemetry C wrapper library can now explicitly start all necessary
OTel threads immediately after configuration parsing, so it is no longer
affected by the HAProxy thread/process creation restriction and the
insecure-fork-wanted option is no longer needed.
Miroslav Zagorac [Wed, 15 Apr 2026 01:48:45 +0000 (03:48 +0200)]
MINOR: otel: test: added option parsing to the speed test script
Added getopts argument parsing with -h, -r and -d options, making sample
rate limits and wrk runtime configurable. Introduced a dry-run variable
for debugging, httpd cleanup in sh_exit, and removal of the log directory
on exit if empty.
Miroslav Zagorac [Mon, 13 Apr 2026 22:11:38 +0000 (00:11 +0200)]
MINOR: otel: added debug thread ID support for the OTel C wrapper library
Added per-thread ID tracking for the OpenTelemetry C wrapper debug system.
Registered HAProxy worker threads are identified by their tid;
unregistered threads (such as those created internally by the OTel SDK)
receive unique IDs from an atomic offset counter.
MINOR: mux-quic: release BE idle conn after GOAWAY reception
An idle backend connection is useless if an HTTP/3 GOAWAY frame has
been received. Indeed, it is forbidden to open new streams on such a
connection.
Thus, this patch ensures such connections are removed as soon as
possible. This is performed via a new check in qcc_is_dead() on
QC_CF_CONN_SHUT flag for backend connections. This ensures that a shut
connection is released instead of being inserted in idle list on detach
operation.
This commit also completes qcc_recv() with a new call to qcc_is_dead()
at its end. This is necessary if a GOAWAY is received on an idle
connection. For now, this is only checked for backend connections, as a
GOAWAY has no real effect on frontend connections. This extra
protection ensures that we do not break QUIC frontend support by
accident.
qcc_io_recv() also performs qcc_decode_qcs(). However, an extra
qcc_is_dead() is not necessary in this case as the following
qcc_io_process() already performs it.
MEDIUM: h3: prevent new streams on GOAWAY reception
Implement the reception of a HTTP/3 GOAWAY frame. This is performed via
the new function h3_parse_goaway_frm(). The advertised ID is stored in
new <id_shut_r> h3c member. It serves to ensure that a bigger ID is not
advertised when receiving multiple GOAWAY frames.
GOAWAY frame reception is only really useful on the backend side for
haproxy. When this occurs, h3c is now flagged with H3_CF_GOAWAY_RECV.
Also, QCC is also updated with new flag QC_CF_CONN_SHUT. This flag
indicates that no new stream may be opened on the connection. Callback
avail_streams() is thus edited to report 0 in this case.
Rework GOAWAY emission handling at the HTTP/3 layer. Previously, the
h3c member <id_goaway> was updated during the connection on each new
stream attach, and this ID was finally reused when a GOAWAY was
emitted. However, it is unnecessary to keep an updated ID during the
connection lifetime. Indeed, the <largest_bidi_r> QCC member can be
used for the same purpose. Note that this is only useful for the
frontend side. For a client connection, GOAWAY contains a PUSH ID, thus
0 can be used for now.
Thus, <id_goaway> in h3c is renamed to <id_shut_l>. Now it is only set
when the GOAWAY is emitted, which allows rejecting any stream with a
greater ID. This approach is considered simpler.
Note that <largest_bidi_r> is not strictly equivalent to the obsolete
<id_goaway>. Indeed, if an error occurs before the corresponding stream
layer allocation, the former would still be incremented. However, this
is not a real issue, as the GOAWAY specification is clear that lower
IDs are not guaranteed to be handled well until either the stream is
closed or reset, or the whole connection is torn down.
BUG/MINOR: mux_quic: limit avail_streams() to 2^62
QUIC stream IDs are encoded as 62-bit integers and an ID cannot be
reused within a connection. It is necessary to take this limitation
into account for backend connections.
This patch implements this via qmux_avail_streams() callback. In the
case where the connection is approaching the encoding limit, reduce the
advertised value until the limit is reached. Note that this is very
unlikely to happen as the value is pretty high.
MINOR: compression: prefix compression oriented functions with "comp_"
Add a comp_ prefix to all compression-related functions, in
anticipation of decompression functions that will be integrated into
the same file, so we don't get the two mixed up.
BUG/MINOR: task: fix uninitialised read in run_tasks_from_lists()
Since commit 7d40b31 ("MEDIUM: sched: do not run a same task multiple
times in series") I noticed that any valid config, once haproxy was
started, would produce uninitialised reads on valgrind:
[NOTICE] (243490) : haproxy version is 3.4-dev9-0af603-2
[NOTICE] (243490) : path to executable is /tmp/haproxy/haproxy
[WARNING] (243490) : missing timeouts for proxy 'test'.
| While not properly invalid, you will certainly encounter various problems
| with such a configuration. To fix this, please ensure that all following
| timeouts are set to a non-zero value: 'client', 'connect', 'server'.
[NOTICE] (243490) : Automatically setting global.maxconn to 491.
==243490== Thread 4:
==243490== Conditional jump or move depends on uninitialised value(s)
==243490== at 0x44DBD7: run_tasks_from_lists (task.c:567)
==243490== by 0x44E99E: process_runnable_tasks (task.c:913)
==243490== by 0x395A41: run_poll_loop (haproxy.c:2981)
==243490== by 0x396178: run_thread_poll_loop (haproxy.c:3211)
==243490== by 0x4E2DAA3: start_thread (pthread_create.c:447)
==243490== by 0x4EBAA63: clone (clone.S:100)
==243490==
Looking at it, it is caused by the fact that the task->last_run member,
which was introduced and used by the commit above, is never assigned a
default value, so the first time it is used, reading from it causes an
uninitialised read.
To fix the issue, we simply ensure the last_run task member gets a
default value when the task or tasklet is created. We use '0' as the
default value; the value itself is of minor importance because the
member is used to detect whether the task has already been executed for
the current loop cycle, so it will self-correct in any case.
BUG/MEDIUM: mux-h2: ignore conn->owner when deciding if a connection is dead
Originally, valid backend connections always used to have conn->owner
pointing to the owner session. In 1.9, commit 93c885 enforced this when
implementing backend H2 support by making sure that no orphaned connection
was left on its own with no remaining stream able to handle it.
Later, idle connections were reworked so that they were no longer
necessarily attached to a stream, but could be directly in the server,
accessed via a hash, so it started to become possible to have conn->owner
left to NULL when picking such a connection. It in fact happens for
http-reuse always, when the second stream picks the connection because
its owner is NULL and it's not changed.
More recently, a case was identified where it could be theoretically
possible to reinsert a dead connection into an idle list, and commit
59c599f3f0 ("BUG/MEDIUM: mux-h2: make sure not to move a dead
connection to idle") addressed that possibility in 3.3 by adding the
h2c_is_dead() test in h2_detach() before deciding to reinsert a
connection into the idle list.
Unfortunately, the combination of changes above results in the following
sequence being possible:
- a stream requires a connection, connect_server() creates one, sets
conn->owner to the session, then when the session is being set up,
the SSL stack calls conn_create_mux() which gets the session from
conn->owner, passes it to mux->init() (h2_init), which in turn
creates the backend stream and assigns it this session.
- when the stream ends, it detaches (h2_detach), and the call to
h2c_is_dead() returns false because h2c->conn->owner is set. The
connection is thus added into the server's idle list.
- a new stream comes, it finds the connection in the server's list,
which doesn't require to set conn->owner, the stream is added via
h2_attach() which passes the stream's session, and that one is
properly set on h2s again, but never on conn->owner.
- the stream finishes, detaches, and this time the call to h2c_is_dead()
sees the owner is NULL, thus indicates that the connection seems dead
so it's not added again to the idle list, and it's destroyed.
Note that this mostly happens at low loads (at most one active stream
per connection, so typically no more than one active stream per thread),
where the H2 reuse ratio on a server configured with http-reuse always
or http-reuse aggressive is close to 50%. At high loads, this is much more
rare, though looking at the reuse stats for a server, it's visible that a
sustained load still shows around 1% of the connections being periodically
renewed.
Interestingly, for RHTTP the impact is more important because there
was already a workaround for this test in h2c_is_dead(), but it uses
conn_is_reverse(), which is never correct in this case (it should be
called conn_to_reverse() because it says the conn must be reversed
and has not yet been), so this extra test doesn't protect against the
NULL check, and connections are closed after each stream is terminated
(if there is no other stream left).
After a long analysis with Amaury and Olivier, it was concluded that:
- the h2c_is_dead() addition is finally not the best solution and
could be refined, however in the current state it's a bit tricky.
- the conn->owner test in h2c_is_dead() is no longer relevant,
probably since 2.4 when connections were stored using hash_nodes
in the servers and would no longer depend on a session, so that
test should be removed.
- the test conn_is_reverse() on the same line, that was added to
ignore the former for RHTTP, and which doesn't properly work either
should be removed as well.
Some further cleanups should be performed to clarify this situation.
This patch implements the points above, and it should be backported
wherever commit 59c599f3f0 was backported.
MEDIUM: threads: change the default max-threads-per-group value to 16
A lot of our subsystems start to be shared by thread groups now
(listeners, queues, stick-tables, stats, idle connections, LB algos).
This has allowed to recover the performance that used to be out of
reach on loosely shared platforms (typically AMD EPYC systems), but in
parallel other large unified systems (Xeon and large Arm in general)
still suffer from the remaining contention when placing too many
threads in a group.
A first test running on a 64-core Neoverse-N1 processor with a single
backend with one server and no LB algo specified shows 1.58 Mrps with
64 threads per group, and 1.71 Mrps with 16 threads per group. The
difference is essentially spent updating stats counters everywhere.
Another test is the connection:close mode, delivering 85 kcps with
64 threads per group, and 172 kcps (202%) with 16 threads per group.
In this case it's mostly the more numerous listeners which improve
the situation, as the change is mostly in the kernel.
So let's change the default value to 16. It also happens to match what's
used by default on EPYC systems these days.
This change was marked MEDIUM as it will increase the number of listening
sockets on some systems, to match their counterparts from other vendors,
which is easier for capacity planning.
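Configurations that prefer the previous behaviour on large unified
systems can still set the value explicitly (illustrative value, using
the global keyword described above):

```
global
    max-threads-per-group 64
```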
DOC: config: fix spelling of "max-threads-per-group" in the index
It was spelled "max-thread-per-group" (without 's'). No backport is
needed unless commit 7e22d9c484 ("MEDIUM: cpu-topo: Add a new
"max-threads-per-group" global keyword") and its possible successors
are backported.
Released version 3.4-dev9 with the following main changes :
- DOC: config: fix ambiguous info in log-steps directive description
- MINOR: filters: add filter name to flt_conf struct
- MEDIUM: filters: add "filter-sequence" directive
- REGTESTS: add a test for "filter-sequence" directive
- Revert "CLEANUP: tcpcheck: Don't needlessly expose proxy_parse_tcpcheck()"
- MINOR: tcpcheck: reintroduce proxy_parse_tcpcheck() symbol
- BUG/MEDIUM: haterm: Move all init functions of haterm in haterm_init.c
- BUG/MEDIUM: mux-h1: Disable 0-copy forwarding when draining the request
- MINOR: servers: The right parameter for idle-pool.shared is "full"
- DOC: config: Fix two typos in the server param "healthcheck" description
- BUG/MINOR: http-act: fix a typo in the "pause" action error message
- MINOR: tcpcheck: Reject unknown keyword during parsing of healthcheck section
- BUG/MEDIUM: tcpcheck/server: Fix parsing of healthcheck param for dynamic servers
- BUG/MINOR: counters: fix unexpected 127 char GUID truncation for shm-stats-file objects
- BUG/MEDIUM: tcpcheck: Properly retrieve tcpcheck type to install the best mux
- BUG/MEDIUM: payload: validate SNI name_len in req.ssl_sni
- BUG/MEDIUM: jwe: fix NULL deref crash with empty CEK and non-dir alg
- BUG/MEDIUM: jwt: fix heap overflow in ECDSA signature DER conversion
- BUG/MEDIUM: jwe: fix memory leak in jwt_decrypt_secret with var argument
- BUG: hlua: fix stack overflow in httpclient headers conversion
- BUG/MINOR: hlua: fix stack overflow in httpclient headers conversion
- BUG/MINOR: hlua: fix format-string vulnerability in Patref error path
- BUG/MEDIUM: chunk: fix typo allocating small trash with bufsize_large
- BUG/MEDIUM: chunk: fix infinite loop in get_larger_trash_chunk()
- BUG/MINOR: peers: fix OOB heap write in dictionary cache update
- CI: VTest build with git clone + cache
- BUG/MEDIUM: connection: Wake the stconn on error when failing to create mux
- CI: github: update to cache@v5
- Revert "BUG: hlua: fix stack overflow in httpclient headers conversion"
- CI: github: fix vtest path to allow correct caching
- CI: github: add the architecture to the cache key for vtest2
- MEDIUM: connections: Really enforce mux protocol requirements
- MINOR: tools: Implement net_addr_type_is_quic()
- MEDIUM: check: Revamp the way the protocol and xprt are determined
- BUG/MAJOR: slz: always make sure to limit fixed output to less than worst case literals
- MINOR: lua: add tune.lua.openlibs to restrict loaded Lua standard libraries
- REGTESTS: lua: add tune.lua.openlibs to all Lua reg-tests
- BUG/MINOR: resolvers: fix memory leak on AAAA additional records
- BUG/MINOR: spoe: fix pointer arithmetic overflow in spoe_decode_buffer()
- BUG/MINOR: http-act: validate decoded lengths in *-headers-bin
- BUG/MINOR: haterm: Return the good start-line for 100-continue interim message
- BUG/MEDIUM: samples: Fix handling of SMP_T_METH samples
- BUG/MINOR: sample: fix info leak in regsub when exp_replace fails
- BUG/MEDIUM: mux-fcgi: prevent record-length truncation with large bufsize
- BUG/MINOR: hlua: fix use-after-free of HTTP reason string
- BUG/MINOR: mux-quic: fix potential NULL deref on qcc_release()
- BUG/MINOR: quic: increment pos pointer on QMux transport params parsing
- MINOR: xprt_qstrm: implement Rx buffering
- MINOR: xprt_qstrm/mux-quic: handle extra QMux frames after params
- MINOR: xprt_qstrm: implement Tx buffering
- MINOR: xprt_qstrm: handle connection errors
- MEDIUM: mux-quic: implement QMux record parsing
- MEDIUM: xprt_qstrm: implement QMux record parsing
- MEDIUM: mux-quic/xprt_qstrm: implement QMux record emission
- DOC: update draft link for QMux protocol
- BUG/MINOR: do not crash on QMux reception of BLOCKED frames
- Revert "BUG/MEDIUM: haterm: Move all init functions of haterm in haterm_init.c"
- BUG/MEDIUM: haterm: Properly initialize the splicing support for haterm
- BUG/MINOR: mux_quic: prevent QMux crash on qcc_io_send() error path
- BUG/MINOR: xprt_qstrm: do not parse record length on read again
- MEDIUM: otel: added OpenTelemetry filter skeleton
- MEDIUM: otel: added configuration and utility layer
- MEDIUM: otel: added configuration parser and event model
- MEDIUM: otel: added post-parse configuration check
- MEDIUM: otel: added memory pool and runtime scope layer
- MEDIUM: otel: implemented filter callbacks and event dispatcher
- MEDIUM: otel: wired OTel C wrapper library integration
- MEDIUM: otel: implemented scope execution and span management
- MEDIUM: otel: added context propagation via carrier interfaces
- MEDIUM: otel: added HTTP header operations for context propagation
- MEDIUM: otel: added HAProxy variable storage for context propagation
- MINOR: otel: added prefix-based variable scanning
- MEDIUM: otel: added CLI commands for runtime filter management
- MEDIUM: otel: added group action for rule-based scope execution
- MINOR: otel: added log-format support to the sample parser and runtime
- MINOR: otel: test: added test and benchmark suite for the OTel filter
- MINOR: otel: added span link support
- MINOR: otel: added metrics instrument support
- MINOR: otel: added log-record signal support
- MINOR: otel: test: added full-event test config
- DOC: otel: added documentation
- DOC: otel: test: added test README-* files
- DOC: otel: test: added speed test guide and benchmark results
- DOC: otel: added cross-cutting design patterns document
- MINOR: otel: added flt_otel_sample_eval and exposed flt_otel_sample_add_kv
- MINOR: otel: changed log-record attr to use sample expressions
- MINOR: otel: changed instrument attr to use sample expressions
- DOC: otel: added README.md overview document
- CLEANUP: ot: use the item API for the variables trees
- BUG/MINOR: ot: removed dead code in flt_ot_parse_cfg_str()
- BUG/MINOR: ot: fixed wrong NULL check in flt_ot_parse_cfg_group()
- BUILD: ot: removed explicit include path when building opentracing filter
- MINOR: ot: renamed the variable dbg_indent_level to flt_ot_dbg_indent_level
- CI: Drop obsolete `packages: write` permission from `quic-interop-*.yml`
- CI: Consistently add a top-level `permissions` definition to GHA workflows
- CI: Wrap all `if:` conditions in `${{ }}`
- CI: Fix regular expression escaping in matrix.py
- CI: Update to actions/checkout@v6
- CI: Simplify version extraction with `haproxy -vq`
- CI: Merge `aws-lc.yml` and `aws-lc-fips.yml` into `aws-lc.yml`
- CI: Merge `aws-lc-template.yml` into `aws-lc.yml`
- CI: Consistently set up VTest with `./.github/actions/setup-vtest`
- MINOR: mux_quic: remove duplicate QMux local transport params
- CI: github: add bash to the musl job
- BUG/MINOR: quic: do not use hardcoded values in QMux TP frame builder
- BUG/MINOR: log: Fix error message when using unavailable fetch in logfmt
- CLEANUP: log: Return `size_t` from `sess_build_logline_orig()`
- CLEANUP: stream: Explain the two-step initialization in `stream_generate_unique_id()`
- CLEANUP: stream: Reduce duplication in `stream_generate_unique_id()`
- CLEANUP: http_fetch: Use local `unique_id` variable in `smp_fetch_uniqueid()`
- CI: build WolfSSL job with asan enabled
- MINOR: tools: memvprintf(): remove <out> check that always true
- BUG/MEDIUM: cli: Properly handle too big payload on a command line
- REGTESTS: Never reuse server connection in reg-tests/jwt/jwt_decrypt.vtc
- MINOR: errors: remove excessive errmsg checks
- BUG/MINOR: haterm: preserve the pipe size margin for splicing
- MEDIUM: acme: implement dns-persist-01 challenge
- MINOR: acme: extend resolver-based DNS pre-check to dns-persist-01
- DOC: configuration: document dns-persist-01 challenge type and options
- BUG/MINOR: acme: read the wildcard flag from the authorization response
- BUG/MINOR: acme: don't pass NULL into format string
- BUG/MINOR: haterm: don't apply the default pipe size margin twice
- CLEANUP: Make `lf_expr` parameter of `sess_build_logline_orig()` const
- MINOR: Add `generate_unique_id()` helper
- MINOR: Allow inlining of `stream_generate_unique_id()`
- CLEANUP: log: Stop touching `struct stream` internals for `%ID`
- MINOR: check: Support generating a `unique_id` for checks
- MINOR: http_fetch: Add support for checks to `unique-id` fetch
- MINOR: acme: display the type of challenge in ACME_INITIAL_DELAY
- MINOR: mjson: reintroduce mjson_next()
- CI: Remove obsolete steps from musl.yml
- CI: Use `sh` in `actions/setup-vtest/action.yml`
- CI: Sync musl.yml with vtest.yml
- CI: Integrate Musl build into vtest.yml
- CI: Use `case()` function
- CI: Generate vtest.yml matrix on `ubuntu-slim`
- CI: Run contrib.yml on `ubuntu-slim`
- CI: Use `matrix:` in contrib.yml
- CI: Build `dev/haring/` as part of contrib.yml
- MINOR: htx: Add helper function to get type and size from the block info field
- BUG/MEDIUM: htx: Properly handle block modification during defragmentation
- BUG/MEDIUM: htx: Don't count delta twice when block value is replaced
- MINOR: ssl: add TLS 1.2 values in HAPROXY_KEYLOG_XX_LOG_FMT
- EXAMPLES: ssl: keylog entries are greater than 1024
- BUILD: Makefile: don't forget to also delete haterm on make clean
- MINOR: stats: report the number of thread groups in "show info"
- CLEANUP: sample: fix the comment regarding the range of the thread sample fetch
- MINOR: sample: return the number of the current thread group
- MINOR: sample: add new sample fetch functions reporting current CPU usage
- BUG/MEDIUM: peers: trash of expired entries delayed after fullresync
- DOC: remove the alpine/musl status job image
- MINOR: mux-quic: improve documentation for qcs_attach_sc()
- MINOR: mux-quic: reorganize code for app init/shutdown
- MINOR: mux-quic: perform app init in case of early shutdown
- MEDIUM: quic: implement fe.stream.max-total
- MINOR: mux-quic: close connection when reaching max-total streams
- REGTESTS: add QUIC test for max-total streams setting
- MEDIUM: threads: start threads by groups
- MINOR: acme: opportunistic DNS check for dns-persist-01 to skip challenge-ready steps
- BUG/MINOR: acme: fix fallback state after failed initial DNS check
- CLEANUP: acme: no need to reset ctx state and http_state before nextreq
- BUG/MINOR: threads: properly set the number of tgroups when not using policy
BUG/MINOR: threads: properly set the number of tgroups when not using policy
When nbthread is set, the CPU policies are not used and do not set
nbthread nor nbtgroups. Back in thread_detect_count(), these are then
set respectively to thr_max and 1. The problem, which becomes very
visible with max-threads-per-group, is that setting this one in
combination with nbthread results in only one group with the calculated
number of threads per group. And there's not even a warning. So basically
a configuration having:
global
nbthread 64
max-threads-per-group 8
would only start 8 threads.
In this case, grp_min remains valid and should be used, so let's just
change the assignment so that the number of groups is always correct.
A few ifdefs had to move because the calculations were only made for
the USE_CPU_AFFINITY case. Now these parts have been refined so that
all the logic continues to apply even without USE_CPU_AFFINITY.
One visible side effect is that setting nbthread above 64 will
automatically create the associated number of groups even when
USE_CPU_AFFINITY is not set. Previously it was silently changed
to match the per-group limit.
Ideally this should be backported to 3.2 where the issue was
introduced, though it may change the behavior of configs that were
silently being ignored (e.g. "nbthread 128"), so the backport should
be considered with care. At least 3.3 should have it because it uses
cpu-policy by default so it's only for failing cases that it would be
involved.
CLEANUP: acme: no need to reset ctx state and http_state before nextreq
The nextreq label already implements setting http_state to ACME_HTTP_REQ
and setting ctx->state to st. It is only needed to set the st variable
before jumping to nextreq.
BUG/MINOR: acme: fix fallback state after failed initial DNS check
When the opportunistic initial DNS check (ACME_INITIAL_RSLV_READY) fails,
the state machine was incorrectly transitioning to ACME_RSLV_RETRY_DELAY
instead of ACME_CLI_WAIT. This caused the challenge to enter the DNS retry
loop rather than falling back to the normal cond_ready flow that waits for
the CLI signal.
Also reorder ACME_CLI_WAIT in the state enum and trace switch to reflect
the actual execution order introduced in the previous commit: it comes after
ACME_INITIAL_RSLV_READY, not before ACME_INITIAL_RSLV_TRIGGER.
MINOR: acme: opportunistic DNS check for dns-persist-01 to skip challenge-ready steps
For dns-persist-01, the "_validation-persist.<domain>" TXT record is set once
and never changes between renewals. Add an initial opportunistic DNS check
(ACME_INITIAL_RSLV_TRIGGER / ACME_INITIAL_RSLV_READY states) that runs before
the challenge-ready conditions are evaluated. If all domains already have the
TXT record, the challenge is submitted immediately without going through the
cli/delay/dns challenge-ready steps, making renewals faster once the record is
in place.
The new ACME_RDY_INITIAL_DNS flag is automatically set for
dns-persist-01 in cond_ready.
MEDIUM: threads: start threads by groups
Till now, threads were all started one at a time from thread 1. This
will soon cause us limitations once we want to reduce shared stuff
between thread groups.
Let's slightly change the startup sequence so that the first thread
starts one initial thread for each group, and that each of these
threads then starts all other threads from their group before switching
to the final task. Since it requires an intermediary step, we need to
store the thread's start function to access it from the group, so it
was put into the tgroup_info, which still has plenty of room available.
It could also theoretically speed up the boot sequence, though in
practice it doesn't change anything because each thread's initialization
is made one at a time to avoid races during the early boot. However,
there is now a function in charge of starting all extra threads of a
group, and which is called from this group.
REGTESTS: add QUIC test for max-total streams setting
Add a new QUIC regtest to test the new frontend stream.max-total
setting.
This test relies on two haproxy instances, as QUIC client and server.
The new setting stream.max-total is set to 3 on the server side. In
total, 6 requests are performed, with a check to ensure that a new
connection has been opened for the last ones.
MINOR: mux-quic: close connection when reaching max-total streams
This commit completes the previous one which implements a new setting to
limit the number of streams usable by a client on a QUIC connection.
When the connection becomes idle after reaching this limit, it is
immediately closed. This is implemented by extending checks in
qcc_is_dead(). This results in a CONNECTION_CLOSE emission, which is
useful to free resources as soon as possible.
MEDIUM: quic: implement fe.stream.max-total
Implement a new setting to limit the total number of bidirectional
streams that the client may use on a single connection. By default, it
is set to 0 which means it is not limited at all.
If a positive value is configured, the client can only open a fixed
number of request streams per QUIC connection. Internally, this is
implemented in two steps:
* First, MAX_STREAMS_BIDI flow control advertising will be reduced when
approaching the limit before being completely turned off when reaching
it. This guarantees that the client cannot exceed the limit without
violating the flow control.
* Second, when attaching the latest stream with ID matching max-total
setting, connection graceful shutdown is initiated. In HTTP/3, this
results in a GOAWAY emission. This allows the remaining streams to be
completed before the connection becomes completely idle.
MINOR: mux-quic: perform app init in case of early shutdown
Add a qcc_app_init() call in qcc_app_shutdown(). This is necessary if
shutdown is performed early, before any invocation of qcc_io_send().
Currently, this should never occur in practice. However, this will
become necessary with the new setting tune.quic.fe.stream.max-total.
Indeed, when using a very small value, the app-ops layer may be closed
early in the connection lifetime.