* [PATCH 0/1] RFC: Clarifying seq_ack_to_tap logic
@ 2025-10-03 6:30 David Gibson
2025-10-03 6:30 ` [PATCH 1/1] tcp: Clarify logic calculating how much guest data to ack David Gibson
0 siblings, 1 reply; 2+ messages in thread
From: David Gibson @ 2025-10-03 6:30 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
Prompted by the discussion on Stefano's recent TCP fixes, here's
change I think clarifies our logic. This goes on top of my linter
fixes (specifically the cppcheck suppression in the vicinity) but
before Stefano's TCP fixes.
Happy to have it folded into those TCP fixes, or reorganised around
them.
Stefano, please review my large comment. I'm confident on some of the
points in there, but not all of them.
David Gibson (1):
tcp: Clarify logic calculating how much guest data to ack
tcp.c | 68 ++++++++++++++++++++++++++++++++++++-----------------------
1 file changed, 42 insertions(+), 26 deletions(-)
--
2.51.0
^ permalink raw reply [flat|nested] 2+ messages in thread
* [PATCH 1/1] tcp: Clarify logic calculating how much guest data to ack
2025-10-03 6:30 [PATCH 0/1] RFC: Clarifying seq_ack_to_tap logic David Gibson
@ 2025-10-03 6:30 ` David Gibson
0 siblings, 0 replies; 2+ messages in thread
From: David Gibson @ 2025-10-03 6:30 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
This is fairly complex, because we have a method we prefer but we need to
fall back to a simpler one in a bunch of cases. Slightly reorganise the
code to make the flow clearer, and add a large comment giving the
rationale.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
tcp.c | 68 ++++++++++++++++++++++++++++++++++++-----------------------
1 file changed, 42 insertions(+), 26 deletions(-)
diff --git a/tcp.c b/tcp.c
index 7da41797..85eb2c32 100644
--- a/tcp.c
+++ b/tcp.c
@@ -1014,35 +1014,51 @@ int tcp_update_seqack_wnd(const struct ctx *c, struct tcp_tap_conn *conn,
uint32_t new_wnd_to_tap = prev_wnd_to_tap;
int s = conn->sock;
- if (!bytes_acked_cap) {
- conn->seq_ack_to_tap = conn->seq_from_tap;
- if (SEQ_LT(conn->seq_ack_to_tap, prev_ack_to_tap))
- conn->seq_ack_to_tap = prev_ack_to_tap;
- } else {
- if ((unsigned)SNDBUF_GET(conn) < SNDBUF_SMALL ||
- tcp_rtt_dst_low(conn) || CONN_IS_CLOSING(conn) ||
- (conn->flags & LOCAL) || force_seq) {
- conn->seq_ack_to_tap = conn->seq_from_tap;
- } else if (conn->seq_ack_to_tap != conn->seq_from_tap) {
- if (!tinfo) {
- tinfo = &tinfo_new;
- if (getsockopt(s, SOL_TCP, TCP_INFO, tinfo, &sl))
- return 0;
- }
-
- /* This trips a cppcheck bug in some versions, including
- * cppcheck 2.18.3.
- * https://sourceforge.net/p/cppcheck/discussion/general/thread/fecde59085/
- */
- /* cppcheck-suppress [uninitvar,unmatchedSuppression] */
- conn->seq_ack_to_tap = tinfo->tcpi_bytes_acked +
- conn->seq_init_from_tap;
-
- if (SEQ_LT(conn->seq_ack_to_tap, prev_ack_to_tap))
- conn->seq_ack_to_tap = prev_ack_to_tap;
+ /* At this point we could ack all the data we've accepted for forwarding
+ * (seq_from_tap). When possible, however, we want to only ack what the
+ * peer has acked. This makes it appear to the guest more like a direct
+ * connection to the peer, and may improve flow control behaviour.
+ *
+ * For it to be possible and worth it we need:
+ * - The TCP_INFO Linux extension which gives us the peer acked bytes
+ * - Not to be told not to (force_seq)
+ * - Not half-closed in the peer->guest direction
+ * With no data coming from the peer, we won't get further events
+ * which would prompt us to recheck bytes_acked. We could poll on
+ * a timer, but that's more trouble than it's worth.
+ * - Not a host local connection
+ * Data goes directly from socket to socket in this case, with
+ * nothing meaningful "in flight".
+ * - Large enough send buffer
+ * If this is small, there's not enough in flight to bother.
+ */
+ if (bytes_acked_cap && !force_seq &&
+ !CONN_IS_CLOSING(conn) &&
+ !(conn->flags & LOCAL) && !tcp_rtt_dst_low(conn) &&
+ (unsigned)SNDBUF_GET(conn) >= SNDBUF_SMALL) {
+ if (!tinfo) {
+ tinfo = &tinfo_new;
+ if (getsockopt(s, SOL_TCP, TCP_INFO, tinfo, &sl))
+ return 0;
}
+
+ /* This trips a cppcheck bug in some versions, including
+ * cppcheck 2.18.3.
+ * https://sourceforge.net/p/cppcheck/discussion/general/thread/fecde59085/
+ */
+ /* cppcheck-suppress [uninitvar,unmatchedSuppression] */
+ conn->seq_ack_to_tap = tinfo->tcpi_bytes_acked +
+ conn->seq_init_from_tap;
+ } else {
+ /* Fall back to acking everything we have */
+ conn->seq_ack_to_tap = conn->seq_from_tap;
}
+ /* If the guest is retransmitting, don't let our ACKed sequence go
+ * backwards */
+ if (SEQ_LT(conn->seq_ack_to_tap, prev_ack_to_tap))
+ conn->seq_ack_to_tap = prev_ack_to_tap;
+
if (!snd_wnd_cap) {
tcp_get_sndbuf(conn);
new_wnd_to_tap = MIN(SNDBUF_GET(conn), MAX_WINDOW);
--
2.51.0
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2025-10-03 6:30 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-10-03 6:30 [PATCH 0/1] RFC: Clarifying seq_ack_to_tap logic David Gibson
2025-10-03 6:30 ` [PATCH 1/1] tcp: Clarify logic calculating how much guest data to ack David Gibson
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).