public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: Stefano Brivio <sbrivio@redhat.com>
To: passt-dev@passt.top
Cc: Jon Maloy <jmaloy@redhat.com>, Paul Holzinger <pholzing@redhat.com>
Subject: [PATCH 6/6] tcp: Fast re-transmit if half-closed, make TAP_FIN_RCVD path consistent
Date: Fri, 15 Aug 2025 18:10:42 +0200	[thread overview]
Message-ID: <20250815161042.3606244-7-sbrivio@redhat.com> (raw)
In-Reply-To: <20250815161042.3606244-1-sbrivio@redhat.com>

We currently have a number of discrepancies in the tcp_tap_handler()
path between the half-closed connection path and the regular one, and
they are mostly a result of code duplication, which comes in turn from
the fact that tcp_data_from_tap() deals with data transfers as well as
general connection bookkeeping, so we can't use it for half-closed
connections.

This suggests that we should probably rework it into two or more
functions, in the long term, but for the moment being I'm just fixing
one obvious issue, which is the lack of fast retransmissions in the
TAP_FIN_RCVD path, and a potential one, which is the fact we don't
handle socket flush failures.

Add fast re-transmit for half-closed connections, and extract the
logic to determine the TCP payload length from tcp_data_from_tap()
into the new tcp_packet_data_len() helper to decrease a bit the amount
of resulting code duplication.

Handle the case of socket flush (tcp_sock_consume()) flush failure in
the same way as tcp_data_from_tap() handles it.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
 tcp.c | 79 ++++++++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 56 insertions(+), 23 deletions(-)

diff --git a/tcp.c b/tcp.c
index 624e7f4..163f820 100644
--- a/tcp.c
+++ b/tcp.c
@@ -1640,6 +1640,22 @@ static int tcp_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn)
 
 	return tcp_buf_data_from_sock(c, conn);
 }
+/**
+ * tcp_packet_data_len() - Get data (TCP payload) length for a TCP packet
+ * @th:		Pointer to TCP header
+ * @l4len:	TCP packet length, including TCP header
+ *
+ * Return: data length of TCP packet, -1 on invalid value of Data Offset field
+ */
+static ssize_t tcp_packet_data_len(const struct tcphdr *th, size_t l4len)
+{
+	size_t off = th->doff * 4UL;
+
+	if (off < sizeof(*th) || off > l4len)
+		return -1;
+
+	return l4len - off;
+}
 
 /**
  * tcp_data_from_tap() - tap/guest data for established connection
@@ -1671,27 +1687,21 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn,
 	for (i = idx, iov_i = 0; i < (int)p->count; i++) {
 		uint32_t seq, seq_offset, ack_seq;
 		const struct tcphdr *th;
+		ssize_t dlen;
 		char *data;
-		size_t off;
 
 		th = packet_get(p, i, 0, sizeof(*th), &len);
 		if (!th)
 			return -1;
 		len += sizeof(*th);
 
-		off = th->doff * 4UL;
-		if (off < sizeof(*th) || off > len)
-			return -1;
-
 		if (th->rst) {
 			conn_event(c, conn, CLOSED);
 			return 1;
 		}
 
-		len -= off;
-		data = packet_get(p, i, off, len, NULL);
-		if (!data)
-			continue;
+		if ((dlen = tcp_packet_data_len(th, len)) < 0)
+			return -1;
 
 		seq = ntohl(th->seq);
 		if (SEQ_LT(seq, conn->seq_from_tap) && len <= 1) {
@@ -1719,7 +1729,7 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn,
 			if (SEQ_GE(ack_seq, conn->seq_ack_from_tap) &&
 			    SEQ_GE(ack_seq, max_ack_seq)) {
 				/* Fast re-transmit */
-				retr = !len && !th->fin &&
+				retr = !dlen && !th->fin &&
 				       ack_seq == max_ack_seq &&
 				       ntohs(th->window) == max_ack_seq_wnd;
 
@@ -1731,33 +1741,37 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn,
 		if (th->fin)
 			fin = 1;
 
-		if (!len)
+		if (!dlen)
+			continue;
+
+		data = packet_get(p, i, th->doff * 4UL, dlen, NULL);
+		if (!data)
 			continue;
 
 		seq_offset = seq_from_tap - seq;
 		/* Use data from this buffer only in these two cases:
 		 *
-		 *      , seq_from_tap           , seq_from_tap
-		 * |--------| <-- len            |--------| <-- len
+		 *      , seq_from_tap            , seq_from_tap
+		 * |--------| <-- dlen           |--------| <-- dlen
 		 * '----' <-- offset             ' <-- offset
 		 * ^ seq                         ^ seq
-		 *    (offset >= 0, seq + len > seq_from_tap)
+		 *    (offset >= 0, seq + dlen > seq_from_tap)
 		 *
 		 * discard in these two cases:
-		 *          , seq_from_tap                , seq_from_tap
-		 * |--------| <-- len            |--------| <-- len
+		 *          , seq_from_tap                 , seq_from_tap
+		 * |--------| <-- dlen           |--------| <-- dlen
 		 * '--------' <-- offset            '-----| <- offset
 		 * ^ seq                            ^ seq
-		 *    (offset >= 0, seq + len <= seq_from_tap)
+		 *    (offset >= 0, seq + dlen <= seq_from_tap)
 		 *
 		 * keep, look for another buffer, then go back, in this case:
 		 *      , seq_from_tap
-		 *          |--------| <-- len
+		 *          |--------| <-- dlen
 		 *      '===' <-- offset
 		 *          ^ seq
 		 *    (offset < 0)
 		 */
-		if (SEQ_GE(seq_offset, 0) && SEQ_LE(seq + len, seq_from_tap))
+		if (SEQ_GE(seq_offset, 0) && SEQ_LE(seq + dlen, seq_from_tap))
 			continue;
 
 		if (SEQ_LT(seq_offset, 0)) {
@@ -1767,7 +1781,7 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn,
 		}
 
 		tcp_iov[iov_i].iov_base = data + seq_offset;
-		tcp_iov[iov_i].iov_len = len - seq_offset;
+		tcp_iov[iov_i].iov_len = dlen - seq_offset;
 		seq_from_tap += tcp_iov[iov_i].iov_len;
 		iov_i++;
 
@@ -2078,9 +2092,28 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
 
 	/* Established connections not accepting data from tap */
 	if (conn->events & TAP_FIN_RCVD) {
-		tcp_sock_consume(conn, ntohl(th->ack_seq));
-		tcp_update_seqack_from_tap(c, conn, ntohl(th->ack_seq));
-		if (tcp_tap_window_update(c, conn, ntohs(th->window)))
+		bool retr;
+
+		retr = th->ack && !tcp_packet_data_len(th, len) && !th->fin &&
+		       ntohl(th->ack_seq) == conn->seq_ack_from_tap &&
+		       ntohs(th->window) == conn->wnd_from_tap;
+
+		/* On socket flush failure, pretend there was no ACK, try again
+		 * later
+		 */
+		if (th->ack && !tcp_sock_consume(conn, ntohl(th->ack_seq)))
+			tcp_update_seqack_from_tap(c, conn, ntohl(th->ack_seq));
+
+		if (retr) {
+			flow_trace(conn,
+				   "fast re-transmit, ACK: %u, previous sequence: %u",
+				   ntohl(th->ack_seq), conn->seq_to_tap);
+
+			if (tcp_rewind_seq(c, conn))
+				return -1;
+		}
+
+		if (tcp_tap_window_update(c, conn, ntohs(th->window)) || retr)
 			tcp_data_from_sock(c, conn);
 
 		if (conn->seq_ack_from_tap == conn->seq_to_tap) {
-- 
@@ -1640,6 +1640,22 @@ static int tcp_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn)
 
 	return tcp_buf_data_from_sock(c, conn);
 }
+/**
+ * tcp_packet_data_len() - Get data (TCP payload) length for a TCP packet
+ * @th:		Pointer to TCP header
+ * @l4len:	TCP packet length, including TCP header
+ *
+ * Return: data length of TCP packet, -1 on invalid value of Data Offset field
+ */
+static ssize_t tcp_packet_data_len(const struct tcphdr *th, size_t l4len)
+{
+	size_t off = th->doff * 4UL;
+
+	if (off < sizeof(*th) || off > l4len)
+		return -1;
+
+	return l4len - off;
+}
 
 /**
  * tcp_data_from_tap() - tap/guest data for established connection
@@ -1671,27 +1687,21 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn,
 	for (i = idx, iov_i = 0; i < (int)p->count; i++) {
 		uint32_t seq, seq_offset, ack_seq;
 		const struct tcphdr *th;
+		ssize_t dlen;
 		char *data;
-		size_t off;
 
 		th = packet_get(p, i, 0, sizeof(*th), &len);
 		if (!th)
 			return -1;
 		len += sizeof(*th);
 
-		off = th->doff * 4UL;
-		if (off < sizeof(*th) || off > len)
-			return -1;
-
 		if (th->rst) {
 			conn_event(c, conn, CLOSED);
 			return 1;
 		}
 
-		len -= off;
-		data = packet_get(p, i, off, len, NULL);
-		if (!data)
-			continue;
+		if ((dlen = tcp_packet_data_len(th, len)) < 0)
+			return -1;
 
 		seq = ntohl(th->seq);
 		if (SEQ_LT(seq, conn->seq_from_tap) && len <= 1) {
@@ -1719,7 +1729,7 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn,
 			if (SEQ_GE(ack_seq, conn->seq_ack_from_tap) &&
 			    SEQ_GE(ack_seq, max_ack_seq)) {
 				/* Fast re-transmit */
-				retr = !len && !th->fin &&
+				retr = !dlen && !th->fin &&
 				       ack_seq == max_ack_seq &&
 				       ntohs(th->window) == max_ack_seq_wnd;
 
@@ -1731,33 +1741,37 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn,
 		if (th->fin)
 			fin = 1;
 
-		if (!len)
+		if (!dlen)
+			continue;
+
+		data = packet_get(p, i, th->doff * 4UL, dlen, NULL);
+		if (!data)
 			continue;
 
 		seq_offset = seq_from_tap - seq;
 		/* Use data from this buffer only in these two cases:
 		 *
-		 *      , seq_from_tap           , seq_from_tap
-		 * |--------| <-- len            |--------| <-- len
+		 *      , seq_from_tap            , seq_from_tap
+		 * |--------| <-- dlen           |--------| <-- dlen
 		 * '----' <-- offset             ' <-- offset
 		 * ^ seq                         ^ seq
-		 *    (offset >= 0, seq + len > seq_from_tap)
+		 *    (offset >= 0, seq + dlen > seq_from_tap)
 		 *
 		 * discard in these two cases:
-		 *          , seq_from_tap                , seq_from_tap
-		 * |--------| <-- len            |--------| <-- len
+		 *          , seq_from_tap                 , seq_from_tap
+		 * |--------| <-- dlen           |--------| <-- dlen
 		 * '--------' <-- offset            '-----| <- offset
 		 * ^ seq                            ^ seq
-		 *    (offset >= 0, seq + len <= seq_from_tap)
+		 *    (offset >= 0, seq + dlen <= seq_from_tap)
 		 *
 		 * keep, look for another buffer, then go back, in this case:
 		 *      , seq_from_tap
-		 *          |--------| <-- len
+		 *          |--------| <-- dlen
 		 *      '===' <-- offset
 		 *          ^ seq
 		 *    (offset < 0)
 		 */
-		if (SEQ_GE(seq_offset, 0) && SEQ_LE(seq + len, seq_from_tap))
+		if (SEQ_GE(seq_offset, 0) && SEQ_LE(seq + dlen, seq_from_tap))
 			continue;
 
 		if (SEQ_LT(seq_offset, 0)) {
@@ -1767,7 +1781,7 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn,
 		}
 
 		tcp_iov[iov_i].iov_base = data + seq_offset;
-		tcp_iov[iov_i].iov_len = len - seq_offset;
+		tcp_iov[iov_i].iov_len = dlen - seq_offset;
 		seq_from_tap += tcp_iov[iov_i].iov_len;
 		iov_i++;
 
@@ -2078,9 +2092,28 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
 
 	/* Established connections not accepting data from tap */
 	if (conn->events & TAP_FIN_RCVD) {
-		tcp_sock_consume(conn, ntohl(th->ack_seq));
-		tcp_update_seqack_from_tap(c, conn, ntohl(th->ack_seq));
-		if (tcp_tap_window_update(c, conn, ntohs(th->window)))
+		bool retr;
+
+		retr = th->ack && !tcp_packet_data_len(th, len) && !th->fin &&
+		       ntohl(th->ack_seq) == conn->seq_ack_from_tap &&
+		       ntohs(th->window) == conn->wnd_from_tap;
+
+		/* On socket flush failure, pretend there was no ACK, try again
+		 * later
+		 */
+		if (th->ack && !tcp_sock_consume(conn, ntohl(th->ack_seq)))
+			tcp_update_seqack_from_tap(c, conn, ntohl(th->ack_seq));
+
+		if (retr) {
+			flow_trace(conn,
+				   "fast re-transmit, ACK: %u, previous sequence: %u",
+				   ntohl(th->ack_seq), conn->seq_to_tap);
+
+			if (tcp_rewind_seq(c, conn))
+				return -1;
+		}
+
+		if (tcp_tap_window_update(c, conn, ntohs(th->window)) || retr)
 			tcp_data_from_sock(c, conn);
 
 		if (conn->seq_ack_from_tap == conn->seq_to_tap) {
-- 
2.43.0


      parent reply	other threads:[~2025-08-15 16:10 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-15 16:10 [PATCH 0/6] tcp: Fixes for issues uncovered by tests with 6.17-rc1 kernels Stefano Brivio
2025-08-15 16:10 ` [PATCH 1/6] tcp: FIN flags have to be retransmitted as well Stefano Brivio
2025-08-15 16:10 ` [PATCH 2/6] tcp: Factor sequence rewind for retransmissions into a new function Stefano Brivio
2025-08-15 16:10 ` [PATCH 3/6] tcp: Rewind sequence when guest shrinks window to zero Stefano Brivio
2025-08-15 16:10 ` [PATCH 4/6] tcp: Fix closing logic for half-closed connections Stefano Brivio
2025-08-15 16:10 ` [PATCH 5/6] tcp: Don't try to transmit right after the peer shrank the window to zero Stefano Brivio
2025-08-15 16:10 ` Stefano Brivio [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250815161042.3606244-7-sbrivio@redhat.com \
    --to=sbrivio@redhat.com \
    --cc=jmaloy@redhat.com \
    --cc=passt-dev@passt.top \
    --cc=pholzing@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).