From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202508 header.b=AmbaaCAX; dkim-atps=neutral Received: from mail.ozlabs.org (gandalf.ozlabs.org [150.107.74.76]) by passt.top (Postfix) with ESMTPS id 804245A0282 for ; Mon, 18 Aug 2025 05:04:27 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202508; t=1755486257; bh=Egc3Tz4K2R04/3tQUCJ4szZn1lX6Ydrn2rCTziDuDa4=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=AmbaaCAXoHVi55cf3PxL47PoMdywRPLbZR690BRDBvj99JgV1allnjpf0waYSPI0o yKyYrhs2w9CeLLbLxPcaprmGhQ32ov1wyo3Fs/QMLmVvqod1GhLrC8NhB00M/pQWuF 0POGwjRbv3NH5tOVhX1CdF7CN1FcGeyRsKq4MPyJZ3FjhSkGsdDzkPhHBfVA3yD/Ap 7ckRmvZ6m0pAiE3tttGHtzZcwlxbAhyMjI5VD3ITRhY7WUh6+jh9vfGapegdeAVoGl qg+A/wtBMTGfXvqlIVqC1C6bj6fzFoagSZctHReeG6UdYv3UbKLdtLMrOfotAFlRTi kXxERAOcVv5rg== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4c4yH53b2Kz4x5Z; Mon, 18 Aug 2025 13:04:17 +1000 (AEST) Date: Mon, 18 Aug 2025 13:04:11 +1000 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH 6/6] tcp: Fast re-transmit if half-closed, make TAP_FIN_RCVD path consistent Message-ID: References: <20250815161042.3606244-1-sbrivio@redhat.com> <20250815161042.3606244-7-sbrivio@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="TkpaCw+Fs9tbOiVq" Content-Disposition: inline In-Reply-To: <20250815161042.3606244-7-sbrivio@redhat.com> Message-ID-Hash: Z5K2B5VR32A67CCU6DVC7BHARW2DHFYF X-Message-ID-Hash: Z5K2B5VR32A67CCU6DVC7BHARW2DHFYF X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top, Jon Maloy , Paul Holzinger X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --TkpaCw+Fs9tbOiVq Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Aug 15, 2025 at 06:10:42PM +0200, Stefano Brivio wrote: > We currently have a number of discrepancies in the tcp_tap_handler() > path between the half-closed connection path and the regular one, and > they are mostly a result of code duplication, which comes in turn from > the fact that tcp_data_from_tap() deals with data transfers as well as > general connection bookkeeping, so we can't use it for half-closed > connections. >=20 > This suggests that we should probably rework it into two or more > functions, in the long term, Agreed. > but for the moment being I'm just fixing > one obvious issue, which is the lack of fast retransmissions in the > TAP_FIN_RCVD path, and a potential one, which is the fact we don't > handle socket flush failures. Fair enough for the time being. > Add fast re-transmit for half-closed connections, and extract the > logic to determine the TCP payload length from tcp_data_from_tap() > into the new tcp_packet_data_len() helper to decrease a bit the amount > of resulting code duplication. >=20 > Handle the case of socket flush (tcp_sock_consume()) flush failure in > the same way as tcp_data_from_tap() handles it. >=20 > Signed-off-by: Stefano Brivio Reviewed-by: David Gibson > --- > tcp.c | 79 ++++++++++++++++++++++++++++++++++++++++++----------------- > 1 file changed, 56 insertions(+), 23 deletions(-) >=20 > diff --git a/tcp.c b/tcp.c > index 624e7f4..163f820 100644 > --- a/tcp.c > +++ b/tcp.c > @@ -1640,6 +1640,22 @@ static int tcp_data_from_sock(const struct ctx *c,= struct tcp_tap_conn *conn) > =20 > return tcp_buf_data_from_sock(c, conn); > } > +/** > + * tcp_packet_data_len() - Get data (TCP payload) length for a TCP packet > + * @th: Pointer to TCP header > + * @l4len: TCP packet length, including TCP header > + * > + * Return: data length of TCP packet, -1 on invalid value of Data Offset= field > + */ > +static ssize_t tcp_packet_data_len(const struct tcphdr *th, size_t l4len) > +{ > + size_t off =3D th->doff * 4UL; > + > + if (off < sizeof(*th) || off > l4len) > + return -1; > + > + return l4len - off; > +} > =20 > /** > * tcp_data_from_tap() - tap/guest data for established connection > @@ -1671,27 +1687,21 @@ static int tcp_data_from_tap(const struct ctx *c,= struct tcp_tap_conn *conn, > for (i =3D idx, iov_i =3D 0; i < (int)p->count; i++) { > uint32_t seq, seq_offset, ack_seq; > const struct tcphdr *th; > + ssize_t dlen; > char *data; > - size_t off; > =20 > th =3D packet_get(p, i, 0, sizeof(*th), &len); > if (!th) > return -1; > len +=3D sizeof(*th); > =20 > - off =3D th->doff * 4UL; > - if (off < sizeof(*th) || off > len) > - return -1; > - > if (th->rst) { > conn_event(c, conn, CLOSED); > return 1; > } > =20 > - len -=3D off; > - data =3D packet_get(p, i, off, len, NULL); > - if (!data) > - continue; > + if ((dlen =3D tcp_packet_data_len(th, len)) < 0) > + return -1; > =20 > seq =3D ntohl(th->seq); > if (SEQ_LT(seq, conn->seq_from_tap) && len <=3D 1) { > @@ -1719,7 +1729,7 @@ static int tcp_data_from_tap(const struct ctx *c, s= truct tcp_tap_conn *conn, > if (SEQ_GE(ack_seq, conn->seq_ack_from_tap) && > SEQ_GE(ack_seq, max_ack_seq)) { > /* Fast re-transmit */ > - retr =3D !len && !th->fin && > + retr =3D !dlen && !th->fin && > ack_seq =3D=3D max_ack_seq && > ntohs(th->window) =3D=3D max_ack_seq_wnd; > =20 > @@ -1731,33 +1741,37 @@ static int tcp_data_from_tap(const struct ctx *c,= struct tcp_tap_conn *conn, > if (th->fin) > fin =3D 1; > =20 > - if (!len) > + if (!dlen) > + continue; > + > + data =3D packet_get(p, i, th->doff * 4UL, dlen, NULL); > + if (!data) > continue; > =20 > seq_offset =3D seq_from_tap - seq; > /* Use data from this buffer only in these two cases: > * > - * , seq_from_tap , seq_from_tap > - * |--------| <-- len |--------| <-- len > + * , seq_from_tap , seq_from_tap > + * |--------| <-- dlen |--------| <-- dlen > * '----' <-- offset ' <-- offset > * ^ seq ^ seq > - * (offset >=3D 0, seq + len > seq_from_tap) > + * (offset >=3D 0, seq + dlen > seq_from_tap) > * > * discard in these two cases: > - * , seq_from_tap , seq_from_tap > - * |--------| <-- len |--------| <-- len > + * , seq_from_tap , seq_from_tap > + * |--------| <-- dlen |--------| <-- dlen > * '--------' <-- offset '-----| <- offset > * ^ seq ^ seq > - * (offset >=3D 0, seq + len <=3D seq_from_tap) > + * (offset >=3D 0, seq + dlen <=3D seq_from_tap) > * > * keep, look for another buffer, then go back, in this case: > * , seq_from_tap > - * |--------| <-- len > + * |--------| <-- dlen > * '=3D=3D=3D' <-- offset > * ^ seq > * (offset < 0) > */ > - if (SEQ_GE(seq_offset, 0) && SEQ_LE(seq + len, seq_from_tap)) > + if (SEQ_GE(seq_offset, 0) && SEQ_LE(seq + dlen, seq_from_tap)) > continue; > =20 > if (SEQ_LT(seq_offset, 0)) { > @@ -1767,7 +1781,7 @@ static int tcp_data_from_tap(const struct ctx *c, s= truct tcp_tap_conn *conn, > } > =20 > tcp_iov[iov_i].iov_base =3D data + seq_offset; > - tcp_iov[iov_i].iov_len =3D len - seq_offset; > + tcp_iov[iov_i].iov_len =3D dlen - seq_offset; > seq_from_tap +=3D tcp_iov[iov_i].iov_len; > iov_i++; > =20 > @@ -2078,9 +2092,28 @@ int tcp_tap_handler(const struct ctx *c, uint8_t p= if, sa_family_t af, > =20 > /* Established connections not accepting data from tap */ > if (conn->events & TAP_FIN_RCVD) { > - tcp_sock_consume(conn, ntohl(th->ack_seq)); > - tcp_update_seqack_from_tap(c, conn, ntohl(th->ack_seq)); > - if (tcp_tap_window_update(c, conn, ntohs(th->window))) > + bool retr; > + > + retr =3D th->ack && !tcp_packet_data_len(th, len) && !th->fin && > + ntohl(th->ack_seq) =3D=3D conn->seq_ack_from_tap && > + ntohs(th->window) =3D=3D conn->wnd_from_tap; > + > + /* On socket flush failure, pretend there was no ACK, try again > + * later > + */ > + if (th->ack && !tcp_sock_consume(conn, ntohl(th->ack_seq))) > + tcp_update_seqack_from_tap(c, conn, ntohl(th->ack_seq)); > + > + if (retr) { > + flow_trace(conn, > + "fast re-transmit, ACK: %u, previous sequence: %u", > + ntohl(th->ack_seq), conn->seq_to_tap); > + > + if (tcp_rewind_seq(c, conn)) > + return -1; > + } > + > + if (tcp_tap_window_update(c, conn, ntohs(th->window)) || retr) > tcp_data_from_sock(c, conn); > =20 > if (conn->seq_ack_from_tap =3D=3D conn->seq_to_tap) { --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --TkpaCw+Fs9tbOiVq Content-Type: application/pgp-signature; name=signature.asc -----BEGIN PGP SIGNATURE----- iQIzBAEBCgAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmiimCsACgkQzQJF27ox 2Gec8w//c0huHOPhdz1DIQrIvQO5ZOw8QWjc5ot6EGMZvDADi5SzqTzkyJKqBINQ +nP9sKNYujsANO4Bp699I/lN0ZdJihzyC+ez2E8XDTVA9kJCegU4kOWEgZRh3rdC 6xShZfN+0HgJqcRA3A+YXooS2oh+h0wHbHC8P1x6HaDyFoqG5ysiTOmZyZ0YuyDZ bSu1hfYKb9hrApGHRseJBHIwV3IIumPEFhgPee2HwIpLCoUJGL/ybmo3jigniSmw Zm0sQINLav0aRZo5L6qfl/jQkZYa6ZrCnoxHwZZq7yb2UcQ1lnLM4sGAfJQp5i/+ Z6Yqk2p+10wngesxePpOV9uMIRoiBtLDM6be1dw9FhKCk4O4O3n0cP64HtkQCnhI Kvc//94iLeQktYgvPDuVePoPz/qjZ/KktJH83yALyFz3TOwR94OBrAY8zv1BtLBs km3QugSDhC5pDy7Jw0FogUPdAs6f1SGjWbYqZSrmw1YTEt0Q6Ioj6boEbO7zsf6Z 8t7OVvF3R3Sf/vr2R/epiJDAMu9tjOamYyMFz3kscuGwoILdRl9XE76QVw3j/RUA gdo6TO8eCS3Si81HzLq5Rlk2QgzKFhaFIQKzbdjN1ZXMoiZVkVaH2c0Jk3kj5ZBh vjis+1YN8OO2WFsgBhElINgqyzqxuC1sV7piA8hIPcunc1CLk/I= =WTZ+ -----END PGP SIGNATURE----- --TkpaCw+Fs9tbOiVq--