From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202508 header.b=HG0jXlVf; dkim-atps=neutral Received: from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by passt.top (Postfix) with ESMTPS id E684C5A027A for ; Mon, 01 Sep 2025 06:31:48 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202508; t=1756701105; bh=wuPDsSxgAi+5cTedl+QDvlRugPUoESF9Xsx6n0m9i0o=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=HG0jXlVfhW5pGE1sQYjB2tJeN8148GnfaP92e/tFwLQ5KEpFb+vI8fNuP/uY9BAWs 1M3nccvbXqDP81V0pYySr+qStJwbvdnsQF7LwVYdz8L2D9R2emmS5o7VIeltAwJK/h /MWJyC6A+MCuysp88mGEAYIYglxpf+D/VRGwO98uWr5E7RvzXTxaba3hXImiZiRKpd 51g6EHPxhaNxZGtlE98SrAx0Bk6g0tiJVT7Khmaqb40yKqaaYziq5g6gOAPIdLeEau AptClnmBkRu7p3ypLuuVgWC19MU3hlez3rs2wUdRTKDeKjGqkDM8g7M/qTGsRXtQpA dlBZORM+I4cOw== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4cFbYY71tQz4w9c; Mon, 1 Sep 2025 14:31:45 +1000 (AEST) Date: Mon, 1 Sep 2025 14:31:40 +1000 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH v3 7/7] tcp: Fast re-transmit if half-closed, make TAP_FIN_RCVD path consistent Message-ID: References: <20250829201132.1561650-1-sbrivio@redhat.com> <20250829201132.1561650-8-sbrivio@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="VB+f3NOp9Vgf8Om1" Content-Disposition: inline In-Reply-To: <20250829201132.1561650-8-sbrivio@redhat.com> Message-ID-Hash: 6FWW4BEB36JUZ3UCF2CHZIG3737GZRF6 X-Message-ID-Hash: 6FWW4BEB36JUZ3UCF2CHZIG3737GZRF6 X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top, Jon Maloy , Paul Holzinger X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --VB+f3NOp9Vgf8Om1 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Aug 29, 2025 at 10:11:32PM +0200, Stefano Brivio wrote: > We currently have a number of discrepancies in the tcp_tap_handler() > path between the half-closed connection path and the regular one, and > they are mostly a result of code duplication, which comes in turn from > the fact that tcp_data_from_tap() deals with data transfers as well as > general connection bookkeeping, so we can't use it for half-closed > connections. >=20 > This suggests that we should probably rework it into two or more > functions, in the long term, but for the moment being I'm just fixing > one obvious issue, which is the lack of fast retransmissions in the > TAP_FIN_RCVD path, and a potential one, which is the fact we don't > handle socket flush failures. >=20 > Add fast re-transmit for half-closed connections, and extract the > logic to determine the TCP payload length from tcp_data_from_tap() > into the new tcp_packet_data_len() helper to decrease a bit the amount > of resulting code duplication. >=20 > Handle the case of socket flush (tcp_sock_consume()) flush failure in > the same way as tcp_data_from_tap() handles it. >=20 > Signed-off-by: Stefano Brivio Reviewed-by: David Gibson > --- > tcp.c | 81 +++++++++++++++++++++++++++++++++++++++++------------------ > 1 file changed, 57 insertions(+), 24 deletions(-) >=20 > diff --git a/tcp.c b/tcp.c > index 051550d..35d101e 100644 > --- a/tcp.c > +++ b/tcp.c > @@ -1640,6 +1640,22 @@ static int tcp_data_from_sock(const struct ctx *c,= struct tcp_tap_conn *conn) > =20 > return tcp_buf_data_from_sock(c, conn); > } > +/** > + * tcp_packet_data_len() - Get data (TCP payload) length for a TCP packet > + * @th: Pointer to TCP header > + * @l4len: TCP packet length, including TCP header > + * > + * Return: data length of TCP packet, -1 on invalid value of Data Offset= field > + */ > +static ssize_t tcp_packet_data_len(const struct tcphdr *th, size_t l4len) > +{ > + size_t off =3D th->doff * 4UL; > + > + if (off < sizeof(*th) || off > l4len) > + return -1; > + > + return l4len - off; > +} > =20 > /** > * tcp_data_from_tap() - tap/guest data for established connection > @@ -1671,30 +1687,24 @@ static int tcp_data_from_tap(const struct ctx *c,= struct tcp_tap_conn *conn, > for (i =3D idx, iov_i =3D 0; i < (int)p->count; i++) { > uint32_t seq, seq_offset, ack_seq; > const struct tcphdr *th; > + ssize_t dlen; > char *data; > - size_t off; > =20 > th =3D packet_get(p, i, 0, sizeof(*th), &len); > if (!th) > return -1; > len +=3D sizeof(*th); > =20 > - off =3D th->doff * 4UL; > - if (off < sizeof(*th) || off > len) > - return -1; > - > if (th->rst) { > conn_event(c, conn, CLOSED); > return 1; > } > =20 > - len -=3D off; > - data =3D packet_get(p, i, off, len, NULL); > - if (!data) > - continue; > + if ((dlen =3D tcp_packet_data_len(th, len)) < 0) > + return -1; > =20 > seq =3D ntohl(th->seq); > - if (SEQ_LT(seq, conn->seq_from_tap) && len <=3D 1) { > + if (SEQ_LT(seq, conn->seq_from_tap) && dlen <=3D 1) { > flow_trace(conn, > "keep-alive sequence: %u, previous: %u", > seq, conn->seq_from_tap); > @@ -1719,7 +1729,7 @@ static int tcp_data_from_tap(const struct ctx *c, s= truct tcp_tap_conn *conn, > if (SEQ_GE(ack_seq, conn->seq_ack_from_tap) && > SEQ_GE(ack_seq, max_ack_seq)) { > /* Fast re-transmit */ > - retr =3D !len && !th->fin && > + retr =3D !dlen && !th->fin && > ack_seq =3D=3D max_ack_seq && > ntohs(th->window) =3D=3D max_ack_seq_wnd; > =20 > @@ -1740,33 +1750,37 @@ static int tcp_data_from_tap(const struct ctx *c,= struct tcp_tap_conn *conn, > if (th->fin) > fin =3D 1; > =20 > - if (!len) > + if (!dlen) > + continue; > + > + data =3D packet_get(p, i, th->doff * 4UL, dlen, NULL); > + if (!data) > continue; > =20 > seq_offset =3D seq_from_tap - seq; > /* Use data from this buffer only in these two cases: > * > - * , seq_from_tap , seq_from_tap > - * |--------| <-- len |--------| <-- len > + * , seq_from_tap , seq_from_tap > + * |--------| <-- dlen |--------| <-- dlen > * '----' <-- offset ' <-- offset > * ^ seq ^ seq > - * (offset >=3D 0, seq + len > seq_from_tap) > + * (offset >=3D 0, seq + dlen > seq_from_tap) > * > * discard in these two cases: > - * , seq_from_tap , seq_from_tap > - * |--------| <-- len |--------| <-- len > + * , seq_from_tap , seq_from_tap > + * |--------| <-- dlen |--------| <-- dlen > * '--------' <-- offset '-----| <- offset > * ^ seq ^ seq > - * (offset >=3D 0, seq + len <=3D seq_from_tap) > + * (offset >=3D 0, seq + dlen <=3D seq_from_tap) > * > * keep, look for another buffer, then go back, in this case: > * , seq_from_tap > - * |--------| <-- len > + * |--------| <-- dlen > * '=3D=3D=3D' <-- offset > * ^ seq > * (offset < 0) > */ > - if (SEQ_GE(seq_offset, 0) && SEQ_LE(seq + len, seq_from_tap)) > + if (SEQ_GE(seq_offset, 0) && SEQ_LE(seq + dlen, seq_from_tap)) > continue; > =20 > if (SEQ_LT(seq_offset, 0)) { > @@ -1776,7 +1790,7 @@ static int tcp_data_from_tap(const struct ctx *c, s= truct tcp_tap_conn *conn, > } > =20 > tcp_iov[iov_i].iov_base =3D data + seq_offset; > - tcp_iov[iov_i].iov_len =3D len - seq_offset; > + tcp_iov[iov_i].iov_len =3D dlen - seq_offset; > seq_from_tap +=3D tcp_iov[iov_i].iov_len; > iov_i++; > =20 > @@ -2087,9 +2101,28 @@ int tcp_tap_handler(const struct ctx *c, uint8_t p= if, sa_family_t af, > =20 > /* Established connections not accepting data from tap */ > if (conn->events & TAP_FIN_RCVD) { > - tcp_sock_consume(conn, ntohl(th->ack_seq)); > - tcp_update_seqack_from_tap(c, conn, ntohl(th->ack_seq)); > - if (tcp_tap_window_update(c, conn, ntohs(th->window))) > + bool retr; > + > + retr =3D th->ack && !tcp_packet_data_len(th, len) && !th->fin && > + ntohl(th->ack_seq) =3D=3D conn->seq_ack_from_tap && > + ntohs(th->window) =3D=3D conn->wnd_from_tap; > + > + /* On socket flush failure, pretend there was no ACK, try again > + * later > + */ > + if (th->ack && !tcp_sock_consume(conn, ntohl(th->ack_seq))) > + tcp_update_seqack_from_tap(c, conn, ntohl(th->ack_seq)); > + > + if (retr) { > + flow_trace(conn, > + "fast re-transmit, ACK: %u, previous sequence: %u", > + ntohl(th->ack_seq), conn->seq_to_tap); > + > + if (tcp_rewind_seq(c, conn)) > + return -1; > + } > + > + if (tcp_tap_window_update(c, conn, ntohs(th->window)) || retr) > tcp_data_from_sock(c, conn); > =20 > if (conn->seq_ack_from_tap =3D=3D conn->seq_to_tap) { > --=20 > 2.43.0 >=20 --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --VB+f3NOp9Vgf8Om1 Content-Type: application/pgp-signature; name=signature.asc -----BEGIN PGP SIGNATURE----- iQIzBAEBCgAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmi1IasACgkQzQJF27ox 2GdMAhAAg3h3FRXQKnLuFmUrH6NLxmq1+OX1eoixXyAhKG9qTNMSQPqsQRgKusTe G518eVDMjHZHV9kjqThYof7QWIGLKT2yCJnwuHvTDX+ir5pB1AwquTPuk4BhZCLp HO+n0yhXEhFbgkFrst/dO1BAwzDbNsi746CkvlqMvyJAz8QwqkO01mILTgALKXds 1TX+St9RF3qOKmUCK2JSG6O3laOOwHL1H4gH456zaSufc4R70luL2yy/SbvUu+4x Qxqfqiic4/MPtRESNuchgmwYdGf1BhUO/+VQygA31IOVmzqTpMI7iuQKYrxgC30Z Ni+A+OsuqMlzeh2LjPJIck8yqjjI9xv/a8SQRQ6aFyK+9sHmjurztsZDzFD4al/u E7z2sj1bzTUUPn0u1OlrOsDBmtzagX0HYEth3zl1kdeGKb6jT6HU5xf9sVx5pO8l II/aWJRoDd0UY7cYz3uz7P+OUzfXAxXqXzdPjgv4iVSKFNW/4uK1zCycIQxUaTzU 1r3JzVXlg6FXgblCPJRASm/LCCloFr+3p0Fe+ZdBcyiQ8xLZF9ftgJGi9QFM1xnj uaSWvu3n/2mLj8hnJqLVMvgPbbEsRQoCP5/GrOLMmdFZnIIeLl+NAfArFYapulKe HOGVkY/+9ZZdNN5x/KsxryQBeLKjZhScU2BrrgND4tkS9xbHCB4= =9l4h -----END PGP SIGNATURE----- --VB+f3NOp9Vgf8Om1--