From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3])
	by passt.top (Postfix) with ESMTPS id 62D545A02CF
	for ; Thu, 16 May 2024 04:24:43 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=gibson.dropbear.id.au; s=202312; t=1715826278;
	bh=LVJKAeG8zV6mKDzdRtMS0Zdi0ZH7FSwwGED3BKObmlw=;
	h=Date:From:To:Cc:Subject:References:In-Reply-To:From;
	b=fl5ZAqLyfmW+B/ve0iVxkSc/xlQSqIkR4ID3Y0ejVE4UT4Xu1dGXiw0/FRncWmyzd
	 3eprY7A2GcKQr0G7tBZ+hcSlHkpg96z+ePerKg7dP62DuIn8ObqLmVs/hBs6BnTOgg
	 3X5kLwKuXAvQPI2afxUiWJ8I5pBuxtX22ZgdMvktbYH/wolbEBJfbI5DpnjNW2mWmU
	 vuMuuHtEyvbrNQNDIBf5QFNI8F+Lc1dRY+MaknHykaZhpS3ctrU1lbpMZbLU/pQ6v9
	 SYvJ9oKh67cegtVjN1CSKx2EA0MhNVlfROOkHhb4z1FY+/ZLJbymwsq5E/wMgpJJjT
	 Ei4H5yPvW798g==
Received: by gandalf.ozlabs.org (Postfix, from userid 1007)
	id 4Vfv7B3vs3z4wcR; Thu, 16 May 2024 12:24:38 +1000 (AEST)
Date: Thu, 16 May 2024 12:24:19 +1000
From: David Gibson
To: Jon Maloy
Subject: Re: [PATCH v4 1/3] tcp: move seq_to_tap update to when frame is queued
Message-ID:
References: <20240515153429.859185-1-jmaloy@redhat.com> <20240515153429.859185-2-jmaloy@redhat.com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha256;
	protocol="application/pgp-signature"; boundary="PurO5sCZpiETVKdO"
Content-Disposition: inline
In-Reply-To: <20240515153429.859185-2-jmaloy@redhat.com>
Message-ID-Hash: IQ65EAHAMV6GU34BX57F7HCJH5VAADRT
X-Message-ID-Hash: IQ65EAHAMV6GU34BX57F7HCJH5VAADRT
X-MailFrom: dgibson@gandalf.ozlabs.org
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop;
	banned-address; member-moderation; nonmember-moderation; administrivia;
	implicit-dest; max-recipients; max-size; news-moderation; no-subject;
	digests; suspicious-header
CC: passt-dev@passt.top, sbrivio@redhat.com, lvivier@redhat.com, dgibson@redhat.com
X-Mailman-Version: 3.3.8
Precedence: list
List-Id: Development discussion and patches for passt
Archived-At:
Archived-At:
List-Archive:
List-Archive:
List-Help:
List-Owner:
List-Post:
List-Subscribe:
List-Unsubscribe:

--PurO5sCZpiETVKdO
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Wed, May 15, 2024 at 11:34:27AM -0400, Jon Maloy wrote:
> commit a469fc393fa1 ("tcp, tap: Don't increase tap-side sequence counter for dropped frames")
> delayed update of conn->seq_to_tap until the moment the corresponding
> frame has been successfully pushed out. This has the advantage that we
> immediately can make a new attempt to transmit a frame after a failed
> transmit, rather than waiting for the peer to later discover a gap and
> trigger the fast retransmit mechanism to solve the problem.
>
> This approach has turned out to cause a problem with spurious sequence
> number updates during peer-initiated retransmits, and we have realized
> it may not be the best way to solve the above issue.
>
> We now restore the previous method, by updating the said field at the
> moment a frame is added to the outqueue. To retain the advantage of
> having a quick re-attempt based on local failure detection, we now scan
> through the part of the outqueue that had to be dropped, and restore the
> sequence counter for each affected connection to the most appropriate
> value.
>
> Signed-off-by: Jon Maloy
>
> ---
> v2: - Re-spun loop in tcp_revert_seq() and some other changes based on
>       feedback from Stefano Brivio.
>     - Added paranoid test to avoid that seq_to_tap becomes lower than
>       seq_ack_from_tap.
>
> v3: - Identical to v2. Called v3 because it was embedded in a series
>       with that version.
>
> v4: - In tcp_revert_seq(), we read the sequence number from the TCP
>       header instead of keeping a copy in struct tcp_buf_seq_update.
>     - Since the only remaining field in struct tcp_buf_seq_update is
>       a pointer to struct tcp_tap_conn, we eliminate the struct
>       altogether, and make the tcp6/tcp4_buf_seq_update arrays into
>       arrays of said pointer.
>     - Removed 'paranoid' test in tcp_revert_seq. If it happens, it
>       is not fatal, and will be caught by other code anyway.
>     - Separated from the series again.
> ---
>  tcp.c | 59 +++++++++++++++++++++++++++++++++++++----------------------
>  1 file changed, 37 insertions(+), 22 deletions(-)
>
> diff --git a/tcp.c b/tcp.c
> index 21d0af0..976dba8 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -410,16 +410,6 @@ static int tcp_sock_ns [NUM_PORTS][IP_VERSIONS];
>   */
>  static union inany_addr low_rtt_dst[LOW_RTT_TABLE_SIZE];
>
> -/**
> - * tcp_buf_seq_update - Sequences to update with length of frames once sent
> - * @seq: Pointer to sequence number sent to tap-side, to be updated
> - * @len: TCP payload length
> - */
> -struct tcp_buf_seq_update {
> -	uint32_t *seq;
> -	uint16_t len;
> -};
> -
>  /* Static buffers */
>  /**
>   * struct tcp_payload_t - TCP header and data to send segments with payload
> @@ -461,7 +451,8 @@ static struct tcp_payload_t tcp4_payload[TCP_FRAMES_MEM];
>
>  static_assert(MSS4 <= sizeof(tcp4_payload[0].data), "MSS4 is greater than 65516");
>
> -static struct tcp_buf_seq_update tcp4_seq_update[TCP_FRAMES_MEM];
> +/* References tracking the owner connection of frames in the tap outqueue */
> +static struct tcp_tap_conn *tcp4_frame_conns[TCP_FRAMES_MEM];
>  static unsigned int tcp4_payload_used;
>
>  static struct tap_hdr tcp4_flags_tap_hdr[TCP_FRAMES_MEM];
> @@ -483,7 +474,8 @@ static struct tcp_payload_t tcp6_payload[TCP_FRAMES_MEM];
>
>  static_assert(MSS6 <= sizeof(tcp6_payload[0].data), "MSS6 is greater than 65516");
>
> -static struct tcp_buf_seq_update tcp6_seq_update[TCP_FRAMES_MEM];
> +/* References tracking the owner connection of frames in the tap outqueue */
> +static struct tcp_tap_conn *tcp6_frame_conns[TCP_FRAMES_MEM];
>  static unsigned int tcp6_payload_used;
>
>  static struct tap_hdr tcp6_flags_tap_hdr[TCP_FRAMES_MEM];
> @@ -1261,25 +1253,49 @@ static void tcp_flags_flush(const struct ctx *c)
>  	tcp4_flags_used = 0;
>  }
>
> +/**
> + * tcp_revert_seq() - Revert affected conn->seq_to_tap after failed transmission
> + * @conns: Array of connection pointers corresponding to queued frames
> + * @frames: Two-dimensional array containing queued frames with sub-iovs

You can make the 2d array explicit in the type as:
	struct iovec (*frames)[TCP_NUM_IOVS];

See, for example, the 'tap_iov' local in udp_tap_send().  (I recommend
the command line tool 'cdecl', also available online at cdecl.org, for
working out confusing pointer-to-array types).

> + * @num_frames: Number of entries in the two arrays to be compared
> + */
> +static void tcp_revert_seq(struct tcp_tap_conn **conns, struct iovec *frames,
> +			   int num_frames)
> +{
> +	int c, f;
> +
> +	for (c = 0, f = 0; c < num_frames; c++, f += TCP_NUM_IOVS) {

Nit: I find having the two parallel counters kind of confusing.  It
naturally goes away with the type change suggested above, but even
without that I'd prefer an explicit multiply in the body.  I strongly
suspect the compiler will be better at working out if the strength
reduction is worth it.

> +		struct tcp_tap_conn *conn = conns[c];
> +		struct tcphdr *th = frames[f + TCP_IOV_PAYLOAD].iov_base;
> +		uint32_t seq = ntohl(th->seq);
> +
> +		if (SEQ_LE(conn->seq_to_tap, seq))

Isn't this test inverted?  We want to rewind seq_to_tap if seq is less
than it, rather than the other way around.
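
To make the type suggestion above a bit more concrete, here's a rough
sketch of how tcp_revert_seq() might look with a pointer-to-array
parameter and a single index.  It's an illustration only, not a
replacement patch: TCP_NUM_IOVS, TCP_IOV_PAYLOAD, SEQ_LE() and the two
structs are cut-down stand-ins I've assumed so that the snippet builds
on its own (they are not the real passt definitions), demo_revert() is
a hypothetical helper, and the comparison is left exactly as in the
quoted patch.

```c
#include <arpa/inet.h>	/* htonl(), ntohl() */
#include <stdint.h>
#include <string.h>	/* memset() */
#include <sys/uio.h>	/* struct iovec */

/* Stand-ins for the real definitions: values and layouts are assumptions */
#define TCP_NUM_IOVS	4
#define TCP_IOV_PAYLOAD	3
#define SEQ_LE(a, b)	((int32_t)((a) - (b)) <= 0)	/* wrap-safe a <= b */

struct tcp_tap_conn { uint32_t seq_to_tap; };	/* only the field used here */
struct tcphdr_stub { uint16_t source, dest; uint32_t seq; };

/* Each element of @frames is one frame, i.e. an array of TCP_NUM_IOVS iovecs */
static void tcp_revert_seq(struct tcp_tap_conn **conns,
			   struct iovec (*frames)[TCP_NUM_IOVS],
			   int num_frames)
{
	int i;

	for (i = 0; i < num_frames; i++) {
		struct tcp_tap_conn *conn = conns[i];
		const struct tcphdr_stub *th = frames[i][TCP_IOV_PAYLOAD].iov_base;
		uint32_t seq = ntohl(th->seq);

		/* Comparison copied unchanged from the quoted patch */
		if (SEQ_LE(conn->seq_to_tap, seq))
			continue;

		conn->seq_to_tap = seq;
	}
}

/* Hypothetical helper: revert one connection against one unsent frame */
static uint32_t demo_revert(uint32_t seq_to_tap, uint32_t frame_seq)
{
	struct tcphdr_stub th = { .seq = htonl(frame_seq) };
	struct tcp_tap_conn conn = { .seq_to_tap = seq_to_tap };
	struct tcp_tap_conn *conns[1] = { &conn };
	struct iovec l2_iov[1][TCP_NUM_IOVS];

	memset(l2_iov, 0, sizeof(l2_iov));
	l2_iov[0][TCP_IOV_PAYLOAD].iov_base = &th;
	l2_iov[0][TCP_IOV_PAYLOAD].iov_len = sizeof(th);

	/* &l2_iov[0] has type struct iovec (*)[TCP_NUM_IOVS] */
	tcp_revert_seq(conns, &l2_iov[0], 1);
	return conn.seq_to_tap;
}
```

Calling demo_revert() once with the counter ahead of the frame's
sequence and once with it behind is a quick way to check which way
round the comparison needs to be.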

> +			continue;
> +
> +		conn->seq_to_tap = seq;
> +	}
> +}
> +
>  /**
>   * tcp_payload_flush() - Send out buffers for segments with data
>   * @c: Execution context
>   */
>  static void tcp_payload_flush(const struct ctx *c)
>  {
> -	unsigned i;
>  	size_t m;
>
>  	m = tap_send_frames(c, &tcp6_l2_iov[0][0], TCP_NUM_IOVS,
>  			    tcp6_payload_used);
> -	for (i = 0; i < m; i++)
> -		*tcp6_seq_update[i].seq += tcp6_seq_update[i].len;
> +	if (m != tcp6_payload_used)
> +		tcp_revert_seq(tcp6_frame_conns, &tcp6_l2_iov[m][0],

With the type change above this would become just &tcp6_l2_iov[m].

> +			       tcp6_payload_used - m);
>  	tcp6_payload_used = 0;
>
>  	m = tap_send_frames(c, &tcp4_l2_iov[0][0], TCP_NUM_IOVS,
>  			    tcp4_payload_used);
> -	for (i = 0; i < m; i++)
> -		*tcp4_seq_update[i].seq += tcp4_seq_update[i].len;
> +	if (m != tcp4_payload_used)
> +		tcp_revert_seq(tcp4_frame_conns, &tcp4_l2_iov[m][0],
> +			       tcp4_payload_used - m);
>  	tcp4_payload_used = 0;
>  }
>
> @@ -2129,10 +2145,11 @@ static int tcp_sock_consume(const struct tcp_tap_conn *conn, uint32_t ack_seq)
>  static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn,
>  			    ssize_t dlen, int no_csum, uint32_t seq)
>  {
> -	uint32_t *seq_update = &conn->seq_to_tap;
>  	struct iovec *iov;
>  	size_t l4len;
>
> +	conn->seq_to_tap = seq + dlen;
> +
>  	if (CONN_V4(conn)) {
>  		struct iovec *iov_prev = tcp4_l2_iov[tcp4_payload_used - 1];
>  		const uint16_t *check = NULL;
> @@ -2142,8 +2159,7 @@ static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn,
>  			check = &iph->check;
>  		}
>
> -		tcp4_seq_update[tcp4_payload_used].seq = seq_update;
> -		tcp4_seq_update[tcp4_payload_used].len = dlen;
> +		tcp4_frame_conns[tcp4_payload_used] = conn;
>
>  		iov = tcp4_l2_iov[tcp4_payload_used++];
>  		l4len = tcp_l2_buf_fill_headers(c, conn, iov, dlen, check, seq);
> @@ -2151,8 +2167,7 @@ static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn,
>  		if (tcp4_payload_used > TCP_FRAMES_MEM - 1)
>  			tcp_payload_flush(c);
>  	} else if (CONN_V6(conn)) {
> -		tcp6_seq_update[tcp6_payload_used].seq = seq_update;
> -		tcp6_seq_update[tcp6_payload_used].len = dlen;
> +		tcp6_frame_conns[tcp6_payload_used] = conn;
>
>  		iov = tcp6_l2_iov[tcp6_payload_used++];
>  		l4len = tcp_l2_buf_fill_headers(c, conn, iov, dlen, NULL, seq);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

--PurO5sCZpiETVKdO
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmZFbkkACgkQzQJF27ox
2GfU7xAAocKMCt9TgKhh8ZHwUI80L/ajxKsHnWhLhR+zk5zPukm8S9DZmOhwsMf+
1bl4bm75Nssjq+2/rdomjKxDgWXICT+BmApQzVsVT8AKX3IZpW0adJos20g9u0gb
3y/NoYWxYfqjJj6EBEAgPS7ozT+u5L4W/B1qfNc3SlwZSdftkYUa3iEZG/ziNya9
FPu2GAt0+XzfKzOYEIgoUBVUzzUL6oJ/1zOzOxUhNYHWHbV5J7pWzmsEFuqcWUze
u9Nkf02myCPRQuz+uOKKuQ3gAriTN/uxMmzZYwmuJZPaoOAp7JVZ1rDCYcGyQhwL
6mEvvyjcjgzX/Sc2u4yDfbkZJ7j4u5MfN2260R0SEPsROk1N8dIouvD13ORBTtlW
mXQDH/XipiH0+uqovQIxewhlmR6UG+FZAc0d9ZQP0WlIvuM0fPPt97j+K0OA003Q
975d3Oas2zHuOydSNLYlnOiuWTPe0iSWmj9oQ5muu0gLfZu7XmzlV0zxbjzdG5IU
wJ+ZsHs8WAG5UpOn/QoHcrQU2Lb66RZeDTjijILiV8IlYVWKKalZ3auh/TawrzwT
7c85loUpJeKsBfu3MN7X5Nups34SYEomGI7gDbH3u2lNPjY7pNfzA7RC0+OIa6Sd
zYDSSBuHXSXk2wZe3qIvdn3yfX5/WUuNf2n0JYe2jSWpt2oD/Dk=
=8B6C
-----END PGP SIGNATURE-----

--PurO5sCZpiETVKdO--