Date: Wed, 5 Jun 2024 09:59:19 +1000
From: David Gibson
To: Jon Maloy
Cc: passt-dev@passt.top, sbrivio@redhat.com, lvivier@redhat.com, dgibson@redhat.com
Subject: Re: [PATCH v8] tcp: move seq_to_tap update to when frame is queued
References: <20240604182908.1833186-1-jmaloy@redhat.com>
In-Reply-To: <20240604182908.1833186-1-jmaloy@redhat.com>
List-Id: Development discussion and patches for passt

On Tue, Jun 04, 2024 at 02:29:08PM -0400, Jon Maloy wrote:
> commit a469fc393fa1 ("tcp, tap: Don't increase tap-side sequence counter for dropped frames")
> delayed update of conn->seq_to_tap until the moment the corresponding
> frame has been successfully pushed out. This has the advantage that we
> immediately can make a new attempt to transmit a frame after a failed
> transmit, rather than waiting for the peer to later discover a gap and
> trigger the fast retransmit mechanism to solve the problem.
>
> This approach has turned out to cause a problem with spurious sequence
> number updates during peer-initiated retransmits, and we have realized
> it may not be the best way to solve the above issue.
>
> We now restore the previous method, by updating the said field at the
> moment a frame is added to the outqueue. To retain the advantage of
> having a quick re-attempt based on local failure detection, we now scan
> through the part of the outqueue that had to be dropped, and restore the
> sequence counter for each affected connection to the most appropriate
> value.
>
> Signed-off-by: Jon Maloy

Reviewed-by: David Gibson

>
> ---
> v2: - Re-spun loop in tcp_revert_seq() and some other changes based on
>       feedback from Stefano Brivio.
>     - Added paranoid test to avoid that seq_to_tap becomes lower than
>       seq_ack_from_tap.
>
> v3: - Identical to v2. Called v3 because it was embedded in a series
>       with that version.
>
> v4: - In tcp_revert_seq(), we read the sequence number from the TCP
>       header instead of keeping a copy in struct tcp_buf_seq_update.
>     - Since the only remaining field in struct tcp_buf_seq_update is
>       a pointer to struct tcp_tap_conn, we eliminate the struct
>       altogether, and make the tcp6/tcp4_seq_update arrays into
>       arrays of said pointer.
>     - Removed 'paranoid' test in tcp_revert_seq. If it happens, it
>       is not fatal, and will be caught by other code anyway.
>     - Separated from the series again.
>
> v5: - Changed way to index array within tcp_revert_seq()
>
> v6-v7: No changes.
>
> v8: Fixed missing indexing of tcp4/6_frame_conns array in
>     tcp_payload_flush().
> ---
>  tcp.c | 61 +++++++++++++++++++++++++++++++++++++++----------------------
>  1 file changed, 39 insertions(+), 22 deletions(-)
>
> diff --git a/tcp.c b/tcp.c
> index 06acb41..89a5b19 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -408,16 +408,6 @@ static int tcp_sock_ns [NUM_PORTS][IP_VERSIONS];
>   */
>  static union inany_addr low_rtt_dst[LOW_RTT_TABLE_SIZE];
>
> -/**
> - * tcp_buf_seq_update - Sequences to update with length of frames once sent
> - * @seq:	Pointer to sequence number sent to tap-side, to be updated
> - * @len:	TCP payload length
> - */
> -struct tcp_buf_seq_update {
> -	uint32_t *seq;
> -	uint16_t len;
> -};
> -
>  /* Static buffers */
>  /**
>   * struct tcp_payload_t - TCP header and data to send segments with payload
> @@ -459,7 +449,8 @@ static struct tcp_payload_t tcp4_payload[TCP_FRAMES_MEM];
>
>  static_assert(MSS4 <= sizeof(tcp4_payload[0].data), "MSS4 is greater than 65516");
>
> -static struct tcp_buf_seq_update tcp4_seq_update[TCP_FRAMES_MEM];
> +/* References tracking the owner connection of frames in the tap outqueue */
> +static struct tcp_tap_conn *tcp4_frame_conns[TCP_FRAMES_MEM];
>  static unsigned int tcp4_payload_used;
>
>  static struct tap_hdr tcp4_flags_tap_hdr[TCP_FRAMES_MEM];
> @@ -481,7 +472,8 @@ static struct tcp_payload_t tcp6_payload[TCP_FRAMES_MEM];
>
>  static_assert(MSS6 <= sizeof(tcp6_payload[0].data), "MSS6 is greater than 65516");
>
> -static struct tcp_buf_seq_update tcp6_seq_update[TCP_FRAMES_MEM];
> +/* References tracking the owner connection of frames in the tap outqueue */
> +static struct tcp_tap_conn *tcp6_frame_conns[TCP_FRAMES_MEM];
>  static unsigned int tcp6_payload_used;
>
>  static struct tap_hdr tcp6_flags_tap_hdr[TCP_FRAMES_MEM];
> @@ -1257,25 +1249,51 @@ static void tcp_flags_flush(const struct ctx *c)
>  	tcp4_flags_used = 0;
>  }
>
> +/**
> + * tcp_revert_seq() - Revert affected conn->seq_to_tap after failed transmission
> + * @conns:		Array of connection pointers corresponding to queued frames
> + * @frames:		Two-dimensional array containing queued frames with sub-iovs
> + * @num_frames:	Number of entries in the two arrays to be compared
> + */
> +static void tcp_revert_seq(struct tcp_tap_conn **conns, struct iovec (*frames)[TCP_NUM_IOVS],
> +			   int num_frames)
> +{
> +	int i;
> +
> +	for (i = 0; i < num_frames; i++) {
> +		struct tcp_tap_conn *conn = conns[i];
> +		struct tcphdr *th = frames[i][TCP_IOV_PAYLOAD].iov_base;
> +		uint32_t seq = ntohl(th->seq);
> +
> +		if (SEQ_LE(conn->seq_to_tap, seq))
> +			continue;
> +
> +		conn->seq_to_tap = seq;
> +	}
> +}
> +
>  /**
>   * tcp_payload_flush() - Send out buffers for segments with data
>   * @c:		Execution context
>   */
>  static void tcp_payload_flush(const struct ctx *c)
>  {
> -	unsigned i;
>  	size_t m;
>
>  	m = tap_send_frames(c, &tcp6_l2_iov[0][0], TCP_NUM_IOVS,
>  			    tcp6_payload_used);
> -	for (i = 0; i < m; i++)
> -		*tcp6_seq_update[i].seq += tcp6_seq_update[i].len;
> +	if (m != tcp6_payload_used) {
> +		tcp_revert_seq(&tcp6_frame_conns[m], &tcp6_l2_iov[m],
> +			       tcp6_payload_used - m);
> +	}
>  	tcp6_payload_used = 0;
>
>  	m = tap_send_frames(c, &tcp4_l2_iov[0][0], TCP_NUM_IOVS,
>  			    tcp4_payload_used);
> -	for (i = 0; i < m; i++)
> -		*tcp4_seq_update[i].seq += tcp4_seq_update[i].len;
> +	if (m != tcp4_payload_used) {
> +		tcp_revert_seq(&tcp4_frame_conns[m], &tcp4_l2_iov[m],
> +			       tcp4_payload_used - m);
> +	}
>  	tcp4_payload_used = 0;
>  }
>
> @@ -2129,10 +2147,11 @@ static int tcp_sock_consume(const struct tcp_tap_conn *conn, uint32_t ack_seq)
>  static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn,
>  			    ssize_t dlen, int no_csum, uint32_t seq)
>  {
> -	uint32_t *seq_update = &conn->seq_to_tap;
>  	struct iovec *iov;
>  	size_t l4len;
>
> +	conn->seq_to_tap = seq + dlen;
> +
>  	if (CONN_V4(conn)) {
>  		struct iovec *iov_prev = tcp4_l2_iov[tcp4_payload_used - 1];
>  		const uint16_t *check = NULL;
> @@ -2142,8 +2161,7 @@ static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn,
>  			check = &iph->check;
>  		}
>
> -		tcp4_seq_update[tcp4_payload_used].seq = seq_update;
> -		tcp4_seq_update[tcp4_payload_used].len = dlen;
> +		tcp4_frame_conns[tcp4_payload_used] = conn;
>
>  		iov = tcp4_l2_iov[tcp4_payload_used++];
>  		l4len = tcp_l2_buf_fill_headers(c, conn, iov, dlen, check, seq);
> @@ -2151,8 +2169,7 @@ static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn,
>  		if (tcp4_payload_used > TCP_FRAMES_MEM - 1)
>  			tcp_payload_flush(c);
>  	} else if (CONN_V6(conn)) {
> -		tcp6_seq_update[tcp6_payload_used].seq = seq_update;
> -		tcp6_seq_update[tcp6_payload_used].len = dlen;
> +		tcp6_frame_conns[tcp6_payload_used] = conn;
>
>  		iov = tcp6_l2_iov[tcp6_payload_used++];
>  		l4len = tcp_l2_buf_fill_headers(c, conn, iov, dlen, NULL, seq);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
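
For readers following the thread, here is a minimal standalone sketch of the idea described in the quoted commit message: advance the counter when a frame is queued, then rewind it for frames that could not be sent. It is not part of the patch. The names struct conn, struct queued_frame, seq_le(), queue_frame() and revert_seq() are hypothetical stand-ins for struct tcp_tap_conn, the frame iovecs, SEQ_LE(), tcp_data_to_tap() and tcp_revert_seq(); in particular, the definition of seq_le() is an assumption about what a wraparound-safe comparison could look like, not a copy of passt's macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct tcp_tap_conn: only the field used here */
struct conn {
	uint32_t seq_to_tap;	/* next sequence number to send to the tap side */
};

/* Hypothetical stand-in for a queued frame: owner and first sequence number */
struct queued_frame {
	struct conn *conn;
	uint32_t seq;
};

/* Assumed wraparound-safe "a <= b": true if b is less than 2^31 ahead of a */
static int seq_le(uint32_t a, uint32_t b)
{
	return (uint32_t)(b - a) < 0x80000000u;
}

/* Queue a frame and advance the owner's counter right away */
static void queue_frame(struct queued_frame *f, struct conn *c,
			uint32_t seq, uint32_t dlen)
{
	f->conn = c;
	f->seq = seq;
	c->seq_to_tap = seq + dlen;
}

/* For each unsent frame, rewind the owner's counter to that frame's
 * sequence number, unless the counter is already at or before it */
static void revert_seq(struct queued_frame *frames, size_t n)
{
	size_t i;

	assert(frames || !n);

	for (i = 0; i < n; i++) {
		struct conn *c = frames[i].conn;

		if (seq_le(c->seq_to_tap, frames[i].seq))
			continue;

		c->seq_to_tap = frames[i].seq;
	}
}
```

In this sketch, a caller that managed to send only the first m of n queued frames would call revert_seq(&frames[m], n - m), mirroring the indexing that v8 of the patch fixes in tcp_payload_flush().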