Date: Thu, 16 May 2024 14:16:06 +1000
From: David Gibson
To: Jon Maloy
Cc: passt-dev@passt.top, sbrivio@redhat.com, lvivier@redhat.com, dgibson@redhat.com
Subject: Re: [PATCH v4 1/3] tcp: move seq_to_tap update to when frame is queued
References: <20240515153429.859185-1-jmaloy@redhat.com> <20240515153429.859185-2-jmaloy@redhat.com> <1967b10f-479e-686e-9afc-ff52184f4198@redhat.com>
In-Reply-To: <1967b10f-479e-686e-9afc-ff52184f4198@redhat.com>
List-Id: Development discussion and patches for passt

On Wed, May 15, 2024 at 10:57:06PM -0400, Jon Maloy wrote:
>
>
> On 2024-05-15 22:24, David Gibson wrote:
> > On Wed, May 15, 2024 at 11:34:27AM -0400, Jon Maloy wrote:
> > > commit a469fc393fa1 ("tcp, tap: Don't increase tap-side sequence counter for dropped frames")
> > > delayed update of conn->seq_to_tap until the moment the corresponding
> > > frame has been successfully pushed out. This has the advantage that we
> > > immediately can make a new attempt to transmit a frame after a failed
> > > transmit, rather than waiting for the peer to later discover a gap and
> > > trigger the fast retransmit mechanism to solve the problem.
> > >
> > > This approach has turned out to cause a problem with spurious sequence
> > > number updates during peer-initiated retransmits, and we have realized
> > > it may not be the best way to solve the above issue.
> > >
> > > We now restore the previous method, by updating the said field at the
> > > moment a frame is added to the outqueue. To retain the advantage of
> > > having a quick re-attempt based on local failure detection, we now scan
> > > through the part of the outqueue that had to be dropped, and restore the
> > > sequence counter for each affected connection to the most appropriate
> > > value.
> > >
> > > Signed-off-by: Jon Maloy
> > >
> > > ---
> > > v2: - Re-spun loop in tcp_revert_seq() and some other changes based on
> > >       feedback from Stefano Brivio.
> > >     - Added paranoid test to avoid that seq_to_tap becomes lower than
> > >       seq_ack_from_tap.
> > >
> > > v3: - Identical to v2. Called v3 because it was embedded in a series
> > >       with that version.
> > >
> > > v4: - In tcp_revert_seq(), we read the sequence number from the TCP
> > >       header instead of keeping a copy in struct tcp_buf_seq_update.
> > >     - Since the only remaining field in struct tcp_buf_seq_update is
> > >       a pointer to struct tcp_tap_conn, we eliminate the struct
> > >       altogether, and make the tcp6/tcp4_seq_update arrays into
> > >       arrays of said pointer.
> > >     - Removed 'paranoid' test in tcp_revert_seq. If it happens, it
> > >       is not fatal, and will be caught by other code anyway.
> > >     - Separated from the series again.
> > > ---
> > >  tcp.c | 59 +++++++++++++++++++++++++++++++++++++----------------------
> > >  1 file changed, 37 insertions(+), 22 deletions(-)
> > >
> > > diff --git a/tcp.c b/tcp.c
> > > index 21d0af0..976dba8 100644
> > > --- a/tcp.c
> > > +++ b/tcp.c
> > > @@ -410,16 +410,6 @@ static int tcp_sock_ns [NUM_PORTS][IP_VERSIONS];
> > >   */
> > >  static union inany_addr low_rtt_dst[LOW_RTT_TABLE_SIZE];
> > > -/**
> > > - * tcp_buf_seq_update - Sequences to update with length of frames once sent
> > > - * @seq:	Pointer to sequence number sent to tap-side, to be updated
> > > - * @len:	TCP payload length
> > > - */
> > > -struct tcp_buf_seq_update {
> > > -	uint32_t *seq;
> > > -	uint16_t len;
> > > -};
> > > -
> > >  /* Static buffers */
> > >  /**
> > >   * struct tcp_payload_t - TCP header and data to send segments with payload
> > > @@ -461,7 +451,8 @@ static struct tcp_payload_t tcp4_payload[TCP_FRAMES_MEM];
> > >  static_assert(MSS4 <= sizeof(tcp4_payload[0].data), "MSS4 is greater than 65516");
> > > -static struct tcp_buf_seq_update tcp4_seq_update[TCP_FRAMES_MEM];
> > > +/* References tracking the owner connection of frames in the tap outqueue */
> > > +static struct tcp_tap_conn *tcp4_frame_conns[TCP_FRAMES_MEM];
> > >  static unsigned int tcp4_payload_used;
> > >  static struct tap_hdr tcp4_flags_tap_hdr[TCP_FRAMES_MEM];
> > > @@ -483,7 +474,8 @@ static struct tcp_payload_t tcp6_payload[TCP_FRAMES_MEM];
> > >  static_assert(MSS6 <= sizeof(tcp6_payload[0].data), "MSS6 is greater than 65516");
> > > -static struct tcp_buf_seq_update tcp6_seq_update[TCP_FRAMES_MEM];
> > > +/* References tracking the owner connection of frames in the tap outqueue */
> > > +static struct tcp_tap_conn *tcp6_frame_conns[TCP_FRAMES_MEM];
> > >  static unsigned int tcp6_payload_used;
> > >  static struct tap_hdr tcp6_flags_tap_hdr[TCP_FRAMES_MEM];
> > > @@ -1261,25 +1253,49 @@ static void tcp_flags_flush(const struct ctx *c)
> > >  	tcp4_flags_used = 0;
> > >  }
> > > +/**
> > > + * tcp_revert_seq() - Revert affected conn->seq_to_tap after failed transmission
> > > + * @conns:		Array of connection pointers corresponding to queued frames
> > > + * @frames:		Two-dimensional array containing queued frames with sub-iovs
> > You can make the 2d array explicit in the type as:
> > 	struct iovec (*frames)[TCP_NUM_IOVS];
> > See, for example the 'tap_iov' local in udp_tap_send(). (I recommend
> > the command line tool 'cdecl', also available online at cdecl.org for
> > working out confusing pointer-to-array types).
> Nice. I wasn't quite happy with this.
> >
> > > + * @num_frames:	Number of entries in the two arrays to be compared
> > > + */
> > > +static void tcp_revert_seq(struct tcp_tap_conn **conns, struct iovec *frames,
> > > +			   int num_frames)
> > > +{
> > > +	int c, f;
> > > +
> > > +	for (c = 0, f = 0; c < num_frames; c++, f += TCP_NUM_IOVS) {
> > Nit: I find having the two parallel counters kind of confusing. It
> > naturally goes away with the type change suggested above, but even
> > without that I'd prefer an explicit multiply in the body. I strongly
> > suspect the compiler will be better at working out if the strength
> > reduction is worth it.
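
For reference, the two indexing styles mentioned in the comments above can be sketched in a standalone form. This is only an illustration, not passt code: NUM_IOVS, IOV_PAYLOAD, payload_bytes_flat() and payload_bytes_2d() are hypothetical stand-ins, and summing iov_len merely takes the place of whatever the loop body does with each frame. Both variants get by with a single loop counter.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>	/* struct iovec */

/* Hypothetical stand-ins for illustration; not the values used in tcp.c */
#define NUM_IOVS	3	/* sub-iovs per queued frame (assumed) */
#define IOV_PAYLOAD	2	/* index of the payload sub-iov (assumed) */

/* Flat array of sub-iovs: one counter, explicit multiply in the body */
static size_t payload_bytes_flat(const struct iovec *frames, int num_frames)
{
	size_t total = 0;
	int i;

	assert(num_frames >= 0);

	for (i = 0; i < num_frames; i++)
		total += frames[i * NUM_IOVS + IOV_PAYLOAD].iov_len;

	return total;
}

/* Pointer to array of NUM_IOVS iovecs: the 2d layout is part of the type,
 * so frames[i] is a whole frame and frames[i][j] is one of its sub-iovs
 */
static size_t payload_bytes_2d(struct iovec (*frames)[NUM_IOVS], int num_frames)
{
	size_t total = 0;
	int i;

	assert(num_frames >= 0);

	for (i = 0; i < num_frames; i++)
		total += frames[i][IOV_PAYLOAD].iov_len;

	return total;
}
```

A caller holding "struct iovec buf[N][NUM_IOVS]" can pass buf directly to the second variant, since the array decays to exactly that pointer-to-array type. The first variant expects a flat "struct iovec buf[N * NUM_IOVS]" instead.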
> >
> > > +		struct tcp_tap_conn *conn = conns[c];
> > > +		struct tcphdr *th = frames[f + TCP_IOV_PAYLOAD].iov_base;
> > > +		uint32_t seq = ntohl(th->seq);
> > > +
> > > +		if (SEQ_LE(conn->seq_to_tap, seq))
> > Isn't this test inverted? We want to rewind seq_to_tap if seq is less
> > than it, rather than the other way around.
> No. We do 'continue', i.e., nothing, if this condition is fulfilled.
> This may look a little non-intuitive here, but makes sense when I add the
> next patch.

Oh, of course, my mistake.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you. NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson