From: David Gibson
To: Stefano Brivio
Cc: Jon Maloy, passt-dev@passt.top, lvivier@redhat.com, dgibson@redhat.com
Date: Mon, 13 May 2024 11:03:11 +1000
Subject: Re: [PATCH] tcp: move seq_to_tap update to when frame is queued
References: <20240509030023.4153802-1-jmaloy@redhat.com> <20240510184030.44b57a2f@elisabeth>
In-Reply-To: <20240510184030.44b57a2f@elisabeth>
List-Id: Development discussion and patches for passt

On Fri, May 10, 2024 at 06:40:30PM +0200, Stefano Brivio wrote:
> On Wed, 8 May 2024 23:00:23 -0400
> Jon Maloy wrote:
>
> > commit a469fc393fa1 ("tcp, tap: Don't increase tap-side sequence counter for dropped frames")
> > delayed update of conn->seq_to_tap until the moment the corresponding
> > frame has been successfully pushed out. This has the advantage that we
> > immediately can retransmit a buffer that we fail to transmit, rather
> > than waiting for the peer side to discover the loss and initiate fast
> > retransmit.
>
> It's not really fast retransmit, it's a simple retry of the operation
> that didn't succeed. We didn't even transmit.
>
> >
> > This approach has turned out to cause a problem with spurious sequence
> > number updates during peer-initiated retransmits, and we have realized
> > it may not be the best way to solve the above issue.
> >
> > We now restore the previous method, by updating the said field at the
> > moment a frame is added to the outqueue. To retain the advantage of fast
> > retransmit
>
> Same here.
>
> > based on local failure detection, we now scan through the part
> > of the outqueue that had to be dropped, and restore the sequence counter
> > for each affected connection to the most appropriate value.
> >
> > Signed-off-by: Jon Maloy
> > ---
> >  tcp.c | 52 ++++++++++++++++++++++++++++++++++++++++++----------
> >  1 file changed, 42 insertions(+), 10 deletions(-)
> >
> > diff --git a/tcp.c b/tcp.c
> > index 21d0af0..58fdbc9 100644
> > --- a/tcp.c
> > +++ b/tcp.c
> > @@ -412,11 +412,13 @@ static union inany_addr low_rtt_dst[LOW_RTT_TABLE_SIZE];
> >
> >  /**
> >   * tcp_buf_seq_update - Sequences to update with length of frames once sent
>
> This is not the case anymore, maybe:
>
>  * tcp_conn_old_seq() - Old sequence numbers for connections with pending frames
>
> > - * @seq:	Pointer to sequence number sent to tap-side, to be updated
> > + * @conn:       Pointer to connection corresponding to frame. May need update
>
> Mixed whitespace and tabs. It looks like the connection pointer might
> need to be updated... what about:
>
>  * @conn:	Pointer to connection for this frame
>
> ?
>
> > + * @seq:	Sequence number of the corresponding frame
> >   * @len:	TCP payload length
>
> The length is not needed anymore.

Strictly speaking, I don't think you need the sequence number here
either: it should be in the frame itself.  The fiddliness of
extracting it from the buffer might make it worthwhile to store here
anyway.

> >   */
> >  struct tcp_buf_seq_update {
> > -	uint32_t *seq;
> > +	struct tcp_tap_conn *conn;
> > +	uint32_t seq;
> >  	uint16_t len;
> >  };
> >
> > @@ -1261,25 +1263,52 @@ static void tcp_flags_flush(const struct ctx *c)
> >  	tcp4_flags_used = 0;
> >  }
> >
> > +/**
> > + * tcp_revert_seq() - Revert affected conn->seq_to_tap after failed transmission
> > + * @seq_update:	Array with connection and sequence number data
> > + * @s:		Entry corresponding to first dropped frame
> > + * @e:		Entry corresponding to last dropped frame
>
> These are not pointers to the entries, though. They are indices of the
> queued frames.
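
To illustrate the earlier remark that the sequence number is already in
the frame itself, here is a minimal sketch. It assumes we already hold a
pointer to the start of the TCP header inside the frame buffer; locating
that header behind the layer 2 and IP headers is the fiddly part, and it
is left out here. The helper name tcp_hdr_seq() is hypothetical and not
something in tcp.c. The only fact relied on is that the sequence number
is the 32-bit big-endian field at byte offset 4 of the TCP header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of passt: read the sequence number from a
 * raw TCP header. @th must point to at least 8 readable bytes.
 */
static uint32_t tcp_hdr_seq(const unsigned char *th)
{
	assert(th != NULL);

	/* Bytes 4..7 of the TCP header, in network (big-endian) byte order */
	return ((uint32_t)th[4] << 24) | ((uint32_t)th[5] << 16) |
	       ((uint32_t)th[6] << 8)  |  (uint32_t)th[7];
}
```

Reading the bytes one at a time avoids alignment and endianness
assumptions about the buffer, at the cost of a few shifts per frame.
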
>
> > + */
> > +static void tcp_revert_seq(struct tcp_buf_seq_update *seq_update, int s, int e)
> > +{
> > +	struct tcp_tap_conn *conn;
> > +	uint32_t lowest_seq;
> > +	int i, ii;
> > +
> > +	for (i = s; i < e; i++) {
> > +		conn = seq_update[i].conn;
> > +		lowest_seq = seq_update[i].seq;
> > +
> > +		for (ii = i + 1; ii < e; ii++) {
> > +			if (seq_update[ii].conn != conn)
> > +				continue;
> > +			if (SEQ_GT(lowest_seq, seq_update[ii].seq))
> > +				lowest_seq = seq_update[ii].seq;
> > +		}
>
> If I recall correctly, David suggested a simpler approach that avoids
> this O(n^2) scan, based on the observation that 1. the first entry you
> find in the table also has the lowest sequence number (we don't send
> frames out-of-order), and that 2. you'll never revert to a higher
> sequence number (the two lines below take care of that).

Right..

> That is, you could just scan the table once, and if you find a sequence
> number that's lower than the current sequence stored for the connection,
> store it.
>
> > +
> > +		if (SEQ_GT(conn->seq_to_tap, lowest_seq))
> > +			conn->seq_to_tap = lowest_seq;

..these lines here, specifically.  Basically we rewind seq_to_tap
each time we find an untransmitted frame that sits before it.
Theoretically that could involve multiple rewinds, but a) that's not
fatal, merely suboptimal and b) it won't happen in practice, since
frames in the queue will (nearly?) always have increasing sequence
numbers.
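
The single-pass variant discussed above could look roughly like the
sketch below. It is self-contained for illustration only: the two
structs are cut-down stand-ins for the real ones, SEQ_GT() is an assumed
wrap-safe comparison standing in for the macro in tcp.c (whose
definition isn't quoted in this thread), and the function is named
tcp_revert_seq_sketch() to make clear it is not the final code. As in
the patch, @s is the index of the first dropped frame and @e is one past
the last.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-in for tcp.c's SEQ_GT(): true if a is after b, modulo 2^32 */
#define SEQ_GT(a, b)	((int32_t)((uint32_t)(a) - (uint32_t)(b)) > 0)

/* Cut-down stand-ins for the real structures, illustration only */
struct tcp_tap_conn {
	uint32_t seq_to_tap;
};

struct tcp_buf_seq_update {
	struct tcp_tap_conn *conn;
	uint32_t seq;
};

/* Rewind seq_to_tap for every connection with a dropped frame in [s, e) */
static void tcp_revert_seq_sketch(struct tcp_buf_seq_update *seq_update,
				  int s, int e)
{
	int i;

	assert(s >= 0 && s <= e);

	for (i = s; i < e; i++) {
		struct tcp_tap_conn *conn = seq_update[i].conn;

		/* Only ever move backwards: a later frame for the same
		 * connection can't push the counter forward again.
		 */
		if (SEQ_GT(conn->seq_to_tap, seq_update[i].seq))
			conn->seq_to_tap = seq_update[i].seq;
	}
}
```

Each queue entry is visited once, so the cost is O(n) in the number of
dropped frames, and the SEQ_GT() guard is what makes a second, higher
entry for the same connection harmless.
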
> > +	}
> > +}
> > +
> >  /**
> >   * tcp_payload_flush() - Send out buffers for segments with data
> >   * @c:		Execution context
> >   */
> >  static void tcp_payload_flush(const struct ctx *c)
> >  {
> > -	unsigned i;
> >  	size_t m;
> >
> >  	m = tap_send_frames(c, &tcp6_l2_iov[0][0], TCP_NUM_IOVS,
> >  			    tcp6_payload_used);
> > -	for (i = 0; i < m; i++)
> > -		*tcp6_seq_update[i].seq += tcp6_seq_update[i].len;
> > +	if (m != tcp6_payload_used)
> > +		tcp_revert_seq(tcp6_seq_update, m, tcp6_payload_used);
> >  	tcp6_payload_used = 0;
> >
> >  	m = tap_send_frames(c, &tcp4_l2_iov[0][0], TCP_NUM_IOVS,
> >  			    tcp4_payload_used);
> > -	for (i = 0; i < m; i++)
> > -		*tcp4_seq_update[i].seq += tcp4_seq_update[i].len;
> > +	if (m != tcp4_payload_used)
> > +		tcp_revert_seq(tcp4_seq_update, m, tcp4_payload_used);
> >  	tcp4_payload_used = 0;
> >  }
> >
> > @@ -2129,10 +2158,11 @@ static int tcp_sock_consume(const struct tcp_tap_conn *conn, uint32_t ack_seq)
> >  static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn,
> >  			    ssize_t dlen, int no_csum, uint32_t seq)
> >  {
> > -	uint32_t *seq_update = &conn->seq_to_tap;
> >  	struct iovec *iov;
> >  	size_t l4len;
> >
> > +	conn->seq_to_tap = seq;
>
> This is the sequence number for the frame we're sending (start of this
> frame), but not the current byte sequence sent to the "tap" (end of
> this frame), which would be seq + dlen, I think.
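
A small sketch of the start-versus-end distinction made above, assuming
nothing about tcp.c beyond what is quoted: seq_after_frame() is a
hypothetical helper that returns the sequence number just past a frame
starting at seq and carrying dlen payload bytes. The bound on dlen
mirrors the uint16_t len field in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of passt: sequence number of the first
 * byte after a frame that starts at @seq and carries @dlen payload bytes.
 */
static uint32_t seq_after_frame(uint32_t seq, size_t dlen)
{
	assert(dlen <= UINT16_MAX);

	/* Unsigned arithmetic wraps modulo 2^32, as TCP sequence space does */
	return seq + (uint32_t)dlen;
}
```

With a helper like this, queueing a frame would set conn->seq_to_tap to
seq_after_frame(seq, dlen), while the value kept for a possible revert
stays seq, the start of the frame.
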
>
> > +
> >  	if (CONN_V4(conn)) {
> >  		struct iovec *iov_prev = tcp4_l2_iov[tcp4_payload_used - 1];
> >  		const uint16_t *check = NULL;
> > @@ -2142,7 +2172,8 @@ static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn,
> >  			check = &iph->check;
> >  		}
> >
> > -		tcp4_seq_update[tcp4_payload_used].seq = seq_update;
> > +		tcp4_seq_update[tcp4_payload_used].conn = conn;
> > +		tcp4_seq_update[tcp4_payload_used].seq = seq;
> >  		tcp4_seq_update[tcp4_payload_used].len = dlen;
> >
> >  		iov = tcp4_l2_iov[tcp4_payload_used++];
> > @@ -2151,7 +2182,8 @@ static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn,
> >  		if (tcp4_payload_used > TCP_FRAMES_MEM - 1)
> >  			tcp_payload_flush(c);
> >  	} else if (CONN_V6(conn)) {
> > -		tcp6_seq_update[tcp6_payload_used].seq = seq_update;
> > +		tcp6_seq_update[tcp6_payload_used].conn = conn;
> > +		tcp6_seq_update[tcp6_payload_used].seq = seq;
> >  		tcp6_seq_update[tcp6_payload_used].len = dlen;
> >
> >  		iov = tcp6_l2_iov[tcp6_payload_used++];
>

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson