From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from imap.gmail.com [173.194.76.109] by localhost with POP3 (fetchmail-6.3.26) for (single-drop); Mon, 20 May 2024 11:50:15 +0200 (CEST) Received: by 2002:a05:6a10:9148:b0:55f:c3c0:ed08 with SMTP id n8csp276502pxb; Mon, 20 May 2024 02:50:02 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCUrpyrmitkKNhju2GLlYHGdaw5QXDuxlj+p2gD8nVhgZVWJDmsBi8dOkNBSIftTzqINQOh/y4EoBJxwSl3g2R8vDPd8YkrFcNQ= X-Google-Smtp-Source: AGHT+IHz57jwLlT8PaC2O+q3JfDHrCeoxKdIRAaUYM5ZxtPt4xgtUQfiuMO3T+XY6kFv11kvwe/I X-Received: by 2002:a05:6214:319e:b0:6a3:5bb8:12dd with SMTP id 6a1803df08f44-6a35bb83836mr174824016d6.56.1716198602713; Mon, 20 May 2024 02:50:02 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1716198602; cv=none; d=google.com; s=arc-20160816; b=H7Kj7waMvojJIHFVeLVcdKqG7tyQQlWs0wdfUqwY7s/t9vBkl8orJs/1Tm2eVHrmWd T4lTzxwEBPNx3vvM2koD3YOZ2HEVrPEJwP1aDVaB/Xa1yIY508OxbBJIz5emh2AN+bou RvtPUUnGMKsM3AQdjYvmmagAwABIF5tYxgNEkw/2/W+PIdfhiliGDrMPDklP3ktKEJ0Y ikTQHSvvoVts3V4KwjHV6hx72dGxYV8vPNd9Cw0/WngaIA1SXFI9mT8G0rORHvHGo4Vd a0gQhn67Mtd3d19qZdhdVztgxS6cgsLcYDBEmx+mvawTaex+C4gIWJ61XMP0h3Xf+xXk l3Ww== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:dkim-signature:delivered-to; bh=bLe0d6/3O5AX0q851+Ceau0yJFyO6EIQZrukeEgug94=; fh=MBMSY9n9QGUzmoRBE6HD7OZVa/6vAxSpDuj2NqustW0=; b=CymHdGBohfVVFDDEqZKHKEYm+r+zYhawjWfHVsIilkl3tMjb5Us+WSq8nGpuQ8uvbN SaPwcTmlsuCGjDx4QCE3e4V+WheHv7mJLJMYNcFdOKJDSlxEdBe1iAsQ+Y3GZTG4DLsn erc/US5I7xw8JwWsMVGPICPRisajPxzFTZabodV8wSt4nI3j1BrNjNDhvib6XAmWR0l8 4P32zjPz0hmguLFFVBr3OH2QupHygk0C930VMVNQDpT0EFWQXidqE0po0SxLE/QDvYa5 UtDV+g4qNLlYOt73Nblk/6jDFL0bCI9b6VnGEsV9dOXNhOTCQ4J3D0X+kqTk9tcwXA5W KrYg==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=temperror (no key for signature) header.i=@gibson.dropbear.id.au header.s=202312 header.b=WGD19Giw; spf=pass (google.com: domain of dgibson@gandalf.ozlabs.org designates 150.107.74.76 as permitted sender) smtp.mailfrom=dgibson@gandalf.ozlabs.org Return-Path: Received: from us-smtp-inbound-delivery-1.mimecast.com (us-smtp-delivery-1.mimecast.com. [205.139.110.120]) by mx.google.com with ESMTPS id 6a1803df08f44-6a15f3060a8si255369826d6.581.2024.05.20.02.50.01 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 20 May 2024 02:50:02 -0700 (PDT) Received-SPF: pass (google.com: domain of dgibson@gandalf.ozlabs.org designates 150.107.74.76 as permitted sender) client-ip=150.107.74.76; Authentication-Results: mx.google.com; dkim=temperror (no key for signature) header.i=@gibson.dropbear.id.au header.s=202312 header.b=WGD19Giw; spf=pass (google.com: domain of dgibson@gandalf.ozlabs.org designates 150.107.74.76 as permitted sender) smtp.mailfrom=dgibson@gandalf.ozlabs.org Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-230-vGQrrqi9PRWBAF8VjobO3g-1; Mon, 20 May 2024 05:50:00 -0400 X-MC-Unique: vGQrrqi9PRWBAF8VjobO3g-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 10F10800074 for ; Mon, 20 May 2024 09:50:00 +0000 (UTC) Received: by smtp.corp.redhat.com (Postfix) id 0D686C15BB9; Mon, 20 May 2024 09:50:00 +0000 (UTC) Received: from mimecast-mx02.redhat.com (mimecast03.extmail.prod.ext.rdu2.redhat.com [10.11.55.19]) by smtp.corp.redhat.com (Postfix) with ESMTPS id C94E8C15BB1 for ; Mon, 20 May 2024 09:49:59 +0000 (UTC) Received: from us-smtp-inbound-delivery-1.mimecast.com (us-smtp-delivery-1.mimecast.com [205.139.110.120]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id AC9CE812296 for ; Mon, 20 May 2024 09:49:59 +0000 (UTC) Received: from mail.ozlabs.org (gandalf.ozlabs.org [150.107.74.76]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-122-0V3XV6w9NQShgc3v_CchIg-1; Mon, 20 May 2024 05:49:55 -0400 X-MC-Unique: 0V3XV6w9NQShgc3v_CchIg-1 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202312; t=1716198590; bh=bLe0d6/3O5AX0q851+Ceau0yJFyO6EIQZrukeEgug94=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=WGD19GiwDCjW4q3Avt+9/q/5C5NsOhUZsu3ySWRFOXbQ6C77RrKlWasWmYHKZHdU+ AD6MokaQQWBThyHbGDZe6RvfdkCmtA4H43jpVe3mSUIdOJ8zouTIoIO3kinLWPrmJU jIiUksON2+/pkAJEWPb3zjBdaB/2mhmeF5C3hO6dYDb/xoU6j/obqGVpy7BaGRJZV7 li2o8i9KjZPgU9ZJJQJdk5eVCihCAGnP83H2XefR5a1cCTa3r1iGW5c1/9ggDrUCHU qKB3wiVBlQnW/zsxKL6dU3UEaCGmJUr3xfVoSUgQJKju16BDKTUK2WfRjD2aVhcoua zCyLPK9+edHBw== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4VjXq20KV8z4x0K; Mon, 20 May 2024 19:49:50 +1000 (AEST) Date: Mon, 20 May 2024 17:46:40 +1000 From: David Gibson To: Jon Maloy Cc: passt-dev@passt.top, sbrivio@redhat.com, lvivier@redhat.com, dgibson@redhat.com Subject: Re: [PATCH v6 1/3] tcp: move seq_to_tap update to when frame is queued Message-ID: References: <20240517152414.1188282-1-jmaloy@redhat.com> <20240517152414.1188282-2-jmaloy@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="Pn09t6c8NcxqTuXo" Content-Disposition: inline In-Reply-To: <20240517152414.1188282-2-jmaloy@redhat.com> X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.8 List-Id: --Pn09t6c8NcxqTuXo Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, May 17, 2024 at 11:24:12AM -0400, Jon Maloy wrote: > commit a469fc393fa1 ("tcp, tap: Don't increase tap-side sequence counter = for dropped frames") > delayed update of conn->seq_to_tap until the moment the corresponding > frame has been successfully pushed out. This has the advantage that we > immediately can make a new attempt to transmit a frame after a failed > trasnmit, rather than waiting for the peer to later discover a gap and > trigger the fast retransmit mechanism to solve the problem. >=20 > This approach has turned out to cause a problem with spurious sequence > number updates during peer-initiated retransmits, and we have realized > it may not be the best way to solve the above issue. >=20 > We now restore the previous method, by updating the said field at the > moment a frame is added to the outqueue. To retain the advantage of > having a quick re-attempt based on local failure detection, we now scan > through the part of the outqueue that had do be dropped, and restore the > sequence counter for each affected connection to the most appropriate > value. >=20 > Signed-off-by: Jon Maloy >=20 > --- > v2: - Re-spun loop in tcp_revert_seq() and some other changes based on > feedback from Stefano Brivio. > - Added paranoid test to avoid that seq_to_tap becomes lower than > seq_ack_from_tap. >=20 > v3: - Identical to v2. Called v3 because it was embedded in a series > with that version. >=20 > v4: - In tcp_revert_seq(), we read the sequence number from the TCP > header instead of keeping a copy in struct tcp_buf_seq_update. > - Since the only remaining field in struct tcp_buf_seq_update is > a pointer to struct tcp_tap_conn, we eliminate the struct > altogether, and make the tcp6/tcp3_buf_seq_update arrays into > arrays of said pointer. > - Removed 'paranoid' test in tcp_revert_seq. If it happens, it > is not fatal, and will be caught by other code anyway. > - Separated from the series again. >=20 > v5: - A couple of style issues. > --- > tcp.c | 61 ++++++++++++++++++++++++++++++++++++++--------------------- > 1 file changed, 39 insertions(+), 22 deletions(-) >=20 > diff --git a/tcp.c b/tcp.c > index 21d0af0..3a2350a 100644 > --- a/tcp.c > +++ b/tcp.c > @@ -410,16 +410,6 @@ static int tcp_sock_ns [NUM_PORTS][IP_VERSIONS]; > */ > static union inany_addr low_rtt_dst[LOW_RTT_TABLE_SIZE]; > =20 > -/** > - * tcp_buf_seq_update - Sequences to update with length of frames once s= ent > - * @seq: Pointer to sequence number sent to tap-side, to be updated > - * @len: TCP payload length > - */ > -struct tcp_buf_seq_update { > - uint32_t *seq; > - uint16_t len; > -}; > - > /* Static buffers */ > /** > * struct tcp_payload_t - TCP header and data to send segments with payl= oad > @@ -461,7 +451,8 @@ static struct tcp_payload_t tcp4_payload[TCP_FRAMES_M= EM]; > =20 > static_assert(MSS4 <=3D sizeof(tcp4_payload[0].data), "MSS4 is greater t= han 65516"); > =20 > -static struct tcp_buf_seq_update tcp4_seq_update[TCP_FRAMES_MEM]; > +/* References tracking the owner connection of frames in the tap outqueu= e */ > +static struct tcp_tap_conn *tcp4_frame_conns[TCP_FRAMES_MEM]; > static unsigned int tcp4_payload_used; > =20 > static struct tap_hdr tcp4_flags_tap_hdr[TCP_FRAMES_MEM]; > @@ -483,7 +474,8 @@ static struct tcp_payload_t tcp6_payload[TCP_FRAMES_M= EM]; > =20 > static_assert(MSS6 <=3D sizeof(tcp6_payload[0].data), "MSS6 is greater t= han 65516"); > =20 > -static struct tcp_buf_seq_update tcp6_seq_update[TCP_FRAMES_MEM]; > +/* References tracking the owner connection of frames in the tap outqueu= e */ > +static struct tcp_tap_conn *tcp6_frame_conns[TCP_FRAMES_MEM]; > static unsigned int tcp6_payload_used; > =20 > static struct tap_hdr tcp6_flags_tap_hdr[TCP_FRAMES_MEM]; > @@ -1261,25 +1253,51 @@ static void tcp_flags_flush(const struct ctx *c) > tcp4_flags_used =3D 0; > } > =20 > +/** > + * tcp_revert_seq() - Revert affected conn->seq_to_tap after failed tran= smission > + * @conns: Array of connection pointers corresponding to queued fr= ames > + * @frames: Two-dimensional array containing queued frames with sub= -iovs > + * @num_frames: Number of entries in the two arrays to be compared > + */ > +static void tcp_revert_seq(struct tcp_tap_conn **conns, struct iovec (*f= rames)[TCP_NUM_IOVS], > + int num_frames) > +{ > + int i; > + > + for (i =3D 0; i < num_frames; i++) { > + struct tcp_tap_conn *conn =3D conns[i]; > + struct tcphdr *th =3D frames[i][TCP_IOV_PAYLOAD].iov_base; > + uint32_t seq =3D ntohl(th->seq); > + > + if (SEQ_LE(conn->seq_to_tap, seq)) > + continue; > + > + conn->seq_to_tap =3D seq; Not worth a respin, but given the other simplifications to this, it would be clearer to have: if (!SEQ_LE(conn->seq_to_tap, seq)) conn->seq_to_tao =3D seq; Rather than using continue; > + } > +} > + > /** > * tcp_payload_flush() - Send out buffers for segments with data > * @c: Execution context > */ > static void tcp_payload_flush(const struct ctx *c) > { > - unsigned i; > size_t m; > =20 > m =3D tap_send_frames(c, &tcp6_l2_iov[0][0], TCP_NUM_IOVS, > tcp6_payload_used); > - for (i =3D 0; i < m; i++) > - *tcp6_seq_update[i].seq +=3D tcp6_seq_update[i].len; > + if (m !=3D tcp6_payload_used) { > + tcp_revert_seq(tcp6_frame_conns, &tcp6_l2_iov[m], > + tcp6_payload_used - m); Hrm.. AFAICT tcp_revert_seq() is using the same indices into conns[] and frames[]. But here, aren't you passing the frames array from entry m onwards, but the conns array from 0 onwards? Meaning that revert_seq() might use the wrong connections for each frame. I think you either need tcp_revert_seq(&tcp6_frame_conns[m], &tcp6_l2_iov[m], ...) Or else pass the unindexed arrays here, and take the start index as a new parameter to tcp_revert_seq(). > + } > tcp6_payload_used =3D 0; > =20 > m =3D tap_send_frames(c, &tcp4_l2_iov[0][0], TCP_NUM_IOVS, > tcp4_payload_used); > - for (i =3D 0; i < m; i++) > - *tcp4_seq_update[i].seq +=3D tcp4_seq_update[i].len; > + if (m !=3D tcp4_payload_used) { > + tcp_revert_seq(tcp4_frame_conns, &tcp4_l2_iov[m], > + tcp4_payload_used - m); Same thing here, of course. > + } > tcp4_payload_used =3D 0; > } > =20 > @@ -2129,10 +2147,11 @@ static int tcp_sock_consume(const struct tcp_tap_= conn *conn, uint32_t ack_seq) > static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *co= nn, > ssize_t dlen, int no_csum, uint32_t seq) > { > - uint32_t *seq_update =3D &conn->seq_to_tap; > struct iovec *iov; > size_t l4len; > =20 > + conn->seq_to_tap =3D seq + dlen; Now that we update seq_to_tap here, we don't really need seq as a parameter any more, which would also simplify logic in the caller slightly. > if (CONN_V4(conn)) { > struct iovec *iov_prev =3D tcp4_l2_iov[tcp4_payload_used - 1]; > const uint16_t *check =3D NULL; > @@ -2142,8 +2161,7 @@ static void tcp_data_to_tap(const struct ctx *c, st= ruct tcp_tap_conn *conn, > check =3D &iph->check; > } > =20 > - tcp4_seq_update[tcp4_payload_used].seq =3D seq_update; > - tcp4_seq_update[tcp4_payload_used].len =3D dlen; > + tcp4_frame_conns[tcp4_payload_used] =3D conn; > =20 > iov =3D tcp4_l2_iov[tcp4_payload_used++]; > l4len =3D tcp_l2_buf_fill_headers(c, conn, iov, dlen, check, seq); > @@ -2151,8 +2169,7 @@ static void tcp_data_to_tap(const struct ctx *c, st= ruct tcp_tap_conn *conn, > if (tcp4_payload_used > TCP_FRAMES_MEM - 1) > tcp_payload_flush(c); > } else if (CONN_V6(conn)) { > - tcp6_seq_update[tcp6_payload_used].seq =3D seq_update; > - tcp6_seq_update[tcp6_payload_used].len =3D dlen; > + tcp6_frame_conns[tcp6_payload_used] =3D conn; > =20 > iov =3D tcp6_l2_iov[tcp6_payload_used++]; > l4len =3D tcp_l2_buf_fill_headers(c, conn, iov, dlen, NULL, seq); --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --Pn09t6c8NcxqTuXo Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmZK/9sACgkQzQJF27ox 2GclnQ//TmbY6U0XwJB9ua/4qptWaUQ8AtN02t76uSSSN244zmjSLdziz8EAf8YQ DG2Wyilaj7hpUlBq/oNxNQmjWpvABc/Hlc+z59T7kuJRv5uWMTAY1uhJu5+Sz1/Q 0NlbHy+03O1z1kuOWd+7W5xcENEE1E7j74BHakeObAwXU2F99R1eaQsSfZ3JwvYe CFTzxISdeJXTG74IzXOMiy8pKIjfpi40jbiCxfGc4z0UCBeKi2aHbxtPtH7IKIjG N6NqfPOdJCwu45cP2V3wN1bkghwEv06Wv2NlULQeAvSrZ/J9pkdb5rLT5sELFgOT TyUcaJ4DtwcDqhKPR26t0JWQYI3wZqY9aQlCv51mimJqxQBgFaRHaqsF0W5Qcbw3 U2K3VFjvgblqEQzT9sCPuhIGFidbr3+0u5xrUTy8rWq9f3sWlzkvssdISOiKIUZS OZfFvdY4Zcpo7uEd3awneExbjrBN/YilozE18KxCds9QklzC5Wp7ds7j7Z3aGAGs sB64DvCFy1dZsQldlv3jyoCstNAiqfQutkPW18XkFTgvaWeo0Lf8WUYzUX26kQce hJ3QEgIegOy5baRYvIIANWfuw00OEeO7MmIUSC13AJS2xzA2VrRGzEXSioX4Abl8 pIyt/k19aUvuqvNzJ5+P+kEWoIrEUy6C5ap4BMtduqQUqyF8+3k= =DQ8s -----END PGP SIGNATURE----- --Pn09t6c8NcxqTuXo--