From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from gandalf.ozlabs.org (gandalf.ozlabs.org [150.107.74.76]) by passt.top (Postfix) with ESMTPS id B142E5A0262 for ; Fri, 24 Mar 2023 06:23:28 +0100 (CET) Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4PjVws12s2z4xFV; Fri, 24 Mar 2023 16:23:25 +1100 (AEDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=201602; t=1679635405; bh=iG3NLyZUWO+rqK1i3GOJHq958QDptGmXhrRsnwPGDOc=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=oVF8cevAS4qyqqp8fiR3hAeftt6yM5m+DwYX6eZT4LQ6yBr9ZpzpWNqRux29sIXzp Dk7QMk7+HI6KBxj6DECepZ4JvXrLIrnV4DGPp7CFQCHdUlgXvl9sxm9jjFUhJGvC10 NdwKusnXav8p1A5JcdYwD4BA3ycnNMUjPd/oCVGA= Date: Fri, 24 Mar 2023 16:18:11 +1100 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH] tcp: Clear and set ACK_FROM_TAP_DUE also on unchanged ACK sequence from peer Message-ID: References: <20230323160831.2206975-1-sbrivio@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="6ebBC355sPWsnu36" Content-Disposition: inline In-Reply-To: <20230323160831.2206975-1-sbrivio@redhat.com> Message-ID-Hash: F4VX2NIADDQEF3DYYG2CQJROBDCTBE62 X-Message-ID-Hash: F4VX2NIADDQEF3DYYG2CQJROBDCTBE62 X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --6ebBC355sPWsnu36 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Mar 23, 2023 at 05:08:31PM +0100, Stefano Brivio wrote: I'm pretty sure this will make things better than they were, so in that sense: Reviewed-by: David Gibson However, I'm not entirely convinced by some of the reasoning below. > Since commit cc6d8286d104 ("tcp: Reset ACK_FROM_TAP_DUE flag only as > needed, update timer"), we don't clear ACK_FROM_TAP_DUE whenever we > process an ACK segment, but, more correctly, only if we're really not > waiting for a further ACK segment, that is, only if the acknowledged > sequence matches what we sent. >=20 > In the new function implementing this, tcp_update_seqack_from_tap(), > we also reset the retransmission counter and store the updated ACK > sequence. Both should be done iff forward progress is acknowledged, > implied by the fact that the new ACK sequence is greater than the > one we previously stored. >=20 > At that point, it looked natural to also include the statements that > clear and set the ACK_FROM_TAP_DUE flag inside the same conditional > block: if we're not making forward progress, the need for an ACK, or > lack thereof, should remain unchanged. >=20 > There's a case where this isn't true, though: if a client initiates Maybe mention this is a tap-side client specifically. > a connection, and the server doesn't send data, with the client also > not sending any further data except for what's possibly sent along > with the first ACK segment following SYN, ACK from the server, we'll This doesn't seem right. In my case the client *is* immediately sending data. It's *not* sending any in the ACK from the handshake (is that even allowed?). AFAICT it's just the fact that the (socket side) server doesn't send data which is relevant. > never, in the established state of the connection, call > tcp_update_seqack_from_tap() with reported forward progress. >=20 > In that case, we'll never reset the initial ACK_FROM_TAP_DUE (used > to trigger handshake timeouts), interpret the trigger as a the need > for a retransmission (which won't actually retransmit anything), and > eventually time out a perfectly healthy connection once we reach the > maximum retransmission count. >=20 > This is relatively simple to reproduce if we switch back to 30s > iperf3 test runs, but it depends on the timing of the iperf3 client: > sometimes the first ACK from the client (part of the three-way > handshake) will come with data (and we'll hit the problem), sometimes > data will be sent later (and we call to tcp_update_seqack_from_tap() > from tcp_data_from_tap() later, avoiding the issue). This last bit seems wrong too. What I'm seeing is that tcp_update_seqack_from_tap() *is* called later from tcp_data_from_tap(), but it doesn't avoid the issue, because the =66rom-client ack number hasn't advanced, so it doesn't do anything. > A reliable reproducer is a simpler: >=20 > $ strace -e accept,shutdown socat TCP-LISTEN:1111 STDIO & > [2] 2202832 > $ pasta --config-net -- sh -c '(sleep 30; echo a) | socat STDIN TCP:88.= 198.0.161:1111' I assume 88.198.0.161 is the gateway address here? > accept(5, {sa_family=3DAF_INET, sin_port=3Dhtons(57200), sin_addr=3Dine= t_addr("127.0.0.1")}, [16]) =3D 6 > shutdown(6, SHUT_RDWR) =3D 0 > --- SIGTTOU {si_signo=3DSIGTTOU, si_code=3DSI_KERNEL} --- > --- stopped by SIGTTOU --- > 2023/03/23 16:05:06 socat[3] E write(5, 0x5645dbca9000, 2): Connection = reset by peer That reproduces a problem, but not exactly the one I'm seeing (see notes above). Mine can be reproduced with: $ strace -e accept,shutdown socat -u TCP-LISTEN:1111 OPEN:/dev/null,wronly & $ ./pasta --config-net -- socat -u OPEN:/dev/zero,rdonly TCP:192.168.17.1:1= 111 then waiting 30s. > where the socat client connects, and no data exchange happens for 30s > in either direction. Here, I'm using the default gateway address to > reach the socat server on the host. >=20 > Fix this by clearing and setting the ACK_FROM_TAP_DUE flag regardless > of reported forward progress. If we clear it when it's already unset, > conn_flag() will do nothing with it. If it was set, it's always fine > to reschedule the timer (as long as we're waiting for a further ACK), > because we just received an ACK segment, no matter its sequence. Hrm... is that actually true? Consider this situation: the server (socket side) sent some data that got lost, so the client (tap side) is not ever going to ack it. However, the client is continuing to send data, so we keep getting acks from it that don't make forward progress. Won't the change as proposed mean we then keep delaying the retransmit indefinitely? I've been thinking about this a bunch, and it's doing my head in a bit. However, I think the problem is that we have ACK_FROM_TAP_DUE set in the first place, when we don't actually have any outstanding sent data to ack. I think that's happening because we're not clearing it on the very first ACK from the client - the one in the handshake. That's because of the + 1 in: tcp_seq_init(c, conn, now); conn->seq_ack_from_tap =3D conn->seq_to_tap + 1; I think we have a subtle conflict of two reasonable seeming invariants here (written for the client on tap side case below, but there are variants for other cases, I think). A) We expect an ack iff seq_to_tap > seq_ack_from_tap According to this invariant, we want to remove the + 1 there. We advance seq_to_tap when we send the syn-ack, and ACK_FROM_TAP_DUE is set. seq_ack_from_tap catches up when we receive the handshake ack and ACK_FROM_TAP_DUE is cleared. B) (seq_to_tap - seq_ack_from_tap) is equal to the number of bytes in the socket buffer which have been sent to the client at least once This wants the + 1, because before the server sends data there's obviously nothing in the socket buffers, including during the handshake. But... I think the only place that relies on (B) is tcp_data_from_sock(), and I don't think it ever gets called with !ESTABLISHED. So.. I think we want to stick with invariant (A) and remove the "+ 1", for both the conn_from_sock and conn_from_tap variants. > Fixes: cc6d8286d104 ("tcp: Reset ACK_FROM_TAP_DUE flag only as needed, up= date timer") > Reported-by: David Gibson > Analysed-by: David Gibson > Signed-off-by: Stefano Brivio > --- > tcp.c | 10 +++++----- > 1 file changed, 5 insertions(+), 5 deletions(-) >=20 > diff --git a/tcp.c b/tcp.c > index f156287..b225bbe 100644 > --- a/tcp.c > +++ b/tcp.c > @@ -1610,12 +1610,12 @@ out: > static void tcp_update_seqack_from_tap(const struct ctx *c, > struct tcp_tap_conn *conn, uint32_t seq) > { > - if (SEQ_GT(seq, conn->seq_ack_from_tap)) { > - if (seq =3D=3D conn->seq_to_tap) > - conn_flag(c, conn, ~ACK_FROM_TAP_DUE); > - else > - conn_flag(c, conn, ACK_FROM_TAP_DUE); > + if (seq =3D=3D conn->seq_to_tap) > + conn_flag(c, conn, ~ACK_FROM_TAP_DUE); > + else > + conn_flag(c, conn, ACK_FROM_TAP_DUE); > =20 > + if (SEQ_GT(seq, conn->seq_ack_from_tap)) { > conn->retrans =3D 0; > conn->seq_ack_from_tap =3D seq; > } --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --6ebBC355sPWsnu36 Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmQdMo0ACgkQzQJF27ox 2GfsZRAAgL6sUqPOf87VJJ/t623xYJx1Hgo+WBvEbXceUuaSPFuH9+sy81oIlYvb BIfYDrrqDIPeRa1A2UOLpn5SGVqpjHdwWfgecQLolodKD+TQa7oi+AAcC9ysHDyl rP3kxAK+9lusT9XLz8cSUbNQEAhoyJghLDTEiTf7QNQSXqtrKNenvSNSsyLfJfFZ lm6ByTHMwzs2Lvss1+PpE4MRsDRpTFt69Y87j8YhykZ4dyPymyjNyuo7me+nP7mX kRvHlnYDpR1TIlVqWGl5v3q5c0ruC/No7S+Jbqjr7BdWi7MluBbCqb0tigpoE+yB ej9FAhXaqDyAxZcQXxg1sUAyMdBrO7/yCc2pl/MXpanUdxl/6/JoEkx1oKu62T/5 zfzeYbr9aGPtVju4NGRaHPP6+goduhBby9KoFPL3lSsz302NYPgiXRmO45NmtBT2 M9N5RiUVc+B17l21xRI2nntmBUeeLF+5sVfHFd/qPPJ5Cq4GDHrxWtXAPrBBqJT0 ollUC1R9jy6ELscjHG8aiDDEjugxpET1pdHiKWQKNW2JB+8wtdpVuMmtutr++gWp UVuCJgs5Zw98QvioCBayCPKVxgJp023tIO07JR5FEQpBZ2/t+dZIoTCRo+jR8iQq wmaUgUao2p4J96TvD01veS3F+Ok+hcJ09VExjb42aqksmlukHsU= =gDWb -----END PGP SIGNATURE----- --6ebBC355sPWsnu36--