From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202502 header.b=daSF9IN4; dkim-atps=neutral Received: from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by passt.top (Postfix) with ESMTPS id E188D5A0272 for ; Thu, 27 Feb 2025 07:13:15 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202502; t=1740636792; bh=gCG7b4AKy9mGUKomd4yBHtT7pjz3h7uFqu0pG92WpAY=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=daSF9IN4LICzah+H4qDGDWR1Oig1itcJPZWZA3h+EvvSwU60ftzFF5S3W7Z7HCXo8 gon9H/YCI0TriyyJI+35yFmpC5rNypZ979r0ioif2FxAqO4MepQid3lwhhll+kBJXs KC/jM6ViudFOxp/zdQKz//1mcO9O239ol9mb1m/4sXD0MbnHpodJ922pTnC/b/XplG Qk8aV2Tj6Np83CY+wRiWHphjEmE+Fa+MWM+A+gxCFNrcRy9RhE1IuAZa+RO12lFEve euy2/VhZXFqKe/HQAUwzz/FxDXVn6zMGTP+bF8fZQjItUEpOYWwzUmogA732Z4Wug/ Z0c06pVVgqUCQ== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4Z3LcS6TDtz4wyh; Thu, 27 Feb 2025 17:13:12 +1100 (AEDT) Date: Thu, 27 Feb 2025 17:02:15 +1100 From: David Gibson To: Stefano Brivio Subject: Re: tcp_rst() complications Message-ID: References: <20250226111507.166ed938@elisabeth> <20250227052644.7b5412d3@elisabeth> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="VLRxO7dN05/euD8u" Content-Disposition: inline In-Reply-To: <20250227052644.7b5412d3@elisabeth> Message-ID-Hash: GQ3KFDB3CVFCY7QCHA7AA7WZ2DTC4NP4 X-Message-ID-Hash: GQ3KFDB3CVFCY7QCHA7AA7WZ2DTC4NP4 X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --VLRxO7dN05/euD8u Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Feb 27, 2025 at 05:26:44AM +0100, Stefano Brivio wrote: > On Thu, 27 Feb 2025 15:10:32 +1100 > David Gibson wrote: >=20 > > On Thu, Feb 27, 2025 at 12:57:15PM +1100, David Gibson wrote: > > > On Wed, Feb 26, 2025 at 11:15:07AM +0100, Stefano Brivio wrote: =20 > > > > On Wed, 26 Feb 2025 20:05:25 +1100 > > > > David Gibson wrote: > > > > =20 > > > > > Amongst other things, I spotted some additional complications usi= ng > > > > > tcp_rst() in the migration path (some of which might also have > > > > > implications in other contexts). These might be things we can sa= fely > > > > > ignore, at least for now, but I haven't thought through them enou= gh to > > > > > be sure. > > > > >=20 > > > > > 1) Sending RST to guest during migration > > > > >=20 > > > > > The first issue is that tcp_rst() will send an actual RST to the = guest > > > > > on the tap interface. During migration, that means we're sending= to > > > > > the guest while it's suspended. At the very least that means we > > > > > probably have a much higher that usual chance of getting a queue = full > > > > > failure writing to the tap interface, which could hit problem (2). > > > > >=20 > > > > > But, beyond that, with vhost-user that means we're writing to gue= st > > > > > memory while the guest is suspended. Kind of the whole point of = the > > > > > suspended time is that the guest memory doesn't change during it,= so > > > > > I'm not sure what the consequences will be. =20 > > > >=20 > > > > If I recall correctly I checked this and something in the vhost-user > > > > code will tell us that the queue is not ready yet, done. =20 > > >=20 > > > Ah, yes. I'd seen those "Got packet, but RX virtqueue is not usable > > > yet" messages, but didn't make the connection. =20 > >=20 > > Significantly, of course, that case also still returns 0 from > > tcp_send_flag() so we carry on with changing the state to CLOSED. >=20 > Yes, I think we should eventually re-evaluate that 'return 0;' at some > point. On the other hand we're (generally) dealing with an unexpected > situation. Right. > > > I had been concerned that might only happen when we tried to kick the > > > queue, after we'd touched guest memory, but that doesn't appear to be > > > the case. Ok, now I'm reasonably convinced this is ifne. > > > =20 > > > > Ideally we want to make sure we queue those, but queue sizes are fi= nite > > > > and I don't think we can guarantee we can pile up 128k RST segments= =2E =20 > > >=20 > > > Well that, plus with vhost-user we don't really have anywhere we can > > > queue them other than in guest memory. > > > =20 > > > > Right now I would check that the functionality is not spectacularly > > > > broken (I looked into that briefly, QEMU didn't crash, guest kept > > > > running, but I didn't check further than that). If we miss a RST too > > > > bad, things will time out eventually. > > > >=20 > > > > As a second step we could perhaps introduce a post-migration stage = and > > > > move calling tcp_rst() to there if the connection is in a given sta= te? > > > > =20 > > > > > Now, at the moment I > > > > > think all our tcp_rst() calls are either on the source during rol= lback > > > > > (i.e. we're committed to resuming only on the source) or on the t= arget > > > > > past the point of no return (i.e. we're committed to resuming onl= y on > > > > > the target). I suspect that means we can get away with it, but I= do > > > > > worry this could break something in qeme by violating that assump= tion. > > > > >=20 > > > > > 2) tcp_rst() failures > > > > >=20 > > > > > tcp_rst() can fail if tcp_send_flag() fails. In this case we *do= n't* > > > > > change the events to CLOSED. I _think_ that's a bug: even if we > > > > > weren't able to send the RST to the guest, we've already closed t= he > > > > > socket so the flow is dead. Moving to CLOSED state (and then rem= oving > > > > > the flow entirely) should mean that we'll resend an RST if the gu= est > > > > > attempts to use the flow again later. > > > > >=20 > > > > > But.. I was worried there might be some subtle reason for not cha= nging > > > > > the event state in that case. =20 > > > >=20 > > > > Not very subtle: my original idea was that if we fail to send the R= ST, > > > > we should note that (by not moving to CLOSED) and try again from a > > > > timer. =20 > > >=20 > > > Ok. Makes sense, but I don't think it's necessary. We can treat a > > > failure here as if the RST was lost on the wire: we just forget the > > > connection, and if the guest doesn't drop it we'll send it another RST > > > if it tries to send a packet to the connection we no longer know > > > about. =20 > >=20 > > I was wrong about this, and we hit this case in practice for a > > migration where the target has a different address (which I've now > > managed to test properly). > >=20 > > With my draft fix, after the bind() failure we correctly We tcp_rst() > > the imported flow, but because the queue is disabled the RST isn't > > actually delivered to the guest. So, the (target) guest will now send > > mid-stream packets which no longer match any flow. Turns out > > flow-less packets which aren't SYN we simply ignore. > >=20 > > That seems wrong, I think we should RST and/or send an ICMP before. > > I'm slightly worried it might be one of those cases where that was > > how it was originally supposed to work, but the realities of a hostile > > internet mean it turns out to be unwise. Then again, this is guest > > facing, so it won't usually be exposed to the whole internet. >=20 > Correct, we should RST on !RST: >=20 > https://datatracker.ietf.org/doc/html/rfc9293#section-3.10.7.1 >=20 > the consequences of not doing so shouldn't be relevant because > data-less segments can't be expected to be delivered reliably, plus > "stray" RSTs are very commonly dropped by firewalls. >=20 > Still yes, we should fix it, it was pretty much an oversight on my side. Ok, I might try to tackle that tomorrow. > > > > In practice I've never observed a tcp_send_flag() failure so I'm not > > > > sure if that mechanism even works. =20 > > >=20 > > > I don't think the mechanism even exists. I don't think there's any > > > timer that does this, and more to the point, if we don't change events > > > we basically don't have any record that we already reset the connecti= on. > > > =20 > > > > Moving the socket to CLOSED sounds > > > > totally okay to me, surely simpler, and probably more robust. =20 > > >=20 > > > Ok. I'll go ahead with that. =20 > >=20 > > I still think this change is correct, and I'm going ahead with it, but > > I think we probably want another change on that to RST (or something) > > on flowless !SYN packets. >=20 > Yes, it's RST with sequence number 0 if !ACK, matching sequence number > (needing a separate version of tcp_rst() perhaps?) on ACK. We'll certainly need a separate version of tcp_rst(). tcp_rst() takes the connection as a parameter. Here we're dealing with cases where there is no existing connection, and that's kind of the point. --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --VLRxO7dN05/euD8u Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAme//+YACgkQzQJF27ox 2GfMPA/9EgkhqyaRdrQsIzpTL2/SwoLMEeUAt9IwUzM82zdMPZo8yT0wwbbPWXCj uLc/8QexZ6tNhWLRcl/Q2kO5T1ATPBXZ0A1nqdsS/M8our7c4twX7SNUaZQVVT0z ZxVa72exK8G1ghchG+qbMUxYHmzEd6sKWqyHLt5jjz65mYXruWOPra0B0NcsK2RL g1lbNX8CA4GbkfwtAx9VD35XawJ55S3/hmc4ZfJdFzctQj8EyedjUoBQ8DezJHb0 ww6tnmPVYRVqz4V4lLBbmG6co3pToDMJxJ4oFRHwQz4rVcNOqJAI+rRjR3DBjRzh Kq3MEDevJ2Ja1oNQ1GW2gI78/i+ZW5aySR1GBxAiZvN8vTwKifUCvF1n0IOMRwVh nEmd8vZpU6MstNV+vdY8iRJquomUqcYe34HRhviY9odsDogYQoyF4BKKo4kC1gZL dN3pMAyqo4be7sTXEsP3BfklHsNzYU6vBJKMwvXIq0xA2tlgRbxurR47+FZQF6jA B3RaOU3f0rqwgjGn/YKY+2KDsi8fywF8wcCh2B1eBViOHbJ595UUtsrjKNr7HCl3 vhGRWbgcaRn6BlmejwTOdAxyYv8GpJPuRZXNjQdOQjoL1DLv9bOkBaZ9E9uA4TSq 2HwhHN+SCyV4pSKlKfRZE6M7Wq/Jye5LU2quyzfN8nnj012j9Xo= =XlUT -----END PGP SIGNATURE----- --VLRxO7dN05/euD8u--