From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: passt-dev@passt.top
Subject: Re: tcp_rst() complications
Date: Thu, 27 Feb 2025 15:10:32 +1100 [thread overview]
Message-ID: <Z7_luJnVSHuI2KYL@zatzit> (raw)
In-Reply-To: <Z7_Ge8lEnWYJC2aV@zatzit>
[-- Attachment #1: Type: text/plain, Size: 5748 bytes --]
On Thu, Feb 27, 2025 at 12:57:15PM +1100, David Gibson wrote:
> On Wed, Feb 26, 2025 at 11:15:07AM +0100, Stefano Brivio wrote:
> > On Wed, 26 Feb 2025 20:05:25 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >
> > > Amongst other things, I spotted some additional complications using
> > > tcp_rst() in the migration path (some of which might also have
> > > implications in other contexts). These might be things we can safely
> > > ignore, at least for now, but I haven't thought through them enough to
> > > be sure.
> > >
> > > 1) Sending RST to guest during migration
> > >
> > > The first issue is that tcp_rst() will send an actual RST to the guest
> > > on the tap interface. During migration, that means we're sending to
> > > the guest while it's suspended. At the very least that means we
> > > probably have a much higher that usual chance of getting a queue full
> > > failure writing to the tap interface, which could hit problem (2).
> > >
> > > But, beyond that, with vhost-user that means we're writing to guest
> > > memory while the guest is suspended. Kind of the whole point of the
> > > suspended time is that the guest memory doesn't change during it, so
> > > I'm not sure what the consequences will be.
> >
> > If I recall correctly I checked this and something in the vhost-user
> > code will tell us that the queue is not ready yet, done.
>
> Ah, yes. I'd seen those "Got packet, but RX virtqueue is not usable
> yet" messages, but didn't make the connection.
Significantly, of course, that case also still returns 0 from
tcp_send_flag() so we carry on with changing the state to CLOSED.
> I had been concerned that might only happen when we tried to kick the
> queue, after we'd touched guest memory, but that doesn't appear to be
> the case. Ok, now I'm reasonably convinced this is ifne.
>
> > Ideally we want to make sure we queue those, but queue sizes are finite
> > and I don't think we can guarantee we can pile up 128k RST segments.
>
> Well that, plus with vhost-user we don't really have anywhere we can
> queue them other than in guest memory.
>
> > Right now I would check that the functionality is not spectacularly
> > broken (I looked into that briefly, QEMU didn't crash, guest kept
> > running, but I didn't check further than that). If we miss a RST too
> > bad, things will time out eventually.
> >
> > As a second step we could perhaps introduce a post-migration stage and
> > move calling tcp_rst() to there if the connection is in a given state?
> >
> > > Now, at the moment I
> > > think all our tcp_rst() calls are either on the source during rollback
> > > (i.e. we're committed to resuming only on the source) or on the target
> > > past the point of no return (i.e. we're committed to resuming only on
> > > the target). I suspect that means we can get away with it, but I do
> > > worry this could break something in qeme by violating that assumption.
> > >
> > > 2) tcp_rst() failures
> > >
> > > tcp_rst() can fail if tcp_send_flag() fails. In this case we *don't*
> > > change the events to CLOSED. I _think_ that's a bug: even if we
> > > weren't able to send the RST to the guest, we've already closed the
> > > socket so the flow is dead. Moving to CLOSED state (and then removing
> > > the flow entirely) should mean that we'll resend an RST if the guest
> > > attempts to use the flow again later.
> > >
> > > But.. I was worried there might be some subtle reason for not changing
> > > the event state in that case.
> >
> > Not very subtle: my original idea was that if we fail to send the RST,
> > we should note that (by not moving to CLOSED) and try again from a
> > timer.
>
> Ok. Makes sense, but I don't think it's necessary. We can treat a
> failure here as if the RST was lost on the wire: we just forget the
> connection, and if the guest doesn't drop it we'll send it another RST
> if it tries to send a packet to the connection we no longer know
> about.
I was wrong about this, and we hit this case in practice for a
migration where the target has a different address (which I've now
managed to test properly).
With my draft fix, after the bind() failure we correctly We tcp_rst()
the imported flow, but because the queue is disabled the RST isn't
actually delivered to the guest. So, the (target) guest will now send
mid-stream packets which no longer match any flow. Turns out
flow-less packets which aren't SYN we simply ignore.
That seems wrong, I think we should RST and/or send an ICMP before.
I'm slightly worried it might be one of those cases where that was
how it was originally supposed to work, but the realities of a hostile
internet mean it turns out to be unwise. Then again, this is guest
facing, so it won't usually be exposed to the whole internet.
> > In practice I've never observed a tcp_send_flag() failure so I'm not
> > sure if that mechanism even works.
>
> I don't think the mechanism even exists. I don't think there's any
> timer that does this, and more to the point, if we don't change events
> we basically don't have any record that we already reset the connection.
>
> > Moving the socket to CLOSED sounds
> > totally okay to me, surely simpler, and probably more robust.
>
> Ok. I'll go ahead with that.
I still think this change is correct, and I'm going ahead with it, but
I think we probably want another change on that to RST (or something)
on flowless !SYN packets.
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2025-02-27 4:10 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-26 9:05 tcp_rst() complications David Gibson
2025-02-26 10:15 ` Stefano Brivio
2025-02-27 1:57 ` David Gibson
2025-02-27 4:10 ` David Gibson [this message]
2025-02-27 4:26 ` Stefano Brivio
2025-02-27 6:02 ` David Gibson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z7_luJnVSHuI2KYL@zatzit \
--to=david@gibson.dropbear.id.au \
--cc=passt-dev@passt.top \
--cc=sbrivio@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).