From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: passt-dev@passt.top
Subject: Re: tcp_rst() complications
Date: Thu, 27 Feb 2025 05:26:44 +0100
Message-ID: <20250227052644.7b5412d3@elisabeth>
In-Reply-To: <Z7_luJnVSHuI2KYL@zatzit>

On Thu, 27 Feb 2025 15:10:32 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Thu, Feb 27, 2025 at 12:57:15PM +1100, David Gibson wrote:
> > On Wed, Feb 26, 2025 at 11:15:07AM +0100, Stefano Brivio wrote:  
> > > On Wed, 26 Feb 2025 20:05:25 +1100
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > >   
> > > > Amongst other things, I spotted some additional complications using
> > > > tcp_rst() in the migration path (some of which might also have
> > > > implications in other contexts).  These might be things we can safely
> > > > ignore, at least for now, but I haven't thought through them enough to
> > > > be sure.
> > > > 
> > > > 1) Sending RST to guest during migration
> > > > 
> > > > The first issue is that tcp_rst() will send an actual RST to the guest
> > > > on the tap interface.  During migration, that means we're sending to
> > > > the guest while it's suspended.  At the very least that means we
> > > > probably have a much higher than usual chance of getting a queue full
> > > > failure writing to the tap interface, which could hit problem (2).
> > > > 
> > > > But, beyond that, with vhost-user that means we're writing to guest
> > > > memory while the guest is suspended.  Kind of the whole point of the
> > > > suspended time is that the guest memory doesn't change during it, so
> > > > I'm not sure what the consequences will be.  
> > > 
> > > If I recall correctly I checked this and something in the vhost-user
> > > code will tell us that the queue is not ready yet, done.  
> > 
> > Ah, yes.  I'd seen those "Got packet, but RX virtqueue is not usable
> > yet" messages, but didn't make the connection.  
> 
> Significantly, of course, that case also still returns 0 from
> tcp_send_flag() so we carry on with changing the state to CLOSED.

Yes, I think we should re-evaluate that 'return 0;' at some point. On
the other hand, we're (generally) dealing with an unexpected
situation.
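
For reference, this is roughly the shape of the path we're talking
about (a sketch from memory, not verbatim tcp.c; names and signatures
approximated):

  /* Sketch only, approximating the tcp_rst() path: tcp_send_flag()
   * also returns 0 in the vhost-user "RX virtqueue is not usable
   * yet" case, so during migration we fall through and mark the
   * connection CLOSED even though no RST reached the guest.
   */
  static void tcp_rst_do(struct ctx *c, struct tcp_tap_conn *conn)
  {
  	if (conn->events == CLOSED)
  		return;

  	if (!tcp_send_flag(c, conn, RST))	/* 0 counts as success */
  		conn_event(c, conn, CLOSED);

  	/* on a real failure (< 0) we currently keep the connection
  	 * around, nominally to retry later -- see below
  	 */
  }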

> > I had been concerned that might only happen when we tried to kick the
> > queue, after we'd touched guest memory, but that doesn't appear to be
> > the case.  Ok, now I'm reasonably convinced this is fine.
> >   
> > > Ideally we want to make sure we queue those, but queue sizes are finite
> > > and I don't think we can guarantee we can pile up 128k RST segments.  
> > 
> > Well that, plus with vhost-user we don't really have anywhere we can
> > queue them other than in guest memory.
> >   
> > > Right now I would check that the functionality is not spectacularly
> > > broken (I looked into that briefly, QEMU didn't crash, guest kept
> > > running, but I didn't check further than that). If we miss an RST,
> > > too bad: things will time out eventually.
> > > 
> > > As a second step we could perhaps introduce a post-migration stage and
> > > move calling tcp_rst() to there if the connection is in a given state?
> > >   
> > > > Now, at the moment I
> > > > think all our tcp_rst() calls are either on the source during rollback
> > > > (i.e. we're committed to resuming only on the source) or on the target
> > > > past the point of no return (i.e. we're committed to resuming only on
> > > > the target).  I suspect that means we can get away with it, but I do
> > > > worry this could break something in QEMU by violating that assumption.
> > > > 
> > > > 2) tcp_rst() failures
> > > > 
> > > > tcp_rst() can fail if tcp_send_flag() fails.  In this case we *don't*
> > > > change the events to CLOSED.  I _think_ that's a bug: even if we
> > > > weren't able to send the RST to the guest, we've already closed the
> > > > socket so the flow is dead.  Moving to CLOSED state (and then removing
> > > > the flow entirely) should mean that we'll resend an RST if the guest
> > > > attempts to use the flow again later.
> > > > 
> > > > But.. I was worried there might be some subtle reason for not changing
> > > > the event state in that case.  
> > > 
> > > Not very subtle: my original idea was that if we fail to send the RST,
> > > we should note that (by not moving to CLOSED) and try again from a
> > > timer.  
> > 
> > Ok.  Makes sense, but I don't think it's necessary. We can treat a
> > failure here as if the RST was lost on the wire: we just forget the
> > connection, and if the guest doesn't drop it we'll send it another RST
> > if it tries to send a packet to the connection we no longer know
> > about.  
> 
> I was wrong about this, and we hit this case in practice for a
> migration where the target has a different address (which I've now
> managed to test properly).
> 
> With my draft fix, after the bind() failure we correctly tcp_rst()
> the imported flow, but because the queue is disabled the RST isn't
> actually delivered to the guest.  So, the (target) guest will now
> send mid-stream packets which no longer match any flow.  It turns
> out we simply ignore flow-less packets which aren't SYN.
> 
> That seems wrong: I think we should send an RST and/or an ICMP error instead.
> I'm slightly worried it might be one of those cases where that was
> how it was originally supposed to work, but the realities of a hostile
> internet mean it turns out to be unwise.  Then again, this is guest
> facing, so it won't usually be exposed to the whole internet.

Correct, we should RST on !RST:

  https://datatracker.ietf.org/doc/html/rfc9293#section-3.10.7.1

the consequences of not doing so shouldn't matter much, because
data-less segments can't be expected to be delivered reliably anyway,
and "stray" RSTs are very commonly dropped by firewalls.

Still, yes, we should fix it; it was pretty much an oversight on my side.

> > > In practice I've never observed a tcp_send_flag() failure so I'm not
> > > sure if that mechanism even works.  
> > 
> > I don't think the mechanism even exists.  I don't think there's any
> > timer that does this, and more to the point, if we don't change events
> > we basically don't have any record that we already reset the connection.
> >   
> > > Moving the socket to CLOSED sounds
> > > totally okay to me, surely simpler, and probably more robust.  
> > 
> > Ok.  I'll go ahead with that.  
> 
> I still think this change is correct, and I'm going ahead with it, but
> I think we probably want another change on that to RST (or something)
> on flowless !SYN packets.

Yes, it's an RST with sequence number 0 if !ACK, or with a sequence
number matching the incoming ACK (perhaps needing a separate version
of tcp_rst()?) if ACK is set.
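
Something like the following, that is (a sketch only, after RFC 9293,
section 3.10.7.1: tcp_rst_no_conn() is a hypothetical name, and
filling in and sending the actual segment on the tap path is left
out):

  #include <stdint.h>
  #include <arpa/inet.h>
  #include <netinet/tcp.h>

  /* Reset generation for a segment matching no flow.  A reset is
   * never sent in response to a reset.  @seg_len is the payload
   * length of the incoming segment.
   */
  static void tcp_rst_no_conn(const struct tcphdr *th, uint32_t seg_len)
  {
  	uint32_t seq, ack;
  	int ack_flag;

  	if (th->rst)
  		return;

  	if (th->ack) {
  		/* <SEQ=SEG.ACK><CTL=RST> */
  		seq = ntohl(th->ack_seq);
  		ack = 0;
  		ack_flag = 0;
  	} else {
  		/* <SEQ=0><ACK=SEG.SEQ+SEG.LEN><CTL=RST,ACK>, where
  		 * SYN and FIN each count for one in SEG.LEN
  		 */
  		seq = 0;
  		ack = ntohl(th->seq) + seg_len + th->syn + th->fin;
  		ack_flag = 1;
  	}

  	/* ...build the RST segment from seq/ack/ack_flag and send it
  	 * back on the tap interface...
  	 */
  	(void)seq; (void)ack; (void)ack_flag;
  }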

-- 
Stefano


Thread overview: 6+ messages

2025-02-26  9:05 tcp_rst() complications David Gibson
2025-02-26 10:15 ` Stefano Brivio
2025-02-27  1:57   ` David Gibson
2025-02-27  4:10     ` David Gibson
2025-02-27  4:26       ` Stefano Brivio [this message]
2025-02-27  6:02         ` David Gibson
