* tcp_rst() complications
From: David Gibson @ 2025-02-26 9:05 UTC
To: passt-dev; +Cc: Stefano Brivio
Amongst other things, I spotted some additional complications using
tcp_rst() in the migration path (some of which might also have
implications in other contexts). These might be things we can safely
ignore, at least for now, but I haven't thought through them enough to
be sure.
1) Sending RST to guest during migration
The first issue is that tcp_rst() will send an actual RST to the guest
on the tap interface. During migration, that means we're sending to
the guest while it's suspended. At the very least that means we
probably have a much higher than usual chance of getting a queue-full
failure writing to the tap interface, which could hit problem (2).
But, beyond that, with vhost-user that means we're writing to guest
memory while the guest is suspended. Kind of the whole point of the
suspended time is that the guest memory doesn't change during it, so
I'm not sure what the consequences will be. Now, at the moment I
think all our tcp_rst() calls are either on the source during rollback
(i.e. we're committed to resuming only on the source) or on the target
past the point of no return (i.e. we're committed to resuming only on
the target). I suspect that means we can get away with it, but I do
worry this could break something in QEMU by violating that assumption.
2) tcp_rst() failures
tcp_rst() can fail if tcp_send_flag() fails. In this case we *don't*
change the events to CLOSED. I _think_ that's a bug: even if we
weren't able to send the RST to the guest, we've already closed the
socket so the flow is dead. Moving to CLOSED state (and then removing
the flow entirely) should mean that we'll resend an RST if the guest
attempts to use the flow again later.
But.. I was worried there might be some subtle reason for not changing
the event state in that case.
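Concretely, the alternative I have in mind is something like this
(just a sketch, with names as I remember them from tcp.c, so treat
the exact signatures as assumptions):

  /* Sketch: move the flow to CLOSED whether or not the RST could
   * actually be sent.  The socket is already closed, so a
   * tcp_send_flag() failure is treated like an RST lost on the wire:
   * we forget the connection, and the guest gets another RST if it
   * tries to use it again.
   */
  void tcp_rst_do(const struct ctx *c, struct tcp_tap_conn *conn)
  {
          if (conn->events == CLOSED)
                  return;

          tcp_send_flag(c, conn, RST);    /* ignore failure */
          conn_event(c, conn, CLOSED);
  }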
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
* Re: tcp_rst() complications
From: Stefano Brivio @ 2025-02-26 10:15 UTC
To: David Gibson; +Cc: passt-dev
On Wed, 26 Feb 2025 20:05:25 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:
> Amongst other things, I spotted some additional complications using
> tcp_rst() in the migration path (some of which might also have
> implications in other contexts). These might be things we can safely
> ignore, at least for now, but I haven't thought through them enough to
> be sure.
>
> 1) Sending RST to guest during migration
>
> The first issue is that tcp_rst() will send an actual RST to the guest
> on the tap interface. During migration, that means we're sending to
> the guest while it's suspended. At the very least that means we
> probably have a much higher than usual chance of getting a queue-full
> failure writing to the tap interface, which could hit problem (2).
>
> But, beyond that, with vhost-user that means we're writing to guest
> memory while the guest is suspended. Kind of the whole point of the
> suspended time is that the guest memory doesn't change during it, so
> I'm not sure what the consequences will be.
If I recall correctly I checked this and something in the vhost-user
code will tell us that the queue is not ready yet, done.
Ideally we want to make sure we queue those, but queue sizes are finite
and I don't think we can guarantee we can pile up 128k RST segments.
Right now I would check that the functionality is not spectacularly
broken (I looked into that briefly, QEMU didn't crash, guest kept
running, but I didn't check further than that). If we miss a RST,
too bad: things will time out eventually.
As a second step we could perhaps introduce a post-migration stage and
move calling tcp_rst() to there if the connection is in a given state?
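Something like this, say (purely a sketch: the stage hook, the flag
and the iterator are all made up here, loosely after the stage tables
in migrate.c):

  /* Hypothetical: during import, flag connections we couldn't repair
   * instead of calling tcp_rst() right away, then reset them from a
   * stage running once the guest is resumed and the virtqueues are
   * usable again.
   */
  static int tcp_migrate_target_post(struct ctx *c,
                                     const struct migrate_stage *stage,
                                     int fd)
  {
          union flow *flow;

          (void)stage;
          (void)fd;

          foreach_tcp_flow(c, flow) {                  /* hypothetical */
                  if (flow->tcp.flags & CONN_NEEDS_RST) /* hypothetical */
                          tcp_rst(c, &flow->tcp);
          }

          return 0;
  }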
> Now, at the moment I
> think all our tcp_rst() calls are either on the source during rollback
> (i.e. we're committed to resuming only on the source) or on the target
> past the point of no return (i.e. we're committed to resuming only on
> the target). I suspect that means we can get away with it, but I do
> worry this could break something in QEMU by violating that assumption.
>
> 2) tcp_rst() failures
>
> tcp_rst() can fail if tcp_send_flag() fails. In this case we *don't*
> change the events to CLOSED. I _think_ that's a bug: even if we
> weren't able to send the RST to the guest, we've already closed the
> socket so the flow is dead. Moving to CLOSED state (and then removing
> the flow entirely) should mean that we'll resend an RST if the guest
> attempts to use the flow again later.
>
> But.. I was worried there might be some subtle reason for not changing
> the event state in that case.
Not very subtle: my original idea was that if we fail to send the RST,
we should note that (by not moving to CLOSED) and try again from a
timer.
In practice I've never observed a tcp_send_flag() failure so I'm not
sure if that mechanism even works. Moving the socket to CLOSED sounds
totally okay to me, surely simpler, and probably more robust.
--
Stefano
* Re: tcp_rst() complications
From: David Gibson @ 2025-02-27 1:57 UTC
To: Stefano Brivio; +Cc: passt-dev
On Wed, Feb 26, 2025 at 11:15:07AM +0100, Stefano Brivio wrote:
> On Wed, 26 Feb 2025 20:05:25 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
>
> > Amongst other things, I spotted some additional complications using
> > tcp_rst() in the migration path (some of which might also have
> > implications in other contexts). These might be things we can safely
> > ignore, at least for now, but I haven't thought through them enough to
> > be sure.
> >
> > 1) Sending RST to guest during migration
> >
> > The first issue is that tcp_rst() will send an actual RST to the guest
> > on the tap interface. During migration, that means we're sending to
> > the guest while it's suspended. At the very least that means we
> > probably have a much higher than usual chance of getting a queue-full
> > failure writing to the tap interface, which could hit problem (2).
> >
> > But, beyond that, with vhost-user that means we're writing to guest
> > memory while the guest is suspended. Kind of the whole point of the
> > suspended time is that the guest memory doesn't change during it, so
> > I'm not sure what the consequences will be.
>
> If I recall correctly I checked this and something in the vhost-user
> code will tell us that the queue is not ready yet, done.
Ah, yes. I'd seen those "Got packet, but RX virtqueue is not usable
yet" messages, but didn't make the connection.
I had been concerned that might only happen when we tried to kick the
queue, after we'd touched guest memory, but that doesn't appear to be
the case. Ok, now I'm reasonably convinced this is fine.
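For the record, the guard I mean sits early in the vhost-user
transmit path and looks roughly like this (paraphrased: the two
queue-state helpers are my guess at what virtio.c exposes):

  /* If the guest is suspended, the RX virtqueue isn't usable: bail
   * out before touching guest memory.  Note that this path returns 0,
   * i.e. "nothing sent, but no error", so tcp_send_flag() succeeds
   * and tcp_rst() carries on regardless.
   */
  if (!vu_queue_enabled(vq) || !vu_queue_started(vq)) {
          debug("Got packet, but RX virtqueue is not usable yet");
          return 0;
  }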
> Ideally we want to make sure we queue those, but queue sizes are finite
> and I don't think we can guarantee we can pile up 128k RST segments.
Well that, plus with vhost-user we don't really have anywhere we can
queue them other than in guest memory.
> Right now I would check that the functionality is not spectacularly
> broken (I looked into that briefly, QEMU didn't crash, guest kept
> running, but I didn't check further than that). If we miss a RST,
> too bad: things will time out eventually.
>
> As a second step we could perhaps introduce a post-migration stage and
> move calling tcp_rst() to there if the connection is in a given state?
>
> > Now, at the moment I
> > think all our tcp_rst() calls are either on the source during rollback
> > (i.e. we're committed to resuming only on the source) or on the target
> > past the point of no return (i.e. we're committed to resuming only on
> > the target). I suspect that means we can get away with it, but I do
> > worry this could break something in QEMU by violating that assumption.
> >
> > 2) tcp_rst() failures
> >
> > tcp_rst() can fail if tcp_send_flag() fails. In this case we *don't*
> > change the events to CLOSED. I _think_ that's a bug: even if we
> > weren't able to send the RST to the guest, we've already closed the
> > socket so the flow is dead. Moving to CLOSED state (and then removing
> > the flow entirely) should mean that we'll resend an RST if the guest
> > attempts to use the flow again later.
> >
> > But.. I was worried there might be some subtle reason for not changing
> > the event state in that case.
>
> Not very subtle: my original idea was that if we fail to send the RST,
> we should note that (by not moving to CLOSED) and try again from a
> timer.
Ok. Makes sense, but I don't think it's necessary. We can treat a
failure here as if the RST was lost on the wire: we just forget the
connection, and if the guest doesn't drop it we'll send it another RST
if it tries to send a packet to the connection we no longer know
about.
> In practice I've never observed a tcp_send_flag() failure so I'm not
> sure if that mechanism even works.
I don't think the mechanism even exists. I don't think there's any
timer that does this, and more to the point, if we don't change events
we basically don't have any record that we already reset the connection.
> Moving the socket to CLOSED sounds
> totally okay to me, surely simpler, and probably more robust.
Ok. I'll go ahead with that.
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
* Re: tcp_rst() complications
From: David Gibson @ 2025-02-27 4:10 UTC
To: Stefano Brivio; +Cc: passt-dev
On Thu, Feb 27, 2025 at 12:57:15PM +1100, David Gibson wrote:
> On Wed, Feb 26, 2025 at 11:15:07AM +0100, Stefano Brivio wrote:
> > On Wed, 26 Feb 2025 20:05:25 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >
> > > Amongst other things, I spotted some additional complications using
> > > tcp_rst() in the migration path (some of which might also have
> > > implications in other contexts). These might be things we can safely
> > > ignore, at least for now, but I haven't thought through them enough to
> > > be sure.
> > >
> > > 1) Sending RST to guest during migration
> > >
> > > The first issue is that tcp_rst() will send an actual RST to the guest
> > > on the tap interface. During migration, that means we're sending to
> > > the guest while it's suspended. At the very least that means we
> > > probably have a much higher than usual chance of getting a queue-full
> > > failure writing to the tap interface, which could hit problem (2).
> > >
> > > But, beyond that, with vhost-user that means we're writing to guest
> > > memory while the guest is suspended. Kind of the whole point of the
> > > suspended time is that the guest memory doesn't change during it, so
> > > I'm not sure what the consequences will be.
> >
> > If I recall correctly I checked this and something in the vhost-user
> > code will tell us that the queue is not ready yet, done.
>
> Ah, yes. I'd seen those "Got packet, but RX virtqueue is not usable
> yet" messages, but didn't make the connection.
Significantly, of course, that case also still returns 0 from
tcp_send_flag() so we carry on with changing the state to CLOSED.
> I had been concerned that might only happen when we tried to kick the
> queue, after we'd touched guest memory, but that doesn't appear to be
> the case. Ok, now I'm reasonably convinced this is fine.
>
> > Ideally we want to make sure we queue those, but queue sizes are finite
> > and I don't think we can guarantee we can pile up 128k RST segments.
>
> Well that, plus with vhost-user we don't really have anywhere we can
> queue them other than in guest memory.
>
> > Right now I would check that the functionality is not spectacularly
> > broken (I looked into that briefly, QEMU didn't crash, guest kept
> > running, but I didn't check further than that). If we miss a RST,
> > too bad: things will time out eventually.
> >
> > As a second step we could perhaps introduce a post-migration stage and
> > move calling tcp_rst() to there if the connection is in a given state?
> >
> > > Now, at the moment I
> > > think all our tcp_rst() calls are either on the source during rollback
> > > (i.e. we're committed to resuming only on the source) or on the target
> > > past the point of no return (i.e. we're committed to resuming only on
> > > the target). I suspect that means we can get away with it, but I do
> > > worry this could break something in QEMU by violating that assumption.
> > >
> > > 2) tcp_rst() failures
> > >
> > > tcp_rst() can fail if tcp_send_flag() fails. In this case we *don't*
> > > change the events to CLOSED. I _think_ that's a bug: even if we
> > > weren't able to send the RST to the guest, we've already closed the
> > > socket so the flow is dead. Moving to CLOSED state (and then removing
> > > the flow entirely) should mean that we'll resend an RST if the guest
> > > attempts to use the flow again later.
> > >
> > > But.. I was worried there might be some subtle reason for not changing
> > > the event state in that case.
> >
> > Not very subtle: my original idea was that if we fail to send the RST,
> > we should note that (by not moving to CLOSED) and try again from a
> > timer.
>
> Ok. Makes sense, but I don't think it's necessary. We can treat a
> failure here as if the RST was lost on the wire: we just forget the
> connection, and if the guest doesn't drop it we'll send it another RST
> if it tries to send a packet to the connection we no longer know
> about.
I was wrong about this, and we hit this case in practice for a
migration where the target has a different address (which I've now
managed to test properly).
With my draft fix, after the bind() failure we correctly tcp_rst()
the imported flow, but because the queue is disabled the RST isn't
actually delivered to the guest. So, the (target) guest will now send
mid-stream packets which no longer match any flow. It turns out we
simply ignore flow-less packets which aren't SYN.
That seems wrong: I think we should RST and/or send an ICMP error instead.
I'm slightly worried it might be one of those cases where that was
how it was originally supposed to work, but the realities of a hostile
internet mean it turns out to be unwise. Then again, this is guest
facing, so it won't usually be exposed to the whole internet.
> > In practice I've never observed a tcp_send_flag() failure so I'm not
> > sure if that mechanism even works.
>
> I don't think the mechanism even exists. I don't think there's any
> timer that does this, and more to the point, if we don't change events
> we basically don't have any record that we already reset the connection.
>
> > Moving the socket to CLOSED sounds
> > totally okay to me, surely simpler, and probably more robust.
>
> Ok. I'll go ahead with that.
I still think this change is correct, and I'm going ahead with it, but
I think we probably want another change on that to RST (or something)
on flowless !SYN packets.
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
* Re: tcp_rst() complications
From: Stefano Brivio @ 2025-02-27 4:26 UTC
To: David Gibson; +Cc: passt-dev
On Thu, 27 Feb 2025 15:10:32 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:
> On Thu, Feb 27, 2025 at 12:57:15PM +1100, David Gibson wrote:
> > On Wed, Feb 26, 2025 at 11:15:07AM +0100, Stefano Brivio wrote:
> > > On Wed, 26 Feb 2025 20:05:25 +1100
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > >
> > > > Amongst other things, I spotted some additional complications using
> > > > tcp_rst() in the migration path (some of which might also have
> > > > implications in other contexts). These might be things we can safely
> > > > ignore, at least for now, but I haven't thought through them enough to
> > > > be sure.
> > > >
> > > > 1) Sending RST to guest during migration
> > > >
> > > > The first issue is that tcp_rst() will send an actual RST to the guest
> > > > on the tap interface. During migration, that means we're sending to
> > > > the guest while it's suspended. At the very least that means we
> > > > probably have a much higher than usual chance of getting a queue-full
> > > > failure writing to the tap interface, which could hit problem (2).
> > > >
> > > > But, beyond that, with vhost-user that means we're writing to guest
> > > > memory while the guest is suspended. Kind of the whole point of the
> > > > suspended time is that the guest memory doesn't change during it, so
> > > > I'm not sure what the consequences will be.
> > >
> > > If I recall correctly I checked this and something in the vhost-user
> > > code will tell us that the queue is not ready yet, done.
> >
> > Ah, yes. I'd seen those "Got packet, but RX virtqueue is not usable
> > yet" messages, but didn't make the connection.
>
> Significantly, of course, that case also still returns 0 from
> tcp_send_flag() so we carry on with changing the state to CLOSED.
Yes, I think we should re-evaluate that 'return 0;' at some point. On
the other hand, we're (generally) dealing with an unexpected
situation.
> > I had been concerned that might only happen when we tried to kick the
> > queue, after we'd touched guest memory, but that doesn't appear to be
> > the case. Ok, now I'm reasonably convinced this is fine.
> >
> > > Ideally we want to make sure we queue those, but queue sizes are finite
> > > and I don't think we can guarantee we can pile up 128k RST segments.
> >
> > Well that, plus with vhost-user we don't really have anywhere we can
> > queue them other than in guest memory.
> >
> > > Right now I would check that the functionality is not spectacularly
> > > broken (I looked into that briefly, QEMU didn't crash, guest kept
> > > running, but I didn't check further than that). If we miss a RST,
> > > too bad: things will time out eventually.
> > >
> > > As a second step we could perhaps introduce a post-migration stage and
> > > move calling tcp_rst() to there if the connection is in a given state?
> > >
> > > > Now, at the moment I
> > > > think all our tcp_rst() calls are either on the source during rollback
> > > > (i.e. we're committed to resuming only on the source) or on the target
> > > > past the point of no return (i.e. we're committed to resuming only on
> > > > the target). I suspect that means we can get away with it, but I do
> > > > worry this could break something in QEMU by violating that assumption.
> > > >
> > > > 2) tcp_rst() failures
> > > >
> > > > tcp_rst() can fail if tcp_send_flag() fails. In this case we *don't*
> > > > change the events to CLOSED. I _think_ that's a bug: even if we
> > > > weren't able to send the RST to the guest, we've already closed the
> > > > socket so the flow is dead. Moving to CLOSED state (and then removing
> > > > the flow entirely) should mean that we'll resend an RST if the guest
> > > > attempts to use the flow again later.
> > > >
> > > > But.. I was worried there might be some subtle reason for not changing
> > > > the event state in that case.
> > >
> > > Not very subtle: my original idea was that if we fail to send the RST,
> > > we should note that (by not moving to CLOSED) and try again from a
> > > timer.
> >
> > Ok. Makes sense, but I don't think it's necessary. We can treat a
> > failure here as if the RST was lost on the wire: we just forget the
> > connection, and if the guest doesn't drop it we'll send it another RST
> > if it tries to send a packet to the connection we no longer know
> > about.
>
> I was wrong about this, and we hit this case in practice for a
> migration where the target has a different address (which I've now
> managed to test properly).
>
> With my draft fix, after the bind() failure we correctly tcp_rst()
> the imported flow, but because the queue is disabled the RST isn't
> actually delivered to the guest. So, the (target) guest will now send
> mid-stream packets which no longer match any flow. It turns out we
> simply ignore flow-less packets which aren't SYN.
>
> That seems wrong: I think we should RST and/or send an ICMP error instead.
> I'm slightly worried it might be one of those cases where that was
> how it was originally supposed to work, but the realities of a hostile
> internet mean it turns out to be unwise. Then again, this is guest
> facing, so it won't usually be exposed to the whole internet.
Correct, we should RST on !RST:
https://datatracker.ietf.org/doc/html/rfc9293#section-3.10.7.1
the consequences of not doing so shouldn't be relevant because
data-less segments can't be expected to be delivered reliably, plus
"stray" RSTs are very commonly dropped by firewalls.
Still yes, we should fix it, it was pretty much an oversight on my side.
> > > In practice I've never observed a tcp_send_flag() failure so I'm not
> > > sure if that mechanism even works.
> >
> > I don't think the mechanism even exists. I don't think there's any
> > timer that does this, and more to the point, if we don't change events
> > we basically don't have any record that we already reset the connection.
> >
> > > Moving the socket to CLOSED sounds
> > > totally okay to me, surely simpler, and probably more robust.
> >
> > Ok. I'll go ahead with that.
>
> I still think this change is correct, and I'm going ahead with it, but
> I think we probably want another change on that to RST (or something)
> on flowless !SYN packets.
Yes, it's an RST with sequence number 0 if !ACK, or with the matching
sequence number on ACK (needing a separate version of tcp_rst(),
perhaps?).
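Roughly like this, that is (a sketch of the RFC 9293, 3.10.7.1 rule
only, assuming th points at the offending TCP header, dlen is its
payload length, and RST/ACK are the usual flag constants):

  /* Segment matching no flow: RFC 9293, 3.10.7.1, CLOSED state.
   * SEG.LEN is the payload length plus one each for SYN and FIN.
   */
  uint32_t seq, ack_seq;
  int flags;

  if (th->ack) {
          /* <SEQ=SEG.ACK><CTL=RST> */
          seq = ntohl(th->ack_seq);
          ack_seq = 0;
          flags = RST;
  } else {
          /* <SEQ=0><ACK=SEG.SEQ+SEG.LEN><CTL=RST,ACK> */
          seq = 0;
          ack_seq = ntohl(th->seq) + dlen + !!th->syn + !!th->fin;
          flags = RST | ACK;
  }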
--
Stefano
* Re: tcp_rst() complications
From: David Gibson @ 2025-02-27 6:02 UTC
To: Stefano Brivio; +Cc: passt-dev
On Thu, Feb 27, 2025 at 05:26:44AM +0100, Stefano Brivio wrote:
> On Thu, 27 Feb 2025 15:10:32 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
>
> > On Thu, Feb 27, 2025 at 12:57:15PM +1100, David Gibson wrote:
> > > On Wed, Feb 26, 2025 at 11:15:07AM +0100, Stefano Brivio wrote:
> > > > On Wed, 26 Feb 2025 20:05:25 +1100
> > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > >
> > > > > Amongst other things, I spotted some additional complications using
> > > > > tcp_rst() in the migration path (some of which might also have
> > > > > implications in other contexts). These might be things we can safely
> > > > > ignore, at least for now, but I haven't thought through them enough to
> > > > > be sure.
> > > > >
> > > > > 1) Sending RST to guest during migration
> > > > >
> > > > > The first issue is that tcp_rst() will send an actual RST to the guest
> > > > > on the tap interface. During migration, that means we're sending to
> > > > > the guest while it's suspended. At the very least that means we
> > > > > probably have a much higher than usual chance of getting a queue-full
> > > > > failure writing to the tap interface, which could hit problem (2).
> > > > >
> > > > > But, beyond that, with vhost-user that means we're writing to guest
> > > > > memory while the guest is suspended. Kind of the whole point of the
> > > > > suspended time is that the guest memory doesn't change during it, so
> > > > > I'm not sure what the consequences will be.
> > > >
> > > > If I recall correctly I checked this and something in the vhost-user
> > > > code will tell us that the queue is not ready yet, done.
> > >
> > > Ah, yes. I'd seen those "Got packet, but RX virtqueue is not usable
> > > yet" messages, but didn't make the connection.
> >
> > Significantly, of course, that case also still returns 0 from
> > tcp_send_flag() so we carry on with changing the state to CLOSED.
>
> Yes, I think we should re-evaluate that 'return 0;' at some point. On
> the other hand, we're (generally) dealing with an unexpected
> situation.
Right.
> > > I had been concerned that might only happen when we tried to kick the
> > > queue, after we'd touched guest memory, but that doesn't appear to be
> > > the case. Ok, now I'm reasonably convinced this is fine.
> > >
> > > > Ideally we want to make sure we queue those, but queue sizes are finite
> > > > and I don't think we can guarantee we can pile up 128k RST segments.
> > >
> > > Well that, plus with vhost-user we don't really have anywhere we can
> > > queue them other than in guest memory.
> > >
> > > > Right now I would check that the functionality is not spectacularly
> > > > broken (I looked into that briefly, QEMU didn't crash, guest kept
> > > > running, but I didn't check further than that). If we miss a RST,
> > > > too bad: things will time out eventually.
> > > >
> > > > As a second step we could perhaps introduce a post-migration stage and
> > > > move calling tcp_rst() to there if the connection is in a given state?
> > > >
> > > > > Now, at the moment I
> > > > > think all our tcp_rst() calls are either on the source during rollback
> > > > > (i.e. we're committed to resuming only on the source) or on the target
> > > > > past the point of no return (i.e. we're committed to resuming only on
> > > > > the target). I suspect that means we can get away with it, but I do
> > > > > worry this could break something in QEMU by violating that assumption.
> > > > >
> > > > > 2) tcp_rst() failures
> > > > >
> > > > > tcp_rst() can fail if tcp_send_flag() fails. In this case we *don't*
> > > > > change the events to CLOSED. I _think_ that's a bug: even if we
> > > > > weren't able to send the RST to the guest, we've already closed the
> > > > > socket so the flow is dead. Moving to CLOSED state (and then removing
> > > > > the flow entirely) should mean that we'll resend an RST if the guest
> > > > > attempts to use the flow again later.
> > > > >
> > > > > But.. I was worried there might be some subtle reason for not changing
> > > > > the event state in that case.
> > > >
> > > > Not very subtle: my original idea was that if we fail to send the RST,
> > > > we should note that (by not moving to CLOSED) and try again from a
> > > > timer.
> > >
> > > Ok. Makes sense, but I don't think it's necessary. We can treat a
> > > failure here as if the RST was lost on the wire: we just forget the
> > > connection, and if the guest doesn't drop it we'll send it another RST
> > > if it tries to send a packet to the connection we no longer know
> > > about.
> >
> > I was wrong about this, and we hit this case in practice for a
> > migration where the target has a different address (which I've now
> > managed to test properly).
> >
> > With my draft fix, after the bind() failure we correctly tcp_rst()
> > the imported flow, but because the queue is disabled the RST isn't
> > actually delivered to the guest. So, the (target) guest will now send
> > mid-stream packets which no longer match any flow. It turns out we
> > simply ignore flow-less packets which aren't SYN.
> >
> > That seems wrong: I think we should RST and/or send an ICMP error instead.
> > I'm slightly worried it might be one of those cases where that was
> > how it was originally supposed to work, but the realities of a hostile
> > internet mean it turns out to be unwise. Then again, this is guest
> > facing, so it won't usually be exposed to the whole internet.
>
> Correct, we should RST on !RST:
>
> https://datatracker.ietf.org/doc/html/rfc9293#section-3.10.7.1
>
> the consequences of not doing so shouldn't be relevant because
> data-less segments can't be expected to be delivered reliably, plus
> "stray" RSTs are very commonly dropped by firewalls.
>
> Still yes, we should fix it, it was pretty much an oversight on my side.
Ok, I might try to tackle that tomorrow.
> > > > In practice I've never observed a tcp_send_flag() failure so I'm not
> > > > sure if that mechanism even works.
> > >
> > > I don't think the mechanism even exists. I don't think there's any
> > > timer that does this, and more to the point, if we don't change events
> > > we basically don't have any record that we already reset the connection.
> > >
> > > > Moving the socket to CLOSED sounds
> > > > totally okay to me, surely simpler, and probably more robust.
> > >
> > > Ok. I'll go ahead with that.
> >
> > I still think this change is correct, and I'm going ahead with it, but
> > I think we probably want another change on that to RST (or something)
> > on flowless !SYN packets.
>
> Yes, it's an RST with sequence number 0 if !ACK, or with the matching
> sequence number on ACK (needing a separate version of tcp_rst(),
> perhaps?).
We'll certainly need a separate version of tcp_rst(). tcp_rst() takes
the connection as a parameter. Here we're dealing with cases where
there is no existing connection, and that's kind of the point.
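So, presumably, something shaped like this (hypothetical prototype,
just to illustrate the parameters):

  /* Hypothetical: reset in response to a segment with no matching
   * flow, taking the addresses and the offending header instead of a
   * struct tcp_tap_conn.
   */
  static void tcp_rst_no_conn(const struct ctx *c, int af,
                              const void *saddr, const void *daddr,
                              const struct tcphdr *th, size_t dlen);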
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson