On Thu, Feb 27, 2025 at 12:57:15PM +1100, David Gibson wrote:
> On Wed, Feb 26, 2025 at 11:15:07AM +0100, Stefano Brivio wrote:
> > On Wed, 26 Feb 2025 20:05:25 +1100
> > David Gibson wrote:
> >
> > > Amongst other things, I spotted some additional complications
> > > using tcp_rst() in the migration path (some of which might also
> > > have implications in other contexts).  These might be things we
> > > can safely ignore, at least for now, but I haven't thought through
> > > them enough to be sure.
> > >
> > > 1) Sending RST to guest during migration
> > >
> > > The first issue is that tcp_rst() will send an actual RST to the
> > > guest on the tap interface.  During migration, that means we're
> > > sending to the guest while it's suspended.  At the very least that
> > > means we probably have a much higher than usual chance of getting
> > > a queue full failure writing to the tap interface, which could hit
> > > problem (2).
> > >
> > > But, beyond that, with vhost-user that means we're writing to
> > > guest memory while the guest is suspended.  Kind of the whole
> > > point of the suspended time is that the guest memory doesn't
> > > change during it, so I'm not sure what the consequences will be.
> >
> > If I recall correctly I checked this and something in the vhost-user
> > code will tell us that the queue is not ready yet, done.
>
> Ah, yes.  I'd seen those "Got packet, but RX virtqueue is not usable
> yet" messages, but didn't make the connection.

Significantly, of course, that case also still returns 0 from
tcp_send_flag() so we carry on with changing the state to CLOSED.
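For reference, the shape of that path is roughly the following (a
sketch from memory, not the literal code: the virtqueue fields and
the surrounding function are approximate):

    /* Sketch of the vhost-user "queue not usable" path: if the guest
     * hasn't set up and enabled the RX virtqueue yet, the frame is
     * silently dropped, but the call still reports success, so
     * tcp_send_flag() returns 0 and tcp_rst() carries on regardless.
     * Field names are approximate, not verbatim.
     */
    if (!vq->enable || !vq->started) {
        debug("Got packet, but RX virtqueue is not usable yet");
        return 0;	/* frame dropped, but not reported as an error */
    }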
> I had been concerned that might only happen when we tried to kick
> the queue, after we'd touched guest memory, but that doesn't appear
> to be the case.  Ok, now I'm reasonably convinced this is fine.

> > Ideally we want to make sure we queue those, but queue sizes are
> > finite and I don't think we can guarantee we can pile up 128k RST
> > segments.
>
> Well that, plus with vhost-user we don't really have anywhere we can
> queue them other than in guest memory.
>
> > Right now I would check that the functionality is not spectacularly
> > broken (I looked into that briefly, QEMU didn't crash, guest kept
> > running, but I didn't check further than that).  If we miss a RST,
> > too bad, things will time out eventually.
> >
> > As a second step we could perhaps introduce a post-migration stage
> > and move calling tcp_rst() to there if the connection is in a given
> > state?
> >
> > > Now, at the moment I think all our tcp_rst() calls are either on
> > > the source during rollback (i.e. we're committed to resuming only
> > > on the source) or on the target past the point of no return (i.e.
> > > we're committed to resuming only on the target).  I suspect that
> > > means we can get away with it, but I do worry this could break
> > > something in qemu by violating that assumption.
> > >
> > > 2) tcp_rst() failures
> > >
> > > tcp_rst() can fail if tcp_send_flag() fails.  In this case we
> > > *don't* change the events to CLOSED.  I _think_ that's a bug: even
> > > if we weren't able to send the RST to the guest, we've already
> > > closed the socket so the flow is dead.  Moving to CLOSED state
> > > (and then removing the flow entirely) should mean that we'll
> > > resend an RST if the guest attempts to use the flow again later.
> > >
> > > But.. I was worried there might be some subtle reason for not
> > > changing the event state in that case.
> >
> > Not very subtle: my original idea was that if we fail to send the
> > RST, we should note that (by not moving to CLOSED) and try again
> > from a timer.
>
> Ok.  Makes sense, but I don't think it's necessary.  We can treat a
> failure here as if the RST was lost on the wire: we just forget the
> connection, and if the guest doesn't drop it we'll send it another
> RST if it tries to send a packet to the connection we no longer know
> about.

I was wrong about this, and we hit this case in practice for a
migration where the target has a different address (which I've now
managed to test properly).  With my draft fix, after the bind()
failure we correctly tcp_rst() the imported flow, but because the
queue is disabled the RST isn't actually delivered to the guest.  So,
the (target) guest will now send mid-stream packets which no longer
match any flow.

Turns out we simply ignore flow-less packets which aren't SYN.  That
seems wrong: I think we should RST and/or send an ICMP instead.  I'm
slightly worried it might be one of those cases where that's how it
was originally supposed to work, but the realities of a hostile
internet mean it turns out to be unwise.  Then again, this is guest
facing, so it won't usually be exposed to the whole internet.

> > In practice I've never observed a tcp_send_flag() failure so I'm
> > not sure if that mechanism even works.
>
> I don't think the mechanism even exists.  I don't think there's any
> timer that does this, and more to the point, if we don't change
> events we basically don't have any record that we already reset the
> connection.
>
> > Moving the socket to CLOSED sounds totally okay to me, surely
> > simpler, and probably more robust.
>
> Ok.  I'll go ahead with that.

I still think this change is correct, and I'm going ahead with it, but
I think we probably want another change on top of that to RST (or
something) on flow-less !SYN packets.
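Concretely, the first change amounts to something like this in
tcp_rst_do() (a paraphrased sketch of the intent, not the literal
patch):

    /* Sketch: move to CLOSED whether or not the RST actually reached
     * the guest.  If tcp_send_flag() fails, treat it like an RST lost
     * on the wire: the socket side is already gone, so forget the
     * flow and let any later flow-less segment from the guest earn a
     * fresh RST.
     */
    if (conn->events == CLOSED)
        return;

    tcp_send_flag(c, conn, RST);
    conn_event(c, conn, CLOSED);

The follow-up would then be something of this shape at the point where
the tap handler fails to find a flow for an incoming segment
(tcp_rst_no_flow() is a hypothetical helper and the argument lists are
elided, just to show the idea):

    /* Sketch: the guest sent a segment for a connection we know
     * nothing about.  A plain SYN still means "new connection
     * attempt", but anything else should get an RST back rather than
     * silence, so the guest can drop its dead connection immediately.
     */
    if (!conn) {
        if (th->syn && !th->ack)
            tcp_conn_from_tap(c, ...);	/* existing path */
        else
            tcp_rst_no_flow(c, ...);	/* hypothetical: RST, not silence */
        return 1;
    }

-- 
David Gibson (he or they)		| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson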