From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: passt.top; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=OiO15SFb; dkim-atps=neutral Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by passt.top (Postfix) with ESMTPS id A13275A0622 for ; Thu, 27 Feb 2025 05:26:54 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1740630413; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=xvRI/Y53VSCcZMPmgDDOxwg5afX3F05/U4bf8yANDHE=; b=OiO15SFbleuCNr6mu5mFSTvsdMiM783/BrQjpQoqzx1BrRiN4VM4A9Zpk8+Obszrfx2xA3 NflveKy+BceftL4xzN4vbHSk0ZdvNM30vyh4rXNan/i0S7aK3HAQbx+j6YBBnfqADDZpqj hII2ZJuqULgHez512/428Eyuv7MYDfU= Received: from mail-wr1-f72.google.com (mail-wr1-f72.google.com [209.85.221.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-124-iKUJ_98tOpCqDLZtMELQfA-1; Wed, 26 Feb 2025 23:26:51 -0500 X-MC-Unique: iKUJ_98tOpCqDLZtMELQfA-1 X-Mimecast-MFC-AGG-ID: iKUJ_98tOpCqDLZtMELQfA_1740630410 Received: by mail-wr1-f72.google.com with SMTP id ffacd0b85a97d-38f32ac838cso309696f8f.2 for ; Wed, 26 Feb 2025 20:26:51 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1740630410; x=1741235210; h=content-transfer-encoding:mime-version:organization:references :in-reply-to:message-id:subject:cc:to:from:date:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=xvRI/Y53VSCcZMPmgDDOxwg5afX3F05/U4bf8yANDHE=; b=K5ZCJbe5fEr4X3rE30AvYMJefZedY41G0pu6sZUcR4LrKyqyEE0/T8OLFpKDONfYB8 4IkX5hr6VQtn8BcSDscoDAzPiB9AgPfv9URWgp7JmeJUvjn73nNHXFjYFN4tJ003NuBX Xp9Aq1lAACeSbfwAqPyJB/nh9nX7zkQTf5tSvg4i4ZbbS8MKeSquuk98ypIANj+ETAZl hIONrsfruDJX1P3l480W+fhkT+HRbg6LLKKn8hjOpu46DpXaM31r1EqSd8cu01/5naCg P/ZXqZjvGUZTgclWO7ohZ4utaobzYOq4Xd51K1Zr+S37+pnpDgwESqm9pMlYaQBDpBIO ZYpw== X-Gm-Message-State: AOJu0YwVa5Q8LyGbgxMv8RiZxwefmOoH272EEqNvwIlN9hn3CEBgK2t+ GFrcdjPEuBzErnxyd1hn4imKjfVQIIQQtVHyWoVqTUkFgK+EG+nJwSD6Q8ZTnZ0/OHDIpOwNHn7 Oy/gxe0VQBv+txmvzqgQE7FNhmnhuTkqgv5KgLz+qHs72yYGaJXKlwFy4MN6G X-Gm-Gg: ASbGncskd+HsQO4Bc3hX1K8Fp8Dq/JJkKcv99t8Lra9+Q5CXhkvdOa5MvNtVpsJ+uvc FVAuwI7YLmdxlCan1/uBnvl+LJqAZ6q9myR9A/tU/OBBX05WNJA/aoyeIDppxKattq1v/XEj4Jk zCWyuL4abDV75OkgEh+AfwgW2BmEdMocjj4WYC3fQO/7+OhHo+r91tsE6+vIUZv6JcpLSy0pYm0 wsVw+Bja7gLsTAWE/rOwmc9EEvUfUbbdfKy8x+cN+R9YzMK5H0BC9ziOVHn5uZW8arkZL9sULUk REAFMWLl770fCeRvuxsTtr56yaacwxbLywTLF34NJjdR X-Received: by 2002:a05:6000:18ad:b0:38d:d0ea:b04c with SMTP id ffacd0b85a97d-390d4f8b9f0mr5572592f8f.38.1740630409770; Wed, 26 Feb 2025 20:26:49 -0800 (PST) X-Google-Smtp-Source: AGHT+IGY7lfIRTW+k2AJMHyO7CxVW9Rz8XoBm5+5FNzke9maypPV12HEfQebGjf01obE80oyCtPBUw== X-Received: by 2002:a05:6000:18ad:b0:38d:d0ea:b04c with SMTP id ffacd0b85a97d-390d4f8b9f0mr5572568f8f.38.1740630409316; Wed, 26 Feb 2025 20:26:49 -0800 (PST) Received: from maya.myfinge.rs (ifcgrfdd.trafficplex.cloud. [176.103.220.4]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-390e47a7a2asm755683f8f.37.2025.02.26.20.26.48 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 26 Feb 2025 20:26:48 -0800 (PST) Date: Thu, 27 Feb 2025 05:26:44 +0100 From: Stefano Brivio To: David Gibson Subject: Re: tcp_rst() complications Message-ID: <20250227052644.7b5412d3@elisabeth> In-Reply-To: References: <20250226111507.166ed938@elisabeth> Organization: Red Hat X-Mailer: Claws Mail 4.2.0 (GTK 3.24.41; x86_64-pc-linux-gnu) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: bgGvkujg2yhzv8NhlH3Gv_OD-sLAdnrJGczsZVkJCS4_1740630410 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID-Hash: TTFESYIQTBKHLKNJRIGC4QHGVWLDE2ST X-Message-ID-Hash: TTFESYIQTBKHLKNJRIGC4QHGVWLDE2ST X-MailFrom: sbrivio@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Thu, 27 Feb 2025 15:10:32 +1100 David Gibson wrote: > On Thu, Feb 27, 2025 at 12:57:15PM +1100, David Gibson wrote: > > On Wed, Feb 26, 2025 at 11:15:07AM +0100, Stefano Brivio wrote: > > > On Wed, 26 Feb 2025 20:05:25 +1100 > > > David Gibson wrote: > > > > > > > Amongst other things, I spotted some additional complications using > > > > tcp_rst() in the migration path (some of which might also have > > > > implications in other contexts). These might be things we can safely > > > > ignore, at least for now, but I haven't thought through them enough to > > > > be sure. > > > > > > > > 1) Sending RST to guest during migration > > > > > > > > The first issue is that tcp_rst() will send an actual RST to the guest > > > > on the tap interface. During migration, that means we're sending to > > > > the guest while it's suspended. At the very least that means we > > > > probably have a much higher that usual chance of getting a queue full > > > > failure writing to the tap interface, which could hit problem (2). > > > > > > > > But, beyond that, with vhost-user that means we're writing to guest > > > > memory while the guest is suspended. Kind of the whole point of the > > > > suspended time is that the guest memory doesn't change during it, so > > > > I'm not sure what the consequences will be. > > > > > > If I recall correctly I checked this and something in the vhost-user > > > code will tell us that the queue is not ready yet, done. > > > > Ah, yes. I'd seen those "Got packet, but RX virtqueue is not usable > > yet" messages, but didn't make the connection. > > Significantly, of course, that case also still returns 0 from > tcp_send_flag() so we carry on with changing the state to CLOSED. Yes, I think we should eventually re-evaluate that 'return 0;' at some point. On the other hand we're (generally) dealing with an unexpected situation. > > I had been concerned that might only happen when we tried to kick the > > queue, after we'd touched guest memory, but that doesn't appear to be > > the case. Ok, now I'm reasonably convinced this is ifne. > > > > > Ideally we want to make sure we queue those, but queue sizes are finite > > > and I don't think we can guarantee we can pile up 128k RST segments. > > > > Well that, plus with vhost-user we don't really have anywhere we can > > queue them other than in guest memory. > > > > > Right now I would check that the functionality is not spectacularly > > > broken (I looked into that briefly, QEMU didn't crash, guest kept > > > running, but I didn't check further than that). If we miss a RST too > > > bad, things will time out eventually. > > > > > > As a second step we could perhaps introduce a post-migration stage and > > > move calling tcp_rst() to there if the connection is in a given state? > > > > > > > Now, at the moment I > > > > think all our tcp_rst() calls are either on the source during rollback > > > > (i.e. we're committed to resuming only on the source) or on the target > > > > past the point of no return (i.e. we're committed to resuming only on > > > > the target). I suspect that means we can get away with it, but I do > > > > worry this could break something in qeme by violating that assumption. > > > > > > > > 2) tcp_rst() failures > > > > > > > > tcp_rst() can fail if tcp_send_flag() fails. In this case we *don't* > > > > change the events to CLOSED. I _think_ that's a bug: even if we > > > > weren't able to send the RST to the guest, we've already closed the > > > > socket so the flow is dead. Moving to CLOSED state (and then removing > > > > the flow entirely) should mean that we'll resend an RST if the guest > > > > attempts to use the flow again later. > > > > > > > > But.. I was worried there might be some subtle reason for not changing > > > > the event state in that case. > > > > > > Not very subtle: my original idea was that if we fail to send the RST, > > > we should note that (by not moving to CLOSED) and try again from a > > > timer. > > > > Ok. Makes sense, but I don't think it's necessary. We can treat a > > failure here as if the RST was lost on the wire: we just forget the > > connection, and if the guest doesn't drop it we'll send it another RST > > if it tries to send a packet to the connection we no longer know > > about. > > I was wrong about this, and we hit this case in practice for a > migration where the target has a different address (which I've now > managed to test properly). > > With my draft fix, after the bind() failure we correctly We tcp_rst() > the imported flow, but because the queue is disabled the RST isn't > actually delivered to the guest. So, the (target) guest will now send > mid-stream packets which no longer match any flow. Turns out > flow-less packets which aren't SYN we simply ignore. > > That seems wrong, I think we should RST and/or send an ICMP before. > I'm slightly worried it might be one of those cases where that was > how it was originally supposed to work, but the realities of a hostile > internet mean it turns out to be unwise. Then again, this is guest > facing, so it won't usually be exposed to the whole internet. Correct, we should RST on !RST: https://datatracker.ietf.org/doc/html/rfc9293#section-3.10.7.1 the consequences of not doing so shouldn't be relevant because data-less segments can't be expected to be delivered reliably, plus "stray" RSTs are very commonly dropped by firewalls. Still yes, we should fix it, it was pretty much an oversight on my side. > > > In practice I've never observed a tcp_send_flag() failure so I'm not > > > sure if that mechanism even works. > > > > I don't think the mechanism even exists. I don't think there's any > > timer that does this, and more to the point, if we don't change events > > we basically don't have any record that we already reset the connection. > > > > > Moving the socket to CLOSED sounds > > > totally okay to me, surely simpler, and probably more robust. > > > > Ok. I'll go ahead with that. > > I still think this change is correct, and I'm going ahead with it, but > I think we probably want another change on that to RST (or something) > on flowless !SYN packets. Yes, it's RST with sequence number 0 if !ACK, matching sequence number (needing a separate version of tcp_rst() perhaps?) on ACK. -- Stefano