From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202508 header.b=KTPChzUf; dkim-atps=neutral Received: from mail.ozlabs.org (gandalf.ozlabs.org [150.107.74.76]) by passt.top (Postfix) with ESMTPS id 1963D5A068E for ; Fri, 19 Sep 2025 03:38:03 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202508; t=1758245879; bh=bCH8Xflzwl67pw8CqOAeIXUhDdsdlznVh+q5H8xxg3g=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=KTPChzUfQ5fdYZlNTrulJK6znHZeSmOUTZFfk+xCLQ6Fcrnk8wOmpvarYkurkcP+H 8S+k0KjPJl/WBsk+J4CJyHNT9E3i8M+YvwgP6PinRUQRxoayrLQP8F9P6ebEeIqj+n rjwjJ4I2yzvDLXcMcYdYSvMwMjLfbaVGImZ9TQdrbdVWGveMZV7swSdZLbe4vmzXbI RqdELPcPHLiazRQXPWwQad2/U/c5IvgZRUZGoz6Fd0SmLzgw1vDxwCVfh1h1f8RN1y C6uda7Ct4fbHIKv3E4kIf0Xh2EHejx4th4CZtFGkBh7wqmApiJYotKGaDKF4ZXPvLW foWzo+8A/QxvA== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4cSZrl5bK8z4wCQ; Fri, 19 Sep 2025 11:37:59 +1000 (AEST) Date: Fri, 19 Sep 2025 11:33:35 +1000 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH] tap: Drop frames if no client connected Message-ID: References: <20250911085519.24395-1-yuhuang@redhat.com> <20250911115425.79eaaac5@elisabeth> <20250915081319.00e72e53@elisabeth> <20250918091714.77192b00@elisabeth> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="uh8bkEFNmOJJnt6b" Content-Disposition: inline In-Reply-To: <20250918091714.77192b00@elisabeth> Message-ID-Hash: BG3THWUXNHOJ7WZJ5BZTLQXYPTCER7PE X-Message-ID-Hash: BG3THWUXNHOJ7WZJ5BZTLQXYPTCER7PE X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: Yumei Huang , passt-dev@passt.top, lvivier@redhat.com X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --uh8bkEFNmOJJnt6b Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Sep 18, 2025 at 09:17:14AM +0200, Stefano Brivio wrote: > On Thu, 18 Sep 2025 14:28:37 +1000 > David Gibson wrote: >=20 > > On Mon, Sep 15, 2025 at 08:13:19AM +0200, Stefano Brivio wrote: > > > On Fri, 12 Sep 2025 12:01:37 +1000 > > > David Gibson wrote: > > > =20 > > > > On Thu, Sep 11, 2025 at 11:54:25AM +0200, Stefano Brivio wrote: =20 > > > > > On Thu, 11 Sep 2025 16:55:19 +0800 > > > > > Yumei Huang wrote: > > > > > =20 > > > > > > If no client is attached, discard outgoing frames and report th= em as > > > > > > sent. This mimics the behavior of a physical host with its netw= ork > > > > > > cable unplugged. > > > > > >=20 > > > > > > Suggested-by: David Gibson > > > > > > Signed-off-by: Yumei Huang =20 > > > > >=20 > > > > > Thanks, the fix itself obviously makes sense, but I have a few qu= estions > > > > > and comments: > > > > >=20 > > > > > - first off, what happens if we don't return early in tap_send_fr= ames()? > > > > > Commit messages for fixes (assuming this is a fix) should alway= s say > > > > > what concrete problem we had, what is going to be fixed, or if = we're > > > > > not aware of any real issue but things are just fragile / wrong= =20 > > > >=20 > > > > Without this we will get an EBADF in either writev() (pasta) or > > > > sendmsg() (passt). That's basically harmless, but a bit ugly. > > > > Explicitly catching this case results in behaviour that's probably a > > > > bit clearer to debug if we hit it. > > > >=20 > > > > Putting that context in the commit message would be useful. > > > > =20 > > > > > - until a while ago, this couldn't happen at all. We were just bl= ocking > > > > > the whole execution as long as the tap / guest / container inte= rface > > > > > wasn't up and running. > > > > >=20 > > > > > I wonder when this changed and if it makes sense to go back to = the > > > > > previous behaviour. I had just a quick look and I wonder if I > > > > > accidentally broke this in c9b241346569 ("conf, passt, tap: Open > > > > > socket and PID files before switching UID/GID"). > > > > >=20 > > > > > Before that, main() would call tap_sock_init(), which would call > > > > > tap_sock_unix_open(), a blocking function. > > > > >=20 > > > > > Should we make the whole thing blocking again? If not, is there > > > > > anything else that's breaking with that? Timers, other inputs, = etc. =20 > > > >=20 > > > > I don't think we can quite do that. I'm not sure if it's the only > > > > reason, but for vhost-user I believe we need the epoll loop up and > > > > running before we have the tap connection fully set up, because we > > > > need it to process the vhost-user control messages. Laurent, can y= ou > > > > verify? =20 > > >=20 > > > We discussed this in the past, before realising that the execution > > > continues for whatever reason, and probably before I broke the > > > assumption that guest connection was blocking. > > >=20 > > > Yes, in the vhost-user case, the epoll loop needs to run before we ha= ve > > > a working connection to the guest, but: > > >=20 > > > - we can anyway block until the control socket is set up (we used to = do > > > that) =20 > >=20 > > The vhost-user control socket? I'm not entirely sure what you mean by > > "block" here. Since we need the epoll loop up, I don't see how we can > > block in the conventional sense. >=20 > Let's rather say "until the data setup is complete". >=20 > And by "block", I mean we would ignore any other event, obviously we > have to listen to the control socket (in the main loop or in a separate > dedicated loop). Ok. We could do that. I don't think the peer visible behaviour would really be different from what we get now silently dropping frames to tap. I'm not convinced it's really simpler than the current approach either: * For now, we could just skip all epoll handling if the event type isn't the control socket, but we'd need to be finer grained about this if we add anything else that needs handling before guest connection (e.g. dynamic configuration update mechanism and/or netlink monitor) * Ignoring events in that way could lead to us busy-looping on epoll, because we might not clear events. So we're back to having to consider every event type, at least to some extent. > I'm not suggesting we do this though (see below). It's just a > possibility. >=20 > >=20 > > > - the vhost-user implementation autonomously throws data away received > > > before that point =20 > >=20 > > Right. It doesn't have anywhere to put it, so it doesn't have much > > choice. > >=20 > > > Now, I don't think we necessarily need to stick to that approach, it > > > was the obvious choice when passt was much simpler, and it keeps thin= gs > > > simple in the sense that we don't need to care about cases like the > > > ones this patch is addressing. > > >=20 > > > On the other hand, if we want to switch to a different model, we need > > > to have a look at other possible breakages, I guess. > > > =20 > > > > There are several different approaches we can take here. I discuss= ed > > > > some with Yumei and suggested she take this one. Here's some > > > > reasoning (maybe this would also be useful in the commit message, > > > > though it's rather bulky) > > > >=20 > > > > # Don't listen() until the tap connection is ready > > > >=20 > > > > - It's not clear that the host rejecting the connection is better > > > > than the host accepting, then the connection stalling until the= =20 > > > > guest is ready. > > > > - Would require substantial rework because we currently listen() as > > > > we parse the command line and don't store the information we'd n= eed > > > > to do it later. =20 > > >=20 > > > Right, that looks like a lot of effort for nothing. > > > =20 > > > > # Don't accept() until the tap connection is ready > > > >=20 > > > > - To the peer, will behave basically the same as this patch - the > > > > host will complete the TCP handshake, then the connection will s= tall > > > > until the guest is ready. =20 > > >=20 > > > Same here. > > > =20 > > > > - More work to implement, because essentially every sock-side hand= ler > > > > has to check fd_tap and abort early =20 > > >=20 > > > There's one substantial issue at TCP level, though, that we're keeping > > > with the current approach and with this patch: we'll accept inbound > > > connections and silently stall them. > > >=20 > > > We could mitigate that by making the TCP handler aware of this, and by > > > resetting the connection if the guest isn't there. This would at least > > > be consistent with the case where the guest isn't listening on the po= rt > > > (we accept(), fail to connect to it, eventually call tcp_rst()). =20 > >=20 > > True. Arguably less consistent with a non-passt-connected peer that's > > not there though. Plus with the silently stall approach we have a > > chance that the TCP connection will recover if the guest attaches > > reasonably soon. > >=20 > > > If we don't do this, I think we should at least check what happens in > > > terms of race conditions between passt starting and the guest appeari= ng > > > and accepting the connection. I guess we'll retry for a bit, which is > > > desirable, but we should check that the whole retrying thing actually > > > works. > > >=20 > > > That's because the current approach just happened by accident. =20 > >=20 > > Right. I'm not entirely sure what concrete action you're suggesting > > at this point, though. >=20 > What I suggested in Monday's call and seemed to be all agreed upon, and > also mentioned above: *check what happens*. >=20 > Try that case, with this patch. >=20 > Does it work to cover situations where users might start passt a bit > before the guest connects, and try to connect to services right away? >=20 > I suggested using ssh which should have a quite long timeout and retry > connecting for a while. You mentioned you would assist Yumei in testing > this if needed. Ah, yes, you're right and I'd forgotten that. Following up today. --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --uh8bkEFNmOJJnt6b Content-Type: application/pgp-signature; name=signature.asc -----BEGIN PGP SIGNATURE----- iQIzBAEBCgAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmjMst8ACgkQzQJF27ox 2GeiWQ//Qv82fZQPNbOACTFxbEWpPJuJeuCybKA+i9hXKd0m0BqUh4YAaenu3ZGD V+N2d8+M7PGkt/wb1dYH3syPGtrqOswEbQxz39sRN3Z33M9K+V1enAhSsRf+Wvbx xdyw6Jkis+2T/me0iFqTa+pqV2TJi+1z432IIgddAqUc8qsHf0D5mD246BJ4g2BQ WRrJ/T7dk37StG7hL9/aCRsUNc9Q9ZPHd5CNex/nxctk7xTN3U7nlI1JlF8/hagb OCVj1zpSt9XelynTYJKiGiaCDOblhnZC9AeGspqrAA4LcHgPBXHGHg0U9WcINXPB LiVhTC3Bk1LQMjMEAh15xPsUF5KOhT9IcDkNZ5ymvpMgsGn5yMyU3x0BMLwwjs7D dJAl9JxwATLvLJ4DO7WKr0xpa2UNL+NkBs4qLdwMnxjaO0Lazr3cZVuOPYhHZn6X VJVlDwAfInvxInucKRTCOllNW7uJLqzu3CHLa5y3V6vLstKWNh9gQtJtOozVntFf xxMjM6NDCreNiWLnOcwEAkn7LL3arQaj3Vk+SEYpX54RHtE7UnPPKfL1QlAJH8UO 4yV9gNmGupdQAyna5+GsUk85WDPSJ7SIlXVt4eYKkXVJMXgSnRBtCUX5sM5z5zT4 SQ3KjqnxLedxaaNln3+EMEXqCGuRpBpBOhlPTgsSpQGWiOSWk5M= =hXrb -----END PGP SIGNATURE----- --uh8bkEFNmOJJnt6b--