From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: passt.top; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=CYtpixAd; dkim-atps=neutral Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by passt.top (Postfix) with ESMTPS id 1D6595A0271 for ; Mon, 22 Sep 2025 22:03:40 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1758571419; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=KA9i2k9Hi27SvwyqiZ8WNklWoLTEKtfMLBBMLclTMTk=; b=CYtpixAddaTKg9x87GBkT+QGJrwi2oT49qkAz0I8ZgIjXFSjKIjGTAnoJ1e8QQuOonrBXi VJ+C9TRGah+EvnZgQ2vtCtCX8L0nn4+HaEEGW6xYGZ3mrrZT+1o2GUkbTHul0y5FYmuvV1 4vU5J20IgpXH2YhWhH2+SUo5JLvzrzk= Received: from mail-wm1-f71.google.com (mail-wm1-f71.google.com [209.85.128.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-32-FrlR1jtGPcKtiC6Omz9jpA-1; Mon, 22 Sep 2025 16:03:37 -0400 X-MC-Unique: FrlR1jtGPcKtiC6Omz9jpA-1 X-Mimecast-MFC-AGG-ID: FrlR1jtGPcKtiC6Omz9jpA_1758571416 Received: by mail-wm1-f71.google.com with SMTP id 5b1f17b1804b1-45cb604427fso26642945e9.1 for ; Mon, 22 Sep 2025 13:03:37 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1758571416; x=1759176216; h=content-transfer-encoding:mime-version:organization:references :in-reply-to:message-id:subject:cc:to:from:date:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=ee4Vnj6WFcz6kf/rmAQ4urPo1nOxgGW3kZLcTGBnVJ8=; b=Fps/NqPomFr71VsHaDkdhg5hGJDOThkK5xpyrvmCPJJbeeQkWsfXqyGbGveJD8G0xl qCv3i/ekfh6jXGHWh/0jJLm6T6TTsxVW1GQstO+irNZyHGmA4H1X+I9GIFs8wEWy+Y8Z GgWAJkpGk1fLyUuSpWaQEDPk92X8a5Lpom6UDqVKS1JCLBnAPu6vpJK1FV6YQ3DJ4CI3 bLeQjCqZax2wFECTYgn4ruUfAKiCxdnA+ru1llOwDvtiQolHlg0xQ1Rm+j6stbfTB712 20SqaxsUuqAE6I5R1jnzBFS+1kb0QN7/x28WsQNHP4Ozxo2Pic+mXl3OpJFZaYlzMC5/ +Hfw== X-Forwarded-Encrypted: i=1; AJvYcCWxljXMF3sxoxTTIPloS1SaNmM5gVQSnBDG0vq4UfFO0B18lbvKIMqu9zm/kZJTG8xtvjalP21VfT4=@passt.top X-Gm-Message-State: AOJu0Yyvg96ppM2uWA6z7bAgguNaY7vnpHS5udvFzY0UdfG+RxMKUur7 OO1CghpxxF614WKRPVuyMQJ70jUyjkxjQBcbggqnMRyRV4smG1kJ1usTIuoNPFd8GNgFiYBZIOx ZN+gdm//8d0d1uz47oAWlIshiMv1kh7bekxTFI/N+VSaYzmn8eh8KCA== X-Gm-Gg: ASbGnctbfMV8OPZdMoAIWZvUCYyBvNGlUWGzqdGslUipagjslB84aW5CvSRhA9Qjnq7 JzPfysERQwNk6QkD9qprlOoV8S/IY5IRN6G6eAkiKqD2O/f1zInkj5ADRFI4t8p/XwYOw1SFxRn tJso6nKhgTh6vIFiLi8UqMiCGfyAm1deq93uA/goWdhxjdVBXj6gsGUNWjVtsGfQujFV8WvhAHO KsK0KsoCO4672RCnbvFLJ9VkzDn3DlE5W89iSQyFMB9JPNycHAwLyO6PpNAlexPIK6LLkon4CM0 Q0tosfGUOTDcF/sVIuMZUs0nHs/+5K+SG1fEKE2XfRsfSIXyBg8= X-Received: by 2002:a05:600c:1c05:b0:45b:7ce0:fb98 with SMTP id 5b1f17b1804b1-46e1d97493fmr1837905e9.5.1758571414585; Mon, 22 Sep 2025 13:03:34 -0700 (PDT) X-Google-Smtp-Source: AGHT+IF7IlCDUaUbDZCrRl/U/FEDWpm8ZSYW2AhSTe1Vd5oqxC1iLmWF135PHKgxfHZiVolMx569ig== X-Received: by 2002:a05:600c:1c05:b0:45b:7ce0:fb98 with SMTP id 5b1f17b1804b1-46e1d97493fmr1837545e9.5.1758571413970; Mon, 22 Sep 2025 13:03:33 -0700 (PDT) Received: from maya.myfinge.rs (ifcgrfdd.trafficplex.cloud. [2a10:fc81:a806:d6a9::1]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-401d7fa1729sm3116007f8f.5.2025.09.22.13.03.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Sep 2025 13:03:33 -0700 (PDT) Date: Mon, 22 Sep 2025 22:03:30 +0200 From: Stefano Brivio To: Yumei Huang Subject: Re: [PATCH] tap: Drop frames if no client connected Message-ID: <20250922220330.436e2b6f@elisabeth> In-Reply-To: References: <20250911085519.24395-1-yuhuang@redhat.com> <20250911115425.79eaaac5@elisabeth> <20250915081319.00e72e53@elisabeth> <20250918091714.77192b00@elisabeth> Organization: Red Hat X-Mailer: Claws Mail 4.2.0 (GTK 3.24.49; x86_64-pc-linux-gnu) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: mEqxHaK6978kU3zrfCzzL3haGGFkJN-nMWJ9NhLJ6-A_1758571416 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Message-ID-Hash: E5WHFJ3WOOBTNH7O7XPKCM7PJWCL6ECK X-Message-ID-Hash: E5WHFJ3WOOBTNH7O7XPKCM7PJWCL6ECK X-MailFrom: sbrivio@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: David Gibson , passt-dev@passt.top, lvivier@redhat.com X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Mon, 22 Sep 2025 15:17:12 +0800 Yumei Huang wrote: > On Fri, Sep 19, 2025 at 9:38=E2=80=AFAM David Gibson > wrote: > > > > On Thu, Sep 18, 2025 at 09:17:14AM +0200, Stefano Brivio wrote: =20 > > > On Thu, 18 Sep 2025 14:28:37 +1000 > > > David Gibson wrote: > > > =20 > > > > On Mon, Sep 15, 2025 at 08:13:19AM +0200, Stefano Brivio wrote: =20 > > > > > On Fri, 12 Sep 2025 12:01:37 +1000 > > > > > David Gibson wrote: > > > > > =20 > > > > > > On Thu, Sep 11, 2025 at 11:54:25AM +0200, Stefano Brivio wrote:= =20 > > > > > > > On Thu, 11 Sep 2025 16:55:19 +0800 > > > > > > > Yumei Huang wrote: > > > > > > > =20 > > > > > > > > If no client is attached, discard outgoing frames and repor= t them as > > > > > > > > sent. This mimics the behavior of a physical host with its = network > > > > > > > > cable unplugged. > > > > > > > > > > > > > > > > Suggested-by: David Gibson > > > > > > > > Signed-off-by: Yumei Huang =20 > > > > > > > > > > > > > > Thanks, the fix itself obviously makes sense, but I have a fe= w questions > > > > > > > and comments: > > > > > > > > > > > > > > - first off, what happens if we don't return early in tap_sen= d_frames()? > > > > > > > Commit messages for fixes (assuming this is a fix) should a= lways say > > > > > > > what concrete problem we had, what is going to be fixed, or= if we're > > > > > > > not aware of any real issue but things are just fragile / w= rong =20 > > > > > > > > > > > > Without this we will get an EBADF in either writev() (pasta) or > > > > > > sendmsg() (passt). That's basically harmless, but a bit ugly. > > > > > > Explicitly catching this case results in behaviour that's proba= bly a > > > > > > bit clearer to debug if we hit it. > > > > > > > > > > > > Putting that context in the commit message would be useful. > > > > > > =20 > > > > > > > - until a while ago, this couldn't happen at all. We were jus= t blocking > > > > > > > the whole execution as long as the tap / guest / container = interface > > > > > > > wasn't up and running. > > > > > > > > > > > > > > I wonder when this changed and if it makes sense to go back= to the > > > > > > > previous behaviour. I had just a quick look and I wonder if= I > > > > > > > accidentally broke this in c9b241346569 ("conf, passt, tap:= Open > > > > > > > socket and PID files before switching UID/GID"). > > > > > > > > > > > > > > Before that, main() would call tap_sock_init(), which would= call > > > > > > > tap_sock_unix_open(), a blocking function. > > > > > > > > > > > > > > Should we make the whole thing blocking again? If not, is t= here > > > > > > > anything else that's breaking with that? Timers, other inpu= ts, etc. =20 > > > > > > > > > > > > I don't think we can quite do that. I'm not sure if it's the o= nly > > > > > > reason, but for vhost-user I believe we need the epoll loop up = and > > > > > > running before we have the tap connection fully set up, because= we > > > > > > need it to process the vhost-user control messages. Laurent, c= an you > > > > > > verify? =20 > > > > > > > > > > We discussed this in the past, before realising that the executio= n > > > > > continues for whatever reason, and probably before I broke the > > > > > assumption that guest connection was blocking. > > > > > > > > > > Yes, in the vhost-user case, the epoll loop needs to run before w= e have > > > > > a working connection to the guest, but: > > > > > > > > > > - we can anyway block until the control socket is set up (we used= to do > > > > > that) =20 > > > > > > > > The vhost-user control socket? I'm not entirely sure what you mean= by > > > > "block" here. Since we need the epoll loop up, I don't see how we = can > > > > block in the conventional sense. =20 > > > > > > Let's rather say "until the data setup is complete". > > > > > > And by "block", I mean we would ignore any other event, obviously we > > > have to listen to the control socket (in the main loop or in a separa= te > > > dedicated loop). =20 > > > > Ok. We could do that. I don't think the peer visible behaviour would > > really be different from what we get now silently dropping frames to > > tap. I'm not convinced it's really simpler than the current approach > > either: > > > > * For now, we could just skip all epoll handling if the event type > > isn't the control socket, but we'd need to be finer grained about > > this if we add anything else that needs handling before guest > > connection (e.g. dynamic configuration update mechanism and/or > > netlink monitor) > > > > * Ignoring events in that way could lead to us busy-looping on epoll, > > because we might not clear events. So we're back to having to > > consider every event type, at least to some extent. > > =20 > > > I'm not suggesting we do this though (see below). It's just a > > > possibility. > > > =20 > > > > =20 > > > > > - the vhost-user implementation autonomously throws data away rec= eived > > > > > before that point =20 > > > > > > > > Right. It doesn't have anywhere to put it, so it doesn't have much > > > > choice. > > > > =20 > > > > > Now, I don't think we necessarily need to stick to that approach,= it > > > > > was the obvious choice when passt was much simpler, and it keeps = things > > > > > simple in the sense that we don't need to care about cases like t= he > > > > > ones this patch is addressing. > > > > > > > > > > On the other hand, if we want to switch to a different model, we = need > > > > > to have a look at other possible breakages, I guess. > > > > > =20 > > > > > > There are several different approaches we can take here. I dis= cussed > > > > > > some with Yumei and suggested she take this one. Here's some > > > > > > reasoning (maybe this would also be useful in the commit messag= e, > > > > > > though it's rather bulky) > > > > > > > > > > > > # Don't listen() until the tap connection is ready > > > > > > > > > > > > - It's not clear that the host rejecting the connection is bet= ter > > > > > > than the host accepting, then the connection stalling until = the > > > > > > guest is ready. > > > > > > - Would require substantial rework because we currently listen= () as > > > > > > we parse the command line and don't store the information we= 'd need > > > > > > to do it later. =20 > > > > > > > > > > Right, that looks like a lot of effort for nothing. > > > > > =20 > > > > > > # Don't accept() until the tap connection is ready > > > > > > > > > > > > - To the peer, will behave basically the same as this patch - = the > > > > > > host will complete the TCP handshake, then the connection wi= ll stall > > > > > > until the guest is ready. =20 > > > > > > > > > > Same here. > > > > > =20 > > > > > > - More work to implement, because essentially every sock-side = handler > > > > > > has to check fd_tap and abort early =20 > > > > > > > > > > There's one substantial issue at TCP level, though, that we're ke= eping > > > > > with the current approach and with this patch: we'll accept inbou= nd > > > > > connections and silently stall them. > > > > > > > > > > We could mitigate that by making the TCP handler aware of this, a= nd by > > > > > resetting the connection if the guest isn't there. This would at = least > > > > > be consistent with the case where the guest isn't listening on th= e port > > > > > (we accept(), fail to connect to it, eventually call tcp_rst()). = =20 > > > > > > > > True. Arguably less consistent with a non-passt-connected peer tha= t's > > > > not there though. Plus with the silently stall approach we have a > > > > chance that the TCP connection will recover if the guest attaches > > > > reasonably soon. > > > > =20 > > > > > If we don't do this, I think we should at least check what happen= s in > > > > > terms of race conditions between passt starting and the guest app= earing > > > > > and accepting the connection. I guess we'll retry for a bit, whic= h is > > > > > desirable, but we should check that the whole retrying thing actu= ally > > > > > works. > > > > > > > > > > That's because the current approach just happened by accident. = =20 > > > > > > > > Right. I'm not entirely sure what concrete action you're suggestin= g > > > > at this point, though. =20 > > > > > > What I suggested in Monday's call and seemed to be all agreed upon, a= nd > > > also mentioned above: *check what happens*. > > > > > > Try that case, with this patch. > > > > > > Does it work to cover situations where users might start passt a bit > > > before the guest connects, and try to connect to services right away? > > > > > > I suggested using ssh which should have a quite long timeout and retr= y > > > connecting for a while. You mentioned you would assist Yumei in testi= ng > > > this if needed. =20 > > > > Ah, yes, you're right and I'd forgotten that. Following up today. =20 >=20 > I tried both 'ssh' and 'socat'(writing a big file) before a guest > connects, they get a 'Connection reset' after 10s, even if the guest > connects in ~2s. > It's because, when start ssh or socat, passt would try to finish the > tcp handshake with the guest. It sends SYN to the guest immediately > and waits for SYN-ACK. However, the SYN frame is dropped/lost due to > no guest connected. So though the guest connects in seconds, the tcp > handshake would timeout, and returns rst via tcp_rst(). Ah, right. We won't try to resend the SYN, that's simply not implemented. The timeout you see is SYN_TIMEOUT, timer set by tcp_timer_ctl() and handled by tcp_timer_handler(). > Either with or without this patch, they got the same 'connection reset'. > Maybe it's something to fix? First off, this shows that the current patch is harmless, so I would go ahead and apply it (but see 2. below). Strictly speaking, I don't think we really *need* to fix anything, but for sure the behaviour isn't ideal. I see two alternatives: 1. we implement a periodic retry for the SYN segment. This would *seem* to give the best behaviour in this case, but: a. it's quite complicated (we need to calculate some delays for the timers, etc.), and not really transparent (which is in general a goal of passt) b. if the guest never appears, we're just wasting client's time. See db2c91ae86c7 ("tcp: Set ACK flag on *all* RST segments, even for client in SYN-SENT state") for an example where it's important to fail fast c. if the guest appears but isn't listening to the port, see b. 2. reset right away as I was suggesting in https://archives.passt.top/passt-dev/20250915081319.00e72e53@elisabeth/: > We could mitigate that by making the TCP handler aware of this, and by > resetting the connection if the guest isn't there. This would at least > be consistent with the case where the guest isn't listening on the por= t > (we accept(), fail to connect to it, eventually call tcp_rst()). and let the client retry as appropriate (if implemented). Those retries can be quite fast, see this report (from IRC) for 722d347c1932 ("tcp: Don't reset outbound connection on SYN retries"): 3.3223: pasta: epoll event on /dev/net/tun device 18 (events: 0x00= 000001) 3.3223: pasta: epoll event on /dev/net/tun device 18 (events: 0x00= 000001) 3.3224: tap: protocol 6, 192.168.122.14:55532 -> 192.0.0.1:80 (1 p= acket) 3.3224: Flow 0 (NEW): FREE -> NEW 3.3224: Flow 0 (INI): NEW -> INI 3.3224: Flow 0 (INI): TAP [192.168.122.14]:55532 -> [192.0.0.1]:80= =3D> ? 3.3224: Flow 0 (TGT): INI -> TGT 3.3224: Flow 0 (TGT): TAP [192.168.122.14]:55532 -> [192.0.0.1]:80= =3D> HOST [0.0.0.0]:0 -> [192.0.0.1]:80 3.3224: Flow 0 (TCP connection): TGT -> TYPED 3.3224: Flow 0 (TCP connection): TAP [192.168.122.14]:55532 -> [19= 2.0.0.1]:80 =3D> HOST [0.0.0.0]:0 -> [192.0.0.1]:80 3.3224: Flow 0 (TCP connection): event at tcp_conn_from_tap:1489 3.3224: Flow 0 (TCP connection): TAP_SYN_RCVD: CLOSED -> SYN_SENT 3.3224: Flow 0 (TCP connection): failed to set TCP_MAXSEG on socke= t 21 3.3224: Flow 0 (TCP connection): Side 0 hash table insert: bucket:= 294539 3.3225: Flow 0 (TCP connection): TYPED -> ACTIVE 3.3225: Flow 0 (TCP connection): TAP [192.168.122.14]:55532 -> [19= 2.0.0.1]:80 =3D> HOST [0.0.0.0]:0 -> [192.0.0.1]:80 4.0027: pasta: epoll event on namespace timer watch 17 (events: 0x= 00000001) 4.3612: pasta: epoll event on /dev/net/tun device 18 (events: 0x00= 000001) 4.3613: tap: protocol 6, 192.168.122.14:55532 -> 192.0.0.1:80 (1 p= acket) 4.3613: Flow 0 (TCP connection): packet length 40 from tap 4.3613: Flow 0 (TCP connection): TCP reset at tcp_tap_handler:1989 4.3613: Flow 0 (TCP connection): flag at tcp_prepare_flags:1163 4.3613: Flow 0 (TCP connection): event at tcp_rst_do:1206 4.3613: Flow 0 (TCP connection): CLOSED: SYN_SENT -> CLOSED 4.3614: Flow 0 (TCP connection): Side 0 hash table remove: bucket:= 294539 4.3614: Flow 0 (FREE): ACTIVE -> FREE 4.3614: Flow 0 (FREE): TAP [192.168.122.14]:55532 -> [192.0.0.1]:8= 0 =3D> HOST [0.0.0.0]:0 -> [192.0.0.1]:80 ...the retry happened within one second. This is a container, so Linux kernel, and the client was wget. So, in the end, I would suggest going with 2.: check if the guest / container is connected in the TCP handler (tcp_data_from_sock()) and reset the connection if it's not. I would suggest checking that together with this patch. They would still be two different patches, but I think it would be good to check / test what happens with both of them. --=20 Stefano