From mboxrd@z Thu Jan  1 00:00:00 1970
Authentication-Results: passt.top; dmarc=pass (p=quarantine dis=none) header.from=redhat.com
Authentication-Results: passt.top;
	dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=CYtpixAd;
	dkim-atps=neutral
Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124])
	by passt.top (Postfix) with ESMTPS id 1D6595A0271
	for <passt-dev@passt.top>; Mon, 22 Sep 2025 22:03:40 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
	s=mimecast20190719; t=1758571419;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:content-type:content-type:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=KA9i2k9Hi27SvwyqiZ8WNklWoLTEKtfMLBBMLclTMTk=;
	b=CYtpixAddaTKg9x87GBkT+QGJrwi2oT49qkAz0I8ZgIjXFSjKIjGTAnoJ1e8QQuOonrBXi
	VJ+C9TRGah+EvnZgQ2vtCtCX8L0nn4+HaEEGW6xYGZ3mrrZT+1o2GUkbTHul0y5FYmuvV1
	4vU5J20IgpXH2YhWhH2+SUo5JLvzrzk=
Received: from mail-wm1-f71.google.com (mail-wm1-f71.google.com
 [209.85.128.71]) by relay.mimecast.com with ESMTP with STARTTLS
 (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id
 us-mta-32-FrlR1jtGPcKtiC6Omz9jpA-1; Mon, 22 Sep 2025 16:03:37 -0400
X-MC-Unique: FrlR1jtGPcKtiC6Omz9jpA-1
X-Mimecast-MFC-AGG-ID: FrlR1jtGPcKtiC6Omz9jpA_1758571416
Received: by mail-wm1-f71.google.com with SMTP id 5b1f17b1804b1-45cb604427fso26642945e9.1
        for <passt-dev@passt.top>; Mon, 22 Sep 2025 13:03:37 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1758571416; x=1759176216;
        h=content-transfer-encoding:mime-version:organization:references
         :in-reply-to:message-id:subject:cc:to:from:date:x-gm-message-state
         :from:to:cc:subject:date:message-id:reply-to;
        bh=ee4Vnj6WFcz6kf/rmAQ4urPo1nOxgGW3kZLcTGBnVJ8=;
        b=Fps/NqPomFr71VsHaDkdhg5hGJDOThkK5xpyrvmCPJJbeeQkWsfXqyGbGveJD8G0xl
         qCv3i/ekfh6jXGHWh/0jJLm6T6TTsxVW1GQstO+irNZyHGmA4H1X+I9GIFs8wEWy+Y8Z
         GgWAJkpGk1fLyUuSpWaQEDPk92X8a5Lpom6UDqVKS1JCLBnAPu6vpJK1FV6YQ3DJ4CI3
         bLeQjCqZax2wFECTYgn4ruUfAKiCxdnA+ru1llOwDvtiQolHlg0xQ1Rm+j6stbfTB712
         20SqaxsUuqAE6I5R1jnzBFS+1kb0QN7/x28WsQNHP4Ozxo2Pic+mXl3OpJFZaYlzMC5/
         +Hfw==
X-Forwarded-Encrypted: i=1; AJvYcCWxljXMF3sxoxTTIPloS1SaNmM5gVQSnBDG0vq4UfFO0B18lbvKIMqu9zm/kZJTG8xtvjalP21VfT4=@passt.top
X-Gm-Message-State: AOJu0Yyvg96ppM2uWA6z7bAgguNaY7vnpHS5udvFzY0UdfG+RxMKUur7
	OO1CghpxxF614WKRPVuyMQJ70jUyjkxjQBcbggqnMRyRV4smG1kJ1usTIuoNPFd8GNgFiYBZIOx
	ZN+gdm//8d0d1uz47oAWlIshiMv1kh7bekxTFI/N+VSaYzmn8eh8KCA==
X-Gm-Gg: ASbGnctbfMV8OPZdMoAIWZvUCYyBvNGlUWGzqdGslUipagjslB84aW5CvSRhA9Qjnq7
	JzPfysERQwNk6QkD9qprlOoV8S/IY5IRN6G6eAkiKqD2O/f1zInkj5ADRFI4t8p/XwYOw1SFxRn
	tJso6nKhgTh6vIFiLi8UqMiCGfyAm1deq93uA/goWdhxjdVBXj6gsGUNWjVtsGfQujFV8WvhAHO
	KsK0KsoCO4672RCnbvFLJ9VkzDn3DlE5W89iSQyFMB9JPNycHAwLyO6PpNAlexPIK6LLkon4CM0
	Q0tosfGUOTDcF/sVIuMZUs0nHs/+5K+SG1fEKE2XfRsfSIXyBg8=
X-Received: by 2002:a05:600c:1c05:b0:45b:7ce0:fb98 with SMTP id 5b1f17b1804b1-46e1d97493fmr1837905e9.5.1758571414585;
        Mon, 22 Sep 2025 13:03:34 -0700 (PDT)
X-Google-Smtp-Source: AGHT+IF7IlCDUaUbDZCrRl/U/FEDWpm8ZSYW2AhSTe1Vd5oqxC1iLmWF135PHKgxfHZiVolMx569ig==
X-Received: by 2002:a05:600c:1c05:b0:45b:7ce0:fb98 with SMTP id 5b1f17b1804b1-46e1d97493fmr1837545e9.5.1758571413970;
        Mon, 22 Sep 2025 13:03:33 -0700 (PDT)
Received: from maya.myfinge.rs (ifcgrfdd.trafficplex.cloud. [2a10:fc81:a806:d6a9::1])
        by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-401d7fa1729sm3116007f8f.5.2025.09.22.13.03.33
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Mon, 22 Sep 2025 13:03:33 -0700 (PDT)
Date: Mon, 22 Sep 2025 22:03:30 +0200
From: Stefano Brivio <sbrivio@redhat.com>
To: Yumei Huang <yuhuang@redhat.com>
Subject: Re: [PATCH] tap: Drop frames if no client connected
Message-ID: <20250922220330.436e2b6f@elisabeth>
In-Reply-To: <CANsz47m3hkY0t5wk2vqipuiGwyoVbDAy5u2RCNHFOJCDuxESrw@mail.gmail.com>
References: <20250911085519.24395-1-yuhuang@redhat.com>
	<20250911115425.79eaaac5@elisabeth>
	<aMN_AamYdb0tRytS@zatzit>
	<20250915081319.00e72e53@elisabeth>
	<aMuKdWHJojwS3r8F@zatzit>
	<20250918091714.77192b00@elisabeth>
	<aMyy7z0cd9hexsab@zatzit>
	<CANsz47m3hkY0t5wk2vqipuiGwyoVbDAy5u2RCNHFOJCDuxESrw@mail.gmail.com>
Organization: Red Hat
X-Mailer: Claws Mail 4.2.0 (GTK 3.24.49; x86_64-pc-linux-gnu)
MIME-Version: 1.0
X-Mimecast-Spam-Score: 0
X-Mimecast-MFC-PROC-ID: mEqxHaK6978kU3zrfCzzL3haGGFkJN-nMWJ9NhLJ6-A_1758571416
X-Mimecast-Originator: redhat.com
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Message-ID-Hash: E5WHFJ3WOOBTNH7O7XPKCM7PJWCL6ECK
X-Message-ID-Hash: E5WHFJ3WOOBTNH7O7XPKCM7PJWCL6ECK
X-MailFrom: sbrivio@redhat.com
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header
CC: David Gibson <david@gibson.dropbear.id.au>, passt-dev@passt.top, lvivier@redhat.com
X-Mailman-Version: 3.3.8
Precedence: list
List-Id: Development discussion and patches for passt <passt-dev.passt.top>
Archived-At: <https://archives.passt.top/passt-dev/20250922220330.436e2b6f@elisabeth/>
Archived-At: <https://passt.top/hyperkitty/list/passt-dev@passt.top/message/E5WHFJ3WOOBTNH7O7XPKCM7PJWCL6ECK/>
List-Archive: <https://archives.passt.top/passt-dev/>
List-Archive: <https://passt.top/hyperkitty/list/passt-dev@passt.top/>
List-Help: <mailto:passt-dev-request@passt.top?subject=help>
List-Owner: <mailto:passt-dev-owner@passt.top>
List-Post: <mailto:passt-dev@passt.top>
List-Subscribe: <mailto:passt-dev-join@passt.top>
List-Unsubscribe: <mailto:passt-dev-leave@passt.top>

On Mon, 22 Sep 2025 15:17:12 +0800
Yumei Huang <yuhuang@redhat.com> wrote:

> On Fri, Sep 19, 2025 at 9:38=E2=80=AFAM David Gibson
> <david@gibson.dropbear.id.au> wrote:
> >
> > On Thu, Sep 18, 2025 at 09:17:14AM +0200, Stefano Brivio wrote: =20
> > > On Thu, 18 Sep 2025 14:28:37 +1000
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > =20
> > > > On Mon, Sep 15, 2025 at 08:13:19AM +0200, Stefano Brivio wrote: =20
> > > > > On Fri, 12 Sep 2025 12:01:37 +1000
> > > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > > =20
> > > > > > On Thu, Sep 11, 2025 at 11:54:25AM +0200, Stefano Brivio wrote:=
 =20
> > > > > > > On Thu, 11 Sep 2025 16:55:19 +0800
> > > > > > > Yumei Huang <yuhuang@redhat.com> wrote:
> > > > > > > =20
> > > > > > > > If no client is attached, discard outgoing frames and repor=
t them as
> > > > > > > > sent. This mimics the behavior of a physical host with its =
network
> > > > > > > > cable unplugged.
> > > > > > > >
> > > > > > > > Suggested-by: David Gibson <david@gibson.dropbear.id.au>
> > > > > > > > Signed-off-by: Yumei Huang <yuhuang@redhat.com> =20
> > > > > > >
> > > > > > > Thanks, the fix itself obviously makes sense, but I have a fe=
w questions
> > > > > > > and comments:
> > > > > > >
> > > > > > > - first off, what happens if we don't return early in tap_sen=
d_frames()?
> > > > > > >   Commit messages for fixes (assuming this is a fix) should a=
lways say
> > > > > > >   what concrete problem we had, what is going to be fixed, or=
 if we're
> > > > > > >   not aware of any real issue but things are just fragile / w=
rong =20
> > > > > >
> > > > > > Without this we will get an EBADF in either writev() (pasta) or
> > > > > > sendmsg() (passt).  That's basically harmless, but a bit ugly.
> > > > > > Explicitly catching this case results in behaviour that's proba=
bly a
> > > > > > bit clearer to debug if we hit it.
> > > > > >
> > > > > > Putting that context in the commit message would be useful.
> > > > > > =20
> > > > > > > - until a while ago, this couldn't happen at all. We were jus=
t blocking
> > > > > > >   the whole execution as long as the tap / guest / container =
interface
> > > > > > >   wasn't up and running.
> > > > > > >
> > > > > > >   I wonder when this changed and if it makes sense to go back=
 to the
> > > > > > >   previous behaviour. I had just a quick look and I wonder if=
 I
> > > > > > >   accidentally broke this in c9b241346569 ("conf, passt, tap:=
 Open
> > > > > > >   socket and PID files before switching UID/GID").
> > > > > > >
> > > > > > >   Before that, main() would call tap_sock_init(), which would=
 call
> > > > > > >   tap_sock_unix_open(), a blocking function.
> > > > > > >
> > > > > > >   Should we make the whole thing blocking again? If not, is t=
here
> > > > > > >   anything else that's breaking with that? Timers, other inpu=
ts, etc. =20
> > > > > >
> > > > > > I don't think we can quite do that.  I'm not sure if it's the o=
nly
> > > > > > reason, but for vhost-user I believe we need the epoll loop up =
and
> > > > > > running before we have the tap connection fully set up, because=
 we
> > > > > > need it to process the vhost-user control messages.  Laurent, c=
an you
> > > > > > verify? =20
> > > > >
> > > > > We discussed this in the past, before realising that the executio=
n
> > > > > continues for whatever reason, and probably before I broke the
> > > > > assumption that guest connection was blocking.
> > > > >
> > > > > Yes, in the vhost-user case, the epoll loop needs to run before w=
e have
> > > > > a working connection to the guest, but:
> > > > >
> > > > > - we can anyway block until the control socket is set up (we used=
 to do
> > > > >   that) =20
> > > >
> > > > The vhost-user control socket?  I'm not entirely sure what you mean=
 by
> > > > "block" here.  Since we need the epoll loop up, I don't see how we =
can
> > > > block in the conventional sense. =20
> > >
> > > Let's rather say "until the data setup is complete".
> > >
> > > And by "block", I mean we would ignore any other event, obviously we
> > > have to listen to the control socket (in the main loop or in a separa=
te
> > > dedicated loop). =20
> >
> > Ok.  We could do that.  I don't think the peer visible behaviour would
> > really be different from what we get now silently dropping frames to
> > tap.  I'm not convinced it's really simpler than the current approach
> > either:
> >
> >  * For now, we could just skip all epoll handling if the event type
> >    isn't the control socket, but we'd need to be finer grained about
> >    this if we add anything else that needs handling before guest
> >    connection (e.g. dynamic configuration update mechanism and/or
> >    netlink monitor)
> >
> >  * Ignoring events in that way could lead to us busy-looping on epoll,
> >    because we might not clear events.  So we're back to having to
> >    consider every event type, at least to some extent.
> > =20
> > > I'm not suggesting we do this though (see below). It's just a
> > > possibility.
> > > =20
> > > > =20
> > > > > - the vhost-user implementation autonomously throws data away rec=
eived
> > > > >   before that point =20
> > > >
> > > > Right.  It doesn't have anywhere to put it, so it doesn't have much
> > > > choice.
> > > > =20
> > > > > Now, I don't think we necessarily need to stick to that approach,=
 it
> > > > > was the obvious choice when passt was much simpler, and it keeps =
things
> > > > > simple in the sense that we don't need to care about cases like t=
he
> > > > > ones this patch is addressing.
> > > > >
> > > > > On the other hand, if we want to switch to a different model, we =
need
> > > > > to have a look at other possible breakages, I guess.
> > > > > =20
> > > > > > There are several different approaches we can take here.  I dis=
cussed
> > > > > > some with Yumei and suggested she take this one.  Here's some
> > > > > > reasoning (maybe this would also be useful in the commit messag=
e,
> > > > > > though it's rather bulky)
> > > > > >
> > > > > > # Don't listen() until the tap connection is ready
> > > > > >
> > > > > >  - It's not clear that the host rejecting the connection is bet=
ter
> > > > > >    than the host accepting, then the connection stalling until =
the
> > > > > >    guest is ready.
> > > > > >  - Would require substantial rework because we currently listen=
() as
> > > > > >    we parse the command line and don't store the information we=
'd need
> > > > > >    to do it later. =20
> > > > >
> > > > > Right, that looks like a lot of effort for nothing.
> > > > > =20
> > > > > > # Don't accept() until the tap connection is ready
> > > > > >
> > > > > >  - To the peer, will behave basically the same as this patch - =
the
> > > > > >    host will complete the TCP handshake, then the connection wi=
ll stall
> > > > > >    until the guest is ready. =20
> > > > >
> > > > > Same here.
> > > > > =20
> > > > > >  - More work to implement, because essentially every sock-side =
handler
> > > > > >    has to check fd_tap and abort early =20
> > > > >
> > > > > There's one substantial issue at TCP level, though, that we're ke=
eping
> > > > > with the current approach and with this patch: we'll accept inbou=
nd
> > > > > connections and silently stall them.
> > > > >
> > > > > We could mitigate that by making the TCP handler aware of this, a=
nd by
> > > > > resetting the connection if the guest isn't there. This would at =
least
> > > > > be consistent with the case where the guest isn't listening on th=
e port
> > > > > (we accept(), fail to connect to it, eventually call tcp_rst()). =
=20
> > > >
> > > > True.  Arguably less consistent with a non-passt-connected peer tha=
t's
> > > > not there though.  Plus with the silently stall approach we have a
> > > > chance that the TCP connection will recover if the guest attaches
> > > > reasonably soon.
> > > > =20
> > > > > If we don't do this, I think we should at least check what happen=
s in
> > > > > terms of race conditions between passt starting and the guest app=
earing
> > > > > and accepting the connection. I guess we'll retry for a bit, whic=
h is
> > > > > desirable, but we should check that the whole retrying thing actu=
ally
> > > > > works.
> > > > >
> > > > > That's because the current approach just happened by accident. =
=20
> > > >
> > > > Right.  I'm not entirely sure what concrete action you're suggestin=
g
> > > > at this point, though. =20
> > >
> > > What I suggested in Monday's call and seemed to be all agreed upon, a=
nd
> > > also mentioned above: *check what happens*.
> > >
> > > Try that case, with this patch.
> > >
> > > Does it work to cover situations where users might start passt a bit
> > > before the guest connects, and try to connect to services right away?
> > >
> > > I suggested using ssh which should have a quite long timeout and retr=
y
> > > connecting for a while. You mentioned you would assist Yumei in testi=
ng
> > > this if needed. =20
> >
> > Ah, yes, you're right and I'd forgotten that.  Following up today. =20
>=20
> I tried both 'ssh' and 'socat'(writing a big file) before a guest
> connects, they get a 'Connection reset' after 10s, even if the guest
> connects in ~2s.
> It's because, when start ssh or socat, passt would try to finish the
> tcp handshake with the guest. It sends SYN to the guest immediately
> and waits for SYN-ACK. However, the SYN frame is dropped/lost due to
> no guest connected. So though the guest connects in seconds, the tcp
> handshake would timeout, and returns rst via tcp_rst().

Ah, right. We won't try to resend the SYN, that's simply not
implemented.

The timeout you see is SYN_TIMEOUT, timer set by tcp_timer_ctl() and
handled by tcp_timer_handler().

> Either with or without this patch, they got the same 'connection reset'.
> Maybe it's something to fix?

First off, this shows that the current patch is harmless, so I would go
ahead and apply it (but see 2. below).

Strictly speaking, I don't think we really *need* to fix anything, but
for sure the behaviour isn't ideal. I see two alternatives:

1. we implement a periodic retry for the SYN segment. This would *seem*
   to give the best behaviour in this case, but:

   a. it's quite complicated (we need to calculate some delays for the
      timers, etc.), and not really transparent (which is in general a
      goal of passt)

   b. if the guest never appears, we're just wasting client's time. See
      db2c91ae86c7 ("tcp: Set ACK flag on *all* RST segments, even for
      client in SYN-SENT state") for an example where it's important to
      fail fast

   c. if the guest appears but isn't listening to the port, see b.

2. reset right away as I was suggesting in
   https://archives.passt.top/passt-dev/20250915081319.00e72e53@elisabeth/:

   > We could mitigate that by making the TCP handler aware of this, and by
   > resetting the connection if the guest isn't there. This would at least
   > be consistent with the case where the guest isn't listening on the por=
t
   > (we accept(), fail to connect to it, eventually call tcp_rst()).

   and let the client retry as appropriate (if implemented). Those retries
   can be quite fast, see this report (from IRC) for 722d347c1932 ("tcp:
   Don't reset outbound connection on SYN retries"):

3.3223:          pasta: epoll event on /dev/net/tun device 18 (events: 0x00=
000001)
3.3223:          pasta: epoll event on /dev/net/tun device 18 (events: 0x00=
000001)
3.3224:          tap: protocol 6, 192.168.122.14:55532 -> 192.0.0.1:80 (1 p=
acket)
3.3224:          Flow 0 (NEW): FREE -> NEW
3.3224:          Flow 0 (INI): NEW -> INI
3.3224:          Flow 0 (INI): TAP [192.168.122.14]:55532 -> [192.0.0.1]:80=
 =3D> ?
3.3224:          Flow 0 (TGT): INI -> TGT
3.3224:          Flow 0 (TGT): TAP [192.168.122.14]:55532 -> [192.0.0.1]:80=
 =3D> HOST [0.0.0.0]:0 -> [192.0.0.1]:80
3.3224:          Flow 0 (TCP connection): TGT -> TYPED
3.3224:          Flow 0 (TCP connection): TAP [192.168.122.14]:55532 -> [19=
2.0.0.1]:80 =3D> HOST [0.0.0.0]:0 -> [192.0.0.1]:80
3.3224:          Flow 0 (TCP connection): event at tcp_conn_from_tap:1489
3.3224:          Flow 0 (TCP connection): TAP_SYN_RCVD: CLOSED -> SYN_SENT
3.3224:          Flow 0 (TCP connection): failed to set TCP_MAXSEG on socke=
t 21
3.3224:          Flow 0 (TCP connection): Side 0 hash table insert: bucket:=
 294539
3.3225:          Flow 0 (TCP connection): TYPED -> ACTIVE
3.3225:          Flow 0 (TCP connection): TAP [192.168.122.14]:55532 -> [19=
2.0.0.1]:80 =3D> HOST [0.0.0.0]:0 -> [192.0.0.1]:80
4.0027:          pasta: epoll event on namespace timer watch 17 (events: 0x=
00000001)
4.3612:          pasta: epoll event on /dev/net/tun device 18 (events: 0x00=
000001)
4.3613:          tap: protocol 6, 192.168.122.14:55532 -> 192.0.0.1:80 (1 p=
acket)
4.3613:          Flow 0 (TCP connection): packet length 40 from tap
4.3613:          Flow 0 (TCP connection): TCP reset at tcp_tap_handler:1989
4.3613:          Flow 0 (TCP connection): flag at tcp_prepare_flags:1163
4.3613:          Flow 0 (TCP connection): event at tcp_rst_do:1206
4.3613:          Flow 0 (TCP connection): CLOSED: SYN_SENT -> CLOSED
4.3614:          Flow 0 (TCP connection): Side 0 hash table remove: bucket:=
 294539
4.3614:          Flow 0 (FREE): ACTIVE -> FREE
4.3614:          Flow 0 (FREE): TAP [192.168.122.14]:55532 -> [192.0.0.1]:8=
0 =3D> HOST [0.0.0.0]:0 -> [192.0.0.1]:80

   ...the retry happened within one second. This is a container, so Linux
   kernel, and the client was wget.

So, in the end, I would suggest going with 2.: check if the guest /
container is connected in the TCP handler (tcp_data_from_sock()) and
reset the connection if it's not.

I would suggest checking that together with this patch. They would
still be two different patches, but I think it would be good to
check / test what happens with both of them.

--=20
Stefano