From mboxrd@z Thu Jan  1 00:00:00 1970
Authentication-Results: passt.top; dmarc=pass (p=quarantine dis=none) header.from=redhat.com
Authentication-Results: passt.top;
	dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=VgZwQGBl;
	dkim-atps=neutral
Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124])
	by passt.top (Postfix) with ESMTPS id 5A61A5A026F
	for <passt-dev@passt.top>; Tue, 23 Sep 2025 13:00:46 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
	s=mimecast20190719; t=1758625245;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:content-type:content-type:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=ZhytOwM6EPuHWtkwhYmOs4K3r46CtoU0eZb9PUsh2hY=;
	b=VgZwQGBlQQfDoF5nhlUV+2GXmf77/IwDQcNRzAyYNcS3MrqqCDD1cU94oav1YzYT5mAd9t
	ZnfGde3OSB/ai/21jsbIKYhJTeZTyNztFOaAmE1Sl5yk/8/Vv6IZjxWJkZtcmcGn/k0Pwy
	Ce27kAvzn8tdSrqJ1R29OyYNwKtlpNg=
Received: from mail-wr1-f72.google.com (mail-wr1-f72.google.com
 [209.85.221.72]) by relay.mimecast.com with ESMTP with STARTTLS
 (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id
 us-mta-49-MkoRF4pNO9-xlrW4O63L_w-1; Tue, 23 Sep 2025 07:00:43 -0400
X-MC-Unique: MkoRF4pNO9-xlrW4O63L_w-1
X-Mimecast-MFC-AGG-ID: MkoRF4pNO9-xlrW4O63L_w_1758625242
Received: by mail-wr1-f72.google.com with SMTP id ffacd0b85a97d-3efa77de998so2420551f8f.0
        for <passt-dev@passt.top>; Tue, 23 Sep 2025 04:00:43 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1758625242; x=1759230042;
        h=content-transfer-encoding:mime-version:organization:references
         :in-reply-to:message-id:subject:cc:to:from:date:x-gm-message-state
         :from:to:cc:subject:date:message-id:reply-to;
        bh=6zTfi7fsh+QTup8+Ftpb61OKF6PrRmL01p3Jk6d94yw=;
        b=eRKd/YnVgntG/hIURY5Dcdr3Vflmj1T5K/fry/APp3iQrLSsRJq1k6OcgGJapT3MlM
         P6/iO9iMRlH0v3WdNOmV3GsI0SYIgdb+OlUTeYZSsY7mkVJXS1puW62u7nEcgTk3zPW7
         AOwMFnyOMO5OxTrwCIyUS7bnJhLkxzbciqHfvGLH2/ObSlj7ejU7rnv9GeCS1f+b8WZF
         PxSC5h6mg70DgE9ImUpW1qw22ENgOb4J6raMX1C4m2/uRNqK4vUlZnfOprHlOF+0WrEa
         qTgFoQEM2ReuXVdx6Nee2MQ5Js5w/jLpy2rXvI59bdM7jDOhaZyev6Uhpumma8juc0Sv
         oy2A==
X-Forwarded-Encrypted: i=1; AJvYcCVa+hDXfRNPe5z9+wopC03gymu2G0c6vsLUWuUIQCnKI8a42hixyh7+AviO9s2lEo0FLoDCP7XhdQo=@passt.top
X-Gm-Message-State: AOJu0YxP3RX2d0jjm/GvIzZ7NtztpS1JtaMKyNePA1Vin+hLQRDGmcJ8
	68mAtkpXRMafmD9fBU/Hwr1rqm3cH2SKgrwBKqbiuhR1YhGJNL0zEWpnLjAvPiabpFSGug3d8m9
	QBmcVWEDOZ+nj8MJW19Dy0NVid0g2ksRNuUVEqO2J7hLjC8bAS1fJCg==
X-Gm-Gg: ASbGncunUzsE5QM0MeoWXBmyntoCPLm4pcFLdhtZR+obCgh52FOizYdatYpnOwoNfiH
	e018g/usAlf7eac/xKBmJbdPSa+1i9YAyQDykk6+co+r2EaLHG0O8vzNMcgjjeuNJMcwM+/xa+R
	GYJl+6nCoeeN641hw90tGKTHLBygLxmpqWiKjGQ8cGs+wa8gMhBJO5Tvj4oZjuBmInO7gd0OOVH
	CrxdfrCeQ3QSzo3Y9/VFLMI6+ECthOSskaalZaRKe5IClAQtjSuqeRJsnDVpWt3q2Zsafq7Idcy
	t714uZj9V3OhKa4wt7eNObdFT38Tnb8surO8PXSaExQ6kk3Fh+DjBSsukZXEajCJGvPj
X-Received: by 2002:a5d:528f:0:b0:3ee:1118:df81 with SMTP id ffacd0b85a97d-405c4a973ebmr1422956f8f.13.1758625242055;
        Tue, 23 Sep 2025 04:00:42 -0700 (PDT)
X-Google-Smtp-Source: AGHT+IH9RvIG1fMVPwg9nMsFogl395ZgRoSSkhcKLGUxOPWFs6TJe9KkDzA8R0YnIihjPx74jzX51A==
X-Received: by 2002:a5d:528f:0:b0:3ee:1118:df81 with SMTP id ffacd0b85a97d-405c4a973ebmr1422909f8f.13.1758625241389;
        Tue, 23 Sep 2025 04:00:41 -0700 (PDT)
Received: from maya.myfinge.rs (ifcgrfdd.trafficplex.cloud. [176.103.220.4])
        by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-3f02f332c31sm17605984f8f.45.2025.09.23.04.00.40
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Tue, 23 Sep 2025 04:00:40 -0700 (PDT)
Date: Tue, 23 Sep 2025 13:00:39 +0200
From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Subject: Re: [PATCH] tap: Drop frames if no client connected
Message-ID: <20250923130039.41e8ef8d@elisabeth>
In-Reply-To: <aNJR5e1iEH9jZVPQ@zatzit>
References: <20250911085519.24395-1-yuhuang@redhat.com>
	<20250911115425.79eaaac5@elisabeth>
	<aMN_AamYdb0tRytS@zatzit>
	<20250915081319.00e72e53@elisabeth>
	<aMuKdWHJojwS3r8F@zatzit>
	<20250918091714.77192b00@elisabeth>
	<aMyy7z0cd9hexsab@zatzit>
	<CANsz47m3hkY0t5wk2vqipuiGwyoVbDAy5u2RCNHFOJCDuxESrw@mail.gmail.com>
	<20250922220330.436e2b6f@elisabeth>
	<aNJR5e1iEH9jZVPQ@zatzit>
Organization: Red Hat
X-Mailer: Claws Mail 4.2.0 (GTK 3.24.49; x86_64-pc-linux-gnu)
MIME-Version: 1.0
X-Mimecast-Spam-Score: 0
X-Mimecast-MFC-PROC-ID: Mr-fyNY_YNzIJypUwM3JsfPv6EcnCfe0vmnP04-HMo8_1758625242
X-Mimecast-Originator: redhat.com
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Message-ID-Hash: CVKZETRW5DPCFHZIIPTCKRMFIPWCY5ZY
X-Message-ID-Hash: CVKZETRW5DPCFHZIIPTCKRMFIPWCY5ZY
X-MailFrom: sbrivio@redhat.com
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header
CC: Yumei Huang <yuhuang@redhat.com>, passt-dev@passt.top, lvivier@redhat.com
X-Mailman-Version: 3.3.8
Precedence: list
List-Id: Development discussion and patches for passt <passt-dev.passt.top>
Archived-At: <https://archives.passt.top/passt-dev/20250923130039.41e8ef8d@elisabeth/>
Archived-At: <https://passt.top/hyperkitty/list/passt-dev@passt.top/message/CVKZETRW5DPCFHZIIPTCKRMFIPWCY5ZY/>
List-Archive: <https://archives.passt.top/passt-dev/>
List-Archive: <https://passt.top/hyperkitty/list/passt-dev@passt.top/>
List-Help: <mailto:passt-dev-request@passt.top?subject=help>
List-Owner: <mailto:passt-dev-owner@passt.top>
List-Post: <mailto:passt-dev@passt.top>
List-Subscribe: <mailto:passt-dev-join@passt.top>
List-Unsubscribe: <mailto:passt-dev-leave@passt.top>

On Tue, 23 Sep 2025 17:53:09 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Mon, Sep 22, 2025 at 10:03:30PM +0200, Stefano Brivio wrote:
> > On Mon, 22 Sep 2025 15:17:12 +0800
> > Yumei Huang <yuhuang@redhat.com> wrote: =20
> > > On Fri, Sep 19, 2025 at 9:38=E2=80=AFAM David Gibson
> > > <david@gibson.dropbear.id.au> wrote: =20
> > > > On Thu, Sep 18, 2025 at 09:17:14AM +0200, Stefano Brivio wrote:   =
=20
> > > > > On Thu, 18 Sep 2025 14:28:37 +1000
> > > > > David Gibson <david@gibson.dropbear.id.au> wrote: =20
> [snip]
> > > > > Does it work to cover situations where users might start passt a =
bit
> > > > > before the guest connects, and try to connect to services right a=
way?
> > > > >
> > > > > I suggested using ssh which should have a quite long timeout and =
retry
> > > > > connecting for a while. You mentioned you would assist Yumei in t=
esting
> > > > > this if needed.   =20
> > > >
> > > > Ah, yes, you're right and I'd forgotten that.  Following up today. =
  =20
> > >=20
> > > I tried both 'ssh' and 'socat'(writing a big file) before a guest
> > > connects, they get a 'Connection reset' after 10s, even if the guest
> > > connects in ~2s.
> > > It's because, when ssh or socat starts, passt tries to finish the
> > > TCP handshake with the guest. It sends a SYN to the guest immediately
> > > and waits for a SYN-ACK. However, the SYN frame is dropped/lost
> > > because no guest is connected. So although the guest connects in
> > > seconds, the TCP handshake times out, and passt sends a RST via
> > > tcp_rst(). =20
> >=20
> > Ah, right. We won't try to resend the SYN, that's simply not
> > implemented.
> >=20
> > The timeout you see is SYN_TIMEOUT, timer set by tcp_timer_ctl() and
> > handled by tcp_timer_handler().
> >  =20
> > > Either with or without this patch, they got the same 'connection rese=
t'.
> > > Maybe it's something to fix? =20
> >=20
> > First off, this shows that the current patch is harmless, so I would go
> > ahead and apply it (but see 2. below).
> >=20
> > Strictly speaking, I don't think we really *need* to fix anything, but
> > for sure the behaviour isn't ideal. I see two alternatives:
> >=20
> > 1. we implement a periodic retry for the SYN segment. This would *seem*
> >    to give the best behaviour in this case, but:
> >=20
> >    a. it's quite complicated (we need to calculate some delays for the
> >       timers, etc.), and not really transparent (which is in general a
> >       goal of passt) =20
>=20
> I'm not really sure why you say it's not transparent, or at least what
> other option you're comparing it to.  The peer has initiated a
> connection to us in the normal way (which may include resending SYNs).
> Now we're initiating a connection to the guest in the normal way
> (which may include resending SYNs).

I was comparing this to b. or to doing nothing.

But, actually, you're right, the kernel wouldn't tell us about a
repeated SYN, it would still be the same socket returned from accept(),
so it's not necessarily less transparent.

I was thinking that we know when the guest connects, so we could just
delay the SYN segment until then, by introducing a separate TAP_SYN_SENT
event (right now it's implicit in SOCK_ACCEPTED). But when the guest
connects, services are typically not up yet. You would typically get a
RST while the guest is booting.

> >    b. if the guest never appears, we're just wasting client's time. See
> >       db2c91ae86c7 ("tcp: Set ACK flag on *all* RST segments, even for
> >       client in SYN-SENT state") for an example where it's important to
> >       fail fast =20
>=20
> Sure.  I'd say RSTing here would be *less* transparent, but it might
> still be worth it to make the peer fail fast.

But that's what happens naturally (with Linux) if nobody is listening,
and in RFC 9293 terms, I'd say we should approximate a CLOSED state,
3.10.7.1:

  If the state is CLOSED (i.e., TCB does not exist), then [...] [a]n
  incoming segment not containing a RST causes a RST to be sent in response=
.

rather than a LISTEN state (3.10.7.2). However, see below.

> > 2. reset right away as I was suggesting in
> >    https://archives.passt.top/passt-dev/20250915081319.00e72e53@elisabe=
th/:
> >  =20
> >    > We could mitigate that by making the TCP handler aware of this, an=
d by
> >    > resetting the connection if the guest isn't there. This would at l=
east
> >    > be consistent with the case where the guest isn't listening on the=
 port
> >    > (we accept(), fail to connect to it, eventually call tcp_rst()). =
=20
> >=20
> >    and let the client retry as appropriate (if implemented). Those retr=
ies
> >    can be quite fast, see this report (from IRC) for 722d347c1932 ("tcp=
:
> >    Don't reset outbound connection on SYN retries"): =20
>=20
> I don't see how that commit is relevant to this situation.  That's
> talking about SYN retries.

That's just an example of how SYN segments are retried. It's not
otherwise relevant to this situation.

> We can see those in the case of outbound
> connections but we'll never see them for the case of inbound
> connections, because the host kernel has already completed the
> handshake.  For inbound we essentially have two options:
>=20
>  a) Retry SYNs ourselves, emulating what the peer would do if it was
>     talking directly to an absent guest.
>  b) Reject SYNs quickly, trusting that the guest will have some sort of
>     application level retry.  That will depend on the client.  I guess
>     my fear here is that a client seeing a completed handshake + RST
>     might assume that the guest server is permanently broken, rather
>     than just temporarily missing as it might if there's no response at
>     all.

Oops, that's a detail I forgot: we complete the handshake and then
reset... which brings us to https://bugs.passt.top/show_bug.cgi?id=3D131.

Once that's implemented, perhaps it will be low effort to not listen()
at all in that case. Right now, I'm not sure anymore.

On the other hand, with just this patch, we will reset the connection
after 10 seconds (no matter what happens), which is just like b), but
delayed.

> I suggested Yumei's approach here to aim for (a) on the basis of
> transparency - it's as close as I think we can get to a bridged guest
> that's just missing.  I'm not necessarily opposed to (b), but I think
> it's less transparent, so we need an argument that it will lead to
> better outcomes regardless.

Given the problem above, maybe we should really look into a) (but this
patch doesn't do it).

Well, let me merge this, and other than that I would suggest looking
into a) if time allows.

b) still looks slightly better than the current situation, because right
now we'll accept and RST after 10 seconds. So if time doesn't allow,
let's settle for b) for the time being?

> > 3.3223:          pasta: epoll event on /dev/net/tun device 18 (events: =
0x00000001)
> > 3.3223:          pasta: epoll event on /dev/net/tun device 18 (events: =
0x00000001)
> > 3.3224:          tap: protocol 6, 192.168.122.14:55532 -> 192.0.0.1:80 =
(1 packet)
> > 3.3224:          Flow 0 (NEW): FREE -> NEW
> > 3.3224:          Flow 0 (INI): NEW -> INI
> > 3.3224:          Flow 0 (INI): TAP [192.168.122.14]:55532 -> [192.0.0.1=
]:80 =3D> ?
> > 3.3224:          Flow 0 (TGT): INI -> TGT
> > 3.3224:          Flow 0 (TGT): TAP [192.168.122.14]:55532 -> [192.0.0.1=
]:80 =3D> HOST [0.0.0.0]:0 -> [192.0.0.1]:80
> > 3.3224:          Flow 0 (TCP connection): TGT -> TYPED
> > 3.3224:          Flow 0 (TCP connection): TAP [192.168.122.14]:55532 ->=
 [192.0.0.1]:80 =3D> HOST [0.0.0.0]:0 -> [192.0.0.1]:80
> > 3.3224:          Flow 0 (TCP connection): event at tcp_conn_from_tap:14=
89
> > 3.3224:          Flow 0 (TCP connection): TAP_SYN_RCVD: CLOSED -> SYN_S=
ENT
> > 3.3224:          Flow 0 (TCP connection): failed to set TCP_MAXSEG on s=
ocket 21
> > 3.3224:          Flow 0 (TCP connection): Side 0 hash table insert: buc=
ket: 294539
> > 3.3225:          Flow 0 (TCP connection): TYPED -> ACTIVE
> > 3.3225:          Flow 0 (TCP connection): TAP [192.168.122.14]:55532 ->=
 [192.0.0.1]:80 =3D> HOST [0.0.0.0]:0 -> [192.0.0.1]:80
> > 4.0027:          pasta: epoll event on namespace timer watch 17 (events=
: 0x00000001)
> > 4.3612:          pasta: epoll event on /dev/net/tun device 18 (events: =
0x00000001)
> > 4.3613:          tap: protocol 6, 192.168.122.14:55532 -> 192.0.0.1:80 =
(1 packet)
> > 4.3613:          Flow 0 (TCP connection): packet length 40 from tap
> > 4.3613:          Flow 0 (TCP connection): TCP reset at tcp_tap_handler:=
1989
> > 4.3613:          Flow 0 (TCP connection): flag at tcp_prepare_flags:116=
3
> > 4.3613:          Flow 0 (TCP connection): event at tcp_rst_do:1206
> > 4.3613:          Flow 0 (TCP connection): CLOSED: SYN_SENT -> CLOSED
> > 4.3614:          Flow 0 (TCP connection): Side 0 hash table remove: buc=
ket: 294539
> > 4.3614:          Flow 0 (FREE): ACTIVE -> FREE
> > 4.3614:          Flow 0 (FREE): TAP [192.168.122.14]:55532 -> [192.0.0.=
1]:80 =3D> HOST [0.0.0.0]:0 -> [192.0.0.1]:80
> >=20
> >    ...the retry happened within one second. This is a container, so Lin=
ux
> >    kernel, and the client was wget. =20
>=20
> I'm not seeing a retry at all in this log, plus it's an outbound
> connection, which is not the case we're dealing with here.

It's two SYN segments from a guest (yes, an outbound connection):

3.3224:          tap: protocol 6, 192.168.122.14:55532 -> 192.0.0.1:80 (1 p=
acket)

4.3613:          tap: protocol 6, 192.168.122.14:55532 -> 192.0.0.1:80 (1 p=
acket)

that's a retry and that's all I wanted to show: the typical timing you
get from Linux.

--=20
Stefano