Date: Wed, 24 Sep 2025 01:56:09 +0200
From: Stefano Brivio
To: David Gibson
CC: Yumei Huang, passt-dev@passt.top, lvivier@redhat.com
Subject: Re: [PATCH] tap: Drop frames if no client connected
Message-ID: <20250924015609.58c1987a@elisabeth>
References: <20250911115425.79eaaac5@elisabeth> <20250915081319.00e72e53@elisabeth> <20250918091714.77192b00@elisabeth> <20250922220330.436e2b6f@elisabeth> <20250923130039.41e8ef8d@elisabeth>
Organization: Red Hat
List-Id: Development discussion and patches for passt

On Tue, 23 Sep 2025 21:26:24 +1000
David Gibson wrote:

> On Tue, Sep 23, 2025 at 01:00:39PM +0200, Stefano Brivio wrote:
> > On Tue, 23 Sep 2025 17:53:09 +1000
> > David Gibson wrote:
> >
> > > On Mon, Sep 22, 2025 at 10:03:30PM +0200, Stefano Brivio wrote:
> > > > On Mon, 22 Sep 2025 15:17:12 +0800
> > > > Yumei Huang wrote:
> > > > > On Fri, Sep 19, 2025 at 9:38 AM David Gibson
> > > > > wrote:
> > > > > > On Thu, Sep 18, 2025 at 09:17:14AM +0200, Stefano Brivio wrote:
> > > > > > > On Thu, 18 Sep 2025 14:28:37 +1000
> > > > > > > David Gibson wrote:
> > > [snip]
> > > > > > > Does it work to cover situations where users might start passt a bit
> > > > > > > before the guest connects, and try to connect to services right away?
> > > > > > >
> > > > > > > I suggested using ssh which should have a quite long timeout and retry
> > > > > > > connecting for a while. You mentioned you would assist Yumei in testing
> > > > > > > this if needed.
> > > > > >
> > > > > > Ah, yes, you're right and I'd forgotten that. Following up today.
> > > > >
> > > > > I tried both 'ssh' and 'socat'(writing a big file) before a guest
> > > > > connects, they get a 'Connection reset' after 10s, even if the guest
> > > > > connects in ~2s.
> > > > > It's because, when start ssh or socat, passt would try to finish the
> > > > > tcp handshake with the guest. It sends SYN to the guest immediately
> > > > > and waits for SYN-ACK. However, the SYN frame is dropped/lost due to
> > > > > no guest connected. So though the guest connects in seconds, the tcp
> > > > > handshake would timeout, and returns rst via tcp_rst().
> > > >
> > > > Ah, right. We won't try to resend the SYN, that's simply not
> > > > implemented.
> > > >
> > > > The timeout you see is SYN_TIMEOUT, timer set by tcp_timer_ctl() and
> > > > handled by tcp_timer_handler().
> > > >
> > > > > Either with or without this patch, they got the same 'connection reset'.
> > > > > Maybe it's something to fix?
> > > >
> > > > First off, this shows that the current patch is harmless, so I would go
> > > > ahead and apply it (but see 2. below).
> > > >
> > > > Strictly speaking, I don't think we really *need* to fix anything, but
> > > > for sure the behaviour isn't ideal. I see two alternatives:
> > > >
> > > > 1. we implement a periodic retry for the SYN segment. This would *seem*
> > > >    to give the best behaviour in this case, but:
> > > >
> > > >    a. it's quite complicated (we need to calculate some delays for the
> > > >       timers, etc.), and not really transparent (which is in general a
> > > >       goal of passt)
> > >
> > > I'm not really sure why you say it's not transparent, or at least what
> > > other option you're comparing it to. The peer has initiated a
> > > connection to us in the normal way (which may include resending SYNs).
> > > Now we're initiating a connection to the guest in the normal way
> > > (which may include resending SYNs).
> >
> > I was comparing this to b. or to doing nothing.
> >
> > But, actually, you're right, the kernel wouldn't tell us about a
> > repeated SYN, it would still be the same socket returned from accept(),
> > so it's not necessarily less transparent.
>
> Not only can't it tell us about a repeated SYN, but there won't *be* a
> repeated SYN, unless the host kernel's SYN-ACK gets lost.

Hmm, wait, that was actually one of my original points about
transparency, which I forgot later on: that means one SYN. So it would
be more transparent to send one SYN. But that's irrelevant, see below.

> > I was thinking that we know when the guest connects, so we could just
> > delay the SYN segment until then, by introducing a separate TAP_SYN_SENT
> > event (right now it's implicit in SOCK_ACCEPTED). But when the guest
> > connects, services are typically not up yet. You would typically get a
> > RST while the guest is booting.
>
> We could; I would have thought the timescale over which we expect a
> guest to be attached would mean that resending SYNs on a timer would
> achieve the same thing more simply (and handle other cases, too).

Right, we would need a timer anyway, fair point.
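
For illustration only, here is a minimal sketch of what "resending SYNs
on a timer" could look like, assuming plain exponential backoff. The
names and values below (SYN_RETRY_FIRST_S, SYN_RETRY_MAX, struct
syn_state, syn_timer_expired()) are hypothetical and not existing passt
code. An actual change would presumably hook into tcp_timer_ctl() and
tcp_timer_handler(), which is not shown here.

```c
/* Hypothetical values, chosen for illustration, not taken from passt */
#define SYN_RETRY_FIRST_S	1	/* timer armed after the initial SYN, seconds */
#define SYN_RETRY_MAX		3	/* retransmissions before giving up */

enum syn_action {
	SYN_RESEND,	/* send the SYN again and re-arm the timer */
	SYN_RESET	/* give up and reset the connection */
};

/* Hypothetical per-connection retry state */
struct syn_state {
	unsigned int retries;	/* SYN retransmissions done so far */
};

/**
 * syn_timer_expired() - Decide what to do if no SYN-ACK came before expiry
 * @s:			Retry state for this connection
 * @next_delay_s:	Set to the next timer delay, in seconds, on SYN_RESEND
 *
 * Return: SYN_RESEND if the SYN should be sent again, SYN_RESET otherwise
 */
static enum syn_action syn_timer_expired(struct syn_state *s,
					 unsigned int *next_delay_s)
{
	if (s->retries >= SYN_RETRY_MAX)
		return SYN_RESET;

	s->retries++;

	/* Double the delay at each retransmission: 2s, 4s, 8s, ... */
	*next_delay_s = SYN_RETRY_FIRST_S << s->retries;

	return SYN_RESEND;
}
```

With these assumed values, the initial SYN is followed by
retransmissions at 1, 3 and 7 seconds, and the connection is reset at
15 seconds if the guest never answers. Whether those numbers make sense
is exactly the "fail fast" trade-off discussed in this thread.
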

> Getting an RST during boot would be typical, but as we've seen in
> Volker's case, this may not be a newly booting guest, but a
> reconnecting guest (or pasta is backing a NIC being hotplugged into
> the guest).

In practice we would even be able to distinguish the two cases, because
in case of a reconnection we'll get an ARP reply for the request Volker
added recently, and otherwise we won't.

> > > >    b. if the guest never appears, we're just wasting client's time. See
> > > >       db2c91ae86c7 ("tcp: Set ACK flag on *all* RST segments, even for
> > > >       client in SYN-SENT state") for an example where it's important to
> > > >       fail fast
> > >
> > > Sure. I'd say RSTing here would be *less* transparent, but it might
> > > still be worth it to make the peer fail fast.
> >
> > But that's what happens naturally (with Linux) if nobody is listening,
> > and in RFC 9293 terms, I'd say we should approximate a CLOSED state,
> > 3.10.7.1:
> >
> >   If the state is CLOSED (i.e., TCB does not exist), then [...] [a]n
> >   incoming segment not containing a RST causes a RST to be sent in response.
> >
> > rather than a LISTEN state (3.10.7.2). However, see below.
>
> Well, it depends on what physical model we're trying to emulate
> here. My assumption was that we were trying to make this look like the
> guest was off, or had its network cable unplugged. In which case we
> want to just discard packets to the extent we can.

Considering all the implications above, I'm not sure if this is more
natural, but it starts to look simpler and more compatible.

> We could model an unconnected guest as a host with no listens, in
> which case, yes, we should RST ASAP. That seems less natural to me.
>
> > > > 2. reset right away as I was suggesting in
> > > >    https://archives.passt.top/passt-dev/20250915081319.00e72e53@elisabeth/:
> > > >
> > > >    > We could mitigate that by making the TCP handler aware of this, and by
> > > >    > resetting the connection if the guest isn't there. This would at least
> > > >    > be consistent with the case where the guest isn't listening on the port
> > > >    > (we accept(), fail to connect to it, eventually call tcp_rst()).
> > > >
> > > >    and let the client retry as appropriate (if implemented). Those retries
> > > >    can be quite fast, see this report (from IRC) for 722d347c1932 ("tcp:
> > > >    Don't reset outbound connection on SYN retries"):
> > >
> > > I don't see how that commit is relevant to this situation. That's
> > > talking about SYN retries.
> >
> > That's just an example about how SYN segments are retried. It's not
> > otherwise relevant for this situation.
>
> Sure, but again, SYNs *won't* be retried in this case, because the
> host completes the handshake.
>
> Well.. there could be retried SYNs, but the host kernel either won't
> tell us about it (lost SYN-ACK) or doesn't know itself (lost SYN).
>
> > > We can see those in the case of outbound
> > > connections but we'll never see them for the case of inbound
> > > connections, because the host kernel has already completed the
> > > handshake. For inbound we essentially have two options:
> > >
> > >  a) Retry SYNs ourselves, emulating what the peer would do if it was
> > >     talking directly to an absent guest.
> > >  b) Reject SYNs quickly, trusting that the guest will have some sort of
> > >     application level retry. That will depend on the client. I guess
> > >     my fear here is that a client seeing a completed handshake + RST
> > >     might assume that the guest server is permanently broken, rather
> > >     than just temporarily missing as it might if there's no response at
> > >     all.
>
> Fwiw, these two options basically come down to whether we're trying to
> make a missing guest look like a machine that's off, or a machine
> that's not listening.
>
> > Oops, that's a detail I forgot: we complete the handshake and then
> > reset... which brings us to https://bugs.passt.top/show_bug.cgi?id=131.
> >
> > Once that's implemented, perhaps it will be low effort to not listen()
> > at all in that case. Right now, I'm not sure anymore.
>
> Delaying the listen() would make us more closely (pretty much
> exactly?) resemble the case where we're pretending the guest is a
> machine that's not listening. It doesn't get us closer to treating
> the guest as a machine that's off or physically disconnected. It would
> just mean we get SYN -> RST instead of SYN -> SYN-ACK -> ACK -> RST.

Which is much better. Well, I'm starting to think that SYN -> (nothing)
would be even better in this case, but we can't, because at some point
we have to close that socket.

Right now we have SYN -> 10 seconds -> RST, as Yumei pointed out, with
or without this patch. So I guess the best way forward would be...:

> > On the other hand, with just this patch, we will reset the connection
> > after 10 seconds (no matter what happens), which is just like this, but
> > delayed.
> >
> > > I suggested Yumei's approach here to aim for (a) on the basis of
> > > transparency - it's as close as I think we can get to a bridged guest
> > > that's just missing. I'm not necessarily opposed to (b), but I think
> > > it's less transparent, so we need an argument that it will lead to
> > > better outcomes regardless.
> >
> > Given the problem above, maybe we should really look into a) (but this
> > patch doesn't do it).
>
> It doesn't, because we don't retry SYNs (which I hadn't realised when
> I suggested it). Arguably we should do that anyway. It would be
> extremely rare, but it's not impossible for our SYN to be really truly
> lost on its way to the guest due to, say, a full buffer in the tap
> device, or in the guest itself.

...retrying SYNs. We won't get to (host-side) SYN -> (nothing), but
reasonably close to it, that is, SYN -> (long delay) -> RST. And if the
guest appears meanwhile, it's fine anyway.

> > Well, let me merge this, and other than that I would suggest looking
> > into a) if time allows.
> >
> > b) looks still slightly better than the current situation, because right
> > now we'll accept and RST after 10 seconds. So if time doesn't allow,
> > let's settle for b) for the time being?
> >
> > > >      3.3223: pasta: epoll event on /dev/net/tun device 18 (events: 0x00000001)
> > > >      3.3223: pasta: epoll event on /dev/net/tun device 18 (events: 0x00000001)
> > > >      3.3224: tap: protocol 6, 192.168.122.14:55532 -> 192.0.0.1:80 (1 packet)
> > > >      3.3224: Flow 0 (NEW): FREE -> NEW
> > > >      3.3224: Flow 0 (INI): NEW -> INI
> > > >      3.3224: Flow 0 (INI): TAP [192.168.122.14]:55532 -> [192.0.0.1]:80 => ?
> > > >      3.3224: Flow 0 (TGT): INI -> TGT
> > > >      3.3224: Flow 0 (TGT): TAP [192.168.122.14]:55532 -> [192.0.0.1]:80 => HOST [0.0.0.0]:0 -> [192.0.0.1]:80
> > > >      3.3224: Flow 0 (TCP connection): TGT -> TYPED
> > > >      3.3224: Flow 0 (TCP connection): TAP [192.168.122.14]:55532 -> [192.0.0.1]:80 => HOST [0.0.0.0]:0 -> [192.0.0.1]:80
> > > >      3.3224: Flow 0 (TCP connection): event at tcp_conn_from_tap:1489
> > > >      3.3224: Flow 0 (TCP connection): TAP_SYN_RCVD: CLOSED -> SYN_SENT
> > > >      3.3224: Flow 0 (TCP connection): failed to set TCP_MAXSEG on socket 21
> > > >      3.3224: Flow 0 (TCP connection): Side 0 hash table insert: bucket: 294539
> > > >      3.3225: Flow 0 (TCP connection): TYPED -> ACTIVE
> > > >      3.3225: Flow 0 (TCP connection): TAP [192.168.122.14]:55532 -> [192.0.0.1]:80 => HOST [0.0.0.0]:0 -> [192.0.0.1]:80
> > > >      4.0027: pasta: epoll event on namespace timer watch 17 (events: 0x00000001)
> > > >      4.3612: pasta: epoll event on /dev/net/tun device 18 (events: 0x00000001)
> > > >      4.3613: tap: protocol 6, 192.168.122.14:55532 -> 192.0.0.1:80 (1 packet)
> > > >      4.3613: Flow 0 (TCP connection): packet length 40 from tap
> > > >      4.3613: Flow 0 (TCP connection): TCP reset at tcp_tap_handler:1989
> > > >      4.3613: Flow 0 (TCP connection): flag at tcp_prepare_flags:1163
> > > >      4.3613: Flow 0 (TCP connection): event at tcp_rst_do:1206
> > > >      4.3613: Flow 0 (TCP connection): CLOSED: SYN_SENT -> CLOSED
> > > >      4.3614: Flow 0 (TCP connection): Side 0 hash table remove: bucket: 294539
> > > >      4.3614: Flow 0 (FREE): ACTIVE -> FREE
> > > >      4.3614: Flow 0 (FREE): TAP [192.168.122.14]:55532 -> [192.0.0.1]:80 => HOST [0.0.0.0]:0 -> [192.0.0.1]:80
> > > >
> > > >    ...the retry happened within one second. This is a container, so Linux
> > > >    kernel, and the client was wget.
> > >
> > > I'm not seeing a retry at all in this log, plus it's an outbound
> > > connection, which is not the case we're dealing with here.
> >
> > It's two SYN segments from a guest (yes, an outbound connection):
> >
> >     3.3224: tap: protocol 6, 192.168.122.14:55532 -> 192.0.0.1:80 (1 packet)
> >
> >     4.3613: tap: protocol 6, 192.168.122.14:55532 -> 192.0.0.1:80 (1 packet)
> >
> > that's a retry and that's all I wanted to show: the typical timing you
> > get from Linux.
>
> Ah, right, I see. Right, but again, irrelevant to this case since we
> don't get / can't see the repeated SYNs.

Well, again, we should do something like that (it also seems to be what
you're suggesting).

-- 
Stefano