On Mon, Sep 22, 2025 at 10:03:30PM +0200, Stefano Brivio wrote:
> On Mon, 22 Sep 2025 15:17:12 +0800
> Yumei Huang wrote:
> > On Fri, Sep 19, 2025 at 9:38 AM David Gibson wrote:
> > > On Thu, Sep 18, 2025 at 09:17:14AM +0200, Stefano Brivio wrote:
> > > > On Thu, 18 Sep 2025 14:28:37 +1000
> > > > David Gibson wrote:

[snip]

> > > > Does it work to cover situations where users might start passt a bit
> > > > before the guest connects, and try to connect to services right away?
> > > >
> > > > I suggested using ssh, which should have a quite long timeout and
> > > > retry connecting for a while. You mentioned you would assist Yumei
> > > > in testing this if needed.
> > >
> > > Ah, yes, you're right and I'd forgotten that. Following up today.
> >
> > I tried both 'ssh' and 'socat' (writing a big file) before a guest
> > connects; they get a 'Connection reset' after 10s, even if the guest
> > connects in ~2s.
> >
> > It's because, when ssh or socat starts, passt tries to finish the TCP
> > handshake with the guest. It sends a SYN to the guest immediately and
> > waits for the SYN-ACK. However, the SYN frame is dropped/lost because
> > no guest is connected. So even though the guest connects within
> > seconds, the TCP handshake times out, and passt returns an RST via
> > tcp_rst().
>
> Ah, right. We won't try to resend the SYN, that's simply not
> implemented.
>
> The timeout you see is SYN_TIMEOUT, a timer set by tcp_timer_ctl() and
> handled by tcp_timer_handler().
>
> > Either with or without this patch, they got the same 'connection
> > reset'. Maybe it's something to fix?
>
> First off, this shows that the current patch is harmless, so I would go
> ahead and apply it (but see 2. below).
>
> Strictly speaking, I don't think we really *need* to fix anything, but
> for sure the behaviour isn't ideal. I see two alternatives:
>
> 1. we implement a periodic retry for the SYN segment. This would *seem*
>    to give the best behaviour in this case, but:
>
>    a. it's quite complicated (we need to calculate some delays for the
>       timers, etc.), and not really transparent (which is in general a
>       goal of passt)

I'm not really sure why you say it's not transparent, or at least what
other option you're comparing it to.  The peer has initiated a
connection to us in the normal way (which may include resending SYNs).
Now we're initiating a connection to the guest in the normal way (which
may include resending SYNs).

>    b. if the guest never appears, we're just wasting the client's time.
>       See db2c91ae86c7 ("tcp: Set ACK flag on *all* RST segments, even
>       for client in SYN-SENT state") for an example where it's
>       important to fail fast

Sure.  I'd say RSTing here would be *less* transparent, but it might
still be worth it to make the peer fail fast.

>    c. if the guest appears but isn't listening on the port, see b.
>
> 2. reset right away, as I was suggesting in
>    https://archives.passt.top/passt-dev/20250915081319.00e72e53@elisabeth/:
>
> > We could mitigate that by making the TCP handler aware of this, and by
> > resetting the connection if the guest isn't there. This would at least
> > be consistent with the case where the guest isn't listening on the port
> > (we accept(), fail to connect to it, eventually call tcp_rst()).
>
>    and let the client retry as appropriate (if implemented). Those
>    retries can be quite fast, see this report (from IRC) for
>    722d347c1932 ("tcp: Don't reset outbound connection on SYN
>    retries"):

I don't see how that commit is relevant to this situation.  That's
talking about SYN retries.  We can see those in the case of outbound
connections, but we'll never see them for the case of inbound
connections, because the host kernel has already completed the
handshake.

For inbound connections we essentially have two options:

a) Retry SYNs ourselves, emulating what the peer would do if it was
   talking directly to an absent guest.

b) Reject SYNs quickly, trusting that the peer will have some sort of
   application-level retry.
That will depend on the client.  I guess my fear here is that a client
seeing a completed handshake + RST might assume that the guest server
is permanently broken, rather than just temporarily missing, as it
might if there's no response at all.

I suggested Yumei's approach here to aim for (a) on the basis of
transparency - it's as close as I think we can get to a bridged guest
that's just missing.  I'm not necessarily opposed to (b), but I think
it's less transparent, so we need an argument that it will lead to
better outcomes regardless.

> 3.3223: pasta: epoll event on /dev/net/tun device 18 (events: 0x00000001)
> 3.3223: pasta: epoll event on /dev/net/tun device 18 (events: 0x00000001)
> 3.3224: tap: protocol 6, 192.168.122.14:55532 -> 192.0.0.1:80 (1 packet)
> 3.3224: Flow 0 (NEW): FREE -> NEW
> 3.3224: Flow 0 (INI): NEW -> INI
> 3.3224: Flow 0 (INI): TAP [192.168.122.14]:55532 -> [192.0.0.1]:80 => ?
> 3.3224: Flow 0 (TGT): INI -> TGT
> 3.3224: Flow 0 (TGT): TAP [192.168.122.14]:55532 -> [192.0.0.1]:80 => HOST [0.0.0.0]:0 -> [192.0.0.1]:80
> 3.3224: Flow 0 (TCP connection): TGT -> TYPED
> 3.3224: Flow 0 (TCP connection): TAP [192.168.122.14]:55532 -> [192.0.0.1]:80 => HOST [0.0.0.0]:0 -> [192.0.0.1]:80
> 3.3224: Flow 0 (TCP connection): event at tcp_conn_from_tap:1489
> 3.3224: Flow 0 (TCP connection): TAP_SYN_RCVD: CLOSED -> SYN_SENT
> 3.3224: Flow 0 (TCP connection): failed to set TCP_MAXSEG on socket 21
> 3.3224: Flow 0 (TCP connection): Side 0 hash table insert: bucket: 294539
> 3.3225: Flow 0 (TCP connection): TYPED -> ACTIVE
> 3.3225: Flow 0 (TCP connection): TAP [192.168.122.14]:55532 -> [192.0.0.1]:80 => HOST [0.0.0.0]:0 -> [192.0.0.1]:80
> 4.0027: pasta: epoll event on namespace timer watch 17 (events: 0x00000001)
> 4.3612: pasta: epoll event on /dev/net/tun device 18 (events: 0x00000001)
> 4.3613: tap: protocol 6, 192.168.122.14:55532 -> 192.0.0.1:80 (1 packet)
> 4.3613: Flow 0 (TCP connection): packet length 40 from tap
> 4.3613: Flow 0 (TCP connection): TCP reset at tcp_tap_handler:1989
> 4.3613: Flow 0 (TCP connection): flag at tcp_prepare_flags:1163
> 4.3613: Flow 0 (TCP connection): event at tcp_rst_do:1206
> 4.3613: Flow 0 (TCP connection): CLOSED: SYN_SENT -> CLOSED
> 4.3614: Flow 0 (TCP connection): Side 0 hash table remove: bucket: 294539
> 4.3614: Flow 0 (FREE): ACTIVE -> FREE
> 4.3614: Flow 0 (FREE): TAP [192.168.122.14]:55532 -> [192.0.0.1]:80 => HOST [0.0.0.0]:0 -> [192.0.0.1]:80
>
> ...the retry happened within one second. This is a container, so Linux
> kernel, and the client was wget.

I'm not seeing a retry at all in this log, plus it's an outbound
connection, which is not the case we're dealing with here.

> So, in the end, I would suggest going with 2.: check if the guest /
> container is connected in the TCP handler (tcp_data_from_sock()) and
> reset the connection if it's not.
>
> I would suggest checking that together with this patch. They would
> still be two different patches, but I think it would be good to
> check / test what happens with both of them.
>
> --
> Stefano

-- 
David Gibson (he or they)      | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other
                               | way around.
http://www.ozlabs.org/~dgibson