On Fri, Oct 24, 2025 at 10:37:17AM +0200, Stefano Brivio wrote:
> On Fri, 24 Oct 2025 14:30:09 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> > On Fri, Oct 24, 2025 at 01:04:31AM +0200, Stefano Brivio wrote:
> > > On Fri, 17 Oct 2025 14:28:37 +0800
> > > Yumei Huang <yuhuang@redhat.com> wrote:
[snip]
> > > > @@ -2409,8 +2419,17 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref)
> > > >  		tcp_timer_ctl(c, conn);
> > > >  	} else if (conn->flags & ACK_FROM_TAP_DUE) {
> > > >  		if (!(conn->events & ESTABLISHED)) {
> > > > -			flow_dbg(conn, "handshake timeout");
> > > > -			tcp_rst(c, conn);
> > > > +			if (conn->retries >= TCP_MAX_RETRIES ||
> > > > +			    conn->retries >= (c->tcp.tcp_syn_retries +
> > > > +					      c->tcp.syn_linear_timeouts)) {
> > > > +				flow_dbg(conn, "handshake timeout");
> > > > +				tcp_rst(c, conn);
> > > > +			} else {
> > > > +				flow_trace(conn, "SYN timeout, retry");
> > > > +				tcp_send_flag(c, conn, SYN);
> > > > +				conn->retries++;
> > > 
> > > I think I already raised this point on a previous revision: this needs
> > > to be zeroed as the connection is established, but I don't see that in
> > > the current version.
> > 
> > Yes, you raised that, but then I realised it's already handled.  I
> > think I put that in the thread, not just direct to Yumei, but maybe
> > not?  Or it just got lost in the minutiae.
> 
> Yes, here:
> 
>   https://archives.passt.top/passt-dev/aOxFRfJjPWy0ZW0M@zatzit
> 
> this is another example of what I meant about (potential) advantages of
> a fully threaded (email) workflow.
> 
> In this case, I didn't review v2, which came before you could post this
> to my comment on v1, but in a normal case, we could have settled this
> earlier, once for all.

Ah, right, that'd do it.

> > When we receive a SYN-ACK, it will have th->ack_seq advanced a byte
> > acknowledging the SYN.  tcp_tap_handler() calls
> > tcp_update_seqack_from_tap() in the !ESTABLISHED case which will see
> > the new ack_seq and clear retries (retrans before this series).
> 
> It doesn't look obvious at all to me.

Oh, it's definitely not obvious, but I'm pretty confident it's
correct.  Fwiw, I spotted this because I thought the explicit handling
in v2 wasn't at quite the right point logically (though close enough
to be fine in practice).  I went looking for the precise right point -
when we receive the SYN-ACK - and there it was, already handled.

It does make a kind of logical sense.  The RFCs don't generally treat
SYN (or SYN-ACK, or FIN) retransmits any differently from data
retransmits.  We do treat them differently, but less so after this
series, which is a good thing, I think.
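
To make the mechanism a bit more concrete, here's a rough standalone
model of what I'm describing.  This is a sketch, not the actual passt
code: struct conn_model, seq_after() and model_update_seqack() are
names I made up for illustration, and the real
tcp_update_seqack_from_tap() does more than this.  The only point is
that any forward progress of the acknowledged sequence, including the
one byte consumed by the SYN, clears the retry counter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, stripped-down stand-in for the per-connection state */
struct conn_model {
	uint32_t seq_ack_from_tap;	/* highest sequence acked by the peer */
	unsigned retries;		/* retransmit counter (SYN or data) */
};

/* Wrap-safe "a is after b" comparison for 32-bit sequence numbers */
static int seq_after(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) > 0;
}

/*
 * Model of the ack update: if the ack moved forward, record it and
 * clear the retry counter.  Returns 1 if the ack advanced, 0 otherwise.
 */
static int model_update_seqack(struct conn_model *conn, uint32_t ack_seq)
{
	assert(conn);

	if (!seq_after(ack_seq, conn->seq_ack_from_tap))
		return 0;

	conn->seq_ack_from_tap = ack_seq;
	conn->retries = 0;
	return 1;
}
```

In this model, if we sent a SYN with sequence 1000 and retried it
three times, a SYN-ACK carrying ack_seq 1001 resets retries to zero,
whereas a duplicate ack of 1000 leaves the counter alone.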

> We're unlikely to break it in the future, so I don't think it's fragile
> in the long term, but... can one of you double check that it's actually
> the case with a manual one-off test?

Yeah, I guess that's wise.  The easiest way is probably to add a
temporary debug message here, and try it against a QEMU guest that's
temporarily suspended.  Yumei, I can walk you through this, too.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson