On Fri, Oct 24, 2025 at 10:37:17AM +0200, Stefano Brivio wrote:
> On Fri, 24 Oct 2025 14:30:09 +1100
> David Gibson wrote:
> > On Fri, Oct 24, 2025 at 01:04:31AM +0200, Stefano Brivio wrote:
> > > On Fri, 17 Oct 2025 14:28:37 +0800
> > > Yumei Huang wrote:
[snip]
> > > > @@ -2409,8 +2419,17 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref)
> > > >  			tcp_timer_ctl(c, conn);
> > > >  	} else if (conn->flags & ACK_FROM_TAP_DUE) {
> > > >  		if (!(conn->events & ESTABLISHED)) {
> > > > -			flow_dbg(conn, "handshake timeout");
> > > > -			tcp_rst(c, conn);
> > > > +			if (conn->retries >= TCP_MAX_RETRIES ||
> > > > +			    conn->retries >= (c->tcp.tcp_syn_retries +
> > > > +					      c->tcp.syn_linear_timeouts)) {
> > > > +				flow_dbg(conn, "handshake timeout");
> > > > +				tcp_rst(c, conn);
> > > > +			} else {
> > > > +				flow_trace(conn, "SYN timeout, retry");
> > > > +				tcp_send_flag(c, conn, SYN);
> > > > +				conn->retries++;
> > >
> > > I think I already raised this point on a previous revision: this needs
> > > to be zeroed as the connection is established, but I don't see that in
> > > the current version.
> >
> > Yes, you raised that, but then I realised it's already handled.  I
> > think I put that in the thread, not just direct to Yumei, but maybe
> > not?  Or it just got lost in the minutiae.
>
> Yes, here:
>
>   https://archives.passt.top/passt-dev/aOxFRfJjPWy0ZW0M@zatzit
>
> this is another example of what I meant about (potential) advantages of
> a fully threaded (email) workflow.
>
> In this case, I didn't review v2, which came before you could post this
> to my comment on v1, but in a normal case, we could have settled this
> earlier, once for all.

Ah, right, that'd do it.

> > When we receive a SYN-ACK, it will have th->ack_seq advanced a byte
> > acknowledging the SYN.  tcp_tap_handler() calls
> > tcp_update_seqack_from_tap() in the !ESTABLISHED case which will see
> > the new ack_seq and clear retries (retrans before this series).
>
> It doesn't look obvious at all to me.

Oh, it's definitely not obvious, but I'm pretty confident it's
correct.

Fwiw, I spotted this because I thought the explicit handling in v2
wasn't at quite the right point logically (though close enough to be
fine in practice).  I went looking for the precise right point - when
we receive the SYN-ACK - and there it was, already handled.

It does make a kind of logical sense.  The RFCs don't generally treat
SYN (or SYN-ACK, or FIN) retransmits any differently from data
retransmits.  We do treat them differently, but less so after this
series, which is a good thing, I think.

> We're unlikely to break it in the future, so I don't think it's fragile
> in the long term, but... can one of you double check that it's actually
> the case with a manual one-off test?

Yeah, I guess that's wise.  Easiest way is probably to add a temporary
debug message here, and try it against a qemu guest that's temporarily
suspended.  Yumei, I can walk you through this, too.

-- 
David Gibson (he or they)       | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you, not the other way
                                | around.
http://www.ozlabs.org/~dgibson
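
For reference, a minimal standalone sketch of the mechanism discussed
above: a SYN timeout bumps the retry counter, and any acknowledgement
that advances the acked sequence (such as the SYN-ACK acknowledging the
SYN's one sequence number) clears it again.  This is a toy model, not
passt code: struct toy_conn, toy_syn_timeout(), toy_update_seqack(),
toy_handshake_demo() and TOY_MAX_RETRIES are all hypothetical names, and
the logic only follows the description of tcp_update_seqack_from_tap()
given in this thread, not its actual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_RETRIES	3	/* hypothetical limit, not passt's value */

/* Hypothetical, pared-down connection state: not passt's real struct */
struct toy_conn {
	uint32_t seq_ack_from_tap;	/* highest sequence acked by the peer */
	unsigned retries;		/* retransmissions since last progress */
	bool reset;			/* set once we give up on the handshake */
};

/* Timer fired with no ACK: retry the SYN, or give up at the limit */
static void toy_syn_timeout(struct toy_conn *conn)
{
	if (conn->retries >= TOY_MAX_RETRIES) {
		conn->reset = true;
		return;
	}
	conn->retries++;	/* stands in for resending the SYN */
}

/* Peer acknowledged up to ack_seq: forward progress clears retries.
 * The signed difference handles sequence wraparound on common compilers.
 */
static void toy_update_seqack(struct toy_conn *conn, uint32_t ack_seq)
{
	if ((int32_t)(ack_seq - conn->seq_ack_from_tap) > 0) {
		conn->seq_ack_from_tap = ack_seq;
		conn->retries = 0;
	}
}

/* Two lost SYNs, then a SYN-ACK acknowledging the SYN's sequence number */
static unsigned toy_handshake_demo(void)
{
	struct toy_conn conn = { .seq_ack_from_tap = 1000 };

	toy_syn_timeout(&conn);
	toy_syn_timeout(&conn);
	assert(conn.retries == 2);

	toy_update_seqack(&conn, 1001);	/* SYN consumes one sequence number */
	assert(!conn.reset);

	return conn.retries;	/* cleared by the advancing ACK */
}
```

In this model, no explicit zeroing is needed at the point the connection
becomes established, because the advancing acknowledgement already does
it.  Whether the real code path behaves the same way is what the manual
test with a temporary debug message would confirm.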