On Thu, Nov 21, 2024 at 10:23:12AM +0100, Stefano Brivio wrote: > On Thu, 21 Nov 2024 17:21:12 +1100 > David Gibson wrote: > > > On Thu, Nov 21, 2024 at 05:26:17AM +0100, Stefano Brivio wrote: > > > On Thu, 21 Nov 2024 13:38:09 +1100 > > > David Gibson wrote: > > > > > > > On Wed, Nov 20, 2024 at 07:43:44AM +0100, Stefano Brivio wrote: > > > > > On Wed, 20 Nov 2024 12:02:00 +1100 > > > > > David Gibson wrote: > > > > > > > > > > > On Tue, Nov 19, 2024 at 08:53:44PM +0100, Stefano Brivio wrote: > > > > > > > RFC 9293, 3.8.4 says: > > > > > > > > > > > > > > Implementers MAY include "keep-alives" in their TCP implementations > > > > > > > (MAY-5), although this practice is not universally accepted. Some > > > > > > > TCP implementations, however, have included a keep-alive mechanism. > > > > > > > To confirm that an idle connection is still active, these > > > > > > > implementations send a probe segment designed to elicit a response > > > > > > > from the TCP peer. Such a segment generally contains SEG.SEQ = > > > > > > > SND.NXT-1 and may or may not contain one garbage octet of data. If > > > > > > > keep-alives are included, the application MUST be able to turn them > > > > > > > on or off for each TCP connection (MUST-24), and they MUST default to > > > > > > > off (MUST-25). > > > > > > > > > > > > > > but currently, tcp_data_from_tap() is not aware of this and will > > > > > > > schedule a fast re-transmit on the second keep-alive (because it's > > > > > > > also a duplicate ACK), ignoring the fact that the sequence number was > > > > > > > rewinded to SND.NXT-1. > > > > > > > > > > > > > > ACK these keep-alive segments, reset the activity timeout, and ignore > > > > > > > them for the rest. > > > > > > > > > > > > > > At some point, we could think of implementing an approximation of > > > > > > > keep-alive segments on outbound sockets, for example by setting > > > > > > > TCP_KEEPIDLE to 1, and a large TCP_KEEPINTVL, so that we send a single > > > > > > > keep-alive segment at approximately the same time, and never reset the > > > > > > > connection. That's beyond the scope of this fix, though. > > > > > > > > > > > > > > Reported-by: Tim Besard > > > > > > > Link: https://github.com/containers/podman/discussions/24572 > > > > > > > Signed-off-by: Stefano Brivio > > > > > > > --- > > > > > > > tcp.c | 14 ++++++++++++++ > > > > > > > 1 file changed, 14 insertions(+) > > > > > > > > > > > > > > diff --git a/tcp.c b/tcp.c > > > > > > > index f357920..1eb85bb 100644 > > > > > > > --- a/tcp.c > > > > > > > +++ b/tcp.c > > > > > > > @@ -1763,6 +1763,20 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn, > > > > > > > continue; > > > > > > > > > > > > > > seq = ntohl(th->seq); > > > > > > > + if (SEQ_LT(seq, conn->seq_from_tap) && len <= 1) { > > > > > > > + flow_trace(conn, > > > > > > > + "keep-alive sequence: %u, previous: %u", > > > > > > > + seq, conn->seq_from_tap); > > > > > > > + > > > > > > > + tcp_send_flag(c, conn, ACK); > > > > > > > + tcp_timer_ctl(c, conn); > > > > > > > + > > > > > > > + if (p->count == 1) > > > > > > > + return 1; > > > > > > > > > > > > I'm not sure what this test is for. Shouldn't the continue be sufficient? > > > > > > > > > > I don't think we want to go through tcp_update_seqack_from_tap(), > > > > > tcp_tap_window_update() and the like on a keep-alive segment. > > > > > > > > Ah, I see. But that is an optimisation, right? It shouldn't be > > > > necessary for correctness. > > > > > > *Shouldn't*. > > > > > > > > But if we receive something else in this batch, that's going to be a > > > > > data segment that happened to arrive just after the keep-alive, so, in > > > > > that case, we have to do the normal processing, by ignoring just this > > > > > segment and hitting 'continue'. > > > > > > > > > > Strictly speaking, the 'continue' is enough and correct, but I think > > > > > that returning early in the obviously common case is simpler and more > > > > > robust. > > > > > > > > Hrm. Doesn't seem simpler to me, but I can see the point of the > > > > change so, > > > > > > The code itself is two lines longer, of course, with an additional > > > early return. Considering all the possible side effects of looking at > > > window values from a keep-alive segment looks to me more complicated > > > than the alternative, though. > > > > Except that we *will* consider them if there happen to be other data > > packets in the batch. > > Eh, yes, we have to: > > > > > > But if we receive something else in this batch, that's going to be a > > > > > data segment that happened to arrive just after the keep-alive, so, in > > > > > that case, we have to do the normal processing, by ignoring just this > > > > > segment and hitting 'continue'. > > but we'll use _those_ window values (because we 'continue' here). > > > That seems like it will just make any problems > > from processing the keepalive sequence values harder to track down, > > not make them go away. > > We tested the common case (perhaps we'll never get anything else) and > my priority would be to make _that_ robust, because it's what matters > to users. If we find the time to write a small keep-alive sending > program, then I would feel more confident to drop that additional > condition. Eh, fair enough. Reviewed-by: David Gibson -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson