Date: Wed, 10 Sep 2025 11:57:26 +0200
From: Stefano Brivio
To: David Gibson
CC: passt-dev@passt.top, Jon Maloy, Paul Holzinger
Subject: Re: [PATCH v4 7/8] tcp: Fast re-transmit if half-closed, make TAP_FIN_RCVD path consistent
Message-ID: <20250910115726.432bbb8d@elisabeth>
References: <20250909181655.2990223-1-sbrivio@redhat.com> <20250909181655.2990223-8-sbrivio@redhat.com>
Organization: Red Hat
List-Id: Development discussion and patches for passt

On Wed, 10 Sep 2025 12:27:12 +1000
David Gibson wrote:

> On Tue, Sep 09, 2025 at 08:16:54PM +0200, Stefano Brivio wrote:
> > We currently have a number of discrepancies in the tcp_tap_handler()
> > path between the half-closed connection path and the regular one, and
> > they are mostly a result of code duplication, which comes in turn from
> > the fact that tcp_data_from_tap() deals with data transfers as well as
> > general connection bookkeeping, so we can't use it for half-closed
> > connections.
> >
> > This suggests that we should probably rework it into two or more
> > functions, in the long term, but for the moment being I'm just fixing
> > one obvious issue, which is the lack of fast retransmissions in the
> > TAP_FIN_RCVD path, and a potential one, which is the fact we don't
> > handle socket flush failures.
> >
> > Add fast re-transmit for half-closed connections, and handle the case
> > of socket flush (tcp_sock_consume()) flush failure in the same way as
> > tcp_data_from_tap() handles it.
> >
> > Signed-off-by: Stefano Brivio
>
> Reviewed-by: David Gibson
>
> > ---
> >  tcp.c | 42 +++++++++++++++++++++++++++++++++++++++---
> >  1 file changed, 39 insertions(+), 3 deletions(-)
> >
> > diff --git a/tcp.c b/tcp.c
> > index 9c70a25..5163dbf 100644
> > --- a/tcp.c
> > +++ b/tcp.c
> > @@ -1652,6 +1652,23 @@ static int tcp_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn)
> >  	return tcp_buf_data_from_sock(c, conn);
> >  }
> >  
> > +/**
> > + * tcp_packet_data_len() - Get data (TCP payload) length for a TCP packet
> > + * @th:	Pointer to TCP header
> > + * @l4len:	TCP packet length, including TCP header
> > + *
> > + * Return: data length of TCP packet, -1 on invalid value of Data Offset field
> > + */
> > +static ssize_t tcp_packet_data_len(const struct tcphdr *th, size_t l4len)
> > +{
> > +	size_t off = th->doff * 4UL;
> > +
> > +	if (off < sizeof(*th) || off > l4len)
> > +		return -1;
> > +
> > +	return l4len - off;
> > +}
> > +
> >  /**
> >   * tcp_data_from_tap() - tap/guest data for established connection
> >   * @c:		Execution context
> > @@ -2113,9 +2130,28 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
> >  
> >  	/* Established connections not accepting data from tap */
> >  	if (conn->events & TAP_FIN_RCVD) {
> > -		tcp_sock_consume(conn, ntohl(th->ack_seq));
> > -		tcp_update_seqack_from_tap(c, conn, ntohl(th->ack_seq));
> > -		if (tcp_tap_window_update(c, conn, ntohs(th->window)))
> > +		bool retr;
> > +
> > +		retr = th->ack && !tcp_packet_data_len(th, l4len) && !th->fin &&
>
> Not really in scope here, but I wonder if we should log an error
> and/or RST if we get a non-zero data length in this situation.

According to RFC 9293 we should ignore data (note: not data segments) in
this case, see 3.10.7.4 "Other states":

  [...]

  Seventh, process the segment text:

  [...]

  CLOSE-WAIT STATE

    This should not occur since a FIN has been received from the remote
    side. Ignore the segment text.

https://www.rfc-editor.org/rfc/rfc9293.html#section-3.10.7.4-2.7.2.7.1

We could add a debug() message perhaps (in a further patch), but I
don't think we are allowed to reset the connection.

> > +		       ntohl(th->ack_seq) == conn->seq_ack_from_tap &&
> > +		       ntohs(th->window) == conn->wnd_from_tap;
> > +
> > +		/* On socket flush failure, pretend there was no ACK, try again
> > +		 * later
> > +		 */
> > +		if (th->ack && !tcp_sock_consume(conn, ntohl(th->ack_seq)))
> > +			tcp_update_seqack_from_tap(c, conn, ntohl(th->ack_seq));
> > +
> > +		if (retr) {
> > +			flow_trace(conn,
> > +				   "fast re-transmit, ACK: %u, previous sequence: %u",
> > +				   ntohl(th->ack_seq), conn->seq_to_tap);
> > +
> > +			if (tcp_rewind_seq(c, conn))
> > +				return -1;
> > +		}
> > +
> > +		if (tcp_tap_window_update(c, conn, ntohs(th->window)) || retr)
> >  			tcp_data_from_sock(c, conn);
> >  
> >  		if (conn->seq_ack_from_tap == conn->seq_to_tap) {
> > -- 
> > 2.43.0

-- 
Stefano
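
For reference, the duplicate ACK test from the quoted hunk can be read in isolation. What follows is only a minimal sketch, not passt code: struct seg, struct peer_state, seg_data_len(), seg_is_dup_ack() and dup_ack_example() are hypothetical names, all fields are assumed to be in host byte order already, and the 20-byte minimum is an assumption standing in for sizeof(struct tcphdr). The conditions themselves (ACK set, no payload, no FIN, unchanged acknowledgement number, unchanged window) are the ones in the patch, and they resemble the duplicate ACK definition in RFC 5681.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>	/* ssize_t */

/* Hypothetical host-byte-order view of the TCP header fields used here */
struct seg {
	bool ack, fin;
	uint8_t doff;		/* Data Offset, in 32-bit words */
	uint32_t ack_seq;
	uint16_t window;
	size_t l4len;		/* TCP header plus payload, in bytes */
};

/* Hypothetical subset of the per-connection state */
struct peer_state {
	uint32_t seq_ack_from_tap;	/* Last acknowledged sequence seen */
	uint16_t wnd_from_tap;		/* Last advertised window seen */
};

/* Payload length, -1 on invalid Data Offset (same checks as the patch) */
static ssize_t seg_data_len(const struct seg *s)
{
	size_t off = s->doff * 4UL;

	if (off < 20 || off > s->l4len)	/* 20: assumed minimum TCP header */
		return -1;

	return s->l4len - off;
}

/* Duplicate ACK: pure ACK repeating the last ACK number and window.
 * An invalid Data Offset (-1) never counts as a duplicate ACK.
 */
static bool seg_is_dup_ack(const struct seg *s, const struct peer_state *p)
{
	return s->ack && seg_data_len(s) == 0 && !s->fin &&
	       s->ack_seq == p->seq_ack_from_tap &&
	       s->window == p->wnd_from_tap;
}

/* Usage: a bare repeated ACK qualifies, the same ACK carrying data doesn't */
void dup_ack_example(void)
{
	struct peer_state p = { .seq_ack_from_tap = 1000, .wnd_from_tap = 512 };
	struct seg s = { .ack = true, .doff = 5, .ack_seq = 1000,
			 .window = 512, .l4len = 20 };

	assert(seg_is_dup_ack(&s, &p));

	s.l4len = 24;		/* Four bytes of payload */
	assert(!seg_is_dup_ack(&s, &p));
}
```

In this sketch a segment carrying data in the half-closed state simply fails the test and triggers nothing further, which is in line with ignoring the segment text rather than resetting the connection.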