From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: passt.top; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=Oae/A7Ug; dkim-atps=neutral Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by passt.top (Postfix) with ESMTPS id C57D65A0262 for ; Fri, 12 Jun 2026 18:55:23 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1781283322; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=YDoAK/1iHDZOKT1CQHm6F890tb6wysdj5OIGsTaWQ3o=; b=Oae/A7UgNHqnp/DWkK5g9MGNjYg+4tbNdVbik/krGl59Wa5v47uAX1Yb+sw2BEyIOLlbY5 w1MNyyBFyuciKBX5c74RbNNj9cXEFO+H8BTi7bpQrhoDyeKwXU3CiR85+gmrdlFEoyxRrz 9wJfFFxT+0RRGt8zl/QCyCpfPyDRXag= Received: from mail-wm1-f72.google.com (mail-wm1-f72.google.com [209.85.128.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-442-2aMUUJxmP9mQIgD40uNGtQ-1; Fri, 12 Jun 2026 12:55:21 -0400 X-MC-Unique: 2aMUUJxmP9mQIgD40uNGtQ-1 X-Mimecast-MFC-AGG-ID: 2aMUUJxmP9mQIgD40uNGtQ_1781283320 Received: by mail-wm1-f72.google.com with SMTP id 5b1f17b1804b1-490bae3a39bso11553365e9.1 for ; Fri, 12 Jun 2026 09:55:21 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1781283320; x=1781888120; h=date:content-transfer-encoding:mime-version:organization:references :in-reply-to:message-id:subject:cc:to:from:x-gm-gg :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=YDoAK/1iHDZOKT1CQHm6F890tb6wysdj5OIGsTaWQ3o=; b=IgLY8c5QX7XOxm1yfx4k59E5zq/B5FuUZWhOJ83oMvY7BGnwIYv4aErzmB+lUCdve/ 5FrtDkpGW2PL/25bquikm7p/36aOMwBegVtGfQGgP0q7PV+jKoRZ2iHL/rjH7f+euFBd MPQl/5t3OoJ9R2u+D9lcdY8FFHT9yMIzdVwPmI1t3YETm0mbSUhRrHeJdyKS1f5bltkK kSdJhTmcmXfTHfoSXpDmzS1Y5fRD3QSmCkgBNd5gioc8Idq4ddeYRe4SorUPlxTKY4jF 3mAmFNhh4luijUyvPrIyHZ4KXRNnRJecA9xnfWjiKWdkgp7F4HTfXzbgfqca1dbZwvoK zajw== X-Gm-Message-State: AOJu0YwEGg2WFhIqGVNTpdC2qvw3Id4XiGzqgO8nAc2lhT84i4QLfNln PrToY8FVsQtO2RZyXC3X7v6E24femHYZ6c0xPE07lweLjIVljGoJIiHhJznbpyiX2d7PqfRkfwC zNMCCFB4Ij/3JtVf/NNP89VQ+aVQ21Io8SquzmUjAxg0TPKDtRY3cfHcVhaMsDA== X-Gm-Gg: Acq92OHl4MCRPs/BPUnzPzSjkoqZs1nlSz0VX0VaW8jrPDEUD3UrfJInKQwo+YHICIp D9vUL8fPc6XLKCKBGvqHLdoioHw2K2xT1V3THvBQD5ELrsDdKT3ZBorqMn1Z1jifJRYG6/s34Fs d10YZW6dmmwija4+OCRqTdaA8650/mJ+9wTuVsLQEtPd6zyPKzm+9uNNcDr5TN0Jg6zDvLP/6xc bMom2iBzWmGEecYcXw/l6jRVKdI4zscTyR31FWyjIcS5jVzsvha7JiyLGqFYLgG/bsJquaB+8Hi 1VZDu//fqvQ7hoz038gsKbsZ5YwBgMyMX0e1sMPAIv9GNAQmumNLoPrS+07aetr3FFsX3CNXXGP ovKZdywdx7uyZndN0jVHEXMEiTxOT9ZX1x3D6wI3N/xlsKfGrBg== X-Received: by 2002:a05:600c:8718:b0:490:bccf:2bd6 with SMTP id 5b1f17b1804b1-490ec4eac52mr49024255e9.15.1781283319864; Fri, 12 Jun 2026 09:55:19 -0700 (PDT) X-Received: by 2002:a05:600c:8718:b0:490:bccf:2bd6 with SMTP id 5b1f17b1804b1-490ec4eac52mr49023825e9.15.1781283319224; Fri, 12 Jun 2026 09:55:19 -0700 (PDT) Received: from maya.myfinge.rs (ifcgrfdd.trafficplex.cloud. [176.103.220.4]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-490ea9563b8sm52260425e9.2.2026.06.12.09.55.18 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 12 Jun 2026 09:55:18 -0700 (PDT) From: Stefano Brivio To: David Gibson Subject: Re: [PATCH 3/6] tcp_splice: Clean up flow control path for splice forwarding Message-ID: <20260612185517.2349a94c@elisabeth> In-Reply-To: <20260612181841.40e698e7@elisabeth> References: <20260520130851.436931-1-david@gibson.dropbear.id.au> <20260520130851.436931-4-david@gibson.dropbear.id.au> <20260520222851.19e5f430@elisabeth> <20260612181841.40e698e7@elisabeth> Organization: Red Hat X-Mailer: Claws Mail 4.2.0 (GTK 3.24.49; x86_64-pc-linux-gnu) MIME-Version: 1.0 Date: Fri, 12 Jun 2026 18:55:18 +0200 (CEST) X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: KUGngcwnFJ309vVzUstaAvO1SyQMTznw8_NxWbAlFTQ_1781283320 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID-Hash: RYUKIJ7IJ7KEFUVIS2BF6R4RF3WTXGWG X-Message-ID-Hash: RYUKIJ7IJ7KEFUVIS2BF6R4RF3WTXGWG X-MailFrom: sbrivio@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top, Paul Holzinger , Anshu Kumari X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Fri, 12 Jun 2026 18:18:41 +0200 Stefano Brivio wrote: > On Thu, 21 May 2026 10:50:24 +1000 > David Gibson wrote: > > > On Wed, May 20, 2026 at 10:28:52PM +0200, Stefano Brivio wrote: > > > Ah, yes, it looks better now. Three remarks: > > > > > > On Wed, 20 May 2026 23:08:48 +1000 > > > David Gibson wrote: > > > > > > > Splice forwarding can be blocked either waiting for data from one side > > > > or waiting for space on the other. For that reason, > > > > tcp_splice_sock_handler() on either socket can forward data in either or > > > > both directions, depending on whether we have EPOLLIN, EPOLLOUT or both > > > > events. > > > > > > > > The flow control for this is quite hard to follow though, since we forward > > > > in one direction, then sometimes loop back with a goto to do it in the > > > > other direction. Simplify this by adding a tcp_splice_forward() function > > > > with the logic to forward in one direction and calling it either once or > > > > twice from tcp_splice_sock_handler(). > > > > > > > > Signed-off-by: David Gibson > > > > --- > > > > tcp_splice.c | 137 ++++++++++++++++++++++++++------------------------- > > > > 1 file changed, 71 insertions(+), 66 deletions(-) > > > > > > > > diff --git a/tcp_splice.c b/tcp_splice.c > > > > index 34ffea73..18e8b303 100644 > > > > --- a/tcp_splice.c > > > > +++ b/tcp_splice.c > > > > @@ -474,67 +474,20 @@ void tcp_splice_conn_from_sock(const struct ctx *c, union flow *flow, int s0) > > > > } > > > > > > > > /** > > > > - * tcp_splice_sock_handler() - Handler for socket mapped to spliced connection > > > > + * tcp_splice_forward() - Forward data in one direction using splice() > > > > * @c: Execution context > > > > - * @ref: epoll reference > > > > - * @events: epoll events bitmap > > > > + * @conn: Connection to forward data for > > > > + * @fromsidei: Side to forward data from > > > > * > > > > * #syscalls:pasta splice > > > > */ > > > > -void tcp_splice_sock_handler(struct ctx *c, union epoll_ref ref, > > > > - uint32_t events) > > > > +static int tcp_splice_forward(struct ctx *c, struct > > > > + tcp_splice_conn *conn, unsigned fromsidei) > > > > > > I think the struct > > > argument should all be on the same line. > > > > Oops, definitely. Forgot to document the return value too. > > > > > > { > > > > - struct tcp_splice_conn *conn = conn_at_sidx(ref.flowside); > > > > - unsigned evsidei = ref.flowside.sidei, fromsidei; > > > > - uint8_t lowat_set_flag, lowat_act_flag; > > > > - int eof, never_read; > > > > - > > > > - assert(conn->f.type == FLOW_TCP_SPLICE); > > > > - > > > > - if (conn->events == SPLICE_CLOSED) > > > > - return; > > > > - > > > > - if (events & EPOLLERR) { > > > > - int err, rc; > > > > - socklen_t sl = sizeof(err); > > > > - > > > > - rc = getsockopt(ref.fd, SOL_SOCKET, SO_ERROR, &err, &sl); > > > > - if (rc) > > > > - flow_perror(conn, "Error retrieving SO_ERROR"); > > > > - else > > > > - flow_dbg(conn, "Error event on %s socket: %s", > > > > - pif_name(conn->f.pif[evsidei]), > > > > - strerror_(err)); > > > > - goto reset; > > > > - } > > > > - > > > > - if (conn->events == SPLICE_CONNECT) { > > > > - if (!(events & EPOLLOUT)) { > > > > - flow_err(conn, "Unexpected events 0x%x during connect", > > > > - events); > > > > - goto reset; > > > > - } > > > > - if (tcp_splice_connect_finish(c, conn)) > > > > - goto reset; > > > > - } > > > > - > > > > - if (events & EPOLLOUT) { > > > > - fromsidei = !evsidei; > > > > - conn_event(conn, ~OUT_WAIT(evsidei)); > > > > - } else { > > > > - fromsidei = evsidei; > > > > - } > > > > - > > > > - if (events & EPOLLRDHUP) > > > > - /* For side 0 this is fake, but implied */ > > > > - conn_event(conn, FIN_RCVD(evsidei)); > > > > - > > > > -swap: > > > > - eof = 0; > > > > - never_read = 1; > > > > - > > > > - lowat_set_flag = RCVLOWAT_SET(fromsidei); > > > > - lowat_act_flag = RCVLOWAT_ACT(fromsidei); > > > > + uint8_t lowat_set_flag = RCVLOWAT_SET(fromsidei); > > > > + uint8_t lowat_act_flag = RCVLOWAT_ACT(fromsidei); > > > > + int never_read = 1; > > > > + int eof = 0; > > > > > > > > while (1) { > > > > ssize_t readlen, written, pending; > > > > @@ -551,7 +504,7 @@ retry: > > > > if (readlen < 0 && errno != EAGAIN) { > > > > flow_perror(conn, "Splicing from %s socket", > > > > pif_name(conn->f.pif[fromsidei])); > > > > - goto reset; > > > > + return -1; > > > > } > > > > > > > > flow_trace(conn, "%zi from read-side call", readlen); > > > > @@ -578,7 +531,7 @@ retry: > > > > if (written < 0 && errno != EAGAIN) { > > > > flow_perror(conn, "Splicing to %s socket", > > > > pif_name(conn->f.pif[!fromsidei])); > > > > - goto reset; > > > > + return -1; > > > > } > > > > > > > > flow_trace(conn, "%zi from write-side call (passed %zi)", > > > > @@ -639,24 +592,76 @@ retry: > > > > if (shutdown(conn->s[!sidei], SHUT_WR) < 0) { > > > > flow_perror(conn, "shutdown() on %s", > > > > pif_name(conn->f.pif[!sidei])); > > > > - goto reset; > > > > + return -1; > > > > } > > > > conn_event(conn, FIN_SENT(!sidei)); > > > > } > > > > } > > > > } > > > > > > > > - if (CONN_HAS(conn, FIN_SENT(0) | FIN_SENT(1))) { > > > > - /* Clean close, no reset */ > > > > - conn_flag(conn, CLOSING); > > > > + return 0; > > > > +} > > > > + > > > > +/** > > > > + * tcp_splice_sock_handler() - Handler for socket mapped to spliced connection > > > > + * @c: Execution context > > > > + * @ref: epoll reference > > > > + * @events: epoll events bitmap > > > > + */ > > > > +void tcp_splice_sock_handler(struct ctx *c, union epoll_ref ref, > > > > + uint32_t events) > > > > +{ > > > > + struct tcp_splice_conn *conn = conn_at_sidx(ref.flowside); > > > > + unsigned evsidei = ref.flowside.sidei; > > > > + > > > > + assert(conn->f.type == FLOW_TCP_SPLICE); > > > > + > > > > + if (conn->events == SPLICE_CLOSED) > > > > return; > > > > + > > > > + if (events & EPOLLERR) { > > > > + int err, rc; > > > > + socklen_t sl = sizeof(err); > > > > + > > > > + rc = getsockopt(ref.fd, SOL_SOCKET, SO_ERROR, &err, &sl); > > > > + if (rc) > > > > + flow_perror(conn, "Error retrieving SO_ERROR"); > > > > + else > > > > + flow_dbg(conn, "Error event on %s socket: %s", > > > > + pif_name(conn->f.pif[evsidei]), > > > > + strerror_(err)); > > > > + goto reset; > > > > + } > > > > + > > > > + if (conn->events == SPLICE_CONNECT) { > > > > + if (!(events & EPOLLOUT)) { > > > > + flow_err(conn, "Unexpected events 0x%x during connect", > > > > + events); > > > > + goto reset; > > > > + } > > > > + if (tcp_splice_connect_finish(c, conn)) > > > > + goto reset; > > > > + } > > > > + > > > > + if (events & EPOLLRDHUP) > > > > + /* For side 0 this is fake, but implied */ > > > > + conn_event(conn, FIN_RCVD(evsidei)); > > > > > > I saw this all goes away in 5/6, so it wouldn't be relevant. But in > > > case we decide to drop 5/6, here are my remarks on the this. > > > > > > EPOLLRDHUP is now handled before checking the other direction of the > > > connection in case of EPOLLOUT. > > > > I'm pretty sure that hasn't changed. In the old code EPOLLRDHUP > > handling was before we did any of the actual data handling for EPOLLIN > > or EPOLLOUT. > > Well, kind of, in the sense that it's true we did that before any data > handling, but we had two checks: > > if (conn->events == SPLICE_CLOSED) > return; > > [...] > > if (conn->events == SPLICE_CONNECT) { > if (!(events & EPOLLOUT)) { > [...] > goto reset; > } > if (tcp_splice_connect_finish(c, conn)) > goto reset; > } > > based on conn->events _before_ setting FIN_RCVD(evsidei). > > Now, that should never be relevant for SPLICE_CLOSED. I'm not sure about > SPLICE_CONNECT, what if we get EPOLLRDHUP right away as we are > re-establishing a connection? I need to look into that, but I wasn't > able to see any difference in behaviour so far. Ah, never mind, those checks are all now part of tcp_splice_sock_handler() anyway. So, right, there should be no functional change in the end. > What I really missed here is: > > > > I think it actually makes more sense this way because we update flags > > > with everything we know until that point, and it shouldn't have a > > > functional effect (the check at the end of the new tcp_splice_forward() > > > is on FIN_RCVD(fromsidei)), but I'm raising that in case the change > > > wasn't intended. > > > > > > > + > > > > + if (events & EPOLLOUT) { > > > > + if (tcp_splice_forward(c, conn, !evsidei)) > > > > + goto reset; > > > > + conn_event(conn, ~OUT_WAIT(evsidei)); > > ^^^ > > this swap, which caused https://bugs.passt.top/show_bug.cgi?id=207. > > Earlier, we had: > > if (events & EPOLLOUT) { > fromsidei = !evsidei; > conn_event(conn, ~OUT_WAIT(evsidei)); > } ... > > and then the rest of what's now tcp_splice_forward(). > > If we clear OUT_WAIT *later*, even if tcp_splice_forward() decides to keep > it after processing an EPOLLOUT event, we'll miss events. > > It turns out that 4ccb2eebaa02 ("tcp_splice: Simplify / correct OUT_WAIT > flag handling") fixes this. I'm still checking whether the fix is complete > though. I'm fairly convinced it is, because with that commit we don't care anymore whether EPOLLOUT was set to begin with and just set OUT_WAIT as obviously needed. Reading that again just reminded me that I actually spotted this but I saw it was going away in 5/6 of this revision of the series... and then I forgot to write that down. :( Somewhat funnily, that commit message says: -- We clear the OUT_WAIT flag when we complete forwarding on an EPOLLOUT event, but that's not quite right. Even though it's called on an EPOLLOUT, tcp_splice_forward() could, in principle empty the pipe, but also read enough new data from the other side to fill it again. That would set OUT_WAIT internally, but it would be cleared after returning meaning we could miss a necessary wakeup. -- ...well, yes, but that problem didn't exist before. :) Anyway, tagging a new release now. -- Stefano