From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202602 header.b=E3b/RuC/; dkim-atps=neutral Received: from mail.ozlabs.org (gandalf.ozlabs.org [150.107.74.76]) by passt.top (Postfix) with ESMTPS id 1ED365A0269 for ; Tue, 21 Apr 2026 06:23:38 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202602; t=1776745415; bh=31xScKlmdL/6AIE3Aif9HI2kG1fQDcAAsqDbPqKkKXs=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=E3b/RuC/U6Yvbyk/hhQ+cQKDxWIfEqvOYJuoW1jzkpBLKoCo9jB3pFCPM32f0gWbb 08LhlsQ+KoSVJa0L3OlMcP69GAbpM/LoSKe3unOpxglS/k9G2SSbbkaI5zVTvgvPJn ohNc6raBaVrev0MPvRgRnbFPeVOVz2NPCgz8+E0l18FLFO4rP5HJoq1TjJ+S94zall UuOqwFiNHXbaIIUMYBVJ/9D7yxN8yX6u73BJYfyqov8S+pRT4lRXfaOjU0//vwhVYr 6/mA9HRLNC0SEVB5Gz8vHWqBLwyq/WkFQe5A/wVm0nQKnmnecdU7z8G+mKUedZg9+C 2ef4JvjtUTW0Q== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4g08P315CXz4wDx; Tue, 21 Apr 2026 14:23:35 +1000 (AEST) Date: Tue, 21 Apr 2026 14:23:29 +1000 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH v1] tcp: Handle errors from tcp_send_flag() Message-ID: References: <20260410075539.1566421-1-anskuma@redhat.com> <20260415213827.39495072@elisabeth> <20260421014059.4dccee9e@elisabeth> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="9sOXeN/Y+CymvZI6" Content-Disposition: inline In-Reply-To: <20260421014059.4dccee9e@elisabeth> Message-ID-Hash: O3WQ3XPSKYYMC2BHG4S5HIFNH33BREWZ X-Message-ID-Hash: O3WQ3XPSKYYMC2BHG4S5HIFNH33BREWZ X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: Anshu Kumari , passt-dev@passt.top, Laurent Vivier X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --9sOXeN/Y+CymvZI6 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Apr 21, 2026 at 01:41:03AM +0200, Stefano Brivio wrote: > On Mon, 20 Apr 2026 12:47:58 +1000 > David Gibson wrote: >=20 > > On Wed, Apr 15, 2026 at 09:38:28PM +0200, Stefano Brivio wrote: > > > Nit: v1 in the subject tag is not necessary (not harmful either): if > > > there are no version tags, it's implicit we're talking about version = 1. > > >=20 > > > On Fri, 10 Apr 2026 13:25:39 +0530 > > > Anshu Kumari wrote: > > > =20 > > > > tcp_send_flag() can return error codes from tcp_prepare_flags() > > > > failing TCP_INFO, or from failure to collect buffers on the > > > > vhost-user path. These errors indicate the connection requires > > > > resetting. > > > >=20 > > > > Most callers of tcp_send_flag() were ignoring the error code and > > > > carrying on as if nothing was wrong. Check the return value at > > > > each call site and handle the error appropriately: > > > > - in tcp_data_from_tap(), return -1 so the caller resets > > > > - in tcp_tap_handler(), goto reset > > > > - in tcp_timer_handler()/tcp_sock_handler()/tcp_conn_from_sock_fi= nish(), > > > > call tcp_rst() and return > > > > - in tcp_tap_conn_from_sock(), set CLOSING flag (flow not yet act= ive) > > > > - in tcp_keepalive(), call tcp_rst() and continue the loop > > > > - in tcp_flow_migrate_target_ext(), goto fail > > > >=20 > > > > The call in tcp_rst_do() is left unchecked: we are already > > > > resetting, and tcp_sock_rst() still needs to run regardless. > > > >=20 > > > > Bug: https://bugs.passt.top/show_bug.cgi?id=3D194 =20 > > >=20 > > > Nit: we always use Link: tags (CONTRIBUTING.md uses the plural which > > > might be a bit confusing, I guess we should fix that), rationale: > > >=20 > > > https://archives.passt.top/passt-dev/20230704132104.48106368@elisab= eth/ > > > https://archives.passt.top/passt-dev/20251105163137.424a6537@elisab= eth/ > > >=20 > > > But I fix up these tags on merge anyway, no need to re-send (in > > > general). > > > =20 > > > > Signed-off-by: Anshu Kumari > > > > --- > > > > tcp.c | 59 ++++++++++++++++++++++++++++++++++++++++++++-----------= ---- > > > > 1 file changed, 44 insertions(+), 15 deletions(-) > > > >=20 > > > > diff --git a/tcp.c b/tcp.c > > > > index 8ea9be8..9ce671a 100644 > > > > --- a/tcp.c > > > > +++ b/tcp.c > > > > @@ -1917,7 +1917,9 @@ static int tcp_data_from_tap(const struct ctx= *c, struct tcp_tap_conn *conn, > > > > "keep-alive sequence: %u, previous: %u", > > > > seq, conn->seq_from_tap); > > > > =20 > > > > - tcp_send_flag(c, conn, ACK); > > > > + if (tcp_send_flag(c, conn, ACK)) > > > > + return -1; =20 > > >=20 > > > A general comment: in _some_ of these cases where we fail to send ACK > > > segments, I intentionally didn't check for errors and let the > > > connection live on, because that looked like the most graceful failure > > > handling to me. > > >=20 > > > After all, ACK segments without data are not assumed to be reliably > > > transmitted (RFC 9293, 3.8.4), so, given that failing to send some > > > should have a similar outcome as the peer missing some, I guess we're > > > always expected to recover from a situation like that. > > >=20 > > > This doesn't apply to other occurrences below where we fail to send a > > > SYN segment or where failure to send ACK segments might mean we are in > > > some expected state (including a connection that might get stuck > > > forever). > > >=20 > > > But reading David's description of bug #194, I wonder if he had > > > something else in mind. That is, I don't have a strong preference > > > against resetting the connection whenever we fail to prepare buffers, > > > but in many of these cases we don't really _have to_ reset the > > > connection. David, do you see this differently? =20 > >=20 > > That's a good point, which I didn't think about when I reviewed. I > > don't recall if I'd considered it when I filed the bug. > >=20 > > So, in principle, you're right; failing to send an ack is equivalent > > to losing an ack en route, which generally shouldn't cause an > > immediate reset of the connection in most cases. > >=20 > > But... that's complicated by the details of what errors can actually > > occur here. AFAICT there are only three cases: > >=20 > > 1) tcp_vu_send_flag() returns -1 because vu_collect() returned 0 > >=20 > > IIUC this means we ran out of buffers. We probably shouldn't reset in > > this case. >=20 > Right. >=20 > > 2) tcp_vu_send_flag() returns -1 because vu_collect() returned > 1 > >=20 > > This means the guest buffer layout wasn't what we expected. We > > probably shouldn't reset in this case, but I believe it will go away > > with Laurent's pending multi-buffer vu patches anyway. So we can > > probably ignore this one. >=20 > Yes, that will go away in a bit, so we can ignore it. >=20 > > [Also, we according to docs and other usage we should return an errno > > in both these cases, not a bare -1] > >=20 > > 3) tcp_prepare_flags() returns -ECONNRESET > >=20 > > This only happens if TCP_INFO fails. In this case we *should* reset - > > in fact we already set CLOSED in the connection events. Essentially, > > we've concluded the socket side is irretrievably broken, so we should > > shut down the tap side as well. >=20 > Agreed. >=20 > > So I think we have two options here: > >=20 > > A) tcp_send_flag() reports only "fatal" errors > >=20 > > Simpler to do, but possibly confusing. We'd need to >=20 > Rather confusing for me indeed. >=20 > > A1) Change tcp_vu_send_flag() not to report an error if > > vu_collect() returns 0. AIUI, this case is more or less > > equivalent to filling the tap socket queue, and we don't > > report that case as an error in tcp_buf_send_flag(). > > tcp_payload_flush() reverts sequences in this case, but > > doesn't report the error any further. > >=20 > > A2) That leaves only the TCP_INFO failure case, so we can and > > should reset on any failure from tcp_send_flag(), just like > > this draft >=20 > This looks like a stretch and almost a layering violation to me, that > is, we would adapt the semantics of tcp_send_flag() to the caller > instead of making it return what naturally makes sense. Yeah, maybe. > > B) tcp_send_flag() reports both transient and fatal errors > >=20 > > More complex, but maybe less surprising semantics? >=20 > Definitely less surprising I think. >=20 > > tcp_send_flag() > > would need to use different return codes for the different cases > > (maybe ECONNRESET vs. EBUSY?) >=20 > Or... if we run out of buffers, and we're sending an ACK, "try again": > -EAGAIN. Ah, yes EAGAIN is a better choice than EBUSY. > I was assuming that if we're sending a SYN that's a different case and > we should reset right away... but actually delivery of a SYN isn't > reliable either, and we have retries implemented with proper handling > of the case where we exceed the number of retries. >=20 > I think in general we should give ENOBUFS / -ENOBUFS (depending on the > specific convention of the given function) for functions that failed to > provide / reserve buffers, and -EAGAIN if we were trying to send an > ACK... or even just a SYN. It seems like a layering violation to me to peer into the contents of what we're sending to alter the return code. But I don't think we need to; I think it makes more sense to always send EAGAIN if the problem is something we expect to be transient - in practice, running out of space on the tap side, whether that's buffers in the VU ring or the tap socket's send buffer. If a caller really can't tolerate a delay in sending, it still has the option of aborting on an EAGAIN, but I don't see any reason we'd actually want that. > > B1) For consistency we should propagate failures from > > tcp_payload_flush(), through tcp_buf_send_flag() to its > > callers, > > B2) We should only reset if tcp_send_flag() returns -ECONNRESET >=20 > Or, at this point, on anything that's not -EAGAIN? Right. > > > > + > > > > tcp_timer_ctl(c, conn); > > > > =20 > > > > if (setsockopt(conn->sock, SOL_SOCKET, SO_KEEPALIVE, > > > > @@ -2043,14 +2045,16 @@ eintr: > > > > * Then swiftly looked away and left. > > > > */ > > > > conn->seq_from_tap =3D seq_from_tap; > > > > - tcp_send_flag(c, conn, ACK); > > > > + if (tcp_send_flag(c, conn, ACK)) > > > > + return -1; > > > > } > > > > =20 > > > > if (errno =3D=3D EINTR) > > > > goto eintr; > > > > =20 > > > > if (errno =3D=3D EAGAIN || errno =3D=3D EWOULDBLOCK) { > > > > - tcp_send_flag(c, conn, ACK | DUP_ACK); > > > > + if (tcp_send_flag(c, conn, ACK | DUP_ACK)) > > > > + return -1; > > > > return p->count - idx; > > > > =20 > > > > } > > > > @@ -2070,7 +2074,8 @@ out: > > > > */ > > > > if (conn->seq_dup_ack_approx !=3D (conn->seq_from_tap & 0xff)) { > > > > conn->seq_dup_ack_approx =3D conn->seq_from_tap & 0xff; > > > > - tcp_send_flag(c, conn, ACK | DUP_ACK); > > > > + if (tcp_send_flag(c, conn, ACK | DUP_ACK)) > > > > + return -1; > > > > } > > > > return p->count - idx; > > > > } > > > > @@ -2084,7 +2089,8 @@ out: > > > > =20 > > > > conn_event(c, conn, TAP_FIN_RCVD); > > > > } else { > > > > - tcp_send_flag(c, conn, ACK_IF_NEEDED); > > > > + if (tcp_send_flag(c, conn, ACK_IF_NEEDED)) > > > > + return -1; > > > > } > > > > =20 > > > > return p->count - idx; > > > > @@ -2122,7 +2128,10 @@ static void tcp_conn_from_sock_finish(const = struct ctx *c, > > > > return; > > > > } > > > > =20 > > > > - tcp_send_flag(c, conn, ACK); > > > > + if (tcp_send_flag(c, conn, ACK)) { > > > > + tcp_rst(c, conn); > > > > + return; > > > > + } > > > > =20 > > > > /* The client might have sent data already, which we didn't > > > > * dequeue waiting for SYN,ACK from tap -- check now. > > > > @@ -2308,7 +2317,9 @@ int tcp_tap_handler(const struct ctx *c, uint= 8_t pif, sa_family_t af, > > > > goto reset; > > > > } > > > > =20 > > > > - tcp_send_flag(c, conn, ACK); > > > > + if (tcp_send_flag(c, conn, ACK)) > > > > + goto reset; > > > > + > > > > conn_event(c, conn, SOCK_FIN_SENT); > > > > =20 > > > > return 1; > > > > @@ -2388,7 +2399,9 @@ int tcp_tap_handler(const struct ctx *c, uint= 8_t pif, sa_family_t af, > > > > } > > > > =20 > > > > conn_event(c, conn, SOCK_FIN_SENT); > > > > - tcp_send_flag(c, conn, ACK); > > > > + if (tcp_send_flag(c, conn, ACK)) > > > > + goto reset; > > > > + > > > > ack_due =3D 0; > > > > =20 > > > > /* If we received a FIN, but the socket is in TCP_ESTABLISHED > > > > @@ -2478,7 +2491,11 @@ static void tcp_tap_conn_from_sock(const str= uct ctx *c, union flow *flow, > > > > =20 > > > > conn->wnd_from_tap =3D WINDOW_DEFAULT; > > > > =20 > > > > - tcp_send_flag(c, conn, SYN); > > > > + if (tcp_send_flag(c, conn, SYN)) { > > > > + conn_flag(c, conn, CLOSING); =20 > > >=20 > > > I would wait for David to confirm, but I'm fairly sure that this needs > > > FLOW_ACTIVATE(conn); before returning, just like in the other error p= ath > > > of this function, because otherwise we'll leave the newly created flow > > > in an "incomplete" state. =20 > >=20 > > Ah, yes, I missed that. > >=20 > > > Due to flow table restrictions we adopted to keep the implementation > > > simple (see "Theory of Operation - allocating and freeing flow entrie= s" > > > in flow.c), quoting from the documentation to enum flow_state in > > > flow.h: > > >=20 > > > * Caveats: > > > * - At most one entry may be NEW, INI, TGT or TYPED at a = time, so > > > * it's unsafe to use flow_alloc() again until this entr= y moves to > > > * ACTIVE or FREE > > >=20 > > > so, if we create a second connection within the same epoll cycle (for > > > example by calling tcp_tap_conn_from_sock() again), we'll now have two > > > entries in state TYPED, which breaks this assumption, and things will= =20 > >=20 > > Exactly right. > >=20 > > > David, I think this isn't documented very obviously, even though it's > > > all there in flow.h. This just occurred to me because of commit > > > 52419a64f2df ("migrate, tcp: Don't flow_alloc_cancel() during incoming > > > migration") but we can't expect others to know about past commits. = =20 > >=20 > > I agree it's not as clear as I'd like... > >=20 > > > I wonder if you could think of a quick way to make this more prominen= t... > > > should we perhaps state return conditions in functions, like you alre= ady > > > added for isolation.c? =20 > >=20 > > ... but I'm not really sure how to make it more prominent in a useful > > way. Maybe a note in the function header would do the trick? I'm not > > sure. >=20 > I don't think that's particularly helpful, because we have other > functions that would need that, and we might write more without > noticing. >=20 > I think we should, eventually, one day, have another look at the > documentation in flow.c and clearly state manipulation rules that are > otherwise only listed as comments to enums in flow.h. >=20 > That is, perhaps a new section with just those caveats, and in more > practical terms: instead of "at most one entry ...", something like > "make sure you call FLOW_ACTIVATE() after ...". >=20 > But regardless of that I don't really think it's a priority right now. > Let's keep this for later. I concur. --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --9sOXeN/Y+CymvZI6 Content-Type: application/pgp-signature; name=signature.asc -----BEGIN PGP SIGNATURE----- iQIzBAEBCgAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmnm+8AACgkQzQJF27ox 2GcJ7Q//aGgeOdC+/c48cx2G66vC9qClB8J8J0mPWtGqMgBAkQsUbjA2fAM4eM4D olsgo4crVDjZJx7GF662jdF91BMKToWq1nrFsGZFuN/5AN4ia7EtGQUOX2NO6JEK g4rnY0mSQXdKMai/Gmo7R6AVJBz1MWWanvWKE+Eo9pMvxbjjMzKDQvHmeH2C5ZX7 Zq1zVpRNWzMYtMYAMBP3sFpTH2gfnsyEkkpLFT4O8qsm3FENQShrU57o3XC5rWvN tDY1XBQvTbaRN/T/N6ta3QPVDEF/UicQWldCbc3fZuJ5u5NgmyiZ9bia9ez3qt/K Ak1x15Z/DSNwVEIaVZZ0uV/y9K3JPE1smuJ8uHn7BbSSdXn0KfuY/7HQQjoXiAw8 m+a/dW29YUARfBUm+jY+AvTDKZ/mzyYxbkjS7uTxAsImP46X+CkSuy79TdiUTvSC 5WGB4hYawYdIHD5m4I8GoHqqTUDdhfVqcUMkA4vX4fJi58HIF1DQQlODADJwQkch Z1wSewYadZ8R6++bIuI0eqrSLoqojRlqfVJlqVRFdrAp5ZjhDtqY6xOXrX2SgUAX 0AqNMKRp7bUemNRl/zhsVUt/OF685f8ba+6VFpb7gjE2P5+xUY7OXUhENSXIYadt 4Lq7VBsySUp7PiES/usdfYUwVHMv2SD4FPKR6uPJgMuaGXYslQk= =Z+J1 -----END PGP SIGNATURE----- --9sOXeN/Y+CymvZI6--