Date: Mon, 20 Apr 2026 12:47:58 +1000
From: David Gibson
To: Stefano Brivio
CC: Anshu Kumari, passt-dev@passt.top, Laurent Vivier
Subject: Re: [PATCH v1] tcp: Handle errors from tcp_send_flag()
References: <20260410075539.1566421-1-anskuma@redhat.com> <20260415213827.39495072@elisabeth>
In-Reply-To: <20260415213827.39495072@elisabeth>

On Wed, Apr 15, 2026 at 09:38:28PM +0200, Stefano Brivio wrote:
> Nit: v1 in the subject tag is not necessary (not harmful either): if
> there are no version tags, it's implicit we're talking about version 1.
>
> On Fri, 10 Apr 2026 13:25:39 +0530
> Anshu Kumari wrote:
>
> > tcp_send_flag() can return error codes from tcp_prepare_flags()
> > failing TCP_INFO, or from failure to collect buffers on the
> > vhost-user path. These errors indicate the connection requires
> > resetting.
> >
> > Most callers of tcp_send_flag() were ignoring the error code and
> > carrying on as if nothing was wrong. Check the return value at
> > each call site and handle the error appropriately:
> > - in tcp_data_from_tap(), return -1 so the caller resets
> > - in tcp_tap_handler(), goto reset
> > - in tcp_timer_handler()/tcp_sock_handler()/tcp_conn_from_sock_finish(),
> >   call tcp_rst() and return
> > - in tcp_tap_conn_from_sock(), set CLOSING flag (flow not yet active)
> > - in tcp_keepalive(), call tcp_rst() and continue the loop
> > - in tcp_flow_migrate_target_ext(), goto fail
> >
> > The call in tcp_rst_do() is left unchecked: we are already
> > resetting, and tcp_sock_rst() still needs to run regardless.
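The two error sources named in the commit message behave quite differently, and that difference is what the rest of this thread turns on. As a minimal sketch, assuming the split floated further down in this reply (-EBUSY for transient failures, -ECONNRESET for fatal ones; neither code is settled), the following models a caller that resets only on fatal errors. struct conn_model, enum send_outcome, send_flag_model() and send_flag_checked() are hypothetical names, not passt code, and nothing here touches real sockets or vhost-user buffers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a connection: not passt's struct tcp_tap_conn */
struct conn_model {
	bool reset;		/* caller decided to reset the connection */
	unsigned acks_sent;	/* flag segments actually queued */
};

/* Failure modes named in the thread, injected as an input */
enum send_outcome {
	SEND_OK,		/* segment queued normally */
	SEND_NO_BUFFERS,	/* e.g. vu_collect() returned 0: transient */
	SEND_TCP_INFO_FAILED,	/* TCP_INFO failed: socket side is broken */
};

/* Stand-in for tcp_send_flag() with assumed "option B" return codes:
 * 0 on success, -EBUSY if the segment couldn't be queued right now
 * (comparable to an ACK lost en route), -ECONNRESET if the connection
 * can't continue.
 */
static int send_flag_model(struct conn_model *conn, enum send_outcome o)
{
	switch (o) {
	case SEND_OK:
		conn->acks_sent++;
		return 0;
	case SEND_NO_BUFFERS:
		return -EBUSY;
	case SEND_TCP_INFO_FAILED:
	default:
		return -ECONNRESET;
	}
}

/* Caller-side policy: reset only on fatal errors, and carry on after
 * transient ones, as the peer would after missing a bare ACK.
 */
static int send_flag_checked(struct conn_model *conn, enum send_outcome o)
{
	int rc = send_flag_model(conn, o);

	/* Always 0 or one of the errno values above, never a bare -1 */
	assert(rc == 0 || rc == -EBUSY || rc == -ECONNRESET);

	if (rc == -ECONNRESET)
		conn->reset = true;	/* where passt would call tcp_rst() */

	return rc;
}
```

Under the simpler "option A" semantics, the SEND_NO_BUFFERS case would return 0 instead, and every non-zero return would then mean "reset".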
> >
> > Bug: https://bugs.passt.top/show_bug.cgi?id=194
>
> Nit: we always use Link: tags (CONTRIBUTING.md uses the plural which
> might be a bit confusing, I guess we should fix that), rationale:
>
>   https://archives.passt.top/passt-dev/20230704132104.48106368@elisabeth/
>   https://archives.passt.top/passt-dev/20251105163137.424a6537@elisabeth/
>
> But I fix up these tags on merge anyway, no need to re-send (in
> general).
>
> > Signed-off-by: Anshu Kumari
> > ---
> >  tcp.c | 59 ++++++++++++++++++++++++++++++++++++++++++++---------------
> >  1 file changed, 44 insertions(+), 15 deletions(-)
> >
> > diff --git a/tcp.c b/tcp.c
> > index 8ea9be8..9ce671a 100644
> > --- a/tcp.c
> > +++ b/tcp.c
> > @@ -1917,7 +1917,9 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn,
> >  			   "keep-alive sequence: %u, previous: %u",
> >  			   seq, conn->seq_from_tap);
> >
> > -		tcp_send_flag(c, conn, ACK);
> > +		if (tcp_send_flag(c, conn, ACK))
> > +			return -1;
>
> A general comment: in _some_ of these cases where we fail to send ACK
> segments, I intentionally didn't check for errors and let the
> connection live on, because that looked like the most graceful failure
> handling to me.
>
> After all, ACK segments without data are not assumed to be reliably
> transmitted (RFC 9293, 3.8.4), so, given that failing to send some
> should have a similar outcome as the peer missing some, I guess we're
> always expected to recover from a situation like that.
>
> This doesn't apply to other occurrences below where we fail to send a
> SYN segment or where failure to send ACK segments might mean we are in
> some expected state (including a connection that might get stuck
> forever).
>
> But reading David's description of bug #194, I wonder if he had
> something else in mind.
> That is, I don't have a strong preference
> against resetting the connection whenever we fail to prepare buffers,
> but in many of these cases we don't really _have to_ reset the
> connection. David, do you see this differently?

That's a good point, which I didn't think about when I reviewed. I
don't recall if I'd considered it when I filed the bug.

So, in principle, you're right: failing to send an ACK is equivalent
to losing an ACK en route, which generally shouldn't cause an
immediate reset of the connection.

But... that's complicated by the details of what errors can actually
occur here. AFAICT there are only three cases:

 1) tcp_vu_send_flag() returns -1 because vu_collect() returned 0

    IIUC this means we ran out of buffers. We probably shouldn't
    reset in this case.

 2) tcp_vu_send_flag() returns -1 because vu_collect() returned > 1

    This means the guest buffer layout wasn't what we expected. We
    probably shouldn't reset in this case, but I believe it will go
    away with Laurent's pending multi-buffer vu patches anyway. So we
    can probably ignore this one.

    [Also, according to the docs and other usage, we should return an
     errno in both these cases, not a bare -1.]

 3) tcp_prepare_flags() returns -ECONNRESET

    This only happens if TCP_INFO fails. In this case we *should*
    reset; in fact, we already set CLOSED in the connection events.
    Essentially, we've concluded the socket side is irretrievably
    broken, so we should shut down the tap side as well.

So I think we have two options here:

 A) tcp_send_flag() reports only "fatal" errors

    Simpler to do, but possibly confusing. We'd need to:

    A1) Change tcp_vu_send_flag() not to report an error if
        vu_collect() returns 0. AIUI, this case is more or less
        equivalent to filling the tap socket queue, and we don't
        report that case as an error in tcp_buf_send_flag():
        tcp_payload_flush() reverts sequences in this case, but
        doesn't report the error any further.
    A2) That leaves only the TCP_INFO failure case, so we can and
        should reset on any failure from tcp_send_flag(), just like
        this draft does.

 B) tcp_send_flag() reports both transient and fatal errors

    More complex, but maybe less surprising semantics?
    tcp_send_flag() would need to use different return codes for the
    different cases (maybe ECONNRESET vs. EBUSY?).

    B1) For consistency, we should propagate failures from
        tcp_payload_flush() through tcp_buf_send_flag() to its
        callers.

    B2) We should only reset if tcp_send_flag() returns -ECONNRESET.

> > +
> >  		tcp_timer_ctl(c, conn);
> >
> >  		if (setsockopt(conn->sock, SOL_SOCKET, SO_KEEPALIVE,
> > @@ -2043,14 +2045,16 @@ eintr:
> >  			 * Then swiftly looked away and left.
> >  			 */
> >  			conn->seq_from_tap = seq_from_tap;
> > -			tcp_send_flag(c, conn, ACK);
> > +			if (tcp_send_flag(c, conn, ACK))
> > +				return -1;
> >  		}
> >
> >  		if (errno == EINTR)
> >  			goto eintr;
> >
> >  		if (errno == EAGAIN || errno == EWOULDBLOCK) {
> > -			tcp_send_flag(c, conn, ACK | DUP_ACK);
> > +			if (tcp_send_flag(c, conn, ACK | DUP_ACK))
> > +				return -1;
> >  			return p->count - idx;
> >
> >  		}
> > @@ -2070,7 +2074,8 @@ out:
> >  		 */
> >  		if (conn->seq_dup_ack_approx != (conn->seq_from_tap & 0xff)) {
> >  			conn->seq_dup_ack_approx = conn->seq_from_tap & 0xff;
> > -			tcp_send_flag(c, conn, ACK | DUP_ACK);
> > +			if (tcp_send_flag(c, conn, ACK | DUP_ACK))
> > +				return -1;
> >  		}
> >  		return p->count - idx;
> >  	}
> > @@ -2084,7 +2089,8 @@ out:
> >
> >  		conn_event(c, conn, TAP_FIN_RCVD);
> >  	} else {
> > -		tcp_send_flag(c, conn, ACK_IF_NEEDED);
> > +		if (tcp_send_flag(c, conn, ACK_IF_NEEDED))
> > +			return -1;
> >  	}
> >
> >  	return p->count - idx;
> > @@ -2122,7 +2128,10 @@ static void tcp_conn_from_sock_finish(const struct ctx *c,
> >  		return;
> >  	}
> >
> > -	tcp_send_flag(c, conn, ACK);
> > +	if (tcp_send_flag(c, conn, ACK)) {
> > +		tcp_rst(c, conn);
> > +		return;
> > +	}
> >
> >  	/* The client might have sent data already, which we didn't
> >  	 * dequeue waiting for SYN,ACK from tap -- check now.
> > @@ -2308,7 +2317,9 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
> >  			goto reset;
> >  		}
> >
> > -		tcp_send_flag(c, conn, ACK);
> > +		if (tcp_send_flag(c, conn, ACK))
> > +			goto reset;
> > +
> >  		conn_event(c, conn, SOCK_FIN_SENT);
> >
> >  		return 1;
> > @@ -2388,7 +2399,9 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
> >  		}
> >
> >  		conn_event(c, conn, SOCK_FIN_SENT);
> > -		tcp_send_flag(c, conn, ACK);
> > +		if (tcp_send_flag(c, conn, ACK))
> > +			goto reset;
> > +
> >  		ack_due = 0;
> >
> >  		/* If we received a FIN, but the socket is in TCP_ESTABLISHED
> > @@ -2478,7 +2491,11 @@ static void tcp_tap_conn_from_sock(const struct ctx *c, union flow *flow,
> >
> >  	conn->wnd_from_tap = WINDOW_DEFAULT;
> >
> > -	tcp_send_flag(c, conn, SYN);
> > +	if (tcp_send_flag(c, conn, SYN)) {
> > +		conn_flag(c, conn, CLOSING);
>
> I would wait for David to confirm, but I'm fairly sure that this needs
> FLOW_ACTIVATE(conn); before returning, just like in the other error path
> of this function, because otherwise we'll leave the newly created flow
> in an "incomplete" state.

Ah, yes, I missed that.

> Due to flow table restrictions we adopted to keep the implementation
> simple (see "Theory of Operation - allocating and freeing flow entries"
> in flow.c), quoting from the documentation to enum flow_state in
> flow.h:
>
>  * Caveats:
>  *    - At most one entry may be NEW, INI, TGT or TYPED at a time, so
>  *      it's unsafe to use flow_alloc() again until this entry moves to
>  *      ACTIVE or FREE
>
> so, if we create a second connection within the same epoll cycle (for
> example by calling tcp_tap_conn_from_sock() again), we'll now have two
> entries in state TYPED, which breaks this assumption, and things will

Exactly right.

> David, I think this isn't documented very obviously, even though it's
> all there in flow.h.
> This just occurred to me because of commit
> 52419a64f2df ("migrate, tcp: Don't flow_alloc_cancel() during incoming
> migration") but we can't expect others to know about past commits.

I agree it's not as clear as I'd like...

> I wonder if you could think of a quick way to make this more prominent...
> should we perhaps state return conditions in functions, like you already
> added for isolation.c?

... but I'm not really sure how to make it more prominent in a useful
way. Maybe a note in the function header would do the trick?

>
> > +		return;
> > +	}
> > +
> >  	conn_flag(c, conn, ACK_FROM_TAP_DUE);
> >
> >  	tcp_get_sndbuf(conn);
> > @@ -2585,7 +2602,10 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref)
> >  		return;
> >
> >  	if (conn->flags & ACK_TO_TAP_DUE) {
> > -		tcp_send_flag(c, conn, ACK_IF_NEEDED);
> > +		if (tcp_send_flag(c, conn, ACK_IF_NEEDED)) {
> > +			tcp_rst(c, conn);
> > +			return;
> > +		}
> >  		tcp_timer_ctl(c, conn);
> >  	} else if (conn->flags & ACK_FROM_TAP_DUE) {
> >  		if (!(conn->events & ESTABLISHED)) {
> > @@ -2598,7 +2618,10 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref)
> >  				tcp_rst(c, conn);
> >  			} else {
> >  				flow_trace(conn, "SYN timeout, retry");
> > -				tcp_send_flag(c, conn, SYN);
> > +				if (tcp_send_flag(c, conn, SYN)) {
> > +					tcp_rst(c, conn);
> > +					return;
> > +				}
> >  				conn->retries++;
> >  				conn_flag(c, conn, SYN_RETRIED);
> >  				tcp_timer_ctl(c, conn);
> > @@ -2662,8 +2685,11 @@ void tcp_sock_handler(const struct ctx *c, union epoll_ref ref,
> >  			tcp_data_from_sock(c, conn);
> >
> >  		if (events & EPOLLOUT) {
> > -			if (tcp_update_seqack_wnd(c, conn, false, NULL))
> > -				tcp_send_flag(c, conn, ACK);
> > +			if (tcp_update_seqack_wnd(c, conn, false, NULL) &&
> > +			    tcp_send_flag(c, conn, ACK)) {
> > +				tcp_rst(c, conn);
> > +				return;
> > +			}
> >  		}
> >
> >  		return;
> > @@ -2903,7 +2929,8 @@ static void tcp_keepalive(struct ctx *c, const struct timespec *now)
> >  		if (conn->tap_inactive) {
> >  			flow_dbg(conn, "No tap activity for least %us, send keepalive",
> >  				 KEEPALIVE_INTERVAL);
> > -			tcp_send_flag(c, conn, KEEPALIVE);
> > +			if (tcp_send_flag(c, conn, KEEPALIVE))
> > +				tcp_rst(c, conn);
> >  		}
> >
> >  		/* Ready to check fot next interval */
> > @@ -3926,7 +3953,9 @@ int tcp_flow_migrate_target_ext(struct ctx *c, struct tcp_tap_conn *conn, int fd
> >  	if (tcp_set_peek_offset(conn, peek_offset))
> >  		goto fail;
> >
> > -	tcp_send_flag(c, conn, ACK);
> > +	if (tcp_send_flag(c, conn, ACK))
> > +		goto fail;
> > +
> >  	tcp_data_from_sock(c, conn);
> >
> >  	if ((rc = tcp_epoll_ctl(conn))) {
>
> --
> Stefano
>

-- 
David Gibson (he or they)      | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
                               | around.
http://www.ozlabs.org/~dgibson