On Tue, Apr 21, 2026 at 01:41:03AM +0200, Stefano Brivio wrote: > On Mon, 20 Apr 2026 12:47:58 +1000 > David Gibson wrote: > > > On Wed, Apr 15, 2026 at 09:38:28PM +0200, Stefano Brivio wrote: > > > Nit: v1 in the subject tag is not necessary (not harmful either): if > > > there are no version tags, it's implicit we're talking about version 1. > > > > > > On Fri, 10 Apr 2026 13:25:39 +0530 > > > Anshu Kumari wrote: > > > > > > > tcp_send_flag() can return error codes from tcp_prepare_flags() > > > > failing TCP_INFO, or from failure to collect buffers on the > > > > vhost-user path. These errors indicate the connection requires > > > > resetting. > > > > > > > > Most callers of tcp_send_flag() were ignoring the error code and > > > > carrying on as if nothing was wrong. Check the return value at > > > > each call site and handle the error appropriately: > > > > - in tcp_data_from_tap(), return -1 so the caller resets > > > > - in tcp_tap_handler(), goto reset > > > > - in tcp_timer_handler()/tcp_sock_handler()/tcp_conn_from_sock_finish(), > > > > call tcp_rst() and return > > > > - in tcp_tap_conn_from_sock(), set CLOSING flag (flow not yet active) > > > > - in tcp_keepalive(), call tcp_rst() and continue the loop > > > > - in tcp_flow_migrate_target_ext(), goto fail > > > > > > > > The call in tcp_rst_do() is left unchecked: we are already > > > > resetting, and tcp_sock_rst() still needs to run regardless. > > > > > > > > Bug: https://bugs.passt.top/show_bug.cgi?id=194 > > > > > > Nit: we always use Link: tags (CONTRIBUTING.md uses the plural which > > > might be a bit confusing, I guess we should fix that), rationale: > > > > > > https://archives.passt.top/passt-dev/20230704132104.48106368@elisabeth/ > > > https://archives.passt.top/passt-dev/20251105163137.424a6537@elisabeth/ > > > > > > But I fix up these tags on merge anyway, no need to re-send (in > > > general). > > > > > > > Signed-off-by: Anshu Kumari > > > > --- > > > > tcp.c | 59 ++++++++++++++++++++++++++++++++++++++++++++--------------- > > > > 1 file changed, 44 insertions(+), 15 deletions(-) > > > > > > > > diff --git a/tcp.c b/tcp.c > > > > index 8ea9be8..9ce671a 100644 > > > > --- a/tcp.c > > > > +++ b/tcp.c > > > > @@ -1917,7 +1917,9 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn, > > > > "keep-alive sequence: %u, previous: %u", > > > > seq, conn->seq_from_tap); > > > > > > > > - tcp_send_flag(c, conn, ACK); > > > > + if (tcp_send_flag(c, conn, ACK)) > > > > + return -1; > > > > > > A general comment: in _some_ of these cases where we fail to send ACK > > > segments, I intentionally didn't check for errors and let the > > > connection live on, because that looked like the most graceful failure > > > handling to me. > > > > > > After all, ACK segments without data are not assumed to be reliably > > > transmitted (RFC 9293, 3.8.4), so, given that failing to send some > > > should have a similar outcome as the peer missing some, I guess we're > > > always expected to recover from a situation like that. > > > > > > This doesn't apply to other occurrences below where we fail to send a > > > SYN segment or where failure to send ACK segments might mean we are in > > > some expected state (including a connection that might get stuck > > > forever). > > > > > > But reading David's description of bug #194, I wonder if he had > > > something else in mind. That is, I don't have a strong preference > > > against resetting the connection whenever we fail to prepare buffers, > > > but in many of these cases we don't really _have to_ reset the > > > connection. David, do you see this differently? > > > > That's a good point, which I didn't think about when I reviewed. I > > don't recall if I'd considered it when I filed the bug. > > > > So, in principle, you're right; failing to send an ack is equivalent > > to losing an ack en route, which generally shouldn't cause an > > immediate reset of the connection in most cases. > > > > But... that's complicated by the details of what errors can actually > > occur here. AFAICT there are only three cases: > > > > 1) tcp_vu_send_flag() returns -1 because vu_collect() returned 0 > > > > IIUC this means we ran out of buffers. We probably shouldn't reset in > > this case. > > Right. > > > 2) tcp_vu_send_flag() returns -1 because vu_collect() returned > 1 > > > > This means the guest buffer layout wasn't what we expected. We > > probably shouldn't reset in this case, but I believe it will go away > > with Laurent's pending multi-buffer vu patches anyway. So we can > > probably ignore this one. > > Yes, that will go away in a bit, so we can ignore it. > > > [Also, we according to docs and other usage we should return an errno > > in both these cases, not a bare -1] > > > > 3) tcp_prepare_flags() returns -ECONNRESET > > > > This only happens if TCP_INFO fails. In this case we *should* reset - > > in fact we already set CLOSED in the connection events. Essentially, > > we've concluded the socket side is irretrievably broken, so we should > > shut down the tap side as well. > > Agreed. > > > So I think we have two options here: > > > > A) tcp_send_flag() reports only "fatal" errors > > > > Simpler to do, but possibly confusing. We'd need to > > Rather confusing for me indeed. > > > A1) Change tcp_vu_send_flag() not to report an error if > > vu_collect() returns 0. AIUI, this case is more or less > > equivalent to filling the tap socket queue, and we don't > > report that case as an error in tcp_buf_send_flag(). > > tcp_payload_flush() reverts sequences in this case, but > > doesn't report the error any further. > > > > A2) That leaves only the TCP_INFO failure case, so we can and > > should reset on any failure from tcp_send_flag(), just like > > this draft > > This looks like a stretch and almost a layering violation to me, that > is, we would adapt the semantics of tcp_send_flag() to the caller > instead of making it return what naturally makes sense. Yeah, maybe. > > B) tcp_send_flag() reports both transient and fatal errors > > > > More complex, but maybe less surprising semantics? > > Definitely less surprising I think. > > > tcp_send_flag() > > would need to use different return codes for the different cases > > (maybe ECONNRESET vs. EBUSY?) > > Or... if we run out of buffers, and we're sending an ACK, "try again": > -EAGAIN. Ah, yes EAGAIN is a better choice than EBUSY. > I was assuming that if we're sending a SYN that's a different case and > we should reset right away... but actually delivery of a SYN isn't > reliable either, and we have retries implemented with proper handling > of the case where we exceed the number of retries. > > I think in general we should give ENOBUFS / -ENOBUFS (depending on the > specific convention of the given function) for functions that failed to > provide / reserve buffers, and -EAGAIN if we were trying to send an > ACK... or even just a SYN. It seems like a layering violation to me to peer into the contents of what we're sending to alter the return code. But I don't think we need to; I think it makes more sense to always send EAGAIN if the problem is something we expect to be transient - in practice, running out of space on the tap side, whether that's buffers in the VU ring or the tap socket's send buffer. If a caller really can't tolerate a delay in sending, it still has the option of aborting on an EAGAIN, but I don't see any reason we'd actually want that. > > B1) For consistency we should propagate failures from > > tcp_payload_flush(), through tcp_buf_send_flag() to its > > callers, > > B2) We should only reset if tcp_send_flag() returns -ECONNRESET > > Or, at this point, on anything that's not -EAGAIN? Right. > > > > + > > > > tcp_timer_ctl(c, conn); > > > > > > > > if (setsockopt(conn->sock, SOL_SOCKET, SO_KEEPALIVE, > > > > @@ -2043,14 +2045,16 @@ eintr: > > > > * Then swiftly looked away and left. > > > > */ > > > > conn->seq_from_tap = seq_from_tap; > > > > - tcp_send_flag(c, conn, ACK); > > > > + if (tcp_send_flag(c, conn, ACK)) > > > > + return -1; > > > > } > > > > > > > > if (errno == EINTR) > > > > goto eintr; > > > > > > > > if (errno == EAGAIN || errno == EWOULDBLOCK) { > > > > - tcp_send_flag(c, conn, ACK | DUP_ACK); > > > > + if (tcp_send_flag(c, conn, ACK | DUP_ACK)) > > > > + return -1; > > > > return p->count - idx; > > > > > > > > } > > > > @@ -2070,7 +2074,8 @@ out: > > > > */ > > > > if (conn->seq_dup_ack_approx != (conn->seq_from_tap & 0xff)) { > > > > conn->seq_dup_ack_approx = conn->seq_from_tap & 0xff; > > > > - tcp_send_flag(c, conn, ACK | DUP_ACK); > > > > + if (tcp_send_flag(c, conn, ACK | DUP_ACK)) > > > > + return -1; > > > > } > > > > return p->count - idx; > > > > } > > > > @@ -2084,7 +2089,8 @@ out: > > > > > > > > conn_event(c, conn, TAP_FIN_RCVD); > > > > } else { > > > > - tcp_send_flag(c, conn, ACK_IF_NEEDED); > > > > + if (tcp_send_flag(c, conn, ACK_IF_NEEDED)) > > > > + return -1; > > > > } > > > > > > > > return p->count - idx; > > > > @@ -2122,7 +2128,10 @@ static void tcp_conn_from_sock_finish(const struct ctx *c, > > > > return; > > > > } > > > > > > > > - tcp_send_flag(c, conn, ACK); > > > > + if (tcp_send_flag(c, conn, ACK)) { > > > > + tcp_rst(c, conn); > > > > + return; > > > > + } > > > > > > > > /* The client might have sent data already, which we didn't > > > > * dequeue waiting for SYN,ACK from tap -- check now. > > > > @@ -2308,7 +2317,9 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af, > > > > goto reset; > > > > } > > > > > > > > - tcp_send_flag(c, conn, ACK); > > > > + if (tcp_send_flag(c, conn, ACK)) > > > > + goto reset; > > > > + > > > > conn_event(c, conn, SOCK_FIN_SENT); > > > > > > > > return 1; > > > > @@ -2388,7 +2399,9 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af, > > > > } > > > > > > > > conn_event(c, conn, SOCK_FIN_SENT); > > > > - tcp_send_flag(c, conn, ACK); > > > > + if (tcp_send_flag(c, conn, ACK)) > > > > + goto reset; > > > > + > > > > ack_due = 0; > > > > > > > > /* If we received a FIN, but the socket is in TCP_ESTABLISHED > > > > @@ -2478,7 +2491,11 @@ static void tcp_tap_conn_from_sock(const struct ctx *c, union flow *flow, > > > > > > > > conn->wnd_from_tap = WINDOW_DEFAULT; > > > > > > > > - tcp_send_flag(c, conn, SYN); > > > > + if (tcp_send_flag(c, conn, SYN)) { > > > > + conn_flag(c, conn, CLOSING); > > > > > > I would wait for David to confirm, but I'm fairly sure that this needs > > > FLOW_ACTIVATE(conn); before returning, just like in the other error path > > > of this function, because otherwise we'll leave the newly created flow > > > in an "incomplete" state. > > > > Ah, yes, I missed that. > > > > > Due to flow table restrictions we adopted to keep the implementation > > > simple (see "Theory of Operation - allocating and freeing flow entries" > > > in flow.c), quoting from the documentation to enum flow_state in > > > flow.h: > > > > > > * Caveats: > > > * - At most one entry may be NEW, INI, TGT or TYPED at a time, so > > > * it's unsafe to use flow_alloc() again until this entry moves to > > > * ACTIVE or FREE > > > > > > so, if we create a second connection within the same epoll cycle (for > > > example by calling tcp_tap_conn_from_sock() again), we'll now have two > > > entries in state TYPED, which breaks this assumption, and things will > > > > Exactly right. > > > > > David, I think this isn't documented very obviously, even though it's > > > all there in flow.h. This just occurred to me because of commit > > > 52419a64f2df ("migrate, tcp: Don't flow_alloc_cancel() during incoming > > > migration") but we can't expect others to know about past commits. > > > > I agree it's not as clear as I'd like... > > > > > I wonder if you could think of a quick way to make this more prominent... > > > should we perhaps state return conditions in functions, like you already > > > added for isolation.c? > > > > ... but I'm not really sure how to make it more prominent in a useful > > way. Maybe a note in the function header would do the trick? I'm not > > sure. > > I don't think that's particularly helpful, because we have other > functions that would need that, and we might write more without > noticing. > > I think we should, eventually, one day, have another look at the > documentation in flow.c and clearly state manipulation rules that are > otherwise only listed as comments to enums in flow.h. > > That is, perhaps a new section with just those caveats, and in more > practical terms: instead of "at most one entry ...", something like > "make sure you call FLOW_ACTIVATE() after ...". > > But regardless of that I don't really think it's a priority right now. > Let's keep this for later. I concur. -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson