From mboxrd@z Thu Jan  1 00:00:00 1970
From: Stefano Brivio <sbrivio@redhat.com>
To: Anshu Kumari <anskuma@redhat.com>, David Gibson
 <david@gibson.dropbear.id.au>
Subject: Re: [PATCH v1] tcp: Handle errors from tcp_send_flag()
Message-ID: <20260415213827.39495072@elisabeth>
In-Reply-To: <20260410075539.1566421-1-anskuma@redhat.com>
References: <20260410075539.1566421-1-anskuma@redhat.com>
Organization: Red Hat
X-Mailer: Claws Mail 4.2.0 (GTK 3.24.49; x86_64-pc-linux-gnu)
MIME-Version: 1.0
Date: Wed, 15 Apr 2026 21:38:28 +0200 (CEST)
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
CC: passt-dev@passt.top, Laurent Vivier <lvivier@redhat.com>
X-Mailman-Version: 3.3.8
Precedence: list
List-Id: Development discussion and patches for passt <passt-dev.passt.top>
Archived-At: <https://archives.passt.top/passt-dev/20260415213827.39495072@elisabeth/>
Archived-At: <https://passt.top/hyperkitty/list/passt-dev@passt.top/message/SK25AKSAFT2CFLGAPZGBTPI5RLCDQFVC/>
List-Archive: <https://archives.passt.top/passt-dev/>
List-Archive: <https://passt.top/hyperkitty/list/passt-dev@passt.top/>
List-Help: <mailto:passt-dev-request@passt.top?subject=help>
List-Owner: <mailto:passt-dev-owner@passt.top>
List-Post: <mailto:passt-dev@passt.top>
List-Subscribe: <mailto:passt-dev-join@passt.top>
List-Unsubscribe: <mailto:passt-dev-leave@passt.top>

Nit: v1 in the subject tag is not necessary (not harmful either): if
there are no version tags, it's implicit we're talking about version 1.

On Fri, 10 Apr 2026 13:25:39 +0530
Anshu Kumari <anskuma@redhat.com> wrote:

> tcp_send_flag() can return error codes from tcp_prepare_flags()
> failing TCP_INFO, or from failure to collect buffers on the
> vhost-user path. These errors indicate the connection requires
> resetting.
> 
> Most callers of tcp_send_flag() were ignoring the error code and
> carrying on as if nothing was wrong. Check the return value at
> each call site and handle the error appropriately:
>   - in tcp_data_from_tap(), return -1 so the caller resets
>   - in tcp_tap_handler(), goto reset
>   - in tcp_timer_handler()/tcp_sock_handler()/tcp_conn_from_sock_finish(),
>     call tcp_rst() and return
>   - in tcp_tap_conn_from_sock(), set CLOSING flag (flow not yet active)
>   - in tcp_keepalive(), call tcp_rst() and continue the loop
>   - in tcp_flow_migrate_target_ext(), goto fail
> 
> The call in tcp_rst_do() is left unchecked: we are already
> resetting, and tcp_sock_rst() still needs to run regardless.
> 
> Bug: https://bugs.passt.top/show_bug.cgi?id=194

Nit: we always use Link: tags (CONTRIBUTING.md uses the plural which
might be a bit confusing, I guess we should fix that), rationale:

  https://archives.passt.top/passt-dev/20230704132104.48106368@elisabeth/
  https://archives.passt.top/passt-dev/20251105163137.424a6537@elisabeth/

But I fix up these tags on merge anyway, no need to re-send (in
general).

> Signed-off-by: Anshu Kumari <anskuma@redhat.com>
> ---
>  tcp.c | 59 ++++++++++++++++++++++++++++++++++++++++++++---------------
>  1 file changed, 44 insertions(+), 15 deletions(-)
> 
> diff --git a/tcp.c b/tcp.c
> index 8ea9be8..9ce671a 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -1917,7 +1917,9 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn,
>  				   "keep-alive sequence: %u, previous: %u",
>  				   seq, conn->seq_from_tap);
>  
> -			tcp_send_flag(c, conn, ACK);
> +			if (tcp_send_flag(c, conn, ACK))
> +				return -1;

A general comment: in _some_ of these cases where we fail to send ACK
segments, I intentionally didn't check for errors and let the
connection live on, because that looked like the most graceful failure
handling to me.

After all, ACK segments without data are not assumed to be reliably
transmitted (RFC 9293, 3.8.4), so, given that failing to send one
should have much the same outcome as the peer missing one, I guess
we're always expected to recover from a situation like that.

This doesn't apply to other occurrences below where we fail to send a
SYN segment or where failure to send ACK segments might mean we are in
some unexpected state (including a connection that might get stuck
forever).

But reading David's description of bug #194, I wonder if he had
something else in mind. That is, I don't have a strong preference
against resetting the connection whenever we fail to prepare buffers,
but in many of these cases we don't really _have to_ reset the
connection. David, do you see this differently?
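
To make the distinction above a bit more concrete, here's a rough,
standalone sketch of the policy I have in mind. It's not a proposal
for tcp.c: the SK_ names, the flag values and the state_ok parameter
are all made up for illustration, and the real decision would depend
on each call site.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this sketch only, not the values from tcp.c */
#define SK_SYN	(1 << 0)
#define SK_ACK	(1 << 1)

enum sk_action {
	SK_CONTINUE,	/* Leave the connection alone */
	SK_RESET,	/* Caller should reset the connection */
};

/**
 * sk_on_send_result() - Pick a reaction to the result of sending flags
 * @err:	Return value of the (hypothetical) send helper, 0 on success
 * @flags:	Flags of the segment we tried to send, must not be empty
 * @state_ok:	False if a missing ACK might leave the connection stuck
 *
 * Return: SK_CONTINUE if the failure can be treated as a lost segment,
 *	   SK_RESET otherwise
 */
static enum sk_action sk_on_send_result(int err, int flags, bool state_ok)
{
	assert(flags);

	if (!err)
		return SK_CONTINUE;

	/* A pure ACK we couldn't send is, for the peer, the same as an ACK
	 * dropped on the way: retransmissions and later ACKs cover for it
	 */
	if (flags == SK_ACK && state_ok)
		return SK_CONTINUE;

	/* SYN, or ACK in a state we can't recover from: give up */
	return SK_RESET;
}
```

With something like this, a failed ACK in the keep-alive path would
map to SK_CONTINUE, while a failed SYN would map to SK_RESET.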

> +
>  			tcp_timer_ctl(c, conn);
>  
>  			if (setsockopt(conn->sock, SOL_SOCKET, SO_KEEPALIVE,
> @@ -2043,14 +2045,16 @@ eintr:
>  			 *   Then swiftly looked away and left.
>  			 */
>  			conn->seq_from_tap = seq_from_tap;
> -			tcp_send_flag(c, conn, ACK);
> +			if (tcp_send_flag(c, conn, ACK))
> +				return -1;
>  		}
>  
>  		if (errno == EINTR)
>  			goto eintr;
>  
>  		if (errno == EAGAIN || errno == EWOULDBLOCK) {
> -			tcp_send_flag(c, conn, ACK | DUP_ACK);
> +			if (tcp_send_flag(c, conn, ACK | DUP_ACK))
> +				return -1;
>  			return p->count - idx;
>  
>  		}
> @@ -2070,7 +2074,8 @@ out:
>  		 */
>  		if (conn->seq_dup_ack_approx != (conn->seq_from_tap & 0xff)) {
>  			conn->seq_dup_ack_approx = conn->seq_from_tap & 0xff;
> -			tcp_send_flag(c, conn, ACK | DUP_ACK);
> +			if (tcp_send_flag(c, conn, ACK | DUP_ACK))
> +				return -1;
>  		}
>  		return p->count - idx;
>  	}
> @@ -2084,7 +2089,8 @@ out:
>  
>  		conn_event(c, conn, TAP_FIN_RCVD);
>  	} else {
> -		tcp_send_flag(c, conn, ACK_IF_NEEDED);
> +		if (tcp_send_flag(c, conn, ACK_IF_NEEDED))
> +			return -1;
>  	}
>  
>  	return p->count - idx;
> @@ -2122,7 +2128,10 @@ static void tcp_conn_from_sock_finish(const struct ctx *c,
>  		return;
>  	}
>  
> -	tcp_send_flag(c, conn, ACK);
> +	if (tcp_send_flag(c, conn, ACK)) {
> +		tcp_rst(c, conn);
> +		return;
> +	}
>  
>  	/* The client might have sent data already, which we didn't
>  	 * dequeue waiting for SYN,ACK from tap -- check now.
> @@ -2308,7 +2317,9 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
>  				goto reset;
>  			}
>  
> -			tcp_send_flag(c, conn, ACK);
> +			if (tcp_send_flag(c, conn, ACK))
> +				goto reset;
> +
>  			conn_event(c, conn, SOCK_FIN_SENT);
>  
>  			return 1;
> @@ -2388,7 +2399,9 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
>  		}
>  
>  		conn_event(c, conn, SOCK_FIN_SENT);
> -		tcp_send_flag(c, conn, ACK);
> +		if (tcp_send_flag(c, conn, ACK))
> +			goto reset;
> +
>  		ack_due = 0;
>  
>  		/* If we received a FIN, but the socket is in TCP_ESTABLISHED
> @@ -2478,7 +2491,11 @@ static void tcp_tap_conn_from_sock(const struct ctx *c, union flow *flow,
>  
>  	conn->wnd_from_tap = WINDOW_DEFAULT;
>  
> -	tcp_send_flag(c, conn, SYN);
> +	if (tcp_send_flag(c, conn, SYN)) {
> +		conn_flag(c, conn, CLOSING);

I would wait for David to confirm, but I'm fairly sure that this needs
FLOW_ACTIVATE(conn); before returning, just like in the other error path
of this function, because otherwise we'll leave the newly created flow
in an "incomplete" state.

Due to flow table restrictions we adopted to keep the implementation
simple (see "Theory of Operation - allocating and freeing flow entries"
in flow.c), quoting from the documentation to enum flow_state in
flow.h:

 *        Caveats:
 *            - At most one entry may be NEW, INI, TGT or TYPED at a time, so
 *              it's unsafe to use flow_alloc() again until this entry moves to
 *              ACTIVE or FREE

so, if we create a second connection within the same epoll cycle (for
example by calling tcp_tap_conn_from_sock() again), we'll now have two
entries in state TYPED, which breaks this assumption, and things will

David, I think this isn't documented very obviously, even though it's
all there in flow.h. This just occurred to me because of commit
52419a64f2df ("migrate, tcp: Don't flow_alloc_cancel() during incoming
migration") but we can't expect others to know about past commits.

I wonder if you could think of a quick way to make this more prominent...
should we perhaps state return conditions in functions, like you already
added for isolation.c?
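
As an illustration of the restriction quoted above, here's a toy model
of the "at most one incomplete entry" rule. It's only a sketch: the
toy_ names are made up, the states are reduced to three, and refusing
the allocation is my way to make the violation visible, not what
flow_alloc() in flow.c actually does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, NOT passt's flow table: all toy_ names are hypothetical */
enum toy_state { TOY_FREE, TOY_TYPED, TOY_ACTIVE };

#define TOY_FLOW_MAX	4

struct toy_table {
	enum toy_state state[TOY_FLOW_MAX];
};

/* Return true if some entry is still between allocation and activation */
static bool toy_incomplete(const struct toy_table *t)
{
	size_t i;

	for (i = 0; i < TOY_FLOW_MAX; i++)
		if (t->state[i] == TOY_TYPED)
			return true;

	return false;
}

/* Allocate an entry: return its index, or -1 if the table is full or if
 * another entry is still incomplete (the rule from the quoted caveat)
 */
static int toy_alloc(struct toy_table *t)
{
	size_t i;

	if (toy_incomplete(t))
		return -1;

	for (i = 0; i < TOY_FLOW_MAX; i++) {
		if (t->state[i] == TOY_FREE) {
			t->state[i] = TOY_TYPED;
			return (int)i;
		}
	}

	return -1;
}

/* Error path that completes the entry before returning: this stands in
 * for calling FLOW_ACTIVATE() in the real code
 */
static void toy_fail_activated(struct toy_table *t, int idx)
{
	assert(t->state[idx] == TOY_TYPED);
	t->state[idx] = TOY_ACTIVE;
}
```

In this model, returning early without the equivalent of
toy_fail_activated() leaves the entry in TOY_TYPED, and the next
toy_alloc() in the same cycle fails until the entry is completed.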

> +		return;
> +	}
> +
>  	conn_flag(c, conn, ACK_FROM_TAP_DUE);
>  
>  	tcp_get_sndbuf(conn);
> @@ -2585,7 +2602,10 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref)
>  		return;
>  
>  	if (conn->flags & ACK_TO_TAP_DUE) {
> -		tcp_send_flag(c, conn, ACK_IF_NEEDED);
> +		if (tcp_send_flag(c, conn, ACK_IF_NEEDED)) {
> +			tcp_rst(c, conn);
> +			return;
> +		}
>  		tcp_timer_ctl(c, conn);
>  	} else if (conn->flags & ACK_FROM_TAP_DUE) {
>  		if (!(conn->events & ESTABLISHED)) {
> @@ -2598,7 +2618,10 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref)
>  				tcp_rst(c, conn);
>  			} else {
>  				flow_trace(conn, "SYN timeout, retry");
> -				tcp_send_flag(c, conn, SYN);
> +				if (tcp_send_flag(c, conn, SYN)) {
> +					tcp_rst(c, conn);
> +					return;
> +				}
>  				conn->retries++;
>  				conn_flag(c, conn, SYN_RETRIED);
>  				tcp_timer_ctl(c, conn);
> @@ -2662,8 +2685,11 @@ void tcp_sock_handler(const struct ctx *c, union epoll_ref ref,
>  			tcp_data_from_sock(c, conn);
>  
>  		if (events & EPOLLOUT) {
> -			if (tcp_update_seqack_wnd(c, conn, false, NULL))
> -				tcp_send_flag(c, conn, ACK);
> +			if (tcp_update_seqack_wnd(c, conn, false, NULL) &&
> +			    tcp_send_flag(c, conn, ACK)) {
> +				tcp_rst(c, conn);
> +				return;
> +			}
>  		}
>  
>  		return;
> @@ -2903,7 +2929,8 @@ static void tcp_keepalive(struct ctx *c, const struct timespec *now)
>  		if (conn->tap_inactive) {
>  			flow_dbg(conn, "No tap activity for least %us, send keepalive",
>  				 KEEPALIVE_INTERVAL);
> -			tcp_send_flag(c, conn, KEEPALIVE);
> +			if (tcp_send_flag(c, conn, KEEPALIVE))
> +				tcp_rst(c, conn);
>  		}
>  
>  		/* Ready to check fot next interval */
> @@ -3926,7 +3953,9 @@ int tcp_flow_migrate_target_ext(struct ctx *c, struct tcp_tap_conn *conn, int fd
>  	if (tcp_set_peek_offset(conn, peek_offset))
>  		goto fail;
>  
> -	tcp_send_flag(c, conn, ACK);
> +	if (tcp_send_flag(c, conn, ACK))
> +		goto fail;
> +
>  	tcp_data_from_sock(c, conn);
>  
>  	if ((rc = tcp_epoll_ctl(conn))) {

-- 
Stefano