public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: passt-dev@passt.top
Subject: Re: [PATCH 12/12] udp_flow: Don't discard packets that arrive between bind() and connect()
Date: Mon, 7 Apr 2025 23:49:41 +0200	[thread overview]
Message-ID: <20250407234941.4a0b811b@elisabeth> (raw)
In-Reply-To: <20250404101542.3729316-13-david@gibson.dropbear.id.au>

On Fri,  4 Apr 2025 21:15:42 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> When we establish a new UDP flow we create connect()ed sockets that will
> only handle datagrams for this flow.  However, there is a race between
> bind() and connect() where they might get some packets queued for a
> different flow.  Currently we handle this by simply discarding any
> queued datagrams after the connect.  UDP protocols should be able to handle
> such packet loss, but it's not ideal.
> 
> We now have the tools we need to handle this better, by redirecting any
> datagrams received during that race to the appropriate flow.  We need to
> use a deferred handler for this to avoid unexpectedly re-ordering datagrams
> in some edge cases.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  flow.c         |  2 +-
>  udp.c          |  4 +--
>  udp_flow.c     | 73 +++++++++++++++++++++++++++++++++++---------------
>  udp_flow.h     |  6 ++++-
>  udp_internal.h |  2 ++
>  5 files changed, 61 insertions(+), 26 deletions(-)
> 
> diff --git a/flow.c b/flow.c
> index 86222426..29a83e18 100644
> --- a/flow.c
> +++ b/flow.c
> @@ -850,7 +850,7 @@ void flow_defer_handler(const struct ctx *c, const struct timespec *now)
>  				closed = icmp_ping_timer(c, &flow->ping, now);
>  			break;
>  		case FLOW_UDP:
> -			closed = udp_flow_defer(&flow->udp);
> +			closed = udp_flow_defer(c, &flow->udp, now);
>  			if (!closed && timer)
>  				closed = udp_flow_timer(c, &flow->udp, now);
>  			break;
> diff --git a/udp.c b/udp.c
> index 7c8b7a2c..b275db3d 100644
> --- a/udp.c
> +++ b/udp.c
> @@ -697,8 +697,8 @@ static void udp_buf_sock_to_tap(const struct ctx *c, int s, int n,
>   * @port:	Our (local) port number of @s
>   * @now:	Current timestamp
>   */
> -static void udp_sock_fwd(const struct ctx *c, int s, uint8_t frompif,
> -			 in_port_t port, const struct timespec *now)
> +void udp_sock_fwd(const struct ctx *c, int s, uint8_t frompif,
> +		  in_port_t port, const struct timespec *now)
>  {
>  	union sockaddr_inany src;
>  
> diff --git a/udp_flow.c b/udp_flow.c
> index b95c3176..af15d7f2 100644
> --- a/udp_flow.c
> +++ b/udp_flow.c
> @@ -9,10 +9,12 @@
>  #include <fcntl.h>
>  #include <sys/uio.h>
>  #include <unistd.h>
> +#include <netinet/udp.h>
>  
>  #include "util.h"
>  #include "passt.h"
>  #include "flow_table.h"
> +#include "udp_internal.h"
>  
>  #define UDP_CONN_TIMEOUT	180 /* s, timeout for ephemeral or local bind */
>  
> @@ -67,16 +69,15 @@ void udp_flow_close(const struct ctx *c, struct udp_flow *uflow)
>   * Return: fd of new socket on success, -ve error code on failure
>   */
>  static int udp_flow_sock(const struct ctx *c,
> -			 const struct udp_flow *uflow, unsigned sidei)
> +			 struct udp_flow *uflow, unsigned sidei)
>  {
>  	const struct flowside *side = &uflow->f.side[sidei];
> -	struct mmsghdr discard[UIO_MAXIOV] = { 0 };
>  	uint8_t pif = uflow->f.pif[sidei];
>  	union {
>  		flow_sidx_t sidx;
>  		uint32_t data;
>  	} fref = { .sidx = FLOW_SIDX(uflow, sidei) };
> -	int rc, s;
> +	int s;
>  
>  	s = flowside_sock_l4(c, EPOLL_TYPE_UDP, pif, side, fref.data);
>  	if (s < 0) {
> @@ -85,30 +86,32 @@ static int udp_flow_sock(const struct ctx *c,
>  	}
>  
>  	if (flowside_connect(c, s, pif, side) < 0) {
> -		rc = -errno;
> +		int rc = -errno;
>  		flow_dbg_perror(uflow, "Couldn't connect flow socket");
>  		return rc;
>  	}
>  
> -	/* It's possible, if unlikely, that we could receive some unrelated
> -	 * packets in between the bind() and connect() of this socket.  For now
> -	 * we just discard these.
> +	/* It's possible, if unlikely, that we could receive some packets in
> +	 * between the bind() and connect() which may or may not be for this
> +	 * flow.  Being UDP we could just discard them, but it's not ideal.
>  	 *
> -	 * FIXME: Redirect these to an appropriate handler
> +	 * There's also a tricky case if a bunch of datagrams for a new flow
> +	 * arrive in rapid succession, the first going to the original listening
> +	 * socket and later ones going to this new socket.  If we forwarded the
> +	 * datagrams from the new socket immediately here they would go before
> +	 * the datagram which established the flow.  Again, not strictly wrong
> +	 * for UDP, but not ideal.
> +	 *
> +	 * So, we flag that the new socket is in a transient state where it
> +	 * might have datagrams for a different flow queued.  Before the next
> +	 * epoll cycle, udp_flow_defer() will flush out any such datagrams, and
> +	 * thereafter everything on the new socket should be strictly for this
> +	 * flow.
>  	 */
> -	rc = recvmmsg(s, discard, ARRAY_SIZE(discard), MSG_DONTWAIT, NULL);
> -	if (rc >= ARRAY_SIZE(discard)) {
> -		flow_dbg(uflow, "Too many (%d) spurious reply datagrams", rc);
> -		return -E2BIG;
> -	}
> -
> -	if (rc > 0) {
> -		flow_trace(uflow, "Discarded %d spurious reply datagrams", rc);
> -	} else if (errno != EAGAIN) {
> -		rc = -errno;
> -		flow_perror(uflow, "Unexpected error discarding datagrams");
> -		return rc;
> -	}
> +	if (sidei)
> +		uflow->flush1 = true;
> +	else
> +		uflow->flush0 = true;
>  
>  	return s;
>  }
> @@ -268,14 +271,40 @@ flow_sidx_t udp_flow_from_tap(const struct ctx *c,
>  	return udp_flow_new(c, flow, now);
>  }
>  
> +/**
> + * udp_flush_flow() - Flush datagrams that might not be for this flow
> + * @ctx:	Execution context
> + * @uflow:	Flow to handle
> + * @sidei:	Side of the flow to flush
> + * @now:	Current timestamp
> + */
> +static void udp_flush_flow(const struct ctx *c,
> +			   const struct udp_flow *uflow, unsigned sidei,
> +			   const struct timespec *now)
> +{
> +	/* We don't know exactly where the datagrams will come from, but we know
> +	 * they'll have an interface and oport matching this flow */
> +	udp_sock_fwd(c, uflow->s[sidei], uflow->f.pif[sidei],
> +		     uflow->f.side[sidei].oport, now);
> +}
> +
>  /**
>   * udp_flow_defer() - Deferred per-flow handling (clean up aborted flows)
>   * @uflow:	Flow to handle
>   *
>   * Return: true if the connection is ready to free, false otherwise
>   */
> -bool udp_flow_defer(const struct udp_flow *uflow)
> +bool udp_flow_defer(const struct ctx *c, struct udp_flow *uflow,
> +		    const struct timespec *now)

Function comment not updated. Updated on merge.

>  {
> +	if (uflow->flush0) {
> +		udp_flush_flow(c, uflow, INISIDE, now);
> +		uflow->flush0 = false;
> +	}
> +	if (uflow->flush1) {
> +		udp_flush_flow(c, uflow, TGTSIDE, now);
> +		uflow->flush1 = false;
> +	}
>  	return uflow->closed;
>  }
>  
> diff --git a/udp_flow.h b/udp_flow.h
> index d4e4c8b9..d518737e 100644
> --- a/udp_flow.h
> +++ b/udp_flow.h
> @@ -11,6 +11,8 @@
>   * struct udp - Descriptor for a flow of UDP packets
>   * @f:		Generic flow information
>   * @closed:	Flow is already closed
> + * @flush0:	@s[0] may have datagrams queued for other flows
> + * @flush1:	@s[1] may have datagrams queued for other flows
>   * @ts:		Activity timestamp
>   * @s:		Socket fd (or -1) for each side of the flow
>   */
> @@ -19,6 +21,7 @@ struct udp_flow {
>  	struct flow_common f;
>  
>  	bool closed :1;
> +	bool flush0, flush1 :1;
>  	time_t ts;
>  	int s[SIDES];
>  };
> @@ -33,7 +36,8 @@ flow_sidx_t udp_flow_from_tap(const struct ctx *c,
>  			      in_port_t srcport, in_port_t dstport,
>  			      const struct timespec *now);
>  void udp_flow_close(const struct ctx *c, struct udp_flow *uflow);
> -bool udp_flow_defer(const struct udp_flow *uflow);
> +bool udp_flow_defer(const struct ctx *c, struct udp_flow *uflow,
> +		    const struct timespec *now);
>  bool udp_flow_timer(const struct ctx *c, struct udp_flow *uflow,
>  		    const struct timespec *now);
>  
> diff --git a/udp_internal.h b/udp_internal.h
> index f7d84267..96d11cff 100644
> --- a/udp_internal.h
> +++ b/udp_internal.h
> @@ -28,5 +28,7 @@ size_t udp_update_hdr4(struct iphdr *ip4h, struct udp_payload_t *bp,
>  size_t udp_update_hdr6(struct ipv6hdr *ip6h, struct udp_payload_t *bp,
>                         const struct flowside *toside, size_t dlen,
>  		       bool no_udp_csum);
> +void udp_sock_fwd(const struct ctx *c, int s, uint8_t frompif,
> +		  in_port_t port, const struct timespec *now);
>  
>  #endif /* UDP_INTERNAL_H */

-- 
Stefano


  reply	other threads:[~2025-04-07 21:49 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-04 10:15 [PATCH 00/12] Use connect()ed sockets for both sides of UDP flows David Gibson
2025-04-04 10:15 ` [PATCH 01/12] udp: Use connect()ed sockets for initiating side David Gibson
2025-04-04 10:15 ` [PATCH 02/12] udp: Make udp_sock_recv() take max number of frames as a parameter David Gibson
2025-04-07 21:49   ` Stefano Brivio
2025-04-04 10:15 ` [PATCH 03/12] udp: Polish udp_vu_sock_info() and remove from vu specific code David Gibson
2025-04-07 21:49   ` Stefano Brivio
2025-04-04 10:15 ` [PATCH 04/12] udp: Don't bother to batch datagrams from "listening" socket David Gibson
2025-04-04 10:15 ` [PATCH 05/12] udp: Parameterize number of datagrams handled by udp_*_reply_sock_data() David Gibson
2025-04-04 10:15 ` [PATCH 06/12] udp: Split spliced forwarding path from udp_buf_reply_sock_data() David Gibson
2025-04-07 21:49   ` Stefano Brivio
2025-04-04 10:15 ` [PATCH 07/12] udp: Merge vhost-user and "buf" listening socket paths David Gibson
2025-04-04 10:15 ` [PATCH 08/12] udp: Move UDP_MAX_FRAMES to udp.c David Gibson
2025-04-04 10:15 ` [PATCH 09/12] udp_flow: Take pif and port as explicit parameters to udp_flow_from_sock() David Gibson
2025-04-07 21:49   ` Stefano Brivio
2025-04-04 10:15 ` [PATCH 10/12] udp: Rework udp_listen_sock_data() into udp_sock_fwd() David Gibson
2025-04-04 10:15 ` [PATCH 11/12] udp: Fold udp_splice_prepare and udp_splice_send into udp_sock_to_sock David Gibson
2025-04-07 21:49   ` Stefano Brivio
2025-04-04 10:15 ` [PATCH 12/12] udp_flow: Don't discard packets that arrive between bind() and connect() David Gibson
2025-04-07 21:49   ` Stefano Brivio [this message]
2025-04-07 21:49 ` [PATCH 00/12] Use connect()ed sockets for both sides of UDP flows Stefano Brivio
2025-04-07 23:50   ` David Gibson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250407234941.4a0b811b@elisabeth \
    --to=sbrivio@redhat.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=passt-dev@passt.top \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).