public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
* [PATCH 0/3] Dual stack sockets for UDP
@ 2024-08-26  9:37 David Gibson
  2024-08-26  9:37 ` [PATCH 1/3] udp: Merge udp[46]_mh_recv arrays David Gibson
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: David Gibson @ 2024-08-26  9:37 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

Here, at last, is a change to use dual stack sockets for UDP port
forwards as we have for TCP for some time.  With the merging of many
parallel IPv4/IPv6 structures being done for the flow table and other
recent work, this actually turned out much easier than I anticipated.

David Gibson (3):
  udp: Merge udp[46]_mh_recv arrays
  udp: Remove unnnecessary local from udp_sock_init()
  udp: Use dual stack sockets for port forwarding when possible

 udp.c | 104 ++++++++++++++++++++++++++++++----------------------------
 udp.h |   2 --
 2 files changed, 53 insertions(+), 53 deletions(-)

-- 
2.46.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/3] udp: Merge udp[46]_mh_recv arrays
  2024-08-26  9:37 [PATCH 0/3] Dual stack sockets for UDP David Gibson
@ 2024-08-26  9:37 ` David Gibson
  2024-08-26 19:32   ` Stefano Brivio
  2024-08-26  9:37 ` [PATCH 2/3] udp: Remove unnnecessary local from udp_sock_init() David Gibson
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: David Gibson @ 2024-08-26  9:37 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

We've already gotten rid of most of the IPv4/IPv6 specific data structures
in udp.c by merging them with each other.  One significant one remains:
udp[46]_mh_recv.  This was a bit awkward to remove because of a subtle
interaction.  We initialise the msg_namelen fields to represent the total
size we have for a socket address, but when we receive into the arrays
those are modified to the actual length of the sockaddr we received.

That meant that naively merging the arrays meant that if we received IPv4
datagrams, then IPv6 datagrams, the addresses for the latter would be
truncated.  In this patch address that by resetting the received
msg_namelen as soon as we've found a flow for the datagram.  Finding the
flow is the only thing that might use the actual sockaddr length, although
we in fact don't need it for the time being.

This also removes the last use of the 'v6' field from udp_listen_epoll_ref,
so remove that as well.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 udp.c | 57 ++++++++++++++++++++-------------------------------------
 udp.h |  2 --
 2 files changed, 20 insertions(+), 39 deletions(-)

diff --git a/udp.c b/udp.c
index 8a93aad6..6638c22b 100644
--- a/udp.c
+++ b/udp.c
@@ -178,8 +178,7 @@ enum udp_iov_idx {
 
 /* IOVs and msghdr arrays for receiving datagrams from sockets */
 static struct iovec	udp_iov_recv		[UDP_MAX_FRAMES];
-static struct mmsghdr	udp4_mh_recv		[UDP_MAX_FRAMES];
-static struct mmsghdr	udp6_mh_recv		[UDP_MAX_FRAMES];
+static struct mmsghdr	udp_mh_recv		[UDP_MAX_FRAMES];
 
 /* IOVs and msghdr arrays for sending "spliced" datagrams to sockets */
 static union sockaddr_inany udp_splice_to;
@@ -222,6 +221,7 @@ void udp_update_l2_buf(const unsigned char *eth_d, const unsigned char *eth_s)
 static void udp_iov_init_one(const struct ctx *c, size_t i)
 {
 	struct udp_payload_t *payload = &udp_payload[i];
+	struct msghdr *mh = &udp_mh_recv[i].msg_hdr;
 	struct udp_meta_t *meta = &udp_meta[i];
 	struct iovec *siov = &udp_iov_recv[i];
 	struct iovec *tiov = udp_l2_iov[i];
@@ -236,27 +236,10 @@ static void udp_iov_init_one(const struct ctx *c, size_t i)
 	tiov[UDP_IOV_TAP] = tap_hdr_iov(c, &meta->taph);
 	tiov[UDP_IOV_PAYLOAD].iov_base = payload;
 
-	/* It's useful to have separate msghdr arrays for receiving.  Otherwise,
-	 * an IPv4 recv() will alter msg_namelen, so we'd have to reset it every
-	 * time or risk truncating the address on future IPv6 recv()s.
-	 */
-	if (c->ifi4) {
-		struct msghdr *mh = &udp4_mh_recv[i].msg_hdr;
-
-		mh->msg_name	= &meta->s_in;
-		mh->msg_namelen	= sizeof(struct sockaddr_in);
-		mh->msg_iov	= siov;
-		mh->msg_iovlen	= 1;
-	}
-
-	if (c->ifi6) {
-		struct msghdr *mh = &udp6_mh_recv[i].msg_hdr;
-
-		mh->msg_name	= &meta->s_in;
-		mh->msg_namelen	= sizeof(struct sockaddr_in6);
-		mh->msg_iov	= siov;
-		mh->msg_iovlen	= 1;
-	}
+	mh->msg_name	= &meta->s_in;
+	mh->msg_namelen	= sizeof(meta->s_in);
+	mh->msg_iov	= siov;
+	mh->msg_iovlen	= 1;
 }
 
 /**
@@ -506,10 +489,10 @@ static int udp_sock_recv(const struct ctx *c, int s, uint32_t events,
 void udp_listen_sock_handler(const struct ctx *c, union epoll_ref ref,
 			     uint32_t events, const struct timespec *now)
 {
-	struct mmsghdr *mmh_recv = ref.udp.v6 ? udp6_mh_recv : udp4_mh_recv;
+	const socklen_t sasize = sizeof(udp_meta[0].s_in);
 	int n, i;
 
-	if ((n = udp_sock_recv(c, ref.fd, events, mmh_recv)) <= 0)
+	if ((n = udp_sock_recv(c, ref.fd, events, udp_mh_recv)) <= 0)
 		return;
 
 	/* We divide datagrams into batches based on how we need to send them,
@@ -518,6 +501,7 @@ void udp_listen_sock_handler(const struct ctx *c, union epoll_ref ref,
 	 * populate it one entry *ahead* of the loop counter.
 	 */
 	udp_meta[0].tosidx = udp_flow_from_sock(c, ref, &udp_meta[0].s_in, now);
+	udp_mh_recv[0].msg_hdr.msg_namelen = sasize;
 	for (i = 0; i < n; ) {
 		flow_sidx_t batchsidx = udp_meta[i].tosidx;
 		uint8_t batchpif = pif_at_sidx(batchsidx);
@@ -525,18 +509,22 @@ void udp_listen_sock_handler(const struct ctx *c, union epoll_ref ref,
 
 		do {
 			if (pif_is_socket(batchpif)) {
-				udp_splice_prepare(mmh_recv, i);
+				udp_splice_prepare(udp_mh_recv, i);
 			} else if (batchpif == PIF_TAP) {
-				udp_tap_prepare(mmh_recv, i,
+				udp_tap_prepare(udp_mh_recv, i,
 						flowside_at_sidx(batchsidx));
 			}
 
+			/* Restore sockaddr length clobbered by recvmsg() */
+			udp_mh_recv[i].msg_hdr.msg_namelen = sizeof(udp_meta[i].s_in);
+
 			if (++i >= n)
 				break;
 
 			udp_meta[i].tosidx = udp_flow_from_sock(c, ref,
 								&udp_meta[i].s_in,
 								now);
+			udp_mh_recv[i].msg_hdr.msg_namelen = sasize;
 		} while (flow_sidx_eq(udp_meta[i].tosidx, batchsidx));
 
 		if (pif_is_socket(batchpif)) {
@@ -572,19 +560,16 @@ void udp_listen_sock_handler(const struct ctx *c, union epoll_ref ref,
 void udp_reply_sock_handler(const struct ctx *c, union epoll_ref ref,
 			    uint32_t events, const struct timespec *now)
 {
-	const struct flowside *fromside = flowside_at_sidx(ref.flowside);
 	flow_sidx_t tosidx = flow_sidx_opposite(ref.flowside);
 	const struct flowside *toside = flowside_at_sidx(tosidx);
 	struct udp_flow *uflow = udp_at_sidx(ref.flowside);
 	int from_s = uflow->s[ref.flowside.sidei];
-	bool v6 = !inany_v4(&fromside->eaddr);
-	struct mmsghdr *mmh_recv = v6 ? udp6_mh_recv : udp4_mh_recv;
 	uint8_t topif = pif_at_sidx(tosidx);
 	int n, i;
 
 	ASSERT(!c->no_udp && uflow);
 
-	if ((n = udp_sock_recv(c, from_s, events, mmh_recv)) <= 0)
+	if ((n = udp_sock_recv(c, from_s, events, udp_mh_recv)) <= 0)
 		return;
 
 	flow_trace(uflow, "Received %d datagrams on reply socket", n);
@@ -592,9 +577,11 @@ void udp_reply_sock_handler(const struct ctx *c, union epoll_ref ref,
 
 	for (i = 0; i < n; i++) {
 		if (pif_is_socket(topif))
-			udp_splice_prepare(mmh_recv, i);
+			udp_splice_prepare(udp_mh_recv, i);
 		else if (topif == PIF_TAP)
-			udp_tap_prepare(mmh_recv, i, toside);
+			udp_tap_prepare(udp_mh_recv, i, toside);
+		/* Restore sockaddr length clobbered by recvmsg() */
+		udp_mh_recv[i].msg_hdr.msg_namelen = sizeof(udp_meta[i].s_in);
 	}
 
 	if (pif_is_socket(topif)) {
@@ -740,8 +727,6 @@ int udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
 		uref.pif = PIF_HOST;
 
 	if ((af == AF_INET || af == AF_UNSPEC) && c->ifi4) {
-		uref.v6 = 0;
-
 		if (!ns) {
 			r4 = s = sock_l4(c, AF_INET, EPOLL_TYPE_UDP_LISTEN,
 					 addr, ifname, port, uref.u32);
@@ -756,8 +741,6 @@ int udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
 	}
 
 	if ((af == AF_INET6 || af == AF_UNSPEC) && c->ifi6) {
-		uref.v6 = 1;
-
 		if (!ns) {
 			r6 = s = sock_l4(c, AF_INET6, EPOLL_TYPE_UDP_LISTEN,
 					 addr, ifname, port, uref.u32);
diff --git a/udp.h b/udp.h
index fb42e1c5..a8e76bfe 100644
--- a/udp.h
+++ b/udp.h
@@ -26,14 +26,12 @@ void udp_update_l2_buf(const unsigned char *eth_d, const unsigned char *eth_s);
  * union udp_listen_epoll_ref - epoll reference for "listening" UDP sockets
  * @port:		Source port for connected sockets, bound port otherwise
  * @pif:		pif for this socket
- * @v6:			Set for IPv6 sockets or connections
  * @u32:		Opaque u32 value of reference
  */
 union udp_listen_epoll_ref {
 	struct {
 		in_port_t	port;
 		uint8_t		pif;
-		bool		v6:1;
 	};
 	uint32_t u32;
 };
-- 
@@ -26,14 +26,12 @@ void udp_update_l2_buf(const unsigned char *eth_d, const unsigned char *eth_s);
  * union udp_listen_epoll_ref - epoll reference for "listening" UDP sockets
  * @port:		Source port for connected sockets, bound port otherwise
  * @pif:		pif for this socket
- * @v6:			Set for IPv6 sockets or connections
  * @u32:		Opaque u32 value of reference
  */
 union udp_listen_epoll_ref {
 	struct {
 		in_port_t	port;
 		uint8_t		pif;
-		bool		v6:1;
 	};
 	uint32_t u32;
 };
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/3] udp: Remove unnnecessary local from udp_sock_init()
  2024-08-26  9:37 [PATCH 0/3] Dual stack sockets for UDP David Gibson
  2024-08-26  9:37 ` [PATCH 1/3] udp: Merge udp[46]_mh_recv arrays David Gibson
@ 2024-08-26  9:37 ` David Gibson
  2024-08-26  9:37 ` [PATCH 3/3] udp: Use dual stack sockets for port forwarding when possible David Gibson
  2024-08-26 19:33 ` [PATCH 0/3] Dual stack sockets for UDP Stefano Brivio
  3 siblings, 0 replies; 9+ messages in thread
From: David Gibson @ 2024-08-26  9:37 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

The 's' variable is always redundant with either 'r4' or 'r6', so remove
it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 udp.c | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/udp.c b/udp.c
index 6638c22b..a2e7043a 100644
--- a/udp.c
+++ b/udp.c
@@ -717,7 +717,7 @@ int udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
 		  const void *addr, const char *ifname, in_port_t port)
 {
 	union udp_listen_epoll_ref uref = { .port = port };
-	int s, r4 = FD_REF_MAX + 1, r6 = FD_REF_MAX + 1;
+	int r4 = FD_REF_MAX + 1, r6 = FD_REF_MAX + 1;
 
 	ASSERT(!c->no_udp);
 
@@ -728,29 +728,29 @@ int udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
 
 	if ((af == AF_INET || af == AF_UNSPEC) && c->ifi4) {
 		if (!ns) {
-			r4 = s = sock_l4(c, AF_INET, EPOLL_TYPE_UDP_LISTEN,
-					 addr, ifname, port, uref.u32);
+			r4 = sock_l4(c, AF_INET, EPOLL_TYPE_UDP_LISTEN,
+				     addr, ifname, port, uref.u32);
 
-			udp_splice_init[V4][port] = s < 0 ? -1 : s;
+			udp_splice_init[V4][port] = r4 < 0 ? -1 : r4;
 		} else {
-			r4 = s = sock_l4(c, AF_INET, EPOLL_TYPE_UDP_LISTEN,
-					 &in4addr_loopback,
-					 ifname, port, uref.u32);
-			udp_splice_ns[V4][port] = s < 0 ? -1 : s;
+			r4  = sock_l4(c, AF_INET, EPOLL_TYPE_UDP_LISTEN,
+				      &in4addr_loopback,
+				      ifname, port, uref.u32);
+			udp_splice_ns[V4][port] = r4 < 0 ? -1 : r4;
 		}
 	}
 
 	if ((af == AF_INET6 || af == AF_UNSPEC) && c->ifi6) {
 		if (!ns) {
-			r6 = s = sock_l4(c, AF_INET6, EPOLL_TYPE_UDP_LISTEN,
-					 addr, ifname, port, uref.u32);
+			r6 = sock_l4(c, AF_INET6, EPOLL_TYPE_UDP_LISTEN,
+				     addr, ifname, port, uref.u32);
 
-			udp_splice_init[V6][port] = s < 0 ? -1 : s;
+			udp_splice_init[V6][port] = r6 < 0 ? -1 : r6;
 		} else {
-			r6 = s = sock_l4(c, AF_INET6, EPOLL_TYPE_UDP_LISTEN,
-					 &in6addr_loopback,
-					 ifname, port, uref.u32);
-			udp_splice_ns[V6][port] = s < 0 ? -1 : s;
+			r6 = sock_l4(c, AF_INET6, EPOLL_TYPE_UDP_LISTEN,
+				     &in6addr_loopback,
+				     ifname, port, uref.u32);
+			udp_splice_ns[V6][port] = r6 < 0 ? -1 : r6;
 		}
 	}
 
-- 
@@ -717,7 +717,7 @@ int udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
 		  const void *addr, const char *ifname, in_port_t port)
 {
 	union udp_listen_epoll_ref uref = { .port = port };
-	int s, r4 = FD_REF_MAX + 1, r6 = FD_REF_MAX + 1;
+	int r4 = FD_REF_MAX + 1, r6 = FD_REF_MAX + 1;
 
 	ASSERT(!c->no_udp);
 
@@ -728,29 +728,29 @@ int udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
 
 	if ((af == AF_INET || af == AF_UNSPEC) && c->ifi4) {
 		if (!ns) {
-			r4 = s = sock_l4(c, AF_INET, EPOLL_TYPE_UDP_LISTEN,
-					 addr, ifname, port, uref.u32);
+			r4 = sock_l4(c, AF_INET, EPOLL_TYPE_UDP_LISTEN,
+				     addr, ifname, port, uref.u32);
 
-			udp_splice_init[V4][port] = s < 0 ? -1 : s;
+			udp_splice_init[V4][port] = r4 < 0 ? -1 : r4;
 		} else {
-			r4 = s = sock_l4(c, AF_INET, EPOLL_TYPE_UDP_LISTEN,
-					 &in4addr_loopback,
-					 ifname, port, uref.u32);
-			udp_splice_ns[V4][port] = s < 0 ? -1 : s;
+			r4  = sock_l4(c, AF_INET, EPOLL_TYPE_UDP_LISTEN,
+				      &in4addr_loopback,
+				      ifname, port, uref.u32);
+			udp_splice_ns[V4][port] = r4 < 0 ? -1 : r4;
 		}
 	}
 
 	if ((af == AF_INET6 || af == AF_UNSPEC) && c->ifi6) {
 		if (!ns) {
-			r6 = s = sock_l4(c, AF_INET6, EPOLL_TYPE_UDP_LISTEN,
-					 addr, ifname, port, uref.u32);
+			r6 = sock_l4(c, AF_INET6, EPOLL_TYPE_UDP_LISTEN,
+				     addr, ifname, port, uref.u32);
 
-			udp_splice_init[V6][port] = s < 0 ? -1 : s;
+			udp_splice_init[V6][port] = r6 < 0 ? -1 : r6;
 		} else {
-			r6 = s = sock_l4(c, AF_INET6, EPOLL_TYPE_UDP_LISTEN,
-					 &in6addr_loopback,
-					 ifname, port, uref.u32);
-			udp_splice_ns[V6][port] = s < 0 ? -1 : s;
+			r6 = sock_l4(c, AF_INET6, EPOLL_TYPE_UDP_LISTEN,
+				     &in6addr_loopback,
+				     ifname, port, uref.u32);
+			udp_splice_ns[V6][port] = r6 < 0 ? -1 : r6;
 		}
 	}
 
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 3/3] udp: Use dual stack sockets for port forwarding when possible
  2024-08-26  9:37 [PATCH 0/3] Dual stack sockets for UDP David Gibson
  2024-08-26  9:37 ` [PATCH 1/3] udp: Merge udp[46]_mh_recv arrays David Gibson
  2024-08-26  9:37 ` [PATCH 2/3] udp: Remove unnnecessary local from udp_sock_init() David Gibson
@ 2024-08-26  9:37 ` David Gibson
  2024-08-26 19:33 ` [PATCH 0/3] Dual stack sockets for UDP Stefano Brivio
  3 siblings, 0 replies; 9+ messages in thread
From: David Gibson @ 2024-08-26  9:37 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

Platforms like Linux allow IPv6 sockets to listen for IPv4 connections as
well as native IPv6 connections.  By doing this we halve the number of
listening sockets we need (assuming passt/pasta is listening on the same
ports for IPv4 and IPv6).  When forwarding many ports (e.g. -u all) this
can significantly reduce the amount of kernel memory that passt consumes.

We've used such dual stack sockets for TCP since 8e914238b "tcp: Use dual
stack sockets for port forwarding when possible".  Add similar support for
UDP "listening" sockets.  Since UDP sockets don't use as much kernel memory
as TCP sockets this isn't as big a saving, but it's still significant.
When forwarding all TCP and UDP ports for both IPv4 & IPv6 (-t all -u all),
this reduces kernel memory usage from ~522 MiB to ~380MiB (kernel version
6.10.6 on Fedora 40, x86_64).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 udp.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/udp.c b/udp.c
index a2e7043a..8bfafed2 100644
--- a/udp.c
+++ b/udp.c
@@ -726,6 +726,25 @@ int udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
 	else
 		uref.pif = PIF_HOST;
 
+	if (af == AF_UNSPEC && c->ifi4 && c->ifi6) {
+		int s;
+
+		/* Attempt to get a dual stack socket */
+		if (!ns) {
+			s = sock_l4(c, AF_UNSPEC, EPOLL_TYPE_UDP_LISTEN,
+				    addr, ifname, port, uref.u32);
+			udp_splice_init[V4][port] = s < 0 ? -1 : s;
+			udp_splice_init[V6][port] = s < 0 ? -1 : s;
+		} else {
+			s = sock_l4(c, AF_UNSPEC, EPOLL_TYPE_UDP_LISTEN,
+				    &in4addr_loopback, ifname, port, uref.u32);
+			udp_splice_ns[V4][port] = s < 0 ? -1 : s;
+			udp_splice_ns[V6][port] = s < 0 ? -1 : s;
+		}
+		if (IN_INTERVAL(0, FD_REF_MAX, s))
+			return 0;
+	}
+
 	if ((af == AF_INET || af == AF_UNSPEC) && c->ifi4) {
 		if (!ns) {
 			r4 = sock_l4(c, AF_INET, EPOLL_TYPE_UDP_LISTEN,
-- 
@@ -726,6 +726,25 @@ int udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
 	else
 		uref.pif = PIF_HOST;
 
+	if (af == AF_UNSPEC && c->ifi4 && c->ifi6) {
+		int s;
+
+		/* Attempt to get a dual stack socket */
+		if (!ns) {
+			s = sock_l4(c, AF_UNSPEC, EPOLL_TYPE_UDP_LISTEN,
+				    addr, ifname, port, uref.u32);
+			udp_splice_init[V4][port] = s < 0 ? -1 : s;
+			udp_splice_init[V6][port] = s < 0 ? -1 : s;
+		} else {
+			s = sock_l4(c, AF_UNSPEC, EPOLL_TYPE_UDP_LISTEN,
+				    &in4addr_loopback, ifname, port, uref.u32);
+			udp_splice_ns[V4][port] = s < 0 ? -1 : s;
+			udp_splice_ns[V6][port] = s < 0 ? -1 : s;
+		}
+		if (IN_INTERVAL(0, FD_REF_MAX, s))
+			return 0;
+	}
+
 	if ((af == AF_INET || af == AF_UNSPEC) && c->ifi4) {
 		if (!ns) {
 			r4 = sock_l4(c, AF_INET, EPOLL_TYPE_UDP_LISTEN,
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] udp: Merge udp[46]_mh_recv arrays
  2024-08-26  9:37 ` [PATCH 1/3] udp: Merge udp[46]_mh_recv arrays David Gibson
@ 2024-08-26 19:32   ` Stefano Brivio
  2024-08-27  1:12     ` David Gibson
  0 siblings, 1 reply; 9+ messages in thread
From: Stefano Brivio @ 2024-08-26 19:32 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Mon, 26 Aug 2024 19:37:14 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> We've already gotten rid of most of the IPv4/IPv6 specific data structures
> in udp.c by merging them with each other.  One significant one remains:
> udp[46]_mh_recv.  This was a bit awkward to remove because of a subtle
> interaction.  We initialise the msg_namelen fields to represent the total
> size we have for a socket address, but when we receive into the arrays
> those are modified to the actual length of the sockaddr we received.
> 
> That meant that naively merging the arrays meant that if we received IPv4
> datagrams, then IPv6 datagrams, the addresses for the latter would be
> truncated.  In this patch address that by resetting the received
> msg_namelen as soon as we've found a flow for the datagram.  Finding the
> flow is the only thing that might use the actual sockaddr length, although
> we in fact don't need it for the time being.
> 
> This also removes the last use of the 'v6' field from udp_listen_epoll_ref,
> so remove that as well.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  udp.c | 57 ++++++++++++++++++++-------------------------------------
>  udp.h |  2 --
>  2 files changed, 20 insertions(+), 39 deletions(-)
> 
> diff --git a/udp.c b/udp.c
> index 8a93aad6..6638c22b 100644
> --- a/udp.c
> +++ b/udp.c
> @@ -178,8 +178,7 @@ enum udp_iov_idx {
>  
>  /* IOVs and msghdr arrays for receiving datagrams from sockets */
>  static struct iovec	udp_iov_recv		[UDP_MAX_FRAMES];
> -static struct mmsghdr	udp4_mh_recv		[UDP_MAX_FRAMES];
> -static struct mmsghdr	udp6_mh_recv		[UDP_MAX_FRAMES];
> +static struct mmsghdr	udp_mh_recv		[UDP_MAX_FRAMES];
>  
>  /* IOVs and msghdr arrays for sending "spliced" datagrams to sockets */
>  static union sockaddr_inany udp_splice_to;
> @@ -222,6 +221,7 @@ void udp_update_l2_buf(const unsigned char *eth_d, const unsigned char *eth_s)
>  static void udp_iov_init_one(const struct ctx *c, size_t i)
>  {
>  	struct udp_payload_t *payload = &udp_payload[i];
> +	struct msghdr *mh = &udp_mh_recv[i].msg_hdr;
>  	struct udp_meta_t *meta = &udp_meta[i];
>  	struct iovec *siov = &udp_iov_recv[i];
>  	struct iovec *tiov = udp_l2_iov[i];
> @@ -236,27 +236,10 @@ static void udp_iov_init_one(const struct ctx *c, size_t i)
>  	tiov[UDP_IOV_TAP] = tap_hdr_iov(c, &meta->taph);
>  	tiov[UDP_IOV_PAYLOAD].iov_base = payload;
>  
> -	/* It's useful to have separate msghdr arrays for receiving.  Otherwise,
> -	 * an IPv4 recv() will alter msg_namelen, so we'd have to reset it every
> -	 * time or risk truncating the address on future IPv6 recv()s.
> -	 */
> -	if (c->ifi4) {
> -		struct msghdr *mh = &udp4_mh_recv[i].msg_hdr;
> -
> -		mh->msg_name	= &meta->s_in;
> -		mh->msg_namelen	= sizeof(struct sockaddr_in);
> -		mh->msg_iov	= siov;
> -		mh->msg_iovlen	= 1;
> -	}
> -
> -	if (c->ifi6) {
> -		struct msghdr *mh = &udp6_mh_recv[i].msg_hdr;
> -
> -		mh->msg_name	= &meta->s_in;
> -		mh->msg_namelen	= sizeof(struct sockaddr_in6);
> -		mh->msg_iov	= siov;
> -		mh->msg_iovlen	= 1;
> -	}
> +	mh->msg_name	= &meta->s_in;
> +	mh->msg_namelen	= sizeof(meta->s_in);
> +	mh->msg_iov	= siov;
> +	mh->msg_iovlen	= 1;
>  }
>  
>  /**
> @@ -506,10 +489,10 @@ static int udp_sock_recv(const struct ctx *c, int s, uint32_t events,
>  void udp_listen_sock_handler(const struct ctx *c, union epoll_ref ref,
>  			     uint32_t events, const struct timespec *now)
>  {
> -	struct mmsghdr *mmh_recv = ref.udp.v6 ? udp6_mh_recv : udp4_mh_recv;
> +	const socklen_t sasize = sizeof(udp_meta[0].s_in);
>  	int n, i;
>  
> -	if ((n = udp_sock_recv(c, ref.fd, events, mmh_recv)) <= 0)
> +	if ((n = udp_sock_recv(c, ref.fd, events, udp_mh_recv)) <= 0)
>  		return;
>  
>  	/* We divide datagrams into batches based on how we need to send them,
> @@ -518,6 +501,7 @@ void udp_listen_sock_handler(const struct ctx *c, union epoll_ref ref,
>  	 * populate it one entry *ahead* of the loop counter.
>  	 */
>  	udp_meta[0].tosidx = udp_flow_from_sock(c, ref, &udp_meta[0].s_in, now);
> +	udp_mh_recv[0].msg_hdr.msg_namelen = sasize;

I don't understand why you need this assignment. To me it looks
redundant with:

  udp_mh_recv[i].msg_hdr.msg_namelen = sizeof(udp_meta[i].s_in);

later (because n > 0), and:

>  	for (i = 0; i < n; ) {
>  		flow_sidx_t batchsidx = udp_meta[i].tosidx;
>  		uint8_t batchpif = pif_at_sidx(batchsidx);
> @@ -525,18 +509,22 @@ void udp_listen_sock_handler(const struct ctx *c, union epoll_ref ref,
>  
>  		do {
>  			if (pif_is_socket(batchpif)) {
> -				udp_splice_prepare(mmh_recv, i);
> +				udp_splice_prepare(udp_mh_recv, i);
>  			} else if (batchpif == PIF_TAP) {
> -				udp_tap_prepare(mmh_recv, i,
> +				udp_tap_prepare(udp_mh_recv, i,
>  						flowside_at_sidx(batchsidx));
>  			}
>  
> +			/* Restore sockaddr length clobbered by recvmsg() */
> +			udp_mh_recv[i].msg_hdr.msg_namelen = sizeof(udp_meta[i].s_in);

what is the difference between assigning sizeof(udp_meta[i].s_in); and
sasize? I thought it would be the same quantity.

> +
>  			if (++i >= n)
>  				break;
>  
>  			udp_meta[i].tosidx = udp_flow_from_sock(c, ref,
>  								&udp_meta[i].s_in,
>  								now);
> +			udp_mh_recv[i].msg_hdr.msg_namelen = sasize;
>  		} while (flow_sidx_eq(udp_meta[i].tosidx, batchsidx));
>  
>  		if (pif_is_socket(batchpif)) {

-- 
Stefano


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/3] Dual stack sockets for UDP
  2024-08-26  9:37 [PATCH 0/3] Dual stack sockets for UDP David Gibson
                   ` (2 preceding siblings ...)
  2024-08-26  9:37 ` [PATCH 3/3] udp: Use dual stack sockets for port forwarding when possible David Gibson
@ 2024-08-26 19:33 ` Stefano Brivio
  3 siblings, 0 replies; 9+ messages in thread
From: Stefano Brivio @ 2024-08-26 19:33 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Mon, 26 Aug 2024 19:37:13 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> Here, at last, is a change to use dual stack sockets for UDP port
> forwards as we have for TCP for some time.  With the merging of many
> parallel IPv4/IPv6 structures being done for the flow table and other
> recent work, this actually turned out much easier than I anticipated.

...magic.

Except for the (non-functional) detail I don't understand about 1/3, it
looks good to me, and I tested this with results similar to yours (I had
to remember to rebuild mbuto.mem.img of course...).

-- 
Stefano


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] udp: Merge udp[46]_mh_recv arrays
  2024-08-26 19:32   ` Stefano Brivio
@ 2024-08-27  1:12     ` David Gibson
  2024-08-27  5:33       ` Stefano Brivio
  0 siblings, 1 reply; 9+ messages in thread
From: David Gibson @ 2024-08-27  1:12 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 7296 bytes --]

On Mon, Aug 26, 2024 at 09:32:55PM +0200, Stefano Brivio wrote:
> On Mon, 26 Aug 2024 19:37:14 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > We've already gotten rid of most of the IPv4/IPv6 specific data structures
> > in udp.c by merging them with each other.  One significant one remains:
> > udp[46]_mh_recv.  This was a bit awkward to remove because of a subtle
> > interaction.  We initialise the msg_namelen fields to represent the total
> > size we have for a socket address, but when we receive into the arrays
> > those are modified to the actual length of the sockaddr we received.
> > 
> > That meant that naively merging the arrays meant that if we received IPv4
> > datagrams, then IPv6 datagrams, the addresses for the latter would be
> > truncated.  In this patch address that by resetting the received
> > msg_namelen as soon as we've found a flow for the datagram.  Finding the
> > flow is the only thing that might use the actual sockaddr length, although
> > we in fact don't need it for the time being.
> > 
> > This also removes the last use of the 'v6' field from udp_listen_epoll_ref,
> > so remove that as well.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  udp.c | 57 ++++++++++++++++++++-------------------------------------
> >  udp.h |  2 --
> >  2 files changed, 20 insertions(+), 39 deletions(-)
> > 
> > diff --git a/udp.c b/udp.c
> > index 8a93aad6..6638c22b 100644
> > --- a/udp.c
> > +++ b/udp.c
> > @@ -178,8 +178,7 @@ enum udp_iov_idx {
> >  
> >  /* IOVs and msghdr arrays for receiving datagrams from sockets */
> >  static struct iovec	udp_iov_recv		[UDP_MAX_FRAMES];
> > -static struct mmsghdr	udp4_mh_recv		[UDP_MAX_FRAMES];
> > -static struct mmsghdr	udp6_mh_recv		[UDP_MAX_FRAMES];
> > +static struct mmsghdr	udp_mh_recv		[UDP_MAX_FRAMES];
> >  
> >  /* IOVs and msghdr arrays for sending "spliced" datagrams to sockets */
> >  static union sockaddr_inany udp_splice_to;
> > @@ -222,6 +221,7 @@ void udp_update_l2_buf(const unsigned char *eth_d, const unsigned char *eth_s)
> >  static void udp_iov_init_one(const struct ctx *c, size_t i)
> >  {
> >  	struct udp_payload_t *payload = &udp_payload[i];
> > +	struct msghdr *mh = &udp_mh_recv[i].msg_hdr;
> >  	struct udp_meta_t *meta = &udp_meta[i];
> >  	struct iovec *siov = &udp_iov_recv[i];
> >  	struct iovec *tiov = udp_l2_iov[i];
> > @@ -236,27 +236,10 @@ static void udp_iov_init_one(const struct ctx *c, size_t i)
> >  	tiov[UDP_IOV_TAP] = tap_hdr_iov(c, &meta->taph);
> >  	tiov[UDP_IOV_PAYLOAD].iov_base = payload;
> >  
> > -	/* It's useful to have separate msghdr arrays for receiving.  Otherwise,
> > -	 * an IPv4 recv() will alter msg_namelen, so we'd have to reset it every
> > -	 * time or risk truncating the address on future IPv6 recv()s.
> > -	 */
> > -	if (c->ifi4) {
> > -		struct msghdr *mh = &udp4_mh_recv[i].msg_hdr;
> > -
> > -		mh->msg_name	= &meta->s_in;
> > -		mh->msg_namelen	= sizeof(struct sockaddr_in);
> > -		mh->msg_iov	= siov;
> > -		mh->msg_iovlen	= 1;
> > -	}
> > -
> > -	if (c->ifi6) {
> > -		struct msghdr *mh = &udp6_mh_recv[i].msg_hdr;
> > -
> > -		mh->msg_name	= &meta->s_in;
> > -		mh->msg_namelen	= sizeof(struct sockaddr_in6);
> > -		mh->msg_iov	= siov;
> > -		mh->msg_iovlen	= 1;
> > -	}
> > +	mh->msg_name	= &meta->s_in;
> > +	mh->msg_namelen	= sizeof(meta->s_in);
> > +	mh->msg_iov	= siov;
> > +	mh->msg_iovlen	= 1;
> >  }
> >  
> >  /**
> > @@ -506,10 +489,10 @@ static int udp_sock_recv(const struct ctx *c, int s, uint32_t events,
> >  void udp_listen_sock_handler(const struct ctx *c, union epoll_ref ref,
> >  			     uint32_t events, const struct timespec *now)
> >  {
> > -	struct mmsghdr *mmh_recv = ref.udp.v6 ? udp6_mh_recv : udp4_mh_recv;
> > +	const socklen_t sasize = sizeof(udp_meta[0].s_in);
> >  	int n, i;
> >  
> > -	if ((n = udp_sock_recv(c, ref.fd, events, mmh_recv)) <= 0)
> > +	if ((n = udp_sock_recv(c, ref.fd, events, udp_mh_recv)) <= 0)
> >  		return;
> >  
> >  	/* We divide datagrams into batches based on how we need to send them,
> > @@ -518,6 +501,7 @@ void udp_listen_sock_handler(const struct ctx *c, union epoll_ref ref,
> >  	 * populate it one entry *ahead* of the loop counter.
> >  	 */
> >  	udp_meta[0].tosidx = udp_flow_from_sock(c, ref, &udp_meta[0].s_in, now);
> > +	udp_mh_recv[0].msg_hdr.msg_namelen = sasize;
> 
> I don't understand why you need this assignment. To me it looks
> redundant with:
> 
>   udp_mh_recv[i].msg_hdr.msg_namelen = sizeof(udp_meta[i].s_in);

It's not redundant per se, because the later assignment only occurs
for i > 0, so the first one is for slot 0.  It would, however, be
possible to move to a single assignment in the loop body before i is
incremented.

I did it this way, because I found it easier to reason about.  At
least theoretically the value of msg_namelen written by recvmmsg()
could be important, although we don't use yet (we rely on the
sa_family field instead).  But because of that it felt wrong to
overwrite that value before we've "consumed" it.  Logically that
happens in udp_flow_from_sock() which is what takes the address in
msg_name / msg_namelen and converts it into the long-term form (as
part of the flowside).  Hence, clearing msg_namelen immediately after
each call to udp_flow_from_sock() made sense to me.

I did consider changing udp_flow_from_sock() to take a socklen_t *
which it clears after using.  That seemed slightly abstraction
violationy to me: clearing msg_namelen only makes sense because the
address is part of a re-used mmsghdr array, and that's not something
udp_flow_from_sock() "knows".

That was my reasoning, anyway.  I'm happy enough to change it if you
have a preferred approach.

> later (because n > 0), and:
> 
> >  	for (i = 0; i < n; ) {
> >  		flow_sidx_t batchsidx = udp_meta[i].tosidx;
> >  		uint8_t batchpif = pif_at_sidx(batchsidx);
> > @@ -525,18 +509,22 @@ void udp_listen_sock_handler(const struct ctx *c, union epoll_ref ref,
> >  
> >  		do {
> >  			if (pif_is_socket(batchpif)) {
> > -				udp_splice_prepare(mmh_recv, i);
> > +				udp_splice_prepare(udp_mh_recv, i);
> >  			} else if (batchpif == PIF_TAP) {
> > -				udp_tap_prepare(mmh_recv, i,
> > +				udp_tap_prepare(udp_mh_recv, i,
> >  						flowside_at_sidx(batchsidx));
> >  			}
> >  
> > +			/* Restore sockaddr length clobbered by recvmsg() */
> > +			udp_mh_recv[i].msg_hdr.msg_namelen = sizeof(udp_meta[i].s_in);
> 
> what is the difference between assigning sizeof(udp_meta[i].s_in); and
> sasize? I thought it would be the same quantity.

It is.  The only purpose of sasize is to avoid some over-long lines.

> > +
> >  			if (++i >= n)
> >  				break;
> >  
> >  			udp_meta[i].tosidx = udp_flow_from_sock(c, ref,
> >  								&udp_meta[i].s_in,
> >  								now);
> > +			udp_mh_recv[i].msg_hdr.msg_namelen = sasize;
> >  		} while (flow_sidx_eq(udp_meta[i].tosidx, batchsidx));
> >  
> >  		if (pif_is_socket(batchpif)) {
> 

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] udp: Merge udp[46]_mh_recv arrays
  2024-08-27  1:12     ` David Gibson
@ 2024-08-27  5:33       ` Stefano Brivio
  2024-08-27  6:04         ` David Gibson
  0 siblings, 1 reply; 9+ messages in thread
From: Stefano Brivio @ 2024-08-27  5:33 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Tue, 27 Aug 2024 11:12:46 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Mon, Aug 26, 2024 at 09:32:55PM +0200, Stefano Brivio wrote:
> > On Mon, 26 Aug 2024 19:37:14 +1000
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> > > We've already gotten rid of most of the IPv4/IPv6 specific data structures
> > > in udp.c by merging them with each other.  One significant one remains:
> > > udp[46]_mh_recv.  This was a bit awkward to remove because of a subtle
> > > interaction.  We initialise the msg_namelen fields to represent the total
> > > size we have for a socket address, but when we receive into the arrays
> > > those are modified to the actual length of the sockaddr we received.
> > > 
> > > That meant that naively merging the arrays meant that if we received IPv4
> > > datagrams, then IPv6 datagrams, the addresses for the latter would be
> > > truncated.  In this patch address that by resetting the received
> > > msg_namelen as soon as we've found a flow for the datagram.  Finding the
> > > flow is the only thing that might use the actual sockaddr length, although
> > > we in fact don't need it for the time being.
> > > 
> > > This also removes the last use of the 'v6' field from udp_listen_epoll_ref,
> > > so remove that as well.
> > > 
> > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > > ---
> > >  udp.c | 57 ++++++++++++++++++++-------------------------------------
> > >  udp.h |  2 --
> > >  2 files changed, 20 insertions(+), 39 deletions(-)
> > > 
> > > diff --git a/udp.c b/udp.c
> > > index 8a93aad6..6638c22b 100644
> > > --- a/udp.c
> > > +++ b/udp.c
> > > @@ -178,8 +178,7 @@ enum udp_iov_idx {
> > >  
> > >  /* IOVs and msghdr arrays for receiving datagrams from sockets */
> > >  static struct iovec	udp_iov_recv		[UDP_MAX_FRAMES];
> > > -static struct mmsghdr	udp4_mh_recv		[UDP_MAX_FRAMES];
> > > -static struct mmsghdr	udp6_mh_recv		[UDP_MAX_FRAMES];
> > > +static struct mmsghdr	udp_mh_recv		[UDP_MAX_FRAMES];
> > >  
> > >  /* IOVs and msghdr arrays for sending "spliced" datagrams to sockets */
> > >  static union sockaddr_inany udp_splice_to;
> > > @@ -222,6 +221,7 @@ void udp_update_l2_buf(const unsigned char *eth_d, const unsigned char *eth_s)
> > >  static void udp_iov_init_one(const struct ctx *c, size_t i)
> > >  {
> > >  	struct udp_payload_t *payload = &udp_payload[i];
> > > +	struct msghdr *mh = &udp_mh_recv[i].msg_hdr;
> > >  	struct udp_meta_t *meta = &udp_meta[i];
> > >  	struct iovec *siov = &udp_iov_recv[i];
> > >  	struct iovec *tiov = udp_l2_iov[i];
> > > @@ -236,27 +236,10 @@ static void udp_iov_init_one(const struct ctx *c, size_t i)
> > >  	tiov[UDP_IOV_TAP] = tap_hdr_iov(c, &meta->taph);
> > >  	tiov[UDP_IOV_PAYLOAD].iov_base = payload;
> > >  
> > > -	/* It's useful to have separate msghdr arrays for receiving.  Otherwise,
> > > -	 * an IPv4 recv() will alter msg_namelen, so we'd have to reset it every
> > > -	 * time or risk truncating the address on future IPv6 recv()s.
> > > -	 */
> > > -	if (c->ifi4) {
> > > -		struct msghdr *mh = &udp4_mh_recv[i].msg_hdr;
> > > -
> > > -		mh->msg_name	= &meta->s_in;
> > > -		mh->msg_namelen	= sizeof(struct sockaddr_in);
> > > -		mh->msg_iov	= siov;
> > > -		mh->msg_iovlen	= 1;
> > > -	}
> > > -
> > > -	if (c->ifi6) {
> > > -		struct msghdr *mh = &udp6_mh_recv[i].msg_hdr;
> > > -
> > > -		mh->msg_name	= &meta->s_in;
> > > -		mh->msg_namelen	= sizeof(struct sockaddr_in6);
> > > -		mh->msg_iov	= siov;
> > > -		mh->msg_iovlen	= 1;
> > > -	}
> > > +	mh->msg_name	= &meta->s_in;
> > > +	mh->msg_namelen	= sizeof(meta->s_in);
> > > +	mh->msg_iov	= siov;
> > > +	mh->msg_iovlen	= 1;
> > >  }
> > >  
> > >  /**
> > > @@ -506,10 +489,10 @@ static int udp_sock_recv(const struct ctx *c, int s, uint32_t events,
> > >  void udp_listen_sock_handler(const struct ctx *c, union epoll_ref ref,
> > >  			     uint32_t events, const struct timespec *now)
> > >  {
> > > -	struct mmsghdr *mmh_recv = ref.udp.v6 ? udp6_mh_recv : udp4_mh_recv;
> > > +	const socklen_t sasize = sizeof(udp_meta[0].s_in);
> > >  	int n, i;
> > >  
> > > -	if ((n = udp_sock_recv(c, ref.fd, events, mmh_recv)) <= 0)
> > > +	if ((n = udp_sock_recv(c, ref.fd, events, udp_mh_recv)) <= 0)
> > >  		return;
> > >  
> > >  	/* We divide datagrams into batches based on how we need to send them,
> > > @@ -518,6 +501,7 @@ void udp_listen_sock_handler(const struct ctx *c, union epoll_ref ref,
> > >  	 * populate it one entry *ahead* of the loop counter.
> > >  	 */
> > >  	udp_meta[0].tosidx = udp_flow_from_sock(c, ref, &udp_meta[0].s_in, now);
> > > +	udp_mh_recv[0].msg_hdr.msg_namelen = sasize;  
> > 
> > I don't understand why you need this assignment. To me it looks
> > redundant with:
> > 
> >   udp_mh_recv[i].msg_hdr.msg_namelen = sizeof(udp_meta[i].s_in);  
> 
> It's not redundant per se, because the later assignment only occurs
> for i > 0, so the first one is for slot 0.

I still don't see how: the second assignment (out of three) is done
before i is incremented, so that should cover i == 0 as well, right?

> It would, however, be
> possible to move to a single assignment in the loop body before i is
> incremented.
> 
> I did it this way, because I found it easier to reason about.  At
> least theoretically the value of msg_namelen written by recvmmsg()
> could be important, although we don't use yet (we rely on the
> sa_family field instead).  But because of that it felt wrong to
> overwrite that value before we've "consumed" it.  Logically that
> happens in udp_flow_from_sock() which is what takes the address in
> msg_name / msg_namelen and converts it into the long-term form (as
> part of the flowside).  Hence, clearing msg_namelen immediately after
> each call to udp_flow_from_sock() made sense to me.
> 
> I did consider changing udp_flow_from_sock() to take a socklen_t *
> which it clears after using.  That seemed slightly abstraction
> violationy to me: clearing msg_namelen only makes sense because the
> address is part of a re-used mmsghdr array, and that's not something
> udp_flow_from_sock() "knows".
> 
> That was my reasoning, anyway.  I'm happy enough to change it if you
> have a preferred approach.

No, no, this all makes sense. But you add three assignments here, and I
don't understand why #1 is needed if we have #2 and #3, or why #2 is
needed if we have #1 and #3.

> > later (because n > 0), and:
> >   
> > >  	for (i = 0; i < n; ) {
> > >  		flow_sidx_t batchsidx = udp_meta[i].tosidx;
> > >  		uint8_t batchpif = pif_at_sidx(batchsidx);
> > > @@ -525,18 +509,22 @@ void udp_listen_sock_handler(const struct ctx *c, union epoll_ref ref,
> > >  
> > >  		do {
> > >  			if (pif_is_socket(batchpif)) {
> > > -				udp_splice_prepare(mmh_recv, i);
> > > +				udp_splice_prepare(udp_mh_recv, i);
> > >  			} else if (batchpif == PIF_TAP) {
> > > -				udp_tap_prepare(mmh_recv, i,
> > > +				udp_tap_prepare(udp_mh_recv, i,
> > >  						flowside_at_sidx(batchsidx));
> > >  			}
> > >  
> > > +			/* Restore sockaddr length clobbered by recvmsg() */
> > > +			udp_mh_recv[i].msg_hdr.msg_namelen = sizeof(udp_meta[i].s_in);  
> > 
> > what is the difference between assigning sizeof(udp_meta[i].s_in); and
> > sasize? I thought it would be the same quantity.  
> 
> It is.  The only purpose of sasize is to avoid some over-long lines.

Right, but why do you use it just twice out of three assignments? What
is special with the one immediately above here?

> > > +
> > >  			if (++i >= n)
> > >  				break;
> > >  
> > >  			udp_meta[i].tosidx = udp_flow_from_sock(c, ref,
> > >  								&udp_meta[i].s_in,
> > >  								now);
> > > +			udp_mh_recv[i].msg_hdr.msg_namelen = sasize;
> > >  		} while (flow_sidx_eq(udp_meta[i].tosidx, batchsidx));
> > >  
> > >  		if (pif_is_socket(batchpif)) {  

-- 
Stefano


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] udp: Merge udp[46]_mh_recv arrays
  2024-08-27  5:33       ` Stefano Brivio
@ 2024-08-27  6:04         ` David Gibson
  0 siblings, 0 replies; 9+ messages in thread
From: David Gibson @ 2024-08-27  6:04 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 7197 bytes --]

On Tue, Aug 27, 2024 at 07:33:29AM +0200, Stefano Brivio wrote:
> On Tue, 27 Aug 2024 11:12:46 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Mon, Aug 26, 2024 at 09:32:55PM +0200, Stefano Brivio wrote:
> > > On Mon, 26 Aug 2024 19:37:14 +1000
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > >   
> > > > We've already gotten rid of most of the IPv4/IPv6 specific data structures
> > > > in udp.c by merging them with each other.  One significant one remains:
> > > > udp[46]_mh_recv.  This was a bit awkward to remove because of a subtle
> > > > interaction.  We initialise the msg_namelen fields to represent the total
> > > > size we have for a socket address, but when we receive into the arrays
> > > > those are modified to the actual length of the sockaddr we received.
> > > > 
> > > > That meant that naively merging the arrays meant that if we received IPv4
> > > > datagrams, then IPv6 datagrams, the addresses for the latter would be
> > > > truncated.  In this patch address that by resetting the received
> > > > msg_namelen as soon as we've found a flow for the datagram.  Finding the
> > > > flow is the only thing that might use the actual sockaddr length, although
> > > > we in fact don't need it for the time being.
> > > > 
> > > > This also removes the last use of the 'v6' field from udp_listen_epoll_ref,
> > > > so remove that as well.
> > > > 
> > > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > > > ---
> > > >  udp.c | 57 ++++++++++++++++++++-------------------------------------
> > > >  udp.h |  2 --
> > > >  2 files changed, 20 insertions(+), 39 deletions(-)
> > > > 
> > > > diff --git a/udp.c b/udp.c
> > > > index 8a93aad6..6638c22b 100644
> > > > --- a/udp.c
> > > > +++ b/udp.c
> > > > @@ -178,8 +178,7 @@ enum udp_iov_idx {
> > > >  
> > > >  /* IOVs and msghdr arrays for receiving datagrams from sockets */
> > > >  static struct iovec	udp_iov_recv		[UDP_MAX_FRAMES];
> > > > -static struct mmsghdr	udp4_mh_recv		[UDP_MAX_FRAMES];
> > > > -static struct mmsghdr	udp6_mh_recv		[UDP_MAX_FRAMES];
> > > > +static struct mmsghdr	udp_mh_recv		[UDP_MAX_FRAMES];
> > > >  
> > > >  /* IOVs and msghdr arrays for sending "spliced" datagrams to sockets */
> > > >  static union sockaddr_inany udp_splice_to;
> > > > @@ -222,6 +221,7 @@ void udp_update_l2_buf(const unsigned char *eth_d, const unsigned char *eth_s)
> > > >  static void udp_iov_init_one(const struct ctx *c, size_t i)
> > > >  {
> > > >  	struct udp_payload_t *payload = &udp_payload[i];
> > > > +	struct msghdr *mh = &udp_mh_recv[i].msg_hdr;
> > > >  	struct udp_meta_t *meta = &udp_meta[i];
> > > >  	struct iovec *siov = &udp_iov_recv[i];
> > > >  	struct iovec *tiov = udp_l2_iov[i];
> > > > @@ -236,27 +236,10 @@ static void udp_iov_init_one(const struct ctx *c, size_t i)
> > > >  	tiov[UDP_IOV_TAP] = tap_hdr_iov(c, &meta->taph);
> > > >  	tiov[UDP_IOV_PAYLOAD].iov_base = payload;
> > > >  
> > > > -	/* It's useful to have separate msghdr arrays for receiving.  Otherwise,
> > > > -	 * an IPv4 recv() will alter msg_namelen, so we'd have to reset it every
> > > > -	 * time or risk truncating the address on future IPv6 recv()s.
> > > > -	 */
> > > > -	if (c->ifi4) {
> > > > -		struct msghdr *mh = &udp4_mh_recv[i].msg_hdr;
> > > > -
> > > > -		mh->msg_name	= &meta->s_in;
> > > > -		mh->msg_namelen	= sizeof(struct sockaddr_in);
> > > > -		mh->msg_iov	= siov;
> > > > -		mh->msg_iovlen	= 1;
> > > > -	}
> > > > -
> > > > -	if (c->ifi6) {
> > > > -		struct msghdr *mh = &udp6_mh_recv[i].msg_hdr;
> > > > -
> > > > -		mh->msg_name	= &meta->s_in;
> > > > -		mh->msg_namelen	= sizeof(struct sockaddr_in6);
> > > > -		mh->msg_iov	= siov;
> > > > -		mh->msg_iovlen	= 1;
> > > > -	}
> > > > +	mh->msg_name	= &meta->s_in;
> > > > +	mh->msg_namelen	= sizeof(meta->s_in);
> > > > +	mh->msg_iov	= siov;
> > > > +	mh->msg_iovlen	= 1;
> > > >  }
> > > >  
> > > >  /**
> > > > @@ -506,10 +489,10 @@ static int udp_sock_recv(const struct ctx *c, int s, uint32_t events,
> > > >  void udp_listen_sock_handler(const struct ctx *c, union epoll_ref ref,
> > > >  			     uint32_t events, const struct timespec *now)
> > > >  {
> > > > -	struct mmsghdr *mmh_recv = ref.udp.v6 ? udp6_mh_recv : udp4_mh_recv;
> > > > +	const socklen_t sasize = sizeof(udp_meta[0].s_in);
> > > >  	int n, i;
> > > >  
> > > > -	if ((n = udp_sock_recv(c, ref.fd, events, mmh_recv)) <= 0)
> > > > +	if ((n = udp_sock_recv(c, ref.fd, events, udp_mh_recv)) <= 0)
> > > >  		return;
> > > >  
> > > >  	/* We divide datagrams into batches based on how we need to send them,
> > > > @@ -518,6 +501,7 @@ void udp_listen_sock_handler(const struct ctx *c, union epoll_ref ref,
> > > >  	 * populate it one entry *ahead* of the loop counter.
> > > >  	 */
> > > >  	udp_meta[0].tosidx = udp_flow_from_sock(c, ref, &udp_meta[0].s_in, now);
> > > > +	udp_mh_recv[0].msg_hdr.msg_namelen = sasize;  
> > > 
> > > I don't understand why you need this assignment. To me it looks
> > > redundant with:
> > > 
> > >   udp_mh_recv[i].msg_hdr.msg_namelen = sizeof(udp_meta[i].s_in);  
> > 
> > It's not redundant per se, because the later assignment only occurs
> > for i > 0, so the first one is for slot 0.
> 
> I still don't see how: the second assignment (out of three) is done
> before i is incremented, so that should cover i == 0 as well, right?
> 
> > It would, however, be
> > possible to move to a single assignment in the loop body before i is
> > incremented.
> > 
> > I did it this way, because I found it easier to reason about.  At
> > least theoretically the value of msg_namelen written by recvmmsg()
> > could be important, although we don't use yet (we rely on the
> > sa_family field instead).  But because of that it felt wrong to
> > overwrite that value before we've "consumed" it.  Logically that
> > happens in udp_flow_from_sock() which is what takes the address in
> > msg_name / msg_namelen and converts it into the long-term form (as
> > part of the flowside).  Hence, clearing msg_namelen immediately after
> > each call to udp_flow_from_sock() made sense to me.
> > 
> > I did consider changing udp_flow_from_sock() to take a socklen_t *
> > which it clears after using.  That seemed slightly abstraction
> > violationy to me: clearing msg_namelen only makes sense because the
> > address is part of a re-used mmsghdr array, and that's not something
> > udp_flow_from_sock() "knows".
> > 
> > That was my reasoning, anyway.  I'm happy enough to change it if you
> > have a preferred approach.
> 
> No, no, this all makes sense. But you add three assignments here, and I
> don't understand why #1 is needed if we have #2 and #3, or why #2 is
> needed if we have #1 and #3.

Oh, bother, somehow I missed #2 when reading your comments.  It's a
leftover from an earlier draft and is not supposed to be there.
New spin shortly.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2024-08-27  6:05 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-08-26  9:37 [PATCH 0/3] Dual stack sockets for UDP David Gibson
2024-08-26  9:37 ` [PATCH 1/3] udp: Merge udp[46]_mh_recv arrays David Gibson
2024-08-26 19:32   ` Stefano Brivio
2024-08-27  1:12     ` David Gibson
2024-08-27  5:33       ` Stefano Brivio
2024-08-27  6:04         ` David Gibson
2024-08-26  9:37 ` [PATCH 2/3] udp: Remove unnnecessary local from udp_sock_init() David Gibson
2024-08-26  9:37 ` [PATCH 3/3] udp: Use dual stack sockets for port forwarding when possible David Gibson
2024-08-26 19:33 ` [PATCH 0/3] Dual stack sockets for UDP Stefano Brivio

Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).