public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>, passt-dev@passt.top
Cc: David Gibson <david@gibson.dropbear.id.au>
Subject: [PATCH v4 8/9] tcp, udp: Remove fallback if creating dual stack socket fails
Date: Wed, 19 Nov 2025 16:22:56 +1100	[thread overview]
Message-ID: <20251119052257.3004500-9-david@gibson.dropbear.id.au> (raw)
In-Reply-To: <20251119052257.3004500-1-david@gibson.dropbear.id.au>

To save kernel memory we try to use "dual stack" sockets which can listen
for both IPv4 and IPv6 connections where possible.  To support kernels
which don't allow dual stack sockets, we fall back to creating individual
sockets for IPv4 and IPv6.

This fallback causes some mild ugliness now, and will cause more difficulty
with upcoming improvements to the forwarding logic.  I don't think we need
the fallback on the following grounds:

1) The fallback was broken since inception:

The fallback was triggered if pif_sock_l4() failed attempting to create the
dual stack socket.  But even if the kernel didn't support them,
pif_sock_l4() would not report a failure.
  - Dual stack sockets are distinguished by having the IPV6_V6ONLY sockopt
    set to 0.  However, until the last patch, we only called setsockopt()
    if we wanted to set this to 1, so there was no kernel operation which
    could fail for dual stack sockets - we'd silently create a IPv6 only
    socket instead.
  - Even if we did call the setsockopt(), we only printed a debug() message
    for failures, we didn't report it to the caller

2) Dual stack sockets are not just a Linux extension

The dual stack socket interface is described in RFC3493, specifically
section 3.7 and section 5.3.  It is supported on BSD:
  https://man.freebsd.org/cgi/man.cgi?query=ip6
and on Windows:
  https://learn.microsoft.com/en-us/windows/win32/winsock/ipproto-ipv6-socket-options

3) Linux has supported dual stack sockets for over 20 years

According to ipv6(7) the IPV6_V6ONLY socket option was introduced in
Linux 2.6 and Linux 2.4.21 both from 2003.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 tcp.c | 43 +++++++++++++++++++++++++------------------
 udp.c | 56 ++++++++++++++++++++++++++------------------------------
 2 files changed, 51 insertions(+), 48 deletions(-)

diff --git a/tcp.c b/tcp.c
index 428bac7b..2abb8be4 100644
--- a/tcp.c
+++ b/tcp.c
@@ -2575,42 +2575,49 @@ static int tcp_sock_init_one(const struct ctx *c, uint8_t pif,
 }
 
 /**
- * tcp_sock_init() - Create listening sockets for a given host ("inbound") port
+ * tcp_sock_init() - Create listening socket for a given host ("inbound") port
  * @c:		Execution context
  * @pif:	Interface to open the socket for (PIF_HOST or PIF_SPLICE)
  * @addr:	Pointer to address for binding, NULL if not configured
  * @ifname:	Name of interface to bind to, NULL if not configured
  * @port:	Port, host order
  *
- * Return: 0 on (partial) success, negative error code on (complete) failure
+ * Return: 0 on success, negative error code on failure
  */
 int tcp_sock_init(const struct ctx *c, uint8_t pif,
 		  const union inany_addr *addr, const char *ifname,
 		  in_port_t port)
 {
-	int r4 = FD_REF_MAX + 1, r6 = FD_REF_MAX + 1;
+	int s;
 
 	ASSERT(!c->no_tcp);
 
-	if (!addr && c->ifi4 && c->ifi6)
-		/* Attempt to get a dual stack socket */
-		if (tcp_sock_init_one(c, pif, NULL, ifname, port) >= 0)
+	if (!c->ifi4) {
+		if (!addr)
+			/* Restrict to v6 only */
+			addr = &inany_any6;
+		else if (inany_v4(addr))
+			/* Nothing to do */
 			return 0;
+	}
+	if (!c->ifi6) {
+		if (!addr)
+			/* Restrict to v4 only */
+			addr = &inany_any4;
+		else if (!inany_v4(addr))
+			/* Nothing to do */
+			return 0;
+	}
 
-	/* Otherwise create a socket per IP version */
-	if ((!addr || inany_v4(addr)) && c->ifi4)
-		r4 = tcp_sock_init_one(c, pif,
-				       addr ? addr : &inany_any4, ifname, port);
-
-	if ((!addr || !inany_v4(addr)) && c->ifi6)
-		r6 = tcp_sock_init_one(c, pif,
-				       addr ? addr : &inany_any6, ifname, port);
-
-	if (IN_INTERVAL(0, FD_REF_MAX, r4) || IN_INTERVAL(0, FD_REF_MAX, r6))
-		return 0;
+	s = tcp_sock_init_one(c, pif, addr, ifname, port);
+	if (s < 0)
+		return s;
+	if (s > FD_REF_MAX)
+		return -EIO;
 
-	return r4 < 0 ? r4 : r6;
+	return 0;
 }
+
 /**
  * tcp_ns_sock_init() - Init socket to listen for spliced outbound connections
  * @c:		Execution context
diff --git a/udp.c b/udp.c
index b3ce9c7f..3d097fbb 100644
--- a/udp.c
+++ b/udp.c
@@ -1102,14 +1102,14 @@ int udp_tap_handler(const struct ctx *c, uint8_t pif,
 }
 
 /**
- * udp_sock_init() - Initialise listening sockets for a given port
+ * udp_sock_init() - Initialise listening socket for a given port
  * @c:		Execution context
  * @pif:	Interface to open the socket for (PIF_HOST or PIF_SPLICE)
  * @addr:	Pointer to address for binding, NULL if not configured
  * @ifname:	Name of interface to bind to, NULL if not configured
  * @port:	Port, host order
  *
- * Return: 0 on (partial) success, negative error code on (complete) failure
+ * Return: 0 on success, negative error code on failure
  */
 int udp_sock_init(const struct ctx *c, uint8_t pif,
 		  const union inany_addr *addr, const char *ifname,
@@ -1119,8 +1119,8 @@ int udp_sock_init(const struct ctx *c, uint8_t pif,
 		.pif = pif,
 		.port = port,
 	};
-	int r4 = FD_REF_MAX + 1, r6 = FD_REF_MAX + 1;
 	int (*socks)[NUM_PORTS];
+	int s;
 
 	ASSERT(!c->no_udp);
 	ASSERT(pif_is_socket(pif));
@@ -1130,40 +1130,36 @@ int udp_sock_init(const struct ctx *c, uint8_t pif,
 	else
 		socks = udp_splice_ns;
 
-	if (!addr && c->ifi4 && c->ifi6) {
-		int s;
-
-		/* Attempt to get a dual stack socket */
-		s = pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, PIF_HOST,
-				NULL, ifname, port, uref.u32);
-		socks[V4][port] = s < 0 ? -1 : s;
-		socks[V6][port] = s < 0 ? -1 : s;
-		if (IN_INTERVAL(0, FD_REF_MAX, s))
+	if (!c->ifi4) {
+		if (!addr)
+			/* Restrict to v6 only */
+			addr = &inany_any6;
+		else if (inany_v4(addr))
+			/* Nothing to do */
 			return 0;
 	}
-
-	if ((!addr || inany_v4(addr)) && c->ifi4) {
-		const union inany_addr *a = addr ? addr : &inany_any4;
-
-		r4 = pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, pif, a, ifname,
-				 port, uref.u32);
-
-		socks[V4][port] = r4 < 0 ? -1 : r4;
+	if (!c->ifi6) {
+		if (!addr)
+			/* Restrict to v4 only */
+			addr = &inany_any4;
+		else if (!inany_v4(addr))
+			/* Nothing to do */
+			return 0;
 	}
 
-	if ((!addr || !inany_v4(addr)) && c->ifi6) {
-		const union inany_addr *a = addr ? addr : &inany_any6;
-
-		r6 = pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, pif, a, ifname,
-				 port, uref.u32);
-
-		socks[V6][port] = r6 < 0 ? -1 : r6;
+	s = pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, pif,
+			addr, ifname, port, uref.u32);
+	if (s > FD_REF_MAX) {
+		close(s);
+		s = -EIO;
 	}
 
-	if (IN_INTERVAL(0, FD_REF_MAX, r4) || IN_INTERVAL(0, FD_REF_MAX, r6))
-		return 0;
+	if (!addr || inany_v4(addr))
+		socks[V4][port] = s < 0 ? -1 : s;
+	if (!addr || !inany_v4(addr))
+		socks[V6][port] = s < 0 ? -1 : s;
 
-	return r4 < 0 ? r4 : r6;
+	return s < 0 ? s : 0;
 }
 
 /**
-- 
2.51.1


  parent reply	other threads:[~2025-11-19  5:23 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-19  5:22 [PATCH v4 0/9] Reduce differences between inbound and outbound socket binding David Gibson
2025-11-19  5:22 ` [PATCH v4 1/9] flow: Remove bogus @path field from flowside_sock_args David Gibson
2025-11-19  5:22 ` [PATCH v4 2/9] inany: Let length of sockaddr_inany be implicit from the family David Gibson
2025-11-19  5:22 ` [PATCH v4 3/9] util, flow, pif: Simplify sock_l4_sa() interface David Gibson
2025-11-19  5:22 ` [PATCH v4 4/9] tcp: Merge tcp_ns_sock_init[46]() into tcp_sock_init_one() David Gibson
2025-11-19  5:22 ` [PATCH v4 5/9] udp: Unify some more inbound/outbound parts of udp_sock_init() David Gibson
2025-11-19  5:22 ` [PATCH v4 6/9] udp: Move udp_sock_init() special case to its caller David Gibson
2025-11-19  5:22 ` [PATCH v4 7/9] util: Fix setting of IPV6_V6ONLY socket option David Gibson
2025-11-19  5:22 ` David Gibson [this message]
2025-11-19  5:22 ` [PATCH v4 9/9] tcp, udp: Bind outbound listening sockets by interface instead of address David Gibson
2025-11-21  3:56   ` Stefano Brivio
2025-11-21  5:24     ` David Gibson
2025-11-21  5:39       ` Stefano Brivio
2025-11-26  5:42     ` David Gibson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251119052257.3004500-9-david@gibson.dropbear.id.au \
    --to=david@gibson.dropbear.id.au \
    --cc=passt-dev@passt.top \
    --cc=sbrivio@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).