From: David Gibson
To: Stefano Brivio, passt-dev@passt.top
Cc: David Gibson
Subject: [PATCH v5 11/15] tcp, udp: Remove fallback if creating dual stack socket fails
Date: Tue, 2 Dec 2025 15:02:11 +1100
Message-ID: <20251202040215.2351792-12-david@gibson.dropbear.id.au>
In-Reply-To: <20251202040215.2351792-1-david@gibson.dropbear.id.au>
References: <20251202040215.2351792-1-david@gibson.dropbear.id.au>
List-Id: Development discussion and patches for passt

To save kernel memory we try to use "dual stack" sockets, which can listen
for both IPv4 and IPv6 connections, where possible.  To support kernels
which don't allow dual stack sockets, we fall back to creating individual
sockets for IPv4 and IPv6.

This fallback causes some mild ugliness now, and will cause more difficulty
with upcoming improvements to the forwarding logic.  I don't think we need
the fallback, on the following grounds:

1) The fallback has been broken since inception

The fallback was triggered if pif_sock_l4() failed while attempting to
create the dual stack socket.  But even if the kernel didn't support them,
pif_sock_l4() would not report a failure.
  - Dual stack sockets are distinguished by having the IPV6_V6ONLY sockopt
    set to 0.  However, until the previous patch, we only called
    setsockopt() if we wanted to set this to 1, so there was no kernel
    operation which could fail for dual stack sockets: we'd silently
    create an IPv6-only socket instead.
  - Even if we did call setsockopt(), we only printed a debug() message
    on failure; we didn't report it to the caller.

2) Dual stack sockets are not just a Linux extension

The dual stack socket interface is described in RFC 3493, specifically
sections 3.7 and 5.3.  It is supported on BSD:
    https://man.freebsd.org/cgi/man.cgi?query=ip6
and on Windows:
    https://learn.microsoft.com/en-us/windows/win32/winsock/ipproto-ipv6-socket-options

3) Linux has supported dual stack sockets for over 20 years

According to ipv6(7), the IPV6_V6ONLY socket option was introduced in
Linux 2.6 and Linux 2.4.21, both from 2003.

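For illustration of point 1 above, here is a minimal sketch of creating a dual stack socket so that a kernel refusal is actually visible to the caller. It is not passt code: dual_stack_socket() is a hypothetical helper, and it assumes only the standard sockets API (socket(), setsockopt() with IPV6_V6ONLY, bind()).

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper, not part of passt: open a socket bound to [::]:port
 * that accepts both IPv4 and IPv6 traffic.
 *
 * Return: socket fd on success, negative errno value on failure
 */
int dual_stack_socket(int type, in_port_t port)
{
	struct sockaddr_in6 sa;
	int v6only = 0;
	int s, err;

	s = socket(AF_INET6, type, 0);
	if (s < 0)
		return -errno;

	/* Dual stack means IPV6_V6ONLY == 0.  Set it explicitly and treat a
	 * failure as an error, rather than relying on the system default.
	 */
	if (setsockopt(s, IPPROTO_IPV6, IPV6_V6ONLY, &v6only, sizeof(v6only)))
		goto fail;

	memset(&sa, 0, sizeof(sa));
	sa.sin6_family = AF_INET6;
	sa.sin6_addr = in6addr_any;
	sa.sin6_port = htons(port);

	if (bind(s, (struct sockaddr *)&sa, sizeof(sa)))
		goto fail;

	return s;

fail:
	err = errno;	/* Save errno before close() can overwrite it */
	close(s);
	return -err;
}
```

A caller would use something like dual_stack_socket(SOCK_STREAM, 8080) and then listen() on the result. The point of the sketch is only that the setsockopt() result is propagated, which is what makes a fallback path reachable at all.
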
Signed-off-by: David Gibson
---
 tcp.c | 43 +++++++++++++++++++++++++------------------
 udp.c | 56 ++++++++++++++++++++++++++------------------------------
 2 files changed, 51 insertions(+), 48 deletions(-)

diff --git a/tcp.c b/tcp.c
index 428bac7b..2abb8be4 100644
--- a/tcp.c
+++ b/tcp.c
@@ -2575,42 +2575,49 @@ static int tcp_sock_init_one(const struct ctx *c, uint8_t pif,
 }
 
 /**
- * tcp_sock_init() - Create listening sockets for a given host ("inbound") port
+ * tcp_sock_init() - Create listening socket for a given host ("inbound") port
  * @c:		Execution context
  * @pif:	Interface to open the socket for (PIF_HOST or PIF_SPLICE)
  * @addr:	Pointer to address for binding, NULL if not configured
  * @ifname:	Name of interface to bind to, NULL if not configured
  * @port:	Port, host order
  *
- * Return: 0 on (partial) success, negative error code on (complete) failure
+ * Return: 0 on success, negative error code on failure
  */
 int tcp_sock_init(const struct ctx *c, uint8_t pif,
 		  const union inany_addr *addr, const char *ifname,
 		  in_port_t port)
 {
-	int r4 = FD_REF_MAX + 1, r6 = FD_REF_MAX + 1;
+	int s;
 
 	ASSERT(!c->no_tcp);
 
-	if (!addr && c->ifi4 && c->ifi6)
-		/* Attempt to get a dual stack socket */
-		if (tcp_sock_init_one(c, pif, NULL, ifname, port) >= 0)
+	if (!c->ifi4) {
+		if (!addr)
+			/* Restrict to v6 only */
+			addr = &inany_any6;
+		else if (inany_v4(addr))
+			/* Nothing to do */
 			return 0;
+	}
+	if (!c->ifi6) {
+		if (!addr)
+			/* Restrict to v4 only */
+			addr = &inany_any4;
+		else if (!inany_v4(addr))
+			/* Nothing to do */
+			return 0;
+	}
 
-	/* Otherwise create a socket per IP version */
-	if ((!addr || inany_v4(addr)) && c->ifi4)
-		r4 = tcp_sock_init_one(c, pif,
-				       addr ? addr : &inany_any4, ifname, port);
-
-	if ((!addr || !inany_v4(addr)) && c->ifi6)
-		r6 = tcp_sock_init_one(c, pif,
-				       addr ? addr : &inany_any6, ifname, port);
-
-	if (IN_INTERVAL(0, FD_REF_MAX, r4) || IN_INTERVAL(0, FD_REF_MAX, r6))
-		return 0;
+	s = tcp_sock_init_one(c, pif, addr, ifname, port);
+	if (s < 0)
+		return s;
+	if (s > FD_REF_MAX)
+		return -EIO;
 
-	return r4 < 0 ? r4 : r6;
+	return 0;
 }
+
 /**
  * tcp_ns_sock_init() - Init socket to listen for spliced outbound connections
  * @c:		Execution context
diff --git a/udp.c b/udp.c
index b3ce9c7f..3d097fbb 100644
--- a/udp.c
+++ b/udp.c
@@ -1102,14 +1102,14 @@ int udp_tap_handler(const struct ctx *c, uint8_t pif,
 }
 
 /**
- * udp_sock_init() - Initialise listening sockets for a given port
+ * udp_sock_init() - Initialise listening socket for a given port
  * @c:		Execution context
  * @pif:	Interface to open the socket for (PIF_HOST or PIF_SPLICE)
  * @addr:	Pointer to address for binding, NULL if not configured
  * @ifname:	Name of interface to bind to, NULL if not configured
  * @port:	Port, host order
  *
- * Return: 0 on (partial) success, negative error code on (complete) failure
+ * Return: 0 on success, negative error code on failure
  */
 int udp_sock_init(const struct ctx *c, uint8_t pif,
 		  const union inany_addr *addr, const char *ifname,
@@ -1119,8 +1119,8 @@ int udp_sock_init(const struct ctx *c, uint8_t pif,
 		.pif = pif,
 		.port = port,
 	};
-	int r4 = FD_REF_MAX + 1, r6 = FD_REF_MAX + 1;
 	int (*socks)[NUM_PORTS];
+	int s;
 
 	ASSERT(!c->no_udp);
 	ASSERT(pif_is_socket(pif));
@@ -1130,40 +1130,36 @@ int udp_sock_init(const struct ctx *c, uint8_t pif,
 	else
 		socks = udp_splice_ns;
 
-	if (!addr && c->ifi4 && c->ifi6) {
-		int s;
-
-		/* Attempt to get a dual stack socket */
-		s = pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, PIF_HOST,
-				NULL, ifname, port, uref.u32);
-		socks[V4][port] = s < 0 ? -1 : s;
-		socks[V6][port] = s < 0 ? -1 : s;
-		if (IN_INTERVAL(0, FD_REF_MAX, s))
+	if (!c->ifi4) {
+		if (!addr)
+			/* Restrict to v6 only */
+			addr = &inany_any6;
+		else if (inany_v4(addr))
+			/* Nothing to do */
 			return 0;
 	}
-
-	if ((!addr || inany_v4(addr)) && c->ifi4) {
-		const union inany_addr *a = addr ? addr : &inany_any4;
-
-		r4 = pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, pif, a, ifname,
-				 port, uref.u32);
-
-		socks[V4][port] = r4 < 0 ? -1 : r4;
+	if (!c->ifi6) {
+		if (!addr)
+			/* Restrict to v4 only */
+			addr = &inany_any4;
+		else if (!inany_v4(addr))
+			/* Nothing to do */
+			return 0;
 	}
 
-	if ((!addr || !inany_v4(addr)) && c->ifi6) {
-		const union inany_addr *a = addr ? addr : &inany_any6;
-
-		r6 = pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, pif, a, ifname,
-				 port, uref.u32);
-
-		socks[V6][port] = r6 < 0 ? -1 : r6;
+	s = pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, pif,
+			addr, ifname, port, uref.u32);
+	if (s > FD_REF_MAX) {
+		close(s);
+		s = -EIO;
 	}
 
-	if (IN_INTERVAL(0, FD_REF_MAX, r4) || IN_INTERVAL(0, FD_REF_MAX, r6))
-		return 0;
+	if (!addr || inany_v4(addr))
+		socks[V4][port] = s < 0 ? -1 : s;
+	if (!addr || !inany_v4(addr))
+		socks[V6][port] = s < 0 ? -1 : s;
 
-	return r4 < 0 ? r4 : r6;
+	return s < 0 ? s : 0;
 }
 
 /**
-- 
2.52.0
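
The address selection that the new tcp_sock_init() and udp_sock_init() share can be modelled as a small pure function, which makes the resulting cases easier to check by hand. This is a sketch only: plan_listen(), enum bind_family and enum listen_plan are hypothetical names standing in for the c->ifi4 / c->ifi6 checks, union inany_addr and the inany_any4 / inany_any6 constants used in the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for the family of a union inany_addr */
enum bind_family { BIND_V4, BIND_V6 };

/* What the caller should open after the restriction step */
enum listen_plan {
	LISTEN_NONE,	/* Nothing to do: succeed without opening a socket */
	LISTEN_DUAL,	/* One dual stack socket, no bind address */
	LISTEN_V4,	/* One socket bound to an IPv4 address */
	LISTEN_V6,	/* One socket bound to an IPv6 address */
};

/**
 * plan_listen() - Model of the address restriction in the patch
 * @have_v4:	IPv4 is enabled (c->ifi4 non-zero in the patch)
 * @have_v6:	IPv6 is enabled (c->ifi6 non-zero in the patch)
 * @requested:	Family of the configured address, NULL if not configured
 *
 * Return: which single socket, if any, should be created
 */
enum listen_plan plan_listen(bool have_v4, bool have_v6,
			     const enum bind_family *requested)
{
	static const enum bind_family any4 = BIND_V4, any6 = BIND_V6;

	if (!have_v4) {
		if (!requested)
			requested = &any6;	/* Restrict to v6 only */
		else if (*requested == BIND_V4)
			return LISTEN_NONE;	/* Nothing to do */
	}
	if (!have_v6) {
		if (!requested)
			requested = &any4;	/* Restrict to v4 only */
		else if (*requested == BIND_V6)
			return LISTEN_NONE;	/* Nothing to do */
	}

	if (!requested)
		return LISTEN_DUAL;

	return *requested == BIND_V4 ? LISTEN_V4 : LISTEN_V6;
}
```

With both families enabled and no address configured, the model yields LISTEN_DUAL. With both families disabled, the first block picks the v6 wildcard and the second block then returns LISTEN_NONE, which mirrors how the two if blocks in the patch interact.
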