public inbox for passt-dev@passt.top
* [PATCH v2 00/12] RFC: Improve forwarding data structure
@ 2025-12-19 14:18 David Gibson
  2025-12-19 14:18 ` [PATCH v2 01/12] tcp: Combine tcp_sock_init_one() and tcp_sock_init() into tcp_listen() David Gibson
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: David Gibson @ 2025-12-19 14:18 UTC
  To: passt-dev, Stefano Brivio; +Cc: David Gibson

This replaces the existing data structures storing forwarding
information, with new ones that can store more detailed and flexible
information.  Eventually we plan to allow dynamically updating this as well.

Patches 1, 2 & 8 could be merged (1 & 2 are the same as in the
earlier series with cleanups to the listening socket init functions).
The rest are just for early review.  Since I'm going to be away for
the next two weeks, no rush at all.

v2:
 * Remove preliminary patches already merged
 * Add several patches actually using the new data structure

David Gibson (12):
  tcp: Combine tcp_sock_init_one() and tcp_sock_init() into tcp_listen()
  udp: Rename udp_sock_init() to udp_listen() with small cleanups
  conf, fwd: Keep a table of our port forwarding configuration
  conf: Accurately record ifname and address for outbound forwards
  conf, fwd: Record "auto" port forwards in forwarding table
  tcp, udp: Make {tcp,udp}_listen() return socket fds
  fwd: Make space to store listening sockets in forward table
  ip: Add ipproto_name() function
  fwd, tcp, udp: Set up listening sockets based on forward table
  tcp, udp: Remove old auto-forwarding socket arrays
  fwd: Generate auto-forward exclusions from socket fd tables
  tcp: Remove unused tcp_epoll_ref

 conf.c | 125 +++++++++++++++++---------
 fwd.c  | 273 +++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 fwd.h  |  49 ++++++++++-
 ip.c   |  26 ++++++
 ip.h   |   2 +
 tcp.c  | 190 +++------------------------------------
 tcp.h  |  17 +---
 udp.c  | 140 ++---------------------------
 udp.h  |   8 +-
 9 files changed, 447 insertions(+), 383 deletions(-)

-- 
2.52.0



* [PATCH v2 01/12] tcp: Combine tcp_sock_init_one() and tcp_sock_init() into tcp_listen()
  2025-12-19 14:18 [PATCH v2 00/12] RFC: Improve forwarding data structure David Gibson
@ 2025-12-19 14:18 ` David Gibson
  2025-12-19 14:18 ` [PATCH v2 02/12] udp: Rename udp_sock_init() to udp_listen() with small cleanups David Gibson
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: David Gibson @ 2025-12-19 14:18 UTC
  To: passt-dev, Stefano Brivio; +Cc: David Gibson

Despite the name, these two functions are specifically for creating
listening sockets, not any others.  Recent changes mean that there's
always exactly one call to tcp_sock_init_one() per call to
tcp_sock_init().  So combine them into tcp_listen().

While we're there, remove a redundant check for (s > FD_REF_MAX).
pif_sock_l4() already checks for this (and must, in order to properly
populate the epoll reference).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 conf.c |  2 +-
 tcp.c  | 94 +++++++++++++++++++++-------------------------------------
 tcp.h  |  6 ++--
 3 files changed, 38 insertions(+), 64 deletions(-)

diff --git a/conf.c b/conf.c
index 2942c8c2..dada3b1f 100644
--- a/conf.c
+++ b/conf.c
@@ -175,7 +175,7 @@ static void conf_ports_range_except(const struct ctx *c, char optname,
 		fwd->delta[i] = to - first;
 
 		if (optname == 't')
-			ret = tcp_sock_init(c, PIF_HOST, addr, ifname, i);
+			ret = tcp_listen(c, PIF_HOST, addr, ifname, i);
 		else if (optname == 'u')
 			ret = udp_sock_init(c, PIF_HOST, addr, ifname, i);
 		else
diff --git a/tcp.c b/tcp.c
index b179e399..71c6d632 100644
--- a/tcp.c
+++ b/tcp.c
@@ -2673,66 +2673,24 @@ void tcp_sock_handler(const struct ctx *c, union epoll_ref ref,
 }
 
 /**
- * tcp_sock_init_one() - Initialise listening socket for address and port
+ * tcp_listen() - Create listening socket
  * @c:		Execution context
  * @pif:	Interface to open the socket for (PIF_HOST or PIF_SPLICE)
- * @addr:	Pointer to address for binding, NULL for dual stack any
- * @ifname:	Name of interface to bind to, NULL if not configured
+ * @addr:	Pointer to address for binding, NULL for any
+ * @ifname:	Name of interface to bind to, NULL for any
  * @port:	Port, host order
  *
- * Return: fd for the new listening socket, negative error code on failure
- *
- * If pif == PIF_SPLICE, the caller must have already entered the guest ns.
+ * Return: 0 on success, negative error code on failure
  */
-static int tcp_sock_init_one(const struct ctx *c, uint8_t pif,
-			     const union inany_addr *addr, const char *ifname,
-			     in_port_t port)
+int tcp_listen(const struct ctx *c, uint8_t pif,
+	       const union inany_addr *addr, const char *ifname, in_port_t port)
 {
 	union tcp_listen_epoll_ref tref = {
 		.port = port,
 		.pif = pif,
 	};
 	const struct fwd_ports *fwd;
-	int s;
-
-	if (pif == PIF_HOST)
-		fwd = &c->tcp.fwd_in;
-	else
-		fwd = &c->tcp.fwd_out;
-
-	s = pif_sock_l4(c, EPOLL_TYPE_TCP_LISTEN, pif, addr, ifname,
-			port, tref.u32);
-
-	if (fwd->mode == FWD_AUTO) {
-		int (*socks)[IP_VERSIONS] = pif == PIF_SPLICE ?
-			tcp_sock_ns : tcp_sock_init_ext;
-
-		if (!addr || inany_v4(addr))
-			socks[port][V4] = s < 0 ? -1 : s;
-		if (!addr || !inany_v4(addr))
-			socks[port][V6] = s < 0 ? -1 : s;
-	}
-
-	if (s < 0)
-		return s;
-
-	return s;
-}
-
-/**
- * tcp_sock_init() - Create listening socket for a given host ("inbound") port
- * @c:		Execution context
- * @pif:	Interface to open the socket for (PIF_HOST or PIF_SPLICE)
- * @addr:	Pointer to address for binding, NULL if not configured
- * @ifname:	Name of interface to bind to, NULL if not configured
- * @port:	Port, host order
- *
- * Return: 0 on success, negative error code on failure
- */
-int tcp_sock_init(const struct ctx *c, uint8_t pif,
-		  const union inany_addr *addr, const char *ifname,
-		  in_port_t port)
-{
+	int (*socks)[IP_VERSIONS];
 	int s;
 
 	ASSERT(!c->no_tcp);
@@ -2754,33 +2712,49 @@ int tcp_sock_init(const struct ctx *c, uint8_t pif,
 			return 0;
 	}
 
-	s = tcp_sock_init_one(c, pif, addr, ifname, port);
+	if (pif == PIF_HOST) {
+		fwd = &c->tcp.fwd_in;
+		socks = tcp_sock_init_ext;
+	} else {
+		ASSERT(pif == PIF_SPLICE);
+		fwd = &c->tcp.fwd_out;
+		socks = tcp_sock_ns;
+	}
+
+	s = pif_sock_l4(c, EPOLL_TYPE_TCP_LISTEN, pif, addr, ifname,
+			port, tref.u32);
+
+	if (fwd->mode == FWD_AUTO) {
+		if (!addr || inany_v4(addr))
+			socks[port][V4] = s < 0 ? -1 : s;
+		if (!addr || !inany_v4(addr))
+			socks[port][V6] = s < 0 ? -1 : s;
+	}
+
 	if (s < 0)
 		return s;
-	if (s > FD_REF_MAX)
-		return -EIO;
 
 	return 0;
 }
 
 /**
- * tcp_ns_sock_init() - Init socket to listen for spliced outbound connections
+ * tcp_ns_listen() - Init socket to listen for spliced outbound connections
  * @c:		Execution context
  * @port:	Port, host order
  */
-static void tcp_ns_sock_init(const struct ctx *c, in_port_t port)
+static void tcp_ns_listen(const struct ctx *c, in_port_t port)
 {
 	ASSERT(!c->no_tcp);
 
 	if (!c->no_bindtodevice) {
-		tcp_sock_init(c, PIF_SPLICE, NULL, "lo", port);
+		tcp_listen(c, PIF_SPLICE, NULL, "lo", port);
 		return;
 	}
 
 	if (c->ifi4)
-		tcp_sock_init_one(c, PIF_SPLICE, &inany_loopback4, NULL, port);
+		tcp_listen(c, PIF_SPLICE, &inany_loopback4, NULL, port);
 	if (c->ifi6)
-		tcp_sock_init_one(c, PIF_SPLICE, &inany_loopback6, NULL, port);
+		tcp_listen(c, PIF_SPLICE, &inany_loopback6, NULL, port);
 }
 
 /**
@@ -2801,7 +2775,7 @@ static int tcp_ns_socks_init(void *arg)
 		if (!bitmap_isset(c->tcp.fwd_out.map, port))
 			continue;
 
-		tcp_ns_sock_init(c, port);
+		tcp_ns_listen(c, port);
 	}
 
 	return 0;
@@ -3003,9 +2977,9 @@ static void tcp_port_rebind(struct ctx *c, bool outbound)
 		if ((c->ifi4 && socks[port][V4] == -1) ||
 		    (c->ifi6 && socks[port][V6] == -1)) {
 			if (outbound)
-				tcp_ns_sock_init(c, port);
+				tcp_ns_listen(c, port);
 			else
-				tcp_sock_init(c, PIF_HOST, NULL, NULL, port);
+				tcp_listen(c, PIF_HOST, NULL, NULL, port);
 		}
 	}
 }
diff --git a/tcp.h b/tcp.h
index 3f21e755..9dd88762 100644
--- a/tcp.h
+++ b/tcp.h
@@ -18,9 +18,9 @@ void tcp_sock_handler(const struct ctx *c, union epoll_ref ref,
 int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
 		    const void *saddr, const void *daddr, uint32_t flow_lbl,
 		    const struct pool *p, int idx, const struct timespec *now);
-int tcp_sock_init(const struct ctx *c, uint8_t pif,
-		  const union inany_addr *addr, const char *ifname,
-		  in_port_t port);
+int tcp_listen(const struct ctx *c, uint8_t pif,
+	       const union inany_addr *addr, const char *ifname,
+	       in_port_t port);
 int tcp_init(struct ctx *c);
 void tcp_port_rebind_all(struct ctx *c);
 void tcp_timer(const struct ctx *c, const struct timespec *now);
-- 
2.52.0



* [PATCH v2 02/12] udp: Rename udp_sock_init() to udp_listen() with small cleanups
  2025-12-19 14:18 [PATCH v2 00/12] RFC: Improve forwarding data structure David Gibson
  2025-12-19 14:18 ` [PATCH v2 01/12] tcp: Combine tcp_sock_init_one() and tcp_sock_init() into tcp_listen() David Gibson
@ 2025-12-19 14:18 ` David Gibson
  2025-12-19 14:18 ` [PATCH v2 03/12] conf, fwd: Keep a table of our port forwarding configuration David Gibson
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: David Gibson @ 2025-12-19 14:18 UTC
  To: passt-dev, Stefano Brivio; +Cc: David Gibson

Despite the name, this function is specifically for creating
"listening" sockets, not any others.  While we're there, remove a redundant
check for (s > FD_REF_MAX).  pif_sock_l4() already checks for this (and
must, in order to properly populate the epoll reference).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 conf.c |  2 +-
 udp.c  | 32 ++++++++++++++------------------
 udp.h  |  6 +++---
 3 files changed, 18 insertions(+), 22 deletions(-)

diff --git a/conf.c b/conf.c
index dada3b1f..84ae12b2 100644
--- a/conf.c
+++ b/conf.c
@@ -177,7 +177,7 @@ static void conf_ports_range_except(const struct ctx *c, char optname,
 		if (optname == 't')
 			ret = tcp_listen(c, PIF_HOST, addr, ifname, i);
 		else if (optname == 'u')
-			ret = udp_sock_init(c, PIF_HOST, addr, ifname, i);
+			ret = udp_listen(c, PIF_HOST, addr, ifname, i);
 		else
 			/* No way to check in advance for -T and -U */
 			ret = 0;
diff --git a/udp.c b/udp.c
index 08bec50a..eda55c39 100644
--- a/udp.c
+++ b/udp.c
@@ -1129,7 +1129,7 @@ int udp_tap_handler(const struct ctx *c, uint8_t pif,
 }
 
 /**
- * udp_sock_init() - Initialise listening socket for a given port
+ * udp_listen() - Initialise listening socket for a given port
  * @c:		Execution context
  * @pif:	Interface to open the socket for (PIF_HOST or PIF_SPLICE)
  * @addr:	Pointer to address for binding, NULL if not configured
@@ -1138,9 +1138,8 @@ int udp_tap_handler(const struct ctx *c, uint8_t pif,
  *
  * Return: 0 on success, negative error code on failure
  */
-int udp_sock_init(const struct ctx *c, uint8_t pif,
-		  const union inany_addr *addr, const char *ifname,
-		  in_port_t port)
+int udp_listen(const struct ctx *c, uint8_t pif,
+	       const union inany_addr *addr, const char *ifname, in_port_t port)
 {
 	union udp_listen_epoll_ref uref = {
 		.pif = pif,
@@ -1150,12 +1149,13 @@ int udp_sock_init(const struct ctx *c, uint8_t pif,
 	int s;
 
 	ASSERT(!c->no_udp);
-	ASSERT(pif_is_socket(pif));
 
-	if (pif == PIF_HOST)
+	if (pif == PIF_HOST) {
 		socks = udp_splice_init;
-	else
+	} else {
+		ASSERT(pif == PIF_SPLICE);
 		socks = udp_splice_ns;
+	}
 
 	if (!c->ifi4) {
 		if (!addr)
@@ -1176,10 +1176,6 @@ int udp_sock_init(const struct ctx *c, uint8_t pif,
 
 	s = pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, pif,
 			addr, ifname, port, uref.u32);
-	if (s > FD_REF_MAX) {
-		close(s);
-		s = -EIO;
-	}
 
 	if (!addr || inany_v4(addr))
 		socks[V4][port] = s < 0 ? -1 : s;
@@ -1210,23 +1206,23 @@ static void udp_splice_iov_init(void)
 }
 
 /**
- * udp_ns_sock_init() - Init socket to listen for spliced outbound connections
+ * udp_ns_listen() - Init socket to listen for spliced outbound connections
  * @c:		Execution context
  * @port:	Port, host order
  */
-static void udp_ns_sock_init(const struct ctx *c, in_port_t port)
+static void udp_ns_listen(const struct ctx *c, in_port_t port)
 {
 	ASSERT(!c->no_udp);
 
 	if (!c->no_bindtodevice) {
-		udp_sock_init(c, PIF_SPLICE, NULL, "lo", port);
+		udp_listen(c, PIF_SPLICE, NULL, "lo", port);
 		return;
 	}
 
 	if (c->ifi4)
-		udp_sock_init(c, PIF_SPLICE, &inany_loopback4, NULL, port);
+		udp_listen(c, PIF_SPLICE, &inany_loopback4, NULL, port);
 	if (c->ifi6)
-		udp_sock_init(c, PIF_SPLICE, &inany_loopback6, NULL, port);
+		udp_listen(c, PIF_SPLICE, &inany_loopback6, NULL, port);
 }
 
 /**
@@ -1261,9 +1257,9 @@ static void udp_port_rebind(struct ctx *c, bool outbound)
 		if ((c->ifi4 && socks[V4][port] == -1) ||
 		    (c->ifi6 && socks[V6][port] == -1)) {
 			if (outbound)
-				udp_ns_sock_init(c, port);
+				udp_ns_listen(c, port);
 			else
-				udp_sock_init(c, PIF_HOST, NULL, NULL, port);
+				udp_listen(c, PIF_HOST, NULL, NULL, port);
 		}
 	}
 }
diff --git a/udp.h b/udp.h
index 03e8dc54..5407db3b 100644
--- a/udp.h
+++ b/udp.h
@@ -15,9 +15,9 @@ int udp_tap_handler(const struct ctx *c, uint8_t pif,
 		    sa_family_t af, const void *saddr, const void *daddr,
 		    uint8_t ttl, const struct pool *p, int idx,
 		    const struct timespec *now);
-int udp_sock_init(const struct ctx *c, uint8_t pif,
-		  const union inany_addr *addr, const char *ifname,
-		  in_port_t port);
+int udp_listen(const struct ctx *c, uint8_t pif,
+	       const union inany_addr *addr, const char *ifname,
+	       in_port_t port);
 int udp_init(struct ctx *c);
 void udp_port_rebind_all(struct ctx *c);
 void udp_update_l2_buf(const unsigned char *eth_d);
-- 
2.52.0



* [PATCH v2 03/12] conf, fwd: Keep a table of our port forwarding configuration
  2025-12-19 14:18 [PATCH v2 00/12] RFC: Improve forwarding data structure David Gibson
  2025-12-19 14:18 ` [PATCH v2 01/12] tcp: Combine tcp_sock_init_one() and tcp_sock_init() into tcp_listen() David Gibson
  2025-12-19 14:18 ` [PATCH v2 02/12] udp: Rename udp_sock_init() to udp_listen() with small cleanups David Gibson
@ 2025-12-19 14:18 ` David Gibson
  2025-12-19 14:18 ` [PATCH v2 04/12] conf: Accurately record ifname and address for outbound forwards David Gibson
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: David Gibson @ 2025-12-19 14:18 UTC
  To: passt-dev, Stefano Brivio; +Cc: David Gibson

At present, we set up forwarding as we parse the -t and -u options, without
keeping a persistent data structure with all the details.  We do have
some information in struct fwd_ports, but it doesn't capture all the nuance
that the options do.

As a first step to generalising our forwarding model, add a table of all
the forwarding configured to struct fwd_ports.  For now it covers only
explicit forwards, not automatic, and we don't do anything with it other
than print some additional debug information.  We'll do more with it in
future patches.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
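The range handling in conf_ports_range_except() below can be read as
"split [first, last] into contiguous runs that contain no excluded port,
and record one table entry per run".  A minimal standalone sketch of just
that splitting step follows.  All sketch_* names, the struct layout and the
bitmap helpers are hypothetical stand-ins for illustration, not the passt
code; the socket setup and error handling of the real function are left
out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NUM_PORTS	(1U << 16)
#define SKETCH_MAX_RUNS		128

/* Hypothetical stand-in for struct fwd_entry: one contiguous port range */
struct sketch_run {
	uint16_t first, last, to;
};

/* Test bit @port in a byte-array bitmap (hypothetical helper) */
static bool sketch_bit_isset(const uint8_t *map, unsigned port)
{
	return map[port / 8] & (1U << (port % 8));
}

/* Set bit @port in a byte-array bitmap (hypothetical helper) */
static void sketch_bit_set(uint8_t *map, unsigned port)
{
	map[port / 8] |= 1U << (port % 8);
}

/**
 * sketch_split_range() - Split [first, last] into runs without excluded ports
 * @exclude:	Bitmap of ports to exclude
 * @first:	First port of the original range
 * @last:	Last port of the original range
 * @to:		Port that @first maps to
 * @runs:	Output array of runs
 * @max:	Number of slots in @runs
 *
 * Return: number of runs stored, -1 if more than @max would be needed
 */
static int sketch_split_range(const uint8_t *exclude,
			      uint16_t first, uint16_t last, uint16_t to,
			      struct sketch_run *runs, int max)
{
	unsigned delta = (unsigned)to - first;	/* may wrap, fixed by cast */
	unsigned base, i;
	int n = 0;

	assert(first <= last);

	for (base = first; base <= last; base++) {
		if (sketch_bit_isset(exclude, base))
			continue;

		/* Extend the run until an excluded port or the range end */
		for (i = base; i <= last; i++) {
			if (sketch_bit_isset(exclude, i))
				break;
		}

		if (n >= max)
			return -1;

		runs[n].first = base;
		runs[n].last = i - 1;
		runs[n].to = (uint16_t)(base + delta);
		n++;

		base = i - 1;	/* loop increment moves past the run */
	}

	return n;
}
```

For example, with only port 5 excluded, splitting 1-10 mapped to 101 gives
two runs in this sketch: 1-4 => 101-104 and 6-10 => 106-110.
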
 conf.c | 83 +++++++++++++++++++++++++++++++++++++---------------------
 fwd.c  | 76 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 fwd.h  | 33 ++++++++++++++++++++++-
 3 files changed, 161 insertions(+), 31 deletions(-)

diff --git a/conf.c b/conf.c
index 84ae12b2..ae2dc3e1 100644
--- a/conf.c
+++ b/conf.c
@@ -137,7 +137,7 @@ static int parse_port_range(const char *s, char **endptr,
  * @last:	Last port to forward
  * @exclude:	Bitmap of ports to exclude
  * @to:		Port to translate @first to when forwarding
- * @weak:	Ignore errors, as long as at least one port is mapped
+ * @flags:	Flags for forwarding entries
  */
 static void conf_ports_range_except(const struct ctx *c, char optname,
 				    const char *optarg, struct fwd_ports *fwd,
@@ -145,10 +145,11 @@ static void conf_ports_range_except(const struct ctx *c, char optname,
 				    const char *ifname,
 				    uint16_t first, uint16_t last,
 				    const uint8_t *exclude, uint16_t to,
-				    bool weak)
+				    uint8_t flags)
 {
+	unsigned delta = to - first;
 	bool bound_one = false;
-	unsigned i;
+	unsigned base, i;
 	int ret;
 
 	if (first == 0) {
@@ -162,37 +163,48 @@ static void conf_ports_range_except(const struct ctx *c, char optname,
 		    optname, optarg);
 	}
 
-	for (i = first; i <= last; i++) {
-		if (bitmap_isset(exclude, i))
+	for (base = first; base <= last; base++) {
+		if (bitmap_isset(exclude, base))
 			continue;
 
-		if (bitmap_isset(fwd->map, i)) {
-			warn(
-"Altering mapping of already mapped port number: %s", optarg);
-		}
+		for (i = base; i <= last; i++) {
+			if (bitmap_isset(exclude, i))
+				break;
 
-		bitmap_set(fwd->map, i);
-		fwd->delta[i] = to - first;
+			if (bitmap_isset(fwd->map, i)) {
+				warn(
+"Altering mapping of already mapped port number: %s", optarg);
+			}
 
-		if (optname == 't')
-			ret = tcp_listen(c, PIF_HOST, addr, ifname, i);
-		else if (optname == 'u')
-			ret = udp_listen(c, PIF_HOST, addr, ifname, i);
-		else
-			/* No way to check in advance for -T and -U */
-			ret = 0;
+			bitmap_set(fwd->map, i);
+			fwd->delta[i] = delta;
+
+			if (optname == 't')
+				ret = tcp_listen(c, PIF_HOST, addr, ifname, i);
+			else if (optname == 'u')
+				ret = udp_listen(c, PIF_HOST, addr, ifname, i);
+			else
+				/* No way to check in advance for -T and -U */
+				ret = 0;
+
+			if (ret == -ENFILE || ret == -EMFILE) {
+				die(
+"Can't open enough sockets for port specifier: %s",
+				    optarg);
+			}
 
-		if (ret == -ENFILE || ret == -EMFILE) {
-			die("Can't open enough sockets for port specifier: %s",
-			    optarg);
+			if (!ret) {
+				bound_one = true;
+			} else if (!(flags & FWD_WEAK)) {
+				die(
+"Failed to bind port %u (%s) for option '-%c %s'",
+				    i, strerror_(-ret), optname, optarg);
+			}
 		}
 
-		if (!ret) {
-			bound_one = true;
-		} else if (!weak) {
-			die("Failed to bind port %u (%s) for option '-%c %s'",
-			    i, strerror_(-ret), optname, optarg);
-		}
+		fwd_table_add(fwd, flags, addr, ifname,
+			      base, i - 1, base + delta);
+		base = i - 1;
 	}
 
 	if (!bound_one)
@@ -262,7 +274,7 @@ static void conf_ports(const struct ctx *c, char optname, const char *optarg,
 		conf_ports_range_except(c, optname, optarg, fwd,
 					NULL, NULL,
 					1, NUM_PORTS - 1, exclude,
-					1, true);
+					1, FWD_WEAK);
 		return;
 	}
 
@@ -347,7 +359,7 @@ static void conf_ports(const struct ctx *c, char optname, const char *optarg,
 		conf_ports_range_except(c, optname, optarg, fwd,
 					addr, ifname,
 					1, NUM_PORTS - 1, exclude,
-					1, true);
+					1, FWD_WEAK);
 		return;
 	}
 
@@ -380,7 +392,7 @@ static void conf_ports(const struct ctx *c, char optname, const char *optarg,
 					addr, ifname,
 					orig_range.first, orig_range.last,
 					exclude,
-					mapped_range.first, false);
+					mapped_range.first, 0);
 	} while ((p = next_chunk(p, ',')));
 
 	return;
@@ -1210,6 +1222,17 @@ dns6:
 			info("    %s", c->dns_search[i].n);
 		}
 	}
+
+	info("Inbound TCP forwarding:");
+	fwd_table_print(&c->tcp.fwd_in);
+	info("Inbound UDP forwarding:");
+	fwd_table_print(&c->udp.fwd_in);
+	if (c->mode == MODE_PASTA) {
+		info("Outbound TCP forwarding:");
+		fwd_table_print(&c->tcp.fwd_out);
+		info("Outbound UDP forwarding:");
+		fwd_table_print(&c->udp.fwd_out);
+	}
 }
 
 /**
diff --git a/fwd.c b/fwd.c
index 44a0e109..a8477607 100644
--- a/fwd.c
+++ b/fwd.c
@@ -13,6 +13,7 @@
  * Author: David Gibson <david@gibson.dropbear.id.au>
  */
 
+#include <assert.h>
 #include <stdint.h>
 #include <errno.h>
 #include <fcntl.h>
@@ -22,6 +23,8 @@
 
 #include "util.h"
 #include "ip.h"
+#include "siphash.h"
+#include "inany.h"
 #include "fwd.h"
 #include "passt.h"
 #include "lineread.h"
@@ -313,6 +316,79 @@ bool fwd_port_is_ephemeral(in_port_t port)
 	return (port >= fwd_ephemeral_min) && (port <= fwd_ephemeral_max);
 }
 
+/**
+ * fwd_table_add() - Add an entry to a forwarding table
+ * @fwd:	Table to add to
+ * @flags:	Flags for this entry
+ * @addr:	Our address to forward (NULL for both 0.0.0.0 and ::)
+ * @ifname:	Only forward from this interface name, if non-empty
+ * @first:	First port number to forward
+ * @last:	Last port number to forward
+ * @to:		First port of target port range to map to
+ */
+void fwd_table_add(struct fwd_ports *fwd, uint8_t flags,
+		   const union inany_addr *addr, const char *ifname,
+		   in_port_t first, in_port_t last, in_port_t to)
+{
+	/* Flags which can be set from the caller */
+	const uint8_t allowed_flags = FWD_WEAK;
+	struct fwd_entry *new;
+
+	ASSERT(!(flags & ~allowed_flags));
+
+	if (fwd->count >= ARRAY_SIZE(fwd->tab))
+		die("Too many port forwarding ranges");
+
+	new = &fwd->tab[fwd->count++];
+	new->flags = flags;
+
+	if (addr) {
+		new->addr = *addr;
+	} else {
+		new->addr = inany_any6;
+		new->flags |= FWD_DUAL_STACK;
+	}
+
+	memset(new->ifname, 0, sizeof(new->ifname));
+	if (ifname)
+		strncpy(new->ifname, ifname, sizeof(new->ifname));
+
+	ASSERT(first <= last);
+	new->first = first;
+	new->last = last;
+
+	new->to = to;
+}
+
+/**
+ * fwd_table_print() - Print forwarding table for debugging
+ * @fwd:	Table to print
+ */
+void fwd_table_print(const struct fwd_ports *fwd)
+{
+	unsigned i;
+
+	for (i = 0; i < fwd->count; i++) {
+		const struct fwd_entry *fe = &fwd->tab[i];
+		const char *weak = fe->flags & FWD_WEAK ? " WEAK" : "";
+		const char *percent = *fe->ifname ? "%" : "";
+		char addr[INANY_ADDRSTRLEN] = "*";
+
+		if (!(fe->flags & FWD_DUAL_STACK))
+			inany_ntop(&fe->addr, addr, sizeof(addr));
+
+		if (fe->first == fe->last) {
+			info("    [%s]%s%s:%hu  =>  %hu %s",
+			     addr, percent, fe->ifname,
+			     fe->first, fe->to, weak);
+		} else {
+			info("    [%s]%s%s:%hu-%hu  =>  %hu-%hu %s",
+			     addr, percent, fe->ifname, fe->first, fe->last,
+			     fe->to, fe->last - fe->first + fe->to, weak);
+		}
+	}
+}
+
 /* See enum in kernel's include/net/tcp_states.h */
 #define UDP_LISTEN	0x07
 #define TCP_LISTEN	0x0a
diff --git a/fwd.h b/fwd.h
index 77925822..21f00cf8 100644
--- a/fwd.h
+++ b/fwd.h
@@ -16,6 +16,30 @@ struct flowside;
 void fwd_probe_ephemeral(void);
 bool fwd_port_is_ephemeral(in_port_t port);
 
+/**
+ * struct fwd_entry - One range of ports to forward
+ * @addr:	Address to forward from
+ * @ifname:	Interface to forward from
+ * @first:	First port number to forward
+ * @last:	Last port number to forward
+ * @to:		Port number to forward port @first to.
+ * @flags:	Flag mask
+ * 	FWD_DUAL_STACK - forward both IPv4 and IPv6 (requires @addr be ::)
+ *	FWD_WEAK - Don't give an error if binds fail for some forwards
+ *
+ * FIXME: @addr and @ifname currently ignored for outbound tables
+ */
+struct fwd_entry {
+	union inany_addr addr;
+	char ifname[IFNAMSIZ];
+	in_port_t first, last, to;
+#define FWD_DUAL_STACK		BIT(0)
+#define FWD_WEAK		BIT(1)
+	uint8_t flags;
+};
+
+#define MAX_FWDS	128
+
 enum fwd_ports_mode {
 	FWD_UNSET = 0,
 	FWD_SPEC = 1,
@@ -38,14 +62,21 @@ struct fwd_ports {
 	enum fwd_ports_mode mode;
 	int scan4;
 	int scan6;
+	unsigned count;
+	struct fwd_entry tab[MAX_FWDS];
 	uint8_t map[PORT_BITMAP_SIZE];
 	in_port_t delta[NUM_PORTS];
 };
 
 #define FWD_PORT_SCAN_INTERVAL		1000	/* ms */
 
+void fwd_table_add(struct fwd_ports *fwd, uint8_t flags,
+		   const union inany_addr *addr, const char *ifname,
+		   in_port_t first, in_port_t last, in_port_t to);
+void fwd_table_print(const struct fwd_ports *fwd);
+
 void fwd_scan_ports_init(struct ctx *c);
-void fwd_scan_ports_timer(struct ctx *c, const struct timespec *now);
+void fwd_scan_ports_timer(struct ctx * c, const struct timespec *now);
 
 bool nat_inbound(const struct ctx *c, const union inany_addr *addr,
 		 union inany_addr *translated);
-- 
2.52.0



* [PATCH v2 04/12] conf: Accurately record ifname and address for outbound forwards
  2025-12-19 14:18 [PATCH v2 00/12] RFC: Improve forwarding data structure David Gibson
                   ` (2 preceding siblings ...)
  2025-12-19 14:18 ` [PATCH v2 03/12] conf, fwd: Keep a table of our port forwarding configuration David Gibson
@ 2025-12-19 14:18 ` David Gibson
  2025-12-19 14:18 ` [PATCH v2 05/12] conf, fwd: Record "auto" port forwards in forwarding table David Gibson
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: David Gibson @ 2025-12-19 14:18 UTC
  To: passt-dev, Stefano Brivio; +Cc: David Gibson

-T and -U options don't allow specifying a listening address.  Usually this
will listen on *%lo in the guest.  However, on kernels without unprivileged
SO_BINDTODEVICE that's not possible, so we instead listen separately on
127.0.0.1 and ::1.

Currently that's handled at the point we actually set up the listens, and we
record both address and ifname as NULL in the forwarding table entry.  That
will cause trouble for future extensions we want, so update this to
accurately create the forwarding table: either a single entry with ifname
== "lo" or two entries with address of 127.0.0.1 and ::1.

As a bonus, this gives the user a warning if they specify an explicit
outbound forwarding on a kernel without SO_BINDTODEVICE.  The existing
warning for missing SO_BINDTODEVICE (incorrectly) only covers the case of
-T auto or -U auto.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
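As an illustration of the rule described in the commit message (one entry
bound to "lo" when SO_BINDTODEVICE is usable, otherwise one entry per
enabled loopback address), here is a small standalone sketch.  The
sketch_* names and the string-based entry type are hypothetical and only
mirror the decision, not the actual fwd_table_add() interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified forwarding entry: address and interface only */
struct sketch_entry {
	const char *addr;	/* NULL: any address */
	const char *ifname;	/* NULL: any interface */
};

/**
 * sketch_outbound_entries() - Pick table entries for an outbound forward
 * @no_bindtodevice:	SO_BINDTODEVICE is unavailable
 * @have_v4:		IPv4 is enabled
 * @have_v6:		IPv6 is enabled
 * @out:		Output, room for two entries
 *
 * Return: number of entries written to @out
 */
static int sketch_outbound_entries(bool no_bindtodevice,
				   bool have_v4, bool have_v6,
				   struct sketch_entry out[2])
{
	int n = 0;

	if (!no_bindtodevice) {
		/* Usual case: any address, restricted to guest loopback */
		out[n].addr = NULL;
		out[n].ifname = "lo";
		return n + 1;
	}

	/* Fallback: no interface binding, explicit loopback addresses */
	if (have_v4) {
		out[n].addr = "127.0.0.1";
		out[n].ifname = NULL;
		n++;
	}
	if (have_v6) {
		out[n].addr = "::1";
		out[n].ifname = NULL;
		n++;
	}

	return n;
}
```

In the fallback case with both IP versions enabled this yields two entries,
which is why the table can grow by two ranges per -T or -U specifier on
such kernels.
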
 conf.c | 38 ++++++++++++++++++++++++++++++--------
 1 file changed, 30 insertions(+), 8 deletions(-)

diff --git a/conf.c b/conf.c
index ae2dc3e1..af9e82f5 100644
--- a/conf.c
+++ b/conf.c
@@ -157,12 +157,6 @@ static void conf_ports_range_except(const struct ctx *c, char optname,
 		    optname, optarg);
 	}
 
-	if (ifname && c->no_bindtodevice) {
-		die(
-"Device binding for '-%c %s' unsupported (requires kernel 5.7+)",
-		    optname, optarg);
-	}
-
 	for (base = first; base <= last; base++) {
 		if (bitmap_isset(exclude, base))
 			continue;
@@ -202,8 +196,27 @@ static void conf_ports_range_except(const struct ctx *c, char optname,
 			}
 		}
 
-		fwd_table_add(fwd, flags, addr, ifname,
-			      base, i - 1, base + delta);
+		if ((optname == 'T' || optname == 'U') && c->no_bindtodevice) {
+			/* FIXME: Once the fwd bitmaps are removed, move this
+			 * workaround to the caller
+			 */
+			ASSERT(!addr && ifname && !strcmp(ifname, "lo"));
+			warn(
+"SO_BINDTODEVICE unavailable, forwarding only 127.0.0.1 and ::1 for '-%c %s'",
+			     optname, optarg);
+
+			if (c->ifi4) {
+				fwd_table_add(fwd, flags, &inany_loopback4, NULL,
+					      base, i - 1, base + delta);
+			}
+			if (c->ifi6) {
+				fwd_table_add(fwd, flags, &inany_loopback6, NULL,
+					      base, i - 1, base + delta);
+			}
+		} else {
+			fwd_table_add(fwd, flags, addr, ifname,
+				      base, i - 1, base + delta);
+		}
 		base = i - 1;
 	}
 
@@ -350,6 +363,15 @@ static void conf_ports(const struct ctx *c, char optname, const char *optarg,
 		}
 	} while ((p = next_chunk(p, ',')));
 
+	if (ifname && c->no_bindtodevice) {
+		die(
+"Device binding for '-%c %s' unsupported (requires kernel 5.7+)",
+		    optname, optarg);
+	}
+	/* Outbound forwards come from guest loopback */
+	if ((optname == 'T' || optname == 'U') && !ifname)
+		ifname = "lo";
+
 	if (exclude_only) {
 		/* Exclude ephemeral ports */
 		for (i = 0; i < NUM_PORTS; i++)
-- 
2.52.0



* [PATCH v2 05/12] conf, fwd: Record "auto" port forwards in forwarding table
  2025-12-19 14:18 [PATCH v2 00/12] RFC: Improve forwarding data structure David Gibson
                   ` (3 preceding siblings ...)
  2025-12-19 14:18 ` [PATCH v2 04/12] conf: Accurately record ifname and address for outbound forwards David Gibson
@ 2025-12-19 14:18 ` David Gibson
  2025-12-19 14:18 ` [PATCH v2 06/12] tcp, udp: Make {tcp,udp}_listen() return socket fds David Gibson
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: David Gibson @ 2025-12-19 14:18 UTC
  To: passt-dev, Stefano Brivio; +Cc: David Gibson

Currently the forwarding table records details of explicit port forwards,
but nothing for -[tTuU] auto.  That looks a little odd in the debug output,
and will be a problem for future changes.

Extend the forward table to have entries for auto-scanned forwards, using
a new FWD_SCAN flag.  For now the mechanism of auto port forwarding isn't
updated, and we can only create a single FWD_SCAN entry per protocol and
direction.  We'll better integrate auto scanning with other forward table
mechanics in future.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
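The flag handling touched by this patch follows a common pattern: callers
may only pass a subset of the flag bits, one bit is reserved for internal
use, and each bit is rendered as a suffix in the debug output.  A minimal
standalone sketch of that pattern is below.  The SKETCH_* macros and
sketch_* functions are hypothetical names chosen for illustration; only
the bit positions and the " WEAK" / " AUTO" suffixes follow the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_BIT(n)		(1U << (n))
#define SKETCH_DUAL_STACK	SKETCH_BIT(0)	/* set internally only */
#define SKETCH_WEAK		SKETCH_BIT(1)
#define SKETCH_SCAN		SKETCH_BIT(2)

/* Flags a caller is allowed to pass in */
#define SKETCH_CALLER_FLAGS	(SKETCH_WEAK | SKETCH_SCAN)

/**
 * sketch_flags_ok() - Check that only caller-settable flags are present
 * @flags:	Flags passed by the caller
 *
 * Return: 1 if @flags is acceptable, 0 otherwise
 */
static int sketch_flags_ok(uint8_t flags)
{
	return !(flags & ~SKETCH_CALLER_FLAGS);
}

/**
 * sketch_flags_suffix() - Render flags as a debug output suffix
 * @flags:	Flags of a table entry
 * @buf:	Output buffer
 * @len:	Size of @buf
 */
static void sketch_flags_suffix(uint8_t flags, char *buf, size_t len)
{
	snprintf(buf, len, "%s%s",
		 flags & SKETCH_WEAK ? " WEAK" : "",
		 flags & SKETCH_SCAN ? " AUTO" : "");
}
```

Keeping the caller-settable mask in one place means a new flag such as the
scan flag needs a single-line change to become acceptable, as in the
fwd_table_add() hunk below.
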
 conf.c | 34 ++++++++++++++++++++++++++++------
 fwd.c  | 12 +++++++-----
 fwd.h  |  2 ++
 3 files changed, 37 insertions(+), 11 deletions(-)

diff --git a/conf.c b/conf.c
index af9e82f5..9d94e449 100644
--- a/conf.c
+++ b/conf.c
@@ -135,7 +135,7 @@ static int parse_port_range(const char *s, char **endptr,
  * @ifname:	Listening interface
  * @first:	First port to forward
  * @last:	Last port to forward
- * @exclude:	Bitmap of ports to exclude
+ * @exclude:	Bitmap of ports to exclude (may be NULL)
  * @to:		Port to translate @first to when forwarding
  * @flags:	Flags for forwarding entries
  */
@@ -158,11 +158,11 @@ static void conf_ports_range_except(const struct ctx *c, char optname,
 	}
 
 	for (base = first; base <= last; base++) {
-		if (bitmap_isset(exclude, base))
+		if (exclude && bitmap_isset(exclude, base))
 			continue;
 
 		for (i = base; i <= last; i++) {
-			if (bitmap_isset(exclude, i))
+			if (exclude && bitmap_isset(exclude, i))
 				break;
 
 			if (bitmap_isset(fwd->map, i)) {
@@ -170,12 +170,13 @@ static void conf_ports_range_except(const struct ctx *c, char optname,
 "Altering mapping of already mapped port number: %s", optarg);
 			}
 
-			bitmap_set(fwd->map, i);
+			if (!(flags & FWD_SCAN))
+				bitmap_set(fwd->map, i);
 			fwd->delta[i] = delta;
 
-			if (optname == 't')
+			if (!(flags & FWD_SCAN) && optname == 't')
 				ret = tcp_listen(c, PIF_HOST, addr, ifname, i);
-			else if (optname == 'u')
+			else if (!(flags & FWD_SCAN) && optname == 'u')
 				ret = udp_listen(c, PIF_HOST, addr, ifname, i);
 			else
 				/* No way to check in advance for -T and -U */
@@ -2189,6 +2190,27 @@ void conf(struct ctx *c, int argc, char **argv)
 	if (!c->udp.fwd_out.mode)
 		c->udp.fwd_out.mode = fwd_default;
 
+	if (c->tcp.fwd_in.mode == FWD_AUTO) {
+		conf_ports_range_except(c, 't', "auto", &c->tcp.fwd_in,
+					NULL, NULL, 1, NUM_PORTS - 1,
+					NULL, 1, FWD_SCAN);
+	}
+	if (c->tcp.fwd_out.mode == FWD_AUTO) {
+		conf_ports_range_except(c, 'T', "auto", &c->tcp.fwd_out,
+					NULL, "lo", 1, NUM_PORTS - 1,
+					NULL, 1, FWD_SCAN);
+	}
+	if (c->udp.fwd_in.mode == FWD_AUTO) {
+		conf_ports_range_except(c, 'u', "auto", &c->udp.fwd_in,
+					NULL, NULL, 1, NUM_PORTS - 1,
+					NULL, 1, FWD_SCAN);
+	}
+	if (c->udp.fwd_out.mode == FWD_AUTO) {
+		conf_ports_range_except(c, 'U', "auto", &c->udp.fwd_out,
+					NULL, "lo", 1, NUM_PORTS - 1,
+					NULL, 1, FWD_SCAN);
+	}
+
 	if (!c->quiet)
 		conf_print(c);
 }
diff --git a/fwd.c b/fwd.c
index a8477607..5e5dc58c 100644
--- a/fwd.c
+++ b/fwd.c
@@ -331,7 +331,7 @@ void fwd_table_add(struct fwd_ports *fwd, uint8_t flags,
 		   in_port_t first, in_port_t last, in_port_t to)
 {
 	/* Flags which can be set from the caller */
-	const uint8_t allowed_flags = FWD_WEAK;
+	const uint8_t allowed_flags = FWD_WEAK | FWD_SCAN;
 	struct fwd_entry *new;
 
 	ASSERT(!(flags & ~allowed_flags));
@@ -371,6 +371,7 @@ void fwd_table_print(const struct fwd_ports *fwd)
 	for (i = 0; i < fwd->count; i++) {
 		const struct fwd_entry *fe = &fwd->tab[i];
 		const char *weak = fe->flags & FWD_WEAK ? " WEAK" : "";
+		const char *scan = fe->flags & FWD_SCAN ? " AUTO" : "";
 		const char *percent = *fe->ifname ? "%" : "";
 		char addr[INANY_ADDRSTRLEN] = "*";
 
@@ -378,13 +379,14 @@ void fwd_table_print(const struct fwd_ports *fwd)
 			inany_ntop(&fe->addr, addr, sizeof(addr));
 
 		if (fe->first == fe->last) {
-			info("    [%s]%s%s:%hu  =>  %hu %s",
+			info("    [%s]%s%s:%hu  =>  %hu %s%s",
 			     addr, percent, fe->ifname,
-			     fe->first, fe->to, weak);
+			     fe->first, fe->to, weak, scan);
 		} else {
-			info("    [%s]%s%s:%hu-%hu  =>  %hu-%hu %s",
+			info("    [%s]%s%s:%hu-%hu  =>  %hu-%hu %s%s",
 			     addr, percent, fe->ifname, fe->first, fe->last,
-			     fe->to, fe->last - fe->first + fe->to, weak);
+			     fe->to, fe->last - fe->first + fe->to,
+			     weak, scan);
 		}
 	}
 }
diff --git a/fwd.h b/fwd.h
index 21f00cf8..eef507c6 100644
--- a/fwd.h
+++ b/fwd.h
@@ -26,6 +26,7 @@ bool fwd_port_is_ephemeral(in_port_t port);
  * @flags:	Flag mask
  * 	FWD_DUAL_STACK - forward both IPv4 and IPv6 (requires @addr be ::)
  *	FWD_WEAK - Don't give an error if binds fail for some forwards
+ *	FWD_SCAN - Only forward if we scan a listener on the target
  *
  * FIXME: @addr and @ifname currently ignored for outbound tables
  */
@@ -35,6 +36,7 @@ struct fwd_entry {
 	in_port_t first, last, to;
 #define FWD_DUAL_STACK		BIT(0)
 #define FWD_WEAK		BIT(1)
+#define FWD_SCAN		BIT(2)
 	uint8_t flags;
 };
 
-- 
2.52.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 06/12] tcp, udp: Make {tcp,udp}_listen() return socket fds
  2025-12-19 14:18 [PATCH v2 00/12] RFC: Improve forwarding data structure David Gibson
                   ` (4 preceding siblings ...)
  2025-12-19 14:18 ` [PATCH v2 05/12] conf, fwd: Record "auto" port forwards in forwarding table David Gibson
@ 2025-12-19 14:18 ` David Gibson
  2025-12-19 14:18 ` [PATCH v2 07/12] fwd: Make space to store listening sockets in forward table David Gibson
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: David Gibson @ 2025-12-19 14:18 UTC (permalink / raw)
  To: passt-dev, Stefano Brivio; +Cc: David Gibson

{tcp,udp}_listen() currently return 0 on success, rather than the socket
fd they created.  We had that historically because these functions could
sometimes create multiple sockets.  We've now refactored things to avoid
that, so it makes more sense for them to return the socket on success.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 conf.c | 14 +++++++-------
 tcp.c  |  7 ++-----
 udp.c  |  4 ++--
 3 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/conf.c b/conf.c
index 9d94e449..3d41a0fb 100644
--- a/conf.c
+++ b/conf.c
@@ -150,7 +150,7 @@ static void conf_ports_range_except(const struct ctx *c, char optname,
 	unsigned delta = to - first;
 	bool bound_one = false;
 	unsigned base, i;
-	int ret;
+	int fd;
 
 	if (first == 0) {
 		die("Can't forward port 0 for option '-%c %s'",
@@ -175,25 +175,25 @@ static void conf_ports_range_except(const struct ctx *c, char optname,
 			fwd->delta[i] = delta;
 
 			if (!(flags & FWD_SCAN) && optname == 't')
-				ret = tcp_listen(c, PIF_HOST, addr, ifname, i);
+				fd = tcp_listen(c, PIF_HOST, addr, ifname, i);
 			else if (!(flags & FWD_SCAN) && optname == 'u')
-				ret = udp_listen(c, PIF_HOST, addr, ifname, i);
+				fd = udp_listen(c, PIF_HOST, addr, ifname, i);
 			else
 				/* No way to check in advance for -T and -U */
-				ret = 0;
+				fd = 0; /* dummy */
 
-			if (ret == -ENFILE || ret == -EMFILE) {
+			if (fd == -ENFILE || fd == -EMFILE) {
 				die(
 "Can't open enough sockets for port specifier: %s",
 				    optarg);
 			}
 
-			if (!ret) {
+			if (fd >= 0) {
 				bound_one = true;
 			} else if (!(flags & FWD_WEAK)) {
 				die(
 "Failed to bind port %u (%s) for option '-%c %s'",
-				    i, strerror_(-ret), optname, optarg);
+				    i, strerror_(-fd), optname, optarg);
 			}
 		}
 
diff --git a/tcp.c b/tcp.c
index 71c6d632..e52f5420 100644
--- a/tcp.c
+++ b/tcp.c
@@ -2680,7 +2680,7 @@ void tcp_sock_handler(const struct ctx *c, union epoll_ref ref,
  * @ifname:	Name of interface to bind to, NULL for any
  * @port:	Port, host order
  *
- * Return: 0 on success, negative error code on failure
+ * Return: Socket fd on success, negative error code on failure
  */
 int tcp_listen(const struct ctx *c, uint8_t pif,
 	       const union inany_addr *addr, const char *ifname, in_port_t port)
@@ -2731,10 +2731,7 @@ int tcp_listen(const struct ctx *c, uint8_t pif,
 			socks[port][V6] = s < 0 ? -1 : s;
 	}
 
-	if (s < 0)
-		return s;
-
-	return 0;
+	return s;
 }
 
 /**
diff --git a/udp.c b/udp.c
index eda55c39..6168c36c 100644
--- a/udp.c
+++ b/udp.c
@@ -1136,7 +1136,7 @@ int udp_tap_handler(const struct ctx *c, uint8_t pif,
  * @ifname:	Name of interface to bind to, NULL if not configured
  * @port:	Port, host order
  *
- * Return: 0 on success, negative error code on failure
+ * Return: Socket fd on success, negative error code on failure
  */
 int udp_listen(const struct ctx *c, uint8_t pif,
 	       const union inany_addr *addr, const char *ifname, in_port_t port)
@@ -1182,7 +1182,7 @@ int udp_listen(const struct ctx *c, uint8_t pif,
 	if (!addr || !inany_v4(addr))
 		socks[V6][port] = s < 0 ? -1 : s;
 
-	return s < 0 ? s : 0;
+	return s;
 }
 
 /**
-- 
2.52.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 07/12] fwd: Make space to store listening sockets in forward table
  2025-12-19 14:18 [PATCH v2 00/12] RFC: Improve forwarding data structure David Gibson
                   ` (5 preceding siblings ...)
  2025-12-19 14:18 ` [PATCH v2 06/12] tcp, udp: Make {tcp,udp}_listen() return socket fds David Gibson
@ 2025-12-19 14:18 ` David Gibson
  2025-12-19 14:19 ` [PATCH v2 08/12] ip: Add ipproto_name() function David Gibson
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: David Gibson @ 2025-12-19 14:18 UTC (permalink / raw)
  To: passt-dev, Stefano Brivio; +Cc: David Gibson

At present, we don't keep track of the fds for listening sockets (except
for "auto" ones).  Since the fd is stored in the epoll reference, we haven't
needed an alternative source of it for the various handlers.

However, we're intending to allow dynamic changes to forwarding
configuration in future.  That means we need a way to enumerate sockets so
we can close them on removal of a forward.

Extend our forwarding table data structure to make space for all the
listening sockets.  To avoid allocation, this imposes another limit: we
could run out of space for socket fds before we run out of forwarding
entries.

We don't actually do anything with the allocated space yet.  Plus, for
"auto" forwards the new space is redundant with existing arrays.  We'll fix
both of those in later patches.
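
As an illustration only, the allocation-free scheme described above can be
sketched in isolation as below.  The demo_* names and the tiny pool size are
hypothetical and not part of this patch; the real code uses struct fwd_ports,
struct fwd_entry and MAX_LISTEN_SOCKS, and calls die() rather than returning
NULL when the pool is exhausted.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_SOCKS	16	/* hypothetical, far below MAX_LISTEN_SOCKS */

/* Hypothetical stand-in for the listen_socks[] part of struct fwd_ports */
struct demo_pool {
	unsigned count;			/* slots handed out so far */
	int socks[DEMO_MAX_SOCKS];	/* backing storage for every entry */
};

/* Reserve one slot per port in [first, last]; NULL if the pool is full */
static int *demo_reserve(struct demo_pool *p, unsigned first, unsigned last)
{
	unsigned num = last - first + 1, i;
	int *slice;

	if (p->count + num > DEMO_MAX_SOCKS)
		return NULL;

	slice = &p->socks[p->count];
	p->count += num;

	for (i = 0; i < num; i++)
		slice[i] = -1;	/* -1 marks "no socket open yet" */

	return slice;
}
```

Each entry gets a contiguous slice of the shared array, so the socket for
port P of an entry lives at slice[P - first], and a failed reservation leaves
the pool untouched.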

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 fwd.c |  8 ++++++++
 fwd.h | 11 +++++++++++
 2 files changed, 19 insertions(+)

diff --git a/fwd.c b/fwd.c
index 5e5dc58c..5215cee9 100644
--- a/fwd.c
+++ b/fwd.c
@@ -332,12 +332,15 @@ void fwd_table_add(struct fwd_ports *fwd, uint8_t flags,
 {
 	/* Flags which can be set from the caller */
 	const uint8_t allowed_flags = FWD_WEAK | FWD_SCAN;
+	unsigned num = (unsigned)last - first + 1, i;
 	struct fwd_entry *new;
 
 	ASSERT(!(flags & ~allowed_flags));
 
 	if (fwd->count >= ARRAY_SIZE(fwd->tab))
 		die("Too many port forwarding ranges");
+	if ((fwd->listen_sock_count + num) > ARRAY_SIZE(fwd->listen_socks))
+		die("Too many listening sockets");
 
 	new = &fwd->tab[fwd->count++];
 	new->flags = flags;
@@ -358,6 +361,11 @@ void fwd_table_add(struct fwd_ports *fwd, uint8_t flags,
 	new->last = last;
 
 	new->to = to;
+
+	new->socks = &fwd->listen_socks[fwd->listen_sock_count];
+	fwd->listen_sock_count += num;
+	for (i = 0; i < num; i++)
+		new->socks[i] = -1;
 }
 
 /**
diff --git a/fwd.h b/fwd.h
index eef507c6..84c463e2 100644
--- a/fwd.h
+++ b/fwd.h
@@ -23,6 +23,7 @@ bool fwd_port_is_ephemeral(in_port_t port);
  * @first:	First port number to forward
  * @last:	Last port number to forward
  * @to:		Port number to forward port @first to.
+ * @socks:	Array of listening sockets for this entry
  * @flags:	Flag mask
  * 	FWD_DUAL_STACK - forward both IPv4 and IPv6 (requires @addr be ::)
  *	FWD_WEAK - Don't give an error if binds fail for some forwards
@@ -34,6 +35,7 @@ struct fwd_entry {
 	union inany_addr addr;
 	char ifname[IFNAMSIZ];
 	in_port_t first, last, to;
+	int *socks;
 #define FWD_DUAL_STACK		BIT(0)
 #define FWD_WEAK		BIT(1)
 #define FWD_SCAN		BIT(2)
@@ -52,6 +54,13 @@ enum fwd_ports_mode {
 
 #define PORT_BITMAP_SIZE	DIV_ROUND_UP(NUM_PORTS, 8)
 
+/* Maximum number of listening sockets (per pif & protocol)
+ *
+ * Rationale: This lets us listen on every port for two addresses (which we need
+ * for -T auto without SO_BINDTODEVICE), plus a comfortable number of extras.
+ */
+#define MAX_LISTEN_SOCKS	(NUM_PORTS * 3)
+
 /**
  * fwd_ports() - Describes port forwarding for one protocol and direction
  * @mode:	Overall forwarding mode (all, none, auto, specific ports)
@@ -68,6 +77,8 @@ struct fwd_ports {
 	struct fwd_entry tab[MAX_FWDS];
 	uint8_t map[PORT_BITMAP_SIZE];
 	in_port_t delta[NUM_PORTS];
+	unsigned listen_sock_count;
+	int listen_socks[MAX_LISTEN_SOCKS];
 };
 
 #define FWD_PORT_SCAN_INTERVAL		1000	/* ms */
-- 
2.52.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 08/12] ip: Add ipproto_name() function
  2025-12-19 14:18 [PATCH v2 00/12] RFC: Improve forwarding data structure David Gibson
                   ` (6 preceding siblings ...)
  2025-12-19 14:18 ` [PATCH v2 07/12] fwd: Make space to store listening sockets in forward table David Gibson
@ 2025-12-19 14:19 ` David Gibson
  2025-12-19 14:19 ` [PATCH v2 09/12] fwd, tcp, udp: Set up listening sockets based on forward table David Gibson
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: David Gibson @ 2025-12-19 14:19 UTC (permalink / raw)
  To: passt-dev, Stefano Brivio; +Cc: David Gibson

Add a function to get the name of an IP protocol from its number.  Usually
this would be done by getprotobynumber(), but that requires access to
/etc/protocols and might allocate.  We can't do either of those once we've
self-isolated.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 ip.c | 27 +++++++++++++++++++++++++++
 ip.h |  2 ++
 2 files changed, 29 insertions(+)

diff --git a/ip.c b/ip.c
index 9a7f4c54..f1d224bd 100644
--- a/ip.c
+++ b/ip.c
@@ -67,3 +67,30 @@ found:
 	*proto = nh;
 	return true;
 }
+
+/**
+ * ipproto_name() - Get IP protocol name from number
+ * @proto:	IP protocol number
+ *
+ * Return: pointer to name of protocol @proto
+ *
+ * Usually this would be done with getprotobynumber(3) but that reads
+ * /etc/protocols and might allocate, which isn't possible for us once
+ * self-isolated.
+ */
+/* cppcheck-suppress unusedFunction */
+const char *ipproto_name(uint8_t proto)
+{
+	switch (proto) {
+	case IPPROTO_ICMP:
+		return "ICMP";
+	case IPPROTO_TCP:
+		return "TCP";
+	case IPPROTO_UDP:
+		return "UDP";
+	case IPPROTO_ICMPV6:
+		return "ICMPv6";
+	default:
+		return "<unknown protocol>";
+	}
+}
diff --git a/ip.h b/ip.h
index 5830b923..42b417c5 100644
--- a/ip.h
+++ b/ip.h
@@ -116,6 +116,7 @@ static inline uint32_t ip6_get_flow_lbl(const struct ipv6hdr *ip6h)
 }
 
 bool ipv6_l4hdr(struct iov_tail *data, uint8_t *proto, size_t *dlen);
+const char *ipproto_name(uint8_t proto);
 
 /* IPv6 link-local all-nodes multicast address, ff02::1 */
 static const struct in6_addr in6addr_ll_all_nodes = {
@@ -135,4 +136,5 @@ static const struct in_addr in4addr_broadcast = { 0xffffffff };
 #define IPV6_MIN_MTU		1280
 #endif
 
+
 #endif /* IP_H */
-- 
2.52.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 09/12] fwd, tcp, udp: Set up listening sockets based on forward table
  2025-12-19 14:18 [PATCH v2 00/12] RFC: Improve forwarding data structure David Gibson
                   ` (7 preceding siblings ...)
  2025-12-19 14:19 ` [PATCH v2 08/12] ip: Add ipproto_name() function David Gibson
@ 2025-12-19 14:19 ` David Gibson
  2025-12-19 14:19 ` [PATCH v2 10/12] tcp, udp: Remove old auto-forwarding socket arrays David Gibson
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: David Gibson @ 2025-12-19 14:19 UTC (permalink / raw)
  To: passt-dev, Stefano Brivio; +Cc: David Gibson

Previously we created inbound listening sockets as we parsed the forwarding
options (-t, -u) whereas outbound listening sockets were created during
{tcp,udp}_init().  Now that we have a data structure recording the full
details of the listening options we can move socket creation to
{tcp,udp}_init().  This means that errors for either direction are
detected and reported the same way.

Introduce fwd_listen_sync() which synchronizes the state of listening
sockets to the forward table data structure, both for fixed and automatic
forwards.

This does cause a change in semantics for "exclude only" port
specifications.  Previously an option like -t ~6000 wouldn't cause a fatal
error, as long as we could bind at least one port.  Now, it requires at
least one port bound in each of the contiguous blocks of ports the
specification resolves to.  With typical ephemeral ports settings that's
one port each in 1..5999, 6001..32767 and 61000..65535.

Preserving the exact behaviour for this case would require a considerably
more complex data structure, so I'm hoping this is a sufficiently niche
case for the change to be acceptable.
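
For reference, here is a standalone sketch of how an exclude-only
specification resolves into contiguous blocks.  It is not code from this
series: the demo_* names are hypothetical, and the ephemeral range of
32768..60999 is an assumption matching the "typical" settings mentioned
above (the real range is probed at runtime).

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NUM_PORTS	65536

struct demo_range {
	unsigned first, last;
};

/* Ports skipped for "-t ~6000", given the assumed ephemeral range */
static bool demo_excluded(unsigned port)
{
	return port == 6000 || (port >= 32768 && port <= 60999);
}

/* Fill @out with up to @max contiguous non-excluded blocks, return count */
static unsigned demo_blocks(struct demo_range *out, unsigned max)
{
	unsigned n = 0, port = 1;	/* port 0 can't be forwarded */

	while (port < DEMO_NUM_PORTS && n < max) {
		if (demo_excluded(port)) {
			port++;
			continue;
		}

		out[n].first = port;
		while (port < DEMO_NUM_PORTS && !demo_excluded(port))
			port++;
		out[n++].last = port - 1;
	}

	return n;
}
```

With these assumptions the specification yields three blocks, each of which
becomes its own forwarding entry and so needs at least one successful bind.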

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 conf.c |  27 ----------
 fwd.c  | 157 +++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 fwd.h  |   3 ++
 ip.c   |   1 -
 tcp.c  | 119 +------------------------------------------
 tcp.h  |   1 -
 udp.c  |  96 ++---------------------------------
 udp.h  |   1 -
 8 files changed, 160 insertions(+), 245 deletions(-)

diff --git a/conf.c b/conf.c
index 3d41a0fb..c0672be5 100644
--- a/conf.c
+++ b/conf.c
@@ -148,9 +148,7 @@ static void conf_ports_range_except(const struct ctx *c, char optname,
 				    uint8_t flags)
 {
 	unsigned delta = to - first;
-	bool bound_one = false;
 	unsigned base, i;
-	int fd;
 
 	if (first == 0) {
 		die("Can't forward port 0 for option '-%c %s'",
@@ -173,28 +171,6 @@ static void conf_ports_range_except(const struct ctx *c, char optname,
 			if (!(flags & FWD_SCAN))
 				bitmap_set(fwd->map, i);
 			fwd->delta[i] = delta;
-
-			if (!(flags & FWD_SCAN) && optname == 't')
-				fd = tcp_listen(c, PIF_HOST, addr, ifname, i);
-			else if (!(flags & FWD_SCAN) && optname == 'u')
-				fd = udp_listen(c, PIF_HOST, addr, ifname, i);
-			else
-				/* No way to check in advance for -T and -U */
-				fd = 0; /* dummy */
-
-			if (fd == -ENFILE || fd == -EMFILE) {
-				die(
-"Can't open enough sockets for port specifier: %s",
-				    optarg);
-			}
-
-			if (fd >= 0) {
-				bound_one = true;
-			} else if (!(flags & FWD_WEAK)) {
-				die(
-"Failed to bind port %u (%s) for option '-%c %s'",
-				    i, strerror_(-fd), optname, optarg);
-			}
 		}
 
 		if ((optname == 'T' || optname == 'U') && c->no_bindtodevice) {
@@ -220,9 +196,6 @@ static void conf_ports_range_except(const struct ctx *c, char optname,
 		}
 		base = i - 1;
 	}
-
-	if (!bound_one)
-		die("Failed to bind any port for '-%c %s'", optname, optarg);
 }
 
 /**
diff --git a/fwd.c b/fwd.c
index 5215cee9..21e852af 100644
--- a/fwd.c
+++ b/fwd.c
@@ -22,6 +22,7 @@
 #include <stdio.h>
 
 #include "util.h"
+#include "epoll_ctl.h"
 #include "ip.h"
 #include "siphash.h"
 #include "inany.h"
@@ -399,6 +400,148 @@ void fwd_table_print(const struct fwd_ports *fwd)
 	}
 }
 
+/** fwd_sync_one() - Create or remove listening sockets for a forward entry
+ * @c:		Execution context
+ * @fe:		Forwarding entry
+ * @pif:	Interface to create listening sockets for
+ * @proto:	Protocol to listen for
+ * @scanmap:	Bitmap of ports to listen for on FWD_SCAN entries
+ */
+static void fwd_sync_one(const struct ctx *c, const struct fwd_entry *fe,
+			 uint8_t pif, uint8_t proto, const uint8_t *scanmap)
+{
+	const union inany_addr *addr = &fe->addr;
+	const char *ifname = fe->ifname;
+	bool bound_one = false;
+	unsigned port;
+
+	ASSERT(pif_is_socket(pif));
+
+	if (fe->flags & FWD_DUAL_STACK)
+		addr = NULL;
+	if (!*ifname)
+		ifname = NULL;
+
+	for (port = fe->first; port <= fe->last; port++) {
+		int fd = fe->socks[port - fe->first];
+
+		if ((fe->flags & FWD_SCAN) && !bitmap_isset(scanmap, port)) {
+			/* We don't want to listen on this port */
+			if (fd >= 0) {
+				/* We already are, so stop */
+				epoll_del(c->epollfd, fd);
+				close(fd);
+				fe->socks[port - fe->first] = -1;
+			}
+			continue;
+		}
+
+		if (fd >= 0) { /* Already listening, nothing to do */
+			bound_one = true;
+			continue;
+		}
+
+		if (proto == IPPROTO_TCP)
+			fd = tcp_listen(c, pif, addr, ifname, port);
+		else if (proto == IPPROTO_UDP)
+			fd = udp_listen(c, pif, addr, ifname, port);
+		else
+			ASSERT(0);
+
+		if (fd < 0) {
+			char astr[INANY_ADDRSTRLEN] = "";
+
+			if (addr)
+				inany_ntop(addr, astr, sizeof(astr));
+
+			warn("Listen failed for %s %s port %s%s%s%s%u: %s",
+			     pif_name(pif), ipproto_name(proto),
+			     astr, ifname ? "%" : "", ifname ? ifname : "",
+			     addr || ifname ? "/" : "", port, strerror_(-fd));
+
+			if (!(fe->flags & FWD_WEAK))
+				goto die;
+
+			continue;
+		}
+
+		fe->socks[port - fe->first] = fd;
+		bound_one = true;
+	}
+
+	if (!bound_one && !(fe->flags & FWD_SCAN)) {
+		char astr[INANY_ADDRSTRLEN] = "";
+
+		if (addr)
+			inany_ntop(addr, astr, sizeof(astr));
+
+		err("All listens failed for %s %s %s%s%s%s%u-%u",
+		    pif_name(pif), ipproto_name(proto),
+		    astr, ifname ? "%" : "", ifname ? ifname : "",
+		    addr || ifname ? "/" : "", fe->first, fe->last);
+		goto die;
+	}
+
+	return;
+
+die:
+	die("Couldn't listen on requested %s ports", ipproto_name(proto));
+}
+
+/** struct fwd_listen_args - arguments for fwd_listen_sync_()
+ * @c:		Execution context
+ * @fwd:	Forwarding information
+ * @scanmap:	Bitmap of ports to auto-forward
+ * @pif:	Interface to create listening sockets for
+ * @proto:	Protocol
+ */
+struct fwd_listen_args {
+	const struct ctx *c;
+	const struct fwd_ports *fwd;
+	const uint8_t *scanmap;
+	uint8_t pif;
+	uint8_t proto;
+};
+
+/** fwd_listen_sync_() - Update listening sockets to match forwards
+ * @arg:	struct fwd_listen_args with arguments
+ *
+ * Return: 0
+ */
+static int fwd_listen_sync_(void *arg)
+{
+	const struct fwd_listen_args *a = arg;
+	unsigned i;
+
+	if (a->pif == PIF_SPLICE)
+		ns_enter(a->c);
+
+	for (i = 0; i < a->fwd->count; i++)
+		fwd_sync_one(a->c, &a->fwd->tab[i], a->pif, a->proto,
+			     a->fwd->map);
+
+	return 0;
+}
+
+/** fwd_listen_sync() - Update listening sockets to match forwards
+ * @c:		Execution context
+ * @fwd:	Forwarding information
+ * @pif:	Interface to create listening sockets for
+ * @proto:	Protocol
+ */
+void fwd_listen_sync(const struct ctx *c, const struct fwd_ports *fwd,
+		     uint8_t pif, uint8_t proto)
+{
+	struct fwd_listen_args a = {
+		.c = c, .fwd = fwd, .pif = pif, .proto = proto,
+	};
+
+	if (pif == PIF_SPLICE)
+		NS_CALL(fwd_listen_sync_, &a);
+	else
+		fwd_listen_sync_(&a);
+}
+
 /* See enum in kernel's include/net/tcp_states.h */
 #define UDP_LISTEN	0x07
 #define TCP_LISTEN	0x0a
@@ -506,7 +649,7 @@ static void fwd_scan_ports(struct ctx *c)
 }
 
 /**
- * fwd_scan_ports_init() - Initial setup for automatic port forwarding
+ * fwd_scan_ports_init() - Initial setup for port forwarding
  * @c:		Execution context
  */
 void fwd_scan_ports_init(struct ctx *c)
@@ -557,10 +700,14 @@ void fwd_scan_ports_timer(struct ctx *c, const struct timespec *now)
 
 	fwd_scan_ports(c);
 
-	if (!c->no_tcp)
-		tcp_port_rebind_all(c);
-	if (!c->no_udp)
-		udp_port_rebind_all(c);
+	if (!c->no_tcp) {
+		fwd_listen_sync(c, &c->tcp.fwd_in, PIF_HOST, IPPROTO_TCP);
+		fwd_listen_sync(c, &c->tcp.fwd_out, PIF_SPLICE, IPPROTO_TCP);
+	}
+	if (!c->no_udp) {
+		fwd_listen_sync(c, &c->udp.fwd_in, PIF_HOST, IPPROTO_UDP);
+		fwd_listen_sync(c, &c->udp.fwd_out, PIF_SPLICE, IPPROTO_UDP);
+	}
 }
 
 /**
diff --git a/fwd.h b/fwd.h
index 84c463e2..3f3b111c 100644
--- a/fwd.h
+++ b/fwd.h
@@ -91,6 +91,9 @@ void fwd_table_print(const struct fwd_ports *fwd);
 void fwd_scan_ports_init(struct ctx *c);
 void fwd_scan_ports_timer(struct ctx * c, const struct timespec *now);
 
+void fwd_listen_sync(const struct ctx *c, const struct fwd_ports *fwd,
+		     uint8_t pif, uint8_t proto);
+
 bool nat_inbound(const struct ctx *c, const union inany_addr *addr,
 		 union inany_addr *translated);
 uint8_t fwd_nat_from_tap(const struct ctx *c, uint8_t proto,
diff --git a/ip.c b/ip.c
index f1d224bd..fc26dab2 100644
--- a/ip.c
+++ b/ip.c
@@ -78,7 +78,6 @@ found:
  * /etc/protocols and might allocate, which isn't possible for us once
  * self-isolated.
  */
-/* cppcheck-suppress unusedFunction */
 const char *ipproto_name(uint8_t proto)
 {
 	switch (proto) {
diff --git a/tcp.c b/tcp.c
index e52f5420..06f58b10 100644
--- a/tcp.c
+++ b/tcp.c
@@ -2734,50 +2734,6 @@ int tcp_listen(const struct ctx *c, uint8_t pif,
 	return s;
 }
 
-/**
- * tcp_ns_listen() - Init socket to listen for spliced outbound connections
- * @c:		Execution context
- * @port:	Port, host order
- */
-static void tcp_ns_listen(const struct ctx *c, in_port_t port)
-{
-	ASSERT(!c->no_tcp);
-
-	if (!c->no_bindtodevice) {
-		tcp_listen(c, PIF_SPLICE, NULL, "lo", port);
-		return;
-	}
-
-	if (c->ifi4)
-		tcp_listen(c, PIF_SPLICE, &inany_loopback4, NULL, port);
-	if (c->ifi6)
-		tcp_listen(c, PIF_SPLICE, &inany_loopback6, NULL, port);
-}
-
-/**
- * tcp_ns_socks_init() - Bind sockets in namespace for outbound connections
- * @arg:	Execution context
- *
- * Return: 0
- */
-/* cppcheck-suppress [constParameterCallback, unmatchedSuppression] */
-static int tcp_ns_socks_init(void *arg)
-{
-	const struct ctx *c = (const struct ctx *)arg;
-	unsigned port;
-
-	ns_enter(c);
-
-	for (port = 0; port < NUM_PORTS; port++) {
-		if (!bitmap_isset(c->tcp.fwd_out.map, port))
-			continue;
-
-		tcp_ns_listen(c, port);
-	}
-
-	return 0;
-}
-
 /**
  * tcp_sock_refill_pool() - Refill one pool of pre-opened sockets
  * @pool:	Pool of sockets to refill
@@ -2921,10 +2877,10 @@ int tcp_init(struct ctx *c)
 
 	tcp_sock_refill_init(c);
 
+	fwd_listen_sync(c, &c->tcp.fwd_in, PIF_HOST, IPPROTO_TCP);
 	if (c->mode == MODE_PASTA) {
 		tcp_splice_init(c);
-
-		NS_CALL(tcp_ns_socks_init, c);
+		fwd_listen_sync(c, &c->tcp.fwd_out, PIF_SPLICE, IPPROTO_TCP);
 	}
 
 	peek_offset_cap = (!c->ifi4 || tcp_probe_peek_offset_cap(AF_INET)) &&
@@ -2943,77 +2899,6 @@ int tcp_init(struct ctx *c)
 	return 0;
 }
 
-/**
- * tcp_port_rebind() - Rebind ports to match forward maps
- * @c:		Execution context
- * @outbound:	True to remap outbound forwards, otherwise inbound
- *
- * Must be called in namespace context if @outbound is true.
- */
-static void tcp_port_rebind(struct ctx *c, bool outbound)
-{
-	const uint8_t *fmap = outbound ? c->tcp.fwd_out.map : c->tcp.fwd_in.map;
-	int (*socks)[IP_VERSIONS] = outbound ? tcp_sock_ns : tcp_sock_init_ext;
-	unsigned port;
-
-	for (port = 0; port < NUM_PORTS; port++) {
-		if (!bitmap_isset(fmap, port)) {
-			if (socks[port][V4] >= 0) {
-				close(socks[port][V4]);
-				socks[port][V4] = -1;
-			}
-
-			if (socks[port][V6] >= 0) {
-				close(socks[port][V6]);
-				socks[port][V6] = -1;
-			}
-
-			continue;
-		}
-
-		if ((c->ifi4 && socks[port][V4] == -1) ||
-		    (c->ifi6 && socks[port][V6] == -1)) {
-			if (outbound)
-				tcp_ns_listen(c, port);
-			else
-				tcp_listen(c, PIF_HOST, NULL, NULL, port);
-		}
-	}
-}
-
-/**
- * tcp_port_rebind_outbound() - Rebind ports in namespace
- * @arg:	Execution context
- *
- * Called with NS_CALL()
- *
- * Return: 0
- */
-static int tcp_port_rebind_outbound(void *arg)
-{
-	struct ctx *c = (struct ctx *)arg;
-
-	ns_enter(c);
-	tcp_port_rebind(c, true);
-
-	return 0;
-}
-
-/**
- * tcp_port_rebind_all() - Rebind ports to match forward maps (in host & ns)
- * @c:		Execution context
- */
-void tcp_port_rebind_all(struct ctx *c)
-{
-	ASSERT(c->mode == MODE_PASTA && !c->no_tcp);
-
-	if (c->tcp.fwd_out.mode == FWD_AUTO)
-		NS_CALL(tcp_port_rebind_outbound, c);
-
-	if (c->tcp.fwd_in.mode == FWD_AUTO)
-		tcp_port_rebind(c, false);
-}
-
 /**
  * tcp_timer() - Periodic tasks: port detection, closed connections, pool refill
  * @c:		Execution context
diff --git a/tcp.h b/tcp.h
index 9dd88762..8b44e321 100644
--- a/tcp.h
+++ b/tcp.h
@@ -22,7 +22,6 @@ int tcp_listen(const struct ctx *c, uint8_t pif,
 	       const union inany_addr *addr, const char *ifname,
 	       in_port_t port);
 int tcp_init(struct ctx *c);
-void tcp_port_rebind_all(struct ctx *c);
 void tcp_timer(const struct ctx *c, const struct timespec *now);
 void tcp_defer_handler(struct ctx *c);
 
diff --git a/udp.c b/udp.c
index 6168c36c..adcd2d4a 100644
--- a/udp.c
+++ b/udp.c
@@ -1205,98 +1205,6 @@ static void udp_splice_iov_init(void)
 	}
 }
 
-/**
- * udp_ns_listen() - Init socket to listen for spliced outbound connections
- * @c:		Execution context
- * @port:	Port, host order
- */
-static void udp_ns_listen(const struct ctx *c, in_port_t port)
-{
-	ASSERT(!c->no_udp);
-
-	if (!c->no_bindtodevice) {
-		udp_listen(c, PIF_SPLICE, NULL, "lo", port);
-		return;
-	}
-
-	if (c->ifi4)
-		udp_listen(c, PIF_SPLICE, &inany_loopback4, NULL, port);
-	if (c->ifi6)
-		udp_listen(c, PIF_SPLICE, &inany_loopback6, NULL, port);
-}
-
-/**
- * udp_port_rebind() - Rebind ports to match forward maps
- * @c:		Execution context
- * @outbound:	True to remap outbound forwards, otherwise inbound
- *
- * Must be called in namespace context if @outbound is true.
- */
-static void udp_port_rebind(struct ctx *c, bool outbound)
-{
-	int (*socks)[NUM_PORTS] = outbound ? udp_splice_ns : udp_splice_init;
-	const uint8_t *fmap
-		= outbound ? c->udp.fwd_out.map : c->udp.fwd_in.map;
-	unsigned port;
-
-	for (port = 0; port < NUM_PORTS; port++) {
-		if (!bitmap_isset(fmap, port)) {
-			if (socks[V4][port] >= 0) {
-				close(socks[V4][port]);
-				socks[V4][port] = -1;
-			}
-
-			if (socks[V6][port] >= 0) {
-				close(socks[V6][port]);
-				socks[V6][port] = -1;
-			}
-
-			continue;
-		}
-
-		if ((c->ifi4 && socks[V4][port] == -1) ||
-		    (c->ifi6 && socks[V6][port] == -1)) {
-			if (outbound)
-				udp_ns_listen(c, port);
-			else
-				udp_listen(c, PIF_HOST, NULL, NULL, port);
-		}
-	}
-}
-
-/**
- * udp_port_rebind_outbound() - Rebind ports in namespace
- * @arg:	Execution context
- *
- * Called with NS_CALL()
- *
- * Return: 0
- */
-static int udp_port_rebind_outbound(void *arg)
-{
-	struct ctx *c = (struct ctx *)arg;
-
-	ns_enter(c);
-	udp_port_rebind(c, true);
-
-	return 0;
-}
-
-/**
- * udp_port_rebind_all() - Rebind ports to match forward maps (in host & ns)
- * @c:		Execution context
- */
-void udp_port_rebind_all(struct ctx *c)
-{
-	ASSERT(c->mode == MODE_PASTA && !c->no_udp);
-
-	if (c->udp.fwd_out.mode == FWD_AUTO)
-		NS_CALL(udp_port_rebind_outbound, c);
-
-	if (c->udp.fwd_in.mode == FWD_AUTO)
-		udp_port_rebind(c, false);
-}
-
 /**
  * udp_init() - Initialise per-socket data, and sockets in namespace
  * @c:		Execution context
@@ -1309,9 +1217,11 @@ int udp_init(struct ctx *c)
 
 	udp_iov_init(c);
 
+	fwd_listen_sync(c, &c->udp.fwd_in, PIF_HOST, IPPROTO_UDP);
+
 	if (c->mode == MODE_PASTA) {
 		udp_splice_iov_init();
-		NS_CALL(udp_port_rebind_outbound, c);
+		fwd_listen_sync(c, &c->udp.fwd_out, PIF_SPLICE, IPPROTO_UDP);
 	}
 
 	return 0;
diff --git a/udp.h b/udp.h
index 5407db3b..f1a7a026 100644
--- a/udp.h
+++ b/udp.h
@@ -19,7 +19,6 @@ int udp_listen(const struct ctx *c, uint8_t pif,
 	       const union inany_addr *addr, const char *ifname,
 	       in_port_t port);
 int udp_init(struct ctx *c);
-void udp_port_rebind_all(struct ctx *c);
 void udp_update_l2_buf(const unsigned char *eth_d);
 
 /**
-- 
2.52.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 10/12] tcp, udp: Remove old auto-forwarding socket arrays
  2025-12-19 14:18 [PATCH v2 00/12] RFC: Improve forwarding data structure David Gibson
                   ` (8 preceding siblings ...)
  2025-12-19 14:19 ` [PATCH v2 09/12] fwd, tcp, udp: Set up listening sockets based on forward table David Gibson
@ 2025-12-19 14:19 ` David Gibson
  2025-12-19 14:19 ` [PATCH v2 11/12] fwd: Generate auto-forward exclusions from socket fd tables David Gibson
  2025-12-19 14:19 ` [PATCH v2 12/12] tcp: Remove unused tcp_epoll_ref David Gibson
  11 siblings, 0 replies; 13+ messages in thread
From: David Gibson @ 2025-12-19 14:19 UTC (permalink / raw)
  To: passt-dev, Stefano Brivio; +Cc: David Gibson

Now that we've moved listening socket management to the new forwarding
table data structure, the existing arrays of socket fds are maintained,
but never consulted.  Remove them.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 conf.c |  1 -
 tcp.c  | 24 ------------------------
 udp.c  | 30 ------------------------------
 udp.h  |  1 -
 4 files changed, 56 deletions(-)

diff --git a/conf.c b/conf.c
index c0672be5..dcadfa96 100644
--- a/conf.c
+++ b/conf.c
@@ -2051,7 +2051,6 @@ void conf(struct ctx *c, int argc, char **argv)
 	 * settings
 	 */
 	fwd_probe_ephemeral();
-	udp_portmap_clear();
 	optind = 0;
 	do {
 		name = getopt_long(argc, argv, optstring, options, NULL);
diff --git a/tcp.c b/tcp.c
index 06f58b10..3e241438 100644
--- a/tcp.c
+++ b/tcp.c
@@ -414,10 +414,6 @@ static const char *tcp_flag_str[] __attribute((__unused__)) = {
 	"ACK_FROM_TAP_DUE", "ACK_FROM_TAP_BLOCKS", "SYN_RETRIED",
 };
 
-/* Listening sockets, used for automatic port forwarding in pasta mode only */
-static int tcp_sock_init_ext	[NUM_PORTS][IP_VERSIONS];
-static int tcp_sock_ns		[NUM_PORTS][IP_VERSIONS];
-
 /* Table of our guest side addresses with very low RTT (assumed to be local to
  * the host), LRU
  */
@@ -2689,8 +2685,6 @@ int tcp_listen(const struct ctx *c, uint8_t pif,
 		.port = port,
 		.pif = pif,
 	};
-	const struct fwd_ports *fwd;
-	int (*socks)[IP_VERSIONS];
 	int s;
 
 	ASSERT(!c->no_tcp);
@@ -2712,25 +2706,9 @@ int tcp_listen(const struct ctx *c, uint8_t pif,
 			return 0;
 	}
 
-	if (pif == PIF_HOST) {
-		fwd = &c->tcp.fwd_in;
-		socks = tcp_sock_init_ext;
-	} else {
-		ASSERT(pif == PIF_SPLICE);
-		fwd = &c->tcp.fwd_out;
-		socks = tcp_sock_ns;
-	}
-
 	s = pif_sock_l4(c, EPOLL_TYPE_TCP_LISTEN, pif, addr, ifname,
 			port, tref.u32);
 
-	if (fwd->mode == FWD_AUTO) {
-		if (!addr || inany_v4(addr))
-			socks[port][V4] = s < 0 ? -1 : s;
-		if (!addr || !inany_v4(addr))
-			socks[port][V6] = s < 0 ? -1 : s;
-	}
-
 	return s;
 }
 
@@ -2872,8 +2850,6 @@ int tcp_init(struct ctx *c)
 
 	memset(init_sock_pool4,		0xff,	sizeof(init_sock_pool4));
 	memset(init_sock_pool6,		0xff,	sizeof(init_sock_pool6));
-	memset(tcp_sock_init_ext,	0xff,	sizeof(tcp_sock_init_ext));
-	memset(tcp_sock_ns,		0xff,	sizeof(tcp_sock_ns));
 
 	tcp_sock_refill_init(c);
 
diff --git a/udp.c b/udp.c
index adcd2d4a..daab319b 100644
--- a/udp.c
+++ b/udp.c
@@ -124,10 +124,6 @@
 			- sizeof(struct udphdr)	\
 			- sizeof(struct ipv6hdr))
 
-/* "Spliced" sockets indexed by bound port (host order) */
-static int udp_splice_ns  [IP_VERSIONS][NUM_PORTS];
-static int udp_splice_init[IP_VERSIONS][NUM_PORTS];
-
 /* Static buffers */
 
 /* UDP header and data for inbound messages */
@@ -193,19 +189,6 @@ static struct mmsghdr	udp_mh_splice		[UDP_MAX_FRAMES];
 /* IOVs for L2 frames */
 static struct iovec	udp_l2_iov		[UDP_MAX_FRAMES][UDP_NUM_IOVS];
 
-/**
- * udp_portmap_clear() - Clear UDP port map before configuration
- */
-void udp_portmap_clear(void)
-{
-	unsigned i;
-
-	for (i = 0; i < NUM_PORTS; i++) {
-		udp_splice_ns[V4][i] = udp_splice_ns[V6][i] = -1;
-		udp_splice_init[V4][i] = udp_splice_init[V6][i] = -1;
-	}
-}
-
 /**
  * udp_update_l2_buf() - Update L2 buffers with Ethernet and IPv4 addresses
  * @eth_d:	Ethernet destination address, NULL if unchanged
@@ -1145,18 +1128,10 @@ int udp_listen(const struct ctx *c, uint8_t pif,
 		.pif = pif,
 		.port = port,
 	};
-	int (*socks)[NUM_PORTS];
 	int s;
 
 	ASSERT(!c->no_udp);
 
-	if (pif == PIF_HOST) {
-		socks = udp_splice_init;
-	} else {
-		ASSERT(pif == PIF_SPLICE);
-		socks = udp_splice_ns;
-	}
-
 	if (!c->ifi4) {
 		if (!addr)
 			/* Restrict to v6 only */
@@ -1177,11 +1152,6 @@ int udp_listen(const struct ctx *c, uint8_t pif,
 	s = pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, pif,
 			addr, ifname, port, uref.u32);
 
-	if (!addr || inany_v4(addr))
-		socks[V4][port] = s < 0 ? -1 : s;
-	if (!addr || !inany_v4(addr))
-		socks[V6][port] = s < 0 ? -1 : s;
-
 	return s;
 }
 
diff --git a/udp.h b/udp.h
index f1a7a026..42c222c6 100644
--- a/udp.h
+++ b/udp.h
@@ -6,7 +6,6 @@
 #ifndef UDP_H
 #define UDP_H
 
-void udp_portmap_clear(void);
 void udp_listen_sock_handler(const struct ctx *c, union epoll_ref ref,
 			     uint32_t events, const struct timespec *now);
 void udp_sock_handler(const struct ctx *c, union epoll_ref ref,
-- 
2.52.0


* [PATCH v2 11/12] fwd: Generate auto-forward exclusions from socket fd tables
  2025-12-19 14:18 [PATCH v2 00/12] RFC: Improve forwarding data structure David Gibson
                   ` (9 preceding siblings ...)
  2025-12-19 14:19 ` [PATCH v2 10/12] tcp, udp: Remove old auto-forwarding socket arrays David Gibson
@ 2025-12-19 14:19 ` David Gibson
  2025-12-19 14:19 ` [PATCH v2 12/12] tcp: Remove unused tcp_epoll_ref David Gibson
  11 siblings, 0 replies; 13+ messages in thread
From: David Gibson @ 2025-12-19 14:19 UTC (permalink / raw)
  To: passt-dev, Stefano Brivio; +Cc: David Gibson

When auto-forwarding based on port scans, we must exclude our own
listening ports, to avoid circular forwards.  Currently we use the (old)
forwarding bitmaps for the reverse direction to determine that.

Instead, generate it from the tables of listening sockets that we now
maintain.  For now this looks like a lot more work to reach the same
result.  However, it does mean we base our exclusions directly on the
relevant information: which of the scanned listening sockets belong to
us.  More importantly, it's a step towards removing the bitmaps entirely.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
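For readers without the rest of the series at hand, the idea can be
sketched in isolation.  This is a minimal standalone illustration, not
the code in the patch: struct sketch_entry and the sketch_*() helpers
are hypothetical stand-ins for struct fwd_entry, bitmap_set() and
bitmap_and_not(), and the byte-wise bit layout and 65536-port bitmap
size are assumptions made to keep the example self-contained.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed sizes: one bit per possible port number */
#define NUM_PORTS		(1U << 16)
#define PORT_BITMAP_SIZE	(NUM_PORTS / 8)

/* Hypothetical stand-in for one forwarding table entry */
struct sketch_entry {
	uint16_t first;		/* first port of the range */
	uint16_t last;		/* last port of the range, inclusive */
	const int *socks;	/* last - first + 1 fds, -1 if not listening */
};

/* Set one bit in a byte-wise bitmap (assumed layout) */
void sketch_bit_set(uint8_t *map, unsigned bit)
{
	map[bit / 8] |= (uint8_t)(1U << (bit % 8));
}

/* Return 1 if the bit is set, 0 otherwise */
int sketch_bit_isset(const uint8_t *map, unsigned bit)
{
	return !!(map[bit / 8] & (1U << (bit % 8)));
}

/* Build a bitmap of ports which have an open listening socket */
void sketch_listen_map(uint8_t *map, const struct sketch_entry *tab,
		       unsigned count)
{
	unsigned i, port;

	memset(map, 0, PORT_BITMAP_SIZE);

	for (i = 0; i < count; i++) {
		const struct sketch_entry *e = &tab[i];

		assert(e->first <= e->last);

		/* 'port' is unsigned so that last == 65535 can't wrap */
		for (port = e->first; port <= e->last; port++) {
			if (e->socks[port - e->first] >= 0)
				sketch_bit_set(map, port);
		}
	}
}

/* Drop our own listening ports from a map of scanned ports */
void sketch_exclude(uint8_t *scanned, const uint8_t *exclude)
{
	unsigned i;

	for (i = 0; i < PORT_BITMAP_SIZE; i++)
		scanned[i] &= (uint8_t)~exclude[i];
}
```

Here a port only ends up in the exclusion map if a socket was actually
opened for it, so a configured forward whose bind failed would not be
excluded from the scan results in this sketch.
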
 fwd.c | 30 ++++++++++++++++++++++++++----
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/fwd.c b/fwd.c
index 21e852af..848930a9 100644
--- a/fwd.c
+++ b/fwd.c
@@ -628,6 +628,28 @@ static void fwd_scan_ports_udp(struct fwd_ports *fwd,
 	bitmap_and_not(fwd->map, PORT_BITMAP_SIZE, fwd->map, exclude);
 }
 
+/**
+ * current_listen_map() - Get bitmap of which ports we're already listening on
+ * @map:	Bitmap to populate
+ * @fwd:	Forwarding table to consider
+ */
+static void current_listen_map(uint8_t *map, const struct fwd_ports *fwd)
+{
+	unsigned i;
+
+	memset(map, 0, PORT_BITMAP_SIZE);
+
+	for (i = 0; i < fwd->count; i++) {
+		const struct fwd_entry *fe = &fwd->tab[i];
+		unsigned port;
+
+		for (port = fe->first; port <= fe->last; port++) {
+			if (fe->socks[port - fe->first] >= 0)
+				bitmap_set(map, port);
+		}
+	}
+}
+
 /**
  * fwd_scan_ports() - Scan automatic port forwarding information
  * @c:		Execution context
@@ -637,10 +659,10 @@ static void fwd_scan_ports(struct ctx *c)
 	uint8_t excl_tcp_out[PORT_BITMAP_SIZE], excl_udp_out[PORT_BITMAP_SIZE];
 	uint8_t excl_tcp_in[PORT_BITMAP_SIZE], excl_udp_in[PORT_BITMAP_SIZE];
 
-	memcpy(excl_tcp_out, c->tcp.fwd_in.map, sizeof(excl_tcp_out));
-	memcpy(excl_tcp_in, c->tcp.fwd_out.map, sizeof(excl_tcp_in));
-	memcpy(excl_udp_out, c->udp.fwd_in.map, sizeof(excl_udp_out));
-	memcpy(excl_udp_in, c->udp.fwd_out.map, sizeof(excl_udp_in));
+	current_listen_map(excl_tcp_out, &c->tcp.fwd_in);
+	current_listen_map(excl_tcp_in, &c->tcp.fwd_out);
+	current_listen_map(excl_udp_out, &c->udp.fwd_in);
+	current_listen_map(excl_udp_in, &c->udp.fwd_out);
 
 	fwd_scan_ports_tcp(&c->tcp.fwd_out, excl_tcp_out);
 	fwd_scan_ports_tcp(&c->tcp.fwd_in, excl_tcp_in);
-- 
2.52.0


* [PATCH v2 12/12] tcp: Remove unused tcp_epoll_ref
  2025-12-19 14:18 [PATCH v2 00/12] RFC: Improve forwarding data structure David Gibson
                   ` (10 preceding siblings ...)
  2025-12-19 14:19 ` [PATCH v2 11/12] fwd: Generate auto-forward exclusions from socket fd tables David Gibson
@ 2025-12-19 14:19 ` David Gibson
  11 siblings, 0 replies; 13+ messages in thread
From: David Gibson @ 2025-12-19 14:19 UTC (permalink / raw)
  To: passt-dev, Stefano Brivio; +Cc: David Gibson

This union has been unused for some time.  Remove it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 tcp.h | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/tcp.h b/tcp.h
index 8b44e321..f0eebd2d 100644
--- a/tcp.h
+++ b/tcp.h
@@ -29,16 +29,6 @@ void tcp_update_l2_buf(const unsigned char *eth_d);
 
 extern bool peek_offset_cap;
 
-/**
- * union tcp_epoll_ref - epoll reference portion for TCP connections
- * @index:		Index of connection in table
- * @u32:		Opaque u32 value of reference
- */
-union tcp_epoll_ref {
-	uint32_t index:20;
-	uint32_t u32;
-};
-
 /**
  * union tcp_listen_epoll_ref - epoll reference portion for TCP listening
  * @port:	Bound port number of the socket
-- 
2.52.0

