* [PATCH v3 00/14] Introduce forwarding table
@ 2026-01-08 2:29 David Gibson
2026-01-08 2:29 ` [PATCH v3 01/14] inany: Extend inany_ntop() to treat NULL as a fully unspecified address David Gibson
` (13 more replies)
0 siblings, 14 replies; 18+ messages in thread
From: David Gibson @ 2026-01-08 2:29 UTC (permalink / raw)
To: Stefano Brivio, passt-dev; +Cc: David Gibson
This creates a new data structure for recording our port forwarding: a
table of individual forwarding rules. This is intended to replace the
existing forwarding mode, map and delta structures. For now, only the
delta[] array is removed. map is still used, but only for automatic
forwards. Mode is still used, but only has significance during
configuration.
There's still a lot that can be done for flexible forwarding, but this
introduces the core data structure, and does enough to fix at least
one concrete defect with the current logic (bug 187). I think it
would be a good point to give it a solid review and maybe merge.
I'm aware that this is a very large complex series, that will be
difficult to review. I think that's more or less inevitable
implementing a feature as broad and complex as "flexible forwarding".
If it helps at all, I'm fine if it's not reviewed all at once - I
won't assume review is complete until either you say so, or I get
comments or R-b on each patch.
This applies on top of all 3 of my outstanding series in this area:
"Allow listen functions to return fds", "Clean ups to epoll
references" (v2) and "Small cleanups to splice forwarding logic".
David Gibson (14):
inany: Extend inany_ntop() to treat NULL as a fully unspecified
address
conf, fwd: Keep a table of our port forwarding configuration
conf: Accurately record ifname and address for outbound forwards
conf, fwd: Record "auto" port forwards in forwarding table
fwd: Make space to store listening sockets in forward table
ip: Add ipproto_name() function
fwd, tcp, udp: Set up listening sockets based on forward table
tcp, udp: Remove old auto-forwarding socket arrays
conf, fwd: Check forwarding table for conflicting rules
fwd: Generate auto-forward exclusions from socket fd tables
flow, fwd: Consult rules table when forwarding a new flow from socket
fwd: Remap ports based directly on forwarding rule
fwd, tcp, udp: Add forwarding rule to listening socket epoll
references
flow, fwd: Optimise forwarding rule lookup using epoll ref when
possible
conf.c | 118 +++++++++------
flow.c | 59 ++++++--
flow_table.h | 2 +-
fwd.c | 379 ++++++++++++++++++++++++++++++++++++++++++++++---
fwd.h | 68 ++++++++-
icmp.c | 2 +-
inany.c | 28 +++-
inany.h | 1 +
ip.c | 26 ++++
ip.h | 2 +
tcp.c | 154 ++------------------
tcp.h | 6 +-
udp.c | 147 +++----------------
udp.h | 7 +-
udp_flow.c | 14 +-
udp_flow.h | 2 +-
udp_internal.h | 4 +-
17 files changed, 641 insertions(+), 378 deletions(-)
--
2.52.0
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v3 01/14] inany: Extend inany_ntop() to treat NULL as a fully unspecified address
2026-01-08 2:29 [PATCH v3 00/14] Introduce forwarding table David Gibson
@ 2026-01-08 2:29 ` David Gibson
2026-01-08 13:16 ` Laurent Vivier
2026-01-08 2:29 ` [PATCH v3 02/14] conf, fwd: Keep a table of our port forwarding configuration David Gibson
` (12 subsequent siblings)
13 siblings, 1 reply; 18+ messages in thread
From: David Gibson @ 2026-01-08 2:29 UTC (permalink / raw)
To: Stefano Brivio, passt-dev; +Cc: David Gibson
In a number of places we're using a convention that a NULL inany represents
a dual-stack unspecified address, that is one which includes both 0.0.0.0
and ::. Extend inany_ntop() to handle that convention, representing it
as "*".
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
inany.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/inany.c b/inany.c
index 7680439d..87a4d8b6 100644
--- a/inany.c
+++ b/inany.c
@@ -22,7 +22,7 @@ const union inany_addr inany_loopback4 = INANY_INIT4(IN4ADDR_LOOPBACK_INIT);
const union inany_addr inany_any4 = INANY_INIT4(IN4ADDR_ANY_INIT);
/** inany_ntop - Convert an IPv[46] address to text format
- * @src: IPv[46] address
+ * @src: IPv[46] address (NULL for unspecified)
* @dst: output buffer, minimum INANY_ADDRSTRLEN bytes
* @size: size of buffer at @dst
*
@@ -30,9 +30,12 @@ const union inany_addr inany_any4 = INANY_INIT4(IN4ADDR_ANY_INIT);
*/
const char *inany_ntop(const union inany_addr *src, char *dst, socklen_t size)
{
- const struct in_addr *v4 = inany_v4(src);
+ const struct in_addr *v4;
+
+ if (!src)
+ return strncpy(dst, "*", size);
- if (v4)
+ if ((v4 = inany_v4(src)))
return inet_ntop(AF_INET, v4, dst, size);
return inet_ntop(AF_INET6, &src->a6, dst, size);
--
2.52.0
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v3 02/14] conf, fwd: Keep a table of our port forwarding configuration
2026-01-08 2:29 [PATCH v3 00/14] Introduce forwarding table David Gibson
2026-01-08 2:29 ` [PATCH v3 01/14] inany: Extend inany_ntop() to treat NULL as a fully unspecified address David Gibson
@ 2026-01-08 2:29 ` David Gibson
2026-01-08 2:29 ` [PATCH v3 03/14] conf: Accurately record ifname and address for outbound forwards David Gibson
` (11 subsequent siblings)
13 siblings, 0 replies; 18+ messages in thread
From: David Gibson @ 2026-01-08 2:29 UTC (permalink / raw)
To: Stefano Brivio, passt-dev; +Cc: David Gibson
At present, we set up forwarding as we parse the -t, and -u options, not
keeping a persistent data structure with all the details. We do have
some information in struct fwd_ports, but it doesn't capture all the nuance
that the options do.
As a first step to generalising our forwarding model, add a table of all
the forwarding rules to struct fwd_ports. For now it covers only explicit
forwards, not automatic, and we don't do anything with it other than print
some additional debug information. We'll do more with it in future
patches.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
conf.c | 80 ++++++++++++++++++++++++++++++-------------------
fwd.c | 95 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
fwd.h | 35 +++++++++++++++++++++-
3 files changed, 179 insertions(+), 31 deletions(-)
diff --git a/conf.c b/conf.c
index c936664b..127e69f5 100644
--- a/conf.c
+++ b/conf.c
@@ -137,7 +137,7 @@ static int parse_port_range(const char *s, char **endptr,
* @last: Last port to forward
* @exclude: Bitmap of ports to exclude
* @to: Port to translate @first to when forwarding
- * @weak: Ignore errors, as long as at least one port is mapped
+ * @flags: Flags for forwarding entries
*/
static void conf_ports_range_except(const struct ctx *c, char optname,
const char *optarg, struct fwd_ports *fwd,
@@ -145,10 +145,11 @@ static void conf_ports_range_except(const struct ctx *c, char optname,
const char *ifname,
uint16_t first, uint16_t last,
const uint8_t *exclude, uint16_t to,
- bool weak)
+ uint8_t flags)
{
+ unsigned delta = to - first;
bool bound_one = false;
- unsigned i;
+ unsigned base, i;
int fd;
if (first == 0) {
@@ -172,37 +173,45 @@ static void conf_ports_range_except(const struct ctx *c, char optname,
}
}
- for (i = first; i <= last; i++) {
- if (bitmap_isset(exclude, i))
+ for (base = first; base <= last; base++) {
+ if (bitmap_isset(exclude, base))
continue;
- if (bitmap_isset(fwd->map, i)) {
- warn(
-"Altering mapping of already mapped port number: %s", optarg);
- }
+ for (i = base; i <= last; i++) {
+ if (bitmap_isset(exclude, i))
+ break;
- bitmap_set(fwd->map, i);
- fwd->delta[i] = to - first;
+ if (bitmap_isset(fwd->map, i)) {
+ warn(
+"Altering mapping of already mapped port number: %s", optarg);
+ }
- if (optname == 't')
- fd = tcp_listen(c, PIF_HOST, addr, ifname, i);
- else if (optname == 'u')
- fd = udp_listen(c, PIF_HOST, addr, ifname, i);
- else
- /* No way to check in advance for -T and -U */
- fd = 0;
+ if (optname == 't')
+ fd = tcp_listen(c, PIF_HOST, addr, ifname, i);
+ else if (optname == 'u')
+ fd = udp_listen(c, PIF_HOST, addr, ifname, i);
+ else
+ /* No way to check in advance for -T and -U */
+ fd = 0;
+
+ if (fd == -ENFILE || fd == -EMFILE) {
+ die(
+"Can't open enough sockets for port specifier: %s",
+ optarg);
+ }
- if (fd == -ENFILE || fd == -EMFILE) {
- die("Can't open enough sockets for port specifier: %s",
- optarg);
+ if (fd >= 0) {
+ bound_one = true;
+ } else if (!(flags & FWD_WEAK)) {
+ die(
+"Failed to bind port %u (%s) for option '-%c %s'",
+ i, strerror_(-fd), optname, optarg);
+ }
}
- if (fd >= 0) {
- bound_one = true;
- } else if (!weak) {
- die("Failed to bind port %u (%s) for option '-%c %s'",
- i, strerror_(-fd), optname, optarg);
- }
+ fwd_rule_add(fwd, flags, addr, ifname, base, i - 1,
+ base + delta);
+ base = i - 1;
}
if (!bound_one)
@@ -272,7 +281,7 @@ static void conf_ports(const struct ctx *c, char optname, const char *optarg,
conf_ports_range_except(c, optname, optarg, fwd,
NULL, NULL,
1, NUM_PORTS - 1, exclude,
- 1, true);
+ 1, FWD_WEAK);
return;
}
@@ -357,7 +366,7 @@ static void conf_ports(const struct ctx *c, char optname, const char *optarg,
conf_ports_range_except(c, optname, optarg, fwd,
addr, ifname,
1, NUM_PORTS - 1, exclude,
- 1, true);
+ 1, FWD_WEAK);
return;
}
@@ -390,7 +399,7 @@ static void conf_ports(const struct ctx *c, char optname, const char *optarg,
addr, ifname,
orig_range.first, orig_range.last,
exclude,
- mapped_range.first, false);
+ mapped_range.first, 0);
} while ((p = next_chunk(p, ',')));
return;
@@ -1225,6 +1234,17 @@ dns6:
info(" %s", c->dns_search[i].n);
}
}
+
+ info("Inbound TCP forwarding:");
+ fwd_rules_print(&c->tcp.fwd_in);
+ info("Inbound UDP forwarding:");
+ fwd_rules_print(&c->udp.fwd_in);
+ if (c->mode == MODE_PASTA) {
+ info("Outbound TCP forwarding:");
+ fwd_rules_print(&c->tcp.fwd_out);
+ info("Outbound UDP forwarding:");
+ fwd_rules_print(&c->udp.fwd_out);
+ }
}
/**
diff --git a/fwd.c b/fwd.c
index 961c533f..ad398e54 100644
--- a/fwd.c
+++ b/fwd.c
@@ -13,6 +13,7 @@
* Author: David Gibson <david@gibson.dropbear.id.au>
*/
+#include <assert.h>
#include <stdint.h>
#include <errno.h>
#include <fcntl.h>
@@ -22,6 +23,8 @@
#include "util.h"
#include "ip.h"
+#include "siphash.h"
+#include "inany.h"
#include "fwd.h"
#include "passt.h"
#include "lineread.h"
@@ -300,6 +303,19 @@ parse_err:
warn("Unable to parse %s", PORT_RANGE_SYSCTL);
}
+/**
+ * fwd_rule_addr() - Return match address for a rule
+ * @rule: Forwarding rule
+ *
+ * Return: Rule's match address, or NULL, if it matches any address
+ */
+static const union inany_addr *fwd_rule_addr(const struct fwd_rule *rule)
+{
+ if (rule->flags & FWD_DUAL_STACK_ANY)
+ return NULL;
+ return &rule->addr;
+}
+
/**
* fwd_port_is_ephemeral() - Is port number ephemeral?
* @port: Port number
@@ -313,6 +329,85 @@ bool fwd_port_is_ephemeral(in_port_t port)
return (port >= fwd_ephemeral_min) && (port <= fwd_ephemeral_max);
}
+/**
+ * fwd_rule_add() - Add a rule to a forwarding table
+ * @fwd: Table to add to
+ * @flags: Flags for this entry
+ * @addr: Our address to forward (NULL for both 0.0.0.0 and ::)
+ * @ifname: Only forward from this interface name, if non-empty
+ * @first: First port number to forward
+ * @last: Last port number to forward
+ * @to: First port of target port range to map to
+ */
+void fwd_rule_add(struct fwd_ports *fwd, uint8_t flags,
+ const union inany_addr *addr, const char *ifname,
+ in_port_t first, in_port_t last, in_port_t to)
+{
+ /* Flags which can be set from the caller */
+ const uint8_t allowed_flags = FWD_WEAK;
+ struct fwd_rule *new;
+ unsigned port;
+
+ ASSERT(!(flags & ~allowed_flags));
+
+ if (fwd->count >= ARRAY_SIZE(fwd->rules))
+ die("Too many port forwarding ranges");
+
+ new = &fwd->rules[fwd->count++];
+ new->flags = flags;
+
+ if (addr) {
+ new->addr = *addr;
+ } else {
+ new->addr = inany_any6;
+ new->flags |= FWD_DUAL_STACK_ANY;
+ }
+
+ memset(new->ifname, 0, sizeof(new->ifname));
+ if (ifname)
+ strncpy(new->ifname, ifname, sizeof(new->ifname));
+
+ ASSERT(first <= last);
+ new->first = first;
+ new->last = last;
+
+ new->to = to;
+
+ for (port = new->first; port <= new->last; port++) {
+ /* Fill in the legacy data structures to match the table */
+ bitmap_set(fwd->map, port);
+ fwd->delta[port] = new->to - new->first;
+ }
+}
+
+/**
+ * fwd_rules_print() - Print forwarding rules for debugging
+ * @fwd: Table to print
+ */
+void fwd_rules_print(const struct fwd_ports *fwd)
+{
+ unsigned i;
+
+ for (i = 0; i < fwd->count; i++) {
+ const struct fwd_rule *rule = &fwd->rules[i];
+ const char *weak = rule->flags & FWD_WEAK ? " WEAK" : "";
+ const char *percent = *rule->ifname ? "%" : "";
+ char addr[INANY_ADDRSTRLEN];
+
+ inany_ntop(fwd_rule_addr(rule), addr, sizeof(addr));
+
+ if (rule->first == rule->last) {
+ info(" [%s]%s%s:%hu => %hu %s",
+ addr, percent, rule->ifname,
+ rule->first, rule->to, weak);
+ } else {
+ info(" [%s]%s%s:%hu-%hu => %hu-%hu %s",
+ addr, percent, rule->ifname, rule->first, rule->last,
+ rule->to, rule->last - rule->first + rule->to, weak);
+ }
+ }
+}
+
/* See enum in kernel's include/net/tcp_states.h */
#define UDP_LISTEN 0x07
#define TCP_LISTEN 0x0a
diff --git a/fwd.h b/fwd.h
index 934bab33..3dfc7959 100644
--- a/fwd.h
+++ b/fwd.h
@@ -16,6 +16,30 @@ struct flowside;
void fwd_probe_ephemeral(void);
bool fwd_port_is_ephemeral(in_port_t port);
+/**
+ * struct fwd_rule - Forwarding rule governing a range of ports
+ * @addr: Address to forward from
+ * @ifname: Interface to forward from
+ * @first: First port number to forward
+ * @last: Last port number to forward
+ * @to: Port number to forward port @first to.
+ * @flags: Flag mask
+ * FWD_DUAL_STACK_ANY - match any IPv4 or IPv6 address (@addr should be ::)
+ * FWD_WEAK - Don't give an error if binds fail for some forwards
+ *
+ * FIXME: @addr and @ifname currently ignored for outbound tables
+ */
+struct fwd_rule {
+ union inany_addr addr;
+ char ifname[IFNAMSIZ];
+ in_port_t first, last, to;
+#define FWD_DUAL_STACK_ANY BIT(0)
+#define FWD_WEAK BIT(1)
+ uint8_t flags;
+};
+
+#define MAX_FWD_RULES 128
+
/**
* union fwd_listen_ref - information about a single listening socket
* @port: Bound port number of the socket
@@ -44,6 +68,8 @@ enum fwd_ports_mode {
* @mode: Overall forwarding mode (all, none, auto, specific ports)
* @scan4: /proc/net fd to scan for IPv4 ports when in AUTO mode
* @scan6: /proc/net fd to scan for IPv6 ports when in AUTO mode
+ * @count: Number of forwarding rules
+ * @rules: Array of forwarding rules
* @map: Bitmap describing which ports are forwarded
* @delta: Offset between the original destination and mapped port number
*/
@@ -51,14 +77,21 @@ struct fwd_ports {
enum fwd_ports_mode mode;
int scan4;
int scan6;
+ unsigned count;
+ struct fwd_rule rules[MAX_FWD_RULES];
uint8_t map[PORT_BITMAP_SIZE];
in_port_t delta[NUM_PORTS];
};
#define FWD_PORT_SCAN_INTERVAL 1000 /* ms */
+void fwd_rule_add(struct fwd_ports *fwd, uint8_t flags,
+ const union inany_addr *addr, const char *ifname,
+ in_port_t first, in_port_t last, in_port_t to);
+void fwd_rules_print(const struct fwd_ports *fwd);
+
void fwd_scan_ports_init(struct ctx *c);
-void fwd_scan_ports_timer(struct ctx *c, const struct timespec *now);
+void fwd_scan_ports_timer(struct ctx * c, const struct timespec *now);
bool nat_inbound(const struct ctx *c, const union inany_addr *addr,
union inany_addr *translated);
--
2.52.0
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v3 03/14] conf: Accurately record ifname and address for outbound forwards
2026-01-08 2:29 [PATCH v3 00/14] Introduce forwarding table David Gibson
2026-01-08 2:29 ` [PATCH v3 01/14] inany: Extend inany_ntop() to treat NULL as a fully unspecified address David Gibson
2026-01-08 2:29 ` [PATCH v3 02/14] conf, fwd: Keep a table of our port forwarding configuration David Gibson
@ 2026-01-08 2:29 ` David Gibson
2026-01-08 2:29 ` [PATCH v3 04/14] conf, fwd: Record "auto" port forwards in forwarding table David Gibson
` (10 subsequent siblings)
13 siblings, 0 replies; 18+ messages in thread
From: David Gibson @ 2026-01-08 2:29 UTC (permalink / raw)
To: Stefano Brivio, passt-dev; +Cc: David Gibson
-T and -U options don't allow specifying a listening address. Usually this
will listen on *%lo in the guest. However on kernels without unprivileged
SO_BINDTODEVICE that's not possible so we instead listen separately on
127.0.0.1 and ::1.
Currently that's handled at the point we actually set up the listens,
we record both address and ifname as NULL in the forwarding table
entry. That will cause trouble for future extensions we want, so
update this to accurately create the forwarding table: either a single
rule with ifname == "lo" or two rules with addresses of 127.0.0.1 and
::1.
As a bonus, this gives the user a warning if they specify an explicit
outbound forwarding on a kernel without SO_BINDTODEVICE. The existing
warning for missing SO_BINDTODEVICE incorrectly covered only the case
of -T auto or -U auto.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
conf.c | 38 ++++++++++++++++++++++++++++++--------
1 file changed, 30 insertions(+), 8 deletions(-)
diff --git a/conf.c b/conf.c
index 127e69f5..b486fefe 100644
--- a/conf.c
+++ b/conf.c
@@ -157,12 +157,6 @@ static void conf_ports_range_except(const struct ctx *c, char optname,
optname, optarg);
}
- if (ifname && c->no_bindtodevice) {
- die(
-"Device binding for '-%c %s' unsupported (requires kernel 5.7+)",
- optname, optarg);
- }
-
if (addr) {
if (!c->ifi4 && inany_v4(addr)) {
die("IPv4 is disabled, can't use -%c %s",
@@ -209,8 +203,27 @@ static void conf_ports_range_except(const struct ctx *c, char optname,
}
}
- fwd_rule_add(fwd, flags, addr, ifname, base, i - 1,
- base + delta);
+ if ((optname == 'T' || optname == 'U') && c->no_bindtodevice) {
+ /* FIXME: Once the fwd bitmaps are removed, move this
+ * workaround to the caller
+ */
+ ASSERT(!addr && ifname && !strcmp(ifname, "lo"));
+ warn(
+"SO_BINDTODEVICE unavailable, forwarding only 127.0.0.1 and ::1 for '-%c %s'",
+ optname, optarg);
+
+ if (c->ifi4) {
+ fwd_rule_add(fwd, flags, &inany_loopback4, NULL,
+ base, i - 1, base + delta);
+ }
+ if (c->ifi6) {
+ fwd_rule_add(fwd, flags, &inany_loopback6, NULL,
+ base, i - 1, base + delta);
+ }
+ } else {
+ fwd_rule_add(fwd, flags, addr, ifname,
+ base, i - 1, base + delta);
+ }
base = i - 1;
}
@@ -357,6 +370,15 @@ static void conf_ports(const struct ctx *c, char optname, const char *optarg,
}
} while ((p = next_chunk(p, ',')));
+ if (ifname && c->no_bindtodevice) {
+ die(
+"Device binding for '-%c %s' unsupported (requires kernel 5.7+)",
+ optname, optarg);
+ }
+ /* Outbound forwards come from guest loopback */
+ if ((optname == 'T' || optname == 'U') && !ifname)
+ ifname = "lo";
+
if (exclude_only) {
/* Exclude ephemeral ports */
for (i = 0; i < NUM_PORTS; i++)
--
2.52.0
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v3 04/14] conf, fwd: Record "auto" port forwards in forwarding table
2026-01-08 2:29 [PATCH v3 00/14] Introduce forwarding table David Gibson
` (2 preceding siblings ...)
2026-01-08 2:29 ` [PATCH v3 03/14] conf: Accurately record ifname and address for outbound forwards David Gibson
@ 2026-01-08 2:29 ` David Gibson
2026-01-08 2:29 ` [PATCH v3 05/14] fwd: Make space to store listening sockets in forward table David Gibson
` (9 subsequent siblings)
13 siblings, 0 replies; 18+ messages in thread
From: David Gibson @ 2026-01-08 2:29 UTC (permalink / raw)
To: Stefano Brivio, passt-dev; +Cc: David Gibson
Currently the forwarding table records details of explicit port forwards,
but nothing for -[tTuU] auto. That looks a little odd on the debug output,
and will be a problem for future changes.
Extend the forward table to have rules for auto-scanned forwards,
using a new FWD_SCAN flag. For now the mechanism of auto port
forwarding isn't updated, and we will only create a single FWD_SCAN
rule per protocol and direction. We'll better integrate auto scanning
with other forward table mechanics in future.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
conf.c | 31 ++++++++++++++++++++++++++-----
fwd.c | 18 +++++++++++-------
fwd.h | 2 ++
3 files changed, 39 insertions(+), 12 deletions(-)
diff --git a/conf.c b/conf.c
index b486fefe..0bcf80d7 100644
--- a/conf.c
+++ b/conf.c
@@ -135,7 +135,7 @@ static int parse_port_range(const char *s, char **endptr,
* @ifname: Listening interface
* @first: First port to forward
* @last: Last port to forward
- * @exclude: Bitmap of ports to exclude
+ * @exclude: Bitmap of ports to exclude (may be NULL)
* @to: Port to translate @first to when forwarding
* @flags: Flags for forwarding entries
*/
@@ -168,11 +168,11 @@ static void conf_ports_range_except(const struct ctx *c, char optname,
}
for (base = first; base <= last; base++) {
- if (bitmap_isset(exclude, base))
+ if (exclude && bitmap_isset(exclude, base))
continue;
for (i = base; i <= last; i++) {
- if (bitmap_isset(exclude, i))
+ if (exclude && bitmap_isset(exclude, i))
break;
if (bitmap_isset(fwd->map, i)) {
@@ -180,9 +180,9 @@ static void conf_ports_range_except(const struct ctx *c, char optname,
"Altering mapping of already mapped port number: %s", optarg);
}
- if (optname == 't')
+ if (!(flags & FWD_SCAN) && optname == 't')
fd = tcp_listen(c, PIF_HOST, addr, ifname, i);
- else if (optname == 'u')
+ else if (!(flags & FWD_SCAN) && optname == 'u')
fd = udp_listen(c, PIF_HOST, addr, ifname, i);
else
/* No way to check in advance for -T and -U */
@@ -2202,6 +2202,27 @@ void conf(struct ctx *c, int argc, char **argv)
if (!c->udp.fwd_out.mode)
c->udp.fwd_out.mode = fwd_default;
+ if (c->tcp.fwd_in.mode == FWD_AUTO) {
+ conf_ports_range_except(c, 't', "auto", &c->tcp.fwd_in,
+ NULL, NULL, 1, NUM_PORTS - 1,
+ NULL, 1, FWD_SCAN);
+ }
+ if (c->tcp.fwd_out.mode == FWD_AUTO) {
+ conf_ports_range_except(c, 'T', "auto", &c->tcp.fwd_out,
+ NULL, "lo", 1, NUM_PORTS - 1,
+ NULL, 1, FWD_SCAN);
+ }
+ if (c->udp.fwd_in.mode == FWD_AUTO) {
+ conf_ports_range_except(c, 'u', "auto", &c->udp.fwd_in,
+ NULL, NULL, 1, NUM_PORTS - 1,
+ NULL, 1, FWD_SCAN);
+ }
+ if (c->udp.fwd_out.mode == FWD_AUTO) {
+ conf_ports_range_except(c, 'U', "auto", &c->udp.fwd_out,
+ NULL, "lo", 1, NUM_PORTS - 1,
+ NULL, 1, FWD_SCAN);
+ }
+
if (!c->quiet)
conf_print(c);
}
diff --git a/fwd.c b/fwd.c
index ad398e54..69aca441 100644
--- a/fwd.c
+++ b/fwd.c
@@ -344,7 +344,7 @@ void fwd_rule_add(struct fwd_ports *fwd, uint8_t flags,
in_port_t first, in_port_t last, in_port_t to)
{
/* Flags which can be set from the caller */
- const uint8_t allowed_flags = FWD_WEAK;
+ const uint8_t allowed_flags = FWD_WEAK | FWD_SCAN;
struct fwd_rule *new;
unsigned port;
@@ -375,7 +375,8 @@ void fwd_rule_add(struct fwd_ports *fwd, uint8_t flags,
for (port = new->first; port <= new->last; port++) {
/* Fill in the legacy data structures to match the table */
- bitmap_set(fwd->map, port);
+ if (!(new->flags & FWD_SCAN))
+ bitmap_set(fwd->map, port);
fwd->delta[port] = new->to - new->first;
}
}
@@ -391,19 +392,22 @@ void fwd_rules_print(const struct fwd_ports *fwd)
for (i = 0; i < fwd->count; i++) {
const struct fwd_rule *rule = &fwd->rules[i];
const char *weak = rule->flags & FWD_WEAK ? " WEAK" : "";
+ const char *scan = rule->flags & FWD_SCAN ? " AUTO" : "";
const char *percent = *rule->ifname ? "%" : "";
char addr[INANY_ADDRSTRLEN];
inany_ntop(fwd_rule_addr(rule), addr, sizeof(addr));
if (rule->first == rule->last) {
- info(" [%s]%s%s:%hu => %hu %s",
+ info(" [%s]%s%s:%hu => %hu %s%s",
addr, percent, rule->ifname,
- rule->first, rule->to, weak);
+ rule->first, rule->to, weak, scan);
} else {
- info(" [%s]%s%s:%hu-%hu => %hu-%hu %s",
- addr, percent, rule->ifname, rule->first, rule->last,
- rule->to, rule->last - rule->first + rule->to, weak);
+ info(" [%s]%s%s:%hu-%hu => %hu-%hu %s%s",
+ addr, percent, rule->ifname,
+ rule->first, rule->last,
+ rule->to, rule->last - rule->first + rule->to,
+ weak, scan);
}
}
}
diff --git a/fwd.h b/fwd.h
index 3dfc7959..94869c2a 100644
--- a/fwd.h
+++ b/fwd.h
@@ -26,6 +26,7 @@ bool fwd_port_is_ephemeral(in_port_t port);
* @flags: Flag mask
* FWD_DUAL_STACK_ANY - match any IPv4 or IPv6 address (@addr should be ::)
* FWD_WEAK - Don't give an error if binds fail for some forwards
+ * FWD_SCAN - Only forward if we scan a listener on the target
*
* FIXME: @addr and @ifname currently ignored for outbound tables
*/
@@ -35,6 +36,7 @@ struct fwd_rule {
in_port_t first, last, to;
#define FWD_DUAL_STACK_ANY BIT(0)
#define FWD_WEAK BIT(1)
+#define FWD_SCAN BIT(2)
uint8_t flags;
};
--
2.52.0
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v3 05/14] fwd: Make space to store listening sockets in forward table
2026-01-08 2:29 [PATCH v3 00/14] Introduce forwarding table David Gibson
` (3 preceding siblings ...)
2026-01-08 2:29 ` [PATCH v3 04/14] conf, fwd: Record "auto" port forwards in forwarding table David Gibson
@ 2026-01-08 2:29 ` David Gibson
2026-01-08 2:29 ` [PATCH v3 06/14] ip: Add ipproto_name() function David Gibson
` (8 subsequent siblings)
13 siblings, 0 replies; 18+ messages in thread
From: David Gibson @ 2026-01-08 2:29 UTC (permalink / raw)
To: Stefano Brivio, passt-dev; +Cc: David Gibson
At present, we don't keep track of the fds for listening sockets (except
for "auto" ones). Since the fd is stored in the epoll reference, we didn't
need an alternative source of it for the various handlers.
However, we're intending to allow dynamic changes to forwarding
configuration in future. That means we need a way to enumerate sockets so
we can close them on removal of a forward.
Extend our forwarding table data structure to make space for all the
listening sockets. To avoid allocation, this imposes another limit:
we could run out of space for socket fds before we run out of slots
for forwarding rules.
We don't actually do anything with the allocate spaced yet. For
"auto" forwards it's redundant with existing arrays. We'll fix both
of those in later patches.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
fwd.c | 10 +++++++++-
fwd.h | 13 +++++++++++++
2 files changed, 22 insertions(+), 1 deletion(-)
diff --git a/fwd.c b/fwd.c
index 69aca441..f27a4220 100644
--- a/fwd.c
+++ b/fwd.c
@@ -345,6 +345,7 @@ void fwd_rule_add(struct fwd_ports *fwd, uint8_t flags,
{
/* Flags which can be set from the caller */
const uint8_t allowed_flags = FWD_WEAK | FWD_SCAN;
+ unsigned num = (unsigned)last - first + 1;
struct fwd_rule *new;
unsigned port;
@@ -352,6 +353,8 @@ void fwd_rule_add(struct fwd_ports *fwd, uint8_t flags,
if (fwd->count >= ARRAY_SIZE(fwd->rules))
die("Too many port forwarding ranges");
+ if ((fwd->listen_sock_count + num) > ARRAY_SIZE(fwd->listen_socks))
+ die("Too many listening sockets");
new = &fwd->rules[fwd->count++];
new->flags = flags;
@@ -373,8 +376,13 @@ void fwd_rule_add(struct fwd_ports *fwd, uint8_t flags,
new->to = to;
+ new->socks = &fwd->listen_socks[fwd->listen_sock_count];
+ fwd->listen_sock_count += num;
+
for (port = new->first; port <= new->last; port++) {
- /* Fill in the legacy data structures to match the table */
+ new->socks[port - new->first] = -1;
+
+ /* Fill in the legacy forwarding data structures to match the table */
if (!(new->flags & FWD_SCAN))
bitmap_set(fwd->map, port);
fwd->delta[port] = new->to - new->first;
diff --git a/fwd.h b/fwd.h
index 94869c2a..3ddcb91d 100644
--- a/fwd.h
+++ b/fwd.h
@@ -23,6 +23,7 @@ bool fwd_port_is_ephemeral(in_port_t port);
* @first: First port number to forward
* @last: Last port number to forward
* @to: Port number to forward port @first to.
+ * @socks: Array of listening sockets for this entry
* @flags: Flag mask
* FWD_DUAL_STACK_ANY - match any IPv4 or IPv6 address (@addr should be ::)
* FWD_WEAK - Don't give an error if binds fail for some forwards
@@ -34,6 +35,7 @@ struct fwd_rule {
union inany_addr addr;
char ifname[IFNAMSIZ];
in_port_t first, last, to;
+ int *socks;
#define FWD_DUAL_STACK_ANY BIT(0)
#define FWD_WEAK BIT(1)
#define FWD_SCAN BIT(2)
@@ -65,6 +67,13 @@ enum fwd_ports_mode {
#define PORT_BITMAP_SIZE DIV_ROUND_UP(NUM_PORTS, 8)
+/* Maximum number of listening sockets (per pif & protocol)
+ *
+ * Rationale: This lets us listen on every port for two addresses (which we need
+ * for -T auto without SO_BINDTODEVICE), plus a comfortable number of extras.
+ */
+#define MAX_LISTEN_SOCKS (NUM_PORTS * 3)
+
/**
* fwd_ports() - Describes port forwarding for one protocol and direction
* @mode: Overall forwarding mode (all, none, auto, specific ports)
@@ -74,6 +83,8 @@ enum fwd_ports_mode {
* @rules: Array of forwarding rules
* @map: Bitmap describing which ports are forwarded
* @delta: Offset between the original destination and mapped port number
+ * @listen_sock_count: Number of entries used in @listen_socks
+ * @listen_socks: Listening sockets for forwarding
*/
struct fwd_ports {
enum fwd_ports_mode mode;
@@ -83,6 +94,8 @@ struct fwd_ports {
struct fwd_rule rules[MAX_FWD_RULES];
uint8_t map[PORT_BITMAP_SIZE];
in_port_t delta[NUM_PORTS];
+ unsigned listen_sock_count;
+ int listen_socks[MAX_LISTEN_SOCKS];
};
#define FWD_PORT_SCAN_INTERVAL 1000 /* ms */
--
2.52.0
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v3 06/14] ip: Add ipproto_name() function
2026-01-08 2:29 [PATCH v3 00/14] Introduce forwarding table David Gibson
` (4 preceding siblings ...)
2026-01-08 2:29 ` [PATCH v3 05/14] fwd: Make space to store listening sockets in forward table David Gibson
@ 2026-01-08 2:29 ` David Gibson
2026-01-08 13:22 ` Laurent Vivier
2026-01-08 2:29 ` [PATCH v3 07/14] fwd, tcp, udp: Set up listening sockets based on forward table David Gibson
` (7 subsequent siblings)
13 siblings, 1 reply; 18+ messages in thread
From: David Gibson @ 2026-01-08 2:29 UTC (permalink / raw)
To: Stefano Brivio, passt-dev; +Cc: David Gibson
Add a function to get the name of an IP protocol from its number. Usually
this would be done by getprotobynumber(), but that requires access to
/etc/protocols and might allocate. We can't do either of those once we've
self-isolated.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
ip.c | 27 +++++++++++++++++++++++++++
ip.h | 2 ++
2 files changed, 29 insertions(+)
diff --git a/ip.c b/ip.c
index 9a7f4c54..f1d224bd 100644
--- a/ip.c
+++ b/ip.c
@@ -67,3 +67,30 @@ found:
*proto = nh;
return true;
}
+
+/**
+ * ipproto_name() - Get IP protocol name from number
+ * @proto: IP protocol number
+ *
+ * Return: pointer to name of protocol @proto
+ *
+ * Usually this would be done with getprotobynumber(3) but that reads
+ * /etc/protocols and might allocate, which isn't possible for us once
+ * self-isolated.
+ */
+/* cppcheck-suppress unusedFunction */
+const char *ipproto_name(uint8_t proto)
+{
+ switch (proto) {
+ case IPPROTO_ICMP:
+ return "ICMP";
+ case IPPROTO_TCP:
+ return "TCP";
+ case IPPROTO_UDP:
+ return "UDP";
+ case IPPROTO_ICMPV6:
+ return "ICMPv6";
+ default:
+ return "<unknown protocol>";
+ }
+}
diff --git a/ip.h b/ip.h
index 5830b923..42b417c5 100644
--- a/ip.h
+++ b/ip.h
@@ -116,6 +116,7 @@ static inline uint32_t ip6_get_flow_lbl(const struct ipv6hdr *ip6h)
}
bool ipv6_l4hdr(struct iov_tail *data, uint8_t *proto, size_t *dlen);
+const char *ipproto_name(uint8_t proto);
/* IPv6 link-local all-nodes multicast address, ff02::1 */
static const struct in6_addr in6addr_ll_all_nodes = {
@@ -135,4 +136,5 @@ static const struct in_addr in4addr_broadcast = { 0xffffffff };
#define IPV6_MIN_MTU 1280
#endif
+
#endif /* IP_H */
--
2.52.0
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v3 07/14] fwd, tcp, udp: Set up listening sockets based on forward table
2026-01-08 2:29 [PATCH v3 00/14] Introduce forwarding table David Gibson
` (5 preceding siblings ...)
2026-01-08 2:29 ` [PATCH v3 06/14] ip: Add ipproto_name() function David Gibson
@ 2026-01-08 2:29 ` David Gibson
2026-01-08 2:29 ` [PATCH v3 08/14] tcp, udp: Remove old auto-forwarding socket arrays David Gibson
` (6 subsequent siblings)
13 siblings, 0 replies; 18+ messages in thread
From: David Gibson @ 2026-01-08 2:29 UTC (permalink / raw)
To: Stefano Brivio, passt-dev; +Cc: David Gibson
Previously we created inbound listening sockets as we parsed the forwarding
options (-t, -u) whereas outbound listening sockets were created during
{tcp,udp}_init(). Now that we have a data structure recording the full
details of the listening options we can move all listening socket creation
to {tcp,udp}_init(). This means that errors for either direction are
detected and reported the same way.
Introduce fwd_listen_sync() which synchronizes the state of listening
sockets to the forwarding rules table, both for fixed and automatic
forwards.
This does cause a change in semantics for "exclude only" port
specifications. Previously an option like -t ~6000 wouldn't cause a
fatal error, as long as we could bind at least one port. Now, it
requires at least one port for each generated rule; that is for each
of the contiguous blocks of ports the specification resolves to. With
typical ephemeral ports settings that's one port each in 1..5999,
6001..32767 and 61000..65535.
Preserving the exact behaviour for this case would require a considerably
more complex data structure, so I'm hoping this is a sufficiently niche
case for the change to be acceptable.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
conf.c | 27 ----------
fwd.c | 167 +++++++++++++++++++++++++++++++++++++++++++++++++++++++--
fwd.h | 3 ++
ip.c | 1 -
tcp.c | 122 ++---------------------------------------
tcp.h | 1 -
udp.c | 99 +++-------------------------------
udp.h | 1 -
8 files changed, 177 insertions(+), 244 deletions(-)
diff --git a/conf.c b/conf.c
index 0bcf80d7..57693b3f 100644
--- a/conf.c
+++ b/conf.c
@@ -148,9 +148,7 @@ static void conf_ports_range_except(const struct ctx *c, char optname,
uint8_t flags)
{
unsigned delta = to - first;
- bool bound_one = false;
unsigned base, i;
- int fd;
if (first == 0) {
die("Can't forward port 0 for option '-%c %s'",
@@ -179,28 +177,6 @@ static void conf_ports_range_except(const struct ctx *c, char optname,
warn(
"Altering mapping of already mapped port number: %s", optarg);
}
-
- if (!(flags & FWD_SCAN) && optname == 't')
- fd = tcp_listen(c, PIF_HOST, addr, ifname, i);
- else if (!(flags & FWD_SCAN) && optname == 'u')
- fd = udp_listen(c, PIF_HOST, addr, ifname, i);
- else
- /* No way to check in advance for -T and -U */
- fd = 0;
-
- if (fd == -ENFILE || fd == -EMFILE) {
- die(
-"Can't open enough sockets for port specifier: %s",
- optarg);
- }
-
- if (fd >= 0) {
- bound_one = true;
- } else if (!(flags & FWD_WEAK)) {
- die(
-"Failed to bind port %u (%s) for option '-%c %s'",
- i, strerror_(-fd), optname, optarg);
- }
}
if ((optname == 'T' || optname == 'U') && c->no_bindtodevice) {
@@ -226,9 +202,6 @@ static void conf_ports_range_except(const struct ctx *c, char optname,
}
base = i - 1;
}
-
- if (!bound_one)
- die("Failed to bind any port for '-%c %s'", optname, optarg);
}
/**
diff --git a/fwd.c b/fwd.c
index f27a4220..70ef73a3 100644
--- a/fwd.c
+++ b/fwd.c
@@ -22,6 +22,7 @@
#include <stdio.h>
#include "util.h"
+#include "epoll_ctl.h"
#include "ip.h"
#include "siphash.h"
#include "inany.h"
@@ -420,6 +421,160 @@ void fwd_rules_print(const struct fwd_ports *fwd)
}
}
+/** fwd_sync_one() - Create or remove listening sockets for a forward entry
+ * @c: Execution context
+ * @rule: Forwarding rule
+ * @pif: Interface to create listening sockets for
+ * @proto: Protocol to listen for
+ * @scanmap: Bitmap of ports to listen for on FWD_SCAN entries
+ *
+ * Return: 0 on success, -1 on failure
+ */
+static int fwd_sync_one(const struct ctx *c, const struct fwd_rule *rule,
+ uint8_t pif, uint8_t proto, const uint8_t *scanmap)
+{
+ const union inany_addr *addr = fwd_rule_addr(rule);
+ const char *ifname = rule->ifname;
+ bool bound_one = false;
+ unsigned port;
+
+ ASSERT(pif_is_socket(pif));
+
+ if (!*ifname)
+ ifname = NULL;
+
+ for (port = rule->first; port <= rule->last; port++) {
+ int fd = rule->socks[port - rule->first];
+
+ if ((rule->flags & FWD_SCAN) && !bitmap_isset(scanmap, port)) {
+ /* We don't want to listen on this port */
+ if (fd >= 0) {
+ /* We already are, so stop */
+ epoll_del(c->epollfd, fd);
+ close(fd);
+ rule->socks[port - rule->first] = -1;
+ }
+ continue;
+ }
+
+ if (fd >= 0) /* Already listening, nothing to do */ {
+ bound_one = true;
+ continue;
+ }
+
+ if (proto == IPPROTO_TCP)
+ fd = tcp_listen(c, pif, addr, ifname, port);
+ else if (proto == IPPROTO_UDP)
+ fd = udp_listen(c, pif, addr, ifname, port);
+ else
+ ASSERT(0);
+
+ if (fd < 0) {
+ char astr[INANY_ADDRSTRLEN] = "";
+
+ if (addr)
+ inany_ntop(addr, astr, sizeof(astr));
+
+ warn("Listen failed for %s %s port %s%s%s%s%u: %s",
+ pif_name(pif), ipproto_name(proto),
+ astr, ifname ? "%" : "", ifname ? ifname : "",
+ addr || ifname ? "/" : "", port, strerror_(-fd));
+
+ if (!(rule->flags & FWD_WEAK))
+ return -1;
+
+ continue;
+ }
+
+ rule->socks[port - rule->first] = fd;
+ bound_one = true;
+ }
+
+ if (!bound_one && !(rule->flags & FWD_SCAN)) {
+ char astr[INANY_ADDRSTRLEN] = "";
+
+ if (addr)
+ inany_ntop(addr, astr, sizeof(astr));
+
+ warn("All listens failed for %s %s %s%s%s%s%u-%u",
+ pif_name(pif), ipproto_name(proto),
+ astr, ifname ? "%" : "", ifname ? ifname : "",
+ addr || ifname ? "/" : "", rule->first, rule->last);
+ return -1;
+ }
+
+ return 0;
+}
+
+/** struct fwd_listen_args - arguments for fwd_listen_init_()
+ * @c: Execution context
+ * @fwd: Forwarding information
+ * @scanmap: Bitmap of ports to auto-forward
+ * @pif: Interface to create listening sockets for
+ * @proto: Protocol
+ * @ret: Return code
+ */
+struct fwd_listen_args {
+ const struct ctx *c;
+ const struct fwd_ports *fwd;
+ const uint8_t *scanmap;
+ uint8_t pif;
+ uint8_t proto;
+ int ret;
+};
+
+/** fwd_listen_sync_() - Update listening sockets to match forwards
+ * @arg: struct fwd_listen_args with arguments
+ *
+ * Returns: zero
+ */
+static int fwd_listen_sync_(void *arg)
+{
+ struct fwd_listen_args *a = arg;
+ unsigned i;
+
+ if (a->pif == PIF_SPLICE)
+ ns_enter(a->c);
+
+ for (i = 0; i < a->fwd->count; i++) {
+ a->ret = fwd_sync_one(a->c, &a->fwd->rules[i],
+ a->pif, a->proto, a->fwd->map);
+ if (a->ret < 0)
+ break;
+ }
+
+ return 0;
+}
+
+/** fwd_listen_sync() - Update listening sockets to match forwards
+ * @c: Execution context
+ * @fwd: Forwarding information
+ * @pif: Interface to create listening sockets for
+ * @proto: Protocol
+ *
+ * Return: 0 on success, -1 on failure
+ */
+int fwd_listen_sync(const struct ctx *c, const struct fwd_ports *fwd,
+ uint8_t pif, uint8_t proto)
+{
+ struct fwd_listen_args a = {
+ .c = c, .fwd = fwd, .pif = pif, .proto = proto,
+ };
+
+ if (pif == PIF_SPLICE)
+ NS_CALL(fwd_listen_sync_, &a);
+ else
+ fwd_listen_sync_(&a);
+
+ if (a.ret < 0) {
+ err("Couldn't listen on requested %s ports",
+ ipproto_name(proto));
+ return -1;
+ }
+
+ return 0;
+}
+
/* See enum in kernel's include/net/tcp_states.h */
#define UDP_LISTEN 0x07
#define TCP_LISTEN 0x0a
@@ -578,10 +733,14 @@ void fwd_scan_ports_timer(struct ctx *c, const struct timespec *now)
fwd_scan_ports(c);
- if (!c->no_tcp)
- tcp_port_rebind_all(c);
- if (!c->no_udp)
- udp_port_rebind_all(c);
+ if (!c->no_tcp) {
+ fwd_listen_sync(c, &c->tcp.fwd_in, PIF_HOST, IPPROTO_TCP);
+ fwd_listen_sync(c, &c->tcp.fwd_out, PIF_SPLICE, IPPROTO_TCP);
+ }
+ if (!c->no_udp) {
+ fwd_listen_sync(c, &c->udp.fwd_in, PIF_HOST, IPPROTO_UDP);
+ fwd_listen_sync(c, &c->udp.fwd_out, PIF_SPLICE, IPPROTO_UDP);
+ }
}
/**
diff --git a/fwd.h b/fwd.h
index 3ddcb91d..f84e7c01 100644
--- a/fwd.h
+++ b/fwd.h
@@ -108,6 +108,9 @@ void fwd_rules_print(const struct fwd_ports *fwd);
void fwd_scan_ports_init(struct ctx *c);
void fwd_scan_ports_timer(struct ctx * c, const struct timespec *now);
+int fwd_listen_sync(const struct ctx *c, const struct fwd_ports *fwd,
+ uint8_t pif, uint8_t proto);
+
bool nat_inbound(const struct ctx *c, const union inany_addr *addr,
union inany_addr *translated);
uint8_t fwd_nat_from_tap(const struct ctx *c, uint8_t proto,
diff --git a/ip.c b/ip.c
index f1d224bd..fc26dab2 100644
--- a/ip.c
+++ b/ip.c
@@ -78,7 +78,6 @@ found:
* /etc/protocols and might allocate, which isn't possible for us once
* self-isolated.
*/
-/* cppcheck-suppress unusedFunction */
const char *ipproto_name(uint8_t proto)
{
switch (proto) {
diff --git a/tcp.c b/tcp.c
index 57faed4b..976f0ab7 100644
--- a/tcp.c
+++ b/tcp.c
@@ -2732,50 +2732,6 @@ int tcp_listen(const struct ctx *c, uint8_t pif,
return s;
}
-/**
- * tcp_ns_listen() - Init socket to listen for spliced outbound connections
- * @c: Execution context
- * @port: Port, host order
- */
-static void tcp_ns_listen(const struct ctx *c, in_port_t port)
-{
- ASSERT(!c->no_tcp);
-
- if (!c->no_bindtodevice) {
- tcp_listen(c, PIF_SPLICE, NULL, "lo", port);
- return;
- }
-
- if (c->ifi4)
- tcp_listen(c, PIF_SPLICE, &inany_loopback4, NULL, port);
- if (c->ifi6)
- tcp_listen(c, PIF_SPLICE, &inany_loopback6, NULL, port);
-}
-
-/**
- * tcp_ns_socks_init() - Bind sockets in namespace for outbound connections
- * @arg: Execution context
- *
- * Return: 0
- */
-/* cppcheck-suppress [constParameterCallback, unmatchedSuppression] */
-static int tcp_ns_socks_init(void *arg)
-{
- const struct ctx *c = (const struct ctx *)arg;
- unsigned port;
-
- ns_enter(c);
-
- for (port = 0; port < NUM_PORTS; port++) {
- if (!bitmap_isset(c->tcp.fwd_out.map, port))
- continue;
-
- tcp_ns_listen(c, port);
- }
-
- return 0;
-}
-
/**
* tcp_sock_refill_pool() - Refill one pool of pre-opened sockets
* @pool: Pool of sockets to refill
@@ -2919,10 +2875,13 @@ int tcp_init(struct ctx *c)
tcp_sock_refill_init(c);
+ if (fwd_listen_sync(c, &c->tcp.fwd_in, PIF_HOST, IPPROTO_TCP) < 0)
+ return -1;
if (c->mode == MODE_PASTA) {
tcp_splice_init(c);
-
- NS_CALL(tcp_ns_socks_init, c);
+ if (fwd_listen_sync(c, &c->tcp.fwd_out,
+ PIF_SPLICE, IPPROTO_TCP) < 0)
+ return -1;
}
peek_offset_cap = (!c->ifi4 || tcp_probe_peek_offset_cap(AF_INET)) &&
@@ -2941,77 +2900,6 @@ int tcp_init(struct ctx *c)
return 0;
}
-/**
- * tcp_port_rebind() - Rebind ports to match forward maps
- * @c: Execution context
- * @outbound: True to remap outbound forwards, otherwise inbound
- *
- * Must be called in namespace context if @outbound is true.
- */
-static void tcp_port_rebind(struct ctx *c, bool outbound)
-{
- const uint8_t *fmap = outbound ? c->tcp.fwd_out.map : c->tcp.fwd_in.map;
- int (*socks)[IP_VERSIONS] = outbound ? tcp_sock_ns : tcp_sock_init_ext;
- unsigned port;
-
- for (port = 0; port < NUM_PORTS; port++) {
- if (!bitmap_isset(fmap, port)) {
- if (socks[port][V4] >= 0) {
- close(socks[port][V4]);
- socks[port][V4] = -1;
- }
-
- if (socks[port][V6] >= 0) {
- close(socks[port][V6]);
- socks[port][V6] = -1;
- }
-
- continue;
- }
-
- if ((c->ifi4 && socks[port][V4] == -1) ||
- (c->ifi6 && socks[port][V6] == -1)) {
- if (outbound)
- tcp_ns_listen(c, port);
- else
- tcp_listen(c, PIF_HOST, NULL, NULL, port);
- }
- }
-}
-
-/**
- * tcp_port_rebind_outbound() - Rebind ports in namespace
- * @arg: Execution context
- *
- * Called with NS_CALL()
- *
- * Return: 0
- */
-static int tcp_port_rebind_outbound(void *arg)
-{
- struct ctx *c = (struct ctx *)arg;
-
- ns_enter(c);
- tcp_port_rebind(c, true);
-
- return 0;
-}
-
-/**
- * tcp_port_rebind_all() - Rebind ports to match forward maps (in host & ns)
- * @c: Execution context
- */
-void tcp_port_rebind_all(struct ctx *c)
-{
- ASSERT(c->mode == MODE_PASTA && !c->no_tcp);
-
- if (c->tcp.fwd_out.mode == FWD_AUTO)
- NS_CALL(tcp_port_rebind_outbound, c);
-
- if (c->tcp.fwd_in.mode == FWD_AUTO)
- tcp_port_rebind(c, false);
-}
-
/**
* tcp_timer() - Periodic tasks: port detection, closed connections, pool refill
* @c: Execution context
diff --git a/tcp.h b/tcp.h
index ef1e3544..45f97d93 100644
--- a/tcp.h
+++ b/tcp.h
@@ -22,7 +22,6 @@ int tcp_listen(const struct ctx *c, uint8_t pif,
const union inany_addr *addr, const char *ifname,
in_port_t port);
int tcp_init(struct ctx *c);
-void tcp_port_rebind_all(struct ctx *c);
void tcp_timer(const struct ctx *c, const struct timespec *now);
void tcp_defer_handler(struct ctx *c);
diff --git a/udp.c b/udp.c
index d7dcb1d2..7c5546df 100644
--- a/udp.c
+++ b/udp.c
@@ -1203,98 +1203,6 @@ static void udp_splice_iov_init(void)
}
}
-/**
- * udp_ns_listen() - Init socket to listen for spliced outbound connections
- * @c: Execution context
- * @port: Port, host order
- */
-static void udp_ns_listen(const struct ctx *c, in_port_t port)
-{
- ASSERT(!c->no_udp);
-
- if (!c->no_bindtodevice) {
- udp_listen(c, PIF_SPLICE, NULL, "lo", port);
- return;
- }
-
- if (c->ifi4)
- udp_listen(c, PIF_SPLICE, &inany_loopback4, NULL, port);
- if (c->ifi6)
- udp_listen(c, PIF_SPLICE, &inany_loopback6, NULL, port);
-}
-
-/**
- * udp_port_rebind() - Rebind ports to match forward maps
- * @c: Execution context
- * @outbound: True to remap outbound forwards, otherwise inbound
- *
- * Must be called in namespace context if @outbound is true.
- */
-static void udp_port_rebind(struct ctx *c, bool outbound)
-{
- int (*socks)[NUM_PORTS] = outbound ? udp_splice_ns : udp_splice_init;
- const uint8_t *fmap
- = outbound ? c->udp.fwd_out.map : c->udp.fwd_in.map;
- unsigned port;
-
- for (port = 0; port < NUM_PORTS; port++) {
- if (!bitmap_isset(fmap, port)) {
- if (socks[V4][port] >= 0) {
- close(socks[V4][port]);
- socks[V4][port] = -1;
- }
-
- if (socks[V6][port] >= 0) {
- close(socks[V6][port]);
- socks[V6][port] = -1;
- }
-
- continue;
- }
-
- if ((c->ifi4 && socks[V4][port] == -1) ||
- (c->ifi6 && socks[V6][port] == -1)) {
- if (outbound)
- udp_ns_listen(c, port);
- else
- udp_listen(c, PIF_HOST, NULL, NULL, port);
- }
- }
-}
-
-/**
- * udp_port_rebind_outbound() - Rebind ports in namespace
- * @arg: Execution context
- *
- * Called with NS_CALL()
- *
- * Return: 0
- */
-static int udp_port_rebind_outbound(void *arg)
-{
- struct ctx *c = (struct ctx *)arg;
-
- ns_enter(c);
- udp_port_rebind(c, true);
-
- return 0;
-}
-
-/**
- * udp_port_rebind_all() - Rebind ports to match forward maps (in host & ns)
- * @c: Execution context
- */
-void udp_port_rebind_all(struct ctx *c)
-{
- ASSERT(c->mode == MODE_PASTA && !c->no_udp);
-
- if (c->udp.fwd_out.mode == FWD_AUTO)
- NS_CALL(udp_port_rebind_outbound, c);
-
- if (c->udp.fwd_in.mode == FWD_AUTO)
- udp_port_rebind(c, false);
-}
-
/**
* udp_init() - Initialise per-socket data, and sockets in namespace
* @c: Execution context
@@ -1307,9 +1215,14 @@ int udp_init(struct ctx *c)
udp_iov_init(c);
+ if (fwd_listen_sync(c, &c->udp.fwd_in, PIF_HOST, IPPROTO_UDP) < 0)
+ return -1;
+
if (c->mode == MODE_PASTA) {
udp_splice_iov_init();
- NS_CALL(udp_port_rebind_outbound, c);
+ if (fwd_listen_sync(c, &c->udp.fwd_out,
+ PIF_SPLICE, IPPROTO_UDP) < 0)
+ return -1;
}
return 0;
diff --git a/udp.h b/udp.h
index 94c698e2..73efe036 100644
--- a/udp.h
+++ b/udp.h
@@ -19,7 +19,6 @@ int udp_listen(const struct ctx *c, uint8_t pif,
const union inany_addr *addr, const char *ifname,
in_port_t port);
int udp_init(struct ctx *c);
-void udp_port_rebind_all(struct ctx *c);
void udp_update_l2_buf(const unsigned char *eth_d);
/**
--
2.52.0
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v3 08/14] tcp, udp: Remove old auto-forwarding socket arrays
2026-01-08 2:29 [PATCH v3 00/14] Introduce forwarding table David Gibson
` (6 preceding siblings ...)
2026-01-08 2:29 ` [PATCH v3 07/14] fwd, tcp, udp: Set up listening sockets based on forward table David Gibson
@ 2026-01-08 2:29 ` David Gibson
2026-01-08 2:29 ` [PATCH v3 09/14] conf, fwd: Check forwarding table for conflicting rules David Gibson
` (5 subsequent siblings)
13 siblings, 0 replies; 18+ messages in thread
From: David Gibson @ 2026-01-08 2:29 UTC (permalink / raw)
To: Stefano Brivio, passt-dev; +Cc: David Gibson
Now that we've moved listening socket management to the new forwarding
table data structure, the existing arrays of socket fds are maintained,
but never consulted. Remove them.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
conf.c | 1 -
tcp.c | 24 ------------------------
udp.c | 30 ------------------------------
udp.h | 1 -
4 files changed, 56 deletions(-)
diff --git a/conf.c b/conf.c
index 57693b3f..725cf88b 100644
--- a/conf.c
+++ b/conf.c
@@ -2063,7 +2063,6 @@ void conf(struct ctx *c, int argc, char **argv)
* settings
*/
fwd_probe_ephemeral();
- udp_portmap_clear();
optind = 0;
do {
name = getopt_long(argc, argv, optstring, options, NULL);
diff --git a/tcp.c b/tcp.c
index 976f0ab7..e9b440da 100644
--- a/tcp.c
+++ b/tcp.c
@@ -414,10 +414,6 @@ static const char *tcp_flag_str[] __attribute((__unused__)) = {
"ACK_FROM_TAP_DUE", "ACK_FROM_TAP_BLOCKS", "SYN_RETRIED",
};
-/* Listening sockets, used for automatic port forwarding in pasta mode only */
-static int tcp_sock_init_ext [NUM_PORTS][IP_VERSIONS];
-static int tcp_sock_ns [NUM_PORTS][IP_VERSIONS];
-
/* Table of our guest side addresses with very low RTT (assumed to be local to
* the host), LRU
*/
@@ -2689,8 +2685,6 @@ int tcp_listen(const struct ctx *c, uint8_t pif,
.port = port,
.pif = pif,
};
- const struct fwd_ports *fwd;
- int (*socks)[IP_VERSIONS];
int s;
ASSERT(!c->no_tcp);
@@ -2710,25 +2704,9 @@ int tcp_listen(const struct ctx *c, uint8_t pif,
return -EACCES;
}
- if (pif == PIF_HOST) {
- fwd = &c->tcp.fwd_in;
- socks = tcp_sock_init_ext;
- } else {
- ASSERT(pif == PIF_SPLICE);
- fwd = &c->tcp.fwd_out;
- socks = tcp_sock_ns;
- }
-
s = pif_sock_l4(c, EPOLL_TYPE_TCP_LISTEN, pif, addr, ifname,
port, ref.u32);
- if (fwd->mode == FWD_AUTO) {
- if (!addr || inany_v4(addr))
- socks[port][V4] = s < 0 ? -1 : s;
- if (!addr || !inany_v4(addr))
- socks[port][V6] = s < 0 ? -1 : s;
- }
-
return s;
}
@@ -2870,8 +2848,6 @@ int tcp_init(struct ctx *c)
memset(init_sock_pool4, 0xff, sizeof(init_sock_pool4));
memset(init_sock_pool6, 0xff, sizeof(init_sock_pool6));
- memset(tcp_sock_init_ext, 0xff, sizeof(tcp_sock_init_ext));
- memset(tcp_sock_ns, 0xff, sizeof(tcp_sock_ns));
tcp_sock_refill_init(c);
diff --git a/udp.c b/udp.c
index 7c5546df..92a87198 100644
--- a/udp.c
+++ b/udp.c
@@ -124,10 +124,6 @@
- sizeof(struct udphdr) \
- sizeof(struct ipv6hdr))
-/* "Spliced" sockets indexed by bound port (host order) */
-static int udp_splice_ns [IP_VERSIONS][NUM_PORTS];
-static int udp_splice_init[IP_VERSIONS][NUM_PORTS];
-
/* Static buffers */
/* UDP header and data for inbound messages */
@@ -193,19 +189,6 @@ static struct mmsghdr udp_mh_splice [UDP_MAX_FRAMES];
/* IOVs for L2 frames */
static struct iovec udp_l2_iov [UDP_MAX_FRAMES][UDP_NUM_IOVS];
-/**
- * udp_portmap_clear() - Clear UDP port map before configuration
- */
-void udp_portmap_clear(void)
-{
- unsigned i;
-
- for (i = 0; i < NUM_PORTS; i++) {
- udp_splice_ns[V4][i] = udp_splice_ns[V6][i] = -1;
- udp_splice_init[V4][i] = udp_splice_init[V6][i] = -1;
- }
-}
-
/**
* udp_update_l2_buf() - Update L2 buffers with Ethernet and IPv4 addresses
* @eth_d: Ethernet destination address, NULL if unchanged
@@ -1145,18 +1128,10 @@ int udp_listen(const struct ctx *c, uint8_t pif,
.pif = pif,
.port = port,
};
- int (*socks)[NUM_PORTS];
int s;
ASSERT(!c->no_udp);
- if (pif == PIF_HOST) {
- socks = udp_splice_init;
- } else {
- ASSERT(pif == PIF_SPLICE);
- socks = udp_splice_ns;
- }
-
if (!c->ifi4) {
if (!addr)
/* Restrict to v6 only */
@@ -1175,11 +1150,6 @@ int udp_listen(const struct ctx *c, uint8_t pif,
s = pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, pif,
addr, ifname, port, ref.u32);
- if (!addr || inany_v4(addr))
- socks[V4][port] = s < 0 ? -1 : s;
- if (!addr || !inany_v4(addr))
- socks[V6][port] = s < 0 ? -1 : s;
-
return s;
}
diff --git a/udp.h b/udp.h
index 73efe036..3c6f90a9 100644
--- a/udp.h
+++ b/udp.h
@@ -6,7 +6,6 @@
#ifndef UDP_H
#define UDP_H
-void udp_portmap_clear(void);
void udp_listen_sock_handler(const struct ctx *c, union epoll_ref ref,
uint32_t events, const struct timespec *now);
void udp_sock_handler(const struct ctx *c, union epoll_ref ref,
--
2.52.0
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v3 09/14] conf, fwd: Check forwarding table for conflicting rules
2026-01-08 2:29 [PATCH v3 00/14] Introduce forwarding table David Gibson
` (7 preceding siblings ...)
2026-01-08 2:29 ` [PATCH v3 08/14] tcp, udp: Remove old auto-forwarding socket arrays David Gibson
@ 2026-01-08 2:29 ` David Gibson
2026-01-08 2:29 ` [PATCH v3 10/14] fwd: Generate auto-forward exclusions from socket fd tables David Gibson
` (4 subsequent siblings)
13 siblings, 0 replies; 18+ messages in thread
From: David Gibson @ 2026-01-08 2:29 UTC (permalink / raw)
To: Stefano Brivio, passt-dev; +Cc: David Gibson
It's possible for a user to supply conflicting forwarding parameters, e.g.
$ pasta -t 80:8080 -t 127.0.0.1/80:8888
We give a warning in this case, but it's based on the legacy
forwarding bitmaps. This is too strict, because it will also warn on
cases that shouldn't conflict because they use different addresses,
e.g.
$ pasta -t 192.0.2.1/80:8080 127.0.0.1/80:8888
Theoretically, it's also too loose because it won't take into account
auto-scan forwarding rules. We can't hit that in practice now,
because we only ever have one auto-scan rule and nothing else, but we
want to remove that restriction in future.
Replace the bitmap based check with a check based on actually scanning
the forwarding rules for conflicts.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
conf.c | 5 -----
fwd.c | 21 ++++++++++++++++++++-
inany.c | 19 +++++++++++++++++++
inany.h | 1 +
4 files changed, 40 insertions(+), 6 deletions(-)
diff --git a/conf.c b/conf.c
index 725cf88b..e8d6d5d9 100644
--- a/conf.c
+++ b/conf.c
@@ -172,11 +172,6 @@ static void conf_ports_range_except(const struct ctx *c, char optname,
for (i = base; i <= last; i++) {
if (exclude && bitmap_isset(exclude, i))
break;
-
- if (bitmap_isset(fwd->map, i)) {
- warn(
-"Altering mapping of already mapped port number: %s", optarg);
- }
}
if ((optname == 'T' || optname == 'U') && c->no_bindtodevice) {
diff --git a/fwd.c b/fwd.c
index 70ef73a3..5208155b 100644
--- a/fwd.c
+++ b/fwd.c
@@ -348,7 +348,7 @@ void fwd_rule_add(struct fwd_ports *fwd, uint8_t flags,
const uint8_t allowed_flags = FWD_WEAK | FWD_SCAN;
unsigned num = (unsigned)last - first + 1;
struct fwd_rule *new;
- unsigned port;
+ unsigned i, port;
ASSERT(!(flags & ~allowed_flags));
@@ -357,6 +357,25 @@ void fwd_rule_add(struct fwd_ports *fwd, uint8_t flags,
if ((fwd->listen_sock_count + num) > ARRAY_SIZE(fwd->listen_socks))
die("Too many listening sockets");
+ /* Check for any conflicting entries */
+ for (i = 0; i < fwd->count; i++) {
+ char newstr[INANY_ADDRSTRLEN], rulestr[INANY_ADDRSTRLEN];
+ struct fwd_rule *rule = &fwd->rules[i];
+
+ if (!inany_matches(addr, fwd_rule_addr(rule)))
+ /* Non-conflicting addresses */
+ continue;
+
+ if (last < rule->first || rule->last < first)
+ /* Port ranges don't overlap */
+ continue;
+
+ die("Forwarding configuration conflict: %s/%u-%u versus %s/%u-%u",
+ inany_ntop(addr, newstr, sizeof(newstr)), first, last,
+ inany_ntop(fwd_rule_addr(rule), rulestr, sizeof(rulestr)),
+ rule->first, rule->last);
+ }
+
new = &fwd->rules[fwd->count++];
new->flags = flags;
diff --git a/inany.c b/inany.c
index 87a4d8b6..a8c44237 100644
--- a/inany.c
+++ b/inany.c
@@ -21,6 +21,25 @@
const union inany_addr inany_loopback4 = INANY_INIT4(IN4ADDR_LOOPBACK_INIT);
const union inany_addr inany_any4 = INANY_INIT4(IN4ADDR_ANY_INIT);
+/** inany_matches - Do two addresses match
+ * @a, @b: IPv[46] addresses (NULL for 0.0.0.0 & ::)
+ *
+ * Return: true if they match, false otherwise
+ *
+ * Addresses match themselves, but also with unspecified addresses of the same
+ * family.
+ */
+bool inany_matches(const union inany_addr *a, const union inany_addr *b)
+{
+ if (!a || !b)
+ return true;
+
+ if (inany_is_unspecified(a) || inany_is_unspecified(b))
+ return !!inany_v4(a) == !!inany_v4(b);
+
+ return inany_equals(a, b);
+}
+
/** inany_ntop - Convert an IPv[46] address to text format
* @src: IPv[46] address (NULL for unspecified)
* @dst: output buffer, minimum INANY_ADDRSTRLEN bytes
diff --git a/inany.h b/inany.h
index 61b36fb4..b02c2891 100644
--- a/inany.h
+++ b/inany.h
@@ -293,6 +293,7 @@ static inline void inany_siphash_feed(struct siphash_state *state,
#define INANY_ADDRSTRLEN MAX(INET_ADDRSTRLEN, INET6_ADDRSTRLEN)
+bool inany_matches(const union inany_addr *a, const union inany_addr *b);
const char *inany_ntop(const union inany_addr *src, char *dst, socklen_t size);
int inany_pton(const char *src, union inany_addr *dst);
--
2.52.0
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v3 10/14] fwd: Generate auto-forward exclusions from socket fd tables
2026-01-08 2:29 [PATCH v3 00/14] Introduce forwarding table David Gibson
` (8 preceding siblings ...)
2026-01-08 2:29 ` [PATCH v3 09/14] conf, fwd: Check forwarding table for conflicting rules David Gibson
@ 2026-01-08 2:29 ` David Gibson
2026-01-08 2:29 ` [PATCH v3 11/14] flow, fwd: Consult rules table when forwarding a new flow from socket David Gibson
` (3 subsequent siblings)
13 siblings, 0 replies; 18+ messages in thread
From: David Gibson @ 2026-01-08 2:29 UTC (permalink / raw)
To: Stefano Brivio, passt-dev; +Cc: David Gibson
When auto-forwarding based on port scans, we must exclude our own
listening ports, to avoid circular forwards. Currently we use the
(previous value of the) forwarding bitmaps for the reverse direction
to determine that.
Instead, generate it from the tables of listening sockets that we now
maintain. For now this seems like a lot more work to get to the same
place. However, it does mean we're basing our exclusions directly on the
relevant information: which of the scanned listens belong to us. More
importantly, it's a step towards removing the bitmaps entirely.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
fwd.c | 30 ++++++++++++++++++++++++++----
1 file changed, 26 insertions(+), 4 deletions(-)
diff --git a/fwd.c b/fwd.c
index 5208155b..3f088fd2 100644
--- a/fwd.c
+++ b/fwd.c
@@ -680,6 +680,28 @@ static void fwd_scan_ports_udp(struct fwd_ports *fwd,
bitmap_and_not(fwd->map, PORT_BITMAP_SIZE, fwd->map, exclude);
}
+/**
+ * current_listen_map() - Get bitmap of which ports we're already listening on
+ * @map: Bitmap to populate
+ * @fwd: Forwarding table to consider
+ */
+static void current_listen_map(uint8_t *map, const struct fwd_ports *fwd)
+{
+ unsigned i;
+
+ memset(map, 0, PORT_BITMAP_SIZE);
+
+ for (i = 0; i < fwd->count; i++) {
+ const struct fwd_rule *rule = &fwd->rules[i];
+ unsigned port;
+
+ for (port = rule->first; port <= rule->last; port++) {
+ if (rule->socks[port - rule->first] >= 0)
+ bitmap_set(map, port);
+ }
+ }
+}
+
/**
* fwd_scan_ports() - Scan automatic port forwarding information
* @c: Execution context
@@ -689,10 +711,10 @@ static void fwd_scan_ports(struct ctx *c)
uint8_t excl_tcp_out[PORT_BITMAP_SIZE], excl_udp_out[PORT_BITMAP_SIZE];
uint8_t excl_tcp_in[PORT_BITMAP_SIZE], excl_udp_in[PORT_BITMAP_SIZE];
- memcpy(excl_tcp_out, c->tcp.fwd_in.map, sizeof(excl_tcp_out));
- memcpy(excl_tcp_in, c->tcp.fwd_out.map, sizeof(excl_tcp_in));
- memcpy(excl_udp_out, c->udp.fwd_in.map, sizeof(excl_udp_out));
- memcpy(excl_udp_in, c->udp.fwd_out.map, sizeof(excl_udp_in));
+ current_listen_map(excl_tcp_out, &c->tcp.fwd_in);
+ current_listen_map(excl_tcp_in, &c->tcp.fwd_out);
+ current_listen_map(excl_udp_out, &c->udp.fwd_in);
+ current_listen_map(excl_udp_in, &c->udp.fwd_out);
fwd_scan_ports_tcp(&c->tcp.fwd_out, excl_tcp_out);
fwd_scan_ports_tcp(&c->tcp.fwd_in, excl_tcp_in);
--
2.52.0
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v3 11/14] flow, fwd: Consult rules table when forwarding a new flow from socket
2026-01-08 2:29 [PATCH v3 00/14] Introduce forwarding table David Gibson
` (9 preceding siblings ...)
2026-01-08 2:29 ` [PATCH v3 10/14] fwd: Generate auto-forward exclusions from socket fd tables David Gibson
@ 2026-01-08 2:29 ` David Gibson
2026-01-08 2:29 ` [PATCH v3 12/14] fwd: Remap ports based directly on forwarding rule David Gibson
` (2 subsequent siblings)
13 siblings, 0 replies; 18+ messages in thread
From: David Gibson @ 2026-01-08 2:29 UTC (permalink / raw)
To: Stefano Brivio, passt-dev; +Cc: David Gibson
We now have a formal array of forwarding rules. However, we don't actually
consult it when we forward a new flow. Instead we rely on (a) implicit
information (we wouldn't be here if there wasn't a listening socket for the
rule) and (b) the legacy delta[] data structure.
Start addressing this, by searching for a matching forwarding rule when
attempting to forward a new flow. For now this is incomplete:
* We only do this for socket-initiated flows
* We make a potentially costly linear lookup
* We don't actually use the matching rule for anything yet
We'll address each of those in later patches.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
flow.c | 43 ++++++++++++++++++++++++++++++++++---------
fwd.c | 33 +++++++++++++++++++++++++++++++++
fwd.h | 2 ++
3 files changed, 69 insertions(+), 9 deletions(-)
diff --git a/flow.c b/flow.c
index 4f534865..045e9712 100644
--- a/flow.c
+++ b/flow.c
@@ -489,7 +489,7 @@ struct flowside *flow_initiate_sa(union flow *flow, uint8_t pif,
struct flowside *flow_target(const struct ctx *c, union flow *flow,
uint8_t proto)
{
- char estr[INANY_ADDRSTRLEN], fstr[INANY_ADDRSTRLEN];
+ char estr[INANY_ADDRSTRLEN], ostr[INANY_ADDRSTRLEN];
struct flow_common *f = &flow->f;
const struct flowside *ini = &f->side[INISIDE];
struct flowside *tgt = &f->side[TGTSIDE];
@@ -500,6 +500,30 @@ struct flowside *flow_target(const struct ctx *c, union flow *flow,
ASSERT(f->pif[INISIDE] != PIF_NONE && f->pif[TGTSIDE] == PIF_NONE);
ASSERT(flow->f.state == FLOW_STATE_INI);
+ if (pif_is_socket(f->pif[INISIDE])) {
+ const struct fwd_ports *fwd;
+
+ if (f->pif[INISIDE] == PIF_HOST && proto == IPPROTO_TCP)
+ fwd = &c->tcp.fwd_in;
+ else if (f->pif[INISIDE] == PIF_HOST && proto == IPPROTO_UDP)
+ fwd = &c->udp.fwd_in;
+ else if (f->pif[INISIDE] == PIF_SPLICE && proto == IPPROTO_TCP)
+ fwd = &c->tcp.fwd_out;
+ else if (f->pif[INISIDE] == PIF_SPLICE && proto == IPPROTO_UDP)
+ fwd = &c->udp.fwd_out;
+ else
+ goto nofwd;
+
+ if (!fwd_rule_search(fwd, ini)) {
+ /* This shouldn't happen, because if there's no rule for
+ * it we should have no listening socket that would let
+ * us get here
+ */
+ flow_dbg(flow, "Unexpected missing forward rule");
+ goto nofwd;
+ }
+ }
+
switch (f->pif[INISIDE]) {
case PIF_TAP:
memcpy(f->tap_omac, MAC_UNDEF, ETH_ALEN);
@@ -514,22 +538,23 @@ struct flowside *flow_target(const struct ctx *c, union flow *flow,
tgtpif = fwd_nat_from_host(c, proto, ini, tgt);
fwd_neigh_mac_get(c, &tgt->oaddr, f->tap_omac);
break;
-
default:
- flow_err(flow, "No rules to forward %s [%s]:%hu -> [%s]:%hu",
- pif_name(f->pif[INISIDE]),
- inany_ntop(&ini->eaddr, estr, sizeof(estr)),
- ini->eport,
- inany_ntop(&ini->oaddr, fstr, sizeof(fstr)),
- ini->oport);
+ goto nofwd;
}
if (tgtpif == PIF_NONE)
- return NULL;
+ goto nofwd;
f->pif[TGTSIDE] = tgtpif;
flow_set_state(f, FLOW_STATE_TGT);
return tgt;
+
+nofwd:
+ flow_err(flow, "No rules to forward %s %s [%s]:%hu -> [%s]:%hu",
+ pif_name(f->pif[INISIDE]), ipproto_name(proto),
+ inany_ntop(&ini->eaddr, estr, sizeof(estr)), ini->eport,
+ inany_ntop(&ini->oaddr, ostr, sizeof(ostr)), ini->oport);
+ return NULL;
}
/**
diff --git a/fwd.c b/fwd.c
index 3f088fd2..633ba5db 100644
--- a/fwd.c
+++ b/fwd.c
@@ -409,6 +409,39 @@ void fwd_rule_add(struct fwd_ports *fwd, uint8_t flags,
}
}
+/**
+ * fwd_rule_match() - Does a prospective flow match a given forwarding rule
+ * @rule: Forwarding rule
+ * @ini: Initiating side flow information
+ *
+ * Returns: true if the rule applies to the flow, false otherwise
+ */
+static bool fwd_rule_match(const struct fwd_rule *rule,
+ const struct flowside *ini)
+{
+ return inany_matches(&ini->oaddr, fwd_rule_addr(rule)) &&
+ ini->oport >= rule->first && ini->oport <= rule->last;
+}
+
+/**
+ * fwd_rule_search() - Find a rule which matches a prospective flow
+ * @fwd: Forwarding table
+ * @ini: Initiating side flow information
+ *
+ * Returns: first matching rule, or NULL if there is none
+ */
+const struct fwd_rule *fwd_rule_search(const struct fwd_ports *fwd,
+ const struct flowside *ini)
+{
+ unsigned i;
+
+ for (i = 0; i < fwd->count; i++) {
+ if (fwd_rule_match(&fwd->rules[i], ini))
+ return &fwd->rules[i];
+ }
+ return NULL;
+}
+
/**
* fwd_rules_print() - Print forwarding rules for debugging
* @fwd: Table to print
diff --git a/fwd.h b/fwd.h
index f84e7c01..a10bdbb4 100644
--- a/fwd.h
+++ b/fwd.h
@@ -103,6 +103,8 @@ struct fwd_ports {
void fwd_rule_add(struct fwd_ports *fwd, uint8_t flags,
const union inany_addr *addr, const char *ifname,
in_port_t first, in_port_t last, in_port_t to);
+const struct fwd_rule *fwd_rule_search(const struct fwd_ports *fwd,
+ const struct flowside *ini);
void fwd_rules_print(const struct fwd_ports *fwd);
void fwd_scan_ports_init(struct ctx *c);
--
2.52.0
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v3 12/14] fwd: Remap ports based directly on forwarding rule
2026-01-08 2:29 [PATCH v3 00/14] Introduce forwarding table David Gibson
` (10 preceding siblings ...)
2026-01-08 2:29 ` [PATCH v3 11/14] flow, fwd: Consult rules table when forwarding a new flow from socket David Gibson
@ 2026-01-08 2:29 ` David Gibson
2026-01-08 2:29 ` [PATCH v3 13/14] fwd, tcp, udp: Add forwarding rule to listening socket epoll references David Gibson
2026-01-08 2:29 ` [PATCH v3 14/14] flow, fwd: Optimise forwarding rule lookup using epoll ref when possible David Gibson
13 siblings, 0 replies; 18+ messages in thread
From: David Gibson @ 2026-01-08 2:29 UTC (permalink / raw)
To: Stefano Brivio, passt-dev; +Cc: David Gibson
Currently we remap port numbers based on the legacy delta[] array, which
is indexed only by original port number, not the listening address. Now
that we look up a forwarding rule entry in flow_target(), we can use this
entry to directly determine the correct remapped port. Implement this,
and remove the old delta[] array.
Link: https://bugs.passt.top/show_bug.cgi?id=187
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
flow.c | 7 ++++---
fwd.c | 21 +++++++--------------
fwd.h | 7 +++----
3 files changed, 14 insertions(+), 21 deletions(-)
diff --git a/flow.c b/flow.c
index 045e9712..38f88732 100644
--- a/flow.c
+++ b/flow.c
@@ -493,6 +493,7 @@ struct flowside *flow_target(const struct ctx *c, union flow *flow,
struct flow_common *f = &flow->f;
const struct flowside *ini = &f->side[INISIDE];
struct flowside *tgt = &f->side[TGTSIDE];
+ const struct fwd_rule *rule = NULL;
uint8_t tgtpif = PIF_NONE;
ASSERT(flow_new_entry == flow && f->state == FLOW_STATE_INI);
@@ -514,7 +515,7 @@ struct flowside *flow_target(const struct ctx *c, union flow *flow,
else
goto nofwd;
- if (!fwd_rule_search(fwd, ini)) {
+ if (!(rule = fwd_rule_search(fwd, ini))) {
/* This shouldn't happen, because if there's no rule for
* it we should have no listening socket that would let
* us get here
@@ -531,11 +532,11 @@ struct flowside *flow_target(const struct ctx *c, union flow *flow,
break;
case PIF_SPLICE:
- tgtpif = fwd_nat_from_splice(c, proto, ini, tgt);
+ tgtpif = fwd_nat_from_splice(rule, proto, ini, tgt);
break;
case PIF_HOST:
- tgtpif = fwd_nat_from_host(c, proto, ini, tgt);
+ tgtpif = fwd_nat_from_host(c, rule, proto, ini, tgt);
fwd_neigh_mac_get(c, &tgt->oaddr, f->tap_omac);
break;
default:
diff --git a/fwd.c b/fwd.c
index 633ba5db..7c4575ff 100644
--- a/fwd.c
+++ b/fwd.c
@@ -405,7 +405,6 @@ void fwd_rule_add(struct fwd_ports *fwd, uint8_t flags,
/* Fill in the legacy forwarding data structures to match the table */
if (!(new->flags & FWD_SCAN))
bitmap_set(fwd->map, port);
- fwd->delta[port] = new->to - new->first;
}
}
@@ -978,7 +977,7 @@ uint8_t fwd_nat_from_tap(const struct ctx *c, uint8_t proto,
/**
* fwd_nat_from_splice() - Determine to forward a flow from the splice interface
- * @c: Execution context
+ * @rule: Forwarding rule to apply
* @proto: Protocol (IP L4 protocol number)
* @ini: Flow address information of the initiating side
* @tgt: Flow address information on the target side (updated)
@@ -986,7 +985,7 @@ uint8_t fwd_nat_from_tap(const struct ctx *c, uint8_t proto,
* Return: pif of the target interface to forward the flow to, PIF_NONE if the
* flow cannot or should not be forwarded at all.
*/
-uint8_t fwd_nat_from_splice(const struct ctx *c, uint8_t proto,
+uint8_t fwd_nat_from_splice(const struct fwd_rule *rule, uint8_t proto,
const struct flowside *ini, struct flowside *tgt)
{
if (!inany_is_loopback(&ini->eaddr) ||
@@ -1010,11 +1009,7 @@ uint8_t fwd_nat_from_splice(const struct ctx *c, uint8_t proto,
/* But for UDP preserve the source port */
tgt->oport = ini->eport;
- tgt->eport = ini->oport;
- if (proto == IPPROTO_TCP)
- tgt->eport += c->tcp.fwd_out.delta[tgt->eport];
- else if (proto == IPPROTO_UDP)
- tgt->eport += c->udp.fwd_out.delta[tgt->eport];
+ tgt->eport = ini->oport - rule->first + rule->to;
return PIF_HOST;
}
@@ -1058,6 +1053,7 @@ bool nat_inbound(const struct ctx *c, const union inany_addr *addr,
/**
* fwd_nat_from_host() - Determine to forward a flow from the host interface
* @c: Execution context
+ * @rule: Forwarding rule to apply
* @proto: Protocol (IP L4 protocol number)
* @ini: Flow address information of the initiating side
* @tgt: Flow address information on the target side (updated)
@@ -1065,15 +1061,12 @@ bool nat_inbound(const struct ctx *c, const union inany_addr *addr,
* Return: pif of the target interface to forward the flow to, PIF_NONE if the
* flow cannot or should not be forwarded at all.
*/
-uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto,
+uint8_t fwd_nat_from_host(const struct ctx *c,
+ const struct fwd_rule *rule, uint8_t proto,
const struct flowside *ini, struct flowside *tgt)
{
/* Common for spliced and non-spliced cases */
- tgt->eport = ini->oport;
- if (proto == IPPROTO_TCP)
- tgt->eport += c->tcp.fwd_in.delta[tgt->eport];
- else if (proto == IPPROTO_UDP)
- tgt->eport += c->udp.fwd_in.delta[tgt->eport];
+ tgt->eport = ini->oport - rule->first + rule->to;
if (!c->no_splice && inany_is_loopback(&ini->eaddr) &&
(proto == IPPROTO_TCP || proto == IPPROTO_UDP)) {
diff --git a/fwd.h b/fwd.h
index a10bdbb4..cfe9ed46 100644
--- a/fwd.h
+++ b/fwd.h
@@ -82,7 +82,6 @@ enum fwd_ports_mode {
* @count: Number of forwarding rules
* @rules: Array of forwarding rules
* @map: Bitmap describing which ports are forwarded
- * @delta: Offset between the original destination and mapped port number
* @listen_sock_count: Number of entries used in @listen_socks
* @listen_socks: Listening sockets for forwarding
*/
@@ -93,7 +92,6 @@ struct fwd_ports {
unsigned count;
struct fwd_rule rules[MAX_FWD_RULES];
uint8_t map[PORT_BITMAP_SIZE];
- in_port_t delta[NUM_PORTS];
unsigned listen_sock_count;
int listen_socks[MAX_LISTEN_SOCKS];
};
@@ -117,9 +115,10 @@ bool nat_inbound(const struct ctx *c, const union inany_addr *addr,
union inany_addr *translated);
uint8_t fwd_nat_from_tap(const struct ctx *c, uint8_t proto,
const struct flowside *ini, struct flowside *tgt);
-uint8_t fwd_nat_from_splice(const struct ctx *c, uint8_t proto,
+uint8_t fwd_nat_from_splice(const struct fwd_rule *rule, uint8_t proto,
const struct flowside *ini, struct flowside *tgt);
-uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto,
+uint8_t fwd_nat_from_host(const struct ctx *c,
+ const struct fwd_rule *rule, uint8_t proto,
const struct flowside *ini, struct flowside *tgt);
void fwd_neigh_table_update(const struct ctx *c, const union inany_addr *addr,
const uint8_t *mac, bool permanent);
--
2.52.0
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v3 13/14] fwd, tcp, udp: Add forwarding rule to listening socket epoll references
2026-01-08 2:29 [PATCH v3 00/14] Introduce forwarding table David Gibson
` (11 preceding siblings ...)
2026-01-08 2:29 ` [PATCH v3 12/14] fwd: Remap ports based directly on forwarding rule David Gibson
@ 2026-01-08 2:29 ` David Gibson
2026-01-08 2:29 ` [PATCH v3 14/14] flow, fwd: Optimise forwarding rule lookup using epoll ref when possible David Gibson
13 siblings, 0 replies; 18+ messages in thread
From: David Gibson @ 2026-01-08 2:29 UTC (permalink / raw)
To: Stefano Brivio, passt-dev; +Cc: David Gibson
Now that we have a table of all our forwarding rules, every listening
socket can be associated with a specific rule. Add an index allowing us to
locate that rule from the socket's epoll reference. We don't use it yet,
but we'll use it to optimise rule lookup when forwarding new flows.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
fwd.c | 15 ++++++++++-----
fwd.h | 5 +++++
tcp.c | 4 +++-
tcp.h | 5 ++---
udp.c | 4 +++-
udp.h | 5 ++---
6 files changed, 25 insertions(+), 13 deletions(-)
diff --git a/fwd.c b/fwd.c
index 7c4575ff..6727d26f 100644
--- a/fwd.c
+++ b/fwd.c
@@ -474,6 +474,7 @@ void fwd_rules_print(const struct fwd_ports *fwd)
/** fwd_sync_one() - Create or remove listening sockets for a forward entry
* @c: Execution context
+ * @fwd: Forwarding table
* @rule: Forwarding rule
* @pif: Interface to create listening sockets for
* @proto: Protocol to listen for
@@ -481,19 +482,23 @@ void fwd_rules_print(const struct fwd_ports *fwd)
*
* Return: 0 on success, -1 on failure
*/
-static int fwd_sync_one(const struct ctx *c, const struct fwd_rule *rule,
+static int fwd_sync_one(const struct ctx *c,
+ const struct fwd_ports *fwd, const struct fwd_rule *rule,
uint8_t pif, uint8_t proto, const uint8_t *scanmap)
{
const union inany_addr *addr = fwd_rule_addr(rule);
const char *ifname = rule->ifname;
bool bound_one = false;
- unsigned port;
+ unsigned port, idx;
ASSERT(pif_is_socket(pif));
if (!*ifname)
ifname = NULL;
+ idx = rule - fwd->rules;
+ ASSERT(idx < MAX_FWD_RULES);
+
for (port = rule->first; port <= rule->last; port++) {
int fd = rule->socks[port - rule->first];
@@ -514,9 +519,9 @@ static int fwd_sync_one(const struct ctx *c, const struct fwd_rule *rule,
}
if (proto == IPPROTO_TCP)
- fd = tcp_listen(c, pif, addr, ifname, port);
+ fd = tcp_listen(c, pif, idx, addr, ifname, port);
else if (proto == IPPROTO_UDP)
- fd = udp_listen(c, pif, addr, ifname, port);
+ fd = udp_listen(c, pif, idx, addr, ifname, port);
else
ASSERT(0);
@@ -588,7 +593,7 @@ static int fwd_listen_sync_(void *arg)
ns_enter(a->c);
for (i = 0; i < a->fwd->count; i++) {
- a->ret = fwd_sync_one(a->c, &a->fwd->rules[i],
+ a->ret = fwd_sync_one(a->c, a->fwd, &a->fwd->rules[i],
a->pif, a->proto, a->fwd->map);
if (a->ret < 0)
break;
diff --git a/fwd.h b/fwd.h
index cfe9ed46..435f422a 100644
--- a/fwd.h
+++ b/fwd.h
@@ -48,14 +48,19 @@ struct fwd_rule {
* union fwd_listen_ref - information about a single listening socket
* @port: Bound port number of the socket
* @pif: pif in which the socket is listening
+ * @rule: Index of forwarding rule
*/
union fwd_listen_ref {
struct {
in_port_t port;
uint8_t pif;
+#define FWD_RULE_BITS 8
+ unsigned rule :FWD_RULE_BITS;
};
uint32_t u32;
};
+static_assert(sizeof(union fwd_listen_ref) == sizeof(uint32_t));
+static_assert(MAX_FWD_RULES <= (1U << FWD_RULE_BITS));
enum fwd_ports_mode {
FWD_UNSET = 0,
diff --git a/tcp.c b/tcp.c
index e9b440da..fc03e38f 100644
--- a/tcp.c
+++ b/tcp.c
@@ -2672,18 +2672,20 @@ void tcp_sock_handler(const struct ctx *c, union epoll_ref ref,
* tcp_listen() - Create listening socket
* @c: Execution context
* @pif: Interface to open the socket for (PIF_HOST or PIF_SPLICE)
+ * @rule: Index of relevant forwarding rule
* @addr: Pointer to address for binding, NULL for any
* @ifname: Name of interface to bind to, NULL for any
* @port: Port, host order
*
* Return: Socket fd on success, negative error code on failure
*/
-int tcp_listen(const struct ctx *c, uint8_t pif,
+int tcp_listen(const struct ctx *c, uint8_t pif, unsigned rule,
const union inany_addr *addr, const char *ifname, in_port_t port)
{
union fwd_listen_ref ref = {
.port = port,
.pif = pif,
+ .rule = rule,
};
int s;
diff --git a/tcp.h b/tcp.h
index 45f97d93..24b90870 100644
--- a/tcp.h
+++ b/tcp.h
@@ -18,9 +18,8 @@ void tcp_sock_handler(const struct ctx *c, union epoll_ref ref,
int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
const void *saddr, const void *daddr, uint32_t flow_lbl,
const struct pool *p, int idx, const struct timespec *now);
-int tcp_listen(const struct ctx *c, uint8_t pif,
- const union inany_addr *addr, const char *ifname,
- in_port_t port);
+int tcp_listen(const struct ctx *c, uint8_t pif, unsigned rule,
+ const union inany_addr *addr, const char *ifname, in_port_t port);
int tcp_init(struct ctx *c);
void tcp_timer(const struct ctx *c, const struct timespec *now);
void tcp_defer_handler(struct ctx *c);
diff --git a/udp.c b/udp.c
index 92a87198..761221f6 100644
--- a/udp.c
+++ b/udp.c
@@ -1115,18 +1115,20 @@ int udp_tap_handler(const struct ctx *c, uint8_t pif,
* udp_listen() - Initialise listening socket for a given port
* @c: Execution context
* @pif: Interface to open the socket for (PIF_HOST or PIF_SPLICE)
+ * @rule: Index of relevant forwarding rule
* @addr: Pointer to address for binding, NULL if not configured
* @ifname: Name of interface to bind to, NULL if not configured
* @port: Port, host order
*
* Return: Socket fd on success, negative error code on failure
*/
-int udp_listen(const struct ctx *c, uint8_t pif,
+int udp_listen(const struct ctx *c, uint8_t pif, unsigned rule,
const union inany_addr *addr, const char *ifname, in_port_t port)
{
union fwd_listen_ref ref = {
.pif = pif,
.port = port,
+ .rule = rule,
};
int s;
diff --git a/udp.h b/udp.h
index 3c6f90a9..2b91d728 100644
--- a/udp.h
+++ b/udp.h
@@ -14,9 +14,8 @@ int udp_tap_handler(const struct ctx *c, uint8_t pif,
sa_family_t af, const void *saddr, const void *daddr,
uint8_t ttl, const struct pool *p, int idx,
const struct timespec *now);
-int udp_listen(const struct ctx *c, uint8_t pif,
- const union inany_addr *addr, const char *ifname,
- in_port_t port);
+int udp_listen(const struct ctx *c, uint8_t pif, unsigned rule,
+ const union inany_addr *addr, const char *ifname, in_port_t port);
int udp_init(struct ctx *c);
void udp_update_l2_buf(const unsigned char *eth_d);
--
2.52.0
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v3 14/14] flow, fwd: Optimise forwarding rule lookup using epoll ref when possible
2026-01-08 2:29 [PATCH v3 00/14] Introduce forwarding table David Gibson
` (12 preceding siblings ...)
2026-01-08 2:29 ` [PATCH v3 13/14] fwd, tcp, udp: Add forwarding rule to listening socket epoll references David Gibson
@ 2026-01-08 2:29 ` David Gibson
13 siblings, 0 replies; 18+ messages in thread
From: David Gibson @ 2026-01-08 2:29 UTC (permalink / raw)
To: Stefano Brivio, passt-dev; +Cc: David Gibson
Now that listening sockets include a reference to the forwarding rule which
created them we can, in many cases, avoid a linear search of the forwarding
table when we want to find the relevant rule. Instead we can take the
rule index from the socket's epoll reference, and use that to immediately
find the correct rule.
This is conceptually simple, but requires a moderate amount of plumbing to
get the index from the reference through to the rule lookup. We still
allow fall back to linear search if we don't have the index, and this may
(rarely) be used in the udp_flush_flow() case, where we could get packets
for one flow on a different flow's socket, rather than through a listening
socket as usual.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
flow.c | 13 +++++++++++--
flow_table.h | 2 +-
fwd.c | 3 +--
fwd.h | 1 +
icmp.c | 2 +-
tcp.c | 4 ++--
udp.c | 14 +++++++++-----
udp_flow.c | 14 ++++++++------
udp_flow.h | 2 +-
udp_internal.h | 4 ++--
10 files changed, 37 insertions(+), 22 deletions(-)
diff --git a/flow.c b/flow.c
index 38f88732..85759970 100644
--- a/flow.c
+++ b/flow.c
@@ -482,12 +482,13 @@ struct flowside *flow_initiate_sa(union flow *flow, uint8_t pif,
* flow_target() - Determine where flow should forward to, and move to TGT
* @c: Execution context
* @flow: Flow to forward
+ * @rule_hint: Index of relevant forwarding rule, or -1 if unknown
* @proto: Protocol
*
* Return: pointer to the target flowside information
*/
struct flowside *flow_target(const struct ctx *c, union flow *flow,
- uint8_t proto)
+ int rule_hint, uint8_t proto)
{
char estr[INANY_ADDRSTRLEN], ostr[INANY_ADDRSTRLEN];
struct flow_common *f = &flow->f;
@@ -515,7 +516,15 @@ struct flowside *flow_target(const struct ctx *c, union flow *flow,
else
goto nofwd;
- if (!(rule = fwd_rule_search(fwd, ini))) {
+ if (rule_hint >= 0) {
+ ASSERT((unsigned)rule_hint < fwd->count);
+ rule = &fwd->rules[rule_hint];
+ if (!fwd_rule_match(rule, ini)) {
+ flow_dbg(flow,
+ "Unexpected mismatching forward rule");
+ goto nofwd;
+ }
+ } else if (!(rule = fwd_rule_search(fwd, ini))) {
/* This shouldn't happen, because if there's no rule for
* it we should have no listening socket that would let
* us get here
diff --git a/flow_table.h b/flow_table.h
index 5ee13acc..73de13ba 100644
--- a/flow_table.h
+++ b/flow_table.h
@@ -207,7 +207,7 @@ const struct flowside *flow_target_af(union flow *flow, uint8_t pif,
const void *saddr, in_port_t sport,
const void *daddr, in_port_t dport);
struct flowside *flow_target(const struct ctx *c, union flow *flow,
- uint8_t proto);
+ int rule_hint, uint8_t proto);
union flow *flow_set_type(union flow *flow, enum flow_type type);
#define FLOW_SET_TYPE(flow_, t_, var_) (&flow_set_type((flow_), (t_))->var_)
diff --git a/fwd.c b/fwd.c
index 6727d26f..250b470c 100644
--- a/fwd.c
+++ b/fwd.c
@@ -415,8 +415,7 @@ void fwd_rule_add(struct fwd_ports *fwd, uint8_t flags,
*
* Returns: true if the rule applies to the flow, false otherwise
*/
-static bool fwd_rule_match(const struct fwd_rule *rule,
- const struct flowside *ini)
+bool fwd_rule_match(const struct fwd_rule *rule, const struct flowside *ini)
{
return inany_matches(&ini->oaddr, fwd_rule_addr(rule)) &&
ini->oport >= rule->first && ini->oport <= rule->last;
diff --git a/fwd.h b/fwd.h
index 435f422a..2af7791d 100644
--- a/fwd.h
+++ b/fwd.h
@@ -106,6 +106,7 @@ struct fwd_ports {
void fwd_rule_add(struct fwd_ports *fwd, uint8_t flags,
const union inany_addr *addr, const char *ifname,
in_port_t first, in_port_t last, in_port_t to);
+bool fwd_rule_match(const struct fwd_rule *rule, const struct flowside *ini);
const struct fwd_rule *fwd_rule_search(const struct fwd_ports *fwd,
const struct flowside *ini);
void fwd_rules_print(const struct fwd_ports *fwd);
diff --git a/icmp.c b/icmp.c
index 9564c496..0f4d48bb 100644
--- a/icmp.c
+++ b/icmp.c
@@ -183,7 +183,7 @@ static struct icmp_ping_flow *icmp_ping_new(const struct ctx *c,
return NULL;
flow_initiate_af(flow, PIF_TAP, af, saddr, id, daddr, id);
- if (!(tgt = flow_target(c, flow, proto)))
+ if (!(tgt = flow_target(c, flow, -1, proto)))
goto cancel;
if (flow->f.pif[TGTSIDE] != PIF_HOST) {
diff --git a/tcp.c b/tcp.c
index fc03e38f..89b22a51 100644
--- a/tcp.c
+++ b/tcp.c
@@ -1657,7 +1657,7 @@ static void tcp_conn_from_tap(const struct ctx *c, sa_family_t af,
ini = flow_initiate_af(flow, PIF_TAP,
af, saddr, srcport, daddr, dstport);
- if (!(tgt = flow_target(c, flow, IPPROTO_TCP)))
+ if (!(tgt = flow_target(c, flow, -1, IPPROTO_TCP)))
goto cancel;
if (flow->f.pif[TGTSIDE] != PIF_HOST) {
@@ -2496,7 +2496,7 @@ void tcp_listen_handler(const struct ctx *c, union epoll_ref ref,
goto cancel;
}
- if (!flow_target(c, flow, IPPROTO_TCP))
+ if (!flow_target(c, flow, ref.listen.rule, IPPROTO_TCP))
goto cancel;
switch (flow->f.pif[TGTSIDE]) {
diff --git a/udp.c b/udp.c
index 761221f6..d7e47e52 100644
--- a/udp.c
+++ b/udp.c
@@ -838,12 +838,13 @@ static void udp_buf_sock_to_tap(const struct ctx *c, int s, int n,
* udp_sock_fwd() - Forward datagrams from a possibly unconnected socket
* @c: Execution context
* @s: Socket to forward from
+ * @rule_hint: Forwarding rule to use, or -1 if unknown
* @frompif: Interface to which @s belongs
* @port: Our (local) port number of @s
* @now: Current timestamp
*/
-void udp_sock_fwd(const struct ctx *c, int s, uint8_t frompif,
- in_port_t port, const struct timespec *now)
+void udp_sock_fwd(const struct ctx *c, int s, int rule_hint,
+ uint8_t frompif, in_port_t port, const struct timespec *now)
{
union sockaddr_inany src;
union inany_addr dst;
@@ -868,7 +869,8 @@ void udp_sock_fwd(const struct ctx *c, int s, uint8_t frompif,
continue;
}
- tosidx = udp_flow_from_sock(c, frompif, &dst, port, &src, now);
+ tosidx = udp_flow_from_sock(c, frompif, &dst, port, &src,
+ rule_hint, now);
topif = pif_at_sidx(tosidx);
if (pif_is_socket(topif)) {
@@ -910,8 +912,10 @@ void udp_listen_sock_handler(const struct ctx *c,
union epoll_ref ref, uint32_t events,
const struct timespec *now)
{
- if (events & (EPOLLERR | EPOLLIN))
- udp_sock_fwd(c, ref.fd, ref.listen.pif, ref.listen.port, now);
+ if (events & (EPOLLERR | EPOLLIN)) {
+ udp_sock_fwd(c, ref.fd, ref.listen.rule,
+ ref.listen.pif, ref.listen.port, now);
+ }
}
/**
diff --git a/udp_flow.c b/udp_flow.c
index 8907f2f7..b82c6c06 100644
--- a/udp_flow.c
+++ b/udp_flow.c
@@ -139,6 +139,7 @@ static int udp_flow_sock(const struct ctx *c,
* udp_flow_new() - Common setup for a new UDP flow
* @c: Execution context
* @flow: Initiated flow
+ * @rule_hint: Index of forwarding rule, or -1 if unknown
* @now: Timestamp
*
* Return: sidx for the target side of the new UDP flow, or FLOW_SIDX_NONE
@@ -147,13 +148,13 @@ static int udp_flow_sock(const struct ctx *c,
* #syscalls getsockname
*/
static flow_sidx_t udp_flow_new(const struct ctx *c, union flow *flow,
- const struct timespec *now)
+ int rule_hint, const struct timespec *now)
{
struct udp_flow *uflow = NULL;
const struct flowside *tgt;
unsigned sidei;
- if (!(tgt = flow_target(c, flow, IPPROTO_UDP)))
+ if (!(tgt = flow_target(c, flow, rule_hint, IPPROTO_UDP)))
goto cancel;
uflow = FLOW_SET_TYPE(flow, FLOW_UDP, udp);
@@ -216,6 +217,7 @@ cancel:
* @dst: Our (local) address to which the datagram is arriving
* @port: Our (local) port number to which the datagram is arriving
* @s_in: Source socket address, filled in by recvmmsg()
+ * @rule_hint: Index of forwarding rule, or -1 if unknown
* @now: Timestamp
*
* #syscalls fcntl arm:fcntl64 ppc64:fcntl64|fcntl i686:fcntl64
@@ -226,7 +228,7 @@ cancel:
flow_sidx_t udp_flow_from_sock(const struct ctx *c, uint8_t pif,
const union inany_addr *dst, in_port_t port,
const union sockaddr_inany *s_in,
- const struct timespec *now)
+ int rule_hint, const struct timespec *now)
{
const struct flowside *ini;
struct udp_flow *uflow;
@@ -260,7 +262,7 @@ flow_sidx_t udp_flow_from_sock(const struct ctx *c, uint8_t pif,
return FLOW_SIDX_NONE;
}
- return udp_flow_new(c, flow, now);
+ return udp_flow_new(c, flow, rule_hint, now);
}
/**
@@ -316,7 +318,7 @@ flow_sidx_t udp_flow_from_tap(const struct ctx *c,
return FLOW_SIDX_NONE;
}
- return udp_flow_new(c, flow, now);
+ return udp_flow_new(c, flow, -1, now);
}
/**
@@ -332,7 +334,7 @@ static void udp_flush_flow(const struct ctx *c,
{
/* We don't know exactly where the datagrams will come from, but we know
* they'll have an interface and oport matching this flow */
- udp_sock_fwd(c, uflow->s[sidei], uflow->f.pif[sidei],
+ udp_sock_fwd(c, -1, uflow->s[sidei], uflow->f.pif[sidei],
uflow->f.side[sidei].oport, now);
}
diff --git a/udp_flow.h b/udp_flow.h
index 4c528e95..14e0f920 100644
--- a/udp_flow.h
+++ b/udp_flow.h
@@ -35,7 +35,7 @@ struct udp_flow *udp_at_sidx(flow_sidx_t sidx);
flow_sidx_t udp_flow_from_sock(const struct ctx *c, uint8_t pif,
const union inany_addr *dst, in_port_t port,
const union sockaddr_inany *s_in,
- const struct timespec *now);
+ int rule_hint, const struct timespec *now);
flow_sidx_t udp_flow_from_tap(const struct ctx *c,
uint8_t pif, sa_family_t af,
const void *saddr, const void *daddr,
diff --git a/udp_internal.h b/udp_internal.h
index 96d11cff..f0ce8f14 100644
--- a/udp_internal.h
+++ b/udp_internal.h
@@ -28,7 +28,7 @@ size_t udp_update_hdr4(struct iphdr *ip4h, struct udp_payload_t *bp,
size_t udp_update_hdr6(struct ipv6hdr *ip6h, struct udp_payload_t *bp,
const struct flowside *toside, size_t dlen,
bool no_udp_csum);
-void udp_sock_fwd(const struct ctx *c, int s, uint8_t frompif,
- in_port_t port, const struct timespec *now);
+void udp_sock_fwd(const struct ctx *c, int s, int rule_hint,
+ uint8_t frompif, in_port_t port, const struct timespec *now);
#endif /* UDP_INTERNAL_H */
--
2.52.0
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 01/14] inany: Extend inany_ntop() to treat NULL as a fully unspecified address
2026-01-08 2:29 ` [PATCH v3 01/14] inany: Extend inany_ntop() to treat NULL as a fully unspecified address David Gibson
@ 2026-01-08 13:16 ` Laurent Vivier
0 siblings, 0 replies; 18+ messages in thread
From: Laurent Vivier @ 2026-01-08 13:16 UTC (permalink / raw)
To: David Gibson, Stefano Brivio, passt-dev
On 1/8/26 03:29, David Gibson wrote:
> In a number of places we're using a convention that a NULL inany represents
> a dual-stack unspecified address, that is one which includes both 0.0.0.0
> and ::. Extend inany_ntop() to handle that convention, representing it
> as "*".
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> inany.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/inany.c b/inany.c
> index 7680439d..87a4d8b6 100644
> --- a/inany.c
> +++ b/inany.c
> @@ -22,7 +22,7 @@ const union inany_addr inany_loopback4 = INANY_INIT4(IN4ADDR_LOOPBACK_INIT);
> const union inany_addr inany_any4 = INANY_INIT4(IN4ADDR_ANY_INIT);
>
> /** inany_ntop - Convert an IPv[46] address to text format
> - * @src: IPv[46] address
> + * @src: IPv[46] address (NULL for unspecified)
> * @dst: output buffer, minimum INANY_ADDRSTRLEN bytes
> * @size: size of buffer at @dst
> *
> @@ -30,9 +30,12 @@ const union inany_addr inany_any4 = INANY_INIT4(IN4ADDR_ANY_INIT);
> */
> const char *inany_ntop(const union inany_addr *src, char *dst, socklen_t size)
> {
> - const struct in_addr *v4 = inany_v4(src);
> + const struct in_addr *v4;
> +
> + if (!src)
> + return strncpy(dst, "*", size);
>
> - if (v4)
> + if ((v4 = inany_v4(src)))
> return inet_ntop(AF_INET, v4, dst, size);
>
> return inet_ntop(AF_INET6, &src->a6, dst, size);
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 06/14] ip: Add ipproto_name() function
2026-01-08 2:29 ` [PATCH v3 06/14] ip: Add ipproto_name() function David Gibson
@ 2026-01-08 13:22 ` Laurent Vivier
2026-01-08 23:12 ` David Gibson
0 siblings, 1 reply; 18+ messages in thread
From: Laurent Vivier @ 2026-01-08 13:22 UTC (permalink / raw)
To: David Gibson, Stefano Brivio, passt-dev
On 1/8/26 03:29, David Gibson wrote:
> Add a function to get the name of an IP protocol from its number. Usually
> this would be done by getprotobynumber(), but that requires access to
> /etc/protocols and might allocate. We can't do either of those once we've
> self-isolated.
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> ip.c | 27 +++++++++++++++++++++++++++
> ip.h | 2 ++
> 2 files changed, 29 insertions(+)
>
> diff --git a/ip.c b/ip.c
> index 9a7f4c54..f1d224bd 100644
> --- a/ip.c
> +++ b/ip.c
> @@ -67,3 +67,30 @@ found:
> *proto = nh;
> return true;
> }
> +
> +/**
> + * ipproto_name() - Get IP protocol name from number
> + * @proto: IP protocol number
> + *
> + * Return: pointer to name of protocol @proto
> + *
> + * Usually this would be done with getprotobynumber(3) but that reads
> + * /etc/protocols and might allocate, which isn't possible for us once
> + * self-isolated.
> + */
> +/* cppcheck-suppress unusedFunction */
> +const char *ipproto_name(uint8_t proto)
> +{
> + switch (proto) {
> + case IPPROTO_ICMP:
> + return "ICMP";
> + case IPPROTO_TCP:
> + return "TCP";
> + case IPPROTO_UDP:
> + return "UDP";
> + case IPPROTO_ICMPV6:
> + return "ICMPv6";
> + default:
> + return "<unknown protocol>";
> + }
> +}
> diff --git a/ip.h b/ip.h
> index 5830b923..42b417c5 100644
> --- a/ip.h
> +++ b/ip.h
> @@ -116,6 +116,7 @@ static inline uint32_t ip6_get_flow_lbl(const struct ipv6hdr *ip6h)
> }
>
> bool ipv6_l4hdr(struct iov_tail *data, uint8_t *proto, size_t *dlen);
> +const char *ipproto_name(uint8_t proto);
>
> /* IPv6 link-local all-nodes multicast address, ff02::1 */
> static const struct in6_addr in6addr_ll_all_nodes = {
> @@ -135,4 +136,5 @@ static const struct in_addr in4addr_broadcast = { 0xffffffff };
> #define IPV6_MIN_MTU 1280
> #endif
>
> +
Extra blank line
> #endif /* IP_H */
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 06/14] ip: Add ipproto_name() function
2026-01-08 13:22 ` Laurent Vivier
@ 2026-01-08 23:12 ` David Gibson
0 siblings, 0 replies; 18+ messages in thread
From: David Gibson @ 2026-01-08 23:12 UTC (permalink / raw)
To: Laurent Vivier; +Cc: Stefano Brivio, passt-dev
[-- Attachment #1: Type: text/plain, Size: 2410 bytes --]
On Thu, Jan 08, 2026 at 02:22:21PM +0100, Laurent Vivier wrote:
> On 1/8/26 03:29, David Gibson wrote:
> > Add a function to get the name of an IP protocol from its number. Usually
> > this would be done by getprotobynumber(), but that requires access to
> > /etc/protocols and might allocate. We can't do either of those once we've
> > self-isolated.
> >
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> > ip.c | 27 +++++++++++++++++++++++++++
> > ip.h | 2 ++
> > 2 files changed, 29 insertions(+)
> >
> > diff --git a/ip.c b/ip.c
> > index 9a7f4c54..f1d224bd 100644
> > --- a/ip.c
> > +++ b/ip.c
> > @@ -67,3 +67,30 @@ found:
> > *proto = nh;
> > return true;
> > }
> > +
> > +/**
> > + * ipproto_name() - Get IP protocol name from number
> > + * @proto: IP protocol number
> > + *
> > + * Return: pointer to name of protocol @proto
> > + *
> > + * Usually this would be done with getprotobynumber(3) but that reads
> > + * /etc/protocols and might allocate, which isn't possible for us once
> > + * self-isolated.
> > + */
> > +/* cppcheck-suppress unusedFunction */
> > +const char *ipproto_name(uint8_t proto)
> > +{
> > + switch (proto) {
> > + case IPPROTO_ICMP:
> > + return "ICMP";
> > + case IPPROTO_TCP:
> > + return "TCP";
> > + case IPPROTO_UDP:
> > + return "UDP";
> > + case IPPROTO_ICMPV6:
> > + return "ICMPv6";
> > + default:
> > + return "<unknown protocol>";
> > + }
> > +}
> > diff --git a/ip.h b/ip.h
> > index 5830b923..42b417c5 100644
> > --- a/ip.h
> > +++ b/ip.h
> > @@ -116,6 +116,7 @@ static inline uint32_t ip6_get_flow_lbl(const struct ipv6hdr *ip6h)
> > }
> > bool ipv6_l4hdr(struct iov_tail *data, uint8_t *proto, size_t *dlen);
> > +const char *ipproto_name(uint8_t proto);
> > /* IPv6 link-local all-nodes multicast address, ff02::1 */
> > static const struct in6_addr in6addr_ll_all_nodes = {
> > @@ -135,4 +136,5 @@ static const struct in_addr in4addr_broadcast = { 0xffffffff };
> > #define IPV6_MIN_MTU 1280
> > #endif
> > +
>
> Extra blank line
Oops, fixed.
> > #endif /* IP_H */
>
> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
>
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2026-01-08 23:13 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-01-08 2:29 [PATCH v3 00/14] Introduce forwarding table David Gibson
2026-01-08 2:29 ` [PATCH v3 01/14] inany: Extend inany_ntop() to treat NULL as a fully unspecified address David Gibson
2026-01-08 13:16 ` Laurent Vivier
2026-01-08 2:29 ` [PATCH v3 02/14] conf, fwd: Keep a table of our port forwarding configuration David Gibson
2026-01-08 2:29 ` [PATCH v3 03/14] conf: Accurately record ifname and address for outbound forwards David Gibson
2026-01-08 2:29 ` [PATCH v3 04/14] conf, fwd: Record "auto" port forwards in forwarding table David Gibson
2026-01-08 2:29 ` [PATCH v3 05/14] fwd: Make space to store listening sockets in forward table David Gibson
2026-01-08 2:29 ` [PATCH v3 06/14] ip: Add ipproto_name() function David Gibson
2026-01-08 13:22 ` Laurent Vivier
2026-01-08 23:12 ` David Gibson
2026-01-08 2:29 ` [PATCH v3 07/14] fwd, tcp, udp: Set up listening sockets based on forward table David Gibson
2026-01-08 2:29 ` [PATCH v3 08/14] tcp, udp: Remove old auto-forwarding socket arrays David Gibson
2026-01-08 2:29 ` [PATCH v3 09/14] conf, fwd: Check forwarding table for conflicting rules David Gibson
2026-01-08 2:29 ` [PATCH v3 10/14] fwd: Generate auto-forward exclusions from socket fd tables David Gibson
2026-01-08 2:29 ` [PATCH v3 11/14] flow, fwd: Consult rules table when forwarding a new flow from socket David Gibson
2026-01-08 2:29 ` [PATCH v3 12/14] fwd: Remap ports based directly on forwarding rule David Gibson
2026-01-08 2:29 ` [PATCH v3 13/14] fwd, tcp, udp: Add forwarding rule to listening socket epoll references David Gibson
2026-01-08 2:29 ` [PATCH v3 14/14] flow, fwd: Optimise forwarding rule lookup using epoll ref when possible David Gibson
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).