* [PATCH v4 0/9] Reduce differences between inbound and outbound socket binding
@ 2025-11-19 5:22 David Gibson
2025-11-19 5:22 ` [PATCH v4 1/9] flow: Remove bogus @path field from flowside_sock_args David Gibson
` (8 more replies)
0 siblings, 9 replies; 14+ messages in thread
From: David Gibson @ 2025-11-19 5:22 UTC (permalink / raw)
To: Stefano Brivio, passt-dev; +Cc: David Gibson
This series is based on my series fixing bug 176 (regression in auto
forwarding).
The fact that outbound forwarding sockets are bound to the loopback
address, whereas inbound forwarding sockets are (by default) bound to
the unspecified address leads to some unexpected differences between
the paths setting up each of them.
An idea for tackling bug 100 suggested a different approach which will
also reduce some of those differences and allow more code to be shared
between the two paths. I've since discovered that this approach
doesn't help for bug 100, but I think it's still worthwhile for other reasons.
Patches 1..7/9 are cleanups which shouldn't change behaviour, and I
think are ready to merge. 8/9 and 9/9 introduce behavioural changes,
but I've made the case for them in the patch comments.
v4:
- Add cleanup patch removing unused structure field
- Rebase, fixing conflicts with Laurent's changes
- A bunch of spelling and other cosmetic fixes
- Clarify relation to bug 100 and bug 113
v3:
- A number of additional fixes covering the handling of IPV6_V6ONLY sockopt
- Assorted trivial changes
v2:
- Some rearrangements and rewordings for clarity
David Gibson (9):
flow: Remove bogus @path field from flowside_sock_args
inany: Let length of sockaddr_inany be implicit from the family
util, flow, pif: Simplify sock_l4_sa() interface
tcp: Merge tcp_ns_sock_init[46]() into tcp_sock_init_one()
udp: Unify some more inbound/outbound parts of udp_sock_init()
udp: Move udp_sock_init() special case to its caller
util: Fix setting of IPV6_V6ONLY socket option
tcp, udp: Remove fallback if creating dual stack socket fails
tcp, udp: Bind outbound listening sockets by interface instead of
address
conf.c | 4 +-
flow.c | 22 +++----
icmp.c | 3 +-
inany.h | 17 ++++++
pif.c | 27 ++-------
pif.h | 2 +-
tcp.c | 164 +++++++++++++++++++--------------------------------
tcp.h | 5 +-
tcp_splice.c | 5 +-
udp.c | 102 +++++++++++++++-----------------
udp.h | 5 +-
util.c | 76 ++++++++++++++++++++----
util.h | 8 ++-
13 files changed, 221 insertions(+), 219 deletions(-)
--
2.51.1
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v4 1/9] flow: Remove bogus @path field from flowside_sock_args
2025-11-19 5:22 [PATCH v4 0/9] Reduce differences between inbound and outbound socket binding David Gibson
@ 2025-11-19 5:22 ` David Gibson
2025-11-19 5:22 ` [PATCH v4 2/9] inany: Let length of sockaddr_inany be implicit from the family David Gibson
` (7 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: David Gibson @ 2025-11-19 5:22 UTC (permalink / raw)
To: Stefano Brivio, passt-dev; +Cc: David Gibson
This was there when the structure was created, but never used. Looks like
a copy-pasta error.
Fixes: 781164e25bdf ("flow: Helper to create sockets based on flowside")
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
flow.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/flow.c b/flow.c
index 278a9cf0..743860d6 100644
--- a/flow.c
+++ b/flow.c
@@ -172,7 +172,6 @@ struct flowside_sock_args {
enum epoll_type type;
const struct sockaddr *sa;
socklen_t sl;
- const char *path;
};
/** flowside_sock_splice() - Create and bind socket for PIF_SPLICE based on flowside
--
2.51.1
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v4 2/9] inany: Let length of sockaddr_inany be implicit from the family
2025-11-19 5:22 [PATCH v4 0/9] Reduce differences between inbound and outbound socket binding David Gibson
2025-11-19 5:22 ` [PATCH v4 1/9] flow: Remove bogus @path field from flowside_sock_args David Gibson
@ 2025-11-19 5:22 ` David Gibson
2025-11-19 5:22 ` [PATCH v4 3/9] util, flow, pif: Simplify sock_l4_sa() interface David Gibson
` (6 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: David Gibson @ 2025-11-19 5:22 UTC (permalink / raw)
To: Stefano Brivio, passt-dev; +Cc: David Gibson
sockaddr_inany can contain either an IPv4 or IPv6 socket address, so the
relevant length for bind() or connect() can vary. In pif_sockaddr() we
return that length, and in sock_l4_sa() we take it as an extra parameter.
However, sockaddr_inany always contains exactly a sockaddr_in or a
sockaddr_in6 each with a fixed size. Therefore we can derive the relevant
length from the family, and don't need to pass it around separately.
Make a tiny helper to get the relevant address length, and update all
interfaces to use that approach instead.
In the process, fix a buglet in tcp_flow_repair_bind(): we passed
sizeof(union sockaddr_inany) to bind() instead of the specific length for
the address family. Since the sizeof() is always longer than the specific
length, this is probably fine, but not theoretically correct.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
flow.c | 21 ++++++++-------------
icmp.c | 3 ++-
inany.h | 17 +++++++++++++++++
pif.c | 15 +++++----------
pif.h | 2 +-
tcp.c | 20 +++++++++-----------
tcp_splice.c | 5 ++---
udp.c | 8 +++-----
util.c | 8 +++-----
util.h | 4 ++--
10 files changed, 52 insertions(+), 51 deletions(-)
diff --git a/flow.c b/flow.c
index 743860d6..11e3f283 100644
--- a/flow.c
+++ b/flow.c
@@ -163,15 +163,13 @@ static void flowside_from_af(struct flowside *side, sa_family_t af,
* @err: Filled in with errno if something failed
* @type: Socket epoll type
* @sa: Socket address
- * @sl: Length of @sa
*/
struct flowside_sock_args {
const struct ctx *c;
int fd;
int err;
enum epoll_type type;
- const struct sockaddr *sa;
- socklen_t sl;
+ const union sockaddr_inany *sa;
};
/** flowside_sock_splice() - Create and bind socket for PIF_SPLICE based on flowside
@@ -185,8 +183,8 @@ static int flowside_sock_splice(void *arg)
ns_enter(a->c);
- a->fd = sock_l4_sa(a->c, a->type, a->sa, a->sl, NULL,
- a->sa->sa_family == AF_INET6);
+ a->fd = sock_l4_sa(a->c, a->type, a->sa, NULL,
+ a->sa->sa_family == AF_INET6);
a->err = errno;
return 0;
@@ -207,11 +205,10 @@ int flowside_sock_l4(const struct ctx *c, enum epoll_type type, uint8_t pif,
{
const char *ifname = NULL;
union sockaddr_inany sa;
- socklen_t sl;
ASSERT(pif_is_socket(pif));
- pif_sockaddr(c, &sa, &sl, pif, &tgt->oaddr, tgt->oport);
+ pif_sockaddr(c, &sa, pif, &tgt->oaddr, tgt->oport);
switch (pif) {
case PIF_HOST:
@@ -222,13 +219,12 @@ int flowside_sock_l4(const struct ctx *c, enum epoll_type type, uint8_t pif,
else if (sa.sa_family == AF_INET6)
ifname = c->ip6.ifname_out;
- return sock_l4_sa(c, type, &sa, sl, ifname,
+ return sock_l4_sa(c, type, &sa, ifname,
sa.sa_family == AF_INET6);
case PIF_SPLICE: {
struct flowside_sock_args args = {
- .c = c, .type = type,
- .sa = &sa.sa, .sl = sl,
+ .c = c, .type = type, .sa = &sa,
};
NS_CALL(flowside_sock_splice, &args);
errno = args.err;
@@ -257,10 +253,9 @@ int flowside_connect(const struct ctx *c, int s,
uint8_t pif, const struct flowside *tgt)
{
union sockaddr_inany sa;
- socklen_t sl;
- pif_sockaddr(c, &sa, &sl, pif, &tgt->eaddr, tgt->eport);
- return connect(s, &sa.sa, sl);
+ pif_sockaddr(c, &sa, pif, &tgt->eaddr, tgt->eport);
+ return connect(s, &sa.sa, socklen_inany(&sa));
}
/** flow_log_ - Log flow-related message
diff --git a/icmp.c b/icmp.c
index 35faefb9..9564c496 100644
--- a/icmp.c
+++ b/icmp.c
@@ -312,8 +312,9 @@ int icmp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
ASSERT(flow_proto[pingf->f.type] == proto);
pingf->ts = now->tv_sec;
- pif_sockaddr(c, &sa, &msh.msg_namelen, PIF_HOST, &tgt->eaddr, 0);
+ pif_sockaddr(c, &sa, PIF_HOST, &tgt->eaddr, 0);
msh.msg_name = &sa;
+ msh.msg_namelen = socklen_inany(&sa);
msh.msg_iov = iov;
msh.msg_iovlen = cnt;
msh.msg_control = NULL;
diff --git a/inany.h b/inany.h
index 7ca5cbd3..61b36fb4 100644
--- a/inany.h
+++ b/inany.h
@@ -67,6 +67,23 @@ union sockaddr_inany {
struct sockaddr_in6 sa6;
};
+/** socklen_inany() - Get relevant address length for sockaddr_inany address
+ * @sa: sockaddr_inany socket address
+ *
+ * Return: socket address length for bind() or connect(), from IP family in @sa
+ */
+static inline socklen_t socklen_inany(const union sockaddr_inany *sa)
+{
+ switch (sa->sa_family) {
+ case AF_INET:
+ return sizeof(sa->sa4);
+ case AF_INET6:
+ return sizeof(sa->sa6);
+ default:
+ ASSERT(0);
+ }
+}
+
/** inany_v4 - Extract IPv4 address, if present, from IPv[46] address
* @addr: IPv4 or IPv6 address
*
diff --git a/pif.c b/pif.c
index 331942e7..e6a50806 100644
--- a/pif.c
+++ b/pif.c
@@ -30,12 +30,11 @@ static_assert(ARRAY_SIZE(pif_type_str) == PIF_NUM_TYPES,
/** pif_sockaddr() - Construct a socket address suitable for an interface
* @c: Execution context
* @sa: Pointer to sockaddr to fill in
- * @sl: Updated to relevant length of initialised @sa
* @pif: Interface to create the socket address
* @addr: IPv[46] address
* @port: Port (host byte order)
*/
-void pif_sockaddr(const struct ctx *c, union sockaddr_inany *sa, socklen_t *sl,
+void pif_sockaddr(const struct ctx *c, union sockaddr_inany *sa,
uint8_t pif, const union inany_addr *addr, in_port_t port)
{
const struct in_addr *v4 = inany_v4(addr);
@@ -47,7 +46,6 @@ void pif_sockaddr(const struct ctx *c, union sockaddr_inany *sa, socklen_t *sl,
sa->sa4.sin_addr = *v4;
sa->sa4.sin_port = htons(port);
memset(&sa->sa4.sin_zero, 0, sizeof(sa->sa4.sin_zero));
- *sl = sizeof(sa->sa4);
} else {
sa->sa_family = AF_INET6;
sa->sa6.sin6_addr = addr->a6;
@@ -57,7 +55,6 @@ void pif_sockaddr(const struct ctx *c, union sockaddr_inany *sa, socklen_t *sl,
else
sa->sa6.sin6_scope_id = 0;
sa->sa6.sin6_flowinfo = 0;
- *sl = sizeof(sa->sa6);
}
}
@@ -85,7 +82,6 @@ int pif_sock_l4(const struct ctx *c, enum epoll_type type, uint8_t pif,
.sa6.sin6_port = htons(port),
};
union epoll_ref ref;
- socklen_t sl;
int ret;
ASSERT(pif_is_socket(pif));
@@ -97,12 +93,11 @@ int pif_sock_l4(const struct ctx *c, enum epoll_type type, uint8_t pif,
}
if (!addr) {
- ref.fd = sock_l4_sa(c, type, &sa, sizeof(sa.sa6),
- ifname, false);
+ ref.fd = sock_l4_sa(c, type, &sa, ifname, false);
} else {
- pif_sockaddr(c, &sa, &sl, pif, addr, port);
- ref.fd = sock_l4_sa(c, type, &sa, sl,
- ifname, sa.sa_family == AF_INET6);
+ pif_sockaddr(c, &sa, pif, addr, port);
+ ref.fd = sock_l4_sa(c, type, &sa, ifname,
+ sa.sa_family == AF_INET6);
}
if (ref.fd < 0)
diff --git a/pif.h b/pif.h
index f029282b..0f7f6672 100644
--- a/pif.h
+++ b/pif.h
@@ -57,7 +57,7 @@ static inline bool pif_is_socket(uint8_t pif)
return pif == PIF_HOST || pif == PIF_SPLICE;
}
-void pif_sockaddr(const struct ctx *c, union sockaddr_inany *sa, socklen_t *sl,
+void pif_sockaddr(const struct ctx *c, union sockaddr_inany *sa,
uint8_t pif, const union inany_addr *addr, in_port_t port);
int pif_sock_l4(const struct ctx *c, enum epoll_type type, uint8_t pif,
const union inany_addr *addr, const char *ifname,
diff --git a/tcp.c b/tcp.c
index 3202d338..b8a98523 100644
--- a/tcp.c
+++ b/tcp.c
@@ -1457,12 +1457,11 @@ static void tcp_bind_outbound(const struct ctx *c,
{
const struct flowside *tgt = &conn->f.side[TGTSIDE];
union sockaddr_inany bind_sa;
- socklen_t sl;
- pif_sockaddr(c, &bind_sa, &sl, PIF_HOST, &tgt->oaddr, tgt->oport);
+ pif_sockaddr(c, &bind_sa, PIF_HOST, &tgt->oaddr, tgt->oport);
if (!inany_is_unspecified(&tgt->oaddr) || tgt->oport) {
- if (bind(s, &bind_sa.sa, sl)) {
+ if (bind(s, &bind_sa.sa, socklen_inany(&bind_sa))) {
char sstr[INANY_ADDRSTRLEN];
flow_dbg_perror(conn,
@@ -1522,7 +1521,6 @@ static void tcp_conn_from_tap(const struct ctx *c, sa_family_t af,
union flow *flow;
int s = -1, mss;
uint64_t hash;
- socklen_t sl;
if (!(flow = flow_alloc()))
return;
@@ -1555,7 +1553,7 @@ static void tcp_conn_from_tap(const struct ctx *c, sa_family_t af,
if ((s = tcp_conn_sock(af)) < 0)
goto cancel;
- pif_sockaddr(c, &sa, &sl, PIF_HOST, &tgt->eaddr, tgt->eport);
+ pif_sockaddr(c, &sa, PIF_HOST, &tgt->eaddr, tgt->eport);
/* Use bind() to check if the target address is local (EADDRINUSE or
* similar) and already bound, and set the LOCAL flag in that case.
@@ -1567,7 +1565,7 @@ static void tcp_conn_from_tap(const struct ctx *c, sa_family_t af,
*
* So, if bind() succeeds, close the socket, get a new one, and proceed.
*/
- if (bind(s, &sa.sa, sl)) {
+ if (bind(s, &sa.sa, socklen_inany(&sa))) {
if (errno != EADDRNOTAVAIL && errno != EACCES)
conn_flag(c, conn, LOCAL);
} else {
@@ -1607,7 +1605,7 @@ static void tcp_conn_from_tap(const struct ctx *c, sa_family_t af,
tcp_bind_outbound(c, conn, s);
- if (connect(s, &sa.sa, sl)) {
+ if (connect(s, &sa.sa, socklen_inany(&sa))) {
if (errno != EINPROGRESS) {
tcp_rst(c, conn);
goto cancel;
@@ -1626,7 +1624,8 @@ static void tcp_conn_from_tap(const struct ctx *c, sa_family_t af,
tcp_epoll_ctl(c, conn);
if (c->mode == MODE_VU) { /* To rebind to same oport after migration */
- sl = sizeof(sa);
+ socklen_t sl = sizeof(sa);
+
if (getsockname(s, &sa.sa, &sl) ||
inany_from_sockaddr(&tgt->oaddr, &tgt->oport, &sa) < 0)
err_perror("Can't get local address for socket %i", s);
@@ -3611,11 +3610,10 @@ static int tcp_flow_repair_bind(const struct ctx *c, struct tcp_tap_conn *conn)
{
const struct flowside *sockside = HOSTFLOW(conn);
union sockaddr_inany a;
- socklen_t sl;
- pif_sockaddr(c, &a, &sl, PIF_HOST, &sockside->oaddr, sockside->oport);
+ pif_sockaddr(c, &a, PIF_HOST, &sockside->oaddr, sockside->oport);
- if (bind(conn->sock, &a.sa, sizeof(a))) {
+ if (bind(conn->sock, &a.sa, socklen_inany(&a))) {
int rc = -errno;
flow_perror(conn, "Failed to bind socket for migrated flow");
return rc;
diff --git a/tcp_splice.c b/tcp_splice.c
index 5efd8597..717766af 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -352,7 +352,6 @@ static int tcp_splice_connect(const struct ctx *c, struct tcp_splice_conn *conn)
sa_family_t af = inany_v4(&tgt->eaddr) ? AF_INET : AF_INET6;
uint8_t tgtpif = conn->f.pif[TGTSIDE];
union sockaddr_inany sa;
- socklen_t sl;
int one = 1;
if (tgtpif == PIF_HOST)
@@ -380,9 +379,9 @@ static int tcp_splice_connect(const struct ctx *c, struct tcp_splice_conn *conn)
conn->s[1]);
}
- pif_sockaddr(c, &sa, &sl, tgtpif, &tgt->eaddr, tgt->eport);
+ pif_sockaddr(c, &sa, tgtpif, &tgt->eaddr, tgt->eport);
- if (connect(conn->s[1], &sa.sa, sl)) {
+ if (connect(conn->s[1], &sa.sa, socklen_inany(&sa))) {
if (errno != EINPROGRESS) {
flow_trace(conn, "Couldn't connect socket for splice: %s",
strerror_(errno));
diff --git a/udp.c b/udp.c
index 9c009502..d414ca32 100644
--- a/udp.c
+++ b/udp.c
@@ -780,7 +780,6 @@ static void udp_sock_to_sock(const struct ctx *c, int from_s, int n,
const struct udp_flow *uflow = udp_at_sidx(tosidx);
uint8_t topif = pif_at_sidx(tosidx);
int to_s = uflow->s[tosidx.sidei];
- socklen_t sl;
int i;
if ((n = udp_sock_recv(c, from_s, udp_mh_recv, n)) <= 0)
@@ -791,7 +790,7 @@ static void udp_sock_to_sock(const struct ctx *c, int from_s, int n,
= udp_mh_recv[i].msg_len;
}
- pif_sockaddr(c, &udp_splice_to, &sl, topif,
+ pif_sockaddr(c, &udp_splice_to, topif,
&toside->eaddr, toside->eport);
sendmmsg(to_s, udp_mh_splice, n, MSG_NOSIGNAL);
@@ -999,7 +998,6 @@ int udp_tap_handler(const struct ctx *c, uint8_t pif,
flow_sidx_t tosidx;
in_port_t src, dst;
uint8_t topif;
- socklen_t sl;
ASSERT(!c->no_udp);
@@ -1041,7 +1039,7 @@ int udp_tap_handler(const struct ctx *c, uint8_t pif,
s = uflow->s[tosidx.sidei];
ASSERT(s >= 0);
- pif_sockaddr(c, &to_sa, &sl, topif, &toside->eaddr, toside->eport);
+ pif_sockaddr(c, &to_sa, topif, &toside->eaddr, toside->eport);
for (i = 0, j = 0; i < (int)p->count - idx && j < UIO_MAXIOV; i++) {
const struct udphdr *uh_send;
@@ -1054,7 +1052,7 @@ int udp_tap_handler(const struct ctx *c, uint8_t pif,
return p->count - idx;
mm[i].msg_hdr.msg_name = &to_sa;
- mm[i].msg_hdr.msg_namelen = sl;
+ mm[i].msg_hdr.msg_namelen = socklen_inany(&to_sa);
if (data.cnt) {
int cnt;
diff --git a/util.c b/util.c
index ab23463b..7df50c69 100644
--- a/util.c
+++ b/util.c
@@ -44,17 +44,15 @@
* @c: Execution context
* @type: epoll type
* @sa: Socket address to bind to
- * @sl: Length of @sa
* @ifname: Interface for binding, NULL for any
* @v6only: Set IPV6_V6ONLY socket option
*
* Return: newly created socket, negative error code on failure
*/
int sock_l4_sa(const struct ctx *c, enum epoll_type type,
- const void *sa, socklen_t sl,
- const char *ifname, bool v6only)
+ const union sockaddr_inany *sa, const char *ifname, bool v6only)
{
- sa_family_t af = ((const struct sockaddr *)sa)->sa_family;
+ sa_family_t af = sa->sa_family;
bool freebind = false;
int fd, y = 1, ret;
uint8_t proto;
@@ -146,7 +144,7 @@ int sock_l4_sa(const struct ctx *c, enum epoll_type type,
}
}
- if (bind(fd, sa, sl) < 0) {
+ if (bind(fd, &sa->sa, socklen_inany(sa)) < 0) {
/* We'll fail to bind to low ports if we don't have enough
* capabilities, and we'll fail to bind on already bound ports,
* this is fine. This might also fail for ICMP because of a
diff --git a/util.h b/util.h
index a0b2ada6..f8294199 100644
--- a/util.h
+++ b/util.h
@@ -206,10 +206,10 @@ int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags,
#include "packet.h"
struct ctx;
+union sockaddr_inany;
int sock_l4_sa(const struct ctx *c, enum epoll_type type,
- const void *sa, socklen_t sl,
- const char *ifname, bool v6only);
+ const union sockaddr_inany *sa, const char *ifname, bool v6only);
int sock_unix(char *sock_path);
void sock_probe_mem(struct ctx *c);
long timespec_diff_ms(const struct timespec *a, const struct timespec *b);
--
2.51.1
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v4 3/9] util, flow, pif: Simplify sock_l4_sa() interface
2025-11-19 5:22 [PATCH v4 0/9] Reduce differences between inbound and outbound socket binding David Gibson
2025-11-19 5:22 ` [PATCH v4 1/9] flow: Remove bogus @path field from flowside_sock_args David Gibson
2025-11-19 5:22 ` [PATCH v4 2/9] inany: Let length of sockaddr_inany be implicit from the family David Gibson
@ 2025-11-19 5:22 ` David Gibson
2025-11-19 5:22 ` [PATCH v4 4/9] tcp: Merge tcp_ns_sock_init[46]() into tcp_sock_init_one() David Gibson
` (5 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: David Gibson @ 2025-11-19 5:22 UTC (permalink / raw)
To: Stefano Brivio, passt-dev; +Cc: David Gibson
sock_l4_sa() has a somewhat confusing 'v6only' option controlling whether
to set the IPV6_V6ONLY socket option. Usually it's set when the given
address is IPv6, but not when we want to create a dual stack listening
socket. The latter only makes sense when the address is :: however.
Clarify this by only keeping the v6only option in an internal helper
sock_l4_(). External users will call either sock_l4() which always creates
a socket bound to a specific IP version, or sock_l4_dualstack() which
creates a dual stack socket, but takes only a port not an address.
We drop the '_sa' suffix while we're at it - it exists because this used
to be an internal version with a sock_l4() wrapper. The wrapper no longer
exists so the '_sa' is no longer useful.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
flow.c | 6 ++----
pif.c | 12 ++++--------
util.c | 45 ++++++++++++++++++++++++++++++++++++++++++---
util.h | 6 ++++--
4 files changed, 52 insertions(+), 17 deletions(-)
diff --git a/flow.c b/flow.c
index 11e3f283..8d72965c 100644
--- a/flow.c
+++ b/flow.c
@@ -183,8 +183,7 @@ static int flowside_sock_splice(void *arg)
ns_enter(a->c);
- a->fd = sock_l4_sa(a->c, a->type, a->sa, NULL,
- a->sa->sa_family == AF_INET6);
+ a->fd = sock_l4(a->c, a->type, a->sa, NULL);
a->err = errno;
return 0;
@@ -219,8 +218,7 @@ int flowside_sock_l4(const struct ctx *c, enum epoll_type type, uint8_t pif,
else if (sa.sa_family == AF_INET6)
ifname = c->ip6.ifname_out;
- return sock_l4_sa(c, type, &sa, ifname,
- sa.sa_family == AF_INET6);
+ return sock_l4(c, type, &sa, ifname);
case PIF_SPLICE: {
struct flowside_sock_args args = {
diff --git a/pif.c b/pif.c
index e6a50806..85904f35 100644
--- a/pif.c
+++ b/pif.c
@@ -76,11 +76,6 @@ int pif_sock_l4(const struct ctx *c, enum epoll_type type, uint8_t pif,
const union inany_addr *addr, const char *ifname,
in_port_t port, uint32_t data)
{
- union sockaddr_inany sa = {
- .sa6.sin6_family = AF_INET6,
- .sa6.sin6_addr = in6addr_any,
- .sa6.sin6_port = htons(port),
- };
union epoll_ref ref;
int ret;
@@ -93,11 +88,12 @@ int pif_sock_l4(const struct ctx *c, enum epoll_type type, uint8_t pif,
}
if (!addr) {
- ref.fd = sock_l4_sa(c, type, &sa, ifname, false);
+ ref.fd = sock_l4_dualstack(c, type, port, ifname);
} else {
+ union sockaddr_inany sa;
+
pif_sockaddr(c, &sa, pif, addr, port);
- ref.fd = sock_l4_sa(c, type, &sa, ifname,
- sa.sa_family == AF_INET6);
+ ref.fd = sock_l4(c, type, &sa, ifname);
}
if (ref.fd < 0)
diff --git a/util.c b/util.c
index 7df50c69..1108a1da 100644
--- a/util.c
+++ b/util.c
@@ -40,7 +40,7 @@
#endif
/**
- * sock_l4_sa() - Create and bind socket to socket address, add to epoll list
+ * sock_l4_() - Create and bind socket to socket address
* @c: Execution context
* @type: epoll type
* @sa: Socket address to bind to
@@ -49,8 +49,9 @@
*
* Return: newly created socket, negative error code on failure
*/
-int sock_l4_sa(const struct ctx *c, enum epoll_type type,
- const union sockaddr_inany *sa, const char *ifname, bool v6only)
+static int sock_l4_(const struct ctx *c, enum epoll_type type,
+ const union sockaddr_inany *sa, const char *ifname,
+ bool v6only)
{
sa_family_t af = sa->sa_family;
bool freebind = false;
@@ -167,6 +168,44 @@ int sock_l4_sa(const struct ctx *c, enum epoll_type type,
return fd;
}
+/**
+ * sock_l4() - Create and bind socket to given address
+ * @c: Execution context
+ * @type: epoll type
+ * @sa: Socket address to bind to
+ * @ifname: Interface for binding, NULL for any
+ *
+ * Return: newly created socket, negative error code on failure
+ */
+int sock_l4(const struct ctx *c, enum epoll_type type,
+ const union sockaddr_inany *sa, const char *ifname)
+{
+ return sock_l4_(c, type, sa, ifname, sa->sa_family == AF_INET6);
+}
+
+/**
+ * sock_l4_dualstack() - Create a dual stack socket bound with wildcard address
+ * @c: Execution context
+ * @type: epoll type
+ * @port Port to bind to (:: and 0.0.0.0)
+ * @ifname: Interface for binding, NULL for any
+ *
+ * Return: newly created socket, negative error code on failure
+ *
+ * A dual stack socket is effectively bound to both :: and 0.0.0.0.
+ */
+int sock_l4_dualstack(const struct ctx *c, enum epoll_type type,
+ in_port_t port, const char *ifname)
+{
+ union sockaddr_inany sa = {
+ .sa6.sin6_family = AF_INET6,
+ .sa6.sin6_addr = in6addr_any,
+ .sa6.sin6_port = htons(port),
+ };
+
+ return sock_l4_(c, type, &sa, ifname, 0);
+}
+
/**
* sock_unix() - Create and bind AF_UNIX socket
* @sock_path: Socket path. If empty, set on return (UNIX_SOCK_PATH as prefix)
diff --git a/util.h b/util.h
index f8294199..d9fbfe5d 100644
--- a/util.h
+++ b/util.h
@@ -208,8 +208,10 @@ int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags,
struct ctx;
union sockaddr_inany;
-int sock_l4_sa(const struct ctx *c, enum epoll_type type,
- const union sockaddr_inany *sa, const char *ifname, bool v6only);
+int sock_l4(const struct ctx *c, enum epoll_type type,
+ const union sockaddr_inany *sa, const char *ifname);
+int sock_l4_dualstack(const struct ctx *c, enum epoll_type type,
+ in_port_t port, const char *ifname);
int sock_unix(char *sock_path);
void sock_probe_mem(struct ctx *c);
long timespec_diff_ms(const struct timespec *a, const struct timespec *b);
--
2.51.1
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v4 4/9] tcp: Merge tcp_ns_sock_init[46]() into tcp_sock_init_one()
2025-11-19 5:22 [PATCH v4 0/9] Reduce differences between inbound and outbound socket binding David Gibson
` (2 preceding siblings ...)
2025-11-19 5:22 ` [PATCH v4 3/9] util, flow, pif: Simplify sock_l4_sa() interface David Gibson
@ 2025-11-19 5:22 ` David Gibson
2025-11-19 5:22 ` [PATCH v4 5/9] udp: Unify some more inbound/outbound parts of udp_sock_init() David Gibson
` (4 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: David Gibson @ 2025-11-19 5:22 UTC (permalink / raw)
To: Stefano Brivio, passt-dev; +Cc: David Gibson
Surprisingly little logic is shared between the path for creating a
listen()ing socket in the guest namespace versus in the host namespace.
Improve this, by extending tcp_sock_init_one() to take a pif parameter
indicating where it should open the socket. This allows
tcp_ns_sock_init[46]() to be removed entirely.
We generalise tcp_sock_init() in the same way, although we don't fully use
it yet, due to some subtle differences in how we bind for -t versus -T.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
conf.c | 2 +-
tcp.c | 100 +++++++++++++++++++--------------------------------------
tcp.h | 5 +--
3 files changed, 37 insertions(+), 70 deletions(-)
diff --git a/conf.c b/conf.c
index 66b9e634..26f1bcc0 100644
--- a/conf.c
+++ b/conf.c
@@ -169,7 +169,7 @@ static void conf_ports_range_except(const struct ctx *c, char optname,
fwd->delta[i] = to - first;
if (optname == 't')
- ret = tcp_sock_init(c, addr, ifname, i);
+ ret = tcp_sock_init(c, PIF_HOST, addr, ifname, i);
else if (optname == 'u')
ret = udp_sock_init(c, 0, addr, ifname, i);
else
diff --git a/tcp.c b/tcp.c
index b8a98523..428bac7b 100644
--- a/tcp.c
+++ b/tcp.c
@@ -2530,29 +2530,42 @@ void tcp_sock_handler(const struct ctx *c, union epoll_ref ref,
/**
* tcp_sock_init_one() - Initialise listening socket for address and port
* @c: Execution context
+ * @pif: Interface to open the socket for (PIF_HOST or PIF_SPLICE)
* @addr: Pointer to address for binding, NULL for dual stack any
* @ifname: Name of interface to bind to, NULL if not configured
* @port: Port, host order
*
* Return: fd for the new listening socket, negative error code on failure
+ *
+ * If pif == PIF_SPLICE, the caller must have already entered the guest ns.
*/
-static int tcp_sock_init_one(const struct ctx *c, const union inany_addr *addr,
- const char *ifname, in_port_t port)
+static int tcp_sock_init_one(const struct ctx *c, uint8_t pif,
+ const union inany_addr *addr, const char *ifname,
+ in_port_t port)
{
union tcp_listen_epoll_ref tref = {
.port = port,
- .pif = PIF_HOST,
+ .pif = pif,
};
+ const struct fwd_ports *fwd;
int s;
- s = pif_sock_l4(c, EPOLL_TYPE_TCP_LISTEN, PIF_HOST, addr,
- ifname, port, tref.u32);
+ if (pif == PIF_HOST)
+ fwd = &c->tcp.fwd_in;
+ else
+ fwd = &c->tcp.fwd_out;
+
+ s = pif_sock_l4(c, EPOLL_TYPE_TCP_LISTEN, pif, addr, ifname,
+ port, tref.u32);
+
+ if (fwd->mode == FWD_AUTO) {
+ int (*socks)[IP_VERSIONS] = pif == PIF_SPLICE ?
+ tcp_sock_ns : tcp_sock_init_ext;
- if (c->tcp.fwd_in.mode == FWD_AUTO) {
if (!addr || inany_v4(addr))
- tcp_sock_init_ext[port][V4] = s < 0 ? -1 : s;
+ socks[port][V4] = s < 0 ? -1 : s;
if (!addr || !inany_v4(addr))
- tcp_sock_init_ext[port][V6] = s < 0 ? -1 : s;
+ socks[port][V6] = s < 0 ? -1 : s;
}
if (s < 0)
@@ -2564,14 +2577,16 @@ static int tcp_sock_init_one(const struct ctx *c, const union inany_addr *addr,
/**
* tcp_sock_init() - Create listening sockets for a given host ("inbound") port
* @c: Execution context
+ * @pif: Interface to open the socket for (PIF_HOST or PIF_SPLICE)
* @addr: Pointer to address for binding, NULL if not configured
* @ifname: Name of interface to bind to, NULL if not configured
* @port: Port, host order
*
* Return: 0 on (partial) success, negative error code on (complete) failure
*/
-int tcp_sock_init(const struct ctx *c, const union inany_addr *addr,
- const char *ifname, in_port_t port)
+int tcp_sock_init(const struct ctx *c, uint8_t pif,
+ const union inany_addr *addr, const char *ifname,
+ in_port_t port)
{
int r4 = FD_REF_MAX + 1, r6 = FD_REF_MAX + 1;
@@ -2579,72 +2594,23 @@ int tcp_sock_init(const struct ctx *c, const union inany_addr *addr,
if (!addr && c->ifi4 && c->ifi6)
/* Attempt to get a dual stack socket */
- if (tcp_sock_init_one(c, NULL, ifname, port) >= 0)
+ if (tcp_sock_init_one(c, pif, NULL, ifname, port) >= 0)
return 0;
/* Otherwise create a socket per IP version */
if ((!addr || inany_v4(addr)) && c->ifi4)
- r4 = tcp_sock_init_one(c, addr ? addr : &inany_any4,
- ifname, port);
+ r4 = tcp_sock_init_one(c, pif,
+ addr ? addr : &inany_any4, ifname, port);
if ((!addr || !inany_v4(addr)) && c->ifi6)
- r6 = tcp_sock_init_one(c, addr ? addr : &inany_any6,
- ifname, port);
+ r6 = tcp_sock_init_one(c, pif,
+ addr ? addr : &inany_any6, ifname, port);
if (IN_INTERVAL(0, FD_REF_MAX, r4) || IN_INTERVAL(0, FD_REF_MAX, r6))
return 0;
return r4 < 0 ? r4 : r6;
}
-
-/**
- * tcp_ns_sock_init4() - Init socket to listen for outbound IPv4 connections
- * @c: Execution context
- * @port: Port, host order
- */
-static void tcp_ns_sock_init4(const struct ctx *c, in_port_t port)
-{
- union tcp_listen_epoll_ref tref = {
- .port = port,
- .pif = PIF_SPLICE,
- };
- int s;
-
- ASSERT(c->mode == MODE_PASTA);
-
- s = pif_sock_l4(c, EPOLL_TYPE_TCP_LISTEN, PIF_SPLICE, &inany_loopback4,
- NULL, port, tref.u32);
- if (s < 0)
- s = -1;
-
- if (c->tcp.fwd_out.mode == FWD_AUTO)
- tcp_sock_ns[port][V4] = s;
-}
-
-/**
- * tcp_ns_sock_init6() - Init socket to listen for outbound IPv6 connections
- * @c: Execution context
- * @port: Port, host order
- */
-static void tcp_ns_sock_init6(const struct ctx *c, in_port_t port)
-{
- union tcp_listen_epoll_ref tref = {
- .port = port,
- .pif = PIF_SPLICE,
- };
- int s;
-
- ASSERT(c->mode == MODE_PASTA);
-
- s = pif_sock_l4(c, EPOLL_TYPE_TCP_LISTEN, PIF_SPLICE, &inany_loopback6,
- NULL, port, tref.u32);
- if (s < 0)
- s = -1;
-
- if (c->tcp.fwd_out.mode == FWD_AUTO)
- tcp_sock_ns[port][V6] = s;
-}
-
/**
* tcp_ns_sock_init() - Init socket to listen for spliced outbound connections
* @c: Execution context
@@ -2655,9 +2621,9 @@ static void tcp_ns_sock_init(const struct ctx *c, in_port_t port)
ASSERT(!c->no_tcp);
if (c->ifi4)
- tcp_ns_sock_init4(c, port);
+ tcp_sock_init_one(c, PIF_SPLICE, &inany_loopback4, NULL, port);
if (c->ifi6)
- tcp_ns_sock_init6(c, port);
+ tcp_sock_init_one(c, PIF_SPLICE, &inany_loopback6, NULL, port);
}
/**
@@ -2855,7 +2821,7 @@ static void tcp_port_rebind(struct ctx *c, bool outbound)
if (outbound)
tcp_ns_sock_init(c, port);
else
- tcp_sock_init(c, NULL, NULL, port);
+ tcp_sock_init(c, PIF_HOST, NULL, NULL, port);
}
}
}
diff --git a/tcp.h b/tcp.h
index 00823867..de6b9f93 100644
--- a/tcp.h
+++ b/tcp.h
@@ -18,8 +18,9 @@ void tcp_sock_handler(const struct ctx *c, union epoll_ref ref,
int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
const void *saddr, const void *daddr, uint32_t flow_lbl,
const struct pool *p, int idx, const struct timespec *now);
-int tcp_sock_init(const struct ctx *c, const union inany_addr *addr,
- const char *ifname, in_port_t port);
+int tcp_sock_init(const struct ctx *c, uint8_t pif,
+ const union inany_addr *addr, const char *ifname,
+ in_port_t port);
int tcp_init(struct ctx *c);
void tcp_port_rebind_all(struct ctx *c);
void tcp_timer(const struct ctx *c, const struct timespec *now);
--
2.51.1
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v4 5/9] udp: Unify some more inbound/outbound parts of udp_sock_init()
2025-11-19 5:22 [PATCH v4 0/9] Reduce differences between inbound and outbound socket binding David Gibson
` (3 preceding siblings ...)
2025-11-19 5:22 ` [PATCH v4 4/9] tcp: Merge tcp_ns_sock_init[46]() into tcp_sock_init_one() David Gibson
@ 2025-11-19 5:22 ` David Gibson
2025-11-19 5:22 ` [PATCH v4 6/9] udp: Move udp_sock_init() special case to its caller David Gibson
` (3 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: David Gibson @ 2025-11-19 5:22 UTC (permalink / raw)
To: Stefano Brivio, passt-dev; +Cc: David Gibson
udp_sock_init() takes an 'ns' parameter determining if it creates a socket
in the guest namespace or host namespace. Alter it to take a pif
parameter instead, like tcp_sock_init(), and use that change to slightly
reduce code duplication between the HOST and SPLICE cases.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
conf.c | 2 +-
udp.c | 63 ++++++++++++++++++++++++++++++----------------------------
udp.h | 5 +++--
3 files changed, 37 insertions(+), 33 deletions(-)
diff --git a/conf.c b/conf.c
index 26f1bcc0..08cb50aa 100644
--- a/conf.c
+++ b/conf.c
@@ -171,7 +171,7 @@ static void conf_ports_range_except(const struct ctx *c, char optname,
if (optname == 't')
ret = tcp_sock_init(c, PIF_HOST, addr, ifname, i);
else if (optname == 'u')
- ret = udp_sock_init(c, 0, addr, ifname, i);
+ ret = udp_sock_init(c, PIF_HOST, addr, ifname, i);
else
/* No way to check in advance for -T and -U */
ret = 0;
diff --git a/udp.c b/udp.c
index d414ca32..3b43f136 100644
--- a/udp.c
+++ b/udp.c
@@ -1104,64 +1104,66 @@ int udp_tap_handler(const struct ctx *c, uint8_t pif,
/**
* udp_sock_init() - Initialise listening sockets for a given port
* @c: Execution context
- * @ns: In pasta mode, if set, bind with loopback address in namespace
+ * @pif: Interface to open the socket for (PIF_HOST or PIF_SPLICE)
* @addr: Pointer to address for binding, NULL if not configured
* @ifname: Name of interface to bind to, NULL if not configured
* @port: Port, host order
*
* Return: 0 on (partial) success, negative error code on (complete) failure
*/
-int udp_sock_init(const struct ctx *c, int ns, const union inany_addr *addr,
- const char *ifname, in_port_t port)
+int udp_sock_init(const struct ctx *c, uint8_t pif,
+ const union inany_addr *addr, const char *ifname,
+ in_port_t port)
{
union udp_listen_epoll_ref uref = {
- .pif = ns ? PIF_SPLICE : PIF_HOST,
+ .pif = pif,
.port = port,
};
int r4 = FD_REF_MAX + 1, r6 = FD_REF_MAX + 1;
+ int (*socks)[NUM_PORTS];
ASSERT(!c->no_udp);
+ ASSERT(pif_is_socket(pif));
- if (!addr && c->ifi4 && c->ifi6 && !ns) {
+ if (pif == PIF_HOST)
+ socks = udp_splice_init;
+ else
+ socks = udp_splice_ns;
+
+ if (!addr && c->ifi4 && c->ifi6 && pif == PIF_HOST) {
int s;
/* Attempt to get a dual stack socket */
s = pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, PIF_HOST,
NULL, ifname, port, uref.u32);
- udp_splice_init[V4][port] = s < 0 ? -1 : s;
- udp_splice_init[V6][port] = s < 0 ? -1 : s;
+ socks[V4][port] = s < 0 ? -1 : s;
+ socks[V6][port] = s < 0 ? -1 : s;
if (IN_INTERVAL(0, FD_REF_MAX, s))
return 0;
}
if ((!addr || inany_v4(addr)) && c->ifi4) {
- if (!ns) {
- r4 = pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, PIF_HOST,
- addr ? addr : &inany_any4, ifname,
- port, uref.u32);
+ const union inany_addr *a = addr ? addr : &inany_any4;
- udp_splice_init[V4][port] = r4 < 0 ? -1 : r4;
- } else {
- r4 = pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, PIF_SPLICE,
- &inany_loopback4, ifname,
- port, uref.u32);
- udp_splice_ns[V4][port] = r4 < 0 ? -1 : r4;
- }
+ if (pif == PIF_SPLICE)
+ a = &inany_loopback4;
+
+ r4 = pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, pif, a, ifname,
+ port, uref.u32);
+
+ socks[V4][port] = r4 < 0 ? -1 : r4;
}
if ((!addr || !inany_v4(addr)) && c->ifi6) {
- if (!ns) {
- r6 = pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, PIF_HOST,
- addr ? addr : &inany_any6, ifname,
- port, uref.u32);
+ const union inany_addr *a = addr ? addr : &inany_any6;
- udp_splice_init[V6][port] = r6 < 0 ? -1 : r6;
- } else {
- r6 = pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, PIF_SPLICE,
- &inany_loopback6, ifname,
- port, uref.u32);
- udp_splice_ns[V6][port] = r6 < 0 ? -1 : r6;
- }
+ if (pif == PIF_SPLICE)
+ a = &inany_loopback6;
+
+ r6 = pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, pif, a, ifname,
+ port, uref.u32);
+
+ socks[V6][port] = r6 < 0 ? -1 : r6;
}
if (IN_INTERVAL(0, FD_REF_MAX, r4) || IN_INTERVAL(0, FD_REF_MAX, r6))
@@ -1221,7 +1223,8 @@ static void udp_port_rebind(struct ctx *c, bool outbound)
if ((c->ifi4 && socks[V4][port] == -1) ||
(c->ifi6 && socks[V6][port] == -1))
- udp_sock_init(c, outbound, NULL, NULL, port);
+ udp_sock_init(c, outbound ? PIF_SPLICE : PIF_HOST,
+ NULL, NULL, port);
}
}
diff --git a/udp.h b/udp.h
index f1d83f38..03e8dc54 100644
--- a/udp.h
+++ b/udp.h
@@ -15,8 +15,9 @@ int udp_tap_handler(const struct ctx *c, uint8_t pif,
sa_family_t af, const void *saddr, const void *daddr,
uint8_t ttl, const struct pool *p, int idx,
const struct timespec *now);
-int udp_sock_init(const struct ctx *c, int ns, const union inany_addr *addr,
- const char *ifname, in_port_t port);
+int udp_sock_init(const struct ctx *c, uint8_t pif,
+ const union inany_addr *addr, const char *ifname,
+ in_port_t port);
int udp_init(struct ctx *c);
void udp_port_rebind_all(struct ctx *c);
void udp_update_l2_buf(const unsigned char *eth_d);
--
2.51.1
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v4 6/9] udp: Move udp_sock_init() special case to its caller
2025-11-19 5:22 [PATCH v4 0/9] Reduce differences between inbound and outbound socket binding David Gibson
` (4 preceding siblings ...)
2025-11-19 5:22 ` [PATCH v4 5/9] udp: Unify some more inbound/outbound parts of udp_sock_init() David Gibson
@ 2025-11-19 5:22 ` David Gibson
2025-11-19 5:22 ` [PATCH v4 7/9] util: Fix setting of IPV6_V6ONLY socket option David Gibson
` (2 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: David Gibson @ 2025-11-19 5:22 UTC (permalink / raw)
To: Stefano Brivio, passt-dev; +Cc: David Gibson
Inbound sockets are bound to the unspecified address by default, whereas
outbound sockets are bound to the loopback address. This is currently
handled in udp_sock_init() which ignores its addr argument in the outbound
case.
Move the handling of this special case to the caller, udp_port_rebind().
This makes the semantics of the 'addr' argument more consistent, and will
make future changes easier.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
udp.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)
diff --git a/udp.c b/udp.c
index 3b43f136..b3ce9c7f 100644
--- a/udp.c
+++ b/udp.c
@@ -1130,7 +1130,7 @@ int udp_sock_init(const struct ctx *c, uint8_t pif,
else
socks = udp_splice_ns;
- if (!addr && c->ifi4 && c->ifi6 && pif == PIF_HOST) {
+ if (!addr && c->ifi4 && c->ifi6) {
int s;
/* Attempt to get a dual stack socket */
@@ -1145,9 +1145,6 @@ int udp_sock_init(const struct ctx *c, uint8_t pif,
if ((!addr || inany_v4(addr)) && c->ifi4) {
const union inany_addr *a = addr ? addr : &inany_any4;
- if (pif == PIF_SPLICE)
- a = &inany_loopback4;
-
r4 = pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, pif, a, ifname,
port, uref.u32);
@@ -1157,9 +1154,6 @@ int udp_sock_init(const struct ctx *c, uint8_t pif,
if ((!addr || !inany_v4(addr)) && c->ifi6) {
const union inany_addr *a = addr ? addr : &inany_any6;
- if (pif == PIF_SPLICE)
- a = &inany_loopback6;
-
r6 = pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, pif, a, ifname,
port, uref.u32);
@@ -1222,9 +1216,16 @@ static void udp_port_rebind(struct ctx *c, bool outbound)
}
if ((c->ifi4 && socks[V4][port] == -1) ||
- (c->ifi6 && socks[V6][port] == -1))
- udp_sock_init(c, outbound ? PIF_SPLICE : PIF_HOST,
- NULL, NULL, port);
+ (c->ifi6 && socks[V6][port] == -1)) {
+ if (outbound) {
+ udp_sock_init(c, PIF_SPLICE,
+ &inany_loopback4, NULL, port);
+ udp_sock_init(c, PIF_SPLICE,
+ &inany_loopback6, NULL, port);
+ } else {
+ udp_sock_init(c, PIF_HOST, NULL, NULL, port);
+ }
+ }
}
}
--
2.51.1
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v4 7/9] util: Fix setting of IPV6_V6ONLY socket option
2025-11-19 5:22 [PATCH v4 0/9] Reduce differences between inbound and outbound socket binding David Gibson
` (5 preceding siblings ...)
2025-11-19 5:22 ` [PATCH v4 6/9] udp: Move udp_sock_init() special case to its caller David Gibson
@ 2025-11-19 5:22 ` David Gibson
2025-11-19 5:22 ` [PATCH v4 8/9] tcp, udp: Remove fallback if creating dual stack socket fails David Gibson
2025-11-19 5:22 ` [PATCH v4 9/9] tcp, udp: Bind outbound listening sockets by interface instead of address David Gibson
8 siblings, 0 replies; 14+ messages in thread
From: David Gibson @ 2025-11-19 5:22 UTC (permalink / raw)
To: Stefano Brivio, passt-dev; +Cc: David Gibson
Currently we only call setsockopt() on IPV6_V6ONLY when we want to set it
to 1, which we typically do on all IPv6 sockets except those explicitly for
dual stack listening. That's not quite right in two ways:
* Although IPV6_V6ONLY==0 is normally the default on Linux, that can be
changed with the net.ipv6.bindv6only sysctl. It may also have different
defaults on other OSes if we ever support them. We know we need it off
for dual stack sockets, so explicitly set it to 0 in that case.
* At the same time setting IPV6_V6ONLY to 1 for IPv6 sockets bound to a
specific address is harmless but pointless. Don't set the option at all
in this case, saving a syscall.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
util.c | 29 +++++++++++++++++++++++------
1 file changed, 23 insertions(+), 6 deletions(-)
diff --git a/util.c b/util.c
index 1108a1da..82491326 100644
--- a/util.c
+++ b/util.c
@@ -45,13 +45,13 @@
* @type: epoll type
* @sa: Socket address to bind to
* @ifname: Interface for binding, NULL for any
- * @v6only: Set IPV6_V6ONLY socket option
+ * @v6only: If >= 0, set IPV6_V6ONLY socket option to this value
*
* Return: newly created socket, negative error code on failure
*/
static int sock_l4_(const struct ctx *c, enum epoll_type type,
const union sockaddr_inany *sa, const char *ifname,
- bool v6only)
+ int v6only)
{
sa_family_t af = sa->sa_family;
bool freebind = false;
@@ -95,9 +95,13 @@ static int sock_l4_(const struct ctx *c, enum epoll_type type,
return -EBADF;
}
- if (v6only)
- if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &y, sizeof(y)))
- debug("Failed to set IPV6_V6ONLY on socket %i", fd);
+ if (v6only >= 0) {
+ if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY,
+ &v6only, sizeof(v6only))) {
+ debug("Failed to set IPV6_V6ONLY to %d on socket %i",
+ v6only, fd);
+ }
+ }
if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &y, sizeof(y)))
debug("Failed to set SO_REUSEADDR on socket %i", fd);
@@ -180,7 +184,16 @@ static int sock_l4_(const struct ctx *c, enum epoll_type type,
int sock_l4(const struct ctx *c, enum epoll_type type,
const union sockaddr_inany *sa, const char *ifname)
{
- return sock_l4_(c, type, sa, ifname, sa->sa_family == AF_INET6);
+ int v6only = -1;
+
+ /* The option doesn't exist for IPv4 sockets, and we don't care about it
+ * for IPv6 sockets with a non-wildcard address.
+ */
+ if (sa->sa_family == AF_INET6 &&
+ IN6_IS_ADDR_UNSPECIFIED(&sa->sa6.sin6_addr))
+ v6only = 1;
+
+ return sock_l4_(c, type, sa, ifname, v6only);
}
/**
@@ -203,6 +216,10 @@ int sock_l4_dualstack(const struct ctx *c, enum epoll_type type,
.sa6.sin6_port = htons(port),
};
+ /* Dual stack sockets require IPV6_V6ONLY == 0. Usually that's the
+ * default, but sysctl net.ipv6.bindv6only can change that, so set the
+ * sockopt explicitly.
+ */
return sock_l4_(c, type, &sa, ifname, 0);
}
--
2.51.1
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v4 8/9] tcp, udp: Remove fallback if creating dual stack socket fails
2025-11-19 5:22 [PATCH v4 0/9] Reduce differences between inbound and outbound socket binding David Gibson
` (6 preceding siblings ...)
2025-11-19 5:22 ` [PATCH v4 7/9] util: Fix setting of IPV6_V6ONLY socket option David Gibson
@ 2025-11-19 5:22 ` David Gibson
2025-11-19 5:22 ` [PATCH v4 9/9] tcp, udp: Bind outbound listening sockets by interface instead of address David Gibson
8 siblings, 0 replies; 14+ messages in thread
From: David Gibson @ 2025-11-19 5:22 UTC (permalink / raw)
To: Stefano Brivio, passt-dev; +Cc: David Gibson
To save kernel memory we try to use "dual stack" sockets which can listen
for both IPv4 and IPv6 connections where possible. To support kernels
which don't allow dual stack sockets, we fall back to creating individual
sockets for IPv4 and IPv6.
This fallback causes some mild ugliness now, and will cause more difficulty
with upcoming improvements to the forwarding logic. I don't think we need
the fallback on the following grounds:
1) The fallback was broken since inception:
The fallback was triggered if pif_sock_l4() failed attempting to create the
dual stack socket. But even if the kernel didn't support them,
pif_sock_l4() would not report a failure.
- Dual stack sockets are distinguished by having the IPV6_V6ONLY sockopt
set to 0. However, until the last patch, we only called setsockopt()
if we wanted to set this to 1, so there was no kernel operation which
could fail for dual stack sockets - we'd silently create a IPv6 only
socket instead.
- Even if we did call the setsockopt(), we only printed a debug() message
for failures, we didn't report it to the caller
2) Dual stack sockets are not just a Linux extension
The dual stack socket interface is described in RFC3493, specifically
section 3.7 and section 5.3. It is supported on BSD:
https://man.freebsd.org/cgi/man.cgi?query=ip6
and on Windows:
https://learn.microsoft.com/en-us/windows/win32/winsock/ipproto-ipv6-socket-options
3) Linux has supported dual stack sockets for over 20 years
According to ipv6(7) the IPV6_V6ONLY socket option was introduced in
Linux 2.6 and Linux 2.4.21 both from 2003.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
tcp.c | 43 +++++++++++++++++++++++++------------------
udp.c | 56 ++++++++++++++++++++++++++------------------------------
2 files changed, 51 insertions(+), 48 deletions(-)
diff --git a/tcp.c b/tcp.c
index 428bac7b..2abb8be4 100644
--- a/tcp.c
+++ b/tcp.c
@@ -2575,42 +2575,49 @@ static int tcp_sock_init_one(const struct ctx *c, uint8_t pif,
}
/**
- * tcp_sock_init() - Create listening sockets for a given host ("inbound") port
+ * tcp_sock_init() - Create listening socket for a given host ("inbound") port
* @c: Execution context
* @pif: Interface to open the socket for (PIF_HOST or PIF_SPLICE)
* @addr: Pointer to address for binding, NULL if not configured
* @ifname: Name of interface to bind to, NULL if not configured
* @port: Port, host order
*
- * Return: 0 on (partial) success, negative error code on (complete) failure
+ * Return: 0 on success, negative error code on failure
*/
int tcp_sock_init(const struct ctx *c, uint8_t pif,
const union inany_addr *addr, const char *ifname,
in_port_t port)
{
- int r4 = FD_REF_MAX + 1, r6 = FD_REF_MAX + 1;
+ int s;
ASSERT(!c->no_tcp);
- if (!addr && c->ifi4 && c->ifi6)
- /* Attempt to get a dual stack socket */
- if (tcp_sock_init_one(c, pif, NULL, ifname, port) >= 0)
+ if (!c->ifi4) {
+ if (!addr)
+ /* Restrict to v6 only */
+ addr = &inany_any6;
+ else if (inany_v4(addr))
+ /* Nothing to do */
return 0;
+ }
+ if (!c->ifi6) {
+ if (!addr)
+ /* Restrict to v4 only */
+ addr = &inany_any4;
+ else if (!inany_v4(addr))
+ /* Nothing to do */
+ return 0;
+ }
- /* Otherwise create a socket per IP version */
- if ((!addr || inany_v4(addr)) && c->ifi4)
- r4 = tcp_sock_init_one(c, pif,
- addr ? addr : &inany_any4, ifname, port);
-
- if ((!addr || !inany_v4(addr)) && c->ifi6)
- r6 = tcp_sock_init_one(c, pif,
- addr ? addr : &inany_any6, ifname, port);
-
- if (IN_INTERVAL(0, FD_REF_MAX, r4) || IN_INTERVAL(0, FD_REF_MAX, r6))
- return 0;
+ s = tcp_sock_init_one(c, pif, addr, ifname, port);
+ if (s < 0)
+ return s;
+ if (s > FD_REF_MAX)
+ return -EIO;
- return r4 < 0 ? r4 : r6;
+ return 0;
}
+
/**
* tcp_ns_sock_init() - Init socket to listen for spliced outbound connections
* @c: Execution context
diff --git a/udp.c b/udp.c
index b3ce9c7f..3d097fbb 100644
--- a/udp.c
+++ b/udp.c
@@ -1102,14 +1102,14 @@ int udp_tap_handler(const struct ctx *c, uint8_t pif,
}
/**
- * udp_sock_init() - Initialise listening sockets for a given port
+ * udp_sock_init() - Initialise listening socket for a given port
* @c: Execution context
* @pif: Interface to open the socket for (PIF_HOST or PIF_SPLICE)
* @addr: Pointer to address for binding, NULL if not configured
* @ifname: Name of interface to bind to, NULL if not configured
* @port: Port, host order
*
- * Return: 0 on (partial) success, negative error code on (complete) failure
+ * Return: 0 on success, negative error code on failure
*/
int udp_sock_init(const struct ctx *c, uint8_t pif,
const union inany_addr *addr, const char *ifname,
@@ -1119,8 +1119,8 @@ int udp_sock_init(const struct ctx *c, uint8_t pif,
.pif = pif,
.port = port,
};
- int r4 = FD_REF_MAX + 1, r6 = FD_REF_MAX + 1;
int (*socks)[NUM_PORTS];
+ int s;
ASSERT(!c->no_udp);
ASSERT(pif_is_socket(pif));
@@ -1130,40 +1130,36 @@ int udp_sock_init(const struct ctx *c, uint8_t pif,
else
socks = udp_splice_ns;
- if (!addr && c->ifi4 && c->ifi6) {
- int s;
-
- /* Attempt to get a dual stack socket */
- s = pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, PIF_HOST,
- NULL, ifname, port, uref.u32);
- socks[V4][port] = s < 0 ? -1 : s;
- socks[V6][port] = s < 0 ? -1 : s;
- if (IN_INTERVAL(0, FD_REF_MAX, s))
+ if (!c->ifi4) {
+ if (!addr)
+ /* Restrict to v6 only */
+ addr = &inany_any6;
+ else if (inany_v4(addr))
+ /* Nothing to do */
return 0;
}
-
- if ((!addr || inany_v4(addr)) && c->ifi4) {
- const union inany_addr *a = addr ? addr : &inany_any4;
-
- r4 = pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, pif, a, ifname,
- port, uref.u32);
-
- socks[V4][port] = r4 < 0 ? -1 : r4;
+ if (!c->ifi6) {
+ if (!addr)
+ /* Restrict to v4 only */
+ addr = &inany_any4;
+ else if (!inany_v4(addr))
+ /* Nothing to do */
+ return 0;
}
- if ((!addr || !inany_v4(addr)) && c->ifi6) {
- const union inany_addr *a = addr ? addr : &inany_any6;
-
- r6 = pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, pif, a, ifname,
- port, uref.u32);
-
- socks[V6][port] = r6 < 0 ? -1 : r6;
+ s = pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, pif,
+ addr, ifname, port, uref.u32);
+ if (s > FD_REF_MAX) {
+ close(s);
+ s = -EIO;
}
- if (IN_INTERVAL(0, FD_REF_MAX, r4) || IN_INTERVAL(0, FD_REF_MAX, r6))
- return 0;
+ if (!addr || inany_v4(addr))
+ socks[V4][port] = s < 0 ? -1 : s;
+ if (!addr || !inany_v4(addr))
+ socks[V6][port] = s < 0 ? -1 : s;
- return r4 < 0 ? r4 : r6;
+ return s < 0 ? s : 0;
}
/**
--
2.51.1
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v4 9/9] tcp, udp: Bind outbound listening sockets by interface instead of address
2025-11-19 5:22 [PATCH v4 0/9] Reduce differences between inbound and outbound socket binding David Gibson
` (7 preceding siblings ...)
2025-11-19 5:22 ` [PATCH v4 8/9] tcp, udp: Remove fallback if creating dual stack socket fails David Gibson
@ 2025-11-19 5:22 ` David Gibson
2025-11-21 3:56 ` Stefano Brivio
8 siblings, 1 reply; 14+ messages in thread
From: David Gibson @ 2025-11-19 5:22 UTC (permalink / raw)
To: Stefano Brivio, passt-dev; +Cc: David Gibson
Currently, outbound forwards (-T, -U) are handled by sockets bound to the
loopback address. Typically we create two sockets, one for 127.0.0.1 and
one for ::1.
This has some disadvantages:
* The guest can't connect via 127.0.0.0/8 addresses other than 127.0.0.1
* We can't use dual-stack sockets, we have to have separate sockets for
IPv4 and IPv6.
The restriction exists for a reason though. If the guest has any
interfaces other than pasta (e.g. a VPN tunnel) external hosts could reach
the host via the forwards. Especially combined with -T auto / -U auto this
would make it very easy to make a mistake with nasty security implications.
We can achieve this a different way, however. Don't bind to a specific
address, but _do_ use SO_BINDTODEVICE to restrict the sockets to the "lo"
interface.
Note that although traffic to a local but non-loopback address is passed
over the 'lo' interface (as seen by netfilter and dumpcap), it doesn't
count as attached to that interface for the purposes of SO_BINDTODEVICE
(information from the routing table overrides the "physical" interface).
So, this change doesn't help for bug 100.
It's also not a complete fix for bug 113, it does however:
* Get us a step closer to fixing bug 113
* Slightly simplify the code
* Make things a bit easier to allow more flexible binding on the guest in
in future
Link: https://bugs.passt.top/show_bug.cgi?id=100
Link: https://bugs.passt.top/show_bug.cgi?id=113
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
pif.c | 6 ------
tcp.c | 19 ++-----------------
udp.c | 10 +++-------
3 files changed, 5 insertions(+), 30 deletions(-)
diff --git a/pif.c b/pif.c
index 85904f35..db447b4f 100644
--- a/pif.c
+++ b/pif.c
@@ -81,12 +81,6 @@ int pif_sock_l4(const struct ctx *c, enum epoll_type type, uint8_t pif,
ASSERT(pif_is_socket(pif));
- if (pif == PIF_SPLICE) {
- /* Sanity checks */
- ASSERT(!ifname);
- ASSERT(addr && inany_is_loopback(addr));
- }
-
if (!addr) {
ref.fd = sock_l4_dualstack(c, type, port, ifname);
} else {
diff --git a/tcp.c b/tcp.c
index 2abb8be4..bd25952f 100644
--- a/tcp.c
+++ b/tcp.c
@@ -2618,21 +2618,6 @@ int tcp_sock_init(const struct ctx *c, uint8_t pif,
return 0;
}
-/**
- * tcp_ns_sock_init() - Init socket to listen for spliced outbound connections
- * @c: Execution context
- * @port: Port, host order
- */
-static void tcp_ns_sock_init(const struct ctx *c, in_port_t port)
-{
- ASSERT(!c->no_tcp);
-
- if (c->ifi4)
- tcp_sock_init_one(c, PIF_SPLICE, &inany_loopback4, NULL, port);
- if (c->ifi6)
- tcp_sock_init_one(c, PIF_SPLICE, &inany_loopback6, NULL, port);
-}
-
/**
* tcp_ns_socks_init() - Bind sockets in namespace for outbound connections
* @arg: Execution context
@@ -2651,7 +2636,7 @@ static int tcp_ns_socks_init(void *arg)
if (!bitmap_isset(c->tcp.fwd_out.map, port))
continue;
- tcp_ns_sock_init(c, port);
+ tcp_sock_init(c, PIF_SPLICE, NULL, "lo", port);
}
return 0;
@@ -2826,7 +2811,7 @@ static void tcp_port_rebind(struct ctx *c, bool outbound)
if ((c->ifi4 && socks[port][V4] == -1) ||
(c->ifi6 && socks[port][V6] == -1)) {
if (outbound)
- tcp_ns_sock_init(c, port);
+ tcp_sock_init(c, PIF_SPLICE, NULL, "lo", port);
else
tcp_sock_init(c, PIF_HOST, NULL, NULL, port);
}
diff --git a/udp.c b/udp.c
index 3d097fbb..f3d445cf 100644
--- a/udp.c
+++ b/udp.c
@@ -1213,14 +1213,10 @@ static void udp_port_rebind(struct ctx *c, bool outbound)
if ((c->ifi4 && socks[V4][port] == -1) ||
(c->ifi6 && socks[V6][port] == -1)) {
- if (outbound) {
- udp_sock_init(c, PIF_SPLICE,
- &inany_loopback4, NULL, port);
- udp_sock_init(c, PIF_SPLICE,
- &inany_loopback6, NULL, port);
- } else {
+ if (outbound)
+ udp_sock_init(c, PIF_SPLICE, NULL, "lo", port);
+ else
udp_sock_init(c, PIF_HOST, NULL, NULL, port);
- }
}
}
}
--
2.51.1
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v4 9/9] tcp, udp: Bind outbound listening sockets by interface instead of address
2025-11-19 5:22 ` [PATCH v4 9/9] tcp, udp: Bind outbound listening sockets by interface instead of address David Gibson
@ 2025-11-21 3:56 ` Stefano Brivio
2025-11-21 5:24 ` David Gibson
2025-11-26 5:42 ` David Gibson
0 siblings, 2 replies; 14+ messages in thread
From: Stefano Brivio @ 2025-11-21 3:56 UTC (permalink / raw)
To: David Gibson; +Cc: passt-dev
The series looks good to me in general, except that:
On Wed, 19 Nov 2025 16:22:57 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:
> Currently, outbound forwards (-T, -U) are handled by sockets bound to the
> loopback address. Typically we create two sockets, one for 127.0.0.1 and
> one for ::1.
>
> This has some disadvantages:
> * The guest can't connect via 127.0.0.0/8 addresses other than 127.0.0.1
> * We can't use dual-stack sockets, we have to have separate sockets for
> IPv4 and IPv6.
>
> The restriction exists for a reason though. If the guest has any
> interfaces other than pasta (e.g. a VPN tunnel) external hosts could reach
> the host via the forwards. Especially combined with -T auto / -U auto this
> would make it very easy to make a mistake with nasty security implications.
>
> We can achieve this a different way, however. Don't bind to a specific
> address, but _do_ use SO_BINDTODEVICE to restrict the sockets to the "lo"
> interface.
...this means, as I pointed out on:
https://archives.passt.top/passt-dev/20251022105916.53925523@elisabeth/
that we might break functionality for a number of pasta(1) users.
I don't have a complete version of the SO_BINDTODEVICE fallback I
sketched there, so I can't just add one on top of this series at the
moment, but we need something like that before I can merge this.
--
Stefano
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v4 9/9] tcp, udp: Bind outbound listening sockets by interface instead of address
2025-11-21 3:56 ` Stefano Brivio
@ 2025-11-21 5:24 ` David Gibson
2025-11-21 5:39 ` Stefano Brivio
2025-11-26 5:42 ` David Gibson
1 sibling, 1 reply; 14+ messages in thread
From: David Gibson @ 2025-11-21 5:24 UTC (permalink / raw)
To: Stefano Brivio; +Cc: passt-dev
[-- Attachment #1: Type: text/plain, Size: 1893 bytes --]
On Fri, Nov 21, 2025 at 04:56:01AM +0100, Stefano Brivio wrote:
> The series looks good to me in general, except that:
>
> On Wed, 19 Nov 2025 16:22:57 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
>
> > Currently, outbound forwards (-T, -U) are handled by sockets bound to the
> > loopback address. Typically we create two sockets, one for 127.0.0.1 and
> > one for ::1.
> >
> > This has some disadvantages:
> > * The guest can't connect via 127.0.0.0/8 addresses other than 127.0.0.1
> > * We can't use dual-stack sockets, we have to have separate sockets for
> > IPv4 and IPv6.
> >
> > The restriction exists for a reason though. If the guest has any
> > interfaces other than pasta (e.g. a VPN tunnel) external hosts could reach
> > the host via the forwards. Especially combined with -T auto / -U auto this
> > would make it very easy to make a mistake with nasty security implications.
> >
> > We can achieve this a different way, however. Don't bind to a specific
> > address, but _do_ use SO_BINDTODEVICE to restrict the sockets to the "lo"
> > interface.
>
> ...this means, as I pointed out on:
>
> https://archives.passt.top/passt-dev/20251022105916.53925523@elisabeth/
>
> that we might break functionality for a number of pasta(1) users.
>
> I don't have a complete version of the SO_BINDTODEVICE fallback I
> sketched there, so I can't just add one on top of this series at the
> moment, but we need something like that before I can merge this.
Yes, I was intending to make that change, but just forgot. I'll fix
this patch and resend. Are you intending to merge the rest of the
series, or should I respin it?
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v4 9/9] tcp, udp: Bind outbound listening sockets by interface instead of address
2025-11-21 5:24 ` David Gibson
@ 2025-11-21 5:39 ` Stefano Brivio
0 siblings, 0 replies; 14+ messages in thread
From: Stefano Brivio @ 2025-11-21 5:39 UTC (permalink / raw)
To: David Gibson; +Cc: passt-dev
On Fri, 21 Nov 2025 16:24:20 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:
> On Fri, Nov 21, 2025 at 04:56:01AM +0100, Stefano Brivio wrote:
> > The series looks good to me in general, except that:
> >
> > On Wed, 19 Nov 2025 16:22:57 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >
> > > Currently, outbound forwards (-T, -U) are handled by sockets bound to the
> > > loopback address. Typically we create two sockets, one for 127.0.0.1 and
> > > one for ::1.
> > >
> > > This has some disadvantages:
> > > * The guest can't connect via 127.0.0.0/8 addresses other than 127.0.0.1
> > > * We can't use dual-stack sockets, we have to have separate sockets for
> > > IPv4 and IPv6.
> > >
> > > The restriction exists for a reason though. If the guest has any
> > > interfaces other than pasta (e.g. a VPN tunnel) external hosts could reach
> > > the host via the forwards. Especially combined with -T auto / -U auto this
> > > would make it very easy to make a mistake with nasty security implications.
> > >
> > > We can achieve this a different way, however. Don't bind to a specific
> > > address, but _do_ use SO_BINDTODEVICE to restrict the sockets to the "lo"
> > > interface.
> >
> > ...this means, as I pointed out on:
> >
> > https://archives.passt.top/passt-dev/20251022105916.53925523@elisabeth/
> >
> > that we might break functionality for a number of pasta(1) users.
> >
> > I don't have a complete version of the SO_BINDTODEVICE fallback I
> > sketched there, so I can't just add one on top of this series at the
> > moment, but we need something like that before I can merge this.
>
> Yes, I was intending to make that change, but just forgot. I'll fix
> this patch and resend. Are you intending to merge the rest of the
> series, or should I respin it?
I guess it's marginally easier if you respin so that we have a single
link to the series, eventually, and I would run tests on the series as
a whole, not just up to 8/9.
--
Stefano
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v4 9/9] tcp, udp: Bind outbound listening sockets by interface instead of address
2025-11-21 3:56 ` Stefano Brivio
2025-11-21 5:24 ` David Gibson
@ 2025-11-26 5:42 ` David Gibson
1 sibling, 0 replies; 14+ messages in thread
From: David Gibson @ 2025-11-26 5:42 UTC (permalink / raw)
To: Stefano Brivio; +Cc: passt-dev
[-- Attachment #1: Type: text/plain, Size: 2279 bytes --]
On Fri, Nov 21, 2025 at 04:56:01AM +0100, Stefano Brivio wrote:
> The series looks good to me in general, except that:
>
> On Wed, 19 Nov 2025 16:22:57 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
>
> > Currently, outbound forwards (-T, -U) are handled by sockets bound to the
> > loopback address. Typically we create two sockets, one for 127.0.0.1 and
> > one for ::1.
> >
> > This has some disadvantages:
> > * The guest can't connect via 127.0.0.0/8 addresses other than 127.0.0.1
> > * We can't use dual-stack sockets, we have to have separate sockets for
> > IPv4 and IPv6.
> >
> > The restriction exists for a reason though. If the guest has any
> > interfaces other than pasta (e.g. a VPN tunnel) external hosts could reach
> > the host via the forwards. Especially combined with -T auto / -U auto this
> > would make it very easy to make a mistake with nasty security implications.
> >
> > We can achieve this a different way, however. Don't bind to a specific
> > address, but _do_ use SO_BINDTODEVICE to restrict the sockets to the "lo"
> > interface.
>
> ...this means, as I pointed out on:
>
> https://archives.passt.top/passt-dev/20251022105916.53925523@elisabeth/
>
> that we might break functionality for a number of pasta(1) users.
>
> I don't have a complete version of the SO_BINDTODEVICE fallback I
> sketched there, so I can't just add one on top of this series at the
> moment, but we need something like that before I can merge this.
I re-examined your proposed approach, but realised it doesn't quite
work. The problem is that to complete it, sock_l4_sa() would need to
create both an IPv4 and IPv6. That works right now, but it breaks the
assumption that tcp_sock_init() and udp_sock_init() create (at most) a
single socket. That wasn't the case until 8/9 in this series, but
part of the reason for 8/9 is because establishing that invariant
makes a bunch of stuff in the works much saner.
So, I'm working to figure out a different approach for an
SO_BINDTODEVICE fallback.
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2025-11-26 5:42 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-11-19 5:22 [PATCH v4 0/9] Reduce differences between inbound and outbound socket binding David Gibson
2025-11-19 5:22 ` [PATCH v4 1/9] flow: Remove bogus @path field from flowside_sock_args David Gibson
2025-11-19 5:22 ` [PATCH v4 2/9] inany: Let length of sockaddr_inany be implicit from the family David Gibson
2025-11-19 5:22 ` [PATCH v4 3/9] util, flow, pif: Simplify sock_l4_sa() interface David Gibson
2025-11-19 5:22 ` [PATCH v4 4/9] tcp: Merge tcp_ns_sock_init[46]() into tcp_sock_init_one() David Gibson
2025-11-19 5:22 ` [PATCH v4 5/9] udp: Unify some more inbound/outbound parts of udp_sock_init() David Gibson
2025-11-19 5:22 ` [PATCH v4 6/9] udp: Move udp_sock_init() special case to its caller David Gibson
2025-11-19 5:22 ` [PATCH v4 7/9] util: Fix setting of IPV6_V6ONLY socket option David Gibson
2025-11-19 5:22 ` [PATCH v4 8/9] tcp, udp: Remove fallback if creating dual stack socket fails David Gibson
2025-11-19 5:22 ` [PATCH v4 9/9] tcp, udp: Bind outbound listening sockets by interface instead of address David Gibson
2025-11-21 3:56 ` Stefano Brivio
2025-11-21 5:24 ` David Gibson
2025-11-21 5:39 ` Stefano Brivio
2025-11-26 5:42 ` David Gibson
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).