From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: passt-dev@passt.top
Subject: Re: [PATCH 3/3] tcp, udp: Bind outbound listening sockets by interface instead of address
Date: Tue, 21 Oct 2025 23:51:12 +0200
Message-ID: <20251021235112.17369db5@elisabeth>
In-Reply-To: <20251017003447.414103-4-david@gibson.dropbear.id.au>
On Fri, 17 Oct 2025 11:34:47 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:
> Currently, outbound forwards (-T, -U) are handled by sockets bound to the
> loopback address. Typically we create two sockets, one for 127.0.0.1 and
> one for ::1.
>
> This has some disadvantages:
> * The guest can't connect to these services using its global IP address,
> it must explicitly use 127.0.0.1 or ::1 (bug 100)
> * The guest can't even connect via 127.0.0.0/8 addresses other than
> 127.0.0.1
> * We can't use dual-stack sockets, we have to have separate sockets for
> IPv4 and IPv6.
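To illustrate the last two points, here is a minimal sketch, assuming Linux and glibc, of a single dual-stack UDP socket bound to [::] that receives a datagram sent over IPv4. The helpers dual_stack_udp() and ipv4_reaches_dual_stack() are hypothetical and not part of passt. Because the socket is bound to the unspecified address rather than to 127.0.0.1, a sender could equally target another 127.0.0.0/8 address such as 127.0.0.2.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: one UDP socket on [::]:port with IPV6_V6ONLY
 * cleared, so IPv4 traffic is delivered too, with IPv4-mapped source
 * addresses (::ffff:a.b.c.d). Returns the fd, or -1 on error. */
static int dual_stack_udp(in_port_t port)
{
	struct sockaddr_in6 a = { .sin6_family = AF_INET6,
				  .sin6_port = htons(port) };
	struct timeval tv = { .tv_sec = 1 };	/* don't block forever */
	int off = 0;
	int s = socket(AF_INET6, SOCK_DGRAM, 0);

	if (s < 0)
		return -1;
	a.sin6_addr = in6addr_any;
	if (setsockopt(s, IPPROTO_IPV6, IPV6_V6ONLY, &off, sizeof(off)) ||
	    setsockopt(s, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) ||
	    bind(s, (struct sockaddr *)&a, sizeof(a))) {
		close(s);
		return -1;
	}
	return s;
}

/* Hypothetical: send one IPv4 datagram to dst (dotted quad) and check
 * that the dual-stack socket receives it from an IPv4-mapped address.
 * Returns 0 on success, -1 otherwise. */
static int ipv4_reaches_dual_stack(const char *dst)
{
	struct sockaddr_in to = { .sin_family = AF_INET };
	struct sockaddr_in6 bound, from;
	socklen_t len = sizeof(bound);
	char buf[8];
	int rx, tx, ret = -1;

	if ((rx = dual_stack_udp(0)) < 0)	/* port 0: kernel picks one */
		return -1;
	if (getsockname(rx, (struct sockaddr *)&bound, &len) ||
	    inet_pton(AF_INET, dst, &to.sin_addr) != 1 ||
	    (tx = socket(AF_INET, SOCK_DGRAM, 0)) < 0)
		goto out;
	to.sin_port = bound.sin6_port;
	len = sizeof(from);
	if (sendto(tx, "hi", 2, 0, (struct sockaddr *)&to, sizeof(to)) == 2 &&
	    recvfrom(rx, buf, sizeof(buf), 0,
		     (struct sockaddr *)&from, &len) == 2 &&
	    IN6_IS_ADDR_V4MAPPED(&from.sin6_addr))
		ret = 0;
	close(tx);
out:
	close(rx);
	return ret;
}
```

With the previous scheme, the same experiment would need two sockets, and a socket bound to 127.0.0.1 would not see the datagram sent to 127.0.0.2.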
>
> The restriction exists for a reason, though. If the guest has any interfaces
> other than pasta (e.g. a VPN tunnel), external hosts could reach the host
> via the forwards. Especially combined with -T auto / -U auto, this would
> make it very easy to make a mistake with nasty security implications.
>
> We can achieve both goals, however, if we don't bind the outbound listening
> sockets to a particular address, but _do_ use SO_BINDTODEVICE to restrict
> them to the "lo" interface.
Nice trick; I hadn't thought of it. I wonder if doing the same on the host
side might help solve part of https://bugs.passt.top/show_bug.cgi?id=113
as well.
> Link: https://bugs.passt.top/show_bug.cgi?id=100
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> pif.c | 6 ------
> tcp.c | 18 ++----------------
> udp.c | 27 ++++++++++-----------------
> 3 files changed, 12 insertions(+), 39 deletions(-)
>
> diff --git a/pif.c b/pif.c
> index 592fafaa..84e3ceae 100644
> --- a/pif.c
> +++ b/pif.c
> @@ -87,12 +87,6 @@ int pif_sock_l4(const struct ctx *c, enum epoll_type type, uint8_t pif,
>
> ASSERT(pif_is_socket(pif));
>
> - if (pif == PIF_SPLICE) {
> - /* Sanity checks */
> - ASSERT(!ifname);
> - ASSERT(addr && inany_is_loopback(addr));
> - }
> -
> if (!addr)
> return sock_l4_sa(c, type, &sa, sizeof(sa.sa6),
> ifname, false, data);
> diff --git a/tcp.c b/tcp.c
> index 15c012d7..982c9190 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -2592,20 +2592,6 @@ int tcp_sock_init(const struct ctx *c, uint8_t pif,
>
> return r4 < 0 ? r4 : r6;
> }
> -/**
> - * tcp_ns_sock_init() - Init socket to listen for spliced outbound connections
> - * @c: Execution context
> - * @port: Port, host order
> - */
> -static void tcp_ns_sock_init(const struct ctx *c, in_port_t port)
> -{
> - ASSERT(!c->no_tcp);
> -
> - if (c->ifi4)
> - tcp_sock_init_one(c, PIF_SPLICE, &inany_loopback4, NULL, port);
> - if (c->ifi6)
> - tcp_sock_init_one(c, PIF_SPLICE, &inany_loopback6, NULL, port);
> -}
>
> /**
> * tcp_ns_socks_init() - Bind sockets in namespace for outbound connections
> @@ -2625,7 +2611,7 @@ static int tcp_ns_socks_init(void *arg)
> if (!bitmap_isset(c->tcp.fwd_out.map, port))
> continue;
>
> - tcp_ns_sock_init(c, port);
> + tcp_sock_init(c, PIF_SPLICE, NULL, "lo", port);
I thought the "lo" string would be part of the Linux UAPI, but that's
not the case, and loopback_net_init() just calls:
	alloc_netdev(0, "lo", NET_NAME_PREDICTABLE, loopback_setup);
so I think it's relatively unproblematic to hardcode that as well, and it
looks like we can't create a second loopback interface, even though we can
rename the existing one:
	$ pasta -- sh -c 'ip link set dev lo down; ip link change dev lo name lol; ip link show lol'
	1: lol: <LOOPBACK> mtu 65536 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
	    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
I don't have a quick solution, and I don't think we care enough to write
a function in netlink.c that fetches links of loopback type, so I'm
totally fine with this as it is.
By the way, if setting SO_BINDTODEVICE fails, we already defensively
close the socket. The only possible flaw I can think of is that
somebody could rename 'lo' and then create a link called 'lo' of a
different type. But that needs CAP_NET_ADMIN in the container anyway.
> }
>
> return 0;
> @@ -2805,7 +2791,7 @@ static void tcp_port_rebind(struct ctx *c, bool outbound)
> if ((c->ifi4 && socks[port][V4] == -1) ||
> (c->ifi6 && socks[port][V6] == -1)) {
> if (outbound)
> - tcp_ns_sock_init(c, port);
> + tcp_sock_init(c, PIF_SPLICE, NULL, "lo", port);
Should we keep a fallback for pre-5.7 (pre-c427bfec18f2) kernels, where
SO_BINDTODEVICE needs CAP_NET_RAW?
> else
> tcp_sock_init(c, PIF_HOST, NULL, NULL, port);
> }
> diff --git a/udp.c b/udp.c
> index 49dd0144..e38114eb 100644
> --- a/udp.c
> +++ b/udp.c
> @@ -1127,26 +1127,16 @@ int udp_sock_init(const struct ctx *c, uint8_t pif,
> }
>
> if ((!addr || inany_v4(addr)) && c->ifi4) {
> - const union inany_addr *a = addr ?
> - addr : &inany_any4;
> -
> - if (pif == PIF_SPLICE)
> - a = &inany_loopback4;
> -
> - r4 = pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, pif, a, ifname,
> + r4 = pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, pif,
> + addr ? addr : &inany_any4, ifname,
> port, uref.u32);
>
> socks[V4][port] = r4 < 0 ? -1 : r4;
> }
>
> if ((!addr || !inany_v4(addr)) && c->ifi6) {
> - const union inany_addr *a = addr ?
> - addr : &inany_any6;
> -
> - if (pif == PIF_SPLICE)
> - a = &inany_loopback6;
> -
> - r6 = pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, pif, a, ifname,
> + r6 = pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, pif,
> + addr ? addr : &inany_any6, ifname,
> port, uref.u32);
>
> socks[V6][port] = r6 < 0 ? -1 : r6;
> @@ -1214,9 +1204,12 @@ static void udp_port_rebind(struct ctx *c, bool outbound)
> continue;
>
> if ((c->ifi4 && socks[V4][port] == -1) ||
> - (c->ifi6 && socks[V6][port] == -1))
> - udp_sock_init(c, outbound ? PIF_SPLICE : PIF_HOST,
> - NULL, NULL, port);
> + (c->ifi6 && socks[V6][port] == -1)) {
> + if (outbound)
> + udp_sock_init(c, PIF_SPLICE, NULL, "lo", port);
> + else
> + udp_sock_init(c, PIF_HOST, NULL, NULL, port);
Same here: should we add a fallback case? The rest of the series looks
good to me.
> + }
> }
> }
>
--
Stefano