From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202510 header.b=Fb+/X82o; dkim-atps=neutral Received: from mail.ozlabs.org (gandalf.ozlabs.org [150.107.74.76]) by passt.top (Postfix) with ESMTPS id AA97E5A026F for ; Wed, 22 Oct 2025 02:34:52 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202510; t=1761093289; bh=/NxujcAoUDRcMcsEXS3HRmhl1yAxqOF1+HI3rmsbT3Y=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=Fb+/X82oVCH8WRTU69VBRoUEjFo8dIePzhDf1lj5A6bfhd4EjWXWdQ9eKrtJ4GnIp 9+kSFWk3jsP+4hN2lpywfDBS44G4P+uC+wgQLu63Z0/1Ekr+ismUouHJJSDS2CA+3R G9pwYAkxO40tGMqWYUs2RXVjtY7zXYC57FM5KJyhK1KksuBuo7WWlii3tGV02WyxCI oZnVpmYf/v1CAZelUOhIQgeYktuVhn2Uqb4l3rM5BCLCshOETGBxGTFtqb9Lv0dSCV 7keqbiy1X805Q1v3ymwhL7LrXY+8Wy6J/rogqGMiAcosHERGBT20hxe6PeSupl6P5l fWkBD53Ca6TnQ== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4crqtd2Hdfz4wD0; Wed, 22 Oct 2025 11:34:49 +1100 (AEDT) Date: Wed, 22 Oct 2025 11:34:40 +1100 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH 3/3] tcp, udp: Bind outbound listening sockets by interface instead of address Message-ID: References: <20251017003447.414103-1-david@gibson.dropbear.id.au> <20251017003447.414103-4-david@gibson.dropbear.id.au> <20251021235112.17369db5@elisabeth> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="+FC8SR8UpXj/dU7Z" Content-Disposition: inline In-Reply-To: <20251021235112.17369db5@elisabeth> Message-ID-Hash: S2EGE7MHIFNZLD6YSUFR4Z5FNP5CI2RS X-Message-ID-Hash: S2EGE7MHIFNZLD6YSUFR4Z5FNP5CI2RS X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --+FC8SR8UpXj/dU7Z Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Oct 21, 2025 at 11:51:12PM +0200, Stefano Brivio wrote: > On Fri, 17 Oct 2025 11:34:47 +1100 > David Gibson wrote: >=20 > > Currently, outbound forwards (-T, -U) are handled by sockets bound to t= he > > loopback address. Typically we create two sockets, one for 127.0.0.1 a= nd > > one for ::1. > >=20 > > This has some disadvantages: > > * The guest can't connect to these services using its global IP addres= s, > > it must explicitly use 127.0.0.1 or ::1 (bug 100) > > * The guest can't even connect via 127.0.0.0/8 addresses other than > > 127.0.0.1 > > * We can't use dual-stack sockets, we have to have separate sockets for > > IPv4 and IPv6. > >=20 > > The restriction exist for a reason though. If the guest has any interf= aces > > other than pasta (e.g. a VPN tunnel) external hosts could reach the host > > via the forwards. Especially combined with -T auto / -U auto this would > > make it very easy to make a mistake with nasty security implications. > >=20 > > We can achieve both goals, however, if we don't bind the outbound liste= ning > > sockets to a particular address, but _do_ use SO_BINDTODEVICE to restri= ct > > them to the "lo" interface. >=20 > Nice trick, I didn't think of it. I wonder if doing the same host-side > might help solving a part of https://bugs.passt.top/show_bug.cgi?id=3D113 > as well. I don't think we even need to do anything host side - bug 113 arises because of where we're listening on the guest side. So this might be enough to fix it all on its own. I'm not certain, because the special case DNS handling complicates the picture there. >=20 > > Link: https://bugs.passt.top/show_bug.cgi?id=3D100 > >=20 > > Signed-off-by: David Gibson > > --- > > pif.c | 6 ------ > > tcp.c | 18 ++---------------- > > udp.c | 27 ++++++++++----------------- > > 3 files changed, 12 insertions(+), 39 deletions(-) > >=20 > > diff --git a/pif.c b/pif.c > > index 592fafaa..84e3ceae 100644 > > --- a/pif.c > > +++ b/pif.c > > @@ -87,12 +87,6 @@ int pif_sock_l4(const struct ctx *c, enum epoll_type= type, uint8_t pif, > > =20 > > ASSERT(pif_is_socket(pif)); > > =20 > > - if (pif =3D=3D PIF_SPLICE) { > > - /* Sanity checks */ > > - ASSERT(!ifname); > > - ASSERT(addr && inany_is_loopback(addr)); > > - } > > - > > if (!addr) > > return sock_l4_sa(c, type, &sa, sizeof(sa.sa6), > > ifname, false, data); > > diff --git a/tcp.c b/tcp.c > > index 15c012d7..982c9190 100644 > > --- a/tcp.c > > +++ b/tcp.c > > @@ -2592,20 +2592,6 @@ int tcp_sock_init(const struct ctx *c, uint8_t p= if, > > =20 > > return r4 < 0 ? r4 : r6; > > } > > -/** > > - * tcp_ns_sock_init() - Init socket to listen for spliced outbound con= nections > > - * @c: Execution context > > - * @port: Port, host order > > - */ > > -static void tcp_ns_sock_init(const struct ctx *c, in_port_t port) > > -{ > > - ASSERT(!c->no_tcp); > > - > > - if (c->ifi4) > > - tcp_sock_init_one(c, PIF_SPLICE, &inany_loopback4, NULL, port); > > - if (c->ifi6) > > - tcp_sock_init_one(c, PIF_SPLICE, &inany_loopback6, NULL, port); > > -} > > =20 > > /** > > * tcp_ns_socks_init() - Bind sockets in namespace for outbound connec= tions > > @@ -2625,7 +2611,7 @@ static int tcp_ns_socks_init(void *arg) > > if (!bitmap_isset(c->tcp.fwd_out.map, port)) > > continue; > > =20 > > - tcp_ns_sock_init(c, port); > > + tcp_sock_init(c, PIF_SPLICE, NULL, "lo", port); >=20 > I thought the "lo" string would be part of the Linux UAPI, but that's > not the case, and loopback_net_init() just calls: >=20 > alloc_netdev(0, "lo", NET_NAME_PREDICTABLE, loopback_setup); >=20 > so I think it's relatively unproblematic to hardcode that as well, and it > looks like we can't create a second loopback interface, even though: >=20 > $ pasta -- sh -c 'ip link set dev lo down; ip link change dev lo name lol= ; ip link show lol' > 1: lol: mtu 65536 qdisc noqueue state DOWN mode DEFAULT group = default qlen 1000 > link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 Hm, that is a point. *thinks*. So I believe loopback always has index 1, so we could potentially use that. Except that BINDTODEVICE takes a name, not an index. I don't think looking up the name from the index then using BINDTODEVICE would do quite what we want either: IIUC, BINDTODEVICE is fixed by name not index, so if the guest changed the name of lo after we did BINDTODEVICE, then it wouldn't "follow" the interface name change (which is what you want for intermittently present interfaces, not so much here). > I don't have any quick solution and I don't think we care enough as to > write a function in netlink.c fetching links with loopback type, so I'm > totally fine with this as it is. Yeah, given the above, I think this is another case of we can only go so far to stop the guest shooting itself in the foot. > By the way, if we fail to use SO_BINDTODEVICE, we already defensively > close the socket. The only possible flaw that occurs to me is that > somebody could rename 'lo' and then create a link called 'lo' of a > different type. But that needs CAP_NET_ADMIN in the container anyway. Right. And while that could expose host ports in ways we didn't expect, a malicious guest could already do that by running a port forwarder. So, again, I think this falls under the category of the guest being allowed to shoot itself in the foot. > > } > > =20 > > return 0; > > @@ -2805,7 +2791,7 @@ static void tcp_port_rebind(struct ctx *c, bool o= utbound) > > if ((c->ifi4 && socks[port][V4] =3D=3D -1) || > > (c->ifi6 && socks[port][V6] =3D=3D -1)) { > > if (outbound) > > - tcp_ns_sock_init(c, port); > > + tcp_sock_init(c, PIF_SPLICE, NULL, "lo", port); >=20 > Should we have/keep a fallback for pre-5.7 / pre-c427bfec18f2 kernels? For a moment I thought we didn't need a fallback, because we'd be entering the guest ns and thereby gaining CAP_NET_RAW. But that's not the case: we only enter the guest netns for this operation, we're already in the userns and have dropped capabilities at this point (unlike the bindings we create at startup). So, good question. Having a fallback would make this a *lot* messier, and perhaps more importantly means we'd get a subtle but real behavioural difference based on kernel version which sounds like it could be pretty surprising to the user. My inclination is to say that -T auto / -U auto requires a kernel with that patch, but if you overrule me, I'll see what I can do. > > else > > tcp_sock_init(c, PIF_HOST, NULL, NULL, port); > > } > > diff --git a/udp.c b/udp.c > > index 49dd0144..e38114eb 100644 > > --- a/udp.c > > +++ b/udp.c > > @@ -1127,26 +1127,16 @@ int udp_sock_init(const struct ctx *c, uint8_t = pif, > > } > > =20 > > if ((!addr || inany_v4(addr)) && c->ifi4) { > > - const union inany_addr *a =3D addr ? > > - addr : &inany_any4; > > - > > - if (pif =3D=3D PIF_SPLICE) > > - a =3D &inany_loopback4; > > - > > - r4 =3D pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, pif, a, ifname, > > + r4 =3D pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, pif, > > + addr ? addr : &inany_any4, ifname, > > port, uref.u32); > > =20 > > socks[V4][port] =3D r4 < 0 ? -1 : r4; > > } > > =20 > > if ((!addr || !inany_v4(addr)) && c->ifi6) { > > - const union inany_addr *a =3D addr ? > > - addr : &inany_any6; > > - > > - if (pif =3D=3D PIF_SPLICE) > > - a =3D &inany_loopback6; > > - > > - r6 =3D pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, pif, a, ifname, > > + r6 =3D pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, pif, > > + addr ? addr : &inany_any6, ifname, > > port, uref.u32); > > =20 > > socks[V6][port] =3D r6 < 0 ? -1 : r6; > > @@ -1214,9 +1204,12 @@ static void udp_port_rebind(struct ctx *c, bool = outbound) > > continue; > > =20 > > if ((c->ifi4 && socks[V4][port] =3D=3D -1) || > > - (c->ifi6 && socks[V6][port] =3D=3D -1)) > > - udp_sock_init(c, outbound ? PIF_SPLICE : PIF_HOST, > > - NULL, NULL, port); > > + (c->ifi6 && socks[V6][port] =3D=3D -1)) { > > + if (outbound) > > + udp_sock_init(c, PIF_SPLICE, NULL, "lo", port); > > + else > > + udp_sock_init(c, PIF_HOST, NULL, NULL, port); >=20 > Same here, should we add a fallback case? The rest of the series looks > good to me. Same comments as for TCP. --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --+FC8SR8UpXj/dU7Z Content-Type: application/pgp-signature; name=signature.asc -----BEGIN PGP SIGNATURE----- iQIzBAEBCgAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmj4Jp8ACgkQzQJF27ox 2Ge3iQ//a1MU7PGtaimOonMfIc0M3JxOXnBkwuFvuk1FBIRSiKmS+VxrxfJfdyn7 zvSQkn9JK/1YP9X+gOlezTvKagpJjPBP2Zrm8vcLSgnVUp5494Vp96sXb2jOuLz1 Hp5Jy4JkqxWd21cJqxRFn6vOFEjjxw4D0hrq0oKQOgvKESkbNhgAmf5jTovOF8+M HWgiVo2Ras7wgqr/vLTJvpYcCIAMp1SIXkvT4gx9UbsPMEt2tjeBtc9tARF3r34a H0drScsYBCXzSm5MZdRknErcMNsO9gjx3MZ3teSlMyWDzpY6pEnP+i6Tib2F32lK VsHGmjui3mTeL605foB2QP9NH/p0LIhyQdQi8t3MznD+6uOtM/WqoXJvoFK8kaq+ ZXJPNi+vglWy66AEZEuQ28HvsfMt7Jq8d1qyPjIRHl3hyXXkZuSzdjM/+S6ViOIK nmeg4kr7FHYkqlnXnWtSInqAiF7m4iQaNLcAYWn4PkuS1k2IvoVpqj9r19iog69T wTtUXVHEgipO9f13GZeQJXUCYuBAQZ1FOWq4TsMIbNyCU2DzIYng7to6Gxr/uOG8 dmzxL8wL0M5l0BNT7CLKJhB3O9zRN8qtfzr96/G/f885F5XjOILc5tdSIToWwNtS Lp5n98oEzcVdHdS0AHMshX8/aKwsTVuNoYFgVcCmzAKXfDXNZhE= =NrtF -----END PGP SIGNATURE----- --+FC8SR8UpXj/dU7Z--