Date: Tue, 21 Oct 2025 23:51:12 +0200
From: Stefano Brivio
To: David Gibson
Subject: Re: [PATCH 3/3] tcp, udp: Bind outbound listening sockets by interface instead of address
Message-ID: <20251021235112.17369db5@elisabeth>
In-Reply-To: <20251017003447.414103-4-david@gibson.dropbear.id.au>
References: <20251017003447.414103-1-david@gibson.dropbear.id.au> <20251017003447.414103-4-david@gibson.dropbear.id.au>
Organization: Red Hat
CC: passt-dev@passt.top
List-Id: Development discussion and patches for passt

On Fri, 17 Oct 2025 11:34:47 +1100
David Gibson wrote:

> Currently, outbound forwards (-T, -U) are handled by sockets bound to the
> loopback address. Typically we create two sockets, one for 127.0.0.1 and
> one for ::1.
>
> This has some disadvantages:
>  * The guest can't connect to these services using its global IP address,
>    it must explicitly use 127.0.0.1 or ::1 (bug 100)
>  * The guest can't even connect via 127.0.0.0/8 addresses other than
>    127.0.0.1
>  * We can't use dual-stack sockets, we have to have separate sockets for
>    IPv4 and IPv6.
>
> The restriction exists for a reason though. If the guest has any interfaces
> other than pasta (e.g. a VPN tunnel) external hosts could reach the host
> via the forwards. Especially combined with -T auto / -U auto this would
> make it very easy to make a mistake with nasty security implications.
>
> We can achieve both goals, however, if we don't bind the outbound listening
> sockets to a particular address, but _do_ use SO_BINDTODEVICE to restrict
> them to the "lo" interface.

Nice trick, I didn't think of it. I wonder if doing the same host-side
might help solve part of https://bugs.passt.top/show_bug.cgi?id=113
as well.

> Link: https://bugs.passt.top/show_bug.cgi?id=100
>
> Signed-off-by: David Gibson
> ---
>  pif.c |  6 ------
>  tcp.c | 18 ++----------------
>  udp.c | 27 ++++++++++-----------------
>  3 files changed, 12 insertions(+), 39 deletions(-)
>
> diff --git a/pif.c b/pif.c
> index 592fafaa..84e3ceae 100644
> --- a/pif.c
> +++ b/pif.c
> @@ -87,12 +87,6 @@ int pif_sock_l4(const struct ctx *c, enum epoll_type type, uint8_t pif,
>
>  	ASSERT(pif_is_socket(pif));
>
> -	if (pif == PIF_SPLICE) {
> -		/* Sanity checks */
> -		ASSERT(!ifname);
> -		ASSERT(addr && inany_is_loopback(addr));
> -	}
> -
>  	if (!addr)
>  		return sock_l4_sa(c, type, &sa, sizeof(sa.sa6),
>  				  ifname, false, data);
> diff --git a/tcp.c b/tcp.c
> index 15c012d7..982c9190 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -2592,20 +2592,6 @@ int tcp_sock_init(const struct ctx *c, uint8_t pif,
>
>  	return r4 < 0 ? r4 : r6;
>  }
> -/**
> - * tcp_ns_sock_init() - Init socket to listen for spliced outbound connections
> - * @c:		Execution context
> - * @port:	Port, host order
> - */
> -static void tcp_ns_sock_init(const struct ctx *c, in_port_t port)
> -{
> -	ASSERT(!c->no_tcp);
> -
> -	if (c->ifi4)
> -		tcp_sock_init_one(c, PIF_SPLICE, &inany_loopback4, NULL, port);
> -	if (c->ifi6)
> -		tcp_sock_init_one(c, PIF_SPLICE, &inany_loopback6, NULL, port);
> -}
>
>  /**
>   * tcp_ns_socks_init() - Bind sockets in namespace for outbound connections
> @@ -2625,7 +2611,7 @@ static int tcp_ns_socks_init(void *arg)
>  		if (!bitmap_isset(c->tcp.fwd_out.map, port))
>  			continue;
>
> -		tcp_ns_sock_init(c, port);
> +		tcp_sock_init(c, PIF_SPLICE, NULL, "lo", port);

I thought the "lo" string would be part of the Linux UAPI, but that's not
the case, and loopback_net_init() just calls:

	alloc_netdev(0, "lo", NET_NAME_PREDICTABLE, loopback_setup);

so I think it's relatively unproblematic to hardcode that as well, and
it looks like we can't create a second loopback interface, even though:

  $ pasta -- sh -c 'ip link set dev lo down; ip link change dev lo name lol; ip link show lol'
  1: lol: mtu 65536 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
      link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

I don't have any quick solution and I don't think we care enough to
write a function in netlink.c fetching links with loopback type, so I'm
totally fine with this as it is.

By the way, if we fail to use SO_BINDTODEVICE, we already defensively
close the socket.

The only possible flaw that occurs to me is that somebody could rename
'lo' and then create a link called 'lo' of a different type. But that
needs CAP_NET_ADMIN in the container anyway.
>  	}
>
>  	return 0;
> @@ -2805,7 +2791,7 @@ static void tcp_port_rebind(struct ctx *c, bool outbound)
>  		if ((c->ifi4 && socks[port][V4] == -1) ||
>  		    (c->ifi6 && socks[port][V6] == -1)) {
>  			if (outbound)
> -				tcp_ns_sock_init(c, port);
> +				tcp_sock_init(c, PIF_SPLICE, NULL, "lo", port);

Should we have/keep a fallback for pre-5.7 / pre-c427bfec18f2 kernels?

>  			else
>  				tcp_sock_init(c, PIF_HOST, NULL, NULL, port);
>  		}
> diff --git a/udp.c b/udp.c
> index 49dd0144..e38114eb 100644
> --- a/udp.c
> +++ b/udp.c
> @@ -1127,26 +1127,16 @@ int udp_sock_init(const struct ctx *c, uint8_t pif,
>  	}
>
>  	if ((!addr || inany_v4(addr)) && c->ifi4) {
> -		const union inany_addr *a = addr ?
> -			addr : &inany_any4;
> -
> -		if (pif == PIF_SPLICE)
> -			a = &inany_loopback4;
> -
> -		r4 = pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, pif, a, ifname,
> +		r4 = pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, pif,
> +				 addr ? addr : &inany_any4, ifname,
>  				 port, uref.u32);
>
>  		socks[V4][port] = r4 < 0 ? -1 : r4;
>  	}
>
>  	if ((!addr || !inany_v4(addr)) && c->ifi6) {
> -		const union inany_addr *a = addr ?
> -			addr : &inany_any6;
> -
> -		if (pif == PIF_SPLICE)
> -			a = &inany_loopback6;
> -
> -		r6 = pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, pif, a, ifname,
> +		r6 = pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, pif,
> +				 addr ? addr : &inany_any6, ifname,
>  				 port, uref.u32);
>
>  		socks[V6][port] = r6 < 0 ? -1 : r6;
> @@ -1214,9 +1204,12 @@ static void udp_port_rebind(struct ctx *c, bool outbound)
>  			continue;
>
>  		if ((c->ifi4 && socks[V4][port] == -1) ||
> -		    (c->ifi6 && socks[V6][port] == -1))
> -			udp_sock_init(c, outbound ? PIF_SPLICE : PIF_HOST,
> -				      NULL, NULL, port);
> +		    (c->ifi6 && socks[V6][port] == -1)) {
> +			if (outbound)
> +				udp_sock_init(c, PIF_SPLICE, NULL, "lo", port);
> +			else
> +				udp_sock_init(c, PIF_HOST, NULL, NULL, port);

Same here, should we add a fallback case?

The rest of the series looks good to me.

> +		}
>  	}
>  }
>

-- 
Stefano