From mboxrd@z Thu Jan 1 00:00:00 1970
Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au
Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202410 header.b=DTo9CEwd; dkim-atps=neutral
Received: from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3])
	by passt.top (Postfix) with ESMTPS id 9F86C5A004C
	for ; Wed, 16 Oct 2024 06:10:09 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=gibson.dropbear.id.au; s=202410; t=1729051801;
	bh=bpYvpIEVgtTlddcUE+VA+7YZRIqcMuZRvDoY6I+e4Go=;
	h=Date:From:To:Cc:Subject:References:In-Reply-To:From;
	b=DTo9CEwdmknQWCYE5ZFz4mMvJmuBNf6Zo1JPmi/iXOZdvoXJ4OnmxJp078+oaDxS2
	 tW+SubCeANfcTpG3UJE4rELkZZLzw/lP9a62ri0QHJa6ht+kSfcKdyXhmr6I6aqM6E
	 dgl5hLtdn49YY+BqprDukX/QNXclfInM7sA8+aW1eiob+wKhvkE4WzkTNoyZCsmq9b
	 1/gbHF7ojJtsIzWl/dhzSrahqi40FD7mVN24UyaWNHD5+8a7CDbjtJJRjhu2QjrJaV
	 iFt/bgyieOoaeAWi6CEK8Y+0kNBuqtDqOKO2PRqpkWqELtnRtTshS9UJRO4YVvYhuE
	 0ewDI1LnZ17Sg==
Received: by gandalf.ozlabs.org (Postfix, from userid 1007)
	id 4XSyD90BFhz4wd6; Wed, 16 Oct 2024 15:10:01 +1100 (AEDT)
Date: Wed, 16 Oct 2024 14:15:19 +1100
From: David Gibson
To: Stefano Brivio
Subject: Re: [PATCH v3 4/4] fwd: Direct inbound spliced forwards to the guest's external address
References: <20241002054826.1812844-1-david@gibson.dropbear.id.au>
	<20241002054826.1812844-5-david@gibson.dropbear.id.au>
	<20241009150721.63af48f6@elisabeth>
	<20241009224433.7fc28fc7@elisabeth>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha256;
	protocol="application/pgp-signature"; boundary="/KJ6OI8/vjdnzGUC"
Content-Disposition: inline
Message-ID-Hash: MKCV46WDVC5YUDEGLRL7F6OHIHRQ6TM6
X-Message-ID-Hash: MKCV46WDVC5YUDEGLRL7F6OHIHRQ6TM6
X-MailFrom: dgibson@gandalf.ozlabs.org
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header
CC: passt-dev@passt.top
X-Mailman-Version: 3.3.8
Precedence: list
List-Id: Development discussion and patches for passt

--/KJ6OI8/vjdnzGUC
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: 7bit

On Thu, Oct 10, 2024 at 04:57:32PM +1100, David Gibson wrote:
> On Wed, Oct 09, 2024 at 10:44:33PM +0200, Stefano Brivio wrote:
> > On Wed, 9 Oct 2024 15:07:21 +0200
> > Stefano Brivio wrote:
[snip]
> > > > @@ -447,20 +447,35 @@ uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto,
> > > >  		   (proto == IPPROTO_TCP || proto == IPPROTO_UDP)) {
> > > >  		/* spliceable */
> > > >
> > > > -		/* Preserve the specific loopback adddress used, but let the
> > > > -		 * kernel pick a source port on the target side
> > > > +		/* The traffic will go over the guest's 'lo' interface, but by
> > > > +		 * default use its external address, so we don't inadvertently
> > > > +		 * expose services that listen only on the guest's loopback
> > > > +		 * address.  That can be overridden by --host-lo-to-ns-lo which
> > > > +		 * will instead forward to the loopback address in the guest.
> > > > +		 *
> > > > +		 * In either case, let the kernel pick the source address to
> > > > +		 * match.
> > > >  		 */
> > > > -		tgt->oaddr = ini->eaddr;
> > > > +		if (inany_v4(&ini->eaddr)) {
> > > > +			if (c->host_lo_to_ns_lo)
> > > > +				tgt->eaddr = inany_loopback4;
> > > > +			else
> > > > +				tgt->eaddr = inany_from_v4(c->ip4.addr_seen);
> > > > +			tgt->oaddr = inany_any4;
> > > > +		} else {
> > > > +			if (c->host_lo_to_ns_lo)
> > > > +				tgt->eaddr = inany_loopback6;
> > > > +			else
> > > > +				tgt->eaddr.a6 = c->ip6.addr_seen;
> > >
> > > Either this...
> > >
> > > > +			tgt->oaddr = inany_any6;
> > >
> > > or this (and not something before this patch, up to 3/4) make the
> > > "TCP/IPv6: host to ns (spliced): big transfer" test in pasta/tcp hang,
> > > sometimes (about one in three/four runs), that's what I mistakenly
> > > reported as coming from Laurent's series at:
>
> Huh, interesting. Just got back from my leave and ran that group of
> tests in a loop this afternoon, but didn't manage to reproduce. I
> have administrivia that will probably fill the rest of this week, but
> I'll look into this as soon as I can.

I reproduced the problem on passt.top, and I have a partial idea
what's going on.

As you say, it looks like the address (addr_seen == addr in this
case) isn't properly ready. This is over splice, but on the tap
interface, I see the container sending NS messages for its own address
- seems like it's doing DAD. But more importantly, we're answering
those NS messages with NA messages, because we answer all NS.
i.e. we're making the DAD fail.

What I'm not sure of is how this ever worked at all. With --config-net
it makes sense, since we disable DAD, but our test suite has always
been using NDP+DHCP instead of --config-net.

So, AFAICT, we'll always fail guest DAD attempts, both for IPv6, where
DAD happens most of the time, and for IPv4 via ARP, where it's used
much more rarely. I think we need to be more selective in which NS or
ARP lookups we respond to. The question is what approach to take:

Option 1: Answer everything apart from specific exceptions
========

Basically we'd explicitly exclude the guest/container's assigned
address but continue answering everything else. This is a bit ugly
and means we'd probably still have a similar problem if the guest just
picks an address instead of taking the assigned one. On the other
hand it means things will work in at least some cases where the guest
tries to contact remote hosts directly instead of via the default
gateway.
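
To make the choice concrete, here's a minimal sketch in C of the
per-request decision. This is not passt code: ns_policy, ns_ctx and
ns_should_answer() are hypothetical names made up for illustration,
and the "ours" array is an assumed stand-in for whatever addresses we
decide are our responsibility. NS_ANSWER_ALL_EXCEPT_GUEST models
Option 1; NS_ANSWER_ONLY_OURS models the reverse, allowlist-style
policy discussed as Option 2 below.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <netinet/in.h>

/* Hypothetical policy selector, not a passt identifier */
enum ns_policy {
	NS_ANSWER_ALL_EXCEPT_GUEST,	/* Option 1: exclude guest address */
	NS_ANSWER_ONLY_OURS,		/* Option 2: allowlist only */
};

/* Hypothetical, simplified stand-in for the relevant context fields */
struct ns_ctx {
	struct in6_addr guest;		/* address assigned to the guest */
	struct in6_addr ours[4];	/* addresses we answer for */
	size_t n_ours;			/* number of valid entries in ours[] */
};

/* Compare two IPv6 addresses byte by byte */
static bool addr6_eq(const struct in6_addr *a, const struct in6_addr *b)
{
	return !memcmp(a, b, sizeof(*a));
}

/* Decide whether to send an NA in reply to an NS for @target */
bool ns_should_answer(const struct ns_ctx *c, enum ns_policy policy,
		      const struct in6_addr *target)
{
	size_t i;

	assert(c && target);

	/* Option 1: answer anything that isn't the guest's own address,
	 * so the guest's DAD probe for that address goes unanswered
	 */
	if (policy == NS_ANSWER_ALL_EXCEPT_GUEST)
		return !addr6_eq(target, &c->guest);

	/* Option 2: answer only for addresses we explicitly own */
	for (i = 0; i < c->n_ours; i++)
		if (addr6_eq(target, &c->ours[i]))
			return true;

	return false;
}
```

Either way, an NS for the guest's assigned address gets no NA. The
difference is only in what happens for every other target: answered
under the first policy, ignored under the second unless listed.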

Option 2: Only answer for specific addresses
=========

Reverse the logic above: usually *don't* answer NS queries, unless
they target a specific address that we know is our responsibility. That
would be, afaict, the gateway address, the "our_tap" addresses and the
--map-host-loopback and --map-guest-addr addresses (usually all the
same).

This might be less robust against certain guests that don't use a
default gateway when we expect them to. I guess that could happen when
we have non-gateway routes copied from the host, so we'd have to handle
that too.

It would remove one more barrier towards having multiple bridged
guests behind a single passt/pasta instance (they can't reasonably
communicate with each other if passt answers all their NS / ARP
requests).

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

--/KJ6OI8/vjdnzGUC
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmcPL7UACgkQzQJF27ox
2GfGnw/9GdmaMCzcPJz2vo+Fl9hL0Zeyx5/E+eLInqAx/07qDHYgzBMPtNO9v6Jz
l/vwmr30nSjgDdOmhBHpoS/Kqx9+VJFTSkYDXE58klR5PlEmz6zKXY8Fy8BAjqDb
6IdTZRX6gqkxTiWU5Gu4B19taoNj+VqjEPHXAmZp0a9R9fWWccrj/vEIJAolpjyp
3aigweOAYg1YssQTIQn6Ar8BwF82wgr7wKy7CYgBJlSIQu/9MD1LBhABQ2cd+kRC
ySXP00GkVzK2D8MO9rR5V8k1s9AyWXj0+nGJnQinH2uMa6c6mxpEdKnQg35qm4Yr
gkfa6DSseBBhbxHz/xThyLwuXheCuNYWz81ZcTD2brIAMihhmFuqWQqCPr5m4OWC
CFnr4WuRonPC5F3Vgq9XpqC1ed8ZNtI7KDV5Q4BSd7Huxq/AFXXc+Cb1xzhmF102
BEG6lhxlKDKeTVCGg31wphvUqoTcA0nBf032nlh2169WOPyzwidprg3AzAsYPzCI
WcrqbUITC8fgulDyjnvxiLML2m66GVjgYK9+v5mnnthhAHMSgC1Jo4PwBGwQNVcd
PJWVx9Ooqr8WXJLBqUmwnUVE1LH4G4wgmTCIlfAJEcj5dpnsxyPCjeSmngX0fF/S
v0VlA+KVlRXK8HbiVFIbPRUpSOaQVFKGSqAarvvEoZd/VNdabug=
=13FQ
-----END PGP SIGNATURE-----

--/KJ6OI8/vjdnzGUC--