From: David Gibson
To: Stefano Brivio
Cc: passt-dev@passt.top
Date: Wed, 16 Oct 2024 19:39:40 +1100
Subject: Re: [PATCH v3 4/4] fwd: Direct inbound spliced forwards to the guest's external address
References: <20241002054826.1812844-1-david@gibson.dropbear.id.au> <20241002054826.1812844-5-david@gibson.dropbear.id.au> <20241009150721.63af48f6@elisabeth> <20241009224433.7fc28fc7@elisabeth>

On Wed, Oct 16, 2024 at 04:46:52PM +1100, David Gibson wrote:
> On Wed, Oct 16, 2024 at 02:15:19PM +1100, David Gibson wrote:
> > On Thu, Oct 10, 2024 at 04:57:32PM +1100, David Gibson wrote:
> > > On Wed, Oct 09, 2024 at 10:44:33PM +0200, Stefano Brivio wrote:
> > > > On Wed, 9 Oct 2024 15:07:21 +0200
> > > > Stefano Brivio wrote:
> > [snip]
> > > > > > @@ -447,20 +447,35 @@ uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto,
> > > > > >  	    (proto == IPPROTO_TCP || proto == IPPROTO_UDP)) {
> > > > > >  		/* spliceable */
> > > > > >
> > > > > > -		/* Preserve the specific loopback adddress used, but let the
> > > > > > -		 * kernel pick a source port on the target side
> > > > > > +		/* The traffic will go over the guest's 'lo' interface, but by
> > > > > > +		 * default use its external address, so we don't inadvertently
> > > > > > +		 * expose services that listen only on the guest's loopback
> > > > > > +		 * address.  That can be overridden by --host-lo-to-ns-lo which
> > > > > > +		 * will instead forward to the loopback address in the guest.
> > > > > > +		 *
> > > > > > +		 * In either case, let the kernel pick the source address to
> > > > > > +		 * match.
> > > > > >  		 */
> > > > > > -		tgt->oaddr = ini->eaddr;
> > > > > > +		if (inany_v4(&ini->eaddr)) {
> > > > > > +			if (c->host_lo_to_ns_lo)
> > > > > > +				tgt->eaddr = inany_loopback4;
> > > > > > +			else
> > > > > > +				tgt->eaddr = inany_from_v4(c->ip4.addr_seen);
> > > > > > +			tgt->oaddr = inany_any4;
> > > > > > +		} else {
> > > > > > +			if (c->host_lo_to_ns_lo)
> > > > > > +				tgt->eaddr = inany_loopback6;
> > > > > > +			else
> > > > > > +				tgt->eaddr.a6 = c->ip6.addr_seen;
> > > > >
> > > > > Either this...
> > > > >
> > > > > > +			tgt->oaddr = inany_any6;
> > > > >
> > > > > or this (and not something before this patch, up to 3/4) make the
> > > > > "TCP/IPv6: host to ns (spliced): big transfer" test in pasta/tcp hang,
> > > > > sometimes (about one in three/four runs), that's what I mistakenly
> > > > > reported as coming from Laurent's series at:
> > >
> > > Huh, interesting.  Just got back from my leave and ran that group of
> > > tests in a loop this afternoon, but didn't manage to reproduce.  I
> > > have administrivia that will probably fill the rest of this week, but
> > > I'll look into this as soon as I can.
> >
> > I reproduced the problem on passt.top, and I have a partial idea
> > what's going on.  As you say it's seeming like the address (addr_seen
> > == addr in this case) isn't properly ready.  This is over splice, but
> > on the tap interface, I see the container sending NS messages for its
> > own address - seems like it's doing DAD.  But more importantly, we're
> > answering those NS messages with NA messages, because we answer all
> > NS.  i.e. we're making the DAD fail.  What I'm not sure of is how this
> > ever worked at all.  --config-net makes sense, since we disable DAD,
> > but our test suite has always been using NDP+DHCP instead of
> > --config-net.
> >
> > So, AFAICT, we'll always fail guest DAD attempts, both for IPv6, which
> > happens most of the time, and for IPv4 via ARP, which is used much more
> > rarely.  I think we need to be more selective in what NS or ARP
> > lookups we respond to.  The question is what approach to take:
>
> Hmm... no.. there's more to this.
>
> Usually DAD requests have :: as the source address, and we *do*
> exclude those from getting replies.  In this case though, we're
> getting NS requests for the assigned address from what looks like the
> SLAAC address.  So, I do think it would be wise to explicitly exclude
> these: we shouldn't be giving NA responses for an address that ought
> to belong to the guest, even if it doesn't look like a DAD.
>
> But, I'm not sure what's triggering this.  Is for some reason the DHCP
> address not "taking", so the container is trying to locate it on the
> network instead?  Or _is_ this DAD, but under some circumstances
> rather than using :: as the source address it uses another configured
> address?

Ok.. I've understood a bit more.  While timing is a factor here, it
looks like the main reason I wasn't seeing it on my machine is what
I'd consider a bug in the Debian version of the dhclient-script: when
adding an IPv6 address, it returns without waiting for DAD to complete
(i.e. for the address to be non-tentative).

There's also an additional bug, which doesn't cause this problem, I
think, but caused some problems when I was investigating.  DHCPv6
needs the link-local SLAAC address already configured and
non-tentative.  The Fedora dhclient-script waits for that too at the
PREINIT6 stage, but the Debian one doesn't, meaning if you attempt
dhclient -6 immediately after starting the namespace it will fail to
bind the UDP address it needs.

I still think it's a good idea not to give NA messages for the guest
assigned address, but we'll need a different workaround for this
issue.
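
To make the NA point concrete, here is a minimal sketch in C of the
kind of check I mean.  It is not passt's actual NDP code:
ns_should_reply() and ns_should_reply_str() are hypothetical names, and
the guest address is passed in explicitly instead of being read from
the context structure.  It declines to answer a Neighbour Solicitation
when the source is :: (the usual DAD case) or when the target is the
address assigned to the guest.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not taken from passt: decide whether to answer a
 * Neighbour Solicitation with a Neighbour Advertisement.
 */
static bool ns_should_reply(const struct in6_addr *src,
			    const struct in6_addr *target,
			    const struct in6_addr *guest)
{
	/* Typical DAD probe: unspecified source address, never answer */
	if (IN6_IS_ADDR_UNSPECIFIED(src))
		return false;

	/* Target is the guest's own address: answering would claim it */
	if (!memcmp(target, guest, sizeof(*target)))
		return false;

	return true;
}

/* Convenience wrapper taking textual addresses, for experiments only */
static bool ns_should_reply_str(const char *src, const char *target,
				const char *guest)
{
	struct in6_addr s, t, g;
	int ok;

	ok = inet_pton(AF_INET6, src, &s) == 1 &&
	     inet_pton(AF_INET6, target, &t) == 1 &&
	     inet_pton(AF_INET6, guest, &g) == 1;
	assert(ok);	/* all three must be valid IPv6 literals */

	return ns_should_reply(&s, &t, &g);
}
```

With the guest at 2001:db8::2 (a documentation address, picked only for
the example), a solicitation for that address goes unanswered whether
it comes from :: or from another configured address such as fe80::1,
while a solicitation for some other target still gets a reply.
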
I guess we'll have to manually wait for DAD to complete in the DHCP
tests, which will be kind of mucky :/

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson
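
PS: for the manual DAD wait, a rough sketch of what the check could
look like, written in C to match the patch, not as an actual test
script.  It assumes the Linux /proc/net/if_inet6 layout (address,
ifindex, prefix length, scope, flags, device name, all but the name in
hex) and that bit 0x40 of the flags field is IFA_F_TENTATIVE.  The
function names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same value as IFA_F_TENTATIVE in <linux/if_addr.h> */
#define FLAG_TENTATIVE	0x40

/* Hypothetical helper: parse one /proc/net/if_inet6 line.  Return 1 if
 * it describes a tentative address on @ifname, 0 if not, -1 on parse
 * error.
 */
static int line_tentative(const char *line, const char *ifname)
{
	unsigned int ifindex, plen, scope, flags;
	char addr[33], dev[17];

	assert(line && ifname);
	if (sscanf(line, "%32s %x %x %x %x %16s",
		   addr, &ifindex, &plen, &scope, &flags, dev) != 6)
		return -1;

	return !strcmp(dev, ifname) && (flags & FLAG_TENTATIVE);
}

/* Hypothetical helper: @table holds the full text of an if_inet6-style
 * file.  Return 1 while any address on @ifname is still tentative.
 */
static int dad_pending(const char *table, const char *ifname)
{
	while (*table) {
		size_t len = strcspn(table, "\n");
		char line[128];

		/* Copy one line out so sscanf() can't run into the next */
		if (len < sizeof(line)) {
			memcpy(line, table, len);
			line[len] = '\0';
			if (line_tentative(line, ifname) == 1)
				return 1;
		}

		table += len;
		if (*table == '\n')
			table++;
	}

	return 0;
}
```

A test would re-read /proc/net/if_inet6 inside the namespace and retry,
with a timeout, until dad_pending() returns 0 before starting dhclient.
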