Date: Mon, 21 Oct 2024 12:35:33 +1100
From: David Gibson
To: Stefano Brivio
Subject: Re: [PATCH v3 4/4] fwd: Direct inbound spliced forwards to the guest's external address
CC: passt-dev@passt.top

On Thu, Oct 17, 2024 at 10:31:22AM +0200, Stefano Brivio wrote:
> On Thu, 17 Oct 2024 12:19:58 +1100
> David Gibson wrote:
>
> > On Wed, Oct 16, 2024 at 05:26:48PM +0200, Stefano Brivio wrote:
> > > On Wed, 16 Oct 2024 19:39:40 +1100
> > > David Gibson wrote:
> > >
> > > > On Wed, Oct 16, 2024 at 04:46:52PM +1100, David Gibson wrote:
> > > > > On Wed, Oct 16, 2024 at 02:15:19PM +1100, David Gibson wrote:
> > > > > > On Thu, Oct 10, 2024 at 04:57:32PM +1100, David Gibson wrote:
> > > > > > > On Wed, Oct 09, 2024 at 10:44:33PM +0200, Stefano Brivio wrote:
> > > > > > > > On Wed, 9 Oct 2024 15:07:21 +0200
> > > > > > > > Stefano Brivio wrote:
> > > > > > [snip]
> > > > > > > > > > @@ -447,20 +447,35 @@ uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto,
> > > > > > > > > >  	    (proto == IPPROTO_TCP || proto == IPPROTO_UDP)) {
> > > > > > > > > >  		/* spliceable */
> > > > > > > > > >
> > > > > > > > > > -		/* Preserve the specific loopback adddress used, but let the
> > > > > > > > > > -		 * kernel pick a source port on the target side
> > > > > > > > > > +		/* The traffic will go over the guest's 'lo' interface, but by
> > > > > > > > > > +		 * default use its external address, so we don't inadvertently
> > > > > > > > > > +		 * expose services that listen only on the guest's loopback
> > > > > > > > > > +		 * address. That can be overridden by --host-lo-to-ns-lo which
> > > > > > > > > > +		 * will instead forward to the loopback address in the guest.
> > > > > > > > > > +		 *
> > > > > > > > > > +		 * In either case, let the kernel pick the source address to
> > > > > > > > > > +		 * match.
> > > > > > > > > >  		 */
> > > > > > > > > > -		tgt->oaddr = ini->eaddr;
> > > > > > > > > > +		if (inany_v4(&ini->eaddr)) {
> > > > > > > > > > +			if (c->host_lo_to_ns_lo)
> > > > > > > > > > +				tgt->eaddr = inany_loopback4;
> > > > > > > > > > +			else
> > > > > > > > > > +				tgt->eaddr = inany_from_v4(c->ip4.addr_seen);
> > > > > > > > > > +			tgt->oaddr = inany_any4;
> > > > > > > > > > +		} else {
> > > > > > > > > > +			if (c->host_lo_to_ns_lo)
> > > > > > > > > > +				tgt->eaddr = inany_loopback6;
> > > > > > > > > > +			else
> > > > > > > > > > +				tgt->eaddr.a6 = c->ip6.addr_seen;
> > > > > > > > >
> > > > > > > > > Either this...
> > > > > > > > >
> > > > > > > > > > +			tgt->oaddr = inany_any6;
> > > > > > > > >
> > > > > > > > > or this (and not something before this patch, up to 3/4) make the
> > > > > > > > > "TCP/IPv6: host to ns (spliced): big transfer" test in pasta/tcp hang,
> > > > > > > > > sometimes (about one in three/four runs), that's what I mistakenly
> > > > > > > > > reported as coming from Laurent's series at:
> > > > > > >
> > > > > > > Huh, interesting. Just got back from my leave and ran that group of
> > > > > > > tests in a loop this afternoon, but didn't manage to reproduce. I
> > > > > > > have administrivia that will probably fill the rest of this week, but
> > > > > > > I'll look into this as soon as I can.
> > > > > >
> > > > > > I reproduced the problem on passt.top, and I have a partial idea
> > > > > > what's going on. As you say it's seeming like the address (addr_seen
> > > > > > == addr in this case) isn't properly ready.
This is over s= plice, but > > > > > > on the tap interface, I see the container sending NS messages f= or its > > > > > > own address - seems like it's doing DAD. But more importantly,= we're > > > > > > answering those NS messages with NA messages, because we answer= all > > > > > > NS. i.e. we're making the DAD fail. What I'm not sure of is h= ow this > > > > > > ever worked at all. --config-net makes sense, since we disable= DAD, > > > > > > but our test suite has always been using NDP+DHCP instead of > > > > > > --config-net. > > > > > >=20 > > > > > > So, AFACT, we'll always fail guest DAD attempts, both IPv6, whi= ch > > > > > > happens most of the time and for IPv4 via ARP, which is used mu= ch more > > > > > > rarely. I think we need to be more selective in what NS or ARP > > > > > > lookups we resopnd to. The question is what approach to take: = =20 > > > > >=20 > > > > > Hmm... no.. there's more to this. > > > > >=20 > > > > > Usually DAD requests have :: as the source address, and we *do* > > > > > exclude those from getting replies. In this case though, we're > > > > > getting NS requests for the assigned address from what looks like= the > > > > > SLAAC address. So, I do think it would be wise to explicitly exc= lude > > > > > these: we shouldn't be giving NA responses for an address that ou= ght > > > > > to belong to the guest, even if it doesn't look like a DAD. > > > > >=20 > > > > > But, I'm not sure what's triggering this. Is for some reason the= DHCP > > > > > address not "taking", so the container is trying to locate it on = the > > > > > network instead? Or _is_ this DAD, but under some circumstances > > > > > rather than using :: as the source address it uses another config= ured > > > > > address. =20 > > > >=20 > > > > Ok.. I've understood a bit more. 
> > > > While timing is a factor here, it looks like the main reason I wasn't
> > > > seeing it on my machine is what I'd consider a bug in the Debian
> > > > version of the dhclient-script: when adding an IPv6 address, it
> > > > returns without waiting for DAD to complete (i.e. for the address to
> > > > be non-tentative).
> > >
> > > Oops. On one hand, I would feel inclined to propose a fix for the
> > > Debian and Ubuntu packages. On the other hand, I wonder if it's
> > > universally considered a bug: the DHCPv6 client did its job at that
> > > point, and it's debatable whether dhclient should wait for the address
> > > to be usable before forking into the background.
> > >
> > > That is, arguably, the job of dhclient is to request and configure an
> > > address. It's not a network configuration daemon. There might be many
> > > other reasons why that address is unusable, and yet dhclient is not
> > > responsible for them.
> >
> > Hrm... I guess. Counterpoints..
> >  - Most other failures to get a usable address will result in a
> >    visible error
> >  - dhclient has a --dad-wait-time option which seems to imply that the
> >    script should wait for DAD
> >  - The upstream script version waits for DAD
> >
> > In any case I filed a report for it
> >   https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1085231
> >
> > > By the way, I guess it's just an issue for test scripts like this one.
> >
> > Why do you guess that?
>
> Because it's kind of rare that your address changes if you use DHCPv6,
> I guess, so this would be relevant almost exclusively at boot.

Hm, true. I was thinking of ephemeral containers where "boot" could be
quite common.. but those will most likely use --config-net.

> And, at boot, if a remote peer/client happens to try to connect to the
> machine where the client is running right after an address was
> assigned, it must have a retry mechanism almost for sure.
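The thread suggests that, besides skipping DAD probes sent from ::, passt should never answer a Neighbour Solicitation whose target is an address that ought to belong to the guest. A minimal sketch of such a check follows. It is not passt's actual NDP code: the function name ns_should_reply() and its parameters are hypothetical, and the equivalent ARP filtering for IPv4 is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <netinet/in.h>

/*
 * ns_should_reply() - Hypothetical filter for IPv6 Neighbour Solicitations
 * @src:	Source address of the NS seen on the tap interface
 * @target:	Target address being solicited
 * @guest_addr:	Address assigned to (or observed for) the guest
 *
 * Return: true if a Neighbour Advertisement should be sent in reply
 */
bool ns_should_reply(const struct in6_addr *src,
		     const struct in6_addr *target,
		     const struct in6_addr *guest_addr)
{
	assert(src && target && guest_addr);

	/* Classic DAD probe: the sender has no address yet, source is :: */
	if (IN6_IS_ADDR_UNSPECIFIED(src))
		return false;

	/* Never claim the guest's own address, even when the probe does
	 * not look like DAD (e.g. it comes from the guest's SLAAC address)
	 */
	if (!memcmp(target, guest_addr, sizeof(*target)))
		return false;

	/* Anything else: keep answering, as for any other off-link address */
	return true;
}
```

Under these assumptions, a lookup for the guest's own address coming from its SLAAC address, which is the case observed on passt.top, gets no reply, while lookups for other addresses are still answered.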
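On Linux, an IPv6 address whose DAD has not completed carries the IFA_F_TENTATIVE flag (0x40 in <linux/if_addr.h>), which shows up in the flags column of /proc/net/if_inet6. As a rough sketch of what "waiting for DAD" means for a script or test harness, the C below polls that file until a given address is no longer tentative. The helper names and the 100 ms polling interval are assumptions for illustration; this is not taken from dhclient-script or from passt's test suite.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* IFA_F_TENTATIVE from <linux/if_addr.h>: set while DAD is in progress */
#define ADDR_FLAG_TENTATIVE	0x40

/*
 * Parse one /proc/net/if_inet6 line: address as 32 hex digits, then
 * interface index, prefix length, scope and flags (all hex), then name.
 * @addr_hex must be 32 lowercase hex digits, as the file prints them.
 *
 * Return: 1 if the line is for @addr_hex and it's tentative, 0 if it's
 *         for @addr_hex and not tentative, -1 otherwise
 */
int if_inet6_line_tentative(const char *line, const char *addr_hex)
{
	char addr[33], ifname[17];
	unsigned int ifindex, plen, scope, flags;

	assert(line && addr_hex);

	if (sscanf(line, "%32s %x %x %x %x %16s",
		   addr, &ifindex, &plen, &scope, &flags, ifname) != 6)
		return -1;

	if (strcmp(addr, addr_hex))
		return -1;

	return (flags & ADDR_FLAG_TENTATIVE) ? 1 : 0;
}

/* Return: 1 if tentative, 0 if usable, -1 if not found or unreadable */
int addr_is_tentative(const char *addr_hex)
{
	char line[256];
	int ret = -1;
	FILE *f = fopen("/proc/net/if_inet6", "r");

	if (!f)
		return -1;

	while (fgets(line, sizeof(line), f)) {
		ret = if_inet6_line_tentative(line, addr_hex);
		if (ret >= 0)
			break;
	}

	fclose(f);
	return ret;
}

/* Poll every 100 ms for at most @timeout_s seconds.
 * Return: 0 once the address is usable, -1 on timeout or if it's gone
 */
int wait_for_dad(const char *addr_hex, unsigned int timeout_s)
{
	const struct timespec tick = { .tv_sec = 0, .tv_nsec = 100000000L };
	unsigned int i;

	for (i = 0; i < timeout_s * 10; i++) {
		int ret = addr_is_tentative(addr_hex);

		if (ret == 0)
			return 0;
		if (ret < 0)
			return -1;
		nanosleep(&tick, NULL);
	}

	return -1;
}
```

A caller would pass the address in the file's own notation, e.g. wait_for_dad("20010db8000000000000000000000010", 5); the timeout plays a similar role to the value dhclient hands to its script via --dad-wait-time. The sketch ignores IFA_F_DADFAILED (0x08), so it doesn't report a failed DAD separately from a slow one.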
-- 
David Gibson (he or they)      | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
                               | around.
http://www.ozlabs.org/~dgibson