On Wed, Oct 16, 2024 at 04:46:52PM +1100, David Gibson wrote:
> On Wed, Oct 16, 2024 at 02:15:19PM +1100, David Gibson wrote:
> > On Thu, Oct 10, 2024 at 04:57:32PM +1100, David Gibson wrote:
> > > On Wed, Oct 09, 2024 at 10:44:33PM +0200, Stefano Brivio wrote:
> > > > On Wed, 9 Oct 2024 15:07:21 +0200
> > > > Stefano Brivio wrote:
> > [snip]
> > > > > > @@ -447,20 +447,35 @@ uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto,
> > > > > >  	    (proto == IPPROTO_TCP || proto == IPPROTO_UDP)) {
> > > > > >  		/* spliceable */
> > > > > >  
> > > > > > -		/* Preserve the specific loopback adddress used, but let the
> > > > > > -		 * kernel pick a source port on the target side
> > > > > > +		/* The traffic will go over the guest's 'lo' interface, but by
> > > > > > +		 * default use its external address, so we don't inadvertently
> > > > > > +		 * expose services that listen only on the guest's loopback
> > > > > > +		 * address.  That can be overridden by --host-lo-to-ns-lo which
> > > > > > +		 * will instead forward to the loopback address in the guest.
> > > > > > +		 *
> > > > > > +		 * In either case, let the kernel pick the source address to
> > > > > > +		 * match.
> > > > > >  		 */
> > > > > > -		tgt->oaddr = ini->eaddr;
> > > > > > +		if (inany_v4(&ini->eaddr)) {
> > > > > > +			if (c->host_lo_to_ns_lo)
> > > > > > +				tgt->eaddr = inany_loopback4;
> > > > > > +			else
> > > > > > +				tgt->eaddr = inany_from_v4(c->ip4.addr_seen);
> > > > > > +			tgt->oaddr = inany_any4;
> > > > > > +		} else {
> > > > > > +			if (c->host_lo_to_ns_lo)
> > > > > > +				tgt->eaddr = inany_loopback6;
> > > > > > +			else
> > > > > > +				tgt->eaddr.a6 = c->ip6.addr_seen;
> > > > >
> > > > > Either this...
> > > > >
> > > > > > +			tgt->oaddr = inany_any6;
> > > > >
> > > > > or this (and not something before this patch, up to 3/4) make the
> > > > > "TCP/IPv6: host to ns (spliced): big transfer" test in pasta/tcp hang,
> > > > > sometimes (about one in three/four runs), that's what I mistakenly
> > > > > reported as coming from Laurent's series at:
> > >
> > > Huh, interesting.  Just got back from my leave and ran that group of
> > > tests in a loop this afternoon, but didn't manage to reproduce.  I
> > > have administrivia that will probably fill the rest of this week, but
> > > I'll look into this as soon as I can.
> >
> > I reproduced the problem on passt.top, and I have a partial idea
> > what's going on.  As you say it's seeming like the address (addr_seen
> > == addr in this case) isn't properly ready.  This is over splice, but
> > on the tap interface, I see the container sending NS messages for its
> > own address - seems like it's doing DAD.  But more importantly, we're
> > answering those NS messages with NA messages, because we answer all
> > NS.  i.e. we're making the DAD fail.  What I'm not sure of is how this
> > ever worked at all.  --config-net makes sense, since we disable DAD,
> > but our test suite has always been using NDP+DHCP instead of
> > --config-net.
> >
> > So, AFAICT, we'll always fail guest DAD attempts, both IPv6, which
> > happens most of the time and for IPv4 via ARP, which is used much more
> > rarely.  I think we need to be more selective in what NS or ARP
> > lookups we respond to.  The question is what approach to take:
>
> Hmm... no.. there's more to this.
>
> Usually DAD requests have :: as the source address, and we *do*
> exclude those from getting replies.  In this case though, we're
> getting NS requests for the assigned address from what looks like the
> SLAAC address.
> So, I do think it would be wise to explicitly exclude
> these: we shouldn't be giving NA responses for an address that ought
> to belong to the guest, even if it doesn't look like a DAD.
>
> But, I'm not sure what's triggering this.  Is for some reason the DHCP
> address not "taking", so the container is trying to locate it on the
> network instead?  Or _is_ this DAD, but under some circumstances
> rather than using :: as the source address it uses another configured
> address.

Ok.. I've understood a bit more.  While timing is a factor here, it
looks like the main reason I wasn't seeing it on my machine is what
I'd consider a bug in the Debian version of the dhclient-script: when
adding an IPv6 address, it returns without waiting for DAD to complete
(i.e. for the address to be non-tentative).

There's also an additional bug, which doesn't cause this problem, I
think, but caused some problems when I was investigating.  DHCPv6
needs the link-local SLAAC address already configured and
non-tentative.  The Fedora dhclient-script waits for that too at the
PREINIT6 stage, but the Debian one doesn't, meaning if you attempt
dhclient -6 immediately after starting the namespace it will fail to
bind the UDP address it needs.

I still think it's a good idea not to give NA messages for the guest
assigned address, but we'll need a different workaround for this
issue.  I guess we'll have to manually wait for DAD to complete in the
DHCP tests, which will be kind of mucky :/

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson
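
For illustration, the NS filtering discussed above (no NA for probes sent from the unspecified address, and no NA for the address assigned to the guest, whatever the source looks like) could be sketched roughly as below. This is a standalone sketch under assumptions, not passt's NDP code: ns_should_reply(), ns_should_reply_str() and the 2001:db8:: documentation addresses are all made up for the example.

```c
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper (not from passt): decide whether a Neighbour
 * Solicitation should be answered with a Neighbour Advertisement.
 *
 * @src:	source address of the NS
 * @target:	target address being solicited
 * @guest_addr:	address assigned to the guest
 */
static bool ns_should_reply(const struct in6_addr *src,
			    const struct in6_addr *target,
			    const struct in6_addr *guest_addr)
{
	/* Usual DAD probe: the source is the unspecified address (::) */
	if (IN6_IS_ADDR_UNSPECIFIED(src))
		return false;

	/* The target ought to belong to the guest: don't answer for it,
	 * even if the request doesn't look like DAD (e.g. it comes from
	 * the guest's SLAAC address).
	 */
	if (!memcmp(target, guest_addr, sizeof(*target)))
		return false;

	return true;
}

/* Hypothetical convenience wrapper taking textual addresses, so the
 * decision can be tried out without building packets.  Unparseable
 * input is treated as "don't reply".
 */
static bool ns_should_reply_str(const char *src, const char *target,
				const char *guest_addr)
{
	struct in6_addr s, t, g;

	if (inet_pton(AF_INET6, src, &s) != 1 ||
	    inet_pton(AF_INET6, target, &t) != 1 ||
	    inet_pton(AF_INET6, guest_addr, &g) != 1)
		return false;

	return ns_should_reply(&s, &t, &g);
}
```

With 2001:db8::2 standing in for the guest's address, a solicitation for it gets no reply whether it comes from :: or from a link-local source such as fe80::1, while a solicitation from fe80::1 for some other target (say 2001:db8::1) is still answered.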