On Thu, Oct 17, 2024 at 10:31:22AM +0200, Stefano Brivio wrote: > On Thu, 17 Oct 2024 12:19:58 +1100 > David Gibson wrote: > > > On Wed, Oct 16, 2024 at 05:26:48PM +0200, Stefano Brivio wrote: > > > On Wed, 16 Oct 2024 19:39:40 +1100 > > > David Gibson wrote: > > > > > > > On Wed, Oct 16, 2024 at 04:46:52PM +1100, David Gibson wrote: > > > > > On Wed, Oct 16, 2024 at 02:15:19PM +1100, David Gibson wrote: > > > > > > On Thu, Oct 10, 2024 at 04:57:32PM +1100, David Gibson wrote: > > > > > > > On Wed, Oct 09, 2024 at 10:44:33PM +0200, Stefano Brivio wrote: > > > > > > > > On Wed, 9 Oct 2024 15:07:21 +0200 > > > > > > > > Stefano Brivio wrote: > > > > > > [snip] > > > > > > > > > > @@ -447,20 +447,35 @@ uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto, > > > > > > > > > > (proto == IPPROTO_TCP || proto == IPPROTO_UDP)) { > > > > > > > > > > /* spliceable */ > > > > > > > > > > > > > > > > > > > > - /* Preserve the specific loopback adddress used, but let the > > > > > > > > > > - * kernel pick a source port on the target side > > > > > > > > > > + /* The traffic will go over the guest's 'lo' interface, but by > > > > > > > > > > + * default use its external address, so we don't inadvertently > > > > > > > > > > + * expose services that listen only on the guest's loopback > > > > > > > > > > + * address. That can be overridden by --host-lo-to-ns-lo which > > > > > > > > > > + * will instead forward to the loopback address in the guest. > > > > > > > > > > + * > > > > > > > > > > + * In either case, let the kernel pick the source address to > > > > > > > > > > + * match. 
> > > > > > > > > > */ > > > > > > > > > > - tgt->oaddr = ini->eaddr; > > > > > > > > > > + if (inany_v4(&ini->eaddr)) { > > > > > > > > > > + if (c->host_lo_to_ns_lo) > > > > > > > > > > + tgt->eaddr = inany_loopback4; > > > > > > > > > > + else > > > > > > > > > > + tgt->eaddr = inany_from_v4(c->ip4.addr_seen); > > > > > > > > > > + tgt->oaddr = inany_any4; > > > > > > > > > > + } else { > > > > > > > > > > + if (c->host_lo_to_ns_lo) > > > > > > > > > > + tgt->eaddr = inany_loopback6; > > > > > > > > > > + else > > > > > > > > > > + tgt->eaddr.a6 = c->ip6.addr_seen; > > > > > > > > > > > > > > > > > > Either this... > > > > > > > > > > > > > > > > > > > + tgt->oaddr = inany_any6; > > > > > > > > > > > > > > > > > > or this (and not something before this patch, up to 3/4) make the > > > > > > > > > "TCP/IPv6: host to ns (spliced): big transfer" test in pasta/tcp hang, > > > > > > > > > sometimes (about one in three/four runs), that's what I mistakenly > > > > > > > > > reported as coming from Laurent's series at: > > > > > > > > > > > > > > Huh, interesting. Just got back from my leave and ran that group of > > > > > > > tests in a loop this afternoon, but didn't manage to reproduce. I > > > > > > > have administrivia that will probably fill the rest of this week, but > > > > > > > I'll look into this as soon as I can. > > > > > > > > > > > > I reproduced the problem on passt.top, and I have a partial idea > > > > > > what's going on. As you say it's seeming like the address (addr_seen > > > > > > == addr in this case) isn't properly ready. This is over splice, but > > > > > > on the tap interface, I see the container sending NS messages for its > > > > > > own address - seems like it's doing DAD. But more importantly, we're > > > > > > answering those NS messages with NA messages, because we answer all > > > > > > NS. i.e. we're making the DAD fail. What I'm not sure of is how this > > > > > > ever worked at all. 
With --config-net it makes sense, since we disable DAD, > > > > > > but our test suite has always been using NDP+DHCP instead of > > > > > > --config-net. > > > > > > > > > > > > So, AFAICT, we'll always fail guest DAD attempts, both for IPv6, where > > > > > > DAD happens most of the time, and for IPv4 via ARP, where it's used much more > > > > > > rarely. I think we need to be more selective in what NS or ARP > > > > > > lookups we respond to. The question is what approach to take: > > > > > > > > > > Hmm... no.. there's more to this. > > > > > > > > > > Usually DAD requests have :: as the source address, and we *do* > > > > > exclude those from getting replies. In this case though, we're > > > > > getting NS requests for the assigned address from what looks like the > > > > > SLAAC address. So, I do think it would be wise to explicitly exclude > > > > > these: we shouldn't be giving NA responses for an address that ought > > > > > to belong to the guest, even if it doesn't look like a DAD. > > > > > > > > > > But, I'm not sure what's triggering this. Is for some reason the DHCP > > > > > address not "taking", so the container is trying to locate it on the > > > > > network instead? Or _is_ this DAD, but under some circumstances, > > > > > rather than using :: as the source address, it uses another configured > > > > > address? > > > > > > > > Ok.. I've understood a bit more. While timing is a factor here, it > > > > looks like the main reason I wasn't seeing it on my machine is what > > > > I'd consider a bug in the Debian version of the dhclient-script: > > > > when adding an IPv6 address, it returns without waiting for DAD to > > > > complete (i.e. for the address to be non-tentative). > > > > > > Oops. On one hand, I would feel inclined to propose a fix for the > > > Debian and Ubuntu packages.
On the other hand, I wonder if it's > > > universally considered a bug: the DHCPv6 client did its job at that > > > point, and it's debatable whether dhclient should wait for the address > > > to be usable before forking to background. > > > > > > That is, arguably, dhclient's job is to request and configure an > > > address. It's not a network configuration daemon. There might be many > > > other reasons why that address is unusable, and yet dhclient is not > > > responsible for them. > > > > Hrm... I guess. Counterpoints.. > > - Most other failures to get a usable address will result in a > > visible error > > - dhclient has a --dad-wait-time option which seems to imply that the > > script should wait for DAD > > - The upstream script version waits for DAD > > > > In any case, I filed a report for it: > > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1085231 > > > > > By the way, I guess it's just an issue for test scripts like this one. > > > > Why do you guess that? > > Because it's kind of rare that your address changes if you use DHCPv6, > I guess, so this would be relevant almost exclusively at boot. Hm, true. I was thinking of ephemeral containers where "boot" could be quite common.. but those will most likely use --config-net. > And, at boot, if a remote peer/client happens to try to connect to the > machine where the client is running right after an address was > assigned, it almost certainly has a retry mechanism. -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
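The more selective NS handling proposed in the thread could look roughly like the following. The idea is to keep ignoring DAD probes, which are sent from ::, and also never answer an NS whose target is an address that ought to belong to the guest, whatever its source. This is a minimal sketch under stated assumptions, not passt's actual NDP code. The helper name ns_should_reply() and its single guest_addr parameter are hypothetical. A real implementation would presumably also need to consider other guest addresses and the equivalent ARP check for IPv4.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not passt's actual code): decide whether a
 * Neighbour Solicitation sent by the guest should get a Neighbour
 * Advertisement in reply.
 *
 * @src:        source address of the NS
 * @target:     target address being solicited
 * @guest_addr: address assigned to the guest (assumption: one address)
 */
static bool ns_should_reply(const struct in6_addr *src,
			    const struct in6_addr *target,
			    const struct in6_addr *guest_addr)
{
	assert(src && target && guest_addr);

	/* DAD probes are sent from the unspecified address (::):
	 * answering them would make the guest's DAD fail
	 */
	if (IN6_IS_ADDR_UNSPECIFIED(src))
		return false;

	/* Never claim an address that ought to belong to the guest,
	 * even if the NS doesn't look like DAD (e.g. it comes from
	 * the guest's SLAAC address)
	 */
	if (!memcmp(target, guest_addr, sizeof(*target)))
		return false;

	/* Any other target: keep answering, as before */
	return true;
}
```

With this check, both a DAD probe from :: and an NS sent from the SLAAC address for the guest's own address would go unanswered, while solicitations for any other address would still get an NA.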