From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 20 Aug 2024 10:42:17 +1000
From: David Gibson
To: Stefano Brivio
CC: passt-dev@passt.top, Paul Holzinger
Subject: Re: [PATCH 00/22] RFC: Allow configuration of special case NATs
References: <20240816054004.1335006-1-david@gibson.dropbear.id.au> <20240819112749.63d7476d@elisabeth> <20240819150100.3bbdbb3f@elisabeth>
In-Reply-To: <20240819150100.3bbdbb3f@elisabeth>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="HJijYrL2+qcUnAE1"
Content-Disposition: inline
List-Id: Development discussion and patches for passt

--HJijYrL2+qcUnAE1
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Mon, Aug 19, 2024 at 03:01:00PM +0200, Stefano Brivio wrote:
> On Mon, 19 Aug 2024 19:52:49 +1000
> David Gibson wrote:
>
> > On Mon, Aug 19, 2024 at 11:27:49AM +0200, Stefano Brivio wrote:
> > > On Mon, 19 Aug 2024 18:46:31 +1000
> > > David Gibson wrote:
> > >
> > > > On Fri, Aug 16, 2024 at 03:39:41PM +1000, David Gibson wrote:
> > > > > Based on Stefano's recent patch for faster tests.
> > > > >
> > > > > Allow the user to specify which addresses are translated when used by
> > > > > the guest, rather than always being the gateway address or nothing.
> > > > > We also allow this remapping to go to the host's global address (more
> > > > > precisely the address assigned to the guest) rather than just host
> > > > > loopback.
> > > > >
> > > > > Suggestions for better names for the new options in patches 20 & 22
> > > > > are most welcome.
> > > > >
> > > > > Along the way to implementing that make many changes to clarify what
> > > > > various addresses we track mean, fixing a number of small bugs as
> > > > > well.
> > > > >
> > > > > NOTE: there is a bug in 21/22 which breaks some of the passt_tcp perf
> > > > > tests.  I haven't managed to figure out why it's causing the problem,
> > > > > or even what the exact triggering conditions are (running the single
> > > > > stalling iperf alone doesn't do it).  Have to wrap up for today, so I
> > > > > thought I'd get this out for review anyway.
> > > >
> > > > I've identified the bug here.  IMO, it's a pre-existing problem that
> > > > only works by accident at the moment.  The immediate fix is pretty
> > > > obvious, but it raises some broader questions.
> > > >
> > > > The problem arises because of the MTU changes we make in order to test
> > > > throughput with different packet sizes.  Specifically we change the
> > > > MTU to values < 1280, which implicitly disables IPv6 since it requires
> > > > an MTU >= 1280.  When we change the MTU back to a larger value IPv6 is
> > > > re-enabled, but some configuration has been lost in the meantime.
> > > >
> > > > After the MTU is restored the guest reconfigures with NDP, but does
> > > > not re-DHCPv6.  That means the guest gets a SLAAC address in the right
> > > > prefix but not the exact /128 address we've tried to assign to it.
> > > > However, at least with the sequence of things we have in the tests,
> > > > the guest never sends any packets with the new address, so passt
> > > > doesn't update addr_seen.  When the inbound connection comes we send
> > > > it to the assigned address instead of the guest's actual address and
> > > > the guest rejects it.
> > >
> > > I still have to take a closer look, but I'm fairly sure I hit a similar
> > > issue while I was writing these tests originally. I pondered
> > > reconfiguring the address via DHCPv6, or using the keep_addr_on_down
> > > sysctl (net.ipv6.conf..keep_addr_on_down), which was added
> > > around that time.
> > >
> > > Then:
> > >
> > > > This "worked" previously, because before this patch, passt would
> > > > translate the inbound connection to have source/dest as link-local
> > > > addresses.
> > >
> > > ...I realised that this worked and forgot about the whole issue.
> > >
> > > > We *do* have a current addr_ll_seen because (a) it won't
> > > > change if the guest doesn't change MAC and (b) when IPv6 is re-enabled
> > > > the NDP traffic the guest generates will have link-local addresses
> > > > that update addr_ll_seen.  With this patch, and a global address for
> > > > --map-host-loopback, we now need to send to addr_seen instead of
> > > > addr_ll_seen, hence exposing the bug.
> > > >
> > > > In the short term, the obvious fix would be to re-run dhclient -6 in
> > > > the guest after we twiddle MTU but before running IPv6 tests.
> > >
> > > I guess setting keep_addr_on_down (even for "all" interfaces) should
> > > work as well.
> >
> > Sounds like it.  I wasn't aware of that one.
> >
> > /me tests.. actually, no it doesn't work..
> >
> > # sysctl -a | grep keep_addr_on_down
> > net.ipv6.conf.all.keep_addr_on_down = 1
> > net.ipv6.conf.default.keep_addr_on_down = 1
> > net.ipv6.conf.dummy0.keep_addr_on_down = 1
> > net.ipv6.conf.lo.keep_addr_on_down = 0
> > # ip addr add 2001:db8::1 dev dummy0
> > # ip a
> > 1: lo: mtu 65536 qdisc noop state DOWN group default qlen 1000
> >     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> > 2: dummy0: mtu 1500 qdisc noop state DOWN group default qlen 1000
> >     link/ether c2:02:f2:79:f9:94 brd ff:ff:ff:ff:ff:ff
> >     inet6 2001:db8::1/128 scope global
> >        valid_lft forever preferred_lft forever
> > # ip link set dummy0 mtu 1200
> > # ip a
> > 1: lo: mtu 65536 qdisc noop state DOWN group default qlen 1000
> >     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> > 2: dummy0: mtu 1200 qdisc noop state DOWN group default qlen 1000
> >     link/ether c2:02:f2:79:f9:94 brd ff:ff:ff:ff:ff:ff
> > # ip link set dummy0 mtu 1500
> > # ip a
> > 1: lo: mtu 65536 qdisc noop state DOWN group default qlen 1000
> >     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> > 2: dummy0: mtu 1500 qdisc noop state DOWN group default qlen 1000
> >     link/ether c2:02:f2:79:f9:94 brd ff:ff:ff:ff:ff:ff
> >
> > My guess is that IPv6 being deconfigured because of an unsuitable MTU
> > is considered a different event from a mere "down".
>
> I guess it's because they're not IFA_F_PERMANENT, because
> addrconf_permanent_addr() has:
>
>         case NETDEV_CHANGEMTU:
>                 /* if MTU under IPV6_MIN_MTU stop IPv6 on this interface. */
>                 if (dev->mtu < IPV6_MIN_MTU) {
>                         addrconf_ifdown(dev, dev != net->loopback_dev);
>                         break;
>                 }
>
> but addrconf_ifdown() does:
>
>                         if (!keep_addr ||
>                             !(ifa->flags & IFA_F_PERMANENT) ||
>                             addr_is_local(&ifa->addr)) {
>                                 hlist_del_init_rcu(&ifa->addr_lst);
>                                 goto restart;
>                         }
>
> I'm not sure about the logic behind that. We could actually set those
> addresses as permanent once the DHCPv6 client configures them, if it's
> cleaner.

Huh.  Not in the passt/VM case, though, which is where I actually
encountered this.

> > > > This kind of opens a question about how hard we should try to
> > > > accommodate guests which don't configure themselves how we told them.
> > >
> > > There's a notable distinction between guests temporarily diverging (in
> > > different ways) and guests we don't configure at all.
> >
> > I'm not really sure what you're getting at here.
>
> In this case, it's not true that the guest doesn't configure itself in
> the way we requested -- it's just a temporary diversion from that
> configuration.

Oh, I see.  Assuming that at some point the DHCP client will re-run.

> Those are different cases that we can handle in different ways, I
> think. If it's a glitch that will only happen during testing, let's
> work around that.
>
> But if the guest really ignores DHCPv6 information, I think we should
> keep that working.
>
> > > It's probably more important to ensure we use the right type of address
> >
> > "type" in what sense here?
>
> Global unicast instead of link-local.

Ok.
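
For concreteness, here's a rough sketch of how I read the address-type
distinction above.  It's Python rather than passt's C purely for
brevity, and it is not passt code: pick_guest_dst() and its parameters
are hypothetical names, and addr_seen / addr_ll_seen are just passed in
as strings.  It assumes the guest-side destination is chosen to match
the scope of the translated source address.

```python
import ipaddress


def pick_guest_dst(translated_src, addr_seen, addr_ll_seen):
    """Choose the guest address for an inbound flow (illustrative only).

    Assumption: a link-local translated source pairs with the last seen
    link-local guest address, anything else pairs with the last seen
    global guest address.
    """
    src = ipaddress.IPv6Address(translated_src)
    if src.is_link_local:
        # Link-local addresses survive the MTU bounce (same MAC, fresh NDP)
        return addr_ll_seen
    # Global case: only correct if the guest has sent from this address
    return addr_seen


# Link-local source: delivered to the address refreshed by NDP traffic
print(pick_guest_dst("fe80::1", "2001:db8::10", "fe80::abcd"))
# Global source: delivered to addr_seen, which may be stale after the bounce
print(pick_guest_dst("2001:db8::1", "2001:db8::10", "fe80::abcd"))
```

In the test scenario, the second call is the problem case: the guest
has moved to a SLAAC address but addr_seen still holds the old one.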

> > > (security) rather than ensuring we somehow manage to deliver packets at
> > > any time (minor glitch otherwise), also because the one you describe is
> > > something we're unlikely to hit outside of tests.
> > >
> > > > Personally I'd be ok with saying that nothing works if the guest
> > > > doesn't configure itself properly, thereby removing addr_seen and
> > > > addr_ll_seen entirely.  But I think, Stefano, you've been against that
> > > > idea in the past.
> > >
> > > Yes, I still think we should support guests that don't use DHCPv6 or
> > > NDP at all,
> >
> > Well, you still wouldn't *need* DHCPv6 or NDP, but you'd have to
> > manually configure the interface in the guest to match the address
> > you've configured with -a.  Just like you'd expect to have to
> > correctly configure your address on a real network.
>
> True, but if we make correctness as optional as possible, we'll be more
> compatible (less time spent by users fixing situations that don't
> necessarily need fixing, less time spent by developers to look into
> reports, no matter who's at fault).

Eh, maybe.  Unless us trying to make sense of a nonsense situation
causes some unpredictable behaviour that breaks something else.

> > > or where related exchanges fail for any reason. It improves
> > > reliability and compatibility at a small cost. In this case, I think
> > > it's a nice feature that we would resume communicating as soon as the
> > > guest shows its global unicast address.
> >
> > Hm, maybe.  I'm not entirely convinced the cost is so small long term.
> > It's pretty badly incompatible with having multiple guests behind the
> > same passt instance: such as the initial guest bridging or routing to
> > nested guests.
>
> Why? We will need to hash the interface/guest index anyway, for
> outbound flows.

If we have separate interfaces for each guest, yes.
But not if we have multiple guests behind a single tap because the
initial guest sets up a bridge or routing.  Then we have nothing but
the address.

> And for inbound flows, if a guest steals the address of another guest,
> we'll give priority to the normal 'addr' versions instead of the
> '_seen' ones, to decide how to direct traffic.

I don't see how we'd know we're in this situation, and so when to
prioritise one address over the other.

> > I'm actually not sure if encountering this bug makes me more or less
> > in favour of addr_seen.  On the one hand I think it highlights the
> > flakiness of this approach; there are situations where we just won't
> > know the right address.
>
> I don't understand this argument: indeed, there are such situations,
> and they are annoying. Why should we make them more common?

Because predictability is good, and working _most_ of the time is a
failure of predictability.

> > On the other hand it shows a relatively
> > plausible case where the guest won't get exactly the address we want
> > it to (it uses NDP but not DHCPv6)
> >
> > Hrm... actually this also shows a potential danger in the recent
> > patches to disable DAD in the guest.  With DAD enabled, when the guest
> > grabs a new address, we'd expect it to emit DAD messages, which would
> > have the side effect of updating our addr_seen (although I'm pretty
> > sure I hit this patch before the nodad patches were applied, so that
> > doesn't seem to be foolproof).
>
> Well, but we do that for containers with --config-net only. In that
> case, the addresses we configure have infinite lifetime anyway.

Oh, good point.  Hrm... then I'm unsure why the guest wasn't re-DADing
its new address.

> Besides, I don't think we need to have addr_seen updated as quickly and
> correctly as possible just for the sake of it, we can also update it
> when we get any other neighbour solicitation because the guest is
> actually using the network. It's not meant to be perfect.

If the guest is a pure server (a common case for containers AFAICT),
then I don't know that we can expect NS messages for anything other
than the default gateway, which is (typically) link-local and so won't
help us to learn the new global address.

> > We could maybe update addr_seen when we send RA messages to the guest
> > - assuming that it will use the same host part (low 64-bits) for both
> > link-local and global addresses.  Not sure if that's a widely safe
> > assumption or not.
>
> I don't understand: what case are you trying to cover with this?

A case just like the one in the tests: the interface bounces, and we
get NDP traffic on the link-local address, but nothing on the global
address before an inbound connection.

-- 
David Gibson (he or they)       | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you, not the other way
                                | around.
http://www.ozlabs.org/~dgibson

--HJijYrL2+qcUnAE1
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmbD5mkACgkQzQJF27ox
2Gc/9hAAgUzeaFLdTOX/FmPuzLVbfMZeW/oLgZSYyFlYFwtGmKIUK07k1lJ1WGeQ
orZTBd/qDZ2kjvIswQBqVpj0sczQrbMaXfx4oHfrqt8BB386IAwm1STMiX88tE++
BSdY7Wr+O+weNBPKMGzla+Umqd+FMPrnS5AFTysiGROejj9MK6nkWgWNSD2vmdak
6/nmeF2936Yyj6lWfB05WxlyhpHoD7CbYruVxkf8Noj7PCFuIzyPDTpw4FUbZUo5
pY+mHA3EDoAzKp3mVX7CcXpkhdrUyyehHxUiM1boYu1cGAlpiPGinExCZowEpyHg
/HkIBmDRxHA9JzKhJChCTNIPqQFhxAaiqJBL5/rkyl3XQFSJzzGo0fW06Zm//Aaj
w1+qHdqD7y5XtUlz8OxbdD6mkeg6sBZfahS5pJEUd373wUBxIrIdNUzzM6h3FA88
1pbCVC3++niYYWDxYMZDU0fWhqWPJjFwJpyYHjDePw4/40lP/CgW765zogg6IKt/
7ZmiRmZsLln4x/hc3GjV7N1LLAuEEwHVhuNuCo3Xv5pF+uGni5RCGVFTp7N4fmOq
FQC8TkdKfwKrmrmFsUVTrjXRCLXbWZRIhNqbdtKDmho2+/R6JP8KanRlYWZx0SGd
Lbp23+G0IJ8SG6BcD0gWX25GdhBh/EETDJ/44e2Gf/rXXwacO/U=
=+Joz
-----END PGP SIGNATURE-----

--HJijYrL2+qcUnAE1--