From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from gandalf.ozlabs.org (gandalf.ozlabs.org [150.107.74.76])
	by passt.top (Postfix) with ESMTPS id 0B8655A0270
	for ; Sat, 27 May 2023 04:26:33 +0200 (CEST)
Received: by gandalf.ozlabs.org (Postfix, from userid 1007)
	id 4QSlz85y7zz4x2k; Sat, 27 May 2023 12:26:28 +1000 (AEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=gibson.dropbear.id.au; s=201602; t=1685154388;
	bh=PhZ8vU/XK7BI93HX4SacrCVa0ww8bgVD3TQkSsDW4JI=;
	h=Date:From:To:Cc:Subject:References:In-Reply-To:From;
	b=ZPK9FfAjvkGO/Al79wr5yw4u1lWGNd/Laboq8woodSHnYXliPx+DyIBIIYFRDk5X9
	 32E7Mx6lvns3bAPv/EtAPSAIuByqBe5MJn23zxuJrYQXRJCx8CWQKQD00F1UFyaL94
	 apIUGcWcKD4FvYChsUKN+oG8aE/UyX6uxEsJ0LSo=
Date: Sat, 27 May 2023 12:06:21 +1000
From: David Gibson
To: Stefano Brivio
Subject: Re: [PATCH v2 03/10] netlink: Add functionality to copy routes from outer namespace
References: <20230521234224.2770015-1-sbrivio@redhat.com>
 <20230521234224.2770015-4-sbrivio@redhat.com>
 <20230522115851.2c7f7736@elisabeth>
 <20230523081345.4f0a0274@elisabeth>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha256;
	protocol="application/pgp-signature"; boundary="gQUmkxT15Z8CFIHB"
Content-Disposition: inline
In-Reply-To: <20230523081345.4f0a0274@elisabeth>
Message-ID-Hash: DMO6CRU75KK5V3BC7SV3EE2K6GKYXERR
X-Message-ID-Hash: DMO6CRU75KK5V3BC7SV3EE2K6GKYXERR
X-MailFrom: dgibson@gandalf.ozlabs.org
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header
CC: passt-dev@passt.top, Callum Parsey , me@yawnt.com, lemmi@nerd2nerd.org, Andrea Arcangeli
X-Mailman-Version: 3.3.8
Precedence: list
List-Id: Development discussion and patches for passt

--gQUmkxT15Z8CFIHB
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Tue, May 23, 2023 at 08:14:07AM +0200, Stefano Brivio wrote:
> On Tue, 23 May 2023 13:08:21 +1000
> David Gibson wrote:
>
> > On Mon, May 22, 2023 at 11:58:51AM +0200, Stefano Brivio wrote:
> > > On Mon, 22 May 2023 18:42:01 +1000
> > > David Gibson wrote:
> > >
> > > > On Mon, May 22, 2023 at 01:42:17AM +0200, Stefano Brivio wrote:
> > > > > Instead of just fetching the default gateway and configuring a single
> > > > > equivalent route in the target namespace, on 'pasta --config-net', it
> > > > > might be desirable in some cases to copy the whole set of routes
> > > > > corresponding to a given output interface.
> > > > >
> > > > > For instance, in:
> > > > >   https://github.com/containers/podman/issues/18539
> > > > >   IPv4 Default Route Does Not Propagate to Pasta Containers on Hetzner VPSes
> > > > >
> > > > > configuring the default gateway won't work without a gateway-less
> > > > > route (specifying the output interface only), because the default
> > > > > gateway is, somewhat dubiously, not on the same subnet as the
> > > > > container.
> > > > >
> > > > > This is a similar case to the one covered by commit 7656a6f88882
> > > > > ("conf: Adjust netmask on mismatch between IPv4 address/netmask and
> > > > > gateway"), and I'm not exactly proud of that workaround.
> > > > >
> > > > > We also have:
> > > > >   https://bugs.passt.top/show_bug.cgi?id=49
> > > > >   pasta does not work with tap-style interface
> > > > >
> > > > > for which, eventually, we should be able to configure a gateway-less
> > > > > route in the target namespace.
> > > > >
> > > > > Introduce different operation modes for nl_route(), including a new
> > > > > NL_DUP one, not exposed yet, which simply parrots back to the kernel
> > > > > the route dump for a given interface from the outer namespace, fixing
> > > > > up flags and interface indices on the way, and requesting to add the
> > > > > same routes in the target namespace, on the interface we manage.
> > > > >
> > > > > For n routes we want to duplicate, send n identical netlink requests
> > > > > including the full dump: routes might depend on each other and the
> > > > > kernel processes RTM_NEWROUTE messages sequentially, not atomically,
> > > > > and repeating the full dump naturally resolves dependencies without
> > > > > the need to actually calculate them.
> > > > >
> > > > > I'm not kidding, it actually works pretty well.
> > > >
> > > > If there's a way to detect whether the kernel rejected some of the
> > > > routes, it would be nice to cut that loop short as soon as all the
> > > > routes are inserted.  Obviously that could be a followup improvement,
> > > > though.
> > >
> > > Yes, there's a way, but to keep things asynchronous in a simple way we
> > > process errors from nl_req() only at the next request.
> > >
> > > This part doesn't really need to be asynchronous, though: we could add
> > > a flag for nl_req() saying that we want to know about NLMSG_ERROR right
> > > away. This looks relatively straightforward, and already an improvement
> > > in the sense you mentioned.
> > >
> > > Actually parsing the error and finding out the offending route is a bit
> > > more complicated, though.
> >
> > Right, but we don't necessarily need to do that: all we need is that
> > if there are *no* errors we can stop the loop early.
>
> Yes yes, that's what I meant with the paragraph before.
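
As a rough illustration of that early exit, here is a toy model rather
than passt code. The "kernel" is simulated, nothing goes through
nl_req(), and struct fake_route, fake_send_dump() and dup_routes() are
made-up names. It also assumes that re-sending a route which is already
installed is not reported as an error, which real netlink replies may
not guarantee.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ROUTES 16

/* Hypothetical, simplified route: 'dep' is the index of the route that
 * must already be installed before this one is accepted, -1 if none.
 */
struct fake_route {
	int dep;
	bool installed;
};

/* Stand-in for one request carrying the whole dump: routes are handled
 * one by one, in order, the way RTM_NEWROUTE messages are processed
 * sequentially. Returns the number of routes rejected in this pass.
 * Assumption: routes that are already installed are silently skipped.
 */
static int fake_send_dump(struct fake_route *r, size_t n)
{
	int errors = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (r[i].installed)
			continue;

		if (r[i].dep >= 0 && !r[r[i].dep].installed) {
			errors++;	/* dependency missing: rejected */
			continue;
		}

		r[i].installed = true;
	}

	return errors;
}

/* Send the full dump at most n times, but stop as soon as one pass
 * comes back without errors. Returns the number of passes sent.
 */
static int dup_routes(struct fake_route *r, size_t n)
{
	int passes = 0;
	size_t i;

	assert(n <= MAX_ROUTES);

	for (i = 0; i < n; i++) {
		passes++;
		if (!fake_send_dump(r, n))
			break;
	}

	return passes;
}
```

With three routes where the first depends on the third and the second
depends on the first, the first pass installs only the third route, the
second pass installs the other two without errors, and the loop stops
after two passes instead of three.
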
>
> By the way, note that in general we'll get EEXIST in the "extended ACK"
> for any message we send, because we just inserted addresses that
> already created their prefix routes.

Ah, right.  That might make it more trouble than it's worth.

> We could think of setting the IFA_F_NOPREFIXROUTE flag on addresses, on
> NL_DUP in nl_addr(), or even always, to avoid this.
>

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson

--gQUmkxT15Z8CFIHB
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmRxZZcACgkQzQJF27ox
2GchaxAAoe45ES8DdUKlAnho4zMuGAKeYguLAz1QteWYDWuyMDmjT52btFBse9X7
Kibfw4xK1B6pX1liF1yET+xWGbaaa7plIrXE1sN/vV5o5jdvNtHYI29EBTokJGr+
lBBJ9CRgFdbSAL6k8McmJWr0XaE2X8JEFDWLKPEbZs9F1So9y7hOjcn9HNkr6wiY
sQWPufkN8kPvaMz9e1RxXyXQhpcxjZhnn5+lkf/uatQaAAntfYzGJqNLfNWUPzAw
dc9c/18lpfzOzLRA4JdOoWKREoYuLORirWjCbLFM27V4nLSgcke65OGflMxgQgSl
MlUVR/jvAt0mFCyOF2KN6bElmPJx0jKsU1OKiJ0mkEX/kOM6q2rohxxGhHp58uFe
XUTO3tqskvWLjr121hXNC0tuPbgZsA1puhwMCK6FCP5IphbtAknNhdiaMiyHrYlu
ZTn2UCaeEM/umA+iU1KI4maShfrd2gsCR8DSYfigiLmSLuk8T4DeVIiIUbn+qpnK
rtD5bCJV4bmpkp1WSJOTdD3TXylLMtOc8pDfigP6MKuqiry81stdEBm8HxU39SBb
JHR28ICeuZw3NCwCmuHTpMjcpt2j+kMA9OE4CDW2FJkuAKpNew1tJs6ZiSIZTz8T
YqOi5QKciVbu1WMNY5tJ1YxmbJqZmzGgtD4ZgDaG2GduGHtm7b8=
=5Gbz
-----END PGP SIGNATURE-----

--gQUmkxT15Z8CFIHB--