From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 22 May 2023 18:42:01 +1000
From: David Gibson
To: Stefano Brivio
CC: passt-dev@passt.top, Callum Parsey, me@yawnt.com, lemmi@nerd2nerd.org, Andrea Arcangeli
Subject: Re: [PATCH v2 03/10] netlink: Add functionality to copy routes from outer namespace
References: <20230521234224.2770015-1-sbrivio@redhat.com> <20230521234224.2770015-4-sbrivio@redhat.com>
In-Reply-To: <20230521234224.2770015-4-sbrivio@redhat.com>
List-Id: Development discussion and patches for passt
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="UyuUDrrWgpiSd2io"
Content-Disposition: inline

--UyuUDrrWgpiSd2io
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Mon, May 22, 2023 at 01:42:17AM +0200, Stefano Brivio wrote:
> Instead of just fetching the default gateway and configuring a single
> equivalent route in the target namespace, on 'pasta --config-net', it
> might be desirable in some cases to copy the whole set of routes
> corresponding to a given output interface.
>
> For instance, in:
>   https://github.com/containers/podman/issues/18539
>   IPv4 Default Route Does Not Propagate to Pasta Containers on Hetzner VPSes
>
> configuring the default gateway won't work without a gateway-less
> route (specifying the output interface only), because the default
> gateway is, somewhat dubiously, not on the same subnet as the
> container.
>
> This is a similar case to the one covered by commit 7656a6f88882
> ("conf: Adjust netmask on mismatch between IPv4 address/netmask and
> gateway"), and I'm not exactly proud of that workaround.
>
> We also have:
>   https://bugs.passt.top/show_bug.cgi?id=49
>   pasta does not work with tap-style interface
>
> for which, eventually, we should be able to configure a gateway-less
> route in the target namespace.
>
> Introduce different operation modes for nl_route(), including a new
> NL_DUP one, not exposed yet, which simply parrots back to the kernel
> the route dump for a given interface from the outer namespace, fixing
> up flags and interface indices on the way, and requesting to add the
> same routes in the target namespace, on the interface we manage.
>
> For n routes we want to duplicate, send n identical netlink requests
> including the full dump: routes might depend on each other and the
> kernel processes RTM_NEWROUTE messages sequentially, not atomically,
> and repeating the full dump naturally resolves dependencies without
> the need to actually calculate them.
>
> I'm not kidding, it actually works pretty well.

If there's a way to detect whether the kernel rejected some of the
routes, it would be nice to cut that loop short as soon as all the
routes are inserted.  Obviously that could be a followup improvement,
though.

Reviewed-by: David Gibson

>
> Link: https://github.com/containers/podman/issues/18539
> Link: https://bugs.passt.top/show_bug.cgi?id=49
> Signed-off-by: Stefano Brivio
> ---
>  conf.c    |  4 ++--
>  netlink.c | 71 ++++++++++++++++++++++++++++++++++++++++++-------------
>  netlink.h |  9 ++++++-
>  pasta.c   |  6 +++--
>  4 files changed, 68 insertions(+), 22 deletions(-)
>
> diff --git a/conf.c b/conf.c
> index 984c3ce..1f6bbef 100644
> --- a/conf.c
> +++ b/conf.c
> @@ -646,7 +646,7 @@ static unsigned int conf_ip4(unsigned int ifi,
>  	}
>  
>  	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->gw))
> -		nl_route(0, ifi, AF_INET, &ip4->gw);
> +		nl_route(NL_GET, ifi, 0, AF_INET, &ip4->gw);
>  
>  	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->addr))
>  		nl_addr(0, ifi, AF_INET, &ip4->addr, &ip4->prefix_len, NULL);
> @@ -718,7 +718,7 @@ static unsigned int conf_ip6(unsigned int ifi,
>  	}
>  
>  	if (IN6_IS_ADDR_UNSPECIFIED(&ip6->gw))
> -		nl_route(0, ifi, AF_INET6, &ip6->gw);
> +		nl_route(NL_GET, ifi, 0, AF_INET6, &ip6->gw);
>  
>  	nl_addr(0, ifi, AF_INET6,
>  		IN6_IS_ADDR_UNSPECIFIED(&ip6->addr) ? &ip6->addr : NULL,
> diff --git a/netlink.c b/netlink.c
> index c07a13c..d93ecda 100644
> --- a/netlink.c
> +++ b/netlink.c
> @@ -185,16 +185,16 @@ unsigned int nl_get_ext_if(sa_family_t af)
>  }
>  
>  /**
> - * nl_route() - Get/set default gateway for given interface and address family
> - * @ns:		Use netlink socket in namespace
> - * @ifi:	Interface index
> + * nl_route() - Get/set/copy routes for given interface and address family
> + * @op:		Requested operation
> + * @ifi:	Interface index in outer network namespace
> + * @ifi_ns:	Interface index in target namespace for NL_SET, NL_DUP
>   * @af:		Address family
> - * @gw:		Default gateway to fill if zero, to set if not
> + * @gw:		Default gateway to fill on NL_GET, to set on NL_SET
>   */
> -void nl_route(int ns, unsigned int ifi, sa_family_t af, void *gw)
> +void nl_route(enum nl_op op, unsigned int ifi, unsigned int ifi_ns,
> +	      sa_family_t af, void *gw)
>  {
> -	int set = (af == AF_INET6 && !IN6_IS_ADDR_UNSPECIFIED(gw)) ||
> -		  (af == AF_INET && *(uint32_t *)gw);
>  	struct req_t {
>  		struct nlmsghdr nlh;
>  		struct rtmsg rtm;
> @@ -215,7 +215,7 @@ void nl_route(int ns, unsigned int ifi, sa_family_t af, void *gw)
>  			} r4;
>  		} set;
>  	} req = {
> -		.nlh.nlmsg_type = set ? RTM_NEWROUTE : RTM_GETROUTE,
> +		.nlh.nlmsg_type = op == NL_SET ? RTM_NEWROUTE : RTM_GETROUTE,
>  		.nlh.nlmsg_flags = NLM_F_REQUEST,
>  		.nlh.nlmsg_seq = nl_seq++,
>  
> @@ -228,14 +228,15 @@ void nl_route(int ns, unsigned int ifi, sa_family_t af, void *gw)
>  		.rta.rta_len = RTA_LENGTH(sizeof(unsigned int)),
>  		.ifi = ifi,
>  	};
> +	unsigned dup_routes = 0;
> +	ssize_t n, nlmsgs_size;
>  	struct nlmsghdr *nh;
>  	struct rtattr *rta;
> -	struct rtmsg *rtm;
>  	char buf[NLBUFSIZ];
> -	ssize_t n;
> +	struct rtmsg *rtm;
>  	size_t na;
>  
> -	if (set) {
> +	if (op == NL_SET) {
>  		if (af == AF_INET6) {
>  			size_t rta_len = RTA_LENGTH(sizeof(req.set.r6.d));
>  
> @@ -269,31 +270,67 @@ void nl_route(int ns, unsigned int ifi, sa_family_t af, void *gw)
>  		req.nlh.nlmsg_flags |= NLM_F_DUMP;
>  	}
>  
> -	if ((n = nl_req(ns, buf, &req, req.nlh.nlmsg_len)) < 0 || set)
> +	if ((n = nl_req(op == NL_SET, buf, &req, req.nlh.nlmsg_len)) < 0)
> +		return;
> +
> +	if (op == NL_SET)
>  		return;
>  
>  	nh = (struct nlmsghdr *)buf;
> +	nlmsgs_size = n;
> +
>  	for ( ; NLMSG_OK(nh, n); nh = NLMSG_NEXT(nh, n)) {
>  		if (nh->nlmsg_type != RTM_NEWROUTE)
>  			goto next;
>  
> +		if (op == NL_DUP) {
> +			nh->nlmsg_seq = nl_seq++;
> +			nh->nlmsg_pid = 0;
> +			nh->nlmsg_flags &= ~NLM_F_DUMP_FILTERED;
> +			nh->nlmsg_flags |= NLM_F_REQUEST | NLM_F_ACK |
> +					   NLM_F_CREATE;
> +			dup_routes++;
> +		}
> +
>  		rtm = (struct rtmsg *)NLMSG_DATA(nh);
> -		if (rtm->rtm_dst_len)
> +		if (op == NL_GET && rtm->rtm_dst_len)
>  			continue;
>  
>  		for (rta = RTM_RTA(rtm), na = RTM_PAYLOAD(nh); RTA_OK(rta, na);
>  		     rta = RTA_NEXT(rta, na)) {
> -			if (rta->rta_type != RTA_GATEWAY)
> -				continue;
> +			if (op == NL_GET) {
> +				if (rta->rta_type != RTA_GATEWAY)
> +					continue;
>  
> -			memcpy(gw, RTA_DATA(rta), RTA_PAYLOAD(rta));
> -			return;
> +				memcpy(gw, RTA_DATA(rta), RTA_PAYLOAD(rta));
> +				return;
> +			}
> +
> +			if (op == NL_DUP && rta->rta_type == RTA_OIF)
> +				*(unsigned int *)RTA_DATA(rta) = ifi_ns;
>  		}
>  
>  next:
>  		if (nh->nlmsg_type == NLMSG_DONE)
>  			break;
>  	}
> +
> +	if (op == NL_DUP) {
> +		char resp[NLBUFSIZ];
> +		unsigned i;
> +
> +		nh = (struct nlmsghdr *)buf;
> +		/* Routes might have dependencies between each other, and the
> +		 * kernel processes RTM_NEWROUTE messages sequentially. For n
> +		 * valid routes, we might need to send up to n requests to get
> +		 * all of them inserted. Routes that have been already inserted
> +		 * won't cause the whole request to fail, so we can simply
> +		 * repeat the whole request. This approach avoids the need to
> +		 * calculate dependencies: let the kernel do that.
> +		 */
> +		for (i = 0; i < dup_routes; i++)
> +			nl_req(1, resp, nh, nlmsgs_size);
> +	}
>  }
>  
>  /**
> diff --git a/netlink.h b/netlink.h
> index ca4d6ef..217cf1e 100644
> --- a/netlink.h
> +++ b/netlink.h
> @@ -6,9 +6,16 @@
>  #ifndef NETLINK_H
>  #define NETLINK_H
>  
> +enum nl_op {
> +	NL_GET,
> +	NL_SET,
> +	NL_DUP,
> +};
> +
>  void nl_sock_init(const struct ctx *c, bool ns);
>  unsigned int nl_get_ext_if(sa_family_t af);
> -void nl_route(int ns, unsigned int ifi, sa_family_t af, void *gw);
> +void nl_route(enum nl_op op, unsigned int ifi, unsigned int ifi_ns,
> +	      sa_family_t af, void *gw);
>  void nl_addr(int ns, unsigned int ifi, sa_family_t af,
>  	     void *addr, int *prefix_len, void *addr_l);
>  void nl_link(int ns, unsigned int ifi, void *mac, int up, int mtu);
> diff --git a/pasta.c b/pasta.c
> index 2a6fb60..01109f5 100644
> --- a/pasta.c
> +++ b/pasta.c
> @@ -278,14 +278,16 @@ void pasta_ns_conf(struct ctx *c)
>  		if (c->ifi4) {
>  			nl_addr(1, c->pasta_ifi, AF_INET, &c->ip4.addr,
>  				&c->ip4.prefix_len, NULL);
> -			nl_route(1, c->pasta_ifi, AF_INET, &c->ip4.gw);
> +			nl_route(NL_SET, c->ifi4, c->pasta_ifi, AF_INET,
> +				 &c->ip4.gw);
>  		}
>  
>  		if (c->ifi6) {
>  			int prefix_len = 64;
>  			nl_addr(1, c->pasta_ifi, AF_INET6, &c->ip6.addr,
>  				&prefix_len, NULL);
> -			nl_route(1, c->pasta_ifi, AF_INET6, &c->ip6.gw);
> +			nl_route(NL_SET, c->ifi6, c->pasta_ifi, AF_INET6,
> +				 &c->ip6.gw);
>  		}
>  	} else {
>  		nl_link(1, c->pasta_ifi, c->mac_guest, 0, 0);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

--UyuUDrrWgpiSd2io
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmRrKtEACgkQzQJF27ox
2GfxSg/+J5ggiSAqIZaPjaE8brWdUBPkOZWJou0BfezTewtwLwy19S2u3bl+Ljnq
E5ShWW1dGS9nYQh8aFOGx3CwaqM74xId8Kyar3oyYZz2+GHpvl2iHrxKEQ3ZDf9v
RABgeS++72ilF+3ZZP5TSknc6enR7j9goCpGNunji15K5155uVPtIDIMpaufLe54
QbpUjtV1EmewTpdxtzCYbIxghqEG20pOe1mF9gHvZBrDu+m/QeUEgUCUJgsGxiNT
71izhiDhyAJLz5o9Rf0Xg/XAcBJZhiMCCg7gnzCzcedvsfUp38mO+uL3pYlwxoSN
2KJjkqFNu1r7/v9FG/piwRjACijHefIz08Ooz603S8aNJGOk9yBNNhZl6j9/fiWe
S90Ism0dkVvR3Xo/4L5sZLEzt4bx5vOeb59Xg/CpWxrKFa7L/cG96sp9zIwOTZU7
1qrb/39si6+mOoB0wYqPrjDImkJbc5plbkQ74NH++BiUHINirWqtVuB7l5W9tBKk
MA9fB/sUVM7x6VVfa4b/COczJ62V++WCEPKGNQandegpOeFhBBUQ02qnk8Fpu0ho
KfXnehf2yoMLVzO7EBzAUdx5TIueXxUlUU95rYDzKCHrlek76hDfrlgcxRr4M6Qu
pg7q8oVD0hBy2AxM+vNiTPMHD8d8FePTKWzYPYyiREeTYtkReKo=
=qixZ
-----END PGP SIGNATURE-----

--UyuUDrrWgpiSd2io--