On Tue, May 23, 2023 at 08:14:07AM +0200, Stefano Brivio wrote:
> On Tue, 23 May 2023 13:08:21 +1000
> David Gibson wrote:
>
> > On Mon, May 22, 2023 at 11:58:51AM +0200, Stefano Brivio wrote:
> > > On Mon, 22 May 2023 18:42:01 +1000
> > > David Gibson wrote:
> > >
> > > > On Mon, May 22, 2023 at 01:42:17AM +0200, Stefano Brivio wrote:
> > > > > Instead of just fetching the default gateway and configuring a single
> > > > > equivalent route in the target namespace, on 'pasta --config-net', it
> > > > > might be desirable in some cases to copy the whole set of routes
> > > > > corresponding to a given output interface.
> > > > >
> > > > > For instance, in:
> > > > >   https://github.com/containers/podman/issues/18539
> > > > >   IPv4 Default Route Does Not Propagate to Pasta Containers on Hetzner VPSes
> > > > >
> > > > > configuring the default gateway won't work without a gateway-less
> > > > > route (specifying the output interface only), because the default
> > > > > gateway is, somewhat dubiously, not on the same subnet as the
> > > > > container.
> > > > >
> > > > > This is a similar case to the one covered by commit 7656a6f88882
> > > > > ("conf: Adjust netmask on mismatch between IPv4 address/netmask and
> > > > > gateway"), and I'm not exactly proud of that workaround.
> > > > >
> > > > > We also have:
> > > > >   https://bugs.passt.top/show_bug.cgi?id=49
> > > > >   pasta does not work with tap-style interface
> > > > >
> > > > > for which, eventually, we should be able to configure a gateway-less
> > > > > route in the target namespace.
> > > > >
> > > > > Introduce different operation modes for nl_route(), including a new
> > > > > NL_DUP one, not exposed yet, which simply parrots back to the kernel
> > > > > the route dump for a given interface from the outer namespace, fixing
> > > > > up flags and interface indices on the way, and requesting to add the
> > > > > same routes in the target namespace, on the interface we manage.
> > > > >
> > > > > For n routes we want to duplicate, send n identical netlink requests
> > > > > including the full dump: routes might depend on each other and the
> > > > > kernel processes RTM_NEWROUTE messages sequentially, not atomically,
> > > > > and repeating the full dump naturally resolves dependencies without
> > > > > the need to actually calculate them.
> > > > >
> > > > > I'm not kidding, it actually works pretty well.
> > > >
> > > > If there's a way to detect whether the kernel rejected some of the
> > > > routes, it would be nice to cut that loop short as soon as all the
> > > > routes are inserted.  Obviously that could be a followup improvement,
> > > > though.
> > >
> > > Yes, there's a way, but to keep things asynchronous in a simple way we
> > > process errors from nl_req() only at the next request.
> > >
> > > This part doesn't really need to be asynchronous, though: we could add
> > > a flag for nl_req() saying that we want to know about NLMSG_ERROR right
> > > away. This looks relatively straightforward, and already an improvement
> > > in the sense you mentioned.
> > >
> > > Actually parsing the error and finding out the offending route is a bit
> > > more complicated, though.
> >
> > Right, but we don't necessarily need to do that: all we need is that
> > if there are *no* errors we can stop the loop early.
>
> Yes yes, that's what I meant with the paragraph before.
>
> By the way, note that in general we'll get EEXIST in the "extended ACK"
> for any message we send, because we just inserted addresses that
> already created their prefix routes.

Ah, right.  That might make it more trouble than it's worth.

> We could think of setting the IFA_F_NOPREFIXROUTE flag on addresses, on
> NL_DUP in nl_addr(), or even always, to avoid this.
>

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
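
For illustration only, the "repeat the full dump" idea and the early stop
discussed above can be modelled with a toy routing table. This is a sketch
under loud assumptions, not passt code and not real netlink: ToyKernel,
Route, send_dump() and dup_routes() are hypothetical names, the rule "a
gateway must be covered by an already-installed gateway-less route" is a
simplification of what the kernel checks, and treating EEXIST as harmless
is one possible policy that the thread does not settle on.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Route:
    dst: str            # destination prefix, e.g. "0.0.0.0/0"
    gw: Optional[str]   # gateway, or None for a gateway-less route


class ToyKernel:
    """Hypothetical stand-in for a routing table (not real netlink)."""

    def __init__(self) -> None:
        self.routes = set()

    def _gw_reachable(self, gw: str) -> bool:
        # Assumption: a gateway is usable only if some installed
        # gateway-less route already covers its address.
        ip = ipaddress.ip_address(gw)
        return any(r.gw is None and ip in ipaddress.ip_network(r.dst)
                   for r in self.routes)

    def add(self, route: Route) -> Optional[str]:
        """Insert one route; return None on success or an error name."""
        if route in self.routes:
            return "EEXIST"
        if route.gw is not None and not self._gw_reachable(route.gw):
            return "ENETUNREACH"
        self.routes.add(route)
        return None

    def send_dump(self, dump: List[Route]) -> List[Optional[str]]:
        # Messages are processed one by one, in order, not atomically.
        return [self.add(r) for r in dump]


def dup_routes(kernel: ToyKernel, dump: List[Route]) -> int:
    """Send the full dump up to len(dump) times; return passes used."""
    passes = 0
    for _ in range(len(dump)):
        passes += 1
        errors = kernel.send_dump(dump)
        # Early stop: nothing failed except for already-present routes.
        if all(e is None or e == "EEXIST" for e in errors):
            break
    return passes


# Default route listed before the gateway-less route it depends on.
dump = [Route("0.0.0.0/0", "172.31.1.1"), Route("172.31.1.1/32", None)]
kernel = ToyKernel()
passes = dup_routes(kernel, dump)
```

In this model, a dump whose default route precedes the gateway-less route
it depends on needs a second pass, while the reverse order completes in
one, which is the case where cutting the loop short would save requests.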