On Wed, Jul 20, 2022 at 10:23:22AM +0200, Stefano Brivio wrote: > On Wed, 20 Jul 2022 17:24:18 +1000 > David Gibson wrote: > > > On Wed, Jul 20, 2022 at 12:47:57PM +1000, David Gibson wrote: > > > On Tue, Jul 19, 2022 at 09:05:52PM +0200, Stefano Brivio wrote: > > > > On Tue, 19 Jul 2022 16:20:45 +1000 > > > > David Gibson wrote: > > > > > > > > > On Fri, Jul 15, 2022 at 03:21:40PM +1000, David Gibson wrote: > > > > > > By default, passt itself attaches to the first host interface with a > > > > > > default route. However, when determining the host interface name the tests > > > > > > implicitly select the *last* host interface: they use a jq expression which > > > > > > will list all interfaces with default routes, but the way output detection > > > > > > works in the scripts, it will only pick up the last line. > > > > > > > > > > > > If there are multiple interfaces with default routes on the host, and they > > > > > > each have a different address, this can cause spurious test > > > > > > failures. > > > > > > > > > > It seems this change is not enough to always fix the tests when there > > > > > are multiple default routes. I'm still sometimes getting failures, > > > > > now because passt itself doesn't seem to be picking the interface > > > > > with the first default route. > > > > > > > > > > I'm wondering if this is because ip(8) is sorting the output, not just > > > > > presenting it in the same order that the underlying netlink interface > > > > > does. > > > > > > > > I don't see that happening at least in my environment (and I also can't > > > > see any code that would sort it, that pretty much comes from > > > > rtnl_dump_filter_l() in lib/libnetlink.c). > > > > > > > > The netlink "filter", though, is slightly different. For IPv4: > > > > > > > > $ strace -e sendto ip route show >/dev/null > > > > sendto(3, [{{len=36, type=RTM_GETROUTE, flags=NLM_F_REQUEST|NLM_F_DUMP, seq=1658257027, pid=0}, {rtm_family=AF_INET, rtm_dst_len=0, rtm_src_len=0, rtm_tos=0, rtm_table=RT_TABLE_UNSPEC, rtm_protocol=RTPROT_UNSPEC, rtm_scope=RT_SCOPE_UNIVERSE, rtm_type=RTN_UNSPEC, rtm_flags=0}, {{nla_len=8, nla_type=RTA_TABLE}, RT_TABLE_MAIN}}, {len=0, type=0 /* NLMSG_??? */, flags=0, seq=0, pid=0}], 156, 0, NULL, 0) = 156 > > > > > > > > $ strace -e sendto ./passt -f > > > > sendto(5, {{len=28, type=RTM_GETROUTE, flags=NLM_F_REQUEST|NLM_F_DUMP, seq=0, pid=0}, {rtm_family=AF_INET, rtm_dst_len=0, rtm_src_len=0, rtm_tos=0, rtm_table=RT_TABLE_MAIN, rtm_protocol=RTPROT_UNSPEC, rtm_scope=RT_SCOPE_UNIVERSE, rtm_type=RTN_UNICAST, rtm_flags=0}}, 28, 0, NULL, 0) = 28 > > > > [...] > > > > > > > > and, while I don't think the FIB trie is actually descended in a > > > > different way, we might still have a slightly different result from the > > > > kernel (if I recall correctly, I didn't check right now). > > > > So, I've been staring at this for most of the day. I don't think the > > difference is because of the slightly different filters. Instead it's > > an easy to miss subtletly in nl_get_ext_if(). > > > > In nl_get_ext_if() both first_v4 and first_v6 will be set to the > > interface which meets the criteria and comes first in the order of > > routes returned by netlink from the kernel. However, there's this early return: > > > > word = (long *)has_v4; > > for (i = 0; i < ARRAY_SIZE(has_v4) / sizeof(long); i++, word++) { > > tmp = *word; > > while ((n = ffsl(tmp))) { > > int ifi = i * sizeof(long) * 8 + n - 1; > > > > if (!first_v4) > > first_v4 = ifi; > > > > tmp &= ~(1UL << (n - 1)); > > if (bitmap_isset(has_v6, ifi)) { > > *v4 = *v6 = IP_VERSION_ENABLED; > > return ifi; <================ HERE > > } > > } > > } > > > > Which means that if we find an interface with both a v4 and a v6 > > default route, we'll return it immediately. That loop is across the > > has_v4 bitmap, so we'll return the first one in *interface index order*, > > which might not necessarily be the same as the order the routes are > > returned from netlink. > > Whoops :) ...right, sorry, I forgot about that. > > > In the other cases: one of v4 or v6 is disabled, or there's no > > interface with both v4 and v6 default routes, then we'll end up > > returning the first interface in netlink route order. Which makes > > passt's overall priority of selecting interfaces pretty hard to > > follow. > > > > I've been thinking a bunch about what the order should be. The > > obvious cleanup to the current logic would be: > > 1. If there are >0 interfaces with both default v4 and v6 > > routes, pick the one with the lowest metric > > 2. Otherwise if there are >0 interfaces with a default v4 > > route, pick the one with the lowest metric > > 3. Otherwise if there are >0 interfaces with a default v6 > > route, pick the one with the lowest metric > > 4. Otherwise fail. > > > > ..but, I think that's the wrong choice. For one thing it gets weird > > in the unlikely but possible case that there are two interfaces with > > both v4 and v6 routes, but one has a lower v4 metric and the other has > > a lower v6 metric. But more importantly, I think it fails the "do > > what you want by default" test in the case of a host that has both v4 > > and v6 routing, but not on the same interface. That's quite a > > realistic situation if the v6 connectivity is coming from a tunnel > > (6in4 or 6rd or 6over4 or whatever). > > Definitely, I didn't consider the tunneled IPv6 case, but it looks > still rather real even nowadays. Yeah, I gather IPv6 is finally fairly standard in Germany, but here in Australia I'm with basically the only residential ISP which provides v6. > > Instead, I think we should separately pick an external interface for > > v4 and v6 connectivity. Yes, it's weird to have a view of two > > different host interfaces on the same virtual interface. It also > > means pasta has to arbitrarily pick one of the host interface names > > for the namespace, which will make the tests a bit uglier. But it > > makes picking the right interfaces a lot simpler and clearer, and does > > what you probably want for the case of a host with tunnelled ipv6 > > connectivity. > > I totally agree. It's still less weird than the alternative. Excellent. > We'll need to store two interface indices (say, "ifi4" and "ifi6"), but > their usage right now is almost entirely limited to configuration and > netlink functions. Yes, I looked through all the uses of c->ifi, and it looks fairly straightforward. > One additional usage is to select a given interface for sockets bound > to link-local addresses by means of setting .sin6_scope_id in struct > sockaddr_in6 -- that happens in tcp_conn_from_tap() and sock_l4(). > There, we just need to switch to the new "ifi6" index. > > Let me know if I should start looking into this or if you can do it > instead. I'll work on it. Hoping I can get that done before heading on holiday next week. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson