On Tue, 19 Jul 2022 16:20:45 +1000
David Gibson wrote:

> On Fri, Jul 15, 2022 at 03:21:40PM +1000, David Gibson wrote:
> > By default, passt itself attaches to the first host interface with a
> > default route. However, when determining the host interface name the tests
> > implicitly select the *last* host interface: they use a jq expression which
> > will list all interfaces with default routes, but the way output detection
> > works in the scripts, it will only pick up the last line.
> >
> > If there are multiple interfaces with default routes on the host, and they
> > each have a different address, this can cause spurious test
> > failures.
>
> It seems this change is not enough to always fix the tests when there
> are multiple default routes. I'm still sometimes getting failures,
> now because passt itself doesn't seem to be picking the interface
> with the first default route.
>
> I'm wondering if this is because ip(8) is sorting the output, not just
> presenting it in the same order that the underlying netlink interface
> does.

I don't see that happening, at least in my environment (and I also can't
see any code that would sort it: the output pretty much comes straight
from rtnl_dump_filter_l() in lib/libnetlink.c).

The netlink "filter", though, is slightly different. For IPv4:

$ strace -e sendto ip route show >/dev/null
sendto(3, [{{len=36, type=RTM_GETROUTE, flags=NLM_F_REQUEST|NLM_F_DUMP,
    seq=1658257027, pid=0}, {rtm_family=AF_INET, rtm_dst_len=0,
    rtm_src_len=0, rtm_tos=0, rtm_table=RT_TABLE_UNSPEC,
    rtm_protocol=RTPROT_UNSPEC, rtm_scope=RT_SCOPE_UNIVERSE,
    rtm_type=RTN_UNSPEC, rtm_flags=0}, {{nla_len=8, nla_type=RTA_TABLE},
    RT_TABLE_MAIN}}, {len=0, type=0 /* NLMSG_??? */, flags=0, seq=0,
    pid=0}], 156, 0, NULL, 0) = 156

$ strace -e sendto ./passt -f
sendto(5, {{len=28, type=RTM_GETROUTE, flags=NLM_F_REQUEST|NLM_F_DUMP,
    seq=0, pid=0}, {rtm_family=AF_INET, rtm_dst_len=0, rtm_src_len=0,
    rtm_tos=0, rtm_table=RT_TABLE_MAIN, rtm_protocol=RTPROT_UNSPEC,
    rtm_scope=RT_SCOPE_UNIVERSE, rtm_type=RTN_UNICAST, rtm_flags=0}},
    28, 0, NULL, 0) = 28
[...]

and, while I don't think the FIB trie is actually descended in a
different way, we might still get a slightly different result from the
kernel (if I recall correctly, I didn't check right now).

But leaving that aside for a moment: if you have two default routes, I
suppose they have different metrics. If not, what's the intended usage?

If yes, we should probably implement sorting logic in passt, so that
the route with the lowest metric is picked, and then adjust the jq
expression to also pick that one.

-- 
Stefano
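
To make the lowest-metric proposal concrete, here is a rough sketch of
the selection step only, not passt code. It assumes the default routes
from the RTM_GETROUTE dump have already been collected into an array.
struct default_route and pick_default_route() are hypothetical names
made up for this illustration, and treating a missing RTA_PRIORITY
attribute as metric 0 is an assumption about how the caller fills the
array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one default route; not a passt structure. */
struct default_route {
	unsigned int ifindex;	/* outgoing interface index (RTA_OIF) */
	uint32_t metric;	/* RTA_PRIORITY, assumed 0 when absent */
};

/*
 * Return the candidate with the lowest metric, or NULL if there are none.
 * On equal metrics the earliest entry wins, so the order of the dump
 * remains the tie-breaker.
 */
static const struct default_route *
pick_default_route(const struct default_route *routes, size_t n)
{
	const struct default_route *best = NULL;
	size_t i;

	assert(routes != NULL || n == 0);

	for (i = 0; i < n; i++) {
		/* Strict comparison keeps the first of several equal metrics */
		if (!best || routes[i].metric < best->metric)
			best = &routes[i];
	}

	return best;
}
```

With equal metrics this still depends on dump order, so the jq
expression in the tests would need the same tie-breaking rule to agree
with it.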