From mboxrd@z Thu Jan 1 00:00:00 1970 From: David Gibson To: passt-dev@passt.top Subject: Re: [PATCH 17/18] tests: Correct determination of host interface name in tests Date: Wed, 20 Jul 2022 17:24:18 +1000 Message-ID: In-Reply-To: MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============5093084521349801007==" --===============5093084521349801007== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable On Wed, Jul 20, 2022 at 12:47:57PM +1000, David Gibson wrote: > On Tue, Jul 19, 2022 at 09:05:52PM +0200, Stefano Brivio wrote: > > On Tue, 19 Jul 2022 16:20:45 +1000 > > David Gibson wrote: > >=20 > > > On Fri, Jul 15, 2022 at 03:21:40PM +1000, David Gibson wrote: > > > > By default, passt itself attaches to the first host interface with a > > > > default route. However, when determining the host interface name the= tests > > > > implicitly select the *last* host interface: they use a jq expression= which > > > > will list all interfaces with default routes, but the way output dete= ction > > > > works in the scripts, it will only pick up the last line. > > > >=20 > > > > If there are multiple interfaces with default routes on the host, and= they > > > > each have a different address, this can cause spurious test > > > > failures. =20 > > >=20 > > > It seems this change is not enough to always fix the tests when there > > > are multiple default routes. I'm still sometimes getting failures, > > > now because passt itself doesn't seem to be picking the interface > > > with the first default route. > > >=20 > > > I'm wondering if this is because ip(8) is sorting the output, not just > > > presenting it in the same order that the underlying netlink interface > > > does. > >=20 > > I don't see that happening at least in my environment (and I also can't > > see any code that would sort it, that pretty much comes from > > rtnl_dump_filter_l() in lib/libnetlink.c). > >=20 > > The netlink "filter", though, is slightly different. For IPv4: > >=20 > > $ strace -e sendto ip route show >/dev/null > > sendto(3, [{{len=3D36, type=3DRTM_GETROUTE, flags=3DNLM_F_REQUEST|NLM_F_D= UMP, seq=3D1658257027, pid=3D0}, {rtm_family=3DAF_INET, rtm_dst_len=3D0, rtm_= src_len=3D0, rtm_tos=3D0, rtm_table=3DRT_TABLE_UNSPEC, rtm_protocol=3DRTPROT_= UNSPEC, rtm_scope=3DRT_SCOPE_UNIVERSE, rtm_type=3DRTN_UNSPEC, rtm_flags=3D0},= {{nla_len=3D8, nla_type=3DRTA_TABLE}, RT_TABLE_MAIN}}, {len=3D0, type=3D0 /*= NLMSG_??? */, flags=3D0, seq=3D0, pid=3D0}], 156, 0, NULL, 0) =3D 156 > >=20 > > $ strace -e sendto ./passt -f > > sendto(5, {{len=3D28, type=3DRTM_GETROUTE, flags=3DNLM_F_REQUEST|NLM_F_DU= MP, seq=3D0, pid=3D0}, {rtm_family=3DAF_INET, rtm_dst_len=3D0, rtm_src_len=3D= 0, rtm_tos=3D0, rtm_table=3DRT_TABLE_MAIN, rtm_protocol=3DRTPROT_UNSPEC, rtm_= scope=3DRT_SCOPE_UNIVERSE, rtm_type=3DRTN_UNICAST, rtm_flags=3D0}}, 28, 0, NU= LL, 0) =3D 28 > > [...] > >=20 > > and, while I don't think the FIB trie is actually descended in a > > different way, we might still have a slightly different result from the > > kernel (if I recall correctly, I didn't check right now). So, I've been staring at this for most of the day. I don't think the difference is because of the slightly different filters. Instead it's an easy to miss subtletly in nl_get_ext_if(). In nl_get_ext_if() both first_v4 and first_v6 will be set to the interface which meets the criteria and comes first in the order of routes returned by netlink from the kernel. However, there's this early retu= rn: word =3D (long *)has_v4; for (i =3D 0; i < ARRAY_SIZE(has_v4) / sizeof(long); i++, word++) { tmp =3D *word; while ((n =3D ffsl(tmp))) { int ifi =3D i * sizeof(long) * 8 + n - 1; if (!first_v4) first_v4 =3D ifi; tmp &=3D ~(1UL << (n - 1)); if (bitmap_isset(has_v6, ifi)) { *v4 =3D *v6 =3D IP_VERSION_ENABLED; return ifi; <=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D HERE } } } Which means that if we find an interface with both a v4 and a v6 default route, we'll return it immediately. That loop is across the has_v4 bitmap, so we'll return the first one in *interface index order*, which might not necessarily be the same as the order the routes are returned from netlink. In the other cases: one of v4 or v6 is disabled, or there's no interface with both v4 and v6 default routes, then we'll end up returning the first interface in netlink route order. Which makes passt's overall priority of selecting interfaces pretty hard to follow. I've been thinking a bunch about what the order should be. The obvious cleanup to the current logic would be: 1. If there are >0 interfaces with both default v4 and v6 routes, pick the one with the lowest metric 2. Otherwise if there are >0 interfaces with a default v4 route, pick the one with the lowest metric 3. Otherwise if there are >0 interfaces with a default v6 route, pick the one with the lowest metric 4. Otherwise fail. ..but, I think that's the wrong choice. For one thing it gets weird in the unlikely but possible case that there are two interfaces with both v4 and v6 routes, but one has a lower v4 metric and the other has a lower v6 metric. But more importantly, I think it fails the "do what you want by default" test in the case of a host that has both v4 and v6 routing, but not on the same interface. That's quite a realistic situation if the v6 connectivity is coming from a tunnel (6in4 or 6rd or 6over4 or whatever). Instead, I think we should separately pick an external interface for v4 and v6 connectivity. Yes, it's weird to have a view of two different host interfaces on the same virtual interface. It also means pasta has to arbitrarily pick one of the host interface names for the namespace, which will make the tests a bit uglier. But it makes picking the right interfaces a lot simpler and clearer, and does what you probably want for the case of a host with tunnelled ipv6 connectivity. > > But letting that aside for a moment: if you have two default routes, I > > suppose they have different metrics. If not, what's the intended usage? >=20 > Yes, there are two different metrics. I was thinking after I sent > this that we should sort by metric. >=20 > > If yes, we should probably implement a sorting logic in passt, so that > > the route with the lowest metric is picked, and then adjust the jq > > expression to also pick that one. >=20 > Agreed. I'll see if I can implement this. Up to you whether you drop > this patch or I just do another update on top of it. >=20 --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --===============5093084521349801007== Content-Type: application/pgp-signature Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="signature.asc" MIME-Version: 1.0 LS0tLS1CRUdJTiBQR1AgU0lHTkFUVVJFLS0tLS0KCmlRSXpCQUVCQ0FBZEZpRUVvVUx4V3U0L1dz MGRCK1h0Z3lwWTRnRXdZU0lGQW1MWHJZY0FDZ2tRZ3lwWTRnRXcKWVNKZzB3Ly9jMjJZeitHbGRo dlpOeGJxb1YzSjVHSXBJUm1mM3JuOFEvK2d1OXBuakRCMjY5aEhrQWdWS25NdQoxZHIyaXoxZDlp TWpVRks2OWYrbjJTVVIwaFZEcUFIendwRXArOU4relZyR01XTmRnTlo4ZWcxbnFta00xRVlQCm1F OFlUaXg1SHZ0MldqbUFYWXVxNzl3eC9LVHh3dXhUeXd5T1ROSjJoMXROVldUNHNQNXVEaEs1TURM aDhMME8KdDIxZVFqYktkRGFiL1VzVEplWEY4Rm1VcEhZNitIeTBBVmYxWkNiTi80R0FNVVZlYmdm eFVDMnFCblhnS0lMbQo5TXNicTZSVzR3UjFHMTNTOFFpVmJwN0NIY1NKcmJOZVl2THR1dEN1T1J3 MnVRV1Z0N1Bvd28vK0RkenRUVG0xCjMzVGo3V0dtb3dvbUNPaTBnajlwVGRaM3h1Yk53Tk9TTHNk eE5vRTZRa2krakdsNE41U3drRjBpWC96TTRId2wKa2I3Y3lzbzJzWTRNTUhDR0k2c2RQK3ZlaE1U SDJWWWdCdVlGaUNEOUxxSWQvekxXNGZPVUlMWW9vbkx2d0RDMAo0dGNMclp3RmF0K2hINVVxOGwr RzJiUEZUV0VZZkNCOHV3cTZrSTNENDR0eWVZNUhyQnZGZUxMR3ZHWnJ4a2hmCjRRd3FvbWtvUmx3 ekVDUks3Q1hQSWplUDF6YUJDV2JuZUJVRGVtY0xMc1Ixd1N0d0tlaDhDZkYvYUQvcTN2NnYKQ2x3 U2JWZSt3VmwyTHBPN1NVWlZiSnRucFdtS3ZmSlhJRW5nb2N0c29nTytxOE9LWWVNYXp2RW5SREVM WVFkcwpmZjNIRThub2h2TnkvTGVBRDdlNjhpclFlaDFHNEdvS2ozOXJPc0FXc01mSnV4NlROYnc9 Cj1OQXAvCi0tLS0tRU5EIFBHUCBTSUdOQVRVUkUtLS0tLQo= --===============5093084521349801007==--