From mboxrd@z Thu Jan 1 00:00:00 1970 From: David Gibson To: passt-dev@passt.top Subject: Re: [PATCH 17/18] tests: Correct determination of host interface name in tests Date: Wed, 20 Jul 2022 20:33:23 +1000 Message-ID: In-Reply-To: <20220720102322.2d4d85f9@elisabeth> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============3494557058101226565==" --===============3494557058101226565== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable On Wed, Jul 20, 2022 at 10:23:22AM +0200, Stefano Brivio wrote: > On Wed, 20 Jul 2022 17:24:18 +1000 > David Gibson wrote: >=20 > > On Wed, Jul 20, 2022 at 12:47:57PM +1000, David Gibson wrote: > > > On Tue, Jul 19, 2022 at 09:05:52PM +0200, Stefano Brivio wrote: =20 > > > > On Tue, 19 Jul 2022 16:20:45 +1000 > > > > David Gibson wrote: > > > > =20 > > > > > On Fri, Jul 15, 2022 at 03:21:40PM +1000, David Gibson wrote: =20 > > > > > > By default, passt itself attaches to the first host interface wit= h a > > > > > > default route. However, when determining the host interface name= the tests > > > > > > implicitly select the *last* host interface: they use a jq expres= sion which > > > > > > will list all interfaces with default routes, but the way output = detection > > > > > > works in the scripts, it will only pick up the last line. > > > > > >=20 > > > > > > If there are multiple interfaces with default routes on the host,= and they > > > > > > each have a different address, this can cause spurious test > > > > > > failures. =20 > > > > >=20 > > > > > It seems this change is not enough to always fix the tests when the= re > > > > > are multiple default routes. I'm still sometimes getting failures, > > > > > now because passt itself doesn't seem to be picking the interface > > > > > with the first default route. > > > > >=20 > > > > > I'm wondering if this is because ip(8) is sorting the output, not j= ust > > > > > presenting it in the same order that the underlying netlink interfa= ce > > > > > does. =20 > > > >=20 > > > > I don't see that happening at least in my environment (and I also can= 't > > > > see any code that would sort it, that pretty much comes from > > > > rtnl_dump_filter_l() in lib/libnetlink.c). > > > >=20 > > > > The netlink "filter", though, is slightly different. For IPv4: > > > >=20 > > > > $ strace -e sendto ip route show >/dev/null > > > > sendto(3, [{{len=3D36, type=3DRTM_GETROUTE, flags=3DNLM_F_REQUEST|NLM= _F_DUMP, seq=3D1658257027, pid=3D0}, {rtm_family=3DAF_INET, rtm_dst_len=3D0, = rtm_src_len=3D0, rtm_tos=3D0, rtm_table=3DRT_TABLE_UNSPEC, rtm_protocol=3DRTP= ROT_UNSPEC, rtm_scope=3DRT_SCOPE_UNIVERSE, rtm_type=3DRTN_UNSPEC, rtm_flags= =3D0}, {{nla_len=3D8, nla_type=3DRTA_TABLE}, RT_TABLE_MAIN}}, {len=3D0, type= =3D0 /* NLMSG_??? */, flags=3D0, seq=3D0, pid=3D0}], 156, 0, NULL, 0) =3D 156 > > > >=20 > > > > $ strace -e sendto ./passt -f > > > > sendto(5, {{len=3D28, type=3DRTM_GETROUTE, flags=3DNLM_F_REQUEST|NLM_= F_DUMP, seq=3D0, pid=3D0}, {rtm_family=3DAF_INET, rtm_dst_len=3D0, rtm_src_le= n=3D0, rtm_tos=3D0, rtm_table=3DRT_TABLE_MAIN, rtm_protocol=3DRTPROT_UNSPEC, = rtm_scope=3DRT_SCOPE_UNIVERSE, rtm_type=3DRTN_UNICAST, rtm_flags=3D0}}, 28, 0= , NULL, 0) =3D 28 > > > > [...] > > > >=20 > > > > and, while I don't think the FIB trie is actually descended in a > > > > different way, we might still have a slightly different result from t= he > > > > kernel (if I recall correctly, I didn't check right now). =20 > >=20 > > So, I've been staring at this for most of the day. I don't think the > > difference is because of the slightly different filters. Instead it's > > an easy to miss subtletly in nl_get_ext_if(). > >=20 > > In nl_get_ext_if() both first_v4 and first_v6 will be set to the > > interface which meets the criteria and comes first in the order of > > routes returned by netlink from the kernel. However, there's this early = return: > >=20 > > word =3D (long *)has_v4; > > for (i =3D 0; i < ARRAY_SIZE(has_v4) / sizeof(long); i++, word++) { > > tmp =3D *word; > > while ((n =3D ffsl(tmp))) { > > int ifi =3D i * sizeof(long) * 8 + n - 1; > >=20 > > if (!first_v4) > > first_v4 =3D ifi; > >=20 > > tmp &=3D ~(1UL << (n - 1)); > > if (bitmap_isset(has_v6, ifi)) { > > *v4 =3D *v6 =3D IP_VERSION_ENABLED; > > return ifi; <=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D HERE > > } > > } > > } > >=20 > > Which means that if we find an interface with both a v4 and a v6 > > default route, we'll return it immediately. That loop is across the > > has_v4 bitmap, so we'll return the first one in *interface index order*, > > which might not necessarily be the same as the order the routes are > > returned from netlink. >=20 > Whoops :) ...right, sorry, I forgot about that. >=20 > > In the other cases: one of v4 or v6 is disabled, or there's no > > interface with both v4 and v6 default routes, then we'll end up > > returning the first interface in netlink route order. Which makes > > passt's overall priority of selecting interfaces pretty hard to > > follow. > >=20 > > I've been thinking a bunch about what the order should be. The > > obvious cleanup to the current logic would be: > > 1. If there are >0 interfaces with both default v4 and v6 > > routes, pick the one with the lowest metric > > 2. Otherwise if there are >0 interfaces with a default v4 > > route, pick the one with the lowest metric > > 3. Otherwise if there are >0 interfaces with a default v6 > > route, pick the one with the lowest metric > > 4. Otherwise fail. > >=20 > > ..but, I think that's the wrong choice. For one thing it gets weird > > in the unlikely but possible case that there are two interfaces with > > both v4 and v6 routes, but one has a lower v4 metric and the other has > > a lower v6 metric. But more importantly, I think it fails the "do > > what you want by default" test in the case of a host that has both v4 > > and v6 routing, but not on the same interface. That's quite a > > realistic situation if the v6 connectivity is coming from a tunnel > > (6in4 or 6rd or 6over4 or whatever). >=20 > Definitely, I didn't consider the tunneled IPv6 case, but it looks > still rather real even nowadays. Yeah, I gather IPv6 is finally fairly standard in Germany, but here in Australia I'm with basically the only residential ISP which provides v6. > > Instead, I think we should separately pick an external interface for > > v4 and v6 connectivity. Yes, it's weird to have a view of two > > different host interfaces on the same virtual interface. It also > > means pasta has to arbitrarily pick one of the host interface names > > for the namespace, which will make the tests a bit uglier. But it > > makes picking the right interfaces a lot simpler and clearer, and does > > what you probably want for the case of a host with tunnelled ipv6 > > connectivity. >=20 > I totally agree. It's still less weird than the alternative. Excellent. > We'll need to store two interface indices (say, "ifi4" and "ifi6"), but > their usage right now is almost entirely limited to configuration and > netlink functions. Yes, I looked through all the uses of c->ifi, and it looks fairly straightforward. > One additional usage is to select a given interface for sockets bound > to link-local addresses by means of setting .sin6_scope_id in struct > sockaddr_in6 -- that happens in tcp_conn_from_tap() and sock_l4(). > There, we just need to switch to the new "ifi6" index. >=20 > Let me know if I should start looking into this or if you can do it > instead. I'll work on it. Hoping I can get that done before heading on holiday next week. --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --===============3494557058101226565== Content-Type: application/pgp-signature Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="signature.asc" MIME-Version: 1.0 LS0tLS1CRUdJTiBQR1AgU0lHTkFUVVJFLS0tLS0KCmlRSXpCQUVCQ0FBZEZpRUVvVUx4V3U0L1dz MGRCK1h0Z3lwWTRnRXdZU0lGQW1MWDJkOEFDZ2tRZ3lwWTRnRXcKWVNMVXdnLy9STk9lYmMwUTFS c1dPaEl2TURXQ2s2MjZZOVdsaGwySnRsTUZZSFJzOEpYaU1zV3R4TjVSTGlSTgo1UmdXQmlsZjlY S21RK1ljNmdrS01xUGozYU1HdE5mQ0tHaXRoZ1hMb0VuUXN0Z2l1enBrUXNNRTkrK0t6eU9DClVn K2xFYVRaMWw3Q2RHaTdhazA2aWdQMnJJcjFheGVieEIyN3R3ejB2ZVlCSUR5Y0FWc0N5MmNxN1Jt d1ZlaUsKbXBQUEU3MHVDeHVBWDRsenJKcS9oQysrajRHaXRCdmRHNWh5dzBRTER0Ulg5Wlo4ZkJJ UENZMmN1VXRsckhTOApKR21mRTZxSFo2ZThOQzZsVi9NRnZJalBMdmY2a2ZDd0lPc0QxT3pPYlg3 YlF4YnBCeXhBclVuTWJrV0w3YUR3CkxmanBIMjFpZVZKM1JKSFh5WG9RMnlDTUl4SWg5OWVlb0NC U1FkbjFFV0EvOGRseXRUTGpsTnJXZ3NRU0xyQW8KQXQ1Z0pNNCt4aVVmZHdRTGl5aERRNE1KeHdT ajVkbWlFbHBNTEEwL0dOT29KNmZraVVwSTJod1F4UFVlTFNvdgorVk1ocHYzSzd1Q25nN3dPdldy ZWd6Z3p4R0x6VElRWXVOd2VBS28zeDNPWHVsZzBFeVhyeERyYkRmbmxFcS9EClphelBqSEpPRFhp bzQ3dDJMZ2IwOHRsYUErcGFMK1FOYTIyZVVuZDZGd2l3SElaYThvZWl0V3VLVUJVVXIyTDgKTFpV WmhucE5nb3RWZmZ2U3NEbmJROFZPUmR4R2VqL3E1ZGp0YzNRblJIYVZET2hrcXRPRzdjaWk3Ujlo dGtuVwplUmpqR2RJdXZXRmJLV2JXVEVVRXhNMDRRSVFNUjFFblJLNVhYYmdKdGdYUmFBa0dRNzA9 Cj1hZ1BzCi0tLS0tRU5EIFBHUCBTSUdOQVRVUkUtLS0tLQo= --===============3494557058101226565==--