public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: David Gibson <david@gibson.dropbear.id.au>
To: passt-dev@passt.top
Subject: Re: [PATCH 17/18] tests: Correct determination of host interface name in tests
Date: Wed, 20 Jul 2022 20:33:23 +1000	[thread overview]
Message-ID: <YtfZ863+urykHimY@yekko> (raw)
In-Reply-To: <20220720102322.2d4d85f9@elisabeth>

[-- Attachment #1: Type: text/plain, Size: 7377 bytes --]

On Wed, Jul 20, 2022 at 10:23:22AM +0200, Stefano Brivio wrote:
> On Wed, 20 Jul 2022 17:24:18 +1000
> David Gibson <david(a)gibson.dropbear.id.au> wrote:
> 
> > On Wed, Jul 20, 2022 at 12:47:57PM +1000, David Gibson wrote:
> > > On Tue, Jul 19, 2022 at 09:05:52PM +0200, Stefano Brivio wrote:  
> > > > On Tue, 19 Jul 2022 16:20:45 +1000
> > > > David Gibson <david(a)gibson.dropbear.id.au> wrote:
> > > >   
> > > > > On Fri, Jul 15, 2022 at 03:21:40PM +1000, David Gibson wrote:  
> > > > > > By default, passt itself attaches to the first host interface with a
> > > > > > default route.  However, when determining the host interface name the tests
> > > > > > implicitly select the *last* host interface: they use a jq expression which
> > > > > > will list all interfaces with default routes, but the way output detection
> > > > > > works in the scripts, it will only pick up the last line.
> > > > > > 
> > > > > > If there are multiple interfaces with default routes on the host, and they
> > > > > > each have a different address, this can cause spurious test
> > > > > > failures.    
> > > > > 
> > > > > It seems this change is not enough to always fix the tests when there
> > > > > are multiple default routes.  I'm still sometimes getting failures,
> > > > > now because passt itself doesn't seem to be picking the interface
> > > > > with the first default route.
> > > > > 
> > > > > I'm wondering if this is because ip(8) is sorting the output, not just
> > > > > presenting it in the same order that the underlying netlink interface
> > > > > does.  
> > > > 
> > > > I don't see that happening at least in my environment (and I also can't
> > > > see any code that would sort it, that pretty much comes from
> > > > rtnl_dump_filter_l() in lib/libnetlink.c).
> > > > 
> > > > The netlink "filter", though, is slightly different. For IPv4:
> > > > 
> > > > $ strace -e sendto ip route show >/dev/null
> > > > sendto(3, [{{len=36, type=RTM_GETROUTE, flags=NLM_F_REQUEST|NLM_F_DUMP, seq=1658257027, pid=0}, {rtm_family=AF_INET, rtm_dst_len=0, rtm_src_len=0, rtm_tos=0, rtm_table=RT_TABLE_UNSPEC, rtm_protocol=RTPROT_UNSPEC, rtm_scope=RT_SCOPE_UNIVERSE, rtm_type=RTN_UNSPEC, rtm_flags=0}, {{nla_len=8, nla_type=RTA_TABLE}, RT_TABLE_MAIN}}, {len=0, type=0 /* NLMSG_??? */, flags=0, seq=0, pid=0}], 156, 0, NULL, 0) = 156
> > > > 
> > > > $ strace -e sendto ./passt -f
> > > > sendto(5, {{len=28, type=RTM_GETROUTE, flags=NLM_F_REQUEST|NLM_F_DUMP, seq=0, pid=0}, {rtm_family=AF_INET, rtm_dst_len=0, rtm_src_len=0, rtm_tos=0, rtm_table=RT_TABLE_MAIN, rtm_protocol=RTPROT_UNSPEC, rtm_scope=RT_SCOPE_UNIVERSE, rtm_type=RTN_UNICAST, rtm_flags=0}}, 28, 0, NULL, 0) = 28
> > > > [...]
> > > > 
> > > > and, while I don't think the FIB trie is actually descended in a
> > > > different way, we might still have a slightly different result from the
> > > > kernel (if I recall correctly, I didn't check right now).  
> > 
> > So, I've been staring at this for most of the day.  I don't think the
> > difference is because of the slightly different filters.  Instead it's
> > an easy to miss subtletly in nl_get_ext_if().
> > 
> > In nl_get_ext_if() both first_v4 and first_v6 will be set to the
> > interface which meets the criteria and comes first in the order of
> > routes returned by netlink from the kernel.  However, there's this early return:
> > 
> > 	word = (long *)has_v4;
> > 	for (i = 0; i < ARRAY_SIZE(has_v4) / sizeof(long); i++, word++) {
> > 		tmp = *word;
> > 		while ((n = ffsl(tmp))) {
> > 			int ifi = i * sizeof(long) * 8 + n - 1;
> > 
> > 			if (!first_v4)
> > 				first_v4 = ifi;
> > 
> > 			tmp &= ~(1UL << (n - 1));
> > 			if (bitmap_isset(has_v6, ifi)) {
> > 				*v4 = *v6 = IP_VERSION_ENABLED;
> > 				return ifi;    <================ HERE
> > 			}
> > 		}
> > 	}
> > 
> > Which means that if we find an interface with both a v4 and a v6
> > default route, we'll return it immediately.  That loop is across the
> > has_v4 bitmap, so we'll return the first one in *interface index order*,
> > which might not necessarily be the same as the order the routes are
> > returned from netlink.
> 
> Whoops :) ...right, sorry, I forgot about that.
> 
> > In the other cases: one of v4 or v6 is disabled, or there's no
> > interface with both v4 and v6 default routes, then we'll end up
> > returning the first interface in netlink route order.  Which makes
> > passt's overall priority of selecting interfaces pretty hard to
> > follow.
> > 
> > I've been thinking a bunch about what the order should be.  The
> > obvious cleanup to the current logic would be:
> > 	1. If there are >0 interfaces with both default v4 and v6
> > 	   routes, pick the one with the lowest metric
> > 	2. Otherwise if there are >0 interfaces with a default v4
> > 	   route, pick the one with the lowest metric
> > 	3. Otherwise if there are >0 interfaces with a default v6
> > 	   route, pick the one with the lowest metric
> > 	4. Otherwise fail.
> > 
> > ..but, I think that's the wrong choice.  For one thing it gets weird
> > in the unlikely but possible case that there are two interfaces with
> > both v4 and v6 routes, but one has a lower v4 metric and the other has
> > a lower v6 metric.  But more importantly, I think it fails the "do
> > what you want by default" test in the case of a host that has both v4
> > and v6 routing, but not on the same interface.  That's quite a
> > realistic situation if the v6 connectivity is coming from a tunnel
> > (6in4 or 6rd or 6over4 or whatever).
> 
> Definitely, I didn't consider the tunneled IPv6 case, but it looks
> still rather real even nowadays.

Yeah, I gather IPv6 is finally fairly standard in Germany, but here in
Australia I'm with basically the only residential ISP which provides v6.

> > Instead, I think we should separately pick an external interface for
> > v4 and v6 connectivity.  Yes, it's weird to have a view of two
> > different host interfaces on the same virtual interface.  It also
> > means pasta has to arbitrarily pick one of the host interface names
> > for the namespace, which will make the tests a bit uglier.  But it
> > makes picking the right interfaces a lot simpler and clearer, and does
> > what you probably want for the case of a host with tunnelled ipv6
> > connectivity.
> 
> I totally agree. It's still less weird than the alternative.

Excellent.

> We'll need to store two interface indices (say, "ifi4" and "ifi6"), but
> their usage right now is almost entirely limited to configuration and
> netlink functions.

Yes, I looked through all the uses of c->ifi, and it looks fairly
straightforward.

> One additional usage is to select a given interface for sockets bound
> to link-local addresses by means of setting .sin6_scope_id in struct
> sockaddr_in6 -- that happens in tcp_conn_from_tap() and sock_l4().
> There, we just need to switch to the new "ifi6" index.
> 
> Let me know if I should start looking into this or if you can do it
> instead.

I'll work on it.  Hoping I can get that done before heading on holiday
next week.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2022-07-20 10:33 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-15  5:21 [PATCH 00/18] Test fixes, batch 5 David Gibson
2022-07-15  5:21 ` [PATCH 01/18] tests: Remove no longer needed /usr/bin/bash link David Gibson
2022-07-15  5:21 ` [PATCH 02/18] tests: Let Fedora find dhclient-script in /usr/sbin David Gibson
2022-07-15  5:21 ` [PATCH 03/18] tests: Add rudimentary debugging to dhclient-script David Gibson
2022-07-15  5:21 ` [PATCH 04/18] tests: Add some extra dhclient support directories to mbuto.img David Gibson
2022-07-15  5:21 ` [PATCH 05/18] tests: More robust parsing of resolv.conf for DHCP tests David Gibson
2022-07-15  5:21 ` [PATCH 06/18] tests: Handle the case of a nameserver on host localhost David Gibson
2022-07-15  5:21 ` [PATCH 07/18] tests: Correctly handle domain search list in dhclient-script David Gibson
2022-07-15  5:21 ` [PATCH 08/18] tests: Fix detection of empty 'hout' responses in passt{,_in_ns} tests David Gibson
2022-07-15  5:21 ` [PATCH 09/18] tests: Fix creation of test file in udp passt tests David Gibson
2022-07-15  5:21 ` [PATCH 10/18] valgrind needs futex David Gibson
2022-07-15  5:21 ` [PATCH 11/18] tests: Use socat instead of netcat David Gibson
2022-07-15  5:21 ` [PATCH 12/18] tests: Remove unnecessary ^D in passt_in_ns teardown David Gibson
2022-07-15  5:21 ` [PATCH 13/18] tests: Remove unnecessary truncation of temporary files in udp tests David Gibson
2022-07-15  5:21 ` [PATCH 14/18] tests: Use dhclient --no-pid for namespaces in two_guests tests David Gibson
2022-07-15  5:21 ` [PATCH 15/18] tests: Clean up better after iperf tests David Gibson
2022-07-15  5:21 ` [PATCH 16/18] tests: No need to retrieve host ifname in ndp/pasta David Gibson
2022-07-15  5:21 ` [PATCH 17/18] tests: Correct determination of host interface name in tests David Gibson
2022-07-19  6:20   ` David Gibson
2022-07-19 19:05     ` Stefano Brivio
2022-07-20  2:47       ` David Gibson
2022-07-20  7:24         ` David Gibson
2022-07-20  8:23           ` Stefano Brivio
2022-07-20 10:33             ` David Gibson [this message]
2022-07-15  5:21 ` [PATCH 18/18] demo: Use git protocol downloads David Gibson
2022-07-21 12:13 ` [PATCH 00/18] Test fixes, batch 5 Stefano Brivio

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YtfZ863+urykHimY@yekko \
    --to=david@gibson.dropbear.id.au \
    --cc=passt-dev@passt.top \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).