public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: passt-dev@passt.top, Paul Holzinger <pholzing@redhat.com>
Subject: Re: [PATCH 00/22] RFC: Allow configuration of special case NATs
Date: Mon, 19 Aug 2024 19:52:49 +1000	[thread overview]
Message-ID: <ZsMV8YSTtsdaH1F2@zatzit.fritz.box> (raw)
In-Reply-To: <20240819112749.63d7476d@elisabeth>

[-- Attachment #1: Type: text/plain, Size: 8201 bytes --]

On Mon, Aug 19, 2024 at 11:27:49AM +0200, Stefano Brivio wrote:
> On Mon, 19 Aug 2024 18:46:31 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Fri, Aug 16, 2024 at 03:39:41PM +1000, David Gibson wrote:
> > > Based on Stefano's recent patch for faster tests.
> > > 
> > > Allow the user to specify which addresses are translated when used by
> > > the guest, rather than always being the gateway address or nothing.
> > > We also allow this remapping to go to the host's global address (more
> > > precisely the address assigned to the guest) rather than just host
> > > loopback.
> > > 
> > > Suggestions for better names for the new options in patches 20 & 22
> > > are most welcome.
> > > 
> > > Along the way to implementing that make many changes to clarify what
> > > various addresses we track mean, fixing a number of small bugs as
> > > well.
> > > 
> > > NOTE: there is a bug in 21/22 which breaks some of the passt_tcp perf
> > > tests.  I haven't managed to figure out why it's causing the problem,
> > > or even what the exact triggering conditions are (running the single
> > > stalling iperf alone doesn't do it).  Have to wrap up for today, so I
> > > thought I'd get this out for review anyway.  
> > 
> > I've identified the bug here.  IMO, it's a pre-existing problem that
> > only works by accident at the moment.  The immediate fix is pretty
> > obvious, but it raises some broader questions
> > 
> > The problem arises because of the MTU changes we make in order to test
> > throughput with different packet sizes.  Specifically we change the
> > MTU to values < 1280, which implicitly disables IPv6 since it requires
> > an MTU >= 1280.  When we change the MTU back to a larger value IPv6 is
> > re-enabled, but some configuration has been lost in the meantime.
> > 
> > After the MTU is restored the guest reconfigures with NDP, but does
> > not re-DHCPv6.  That means the guest gets a SLAAC address in the right
> > prefix but not the exact /128 address we've tried to assign to it.
> > However, at least with the sequence of things we have in the tests,
> > the guest never sends any packets with the new address, so passt
> > doesn't update addr_seen.  When the inbound connection comes we send
> > it to the assigned address instead of the guest's actual address and
> > the guest rejects it.
> 
> I still have to take a closer look, but I'm fairly sure I hit a similar
> issue while I was writing these tests originally. I pondered
> reconfiguring the address via DHCPv6, or using the keep_addr_on_down
> sysctl (net.ipv6.conf.<interface>.keep_addr_on_down), which was added
> around that time.
> 
> Then:
> 
> > This "worked" previously, because before this patch, passt would
> > translate the inbound connection to have source/dest as link-local
> > addresses.
> 
> ...I realised that this worked and forgot about the whole issue.
> 
> > We *do* have a current addr_ll_seen because (a) it won't
> > change if the guest doesn't change MAC and (b) when IPv6 is re-enabled
> > the NDP traffic the guest generates will have link-local addresses
> > that update addr_ll_seen.  With this patch, and a global address for
> > --map-host-loopback, we now need to send to addr_seen instead of
> > addr_ll_seen, hence exposing the bug.
> > 
> > In the short term, the obvious fix would be to re-run dhclient -6 in
> > the guest after we twiddle MTU but before running IPv6 tests.
> 
> I guess setting keep_addr_on_down (even for "all" interfaces) should
> work as well.

Sounds like it.  I wasn't aware of that one.

/me tests..  actually, no it doesn't work..

# sysctl -a | grep keep_addr_on_down
net.ipv6.conf.all.keep_addr_on_down = 1
net.ipv6.conf.default.keep_addr_on_down = 1
net.ipv6.conf.dummy0.keep_addr_on_down = 1
net.ipv6.conf.lo.keep_addr_on_down = 0
# ip addr add 2001:db8::1 dev dummy0
# ip a
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether c2:02:f2:79:f9:94 brd ff:ff:ff:ff:ff:ff
    inet6 2001:db8::1/128 scope global 
       valid_lft forever preferred_lft forever
# ip link set dummy0 mtu 1200
# ip a
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: dummy0: <BROADCAST,NOARP> mtu 1200 qdisc noop state DOWN group default qlen 1000
    link/ether c2:02:f2:79:f9:94 brd ff:ff:ff:ff:ff:ff
# ip link set dummy0 mtu 1500
# ip a
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether c2:02:f2:79:f9:94 brd ff:ff:ff:ff:ff:ff

My guess is that IPv6 being deconfigured because of an unsuitable MTU
is considered a different event from a mere "down".

> > This kind of opens a question about how hard we should try to
> > accomodate guests which don't configure themselves how we told them.
> 
> There's a notable distinction between guests temporarily diverging (in
> different ways) and guests we don't configure at all.

I'm not really sure what you're getting at here.

> It's probably more important to ensure we use the right type of address

"type" in what sense here?

> (security) rather than ensuring we somehow manage to deliver packets at
> any time (minor glitch otherwise), also because the one you describe is
> something we're unlikely to hit outside of tests.
> 
> > Personally I'd be ok with saying that nothing works if the guest
> > doesn't configure itself properly, thereby removing addr_seen and
> > addr_ll_seen entirely.  But I think, Stefano, you've been against that
> > idea in the past.
> 
> Yes, I still think we should support guests that don't use DHCPv6 or
> NDP at all,

Well, you still wouldn't *need* DHCPv6 or NDP, but you'd have to
manually configure the interface in the guest to match the address
you've configured with -a.  Just like you'd expect to have to
correctly configure your address on a real network.

> or where related exchanges fail for any reason. It improves
> reliability and compatibility at a small cost. In this case, I think
> it's a nice feature that we would resume communicating as soon as the
> guest shows its global unicast address.

Hm, maybe.  I'm not entirely convinced the cost is so small long term.
It's pretty badly incompatible with having multiple guests behind the
same passt instance: such as the initial guest bridging or routing to
nested guests.

I'm actually not sure if encountering this bug makes me more or less
in favour of addr_seen.  On the one hand I think it highlights the
flakiness of this approach; there are situations where we just won't
know the right address.  On the other hand if shows a relatively
plausible case where the guest won't get exactly the address we want
it to (it uses NDP but not DHCPv6)

Hrm... actually this also shows a potential danger in the recent
patches to disable DAD in the guest.  With DAD enabled, when the guest
grabs a new address, we'd expect it to emit DAD messages, which would
have the side effect of updating our addr_seen (although I'm pretty
sure I hit this patch before the nodad patches were applied, so that
doesn't seem to be foolproof).

We could maybe update addr_seen when we send RA messages to the guest
- assuming that it will use the same host part (low 64-bits) for both
link-local and global addresses.  Not sure if that's a widely safe
assumption or not.

> If the cost is using the wrong type of address, then not, I'm not
> suggesting we do that, so I think the change from this series is
> desirable, but in a general case, things just work and we don't break
> anything, as far as I know.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2024-08-19  9:53 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-16  5:39 [PATCH 00/22] RFC: Allow configuration of special case NATs David Gibson
2024-08-16  5:39 ` [PATCH 01/22] treewide: Use "our address" instead of "forwarding address" David Gibson
2024-08-18 15:44   ` Stefano Brivio
2024-08-19  1:28     ` David Gibson
2024-08-16  5:39 ` [PATCH 02/22] util: Helper for formatting MAC addresses David Gibson
2024-08-18 15:44   ` Stefano Brivio
2024-08-19  1:29     ` David Gibson
2024-08-16  5:39 ` [PATCH 03/22] treewide: Rename MAC address fields for clarity David Gibson
2024-08-18 15:45   ` Stefano Brivio
2024-08-19  1:36     ` David Gibson
2024-08-16  5:39 ` [PATCH 04/22] treewide: Use struct assignment instead of memcpy() for IP addresses David Gibson
2024-08-18 15:45   ` Stefano Brivio
2024-08-19  1:38     ` David Gibson
2024-08-16  5:39 ` [PATCH 05/22] conf: Use array indices rather than pointers for DNS array slots David Gibson
2024-08-16  5:39 ` [PATCH 06/22] conf: More accurately count entries added in get_dns() David Gibson
2024-08-16  5:39 ` [PATCH 07/22] conf: Move DNS array bounds checks into add_dns[46] David Gibson
2024-08-16  5:39 ` [PATCH 08/22] conf: Move adding of a nameserver from resolv.conf into subfunction David Gibson
2024-08-16  5:39 ` [PATCH 09/22] conf: Correct setting of dns_match address in add_dns6() David Gibson
2024-08-16  5:39 ` [PATCH 10/22] conf: Treat --dns addresses as guest visible addresses David Gibson
2024-08-16  5:39 ` [PATCH 11/22] conf: Remove incorrect initialisation of addr_ll_seen David Gibson
2024-08-16  5:39 ` [PATCH 12/22] util: Correct sock_l4() binding for link local addresses David Gibson
2024-08-20  0:14   ` Stefano Brivio
2024-08-20  1:29     ` David Gibson
2024-08-16  5:39 ` [PATCH 13/22] treewide: Change misleading 'addr_ll' name David Gibson
2024-08-20  0:15   ` Stefano Brivio
2024-08-20  1:30     ` David Gibson
2024-08-16  5:39 ` [PATCH 14/22] Clarify which addresses in ip[46]_ctx are meaningful where David Gibson
2024-08-16  5:39 ` [PATCH 15/22] Initialise our_tap_ll to ip6.gw when suitable David Gibson
2024-08-16  5:39 ` [PATCH 16/22] fwd: Helpers to clarify what host addresses aren't guest accessible David Gibson
2024-08-20 19:56   ` Stefano Brivio
2024-08-21  1:40     ` David Gibson
2024-08-16  5:39 ` [PATCH 17/22] fwd: Split notion of "our tap address" from gateway for IPv4 David Gibson
2024-08-20 19:56   ` Stefano Brivio
2024-08-21  1:56     ` David Gibson
2024-08-16  5:39 ` [PATCH 18/22] Don't take "our" MAC address from the host David Gibson
2024-08-16  5:40 ` [PATCH 19/22] conf, fwd: Split notion of gateway/router from guest-visible host address David Gibson
2024-08-20 19:56   ` Stefano Brivio
2024-08-21  1:59     ` David Gibson
2024-08-16  5:40 ` [PATCH 20/22] conf: Allow address remapped to host to be configured David Gibson
2024-08-20 19:56   ` Stefano Brivio
2024-08-21  2:23     ` David Gibson
2024-08-16  5:40 ` [PATCH 21/22] fwd: Distinguish translatable from untranslatable addresses on inbound David Gibson
2024-08-16  5:40 ` [PATCH 22/22] fwd, conf: Allow NAT of the guest's assigned address David Gibson
2024-08-20 19:56   ` Stefano Brivio
2024-08-21  2:28     ` David Gibson
2024-08-16 14:45 ` [PATCH 00/22] RFC: Allow configuration of special case NATs Paul Holzinger
2024-08-16 15:03   ` Stefano Brivio
2024-08-17  8:01     ` David Gibson
2024-08-19  8:46 ` David Gibson
2024-08-19  9:27   ` Stefano Brivio
2024-08-19  9:52     ` David Gibson [this message]
2024-08-19 13:01       ` Stefano Brivio
2024-08-20  0:42         ` David Gibson
2024-08-20 20:39           ` Stefano Brivio
2024-08-21  2:51             ` David Gibson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZsMV8YSTtsdaH1F2@zatzit.fritz.box \
    --to=david@gibson.dropbear.id.au \
    --cc=passt-dev@passt.top \
    --cc=pholzing@redhat.com \
    --cc=sbrivio@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).