From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: passt-dev@passt.top
Subject: Re: [PATCH v3 4/4] fwd: Direct inbound spliced forwards to the guest's external address
Date: Thu, 17 Oct 2024 10:31:22 +0200
Message-ID: <20241017103122.29b1afb0@elisabeth>
In-Reply-To: <ZxBmPgKI5liJTRaS@zatzit.fritz.box>

On Thu, 17 Oct 2024 12:19:58 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:
> On Wed, Oct 16, 2024 at 05:26:48PM +0200, Stefano Brivio wrote:
> > On Wed, 16 Oct 2024 19:39:40 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >
> > > On Wed, Oct 16, 2024 at 04:46:52PM +1100, David Gibson wrote:
> > > > On Wed, Oct 16, 2024 at 02:15:19PM +1100, David Gibson wrote:
> > > > > On Thu, Oct 10, 2024 at 04:57:32PM +1100, David Gibson wrote:
> > > > > > On Wed, Oct 09, 2024 at 10:44:33PM +0200, Stefano Brivio wrote:
> > > > > > > On Wed, 9 Oct 2024 15:07:21 +0200
> > > > > > > Stefano Brivio <sbrivio@redhat.com> wrote:
> > > > > [snip]
> > > > > > > > > @@ -447,20 +447,35 @@ uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto,
> > > > > > > > > (proto == IPPROTO_TCP || proto == IPPROTO_UDP)) {
> > > > > > > > > /* spliceable */
> > > > > > > > >
> > > > > > > > > - /* Preserve the specific loopback adddress used, but let the
> > > > > > > > > - * kernel pick a source port on the target side
> > > > > > > > > + /* The traffic will go over the guest's 'lo' interface, but by
> > > > > > > > > + * default use its external address, so we don't inadvertently
> > > > > > > > > + * expose services that listen only on the guest's loopback
> > > > > > > > > + * address. That can be overridden by --host-lo-to-ns-lo which
> > > > > > > > > + * will instead forward to the loopback address in the guest.
> > > > > > > > > + *
> > > > > > > > > + * In either case, let the kernel pick the source address to
> > > > > > > > > + * match.
> > > > > > > > > */
> > > > > > > > > - tgt->oaddr = ini->eaddr;
> > > > > > > > > + if (inany_v4(&ini->eaddr)) {
> > > > > > > > > + if (c->host_lo_to_ns_lo)
> > > > > > > > > + tgt->eaddr = inany_loopback4;
> > > > > > > > > + else
> > > > > > > > > + tgt->eaddr = inany_from_v4(c->ip4.addr_seen);
> > > > > > > > > + tgt->oaddr = inany_any4;
> > > > > > > > > + } else {
> > > > > > > > > + if (c->host_lo_to_ns_lo)
> > > > > > > > > + tgt->eaddr = inany_loopback6;
> > > > > > > > > + else
> > > > > > > > > + tgt->eaddr.a6 = c->ip6.addr_seen;
> > > > > > > >
> > > > > > > > Either this...
> > > > > > > >
> > > > > > > > > + tgt->oaddr = inany_any6;
> > > > > > > >
> > > > > > > > or this (and not something before this patch, up to 3/4) make the
> > > > > > > > "TCP/IPv6: host to ns (spliced): big transfer" test in pasta/tcp hang,
> > > > > > > > sometimes (about one in three/four runs), that's what I mistakenly
> > > > > > > > reported as coming from Laurent's series at:
> > > > > >
> > > > > > Huh, interesting. Just got back from my leave and ran that group of
> > > > > > tests in a loop this afternoon, but didn't manage to reproduce. I
> > > > > > have administrivia that will probably fill the rest of this week, but
> > > > > > I'll look into this as soon as I can.
> > > > >
> > > > > I reproduced the problem on passt.top, and I have a partial idea
> > > > > what's going on. As you say it's seeming like the address (addr_seen
> > > > > == addr in this case) isn't properly ready. This is over splice, but
> > > > > on the tap interface, I see the container sending NS messages for its
> > > > > own address - seems like it's doing DAD. But more importantly, we're
> > > > > answering those NS messages with NA messages, because we answer all
> > > > > NS. i.e. we're making the DAD fail. What I'm not sure of is how this
> > > > > ever worked at all. --config-net makes sense, since we disable DAD,
> > > > > but our test suite has always been using NDP+DHCP instead of
> > > > > --config-net.
> > > > >
> > > > > > So, AFAICT, we'll always fail guest DAD attempts, both for IPv6, which
> > > > > > happens most of the time, and for IPv4 via ARP, which is used much more
> > > > > > rarely. I think we need to be more selective in what NS or ARP
> > > > > > lookups we respond to. The question is what approach to take:
> > > >
> > > > Hmm... no.. there's more to this.
> > > >
> > > > Usually DAD requests have :: as the source address, and we *do*
> > > > exclude those from getting replies. In this case though, we're
> > > > getting NS requests for the assigned address from what looks like the
> > > > SLAAC address. So, I do think it would be wise to explicitly exclude
> > > > these: we shouldn't be giving NA responses for an address that ought
> > > > to belong to the guest, even if it doesn't look like a DAD.
> > > >
> > > > But, I'm not sure what's triggering this. Is for some reason the DHCP
> > > > address not "taking", so the container is trying to locate it on the
> > > > network instead? Or _is_ this DAD, but under some circumstances
> > > > rather than using :: as the source address it uses another configured
> > > > address.
> > >
> > > Ok.. I've understood a bit more. While timing is a factor here, it
> > > looks like the main reason I wasn't seeing it on my machine is what
> > > I'd consider a bug in the Debian version of the dhclient-script:
> > > when adding an IPv6 address, it returns without waiting for DAD to
> > > complete (i.e. for the address to be non-tentative).
> >
> > Oops. On one hand, I would feel inclined to propose a fix for the
> > Debian and Ubuntu packages. On the other hand, I wonder if it's
> > universally considered a bug: the DHCPv6 client did its job at that
> > point, and it's debatable whether dhclient should wait for the address
> > to be usable before forking to background.
> >
> > That is, arguably, dhclient's job is to request and configure an
> > address. It's not a network configuration daemon. There might be many
> > other reasons why that address is unusable, and yet dhclient is not
> > responsible for them.
>
> Hrm... I guess. Counterpoints..
>  - Most other failures to get a usable address will result in a
>    visible error
>  - dhclient has a --dad-wait-time option which seems to imply that the
>    script should wait for DAD
>  - The upstream script version waits for DAD
>
> In any case I filed a report for it
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1085231
>
> > By the way, I guess it's just an issue for test scripts like this one.
>
> Why do you guess that?

Because it's kind of rare that your address changes if you use DHCPv6,
I guess, so this would be relevant almost exclusively at boot.

And, at boot, if a remote peer/client happens to try to connect to the
machine where the client is running right after an address was
assigned, it almost certainly has a retry mechanism.
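For reference, a minimal sketch of what "waiting for DAD to complete" could look like on Linux, in case a test script wants to do it on its own instead of relying on dhclient-script. This is not code from passt or from dhclient: line_is_tentative() and dad_pending() are hypothetical helper names. The sketch assumes the usual /proc/net/if_inet6 layout (address, ifindex, prefix length, scope, flags, interface name, all numbers in hex) and that the tentative flag is 0x40, matching IFA_F_TENTATIVE in <linux/if_addr.h>.

```c
#include <stdio.h>
#include <string.h>

#define IF_INET6_PATH	"/proc/net/if_inet6"
#define ADDR_TENTATIVE	0x40	/* assumed equal to IFA_F_TENTATIVE */

/* Hypothetical helper: check one line of /proc/net/if_inet6.
 *
 * Return 1 if the line describes a tentative address on @ifname,
 * 0 if it doesn't, -1 if the line can't be parsed.
 */
int line_is_tentative(const char *line, const char *ifname)
{
	unsigned int ifindex, plen, scope, flags;
	char addr[33], name[32];

	if (sscanf(line, "%32s %x %x %x %x %31s",
		   addr, &ifindex, &plen, &scope, &flags, name) != 6)
		return -1;

	if (strcmp(name, ifname))
		return 0;

	return !!(flags & ADDR_TENTATIVE);
}

/* Hypothetical helper: is DAD still in progress on @ifname?
 *
 * Return 1 if at least one address is tentative, 0 if none is,
 * -1 if the address list can't be read.
 */
int dad_pending(const char *ifname)
{
	FILE *f = fopen(IF_INET6_PATH, "r");
	char line[128];
	int pending = 0;

	if (!f)
		return -1;

	while (fgets(line, sizeof(line), f)) {
		if (line_is_tentative(line, ifname) == 1)
			pending = 1;
	}

	fclose(f);
	return pending;
}
```

A test script could poll something like dad_pending() with a short sleep, and start the transfer only once it returns 0.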
--
Stefano