public inbox for passt-dev@passt.top
From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: passt-dev@passt.top, Callum Parsey <callum@neoninteger.au>,
	me@yawnt.com, lemmi@nerd2nerd.org
Subject: Re: [PATCH 00/10] RFC/RFT: Optionally copy all routes and addresses for pasta, allow gateway-less routes
Date: Tue, 16 May 2023 23:42:09 +0200
Message-ID: <20230516234209.05e84523@elisabeth>
In-Reply-To: <ZGMPVQagYetHG9MS@yekko>

On Tue, 16 May 2023 15:06:29 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Sun, May 14, 2023 at 08:14:05PM +0200, Stefano Brivio wrote:
> > This series, along with pseudo-related fixes, enables:
> > 
> > - optional copy of all routes from selected interface in outer
> >   namespace, to (hopefully!) fix the issue reported by Callum at:
> >     https://github.com/containers/podman/issues/18539
> > 
> > - optional copy of all addresses, mostly for consistency. It doesn't,
> >   however, enable assignment of multiple addresses in the sense
> >   requested at:
> >     https://bugs.passt.top/show_bug.cgi?id=47
> > 
> >   because the addresses still need to be present on the host, and
> >   the "outer" address isn't selected depending on the address used
> >   inside the container
> > 
> > - operation without a gateway address, to (again, hopefully) support
> >   usage of Wireguard endpoints established outside the container,
> >     https://bugs.passt.top/show_bug.cgi?id=49
> > 
> > I tested the single functionalities introduced here, but I didn't
> > try to reproduce the setups where the issues were reported, so some
> > help with testing is definitely fundamental here. Thanks.  
> 
> I've sent reviews for some of the simpler patches in this series which
> make sense even without the context of the overall aim.  I think those
> can be applied immediately.

Those are actually the least important patches for users -- and I can't
apply 6/10 without breaking Podman's CI plus probably a number of
deployments (that's why it comes after 5/10)... so, no, I would rather
not apply the rest for the moment.

> For the rest of the series, I want to address the generalities before
> doing detailed review of the implementation.
> 
> I think the basic idea here is sound: we want to expose anything
> routable to the host as routable to the guest, even when the host has
> a more complex routing setup than just a netmask on the "main"
> interface and a default gateway within that prefix.

The intentions behind this series are actually slightly different:

- we have a complete breakage in a seemingly common use case (I would
  even say cloud-init setups in general), and I'd like to fix that
  sooner rather than later

- this concerns only the direct configuration pasta does, with
  --config-net. What we advertise is definitely related, but not the
  same topic... to the point that the issues fixed by this series don't
  even occur with a DHCP client:
    https://github.com/containers/podman/issues/18539#issuecomment-1545023424

  And, in general, we can't advertise everything we can configure (say,
  a route without a router over DHCP).

  I'd be much more careful about what we advertise. We have direct
  control of what we configure via netlink, but for DHCP, NDP, DHCPv6,
  we need to think of possible interpretations and common half-bugs as
  well.

> But I think we want to think a bit more deeply about exactly what we
> need/want to expose here.
> 
> Even with the current code, the default gateway address we advertise
> to the guest is kind of meaningless: the guest cannot directly access
> that gateway, everything really goes through passt on the host.

In the simplest, probably most common network setups, that's actually
the gateway that connects our guest to other nodes.

For other cases, I think we should eventually implement
https://bugs.passt.top/show_bug.cgi?id=47 anyway, and it goes without
saying that, then, we can't just use the same host route no matter what
the container chooses. We'll need to match them.

I mean, I'm not saying that the behaviour from this series is complete
and self-consistent, just that it works around obvious, urgent issues
and at the same time it looks like we'll probably need something
similar to support further use cases.

> This works because the gateway address (like everything) will ARP/NDP
> to passt's host side MAC address and once the packets hit passt it
> doesn't matter what the guest thought the routing was going to be.
> 
> I think we have a few choices in two more-or-less orthogonal
> categories.
> 
> A) What routable prefixes do we advertise to the guest?
> 
>   A.1) Always a default route (0.0.0.0/0 and ::/0)
> 
> We tell the guest that every address is routable via the passt
> interface, regardless of routing setup on the host.  This essentially
> tells the guest to delegate all routing responsibility to passt.
> 
> Advantages:
>   * Simple
>   * No need to update anything if routing configuration on the host
>     changes
> Disadvantages:
>   * If addresses are unroutable from the host, the guest will only
>     know via ICMP/ICMPv6, rather than statically, which may be a worse
>     UX on the guest side.  Plus we might need to actually implement
>     those host unreachable ICMPs.
>   * Might be messy if the guest has multiple interfaces - e.g. if we
>     allow passt to be configured to attach to a specific host
>     interface only, then we have multiple passts attached to a single
>     guest: they'd all be advertising a default route.
> 
>   A.2) Copy routable prefixes from the host to the guest

I'm having a hard time figuring out the definition of this point. How
would you define that? Strictly speaking, in the case at hand, nothing
is routable: we have a /32 address.

> We just advertise those prefixes routable to the host to the guest
> (which might include an empty prefix == default route).
> 
> Advantages:
>   * Guest statically knows what addresses are routable via the passt
>     interface
> Disadvantages:
>   * What do we do with overlapping prefixes?  On the host we might
>     have more specific routes pointing to a specific interface.  For
>     the guest they all point to the passt interface, so what's the
>     point?
>   * Can we advertise an arbitrary set of static routes via all our
>     mechanisms (--config-net, DHCP, NDP+DHCPv6)?  Even if we can it
>     adds more complexity to that code
>   * How do we update things if the host routing configuration changes?
>   * What do we do if the host has source-based routing or other
>     advanced stuff set up?
> 
> B) What gateway, if any, do we advertise for each route?
> 
>   B.1) Copy it from the host
> 
> Advantages:
>  * Guest L3 configuration resembles that of the host

...which is a fundamental design goal of passt: transparency, and
pretending it doesn't exist. Otherwise we could just use a router, a
bridge, an interface, etc.

Now, while there are use cases that rely on different aspects of this
transparency (KubeVirt and service mesh integration) I understand this
might sound a bit dogmatic, because you might say there are more
important use cases (which I'm not aware of) or supposed benefits.

What's far less dogmatic, though, is how many issues we happily and
automatically avoid by relying on the sanity of the host networking
configuration.

By trying to copy it as closely as possible, we avoid one very important
source of issues, which is our interpretation or possible lack of
knowledge about how applications we don't know about choose to interact
with kernel and network setups. The main case fixed by this series
shows exactly that: I think it's broken, but it works, and users
expect it to work.

And by trusting the host configuration we don't lose much: if that's
broken, almost everything else is broken anyway.

> Disadvantages:
>  * If the host route doesn't have a gateway we have to fall back on
>    B.2 or B.3 anyway

Well, they are a particular case of B.1 then: what's the disadvantage?
This is consistent (especially with this series, and especially if we
start adapting the *default* behaviours in this sense).

>  * Misleading: in fact everything is routed by passt and the host
>    before it reaches any gateway we're listing here

But passt isn't supposed to be a router...? Let's say we have multiple
routes on the host, we configure or advertise multiple routes to the
guest. Does that make passt a router? I don't think so: we're just
associating them as closely as possible, without fancy interpretations.

A router has its own routing table, passt's would simply be a copy.
Right now it has essentially none.

>   B.2) Pick an address to represent passt as gateway
> 
> Advantages:
>  * Accurately represents that everything is routed by passt

This is configurable, actually, but no, I insist that passt isn't
*functionally* routing anything, or at least that we should get as
close as possible to that.

>  * We can make this the same as the NAT-to-host address, so we only
>    have one "magic" address (per AF)

Not really, if it's configurable.

> Disadvantages:
>  * Have to allocate an address that's safe, which is tricky (but we
>    usually want this for NAT-to-host anyway)

There's a difference between picking an address by default and letting
the user configure one. Besides, at least for IPv4, I don't think such
an address exists.

>  * Do we want just one address, or one for each distinct gateway from
>    the host?
>  * If we can't pick something in the interface's "natural" prefix, we
>    will also need to advertise a static route to reach it.
> 
>   B.3) Don't advertise a gateway for any route
> 
> passt essentially proxy ARPs for the entire internet.
> 
> Advantages:
>   * No need to allocate an address - in fact passt need not have any
>     guest facing IP at all
>   * Extends naturally if we ever have a guest<->passt transport that's
>     point-to-point rather than pseudo-ethernet
> Disadvantages:
>   * Guest ARP / neighbour tables could get real big

...it would also break a number of applications that peek at netlink
(or do ioctl()s) to check they are in fact online.

> 
> 
> The status quo is, roughly, A.1+B.1, except that we also enforce that
> the host must have a default route, which sidesteps one of the
> complications of B.1.  IIUC, this series is implementing A.2+B.1.
> 
> Thinking about it, I'm moderately convinced that B.1 is a bad idea.
> I'm leaning towards B.2 - combining it with the NAT-to-host cleanups
> to have a more concrete guest-visible address for passt itself - but
> I'm also open to B.3.

...that, especially B.3, sounds like another tool, or at least like
another mode, because it conflicts quite a bit with design goals.

They're different from design _choices_ in the sense that that's what
I've been "selling" to users and what I and others have been
implementing in integrations so far.

> I'm not sure about A.1 vs. A.2.  I was leaning towards A.2, but on
> further consideration, I feel like the fact that A.1 automatically
> works for routing changes on the host might outweigh the fact that the
> guest only gets limited information (ICMP) about what's routable.

I don't think A.2 is doable, but even if it were, yes, I don't think
it would be worth the effort. If needed (and I never saw a request in
this sense), we could enrich ICMP/ICMPv6 handling guest- or
container-side quite a bit.

-- 
Stefano


Thread overview: 21+ messages
2023-05-14 18:14 [PATCH 00/10] RFC/RFT: Optionally copy all routes and addresses for pasta, allow gateway-less routes Stefano Brivio
2023-05-14 18:14 ` [PATCH 01/10] netlink: Fix comment about response buffer size for nl_req() Stefano Brivio
2023-05-16  3:23   ` David Gibson
2023-05-14 18:14 ` [PATCH 02/10] pasta: Improve error handling on failure to join network namespace Stefano Brivio
2023-05-16  3:24   ` David Gibson
2023-05-14 18:14 ` [PATCH 03/10] netlink: Add functionality to copy routes from outer namespace Stefano Brivio
2023-05-14 18:14 ` [PATCH 04/10] conf: --config-net option is for pasta mode only Stefano Brivio
2023-05-16  3:59   ` David Gibson
2023-05-14 18:14 ` [PATCH 05/10] conf, pasta: With --config-net, copy all routes by default Stefano Brivio
2023-05-14 18:14 ` [PATCH 06/10] Revert "conf: Adjust netmask on mismatch between IPv4 address/netmask and gateway" Stefano Brivio
2023-05-16  4:00   ` David Gibson
2023-05-14 18:14 ` [PATCH 07/10] conf: Don't exit if sourced default route has no gateway Stefano Brivio
2023-05-14 18:14 ` [PATCH 08/10] netlink: Add functionality to copy addresses from outer namespace Stefano Brivio
2023-05-14 18:14 ` [PATCH 09/10] conf, pasta: With --config-net, copy all addresses by default Stefano Brivio
2023-05-14 18:14 ` [PATCH 10/10] passt.h: Fix description of pasta_ifi in struct ctx Stefano Brivio
2023-05-16  4:03   ` David Gibson
2023-05-16  5:06 ` [PATCH 00/10] RFC/RFT: Optionally copy all routes and addresses for pasta, allow gateway-less routes David Gibson
2023-05-16 21:42   ` Stefano Brivio [this message]
2023-05-17  1:15     ` David Gibson
2023-05-17  6:52       ` Stefano Brivio
2023-05-18  3:26         ` David Gibson

Code repositories for project(s) associated with this public inbox

	https://passt.top/passt
