public inbox for passt-user@passt.top
* Re: Rootless Podman with VRRP
       [not found] <172649928722.151934.9874324737582181440@maja>
@ 2024-09-17  1:08 ` David Gibson
       [not found]   ` <172658946856.151934.7414720839553284015@maja>
       [not found]   ` <SA1PR07MB87384D729F3FC0543F54BF4392612@SA1PR07MB8738.namprd07.prod.outlook.com>
  0 siblings, 2 replies; 6+ messages in thread
From: David Gibson @ 2024-09-17  1:08 UTC (permalink / raw)
  To: Castelli, Anton; +Cc: passt-user

On Mon, Sep 16, 2024 at 02:46:51PM +0000, Castelli, Anton via user wrote:
> Date: Mon, 16 Sep 2024 14:46:51 +0000
> From: "Castelli, Anton" <anton.castelli@siu.edu>
> To: "passt-user@passt.top" <passt-user@passt.top>
> Subject: Rootless Podman with VRRP
> List-Id: "For passt users: support, questions and answers"
>  <passt-user.passt.top>
> 
> I'm trying to get a service in a rootless Podman container (BIND DNS
> server) to respond correctly when using VRRP (via keepalived) on
> the host. It seems like Pasta will forward the inbound traffic to
> the container from the VRRP address, but the responses will be from
> the regular IP address instead of the VRRP address, which causes the
> client to ignore the response. I've tried adding Pasta network
> options to the container, but the behavior seems to be the same.

I'm not familiar with VRRP beyond a quick skim of the wikipedia
article just now.  I think the (only) relevant thing is that the host
has multiple addresses, but it's possible there's some other
complexity on the host that I'm not factoring in yet.

> OS: Centos Stream 9
> Podman: 5.2.2
> Pasta: 0^20240806.gee36266-2.el9.x86_64-pasta

This version includes UDP flow tracking, which fixes some of the worst
bugs with UDP forwarding / addressing, but there are still some edge
cases as you've discovered.  I think I know what's going on here.

With TCP, once we accept() a connection, its local address on the
host is part of the accept()ed socket, so we'll always use the same
address for reply packets.

With UDP, however, this doesn't happen automatically: the local
address for replies is controlled by the bound address of the socket
we use to send them out.  In this instance that will be the same
socket as is listening for the incoming requests.  I'm assuming that
socket will be set to listen to DNS traffic on any address, so it will
be bound to 0.0.0.0:53 on the host.  That will see the traffic to the
VRRP address, but when we go to send the reply the kernel will pick
the local address according to its routing tables and it seems it's
picking the default address.
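
As a rough illustration of that behaviour (a Python sketch, not pasta's
own code), assume a Linux host, where every address in 127.0.0.0/8 is
local.  Here 127.0.0.2 stands in for the VRRP address and 127.0.0.1 for
the main address; the function name is made up for the example.

```python
import socket


def wildcard_reply_source(query_addr="127.0.0.2"):
    """Query a wildcard-bound UDP socket; return the reply's source address."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as server, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        # Like a listener on 0.0.0.0:53, but on an ephemeral port so that
        # no privileges are needed.
        server.bind(("0.0.0.0", 0))
        server.settimeout(2)
        port = server.getsockname()[1]

        client.bind(("127.0.0.1", 0))
        client.settimeout(2)
        client.sendto(b"query", (query_addr, port))

        _, peer = server.recvfrom(2048)
        # The socket has no specific local address, so the kernel chooses
        # the source of the reply from its routing tables.
        server.sendto(b"reply", peer)

        _, reply_src = client.recvfrom(2048)
        return reply_src[0]


if __name__ == "__main__":
    print("queried 127.0.0.2, reply came from", wildcard_reply_source())
```

On a typical Linux host the reply should arrive from 127.0.0.1 even
though the query went to 127.0.0.2, which mirrors the tcpdump trace
quoted further down.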

That is a bug, at least in principle: we should remember the local
address for a UDP flow and use the same local address for replies.
With the flow table we now have the means of correctly tracking this;
however, we haven't implemented it (yet) because it's kind of
fiddly: we'd need to use additional getsockopt() calls to determine
and control the local address for datagrams on a per-packet basis.
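
One way the per-packet handling could look on Linux is sketched below
in Python.  It is only an assumption about a possible approach, not
what pasta does: it enables IP_PKTINFO with setsockopt(), reads the
destination address from ancillary data with recvmsg(), and passes it
back as the source with sendmsg().  The helper names and the loopback
addresses are invented for the example.

```python
import socket
import struct

# Linux value from <linux/in.h>; older Python versions lack the constant.
IP_PKTINFO = getattr(socket, "IP_PKTINFO", 8)
# struct in_pktinfo { int ipi_ifindex; in_addr ipi_spec_dst; in_addr ipi_addr; }
PKTINFO = struct.Struct("=i4s4s")


def recv_with_dst(sock):
    """Receive a datagram; return (data, peer, destination address bytes)."""
    data, ancdata, _flags, peer = sock.recvmsg(
        2048, socket.CMSG_SPACE(PKTINFO.size))
    for level, ctype, cdata in ancdata:
        if level == socket.IPPROTO_IP and ctype == IP_PKTINFO:
            _ifindex, _spec_dst, dst = PKTINFO.unpack(cdata[:PKTINFO.size])
            return data, peer, dst
    return data, peer, None


def send_from(sock, data, peer, src):
    """Send a datagram to peer, asking the kernel to use src as source."""
    pktinfo = PKTINFO.pack(0, src, bytes(4))  # ifindex 0: normal routing
    sock.sendmsg([data], [(socket.IPPROTO_IP, IP_PKTINFO, pktinfo)], 0, peer)


def pktinfo_reply_source(query_addr="127.0.0.2"):
    """Query a wildcard-bound socket that replies from the queried address."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as server, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        server.setsockopt(socket.IPPROTO_IP, IP_PKTINFO, 1)
        server.bind(("0.0.0.0", 0))
        server.settimeout(2)
        port = server.getsockname()[1]

        client.bind(("127.0.0.1", 0))
        client.settimeout(2)
        client.sendto(b"query", (query_addr, port))

        _, peer, dst = recv_with_dst(server)
        if dst is None:
            raise RuntimeError("kernel did not supply IP_PKTINFO")
        send_from(server, b"reply", peer, dst)

        _, reply_src = client.recvfrom(2048)
        return reply_src[0]


if __name__ == "__main__":
    print("reply came from", pktinfo_reply_source())
```

The extra bookkeeping per datagram is the "fiddly" part: the
destination has to be captured on receive and carried along so that it
can be supplied again on every send.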

> Outside interface:
> 
> ens18
>     10.1.1.1/24 (main IP)
>     10.1.1.2/32 (VRRP IP)
> 
> TCPdump shows the problem (note that the reply packet has source as the main IP, not the VRRP IP):
> 
> IP 10.2.2.2.37392 > 10.1.1.2.53: 60211+ [1au] A? www.example.com. (56)
> IP 10.1.1.1.53 > 10.2.2.2.37392: 60211*- 1/0/1 A 192.168.254.7 (88)
> 
> Tried starting the container with non-default pasta options, but the result is the same:
> 
> --network pasta:-I,tap0,-o,10.1.1.2,--ipv4-only,-a,10.0.2.0,-n,24,-g,10.0.2.2,--dns-forward,10.0.2.3,--no-ndp,--no-dhcpv6,--no-dhcp

Most of those options are not relevant.  --dns-forward is set by
podman and won't change anything (it's about DNS requests _out_ from
the container).  Similarly -g, --ipv4-only and the --no-* options are
unlikely to be relevant here.

-a is also probably not relevant: it controls the guest's address, but
this is a host side addressing problem.  That said, -a 10.0.2.0 -n 24
is probably not a good idea, since it's setting the guest's address to
the network address.

-o is the one that you'd think is relevant, since it's supposed to
control the outbound local address.  The catch here is that while it
will control the local address for new sockets we create to handle
flows initiated by the guest, here the flow is initiated by the host
and so goes through the listening socket, which is still bound to
0.0.0.0:53.
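
The distinction can be sketched in Python with plain sockets.  This is
a general illustration of binding an outbound socket before sending,
not pasta's implementation of -o; the function name is hypothetical
and the loopback addresses assume a Linux host.

```python
import socket


def outbound_source(local_addr="127.0.0.2"):
    """Send from a newly created socket bound to local_addr.

    Returns the source address seen by the receiver.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))
        receiver.settimeout(2)

        # Binding a new socket before sending fixes its source address;
        # port 0 leaves the choice of port to the kernel.
        sender.bind((local_addr, 0))
        sender.sendto(b"hello", receiver.getsockname())

        _, src = receiver.recvfrom(2048)
        return src[0]


if __name__ == "__main__":
    print("receiver saw source", outbound_source())
```

This only helps for sockets created for a new outbound flow.  A reply
sent through an existing wildcard-bound listening socket is not
affected by it.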

> Any help with possible solutions would be greatly appreciated.

I believe I have a workaround.  The trick here is that we want the
listening socket to be bound to the VRRP address.  This can be done by
setting that specific IP when publishing the port, for example:
	podman run --publish 10.1.1.2:53:53/udp ...
(which should translate to the pasta option "-u 10.1.1.2/53:53").

The tradeoff for this workaround is that the container will now *only*
respond to DNS traffic to the VRRP address, but I'm guessing that
might be what you want anyway.  You could also explicitly publish any
additional addresses you want to respond on.
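
The effect of binding the listening socket to one specific address,
including the tradeoff, can be sketched in Python as follows.  This
assumes a Linux host and uses 127.0.0.2 in place of the VRRP address
and 127.0.0.1 in place of the main address; the function name is
invented for the example.

```python
import socket


def specific_bind_demo(listen_addr="127.0.0.2", other_addr="127.0.0.1"):
    """Query a socket bound to listen_addr via both addresses.

    Returns {queried address: reply source, or None if nothing arrived}.
    """
    results = {}
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as server, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        server.bind((listen_addr, 0))
        server.settimeout(0.5)
        port = server.getsockname()[1]

        client.bind(("127.0.0.1", 0))
        client.settimeout(2)

        for target in (listen_addr, other_addr):
            client.sendto(b"query", (target, port))
            try:
                _, peer = server.recvfrom(2048)
            except socket.timeout:
                # The datagram never reached the listener.
                results[target] = None
                continue
            # The socket's bound address is used as the reply source.
            server.sendto(b"reply", peer)
            _, reply_src = client.recvfrom(2048)
            results[target] = reply_src[0]
    return results


if __name__ == "__main__":
    for queried, source in specific_bind_demo().items():
        print(queried, "->", source)
```

Queries to the bound address should be answered from that same
address, while queries to the other address go unanswered.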

But, longer term, this is a bug that we'd like to fix when we have
time.  Would you mind filing a ticket on bugs.passt.top so we don't
forget it?

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson


* Re: Rootless Podman with VRRP
       [not found]   ` <172658946856.151934.7414720839553284015@maja>
@ 2024-09-17 16:14     ` Stefano Brivio
       [not found]       ` <SA1PR07MB8738A2C2355C0CA3E1C0822092612@SA1PR07MB8738.namprd07.prod.outlook.com>
  0 siblings, 1 reply; 6+ messages in thread
From: Stefano Brivio @ 2024-09-17 16:14 UTC (permalink / raw)
  To: Castelli, Anton; +Cc: passt-user, David Gibson

Hi Anton,

On Tue, 17 Sep 2024 15:22:04 +0000
"Castelli, Anton via user" <passt-user@passt.top> wrote:

> I will be happy to submit a bug report. Unfortunately, I'm having
> trouble getting signed up. I've tried to send the new account email
> to both my work email address and a personal Gmail address. I have
> not received the email in either case (I've checked the spam folders
> too).

You should have the email now.  I'm currently reviewing subscription
requests manually because of an influx of malicious automated requests.

-- 
Stefano



* Re: Rootless Podman with VRRP
       [not found]         ` <SA1PR07MB8738CADA17DFA7FDA3D9BD4C92612@SA1PR07MB8738.namprd07.prod.outlook.com>
@ 2024-09-17 20:11           ` Stefano Brivio
  0 siblings, 0 replies; 6+ messages in thread
From: Stefano Brivio @ 2024-09-17 20:11 UTC (permalink / raw)
  To: Castelli, Anton; +Cc: passt-user, David Gibson

On Tue, 17 Sep 2024 18:53:24 +0000
"Castelli, Anton" <anton.castelli@siu.edu> wrote:

> Stefano,
> 
> I've encountered an error when trying to submit the bug report. I tried twice with the same result. See attached screenshot.

Sorry, my bad. Can you please try again?

-- 
Stefano



* Re: Rootless Podman with VRRP
       [not found]   ` <SA1PR07MB87384D729F3FC0543F54BF4392612@SA1PR07MB8738.namprd07.prod.outlook.com>
@ 2024-09-18  0:58     ` David Gibson
  2024-09-18  2:14       ` David Gibson
  0 siblings, 1 reply; 6+ messages in thread
From: David Gibson @ 2024-09-18  0:58 UTC (permalink / raw)
  To: Castelli, Anton; +Cc: passt-user

On Tue, Sep 17, 2024 at 03:22:04PM +0000, Castelli, Anton wrote:
> David,
> 
> Thank you very much for the quick reply!
> 
> I tried querying the DNS with TCP and it worked correctly, using the
> VRRP address in the reply packet. Unfortunately, UDP is the default
> for DNS queries.

Right.

> Thanks for the advice about the options and the workaround. I had
> just copied them from the Podman docs and modified them slightly. I
> tried the '--publish 10.1.1.1:53:53/udp --publish
> 10.1.1.2:53:53/udp' options, and it worked great on the primary
> server that had the active VRRP address. I was able to query both
> the regular and VRRP addresses and get a response. Unfortunately,
> when I tried the same on the secondary server that doesn't have the
> VRRP address, it refused to bind to the non-existent '10.1.1.2'
> address.

Ah, right, of course.  I was just thinking about the primary, and
didn't consider how the secondaries would also need to listen on that
address at some future time.

> I tried with both the publish options and got an error (10.1.1.3 is
> the regular IP of the secondary server).
> 
> --publish 10.1.1.3:53:53/udp --publish 10.1.1.2:53:53/udp
> 
> Error: unable to start container "XXXX": pasta failed with exit code 1:
> Altering mapping of already mapped port number: 10.1.1.2/53-53:53-53

This looks like a different bug - although one that I think will be
fixed by some work that's pretty close to the top of my queue.  It's
not all that relevant for your case right now, because..

> Failed to bind port 53 (Cannot assign requested address) for option
> '-u 10.1.1.2/53-53:53-53', exiting

..this one is more fundamental.  Usually, you can't bind an address
you don't currently own.
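
The same failure can be reproduced outside pasta with a few lines of
Python.  This is a sketch under assumptions: 192.0.2.1 is a
documentation address (TEST-NET-1) presumed not to be configured on
the host, and the helper name is made up.

```python
import errno
import socket


def try_bind(addr, port=0):
    """Try to bind a UDP socket; return None on success, else the errno."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.bind((addr, port))
        except OSError as exc:
            return exc.errno
    return None


if __name__ == "__main__":
    result = try_bind("192.0.2.1")
    if result is None:
        print("bind succeeded")
    else:
        print("bind failed:", errno.errorcode.get(result, result))
```

With default settings the bind is expected to fail with EADDRNOTAVAIL,
which is the "Cannot assign requested address" message in the error
quoted above.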

> I also tried to publish just the VRRP address that isn't currently
> assigned to the secondary server and got a different error.
> 
> --publish 10.1.1.2:53:53/udp
> 
> Error: unable to start container "XXXX": pasta failed with exit code 1:
> Failed to bind port 53 (Cannot assign requested address) for option '-u 131.230.254.138/53-53:53-53', exiting

This one looks like the same error... except the IP is very strange.
Or was this just a mistake in anonymizing the addresses?

> Since the goal of this VRRP setup is to have an active/standby
> failover pair, I have to have the service started and running on the
> secondary server. If the primary server fails, the VRRP address will
> move to the secondary server and DNS should then respond to
> requests.
> 
> Unless you can think of another work-around for the secondary
> server, I might just have to use a rootful container and host
> networking for now.

I think I do have another workaround, although it will require
changing a setting as root.  If you set:
	sysctl net.ipv4.ip_nonlocal_bind=1
on the host (and the ipv6 version as well, if you need it), then pasta
should be able to bind the VRRP address even if it isn't (yet)
configured on the machine.
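
To check whether the setting is in effect before starting the
container, the value can be read back without root.  A small Python
sketch, assuming the usual /proc/sys layout on Linux; the function
name and its `root` parameter are illustrative only.

```python
from pathlib import Path


def nonlocal_bind_enabled(family="ipv4", root="/proc/sys"):
    """Report whether net.<family>.ip_nonlocal_bind is set to 1."""
    path = Path(root) / "net" / family / "ip_nonlocal_bind"
    return path.read_text().strip() == "1"


if __name__ == "__main__":
    for family in ("ipv4", "ipv6"):
        try:
            state = nonlocal_bind_enabled(family)
        except OSError as exc:
            print(family, "could not be read:", exc)
        else:
            print(family, "ip_nonlocal_bind enabled:", state)
```

This only reads the value; changing it still needs root, as with the
sysctl command above.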

There's also a per-socket version of this (IP_FREEBIND) which wouldn't
require the root setup.  We've talked about supporting this in pasta
somehow, but we don't have any specific plans for it (it's not very
clear how you'd configure it, for example).
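
For reference, this is roughly what the per-socket option looks like
from an application, sketched in Python.  It says nothing about how
pasta might expose it.  It assumes Linux, where IP_FREEBIND has the
value 15, and uses the documentation address 192.0.2.1 as an address
that is not configured locally; the function name is hypothetical.

```python
import socket

# Linux value from <linux/in.h>, used if Python does not export the name.
IP_FREEBIND = getattr(socket, "IP_FREEBIND", 15)


def freebind_udp(addr, port=0):
    """Return a UDP socket bound to addr even if addr is not (yet) local."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Must be set before bind(); no root privileges are required.
        sock.setsockopt(socket.IPPROTO_IP, IP_FREEBIND, 1)
        sock.bind((addr, port))
    except OSError:
        sock.close()
        raise
    return sock


if __name__ == "__main__":
    with freebind_udp("192.0.2.1") as sock:
        print("bound to", sock.getsockname())
```

A socket bound this way would only start receiving traffic once the
address is actually assigned to the host, for instance after a VRRP
failover.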

Note that with the ip_nonlocal_bind setting you might still run into
trouble binding both the VRRP and regular addresses due to the other
bug I mentioned in passing above.  As I said that one should be
addressed by some stuff that's pretty near the front of the queue.
Let me know if that's still an issue for you and I'll consider it a
priority bump to that work.

> I will be happy to submit a bug report. Unfortunately, I'm having
> trouble getting signed up. I've tried to send the new account email
> to both my work email address and a personal Gmail address. I have
> not received the email in either case (I've checked the spam folders
> too).

I see you and Stefano sorted that out.  Thanks for filing the bug.
Looks like in the confusion over signup, four near-duplicates were
filed; I've consolidated those now.

That will keep this on the radar, but I can't promise we'll be able to
fix this particularly soon.  There's a heap of other work that will
probably take priority.  I hope the workarounds above can tide you
over for now.

> I very much appreciate your work and the Pasta project. Thank you
> for taking the time to respond and helping me out!


* Re: Rootless Podman with VRRP
  2024-09-18  0:58     ` David Gibson
@ 2024-09-18  2:14       ` David Gibson
       [not found]         ` <SA1PR07MB87382333E58D662293E51E0B92622@SA1PR07MB8738.namprd07.prod.outlook.com>
  0 siblings, 1 reply; 6+ messages in thread
From: David Gibson @ 2024-09-18  2:14 UTC (permalink / raw)
  To: Castelli, Anton; +Cc: passt-user

On Wed, Sep 18, 2024 at 10:58:44AM +1000, David Gibson wrote:
> On Tue, Sep 17, 2024 at 03:22:04PM +0000, Castelli, Anton wrote:
> > David,
> > 
> > Thank you very much for the quick reply!
> > 
> > I tried querying the DNS with TCP and it worked correctly, using the
> > VRRP address in the reply packet. Unfortunately, UDP is the default
> > for DNS queries.
> 
> Right.
> 
> > Thanks for the advice about the options and the workaround. I had
> > just copied them from the Podman docs and modified them slightly. I
> > tried the '--publish 10.1.1.1:53:53/udp --publish
> > 10.1.1.2:53:53/udp' options, and it worked great on the primary
> > server that had the active VRRP address. I was able to query both
> > the regular and VRRP addresses and get a response. Unfortunately,
> > when I tried the same on the secondary server that doesn't have the
> > VRRP address, it refused to bind to the non-existent '10.1.1.2'
> > address.
> 
> Ah, right, of course.  I was just thinking about the primary, and
> didn't consider how the secondaries would also need to listen on that
> address at some future time.
> 
> > I tried with both the publish options and got an error (10.1.1.3 is
> > the regular IP of the secondary server).
> > 
> > --publish 10.1.1.3:53:53/udp --publish 10.1.1.2:53:53/udp
> > 
> > Error: unable to start container "XXXX": pasta failed with exit code 1:
> > Altering mapping of already mapped port number: 10.1.1.2/53-53:53-53
> 
> This looks like a different bug - although one that I think will be
> fixed by some work that's pretty close to the top of my queue.  It's
> not all that relevant for your case right now, because..

I just had a closer look at the code which produces this error.  The
message is not really correct here, and that is a bug.  However, it's
only issued as a warning, and I think it shouldn't actually break
anything for your situation (assuming we can work around the other issues).

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Rootless Podman with VRRP
       [not found]         ` <SA1PR07MB87382333E58D662293E51E0B92622@SA1PR07MB8738.namprd07.prod.outlook.com>
@ 2024-09-19  2:15           ` David Gibson
  0 siblings, 0 replies; 6+ messages in thread
From: David Gibson @ 2024-09-19  2:15 UTC (permalink / raw)
  To: Castelli, Anton; +Cc: passt-user

On Wed, Sep 18, 2024 at 07:29:09PM +0000, Castelli, Anton wrote:
> David,
> 
> Yes, that one instance was a mistake when I was anonymizing the
> IPs. Sorry for the confusion.
> 
> Following your suggestion, I was able to set the sysctl value
> 'net.ipv4.ip_nonlocal_bind=1'. After that, I was able to
> successfully start the rootless container on the secondary server
> (that did not have the VRRP IP). You were correct that pasta emitted
> a warning, but it started anyway.

Ok.

> With this workaround, I can now successfully start rootless
> containers on both the primary and secondary servers. The primary
> server responds to UDP queries on both its main IP address and the
> VRRP IP address. I tried a manual failover to the secondary server,
> which then also responds on the VRRP IP address in addition to its
> main IP address. Everything appears to be working as intended.

Superb!

> Thank you so much for taking the time to help find a workaround to
> this issue! I'll be updating the bug report with the details on the
> workaround in case anyone else runs into the issue.

Thanks for that.  We have a _lot_ of edge cases of varying obscurity
to sort out eventually; recording the details so they're not forgotten
is super helpful.


