On Mon, Sep 16, 2024 at 02:46:51PM +0000, Castelli, Anton via user wrote:
> Date: Mon, 16 Sep 2024 14:46:51 +0000
> From: "Castelli, Anton"
> To: "passt-user@passt.top"
> Subject: Rootless Podman with VRRP
> List-Id: "For passt users: support, questions and answers"
>
>
> I'm trying to get a service in a rootless Podman container (BIND DNS
> server) to respond correctly when using VRRP (via keepalived) on
> the host. It seems like Pasta will forward the inbound traffic to
> the container from the VRRP address, but the responses will be from
> the regular IP address instead of the VRRP address, which causes the
> client to ignore the response. I've tried adding Pasta network
> options to the container, but the behavior seems to be the same.

I'm not familiar with VRRP beyond a quick skim of the Wikipedia
article just now.  I think the (only) relevant thing is that the host
has multiple addresses, but it's possible there's some other
complexity on the host that I'm not factoring in yet.

> OS: CentOS Stream 9
> Podman: 5.2.2
> Pasta: 0^20240806.gee36266-2.el9.x86_64-pasta

This version includes UDP flow tracking, which fixes some of the worst
bugs with UDP forwarding / addressing, but there are still some edge
cases, as you've discovered.

I think I know what's going on here.  With TCP, once we accept() a
connection, its local address on the host is part of the accept()ed
socket, so we'll always use the same address for reply packets.  With
UDP, however, this doesn't happen automatically: the local address for
replies is controlled by the bound address of the socket we use to
send them out.  In this instance that will be the same socket that is
listening for the incoming requests.  I'm assuming that socket will be
set to listen for DNS traffic on any address, so it will be bound to
0.0.0.0:53 on the host.
That will see the traffic to the VRRP address, but when we go to send
the reply, the kernel will pick the local address according to its
routing tables, and it seems it's picking the default address.

That is a bug, at least in principle: we should remember the local
address for a UDP flow and use the same local address for replies.
With the flow table we now have the means of correctly tracking this.
However, we haven't implemented it (yet) because it's kind of fiddly:
we'd need to use additional socket options and ancillary messages
(cmsgs) to determine and control the local address for datagrams on a
per-packet basis.

> Outside interface:
>
> ens18
> 10.1.1.1/24 (main IP)
> 10.1.1.2/32 (VRRP IP)
>
> TCPdump shows the problem (note that the reply packet has source as the main IP, not the VRRP IP):
>
> IP 10.2.2.2.37392 > 10.1.1.2.53: 60211+ [1au] A? www.example.com. (56)
> IP 10.1.1.1.53 > 10.2.2.2.37392: 60211*- 1/0/1 A 192.168.254.7 (88)
>
> Tried starting the container with non-default pasta options, but the result is the same:
>
> --network pasta:-I,tap0,-o,10.1.1.2,--ipv4-only,-a,10.0.2.0,-n,24,-g,10.0.2.2,--dns-forward,10.0.2.3,--no-ndp,--no-dhcpv6,--no-dhcp

Most of those options are not relevant.  --dns-forward is set by
podman and won't change anything (it's about DNS requests _out_ from
the container).  Similarly, -g, --ipv4-only and the --no-* options are
unlikely to be relevant here.

-a is also probably not relevant: it controls the guest's address, but
this is a host-side addressing problem.  That said, -a 10.0.2.0 -n 24
is probably not a good idea, since it sets the guest's address to the
network address of that subnet.

-o is the one that you'd think is relevant, since it's supposed to
control the outbound local address.  The catch here is that it only
controls the local address for new sockets we create to handle flows
initiated by the guest.  Here the flow is initiated from the host side,
so it goes through the listening socket, which is still bound to
0.0.0.0:53.

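To illustrate the point about bound addresses, here is a rough Python
sketch.  It is not passt code, and the function name udp_reply_source
is made up for this example.  It sends one datagram to a "server"
socket and reports which source address the reply arrives from, with
the server replying from the same socket it listens on and no
per-packet address control.

```python
import socket


def udp_reply_source(bind_addr, dest_addr, timeout=1.0):
    """Send one datagram to dest_addr and return the source IP of the reply.

    The "server" socket is bound to bind_addr (for example "0.0.0.0" or
    one specific local address) and answers from that same socket, like
    a listening socket that handles both requests and replies.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as server, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        server.settimeout(timeout)
        client.settimeout(timeout)
        server.bind((bind_addr, 0))  # port 0: let the kernel pick a free port
        port = server.getsockname()[1]

        client.sendto(b"query", (dest_addr, port))
        _, peer = server.recvfrom(512)

        # The reply's source address is the socket's bound address, or,
        # if the socket is bound to the wildcard, whatever the kernel
        # selects when routing towards the peer.
        server.sendto(b"reply", peer)

        _, reply_from = client.recvfrom(512)
        return reply_from[0]
```

On a Linux host, where all of 127.0.0.0/8 is local, I'd expect
udp_reply_source("0.0.0.0", "127.0.0.2") to report a reply from
127.0.0.1 rather than from the address that was queried, while
udp_reply_source("127.0.0.2", "127.0.0.2") should report 127.0.0.2.
That mirrors the main IP versus VRRP IP mismatch in the tcpdump output
above, with loopback addresses standing in for the real ones.
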
> Any help with possible solutions would be greatly appreciated.

I believe I have a workaround.  The trick here is that we want the
listening socket to be bound to the VRRP address.  This can be done by
giving that specific IP when publishing the port, for example:

    podman run --publish 10.1.1.2:53:53/udp ...

(which should translate to the pasta option "-u 10.1.1.2/53:53").

The tradeoff for this workaround is that the container will now *only*
respond to DNS traffic sent to the VRRP address, but I'm guessing that
might be what you want anyway.  You could also explicitly publish any
additional addresses you want it to respond on.

But, longer term, this is a bug that we'd like to fix when we have
time.  Would you mind filing a ticket on bugs.passt.top so we don't
forget it?

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson