* [PATCH] udp: Fix breakage of UDP error handling by PKTINFO support
@ 2025-04-14 3:58 David Gibson
2025-04-14 9:56 ` Stefano Brivio
0 siblings, 1 reply; 3+ messages in thread
From: David Gibson @ 2025-04-14 3:58 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: Jon Maloy, David Gibson
We recently enabled the IP_PKTINFO / IPV6_RECVPKTINFO socket options on our
UDP sockets. This lets us obtain and properly handle the specific local
address used when we're "listening" with a socket on 0.0.0.0 or ::.
However, the PKTINFO cmsgs this option generates appear on error queue
messages as well as regular datagrams. udp_sock_recverr() doesn't expect
this and so flags an unrecoverable error when it can't parse the control
message.
Correct this by adding space in udp_sock_recverr()s control buffer for the
additional PKTINFO data, and scan through all cmsgs for the RECVERR, rather
than only looking at the first one.
Link: https://bugs.passt.top/show_bug.cgi?id=99
Fixes: f4b0dd8b06 ("udp: Use PKTINFO cmsgs to get destination address..")
Reported-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
udp.c | 30 +++++++++++++++++-------------
1 file changed, 17 insertions(+), 13 deletions(-)
diff --git a/udp.c b/udp.c
index 40af7dfc..f5fb98c2 100644
--- a/udp.c
+++ b/udp.c
@@ -155,6 +155,10 @@ __attribute__ ((aligned(32)))
#endif
udp_meta[UDP_MAX_FRAMES];
+#define PKTINFO_SPACE \
+ MAX(CMSG_SPACE(sizeof(struct in_pktinfo)), \
+ CMSG_SPACE(sizeof(struct in6_pktinfo)))
+
/**
* enum udp_iov_idx - Indices for the buffers making up a single UDP frame
* @UDP_IOV_TAP tap specific header
@@ -476,10 +480,10 @@ static int udp_sock_recverr(const struct ctx *c, union epoll_ref ref)
struct sock_extended_err ee;
union sockaddr_inany saddr;
};
- const struct errhdr *eh;
- const struct cmsghdr *hdr;
- char buf[CMSG_SPACE(sizeof(struct errhdr))];
+ char buf[PKTINFO_SPACE + CMSG_SPACE(sizeof(struct errhdr))];
char data[ICMP6_MAX_DLEN];
+ const struct errhdr *eh;
+ struct cmsghdr *hdr;
int s = ref.fd;
struct iovec iov = {
.iov_base = data,
@@ -507,12 +511,16 @@ static int udp_sock_recverr(const struct ctx *c, union epoll_ref ref)
return -1;
}
- hdr = CMSG_FIRSTHDR(&mh);
- if (!((hdr->cmsg_level == IPPROTO_IP &&
- hdr->cmsg_type == IP_RECVERR) ||
- (hdr->cmsg_level == IPPROTO_IPV6 &&
- hdr->cmsg_type == IPV6_RECVERR))) {
- err("Unexpected cmsg reading error queue");
+ for (hdr = CMSG_FIRSTHDR(&mh); hdr; hdr = CMSG_NXTHDR(&mh, hdr)) {
+ if ((hdr->cmsg_level == IPPROTO_IP &&
+ hdr->cmsg_type == IP_RECVERR) ||
+ (hdr->cmsg_level == IPPROTO_IPV6 &&
+ hdr->cmsg_type == IPV6_RECVERR))
+ break;
+ }
+
+ if (!hdr) {
+ err("Missing RECVERR cmsg in error queue");
return -1;
}
@@ -587,10 +595,6 @@ static int udp_sock_errs(const struct ctx *c, union epoll_ref ref)
return n_err;
}
-#define PKTINFO_SPACE \
- MAX(CMSG_SPACE(sizeof(struct in_pktinfo)), \
- CMSG_SPACE(sizeof(struct in6_pktinfo)))
-
/**
* udp_peek_addr() - Get source address for next packet
* @s: Socket to get information from
--
@@ -155,6 +155,10 @@ __attribute__ ((aligned(32)))
#endif
udp_meta[UDP_MAX_FRAMES];
+#define PKTINFO_SPACE \
+ MAX(CMSG_SPACE(sizeof(struct in_pktinfo)), \
+ CMSG_SPACE(sizeof(struct in6_pktinfo)))
+
/**
* enum udp_iov_idx - Indices for the buffers making up a single UDP frame
* @UDP_IOV_TAP tap specific header
@@ -476,10 +480,10 @@ static int udp_sock_recverr(const struct ctx *c, union epoll_ref ref)
struct sock_extended_err ee;
union sockaddr_inany saddr;
};
- const struct errhdr *eh;
- const struct cmsghdr *hdr;
- char buf[CMSG_SPACE(sizeof(struct errhdr))];
+ char buf[PKTINFO_SPACE + CMSG_SPACE(sizeof(struct errhdr))];
char data[ICMP6_MAX_DLEN];
+ const struct errhdr *eh;
+ struct cmsghdr *hdr;
int s = ref.fd;
struct iovec iov = {
.iov_base = data,
@@ -507,12 +511,16 @@ static int udp_sock_recverr(const struct ctx *c, union epoll_ref ref)
return -1;
}
- hdr = CMSG_FIRSTHDR(&mh);
- if (!((hdr->cmsg_level == IPPROTO_IP &&
- hdr->cmsg_type == IP_RECVERR) ||
- (hdr->cmsg_level == IPPROTO_IPV6 &&
- hdr->cmsg_type == IPV6_RECVERR))) {
- err("Unexpected cmsg reading error queue");
+ for (hdr = CMSG_FIRSTHDR(&mh); hdr; hdr = CMSG_NXTHDR(&mh, hdr)) {
+ if ((hdr->cmsg_level == IPPROTO_IP &&
+ hdr->cmsg_type == IP_RECVERR) ||
+ (hdr->cmsg_level == IPPROTO_IPV6 &&
+ hdr->cmsg_type == IPV6_RECVERR))
+ break;
+ }
+
+ if (!hdr) {
+ err("Missing RECVERR cmsg in error queue");
return -1;
}
@@ -587,10 +595,6 @@ static int udp_sock_errs(const struct ctx *c, union epoll_ref ref)
return n_err;
}
-#define PKTINFO_SPACE \
- MAX(CMSG_SPACE(sizeof(struct in_pktinfo)), \
- CMSG_SPACE(sizeof(struct in6_pktinfo)))
-
/**
* udp_peek_addr() - Get source address for next packet
* @s: Socket to get information from
--
2.49.0
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH] udp: Fix breakage of UDP error handling by PKTINFO support
2025-04-14 3:58 [PATCH] udp: Fix breakage of UDP error handling by PKTINFO support David Gibson
@ 2025-04-14 9:56 ` Stefano Brivio
[not found] ` <20250414191251.3623406b@elisabeth>
0 siblings, 1 reply; 3+ messages in thread
From: Stefano Brivio @ 2025-04-14 9:56 UTC (permalink / raw)
To: David Gibson; +Cc: passt-dev, Jon Maloy
On Mon, 14 Apr 2025 13:58:53 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:
> We recently enabled the IP_PKTINFO / IPV6_RECVPKTINFO socket options on our
> UDP sockets. This lets us obtain and properly handle the specific local
> address used when we're "listening" with a socket on 0.0.0.0 or ::.
>
> However, the PKTINFO cmsgs this option generates appear on error queue
> messages as well as regular datagrams. udp_sock_recverr() doesn't expect
> this and so flags an unrecoverable error when it can't parse the control
> message.
>
> Correct this by adding space in udp_sock_recverr()s control buffer for the
> additional PKTINFO data, and scan through all cmsgs for the RECVERR, rather
> than only looking at the first one.
>
> Link: https://bugs.passt.top/show_bug.cgi?id=99
> Fixes: f4b0dd8b06 ("udp: Use PKTINFO cmsgs to get destination address..")
> Reported-by: Stefano Brivio <sbrivio@redhat.com>
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The patch looks good to me, but I'm hitting something in my tests (with
my recent DNS fix) that's perhaps not intended.
On the host:
$ cat /etc/resolv.conf
nameserver 127.0.0.1
...and nobody is listening on that address. Podman passes
--dns-forward 169.254.1.1 (default) and in the container:
$ podman run --net=pasta:-d,-l,/tmp/pasta.log --rm -ti alpine sh
I do:
/ # nslookup google.com 169.254.1.1
;; connection timed out; no servers could be reached
which is expected. But logs show a warning:
49.5377: Flow 0 (NEW): FREE -> NEW
49.5377: Flow 0 (INI): NEW -> INI
49.5377: Flow 0 (INI): TAP [88.198.0.164]:43458 -> [169.254.1.1]:53 => ?
49.5377: Flow 0 (TGT): INI -> TGT
49.5377: Flow 0 (TGT): TAP [88.198.0.164]:43458 -> [169.254.1.1]:53 => HOST [0.0.0.0]:43458 -> [127.0.0.1]:53
49.5377: Flow 0 (UDP flow): TGT -> TYPED
49.5377: Flow 0 (UDP flow): TAP [88.198.0.164]:43458 -> [169.254.1.1]:53 => HOST [0.0.0.0]:43458 -> [127.0.0.1]:53
49.5378: Flow 0 (UDP flow): Side 0 hash table insert: bucket: 148325
49.5378: Flow 0 (UDP flow): Side 1 hash table insert: bucket: 309967
49.5378: Flow 0 (UDP flow): TYPED -> ACTIVE
49.5378: Flow 0 (UDP flow): TAP [88.198.0.164]:43458 -> [169.254.1.1]:53 => HOST [127.0.0.1]:43458 -> [127.0.0.1]:53
49.5378: WARNING: Error peeking at socket address: Connection refused
49.5379: ICMP error on UDP socket 208: Connection refused
49.5379: ICMP error on UDP socket 208: Connection refused
52.0404: ICMP error on UDP socket 208: Connection refused
52.0404: ICMP error on UDP socket 208: Connection refused
and I'm not sure if that warning is intended. By the way, I have the
feeling that it now takes longer (with the whole IP_PKTINFO thing) for
nslookup to fail, as if those ICMP errors were not relayed anymore, but
I'm not sure about this, and I didn't investigate yet.
--
Stefano
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] udp: Fix breakage of UDP error handling by PKTINFO support
[not found] ` <20250414191251.3623406b@elisabeth>
@ 2025-04-15 5:28 ` David Gibson
0 siblings, 0 replies; 3+ messages in thread
From: David Gibson @ 2025-04-15 5:28 UTC (permalink / raw)
To: Stefano Brivio; +Cc: passt-dev, Jon Maloy
[-- Attachment #1: Type: text/plain, Size: 6899 bytes --]
On Mon, Apr 14, 2025 at 07:12:51PM +0200, Stefano Brivio wrote:
> A couple of further observations:
>
> On Mon, 14 Apr 2025 11:56:41 +0200
> Stefano Brivio <sbrivio@redhat.com> wrote:
>
> > On Mon, 14 Apr 2025 13:58:53 +1000
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >
> > > We recently enabled the IP_PKTINFO / IPV6_RECVPKTINFO socket options on our
> > > UDP sockets. This lets us obtain and properly handle the specific local
> > > address used when we're "listening" with a socket on 0.0.0.0 or ::.
> > >
> > > However, the PKTINFO cmsgs this option generates appear on error queue
> > > messages as well as regular datagrams. udp_sock_recverr() doesn't expect
> > > this and so flags an unrecoverable error when it can't parse the control
> > > message.
> > >
> > > Correct this by adding space in udp_sock_recverr()s control buffer for the
> > > additional PKTINFO data, and scan through all cmsgs for the RECVERR, rather
> > > than only looking at the first one.
> > >
> > > Link: https://bugs.passt.top/show_bug.cgi?id=99
> > > Fixes: f4b0dd8b06 ("udp: Use PKTINFO cmsgs to get destination address..")
> > > Reported-by: Stefano Brivio <sbrivio@redhat.com>
> > >
> > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> >
> > The patch looks good to me, but I'm hitting something in my tests (with
> > my recent DNS fix) that's perhaps not intended.
> >
> > On the host:
> >
> > $ cat /etc/resolv.conf
> > nameserver 127.0.0.1
> >
> > ...and nobody is listening on that address. Podman passes
> > --dns-forward 169.254.1.1 (default) and in the container:
> >
> > $ podman run --net=pasta:-d,-l,/tmp/pasta.log --rm -ti alpine sh
> >
> > I do:
> >
> > / # nslookup google.com 169.254.1.1
> > ;; connection timed out; no servers could be reached
> >
> > which is expected. But logs show a warning:
> >
> > 49.5377: Flow 0 (NEW): FREE -> NEW
> > 49.5377: Flow 0 (INI): NEW -> INI
> > 49.5377: Flow 0 (INI): TAP [88.198.0.164]:43458 -> [169.254.1.1]:53 => ?
> > 49.5377: Flow 0 (TGT): INI -> TGT
> > 49.5377: Flow 0 (TGT): TAP [88.198.0.164]:43458 -> [169.254.1.1]:53 => HOST [0.0.0.0]:43458 -> [127.0.0.1]:53
> > 49.5377: Flow 0 (UDP flow): TGT -> TYPED
> > 49.5377: Flow 0 (UDP flow): TAP [88.198.0.164]:43458 -> [169.254.1.1]:53 => HOST [0.0.0.0]:43458 -> [127.0.0.1]:53
> > 49.5378: Flow 0 (UDP flow): Side 0 hash table insert: bucket: 148325
> > 49.5378: Flow 0 (UDP flow): Side 1 hash table insert: bucket: 309967
> > 49.5378: Flow 0 (UDP flow): TYPED -> ACTIVE
> > 49.5378: Flow 0 (UDP flow): TAP [88.198.0.164]:43458 -> [169.254.1.1]:53 => HOST [127.0.0.1]:43458 -> [127.0.0.1]:53
> > 49.5378: WARNING: Error peeking at socket address: Connection refused
> > 49.5379: ICMP error on UDP socket 208: Connection refused
> > 49.5379: ICMP error on UDP socket 208: Connection refused
> > 52.0404: ICMP error on UDP socket 208: Connection refused
> > 52.0404: ICMP error on UDP socket 208: Connection refused
> >
> > and I'm not sure if that warning is intended.
>
> The warning actually comes from the previous series, not from "[PATCH
> 0/3] Properly preseve local addresses for UDP flows", but rather from
> commit 84ab1305faba ("udp: Polish udp_vu_sock_info() and remove from
> vu specific code").
Right, I was beginning to suspect that from earlier information. My
own investigations today showed that as a problem, and I have a fix
for that along with some related issues.
> I couldn't figure out why we get that error *there* but I didn't try
> hard. If it's unavoidable, I guess we should change that to a debug()
> message, otherwise a guest / container doing a couple of UDP port scans
> could flood host logs.
Yes, they definitely need to be trace level messages. I know who it's
happening there, details in my upcoming commit messages.
> > By the way, I have the
> > feeling that it now takes longer (with the whole IP_PKTINFO thing) for
> > nslookup to fail, as if those ICMP errors were not relayed anymore, but
> > I'm not sure about this, and I didn't investigate yet.
>
> I couldn't reproduce the "fast" nslookup failure I'm fairly sure I had
> seen at some point. I tried reverting my fixes, and both of your series
> separately, but I can't get that to work.
So. I think there are two different problems stopping this from
working.
1) The new udp_flush_flow() is essentially discarding the error
without it passing through the EPOLLERR path and triggering the
correct ICMP.
2) If the host side error is generated locally (address 127.0.0.1), we
send that address untranslated to the guest where it makes no sense.
> I'm attaching a packet capture of the current behaviour. All the
> relevant details are the same as without your two series (and with or
> without my fix).
>
> I'm not sure if we should map the source address of the ICMP messages
> (Jon, thoughts?) instead of leaving it as 127.0.0.1, in case
> fwd_nat_from_tap() changed the outbound source address.
>
> That is, if we have:
>
> 4.2401: Flow 0 (UDP flow): TAP [88.198.0.164]:34843 -> [169.254.1.1]:53 => HOST [0.0.0.0]:34843 -> [127.0.0.1]:53
> 4.2402: ICMP error on UDP socket 209: Connection refused
>
> we currently send an ICMP message from 127.0.0.1 (I guess ignored)
Yes, that will certainly be ignored.
> instead of using 169.254.1.1 as source. Should udp_send_tap_icmp4()
> call tap_icmp4_send() with 'fromside->oaddr' as source (instead of
> 'saddr'), where 'fromside' is the flow side opposite to 'toside'?
No. fromside->oaddr will likely be the same address as the guest has,
so (depending how paranoid it's configuration is), it's still likely
to discard it. We need to do the reverse translation of
--map-guest-addr and --map-host-loopback; this is the first time we've
needed such translation on an address that's not one of the main flow
addresses. We didn't think of it earlier, but it makes sense: the
ICMP can come from some intermediate address that's not either
endpoint of the flow, so we need to NAT it specifically.
> I'm not entirely sure if those ICMP messages are ignored, though.
> Maybe it's all fine the way it is and it's just ports that need to
> match, and it's normal that nslookup retries after a while...
I'm pretty sure they are. I recently encountered a similar thing for
a different reason (I was testing an older version without 65cca54b
("udp: correct source address for ICMP messages").
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2025-04-15 7:16 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-04-14 3:58 [PATCH] udp: Fix breakage of UDP error handling by PKTINFO support David Gibson
2025-04-14 9:56 ` Stefano Brivio
[not found] ` <20250414191251.3623406b@elisabeth>
2025-04-15 5:28 ` David Gibson
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).