public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
* [PATCH 0/7] Assorted fixes for UDP socket and error handling problems
@ 2025-04-15  7:16 David Gibson
  2025-04-15  7:16 ` [PATCH 1/7] udp: Fix breakage of UDP error handling by PKTINFO support David Gibson
                   ` (7 more replies)
  0 siblings, 8 replies; 13+ messages in thread
From: David Gibson @ 2025-04-15  7:16 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: Jon Maloy, David Gibson

My recent changes to UDP socket management caused some unfortunate
side effects.  In particular, it has some bad interactions with UDP
error handling.  Fix several bugs here, along with some reworks to
allow that.

David Gibson (7):
  udp: Fix breakage of UDP error handling by PKTINFO support
  udp: Be quieter about errors on UDP receive
  udp: Pass socket & flow information direction to error handling
    functions
  udp: Deal with errors as we go in udp_sock_fwd()
  udp: Add udp_pktinfo() helper
  udp: Minor re-organisation of udp_sock_recverr()
  udp: Propagate errors on listening and brand new sockets

 udp.c | 215 +++++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 138 insertions(+), 77 deletions(-)

-- 
2.49.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/7] udp: Fix breakage of UDP error handling by PKTINFO support
  2025-04-15  7:16 [PATCH 0/7] Assorted fixes for UDP socket and error handling problems David Gibson
@ 2025-04-15  7:16 ` David Gibson
  2025-04-15  7:16 ` [PATCH 2/7] udp: Be quieter about errors on UDP receive David Gibson
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: David Gibson @ 2025-04-15  7:16 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: Jon Maloy, David Gibson

We recently enabled the IP_PKTINFO / IPV6_RECVPKTINFO socket options on our
UDP sockets.  This lets us obtain and properly handle the specific local
address used when we're "listening" with a socket on 0.0.0.0 or ::.

However, the PKTINFO cmsgs this option generates appear on error queue
messages as well as regular datagrams.  udp_sock_recverr() doesn't expect
this and so flags an unrecoverable error when it can't parse the control
message.

Correct this by adding space in udp_sock_recverr()s control buffer for the
additional PKTINFO data, and scan through all cmsgs for the RECVERR, rather
than only looking at the first one.

Link: https://bugs.passt.top/show_bug.cgi?id=99
Fixes: f4b0dd8b06 ("udp: Use PKTINFO cmsgs to get destination address..")
Reported-by: Stefano Brivio <sbrivio@redhat.com>

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 udp.c | 30 +++++++++++++++++-------------
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/udp.c b/udp.c
index 40af7dfc..f5fb98c2 100644
--- a/udp.c
+++ b/udp.c
@@ -155,6 +155,10 @@ __attribute__ ((aligned(32)))
 #endif
 udp_meta[UDP_MAX_FRAMES];
 
+#define PKTINFO_SPACE					\
+	MAX(CMSG_SPACE(sizeof(struct in_pktinfo)),	\
+	    CMSG_SPACE(sizeof(struct in6_pktinfo)))
+
 /**
  * enum udp_iov_idx - Indices for the buffers making up a single UDP frame
  * @UDP_IOV_TAP         tap specific header
@@ -476,10 +480,10 @@ static int udp_sock_recverr(const struct ctx *c, union epoll_ref ref)
 		struct sock_extended_err ee;
 		union sockaddr_inany saddr;
 	};
-	const struct errhdr *eh;
-	const struct cmsghdr *hdr;
-	char buf[CMSG_SPACE(sizeof(struct errhdr))];
+	char buf[PKTINFO_SPACE + CMSG_SPACE(sizeof(struct errhdr))];
 	char data[ICMP6_MAX_DLEN];
+	const struct errhdr *eh;
+	struct cmsghdr *hdr;
 	int s = ref.fd;
 	struct iovec iov = {
 		.iov_base = data,
@@ -507,12 +511,16 @@ static int udp_sock_recverr(const struct ctx *c, union epoll_ref ref)
 		return -1;
 	}
 
-	hdr = CMSG_FIRSTHDR(&mh);
-	if (!((hdr->cmsg_level == IPPROTO_IP &&
-	       hdr->cmsg_type == IP_RECVERR) ||
-	      (hdr->cmsg_level == IPPROTO_IPV6 &&
-	       hdr->cmsg_type == IPV6_RECVERR))) {
-		err("Unexpected cmsg reading error queue");
+	for (hdr = CMSG_FIRSTHDR(&mh); hdr; hdr = CMSG_NXTHDR(&mh, hdr)) {
+		if ((hdr->cmsg_level == IPPROTO_IP &&
+		      hdr->cmsg_type == IP_RECVERR) ||
+		     (hdr->cmsg_level == IPPROTO_IPV6 &&
+		      hdr->cmsg_type == IPV6_RECVERR))
+		    break;
+	}
+
+	if (!hdr) {
+		err("Missing RECVERR cmsg in error queue");
 		return -1;
 	}
 
@@ -587,10 +595,6 @@ static int udp_sock_errs(const struct ctx *c, union epoll_ref ref)
 	return n_err;
 }
 
-#define PKTINFO_SPACE					\
-	MAX(CMSG_SPACE(sizeof(struct in_pktinfo)),	\
-	    CMSG_SPACE(sizeof(struct in6_pktinfo)))
-
 /**
  * udp_peek_addr() - Get source address for next packet
  * @s:		Socket to get information from
-- 
@@ -155,6 +155,10 @@ __attribute__ ((aligned(32)))
 #endif
 udp_meta[UDP_MAX_FRAMES];
 
+#define PKTINFO_SPACE					\
+	MAX(CMSG_SPACE(sizeof(struct in_pktinfo)),	\
+	    CMSG_SPACE(sizeof(struct in6_pktinfo)))
+
 /**
  * enum udp_iov_idx - Indices for the buffers making up a single UDP frame
  * @UDP_IOV_TAP         tap specific header
@@ -476,10 +480,10 @@ static int udp_sock_recverr(const struct ctx *c, union epoll_ref ref)
 		struct sock_extended_err ee;
 		union sockaddr_inany saddr;
 	};
-	const struct errhdr *eh;
-	const struct cmsghdr *hdr;
-	char buf[CMSG_SPACE(sizeof(struct errhdr))];
+	char buf[PKTINFO_SPACE + CMSG_SPACE(sizeof(struct errhdr))];
 	char data[ICMP6_MAX_DLEN];
+	const struct errhdr *eh;
+	struct cmsghdr *hdr;
 	int s = ref.fd;
 	struct iovec iov = {
 		.iov_base = data,
@@ -507,12 +511,16 @@ static int udp_sock_recverr(const struct ctx *c, union epoll_ref ref)
 		return -1;
 	}
 
-	hdr = CMSG_FIRSTHDR(&mh);
-	if (!((hdr->cmsg_level == IPPROTO_IP &&
-	       hdr->cmsg_type == IP_RECVERR) ||
-	      (hdr->cmsg_level == IPPROTO_IPV6 &&
-	       hdr->cmsg_type == IPV6_RECVERR))) {
-		err("Unexpected cmsg reading error queue");
+	for (hdr = CMSG_FIRSTHDR(&mh); hdr; hdr = CMSG_NXTHDR(&mh, hdr)) {
+		if ((hdr->cmsg_level == IPPROTO_IP &&
+		      hdr->cmsg_type == IP_RECVERR) ||
+		     (hdr->cmsg_level == IPPROTO_IPV6 &&
+		      hdr->cmsg_type == IPV6_RECVERR))
+		    break;
+	}
+
+	if (!hdr) {
+		err("Missing RECVERR cmsg in error queue");
 		return -1;
 	}
 
@@ -587,10 +595,6 @@ static int udp_sock_errs(const struct ctx *c, union epoll_ref ref)
 	return n_err;
 }
 
-#define PKTINFO_SPACE					\
-	MAX(CMSG_SPACE(sizeof(struct in_pktinfo)),	\
-	    CMSG_SPACE(sizeof(struct in6_pktinfo)))
-
 /**
  * udp_peek_addr() - Get source address for next packet
  * @s:		Socket to get information from
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/7] udp: Be quieter about errors on UDP receive
  2025-04-15  7:16 [PATCH 0/7] Assorted fixes for UDP socket and error handling problems David Gibson
  2025-04-15  7:16 ` [PATCH 1/7] udp: Fix breakage of UDP error handling by PKTINFO support David Gibson
@ 2025-04-15  7:16 ` David Gibson
  2025-04-15  7:16 ` [PATCH 3/7] udp: Pass socket & flow information direction to error handling functions David Gibson
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: David Gibson @ 2025-04-15  7:16 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: Jon Maloy, David Gibson

If we get an error on UDP receive, either in udp_peek_addr() or
udp_sock_recv(), we'll print an error message.  However, this could be
a perfectly routine UDP error triggered by an ICMP, which need not go to
the error log.

This doesn't usually happen, because before receiving we typically clear
the error queue from udp_sock_errs().  However, it's possible an error
could be flagged after udp_sock_errs() but before we receive.  So it's
better to handle this error "silently" (trace level only).  We'll bail out
of the receive, return to the epoll loop, and get an EPOLLERR where we'll
handle and report the error properly.

In particular there's one situation that can trigger this case much more
easily.  If we start a new outbound UDP flow to a local destination with
nothing listening, we'll get a more or less immediate connection refused
error.  So, we'll get that error on the very first receive after the
connect().  That will occur in udp_flow_defer() -> udp_flush_flow() ->
udp_sock_fwd() -> udp_peek_addr() -> recvmsg().  This path doesn't call
udp_sock_errs() first, so isn't (imperfectly) protected the way we are
most of the time.

Fixes: 84ab1305faba ("udp: Polish udp_vu_sock_info() and remove from...")
Fixes: 69e5393c3722 ("udp: Move some more of sock_handler tasks into...")

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 udp.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/udp.c b/udp.c
index f5fb98c2..154f99b5 100644
--- a/udp.c
+++ b/udp.c
@@ -619,8 +619,8 @@ static int udp_peek_addr(int s, union sockaddr_inany *src,
 
 	rc = recvmsg(s, &msg, MSG_PEEK | MSG_DONTWAIT);
 	if (rc < 0) {
-		if (errno != EAGAIN && errno != EWOULDBLOCK)
-			warn_perror("Error peeking at socket address");
+		trace("Error peeking at socket address: %s", strerror_(errno));
+		/* Bail out and let the EPOLLERR handler deal with it */
 		return rc;
 	}
 
@@ -664,7 +664,8 @@ static int udp_sock_recv(const struct ctx *c, int s, struct mmsghdr *mmh, int n)
 
 	n = recvmmsg(s, mmh, n, 0, NULL);
 	if (n < 0) {
-		err_perror("Error receiving datagrams");
+		trace("Error receiving datagrams: %s", strerror_(errno));
+		/* Bail out and let the EPOLLERR handler deal with it */
 		return 0;
 	}
 
-- 
@@ -619,8 +619,8 @@ static int udp_peek_addr(int s, union sockaddr_inany *src,
 
 	rc = recvmsg(s, &msg, MSG_PEEK | MSG_DONTWAIT);
 	if (rc < 0) {
-		if (errno != EAGAIN && errno != EWOULDBLOCK)
-			warn_perror("Error peeking at socket address");
+		trace("Error peeking at socket address: %s", strerror_(errno));
+		/* Bail out and let the EPOLLERR handler deal with it */
 		return rc;
 	}
 
@@ -664,7 +664,8 @@ static int udp_sock_recv(const struct ctx *c, int s, struct mmsghdr *mmh, int n)
 
 	n = recvmmsg(s, mmh, n, 0, NULL);
 	if (n < 0) {
-		err_perror("Error receiving datagrams");
+		trace("Error receiving datagrams: %s", strerror_(errno));
+		/* Bail out and let the EPOLLERR handler deal with it */
 		return 0;
 	}
 
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/7] udp: Pass socket & flow information direction to error handling functions
  2025-04-15  7:16 [PATCH 0/7] Assorted fixes for UDP socket and error handling problems David Gibson
  2025-04-15  7:16 ` [PATCH 1/7] udp: Fix breakage of UDP error handling by PKTINFO support David Gibson
  2025-04-15  7:16 ` [PATCH 2/7] udp: Be quieter about errors on UDP receive David Gibson
@ 2025-04-15  7:16 ` David Gibson
  2025-04-15  7:16 ` [PATCH 4/7] udp: Deal with errors as we go in udp_sock_fwd() David Gibson
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: David Gibson @ 2025-04-15  7:16 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: Jon Maloy, David Gibson

udp_sock_recverr() and udp_sock_errs() take an epoll reference from which
they obtain both the socket fd to receive errors from, and - for flow
specific sockets - the flow and side the socket is associated with.

We have some upcoming cases where we want to clear errors when we're not
directly associated with receiving an epoll event, so it's not natural to
have an epoll reference.  Therefore, make these functions take the socket
and flow from explicit parameters.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 udp.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/udp.c b/udp.c
index 154f99b5..c51ac955 100644
--- a/udp.c
+++ b/udp.c
@@ -467,14 +467,15 @@ static void udp_send_tap_icmp6(const struct ctx *c,
 /**
  * udp_sock_recverr() - Receive and clear an error from a socket
  * @c:		Execution context
- * @ref:	epoll reference
+ * @s:		Socket to receive errors from
+ * @sidx:	Flow and side of @s, or FLOW_SIDX_NONE if unknown
  *
  * Return: 1 if error received and processed, 0 if no more errors in queue, < 0
  *         if there was an error reading the queue
  *
  * #syscalls recvmsg
  */
-static int udp_sock_recverr(const struct ctx *c, union epoll_ref ref)
+static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx)
 {
 	struct errhdr {
 		struct sock_extended_err ee;
@@ -484,7 +485,6 @@ static int udp_sock_recverr(const struct ctx *c, union epoll_ref ref)
 	char data[ICMP6_MAX_DLEN];
 	const struct errhdr *eh;
 	struct cmsghdr *hdr;
-	int s = ref.fd;
 	struct iovec iov = {
 		.iov_base = data,
 		.iov_len = sizeof(data)
@@ -525,12 +525,12 @@ static int udp_sock_recverr(const struct ctx *c, union epoll_ref ref)
 	}
 
 	eh = (const struct errhdr *)CMSG_DATA(hdr);
-	if (ref.type == EPOLL_TYPE_UDP) {
-		flow_sidx_t sidx = flow_sidx_opposite(ref.flowside);
-		const struct flowside *toside = flowside_at_sidx(sidx);
+	if (flow_sidx_valid(sidx)) {
+		flow_sidx_t tosidx = flow_sidx_opposite(sidx);
+		const struct flowside *toside = flowside_at_sidx(tosidx);
 		size_t dlen = rc;
 
-		if (pif_is_socket(pif_at_sidx(sidx))) {
+		if (pif_is_socket(pif_at_sidx(tosidx))) {
 			/* XXX Is there any way to propagate ICMPs from socket
 			 * to socket? */
 		} else if (hdr->cmsg_level == IPPROTO_IP) {
@@ -554,21 +554,21 @@ static int udp_sock_recverr(const struct ctx *c, union epoll_ref ref)
 /**
  * udp_sock_errs() - Process errors on a socket
  * @c:		Execution context
- * @ref:	epoll reference
+ * @s:		Socket to receive errors from
+ * @sidx:	Flow and side of @s, or FLOW_SIDX_NONE if unknown
  *
  * Return: Number of errors handled, or < 0 if we have an unrecoverable error
  */
-static int udp_sock_errs(const struct ctx *c, union epoll_ref ref)
+static int udp_sock_errs(const struct ctx *c, int s, flow_sidx_t sidx)
 {
 	unsigned n_err = 0;
 	socklen_t errlen;
-	int s = ref.fd;
 	int rc, err;
 
 	ASSERT(!c->no_udp);
 
 	/* Empty the error queue */
-	while ((rc = udp_sock_recverr(c, ref)) > 0)
+	while ((rc = udp_sock_recverr(c, s, sidx)) > 0)
 		n_err += rc;
 
 	if (rc < 0)
@@ -777,7 +777,7 @@ void udp_listen_sock_handler(const struct ctx *c,
 			     const struct timespec *now)
 {
 	if (events & EPOLLERR) {
-		if (udp_sock_errs(c, ref) < 0) {
+		if (udp_sock_errs(c, ref.fd, FLOW_SIDX_NONE) < 0) {
 			err("UDP: Unrecoverable error on listening socket:"
 			    " (%s port %hu)", pif_name(ref.udp.pif), ref.udp.port);
 			/* FIXME: what now?  close/re-open socket? */
@@ -804,7 +804,7 @@ void udp_sock_handler(const struct ctx *c, union epoll_ref ref,
 	ASSERT(!c->no_udp && uflow);
 
 	if (events & EPOLLERR) {
-		if (udp_sock_errs(c, ref) < 0) {
+		if (udp_sock_errs(c, ref.fd, ref.flowside) < 0) {
 			flow_err(uflow, "Unrecoverable error on flow socket");
 			goto fail;
 		}
-- 
@@ -467,14 +467,15 @@ static void udp_send_tap_icmp6(const struct ctx *c,
 /**
  * udp_sock_recverr() - Receive and clear an error from a socket
  * @c:		Execution context
- * @ref:	epoll reference
+ * @s:		Socket to receive errors from
+ * @sidx:	Flow and side of @s, or FLOW_SIDX_NONE if unknown
  *
  * Return: 1 if error received and processed, 0 if no more errors in queue, < 0
  *         if there was an error reading the queue
  *
  * #syscalls recvmsg
  */
-static int udp_sock_recverr(const struct ctx *c, union epoll_ref ref)
+static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx)
 {
 	struct errhdr {
 		struct sock_extended_err ee;
@@ -484,7 +485,6 @@ static int udp_sock_recverr(const struct ctx *c, union epoll_ref ref)
 	char data[ICMP6_MAX_DLEN];
 	const struct errhdr *eh;
 	struct cmsghdr *hdr;
-	int s = ref.fd;
 	struct iovec iov = {
 		.iov_base = data,
 		.iov_len = sizeof(data)
@@ -525,12 +525,12 @@ static int udp_sock_recverr(const struct ctx *c, union epoll_ref ref)
 	}
 
 	eh = (const struct errhdr *)CMSG_DATA(hdr);
-	if (ref.type == EPOLL_TYPE_UDP) {
-		flow_sidx_t sidx = flow_sidx_opposite(ref.flowside);
-		const struct flowside *toside = flowside_at_sidx(sidx);
+	if (flow_sidx_valid(sidx)) {
+		flow_sidx_t tosidx = flow_sidx_opposite(sidx);
+		const struct flowside *toside = flowside_at_sidx(tosidx);
 		size_t dlen = rc;
 
-		if (pif_is_socket(pif_at_sidx(sidx))) {
+		if (pif_is_socket(pif_at_sidx(tosidx))) {
 			/* XXX Is there any way to propagate ICMPs from socket
 			 * to socket? */
 		} else if (hdr->cmsg_level == IPPROTO_IP) {
@@ -554,21 +554,21 @@ static int udp_sock_recverr(const struct ctx *c, union epoll_ref ref)
 /**
  * udp_sock_errs() - Process errors on a socket
  * @c:		Execution context
- * @ref:	epoll reference
+ * @s:		Socket to receive errors from
+ * @sidx:	Flow and side of @s, or FLOW_SIDX_NONE if unknown
  *
  * Return: Number of errors handled, or < 0 if we have an unrecoverable error
  */
-static int udp_sock_errs(const struct ctx *c, union epoll_ref ref)
+static int udp_sock_errs(const struct ctx *c, int s, flow_sidx_t sidx)
 {
 	unsigned n_err = 0;
 	socklen_t errlen;
-	int s = ref.fd;
 	int rc, err;
 
 	ASSERT(!c->no_udp);
 
 	/* Empty the error queue */
-	while ((rc = udp_sock_recverr(c, ref)) > 0)
+	while ((rc = udp_sock_recverr(c, s, sidx)) > 0)
 		n_err += rc;
 
 	if (rc < 0)
@@ -777,7 +777,7 @@ void udp_listen_sock_handler(const struct ctx *c,
 			     const struct timespec *now)
 {
 	if (events & EPOLLERR) {
-		if (udp_sock_errs(c, ref) < 0) {
+		if (udp_sock_errs(c, ref.fd, FLOW_SIDX_NONE) < 0) {
 			err("UDP: Unrecoverable error on listening socket:"
 			    " (%s port %hu)", pif_name(ref.udp.pif), ref.udp.port);
 			/* FIXME: what now?  close/re-open socket? */
@@ -804,7 +804,7 @@ void udp_sock_handler(const struct ctx *c, union epoll_ref ref,
 	ASSERT(!c->no_udp && uflow);
 
 	if (events & EPOLLERR) {
-		if (udp_sock_errs(c, ref) < 0) {
+		if (udp_sock_errs(c, ref.fd, ref.flowside) < 0) {
 			flow_err(uflow, "Unrecoverable error on flow socket");
 			goto fail;
 		}
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/7] udp: Deal with errors as we go in udp_sock_fwd()
  2025-04-15  7:16 [PATCH 0/7] Assorted fixes for UDP socket and error handling problems David Gibson
                   ` (2 preceding siblings ...)
  2025-04-15  7:16 ` [PATCH 3/7] udp: Pass socket & flow information direction to error handling functions David Gibson
@ 2025-04-15  7:16 ` David Gibson
  2025-04-15  7:16 ` [PATCH 5/7] udp: Add udp_pktinfo() helper David Gibson
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: David Gibson @ 2025-04-15  7:16 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: Jon Maloy, David Gibson

When we get an epoll event on a listening socket, we first deal with any
errors (udp_sock_errs()), then with any received packets (udp_sock_fwd()).
However, it's theoretically possible that new errors could get flagged on
the socket after we call udp_sock_errs(), in which case we could get errors
returned in in udp_sock_fwd() -> udp_peek_addr() -> recvmsg().

In fact, we do deal with this correctly, although the path is somewhat
non-obvious.  The recvmsg() error will cause us to bail out of
udp_sock_fwd(), but the EPOLLERR event will now be flagged, so we'll come
back here next epoll loop and call udp_sock_errs().

Except.. we call udp_sock_fwd() from udp_flush_flow() as well as from
epoll events.  This is to deal with any packets that arrived between bind()
and connect(), and so might not be associated with the socket's intended
flow.  This expects udp_sock_fwd() to flush _all_ queued datagrams, so that
anything received later must be for the correct flow.

At the moment, udp_sock_errs() might fail to flush all datagrams if errors
occur.  In particular this can happen in practice for locally reported
errors which occur immediately after connect() (e.g. connecting to a local
port with nothing listening).

We can deal with the problem case, and also make the flow a little more
natural for the common case by having udp_sock_fwd() call udp_sock_errs()
to handle errors as the occur, rather than trying to deal with all errors
in advance.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 udp.c | 45 ++++++++++++++++++++++++++-------------------
 1 file changed, 26 insertions(+), 19 deletions(-)

diff --git a/udp.c b/udp.c
index c51ac955..0bec499d 100644
--- a/udp.c
+++ b/udp.c
@@ -601,7 +601,7 @@ static int udp_sock_errs(const struct ctx *c, int s, flow_sidx_t sidx)
  * @src:	Socket address (output)
  * @dst:	(Local) destination address (output)
  *
- * Return: 0 on success, -1 otherwise
+ * Return: 0 if no more packets, 1 on success, -ve error code on error
  */
 static int udp_peek_addr(int s, union sockaddr_inany *src,
 			 union inany_addr *dst)
@@ -619,9 +619,9 @@ static int udp_peek_addr(int s, union sockaddr_inany *src,
 
 	rc = recvmsg(s, &msg, MSG_PEEK | MSG_DONTWAIT);
 	if (rc < 0) {
-		trace("Error peeking at socket address: %s", strerror_(errno));
-		/* Bail out and let the EPOLLERR handler deal with it */
-		return rc;
+		if (errno == EAGAIN || errno == EWOULDBLOCK)
+			return 0;
+		return -errno;
 	}
 
 	hdr = CMSG_FIRSTHDR(&msg);
@@ -644,7 +644,7 @@ static int udp_peek_addr(int s, union sockaddr_inany *src,
 	      sockaddr_ntop(src, sastr, sizeof(sastr)),
 	      inany_ntop(dst, dstr, sizeof(dstr)));
 
-	return 0;
+	return 1;
 }
 
 /**
@@ -740,11 +740,27 @@ void udp_sock_fwd(const struct ctx *c, int s, uint8_t frompif,
 {
 	union sockaddr_inany src;
 	union inany_addr dst;
+	int rc;
 
-	while (udp_peek_addr(s, &src, &dst) == 0) {
-		flow_sidx_t tosidx = udp_flow_from_sock(c, frompif,
-							&dst, port, &src, now);
-		uint8_t topif = pif_at_sidx(tosidx);
+	while ((rc = udp_peek_addr(s, &src, &dst)) != 0) {
+		flow_sidx_t tosidx;
+		uint8_t topif;
+
+		if (rc < 0) {
+			trace("Error peeking at socket address: %s",
+			      strerror_(-rc));
+			/* Clear errors & carry on */
+			if (udp_sock_errs(c, s, FLOW_SIDX_NONE) < 0) {
+				err(
+"UDP: Unrecoverable error on listening socket: (%s port %hu)",
+				    pif_name(frompif), port);
+				/* FIXME: what now?  close/re-open socket? */
+			}
+			continue;
+		}
+
+		tosidx = udp_flow_from_sock(c, frompif, &dst, port, &src, now);
+		topif = pif_at_sidx(tosidx);
 
 		if (pif_is_socket(topif)) {
 			udp_sock_to_sock(c, s, 1, tosidx);
@@ -776,16 +792,7 @@ void udp_listen_sock_handler(const struct ctx *c,
 			     union epoll_ref ref, uint32_t events,
 			     const struct timespec *now)
 {
-	if (events & EPOLLERR) {
-		if (udp_sock_errs(c, ref.fd, FLOW_SIDX_NONE) < 0) {
-			err("UDP: Unrecoverable error on listening socket:"
-			    " (%s port %hu)", pif_name(ref.udp.pif), ref.udp.port);
-			/* FIXME: what now?  close/re-open socket? */
-			return;
-		}
-	}
-
-	if (events & EPOLLIN)
+	if (events & (EPOLLERR | EPOLLIN))
 		udp_sock_fwd(c, ref.fd, ref.udp.pif, ref.udp.port, now);
 }
 
-- 
@@ -601,7 +601,7 @@ static int udp_sock_errs(const struct ctx *c, int s, flow_sidx_t sidx)
  * @src:	Socket address (output)
  * @dst:	(Local) destination address (output)
  *
- * Return: 0 on success, -1 otherwise
+ * Return: 0 if no more packets, 1 on success, -ve error code on error
  */
 static int udp_peek_addr(int s, union sockaddr_inany *src,
 			 union inany_addr *dst)
@@ -619,9 +619,9 @@ static int udp_peek_addr(int s, union sockaddr_inany *src,
 
 	rc = recvmsg(s, &msg, MSG_PEEK | MSG_DONTWAIT);
 	if (rc < 0) {
-		trace("Error peeking at socket address: %s", strerror_(errno));
-		/* Bail out and let the EPOLLERR handler deal with it */
-		return rc;
+		if (errno == EAGAIN || errno == EWOULDBLOCK)
+			return 0;
+		return -errno;
 	}
 
 	hdr = CMSG_FIRSTHDR(&msg);
@@ -644,7 +644,7 @@ static int udp_peek_addr(int s, union sockaddr_inany *src,
 	      sockaddr_ntop(src, sastr, sizeof(sastr)),
 	      inany_ntop(dst, dstr, sizeof(dstr)));
 
-	return 0;
+	return 1;
 }
 
 /**
@@ -740,11 +740,27 @@ void udp_sock_fwd(const struct ctx *c, int s, uint8_t frompif,
 {
 	union sockaddr_inany src;
 	union inany_addr dst;
+	int rc;
 
-	while (udp_peek_addr(s, &src, &dst) == 0) {
-		flow_sidx_t tosidx = udp_flow_from_sock(c, frompif,
-							&dst, port, &src, now);
-		uint8_t topif = pif_at_sidx(tosidx);
+	while ((rc = udp_peek_addr(s, &src, &dst)) != 0) {
+		flow_sidx_t tosidx;
+		uint8_t topif;
+
+		if (rc < 0) {
+			trace("Error peeking at socket address: %s",
+			      strerror_(-rc));
+			/* Clear errors & carry on */
+			if (udp_sock_errs(c, s, FLOW_SIDX_NONE) < 0) {
+				err(
+"UDP: Unrecoverable error on listening socket: (%s port %hu)",
+				    pif_name(frompif), port);
+				/* FIXME: what now?  close/re-open socket? */
+			}
+			continue;
+		}
+
+		tosidx = udp_flow_from_sock(c, frompif, &dst, port, &src, now);
+		topif = pif_at_sidx(tosidx);
 
 		if (pif_is_socket(topif)) {
 			udp_sock_to_sock(c, s, 1, tosidx);
@@ -776,16 +792,7 @@ void udp_listen_sock_handler(const struct ctx *c,
 			     union epoll_ref ref, uint32_t events,
 			     const struct timespec *now)
 {
-	if (events & EPOLLERR) {
-		if (udp_sock_errs(c, ref.fd, FLOW_SIDX_NONE) < 0) {
-			err("UDP: Unrecoverable error on listening socket:"
-			    " (%s port %hu)", pif_name(ref.udp.pif), ref.udp.port);
-			/* FIXME: what now?  close/re-open socket? */
-			return;
-		}
-	}
-
-	if (events & EPOLLIN)
+	if (events & (EPOLLERR | EPOLLIN))
 		udp_sock_fwd(c, ref.fd, ref.udp.pif, ref.udp.port, now);
 }
 
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 5/7] udp: Add udp_pktinfo() helper
  2025-04-15  7:16 [PATCH 0/7] Assorted fixes for UDP socket and error handling problems David Gibson
                   ` (3 preceding siblings ...)
  2025-04-15  7:16 ` [PATCH 4/7] udp: Deal with errors as we go in udp_sock_fwd() David Gibson
@ 2025-04-15  7:16 ` David Gibson
  2025-04-15 18:54   ` Stefano Brivio
  2025-04-15  7:16 ` [PATCH 6/7] udp: Minor re-organisation of udp_sock_recverr() David Gibson
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 13+ messages in thread
From: David Gibson @ 2025-04-15  7:16 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: Jon Maloy, David Gibson

Currently we open code parsing the control message for IP_PKTINFO in
udp_peek_addr().  We have an upcoming case where we want to parse PKTINFO
in another place, so split this out into a helper function.

While we're there, make the parsing a bit more robust: scan all cmsgs to
look for the one we want, rather than assuming there's only one.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 udp.c | 52 ++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 36 insertions(+), 16 deletions(-)

diff --git a/udp.c b/udp.c
index 0bec499d..352ab83b 100644
--- a/udp.c
+++ b/udp.c
@@ -464,6 +464,41 @@ static void udp_send_tap_icmp6(const struct ctx *c,
 	tap_icmp6_send(c, saddr, eaddr, &msg, msglen);
 }
 
+/**
+ * udp_pktinfo() - Retreive packet destination address from cmsg
+ * @msg:	msghdr into which message has been received
+ * @dst:	(Local) destination address of message in @mh (output)
+ *
+ * Return: 0 on success, -1 if the information was missing (@dst is set to
+ *         inany_any6).
+ */
+static int udp_pktinfo(struct msghdr *msg, union inany_addr *dst)
+{
+	struct cmsghdr *hdr;
+
+	for (hdr = CMSG_FIRSTHDR(msg); hdr; hdr = CMSG_NXTHDR(msg, hdr)) {
+		if (hdr->cmsg_level == IPPROTO_IP &&
+		    hdr->cmsg_type == IP_PKTINFO) {
+			const struct in_pktinfo *i4 = (void *)CMSG_DATA(hdr);
+
+			*dst = inany_from_v4(i4->ipi_addr);
+			return 0;
+		}
+
+		if (hdr->cmsg_level == IPPROTO_IPV6 &&
+			   hdr->cmsg_type == IPV6_PKTINFO) {
+			const struct in6_pktinfo *i6 = (void *)CMSG_DATA(hdr);
+
+			dst->a6 = i6->ipi6_addr;
+			return 0;
+		}
+	}
+
+	err("Missing PKTINFO cmsg on datagram");
+	*dst = inany_any6;
+	return -1;
+}
+
 /**
  * udp_sock_recverr() - Receive and clear an error from a socket
  * @c:		Execution context
@@ -607,7 +642,6 @@ static int udp_peek_addr(int s, union sockaddr_inany *src,
 			 union inany_addr *dst)
 {
 	char sastr[SOCKADDR_STRLEN], dstr[INANY_ADDRSTRLEN];
-	const struct cmsghdr *hdr;
 	char cmsg[PKTINFO_SPACE];
 	struct msghdr msg = {
 		.msg_name = src,
@@ -624,21 +658,7 @@ static int udp_peek_addr(int s, union sockaddr_inany *src,
 		return -errno;
 	}
 
-	hdr = CMSG_FIRSTHDR(&msg);
-	if (hdr && hdr->cmsg_level == IPPROTO_IP &&
-	    hdr->cmsg_type == IP_PKTINFO) {
-		const struct in_pktinfo *info4 = (void *)CMSG_DATA(hdr);
-
-		*dst = inany_from_v4(info4->ipi_addr);
-	} else if (hdr && hdr->cmsg_level == IPPROTO_IPV6 &&
-		   hdr->cmsg_type == IPV6_PKTINFO) {
-		const struct in6_pktinfo *info6 = (void *)CMSG_DATA(hdr);
-
-		dst->a6 = info6->ipi6_addr;
-	} else {
-		debug("Unexpected cmsg on UDP datagram");
-		*dst = inany_any6;
-	}
+	udp_pktinfo(&msg, dst);
 
 	trace("Peeked UDP datagram: %s -> %s",
 	      sockaddr_ntop(src, sastr, sizeof(sastr)),
-- 
@@ -464,6 +464,41 @@ static void udp_send_tap_icmp6(const struct ctx *c,
 	tap_icmp6_send(c, saddr, eaddr, &msg, msglen);
 }
 
+/**
+ * udp_pktinfo() - Retreive packet destination address from cmsg
+ * @msg:	msghdr into which message has been received
+ * @dst:	(Local) destination address of message in @mh (output)
+ *
+ * Return: 0 on success, -1 if the information was missing (@dst is set to
+ *         inany_any6).
+ */
+static int udp_pktinfo(struct msghdr *msg, union inany_addr *dst)
+{
+	struct cmsghdr *hdr;
+
+	for (hdr = CMSG_FIRSTHDR(msg); hdr; hdr = CMSG_NXTHDR(msg, hdr)) {
+		if (hdr->cmsg_level == IPPROTO_IP &&
+		    hdr->cmsg_type == IP_PKTINFO) {
+			const struct in_pktinfo *i4 = (void *)CMSG_DATA(hdr);
+
+			*dst = inany_from_v4(i4->ipi_addr);
+			return 0;
+		}
+
+		if (hdr->cmsg_level == IPPROTO_IPV6 &&
+			   hdr->cmsg_type == IPV6_PKTINFO) {
+			const struct in6_pktinfo *i6 = (void *)CMSG_DATA(hdr);
+
+			dst->a6 = i6->ipi6_addr;
+			return 0;
+		}
+	}
+
+	err("Missing PKTINFO cmsg on datagram");
+	*dst = inany_any6;
+	return -1;
+}
+
 /**
  * udp_sock_recverr() - Receive and clear an error from a socket
  * @c:		Execution context
@@ -607,7 +642,6 @@ static int udp_peek_addr(int s, union sockaddr_inany *src,
 			 union inany_addr *dst)
 {
 	char sastr[SOCKADDR_STRLEN], dstr[INANY_ADDRSTRLEN];
-	const struct cmsghdr *hdr;
 	char cmsg[PKTINFO_SPACE];
 	struct msghdr msg = {
 		.msg_name = src,
@@ -624,21 +658,7 @@ static int udp_peek_addr(int s, union sockaddr_inany *src,
 		return -errno;
 	}
 
-	hdr = CMSG_FIRSTHDR(&msg);
-	if (hdr && hdr->cmsg_level == IPPROTO_IP &&
-	    hdr->cmsg_type == IP_PKTINFO) {
-		const struct in_pktinfo *info4 = (void *)CMSG_DATA(hdr);
-
-		*dst = inany_from_v4(info4->ipi_addr);
-	} else if (hdr && hdr->cmsg_level == IPPROTO_IPV6 &&
-		   hdr->cmsg_type == IPV6_PKTINFO) {
-		const struct in6_pktinfo *info6 = (void *)CMSG_DATA(hdr);
-
-		dst->a6 = info6->ipi6_addr;
-	} else {
-		debug("Unexpected cmsg on UDP datagram");
-		*dst = inany_any6;
-	}
+	udp_pktinfo(&msg, dst);
 
 	trace("Peeked UDP datagram: %s -> %s",
 	      sockaddr_ntop(src, sastr, sizeof(sastr)),
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 6/7] udp: Minor re-organisation of udp_sock_recverr()
  2025-04-15  7:16 [PATCH 0/7] Assorted fixes for UDP socket and error handling problems David Gibson
                   ` (4 preceding siblings ...)
  2025-04-15  7:16 ` [PATCH 5/7] udp: Add udp_pktinfo() helper David Gibson
@ 2025-04-15  7:16 ` David Gibson
  2025-04-15  7:16 ` [PATCH 7/7] udp: Propagate errors on listening and brand new sockets David Gibson
  2025-04-15 19:10 ` [PATCH 0/7] Assorted fixes for UDP socket and error handling problems Stefano Brivio
  7 siblings, 0 replies; 13+ messages in thread
From: David Gibson @ 2025-04-15  7:16 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: Jon Maloy, David Gibson

Usually we work with the "exit early" flow style, where we return early
on "error" conditions in functions.  We don't currently do this in
udp_sock_recverr() for the case where we don't have a flow to associate
the error with.

Reorganise to use the "exit early" style, which will make some subsequent
changes less awkward.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 udp.c | 46 ++++++++++++++++++++++++++--------------------
 1 file changed, 26 insertions(+), 20 deletions(-)

diff --git a/udp.c b/udp.c
index 352ab83b..6db3accc 100644
--- a/udp.c
+++ b/udp.c
@@ -530,6 +530,9 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx)
 		.msg_control = buf,
 		.msg_controllen = sizeof(buf),
 	};
+	const struct flowside *toside;
+	flow_sidx_t tosidx;
+	size_t dlen;
 	ssize_t rc;
 
 	rc = recvmsg(s, &mh, MSG_ERRQUEUE);
@@ -560,29 +563,32 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx)
 	}
 
 	eh = (const struct errhdr *)CMSG_DATA(hdr);
-	if (flow_sidx_valid(sidx)) {
-		flow_sidx_t tosidx = flow_sidx_opposite(sidx);
-		const struct flowside *toside = flowside_at_sidx(tosidx);
-		size_t dlen = rc;
-
-		if (pif_is_socket(pif_at_sidx(tosidx))) {
-			/* XXX Is there any way to propagate ICMPs from socket
-			 * to socket? */
-		} else if (hdr->cmsg_level == IPPROTO_IP) {
-			dlen = MIN(dlen, ICMP4_MAX_DLEN);
-			udp_send_tap_icmp4(c, &eh->ee, toside,
-					   eh->saddr.sa4.sin_addr, data, dlen);
-		} else if (hdr->cmsg_level == IPPROTO_IPV6) {
-			udp_send_tap_icmp6(c, &eh->ee, toside,
-					   &eh->saddr.sa6.sin6_addr, data,
-					   dlen, sidx.flowi);
-		}
-	} else {
-		trace("Ignoring received IP_RECVERR cmsg on listener socket");
-	}
+
 	debug("%s error on UDP socket %i: %s",
 	      str_ee_origin(&eh->ee), s, strerror_(eh->ee.ee_errno));
 
+	if (!flow_sidx_valid(sidx)) {
+		trace("Ignoring received IP_RECVERR cmsg on listener socket");
+		return 1;
+	}
+
+	tosidx = flow_sidx_opposite(sidx);
+	toside = flowside_at_sidx(tosidx);
+	dlen = rc;
+
+	if (pif_is_socket(pif_at_sidx(tosidx))) {
+		/* XXX Is there any way to propagate ICMPs from socket to
+		 * socket? */
+	} else if (hdr->cmsg_level == IPPROTO_IP) {
+		dlen = MIN(dlen, ICMP4_MAX_DLEN);
+		udp_send_tap_icmp4(c, &eh->ee, toside,
+				   eh->saddr.sa4.sin_addr, data, dlen);
+	} else if (hdr->cmsg_level == IPPROTO_IPV6) {
+		udp_send_tap_icmp6(c, &eh->ee, toside,
+				   &eh->saddr.sa6.sin6_addr, data,
+				   dlen, sidx.flowi);
+	}
+
 	return 1;
 }
 
-- 
@@ -530,6 +530,9 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx)
 		.msg_control = buf,
 		.msg_controllen = sizeof(buf),
 	};
+	const struct flowside *toside;
+	flow_sidx_t tosidx;
+	size_t dlen;
 	ssize_t rc;
 
 	rc = recvmsg(s, &mh, MSG_ERRQUEUE);
@@ -560,29 +563,32 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx)
 	}
 
 	eh = (const struct errhdr *)CMSG_DATA(hdr);
-	if (flow_sidx_valid(sidx)) {
-		flow_sidx_t tosidx = flow_sidx_opposite(sidx);
-		const struct flowside *toside = flowside_at_sidx(tosidx);
-		size_t dlen = rc;
-
-		if (pif_is_socket(pif_at_sidx(tosidx))) {
-			/* XXX Is there any way to propagate ICMPs from socket
-			 * to socket? */
-		} else if (hdr->cmsg_level == IPPROTO_IP) {
-			dlen = MIN(dlen, ICMP4_MAX_DLEN);
-			udp_send_tap_icmp4(c, &eh->ee, toside,
-					   eh->saddr.sa4.sin_addr, data, dlen);
-		} else if (hdr->cmsg_level == IPPROTO_IPV6) {
-			udp_send_tap_icmp6(c, &eh->ee, toside,
-					   &eh->saddr.sa6.sin6_addr, data,
-					   dlen, sidx.flowi);
-		}
-	} else {
-		trace("Ignoring received IP_RECVERR cmsg on listener socket");
-	}
+
 	debug("%s error on UDP socket %i: %s",
 	      str_ee_origin(&eh->ee), s, strerror_(eh->ee.ee_errno));
 
+	if (!flow_sidx_valid(sidx)) {
+		trace("Ignoring received IP_RECVERR cmsg on listener socket");
+		return 1;
+	}
+
+	tosidx = flow_sidx_opposite(sidx);
+	toside = flowside_at_sidx(tosidx);
+	dlen = rc;
+
+	if (pif_is_socket(pif_at_sidx(tosidx))) {
+		/* XXX Is there any way to propagate ICMPs from socket to
+		 * socket? */
+	} else if (hdr->cmsg_level == IPPROTO_IP) {
+		dlen = MIN(dlen, ICMP4_MAX_DLEN);
+		udp_send_tap_icmp4(c, &eh->ee, toside,
+				   eh->saddr.sa4.sin_addr, data, dlen);
+	} else if (hdr->cmsg_level == IPPROTO_IPV6) {
+		udp_send_tap_icmp6(c, &eh->ee, toside,
+				   &eh->saddr.sa6.sin6_addr, data,
+				   dlen, sidx.flowi);
+	}
+
 	return 1;
 }
 
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 7/7] udp: Propagate errors on listening and brand new sockets
  2025-04-15  7:16 [PATCH 0/7] Assorted fixes for UDP socket and error handling problems David Gibson
                   ` (5 preceding siblings ...)
  2025-04-15  7:16 ` [PATCH 6/7] udp: Minor re-organisation of udp_sock_recverr() David Gibson
@ 2025-04-15  7:16 ` David Gibson
  2025-04-15 18:54   ` Stefano Brivio
  2025-04-15 19:10 ` [PATCH 0/7] Assorted fixes for UDP socket and error handling problems Stefano Brivio
  7 siblings, 1 reply; 13+ messages in thread
From: David Gibson @ 2025-04-15  7:16 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: Jon Maloy, David Gibson

udp_sock_recverr() processes errors on UDP sockets and attempts to
propagate them as ICMP packets on the tap interface.  To do this it
currently requires the flow with which the error is associated as a
parameter.  If that's missing it will clear the error condition, but not
propagate it.

That means that we largely ignore errors on "listening" sockets.  It also
means we may discard some errors on flow specific sockets if they occur
very shortly after the socket is created.  In udp_flush_flow() we need to
clear any datagrams received between bind() and connect() which might not
be associated with the "final" flow for the socket.  If we get errors
before that point we'll ignore them in the same way because we don't know
the flow they're associated with in advance.

This can happen in practice if we have errors which occur almost
immediately after connect(), such as ECONNREFUSED when we connect() to a
local address where nothing is listening.

Between the extended error message itself and the PKTINFO information we
do actually have enough information to find the correct flow.  So, rather
than ignoring errors where we don't have a flow "hint", determine the flow
the hard way in udp_sock_recverr().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 udp.c | 41 ++++++++++++++++++++++++++++++++---------
 1 file changed, 32 insertions(+), 9 deletions(-)

diff --git a/udp.c b/udp.c
index 6db3accc..c04a091b 100644
--- a/udp.c
+++ b/udp.c
@@ -504,27 +504,34 @@ static int udp_pktinfo(struct msghdr *msg, union inany_addr *dst)
  * @c:		Execution context
  * @s:		Socket to receive errors from
  * @sidx:	Flow and side of @s, or FLOW_SIDX_NONE if unknown
+ * @pif:	Interface on which the error occurred
+ *              (only used if @sidx == FLOW_SIDX_NONE)
+ * @port:	Local port number of @s (only used if @sidx == FLOW_SIDX_NONE)
  *
  * Return: 1 if error received and processed, 0 if no more errors in queue, < 0
  *         if there was an error reading the queue
  *
  * #syscalls recvmsg
  */
-static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx)
+static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx,
+			    uint8_t pif, in_port_t port)
 {
 	struct errhdr {
 		struct sock_extended_err ee;
 		union sockaddr_inany saddr;
 	};
 	char buf[PKTINFO_SPACE + CMSG_SPACE(sizeof(struct errhdr))];
+	const struct errhdr *eh = NULL;
 	char data[ICMP6_MAX_DLEN];
-	const struct errhdr *eh;
 	struct cmsghdr *hdr;
 	struct iovec iov = {
 		.iov_base = data,
 		.iov_len = sizeof(data)
 	};
+	union sockaddr_inany src;
 	struct msghdr mh = {
+		.msg_name = &src,
+		.msg_namelen = sizeof(src),
 		.msg_iov = &iov,
 		.msg_iovlen = 1,
 		.msg_control = buf,
@@ -554,7 +561,7 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx)
 		      hdr->cmsg_type == IP_RECVERR) ||
 		     (hdr->cmsg_level == IPPROTO_IPV6 &&
 		      hdr->cmsg_type == IPV6_RECVERR))
-		    break;
+			break;
 	}
 
 	if (!hdr) {
@@ -568,8 +575,19 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx)
 	      str_ee_origin(&eh->ee), s, strerror_(eh->ee.ee_errno));
 
 	if (!flow_sidx_valid(sidx)) {
-		trace("Ignoring received IP_RECVERR cmsg on listener socket");
-		return 1;
+		/* No hint from the socket, determine flow from addresses */
+		union inany_addr dst;
+
+		if (udp_pktinfo(&mh, &dst) < 0) {
+			warn("Missing PKTINFO on UDP error");
+			return 1;
+		}
+
+		sidx = flow_lookup_sa(c, IPPROTO_UDP, pif, &src, &dst, port);
+		if (!flow_sidx_valid(sidx)) {
+			debug("Ignoring UDP error without flow");
+			return 1;
+		}
 	}
 
 	tosidx = flow_sidx_opposite(sidx);
@@ -597,10 +615,14 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx)
  * @c:		Execution context
  * @s:		Socket to receive errors from
  * @sidx:	Flow and side of @s, or FLOW_SIDX_NONE if unknown
+ * @pif:	Interface on which the error occurred
+ *              (only used if @sidx == FLOW_SIDX_NONE)
+ * @port:	Local port number of @s (only used if @sidx == FLOW_SIDX_NONE)
  *
  * Return: Number of errors handled, or < 0 if we have an unrecoverable error
  */
-static int udp_sock_errs(const struct ctx *c, int s, flow_sidx_t sidx)
+static int udp_sock_errs(const struct ctx *c, int s, flow_sidx_t sidx,
+			 uint8_t pif, in_port_t port)
 {
 	unsigned n_err = 0;
 	socklen_t errlen;
@@ -609,7 +631,7 @@ static int udp_sock_errs(const struct ctx *c, int s, flow_sidx_t sidx)
 	ASSERT(!c->no_udp);
 
 	/* Empty the error queue */
-	while ((rc = udp_sock_recverr(c, s, sidx)) > 0)
+	while ((rc = udp_sock_recverr(c, s, sidx, pif, port)) > 0)
 		n_err += rc;
 
 	if (rc < 0)
@@ -776,7 +798,8 @@ void udp_sock_fwd(const struct ctx *c, int s, uint8_t frompif,
 			trace("Error peeking at socket address: %s",
 			      strerror_(-rc));
 			/* Clear errors & carry on */
-			if (udp_sock_errs(c, s, FLOW_SIDX_NONE) < 0) {
+			if (udp_sock_errs(c, s, FLOW_SIDX_NONE,
+					  frompif, port) < 0) {
 				err(
 "UDP: Unrecoverable error on listening socket: (%s port %hu)",
 				    pif_name(frompif), port);
@@ -837,7 +860,7 @@ void udp_sock_handler(const struct ctx *c, union epoll_ref ref,
 	ASSERT(!c->no_udp && uflow);
 
 	if (events & EPOLLERR) {
-		if (udp_sock_errs(c, ref.fd, ref.flowside) < 0) {
+		if (udp_sock_errs(c, ref.fd, ref.flowside, PIF_NONE, 0) < 0) {
 			flow_err(uflow, "Unrecoverable error on flow socket");
 			goto fail;
 		}
-- 
@@ -504,27 +504,34 @@ static int udp_pktinfo(struct msghdr *msg, union inany_addr *dst)
  * @c:		Execution context
  * @s:		Socket to receive errors from
  * @sidx:	Flow and side of @s, or FLOW_SIDX_NONE if unknown
+ * @pif:	Interface on which the error occurred
+ *              (only used if @sidx == FLOW_SIDX_NONE)
+ * @port:	Local port number of @s (only used if @sidx == FLOW_SIDX_NONE)
  *
  * Return: 1 if error received and processed, 0 if no more errors in queue, < 0
  *         if there was an error reading the queue
  *
  * #syscalls recvmsg
  */
-static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx)
+static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx,
+			    uint8_t pif, in_port_t port)
 {
 	struct errhdr {
 		struct sock_extended_err ee;
 		union sockaddr_inany saddr;
 	};
 	char buf[PKTINFO_SPACE + CMSG_SPACE(sizeof(struct errhdr))];
+	const struct errhdr *eh = NULL;
 	char data[ICMP6_MAX_DLEN];
-	const struct errhdr *eh;
 	struct cmsghdr *hdr;
 	struct iovec iov = {
 		.iov_base = data,
 		.iov_len = sizeof(data)
 	};
+	union sockaddr_inany src;
 	struct msghdr mh = {
+		.msg_name = &src,
+		.msg_namelen = sizeof(src),
 		.msg_iov = &iov,
 		.msg_iovlen = 1,
 		.msg_control = buf,
@@ -554,7 +561,7 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx)
 		      hdr->cmsg_type == IP_RECVERR) ||
 		     (hdr->cmsg_level == IPPROTO_IPV6 &&
 		      hdr->cmsg_type == IPV6_RECVERR))
-		    break;
+			break;
 	}
 
 	if (!hdr) {
@@ -568,8 +575,19 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx)
 	      str_ee_origin(&eh->ee), s, strerror_(eh->ee.ee_errno));
 
 	if (!flow_sidx_valid(sidx)) {
-		trace("Ignoring received IP_RECVERR cmsg on listener socket");
-		return 1;
+		/* No hint from the socket, determine flow from addresses */
+		union inany_addr dst;
+
+		if (udp_pktinfo(&mh, &dst) < 0) {
+			warn("Missing PKTINFO on UDP error");
+			return 1;
+		}
+
+		sidx = flow_lookup_sa(c, IPPROTO_UDP, pif, &src, &dst, port);
+		if (!flow_sidx_valid(sidx)) {
+			debug("Ignoring UDP error without flow");
+			return 1;
+		}
 	}
 
 	tosidx = flow_sidx_opposite(sidx);
@@ -597,10 +615,14 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx)
  * @c:		Execution context
  * @s:		Socket to receive errors from
  * @sidx:	Flow and side of @s, or FLOW_SIDX_NONE if unknown
+ * @pif:	Interface on which the error occurred
+ *              (only used if @sidx == FLOW_SIDX_NONE)
+ * @port:	Local port number of @s (only used if @sidx == FLOW_SIDX_NONE)
  *
  * Return: Number of errors handled, or < 0 if we have an unrecoverable error
  */
-static int udp_sock_errs(const struct ctx *c, int s, flow_sidx_t sidx)
+static int udp_sock_errs(const struct ctx *c, int s, flow_sidx_t sidx,
+			 uint8_t pif, in_port_t port)
 {
 	unsigned n_err = 0;
 	socklen_t errlen;
@@ -609,7 +631,7 @@ static int udp_sock_errs(const struct ctx *c, int s, flow_sidx_t sidx)
 	ASSERT(!c->no_udp);
 
 	/* Empty the error queue */
-	while ((rc = udp_sock_recverr(c, s, sidx)) > 0)
+	while ((rc = udp_sock_recverr(c, s, sidx, pif, port)) > 0)
 		n_err += rc;
 
 	if (rc < 0)
@@ -776,7 +798,8 @@ void udp_sock_fwd(const struct ctx *c, int s, uint8_t frompif,
 			trace("Error peeking at socket address: %s",
 			      strerror_(-rc));
 			/* Clear errors & carry on */
-			if (udp_sock_errs(c, s, FLOW_SIDX_NONE) < 0) {
+			if (udp_sock_errs(c, s, FLOW_SIDX_NONE,
+					  frompif, port) < 0) {
 				err(
 "UDP: Unrecoverable error on listening socket: (%s port %hu)",
 				    pif_name(frompif), port);
@@ -837,7 +860,7 @@ void udp_sock_handler(const struct ctx *c, union epoll_ref ref,
 	ASSERT(!c->no_udp && uflow);
 
 	if (events & EPOLLERR) {
-		if (udp_sock_errs(c, ref.fd, ref.flowside) < 0) {
+		if (udp_sock_errs(c, ref.fd, ref.flowside, PIF_NONE, 0) < 0) {
 			flow_err(uflow, "Unrecoverable error on flow socket");
 			goto fail;
 		}
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 5/7] udp: Add udp_pktinfo() helper
  2025-04-15  7:16 ` [PATCH 5/7] udp: Add udp_pktinfo() helper David Gibson
@ 2025-04-15 18:54   ` Stefano Brivio
  2025-04-16  0:37     ` David Gibson
  0 siblings, 1 reply; 13+ messages in thread
From: Stefano Brivio @ 2025-04-15 18:54 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev, Jon Maloy

I took the liberty of applying this with two minor changes:

On Tue, 15 Apr 2025 17:16:22 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> Currently we open code parsing the control message for IP_PKTINFO in
> udp_peek_addr().  We have an upcoming case where we want to parse PKTINFO
> in another place, so split this out into a helper function.
> 
> While we're there, make the parsing a bit more robust: scan all cmsgs to
> look for the one we want, rather than assuming there's only one.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  udp.c | 52 ++++++++++++++++++++++++++++++++++++----------------
>  1 file changed, 36 insertions(+), 16 deletions(-)
> 
> diff --git a/udp.c b/udp.c
> index 0bec499d..352ab83b 100644
> --- a/udp.c
> +++ b/udp.c
> @@ -464,6 +464,41 @@ static void udp_send_tap_icmp6(const struct ctx *c,
>  	tap_icmp6_send(c, saddr, eaddr, &msg, msglen);
>  }
>  
> +/**
> + * udp_pktinfo() - Retreive packet destination address from cmsg

Nit: "Retrieve", and:

> + * @msg:	msghdr into which message has been received
> + * @dst:	(Local) destination address of message in @mh (output)
> + *
> + * Return: 0 on success, -1 if the information was missing (@dst is set to
> + *         inany_any6).
> + */
> +static int udp_pktinfo(struct msghdr *msg, union inany_addr *dst)
> +{
> +	struct cmsghdr *hdr;
> +
> +	for (hdr = CMSG_FIRSTHDR(msg); hdr; hdr = CMSG_NXTHDR(msg, hdr)) {
> +		if (hdr->cmsg_level == IPPROTO_IP &&
> +		    hdr->cmsg_type == IP_PKTINFO) {
> +			const struct in_pktinfo *i4 = (void *)CMSG_DATA(hdr);
> +
> +			*dst = inany_from_v4(i4->ipi_addr);
> +			return 0;
> +		}
> +
> +		if (hdr->cmsg_level == IPPROTO_IPV6 &&
> +			   hdr->cmsg_type == IPV6_PKTINFO) {
> +			const struct in6_pktinfo *i6 = (void *)CMSG_DATA(hdr);
> +
> +			dst->a6 = i6->ipi6_addr;
> +			return 0;
> +		}
> +	}
> +
> +	err("Missing PKTINFO cmsg on datagram");

...from a quick glance at the matching kernel path, this looks a bit
too likely for err(), so I got a bit scared, and turned it to debug().

I think warn() is actually more correct than debug() (I think it
shouldn't really be err() though because we can keep using this flow),
but I would feel a lot more confident about it once we have
rate-limited versions of those.

Of course, I'll change this back to err() or warn() if you have a
compelling reason.

Similar reasoning in 7/7 where this is called.

By the way, two messages (here and in caller) look redundant to me,
regardless of their severity, but I'm not sure if you have further
changes in mind where these separate prints would make sense.

-- 
Stefano


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 7/7] udp: Propagate errors on listening and brand new sockets
  2025-04-15  7:16 ` [PATCH 7/7] udp: Propagate errors on listening and brand new sockets David Gibson
@ 2025-04-15 18:54   ` Stefano Brivio
  2025-04-16  0:38     ` David Gibson
  0 siblings, 1 reply; 13+ messages in thread
From: Stefano Brivio @ 2025-04-15 18:54 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev, Jon Maloy

On Tue, 15 Apr 2025 17:16:24 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> udp_sock_recverr() processes errors on UDP sockets and attempts to
> propagate them as ICMP packets on the tap interface.  To do this it
> currently requires the flow with which the error is associated as a
> parameter.  If that's missing it will clear the error condition, but not
> propagate it.
> 
> That means that we largely ignore errors on "listening" sockets.  It also
> means we may discard some errors on flow specific sockets if they occur
> very shortly after the socket is created.  In udp_flush_flow() we need to
> clear any datagrams received between bind() and connect() which might not
> be associated with the "final" flow for the socket.  If we get errors
> before that point we'll ignore them in the same way because we don't know
> the flow they're associated with in advance.
> 
> This can happen in practice if we have errors which occur almost
> immediately after connect(), such as ECONNREFUSED when we connect() to a
> local address where nothing is listening.
> 
> Between the extended error message itself and the PKTINFO information we
> do actually have enough information to find the correct flow.  So, rather
> than ignoring errors where we don't have a flow "hint", determine the flow
> the hard way in udp_sock_recverr().
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  udp.c | 41 ++++++++++++++++++++++++++++++++---------
>  1 file changed, 32 insertions(+), 9 deletions(-)
> 
> diff --git a/udp.c b/udp.c
> index 6db3accc..c04a091b 100644
> --- a/udp.c
> +++ b/udp.c
> @@ -504,27 +504,34 @@ static int udp_pktinfo(struct msghdr *msg, union inany_addr *dst)
>   * @c:		Execution context
>   * @s:		Socket to receive errors from
>   * @sidx:	Flow and side of @s, or FLOW_SIDX_NONE if unknown
> + * @pif:	Interface on which the error occurred
> + *              (only used if @sidx == FLOW_SIDX_NONE)
> + * @port:	Local port number of @s (only used if @sidx == FLOW_SIDX_NONE)
>   *
>   * Return: 1 if error received and processed, 0 if no more errors in queue, < 0
>   *         if there was an error reading the queue
>   *
>   * #syscalls recvmsg
>   */
> -static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx)
> +static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx,
> +			    uint8_t pif, in_port_t port)
>  {
>  	struct errhdr {
>  		struct sock_extended_err ee;
>  		union sockaddr_inany saddr;
>  	};
>  	char buf[PKTINFO_SPACE + CMSG_SPACE(sizeof(struct errhdr))];
> +	const struct errhdr *eh = NULL;
>  	char data[ICMP6_MAX_DLEN];
> -	const struct errhdr *eh;
>  	struct cmsghdr *hdr;
>  	struct iovec iov = {
>  		.iov_base = data,
>  		.iov_len = sizeof(data)
>  	};
> +	union sockaddr_inany src;
>  	struct msghdr mh = {
> +		.msg_name = &src,
> +		.msg_namelen = sizeof(src),
>  		.msg_iov = &iov,
>  		.msg_iovlen = 1,
>  		.msg_control = buf,
> @@ -554,7 +561,7 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx)
>  		      hdr->cmsg_type == IP_RECVERR) ||
>  		     (hdr->cmsg_level == IPPROTO_IPV6 &&
>  		      hdr->cmsg_type == IPV6_RECVERR))
> -		    break;
> +			break;
>  	}
>  
>  	if (!hdr) {
> @@ -568,8 +575,19 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx)
>  	      str_ee_origin(&eh->ee), s, strerror_(eh->ee.ee_errno));
>  
>  	if (!flow_sidx_valid(sidx)) {
> -		trace("Ignoring received IP_RECVERR cmsg on listener socket");
> -		return 1;
> +		/* No hint from the socket, determine flow from addresses */
> +		union inany_addr dst;
> +
> +		if (udp_pktinfo(&mh, &dst) < 0) {
> +			warn("Missing PKTINFO on UDP error");

...changed this to debug(), see my comments to 5/7.

-- 
Stefano


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/7] Assorted fixes for UDP socket and error handling problems
  2025-04-15  7:16 [PATCH 0/7] Assorted fixes for UDP socket and error handling problems David Gibson
                   ` (6 preceding siblings ...)
  2025-04-15  7:16 ` [PATCH 7/7] udp: Propagate errors on listening and brand new sockets David Gibson
@ 2025-04-15 19:10 ` Stefano Brivio
  7 siblings, 0 replies; 13+ messages in thread
From: Stefano Brivio @ 2025-04-15 19:10 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev, Jon Maloy

On Tue, 15 Apr 2025 17:16:17 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> My recent changes to UDP socket management caused some unfortunate
> side effects.  In particular, it has some bad interactions with UDP
> error handling.  Fix several bugs here, along with some reworks to
> allow that.
> 
> David Gibson (7):
>   udp: Fix breakage of UDP error handling by PKTINFO support
>   udp: Be quieter about errors on UDP receive
>   udp: Pass socket & flow information direction to error handling
>     functions
>   udp: Deal with errors as we go in udp_sock_fwd()
>   udp: Add udp_pktinfo() helper
>   udp: Minor re-organisation of udp_sock_recverr()
>   udp: Propagate errors on listening and brand new sockets

Applied with minor changes as described for 5/7 and 7/7.

-- 
Stefano


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 5/7] udp: Add udp_pktinfo() helper
  2025-04-15 18:54   ` Stefano Brivio
@ 2025-04-16  0:37     ` David Gibson
  0 siblings, 0 replies; 13+ messages in thread
From: David Gibson @ 2025-04-16  0:37 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, Jon Maloy

[-- Attachment #1: Type: text/plain, Size: 3542 bytes --]

On Tue, Apr 15, 2025 at 08:54:00PM +0200, Stefano Brivio wrote:
> I took the liberty of applying this with two minor changes:
> 
> On Tue, 15 Apr 2025 17:16:22 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > Currently we open code parsing the control message for IP_PKTINFO in
> > udp_peek_addr().  We have an upcoming case where we want to parse PKTINFO
> > in another place, so split this out into a helper function.
> > 
> > While we're there, make the parsing a bit more robust: scan all cmsgs to
> > look for the one we want, rather than assuming there's only one.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  udp.c | 52 ++++++++++++++++++++++++++++++++++++----------------
> >  1 file changed, 36 insertions(+), 16 deletions(-)
> > 
> > diff --git a/udp.c b/udp.c
> > index 0bec499d..352ab83b 100644
> > --- a/udp.c
> > +++ b/udp.c
> > @@ -464,6 +464,41 @@ static void udp_send_tap_icmp6(const struct ctx *c,
> >  	tap_icmp6_send(c, saddr, eaddr, &msg, msglen);
> >  }
> >  
> > +/**
> > + * udp_pktinfo() - Retreive packet destination address from cmsg
> 
> Nit: "Retrieve", and:

Oops, thanks.

> > + * @msg:	msghdr into which message has been received
> > + * @dst:	(Local) destination address of message in @mh (output)
> > + *
> > + * Return: 0 on success, -1 if the information was missing (@dst is set to
> > + *         inany_any6).
> > + */
> > +static int udp_pktinfo(struct msghdr *msg, union inany_addr *dst)
> > +{
> > +	struct cmsghdr *hdr;
> > +
> > +	for (hdr = CMSG_FIRSTHDR(msg); hdr; hdr = CMSG_NXTHDR(msg, hdr)) {
> > +		if (hdr->cmsg_level == IPPROTO_IP &&
> > +		    hdr->cmsg_type == IP_PKTINFO) {
> > +			const struct in_pktinfo *i4 = (void *)CMSG_DATA(hdr);
> > +
> > +			*dst = inany_from_v4(i4->ipi_addr);
> > +			return 0;
> > +		}
> > +
> > +		if (hdr->cmsg_level == IPPROTO_IPV6 &&
> > +			   hdr->cmsg_type == IPV6_PKTINFO) {
> > +			const struct in6_pktinfo *i6 = (void *)CMSG_DATA(hdr);
> > +
> > +			dst->a6 = i6->ipi6_addr;
> > +			return 0;
> > +		}
> > +	}
> > +
> > +	err("Missing PKTINFO cmsg on datagram");
> 
> ...from a quick glance at the matching kernel path, this looks a bit
> too likely for err(), so I got a bit scared, and turned it to debug().

Ehh.. fair enough.

> I think warn() is actually more correct than debug() (I think it
> shouldn't really be err() though because we can keep using this flow),

So, technically we don't have a flow at this point: the point here is
that if we don't get the pktinfo we don't have the information we need
to determine the right flow.  But, regardless, you're right, warn()
would be the right level in theory.

> but I would feel a lot more confident about it once we have
> rate-limited versions of those.

Ok.

> Of course, I'll change this back to err() or warn() if you have a
> compelling reason.
> 
> Similar reasoning in 7/7 where this is called.
> 
> By the way, two messages (here and in caller) look redundant to me,
> regardless of their severity, but I'm not sure if you have further
> changes in mind where these separate prints would make sense.

In this patch there aren't two messages, it's only in this function.
There are two when I add the other caller in 7/7 which was an oversight.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 7/7] udp: Propagate errors on listening and brand new sockets
  2025-04-15 18:54   ` Stefano Brivio
@ 2025-04-16  0:38     ` David Gibson
  0 siblings, 0 replies; 13+ messages in thread
From: David Gibson @ 2025-04-16  0:38 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, Jon Maloy

[-- Attachment #1: Type: text/plain, Size: 4464 bytes --]

On Tue, Apr 15, 2025 at 08:54:17PM +0200, Stefano Brivio wrote:
> On Tue, 15 Apr 2025 17:16:24 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > udp_sock_recverr() processes errors on UDP sockets and attempts to
> > propagate them as ICMP packets on the tap interface.  To do this it
> > currently requires the flow with which the error is associated as a
> > parameter.  If that's missing it will clear the error condition, but not
> > propagate it.
> > 
> > That means that we largely ignore errors on "listening" sockets.  It also
> > means we may discard some errors on flow specific sockets if they occur
> > very shortly after the socket is created.  In udp_flush_flow() we need to
> > clear any datagrams received between bind() and connect() which might not
> > be associated with the "final" flow for the socket.  If we get errors
> > before that point we'll ignore them in the same way because we don't know
> > the flow they're associated with in advance.
> > 
> > This can happen in practice if we have errors which occur almost
> > immediately after connect(), such as ECONNREFUSED when we connect() to a
> > local address where nothing is listening.
> > 
> > Between the extended error message itself and the PKTINFO information we
> > do actually have enough information to find the correct flow.  So, rather
> > than ignoring errors where we don't have a flow "hint", determine the flow
> > the hard way in udp_sock_recverr().
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  udp.c | 41 ++++++++++++++++++++++++++++++++---------
> >  1 file changed, 32 insertions(+), 9 deletions(-)
> > 
> > diff --git a/udp.c b/udp.c
> > index 6db3accc..c04a091b 100644
> > --- a/udp.c
> > +++ b/udp.c
> > @@ -504,27 +504,34 @@ static int udp_pktinfo(struct msghdr *msg, union inany_addr *dst)
> >   * @c:		Execution context
> >   * @s:		Socket to receive errors from
> >   * @sidx:	Flow and side of @s, or FLOW_SIDX_NONE if unknown
> > + * @pif:	Interface on which the error occurred
> > + *              (only used if @sidx == FLOW_SIDX_NONE)
> > + * @port:	Local port number of @s (only used if @sidx == FLOW_SIDX_NONE)
> >   *
> >   * Return: 1 if error received and processed, 0 if no more errors in queue, < 0
> >   *         if there was an error reading the queue
> >   *
> >   * #syscalls recvmsg
> >   */
> > -static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx)
> > +static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx,
> > +			    uint8_t pif, in_port_t port)
> >  {
> >  	struct errhdr {
> >  		struct sock_extended_err ee;
> >  		union sockaddr_inany saddr;
> >  	};
> >  	char buf[PKTINFO_SPACE + CMSG_SPACE(sizeof(struct errhdr))];
> > +	const struct errhdr *eh = NULL;
> >  	char data[ICMP6_MAX_DLEN];
> > -	const struct errhdr *eh;
> >  	struct cmsghdr *hdr;
> >  	struct iovec iov = {
> >  		.iov_base = data,
> >  		.iov_len = sizeof(data)
> >  	};
> > +	union sockaddr_inany src;
> >  	struct msghdr mh = {
> > +		.msg_name = &src,
> > +		.msg_namelen = sizeof(src),
> >  		.msg_iov = &iov,
> >  		.msg_iovlen = 1,
> >  		.msg_control = buf,
> > @@ -554,7 +561,7 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx)
> >  		      hdr->cmsg_type == IP_RECVERR) ||
> >  		     (hdr->cmsg_level == IPPROTO_IPV6 &&
> >  		      hdr->cmsg_type == IPV6_RECVERR))
> > -		    break;
> > +			break;
> >  	}
> >  
> >  	if (!hdr) {
> > @@ -568,8 +575,19 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx)
> >  	      str_ee_origin(&eh->ee), s, strerror_(eh->ee.ee_errno));
> >  
> >  	if (!flow_sidx_valid(sidx)) {
> > -		trace("Ignoring received IP_RECVERR cmsg on listener socket");
> > -		return 1;
> > +		/* No hint from the socket, determine flow from addresses */
> > +		union inany_addr dst;
> > +
> > +		if (udp_pktinfo(&mh, &dst) < 0) {
> > +			warn("Missing PKTINFO on UDP error");
> 
> ...changed this to debug(), see my comments to 5/7.

Actually this one can go away entirely.  As you pointed out it's
redundant with the one in udp_pktinfo() itself.  Oh well, the extra
debug() should be harmless enough.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2025-04-16  0:38 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-04-15  7:16 [PATCH 0/7] Assorted fixes for UDP socket and error handling problems David Gibson
2025-04-15  7:16 ` [PATCH 1/7] udp: Fix breakage of UDP error handling by PKTINFO support David Gibson
2025-04-15  7:16 ` [PATCH 2/7] udp: Be quieter about errors on UDP receive David Gibson
2025-04-15  7:16 ` [PATCH 3/7] udp: Pass socket & flow information direction to error handling functions David Gibson
2025-04-15  7:16 ` [PATCH 4/7] udp: Deal with errors as we go in udp_sock_fwd() David Gibson
2025-04-15  7:16 ` [PATCH 5/7] udp: Add udp_pktinfo() helper David Gibson
2025-04-15 18:54   ` Stefano Brivio
2025-04-16  0:37     ` David Gibson
2025-04-15  7:16 ` [PATCH 6/7] udp: Minor re-organisation of udp_sock_recverr() David Gibson
2025-04-15  7:16 ` [PATCH 7/7] udp: Propagate errors on listening and brand new sockets David Gibson
2025-04-15 18:54   ` Stefano Brivio
2025-04-16  0:38     ` David Gibson
2025-04-15 19:10 ` [PATCH 0/7] Assorted fixes for UDP socket and error handling problems Stefano Brivio

Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).