public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
* [PATCH 0/5] Handle error events on UDP sockets
@ 2024-07-16  5:29 David Gibson
  2024-07-16  5:29 ` [PATCH 1/5] conf: Don't configure port forwarding for a disabled protocol David Gibson
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: David Gibson @ 2024-07-16  5:29 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

Add handling of the error queue for UDP sockets.  We certainly need
this for the flow table: this is the cause of the epoll loop that
Stefano noticed.  Even without the flow table it should improve
robustness and debugability.

Along the way we make a few other clean ups.

David Gibson (5):
  conf: Don't configure port forwarding for a disabled protocol
  udp: Make udp_sock_recv static
  udp, tcp: Tweak handling of no_udp and no_tcp flags
  util: Add AF_UNSPEC support to sockaddr_ntop()
  udp: Handle errors on UDP sockets

 conf.c |  5 ++++
 tcp.c  | 14 ++++++++---
 udp.c  | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 util.c | 33 +++++++++++++++++++++++++
 util.h |  3 +++
 5 files changed, 124 insertions(+), 7 deletions(-)

-- 
2.45.2


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/5] conf: Don't configure port forwarding for a disabled protocol
  2024-07-16  5:29 [PATCH 0/5] Handle error events on UDP sockets David Gibson
@ 2024-07-16  5:29 ` David Gibson
  2024-07-16  5:29 ` [PATCH 2/5] udp: Make udp_sock_recv static David Gibson
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: David Gibson @ 2024-07-16  5:29 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

UDP and/or TCP can be disabled with the --no-udp and --no-tcp options.
However, when this is specified, it's still possible to configure forwarded
ports for the disabled protocol.  In some cases this will open sockets and
perform other actions, which might not be safe since the entire protocol
won't be initialised.

Check for this case, and explicitly forbid it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 conf.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/conf.c b/conf.c
index 3c38cebc..629eb897 100644
--- a/conf.c
+++ b/conf.c
@@ -132,6 +132,11 @@ static void conf_ports(const struct ctx *c, char optname, const char *optarg,
 		return;
 	}
 
+	if ((optname == 't' || optname == 'T') && c->no_tcp)
+		die("TCP port forwarding requested but TCP is disabled");
+	if ((optname == 'u' || optname == 'U') && c->no_udp)
+		die("UDP port forwarding requested but UDP is disabled");
+
 	if (!strcmp(optarg, "auto")) {
 		if (fwd->mode)
 			goto mode_conflict;
-- 
@@ -132,6 +132,11 @@ static void conf_ports(const struct ctx *c, char optname, const char *optarg,
 		return;
 	}
 
+	if ((optname == 't' || optname == 'T') && c->no_tcp)
+		die("TCP port forwarding requested but TCP is disabled");
+	if ((optname == 'u' || optname == 'U') && c->no_udp)
+		die("UDP port forwarding requested but UDP is disabled");
+
 	if (!strcmp(optarg, "auto")) {
 		if (fwd->mode)
 			goto mode_conflict;
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/5] udp: Make udp_sock_recv static
  2024-07-16  5:29 [PATCH 0/5] Handle error events on UDP sockets David Gibson
  2024-07-16  5:29 ` [PATCH 1/5] conf: Don't configure port forwarding for a disabled protocol David Gibson
@ 2024-07-16  5:29 ` David Gibson
  2024-07-16  5:29 ` [PATCH 3/5] udp, tcp: Tweak handling of no_udp and no_tcp flags David Gibson
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: David Gibson @ 2024-07-16  5:29 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

Through an oversight this was previously declared as a public function
although it's only used in udp.c and there is no prototype in any header.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 udp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/udp.c b/udp.c
index dee402f7..187483f0 100644
--- a/udp.c
+++ b/udp.c
@@ -737,8 +737,8 @@ static void udp_tap_prepare(const struct ctx *c, const struct mmsghdr *mmh,
  *
  * #syscalls recvmmsg
  */
-int udp_sock_recv(const struct ctx *c, int s, uint32_t events,
-		  struct mmsghdr *mmh)
+static int udp_sock_recv(const struct ctx *c, int s, uint32_t events,
+			 struct mmsghdr *mmh)
 {
 	/* For not entirely clear reasons (data locality?) pasta gets better
 	 * throughput if we receive tap datagrams one at a atime.  For small
-- 
@@ -737,8 +737,8 @@ static void udp_tap_prepare(const struct ctx *c, const struct mmsghdr *mmh,
  *
  * #syscalls recvmmsg
  */
-int udp_sock_recv(const struct ctx *c, int s, uint32_t events,
-		  struct mmsghdr *mmh)
+static int udp_sock_recv(const struct ctx *c, int s, uint32_t events,
+			 struct mmsghdr *mmh)
 {
 	/* For not entirely clear reasons (data locality?) pasta gets better
 	 * throughput if we receive tap datagrams one at a atime.  For small
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 3/5] udp, tcp: Tweak handling of no_udp and no_tcp flags
  2024-07-16  5:29 [PATCH 0/5] Handle error events on UDP sockets David Gibson
  2024-07-16  5:29 ` [PATCH 1/5] conf: Don't configure port forwarding for a disabled protocol David Gibson
  2024-07-16  5:29 ` [PATCH 2/5] udp: Make udp_sock_recv static David Gibson
@ 2024-07-16  5:29 ` David Gibson
  2024-07-16  5:29 ` [PATCH 4/5] util: Add AF_UNSPEC support to sockaddr_ntop() David Gibson
  2024-07-16  5:29 ` [PATCH 5/5] udp: Handle errors on UDP sockets David Gibson
  4 siblings, 0 replies; 8+ messages in thread
From: David Gibson @ 2024-07-16  5:29 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

We abort the UDP socket handler if the no_udp flag is set.  But if UDP
was disabled we should never have had a UDP socket to trigger the handler
in the first place.  If we somehow did, ignoring it here isn't really going
to help because aborting without doing anything is likely to lead to an
epoll loop.  The same is the case for the TCP socket and timer handlers and
the no_tcp flag.

Change these checks on the flag to ASSERT()s.  Similarly add ASSERT()s to
several other entry points to the protocol specific code which should never
be called if the protocol is disabled.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 tcp.c | 14 +++++++++++---
 udp.c | 13 +++++++++++--
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/tcp.c b/tcp.c
index 75b959a2..de3f9f64 100644
--- a/tcp.c
+++ b/tcp.c
@@ -2286,7 +2286,9 @@ void tcp_listen_handler(struct ctx *c, union epoll_ref ref,
 	union flow *flow;
 	int s;
 
-	if (c->no_tcp || !(flow = flow_alloc()))
+	ASSERT(!c->no_tcp);
+
+	if (!(flow = flow_alloc()))
 		return;
 
 	s = accept4(ref.fd, &sa.sa, &sl, SOCK_NONBLOCK);
@@ -2339,8 +2341,7 @@ void tcp_timer_handler(struct ctx *c, union epoll_ref ref)
 	struct itimerspec check_armed = { { 0 }, { 0 } };
 	struct tcp_tap_conn *conn = CONN(ref.flow);
 
-	if (c->no_tcp)
-		return;
+	ASSERT(!c->no_tcp);
 
 	/* We don't reset timers on ~ACK_FROM_TAP_DUE, ~ACK_TO_TAP_DUE. If the
 	 * timer is currently armed, this event came from a previous setting,
@@ -2398,6 +2399,7 @@ void tcp_sock_handler(struct ctx *c, union epoll_ref ref, uint32_t events)
 {
 	struct tcp_tap_conn *conn = CONN(ref.flowside.flow);
 
+	ASSERT(!c->no_tcp);
 	ASSERT(conn->f.type == FLOW_TCP);
 	ASSERT(conn->f.pif[ref.flowside.side] != PIF_TAP);
 
@@ -2497,6 +2499,8 @@ int tcp_sock_init(const struct ctx *c, sa_family_t af, const void *addr,
 {
 	int r4 = FD_REF_MAX + 1, r6 = FD_REF_MAX + 1;
 
+	ASSERT(!c->no_tcp);
+
 	if (af == AF_UNSPEC && c->ifi4 && c->ifi6)
 		/* Attempt to get a dual stack socket */
 		if (tcp_sock_init_af(c, AF_UNSPEC, port, addr, ifname) >= 0)
@@ -2574,6 +2578,8 @@ static void tcp_ns_sock_init6(const struct ctx *c, in_port_t port)
  */
 void tcp_ns_sock_init(const struct ctx *c, in_port_t port)
 {
+	ASSERT(!c->no_tcp);
+
 	if (c->ifi4)
 		tcp_ns_sock_init4(c, port);
 	if (c->ifi6)
@@ -2661,6 +2667,8 @@ int tcp_init(struct ctx *c)
 {
 	unsigned b;
 
+	ASSERT(!c->no_tcp);
+
 	for (b = 0; b < TCP_HASH_TABLE_SIZE; b++)
 		tc_hash[b] = FLOW_SIDX_NONE;
 
diff --git a/udp.c b/udp.c
index 187483f0..fbf3ce73 100644
--- a/udp.c
+++ b/udp.c
@@ -749,7 +749,9 @@ static int udp_sock_recv(const struct ctx *c, int s, uint32_t events,
 	 */
 	int n = (c->mode == MODE_PASTA ? 1 : UDP_MAX_FRAMES);
 
-	if (c->no_udp || !(events & EPOLLIN))
+	ASSERT(!c->no_udp);
+
+	if (!(events & EPOLLIN))
 		return 0;
 
 	n = recvmmsg(s, mmh, n, 0, NULL);
@@ -849,10 +851,11 @@ int udp_tap_handler(struct ctx *c, uint8_t pif,
 	in_port_t src, dst;
 	socklen_t sl;
 
-	(void)c;
 	(void)saddr;
 	(void)pif;
 
+	ASSERT(!c->no_udp);
+
 	uh = packet_get(p, idx, 0, sizeof(*uh), NULL);
 	if (!uh)
 		return 1;
@@ -1026,6 +1029,8 @@ int udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
 				     .orig = true, .port = port };
 	int s, r4 = FD_REF_MAX + 1, r6 = FD_REF_MAX + 1;
 
+	ASSERT(!c->no_udp);
+
 	if (ns)
 		uref.pif = PIF_SPLICE;
 	else
@@ -1214,6 +1219,8 @@ void udp_timer(struct ctx *c, const struct timespec *now)
 	unsigned int i;
 	long *word, tmp;
 
+	ASSERT(!c->no_udp);
+
 	if (c->mode == MODE_PASTA) {
 		if (c->udp.fwd_out.f.mode == FWD_AUTO) {
 			fwd_scan_ports_udp(&c->udp.fwd_out.f, &c->udp.fwd_in.f,
@@ -1257,6 +1264,8 @@ v6:
  */
 int udp_init(struct ctx *c)
 {
+	ASSERT(!c->no_udp);
+
 	udp_iov_init(c);
 
 	udp_invert_portmap(&c->udp.fwd_in);
-- 
@@ -749,7 +749,9 @@ static int udp_sock_recv(const struct ctx *c, int s, uint32_t events,
 	 */
 	int n = (c->mode == MODE_PASTA ? 1 : UDP_MAX_FRAMES);
 
-	if (c->no_udp || !(events & EPOLLIN))
+	ASSERT(!c->no_udp);
+
+	if (!(events & EPOLLIN))
 		return 0;
 
 	n = recvmmsg(s, mmh, n, 0, NULL);
@@ -849,10 +851,11 @@ int udp_tap_handler(struct ctx *c, uint8_t pif,
 	in_port_t src, dst;
 	socklen_t sl;
 
-	(void)c;
 	(void)saddr;
 	(void)pif;
 
+	ASSERT(!c->no_udp);
+
 	uh = packet_get(p, idx, 0, sizeof(*uh), NULL);
 	if (!uh)
 		return 1;
@@ -1026,6 +1029,8 @@ int udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
 				     .orig = true, .port = port };
 	int s, r4 = FD_REF_MAX + 1, r6 = FD_REF_MAX + 1;
 
+	ASSERT(!c->no_udp);
+
 	if (ns)
 		uref.pif = PIF_SPLICE;
 	else
@@ -1214,6 +1219,8 @@ void udp_timer(struct ctx *c, const struct timespec *now)
 	unsigned int i;
 	long *word, tmp;
 
+	ASSERT(!c->no_udp);
+
 	if (c->mode == MODE_PASTA) {
 		if (c->udp.fwd_out.f.mode == FWD_AUTO) {
 			fwd_scan_ports_udp(&c->udp.fwd_out.f, &c->udp.fwd_in.f,
@@ -1257,6 +1264,8 @@ v6:
  */
 int udp_init(struct ctx *c)
 {
+	ASSERT(!c->no_udp);
+
 	udp_iov_init(c);
 
 	udp_invert_portmap(&c->udp.fwd_in);
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 4/5] util: Add AF_UNSPEC support to sockaddr_ntop()
  2024-07-16  5:29 [PATCH 0/5] Handle error events on UDP sockets David Gibson
                   ` (2 preceding siblings ...)
  2024-07-16  5:29 ` [PATCH 3/5] udp, tcp: Tweak handling of no_udp and no_tcp flags David Gibson
@ 2024-07-16  5:29 ` David Gibson
  2024-07-16  5:29 ` [PATCH 5/5] udp: Handle errors on UDP sockets David Gibson
  4 siblings, 0 replies; 8+ messages in thread
From: David Gibson @ 2024-07-16  5:29 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

Allow sockaddr_ntop() to format AF_UNSPEC socket addresses.  There do exist
a few cases where we might legitimately have either an AF_UNSPEC or a real
address, such as the origin address from MSG_ERRQUEUE.  Even in cases where
we shouldn't get an AF_UNSPEC address, formatting it is likely to make
things easier to debug if we ever somehow do.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 util.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/util.c b/util.c
index 9a73fbb9..428847d2 100644
--- a/util.c
+++ b/util.c
@@ -622,6 +622,10 @@ const char *sockaddr_ntop(const void *sa, char *dst, socklen_t size)
 	} while (0)
 
 	switch (family) {
+	case AF_UNSPEC:
+		IPRINTF("<unspecified>");
+		break;
+
 	case AF_INET: {
 		const struct sockaddr_in *sa4 = sa;
 
-- 
@@ -622,6 +622,10 @@ const char *sockaddr_ntop(const void *sa, char *dst, socklen_t size)
 	} while (0)
 
 	switch (family) {
+	case AF_UNSPEC:
+		IPRINTF("<unspecified>");
+		break;
+
 	case AF_INET: {
 		const struct sockaddr_in *sa4 = sa;
 
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 5/5] udp: Handle errors on UDP sockets
  2024-07-16  5:29 [PATCH 0/5] Handle error events on UDP sockets David Gibson
                   ` (3 preceding siblings ...)
  2024-07-16  5:29 ` [PATCH 4/5] util: Add AF_UNSPEC support to sockaddr_ntop() David Gibson
@ 2024-07-16  5:29 ` David Gibson
  2024-07-16  7:25   ` Stefano Brivio
  4 siblings, 1 reply; 8+ messages in thread
From: David Gibson @ 2024-07-16  5:29 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

Currently we ignore all events other than EPOLLIN on UDP sockets.  This
means that if we ever receive an EPOLLERR event, we'll enter an infinite
loop on epoll, because we'll never do anything to clear the error.

Luckily that doesn't seem to have happened in practice, but it's certainly
fragile.  Furthermore changes in how we handle UDP sockets with the flow
table mean we will start receiving error events.

Add handling of EPOLLERR events.  For now we just read the error from the
error queue (thereby clearing the error state) and print a debug message.
We can add more substantial handling of specific events in future if we
want to.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 udp.c  | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 util.c | 29 +++++++++++++++++++++++++++++
 util.h |  3 +++
 3 files changed, 91 insertions(+)

diff --git a/udp.c b/udp.c
index fbf3ce73..d761717a 100644
--- a/udp.c
+++ b/udp.c
@@ -110,6 +110,7 @@
 #include <sys/socket.h>
 #include <sys/uio.h>
 #include <time.h>
+#include <linux/errqueue.h>
 
 #include "checksum.h"
 #include "util.h"
@@ -728,6 +729,59 @@ static void udp_tap_prepare(const struct ctx *c, const struct mmsghdr *mmh,
 	(*tap_iov)[UDP_IOV_PAYLOAD].iov_len = l4len;
 }
 
+/**
+ * udp_sock_recverr() - Receive and clear an error from a socket
+ * @s:		Socket to receive from
+ *
+ * Return: true if an error received and processed, false if no more errors
+ *
+ * #syscalls recvmsg
+ */
+static bool udp_sock_recverr(int s)
+{
+	const struct sock_extended_err *ee;
+	const struct cmsghdr *hdr;
+	char buf[CMSG_SPACE(sizeof(*ee))];
+	struct msghdr mh = {
+		.msg_name = NULL,
+		.msg_namelen = 0,
+		.msg_iov = NULL,
+		.msg_iovlen = 0,
+		.msg_control = buf,
+		.msg_controllen = sizeof(buf),
+	};
+	ssize_t rc;
+
+	rc = recvmsg(s, &mh, MSG_ERRQUEUE);
+	if (rc < 0) {
+		if (errno != EAGAIN)
+			err_perror("Failed to read error queue");
+		return false;
+	}
+
+	if (!(mh.msg_flags & MSG_ERRQUEUE)) {
+		err("Missing MSG_ERRQUEUE flag reading error queue");
+		return false;
+	}
+
+	hdr = CMSG_FIRSTHDR(&mh);
+	if (!((hdr->cmsg_level == IPPROTO_IP &&
+	       hdr->cmsg_type == IP_RECVERR) ||
+	      (hdr->cmsg_level == IPPROTO_IPV6 &&
+	       hdr->cmsg_type == IPV6_RECVERR))) {
+		err("Unexpected cmsg reading error queue");
+		return false;
+	}
+
+	ee = (const struct sock_extended_err *)CMSG_DATA(hdr);
+
+	/* TODO: When possible propagate and otherwise handle errors */
+	debug("%s error on UDP socket %i: %s",
+	      str_ee_origin(ee), s, strerror(ee->ee_errno));
+
+	return true;
+}
+
 /**
  * udp_sock_recv() - Receive datagrams from a socket
  * @c:		Execution context
@@ -751,6 +805,11 @@ static int udp_sock_recv(const struct ctx *c, int s, uint32_t events,
 
 	ASSERT(!c->no_udp);
 
+	/* Clear any errors first */
+	if (events & EPOLLERR)
+		while (udp_sock_recverr(s))
+			;
+
 	if (!(events & EPOLLIN))
 		return 0;
 
diff --git a/util.c b/util.c
index 428847d2..70de1e16 100644
--- a/util.c
+++ b/util.c
@@ -25,6 +25,7 @@
 #include <time.h>
 #include <errno.h>
 #include <stdbool.h>
+#include <linux/errqueue.h>
 
 #include "util.h"
 #include "iov.h"
@@ -97,6 +98,14 @@ static int sock_l4_sa(const struct ctx *c, enum epoll_type type,
 	if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &y, sizeof(y)))
 		debug("Failed to set SO_REUSEADDR on socket %i", fd);
 
+	if (proto == IPPROTO_UDP) {
+		int level = af == AF_INET ? IPPROTO_IP : IPPROTO_IPV6;
+		int opt = af == AF_INET ? IP_RECVERR : IPV6_RECVERR;
+
+		if (setsockopt(fd, level, opt, &y, sizeof(y)))
+			die("Failed to set RECVERR on socket %i", fd);
+	}
+
 	if (ifname && *ifname) {
 		/* Supported since kernel version 5.7, commit c427bfec18f2
 		 * ("net: core: enable SO_BINDTODEVICE for non-root users"). If
@@ -654,3 +663,23 @@ const char *sockaddr_ntop(const void *sa, char *dst, socklen_t size)
 
 	return dst;
 }
+
+/** str_ee_origin() - Convert socket extended error origin to a string
+ * @ee:		Socket extended error structure
+ *
+ * Return: Static string describing error origin
+ */
+const char *str_ee_origin(const struct sock_extended_err *ee)
+{
+	const char *const desc[] = {
+		[SO_EE_ORIGIN_NONE]  = "<no origin>",
+		[SO_EE_ORIGIN_LOCAL] = "Local",
+		[SO_EE_ORIGIN_ICMP]  = "ICMP",
+		[SO_EE_ORIGIN_ICMP6] = "ICMPv6",
+	};
+
+	if (ee->ee_origin < ARRAY_SIZE(desc))
+		return desc[ee->ee_origin];
+
+	return "<invalid>";
+}
diff --git a/util.h b/util.h
index d0150396..1d479ddf 100644
--- a/util.h
+++ b/util.h
@@ -194,7 +194,10 @@ static inline const char *af_name(sa_family_t af)
 
 #define SOCKADDR_STRLEN		MAX(SOCKADDR_INET_STRLEN, SOCKADDR_INET6_STRLEN)
 
+struct sock_extended_err;
+
 const char *sockaddr_ntop(const void *sa, char *dst, socklen_t size);
+const char *str_ee_origin(const struct sock_extended_err *ee);
 
 /**
  * mod_sub() - Modular arithmetic subtraction
-- 
@@ -194,7 +194,10 @@ static inline const char *af_name(sa_family_t af)
 
 #define SOCKADDR_STRLEN		MAX(SOCKADDR_INET_STRLEN, SOCKADDR_INET6_STRLEN)
 
+struct sock_extended_err;
+
 const char *sockaddr_ntop(const void *sa, char *dst, socklen_t size);
+const char *str_ee_origin(const struct sock_extended_err *ee);
 
 /**
  * mod_sub() - Modular arithmetic subtraction
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 5/5] udp: Handle errors on UDP sockets
  2024-07-16  5:29 ` [PATCH 5/5] udp: Handle errors on UDP sockets David Gibson
@ 2024-07-16  7:25   ` Stefano Brivio
  2024-07-17  0:04     ` David Gibson
  0 siblings, 1 reply; 8+ messages in thread
From: Stefano Brivio @ 2024-07-16  7:25 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

Nits only (the rest of the series looks good to me):

On Tue, 16 Jul 2024 15:29:36 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> Currently we ignore all events other than EPOLLIN on UDP sockets.  This
> means that if we ever receive an EPOLLERR event, we'll enter an infinite
> loop on epoll, because we'll never do anything to clear the error.
> 
> Luckily that doesn't seem to have happened in practice, but it's certainly
> fragile.  Furthermore changes in how we handle UDP sockets with the flow
> table mean we will start receiving error events.
> 
> Add handling of EPOLLERR events.  For now we just read the error from the
> error queue (thereby clearing the error state) and print a debug message.
> We can add more substantial handling of specific events in future if we
> want to.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  udp.c  | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  util.c | 29 +++++++++++++++++++++++++++++
>  util.h |  3 +++
>  3 files changed, 91 insertions(+)
> 
> diff --git a/udp.c b/udp.c
> index fbf3ce73..d761717a 100644
> --- a/udp.c
> +++ b/udp.c
> @@ -110,6 +110,7 @@
>  #include <sys/socket.h>
>  #include <sys/uio.h>
>  #include <time.h>
> +#include <linux/errqueue.h>
>  
>  #include "checksum.h"
>  #include "util.h"
> @@ -728,6 +729,59 @@ static void udp_tap_prepare(const struct ctx *c, const struct mmsghdr *mmh,
>  	(*tap_iov)[UDP_IOV_PAYLOAD].iov_len = l4len;
>  }
>  
> +/**
> + * udp_sock_recverr() - Receive and clear an error from a socket
> + * @s:		Socket to receive from
> + *
> + * Return: true if an error received and processed, false if no more errors

It took me a while to realise there's an implied "was" between "error"
and "received". Maybe:

 * Return: true: errors received and processed, false: no more errors

?

> + *
> + * #syscalls recvmsg
> + */
> +static bool udp_sock_recverr(int s)
> +{
> +	const struct sock_extended_err *ee;
> +	const struct cmsghdr *hdr;
> +	char buf[CMSG_SPACE(sizeof(*ee))];
> +	struct msghdr mh = {
> +		.msg_name = NULL,
> +		.msg_namelen = 0,
> +		.msg_iov = NULL,
> +		.msg_iovlen = 0,
> +		.msg_control = buf,
> +		.msg_controllen = sizeof(buf),
> +	};
> +	ssize_t rc;
> +
> +	rc = recvmsg(s, &mh, MSG_ERRQUEUE);
> +	if (rc < 0) {
> +		if (errno != EAGAIN)

There are some existing cases where we forgot about this, but EAGAIN
and EWOULDBLOCK are not guaranteed to be the same value, and as
recvmsg(2) says:

  POSIX.1 allows either error to be returned for this case, and does
  not require these constants to have the same value, so a portable
  application should check for both possibilities.

> +			err_perror("Failed to read error queue");
> +		return false;
> +	}
> +
> +	if (!(mh.msg_flags & MSG_ERRQUEUE)) {
> +		err("Missing MSG_ERRQUEUE flag reading error queue");
> +		return false;
> +	}
> +
> +	hdr = CMSG_FIRSTHDR(&mh);
> +	if (!((hdr->cmsg_level == IPPROTO_IP &&
> +	       hdr->cmsg_type == IP_RECVERR) ||
> +	      (hdr->cmsg_level == IPPROTO_IPV6 &&
> +	       hdr->cmsg_type == IPV6_RECVERR))) {
> +		err("Unexpected cmsg reading error queue");
> +		return false;
> +	}
> +
> +	ee = (const struct sock_extended_err *)CMSG_DATA(hdr);
> +
> +	/* TODO: When possible propagate and otherwise handle errors */
> +	debug("%s error on UDP socket %i: %s",
> +	      str_ee_origin(ee), s, strerror(ee->ee_errno));
> +
> +	return true;
> +}
> +
>  /**
>   * udp_sock_recv() - Receive datagrams from a socket
>   * @c:		Execution context
> @@ -751,6 +805,11 @@ static int udp_sock_recv(const struct ctx *c, int s, uint32_t events,
>  
>  	ASSERT(!c->no_udp);
>  
> +	/* Clear any errors first */
> +	if (events & EPOLLERR)

For consistency: curly brackets around multi-line block (or simply
change that to "while (udp_sock_recverr(s));"? No preference from me).

> +		while (udp_sock_recverr(s))
> +			;
> +
>  	if (!(events & EPOLLIN))
>  		return 0;
>  
> diff --git a/util.c b/util.c
> index 428847d2..70de1e16 100644
> --- a/util.c
> +++ b/util.c
> @@ -25,6 +25,7 @@
>  #include <time.h>
>  #include <errno.h>
>  #include <stdbool.h>
> +#include <linux/errqueue.h>
>  
>  #include "util.h"
>  #include "iov.h"
> @@ -97,6 +98,14 @@ static int sock_l4_sa(const struct ctx *c, enum epoll_type type,
>  	if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &y, sizeof(y)))
>  		debug("Failed to set SO_REUSEADDR on socket %i", fd);
>  
> +	if (proto == IPPROTO_UDP) {
> +		int level = af == AF_INET ? IPPROTO_IP : IPPROTO_IPV6;
> +		int opt = af == AF_INET ? IP_RECVERR : IPV6_RECVERR;
> +
> +		if (setsockopt(fd, level, opt, &y, sizeof(y)))
> +			die("Failed to set RECVERR on socket %i", fd);

There's also a die_perror() which would be practical here.

> +	}
> +
>  	if (ifname && *ifname) {
>  		/* Supported since kernel version 5.7, commit c427bfec18f2
>  		 * ("net: core: enable SO_BINDTODEVICE for non-root users"). If
> @@ -654,3 +663,23 @@ const char *sockaddr_ntop(const void *sa, char *dst, socklen_t size)
>  
>  	return dst;
>  }
> +
> +/** str_ee_origin() - Convert socket extended error origin to a string
> + * @ee:		Socket extended error structure
> + *
> + * Return: Static string describing error origin
> + */
> +const char *str_ee_origin(const struct sock_extended_err *ee)
> +{
> +	const char *const desc[] = {
> +		[SO_EE_ORIGIN_NONE]  = "<no origin>",
> +		[SO_EE_ORIGIN_LOCAL] = "Local",
> +		[SO_EE_ORIGIN_ICMP]  = "ICMP",
> +		[SO_EE_ORIGIN_ICMP6] = "ICMPv6",
> +	};
> +
> +	if (ee->ee_origin < ARRAY_SIZE(desc))
> +		return desc[ee->ee_origin];
> +
> +	return "<invalid>";
> +}
> diff --git a/util.h b/util.h
> index d0150396..1d479ddf 100644
> --- a/util.h
> +++ b/util.h
> @@ -194,7 +194,10 @@ static inline const char *af_name(sa_family_t af)
>  
>  #define SOCKADDR_STRLEN		MAX(SOCKADDR_INET_STRLEN, SOCKADDR_INET6_STRLEN)
>  
> +struct sock_extended_err;
> +
>  const char *sockaddr_ntop(const void *sa, char *dst, socklen_t size);
> +const char *str_ee_origin(const struct sock_extended_err *ee);
>  
>  /**
>   * mod_sub() - Modular arithmetic subtraction

-- 
Stefano


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 5/5] udp: Handle errors on UDP sockets
  2024-07-16  7:25   ` Stefano Brivio
@ 2024-07-17  0:04     ` David Gibson
  0 siblings, 0 replies; 8+ messages in thread
From: David Gibson @ 2024-07-17  0:04 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 7130 bytes --]

On Tue, Jul 16, 2024 at 09:25:56AM +0200, Stefano Brivio wrote:
> Nits only (the rest of the series looks good to me):
> 
> On Tue, 16 Jul 2024 15:29:36 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > Currently we ignore all events other than EPOLLIN on UDP sockets.  This
> > means that if we ever receive an EPOLLERR event, we'll enter an infinite
> > loop on epoll, because we'll never do anything to clear the error.
> > 
> > Luckily that doesn't seem to have happened in practice, but it's certainly
> > fragile.  Furthermore changes in how we handle UDP sockets with the flow
> > table mean we will start receiving error events.
> > 
> > Add handling of EPOLLERR events.  For now we just read the error from the
> > error queue (thereby clearing the error state) and print a debug message.
> > We can add more substantial handling of specific events in future if we
> > want to.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  udp.c  | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  util.c | 29 +++++++++++++++++++++++++++++
> >  util.h |  3 +++
> >  3 files changed, 91 insertions(+)
> > 
> > diff --git a/udp.c b/udp.c
> > index fbf3ce73..d761717a 100644
> > --- a/udp.c
> > +++ b/udp.c
> > @@ -110,6 +110,7 @@
> >  #include <sys/socket.h>
> >  #include <sys/uio.h>
> >  #include <time.h>
> > +#include <linux/errqueue.h>
> >  
> >  #include "checksum.h"
> >  #include "util.h"
> > @@ -728,6 +729,59 @@ static void udp_tap_prepare(const struct ctx *c, const struct mmsghdr *mmh,
> >  	(*tap_iov)[UDP_IOV_PAYLOAD].iov_len = l4len;
> >  }
> >  
> > +/**
> > + * udp_sock_recverr() - Receive and clear an error from a socket
> > + * @s:		Socket to receive from
> > + *
> > + * Return: true if an error received and processed, false if no more errors
> 
> It took me a while to realise there's an implied "was" between "error"
> and "received". Maybe:
> 
>  * Return: true: errors received and processed, false: no more errors
> 
> ?

Adjusted.

> 
> > + *
> > + * #syscalls recvmsg
> > + */
> > +static bool udp_sock_recverr(int s)
> > +{
> > +	const struct sock_extended_err *ee;
> > +	const struct cmsghdr *hdr;
> > +	char buf[CMSG_SPACE(sizeof(*ee))];
> > +	struct msghdr mh = {
> > +		.msg_name = NULL,
> > +		.msg_namelen = 0,
> > +		.msg_iov = NULL,
> > +		.msg_iovlen = 0,
> > +		.msg_control = buf,
> > +		.msg_controllen = sizeof(buf),
> > +	};
> > +	ssize_t rc;
> > +
> > +	rc = recvmsg(s, &mh, MSG_ERRQUEUE);
> > +	if (rc < 0) {
> > +		if (errno != EAGAIN)
> 
> There are some existing cases where we forgot about this, but EAGAIN
> and EWOULDBLOCK are not guaranteed to be the same value, and as
> recvmsg(2) says:
> 
>   POSIX.1 allows either error to be returned for this case, and does
>   not require these constants to have the same value, so a portable
>   application should check for both possibilities.

Fair point.  We're Linux only for now where they certainly do have the
same value, but no reason not to be robust against portability in
future.

> > +			err_perror("Failed to read error queue");
> > +		return false;
> > +	}
> > +
> > +	if (!(mh.msg_flags & MSG_ERRQUEUE)) {
> > +		err("Missing MSG_ERRQUEUE flag reading error queue");
> > +		return false;
> > +	}
> > +
> > +	hdr = CMSG_FIRSTHDR(&mh);
> > +	if (!((hdr->cmsg_level == IPPROTO_IP &&
> > +	       hdr->cmsg_type == IP_RECVERR) ||
> > +	      (hdr->cmsg_level == IPPROTO_IPV6 &&
> > +	       hdr->cmsg_type == IPV6_RECVERR))) {
> > +		err("Unexpected cmsg reading error queue");
> > +		return false;
> > +	}
> > +
> > +	ee = (const struct sock_extended_err *)CMSG_DATA(hdr);
> > +
> > +	/* TODO: When possible propagate and otherwise handle errors */
> > +	debug("%s error on UDP socket %i: %s",
> > +	      str_ee_origin(ee), s, strerror(ee->ee_errno));
> > +
> > +	return true;
> > +}
> > +
> >  /**
> >   * udp_sock_recv() - Receive datagrams from a socket
> >   * @c:		Execution context
> > @@ -751,6 +805,11 @@ static int udp_sock_recv(const struct ctx *c, int s, uint32_t events,
> >  
> >  	ASSERT(!c->no_udp);
> >  
> > +	/* Clear any errors first */
> > +	if (events & EPOLLERR)
> 
> For consistency: curly brackets around multi-line block (or simply
> change that to "while (udp_sock_recverr(s));"? No preference from me).

Done.

> > +		while (udp_sock_recverr(s))
> > +			;
> > +
> >  	if (!(events & EPOLLIN))
> >  		return 0;
> >  
> > diff --git a/util.c b/util.c
> > index 428847d2..70de1e16 100644
> > --- a/util.c
> > +++ b/util.c
> > @@ -25,6 +25,7 @@
> >  #include <time.h>
> >  #include <errno.h>
> >  #include <stdbool.h>
> > +#include <linux/errqueue.h>
> >  
> >  #include "util.h"
> >  #include "iov.h"
> > @@ -97,6 +98,14 @@ static int sock_l4_sa(const struct ctx *c, enum epoll_type type,
> >  	if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &y, sizeof(y)))
> >  		debug("Failed to set SO_REUSEADDR on socket %i", fd);
> >  
> > +	if (proto == IPPROTO_UDP) {
> > +		int level = af == AF_INET ? IPPROTO_IP : IPPROTO_IPV6;
> > +		int opt = af == AF_INET ? IP_RECVERR : IPV6_RECVERR;
> > +
> > +		if (setsockopt(fd, level, opt, &y, sizeof(y)))
> > +			die("Failed to set RECVERR on socket %i", fd);
> 
> There's also a die_perror() which would be practical here.

Oops, I think I meant to use that.  Done.

> > +	}
> > +
> >  	if (ifname && *ifname) {
> >  		/* Supported since kernel version 5.7, commit c427bfec18f2
> >  		 * ("net: core: enable SO_BINDTODEVICE for non-root users"). If
> > @@ -654,3 +663,23 @@ const char *sockaddr_ntop(const void *sa, char *dst, socklen_t size)
> >  
> >  	return dst;
> >  }
> > +
> > +/** str_ee_origin() - Convert socket extended error origin to a string
> > + * @ee:		Socket extended error structure
> > + *
> > + * Return: Static string describing error origin
> > + */
> > +const char *str_ee_origin(const struct sock_extended_err *ee)
> > +{
> > +	const char *const desc[] = {
> > +		[SO_EE_ORIGIN_NONE]  = "<no origin>",
> > +		[SO_EE_ORIGIN_LOCAL] = "Local",
> > +		[SO_EE_ORIGIN_ICMP]  = "ICMP",
> > +		[SO_EE_ORIGIN_ICMP6] = "ICMPv6",
> > +	};
> > +
> > +	if (ee->ee_origin < ARRAY_SIZE(desc))
> > +		return desc[ee->ee_origin];
> > +
> > +	return "<invalid>";
> > +}
> > diff --git a/util.h b/util.h
> > index d0150396..1d479ddf 100644
> > --- a/util.h
> > +++ b/util.h
> > @@ -194,7 +194,10 @@ static inline const char *af_name(sa_family_t af)
> >  
> >  #define SOCKADDR_STRLEN		MAX(SOCKADDR_INET_STRLEN, SOCKADDR_INET6_STRLEN)
> >  
> > +struct sock_extended_err;
> > +
> >  const char *sockaddr_ntop(const void *sa, char *dst, socklen_t size);
> > +const char *str_ee_origin(const struct sock_extended_err *ee);
> >  
> >  /**
> >   * mod_sub() - Modular arithmetic subtraction
> 

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-07-17  0:20 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-07-16  5:29 [PATCH 0/5] Handle error events on UDP sockets David Gibson
2024-07-16  5:29 ` [PATCH 1/5] conf: Don't configure port forwarding for a disabled protocol David Gibson
2024-07-16  5:29 ` [PATCH 2/5] udp: Make udp_sock_recv static David Gibson
2024-07-16  5:29 ` [PATCH 3/5] udp, tcp: Tweak handling of no_udp and no_tcp flags David Gibson
2024-07-16  5:29 ` [PATCH 4/5] util: Add AF_UNSPEC support to sockaddr_ntop() David Gibson
2024-07-16  5:29 ` [PATCH 5/5] udp: Handle errors on UDP sockets David Gibson
2024-07-16  7:25   ` Stefano Brivio
2024-07-17  0:04     ` David Gibson

Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).