public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
* [PATCH 0/4] Fix possible truncation of frames from /dev/net/tun
@ 2024-09-06 11:49 David Gibson
  2024-09-06 11:49 ` [PATCH 1/4] tap: Split out handling of EPOLLIN events David Gibson
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: David Gibson @ 2024-09-06 11:49 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

These changes started off as part of the series re-introducing EPOLLET
for tap event handling.  That's now turned out to be of lower
priority, but along the way we fixed a bug where we could truncate
frames from the kernel tap interface.

This is a respin of that patch, plus a few minor preliminary cleanups.
Various minor tweaks based on feedback from the original posting as
part of the tap EPOLLET series.

David Gibson (4):
  tap: Split out handling of EPOLLIN events
  tap: Improve handling of EINTR in tap_passt_input()
  tap: Restructure in tap_pasta_input()
  tap: Don't risk truncating frames on full buffer in tap_pasta_input()

 tap.c | 98 +++++++++++++++++++++++++++++++++++------------------------
 1 file changed, 58 insertions(+), 40 deletions(-)

-- 
2.46.0


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/4] tap: Split out handling of EPOLLIN events
  2024-09-06 11:49 [PATCH 0/4] Fix possible truncation of frames from /dev/net/tun David Gibson
@ 2024-09-06 11:49 ` David Gibson
  2024-09-06 11:49 ` [PATCH 2/4] tap: Improve handling of EINTR in tap_passt_input() David Gibson
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: David Gibson @ 2024-09-06 11:49 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

Currently, tap_handler_pas{st,ta}() check for EPOLLRDHUP, EPOLLHUP and
EPOLLERR events, then assume anything left is EPOLLIN.  We have some future
cases that may want to also handle EPOLLOUT, so in preparation explicitly
handle EPOLLIN, moving the logic to a subfunction.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 tap.c | 50 ++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 36 insertions(+), 14 deletions(-)

diff --git a/tap.c b/tap.c
index 852d8376..c8abc068 100644
--- a/tap.c
+++ b/tap.c
@@ -982,24 +982,17 @@ static void tap_sock_reset(struct ctx *c)
 }
 
 /**
- * tap_handler_passt() - Packet handler for AF_UNIX file descriptor
+ * tap_passt_input() - Handler for new data on the socket to qemu
  * @c:		Execution context
- * @events:	epoll events
  * @now:	Current timestamp
  */
-void tap_handler_passt(struct ctx *c, uint32_t events,
-		       const struct timespec *now)
+static void tap_passt_input(struct ctx *c, const struct timespec *now)
 {
 	static const char *partial_frame;
 	static ssize_t partial_len = 0;
 	ssize_t n;
 	char *p;
 
-	if (events & (EPOLLRDHUP | EPOLLHUP | EPOLLERR)) {
-		tap_sock_reset(c);
-		return;
-	}
-
 	tap_flush_pools();
 
 	if (partial_len) {
@@ -1052,20 +1045,33 @@ void tap_handler_passt(struct ctx *c, uint32_t events,
 }
 
 /**
- * tap_handler_pasta() - Packet handler for /dev/net/tun file descriptor
+ * tap_handler_passt() - Event handler for AF_UNIX file descriptor
  * @c:		Execution context
  * @events:	epoll events
  * @now:	Current timestamp
  */
-void tap_handler_pasta(struct ctx *c, uint32_t events,
+void tap_handler_passt(struct ctx *c, uint32_t events,
 		       const struct timespec *now)
+{
+	if (events & (EPOLLRDHUP | EPOLLHUP | EPOLLERR)) {
+		tap_sock_reset(c);
+		return;
+	}
+
+	if (events & EPOLLIN)
+		tap_passt_input(c, now);
+}
+
+/**
+ * tap_pasta_input() - Handler for new data on the socket to hypervisor
+ * @c:		Execution context
+ * @now:	Current timestamp
+ */
+static void tap_pasta_input(struct ctx *c, const struct timespec *now)
 {
 	ssize_t n, len;
 	int ret;
 
-	if (events & (EPOLLRDHUP | EPOLLHUP | EPOLLERR))
-		die("Disconnect event on /dev/net/tun device, exiting");
-
 redo:
 	n = 0;
 
@@ -1102,6 +1108,22 @@ restart:
 	die("Error on tap device, exiting");
 }
 
+/**
+ * tap_handler_pasta() - Packet handler for /dev/net/tun file descriptor
+ * @c:		Execution context
+ * @events:	epoll events
+ * @now:	Current timestamp
+ */
+void tap_handler_pasta(struct ctx *c, uint32_t events,
+		       const struct timespec *now)
+{
+	if (events & (EPOLLRDHUP | EPOLLHUP | EPOLLERR))
+		die("Disconnect event on /dev/net/tun device, exiting");
+
+	if (events & EPOLLIN)
+		tap_pasta_input(c, now);
+}
+
 /**
  * tap_sock_unix_open() - Create and bind AF_UNIX socket
  * @sock_path:	Socket path. If empty, set on return (UNIX_SOCK_PATH as prefix)
-- 
@@ -982,24 +982,17 @@ static void tap_sock_reset(struct ctx *c)
 }
 
 /**
- * tap_handler_passt() - Packet handler for AF_UNIX file descriptor
+ * tap_passt_input() - Handler for new data on the socket to qemu
  * @c:		Execution context
- * @events:	epoll events
  * @now:	Current timestamp
  */
-void tap_handler_passt(struct ctx *c, uint32_t events,
-		       const struct timespec *now)
+static void tap_passt_input(struct ctx *c, const struct timespec *now)
 {
 	static const char *partial_frame;
 	static ssize_t partial_len = 0;
 	ssize_t n;
 	char *p;
 
-	if (events & (EPOLLRDHUP | EPOLLHUP | EPOLLERR)) {
-		tap_sock_reset(c);
-		return;
-	}
-
 	tap_flush_pools();
 
 	if (partial_len) {
@@ -1052,20 +1045,33 @@ void tap_handler_passt(struct ctx *c, uint32_t events,
 }
 
 /**
- * tap_handler_pasta() - Packet handler for /dev/net/tun file descriptor
+ * tap_handler_passt() - Event handler for AF_UNIX file descriptor
  * @c:		Execution context
  * @events:	epoll events
  * @now:	Current timestamp
  */
-void tap_handler_pasta(struct ctx *c, uint32_t events,
+void tap_handler_passt(struct ctx *c, uint32_t events,
 		       const struct timespec *now)
+{
+	if (events & (EPOLLRDHUP | EPOLLHUP | EPOLLERR)) {
+		tap_sock_reset(c);
+		return;
+	}
+
+	if (events & EPOLLIN)
+		tap_passt_input(c, now);
+}
+
+/**
+ * tap_pasta_input() - Handler for new data on the socket to hypervisor
+ * @c:		Execution context
+ * @now:	Current timestamp
+ */
+static void tap_pasta_input(struct ctx *c, const struct timespec *now)
 {
 	ssize_t n, len;
 	int ret;
 
-	if (events & (EPOLLRDHUP | EPOLLHUP | EPOLLERR))
-		die("Disconnect event on /dev/net/tun device, exiting");
-
 redo:
 	n = 0;
 
@@ -1102,6 +1108,22 @@ restart:
 	die("Error on tap device, exiting");
 }
 
+/**
+ * tap_handler_pasta() - Packet handler for /dev/net/tun file descriptor
+ * @c:		Execution context
+ * @events:	epoll events
+ * @now:	Current timestamp
+ */
+void tap_handler_pasta(struct ctx *c, uint32_t events,
+		       const struct timespec *now)
+{
+	if (events & (EPOLLRDHUP | EPOLLHUP | EPOLLERR))
+		die("Disconnect event on /dev/net/tun device, exiting");
+
+	if (events & EPOLLIN)
+		tap_pasta_input(c, now);
+}
+
 /**
  * tap_sock_unix_open() - Create and bind AF_UNIX socket
  * @sock_path:	Socket path. If empty, set on return (UNIX_SOCK_PATH as prefix)
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/4] tap: Improve handling of EINTR in tap_passt_input()
  2024-09-06 11:49 [PATCH 0/4] Fix possible truncation of frames from /dev/net/tun David Gibson
  2024-09-06 11:49 ` [PATCH 1/4] tap: Split out handling of EPOLLIN events David Gibson
@ 2024-09-06 11:49 ` David Gibson
  2024-09-06 11:49 ` [PATCH 3/4] tap: Restructure in tap_pasta_input() David Gibson
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: David Gibson @ 2024-09-06 11:49 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

When tap_passt_input() gets an error from recv() it (correctly) does not
print any error message for EINTR, EAGAIN or EWOULDBLOCK.  However in all
three cases it returns from the function.  That makes sense for EAGAIN and
EWOULDBLOCK, since we then want to wait for the next EPOLLIN event before
trying again.  For EINTR, however, it makes more sense to retry immediately
- as it stands we're likely to get a renewer EPOLLIN event immediately in
that case, since we're using level triggered signalling.

So, handle EINTR separately by immediately retrying until we succeed or
get a different type of error.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 tap.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/tap.c b/tap.c
index c8abc068..8977b3f2 100644
--- a/tap.c
+++ b/tap.c
@@ -1003,10 +1003,13 @@ static void tap_passt_input(struct ctx *c, const struct timespec *now)
 		memmove(pkt_buf, partial_frame, partial_len);
 	}
 
-	n = recv(c->fd_tap, pkt_buf + partial_len, TAP_BUF_BYTES - partial_len,
-		 MSG_DONTWAIT);
+	do {
+		n = recv(c->fd_tap, pkt_buf + partial_len,
+			 TAP_BUF_BYTES - partial_len, MSG_DONTWAIT);
+	} while ((n < 0) && errno == EINTR);
+
 	if (n < 0) {
-		if (errno != EINTR && errno != EAGAIN && errno != EWOULDBLOCK) {
+		if (errno != EAGAIN && errno != EWOULDBLOCK) {
 			err_perror("Receive error on guest connection, reset");
 			tap_sock_reset(c);
 		}
-- 
@@ -1003,10 +1003,13 @@ static void tap_passt_input(struct ctx *c, const struct timespec *now)
 		memmove(pkt_buf, partial_frame, partial_len);
 	}
 
-	n = recv(c->fd_tap, pkt_buf + partial_len, TAP_BUF_BYTES - partial_len,
-		 MSG_DONTWAIT);
+	do {
+		n = recv(c->fd_tap, pkt_buf + partial_len,
+			 TAP_BUF_BYTES - partial_len, MSG_DONTWAIT);
+	} while ((n < 0) && errno == EINTR);
+
 	if (n < 0) {
-		if (errno != EINTR && errno != EAGAIN && errno != EWOULDBLOCK) {
+		if (errno != EAGAIN && errno != EWOULDBLOCK) {
 			err_perror("Receive error on guest connection, reset");
 			tap_sock_reset(c);
 		}
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 3/4] tap: Restructure in tap_pasta_input()
  2024-09-06 11:49 [PATCH 0/4] Fix possible truncation of frames from /dev/net/tun David Gibson
  2024-09-06 11:49 ` [PATCH 1/4] tap: Split out handling of EPOLLIN events David Gibson
  2024-09-06 11:49 ` [PATCH 2/4] tap: Improve handling of EINTR in tap_passt_input() David Gibson
@ 2024-09-06 11:49 ` David Gibson
  2024-09-06 11:49 ` [PATCH 4/4] tap: Don't risk truncating frames on full buffer " David Gibson
  2024-09-06 12:20 ` [PATCH 0/4] Fix possible truncation of frames from /dev/net/tun Stefano Brivio
  4 siblings, 0 replies; 6+ messages in thread
From: David Gibson @ 2024-09-06 11:49 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

tap_pasta_input() has a rather confusing structure, using two gotos.
Remove these by restructuring the function to have the main loop condition
based on filling our buffer space, with errors or running out of data
treated as the exception, rather than the other way around.  This allows
us to handle the EINTR which triggered the 'restart' goto with a continue.

The outer 'redo' was triggered if we completely filled our buffer, to flush
it and do another pass.  This one is unnecessary since we don't (yet) use
EPOLLET on the tap device: if there's still more data we'll get another
event and re-enter the loop.

Along the way handle a couple of extra edge cases:
 - Check for EWOULDBLOCK as well as EAGAIN for the benefit of any future
   ports where those might not have the same value
 - Detect EOF on the tap device and exit in that case

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 tap.c | 45 +++++++++++++++++++--------------------------
 1 file changed, 19 insertions(+), 26 deletions(-)

diff --git a/tap.c b/tap.c
index 8977b3f2..145587fc 100644
--- a/tap.c
+++ b/tap.c
@@ -1073,42 +1073,35 @@ void tap_handler_passt(struct ctx *c, uint32_t events,
 static void tap_pasta_input(struct ctx *c, const struct timespec *now)
 {
 	ssize_t n, len;
-	int ret;
-
-redo:
-	n = 0;
 
 	tap_flush_pools();
-restart:
-	while ((len = read(c->fd_tap, pkt_buf + n, TAP_BUF_BYTES - n)) > 0) {
 
-		if (len < (ssize_t)sizeof(struct ethhdr) ||
-		    len > (ssize_t)ETH_MAX_MTU) {
-			n += len;
-			continue;
-		}
+	for (n = 0; n < (ssize_t)TAP_BUF_BYTES; n += len) {
+		len = read(c->fd_tap, pkt_buf + n, TAP_BUF_BYTES - n);
 
+		if (len == 0) {
+			die("EOF on tap device, exiting");
+		} else if (len < 0) {
+			if (errno == EINTR) {
+				len = 0;
+				continue;
+			}
 
-		tap_add_packet(c, len, pkt_buf + n);
+			if (errno == EAGAIN && errno == EWOULDBLOCK)
+				break; /* all done for now */
 
-		if ((n += len) == TAP_BUF_BYTES)
-			break;
-	}
+			die("Error on tap device, exiting");
+		}
 
-	if (len < 0 && errno == EINTR)
-		goto restart;
+		/* Ignore frames of bad length */
+		if (len < (ssize_t)sizeof(struct ethhdr) ||
+		    len > (ssize_t)ETH_MAX_MTU)
+			continue;
 
-	ret = errno;
+		tap_add_packet(c, len, pkt_buf + n);
+	}
 
 	tap_handler(c, now);
-
-	if (len > 0 || ret == EAGAIN)
-		return;
-
-	if (n == TAP_BUF_BYTES)
-		goto redo;
-
-	die("Error on tap device, exiting");
 }
 
 /**
-- 
@@ -1073,42 +1073,35 @@ void tap_handler_passt(struct ctx *c, uint32_t events,
 static void tap_pasta_input(struct ctx *c, const struct timespec *now)
 {
 	ssize_t n, len;
-	int ret;
-
-redo:
-	n = 0;
 
 	tap_flush_pools();
-restart:
-	while ((len = read(c->fd_tap, pkt_buf + n, TAP_BUF_BYTES - n)) > 0) {
 
-		if (len < (ssize_t)sizeof(struct ethhdr) ||
-		    len > (ssize_t)ETH_MAX_MTU) {
-			n += len;
-			continue;
-		}
+	for (n = 0; n < (ssize_t)TAP_BUF_BYTES; n += len) {
+		len = read(c->fd_tap, pkt_buf + n, TAP_BUF_BYTES - n);
 
+		if (len == 0) {
+			die("EOF on tap device, exiting");
+		} else if (len < 0) {
+			if (errno == EINTR) {
+				len = 0;
+				continue;
+			}
 
-		tap_add_packet(c, len, pkt_buf + n);
+			if (errno == EAGAIN && errno == EWOULDBLOCK)
+				break; /* all done for now */
 
-		if ((n += len) == TAP_BUF_BYTES)
-			break;
-	}
+			die("Error on tap device, exiting");
+		}
 
-	if (len < 0 && errno == EINTR)
-		goto restart;
+		/* Ignore frames of bad length */
+		if (len < (ssize_t)sizeof(struct ethhdr) ||
+		    len > (ssize_t)ETH_MAX_MTU)
+			continue;
 
-	ret = errno;
+		tap_add_packet(c, len, pkt_buf + n);
+	}
 
 	tap_handler(c, now);
-
-	if (len > 0 || ret == EAGAIN)
-		return;
-
-	if (n == TAP_BUF_BYTES)
-		goto redo;
-
-	die("Error on tap device, exiting");
 }
 
 /**
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 4/4] tap: Don't risk truncating frames on full buffer in tap_pasta_input()
  2024-09-06 11:49 [PATCH 0/4] Fix possible truncation of frames from /dev/net/tun David Gibson
                   ` (2 preceding siblings ...)
  2024-09-06 11:49 ` [PATCH 3/4] tap: Restructure in tap_pasta_input() David Gibson
@ 2024-09-06 11:49 ` David Gibson
  2024-09-06 12:20 ` [PATCH 0/4] Fix possible truncation of frames from /dev/net/tun Stefano Brivio
  4 siblings, 0 replies; 6+ messages in thread
From: David Gibson @ 2024-09-06 11:49 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

tap_pasta_input() keeps reading frames from the tap device until the
buffer is full.  However, this has an ugly edge case, when we get close
to buffer full, we will provide just the remaining space as a read()
buffer.  If this is shorter than the next frame to read, the tap device
will truncate the frame and discard the remainder.

Adjust the code to make sure we always have room for a maximum size frame.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 tap.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tap.c b/tap.c
index 145587fc..41af6a6d 100644
--- a/tap.c
+++ b/tap.c
@@ -1076,8 +1076,8 @@ static void tap_pasta_input(struct ctx *c, const struct timespec *now)
 
 	tap_flush_pools();
 
-	for (n = 0; n < (ssize_t)TAP_BUF_BYTES; n += len) {
-		len = read(c->fd_tap, pkt_buf + n, TAP_BUF_BYTES - n);
+	for (n = 0; n <= (ssize_t)TAP_BUF_BYTES - ETH_MAX_MTU; n += len) {
+		len = read(c->fd_tap, pkt_buf + n, ETH_MAX_MTU);
 
 		if (len == 0) {
 			die("EOF on tap device, exiting");
-- 
@@ -1076,8 +1076,8 @@ static void tap_pasta_input(struct ctx *c, const struct timespec *now)
 
 	tap_flush_pools();
 
-	for (n = 0; n < (ssize_t)TAP_BUF_BYTES; n += len) {
-		len = read(c->fd_tap, pkt_buf + n, TAP_BUF_BYTES - n);
+	for (n = 0; n <= (ssize_t)TAP_BUF_BYTES - ETH_MAX_MTU; n += len) {
+		len = read(c->fd_tap, pkt_buf + n, ETH_MAX_MTU);
 
 		if (len == 0) {
 			die("EOF on tap device, exiting");
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/4] Fix possible truncation of frames from /dev/net/tun
  2024-09-06 11:49 [PATCH 0/4] Fix possible truncation of frames from /dev/net/tun David Gibson
                   ` (3 preceding siblings ...)
  2024-09-06 11:49 ` [PATCH 4/4] tap: Don't risk truncating frames on full buffer " David Gibson
@ 2024-09-06 12:20 ` Stefano Brivio
  4 siblings, 0 replies; 6+ messages in thread
From: Stefano Brivio @ 2024-09-06 12:20 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Fri,  6 Sep 2024 21:49:35 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> These changes started off as part of the series re-introducing EPOLLET
> for tap event handling.  That's now turned out to be of lower
> priority, but along the way we fixed a bug where we could truncate
> frames from the kernel tap interface.
> 
> This is a respin of that patch, plus a few minor preliminary cleanups.
> Various minor tweaks based on feedback from the original posting as
> part of the tap EPOLLET series.
> 
> David Gibson (4):
>   tap: Split out handling of EPOLLIN events
>   tap: Improve handling of EINTR in tap_passt_input()
>   tap: Restructure in tap_pasta_input()
>   tap: Don't risk truncating frames on full buffer in tap_pasta_input()

Applied.

-- 
Stefano


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-09-06 12:20 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-09-06 11:49 [PATCH 0/4] Fix possible truncation of frames from /dev/net/tun David Gibson
2024-09-06 11:49 ` [PATCH 1/4] tap: Split out handling of EPOLLIN events David Gibson
2024-09-06 11:49 ` [PATCH 2/4] tap: Improve handling of EINTR in tap_passt_input() David Gibson
2024-09-06 11:49 ` [PATCH 3/4] tap: Restructure in tap_pasta_input() David Gibson
2024-09-06 11:49 ` [PATCH 4/4] tap: Don't risk truncating frames on full buffer " David Gibson
2024-09-06 12:20 ` [PATCH 0/4] Fix possible truncation of frames from /dev/net/tun Stefano Brivio

Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).