From: David Gibson <david@gibson.dropbear.id.au>
To: passt-dev@passt.top, Stefano Brivio <sbrivio@redhat.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Subject: [PATCH 5/5] tap: Improve handling of partially received frames on qemu socket
Date: Fri, 26 Jul 2024 17:20:31 +1000 [thread overview]
Message-ID: <20240726072031.3941305-6-david@gibson.dropbear.id.au> (raw)
In-Reply-To: <20240726072031.3941305-1-david@gibson.dropbear.id.au>
Because the Unix socket to qemu is a stream socket, we have no guarantee
of where the boundaries between recv() calls will lie. Typically they
will lie on frame boundaries, because that's how qemu will send then, but
we can't rely on it.
Currently we handle this case by detecting when we have received a partial
frame and performing a blocking recv() to get the remainder, and only then
processing the frames. Change it so instead we save the partial frame
persistently and include it as the first thing processed next time we
receive data from the socket. This handles a number of (unlikely) cases
which previously would not be dealt with correctly:
* If qemu sent a partial frame then waited some time before sending the
remainder, previously we could block here for an unacceptably long time
* If qemu sent a tiny partial frame (< 4 bytes) we'd leave the loop without
doing the partial frame handling, which would put us out of sync with
the stream from qemu
* If a the blocking recv() only received some of the remainder of the
frame, not all of it, we'd return leaving us out of sync with the
stream again
Caveat: This could memmove() a moderate amount of data (ETH_MAX_MTU). This
is probably acceptable because it's an unlikely case in practice. If
necessary we could mitigate this by using a true ring buffer.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
passt.h | 1 -
tap.c | 36 +++++++++++++++++++++++-------------
2 files changed, 23 insertions(+), 14 deletions(-)
diff --git a/passt.h b/passt.h
index 4cc2b6f0..d0f31a23 100644
--- a/passt.h
+++ b/passt.h
@@ -60,7 +60,6 @@ static_assert(sizeof(union epoll_ref) <= sizeof(union epoll_data),
#define TAP_BUF_BYTES \
ROUND_DOWN(((ETH_MAX_MTU + sizeof(uint32_t)) * 128), PAGE_SIZE)
-#define TAP_BUF_FILL (TAP_BUF_BYTES - ETH_MAX_MTU - sizeof(uint32_t))
#define TAP_MSGS \
DIV_ROUND_UP(TAP_BUF_BYTES, ETH_ZLEN - 2 * ETH_ALEN + sizeof(uint32_t))
diff --git a/tap.c b/tap.c
index aca27bb9..18eec279 100644
--- a/tap.c
+++ b/tap.c
@@ -989,6 +989,8 @@ static void tap_sock_reset(struct ctx *c)
void tap_handler_passt(struct ctx *c, uint32_t events,
const struct timespec *now)
{
+ static const char *partial_frame;
+ static ssize_t partial_len = 0;
ssize_t n;
char *p;
@@ -997,11 +999,18 @@ void tap_handler_passt(struct ctx *c, uint32_t events,
return;
}
- p = pkt_buf;
-
tap_flush_pools();
- n = recv(c->fd_tap, p, TAP_BUF_FILL, MSG_DONTWAIT);
+ if (partial_len) {
+ /* We have a partial frame from an earlier pass. Move it to the
+ * start of the buffer, top up with new data, then process all
+ * of it.
+ */
+ memmove(pkt_buf, partial_frame, partial_len);
+ }
+
+ n = recv(c->fd_tap, pkt_buf + partial_len, TAP_BUF_BYTES - partial_len,
+ MSG_DONTWAIT);
if (n < 0) {
if (errno != EINTR && errno != EAGAIN && errno != EWOULDBLOCK) {
err_perror("Error receiving from QEMU socket");
@@ -1010,7 +1019,10 @@ void tap_handler_passt(struct ctx *c, uint32_t events,
return;
}
- while (n > (ssize_t)sizeof(uint32_t)) {
+ p = pkt_buf;
+ n += partial_len;
+
+ while (n >= (ssize_t)sizeof(uint32_t)) {
uint32_t l2len = ntohl_unaligned(p);
if (l2len < sizeof(struct ethhdr) || l2len > ETH_MAX_MTU) {
@@ -1019,24 +1031,22 @@ void tap_handler_passt(struct ctx *c, uint32_t events,
return;
}
+ if (l2len + sizeof(uint32_t) > (size_t)n)
+ /* Leave this incomplete frame for later */
+ break;
+
p += sizeof(uint32_t);
n -= sizeof(uint32_t);
- /* At most one packet might not fit in a single read, and this
- * needs to be blocking.
- */
- if (l2len > n) {
- ssize_t rem = recv(c->fd_tap, p + n, l2len - n, 0);
- if ((n += rem) != l2len)
- return;
- }
-
tap_add_packet(c, l2len, p);
p += l2len;
n -= l2len;
}
+ partial_len = n;
+ partial_frame = p;
+
tap_handler(c, now);
}
--
@@ -989,6 +989,8 @@ static void tap_sock_reset(struct ctx *c)
void tap_handler_passt(struct ctx *c, uint32_t events,
const struct timespec *now)
{
+ static const char *partial_frame;
+ static ssize_t partial_len = 0;
ssize_t n;
char *p;
@@ -997,11 +999,18 @@ void tap_handler_passt(struct ctx *c, uint32_t events,
return;
}
- p = pkt_buf;
-
tap_flush_pools();
- n = recv(c->fd_tap, p, TAP_BUF_FILL, MSG_DONTWAIT);
+ if (partial_len) {
+ /* We have a partial frame from an earlier pass. Move it to the
+ * start of the buffer, top up with new data, then process all
+ * of it.
+ */
+ memmove(pkt_buf, partial_frame, partial_len);
+ }
+
+ n = recv(c->fd_tap, pkt_buf + partial_len, TAP_BUF_BYTES - partial_len,
+ MSG_DONTWAIT);
if (n < 0) {
if (errno != EINTR && errno != EAGAIN && errno != EWOULDBLOCK) {
err_perror("Error receiving from QEMU socket");
@@ -1010,7 +1019,10 @@ void tap_handler_passt(struct ctx *c, uint32_t events,
return;
}
- while (n > (ssize_t)sizeof(uint32_t)) {
+ p = pkt_buf;
+ n += partial_len;
+
+ while (n >= (ssize_t)sizeof(uint32_t)) {
uint32_t l2len = ntohl_unaligned(p);
if (l2len < sizeof(struct ethhdr) || l2len > ETH_MAX_MTU) {
@@ -1019,24 +1031,22 @@ void tap_handler_passt(struct ctx *c, uint32_t events,
return;
}
+ if (l2len + sizeof(uint32_t) > (size_t)n)
+ /* Leave this incomplete frame for later */
+ break;
+
p += sizeof(uint32_t);
n -= sizeof(uint32_t);
- /* At most one packet might not fit in a single read, and this
- * needs to be blocking.
- */
- if (l2len > n) {
- ssize_t rem = recv(c->fd_tap, p + n, l2len - n, 0);
- if ((n += rem) != l2len)
- return;
- }
-
tap_add_packet(c, l2len, p);
p += l2len;
n -= l2len;
}
+ partial_len = n;
+ partial_frame = p;
+
tap_handler(c, now);
}
--
2.45.2
next prev parent reply other threads:[~2024-07-26 7:20 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-26 7:20 [PATCH 0/5] Fix assorted errors in the Qemu socket tap receive handler David Gibson
2024-07-26 7:20 ` [PATCH 1/5] tap: Better report errors receiving from QEMU socket David Gibson
2024-07-26 11:25 ` Stefano Brivio
2024-07-26 11:50 ` Stefano Brivio
2024-07-26 12:02 ` David Gibson
2024-07-26 7:20 ` [PATCH 2/5] tap: Don't attempt to carry on if we get a bad frame length from qemu David Gibson
2024-07-26 11:26 ` Stefano Brivio
2024-07-26 7:20 ` [PATCH 3/5] tap: Don't use EPOLLET on Qemu sockets David Gibson
2024-07-26 8:00 ` Stefano Brivio
2024-07-26 10:44 ` Stefano Brivio
2024-07-26 12:12 ` David Gibson
2024-07-26 13:25 ` Stefano Brivio
2024-07-29 1:15 ` David Gibson
2024-07-26 7:20 ` [PATCH 4/5] tap: Correctly handle frames of odd length David Gibson
2024-07-26 7:20 ` David Gibson [this message]
2024-07-26 11:39 ` [PATCH 5/5] tap: Improve handling of partially received frames on qemu socket Stefano Brivio
2024-07-26 12:33 ` David Gibson
2024-07-26 13:19 ` [PATCH 0/5] Fix assorted errors in the Qemu socket tap receive handler Stefano Brivio
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240726072031.3941305-6-david@gibson.dropbear.id.au \
--to=david@gibson.dropbear.id.au \
--cc=passt-dev@passt.top \
--cc=sbrivio@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).