From: David Gibson
To: passt-dev@passt.top, Stefano Brivio
Cc: David Gibson
Subject: [PATCH 5/5] tap: Improve handling of partially received frames on qemu socket
Date: Fri, 26 Jul 2024 17:20:31 +1000
Message-ID: <20240726072031.3941305-6-david@gibson.dropbear.id.au>
In-Reply-To: <20240726072031.3941305-1-david@gibson.dropbear.id.au>
References: <20240726072031.3941305-1-david@gibson.dropbear.id.au>

Because the Unix
socket to qemu is a stream socket, we have no guarantee of where the
boundaries between recv() calls will lie.  Typically they will lie on
frame boundaries, because that's how qemu will send them, but we can't
rely on it.

Currently we handle this case by detecting when we have received a
partial frame and performing a blocking recv() to get the remainder,
and only then processing the frames.  Change it so that instead we save
the partial frame persistently and include it as the first thing
processed next time we receive data from the socket.

This handles a number of (unlikely) cases which previously would not be
dealt with correctly:

 * If qemu sent a partial frame then waited some time before sending the
   remainder, previously we could block here for an unacceptably long
   time
 * If qemu sent a tiny partial frame (< 4 bytes) we'd leave the loop
   without doing the partial frame handling, which would put us out of
   sync with the stream from qemu
 * If the blocking recv() only received some of the remainder of the
   frame, not all of it, we'd return, leaving us out of sync with the
   stream again

Caveat: this could memmove() a moderate amount of data (up to
ETH_MAX_MTU bytes).  This is probably acceptable because it's an
unlikely case in practice.  If necessary we could mitigate it by using
a true ring buffer.
Signed-off-by: David Gibson
---
 passt.h |  1 -
 tap.c   | 36 +++++++++++++++++++++++-------------
 2 files changed, 23 insertions(+), 14 deletions(-)

diff --git a/passt.h b/passt.h
index 4cc2b6f0..d0f31a23 100644
--- a/passt.h
+++ b/passt.h
@@ -60,7 +60,6 @@ static_assert(sizeof(union epoll_ref) <= sizeof(union epoll_data),
 
 #define TAP_BUF_BYTES \
 	ROUND_DOWN(((ETH_MAX_MTU + sizeof(uint32_t)) * 128), PAGE_SIZE)
-#define TAP_BUF_FILL	(TAP_BUF_BYTES - ETH_MAX_MTU - sizeof(uint32_t))
 #define TAP_MSGS \
 	DIV_ROUND_UP(TAP_BUF_BYTES, ETH_ZLEN - 2 * ETH_ALEN + sizeof(uint32_t))
 
diff --git a/tap.c b/tap.c
index aca27bb9..18eec279 100644
--- a/tap.c
+++ b/tap.c
@@ -989,6 +989,8 @@ static void tap_sock_reset(struct ctx *c)
 void tap_handler_passt(struct ctx *c, uint32_t events,
 		       const struct timespec *now)
 {
+	static const char *partial_frame;
+	static ssize_t partial_len = 0;
 	ssize_t n;
 	char *p;
 
@@ -997,11 +999,18 @@ void tap_handler_passt(struct ctx *c, uint32_t events,
 		return;
 	}
 
-	p = pkt_buf;
-
 	tap_flush_pools();
 
-	n = recv(c->fd_tap, p, TAP_BUF_FILL, MSG_DONTWAIT);
+	if (partial_len) {
+		/* We have a partial frame from an earlier pass.  Move it to the
+		 * start of the buffer, top up with new data, then process all
+		 * of it.
+		 */
+		memmove(pkt_buf, partial_frame, partial_len);
+	}
+
+	n = recv(c->fd_tap, pkt_buf + partial_len, TAP_BUF_BYTES - partial_len,
+		 MSG_DONTWAIT);
 	if (n < 0) {
 		if (errno != EINTR && errno != EAGAIN && errno != EWOULDBLOCK) {
 			err_perror("Error receiving from QEMU socket");
@@ -1010,7 +1019,10 @@ void tap_handler_passt(struct ctx *c, uint32_t events,
 		return;
 	}
 
-	while (n > (ssize_t)sizeof(uint32_t)) {
+	p = pkt_buf;
+	n += partial_len;
+
+	while (n >= (ssize_t)sizeof(uint32_t)) {
 		uint32_t l2len = ntohl_unaligned(p);
 
 		if (l2len < sizeof(struct ethhdr) || l2len > ETH_MAX_MTU) {
@@ -1019,24 +1031,22 @@ void tap_handler_passt(struct ctx *c, uint32_t events,
 			return;
 		}
 
+		if (l2len + sizeof(uint32_t) > (size_t)n)
+			/* Leave this incomplete frame for later */
+			break;
+
 		p += sizeof(uint32_t);
 		n -= sizeof(uint32_t);
 
-		/* At most one packet might not fit in a single read, and this
-		 * needs to be blocking.
-		 */
-		if (l2len > n) {
-			ssize_t rem = recv(c->fd_tap, p + n, l2len - n, 0);
-			if ((n += rem) != l2len)
-				return;
-		}
-
 		tap_add_packet(c, l2len, p);
 
 		p += l2len;
 		n -= l2len;
 	}
 
+	partial_len = n;
+	partial_frame = p;
+
 	tap_handler(c, now);
 }
-- 
2.45.2