From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by passt.top (Postfix) with ESMTPS id B651F5A004F for ; Fri, 26 Jul 2024 09:20:37 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202312; t=1721978433; bh=OdBo8wGuvPXLZ6cHeeLYzEwQsfRo6EqJRRJZcQW8dtA=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=G/OoADCEgk9psVgapxNtAJxDsPdabBd7xkjobtC3tkepjOaB2jwyDa12eyDIA8uRE b603K7gxHMj3i0oy8uzV2tVwE73GHPCeuONDebof4DJz7w8A6fGvYVakUyDJz2iYIc v+Pst/oOHMV4tX/hcUvFYk4O3Sz1vcj1IIws4psN5rWK/aaWeO4eEiP1pydMnmRThs ky3LJKS0+2+wwb1D+fMz+k3qitIHFE8y7pvBChRx2FXaWIhFPoYkA3B63Wt5Q7Few3 3blF9RMWspRNHQuoQwgSc7Hkm6jMdZ19ENPoqZNRDzFkYE2Rx9IgmcZvvfQEmR52yY iMFKqtFOE5cFQ== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4WVfKs0qC8z4x42; Fri, 26 Jul 2024 17:20:33 +1000 (AEST) From: David Gibson To: passt-dev@passt.top, Stefano Brivio Subject: [PATCH 3/5] tap: Don't use EPOLLET on Qemu sockets Date: Fri, 26 Jul 2024 17:20:29 +1000 Message-ID: <20240726072031.3941305-4-david@gibson.dropbear.id.au> X-Mailer: git-send-email 2.45.2 In-Reply-To: <20240726072031.3941305-1-david@gibson.dropbear.id.au> References: <20240726072031.3941305-1-david@gibson.dropbear.id.au> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Message-ID-Hash: H4A6E5OI6YSGUN6JMDI64IRFJHMKXARR X-Message-ID-Hash: H4A6E5OI6YSGUN6JMDI64IRFJHMKXARR X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: David Gibson X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: Currently we set EPOLLET (edge trigger) on the epoll flags for the connected Qemu Unix socket. It's not clear that there's a reason for doing this: for TCP sockets we need to use EPOLLET, because we leave data in the socket buffers for our flow control handling. That consideration doesn't apply to the way we handle the qemu socket however. Furthermore, using EPOLLET causes additional complications: 1) We don't set EPOLLET when opening /dev/net/tun for pasta mode, however we *do* set it when using pasta mode with --fd. This inconsistency doesn't seem to have broken anything, but it's odd. 2) EPOLLET requires that tap_handler_passt() loop until all data available is read (otherwise we may have data in the buffer but never get an event causing us to read it). We do that with a rather ugly goto. Worse, our condition for that goto appears to be incorrect. We'll only loop if rem is non-zero, which will only happen if we perform a blocking recv() for a partially received frame. We'll only perform that second recv() if the original recv() resulted in a partially read frame. As far as I can tell the original recv() could end on a frame boundary (never triggering the second recv()) even if there is additional data in the socket buffer. In that circumstance we wouldn't goto redo and could leave unprocessed frames in the qemu socket buffer indefinitely. This doesn't seem to have caused any problems in practice, but since there's no obvious reason to use EPOLLET here anyway, we might as well get rid of it. Signed-off-by: David Gibson --- tap.c | 14 ++++---------- 1 file changed, 4 insertions(+), 10 deletions(-) diff --git a/tap.c b/tap.c index 08700f65..df1287ad 100644 --- a/tap.c +++ b/tap.c @@ -989,7 +989,7 @@ static void tap_sock_reset(struct ctx *c) void tap_handler_passt(struct ctx *c, uint32_t events, const struct timespec *now) { - ssize_t n, rem; + ssize_t n; char *p; if (events & (EPOLLRDHUP | EPOLLHUP | EPOLLERR)) { @@ -997,9 +997,7 @@ void tap_handler_passt(struct ctx *c, uint32_t events, return; } -redo: p = pkt_buf; - rem = 0; tap_flush_pools(); @@ -1028,7 +1026,7 @@ redo: * needs to be blocking. */ if (l2len > n) { - rem = recv(c->fd_tap, p + n, l2len - n, 0); + ssize_t rem = recv(c->fd_tap, p + n, l2len - n, 0); if ((n += rem) != l2len) return; } @@ -1040,10 +1038,6 @@ redo: } tap_handler(c, now); - - /* We can't use EPOLLET otherwise. */ - if (rem) - goto redo; } /** @@ -1228,7 +1222,7 @@ void tap_listen_handler(struct ctx *c, uint32_t events) trace("tap: failed to set SO_SNDBUF to %i", v); ref.fd = c->fd_tap; - ev.events = EPOLLIN | EPOLLET | EPOLLRDHUP; + ev.events = EPOLLIN | EPOLLRDHUP; ev.data.u64 = ref.u64; epoll_ctl(c->epollfd, EPOLL_CTL_ADD, c->fd_tap, &ev); } @@ -1317,7 +1311,7 @@ void tap_sock_init(struct ctx *c) else ref.type = EPOLL_TYPE_TAP_PASTA; - ev.events = EPOLLIN | EPOLLET | EPOLLRDHUP; + ev.events = EPOLLIN | EPOLLRDHUP; ev.data.u64 = ref.u64; epoll_ctl(c->epollfd, EPOLL_CTL_ADD, c->fd_tap, &ev); return; -- 2.45.2