Date: Fri, 26 Jul 2024 22:33:09 +1000
From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio
Cc: passt-dev@passt.top
Subject: Re: [PATCH 5/5] tap: Improve handling of partially received frames on qemu socket
References: <20240726072031.3941305-1-david@gibson.dropbear.id.au>
 <20240726072031.3941305-6-david@gibson.dropbear.id.au>
 <20240726133913.6cfa11f5@elisabeth>
In-Reply-To: <20240726133913.6cfa11f5@elisabeth>

On Fri, Jul 26, 2024 at 01:39:13PM +0200, Stefano Brivio wrote:
> On Fri, 26 Jul 2024 17:20:31 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > Because the Unix socket to qemu is a stream socket, we have no guarantee
> > of where the boundaries between recv() calls will lie.  Typically they
> > will lie on frame boundaries, because that's how qemu will send them, but
> > we can't rely on it.
> > 
> > Currently we handle this case by detecting when we have received a partial
> > frame and performing a blocking recv() to get the remainder, and only then
> > processing the frames.  Change it so instead we save the partial frame
> > persistently and include it as the first thing processed next time we
> > receive data from the socket.
> > This handles a number of (unlikely) cases which previously would not be
> > dealt with correctly:
> > 
> > * If qemu sent a partial frame then waited some time before sending the
> >   remainder, previously we could block here for an unacceptably long time
> > * If qemu sent a tiny partial frame (< 4 bytes) we'd leave the loop without
> >   doing the partial frame handling, which would put us out of sync with
> >   the stream from qemu
> > * If the blocking recv() only received some of the remainder of the
> >   frame, not all of it, we'd return leaving us out of sync with the
> >   stream again
> > 
> > Caveat: This could memmove() a moderate amount of data (ETH_MAX_MTU).  This
> > is probably acceptable because it's an unlikely case in practice.  If
> > necessary we could mitigate this by using a true ring buffer.
> 
> I don't think that that memmove() is a problem if we have a single
> recv(), even if it happens to be one memmove() for every recv() (guest
> filling up the buffer, common in throughput tests and bulk transfers),
> because it's very small in relative terms anyway.
> 
> I think the ring buffer would be worth it with multiple recv() calls per
> epoll wakeup, with EPOLLET.

So first, as noted on the earlier patch, I don't think multiple recv()s
per wakeup requires EPOLLET, though the reverse is true.

Regardless, AFAICT the proportion of memmove()s to data received would
not vary whether we do multiple recv()s per wakeup or the same number
of recv()s split across multiple wakeups.

Of course, if the recv()s line up with frame boundaries, as we expect,
then it doesn't matter anyway, since we won't get partial frames and we
won't need memmove()s.

-- 
David Gibson (he or they)       | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you, not the other way
                                | around.
http://www.ozlabs.org/~dgibson
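
For illustration, the carry-over approach described in the quoted patch
above boils down to something like the sketch below.  This is not the
passt implementation: the buffer size, the qemu_sock_handler() and
tap_frame() names, and the handling of the 4-byte network-order length
prefix are assumptions made for the sake of the example.

/*
 * Rough sketch of the partial-frame carry-over discussed above -- not
 * the passt code.  Buffer size, tap_frame() and the length-prefix
 * handling are illustrative assumptions.
 */
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>		/* ntohl() */
#include <sys/types.h>
#include <sys/socket.h>		/* recv(), MSG_DONTWAIT */

#define BUF_SIZE	(128 * 1024)

static char buf[BUF_SIZE];
static size_t partial_len;	/* bytes of a trailing partial frame */

void tap_frame(const char *frame, size_t len);	/* hypothetical handler */

/* Called on EPOLLIN for the stream socket to qemu */
void qemu_sock_handler(int fd)
{
	ssize_t n = recv(fd, buf + partial_len, sizeof(buf) - partial_len,
			 MSG_DONTWAIT);
	size_t len, off = 0;

	if (n <= 0)
		return;			/* error handling elided */

	len = partial_len + (size_t)n;

	/* Consume every complete length-prefixed frame in the buffer */
	while (len - off >= sizeof(uint32_t)) {
		uint32_t flen;

		memcpy(&flen, buf + off, sizeof(flen));
		flen = ntohl(flen);	/* sanity checks on flen elided */

		if (len - off - sizeof(flen) < flen)
			break;		/* incomplete frame: keep it */

		tap_frame(buf + off + sizeof(flen), flen);
		off += sizeof(flen) + flen;
	}

	/*
	 * Save any trailing partial frame (possibly less than a full
	 * length header, the "< 4 bytes" case above) at the start of the
	 * buffer, so it is the first thing completed and processed on
	 * the next wakeup.
	 */
	partial_len = len - off;
	memmove(buf, buf + off, partial_len);
}

A true ring buffer, as the caveat mentions, would avoid the final
memmove() at the cost of either handling frames that wrap around the
end of the buffer or keeping enough slack that a frame can always be
stored contiguously.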