public inbox for passt-dev@passt.top
From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: passt-dev@passt.top
Subject: Re: [PATCH v2 06/12] packet: Don't hard code maximum packet size to UINT16_MAX
Date: Thu, 2 Jan 2025 12:00:30 +1100
Message-ID: <Z3XlLkHJpSCdGZ1L@zatzit>
In-Reply-To: <20250101225433.45f52b86@elisabeth>

On Wed, Jan 01, 2025 at 10:54:33PM +0100, Stefano Brivio wrote:
> On Fri, 20 Dec 2024 19:35:29 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > We verify that every packet we store in a pool - and every partial packet
> > we retrieve from it - has a length no longer than UINT16_MAX.  This
> > originated in the older packet pool implementation, which stored packet
> > lengths in a uint16_t.  Now that packets are represented by a struct
> > iovec with its size_t length, this check serves only as a sanity / security
> > check that we don't have some wildly out of range length due to a bug
> > elsewhere.
> > 
> > However, UINT16_MAX (65535) isn't quite enough, because the "packet" as
> > stored in the pool is in fact an entire frame including both L2 and any
> > backend specific headers.  We can exceed this in passt mode, even with the
> > default MTU: 65520 bytes of IP datagram + 14 bytes of Ethernet header +
> > 4 bytes of qemu stream length header = 65538 bytes.
> > 
> > Introduce our own define for the maximum length of a packet in the pool and
> > set it slightly larger, allowing 128 bytes for L2 and/or other backend
> > specific headers.  We'll use different amounts of that depending on the
> > tap backend, but since this is just a sanity check, the bound doesn't need
> > to be 100% tight.
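A minimal C sketch of the arithmetic above, assuming a 14-byte Ethernet header and the 4-byte qemu stream length prefix. The EXAMPLE_* macro names and the helper functions are hypothetical, not the names used in the patch:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sizes from the calculation above: default passt MTU, Ethernet header,
 * and the 4-byte qemu stream length prefix */
#define EXAMPLE_DEFAULT_MTU   65520u
#define EXAMPLE_ETH_HLEN      14u
#define EXAMPLE_QEMU_LEN_HDR  4u

/* Hypothetical name for the relaxed bound: UINT16_MAX plus 128 bytes of
 * headroom for L2 and/or backend-specific headers */
#define EXAMPLE_PACKET_MAX_LEN ((size_t)UINT16_MAX + 128)

/* Largest frame stored in the pool in passt mode with the default MTU */
static size_t passt_default_frame_len(void)
{
	return (size_t)EXAMPLE_DEFAULT_MTU + EXAMPLE_ETH_HLEN +
	       EXAMPLE_QEMU_LEN_HDR;
}

/* Old sanity check: the frame length must fit in a uint16_t */
static bool fits_old_bound(size_t len)
{
	return len <= UINT16_MAX;
}

/* New sanity check: the frame length must be within the relaxed bound */
static bool fits_new_bound(size_t len)
{
	return len <= EXAMPLE_PACKET_MAX_LEN;
}
```

The 65538-byte frame fails the old UINT16_MAX check by three bytes, but fits comfortably within the 65663-byte relaxed bound.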
> 
> I couldn't find the time to check what's the maximum amount of bytes we
> can get here depending on hypervisor and interface, but if this patch

So, it's a separate calculation for each backend type, and some of
them are pretty tricky.

For anything based on the kernel tap device it is 65535, because it
has an internal frame size limit of 65535, already including any L2
headers (it explicitly limits the MTU to 65535 - hard_header_len).
There is no "hardware" header.
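As a rough illustration of the tap case, here is a sketch computing the MTU cap from a hard_header_len argument. The 65535 frame limit and the MTU clamp come from the description above; the helper names are made up for illustration:

```c
#include <stddef.h>

/* Internal frame size limit of the kernel tap device, L2 headers included */
#define EXAMPLE_TAP_FRAME_MAX 65535u

/* Maximum MTU the tap device allows for a given hard_header_len */
static unsigned int tap_max_mtu(unsigned int hard_header_len)
{
	return EXAMPLE_TAP_FRAME_MAX - hard_header_len;
}

/* Largest frame the tap backend can hand us: MTU plus L2 header, with no
 * additional "hardware" header in front of it */
static size_t tap_max_frame(unsigned int hard_header_len)
{
	return (size_t)tap_max_mtu(hard_header_len) + hard_header_len;
}
```

With an Ethernet tap (hard_header_len of 14), this gives an MTU cap of 65521. The whole frame, L2 header included, never exceeds 65535.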

For the qemu stream protocol it gets pretty complicated, because there
are multiple layers which could clamp the maximum size.  It doesn't
look like the socket protocol code itself imposes a limit beyond the
structural one of (2^32-1 + 4) (well, and putting it into an ssize_t,
which could be less for 32-bit systems).  AFAICT, it's not
theoretically impossible to have gigabyte frames with a weird virtual
NIC model... though obviously that wouldn't be IP, and probably not
even Ethernet.
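For the qemu stream protocol, here is a sketch that decodes the 4-byte length prefix and computes the structural bound mentioned above. It assumes the prefix is a 32-bit length in network byte order, and the helper names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Size of the qemu stream protocol length prefix */
#define EXAMPLE_QEMU_LEN_HDR 4u

/* Decode the length prefix (assumed big-endian) into the length of the
 * L2 frame that follows it */
static uint32_t qemu_stream_frame_len(const uint8_t hdr[EXAMPLE_QEMU_LEN_HDR])
{
	return (uint32_t)hdr[0] << 24 | (uint32_t)hdr[1] << 16 |
	       (uint32_t)hdr[2] << 8 | (uint32_t)hdr[3];
}

/* Structural upper bound on prefix + frame: 2^32 - 1 + 4 bytes.
 * Computed in 64 bits, because it doesn't fit in a 32-bit type */
static uint64_t qemu_stream_max_record(void)
{
	return (uint64_t)UINT32_MAX + EXAMPLE_QEMU_LEN_HDR;
}
```

The bound is computed as uint64_t because 2^32 - 1 + 4 overflows 32 bits. For the same reason, a 32-bit ssize_t can't represent it.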

Each virtual NIC could have its own limit.  I suspect that's going to
be in the vicinity of 64k.  But I'm really struggling to figure out
what it is just for virtio-net, so I really don't want to try to
figure it out for all of them.  With a virtio-net NIC, I seem to be
able to set MTU all the way up to 65535 successfully, which implies a
maximum frame size of 65535 + 14 (L2 header) + 4 (stream protocol
header) = 65553 at least.

It's a similar situation for vhost-user, where I'm finding it even
harder to figure out what limits are imposed at the sub-IP levels.  At
the moment the "hardware" header (virtio_net_hdr_mrg_rxbuf) doesn't
count towards what we store in the packet.c layer, but we might have
reasons to change that.

So I'm basically not managing to find any sub-IP limits for qemu.
However, we (more or less) only care about IP, which imposes a more
practical limit of 65535 + L2 header size + "hardware" header size.

At present that maxes out at 65553, as above, but if we ever support
other L2 encapsulations, or other backend protocols with larger
"hardware" headers, that could change.
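The IP-derived bound can be written as a one-line helper, shown here under a hypothetical name for illustration only. With a 14-byte Ethernet header plus the 4-byte qemu stream header, it gives 65553. For vhost-user, where virtio_net_hdr_mrg_rxbuf currently isn't stored with the packet, the "hardware" header term would be 0:

```c
#include <stddef.h>
#include <stdint.h>

/* Largest IP packet we expect to carry, per the reasoning above */
#define EXAMPLE_IP_MAX 65535u

/* IP-derived bound on a stored frame:
 * IP packet + L2 header + backend "hardware" header */
static size_t ip_frame_bound(size_t l2_hdr, size_t hw_hdr)
{
	return (size_t)EXAMPLE_IP_MAX + l2_hdr + hw_hdr;
}
```

Other L2 encapsulations or backends with larger "hardware" headers would only change the two arguments. The packet layer's bound just needs enough headroom to cover them.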

> fixes an actual issue as you seem to imply, actually checking that with
> QEMU and muvm would be nice.
> 
> By the way, as you mention a specific calculation, does it really make
> sense to use a "good enough" value here? Can we ever exceed 65538
> bytes, or can we use that as limit? It would be good to find out, while
> at it.

So, yes, I think we can exceed 65538.  But more significantly, trying
to make the limit tight here feels like a borderline layering
violation.  The packet layer doesn't really care about the frame size
as long as it's "sane".  FWIW, in the draft changes I have for improving
MTU handling, it's my intention that individual backends calculate
and/or enforce tighter limits of their own where practical, and
BUILD_ASSERT() that those fit within the packet layer's frame size
limit.
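Here is a sketch of that split: a backend states its own tighter limit and checks at build time that it fits within the packet layer's bound. EXAMPLE_BUILD_ASSERT() is a stand-in built on C11 static_assert, not passt's actual BUILD_ASSERT() definition. All the limit names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet-layer bound: UINT16_MAX plus 128 bytes of headroom */
#define EXAMPLE_PACKET_MAX_LEN ((size_t)UINT16_MAX + 128)

/* Stand-in for a BUILD_ASSERT() macro, using C11 static_assert */
#define EXAMPLE_BUILD_ASSERT(expr) static_assert((expr), #expr)

/* Hypothetical backend limit: qemu stream socket carrying Ethernet */
#define EXAMPLE_ETH_HLEN      14u
#define EXAMPLE_QEMU_LEN_HDR  4u
#define EXAMPLE_QEMU_FRAME_MAX \
	((size_t)65535 + EXAMPLE_ETH_HLEN + EXAMPLE_QEMU_LEN_HDR)

/* Compilation fails if the backend limit outgrows the packet layer's */
EXAMPLE_BUILD_ASSERT(EXAMPLE_QEMU_FRAME_MAX <= EXAMPLE_PACKET_MAX_LEN);

/* Tighter runtime check the backend could apply to each received frame */
static int qemu_frame_len_ok(size_t len)
{
	return len <= EXAMPLE_QEMU_FRAME_MAX;
}
```

This keeps the packet layer's check loose ("sane"), while each backend owns the precise limit that only it knows how to derive.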

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson
