public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: passt-dev@passt.top
Subject: Re: [PATCH v2 11/12] tap: Don't size pool_tap[46] for the maximum number of packets
Date: Thu, 2 Jan 2025 14:46:45 +1100	[thread overview]
Message-ID: <Z3YMJXIpAifPqSJ2@zatzit> (raw)
In-Reply-To: <20250101225444.130c1034@elisabeth>

[-- Attachment #1: Type: text/plain, Size: 4370 bytes --]

On Wed, Jan 01, 2025 at 10:54:44PM +0100, Stefano Brivio wrote:
> On Fri, 20 Dec 2024 19:35:34 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > Currently we attempt to size pool_tap[46] so they have room for the maximum
> > possible number of packets that could fit in pkt_buf, TAP_MSGS.  However,
> > the calculation isn't quite correct: TAP_MSGS is based on ETH_ZLEN (60) as
> > the minimum possible L2 frame size.  But, we don't enforce that L2 frames
> > are at least ETH_ZLEN when we receive them from the tap backend, and since
> > we're dealing with virtual interfaces we don't have the physical Ethernet
> > limitations requiring that length.  Indeed it is possible to generate a
> > legitimate frame smaller than that (e.g. a zero-payload UDP/IPv4 frame on
> > the 'pasta' backend is only 42 bytes long).
> > 
> > It's also unclear if this limit is sufficient for vhost-user which isn't
> > limited by the size of pkt_buf as the other modes are.
> > 
> > We could attempt to correct the calculation, but that would leave us with
> > even larger arrays, which in practice rarely accumulate more than a handful
> > of packets.  So, instead, put an arbitrary cap on the number of packets we
> > can put in a batch, and if we run out of space, process and flush the
> > batch.
> 
> I ran a few more tests with this, keeping TAP_MSGS at 256, and in
> general I couldn't really see a difference in latency (especially for
> UDP streams with small packets) or throughput. Figures from short
> throughput tests (such as the ones from the test suite) look a bit more
> variable, but I don't have any statistically meaningful data.
> 
> Then I looked into how many messages we might have in the array without
> this change, and I realised that, with the throughput tests from the
> suite, we very easily exceed the 256 limit.

Ah, interesting.

> Perhaps surprisingly we get the highest buffer counts with TCP transfers
> and intermediate MTUs: we're at about 4000-5000 with 1500 bytes (and
> more like ~1000 with 1280 bytes) meaning that we move 6 to 8 megabytes
> in one shot, every 5-10ms (at 8 Gbps). With that kind of time interval,
> the extra system call overhead from forcibly flushing batches might
> become rather relevant.

Really?  I thought syscall overhead (as in the part that's
per-syscall, rather than per-work) was generally in the tens of µs
range, rather than the ms range.

But in any case, I'd be fine with upping the size of the array to 4k
or 8k based on that empirical data.  That's still much smaller than
the >150k we have now.

> With lower MTUs, it looks like we have a lower CPU load and
> transmissions are scheduled differently (resulting in smaller batches),
> but I didn't really trace things.

Ok.  I wonder if with the small MTUs we're hitting throughput
bottlenecks elsewhere which mean this particular path isn't
over-exercised.

> So I start thinking that this has the *potential* to introduce a
> performance regression in some cases and we shouldn't just assume that
> some arbitrary 256 limit is good enough. I didn't check with perf(1),
> though.
> 
> Right now that array takes, effectively, less than 100 KiB (it's ~5000
> copies of struct iovec, 16 bytes each), and in theory that could be
> ~2.5 MiB (at 161319 items). Even if we double or triple that (let's
> assume we use 2 * ETH_ALEN to keep it simple) it's not much... and will
> have no practical effect anyway.

Yeah, I guess.  I don't love the fact that currently for correctness
(not spuriously dropping packets) we rely on a fairly complex
calculation that's based on information from different layers: the
buffer size and enforcement is in the packet pool layer and is
independent of packet layout, but the minimum frame size comes from
the tap layer and depends quite specifically on which L2 encapsulation
we're using.

> All in all, I think we shouldn't change this limit without a deeper
> understanding of the practical impact. While this change doesn't bring
> any practical advantage, the current behaviour is somewhat tested by
> now, and a small limit isn't.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2025-01-02  3:47 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-20  8:35 [PATCH v2 00/12] Cleanups to packet pool handling and sizing David Gibson
2024-12-20  8:35 ` [PATCH v2 01/12] test focus David Gibson
2024-12-20  8:35 ` [PATCH v2 02/12] hack: stop on fail, but not perf fail David Gibson
2024-12-20  8:35 ` [PATCH v2 03/12] make passt dumpable David Gibson
2024-12-20  8:35 ` [PATCH v2 04/12] packet: Use flexible array member in struct pool David Gibson
2024-12-20  8:35 ` [PATCH v2 05/12] packet: Don't pass start and offset separately too packet_check_range() David Gibson
2024-12-20  8:35 ` [PATCH v2 06/12] packet: Don't hard code maximum packet size to UINT16_MAX David Gibson
2025-01-01 21:54   ` Stefano Brivio
2025-01-02  1:00     ` David Gibson
2025-01-02 21:59       ` Stefano Brivio
2025-01-03  1:16         ` David Gibson
2025-01-05 23:43           ` Stefano Brivio
2024-12-20  8:35 ` [PATCH v2 07/12] packet: Remove unhelpful packet_get_try() macro David Gibson
2025-01-01 21:54   ` Stefano Brivio
2025-01-02  2:15     ` David Gibson
2025-01-02 22:00       ` Stefano Brivio
2025-01-03  4:48         ` David Gibson
2025-01-06 10:55           ` Stefano Brivio
2024-12-20  8:35 ` [PATCH v2 08/12] util: Add abort_with_msg() and ASSERT_WITH_MSG() helpers David Gibson
2024-12-20  8:35 ` [PATCH v2 09/12] packet: Distinguish severities of different packet_{add,git}_do() errors David Gibson
2025-01-01 21:54   ` Stefano Brivio
2025-01-02  2:58     ` David Gibson
2025-01-02 22:00       ` Stefano Brivio
2025-01-03  5:06         ` David Gibson
2025-01-06 10:55           ` Stefano Brivio
2024-12-20  8:35 ` [PATCH v2 10/12] packet: Move packet length checks into packet_check_range() David Gibson
2024-12-20  8:35 ` [PATCH v2 11/12] tap: Don't size pool_tap[46] for the maximum number of packets David Gibson
2025-01-01 21:54   ` Stefano Brivio
2025-01-02  3:46     ` David Gibson [this message]
2025-01-02 22:00       ` Stefano Brivio
2025-01-03  6:06         ` David Gibson
2024-12-20  8:35 ` [PATCH v2 12/12] packet: More cautious checks to avoid pointer arithmetic UB David Gibson
2024-12-20  9:00 ` [PATCH v2 00/12] Cleanups to packet pool handling and sizing David Gibson
2024-12-20 10:06   ` Stefano Brivio

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z3YMJXIpAifPqSJ2@zatzit \
    --to=david@gibson.dropbear.id.au \
    --cc=passt-dev@passt.top \
    --cc=sbrivio@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).