From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: passt-dev@passt.top
Subject: Re: [PATCH 3/3] tap: Don't size pool_tap[46] for the maximum number of packets
Date: Fri, 20 Dec 2024 10:51:36 +0100 [thread overview]
Message-ID: <20241220105136.5ef4e018@elisabeth> (raw)
In-Reply-To: <Z2TEs6DN5VWmxVos@zatzit>
On Fri, 20 Dec 2024 12:13:23 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:
> On Thu, Dec 19, 2024 at 10:00:15AM +0100, Stefano Brivio wrote:
> > On Fri, 13 Dec 2024 23:01:56 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >
> > > Currently we attempt to size pool_tap[46] so they have room for the maximum
> > > possible number of packets that could fit in pkt_buf, TAP_MSGS. However,
> > > the calculation isn't quite correct: TAP_MSGS is based on ETH_ZLEN (60) as
> > > the minimum possible L2 frame size. But, we don't enforce that L2 frames
> > > are at least ETH_ZLEN when we receive them from the tap backend, and since
> > > we're dealing with virtual interfaces we don't have the physical Ethernet
> > > limitations requiring that length. Indeed it is possible to generate a
> > > legitimate frame smaller than that (e.g. a zero-payload UDP/IPv4 frame on
> > > the 'pasta' backend is only 42 bytes long).
> > >
> > > It's also unclear if this limit is sufficient for vhost-user which isn't
> > > limited by the size of pkt_buf as the other modes are.
> > >
> > > We could attempt to correct the calculation, but that would leave us with
> > > even larger arrays, which in practice rarely accumulate more than a handful
> > > of packets. So, instead, put an arbitrary cap on the number of packets we
> > > can put in a batch, and if we run out of space, process and flush the
> > > batch.
> > >
> > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > > ---
> > > packet.c | 13 ++++++++++++-
> > > packet.h | 3 +++
> > > passt.h | 2 --
> > > tap.c | 18 +++++++++++++++---
> > > tap.h | 3 ++-
> > > vu_common.c | 3 ++-
> > > 6 files changed, 34 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/packet.c b/packet.c
> > > index 5bfa7304..b68580cc 100644
> > > --- a/packet.c
> > > +++ b/packet.c
> > > @@ -22,6 +22,17 @@
> > > #include "util.h"
> > > #include "log.h"
> > >
> > > +/**
> > > + * pool_full() - Is a packet pool full?
> > > + * @p: Pointer to packet pool
> > > + *
> > > + * Return: true if the pool is full, false if more packets can be added
> > > + */
> > > +bool pool_full(const struct pool *p)
> > > +{
> > > + return p->count >= p->size;
> > > +}
> > > +
> > > /**
> > > * packet_add_do() - Add data as packet descriptor to given pool
> > > * @p: Existing pool
> > > @@ -35,7 +46,7 @@ void packet_add_do(struct pool *p, size_t len, const char *start,
> > > {
> > > size_t idx = p->count;
> > >
> > > - if (idx >= p->size) {
> > > + if (pool_full(p)) {
> > > trace("add packet index %zu to pool with size %zu, %s:%i",
> > > idx, p->size, func, line);
> > > return;
> > > diff --git a/packet.h b/packet.h
> > > index 98eb8812..3618f213 100644
> > > --- a/packet.h
> > > +++ b/packet.h
> > > @@ -6,6 +6,8 @@
> > > #ifndef PACKET_H
> > > #define PACKET_H
> > >
> > > +#include <stdbool.h>
> > > +
> > > /**
> > > * struct pool - Generic pool of packets stored in nmemory
> > > * @size: Number of usable descriptors for the pool
> > > @@ -23,6 +25,7 @@ void packet_add_do(struct pool *p, size_t len, const char *start,
> > > void *packet_get_do(const struct pool *p, const size_t idx,
> > > size_t offset, size_t len, size_t *left,
> > > const char *func, int line);
> > > +bool pool_full(const struct pool *p);
> > > void pool_flush(struct pool *p);
> > >
> > > #define packet_add(p, len, start) \
> > > diff --git a/passt.h b/passt.h
> > > index 0dd4efa0..81b2787f 100644
> > > --- a/passt.h
> > > +++ b/passt.h
> > > @@ -70,8 +70,6 @@ static_assert(sizeof(union epoll_ref) <= sizeof(union epoll_data),
> > >
> > > #define TAP_BUF_BYTES \
> > > ROUND_DOWN(((ETH_MAX_MTU + sizeof(uint32_t)) * 128), PAGE_SIZE)
> > > -#define TAP_MSGS \
> > > - DIV_ROUND_UP(TAP_BUF_BYTES, ETH_ZLEN - 2 * ETH_ALEN + sizeof(uint32_t))
> > >
> > > #define PKT_BUF_BYTES MAX(TAP_BUF_BYTES, 0)
> > > extern char pkt_buf [PKT_BUF_BYTES];
> > > diff --git a/tap.c b/tap.c
> > > index 68231f09..42370a26 100644
> > > --- a/tap.c
> > > +++ b/tap.c
> > > @@ -61,6 +61,8 @@
> > > #include "vhost_user.h"
> > > #include "vu_common.h"
> > >
> > > +#define TAP_MSGS 256
> >
> > Sorry, I stopped at 2/3, had just a quick look at this one, and I
> > missed this.
> >
> > Assuming 4 KiB pages, this changes from 161319 to 256. You mention that
>
> Yes. I'm certainly open to arguments on what the number should be.
No idea, until now I just thought we'd have a limit that we can't hit
in practice. Let me have a look at what happens with 256 (your new
series) and iperf3 or udp_stream from neper.
> > in practice we never have more than a handful of messages, which is
> > probably almost always the case, but I wonder if that's also the case
> > with UDP "real-time" streams, where we could have bursts of a few
> > hundred (thousand?) messages at a time.
>
> Maybe. If we are getting them in large bursts, then we're no longer
> really suceeding at the streams being "real-time", but sure, we should
> try to catch up as best we can.
It could even be that flushing more frequently actually improves
things. I'm not sure. I was just pointing out that, quite likely, we
can actually hit the new limit.
> > I wonder: how bad would it be to correct the calculation, instead? We
> > wouldn't actually use more memory, right?
>
> I was pretty painful when I tried, and it would use more memory. The
> safe option would be to use ETH_HLEN as the minimum size (which is
> pretty much all we enforce in the tap layer), which would expand the
> iovec array here by 2-3x. It's not enormous, but it's not nothing.
> Or do you mean the unused pages of the array would never be
> instantiated? In which case, yeah, I guess not.
Yes, that's what I meant.
> Remember that with the changes in this patch if we exceed TAP_MSGS,
> nothing particularly bad happens: we don't crash, and we don't drop
> packets; we just process things in batches of TAP_MSGS frames at a
> time. So this doesn't need to be large enough to handle any burst we
> could ever get, just large enough to adequately mitigate the per-batch
> costs, which I don't think are _that_ large. 256 was a first guess at
> that. Maybe it's not enough, but I'd be pretty surprised if it needed
> to be greater than ~1000 to make the per-batch costs negligible
> compared to the per-frame costs. UDP_MAX_FRAMES, which is on the
> reverse path but serves a similar function, is only 32.
Okay, let me give that a try. I guess you didn't see a change in UDP
throughput tests... but there we use fairly large messages.
--
Stefano
next prev parent reply other threads:[~2024-12-20 9:51 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-13 12:01 [PATCH 0/3] Cleanups to packet pool handling and sizing David Gibson
2024-12-13 12:01 ` [PATCH 1/3] packet: Use flexible array member in struct pool David Gibson
2024-12-13 12:01 ` [PATCH 2/3] packet: Don't have struct pool specify its buffer David Gibson
2024-12-19 9:00 ` Stefano Brivio
2024-12-20 0:59 ` David Gibson
2024-12-20 9:51 ` Stefano Brivio
2024-12-21 6:59 ` David Gibson
2024-12-13 12:01 ` [PATCH 3/3] tap: Don't size pool_tap[46] for the maximum number of packets David Gibson
2024-12-19 9:00 ` Stefano Brivio
2024-12-20 1:13 ` David Gibson
2024-12-20 9:51 ` Stefano Brivio [this message]
2024-12-21 7:00 ` David Gibson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241220105136.5ef4e018@elisabeth \
--to=sbrivio@redhat.com \
--cc=david@gibson.dropbear.id.au \
--cc=passt-dev@passt.top \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).