public inbox for passt-dev@passt.top
From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: passt-dev@passt.top
Subject: Re: [PATCH 3/3] tap: Don't size pool_tap[46] for the maximum number of packets
Date: Fri, 20 Dec 2024 12:13:23 +1100
Message-ID: <Z2TEs6DN5VWmxVos@zatzit>
In-Reply-To: <20241219100015.3e4b7599@elisabeth>

On Thu, Dec 19, 2024 at 10:00:15AM +0100, Stefano Brivio wrote:
> On Fri, 13 Dec 2024 23:01:56 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > Currently we attempt to size pool_tap[46] so they have room for the maximum
> > possible number of packets that could fit in pkt_buf, TAP_MSGS.  However,
> > the calculation isn't quite correct: TAP_MSGS is based on ETH_ZLEN (60) as
> > the minimum possible L2 frame size.  But, we don't enforce that L2 frames
> > are at least ETH_ZLEN when we receive them from the tap backend, and since
> > we're dealing with virtual interfaces we don't have the physical Ethernet
> > limitations requiring that length.  Indeed it is possible to generate a
> > legitimate frame smaller than that (e.g. a zero-payload UDP/IPv4 frame on
> > the 'pasta' backend is only 42 bytes long).
> > 
> > It's also unclear if this limit is sufficient for vhost-user which isn't
> > limited by the size of pkt_buf as the other modes are.
> > 
> > We could attempt to correct the calculation, but that would leave us with
> > even larger arrays, which in practice rarely accumulate more than a handful
> > of packets.  So, instead, put an arbitrary cap on the number of packets we
> > can put in a batch, and if we run out of space, process and flush the
> > batch.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  packet.c    | 13 ++++++++++++-
> >  packet.h    |  3 +++
> >  passt.h     |  2 --
> >  tap.c       | 18 +++++++++++++++---
> >  tap.h       |  3 ++-
> >  vu_common.c |  3 ++-
> >  6 files changed, 34 insertions(+), 8 deletions(-)
> > 
> > diff --git a/packet.c b/packet.c
> > index 5bfa7304..b68580cc 100644
> > --- a/packet.c
> > +++ b/packet.c
> > @@ -22,6 +22,17 @@
> >  #include "util.h"
> >  #include "log.h"
> >  
> > +/**
> > + * pool_full() - Is a packet pool full?
> > + * @p:		Pointer to packet pool
> > + *
> > + * Return: true if the pool is full, false if more packets can be added
> > + */
> > +bool pool_full(const struct pool *p)
> > +{
> > +	return p->count >= p->size;
> > +}
> > +
> >  /**
> >   * packet_add_do() - Add data as packet descriptor to given pool
> >   * @p:		Existing pool
> > @@ -35,7 +46,7 @@ void packet_add_do(struct pool *p, size_t len, const char *start,
> >  {
> >  	size_t idx = p->count;
> >  
> > -	if (idx >= p->size) {
> > +	if (pool_full(p)) {
> >  		trace("add packet index %zu to pool with size %zu, %s:%i",
> >  		      idx, p->size, func, line);
> >  		return;
> > diff --git a/packet.h b/packet.h
> > index 98eb8812..3618f213 100644
> > --- a/packet.h
> > +++ b/packet.h
> > @@ -6,6 +6,8 @@
> >  #ifndef PACKET_H
> >  #define PACKET_H
> >  
> > +#include <stdbool.h>
> > +
> >  /**
> >   * struct pool - Generic pool of packets stored in nmemory
> >   * @size:	Number of usable descriptors for the pool
> > @@ -23,6 +25,7 @@ void packet_add_do(struct pool *p, size_t len, const char *start,
> >  void *packet_get_do(const struct pool *p, const size_t idx,
> >  		    size_t offset, size_t len, size_t *left,
> >  		    const char *func, int line);
> > +bool pool_full(const struct pool *p);
> >  void pool_flush(struct pool *p);
> >  
> >  #define packet_add(p, len, start)					\
> > diff --git a/passt.h b/passt.h
> > index 0dd4efa0..81b2787f 100644
> > --- a/passt.h
> > +++ b/passt.h
> > @@ -70,8 +70,6 @@ static_assert(sizeof(union epoll_ref) <= sizeof(union epoll_data),
> >  
> >  #define TAP_BUF_BYTES							\
> >  	ROUND_DOWN(((ETH_MAX_MTU + sizeof(uint32_t)) * 128), PAGE_SIZE)
> > -#define TAP_MSGS							\
> > -	DIV_ROUND_UP(TAP_BUF_BYTES, ETH_ZLEN - 2 * ETH_ALEN + sizeof(uint32_t))
> >  
> >  #define PKT_BUF_BYTES		MAX(TAP_BUF_BYTES, 0)
> >  extern char pkt_buf		[PKT_BUF_BYTES];
> > diff --git a/tap.c b/tap.c
> > index 68231f09..42370a26 100644
> > --- a/tap.c
> > +++ b/tap.c
> > @@ -61,6 +61,8 @@
> >  #include "vhost_user.h"
> >  #include "vu_common.h"
> >  
> > +#define TAP_MSGS		256
> 
> Sorry, I stopped at 2/3, had just a quick look at this one, and I
> missed this.
> 
> Assuming 4 KiB pages, this changes from 161319 to 256. You mention that

Yes.  I'm certainly open to arguments on what the number should be.

> in practice we never have more than a handful of messages, which is
> probably almost always the case, but I wonder if that's also the case
> with UDP "real-time" streams, where we could have bursts of a few
> hundred (thousand?) messages at a time.

Maybe.  If we are getting them in large bursts, then we're no longer
really succeeding at the streams being "real-time", but sure, we should
try to catch up as best we can.

> I wonder: how bad would it be to correct the calculation, instead? We
> wouldn't actually use more memory, right?

It was pretty painful when I tried, and it would use more memory.  The
safe option would be to use ETH_HLEN as the minimum size (which is
pretty much all we enforce in the tap layer), which would expand the
iovec array here by 2-3x.  It's not enormous, but it's not nothing.
Or do you mean the unused pages of the array would never be
instantiated?  In which case, yeah, I guess not.
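
As a rough check of the "2-3x" figure, here is a minimal sketch of the
sizing arithmetic.  It assumes 4 KiB pages, ETH_MAX_MTU of 65535, and
that a corrected divisor would be ETH_HLEN plus the 4-byte length
prefix; none of those values are stated in this thread, and tap_msgs()
is a hypothetical helper, not something in the tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values: 4 KiB pages and the usual Linux Ethernet constants */
#define PAGE_SIZE	4096UL
#define ETH_MAX_MTU	65535UL
#define ETH_ALEN	6UL
#define ETH_HLEN	14UL
#define ETH_ZLEN	60UL

#define ROUND_DOWN(x, y)	((x) / (y) * (y))
#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

/* Same expression as the TAP_BUF_BYTES definition quoted in the patch */
#define TAP_BUF_BYTES							\
	ROUND_DOWN(((ETH_MAX_MTU + sizeof(uint32_t)) * 128), PAGE_SIZE)

/**
 * tap_msgs() - Descriptors needed if every frame has the minimum size
 * @min_frame:	Assumed minimum frame length, without the length prefix
 *
 * Return: number of minimum-sized frames that fit in TAP_BUF_BYTES
 */
static size_t tap_msgs(size_t min_frame)
{
	assert(min_frame > 0);

	return DIV_ROUND_UP(TAP_BUF_BYTES, min_frame + sizeof(uint32_t));
}
```

With these assumptions, tap_msgs(ETH_ZLEN - 2 * ETH_ALEN) reproduces the
removed TAP_MSGS definition and comes to roughly 161 thousand entries,
while tap_msgs(ETH_HLEN) comes to roughly 466 thousand, about 2.9 times
as many.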

Remember that with the changes in this patch if we exceed TAP_MSGS,
nothing particularly bad happens: we don't crash, and we don't drop
packets; we just process things in batches of TAP_MSGS frames at a
time.  So this doesn't need to be large enough to handle any burst we
could ever get, just large enough to adequately mitigate the per-batch
costs, which I don't think are _that_ large.  256 was a first guess at
that.  Maybe it's not enough, but I'd be pretty surprised if it needed
to be greater than ~1000 to make the per-batch costs negligible
compared to the per-frame costs.  UDP_MAX_FRAMES, which is on the
reverse path but serves a similar function, is only 32.
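
To illustrate the batching behaviour described above, here is a minimal
sketch, not the actual tap.c code.  struct pool is reduced to its two
counters, BATCH_MAX stands in for TAP_MSGS, and batch_process() and
frames_receive() are hypothetical names for "process and flush the
batch" and for the receive loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BATCH_MAX	4	/* stand-in for TAP_MSGS (256 in the patch) */

/* Simplified stand-in for struct pool: only the counters matter here */
struct pool {
	size_t size;	/* number of usable descriptors */
	size_t count;	/* descriptors currently in use */
};

static bool pool_full(const struct pool *p)
{
	return p->count >= p->size;
}

/* Hypothetical: stands in for handling the queued frames, then flushing */
static size_t batch_process(struct pool *p)
{
	size_t n = p->count;

	p->count = 0;
	return n;
}

/**
 * frames_receive() - Queue frames, flushing whenever the pool fills up
 * @p:		Packet pool
 * @nframes:	Number of incoming frames
 * @batches:	Set to the number of batches processed
 *
 * Return: number of frames processed, always @nframes: nothing is dropped
 */
static size_t frames_receive(struct pool *p, size_t nframes, size_t *batches)
{
	size_t done = 0, i;

	*batches = 0;
	for (i = 0; i < nframes; i++) {
		if (pool_full(p)) {	/* out of room: handle what we have */
			done += batch_process(p);
			(*batches)++;
		}
		assert(!pool_full(p));
		p->count++;		/* stands in for packet_add() */
	}

	if (p->count) {			/* final, possibly partial, batch */
		done += batch_process(p);
		(*batches)++;
	}

	return done;
}
```

A burst larger than the pool only costs extra batches: with a pool of
size BATCH_MAX, ten frames go through as batches of 4, 4 and 2.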

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson
