public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: passt-dev@passt.top
Subject: Re: [PATCH 3/3] tap: Don't size pool_tap[46] for the maximum number of packets
Date: Sat, 21 Dec 2024 18:00:44 +1100	[thread overview]
Message-ID: <Z2ZnnBZlxrsiBlyP@zatzit> (raw)
In-Reply-To: <20241220105136.5ef4e018@elisabeth>

[-- Attachment #1: Type: text/plain, Size: 7437 bytes --]

On Fri, Dec 20, 2024 at 10:51:36AM +0100, Stefano Brivio wrote:
> On Fri, 20 Dec 2024 12:13:23 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Thu, Dec 19, 2024 at 10:00:15AM +0100, Stefano Brivio wrote:
> > > On Fri, 13 Dec 2024 23:01:56 +1100
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > >   
> > > > Currently we attempt to size pool_tap[46] so they have room for the maximum
> > > > possible number of packets that could fit in pkt_buf, TAP_MSGS.  However,
> > > > the calculation isn't quite correct: TAP_MSGS is based on ETH_ZLEN (60) as
> > > > the minimum possible L2 frame size.  But, we don't enforce that L2 frames
> > > > are at least ETH_ZLEN when we receive them from the tap backend, and since
> > > > we're dealing with virtual interfaces we don't have the physical Ethernet
> > > > limitations requiring that length.  Indeed it is possible to generate a
> > > > legitimate frame smaller than that (e.g. a zero-payload UDP/IPv4 frame on
> > > > the 'pasta' backend is only 42 bytes long).
> > > > 
> > > > It's also unclear if this limit is sufficient for vhost-user which isn't
> > > > limited by the size of pkt_buf as the other modes are.
> > > > 
> > > > We could attempt to correct the calculation, but that would leave us with
> > > > even larger arrays, which in practice rarely accumulate more than a handful
> > > > of packets.  So, instead, put an arbitrary cap on the number of packets we
> > > > can put in a batch, and if we run out of space, process and flush the
> > > > batch.
> > > > 
> > > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > > > ---
> > > >  packet.c    | 13 ++++++++++++-
> > > >  packet.h    |  3 +++
> > > >  passt.h     |  2 --
> > > >  tap.c       | 18 +++++++++++++++---
> > > >  tap.h       |  3 ++-
> > > >  vu_common.c |  3 ++-
> > > >  6 files changed, 34 insertions(+), 8 deletions(-)
> > > > 
> > > > diff --git a/packet.c b/packet.c
> > > > index 5bfa7304..b68580cc 100644
> > > > --- a/packet.c
> > > > +++ b/packet.c
> > > > @@ -22,6 +22,17 @@
> > > >  #include "util.h"
> > > >  #include "log.h"
> > > >  
> > > > +/**
> > > > + * pool_full() - Is a packet pool full?
> > > > + * @p:		Pointer to packet pool
> > > > + *
> > > > + * Return: true if the pool is full, false if more packets can be added
> > > > + */
> > > > +bool pool_full(const struct pool *p)
> > > > +{
> > > > +	return p->count >= p->size;
> > > > +}
> > > > +
> > > >  /**
> > > >   * packet_add_do() - Add data as packet descriptor to given pool
> > > >   * @p:		Existing pool
> > > > @@ -35,7 +46,7 @@ void packet_add_do(struct pool *p, size_t len, const char *start,
> > > >  {
> > > >  	size_t idx = p->count;
> > > >  
> > > > -	if (idx >= p->size) {
> > > > +	if (pool_full(p)) {
> > > >  		trace("add packet index %zu to pool with size %zu, %s:%i",
> > > >  		      idx, p->size, func, line);
> > > >  		return;
> > > > diff --git a/packet.h b/packet.h
> > > > index 98eb8812..3618f213 100644
> > > > --- a/packet.h
> > > > +++ b/packet.h
> > > > @@ -6,6 +6,8 @@
> > > >  #ifndef PACKET_H
> > > >  #define PACKET_H
> > > >  
> > > > +#include <stdbool.h>
> > > > +
> > > >  /**
> > > >   * struct pool - Generic pool of packets stored in nmemory
> > > >   * @size:	Number of usable descriptors for the pool
> > > > @@ -23,6 +25,7 @@ void packet_add_do(struct pool *p, size_t len, const char *start,
> > > >  void *packet_get_do(const struct pool *p, const size_t idx,
> > > >  		    size_t offset, size_t len, size_t *left,
> > > >  		    const char *func, int line);
> > > > +bool pool_full(const struct pool *p);
> > > >  void pool_flush(struct pool *p);
> > > >  
> > > >  #define packet_add(p, len, start)					\
> > > > diff --git a/passt.h b/passt.h
> > > > index 0dd4efa0..81b2787f 100644
> > > > --- a/passt.h
> > > > +++ b/passt.h
> > > > @@ -70,8 +70,6 @@ static_assert(sizeof(union epoll_ref) <= sizeof(union epoll_data),
> > > >  
> > > >  #define TAP_BUF_BYTES							\
> > > >  	ROUND_DOWN(((ETH_MAX_MTU + sizeof(uint32_t)) * 128), PAGE_SIZE)
> > > > -#define TAP_MSGS							\
> > > > -	DIV_ROUND_UP(TAP_BUF_BYTES, ETH_ZLEN - 2 * ETH_ALEN + sizeof(uint32_t))
> > > >  
> > > >  #define PKT_BUF_BYTES		MAX(TAP_BUF_BYTES, 0)
> > > >  extern char pkt_buf		[PKT_BUF_BYTES];
> > > > diff --git a/tap.c b/tap.c
> > > > index 68231f09..42370a26 100644
> > > > --- a/tap.c
> > > > +++ b/tap.c
> > > > @@ -61,6 +61,8 @@
> > > >  #include "vhost_user.h"
> > > >  #include "vu_common.h"
> > > >  
> > > > +#define TAP_MSGS		256  
> > > 
> > > Sorry, I stopped at 2/3, had just a quick look at this one, and I
> > > missed this.
> > > 
> > > Assuming 4 KiB pages, this changes from 161319 to 256. You mention that  
> > 
> > Yes.  I'm certainly open to arguments on what the number should be.
> 
> No idea, until now I just thought we'd have a limit that we can't hit
> in practice. Let me have a look at what happens with 256 (your new
> series) and iperf3 or udp_stream from neper.
> 
> > > in practice we never have more than a handful of messages, which is
> > > probably almost always the case, but I wonder if that's also the case
> > > with UDP "real-time" streams, where we could have bursts of a few
> > > hundred (thousand?) messages at a time.  
> > 
> > Maybe.  If we are getting them in large bursts, then we're no longer
> > really suceeding at the streams being "real-time", but sure, we should
> > try to catch up as best we can.
> 
> It could even be that flushing more frequently actually improves
> things. I'm not sure. I was just pointing out that, quite likely, we
> can actually hit the new limit.
> 
> > > I wonder: how bad would it be to correct the calculation, instead? We
> > > wouldn't actually use more memory, right?  
> > 
> > I was pretty painful when I tried, and it would use more memory.  The
> > safe option would be to use ETH_HLEN as the minimum size (which is
> > pretty much all we enforce in the tap layer), which would expand the
> > iovec array here by 2-3x.  It's not enormous, but it's not nothing.
> > Or do you mean the unused pages of the array would never be
> > instantiated?  In which case, yeah, I guess not.
> 
> Yes, that's what I meant.
> 
> > Remember that with the changes in this patch if we exceed TAP_MSGS,
> > nothing particularly bad happens: we don't crash, and we don't drop
> > packets; we just process things in batches of TAP_MSGS frames at a
> > time.  So this doesn't need to be large enough to handle any burst we
> > could ever get, just large enough to adequately mitigate the per-batch
> > costs, which I don't think are _that_ large.  256 was a first guess at
> > that.  Maybe it's not enough, but I'd be pretty surprised if it needed
> > to be greater than ~1000 to make the per-batch costs negligible
> > compared to the per-frame costs.  UDP_MAX_FRAMES, which is on the
> > reverse path but serves a similar function, is only 32.
> 
> Okay, let me give that a try. I guess you didn't see a change in UDP
> throughput tests... but there we use fairly large messages.

No, but TBH I didn't look that closely.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

      reply	other threads:[~2024-12-21  7:21 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-13 12:01 [PATCH 0/3] Cleanups to packet pool handling and sizing David Gibson
2024-12-13 12:01 ` [PATCH 1/3] packet: Use flexible array member in struct pool David Gibson
2024-12-13 12:01 ` [PATCH 2/3] packet: Don't have struct pool specify its buffer David Gibson
2024-12-19  9:00   ` Stefano Brivio
2024-12-20  0:59     ` David Gibson
2024-12-20  9:51       ` Stefano Brivio
2024-12-21  6:59         ` David Gibson
2024-12-13 12:01 ` [PATCH 3/3] tap: Don't size pool_tap[46] for the maximum number of packets David Gibson
2024-12-19  9:00   ` Stefano Brivio
2024-12-20  1:13     ` David Gibson
2024-12-20  9:51       ` Stefano Brivio
2024-12-21  7:00         ` David Gibson [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z2ZnnBZlxrsiBlyP@zatzit \
    --to=david@gibson.dropbear.id.au \
    --cc=passt-dev@passt.top \
    --cc=sbrivio@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).