On Fri, Dec 20, 2024 at 10:51:36AM +0100, Stefano Brivio wrote:
> On Fri, 20 Dec 2024 12:13:23 +1100
> David Gibson wrote:
> 
> > On Thu, Dec 19, 2024 at 10:00:15AM +0100, Stefano Brivio wrote:
> > > On Fri, 13 Dec 2024 23:01:56 +1100
> > > David Gibson wrote:
> > > 
> > > > Currently we attempt to size pool_tap[46] so they have room for the maximum
> > > > possible number of packets that could fit in pkt_buf, TAP_MSGS. However,
> > > > the calculation isn't quite correct: TAP_MSGS is based on ETH_ZLEN (60) as
> > > > the minimum possible L2 frame size. But, we don't enforce that L2 frames
> > > > are at least ETH_ZLEN when we receive them from the tap backend, and since
> > > > we're dealing with virtual interfaces we don't have the physical Ethernet
> > > > limitations requiring that length. Indeed it is possible to generate a
> > > > legitimate frame smaller than that (e.g. a zero-payload UDP/IPv4 frame on
> > > > the 'pasta' backend is only 42 bytes long).
> > > > 
> > > > It's also unclear if this limit is sufficient for vhost-user which isn't
> > > > limited by the size of pkt_buf as the other modes are.
> > > > 
> > > > We could attempt to correct the calculation, but that would leave us with
> > > > even larger arrays, which in practice rarely accumulate more than a handful
> > > > of packets. So, instead, put an arbitrary cap on the number of packets we
> > > > can put in a batch, and if we run out of space, process and flush the
> > > > batch.
> > > > 
> > > > Signed-off-by: David Gibson
> > > > ---
> > > >  packet.c    | 13 ++++++++++++-
> > > >  packet.h    |  3 +++
> > > >  passt.h     |  2 --
> > > >  tap.c       | 18 +++++++++++++++---
> > > >  tap.h       |  3 ++-
> > > >  vu_common.c |  3 ++-
> > > >  6 files changed, 34 insertions(+), 8 deletions(-)
> > > > 
> > > > diff --git a/packet.c b/packet.c
> > > > index 5bfa7304..b68580cc 100644
> > > > --- a/packet.c
> > > > +++ b/packet.c
> > > > @@ -22,6 +22,17 @@
> > > >  #include "util.h"
> > > >  #include "log.h"
> > > >  
> > > > +/**
> > > > + * pool_full() - Is a packet pool full?
> > > > + * @p:	Pointer to packet pool
> > > > + *
> > > > + * Return: true if the pool is full, false if more packets can be added
> > > > + */
> > > > +bool pool_full(const struct pool *p)
> > > > +{
> > > > +	return p->count >= p->size;
> > > > +}
> > > > +
> > > >  /**
> > > >   * packet_add_do() - Add data as packet descriptor to given pool
> > > >   * @p:	Existing pool
> > > > @@ -35,7 +46,7 @@ void packet_add_do(struct pool *p, size_t len, const char *start,
> > > >  {
> > > >  	size_t idx = p->count;
> > > >  
> > > > -	if (idx >= p->size) {
> > > > +	if (pool_full(p)) {
> > > >  		trace("add packet index %zu to pool with size %zu, %s:%i",
> > > >  		      idx, p->size, func, line);
> > > >  		return;
> > > > diff --git a/packet.h b/packet.h
> > > > index 98eb8812..3618f213 100644
> > > > --- a/packet.h
> > > > +++ b/packet.h
> > > > @@ -6,6 +6,8 @@
> > > >  #ifndef PACKET_H
> > > >  #define PACKET_H
> > > >  
> > > > +#include <stdbool.h>
> > > > +
> > > >  /**
> > > >   * struct pool - Generic pool of packets stored in memory
> > > >   * @size:	Number of usable descriptors for the pool
> > > > @@ -23,6 +25,7 @@ void packet_add_do(struct pool *p, size_t len, const char *start,
> > > >  void *packet_get_do(const struct pool *p, const size_t idx,
> > > >  		    size_t offset, size_t len, size_t *left,
> > > >  		    const char *func, int line);
> > > > +bool pool_full(const struct pool *p);
> > > >  void pool_flush(struct pool *p);
> > > >  
> > > >  #define packet_add(p, len, start)					\
> > > > diff --git a/passt.h b/passt.h
> > > > index 0dd4efa0..81b2787f 100644
> > > > --- a/passt.h
> > > > +++ b/passt.h
> > > > @@ -70,8 +70,6 @@ static_assert(sizeof(union epoll_ref) <= sizeof(union epoll_data),
> > > >  
> > > >  #define TAP_BUF_BYTES \
> > > >  	ROUND_DOWN(((ETH_MAX_MTU + sizeof(uint32_t)) * 128), PAGE_SIZE)
> > > > -#define TAP_MSGS \
> > > > -	DIV_ROUND_UP(TAP_BUF_BYTES, ETH_ZLEN - 2 * ETH_ALEN + sizeof(uint32_t))
> > > >  
> > > >  #define PKT_BUF_BYTES		MAX(TAP_BUF_BYTES, 0)
> > > >  extern char pkt_buf		[PKT_BUF_BYTES];
> > > > diff --git a/tap.c b/tap.c
> > > > index 68231f09..42370a26 100644
> > > > --- a/tap.c
> > > > +++ b/tap.c
> > > > @@ -61,6 +61,8 @@
> > > >  #include "vhost_user.h"
> > > >  #include "vu_common.h"
> > > >  
> > > > +#define TAP_MSGS	256
> > > 
> > > Sorry, I stopped at 2/3, had just a quick look at this one, and I
> > > missed this.
> > > 
> > > Assuming 4 KiB pages, this changes from 161319 to 256. You mention that
> > 
> > Yes. I'm certainly open to arguments on what the number should be.
> 
> No idea, until now I just thought we'd have a limit that we can't hit
> in practice. Let me have a look at what happens with 256 (your new
> series) and iperf3 or udp_stream from neper.
> 
> > > in practice we never have more than a handful of messages, which is
> > > probably almost always the case, but I wonder if that's also the case
> > > with UDP "real-time" streams, where we could have bursts of a few
> > > hundred (thousand?) messages at a time.
> > 
> > Maybe. If we are getting them in large bursts, then we're no longer
> > really succeeding at the streams being "real-time", but sure, we should
> > try to catch up as best we can.
> 
> It could even be that flushing more frequently actually improves
> things. I'm not sure. I was just pointing out that, quite likely, we
> can actually hit the new limit.
> 
> > > I wonder: how bad would it be to correct the calculation, instead? We
> > > wouldn't actually use more memory, right?
> > 
> > It was pretty painful when I tried, and it would use more memory. The
> > safe option would be to use ETH_HLEN as the minimum size (which is
> > pretty much all we enforce in the tap layer), which would expand the
> > iovec array here by 2-3x. It's not enormous, but it's not nothing.
> > Or do you mean the unused pages of the array would never be
> > instantiated? In which case, yeah, I guess not.
> 
> Yes, that's what I meant.
> 
> > Remember that with the changes in this patch if we exceed TAP_MSGS,
> > nothing particularly bad happens: we don't crash, and we don't drop
> > packets; we just process things in batches of TAP_MSGS frames at a
> > time. So this doesn't need to be large enough to handle any burst we
> > could ever get, just large enough to adequately mitigate the per-batch
> > costs, which I don't think are _that_ large. 256 was a first guess at
> > that. Maybe it's not enough, but I'd be pretty surprised if it needed
> > to be greater than ~1000 to make the per-batch costs negligible
> > compared to the per-frame costs. UDP_MAX_FRAMES, which is on the
> > reverse path but serves a similar function, is only 32.
> 
> Okay, let me give that a try. I guess you didn't see a change in UDP
> throughput tests... but there we use fairly large messages.

No, but TBH I didn't look that closely.

-- 
David Gibson (he or they)		| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au		| minimalist, thank you, not the other way
					| around.  http://www.ozlabs.org/~dgibson