On Thu, Dec 19, 2024 at 10:00:15AM +0100, Stefano Brivio wrote: > On Fri, 13 Dec 2024 23:01:56 +1100 > David Gibson wrote: > > > Currently we attempt to size pool_tap[46] so they have room for the maximum > > possible number of packets that could fit in pkt_buf, TAP_MSGS. However, > > the calculation isn't quite correct: TAP_MSGS is based on ETH_ZLEN (60) as > > the minimum possible L2 frame size. But, we don't enforce that L2 frames > > are at least ETH_ZLEN when we receive them from the tap backend, and since > > we're dealing with virtual interfaces we don't have the physical Ethernet > > limitations requiring that length. Indeed it is possible to generate a > > legitimate frame smaller than that (e.g. a zero-payload UDP/IPv4 frame on > > the 'pasta' backend is only 42 bytes long). > > > > It's also unclear if this limit is sufficient for vhost-user which isn't > > limited by the size of pkt_buf as the other modes are. > > > > We could attempt to correct the calculation, but that would leave us with > > even larger arrays, which in practice rarely accumulate more than a handful > > of packets. So, instead, put an arbitrary cap on the number of packets we > > can put in a batch, and if we run out of space, process and flush the > > batch. > > > > Signed-off-by: David Gibson > > --- > > packet.c | 13 ++++++++++++- > > packet.h | 3 +++ > > passt.h | 2 -- > > tap.c | 18 +++++++++++++++--- > > tap.h | 3 ++- > > vu_common.c | 3 ++- > > 6 files changed, 34 insertions(+), 8 deletions(-) > > > > diff --git a/packet.c b/packet.c > > index 5bfa7304..b68580cc 100644 > > --- a/packet.c > > +++ b/packet.c > > @@ -22,6 +22,17 @@ > > #include "util.h" > > #include "log.h" > > > > +/** > > + * pool_full() - Is a packet pool full? > > + * @p: Pointer to packet pool > > + * > > + * Return: true if the pool is full, false if more packets can be added > > + */ > > +bool pool_full(const struct pool *p) > > +{ > > + return p->count >= p->size; > > +} > > + > > /** > > * packet_add_do() - Add data as packet descriptor to given pool > > * @p: Existing pool > > @@ -35,7 +46,7 @@ void packet_add_do(struct pool *p, size_t len, const char *start, > > { > > size_t idx = p->count; > > > > - if (idx >= p->size) { > > + if (pool_full(p)) { > > trace("add packet index %zu to pool with size %zu, %s:%i", > > idx, p->size, func, line); > > return; > > diff --git a/packet.h b/packet.h > > index 98eb8812..3618f213 100644 > > --- a/packet.h > > +++ b/packet.h > > @@ -6,6 +6,8 @@ > > #ifndef PACKET_H > > #define PACKET_H > > > > +#include > > + > > /** > > * struct pool - Generic pool of packets stored in nmemory > > * @size: Number of usable descriptors for the pool > > @@ -23,6 +25,7 @@ void packet_add_do(struct pool *p, size_t len, const char *start, > > void *packet_get_do(const struct pool *p, const size_t idx, > > size_t offset, size_t len, size_t *left, > > const char *func, int line); > > +bool pool_full(const struct pool *p); > > void pool_flush(struct pool *p); > > > > #define packet_add(p, len, start) \ > > diff --git a/passt.h b/passt.h > > index 0dd4efa0..81b2787f 100644 > > --- a/passt.h > > +++ b/passt.h > > @@ -70,8 +70,6 @@ static_assert(sizeof(union epoll_ref) <= sizeof(union epoll_data), > > > > #define TAP_BUF_BYTES \ > > ROUND_DOWN(((ETH_MAX_MTU + sizeof(uint32_t)) * 128), PAGE_SIZE) > > -#define TAP_MSGS \ > > - DIV_ROUND_UP(TAP_BUF_BYTES, ETH_ZLEN - 2 * ETH_ALEN + sizeof(uint32_t)) > > > > #define PKT_BUF_BYTES MAX(TAP_BUF_BYTES, 0) > > extern char pkt_buf [PKT_BUF_BYTES]; > > diff --git a/tap.c b/tap.c > > index 68231f09..42370a26 100644 > > --- a/tap.c > > +++ b/tap.c > > @@ -61,6 +61,8 @@ > > #include "vhost_user.h" > > #include "vu_common.h" > > > > +#define TAP_MSGS 256 > > Sorry, I stopped at 2/3, had just a quick look at this one, and I > missed this. > > Assuming 4 KiB pages, this changes from 161319 to 256. You mention that Yes. I'm certainly open to arguments on what the number should be. > in practice we never have more than a handful of messages, which is > probably almost always the case, but I wonder if that's also the case > with UDP "real-time" streams, where we could have bursts of a few > hundred (thousand?) messages at a time. Maybe. If we are getting them in large bursts, then we're no longer really suceeding at the streams being "real-time", but sure, we should try to catch up as best we can. > I wonder: how bad would it be to correct the calculation, instead? We > wouldn't actually use more memory, right? I was pretty painful when I tried, and it would use more memory. The safe option would be to use ETH_HLEN as the minimum size (which is pretty much all we enforce in the tap layer), which would expand the iovec array here by 2-3x. It's not enormous, but it's not nothing. Or do you mean the unused pages of the array would never be instantiated? In which case, yeah, I guess not. Remember that with the changes in this patch if we exceed TAP_MSGS, nothing particularly bad happens: we don't crash, and we don't drop packets; we just process things in batches of TAP_MSGS frames at a time. So this doesn't need to be large enough to handle any burst we could ever get, just large enough to adequately mitigate the per-batch costs, which I don't think are _that_ large. 256 was a first guess at that. Maybe it's not enough, but I'd be pretty surprised if it needed to be greater than ~1000 to make the per-batch costs negligible compared to the per-frame costs. UDP_MAX_FRAMES, which is on the reverse path but serves a similar function, is only 32. -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson