From: David Gibson <david@gibson.dropbear.id.au>
To: Laurent Vivier <lvivier@redhat.com>
Cc: passt-dev@passt.top
Subject: Re: [PATCH 1/8] tap: Convert packet pools to per-queue-pair arrays for multiqueue
Date: Mon, 29 Jun 2026 19:59:40 +1000 [thread overview]
Message-ID: <akJCDM_kZ-kVGUNJ@zatzit> (raw)
In-Reply-To: <20260616171052.3785909-2-lvivier@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 10799 bytes --]
On Tue, Jun 16, 2026 at 07:10:45PM +0200, Laurent Vivier wrote:
> Convert the global pool_tap4 and pool_tap6 packet pools from single
> pools to arrays of pools, one for each queue pair. This change is
> necessary to support multiqueue operation in vhost-user mode, where
> multiple queue pairs may be processing packets concurrently.
>
> The pool storage structures (pool_tap4_storage and pool_tap6_storage)
> are now arrays of VHOST_USER_MAX_VQS/2 elements, with corresponding
> pointer arrays (pool_tap4 and pool_tap6) for accessing them.
>
> Add a qpair parameter to tap_flush_pools() so it flushes the correct
> pool. tap4_handler() and tap6_handler() now use the qpair they
> already receive to index into the pool arrays. Add bounds checking
> assertions in tap_handler() and tap_add_packet().
>
> In passt and pasta modes, all operations use QPAIR_DEFAULT. In
> vhost-user mode, the queue pair is derived from the virtqueue index
> via QPAIR_FROM_QUEUE().
>
> All pools within the array share the same buffer pointer:
> - In vhost-user mode: Points to the vhost-user memory structure, which
> is safe as packet data remains in guest memory and pools only track
> iovecs
> - In passt/pasta mode: Points to pkt_buf, which is safe as only queue
> pair 0 is used
>
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Generally LGTM, a few notes below.
> ---
> tap.c | 77 ++++++++++++++++++++++++++++++-----------------------
> tap.h | 2 +-
> vu_common.c | 2 +-
> 3 files changed, 46 insertions(+), 35 deletions(-)
>
> diff --git a/tap.c b/tap.c
> index 5e9c7a1701bf..80912372e216 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -94,9 +94,13 @@ CHECK_FRAME_LEN(L2_MAX_LEN_VU);
> DIV_ROUND_UP(sizeof(pkt_buf), \
> ETH_HLEN + sizeof(struct ipv6hdr) + sizeof(struct udphdr))
>
> -/* IPv4 (plus ARP) and IPv6 message batches from tap/guest to IP handlers */
> -static PACKET_POOL_NOINIT(pool_tap4, TAP_MSGS_IP4);
> -static PACKET_POOL_NOINIT(pool_tap6, TAP_MSGS_IP6);
> +/* IPv4 (plus ARP) and IPv6 message batches from tap/guest to IP handlers
> + * One pool per queue pair for multiqueue support
> + */
> +static PACKET_POOL_DECL(pool_tap4, TAP_MSGS_IP4) pool_tap4_storage[VHOST_USER_MAX_VQS / 2];
> +static struct pool *pool_tap4[VHOST_USER_MAX_VQS / 2];
> +static PACKET_POOL_DECL(pool_tap6, TAP_MSGS_IP6) pool_tap6_storage[VHOST_USER_MAX_VQS / 2];
> +static struct pool *pool_tap6[VHOST_USER_MAX_VQS / 2];
We probably want to introduce an explicit cacheline alignment -
otherwise the last part of one array element could occupy the same
cacheline as the first part of the next element. Since those will be
used more or less exclusively by different threads, it could be a
pretty bad case for cacheline pingponging. I mean, probably not,
because TAP_MSGS_* is large enough that we'll rarely be using the last
pieces of each entry, but still.
Also this is the only place PACKET_POOL_NOINIT() was used, so the
macro can be removed too.
A possible longer term idea: we could consider restructuring things so
that the pool sizes aren't compile time fixed: for the non-vu cases
we'd assign the whole iovec array to the single pool that's in use.
For vu, we'd split the iovec array between the number of queues
actually in use.
> #define TAP_SEQS 128 /* Different L4 tuples in one batch */
>
> @@ -717,12 +721,12 @@ static int tap4_handler(struct ctx *c, unsigned int qpair,
> unsigned int i, j, seq_count;
> struct tap4_l4_t *seq;
>
> - if (!c->ifi4 || !pool_tap4->count)
> - return pool_tap4->count;
> + if (!c->ifi4 || !pool_tap4[qpair]->count)
Here and elsewhere, I'd kind of prefer to see a local, rather than
repeating pool_tapX[qpair] a bunch of times.
> + return pool_tap4[qpair]->count;
>
> i = 0;
> resume:
> - for (seq_count = 0, seq = NULL; i < pool_tap4->count; i++) {
> + for (seq_count = 0, seq = NULL; i < pool_tap4[qpair]->count; i++) {
> size_t l3len, hlen, l4len;
> struct ethhdr eh_storage;
> struct iphdr iph_storage;
> @@ -732,7 +736,7 @@ resume:
> struct iov_tail data;
> struct iphdr *iph;
>
> - if (!packet_get(pool_tap4, i, &data))
> + if (!packet_get(pool_tap4[qpair], i, &data))
> continue;
>
> eh = IOV_PEEK_HEADER(&data, eh_storage);
> @@ -804,7 +808,7 @@ resume:
> if (iph->protocol == IPPROTO_UDP) {
> struct iov_tail eh_data;
>
> - packet_get(pool_tap4, i, &eh_data);
> + packet_get(pool_tap4[qpair], i, &eh_data);
> if (dhcp(c, qpair, &eh_data))
> continue;
> }
> @@ -835,7 +839,7 @@ resume:
> goto append;
>
> if (seq_count == TAP_SEQS)
> - break; /* Resume after flushing if i < pool_tap4->count */
> + break; /* Resume after flushing if i < pool_tap4[qpair]->count */
>
> for (seq = tap4_l4 + seq_count - 1; seq >= tap4_l4; seq--) {
> if (L4_MATCH(iph, uh, seq)) {
> @@ -881,10 +885,10 @@ append:
> }
> }
>
> - if (i < pool_tap4->count)
> + if (i < pool_tap4[qpair]->count)
> goto resume;
>
> - return pool_tap4->count;
> + return pool_tap4[qpair]->count;
> }
>
> #define IPV6_NH_OPT(nh) \
> @@ -953,12 +957,12 @@ static int tap6_handler(struct ctx *c, unsigned int qpair,
> unsigned int i, j, seq_count = 0;
> struct tap6_l4_t *seq;
>
> - if (!c->ifi6 || !pool_tap6->count)
> - return pool_tap6->count;
> + if (!c->ifi6 || !pool_tap6[qpair]->count)
> + return pool_tap6[qpair]->count;
>
> i = 0;
> resume:
> - for (seq_count = 0, seq = NULL; i < pool_tap6->count; i++) {
> + for (seq_count = 0, seq = NULL; i < pool_tap6[qpair]->count; i++) {
> size_t l4len, plen, check;
> struct in6_addr *saddr, *daddr;
> struct ipv6hdr ip6h_storage;
> @@ -970,7 +974,7 @@ resume:
> struct ipv6hdr *ip6h;
> uint8_t proto;
>
> - if (!packet_get(pool_tap6, i, &data))
> + if (!packet_get(pool_tap6[qpair], i, &data))
> return -1;
>
> eh = IOV_REMOVE_HEADER(&data, eh_storage);
> @@ -1083,7 +1087,7 @@ resume:
> goto append;
>
> if (seq_count == TAP_SEQS)
> - break; /* Resume after flushing if i < pool_tap6->count */
> + break; /* Resume after flushing if i < pool_tap6[qpair]->count */
>
> for (seq = tap6_l4 + seq_count - 1; seq >= tap6_l4; seq--) {
> if (L4_MATCH(ip6h, proto, uh, seq)) {
> @@ -1130,19 +1134,19 @@ append:
> }
> }
>
> - if (i < pool_tap6->count)
> + if (i < pool_tap6[qpair]->count)
> goto resume;
>
> - return pool_tap6->count;
> + return pool_tap6[qpair]->count;
> }
>
> /**
> - * tap_flush_pools() - Flush both IPv4 and IPv6 packet pools
> + * tap_flush_pools() - Flush both IPv4 and IPv6 packet pools for a given qpair
> */
> -void tap_flush_pools(void)
> +void tap_flush_pools(unsigned int qpair)
> {
> - pool_flush(pool_tap4);
> - pool_flush(pool_tap6);
> + pool_flush(pool_tap4[qpair]);
> + pool_flush(pool_tap6[qpair]);
> }
>
> /**
> @@ -1153,6 +1157,7 @@ void tap_flush_pools(void)
> */
> void tap_handler(struct ctx *c, unsigned int qpair, const struct timespec *now)
> {
> + assert(qpair < VHOST_USER_MAX_VQS / 2);
> tap4_handler(c, qpair, now);
> tap6_handler(c, qpair, now);
> }
> @@ -1187,21 +1192,23 @@ void tap_add_packet(struct ctx *c, unsigned int qpair, struct iov_tail *data,
> proto_update_l2_buf(c->guest_mac);
> }
>
> + assert(qpair < VHOST_USER_MAX_VQS / 2);
> +
> switch (ntohs(eh->h_proto)) {
> case ETH_P_ARP:
> case ETH_P_IP:
> - if (!pool_can_fit(pool_tap4, data)) {
> + if (!pool_can_fit(pool_tap4[qpair], data)) {
> tap4_handler(c, qpair, now);
> - pool_flush(pool_tap4);
> + pool_flush(pool_tap4[qpair]);
> }
> - packet_add(pool_tap4, data);
> + packet_add(pool_tap4[qpair], data);
> break;
> case ETH_P_IPV6:
> - if (!pool_can_fit(pool_tap6, data)) {
> + if (!pool_can_fit(pool_tap6[qpair], data)) {
> tap6_handler(c, qpair, now);
> - pool_flush(pool_tap6);
> + pool_flush(pool_tap6[qpair]);
> }
> - packet_add(pool_tap6, data);
> + packet_add(pool_tap6[qpair], data);
> break;
> default:
> break;
> @@ -1239,7 +1246,7 @@ static void tap_passt_input(struct ctx *c, const struct timespec *now)
> ssize_t n;
> char *p;
>
> - tap_flush_pools();
> + tap_flush_pools(QPAIR_DEFAULT);
>
> if (partial_len) {
> /* We have a partial frame from an earlier pass. Move it to the
> @@ -1322,7 +1329,7 @@ static void tap_pasta_input(struct ctx *c, const struct timespec *now)
> {
> ssize_t n, len;
>
> - tap_flush_pools();
> + tap_flush_pools(QPAIR_DEFAULT);
>
> for (n = 0;
> n <= (ssize_t)(sizeof(pkt_buf) - L2_MAX_LEN_PASTA);
> @@ -1591,10 +1598,14 @@ static void tap_sock_tun_init(struct ctx *c)
> */
> static void tap_sock_update_pool(void *base, size_t size)
> {
> - int i;
> + unsigned int i;
>
> - pool_tap4_storage = PACKET_INIT(pool_tap4, TAP_MSGS_IP4, base, size);
> - pool_tap6_storage = PACKET_INIT(pool_tap6, TAP_MSGS_IP6, base, size);
> + for (i = 0; i < VHOST_USER_MAX_VQS / 2; i++) {
> + pool_tap4_storage[i] = PACKET_INIT(pool_tap4, TAP_MSGS_IP4, base, size);
> + pool_tap4[i] = (struct pool *)&pool_tap4_storage[i];
> + pool_tap6_storage[i] = PACKET_INIT(pool_tap6, TAP_MSGS_IP6, base, size);
> + pool_tap6[i] = (struct pool *)&pool_tap6_storage[i];
> + }
>
> for (i = 0; i < TAP_SEQS; i++) {
> tap4_l4[i].p = PACKET_INIT(pool_l4, UIO_MAXIOV, base, size);
> diff --git a/tap.h b/tap.h
> index ecb12de372b5..8a1d8f933a5b 100644
> --- a/tap.h
> +++ b/tap.h
> @@ -123,7 +123,7 @@ void tap_handler_passt(struct ctx *c, uint32_t events,
> int tap_sock_unix_open(char *sock_path);
> void tap_sock_reset(struct ctx *c);
> void tap_backend_init(struct ctx *c);
> -void tap_flush_pools(void);
> +void tap_flush_pools(unsigned int qpair);
> void tap_handler(struct ctx *c, unsigned int qpair, const struct timespec *now);
> void tap_add_packet(struct ctx *c, unsigned int qpair, struct iov_tail *data,
> const struct timespec *now);
> diff --git a/vu_common.c b/vu_common.c
> index 6aa1ba768136..2e69c1945055 100644
> --- a/vu_common.c
> +++ b/vu_common.c
> @@ -178,7 +178,7 @@ static void vu_handle_tx(struct vu_dev *vdev, int index,
> assert(QPAIR_FROMGUEST_QUEUE(QPAIR_FROM_QUEUE(index)) ==
> (unsigned int)index);
>
> - tap_flush_pools();
> + tap_flush_pools(QPAIR_FROM_QUEUE(index));
>
> count = 0;
> out_sg_count = 0;
> --
> 2.54.0
>
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2026-06-29 9:59 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-16 17:10 [PATCH 0/8] multithreading: Prepare data structures for concurrent queue pair workers Laurent Vivier
2026-06-16 17:10 ` [PATCH 1/8] tap: Convert packet pools to per-queue-pair arrays for multiqueue Laurent Vivier
2026-06-29 9:59 ` David Gibson [this message]
2026-06-16 17:10 ` [PATCH 2/8] tap: Make L4 sequence pools per-qpair for thread safety Laurent Vivier
2026-06-16 17:10 ` [PATCH 3/8] tcp: Make static buffers stack-local " Laurent Vivier
2026-06-16 17:10 ` [PATCH 4/8] udp_vu: Make virtqueue " Laurent Vivier
2026-06-16 17:10 ` [PATCH 5/8] flow: Make flow timer per-caller " Laurent Vivier
2026-06-16 17:10 ` [PATCH 6/8] tcp: Make TCP timer state per-caller and guard global tasks Laurent Vivier
2026-06-16 17:10 ` [PATCH 7/8] tcp: Protect init socket pools with mutex for thread safety Laurent Vivier
2026-06-16 17:10 ` [PATCH 8/8] flow: Add mutex and per-qpair filtering to flow table operations Laurent Vivier
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=akJCDM_kZ-kVGUNJ@zatzit \
--to=david@gibson.dropbear.id.au \
--cc=lvivier@redhat.com \
--cc=passt-dev@passt.top \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).