On Tue, Jun 16, 2026 at 07:10:46PM +0200, Laurent Vivier wrote: > The L4 sequence arrays tap4_l4[] and tap6_l4[] are used to batch > packets with the same L4 tuple within a single tap_handler() call. > They are global, but tap_handler() can be called concurrently from > different worker threads with different qpairs in vhost-user mode. > > Make these arrays per-qpair by adding a VHOST_USER_MAX_VQS/2 first > dimension, indexed by the qpair parameter already available in > tap4_handler() and tap6_handler(). > > Update tap_sock_update_pool() to initialize all qpair*seq entries. > > Signed-off-by: Laurent Vivier Generally LGTM. Couple of nits and one broader design thought. > --- > tap.c | 32 +++++++++++++++++++------------- > 1 file changed, 19 insertions(+), 13 deletions(-) > > diff --git a/tap.c b/tap.c > index 80912372e216..659df9d560d3 100644 > --- a/tap.c > +++ b/tap.c > @@ -606,7 +606,7 @@ static struct tap4_l4_t { > struct in_addr daddr; > > struct pool_l4_t p; > -} tap4_l4[TAP_SEQS /* Arbitrary: TAP_MSGS in theory, so limit in users */]; > +} tap4_l4[VHOST_USER_MAX_VQS / 2][TAP_SEQS /* Arbitrary: TAP_MSGS in theory, so limit in users */]; Nit: VHOST_USER_MAX_VQS is being used often enough for sizing data structures that we might want a new #define for it. As in 1/8, I think an explicit cacheline alignment would be a good idea here too. > /** > * struct l4_seq6_t - Message sequence for one protocol handler call, IPv6 > @@ -633,7 +633,7 @@ static struct tap6_l4_t { > uint8_t hop_limit; > > struct pool_l4_t p; > -} tap6_l4[TAP_SEQS /* Arbitrary: TAP_MSGS in theory, so limit in users */]; > +} tap6_l4[VHOST_USER_MAX_VQS / 2][TAP_SEQS /* Arbitrary: TAP_MSGS in theory, so limit in users */]; Ditto. Also... maybe this would be more trouble for a first pass. But we're introducing a bunch of parallel per-qpair arrays. I'm wondering if it might be nicer to instead gather all the per-qpair structures together into a struct qpair_ctx or struct qpair_tap_ctx. Couple of advantages I can see (neither of which are must haves on their own): * Clearer to describe the locking semantics ("none, because it's thread private") for a single big structure than parallel slices of a whole bunch of other structures * Might give better memory locality for each thread > /** > * tap_packet_debug() - Print debug message for packet(s) from guest/tap > @@ -841,7 +841,7 @@ resume: > if (seq_count == TAP_SEQS) > break; /* Resume after flushing if i < pool_tap4[qpair]->count */ > > - for (seq = tap4_l4 + seq_count - 1; seq >= tap4_l4; seq--) { > + for (seq = tap4_l4[qpair] + seq_count - 1; seq >= tap4_l4[qpair]; seq--) { > if (L4_MATCH(iph, uh, seq)) { > if (seq->p.count >= UIO_MAXIOV) > seq = NULL; > @@ -849,8 +849,8 @@ resume: > } > } > > - if (!seq || seq < tap4_l4) { > - seq = tap4_l4 + seq_count++; > + if (!seq || seq < tap4_l4[qpair]) { > + seq = tap4_l4[qpair] + seq_count++; > L4_SET(iph, uh, seq); > pool_flush((struct pool *)&seq->p); > } > @@ -862,7 +862,7 @@ append: > packet_add((struct pool *)&seq->p, &data); > } > > - for (j = 0, seq = tap4_l4; j < seq_count; j++, seq++) { > + for (j = 0, seq = tap4_l4[qpair]; j < seq_count; j++, seq++) { > const struct pool *p = (const struct pool *)&seq->p; > size_t k; > > @@ -1089,7 +1089,7 @@ resume: > if (seq_count == TAP_SEQS) > break; /* Resume after flushing if i < pool_tap6[qpair]->count */ > > - for (seq = tap6_l4 + seq_count - 1; seq >= tap6_l4; seq--) { > + for (seq = tap6_l4[qpair] + seq_count - 1; seq >= tap6_l4[qpair]; seq--) { > if (L4_MATCH(ip6h, proto, uh, seq)) { > if (seq->p.count >= UIO_MAXIOV) > seq = NULL; > @@ -1097,8 +1097,8 @@ resume: > } > } > > - if (!seq || seq < tap6_l4) { > - seq = tap6_l4 + seq_count++; > + if (!seq || seq < tap6_l4[qpair]) { > + seq = tap6_l4[qpair] + seq_count++; > L4_SET(ip6h, proto, uh, seq); > pool_flush((struct pool *)&seq->p); > } > @@ -1110,7 +1110,7 @@ append: > packet_add((struct pool *)&seq->p, &data); > } > > - for (j = 0, seq = tap6_l4; j < seq_count; j++, seq++) { > + for (j = 0, seq = tap6_l4[qpair]; j < seq_count; j++, seq++) { > const struct pool *p = (const struct pool *)&seq->p; > size_t k; > > @@ -1607,9 +1607,15 @@ static void tap_sock_update_pool(void *base, size_t size) > pool_tap6[i] = (struct pool *)&pool_tap6_storage[i]; > } > > - for (i = 0; i < TAP_SEQS; i++) { > - tap4_l4[i].p = PACKET_INIT(pool_l4, UIO_MAXIOV, base, size); > - tap6_l4[i].p = PACKET_INIT(pool_l4, UIO_MAXIOV, base, size); > + for (i = 0; i < VHOST_USER_MAX_VQS / 2; i++) { > + unsigned int j; > + > + for (j = 0; j < TAP_SEQS; j++) { > + tap4_l4[i][j].p = PACKET_INIT(pool_l4, UIO_MAXIOV, > + base, size); > + tap6_l4[i][j].p = PACKET_INIT(pool_l4, UIO_MAXIOV, > + base, size); > + } > } > } > > -- > 2.54.0 > -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson