public inbox for passt-dev@passt.top
* [PATCH v3 0/6] vhost-user: Add multiqueue support
@ 2025-12-03 18:54 Laurent Vivier
  2025-12-03 18:54 ` [PATCH v3 1/6] tap: Remove pool parameter from tap4_handler() and tap6_handler() Laurent Vivier
                   ` (5 more replies)
  0 siblings, 6 replies; 18+ messages in thread
From: Laurent Vivier @ 2025-12-03 18:54 UTC (permalink / raw)
  To: passt-dev; +Cc: Laurent Vivier

This series implements multiqueue support for vhost-user mode, allowing passt
to use multiple queue pairs for improved network throughput with multi-CPU
guest VMs. While this version still processes packets on a single thread, it
lets the guest kernel distribute network traffic across multiple queues and
vCPUs.

The implementation advertises support for up to 16 queue pairs (32 virtqueues)
by setting the VIRTIO_NET_F_MQ and VHOST_USER_PROTOCOL_F_MQ feature flags.
Packets are routed to the appropriate RX queue based on which TX queue they
originated from, following the virtio specification's automatic receive
steering requirements.
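
To make the steering rule concrete, here is a minimal sketch with
hypothetical names (the real per-flow tracking lands in patch 6/6): each
flow remembers the queue pair of the last guest TX packet, and replies
go out on the RX queue of that same pair.

    struct flow_qpair_demo {
        unsigned int qpair;    /* queue pair last used by the guest */
    };

    /* TX path: remember which queue pair the guest transmitted on */
    static void demo_flow_from_tap(struct flow_qpair_demo *f,
                                   unsigned int tx_qpair)
    {
        f->qpair = tx_qpair;    /* updated on every packet, keeps affinity */
    }

    /* RX path: pick the RX queue paired with the last TX queue;
     * host-initiated flows never update this and stay on queue pair 0
     */
    static unsigned int demo_flow_to_tap(const struct flow_qpair_demo *f)
    {
        return f->qpair;
    }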

This series adds:
- Multiqueue capability advertisement (VIRTIO_NET_F_MQ and
  VHOST_USER_PROTOCOL_F_MQ features)
- Per-queue-pair packet pools to support concurrent queue operations
- Queue pair parameter throughout the network stack, propagated through all
  protocol handlers (TCP, UDP, ICMP, ARP, DHCP, DHCPv6, NDP)
- Flow-aware queue routing that tracks the originating TX queue for each flow
  and routes return packets to the corresponding RX queue
- Test coverage with VHOST_USER_MQ environment variable to validate multiqueue
  functionality across all protocols (TCP, UDP, ICMP) and services (DHCP, NDP)

Current behavior:
- TX queue selection is controlled by the guest kernel's networking stack
- RX packets are routed to queues based on their associated flows, with the
  queue assignment updated on each TX packet to maintain affinity
- Host-initiated flows (e.g., from socket-side connections) currently default
  to queue pair 0

The changes are transparent to single-queue operation - passt/pasta modes and
single-queue vhost-user configurations continue to work unchanged, always
using queue pair 0.

v3:
- Removed --max-qpairs configuration option - multiqueue support is now
  always enabled up to 16 queue pairs without requiring explicit configuration
- Replaced "tap: Add queue pair parameter throughout the packet processing
  path" with "tap: Convert packet pools to per-queue-pair arrays for
  multiqueue" - simplified implementation by converting global pools to arrays
  rather than passing pool parameters throughout
- Changed qpair parameter type from int to unsigned int throughout the codebase
- Simplified test infrastructure - queues parameter is always set on netdev,
  mq=true added to virtio-net only when VHOST_USER_MQ > 1
- Updated QEMU usage hints to always show multiqueue-capable command line

v2:
- New patch: "tap: Remove pool parameter from tap4_handler() and tap6_handler()"
  to clean up unused parameters before adding queue pair parameter
- Changed to one packet pool per queue pair instead of shared pools across
  all queue pairs
- Split "multiqueue: Add queue-aware flow management..." into two patches:
  - "tap: Add queue pair parameter throughout the packet processing path"
  - "flow: Add queue pair tracking to flow management"
- Updated test infrastructure patch with refined implementation

Laurent Vivier (6):
  tap: Remove pool parameter from tap4_handler() and tap6_handler()
  vhost-user: Enable multiqueue
  test: Add multiqueue support to vhost-user test infrastructure
  vhost-user: Add queue pair parameter throughout the network stack
  tap: Convert packet pools to per-queue-pair arrays for multiqueue
  flow: Add queue pair tracking to flow management

 arp.c          |  15 +++--
 arp.h          |   6 +-
 dhcp.c         |   5 +-
 dhcp.h         |   2 +-
 dhcpv6.c       |  12 ++--
 dhcpv6.h       |   2 +-
 flow.c         |  33 +++++++++
 flow.h         |  17 +++++
 fwd.c          |  18 ++---
 fwd.h          |   5 +-
 icmp.c         |  25 ++++---
 icmp.h         |   4 +-
 ndp.c          |  35 ++++++----
 ndp.h          |   7 +-
 netlink.c      |   2 +-
 tap.c          | 177 ++++++++++++++++++++++++++++---------------------
 tap.h          |  20 +++---
 tcp.c          |  47 +++++++------
 tcp.h          |   7 +-
 tcp_vu.c       |   8 ++-
 test/lib/setup |  21 +++---
 test/run       |  23 +++++++
 udp.c          |  47 +++++++------
 udp.h          |   6 +-
 udp_flow.c     |   8 ++-
 udp_flow.h     |   2 +-
 udp_internal.h |   4 +-
 udp_vu.c       |   4 +-
 vhost_user.c   |  10 +--
 virtio.h       |   2 +-
 vu_common.c    |  15 +++--
 vu_common.h    |   3 +-
 32 files changed, 374 insertions(+), 218 deletions(-)

-- 
2.51.1




* [PATCH v3 1/6] tap: Remove pool parameter from tap4_handler() and tap6_handler()
  2025-12-03 18:54 [PATCH v3 0/6] vhost-user: Add multiqueue support Laurent Vivier
@ 2025-12-03 18:54 ` Laurent Vivier
  2025-12-05  4:14   ` David Gibson
  2025-12-03 18:54 ` [PATCH v3 2/6] vhost-user: Enable multiqueue Laurent Vivier
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Laurent Vivier @ 2025-12-03 18:54 UTC (permalink / raw)
  To: passt-dev; +Cc: Laurent Vivier

These handlers only ever operate on their respective global pools
(pool_tap4 and pool_tap6). The pool parameter was always passed the
same value, making it an unnecessary indirection.

Access the global pools directly instead, simplifying the function
signatures.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
 tap.c | 46 +++++++++++++++++++++-------------------------
 1 file changed, 21 insertions(+), 25 deletions(-)

diff --git a/tap.c b/tap.c
index 44b06448757a..2cda8c9772b8 100644
--- a/tap.c
+++ b/tap.c
@@ -696,23 +696,21 @@ static bool tap4_is_fragment(const struct iphdr *iph,
 /**
  * tap4_handler() - IPv4 and ARP packet handler for tap file descriptor
  * @c:		Execution context
- * @in:		Ingress packet pool, packets with Ethernet headers
  * @now:	Current timestamp
  *
  * Return: count of packets consumed by handlers
  */
-static int tap4_handler(struct ctx *c, const struct pool *in,
-			const struct timespec *now)
+static int tap4_handler(struct ctx *c, const struct timespec *now)
 {
 	unsigned int i, j, seq_count;
 	struct tap4_l4_t *seq;
 
-	if (!c->ifi4 || !in->count)
-		return in->count;
+	if (!c->ifi4 || !pool_tap4->count)
+		return pool_tap4->count;
 
 	i = 0;
 resume:
-	for (seq_count = 0, seq = NULL; i < in->count; i++) {
+	for (seq_count = 0, seq = NULL; i < pool_tap4->count; i++) {
 		size_t l3len, hlen, l4len;
 		struct ethhdr eh_storage;
 		struct iphdr iph_storage;
@@ -722,7 +720,7 @@ resume:
 		struct iov_tail data;
 		struct iphdr *iph;
 
-		if (!packet_get(in, i, &data))
+		if (!packet_get(pool_tap4, i, &data))
 			continue;
 
 		eh = IOV_PEEK_HEADER(&data, eh_storage);
@@ -789,7 +787,7 @@ resume:
 		if (iph->protocol == IPPROTO_UDP) {
 			struct iov_tail eh_data;
 
-			packet_get(in, i, &eh_data);
+			packet_get(pool_tap4, i, &eh_data);
 			if (dhcp(c, &eh_data))
 				continue;
 		}
@@ -820,7 +818,7 @@ resume:
 			goto append;
 
 		if (seq_count == TAP_SEQS)
-			break;	/* Resume after flushing if i < in->count */
+			break;	/* Resume after flushing if i < pool_tap4->count */
 
 		for (seq = tap4_l4 + seq_count - 1; seq >= tap4_l4; seq--) {
 			if (L4_MATCH(iph, uh, seq)) {
@@ -866,32 +864,30 @@ append:
 		}
 	}
 
-	if (i < in->count)
+	if (i < pool_tap4->count)
 		goto resume;
 
-	return in->count;
+	return pool_tap4->count;
 }
 
 /**
  * tap6_handler() - IPv6 packet handler for tap file descriptor
  * @c:		Execution context
- * @in:		Ingress packet pool, packets with Ethernet headers
  * @now:	Current timestamp
  *
  * Return: count of packets consumed by handlers
  */
-static int tap6_handler(struct ctx *c, const struct pool *in,
-			const struct timespec *now)
+static int tap6_handler(struct ctx *c, const struct timespec *now)
 {
 	unsigned int i, j, seq_count = 0;
 	struct tap6_l4_t *seq;
 
-	if (!c->ifi6 || !in->count)
-		return in->count;
+	if (!c->ifi6 || !pool_tap6->count)
+		return pool_tap6->count;
 
 	i = 0;
 resume:
-	for (seq_count = 0, seq = NULL; i < in->count; i++) {
+	for (seq_count = 0, seq = NULL; i < pool_tap6->count; i++) {
 		size_t l4len, plen, check;
 		struct in6_addr *saddr, *daddr;
 		struct ipv6hdr ip6h_storage;
@@ -903,7 +899,7 @@ resume:
 		struct ipv6hdr *ip6h;
 		uint8_t proto;
 
-		if (!packet_get(in, i, &data))
+		if (!packet_get(pool_tap6, i, &data))
 			return -1;
 
 		eh = IOV_REMOVE_HEADER(&data, eh_storage);
@@ -1011,7 +1007,7 @@ resume:
 			goto append;
 
 		if (seq_count == TAP_SEQS)
-			break;	/* Resume after flushing if i < in->count */
+			break;	/* Resume after flushing if i < pool_tap6->count */
 
 		for (seq = tap6_l4 + seq_count - 1; seq >= tap6_l4; seq--) {
 			if (L4_MATCH(ip6h, proto, uh, seq)) {
@@ -1058,10 +1054,10 @@ append:
 		}
 	}
 
-	if (i < in->count)
+	if (i < pool_tap6->count)
 		goto resume;
 
-	return in->count;
+	return pool_tap6->count;
 }
 
 /**
@@ -1080,8 +1076,8 @@ void tap_flush_pools(void)
  */
 void tap_handler(struct ctx *c, const struct timespec *now)
 {
-	tap4_handler(c, pool_tap4, now);
-	tap6_handler(c, pool_tap6, now);
+	tap4_handler(c, now);
+	tap6_handler(c, now);
 }
 
 /**
@@ -1115,14 +1111,14 @@ void tap_add_packet(struct ctx *c, struct iov_tail *data,
 	case ETH_P_ARP:
 	case ETH_P_IP:
 		if (!pool_can_fit(pool_tap4, data)) {
-			tap4_handler(c, pool_tap4, now);
+			tap4_handler(c, now);
 			pool_flush(pool_tap4);
 		}
 		packet_add(pool_tap4, data);
 		break;
 	case ETH_P_IPV6:
 		if (!pool_can_fit(pool_tap6, data)) {
-			tap6_handler(c, pool_tap6, now);
+			tap6_handler(c, now);
 			pool_flush(pool_tap6);
 		}
 		packet_add(pool_tap6, data);
-- 
2.51.1



* [PATCH v3 2/6] vhost-user: Enable multiqueue
  2025-12-03 18:54 [PATCH v3 0/6] vhost-user: Add multiqueue support Laurent Vivier
  2025-12-03 18:54 ` [PATCH v3 1/6] tap: Remove pool parameter from tap4_handler() and tap6_handler() Laurent Vivier
@ 2025-12-03 18:54 ` Laurent Vivier
  2025-12-10  0:04   ` David Gibson
  2025-12-11  7:01   ` Stefano Brivio
  2025-12-03 18:54 ` [PATCH v3 3/6] test: Add multiqueue support to vhost-user test infrastructure Laurent Vivier
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 18+ messages in thread
From: Laurent Vivier @ 2025-12-03 18:54 UTC (permalink / raw)
  To: passt-dev; +Cc: Laurent Vivier

Advertise multi-queue support in vhost-user by setting the
VIRTIO_NET_F_MQ and VHOST_USER_PROTOCOL_F_MQ feature flags, and increase
VHOST_USER_MAX_VQS from 2 to 32, supporting up to 16 queue pairs.

Currently, only the first RX queue (queue 0) is used for receiving
packets. The guest kernel selects which TX queue to use for
transmission. Full multi-RX queue load balancing will be implemented in
future work.

Update the QEMU usage hint to show the required parameters for enabling
multiqueue: queues parameter on the netdev, and mq=true on the
virtio-net device.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
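
A note for context, not part of the patch: QEMU checks the backend's
limit with VHOST_USER_GET_QUEUE_NUM (answered here with
VHOST_USER_MAX_VQS / 2 = 16), and the guest discovers the queue pair
count through the virtio-net config space once VIRTIO_NET_F_MQ is
negotiated. A minimal sketch of that config layout, following the virtio
specification (illustrative struct name):

    #include <stdint.h>

    #define VHOST_USER_MAX_VQS 32    /* as set by this patch */

    /* Leading fields of the virtio-net device config space.
     * max_virtqueue_pairs is only valid once VIRTIO_NET_F_MQ is
     * negotiated; the driver then enables 1..max_virtqueue_pairs queue
     * pairs, in practice capped by the number of vCPUs.
     */
    struct virtio_net_config_demo {
        uint8_t  mac[6];
        uint16_t status;
        uint16_t max_virtqueue_pairs;    /* up to VHOST_USER_MAX_VQS / 2 */
    };
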
 tap.c        |  7 +++++--
 vhost_user.c | 10 ++++++----
 virtio.h     |  2 +-
 3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/tap.c b/tap.c
index 2cda8c9772b8..591b49491aa3 100644
--- a/tap.c
+++ b/tap.c
@@ -1314,8 +1314,11 @@ static void tap_backend_show_hints(struct ctx *c)
 		break;
 	case MODE_VU:
 		info("You can start qemu with:");
-		info("    kvm ... -chardev socket,id=chr0,path=%s -netdev vhost-user,id=netdev0,chardev=chr0 -device virtio-net,netdev=netdev0 -object memory-backend-memfd,id=memfd0,share=on,size=$RAMSIZE -numa node,memdev=memfd0\n",
-		     c->sock_path);
+		info("    kvm ... -chardev socket,id=chr0,path=%s "
+		     "-netdev vhost-user,id=netdev0,chardev=chr0,queues=$QUEUES "
+		     "-device virtio-net,netdev=netdev0,mq=true "
+		     "-object memory-backend-memfd,id=memfd0,share=on,size=$RAMSIZE "
+		     "-numa node,memdev=memfd0\n", c->sock_path);
 		break;
 	}
 }
diff --git a/vhost_user.c b/vhost_user.c
index aa7c869d9e56..845fdb551c84 100644
--- a/vhost_user.c
+++ b/vhost_user.c
@@ -323,6 +323,7 @@ static bool vu_get_features_exec(struct vu_dev *vdev,
 	uint64_t features =
 		1ULL << VIRTIO_F_VERSION_1 |
 		1ULL << VIRTIO_NET_F_MRG_RXBUF |
+		1ULL << VIRTIO_NET_F_MQ |
 		1ULL << VHOST_F_LOG_ALL |
 		1ULL << VHOST_USER_F_PROTOCOL_FEATURES;
 
@@ -767,7 +768,8 @@ static void vu_check_queue_msg_file(struct vhost_user_msg *vmsg)
 	int idx = vmsg->payload.u64 & VHOST_USER_VRING_IDX_MASK;
 
 	if (idx >= VHOST_USER_MAX_VQS)
-		die("Invalid vhost-user queue index: %u", idx);
+		die("Invalid vhost-user queue index: %u (maximum %u)", idx,
+		    VHOST_USER_MAX_VQS);
 
 	if (nofd) {
 		vmsg_close_fds(vmsg);
@@ -896,7 +898,8 @@ static bool vu_get_protocol_features_exec(struct vu_dev *vdev,
 	uint64_t features = 1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK |
 			    1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD |
 			    1ULL << VHOST_USER_PROTOCOL_F_DEVICE_STATE |
-			    1ULL << VHOST_USER_PROTOCOL_F_RARP;
+			    1ULL << VHOST_USER_PROTOCOL_F_RARP |
+			    1ULL << VHOST_USER_PROTOCOL_F_MQ;
 
 	(void)vdev;
 	vmsg_set_reply_u64(vmsg, features);
@@ -935,10 +938,9 @@ static bool vu_get_queue_num_exec(struct vu_dev *vdev,
 {
 	(void)vdev;
 
-	/* NOLINTNEXTLINE(misc-redundant-expression) */
 	vmsg_set_reply_u64(vmsg, VHOST_USER_MAX_VQS / 2);
 
-	debug("VHOST_USER_MAX_VQS  %u", VHOST_USER_MAX_VQS / 2);
+	debug("queue num  %u", VHOST_USER_MAX_VQS / 2);
 
 	return true;
 }
diff --git a/virtio.h b/virtio.h
index 12caaa0b6def..176c935cecc7 100644
--- a/virtio.h
+++ b/virtio.h
@@ -88,7 +88,7 @@ struct vu_dev_region {
 	uint64_t mmap_addr;
 };
 
-#define VHOST_USER_MAX_VQS 2
+#define VHOST_USER_MAX_VQS 32
 
 /*
  * Set a reasonable maximum number of ram slots, which will be supported by
-- 
2.51.1



* [PATCH v3 3/6] test: Add multiqueue support to vhost-user test infrastructure
  2025-12-03 18:54 [PATCH v3 0/6] vhost-user: Add multiqueue support Laurent Vivier
  2025-12-03 18:54 ` [PATCH v3 1/6] tap: Remove pool parameter from tap4_handler() and tap6_handler() Laurent Vivier
  2025-12-03 18:54 ` [PATCH v3 2/6] vhost-user: Enable multiqueue Laurent Vivier
@ 2025-12-03 18:54 ` Laurent Vivier
  2025-12-10  0:05   ` David Gibson
  2025-12-03 18:54 ` [PATCH v3 4/6] vhost-user: Add queue pair parameter throughout the network stack Laurent Vivier
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Laurent Vivier @ 2025-12-03 18:54 UTC (permalink / raw)
  To: passt-dev; +Cc: Laurent Vivier

With the recent addition of multiqueue support to passt's vhost-user
implementation, we need test coverage to validate the functionality. The
test infrastructure previously only tested single queue configurations.

Add a VHOST_USER_MQ environment variable to control the number of queue
pairs. The queues parameter on the netdev is always set to this value
(defaulting to 1 for single queue). When it is greater than 1, the setup
scripts also add mq=true to the virtio-net device to enable multiqueue
support.

The test suite now runs an additional set of tests with 8 queue pairs to
exercise the multiqueue paths across all protocols (TCP, UDP, ICMP) and
services (DHCP, NDP). Note that the guest kernel will only enable as many
queues as there are vCPUs.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
 test/lib/setup | 21 +++++++++++++--------
 test/run       | 23 +++++++++++++++++++++++
 2 files changed, 36 insertions(+), 8 deletions(-)

diff --git a/test/lib/setup b/test/lib/setup
index 5994598744a3..3872a02b109b 100755
--- a/test/lib/setup
+++ b/test/lib/setup
@@ -18,6 +18,8 @@ VCPUS="$( [ $(nproc) -ge 8 ] && echo 6 || echo $(( $(nproc) / 2 + 1 )) )"
 MEM_KIB="$(sed -n 's/MemTotal:[ ]*\([0-9]*\) kB/\1/p' /proc/meminfo)"
 QEMU_ARCH="$(uname -m)"
 [ "${QEMU_ARCH}" = "i686" ] && QEMU_ARCH=i386
+VHOST_USER=0
+VHOST_USER_MQ=1
 
 # setup_build() - Set up pane layout for build tests
 setup_build() {
@@ -46,6 +48,7 @@ setup_passt() {
 	[ ${DEBUG} -eq 1 ] && __opts="${__opts} -d"
 	[ ${TRACE} -eq 1 ] && __opts="${__opts} --trace"
 	[ ${VHOST_USER} -eq 1 ] && __opts="${__opts} --vhost-user"
+	[ ${VHOST_USER_MQ} -gt 1 ] && __virtio_opts="${__virtio_opts},mq=true"
 
 	context_run passt "make clean"
 	context_run passt "make valgrind"
@@ -59,8 +62,8 @@ setup_passt() {
 		__vmem="$(((${__vmem} + 500) / 1000))G"
 		__qemu_netdev="						       \
 			-chardev socket,id=c,path=${STATESETUP}/passt.socket   \
-			-netdev vhost-user,id=v,chardev=c		       \
-			-device virtio-net,netdev=v			       \
+			-netdev vhost-user,id=v,chardev=c,queues=${VHOST_USER_MQ} \
+			-device virtio-net,netdev=v${__virtio_opts}            \
 			-object memory-backend-memfd,id=m,share=on,size=${__vmem} \
 			-numa node,memdev=m"
 	else
@@ -156,6 +159,7 @@ setup_passt_in_ns() {
 	[ ${DEBUG} -eq 1 ] && __opts="${__opts} -d"
 	[ ${TRACE} -eq 1 ] && __opts="${__opts} --trace"
 	[ ${VHOST_USER} -eq 1 ] && __opts="${__opts} --vhost-user"
+	[ ${VHOST_USER_MQ} -gt 1 ] && __virtio_opts="${__virtio_opts},mq=true"
 
 	if [ ${VALGRIND} -eq 1 ]; then
 		context_run passt "make clean"
@@ -173,8 +177,8 @@ setup_passt_in_ns() {
 		__vmem="$(((${__vmem} + 500) / 1000))G"
 		__qemu_netdev="						       \
 			-chardev socket,id=c,path=${STATESETUP}/passt.socket   \
-			-netdev vhost-user,id=v,chardev=c		       \
-			-device virtio-net,netdev=v			       \
+			-netdev vhost-user,id=v,chardev=c,queues=${VHOST_USER_MQ} \
+			-device virtio-net,netdev=v${__virtio_opts}            \
 			-object memory-backend-memfd,id=m,share=on,size=${__vmem} \
 			-numa node,memdev=m"
 	else
@@ -251,6 +255,7 @@ setup_two_guests() {
 	[ ${DEBUG} -eq 1 ] && __opts="${__opts} -d"
 	[ ${TRACE} -eq 1 ] && __opts="${__opts} --trace"
 	[ ${VHOST_USER} -eq 1 ] && __opts="${__opts} --vhost-user"
+	[ ${VHOST_USER_MQ} -gt 1 ] && __virtio_opts="${__virtio_opts},mq=true"
 
 	context_run_bg passt_2 "./passt -s ${STATESETUP}/passt_2.socket -P ${STATESETUP}/passt_2.pid -f ${__opts} --hostname hostname2 --fqdn fqdn2 -t 10004 -u 10004"
 	wait_for [ -f "${STATESETUP}/passt_2.pid" ]
@@ -260,14 +265,14 @@ setup_two_guests() {
 		__vmem="$(((${__vmem} + 500) / 1000))G"
 		__qemu_netdev1="					       \
 			-chardev socket,id=c,path=${STATESETUP}/passt_1.socket \
-			-netdev vhost-user,id=v,chardev=c		       \
-			-device virtio-net,netdev=v			       \
+			-netdev vhost-user,id=v,chardev=c,queues=${VHOST_USER_MQ} \
+			-device virtio-net,netdev=v${__virtio_opts}            \
 			-object memory-backend-memfd,id=m,share=on,size=${__vmem} \
 			-numa node,memdev=m"
 		__qemu_netdev2="					       \
 			-chardev socket,id=c,path=${STATESETUP}/passt_2.socket \
-			-netdev vhost-user,id=v,chardev=c		       \
-			-device virtio-net,netdev=v			       \
+			-netdev vhost-user,id=v,chardev=c,queues=${VHOST_USER_MQ} \
+			-device virtio-net,netdev=v${__virtio_opts}            \
 			-object memory-backend-memfd,id=m,share=on,size=${__vmem} \
 			-numa node,memdev=m"
 	else
diff --git a/test/run b/test/run
index f858e5586847..652cc12b1234 100755
--- a/test/run
+++ b/test/run
@@ -190,6 +190,29 @@ run() {
 	test passt_vu_in_ns/shutdown
 	teardown passt_in_ns
 
+	VHOST_USER=1
+	VHOST_USER_MQ=8
+	setup passt_in_ns
+	test passt_vu/ndp
+	test passt_vu_in_ns/dhcp
+	test passt_vu_in_ns/icmp
+	test passt_vu_in_ns/tcp
+	test passt_vu_in_ns/udp
+	test passt_vu_in_ns/shutdown
+	teardown passt_in_ns
+
+	setup two_guests
+	test two_guests_vu/basic
+	teardown two_guests
+
+	setup passt_in_ns
+	test passt_vu/ndp
+	test passt_vu_in_ns/dhcp
+	test perf/passt_vu_tcp
+	test perf/passt_vu_udp
+	test passt_vu_in_ns/shutdown
+	teardown passt_in_ns
+
 	# TODO: Make those faster by at least pre-installing gcc and make on
 	# non-x86 images, then re-enable.
 skip_distro() {
-- 
2.51.1



* [PATCH v3 4/6] vhost-user: Add queue pair parameter throughout the network stack
  2025-12-03 18:54 [PATCH v3 0/6] vhost-user: Add multiqueue support Laurent Vivier
                   ` (2 preceding siblings ...)
  2025-12-03 18:54 ` [PATCH v3 3/6] test: Add multiqueue support to vhost-user test infrastructure Laurent Vivier
@ 2025-12-03 18:54 ` Laurent Vivier
  2025-12-11  7:01   ` Stefano Brivio
  2025-12-03 18:54 ` [PATCH v3 5/6] tap: Convert packet pools to per-queue-pair arrays for multiqueue Laurent Vivier
  2025-12-03 18:54 ` [PATCH v3 6/6] flow: Add queue pair tracking to flow management Laurent Vivier
  5 siblings, 1 reply; 18+ messages in thread
From: Laurent Vivier @ 2025-12-03 18:54 UTC (permalink / raw)
  To: passt-dev; +Cc: Laurent Vivier

Add a queue pair parameter to vu_send_single() and propagate it through
the entire network stack call chain. The parameter specifies which queue
pair to use when sending packets in vhost-user mode.

All callers currently pass queue pair #0 to preserve existing
behavior. This is a preparatory step for enabling multi-queue and
per-queue worker threads in vhost-user mode.

No functional change.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
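
For reference, and with hypothetical helper names: virtio-net orders its
virtqueues as rx0, tx0, rx1, tx1, ..., so a queue pair number maps to
virtqueue indices as sketched below; this is how a backend can pick the
RX ring for a given @qpair.

    /* virtio-net virtqueue layout: receiveq(N) = 2 * N,
     * transmitq(N) = 2 * N + 1
     */
    static inline unsigned int demo_rx_vq(unsigned int qpair)
    {
        return 2 * qpair;
    }

    static inline unsigned int demo_tx_vq(unsigned int qpair)
    {
        return 2 * qpair + 1;
    }
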
 arp.c          | 15 +++++----
 arp.h          |  6 ++--
 dhcp.c         |  5 +--
 dhcp.h         |  2 +-
 dhcpv6.c       | 12 +++++---
 dhcpv6.h       |  2 +-
 fwd.c          | 18 ++++++-----
 fwd.h          |  5 +--
 icmp.c         |  6 ++--
 ndp.c          | 35 ++++++++++++---------
 ndp.h          |  7 +++--
 netlink.c      |  2 +-
 tap.c          | 83 +++++++++++++++++++++++++++++---------------------
 tap.h          | 17 ++++++-----
 tcp.c          | 16 ++++++----
 tcp.h          |  7 +++--
 udp.c          | 22 +++++++------
 udp.h          |  4 +--
 udp_internal.h |  4 +--
 vu_common.c    | 13 +++++---
 vu_common.h    |  3 +-
 21 files changed, 164 insertions(+), 120 deletions(-)

diff --git a/arp.c b/arp.c
index bb042e9585a3..1dc8b87cd993 100644
--- a/arp.c
+++ b/arp.c
@@ -63,11 +63,12 @@ static bool ignore_arp(const struct ctx *c,
 /**
  * arp() - Check if this is a supported ARP message, reply as needed
  * @c:		Execution context
+ * @qpair:	Queue pair on which to send the reply
  * @data:	Single packet with Ethernet buffer
  *
  * Return: 1 if handled, -1 on failure
  */
-int arp(const struct ctx *c, struct iov_tail *data)
+int arp(const struct ctx *c, unsigned int qpair, struct iov_tail *data)
 {
 	union inany_addr tgt;
 	struct {
@@ -112,7 +113,7 @@ int arp(const struct ctx *c, struct iov_tail *data)
 	memcpy(resp.am.tha,		am->sha,	sizeof(resp.am.tha));
 	memcpy(resp.am.tip,		am->sip,	sizeof(resp.am.tip));
 
-	tap_send_single(c, &resp, sizeof(resp));
+	tap_send_single(c, qpair, &resp, sizeof(resp));
 
 	return 1;
 }
@@ -120,8 +121,9 @@ int arp(const struct ctx *c, struct iov_tail *data)
 /**
  * arp_send_init_req() - Send initial ARP request to retrieve guest MAC address
  * @c:		Execution context
+ * @qpair:	Queue pair on which to send the request
  */
-void arp_send_init_req(const struct ctx *c)
+void arp_send_init_req(const struct ctx *c, unsigned int qpair)
 {
 	struct {
 		struct ethhdr eh;
@@ -148,16 +150,17 @@ void arp_send_init_req(const struct ctx *c)
 	memcpy(req.am.tip,	&c->ip4.addr,		sizeof(req.am.tip));
 
 	debug("Sending initial ARP request for guest MAC address");
-	tap_send_single(c, &req, sizeof(req));
+	tap_send_single(c, qpair, &req, sizeof(req));
 }
 
 /**
  * arp_announce() - Send an ARP announcement for an IPv4 host
  * @c:		Execution context
+ * @qpair:	Queue pair on which to send the announcement
  * @ip:	IPv4 address we announce as owned by @mac
  * @mac:	MAC address to advertise for @ip
  */
-void arp_announce(const struct ctx *c, struct in_addr *ip,
+void arp_announce(const struct ctx *c, unsigned int qpair, struct in_addr *ip,
 		  const unsigned char *mac)
 {
 	char ip_str[INET_ADDRSTRLEN];
@@ -202,5 +205,5 @@ void arp_announce(const struct ctx *c, struct in_addr *ip,
 	eth_ntop(mac, mac_str, sizeof(mac_str));
 	debug("ARP announcement for %s / %s", ip_str, mac_str);
 
-	tap_send_single(c, &msg, sizeof(msg));
+	tap_send_single(c, qpair, &msg, sizeof(msg));
 }
diff --git a/arp.h b/arp.h
index 4862e90a14ee..0f7a722a8ea8 100644
--- a/arp.h
+++ b/arp.h
@@ -20,9 +20,9 @@ struct arpmsg {
 	unsigned char tip[4];
 } __attribute__((__packed__));
 
-int arp(const struct ctx *c, struct iov_tail *data);
-void arp_send_init_req(const struct ctx *c);
-void arp_announce(const struct ctx *c, struct in_addr *ip,
+int arp(const struct ctx *c, unsigned int qpair, struct iov_tail *data);
+void arp_send_init_req(const struct ctx *c, unsigned int qpair);
+void arp_announce(const struct ctx *c, unsigned int qpair, struct in_addr *ip,
 		  const unsigned char *mac);
 
 #endif /* ARP_H */
diff --git a/dhcp.c b/dhcp.c
index 6b9c2e3b9e5a..e3f5673cc5d8 100644
--- a/dhcp.c
+++ b/dhcp.c
@@ -296,11 +296,12 @@ static void opt_set_dns_search(const struct ctx *c, size_t max_len)
 /**
  * dhcp() - Check if this is a DHCP message, reply as needed
  * @c:		Execution context
+ * @qpair:	Queue pair on which to send the reply
  * @data:	Single packet with Ethernet buffer
  *
  * Return: 0 if it's not a DHCP message, 1 if handled, -1 on failure
  */
-int dhcp(const struct ctx *c, struct iov_tail *data)
+int dhcp(const struct ctx *c, unsigned int qpair, struct iov_tail *data)
 {
 	char macstr[ETH_ADDRSTRLEN];
 	size_t mlen, dlen, opt_len;
@@ -471,7 +472,7 @@ int dhcp(const struct ctx *c, struct iov_tail *data)
 	else
 		dst = c->ip4.addr;
 
-	tap_udp4_send(c, c->ip4.our_tap_addr, 67, dst, 68, &reply, dlen);
+	tap_udp4_send(c, qpair, c->ip4.our_tap_addr, 67, dst, 68, &reply, dlen);
 
 	return 1;
 }
diff --git a/dhcp.h b/dhcp.h
index cd50c99b8856..6d034f0c58af 100644
--- a/dhcp.h
+++ b/dhcp.h
@@ -6,7 +6,7 @@
 #ifndef DHCP_H
 #define DHCP_H
 
-int dhcp(const struct ctx *c, struct iov_tail *data);
+int dhcp(const struct ctx *c, unsigned int qpair, struct iov_tail *data);
 void dhcp_init(void);
 
 #endif /* DHCP_H */
diff --git a/dhcpv6.c b/dhcpv6.c
index e4df0db562e6..5fffac5d95e5 100644
--- a/dhcpv6.c
+++ b/dhcpv6.c
@@ -369,12 +369,13 @@ notonlink:
 /**
  * dhcpv6_send_ia_notonlink() - Send NotOnLink status
  * @c:			Execution context
+ * @qpair:		Queue pair on which to send the reply
  * @ia_base:		Non-appropriate IA_NA or IA_TA base
  * @client_id_base:	Client ID message option base
  * @len:		Client ID length
  * @xid:		Transaction ID for message exchange
  */
-static void dhcpv6_send_ia_notonlink(struct ctx *c,
+static void dhcpv6_send_ia_notonlink(struct ctx *c, unsigned int qpair,
 				     const struct iov_tail *ia_base,
 				     const struct iov_tail *client_id_base,
 				     int len, uint32_t xid)
@@ -404,7 +405,7 @@ static void dhcpv6_send_ia_notonlink(struct ctx *c,
 
 	resp_not_on_link.hdr.xid = xid;
 
-	tap_udp6_send(c, src, 547, tap_ip6_daddr(c, src), 546,
+	tap_udp6_send(c, qpair, src, 547, tap_ip6_daddr(c, src), 546,
 		      xid, &resp_not_on_link, n);
 }
 
@@ -539,13 +540,14 @@ static size_t dhcpv6_client_fqdn_fill(const struct iov_tail *data,
 /**
  * dhcpv6() - Check if this is a DHCPv6 message, reply as needed
  * @c:		Execution context
+ * @qpair:	Queue pair on which to send the reply
  * @data:	Single packet starting from UDP header
  * @saddr:	Source IPv6 address of original message
  * @daddr:	Destination IPv6 address of original message
  *
  * Return: 0 if it's not a DHCPv6 message, 1 if handled, -1 on failure
  */
-int dhcpv6(struct ctx *c, struct iov_tail *data,
+int dhcpv6(struct ctx *c, unsigned int qpair, struct iov_tail *data,
 	   const struct in6_addr *saddr, const struct in6_addr *daddr)
 {
 	const struct opt_server_id *server_id = NULL;
@@ -627,7 +629,7 @@ int dhcpv6(struct ctx *c, struct iov_tail *data,
 
 		if (dhcpv6_ia_notonlink(data, &c->ip6.addr)) {
 
-			dhcpv6_send_ia_notonlink(c, data, &client_id_base,
+			dhcpv6_send_ia_notonlink(c, qpair, data, &client_id_base,
 						 ntohs(client_id->l), mh->xid);
 
 			return 1;
@@ -677,7 +679,7 @@ int dhcpv6(struct ctx *c, struct iov_tail *data,
 
 	resp.hdr.xid = mh->xid;
 
-	tap_udp6_send(c, src, 547, tap_ip6_daddr(c, src), 546,
+	tap_udp6_send(c, qpair, src, 547, tap_ip6_daddr(c, src), 546,
 		      mh->xid, &resp, n);
 	c->ip6.addr_seen = c->ip6.addr;
 
diff --git a/dhcpv6.h b/dhcpv6.h
index c706dfdbb2ac..3a249b39e6c7 100644
--- a/dhcpv6.h
+++ b/dhcpv6.h
@@ -6,7 +6,7 @@
 #ifndef DHCPV6_H
 #define DHCPV6_H
 
-int dhcpv6(struct ctx *c, struct iov_tail *data,
+int dhcpv6(struct ctx *c, unsigned int qpair, struct iov_tail *data,
 	   struct in6_addr *saddr, struct in6_addr *daddr);
 void dhcpv6_init(const struct ctx *c);
 
diff --git a/fwd.c b/fwd.c
index c417e0f5ddb9..082d9de1eb91 100644
--- a/fwd.c
+++ b/fwd.c
@@ -110,12 +110,14 @@ static struct neigh_table_entry *fwd_neigh_table_find(const struct ctx *c,
 /**
  * fwd_neigh_table_update() - Allocate or update neighbour table entry
  * @c:		Execution context
+ * @qpair:	Queue pair to use for sending announcements
  * @addr:	IP address used to determine insertion slot and store in entry
  * @mac:	The MAC address associated with the neighbour address
  * @permanent:	Created entry cannot be altered or freed
  */
-void fwd_neigh_table_update(const struct ctx *c, const union inany_addr *addr,
-			    const uint8_t *mac, bool permanent)
+void fwd_neigh_table_update(const struct ctx *c, unsigned int qpair,
+			    const union inany_addr *addr, const uint8_t *mac,
+			    bool permanent)
 {
 	struct neigh_table *t = &neigh_table;
 	struct neigh_table_entry *e;
@@ -147,9 +149,9 @@ void fwd_neigh_table_update(const struct ctx *c, const union inany_addr *addr,
 		return;
 
 	if (inany_v4(addr))
-		arp_announce(c, inany_v4(addr), e->mac);
+		arp_announce(c, qpair, inany_v4(addr), e->mac);
 	else
-		ndp_unsolicited_na(c, &addr->a6);
+		ndp_unsolicited_na(c, qpair, &addr->a6);
 }
 
 /**
@@ -230,19 +232,19 @@ void fwd_neigh_table_init(const struct ctx *c)
 
 	/* Blocker entries to stop events from hosts using these addresses */
 	if (!inany_is_unspecified4(&mhl))
-		fwd_neigh_table_update(c, &mhl, c->our_tap_mac, true);
+		fwd_neigh_table_update(c, 0, &mhl, c->our_tap_mac, true);
 
 	if (!inany_is_unspecified4(&mga))
-		fwd_neigh_table_update(c, &mga, c->our_tap_mac, true);
+		fwd_neigh_table_update(c, 0, &mga, c->our_tap_mac, true);
 
 	mhl = *(union inany_addr *)&c->ip6.map_host_loopback;
 	mga = *(union inany_addr *)&c->ip6.map_guest_addr;
 
 	if (!inany_is_unspecified6(&mhl))
-		fwd_neigh_table_update(c, &mhl, c->our_tap_mac, true);
+		fwd_neigh_table_update(c, 0, &mhl, c->our_tap_mac, true);
 
 	if (!inany_is_unspecified6(&mga))
-		fwd_neigh_table_update(c, &mga, c->our_tap_mac, true);
+		fwd_neigh_table_update(c, 0, &mga, c->our_tap_mac, true);
 }
 
 /** fwd_probe_ephemeral() - Determine what ports this host considers ephemeral
diff --git a/fwd.h b/fwd.h
index 779258221a9a..839737028ace 100644
--- a/fwd.h
+++ b/fwd.h
@@ -55,8 +55,9 @@ uint8_t fwd_nat_from_splice(const struct ctx *c, uint8_t proto,
 			    const struct flowside *ini, struct flowside *tgt);
 uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto,
 			  const struct flowside *ini, struct flowside *tgt);
-void fwd_neigh_table_update(const struct ctx *c, const union inany_addr *addr,
-			    const uint8_t *mac, bool permanent);
+void fwd_neigh_table_update(const struct ctx *c, unsigned int qpair,
+			    const union inany_addr *addr, const uint8_t *mac,
+			    bool permanent);
 void fwd_neigh_table_free(const struct ctx *c,
 			  const union inany_addr *addr);
 void fwd_neigh_mac_get(const struct ctx *c, const union inany_addr *addr,
diff --git a/icmp.c b/icmp.c
index 35faefb91870..a9f0518c2f61 100644
--- a/icmp.c
+++ b/icmp.c
@@ -132,12 +132,14 @@ void icmp_sock_handler(const struct ctx *c, union epoll_ref ref)
 		const struct in_addr *daddr = inany_v4(&ini->eaddr);
 
 		ASSERT(saddr && daddr); /* Must have IPv4 addresses */
-		tap_icmp4_send(c, *saddr, *daddr, buf, pingf->f.tap_omac, n);
+		tap_icmp4_send(c, 0, *saddr, *daddr, buf,
+			       pingf->f.tap_omac, n);
 	} else if (pingf->f.type == FLOW_PING6) {
 		const struct in6_addr *saddr = &ini->oaddr.a6;
 		const struct in6_addr *daddr = &ini->eaddr.a6;
 
-		tap_icmp6_send(c, saddr, daddr, buf, pingf->f.tap_omac, n);
+		tap_icmp6_send(c, 0, saddr, daddr, buf,
+			       pingf->f.tap_omac, n);
 	}
 	return;
 
diff --git a/ndp.c b/ndp.c
index eb9e31399555..c1d8dd62d7e2 100644
--- a/ndp.c
+++ b/ndp.c
@@ -175,25 +175,27 @@ struct ndp_ns {
 /**
  * ndp_send() - Send an NDP message
  * @c:		Execution context
+ * @qpair:	Queue pair on which to send the message
  * @dst:	IPv6 address to send the message to
  * @buf:	ICMPv6 header + message payload
  * @l4len:	Length of message, including ICMPv6 header
  */
-static void ndp_send(const struct ctx *c, const struct in6_addr *dst,
+static void ndp_send(const struct ctx *c, unsigned int qpair, const struct in6_addr *dst,
 		     const void *buf, size_t l4len)
 {
 	const struct in6_addr *src = &c->ip6.our_tap_ll;
 
-	tap_icmp6_send(c, src, dst, buf, c->our_tap_mac, l4len);
+	tap_icmp6_send(c, qpair, src, dst, buf, c->our_tap_mac, l4len);
 }
 
 /**
  * ndp_na() - Send an NDP Neighbour Advertisement (NA) message
  * @c:		Execution context
+ * @qpair:	Queue pair on which to send the NA
  * @dst:	IPv6 address to send the NA to
  * @addr:	IPv6 address to advertise
  */
-static void ndp_na(const struct ctx *c, const struct in6_addr *dst,
+static void ndp_na(const struct ctx *c, unsigned int qpair, const struct in6_addr *dst,
 	    const struct in6_addr *addr)
 {
 	union inany_addr tgt;
@@ -217,26 +219,29 @@ static void ndp_na(const struct ctx *c, const struct in6_addr *dst,
 	inany_from_af(&tgt, AF_INET6, addr);
 	fwd_neigh_mac_get(c, &tgt, na.target_l2_addr.mac);
 
-	ndp_send(c, dst, &na, sizeof(na));
+	ndp_send(c, qpair, dst, &na, sizeof(na));
 }
 
 /**
  * ndp_unsolicited_na() - Send unsolicited NA
  * @c:         Execution context
+ * @qpair:     Queue pair on which to send the NA
  * @addr:      IPv6 address to advertise
  */
-void ndp_unsolicited_na(const struct ctx *c, const struct in6_addr *addr)
+void ndp_unsolicited_na(const struct ctx *c, unsigned int qpair,
+			const struct in6_addr *addr)
 {
 	if (tap_is_ready(c))
-		ndp_na(c, &in6addr_ll_all_nodes, addr);
+		ndp_na(c, qpair, &in6addr_ll_all_nodes, addr);
 }
 
 /**
  * ndp_ra() - Send an NDP Router Advertisement (RA) message
  * @c:		Execution context
+ * @qpair:	Queue pair on which to send the RA
  * @dst:	IPv6 address to send the RA to
  */
-static void ndp_ra(const struct ctx *c, const struct in6_addr *dst)
+static void ndp_ra(const struct ctx *c, unsigned int qpair, const struct in6_addr *dst)
 {
 	struct ndp_ra ra = {
 		.ih = {
@@ -342,18 +347,19 @@ static void ndp_ra(const struct ctx *c, const struct in6_addr *dst)
 	memcpy(&ra.source_ll.mac, c->our_tap_mac, ETH_ALEN);
 
 	/* NOLINTNEXTLINE(clang-analyzer-security.PointerSub) */
-	ndp_send(c, dst, &ra, ptr - (unsigned char *)&ra);
+	ndp_send(c, qpair, dst, &ra, ptr - (unsigned char *)&ra);
 }
 
 /**
  * ndp() - Check for NDP solicitations, reply as needed
  * @c:		Execution context
+ * @qpair:	Queue pair on which to send replies
  * @saddr:	Source IPv6 address
  * @data:	Single packet with ICMPv6 header
  *
  * Return: 0 if not handled here, 1 if handled, -1 on failure
  */
-int ndp(const struct ctx *c, const struct in6_addr *saddr,
+int ndp(const struct ctx *c, unsigned int qpair, const struct in6_addr *saddr,
 	struct iov_tail *data)
 {
 	struct icmp6hdr ih_storage;
@@ -382,13 +388,13 @@ int ndp(const struct ctx *c, const struct in6_addr *saddr,
 
 		info("NDP: received NS, sending NA");
 
-		ndp_na(c, saddr, &ns->target_addr);
+		ndp_na(c, qpair, saddr, &ns->target_addr);
 	} else if (ih->icmp6_type == RS) {
 		if (c->no_ra)
 			return 1;
 
 		info("NDP: received RS, sending RA");
-		ndp_ra(c, saddr);
+		ndp_ra(c, qpair, saddr);
 	}
 
 	return 1;
@@ -446,7 +452,7 @@ void ndp_timer(const struct ctx *c, const struct timespec *now)
 
 	info("NDP: sending unsolicited RA, next in %llds", (long long)interval);
 
-	ndp_ra(c, &in6addr_ll_all_nodes);
+	ndp_ra(c, 0, &in6addr_ll_all_nodes);
 
 first:
 	next_ra = now->tv_sec + interval;
@@ -455,8 +461,9 @@ first:
 /**
  * ndp_send_init_req() - Send initial NDP NS to retrieve guest MAC address
  * @c:		Execution context
+ * @qpair:	Queue pair on which to send the request
  */
-void ndp_send_init_req(const struct ctx *c)
+void ndp_send_init_req(const struct ctx *c, unsigned int qpair)
 {
 	struct ndp_ns ns = {
 		.ih = {
@@ -469,5 +476,5 @@ void ndp_send_init_req(const struct ctx *c)
 		.target_addr = c->ip6.addr
 	};
 	debug("Sending initial NDP NS request for guest MAC address");
-	ndp_send(c, &c->ip6.addr, &ns, sizeof(ns));
+	ndp_send(c, qpair, &c->ip6.addr, &ns, sizeof(ns));
 }
diff --git a/ndp.h b/ndp.h
index 56b756d8400b..8c168fc199fe 100644
--- a/ndp.h
+++ b/ndp.h
@@ -8,10 +8,11 @@
 
 struct icmp6hdr;
 
-int ndp(const struct ctx *c, const struct in6_addr *saddr,
+int ndp(const struct ctx *c, unsigned int qpair, const struct in6_addr *saddr,
 	struct iov_tail *data);
 void ndp_timer(const struct ctx *c, const struct timespec *now);
-void ndp_send_init_req(const struct ctx *c);
-void ndp_unsolicited_na(const struct ctx *c, const struct in6_addr *addr);
+void ndp_send_init_req(const struct ctx *c, unsigned int qpair);
+void ndp_unsolicited_na(const struct ctx *c, unsigned int qpair,
+			const struct in6_addr *addr);
 
 #endif /* NDP_H */
diff --git a/netlink.c b/netlink.c
index 82a2f0c9aef7..c5b402a0ab98 100644
--- a/netlink.c
+++ b/netlink.c
@@ -1201,7 +1201,7 @@ static void nl_neigh_msg_read(const struct ctx *c, struct nlmsghdr *nh)
 
 	eth_ntop(lladdr, mac_str, sizeof(mac_str));
 	trace("neighbour notifier update: %s / %s", ip_str, mac_str);
-	fwd_neigh_table_update(c, &daddr, lladdr, false);
+	fwd_neigh_table_update(c, 0, &daddr, lladdr, false);
 }
 
 /**
diff --git a/tap.c b/tap.c
index 591b49491aa3..0d1f05865d60 100644
--- a/tap.c
+++ b/tap.c
@@ -125,10 +125,12 @@ unsigned long tap_l2_max_len(const struct ctx *c)
 /**
  * tap_send_single() - Send a single frame
  * @c:		Execution context
+ * @qpair:	Queue pair on which to send the frame
  * @data:	Packet buffer
  * @l2len:	Total L2 packet length
  */
-void tap_send_single(const struct ctx *c, const void *data, size_t l2len)
+void tap_send_single(const struct ctx *c, unsigned int qpair, const void *data,
+		     size_t l2len)
 {
 	uint32_t vnet_len = htonl(l2len);
 	struct iovec iov[2];
@@ -147,7 +149,7 @@ void tap_send_single(const struct ctx *c, const void *data, size_t l2len)
 		tap_send_frames(c, iov, iovcnt, 1);
 		break;
 	case MODE_VU:
-		vu_send_single(c, data, l2len);
+		vu_send_single(c, qpair, data, l2len);
 		break;
 	}
 }
@@ -250,6 +252,7 @@ void *tap_push_uh4(struct udphdr *uh, struct in_addr src, in_port_t sport,
 /**
  * tap_udp4_send() - Send UDP over IPv4 packet
  * @c:		Execution context
+ * @qpair:	Queue pair on which to send packet
  * @src:	IPv4 source address
  * @sport:	UDP source port
  * @dst:	IPv4 destination address
@@ -257,7 +260,7 @@ void *tap_push_uh4(struct udphdr *uh, struct in_addr src, in_port_t sport,
  * @in:	UDP payload contents (not including UDP header)
  * @dlen:	UDP payload length (not including UDP header)
  */
-void tap_udp4_send(const struct ctx *c, struct in_addr src, in_port_t sport,
+void tap_udp4_send(const struct ctx *c, unsigned int qpair, struct in_addr src, in_port_t sport,
 		   struct in_addr dst, in_port_t dport,
 		   const void *in, size_t dlen)
 {
@@ -268,20 +271,22 @@ void tap_udp4_send(const struct ctx *c, struct in_addr src, in_port_t sport,
 	char *data = tap_push_uh4(uh, src, sport, dst, dport, in, dlen);
 
 	memcpy(data, in, dlen);
-	tap_send_single(c, buf, dlen + (data - buf));
+	tap_send_single(c, qpair, buf, dlen + (data - buf));
 }
 
 /**
  * tap_icmp4_send() - Send ICMPv4 packet
  * @c:		Execution context
+ * @qpair:	Queue pair on which to send packet
  * @src:	IPv4 source address
  * @dst:	IPv4 destination address
  * @in:		ICMP packet, including ICMP header
  * @src_mac:	MAC address to be used as source for message
  * @l4len:	ICMP packet length, including ICMP header
  */
-void tap_icmp4_send(const struct ctx *c, struct in_addr src, struct in_addr dst,
-		    const void *in, const void *src_mac, size_t l4len)
+void tap_icmp4_send(const struct ctx *c, unsigned int qpair, struct in_addr src,
+		    struct in_addr dst, const void *in, const void *src_mac,
+		    size_t l4len)
 {
 	char buf[USHRT_MAX];
 	struct iphdr *ip4h = tap_push_l2h(c, buf, src_mac, ETH_P_IP);
@@ -291,7 +296,7 @@ void tap_icmp4_send(const struct ctx *c, struct in_addr src, struct in_addr dst,
 	memcpy(icmp4h, in, l4len);
 	csum_icmp4(icmp4h, icmp4h + 1, l4len - sizeof(*icmp4h));
 
-	tap_send_single(c, buf, l4len + ((char *)icmp4h - buf));
+	tap_send_single(c, qpair, buf, l4len + ((char *)icmp4h - buf));
 }
 
 /**
@@ -355,6 +360,7 @@ void *tap_push_uh6(struct udphdr *uh,
 /**
  * tap_udp6_send() - Send UDP over IPv6 packet
  * @c:		Execution context
+ * @qpair:	Queue pair on which to send packet
  * @src:	IPv6 source address
  * @sport:	UDP source port
  * @dst:	IPv6 destination address
@@ -363,7 +369,7 @@ void *tap_push_uh6(struct udphdr *uh,
  * @in:	UDP payload contents (not including UDP header)
  * @dlen:	UDP payload length (not including UDP header)
  */
-void tap_udp6_send(const struct ctx *c,
+void tap_udp6_send(const struct ctx *c, unsigned int qpair,
 		   const struct in6_addr *src, in_port_t sport,
 		   const struct in6_addr *dst, in_port_t dport,
 		   uint32_t flow, void *in, size_t dlen)
@@ -376,19 +382,20 @@ void tap_udp6_send(const struct ctx *c,
 	char *data = tap_push_uh6(uh, src, sport, dst, dport, in, dlen);
 
 	memcpy(data, in, dlen);
-	tap_send_single(c, buf, dlen + (data - buf));
+	tap_send_single(c, qpair, buf, dlen + (data - buf));
 }
 
 /**
  * tap_icmp6_send() - Send ICMPv6 packet
  * @c:		Execution context
+ * @qpair:	Queue pair on which to send packet
  * @src:	IPv6 source address
  * @dst:	IPv6 destination address
  * @in:		ICMP packet, including ICMP header
  * @src_mac:	MAC address to be used as source for message
  * @l4len:	ICMP packet length, including ICMP header
  */
-void tap_icmp6_send(const struct ctx *c,
+void tap_icmp6_send(const struct ctx *c, unsigned int qpair,
 		    const struct in6_addr *src, const struct in6_addr *dst,
 		    const void *in, const void *src_mac, size_t l4len)
 {
@@ -400,7 +407,7 @@ void tap_icmp6_send(const struct ctx *c,
 	memcpy(icmp6h, in, l4len);
 	csum_icmp6(icmp6h, src, dst, icmp6h + 1, l4len - sizeof(*icmp6h));
 
-	tap_send_single(c, buf, l4len + ((char *)icmp6h - buf));
+	tap_send_single(c, qpair, buf, l4len + ((char *)icmp6h - buf));
 }
 
 /**
@@ -696,11 +703,13 @@ static bool tap4_is_fragment(const struct iphdr *iph,
 /**
  * tap4_handler() - IPv4 and ARP packet handler for tap file descriptor
  * @c:		Execution context
+ * @qpair:	Queue pair on which to send packets
  * @now:	Current timestamp
  *
  * Return: count of packets consumed by handlers
  */
-static int tap4_handler(struct ctx *c, const struct timespec *now)
+static int tap4_handler(struct ctx *c, unsigned int qpair,
+			const struct timespec *now)
 {
 	unsigned int i, j, seq_count;
 	struct tap4_l4_t *seq;
@@ -727,7 +736,7 @@ resume:
 		if (!eh)
 			continue;
 		if (ntohs(eh->h_proto) == ETH_P_ARP) {
-			arp(c, &data);
+			arp(c, qpair, &data);
 			continue;
 		}
 
@@ -788,7 +797,7 @@ resume:
 			struct iov_tail eh_data;
 
 			packet_get(pool_tap4, i, &eh_data);
-			if (dhcp(c, &eh_data))
+			if (dhcp(c, qpair, &eh_data))
 				continue;
 		}
 
@@ -851,7 +860,7 @@ append:
 			if (c->no_tcp)
 				continue;
 			for (k = 0; k < p->count; )
-				k += tcp_tap_handler(c, PIF_TAP, AF_INET,
+				k += tcp_tap_handler(c, qpair, PIF_TAP, AF_INET,
 						     &seq->saddr, &seq->daddr,
 						     0, p, k, now);
 		} else if (seq->protocol == IPPROTO_UDP) {
@@ -873,11 +882,12 @@ append:
 /**
  * tap6_handler() - IPv6 packet handler for tap file descriptor
  * @c:		Execution context
+ * @qpair:	Queue pair on which to send packets
  * @now:	Current timestamp
  *
  * Return: count of packets consumed by handlers
  */
-static int tap6_handler(struct ctx *c, const struct timespec *now)
+static int tap6_handler(struct ctx *c, unsigned int qpair, const struct timespec *now)
 {
 	unsigned int i, j, seq_count = 0;
 	struct tap6_l4_t *seq;
@@ -954,7 +964,7 @@ resume:
 				continue;
 
 			ndp_data = data;
-			if (ndp(c, saddr, &ndp_data))
+			if (ndp(c, qpair, saddr, &ndp_data))
 				continue;
 
 			tap_packet_debug(NULL, ip6h, NULL, proto, NULL, 1);
@@ -973,7 +983,7 @@ resume:
 		if (proto == IPPROTO_UDP) {
 			struct iov_tail uh_data = data;
 
-			if (dhcpv6(c, &uh_data, saddr, daddr))
+			if (dhcpv6(c, qpair, &uh_data, saddr, daddr))
 				continue;
 		}
 
@@ -1041,7 +1051,7 @@ append:
 			if (c->no_tcp)
 				continue;
 			for (k = 0; k < p->count; )
-				k += tcp_tap_handler(c, PIF_TAP, AF_INET6,
+				k += tcp_tap_handler(c, qpair, PIF_TAP, AF_INET6,
 						     &seq->saddr, &seq->daddr,
 						     seq->flow_lbl, p, k, now);
 		} else if (seq->protocol == IPPROTO_UDP) {
@@ -1072,21 +1082,23 @@ void tap_flush_pools(void)
 /**
  * tap_handler() - IPv4/IPv6 and ARP packet handler for tap file descriptor
  * @c:		Execution context
+ * @qpair:	Queue pair on which to send packets
  * @now:	Current timestamp
  */
-void tap_handler(struct ctx *c, const struct timespec *now)
+void tap_handler(struct ctx *c, unsigned int qpair, const struct timespec *now)
 {
-	tap4_handler(c, now);
-	tap6_handler(c, now);
+	tap4_handler(c, qpair, now);
+	tap6_handler(c, qpair, now);
 }
 
 /**
  * tap_add_packet() - Queue/capture packet, update notion of guest MAC address
  * @c:		Execution context
+ * @qpair:	Queue pair associated with the packet
  * @data:	Packet to add to the pool
  * @now:	Current timestamp
  */
-void tap_add_packet(struct ctx *c, struct iov_tail *data,
+void tap_add_packet(struct ctx *c, unsigned int qpair, struct iov_tail *data,
 		    const struct timespec *now)
 {
 	struct ethhdr eh_storage;
@@ -1111,14 +1123,14 @@ void tap_add_packet(struct ctx *c, struct iov_tail *data,
 	case ETH_P_ARP:
 	case ETH_P_IP:
 		if (!pool_can_fit(pool_tap4, data)) {
-			tap4_handler(c, now);
+			tap4_handler(c, qpair, now);
 			pool_flush(pool_tap4);
 		}
 		packet_add(pool_tap4, data);
 		break;
 	case ETH_P_IPV6:
 		if (!pool_can_fit(pool_tap6, data)) {
-			tap6_handler(c, now);
+			tap6_handler(c, qpair, now);
 			pool_flush(pool_tap6);
 		}
 		packet_add(pool_tap6, data);
@@ -1205,7 +1217,7 @@ static void tap_passt_input(struct ctx *c, const struct timespec *now)
 		n -= sizeof(uint32_t);
 
 		data = IOV_TAIL_FROM_BUF(p, l2len, 0);
-		tap_add_packet(c, &data, now);
+		tap_add_packet(c, 0, &data, now);
 
 		p += l2len;
 		n -= l2len;
@@ -1214,7 +1226,7 @@ static void tap_passt_input(struct ctx *c, const struct timespec *now)
 	partial_len = n;
 	partial_frame = p;
 
-	tap_handler(c, now);
+	tap_handler(c, 0, now);
 }
 
 /**
@@ -1273,10 +1285,10 @@ static void tap_pasta_input(struct ctx *c, const struct timespec *now)
 			continue;
 
 		data = IOV_TAIL_FROM_BUF(pkt_buf + n, len, 0);
-		tap_add_packet(c, &data, now);
+		tap_add_packet(c, 0, &data, now);
 	}
 
-	tap_handler(c, now);
+	tap_handler(c, 0, now);
 }
 
 /**
@@ -1365,8 +1377,9 @@ bool tap_is_ready(const struct ctx *c)
 /**
  * tap_start_connection() - start a new connection
  * @c:		Execution context
+ * @qpair:	Queue pair to use for the connection
  */
-static void tap_start_connection(const struct ctx *c)
+static void tap_start_connection(const struct ctx *c, unsigned int qpair)
 {
 	union epoll_ref ref = { 0 };
 
@@ -1389,9 +1402,9 @@ static void tap_start_connection(const struct ctx *c)
 		return;
 
 	if (c->ifi4)
-		arp_send_init_req(c);
+		arp_send_init_req(c, qpair);
 	if (c->ifi6 && !c->no_ndp)
-		ndp_send_init_req(c);
+		ndp_send_init_req(c, qpair);
 }
 
 /**
@@ -1439,7 +1452,7 @@ void tap_listen_handler(struct ctx *c, uint32_t events)
 	    setsockopt(c->fd_tap, SOL_SOCKET, SO_SNDBUF, &v, sizeof(v)))
 		trace("tap: failed to set SO_SNDBUF to %i", v);
 
-	tap_start_connection(c);
+	tap_start_connection(c, 0);
 }
 
 /**
@@ -1489,7 +1502,7 @@ static void tap_sock_tun_init(struct ctx *c)
 
 	pasta_ns_conf(c);
 
-	tap_start_connection(c);
+	tap_start_connection(c, 0);
 }
 
 /**
@@ -1526,7 +1539,7 @@ void tap_backend_init(struct ctx *c)
 
 	if (c->fd_tap != -1) { /* Passed as --fd */
 		ASSERT(c->one_off);
-		tap_start_connection(c);
+		tap_start_connection(c, 0);
 		return;
 	}
 
diff --git a/tap.h b/tap.h
index ee22a9d78c44..d3ac0cb6a233 100644
--- a/tap.h
+++ b/tap.h
@@ -87,24 +87,25 @@ void *tap_push_ip6h(struct ipv6hdr *ip6h,
 		    const struct in6_addr *src,
 		    const struct in6_addr *dst,
 		    size_t l4len, uint8_t proto, uint32_t flow);
-void tap_udp4_send(const struct ctx *c, struct in_addr src, in_port_t sport,
+void tap_udp4_send(const struct ctx *c, unsigned int qpair, struct in_addr src, in_port_t sport,
 		   struct in_addr dst, in_port_t dport,
 		   const void *in, size_t dlen);
-void tap_icmp4_send(const struct ctx *c, struct in_addr src, struct in_addr dst,
-		    const void *in, const void *src_mac, size_t l4len);
+void tap_icmp4_send(const struct ctx *c, unsigned int qpair, struct in_addr src,
+		    struct in_addr dst, const void *in, const void *src_mac,
+		    size_t l4len);
 const struct in6_addr *tap_ip6_daddr(const struct ctx *c,
 				     const struct in6_addr *src);
 void *tap_push_ip6h(struct ipv6hdr *ip6h,
 		    const struct in6_addr *src, const struct in6_addr *dst,
 		    size_t l4len, uint8_t proto, uint32_t flow);
-void tap_udp6_send(const struct ctx *c,
+void tap_udp6_send(const struct ctx *c, unsigned int qpair,
 		   const struct in6_addr *src, in_port_t sport,
 		   const struct in6_addr *dst, in_port_t dport,
 		   uint32_t flow, void *in, size_t dlen);
-void tap_icmp6_send(const struct ctx *c,
+void tap_icmp6_send(const struct ctx *c, unsigned int qpair,
 		    const struct in6_addr *src, const struct in6_addr *dst,
 		    const void *in, const void *src_mac, size_t l4len);
-void tap_send_single(const struct ctx *c, const void *data, size_t l2len);
+void tap_send_single(const struct ctx *c, unsigned int qpair, const void *data, size_t l2len);
 size_t tap_send_frames(const struct ctx *c, const struct iovec *iov,
 		       size_t bufs_per_frame, size_t nframes);
 void eth_update_mac(struct ethhdr *eh,
@@ -119,7 +120,7 @@ int tap_sock_unix_open(char *sock_path);
 void tap_sock_reset(struct ctx *c);
 void tap_backend_init(struct ctx *c);
 void tap_flush_pools(void);
-void tap_handler(struct ctx *c, const struct timespec *now);
-void tap_add_packet(struct ctx *c, struct iov_tail *data,
+void tap_handler(struct ctx *c, unsigned int qpair, const struct timespec *now);
+void tap_add_packet(struct ctx *c, unsigned int qpair, struct iov_tail *data,
 		    const struct timespec *now);
 #endif /* TAP_H */
diff --git a/tcp.c b/tcp.c
index 3202d3385a63..4c84f0e621b8 100644
--- a/tcp.c
+++ b/tcp.c
@@ -1985,6 +1985,7 @@ static void tcp_conn_from_sock_finish(const struct ctx *c,
 /**
  * tcp_rst_no_conn() - Send RST in response to a packet with no connection
  * @c:		Execution context
+ * @qpair:	Queue pair on which to send the reply
  * @af:		Address family, AF_INET or AF_INET6
  * @saddr:	Source address of the packet we're responding to
  * @daddr:	Destination address of the packet we're responding to
@@ -1992,7 +1993,7 @@ static void tcp_conn_from_sock_finish(const struct ctx *c,
  * @th:		TCP header of the packet we're responding to
  * @l4len:	Packet length, including TCP header
  */
-static void tcp_rst_no_conn(const struct ctx *c, int af,
+static void tcp_rst_no_conn(const struct ctx *c, unsigned int qpair, int af,
 			    const void *saddr, const void *daddr,
 			    uint32_t flow_lbl,
 			    const struct tcphdr *th, size_t l4len)
@@ -2050,12 +2051,13 @@ static void tcp_rst_no_conn(const struct ctx *c, int af,
 
 	tcp_update_csum(psum, rsth, &payload);
 	rst_l2len = ((char *)rsth - buf) + sizeof(*rsth);
-	tap_send_single(c, buf, rst_l2len);
+	tap_send_single(c, qpair, buf, rst_l2len);
 }
 
 /**
  * tcp_tap_handler() - Handle packets from tap and state transitions
  * @c:		Execution context
+ * @qpair:	Queue pair on which to send packets
  * @pif:	pif on which the packet is arriving
  * @af:		Address family, AF_INET or AF_INET6
  * @saddr:	Source address
@@ -2067,9 +2069,10 @@ static void tcp_rst_no_conn(const struct ctx *c, int af,
  *
  * Return: count of consumed packets
  */
-int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
-		    const void *saddr, const void *daddr, uint32_t flow_lbl,
-		    const struct pool *p, int idx, const struct timespec *now)
+int tcp_tap_handler(const struct ctx *c, unsigned int qpair, uint8_t pif,
+		    sa_family_t af, const void *saddr, const void *daddr,
+		    uint32_t flow_lbl, const struct pool *p, int idx,
+		    const struct timespec *now)
 {
 	struct tcp_tap_conn *conn;
 	struct tcphdr th_storage;
@@ -2109,7 +2112,8 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
 			tcp_conn_from_tap(c, af, saddr, daddr, th,
 					  opts, optlen, now);
 		else
-			tcp_rst_no_conn(c, af, saddr, daddr, flow_lbl, th, l4len);
+			tcp_rst_no_conn(c, qpair, af, saddr, daddr, flow_lbl, th,
+					l4len);
 		return 1;
 	}
 
diff --git a/tcp.h b/tcp.h
index 0082386725c2..6329e348194c 100644
--- a/tcp.h
+++ b/tcp.h
@@ -15,9 +15,10 @@ void tcp_listen_handler(const struct ctx *c, union epoll_ref ref,
 			const struct timespec *now);
 void tcp_sock_handler(const struct ctx *c, union epoll_ref ref,
 		      uint32_t events);
-int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
-		    const void *saddr, const void *daddr, uint32_t flow_lbl,
-		    const struct pool *p, int idx, const struct timespec *now);
+int tcp_tap_handler(const struct ctx *c, unsigned int qpair, uint8_t pif,
+		    sa_family_t af, const void *saddr, const void *daddr,
+		    uint32_t flow_lbl, const struct pool *p, int idx,
+		    const struct timespec *now);
 int tcp_sock_init(const struct ctx *c, const union inany_addr *addr,
 		  const char *ifname, in_port_t port);
 int tcp_init(struct ctx *c);
diff --git a/udp.c b/udp.c
index 9c00950250a0..cbc2d7055647 100644
--- a/udp.c
+++ b/udp.c
@@ -384,13 +384,14 @@ static void udp_tap_prepare(const struct mmsghdr *mmh,
 /**
  * udp_send_tap_icmp4() - Construct and send ICMPv4 to local peer
  * @c:		Execution context
+ * @qpair:	Queue pair on which to send the ICMPv4 packet
  * @ee:	Extended error descriptor
  * @toside:	Destination side of flow
  * @saddr:	Address of ICMP generating node
  * @in:	First bytes (max 8) of original UDP message body
  * @dlen:	Length of the read part of original UDP message body
  */
-static void udp_send_tap_icmp4(const struct ctx *c,
+static void udp_send_tap_icmp4(const struct ctx *c, unsigned int qpair,
 			       const struct sock_extended_err *ee,
 			       const struct flowside *toside,
 			       struct in_addr saddr,
@@ -426,13 +427,14 @@ static void udp_send_tap_icmp4(const struct ctx *c,
 	/* Try to obtain the MAC address of the generating node */
 	saddr_any = inany_from_v4(saddr);
 	fwd_neigh_mac_get(c, &saddr_any, tap_omac);
-	tap_icmp4_send(c, saddr, eaddr, &msg, tap_omac, msglen);
+	tap_icmp4_send(c, qpair, saddr, eaddr, &msg, tap_omac, msglen);
 }
 
 
 /**
  * udp_send_tap_icmp6() - Construct and send ICMPv6 to local peer
  * @c:		Execution context
+ * @qpair:	Queue pair on which to send the ICMPv6 packet
  * @ee:	Extended error descriptor
  * @toside:	Destination side of flow
  * @saddr:	Address of ICMP generating node
@@ -440,7 +442,7 @@ static void udp_send_tap_icmp4(const struct ctx *c,
  * @dlen:	Length of the read part of original UDP message body
  * @flow:	IPv6 flow identifier
  */
-static void udp_send_tap_icmp6(const struct ctx *c,
+static void udp_send_tap_icmp6(const struct ctx *c, unsigned int qpair,
 			       const struct sock_extended_err *ee,
 			       const struct flowside *toside,
 			       const struct in6_addr *saddr,
@@ -474,7 +476,7 @@ static void udp_send_tap_icmp6(const struct ctx *c,
 
 	/* Try to obtain the MAC address of the generating node */
 	fwd_neigh_mac_get(c, (union inany_addr *) saddr, tap_omac);
-	tap_icmp6_send(c, saddr, eaddr, &msg, tap_omac, msglen);
+	tap_icmp6_send(c, qpair, saddr, eaddr, &msg, tap_omac, msglen);
 }
 
 /**
@@ -634,12 +636,12 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx,
 	if (hdr->cmsg_level == IPPROTO_IP &&
 	    (o4 = inany_v4(&otap)) && inany_v4(&toside->eaddr)) {
 		dlen = MIN(dlen, ICMP4_MAX_DLEN);
-		udp_send_tap_icmp4(c, ee, toside, *o4, data, dlen);
+		udp_send_tap_icmp4(c, 0, ee, toside, *o4, data, dlen);
 		return 1;
 	}
 
 	if (hdr->cmsg_level == IPPROTO_IPV6 && !inany_v4(&toside->eaddr)) {
-		udp_send_tap_icmp6(c, ee, toside, &otap.a6, data, dlen,
+		udp_send_tap_icmp6(c, 0, ee, toside, &otap.a6, data, dlen,
 				   FLOW_IDX(uflow));
 		return 1;
 	}
@@ -833,8 +835,8 @@ static void udp_buf_sock_to_tap(const struct ctx *c, int s, int n,
  * @port:	Our (local) port number of @s
  * @now:	Current timestamp
  */
-void udp_sock_fwd(const struct ctx *c, int s, uint8_t frompif,
-		  in_port_t port, const struct timespec *now)
+void udp_sock_fwd(const struct ctx *c, int s, uint8_t frompif, in_port_t port,
+		  const struct timespec *now)
 {
 	union sockaddr_inany src;
 	union inany_addr dst;
@@ -912,8 +914,8 @@ void udp_listen_sock_handler(const struct ctx *c,
  * @events:	epoll events bitmap
  * @now:	Current timestamp
  */
-void udp_sock_handler(const struct ctx *c, union epoll_ref ref,
-		      uint32_t events, const struct timespec *now)
+void udp_sock_handler(const struct ctx *c, union epoll_ref ref, uint32_t events,
+		      const struct timespec *now)
 {
 	struct udp_flow *uflow = udp_at_sidx(ref.flowside);
 
diff --git a/udp.h b/udp.h
index f1d83f380b3f..7d3cd59d9a42 100644
--- a/udp.h
+++ b/udp.h
@@ -9,8 +9,8 @@
 void udp_portmap_clear(void);
 void udp_listen_sock_handler(const struct ctx *c, union epoll_ref ref,
 			     uint32_t events, const struct timespec *now);
-void udp_sock_handler(const struct ctx *c, union epoll_ref ref,
-		      uint32_t events, const struct timespec *now);
+void udp_sock_handler(const struct ctx *c, union epoll_ref ref, uint32_t events,
+		      const struct timespec *now);
 int udp_tap_handler(const struct ctx *c, uint8_t pif,
 		    sa_family_t af, const void *saddr, const void *daddr,
 		    uint8_t ttl, const struct pool *p, int idx,
diff --git a/udp_internal.h b/udp_internal.h
index 96d11cff6833..ed13c5aec8d5 100644
--- a/udp_internal.h
+++ b/udp_internal.h
@@ -28,7 +28,7 @@ size_t udp_update_hdr4(struct iphdr *ip4h, struct udp_payload_t *bp,
 size_t udp_update_hdr6(struct ipv6hdr *ip6h, struct udp_payload_t *bp,
                        const struct flowside *toside, size_t dlen,
 		       bool no_udp_csum);
-void udp_sock_fwd(const struct ctx *c, int s, uint8_t frompif,
-		  in_port_t port, const struct timespec *now);
+void udp_sock_fwd(const struct ctx *c, int s, uint8_t frompif, in_port_t port,
+		  const struct timespec *now);
 
 #endif /* UDP_INTERNAL_H */
diff --git a/vu_common.c b/vu_common.c
index b13b7c308fd8..80d9a30f6f71 100644
--- a/vu_common.c
+++ b/vu_common.c
@@ -196,11 +196,11 @@ static void vu_handle_tx(struct vu_dev *vdev, int index,
 
 		data = IOV_TAIL(elem[count].out_sg, elem[count].out_num, 0);
 		if (IOV_DROP_HEADER(&data, struct virtio_net_hdr_mrg_rxbuf))
-			tap_add_packet(vdev->context, &data, now);
+			tap_add_packet(vdev->context, 0, &data, now);
 
 		count++;
 	}
-	tap_handler(vdev->context, now);
+	tap_handler(vdev->context, 0, now);
 
 	if (count) {
 		int i;
@@ -235,23 +235,26 @@ void vu_kick_cb(struct vu_dev *vdev, union epoll_ref ref,
 }
 
 /**
- * vu_send_single() - Send a buffer to the front-end using the RX virtqueue
+ * vu_send_single() - Send a buffer to the front-end using a specified virtqueue
  * @c:		execution context
+ * @qpair:	Queue pair on which to send the buffer
  * @buf:	address of the buffer
  * @size:	size of the buffer
  *
  * Return: number of bytes sent, -1 if there is an error
  */
-int vu_send_single(const struct ctx *c, const void *buf, size_t size)
+int vu_send_single(const struct ctx *c, unsigned int qpair, const void *buf, size_t size)
 {
 	struct vu_dev *vdev = c->vdev;
-	struct vu_virtq *vq = &vdev->vq[VHOST_USER_RX_QUEUE];
 	struct vu_virtq_element elem[VIRTQUEUE_MAX_SIZE];
 	struct iovec in_sg[VIRTQUEUE_MAX_SIZE];
+	struct vu_virtq *vq;
 	size_t total;
 	int elem_cnt;
 	int i;
 
+	vq = &vdev->vq[qpair << 1];
+
 	trace("vu_send_single size %zu", size);
 
 	if (!vu_queue_enabled(vq) || !vu_queue_started(vq)) {
diff --git a/vu_common.h b/vu_common.h
index f538f237790b..9ceb8034a9a5 100644
--- a/vu_common.h
+++ b/vu_common.h
@@ -56,6 +56,7 @@ void vu_flush(const struct vu_dev *vdev, struct vu_virtq *vq,
 	      struct vu_virtq_element *elem, int elem_cnt);
 void vu_kick_cb(struct vu_dev *vdev, union epoll_ref ref,
 		const struct timespec *now);
-int vu_send_single(const struct ctx *c, const void *buf, size_t size);
+int vu_send_single(const struct ctx *c, unsigned int qpair, const void *buf,
+		   size_t size);
 
 #endif /* VU_COMMON_H */
-- 
2.51.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v3 5/6] tap: Convert packet pools to per-queue-pair arrays for multiqueue
  2025-12-03 18:54 [PATCH v3 0/6] vhost-user: Add multiqueue support Laurent Vivier
                   ` (3 preceding siblings ...)
  2025-12-03 18:54 ` [PATCH v3 4/6] vhost-user: Add queue pair parameter throughout the network stack Laurent Vivier
@ 2025-12-03 18:54 ` Laurent Vivier
  2025-12-03 18:54 ` [PATCH v3 6/6] flow: Add queue pair tracking to flow management Laurent Vivier
  5 siblings, 0 replies; 18+ messages in thread
From: Laurent Vivier @ 2025-12-03 18:54 UTC (permalink / raw)
  To: passt-dev; +Cc: Laurent Vivier

Convert the global pool_tap4 and pool_tap6 packet pools from single
pools to arrays of pools, one for each queue pair. This change is
necessary to support multiqueue operation in vhost-user mode, where
multiple queue pairs may be processing packets concurrently.

The pool storage structures (pool_tap4_storage and pool_tap6_storage)
are now arrays of VHOST_USER_MAX_VQS/2 elements, with corresponding
pointer arrays (pool_tap4 and pool_tap6) for accessing them.

Update tap_flush_pools() and tap_handler() to take a qpair parameter
that selects which pool to operate on. Add bounds-checking assertions
to ensure qpair is within the valid range.

In passt and pasta modes, all operations use queue pair 0 (hardcoded
in tap_passt_input and tap_pasta_input). In vhost-user mode, the queue
pair is derived from the virtqueue index (index / 2, as TX/RX queues
come in pairs).
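
For illustration, the mapping in both directions boils down to the
following (a minimal sketch: these helper names don't exist in the
code, which open-codes the expressions):

#include <assert.h>

static unsigned int vq_to_qpair(unsigned int index)
{
	return index / 2;	/* TX and RX queues of a pair are adjacent */
}

static unsigned int qpair_to_rx_vq(unsigned int qpair)
{
	return qpair * 2;	/* even index: the RX queue of the pair */
}

int main(void)
{
	assert(vq_to_qpair(qpair_to_rx_vq(5)) == 5);	/* round trip */
	assert(vq_to_qpair(11) == 5);	/* TX queue 11 belongs to pair 5 */
	return 0;
}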

All pools within the array share the same buffer pointer:
- In vhost-user mode: Points to the vhost-user memory structure, which
  is safe as packet data remains in guest memory and pools only track
  iovecs
- In passt/pasta mode: Points to pkt_buf, which is safe as only queue
  pair 0 is used
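
As a simplified, compilable model of that layout (struct mini_pool and
pools_update() are illustrative stand-ins for struct pool, PACKET_INIT()
and tap_sock_update_pool(), not the real types):

#include <stddef.h>

#define VHOST_USER_MAX_VQS	32	/* as set in patch 2/6 */
#define N_QPAIRS		(VHOST_USER_MAX_VQS / 2)

struct mini_pool {
	void *buf;	/* shared backing region: pkt_buf or guest memory */
	size_t size;
	size_t count;	/* per-pair bookkeeping, independent per pool */
};

static struct mini_pool pool4[N_QPAIRS];

static void pools_update(void *base, size_t size)
{
	unsigned int i;

	for (i = 0; i < N_QPAIRS; i++)
		pool4[i] = (struct mini_pool){ .buf = base, .size = size };
}

int main(void)
{
	static char buf[4096];	/* stand-in for pkt_buf */

	pools_update(buf, sizeof(buf));
	/* every pair references the same buffer, bookkeeping stays per pair */
	return pool4[0].buf == pool4[N_QPAIRS - 1].buf ? 0 : 1;
}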

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
 tap.c       | 77 ++++++++++++++++++++++++++++++-----------------------
 tap.h       |  5 ++--
 vu_common.c |  6 ++---
 3 files changed, 50 insertions(+), 38 deletions(-)

diff --git a/tap.c b/tap.c
index 0d1f05865d60..c56afb73fd7e 100644
--- a/tap.c
+++ b/tap.c
@@ -94,9 +94,13 @@ CHECK_FRAME_LEN(L2_MAX_LEN_VU);
 	DIV_ROUND_UP(sizeof(pkt_buf),					\
 		     ETH_HLEN + sizeof(struct ipv6hdr) + sizeof(struct udphdr))
 
-/* IPv4 (plus ARP) and IPv6 message batches from tap/guest to IP handlers */
-static PACKET_POOL_NOINIT(pool_tap4, TAP_MSGS_IP4);
-static PACKET_POOL_NOINIT(pool_tap6, TAP_MSGS_IP6);
+/* IPv4 (plus ARP) and IPv6 message batches from tap/guest to IP handlers
+ * One pool per queue pair for multiqueue support
+ */
+static PACKET_POOL_DECL(pool_tap4, TAP_MSGS_IP4) pool_tap4_storage[VHOST_USER_MAX_VQS / 2];
+static struct pool *pool_tap4[VHOST_USER_MAX_VQS / 2];
+static PACKET_POOL_DECL(pool_tap6, TAP_MSGS_IP6) pool_tap6_storage[VHOST_USER_MAX_VQS / 2];
+static struct pool *pool_tap6[VHOST_USER_MAX_VQS / 2];
 
 #define TAP_SEQS		128 /* Different L4 tuples in one batch */
 #define FRAGMENT_MSG_RATE	10  /* # seconds between fragment warnings */
@@ -714,12 +718,12 @@ static int tap4_handler(struct ctx *c, unsigned int qpair,
 	unsigned int i, j, seq_count;
 	struct tap4_l4_t *seq;
 
-	if (!c->ifi4 || !pool_tap4->count)
-		return pool_tap4->count;
+	if (!c->ifi4 || !pool_tap4[qpair]->count)
+		return pool_tap4[qpair]->count;
 
 	i = 0;
 resume:
-	for (seq_count = 0, seq = NULL; i < pool_tap4->count; i++) {
+	for (seq_count = 0, seq = NULL; i < pool_tap4[qpair]->count; i++) {
 		size_t l3len, hlen, l4len;
 		struct ethhdr eh_storage;
 		struct iphdr iph_storage;
@@ -729,7 +733,7 @@ resume:
 		struct iov_tail data;
 		struct iphdr *iph;
 
-		if (!packet_get(pool_tap4, i, &data))
+		if (!packet_get(pool_tap4[qpair], i, &data))
 			continue;
 
 		eh = IOV_PEEK_HEADER(&data, eh_storage);
@@ -796,7 +800,7 @@ resume:
 		if (iph->protocol == IPPROTO_UDP) {
 			struct iov_tail eh_data;
 
-			packet_get(pool_tap4, i, &eh_data);
+			packet_get(pool_tap4[qpair], i, &eh_data);
 			if (dhcp(c, qpair, &eh_data))
 				continue;
 		}
@@ -827,7 +831,7 @@ resume:
 			goto append;
 
 		if (seq_count == TAP_SEQS)
-			break;	/* Resume after flushing if i < pool_tap4->count */
+			break;	/* Resume after flushing if i < pool_tap4[qpair]->count */
 
 		for (seq = tap4_l4 + seq_count - 1; seq >= tap4_l4; seq--) {
 			if (L4_MATCH(iph, uh, seq)) {
@@ -873,10 +877,10 @@ append:
 		}
 	}
 
-	if (i < pool_tap4->count)
+	if (i < pool_tap4[qpair]->count)
 		goto resume;
 
-	return pool_tap4->count;
+	return pool_tap4[qpair]->count;
 }
 
 /**
@@ -892,12 +896,12 @@ static int tap6_handler(struct ctx *c, unsigned int qpair, const struct timespec
 	unsigned int i, j, seq_count = 0;
 	struct tap6_l4_t *seq;
 
-	if (!c->ifi6 || !pool_tap6->count)
-		return pool_tap6->count;
+	if (!c->ifi6 || !pool_tap6[qpair]->count)
+		return pool_tap6[qpair]->count;
 
 	i = 0;
 resume:
-	for (seq_count = 0, seq = NULL; i < pool_tap6->count; i++) {
+	for (seq_count = 0, seq = NULL; i < pool_tap6[qpair]->count; i++) {
 		size_t l4len, plen, check;
 		struct in6_addr *saddr, *daddr;
 		struct ipv6hdr ip6h_storage;
@@ -909,7 +913,7 @@ resume:
 		struct ipv6hdr *ip6h;
 		uint8_t proto;
 
-		if (!packet_get(pool_tap6, i, &data))
+		if (!packet_get(pool_tap6[qpair], i, &data))
 			return -1;
 
 		eh = IOV_REMOVE_HEADER(&data, eh_storage);
@@ -1017,7 +1021,7 @@ resume:
 			goto append;
 
 		if (seq_count == TAP_SEQS)
-			break;	/* Resume after flushing if i < pool_tap6->count */
+			break;	/* Resume after flushing if i < pool_tap6[qpair]->count */
 
 		for (seq = tap6_l4 + seq_count - 1; seq >= tap6_l4; seq--) {
 			if (L4_MATCH(ip6h, proto, uh, seq)) {
@@ -1064,19 +1068,19 @@ append:
 		}
 	}
 
-	if (i < pool_tap6->count)
+	if (i < pool_tap6[qpair]->count)
 		goto resume;
 
-	return pool_tap6->count;
+	return pool_tap6[qpair]->count;
 }
 
 /**
- * tap_flush_pools() - Flush both IPv4 and IPv6 packet pools
+ * tap_flush_pools() - Flush both IPv4 and IPv6 packet pools for a given qpair
  */
-void tap_flush_pools(void)
+void tap_flush_pools(unsigned int qpair)
 {
-	pool_flush(pool_tap4);
-	pool_flush(pool_tap6);
+	pool_flush(pool_tap4[qpair]);
+	pool_flush(pool_tap6[qpair]);
 }
 
 /**
@@ -1087,6 +1091,7 @@ void tap_flush_pools(void)
  */
 void tap_handler(struct ctx *c, unsigned int qpair, const struct timespec *now)
 {
+	ASSERT(qpair < VHOST_USER_MAX_VQS / 2);
 	tap4_handler(c, qpair, now);
 	tap6_handler(c, qpair, now);
 }
@@ -1119,21 +1124,23 @@ void tap_add_packet(struct ctx *c, unsigned int qpair, struct iov_tail *data,
 		proto_update_l2_buf(c->guest_mac);
 	}
 
+	ASSERT(qpair < VHOST_USER_MAX_VQS / 2);
+
 	switch (ntohs(eh->h_proto)) {
 	case ETH_P_ARP:
 	case ETH_P_IP:
-		if (!pool_can_fit(pool_tap4, data)) {
+		if (!pool_can_fit(pool_tap4[qpair], data)) {
 			tap4_handler(c, qpair, now);
-			pool_flush(pool_tap4);
+			pool_flush(pool_tap4[qpair]);
 		}
-		packet_add(pool_tap4, data);
+		packet_add(pool_tap4[qpair], data);
 		break;
 	case ETH_P_IPV6:
-		if (!pool_can_fit(pool_tap6, data)) {
+		if (!pool_can_fit(pool_tap6[qpair], data)) {
 			tap6_handler(c, qpair, now);
-			pool_flush(pool_tap6);
+			pool_flush(pool_tap6[qpair]);
 		}
-		packet_add(pool_tap6, data);
+		packet_add(pool_tap6[qpair], data);
 		break;
 	default:
 		break;
@@ -1173,7 +1180,7 @@ static void tap_passt_input(struct ctx *c, const struct timespec *now)
 	ssize_t n;
 	char *p;
 
-	tap_flush_pools();
+	tap_flush_pools(0);
 
 	if (partial_len) {
 		/* We have a partial frame from an earlier pass.  Move it to the
@@ -1256,7 +1263,7 @@ static void tap_pasta_input(struct ctx *c, const struct timespec *now)
 {
 	ssize_t n, len;
 
-	tap_flush_pools();
+	tap_flush_pools(0);
 
 	for (n = 0;
 	     n <= (ssize_t)(sizeof(pkt_buf) - L2_MAX_LEN_PASTA);
@@ -1512,10 +1519,14 @@ static void tap_sock_tun_init(struct ctx *c)
  */
 static void tap_sock_update_pool(void *base, size_t size)
 {
-	int i;
+	unsigned int i;
 
-	pool_tap4_storage = PACKET_INIT(pool_tap4, TAP_MSGS_IP4, base, size);
-	pool_tap6_storage = PACKET_INIT(pool_tap6, TAP_MSGS_IP6, base, size);
+	for (i = 0; i < VHOST_USER_MAX_VQS / 2; i++) {
+		pool_tap4_storage[i] = PACKET_INIT(pool_tap4, TAP_MSGS_IP4, base, size);
+		pool_tap4[i] = (struct pool *)&pool_tap4_storage[i];
+		pool_tap6_storage[i] = PACKET_INIT(pool_tap6, TAP_MSGS_IP6, base, size);
+		pool_tap6[i] = (struct pool *)&pool_tap6_storage[i];
+	}
 
 	for (i = 0; i < TAP_SEQS; i++) {
 		tap4_l4[i].p = PACKET_INIT(pool_l4, UIO_MAXIOV, base, size);
diff --git a/tap.h b/tap.h
index d3ac0cb6a233..6d4f8bd156fb 100644
--- a/tap.h
+++ b/tap.h
@@ -119,8 +119,9 @@ void tap_handler_passt(struct ctx *c, uint32_t events,
 int tap_sock_unix_open(char *sock_path);
 void tap_sock_reset(struct ctx *c);
 void tap_backend_init(struct ctx *c);
-void tap_flush_pools(void);
-void tap_handler(struct ctx *c, unsigned int qpair, const struct timespec *now);
+void tap_flush_pools(unsigned int qpair);
+void tap_handler(struct ctx *c, unsigned int qpair,
+		 const struct timespec *now);
 void tap_add_packet(struct ctx *c, unsigned int qpair, struct iov_tail *data,
 		    const struct timespec *now);
 #endif /* TAP_H */
diff --git a/vu_common.c b/vu_common.c
index 80d9a30f6f71..8f0fa1180c78 100644
--- a/vu_common.c
+++ b/vu_common.c
@@ -170,7 +170,7 @@ static void vu_handle_tx(struct vu_dev *vdev, int index,
 
 	ASSERT(VHOST_USER_IS_QUEUE_TX(index));
 
-	tap_flush_pools();
+	tap_flush_pools(index / 2);
 
 	count = 0;
 	out_sg_count = 0;
@@ -196,11 +196,11 @@ static void vu_handle_tx(struct vu_dev *vdev, int index,
 
 		data = IOV_TAIL(elem[count].out_sg, elem[count].out_num, 0);
 		if (IOV_DROP_HEADER(&data, struct virtio_net_hdr_mrg_rxbuf))
-			tap_add_packet(vdev->context, 0, &data, now);
+			tap_add_packet(vdev->context, index / 2, &data, now);
 
 		count++;
 	}
-	tap_handler(vdev->context, 0, now);
+	tap_handler(vdev->context, index / 2, now);
 
 	if (count) {
 		int i;
-- 
2.51.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v3 6/6] flow: Add queue pair tracking to flow management
  2025-12-03 18:54 [PATCH v3 0/6] vhost-user: Add multiqueue support Laurent Vivier
                   ` (4 preceding siblings ...)
  2025-12-03 18:54 ` [PATCH v3 5/6] tap: Convert packet pools to per-queue-pair arrays for multiqueue Laurent Vivier
@ 2025-12-03 18:54 ` Laurent Vivier
  5 siblings, 0 replies; 18+ messages in thread
From: Laurent Vivier @ 2025-12-03 18:54 UTC (permalink / raw)
  To: passt-dev; +Cc: Laurent Vivier

For multiqueue support, we need to ensure packets are routed to the
correct RX queue based on which TX queue they originated from. This
requires tracking the queue pair association for each flow.

Add a qpair field to struct flow_common to store the queue pair number
for each flow (FLOW_QPAIR_INVALID if not assigned). The field uses 5
bits, allowing support for up to 31 queue pairs (index 31 is reserved
for FLOW_QPAIR_INVALID), which we verify is sufficient for
VHOST_USER_MAX_VQS via static assertion.

Introduce flow_qp() to retrieve the queue pair for a flow (returning 0
for NULL flows or flows without a valid assignment), and flow_setqp()
to assign queue pairs. Update all protocol handlers (TCP, UDP, ICMP)
and their tap handlers to accept a qpair parameter and assign it to
flows using FLOW_SETQP().
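
A standalone model of these semantics (it mirrors the flow.h and flow.c
hunks below with simplified types, and is not the actual implementation):

#include <assert.h>

#define FLOW_QPAIR_BITS		5
#define FLOW_QPAIR_NUM		(1 << FLOW_QPAIR_BITS)
#define FLOW_QPAIR_MAX		(FLOW_QPAIR_NUM - 1)
#define FLOW_QPAIR_INVALID	FLOW_QPAIR_MAX

struct flow_common {
	unsigned int qpair:FLOW_QPAIR_BITS;
};

static unsigned int flow_qp(const struct flow_common *f)
{
	if (!f || f->qpair == FLOW_QPAIR_INVALID)
		return 0;	/* unassigned flows fall back to pair 0 */
	return f->qpair;
}

static void flow_setqp(struct flow_common *f, unsigned int qpair)
{
	assert(qpair < FLOW_QPAIR_MAX);
	f->qpair = qpair;	/* refreshed on every packet from a TX queue */
}

int main(void)
{
	struct flow_common f = { .qpair = FLOW_QPAIR_INVALID };

	assert(flow_qp(&f) == 0);	/* host-initiated flow: pair 0 */
	flow_setqp(&f, 3);		/* guest transmitted on pair 3 */
	assert(flow_qp(&f) == 3);	/* replies go to RX queue 3 * 2 */
	return 0;
}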

The implementation updates the queue pair assignment on every packet
received from a TX queue. This follows the virtio specification's
requirement
for automatic receive steering: "After the driver transmitted a packet
of a flow on transmitqX, the device SHOULD cause incoming packets for
that flow to be steered to receiveqX." By tracking the most recent TX
queue for each flow, we ensure return traffic is directed to the
corresponding RX queue, maintaining flow affinity across queue pairs.

The vhost-user code now uses FLOW_QP() to select the appropriate RX
queue when sending packets, ensuring they're routed based on the
originating TX queue rather than always using queue 0.

Note that flows initiated from the host side (via sockets, for example
udp_flow_from_sock()) currently default to queue pair 0, as they don't
have an associated incoming queue to derive the assignment from.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
 flow.c     | 33 +++++++++++++++++++++++++++++++++
 flow.h     | 17 +++++++++++++++++
 icmp.c     | 23 +++++++++++++----------
 icmp.h     |  4 ++--
 tap.c      |  8 ++++----
 tcp.c      | 33 +++++++++++++++++++--------------
 tcp_vu.c   |  8 +++++---
 udp.c      | 29 ++++++++++++++++-------------
 udp.h      |  2 +-
 udp_flow.c |  8 +++++++-
 udp_flow.h |  2 +-
 udp_vu.c   |  4 +++-
 12 files changed, 121 insertions(+), 50 deletions(-)

diff --git a/flow.c b/flow.c
index 278a9cf0ac6d..fd6c4752ec81 100644
--- a/flow.c
+++ b/flow.c
@@ -405,6 +405,38 @@ void flow_epollid_register(int epollid, int epollfd)
 	epoll_id_to_fd[epollid] = epollfd;
 }
 
+/**
+ * flow_qp() - Get the queue pair for a flow
+ * @f:		Flow to query (may be NULL)
+ *
+ * Return: queue pair number for the flow, or 0 if flow is NULL or has no
+ *         valid queue pair assignment
+ */
+unsigned int flow_qp(const struct flow_common *f)
+{
+	if (f == NULL || f->qpair == FLOW_QPAIR_INVALID)
+		return 0;
+	return f->qpair;
+}
+
+/**
+ * flow_setqp() - Set queue pair assignment for a flow
+ * @f:		Flow to update
+ * @qpair:	Queue pair number to assign
+ */
+void flow_setqp(struct flow_common *f, unsigned int qpair)
+{
+	ASSERT(qpair < FLOW_QPAIR_MAX);
+
+	if (f->qpair == qpair)
+		return;
+
+	flow_trace((union flow *)f, "updating queue pair from %d to %d",
+		   f->qpair, qpair);
+
+	f->qpair = qpair;
+}
+
 /**
  * flow_initiate_() - Move flow to INI, setting pif[INISIDE]
  * @flow:	Flow to change state
@@ -609,6 +641,7 @@ union flow *flow_alloc(void)
 	flow_new_entry = flow;
 	memset(flow, 0, sizeof(*flow));
 	flow_epollid_clear(&flow->f);
+	flow->f.qpair = FLOW_QPAIR_INVALID;
 	flow_set_state(&flow->f, FLOW_STATE_NEW);
 
 	return flow;
diff --git a/flow.h b/flow.h
index b43b0b1dd7f2..a4a1e680227c 100644
--- a/flow.h
+++ b/flow.h
@@ -179,6 +179,8 @@ int flowside_connect(const struct ctx *c, int s,
  * @side[]:	Information for each side of the flow
  * @tap_omac:	MAC address of remote endpoint as seen from the guest
  * @epollid:	epollfd identifier, or EPOLLFD_ID_INVALID
+ * @qpair:	Queue pair number assigned to this flow
+ *		(FLOW_QPAIR_INVALID if not assigned)
  */
 struct flow_common {
 #ifdef __GNUC__
@@ -199,6 +201,8 @@ struct flow_common {
 
 #define EPOLLFD_ID_BITS 8
 	unsigned int	epollid:EPOLLFD_ID_BITS;
+#define FLOW_QPAIR_BITS 5
+	unsigned int	qpair:FLOW_QPAIR_BITS;
 };
 
 #define EPOLLFD_ID_DEFAULT	0
@@ -206,6 +210,12 @@ struct flow_common {
 #define EPOLLFD_ID_MAX		(EPOLLFD_ID_SIZE - 1)
 #define EPOLLFD_ID_INVALID	EPOLLFD_ID_MAX
 
+#define FLOW_QPAIR_NUM		(1 << FLOW_QPAIR_BITS)
+#define FLOW_QPAIR_MAX		(FLOW_QPAIR_NUM - 1)
+#define FLOW_QPAIR_INVALID	FLOW_QPAIR_MAX
+
+static_assert(VHOST_USER_MAX_VQS <= FLOW_QPAIR_MAX * 2);
+
 #define FLOW_INDEX_BITS		17	/* 128k - 1 */
 #define FLOW_MAX		MAX_FROM_BITS(FLOW_INDEX_BITS)
 
@@ -266,6 +276,13 @@ int flow_epollfd(const struct flow_common *f);
 void flow_epollid_set(struct flow_common *f, int epollid);
 void flow_epollid_clear(struct flow_common *f);
 void flow_epollid_register(int epollid, int epollfd);
+unsigned int flow_qp(const struct flow_common *f);
+#define FLOW_QP(flow_)				\
+	(flow_qp(&(flow_)->f))
+void flow_setqp(struct flow_common *f, unsigned int qpair);
+#define FLOW_SETQP(flow_, _qpair)		\
+	(flow_setqp(&(flow_)->f, _qpair))
+
 void flow_defer_handler(const struct ctx *c, const struct timespec *now);
 int flow_migrate_source_early(struct ctx *c, const struct migrate_stage *stage,
 			      int fd);
diff --git a/icmp.c b/icmp.c
index a9f0518c2f61..744b0ec9edae 100644
--- a/icmp.c
+++ b/icmp.c
@@ -132,13 +132,13 @@ void icmp_sock_handler(const struct ctx *c, union epoll_ref ref)
 		const struct in_addr *daddr = inany_v4(&ini->eaddr);
 
 		ASSERT(saddr && daddr); /* Must have IPv4 addresses */
-		tap_icmp4_send(c, 0, *saddr, *daddr, buf,
+		tap_icmp4_send(c, FLOW_QP(pingf), *saddr, *daddr, buf,
 			       pingf->f.tap_omac, n);
 	} else if (pingf->f.type == FLOW_PING6) {
 		const struct in6_addr *saddr = &ini->oaddr.a6;
 		const struct in6_addr *daddr = &ini->eaddr.a6;
 
-		tap_icmp6_send(c, 0, saddr, daddr, buf,
+		tap_icmp6_send(c, FLOW_QP(pingf), saddr, daddr, buf,
 			       pingf->f.tap_omac, n);
 	}
 	return;
@@ -238,17 +238,18 @@ cancel:
 
 /**
  * icmp_tap_handler() - Handle packets from tap
- * @c:		Execution context
- * @pif:	pif on which the packet is arriving
- * @af:		Address family, AF_INET or AF_INET6
- * @saddr:	Source address
- * @daddr:	Destination address
- * @data:	Single packet with ICMP/ICMPv6 header
- * @now:	Current timestamp
+ * @c:			Execution context
+ * @qpair:		Queue pair
+ * @pif:		pif on which the packet is arriving
+ * @af:			Address family, AF_INET or AF_INET6
+ * @saddr:		Source address
+ * @daddr:		Destination address
+ * @data:		Single packet with ICMP/ICMPv6 header
+ * @now:		Current timestamp
  *
  * Return: count of consumed packets (always 1, even if malformed)
  */
-int icmp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
+int icmp_tap_handler(const struct ctx *c, unsigned int qpair, uint8_t pif, sa_family_t af,
 		     const void *saddr, const void *daddr,
 		     struct iov_tail *data, const struct timespec *now)
 {
@@ -309,6 +310,8 @@ int icmp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
 	else if (!(pingf = icmp_ping_new(c, af, id, saddr, daddr)))
 		return 1;
 
+	FLOW_SETQP(pingf, qpair);
+
 	tgt = &pingf->f.side[TGTSIDE];
 
 	ASSERT(flow_proto[pingf->f.type] == proto);
diff --git a/icmp.h b/icmp.h
index 1a0e6205f087..7b9982529fd1 100644
--- a/icmp.h
+++ b/icmp.h
@@ -10,8 +10,8 @@ struct ctx;
 struct icmp_ping_flow;
 
 void icmp_sock_handler(const struct ctx *c, union epoll_ref ref);
-int icmp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
-		     const void *saddr, const void *daddr,
+int icmp_tap_handler(const struct ctx *c, unsigned int qpair, uint8_t pif,
+		     sa_family_t af, const void *saddr, const void *daddr,
 		     struct iov_tail *data, const struct timespec *now);
 void icmp_init(void);
 
diff --git a/tap.c b/tap.c
index c56afb73fd7e..c603b48ea4b5 100644
--- a/tap.c
+++ b/tap.c
@@ -787,7 +787,7 @@ resume:
 
 			tap_packet_debug(iph, NULL, NULL, 0, NULL, 1);
 
-			icmp_tap_handler(c, PIF_TAP, AF_INET,
+			icmp_tap_handler(c, qpair, PIF_TAP, AF_INET,
 					 &iph->saddr, &iph->daddr,
 					 &data, now);
 			continue;
@@ -871,7 +871,7 @@ append:
 			if (c->no_udp)
 				continue;
 			for (k = 0; k < p->count; )
-				k += udp_tap_handler(c, PIF_TAP, AF_INET,
+				k += udp_tap_handler(c, qpair, PIF_TAP, AF_INET,
 						     &seq->saddr, &seq->daddr,
 						     seq->ttl, p, k, now);
 		}
@@ -973,7 +973,7 @@ resume:
 
 			tap_packet_debug(NULL, ip6h, NULL, proto, NULL, 1);
 
-			icmp_tap_handler(c, PIF_TAP, AF_INET6,
+			icmp_tap_handler(c, qpair, PIF_TAP, AF_INET6,
 					 saddr, daddr, &data, now);
 			continue;
 		}
@@ -1062,7 +1062,7 @@ append:
 			if (c->no_udp)
 				continue;
 			for (k = 0; k < p->count; )
-				k += udp_tap_handler(c, PIF_TAP, AF_INET6,
+				k += udp_tap_handler(c, qpair, PIF_TAP, AF_INET6,
 						     &seq->saddr, &seq->daddr,
 						     seq->hop_limit, p, k, now);
 		}
diff --git a/tcp.c b/tcp.c
index 4c84f0e621b8..3b7322a655bc 100644
--- a/tcp.c
+++ b/tcp.c
@@ -1497,21 +1497,23 @@ static void tcp_bind_outbound(const struct ctx *c,
 
 /**
  * tcp_conn_from_tap() - Handle connection request (SYN segment) from tap
- * @c:		Execution context
- * @af:		Address family, AF_INET or AF_INET6
- * @saddr:	Source address, pointer to in_addr or in6_addr
- * @daddr:	Destination address, pointer to in_addr or in6_addr
- * @th:		TCP header from tap: caller MUST ensure it's there
- * @opts:	Pointer to start of options
- * @optlen:	Bytes in options: caller MUST ensure available length
- * @now:	Current timestamp
+ * @c:			Execution context
+ * @qpair:		Queue pair for the flow
+ * @af:			Address family, AF_INET or AF_INET6
+ * @saddr:		Source address, pointer to in_addr or in6_addr
+ * @daddr:		Destination address, pointer to in_addr or in6_addr
+ * @th:			TCP header from tap: caller MUST ensure it's there
+ * @opts:		Pointer to start of options
+ * @optlen:		Bytes in options: caller MUST ensure available length
+ * @now:		Current timestamp
  *
  * #syscalls:vu getsockname
  */
-static void tcp_conn_from_tap(const struct ctx *c, sa_family_t af,
-			      const void *saddr, const void *daddr,
-			      const struct tcphdr *th, const char *opts,
-			      size_t optlen, const struct timespec *now)
+static void tcp_conn_from_tap(const struct ctx *c, unsigned int qpair,
+			      sa_family_t af, const void *saddr,
+			      const void *daddr, const struct tcphdr *th,
+			      const char *opts, size_t optlen,
+			      const struct timespec *now)
 {
 	in_port_t srcport = ntohs(th->source);
 	in_port_t dstport = ntohs(th->dest);
@@ -1623,6 +1625,7 @@ static void tcp_conn_from_tap(const struct ctx *c, sa_family_t af,
 		conn_event(c, conn, TAP_SYN_ACK_SENT);
 	}
 
+	FLOW_SETQP(conn, qpair);
 	tcp_epoll_ctl(c, conn);
 
 	if (c->mode == MODE_VU) { /* To rebind to same oport after migration */
@@ -2057,7 +2060,6 @@ static void tcp_rst_no_conn(const struct ctx *c, unsigned int qpair, int af,
 /**
  * tcp_tap_handler() - Handle packets from tap and state transitions
  * @c:		Execution context
- * @qpair:	Queue pair on which to send packets
  * @pif:	pif on which the packet is arriving
  * @af:		Address family, AF_INET or AF_INET6
  * @saddr:	Source address
@@ -2109,7 +2111,7 @@ int tcp_tap_handler(const struct ctx *c, unsigned int qpair, uint8_t pif,
 	/* New connection from tap */
 	if (!flow) {
 		if (opts && th->syn && !th->ack)
-			tcp_conn_from_tap(c, af, saddr, daddr, th,
+			tcp_conn_from_tap(c, qpair, af, saddr, daddr, th,
 					  opts, optlen, now);
 		else
 			tcp_rst_no_conn(c, qpair, af, saddr, daddr, flow_lbl, th,
@@ -2121,6 +2123,9 @@ int tcp_tap_handler(const struct ctx *c, unsigned int qpair, uint8_t pif,
 	ASSERT(pif_at_sidx(sidx) == PIF_TAP);
 	conn = &flow->tcp;
 
+	/* update queue pair */
+	FLOW_SETQP(flow, qpair);
+
 	flow_trace(conn, "packet length %zu from tap", l4len);
 
 	if (th->rst) {
diff --git a/tcp_vu.c b/tcp_vu.c
index 1c81ce376dad..1044491d404c 100644
--- a/tcp_vu.c
+++ b/tcp_vu.c
@@ -71,14 +71,15 @@ static size_t tcp_vu_hdrlen(bool v6)
 int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags)
 {
 	struct vu_dev *vdev = c->vdev;
-	struct vu_virtq *vq = &vdev->vq[VHOST_USER_RX_QUEUE];
-	size_t optlen, hdrlen;
+	int rx_queue = FLOW_QP(conn) * 2;
+	struct vu_virtq *vq = &vdev->vq[rx_queue];
 	struct vu_virtq_element flags_elem[2];
 	struct ipv6hdr *ip6h = NULL;
 	struct iphdr *ip4h = NULL;
 	struct iovec flags_iov[2];
 	struct tcp_syn_opts *opts;
 	struct iov_tail payload;
+	size_t optlen, hdrlen;
 	struct tcphdr *th;
 	struct ethhdr *eh;
 	uint32_t seq;
@@ -349,7 +350,8 @@ int tcp_vu_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn)
 {
 	uint32_t wnd_scaled = conn->wnd_from_tap << conn->ws_from_tap;
 	struct vu_dev *vdev = c->vdev;
-	struct vu_virtq *vq = &vdev->vq[VHOST_USER_RX_QUEUE];
+	int rx_queue = FLOW_QP(conn) * 2;
+	struct vu_virtq *vq = &vdev->vq[rx_queue];
 	ssize_t len, previous_dlen;
 	int i, iov_cnt, head_cnt;
 	size_t hdrlen, fillsize;
diff --git a/udp.c b/udp.c
index cbc2d7055647..e83daaaaf0d5 100644
--- a/udp.c
+++ b/udp.c
@@ -636,12 +636,14 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx,
 	if (hdr->cmsg_level == IPPROTO_IP &&
 	    (o4 = inany_v4(&otap)) && inany_v4(&toside->eaddr)) {
 		dlen = MIN(dlen, ICMP4_MAX_DLEN);
-		udp_send_tap_icmp4(c, 0, ee, toside, *o4, data, dlen);
+		udp_send_tap_icmp4(c, FLOW_QP(uflow), ee, toside,
+				   *o4, data, dlen);
 		return 1;
 	}
 
 	if (hdr->cmsg_level == IPPROTO_IPV6 && !inany_v4(&toside->eaddr)) {
-		udp_send_tap_icmp6(c, 0, ee, toside, &otap.a6, data, dlen,
+		udp_send_tap_icmp6(c, FLOW_QP(uflow), ee,
+				   toside, &otap.a6, data, dlen,
 				   FLOW_IDX(uflow));
 		return 1;
 	}
@@ -970,21 +972,22 @@ fail:
 
 /**
  * udp_tap_handler() - Handle packets from tap
- * @c:		Execution context
- * @pif:	pif on which the packet is arriving
- * @af:		Address family, AF_INET or AF_INET6
- * @saddr:	Source address
- * @daddr:	Destination address
- * @ttl:	TTL or hop limit for packets to be sent in this call
- * @p:		Pool of UDP packets, with UDP headers
- * @idx:	Index of first packet to process
- * @now:	Current timestamp
+ * @c:			Execution context
+ * @qpair:		Queue pair
+ * @pif:		pif on which the packet is arriving
+ * @af:			Address family, AF_INET or AF_INET6
+ * @saddr:		Source address
+ * @daddr:		Destination address
+ * @ttl:		TTL or hop limit for packets to be sent in this call
+ * @p:			Pool of UDP packets, with UDP headers
+ * @idx:		Index of first packet to process
+ * @now:		Current timestamp
  *
  * Return: count of consumed packets
  *
  * #syscalls sendmmsg
  */
-int udp_tap_handler(const struct ctx *c, uint8_t pif,
+int udp_tap_handler(const struct ctx *c, unsigned int qpair, uint8_t pif,
 		    sa_family_t af, const void *saddr, const void *daddr,
 		    uint8_t ttl, const struct pool *p, int idx,
 		    const struct timespec *now)
@@ -1018,7 +1021,7 @@ int udp_tap_handler(const struct ctx *c, uint8_t pif,
 	src = ntohs(uh->source);
 	dst = ntohs(uh->dest);
 
-	tosidx = udp_flow_from_tap(c, pif, af, saddr, daddr, src, dst, now);
+	tosidx = udp_flow_from_tap(c, qpair, pif, af, saddr, daddr, src, dst, now);
 	if (!(uflow = udp_at_sidx(tosidx))) {
 		char sstr[INET6_ADDRSTRLEN], dstr[INET6_ADDRSTRLEN];
 
diff --git a/udp.h b/udp.h
index 7d3cd59d9a42..b2f2dc8b5ac4 100644
--- a/udp.h
+++ b/udp.h
@@ -11,7 +11,7 @@ void udp_listen_sock_handler(const struct ctx *c, union epoll_ref ref,
 			     uint32_t events, const struct timespec *now);
 void udp_sock_handler(const struct ctx *c, union epoll_ref ref, uint32_t events,
 		      const struct timespec *now);
-int udp_tap_handler(const struct ctx *c, uint8_t pif,
+int udp_tap_handler(const struct ctx *c, unsigned int qpair, uint8_t pif,
 		    sa_family_t af, const void *saddr, const void *daddr,
 		    uint8_t ttl, const struct pool *p, int idx,
 		    const struct timespec *now);
diff --git a/udp_flow.c b/udp_flow.c
index 8907f2f72741..35014c3692a9 100644
--- a/udp_flow.c
+++ b/udp_flow.c
@@ -266,17 +266,19 @@ flow_sidx_t udp_flow_from_sock(const struct ctx *c, uint8_t pif,
 /**
  * udp_flow_from_tap() - Find or create UDP flow for tap packets
  * @c:		Execution context
+ * @qpair:	Queue pair for the flow
  * @pif:	pif on which the packet is arriving
  * @af:		Address family, AF_INET or AF_INET6
  * @saddr:	Source address on guest side
  * @daddr:	Destination address guest side
  * @srcport:	Source port on guest side
  * @dstport:	Destination port on guest side
+ * @now:	Current timestamp
  *
  * Return: sidx for the destination side of the flow for this packet, or
  *         FLOW_SIDX_NONE if we couldn't find or create a flow.
  */
-flow_sidx_t udp_flow_from_tap(const struct ctx *c,
+flow_sidx_t udp_flow_from_tap(const struct ctx *c, unsigned int qpair,
 			      uint8_t pif, sa_family_t af,
 			      const void *saddr, const void *daddr,
 			      in_port_t srcport, in_port_t dstport,
@@ -293,6 +295,8 @@ flow_sidx_t udp_flow_from_tap(const struct ctx *c,
 			      srcport, dstport);
 	if ((uflow = udp_at_sidx(sidx))) {
 		uflow->ts = now->tv_sec;
+		/* update qpair */
+		FLOW_SETQP(uflow, qpair);
 		return flow_sidx_opposite(sidx);
 	}
 
@@ -316,6 +320,8 @@ flow_sidx_t udp_flow_from_tap(const struct ctx *c,
 		return FLOW_SIDX_NONE;
 	}
 
+	FLOW_SETQP(flow, qpair);
+
 	return udp_flow_new(c, flow, now);
 }
 
diff --git a/udp_flow.h b/udp_flow.h
index 4c528e95ca66..03e6ecdcbaf2 100644
--- a/udp_flow.h
+++ b/udp_flow.h
@@ -36,7 +36,7 @@ flow_sidx_t udp_flow_from_sock(const struct ctx *c, uint8_t pif,
 			       const union inany_addr *dst, in_port_t port,
 			       const union sockaddr_inany *s_in,
 			       const struct timespec *now);
-flow_sidx_t udp_flow_from_tap(const struct ctx *c,
+flow_sidx_t udp_flow_from_tap(const struct ctx *c, unsigned int qpair,
 			      uint8_t pif, sa_family_t af,
 			      const void *saddr, const void *daddr,
 			      in_port_t srcport, in_port_t dstport,
diff --git a/udp_vu.c b/udp_vu.c
index 099677f914e7..f3cf97393d0a 100644
--- a/udp_vu.c
+++ b/udp_vu.c
@@ -202,9 +202,11 @@ static void udp_vu_csum(const struct flowside *toside, int iov_used)
 void udp_vu_sock_to_tap(const struct ctx *c, int s, int n, flow_sidx_t tosidx)
 {
 	const struct flowside *toside = flowside_at_sidx(tosidx);
+	const struct udp_flow *uflow = udp_at_sidx(tosidx);
 	bool v6 = !(inany_v4(&toside->eaddr) && inany_v4(&toside->oaddr));
 	struct vu_dev *vdev = c->vdev;
-	struct vu_virtq *vq = &vdev->vq[VHOST_USER_RX_QUEUE];
+	int rx_queue = FLOW_QP(uflow) * 2;
+	struct vu_virtq *vq = &vdev->vq[rx_queue];
 	int i;
 
 	for (i = 0; i < n; i++) {
-- 
2.51.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 1/6] tap: Remove pool parameter from tap4_handler() and tap6_handler()
  2025-12-03 18:54 ` [PATCH v3 1/6] tap: Remove pool parameter from tap4_handler() and tap6_handler() Laurent Vivier
@ 2025-12-05  4:14   ` David Gibson
  0 siblings, 0 replies; 18+ messages in thread
From: David Gibson @ 2025-12-05  4:14 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: passt-dev


On Wed, Dec 03, 2025 at 07:54:29PM +0100, Laurent Vivier wrote:
> These handlers only ever operate on their respective global pools
> (pool_tap4 and pool_tap6). The pool parameter was always passed the
> same value, making the indirection unnecessary.
> 
> Access the global pools directly instead, simplifying the function
> signatures.
> 
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  tap.c | 46 +++++++++++++++++++++-------------------------
>  1 file changed, 21 insertions(+), 25 deletions(-)
> 
> diff --git a/tap.c b/tap.c
> index 44b06448757a..2cda8c9772b8 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -696,23 +696,21 @@ static bool tap4_is_fragment(const struct iphdr *iph,
>  /**
>   * tap4_handler() - IPv4 and ARP packet handler for tap file descriptor
>   * @c:		Execution context
> - * @in:		Ingress packet pool, packets with Ethernet headers
>   * @now:	Current timestamp
>   *
>   * Return: count of packets consumed by handlers
>   */
> -static int tap4_handler(struct ctx *c, const struct pool *in,
> -			const struct timespec *now)
> +static int tap4_handler(struct ctx *c, const struct timespec *now)
>  {
>  	unsigned int i, j, seq_count;
>  	struct tap4_l4_t *seq;
>  
> -	if (!c->ifi4 || !in->count)
> -		return in->count;
> +	if (!c->ifi4 || !pool_tap4->count)
> +		return pool_tap4->count;
>  
>  	i = 0;
>  resume:
> -	for (seq_count = 0, seq = NULL; i < in->count; i++) {
> +	for (seq_count = 0, seq = NULL; i < pool_tap4->count; i++) {
>  		size_t l3len, hlen, l4len;
>  		struct ethhdr eh_storage;
>  		struct iphdr iph_storage;
> @@ -722,7 +720,7 @@ resume:
>  		struct iov_tail data;
>  		struct iphdr *iph;
>  
> -		if (!packet_get(in, i, &data))
> +		if (!packet_get(pool_tap4, i, &data))
>  			continue;
>  
>  		eh = IOV_PEEK_HEADER(&data, eh_storage);
> @@ -789,7 +787,7 @@ resume:
>  		if (iph->protocol == IPPROTO_UDP) {
>  			struct iov_tail eh_data;
>  
> -			packet_get(in, i, &eh_data);
> +			packet_get(pool_tap4, i, &eh_data);
>  			if (dhcp(c, &eh_data))
>  				continue;
>  		}
> @@ -820,7 +818,7 @@ resume:
>  			goto append;
>  
>  		if (seq_count == TAP_SEQS)
> -			break;	/* Resume after flushing if i < in->count */
> +			break;	/* Resume after flushing if i < pool_tap4->count */
>  
>  		for (seq = tap4_l4 + seq_count - 1; seq >= tap4_l4; seq--) {
>  			if (L4_MATCH(iph, uh, seq)) {
> @@ -866,32 +864,30 @@ append:
>  		}
>  	}
>  
> -	if (i < in->count)
> +	if (i < pool_tap4->count)
>  		goto resume;
>  
> -	return in->count;
> +	return pool_tap4->count;
>  }
>  
>  /**
>   * tap6_handler() - IPv6 packet handler for tap file descriptor
>   * @c:		Execution context
> - * @in:		Ingress packet pool, packets with Ethernet headers
>   * @now:	Current timestamp
>   *
>   * Return: count of packets consumed by handlers
>   */
> -static int tap6_handler(struct ctx *c, const struct pool *in,
> -			const struct timespec *now)
> +static int tap6_handler(struct ctx *c, const struct timespec *now)
>  {
>  	unsigned int i, j, seq_count = 0;
>  	struct tap6_l4_t *seq;
>  
> -	if (!c->ifi6 || !in->count)
> -		return in->count;
> +	if (!c->ifi6 || !pool_tap6->count)
> +		return pool_tap6->count;
>  
>  	i = 0;
>  resume:
> -	for (seq_count = 0, seq = NULL; i < in->count; i++) {
> +	for (seq_count = 0, seq = NULL; i < pool_tap6->count; i++) {
>  		size_t l4len, plen, check;
>  		struct in6_addr *saddr, *daddr;
>  		struct ipv6hdr ip6h_storage;
> @@ -903,7 +899,7 @@ resume:
>  		struct ipv6hdr *ip6h;
>  		uint8_t proto;
>  
> -		if (!packet_get(in, i, &data))
> +		if (!packet_get(pool_tap6, i, &data))
>  			return -1;
>  
>  		eh = IOV_REMOVE_HEADER(&data, eh_storage);
> @@ -1011,7 +1007,7 @@ resume:
>  			goto append;
>  
>  		if (seq_count == TAP_SEQS)
> -			break;	/* Resume after flushing if i < in->count */
> +			break;	/* Resume after flushing if i < pool_tap6->count */
>  
>  		for (seq = tap6_l4 + seq_count - 1; seq >= tap6_l4; seq--) {
>  			if (L4_MATCH(ip6h, proto, uh, seq)) {
> @@ -1058,10 +1054,10 @@ append:
>  		}
>  	}
>  
> -	if (i < in->count)
> +	if (i < pool_tap6->count)
>  		goto resume;
>  
> -	return in->count;
> +	return pool_tap6->count;
>  }
>  
>  /**
> @@ -1080,8 +1076,8 @@ void tap_flush_pools(void)
>   */
>  void tap_handler(struct ctx *c, const struct timespec *now)
>  {
> -	tap4_handler(c, pool_tap4, now);
> -	tap6_handler(c, pool_tap6, now);
> +	tap4_handler(c, now);
> +	tap6_handler(c, now);
>  }
>  
>  /**
> @@ -1115,14 +1111,14 @@ void tap_add_packet(struct ctx *c, struct iov_tail *data,
>  	case ETH_P_ARP:
>  	case ETH_P_IP:
>  		if (!pool_can_fit(pool_tap4, data)) {
> -			tap4_handler(c, pool_tap4, now);
> +			tap4_handler(c, now);
>  			pool_flush(pool_tap4);
>  		}
>  		packet_add(pool_tap4, data);
>  		break;
>  	case ETH_P_IPV6:
>  		if (!pool_can_fit(pool_tap6, data)) {
> -			tap6_handler(c, pool_tap6, now);
> +			tap6_handler(c, now);
>  			pool_flush(pool_tap6);
>  		}
>  		packet_add(pool_tap6, data);
> -- 
> 2.51.1
> 

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 2/6] vhost-user: Enable multiqueue
  2025-12-03 18:54 ` [PATCH v3 2/6] vhost-user: Enable multiqueue Laurent Vivier
@ 2025-12-10  0:04   ` David Gibson
  2025-12-11  7:01   ` Stefano Brivio
  1 sibling, 0 replies; 18+ messages in thread
From: David Gibson @ 2025-12-10  0:04 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: passt-dev


On Wed, Dec 03, 2025 at 07:54:30PM +0100, Laurent Vivier wrote:
> Advertise multi-queue support in vhost-user by setting VIRTIO_NET_F_MQ
> and VHOST_USER_PROTOCOL_F_MQ feature flags, and increase
> VHOST_USER_MAX_VQS from 2 to 32, supporting up to 16 queue pairs.
> 
> Currently, only the first RX queue (queue 0) is used for receiving
> packets. The guest kernel selects which TX queue to use for
> transmission. Full multi-RX queue load balancing will be implemented in
> future work.
> 
> Update the QEMU usage hint to show the required parameters for enabling
> multiqueue: queues parameter on the netdev, and mq=true on the
> virtio-net device.
> 
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  tap.c        |  7 +++++--
>  vhost_user.c | 10 ++++++----
>  virtio.h     |  2 +-
>  3 files changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/tap.c b/tap.c
> index 2cda8c9772b8..591b49491aa3 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -1314,8 +1314,11 @@ static void tap_backend_show_hints(struct ctx *c)
>  		break;
>  	case MODE_VU:
>  		info("You can start qemu with:");
> -		info("    kvm ... -chardev socket,id=chr0,path=%s -netdev vhost-user,id=netdev0,chardev=chr0 -device virtio-net,netdev=netdev0 -object memory-backend-memfd,id=memfd0,share=on,size=$RAMSIZE -numa node,memdev=memfd0\n",
> -		     c->sock_path);
> +		info("    kvm ... -chardev socket,id=chr0,path=%s "
> +		     "-netdev vhost-user,id=netdev0,chardev=chr0,queues=$QUEUES "
> +		     "-device virtio-net,netdev=netdev0,mq=true "
> +		     "-object memory-backend-memfd,id=memfd0,share=on,size=$RAMSIZE "
> +		     "-numa node,memdev=memfd0\n", c->sock_path);
>  		break;
>  	}
>  }
> diff --git a/vhost_user.c b/vhost_user.c
> index aa7c869d9e56..845fdb551c84 100644
> --- a/vhost_user.c
> +++ b/vhost_user.c
> @@ -323,6 +323,7 @@ static bool vu_get_features_exec(struct vu_dev *vdev,
>  	uint64_t features =
>  		1ULL << VIRTIO_F_VERSION_1 |
>  		1ULL << VIRTIO_NET_F_MRG_RXBUF |
> +		1ULL << VIRTIO_NET_F_MQ |
>  		1ULL << VHOST_F_LOG_ALL |
>  		1ULL << VHOST_USER_F_PROTOCOL_FEATURES;
>  
> @@ -767,7 +768,8 @@ static void vu_check_queue_msg_file(struct vhost_user_msg *vmsg)
>  	int idx = vmsg->payload.u64 & VHOST_USER_VRING_IDX_MASK;
>  
>  	if (idx >= VHOST_USER_MAX_VQS)
> -		die("Invalid vhost-user queue index: %u", idx);
> +		die("Invalid vhost-user queue index: %u (maximum %u)", idx,
> +		    VHOST_USER_MAX_VQS);
>  
>  	if (nofd) {
>  		vmsg_close_fds(vmsg);
> @@ -896,7 +898,8 @@ static bool vu_get_protocol_features_exec(struct vu_dev *vdev,
>  	uint64_t features = 1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK |
>  			    1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD |
>  			    1ULL << VHOST_USER_PROTOCOL_F_DEVICE_STATE |
> -			    1ULL << VHOST_USER_PROTOCOL_F_RARP;
> +			    1ULL << VHOST_USER_PROTOCOL_F_RARP |
> +			    1ULL << VHOST_USER_PROTOCOL_F_MQ;
>  
>  	(void)vdev;
>  	vmsg_set_reply_u64(vmsg, features);
> @@ -935,10 +938,9 @@ static bool vu_get_queue_num_exec(struct vu_dev *vdev,
>  {
>  	(void)vdev;
>  
> -	/* NOLINTNEXTLINE(misc-redundant-expression) */
>  	vmsg_set_reply_u64(vmsg, VHOST_USER_MAX_VQS / 2);
>  
> -	debug("VHOST_USER_MAX_VQS  %u", VHOST_USER_MAX_VQS / 2);
> +	debug("queue num  %u", VHOST_USER_MAX_VQS / 2);
>  
>  	return true;
>  }
> diff --git a/virtio.h b/virtio.h
> index 12caaa0b6def..176c935cecc7 100644
> --- a/virtio.h
> +++ b/virtio.h
> @@ -88,7 +88,7 @@ struct vu_dev_region {
>  	uint64_t mmap_addr;
>  };
>  
> -#define VHOST_USER_MAX_VQS 2
> +#define VHOST_USER_MAX_VQS 32
>  
>  /*
>   * Set a reasonable maximum number of ram slots, which will be supported by
> -- 
> 2.51.1
> 

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 3/6] test: Add multiqueue support to vhost-user test infrastructure
  2025-12-03 18:54 ` [PATCH v3 3/6] test: Add multiqueue support to vhost-user test infrastructure Laurent Vivier
@ 2025-12-10  0:05   ` David Gibson
  0 siblings, 0 replies; 18+ messages in thread
From: David Gibson @ 2025-12-10  0:05 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: passt-dev


On Wed, Dec 03, 2025 at 07:54:31PM +0100, Laurent Vivier wrote:
> With the recent addition of multiqueue support to passt's vhost-user
> implementation, we need test coverage to validate the functionality. The
> test infrastructure previously tested only single-queue configurations.
> 
> Add a VHOST_USER_MQ environment variable to control the number of queue
> pairs. The queues parameter on the netdev is always set to this value
> (defaulting to 1 for single queue). When set to values greater than 1,
> the setup scripts add mq=true to the virtio-net device to enable
> multiqueue support.
> 
> The test suite now runs an additional set of tests with 8 queue pairs to
> exercise the multiqueue paths across all protocols (TCP, UDP, ICMP) and
> services (DHCP, NDP). Note that the guest kernel will only enable as many
> queues as there are vCPUs.
> 
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  test/lib/setup | 21 +++++++++++++--------
>  test/run       | 23 +++++++++++++++++++++++
>  2 files changed, 36 insertions(+), 8 deletions(-)
> 
> diff --git a/test/lib/setup b/test/lib/setup
> index 5994598744a3..3872a02b109b 100755
> --- a/test/lib/setup
> +++ b/test/lib/setup
> @@ -18,6 +18,8 @@ VCPUS="$( [ $(nproc) -ge 8 ] && echo 6 || echo $(( $(nproc) / 2 + 1 )) )"
>  MEM_KIB="$(sed -n 's/MemTotal:[ ]*\([0-9]*\) kB/\1/p' /proc/meminfo)"
>  QEMU_ARCH="$(uname -m)"
>  [ "${QEMU_ARCH}" = "i686" ] && QEMU_ARCH=i386
> +VHOST_USER=0
> +VHOST_USER_MQ=1
>  
>  # setup_build() - Set up pane layout for build tests
>  setup_build() {
> @@ -46,6 +48,7 @@ setup_passt() {
>  	[ ${DEBUG} -eq 1 ] && __opts="${__opts} -d"
>  	[ ${TRACE} -eq 1 ] && __opts="${__opts} --trace"
>  	[ ${VHOST_USER} -eq 1 ] && __opts="${__opts} --vhost-user"
> +	[ ${VHOST_USER_MQ} -gt 1 ] && __virtio_opts="${__virtio_opts},mq=true"
>  
>  	context_run passt "make clean"
>  	context_run passt "make valgrind"
> @@ -59,8 +62,8 @@ setup_passt() {
>  		__vmem="$(((${__vmem} + 500) / 1000))G"
>  		__qemu_netdev="						       \
>  			-chardev socket,id=c,path=${STATESETUP}/passt.socket   \
> -			-netdev vhost-user,id=v,chardev=c		       \
> -			-device virtio-net,netdev=v			       \
> +			-netdev vhost-user,id=v,chardev=c,queues=${VHOST_USER_MQ} \
> +			-device virtio-net,netdev=v${__virtio_opts}            \
>  			-object memory-backend-memfd,id=m,share=on,size=${__vmem} \
>  			-numa node,memdev=m"
>  	else
> @@ -156,6 +159,7 @@ setup_passt_in_ns() {
>  	[ ${DEBUG} -eq 1 ] && __opts="${__opts} -d"
>  	[ ${TRACE} -eq 1 ] && __opts="${__opts} --trace"
>  	[ ${VHOST_USER} -eq 1 ] && __opts="${__opts} --vhost-user"
> +	[ ${VHOST_USER_MQ} -gt 1 ] && __virtio_opts="${__virtio_opts},mq=true"
>  
>  	if [ ${VALGRIND} -eq 1 ]; then
>  		context_run passt "make clean"
> @@ -173,8 +177,8 @@ setup_passt_in_ns() {
>  		__vmem="$(((${__vmem} + 500) / 1000))G"
>  		__qemu_netdev="						       \
>  			-chardev socket,id=c,path=${STATESETUP}/passt.socket   \
> -			-netdev vhost-user,id=v,chardev=c		       \
> -			-device virtio-net,netdev=v			       \
> +			-netdev vhost-user,id=v,chardev=c,queues=${VHOST_USER_MQ} \
> +			-device virtio-net,netdev=v${__virtio_opts}            \
>  			-object memory-backend-memfd,id=m,share=on,size=${__vmem} \
>  			-numa node,memdev=m"
>  	else
> @@ -251,6 +255,7 @@ setup_two_guests() {
>  	[ ${DEBUG} -eq 1 ] && __opts="${__opts} -d"
>  	[ ${TRACE} -eq 1 ] && __opts="${__opts} --trace"
>  	[ ${VHOST_USER} -eq 1 ] && __opts="${__opts} --vhost-user"
> +	[ ${VHOST_USER_MQ} -gt 1 ] && __virtio_opts="${__virtio_opts},mq=true"
>  
>  	context_run_bg passt_2 "./passt -s ${STATESETUP}/passt_2.socket -P ${STATESETUP}/passt_2.pid -f ${__opts} --hostname hostname2 --fqdn fqdn2 -t 10004 -u 10004"
>  	wait_for [ -f "${STATESETUP}/passt_2.pid" ]
> @@ -260,14 +265,14 @@ setup_two_guests() {
>  		__vmem="$(((${__vmem} + 500) / 1000))G"
>  		__qemu_netdev1="					       \
>  			-chardev socket,id=c,path=${STATESETUP}/passt_1.socket \
> -			-netdev vhost-user,id=v,chardev=c		       \
> -			-device virtio-net,netdev=v			       \
> +			-netdev vhost-user,id=v,chardev=c,queues=${VHOST_USER_MQ} \
> +			-device virtio-net,netdev=v${__virtio_opts}            \
>  			-object memory-backend-memfd,id=m,share=on,size=${__vmem} \
>  			-numa node,memdev=m"
>  		__qemu_netdev2="					       \
>  			-chardev socket,id=c,path=${STATESETUP}/passt_2.socket \
> -			-netdev vhost-user,id=v,chardev=c		       \
> -			-device virtio-net,netdev=v			       \
> +			-netdev vhost-user,id=v,chardev=c,queues=${VHOST_USER_MQ} \
> +			-device virtio-net,netdev=v${__virtio_opts}            \
>  			-object memory-backend-memfd,id=m,share=on,size=${__vmem} \
>  			-numa node,memdev=m"
>  	else
> diff --git a/test/run b/test/run
> index f858e5586847..652cc12b1234 100755
> --- a/test/run
> +++ b/test/run
> @@ -190,6 +190,29 @@ run() {
>  	test passt_vu_in_ns/shutdown
>  	teardown passt_in_ns
>  
> +	VHOST_USER=1
> +	VHOST_USER_MQ=8
> +	setup passt_in_ns
> +	test passt_vu/ndp
> +	test passt_vu_in_ns/dhcp
> +	test passt_vu_in_ns/icmp
> +	test passt_vu_in_ns/tcp
> +	test passt_vu_in_ns/udp
> +	test passt_vu_in_ns/shutdown
> +	teardown passt_in_ns
> +
> +	setup two_guests
> +	test two_guests_vu/basic
> +	teardown two_guests
> +
> +	setup passt_in_ns
> +	test passt_vu/ndp
> +	test passt_vu_in_ns/dhcp
> +	test perf/passt_vu_tcp
> +	test perf/passt_vu_udp
> +	test passt_vu_in_ns/shutdown
> +	teardown passt_in_ns
> +
>  	# TODO: Make those faster by at least pre-installing gcc and make on
>  	# non-x86 images, then re-enable.
>  skip_distro() {
> -- 
> 2.51.1
> 

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 2/6] vhost-user: Enable multiqueue
  2025-12-03 18:54 ` [PATCH v3 2/6] vhost-user: Enable multiqueue Laurent Vivier
  2025-12-10  0:04   ` David Gibson
@ 2025-12-11  7:01   ` Stefano Brivio
  2025-12-11  8:29     ` Laurent Vivier
  1 sibling, 1 reply; 18+ messages in thread
From: Stefano Brivio @ 2025-12-11  7:01 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: passt-dev

On Wed,  3 Dec 2025 19:54:30 +0100
Laurent Vivier <lvivier@redhat.com> wrote:

> Advertise multi-queue support in vhost-user by setting VIRTIO_NET_F_MQ
> and VHOST_USER_PROTOCOL_F_MQ feature flags, and increase
> VHOST_USER_MAX_VQS from 2 to 32, supporting up to 16 queue pairs.
> 
> Currently, only the first RX queue (queue 0) is used for receiving
> packets. The guest kernel selects which TX queue to use for
> transmission. Full multi-RX queue load balancing will be implemented in
> future work.
> 
> Update the QEMU usage hint to show the required parameters for enabling
> multiqueue: queues parameter on the netdev, and mq=true on the
> virtio-net device.
> 
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> ---
>  tap.c        |  7 +++++--
>  vhost_user.c | 10 ++++++----
>  virtio.h     |  2 +-
>  3 files changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/tap.c b/tap.c
> index 2cda8c9772b8..591b49491aa3 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -1314,8 +1314,11 @@ static void tap_backend_show_hints(struct ctx *c)
>  		break;
>  	case MODE_VU:
>  		info("You can start qemu with:");
> -		info("    kvm ... -chardev socket,id=chr0,path=%s -netdev vhost-user,id=netdev0,chardev=chr0 -device virtio-net,netdev=netdev0 -object memory-backend-memfd,id=memfd0,share=on,size=$RAMSIZE -numa node,memdev=memfd0\n",
> -		     c->sock_path);
> +		info("    kvm ... -chardev socket,id=chr0,path=%s "
> +		     "-netdev vhost-user,id=netdev0,chardev=chr0,queues=$QUEUES "
> +		     "-device virtio-net,netdev=netdev0,mq=true "
> +		     "-object memory-backend-memfd,id=memfd0,share=on,size=$RAMSIZE "
> +		     "-numa node,memdev=memfd0\n", c->sock_path);
>  		break;
>  	}
>  }
> diff --git a/vhost_user.c b/vhost_user.c
> index aa7c869d9e56..845fdb551c84 100644
> --- a/vhost_user.c
> +++ b/vhost_user.c
> @@ -323,6 +323,7 @@ static bool vu_get_features_exec(struct vu_dev *vdev,
>  	uint64_t features =
>  		1ULL << VIRTIO_F_VERSION_1 |
>  		1ULL << VIRTIO_NET_F_MRG_RXBUF |
> +		1ULL << VIRTIO_NET_F_MQ |
>  		1ULL << VHOST_F_LOG_ALL |
>  		1ULL << VHOST_USER_F_PROTOCOL_FEATURES;
>  
> @@ -767,7 +768,8 @@ static void vu_check_queue_msg_file(struct vhost_user_msg *vmsg)
>  	int idx = vmsg->payload.u64 & VHOST_USER_VRING_IDX_MASK;
>  
>  	if (idx >= VHOST_USER_MAX_VQS)
> -		die("Invalid vhost-user queue index: %u", idx);
> +		die("Invalid vhost-user queue index: %u (maximum %u)", idx,
> +		    VHOST_USER_MAX_VQS);
>  
>  	if (nofd) {
>  		vmsg_close_fds(vmsg);
> @@ -896,7 +898,8 @@ static bool vu_get_protocol_features_exec(struct vu_dev *vdev,
>  	uint64_t features = 1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK |
>  			    1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD |
>  			    1ULL << VHOST_USER_PROTOCOL_F_DEVICE_STATE |
> -			    1ULL << VHOST_USER_PROTOCOL_F_RARP;
> +			    1ULL << VHOST_USER_PROTOCOL_F_RARP |
> +			    1ULL << VHOST_USER_PROTOCOL_F_MQ;
>  
>  	(void)vdev;
>  	vmsg_set_reply_u64(vmsg, features);
> @@ -935,10 +938,9 @@ static bool vu_get_queue_num_exec(struct vu_dev *vdev,
>  {
>  	(void)vdev;
>  
> -	/* NOLINTNEXTLINE(misc-redundant-expression) */
>  	vmsg_set_reply_u64(vmsg, VHOST_USER_MAX_VQS / 2);
>  
> -	debug("VHOST_USER_MAX_VQS  %u", VHOST_USER_MAX_VQS / 2);
> +	debug("queue num  %u", VHOST_USER_MAX_VQS / 2);

Nit, if you respin: this "queue num  %u" message doesn't carry any
context at all. Actually, why is it needed? It's defined at build time
anyway.

If it's needed maybe "Using up to %u vhost-user queue pairs"?

>  
>  	return true;
>  }
> diff --git a/virtio.h b/virtio.h
> index 12caaa0b6def..176c935cecc7 100644
> --- a/virtio.h
> +++ b/virtio.h
> @@ -88,7 +88,7 @@ struct vu_dev_region {
>  	uint64_t mmap_addr;
>  };
>  
> -#define VHOST_USER_MAX_VQS 2
> +#define VHOST_USER_MAX_VQS 32
>  
>  /*
>   * Set a reasonable maximum number of ram slots, which will be supported by

-- 
Stefano


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 4/6] vhost-user: Add queue pair parameter throughout the network stack
  2025-12-03 18:54 ` [PATCH v3 4/6] vhost-user: Add queue pair parameter throughout the network stack Laurent Vivier
@ 2025-12-11  7:01   ` Stefano Brivio
  2025-12-11  8:48     ` Laurent Vivier
  0 siblings, 1 reply; 18+ messages in thread
From: Stefano Brivio @ 2025-12-11  7:01 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: passt-dev

On Wed,  3 Dec 2025 19:54:32 +0100
Laurent Vivier <lvivier@redhat.com> wrote:

> diff --git a/vu_common.c b/vu_common.c
> index b13b7c308fd8..80d9a30f6f71 100644
> --- a/vu_common.c
> +++ b/vu_common.c
> @@ -196,11 +196,11 @@ static void vu_handle_tx(struct vu_dev *vdev, int index,
>  
>  		data = IOV_TAIL(elem[count].out_sg, elem[count].out_num, 0);
>  		if (IOV_DROP_HEADER(&data, struct virtio_net_hdr_mrg_rxbuf))
> -			tap_add_packet(vdev->context, &data, now);
> +			tap_add_packet(vdev->context, 0, &data, now);
>  
>  		count++;
>  	}
> -	tap_handler(vdev->context, now);
> +	tap_handler(vdev->context, 0, now);
>  
>  	if (count) {
>  		int i;
> @@ -235,23 +235,26 @@ void vu_kick_cb(struct vu_dev *vdev, union epoll_ref ref,
>  }
>  
>  /**
> - * vu_send_single() - Send a buffer to the front-end using the RX virtqueue
> + * vu_send_single() - Send a buffer to the front-end using a specified virtqueue
>   * @c:		execution context
> + * @qpair:	Queue pair on which to send the buffer
>   * @buf:	address of the buffer
>   * @size:	size of the buffer
>   *
>   * Return: number of bytes sent, -1 if there is an error
>   */
> -int vu_send_single(const struct ctx *c, const void *buf, size_t size)
> +int vu_send_single(const struct ctx *c, unsigned int qpair, const void *buf, size_t size)
>  {
>  	struct vu_dev *vdev = c->vdev;
> -	struct vu_virtq *vq = &vdev->vq[VHOST_USER_RX_QUEUE];
>  	struct vu_virtq_element elem[VIRTQUEUE_MAX_SIZE];
>  	struct iovec in_sg[VIRTQUEUE_MAX_SIZE];
> +	struct vu_virtq *vq;
>  	size_t total;
>  	int elem_cnt;
>  	int i;
>  
> +	vq = &vdev->vq[qpair << 1];

<< 1 instead of * 2 is a bit surprising here, for a few seconds I
thought you swapped qpair and 1.

Then I started thinking that somebody is likely to mix up (probably not
you) indices of RX and TX queues at some point. So... what about some
macros, say (let's see if I got it right this time):

#define VHOST_SEND_QUEUE(pair)	((pair) * 2)
#define VHOST_RECV_QUEUE(pair)	(pair)

and:

#define VHOST_QUEUE_PAIR(q)	((q) % 2) ? (q) : (q) / 2)

...are they correct? A short description or "Theory of operation"
section somewhere with a recap of how queue indices are used would be
nice to have.

And maybe also something explaining that 0 that's now appearing in
argument lists:

#define VHOST_NO_QUEUE		0

?

> +
>  	trace("vu_send_single size %zu", size);
>  
>  	if (!vu_queue_enabled(vq) || !vu_queue_started(vq)) {
> diff --git a/vu_common.h b/vu_common.h
> index f538f237790b..9ceb8034a9a5 100644
> --- a/vu_common.h
> +++ b/vu_common.h
> @@ -56,6 +56,7 @@ void vu_flush(const struct vu_dev *vdev, struct vu_virtq *vq,
>  	      struct vu_virtq_element *elem, int elem_cnt);
>  void vu_kick_cb(struct vu_dev *vdev, union epoll_ref ref,
>  		const struct timespec *now);
> -int vu_send_single(const struct ctx *c, const void *buf, size_t size);
> +int vu_send_single(const struct ctx *c, unsigned int qpair, const void *buf,
> +		   size_t size);
>  
>  #endif /* VU_COMMON_H */

I'm still reviewing the rest, currently at 5/6.

-- 
Stefano



* Re: [PATCH v3 2/6] vhost-user: Enable multiqueue
  2025-12-11  7:01   ` Stefano Brivio
@ 2025-12-11  8:29     ` Laurent Vivier
  0 siblings, 0 replies; 18+ messages in thread
From: Laurent Vivier @ 2025-12-11  8:29 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

On 12/11/25 08:01, Stefano Brivio wrote:
> On Wed,  3 Dec 2025 19:54:30 +0100
> Laurent Vivier <lvivier@redhat.com> wrote:
> 
>> Advertise multi-queue support in vhost-user by setting VIRTIO_NET_F_MQ
>> and VHOST_USER_PROTOCOL_F_MQ feature flags, and increase
>> VHOST_USER_MAX_VQS from 2 to 32, supporting up to 16 queue pairs.
>>
>> Currently, only the first RX queue (queue 0) is used for receiving
>> packets. The guest kernel selects which TX queue to use for
>> transmission. Full multi-RX queue load balancing will be implemented in
>> future work.
>>
>> Update the QEMU usage hint to show the required parameters for enabling
>> multiqueue: queues parameter on the netdev, and mq=true on the
>> virtio-net device.
>>
>> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
>> ---
>>   tap.c        |  7 +++++--
>>   vhost_user.c | 10 ++++++----
>>   virtio.h     |  2 +-
>>   3 files changed, 12 insertions(+), 7 deletions(-)
>>
>> diff --git a/tap.c b/tap.c
>> index 2cda8c9772b8..591b49491aa3 100644
>> --- a/tap.c
>> +++ b/tap.c
>> @@ -1314,8 +1314,11 @@ static void tap_backend_show_hints(struct ctx *c)
>>   		break;
>>   	case MODE_VU:
>>   		info("You can start qemu with:");
>> -		info("    kvm ... -chardev socket,id=chr0,path=%s -netdev vhost-user,id=netdev0,chardev=chr0 -device virtio-net,netdev=netdev0 -object memory-backend-memfd,id=memfd0,share=on,size=$RAMSIZE -numa node,memdev=memfd0\n",
>> -		     c->sock_path);
>> +		info("    kvm ... -chardev socket,id=chr0,path=%s "
>> +		     "-netdev vhost-user,id=netdev0,chardev=chr0,queues=$QUEUES "
>> +		     "-device virtio-net,netdev=netdev0,mq=true "
>> +		     "-object memory-backend-memfd,id=memfd0,share=on,size=$RAMSIZE "
>> +		     "-numa node,memdev=memfd0\n", c->sock_path);
>>   		break;
>>   	}
>>   }
>> diff --git a/vhost_user.c b/vhost_user.c
>> index aa7c869d9e56..845fdb551c84 100644
>> --- a/vhost_user.c
>> +++ b/vhost_user.c
>> @@ -323,6 +323,7 @@ static bool vu_get_features_exec(struct vu_dev *vdev,
>>   	uint64_t features =
>>   		1ULL << VIRTIO_F_VERSION_1 |
>>   		1ULL << VIRTIO_NET_F_MRG_RXBUF |
>> +		1ULL << VIRTIO_NET_F_MQ |
>>   		1ULL << VHOST_F_LOG_ALL |
>>   		1ULL << VHOST_USER_F_PROTOCOL_FEATURES;
>>   
>> @@ -767,7 +768,8 @@ static void vu_check_queue_msg_file(struct vhost_user_msg *vmsg)
>>   	int idx = vmsg->payload.u64 & VHOST_USER_VRING_IDX_MASK;
>>   
>>   	if (idx >= VHOST_USER_MAX_VQS)
>> -		die("Invalid vhost-user queue index: %u", idx);
>> +		die("Invalid vhost-user queue index: %u (maximum %u)", idx,
>> +		    VHOST_USER_MAX_VQS);
>>   
>>   	if (nofd) {
>>   		vmsg_close_fds(vmsg);
>> @@ -896,7 +898,8 @@ static bool vu_get_protocol_features_exec(struct vu_dev *vdev,
>>   	uint64_t features = 1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK |
>>   			    1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD |
>>   			    1ULL << VHOST_USER_PROTOCOL_F_DEVICE_STATE |
>> -			    1ULL << VHOST_USER_PROTOCOL_F_RARP;
>> +			    1ULL << VHOST_USER_PROTOCOL_F_RARP |
>> +			    1ULL << VHOST_USER_PROTOCOL_F_MQ;
>>   
>>   	(void)vdev;
>>   	vmsg_set_reply_u64(vmsg, features);
>> @@ -935,10 +938,9 @@ static bool vu_get_queue_num_exec(struct vu_dev *vdev,
>>   {
>>   	(void)vdev;
>>   
>> -	/* NOLINTNEXTLINE(misc-redundant-expression) */
>>   	vmsg_set_reply_u64(vmsg, VHOST_USER_MAX_VQS / 2);
>>   
>> -	debug("VHOST_USER_MAX_VQS  %u", VHOST_USER_MAX_VQS / 2);
>> +	debug("queue num  %u", VHOST_USER_MAX_VQS / 2);
> 
> Nit, if you respin: this "queue num  %u" message doesn't carry any
> context at all. Actually, why is it needed? It's defined at build time
> anyway.
> 
> If it's needed maybe "Using up to %u vhost-user queue pairs"?

I agree, and I was planning to update it but missed the change before posting.

Thanks,
Laurent




* Re: [PATCH v3 4/6] vhost-user: Add queue pair parameter throughout the network stack
  2025-12-11  7:01   ` Stefano Brivio
@ 2025-12-11  8:48     ` Laurent Vivier
  2025-12-11 12:16       ` Stefano Brivio
  0 siblings, 1 reply; 18+ messages in thread
From: Laurent Vivier @ 2025-12-11  8:48 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

On 12/11/25 08:01, Stefano Brivio wrote:
> On Wed,  3 Dec 2025 19:54:32 +0100
> Laurent Vivier <lvivier@redhat.com> wrote:
> 
>> diff --git a/vu_common.c b/vu_common.c
>> index b13b7c308fd8..80d9a30f6f71 100644
>> --- a/vu_common.c
>> +++ b/vu_common.c
>> @@ -196,11 +196,11 @@ static void vu_handle_tx(struct vu_dev *vdev, int index,
>>   
>>   		data = IOV_TAIL(elem[count].out_sg, elem[count].out_num, 0);
>>   		if (IOV_DROP_HEADER(&data, struct virtio_net_hdr_mrg_rxbuf))
>> -			tap_add_packet(vdev->context, &data, now);
>> +			tap_add_packet(vdev->context, 0, &data, now);
>>   
>>   		count++;
>>   	}
>> -	tap_handler(vdev->context, now);
>> +	tap_handler(vdev->context, 0, now);
>>   
>>   	if (count) {
>>   		int i;
>> @@ -235,23 +235,26 @@ void vu_kick_cb(struct vu_dev *vdev, union epoll_ref ref,
>>   }
>>   
>>   /**
>> - * vu_send_single() - Send a buffer to the front-end using the RX virtqueue
>> + * vu_send_single() - Send a buffer to the front-end using a specified virtqueue
>>    * @c:		execution context
>> + * @qpair:	Queue pair on which to send the buffer
>>    * @buf:	address of the buffer
>>    * @size:	size of the buffer
>>    *
>>    * Return: number of bytes sent, -1 if there is an error
>>    */
>> -int vu_send_single(const struct ctx *c, const void *buf, size_t size)
>> +int vu_send_single(const struct ctx *c, unsigned int qpair, const void *buf, size_t size)
>>   {
>>   	struct vu_dev *vdev = c->vdev;
>> -	struct vu_virtq *vq = &vdev->vq[VHOST_USER_RX_QUEUE];
>>   	struct vu_virtq_element elem[VIRTQUEUE_MAX_SIZE];
>>   	struct iovec in_sg[VIRTQUEUE_MAX_SIZE];
>> +	struct vu_virtq *vq;
>>   	size_t total;
>>   	int elem_cnt;
>>   	int i;
>>   
>> +	vq = &vdev->vq[qpair << 1];
> 
> << 1 instead of * 2 is a bit surprising here, for a few seconds I
> thought you swapped qpair and 1.
> 
> Then I started thinking that somebody is likely to mix up (probably not
> you) indices of RX and TX queues at some point. So... what about some
> macros, say (let's see if I got it right this time):
> 
> #define VHOST_SEND_QUEUE(pair)	((pair) * 2)
> #define VHOST_RECV_QUEUE(pair)	(pair)

I will. David had the same comment. TX and RX are from the guest's point of
view, which is not obvious when we read passt code.

I would prefer to use what David proposed, i.e. FROMGUEST and TOGUEST:

#define VHOST_FROM_GUEST(qpair) ((qpair) * 2 + 1)
#define VHOST_TO_GUEST(qpair)   ((qpair) * 2)
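
With these, the vq assignment above would read, e.g.:

	vq = &vdev->vq[VHOST_TO_GUEST(qpair)];
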
> and:
> 
> #define VHOST_QUEUE_PAIR(q)	((q) % 2) ? (q) : (q) / 2)

I don't understand the purpose of this one.

> 
> ...are they correct? A short description or "Theory of operation"
> section somewhere with a recap of how queue indices are used would be
> nice to have.
> 
> And maybe also something explaining that 0 that's now appearing in
> argument lists:
> 
> #define VHOST_NO_QUEUE		0

It's not really NO_QUEUE, it's the default queue pair, queue pair 0
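
Maybe something like:

#define VHOST_QPAIR_DEFAULT	0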

Thanks,
Laurent



* Re: [PATCH v3 4/6] vhost-user: Add queue pair parameter throughout the network stack
  2025-12-11  8:48     ` Laurent Vivier
@ 2025-12-11 12:16       ` Stefano Brivio
  2025-12-11 13:26         ` Laurent Vivier
  0 siblings, 1 reply; 18+ messages in thread
From: Stefano Brivio @ 2025-12-11 12:16 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: passt-dev

On Thu, 11 Dec 2025 09:48:42 +0100
Laurent Vivier <lvivier@redhat.com> wrote:

> On 12/11/25 08:01, Stefano Brivio wrote:
> > On Wed,  3 Dec 2025 19:54:32 +0100
> > Laurent Vivier <lvivier@redhat.com> wrote:
> >   
> >> diff --git a/vu_common.c b/vu_common.c
> >> index b13b7c308fd8..80d9a30f6f71 100644
> >> --- a/vu_common.c
> >> +++ b/vu_common.c
> >> @@ -196,11 +196,11 @@ static void vu_handle_tx(struct vu_dev *vdev, int index,
> >>   
> >>   		data = IOV_TAIL(elem[count].out_sg, elem[count].out_num, 0);
> >>   		if (IOV_DROP_HEADER(&data, struct virtio_net_hdr_mrg_rxbuf))
> >> -			tap_add_packet(vdev->context, &data, now);
> >> +			tap_add_packet(vdev->context, 0, &data, now);
> >>   
> >>   		count++;
> >>   	}
> >> -	tap_handler(vdev->context, now);
> >> +	tap_handler(vdev->context, 0, now);
> >>   
> >>   	if (count) {
> >>   		int i;
> >> @@ -235,23 +235,26 @@ void vu_kick_cb(struct vu_dev *vdev, union epoll_ref ref,
> >>   }
> >>   
> >>   /**
> >> - * vu_send_single() - Send a buffer to the front-end using the RX virtqueue
> >> + * vu_send_single() - Send a buffer to the front-end using a specified virtqueue
> >>    * @c:		execution context
> >> + * @qpair:	Queue pair on which to send the buffer
> >>    * @buf:	address of the buffer
> >>    * @size:	size of the buffer
> >>    *
> >>    * Return: number of bytes sent, -1 if there is an error
> >>    */
> >> -int vu_send_single(const struct ctx *c, const void *buf, size_t size)
> >> +int vu_send_single(const struct ctx *c, unsigned int qpair, const void *buf, size_t size)
> >>   {
> >>   	struct vu_dev *vdev = c->vdev;
> >> -	struct vu_virtq *vq = &vdev->vq[VHOST_USER_RX_QUEUE];
> >>   	struct vu_virtq_element elem[VIRTQUEUE_MAX_SIZE];
> >>   	struct iovec in_sg[VIRTQUEUE_MAX_SIZE];
> >> +	struct vu_virtq *vq;
> >>   	size_t total;
> >>   	int elem_cnt;
> >>   	int i;
> >>   
> >> +	vq = &vdev->vq[qpair << 1];  
> > 
> > << 1 instead of * 2 is a bit surprising here, for a few seconds I
> > thought you swapped qpair and 1.
> > 
> > Then I started thinking that somebody is likely to mix up (probably not
> > you) indices of RX and TX queues at some point. So... what about some
> > macros, say (let's see if I got it right this time):
> > 
> > #define VHOST_SEND_QUEUE(pair)	((pair) * 2)
> > #define VHOST_RECV_QUEUE(pair)	(pair)  
> 
> I will. David had the same comment.

Uh, wait, I must have missed it. Do you have a Message-ID? I'm afraid I
must have missed some emails here but I don't see them in archives
either...

> TX and RX are from the guest's point of view, which is not obvious when we
> read passt code.

Right, yes, for me neither, I always get confused. That's why I thought
we could make the RX vhost-user queue become "SEND" in passt's code,
but:

> I would prefer to use what David proposed, i.e. FROMGUEST and TOGUEST:
> 
> #define VHOST_FROM_GUEST(qpair) ((qpair) * 2 + 1)
> #define VHOST_TO_GUEST(qpair)   ((qpair) * 2)

...this is even clearer. It misses the QUEUE though. Does
VHOST_QUEUE_{FROM,TO}_GUEST fit where you use it? Otherwise I guess VQ
together with FROM / TO should be clear enough.

> > and:
> > 
> > #define VHOST_QUEUE_PAIR(q)	((q) % 2) ? (q) : (q) / 2)  
> 
> I don't understand the purpose of this one.

To get the pair number from a queue number. You're doing something like
that (I guess?) in 5/6, vu_handle_tx():

+	tap_flush_pools(index / 2);

+			tap_add_packet(vdev->context, index / 2, &data, now);

+	tap_handler(vdev->context, index / 2, now);

but now that I see your definition for VHOST_FROM_GUEST() above, and
that the purpose wasn't clear to you, I guess it should be:

#define VHOST_PAIR_FROM_QUEUE(q) (((q) % 2) ? ((q) - 1 / 2) : ((q) / 2))

...or maybe it's not needed? I'm not sure.

> > 
> > ...are they correct? A short description or "Theory of operation"
> > section somewhere with a recap of how queue indices are used would be
> > nice to have.
> > 
> > And maybe also something explaining that 0 that's now appearing in
> > argument lists:
> > 
> > #define VHOST_NO_QUEUE		0  
> 
> It's not really NO_QUEUE, it's the default queue pair, queue pair 0

Hmm but for non-vhost-user usages then it's not a queue, right? Well,
whatever, as long as we have a definition for it... or maybe we could
have VHOST_QUEUE_DEFAULT and NO_VHOST_QUEUE or VHOST_NO_QUEUE all being
0?

-- 
Stefano



* Re: [PATCH v3 4/6] vhost-user: Add queue pair parameter throughout the network stack
  2025-12-11 12:16       ` Stefano Brivio
@ 2025-12-11 13:26         ` Laurent Vivier
  2025-12-11 15:27           ` Stefano Brivio
  0 siblings, 1 reply; 18+ messages in thread
From: Laurent Vivier @ 2025-12-11 13:26 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

On 12/11/25 13:16, Stefano Brivio wrote:
> On Thu, 11 Dec 2025 09:48:42 +0100
> Laurent Vivier <lvivier@redhat.com> wrote:
> 
>> On 12/11/25 08:01, Stefano Brivio wrote:
>>> On Wed,  3 Dec 2025 19:54:32 +0100
>>> Laurent Vivier <lvivier@redhat.com> wrote:
>>>    
>>>> diff --git a/vu_common.c b/vu_common.c
>>>> index b13b7c308fd8..80d9a30f6f71 100644
>>>> --- a/vu_common.c
>>>> +++ b/vu_common.c
>>>> @@ -196,11 +196,11 @@ static void vu_handle_tx(struct vu_dev *vdev, int index,
>>>>    
>>>>    		data = IOV_TAIL(elem[count].out_sg, elem[count].out_num, 0);
>>>>    		if (IOV_DROP_HEADER(&data, struct virtio_net_hdr_mrg_rxbuf))
>>>> -			tap_add_packet(vdev->context, &data, now);
>>>> +			tap_add_packet(vdev->context, 0, &data, now);
>>>>    
>>>>    		count++;
>>>>    	}
>>>> -	tap_handler(vdev->context, now);
>>>> +	tap_handler(vdev->context, 0, now);
>>>>    
>>>>    	if (count) {
>>>>    		int i;
>>>> @@ -235,23 +235,26 @@ void vu_kick_cb(struct vu_dev *vdev, union epoll_ref ref,
>>>>    }
>>>>    
>>>>    /**
>>>> - * vu_send_single() - Send a buffer to the front-end using the RX virtqueue
>>>> + * vu_send_single() - Send a buffer to the front-end using a specified virtqueue
>>>>     * @c:		execution context
>>>> + * @qpair:	Queue pair on which to send the buffer
>>>>     * @buf:	address of the buffer
>>>>     * @size:	size of the buffer
>>>>     *
>>>>     * Return: number of bytes sent, -1 if there is an error
>>>>     */
>>>> -int vu_send_single(const struct ctx *c, const void *buf, size_t size)
>>>> +int vu_send_single(const struct ctx *c, unsigned int qpair, const void *buf, size_t size)
>>>>    {
>>>>    	struct vu_dev *vdev = c->vdev;
>>>> -	struct vu_virtq *vq = &vdev->vq[VHOST_USER_RX_QUEUE];
>>>>    	struct vu_virtq_element elem[VIRTQUEUE_MAX_SIZE];
>>>>    	struct iovec in_sg[VIRTQUEUE_MAX_SIZE];
>>>> +	struct vu_virtq *vq;
>>>>    	size_t total;
>>>>    	int elem_cnt;
>>>>    	int i;
>>>>    
>>>> +	vq = &vdev->vq[qpair << 1];
>>>
>>> << 1 instead of * 2 is a bit surprising here, for a few seconds I
>>> thought you swapped qpair and 1.
>>>
>>> Then I started thinking that somebody is likely to mix up (probably not
>>> you) indices of RX and TX queues at some point. So... what about some
>>> macros, say (let's see if I got it right this time):
>>>
>>> #define VHOST_SEND_QUEUE(pair)	((pair) * 2)
>>> #define VHOST_RECV_QUEUE(pair)	(pair)
>>
>> I will. David had the same comment.
> 
> Uh, wait, I must have missed it. Do you have a Message-ID? I'm afraid I
> must have missed some emails here but I don't see them in archives
> either...

Message-ID: aRF1_Qj6uxf1ndiA@zatzit

>> TX and RX are from the guest's point of view, which is not obvious when we
>> read passt code.
> 
> Right, yes, for me neither, I always get confused. That's why I thought
> we could make the RX vhost-user queue become "SEND" in passt's code,
> but:
> 
>> I would prefer to use what David proposed, i.e. FROMGUEST and TOGUEST:
>>
>> #define VHOST_FROM_GUEST(qpair) ((qpair) * 2 + 1)
>> #define VHOST_TO_GUEST(qpair)   ((qpair) * 2)
> 
> ...this is even clearer. It misses the QUEUE though. Does
> VHOST_QUEUE_{FROM,TO}_GUEST fit where you use it? Otherwise I guess VQ
> together with FROM / TO should be clear enough.
> 
>>> and:
>>>
>>> #define VHOST_QUEUE_PAIR(q)	((q) % 2) ? (q) : (q) / 2)
>>
>> I don't understand the purpose of this one.
> 
> To get the pair number from a queue number. You're doing something like
> that (I guess?) in 5/6, vu_handle_tx():
> 
> +	tap_flush_pools(index / 2);
> 
> +			tap_add_packet(vdev->context, index / 2, &data, now);
> 
> +	tap_handler(vdev->context, index / 2, now);
> 
> but now that I see your definition for VHOST_FROM_GUEST() above, and
> that the purpose wasn't clear to you, I guess it should be:
> 
> #define VHOST_PAIR_FROM_QUEUE(q) (((q) % 2) ? ((q) - 1 / 2) : ((q) / 2))
> 

Why not simply:

#define VHOST_PAIR_FROM_QUEUE(q) (q / 2)

QUEUES 0,1 -> QP 0
QUEUES 2,3 -> QP 1

> ...or maybe it's not needed? I'm not sure.
> 
>>>
>>> ...are they correct? A short description or "Theory of operation"
>>> section somewhere with a recap of how queue indices are used would be
>>> nice to have.
>>>
>>> And maybe also something explaining that 0 that's now appearing in
>>> argument lists:
>>>
>>> #define VHOST_NO_QUEUE		0
>>
>> It's not really NO_QUEUE, it's the default queue pair, queue pair 0
> 
> Hmm but for non-vhost-user usages then it's not a queue, right? Well,
For non-vhost usage we can say there is only one queue.

> whatever, as long as we have a definition for it... or maybe we could
> have VHOST_QUEUE_DEFAULT and NO_VHOST_QUEUE or VHOST_NO_QUEUE all being
> 0?
> 

Perhaps we could instead use a generic naming:

QPAIR_DEFAULT
QUEUE_FROM_GUEST(qpair)
QUEUE_TO_GUEST(qpair)
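
i.e., keeping the same queue layout as above:

#define QPAIR_DEFAULT		0
#define QUEUE_FROM_GUEST(qpair)	((qpair) * 2 + 1)
#define QUEUE_TO_GUEST(qpair)	((qpair) * 2)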

Thanks,
Laurent



* Re: [PATCH v3 4/6] vhost-user: Add queue pair parameter throughout the network stack
  2025-12-11 13:26         ` Laurent Vivier
@ 2025-12-11 15:27           ` Stefano Brivio
  2025-12-11 16:06             ` Laurent Vivier
  0 siblings, 1 reply; 18+ messages in thread
From: Stefano Brivio @ 2025-12-11 15:27 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: passt-dev

On Thu, 11 Dec 2025 14:26:01 +0100
Laurent Vivier <lvivier@redhat.com> wrote:

> On 12/11/25 13:16, Stefano Brivio wrote:
> > On Thu, 11 Dec 2025 09:48:42 +0100
> > Laurent Vivier <lvivier@redhat.com> wrote:
> >   
> >> On 12/11/25 08:01, Stefano Brivio wrote:  
> >>> On Wed,  3 Dec 2025 19:54:32 +0100
> >>> Laurent Vivier <lvivier@redhat.com> wrote:
> >>>      
> >>>> diff --git a/vu_common.c b/vu_common.c
> >>>> index b13b7c308fd8..80d9a30f6f71 100644
> >>>> --- a/vu_common.c
> >>>> +++ b/vu_common.c
> >>>> @@ -196,11 +196,11 @@ static void vu_handle_tx(struct vu_dev *vdev, int index,
> >>>>    
> >>>>    		data = IOV_TAIL(elem[count].out_sg, elem[count].out_num, 0);
> >>>>    		if (IOV_DROP_HEADER(&data, struct virtio_net_hdr_mrg_rxbuf))
> >>>> -			tap_add_packet(vdev->context, &data, now);
> >>>> +			tap_add_packet(vdev->context, 0, &data, now);
> >>>>    
> >>>>    		count++;
> >>>>    	}
> >>>> -	tap_handler(vdev->context, now);
> >>>> +	tap_handler(vdev->context, 0, now);
> >>>>    
> >>>>    	if (count) {
> >>>>    		int i;
> >>>> @@ -235,23 +235,26 @@ void vu_kick_cb(struct vu_dev *vdev, union epoll_ref ref,
> >>>>    }
> >>>>    
> >>>>    /**
> >>>> - * vu_send_single() - Send a buffer to the front-end using the RX virtqueue
> >>>> + * vu_send_single() - Send a buffer to the front-end using a specified virtqueue
> >>>>     * @c:		execution context
> >>>> + * @qpair:	Queue pair on which to send the buffer
> >>>>     * @buf:	address of the buffer
> >>>>     * @size:	size of the buffer
> >>>>     *
> >>>>     * Return: number of bytes sent, -1 if there is an error
> >>>>     */
> >>>> -int vu_send_single(const struct ctx *c, const void *buf, size_t size)
> >>>> +int vu_send_single(const struct ctx *c, unsigned int qpair, const void *buf, size_t size)
> >>>>    {
> >>>>    	struct vu_dev *vdev = c->vdev;
> >>>> -	struct vu_virtq *vq = &vdev->vq[VHOST_USER_RX_QUEUE];
> >>>>    	struct vu_virtq_element elem[VIRTQUEUE_MAX_SIZE];
> >>>>    	struct iovec in_sg[VIRTQUEUE_MAX_SIZE];
> >>>> +	struct vu_virtq *vq;
> >>>>    	size_t total;
> >>>>    	int elem_cnt;
> >>>>    	int i;
> >>>>    
> >>>> +	vq = &vdev->vq[qpair << 1];  
> >>>
> >>> << 1 instead of * 2 is a bit surprising here, for a few seconds I
> >>> thought you swapped qpair and 1.
> >>>
> >>> Then I started thinking that somebody is likely to mix up (probably not
> >>> you) indices of RX and TX queues at some point. So... what about some
> >>> macros, say (let's see if I got it right this time):
> >>>
> >>> #define VHOST_SEND_QUEUE(pair)	((pair) * 2)
> >>> #define VHOST_RECV_QUEUE(pair)	(pair)  
> >>
> >> I will. David had the same comment.  
> > 
> > Uh, wait, I must have missed it. Do you have a Message-ID? I'm afraid I
> > must have missed some emails here but I don't see them in archives
> > either...  
> 
> Message-ID: aRF1_Qj6uxf1ndiA@zatzit

Ah, yes, I read that, but I didn't relate it to this topic as it was
just about the direction / naming. I see now.

> >> TX and RX are from the guest's point of view, which is not obvious when we
> >> read passt code.
> > 
> > Right, yes, for me neither, I always get confused. That's why I thought
> > we could make the RX vhost-user queue become "SEND" in passt's code,
> > but:
> >   
> >> I would prefer to use what David proposed, i.e. FROMGUEST and TOGUEST:
> >>
> >> #define VHOST_FROM_GUEST(qpair) ((qpair) * 2 + 1)
> >> #define VHOST_TO_GUEST(qpair)   ((qpair) * 2)  
> > 
> > ...this is even clearer. It misses the QUEUE though. Does
> > VHOST_QUEUE_{FROM,TO}_GUEST fit where you use it? Otherwise I guess VQ
> > together with FROM / TO should be clear enough.
> >   
> >>> and:
> >>>
> >>> #define VHOST_QUEUE_PAIR(q)	((q) % 2) ? (q) : (q) / 2)  
> >>
> >> I don't understand the purpose of this one.
> > 
> > To get the pair number from a queue number. You're doing something like
> > that (I guess?) in 5/6, vu_handle_tx():
> > 
> > +	tap_flush_pools(index / 2);
> > 
> > +			tap_add_packet(vdev->context, index / 2, &data, now);
> > 
> > +	tap_handler(vdev->context, index / 2, now);
> > 
> > but now that I see your definition for VHOST_FROM_GUEST() above, and
> > that the purpose wasn't clear to you, I guess it should be:
> > 
> > #define VHOST_PAIR_FROM_QUEUE(q) (((q) % 2) ? ((q) - 1 / 2) : ((q) / 2))
> >   
> 
> Why not simply:
> 
> #define VHOST_PAIR_FROM_QUEUE(q) (q / 2)
> 
> QUEUES 0,1 -> QP 0
> QUEUES 2,3 -> QP 1

Ah, right, of course. Don't forget the parentheses around 'q'.
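
That is:

#define VHOST_PAIR_FROM_QUEUE(q)	((q) / 2)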

> > ...or maybe it's not needed? I'm not sure.
> >   
> >>>
> >>> ...are they correct? A short description or "Theory of operation"
> >>> section somewhere with a recap of how queue indices are used would be
> >>> nice to have.
> >>>
> >>> And maybe also something explaining that 0 that's now appearing in
> >>> argument lists:
> >>>
> >>> #define VHOST_NO_QUEUE		0  
> >>
> >> It's not really NO_QUEUE, it's the default queue pair, queue pair 0
> > 
> > Hmm but for non-vhost-user usages then it's not a queue, right? Well,  
> For non-vhost usage we can say there is only one queue.
> 
> > whatever, as long as we have a definition for it... or maybe we could
> > have VHOST_QUEUE_DEFAULT and NO_VHOST_QUEUE or VHOST_NO_QUEUE all being
> > 0?
> >   
> 
> Perhaps we could instead use a generic naming:
> 
> QPAIR_DEFAULT
> QUEUE_FROM_GUEST(qpair)
> QUEUE_TO_GUEST(qpair)

...but for non-vhost-user we would have QUEUE_FROM_GUEST(QPAIR_DEFAULT)
evaluating to 1, which isn't correct I guess?

In general it looks reasonable to me; I would just like to make sure we
avoid passing around a '0' in the non-vhost-user case, which would look
rather obscure.

-- 
Stefano



* Re: [PATCH v3 4/6] vhost-user: Add queue pair parameter throughout the network stack
  2025-12-11 15:27           ` Stefano Brivio
@ 2025-12-11 16:06             ` Laurent Vivier
  0 siblings, 0 replies; 18+ messages in thread
From: Laurent Vivier @ 2025-12-11 16:06 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

On 12/11/25 16:27, Stefano Brivio wrote:
> On Thu, 11 Dec 2025 14:26:01 +0100
> Laurent Vivier <lvivier@redhat.com> wrote:
> 
>> On 12/11/25 13:16, Stefano Brivio wrote:
>>> On Thu, 11 Dec 2025 09:48:42 +0100
>>> Laurent Vivier <lvivier@redhat.com> wrote:
>>>    
>>>> On 12/11/25 08:01, Stefano Brivio wrote:
>>>>> On Wed,  3 Dec 2025 19:54:32 +0100
>>>>> Laurent Vivier <lvivier@redhat.com> wrote:
>>>>>       
>>>>>> diff --git a/vu_common.c b/vu_common.c
>>>>>> index b13b7c308fd8..80d9a30f6f71 100644
>>>>>> --- a/vu_common.c
>>>>>> +++ b/vu_common.c
>>>>>> @@ -196,11 +196,11 @@ static void vu_handle_tx(struct vu_dev *vdev, int index,
>>>>>>     
>>>>>>     		data = IOV_TAIL(elem[count].out_sg, elem[count].out_num, 0);
>>>>>>     		if (IOV_DROP_HEADER(&data, struct virtio_net_hdr_mrg_rxbuf))
>>>>>> -			tap_add_packet(vdev->context, &data, now);
>>>>>> +			tap_add_packet(vdev->context, 0, &data, now);
>>>>>>     
>>>>>>     		count++;
>>>>>>     	}
>>>>>> -	tap_handler(vdev->context, now);
>>>>>> +	tap_handler(vdev->context, 0, now);
>>>>>>     
>>>>>>     	if (count) {
>>>>>>     		int i;
>>>>>> @@ -235,23 +235,26 @@ void vu_kick_cb(struct vu_dev *vdev, union epoll_ref ref,
>>>>>>     }
>>>>>>     
>>>>>>     /**
>>>>>> - * vu_send_single() - Send a buffer to the front-end using the RX virtqueue
>>>>>> + * vu_send_single() - Send a buffer to the front-end using a specified virtqueue
>>>>>>      * @c:		execution context
>>>>>> + * @qpair:	Queue pair on which to send the buffer
>>>>>>      * @buf:	address of the buffer
>>>>>>      * @size:	size of the buffer
>>>>>>      *
>>>>>>      * Return: number of bytes sent, -1 if there is an error
>>>>>>      */
>>>>>> -int vu_send_single(const struct ctx *c, const void *buf, size_t size)
>>>>>> +int vu_send_single(const struct ctx *c, unsigned int qpair, const void *buf, size_t size)
>>>>>>     {
>>>>>>     	struct vu_dev *vdev = c->vdev;
>>>>>> -	struct vu_virtq *vq = &vdev->vq[VHOST_USER_RX_QUEUE];
>>>>>>     	struct vu_virtq_element elem[VIRTQUEUE_MAX_SIZE];
>>>>>>     	struct iovec in_sg[VIRTQUEUE_MAX_SIZE];
>>>>>> +	struct vu_virtq *vq;
>>>>>>     	size_t total;
>>>>>>     	int elem_cnt;
>>>>>>     	int i;
>>>>>>     
>>>>>> +	vq = &vdev->vq[qpair << 1];
>>>>>
>>>>> << 1 instead of * 2 is a bit surprising here, for a few seconds I
>>>>> thought you swapped qpair and 1.
>>>>>
>>>>> Then I started thinking that somebody is likely to mix up (probably not
>>>>> you) indices of RX and TX queues at some point. So... what about some
>>>>> macros, say (let's see if I got it right this time):
>>>>>
>>>>> #define VHOST_SEND_QUEUE(pair)	((pair) * 2)
>>>>> #define VHOST_RECV_QUEUE(pair)	(pair)
>>>>
>>>> I will. David had the same comment.
>>>
>>> Uh, wait, I must have missed it. Do you have a Message-ID? I'm afraid I
>>> must have missed some emails here but I don't see them in archives
>>> either...
>>
>> Message-ID: aRF1_Qj6uxf1ndiA@zatzit
> 
> Ah, yes, I read that, but I didn't relate it to this topic as it was
> just about the direction / naming. I see now.
> 
>>>> TX and RX are from the guest's point of view, which is not obvious when we
>>>> read passt code.
>>>
>>> Right, yes, for me neither, I always get confused. That's why I thought
>>> we could make the RX vhost-user queue become "SEND" in passt's code,
>>> but:
>>>    
>>>> I would prefer to use what David proposed, i.e. FROMGUEST and TOGUEST:
>>>>
>>>> #define VHOST_FROM_GUEST(qpair) ((qpair) * 2 + 1)
>>>> #define VHOST_TO_GUEST(qpair)   ((qpair) * 2)
>>>
>>> ...this is even clearer. It misses the QUEUE though. Does
>>> VHOST_QUEUE_{FROM,TO}_GUEST fit where you use it? Otherwise I guess VQ
>>> together with FROM / TO should be clear enough.
>>>    
>>>>> and:
>>>>>
>>>>> #define VHOST_QUEUE_PAIR(q)	((q) % 2) ? (q) : (q) / 2)
>>>>
>>>> I don't understand the purpose of this one.
>>>
>>> To get the pair number from a queue number. You're doing something like
>>> that (I guess?) in 5/6, vu_handle_tx():
>>>
>>> +	tap_flush_pools(index / 2);
>>>
>>> +			tap_add_packet(vdev->context, index / 2, &data, now);
>>>
>>> +	tap_handler(vdev->context, index / 2, now);
>>>
>>> but now that I see your definition for VHOST_FROM_GUEST() above, and
>>> that the purpose wasn't clear to you, I guess it should be:
>>>
>>> #define VHOST_PAIR_FROM_QUEUE(q) (((q) % 2) ? ((q) - 1 / 2) : ((q) / 2))
>>>    
>>
>> Why not simply:
>>
>> #define VHOST_PAIR_FROM_QUEUE(q) (q / 2)
>>
>> QUEUES 0,1 -> QP 0
>> QUEUES 2,3 -> QP 1
> 
> Ah, right, of course. Don't forget the parentheses around 'q'.

Noted :)

> 
>>> ...or maybe it's not needed? I'm not sure.
>>>    
>>>>>
>>>>> ...are they correct? A short description or "Theory of operation"
>>>>> section somewhere with a recap of how queue indices are used would be
>>>>> nice to have.
>>>>>
>>>>> And maybe also something explaining that 0 that's now appearing in
>>>>> argument lists:
>>>>>
>>>>> #define VHOST_NO_QUEUE		0
>>>>
>>>> It's not really NO_QUEUE, it's the default queue pair, queue pair 0
>>>
>>> Hmm but for non-vhost-user usages then it's not a queue, right? Well,
>> For non-vhost usage we can say there is only one queue.
>>
>>> whatever, as long as we have a definition for it... or maybe we could
>>> have VHOST_QUEUE_DEFAULT and NO_VHOST_QUEUE or VHOST_NO_QUEUE all being
>>> 0?
>>>    
>>
>> Perhaps we could instead use a generic naming:
>>
>> QPAIR_DEFAULT
>> QUEUE_FROM_GUEST(qpair)
>> QUEUE_TO_GUEST(qpair)
> 
> ...but for non-vhost-user we would have QUEUE_FROM_GUEST(QPAIR_DEFAULT)
> evaluating to 1, which isn't correct I guess?

Yes, _but_ each backend can map it to whatever it needs. For instance, for
non-vhost-user, there will be only qpair #0, and QUEUE_FROM_GUEST(QPAIR_DEFAULT)
can be mapped to "read from the tap socket" and QUEUE_TO_GUEST(QPAIR_DEFAULT)
to "write to the tap socket".

In fact, once the threading part lands, one thread will manage one queue pair,
and each action will be mapped to the expected queue.
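
Something like this, conceptually (just a sketch; tap_send_single() stands in
here for the existing non-vhost-user send path):

	if (c->mode == MODE_VU)
		vu_send_single(c, qpair, buf, size);
	else
		tap_send_single(c, buf, size);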

> 
> In general it looks reasonable to me; I would just like to make sure we
> avoid passing around a '0' in the non-vhost-user case, which would look
> rather obscure.
> 
I totally agree and I'm reviewing my code to avoid that.

Thanks,
Laurent




Thread overview: 18+ messages
2025-12-03 18:54 [PATCH v3 0/6] vhost-user: Add multiqueue support Laurent Vivier
2025-12-03 18:54 ` [PATCH v3 1/6] tap: Remove pool parameter from tap4_handler() and tap6_handler() Laurent Vivier
2025-12-05  4:14   ` David Gibson
2025-12-03 18:54 ` [PATCH v3 2/6] vhost-user: Enable multiqueue Laurent Vivier
2025-12-10  0:04   ` David Gibson
2025-12-11  7:01   ` Stefano Brivio
2025-12-11  8:29     ` Laurent Vivier
2025-12-03 18:54 ` [PATCH v3 3/6] test: Add multiqueue support to vhost-user test infrastructure Laurent Vivier
2025-12-10  0:05   ` David Gibson
2025-12-03 18:54 ` [PATCH v3 4/6] vhost-user: Add queue pair parameter throughout the network stack Laurent Vivier
2025-12-11  7:01   ` Stefano Brivio
2025-12-11  8:48     ` Laurent Vivier
2025-12-11 12:16       ` Stefano Brivio
2025-12-11 13:26         ` Laurent Vivier
2025-12-11 15:27           ` Stefano Brivio
2025-12-11 16:06             ` Laurent Vivier
2025-12-03 18:54 ` [PATCH v3 5/6] tap: Convert packet pools to per-queue-pair arrays for multiqueue Laurent Vivier
2025-12-03 18:54 ` [PATCH v3 6/6] flow: Add queue pair tracking to flow management Laurent Vivier

Code repositories for project(s) associated with this public inbox

	https://passt.top/passt
