Date: Fri, 19 Jul 2024 23:29:28 +0200
From: Stefano Brivio <sbrivio@redhat.com>
To: Laurent Vivier
Cc: passt-dev@passt.top
Subject: Re: [PATCH v2 2/4] vhost-user: introduce virtio API
Message-ID: <20240719232928.0461b22e@elisabeth>
In-Reply-To: <20240712153244.831436-3-lvivier@redhat.com>
References: <20240712153244.831436-1-lvivier@redhat.com>
 <20240712153244.831436-3-lvivier@redhat.com>
Organization: Red Hat

On Fri, 12 Jul 2024 17:32:42 +0200
Laurent Vivier wrote:

> Add virtio.c and virtio.h that define the functions needed
> to manage virtqueues.
>
> Signed-off-by: Laurent Vivier
> ---
>  Makefile |   4 +-
>  util.h   |  11 +
>  virtio.c | 611 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  virtio.h | 190 +++++++++++++++++
>  4 files changed, 814 insertions(+), 2 deletions(-)
>  create mode 100644 virtio.c
>  create mode 100644 virtio.h
>
> diff --git a/Makefile b/Makefile
> index 09fc461d087e..39613a7cf1f2 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -47,7 +47,7 @@ FLAGS += -DDUAL_STACK_SOCKETS=$(DUAL_STACK_SOCKETS)
>  PASST_SRCS = arch.c arp.c checksum.c conf.c dhcp.c dhcpv6.c flow.c fwd.c \
>  	icmp.c igmp.c inany.c iov.c ip.c isolation.c lineread.c log.c mld.c \
>  	ndp.c netlink.c packet.c passt.c pasta.c pcap.c pif.c tap.c tcp.c \
> -	tcp_buf.c tcp_splice.c udp.c util.c
> +	tcp_buf.c tcp_splice.c udp.c util.c virtio.c
>  QRAP_SRCS = qrap.c
>  SRCS = $(PASST_SRCS) $(QRAP_SRCS)
>
> @@ -57,7 +57,7 @@ PASST_HEADERS = arch.h arp.h checksum.h conf.h dhcp.h dhcpv6.h flow.h fwd.h \
>  	flow_table.h icmp.h icmp_flow.h inany.h iov.h ip.h isolation.h \
>  	lineread.h log.h ndp.h netlink.h packet.h passt.h pasta.h pcap.h pif.h \
>  	siphash.h tap.h tcp.h tcp_buf.h tcp_conn.h tcp_internal.h tcp_splice.h \
> -	udp.h util.h
> +	udp.h util.h virtio.h
>  HEADERS = $(PASST_HEADERS) seccomp.h
>
>  C := \#include \nstruct tcp_info x = { .tcpi_snd_wnd = 0 };
> diff --git a/util.h b/util.h
> index eebb027be487..56c4e2e7b4fe 100644
> --- a/util.h
> +++ b/util.h
> @@ -48,6 +48,9 @@
>  #define ROUND_DOWN(x, y) ((x) & ~((y) - 1))
>  #define ROUND_UP(x, y) (((x) + (y) - 1) & ~((y) - 1))
>
> +#define ALIGN_DOWN(n, m) ((n) / (m) * (m))
> +#define ALIGN_UP(n, m) ALIGN_DOWN((n) + (m) - 1, (m))
> +
>  #define MAX_FROM_BITS(n) (((1U << (n)) - 1))
>
>  #define BIT(n) (1UL << (n))
> @@ -116,6 +119,14 @@
>  #define htonl_constant(x) (__bswap_constant_32(x))
>  #endif
>
> +static inline void barrier(void) { __asm__ __volatile__("" ::: "memory"); }
> +#define smp_mb() do { barrier(); __atomic_thread_fence(__ATOMIC_SEQ_CST); } while (0)
> +#define smp_mb_release() do { barrier(); __atomic_thread_fence(__ATOMIC_RELEASE); } while (0)
> +#define smp_mb_acquire() do { barrier(); __atomic_thread_fence(__ATOMIC_ACQUIRE); } while (0)
> +
> +#define smp_wmb() smp_mb_release()
> +#define smp_rmb() smp_mb_acquire()
> +
>  #define NS_FN_STACK_SIZE (RLIMIT_STACK_VAL * 1024 / 8)
>  int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags,
>  	     void *arg);
> diff --git a/virtio.c b/virtio.c
> new file mode 100644
> index 000000000000..5f984f92cae0
> --- /dev/null
> +++ b/virtio.c
> @@ -0,0 +1,611 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later
> + * Copyright Red Hat
> + * Author: Laurent Vivier
> + *
> + * virtio API, vring and virtqueue functions definition
> + */
> +
> +/* some parts copied from QEMU subprojects/libvhost-user/libvhost-user.c */

I think full attribution would be nice, even though not legally required
in this case. See checksum.c for an example (and the comment to
csum_avx2() there if it applies, but I don't think that part would be
practical here).

> +
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +#include "util.h"
> +#include "virtio.h"
> +
> +#define VIRTQUEUE_MAX_SIZE 1024
> +
> +/**
> + * vu_gpa_to_va() - Translate guest physical address to our virtual address.
> + * @dev:	Vhost-user device
> + * @plen:	Physical length to map (input), virtual address mapped (output)
> + * @guest_addr:	Guest physical address
> + *
> + * Return: virtual address in our address space of the guest physical address
> + */
> +static void *vu_gpa_to_va(struct vu_dev *dev, uint64_t *plen, uint64_t guest_addr)
> +{
> +	unsigned int i;
> +
> +	if (*plen == 0)
> +		return NULL;
> +
> +	/* Find matching memory region.  */

Extra whitespace before */.

> +	for (i = 0; i < dev->nregions; i++) {
> +		const struct vu_dev_region *r = &dev->regions[i];
> +
> +		if ((guest_addr >= r->gpa) &&
> +		    (guest_addr < (r->gpa + r->size))) {
> +			if ((guest_addr + *plen) > (r->gpa + r->size))
> +				*plen = r->gpa + r->size - guest_addr;
> +			/* NOLINTNEXTLINE(performance-no-int-to-ptr) */
> +			return (void *)(guest_addr - r->gpa + r->mmap_addr +
> +					r->mmap_offset);
> +		}
> +	}
> +
> +	return NULL;
> +}
> +
> +/**
> + * vring_avail_flags() - Read the available ring flags
> + * @vq:	Virtqueue
> + *
> + * Return: the available ring descriptor flags of the given virtqueue
> + */
> +static inline uint16_t vring_avail_flags(const struct vu_virtq *vq)
> +{
> +	return le16toh(vq->vring.avail->flags);
> +}
> +
> +/**
> + * vring_avail_idx() - Read the available ring index
> + * @vq:	Virtqueue
> + *
> + * Return: the available ring index of the given virtqueue
> + */
> +static inline uint16_t vring_avail_idx(struct vu_virtq *vq)
> +{
> +	vq->shadow_avail_idx = le16toh(vq->vring.avail->idx);
> +
> +	return vq->shadow_avail_idx;
> +}
> +
> +/**
> + * vring_avail_ring() - Read an available ring entry
> + * @vq:	Virtqueue
> + * @i Index of the entry to read @i:
> + *
> + * Return: the ring entry content (head of the descriptor chain)
> + */
> +static inline uint16_t vring_avail_ring(const struct vu_virtq *vq, int i)
> +{
> +	return le16toh(vq->vring.avail->ring[i]);
> +}
> +
> +/**
> + * vring_get_used_event() - Get the used event from the available ring
> + * @vq	Virtqueue
> + *
> + * Return: the used event (available only if VIRTIO_RING_F_EVENT_IDX is set)
> + *         used_event is a performant alternative where the driver

This is taken from QEMU's hw/virtio/virtio.c, not from
subprojects/libvhost-user/libvhost-user.c.
> + *         specifies how far the device can progress before a notification
> + *         is required. In this case, virq_avail is defined as:

s/virq_avail/virtq_avail/, but...

> + *         struct virtq_avail {
> + *             le16 flags;
> + *             le16 idx;
> + *             le16 ring[num];
> + *             le16 used_event; // Only if VIRTIO_F_EVENT_IDX
> + *         };

I don't understand why you describe this structure here. All this
function returns is an index of a descriptor, right?

> + *         If the idx field in the used ring (which determined where that
> + *         descriptor index was placed) was equal to used_event, the device
> + *         must send a notification.
> + *         Otherwise the device should not send a notification.
> + */
> +static inline uint16_t vring_get_used_event(const struct vu_virtq *vq)
> +{
> +	return vring_avail_ring(vq, vq->vring.num);
> +}
> +
> +/**
> + * virtqueue_get_head() - Get the head of the descriptor chain for a given
> + *                        index
> + * @vq:		Virtqueue
> + * @idx:	Available ring entry index
> + * @head:	Head of the descriptor chain
> + */
> +static void virtqueue_get_head(const struct vu_virtq *vq,
> +			       unsigned int idx, unsigned int *head)
> +{
> +	/* Grab the next descriptor number they're advertising, and increment
> +	 * the index we've seen.
> +	 */
> +	*head = vring_avail_ring(vq, idx % vq->vring.num);
> +
> +	/* If their number is silly, that's a fatal mistake. */
> +	if (*head >= vq->vring.num)
> +		vu_panic("Guest says index %u is available", *head);

I think David's comment in:

  https://archives.passt.top/passt-dev/ZnjgSNbIXxKrAllp@zatzit/

really referred to using die() in place of vu_panic(), instead of
defining vu_panic() as die() and using it. Well, in any case, that would
be my comment: why do we need vu_panic() at all?

> +}
> +
> +/**
> + * virtqueue_read_indirect_desc() - Copy virtio ring descriptors from guest
> + *                                  memory
> + * @dev:	Vhost-user device
> + * @desc:	Destination address to copy the descriptors
> + * @addr:	Guest memory address to copy from
> + * @len:	Length of memory to copy
> + *
> + * Return: -1 if there is an error, 0 otherwise
> + */
> +static int virtqueue_read_indirect_desc(struct vu_dev *dev, struct vring_desc *desc,
> +					uint64_t addr, size_t len)
> +{
> +	uint64_t read_len;
> +
> +	if (len > (VIRTQUEUE_MAX_SIZE * sizeof(struct vring_desc)))
> +		return -1;
> +
> +	if (len == 0)
> +		return -1;
> +
> +	while (len) {
> +		const struct vring_desc *ori_desc;

It took me a bit to understand that "ori" means... "orig". :) In general
I'd say "orig" (ending with a consonant) is much clearer, that's what we
use in another occurrence in passt and also what the Linux kernel
generally uses.
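By the way, going back to the vu_panic() point above: a minimal sketch
of what I mean, assuming we keep the original message, would be to call
die() directly at the call sites, for example in virtqueue_get_head():

	if (*head >= vq->vring.num)
		die("Guest says index %u is available", *head);

...and then the vu_panic() macro in virtio.h wouldn't be needed at all,
given that it's just defined as die() anyway.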
> +
> +		read_len = len;
> +		ori_desc = vu_gpa_to_va(dev, &read_len, addr);
> +		if (!ori_desc)
> +			return -1;
> +
> +		memcpy(desc, ori_desc, read_len);
> +		len -= read_len;
> +		addr += read_len;
> +		desc += read_len / sizeof(struct vring_desc);
> +	}
> +
> +	return 0;
> +}
> +
> +/**
> + * enum virtqueue_read_desc_state - State in the descriptor chain
> + * @VIRTQUEUE_READ_DESC_ERROR	Found an invalid descriptor
> + * @VIRTQUEUE_READ_DESC_DONE	No more descriptor in the chain
> + * @VIRTQUEUE_READ_DESC_MORE	there is more descriptors in the chain
> + */
> +enum virtqueue_read_desc_state {
> +	VIRTQUEUE_READ_DESC_ERROR = -1,
> +	VIRTQUEUE_READ_DESC_DONE = 0,	/* end of chain */
> +	VIRTQUEUE_READ_DESC_MORE = 1,	/* more buffers in chain */
> +};
> +
> +/**
> + * virtqueue_read_next_desc() - Read the the next descriptor in the chain
> + * @desc:	Virtio ring descriptors
> + * @i:		Index of the current descriptor
> + * @max:	Maximum value of the descriptor index
> + * @next:	Index of the next descriptor in the chain (output value)
> + *
> + * Return: current chain descriptor state (error, next, done)
> + */
> +static int virtqueue_read_next_desc(const struct vring_desc *desc,
> +				    int i, unsigned int max, unsigned int *next)
> +{
> +	/* If this descriptor says it doesn't chain, we're done. */
> +	if (!(le16toh(desc[i].flags) & VRING_DESC_F_NEXT))
> +		return VIRTQUEUE_READ_DESC_DONE;
> +
> +	/* Check they're not leading us off end of descriptors. */
> +	*next = le16toh(desc[i].next);
> +	/* Make sure compiler knows to grab that: we don't want it changing! */
> +	smp_wmb();
> +
> +	if (*next >= max)
> +		return VIRTQUEUE_READ_DESC_ERROR;
> +
> +	return VIRTQUEUE_READ_DESC_MORE;
> +}
> +
> +/**
> + * vu_queue_empty() - Check if virtqueue is empty
> + * @vq:	Virtqueue
> + *
> + * Return: true if the virtqueue is empty, false otherwise
> + */
> +bool vu_queue_empty(struct vu_virtq *vq)
> +{
> +	if (!vq->vring.avail)
> +		return true;
> +
> +	if (vq->shadow_avail_idx != vq->last_avail_idx)
> +		return false;
> +
> +	return vring_avail_idx(vq) == vq->last_avail_idx;
> +}
> +
> +/**
> + * vring_notify() - Check if a notification can be sent
> + * @dev:	Vhost-user device
> + * @vq:	Virtqueue
> + *
> + * Return: true if notification can be sent
> + */
> +static bool vring_notify(const struct vu_dev *dev, struct vu_virtq *vq)
> +{
> +	uint16_t old, new;
> +	bool v;
> +
> +	/* We need to expose used array entries before checking used event. */
> +	smp_mb();
> +
> +	/* Always notify when queue is empty (when feature acknowledge) */
> +	if (vu_has_feature(dev, VIRTIO_F_NOTIFY_ON_EMPTY) &&
> +	    !vq->inuse && vu_queue_empty(vq)) {
> +		return true;
> +	}
> +
> +	if (!vu_has_feature(dev, VIRTIO_RING_F_EVENT_IDX))
> +		return !(vring_avail_flags(vq) & VRING_AVAIL_F_NO_INTERRUPT);
> +
> +	v = vq->signalled_used_valid;
> +	vq->signalled_used_valid = true;
> +	old = vq->signalled_used;
> +	new = vq->signalled_used = vq->used_idx;
> +	return !v || vring_need_event(vring_get_used_event(vq), new, old);
> +}
> +
> +/**
> + * vu_queue_notify() - Send a notification the given virtqueue

s/the/to the/

> + * @dev:	Vhost-user device
> + * @vq:	Virtqueue
> + */
> +/* cppcheck-suppress unusedFunction */
> +void vu_queue_notify(const struct vu_dev *dev, struct vu_virtq *vq)
> +{
> +	if (!vq->vring.avail)
> +		return;
> +
> +	if (!vring_notify(dev, vq)) {
> +		debug("skipped notify...");
> +		return;
> +	}
> +
> +	if (eventfd_write(vq->call_fd, 1) < 0)
> +		vu_panic("Error writing eventfd: %s", strerror(errno));
> +}
> +
> +/**
> + * vring_set_avail_event() - Set avail_event
> + * @vq:	Virtqueue
> + * @val:	Value to set to avail_event
> + *		avail_event is used in the same way the used_event is in the
> + *		avail_ring.
> + *		struct virtq_used {
> + *			le16 flags;
> + *			le16 idx;
> + *			struct virtq_used_elem ringnum];
> + *			le16 avail_event; // Only if VIRTIO_F_EVENT_IDX
> + *		};

Same as above: why is this struct described here?

> + *		avail_event is used to advise the driver that notifications
> + *		are unnecessary until the driver writes entry with an index
> + *		specified by avail_event into the available ring.
> + */
> +static inline void vring_set_avail_event(struct vu_virtq *vq, uint16_t val)
> +{
> +	uint16_t val_le = htole16(val);
> +
> +	if (!vq->notification)
> +		return;
> +
> +	memcpy(&vq->vring.used->ring[vq->vring.num], &val_le, sizeof(uint16_t));
> +}
> +
> +/**
> + * virtqueue_map_desc() - Translate descriptor ring physical address into our
> + *                        virtual address space
> + * @dev:	Vhost-user device
> + * @p_num_sg:	First iov entry to use (input),
> + *		first iov entry not sued (output)
> + * @iov:	Iov array to use to store buffer virtual addresses
> + * @max_num_sg:	Maximum number of iov entries
> + * @pa:		Guest physical address of the buffer to map into our virtual
> + *		address
> + * @sz:		Size of the buffer
> + *
> + * Return: false on error, true otherwise
> + */
> +static bool virtqueue_map_desc(struct vu_dev *dev,
> +			       unsigned int *p_num_sg, struct iovec *iov,
> +			       unsigned int max_num_sg,
> +			       uint64_t pa, size_t sz)
> +{
> +	unsigned int num_sg = *p_num_sg;
> +
> +	ASSERT(num_sg <= max_num_sg);
> +
> +	if (!sz)
> +		vu_panic("virtio: zero sized buffers are not allowed");
> +
> +	while (sz) {
> +		uint64_t len = sz;
> +
> +		if (num_sg == max_num_sg)
> +			vu_panic("virtio: too many descriptors in indirect table");
> +
> +		iov[num_sg].iov_base = vu_gpa_to_va(dev, &len, pa);
> +		if (iov[num_sg].iov_base == NULL)
> +			vu_panic("virtio: invalid address for buffers");
> +		iov[num_sg].iov_len = len;
> +		num_sg++;
> +		sz -= len;
> +		pa += len;
> +	}
> +
> +	*p_num_sg = num_sg;
> +	return true;
> +}
> +
> +/**
> + * vu_queue_map_desc - Map the virqueue descriptor ring into our virtual
> + *                     address space
> + * @dev:	Vhost-user device
> + * @vq:	Virtqueue
> + * @idx:	First descriptor ring entry to map
> + * @elem:	Virtqueue element to store descriptor ring iov
> + *
> + * Return: -1 if there is an error, 0 otherwise
> + */
> +static int vu_queue_map_desc(struct vu_dev *dev, struct vu_virtq *vq, unsigned int idx,
> +			     struct vu_virtq_element *elem)
> +{
> +	const struct vring_desc *desc = vq->vring.desc;
> +	struct vring_desc desc_buf[VIRTQUEUE_MAX_SIZE];
> +	unsigned int out_num = 0, in_num = 0;
> +	unsigned int max = vq->vring.num;
> +	unsigned int i = idx;
> +	uint64_t read_len;
> +	int rc;
> +
> +	if (le16toh(desc[i].flags) & VRING_DESC_F_INDIRECT) {
> +		unsigned int desc_len;
> +		uint64_t desc_addr;
> +
> +		if (le32toh(desc[i].len) % sizeof(struct vring_desc))
> +			vu_panic("Invalid size for indirect buffer table");
> +
> +		/* loop over the indirect descriptor table */
> +		desc_addr = le64toh(desc[i].addr);
> +		desc_len = le32toh(desc[i].len);
> +		max = desc_len / sizeof(struct vring_desc);
> +		read_len = desc_len;
> +		desc = vu_gpa_to_va(dev, &read_len, desc_addr);
> +		if (desc && read_len != desc_len) {
> +			/* Failed to use zero copy */
> +			desc = NULL;
> +			if (!virtqueue_read_indirect_desc(dev, desc_buf, desc_addr, desc_len))
> +				desc = desc_buf;
> +		}
> +		if (!desc)
> +			vu_panic("Invalid indirect buffer table");
> +		i = 0;
> +	}
> +
> +	/* Collect all the descriptors */
> +	do {
> +		if (le16toh(desc[i].flags) & VRING_DESC_F_WRITE) {
> +			if (!virtqueue_map_desc(dev, &in_num, elem->in_sg,
> +						elem->in_num,
> +						le64toh(desc[i].addr),
> +						le32toh(desc[i].len))) {
> +				return -1;
> +			}
> +		} else {
> +			if (in_num)
> +				vu_panic("Incorrect order for descriptors");
> +			if (!virtqueue_map_desc(dev, &out_num, elem->out_sg,
> +						elem->out_num,
> +						le64toh(desc[i].addr),
> +						le32toh(desc[i].len))) {
> +				return -1;
> +			}
> +		}
> +
> +		/* If we've got too many, that implies a descriptor loop. */
> +		if ((in_num + out_num) > max)
> +			vu_panic("Looped descriptor");
> +		rc = virtqueue_read_next_desc(desc, i, max, &i);
> +	} while (rc == VIRTQUEUE_READ_DESC_MORE);
> +
> +	if (rc == VIRTQUEUE_READ_DESC_ERROR)
> +		vu_panic("read descriptor error");
> +
> +	elem->index = idx;
> +	elem->in_num = in_num;
> +	elem->out_num = out_num;
> +
> +	return 0;
> +}
> +
> +/**
> + * vu_queue_pop() - Pop an entry from the virtqueue
> + * @dev:	Vhost-user device
> + * @vq:	Virtqueue
> + * @elem:	Virtqueue element to file with the entry information
> + *
> + * Return: -1 if there is an error, 0 otherwise
> + */
> +/* cppcheck-suppress unusedFunction */
> +int vu_queue_pop(struct vu_dev *dev, struct vu_virtq *vq, struct vu_virtq_element *elem)
> +{
> +	unsigned int head;
> +	int ret;
> +
> +	if (!vq->vring.avail)
> +		return -1;
> +
> +	if (vu_queue_empty(vq))
> +		return -1;
> +
> +	/*
> +	 * Needed after vu_queue_empty(), see comment in
> +	 * virtqueue_num_heads().
> +	 */
> +	smp_rmb();
> +
> +	if (vq->inuse >= vq->vring.num)
> +		vu_panic("Virtqueue size exceeded");
> +
> +	virtqueue_get_head(vq, vq->last_avail_idx++, &head);
> +
> +	if (vu_has_feature(dev, VIRTIO_RING_F_EVENT_IDX))
> +		vring_set_avail_event(vq, vq->last_avail_idx);
> +
> +	ret = vu_queue_map_desc(dev, vq, head, elem);
> +
> +	if (ret < 0)
> +		return ret;
> +
> +	vq->inuse++;
> +
> +	return 0;
> +}
> +
> +/**
> + * vu_queue_detach_element() - Detach an element from the virqueue
> + * @dev:	Vhost-user device
> + * @vq:	Virtqueue
> + * @index:	Index of the element to detach
> + * @len:	Size of the element to detach
> + */
> +void vu_queue_detach_element(struct vu_dev *dev, struct vu_virtq *vq,
> +			     unsigned int index, size_t len)
> +{
> +	(void)dev;
> +	(void)index;
> +	(void)len;
> +
> +	vq->inuse--;
> +	/* unmap, when DMA support is added */
> +}
> +
> +/**
> + * vu_queue_unpop() - Push back a previously popped element from the virqueue
> + * @dev:	Vhost-user device
> + * @vq:	Virtqueue
> + * @index:	Index of the element to unpop
> + * @len:	Size of the element to unpop
> + */
> +/* cppcheck-suppress unusedFunction */
> +void vu_queue_unpop(struct vu_dev *dev, struct vu_virtq *vq, unsigned int index, size_t len)
> +{
> +	vq->last_avail_idx--;
> +	vu_queue_detach_element(dev, vq, index, len);
> +}
> +
> +/**
> + * vu_queue_rewind() - Push back a given number of popped elements
> + * @dev:	Vhost-user device
> + * @vq:	Virtqueue
> + * @num:	Number of element to unpop
> + */
> +/* cppcheck-suppress unusedFunction */
> +bool vu_queue_rewind(struct vu_dev *dev, struct vu_virtq *vq, unsigned int num)
> +{
> +	(void)dev;
> +	if (num > vq->inuse)
> +		return false;
> +
> +	vq->last_avail_idx -= num;
> +	vq->inuse -= num;
> +	return true;
> +}
> +
> +/**
> + * vring_used_write() - Write an entry in the used ring
> + * @vq:	Virtqueue
> + * @uelem:	Entry to write
> + * @i:		Index of the entry in the used ring
> + */
> +static inline void vring_used_write(struct vu_virtq *vq,
> +				    const struct vring_used_elem *uelem, int i)
> +{
> +	struct vring_used *used = vq->vring.used;
> +
> +	used->ring[i] = *uelem;
> +}
> +
> +/**
> + * vu_queue_fill_by_index() - Update information of a descriptor ring entry
> + *                            in the used ring
> + * @vq:	Virtqueue
> + * @index:	Descriptor ring index
> + * @len:	Size of the element
> + * @idx:	Used ring entry index
> + */
> +void vu_queue_fill_by_index(struct vu_virtq *vq, unsigned int index,
> +			    unsigned int len, unsigned int idx)
> +{
> +	struct vring_used_elem uelem;
> +
> +	if (!vq->vring.avail)
> +		return;
> +
> +	idx = (idx + vq->used_idx) % vq->vring.num;
> +
> +	uelem.id = htole32(index);
> +	uelem.len = htole32(len);
> +	vring_used_write(vq, &uelem, idx);
> +}
> +
> +/**
> + * vu_queue_fill() - Update information of a given element in the used ring
> + * @dev:	Vhost-user device
> + * @vq:	Virtqueue
> + * @elem:	Element information to fill
> + * @len:	Size of the element
> + * @idx:	Used ring entry index
> + */
> +/* cppcheck-suppress unusedFunction */
> +void vu_queue_fill(struct vu_virtq *vq, const struct vu_virtq_element *elem,
> +		   unsigned int len, unsigned int idx)
> +{
> +	vu_queue_fill_by_index(vq, elem->index, len, idx);
> +}
> +
> +/**
> + * vring_used_idx_set() - Set the descriptor ring current index
> + * @vq:	Virtqueue
> + * @val:	Value to set in the index
> + */
> +static inline void vring_used_idx_set(struct vu_virtq *vq, uint16_t val)
> +{
> +	vq->vring.used->idx = htole16(val);
> +
> +	vq->used_idx = val;
> +}
> +
> +/**
> + * vu_queue_flush() - Flush the virtqueue
> + * @vq:	Virtqueue
> + * @count:	Number of entry to flush
> + */
> +/* cppcheck-suppress unusedFunction */
> +void vu_queue_flush(struct vu_virtq *vq, unsigned int count)
> +{
> +	uint16_t old, new;
> +
> +	if (!vq->vring.avail)
> +		return;
> +
> +	/* Make sure buffer is written before we update index. */
> +	smp_wmb();
> +
> +	old = vq->used_idx;
> +	new = old + count;
> +	vring_used_idx_set(vq, new);
> +	vq->inuse -= count;
> +	if ((int16_t)(new - vq->signalled_used) < (uint16_t)(new - old))
> +		vq->signalled_used_valid = false;
> +}
> diff --git a/virtio.h b/virtio.h
> new file mode 100644
> index 000000000000..0a2cf6230139
> --- /dev/null
> +++ b/virtio.h
> @@ -0,0 +1,190 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later
> + * Copyright Red Hat
> + * Author: Laurent Vivier
> + *
> + * virtio API, vring and virtqueue functions definition
> + */
> +
> +#ifndef VIRTIO_H
> +#define VIRTIO_H
> +
> +#include
> +#include
> +
> +#define vu_panic(...) die( __VA_ARGS__ )
> +
> +/* Maximum size of a virtqueue */
> +#define VIRTQUEUE_MAX_SIZE 1024
> +
> +/**
> + * struct vu_ring - Virtqueue rings
> + * @num:		Size of the queue
> + * @desc:		Descriptor ring
> + * @avail:		Available ring
> + * @used:		Used ring
> + * @log_guest_addr:	Guest address for logging
> + * @flags:		Vring flags
> + *			VHOST_VRING_F_LOG is set if log address is valid
> + */
> +struct vu_ring {
> +	unsigned int num;
> +	struct vring_desc *desc;
> +	struct vring_avail *avail;
> +	struct vring_used *used;
> +	uint64_t log_guest_addr;
> +	uint32_t flags;
> +};
> +
> +/**
> + * struct vu_virtq - Virtqueue definition
> + * @vring:		Virtqueue rings
> + * @last_avail_idx:	Next head to pop
> + * @shadow_avail_idx:	Last avail_idx read from VQ.
> + * @used_idx:		Descriptor ring current index
> + * @signalled_used:	Last used index value we have signalled on
> + * @signalled_used_valid:	True if signalled_used if valid
> + * @notification:	True if the queues notify (via event
> + *			index or interrupt)
> + * @inuse:		Number of entries in use
> + * @call_fd:		The event file descriptor to signal when
> + *			buffers are used.
> + * @kick_fd:		The event file descriptor for adding
> + *			buffers to the vring
> + * @err_fd:		The event file descriptor to signal when
> + *			error occurs
> + * @enable:		True if the virtqueue is enabled
> + * @started:		True if the virtqueue is started
> + * @vra:		QEMU address of our rings
> + */
> +struct vu_virtq {
> +	struct vu_ring vring;
> +	uint16_t last_avail_idx;
> +	uint16_t shadow_avail_idx;
> +	uint16_t used_idx;
> +	uint16_t signalled_used;
> +	bool signalled_used_valid;
> +	bool notification;
> +	unsigned int inuse;
> +	int call_fd;
> +	int kick_fd;
> +	int err_fd;
> +	unsigned int enable;
> +	bool started;
> +	struct vhost_vring_addr vra;
> +};
> +
> +/**
> + * struct vu_dev_region - guest shared memory region
> + * @gpa:		Guest physical address of the region
> + * @size:		Memory size in bytes
> + * @qva:		QEMU virtual address
> + * @mmap_offset:	Offset where the region starts in the mapped memory
> + * @mmap_addr:		Address of the mapped memory
> + */
> +struct vu_dev_region {
> +	uint64_t gpa;
> +	uint64_t size;
> +	uint64_t qva;
> +	uint64_t mmap_offset;
> +	uint64_t mmap_addr;
> +};
> +
> +#define VHOST_USER_MAX_QUEUES 2
> +
> +/*
> + * Set a reasonable maximum number of ram slots, which will be supported by
> + * any architecture.
> + */
> +#define VHOST_USER_MAX_RAM_SLOTS 32

See QEMU's commit 0fa6344c90a0 ("libvhost-user: Bump up
VHOST_USER_MAX_RAM_SLOTS to 509").
I'm not sure if that, or other bits of the series posted at:

  https://lore.kernel.org/all/20240214151701.29906-1-david@redhat.com/

are actually relevant for us.

> +
> +/**
> + * struct vu_dev

Missing description. It represents a... vhost-user device, with guest
mappings, I guess?

> + * @context:		Execution context
> + * nregions:		Number of shared memory regions
> + * @regions:		Guest shared memory regions
> + * @features:		Vhost-user features
> + * @protocol_features:	Vhost-user protocol features
> + * @hdrlen:		Virtio -net header length
> + */
> +struct vu_dev {
> +	uint32_t nregions;
> +	struct vu_dev_region regions[VHOST_USER_MAX_RAM_SLOTS];
> +	struct vu_virtq vq[VHOST_USER_MAX_QUEUES];
> +	uint64_t features;
> +	uint64_t protocol_features;
> +	int hdrlen;
> +};
> +
> +/**
> + * struct vu_virtq_element

And this is an element in the vhost-user virtqueue ring?

> + * @index:	Descriptor ring index
> + * @out_num:	Number of outgoing iovec buffers
> + * @in_num:	Number of incoming iovec buffers
> + * @in_sg:	Incoming iovec buffers
> + * @out_sg:	Outgoing iovec buffers
> + */
> +struct vu_virtq_element {
> +	unsigned int index;
> +	unsigned int out_num;
> +	unsigned int in_num;
> +	struct iovec *in_sg;
> +	struct iovec *out_sg;
> +};
> +
> +/**
> + * has_feature() - Check a feature bit in a features set
> + * @features:	Features set
> + * @fb:		Feature bit to check
> + *
> + * Return:	True if the feature bit is set
> + */
> +static inline bool has_feature(uint64_t features, unsigned int fbit)
> +{
> +	return !!(features & (1ULL << fbit));
> +}
> +
> +/**
> + * vu_has_feature() - Check if a virtio-net feature is available
> + * @vdev:	Vhost-user device
> + * @bit:	Feature to check
> + *
> + * Return:	True if the feature is available
> + */
> +static inline bool vu_has_feature(const struct vu_dev *vdev,
> +				  unsigned int fbit)
> +{
> +	return has_feature(vdev->features, fbit);
> +}
> +
> +/**
> + * vu_has_protocol_feature() - Check if a vhost-user feature is available
> + * @vdev:	Vhost-user device
> + * @bit:	Feature to check
> + *
> + * Return:	True if the feature is available
> + */
> +/* cppcheck-suppress unusedFunction */
> +static inline bool vu_has_protocol_feature(const struct vu_dev *vdev,
> +					   unsigned int fbit)
> +{
> +	return has_feature(vdev->protocol_features, fbit);
> +}
> +
> +bool vu_queue_empty(struct vu_virtq *vq);
> +void vu_queue_notify(const struct vu_dev *dev, struct vu_virtq *vq);
> +int vu_queue_pop(struct vu_dev *dev, struct vu_virtq *vq,
> +		 struct vu_virtq_element *elem);
> +void vu_queue_detach_element(struct vu_dev *dev, struct vu_virtq *vq,
> +			     unsigned int index, size_t len);
> +void vu_queue_unpop(struct vu_dev *dev, struct vu_virtq *vq,
> +		    unsigned int index, size_t len);
> +bool vu_queue_rewind(struct vu_dev *dev, struct vu_virtq *vq,
> +		     unsigned int num);
> +
> +void vu_queue_fill_by_index(struct vu_virtq *vq, unsigned int index,
> +			    unsigned int len, unsigned int idx);
> +void vu_queue_fill(struct vu_virtq *vq,
> +		   const struct vu_virtq_element *elem, unsigned int len,
> +		   unsigned int idx);
> +void vu_queue_flush(struct vu_virtq *vq, unsigned int count);
> +#endif /* VIRTIO_H */

-- 
Stefano