From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>,
Laurent Vivier <lvivier@redhat.com>
Cc: passt-dev@passt.top
Subject: Re: [PATCH v3 3/4] vhost-user: introduce vhost-user API
Date: Tue, 27 Aug 2024 00:14:20 +0200 [thread overview]
Message-ID: <20240827001420.1f895c7d@elisabeth> (raw)
In-Reply-To: <ZswSFOJsSsRmXABP@zatzit.fritz.box>
On Mon, 26 Aug 2024 15:26:44 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:
> On Thu, Aug 15, 2024 at 05:50:22PM +0200, Laurent Vivier wrote:
> > Add vhost_user.c and vhost_user.h that define the functions needed
> > to implement vhost-user backend.
> >
> > [...]
> >
> > +static int vu_message_read_default(int conn_fd, struct vhost_user_msg *vmsg)
> > +{
> > + char control[CMSG_SPACE(VHOST_MEMORY_BASELINE_NREGIONS *
> > + sizeof(int))] = { 0 };
> > + struct iovec iov = {
> > + .iov_base = (char *)vmsg,
> > + .iov_len = VHOST_USER_HDR_SIZE,
> > + };
> > + struct msghdr msg = {
> > + .msg_iov = &iov,
> > + .msg_iovlen = 1,
> > + .msg_control = control,
> > + .msg_controllen = sizeof(control),
> > + };
> > + ssize_t ret, sz_payload;
> > + struct cmsghdr *cmsg;
> > + size_t fd_size;
> > +
> > + ret = recvmsg(conn_fd, &msg, MSG_DONTWAIT);
> > + if (ret < 0) {
> > + if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK)
> > + return 0;
> > + return -1;
> > + }
> > +
> > + vmsg->fd_num = 0;
> > + for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
> > + cmsg = CMSG_NXTHDR(&msg, cmsg)) {
> > + if (cmsg->cmsg_level == SOL_SOCKET &&
> > + cmsg->cmsg_type == SCM_RIGHTS) {
> > + fd_size = cmsg->cmsg_len - CMSG_LEN(0);
> > + ASSERT(fd_size / sizeof(int) <=
> > + VHOST_MEMORY_BASELINE_NREGIONS);
>
> IIUC, this could be tripped by a bug in the peer (qemu?) rather than
> in our own code. In which case I think a die() would be more
> appropriate than an ASSERT().
Ah, right, it wouldn't be our issue... what about neither, so that we
don't crash if QEMU has an issue we could easily recover from?
> > [...]
> >
> > +/**
> > + * vu_set_mem_table_exec() - Sets the memory map regions to be able to
> > + * translate the vring addresses.
> > + * @vdev: Vhost-user device
> > + * @vmsg: Vhost-user message
> > + *
> > + * Return: false as no reply is requested
> > + *
> > + * #syscalls:vu mmap munmap
> > + */
> > +static bool vu_set_mem_table_exec(struct vu_dev *vdev,
> > + struct vhost_user_msg *msg)
> > +{
> > + struct vhost_user_memory m = msg->payload.memory, *memory = &m;
>
> Is there a reason to take a copy of the message, rather than just
> referencing into msg as passed?
>
> > + unsigned int i;
> > +
> > + for (i = 0; i < vdev->nregions; i++) {
> > + struct vu_dev_region *r = &vdev->regions[i];
> > + /* NOLINTNEXTLINE(performance-no-int-to-ptr) */
> > + void *mm = (void *)r->mmap_addr;
> > +
> > + if (mm)
> > + munmap(mm, r->size + r->mmap_offset);
>
> Do we actually ever need to change the mapping of the regions? If not
> we can avoid this unmapping loop.
>
> > + }
> > + vdev->nregions = memory->nregions;
> > +
> > + debug("Nregions: %u", memory->nregions);
> > + for (i = 0; i < vdev->nregions; i++) {
> > + struct vhost_user_memory_region *msg_region = &memory->regions[i];
> > + struct vu_dev_region *dev_region = &vdev->regions[i];
> > + void *mmap_addr;
> > +
> > + debug("Region %d", i);
> > + debug(" guest_phys_addr: 0x%016"PRIx64,
> > + msg_region->guest_phys_addr);
> > + debug(" memory_size: 0x%016"PRIx64,
> > + msg_region->memory_size);
> > + debug(" userspace_addr 0x%016"PRIx64,
> > + msg_region->userspace_addr);
> > + debug(" mmap_offset 0x%016"PRIx64,
> > + msg_region->mmap_offset);
> > +
> > + dev_region->gpa = msg_region->guest_phys_addr;
> > + dev_region->size = msg_region->memory_size;
> > + dev_region->qva = msg_region->userspace_addr;
> > + dev_region->mmap_offset = msg_region->mmap_offset;
> > +
> > + /* We don't use offset argument of mmap() since the
> > + * mapped address has to be page aligned, and we use huge
> > + * pages.
>
> We do what now?
We do madvise(pkt_buf, TAP_BUF_BYTES, MADV_HUGEPAGE) in main(), but
we're not using pkt_buf in this case, so I guess it's not relevant. I'm
not sure if _passt_ calling madvise(..., MADV_HUGEPAGE) on the memory
regions we get would have any effect, by the way.
> > [...]
> >
> > +/**
> > + * vu_send() - Send a buffer to the front-end using the RX virtqueue
> > + * @vdev: vhost-user device
> > + * @buf: address of the buffer
> > + * @size: size of the buffer
> > + *
> > + * Return: number of bytes sent, -1 if there is an error
> > + */
> > +/* cppcheck-suppress unusedFunction */
> > +int vu_send(struct vu_dev *vdev, const void *buf, size_t size)
> > +{
> > + struct vu_virtq *vq = &vdev->vq[VHOST_USER_RX_QUEUE];
> > + struct vu_virtq_element elem[VIRTQUEUE_MAX_SIZE];
> > + struct iovec in_sg[VIRTQUEUE_MAX_SIZE];
> > + size_t lens[VIRTQUEUE_MAX_SIZE];
> > + __virtio16 *num_buffers_ptr = NULL;
> > + size_t hdrlen = vdev->hdrlen;
> > + int in_sg_count = 0;
> > + size_t offset = 0;
> > + int i = 0, j;
> > +
> > + debug("vu_send size %zu hdrlen %zu", size, hdrlen);
> > +
> > + if (!vu_queue_enabled(vq) || !vu_queue_started(vq)) {
> > + err("Got packet, but no available descriptors on RX virtq.");
> > + return 0;
> > + }
> > +
> > + while (offset < size) {
> > + size_t len;
> > + int total;
> > + int ret;
> > +
> > + total = 0;
> > +
> > + if (i == ARRAY_SIZE(elem) ||
> > + in_sg_count == ARRAY_SIZE(in_sg)) {
> > + err("virtio-net unexpected long buffer chain");
> > + goto err;
> > + }
> > +
> > + elem[i].out_num = 0;
> > + elem[i].out_sg = NULL;
> > + elem[i].in_num = ARRAY_SIZE(in_sg) - in_sg_count;
> > + elem[i].in_sg = &in_sg[in_sg_count];
> > +
> > + ret = vu_queue_pop(vdev, vq, &elem[i]);
> > + if (ret < 0) {
> > + if (vu_wait_queue(vq) != -1)
> > + continue;
> > + if (i) {
> > + err("virtio-net unexpected empty queue: "
> > + "i %d mergeable %d offset %zd, size %zd, "
> > + "features 0x%" PRIx64,
> > + i, vu_has_feature(vdev,
> > + VIRTIO_NET_F_MRG_RXBUF),
> > + offset, size, vdev->features);
> > + }
> > + offset = -1;
> > + goto err;
> > + }
> > + in_sg_count += elem[i].in_num;
>
> Initially I thought this would consume the entire in_sg array on the
> first loop iteration, but I guess vu_queue_pop() reduces in_num from
> the value we initialise above.
>
> > + if (elem[i].in_num < 1) {
>
> I realise it doesn't really matter in this context, but it makes more
> sense to me for this check to go _before_ we use in_num to update
> in_sg_cuont.
>
>
> > + err("virtio-net receive queue contains no in buffers");
> > + vu_queue_detach_element(vq);
> > + offset = -1;
> > + goto err;
> > + }
> > +
> > + if (i == 0) {
> > + struct virtio_net_hdr hdr = {
> > + .flags = VIRTIO_NET_HDR_F_DATA_VALID,
> > + .gso_type = VIRTIO_NET_HDR_GSO_NONE,
> > + };
> > +
> > + ASSERT(offset == 0);
> > + ASSERT(elem[i].in_sg[0].iov_len >= hdrlen);
>
> Is this necessarily our bug, or could it be cause by the peer giving
> unreasonably small buffers? If the latter, then a die() would make
> more sense.
...same here.
--
Stefano
next prev parent reply other threads:[~2024-08-26 22:14 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-15 15:50 [PATCH v3 0/4] Add vhost-user support to passt. (part 3) Laurent Vivier
2024-08-15 15:50 ` [PATCH v3 1/4] packet: replace struct desc by struct iovec Laurent Vivier
2024-08-20 0:27 ` David Gibson
2024-08-15 15:50 ` [PATCH v3 2/4] vhost-user: introduce virtio API Laurent Vivier
2024-08-20 1:00 ` David Gibson
2024-08-22 22:14 ` Stefano Brivio
2024-08-15 15:50 ` [PATCH v3 3/4] vhost-user: introduce vhost-user API Laurent Vivier
2024-08-22 22:14 ` Stefano Brivio
2024-08-26 5:27 ` David Gibson
2024-08-26 7:55 ` Stefano Brivio
2024-08-26 9:53 ` David Gibson
2024-08-26 5:26 ` David Gibson
2024-08-26 22:14 ` Stefano Brivio [this message]
2024-08-27 4:42 ` David Gibson
2024-09-05 9:58 ` Laurent Vivier
2024-08-15 15:50 ` [PATCH v3 4/4] vhost-user: add vhost-user Laurent Vivier
2024-08-22 9:59 ` Stefano Brivio
2024-08-22 22:14 ` Stefano Brivio
2024-08-23 12:32 ` Stefano Brivio
2024-08-20 22:41 ` [PATCH v3 0/4] Add vhost-user support to passt. (part 3) Stefano Brivio
2024-08-22 16:53 ` Stefano Brivio
2024-08-23 12:32 ` Stefano Brivio
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240827001420.1f895c7d@elisabeth \
--to=sbrivio@redhat.com \
--cc=david@gibson.dropbear.id.au \
--cc=lvivier@redhat.com \
--cc=passt-dev@passt.top \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).