Date: Fri, 23 Aug 2024 00:14:22 +0200
From: Stefano Brivio <sbrivio@redhat.com>
To: Laurent Vivier <lvivier@redhat.com>
Cc: passt-dev@passt.top
Subject: Re: [PATCH v3 3/4] vhost-user: introduce vhost-user API
Message-ID: <20240823001422.6c441841@elisabeth>
In-Reply-To: <20240815155024.827956-4-lvivier@redhat.com>
References: <20240815155024.827956-1-lvivier@redhat.com>
	<20240815155024.827956-4-lvivier@redhat.com>

On Thu, 15 Aug 2024 17:50:22 +0200
Laurent Vivier <lvivier@redhat.com> wrote:

> Add vhost_user.c and vhost_user.h that define the functions needed
> to implement vhost-user backend.
> 
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> ---
>  Makefile     |    4 +-
>  iov.c        |    1 -
>  vhost_user.c | 1271 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  vhost_user.h |  202 ++++++++
>  virtio.c     |    5 -
>  virtio.h     |    2 +-
>  6 files changed, 1476 insertions(+), 9 deletions(-)
>  create mode 100644 vhost_user.c
>  create mode 100644 vhost_user.h
> 
> diff --git a/Makefile b/Makefile
> index f171c7955ac9..4ccefffacfde 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -47,7 +47,7 @@ FLAGS += -DDUAL_STACK_SOCKETS=$(DUAL_STACK_SOCKETS)
>  PASST_SRCS = arch.c arp.c checksum.c conf.c dhcp.c dhcpv6.c flow.c fwd.c \
> 	icmp.c igmp.c inany.c iov.c ip.c isolation.c lineread.c log.c mld.c \
> 	ndp.c netlink.c packet.c passt.c pasta.c pcap.c pif.c tap.c tcp.c \
> -	tcp_buf.c tcp_splice.c udp.c udp_flow.c util.c virtio.c
> +	tcp_buf.c tcp_splice.c udp.c udp_flow.c util.c vhost_user.c virtio.c
>  QRAP_SRCS = qrap.c
>  SRCS = $(PASST_SRCS) $(QRAP_SRCS)
>  
> @@ -57,7 +57,7 @@ PASST_HEADERS = arch.h arp.h checksum.h conf.h dhcp.h dhcpv6.h flow.h fwd.h \
> 	flow_table.h icmp.h icmp_flow.h inany.h iov.h ip.h isolation.h \
> 	lineread.h log.h ndp.h netlink.h packet.h passt.h pasta.h pcap.h pif.h \
> 	siphash.h tap.h tcp.h tcp_buf.h tcp_conn.h tcp_internal.h tcp_splice.h \
> -	udp.h udp_flow.h util.h virtio.h
> +	udp.h udp_flow.h util.h vhost_user.h virtio.h
>  HEADERS = $(PASST_HEADERS) seccomp.h
>  
>  C := \#include <linux/tcp.h>\nstruct tcp_info x = { .tcpi_snd_wnd = 0 };
> diff --git a/iov.c b/iov.c
> index 3f9e229a305f..3741db21790f 100644
> --- a/iov.c
> +++ b/iov.c
> @@ -68,7 +68,6 @@ size_t iov_skip_bytes(const struct iovec *iov, size_t n,
>  *
>  * Returns: The number of bytes successfully copied.
>  */
> -/* cppcheck-suppress unusedFunction */
>  size_t iov_from_buf(const struct iovec *iov, size_t iov_cnt,
> 		    size_t offset, const void *buf, size_t bytes)
>  {
> diff --git a/vhost_user.c b/vhost_user.c
> new file mode 100644
> index 000000000000..c4cd25fae84e
> --- /dev/null
> +++ b/vhost_user.c
> @@ -0,0 +1,1271 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later

Same as 2/4 with the SPDX tag:

// SPDX-License-Identifier: GPL-2.0-or-later

> + *
> + * vhost-user API, command management and virtio interface
> + *
> + * Copyright Red Hat
> + * Author: Laurent Vivier <lvivier@redhat.com>
> + */
> +/* some parts from QEMU subprojects/libvhost-user/libvhost-user.c
> + * licensed under the following terms:
> + *
> + * Copyright IBM, Corp. 2007
> + * Copyright (c) 2016 Red Hat, Inc.
> + *
> + * Authors:
> + *  Anthony Liguori
> + *  Marc-André Lureau
> + *  Victor Kaplansky
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or
> + * later. See the COPYING file in the top-level directory.
> + */
> +
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +#include "util.h"
> +#include "passt.h"
> +#include "tap.h"
> +#include "vhost_user.h"
> +
> +/* vhost-user version we are compatible with */
> +#define VHOST_USER_VERSION 1
> +
> +/**
> + * vu_print_capabilities() - print vhost-user capabilities
> + *			     this is part of the vhost-user backend
> + *			     convention.
> + */
> +/* cppcheck-suppress unusedFunction */
> +void vu_print_capabilities(void)
> +{
> +	info("{");
> +	info(" \"type\": \"net\"");
> +	info("}");
> +	exit(EXIT_SUCCESS);
> +}
> +
> +/**
> + * vu_request_to_string() - convert a vhost-user request number to its name
> + * @req:	request number
> + *
> + * Return: the name of request number
> + */
> +static const char *vu_request_to_string(unsigned int req)
> +{
> +	if (req < VHOST_USER_MAX) {
> +#define REQ(req) [req] = #req

Oh, neat, I had never thought of a macro like this.
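For anybody else reading along: the trick combines designated array
initializers with the preprocessor stringify operator, so each enum
value both indexes the table and names itself. A standalone sketch with
a made-up enum (not the actual vhost-user one):

	#include <stdio.h>

	enum req { REQ_NONE = 0, REQ_GET = 1, REQ_SET = 2, REQ_MAX };

	static const char *req_to_string(unsigned int req)
	{
	#define REQ(r) [r] = #r		/* [REQ_GET] = "REQ_GET", ... */
		static const char * const str[] = {
			REQ(REQ_NONE), REQ(REQ_GET), REQ(REQ_SET),
		};
	#undef REQ
		return req < REQ_MAX ? str[req] : "unknown";
	}

	int main(void)
	{
		printf("%s\n", req_to_string(REQ_GET));	/* prints REQ_GET */
		return 0;
	}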
> +		static const char * const vu_request_str[] = {
> +			REQ(VHOST_USER_NONE),
> +			REQ(VHOST_USER_GET_FEATURES),
> +			REQ(VHOST_USER_SET_FEATURES),
> +			REQ(VHOST_USER_SET_OWNER),
> +			REQ(VHOST_USER_RESET_OWNER),
> +			REQ(VHOST_USER_SET_MEM_TABLE),
> +			REQ(VHOST_USER_SET_LOG_BASE),
> +			REQ(VHOST_USER_SET_LOG_FD),
> +			REQ(VHOST_USER_SET_VRING_NUM),
> +			REQ(VHOST_USER_SET_VRING_ADDR),
> +			REQ(VHOST_USER_SET_VRING_BASE),
> +			REQ(VHOST_USER_GET_VRING_BASE),
> +			REQ(VHOST_USER_SET_VRING_KICK),
> +			REQ(VHOST_USER_SET_VRING_CALL),
> +			REQ(VHOST_USER_SET_VRING_ERR),
> +			REQ(VHOST_USER_GET_PROTOCOL_FEATURES),
> +			REQ(VHOST_USER_SET_PROTOCOL_FEATURES),
> +			REQ(VHOST_USER_GET_QUEUE_NUM),
> +			REQ(VHOST_USER_SET_VRING_ENABLE),
> +			REQ(VHOST_USER_SEND_RARP),
> +			REQ(VHOST_USER_NET_SET_MTU),
> +			REQ(VHOST_USER_SET_BACKEND_REQ_FD),
> +			REQ(VHOST_USER_IOTLB_MSG),
> +			REQ(VHOST_USER_SET_VRING_ENDIAN),
> +			REQ(VHOST_USER_GET_CONFIG),
> +			REQ(VHOST_USER_SET_CONFIG),
> +			REQ(VHOST_USER_POSTCOPY_ADVISE),
> +			REQ(VHOST_USER_POSTCOPY_LISTEN),
> +			REQ(VHOST_USER_POSTCOPY_END),
> +			REQ(VHOST_USER_GET_INFLIGHT_FD),
> +			REQ(VHOST_USER_SET_INFLIGHT_FD),
> +			REQ(VHOST_USER_GPU_SET_SOCKET),
> +			REQ(VHOST_USER_VRING_KICK),
> +			REQ(VHOST_USER_GET_MAX_MEM_SLOTS),
> +			REQ(VHOST_USER_ADD_MEM_REG),
> +			REQ(VHOST_USER_REM_MEM_REG),
> +			REQ(VHOST_USER_MAX),

REQ(VHOST_USER_MAX) isn't really needed here, you check it's less than
that.

> +		};
> +#undef REQ
> +		return vu_request_str[req];
> +	}
> +
> +	return "unknown";
> +}
> +
> +/**
> + * qva_to_va() - Translate front-end (QEMU) virtual address to our virtual
> + *		 address
> + * @dev:		Vhost-user device

vhost-user device

> + * @qemu_addr:		front-end userspace address
> + *
> + * Return: the memory address in our process virtual address space.
> + */
> +static void *qva_to_va(struct vu_dev *dev, uint64_t qemu_addr)
> +{
> +	unsigned int i;
> +
> +	/* Find matching memory region. */
> +	for (i = 0; i < dev->nregions; i++) {
> +		const struct vu_dev_region *r = &dev->regions[i];
> +
> +		if ((qemu_addr >= r->qva) && (qemu_addr < (r->qva + r->size))) {
> +			/* NOLINTNEXTLINE(performance-no-int-to-ptr) */
> +			return (void *)(qemu_addr - r->qva + r->mmap_addr +
> +					r->mmap_offset);
> +		}
> +	}

Not a strong preference, only if you find this convenient: this could
be vu_gpa_to_va() if it optionally took NULL as plen (in that case, you
wouldn't use it, or set it).

> +
> +	return NULL;
> +}
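A sketch of what I mean, reusing the region field names from this patch
(the plen semantics are my assumption, not the final interface): one
lookup routine, with an optional output for the length left in the
matching region:

	static void *vu_gpa_to_va(struct vu_dev *dev, uint64_t *plen,
				  uint64_t addr)
	{
		unsigned int i;

		for (i = 0; i < dev->nregions; i++) {
			const struct vu_dev_region *r = &dev->regions[i];

			if (addr < r->gpa || addr >= r->gpa + r->size)
				continue;

			/* Clamp *plen to what's left in this region,
			 * but only if the caller asked for it */
			if (plen && *plen > r->gpa + r->size - addr)
				*plen = r->gpa + r->size - addr;

			/* NOLINTNEXTLINE(performance-no-int-to-ptr) */
			return (void *)(addr - r->gpa + r->mmap_addr +
					r->mmap_offset);
		}

		return NULL;
	}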
> +
> +/**
> + * vmsg_close_fds() - Close all file descriptors of a given message
> + * @vmsg:	Vhost-user message with the list of the file descriptors

vhost-user

> + */
> +static void vmsg_close_fds(const struct vhost_user_msg *vmsg)
> +{
> +	int i;
> +
> +	for (i = 0; i < vmsg->fd_num; i++)
> +		close(vmsg->fds[i]);
> +}
> +
> +/**
> + * vu_remove_watch() - Remove a file descriptor from an our passt epoll

s/an //

> + *		       file descriptor
> + * @vdev:	Vhost-user device
> + * @fd:		file descriptor to remove
> + */
> +static void vu_remove_watch(const struct vu_dev *vdev, int fd)
> +{
> +	(void)vdev;
> +	(void)fd;
> +}
> +
> +/**
> + * vmsg_set_reply_u64() - Set reply payload.u64 and clear request flags
> + *			  and fd_num
> + * @vmsg:	Vhost-user message

vhost-user

> + * @val:	64bit value to reply

64-bit

> + */
> +static void vmsg_set_reply_u64(struct vhost_user_msg *vmsg, uint64_t val)
> +{
> +	vmsg->hdr.flags = 0; /* defaults will be set by vu_send_reply() */
> +	vmsg->hdr.size = sizeof(vmsg->payload.u64);
> +	vmsg->payload.u64 = val;
> +	vmsg->fd_num = 0;
> +}
> +
> +/**
> + * vu_message_read_default() - Read incoming vhost-user message from the
> + *			       front-end
> + * @conn_fd:	Vhost-user command socket
> + * @vmsg:	Vhost-user message

vhost-user

> + *
> + * Return: -1 there is an error,
> + *          0 if recvmsg() has been interrupted, or if there's no data to read
> + *          1 if a message has been received
> + */
> +static int vu_message_read_default(int conn_fd, struct vhost_user_msg *vmsg)
> +{
> +	char control[CMSG_SPACE(VHOST_MEMORY_BASELINE_NREGIONS *
> +		     sizeof(int))] = { 0 };
> +	struct iovec iov = {
> +		.iov_base = (char *)vmsg,
> +		.iov_len = VHOST_USER_HDR_SIZE,
> +	};
> +	struct msghdr msg = {
> +		.msg_iov = &iov,
> +		.msg_iovlen = 1,
> +		.msg_control = control,
> +		.msg_controllen = sizeof(control),
> +	};
> +	ssize_t ret, sz_payload;
> +	struct cmsghdr *cmsg;
> +	size_t fd_size;
> +
> +	ret = recvmsg(conn_fd, &msg, MSG_DONTWAIT);
> +	if (ret < 0) {
> +		if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK)
> +			return 0;
> +		return -1;
> +	}
> +
> +	vmsg->fd_num = 0;
> +	for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
> +	     cmsg = CMSG_NXTHDR(&msg, cmsg)) {
> +		if (cmsg->cmsg_level == SOL_SOCKET &&
> +		    cmsg->cmsg_type == SCM_RIGHTS) {
> +			fd_size = cmsg->cmsg_len - CMSG_LEN(0);
> +			ASSERT(fd_size / sizeof(int) <=
> +			       VHOST_MEMORY_BASELINE_NREGIONS);
> +			vmsg->fd_num = fd_size / sizeof(int);
> +			memcpy(vmsg->fds, CMSG_DATA(cmsg), fd_size);

Coverity doesn't quite like the fact that fd_size is used without an
appropriate check. If sizeof(int) is 4, VHOST_MEMORY_BASELINE_NREGIONS
is 8, and fd_size is 35, we'll pass the ASSERT(), because 35 / 4 = 8,
but we'll have three extra bytes here.
This looks safer:

			size_t fd_size;

			ASSERT(cmsg->cmsg_len >= CMSG_LEN(0));
			fd_size = cmsg->cmsg_len - CMSG_LEN(0);
			ASSERT(fd_size <=
			       VHOST_MEMORY_BASELINE_NREGIONS * sizeof(int));
			vmsg->fd_num = fd_size / sizeof(int);
			memcpy(vmsg->fds, CMSG_DATA(cmsg), fd_size);

or even:

			ASSERT(fd_size <=
			       sizeof(((struct vhost_user_msg *)0)->fds));

> +			break;
> +		}
> +	}
> +
> +	sz_payload = vmsg->hdr.size;
> +	if ((size_t)sz_payload > sizeof(vmsg->payload)) {
> +		die("Error: too big message request: %d,"

It's not clear that it's about a vhost-user message, perhaps:

	vhost-user message request too big: ...

> +		    " size: vmsg->size: %zd, "
> +		    "while sizeof(vmsg->payload) = %zu",
> +		    vmsg->hdr.request, sz_payload, sizeof(vmsg->payload));
> +	}
> +
> +	if (sz_payload) {
> +		do {
> +			ret = recv(conn_fd, &vmsg->payload, sz_payload, 0);
> +		} while (ret < 0 && (errno == EINTR || errno == EAGAIN));

No need for curly brackets, it's a one-line statement.

> +
> +		if (ret < sz_payload)
> +			die_perror("Error while reading");

errno will not necessarily indicate _this_ error, here, because you can
also hit this with a positive, or zero value. And I'm not sure if
partial reads are a risk, but if they are, you should keep a count of
how much you read, say:

	for (n = 0; n < sz_payload; n += rc) {
		rc = recv(conn_fd, (char *)&vmsg->payload + n,
			  sz_payload - n, 0);
		if (rc < 0) {
			if (errno != EINTR && errno != EAGAIN)
				die_perror("vhost-user message receive");
			rc = 0;
			continue;
		}

		if (rc == 0)
			die("EOF on vhost-user message receive");
	}

By the way, the socket is actually blocking, and if you really meant to
keep it blocking, you'll never get EAGAIN, and you don't need to loop.
Same for the first recvmsg() in this function, you wouldn't need to
check for EAGAIN or EWOULDBLOCK.

> +	}
> +
> +	return 1;
> +}
> +
> +/**
> + * vu_message_write() - send a message to the front-end

Send

> + * @conn_fd:	Vhost-user command socket
> + * @vmsg:	Vhost-user message

vhost-user

> + *
> + * #syscalls:vu sendmsg
> + */
> +static void vu_message_write(int conn_fd, struct vhost_user_msg *vmsg)
> +{
> +	char control[CMSG_SPACE(VHOST_MEMORY_BASELINE_NREGIONS * sizeof(int))] = { 0 };
> +	struct iovec iov = {
> +		.iov_base = (char *)vmsg,
> +		.iov_len = VHOST_USER_HDR_SIZE,
> +	};
> +	struct msghdr msg = {
> +		.msg_iov = &iov,
> +		.msg_iovlen = 1,
> +		.msg_control = control,
> +	};
> +	const uint8_t *p = (uint8_t *)vmsg;
> +	int rc;
> +
> +	memset(control, 0, sizeof(control));
> +	ASSERT(vmsg->fd_num <= VHOST_MEMORY_BASELINE_NREGIONS);
> +	if (vmsg->fd_num > 0) {
> +		size_t fdsize = vmsg->fd_num * sizeof(int);
> +		struct cmsghdr *cmsg;
> +
> +		msg.msg_controllen = CMSG_SPACE(fdsize);
> +		cmsg = CMSG_FIRSTHDR(&msg);
> +		cmsg->cmsg_len = CMSG_LEN(fdsize);
> +		cmsg->cmsg_level = SOL_SOCKET;
> +		cmsg->cmsg_type = SCM_RIGHTS;
> +		memcpy(CMSG_DATA(cmsg), vmsg->fds, fdsize);
> +	} else {
> +		msg.msg_controllen = 0;
> +	}
> +
> +	do {
> +		rc = sendmsg(conn_fd, &msg, 0);
> +	} while (rc < 0 && (errno == EINTR || errno == EAGAIN));

Same as above: if you keep the socket blocking, you don't need to check
for EAGAIN...
> +
> +	if (vmsg->hdr.size) {
> +		do {
> +			rc = write(conn_fd, p + VHOST_USER_HDR_SIZE,
> +				   vmsg->hdr.size);
> +		} while (rc < 0 && (errno == EINTR || errno == EAGAIN));

and you don't need to loop here, either.

> +	}
> +
> +	if (rc <= 0)
> +		die_perror("Error while writing");

"vhost-user message send"?

> +}
> +
> +/**
> + * vu_send_reply() - Update message flags and send it to front-end
> + * @conn_fd:	Vhost-user command socket
> + * @vmsg:	Vhost-user message
> + */
> +static void vu_send_reply(int conn_fd, struct vhost_user_msg *msg)
> +{
> +	msg->hdr.flags &= ~VHOST_USER_VERSION_MASK;
> +	msg->hdr.flags |= VHOST_USER_VERSION;
> +	msg->hdr.flags |= VHOST_USER_REPLY_MASK;
> +
> +	vu_message_write(conn_fd, msg);
> +}
> +
> +/**
> + * vu_get_features_exec() - Provide back-end features bitmask to front-end
> + * @vmsg:	Vhost-user message
> + *
> + * Return: true as a reply is requested
> + */
> +static bool vu_get_features_exec(struct vhost_user_msg *msg)
> +{
> +	uint64_t features =
> +		1ULL << VIRTIO_F_VERSION_1 |
> +		1ULL << VIRTIO_NET_F_MRG_RXBUF |
> +		1ULL << VHOST_USER_F_PROTOCOL_FEATURES;
> +
> +	vmsg_set_reply_u64(msg, features);
> +
> +	debug("Sending back to guest u64: 0x%016"PRIx64, msg->payload.u64);
> +
> +	return true;
> +}
> +
> +/**
> + * vu_set_enable_all_rings() - Enable/disable all the virtqueues
> + * @vdev:	Vhost-user device
> + * @enable:	New virtqueues state
> + */
> +static void vu_set_enable_all_rings(struct vu_dev *vdev, bool enable)
> +{
> +	uint16_t i;
> +
> +	for (i = 0; i < VHOST_USER_MAX_QUEUES; i++)
> +		vdev->vq[i].enable = enable;
> +}
> +
> +/**
> + * vu_set_features_exec() - Enable features of the back-end
> + * @vdev:	Vhost-user device
> + * @vmsg:	Vhost-user message

vhost-user

> + *
> + * Return: false as no reply is requested
> + */
> +static bool vu_set_features_exec(struct vu_dev *vdev,
> +				 struct vhost_user_msg *msg)
> +{
> +	debug("u64: 0x%016"PRIx64, msg->payload.u64);
> +
> +	vdev->features = msg->payload.u64;
> +	/* We only support devices conforming to VIRTIO 1.0 or
> +	 * later
> +	 */
> +	if (!vu_has_feature(vdev, VIRTIO_F_VERSION_1))
> +		die("virtio legacy devices aren't supported by passt");
> +
> +	if (!vu_has_feature(vdev, VHOST_USER_F_PROTOCOL_FEATURES))
> +		vu_set_enable_all_rings(vdev, true);
> +
> +	/* virtio-net features */
> +
> +	if (vu_has_feature(vdev, VIRTIO_F_VERSION_1) ||
> +	    vu_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF)) {
> +		vdev->hdrlen = sizeof(struct virtio_net_hdr_mrg_rxbuf);
> +	} else {
> +		vdev->hdrlen = sizeof(struct virtio_net_hdr);
> +	}
> +
> +	return false;
> +}
> +
> +/**
> + * vu_set_owner_exec() - Session start flag, do nothing in our case
> + *
> + * Return: false as no reply is requested
> + */
> +static bool vu_set_owner_exec(void)
> +{
> +	return false;
> +}
> +
> +/**
> + * map_ring() - Convert ring front-end (QEMU) addresses to our process
> + *		virtual address space.
> + * @vdev:	Vhost-user device

vhost-user

> + * @vq:		Virtqueue
> + *
> + * Return: true if ring cannot be mapped to our address space
> + */
> +static bool map_ring(struct vu_dev *vdev, struct vu_virtq *vq)
> +{
> +	vq->vring.desc = qva_to_va(vdev, vq->vra.desc_user_addr);
> +	vq->vring.used = qva_to_va(vdev, vq->vra.used_user_addr);
> +	vq->vring.avail = qva_to_va(vdev, vq->vra.avail_user_addr);
> +
> +	debug("Setting virtq addresses:");
> +	debug("    vring_desc  at %p", (void *)vq->vring.desc);
> +	debug("    vring_used  at %p", (void *)vq->vring.used);
> +	debug("    vring_avail at %p", (void *)vq->vring.avail);
> +
> +	return !(vq->vring.desc && vq->vring.used && vq->vring.avail);
> +}
> +
> +/**
> + * vu_packet_check_range() - Check if a given memory zone is contained in
> + *			     a mapped guest memory region
> + * @buf:	Array of the available memory regions
> + * @offset:	Offset of data range in packet descriptor
> + * @size:	Length of desired data range
> + * @start:	Start of the packet descriptor
> + *
> + * Return: 0 if the zone in a mapped memory region, -1 otherwise

s/in/is in/

> + */
> +/* cppcheck-suppress unusedFunction */
> +int vu_packet_check_range(void *buf, size_t offset, size_t len,
> +			  const char *start)
> +{
> +	struct vu_dev_region *dev_region;
> +
> +	for (dev_region = buf; dev_region->mmap_addr; dev_region++) {
> +		/* NOLINTNEXTLINE(performance-no-int-to-ptr) */
> +		char *m = (char *)dev_region->mmap_addr;
> +
> +		if (m <= start &&
> +		    start + offset + len < m + dev_region->mmap_offset +

Shouldn't this be <= as well? If the packet length matches the size of
the region, we're not out of it.

> +					   dev_region->size)
> +			return 0;
> +	}
> +
> +	return -1;
> +}
> +
> +/**
> + * vu_set_mem_table_exec() - Sets the memory map regions to be able to
> + *			     translate the vring addresses.
> + * @vdev:	Vhost-user device
> + * @vmsg:	Vhost-user message
> + *
> + * Return: false as no reply is requested
> + *
> + * #syscalls:vu mmap munmap

As I mentioned in my comments to 2/4: it would be great if we could
assume a model where this function is invoked during initialisation,
and then we go ahead and apply an appropriate seccomp profile. I'm not
sure if it's possible.

If it helps: seccomp-bpf profiles can be appended, so we could also
allow mmap() until this function is called, and then have an extra jump
at the end of the BPF filter where, after this function is called, we
add one instruction denying mmap(). If it's called again, we would
report an error.
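A minimal sketch of the appended-filter idea (assuming an architecture
where mmap() is __NR_mmap, and that no_new_privs is already set, which
passt does at isolation time): filters stack, and the kernel applies
the most restrictive result, so this only ever narrows the profile
installed at start-up:

	#include <stddef.h>
	#include <errno.h>
	#include <sys/prctl.h>
	#include <sys/syscall.h>
	#include <linux/filter.h>
	#include <linux/seccomp.h>

	/* Stack a second BPF program failing any further mmap() with
	 * EPERM. A real filter would also validate seccomp_data.arch.
	 */
	static int vu_deny_mmap(void)
	{
		struct sock_filter filter[] = {
			BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
				 offsetof(struct seccomp_data, nr)),
			BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_mmap, 0, 1),
			BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | EPERM),
			BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
		};
		struct sock_fprog prog = {
			.len = (unsigned short)(sizeof(filter) /
						sizeof(filter[0])),
			.filter = filter,
		};

		return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
	}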
> + */
> +static bool vu_set_mem_table_exec(struct vu_dev *vdev,
> +				  struct vhost_user_msg *msg)
> +{
> +	struct vhost_user_memory m = msg->payload.memory, *memory = &m;
> +	unsigned int i;
> +
> +	for (i = 0; i < vdev->nregions; i++) {
> +		struct vu_dev_region *r = &vdev->regions[i];
> +		/* NOLINTNEXTLINE(performance-no-int-to-ptr) */
> +		void *mm = (void *)r->mmap_addr;
> +
> +		if (mm)
> +			munmap(mm, r->size + r->mmap_offset);
> +	}
> +	vdev->nregions = memory->nregions;
> +
> +	debug("Nregions: %u", memory->nregions);

It's debug(), so it doesn't need to be perfectly clear, but still it
would be nice to prefix this and "Region" below with "vhost-user".

> +	for (i = 0; i < vdev->nregions; i++) {
> +		struct vhost_user_memory_region *msg_region = &memory->regions[i];
> +		struct vu_dev_region *dev_region = &vdev->regions[i];
> +		void *mmap_addr;
> +
> +		debug("Region %d", i);
> +		debug("    guest_phys_addr: 0x%016"PRIx64,
> +		      msg_region->guest_phys_addr);
> +		debug("    memory_size:     0x%016"PRIx64,
> +		      msg_region->memory_size);
> +		debug("    userspace_addr   0x%016"PRIx64,
> +		      msg_region->userspace_addr);
> +		debug("    mmap_offset      0x%016"PRIx64,
> +		      msg_region->mmap_offset);
> +
> +		dev_region->gpa = msg_region->guest_phys_addr;
> +		dev_region->size = msg_region->memory_size;
> +		dev_region->qva = msg_region->userspace_addr;
> +		dev_region->mmap_offset = msg_region->mmap_offset;
> +
> +		/* We don't use offset argument of mmap() since the
> +		 * mapped address has to be page aligned, and we use huge
> +		 * pages.
> +		 */
> +		mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
> +				 PROT_READ | PROT_WRITE, MAP_SHARED |
> +				 MAP_NORESERVE, msg->fds[i], 0);
> +
> +		if (mmap_addr == MAP_FAILED)
> +			die_perror("region mmap error");

Also here, "vhost-user region...".

> +
> +		dev_region->mmap_addr = (uint64_t)(uintptr_t)mmap_addr;
> +		debug("    mmap_addr:       0x%016"PRIx64,
> +		      dev_region->mmap_addr);
> +
> +		close(msg->fds[i]);
> +	}
> +
> +	for (i = 0; i < VHOST_USER_MAX_QUEUES; i++) {
> +		if (vdev->vq[i].vring.desc) {
> +			if (map_ring(vdev, &vdev->vq[i]))
> +				die("remapping queue %d during setmemtable", i);
> +		}
> +	}
> +
> +	return false;
> +}
> +
> +/**
> + * vu_set_vring_num_exec() - Set the size of the queue (vring size)
> + * @vdev:	Vhost-user device
> + * @vmsg:	Vhost-user message
> + *
> + * Return: false as no reply is requested
> + */
> +static bool vu_set_vring_num_exec(struct vu_dev *vdev,
> +				  struct vhost_user_msg *msg)
> +{
> +	unsigned int idx = msg->payload.state.index;
> +	unsigned int num = msg->payload.state.num;
> +
> +	debug("State.index: %u", idx);
> +	debug("State.num:   %u", num);
> +	vdev->vq[idx].vring.num = num;
> +
> +	return false;
> +}
> +
> +/**
> + * vu_set_vring_addr_exec() - Set the addresses of the vring
> + * @vdev:	Vhost-user device
> + * @vmsg:	Vhost-user message
> + *
> + * Return: false as no reply is requested
> + */
> +static bool vu_set_vring_addr_exec(struct vu_dev *vdev,
> +				   struct vhost_user_msg *msg)
> +{
> +	struct vhost_vring_addr addr = msg->payload.addr, *vra = &addr;
> +	struct vu_virtq *vq = &vdev->vq[vra->index];
> +
> +	debug("vhost_vring_addr:");
> +	debug("    index:           %d", vra->index);
> +	debug("    flags:           %d", vra->flags);
> +	debug("    desc_user_addr:  0x%016" PRIx64, (uint64_t)vra->desc_user_addr);
> +	debug("    used_user_addr:  0x%016" PRIx64, (uint64_t)vra->used_user_addr);
> +	debug("    avail_user_addr: 0x%016" PRIx64, (uint64_t)vra->avail_user_addr);
> +	debug("    log_guest_addr:  0x%016" PRIx64, (uint64_t)vra->log_guest_addr);
> +
> +	vq->vra = *vra;
> +	vq->vring.flags = vra->flags;
> +	vq->vring.log_guest_addr = vra->log_guest_addr;
> +
> +	if (map_ring(vdev, vq))
> +		die("Invalid vring_addr message");
> +
> +	vq->used_idx = le16toh(vq->vring.used->idx);
> +
> +	if (vq->last_avail_idx != vq->used_idx) {
> +		debug("Last avail index != used index: %u != %u",
> +		      vq->last_avail_idx, vq->used_idx);
> +	}
> +
> +	return false;
> +}
> +/**
> + * vu_set_vring_base_exec() - Sets the next index to use for descriptors
> + *			      in this vring
> + * @vdev:	Vhost-user device
> + * @vmsg:	Vhost-user message
> + *
> + * Return: false as no reply is requested
> + */
> +static bool vu_set_vring_base_exec(struct vu_dev *vdev,
> +				   struct vhost_user_msg *msg)
> +{
> +	unsigned int idx = msg->payload.state.index;
> +	unsigned int num = msg->payload.state.num;
> +
> +	debug("State.index: %u", idx);
> +	debug("State.num:   %u", num);
> +	vdev->vq[idx].shadow_avail_idx = vdev->vq[idx].last_avail_idx = num;
> +
> +	return false;
> +}
> +
> +/**
> + * vu_get_vring_base_exec() - Stops the vring and returns the current
> + *			      descriptor index or indices
> + * @vdev:	Vhost-user device
> + * @vmsg:	Vhost-user message
> + *
> + * Return: false as a reply is requested

True, then. :)

> + */
> +static bool vu_get_vring_base_exec(struct vu_dev *vdev,
> +				   struct vhost_user_msg *msg)
> +{
> +	unsigned int idx = msg->payload.state.index;
> +
> +	debug("State.index: %u", idx);
> +	msg->payload.state.num = vdev->vq[idx].last_avail_idx;
> +	msg->hdr.size = sizeof(msg->payload.state);
> +
> +	vdev->vq[idx].started = false;
> +
> +	if (vdev->vq[idx].call_fd != -1) {
> +		close(vdev->vq[idx].call_fd);
> +		vdev->vq[idx].call_fd = -1;
> +	}
> +	if (vdev->vq[idx].kick_fd != -1) {
> +		vu_remove_watch(vdev, vdev->vq[idx].kick_fd);
> +		close(vdev->vq[idx].kick_fd);
> +		vdev->vq[idx].kick_fd = -1;
> +	}
> +
> +	return true;
> +}
> +
> +/**
> + * vu_set_watch() - Add a file descriptor to the passt epoll file descriptor
> + * @vdev:	vhost-user device
> + * @fd:		file descriptor to add
> + */
> +static void vu_set_watch(const struct vu_dev *vdev, int fd)
> +{
> +	(void)vdev;
> +	(void)fd;
> +}
> +
> +/**
> + * vu_wait_queue() - wait new free entries in the virtqueue
> + * @vq:		virtqueue to wait on
> + */
> +static int vu_wait_queue(const struct vu_virtq *vq)
> +{
> +	eventfd_t kick_data;
> +	ssize_t rc;
> +	int status;
> +
> +	/* wait the kernel to put new entries in the queue */

s/the/for the/

> +	status = fcntl(vq->kick_fd, F_GETFL);
> +	if (status == -1)
> +		return -1;
> +
> +	status = fcntl(vq->kick_fd, F_SETFL, status & ~O_NONBLOCK);

This value is not used, the function could be a bit shorter by omitting
the store,

	if (fcntl(...))
		return -1;

> +	if (status == -1)
> +		return -1;
> +	rc = eventfd_read(vq->kick_fd, &kick_data);
> +	status = fcntl(vq->kick_fd, F_SETFL, status);

Same here.

> +	if (status == -1)
> +		return -1;
> +
> +	if (rc == -1)
> +		return -1;
> +
> +	return 0;
> +}
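Putting both remarks together, the whole function could look like this
(just a sketch): the stores are gone, and the original flags, which the
current version loses by reusing 'status' for the fcntl() return value,
are kept around for the restore:

	static int vu_wait_queue(const struct vu_virtq *vq)
	{
		eventfd_t kick_data;
		ssize_t rc;
		int flags;

		/* wait for the kernel to put new entries in the queue */
		flags = fcntl(vq->kick_fd, F_GETFL);
		if (flags == -1)
			return -1;

		if (fcntl(vq->kick_fd, F_SETFL, flags & ~O_NONBLOCK) == -1)
			return -1;

		rc = eventfd_read(vq->kick_fd, &kick_data);

		/* restore the original flags, not a fcntl() return value */
		if (fcntl(vq->kick_fd, F_SETFL, flags) == -1 || rc == -1)
			return -1;

		return 0;
	}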
> +
> +/**
> + * vu_send() - Send a buffer to the front-end using the RX virtqueue
> + * @vdev:	vhost-user device
> + * @buf:	address of the buffer
> + * @size:	size of the buffer
> + *
> + * Return: number of bytes sent, -1 if there is an error
> + */
> +/* cppcheck-suppress unusedFunction */
> +int vu_send(struct vu_dev *vdev, const void *buf, size_t size)
> +{
> +	struct vu_virtq *vq = &vdev->vq[VHOST_USER_RX_QUEUE];
> +	struct vu_virtq_element elem[VIRTQUEUE_MAX_SIZE];
> +	struct iovec in_sg[VIRTQUEUE_MAX_SIZE];
> +	size_t lens[VIRTQUEUE_MAX_SIZE];
> +	__virtio16 *num_buffers_ptr = NULL;
> +	size_t hdrlen = vdev->hdrlen;
> +	int in_sg_count = 0;
> +	size_t offset = 0;
> +	int i = 0, j;
> +
> +	debug("vu_send size %zu hdrlen %zu", size, hdrlen);
> +
> +	if (!vu_queue_enabled(vq) || !vu_queue_started(vq)) {
> +		err("Got packet, but no available descriptors on RX virtq.");
> +		return 0;
> +	}
> +
> +	while (offset < size) {
> +		size_t len;
> +		int total;
> +		int ret;
> +
> +		total = 0;
> +
> +		if (i == ARRAY_SIZE(elem) ||
> +		    in_sg_count == ARRAY_SIZE(in_sg)) {
> +			err("virtio-net unexpected long buffer chain");
> +			goto err;
> +		}
> +
> +		elem[i].out_num = 0;
> +		elem[i].out_sg = NULL;
> +		elem[i].in_num = ARRAY_SIZE(in_sg) - in_sg_count;
> +		elem[i].in_sg = &in_sg[in_sg_count];
> +
> +		ret = vu_queue_pop(vdev, vq, &elem[i]);
> +		if (ret < 0) {
> +			if (vu_wait_queue(vq) != -1)
> +				continue;
> +			if (i) {
> +				err("virtio-net unexpected empty queue: "
> +				    "i %d mergeable %d offset %zd, size %zd, "
> +				    "features 0x%" PRIx64,
> +				    i, vu_has_feature(vdev,
> +						      VIRTIO_NET_F_MRG_RXBUF),
> +				    offset, size, vdev->features);
> +			}
> +			offset = -1;
> +			goto err;
> +		}
> +		in_sg_count += elem[i].in_num;
> +
> +		if (elem[i].in_num < 1) {
> +			err("virtio-net receive queue contains no in buffers");
> +			vu_queue_detach_element(vq);
> +			offset = -1;
> +			goto err;
> +		}
> +
> +		if (i == 0) {
> +			struct virtio_net_hdr hdr = {
> +				.flags = VIRTIO_NET_HDR_F_DATA_VALID,
> +				.gso_type = VIRTIO_NET_HDR_GSO_NONE,
> +			};
> +
> +			ASSERT(offset == 0);
> +			ASSERT(elem[i].in_sg[0].iov_len >= hdrlen);
> +
> +			len = iov_from_buf(elem[i].in_sg, elem[i].in_num, 0,
> +					   &hdr, sizeof(hdr));
> +
> +			num_buffers_ptr = (__virtio16 *)((char *)elem[i].in_sg[0].iov_base +
> +							 len);
> +
> +			total += hdrlen;
> +		}
> +
> +		len = iov_from_buf(elem[i].in_sg, elem[i].in_num, total,
> +				   (char *)buf + offset, size - offset);
> +
> +		total += len;
> +		offset += len;
> +
> +		/* If buffers can't be merged, at this point we
> +		 * must have consumed the complete packet.
> +		 * Otherwise, drop it.
> +		 */
> +		if (!vu_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF) &&
> +		    offset < size) {
> +			vu_queue_unpop(vq);
> +			goto err;
> +		}
> +
> +		lens[i] = total;
> +		i++;
> +	}
> +
> +	if (num_buffers_ptr && vu_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
> +		*num_buffers_ptr = htole16(i);
> +
> +	for (j = 0; j < i; j++) {
> +		debug("filling total %zd idx %d", lens[j], j);
> +		vu_queue_fill(vq, &elem[j], lens[j], j);
> +	}
> +
> +	vu_queue_flush(vq, i);
> +	vu_queue_notify(vdev, vq);
> +
> +	debug("vhost-user sent %zu", offset);
> +
> +	return offset;
> +err:
> +	for (j = 0; j < i; j++)
> +		vu_queue_detach_element(vq);
> +
> +	return offset;
> +}
> +
> +/**
> + * vu_handle_tx() - Receive data from the TX virtqueue
> + * @vdev:	vhost-user device
> + * @index:	index of the virtqueue
> + * @now:	Current timestamp
> + */
> +static void vu_handle_tx(struct vu_dev *vdev, int index,
> +			 const struct timespec *now)
> +{
> +	struct vu_virtq_element elem[VIRTQUEUE_MAX_SIZE];
> +	struct iovec out_sg[VIRTQUEUE_MAX_SIZE];
> +	struct vu_virtq *vq = &vdev->vq[index];
> +	int hdrlen = vdev->hdrlen;
> +	int out_sg_count;
> +	int count;
> +
> +	if (!VHOST_USER_IS_QUEUE_TX(index)) {
> +		debug("index %d is not a TX queue", index);
> +		return;
> +	}
> +
> +	tap_flush_pools();
> +
> +	count = 0;
> +	out_sg_count = 0;
> +	while (1) {
> +		int ret;
> +
> +

Excess newline.

> +		elem[count].out_num = 1;
> +		elem[count].out_sg = &out_sg[out_sg_count];
> +		elem[count].in_num = 0;
> +		elem[count].in_sg = NULL;
> +		ret = vu_queue_pop(vdev, vq, &elem[count]);
> +		if (ret < 0)
> +			break;

Perhaps I already asked but I can't remember/find the conclusion.
Shouldn't we assign a budget limit to this function, so that we break
the loop after a maximum number (1024?) of descriptors, to guarantee
some amount of fairness?
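That is, something like this sketch (name and value made up): the loop
head becomes bounded, and whatever is left in the queue is picked up on
the next kick, since the eventfd stays readable:

	#define VHOST_USER_TX_BUDGET	1024

		while (count < VHOST_USER_TX_BUDGET) {
			int ret;

			elem[count].out_num = 1;
			elem[count].out_sg = &out_sg[out_sg_count];
			elem[count].in_num = 0;
			elem[count].in_sg = NULL;

			ret = vu_queue_pop(vdev, vq, &elem[count]);
			if (ret < 0)
				break;

			/* ...process the descriptor as below... */
		}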
> + > +=09debug("Got kick_data: %016"PRIx64" idx:%d", > +=09 kick_data, idx); > +=09if (VHOST_USER_IS_QUEUE_TX(idx)) > +=09=09vu_handle_tx(vdev, idx, now); > +} > + > +/** > + * vu_check_queue_msg_file() - Check if a message is valid, > + * =09=09=09 close fds if NOFD bit is set > + * @vmsg:=09Vhost-user message vhost-user > + */ > +static void vu_check_queue_msg_file(struct vhost_user_msg *msg) > +{ > +=09bool nofd =3D msg->payload.u64 & VHOST_USER_VRING_NOFD_MASK; > +=09int idx =3D msg->payload.u64 & VHOST_USER_VRING_IDX_MASK; > + > +=09if (idx >=3D VHOST_USER_MAX_QUEUES) > +=09=09die("Invalid queue index: %u", idx); Invalid vhost-user queue... > + > +=09if (nofd) { > +=09=09vmsg_close_fds(msg); > +=09=09return; > +=09} > + > +=09if (msg->fd_num !=3D 1) > +=09=09die("Invalid fds in request: %d", msg->hdr.request); in vhost-user request... > +} > + > +/** > + * vu_set_vring_kick_exec() - Set the event file descriptor for adding b= uffers > + * =09=09=09 to the vring > + * @vdev:=09Vhost-user device > + * @vmsg:=09Vhost-user message vhost-user > + * > + * Return: false as no reply is requested > + */ > +static bool vu_set_vring_kick_exec(struct vu_dev *vdev, > +=09=09=09=09 struct vhost_user_msg *msg) > +{ > +=09bool nofd =3D msg->payload.u64 & VHOST_USER_VRING_NOFD_MASK; > +=09int idx =3D msg->payload.u64 & VHOST_USER_VRING_IDX_MASK; > + > +=09debug("u64: 0x%016"PRIx64, msg->payload.u64); > + > +=09vu_check_queue_msg_file(msg); > + > +=09if (vdev->vq[idx].kick_fd !=3D -1) { > +=09=09vu_remove_watch(vdev, vdev->vq[idx].kick_fd); > +=09=09close(vdev->vq[idx].kick_fd); > +=09} > + > +=09vdev->vq[idx].kick_fd =3D nofd ? -1 : msg->fds[0]; > +=09debug("Got kick_fd: %d for vq: %d", vdev->vq[idx].kick_fd, idx); > + > +=09vdev->vq[idx].started =3D true; > + > +=09if (vdev->vq[idx].kick_fd !=3D -1 && VHOST_USER_IS_QUEUE_TX(idx)) { > +=09=09vu_set_watch(vdev, vdev->vq[idx].kick_fd); > +=09=09debug("Waiting for kicks on fd: %d for vq: %d", > +=09=09 vdev->vq[idx].kick_fd, idx); > +=09} > + > +=09return false; > +} > + > +/** > + * vu_set_vring_call_exec() - Set the event file descriptor to signal wh= en > + * =09=09=09 buffers are used > + * @vdev:=09Vhost-user device > + * @vmsg:=09Vhost-user message vhost-user > + * > + * Return: false as no reply is requested > + */ > +static bool vu_set_vring_call_exec(struct vu_dev *vdev, > +=09=09=09=09 struct vhost_user_msg *msg) > +{ > +=09bool nofd =3D msg->payload.u64 & VHOST_USER_VRING_NOFD_MASK; > +=09int idx =3D msg->payload.u64 & VHOST_USER_VRING_IDX_MASK; > + > +=09debug("u64: 0x%016"PRIx64, msg->payload.u64); > + > +=09vu_check_queue_msg_file(msg); > + > +=09if (vdev->vq[idx].call_fd !=3D -1) > +=09=09close(vdev->vq[idx].call_fd); > + > +=09vdev->vq[idx].call_fd =3D nofd ? 
> +
> +	/* in case of I/O hang after reconnecting */
> +	if (vdev->vq[idx].call_fd != -1)
> +		eventfd_write(msg->fds[0], 1);
> +
> +	debug("Got call_fd: %d for vq: %d", vdev->vq[idx].call_fd, idx);
> +
> +	return false;
> +}
> +
> +/**
> + * vu_set_vring_err_exec() - Set the event file descriptor to signal when
> + *			     error occurs
> + * @vdev:	Vhost-user device
> + * @vmsg:	Vhost-user message

vhost-user

> + *
> + * Return: false as no reply is requested
> + */
> +static bool vu_set_vring_err_exec(struct vu_dev *vdev,
> +				  struct vhost_user_msg *msg)
> +{
> +	bool nofd = msg->payload.u64 & VHOST_USER_VRING_NOFD_MASK;
> +	int idx = msg->payload.u64 & VHOST_USER_VRING_IDX_MASK;
> +
> +	debug("u64: 0x%016"PRIx64, msg->payload.u64);
> +
> +	vu_check_queue_msg_file(msg);
> +
> +	if (vdev->vq[idx].err_fd != -1) {
> +		close(vdev->vq[idx].err_fd);
> +		vdev->vq[idx].err_fd = -1;
> +	}
> +
> +	/* cppcheck-suppress redundantAssignment */
> +	vdev->vq[idx].err_fd = nofd ? -1 : msg->fds[0];

Wouldn't it be easier (and not require a suppression) to say:

	if (!nofd)
		vdev->vq[idx].err_fd = msg->fds[0];

?

> +
> +	return false;
> +}
> +
> +/**
> + * vu_get_protocol_features_exec() - Provide the protocol (vhost-user) features
> + *				     to the front-end
> + * @vdev:	Vhost-user device
> + * @vmsg:	Vhost-user message

vhost-user

> + *
> + * Return: false as a reply is requested

True.

> + */
> +static bool vu_get_protocol_features_exec(struct vhost_user_msg *msg)
> +{
> +	uint64_t features = 1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK;
> +
> +	vmsg_set_reply_u64(msg, features);
> +
> +	return true;
> +}
> +
> +/**
> + * vu_set_protocol_features_exec() - Enable protocol (vhost-user) features
> + * @vdev:	Vhost-user device
> + * @vmsg:	Vhost-user message
> + *
> + * Return: false as no reply is requested
> + */
> +static bool vu_set_protocol_features_exec(struct vu_dev *vdev,
> +					  struct vhost_user_msg *msg)
> +{
> +	uint64_t features = msg->payload.u64;
> +
> +	debug("u64: 0x%016"PRIx64, features);
> +
> +	vdev->protocol_features = msg->payload.u64;
> +
> +	if (vu_has_protocol_feature(vdev,
> +				    VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS) &&

Do we actually care about VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS at
all, I wonder? This whole part (coming from ff1320050a3a
"libvhost-user: implement in-band notifications") is rather hard to
read/understand, so it would be great if we could just get rid of it
altogether.

But if not, sure, let's leave it like the original, I'd say.

> +	    (!vu_has_protocol_feature(vdev, VHOST_USER_PROTOCOL_F_BACKEND_REQ) ||
> +	     !vu_has_protocol_feature(vdev, VHOST_USER_PROTOCOL_F_REPLY_ACK))) {
> +	/*
> +	 * The use case for using messages for kick/call is simulation, to make
> +	 * the kick and call synchronous. To actually get that behaviour, both
> +	 * of the other features are required.
> +	 * Theoretically, one could use only kick messages, or do them without
> +	 * having F_REPLY_ACK, but too many (possibly pending) messages on the
> +	 * socket will eventually cause the master to hang, to avoid this in
> +	 * scenarios where not desired enforce that the settings are in a way
> +	 * that actually enables the simulation case.
> +	 */
> +		die("F_IN_BAND_NOTIFICATIONS requires F_BACKEND_REQ && F_REPLY_ACK");
> +	}
> +
> +	return false;
> +}
> +
> +/**
> + * vu_get_queue_num_exec() - Tell how many queues we support
> + * @vmsg:	Vhost-user message
> + *
> + * Return: true as a reply is requested
> + */
> +static bool vu_get_queue_num_exec(struct vhost_user_msg *msg)
> +{
> +	vmsg_set_reply_u64(msg, VHOST_USER_MAX_QUEUES);
> +	return true;
> +}
> +
> +/**
> + * vu_set_vring_enable_exec() - Enable or disable corresponding vring
> + * @vdev:	Vhost-user device
> + * @vmsg:	Vhost-user message
> + *
> + * Return: false as no reply is requested
> + */
> +static bool vu_set_vring_enable_exec(struct vu_dev *vdev,
> +				     struct vhost_user_msg *msg)
> +{
> +	unsigned int enable = msg->payload.state.num;
> +	unsigned int idx = msg->payload.state.index;
> +
> +	debug("State.index:  %u", idx);
> +	debug("State.enable: %u", enable);
> +
> +	if (idx >= VHOST_USER_MAX_QUEUES)
> +		die("Invalid vring_enable index: %u", idx);
> +
> +	vdev->vq[idx].enable = enable;
> +	return false;
> +}
> +
> +/**
> + * vu_init() - Initialize vhost-user device structure
> + * @c:		execution context
> + * @vdev:	vhost-user device
> + */
> +/* cppcheck-suppress unusedFunction */
> +void vu_init(struct ctx *c, struct vu_dev *vdev)
> +{
> +	int i;
> +
> +	vdev->context = c;
> +	vdev->hdrlen = 0;
> +	for (i = 0; i < VHOST_USER_MAX_QUEUES; i++) {
> +		vdev->vq[i] = (struct vu_virtq){
> +			.call_fd = -1,
> +			.kick_fd = -1,
> +			.err_fd = -1,
> +			.notification = true,
> +		};
> +	}
> +}
> +
> +/**
> + * vu_cleanup() - Reset vhost-user device

On the same topic as mmap(): if we just terminate after one connection
(implying --one-off / -1), we don't need to clean up after ourselves.
> + * @vdev:	vhost-user device
> + */
> +void vu_cleanup(struct vu_dev *vdev)
> +{
> +	unsigned int i;
> +
> +	for (i = 0; i < VHOST_USER_MAX_QUEUES; i++) {
> +		struct vu_virtq *vq = &vdev->vq[i];
> +
> +		vq->started = false;
> +		vq->notification = true;
> +
> +		if (vq->call_fd != -1) {
> +			close(vq->call_fd);
> +			vq->call_fd = -1;
> +		}
> +		if (vq->err_fd != -1) {
> +			close(vq->err_fd);
> +			vq->err_fd = -1;
> +		}
> +		if (vq->kick_fd != -1) {
> +			vu_remove_watch(vdev, vq->kick_fd);
> +			close(vq->kick_fd);
> +			vq->kick_fd = -1;
> +		}
> +
> +		vq->vring.desc = 0;
> +		vq->vring.used = 0;
> +		vq->vring.avail = 0;
> +	}
> +	vdev->hdrlen = 0;
> +
> +	for (i = 0; i < vdev->nregions; i++) {
> +		const struct vu_dev_region *r = &vdev->regions[i];
> +		/* NOLINTNEXTLINE(performance-no-int-to-ptr) */
> +		void *m = (void *)r->mmap_addr;
> +
> +		if (m)
> +			munmap(m, r->size + r->mmap_offset);
> +	}
> +	vdev->nregions = 0;
> +}
> +
> +/**
> + * vu_sock_reset() - Reset connection socket
> + * @vdev:	vhost-user device
> + */
> +static void vu_sock_reset(struct vu_dev *vdev)
> +{
> +	(void)vdev;
> +}
> +
> +/**
> + * tap_handler_vu() - Packet handler for vhost-user
> + * @vdev:	vhost-user device
> + * @fd:		vhost-user message socket
> + * @events:	epoll events
> + */
> +/* cppcheck-suppress unusedFunction */
> +void tap_handler_vu(struct vu_dev *vdev, int fd, uint32_t events)
> +{
> +	struct vhost_user_msg msg = { 0 };
> +	bool need_reply, reply_requested;
> +	int ret;
> +
> +	if (events & (EPOLLRDHUP | EPOLLHUP | EPOLLERR)) {
> +		vu_sock_reset(vdev);
> +		return;
> +	}
> +
> +	ret = vu_message_read_default(fd, &msg);
> +	if (ret < 0)
> +		die_perror("Error while recvmsg");

Right now this looks correct, we should only hit this if
vu_message_read_default() fails on the recvmsg(), but I think it's
rather bug prone, as errno could be accidentally set after recvmsg().
And this is the only call site, so we could die_perror() directly
there.

> +	if (ret == 0) {
> +		vu_sock_reset(vdev);

This sounds a bit harsh on EINTR. Again, if we just terminate on EOF
or error, we don't need to handle this.

> +		return;
> +	}
> +	debug("================ Vhost user message ================");
> +	debug("Request: %s (%d)", vu_request_to_string(msg.hdr.request),
> +		msg.hdr.request);
> +	debug("Flags:   0x%x", msg.hdr.flags);
> +	debug("Size:    %u", msg.hdr.size);
> +
> +	need_reply = msg.hdr.flags & VHOST_USER_NEED_REPLY_MASK;
> +	switch (msg.hdr.request) {
> +	case VHOST_USER_GET_FEATURES:
> +		reply_requested = vu_get_features_exec(&msg);

Maybe we could have an array of function pointers (and always pass vdev
and &msg), say:

	bool (*handle[VHOST_USER_MAX])(struct vu_dev *vdev,
				       struct vhost_user_msg *msg) = {
		[VHOST_USER_SET_FEATURES]	   = vu_set_features,
		[VHOST_USER_GET_PROTOCOL_FEATURES] = vu_get_protocol_features,
		...
	};

	if (msg.hdr.request >= 0 && msg.hdr.request < VHOST_USER_MAX &&
	    handle[msg.hdr.request])
		handle[msg.hdr.request](vdev, &msg);
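...keeping in mind that the return value still needs to drive the reply
logic, so the call would rather be something like (same hypothetical
handler names as above):

	if (msg.hdr.request >= 0 && msg.hdr.request < VHOST_USER_MAX &&
	    handle[msg.hdr.request])
		reply_requested = handle[msg.hdr.request](vdev, &msg);
	else
		die("Unhandled request: %d", msg.hdr.request);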
> +		break;
> +	case VHOST_USER_SET_FEATURES:
> +		reply_requested = vu_set_features_exec(vdev, &msg);
> +		break;
> +	case VHOST_USER_GET_PROTOCOL_FEATURES:
> +		reply_requested = vu_get_protocol_features_exec(&msg);
> +		break;
> +	case VHOST_USER_SET_PROTOCOL_FEATURES:
> +		reply_requested = vu_set_protocol_features_exec(vdev, &msg);
> +		break;
> +	case VHOST_USER_GET_QUEUE_NUM:
> +		reply_requested = vu_get_queue_num_exec(&msg);
> +		break;
> +	case VHOST_USER_SET_OWNER:
> +		reply_requested = vu_set_owner_exec();
> +		break;
> +	case VHOST_USER_SET_MEM_TABLE:
> +		reply_requested = vu_set_mem_table_exec(vdev, &msg);
> +		break;
> +	case VHOST_USER_SET_VRING_NUM:
> +		reply_requested = vu_set_vring_num_exec(vdev, &msg);
> +		break;
> +	case VHOST_USER_SET_VRING_ADDR:
> +		reply_requested = vu_set_vring_addr_exec(vdev, &msg);
> +		break;
> +	case VHOST_USER_SET_VRING_BASE:
> +		reply_requested = vu_set_vring_base_exec(vdev, &msg);
> +		break;
> +	case VHOST_USER_GET_VRING_BASE:
> +		reply_requested = vu_get_vring_base_exec(vdev, &msg);
> +		break;
> +	case VHOST_USER_SET_VRING_KICK:
> +		reply_requested = vu_set_vring_kick_exec(vdev, &msg);
> +		break;
> +	case VHOST_USER_SET_VRING_CALL:
> +		reply_requested = vu_set_vring_call_exec(vdev, &msg);
> +		break;
> +	case VHOST_USER_SET_VRING_ERR:
> +		reply_requested = vu_set_vring_err_exec(vdev, &msg);
> +		break;
> +	case VHOST_USER_SET_VRING_ENABLE:
> +		reply_requested = vu_set_vring_enable_exec(vdev, &msg);
> +		break;
> +	case VHOST_USER_NONE:
> +		vu_cleanup(vdev);
> +		return;
> +	default:
> +		die("Unhandled request: %d", msg.hdr.request);
> +	}
> +
> +	if (!reply_requested && need_reply) {
> +		msg.payload.u64 = 0;
> +		msg.hdr.flags = 0;
> +		msg.hdr.size = sizeof(msg.payload.u64);
> +		msg.fd_num = 0;
> +		reply_requested = true;
> +	}
> +
> +	if (reply_requested)
> +		vu_send_reply(fd, &msg);
> +}
> diff --git a/vhost_user.h b/vhost_user.h
> new file mode 100644
> index 000000000000..135856dc2873
> --- /dev/null
> +++ b/vhost_user.h
> @@ -0,0 +1,202 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later

Same here with the SPDX tag.
> + * Copyright Red Hat
> + * Author: Laurent Vivier <lvivier@redhat.com>
> + *
> + * vhost-user API, command management and virtio interface
> + */
> +
> +/* some parts from subprojects/libvhost-user/libvhost-user.h */
> +
> +#ifndef VHOST_USER_H
> +#define VHOST_USER_H
> +
> +#include "virtio.h"
> +#include "iov.h"
> +
> +#define VHOST_USER_F_PROTOCOL_FEATURES 30
> +
> +#define VHOST_MEMORY_BASELINE_NREGIONS 8
> +
> +/**
> + * enum vhost_user_protocol_feature - List of available vhost-user features
> + */
> +enum vhost_user_protocol_feature {
> +	VHOST_USER_PROTOCOL_F_MQ = 0,
> +	VHOST_USER_PROTOCOL_F_LOG_SHMFD = 1,
> +	VHOST_USER_PROTOCOL_F_RARP = 2,
> +	VHOST_USER_PROTOCOL_F_REPLY_ACK = 3,
> +	VHOST_USER_PROTOCOL_F_NET_MTU = 4,
> +	VHOST_USER_PROTOCOL_F_BACKEND_REQ = 5,
> +	VHOST_USER_PROTOCOL_F_CROSS_ENDIAN = 6,
> +	VHOST_USER_PROTOCOL_F_CRYPTO_SESSION = 7,
> +	VHOST_USER_PROTOCOL_F_PAGEFAULT = 8,
> +	VHOST_USER_PROTOCOL_F_CONFIG = 9,
> +	VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD = 10,
> +	VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 11,
> +	VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD = 12,
> +	VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS = 14,
> +	VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS = 15,
> +
> +	VHOST_USER_PROTOCOL_F_MAX
> +};
> +
> +/**
> + * enum vhost_user_request - list of available vhost-user request

List, requests

> + */
> +enum vhost_user_request {
> +	VHOST_USER_NONE = 0,
> +	VHOST_USER_GET_FEATURES = 1,
> +	VHOST_USER_SET_FEATURES = 2,
> +	VHOST_USER_SET_OWNER = 3,
> +	VHOST_USER_RESET_OWNER = 4,
> +	VHOST_USER_SET_MEM_TABLE = 5,
> +	VHOST_USER_SET_LOG_BASE = 6,
> +	VHOST_USER_SET_LOG_FD = 7,
> +	VHOST_USER_SET_VRING_NUM = 8,
> +	VHOST_USER_SET_VRING_ADDR = 9,
> +	VHOST_USER_SET_VRING_BASE = 10,
> +	VHOST_USER_GET_VRING_BASE = 11,
> +	VHOST_USER_SET_VRING_KICK = 12,
> +	VHOST_USER_SET_VRING_CALL = 13,
> +	VHOST_USER_SET_VRING_ERR = 14,
> +	VHOST_USER_GET_PROTOCOL_FEATURES = 15,
> +	VHOST_USER_SET_PROTOCOL_FEATURES = 16,
> +	VHOST_USER_GET_QUEUE_NUM = 17,
> +	VHOST_USER_SET_VRING_ENABLE = 18,
> +	VHOST_USER_SEND_RARP = 19,
> +	VHOST_USER_NET_SET_MTU = 20,
> +	VHOST_USER_SET_BACKEND_REQ_FD = 21,
> +	VHOST_USER_IOTLB_MSG = 22,
> +	VHOST_USER_SET_VRING_ENDIAN = 23,
> +	VHOST_USER_GET_CONFIG = 24,
> +	VHOST_USER_SET_CONFIG = 25,
> +	VHOST_USER_CREATE_CRYPTO_SESSION = 26,
> +	VHOST_USER_CLOSE_CRYPTO_SESSION = 27,
> +	VHOST_USER_POSTCOPY_ADVISE = 28,
> +	VHOST_USER_POSTCOPY_LISTEN = 29,
> +	VHOST_USER_POSTCOPY_END = 30,
> +	VHOST_USER_GET_INFLIGHT_FD = 31,
> +	VHOST_USER_SET_INFLIGHT_FD = 32,
> +	VHOST_USER_GPU_SET_SOCKET = 33,
> +	VHOST_USER_VRING_KICK = 35,
> +	VHOST_USER_GET_MAX_MEM_SLOTS = 36,
> +	VHOST_USER_ADD_MEM_REG = 37,
> +	VHOST_USER_REM_MEM_REG = 38,
> +	VHOST_USER_MAX
> +};
> +
> +/**
> + * struct vhost_user_header - Vhost-user message header

vhost-user

> + * @request:	Request type of the message
> + * @flags:	Request flags
> + * @size:	The following payload size
> + */
> +struct vhost_user_header {
> +	enum vhost_user_request request;
> +
> +#define VHOST_USER_VERSION_MASK     0x3
> +#define VHOST_USER_REPLY_MASK       (0x1 << 2)
> +#define VHOST_USER_NEED_REPLY_MASK  (0x1 << 3)
> +	uint32_t flags;
> +	uint32_t size; /* the following payload size */

It's already in the struct comment.

> +} __attribute__ ((__packed__));
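Since the wire format relies on this struct being exactly the 12 bytes
the vhost-user specification defines (u32 request, flags, size), maybe
a compile-time check is worth it (sketch, assuming the enum is
int-sized, as it is with GCC on Linux targets):

	_Static_assert(sizeof(struct vhost_user_header) == 12,
		       "vhost-user header must be 12 bytes on the wire");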
> +
> +/**
> + * struct vhost_user_memory_region - Front-end shared memory region information
> + * @guest_phys_addr:	Guest physical address of the region
> + * @memory_size:	Memory size
> + * @userspace_addr:	front-end (QEMU) userspace address
> + * @mmap_offset:	region offset in the shared memory area
> + */
> +struct vhost_user_memory_region {
> +	uint64_t guest_phys_addr;
> +	uint64_t memory_size;
> +	uint64_t userspace_addr;
> +	uint64_t mmap_offset;
> +};
> +
> +/**
> + * struct vhost_user_memory - List of all the shared memory regions
> + * @nregions:	Number of memory regions
> + * @padding:	Padding
> + * @regions:	Memory regions list
> + */
> +struct vhost_user_memory {
> +	uint32_t nregions;
> +	uint32_t padding;
> +	struct vhost_user_memory_region regions[VHOST_MEMORY_BASELINE_NREGIONS];
> +};
> +
> +/**
> + * union vhost_user_payload - Vhost-user message payload
> + * @u64:		64bit payload
> + * @state:		Vring state payload
> + * @addr:		Vring addresses payload
> + * vhost_user_memory:	Memory regions information payload

vhost-user, 64-bit, vring

> + */
> +union vhost_user_payload {
> +#define VHOST_USER_VRING_IDX_MASK   0xff
> +#define VHOST_USER_VRING_NOFD_MASK  (0x1 << 8)
> +	uint64_t u64;
> +	struct vhost_vring_state state;
> +	struct vhost_vring_addr addr;
> +	struct vhost_user_memory memory;
> +};
> +
> +/**
> + * struct vhost_user_msg - Vhost-use message

vhost-user

> + * @hdr:		Message header
> + * @payload:		Message payload
> + * @fds:		File descriptors associated with the message
> + *			in the ancillary data.
> + *			(shared memory or event file descriptors)
> + * @fd_num:		Number of file descriptors
> + */
> +struct vhost_user_msg {
> +	struct vhost_user_header hdr;
> +	union vhost_user_payload payload;
> +
> +	int fds[VHOST_MEMORY_BASELINE_NREGIONS];
> +	int fd_num;
> +} __attribute__ ((__packed__));
> +#define VHOST_USER_HDR_SIZE sizeof(struct vhost_user_header)
> +
> +/* index of the RX virtqueue */
> +#define VHOST_USER_RX_QUEUE 0
> +/* index of the TX virtqueue */
> +#define VHOST_USER_TX_QUEUE 1
> +
> +/* in case of multiqueue, we RX and TX queues are interleaved */

s/we/the/

> +#define VHOST_USER_IS_QUEUE_TX(n)	(n % 2)
> +#define VHOST_USER_IS_QUEUE_RX(n)	(!(n % 2))
> +
> +/**
> + * vu_queue_enabled - Return state of a virtqueue
> + * @vq:		Virtqueue to check

virtqueue

> + *
> + * Return: true if the virqueue is enabled, false otherwise

virtqueue

> + */
> +static inline bool vu_queue_enabled(const struct vu_virtq *vq)
> +{
> +	return vq->enable;
> +}
> +
> +/**
> + * vu_queue_started - Return state of a virtqueue
> + * @vq:		Virtqueue to check

virtqueue

> + *
> + * Return: true if the virqueue is started, false otherwise

virtqueue

> + */
> +static inline bool vu_queue_started(const struct vu_virtq *vq)
> +{
> +	return vq->started;
> +}
> +
> +int vu_send(struct vu_dev *vdev, const void *buf, size_t size);
> +void vu_print_capabilities(void);
> +void vu_init(struct ctx *c, struct vu_dev *vdev);
> +void vu_kick_cb(struct vu_dev *vdev, union epoll_ref ref,
> +		const struct timespec *now);
> +void vu_cleanup(struct vu_dev *vdev);
> +void tap_handler_vu(struct vu_dev *vdev, int fd, uint32_t events);
> +#endif /* VHOST_USER_H */
> diff --git a/virtio.c b/virtio.c
> index 8354f6052aee..d02e6e04701d 100644
> --- a/virtio.c
> +++ b/virtio.c
> @@ -323,7 +323,6 @@ static bool vring_can_notify(const struct vu_dev *dev, struct vu_virtq *vq)
>  * @dev:	Vhost-user device
>  * @vq:		Virtqueue
>  */
> -/* cppcheck-suppress unusedFunction */
>  void vu_queue_notify(const struct vu_dev *dev, struct vu_virtq *vq)
>  {
> 	if (!vq->vring.avail)
> @@ -500,7 +499,6 @@ static int vu_queue_map_desc(struct vu_dev *dev, struct vu_virtq *vq, unsigned i
>  *
>  * Return: -1 if there is an error, 0 otherwise
>  */
> -/* cppcheck-suppress unusedFunction */
>  int vu_queue_pop(struct vu_dev *dev, struct vu_virtq *vq, struct vu_virtq_element *elem)
>  {
> 	unsigned int head;
> @@ -550,7 +548,6 @@ void vu_queue_detach_element(struct vu_virtq *vq)
>  * vu_queue_unpop() - Push back the previously popped element from the virqueue
>  * @vq:		Virtqueue
>  */
> -/* cppcheck-suppress unusedFunction */
>  void vu_queue_unpop(struct vu_virtq *vq)
>  {
> 	vq->last_avail_idx--;
> @@ -618,7 +615,6 @@ void vu_queue_fill_by_index(struct vu_virtq *vq, unsigned int index,
>  * @len:	Size of the element
>  * @idx:	Used ring entry index
>  */
> -/* cppcheck-suppress unusedFunction */
>  void vu_queue_fill(struct vu_virtq *vq, const struct vu_virtq_element *elem,
> 		   unsigned int len, unsigned int idx)
>  {
> @@ -642,7 +638,6 @@ static inline void vring_used_idx_set(struct vu_virtq *vq, uint16_t val)
>  * @vq:		Virtqueue
>  * @count:	Number of entry to flush
>  */
> -/* cppcheck-suppress unusedFunction */
>  void vu_queue_flush(struct vu_virtq *vq, unsigned int count)
>  {
> 	uint16_t old, new;
> diff --git a/virtio.h b/virtio.h
> index af9cadc990b9..242e788e07e9 100644
> --- a/virtio.h
> +++ b/virtio.h
> @@ -106,6 +106,7 @@ struct vu_dev_region {
>  * @hdrlen:		Virtio -net header length
>  */
>  struct vu_dev {
> +	struct ctx *context;
> 	uint32_t nregions;
> 	struct vu_dev_region regions[VHOST_USER_MAX_RAM_SLOTS];
> 	struct vu_virtq vq[VHOST_USER_MAX_QUEUES];
> @@ -162,7 +163,6 @@ static inline bool vu_has_feature(const struct vu_dev *vdev,
>  *
>  * Return:	True if the feature is available
>  */
> -/* cppcheck-suppress unusedFunction */
>  static inline bool vu_has_protocol_feature(const struct vu_dev *vdev,
> 					   unsigned int fbit)
>  {

-- 
Stefano