Date: Tue, 10 Sep 2024 17:47:31 +0200
From: Stefano Brivio
To: Laurent Vivier
Subject: Re: [PATCH v4 2/4] vhost-user: introduce virtio API
Message-ID: <20240910174731.3c4db56f@elisabeth>
In-Reply-To: <20240906160455.2088854-3-lvivier@redhat.com>
References: <20240906160455.2088854-1-lvivier@redhat.com> <20240906160455.2088854-3-lvivier@redhat.com>
Organization: Red Hat
CC: passt-dev@passt.top
List-Id: Development discussion and patches for passt

Just one comment here:

On Fri, 6 Sep 2024 18:04:47 +0200
Laurent Vivier wrote:

> Add virtio.c and virtio.h that define the functions needed
> to manage virtqueues.
>
> Signed-off-by: Laurent Vivier
> ---
> Makefile | 4 +-
> util.h | 8 +
> virtio.c | 665 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> virtio.h | 185 ++++++++++++++++
> 4 files changed, 860 insertions(+), 2 deletions(-)
> create mode 100644 virtio.c
> create mode 100644 virtio.h
>
> diff --git a/Makefile b/Makefile
> index 01fada45adc7..e9a154bdd718 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -47,7 +47,7 @@ FLAGS += -DDUAL_STACK_SOCKETS=$(DUAL_STACK_SOCKETS)
> PASST_SRCS = arch.c arp.c checksum.c conf.c dhcp.c dhcpv6.c flow.c fwd.c \
> 	icmp.c igmp.c inany.c iov.c ip.c isolation.c lineread.c log.c mld.c \
> 	ndp.c netlink.c packet.c passt.c pasta.c pcap.c pif.c tap.c tcp.c \
> -	tcp_buf.c tcp_splice.c udp.c udp_flow.c util.c
> +	tcp_buf.c tcp_splice.c udp.c udp_flow.c util.c virtio.c
> QRAP_SRCS = qrap.c
> SRCS = $(PASST_SRCS) $(QRAP_SRCS)
>
> @@ -57,7 +57,7 @@ PASST_HEADERS = arch.h arp.h checksum.h conf.h dhcp.h dhcpv6.h flow.h fwd.h \
> 	flow_table.h icmp.h icmp_flow.h inany.h iov.h ip.h isolation.h \
> 	lineread.h log.h ndp.h netlink.h packet.h passt.h pasta.h pcap.h pif.h \
> 	siphash.h tap.h tcp.h tcp_buf.h tcp_conn.h tcp_internal.h tcp_splice.h \
> -	udp.h udp_flow.h util.h
> +	udp.h udp_flow.h util.h virtio.h
> HEADERS = $(PASST_HEADERS) seccomp.h
>
> C := \#include \nstruct tcp_info x = { .tcpi_snd_wnd = 0 };
> diff --git a/util.h b/util.h
> index 1463c92153d5..0960903ccaec 100644
> --- a/util.h
> +++ b/util.h
> @@ -134,6 +134,14 @@ static inline uint32_t ntohl_unaligned(const void *p)
> 	return ntohl(val);
> }
>
> +static inline void barrier(void) { __asm__ __volatile__("" ::: "memory"); }
> +#define smp_mb()		do { barrier(); __atomic_thread_fence(__ATOMIC_SEQ_CST); } while (0)
> +#define smp_mb_release()	do { barrier(); __atomic_thread_fence(__ATOMIC_RELEASE); } while (0)
> +#define smp_mb_acquire()	do { barrier(); __atomic_thread_fence(__ATOMIC_ACQUIRE); } while (0)
> +
> +#define smp_wmb()	smp_mb_release()
> +#define smp_rmb()	smp_mb_acquire()
> +
> #define NS_FN_STACK_SIZE	(RLIMIT_STACK_VAL * 1024 / 8)
> int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags,
> 	 void *arg);
> diff --git a/virtio.c b/virtio.c
> new file mode 100644
> index 000000000000..380590afbca3
> --- /dev/null
> +++ b/virtio.c
> @@ -0,0 +1,665 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later AND BSD-3-Clause
> +/*
> + * virtio API, vring and virtqueue functions definition
> + *
> + * Copyright Red Hat
> + * Author: Laurent Vivier
> + */
> +
> +/* Some parts copied from QEMU subprojects/libvhost-user/libvhost-user.c
> + * originally licensed under the following terms:
> + *
> + * --
> + *
> + * Copyright IBM, Corp. 2007
> + * Copyright (c) 2016 Red Hat, Inc.
> + *
> + * Authors:
> + * Anthony Liguori
> + * Marc-André Lureau
> + * Victor Kaplansky
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or
> + * later. See the COPYING file in the top-level directory.
> + *
> + * Some parts copied from QEMU hw/virtio/virtio.c
> + * licensed under the following terms:
> + *
> + * Copyright IBM, Corp. 2007
> + *
> + * Authors:
> + * Anthony Liguori
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2. See
> + * the COPYING file in the top-level directory.
> + *
> + * --
> + *
> + * virtq_used_event() and virtq_avail_event() from
> + * https://docs.oasis-open.org/virtio/virtio/v1.2/csd01/virtio-v1.2-csd01.html#x1-712000A
> + * licensed under the following terms:
> + *
> + * --
> + *
> + * This header is BSD licensed so anyone can use the definitions
> + * to implement compatible drivers/servers.
> + *
> + * Copyright 2007, 2009, IBM Corporation
> + * Copyright 2011, Red Hat, Inc
> + * All rights reserved.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * 1. Redistributions of source code must retain the above copyright
> + * notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + * notice, this list of conditions and the following disclaimer in the
> + * documentation and/or other materials provided with the distribution.
> + * 3. Neither the name of IBM nor the names of its contributors
> + * may be used to endorse or promote products derived from this software
> + * without specific prior written permission.
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ‘‘AS IS’’ AND
> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> + * ARE DISCLAIMED. IN NO EVENT SHALL IBM OR CONTRIBUTORS BE LIABLE
> + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
> + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
> + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE.
> + */
> +
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +#include "util.h"
> +#include "virtio.h"
> +
> +#define VIRTQUEUE_MAX_SIZE 1024
> +
> +/**
> + * vu_gpa_to_va() - Translate guest physical address to our virtual address.
> + * @dev:	Vhost-user device
> + * @plen:	Physical length to map (input), capped to region (output)
> + * @guest_addr:	Guest physical address
> + *
> + * Return: virtual address in our address space of the guest physical address
> + */
> +static void *vu_gpa_to_va(struct vu_dev *dev, uint64_t *plen, uint64_t guest_addr)
> +{
> +	unsigned int i;
> +
> +	if (*plen == 0)
> +		return NULL;
> +
> +	/* Find matching memory region. */
> +	for (i = 0; i < dev->nregions; i++) {
> +		const struct vu_dev_region *r = &dev->regions[i];
> +
> +		if ((guest_addr >= r->gpa) &&
> +		 (guest_addr < (r->gpa + r->size))) {
> +			if ((guest_addr + *plen) > (r->gpa + r->size))
> +				*plen = r->gpa + r->size - guest_addr;
> +			/* NOLINTNEXTLINE(performance-no-int-to-ptr) */
> +			return (void *)(guest_addr - r->gpa + r->mmap_addr +
> +						 r->mmap_offset);
> +		}
> +	}
> +
> +	return NULL;
> +}
> +
> +/**
> + * vring_avail_flags() - Read the available ring flags
> + * @vq:		Virtqueue
> + *
> + * Return: the available ring descriptor flags of the given virtqueue
> + */
> +static inline uint16_t vring_avail_flags(const struct vu_virtq *vq)
> +{
> +	return le16toh(vq->vring.avail->flags);
> +}
> +
> +/**
> + * vring_avail_idx() - Read the available ring index
> + * @vq:		Virtqueue
> + *
> + * Return: the available ring index of the given virtqueue
> + */
> +static inline uint16_t vring_avail_idx(struct vu_virtq *vq)
> +{
> +	vq->shadow_avail_idx = le16toh(vq->vring.avail->idx);
> +
> +	return vq->shadow_avail_idx;
> +}
> +
> +/**
> + * vring_avail_ring() - Read an available ring entry
> + * @vq:		Virtqueue
> + * @i:		Index of the entry to read
> + *
> + * Return: the ring entry content (head of the descriptor chain)
> + */
> +static inline uint16_t vring_avail_ring(const struct vu_virtq *vq, int i)
> +{
> +	return le16toh(vq->vring.avail->ring[i]);
> +}
> +
> +/**
> + * virtq_used_event - Get location of used event indices
> + *		 (only with VIRTIO_F_EVENT_IDX)
> + * @vq		Virtqueue
> + *
> + * Return: return the location of the used event index
> + */
> +static inline uint16_t *virtq_used_event(const struct vu_virtq *vq)
> +{
> + /* For backwards compat, used event index is at *end* of avail ring. */
> + return &vq->vring.avail->ring[vq->vring.num];
> +}
> +
> +/**
> + * vring_get_used_event() - Get the used event from the available ring
> + * @vq		Virtqueue
> + *
> + * Return: the used event (available only if VIRTIO_RING_F_EVENT_IDX is set)
> + * used_event is a performant alternative where the driver
> + * specifies how far the device can progress before a notification
> + * is required.
> + */
> +static inline uint16_t vring_get_used_event(const struct vu_virtq *vq)
> +{
> +	return le16toh(*virtq_used_event(vq));
> +}
> +
> +/**
> + * virtqueue_get_head() - Get the head of the descriptor chain for a given
> + * index
> + * @vq:		Virtqueue
> + * @idx:	Available ring entry index
> + * @head:	Head of the descriptor chain
> + */
> +static void virtqueue_get_head(const struct vu_virtq *vq,
> +			 unsigned int idx, unsigned int *head)
> +{
> +	/* Grab the next descriptor number they're advertising, and increment
> +	 * the index we've seen.
> +	 */
> +	*head = vring_avail_ring(vq, idx % vq->vring.num);
> +
> +	/* If their number is silly, that's a fatal mistake. */
> +	if (*head >= vq->vring.num)
> +		die("vhost-user: Guest says index %u is available", *head);
> +}
> +
> +/**
> + * virtqueue_read_indirect_desc() - Copy virtio ring descriptors from guest
> + * memory
> + * @dev:	Vhost-user device
> + * @desc:	Destination address to copy the descriptors to
> + * @addr:	Guest memory address to copy from
> + * @len:	Length of memory to copy
> + *
> + * Return: -1 if there is an error, 0 otherwise
> + */
> +static int virtqueue_read_indirect_desc(struct vu_dev *dev, struct vring_desc *desc,
> +					uint64_t addr, size_t len)
> +{
> +	uint64_t read_len;
> +
> +	if (len > (VIRTQUEUE_MAX_SIZE * sizeof(struct vring_desc)))
> +		return -1;
> +
> +	if (len == 0)
> +		return -1;
> +
> +	while (len) {
> +		const struct vring_desc *orig_desc;
> +
> +		read_len = len;
> +		orig_desc = vu_gpa_to_va(dev, &read_len, addr);

In case you missed this in my review of v3 (I'm not sure if it's a valid
concern):

--
Should we also return if read_len < sizeof(struct vring_desc) after
this call? Can that ever happen, if we pick a particular value of addr
so that it's almost at the end of a region?
--

-- 
Stefano
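
To illustrate the question above, here is a minimal standalone sketch. It is not code from the patch: demo_region, demo_desc, demo_gpa_to_va() and demo_desc_readable() are hypothetical names, only a single region is modelled, and the 16-byte descriptor layout is assumed to match struct vring_desc. It reuses the capping logic of the quoted vu_gpa_to_va() to show how an addr close to the end of a region could leave read_len shorter than one descriptor, and what the suggested check might look like.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte descriptor, assumed to mirror struct vring_desc */
struct demo_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

/* Hypothetical, simplified stand-in for struct vu_dev_region */
struct demo_region {
	uint64_t gpa;	/* guest physical start address */
	uint64_t size;	/* region size in bytes */
	uint8_t *va;	/* where the region is mapped in our address space */
};

/* Same capping logic as the quoted vu_gpa_to_va(), for a single region */
static void *demo_gpa_to_va(const struct demo_region *r, uint64_t *plen,
			    uint64_t guest_addr)
{
	if (*plen == 0)
		return NULL;

	if (guest_addr < r->gpa || guest_addr >= r->gpa + r->size)
		return NULL;

	/* Cap the length to what is left in the region */
	if (guest_addr + *plen > r->gpa + r->size)
		*plen = r->gpa + r->size - guest_addr;

	return r->va + (guest_addr - r->gpa);
}

/* Return 0 if a whole descriptor can be read at @addr, -1 otherwise */
static int demo_desc_readable(const struct demo_region *r, uint64_t addr,
			      uint64_t len)
{
	uint64_t read_len = len;

	if (!demo_gpa_to_va(r, &read_len, addr))
		return -1;

	/* The check suggested in the review: reject a truncated descriptor */
	if (read_len < sizeof(struct demo_desc))
		return -1;

	return 0;
}
```

With a 64-byte region at guest address 0x1000, asking for one descriptor at 0x1000 + 60 caps read_len to 4, so demo_desc_readable() returns -1. At 0x1000 + 48 the full 16 bytes fit and it returns 0. Whether a guest can actually place an indirect table that way in the real code is the open question.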