From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: passt.top; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=I28/ZRaO; dkim-atps=neutral Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by passt.top (Postfix) with ESMTPS id B64B85A0620 for ; Fri, 17 Jan 2025 19:05:09 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1737137108; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=JLu/BhHMJvGu81uGvAu4xwamZ3zrdC/hNBh0c4Ezh1U=; b=I28/ZRaOmpte01jgtNfYCB6DlQDtxOWpvw9R3uV8l9dKSlNI/46CX6o9ennRRylwmbAeaP gimV1aOXQ5w421a12oBKjUnSJgIucPH//uF6Nts+6JDwwrm8NO10fkfb98hp6sN5jRhrPo MR2R8T3ztqcYLRY+hvX5NKQVZaeH+HE= Received: from mail-wr1-f69.google.com (mail-wr1-f69.google.com [209.85.221.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-385-HDbnI2uyPRqJZw0Mumh_4g-1; Fri, 17 Jan 2025 13:05:07 -0500 X-MC-Unique: HDbnI2uyPRqJZw0Mumh_4g-1 X-Mimecast-MFC-AGG-ID: HDbnI2uyPRqJZw0Mumh_4g Received: by mail-wr1-f69.google.com with SMTP id ffacd0b85a97d-38a35a65575so1927212f8f.1 for ; Fri, 17 Jan 2025 10:05:07 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1737137105; x=1737741905; h=content-transfer-encoding:mime-version:organization:references :in-reply-to:message-id:subject:cc:to:from:date:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=JLu/BhHMJvGu81uGvAu4xwamZ3zrdC/hNBh0c4Ezh1U=; b=m/n6XLuzSjfGZaWcvYsE7nCTRLLDK0Zh4u6MrMWn2+maXdHLriW782r8NqB+NYC00I 5Mzcu3gM6QzSkmtCo4XIOlZW0wGTo30Lvg1V48kQYX1nlU57f+zcBp1Oidjeiw1mCvBo mTv+VIzJB1X+2zieQ2fOvCrH2CkywGvgp49d01SNK45vwo+KKRGXzEtqi4d2BP3N2HOz Md7Eac3JzAfvfhrFPq5bsvLSP4Xe5+7jed0nkbS7e4Mp4huJgB9eHsMpZuRMP4xXWMwX aQimoYao3RXrpEs8qKMGsVGaMt+M1wlp+XYS/UKntXn3CqrR3As1hKBm22szYteTGAk4 8eQw== X-Gm-Message-State: AOJu0Yy5JMT1gocIy5d3u2bzji4FLQeSf7ydCVilUMF1XOZV90tvc96O eDpOAOuF+tXCjUR5vibXMKDgGNP2SY2MX+XoLEC0IXKwCr4FFYapk8tXSHrCJrKuzBEw1WPSiBC H1p3RpG2lP6ElwbynfdjtW20FpAWxIhnA4kJiX+ilIMafiV1l7lJnTlQoJyHNNWuiAz2FExGTwX PM/EdDLaJ2P108RJ7hlY6H0zHPbQKvi8C3 X-Gm-Gg: ASbGncsR1J+Frl9Z4z60VP9hcJsDBleoVDf2tFclV61VJsMa3dgIzUuFt3uwnEL9bJK i8DYmDxfX5Gs4XL8BD7h41RD6t+hJJHZMyChbNLjQBB/iVw9IoJGini/3XsBngJdrjY/nEG0o5N q+cQSO+0ZBWvH8h4Owpk63PfRcYV7PkdF5TPVtRPpAxxxWl3/6B4x8bD9A/bNVHW6CfEOrH56tn jMAeXBkBR5nmnze/a4UxFqEkNV8bjpa4H5w0xfh1+tdGAI9FUPNdSh4pWoDh4uVbJOEkroTugAU crQwcWyQrA== X-Received: by 2002:a5d:5f51:0:b0:385:ee59:4510 with SMTP id ffacd0b85a97d-38bf5655441mr3866136f8f.9.1737137105372; Fri, 17 Jan 2025 10:05:05 -0800 (PST) X-Google-Smtp-Source: AGHT+IFuCYsgsC5/JKlFZnchV4F9fhDQpVPdmvgpcapLSbKPDkJIcgLiqeGpJ2ssZO8SXV5vensPIA== X-Received: by 2002:a5d:5f51:0:b0:385:ee59:4510 with SMTP id ffacd0b85a97d-38bf5655441mr3866075f8f.9.1737137104714; Fri, 17 Jan 2025 10:05:04 -0800 (PST) Received: from maya.myfinge.rs (ifcgrfdd.trafficplex.cloud. [176.103.220.4]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-437c7499942sm100617945e9.6.2025.01.17.10.05.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 17 Jan 2025 10:05:04 -0800 (PST) Date: Fri, 17 Jan 2025 19:05:02 +0100 From: Stefano Brivio To: Laurent Vivier Subject: Re: [PATCH 5/9] vhost-user: add VHOST_USER_SET_LOG_BASE command Message-ID: <20250117190502.6590b489@elisabeth> In-Reply-To: <20241219111400.2352110-6-lvivier@redhat.com> References: <20241219111400.2352110-1-lvivier@redhat.com> <20241219111400.2352110-6-lvivier@redhat.com> Organization: Red Hat X-Mailer: Claws Mail 4.2.0 (GTK 3.24.41; x86_64-pc-linux-gnu) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: 08ehbNcispV-oja3rbiSzhUxSBt6cSFXiiDVnVdElSc_1737137106 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID-Hash: KTQGXK7ZK4M52YKQO5VD6AFGLJP5LQB6 X-Message-ID-Hash: KTQGXK7ZK4M52YKQO5VD6AFGLJP5LQB6 X-MailFrom: sbrivio@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Thu, 19 Dec 2024 12:13:56 +0100 Laurent Vivier wrote: > Sets logging shared memory space. > > When the back-end has VHOST_USER_PROTOCOL_F_LOG_SHMFD protocol feature, > the log memory fd is provided in the ancillary data of > VHOST_USER_SET_LOG_BASE message, the size and offset of shared memory > area provided in the message. > > Signed-off-by: Laurent Vivier > --- > util.h | 3 ++ > vhost_user.c | 87 +++++++++++++++++++++++++++++++++++++++++++++++++++- > vhost_user.h | 3 ++ > virtio.c | 74 ++++++++++++++++++++++++++++++++++++++++++-- > virtio.h | 4 +++ > 5 files changed, 168 insertions(+), 3 deletions(-) > > diff --git a/util.h b/util.h > index 3fa1d12544a0..d02333d5a88d 100644 > --- a/util.h > +++ b/util.h > @@ -152,6 +152,9 @@ static inline void barrier(void) { __asm__ __volatile__("" ::: "memory"); } > #define smp_wmb() smp_mb_release() > #define smp_rmb() smp_mb_acquire() > > +#define qatomic_or(ptr, n) \ > + ((void) __atomic_fetch_or(ptr, n, __ATOMIC_SEQ_CST)) > + > #define NS_FN_STACK_SIZE (1024 * 1024) /* 1MiB */ > > int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags, > diff --git a/vhost_user.c b/vhost_user.c > index ce4373d9eeca..c2fac58badf1 100644 > --- a/vhost_user.c > +++ b/vhost_user.c > @@ -510,6 +510,12 @@ static bool vu_set_mem_table_exec(struct vu_dev *vdev, > */ > static void vu_close_log(struct vu_dev *vdev) > { > + if (vdev->log_table) { > + if (munmap(vdev->log_table, vdev->log_size) != 0) > + die_perror("close log munmap() error"); > + vdev->log_table = NULL; > + } > + > if (vdev->log_call_fd != -1) { > close(vdev->log_call_fd); > vdev->log_call_fd = -1; > @@ -520,7 +526,6 @@ static void vu_close_log(struct vu_dev *vdev) > * vu_log_kick() - Inform the front-end that the log has been modified > * @vdev: vhost-user device > */ > -/* cppcheck-suppress unusedFunction */ > void vu_log_kick(const struct vu_dev *vdev) > { > if (vdev->log_call_fd != -1) { > @@ -532,6 +537,84 @@ void vu_log_kick(const struct vu_dev *vdev) > } > } > > + > +/** Excess newline. > + * vu_log_page() -- Update logging table Single '-' between function name and comment. > + * @log_table: Base address of the logging table > + * @page: Page number that has been updated > + */ > +/* NOLINTNEXTLINE(readability-non-const-parameter) */ > +static void vu_log_page(uint8_t *log_table, uint64_t page) > +{ > + qatomic_or(&log_table[page / 8], 1 << (page % 8)); > +} > + > +/** > + * vu_log_write() -- Log memory write Single '-' between function name and comment. > + * @dev: Vhost-user device vhost-user > + * @address: Memory address > + * @length: Memory size > + */ > +void vu_log_write(const struct vu_dev *vdev, uint64_t address, uint64_t length) > +{ > + uint64_t page; > + > + if (!vdev->log_table || !length || > + !vu_has_feature(vdev, VHOST_F_LOG_ALL)) > + return; > + > + page = address / VHOST_LOG_PAGE; > + while (page * VHOST_LOG_PAGE < address + length) { > + vu_log_page(vdev->log_table, page); > + page++; > + } > + vu_log_kick(vdev); > +} > + > +/** > + * vu_set_log_base_exec() - Set the memory log base > + * @vdev: vhost-user device > + * @vmsg: vhost-user message > + * > + * Return: False as no reply is requested > + * > + * #syscalls:vu mmap munmap I wonder: will there be a way around this the day that we want to disable mmap() for vhost-user mode too? > + */ > +static bool vu_set_log_base_exec(struct vu_dev *vdev, > + struct vhost_user_msg *msg) > +{ > + uint64_t log_mmap_size, log_mmap_offset; > + void *base; > + int fd; > + > + if (msg->fd_num != 1 || msg->hdr.size != sizeof(msg->payload.log)) > + die("Invalid log_base message"); Maybe prefix this with "vhost-user:", otherwise it's not really clear where it's coming from. > + > + fd = msg->fds[0]; > + log_mmap_offset = msg->payload.log.mmap_offset; > + log_mmap_size = msg->payload.log.mmap_size; > + > + debug("Log mmap_offset: %"PRId64, log_mmap_offset); > + debug("Log mmap_size: %"PRId64, log_mmap_size); > + > + base = mmap(0, log_mmap_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, > + log_mmap_offset); > + close(fd); > + if (base == MAP_FAILED) > + die("log mmap error"); Same here. > + > + if (vdev->log_table) > + munmap(vdev->log_table, vdev->log_size); > + > + vdev->log_table = base; > + vdev->log_size = log_mmap_size; > + > + msg->hdr.size = sizeof(msg->payload.u64); > + msg->fd_num = 0; > + > + return true; > +} > + > /** > * vu_set_log_fd_exec() -- Set the eventfd used to report logging update > * @vdev: vhost-user device > @@ -915,6 +998,7 @@ void vu_init(struct ctx *c) > .notification = true, > }; > } > + c->vdev->log_table = NULL; > c->vdev->log_call_fd = -1; > } > > @@ -984,6 +1068,7 @@ static bool (*vu_handle[VHOST_USER_MAX])(struct vu_dev *vdev, > [VHOST_USER_GET_QUEUE_NUM] = vu_get_queue_num_exec, > [VHOST_USER_SET_OWNER] = vu_set_owner_exec, > [VHOST_USER_SET_MEM_TABLE] = vu_set_mem_table_exec, > + [VHOST_USER_SET_LOG_BASE] = vu_set_log_base_exec, > [VHOST_USER_SET_LOG_FD] = vu_set_log_fd_exec, > [VHOST_USER_SET_VRING_NUM] = vu_set_vring_num_exec, > [VHOST_USER_SET_VRING_ADDR] = vu_set_vring_addr_exec, > diff --git a/vhost_user.h b/vhost_user.h > index 2fc0342ff5ba..22a5d059073f 100644 > --- a/vhost_user.h > +++ b/vhost_user.h > @@ -15,6 +15,7 @@ > #include "iov.h" > > #define VHOST_USER_F_PROTOCOL_FEATURES 30 > +#define VHOST_LOG_PAGE 4096 Does this need to be 65536 on ppc64 and ppc64le? In case, we have PAGE_SIZE exported by the Makefile in (it uses 'getconf' so it's not cross-build-safe, we should find a better way eventually). > > #define VHOST_MEMORY_BASELINE_NREGIONS 8 > > @@ -241,5 +242,7 @@ void vu_print_capabilities(void); > void vu_init(struct ctx *c); > void vu_cleanup(struct vu_dev *vdev); > void vu_log_kick(const struct vu_dev *vdev); > +void vu_log_write(const struct vu_dev *vdev, uint64_t address, > + uint64_t length); > void vu_control_handler(struct vu_dev *vdev, int fd, uint32_t events); > #endif /* VHOST_USER_H */ > diff --git a/virtio.c b/virtio.c > index 52d5a4d4be52..13838586ad1a 100644 > --- a/virtio.c > +++ b/virtio.c > @@ -81,6 +81,7 @@ > > #include "util.h" > #include "virtio.h" > +#include "vhost_user.h" > > #define VIRTQUEUE_MAX_SIZE 1024 > > @@ -592,7 +593,72 @@ static inline void vring_used_write(const struct vu_dev *vdev, > struct vring_used *used = vq->vring.used; > > used->ring[i] = *uelem; > - (void)vdev; > + vu_log_write(vdev, vq->vring.log_guest_addr + > + offsetof(struct vring_used, ring[i]), > + sizeof(used->ring[i])); > +} > + > +/** > + * vu_log_queue_fill() -- Log virtqueue memory update Single '-' between function name and comment. > + * @dev: Vhost-user device vhost-user > + * @vq: Virtqueue > + * @index: Descriptor ring index > + * @len: Size of the element > + */ > +static void vu_log_queue_fill(const struct vu_dev *vdev, struct vu_virtq *vq, > + unsigned int index, unsigned int len) > +{ > + struct vring_desc desc_buf[VIRTQUEUE_MAX_SIZE]; > + struct vring_desc *desc = vq->vring.desc; > + unsigned int max, min; > + unsigned num_bufs = 0; > + uint64_t read_len; > + > + if (!vdev->log_table || !len || !vu_has_feature(vdev, VHOST_F_LOG_ALL)) > + return; > + > + max = vq->vring.num; > + > + if (le16toh(desc[index].flags) & VRING_DESC_F_INDIRECT) { > + unsigned int desc_len; > + uint64_t desc_addr; > + > + if (le32toh(desc[index].len) % sizeof(struct vring_desc)) > + die("Invalid size for indirect buffer table"); > + > + /* loop over the indirect descriptor table */ > + desc_addr = le64toh(desc[index].addr); > + desc_len = le32toh(desc[index].len); > + max = desc_len / sizeof(struct vring_desc); > + read_len = desc_len; > + desc = vu_gpa_to_va(vdev, &read_len, desc_addr); > + if (desc && read_len != desc_len) { > + /* Failed to use zero copy */ Follow-up on the question above: could we skip mmap() if we used only this path? > + desc = NULL; > + if (!virtqueue_read_indirect_desc(vdev, desc_buf, > + desc_addr, > + desc_len)) > + desc = desc_buf; > + } > + > + if (!desc) > + die("Invalid indirect buffer table"); > + > + index = 0; > + } > + > + do { > + if (++num_bufs > max) > + die("Looped descriptor"); > + > + if (le16toh(desc[index].flags) & VRING_DESC_F_WRITE) { > + min = MIN(le32toh(desc[index].len), len); > + vu_log_write(vdev, le64toh(desc[index].addr), min); > + len -= min; > + } > + } while (len > 0 && > + (virtqueue_read_next_desc(desc, index, max, &index) == > + VIRTQUEUE_READ_DESC_MORE)); It's a bit weird that we could get a negative length because of the do { } while. That is: while (len > 0) { ... if (virtqueue_read_next_desc(desc, index, max, &index) != VIRTQUEUE_READ_DESC_MORE)) break; } would have looked more natural/safer to me. But perhaps there's a reason for that. > } > > > @@ -614,6 +680,8 @@ void vu_queue_fill_by_index(const struct vu_dev *vdev, struct vu_virtq *vq, > if (!vq->vring.avail) > return; > > + vu_log_queue_fill(vdev, vq, index, len); > + > idx = (idx + vq->used_idx) % vq->vring.num; > > uelem.id = htole32(index); > @@ -646,7 +714,9 @@ static inline void vring_used_idx_set(const struct vu_dev *vdev, > struct vu_virtq *vq, uint16_t val) > { > vq->vring.used->idx = htole16(val); > - (void)vdev; > + vu_log_write(vdev, vq->vring.log_guest_addr + > + offsetof(struct vring_used, idx), > + sizeof(vq->vring.used->idx)); > > vq->used_idx = val; > } > diff --git a/virtio.h b/virtio.h > index d95bb07bb913..f572341a0034 100644 > --- a/virtio.h > +++ b/virtio.h > @@ -104,6 +104,8 @@ struct vu_dev_region { > * @features: Vhost-user features > * @protocol_features: Vhost-user protocol features > * @log_call_fd: Eventfd to report logging update > + * @log_size: Size of the logging memory region > + * @log_table: Base of the logging memory region > */ > struct vu_dev { > struct ctx *context; > @@ -113,6 +115,8 @@ struct vu_dev { > uint64_t features; > uint64_t protocol_features; > int log_call_fd; > + uint64_t log_size; > + uint8_t *log_table; > }; > > /** -- Stefano