From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Tue, 27 Aug 2024 00:14:20 +0200
From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson, Laurent Vivier
Subject: Re: [PATCH v3 3/4] vhost-user: introduce vhost-user API
Message-ID: <20240827001420.1f895c7d@elisabeth>
References: <20240815155024.827956-1-lvivier@redhat.com>
	<20240815155024.827956-4-lvivier@redhat.com>
Organization: Red Hat
CC: passt-dev@passt.top
List-Id: Development discussion and patches for passt

On Mon, 26 Aug 2024 15:26:44 +1000
David Gibson wrote:

> On Thu, Aug 15, 2024 at 05:50:22PM +0200, Laurent Vivier wrote:
> > Add vhost_user.c and vhost_user.h that define the functions needed
> > to implement vhost-user backend.
> >
> > [...]
> >
> > +static int vu_message_read_default(int conn_fd, struct vhost_user_msg *vmsg)
> > +{
> > +	char control[CMSG_SPACE(VHOST_MEMORY_BASELINE_NREGIONS *
> > +		     sizeof(int))] = { 0 };
> > +	struct iovec iov = {
> > +		.iov_base = (char *)vmsg,
> > +		.iov_len = VHOST_USER_HDR_SIZE,
> > +	};
> > +	struct msghdr msg = {
> > +		.msg_iov = &iov,
> > +		.msg_iovlen = 1,
> > +		.msg_control = control,
> > +		.msg_controllen = sizeof(control),
> > +	};
> > +	ssize_t ret, sz_payload;
> > +	struct cmsghdr *cmsg;
> > +	size_t fd_size;
> > +
> > +	ret = recvmsg(conn_fd, &msg, MSG_DONTWAIT);
> > +	if (ret < 0) {
> > +		if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK)
> > +			return 0;
> > +		return -1;
> > +	}
> > +
> > +	vmsg->fd_num = 0;
> > +	for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
> > +	     cmsg = CMSG_NXTHDR(&msg, cmsg)) {
> > +		if (cmsg->cmsg_level == SOL_SOCKET &&
> > +		    cmsg->cmsg_type == SCM_RIGHTS) {
> > +			fd_size = cmsg->cmsg_len - CMSG_LEN(0);
> > +			ASSERT(fd_size / sizeof(int) <=
> > +			       VHOST_MEMORY_BASELINE_NREGIONS);
> 
> IIUC, this could be tripped by a bug in the peer (qemu?) rather than
> in our own code.  In which case I think a die() would be more
> appropriate than an ASSERT().

Ah, right, it wouldn't be our issue... what about neither, so that we
don't crash if QEMU has an issue we could easily recover from?

> > [...]
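As a side note, recovering from an oversized SCM_RIGHTS payload without aborting could look roughly like this (a minimal standalone sketch, not passt's actual code; the hypothetical `MAX_FDS` and `recv_with_fds()` names stand in for `VHOST_MEMORY_BASELINE_NREGIONS` and `vu_message_read_default()`):

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define MAX_FDS 8	/* hypothetical cap; in the patch this role is
			 * played by VHOST_MEMORY_BASELINE_NREGIONS */

/* Receive one message, collecting any SCM_RIGHTS file descriptors.
 * A peer that passes more descriptors than expected makes this one
 * message fail (-1) instead of aborting the whole process.
 */
static ssize_t recv_with_fds(int sock, void *buf, size_t len,
			     int *fds, size_t *n_fds)
{
	char control[CMSG_SPACE(MAX_FDS * sizeof(int))] = { 0 };
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = control,
		.msg_controllen = sizeof(control),
	};
	struct cmsghdr *cmsg;
	ssize_t ret;

	*n_fds = 0;
	ret = recvmsg(sock, &msg, MSG_DONTWAIT);
	if (ret < 0)
		return (errno == EINTR || errno == EAGAIN ||
			errno == EWOULDBLOCK) ? 0 : -1;

	for (cmsg = CMSG_FIRSTHDR(&msg); cmsg;
	     cmsg = CMSG_NXTHDR(&msg, cmsg)) {
		size_t fd_size;

		if (cmsg->cmsg_level != SOL_SOCKET ||
		    cmsg->cmsg_type != SCM_RIGHTS)
			continue;

		/* cmsg_len minus the header gives the payload size */
		fd_size = cmsg->cmsg_len - CMSG_LEN(0);
		if (fd_size / sizeof(int) > MAX_FDS)
			return -1;	/* peer bug: fail this message only */

		memcpy(fds, CMSG_DATA(cmsg), fd_size);
		*n_fds = fd_size / sizeof(int);
	}
	return ret;
}
```

Returning -1 lets the caller drop the connection (or just the message) and keep serving, which matches the "neither ASSERT() nor die()" idea above.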
> >
> > +/**
> > + * vu_set_mem_table_exec() - Sets the memory map regions to be able to
> > + *                           translate the vring addresses.
> > + * @vdev:	Vhost-user device
> > + * @vmsg:	Vhost-user message
> > + *
> > + * Return: false as no reply is requested
> > + *
> > + * #syscalls:vu mmap munmap
> > + */
> > +static bool vu_set_mem_table_exec(struct vu_dev *vdev,
> > +				  struct vhost_user_msg *msg)
> > +{
> > +	struct vhost_user_memory m = msg->payload.memory, *memory = &m;
> 
> Is there a reason to take a copy of the message, rather than just
> referencing into msg as passed?
> 
> > +	unsigned int i;
> > +
> > +	for (i = 0; i < vdev->nregions; i++) {
> > +		struct vu_dev_region *r = &vdev->regions[i];
> > +		/* NOLINTNEXTLINE(performance-no-int-to-ptr) */
> > +		void *mm = (void *)r->mmap_addr;
> > +
> > +		if (mm)
> > +			munmap(mm, r->size + r->mmap_offset);
> 
> Do we actually ever need to change the mapping of the regions?  If not
> we can avoid this unmapping loop.
> 
> > +	}
> > +	vdev->nregions = memory->nregions;
> > +
> > +	debug("Nregions: %u", memory->nregions);
> > +	for (i = 0; i < vdev->nregions; i++) {
> > +		struct vhost_user_memory_region *msg_region = &memory->regions[i];
> > +		struct vu_dev_region *dev_region = &vdev->regions[i];
> > +		void *mmap_addr;
> > +
> > +		debug("Region %d", i);
> > +		debug("    guest_phys_addr: 0x%016"PRIx64,
> > +		      msg_region->guest_phys_addr);
> > +		debug("    memory_size:     0x%016"PRIx64,
> > +		      msg_region->memory_size);
> > +		debug("    userspace_addr   0x%016"PRIx64,
> > +		      msg_region->userspace_addr);
> > +		debug("    mmap_offset      0x%016"PRIx64,
> > +		      msg_region->mmap_offset);
> > +
> > +		dev_region->gpa = msg_region->guest_phys_addr;
> > +		dev_region->size = msg_region->memory_size;
> > +		dev_region->qva = msg_region->userspace_addr;
> > +		dev_region->mmap_offset = msg_region->mmap_offset;
> > +
> > +		/* We don't use offset argument of mmap() since the
> > +		 * mapped address has to be page aligned, and we use huge
> > +		 * pages.
> 
> We do what now?
We do madvise(pkt_buf, TAP_BUF_BYTES, MADV_HUGEPAGE) in main(), but
we're not using pkt_buf in this case, so I guess it's not relevant.

I'm not sure if _passt_ calling madvise(..., MADV_HUGEPAGE) on the
memory regions we get would have any effect, by the way.

> > [...]
> >
> > +/**
> > + * vu_send() - Send a buffer to the front-end using the RX virtqueue
> > + * @vdev:	vhost-user device
> > + * @buf:	address of the buffer
> > + * @size:	size of the buffer
> > + *
> > + * Return: number of bytes sent, -1 if there is an error
> > + */
> > +/* cppcheck-suppress unusedFunction */
> > +int vu_send(struct vu_dev *vdev, const void *buf, size_t size)
> > +{
> > +	struct vu_virtq *vq = &vdev->vq[VHOST_USER_RX_QUEUE];
> > +	struct vu_virtq_element elem[VIRTQUEUE_MAX_SIZE];
> > +	struct iovec in_sg[VIRTQUEUE_MAX_SIZE];
> > +	size_t lens[VIRTQUEUE_MAX_SIZE];
> > +	__virtio16 *num_buffers_ptr = NULL;
> > +	size_t hdrlen = vdev->hdrlen;
> > +	int in_sg_count = 0;
> > +	size_t offset = 0;
> > +	int i = 0, j;
> > +
> > +	debug("vu_send size %zu hdrlen %zu", size, hdrlen);
> > +
> > +	if (!vu_queue_enabled(vq) || !vu_queue_started(vq)) {
> > +		err("Got packet, but no available descriptors on RX virtq.");
> > +		return 0;
> > +	}
> > +
> > +	while (offset < size) {
> > +		size_t len;
> > +		int total;
> > +		int ret;
> > +
> > +		total = 0;
> > +
> > +		if (i == ARRAY_SIZE(elem) ||
> > +		    in_sg_count == ARRAY_SIZE(in_sg)) {
> > +			err("virtio-net unexpected long buffer chain");
> > +			goto err;
> > +		}
> > +
> > +		elem[i].out_num = 0;
> > +		elem[i].out_sg = NULL;
> > +		elem[i].in_num = ARRAY_SIZE(in_sg) - in_sg_count;
> > +		elem[i].in_sg = &in_sg[in_sg_count];
> > +
> > +		ret = vu_queue_pop(vdev, vq, &elem[i]);
> > +		if (ret < 0) {
> > +			if (vu_wait_queue(vq) != -1)
> > +				continue;
> > +			if (i) {
> > +				err("virtio-net unexpected empty queue: "
> > +				    "i %d mergeable %d offset %zd, size %zd, "
> > +				    "features 0x%" PRIx64,
> > +				    i, vu_has_feature(vdev,
> > +						      VIRTIO_NET_F_MRG_RXBUF),
> > +				    offset, size, vdev->features);
> > +			}
> > +			offset = -1;
> > +			goto err;
> > +		}
> > +		in_sg_count += elem[i].in_num;
> 
> Initially I thought this would consume the entire in_sg array on the
> first loop iteration, but I guess vu_queue_pop() reduces in_num from
> the value we initialise above.
> 
> > +		if (elem[i].in_num < 1) {
> 
> I realise it doesn't really matter in this context, but it makes more
> sense to me for this check to go _before_ we use in_num to update
> in_sg_count.
> 
> > +			err("virtio-net receive queue contains no in buffers");
> > +			vu_queue_detach_element(vq);
> > +			offset = -1;
> > +			goto err;
> > +		}
> > +
> > +		if (i == 0) {
> > +			struct virtio_net_hdr hdr = {
> > +				.flags = VIRTIO_NET_HDR_F_DATA_VALID,
> > +				.gso_type = VIRTIO_NET_HDR_GSO_NONE,
> > +			};
> > +
> > +			ASSERT(offset == 0);
> > +			ASSERT(elem[i].in_sg[0].iov_len >= hdrlen);
> 
> Is this necessarily our bug, or could it be caused by the peer giving
> unreasonably small buffers?  If the latter, then a die() would make
> more sense.

...same here.

-- 
Stefano