On Tue, Aug 27, 2024 at 12:14:20AM +0200, Stefano Brivio wrote: > On Mon, 26 Aug 2024 15:26:44 +1000 > David Gibson wrote: > > > On Thu, Aug 15, 2024 at 05:50:22PM +0200, Laurent Vivier wrote: > > > Add vhost_user.c and vhost_user.h that define the functions needed > > > to implement vhost-user backend. > > > > > > [...] > > > > > > +static int vu_message_read_default(int conn_fd, struct vhost_user_msg *vmsg) > > > +{ > > > + char control[CMSG_SPACE(VHOST_MEMORY_BASELINE_NREGIONS * > > > + sizeof(int))] = { 0 }; > > > + struct iovec iov = { > > > + .iov_base = (char *)vmsg, > > > + .iov_len = VHOST_USER_HDR_SIZE, > > > + }; > > > + struct msghdr msg = { > > > + .msg_iov = &iov, > > > + .msg_iovlen = 1, > > > + .msg_control = control, > > > + .msg_controllen = sizeof(control), > > > + }; > > > + ssize_t ret, sz_payload; > > > + struct cmsghdr *cmsg; > > > + size_t fd_size; > > > + > > > + ret = recvmsg(conn_fd, &msg, MSG_DONTWAIT); > > > + if (ret < 0) { > > > + if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK) > > > + return 0; > > > + return -1; > > > + } > > > + > > > + vmsg->fd_num = 0; > > > + for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL; > > > + cmsg = CMSG_NXTHDR(&msg, cmsg)) { > > > + if (cmsg->cmsg_level == SOL_SOCKET && > > > + cmsg->cmsg_type == SCM_RIGHTS) { > > > + fd_size = cmsg->cmsg_len - CMSG_LEN(0); > > > + ASSERT(fd_size / sizeof(int) <= > > > + VHOST_MEMORY_BASELINE_NREGIONS); > > > > IIUC, this could be tripped by a bug in the peer (qemu?) rather than > > in our own code. In which case I think a die() would be more > > appropriate than an ASSERT(). > > Ah, right, it wouldn't be our issue... what about neither, so that we > don't crash if QEMU has an issue we could easily recover from? It wasn't immediately obvious to me if we could reasily recover from that or not. [snip] > > > + vdev->nregions = memory->nregions; > > > + > > > + debug("Nregions: %u", memory->nregions); > > > + for (i = 0; i < vdev->nregions; i++) { > > > + struct vhost_user_memory_region *msg_region = &memory->regions[i]; > > > + struct vu_dev_region *dev_region = &vdev->regions[i]; > > > + void *mmap_addr; > > > + > > > + debug("Region %d", i); > > > + debug(" guest_phys_addr: 0x%016"PRIx64, > > > + msg_region->guest_phys_addr); > > > + debug(" memory_size: 0x%016"PRIx64, > > > + msg_region->memory_size); > > > + debug(" userspace_addr 0x%016"PRIx64, > > > + msg_region->userspace_addr); > > > + debug(" mmap_offset 0x%016"PRIx64, > > > + msg_region->mmap_offset); > > > + > > > + dev_region->gpa = msg_region->guest_phys_addr; > > > + dev_region->size = msg_region->memory_size; > > > + dev_region->qva = msg_region->userspace_addr; > > > + dev_region->mmap_offset = msg_region->mmap_offset; > > > + > > > + /* We don't use offset argument of mmap() since the > > > + * mapped address has to be page aligned, and we use huge > > > + * pages. > > > > We do what now? > > We do madvise(pkt_buf, TAP_BUF_BYTES, MADV_HUGEPAGE) in main(), but > we're not using pkt_buf in this case, so I guess it's not relevant. I'm > not sure if _passt_ calling madvise(..., MADV_HUGEPAGE) on the memory > regions we get would have any effect, by the way. Huh, I'd forgotten about that. AIUI qemu allocates the memory and we map it into passt, so I don't think our madvise() would have any effect here. -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson