From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: passt.top; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=M6h8r1GB; dkim-atps=neutral Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by passt.top (Postfix) with ESMTPS id 389D05A026F for ; Wed, 02 Apr 2025 14:03:44 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1743595423; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=mQdjo7F5R5znUjSR52DkAksUCwsO4kbFDsa8YRMLzls=; b=M6h8r1GBTrrnxuUVBqMQH0bp3I+KxcWOoSW6pTO14k4QXIF3Wwgb6zkQ3fFUBKSWZ1jU7I 59yB70c37AzXwzI9+e+ASWiCfb+lFvPZwH2PVy7q2MtRUSIRncHIrPxoRx+n19Qg9ow+fd Kku1wNQgcPz/MXjykvkjoSUg2aXemMc= Received: from mail-pf1-f199.google.com (mail-pf1-f199.google.com [209.85.210.199]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-148-QFOz_i8AMM-DuczN5lbN7w-1; Wed, 02 Apr 2025 08:03:41 -0400 X-MC-Unique: QFOz_i8AMM-DuczN5lbN7w-1 X-Mimecast-MFC-AGG-ID: QFOz_i8AMM-DuczN5lbN7w_1743595421 Received: by mail-pf1-f199.google.com with SMTP id d2e1a72fcca58-736a7d0b82fso45651b3a.1 for ; Wed, 02 Apr 2025 05:03:41 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1743595421; x=1744200221; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=mQdjo7F5R5znUjSR52DkAksUCwsO4kbFDsa8YRMLzls=; b=toYal+T6yHtPgYbiYlC/Fckqq/7YYvgHlcgPoFfLNWU1YPbrnL4mAv0ZqE8aYelL0I 3PZc9Tli2sOQcXI4cpH8tdQHv+R9i862UflekvGeu115SLDgykBD8P7w2BtzFe18CBIu htvFFSmE/hG8xjtO1p2i5qOvxNd/ir3d197GB1pCeFAZqvBosO2xaVHytwnSsOyCe64V SuyhvFGcRsqkQCpfH+seW2ud3soV5nENImehqWll+TqpD1k6gHr1h/w/MLH69/CfYxU7 KKWvgB426BV7v/AjPoIki1zRy5wofyyNIOdMjy+53x4ACizxPzXoE3SNA2SO8BC9dtje oAeQ== X-Gm-Message-State: AOJu0Yw7ZIYX1QfTTLJqHHDCrj3AeNJyPa0+MUrwDixYJl7HswyrkDZw 5lwRac2wNSJiQHQLZR6zmAI4dBO0GDJNSjvX67s4Q/IA5X5m9b9edKYKYzC25jO1j5HMhMpdnbL luzczD8yYivRxOIIoJ3kBT6SHCTyrLVfQEZmkOmh5HPHsGRrNBE/83od/w0jI7+Az8r01DE/OdT NHX7pxFXZSyuFeb7CcnX63Fj0R X-Gm-Gg: ASbGncvbWKLb+lbbDFWU8lZ1UVMpDzBxXZyXR2Q7nLHnJfHEP9k9vkqURuPM2CKN7dx iqC2jCzDFG/tiXy5PCzYTpmzcH3gxx5dQcIlmrlDOMPY3/5WeerBJZoBC79n/7UU6rCx1ujM= X-Received: by 2002:a05:6a00:14c1:b0:736:520a:58f9 with SMTP id d2e1a72fcca58-73980462e7dmr18291707b3a.17.1743595420418; Wed, 02 Apr 2025 05:03:40 -0700 (PDT) X-Google-Smtp-Source: AGHT+IHUL07mSjV/r0T004RBsxUeVADYJvsZPIX+yY88fRqHLdflliDgAMNcJcBBE1FGOWj2+5Fy4XDkuzFWldZ2Jkg= X-Received: by 2002:a05:6a00:14c1:b0:736:520a:58f9 with SMTP id d2e1a72fcca58-73980462e7dmr18291685b3a.17.1743595419809; Wed, 02 Apr 2025 05:03:39 -0700 (PDT) MIME-Version: 1.0 References: <20250401113809.1765282-1-eperezma@redhat.com> <20250401113809.1765282-3-eperezma@redhat.com> <20250402091622.7cda67ba@elisabeth> In-Reply-To: <20250402091622.7cda67ba@elisabeth> From: Eugenio Perez Martin Date: Wed, 2 Apr 2025 14:03:03 +0200 X-Gm-Features: AQ5f1JquzeSaj0U2uyFxIs-YEap3YLkFBegwuFVVCDmejUYyII5BY7iEoQVuueo Message-ID: Subject: Re: [PATCH 2/3] tap: implement vhost_call_cb To: Stefano Brivio X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: 3XjgW7cyeYjeXRo-z8C6m7Rr7rc8CLGaXxSw2sseUrU_1743595421 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Message-ID-Hash: Q4IGXHREVHUHNAUV4UBK3KWVVQ6HO6A5 X-Message-ID-Hash: Q4IGXHREVHUHNAUV4UBK3KWVVQ6HO6A5 X-MailFrom: eperezma@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top, jmaloy@redhat.com, lvivier@redhat.com, dgibson@redhat.com X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Wed, Apr 2, 2025 at 9:16=E2=80=AFAM Stefano Brivio = wrote: > > A couple of general notes: > > - there's no need to Cc: people already on this list unless you need > their attention specifically (it can get quite noisy...) > Got it, I'll do for the next time. > - things kind of make sense to me, many are hard to evaluate at this > early stage, so I noted below just some specific comments/questions > here, but in the sense of "being on the same page" my current answer > is... yes, I guess so! > > - I'm not reviewing 1/3 and 3/3 right away as I guess you'll revisit > them anyway, I'm just not sure we need a separate pool... but I'm not > sure if that's temporary either. > > On Tue, 1 Apr 2025 13:38:08 +0200 > Eugenio P=C3=A9rez wrote: > > > This is the main callback when the tap device has processed any buffer. > > Another possibility is to reuse the tap callback for this, so less code > > changes are needed. > > > > Signed-off-by: Eugenio P=C3=A9rez > > --- > > epoll_type.h | 2 + > > passt.c | 4 + > > passt.h | 12 +- > > tap.c | 316 ++++++++++++++++++++++++++++++++++++++++++++++++++- > > tap.h | 2 + > > 5 files changed, 334 insertions(+), 2 deletions(-) > > > > diff --git a/epoll_type.h b/epoll_type.h > > index 7f2a121..6284c79 100644 > > --- a/epoll_type.h > > +++ b/epoll_type.h > > @@ -44,6 +44,8 @@ enum epoll_type { > > EPOLL_TYPE_REPAIR_LISTEN, > > /* TCP_REPAIR helper socket */ > > EPOLL_TYPE_REPAIR, > > + /* vhost-kernel call socket */ > > + EPOLL_TYPE_VHOST_CALL, > > > > EPOLL_NUM_TYPES, > > }; > > diff --git a/passt.c b/passt.c > > index cd06772..19c5d5f 100644 > > --- a/passt.c > > +++ b/passt.c > > @@ -79,6 +79,7 @@ char *epoll_type_str[] =3D { > > [EPOLL_TYPE_VHOST_KICK] =3D "vhost-user kick socket", > > [EPOLL_TYPE_REPAIR_LISTEN] =3D "TCP_REPAIR helper listening = socket", > > [EPOLL_TYPE_REPAIR] =3D "TCP_REPAIR helper socket", > > + [EPOLL_TYPE_VHOST_CALL] =3D "vhost-kernel call socket", > > }; > > static_assert(ARRAY_SIZE(epoll_type_str) =3D=3D EPOLL_NUM_TYPES, > > "epoll_type_str[] doesn't match enum epoll_type"); > > @@ -357,6 +358,9 @@ loop: > > case EPOLL_TYPE_REPAIR: > > repair_handler(&c, eventmask); > > break; > > + case EPOLL_TYPE_VHOST_CALL: > > + vhost_call_cb(&c, ref, &now); > > + break; > > default: > > /* Can't happen */ > > ASSERT(0); > > diff --git a/passt.h b/passt.h > > index 8f45091..eb5aa03 100644 > > --- a/passt.h > > +++ b/passt.h > > @@ -45,7 +45,7 @@ union epoll_ref; > > * @icmp: ICMP-specific reference part > > * @data: Data handled by protocol handlers > > * @nsdir_fd: netns dirfd for fallback timer checking if namesp= ace is gone > > - * @queue: vhost-user queue index for this fd > > + * @queue: vhost queue index for this fd > > * @u64: Opaque reference for epoll_ctl() and epoll_wait() > > */ > > union epoll_ref { > > @@ -271,11 +271,14 @@ struct ctx { > > int fd_tap; > > int fd_repair_listen; > > int fd_repair; > > + /* TODO document all added fields */ > > + int fd_vhost; > > unsigned char our_tap_mac[ETH_ALEN]; > > unsigned char guest_mac[ETH_ALEN]; > > uint16_t mtu; > > > > uint64_t hash_secret[2]; > > + uint64_t virtio_features; > > > > int ifi4; > > struct ip4_ctx ip4; > > @@ -299,6 +302,13 @@ struct ctx { > > int no_icmp; > > struct icmp_ctx icmp; > > > > + struct { > > + uint16_t last_used_idx; > > + > > + int kick_fd; > > + int call_fd; > > + } vq[2]; > > + > > int no_dns; > > int no_dns_search; > > int no_dhcp_dns; > > diff --git a/tap.c b/tap.c > > index ce859ba..fbe83aa 100644 > > --- a/tap.c > > +++ b/tap.c > > @@ -31,6 +31,7 @@ > > #include > > #include > > #include > > +#include > > Why do we need eventfds here? Is there anything peculiar in the > protocol, or it's all stuff that can be done with "regular" file > descriptors plus epoll? > Yes, vhost-net trusts eventfd for signalling. > > #include > > #include > > #include > > @@ -82,6 +83,46 @@ static PACKET_POOL_NOINIT(pool_tap6, TAP_MSGS, pkt_b= uf); > > #define TAP_SEQS 128 /* Different L4 tuples in one batch *= / > > #define FRAGMENT_MSG_RATE 10 /* # seconds between fragment warning= s */ > > > > +#define VHOST_VIRTIO 0xAF > > +#define VHOST_GET_FEATURES _IOR(VHOST_VIRTIO, 0x00, __u64) > > +#define VHOST_SET_FEATURES _IOW(VHOST_VIRTIO, 0x00, __u64) > > +#define VHOST_SET_OWNER _IO(VHOST_VIRTIO, 0x01) > > +#define VHOST_SET_MEM_TABLE _IOW(VHOST_VIRTIO, 0x03, struct vhost_memo= ry) > > +#define VHOST_SET_VRING_NUM _IOW(VHOST_VIRTIO, 0x10, struct vhost_vrin= g_state) > > +#define VHOST_SET_VRING_ADDR _IOW(VHOST_VIRTIO, 0x11, struct vhost_vri= ng_addr) > > +#define VHOST_SET_VRING_KICK _IOW(VHOST_VIRTIO, 0x20, struct vhost_vri= ng_file) > > +#define VHOST_SET_VRING_CALL _IOW(VHOST_VIRTIO, 0x21, struct vhost_vri= ng_file) > > +#define VHOST_SET_VRING_ERR _IOW(VHOST_VIRTIO, 0x22, struct vhost_vrin= g_file) > > +#define VHOST_SET_BACKEND_FEATURES _IOW(VHOST_VIRTIO, 0x25, __u64) > > +#define VHOST_NET_SET_BACKEND _IOW(VHOST_VIRTIO, 0x30, struct vhost_vr= ing_file) > > Coding style (no strict requirement): align those nicely/table-like if > possible. > I'll fix it (and the rest of code style comments) in the next RFC! > > + > > +char virtio_rx_pkt_buf[PKT_BUF_BYTES] __attribute__((aligned(PAGE_SIZE= ))); > > +static PACKET_POOL_NOINIT(pool_virtiorx4, TAP_MSGS, virtio_rx_pkt_buf)= ; > > +static PACKET_POOL_NOINIT(pool_virtiorx6, TAP_MSGS, virtio_rx_pkt_buf)= ; > > + > > +/* TODO is it better to have 1024 or 65520 bytes per packet? */ > > In general 65535 bytes (including the Ethernet header) appears to be a > good idea, but actual profiling would be nice in the long term. > That can be done for sure. > > +#define VHOST_NDESCS 1024 > > +static struct vring_desc vring_desc[2][VHOST_NDESCS] __attribute__((al= igned(PAGE_SIZE))); > > Coding style, here and a bit all over the place: wrap to 80 columns > ("net" kernel-like style). > > > +static union { > > + struct vring_avail avail; > > + char buf[offsetof(struct vring_avail, ring[VHOST_NDESCS])]; > > +} vring_avail_0 __attribute__((aligned(PAGE_SIZE))), vring_avail_1 __a= ttribute__((aligned(PAGE_SIZE))); > > +static union { > > + struct vring_used used; > > + char buf[offsetof(struct vring_used, ring[VHOST_NDESCS])]; > > +} vring_used_0 __attribute__((aligned(PAGE_SIZE))), vring_used_1 __att= ribute__((aligned(PAGE_SIZE))); > > + > > +/* all descs ring + 2rings * 2vqs + tx pkt buf + rx pkt buf */ > > What's the "all descs ring"? A short "theory of operation" section > might help eventually. > The vhost kernel module expects these descriptors ring for the buffer addresses & length. I can add more docs for sure. > > +#define N_VHOST_REGIONS 7 > > +union { > > + struct vhost_memory mem; > > + char buf[offsetof(struct vhost_memory, regions[N_VHOST_REGIONS])]= ; > > +} vhost_memory =3D { > > + .mem =3D { > > + .nregions =3D N_VHOST_REGIONS, > > + }, > > +}; > > + > > /** > > * tap_l2_max_len() - Maximum frame size (including L2 header) for cur= rent mode > > * @c: Execution context > > @@ -1360,6 +1401,89 @@ void tap_listen_handler(struct ctx *c, uint32_t = events) > > tap_start_connection(c); > > } > > > > +static void *virtqueue_get_buf(struct ctx *c, unsigned qid, unsigned *= len) > > +{ > > + struct vring_used *used =3D !qid ? &vring_used_0.used : &vring_us= ed_1.used; > > + uint32_t i; > > + uint16_t used_idx, last_used; > > + > > + /* TODO think if this has races with previous eventfd_read */ > > + /* TODO we could improve performance with a shadow_used_idx */ > > + used_idx =3D le16toh(used->idx); > > + > > + smp_rmb(); > > + > > + if (used_idx =3D=3D c->vq[qid].last_used_idx) { > > + *len =3D 0; > > + return NULL; > > + } > > + > > + last_used =3D c->vq[qid].last_used_idx & VHOST_NDESCS; > > + i =3D le32toh(used->ring[last_used].id); > > + *len =3D le32toh(used->ring[last_used].len); > > + > > + if (i > VHOST_NDESCS) { > > + /* TODO imporove this, it will cause infinite loop */ > > + warn("vhost: id %u at used position %u out of range (max= =3D%u)", i, last_used, VHOST_NDESCS); > > + return NULL; > > + } > > + > > + if (*len > PKT_BUF_BYTES/VHOST_NDESCS) { > > + warn("vhost: id %d len %u > %zu", i, *len, PKT_BUF_BYTES/= VHOST_NDESCS); > > + return NULL; > > + } > > + > > + /* TODO check if the id is valid and it has not been double used = */ > > + c->vq[qid].last_used_idx++; > > + return virtio_rx_pkt_buf + i * (PKT_BUF_BYTES/VHOST_NDESCS); > > +} > > + > > +/* container is tx but we receive it from vhost POV */ > > +void vhost_call_cb(struct ctx *c, union epoll_ref ref, const struct ti= mespec *now) > > +{ > > + eventfd_read(ref.fd, (eventfd_t[]){ 0 }); > > + > > + tap_flush_pools(); > > + > > + while (true) { > > + struct virtio_net_hdr_mrg_rxbuf *hdr; > > + unsigned len; > > + > > + hdr =3D virtqueue_get_buf(c, ref.queue, &len); > > + if (!hdr) > > + break; > > + > > + if (len < sizeof(*hdr)) { > > + warn("vhost: invalid len %u", len); > > + continue; > > + } > > + > > + /* TODO this will break from this moment */ > > + if (hdr->num_buffers !=3D 1) { > > + warn("vhost: Too many buffers %u, %zu bytes shoul= d be enough for everybody!", hdr->num_buffers, PKT_BUF_BYTES/VHOST_NDESCS); > > + continue; > > + } > > + > > + /* TODO fix the v6 pool to support ipv6 */ > > + tap_add_packet(c, len - sizeof(*hdr), (void *)(hdr+1), po= ol_virtiorx4, pool_virtiorx6); > > + } > > + > > + tap_handler(c, now, pool_virtiorx4, pool_virtiorx6); > > +} > > + > > +/* TODO: Actually refill */ > > +static void rx_pkt_refill(int kick_fd) > > +{ > > + for (unsigned i =3D 0; i < VHOST_NDESCS; ++i) { > > + vring_desc[0][i].addr =3D (uintptr_t)virtio_rx_pkt_buf + = i * (PKT_BUF_BYTES/VHOST_NDESCS); > > + vring_desc[0][i].len =3D PKT_BUF_BYTES/VHOST_NDESCS; > > + vring_desc[0][i].flags =3D VRING_DESC_F_WRITE; > > + } > > + > > + vring_avail_0.avail.idx =3D VHOST_NDESCS; > > + eventfd_write(kick_fd, 1); > > +} > > + > > /** > > * tap_ns_tun() - Get tuntap fd in namespace > > * @c: Execution context > > @@ -1370,10 +1494,13 @@ void tap_listen_handler(struct ctx *c, uint32_t= events) > > */ > > static int tap_ns_tun(void *arg) > > { > > + /* TODO we need to check if vhost support VIRTIO_NET_F_MRG_RXBUF = and VHOST_NET_F_VIRTIO_NET_HDR actually */ > > + static const uint64_t features =3D > > + (1ULL << VIRTIO_F_VERSION_1) | (1ULL << VIRTIO_NET_F_MRG_= RXBUF) | (1ULL << VHOST_NET_F_VIRTIO_NET_HDR); > > struct ifreq ifr =3D { .ifr_flags =3D IFF_TAP | IFF_NO_PI }; > > I kind of wonder, by the way, if IFF_TUN simplifies things here. It's > something we should already add, as an option, see also: > https://bugs.passt.top/show_bug.cgi?id=3D49, but if it makes your life > easier for any reason you might consider adding it right away. > I'm not sure, but maybe we can reuse something, yes! > > int flags =3D O_RDWR | O_NONBLOCK | O_CLOEXEC; > > struct ctx *c =3D (struct ctx *)arg; > > - int fd, rc; > > + int fd, vhost_fd, rc; > > > > c->fd_tap =3D -1; > > memcpy(ifr.ifr_name, c->pasta_ifn, IFNAMSIZ); > > @@ -1383,6 +1510,175 @@ static int tap_ns_tun(void *arg) > > if (fd < 0) > > die_perror("Failed to open() /dev/net/tun"); > > > > + vhost_fd =3D open("/dev/vhost-net", flags); > > + if (vhost_fd < 0) > > + die_perror("Failed to open() /dev/vhost-net"); > > Note pretty much to future self: this will need adjustments to AppArmor > and SELinux policies. > > > + > > + rc =3D ioctl(vhost_fd, VHOST_SET_OWNER, NULL); > > + if (rc < 0) > > + die_perror("VHOST_SET_OWNER ioctl on /dev/vhost-net faile= d"); > > + > > + rc =3D ioctl(vhost_fd, VHOST_GET_FEATURES, &c->virtio_features); > > + if (rc < 0) > > + die_perror("VHOST_GET_FEATURES ioctl on /dev/vhost-net fa= iled"); > > + > > + /* TODO inform more explicitely */ > > + fprintf(stderr, "vhost features: %lx\n", c->virtio_features); > > + fprintf(stderr, "req features: %lx\n", features); > > + c->virtio_features &=3D features; > > + if (c->virtio_features !=3D features) > > + die("vhost does not support required features"); > > + > > + for (int i =3D 0; i < ARRAY_SIZE(c->vq); i++) { > > No declarations directly in loops (it hides them somehow). > > > + struct vhost_vring_file file =3D { > > + .index =3D i, > > + }; > > + union epoll_ref ref =3D { .type =3D EPOLL_TYPE_VHOST_CALL= , > > + .queue =3D i }; > > + struct epoll_event ev; > > + > > + file.fd =3D eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC); > > + ref.fd =3D file.fd; > > + if (file.fd < 0) > > + die_perror("Failed to create call eventfd"); > > + > > + rc =3D ioctl(vhost_fd, VHOST_SET_VRING_CALL, &file); > > + if (rc < 0) > > + die_perror( > > + "VHOST_SET_VRING_CALL ioctl on /dev/vhost= -net failed"); > > Same as net kernel style: if it's more than a line, even as a single > statement, use curly brackets (rationale: somebody later adds another > statement without noticing and... oops). > > > + > > + ev =3D (struct epoll_event){ .data.u64 =3D ref.u64, .even= ts =3D EPOLLIN }; > > + rc =3D epoll_ctl(c->epollfd, EPOLL_CTL_ADD, ref.fd, &ev); > > + if (rc < 0) > > + die_perror("Failed to add call eventfd to epoll")= ; > > + c->vq[i].call_fd =3D file.fd; > > + } > > + > > + /* 1:1 translation */ > > + vhost_memory.mem.regions[0] =3D (struct vhost_memory_region){ > > Space between cast and initialiser, ") {", for consistency. I'll wait > before we have some kind of theory of operation / general description > before actually looking into those, I'm not sure about the exact role of > those seven regions. > In the case of QEMU, vhost reads the descriptor ring with addresses in the guest address space, as it is a direct communication between them without QEMU intervening. These regions tells vhost how to translate these addresses to addresses in the QEMU address space so vhost can use copy_to_user and copy_from_user. > > + .guest_phys_addr =3D (uintptr_t)&vring_desc, > > + /* memory size should include the last byte, but we proba= bly never send > > + * ptrs there so... > > + */ > > + .memory_size =3D sizeof(vring_desc), > > + .userspace_addr =3D (uintptr_t)&vring_desc, > > + }; > > + vhost_memory.mem.regions[1] =3D (struct vhost_memory_region){ > > + .guest_phys_addr =3D (uintptr_t)&vring_avail_0, > > + /* memory size should include the last byte, but we proba= bly never send > > + * ptrs there so... > > + */ > > + .memory_size =3D sizeof(vring_avail_0), > > + .userspace_addr =3D (uintptr_t)&vring_avail_0, > > + }; > > + vhost_memory.mem.regions[2] =3D (struct vhost_memory_region){ > > + .guest_phys_addr =3D (uintptr_t)&vring_avail_1, > > + /* memory size should include the last byte, but we proba= bly never send > > + * ptrs there so... > > + */ > > + .memory_size =3D sizeof(vring_avail_1), > > + .userspace_addr =3D (uintptr_t)&vring_avail_1, > > + }; > > + vhost_memory.mem.regions[3] =3D (struct vhost_memory_region){ > > + .guest_phys_addr =3D (uintptr_t)&vring_used_0, > > + /* memory size should include the last byte, but we proba= bly never send > > + * ptrs there so... > > + */ > > + .memory_size =3D sizeof(vring_avail_0), > > + .userspace_addr =3D (uintptr_t)&vring_used_0, > > + }; > > + vhost_memory.mem.regions[4] =3D (struct vhost_memory_region){ > > + .guest_phys_addr =3D (uintptr_t)&vring_used_1, > > + /* memory size should include the last byte, but we proba= bly never send > > + * ptrs there so... > > + */ > > + .memory_size =3D sizeof(vring_avail_1), > > + .userspace_addr =3D (uintptr_t)&vring_used_1, > > + }; > > + vhost_memory.mem.regions[5] =3D (struct vhost_memory_region){ > > + .guest_phys_addr =3D (uintptr_t)virtio_rx_pkt_buf, > > + /* memory size should include the last byte, but we proba= bly never send > > + * ptrs there so... > > + */ > > + .memory_size =3D sizeof(virtio_rx_pkt_buf), > > + .userspace_addr =3D (uintptr_t)virtio_rx_pkt_buf, > > + }; > > + vhost_memory.mem.regions[6] =3D (struct vhost_memory_region){ > > + .guest_phys_addr =3D (uintptr_t)pkt_buf, > > + /* memory size should include the last byte, but we proba= bly never send > > + * ptrs there so... > > + */ > > + .memory_size =3D sizeof(pkt_buf), > > + .userspace_addr =3D (uintptr_t)pkt_buf, > > + }; > > + static_assert(6 < N_VHOST_REGIONS); > > + > > + rc =3D ioctl(vhost_fd, VHOST_SET_MEM_TABLE, &vhost_memory.mem); > > + if (rc < 0) > > + die_perror( > > + "VHOST_SET_MEM_TABLE ioctl on /dev/vhost-net fail= ed"); > > + > > + /* TODO: probably it increases RX perf */ > > +#if 0 > > + struct ifreq ifr; > > + memset(&ifr, 0, sizeof(ifr)); > > + > > + if (ioctl(fd, TUNGETIFF, &ifr) !=3D 0) > > + die_perror("Unable to query TUNGETIFF on FD %d", fd); > > + } > > + > > + if (ifr.ifr_flags & IFF_VNET_HDR) > > + net->dev.features &=3D ~(1ULL << VIRTIO_NET_F_MRG_RXBUF); > > +#endif > > + rc =3D ioctl(vhost_fd, VHOST_SET_FEATURES, &c->virtio_features); > > + if (rc < 0) > > + die_perror("VHOST_SET_FEATURES ioctl on /dev/vhost-net fa= iled"); > > + > > + /* Duplicating foreach queue to follow the exact order from QEMU = */ > > + for (int i =3D 0; i < ARRAY_SIZE(c->vq); i++) { > > + struct vhost_vring_addr addr =3D { > > + .index =3D i, > > + .desc_user_addr =3D (unsigned long)vring_desc[i], > > + .avail_user_addr =3D i =3D=3D 0 ? (unsigned long)= &vring_avail_0 : > > + = (unsigned long)&vring_avail_1, > > + .used_user_addr =3D i =3D=3D 0 ? (unsigned long)&= vring_used_0 : > > + = (unsigned long)&vring_used_1, > > + /* GPA addr */ > > + .log_guest_addr =3D i =3D=3D 0 ? (unsigned long)&= vring_used_0 : > > + = (unsigned long)&vring_used_1, > > + }; > > + struct vhost_vring_state state =3D { > > + .index =3D i, > > + .num =3D VHOST_NDESCS, > > + }; > > + struct vhost_vring_file file =3D { > > + .index =3D i, > > + }; > > + > > + rc =3D ioctl(vhost_fd, VHOST_SET_VRING_NUM, &state); > > + if (rc < 0) > > + die_perror( > > + "VHOST_SET_VRING_NUM ioctl on /dev/vhost-= net failed"); > > + > > + fprintf(stderr, "qid: %d\n", i); > > + fprintf(stderr, "vhost desc addr: 0x%llx\n", addr.desc_us= er_addr); > > + fprintf(stderr, "vhost avail addr: 0x%llx\n", addr.avail_= user_addr); > > + fprintf(stderr, "vhost used addr: 0x%llx\n", addr.used_us= er_addr); > > + rc =3D ioctl(vhost_fd, VHOST_SET_VRING_ADDR, &addr); > > + if (rc < 0) > > + die_perror( > > + "VHOST_SET_VRING_ADDR ioctl on /dev/vhost= -net failed"); > > + > > + file.fd =3D eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC); > > + if (file.fd < 0) > > + die_perror("Failed to create kick eventfd"); > > + rc =3D ioctl(vhost_fd, VHOST_SET_VRING_KICK, &file); > > + if (rc < 0) > > + die_perror( > > + "VHOST_SET_VRING_KICK ioctl on /dev/vhost= -net failed"); > > + c->vq[i].kick_fd =3D file.fd; > > + } > > + > > rc =3D ioctl(fd, (int)TUNSETIFF, &ifr); > > if (rc < 0) > > die_perror("TUNSETIFF ioctl on /dev/net/tun failed"); > > @@ -1390,7 +1686,25 @@ static int tap_ns_tun(void *arg) > > if (!(c->pasta_ifi =3D if_nametoindex(c->pasta_ifn))) > > die("Tap device opened but no network interface found"); > > > > + rx_pkt_refill(c->vq[0].kick_fd); > > + > > + /* Duplicating foreach queue to follow the exact order from QEMU = */ > > + for (int i =3D 0; i < ARRAY_SIZE(c->vq); i++) { > > + struct vhost_vring_file file =3D { > > + .index =3D i, > > + .fd =3D fd, > > + }; > > + > > + fprintf(stderr, "qid: %d\n", file.index); > > + fprintf(stderr, "tap fd: %d\n", file.fd); > > + rc =3D ioctl(vhost_fd, VHOST_NET_SET_BACKEND, &file); > > + if (rc < 0) > > + die_perror( > > + "VHOST_NET_SET_BACKEND ioctl on /dev/vhos= t-net failed"); > > + } > > + > > c->fd_tap =3D fd; > > + c->fd_vhost =3D vhost_fd; > > > > return 0; > > } > > diff --git a/tap.h b/tap.h > > index 0b5ad17..91d3f62 100644 > > --- a/tap.h > > +++ b/tap.h > > @@ -69,6 +69,8 @@ static inline void tap_hdr_update(struct tap_hdr *thd= r, size_t l2len) > > thdr->vnet_len =3D htonl(l2len); > > } > > > > +void vhost_call_cb(struct ctx *c, union epoll_ref ref, const struct ti= mespec *now); > > + > > unsigned long tap_l2_max_len(const struct ctx *c); > > void *tap_push_l2h(const struct ctx *c, void *buf, uint16_t proto); > > void *tap_push_ip4h(struct iphdr *ip4h, struct in_addr src, >