From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: passt.top; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=Pk36jD+g; dkim-atps=neutral Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by passt.top (Postfix) with ESMTPS id A56F05A0278 for ; Mon, 28 Jul 2025 19:03:52 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1753722231; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=oyilz/t/vU6+i/0ajKiX+8t0m8emsYsmbds0BGc6QYo=; b=Pk36jD+gB5ms7hr+XJArKUlPc25lYD6aChIgRxHysr1zq0PAezWG/ftwtIUMpZf2eljpGW heulxy4BPMKrqoZBv87fXOuiYxBaZoW3ryPPt7JJqFRIeVWTYnQJSRUi8r2hyPsMIG7In2 k7dpF5ETO+0nV8aVpYPFcQuHo9H37Bc= Received: from mail-pl1-f199.google.com (mail-pl1-f199.google.com [209.85.214.199]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-86-OGHe_1O-Preuol5STWEADQ-1; Mon, 28 Jul 2025 13:03:50 -0400 X-MC-Unique: OGHe_1O-Preuol5STWEADQ-1 X-Mimecast-MFC-AGG-ID: OGHe_1O-Preuol5STWEADQ_1753722229 Received: by mail-pl1-f199.google.com with SMTP id d9443c01a7336-23fed1492f6so26604425ad.2 for ; Mon, 28 Jul 2025 10:03:49 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1753722229; x=1754327029; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=oyilz/t/vU6+i/0ajKiX+8t0m8emsYsmbds0BGc6QYo=; b=fheCecTatGavbrbyElBGOOzxSapnNQgSmVtyWtuSsCFIMIz/p3O+oqR5KG7Cthu8U2 x/PyIMKJVVcBjrHND2QCX/lZg/PPjHPK/Vr0fwSJ+1l/UoJA6i3pOWb2O7dCzIjJLJMl g/ipLRmHi23oNg4nOyk36HMq5RbGTzzObDz79d+05mq//JKPMBCjU+NpHUs+5LuioZVH obUfBebcTDYX1AiK/m5ntD4IQMo8CfLm4x+8IpgXaAu23eeceL8VEilvMKvDAZqbfbuu gzbTBJMDkqZluXaZVoMOeBPIg3RaqbqY8gELErS51e83mIhWGlK30spAEaso9hvMRbMD 5A1g== X-Gm-Message-State: AOJu0YwnLEx1ndZV/o4ceoCfuydzeJuLqY2gku4MuHL8SooJ5e+sSM3y qm+uDtMZ+78kz6FIp7h8cqJtEeqPUnFAscljU5STTTxuXc2w2QMwsZ0jMe9ucwiXeXRWBZ4Hmk1 y6G0nIBD1nN1V5sWRqXlcOvZjXC1gwD28qMizMJtTkZ+vumTGQxo4CBGtsH3dDIzHfw8jf6R9r0 6mJqfTl/BCdWUjnWrKRuHF5T7lzJWY X-Gm-Gg: ASbGnctoTEVc6kPhPOoTSH8lc3eXB+c8icbkOtwntdMqKJYP1QDJPMVGc5xFQ81JzKb 7Bi68BpwKQLQ5qWCqM5UxHdnljJ72QE+CeBZPJcY+P01ZRkL1XGg0wheuNPOAF/Rzodcp/HNk/q zLjpLcR+7XQ1zEsUCrlOt9 X-Received: by 2002:a17:902:e153:b0:23f:e2e0:f89b with SMTP id d9443c01a7336-23fe2e0fb20mr75301635ad.3.1753722228830; Mon, 28 Jul 2025 10:03:48 -0700 (PDT) X-Google-Smtp-Source: AGHT+IFcPP6ZvGEVIgnK8bH/EkBEZQRfAUSRKL/ym1TSqimd8LnsRIsTQq9/K47703Oba1qM+ckw7l9hZmnGFR/9KX4= X-Received: by 2002:a17:902:e153:b0:23f:e2e0:f89b with SMTP id d9443c01a7336-23fe2e0fb20mr75301365ad.3.1753722228389; Mon, 28 Jul 2025 10:03:48 -0700 (PDT) MIME-Version: 1.0 References: <20250709174748.3514693-1-eperezma@redhat.com> <20250709174748.3514693-11-eperezma@redhat.com> In-Reply-To: From: Eugenio Perez Martin Date: Mon, 28 Jul 2025 19:03:12 +0200 X-Gm-Features: Ac12FXxbDuy9m3le3CRsHyOKKfxMXvcDiqSXnovJUTjKzW_WDjHq2hh1fS6EnfQ Message-ID: Subject: Re: [RFC v2 10/11] tap: add poll(2) to used_idx To: David Gibson X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: cGZAq9zKcXHxMeQlPixE17X__y03Uhj4_iewQWOsp0U_1753722229 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Message-ID-Hash: OCURQZAQEO3T4GC5Q5JD7XXL3EV7BBBT X-Message-ID-Hash: OCURQZAQEO3T4GC5Q5JD7XXL3EV7BBBT X-MailFrom: eperezma@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top, jasowang@redhat.com X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Thu, Jul 24, 2025 at 3:21=E2=80=AFAM David Gibson wrote: > > On Wed, Jul 09, 2025 at 07:47:47PM +0200, Eugenio P=C3=A9rez wrote: > > From ~13Gbit/s to ~11.5Gbit/s. > > Again, I really don't know what you're comparing to what here. > When the buffer is full I'm using poll() to wait until vhost free some buffers, instead of actively checking the used index. This is the cost of the syscall. We could mitigate some of this by making the tx queue larger though. > > > > TODO: Maybe we can reuse epoll for this, not needing to introduce a new > > syscall. > > I really hope so. > > > > > Signed-off-by: Eugenio P=C3=A9rez > > --- > > tap.c | 59 +++++++++++++++++++++++++++++++++++++++++++++---------- > > tap.h | 2 +- > > tcp_buf.c | 6 +++--- > > 3 files changed, 53 insertions(+), 14 deletions(-) > > > > diff --git a/tap.c b/tap.c > > index 55357e3..93a8c12 100644 > > --- a/tap.c > > +++ b/tap.c > > @@ -19,6 +19,7 @@ > > #include > > #include > > #include > > +#include > > #include > > #include > > #include > > @@ -120,7 +121,7 @@ static PACKET_POOL_NOINIT(pool_tap6, TAP_MSGS_IP6, = pkt_buf); > > #define VHOST_NDESCS (PKT_BUF_BYTES / 65520) > > static_assert(!(VHOST_NDESCS & (VHOST_NDESCS - 1)), > > "Number of vhost descs must be a power of two by= standard"); > > -static struct { > > +static struct vhost_virtqueue { > > /* Descriptor index we're using. This is not the same as avail id= x in > > * split: this takes into account the chained descs */ > > uint16_t vring_idx; > > @@ -472,26 +473,63 @@ static void vhost_kick(struct vring_used *used, i= nt kick_fd) { > > eventfd_write(kick_fd, 1); > > } > > > > +/* > > + * #syscalls:pasta read poll > > + */ > > +static uint16_t used_idx(struct vhost_virtqueue *vq, > > + struct vring_avail *avail, > > + const struct vring_used *used, int pollfd) > > +{ > > + struct pollfd fds =3D { .fd =3D pollfd, .events =3D POLLIN }; > > + int r; > > + > > + if (vq->shadow_used_idx =3D=3D vq->last_used_idx) > > + vq->shadow_used_idx =3D le16toh(used->idx); > > + > > + if (vq->shadow_used_idx !=3D vq->last_used_idx || pollfd < 0) > > + return vq->shadow_used_idx; > > + > > + avail->flags &=3D ~htole16(1ULL << VRING_AVAIL_F_NO_INTERRUPT); > > + /* trusting syscall for smp_wb() */ > > + r =3D read(pollfd, (uint64_t[]){0}, sizeof(uint64_t)); > > + assert((r < 0 && errno =3D=3D EAGAIN) || r =3D=3D 8); > > + > > + /* Another oportunity before syscalls */ > > + vq->shadow_used_idx =3D le16toh(used->idx); > > + if (vq->shadow_used_idx !=3D vq->last_used_idx) { > > + return vqs->shadow_used_idx; > > + } > > + > > + r =3D poll(&fds, 1, -1); > > + assert (0 < r); > > + avail->flags |=3D htole16(1ULL << VRING_AVAIL_F_NO_INTERRUPT); > > + vq->shadow_used_idx =3D le16toh(used->idx); > > + return vq->shadow_used_idx; > > I don't understand what this is accomplishing. It seems like it's > blocking waiting for a buffer to be freed, which seems like exactly > what we don't want to do. > > > +} > > + > > /* n =3D target */ > > -void tap_free_old_xmit(size_t n) > > +size_t tap_free_old_xmit(const struct ctx *c, size_t n) > > { > > size_t r =3D 0; > > + int pollfd =3D (n =3D=3D (size_t)-1) ? -1 : c->vq[1].call_fd; > > > > while (r < n) { > > - uint16_t used_idx =3D vqs[1].last_used_idx; > > - if (vqs[1].shadow_used_idx =3D=3D used_idx) { > > - vqs[1].shadow_used_idx =3D le16toh(*(volatile uint= 16_t*)&vring_used_1.used.idx); > > - > > - if (vqs[1].shadow_used_idx =3D=3D used_idx) > > - continue; > > + uint16_t last_used =3D vqs[1].last_used_idx; > > + if (used_idx(&vqs[1], &vring_avail_1.avail, &vring_used_1= .used, pollfd) =3D=3D last_used) { > > + assert(pollfd =3D=3D -1); > > + return r; > > } > > > > + /* Only get used array entries after they have been expos= ed by vhost. */ > > + smp_rmb(); > > /* assert in-order */ > > - assert(vring_used_1.used.ring[used_idx % VHOST_NDESCS].id= =3D=3D vring_avail_1.avail.ring[used_idx % VHOST_NDESCS]); > > - vqs[1].num_free +=3D vqs[1].ndescs[used_idx % VHOST_NDESC= S]; > > + assert(vring_used_1.used.ring[last_used % VHOST_NDESCS].i= d =3D=3D vring_avail_1.avail.ring[last_used % VHOST_NDESCS]); > > + vqs[1].num_free +=3D vqs[1].ndescs[last_used % VHOST_NDES= CS]; > > vqs[1].last_used_idx++; > > r++; > > } > > + > > + return r; > > } > > > > /** > > @@ -1687,6 +1725,7 @@ static int tap_ns_tun(void *arg) > > if (rc < 0) > > die_perror("Failed to add call eventfd to= epoll"); > > } > > + fprintf(stderr, "[eperezma %s:%d][i=3D%d][call_fd=3D%d]\n= ", __func__, __LINE__, i, file.fd); > > c->vq[i].call_fd =3D file.fd; > > > > file.fd =3D eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC); > > diff --git a/tap.h b/tap.h > > index 7ca0fb0..7004116 100644 > > --- a/tap.h > > +++ b/tap.h > > @@ -112,7 +112,7 @@ void tap_icmp6_send(const struct ctx *c, > > const struct in6_addr *src, const struct in6_addr *ds= t, > > const void *in, size_t l4len); > > void tap_send_single(const struct ctx *c, const void *data, size_t l2l= en, bool vhost); > > -void tap_free_old_xmit(size_t n); > > +size_t tap_free_old_xmit(const struct ctx *c, size_t n); > > size_t tap_send_frames(const struct ctx *c, const struct iovec *iov, > > size_t bufs_per_frame, size_t nframes, bool vhost)= ; > > void eth_update_mac(struct ethhdr *eh, > > diff --git a/tcp_buf.c b/tcp_buf.c > > index 0437120..f74d22d 100644 > > --- a/tcp_buf.c > > +++ b/tcp_buf.c > > @@ -137,10 +137,10 @@ static void tcp_revert_seq(const struct ctx *c, s= truct tcp_tap_conn **conns, > > } > > } > > > > -static void tcp_buf_free_old_tap_xmit(void) > > +static void tcp_buf_free_old_tap_xmit(const struct ctx *c) > > { > > while (tcp_payload_tap_used) { > > - tap_free_old_xmit(tcp_payload_tap_used); > > + tap_free_old_xmit(c, tcp_payload_tap_used); > > > > tcp_payload_tap_used =3D 0; > > tcp_payload_sock_used =3D 0; > > @@ -162,7 +162,7 @@ void tcp_payload_flush(const struct ctx *c) > > tcp_payload_sock_used - m); > > } > > tcp_payload_tap_used +=3D m; > > - tcp_buf_free_old_tap_xmit(); > > + tcp_buf_free_old_tap_xmit(c); > > } > > > > /** > > -- > David Gibson (he or they) | I'll have my music baroque, and my code > david AT gibson.dropbear.id.au | minimalist, thank you, not the other wa= y > | around. > http://www.ozlabs.org/~dgibson