From: Eugenio Perez Martin <eperezma@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: passt-dev@passt.top, jasowang@redhat.com
Subject: Re: [RFC v2 10/11] tap: add poll(2) to used_idx
Date: Thu, 31 Jul 2025 10:11:14 +0200
Message-ID: <CAJaqyWeiEUJfjvmHu+mqVy4GUfuiJR2RNHv7OyXzJpwKibyDvQ@mail.gmail.com>
In-Reply-To: <aIsGRehsCgTgSMyB@zatzit>
On Thu, Jul 31, 2025 at 7:59 AM David Gibson
<david@gibson.dropbear.id.au> wrote:
>
> On Wed, Jul 30, 2025 at 08:11:20AM +0200, Eugenio Perez Martin wrote:
> > On Wed, Jul 30, 2025 at 2:34 AM David Gibson
> > <david@gibson.dropbear.id.au> wrote:
> > >
> > > On Tue, Jul 29, 2025 at 09:04:19AM +0200, Eugenio Perez Martin wrote:
> > > > On Tue, Jul 29, 2025 at 2:33 AM David Gibson
> > > > <david@gibson.dropbear.id.au> wrote:
> > > > >
> > > > > On Mon, Jul 28, 2025 at 07:03:12PM +0200, Eugenio Perez Martin wrote:
> > > > > > On Thu, Jul 24, 2025 at 3:21 AM David Gibson
> > > > > > <david@gibson.dropbear.id.au> wrote:
> > > > > > >
> > > > > > > On Wed, Jul 09, 2025 at 07:47:47PM +0200, Eugenio Pérez wrote:
> > > > > > > > From ~13Gbit/s to ~11.5Gbit/s.
> > > > > > >
> > > > > > > Again, I really don't know what you're comparing to what here.
> > > > > > >
> > > > > >
> > > > > > When the buffer is full I'm using poll() to wait until vhost frees
> > > > > > some buffers, instead of actively checking the used index. This is
> > > > > > the cost of the syscall.
> > > > >
> > > > > Ah, right. So.. I'm not sure if it's so much the cost of the syscall
> > > > > itself, as the fact that you're actively waiting for free buffers,
> > > > > rather than returning to the main epoll loop so you can maybe make
> > > > > progress on something else before returning to the Tx path.
> > > > >
> > > >
> > > > The previous patch also waits for free buffers, but it does so by
> > > > burning a CPU.
> > >
> > > Ah, ok. Hrm. I still find it hard to believe that it's the cost of
> > > the syscall per se that's causing the slowdown. My guess is that the
> > > cost is because having the poll() leads to a higher latency between
> > > the buffer being released and us detecting and re-using it.
> > >
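(For reference, the two waiting strategies under discussion look roughly
like this; a minimal sketch, assuming the call eventfd was registered
with VHOST_SET_VRING_CALL, and with illustrative vq field names rather
than the actual ones in the series:

    /* Busy-wait: spin until the kernel bumps the used index */
    while (le16toh(vq->used->idx) == vq->last_used_idx)
            ;       /* burns a CPU, but reacts with minimal latency */

    /* poll(2): sleep on the call eventfd until vhost signals progress */
    struct pollfd pfd = { .fd = vq->call_fd, .events = POLLIN };

    while (le16toh(vq->used->idx) == vq->last_used_idx) {
            uint64_t v;

            if (poll(&pfd, 1, -1) > 0)
                    (void)read(vq->call_fd, &v, sizeof(v)); /* drain */
    }

The syscall is cheap by itself; the sleep/wakeup path between the buffer
being released and us noticing is where the extra latency comes from,
which matches your guess.)
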
> > > > The next patch is the one that allows progress to continue as long as
> > > > there are enough free buffers, instead of always waiting until all the
> > > > buffers have been sent. But there are situations where this conversion
> > > > needs other code changes. In particular, all the calls to
> > > > tcp_payload_flush after checking that we have enough buffers, like:
> > > >
> > > > if (tcp_payload_sock_used > TCP_FRAMES_MEM - 2) {
> > > >         tcp_buf_free_old_tap_xmit(c, 2);
> > > >         tcp_payload_flush(c);
> > > >         ...
> > > > }
> > > >
> > > > Seems like coroutines would be a good fix here, but maybe there are
> > > > simpler ways to go back to the main loop while keeping the tcp socket
> > > > "ready to read" from epoll's POV. Out of curiosity, what do you think
> > > > about setjmp()? :)
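
(To make that concrete, a rough sketch of one of those call sites with an
early bail-out instead of blocking; tap_free_old_xmit_avail() is a
hypothetical helper, not something in the series:

    if (tcp_payload_sock_used > TCP_FRAMES_MEM - 2) {
            if (!tap_free_old_xmit_avail(c, 2)) {
                    /* Ring still full: flag this flow as tx-pending and
                     * return to the main epoll loop; the transmission is
                     * retried once vhost signals used buffers.
                     */
                    return 0;
            }
            tcp_buf_free_old_tap_xmit(c, 2);
            tcp_payload_flush(c);
    }

The code changes I mean are exactly that every caller up the stack must
cope with the early return.)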
> > >
> > > I think it has its uses, but deciding to go with it is a big
> > > architectural decision, not to be entered into lightly.
> > >
> >
> > Got it,
> >
> > Another idea is to add the flows that are being processed but had no
> > space available in the virtqueue to a "pending" list. When the kernel
> > tells pasta that new buffers are available, pasta checks that pending
> > list. Maybe it could consist of only one element.
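
To illustrate the single-entry variant, something like this (the hook
points and the tx_pending name are just assumptions for the sketch):

    static union flow *tx_pending;  /* flow blocked on a full virtqueue */

    /* tx path, when no descriptors are left for this flow: */
    tx_pending = flow;

    /* when the call eventfd reports that vhost used some buffers: */
    if (tx_pending) {
            union flow *retry = tx_pending;

            tx_pending = NULL;
            /* resume the deferred transmission for @retry */
    }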
>
> I think this makes sense. We already kind of want the same thing for
> the (rare) cases where the tap buffer fills up (or the pipe buffer for
> passt/qemu). This is part of what we'd need to make the event
> handling simpler (if we have proper wakeups on tap side writability we
> can always use EPOLLET on the socket side, instead of turning it on
> and off).
>
That can be one way, yes.
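
As a sketch of the simplification you describe (just illustrative,
assuming the socket is already in the epoll set):

    /* With reliable tap-side writability wakeups, the socket side can
     * stay edge-triggered for good instead of toggling EPOLLIN:
     */
    struct epoll_event ev = { .events = EPOLLIN | EPOLLRDHUP | EPOLLET };

    ev.data.fd = sock_fd;
    epoll_ctl(epoll_fd, EPOLL_CTL_MOD, sock_fd, &ev);
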
> I'm not actually sure if we need an explicit list. It might be
> adequate to just have a pending flag (or even derive it from existing
> state) and poll the entire flow list. Might be more expensive, but
> could well be good enough (we already scan the entire flow list on
> every epoll cycle).
>
I'm OK with a new flag, but the memory increase would be bigger than with
a single "pending" entry. If scanning the whole list for the new state is
also a possibility, sure, I'm OK with that too.
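
For completeness, the flag-plus-scan approach might look something like
the following, with tx_pending as the hypothetical new per-flow flag and
a plain scan over the flow table:

    union flow *flow;

    for (flow = flowtab; flow < flowtab + FLOW_MAX; flow++) {
            if (!flow->f.tx_pending)
                    continue;

            flow->f.tx_pending = false;
            /* retry the transmission that found the ring full earlier */
    }
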
Thread overview: 45+ messages
2025-07-09 17:47 [RFC v2 00/11] Add vhost-net kernel support Eugenio Pérez
2025-07-09 17:47 ` [RFC v2 01/11] tap: implement vhost_call_cb Eugenio Pérez
2025-07-23 6:56 ` David Gibson
2025-07-28 16:33 ` Eugenio Perez Martin
2025-07-29 0:11 ` David Gibson
2025-07-09 17:47 ` [RFC v2 02/11] tap: add die() on vhost error Eugenio Pérez
2025-07-23 6:58 ` David Gibson
2025-07-09 17:47 ` [RFC v2 03/11] tap: replace tx tap hdr with virtio_nethdr_mrg_rxbuf Eugenio Pérez
2025-07-24 0:17 ` David Gibson
2025-07-28 16:37 ` Eugenio Perez Martin
2025-07-09 17:47 ` [RFC v2 04/11] tcp: export memory regions to vhost Eugenio Pérez
2025-07-23 7:06 ` David Gibson
2025-07-28 16:41 ` Eugenio Perez Martin
2025-07-29 0:25 ` David Gibson
2025-07-09 17:47 ` [RFC v2 05/11] virtio: Fill .next in tx queue Eugenio Pérez
2025-07-23 7:07 ` David Gibson
2025-07-28 16:44 ` Eugenio Perez Martin
2025-07-09 17:47 ` [RFC v2 06/11] tap: move static iov_sock to tcp_buf_data_from_sock Eugenio Pérez
2025-07-23 7:09 ` David Gibson
2025-07-28 16:43 ` Eugenio Perez Martin
2025-07-29 0:28 ` David Gibson
2025-07-09 17:47 ` [RFC v2 07/11] tap: support tx through vhost Eugenio Pérez
2025-07-24 0:24 ` David Gibson
2025-07-24 14:30 ` Stefano Brivio
2025-07-25 0:23 ` David Gibson
2025-07-09 17:47 ` [RFC v2 08/11] tap: add tap_free_old_xmit Eugenio Pérez
2025-07-24 0:32 ` David Gibson
2025-07-28 16:45 ` Eugenio Perez Martin
2025-07-09 17:47 ` [RFC v2 09/11] tcp: start conversion to circular buffer Eugenio Pérez
2025-07-24 1:03 ` David Gibson
2025-07-28 16:55 ` Eugenio Perez Martin
2025-07-29 0:30 ` David Gibson
2025-07-09 17:47 ` [RFC v2 10/11] tap: add poll(2) to used_idx Eugenio Pérez
2025-07-24 1:20 ` David Gibson
2025-07-28 17:03 ` Eugenio Perez Martin
2025-07-29 0:32 ` David Gibson
2025-07-29 7:04 ` Eugenio Perez Martin
2025-07-30 0:32 ` David Gibson
2025-07-30 6:11 ` Eugenio Perez Martin
2025-07-31 5:59 ` David Gibson
2025-07-31 8:11 ` Eugenio Perez Martin [this message]
2025-07-09 17:47 ` [RFC v2 11/11] tcp_buf: adding TCP tx circular buffer Eugenio Pérez
2025-07-24 1:33 ` David Gibson
2025-07-28 17:04 ` Eugenio Perez Martin
2025-07-10 9:46 ` [RFC v2 00/11] Add vhost-net kernel support Eugenio Perez Martin