On Wed, Jul 30, 2025 at 08:11:20AM +0200, Eugenio Perez Martin wrote:
> On Wed, Jul 30, 2025 at 2:34 AM David Gibson
> wrote:
> >
> > On Tue, Jul 29, 2025 at 09:04:19AM +0200, Eugenio Perez Martin wrote:
> > > On Tue, Jul 29, 2025 at 2:33 AM David Gibson
> > > wrote:
> > > >
> > > > On Mon, Jul 28, 2025 at 07:03:12PM +0200, Eugenio Perez Martin wrote:
> > > > > On Thu, Jul 24, 2025 at 3:21 AM David Gibson
> > > > > wrote:
> > > > > >
> > > > > > On Wed, Jul 09, 2025 at 07:47:47PM +0200, Eugenio Pérez wrote:
> > > > > > > From ~13Gbit/s to ~11.5Gbit/s.
> > > > > >
> > > > > > Again, I really don't know what you're comparing to what here.
> > > > >
> > > > > When the buffer is full I'm using poll() to wait until vhost frees
> > > > > some buffers, instead of actively checking the used index. This is
> > > > > the cost of the syscall.
> > > > >
> > > > Ah, right. So.. I'm not sure if it's so much the cost of the syscall
> > > > itself, as the fact that you're actively waiting for free buffers,
> > > > rather than returning to the main epoll loop so you can maybe make
> > > > progress on something else before returning to the Tx path.
> > > >
> > > The previous patch also waits for free buffers, but it does so by
> > > burning a CPU.
> >
> > Ah, ok. Hrm. I still find it hard to believe that it's the cost of
> > the syscall per se that's causing the slowdown. My guess is that the
> > cost is because having the poll() leads to a higher latency between
> > the buffer being released and us detecting it and re-using it.
> >
> > > The next patch is the one that allows progress to continue as long as
> > > there are enough free buffers, instead of always waiting until all the
> > > buffers have been sent. But there are situations where this conversion
> > > needs other code changes. In particular, all the calls to
> > > tcp_payload_flush after checking that we have enough buffers, like:
> > >
> > > if (tcp_payload_sock_used > TCP_FRAMES_MEM - 2) {
> > >         tcp_buf_free_old_tap_xmit(c, 2);
> > >         tcp_payload_flush(c);
> > >         ...
> > > }
> > >
> > > Seems like coroutines would be a good fix here, but maybe there are
> > > simpler ways to go back to the main loop while keeping the tcp socket
> > > "ready to read" from epoll's point of view. Out of curiosity, what do
> > > you think about setjmp()? :).
> >
> > I think it has its uses, but deciding to go with it is a big
> > architectural decision not to be entered into lightly.
> >
>
> Got it,
>
> Another idea is to add the flows that are being processed but had no
> space available in the virtqueue to a "pending" list. When the kernel
> tells pasta that new buffers are available, pasta checks that pending
> list. Maybe it can consist of only one element.

I think this makes sense. We already kind of want the same thing for
the (rare) cases where the tap buffer fills up (or the pipe buffer for
passt/qemu). This is part of what we'd need to make the event handling
simpler (if we have proper wakeups on tap side writability we can
always use EPOLLET on the socket side, instead of turning it on and
off).

I'm not actually sure if we need an explicit list. It might be
adequate to just have a pending flag (or even derive it from existing
state) and poll the entire flow list. Might be more expensive, but
could well be good enough (we already scan the entire flow list on
every epoll cycle).
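For illustration, a minimal sketch of what the pending-flag approach
could look like. All names below (struct pending_flow, flowtab,
flow_defer_tx(), flow_retry_pending(), flow_deferred_xmit()) are
hypothetical placeholders, not necessarily the real passt/pasta
identifiers, and this is only a sketch of the idea, not a drop-in
patch:

/* Hypothetical sketch of the "pending flag + rescan" idea; placeholder
 * names, not necessarily the real passt/pasta identifiers. */
#include <stdbool.h>
#include <stddef.h>

#define FLOW_MAX	1024

struct pending_flow {
	bool in_use;		/* slot is allocated */
	bool tx_pending;	/* tx deferred because no buffers were free */
	/* ... per-flow state ... */
};

static struct pending_flow flowtab[FLOW_MAX];

/* Tx path, when the virtqueue (or tap buffer) has no space left:
 * instead of poll()ing for free buffers, mark the flow and return to
 * the main epoll loop so other work can make progress. */
static void flow_defer_tx(struct pending_flow *flow)
{
	flow->tx_pending = true;
}

/* Main loop, when the kernel signals that used buffers are available
 * (for example, the vhost call eventfd became readable): rescan the
 * whole flow table and retry anything that was deferred. */
static void flow_retry_pending(void)
{
	size_t i;

	for (i = 0; i < FLOW_MAX; i++) {
		struct pending_flow *flow = &flowtab[i];

		if (!flow->in_use || !flow->tx_pending)
			continue;

		flow->tx_pending = false;
		/* flow_deferred_xmit(flow) would re-run the tx path for
		 * this flow, calling flow_defer_tx() again if buffers
		 * run out before it finishes. */
	}
}

The O(n) rescan is the trade-off mentioned above: since the flow table
is already walked on every epoll cycle, the extra cost is likely in the
noise.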
-- 
David Gibson (he or they)        | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au   | minimalist, thank you, not the other way
                                 | around.
http://www.ozlabs.org/~dgibson