From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Mon, 4 Aug 2025 14:09:39 +1000
From: David Gibson
To: Eugenio Perez Martin
Subject: Re: [RFC v2 10/11] tap: add poll(2) to used_idx
References: <20250709174748.3514693-1-eperezma@redhat.com>
 <20250709174748.3514693-11-eperezma@redhat.com>
CC: passt-dev@passt.top, jasowang@redhat.com
List-Id: Development discussion and patches for passt

On Thu, Jul 31, 2025 at 10:11:14AM +0200, Eugenio Perez Martin wrote:
> On Thu, Jul 31, 2025 at 7:59 AM David Gibson wrote:
> >
> > On Wed, Jul 30, 2025 at 08:11:20AM +0200, Eugenio Perez Martin wrote:
> > > On Wed, Jul 30, 2025 at 2:34 AM David Gibson wrote:
> > > >
> > > > On Tue, Jul 29, 2025 at 09:04:19AM +0200, Eugenio Perez Martin wrote:
> > > > > On Tue, Jul 29, 2025 at 2:33 AM David Gibson wrote:
> > > > > >
> > > > > > On Mon, Jul 28, 2025 at 07:03:12PM +0200, Eugenio Perez Martin wrote:
> > > > > > > On Thu, Jul 24, 2025 at 3:21 AM David Gibson wrote:
> > > > > > > >
> > > > > > > > On Wed, Jul 09, 2025 at 07:47:47PM +0200, Eugenio Pérez wrote:
> > > > > > > > > From ~13Gbit/s to ~11.5Gbit/s.
> > > > > > > >
> > > > > > > > Again, I really don't know what you're comparing to what here.
> > > > > > >
> > > > > > > When the buffer is full I'm using poll() to wait until vhost frees
> > > > > > > some buffers, instead of actively checking the used index. This is
> > > > > > > the cost of the syscall.
> > > > > >
> > > > > > Ah, right.  So.. I'm not sure if it's so much the cost of the syscall
> > > > > > itself, as the fact that you're actively waiting for free buffers,
> > > > > > rather than returning to the main epoll loop so you can maybe make
> > > > > > progress on something else before returning to the Tx path.
> > > > >
> > > > > The previous patch also waits for free buffers, but it does it burning
> > > > > a CPU for that.
> > > >
> > > > Ah, ok.  Hrm.  I still find it hard to believe that it's the cost of
> > > > the syscall per se that's causing the slowdown.  My guess is that the
> > > > cost is because having the poll() leads to a higher latency between
> > > > the buffer being released and us detecting it and re-using it.
> > > >
> > > > > The next patch is the one that allows progress to continue as long
> > > > > as there are enough free buffers, instead of always waiting until
> > > > > all the buffers have been sent. But there are situations where this
> > > > > conversion needs other code changes. In particular, all the calls to
> > > > > tcp_payload_flush after checking that we have enough buffers, like:
> > > > >
> > > > > if (tcp_payload_sock_used > TCP_FRAMES_MEM - 2) {
> > > > >         tcp_buf_free_old_tap_xmit(c, 2);
> > > > >         tcp_payload_flush(c);
> > > > >         ...
> > > > > }
> > > > >
> > > > > Seems like coroutines would be a good fix here, but maybe there are
> > > > > simpler ways to go back to the main loop while keeping the tcp
> > > > > socket "ready to read" from epoll's POV. Out of curiosity, what do
> > > > > you think about setjmp()? :)
> > > >
> > > > I think it has its uses, but deciding to go with it is a big
> > > > architectural decision not to be entered into lightly.
> > >
> > > Got it,
> > >
> > > Another idea is to add the flows that are being processed but had no
> > > space available in the virtqueue to a "pending" list. When the kernel
> > > tells pasta that new buffers are available, pasta checks that pending
> > > list. Maybe it can consist of only one element.
> >
> > I think this makes sense.  We already kind of want the same thing for
> > the (rare) cases where the tap buffer fills up (or the pipe buffer for
> > passt/qemu).  This is part of what we'd need to make the event
> > handling simpler (if we have proper wakeups on tap side writability we
> > can always use EPOLLET on the socket side, instead of turning it on
> > and off).
>
> That can be one way, yes.
>
> > I'm not actually sure if we need an explicit list.  It might be
> > adequate to just have a pending flag (or even derive it from existing
> > state) and poll the entire flow list.  Might be more expensive, but
> > could well be good enough (we already scan the entire flow list on
> > every epoll cycle).
>
> I'm ok with a new flag, but the memory increase will be bigger than a
> single "pending" entry. If scanning the whole list for a new state is
> also a possibility, sure, I'm ok with that too.

To be clear, I'm not insisting on a particular way of doing it.  I'm
just suggesting a few options; I'm not sure which will work out to be
best / simplest.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson