From: Eugenio Perez Martin
Date: Thu, 31 Jul 2025 10:11:14 +0200
Subject: Re: [RFC v2 10/11] tap: add poll(2) to used_idx
To: David Gibson
CC: passt-dev@passt.top, jasowang@redhat.com
References: <20250709174748.3514693-1-eperezma@redhat.com> <20250709174748.3514693-11-eperezma@redhat.com>
List-Id: Development discussion and patches for passt

On Thu, Jul 31, 2025 at 7:59 AM David Gibson wrote:
>
> On Wed, Jul 30, 2025 at 08:11:20AM +0200, Eugenio Perez Martin wrote:
> > On Wed, Jul 30, 2025 at 2:34 AM David Gibson wrote:
> > >
> > > On Tue, Jul 29, 2025 at 09:04:19AM +0200, Eugenio Perez Martin wrote:
> > > > On Tue, Jul 29, 2025 at 2:33 AM David Gibson wrote:
> > > > >
> > > > > On Mon, Jul 28, 2025 at 07:03:12PM +0200, Eugenio Perez Martin wrote:
> > > > > > On Thu, Jul 24, 2025 at 3:21 AM David Gibson wrote:
> > > > > > >
> > > > > > > On Wed, Jul 09, 2025 at 07:47:47PM +0200, Eugenio Pérez wrote:
> > > > > > > > From ~13Gbit/s to ~11.5Gbit/s.
> > > > > > >
> > > > > > > Again, I really don't know what you're comparing to what here.
> > > > > > >
> > > > > >
> > > > > > When the buffer is full I'm using poll() to wait until vhost frees some
> > > > > > buffers, instead of actively checking the used index. This is the cost
> > > > > > of the syscall.
> > > > >
> > > > > Ah, right.  So.. I'm not sure if it's so much the cost of the syscall
> > > > > itself, as the fact that you're actively waiting for free buffers,
> > > > > rather than returning to the main epoll loop so you can maybe make
> > > > > progress on something else before returning to the Tx path.
> > > > >
> > > >
> > > > The previous patch also waits for free buffers, but it does so by
> > > > burning a CPU.
> > >
> > > Ah, ok.  Hrm.  I still find it hard to believe that it's the cost of
> > > the syscall per se that's causing the slowdown.
> > > My guess is that the cost is because having the poll() leads to a
> > > higher latency between the buffer being released and us detecting it
> > > and re-using it.
> > >
> > > > The next patch is the one that allows progress to continue as long as
> > > > there are enough free buffers, instead of always waiting until the
> > > > whole buffer has been sent. But there are situations where this
> > > > conversion needs other code changes. In particular, all the calls to
> > > > tcp_payload_flush after checking that we have enough buffers like:
> > > >
> > > > if (tcp_payload_sock_used > TCP_FRAMES_MEM - 2) {
> > > >         tcp_buf_free_old_tap_xmit(c, 2);
> > > >         tcp_payload_flush(c);
> > > >         ...
> > > > }
> > > >
> > > > Seems like coroutines would be a good fix here, but maybe there are
> > > > simpler ways to go back to the main loop while keeping the tcp socket
> > > > "ready to read" from epoll's POV. Out of curiosity, what do you think
> > > > about setjmp()? :).
> > >
> > > I think it has its uses, but deciding to go with it is a big
> > > architectural decision not to be entered into lightly.
> > >
> >
> > Got it,
> >
> > Another idea is to add the flows that are being processed but had no
> > space available in the virtqueue to a "pending" list. When the
> > kernel tells pasta that new buffers are available, pasta checks that
> > pending list. Maybe it can consist of only one element.
>
> I think this makes sense.  We already kind of want the same thing for
> the (rare) cases where the tap buffer fills up (or the pipe buffer for
> passt/qemu).  This is part of what we'd need to make the event
> handling simpler (if we have proper wakeups on tap side writability we
> can always use EPOLLET on the socket side, instead of turning it on
> and off).
>

That can be one way, yes.

> I'm not actually sure if we need an explicit list.  It might be
> adequate to just have a pending flag (or even derive it from existing
> state) and poll the entire flow list.
> Might be more expensive, but
> could well be good enough (we already scan the entire flow list on
> every epoll cycle).
>

I'm ok with a new flag, but the memory increase will be bigger than a
single "pending" entry. If scanning the whole list for a new state is
also a possibility, sure, I'm ok with that too.
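
To make the flag variant concrete, here is a minimal standalone sketch of
what "a pending flag plus a scan of the whole flow list" could look like.
It is not passt code: struct flow_sketch, FLOW_MAX, flow_mark_pending()
and flows_resume_pending() are hypothetical names made up for this
illustration, and the free-slot count stands in for whatever the used
index reports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FLOW_MAX 8	/* hypothetical table size, for illustration only */

/* Hypothetical stand-in for a flow table entry, not passt's real struct */
struct flow_sketch {
	bool in_use;
	bool tx_pending;	/* had frames to send but no free descriptors */
	unsigned frames_queued;	/* frames still waiting for a descriptor */
};

static struct flow_sketch flows[FLOW_MAX];

/* Tx path: the virtqueue had no room, remember that this flow must retry */
static void flow_mark_pending(struct flow_sketch *f, unsigned frames)
{
	assert(f->in_use);
	f->tx_pending = true;
	f->frames_queued = frames;
}

/* Wakeup path: 'free_slots' descriptors became available again.
 * Scan the whole table, hand slots to pending flows in table order, and
 * return how many flows got rid of their backlog completely. */
static unsigned flows_resume_pending(unsigned free_slots)
{
	unsigned resumed = 0;
	size_t i;

	for (i = 0; i < FLOW_MAX && free_slots; i++) {
		struct flow_sketch *f = &flows[i];
		unsigned n;

		if (!f->in_use || !f->tx_pending)
			continue;

		n = f->frames_queued < free_slots ? f->frames_queued
						  : free_slots;
		f->frames_queued -= n;
		free_slots -= n;

		if (!f->frames_queued) {
			f->tx_pending = false;
			resumed++;
		}
	}

	return resumed;
}
```

The flag costs one extra field in every flow entry, which is the memory
increase mentioned above, while a single "pending" entry would cost one
index in total but could only remember one stalled flow. Deriving the
pending state from existing fields would avoid the extra field at the
price of a more involved check inside the scan.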