From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: passt.top; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=WELMaush; dkim-atps=neutral Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by passt.top (Postfix) with ESMTPS id 48C735A061B for ; Fri, 20 Dec 2024 10:51:43 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1734688302; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=MjLMCUNoIJfJwNC1is2D2qZcxMrR/iM5i8waGm5HqAE=; b=WELMaushm4pLNNkUYtJTf94dylVhBVoj9k6go2lgwIL0KOt6g3O/2YlDhmbA+pNuLgDZvw EbHzDajzu6A0y0+wkTXQKjDyyxmcEcCZ9RBjYGA+ZczCaqmYf/2EGph3zgjZE4XorgcaJs nje9SFgYojf8c/9idoEO0zsP918+UoQ= Received: from mail-wm1-f69.google.com (mail-wm1-f69.google.com [209.85.128.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-636-0nsYvep6MEyTlvwkQkUGeA-1; Fri, 20 Dec 2024 04:51:40 -0500 X-MC-Unique: 0nsYvep6MEyTlvwkQkUGeA-1 X-Mimecast-MFC-AGG-ID: 0nsYvep6MEyTlvwkQkUGeA Received: by mail-wm1-f69.google.com with SMTP id 5b1f17b1804b1-4359eb032c9so14948455e9.2 for ; Fri, 20 Dec 2024 01:51:40 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1734688299; x=1735293099; h=content-transfer-encoding:mime-version:organization:references :in-reply-to:message-id:subject:cc:to:from:date:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=MjLMCUNoIJfJwNC1is2D2qZcxMrR/iM5i8waGm5HqAE=; b=AviTQlf0xU5og7TXL7T7XXMdnrkT5QMA6//88R5xlgYXJrMptiM9f5hP5luqx/SzmU Kq4KiCsvMBWau/u/7S/DYVaur1pRSJuPoanNtBBdCN8rjhL1oRRDbggTXS0Y6lYnbew7 PI/8n70vJ1YwA7Qq8Tq4b7NIDtdB9bFDWmqViQPELnVudfpztOyFnChHZcy+S1sgQExC Z8z0HgoHQmFfOX0uGzKIDozQO/QbHbAdqbVpTz4jiMeQji7p4KksWNkonc38bpCf79ws LXdL2KMDgYmovqmvJXwmx75DKagK0Ddch5TppnL/p020URTiPdoayGGwrLC0K3SGPeml WRrg== X-Gm-Message-State: AOJu0Yzk+7GS+d+ulGE6I2qTh+cuf4B3TBF5GZtVIwCrVsnFhF1HAuuy TXXEXEqtVy027sCu8Fbq7V78eWBWr/hjEtGZwWaPCtVzKx58700xWJzeoKiWOzQoEIBGo/nK8C/ WwKyEu7OA4y4TXbYqL+ubrpLOFnNwXKGuEi6/IlC/8remzAgFnA== X-Gm-Gg: ASbGncvZL6PkO4D7kSndqwLOYUAsT8nzRiMxKUjA2k8MlEmwzvb9aJgvHrZIMa/MBLD NYQPrQAzUo4a6S6aADdu1Fl+GigGQi2Lfs/bmlF0OyOYMKETO/Erf8jq0IfEleJ4GssldiyX7jX pzqYQ/Rxk3y8mHR7J7Bt06mDo8vu8zADSJreRoWLzXXC5KMZcUsZ4AYzayDNM7SbpGPdT0cQpnI j3WD52RyhpVoYf1ePB2xRiWCArkR54RCOWpc6MOspWDyCCnoxZkZdtSNfdN1tg6wUBsQFgNTts5 SvCENTsj1g== X-Received: by 2002:a05:600c:314a:b0:434:a815:2b57 with SMTP id 5b1f17b1804b1-43668b480b9mr15735075e9.20.1734688299479; Fri, 20 Dec 2024 01:51:39 -0800 (PST) X-Google-Smtp-Source: AGHT+IH4C5PXn8EWVqfN+JfyffiX9YggyYngTPmpcvl11UehbJntZFoeZNpYFT25kAlDYCwSCWkcQw== X-Received: by 2002:a05:600c:314a:b0:434:a815:2b57 with SMTP id 5b1f17b1804b1-43668b480b9mr15734845e9.20.1734688299052; Fri, 20 Dec 2024 01:51:39 -0800 (PST) Received: from maya.myfinge.rs (ifcgrfdd.trafficplex.cloud. [176.103.220.4]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-436612008casm41164435e9.14.2024.12.20.01.51.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 20 Dec 2024 01:51:37 -0800 (PST) Date: Fri, 20 Dec 2024 10:51:36 +0100 From: Stefano Brivio To: David Gibson Subject: Re: [PATCH 3/3] tap: Don't size pool_tap[46] for the maximum number of packets Message-ID: <20241220105136.5ef4e018@elisabeth> In-Reply-To: References: <20241213120156.4123972-1-david@gibson.dropbear.id.au> <20241213120156.4123972-4-david@gibson.dropbear.id.au> <20241219100015.3e4b7599@elisabeth> Organization: Red Hat X-Mailer: Claws Mail 4.2.0 (GTK 3.24.41; x86_64-pc-linux-gnu) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: 0rh6hP9hU7YdKI6t1Lh8EDq54E8sFMMzUEBPKeICGOQ_1734688300 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID-Hash: SCFU2AGHFBUFDUZMPAHN2CEJ33WTGVO4 X-Message-ID-Hash: SCFU2AGHFBUFDUZMPAHN2CEJ33WTGVO4 X-MailFrom: sbrivio@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Fri, 20 Dec 2024 12:13:23 +1100 David Gibson wrote: > On Thu, Dec 19, 2024 at 10:00:15AM +0100, Stefano Brivio wrote: > > On Fri, 13 Dec 2024 23:01:56 +1100 > > David Gibson wrote: > > > > > Currently we attempt to size pool_tap[46] so they have room for the maximum > > > possible number of packets that could fit in pkt_buf, TAP_MSGS. However, > > > the calculation isn't quite correct: TAP_MSGS is based on ETH_ZLEN (60) as > > > the minimum possible L2 frame size. But, we don't enforce that L2 frames > > > are at least ETH_ZLEN when we receive them from the tap backend, and since > > > we're dealing with virtual interfaces we don't have the physical Ethernet > > > limitations requiring that length. Indeed it is possible to generate a > > > legitimate frame smaller than that (e.g. a zero-payload UDP/IPv4 frame on > > > the 'pasta' backend is only 42 bytes long). > > > > > > It's also unclear if this limit is sufficient for vhost-user which isn't > > > limited by the size of pkt_buf as the other modes are. > > > > > > We could attempt to correct the calculation, but that would leave us with > > > even larger arrays, which in practice rarely accumulate more than a handful > > > of packets. So, instead, put an arbitrary cap on the number of packets we > > > can put in a batch, and if we run out of space, process and flush the > > > batch. > > > > > > Signed-off-by: David Gibson > > > --- > > > packet.c | 13 ++++++++++++- > > > packet.h | 3 +++ > > > passt.h | 2 -- > > > tap.c | 18 +++++++++++++++--- > > > tap.h | 3 ++- > > > vu_common.c | 3 ++- > > > 6 files changed, 34 insertions(+), 8 deletions(-) > > > > > > diff --git a/packet.c b/packet.c > > > index 5bfa7304..b68580cc 100644 > > > --- a/packet.c > > > +++ b/packet.c > > > @@ -22,6 +22,17 @@ > > > #include "util.h" > > > #include "log.h" > > > > > > +/** > > > + * pool_full() - Is a packet pool full? > > > + * @p: Pointer to packet pool > > > + * > > > + * Return: true if the pool is full, false if more packets can be added > > > + */ > > > +bool pool_full(const struct pool *p) > > > +{ > > > + return p->count >= p->size; > > > +} > > > + > > > /** > > > * packet_add_do() - Add data as packet descriptor to given pool > > > * @p: Existing pool > > > @@ -35,7 +46,7 @@ void packet_add_do(struct pool *p, size_t len, const char *start, > > > { > > > size_t idx = p->count; > > > > > > - if (idx >= p->size) { > > > + if (pool_full(p)) { > > > trace("add packet index %zu to pool with size %zu, %s:%i", > > > idx, p->size, func, line); > > > return; > > > diff --git a/packet.h b/packet.h > > > index 98eb8812..3618f213 100644 > > > --- a/packet.h > > > +++ b/packet.h > > > @@ -6,6 +6,8 @@ > > > #ifndef PACKET_H > > > #define PACKET_H > > > > > > +#include > > > + > > > /** > > > * struct pool - Generic pool of packets stored in nmemory > > > * @size: Number of usable descriptors for the pool > > > @@ -23,6 +25,7 @@ void packet_add_do(struct pool *p, size_t len, const char *start, > > > void *packet_get_do(const struct pool *p, const size_t idx, > > > size_t offset, size_t len, size_t *left, > > > const char *func, int line); > > > +bool pool_full(const struct pool *p); > > > void pool_flush(struct pool *p); > > > > > > #define packet_add(p, len, start) \ > > > diff --git a/passt.h b/passt.h > > > index 0dd4efa0..81b2787f 100644 > > > --- a/passt.h > > > +++ b/passt.h > > > @@ -70,8 +70,6 @@ static_assert(sizeof(union epoll_ref) <= sizeof(union epoll_data), > > > > > > #define TAP_BUF_BYTES \ > > > ROUND_DOWN(((ETH_MAX_MTU + sizeof(uint32_t)) * 128), PAGE_SIZE) > > > -#define TAP_MSGS \ > > > - DIV_ROUND_UP(TAP_BUF_BYTES, ETH_ZLEN - 2 * ETH_ALEN + sizeof(uint32_t)) > > > > > > #define PKT_BUF_BYTES MAX(TAP_BUF_BYTES, 0) > > > extern char pkt_buf [PKT_BUF_BYTES]; > > > diff --git a/tap.c b/tap.c > > > index 68231f09..42370a26 100644 > > > --- a/tap.c > > > +++ b/tap.c > > > @@ -61,6 +61,8 @@ > > > #include "vhost_user.h" > > > #include "vu_common.h" > > > > > > +#define TAP_MSGS 256 > > > > Sorry, I stopped at 2/3, had just a quick look at this one, and I > > missed this. > > > > Assuming 4 KiB pages, this changes from 161319 to 256. You mention that > > Yes. I'm certainly open to arguments on what the number should be. No idea, until now I just thought we'd have a limit that we can't hit in practice. Let me have a look at what happens with 256 (your new series) and iperf3 or udp_stream from neper. > > in practice we never have more than a handful of messages, which is > > probably almost always the case, but I wonder if that's also the case > > with UDP "real-time" streams, where we could have bursts of a few > > hundred (thousand?) messages at a time. > > Maybe. If we are getting them in large bursts, then we're no longer > really suceeding at the streams being "real-time", but sure, we should > try to catch up as best we can. It could even be that flushing more frequently actually improves things. I'm not sure. I was just pointing out that, quite likely, we can actually hit the new limit. > > I wonder: how bad would it be to correct the calculation, instead? We > > wouldn't actually use more memory, right? > > I was pretty painful when I tried, and it would use more memory. The > safe option would be to use ETH_HLEN as the minimum size (which is > pretty much all we enforce in the tap layer), which would expand the > iovec array here by 2-3x. It's not enormous, but it's not nothing. > Or do you mean the unused pages of the array would never be > instantiated? In which case, yeah, I guess not. Yes, that's what I meant. > Remember that with the changes in this patch if we exceed TAP_MSGS, > nothing particularly bad happens: we don't crash, and we don't drop > packets; we just process things in batches of TAP_MSGS frames at a > time. So this doesn't need to be large enough to handle any burst we > could ever get, just large enough to adequately mitigate the per-batch > costs, which I don't think are _that_ large. 256 was a first guess at > that. Maybe it's not enough, but I'd be pretty surprised if it needed > to be greater than ~1000 to make the per-batch costs negligible > compared to the per-frame costs. UDP_MAX_FRAMES, which is on the > reverse path but serves a similar function, is only 32. Okay, let me give that a try. I guess you didn't see a change in UDP throughput tests... but there we use fairly large messages. -- Stefano