From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: passt-dev@passt.top
Subject: Re: [PATCH v3 13/13] flow: Avoid moving flow entries to compact table
Date: Thu, 4 Jan 2024 21:02:19 +1100 [thread overview]
Message-ID: <ZZaCK8U3mYUTForT@zatzit> (raw)
In-Reply-To: <20240102191341.7c91dd44@elisabeth>
On Tue, Jan 02, 2024 at 07:13:41PM +0100, Stefano Brivio wrote:
> On Mon, 1 Jan 2024 23:01:17 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
>
> > On Sat, Dec 30, 2023 at 11:33:04AM +0100, Stefano Brivio wrote:
> > > On Thu, 28 Dec 2023 19:25:25 +0100
> > > Stefano Brivio <sbrivio@redhat.com> wrote:
> > >
> > > > > On Thu, 21 Dec 2023 17:15:49 +1100
> > > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > >
> > > > > [...]
> > > >
> > > > [...]
> > > >
> > > > I wonder if we really have to keep track of the number of (non-)entries
> > > > in the free "block", and if we have to be explicit about the two cases.
> > > >
> > > > I'm trying to find out if we can simplify the whole thing with slightly
> > > > different variants, for example:
> > >
> > > So... I think the version with (explicit) blocks has this fundamental
> > > advantage, on deletion:
> > >
> > > > > + flow->f.type = FLOW_TYPE_NONE;
> > > > > + /* Put it back in a length 1 free block; don't attempt to fully reverse
> > > > > +  * flow_alloc()'s steps.  This will get folded together the next time
> > > > > +  * flow_defer_handler() runs anyway. */
> > > > > + flow->free.n = 1;
> > > > > + flow->free.next = flow_first_free;
> > > > > + flow_first_free = FLOW_IDX(flow);
> > >
> > > which is doable even without explicit blocks, but much harder to
> > > follow.
> >
> > Remember this is not a general deletion, only a "cancel" of the most
> > recent allocation.
>
> Oh, I thought that was only the case for this series and you would use
> that as actual deletion in another pending series (which I haven't
> finished reviewing yet).

No.  What I'm trading off, to get both O(1) allocation and
(effectively) O(1) deletion, is the ability to delete an arbitrary
entry at an arbitrary time.

> But now I'm not sure anymore why I was thinking this...
>
> Anyway... do we really need it, then? Can't we just mark the "failed"
> flows as whatever means "closed" for a specific protocol, and clean
> them up later, instead of calling cancel() right away?

We could, but I'm not sure we want to.  For starters, that requires
protocol-specific behaviour whenever we need to back out an allocation
like this.  Not a big deal, since that's in protocol-specific code
already, but I think it's uglier than calling cancel.

It also requires that the protocol-specific deferred cleanup functions
(e.g. tcp_flow_defer()) handle partially initialised entries.  With
'cancel' we can back out just the initialisation steps we've already
done (because we know where we've failed during init), then remove the
entry.  The deferred cleanup function only needs to deal with
"complete" entries.  Again, certainly possible, but IMO uglier than
having 'cancel'.

Finally, leaving the "cancelled" entry there until the next deferred
pass means that if we allocate other flows before that pass, they'll
avoid this slot.  That works against trying to keep the entries
reasonably compact.

> > To reduce fragmentation we are keeping the linked
> > list of free clusters in strictly ascending order. The logic above is
> > only correct if the entry we're freeing is before any other free entry
> > in the table. That will be the case for the most recent allocation,
> > because we always allocate the first free entry in the table.
>
> I see now. This isn't entirely clear from the "Theory of Operation",
> on the other hand it's probably a bad idea to overload that description
> with all these details.

I'll see what I can do to make that clearer in the description.

> > > Other than the comments I have, I would go ahead with your approach. My
> > > attempt below in case you're interested. The comment at the end of
> > > flow_del() shows the issue I'm facing:
> > >
> > > struct flow_free_block {
> > > 	/* Must be first element */
> > > 	struct flow_common f;
> > >
> > > 	unsigned next_free;
> > > 	unsigned next_used;
> > > };
> > >
> > > static union flow *flow_alloc(void)
> > > {
> > > 	union flow *flow;
> > >
> > > 	if (flow_first_free >= FLOW_MAX)
> > > 		return NULL;
> > >
> > > 	flow = flowtab + flow_first_free;
> > > 	flow_first_free = flow->free.next_free;
> > >
> > > 	memset(flow, 0, sizeof(*flow));	/* TODO: select region */
> > > 	return flow;
> > > }
> > >
> > > static void flow_del(union flow *del)
> > > {
> > > 	union flow *left, *right;
> > >
> > > 	del->f.type = FLOW_TYPE_NONE;
> > >
> > > 	left = (FLOW_IDX(del) > 0) ? FLOW(FLOW_IDX(del) - 1) : NULL;
> > > 	right = (FLOW_IDX(del) < FLOW_MAX - 1) ? FLOW(FLOW_IDX(del) + 1) : NULL;
> > >
> > > 	if (right) {
> > > 		if (right->f.type == FLOW_TYPE_NONE)
> > > 			del->free.next_used = right->free.next_used;
> > > 		else
> > > 			del->free.next_used = FLOW_IDX(right);
> > > 	} else {
> > > 		del->free.next_used = FLOW_MAX;
> > > 	}
> > >
> > > 	if (left && left->f.type == FLOW_TYPE_NONE) {
> > > 		left->free.next_free = FLOW_IDX(del);
> > > 		left->free.next_used = del->free.next_used;
> > > 		return;
> > > 	}
> > >
> > > 	if (flow_first_free == FLOW_MAX) {
> > > 		flow_first_free = FLOW_IDX(del);
> > > 	} else if (flow_first_free > FLOW_IDX(del)) {
> > > 		del->free.next_free = flow_first_free;
> > > 		flow_first_free = FLOW_IDX(del);
> > > 	} else if (flow_first_free < FLOW_IDX(del)) {
> > > 		/* Walk free slots from flow_first_free until FLOW_IDX(del) to
> > > 		 * find insertion point... but that makes deletion at most O(n),
> > > 		 * perhaps O(log(n)), certainly not O(1).
> >
> > Exactly. I can't see any way (short of far more complex data
> > structures) around having a linear scan somewhere, although you can
> > kind of choose whether it happens on alloc or free.
> >
> > The idea of my implementation is to have it at free time - but to
> > merge it with an existing linear scan so it won't have an additional
> > cost.
>
> Right, yes, I see now. This idea is also not terribly clear from the
> description, by the way, but I don't have a good proposal right now.

I'll see what I can come up with.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson