public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: passt-dev@passt.top
Subject: Re: [PATCH v3 13/13] flow: Avoid moving flow entries to compact table
Date: Fri, 5 Jan 2024 11:27:54 +0100	[thread overview]
Message-ID: <20240105112754.75c765e3@elisabeth> (raw)
In-Reply-To: <ZZfOZgDeVlzQU6PI@zatzit>

On Fri, 5 Jan 2024 20:39:50 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Fri, Jan 05, 2024 at 09:33:35AM +0100, Stefano Brivio wrote:
> > On Thu, 4 Jan 2024 21:02:19 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> > > On Tue, Jan 02, 2024 at 07:13:41PM +0100, Stefano Brivio wrote:  
> > > > On Mon, 1 Jan 2024 23:01:17 +1100
> > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > >     
> > > > > On Sat, Dec 30, 2023 at 11:33:04AM +0100, Stefano Brivio wrote:    
> > > > > > On Thu, 28 Dec 2023 19:25:25 +0100
> > > > > > Stefano Brivio <sbrivio@redhat.com> wrote:
> > > > > >       
> > > > > > > > On Thu, 21 Dec 2023 17:15:49 +1100
> > > > > > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > > > > > 
> > > > > > > > [...]      
> > > > > > >
> > > > > > > [...]
> > > > > > >
> > > > > > > I wonder if we really have to keep track of the number of (non-)entries
> > > > > > > in the free "block", and if we have to be explicit about the two cases.
> > > > > > > 
> > > > > > > I'm trying to find out if we can simplify the whole thing with slightly
> > > > > > > different variants, for example:      
> > > > > > 
> > > > > > So... I think the version with (explicit) blocks has this fundamental
> > > > > > advantage, on deletion:
> > > > > >       
> > > > > > > > +	flow->f.type = FLOW_TYPE_NONE;
> > > > > > > > +	/* Put it back in a length 1 free block, don't attempt to fully reverse
> > > > > > > > +	 * flow_alloc()s steps.  This will get folded together the next time
> > > > > > > > +	 * flow_defer_handler runs anyway() */
> > > > > > > > +	flow->free.n = 1;
> > > > > > > > +	flow->free.next = flow_first_free;
> > > > > > > > +	flow_first_free = FLOW_IDX(flow);      
> > > > > > 
> > > > > > which is doable even without explicit blocks, but much harder to
> > > > > > follow.      
> > > > > 
> > > > > Remember this is not a general deletion, only a "cancel" of the most
> > > > > recent allocation.    
> > > > 
> > > > Oh, I thought that was only the case for this series and you would use
> > > > that as actual deletion in another pending series (which I haven't
> > > > finished reviewing yet).    
> > > 
> > > No.  Not allowing deletion of any entry at any time is what I'm
> > > trading off to get both O(1) allocation and (effectively) O(1)
> > > deletion.
> > >   
> > > > But now I'm not sure anymore why I was thinking this...
> > > > 
> > > > Anyway... do we really need it, then? Can't we just mark the "failed"
> > > > flows as whatever means "closed" for a specific protocol, and clean
> > > > them up later, instead of calling cancel() right away?    
> > > 
> > > We could, but I'm not sure we want to.  For starters, that requires
> > > protocol-specific behaviour whenever we need to back out an allocation
> > > like this.  Not a big deal, since that's in protocol specific code
> > > already, but I think it's uglier than calling cancel.
> > > 
> > > It also requires that the protocol specific deferred cleanup functions
> > > (e.g. tcp_flow_defer()) handle partially initialised entries.  With
> > > 'cancel' we can back out just the initialisation steps we've already
> > > done (because we know where we've failed during init), then remove the
> > > entry.  The deferred cleanup function only needs to deal with
> > > "complete" entries.  Again, certainly possible, but IMO uglier than
> > > having 'cancel'.  
> > 
> > Okay, yes, I see now.
> > 
> > Another doubt that comes to me now is: if you don't plan to use this
> > alloc_cancel() thing anywhere else, the only reason why you are adding
> > it is to replace the (flow_count >= FLOW_MAX) check with a flow_alloc()
> > version that can fail.
> > 
> > But at this point, speaking of ugliness, couldn't we just have a
> > bool flow_may_alloc() { return flow_first_free < FLOW_MAX }; the caller
> > can use to decide to abort earlier? To me it looks so much simpler and
> > more robust.  
> 
> Well, we could, but there are a couple of reasons I don't love it.
> The first is abstraction: this returns explicit handling of the layout
> of the table to the protocol specific callers.  It's not a huge deal
> right now, but once we have 4 or 5 protocols doing this, having to
> change all of them if we make any tiny change to the semantics of
> flow_first_free isn't great.

Hmm, I don't get the difference in terms of abstraction between
checking the return value of flow_alloc() and checking the return value
of flow_may_alloc().

In both cases the protocol handlers know that there's a table, a
function to reserve entries, and that reserving entries might fail...
and not much else.

> The other issue is that to do this (without a bunch of fairly large
> and ugly temporaries) means we'd populate at least some of the fields
> in flow_common before we have officially "allocated" the entry.  At
> that point it becomes a bit fuzzy as to when that allocation really
> occurs.  Is it when we do the FLOW_MAX tesT?

I would say yes -- after that we can't fail.

I mean, we work with rather constrained structures for a number of
reasons, which comes with a number of hard problems... let's at least
reap the benefits of it?

> Is it when we write to
> f.type?  Is it when we update flow_first_free?  If we fail somewhere
> in the middle of that, what steps do we need to reverse?

We can't fail in the middle of it, at the moment. Of course, if the
"allocation" function changes, we might need to change the scheme. But
is it really likely? And anyway it's just a few lines in your current
version...

> For those reasons I prefer the scheme presented.  Fwiw, in an earlier
> draft I did this differently with a "flow_prealloc()", which was
> essentially the check against FLOW_MAX, then a later
> flow_alloc_commit().  I thought it turned out pretty confusing
> compared to the alloc/cancel approach.

The current flow_alloc_cancel() implementation is definitely simple and
semantically clear.

What worries me a bit is that you would have two different cases for
free "blocks" of size one, depending on the order of the events. So if
we want to debug something like that and we see a block with size one
it might be a failed bind(), so a fake one, or also not: it might be an
actual block with size one.

Thinking of multithreading: defining flow_may_alloc() becomes more
complicated because the caller can't just assume the "allocation" will
succeed (as long as we don't cross an "epoll cycle" or something like
that). But we'll probably need some form of locking or userspace RCU
giving us barriers around the pair may_alloc() / alloc().

If we stick to the failing alloc(), this part is simpler, but the
interpretation of flow_first_free and block sizes becomes non-trivial.

Well, on the other hand, it's all simple enough that we can change it
as needed (for example for multithreading). If we can hope that the new
scheme is reasonably low on bugs and we'll probably never have to guess
why a block has size one, I'm fine with the failing alloc() as well.

-- 
Stefano


  reply	other threads:[~2024-01-05 10:28 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-21  6:15 [PATCH v3 00/13] Manage more flow related things from generic flow code David Gibson
2023-12-21  6:15 ` [PATCH v3 01/13] flow: Make flow_table.h #include the protocol specific headers it needs David Gibson
2023-12-21  6:15 ` [PATCH v3 02/13] treewide: Standardise on 'now' for current timestamp variables David Gibson
2023-12-21  6:15 ` [PATCH v3 03/13] tcp, tcp_splice: Remove redundant handling from tcp_timer() David Gibson
2023-12-21  6:15 ` [PATCH v3 04/13] tcp, tcp_splice: Move per-type cleanup logic into per-type helpers David Gibson
2023-12-21  6:15 ` [PATCH v3 05/13] flow, tcp: Add flow-centric dispatch for deferred flow handling David Gibson
2023-12-28 18:24   ` Stefano Brivio
2023-12-31  5:56     ` David Gibson
2024-01-02 18:13       ` Stefano Brivio
2024-01-03  3:45         ` David Gibson
2023-12-21  6:15 ` [PATCH v3 06/13] flow, tcp: Add handling for per-flow timers David Gibson
2023-12-21  6:15 ` [PATCH v3 07/13] epoll: Better handling of number of epoll types David Gibson
2023-12-21  6:15 ` [PATCH v3 08/13] tcp, tcp_splice: Avoid double layered dispatch for connected TCP sockets David Gibson
2023-12-21  6:15 ` [PATCH v3 09/13] flow: Move flow_log_() to near top of flow.c David Gibson
2023-12-21  6:15 ` [PATCH v3 10/13] flow: Move flow_count from context structure to a global David Gibson
2023-12-28 18:25   ` Stefano Brivio
2023-12-31  5:58     ` David Gibson
2024-01-02 18:13       ` Stefano Brivio
2024-01-03  3:54         ` David Gibson
2024-01-03  7:08           ` Stefano Brivio
2024-01-04  9:51             ` David Gibson
2024-01-05  7:55               ` Stefano Brivio
2024-01-07  5:23                 ` David Gibson
2023-12-21  6:15 ` [PATCH v3 11/13] flow: Abstract allocation of new flows with helper function David Gibson
2023-12-21  6:15 ` [PATCH v3 12/13] flow: Enforce that freeing of closed flows must happen in deferred handlers David Gibson
2023-12-21  6:15 ` [PATCH v3 13/13] flow: Avoid moving flow entries to compact table David Gibson
2023-12-28 18:25   ` Stefano Brivio
2023-12-30 10:33     ` Stefano Brivio
2024-01-01 12:01       ` David Gibson
2024-01-02 18:13         ` Stefano Brivio
2024-01-04 10:02           ` David Gibson
2024-01-05  8:33             ` Stefano Brivio
2024-01-05  9:39               ` David Gibson
2024-01-05 10:27                 ` Stefano Brivio [this message]
2024-01-06 11:32                   ` David Gibson
2024-01-06 13:02                     ` Stefano Brivio
2024-01-07  5:20                       ` David Gibson
2024-01-01 10:44     ` David Gibson
2024-01-02 18:13       ` Stefano Brivio
2024-01-05  9:45         ` David Gibson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240105112754.75c765e3@elisabeth \
    --to=sbrivio@redhat.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=passt-dev@passt.top \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).