public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: passt-dev@passt.top
Subject: Re: [PATCH v3 16/16] tcp: Don't defer hash table removal
Date: Sat, 2 Dec 2023 05:34:58 +0100	[thread overview]
Message-ID: <20231202053458.2390d4fb@elisabeth> (raw)
In-Reply-To: <ZWkj1udgR2Le8YZ1@zatzit>

On Fri, 1 Dec 2023 11:07:50 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Thu, Nov 30, 2023 at 01:45:32PM +0100, Stefano Brivio wrote:
> > On Thu, 30 Nov 2023 13:02:22 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> > > When a TCP connection is closed, we mark it by setting events to CLOSED,
> > > then some time later we do final cleanups: closing sockets, removing from
> > > the hash table and so forth.
> > > 
> > > This does mean that when making a hash lookup we need to exclude any
> > > apparent matches that are CLOSED, since they represent a stale connection.
> > > This can happen in practice if one connection closes and a new one with the
> > > same endpoints is started shortly afterward.
> > > 
> > > Checking for CLOSED is quite specific to TCP however, and won't work when
> > > we extend the hash table to more general flows.  So, alter the code to
> > > immediately remove the connection from the hash table when CLOSED, although
> > > we still defer closing sockets and other cleanup.
> > > 
> > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > > ---
> > >  tcp.c | 10 +++++++---
> > >  1 file changed, 7 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/tcp.c b/tcp.c
> > > index 74d06bf..17c7cba 100644
> > > --- a/tcp.c
> > > +++ b/tcp.c
> > > @@ -781,6 +781,9 @@ static void conn_flag_do(const struct ctx *c, struct tcp_tap_conn *conn,
> > >  		tcp_timer_ctl(c, conn);
> > >  }
> > >  
> > > +static void tcp_hash_remove(const struct ctx *c,
> > > +			    const struct tcp_tap_conn *conn);
> > > +
> > >  /**
> > >   * conn_event_do() - Set and log connection events, update epoll state
> > >   * @c:		Execution context
> > > @@ -825,7 +828,9 @@ static void conn_event_do(const struct ctx *c, struct tcp_tap_conn *conn,
> > >  		flow_dbg(conn, "%s",
> > >  			 num == -1 	       ? "CLOSED" : tcp_event_str[num]);
> > >  
> > > -	if ((event == TAP_FIN_RCVD) && !(conn->events & SOCK_FIN_RCVD))
> > > +	if (event == CLOSED)
> > > +		tcp_hash_remove(c, conn);
> > > +	else if ((event == TAP_FIN_RCVD) && !(conn->events & SOCK_FIN_RCVD))
> > >  		conn_flag(c, conn, ACTIVE_CLOSE);
> > >  	else
> > >  		tcp_epoll_ctl(c, conn);
> > > @@ -1150,7 +1155,7 @@ static int tcp_hash_match(const struct tcp_tap_conn *conn,
> > >  			  const union inany_addr *faddr,
> > >  			  in_port_t eport, in_port_t fport)
> > >  {
> > > -	if (conn->events != CLOSED && inany_equals(&conn->faddr, faddr) &&
> > > +	if (inany_equals(&conn->faddr, faddr) &&
> > >  	    conn->eport == eport && conn->fport == fport)
> > >  		return 1;
> > >  
> > > @@ -1308,7 +1313,6 @@ static void tcp_conn_destroy(struct ctx *c, union flow *flow)
> > >  	if (conn->timer != -1)
> > >  		close(conn->timer);
> > >  
> > > -	tcp_hash_remove(c, conn);
> > >  	flow_table_compact(c, flow);  
> > 
> > I was pretty sure, due to the way I originally implemented this, that
> > removing an entry from the hash table without compacting the table
> > afterwards, with an event possibly coming between the two, would
> > present some inconsistency while we're handling that event.
> > 
> > But looking at it now, I don't see any issue with this. I just wanted
> > to raise it in case you're aware of (but didn't think about) some
> > concern in this sense.  
> 
> I think it's ok.  The thing is that compacting the connection table
> itself is largely independent of the hash table, whose buckets are
> separately indexed.  A hash remove shuffles things around in the hash
> buckets, but doesn't change where connections sit in the connection
> table.  A connection table compaction changes the indices in the
> connection table, which requires updating things in the hash buckets,
> but not moving things around in the buckets - exactly the same entries
> are in every hash bucket, it's just that one of them has a new "name"
> now.
> 
> > By the way, the reason why I deferred tcp_hash_remove() back then was
> > to save cycles between epoll events and get higher CRR rates, but I
> > think the effect is negligible anyway.  
> 
> Right.. to process a FIN and the next SYN at once, I guess?

That's one example, yes, but in any case it was an optimisation for...

> I figured
> this might make a difference, but probably not much.  There's no
> syscall here, and batching doesn't reduce the total amount of work in
> this case.

supposedly better data locality, with batching. But never
micro-benchmarked, and surely negligible anyway.

-- 
Stefano


  reply	other threads:[~2023-12-02  4:35 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-30  2:02 [PATCH v3 00/16] Introduce unified flow table, first steps David Gibson
2023-11-30  2:02 ` [PATCH v3 01/16] treewide: Add messages to static_assert() calls David Gibson
2023-11-30  2:02 ` [PATCH v3 02/16] flow, tcp: Generalise connection types David Gibson
2023-11-30  2:02 ` [PATCH v3 03/16] flow, tcp: Move TCP connection table to unified flow table David Gibson
2023-11-30  2:02 ` [PATCH v3 04/16] flow, tcp: Consolidate flow pointer<->index helpers David Gibson
2023-11-30  2:02 ` [PATCH v3 05/16] util: MAX_FROM_BITS() should be unsigned David Gibson
2023-11-30  2:02 ` [PATCH v3 06/16] flow: Make unified version of flow table compaction David Gibson
2023-11-30  2:02 ` [PATCH v3 07/16] flow, tcp: Add logging helpers for connection related messages David Gibson
2023-11-30  2:02 ` [PATCH v3 08/16] flow: Introduce 'sidx' type to represent one side of one flow David Gibson
2023-12-02  4:35   ` Stefano Brivio
2023-11-30  2:02 ` [PATCH v3 09/16] tcp: Remove unneccessary bounds check in tcp_timer_handler() David Gibson
2023-11-30  2:02 ` [PATCH v3 10/16] flow,tcp: Generalise TCP epoll_ref to generic flows David Gibson
2023-11-30  2:02 ` [PATCH v3 11/16] tcp_splice: Use unsigned to represent side David Gibson
2023-11-30  2:02 ` [PATCH v3 12/16] flow,tcp: Use epoll_ref type including flow and side David Gibson
2023-11-30  2:02 ` [PATCH v3 13/16] test: Avoid hitting guestfish command length limits David Gibson
2023-11-30  2:02 ` [PATCH v3 14/16] pif: Add helpers to get the name of a pif David Gibson
2023-11-30 12:45   ` Stefano Brivio
2023-12-01  0:08     ` David Gibson
2023-11-30  2:02 ` [PATCH v3 15/16] tcp: "TCP" hash secret doesn't need to be TCP specific David Gibson
2023-11-30  2:02 ` [PATCH v3 16/16] tcp: Don't defer hash table removal David Gibson
2023-11-30 12:45   ` Stefano Brivio
2023-12-01  0:07     ` David Gibson
2023-12-02  4:34       ` Stefano Brivio [this message]
2023-12-04  0:43         ` David Gibson
2023-12-04  9:54 ` [PATCH v3 00/16] Introduce unified flow table, first steps Stefano Brivio

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231202053458.2390d4fb@elisabeth \
    --to=sbrivio@redhat.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=passt-dev@passt.top \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).