public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: passt-dev@passt.top
Subject: Re: [PATCH v3 16/16] tcp: Don't defer hash table removal
Date: Mon, 4 Dec 2023 11:43:11 +1100	[thread overview]
Message-ID: <ZW0gnwjnQTu7n4_y@zatzit> (raw)
In-Reply-To: <20231202053458.2390d4fb@elisabeth>

[-- Attachment #1: Type: text/plain, Size: 5594 bytes --]

On Sat, Dec 02, 2023 at 05:34:58AM +0100, Stefano Brivio wrote:
> On Fri, 1 Dec 2023 11:07:50 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Thu, Nov 30, 2023 at 01:45:32PM +0100, Stefano Brivio wrote:
> > > On Thu, 30 Nov 2023 13:02:22 +1100
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > >   
> > > > When a TCP connection is closed, we mark it by setting events to CLOSED,
> > > > then some time later we do final cleanups: closing sockets, removing from
> > > > the hash table and so forth.
> > > > 
> > > > This does mean that when making a hash lookup we need to exclude any
> > > > apparent matches that are CLOSED, since they represent a stale connection.
> > > > This can happen in practice if one connection closes and a new one with the
> > > > same endpoints is started shortly afterward.
> > > > 
> > > > Checking for CLOSED is quite specific to TCP however, and won't work when
> > > > we extend the hash table to more general flows.  So, alter the code to
> > > > immediately remove the connection from the hash table when CLOSED, although
> > > > we still defer closing sockets and other cleanup.
> > > > 
> > > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > > > ---
> > > >  tcp.c | 10 +++++++---
> > > >  1 file changed, 7 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/tcp.c b/tcp.c
> > > > index 74d06bf..17c7cba 100644
> > > > --- a/tcp.c
> > > > +++ b/tcp.c
> > > > @@ -781,6 +781,9 @@ static void conn_flag_do(const struct ctx *c, struct tcp_tap_conn *conn,
> > > >  		tcp_timer_ctl(c, conn);
> > > >  }
> > > >  
> > > > +static void tcp_hash_remove(const struct ctx *c,
> > > > +			    const struct tcp_tap_conn *conn);
> > > > +
> > > >  /**
> > > >   * conn_event_do() - Set and log connection events, update epoll state
> > > >   * @c:		Execution context
> > > > @@ -825,7 +828,9 @@ static void conn_event_do(const struct ctx *c, struct tcp_tap_conn *conn,
> > > >  		flow_dbg(conn, "%s",
> > > >  			 num == -1 	       ? "CLOSED" : tcp_event_str[num]);
> > > >  
> > > > -	if ((event == TAP_FIN_RCVD) && !(conn->events & SOCK_FIN_RCVD))
> > > > +	if (event == CLOSED)
> > > > +		tcp_hash_remove(c, conn);
> > > > +	else if ((event == TAP_FIN_RCVD) && !(conn->events & SOCK_FIN_RCVD))
> > > >  		conn_flag(c, conn, ACTIVE_CLOSE);
> > > >  	else
> > > >  		tcp_epoll_ctl(c, conn);
> > > > @@ -1150,7 +1155,7 @@ static int tcp_hash_match(const struct tcp_tap_conn *conn,
> > > >  			  const union inany_addr *faddr,
> > > >  			  in_port_t eport, in_port_t fport)
> > > >  {
> > > > -	if (conn->events != CLOSED && inany_equals(&conn->faddr, faddr) &&
> > > > +	if (inany_equals(&conn->faddr, faddr) &&
> > > >  	    conn->eport == eport && conn->fport == fport)
> > > >  		return 1;
> > > >  
> > > > @@ -1308,7 +1313,6 @@ static void tcp_conn_destroy(struct ctx *c, union flow *flow)
> > > >  	if (conn->timer != -1)
> > > >  		close(conn->timer);
> > > >  
> > > > -	tcp_hash_remove(c, conn);
> > > >  	flow_table_compact(c, flow);  
> > > 
> > > I was pretty sure, due to the way I originally implemented this, that
> > > removing an entry from the hash table without compacting the table
> > > afterwards, with an event possibly coming between the two, would
> > > present some inconsistency while we're handling that event.
> > > 
> > > But looking at it now, I don't see any issue with this. I just wanted
> > > to raise it in case you're aware of (but didn't think about) some
> > > concern in this sense.  
> > 
> > I think it's ok.  The thing is that compacting the connection table
> > itself is largely independent of the hash table, whose buckets are
> > separately indexed.  A hash remove shuffles things around in the hash
> > buckets, but doesn't change where connections sit in the connection
> > table.  A connection table compaction changes the indices in the
> > connection table, which requires updating things in the hash buckets,
> > but not moving things around in the buckets - exactly the same entries
> > are in every hash bucket, it's just that one of them has a new "name"
> > now.
> > 
> > > By the way, the reason why I deferred tcp_hash_remove() back then was
> > > to save cycles between epoll events and get higher CRR rates, but I
> > > think the effect is negligible anyway.  
> > 
> > Right.. to process a FIN and the next SYN at once, I guess?
> 
> That's one example, yes, but in any case it was an optimisation for...
> 
> > I figured
> > this might make a difference, but probably not much.  There's no
> > syscall here, and batching doesn't reduce the total amount of work in
> > this case.
> 
> supposedly better data locality, with batching. But never
> micro-benchmarked, and surely negligible anyway.

Hmm.. except hash tables are by construction non-local, so I wouldn't
really expect batching unrelated hash entries to do that.  Even if the
connection table entries themselves are close, which is more likely,
they take a cacheline each on the most common platform, so that's not
likely to win anything.  In fact if anything, I'd expect better
locality with the non-deferred approach - we're triggering the hash
remove when we've already been working on that connection, so it
should be cache hot.

I guess batching does win some icache locality.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2023-12-04  0:45 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-30  2:02 [PATCH v3 00/16] Introduce unified flow table, first steps David Gibson
2023-11-30  2:02 ` [PATCH v3 01/16] treewide: Add messages to static_assert() calls David Gibson
2023-11-30  2:02 ` [PATCH v3 02/16] flow, tcp: Generalise connection types David Gibson
2023-11-30  2:02 ` [PATCH v3 03/16] flow, tcp: Move TCP connection table to unified flow table David Gibson
2023-11-30  2:02 ` [PATCH v3 04/16] flow, tcp: Consolidate flow pointer<->index helpers David Gibson
2023-11-30  2:02 ` [PATCH v3 05/16] util: MAX_FROM_BITS() should be unsigned David Gibson
2023-11-30  2:02 ` [PATCH v3 06/16] flow: Make unified version of flow table compaction David Gibson
2023-11-30  2:02 ` [PATCH v3 07/16] flow, tcp: Add logging helpers for connection related messages David Gibson
2023-11-30  2:02 ` [PATCH v3 08/16] flow: Introduce 'sidx' type to represent one side of one flow David Gibson
2023-12-02  4:35   ` Stefano Brivio
2023-11-30  2:02 ` [PATCH v3 09/16] tcp: Remove unneccessary bounds check in tcp_timer_handler() David Gibson
2023-11-30  2:02 ` [PATCH v3 10/16] flow,tcp: Generalise TCP epoll_ref to generic flows David Gibson
2023-11-30  2:02 ` [PATCH v3 11/16] tcp_splice: Use unsigned to represent side David Gibson
2023-11-30  2:02 ` [PATCH v3 12/16] flow,tcp: Use epoll_ref type including flow and side David Gibson
2023-11-30  2:02 ` [PATCH v3 13/16] test: Avoid hitting guestfish command length limits David Gibson
2023-11-30  2:02 ` [PATCH v3 14/16] pif: Add helpers to get the name of a pif David Gibson
2023-11-30 12:45   ` Stefano Brivio
2023-12-01  0:08     ` David Gibson
2023-11-30  2:02 ` [PATCH v3 15/16] tcp: "TCP" hash secret doesn't need to be TCP specific David Gibson
2023-11-30  2:02 ` [PATCH v3 16/16] tcp: Don't defer hash table removal David Gibson
2023-11-30 12:45   ` Stefano Brivio
2023-12-01  0:07     ` David Gibson
2023-12-02  4:34       ` Stefano Brivio
2023-12-04  0:43         ` David Gibson [this message]
2023-12-04  9:54 ` [PATCH v3 00/16] Introduce unified flow table, first steps Stefano Brivio

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZW0gnwjnQTu7n4_y@zatzit \
    --to=david@gibson.dropbear.id.au \
    --cc=passt-dev@passt.top \
    --cc=sbrivio@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).