On Sat, Dec 02, 2023 at 05:34:58AM +0100, Stefano Brivio wrote:
> On Fri, 1 Dec 2023 11:07:50 +1100
> David Gibson wrote:
> 
> > On Thu, Nov 30, 2023 at 01:45:32PM +0100, Stefano Brivio wrote:
> > > On Thu, 30 Nov 2023 13:02:22 +1100
> > > David Gibson wrote:
> > > 
> > > > When a TCP connection is closed, we mark it by setting events to CLOSED,
> > > > then some time later we do final cleanups: closing sockets, removing from
> > > > the hash table and so forth.
> > > > 
> > > > This does mean that when making a hash lookup we need to exclude any
> > > > apparent matches that are CLOSED, since they represent a stale connection.
> > > > This can happen in practice if one connection closes and a new one with the
> > > > same endpoints is started shortly afterward.
> > > > 
> > > > Checking for CLOSED is quite specific to TCP however, and won't work when
> > > > we extend the hash table to more general flows.  So, alter the code to
> > > > immediately remove the connection from the hash table when CLOSED, although
> > > > we still defer closing sockets and other cleanup.
> > > > 
> > > > Signed-off-by: David Gibson
> > > > ---
> > > >  tcp.c | 10 +++++++---
> > > >  1 file changed, 7 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/tcp.c b/tcp.c
> > > > index 74d06bf..17c7cba 100644
> > > > --- a/tcp.c
> > > > +++ b/tcp.c
> > > > @@ -781,6 +781,9 @@ static void conn_flag_do(const struct ctx *c, struct tcp_tap_conn *conn,
> > > >  		tcp_timer_ctl(c, conn);
> > > >  }
> > > >  
> > > > +static void tcp_hash_remove(const struct ctx *c,
> > > > +			    const struct tcp_tap_conn *conn);
> > > > +
> > > >  /**
> > > >   * conn_event_do() - Set and log connection events, update epoll state
> > > >   * @c:		Execution context
> > > > @@ -825,7 +828,9 @@ static void conn_event_do(const struct ctx *c, struct tcp_tap_conn *conn,
> > > >  		flow_dbg(conn, "%s",
> > > >  			 num == -1 ? "CLOSED" : tcp_event_str[num]);
> > > >  
> > > > -	if ((event == TAP_FIN_RCVD) && !(conn->events & SOCK_FIN_RCVD))
> > > > +	if (event == CLOSED)
> > > > +		tcp_hash_remove(c, conn);
> > > > +	else if ((event == TAP_FIN_RCVD) && !(conn->events & SOCK_FIN_RCVD))
> > > >  		conn_flag(c, conn, ACTIVE_CLOSE);
> > > >  	else
> > > >  		tcp_epoll_ctl(c, conn);
> > > > @@ -1150,7 +1155,7 @@ static int tcp_hash_match(const struct tcp_tap_conn *conn,
> > > >  			  const union inany_addr *faddr,
> > > >  			  in_port_t eport, in_port_t fport)
> > > >  {
> > > > -	if (conn->events != CLOSED && inany_equals(&conn->faddr, faddr) &&
> > > > +	if (inany_equals(&conn->faddr, faddr) &&
> > > >  	    conn->eport == eport && conn->fport == fport)
> > > >  		return 1;
> > > >  
> > > > @@ -1308,7 +1313,6 @@ static void tcp_conn_destroy(struct ctx *c, union flow *flow)
> > > >  	if (conn->timer != -1)
> > > >  		close(conn->timer);
> > > >  
> > > > -	tcp_hash_remove(c, conn);
> > > >  	flow_table_compact(c, flow);
> > > 
> > > I was pretty sure, due to the way I originally implemented this, that
> > > removing an entry from the hash table without compacting the table
> > > afterwards, with an event possibly coming between the two, would
> > > present some inconsistency while we're handling that event.
> > > 
> > > But looking at it now, I don't see any issue with this. I just wanted
> > > to raise it in case you're aware of (but didn't think about) some
> > > concern in this sense.
> > 
> > I think it's ok.  The thing is that compacting the connection table
> > itself is largely independent of the hash table, whose buckets are
> > separately indexed.  A hash remove shuffles things around in the hash
> > buckets, but doesn't change where connections sit in the connection
> > table.  A connection table compaction changes the indices in the
> > connection table, which requires updating things in the hash buckets,
> > but not moving things around in the buckets - exactly the same entries
> > are in every hash bucket, it's just that one of them has a new "name"
> > now.
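
To make that concrete, here's a toy model of the arrangement - all
names are made up for illustration, this is not the actual passt code
or data layout, just the two-structure shape described above: buckets
hold indices into a dense connection table, so removing from a bucket
and compacting the table touch mostly disjoint state.

/* Toy model: hash buckets store indices into a dense table[].
 * Names invented for illustration only, not the passt structures.
 */
#include <stdio.h>

#define MAX_CONNS 8
#define N_BUCKETS 4

struct conn {
	unsigned hash;	/* which bucket this entry lives in */
	int port;	/* stand-in for the real match fields */
};

static struct conn table[MAX_CONNS];	/* dense "connection table" */
static int n_conns;

static int bucket[N_BUCKETS][MAX_CONNS];	/* indices into table[] */
static int bucket_len[N_BUCKETS];

static int conn_add(int port)
{
	int idx = n_conns++;
	unsigned b = (unsigned)port % N_BUCKETS;

	table[idx].hash = b;
	table[idx].port = port;
	bucket[b][bucket_len[b]++] = idx;
	return idx;
}

/* "Hash remove": shuffles entries within one bucket; table[] itself
 * is untouched */
static void hash_remove(int idx)
{
	unsigned b = table[idx].hash;
	int i;

	for (i = 0; i < bucket_len[b]; i++) {
		if (bucket[b][i] == idx) {
			bucket[b][i] = bucket[b][--bucket_len[b]];
			return;
		}
	}
}

/* "Table compact": move the last entry into the hole.  The moved
 * entry stays in the same bucket; only its index - its "name" -
 * changes, so rewrite that one index where its bucket refers to it */
static void table_compact(int hole)
{
	int last = --n_conns;
	unsigned b;
	int i;

	if (hole == last)
		return;

	table[hole] = table[last];
	b = table[hole].hash;
	for (i = 0; i < bucket_len[b]; i++) {
		if (bucket[b][i] == last)
			bucket[b][i] = hole;
	}
}

int main(void)
{
	int a = conn_add(80), b = conn_add(443), c = conn_add(22);

	(void)a; (void)c;
	hash_remove(b);		/* on CLOSED: drop from the buckets now... */
	table_compact(b);	/* ...while this cleanup can come later */
	printf("%d conns, slot %d now holds port %d\n",
	       n_conns, b, table[b].port);
	return 0;
}

Each structure stays internally consistent between the two steps, as
long as the compaction fixes up the one renamed index - which is the
property the question above hinges on.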

> > > By the way, the reason why I deferred tcp_hash_remove() back then was
> > > to save cycles between epoll events and get higher CRR rates, but I
> > > think the effect is negligible anyway.
> > 
> > Right.. to process a FIN and the next SYN at once, I guess?

> That's one example, yes, but in any case it was an optimisation for...
> 
> > I figured
> > this might make a difference, but probably not much.  There's no
> > syscall here, and batching doesn't reduce the total amount of work in
> > this case.
> 
> supposedly better data locality, with batching. But never
> micro-benchmarked, and surely negligible anyway.

Hmm.. except hash tables are by construction non-local, so I wouldn't
really expect batching unrelated hash entries to do that.  Even if the
connection table entries themselves are close, which is more likely,
they take a cacheline each on the most common platform, so that's not
likely to win anything.  In fact if anything, I'd expect better
locality with the non-deferred approach - we're triggering the hash
remove when we've already been working on that connection, so it
should be cache hot.  I guess batching does win some icache locality.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson