From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from gandalf.ozlabs.org (gandalf.ozlabs.org [150.107.74.76])
	by passt.top (Postfix) with ESMTPS id DBA365A027A
	for <passt-dev@passt.top>; Mon,  4 Dec 2023 01:45:59 +0100 (CET)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=gibson.dropbear.id.au; s=202312; t=1701650753;
	bh=DwuOhMbiOwyEO5LB9ZbTfD+1uYlVJ3hBryqFJ0xw+Xk=;
	h=Date:From:To:Cc:Subject:References:In-Reply-To:From;
	b=nd/zFzJefIew/TAdGHso1vmVsSqC07+st8padsrd1cfe6oh7EZGJR+7Pl3TGnw3fX
	 eXKuOd80zTedO0LuthGHikuAeQhh9udQPA7AgB0CyDwJv4jHqeSpNbWlYoOaEEzxew
	 4nGvf8tW6MLw30n6pkBzkLxLGeFciNsPSzCP3KqhYP1BtflqpA+wGJiOXoAJwZV7zt
	 G3cYsisD4wjNfjLOSUHS4GsqUAwT6Gy82lCj0WJRQPYU1KdzEbjs2i7RB7kDDWPDA6
	 ZJ72eLvUAjtMHTfBH/rVAMT8+GSnum0anEt3EGqJ0S9e3kgojT04RBF9ikLbRwRwZI
	 91MlqSPOX8Ggw==
Received: by gandalf.ozlabs.org (Postfix, from userid 1007)
	id 4Sk4hx5Ynwz4xh3; Mon,  4 Dec 2023 11:45:53 +1100 (AEDT)
Date: Mon, 4 Dec 2023 11:43:11 +1100
From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>
Subject: Re: [PATCH v3 16/16] tcp: Don't defer hash table removal
Message-ID: <ZW0gnwjnQTu7n4_y@zatzit>
References: <20231130020222.4056647-1-david@gibson.dropbear.id.au>
 <20231130020222.4056647-17-david@gibson.dropbear.id.au>
 <20231130134532.7c5b959e@elisabeth>
 <ZWkj1udgR2Le8YZ1@zatzit>
 <20231202053458.2390d4fb@elisabeth>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha256;
	protocol="application/pgp-signature"; boundary="PgPHotEcU/ZCItnv"
Content-Disposition: inline
In-Reply-To: <20231202053458.2390d4fb@elisabeth>
Message-ID-Hash: WGG4IJO4M6E4ODA6PGHD55WVJHM5FG45
X-Message-ID-Hash: WGG4IJO4M6E4ODA6PGHD55WVJHM5FG45
X-MailFrom: dgibson@gandalf.ozlabs.org
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header
CC: passt-dev@passt.top
X-Mailman-Version: 3.3.8
Precedence: list
List-Id: Development discussion and patches for passt <passt-dev.passt.top>
Archived-At: <https://archives.passt.top/passt-dev/ZW0gnwjnQTu7n4_y@zatzit/>
Archived-At: <https://passt.top/hyperkitty/list/passt-dev@passt.top/message/WGG4IJO4M6E4ODA6PGHD55WVJHM5FG45/>
List-Archive: <https://archives.passt.top/passt-dev/>
List-Archive: <https://passt.top/hyperkitty/list/passt-dev@passt.top/>
List-Help: <mailto:passt-dev-request@passt.top?subject=help>
List-Owner: <mailto:passt-dev-owner@passt.top>
List-Post: <mailto:passt-dev@passt.top>
List-Subscribe: <mailto:passt-dev-join@passt.top>
List-Unsubscribe: <mailto:passt-dev-leave@passt.top>


--PgPHotEcU/ZCItnv
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Sat, Dec 02, 2023 at 05:34:58AM +0100, Stefano Brivio wrote:
> On Fri, 1 Dec 2023 11:07:50 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
>=20
> > On Thu, Nov 30, 2023 at 01:45:32PM +0100, Stefano Brivio wrote:
> > > On Thu, 30 Nov 2023 13:02:22 +1100
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > >  =20
> > > > When a TCP connection is closed, we mark it by setting events to CL=
OSED,
> > > > then some time later we do final cleanups: closing sockets, removin=
g from
> > > > the hash table and so forth.
> > > >=20
> > > > This does mean that when making a hash lookup we need to exclude any
> > > > apparent matches that are CLOSED, since they represent a stale conn=
ection.
> > > > This can happen in practice if one connection closes and a new one =
with the
> > > > same endpoints is started shortly afterward.
> > > >=20
> > > > Checking for CLOSED is quite specific to TCP however, and won't wor=
k when
> > > > we extend the hash table to more general flows.  So, alter the code=
 to
> > > > immediately remove the connection from the hash table when CLOSED, =
although
> > > > we still defer closing sockets and other cleanup.
> > > >=20
> > > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > > > ---
> > > >  tcp.c | 10 +++++++---
> > > >  1 file changed, 7 insertions(+), 3 deletions(-)
> > > >=20
> > > > diff --git a/tcp.c b/tcp.c
> > > > index 74d06bf..17c7cba 100644
> > > > --- a/tcp.c
> > > > +++ b/tcp.c
> > > > @@ -781,6 +781,9 @@ static void conn_flag_do(const struct ctx *c, s=
truct tcp_tap_conn *conn,
> > > >  		tcp_timer_ctl(c, conn);
> > > >  }
> > > > =20
> > > > +static void tcp_hash_remove(const struct ctx *c,
> > > > +			    const struct tcp_tap_conn *conn);
> > > > +
> > > >  /**
> > > >   * conn_event_do() - Set and log connection events, update epoll s=
tate
> > > >   * @c:		Execution context
> > > > @@ -825,7 +828,9 @@ static void conn_event_do(const struct ctx *c, =
struct tcp_tap_conn *conn,
> > > >  		flow_dbg(conn, "%s",
> > > >  			 num =3D=3D -1 	       ? "CLOSED" : tcp_event_str[num]);
> > > > =20
> > > > -	if ((event =3D=3D TAP_FIN_RCVD) && !(conn->events & SOCK_FIN_RCVD=
))
> > > > +	if (event =3D=3D CLOSED)
> > > > +		tcp_hash_remove(c, conn);
> > > > +	else if ((event =3D=3D TAP_FIN_RCVD) && !(conn->events & SOCK_FIN=
_RCVD))
> > > >  		conn_flag(c, conn, ACTIVE_CLOSE);
> > > >  	else
> > > >  		tcp_epoll_ctl(c, conn);
> > > > @@ -1150,7 +1155,7 @@ static int tcp_hash_match(const struct tcp_ta=
p_conn *conn,
> > > >  			  const union inany_addr *faddr,
> > > >  			  in_port_t eport, in_port_t fport)
> > > >  {
> > > > -	if (conn->events !=3D CLOSED && inany_equals(&conn->faddr, faddr)=
 &&
> > > > +	if (inany_equals(&conn->faddr, faddr) &&
> > > >  	    conn->eport =3D=3D eport && conn->fport =3D=3D fport)
> > > >  		return 1;
> > > > =20
> > > > @@ -1308,7 +1313,6 @@ static void tcp_conn_destroy(struct ctx *c, u=
nion flow *flow)
> > > >  	if (conn->timer !=3D -1)
> > > >  		close(conn->timer);
> > > > =20
> > > > -	tcp_hash_remove(c, conn);
> > > >  	flow_table_compact(c, flow); =20
> > >=20
> > > I was pretty sure, due to the way I originally implemented this, that
> > > removing an entry from the hash table without compacting the table
> > > afterwards, with an event possibly coming between the two, would
> > > present some inconsistency while we're handling that event.
> > >=20
> > > But looking at it now, I don't see any issue with this. I just wanted
> > > to raise it in case you're aware of (but didn't think about) some
> > > concern in this sense. =20
> >=20
> > I think it's ok.  The thing is that compacting the connection table
> > itself is largely independent of the hash table, whose buckets are
> > separately indexed.  A hash remove shuffles things around in the hash
> > buckets, but doesn't change where connections sit in the connection
> > table.  A connection table compaction changes the indices in the
> > connection table, which requires updating things in the hash buckets,
> > but not moving things around in the buckets - exactly the same entries
> > are in every hash bucket, it's just that one of them has a new "name"
> > now.
> >=20
> > > By the way, the reason why I deferred tcp_hash_remove() back then was
> > > to save cycles between epoll events and get higher CRR rates, but I
> > > think the effect is negligible anyway. =20
> >=20
> > Right.. to process a FIN and the next SYN at once, I guess?
>=20
> That's one example, yes, but in any case it was an optimisation for...
>=20
> > I figured
> > this might make a difference, but probably not much.  There's no
> > syscall here, and batching doesn't reduce the total amount of work in
> > this case.
>=20
> supposedly better data locality, with batching. But never
> micro-benchmarked, and surely negligible anyway.

Hmm.. except hash tables are by construction non-local, so I wouldn't
really expect batching unrelated hash entries to do that.  Even if the
connection table entries themselves are close, which is more likely,
they take a cacheline each on the most common platform, so that's not
likely to win anything.  In fact if anything, I'd expect better
locality with the non-deferred approach - we're triggering the hash
remove when we've already been working on that connection, so it
should be cache hot.

I guess batching does win some icache locality.

--=20
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

--PgPHotEcU/ZCItnv
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmVtII4ACgkQzQJF27ox
2Gcz1xAAn45PUCTmauQeiUzrw7xy5D6jvpubGQdL9cmtmJVIwzi4sK/cisfH2IHa
yq5UqiIqHcKg3aAMrJB68UEfEKxybXzgFbjhowrrpQwLb7JyO2LtVXiWfkYHPum2
Sz10+u4LFzebcBU9Naz6C+i5pEACpyKhRxL9HyKZMSu4DTZZtXz/IUepkwrWmHgg
znge+//O5AkHqMi1EkgDs9QJ+lMCc/Kj1FWK5Rk6cb/wgr9Jw3Sdai0PwhOdlvhA
b57yLcjeziDj34rLNGZc5Z41++2l9nj4gtNiiGTh0Yt+vFuZNjWI+uMoykboixKZ
v7whhT5inkyJeMTNU+wWipg5PWUTqL3yZhotXqJzTtCpgiu6no2Q+Ln7nxabwAI7
hubh/I7xMPUi+4AO8YbCs4x4ESUrplZSWOPbXJ4pCGQDQ5wXxJ1P2ezXiuogZnp+
QKVxjmtcudru0yGxg+QL+H9yj+zS/3mvYg90i+U3ZUcu5TAXbgaYsDtgpQsKuM2a
TtLqMNl8ExqKpXgV0iMrgmF2ADh4urGSR9yV+mnIKFp78ZPCCnfHwvQp28h6EFXi
PiEJPazrPD4Ry0eWHAFGjJmp/LupeTUKOGAgg9N+21PHMWWUE+3CukMUDt21qfIg
kSdWoktH5eCLB6yCBOLdQd7NiAUfNMPq+or2ep0sfgGpJGGK2EE=
=q88Y
-----END PGP SIGNATURE-----

--PgPHotEcU/ZCItnv--