From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: passt-dev@passt.top
Subject: Re: [PATCH v3 16/16] tcp: Don't defer hash table removal
Date: Sat, 2 Dec 2023 05:34:58 +0100 [thread overview]
Message-ID: <20231202053458.2390d4fb@elisabeth> (raw)
In-Reply-To: <ZWkj1udgR2Le8YZ1@zatzit>
On Fri, 1 Dec 2023 11:07:50 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:
> On Thu, Nov 30, 2023 at 01:45:32PM +0100, Stefano Brivio wrote:
> > On Thu, 30 Nov 2023 13:02:22 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >
> > > When a TCP connection is closed, we mark it by setting events to CLOSED,
> > > then some time later we do final cleanups: closing sockets, removing from
> > > the hash table and so forth.
> > >
> > > This does mean that when making a hash lookup we need to exclude any
> > > apparent matches that are CLOSED, since they represent a stale connection.
> > > This can happen in practice if one connection closes and a new one with the
> > > same endpoints is started shortly afterward.
> > >
> > > Checking for CLOSED is quite specific to TCP however, and won't work when
> > > we extend the hash table to more general flows. So, alter the code to
> > > immediately remove the connection from the hash table when CLOSED, although
> > > we still defer closing sockets and other cleanup.
> > >
> > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > > ---
> > > tcp.c | 10 +++++++---
> > > 1 file changed, 7 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/tcp.c b/tcp.c
> > > index 74d06bf..17c7cba 100644
> > > --- a/tcp.c
> > > +++ b/tcp.c
> > > @@ -781,6 +781,9 @@ static void conn_flag_do(const struct ctx *c, struct tcp_tap_conn *conn,
> > > tcp_timer_ctl(c, conn);
> > > }
> > >
> > > +static void tcp_hash_remove(const struct ctx *c,
> > > + const struct tcp_tap_conn *conn);
> > > +
> > > /**
> > > * conn_event_do() - Set and log connection events, update epoll state
> > > * @c: Execution context
> > > @@ -825,7 +828,9 @@ static void conn_event_do(const struct ctx *c, struct tcp_tap_conn *conn,
> > > flow_dbg(conn, "%s",
> > > num == -1 ? "CLOSED" : tcp_event_str[num]);
> > >
> > > - if ((event == TAP_FIN_RCVD) && !(conn->events & SOCK_FIN_RCVD))
> > > + if (event == CLOSED)
> > > + tcp_hash_remove(c, conn);
> > > + else if ((event == TAP_FIN_RCVD) && !(conn->events & SOCK_FIN_RCVD))
> > > conn_flag(c, conn, ACTIVE_CLOSE);
> > > else
> > > tcp_epoll_ctl(c, conn);
> > > @@ -1150,7 +1155,7 @@ static int tcp_hash_match(const struct tcp_tap_conn *conn,
> > > const union inany_addr *faddr,
> > > in_port_t eport, in_port_t fport)
> > > {
> > > - if (conn->events != CLOSED && inany_equals(&conn->faddr, faddr) &&
> > > + if (inany_equals(&conn->faddr, faddr) &&
> > > conn->eport == eport && conn->fport == fport)
> > > return 1;
> > >
> > > @@ -1308,7 +1313,6 @@ static void tcp_conn_destroy(struct ctx *c, union flow *flow)
> > > if (conn->timer != -1)
> > > close(conn->timer);
> > >
> > > - tcp_hash_remove(c, conn);
> > > flow_table_compact(c, flow);
> >
> > I was pretty sure, due to the way I originally implemented this, that
> > removing an entry from the hash table without compacting the table
> > afterwards, with an event possibly coming between the two, would
> > present some inconsistency while we're handling that event.
> >
> > But looking at it now, I don't see any issue with this. I just wanted
> > to raise it in case you're aware of (but didn't think about) some
> > concern in this sense.
>
> I think it's ok. The thing is that compacting the connection table
> itself is largely independent of the hash table, whose buckets are
> separately indexed. A hash remove shuffles things around in the hash
> buckets, but doesn't change where connections sit in the connection
> table. A connection table compaction changes the indices in the
> connection table, which requires updating things in the hash buckets,
> but not moving things around in the buckets - exactly the same entries
> are in every hash bucket, it's just that one of them has a new "name"
> now.
>
> > By the way, the reason why I deferred tcp_hash_remove() back then was
> > to save cycles between epoll events and get higher CRR rates, but I
> > think the effect is negligible anyway.
>
> Right.. to process a FIN and the next SYN at once, I guess?
That's one example, yes, but in any case it was an optimisation for...
> I figured
> this might make a difference, but probably not much. There's no
> syscall here, and batching doesn't reduce the total amount of work in
> this case.
supposedly better data locality, with batching. But never
micro-benchmarked, and surely negligible anyway.
--
Stefano
next prev parent reply other threads:[~2023-12-02 4:35 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-11-30 2:02 [PATCH v3 00/16] Introduce unified flow table, first steps David Gibson
2023-11-30 2:02 ` [PATCH v3 01/16] treewide: Add messages to static_assert() calls David Gibson
2023-11-30 2:02 ` [PATCH v3 02/16] flow, tcp: Generalise connection types David Gibson
2023-11-30 2:02 ` [PATCH v3 03/16] flow, tcp: Move TCP connection table to unified flow table David Gibson
2023-11-30 2:02 ` [PATCH v3 04/16] flow, tcp: Consolidate flow pointer<->index helpers David Gibson
2023-11-30 2:02 ` [PATCH v3 05/16] util: MAX_FROM_BITS() should be unsigned David Gibson
2023-11-30 2:02 ` [PATCH v3 06/16] flow: Make unified version of flow table compaction David Gibson
2023-11-30 2:02 ` [PATCH v3 07/16] flow, tcp: Add logging helpers for connection related messages David Gibson
2023-11-30 2:02 ` [PATCH v3 08/16] flow: Introduce 'sidx' type to represent one side of one flow David Gibson
2023-12-02 4:35 ` Stefano Brivio
2023-11-30 2:02 ` [PATCH v3 09/16] tcp: Remove unneccessary bounds check in tcp_timer_handler() David Gibson
2023-11-30 2:02 ` [PATCH v3 10/16] flow,tcp: Generalise TCP epoll_ref to generic flows David Gibson
2023-11-30 2:02 ` [PATCH v3 11/16] tcp_splice: Use unsigned to represent side David Gibson
2023-11-30 2:02 ` [PATCH v3 12/16] flow,tcp: Use epoll_ref type including flow and side David Gibson
2023-11-30 2:02 ` [PATCH v3 13/16] test: Avoid hitting guestfish command length limits David Gibson
2023-11-30 2:02 ` [PATCH v3 14/16] pif: Add helpers to get the name of a pif David Gibson
2023-11-30 12:45 ` Stefano Brivio
2023-12-01 0:08 ` David Gibson
2023-11-30 2:02 ` [PATCH v3 15/16] tcp: "TCP" hash secret doesn't need to be TCP specific David Gibson
2023-11-30 2:02 ` [PATCH v3 16/16] tcp: Don't defer hash table removal David Gibson
2023-11-30 12:45 ` Stefano Brivio
2023-12-01 0:07 ` David Gibson
2023-12-02 4:34 ` Stefano Brivio [this message]
2023-12-04 0:43 ` David Gibson
2023-12-04 9:54 ` [PATCH v3 00/16] Introduce unified flow table, first steps Stefano Brivio
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20231202053458.2390d4fb@elisabeth \
--to=sbrivio@redhat.com \
--cc=david@gibson.dropbear.id.au \
--cc=passt-dev@passt.top \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).