From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202606 header.b=SnuBC2Jr; dkim-atps=neutral Received: from mail.ozlabs.org (gandalf.ozlabs.org [150.107.74.76]) by passt.top (Postfix) with ESMTPS id 11BD25A0265 for ; Thu, 02 Jul 2026 05:42:55 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202606; t=1782963771; bh=D0XmYITloG0xQFrd+aVYNjYxEm0cnq7IKF5DJgiBg3s=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=SnuBC2JrWhGoN18T0/UH9vhaaCrSM8WOnQvhWghtMZmwLRUf9Gy4np+71JDyxOBrq d3vhKq6kWcsaact4TWdE4EQ7UlnU6bBFYLOs6FCTOJ2y75bK9Y/+Ow6QuIdm2JC+cT UvuQgKtku8LNOdyLQZ77RzwBWY1gbbGsSnnUBJGj4t+wskytdoWYtNnvIxHQoI45kb aGWeKdRS0FD3x6lY31+vba05bjCVjHLpLUHHGr8AD/c2e65caApoIVVNxxRTprKz+0 YGrTqe10UfnGp8nslO4M4faHNbWc2GtvzGI9D6xyJqbdsl2g1fN2zq2WF9kG2IaWUs JPh096Vp/513g== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4grN4q5901z4wSP; Thu, 02 Jul 2026 13:42:51 +1000 (AEST) Date: Thu, 2 Jul 2026 13:03:41 +1000 From: David Gibson To: Laurent Vivier Subject: Re: [PATCH 8/8] flow: Add mutex and per-qpair filtering to flow table operations Message-ID: References: <20260616171052.3785909-1-lvivier@redhat.com> <20260616171052.3785909-9-lvivier@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="mPb9vC7QXfdSRO/9" Content-Disposition: inline In-Reply-To: <20260616171052.3785909-9-lvivier@redhat.com> Message-ID-Hash: HGZAVTLPGFPVDB7MQRZIWTYJPX7I5KLM X-Message-ID-Hash: HGZAVTLPGFPVDB7MQRZIWTYJPX7I5KLM X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --mPb9vC7QXfdSRO/9 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Jun 16, 2026 at 07:10:52PM +0200, Laurent Vivier wrote: > The flow table free list, hash table, and flow_new_entry are global > shared state accessed from multiple threads. >=20 > Protect flow_alloc(), flow_alloc_cancel(), flow_hash_insert(), > flow_hash_remove(), and the free list rebuild in flow_defer_handler() > with a pthread_rwlock_t: writers for mutations, readers for lookups. >=20 > Make flow_new_entry _Thread_local so each thread independently tracks > its own in-progress allocation. >=20 > Since the lock is released between flow_alloc() and flow_activate(), > other threads can observe intermediate flow states (NEW, INI, TGT, > TYPED) during traversal. Adapt flow_foreach() and flow_defer_handler() > accordingly: skip these entries silently rather than treating them as > errors, and break free-list cluster merging across them. >=20 > Filter flow_defer_handler()'s first loop by qpair, so each thread > only processes its own flows. >=20 > Signed-off-by: Laurent Vivier I'm skipping review of this, since it sounds like you're reworking the flow table locking significantly anyway. > --- > flow.c | 59 +++++++++++++++++++++++++++++++++++++++++++--------- > flow_table.h | 2 +- > 2 files changed, 50 insertions(+), 11 deletions(-) >=20 > diff --git a/flow.c b/flow.c > index 08c7620c7b0f..149360c3ec87 100644 > --- a/flow.c > +++ b/flow.c > @@ -12,6 +12,8 @@ > #include > #include > =20 > +#include > + > #include "util.h" > #include "ip.h" > #include "passt.h" > @@ -129,7 +131,7 @@ static_assert(ARRAY_SIZE(flow_epoll) =3D=3D FLOW_NUM_= TYPES, > =20 > unsigned flow_first_free; > union flow flowtab[FLOW_MAX]; > -static const union flow *flow_new_entry; /* =3D NULL */ > +static _Thread_local const union flow *flow_new_entry; /* =3D NULL */ > int qpair_to_fd[FLOW_QPAIR_SIZE]; > =20 > /* Hash table to index it */ > @@ -142,6 +144,8 @@ static flow_sidx_t flow_hashtab[FLOW_HASH_SIZE]; > static_assert(ARRAY_SIZE(flow_hashtab) >=3D 2 * FLOW_MAX, > "Safe linear probing requires hash table with more entries than the numb= er of sides in the flow table"); > =20 > +static pthread_rwlock_t flow_lock =3D PTHREAD_RWLOCK_INITIALIZER; > + > /** flowside_from_af() - Initialise flowside from addresses > * @side: flowside to initialise > * @af: Address family (AF_INET or AF_INET6) > @@ -616,12 +620,18 @@ void flow_activate(struct flow_common *f) > */ > union flow *flow_alloc(void) > { > - union flow *flow =3D &flowtab[flow_first_free]; > + union flow *flow; > + > + pthread_rwlock_wrlock(&flow_lock); > + > + flow =3D &flowtab[flow_first_free]; > =20 > assert(!flow_new_entry); > =20 > - if (flow_first_free >=3D FLOW_MAX) > + if (flow_first_free >=3D FLOW_MAX) { > + pthread_rwlock_unlock(&flow_lock); > return NULL; > + } > =20 > assert(flow->f.state =3D=3D FLOW_STATE_FREE); > assert(flow->f.type =3D=3D FLOW_TYPE_NONE); > @@ -650,6 +660,8 @@ union flow *flow_alloc(void) > memset(flow, 0, sizeof(*flow)); > flow_set_state(&flow->f, FLOW_STATE_NEW); > =20 > + pthread_rwlock_unlock(&flow_lock); > + > return flow; > } > =20 > @@ -661,6 +673,8 @@ union flow *flow_alloc(void) > */ > void flow_alloc_cancel(union flow *flow) > { > + pthread_rwlock_wrlock(&flow_lock); > + > assert(flow_new_entry =3D=3D flow); > assert(flow->f.state =3D=3D FLOW_STATE_NEW || > flow->f.state =3D=3D FLOW_STATE_INI || > @@ -678,6 +692,8 @@ void flow_alloc_cancel(union flow *flow) > flow->free.next =3D flow_first_free; > flow_first_free =3D FLOW_IDX(flow); > flow_new_entry =3D NULL; > + > + pthread_rwlock_unlock(&flow_lock); > } > =20 > /** > @@ -763,9 +779,13 @@ static inline unsigned flow_hash_probe(const struct = ctx *c, flow_sidx_t sidx) > uint64_t flow_hash_insert(const struct ctx *c, flow_sidx_t sidx) > { > uint64_t hash =3D flow_sidx_hash(c, sidx); > - unsigned b =3D flow_hash_probe_(hash, sidx); > + unsigned b; > =20 > + pthread_rwlock_wrlock(&flow_lock); > + b =3D flow_hash_probe_(hash, sidx); > flow_hashtab[b] =3D sidx; > + pthread_rwlock_unlock(&flow_lock); > + > flow_dbg(flow_at_sidx(sidx), "Side %u hash table insert: bucket: %u", > sidx.sidei, b); > =20 > @@ -779,10 +799,16 @@ uint64_t flow_hash_insert(const struct ctx *c, flow= _sidx_t sidx) > */ > void flow_hash_remove(const struct ctx *c, flow_sidx_t sidx) > { > - unsigned b =3D flow_hash_probe(c, sidx), s; > + unsigned b, s; > =20 > - if (!flow_sidx_valid(flow_hashtab[b])) > + pthread_rwlock_wrlock(&flow_lock); > + > + b =3D flow_hash_probe(c, sidx); > + > + if (!flow_sidx_valid(flow_hashtab[b])) { > + pthread_rwlock_unlock(&flow_lock); > return; /* Redundant remove */ > + } > =20 > flow_dbg(flow_at_sidx(sidx), "Side %u hash table remove: bucket: %u", > sidx.sidei, b); > @@ -802,6 +828,8 @@ void flow_hash_remove(const struct ctx *c, flow_sidx_= t sidx) > } > =20 > flow_hashtab[b] =3D FLOW_SIDX_NONE; > + > + pthread_rwlock_unlock(&flow_lock); > } > =20 > /** > @@ -816,10 +844,12 @@ void flow_hash_remove(const struct ctx *c, flow_sid= x_t sidx) > static flow_sidx_t flowside_lookup(const struct ctx *c, uint8_t proto, > uint8_t pif, const struct flowside *side) > { > - flow_sidx_t sidx; > + flow_sidx_t sidx, ret; > union flow *flow; > unsigned b; > =20 > + pthread_rwlock_rdlock(&flow_lock); > + > b =3D flow_hash(c, proto, pif, side) % FLOW_HASH_SIZE; > while ((sidx =3D flow_hashtab[b], flow =3D flow_at_sidx(sidx)) && > !(FLOW_PROTO(&flow->f) =3D=3D proto && > @@ -827,7 +857,11 @@ static flow_sidx_t flowside_lookup(const struct ctx = *c, uint8_t proto, > flowside_eq(&flow->f.side[sidx.sidei], side))) > b =3D mod_sub(b, 1, FLOW_HASH_SIZE); > =20 > - return flow_hashtab[b]; > + ret =3D flow_hashtab[b]; > + > + pthread_rwlock_unlock(&flow_lock); > + > + return ret; > } > =20 > /** > @@ -920,6 +954,9 @@ void flow_defer_handler(const struct ctx *c, const st= ruct timespec *now, > flow_foreach(flow) { > bool closed =3D false; > =20 > + if (flow->f.qpair !=3D qpair) > + continue; > + > switch (flow->f.type) { > case FLOW_TYPE_NONE: > assert(false); > @@ -951,6 +988,7 @@ void flow_defer_handler(const struct ctx *c, const st= ruct timespec *now, > } > =20 > /* Second step: actually free the flows */ > + pthread_rwlock_wrlock(&flow_lock); > flow_foreach_slot(flow) { > switch (flow->f.state) { > case FLOW_STATE_FREE: { > @@ -979,8 +1017,8 @@ void flow_defer_handler(const struct ctx *c, const s= truct timespec *now, > case FLOW_STATE_INI: > case FLOW_STATE_TGT: > case FLOW_STATE_TYPED: > - /* Incomplete flow at end of cycle */ > - assert(false); > + /* In-progress allocation on another thread */ > + free_head =3D NULL; > break; > =20 > case FLOW_STATE_ACTIVE: > @@ -1012,6 +1050,7 @@ void flow_defer_handler(const struct ctx *c, const = struct timespec *now, > } > =20 > *last_next =3D FLOW_MAX; > + pthread_rwlock_unlock(&flow_lock); > } > =20 > /** > diff --git a/flow_table.h b/flow_table.h > index e4ff6f73c35c..f2545390205a 100644 > --- a/flow_table.h > +++ b/flow_table.h > @@ -72,7 +72,7 @@ extern union flow flowtab[]; > (flow) +=3D (flow)->free.n - 1; \ > /* NOLINTNEXTLINE(readability-inconsistent-ifelse-braces) */\ > else if ((flow)->f.state !=3D FLOW_STATE_ACTIVE) { \ > - flow_err((flow), "Bad flow state during traversal"); \ > + (void)0; /* Differs from bare continue */ \ > continue; \ > } else > =20 > --=20 > 2.54.0 >=20 --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --mPb9vC7QXfdSRO/9 Content-Type: application/pgp-signature; name=signature.asc -----BEGIN PGP SIGNATURE----- iQIzBAEBCgAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmpF1QAACgkQzQJF27ox 2GfuVRAApvQE39VHsrriUGhD4ZEh1u2Y4QvKx1WP/v8kwHD+JjEBegmk2jBhFeyA nbqxHspRCFdp/QQhHrQNRcpWPEqmrOG7mMlD48woqwNut+fllZBDKjJheXbZjz2X ds2XL0BB7Koyr/B2hN9oQ1kPBZBf5YvXz7/qKYiQmzJIVxBPFJ3hO5OitnUz4SIc o+/2Q5oZG1UfVLM1AVq4k40y6nd7NPHqSBYgSMILyQCUdxv1JyO8aMntXAmUFW85 GJfFHjqYBunTngTt6eCQATyst8zbfTu7YiJeW0S9YxQww/ORTzLudCRK8aXufc4x IrdGYEqTThA//A0PS422h/grYtwDO0vz2iYq6IpR+2vSBroLKLqYbDOVLU7UgSYr FPbYTM6cTjWx2N2st5xPNS3s+klmf9gI9uiN8ei5pME43N2vStuYxOcrtk3zkzcN /Pcxk6JYCNVAQOeimAOI/L28qk/vb9E1SAVfuVviEJlyl39ofCthP9/nbDVHw/Uq v3Djnol7rGwRYnN5m6n6Pr8WokpqCpdTsFJsmDIKQkzr4gqARE/CeV1PQ8BiEck5 g7+QjmKhvTFVz2E+E1AUcN3jwi1vH2vb4V+OLLkPCZ6QZK1JZ8kZCh7gy8dfTkzO bLh0oyiiGJWoJmx6beUxcyCY+j1nmg0//vQQ0xDzMCIlk6cF7zo= =o5s+ -----END PGP SIGNATURE----- --mPb9vC7QXfdSRO/9--