From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202512 header.b=dfO7dRWe; dkim-atps=neutral Received: from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by passt.top (Postfix) with ESMTPS id 54B505A004E for ; Wed, 14 Jan 2026 02:06:12 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202512; t=1768352770; bh=H5oJiLZdC4CaQ3WZBEV2fzeGpKxAhBWjeel+jMI454g=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=dfO7dRWei+aIW4kvEKedk4Cww2G+GAaKN9o3Zp80C7SQG+TPXZomTMo/iLJVjieLT jFfmVBrtLmHs4adDC5xRReVs+SIgJJlic6ujsh5p++f0B27Dr8W8KnaNHGGdb9Jad9 FKCV1ZNgpoVh4cQ32/UK47wIVnAXhrVxXwXnqz3NNd0g9HP3gPBDFfL8vb2zwoA7yu PGJgVoum/rvQfDVw+/o02mIf4lk7BcNqokdd3vKbbhsnfPkJOR4ncxuBUf807vY1LL txrZGL3irK4MP+3Mkfnw555ojQAjs0aGyBW6It8nerezAtOFzmjunbIV/teXqhNMAL iPxqTvGaKhYXg== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4drSc211KTz4wDk; Wed, 14 Jan 2026 12:06:10 +1100 (AEDT) Date: Wed, 14 Jan 2026 12:06:04 +1100 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH v3 14/14] flow, fwd: Optimise forwarding rule lookup using epoll ref when possible Message-ID: References: <20260108022948.2657573-1-david@gibson.dropbear.id.au> <20260108022948.2657573-15-david@gibson.dropbear.id.au> <20260113231303.78fb3471@elisabeth> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="/I4XIg/8e3JaR9IC" Content-Disposition: inline In-Reply-To: <20260113231303.78fb3471@elisabeth> Message-ID-Hash: R4DK75NCHIUB7T7YB6DD4WC7JNAOZRRP X-Message-ID-Hash: R4DK75NCHIUB7T7YB6DD4WC7JNAOZRRP X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --/I4XIg/8e3JaR9IC Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Jan 13, 2026 at 11:13:03PM +0100, Stefano Brivio wrote: > On Thu, 8 Jan 2026 13:29:48 +1100 > David Gibson wrote: >=20 > > Now that listening sockets include a reference to the forwarding rule w= hich > > created them we can, in many cases, avoid a linear search of the forwar= ding > > table when we want to find the relevant rule. Instead we can take the > > rule index from the socket's epoll reference, and use that to immediate= ly > > find the correct rule. > >=20 > > This is conceptually simple, but requires a moderate amount of plumbing= to > > get the index from the reference through to the rule lookup. We still > > allow fall back to linear search if we don't have the index, and this m= ay > > (rarely) be used in the udp_flush_flow() case, where we could get packe= ts > > for one flow on a different flow's socket, rather than through a listen= ing > > socket as usual. > >=20 > > Signed-off-by: David Gibson > > --- > > flow.c | 13 +++++++++++-- > > flow_table.h | 2 +- > > fwd.c | 3 +-- > > fwd.h | 1 + > > icmp.c | 2 +- > > tcp.c | 4 ++-- > > udp.c | 14 +++++++++----- > > udp_flow.c | 14 ++++++++------ > > udp_flow.h | 2 +- > > udp_internal.h | 4 ++-- > > 10 files changed, 37 insertions(+), 22 deletions(-) > >=20 > > diff --git a/flow.c b/flow.c > > index 38f88732..85759970 100644 > > --- a/flow.c > > +++ b/flow.c > > @@ -482,12 +482,13 @@ struct flowside *flow_initiate_sa(union flow *flo= w, uint8_t pif, > > * flow_target() - Determine where flow should forward to, and move to= TGT > > * @c: Execution context > > * @flow: Flow to forward > > + * @rule_hint: Index of relevant forwarding rule, or -1 if unknown > > * @proto: Protocol > > * > > * Return: pointer to the target flowside information > > */ > > struct flowside *flow_target(const struct ctx *c, union flow *flow, > > - uint8_t proto) > > + int rule_hint, uint8_t proto) > > { > > char estr[INANY_ADDRSTRLEN], ostr[INANY_ADDRSTRLEN]; > > struct flow_common *f =3D &flow->f; > > @@ -515,7 +516,15 @@ struct flowside *flow_target(const struct ctx *c, = union flow *flow, > > else > > goto nofwd; > > =20 > > - if (!(rule =3D fwd_rule_search(fwd, ini))) { > > + if (rule_hint >=3D 0) { > > + ASSERT((unsigned)rule_hint < fwd->count); > > + rule =3D &fwd->rules[rule_hint]; > > + if (!fwd_rule_match(rule, ini)) { > > + flow_dbg(flow, > > + "Unexpected mismatching forward rule"); > > + goto nofwd; > > + } > > + } else if (!(rule =3D fwd_rule_search(fwd, ini))) { >=20 > *If* we need this fwd_rule_search() / fwd_rule_match() / else if > fwd_rule_search() pattern in more places, it might be convenient to > hide the complexity by taking 'rule_hint' as argument for > fwd_rule_search() perhaps, and then do something like this in > fwd_rule_search(): >=20 > for (i =3D hint =3D=3D -1 ? 0 : hint; i < fwd->count; i++) { > if (fwd_rule_match(&fwd->rules[i], ini)) > return &fwd->rules[i]; > if (hint) > break; > } >=20 > return NULL; >=20 > but I haven't really thought this through. It makes fwd_rule_search() a > bit ugly. I considered that approach, but decided against it. However, looking again I think that might make dealing with the coverity issue in the earlier patch a bit cleaner, so I'm reconsidering. > > /* This shouldn't happen, because if there's no rule for > > * it we should have no listening socket that would let > > * us get here > > diff --git a/flow_table.h b/flow_table.h > > index 5ee13acc..73de13ba 100644 > > --- a/flow_table.h > > +++ b/flow_table.h > > @@ -207,7 +207,7 @@ const struct flowside *flow_target_af(union flow *f= low, uint8_t pif, > > const void *saddr, in_port_t sport, > > const void *daddr, in_port_t dport); > > struct flowside *flow_target(const struct ctx *c, union flow *flow, > > - uint8_t proto); > > + int rule_hint, uint8_t proto); > > =20 > > union flow *flow_set_type(union flow *flow, enum flow_type type); > > #define FLOW_SET_TYPE(flow_, t_, var_) (&flow_set_type((flow_), (t_))-= >var_) > > diff --git a/fwd.c b/fwd.c > > index 6727d26f..250b470c 100644 > > --- a/fwd.c > > +++ b/fwd.c > > @@ -415,8 +415,7 @@ void fwd_rule_add(struct fwd_ports *fwd, uint8_t fl= ags, > > * > > * Returns: true if the rule applies to the flow, false otherwise > > */ > > -static bool fwd_rule_match(const struct fwd_rule *rule, > > - const struct flowside *ini) > > +bool fwd_rule_match(const struct fwd_rule *rule, const struct flowside= *ini) > > { > > return inany_matches(&ini->oaddr, fwd_rule_addr(rule)) && > > ini->oport >=3D rule->first && ini->oport <=3D rule->last; > > diff --git a/fwd.h b/fwd.h > > index 435f422a..2af7791d 100644 > > --- a/fwd.h > > +++ b/fwd.h > > @@ -106,6 +106,7 @@ struct fwd_ports { > > void fwd_rule_add(struct fwd_ports *fwd, uint8_t flags, > > const union inany_addr *addr, const char *ifname, > > in_port_t first, in_port_t last, in_port_t to); > > +bool fwd_rule_match(const struct fwd_rule *rule, const struct flowside= *ini); > > const struct fwd_rule *fwd_rule_search(const struct fwd_ports *fwd, > > const struct flowside *ini); > > void fwd_rules_print(const struct fwd_ports *fwd); > > diff --git a/icmp.c b/icmp.c > > index 9564c496..0f4d48bb 100644 > > --- a/icmp.c > > +++ b/icmp.c > > @@ -183,7 +183,7 @@ static struct icmp_ping_flow *icmp_ping_new(const s= truct ctx *c, > > return NULL; > > =20 > > flow_initiate_af(flow, PIF_TAP, af, saddr, id, daddr, id); > > - if (!(tgt =3D flow_target(c, flow, proto))) > > + if (!(tgt =3D flow_target(c, flow, -1, proto))) > > goto cancel; > > =20 > > if (flow->f.pif[TGTSIDE] !=3D PIF_HOST) { > > diff --git a/tcp.c b/tcp.c > > index fc03e38f..89b22a51 100644 > > --- a/tcp.c > > +++ b/tcp.c > > @@ -1657,7 +1657,7 @@ static void tcp_conn_from_tap(const struct ctx *c= , sa_family_t af, > > ini =3D flow_initiate_af(flow, PIF_TAP, > > af, saddr, srcport, daddr, dstport); > > =20 > > - if (!(tgt =3D flow_target(c, flow, IPPROTO_TCP))) > > + if (!(tgt =3D flow_target(c, flow, -1, IPPROTO_TCP))) > > goto cancel; > > =20 > > if (flow->f.pif[TGTSIDE] !=3D PIF_HOST) { > > @@ -2496,7 +2496,7 @@ void tcp_listen_handler(const struct ctx *c, unio= n epoll_ref ref, > > goto cancel; > > } > > =20 > > - if (!flow_target(c, flow, IPPROTO_TCP)) > > + if (!flow_target(c, flow, ref.listen.rule, IPPROTO_TCP)) > > goto cancel; > > =20 > > switch (flow->f.pif[TGTSIDE]) { > > diff --git a/udp.c b/udp.c > > index 761221f6..d7e47e52 100644 > > --- a/udp.c > > +++ b/udp.c > > @@ -838,12 +838,13 @@ static void udp_buf_sock_to_tap(const struct ctx = *c, int s, int n, > > * udp_sock_fwd() - Forward datagrams from a possibly unconnected sock= et > > * @c: Execution context > > * @s: Socket to forward from > > + * @rule_hint: Forwarding rule to use, or -1 if unknown > > * @frompif: Interface to which @s belongs > > * @port: Our (local) port number of @s > > * @now: Current timestamp > > */ > > -void udp_sock_fwd(const struct ctx *c, int s, uint8_t frompif, > > - in_port_t port, const struct timespec *now) > > +void udp_sock_fwd(const struct ctx *c, int s, int rule_hint, > > + uint8_t frompif, in_port_t port, const struct timespec *now) > > { > > union sockaddr_inany src; > > union inany_addr dst; > > @@ -868,7 +869,8 @@ void udp_sock_fwd(const struct ctx *c, int s, uint8= _t frompif, > > continue; > > } > > =20 > > - tosidx =3D udp_flow_from_sock(c, frompif, &dst, port, &src, now); > > + tosidx =3D udp_flow_from_sock(c, frompif, &dst, port, &src, > > + rule_hint, now); > > topif =3D pif_at_sidx(tosidx); > > =20 > > if (pif_is_socket(topif)) { > > @@ -910,8 +912,10 @@ void udp_listen_sock_handler(const struct ctx *c, > > union epoll_ref ref, uint32_t events, > > const struct timespec *now) > > { > > - if (events & (EPOLLERR | EPOLLIN)) > > - udp_sock_fwd(c, ref.fd, ref.listen.pif, ref.listen.port, now); > > + if (events & (EPOLLERR | EPOLLIN)) { > > + udp_sock_fwd(c, ref.fd, ref.listen.rule, > > + ref.listen.pif, ref.listen.port, now); > > + } > > } > > =20 > > /** > > diff --git a/udp_flow.c b/udp_flow.c > > index 8907f2f7..b82c6c06 100644 > > --- a/udp_flow.c > > +++ b/udp_flow.c > > @@ -139,6 +139,7 @@ static int udp_flow_sock(const struct ctx *c, > > * udp_flow_new() - Common setup for a new UDP flow > > * @c: Execution context > > * @flow: Initiated flow > > + * @rule_hint: Index of forwarding rule, or -1 if unknown > > * @now: Timestamp > > * > > * Return: sidx for the target side of the new UDP flow, or FLOW_SIDX_= NONE > > @@ -147,13 +148,13 @@ static int udp_flow_sock(const struct ctx *c, > > * #syscalls getsockname > > */ > > static flow_sidx_t udp_flow_new(const struct ctx *c, union flow *flow, > > - const struct timespec *now) > > + int rule_hint, const struct timespec *now) > > { > > struct udp_flow *uflow =3D NULL; > > const struct flowside *tgt; > > unsigned sidei; > > =20 > > - if (!(tgt =3D flow_target(c, flow, IPPROTO_UDP))) > > + if (!(tgt =3D flow_target(c, flow, rule_hint, IPPROTO_UDP))) > > goto cancel; > > =20 > > uflow =3D FLOW_SET_TYPE(flow, FLOW_UDP, udp); > > @@ -216,6 +217,7 @@ cancel: > > * @dst: Our (local) address to which the datagram is arriving > > * @port: Our (local) port number to which the datagram is arriving > > * @s_in: Source socket address, filled in by recvmmsg() > > + * @rule_hint: Index of forwarding rule, or -1 if unknown > > * @now: Timestamp > > * > > * #syscalls fcntl arm:fcntl64 ppc64:fcntl64|fcntl i686:fcntl64 > > @@ -226,7 +228,7 @@ cancel: > > flow_sidx_t udp_flow_from_sock(const struct ctx *c, uint8_t pif, > > const union inany_addr *dst, in_port_t port, > > const union sockaddr_inany *s_in, > > - const struct timespec *now) > > + int rule_hint, const struct timespec *now) > > { > > const struct flowside *ini; > > struct udp_flow *uflow; > > @@ -260,7 +262,7 @@ flow_sidx_t udp_flow_from_sock(const struct ctx *c,= uint8_t pif, > > return FLOW_SIDX_NONE; > > } > > =20 > > - return udp_flow_new(c, flow, now); > > + return udp_flow_new(c, flow, rule_hint, now); > > } > > =20 > > /** > > @@ -316,7 +318,7 @@ flow_sidx_t udp_flow_from_tap(const struct ctx *c, > > return FLOW_SIDX_NONE; > > } > > =20 > > - return udp_flow_new(c, flow, now); > > + return udp_flow_new(c, flow, -1, now); > > } > > =20 > > /** > > @@ -332,7 +334,7 @@ static void udp_flush_flow(const struct ctx *c, > > { > > /* We don't know exactly where the datagrams will come from, but we k= now > > * they'll have an interface and oport matching this flow */ > > - udp_sock_fwd(c, uflow->s[sidei], uflow->f.pif[sidei], > > + udp_sock_fwd(c, -1, uflow->s[sidei], uflow->f.pif[sidei], >=20 > I guess this is fixed on your local branch because this patch (and only > this one) applied with some fuzz, but in case it's not... this passes > -1 as socket ('s' argument), not as 'rule_hint'. Oops, no. Not sure where the fuzz came from, but this was wrong in my tree.. > If I swap it with uflow->s[sidei] it all works as expected. =2E. but not any more. > > uflow->f.side[sidei].oport, now); > > } > > =20 > > diff --git a/udp_flow.h b/udp_flow.h > > index 4c528e95..14e0f920 100644 > > --- a/udp_flow.h > > +++ b/udp_flow.h > > @@ -35,7 +35,7 @@ struct udp_flow *udp_at_sidx(flow_sidx_t sidx); > > flow_sidx_t udp_flow_from_sock(const struct ctx *c, uint8_t pif, > > const union inany_addr *dst, in_port_t port, > > const union sockaddr_inany *s_in, > > - const struct timespec *now); > > + int rule_hint, const struct timespec *now); > > flow_sidx_t udp_flow_from_tap(const struct ctx *c, > > uint8_t pif, sa_family_t af, > > const void *saddr, const void *daddr, > > diff --git a/udp_internal.h b/udp_internal.h > > index 96d11cff..f0ce8f14 100644 > > --- a/udp_internal.h > > +++ b/udp_internal.h > > @@ -28,7 +28,7 @@ size_t udp_update_hdr4(struct iphdr *ip4h, struct udp= _payload_t *bp, > > size_t udp_update_hdr6(struct ipv6hdr *ip6h, struct udp_payload_t *bp, > > const struct flowside *toside, size_t dlen, > > bool no_udp_csum); > > -void udp_sock_fwd(const struct ctx *c, int s, uint8_t frompif, > > - in_port_t port, const struct timespec *now); > > +void udp_sock_fwd(const struct ctx *c, int s, int rule_hint, > > + uint8_t frompif, in_port_t port, const struct timespec *now); > > =20 > > #endif /* UDP_INTERNAL_H */ >=20 > So, uh, having reached this point, I was asking myself "is this really > everything"? :) >=20 > Well, it's not exactly everything as you mentioned in the cover letter, > perhaps the biggest missing part (other than client / interface) is > outbound forwarding, but still, reaching this point with just 14 > relatively small patches is quite impressive. >=20 > I guess it's also merit of how generic / well structured the flow table > and especially the flow "stages" are. Thank you? As also noted in the cover letter, I think this is now complete enough to be interesting. There is a lot still to do though: * Allow configuration of the target address as well as matching address * Allow addresses to be configured for outbound / splice mappings * Allow n to 1 port remapping * Remove port bitmap (or rather make them local variables only) * This is a bit complicated because if we want to implement port scanning for passt, we might need it to be persistent again * Allow "auto" mappings to be configured * Allow "auto" mappings to be for specific bind addresses * Outbound forwarding table * Configuring specific NATs * ? Merging separate tcp/udp in/out tables into one * Remove global forwarding mode (other than as a local during conf) * Probably a bunch more I'm not remembering And, of course, dynamic updates.=20 --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --/I4XIg/8e3JaR9IC Content-Type: application/pgp-signature; name=signature.asc -----BEGIN PGP SIGNATURE----- iQIzBAEBCgAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmlm6/sACgkQzQJF27ox 2GeNLw//Xr23rK7fozL0cD2GIroLgxkLyugDFLGsZqQehRWbZDuuitqgvT/oo0Dr DrwOBe0YeXM/jnbJveGuoNpZGQqLFW79vVB5sYBhL19sOwAZYuDrEvtRVkuxjlQZ 0Lumsw4KvLiZ9Si85GBnyNr9VOOLZNIh/4mMCv8wmqaZXbp0tYHB3u5lWR5dU71d qBQ1BRhBHEKj4THMCp+5tGVbo1pbekxyFk/k8iSxO1c4FKk7HYdbAqeoGjhN1fHA 5i9mZ7c+dodxWiXGEvOUAX/pC0E1upiVr8u9H3PzEizKeo9qtLR0+Pg9v9LdxLu9 c5dI8aBEbFNdnbZW2c+yQU1KwQnWRBZzL9vNqMjZaVTPqeJeEPC9i8Wj3cNYt4qC RP9b5GTrDe/aG1T6gnITQgdtJOyC3IijvxBgTUjwQ6D1lNFGiQsdJLBLZ91tmrMk azdjBEhADviGxWVzolwu+8OldHonmHlCtJUkYo3RPhP3BycNy94waNDOJuR3+dAG UsXuOYAsFXlMNKJCcYM+1URpADdkZGaX9FQgd4r1wdFxu/0cGqW5RBtkB3e6bPqL 16i5lai9SvcmSPqVA0gbnCu4eO1nVcdcAwQL1SOL8GggY4eUn92R0xKp9fCG1QED iHLbceyMG1WUbSMxpoueIp5NrNTkbkQRDpAj0Zsfxw5n/K9W1mk= =UHOe -----END PGP SIGNATURE----- --/I4XIg/8e3JaR9IC--