From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: passt-dev@passt.top
Subject: Re: [PATCH v3 14/14] flow, fwd: Optimise forwarding rule lookup using epoll ref when possible
Date: Wed, 14 Jan 2026 12:06:04 +1100 [thread overview]
Message-ID: <aWbr_JmvhOOdEWnH@zatzit> (raw)
In-Reply-To: <20260113231303.78fb3471@elisabeth>
[-- Attachment #1: Type: text/plain, Size: 14535 bytes --]
On Tue, Jan 13, 2026 at 11:13:03PM +0100, Stefano Brivio wrote:
> On Thu, 8 Jan 2026 13:29:48 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
>
> > Now that listening sockets include a reference to the forwarding rule which
> > created them we can, in many cases, avoid a linear search of the forwarding
> > table when we want to find the relevant rule. Instead we can take the
> > rule index from the socket's epoll reference, and use that to immediately
> > find the correct rule.
> >
> > This is conceptually simple, but requires a moderate amount of plumbing to
> > get the index from the reference through to the rule lookup. We still
> > allow fall back to linear search if we don't have the index, and this may
> > (rarely) be used in the udp_flush_flow() case, where we could get packets
> > for one flow on a different flow's socket, rather than through a listening
> > socket as usual.
> >
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> > flow.c | 13 +++++++++++--
> > flow_table.h | 2 +-
> > fwd.c | 3 +--
> > fwd.h | 1 +
> > icmp.c | 2 +-
> > tcp.c | 4 ++--
> > udp.c | 14 +++++++++-----
> > udp_flow.c | 14 ++++++++------
> > udp_flow.h | 2 +-
> > udp_internal.h | 4 ++--
> > 10 files changed, 37 insertions(+), 22 deletions(-)
> >
> > diff --git a/flow.c b/flow.c
> > index 38f88732..85759970 100644
> > --- a/flow.c
> > +++ b/flow.c
> > @@ -482,12 +482,13 @@ struct flowside *flow_initiate_sa(union flow *flow, uint8_t pif,
> > * flow_target() - Determine where flow should forward to, and move to TGT
> > * @c: Execution context
> > * @flow: Flow to forward
> > + * @rule_hint: Index of relevant forwarding rule, or -1 if unknown
> > * @proto: Protocol
> > *
> > * Return: pointer to the target flowside information
> > */
> > struct flowside *flow_target(const struct ctx *c, union flow *flow,
> > - uint8_t proto)
> > + int rule_hint, uint8_t proto)
> > {
> > char estr[INANY_ADDRSTRLEN], ostr[INANY_ADDRSTRLEN];
> > struct flow_common *f = &flow->f;
> > @@ -515,7 +516,15 @@ struct flowside *flow_target(const struct ctx *c, union flow *flow,
> > else
> > goto nofwd;
> >
> > - if (!(rule = fwd_rule_search(fwd, ini))) {
> > + if (rule_hint >= 0) {
> > + ASSERT((unsigned)rule_hint < fwd->count);
> > + rule = &fwd->rules[rule_hint];
> > + if (!fwd_rule_match(rule, ini)) {
> > + flow_dbg(flow,
> > + "Unexpected mismatching forward rule");
> > + goto nofwd;
> > + }
> > + } else if (!(rule = fwd_rule_search(fwd, ini))) {
>
> *If* we need this fwd_rule_search() / fwd_rule_match() / else if
> fwd_rule_search() pattern in more places, it might be convenient to
> hide the complexity by taking 'rule_hint' as argument for
> fwd_rule_search() perhaps, and then do something like this in
> fwd_rule_search():
>
> for (i = hint == -1 ? 0 : hint; i < fwd->count; i++) {
> if (fwd_rule_match(&fwd->rules[i], ini))
> return &fwd->rules[i];
> if (hint)
> break;
> }
>
> return NULL;
>
> but I haven't really thought this through. It makes fwd_rule_search() a
> bit ugly.
I considered that approach, but decided against it. However, looking
again I think that might make dealing with the coverity issue in the
earlier patch a bit cleaner, so I'm reconsidering.
> > /* This shouldn't happen, because if there's no rule for
> > * it we should have no listening socket that would let
> > * us get here
> > diff --git a/flow_table.h b/flow_table.h
> > index 5ee13acc..73de13ba 100644
> > --- a/flow_table.h
> > +++ b/flow_table.h
> > @@ -207,7 +207,7 @@ const struct flowside *flow_target_af(union flow *flow, uint8_t pif,
> > const void *saddr, in_port_t sport,
> > const void *daddr, in_port_t dport);
> > struct flowside *flow_target(const struct ctx *c, union flow *flow,
> > - uint8_t proto);
> > + int rule_hint, uint8_t proto);
> >
> > union flow *flow_set_type(union flow *flow, enum flow_type type);
> > #define FLOW_SET_TYPE(flow_, t_, var_) (&flow_set_type((flow_), (t_))->var_)
> > diff --git a/fwd.c b/fwd.c
> > index 6727d26f..250b470c 100644
> > --- a/fwd.c
> > +++ b/fwd.c
> > @@ -415,8 +415,7 @@ void fwd_rule_add(struct fwd_ports *fwd, uint8_t flags,
> > *
> > * Returns: true if the rule applies to the flow, false otherwise
> > */
> > -static bool fwd_rule_match(const struct fwd_rule *rule,
> > - const struct flowside *ini)
> > +bool fwd_rule_match(const struct fwd_rule *rule, const struct flowside *ini)
> > {
> > return inany_matches(&ini->oaddr, fwd_rule_addr(rule)) &&
> > ini->oport >= rule->first && ini->oport <= rule->last;
> > diff --git a/fwd.h b/fwd.h
> > index 435f422a..2af7791d 100644
> > --- a/fwd.h
> > +++ b/fwd.h
> > @@ -106,6 +106,7 @@ struct fwd_ports {
> > void fwd_rule_add(struct fwd_ports *fwd, uint8_t flags,
> > const union inany_addr *addr, const char *ifname,
> > in_port_t first, in_port_t last, in_port_t to);
> > +bool fwd_rule_match(const struct fwd_rule *rule, const struct flowside *ini);
> > const struct fwd_rule *fwd_rule_search(const struct fwd_ports *fwd,
> > const struct flowside *ini);
> > void fwd_rules_print(const struct fwd_ports *fwd);
> > diff --git a/icmp.c b/icmp.c
> > index 9564c496..0f4d48bb 100644
> > --- a/icmp.c
> > +++ b/icmp.c
> > @@ -183,7 +183,7 @@ static struct icmp_ping_flow *icmp_ping_new(const struct ctx *c,
> > return NULL;
> >
> > flow_initiate_af(flow, PIF_TAP, af, saddr, id, daddr, id);
> > - if (!(tgt = flow_target(c, flow, proto)))
> > + if (!(tgt = flow_target(c, flow, -1, proto)))
> > goto cancel;
> >
> > if (flow->f.pif[TGTSIDE] != PIF_HOST) {
> > diff --git a/tcp.c b/tcp.c
> > index fc03e38f..89b22a51 100644
> > --- a/tcp.c
> > +++ b/tcp.c
> > @@ -1657,7 +1657,7 @@ static void tcp_conn_from_tap(const struct ctx *c, sa_family_t af,
> > ini = flow_initiate_af(flow, PIF_TAP,
> > af, saddr, srcport, daddr, dstport);
> >
> > - if (!(tgt = flow_target(c, flow, IPPROTO_TCP)))
> > + if (!(tgt = flow_target(c, flow, -1, IPPROTO_TCP)))
> > goto cancel;
> >
> > if (flow->f.pif[TGTSIDE] != PIF_HOST) {
> > @@ -2496,7 +2496,7 @@ void tcp_listen_handler(const struct ctx *c, union epoll_ref ref,
> > goto cancel;
> > }
> >
> > - if (!flow_target(c, flow, IPPROTO_TCP))
> > + if (!flow_target(c, flow, ref.listen.rule, IPPROTO_TCP))
> > goto cancel;
> >
> > switch (flow->f.pif[TGTSIDE]) {
> > diff --git a/udp.c b/udp.c
> > index 761221f6..d7e47e52 100644
> > --- a/udp.c
> > +++ b/udp.c
> > @@ -838,12 +838,13 @@ static void udp_buf_sock_to_tap(const struct ctx *c, int s, int n,
> > * udp_sock_fwd() - Forward datagrams from a possibly unconnected socket
> > * @c: Execution context
> > * @s: Socket to forward from
> > + * @rule_hint: Forwarding rule to use, or -1 if unknown
> > * @frompif: Interface to which @s belongs
> > * @port: Our (local) port number of @s
> > * @now: Current timestamp
> > */
> > -void udp_sock_fwd(const struct ctx *c, int s, uint8_t frompif,
> > - in_port_t port, const struct timespec *now)
> > +void udp_sock_fwd(const struct ctx *c, int s, int rule_hint,
> > + uint8_t frompif, in_port_t port, const struct timespec *now)
> > {
> > union sockaddr_inany src;
> > union inany_addr dst;
> > @@ -868,7 +869,8 @@ void udp_sock_fwd(const struct ctx *c, int s, uint8_t frompif,
> > continue;
> > }
> >
> > - tosidx = udp_flow_from_sock(c, frompif, &dst, port, &src, now);
> > + tosidx = udp_flow_from_sock(c, frompif, &dst, port, &src,
> > + rule_hint, now);
> > topif = pif_at_sidx(tosidx);
> >
> > if (pif_is_socket(topif)) {
> > @@ -910,8 +912,10 @@ void udp_listen_sock_handler(const struct ctx *c,
> > union epoll_ref ref, uint32_t events,
> > const struct timespec *now)
> > {
> > - if (events & (EPOLLERR | EPOLLIN))
> > - udp_sock_fwd(c, ref.fd, ref.listen.pif, ref.listen.port, now);
> > + if (events & (EPOLLERR | EPOLLIN)) {
> > + udp_sock_fwd(c, ref.fd, ref.listen.rule,
> > + ref.listen.pif, ref.listen.port, now);
> > + }
> > }
> >
> > /**
> > diff --git a/udp_flow.c b/udp_flow.c
> > index 8907f2f7..b82c6c06 100644
> > --- a/udp_flow.c
> > +++ b/udp_flow.c
> > @@ -139,6 +139,7 @@ static int udp_flow_sock(const struct ctx *c,
> > * udp_flow_new() - Common setup for a new UDP flow
> > * @c: Execution context
> > * @flow: Initiated flow
> > + * @rule_hint: Index of forwarding rule, or -1 if unknown
> > * @now: Timestamp
> > *
> > * Return: sidx for the target side of the new UDP flow, or FLOW_SIDX_NONE
> > @@ -147,13 +148,13 @@ static int udp_flow_sock(const struct ctx *c,
> > * #syscalls getsockname
> > */
> > static flow_sidx_t udp_flow_new(const struct ctx *c, union flow *flow,
> > - const struct timespec *now)
> > + int rule_hint, const struct timespec *now)
> > {
> > struct udp_flow *uflow = NULL;
> > const struct flowside *tgt;
> > unsigned sidei;
> >
> > - if (!(tgt = flow_target(c, flow, IPPROTO_UDP)))
> > + if (!(tgt = flow_target(c, flow, rule_hint, IPPROTO_UDP)))
> > goto cancel;
> >
> > uflow = FLOW_SET_TYPE(flow, FLOW_UDP, udp);
> > @@ -216,6 +217,7 @@ cancel:
> > * @dst: Our (local) address to which the datagram is arriving
> > * @port: Our (local) port number to which the datagram is arriving
> > * @s_in: Source socket address, filled in by recvmmsg()
> > + * @rule_hint: Index of forwarding rule, or -1 if unknown
> > * @now: Timestamp
> > *
> > * #syscalls fcntl arm:fcntl64 ppc64:fcntl64|fcntl i686:fcntl64
> > @@ -226,7 +228,7 @@ cancel:
> > flow_sidx_t udp_flow_from_sock(const struct ctx *c, uint8_t pif,
> > const union inany_addr *dst, in_port_t port,
> > const union sockaddr_inany *s_in,
> > - const struct timespec *now)
> > + int rule_hint, const struct timespec *now)
> > {
> > const struct flowside *ini;
> > struct udp_flow *uflow;
> > @@ -260,7 +262,7 @@ flow_sidx_t udp_flow_from_sock(const struct ctx *c, uint8_t pif,
> > return FLOW_SIDX_NONE;
> > }
> >
> > - return udp_flow_new(c, flow, now);
> > + return udp_flow_new(c, flow, rule_hint, now);
> > }
> >
> > /**
> > @@ -316,7 +318,7 @@ flow_sidx_t udp_flow_from_tap(const struct ctx *c,
> > return FLOW_SIDX_NONE;
> > }
> >
> > - return udp_flow_new(c, flow, now);
> > + return udp_flow_new(c, flow, -1, now);
> > }
> >
> > /**
> > @@ -332,7 +334,7 @@ static void udp_flush_flow(const struct ctx *c,
> > {
> > /* We don't know exactly where the datagrams will come from, but we know
> > * they'll have an interface and oport matching this flow */
> > - udp_sock_fwd(c, uflow->s[sidei], uflow->f.pif[sidei],
> > + udp_sock_fwd(c, -1, uflow->s[sidei], uflow->f.pif[sidei],
>
> I guess this is fixed on your local branch because this patch (and only
> this one) applied with some fuzz, but in case it's not... this passes
> -1 as socket ('s' argument), not as 'rule_hint'.
Oops, no. Not sure where the fuzz came from, but this was wrong in my
tree..
> If I swap it with uflow->s[sidei] it all works as expected.
.. but not any more.
> > uflow->f.side[sidei].oport, now);
> > }
> >
> > diff --git a/udp_flow.h b/udp_flow.h
> > index 4c528e95..14e0f920 100644
> > --- a/udp_flow.h
> > +++ b/udp_flow.h
> > @@ -35,7 +35,7 @@ struct udp_flow *udp_at_sidx(flow_sidx_t sidx);
> > flow_sidx_t udp_flow_from_sock(const struct ctx *c, uint8_t pif,
> > const union inany_addr *dst, in_port_t port,
> > const union sockaddr_inany *s_in,
> > - const struct timespec *now);
> > + int rule_hint, const struct timespec *now);
> > flow_sidx_t udp_flow_from_tap(const struct ctx *c,
> > uint8_t pif, sa_family_t af,
> > const void *saddr, const void *daddr,
> > diff --git a/udp_internal.h b/udp_internal.h
> > index 96d11cff..f0ce8f14 100644
> > --- a/udp_internal.h
> > +++ b/udp_internal.h
> > @@ -28,7 +28,7 @@ size_t udp_update_hdr4(struct iphdr *ip4h, struct udp_payload_t *bp,
> > size_t udp_update_hdr6(struct ipv6hdr *ip6h, struct udp_payload_t *bp,
> > const struct flowside *toside, size_t dlen,
> > bool no_udp_csum);
> > -void udp_sock_fwd(const struct ctx *c, int s, uint8_t frompif,
> > - in_port_t port, const struct timespec *now);
> > +void udp_sock_fwd(const struct ctx *c, int s, int rule_hint,
> > + uint8_t frompif, in_port_t port, const struct timespec *now);
> >
> > #endif /* UDP_INTERNAL_H */
>
> So, uh, having reached this point, I was asking myself "is this really
> everything"? :)
>
> Well, it's not exactly everything as you mentioned in the cover letter,
> perhaps the biggest missing part (other than client / interface) is
> outbound forwarding, but still, reaching this point with just 14
> relatively small patches is quite impressive.
>
> I guess it's also merit of how generic / well structured the flow table
> and especially the flow "stages" are.
Thank you?
As also noted in the cover letter, I think this is now complete enough
to be interesting. There is a lot still to do though:
* Allow configuration of the target address as well as matching address
* Allow addresses to be configured for outbound / splice mappings
* Allow n to 1 port remapping
* Remove port bitmap (or rather make them local variables only)
* This is a bit complicated because if we want to implement port
scanning for passt, we might need it to be persistent again
* Allow "auto" mappings to be configured
* Allow "auto" mappings to be for specific bind addresses
* Outbound forwarding table
* Configuring specific NATs
* ? Merging separate tcp/udp in/out tables into one
* Remove global forwarding mode (other than as a local during conf)
* Probably a bunch more I'm not remembering
And, of course, dynamic updates.
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
prev parent reply other threads:[~2026-01-14 1:06 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-01-08 2:29 [PATCH v3 00/14] Introduce forwarding table David Gibson
2026-01-08 2:29 ` [PATCH v3 01/14] inany: Extend inany_ntop() to treat NULL as a fully unspecified address David Gibson
2026-01-08 13:16 ` Laurent Vivier
2026-01-08 2:29 ` [PATCH v3 02/14] conf, fwd: Keep a table of our port forwarding configuration David Gibson
2026-01-12 23:26 ` Stefano Brivio
2026-01-13 5:12 ` David Gibson
2026-01-13 22:13 ` Stefano Brivio
2026-01-13 23:53 ` David Gibson
2026-01-08 2:29 ` [PATCH v3 03/14] conf: Accurately record ifname and address for outbound forwards David Gibson
2026-01-08 2:29 ` [PATCH v3 04/14] conf, fwd: Record "auto" port forwards in forwarding table David Gibson
2026-01-12 23:26 ` Stefano Brivio
2026-01-13 5:14 ` David Gibson
2026-01-08 2:29 ` [PATCH v3 05/14] fwd: Make space to store listening sockets in forward table David Gibson
2026-01-12 23:26 ` Stefano Brivio
2026-01-13 5:28 ` David Gibson
2026-01-13 22:13 ` Stefano Brivio
2026-01-13 23:57 ` David Gibson
2026-01-08 2:29 ` [PATCH v3 06/14] ip: Add ipproto_name() function David Gibson
2026-01-08 13:22 ` Laurent Vivier
2026-01-08 23:12 ` David Gibson
2026-01-08 2:29 ` [PATCH v3 07/14] fwd, tcp, udp: Set up listening sockets based on forward table David Gibson
2026-01-12 23:26 ` Stefano Brivio
2026-01-13 5:38 ` David Gibson
2026-01-13 22:13 ` Stefano Brivio
2026-01-13 23:59 ` David Gibson
2026-01-08 2:29 ` [PATCH v3 08/14] tcp, udp: Remove old auto-forwarding socket arrays David Gibson
2026-01-08 2:29 ` [PATCH v3 09/14] conf, fwd: Check forwarding table for conflicting rules David Gibson
2026-01-12 23:26 ` Stefano Brivio
2026-01-13 5:41 ` David Gibson
2026-01-08 2:29 ` [PATCH v3 10/14] fwd: Generate auto-forward exclusions from socket fd tables David Gibson
2026-01-08 2:29 ` [PATCH v3 11/14] flow, fwd: Consult rules table when forwarding a new flow from socket David Gibson
2026-01-13 22:12 ` Stefano Brivio
2026-01-14 0:09 ` David Gibson
2026-01-08 2:29 ` [PATCH v3 12/14] fwd: Remap ports based directly on forwarding rule David Gibson
2026-01-13 22:12 ` Stefano Brivio
2026-01-14 0:24 ` David Gibson
2026-01-08 2:29 ` [PATCH v3 13/14] fwd, tcp, udp: Add forwarding rule to listening socket epoll references David Gibson
2026-01-13 22:12 ` Stefano Brivio
2026-01-14 0:37 ` David Gibson
2026-01-08 2:29 ` [PATCH v3 14/14] flow, fwd: Optimise forwarding rule lookup using epoll ref when possible David Gibson
2026-01-13 22:13 ` Stefano Brivio
2026-01-14 1:06 ` David Gibson [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aWbr_JmvhOOdEWnH@zatzit \
--to=david@gibson.dropbear.id.au \
--cc=passt-dev@passt.top \
--cc=sbrivio@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).