public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: passt-dev@passt.top
Subject: Re: [PATCH v3 14/14] flow, fwd: Optimise forwarding rule lookup using epoll ref when possible
Date: Wed, 14 Jan 2026 12:06:04 +1100	[thread overview]
Message-ID: <aWbr_JmvhOOdEWnH@zatzit> (raw)
In-Reply-To: <20260113231303.78fb3471@elisabeth>

[-- Attachment #1: Type: text/plain, Size: 14535 bytes --]

On Tue, Jan 13, 2026 at 11:13:03PM +0100, Stefano Brivio wrote:
> On Thu,  8 Jan 2026 13:29:48 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > Now that listening sockets include a reference to the forwarding rule which
> > created them we can, in many cases, avoid a linear search of the forwarding
> > table when we want to find the relevant rule.  Instead we can take the
> > rule index from the socket's epoll reference, and use that to immediately
> > find the correct rule.
> > 
> > This is conceptually simple, but requires a moderate amount of plumbing to
> > get the index from the reference through to the rule lookup.  We still
> > allow fall back to linear search if we don't have the index, and this may
> > (rarely) be used in the udp_flush_flow() case, where we could get packets
> > for one flow on a different flow's socket, rather than through a listening
> > socket as usual.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  flow.c         | 13 +++++++++++--
> >  flow_table.h   |  2 +-
> >  fwd.c          |  3 +--
> >  fwd.h          |  1 +
> >  icmp.c         |  2 +-
> >  tcp.c          |  4 ++--
> >  udp.c          | 14 +++++++++-----
> >  udp_flow.c     | 14 ++++++++------
> >  udp_flow.h     |  2 +-
> >  udp_internal.h |  4 ++--
> >  10 files changed, 37 insertions(+), 22 deletions(-)
> > 
> > diff --git a/flow.c b/flow.c
> > index 38f88732..85759970 100644
> > --- a/flow.c
> > +++ b/flow.c
> > @@ -482,12 +482,13 @@ struct flowside *flow_initiate_sa(union flow *flow, uint8_t pif,
> >   * flow_target() - Determine where flow should forward to, and move to TGT
> >   * @c:		Execution context
> >   * @flow:	Flow to forward
> > + * @rule_hint:	Index of relevant forwarding rule, or -1 if unknown
> >   * @proto:	Protocol
> >   *
> >   * Return: pointer to the target flowside information
> >   */
> >  struct flowside *flow_target(const struct ctx *c, union flow *flow,
> > -			     uint8_t proto)
> > +			     int rule_hint, uint8_t proto)
> >  {
> >  	char estr[INANY_ADDRSTRLEN], ostr[INANY_ADDRSTRLEN];
> >  	struct flow_common *f = &flow->f;
> > @@ -515,7 +516,15 @@ struct flowside *flow_target(const struct ctx *c, union flow *flow,
> >  		else
> >  			goto nofwd;
> >  
> > -		if (!(rule = fwd_rule_search(fwd, ini))) {
> > +		if (rule_hint >= 0) {
> > +			ASSERT((unsigned)rule_hint < fwd->count);
> > +			rule = &fwd->rules[rule_hint];
> > +			if (!fwd_rule_match(rule, ini)) {
> > +				flow_dbg(flow,
> > +					 "Unexpected mismatching forward rule");
> > +				goto nofwd;
> > +			}
> > +		} else if (!(rule = fwd_rule_search(fwd, ini))) {
> 
> *If* we need this fwd_rule_search() / fwd_rule_match() / else if
> fwd_rule_search() pattern in more places, it might be convenient to
> hide the complexity by taking 'rule_hint' as argument for
> fwd_rule_search() perhaps, and then do something like this in
> fwd_rule_search():
> 
> 	for (i = hint == -1 ? 0 : hint; i < fwd->count; i++) {
> 		if (fwd_rule_match(&fwd->rules[i], ini))
> 			return &fwd->rules[i];
> 		if (hint)
> 			break;
> 	}
> 
> 	return NULL;
> 
> but I haven't really thought this through. It makes fwd_rule_search() a
> bit ugly.

I considered that approach, but decided against it.  However, looking
again I think that might make dealing with the coverity issue in the
earlier patch a bit cleaner, so I'm reconsidering.

> >  			/* This shouldn't happen, because if there's no rule for
> >  			 * it we should have no listening socket that would let
> >  			 * us get here
> > diff --git a/flow_table.h b/flow_table.h
> > index 5ee13acc..73de13ba 100644
> > --- a/flow_table.h
> > +++ b/flow_table.h
> > @@ -207,7 +207,7 @@ const struct flowside *flow_target_af(union flow *flow, uint8_t pif,
> >  				      const void *saddr, in_port_t sport,
> >  				      const void *daddr, in_port_t dport);
> >  struct flowside *flow_target(const struct ctx *c, union flow *flow,
> > -			     uint8_t proto);
> > +			     int rule_hint, uint8_t proto);
> >  
> >  union flow *flow_set_type(union flow *flow, enum flow_type type);
> >  #define FLOW_SET_TYPE(flow_, t_, var_)	(&flow_set_type((flow_), (t_))->var_)
> > diff --git a/fwd.c b/fwd.c
> > index 6727d26f..250b470c 100644
> > --- a/fwd.c
> > +++ b/fwd.c
> > @@ -415,8 +415,7 @@ void fwd_rule_add(struct fwd_ports *fwd, uint8_t flags,
> >   *
> >   * Returns: true if the rule applies to the flow, false otherwise
> >   */
> > -static bool fwd_rule_match(const struct fwd_rule *rule,
> > -			   const struct flowside *ini)
> > +bool fwd_rule_match(const struct fwd_rule *rule, const struct flowside *ini)
> >  {
> >  	return inany_matches(&ini->oaddr, fwd_rule_addr(rule)) &&
> >  		ini->oport >= rule->first && ini->oport <= rule->last;
> > diff --git a/fwd.h b/fwd.h
> > index 435f422a..2af7791d 100644
> > --- a/fwd.h
> > +++ b/fwd.h
> > @@ -106,6 +106,7 @@ struct fwd_ports {
> >  void fwd_rule_add(struct fwd_ports *fwd, uint8_t flags,
> >  		  const union inany_addr *addr, const char *ifname,
> >  		  in_port_t first, in_port_t last, in_port_t to);
> > +bool fwd_rule_match(const struct fwd_rule *rule, const struct flowside *ini);
> >  const struct fwd_rule *fwd_rule_search(const struct fwd_ports *fwd,
> >  				       const struct flowside *ini);
> >  void fwd_rules_print(const struct fwd_ports *fwd);
> > diff --git a/icmp.c b/icmp.c
> > index 9564c496..0f4d48bb 100644
> > --- a/icmp.c
> > +++ b/icmp.c
> > @@ -183,7 +183,7 @@ static struct icmp_ping_flow *icmp_ping_new(const struct ctx *c,
> >  		return NULL;
> >  
> >  	flow_initiate_af(flow, PIF_TAP, af, saddr, id, daddr, id);
> > -	if (!(tgt = flow_target(c, flow, proto)))
> > +	if (!(tgt = flow_target(c, flow, -1, proto)))
> >  		goto cancel;
> >  
> >  	if (flow->f.pif[TGTSIDE] != PIF_HOST) {
> > diff --git a/tcp.c b/tcp.c
> > index fc03e38f..89b22a51 100644
> > --- a/tcp.c
> > +++ b/tcp.c
> > @@ -1657,7 +1657,7 @@ static void tcp_conn_from_tap(const struct ctx *c, sa_family_t af,
> >  	ini = flow_initiate_af(flow, PIF_TAP,
> >  			       af, saddr, srcport, daddr, dstport);
> >  
> > -	if (!(tgt = flow_target(c, flow, IPPROTO_TCP)))
> > +	if (!(tgt = flow_target(c, flow, -1, IPPROTO_TCP)))
> >  		goto cancel;
> >  
> >  	if (flow->f.pif[TGTSIDE] != PIF_HOST) {
> > @@ -2496,7 +2496,7 @@ void tcp_listen_handler(const struct ctx *c, union epoll_ref ref,
> >  		goto cancel;
> >  	}
> >  
> > -	if (!flow_target(c, flow, IPPROTO_TCP))
> > +	if (!flow_target(c, flow, ref.listen.rule, IPPROTO_TCP))
> >  		goto cancel;
> >  
> >  	switch (flow->f.pif[TGTSIDE]) {
> > diff --git a/udp.c b/udp.c
> > index 761221f6..d7e47e52 100644
> > --- a/udp.c
> > +++ b/udp.c
> > @@ -838,12 +838,13 @@ static void udp_buf_sock_to_tap(const struct ctx *c, int s, int n,
> >   * udp_sock_fwd() - Forward datagrams from a possibly unconnected socket
> >   * @c:		Execution context
> >   * @s:		Socket to forward from
> > + * @rule_hint:	Forwarding rule to use, or -1 if unknown
> >   * @frompif:	Interface to which @s belongs
> >   * @port:	Our (local) port number of @s
> >   * @now:	Current timestamp
> >   */
> > -void udp_sock_fwd(const struct ctx *c, int s, uint8_t frompif,
> > -		  in_port_t port, const struct timespec *now)
> > +void udp_sock_fwd(const struct ctx *c, int s, int rule_hint,
> > +		  uint8_t frompif, in_port_t port, const struct timespec *now)
> >  {
> >  	union sockaddr_inany src;
> >  	union inany_addr dst;
> > @@ -868,7 +869,8 @@ void udp_sock_fwd(const struct ctx *c, int s, uint8_t frompif,
> >  			continue;
> >  		}
> >  
> > -		tosidx = udp_flow_from_sock(c, frompif, &dst, port, &src, now);
> > +		tosidx = udp_flow_from_sock(c, frompif, &dst, port, &src,
> > +					    rule_hint, now);
> >  		topif = pif_at_sidx(tosidx);
> >  
> >  		if (pif_is_socket(topif)) {
> > @@ -910,8 +912,10 @@ void udp_listen_sock_handler(const struct ctx *c,
> >  			     union epoll_ref ref, uint32_t events,
> >  			     const struct timespec *now)
> >  {
> > -	if (events & (EPOLLERR | EPOLLIN))
> > -		udp_sock_fwd(c, ref.fd, ref.listen.pif, ref.listen.port, now);
> > +	if (events & (EPOLLERR | EPOLLIN)) {
> > +		udp_sock_fwd(c, ref.fd, ref.listen.rule,
> > +			     ref.listen.pif, ref.listen.port, now);
> > +	}
> >  }
> >  
> >  /**
> > diff --git a/udp_flow.c b/udp_flow.c
> > index 8907f2f7..b82c6c06 100644
> > --- a/udp_flow.c
> > +++ b/udp_flow.c
> > @@ -139,6 +139,7 @@ static int udp_flow_sock(const struct ctx *c,
> >   * udp_flow_new() - Common setup for a new UDP flow
> >   * @c:		Execution context
> >   * @flow:	Initiated flow
> > + * @rule_hint:	Index of forwarding rule, or -1 if unknown
> >   * @now:	Timestamp
> >   *
> >   * Return: sidx for the target side of the new UDP flow, or FLOW_SIDX_NONE
> > @@ -147,13 +148,13 @@ static int udp_flow_sock(const struct ctx *c,
> >   * #syscalls getsockname
> >   */
> >  static flow_sidx_t udp_flow_new(const struct ctx *c, union flow *flow,
> > -				const struct timespec *now)
> > +				int rule_hint, const struct timespec *now)
> >  {
> >  	struct udp_flow *uflow = NULL;
> >  	const struct flowside *tgt;
> >  	unsigned sidei;
> >  
> > -	if (!(tgt = flow_target(c, flow, IPPROTO_UDP)))
> > +	if (!(tgt = flow_target(c, flow, rule_hint, IPPROTO_UDP)))
> >  		goto cancel;
> >  
> >  	uflow = FLOW_SET_TYPE(flow, FLOW_UDP, udp);
> > @@ -216,6 +217,7 @@ cancel:
> >   * @dst:	Our (local) address to which the datagram is arriving
> >   * @port:	Our (local) port number to which the datagram is arriving
> >   * @s_in:	Source socket address, filled in by recvmmsg()
> > + * @rule_hint:	Index of forwarding rule, or -1 if unknown
> >   * @now:	Timestamp
> >   *
> >   * #syscalls fcntl arm:fcntl64 ppc64:fcntl64|fcntl i686:fcntl64
> > @@ -226,7 +228,7 @@ cancel:
> >  flow_sidx_t udp_flow_from_sock(const struct ctx *c, uint8_t pif,
> >  			       const union inany_addr *dst, in_port_t port,
> >  			       const union sockaddr_inany *s_in,
> > -			       const struct timespec *now)
> > +			       int rule_hint, const struct timespec *now)
> >  {
> >  	const struct flowside *ini;
> >  	struct udp_flow *uflow;
> > @@ -260,7 +262,7 @@ flow_sidx_t udp_flow_from_sock(const struct ctx *c, uint8_t pif,
> >  		return FLOW_SIDX_NONE;
> >  	}
> >  
> > -	return udp_flow_new(c, flow, now);
> > +	return udp_flow_new(c, flow, rule_hint, now);
> >  }
> >  
> >  /**
> > @@ -316,7 +318,7 @@ flow_sidx_t udp_flow_from_tap(const struct ctx *c,
> >  		return FLOW_SIDX_NONE;
> >  	}
> >  
> > -	return udp_flow_new(c, flow, now);
> > +	return udp_flow_new(c, flow, -1, now);
> >  }
> >  
> >  /**
> > @@ -332,7 +334,7 @@ static void udp_flush_flow(const struct ctx *c,
> >  {
> >  	/* We don't know exactly where the datagrams will come from, but we know
> >  	 * they'll have an interface and oport matching this flow */
> > -	udp_sock_fwd(c, uflow->s[sidei], uflow->f.pif[sidei],
> > +	udp_sock_fwd(c, -1, uflow->s[sidei], uflow->f.pif[sidei],
> 
> I guess this is fixed on your local branch because this patch (and only
> this one) applied with some fuzz, but in case it's not... this passes
> -1 as socket ('s' argument), not as 'rule_hint'.

Oops, no.  Not sure where the fuzz came from, but this was wrong in my
tree..

> If I swap it with uflow->s[sidei] it all works as expected.

.. but not any more.

> >  		     uflow->f.side[sidei].oport, now);
> >  }
> >  
> > diff --git a/udp_flow.h b/udp_flow.h
> > index 4c528e95..14e0f920 100644
> > --- a/udp_flow.h
> > +++ b/udp_flow.h
> > @@ -35,7 +35,7 @@ struct udp_flow *udp_at_sidx(flow_sidx_t sidx);
> >  flow_sidx_t udp_flow_from_sock(const struct ctx *c, uint8_t pif,
> >  			       const union inany_addr *dst, in_port_t port,
> >  			       const union sockaddr_inany *s_in,
> > -			       const struct timespec *now);
> > +			       int rule_hint, const struct timespec *now);
> >  flow_sidx_t udp_flow_from_tap(const struct ctx *c,
> >  			      uint8_t pif, sa_family_t af,
> >  			      const void *saddr, const void *daddr,
> > diff --git a/udp_internal.h b/udp_internal.h
> > index 96d11cff..f0ce8f14 100644
> > --- a/udp_internal.h
> > +++ b/udp_internal.h
> > @@ -28,7 +28,7 @@ size_t udp_update_hdr4(struct iphdr *ip4h, struct udp_payload_t *bp,
> >  size_t udp_update_hdr6(struct ipv6hdr *ip6h, struct udp_payload_t *bp,
> >                         const struct flowside *toside, size_t dlen,
> >  		       bool no_udp_csum);
> > -void udp_sock_fwd(const struct ctx *c, int s, uint8_t frompif,
> > -		  in_port_t port, const struct timespec *now);
> > +void udp_sock_fwd(const struct ctx *c, int s, int rule_hint,
> > +		  uint8_t frompif, in_port_t port, const struct timespec *now);
> >  
> >  #endif /* UDP_INTERNAL_H */
> 
> So, uh, having reached this point, I was asking myself "is this really
> everything"? :)
> 
> Well, it's not exactly everything as you mentioned in the cover letter,
> perhaps the biggest missing part (other than client / interface) is
> outbound forwarding, but still, reaching this point with just 14
> relatively small patches is quite impressive.
> 
> I guess it's also merit of how generic / well structured the flow table
> and especially the flow "stages" are.

Thank you?

As also noted in the cover letter, I think this is now complete enough
to be interesting.  There is a lot still to do though:

 * Allow configuration of the target address as well as matching address
 * Allow addresses to be configured for outbound / splice mappings
 * Allow n to 1 port remapping
 * Remove port bitmap (or rather make them local variables only)
   * This is a bit complicated because if we want to implement port
     scanning for passt, we might need it to be persistent again
 * Allow "auto" mappings to be configured
 * Allow "auto" mappings to be for specific bind addresses
 * Outbound forwarding table
 * Configuring specific NATs
 * ? Merging separate tcp/udp in/out tables into one
 * Remove global forwarding mode (other than as a local during conf)
 * Probably a bunch more I'm not remembering

And, of course, dynamic updates. 

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

      reply	other threads:[~2026-01-14  1:06 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-08  2:29 [PATCH v3 00/14] Introduce forwarding table David Gibson
2026-01-08  2:29 ` [PATCH v3 01/14] inany: Extend inany_ntop() to treat NULL as a fully unspecified address David Gibson
2026-01-08 13:16   ` Laurent Vivier
2026-01-08  2:29 ` [PATCH v3 02/14] conf, fwd: Keep a table of our port forwarding configuration David Gibson
2026-01-12 23:26   ` Stefano Brivio
2026-01-13  5:12     ` David Gibson
2026-01-13 22:13       ` Stefano Brivio
2026-01-13 23:53         ` David Gibson
2026-01-08  2:29 ` [PATCH v3 03/14] conf: Accurately record ifname and address for outbound forwards David Gibson
2026-01-08  2:29 ` [PATCH v3 04/14] conf, fwd: Record "auto" port forwards in forwarding table David Gibson
2026-01-12 23:26   ` Stefano Brivio
2026-01-13  5:14     ` David Gibson
2026-01-08  2:29 ` [PATCH v3 05/14] fwd: Make space to store listening sockets in forward table David Gibson
2026-01-12 23:26   ` Stefano Brivio
2026-01-13  5:28     ` David Gibson
2026-01-13 22:13       ` Stefano Brivio
2026-01-13 23:57         ` David Gibson
2026-01-08  2:29 ` [PATCH v3 06/14] ip: Add ipproto_name() function David Gibson
2026-01-08 13:22   ` Laurent Vivier
2026-01-08 23:12     ` David Gibson
2026-01-08  2:29 ` [PATCH v3 07/14] fwd, tcp, udp: Set up listening sockets based on forward table David Gibson
2026-01-12 23:26   ` Stefano Brivio
2026-01-13  5:38     ` David Gibson
2026-01-13 22:13       ` Stefano Brivio
2026-01-13 23:59         ` David Gibson
2026-01-08  2:29 ` [PATCH v3 08/14] tcp, udp: Remove old auto-forwarding socket arrays David Gibson
2026-01-08  2:29 ` [PATCH v3 09/14] conf, fwd: Check forwarding table for conflicting rules David Gibson
2026-01-12 23:26   ` Stefano Brivio
2026-01-13  5:41     ` David Gibson
2026-01-08  2:29 ` [PATCH v3 10/14] fwd: Generate auto-forward exclusions from socket fd tables David Gibson
2026-01-08  2:29 ` [PATCH v3 11/14] flow, fwd: Consult rules table when forwarding a new flow from socket David Gibson
2026-01-13 22:12   ` Stefano Brivio
2026-01-14  0:09     ` David Gibson
2026-01-08  2:29 ` [PATCH v3 12/14] fwd: Remap ports based directly on forwarding rule David Gibson
2026-01-13 22:12   ` Stefano Brivio
2026-01-14  0:24     ` David Gibson
2026-01-08  2:29 ` [PATCH v3 13/14] fwd, tcp, udp: Add forwarding rule to listening socket epoll references David Gibson
2026-01-13 22:12   ` Stefano Brivio
2026-01-14  0:37     ` David Gibson
2026-01-08  2:29 ` [PATCH v3 14/14] flow, fwd: Optimise forwarding rule lookup using epoll ref when possible David Gibson
2026-01-13 22:13   ` Stefano Brivio
2026-01-14  1:06     ` David Gibson [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aWbr_JmvhOOdEWnH@zatzit \
    --to=david@gibson.dropbear.id.au \
    --cc=passt-dev@passt.top \
    --cc=sbrivio@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).