From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: passt.top; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=DlneZSCR; dkim-atps=neutral Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by passt.top (Postfix) with ESMTPS id 6F1375A004E for ; Tue, 13 Jan 2026 23:13:10 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1768342389; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=AJ+4l7Pqzsm7dxfpoym1yvkyeKfr34Ti+qQZ4jl1F+k=; b=DlneZSCRwTqjr7V069gEP5zYWpP6LtbtlFg2iC6NNtF68vht0AnZKLiwtZTO14Ae0maBpM E9PF/JrOMcPRPQpGwSHaxKnAxg0LTeR+80j3KZJ2MLpf/qEakqtfN7ViymIqdBeDPjWWvq s0lco+ojX8D0V9FpCiXqEg+vEgFhbUo= Received: from mail-wm1-f71.google.com (mail-wm1-f71.google.com [209.85.128.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-424-h5VzA5yVNOSGrFc2An00Fw-1; Tue, 13 Jan 2026 17:13:07 -0500 X-MC-Unique: h5VzA5yVNOSGrFc2An00Fw-1 X-Mimecast-MFC-AGG-ID: h5VzA5yVNOSGrFc2An00Fw_1768342387 Received: by mail-wm1-f71.google.com with SMTP id 5b1f17b1804b1-477964c22e0so1671345e9.0 for ; Tue, 13 Jan 2026 14:13:07 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1768342386; x=1768947186; h=content-transfer-encoding:mime-version:organization:references :in-reply-to:message-id:subject:cc:to:from:date:x-gm-gg :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=AJ+4l7Pqzsm7dxfpoym1yvkyeKfr34Ti+qQZ4jl1F+k=; b=dZ3XRbHx7HDgHT8PyddYVDuDAPnYUlj7FLYEI2G8bX+6Qc6WHTM53QUpvR1v+hk3L+ OdGAXhXiAYCznWRb6wU8YJH+LcFn2ksZeJ8UtBaUXVpaBoVD5ODfR1CldNz5f7CGLuyT pM5EG035LzK9QV19w5BLGY1+yqxq2bDhnRq1JJrGF/vhrdjXAV+M/nf+SlZVuR9ydocF 7lPaFxOcTqjF8SvBTs5lkd5dkUPaenzbkuGpEkUMrWg0TE48WwRWcfynAw/Wfp4D81Ug 0ygiNGyW/I+HlLSZc1i+dm9pHFjZ3S1nASikfR0LGVMPyaUL+leqgsWkq4tlkqXCVyZ4 cazg== X-Gm-Message-State: AOJu0YzV9YikGAmNto/Wn0TorcpeUt3mrMce7IAt4rYemYAytflkQt+x O8IR5HRk8kdE+RJPaRb7hX7z6UMn8x/j17uVjDIw1qcSzn3eJjJmCpF2fQ2ucJNLODwC7aH1NEl s1d0wq3oEQlE9638tWUu9k+G86UQ/E2EsP6JSje+WkTzeVCm8Sw01F67wmkqmOA== X-Gm-Gg: AY/fxX5y+vjR0VRQEo+bharqUOBFbKOS5H8zSmrZVzU2dOfU2ZirBYnapwi6iYVcP3M rn9jjcyaw5eIY09cEsu68ieTaFBj/g8NsktMuaObm/U87i7WEtjAT0wbj0tvYetS05zOrbhMSpp WuMNUId01X+M+TqJS9VuTW6EHCngEEgIAeFO53zyHcbfva2VK9CInAL8xW6PTSG6T+25Ukg7VLo zPkpJj74Ny9KLFoeGUhI4PTvDQmh6p3920S+sUzJAq7fhINFlr3xT8/Huh/AIuORunBLhi21GIh dRlnLe+qETHJjog+togLU16djxg8yEP1PUVOw3XrFyezbYpHTQo++LLDlyAAcyihpZxceEX+QbX 90ANaDcspY08m0Usg8OXJ X-Received: by 2002:a05:600c:3e08:b0:475:d9de:952e with SMTP id 5b1f17b1804b1-47ee3775164mr6087365e9.1.1768342386077; Tue, 13 Jan 2026 14:13:06 -0800 (PST) X-Received: by 2002:a05:600c:3e08:b0:475:d9de:952e with SMTP id 5b1f17b1804b1-47ee3775164mr6087155e9.1.1768342385540; Tue, 13 Jan 2026 14:13:05 -0800 (PST) Received: from maya.myfinge.rs (ifcgrfdd.trafficplex.cloud. [2a10:fc81:a806:d6a9::1]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-47ec5d95edbsm213161975e9.3.2026.01.13.14.13.04 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 13 Jan 2026 14:13:04 -0800 (PST) Date: Tue, 13 Jan 2026 23:13:03 +0100 From: Stefano Brivio To: David Gibson Subject: Re: [PATCH v3 14/14] flow, fwd: Optimise forwarding rule lookup using epoll ref when possible Message-ID: <20260113231303.78fb3471@elisabeth> In-Reply-To: <20260108022948.2657573-15-david@gibson.dropbear.id.au> References: <20260108022948.2657573-1-david@gibson.dropbear.id.au> <20260108022948.2657573-15-david@gibson.dropbear.id.au> Organization: Red Hat X-Mailer: Claws Mail 4.2.0 (GTK 3.24.49; x86_64-pc-linux-gnu) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: OZIklb7GnJWM1P3I684AlzF2rjb56YBvg8EKULLVoGY_1768342387 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID-Hash: 7RN4ZK2NTYJ3MSNQZIIFQKK6CQ2Y5VWM X-Message-ID-Hash: 7RN4ZK2NTYJ3MSNQZIIFQKK6CQ2Y5VWM X-MailFrom: sbrivio@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Thu, 8 Jan 2026 13:29:48 +1100 David Gibson wrote: > Now that listening sockets include a reference to the forwarding rule which > created them we can, in many cases, avoid a linear search of the forwarding > table when we want to find the relevant rule. Instead we can take the > rule index from the socket's epoll reference, and use that to immediately > find the correct rule. > > This is conceptually simple, but requires a moderate amount of plumbing to > get the index from the reference through to the rule lookup. We still > allow fall back to linear search if we don't have the index, and this may > (rarely) be used in the udp_flush_flow() case, where we could get packets > for one flow on a different flow's socket, rather than through a listening > socket as usual. > > Signed-off-by: David Gibson > --- > flow.c | 13 +++++++++++-- > flow_table.h | 2 +- > fwd.c | 3 +-- > fwd.h | 1 + > icmp.c | 2 +- > tcp.c | 4 ++-- > udp.c | 14 +++++++++----- > udp_flow.c | 14 ++++++++------ > udp_flow.h | 2 +- > udp_internal.h | 4 ++-- > 10 files changed, 37 insertions(+), 22 deletions(-) > > diff --git a/flow.c b/flow.c > index 38f88732..85759970 100644 > --- a/flow.c > +++ b/flow.c > @@ -482,12 +482,13 @@ struct flowside *flow_initiate_sa(union flow *flow, uint8_t pif, > * flow_target() - Determine where flow should forward to, and move to TGT > * @c: Execution context > * @flow: Flow to forward > + * @rule_hint: Index of relevant forwarding rule, or -1 if unknown > * @proto: Protocol > * > * Return: pointer to the target flowside information > */ > struct flowside *flow_target(const struct ctx *c, union flow *flow, > - uint8_t proto) > + int rule_hint, uint8_t proto) > { > char estr[INANY_ADDRSTRLEN], ostr[INANY_ADDRSTRLEN]; > struct flow_common *f = &flow->f; > @@ -515,7 +516,15 @@ struct flowside *flow_target(const struct ctx *c, union flow *flow, > else > goto nofwd; > > - if (!(rule = fwd_rule_search(fwd, ini))) { > + if (rule_hint >= 0) { > + ASSERT((unsigned)rule_hint < fwd->count); > + rule = &fwd->rules[rule_hint]; > + if (!fwd_rule_match(rule, ini)) { > + flow_dbg(flow, > + "Unexpected mismatching forward rule"); > + goto nofwd; > + } > + } else if (!(rule = fwd_rule_search(fwd, ini))) { *If* we need this fwd_rule_search() / fwd_rule_match() / else if fwd_rule_search() pattern in more places, it might be convenient to hide the complexity by taking 'rule_hint' as argument for fwd_rule_search() perhaps, and then do something like this in fwd_rule_search(): for (i = hint == -1 ? 0 : hint; i < fwd->count; i++) { if (fwd_rule_match(&fwd->rules[i], ini)) return &fwd->rules[i]; if (hint) break; } return NULL; but I haven't really thought this through. It makes fwd_rule_search() a bit ugly. > /* This shouldn't happen, because if there's no rule for > * it we should have no listening socket that would let > * us get here > diff --git a/flow_table.h b/flow_table.h > index 5ee13acc..73de13ba 100644 > --- a/flow_table.h > +++ b/flow_table.h > @@ -207,7 +207,7 @@ const struct flowside *flow_target_af(union flow *flow, uint8_t pif, > const void *saddr, in_port_t sport, > const void *daddr, in_port_t dport); > struct flowside *flow_target(const struct ctx *c, union flow *flow, > - uint8_t proto); > + int rule_hint, uint8_t proto); > > union flow *flow_set_type(union flow *flow, enum flow_type type); > #define FLOW_SET_TYPE(flow_, t_, var_) (&flow_set_type((flow_), (t_))->var_) > diff --git a/fwd.c b/fwd.c > index 6727d26f..250b470c 100644 > --- a/fwd.c > +++ b/fwd.c > @@ -415,8 +415,7 @@ void fwd_rule_add(struct fwd_ports *fwd, uint8_t flags, > * > * Returns: true if the rule applies to the flow, false otherwise > */ > -static bool fwd_rule_match(const struct fwd_rule *rule, > - const struct flowside *ini) > +bool fwd_rule_match(const struct fwd_rule *rule, const struct flowside *ini) > { > return inany_matches(&ini->oaddr, fwd_rule_addr(rule)) && > ini->oport >= rule->first && ini->oport <= rule->last; > diff --git a/fwd.h b/fwd.h > index 435f422a..2af7791d 100644 > --- a/fwd.h > +++ b/fwd.h > @@ -106,6 +106,7 @@ struct fwd_ports { > void fwd_rule_add(struct fwd_ports *fwd, uint8_t flags, > const union inany_addr *addr, const char *ifname, > in_port_t first, in_port_t last, in_port_t to); > +bool fwd_rule_match(const struct fwd_rule *rule, const struct flowside *ini); > const struct fwd_rule *fwd_rule_search(const struct fwd_ports *fwd, > const struct flowside *ini); > void fwd_rules_print(const struct fwd_ports *fwd); > diff --git a/icmp.c b/icmp.c > index 9564c496..0f4d48bb 100644 > --- a/icmp.c > +++ b/icmp.c > @@ -183,7 +183,7 @@ static struct icmp_ping_flow *icmp_ping_new(const struct ctx *c, > return NULL; > > flow_initiate_af(flow, PIF_TAP, af, saddr, id, daddr, id); > - if (!(tgt = flow_target(c, flow, proto))) > + if (!(tgt = flow_target(c, flow, -1, proto))) > goto cancel; > > if (flow->f.pif[TGTSIDE] != PIF_HOST) { > diff --git a/tcp.c b/tcp.c > index fc03e38f..89b22a51 100644 > --- a/tcp.c > +++ b/tcp.c > @@ -1657,7 +1657,7 @@ static void tcp_conn_from_tap(const struct ctx *c, sa_family_t af, > ini = flow_initiate_af(flow, PIF_TAP, > af, saddr, srcport, daddr, dstport); > > - if (!(tgt = flow_target(c, flow, IPPROTO_TCP))) > + if (!(tgt = flow_target(c, flow, -1, IPPROTO_TCP))) > goto cancel; > > if (flow->f.pif[TGTSIDE] != PIF_HOST) { > @@ -2496,7 +2496,7 @@ void tcp_listen_handler(const struct ctx *c, union epoll_ref ref, > goto cancel; > } > > - if (!flow_target(c, flow, IPPROTO_TCP)) > + if (!flow_target(c, flow, ref.listen.rule, IPPROTO_TCP)) > goto cancel; > > switch (flow->f.pif[TGTSIDE]) { > diff --git a/udp.c b/udp.c > index 761221f6..d7e47e52 100644 > --- a/udp.c > +++ b/udp.c > @@ -838,12 +838,13 @@ static void udp_buf_sock_to_tap(const struct ctx *c, int s, int n, > * udp_sock_fwd() - Forward datagrams from a possibly unconnected socket > * @c: Execution context > * @s: Socket to forward from > + * @rule_hint: Forwarding rule to use, or -1 if unknown > * @frompif: Interface to which @s belongs > * @port: Our (local) port number of @s > * @now: Current timestamp > */ > -void udp_sock_fwd(const struct ctx *c, int s, uint8_t frompif, > - in_port_t port, const struct timespec *now) > +void udp_sock_fwd(const struct ctx *c, int s, int rule_hint, > + uint8_t frompif, in_port_t port, const struct timespec *now) > { > union sockaddr_inany src; > union inany_addr dst; > @@ -868,7 +869,8 @@ void udp_sock_fwd(const struct ctx *c, int s, uint8_t frompif, > continue; > } > > - tosidx = udp_flow_from_sock(c, frompif, &dst, port, &src, now); > + tosidx = udp_flow_from_sock(c, frompif, &dst, port, &src, > + rule_hint, now); > topif = pif_at_sidx(tosidx); > > if (pif_is_socket(topif)) { > @@ -910,8 +912,10 @@ void udp_listen_sock_handler(const struct ctx *c, > union epoll_ref ref, uint32_t events, > const struct timespec *now) > { > - if (events & (EPOLLERR | EPOLLIN)) > - udp_sock_fwd(c, ref.fd, ref.listen.pif, ref.listen.port, now); > + if (events & (EPOLLERR | EPOLLIN)) { > + udp_sock_fwd(c, ref.fd, ref.listen.rule, > + ref.listen.pif, ref.listen.port, now); > + } > } > > /** > diff --git a/udp_flow.c b/udp_flow.c > index 8907f2f7..b82c6c06 100644 > --- a/udp_flow.c > +++ b/udp_flow.c > @@ -139,6 +139,7 @@ static int udp_flow_sock(const struct ctx *c, > * udp_flow_new() - Common setup for a new UDP flow > * @c: Execution context > * @flow: Initiated flow > + * @rule_hint: Index of forwarding rule, or -1 if unknown > * @now: Timestamp > * > * Return: sidx for the target side of the new UDP flow, or FLOW_SIDX_NONE > @@ -147,13 +148,13 @@ static int udp_flow_sock(const struct ctx *c, > * #syscalls getsockname > */ > static flow_sidx_t udp_flow_new(const struct ctx *c, union flow *flow, > - const struct timespec *now) > + int rule_hint, const struct timespec *now) > { > struct udp_flow *uflow = NULL; > const struct flowside *tgt; > unsigned sidei; > > - if (!(tgt = flow_target(c, flow, IPPROTO_UDP))) > + if (!(tgt = flow_target(c, flow, rule_hint, IPPROTO_UDP))) > goto cancel; > > uflow = FLOW_SET_TYPE(flow, FLOW_UDP, udp); > @@ -216,6 +217,7 @@ cancel: > * @dst: Our (local) address to which the datagram is arriving > * @port: Our (local) port number to which the datagram is arriving > * @s_in: Source socket address, filled in by recvmmsg() > + * @rule_hint: Index of forwarding rule, or -1 if unknown > * @now: Timestamp > * > * #syscalls fcntl arm:fcntl64 ppc64:fcntl64|fcntl i686:fcntl64 > @@ -226,7 +228,7 @@ cancel: > flow_sidx_t udp_flow_from_sock(const struct ctx *c, uint8_t pif, > const union inany_addr *dst, in_port_t port, > const union sockaddr_inany *s_in, > - const struct timespec *now) > + int rule_hint, const struct timespec *now) > { > const struct flowside *ini; > struct udp_flow *uflow; > @@ -260,7 +262,7 @@ flow_sidx_t udp_flow_from_sock(const struct ctx *c, uint8_t pif, > return FLOW_SIDX_NONE; > } > > - return udp_flow_new(c, flow, now); > + return udp_flow_new(c, flow, rule_hint, now); > } > > /** > @@ -316,7 +318,7 @@ flow_sidx_t udp_flow_from_tap(const struct ctx *c, > return FLOW_SIDX_NONE; > } > > - return udp_flow_new(c, flow, now); > + return udp_flow_new(c, flow, -1, now); > } > > /** > @@ -332,7 +334,7 @@ static void udp_flush_flow(const struct ctx *c, > { > /* We don't know exactly where the datagrams will come from, but we know > * they'll have an interface and oport matching this flow */ > - udp_sock_fwd(c, uflow->s[sidei], uflow->f.pif[sidei], > + udp_sock_fwd(c, -1, uflow->s[sidei], uflow->f.pif[sidei], I guess this is fixed on your local branch because this patch (and only this one) applied with some fuzz, but in case it's not... this passes -1 as socket ('s' argument), not as 'rule_hint'. If I swap it with uflow->s[sidei] it all works as expected. > uflow->f.side[sidei].oport, now); > } > > diff --git a/udp_flow.h b/udp_flow.h > index 4c528e95..14e0f920 100644 > --- a/udp_flow.h > +++ b/udp_flow.h > @@ -35,7 +35,7 @@ struct udp_flow *udp_at_sidx(flow_sidx_t sidx); > flow_sidx_t udp_flow_from_sock(const struct ctx *c, uint8_t pif, > const union inany_addr *dst, in_port_t port, > const union sockaddr_inany *s_in, > - const struct timespec *now); > + int rule_hint, const struct timespec *now); > flow_sidx_t udp_flow_from_tap(const struct ctx *c, > uint8_t pif, sa_family_t af, > const void *saddr, const void *daddr, > diff --git a/udp_internal.h b/udp_internal.h > index 96d11cff..f0ce8f14 100644 > --- a/udp_internal.h > +++ b/udp_internal.h > @@ -28,7 +28,7 @@ size_t udp_update_hdr4(struct iphdr *ip4h, struct udp_payload_t *bp, > size_t udp_update_hdr6(struct ipv6hdr *ip6h, struct udp_payload_t *bp, > const struct flowside *toside, size_t dlen, > bool no_udp_csum); > -void udp_sock_fwd(const struct ctx *c, int s, uint8_t frompif, > - in_port_t port, const struct timespec *now); > +void udp_sock_fwd(const struct ctx *c, int s, int rule_hint, > + uint8_t frompif, in_port_t port, const struct timespec *now); > > #endif /* UDP_INTERNAL_H */ So, uh, having reached this point, I was asking myself "is this really everything"? :) Well, it's not exactly everything as you mentioned in the cover letter, perhaps the biggest missing part (other than client / interface) is outbound forwarding, but still, reaching this point with just 14 relatively small patches is quite impressive. I guess it's also merit of how generic / well structured the flow table and especially the flow "stages" are. -- Stefano