From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: Jon Maloy <jmaloy@redhat.com>, dgibson@redhat.com, passt-dev@passt.top
Subject: Re: [PATCH v4 9/9] fwd: Added cache table for ARP/NDP contents
Date: Thu, 21 Aug 2025 12:53:36 +0200 [thread overview]
Message-ID: <20250821125336.0b8ef0dc@elisabeth> (raw)
In-Reply-To: <aKZ-gzEpPNGPj93j@zatzit>
On Thu, 21 Aug 2025 12:03:47 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:
> On Tue, Aug 19, 2025 at 11:10:05PM -0400, Jon Maloy wrote:
> > We add a cache table to keep partial contents of the kernel ARP/NDP
> > tables. This way, we drastically reduce the number of netlink calls
> > to read those tables.
>
> Do you have data to suggest this is necessary?
I'll chime in as I originally suggested that we need this cache.
Without it, we'll have one netlink query for each local, non-loopback
flow being established, which sounds rather absurd (...am I missing
something?).
I haven't tested these changes yet but I suppose the usual tcp_crr test
should show the issue.
It's not just about TCP CRR latency though. We're adding this mostly
for use cases where some kind of LAN service is implemented by a
container (say, Pi-hole), and we can probably expect a ton of
short-lived TCP flows in those cases (say, DNS requests over TCP).
> It's a lot of code to optimise something only needed for some pretty
> uncommon cases.
Actually, it's a bit less code than I expected, but I don't understand
why you're assuming those cases are uncommon.
By the way, note that we should be able to get rid of most of this once
we implement a netlink monitor (which we need for other purposes),
because at that point we can also subscribe to ARP / neighbour table
changes.
> > We create dummy cache entries representing non-(not-yet)-existing
> > ARP/NDP entries when needed. We add a short expiration time to each
> > such entry, so that we can know when to make repeated calls to the
> > kernel tables in the beginning. We also add an access counter to the
> > entries, to ensure that the timer becomes longer and the call frequency
> > abates over time if no ARP/NDP entry is created.
> >
> > For regular entries we use a much longer timer, with the purpose to
> > update the entry in the rare case that a remote host changes its
> > MAC address.
> >
> > Signed-off-by: Jon Maloy <jmaloy@redhat.com>
> > ---
> > arp.c | 3 +-
> > conf.c | 2 +
> > flow.c | 5 +-
> > fwd.c | 206 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > fwd.h | 4 ++
> > ndp.c | 3 +-
> > tcp.c | 3 +-
> > 7 files changed, 218 insertions(+), 8 deletions(-)
> >
> > diff --git a/arp.c b/arp.c
> > index c37867a..040d4fe 100644
> > --- a/arp.c
> > +++ b/arp.c
> > @@ -29,7 +29,6 @@
> > #include "dhcp.h"
> > #include "passt.h"
> > #include "tap.h"
> > -#include "netlink.h"
> >
> > /**
> > * arp() - Check if this is a supported ARP message, reply as needed
> > @@ -79,7 +78,7 @@ int arp(const struct ctx *c, const struct pool *p)
> > */
> > inany_from_af(&tgt, AF_INET, am->tip);
> > if (!fwd_inany_nat(c, &tgt))
> > - nl_neigh_mac_get(nl_sock, &tgt, c->ifi4, am->sha);
> > + fwd_neigh_mac_get(c, &tgt, c->ifi4, am->sha);
> >
> > memcpy(swap, am->tip, sizeof(am->tip));
> > memcpy(am->tip, am->sip, sizeof(am->tip));
> > diff --git a/conf.c b/conf.c
> > index f47f48e..0abdbf7 100644
> > --- a/conf.c
> > +++ b/conf.c
> > @@ -2122,6 +2122,8 @@ void conf(struct ctx *c, int argc, char **argv)
> > c->udp.fwd_out.mode = fwd_default;
> >
> > fwd_scan_ports_init(c);
> > + if (fwd_mac_cache_init())
> > + die("Failed to initiate neighnor MAC cache");
>
> "neighnor"
Jon, by the way, we've been using BrE quite consistently throughout the
codebase, so the two occurrences of "neighbour" (code comments only)
have, well, a 'u' in them. I'd try to keep that consistency if it
doesn't bother anybody.
--
Stefano
next prev parent reply other threads:[~2025-08-21 10:53 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-20 3:09 [PATCH v4 0/9] Use true MAC address of LAN local remote hosts Jon Maloy
2025-08-20 3:09 ` [PATCH v4 1/9] netlink: add function to extract MAC addresses from NDP/ARP table Jon Maloy
2025-08-21 0:57 ` David Gibson
2025-08-20 3:09 ` [PATCH v4 2/9] arp/ndp: respond with true MAC address of LAN local remote hosts Jon Maloy
2025-08-21 1:18 ` David Gibson
2025-08-20 3:09 ` [PATCH v4 3/9] flow: add MAC address of LAN local remote hosts to flow Jon Maloy
2025-08-21 1:28 ` David Gibson
2025-08-20 3:10 ` [PATCH v4 4/9] udp: forward external source MAC address through tap interface Jon Maloy
2025-08-21 1:32 ` David Gibson
2025-08-20 3:10 ` [PATCH v4 5/9] tcp: " Jon Maloy
2025-08-21 1:37 ` David Gibson
2025-08-20 3:10 ` [PATCH v4 6/9] tap: change signature of function tap_push_l2h() Jon Maloy
2025-08-21 1:39 ` David Gibson
2025-08-20 3:10 ` [PATCH v4 7/9] tcp: make tcp_rst_no_conn() respond with correct MAC address Jon Maloy
2025-08-21 1:46 ` David Gibson
2025-08-20 3:10 ` [PATCH v4 8/9] icmp: let icmp use mac address from flowside structure Jon Maloy
2025-08-21 1:51 ` David Gibson
2025-08-20 3:10 ` [PATCH v4 9/9] fwd: Added cache table for ARP/NDP contents Jon Maloy
2025-08-21 2:03 ` David Gibson
2025-08-21 10:53 ` Stefano Brivio [this message]
2025-08-25 1:48 ` David Gibson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250821125336.0b8ef0dc@elisabeth \
--to=sbrivio@redhat.com \
--cc=david@gibson.dropbear.id.au \
--cc=dgibson@redhat.com \
--cc=jmaloy@redhat.com \
--cc=passt-dev@passt.top \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).