public inbox for passt-dev@passt.top
From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: Jon Maloy <jmaloy@redhat.com>, dgibson@redhat.com, passt-dev@passt.top
Subject: Re: [PATCH v4 9/9] fwd: Added cache table for ARP/NDP contents
Date: Thu, 21 Aug 2025 12:53:36 +0200
Message-ID: <20250821125336.0b8ef0dc@elisabeth>
In-Reply-To: <aKZ-gzEpPNGPj93j@zatzit>

On Thu, 21 Aug 2025 12:03:47 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Tue, Aug 19, 2025 at 11:10:05PM -0400, Jon Maloy wrote:
> > We add a cache table to keep partial contents of the kernel ARP/NDP
> > tables. This way, we drastically reduce the number of netlink calls
> > to read those tables.  
> 
> Do you have data to suggest this is necessary?

I'll chime in as I originally suggested that we need this cache.

Without it, we'll have one netlink query for each local, non-loopback
flow being established, which sounds rather absurd (...am I missing
something?).

I haven't tested these changes yet but I suppose the usual tcp_crr test
should show the issue.

It's not just about TCP CRR latency though. We're adding this mostly
for use cases where some kind of LAN service is implemented by a
container (say, Pi-hole), and we can probably expect a ton of
short-lived TCP flows in those cases (say, DNS requests over TCP).
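
For illustration, here is a minimal sketch of the kind of cache being discussed, following the expiry scheme from the quoted commit message: a long timer for real entries, and short timers that grow with each failed lookup for placeholder entries. It is not the code from the patch. Every name, the IPv4-only key, the table size and all timer values are assumptions made for the example, and fake_query() merely stands in for the netlink query so that the number of kernel lookups can be counted.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define CACHE_SIZE	64	/* assumption: small, direct-mapped table */
#define TTL_FOUND	60	/* seconds, assumption: long timer, real entry */
#define TTL_MISSING_MIN	1	/* seconds, assumption: first retry interval */
#define TTL_MISSING_MAX	32	/* seconds, assumption: back-off upper bound */

struct mac_cache_entry {
	bool used;
	bool found;		/* false: placeholder for a missing neighbour */
	uint32_t addr;		/* IPv4 only, to keep the sketch short */
	unsigned char mac[6];
	unsigned misses;	/* consecutive failed kernel lookups */
	time_t expiry;
};

static struct mac_cache_entry cache[CACHE_SIZE];

/* Hypothetical backend: true, with mac filled in, if the kernel has an entry */
typedef bool (*neigh_query_fn)(uint32_t addr, unsigned char *mac);

/* Retry interval for a placeholder: doubles per miss, up to the maximum */
static time_t missing_ttl(unsigned misses)
{
	time_t ttl = TTL_MISSING_MIN;

	while (misses-- > 1 && ttl < TTL_MISSING_MAX)
		ttl *= 2;
	return ttl;
}

/* Look up addr, querying the backend only if the entry is new or expired */
static bool mac_cache_get(uint32_t addr, time_t now, neigh_query_fn query,
			  unsigned char *mac)
{
	struct mac_cache_entry *e = &cache[addr % CACHE_SIZE];

	if (!e->used || e->addr != addr) {
		/* Slot empty or taken by another address: start over */
		memset(e, 0, sizeof(*e));
		e->used = true;
		e->addr = addr;
	} else if (now < e->expiry) {
		/* Still fresh: answer from the cache, no kernel query */
		if (e->found)
			memcpy(mac, e->mac, sizeof(e->mac));
		return e->found;
	}

	e->found = query(addr, e->mac);
	if (e->found) {
		e->misses = 0;
		e->expiry = now + TTL_FOUND;
		memcpy(mac, e->mac, sizeof(e->mac));
	} else {
		e->misses++;
		e->expiry = now + missing_ttl(e->misses);
	}
	return e->found;
}

/* Stand-in for the kernel lookup: only 10.0.0.1 is "known" in this demo */
static unsigned query_count;

static bool fake_query(uint32_t addr, unsigned char *mac)
{
	static const unsigned char demo_mac[6] = { 0x02, 0, 0, 0, 0, 0x01 };

	query_count++;
	if (addr != 0x0a000001)
		return false;
	memcpy(mac, demo_mac, sizeof(demo_mac));
	return true;
}
```

With this layout, any number of flows to the same neighbour within the expiry window cost a single backend query, which is the effect argued for above.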

> It's a lot of code to optimise something only needed for some pretty
> uncommon cases.

Actually, it's a bit less code than I expected, but I don't understand
why you're assuming those cases are uncommon.

By the way, note that we should be able to get rid of most of this once
we implement a netlink monitor (which we need for other purposes),
because at that point we can also subscribe to ARP / neighbour table
changes.
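
As a rough sketch of what subscribing to neighbour table changes could look like: a NETLINK_ROUTE socket bound to the RTMGRP_NEIGH multicast group receives RTM_NEWNEIGH and RTM_DELNEIGH notifications, whose attributes carry the address (NDA_DST) and link-layer address (NDA_LLADDR). The function and structure names below are hypothetical and say nothing about how a monitor would actually be integrated into passt. Only the netlink constants and macros come from the kernel headers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/neighbour.h>

/* Hypothetical summary of one neighbour notification */
struct neigh_event {
	bool removed;		/* RTM_DELNEIGH rather than RTM_NEWNEIGH */
	int family;		/* AF_INET or AF_INET6 */
	unsigned char addr[16];	/* NDA_DST, 4 or 16 bytes used */
	unsigned char mac[6];	/* NDA_LLADDR, valid if has_mac is set */
	bool has_mac;
};

/* Open a netlink socket subscribed to neighbour table notifications */
static int neigh_monitor_open(void)
{
	struct sockaddr_nl addr = {
		.nl_family = AF_NETLINK,
		.nl_groups = RTMGRP_NEIGH,
	};
	int s = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_ROUTE);

	if (s < 0)
		return -1;
	if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(s);
		return -1;
	}
	return s;
}

/* Parse one message: true if it's a neighbour event carrying an address */
static bool neigh_event_parse(struct nlmsghdr *nh, struct neigh_event *ev)
{
	struct ndmsg *ndm = NLMSG_DATA(nh);
	struct rtattr *rta;
	bool has_addr = false;
	int len;

	if (nh->nlmsg_type != RTM_NEWNEIGH && nh->nlmsg_type != RTM_DELNEIGH)
		return false;
	if (nh->nlmsg_len < (unsigned)NLMSG_LENGTH(sizeof(*ndm)))
		return false;

	memset(ev, 0, sizeof(*ev));
	ev->removed = nh->nlmsg_type == RTM_DELNEIGH;
	ev->family = ndm->ndm_family;

	/* Attributes follow the fixed-size struct ndmsg header */
	len = nh->nlmsg_len - NLMSG_LENGTH(sizeof(*ndm));
	rta = (struct rtattr *)((char *)ndm + NLMSG_ALIGN(sizeof(*ndm)));
	for (; RTA_OK(rta, len); rta = RTA_NEXT(rta, len)) {
		size_t l = RTA_PAYLOAD(rta);

		if (rta->rta_type == NDA_DST && l <= sizeof(ev->addr)) {
			memcpy(ev->addr, RTA_DATA(rta), l);
			has_addr = true;
		} else if (rta->rta_type == NDA_LLADDR &&
			   l == sizeof(ev->mac)) {
			memcpy(ev->mac, RTA_DATA(rta), l);
			ev->has_mac = true;
		}
	}
	return has_addr;
}
```

A caller would recv() on the socket returned by neigh_monitor_open(), walk the buffer with NLMSG_OK() and NLMSG_NEXT(), and hand each message to neigh_event_parse(), updating or dropping the matching entry instead of polling the kernel tables.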

> > We create dummy cache entries representing non-(not-yet)-existing
> > ARP/NDP entries when needed. We add a short expiration time to each
> > such entry, so that we can know when to make repeated calls to the
> > kernel tables in the beginning. We also add an access counter to the
> > entries, to ensure that the timer becomes longer and the call frequency
> > abates over time if no ARP/NDP entry is created.
> > 
> > For regular entries we use a much longer timer, with the purpose to
> > update the entry in the rare case that a remote host changes its
> > MAC address.
> > 
> > Signed-off-by: Jon Maloy <jmaloy@redhat.com>
> > ---
> >  arp.c  |   3 +-
> >  conf.c |   2 +
> >  flow.c |   5 +-
> >  fwd.c  | 206 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  fwd.h  |   4 ++
> >  ndp.c  |   3 +-
> >  tcp.c  |   3 +-
> >  7 files changed, 218 insertions(+), 8 deletions(-)
> > 
> > diff --git a/arp.c b/arp.c
> > index c37867a..040d4fe 100644
> > --- a/arp.c
> > +++ b/arp.c
> > @@ -29,7 +29,6 @@
> >  #include "dhcp.h"
> >  #include "passt.h"
> >  #include "tap.h"
> > -#include "netlink.h"
> >  
> >  /**
> >   * arp() - Check if this is a supported ARP message, reply as needed
> > @@ -79,7 +78,7 @@ int arp(const struct ctx *c, const struct pool *p)
> >  	 */
> >  	inany_from_af(&tgt, AF_INET, am->tip);
> >  	if (!fwd_inany_nat(c, &tgt))
> > -		nl_neigh_mac_get(nl_sock, &tgt, c->ifi4, am->sha);
> > +		fwd_neigh_mac_get(c, &tgt, c->ifi4, am->sha);
> >  
> >  	memcpy(swap,		am->tip,	sizeof(am->tip));
> >  	memcpy(am->tip,		am->sip,	sizeof(am->tip));
> > diff --git a/conf.c b/conf.c
> > index f47f48e..0abdbf7 100644
> > --- a/conf.c
> > +++ b/conf.c
> > @@ -2122,6 +2122,8 @@ void conf(struct ctx *c, int argc, char **argv)
> >  		c->udp.fwd_out.mode = fwd_default;
> >  
> >  	fwd_scan_ports_init(c);
> > +	if (fwd_mac_cache_init())
> > +		die("Failed to initiate neighnor MAC cache");  
> 
> "neighnor"

Jon, by the way, we've been using BrE quite consistently throughout the
codebase, so the two occurrences of "neighbour" (code comments only)
have, well, a 'u' in them. I'd try to keep that consistency if it
doesn't bother anybody.

-- 
Stefano

