public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: Jon Maloy <jmaloy@redhat.com>, dgibson@redhat.com, passt-dev@passt.top
Subject: Re: [PATCH v4 9/9] fwd: Added cache table for ARP/NDP contents
Date: Mon, 25 Aug 2025 11:48:24 +1000	[thread overview]
Message-ID: <aKvA6I06moqdNWUW@zatzit> (raw)
In-Reply-To: <20250821125336.0b8ef0dc@elisabeth>

[-- Attachment #1: Type: text/plain, Size: 4788 bytes --]

On Thu, Aug 21, 2025 at 12:53:36PM +0200, Stefano Brivio wrote:
> On Thu, 21 Aug 2025 12:03:47 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Tue, Aug 19, 2025 at 11:10:05PM -0400, Jon Maloy wrote:
> > > We add a cache table to keep partial contents of the kernel ARP/NDP
> > > tables. This way, we drastically reduce the number of netlink calls
> > > to read those tables.  
> > 
> > Do you have data to suggest this is necessary?
> 
> I'll chime in as I originally suggested that we need this cache.
> 
> Without it, we'll have one netlink query for each local, non-loopback
> flow being established, which sounds rather absurd (...am I missing
> something?).

Not just local: we only need the MAC for local flows, but we don't
know if it's local until we look for the MAC.

Yeah, that's true.  I guess I don't have a good sense of the relative
latency of a netlink lookup versus a flow establishment, so I don't
have an intuition as to whether that's absurd or not.

Since you do, I'll defer to your judgement on this.

> I haven't tested these changes yet but I suppose the usual tcp_crr test
> should show the issue.
> 
> It's not just about TCP CRR latency though. We're adding this mostly
> for use cases where some kind of LAN service is implemented by a
> container (say, Pi-hole), and we can probably expect a ton of
> short-lived TCP flows in those cases (say, DNS requests over TCP).

Actually, I would have expected more UDP than TCP, but the issue is
the same.

> > It's a lot of code to optimise something only needed for some pretty
> > uncommon cases.
> 
> Actually, it's a bit less code than I expected, but I don't understand
> why you're assuming those cases are uncommon.

Hm, maybe.

> By the way, note that we should be able to get rid of most of this once
> we implement a netlink monitor (which we need for other purposes),
> because at that point we can also subscribe to ARP / neighbour table
> changes.

Not really.  We don't need explicit queries in that case, but we still
need the data structure to record the information we get from the
monitor.  The bulk of this code is implementing that data structure.

> > > We create dummy cache entries representing non-(not-yet)-existing
> > > ARP/NDP entries when needed. We add a short expiration time to each
> > > such entry, so that we can know when to make repeated calls to the
> > > kernel tables in the beginning. We also add an access counter to the
> > > entries, to ensure that the timer becomes longer and the call frequency
> > > abates over time if no ARP/NDP entry is created.
> > > 
> > > For regular entries we use a much longer timer, with the purpose to
> > > update the entry in the rare case that a remote host changes its
> > > MAC address.
> > > 
> > > Signed-off-by: Jon Maloy <jmaloy@redhat.com>
> > > ---
> > >  arp.c  |   3 +-
> > >  conf.c |   2 +
> > >  flow.c |   5 +-
> > >  fwd.c  | 206 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  fwd.h  |   4 ++
> > >  ndp.c  |   3 +-
> > >  tcp.c  |   3 +-
> > >  7 files changed, 218 insertions(+), 8 deletions(-)
> > > 
> > > diff --git a/arp.c b/arp.c
> > > index c37867a..040d4fe 100644
> > > --- a/arp.c
> > > +++ b/arp.c
> > > @@ -29,7 +29,6 @@
> > >  #include "dhcp.h"
> > >  #include "passt.h"
> > >  #include "tap.h"
> > > -#include "netlink.h"
> > >  
> > >  /**
> > >   * arp() - Check if this is a supported ARP message, reply as needed
> > > @@ -79,7 +78,7 @@ int arp(const struct ctx *c, const struct pool *p)
> > >  	 */
> > >  	inany_from_af(&tgt, AF_INET, am->tip);
> > >  	if (!fwd_inany_nat(c, &tgt))
> > > -		nl_neigh_mac_get(nl_sock, &tgt, c->ifi4, am->sha);
> > > +		fwd_neigh_mac_get(c, &tgt, c->ifi4, am->sha);
> > >  
> > >  	memcpy(swap,		am->tip,	sizeof(am->tip));
> > >  	memcpy(am->tip,		am->sip,	sizeof(am->tip));
> > > diff --git a/conf.c b/conf.c
> > > index f47f48e..0abdbf7 100644
> > > --- a/conf.c
> > > +++ b/conf.c
> > > @@ -2122,6 +2122,8 @@ void conf(struct ctx *c, int argc, char **argv)
> > >  		c->udp.fwd_out.mode = fwd_default;
> > >  
> > >  	fwd_scan_ports_init(c);
> > > +	if (fwd_mac_cache_init())
> > > +		die("Failed to initiate neighnor MAC cache");  
> > 
> > "neighnor"
> 
> Jon, by the way, we've been using BrE quite consistently throughout the
> codebase, so the two occurrences of "neighbour" (code comments only)
> have, well, a 'u' in them. I'd try to keep that consistency if it
> doesn't bother anybody.
> 
> -- 
> Stefano
> 

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

      reply	other threads:[~2025-08-25  1:55 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-20  3:09 [PATCH v4 0/9] Use true MAC address of LAN local remote hosts Jon Maloy
2025-08-20  3:09 ` [PATCH v4 1/9] netlink: add function to extract MAC addresses from NDP/ARP table Jon Maloy
2025-08-21  0:57   ` David Gibson
2025-08-20  3:09 ` [PATCH v4 2/9] arp/ndp: respond with true MAC address of LAN local remote hosts Jon Maloy
2025-08-21  1:18   ` David Gibson
2025-08-20  3:09 ` [PATCH v4 3/9] flow: add MAC address of LAN local remote hosts to flow Jon Maloy
2025-08-21  1:28   ` David Gibson
2025-08-20  3:10 ` [PATCH v4 4/9] udp: forward external source MAC address through tap interface Jon Maloy
2025-08-21  1:32   ` David Gibson
2025-08-20  3:10 ` [PATCH v4 5/9] tcp: " Jon Maloy
2025-08-21  1:37   ` David Gibson
2025-08-20  3:10 ` [PATCH v4 6/9] tap: change signature of function tap_push_l2h() Jon Maloy
2025-08-21  1:39   ` David Gibson
2025-08-20  3:10 ` [PATCH v4 7/9] tcp: make tcp_rst_no_conn() respond with correct MAC address Jon Maloy
2025-08-21  1:46   ` David Gibson
2025-08-20  3:10 ` [PATCH v4 8/9] icmp: let icmp use mac address from flowside structure Jon Maloy
2025-08-21  1:51   ` David Gibson
2025-08-20  3:10 ` [PATCH v4 9/9] fwd: Added cache table for ARP/NDP contents Jon Maloy
2025-08-21  2:03   ` David Gibson
2025-08-21 10:53     ` Stefano Brivio
2025-08-25  1:48       ` David Gibson [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aKvA6I06moqdNWUW@zatzit \
    --to=david@gibson.dropbear.id.au \
    --cc=dgibson@redhat.com \
    --cc=jmaloy@redhat.com \
    --cc=passt-dev@passt.top \
    --cc=sbrivio@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).