Date: Thu, 21 Aug 2025 12:53:36 +0200
From: Stefano Brivio
To: David Gibson
CC: Jon Maloy, dgibson@redhat.com, passt-dev@passt.top
Subject: Re: [PATCH v4 9/9] fwd: Added cache table for ARP/NDP contents
Message-ID: <20250821125336.0b8ef0dc@elisabeth>
References: <20250820031005.2725591-1-jmaloy@redhat.com> <20250820031005.2725591-10-jmaloy@redhat.com>
Organization: Red Hat
List-Id: Development discussion and patches for passt

On Thu, 21 Aug 2025 12:03:47 +1000
David Gibson wrote:

> On Tue, Aug 19, 2025 at 11:10:05PM -0400, Jon Maloy wrote:
> > We add a cache table to keep partial contents of the kernel ARP/NDP
> > tables. This way, we drastically reduce the number of netlink calls
> > to read those tables.
>
> Do you have data to suggest this is necessary?

I'll chime in as I originally suggested that we need this cache.
Without it, we'll have one netlink query for each local, non-loopback
flow being established, which sounds rather absurd (...am I missing
something?).

I haven't tested these changes yet but I suppose the usual tcp_crr test
should show the issue.

It's not just about TCP CRR latency though. We're adding this mostly
for use cases where some kind of LAN service is implemented by a
container (say, Pi-hole), and we can probably expect a ton of
short-lived TCP flows in those cases (say, DNS requests over TCP).

> It's a lot of code to optimise something only needed for some pretty
> uncommon cases.

Actually, it's a bit less code than I expected, but I don't understand
why you're assuming those cases are uncommon.

By the way, note that we should be able to get rid of most of this once
we implement a netlink monitor (which we need for other purposes),
because at that point we can also subscribe to ARP / neighbour table
changes.

> > We create dummy cache entries representing non-(not-yet)-existing
> > ARP/NDP entries when needed. We add a short expiration time to each
> > such entry, so that we can know when to make repeated calls to the
> > kernel tables in the beginning. We also add an access counter to the
> > entries, to ensure that the timer becomes longer and the call frequency
> > abates over time if no ARP/NDP entry is created.
> >
> > For regular entries we use a much longer timer, with the purpose to
> > update the entry in the rare case that a remote host changes its
> > MAC address.
> >
> > Signed-off-by: Jon Maloy
> > ---
> >  arp.c  |   3 +-
> >  conf.c |   2 +
> >  flow.c |   5 +-
> >  fwd.c  | 206 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  fwd.h  |   4 ++
> >  ndp.c  |   3 +-
> >  tcp.c  |   3 +-
> >  7 files changed, 218 insertions(+), 8 deletions(-)
> >
> > diff --git a/arp.c b/arp.c
> > index c37867a..040d4fe 100644
> > --- a/arp.c
> > +++ b/arp.c
> > @@ -29,7 +29,6 @@
> >  #include "dhcp.h"
> >  #include "passt.h"
> >  #include "tap.h"
> > -#include "netlink.h"
> >
> >  /**
> >   * arp() - Check if this is a supported ARP message, reply as needed
> > @@ -79,7 +78,7 @@ int arp(const struct ctx *c, const struct pool *p)
> >  	 */
> >  	inany_from_af(&tgt, AF_INET, am->tip);
> >  	if (!fwd_inany_nat(c, &tgt))
> > -		nl_neigh_mac_get(nl_sock, &tgt, c->ifi4, am->sha);
> > +		fwd_neigh_mac_get(c, &tgt, c->ifi4, am->sha);
> >
> >  	memcpy(swap, am->tip, sizeof(am->tip));
> >  	memcpy(am->tip, am->sip, sizeof(am->tip));
> > diff --git a/conf.c b/conf.c
> > index f47f48e..0abdbf7 100644
> > --- a/conf.c
> > +++ b/conf.c
> > @@ -2122,6 +2122,8 @@ void conf(struct ctx *c, int argc, char **argv)
> >  		c->udp.fwd_out.mode = fwd_default;
> >
> >  	fwd_scan_ports_init(c);
> > +	if (fwd_mac_cache_init())
> > +		die("Failed to initiate neighnor MAC cache");

"neighnor"

Jon, by the way, we've been using BrE quite consistently throughout the
codebase, so the two occurrences of "neighbour" (code comments only)
have, well, a 'u' in them. I'd try to keep that consistency if it
doesn't bother anybody.

-- 
Stefano
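
For reference, the expiry scheme described in the quoted commit message
could look roughly like the sketch below. It is not taken from the
patch: the structure, the function names and all timer values are
hypothetical. It only illustrates the idea of a placeholder entry whose
recheck interval doubles on every failed lookup, next to a long fixed
timer for resolved entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* All names and values below are hypothetical, for illustration only */
#define MAC_LEN		6
#define TTL_RESOLVED_S	60	/* long timer for entries with a known MAC */
#define TTL_MISSING_S	1	/* base timer for placeholder entries */
#define MISS_SHIFT_MAX	5	/* cap the backoff at TTL_MISSING_S << 5 */

struct mac_cache_entry {
	uint8_t mac[MAC_LEN];	/* valid only if resolved is set */
	bool resolved;		/* false: placeholder for a missing entry */
	unsigned int misses;	/* failed kernel lookups so far */
	time_t expiry;		/* re-query the kernel at or after this time */
};

/* Lifetime for an entry: fixed if resolved, doubling per miss otherwise */
static time_t entry_ttl(const struct mac_cache_entry *e)
{
	unsigned int shift;

	if (e->resolved)
		return TTL_RESOLVED_S;

	shift = e->misses < MISS_SHIFT_MAX ? e->misses : MISS_SHIFT_MAX;
	return (time_t)TTL_MISSING_S << shift;
}

/* True once the cached answer is too old to be trusted */
static bool entry_expired(const struct mac_cache_entry *e, time_t now)
{
	return now >= e->expiry;
}

/* Store the result of a kernel lookup; mac == NULL means "no neighbour" */
static void entry_update(struct mac_cache_entry *e, const uint8_t *mac,
			 time_t now)
{
	assert(e);

	if (mac) {
		memcpy(e->mac, mac, MAC_LEN);
		e->resolved = true;
		e->misses = 0;
	} else {
		e->resolved = false;
	}

	e->expiry = now + entry_ttl(e);

	/* Count the miss after computing the TTL, so the first retry
	 * happens after the base interval
	 */
	if (!mac)
		e->misses++;
}
```

A caller would consult entry_expired() before each use and only fall
back to a netlink query, followed by entry_update(), when it returns
true. With these made-up values, a neighbour that never shows up is
re-queried after 1, 2, 4, ... and then every 32 seconds.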