From: David Gibson <david@gibson.dropbear.id.au>
To: Jon Maloy <jmaloy@redhat.com>
Cc: sbrivio@redhat.com, passt-dev@passt.top
Subject: Re: [PATCH v7 08/13] conf, pasta: Track observed guest IPv4 addresses in unified address array
Date: Wed, 27 May 2026 12:46:43 +1000 [thread overview]
Message-ID: <ahZbE8VbZog_95Sl@zatzit> (raw)
In-Reply-To: <20260413005319.3295910-9-jmaloy@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 14053 bytes --]
On Sun, Apr 12, 2026 at 08:53:14PM -0400, Jon Maloy wrote:
> We remove the addr_seen field in struct ip4_ctx and replace it by
> setting a new CONF_ADDR_OBSERVED flag in the corresponding entry
> in the unified address array.
>
> The observed IPv4 address is always added at or moved to position 0,
> increasing chances for a fast lookup.
>
> Signed-off-by: Jon Maloy <jmaloy@redhat.com>
>
> ---
> v4: - Removed migration protocol update, to be added in later commit
> - Allow only one OBSERVED address at a time
> - Some other changes based on feedback from David G
> v5: - Allowing multiple observed IPv4 addresses
> v6: - Refactored fwd_set_addr(), notably:
> o Limited number of allowed observed addresses to four per protocol
> o I kept the memmove() calls, since I find no more elegant way to
> do this. Performance cost should be minimal, since these parts
> of the code will execute only very exceptionally. Note that
> removing the 'oldest' entry implicitly means removing the least
> used one, since the latter will migrate to the highest position
> after a few iterations of remove/add.
> o Also kept the prefix_len update. Not sure about this, but I
> cannot see how the current approach can cause any harm.
> - Other changes suggested by David G, notably reversing some
> residues after an accidental merge/re-split with the next
> commit.
> v7: - Changed fwd_set_addr() to only accept keeping one observed-only
> address per protocol, as suggested by David.
> - Eliminated redundant tap_check_src_addr4() call level.
> - I keep fwd_select_addr() for the same pragmatic reason it was
> introduced: to avoid ugly, deeply indented code that tends
> to wrap across several lines.
> ---
> conf.c | 6 ---
> fwd.c | 124 +++++++++++++++++++++++++++++++++++++++++++++++-------
> fwd.h | 4 ++
> migrate.c | 17 +++++++-
> passt.h | 6 +--
> tap.c | 8 +++-
> 6 files changed, 136 insertions(+), 29 deletions(-)
>
> diff --git a/conf.c b/conf.c
> index 924ade2..f503d0f 100644
> --- a/conf.c
> +++ b/conf.c
> @@ -767,13 +767,8 @@ static unsigned int conf_ip4(struct ctx *c, unsigned int ifi)
> }
> if (!rc || !fwd_get_addr(c, AF_INET, 0, 0))
> return 0;
> -
> - a = fwd_get_addr(c, AF_INET, CONF_ADDR_HOST, 0);
> }
>
> - if (a)
> - ip4->addr_seen = *inany_v4(&a->addr);
> -
> ip4->our_tap_addr = ip4->guest_gw;
>
> return ifi;
> @@ -787,7 +782,6 @@ static void conf_ip4_local(struct ctx *c)
> {
> struct ip4_ctx *ip4 = &c->ip4;
>
> - ip4->addr_seen = IP4_LL_GUEST_ADDR;
> ip4->our_tap_addr = ip4->guest_gw = IP4_LL_GUEST_GW;
> ip4->no_copy_addrs = ip4->no_copy_routes = true;
> fwd_set_addr(c, &inany_from_v4(IP4_LL_GUEST_ADDR),
> diff --git a/fwd.c b/fwd.c
> index d3f576a..8c7bf91 100644
> --- a/fwd.c
> +++ b/fwd.c
> @@ -28,6 +28,7 @@
> #include "inany.h"
> #include "fwd.h"
> #include "passt.h"
> +#include "conf.h"
> #include "lineread.h"
> #include "flow_table.h"
> #include "netlink.h"
> @@ -260,21 +261,68 @@ void fwd_neigh_table_init(const struct ctx *c)
> void fwd_set_addr(struct ctx *c, const union inany_addr *addr,
> uint8_t flags, int prefix_len)
> {
> - struct guest_addr *a;
> + struct guest_addr *a, *arr = &c->addrs[0], *rm = NULL;
> + int count = c->addr_count;
> + int af_cnt = 0;
>
> - for_each_addr(a, c->addrs, c->addr_count, inany_af(addr)) {
> - goto found;
> + for_each_addr(a, c->addrs, c->addr_count, AF_UNSPEC) {
> + if (!inany_equals(&a->addr, addr))
> + continue;
> +
> + /* Adjust and update prefix_len if provided and applicable */
> + if (prefix_len && !(a->flags & CONF_ADDR_USER))
> + a->prefix_len = inany_prefix_len(addr, prefix_len);
Converting the format of the prefix length here doesn't make sense to
me. Both addr and a->addr are inanys, so both prefix_len and
a->prefix_len should already be in IPv6 format.
> +
> + /* Nothing more to change */
> + if ((a->flags & flags) == flags)
> + return;
> +
> + a->flags |= flags;
> + if (!(flags & CONF_ADDR_OBSERVED))
> + return;
> +
> + /* Observed address moves to position 0: remove, re-add later */
> + prefix_len = a->prefix_len;
> + memmove(a, a + 1, (&arr[count] - (a + 1)) * sizeof(*a));
If the address is already in an early slot, this will move most of the
table, and the code below will move it mostly back again. That seems
not ideal.
> + c->addr_count = --count;
Having the local count shadown c->addr_count leads to kind of
confusing resynchronization like this. I'd be inclined to just use
c->addr_count directly, and rely on the compiler to optimise it.
> + break;
> }
>
> - if (c->addr_count >= MAX_GUEST_ADDRS)
> + if (count >= MAX_GUEST_ADDRS) {
> + debug("Address table full, can't add address");
> return;
> + }
>
> - a = &c->addrs[c->addr_count++];
> -
> -found:
> + /* Add to head or tail, depending on flag */
> + if (flags & CONF_ADDR_OBSERVED) {
> + a = &arr[0];
> + memmove(&arr[1], a, count * sizeof(*a));
> + } else {
> + a = &arr[count];
> + }
> + c->addr_count = ++count;
> a->addr = *addr;
> a->prefix_len = inany_prefix_len6(addr, prefix_len);
Again, a conversion should not be necessary here. Much less one
that's different from the case above.
> a->flags = flags;
> +
> + if (!(flags & CONF_ADDR_OBSERVED))
> + return;
> +
> + /* Remove excess observed-only address if more than one */
> + for (int i = count - 1; i >= 0; i--) {
> + a = &arr[i];
> + if (inany_af(&a->addr) != inany_af(addr))
> + continue;
> + if (a->flags != CONF_ADDR_OBSERVED)
> + continue;
> + if (!rm)
> + rm = a;
> + af_cnt++;
> + }
As we've discussed, an address should only be removed if it is
*observed only*. If there's an existing, different address with
OBSERVED | USER, for example, we should just remove its OBSERVED bit
but leave the address itself.
> + if (af_cnt > 1) {
> + memmove(rm, rm + 1, (&arr[count] - (rm + 1)) * sizeof(*rm));
> + c->addr_count--;
> + }
> }
>
> /**
> @@ -985,6 +1033,38 @@ static bool is_dns_flow(uint8_t proto, const struct flowside *ini)
> ((ini->oport == 53) || (ini->oport == 853));
> }
>
> +/**
> + * fwd_select_addr() - Select address with priority-based search
> + * @c: Execution context
> + * @af: Address family (AF_INET or AF_INET6)
> + * @primary: Primary flags to match (or 0 to skip)
> + * @secondary: Secondary flags to match (or 0 to skip)
> + * @skip: Flags to exclude from search
> + *
> + * Search for address entries in priority order.
> + *
> + * Return: pointer to selected address entry, or NULL if none found
> + */
> +const struct guest_addr *fwd_select_addr(const struct ctx *c, int af,
> + int primary, int secondary, int skip)
> +{
> + const struct guest_addr *a;
> +
> + if (primary) {
Why is it useful to allow skipping the primary search?
> + a = fwd_get_addr(c, af, primary, skip);
> + if (a)
> + return a;
> + }
> +
> + if (secondary) {
> + a = fwd_get_addr(c, af, secondary, skip);
> + if (a)
> + return a;
> + }
> +
> + return NULL;
> +}
> +
> /**
> * fwd_guest_accessible() - Is address guest-accessible
> * @c: Execution context
> @@ -1014,11 +1094,6 @@ static bool fwd_guest_accessible(const struct ctx *c,
> if (inany_equals(addr, &a->addr))
> return false;
> }
> - /* Also check addr_seen: it tracks the address the guest is actually
> - * using, which may differ from configured addresses.
> - */
> - if (inany_equals4(addr, &c->ip4.addr_seen))
> - return false;
>
> /* For IPv6, addr_seen starts unspecified, because we don't know what LL
> * address the guest will take until we see it. Only check against it
> @@ -1214,10 +1289,20 @@ uint8_t fwd_nat_from_host(const struct ctx *c,
> * match.
> */
> if (inany_v4(&ini->eaddr)) {
> - if (c->host_lo_to_ns_lo)
> + if (c->host_lo_to_ns_lo) {
> tgt->eaddr = inany_loopback4;
> - else
> - tgt->eaddr = inany_from_v4(c->ip4.addr_seen);
> + } else {
> + const struct guest_addr *a;
> +
> + a = fwd_select_addr(c, AF_INET,
> + CONF_ADDR_OBSERVED,
> + CONF_ADDR_USER |
> + CONF_ADDR_HOST, 0);
Isn't the two stage lookup redundant with forcing OBSERVED addresses
into the first slot? If you just search for any address of the right
family, it will necessarily find the OBSERVED one first, yes?
> + if (!a)
> + return PIF_NONE;
> +
> + tgt->eaddr = a->addr;
> + }
> tgt->oaddr = inany_any4;
> } else {
> if (c->host_lo_to_ns_lo)
> @@ -1252,7 +1337,14 @@ uint8_t fwd_nat_from_host(const struct ctx *c,
> tgt->oport = ini->eport;
>
> if (inany_v4(&tgt->oaddr)) {
> - tgt->eaddr = inany_from_v4(c->ip4.addr_seen);
> + const struct guest_addr *a;
> +
> + a = fwd_select_addr(c, AF_INET, CONF_ADDR_OBSERVED,
> + CONF_ADDR_USER | CONF_ADDR_HOST, 0);
> + if (!a)
> + return PIF_NONE;
> +
> + tgt->eaddr = a->addr;
> } else {
> if (inany_is_linklocal6(&tgt->oaddr))
> tgt->eaddr.a6 = c->ip6.addr_ll_seen;
> diff --git a/fwd.h b/fwd.h
> index c5a1068..9893856 100644
> --- a/fwd.h
> +++ b/fwd.h
> @@ -25,6 +25,10 @@ void fwd_probe_ephemeral(void);
> bool fwd_port_is_ephemeral(in_port_t port);
> const struct guest_addr *fwd_get_addr(const struct ctx *c, sa_family_t af,
> uint8_t incl, uint8_t excl);
> +const struct guest_addr *fwd_select_addr(const struct ctx *c, int af,
> + int primary, int secondary, int skip);
> +void fwd_set_addr(struct ctx *c, const union inany_addr *addr,
> + uint8_t flags, int prefix_len);
>
> /**
> * struct fwd_rule - Forwarding rule governing a range of ports
> diff --git a/migrate.c b/migrate.c
> index 1e8858a..1e02720 100644
> --- a/migrate.c
> +++ b/migrate.c
> @@ -18,6 +18,8 @@
> #include "util.h"
> #include "ip.h"
> #include "passt.h"
> +#include "conf.h"
> +#include "fwd.h"
> #include "inany.h"
> #include "flow.h"
> #include "flow_table.h"
> @@ -57,11 +59,18 @@ static int seen_addrs_source_v2(struct ctx *c,
> struct migrate_seen_addrs_v2 addrs = {
> .addr6 = c->ip6.addr_seen,
> .addr6_ll = c->ip6.addr_ll_seen,
> - .addr4 = c->ip4.addr_seen,
> };
> + const struct guest_addr *a;
>
> (void)stage;
>
> + /* IPv4 observed address, with fallback to configured address */
> + a = fwd_select_addr(c, AF_INET, CONF_ADDR_OBSERVED,
> + CONF_ADDR_USER | CONF_ADDR_HOST,
> + CONF_ADDR_LINKLOCAL);
I don't think we want to exclude LINKLOCAL here. In the (unlikely for
IPv4) case that addr_seen was linklocal, we would have transported it
as is previously.
> + if (a)
> + addrs.addr4 = *inany_v4(&a->addr);
> +
> memcpy(addrs.mac, c->guest_mac, sizeof(addrs.mac));
>
> if (write_all_buf(fd, &addrs, sizeof(addrs)))
> @@ -90,7 +99,11 @@ static int seen_addrs_target_v2(struct ctx *c,
>
> c->ip6.addr_seen = addrs.addr6;
> c->ip6.addr_ll_seen = addrs.addr6_ll;
> - c->ip4.addr_seen = addrs.addr4;
> +
> + if (addrs.addr4.s_addr)
Use IN4_IS_ADDR_UNSPECIFIED, rather than looking into the s_addr
field, please.
> + fwd_set_addr(c, &inany_from_v4(addrs.addr4),
> + CONF_ADDR_OBSERVED, 0);
> +
> memcpy(c->guest_mac, addrs.mac, sizeof(c->guest_mac));
>
> return 0;
> diff --git a/passt.h b/passt.h
> index f75656d..5da1d55 100644
> --- a/passt.h
> +++ b/passt.h
> @@ -64,8 +64,9 @@ enum passt_modes {
> MODE_VU,
> };
>
> -/* Maximum number of addresses in context address array */
> +/* Limits on number of addresses in context address array */
> #define MAX_GUEST_ADDRS 32
> +#define MAX_OBSERVED_ADDRS 4
Leftover from earlier versions?
>
> /**
> * struct guest_addr - Unified IPv4/IPv6 address entry
> @@ -81,11 +82,11 @@ struct guest_addr {
> #define CONF_ADDR_HOST BIT(1) /* From host interface */
> #define CONF_ADDR_GENERATED BIT(2) /* Generated by PASST/PASTA */
> #define CONF_ADDR_LINKLOCAL BIT(3) /* Link-local address */
> +#define CONF_ADDR_OBSERVED BIT(4) /* Seen in guest traffic */
> };
>
> /**
> * struct ip4_ctx - IPv4 execution context
> - * @addr_seen: Latest IPv4 address seen as source from tap
> * @guest_gw: IPv4 gateway as seen by the guest
> * @map_host_loopback: Outbound connections to this address are NATted to the
> * host's 127.0.0.1
> @@ -101,7 +102,6 @@ struct guest_addr {
> * @no_copy_addrs: Don't copy all addresses when configuring namespace
> */
> struct ip4_ctx {
> - struct in_addr addr_seen;
> struct in_addr guest_gw;
> struct in_addr map_host_loopback;
> struct in_addr map_guest_addr;
> diff --git a/tap.c b/tap.c
> index eb93f74..7f04e12 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -47,6 +47,7 @@
> #include "ip.h"
> #include "iov.h"
> #include "passt.h"
> +#include "fwd.h"
> #include "arp.h"
> #include "dhcp.h"
> #include "ndp.h"
> @@ -756,9 +757,12 @@ resume:
> continue;
> }
>
> - if (iph->saddr && c->ip4.addr_seen.s_addr != iph->saddr)
> - c->ip4.addr_seen.s_addr = iph->saddr;
> + if (iph->saddr) {
> + const union inany_addr *addr;
>
> + addr = &inany_from_v4(*(struct in_addr *) &iph->saddr);
> + fwd_set_addr(c, addr, CONF_ADDR_OBSERVED, 0);
This is called on essentially every packet. I'm a bit concerned that
even with the optimisations already implemented this might get
noticeably expensive.
> + }
> if (!iov_drop_header(&data, hlen))
> continue;
> if (iov_tail_size(&data) != l4len)
> --
> 2.52.0
>
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2026-05-27 3:26 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-13 0:53 [PATCH v7 00/13] Introduce multiple addresses and late binding Jon Maloy
2026-04-13 0:53 ` [PATCH v7 01/13] dhcpv6: Fix reply destination to match client's source address Jon Maloy
2026-05-14 5:21 ` David Gibson
2026-04-13 0:53 ` [PATCH v7 02/13] passt, pasta: Introduce unified multi-address data structures Jon Maloy
2026-05-14 6:30 ` David Gibson
2026-05-14 23:28 ` Stefano Brivio
2026-05-25 9:35 ` David Gibson
2026-04-13 0:53 ` [PATCH v7 03/13] fwd: Unify guest accessibility checks with unified address array Jon Maloy
2026-05-25 9:38 ` David Gibson
2026-04-13 0:53 ` [PATCH v7 04/13] arp: Check all configured addresses in ARP filtering Jon Maloy
2026-04-13 0:53 ` [PATCH v7 05/13] conf: Allow multiple -a/--address options per address family Jon Maloy
2026-05-25 9:47 ` David Gibson
2026-04-13 0:53 ` [PATCH v7 06/13] netlink, conf: Read all addresses from template interface at startup Jon Maloy
2026-04-13 0:53 ` [PATCH v7 07/13] netlink, pasta: refactor function pasta_ns_conf() Jon Maloy
2026-05-26 1:58 ` David Gibson
2026-04-13 0:53 ` [PATCH v7 08/13] conf, pasta: Track observed guest IPv4 addresses in unified address array Jon Maloy
2026-05-27 2:46 ` David Gibson [this message]
2026-04-13 0:53 ` [PATCH v7 09/13] conf, pasta: Track observed guest IPv6 " Jon Maloy
2026-05-27 3:40 ` David Gibson
2026-04-13 0:53 ` [PATCH v7 10/13] migrate: Update protocol to v3 for multi-address support Jon Maloy
2026-05-27 3:55 ` David Gibson
2026-04-13 0:53 ` [PATCH v7 11/13] dhcp: Select address for DHCP distribution Jon Maloy
2026-05-27 4:30 ` David Gibson
2026-04-13 0:53 ` [PATCH v7 12/13] dhcpv6: Select addresses for DHCPv6 distribution Jon Maloy
2026-05-27 4:40 ` David Gibson
2026-04-13 0:53 ` [PATCH v7 13/13] ndp: Support advertising multiple prefixes in Router Advertisements Jon Maloy
2026-05-27 4:52 ` David Gibson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ahZbE8VbZog_95Sl@zatzit \
--to=david@gibson.dropbear.id.au \
--cc=jmaloy@redhat.com \
--cc=passt-dev@passt.top \
--cc=sbrivio@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).