From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202602 header.b=EZzpNief; dkim-atps=neutral Received: from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by passt.top (Postfix) with ESMTPS id D5B7D5A0272 for ; Tue, 03 Mar 2026 06:28:30 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202602; t=1772515706; bh=jLm23/wEDhYWEYYCCmN0ICNAjWzMekDQ1HJIaejvKA0=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=EZzpNiefMYtbG7Z9X6jw0xcNIp1gC+lkXj2ELKp7EKwklJt4VIeuhCprVlR2AXVsk 5HCGLYhSy150WwJhhqxyWpADI8tZV6i/TzyifSDmwINJiUUQL+7W78ppRu7TrLkR6D uV96WCh4gWAjyau5o/wrN7gFz75VTsWrJrEBPo1+ybHFPUYDSgvfP7J+S+p/NPnCM1 z+gpiikmFwCV1gz1qDbokfL38xIxBgBoIAVnMEASU0u14CgmV/YY9/wpgAqzDJFZb+ sgcsWabsLiEvk6HtTNIGXKGvj61df4V3V0BlTnX8jWdQctqgT3Z4nmt7aaLP6yUadL 8kmI4WrFzoNjg== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4fQ48V2B40z4wB4; Tue, 03 Mar 2026 16:28:26 +1100 (AEDT) Date: Tue, 3 Mar 2026 16:26:21 +1100 From: David Gibson To: Jon Maloy Subject: Re: [PATCH v5 11/13] dhcp, dhcpv6: Select addresses for DHCP distribution Message-ID: References: <20260222174445.743845-1-jmaloy@redhat.com> <20260222174445.743845-12-jmaloy@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="TlBxzn6TIFYmNvjt" Content-Disposition: inline In-Reply-To: <20260222174445.743845-12-jmaloy@redhat.com> Message-ID-Hash: CKLUMXWYJT7WDZH3ACVEIWAUSHISBE4C X-Message-ID-Hash: CKLUMXWYJT7WDZH3ACVEIWAUSHISBE4C X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: sbrivio@redhat.com, dgibson@redhat.com, passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --TlBxzn6TIFYmNvjt Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Sun, Feb 22, 2026 at 12:44:43PM -0500, Jon Maloy wrote: > We update DHCP to select and mark the best address to advertise among > the potentially multiple addresses in the address array. >=20 > We also extend DHCPv6 to advertise all suitable IPv6 addresses > (excluding observed and link-local addresses) in a single reply > message, per RFC 8415. >=20 > Finally, we update conf_print() to reflect above changes. >=20 > Signed-off-by: Jon Maloy >=20 > --- > v5: - Replaced fwd_first_usable_addr() with new helper functions > to support these changes. > - Broke out NDP part to a separate commit > --- > conf.c | 73 +++++++++++++++++++++++++++++++++++------------------- > conf.h | 1 + > dhcp.c | 25 ++++++++++++------- > dhcp.h | 2 +- > dhcpv6.c | 75 ++++++++++++++++++++++++++++++++++++++++++++------------ > 5 files changed, 126 insertions(+), 50 deletions(-) >=20 > diff --git a/conf.c b/conf.c > index f622424..ea6dc07 100644 > --- a/conf.c > +++ b/conf.c > @@ -47,6 +47,7 @@ > #include "lineread.h" > #include "isolation.h" > #include "log.h" > +#include "fwd.h" > #include "vhost_user.h" > =20 > #define NETNS_RUN_DIR "/run/netns" > @@ -1116,11 +1117,12 @@ enum passt_modes conf_mode(int argc, char *argv[]) > static void conf_print(const struct ctx *c) > { > char buf4[INET_ADDRSTRLEN], buf6[INET6_ADDRSTRLEN]; > - char bufmac[ETH_ADDRSTRLEN], ifn[IFNAMSIZ]; > - struct inany_addr_entry *e; > + char bufmac[ETH_ADDRSTRLEN]; > int i; > =20 > if (c->ifi4 > 0 || c->ifi6 > 0) { > + char ifn[IFNAMSIZ]; > + > info("Template interface: %s%s%s%s%s", > c->ifi4 > 0 ? if_indextoname(c->ifi4, ifn) : "", > c->ifi4 > 0 ? " (IPv4)" : "", > @@ -1161,21 +1163,23 @@ static void conf_print(const struct ctx *c) > inet_ntop(AF_INET, &c->ip4.map_host_loopback, > buf4, sizeof(buf4))); > =20 > - e =3D first_v4(c); > - if (e && !c->no_dhcp) { > - uint32_t mask; > - > - mask =3D IN4_MASK(inany_prefix4(e)); > - > - info("DHCP:"); > - info(" assign: %s", > - inet_ntop(AF_INET, inany_v4(&e->addr), > - buf4, sizeof(buf4))); > - info(" mask: %s", > - inet_ntop(AF_INET, &mask, buf4, sizeof(buf4))); > - info(" router: %s", > - inet_ntop(AF_INET, &c->ip4.guest_gw, > - buf4, sizeof(buf4))); > + if (!c->no_dhcp) { > + const struct inany_addr_entry *e; > + > + e =3D fwd_select_addr(c, AF_INET, CONF_ADDR_USER, > + CONF_ADDR_HOST, 0, CONF_ADDR_OBSERVED); The CONF_ADDR_OBSERVED exclusion won't do quite what you want: typically after the guest has been up for a little time, address will have USER|OBSERVED, or HOST|OBSERVED. If we get another round of DHCP, those addresses will be excluded. Instead you only want to exclude addresses that *don't* have USER or HOST. This also duplicates the logic / options from the DHCP code itself, which worries me for the possibility of it getting out of sync. I'd prefer to see a single more or less parameter-less entry point that selects the DHCP published addresss and use it both here and in the DHCP code. > + if (e) { > + uint32_t mask =3D IN4_MASK(inany_prefix4(e)); > + > + info("DHCP:"); > + info(" assign: %s", > + inany_ntop(&e->addr, buf4, sizeof(buf4))); > + info(" mask: %s", > + inet_ntop(AF_INET, &mask, buf4, sizeof(buf4))); > + info(" router: %s", > + inet_ntop(AF_INET, &c->ip4.guest_gw, > + buf4, sizeof(buf4))); > + } > } > =20 > for (i =3D 0; i < ARRAY_SIZE(c->ip4.dns); i++) { > @@ -1209,14 +1213,33 @@ static void conf_print(const struct ctx *c) > else > goto dns6; > =20 > - e =3D first_v6(c); > - info(" assign: %s", !e ? "" : > - inet_ntop(AF_INET6, &e->addr.a6, buf6, sizeof(buf6))); > - info(" router: %s", > - inet_ntop(AF_INET6, &c->ip6.guest_gw, buf6, sizeof(buf6))); > - info(" our link-local: %s", > - inet_ntop(AF_INET6, &c->ip6.our_tap_ll, > - buf6, sizeof(buf6))); > + { > + int skip =3D CONF_ADDR_OBSERVED | CONF_ADDR_LINKLOCAL; > + const struct inany_addr_entry *e; > + int first =3D 1; > + > + for_each_addr(e, c, AF_INET6) { > + if (e->flags & skip) > + continue; As for v4 this will incorrectly skip addresses with are OBSERVED as well as USER or HOST. > + /* NDP requires /64 for SLAAC */ > + if (!c->no_ndp && e->prefix_len !=3D 64) > + continue; Hmm... I'm pretty sure DHCPv6 can assign addresses with different prefix_len, even if though NDP can't. > + info(" %s: %s/%d", > + first ? "assign" : " ", > + inany_ntop(&e->addr, buf6, sizeof(buf6)), > + e->prefix_len); > + first =3D 0; > + } > + if (!first) { Do we require an assigned address to advertise these other things? I'm not sure we do. > + info(" router: %s", > + inet_ntop(AF_INET6, &c->ip6.guest_gw, > + buf6, sizeof(buf6))); > + info(" our link-local: %s", > + inet_ntop(AF_INET6, &c->ip6.our_tap_ll, > + buf6, sizeof(buf6))); In fact for this one, I know we don't. > + } > + } > =20 > dns6: > for (i =3D 0; i < ARRAY_SIZE(c->ip6.dns); i++) { > diff --git a/conf.h b/conf.h > index 8b10ac6..b7d5fd2 100644 > --- a/conf.h > +++ b/conf.h > @@ -13,6 +13,7 @@ > #define CONF_ADDR_HOST BIT(1) /* From host interface */ > #define CONF_ADDR_LINKLOCAL BIT(2) /* Link-local address */ > #define CONF_ADDR_OBSERVED BIT(3) /* Seen in guest traffic */ > +#define CONF_ADDR_DHCP BIT(4) /* IPv4: assigned via DHCP */ Re: comment on previous migration patch. This is a case in point - you subtly altered the migration stream format by adding this flag. > =20 > enum passt_modes conf_mode(int argc, char *argv[]); > void conf(struct ctx *c, int argc, char **argv); > diff --git a/dhcp.c b/dhcp.c > index af473ee..b7eeac1 100644 > --- a/dhcp.c > +++ b/dhcp.c > @@ -31,6 +31,8 @@ > #include "passt.h" > #include "tap.h" > #include "log.h" > +#include "fwd.h" > +#include "conf.h" > #include "dhcp.h" > =20 > /** > @@ -300,23 +302,22 @@ static void opt_set_dns_search(const struct ctx *c,= size_t max_len) > * > * Return: 0 if it's not a DHCP message, 1 if handled, -1 on failure > */ > -int dhcp(const struct ctx *c, struct iov_tail *data) > +int dhcp(struct ctx *c, struct iov_tail *data) > { > - struct inany_addr_entry *e =3D first_v4(c); > + struct in_addr addr, mask, dst; > char macstr[ETH_ADDRSTRLEN]; > + struct inany_addr_entry *e; > size_t mlen, dlen, opt_len; > - struct in_addr mask, dst; > + struct udphdr uh_storage; > struct ethhdr eh_storage; > struct iphdr iph_storage; > - struct udphdr uh_storage; > + const struct udphdr *uh; > const struct ethhdr *eh; > const struct iphdr *iph; > - const struct udphdr *uh; > struct msg m_storage; > struct msg const *m; > - struct in_addr addr; > struct msg reply; > - unsigned int i; > + int i; > =20 > eh =3D IOV_REMOVE_HEADER(data, eh_storage); > iph =3D IOV_PEEK_HEADER(data, iph_storage); > @@ -346,7 +347,13 @@ int dhcp(const struct ctx *c, struct iov_tail *data) > m->op !=3D BOOTREQUEST) > return -1; > =20 > - ASSERT(e); > + /* Select address to offer */ > + e =3D fwd_select_addr(c, AF_INET, CONF_ADDR_DHCP, > + CONF_ADDR_USER, CONF_ADDR_HOST, CONF_ADDR_OBSERVED); Again, this incorrectly discards addresses that are OBSERVED as well as USER or HOST. > + if (!e) > + return -1; > + > + e->flags |=3D CONF_ADDR_DHCP; I'm assuming the DHCP flag is so that we keep offering the same "primary" address, even if others are added. I'm not entirely sure how much that matters. But it also makes me think of another way to handle address stability. Although we have to pick a single address to put in DHCPOFFER, we should probably allow the guest to DHCPREQUEST any of the addresses in our list (i.e. respond with DHCPACK). That means that even if we change the "first choice" address we offer, the guest's dhcp client can still renew its lease on older but still valid addresses That does, alas, make our DHCP implementation more complex than it is now. > addr =3D *inany_v4(&e->addr); > =20 > reply.op =3D BOOTREPLY; > @@ -409,7 +416,7 @@ int dhcp(const struct ctx *c, struct iov_tail *data) > =20 > info(" from %s", eth_ntop(m->chaddr, macstr, sizeof(macstr))); > =20 > - mask.s_addr =3D IN4_MASK(inany_prefix4(e)); > + mask.s_addr =3D e ? IN4_MASK(inany_prefix4(e)) : 0; If I'm reading the code above correctly, we can't get here with !e, can we? > memcpy(opts[1].s, &mask, sizeof(mask)); > memcpy(opts[3].s, &c->ip4.guest_gw, sizeof(c->ip4.guest_gw)); > memcpy(opts[54].s, &c->ip4.our_tap_addr, sizeof(c->ip4.our_tap_addr)); > diff --git a/dhcp.h b/dhcp.h > index cd50c99..7326c7d 100644 > --- a/dhcp.h > +++ b/dhcp.h > @@ -6,7 +6,7 @@ > #ifndef DHCP_H > #define DHCP_H > =20 > -int dhcp(const struct ctx *c, struct iov_tail *data); > +int dhcp(struct ctx *c, struct iov_tail *data); > void dhcp_init(void); > =20 > #endif /* DHCP_H */ > diff --git a/dhcpv6.c b/dhcpv6.c > index ba200e5..3bc2e6d 100644 > --- a/dhcpv6.c > +++ b/dhcpv6.c > @@ -31,6 +31,8 @@ > #include "passt.h" > #include "tap.h" > #include "log.h" > +#include "fwd.h" > +#include "conf.h" > =20 > /** > * struct opt_hdr - DHCPv6 option header > @@ -207,7 +209,7 @@ struct msg_hdr { > * @hdr: DHCP message header > * @server_id: Server Identifier option > * @ia_na: Non-temporary Address option > - * @ia_addr: Address for IA_NA > + * @ia_addr: Addresses for IA_NA (variable, up to INANY_MAX_ADDRS) > * @client_id: Client Identifier, variable length > * @dns_servers: DNS Recursive Name Server, here just for storage size > * @dns_search: Domain Search List, here just for storage size > @@ -218,7 +220,7 @@ static struct resp_t { > =20 > struct opt_server_id server_id; > struct opt_ia_na ia_na; > - struct opt_ia_addr ia_addr; > + struct opt_ia_addr ia_addr[INANY_MAX_ADDRS]; Uh.. this seems dubious. We can leave lots of (potentially) unused slots in an internal array, but not in an on-the-wire packet format. > struct opt_client_id client_id; > struct opt_dns_servers dns_servers; > struct opt_dns_search dns_search; > @@ -227,15 +229,11 @@ static struct resp_t { > { 0 }, > SERVER_ID, > =20 > - { { OPT_IA_NA, OPT_SIZE_CONV(sizeof(struct opt_ia_na) + > - sizeof(struct opt_ia_addr) - > - sizeof(struct opt_hdr)) }, > + { { OPT_IA_NA, 0 }, /* Length set dynamically */ > 1, (uint32_t)~0U, (uint32_t)~0U > }, > =20 > - { { OPT_IAAADR, OPT_SIZE(ia_addr) }, > - IN6ADDR_ANY_INIT, (uint32_t)~0U, (uint32_t)~0U > - }, > + { { { 0 }, IN6ADDR_ANY_INIT, 0, 0 } }, /* IA_ADDRs filled dynamically = */ > =20 > { { OPT_CLIENTID, 0, }, > { 0 } > @@ -539,6 +537,44 @@ static size_t dhcpv6_client_fqdn_fill(const struct i= ov_tail *data, > return offset + sizeof(struct opt_hdr) + opt_len; > } > =20 > +/** > + * dhcpv6_ia_addr_fill() - Fill IA_ADDR options for all suitable address= es > + * @c: Execution context > + * > + * Fills resp.ia_addr[] with all non-linklocal, non-observed addresses a= nd > + * updates resp.ia_na.hdr.l with the correct length. > + * > + * Return: number of addresses filled > + */ > +static int dhcpv6_ia_addr_fill(const struct ctx *c) > +{ > + int skip =3D CONF_ADDR_OBSERVED | CONF_ADDR_LINKLOCAL; > + const struct inany_addr_entry *e; > + int count =3D 0; > + > + for_each_addr(e, c, AF_INET6) { > + if (e->flags & skip) > + continue; Again, incorrectly skipping USER|OBSERVED and HOST|OBSERVED entries. > + if (count >=3D INANY_MAX_ADDRS) > + break; > + > + resp.ia_addr[count].hdr.t =3D OPT_IAAADR; > + resp.ia_addr[count].hdr.l =3D htons(sizeof(struct opt_ia_addr) - > + sizeof(struct opt_hdr)); > + resp.ia_addr[count].addr =3D e->addr.a6; > + resp.ia_addr[count].pref_lifetime =3D (uint32_t)~0U; > + resp.ia_addr[count].valid_lifetime =3D (uint32_t)~0U; > + count++; > + } > + > + /* Update IA_NA length: header fields + all IA_ADDRs */ > + resp.ia_na.hdr.l =3D htons(sizeof(struct opt_ia_na) - > + sizeof(struct opt_hdr) + > + count * sizeof(struct opt_ia_addr)); > + return count; > +} > + > /** > * dhcpv6() - Check if this is a DHCPv6 message, reply as needed > * @c: Execution context > @@ -552,7 +588,6 @@ int dhcpv6(struct ctx *c, struct iov_tail *data, > const struct in6_addr *daddr) > { > const struct opt_server_id *server_id =3D NULL; > - struct inany_addr_entry *e =3D first_v6(c); > const struct opt_hdr *client_id =3D NULL; > /* The _storage variables can't be local to the blocks they're used in, > * because IOV_*_HEADER() may return pointers to them which are > @@ -567,11 +602,14 @@ int dhcpv6(struct ctx *c, struct iov_tail *data, > struct opt_hdr client_id_storage; > /* cppcheck-suppress [variableScope,unmatchedSuppression] */ > struct opt_ia_na ia_storage; > + struct in6_addr *ia_addr =3D NULL; > + struct in6_addr first_addr; > const struct in6_addr *src; > struct msg_hdr mh_storage; > const struct msg_hdr *mh; > struct udphdr uh_storage; > const struct udphdr *uh; > + int addr_count; > size_t mlen, n; > =20 > uh =3D IOV_REMOVE_HEADER(data, uh_storage); > @@ -615,6 +653,12 @@ int dhcpv6(struct ctx *c, struct iov_tail *data, > if (ia && ntohs(ia->hdr.l) < MIN(OPT_VSIZE(ia_na), OPT_VSIZE(ia_ta))) > return -1; > =20 > + /* Fill IA_ADDRs for all suitable addresses */ > + addr_count =3D dhcpv6_ia_addr_fill(c); > + if (addr_count > 0) { > + first_addr =3D resp.ia_addr[0].addr; > + ia_addr =3D &first_addr; > + } > resp.hdr.type =3D TYPE_REPLY; > switch (mh->type) { > case TYPE_REQUEST: > @@ -627,7 +671,7 @@ int dhcpv6(struct ctx *c, struct iov_tail *data, > if (mh->type =3D=3D TYPE_CONFIRM && server_id) > return -1; > =20 > - if (e && dhcpv6_ia_notonlink(data, &e->addr.a6)) { > + if (ia_addr && dhcpv6_ia_notonlink(data, ia_addr)) { > =20 > dhcpv6_send_ia_notonlink(c, data, &client_id_base, > ntohs(client_id->l), mh->xid); > @@ -668,12 +712,14 @@ int dhcpv6(struct ctx *c, struct iov_tail *data, > if (ia) > resp.ia_na.iaid =3D ((struct opt_ia_na *)ia)->iaid; > =20 > + /* Client_id goes right after the used IA_ADDRs */ > + n =3D offsetof(struct resp_t, ia_addr) + > + addr_count * sizeof(struct opt_ia_addr); > iov_to_buf(&client_id_base.iov[0], client_id_base.cnt, > - client_id_base.off, &resp.client_id, > + client_id_base.off, (char *)&resp + n, > ntohs(client_id->l) + sizeof(struct opt_hdr)); Oh... right, you're repacking to get rid of the extra padding. That seems... extremely subtle and fragile. I think the cost of having multiple IA_ADDR is that we can't reasonably pretend to represent a whole response in a single struct any more. > - n =3D offsetof(struct resp_t, client_id) + > - sizeof(struct opt_hdr) + ntohs(client_id->l); > + n +=3D sizeof(struct opt_hdr) + ntohs(client_id->l); > n =3D dhcpv6_dns_fill(c, (char *)&resp, n); > n =3D dhcpv6_client_fqdn_fill(data, c, (char *)&resp, n); > =20 > @@ -704,6 +750,5 @@ void dhcpv6_init(const struct ctx *c) > memcpy(resp_not_on_link.server_id.duid_lladdr, > c->our_tap_mac, sizeof(c->our_tap_mac)); > =20 > - if (first_v6(c)) > - resp.ia_addr.addr =3D first_v6(c)->addr.a6; > + /* Address is set dynamically in dhcpv6() */ > } > --=20 > 2.52.0 >=20 --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --TlBxzn6TIFYmNvjt Content-Type: application/pgp-signature; name=signature.asc -----BEGIN PGP SIGNATURE----- iQIzBAEBCgAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmmmcPwACgkQzQJF27ox 2GeGkg//YyOrpMA9qZLjsAWldtB1gCb2p3V99nNWG+aNCCRevQQ+GvhebuX2wjOa q1cbIDpFjD8qqfa6gOk1eCn0crC4vfhTzN6trnlVS8aepr38nWDzvBr4K93Ov22y Zv6NbMUZP9KJh03Lan2GR41CDlPPFWiwuiRFXa8zM4ZIK9pJeqrcDxsG4GQALYWc jEPXUfkQHILg9WpUKKqQ+7bW69FBXruYltMRFWNEteXGK8adnIEN85kqv6XKLYk/ CRCuhj+oSQ5VG4N8uq3QmineTJNMFy8pFSX7shtWXWdAo46qY584qBTFAowuLMee 3dSXqVb49T3FTeYyXimeX4zWHj7brUU36mV6i8Kn9MMDNYwHer4++wY3pKi3zmx6 YMu3F4MYHIDrj52tenCizIo5HFKjnQMwbMBHqiniESEOYQrNtXI19dx1UBihrZn3 4krry67BTgCtCJdCQI+ZSH1MFdjtUCBiP4bGObnup+cb3Ch8NtncRpcuVVhuMcRq v0JwRe4xS0RPn+u5w1qZ40j8kKudn6ENGXvNpwGS5v5SSDydlfRzlULRiVfPPkdi ou/AeV142Cts8fzUKUDWiJytZ3z6XHsDbkfHBgHGKsjwV2y7C9tex733H30t1oNZ ZPlz/jNGzZVGDt7/E/qS7vFsJ7dGtwDFwoNgtn61eTcazQbaT9c= =PA0x -----END PGP SIGNATURE----- --TlBxzn6TIFYmNvjt--