Date: Fri, 22 May 2026 10:02:27 +1000
From: David Gibson
To: Stefano Brivio
Cc: passt-dev@passt.top, Jon Maloy, Paul Holzinger
Subject: Re: [PATCH 1/2] netlink: Use regular request/response netlink socket for initial neighbour sync
References: <20260521180146.1834333-1-sbrivio@redhat.com> <20260521180146.1834333-2-sbrivio@redhat.com>
In-Reply-To: <20260521180146.1834333-2-sbrivio@redhat.com>
List-Id: Development discussion and patches for passt

On Thu, May 21, 2026 at 08:01:45PM +0200, Stefano Brivio wrote:
> ...instead of the one dedicated to the neighbour monitor, because, if
> neighbour notifications start coming in before or while we send the
> initial request to read out the neighbour tables, messages and
> sequence numbers will collide.
>
> For example, if nl_neigh_sync() sends a RTM_GETNEIGH request with
> sequence 20, we expect a corresponding reply with sequence 20. But
> given that we already used the same socket to subscribe to
> notifications, and notifications don't correspond to any specific
> request we sent, we might now get a message with sequence 0.

Heh. Called it, kinda. Nice job tracking this down.

> The collision between messages wouldn't actually matter, as we'll
> handle anyway any RTM_NEWNEIGH message in the same fashion, but we
> need to validate sequence numbers for robustness, and that will fail.
>
> At the same time, we have to subscribe to neighbour notifications
> before calling nl_neigh_sync(), because we'll have a race condition
> otherwise, as we might miss neighbours that were added before the
> notifier is registered.
>
> Use the regular nl_sock for nl_neigh_sync().
>
> Drop the interface index from the request: we won't get any entry
> otherwise, because the Linux kernel (as of version 7.0) is unable to
> filter on it. Results are now filtered by interface index as we read
> them.
>
> Passing along an interface index used to work when nl_neigh_sync()
> used the notifier socket, because NETLINK_GET_STRICT_CHK is not set
> on it, meaning that results weren't filtered at all (interface and
> IP version passed in the request were ignored altogether).
>
> To reproduce the issue fixed here:
>
> * detach a network and user namespace:
>
>   [terminal 0]
>   $ unshare -rUn
>   # echo $$
>   1543307
>
> * attach pasta to it:
>
>   [terminal 1]
>   $ ./pasta -f --config-net 1543307 -I enp9s0
>
> * enter that namespace from yet another terminal:
>
>   [terminal 2]
>   $ nsenter --preserve-credentials -U -n -t 1543307
>
> * start flooding the MAC address table of this namespace:
>
>   [terminal 1]
>   # for i in $(seq 10 99); do for j in $(seq 10 99); do for k in $(seq 10 99); do ip ne add dev enp9s0 10.$i.$j.$k lladdr 00:11:22:$i:$j:$k; done; done; done
>
> * and now start another instance of pasta in this namespace:
>
>   [terminal 2]
>   # ./pasta -d --config-net
>
>   which will eventually result in pasta exiting with a message like:
>
>     0.0253: netlink: Unexpected sequence number (0 != 34)
>
> Reported-by: Paul Holzinger
> Link: https://bugs.passt.top/show_bug.cgi?id=203
> Fixes: 3c469013cfaa ("netlink: add subscription on changes in NDP/ARP table")
> Signed-off-by: Stefano Brivio

Reviewed-by: David Gibson

> ---
>  netlink.c | 15 +++++++--------
>  1 file changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/netlink.c b/netlink.c
> index c3c830e..0863734 100644
> --- a/netlink.c
> +++ b/netlink.c
> @@ -1206,24 +1206,23 @@ static void nl_neigh_msg_read(const struct ctx *c, struct nlmsghdr *nh)
>   * @proto:	Protocol, AF_INET or AF_INET6
>   * @ifi:	Interface index
>   */
> -static void nl_neigh_sync(const struct ctx *c, int proto, int ifi)
> +static void nl_neigh_sync(const struct ctx *c, int proto)
>  {
>  	struct {
>  		struct nlmsghdr nlh;
>  		struct ndmsg ndm;
>  	} req = {
> -		.ndm.ndm_family  = proto,
> -		.ndm.ndm_ifindex = ifi,
> +		.ndm.ndm_family = proto,
>  	};
>  	struct nlmsghdr *nh;
>  	char buf[NLBUFSIZ];
>  	ssize_t status;
>  	uint32_t seq;
>  
> -	seq = nl_send(nl_sock_neigh, &req, RTM_GETNEIGH,
> -		      NLM_F_DUMP, sizeof(req));
> -	nl_foreach_oftype(nh, status, nl_sock_neigh, buf, seq, RTM_NEWNEIGH)
> +	seq = nl_send(nl_sock, &req, RTM_GETNEIGH, NLM_F_DUMP, sizeof(req));
> +	nl_foreach_oftype(nh, status, nl_sock, buf, seq, RTM_NEWNEIGH)
>  		nl_neigh_msg_read(c, nh);
> +
>  	if (status < 0)
>  		warn("netlink: RTM_GETNEIGH failed: %s", strerror_(-status));
>  }
> @@ -1298,8 +1297,8 @@ int nl_neigh_notify_init(const struct ctx *c)
>  		return -1;
>  	}
>  
> -	nl_neigh_sync(c, AF_INET, c->ifi4);
> -	nl_neigh_sync(c, AF_INET6, c->ifi6);
> +	nl_neigh_sync(c, AF_INET);
> +	nl_neigh_sync(c, AF_INET6);
>  
>  	return 0;
>  }
> -- 
> 2.43.0
>

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson