public inbox for passt-dev@passt.top
From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: passt-dev@passt.top, Paul Holzinger <pholzing@redhat.com>
Subject: Re: [PATCH 4/7] netlink, pasta: Disable DAD for link-local addresses on namespace interface
Date: Thu, 15 Aug 2024 08:52:58 +0200
Message-ID: <20240815085258.553a14be@elisabeth>
In-Reply-To: <Zr1vdEYh3FtrcvvZ@zatzit.fritz.box>

On Thu, 15 Aug 2024 13:01:08 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Thu, Aug 15, 2024 at 12:54:26AM +0200, Stefano Brivio wrote:
> > It makes no sense for a container or a guest to try and perform
> > duplicate address detection for their link-local address, as we'll
> > anyway not relay neighbour solicitations with an unspecified source
> > address.
> > 
> > While they perform duplicate address detection, the link-local address
> > is not usable, which prevents us from bringing up especially
> > containers and communicate with them right away via IPv6.
> > 
> > This is not enough to prevent DAD and reach the container right away:
> > we'll need a couple more patches.
> > 
> > A large part of the function setting the nodad attribute is copied^W
> > vendored from nl_routes_dup(), and we could probably refactor things
> > to avoid code duplication, eventually, but keep this simple for the
> > moment.  
> 
> I don't really care about the duplication, but I'm not sure
> nl_routes_dup() was the right thing to vendor.
> 
> > Link: https://github.com/containers/podman/pull/23561#discussion_r1711639663
> > Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> > ---
> >  netlink.c | 97 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  netlink.h |  1 +
> >  pasta.c   |  6 ++++
> >  3 files changed, 104 insertions(+)
> > 
> > diff --git a/netlink.c b/netlink.c
> > index 873e6c7..4b49de1 100644
> > --- a/netlink.c
> > +++ b/netlink.c
> > @@ -673,6 +673,103 @@ int nl_route_dup(int s_src, unsigned int ifi_src,
> >  	return 0;
> >  }
> >  
> > +/**
> > + * nl_addr_set_ll_nodad() - Set IFA_F_NODAD on IPv6 link-local addresses
> > + * @s:		Netlink socket
> > + * @ifi:	Interface index in target namespace
> > + *
> > + * Return: 0 on success, negative error code on failure
> > + */
> > +int nl_addr_set_ll_nodad(int s, unsigned int ifi)
> > +{
> > +	struct req_t {
> > +		struct nlmsghdr nlh;
> > +		struct ifaddrmsg ifa;
> > +	} req = {
> > +		.ifa.ifa_family    = AF_INET6,
> > +		.ifa.ifa_index     = ifi,
> > +	};
> > +	ssize_t nlmsgs_size, left, status;
> > +	unsigned ll_addrs = 0;
> > +	struct nlmsghdr *nh;
> > +	char buf[NLBUFSIZ];
> > +	uint32_t seq;
> > +	unsigned i;
> > +
> > +	seq = nl_send(s, &req, RTM_GETADDR, NLM_F_DUMP, sizeof(req));
> > +
> > +	/* nl_foreach() will step through multiple response datagrams,
> > +	 * which we don't want here because we need to have all the
> > +	 * addresses in the buffer at once. See also nl_route_dup().  
> 
> Hmm.. do we need them all in the buffer at once, though?  For
> routes_dup we needed it because we take multiple passes through the
> whole list, and that's not the case here.

Right, we don't need that, I shouldn't have vendored the comment as it
was.

> I guess we can't do an nl_do() within the loop, because that will
> expect the response to its own command while we're still getting
> responses from the original NLM_F_DUMP.

Exactly, that's why.

> nl_addr_dup() gets away with it because the nl_do()s are on a
> different netlink socket.

Correct.

> But.. I think we could nl_send() each NODAD request as we construct
> it, keep a count, then wait for all the queued responses. 

Oh, I didn't think of doing that. It's definitely worth a try.

> It means we can't easily match an error response to which thing
> caused it, but doesn't look like we were reporting in that much
> detail anyway.

Right, I don't think we should care about that here.

> 
> > +	 */
> > +	nh = nl_next(s, buf, NULL, &nlmsgs_size);
> > +	for (left = nlmsgs_size;
> > +	     NLMSG_OK(nh, left) && (status = nl_status(nh, left, seq)) > 0;
> > +	     nh = NLMSG_NEXT(nh, left)) {
> > +		struct ifaddrmsg *ifa = (struct ifaddrmsg *)NLMSG_DATA(nh);
> > +		bool discard = false;
> > +		struct rtattr *rta;
> > +		size_t na;
> > +
> > +		if (nh->nlmsg_type != RTM_NEWADDR)
> > +			continue;
> > +
> > +		if (ifa->ifa_index != ifi || ifa->ifa_scope != RT_SCOPE_LINK)
> > +			discard = true;
> > +
> > +		ifa->ifa_flags |= IFA_F_NODAD;
> > +
> > +		for (rta = IFA_RTA(ifa), na = IFA_PAYLOAD(nh); RTA_OK(rta, na);
> > +		     rta = RTA_NEXT(rta, na)) {
> > +			/* If 32-bit flags are used, add IFA_F_NODAD there */
> > +			if (rta->rta_type == IFA_FLAGS)
> > +				*(uint32_t *)RTA_DATA(rta) |= IFA_F_NODAD;
> > +		}
> > +
> > +		if (discard)
> > +			nh->nlmsg_type = NLMSG_NOOP;
> > +		else
> > +			ll_addrs++;
> > +	}
> > +
> > +	if (!NLMSG_OK(nh, left)) {
> > +		/* Process any remaining datagrams in a different
> > +		 * buffer so we don't overwrite the first one.
> > +		 */
> > +		char tail[NLBUFSIZ];
> > +		unsigned extra = 0;
> > +
> > +		nl_foreach_oftype(nh, status, s, tail, seq, RTM_NEWADDR)
> > +			extra++;
> > +
> > +		if (extra) {
> > +			err("netlink: Too many link-local addresses");
> > +			return -E2BIG;
> > +		}
> > +	}
> > +
> > +	if (status < 0)
> > +		return status;
> > +
> > +	for (i = 0; i < ll_addrs; i++) {
> > +		for (nh = (struct nlmsghdr *)buf, left = nlmsgs_size;
> > +		     NLMSG_OK(nh, left);
> > +		     nh = NLMSG_NEXT(nh, left)) {
> > +			int rc;
> > +
> > +			if (nh->nlmsg_type != RTM_NEWADDR)
> > +				continue;
> > +
> > +			rc = nl_do(s, nh, RTM_NEWADDR, NLM_F_REPLACE,
> > +				nh->nlmsg_len);
> > +			if (rc < 0)
> > +				return rc;
> > +		}
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> >  /**
> >   * nl_addr_get() - Get most specific global address, given interface and family
> >   * @s:		Netlink socket
> > diff --git a/netlink.h b/netlink.h
> > index 178f8ae..66a44ad 100644
> > --- a/netlink.h
> > +++ b/netlink.h
> > @@ -19,6 +19,7 @@ int nl_addr_get(int s, unsigned int ifi, sa_family_t af,
> >  		void *addr, int *prefix_len, void *addr_l);
> >  int nl_addr_set(int s, unsigned int ifi, sa_family_t af,
> >  		const void *addr, int prefix_len);
> > +int nl_addr_set_ll_nodad(int s, unsigned int ifi);
> >  int nl_addr_dup(int s_src, unsigned int ifi_src,
> >  		int s_dst, unsigned int ifi_dst, sa_family_t af);
> >  int nl_link_get_mac(int s, unsigned int ifi, void *mac);
> > diff --git a/pasta.c b/pasta.c
> > index 96545b1..838bbb3 100644
> > --- a/pasta.c
> > +++ b/pasta.c
> > @@ -340,6 +340,12 @@ void pasta_ns_conf(struct ctx *c)
> >  		}
> >  
> >  		if (c->ifi6) {
> > +			rc = nl_addr_set_ll_nodad(nl_sock_ns, c->pasta_ifi);
> > +			if (rc < 0) {
> > +				die("Can't disable DAD for LL in namespace: %s",
> > +				    strerror(-rc));  
> 
> So... I'm usually the one arguing *for* ASSERT()s and die()s, but in
> this case it seems overly drastic.  If we're unable to set DAD it will
> slow things down, but mostly things should still work.  I'd prefer to
> see this as just a warn().

Definitely, yeah.

-- 
Stefano


Thread overview: 17+ messages
2024-08-14 22:54 [PATCH 0/7] Prevent DAD for link-local addresses in containers Stefano Brivio
2024-08-14 22:54 ` [PATCH 1/7] netlink: Fix typo in function comment for nl_addr_get() Stefano Brivio
2024-08-15  2:39   ` David Gibson
2024-08-14 22:54 ` [PATCH 2/7] netlink, pasta: Split MTU setting functionality out of nl_link_up() Stefano Brivio
2024-08-15  2:41   ` David Gibson
2024-08-14 22:54 ` [PATCH 3/7] netlink, pasta: Turn nl_link_up() into a generic function to set link flags Stefano Brivio
2024-08-15  2:42   ` David Gibson
2024-08-14 22:54 ` [PATCH 4/7] netlink, pasta: Disable DAD for link-local addresses on namespace interface Stefano Brivio
2024-08-15  3:01   ` David Gibson
2024-08-15  6:52     ` Stefano Brivio [this message]
2024-08-14 22:54 ` [PATCH 5/7] netlink, pasta: Fetch link-local address from namespace interface once it's up Stefano Brivio
2024-08-15  3:04   ` David Gibson
2024-08-15  6:53     ` Stefano Brivio
2024-08-14 22:54 ` [PATCH 6/7] pasta: Disable neighbour solicitations on device up to prevent DAD Stefano Brivio
2024-08-15  3:06   ` David Gibson
2024-08-14 22:54 ` [PATCH 7/7] netlink: Fix typo in function comment for nl_addr_set() Stefano Brivio
2024-08-15  3:07   ` David Gibson

Code repositories for project(s) associated with this public inbox

	https://passt.top/passt