public inbox for passt-dev@passt.top
From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: passt-dev@passt.top, Paul Holzinger <pholzing@redhat.com>
Subject: Re: [PATCH v2 4/7] netlink, pasta: Disable DAD for link-local addresses on namespace interface
Date: Fri, 16 Aug 2024 07:45:35 +0200
Message-ID: <20240816074535.21d7f961@elisabeth>
In-Reply-To: <Zr6jkRZJNCg_Tx7H@zatzit.fritz.box>

On Fri, 16 Aug 2024 10:55:45 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Thu, Aug 15, 2024 at 12:59:32PM +0200, Stefano Brivio wrote:
> > On Thu, 15 Aug 2024 20:38:17 +1000
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> > > On Thu, Aug 15, 2024 at 10:36:46AM +0200, Stefano Brivio wrote:  
> > > > It makes no sense for a container or a guest to try and perform
> > > > duplicate address detection for their link-local address, as we'll
> > > > anyway not relay neighbour solicitations with an unspecified source
> > > > address.
> > > > 
> > > > While they perform duplicate address detection, the link-local address
> > > > is not usable, which prevents us from bringing up containers, in
> > > > particular, and communicating with them right away via IPv6.
> > > > 
> > > > This is not enough to prevent DAD and reach the container right away:
> > > > we'll need a couple more patches.
> > > > 
> > > > As we send NLM_F_REPLACE requests right away, while we still have to
> > > > read out other addresses on the same socket, we can't use nl_do():
> > > > keep a count of messages we send (addresses we change) and deal with
> > > > the answer to those NLM_F_REPLACE requests in a separate loop, later.
> > > > 
> > > > Link: https://github.com/containers/podman/pull/23561#discussion_r1711639663
> > > > Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> > > > ---
> > > >  netlink.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > >  netlink.h |  1 +
> > > >  pasta.c   |  6 ++++++
> > > >  3 files changed, 62 insertions(+)
> > > > 
> > > > diff --git a/netlink.c b/netlink.c
> > > > index 873e6c7..59f2fd9 100644
> > > > --- a/netlink.c
> > > > +++ b/netlink.c
> > > > @@ -673,6 +673,61 @@ int nl_route_dup(int s_src, unsigned int ifi_src,
> > > >  	return 0;
> > > >  }
> > > >  
> > > > +/**
> > > > + * nl_addr_set_ll_nodad() - Set IFA_F_NODAD on IPv6 link-local addresses
> > > > + * @s:		Netlink socket
> > > > + * @ifi:	Interface index in target namespace
> > > > + *
> > > > + * Return: 0 on success, negative error code on failure
> > > > + */
> > > > +int nl_addr_set_ll_nodad(int s, unsigned int ifi)
> > > > +{
> > > > +	struct req_t {
> > > > +		struct nlmsghdr nlh;
> > > > +		struct ifaddrmsg ifa;
> > > > +	} req = {
> > > > +		.ifa.ifa_family    = AF_INET6,
> > > > +		.ifa.ifa_index     = ifi,
> > > > +	};
> > > > +	unsigned ll_addrs = 0;
> > > > +	struct nlmsghdr *nh;
> > > > +	char buf[NLBUFSIZ];
> > > > +	ssize_t status;
> > > > +	uint32_t seq;
> > > > +
> > > > +	seq = nl_send(s, &req, RTM_GETADDR, NLM_F_DUMP, sizeof(req));
> > > > +	nl_foreach_oftype(nh, status, s, buf, seq, RTM_NEWADDR) {
> > > > +		struct ifaddrmsg *ifa = (struct ifaddrmsg *)NLMSG_DATA(nh);
> > > > +		struct rtattr *rta;
> > > > +		size_t na;
> > > > +
> > > > +		if (ifa->ifa_index != ifi || ifa->ifa_scope != RT_SCOPE_LINK)
> > > > +			continue;
> > > > +
> > > > +		ifa->ifa_flags |= IFA_F_NODAD;
> > > > +
> > > > +		for (rta = IFA_RTA(ifa), na = IFA_PAYLOAD(nh); RTA_OK(rta, na);
> > > > +		     rta = RTA_NEXT(rta, na)) {
> > > > +			/* If 32-bit flags are used, add IFA_F_NODAD there */
> > > > +			if (rta->rta_type == IFA_FLAGS)
> > > > +				*(uint32_t *)RTA_DATA(rta) |= IFA_F_NODAD;
> > > > +		}
> > > > +
> > > > +		nl_send(s, nh, RTM_NEWADDR, NLM_F_REPLACE, nh->nlmsg_len);
> > > > +		ll_addrs++;
> > > > +	}
> > > > +
> > > > +	if (status < 0)
> > > > +		return status;    
> > > 
> > > Ah... one gotcha with the nl_send() in the loop.  We should make sure
> > > we get the responses from any of those we sent, even if the original
> > > request failed.  Otherwise we'll be out of sync on the netlink socket again.  
> > 
> > I'm ignoring the return code of nl_send(), so, minus the issue you're
> > raising about nl_foreach() below, that should already be sorted, right?  
> 
> No.  The return code from nl_send() is mostly irrelevant - it's just
> the sequence number (other errors die()).  But the point is you've
> queued requests, so the kernel will queue responses and if you exit
> the function here, nothing will consume them.

Oh, that's what I missed: you were referring to this return statement.
Sure, I understand that we need to consume those, hence the
nl_foreach() later, but I missed the fact that, of course, we wouldn't
necessarily reach it.
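
To make the failure mode concrete, here is a toy model of it. This is only
a sketch: toy_sock, toy_send(), toy_recv() and toy_batch() are hypothetical
names, not passt functions, and the "socket" is just a FIFO of sequence
numbers. It shows how returning before reading the queued responses leaves
the next request looking at a stale answer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QLEN 16

/* Hypothetical stand-in for a netlink socket: FIFO of response seq numbers */
struct toy_sock {
	uint32_t resp[QLEN];	/* Sequence numbers of queued responses */
	size_t head, tail;	/* Read and write positions */
	uint32_t next_seq;	/* Next sequence number to allocate */
};

/* Queue one request: the toy "kernel" queues one response for it right away */
static uint32_t toy_send(struct toy_sock *s)
{
	uint32_t seq = s->next_seq++;

	assert(s->tail < QLEN);
	s->resp[s->tail++] = seq;
	return seq;
}

/* Read one response: 0 if it carries @seq, -1 if we are out of sync */
static int toy_recv(struct toy_sock *s, uint32_t seq)
{
	assert(s->head < s->tail);
	return s->resp[s->head++] == seq ? 0 : -1;
}

/* Send @n requests; if @bail is set, return before reading any response */
static int toy_batch(struct toy_sock *s, unsigned n, int bail)
{
	uint32_t first = 0;
	unsigned i;
	int rc = 0;

	for (i = 0; i < n; i++) {
		uint32_t seq = toy_send(s);

		if (i == 0)
			first = seq;
	}

	if (bail)
		return -1;	/* Responses stay queued: the problem above */

	for (i = 0; i < n; i++) {
		if (toy_recv(s, first + i))
			rc = -1;
	}

	return rc;
}
```

With two requests sent and an early return, two responses are left behind,
and the next toy_send() / toy_recv() pair reads the first stale one instead
of its own answer.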

> > > > +	seq += ll_addrs;
> > > > +
> > > > +	nl_foreach(nh, status, s, buf, seq)
> > > > +		warn("netlink: Unexpected response message");    
> > > 
> > > I don't think this will work right if there's > 1 address.  It will be
> > > looking for the last sequence number on the first iteration and will
> > > die in nl_status() when it mismatches.  
> > 
> > Ah, oops, right.
> >   
> > > Maybe just loop on nl_next() until you get the last seq number, then
> > > call nl_status()?  
> > 
> > How do I check for errors on the answers before the next one? I mean,
> > nl_foreach() should fit here, it's just that I need to start from the
> > right sequence number.
> >   
> > > That also means you could just save the seq each
> > > time you nl_send(), overwriting the previous one, rather than relying
> > > on the fact that we allocate seqs, well, sequentially.  
> > 
> > I don't understand how this fits with calling nl_next() until I get
> > to the last sequence number. Leaving that aside, can't I simply use
> > nl_foreach(), but start with the sequence of the first nl_send()
> > instead of the last one?  
> 
> Uh.. yeah, it's a bit fiddly.  Especially since in those foreach loops
> status does double duty as the remaining data in the current message
> and as the status code.
> 
> # Option 1
> 
> Assuming contiguous sequence numbers, which is true for now.
> 
> - Change the nl_send() within the first loop to
> 	last_seq = nl_send(...)
> 
> Then immediately after the first loop
> 
> ssize_t status2 = status;
> 
> for (seq++; seq <= last_seq; seq++) {
> 	nl_foreach(nh, status2, s, buf, seq)
> 		;
> 	if (status == 0)
> 		status = status2;
> }
>
> At this point you will have consumed all the responses and status will
> have the first reported error code.

This looks to me like the easiest to follow, thanks for the thorough
descriptions! I'm going with this one in v3.
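
For reference, this is how I read the control flow of Option 1, as a
standalone sketch rather than the actual v3 code. toy_resp, toy_drain_one()
and toy_drain_all() are hypothetical: toy_drain_one() stands in for a whole
nl_foreach() pass on one sequence number, and responses are modelled as an
array of (sequence, error) pairs. Contiguous sequence numbers are assumed,
as above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical queued response: sequence number plus 0 or a negative errno */
struct toy_resp {
	uint32_t seq;
	int err;
};

/* Stand-in for one nl_foreach() pass: consume the next queued response,
 * which must carry @seq, and return its status
 */
static int toy_drain_one(const struct toy_resp *q, size_t n, size_t *pos,
			 uint32_t seq)
{
	assert(*pos < n && q[*pos].seq == seq);
	return q[(*pos)++].err;
}

/* Consume responses for seq + 1 ... last_seq, keeping the first error.
 * @status is the outcome of the initial dump, which used sequence @seq.
 */
static int toy_drain_all(const struct toy_resp *q, size_t n, size_t *pos,
			 uint32_t seq, uint32_t last_seq, int status)
{
	for (seq++; seq <= last_seq; seq++) {
		int status2 = toy_drain_one(q, n, pos, seq);

		/* Only the first failure is reported, later ones are dropped */
		if (status == 0)
			status = status2;
	}

	return status;
}
```

One detail I'm assuming here: if no request was sent in the first loop,
last_seq needs to start out equal to seq, so that the drain loop is a no-op.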

> # Option 2
> 
> Refactor nl_status() to have a version that reports sequence number
> instead of taking & checking it. Loop on nl_next() until
> nl_status_variant() returns <= 0 *and* the last sequence number.
> 
> # Option 3
> 
> Open-coded version of (2)
> 
> ssize_t err = status;
> 
> do {
> 	nh = nl_next(s, buf, nh, &status);
> 	if (err == 0 && nh->nlmsg_type == NLMSG_ERROR) {
> 		struct nlmsgerr *errmsg = (struct nlmsgerr *)NLMSG_DATA(nh);
> 		err = errmsg->error;
> 	}
> } while (nh->nlmsg_seq != last_seq ||
> 	(nh->nlmsg_type != NLMSG_DONE && nh->nlmsg_type != NLMSG_ERROR));
> 
> And at this point, again, you've consumed all the responses and 'err'
> has the first error code.  I think this is roughly what I was
> suggesting originally, but it is messier than I thought.
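
For comparison, the termination condition of Option 3 can be tried out on a
plain buffer, without a socket. This is a sketch under assumptions:
put_err() and first_error() are hypothetical helpers, not passt functions,
and the walk uses the stock NLMSG_OK() / NLMSG_NEXT() macros from
<linux/netlink.h> in place of nl_next(). The caller is assumed to pass a
buffer suitably aligned for struct nlmsghdr.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/netlink.h>

/* Append one NLMSG_ERROR message (an ACK if @error is 0) to @buf at offset
 * @off, return the new offset. Hypothetical helper to build sample input.
 */
static size_t put_err(char *buf, size_t off, uint32_t seq, int error)
{
	struct nlmsghdr *nh = (struct nlmsghdr *)(buf + off);
	struct nlmsgerr *e;

	memset(nh, 0, NLMSG_SPACE(sizeof(*e)));
	nh->nlmsg_len = NLMSG_LENGTH(sizeof(*e));
	nh->nlmsg_type = NLMSG_ERROR;
	nh->nlmsg_seq = seq;

	e = (struct nlmsgerr *)NLMSG_DATA(nh);
	e->error = error;

	return off + NLMSG_SPACE(sizeof(*e));
}

/* Walk messages in @buf until the terminating message (NLMSG_DONE or
 * NLMSG_ERROR) carrying @last_seq is seen. Return the first error code
 * found, 0 if there was none, or 1 if the buffer ended before that.
 */
static int first_error(char *buf, size_t len, uint32_t last_seq)
{
	struct nlmsghdr *nh = (struct nlmsghdr *)buf;
	int n = (int)len, err = 0;

	for (; NLMSG_OK(nh, n); nh = NLMSG_NEXT(nh, n)) {
		int end = nh->nlmsg_type == NLMSG_DONE ||
			  nh->nlmsg_type == NLMSG_ERROR;

		/* Keep the first non-zero error only */
		if (err == 0 && nh->nlmsg_type == NLMSG_ERROR)
			err = ((struct nlmsgerr *)NLMSG_DATA(nh))->error;

		if (end && nh->nlmsg_seq == last_seq)
			return err;
	}

	return 1;
}
```

Unlike Option 1, this doesn't depend on sequence numbers being contiguous:
it only needs to know the last one.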

-- 
Stefano



Thread overview: 13+ messages
2024-08-15  8:36 [PATCH v2 0/7] Prevent DAD for link-local addresses in containers Stefano Brivio
2024-08-15  8:36 ` [PATCH v2 1/7] netlink: Fix typo in function comment for nl_addr_get() Stefano Brivio
2024-08-15  8:36 ` [PATCH v2 2/7] netlink, pasta: Split MTU setting functionality out of nl_link_up() Stefano Brivio
2024-08-15  8:36 ` [PATCH v2 3/7] netlink, pasta: Turn nl_link_up() into a generic function to set link flags Stefano Brivio
2024-08-15  8:36 ` [PATCH v2 4/7] netlink, pasta: Disable DAD for link-local addresses on namespace interface Stefano Brivio
2024-08-15 10:38   ` David Gibson
2024-08-15 10:59     ` Stefano Brivio
2024-08-16  0:55       ` David Gibson
2024-08-16  5:45         ` Stefano Brivio [this message]
2024-08-15  8:36 ` [PATCH v2 5/7] netlink, pasta: Fetch link-local address from namespace interface once it's up Stefano Brivio
2024-08-15 10:41   ` David Gibson
2024-08-15  8:36 ` [PATCH v2 6/7] pasta: Disable neighbour solicitations on device up to prevent DAD Stefano Brivio
2024-08-15  8:36 ` [PATCH v2 7/7] netlink: Fix typo in function comment for nl_addr_set() Stefano Brivio

Code repositories for project(s) associated with this public inbox

	https://passt.top/passt
