On Thu, Aug 03, 2023 at 12:48:07AM +0200, Stefano Brivio wrote:
> On Mon, 24 Jul 2023 16:09:29 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > So far we never checked for errors reported on netlink operations via
> > NLMSG_ERROR messages.  This has led to several subtle and tricky to debug
> > situations which would have been obvious if we knew that certain netlink
> > operations had failed.
> > 
> > Introduce a nl_do() helper that performs netlink "do" operations (that is,
> > making a single change without retrieving complex information) with much
> > more thorough error checking.  As well as returning an error code if we
> > get an NLMSG_ERROR message, we also check for unexpected behaviour in
> > several places.  That way if we've made a mistake in our assumptions about
> > how netlink works it should result in a clear error rather than some subtle
> > misbehaviour.
> > 
> > We update those calls to nl_req() that can use the new wrapper to do so.
> > We will extend those to better handle errors in future.  We don't touch
> > non-"do" operations for now; those are a bit trickier.
> > 
> > Link: https://bugs.passt.top/show_bug.cgi?id=60
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  netlink.c | 59 ++++++++++++++++++++++++++++++++++++++++++++-----------
> >  1 file changed, 47 insertions(+), 12 deletions(-)
> > 
> > diff --git a/netlink.c b/netlink.c
> > index 3170344..cdd65c0 100644
> > --- a/netlink.c
> > +++ b/netlink.c
> > @@ -148,6 +148,47 @@ static ssize_t nl_req(int s, char *buf, void *req,
> >  	return n;
> >  }
> >  
> > +/**
> > + * nl_do() - Send netlink "do" request, and wait for acknowledgement
> > + * @s:		Netlink socket
> > + * @req:	Request (will fill netlink header)
> > + * @type:	Request type
> > + * @flags:	Extra request flags (NLM_F_REQUEST and NLM_F_ACK assumed)
> > + * @len:	Request length
> > + *
> > + * Return: 0 on success, negative error code on error
> > + */
> > +static int nl_do(int s, void *req, uint16_t type, uint16_t flags, ssize_t len)
> > +{
> > +	struct nlmsghdr *nh;
> > +	char buf[NLBUFSIZ];
> > +	uint16_t seq;
> > +	ssize_t n;
> > +
> > +	n = nl_req(s, buf, req, type, flags, len);
> > +	seq = ((struct nlmsghdr *)req)->nlmsg_seq;
> > +
> > +	for (nh = (struct nlmsghdr *)buf;
> > +	     NLMSG_OK(nh, n); nh = NLMSG_NEXT(nh, n)) {
> > +		struct nlmsgerr *errmsg;
> > +
> > +		if (nh->nlmsg_seq != seq)
> > +			die("netlink: Unexpected response sequence number");
> > +
> > +		switch (nh->nlmsg_type) {
> > +		case NLMSG_DONE:
> > +			return 0;
> > +		case NLMSG_ERROR:
> > +			errmsg = (struct nlmsgerr *)NLMSG_DATA(nh);
> > +			return errmsg->error;
> 
> This is an errno; we should probably print it here. Having now read
> 14/17 and 16/17, that would also save the repeated strerror() calls
> there. On the other hand, this has the advantage of one single error
> message instead of two, but... hmm.

No, this is deliberate.  We use nl_do() for the "write" side of the dup
operations.  So for the routes we don't want it to print an error
every time we get a net unreachable, and then again for every
already-duplicated route, as we retry repeatedly.
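
To make that concrete, here is a rough sketch of the caller-side
filtering I have in mind.  It is an illustration only, not code from
this series: route_err_expected() and report_route_err() are
hypothetical names, and I'm assuming the errors we expect during the
retry loop are -ENETUNREACH and -EEXIST.  The point is that nl_do()
stays silent and just returns the negative errno, and the caller
decides what is worth a message.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: is this negative errno, as returned by a quiet
 * nl_do()-style call, one we expect while retrying route duplication?
 * Assumption: only "network unreachable" and "already exists" qualify.
 */
static bool route_err_expected(int rc)
{
	return rc == -ENETUNREACH || rc == -EEXIST;
}

/* Hypothetical caller-side reporting: print a single message for
 * unexpected errors only.  Returns the number of messages printed.
 */
static int report_route_err(int rc)
{
	if (rc >= 0 || route_err_expected(rc))
		return 0;

	fprintf(stderr, "netlink: route write failed: %s\n", strerror(-rc));
	return 1;
}
```

With something like this, a retry loop can call the quiet helper as
often as it needs to, and only errors outside the expected set reach
the log, once, from the caller.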

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson