public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>, passt-dev@passt.top
Cc: mpitt@redhat.com, pholzing@redhat.com,
	David Gibson <david@gibson.dropbear.id.au>
Subject: [PATCH] netlink: Fix handling of NLMSG_DONE in nl_route_dup()
Date: Tue, 19 Mar 2024 15:53:41 +1100	[thread overview]
Message-ID: <20240319045341.487396-1-david@gibson.dropbear.id.au> (raw)

A recent kernel change 87d381973e49 ("genetlink: fit NLMSG_DONE into
same read() as families") changed netlink behaviour so that the
NLMSG_DONE terminating a bunch of responses can go in the same
datagram as those responses, rather than in a separate one.

Our netlink code is supposed to handle that behaviour, and indeed does
so for most cases, using the nl_foreach() macro.  However, there was a
subtle error in nl_route_dup() which doesn't work with this change.
f00b1534 ("netlink: Don't try to get further datagrams in
nl_route_dup() on NLMSG_DONE") attempted to fix this, but has its own
subtle error.

The problem arises because nl_route_dup(), unlike other cases doesn't
just make a single pass through all the responses to a netlink
request.  It needs to get all the routes, then make multiple passes
through them.  We don't really have anywhere to buffer multiple
datagrams, so we only support the case where all the routes fit in a
single datagram - but we need to fail gracefully when that's not the
case.

After receiving the first datagram of responses (with nl_next()) we
have a first loop scanning them.  It needs to exit when either we run
out of messages in the datagram (!NLMSG_OK()) or when we get a message
indicating the last response (nl_status() <= 0).

What we do after the loop depends on which exit case we had.  If we
saw the last response, we're done, but otherwise we need to receive
more datagrams to discard the rest of the responses.

We attempt to check for that second case by re-checking NLMSG_OK(nh,
status).  However in the got-last-response case, we've altered status
from the number of remaining bytes to the error code (usually 0). That
means NLMSG_OK() now returns false even if it didn't during the loop
check.  To fix this we need separate variables for the number of bytes
left and the final status code.

We also checked status after the loop, but this was redundant: we can
only exit the loop with NLMSG_OK() == true if status <= 0.

Reported-by: Martin Pitt <mpitt@redhat.com>
Fixes: f00b153414b1 ("netlink: Don't try to get further datagrams in nl_route_dup() on NLMSG_DONE")
Fixes: 4d6e9d0816e2 ("netlink: Always process all responses to a netlink request")
Link: https://github.com/containers/podman/issues/22052

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 netlink.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/netlink.c b/netlink.c
index f93f3770..632304c1 100644
--- a/netlink.c
+++ b/netlink.c
@@ -504,7 +504,7 @@ int nl_route_dup(int s_src, unsigned int ifi_src,
 		.rta.rta_len	  = RTA_LENGTH(sizeof(unsigned int)),
 		.ifi		  = ifi_src,
 	};
-	ssize_t nlmsgs_size, status;
+	ssize_t nlmsgs_size, left, status;
 	unsigned dup_routes = 0;
 	struct nlmsghdr *nh;
 	char buf[NLBUFSIZ];
@@ -518,9 +518,9 @@ int nl_route_dup(int s_src, unsigned int ifi_src,
 	 * routes in the buffer at once.
 	 */
 	nh = nl_next(s_src, buf, NULL, &nlmsgs_size);
-	for (status = nlmsgs_size;
-	     NLMSG_OK(nh, status) && (status = nl_status(nh, status, seq)) > 0;
-	     nh = NLMSG_NEXT(nh, status)) {
+	for (left = nlmsgs_size;
+	     NLMSG_OK(nh, left) && (status = nl_status(nh, left, seq)) > 0;
+	     nh = NLMSG_NEXT(nh, left)) {
 		struct rtmsg *rtm = (struct rtmsg *)NLMSG_DATA(nh);
 		struct rtattr *rta;
 		size_t na;
@@ -550,8 +550,7 @@ int nl_route_dup(int s_src, unsigned int ifi_src,
 		}
 	}
 
-	if (nh->nlmsg_type != NLMSG_DONE &&
-	    (!NLMSG_OK(nh, status) || status > 0)) {
+	if (!NLMSG_OK(nh, left)) {
 		/* Process any remaining datagrams in a different
 		 * buffer so we don't overwrite the first one.
 		 */
@@ -577,9 +576,9 @@ int nl_route_dup(int s_src, unsigned int ifi_src,
 	 * to calculate dependencies: let the kernel do that.
 	 */
 	for (i = 0; i < dup_routes; i++) {
-		for (nh = (struct nlmsghdr *)buf, status = nlmsgs_size;
-		     NLMSG_OK(nh, status);
-		     nh = NLMSG_NEXT(nh, status)) {
+		for (nh = (struct nlmsghdr *)buf, left = nlmsgs_size;
+		     NLMSG_OK(nh, left);
+		     nh = NLMSG_NEXT(nh, left)) {
 			uint16_t flags = nh->nlmsg_flags;
 			int rc;
 
-- 
@@ -504,7 +504,7 @@ int nl_route_dup(int s_src, unsigned int ifi_src,
 		.rta.rta_len	  = RTA_LENGTH(sizeof(unsigned int)),
 		.ifi		  = ifi_src,
 	};
-	ssize_t nlmsgs_size, status;
+	ssize_t nlmsgs_size, left, status;
 	unsigned dup_routes = 0;
 	struct nlmsghdr *nh;
 	char buf[NLBUFSIZ];
@@ -518,9 +518,9 @@ int nl_route_dup(int s_src, unsigned int ifi_src,
 	 * routes in the buffer at once.
 	 */
 	nh = nl_next(s_src, buf, NULL, &nlmsgs_size);
-	for (status = nlmsgs_size;
-	     NLMSG_OK(nh, status) && (status = nl_status(nh, status, seq)) > 0;
-	     nh = NLMSG_NEXT(nh, status)) {
+	for (left = nlmsgs_size;
+	     NLMSG_OK(nh, left) && (status = nl_status(nh, left, seq)) > 0;
+	     nh = NLMSG_NEXT(nh, left)) {
 		struct rtmsg *rtm = (struct rtmsg *)NLMSG_DATA(nh);
 		struct rtattr *rta;
 		size_t na;
@@ -550,8 +550,7 @@ int nl_route_dup(int s_src, unsigned int ifi_src,
 		}
 	}
 
-	if (nh->nlmsg_type != NLMSG_DONE &&
-	    (!NLMSG_OK(nh, status) || status > 0)) {
+	if (!NLMSG_OK(nh, left)) {
 		/* Process any remaining datagrams in a different
 		 * buffer so we don't overwrite the first one.
 		 */
@@ -577,9 +576,9 @@ int nl_route_dup(int s_src, unsigned int ifi_src,
 	 * to calculate dependencies: let the kernel do that.
 	 */
 	for (i = 0; i < dup_routes; i++) {
-		for (nh = (struct nlmsghdr *)buf, status = nlmsgs_size;
-		     NLMSG_OK(nh, status);
-		     nh = NLMSG_NEXT(nh, status)) {
+		for (nh = (struct nlmsghdr *)buf, left = nlmsgs_size;
+		     NLMSG_OK(nh, left);
+		     nh = NLMSG_NEXT(nh, left)) {
 			uint16_t flags = nh->nlmsg_flags;
 			int rc;
 
-- 
2.44.0


             reply	other threads:[~2024-03-19  4:53 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-19  4:53 David Gibson [this message]
2024-03-19 10:59 ` [PATCH] netlink: Fix handling of NLMSG_DONE in nl_route_dup() Stefano Brivio

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240319045341.487396-1-david@gibson.dropbear.id.au \
    --to=david@gibson.dropbear.id.au \
    --cc=mpitt@redhat.com \
    --cc=passt-dev@passt.top \
    --cc=pholzing@redhat.com \
    --cc=sbrivio@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).