public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
* [PATCH] netlink: Fix handling of NLMSG_DONE in nl_route_dup()
@ 2024-03-19  4:53 David Gibson
  2024-03-19 10:59 ` Stefano Brivio
  0 siblings, 1 reply; 2+ messages in thread
From: David Gibson @ 2024-03-19  4:53 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: mpitt, pholzing, David Gibson

A recent kernel change 87d381973e49 ("genetlink: fit NLMSG_DONE into
same read() as families") changed netlink behaviour so that the
NLMSG_DONE terminating a bunch of responses can go in the same
datagram as those responses, rather than in a separate one.

Our netlink code is supposed to handle that behaviour, and indeed does
so for most cases, using the nl_foreach() macro.  However, there was a
subtle error in nl_route_dup() which doesn't work with this change.
f00b1534 ("netlink: Don't try to get further datagrams in
nl_route_dup() on NLMSG_DONE") attempted to fix this, but has its own
subtle error.

The problem arises because nl_route_dup(), unlike other cases doesn't
just make a single pass through all the responses to a netlink
request.  It needs to get all the routes, then make multiple passes
through them.  We don't really have anywhere to buffer multiple
datagrams, so we only support the case where all the routes fit in a
single datagram - but we need to fail gracefully when that's not the
case.

After receiving the first datagram of responses (with nl_next()) we
have a first loop scanning them.  It needs to exit when either we run
out of messages in the datagram (!NLMSG_OK()) or when we get a message
indicating the last response (nl_status() <= 0).

What we do after the loop depends on which exit case we had.  If we
saw the last response, we're done, but otherwise we need to receive
more datagrams to discard the rest of the responses.

We attempt to check for that second case by re-checking NLMSG_OK(nh,
status).  However in the got-last-response case, we've altered status
from the number of remaining bytes to the error code (usually 0). That
means NLMSG_OK() now returns false even if it didn't during the loop
check.  To fix this we need separate variables for the number of bytes
left and the final status code.

We also checked status after the loop, but this was redundant: we can
only exit the loop with NLMSG_OK() == true if status <= 0.

Reported-by: Martin Pitt <mpitt@redhat.com>
Fixes: f00b153414b1 ("netlink: Don't try to get further datagrams in nl_route_dup() on NLMSG_DONE")
Fixes: 4d6e9d0816e2 ("netlink: Always process all responses to a netlink request")
Link: https://github.com/containers/podman/issues/22052

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 netlink.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/netlink.c b/netlink.c
index f93f3770..632304c1 100644
--- a/netlink.c
+++ b/netlink.c
@@ -504,7 +504,7 @@ int nl_route_dup(int s_src, unsigned int ifi_src,
 		.rta.rta_len	  = RTA_LENGTH(sizeof(unsigned int)),
 		.ifi		  = ifi_src,
 	};
-	ssize_t nlmsgs_size, status;
+	ssize_t nlmsgs_size, left, status;
 	unsigned dup_routes = 0;
 	struct nlmsghdr *nh;
 	char buf[NLBUFSIZ];
@@ -518,9 +518,9 @@ int nl_route_dup(int s_src, unsigned int ifi_src,
 	 * routes in the buffer at once.
 	 */
 	nh = nl_next(s_src, buf, NULL, &nlmsgs_size);
-	for (status = nlmsgs_size;
-	     NLMSG_OK(nh, status) && (status = nl_status(nh, status, seq)) > 0;
-	     nh = NLMSG_NEXT(nh, status)) {
+	for (left = nlmsgs_size;
+	     NLMSG_OK(nh, left) && (status = nl_status(nh, left, seq)) > 0;
+	     nh = NLMSG_NEXT(nh, left)) {
 		struct rtmsg *rtm = (struct rtmsg *)NLMSG_DATA(nh);
 		struct rtattr *rta;
 		size_t na;
@@ -550,8 +550,7 @@ int nl_route_dup(int s_src, unsigned int ifi_src,
 		}
 	}
 
-	if (nh->nlmsg_type != NLMSG_DONE &&
-	    (!NLMSG_OK(nh, status) || status > 0)) {
+	if (!NLMSG_OK(nh, left)) {
 		/* Process any remaining datagrams in a different
 		 * buffer so we don't overwrite the first one.
 		 */
@@ -577,9 +576,9 @@ int nl_route_dup(int s_src, unsigned int ifi_src,
 	 * to calculate dependencies: let the kernel do that.
 	 */
 	for (i = 0; i < dup_routes; i++) {
-		for (nh = (struct nlmsghdr *)buf, status = nlmsgs_size;
-		     NLMSG_OK(nh, status);
-		     nh = NLMSG_NEXT(nh, status)) {
+		for (nh = (struct nlmsghdr *)buf, left = nlmsgs_size;
+		     NLMSG_OK(nh, left);
+		     nh = NLMSG_NEXT(nh, left)) {
 			uint16_t flags = nh->nlmsg_flags;
 			int rc;
 
-- 
@@ -504,7 +504,7 @@ int nl_route_dup(int s_src, unsigned int ifi_src,
 		.rta.rta_len	  = RTA_LENGTH(sizeof(unsigned int)),
 		.ifi		  = ifi_src,
 	};
-	ssize_t nlmsgs_size, status;
+	ssize_t nlmsgs_size, left, status;
 	unsigned dup_routes = 0;
 	struct nlmsghdr *nh;
 	char buf[NLBUFSIZ];
@@ -518,9 +518,9 @@ int nl_route_dup(int s_src, unsigned int ifi_src,
 	 * routes in the buffer at once.
 	 */
 	nh = nl_next(s_src, buf, NULL, &nlmsgs_size);
-	for (status = nlmsgs_size;
-	     NLMSG_OK(nh, status) && (status = nl_status(nh, status, seq)) > 0;
-	     nh = NLMSG_NEXT(nh, status)) {
+	for (left = nlmsgs_size;
+	     NLMSG_OK(nh, left) && (status = nl_status(nh, left, seq)) > 0;
+	     nh = NLMSG_NEXT(nh, left)) {
 		struct rtmsg *rtm = (struct rtmsg *)NLMSG_DATA(nh);
 		struct rtattr *rta;
 		size_t na;
@@ -550,8 +550,7 @@ int nl_route_dup(int s_src, unsigned int ifi_src,
 		}
 	}
 
-	if (nh->nlmsg_type != NLMSG_DONE &&
-	    (!NLMSG_OK(nh, status) || status > 0)) {
+	if (!NLMSG_OK(nh, left)) {
 		/* Process any remaining datagrams in a different
 		 * buffer so we don't overwrite the first one.
 		 */
@@ -577,9 +576,9 @@ int nl_route_dup(int s_src, unsigned int ifi_src,
 	 * to calculate dependencies: let the kernel do that.
 	 */
 	for (i = 0; i < dup_routes; i++) {
-		for (nh = (struct nlmsghdr *)buf, status = nlmsgs_size;
-		     NLMSG_OK(nh, status);
-		     nh = NLMSG_NEXT(nh, status)) {
+		for (nh = (struct nlmsghdr *)buf, left = nlmsgs_size;
+		     NLMSG_OK(nh, left);
+		     nh = NLMSG_NEXT(nh, left)) {
 			uint16_t flags = nh->nlmsg_flags;
 			int rc;
 
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH] netlink: Fix handling of NLMSG_DONE in nl_route_dup()
  2024-03-19  4:53 [PATCH] netlink: Fix handling of NLMSG_DONE in nl_route_dup() David Gibson
@ 2024-03-19 10:59 ` Stefano Brivio
  0 siblings, 0 replies; 2+ messages in thread
From: Stefano Brivio @ 2024-03-19 10:59 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev, mpitt, pholzing

On Tue, 19 Mar 2024 15:53:41 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> A recent kernel change 87d381973e49 ("genetlink: fit NLMSG_DONE into
> same read() as families") changed netlink behaviour so that the
> NLMSG_DONE terminating a bunch of responses can go in the same
> datagram as those responses, rather than in a separate one.
> 
> Our netlink code is supposed to handle that behaviour, and indeed does
> so for most cases, using the nl_foreach() macro.  However, there was a
> subtle error in nl_route_dup() which doesn't work with this change.
> f00b1534 ("netlink: Don't try to get further datagrams in
> nl_route_dup() on NLMSG_DONE") attempted to fix this, but has its own
> subtle error.
> 
> The problem arises because nl_route_dup(), unlike other cases doesn't
> just make a single pass through all the responses to a netlink
> request.  It needs to get all the routes, then make multiple passes
> through them.  We don't really have anywhere to buffer multiple
> datagrams, so we only support the case where all the routes fit in a
> single datagram - but we need to fail gracefully when that's not the
> case.
> 
> After receiving the first datagram of responses (with nl_next()) we
> have a first loop scanning them.  It needs to exit when either we run
> out of messages in the datagram (!NLMSG_OK()) or when we get a message
> indicating the last response (nl_status() <= 0).
> 
> What we do after the loop depends on which exit case we had.  If we
> saw the last response, we're done, but otherwise we need to receive
> more datagrams to discard the rest of the responses.
> 
> We attempt to check for that second case by re-checking NLMSG_OK(nh,
> status).  However in the got-last-response case, we've altered status
> from the number of remaining bytes to the error code (usually 0). That
> means NLMSG_OK() now returns false even if it didn't during the loop
> check.  To fix this we need separate variables for the number of bytes
> left and the final status code.
> 
> We also checked status after the loop, but this was redundant: we can
> only exit the loop with NLMSG_OK() == true if status <= 0.
> 
> Reported-by: Martin Pitt <mpitt@redhat.com>
> Fixes: f00b153414b1 ("netlink: Don't try to get further datagrams in nl_route_dup() on NLMSG_DONE")
> Fixes: 4d6e9d0816e2 ("netlink: Always process all responses to a netlink request")
> Link: https://github.com/containers/podman/issues/22052
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

Applied.

-- 
Stefano


^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2024-03-19 11:00 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-19  4:53 [PATCH] netlink: Fix handling of NLMSG_DONE in nl_route_dup() David Gibson
2024-03-19 10:59 ` Stefano Brivio

Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).