From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by passt.top (Postfix) with ESMTP id 1042D5A026F for ; Fri, 15 Mar 2024 15:53:36 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1710514416; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=6GhHiGAma17uVqeaR9rPF3KRPDM1hOSsF4cNbfQRdlM=; b=By3jkcxXX26Nmdd1mzUP738IZ/rxbquGzZMC0hx2cFft9ynWy9kjTCBL8foHWaIhLyrCrx jMrcD35zFNfm6m6WGdK8YkQd9Vd7ptm6cx37Q02wK2X3WmxVlBt52WNAdmHMaArBnoejLd 9j03+GwWaW1zXnv/fr0Z9YEqeADYOFg= Received: from mail-ed1-f69.google.com (mail-ed1-f69.google.com [209.85.208.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-401-6-s3L8_DODufLPPSxN88rw-1; Fri, 15 Mar 2024 10:53:34 -0400 X-MC-Unique: 6-s3L8_DODufLPPSxN88rw-1 Received: by mail-ed1-f69.google.com with SMTP id 4fb4d7f45d1cf-5689d80e808so1209612a12.2 for ; Fri, 15 Mar 2024 07:53:34 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1710514413; x=1711119213; h=content-transfer-encoding:mime-version:organization:references :in-reply-to:message-id:subject:cc:to:from:date:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=6GhHiGAma17uVqeaR9rPF3KRPDM1hOSsF4cNbfQRdlM=; b=R7IX0daKetO3gOP2mzryFuxxC6YRWOSYpiJAoqDEGfPfk5F+AUlWLsQoHWCHO63t9U FwESuamGfiXuIsfFDFjZO/60V+JzXiGLiJilrL3EgVChcA+VIPhgoXzDeOObyt6xrOBY aTHIcbJJPg2WS2Gl4FpGVcptLyBlH7jFr4GO/bQYvn4JcOtpKlyc08K1IN2yTtRsE72h Mb8qKU6VbwiXIWI2616h4as/d+sZLED1ANoA7dcw5Y00Jh7l1levszGdBRAkGOguXe1v eNvek5/teaBFCdvsxeTG+oF83/nuxZ+Pb4h2OHNQp+JOlZIpitslY5+MukeX+L5Gokib FIag== X-Gm-Message-State: AOJu0YyTgEBj7Eb3Use7ejljakQIQV218V29Pb80TgZ32OSj0kGrojI0 VL1X0xlpqLfPLpcxx5gzNl4lVL/2KcIfM/rJBhkVBaRWLX6j1Ls+Srf/vU4TufaB59FRIWNXlIR XXNwigcrmuzTJzrMn/onPMMXAWwet3dx4VpPWYmcWGQz/M6aMnw== X-Received: by 2002:a05:6402:5019:b0:567:9ddd:65c3 with SMTP id p25-20020a056402501900b005679ddd65c3mr3223078eda.29.1710514413206; Fri, 15 Mar 2024 07:53:33 -0700 (PDT) X-Google-Smtp-Source: AGHT+IFEFsErbPweRmaiAm0QZUPq4hapB2uGObhu2D6EvdaOKYuQ/Kbm1KFE7xqC2CJ81h3jkqfxtw== X-Received: by 2002:a05:6402:5019:b0:567:9ddd:65c3 with SMTP id p25-20020a056402501900b005679ddd65c3mr3223063eda.29.1710514412678; Fri, 15 Mar 2024 07:53:32 -0700 (PDT) Received: from maya.cloud.tilaa.com (maya.cloud.tilaa.com. [164.138.29.33]) by smtp.gmail.com with ESMTPSA id dk4-20020a0564021d8400b0056706105abesm1764541edb.33.2024.03.15.07.53.31 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Fri, 15 Mar 2024 07:53:31 -0700 (PDT) Date: Fri, 15 Mar 2024 15:52:50 +0100 From: Stefano Brivio To: Paul Holzinger Subject: Re: [PATCH] netlink: Don't try to get further datagrams in nl_route_dup() on NLMSG_DONE Message-ID: <20240315155250.237cb94f@elisabeth> In-Reply-To: <2772fc43-252a-4dea-96fb-454d615f9d40@redhat.com> References: <20240315112432.382212-1-sbrivio@redhat.com> <2772fc43-252a-4dea-96fb-454d615f9d40@redhat.com> Organization: Red Hat X-Mailer: Claws Mail 4.2.0 (GTK 3.24.36; x86_64-pc-linux-gnu) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID-Hash: 7G4RWRIPTFHWK4LYKZD7OHMBCP67KIOM X-Message-ID-Hash: 7G4RWRIPTFHWK4LYKZD7OHMBCP67KIOM X-MailFrom: sbrivio@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top, Martin Pitt , David Gibson X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Fri, 15 Mar 2024 14:11:40 +0100 Paul Holzinger wrote: > On 15/03/2024 12:24, Stefano Brivio wrote: > > Martin reports that, with Fedora Linux kernel version > > kernel-core-6.9.0-0.rc0.20240313gitb0546776ad3f.4.fc41.x86_64, > > including commit 87d381973e49 ("genetlink: fit NLMSG_DONE into same > > read() as families"), pasta doesn't exit once the network namespace > > is gone. > > > > Actually, pasta is completely non-functional, at least with default > > options, because nl_route_dup(), which duplicates routes from the > > parent namespace into the target namespace at start-up, is stuck on > > a second receive operation for RTM_GETROUTE. > > > > However, with that commit, the kernel is now able to fit the whole > > response, including the NLMSG_DONE message, into a single datagram, > > so no further messages will be received. > > > > It turns out that commit 4d6e9d0816e2 ("netlink: Always process all > > responses to a netlink request") accidentally relied on the fact that > > we would always get at least two datagrams as a response to > > RTM_GETROUTE. > > > > That is, the test to check if we expect another datagram, is based > > on the 'status' variable, which is 0 if we just parsed NLMSG_DONE, > > but we'll also expect another datagram if NLMSG_OK on the last > > message is false. But NLMSG_OK with a zero length is always false. > > > > The problem is that we don't distinguish if status is zero because > > we got a NLMSG_DONE message, or because we processed all the > > available datagram bytes. > > > > Introduce an explicit check on NLMSG_DONE. We should probably > > refactor this slightly, for example by introducing a special return > > code from nl_status(), but this is probably the least invasive fix > > for the issue at hand. > > > > Reported-by: Martin Pitt > > Link: https://github.com/containers/podman/issues/22052 > > Fixes: 4d6e9d0816e2 ("netlink: Always process all responses to a netlink request") > > Signed-off-by: Stefano Brivio > > --- > > netlink.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/netlink.c b/netlink.c > > index 9e7cccb..20de9b3 100644 > > --- a/netlink.c > > +++ b/netlink.c > > @@ -525,7 +525,8 @@ int nl_route_dup(int s_src, unsigned int ifi_src, > > } > > } > > > > - if (!NLMSG_OK(nh, status) || status > 0) { > > + if (nh->nlmsg_type != NLMSG_DONE && > > + (!NLMSG_OK(nh, status) || status > 0)) { > > /* Process any remaining datagrams in a different > > * buffer so we don't overwrite the first one. > > */ > I was about to add my tested-by when I noticed a weird thing, but that > happens only on the new kernel as well: > > On the host $ ip route default via 192.168.122.1 dev enp1s0 proto dhcp > src 192.168.122.92 metric 100 192.168.122.0/24 dev enp1s0 proto kernel > scope link src 192.168.122.92 metric 100 ./pasta --config-net ip route > default via 192.168.122.1 dev enp1s0 proto dhcp metric 100 > 192.168.122.0/24 dev enp1s0 proto kernel scope link src 192.168.122.92 > 192.168.122.0/24 dev enp1s0 proto kernel scope link metric 100 It seems > we now have the same local route duplicated for some reason? I can reproduce this, it looks rather weird at the moment. Let's say I have two IPv4 routes on the host: $ ip ro sh default via 88.198.0.161 dev eth0 proto dhcp src 88.198.0.164 metric 100 88.198.0.160/27 dev eth0 proto kernel scope link src 88.198.0.164 metric 100 if I stop pasta after it re-creates the default route (and not the subnet one, yet), say, at this point (the message with padding confuses strace so we don't get the full decoding, but it's valid): recvfrom(5, [[{nlmsg_len=68, nlmsg_type=RTM_NEWROUTE, nlmsg_flags=NLM_F_MULTI|NLM_F_DUMP_FILTERED, nlmsg_seq=13, nlmsg_pid=28341}, {rtm_family=AF_INET, rtm_dst_len=0, rtm_src_len=0, rtm_tos=0, rtm_table=RT_TABLE_MAIN, rtm_protocol=RTPROT_DHCP, rtm_scope=RT_SCOPE_UNIVERSE, rtm_type=RTN_UNICAST, rtm_flags=0}, [[{nla_len=8, nla_type=RTA_TABLE}, RT_TABLE_MAIN], [{nla_len=8, nla_type=RTA_PRIORITY}, 100], [{nla_len=8, nla_type=RTA_PREFSRC}, inet_addr("88.198.0.164")], [{nla_len=8, nla_type=RTA_GATEWAY}, inet_addr("88.198.0.161")], [{nla_len=8, nla_type=RTA_OIF}, if_nametoindex("eth0")]]], [{nlmsg_len=68, nlmsg_type=RTM_NEWROUTE, nlmsg_flags=NLM_F_MULTI|NLM_F_DUMP_FILTERED, nlmsg_seq=13, nlmsg_pid=28341}, {rtm_family=AF_INET, rtm_dst_len=27, rtm_src_len=0, rtm_tos=0, rtm_table=RT_TABLE_MAIN, rtm_protocol=RTPROT_KERNEL, rtm_scope=RT_SCOPE_LINK, rtm_type=RTN_UNICAST, rtm_flags=0}, [[{nla_len=8, nla_type=RTA_TABLE}, RT_TABLE_MAIN], [{nla_len=8, nla_type=RTA_DST}, inet_addr("88.198.0.160")], [{ nla_len=8, nla_type=RTA_PRIORITY}, 100], [{nla_len=8, nla_type=RTA_PREFSRC}, inet_addr("88.198.0.164")], [{nla_len=8, nla_type=RTA_OIF}, if_nametoindex("eth0")]]], [{nlmsg_len=20, nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI|NLM_F_DUMP_FILTERED, nlmsg_seq=13, nlmsg_pid=28341}, 0]], 65536, 0, NULL, NULL) = 156 sendto(7, [{nlmsg_len=68, nlmsg_type=0x18 /* NLMSG_??? */, nlmsg_flags=NLM_F_REQUEST|NLM_F_MULTI|NLM_F_ACK|0x400, nlmsg_seq=14, nlmsg_pid=0}, "\x02\x00\x00\x00\xfe\x10\x00\x01\x00\x00\x00\x00\x08\x00\x0f\x00\xfe\x00\x00\x00\x08\x00\x06\x00\x64\x00\x00\x00\x08\x00\x00\x00"...], 68, 0, NULL, 0) = 68 I see that the kernel already created both in the target namespace: # ip ro sh default via 88.198.0.161 dev eth0 proto dhcp metric 100 88.198.0.160/27 dev eth0 proto kernel scope link src 88.198.0.164 ...and the second one gets the RTA_PREFSRC attribute we just stripped. I'm still trying to find out what might cause this, even if it looks harmless. -- Stefano