Message-ID: <2772fc43-252a-4dea-96fb-454d615f9d40@redhat.com>
Date: Fri, 15 Mar 2024 14:11:40 +0100
Subject: Re: [PATCH] netlink: Don't try to get further datagrams in nl_route_dup() on NLMSG_DONE
To: Stefano Brivio, passt-dev@passt.top
References: <20240315112432.382212-1-sbrivio@redhat.com>
From: Paul Holzinger
In-Reply-To: <20240315112432.382212-1-sbrivio@redhat.com>
CC: Martin Pitt, David Gibson
List-Id: Development discussion and patches for passt

On 15/03/2024 12:24, Stefano Brivio wrote:
> Martin reports that, with Fedora Linux kernel version
> kernel-core-6.9.0-0.rc0.20240313gitb0546776ad3f.4.fc41.x86_64,
> including commit 87d381973e49 ("genetlink: fit NLMSG_DONE into same
> read() as families"), pasta doesn't exit once the network namespace
> is gone.
>
> Actually, pasta is completely non-functional, at least with default
> options, because nl_route_dup(), which duplicates routes from the
> parent namespace into the target namespace at start-up, is stuck on
> a second receive operation for RTM_GETROUTE.
>
> However, with that commit, the kernel is now able to fit the whole
> response, including the NLMSG_DONE message, into a single datagram,
> so no further messages will be received.
>
> It turns out that commit 4d6e9d0816e2 ("netlink: Always process all
> responses to a netlink request") accidentally relied on the fact that
> we would always get at least two datagrams as a response to
> RTM_GETROUTE.
>
> That is, the test to check if we expect another datagram, is based
> on the 'status' variable, which is 0 if we just parsed NLMSG_DONE,
> but we'll also expect another datagram if NLMSG_OK on the last
> message is false. But NLMSG_OK with a zero length is always false.
>
> The problem is that we don't distinguish if status is zero because
> we got a NLMSG_DONE message, or because we processed all the
> available datagram bytes.
>
> Introduce an explicit check on NLMSG_DONE. We should probably
> refactor this slightly, for example by introducing a special return
> code from nl_status(), but this is probably the least invasive fix
> for the issue at hand.
>
> Reported-by: Martin Pitt
> Link: https://github.com/containers/podman/issues/22052
> Fixes: 4d6e9d0816e2 ("netlink: Always process all responses to a netlink request")
> Signed-off-by: Stefano Brivio
> ---
>  netlink.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/netlink.c b/netlink.c
> index 9e7cccb..20de9b3 100644
> --- a/netlink.c
> +++ b/netlink.c
> @@ -525,7 +525,8 @@ int nl_route_dup(int s_src, unsigned int ifi_src,
>  		}
>  	}
>  
> -	if (!NLMSG_OK(nh, status) || status > 0) {
> +	if (nh->nlmsg_type != NLMSG_DONE &&
> +	    (!NLMSG_OK(nh, status) || status > 0)) {
>  		/* Process any remaining datagrams in a different
>  		 * buffer so we don't overwrite the first one.
>  		 */

I was about to add my Tested-by when I noticed a weird thing, which also
happens only on the new kernel:

On the host

$ ip route
default via 192.168.122.1 dev enp1s0 proto dhcp src 192.168.122.92 metric 100
192.168.122.0/24 dev enp1s0 proto kernel scope link src 192.168.122.92 metric 100

./pasta --config-net ip route
default via 192.168.122.1 dev enp1s0 proto dhcp metric 100
192.168.122.0/24 dev enp1s0 proto kernel scope link src 192.168.122.92
192.168.122.0/24 dev enp1s0 proto kernel scope link metric 100

It seems we now have the same local route duplicated for some reason? I
am not sure if it is caused by this patch, as I cannot test versions
without this patch on a newer kernel.

I can, however, confirm that this patch works and it no longer hangs.
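
For reference, here is a minimal, Linux-only sketch of the condition discussed in the quoted commit message. It is not the passt code: fake_status() is a hypothetical stand-in for nl_status() that only returns 0 on NLMSG_DONE and 1 otherwise, wants_more() is a made-up helper, and the "datagram" is a hand-built buffer of header-only messages rather than a real RTM_GETROUTE reply. It only illustrates that NLMSG_OK() with a zero length is false, so the old test asks for another datagram even right after NLMSG_DONE.

```c
#include <assert.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical stand-in for nl_status(): 0 on NLMSG_DONE, 1 otherwise */
static int fake_status(const struct nlmsghdr *nh)
{
	return nh->nlmsg_type == NLMSG_DONE ? 0 : 1;
}

/* Returns non-zero if the check would wait for another datagram.
 * with_done: whether the single simulated datagram ends with NLMSG_DONE
 * use_fix:   0 for the old condition, 1 for the patched one
 */
int wants_more(int with_done, int use_fix)
{
	/* Header-only messages. The zeroed spare slots stand in for the
	 * unused tail of a larger receive buffer, so looking at the slot
	 * after the last message stays within bounds.
	 */
	struct nlmsghdr buf[4] = { 0 };
	struct nlmsghdr *nh = buf;
	int n = with_done ? 2 : 1;
	int left, status = 1;

	assert(n <= 4);

	buf[0].nlmsg_len = NLMSG_LENGTH(0);
	buf[0].nlmsg_type = RTM_NEWROUTE;
	if (with_done) {
		buf[1].nlmsg_len = NLMSG_LENGTH(0);
		buf[1].nlmsg_type = NLMSG_DONE;
	}
	left = n * (int)sizeof(*buf);

	/* Stop on NLMSG_DONE (status == 0) or when the bytes run out */
	for (; NLMSG_OK(nh, left) && (status = fake_status(nh)) > 0;
	     nh = NLMSG_NEXT(nh, left))
		;

	if (use_fix)
		return nh->nlmsg_type != NLMSG_DONE &&
		       (!NLMSG_OK(nh, status) || status > 0);

	/* Old test: NLMSG_OK(nh, 0) is false, so this is true after DONE */
	return !NLMSG_OK(nh, status) || status > 0;
}
```

With a datagram that ends in NLMSG_DONE, wants_more(1, 0) is non-zero (the old test would block on a second receive), while wants_more(1, 1) is zero. With a datagram that ends without NLMSG_DONE, both variants still ask for more data, which is the behaviour the original commit wanted to keep.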