Date: Mon, 29 May 2023 00:08:56 +0200
From: Stefano Brivio
To: Juan Orti
Subject: Re: IPv6 UDP not working
Message-ID: <20230529000856.6e846656@elisabeth>
References: <20230528163812.69e359dd@elisabeth>
Organization: Red Hat
CC: David Gibson, "passt-user@passt.top"
List-Id: "For users: support, questions and answers"

On Sun, 28 May 2023 16:27:13 +0000
Juan Orti wrote:

> ------- Original Message -------
> On Sunday, 28 May 2023 at 16:38, Stefano Brivio wrote:
>
> > I guess that might come from the IPV6_PKTINFO ancillary data
> > (cmsg_type 0x32) -- I'm not sure how and why it's used here as strace
> > doesn't dump the CMSG_DATA content, but, having a look at
> > ip6_datagram_send_ctl() (net/ipv6/datagram.c), EINVAL might come from:
> >
> > 1. a link-local address being passed along... I doubt that's the case
> >
> > 2. a non-local address (or one we can't bind to anyway) being used. To
> >    check if we're in this case, it would be helpful if you could share
> >    the addressing information from the container (ip -6 address show),
> >    and if you could try 'sysctl -w net.ipv6.ip_nonlocal_bind = 1',
> >    again from the container.
>
>
> net.ipv6.ip_nonlocal_bind=1 is not helping.
> This is the container network config:
>
> # ip -6 address show
> 1: lo: mtu 65536 state UNKNOWN qlen 1000
>     inet6 ::1/128 scope host
>        valid_lft forever preferred_lft forever
> 2: enp88s0: mtu 65520 state UNKNOWN qlen 1000
>     inet6 fddc:f797:78ef:70::5/64 scope global flags 02
>        valid_lft forever preferred_lft forever
>     inet6 fe80::5cef:4eff:fe6c:551f/64 scope link
>        valid_lft forever preferred_lft forever
>
> # ip -6 r show table all
> fddc:f797:78ef:70::/64 dev enp88s0 metric 256
> fe80::/64 dev enp88s0 metric 256
> default via fe80::ea9f:80ff:fe5d:3d6e dev enp88s0 metric 1024
> local ::1 dev lo table local metric 0
> local fddc:f797:78ef:70::5 dev enp88s0 table local metric 0
> local fe80::5cef:4eff:fe6c:551f dev enp88s0 table local metric 0
> multicast ff00::/8 dev enp88s0 table local metric 256
>
>
> With a tcpdump inside the container I can see that the incoming packets are actually arriving with the link-local address as the destination (is this expected?).
>
> 16:18:26.248659 IP6 (hlim 255, next-header UDP (17) payload length: 63) fddc:f797:78ef:10::b46.42091 > fe80::5cef:4eff:fe6c:551f.53: [udp sum ok] 6215+ [1au] A? www.google.com. (55)

Hmm, it depends:

  https://passt.top/passt/tree/udp.c?id=e3b19530e4a689f9f8e417ebf737dfca2340342b#n646

I'm not sure what the original source address of our DNS query is (you
can find that out with tcpdump in the parent namespace).

For example, if it's a loopback address, we go ahead and try to convert
both source and destination address to our notion of (observed)
link-local addresses, because we can't use a loopback address on a
non-loopback interface (non-lo in the container).
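For illustration only, here is a minimal sketch of the kind of
source-address check involved: it sorts a source address into loopback,
link-local, or anything else, using the standard IN6_IS_ADDR_LOOPBACK()
and IN6_IS_ADDR_LINKLOCAL() macros. The enum and function names are made
up for this example, and this is not the code from udp.c: the real logic
handles more cases.

```c
#include <arpa/inet.h>
#include <netinet/in.h>

/* Hypothetical labels for the three cases, not names used by passt */
enum src_kind {
	SRC_INVALID = -1,	/* text is not a valid IPv6 address */
	SRC_LOOPBACK,		/* ::1 */
	SRC_LINKLOCAL,		/* fe80::/10 */
	SRC_OTHER,		/* anything else, the final 'else' case */
};

/* Classify an already parsed source address */
static enum src_kind src_classify(const struct in6_addr *src)
{
	if (IN6_IS_ADDR_LOOPBACK(src))
		return SRC_LOOPBACK;

	if (IN6_IS_ADDR_LINKLOCAL(src))
		return SRC_LINKLOCAL;

	return SRC_OTHER;
}

/* Convenience wrapper: parse a textual address, then classify it */
static enum src_kind src_classify_str(const char *text)
{
	struct in6_addr a;

	if (inet_pton(AF_INET6, text, &a) != 1)
		return SRC_INVALID;

	return src_classify(&a);
}
```

With the addresses from this thread, fddc:f797:78ef:10::b46 is
classified as SRC_OTHER and fe80::ea9f:80ff:fe5d:3d6e as SRC_LINKLOCAL.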
But I guess in this case it's not a loopback address: the default
gateway address, copied to the container, is fe80::ea9f:80ff:fe5d:3d6e,
which is a link-local address, but we don't use it, so I assume we end
up either in the IN6_IS_ADDR_LINKLOCAL(src) condition, or in the final
'else' clause.

At that point, the address we've seen the guest using becomes our
destination address. It can even be a link-local address if we haven't
observed a unicast address being used yet.

It would be interesting to see what happens if you generate traffic,
from the container, coming from fddc:f797:78ef:70::5, before a DNS query
is sent (a TCP request via IPv6 should be enough).

I won't swear to the correctness of this logic: it's the result of
handling several corner cases, it's rather ugly at the moment, and David
is currently considering how to clean it up.

By the way, this might also happen to be "fixed" on HEAD, as there we
copy all the addresses and all the routes, by default, from the parent
namespace to the container namespace.

> 16:18:31.253942 IP6 (hlim 255, next-header UDP (17) payload length: 63) fddc:f797:78ef:10::b46.34965 > fe80::5cef:4eff:fe6c:551f.53: [udp sum ok] 6215+ [1au] A? www.google.com. (55)
> 16:18:36.257294 IP6 (hlim 255, next-header UDP (17) payload length: 63) fddc:f797:78ef:10::b46.55302 > fe80::5cef:4eff:fe6c:551f.53: [udp sum ok] 6215+ [1au] A? www.google.com. (55)
>
>
> TCP also uses the link-local address, however it works:

...yes, as far as I know there are no normative references preventing a
non-link-local address from contacting a link-local one.

This just happens to be a problem because AdguardHome uses
IPV6_PKTINFO, with that same address I guess, in its sendmsg(), and, for
some reason I didn't really investigate, that leads to EINVAL on Linux,
but it looks like an implementation detail (specific to UDP) to me.

-- 
Stefano