From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from gandalf.ozlabs.org (gandalf.ozlabs.org [150.107.74.76]) by passt.top (Postfix) with ESMTPS id 83C555A026F for ; Sat, 20 Jan 2024 05:57:39 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202312; t=1705726653; bh=pvB6aLZvg14rv2dIYF0OQiAOrm/yRp5jUBF0o+7lTzE=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=CtKAF+yjj/NDuGKcYDIQuldsOyEzvHB4/Q9WVzD/IHDws1QuDcbgBDzshydMJAEKc YL044K4kRv79lux4e2QpOBsr0s9MPxlO415yWWKz6VIkbA5JK6CKOLWxF5QwOIe5Md 94o1LHKCauLw1zgHlMQ3umoX3eMyoIk8lFgVY9ruYXDoFUy91iAyHilPFx+4Hfev2c FIY/WMxYaSxX4AUI7dtQhq3DDwKzDf7tKqpAMturFI5799GlIaGC5cdBh7Z+7hBwnX dFvk/GHLNneL3V5WYqaRUroqPgTZNs7dsRObOU61yXZfTB4Hzy/wpLIIa7Q9zk+AiW R9TqiSOdgotGA== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4TH43d3Mwyz4wwG; Sat, 20 Jan 2024 15:57:33 +1100 (AEDT) Date: Sat, 20 Jan 2024 15:47:12 +1100 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH v2] tcp.c: leverage MSG_PEEK with offset kernel capability when available Message-ID: References: <20240114180755.1008481-1-jmaloy@redhat.com> <20240118172326.73b6f4ba@elisabeth> <20240119105630.089c5d34@elisabeth> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="viPfhdyPC+5pkgUd" Content-Disposition: inline In-Reply-To: <20240119105630.089c5d34@elisabeth> Message-ID-Hash: J5ASROBV64DFR7FFFQBSYN5YZWPNND3A X-Message-ID-Hash: J5ASROBV64DFR7FFFQBSYN5YZWPNND3A X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: Jon Maloy , passt-dev@passt.top, lvivier@redhat.com, dgibson@redhat.com X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --viPfhdyPC+5pkgUd Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Jan 19, 2024 at 11:45:05AM +0100, Stefano Brivio wrote: > On Fri, 19 Jan 2024 11:05:02 +1100 > David Gibson wrote: >=20 > > On Thu, Jan 18, 2024 at 05:23:26PM +0100, Stefano Brivio wrote: > > > Not a full review, but a couple of comments, mostly about stuff I also > > > had in pkt_selfie.c (review of v1): > > >=20 > > > On Thu, 18 Jan 2024 14:05:38 +1100 > > > David Gibson wrote: > > > =20 > > > > On Sun, Jan 14, 2024 at 01:07:55PM -0500, Jon Maloy wrote: =20 > > > > > > > > > > [...] > > > > > > > > > > + > > > > > + s[0] =3D socket(AF_INET, SOCK_STREAM, IPPROTO_TCP); > > > > > + s[1] =3D socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK, IPPROTO_T= CP); > > > > > + if (s[0] < 0 || s[1] < 0) { > > > > > + perror("Temporary probe socket creation failed\n"); > > > > > + goto out; > > > > > + } > > > > > + if (0 > bind(s[0], &a, sizeof(a))) { =20 > > > >=20 > > > > Since the socket address is unspecified, why do you need to bind at > > > > all? It might be clearer to explicitly set a to localhost + a > > > > specific port - because you're in a temporary namespace, you can re= ly > > > > on every port being available. =20 > > >=20 > > > There are two advantages of bind() without port, and then getsockname= (): > > > first, ip_unprivileged_port_start might have whatever value in our new > > > namespace (we don't touch it), and I wouldn't take for granted we'll > > > have CAP_SYS_ADMIN in it for all the possible start-up combinations. > > >=20 > > > Second, there's no need for a magic value. =20 > >=20 > > Good point. Note that at present we're not bind()ing to an address > > either. > >=20 > > > > > + perror("Temporary probe socket bind() failed\n"); > > > > > + goto out; > > > > > + } > > > > > + if (0 > getsockname(s[0], &a, &((socklen_t) { sizeof(a) }))) { > > > > > + perror("Temporary probe socket getsockname() failed\n"); > > > > > + goto out; > > > > > + } > > > > > + if (0 > listen(s[0], 0)) { > > > > > + perror("Temporary probe socket listen() failed\n"); > > > > > + goto out; > > > > > + } > > > > > + if (0 <=3D connect(s[1], &a, sizeof(a)) || errno !=3D EINPROGRE= SS) { > > > > > + perror("Temporary probe socket connect() failed\n"); > > > > > + goto out; > > > > > + } =20 > > > >=20 > > > > This is assuming that a will now contain the correct address to > > > > connect to. Although it will have the right port, I think the addr= ess > > > > may still be unspecified for the listening socket. =20 > > >=20 > > > Hmm, why? From getsockname(2): > > >=20 > > > getsockname() returns the current address to which the sock= et > > > sockfd is bound [...] =20 > >=20 > > But we've only bound ourselves to 0.0.0.0, which while perfectly > > cromulent for a listening socket, is no good for connect(). >=20 > Hah, "cromulent" just embiggened my dictionary! Why not, though? From > RFC 6890, 2.2.2: >=20 > +----------------------+----------------------------+ > | Attribute | Value | > +----------------------+----------------------------+ > | Address Block | 0.0.0.0/8 | > | Name | "This host on this network"| Huh. I never realised you could use 0.0.0.0 like that. I guess that makes it work, though I still feel like explicitly using localhost would be clearer. > and: >=20 > $ strace -e connect ./pkt_selfie=20 > --- SIGCHLD {si_signo=3DSIGCHLD, si_code=3DCLD_EXITED, si_pid=3D891325,= si_uid=3D1000, si_status=3D0, si_utime=3D0, si_stime=3D0} --- > connect(5, {sa_family=3DAF_INET, sin_port=3Dhtons(51155), sin_addr=3Din= et_addr("0.0.0.0")}, 16) =3D -1 EINPROGRESS (Operation now in progress) > MSG_PEEK with offset not supported > +++ exited with 0 +++ >=20 > with pkt_selfie.c from review of v1: >=20 > https://archives.passt.top/passt-dev/20231206160808.3d312733@elisabeth/ >=20 --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --viPfhdyPC+5pkgUd Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmWrUDgACgkQzQJF27ox 2GdtZhAAhE158ymlodwVu5R/3V6jRju7a6afMbvqd6k24Y60hvKIPdqHLOIa8acR kZdMBi7fsiMy0K+6kez4HEjPeGO0W3bZ4GEEEDVmtBifMrluMCQkvWvF6LINr12n wBgOBu8JQ3EVADjxr6mIkhNpjPqn5Zq5z8Ss116DGxmbb1hNG2P8hylM5YdkpJjU cFXGs4tfJQL3O76CdmWOzJHQEavKtYiupW2qY0jUpczscBls0Q1j7M/oAWE/RO+i 6dc87pCrEbvf5pqrzDgh+YvE2ECoNTtHLHPDElwjpDIoEJM+75nlgSWTMqnnPWoq tYNoAw8FmUDDbYkCFmbsGU+6VorKRa9dYlnycac+EC7oVxHhFINzLk7nCZeIPf/R Jtjbs0k5tPjw8Rj1T3VTjCtz72je7PoI4eXqHV4ZNYCAyxZmEuptR59la7f5cGtk 1VehY6HmguMrzsDjP46mTHgUaMyxp2nkMul3GDPwYgT+ZGw7vIVQZ33vV3OrnVdG pn6KzURbNhyY3xvTKEIPVqp0w06UONmk4wE1MpHo3/Mh4W1Pw0moNXu3Uq92iWrr NUcekSo9/E+yvTLMRwINnBuocT3jnyJeoUoJOi79khZiKHQ9opPoffB1Moi+A7I1 v3IZZ+3Vs36U2Dg5tunYN0ipLg7/QzxOU5vsuiMJkfKnKqzJfzk= =U6k0 -----END PGP SIGNATURE----- --viPfhdyPC+5pkgUd--