Date: Mon, 7 Nov 2022 14:17:44 +1100
From: David Gibson
To: Stefano Brivio
Cc: passt-dev@passt.top
Subject: Re: [PATCH] pasta: Workaround: wait for execvp() to be done in child before entering netns
References: <20221104015328.3831630-1-sbrivio@redhat.com>
In-Reply-To: <20221104015328.3831630-1-sbrivio@redhat.com>
List-Id: Development discussion and patches for passt

On Fri, Nov 04, 2022 at 02:53:28AM +0100, Stefano Brivio wrote:
> This happens about every third time on the two_guests/basic test, and
> on that test only: we clone() twice, first to spawn a child, then to
> spawn a thread to check that we can enter the target network
> namespace.
>
> In this thread, we open a file descriptor associated to the target
> namespace. It might happen that it doesn't exist yet: the kernel can
> legitimately take its time to create one, after clone(). In this
> case, at least on a 5.15 Linux kernel, trying to open that file again
> always yields EACCES, and we get stuck there.
>
> This only occurs if we spawn two instances of pasta very close
> together, as it's done in the two_guests/basic case.
>
> I couldn't figure out what the race condition is, yet, and especially
> if it's a kernel issue or something we're doing wrong. However, if we
> wait until the execvp() in the child is done, the issue disappears.
> I'm not sure yet if it's just because of timing and this is hiding an
> unrelated race condition.
>
> The workaround consists of checking /proc/PID/exe against our own. If
> it's different, that means execvp() already completed and we can
> proceed. It's rather ugly, but much better than the alternative.
> Leave a FIXME there for the moment being.
>
> Signed-off-by: Stefano Brivio

Weird and ugly, but seems like we need it.

Reviewed-by: David Gibson

> ---
>  pasta.c | 19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/pasta.c b/pasta.c
> index db86317..36072b2 100644
> --- a/pasta.c
> +++ b/pasta.c
> @@ -81,9 +81,26 @@ void pasta_child_handler(int signal)
>   */
>  static int pasta_wait_for_ns(void *arg)
>  {
> +	char ns_exe_link[PATH_MAX], ns[PATH_MAX];
>  	struct ctx *c = (struct ctx *)arg;
>  	int flags = O_RDONLY | O_CLOEXEC;
> -	char ns[PATH_MAX];
> +	char exe[PATH_MAX] = { 0 };
> +
> +	/* FIXME: Why do we have to wait until execvp() is done in the child?
> +	 * If we don't, and the first call to open() below returns ENOENT, any
> +	 * subsequent call to it returns EACCES, at least on Linux 5.15, even
> +	 * though the observed PID is correct, and another process can open that
> +	 * path, and call setns() on that.
> +	 */
> +	snprintf(ns_exe_link, PATH_MAX, "/proc/%i/exe", pasta_child_pid);
> +	if (readlink("/proc/self/exe", exe, PATH_MAX - 1) != -1) {
> +		char ns_exe[PATH_MAX] = { 0 };
> +
> +		do {
> +			if (readlink(ns_exe_link, ns_exe, PATH_MAX - 1) == -1)
> +				break;
> +		} while (!strncmp(exe, ns_exe, PATH_MAX - 1));
> +	}
>
>  	snprintf(ns, PATH_MAX, "/proc/%i/ns/net", pasta_child_pid);
>  	do

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson