public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
* [PATCH] pasta: Workaround: wait for execvp() to be done in child before entering netns
@ 2022-11-04  1:53 Stefano Brivio
  2022-11-07  3:17 ` David Gibson
  0 siblings, 1 reply; 4+ messages in thread
From: Stefano Brivio @ 2022-11-04  1:53 UTC (permalink / raw)
  To: passt-dev; +Cc: David Gibson

This happens about every third time on the two_guests/basic test, and
on that test only: we clone() twice, first to spawn a child, then to
spawn a thread to check that we can enter the target network
namespace.

In this thread, we open a file descriptor associated to the target
namespace. It might happen that it doesn't exist yet: the kernel can
legitimately take its time to create one, after clone(). In this
case, at least on a 5.15 Linux kernel, trying to open that file again
always yields EACCES, and we get stuck there.

This only occurs if we spawn two instances of pasta very close
together, as it's done in the two_guests/basic case.

I couldn't figure out what the race condition is, yet, and especially
if it's a kernel issue or something we're doing wrong. However, if we
wait until the execvp() in the child is done, the issue disappears.
I'm not sure yet if it's just because of timing and this is hiding an
unrelated race condition.

The workaround consists of checking /proc/PID/exe against our own. If
it's different, that means execvp() already completed and we can
proceed. It's rather ugly, but much better than the alternative.
Leave a FIXME there for the moment being.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
 pasta.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/pasta.c b/pasta.c
index db86317..36072b2 100644
--- a/pasta.c
+++ b/pasta.c
@@ -81,9 +81,26 @@ void pasta_child_handler(int signal)
  */
 static int pasta_wait_for_ns(void *arg)
 {
+	char ns_exe_link[PATH_MAX], ns[PATH_MAX];
 	struct ctx *c = (struct ctx *)arg;
 	int flags = O_RDONLY | O_CLOEXEC;
-	char ns[PATH_MAX];
+	char exe[PATH_MAX] = { 0 };
+
+	/* FIXME: Why do we have to wait until execvp() is done in the child?
+	 * If we don't, and the first call to open() below returns ENOENT, any
+	 * subsequent call to it returns EACCES, at least on Linux 5.15, even
+	 * though the observed PID is correct, and another process can open that
+	 * path, and call setns() on that.
+	 */
+	snprintf(ns_exe_link, PATH_MAX, "/proc/%i/exe", pasta_child_pid);
+	if (readlink("/proc/self/exe", exe, PATH_MAX - 1) != -1) {
+		char ns_exe[PATH_MAX] = { 0 };
+
+		do {
+			if (readlink(ns_exe_link, ns_exe, PATH_MAX - 1) == -1)
+				break;
+		} while (!strncmp(exe, ns_exe, PATH_MAX - 1));
+	}
 
 	snprintf(ns, PATH_MAX, "/proc/%i/ns/net", pasta_child_pid);
 	do
-- 
@@ -81,9 +81,26 @@ void pasta_child_handler(int signal)
  */
 static int pasta_wait_for_ns(void *arg)
 {
+	char ns_exe_link[PATH_MAX], ns[PATH_MAX];
 	struct ctx *c = (struct ctx *)arg;
 	int flags = O_RDONLY | O_CLOEXEC;
-	char ns[PATH_MAX];
+	char exe[PATH_MAX] = { 0 };
+
+	/* FIXME: Why do we have to wait until execvp() is done in the child?
+	 * If we don't, and the first call to open() below returns ENOENT, any
+	 * subsequent call to it returns EACCES, at least on Linux 5.15, even
+	 * though the observed PID is correct, and another process can open that
+	 * path, and call setns() on that.
+	 */
+	snprintf(ns_exe_link, PATH_MAX, "/proc/%i/exe", pasta_child_pid);
+	if (readlink("/proc/self/exe", exe, PATH_MAX - 1) != -1) {
+		char ns_exe[PATH_MAX] = { 0 };
+
+		do {
+			if (readlink(ns_exe_link, ns_exe, PATH_MAX - 1) == -1)
+				break;
+		} while (!strncmp(exe, ns_exe, PATH_MAX - 1));
+	}
 
 	snprintf(ns, PATH_MAX, "/proc/%i/ns/net", pasta_child_pid);
 	do
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] pasta: Workaround: wait for execvp() to be done in child before entering netns
  2022-11-04  1:53 [PATCH] pasta: Workaround: wait for execvp() to be done in child before entering netns Stefano Brivio
@ 2022-11-07  3:17 ` David Gibson
  2022-11-07  9:51   ` Stefano Brivio
  0 siblings, 1 reply; 4+ messages in thread
From: David Gibson @ 2022-11-07  3:17 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 2954 bytes --]

On Fri, Nov 04, 2022 at 02:53:28AM +0100, Stefano Brivio wrote:
> This happens about every third time on the two_guests/basic test, and
> on that test only: we clone() twice, first to spawn a child, then to
> spawn a thread to check that we can enter the target network
> namespace.
> 
> In this thread, we open a file descriptor associated to the target
> namespace. It might happen that it doesn't exist yet: the kernel can
> legitimately take its time to create one, after clone(). In this
> case, at least on a 5.15 Linux kernel, trying to open that file again
> always yields EACCES, and we get stuck there.
> 
> This only occurs if we spawn two instances of pasta very close
> together, as it's done in the two_guests/basic case.
> 
> I couldn't figure out what the race condition is, yet, and especially
> if it's a kernel issue or something we're doing wrong. However, if we
> wait until the execvp() in the child is done, the issue disappears.
> I'm not sure yet if it's just because of timing and this is hiding an
> unrelated race condition.
> 
> The workaround consists of checking /proc/PID/exe against our own. If
> it's different, that means execvp() already completed and we can
> proceed. It's rather ugly, but much better than the alternative.
> Leave a FIXME there for the moment being.
> 
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>

Weird and ugly, but seems like we need it.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  pasta.c | 19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/pasta.c b/pasta.c
> index db86317..36072b2 100644
> --- a/pasta.c
> +++ b/pasta.c
> @@ -81,9 +81,26 @@ void pasta_child_handler(int signal)
>   */
>  static int pasta_wait_for_ns(void *arg)
>  {
> +	char ns_exe_link[PATH_MAX], ns[PATH_MAX];
>  	struct ctx *c = (struct ctx *)arg;
>  	int flags = O_RDONLY | O_CLOEXEC;
> -	char ns[PATH_MAX];
> +	char exe[PATH_MAX] = { 0 };
> +
> +	/* FIXME: Why do we have to wait until execvp() is done in the child?
> +	 * If we don't, and the first call to open() below returns ENOENT, any
> +	 * subsequent call to it returns EACCES, at least on Linux 5.15, even
> +	 * though the observed PID is correct, and another process can open that
> +	 * path, and call setns() on that.
> +	 */
> +	snprintf(ns_exe_link, PATH_MAX, "/proc/%i/exe", pasta_child_pid);
> +	if (readlink("/proc/self/exe", exe, PATH_MAX - 1) != -1) {
> +		char ns_exe[PATH_MAX] = { 0 };
> +
> +		do {
> +			if (readlink(ns_exe_link, ns_exe, PATH_MAX - 1) == -1)
> +				break;
> +		} while (!strncmp(exe, ns_exe, PATH_MAX - 1));
> +	}
>  
>  	snprintf(ns, PATH_MAX, "/proc/%i/ns/net", pasta_child_pid);
>  	do

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] pasta: Workaround: wait for execvp() to be done in child before entering netns
  2022-11-07  3:17 ` David Gibson
@ 2022-11-07  9:51   ` Stefano Brivio
  2022-11-07 18:40     ` Stefano Brivio
  0 siblings, 1 reply; 4+ messages in thread
From: Stefano Brivio @ 2022-11-07  9:51 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Mon, 7 Nov 2022 14:17:44 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Fri, Nov 04, 2022 at 02:53:28AM +0100, Stefano Brivio wrote:
> > This happens about every third time on the two_guests/basic test, and
> > on that test only: we clone() twice, first to spawn a child, then to
> > spawn a thread to check that we can enter the target network
> > namespace.
> > 
> > In this thread, we open a file descriptor associated to the target
> > namespace. It might happen that it doesn't exist yet: the kernel can
> > legitimately take its time to create one, after clone(). In this
> > case, at least on a 5.15 Linux kernel, trying to open that file again
> > always yields EACCES, and we get stuck there.
> > 
> > This only occurs if we spawn two instances of pasta very close
> > together, as it's done in the two_guests/basic case.
> > 
> > I couldn't figure out what the race condition is, yet, and especially
> > if it's a kernel issue or something we're doing wrong. However, if we
> > wait until the execvp() in the child is done, the issue disappears.
> > I'm not sure yet if it's just because of timing and this is hiding an
> > unrelated race condition.
> > 
> > The workaround consists of checking /proc/PID/exe against our own. If
> > it's different, that means execvp() already completed and we can
> > proceed. It's rather ugly, but much better than the alternative.
> > Leave a FIXME there for the moment being.
> > 
> > Signed-off-by: Stefano Brivio <sbrivio@redhat.com>  
> 
> Weird and ugly, but seems like we need it.
> 
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

Sorry, I forgot to mention: I didn't actually push this, because while
repeating that test over and over again I hit the same failure at some
point, even with this trick.

So this is probably just affecting the timing. I need to look into it
again.

-- 
Stefano


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] pasta: Workaround: wait for execvp() to be done in child before entering netns
  2022-11-07  9:51   ` Stefano Brivio
@ 2022-11-07 18:40     ` Stefano Brivio
  0 siblings, 0 replies; 4+ messages in thread
From: Stefano Brivio @ 2022-11-07 18:40 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Mon, 7 Nov 2022 10:51:11 +0100
Stefano Brivio <sbrivio@redhat.com> wrote:

> On Mon, 7 Nov 2022 14:17:44 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Fri, Nov 04, 2022 at 02:53:28AM +0100, Stefano Brivio wrote:  
> > > This happens about every third time on the two_guests/basic test, and
> > > on that test only: we clone() twice, first to spawn a child, then to
> > > spawn a thread to check that we can enter the target network
> > > namespace.
> > > 
> > > In this thread, we open a file descriptor associated to the target
> > > namespace. It might happen that it doesn't exist yet: the kernel can
> > > legitimately take its time to create one, after clone(). In this
> > > case, at least on a 5.15 Linux kernel, trying to open that file again
> > > always yields EACCES, and we get stuck there.
> > > 
> > > This only occurs if we spawn two instances of pasta very close
> > > together, as it's done in the two_guests/basic case.
> > > 
> > > I couldn't figure out what the race condition is, yet, and especially
> > > if it's a kernel issue or something we're doing wrong. However, if we
> > > wait until the execvp() in the child is done, the issue disappears.
> > > I'm not sure yet if it's just because of timing and this is hiding an
> > > unrelated race condition.
> > > 
> > > The workaround consists of checking /proc/PID/exe against our own. If
> > > it's different, that means execvp() already completed and we can
> > > proceed. It's rather ugly, but much better than the alternative.
> > > Leave a FIXME there for the moment being.
> > > 
> > > Signed-off-by: Stefano Brivio <sbrivio@redhat.com>    
> > 
> > Weird and ugly, but seems like we need it.
> > 
> > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>  
> 
> Sorry, I forgot to mention: I didn't actually push this, because while
> repeating that test over and over again I hit the same failure at some
> point, even with this trick.
> 
> So this is probably just affecting the timing. I need to look into it
> again.

Tracked at:
  https://bugs.passt.top/show_bug.cgi?id=36

for the moment being.

-- 
Stefano


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-11-07 18:40 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-04  1:53 [PATCH] pasta: Workaround: wait for execvp() to be done in child before entering netns Stefano Brivio
2022-11-07  3:17 ` David Gibson
2022-11-07  9:51   ` Stefano Brivio
2022-11-07 18:40     ` Stefano Brivio

Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).