public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
* Race condition with pasta COMMAND...
@ 2023-01-30 18:08 Paul Holzinger
  2023-01-31 13:02 ` Stefano Brivio
  0 siblings, 1 reply; 4+ messages in thread
From: Paul Holzinger @ 2023-01-30 18:08 UTC (permalink / raw)
  To: passt-dev

Hi all,

while debugging some things I used `./pasta --config-net -- nslookup 
google.com 1.1.1.1` to test dns.
The problem is that does not work because the nslookup process will be 
executed before pasta is
ready with the netns setup, i.e. compare `./pasta --config-net -- ip a`.

So a workaround is to spawn a shell and sleep: `sh -c "sleep 1; nslookup 
google.com 1.1.1.1"`
However this is ugly and does not ensure that the netns is ready after 
one second. As a user
I would expect pasta to wait until the setup is finished before it calls 
exec().
I can send a patch if you agree and I find some time.

---
Paul


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Race condition with pasta COMMAND...
  2023-01-30 18:08 Race condition with pasta COMMAND Paul Holzinger
@ 2023-01-31 13:02 ` Stefano Brivio
  2023-01-31 18:00   ` Paul Holzinger
  0 siblings, 1 reply; 4+ messages in thread
From: Stefano Brivio @ 2023-01-31 13:02 UTC (permalink / raw)
  To: Paul Holzinger; +Cc: passt-dev

Hi Paul,

On Mon, 30 Jan 2023 19:08:14 +0100
Paul Holzinger <pholzing@redhat.com> wrote:

> Hi all,
> 
> while debugging some things I used `./pasta --config-net -- nslookup 
> google.com 1.1.1.1` to test dns.
> The problem is that does not work because the nslookup process will be 
> executed before pasta is
> ready with the netns setup, i.e. compare `./pasta --config-net -- ip a`.

Thanks for the report. I also hit this a couple of months ago but I
couldn't find yet the time to deal with it:

  https://bugs.passt.top/show_bug.cgi?id=37

> So a workaround is to spawn a shell and sleep: `sh -c "sleep 1; nslookup 
> google.com 1.1.1.1"`
> However this is ugly and does not ensure that the netns is ready after 
> one second. As a user
> I would expect pasta to wait until the setup is finished before it calls 
> exec().

Absolutely, yes.

As I mentioned on that ticket, I *think* that the only way to make sure
the setup is actually complete is to query back via netlink addresses
and routes we configured -- simply waiting until we successfully sent
netlink messages isn't enough, because it takes a (substantial) while
until addresses and routes are actually available.

> I can send a patch if you agree and I find some time.

That would be great, thanks in advance!

If you get to it, I think you could reuse nl_route() and nl_addr() from
netlink.c to perform the checks -- they might need to be extended a
bit, I'm not sure.

-- 
Stefano


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Race condition with pasta COMMAND...
  2023-01-31 13:02 ` Stefano Brivio
@ 2023-01-31 18:00   ` Paul Holzinger
  2023-01-31 18:43     ` Stefano Brivio
  0 siblings, 1 reply; 4+ messages in thread
From: Paul Holzinger @ 2023-01-31 18:00 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

Hi Stefano,

On 31/01/2023 14:02, Stefano Brivio wrote:
> Hi Paul,
>
> On Mon, 30 Jan 2023 19:08:14 +0100
> Paul Holzinger <pholzing@redhat.com> wrote:
>
>> Hi all,
>>
>> while debugging some things I used `./pasta --config-net -- nslookup
>> google.com 1.1.1.1` to test dns.
>> The problem is that does not work because the nslookup process will be
>> executed before pasta is
>> ready with the netns setup, i.e. compare `./pasta --config-net -- ip a`.
> Thanks for the report. I also hit this a couple of months ago but I
> couldn't find yet the time to deal with it:
>
>    https://bugs.passt.top/show_bug.cgi?id=37
>
>> So a workaround is to spawn a shell and sleep: `sh -c "sleep 1; nslookup
>> google.com 1.1.1.1"`
>> However this is ugly and does not ensure that the netns is ready after
>> one second. As a user
>> I would expect pasta to wait until the setup is finished before it calls
>> exec().
> Absolutely, yes.
>
> As I mentioned on that ticket, I *think* that the only way to make sure
> the setup is actually complete is to query back via netlink addresses
> and routes we configured -- simply waiting until we successfully sent
> netlink messages isn't enough, because it takes a (substantial) while
> until addresses and routes are actually available.
>
Is there any reason why we would explicitly need to query netlink
after the setup is done?
With NLM_F_ACK  it should wait long enough, no? We use it like that in
podman and never experienced an problem with the network not being
ready apart from ipv6 DAD and I don't think we need worry about this here.

From  a quick test, at least for my use case it seems to be working when
I hold the exec until the isolate_prefork() call.
>> I can send a patch if you agree and I find some time.
> That would be great, thanks in advance!
>
> If you get to it, I think you could reuse nl_route() and nl_addr() from
> netlink.c to perform the checks -- they might need to be extended a
> bit, I'm not sure.
>


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Race condition with pasta COMMAND...
  2023-01-31 18:00   ` Paul Holzinger
@ 2023-01-31 18:43     ` Stefano Brivio
  0 siblings, 0 replies; 4+ messages in thread
From: Stefano Brivio @ 2023-01-31 18:43 UTC (permalink / raw)
  To: Paul Holzinger; +Cc: passt-dev

On Tue, 31 Jan 2023 19:00:02 +0100
Paul Holzinger <pholzing@redhat.com> wrote:

> Hi Stefano,
> 
> On 31/01/2023 14:02, Stefano Brivio wrote:
> > Hi Paul,
> >
> > On Mon, 30 Jan 2023 19:08:14 +0100
> > Paul Holzinger <pholzing@redhat.com> wrote:
> >  
> >> Hi all,
> >>
> >> while debugging some things I used `./pasta --config-net -- nslookup
> >> google.com 1.1.1.1` to test dns.
> >> The problem is that does not work because the nslookup process will be
> >> executed before pasta is
> >> ready with the netns setup, i.e. compare `./pasta --config-net -- ip a`.  
> > Thanks for the report. I also hit this a couple of months ago but I
> > couldn't find yet the time to deal with it:
> >
> >    https://bugs.passt.top/show_bug.cgi?id=37
> >  
> >> So a workaround is to spawn a shell and sleep: `sh -c "sleep 1; nslookup
> >> google.com 1.1.1.1"`
> >> However this is ugly and does not ensure that the netns is ready after
> >> one second. As a user
> >> I would expect pasta to wait until the setup is finished before it calls
> >> exec().  
> > Absolutely, yes.
> >
> > As I mentioned on that ticket, I *think* that the only way to make sure
> > the setup is actually complete is to query back via netlink addresses
> > and routes we configured -- simply waiting until we successfully sent
> > netlink messages isn't enough, because it takes a (substantial) while
> > until addresses and routes are actually available.
> >  
> Is there any reason why we would explicitly need to query netlink
> after the setup is done?
> With NLM_F_ACK  it should wait long enough, no? We use it like that in
> podman and never experienced an problem with the network not being
> ready apart from ipv6 DAD and I don't think we need worry about this here.

Perhaps in practice yes, but we don't have a formal guarantee that any
given routing table/trie/structure is actually synchronised just
because we get an acknowledgement for the netlink message (that's all
NLM_F_ACK says).

In terms of RCU, by the time we read NLM_F_ACK in passt, we know that
any pending (from the address/routing update perspective) read-side
critical section must have exited.

However, routing tables have generation IDs and there might be other
mechanisms (current or future) that prevent us from making this
assumption. Also, given multiple CPU threads on a machine, I don't
think it's really that straightforward.

> From  a quick test, at least for my use case it seems to be working when
> I hold the exec until the isolate_prefork() call.

I tried that too in the past and I had some "failures", I don't
remember if that was on "ip route show" or "ip address show".

Still, it's by all means an improvement on the original, and if it
works for you reliably that's a good sign, so I'd definitely welcome
that patch.

I'm just not convinced it's a complete fix, so I would also welcome a
more involved patch doing the check via netlink. :)

-- 
Stefano


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-01-31 18:44 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-30 18:08 Race condition with pasta COMMAND Paul Holzinger
2023-01-31 13:02 ` Stefano Brivio
2023-01-31 18:00   ` Paul Holzinger
2023-01-31 18:43     ` Stefano Brivio

Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).