On Sat, Sep 17, 2022 at 10:44:41AM +0200, Stefano Brivio wrote:
> On Sat, 17 Sep 2022 13:32:45 +1000
> David Gibson wrote:
> 
> > On Sat, Sep 17, 2022 at 01:55:34AM +0200, Stefano Brivio wrote:
> > > There are some 'sleep 1' commands between starting the socat server
> > > and its corresponding client to avoid races due to the server not
> > > being ready as we start sending data.
> > > 
> > > However, those don't cover all the cases where we might need them,
> > > and in some cases the sleep command actually ended up being before
> > > the server even starts.
> > > 
> > > This fixes occasional failures in TCP and UDP simple transfer tests,
> > > that became apparent with the new command dispatch mechanism.
> > > 
> > > Signed-off-by: Stefano Brivio
> > 
> > Heh, I have a very similar patch in my upcoming series. I used sleep
> > 0.1 though, to avoid taking so long, which seems to be sufficient in
> > practice.
> 
> Just mind POSIX only specifies integers as argument for sleep(1), a
> pattern I commonly use is:
> 
> 	sleep 0.1 || sleep 1

Ah, right.

> > I did look for a proper way to wait for the socat server to be ready,
> > but I couldn't find anything workable.
> 
> I guess we could enable logging and look for "starting accept loop",
> from _xioopen_listen() in xio-listen.c.

Yeah, I suppose.

> Anyway, right now, I'm just trying to get the tests to complete with
> all your pending series, because it's been a while. This doesn't look
> too bad, we can try to improve it later.

Yeah, fair enough. When I rebase I'll see if there's any refinements
I can make relatively easily.

> > I thought I could retry on the
> > client side, but that doesn't work (at least for host to guest
> > connections) because passt is always listening on the forwarded ports,
> > so the client won't see a connection refused from the actual server.
> > I don't think there's any sane way to propagate connection refused in
> > that way, although we could and probably should forward connection
> > resets outwards.
> 
> They should be already, and (manual) retries actually worked for me,

I'm not entirely sure what you mean by manual retries. I was trying
to use socat's retry option, but that only seems to fire on
ECONNREFUSED.

> but it looks like added complexity. Before even checking the connection
> state, tcp_tap_handler() checks for the RST flag:
> 
> 	if (th->rst) {
> 		conn_event(c, conn, CLOSED);
> 		return p->count;
> 	}
> 
> and this will abruptly close the originating socket, which implies a
> RST, on Linux.

Right, I found that... but I'm not sure it is sending an RST.
Setting SO_LINGER with zero timeout does send an RST, but I'm not
sure it does so without it.

Even if it does, it's not really helpful for this. I found that
sending an RST doesn't reliably cause the client socat to exit with
failure. I'm not 100% sure what's going on, but I think what can
happen is that the client sends everything and completes before the
RST arrives, because it all gets buffered before in passt (or at
least in the kernel socket buffers associated with passt). In that
case the client doesn't detect the error.

> We don't send out ICMP or ICMPv6 messages, even if the guest
> additionally replied with one, because we can't ("ping" sockets are
> just for that -- echoes). For TCP, a simple RST is probably not as
> descriptive, but that's all we can do.
> 
> > Btw, I've also been doing a bunch of work on the static checks - with
> > some different options I've brought the runtime of make cppcheck from
> > ~40 minutes down to ~40 seconds :).
> 
> Whoa, 40 minutes? For me it was never more than a couple of minutes,

Yeah, about 36 minutes to be precise, I think. There's a reason I
haven't been attempting to run this until now.

> see https://passt.top/passt/about/#continuous-integration,
> "build/static_checkers".
> 
> I guess it depends on system headers... but in any case that's
> currently taking way too long, also for myself. :)

Huh, interesting. I wonder if it's because it simply isn't finding
all the system headers for you. I see you have a missingIncludeSystem
suppression in there, and when I removed that I found it complained
about not finding the headers on Debian until I added
-I/usr/include/x86_64-linux-gnu.

Anyway, I have patches that fix this which I'll be sending soon. In
any case, since the run time will be exponential in the number of
config defines, it doesn't take a huge difference in the headers to
make a vast difference to runtime.

I also found that both clang-tidy and cppcheck fail for me, I have a
bunch of fixes for this, but I'm not finished yet. There's a couple
of tricky ones, including one with dueling errors - cppcheck says
there's a redundant NULL check, but if I remove it clang-tidy says
there's a NULL pointer dereference. Still working on it.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you. NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
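
P.S.: a rough sketch of the "wait for the accept loop" idea, in case
it's useful when we get back to this. Untested; the port, file names
and verbosity level are placeholders, and I haven't checked which -d
level actually makes the _xioopen_listen() message show up:

	# Start the server with its messages going to a log file
	socat -u -d -d -d -lf /tmp/socat_srv.log \
		TCP4-LISTEN:10001,reuseaddr STDOUT >/tmp/rx &

	# Instead of a fixed sleep, poll the log for the "starting accept
	# loop" message from xio-listen.c before starting the client
	while ! grep -q 'starting accept loop' /tmp/socat_srv.log 2>/dev/null; do
		sleep 0.1 || sleep 1	# POSIX sleep(1) only takes integers
	done

If the message never shows up (say, at a different verbosity level)
this loops forever, so a real version would want a timeout around it.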
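
P.P.S.: on the cppcheck side, this is not the actual change I'll be
sending, just an illustration of the kind of invocation I mean; the
include paths, enabled checks and file list here are assumptions:

	# Give cppcheck the system include directories so it can resolve
	# the headers (on Debian the arch-specific directory is the one it
	# was missing for me), and keep the missingIncludeSystem
	# suppression for whatever it still can't find.
	cppcheck --enable=warning,style --error-exitcode=1 \
		--suppress=missingIncludeSystem \
		-I/usr/include -I/usr/include/x86_64-linux-gnu \
		*.c

Since the run time blows up with the number of preprocessor
configurations it explores, pinning them down (-D/-U) or capping them
(--max-configs) is the other obvious lever.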