On Tue, Jan 13, 2026 at 01:12:01AM +0100, Stefano Brivio wrote:
> On Mon, 12 Jan 2026 14:42:39 +1100
> David Gibson wrote:
>
> > On Sun, Jan 11, 2026 at 12:33:14AM +0100, Stefano Brivio wrote:
> > > On Mon, 5 Jan 2026 19:28:48 +1100
> > > David Gibson wrote:
> > >
> > > > We need to support (as best we can) older kernels which don't allow
> > > > unprivilieged processes to use the SO_BINDTODEVICE socket option.
> > >
> > > Nit: unprivileged
> > >
> > > > Fallcaks for that case are controlled by the c->no_bindtodevice variable.
> > >
> > > Fallbacks
> >
> > Oops & oops. Fixed.
> >
> > > > Currently testing behaviour of those fallbacks requires setting up a test
> > > > system with a kernel that doesn't support the option, which is pretty
> > > > awkward. We can test it almost as well and much more easily by adding a
> > > > command line option to explicitly disable use of SO_BINDTODEVICE.
> > >
> > > It's kind of hard to understand if this patch entirely does that, I
> > > think.
> >
> > Well, it forces c->no_bindtodevice to be true. If we attempt to use
> > SO_BINDTODEVICE in that case, it's a bug elsewhere.
>
> Yes... but we wouldn't find it with this patch. We would only find it
> with a kernel actually not supporting it, or by replacing all the
> setsockopt() calls with something else.

True. What I was looking to test with this was behaviour of the higher
level workarounds - e.g. that we split -[TU] forwards into 127.0.0.1
and ::1 instead of using *%lo.

> > > We still have a separate, implicit probing of SO_BINDTODEVICE in
> > > sock_l4_(), which is perhaps excluded by c->no_bindtodevice (but then
> > > the comment is misleading?).
> >
> > It should indeed be excluded because we should never call sock_l4_()
> > with a non-empty ifname if !c->no_bindtodevice. It's not really
> > probing, because we outright fail sock_l4_(), there's no fallback
> > there.
> > The error path is there:
> >   * As a backstop if there is a bug elsewhere meaning we do call this
> >     with non-empty ifname
> >   * If the SO_BINDTODEVICE call fails for a reason other than being
> >     globally unavailable (non existent interface, out of memory,
> >     sufficiently perverse selinux module).
> >
> > Given the above, probably should be an err(), and the comment there is
> > no longer accurate / helpful (we already moved it to
> > sock_probe_features()). I've made those changes for the next spin.
>
> Ah, okay.
>
> > > > Like --no-splice this is envisaged as something for developers' and
> > > > testers' convenience, not a supported option for end users. The man page
> > > > text reflects that.
> > >
> > > I never really understood the point of --no-splice, as there was no
> > > user request whatsoever behind it, but fine, the argument was that it
> > > added some needed functionality, even though I couldn't quite grasp
> > > which one it was.
> >
> > That was never the argument from _me_ for --no-splice. For me it was
> > always that it was useful for development / testing / debugging, not
> > that it was (directly) useful to end users.
>
> Right, I think Jon meant it was useful to end users. Otherwise, I would
> have argued, it should be mentioned in the man page, and, I would have
> argued further, the option shouldn't exist at all.
>
> > That's true in at least
> > two ways:
> >   * Allows testing non-splice functionality without having to either
> >     use passt or create some non-loopback addresses
>
> ...but without a loopback address we can't use the tap path anyway.

I'm not sure what you mean here. If I want to exercise something on the
tap path I can use:
    $ pasta --no-splice [whatever else]
    [...]
    $ socat STDIO TCP:localhost:12345
and I don't need to look up my host's current global IP.
Or if I want to test tap with multiple different host-side oaddrs, I can
use 127.0.0.0/8 without

> >   * Lets us ask a user reporting a problem to try --no-splice if we
> >     suspect, but aren't sure that it's specific to the splice logic
>
> ...which we never had to do (because it's obvious whether they're using
> the splice logic or not, I simply ask what kind of address they're
> using).

Admittedly, I don't think we've ever used it like that since it was
introduced. I do know that before it existed there were several bugs
where it would have been helpful (obviously not essential) to try that.

> > My case for --no-bindtodevice is the same: it's useful to me (and
> > therefore I'm guessing to other developers and testers).
>
> I have some doubts about other developers and testers, in the sense
> that to me it really looks like something you need just for the
> implementation.

Eh, maybe.

> > The man page update is pretty explicit about that.
>
> Sure, better than --no-splice.
>
> > > However, with this, the question is where we draw the line. There are
> > > probably other options we could use to make debugging or testing
> > > slightly simpler, but if they don't offer actual functionality, we
> > > always kept them out so far.
> >
> > I mean, maybe, none are immediately occurring to me. If they do in
> > future, I think we should consider adding them.
>
> The thing is, 'passt -h' already reports 117 lines. It's still somewhat
> usable, but 200 lines would be substantially less usable, I think.
>
> A counter-example (at least for me) is 'qemu-system-x86_64 -h', 524
> lines on my build. I don't think that's usable and I don't think we
> should go there.
>
> > Note that
> > --no-splice, and especially --no-bindtodevice are extremely simple to
> > implement. I would not be arguing for them if they were more complex.
>
> My concern isn't really about complexity of the implementation, rather
> about the fact that we add more command line options.
> Users don't need
> them, but they have to scroll through them (in --help output and man
> page) just because we needed them (quite likely) once.

That's a reasonable point.

> > > That's because we already have a long list of options and making it
> > > unnecessarily longer is a disservice to users, I think.
> >
> > That's a valid point. Would it be more palatable to you if we made
> > these suboptions of some explicit "developer hacks" option? (--hacks?
> > --debugopt? --devtest?)
>
> At that point the hassle looks comparable to a mandatory macro
> implementing (or not) the setsockopt(), which can be selected at build
> time.

True, a build time option might do almost as well.

> But anyway, not really, because they would also need to be documented
> command-line options. How would we use them otherwise as developers?

Well, we could limit --help and the man page to just stating the
existence of the top-level option and a pointer to a HACKS.md or
whatever for the details. And we could make it explicitly subject to
change without notice between versions.

> > > Would using something like this:
> > >
> > >   sed -i 's/(\(setsockopt([a-z]*, SOL_SOCKET, SO_BINDTODEVICE\)/((errno = EPERM) || \1/g' *.c
> > >
> > > be totally outrageous, for testing purposes?
> >
> > Totally outrageous, no. A bit more hassle, yes.
>
> ...what about a script? Or a macro with a #define?
>
> > > It has the advantage of making it easier to verify if we're really
> > > disabling the usage of SO_BINDTODEVICE on all the paths (together with
> > > grep / git / editors), and not introducing additional command line
> > > options.
> > >
> > > Another trick I use sometimes to selectively disable or enable kernel
> > > features is to handle system calls via seitan, in this case the
> > > (simple) recipe would something like:
> > >
> > > [
> > >   {
> > >     "match": [
> > >       { "setsockopt": { "level": socket", "name": "bindtodevice" } }
> > >     ],
> > >     "return": { "value": "EPERM", "error": -1 }
> > >   }
> > > ]
> > >
> > > but I haven't implemented setsockopt() yet. :(
> > >
> > > > Signed-off-by: David Gibson
> > > > ---
> > > >  conf.c  | 2 ++
> > > >  passt.1 | 6 ++++++
> > > >  2 files changed, 8 insertions(+)
> > > >
> > > > diff --git a/conf.c b/conf.c
> > > > index ceb9aa55..70ea168c 100644
> > > > --- a/conf.c
> > > > +++ b/conf.c
> > > > @@ -962,6 +962,7 @@ static void usage(const char *name, FILE *f, int status)
> > > >      " --no-ndp Disable NDP responses\n"
> > > >      " --no-dhcpv6 Disable DHCPv6 server\n"
> > > >      " --no-ra Disable router advertisements\n"
> > > > +    " --no-bindtodevice Disable SO_BINDTODEVICE\n"
> > > >      " --freebind Bind to any address for forwarding\n"
> > > >      " --no-map-gw Don't map gateway address to host\n"
> > > >      " -4, --ipv4-only Enable IPv4 operation only\n"
> > > > @@ -1454,6 +1455,7 @@ void conf(struct ctx *c, int argc, char **argv)
> > > >      {"no-dhcpv6", no_argument, &c->no_dhcpv6, 1 },
> > > >      {"no-ndp", no_argument, &c->no_ndp, 1 },
> > > >      {"no-ra", no_argument, &c->no_ra, 1 },
> > > > +    {"no-bindtodevice", no_argument, &c->no_bindtodevice, 1},
> > > >      {"no-splice", no_argument, &c->no_splice, 1 },
> > > >      {"freebind", no_argument, &c->freebind, 1 },
> > > >      {"no-map-gw", no_argument, &no_map_gw, 1 },
> > > > diff --git a/passt.1 b/passt.1
> > > > index db0d6620..4859d9e5 100644
> > > > --- a/passt.1
> > > > +++ b/passt.1
> > > > @@ -348,6 +348,12 @@ namespace will be silently dropped.
> > > >  Disable Router Advertisements. Router Solicitations coming from guest or target
> > > >  namespace will be ignored.
> > > >
> > > > +.TP
> > > > +.BR \-\-no-bindtodevice
> > > > +Development/testing option, do not use. Disables use of
> > > > +SO_BINDTODEVICE socket option. Implicitly enabled on older kernels
> > > > +which don't permit unprivileged use of SO_BINDTODEVICE.
> > > > +
> > > >  .TP
> > > >  .BR \-\-freebind
> > > >  Allow any binding address to be specified for \fB-t\fR and \fB-u\fR
> > >
> > > The change looks otherwise good to me... I just hope we can avoid it
> > > somehow, but if not, so be it.
> >
> > I mean, it's not essential to anything that follows, but it was useful
> > to me during testing. If you really don't want it, well, I'll cope.
>
> I'm not sure but... if the threshold is "useful during testing" we
> should also build something reordering TCP segments so that we can
> reproduce https://bugs.passt.top/show_bug.cgi?id=159 from time to time.
>
> And that could actually be a clean and relatively simple implementation,
> but it just adds noise to the documentation.
>
> I don't see a big damage we do with two extra options, but... then
> maybe we should we stop at 5? 10?

Hrm, yeah. Ok, you convinced me for now, I'll drop this one.

-- 
David Gibson (he or they)       | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you, not the other way
                                | around.
http://www.ozlabs.org/~dgibson