From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: passt-dev@passt.top
Subject: Re: [PATCH 1/3] conf: Introduce --no-bindtodevice option for testing
Date: Tue, 13 Jan 2026 14:00:18 +1100 [thread overview]
Message-ID: <aWW1Qrn-rb9I9aY9@zatzit> (raw)
In-Reply-To: <20260113011201.05a80cb7@elisabeth>
[-- Attachment #1: Type: text/plain, Size: 11180 bytes --]
On Tue, Jan 13, 2026 at 01:12:01AM +0100, Stefano Brivio wrote:
> On Mon, 12 Jan 2026 14:42:39 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
>
> > On Sun, Jan 11, 2026 at 12:33:14AM +0100, Stefano Brivio wrote:
> > > On Mon, 5 Jan 2026 19:28:48 +1100
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > >
> > > > We need to support (as best we can) older kernels which don't allow
> > > > unprivilieged processes to use the SO_BINDTODEVICE socket option.
> > >
> > > Nit: unprivileged
> > >
> > > > Fallcaks for that case are controlled by the c->no_bindtodevice variable.
> > >
> > > Fallbacks
> >
> > Oops & oops. Fixed.
> >
> > > > Currently testing behaviour of those fallbacks requires setting up a test
> > > > system with a kernel that doesn't support the option, which is pretty
> > > > awkward. We can test it almost as well and much more easily by adding a
> > > > command line option to explicitly disable use of SO_BINDTODEVICE.
> > >
> > > It's kind of hard to understand if this patch entirely does that, I
> > > think.
> >
> > Well, it forces c->no_bindtodevice to be true. If we attempt to use
> > SO_BINDTODEVICE in that case, it's a bug elsewhere.
>
> Yes... but we wouldn't find it with this patch. We would only find it
> with a kernel actually not supporting it, or by replacing all the
> setsockopt() calls with something else.
True. What I was looking to test with this was behaviour of the
higher level workarounds - e.g. that we split -[TU] forwards into
127.0.0.1 and ::1 instead of using *%lo.
> > > We still have a separate, implicit probing of SO_BINDTODEVICE in
> > > sock_l4_(), which is perhaps excluded by c->no_bindtodevice (but then
> > > the comment is misleading?).
> >
> > It should indeed be excluded because we should never call sock_l4_()
> > with a non-empty ifname if !c->no_bindtodevice. It's not really
> > probing, because we outright fail sock_l4_(), there's no fallback
> > there. The error path is there:
> > * As a backstop if there is a bug elsewhere meaning we do call this
> > with non-empty ifname
> > * If the SO_BINDTODEVICE call fails for a reason other than being
> > globally unavailable (non existent interface, out of memory,
> > sufficiently perverse selinux module).
> >
> > Given the above, probably should be an err(), and the comment there is
> > no longer accurate / helpful (we already moved it to
> > sock_probe_features()). I've made those changes for the next spin.
>
> Ah, okay.
>
> > > > Like --no-splice this is envisaged as something for developers' and
> > > > testers' convenience, not a supported option for end users. The man page
> > > > text reflects that.
> > >
> > > I never really understood the point of --no-splice, as there was no
> > > user request whatsoever behind it, but fine, the argument was that it
> > > added some needed functionality, even though I couldn't quite grasp
> > > which one it was.
> >
> > That was never the argument from _me_ for --no-splice. For me it was
> > always that it was useful for development / testing / debugging, not
> > that it was (directly) useful to end users.
>
> Right, I think Jon meant it was useful to end users. Otherwise, I would
> have argued, it should be mentioned in the man page, and, I would have
> argued further, the option shouldn't exist at all.
>
> > That's true in at least
> > two ways:
> > * Allows testing non-splice functionality without having to either
> > use passt or create some non-loopback addresses
>
> ...but without a loopback address we can't use the tap path anyway.
I'm not sure what you mean here. If I want to exercise something on
the tap path I can use:
$ pasta --no-splice [whatever else]
[...]
$ socat STDIO TCP:localhost:12345
and I don't need to look up my host's current global IP. Or if I want
to test tap with multiple different host-side oaddrs, I can use
127.0.0.0/8 without
> > * Lets us ask a user reporting a problem to try --no-splice if we
> > suspect, but aren't sure that it's specific to the splice logic
>
> ...which we never had to do (because it's obvious whether they're using
> the splice logic or not, I simply ask what kind of address they're
> using).
Admittedly, I don't think we've ever used it like that since it was
introduced. I do know that before it existed there were several bugs
where it would have been helpful (obviously not essential) to try
that.
> > My case for --no-bindtodevice is the same: it's useful to me (and
> > therefore I'm guessing to other developers and testers).
>
> I have some doubts about other developers and testers, in the sense
> that to me it really looks like something you need just for the
> implementation.
Eh, maybe.
> > The man page update is pretty explicit about that.
>
> Sure, better than --no-splice.
>
> > > However, with this, the question is where we draw the line. There are
> > > probably other options we could use to make debugging or testing
> > > slightly simpler, but if they don't offer actual functionality, we
> > > always kept them out so far.
> >
> > I mean, maybe, none are immediately occurring to me. If they do in
> > future, I think we should consider adding them.
>
> The thing is, 'passt -h' already reports 117 lines. It's still somewhat
> usable, but 200 lines would be substantially less usable, I think.
>
> A counter-example (at least for me) is 'qemu-system-x86_64 -h', 524
> lines on my build. I don't think that's usable and I don't think we
> should go there.
>
> > Note that
> > --no-splice, and especially --no-bindtodevice are extremely simple to
> > implement. I would not be arguing for them if they were more complex.
>
> My concern isn't really about complexity of the implementation, rather
> about the fact that we add more command line options. Users don't need
> them, but they have to scroll through them (in --help output and man
> page) just because we needed them (quite likely) once.
That's a reasonable point.
> > > That's because we already have a long list of options and making it
> > > unnecessarily longer is a disservice to users, I think.
> >
> > That's a valid point. Would it be more palatable to you if we made
> > these suboptions of some explicit "developer hacks" option? (--hacks?
> > --debugopt? --devtest?)
>
> At that point the hassle looks comparable to a mandatory macro
> implementing (or not) the setsockopt(), which can be selected at build
> time.
True, a build time option might do almost as well.
> But anyway, not really, because they would also need to be documented
> command-line options. How would we use them otherwise as developers?
Well, we could limit --help and the man page to just stating the
existence of the top-level option and a pointer to a HACKS.md or
whatever for the details. And we could make it explicitly subject to
change without notice between versions.
> > > Would using something like this:
> > >
> > > sed -i 's/(\(setsockopt([a-z]*, SOL_SOCKET, SO_BINDTODEVICE\)/((errno = EPERM) || \1/g' *.c
> > >
> > > be totally outrageous, for testing purposes?
> >
> > Totally outrageous, no. A bit more hassle, yes.
>
> ...what about a script? Or a macro with a #define?
>
> > > It has the advantage of making it easier to verify if we're really
> > > disabling the usage of SO_BINDTODEVICE on all the paths (together with
> > > grep / git / editors), and not introducing additional command line
> > > options.
> > >
> > > Another trick I use sometimes to selectively disable or enable kernel
> > > features is to handle system calls via seitan, in this case the
> > > (simple) recipe would something like:
> > >
> > > [
> > > {
> > > "match": [
> > > { "setsockopt": { "level": socket", "name": "bindtodevice" } }
> > > ],
> > > "return": { "value": "EPERM", "error": -1 }
> > > }
> > > ]
> > >
> > > but I haven't implemented setsockopt() yet. :(
> > >
> > > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > > > ---
> > > > conf.c | 2 ++
> > > > passt.1 | 6 ++++++
> > > > 2 files changed, 8 insertions(+)
> > > >
> > > > diff --git a/conf.c b/conf.c
> > > > index ceb9aa55..70ea168c 100644
> > > > --- a/conf.c
> > > > +++ b/conf.c
> > > > @@ -962,6 +962,7 @@ static void usage(const char *name, FILE *f, int status)
> > > > " --no-ndp Disable NDP responses\n"
> > > > " --no-dhcpv6 Disable DHCPv6 server\n"
> > > > " --no-ra Disable router advertisements\n"
> > > > + " --no-bindtodevice Disable SO_BINDTODEVICE\n"
> > > > " --freebind Bind to any address for forwarding\n"
> > > > " --no-map-gw Don't map gateway address to host\n"
> > > > " -4, --ipv4-only Enable IPv4 operation only\n"
> > > > @@ -1454,6 +1455,7 @@ void conf(struct ctx *c, int argc, char **argv)
> > > > {"no-dhcpv6", no_argument, &c->no_dhcpv6, 1 },
> > > > {"no-ndp", no_argument, &c->no_ndp, 1 },
> > > > {"no-ra", no_argument, &c->no_ra, 1 },
> > > > + {"no-bindtodevice", no_argument, &c->no_bindtodevice, 1},
> > > > {"no-splice", no_argument, &c->no_splice, 1 },
> > > > {"freebind", no_argument, &c->freebind, 1 },
> > > > {"no-map-gw", no_argument, &no_map_gw, 1 },
> > > > diff --git a/passt.1 b/passt.1
> > > > index db0d6620..4859d9e5 100644
> > > > --- a/passt.1
> > > > +++ b/passt.1
> > > > @@ -348,6 +348,12 @@ namespace will be silently dropped.
> > > > Disable Router Advertisements. Router Solicitations coming from guest or target
> > > > namespace will be ignored.
> > > >
> > > > +.TP
> > > > +.BR \-\-no-bindtodevice
> > > > +Development/testing option, do not use. Disables use of
> > > > +SO_BINDTODEVICE socket option. Implicitly enabled on older kernels
> > > > +which don't permit unprivileged use of SO_BINDTODEVICE.
> > > > +
> > > > .TP
> > > > .BR \-\-freebind
> > > > Allow any binding address to be specified for \fB-t\fR and \fB-u\fR
> > >
> > > The change looks otherwise good to me... I just hope we can avoid it
> > > somehow, but if not, so be it.
> >
> > I mean, it's not essential to anything that follows, but it was useful
> > to me during testing. If you really don't want it, well, I'll cope.
>
> I'm not sure but... if the threshold is "useful during testing" we
> should also build something reordering TCP segments so that we can
> reproduce https://bugs.passt.top/show_bug.cgi?id=159 from time to time.
>
> And that could actually be a clean and relatively simple implementation,
> but it just adds noise to the documentation.
>
> I don't see a big damage we do with two extra options, but... then
> maybe we should we stop at 5? 10?
Hrm, yeah. Ok, you convinced me for now, I'll drop this one.
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2026-01-13 3:06 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-01-05 8:28 [PATCH 0/3] Allow listen functions to return fds David Gibson
2026-01-05 8:28 ` [PATCH 1/3] conf: Introduce --no-bindtodevice option for testing David Gibson
2026-01-10 23:33 ` Stefano Brivio
2026-01-12 3:42 ` David Gibson
2026-01-13 0:12 ` Stefano Brivio
2026-01-13 3:00 ` David Gibson [this message]
2026-01-05 8:28 ` [PATCH 2/3] tcp, udp, conf: Don't silently ignore listens on unsupported IP versions David Gibson
2026-01-10 23:33 ` Stefano Brivio
2026-01-12 3:48 ` David Gibson
2026-01-13 0:12 ` Stefano Brivio
2026-01-13 3:05 ` David Gibson
2026-01-05 8:28 ` [PATCH 3/3] tcp, udp: Make {tcp,udp}_listen() return socket fds David Gibson
2026-01-10 23:33 ` Stefano Brivio
2026-01-12 3:50 ` David Gibson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aWW1Qrn-rb9I9aY9@zatzit \
--to=david@gibson.dropbear.id.au \
--cc=passt-dev@passt.top \
--cc=sbrivio@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).