From mboxrd@z Thu Jan 1 00:00:00 1970
Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au
Authentication-Results: passt.top;
	dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202512 header.b=d4Xp+EvF;
	dkim-atps=neutral
Received: from mail.ozlabs.org (gandalf.ozlabs.org [150.107.74.76])
	by passt.top (Postfix) with ESMTPS id 5C30C5A0625
	for ; Tue, 13 Jan 2026 04:06:19 +0100 (CET)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=gibson.dropbear.id.au; s=202512; t=1768273576;
	bh=C1yTPYyoHInlaNRematYXkg0mQzAkX9+eYbfN5U56ig=;
	h=Date:From:To:Cc:Subject:References:In-Reply-To:From;
	b=d4Xp+EvF92RzZBpgazhpCZb/+obUteiqcKXC9SS3sAWsJqKJcwqQ5pKZZq0XuRW6N
	 QFS6k5p8WYYzc9h9Tn+J5c2SSVuHjaQAarWoqigmNlGR/+8qjCiKE9AvzPuXpeydXi
	 tqpnuhHIwnfmrYWZ6HX8T1HiPGIF6ihOGK4q30VYl2sf+KBz0cfntSvJ1xNes4ppUq
	 UytU2mlybiRQmwIB0EEerVAiln5HWAYURnoqIm1IiMyuu1SEgjR5G5orAElWLch5RL
	 w0J0JwlHwtd3vf/rvpaJ7QWI2JZKmQee+yjLMqi4tO2UQIk+ceapU9gQF9rfwWyC3C
	 ryB3HksYZXc5A==
Received: by gandalf.ozlabs.org (Postfix, from userid 1007)
	id 4dqvK41NdBz4wR5; Tue, 13 Jan 2026 14:06:16 +1100 (AEDT)
Date: Tue, 13 Jan 2026 14:00:18 +1100
From: David Gibson
To: Stefano Brivio
Subject: Re: [PATCH 1/3] conf: Introduce --no-bindtodevice option for testing
Message-ID:
References: <20260105082850.1985300-1-david@gibson.dropbear.id.au>
	<20260105082850.1985300-2-david@gibson.dropbear.id.au>
	<20260111003314.2e24f648@elisabeth>
	<20260113011201.05a80cb7@elisabeth>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha512;
	protocol="application/pgp-signature"; boundary="qkjAxLP3GjHzZv0b"
Content-Disposition: inline
In-Reply-To: <20260113011201.05a80cb7@elisabeth>
Message-ID-Hash: XUPP74VA3L2XQRE44ILXAFEMLSQTHILT
X-Message-ID-Hash: XUPP74VA3L2XQRE44ILXAFEMLSQTHILT
X-MailFrom: dgibson@gandalf.ozlabs.org
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address;
 member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header
CC: passt-dev@passt.top
X-Mailman-Version: 3.3.8
Precedence: list
List-Id: Development discussion and patches for passt
Archived-At:
Archived-At:
List-Archive:
List-Archive:
List-Help:
List-Owner:
List-Post:
List-Subscribe:
List-Unsubscribe:

--qkjAxLP3GjHzZv0b
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, Jan 13, 2026 at 01:12:01AM +0100, Stefano Brivio wrote:
> On Mon, 12 Jan 2026 14:42:39 +1100
> David Gibson wrote:
>
> > On Sun, Jan 11, 2026 at 12:33:14AM +0100, Stefano Brivio wrote:
> > > On Mon, 5 Jan 2026 19:28:48 +1100
> > > David Gibson wrote:
> > >
> > > > We need to support (as best we can) older kernels which don't allow
> > > > unprivilieged processes to use the SO_BINDTODEVICE socket option.
> > >
> > > Nit: unprivileged
> > >
> > > > Fallcaks for that case are controlled by the c->no_bindtodevice variable.
> > >
> > > Fallbacks
> >
> > Oops & oops. Fixed.
> >
> > > > Currently testing behaviour of those fallbacks requires setting up a test
> > > > system with a kernel that doesn't support the option, which is pretty
> > > > awkward. We can test it almost as well and much more easily by adding a
> > > > command line option to explicitly disable use of SO_BINDTODEVICE.
> > >
> > > It's kind of hard to understand if this patch entirely does that, I
> > > think.
> >
> > Well, it forces c->no_bindtodevice to be true. If we attempt to use
> > SO_BINDTODEVICE in that case, it's a bug elsewhere.
>
> Yes... but we wouldn't find it with this patch. We would only find it
> with a kernel actually not supporting it, or by replacing all the
> setsockopt() calls with something else.

True. What I was looking to test with this was behaviour of the
higher level workarounds - e.g.
that we split -[TU] forwards into 127.0.0.1 and ::1 instead of using *%lo.

> > > We still have a separate, implicit probing of SO_BINDTODEVICE in
> > > sock_l4_(), which is perhaps excluded by c->no_bindtodevice (but then
> > > the comment is misleading?).
> >
> > It should indeed be excluded because we should never call sock_l4_()
> > with a non-empty ifname if !c->no_bindtodevice. It's not really
> > probing, because we outright fail sock_l4_(), there's no fallback
> > there. The error path is there:
> >  * As a backstop if there is a bug elsewhere meaning we do call this
> >    with non-empty ifname
> >  * If the SO_BINDTODEVICE call fails for a reason other than being
> >    globally unavailable (non existent interface, out of memory,
> >    sufficiently perverse selinux module).
> >
> > Given the above, probably should be an err(), and the comment there is
> > no longer accurate / helpful (we already moved it to
> > sock_probe_features()). I've made those changes for the next spin.
>
> Ah, okay.
>
> > > > Like --no-splice this is envisaged as something for developers' and
> > > > testers' convenience, not a supported option for end users. The man page
> > > > text reflects that.
> > >
> > > I never really understood the point of --no-splice, as there was no
> > > user request whatsoever behind it, but fine, the argument was that it
> > > added some needed functionality, even though I couldn't quite grasp
> > > which one it was.
> >
> > That was never the argument from _me_ for --no-splice. For me it was
> > always that it was useful for development / testing / debugging, not
> > that it was (directly) useful to end users.
>
> Right, I think Jon meant it was useful to end users. Otherwise, I would
> have argued, it should be mentioned in the man page, and, I would have
> argued further, the option shouldn't exist at all.
>
> > That's true in at least
> > two ways:
> >  * Allows testing non-splice functionality without having to either
> >    use passt or create some non-loopback addresses
>
> ...but without a loopback address we can't use the tap path anyway.

I'm not sure what you mean here. If I want to exercise something on
the tap path I can use:
    $ pasta --no-splice [whatever else]
    [...]
    $ socat STDIO TCP:localhost:12345
and I don't need to look up my host's current global IP. Or if I want
to test tap with multiple different host-side oaddrs, I can use
127.0.0.0/8 without

> >  * Lets us ask a user reporting a problem to try --no-splice if we
> >    suspect, but aren't sure that it's specific to the splice logic
>
> ...which we never had to do (because it's obvious whether they're using
> the splice logic or not, I simply ask what kind of address they're
> using).

Admittedly, I don't think we've ever used it like that since it was
introduced. I do know that before it existed there were several bugs
where it would have been helpful (obviously not essential) to try
that.

> > My case for --no-bindtodevice is the same: it's useful to me (and
> > therefore I'm guessing to other developers and testers).
>
> I have some doubts about other developers and testers, in the sense
> that to me it really looks like something you need just for the
> implementation.

Eh, maybe.

> > The man page update is pretty explicit about that.
>
> Sure, better than --no-splice.
>
> > > However, with this, the question is where we draw the line. There are
> > > probably other options we could use to make debugging or testing
> > > slightly simpler, but if they don't offer actual functionality, we
> > > always kept them out so far.
> >
> > I mean, maybe, none are immediately occurring to me. If they do in
> > future, I think we should consider adding them.
>
> The thing is, 'passt -h' already reports 117 lines.
> It's still somewhat
> usable, but 200 lines would be substantially less usable, I think.
>
> A counter-example (at least for me) is 'qemu-system-x86_64 -h', 524
> lines on my build. I don't think that's usable and I don't think we
> should go there.
>
> > Note that
> > --no-splice, and especially --no-bindtodevice are extremely simple to
> > implement. I would not be arguing for them if they were more complex.
>
> My concern isn't really about complexity of the implementation, rather
> about the fact that we add more command line options. Users don't need
> them, but they have to scroll through them (in --help output and man
> page) just because we needed them (quite likely) once.

That's a reasonable point.

> > > That's because we already have a long list of options and making it
> > > unnecessarily longer is a disservice to users, I think.
> >
> > That's a valid point. Would it be more palatable to you if we made
> > these suboptions of some explicit "developer hacks" option? (--hacks?
> > --debugopt? --devtest?)
>
> At that point the hassle looks comparable to a mandatory macro
> implementing (or not) the setsockopt(), which can be selected at build
> time.

True, a build time option might do almost as well.

> But anyway, not really, because they would also need to be documented
> command-line options. How would we use them otherwise as developers?

Well, we could limit --help and the man page to just stating the
existence of the top-level option and a pointer to a HACKS.md or
whatever for the details. And we could make it explicitly subject to
change without notice between versions.

> > > Would using something like this:
> > >
> > >   sed -i 's/(\(setsockopt([a-z]*, SOL_SOCKET, SO_BINDTODEVICE\)/((errno = EPERM) || \1/g' *.c
> > >
> > > be totally outrageous, for testing purposes?
> >
> > Totally outrageous, no. A bit more hassle, yes.
>
> ...what about a script? Or a macro with a #define?
>
> > > It has the advantage of making it easier to verify if we're really
> > > disabling the usage of SO_BINDTODEVICE on all the paths (together with
> > > grep / git / editors), and not introducing additional command line
> > > options.
> > >
> > > Another trick I use sometimes to selectively disable or enable kernel
> > > features is to handle system calls via seitan, in this case the
> > > (simple) recipe would be something like:
> > >
> > > [
> > >   {
> > >     "match": [
> > >       { "setsockopt": { "level": "socket", "name": "bindtodevice" } }
> > >     ],
> > >     "return": { "value": "EPERM", "error": -1 }
> > >   }
> > > ]
> > >
> > > but I haven't implemented setsockopt() yet. :(
> > >
> > > > Signed-off-by: David Gibson
> > > > ---
> > > >  conf.c  | 2 ++
> > > >  passt.1 | 6 ++++++
> > > >  2 files changed, 8 insertions(+)
> > > >
> > > > diff --git a/conf.c b/conf.c
> > > > index ceb9aa55..70ea168c 100644
> > > > --- a/conf.c
> > > > +++ b/conf.c
> > > > @@ -962,6 +962,7 @@ static void usage(const char *name, FILE *f, int status)
> > > >      " --no-ndp Disable NDP responses\n"
> > > >      " --no-dhcpv6 Disable DHCPv6 server\n"
> > > >      " --no-ra Disable router advertisements\n"
> > > > +    " --no-bindtodevice Disable SO_BINDTODEVICE\n"
> > > >      " --freebind Bind to any address for forwarding\n"
> > > >      " --no-map-gw Don't map gateway address to host\n"
> > > >      " -4, --ipv4-only Enable IPv4 operation only\n"
> > > > @@ -1454,6 +1455,7 @@ void conf(struct ctx *c, int argc, char **argv)
> > > >      {"no-dhcpv6", no_argument, &c->no_dhcpv6, 1 },
> > > >      {"no-ndp", no_argument, &c->no_ndp, 1 },
> > > >      {"no-ra", no_argument, &c->no_ra, 1 },
> > > > +    {"no-bindtodevice", no_argument, &c->no_bindtodevice, 1},
> > > >      {"no-splice", no_argument, &c->no_splice, 1 },
> > > >      {"freebind", no_argument, &c->freebind, 1 },
> > > >      {"no-map-gw", no_argument, &no_map_gw, 1 },
> > > > diff --git a/passt.1 b/passt.1
> > > > index db0d6620..4859d9e5 100644
> > > > --- a/passt.1
> > > > +++ b/passt.1
> > > > @@ -348,6 +348,12 @@ namespace will be silently dropped.
> > > >  Disable Router Advertisements. Router Solicitations coming from guest or target
> > > >  namespace will be ignored.
> > > >
> > > > +.TP
> > > > +.BR \-\-no-bindtodevice
> > > > +Development/testing option, do not use. Disables use of
> > > > +SO_BINDTODEVICE socket option. Implicitly enabled on older kernels
> > > > +which don't permit unprivileged use of SO_BINDTODEVICE.
> > > > +
> > > >  .TP
> > > >  .BR \-\-freebind
> > > >  Allow any binding address to be specified for \fB-t\fR and \fB-u\fR
> > >
> > > The change looks otherwise good to me... I just hope we can avoid it
> > > somehow, but if not, so be it.
> >
> > I mean, it's not essential to anything that follows, but it was useful
> > to me during testing. If you really don't want it, well, I'll cope.
>
> I'm not sure but... if the threshold is "useful during testing" we
> should also build something reordering TCP segments so that we can
> reproduce https://bugs.passt.top/show_bug.cgi?id=159 from time to time.
>
> And that could actually be a clean and relatively simple implementation,
> but it just adds noise to the documentation.
>
> I don't see a big damage we do with two extra options, but... then
> maybe we should stop at 5? 10?

Hrm, yeah. Ok, you convinced me for now, I'll drop this one.

-- 
David Gibson (he or they)       | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you, not the other way
                                | around.
http://www.ozlabs.org/~dgibson

--qkjAxLP3GjHzZv0b
Content-Type: application/pgp-signature; name=signature.asc

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCgAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmlltTAACgkQzQJF27ox
2GczhA//fofY5x+dRp+KZY5N95OPN/ou6QDCQ0GnKjoJivcZpIR6wxWGprESBv9o
gMw5t94ibXqSMPWlYyDURVwk/HvZlwOR+EbC9GNXjr68o7cY7U3ukdPv36dOdPGF
4SL4XtUfuKadQzXOUvViIznFdCJMBQS+heQV3kjlGfidg13Y2ehwggne1JFJQJb8
BezNvFH/0c9uh+YOuNO6DlNi4Wz67D5QnJdaDX1z1C/48T7NTk+ULVbo2tghfPac
DskK/OXAoVVt8cw0tdyhEg4OAxgBdYR/hMaPFGLeewAB/OOfdQY+Eiqluk8FpkTI
7tncuqYDgwOecV8diYuh6SlQfI8bfJcJ8ngFpUCYCwepw2woWotcMAChr9DKHrx9
ZgxVNaxBgCj90rvnr8UF1Lsa0ZlRIy2SgYcIZBL3668m+kTouVa3vrCOxwOZ2n6P
KHEbr7wA4VN4Gak8ic3Bg4aEY9G38pBYwcBUAcbnxOSB/bNlexOH4v/NHm8qbyKV
2WaRmfCH9HLY0sFEnc6FyOREUlnSwfY97bdGQ/MYXh4KYonyP0fFegunQ8A41Ov4
AY05xWgo8KksaFmsCRjKYyJevRKdXIPMLLjiPkvovk8d8PgMCV1chmeVttt2Gt0Q
LS1UOcZnYMZusqXnxoFL8yvSTRo6/Bun/LPmzUxvsx0m13e338s=
=BdX3
-----END PGP SIGNATURE-----

--qkjAxLP3GjHzZv0b--