From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202412 header.b=aJO1gx3g; dkim-atps=neutral Received: from mail.ozlabs.org (gandalf.ozlabs.org [150.107.74.76]) by passt.top (Postfix) with ESMTPS id BCC1E5A0620 for ; Thu, 30 Jan 2025 08:44:01 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202412; t=1738223028; bh=EL8AI7ZrflZM7j4hWhbgFJeMaVd5WVDkIaefprIHEmo=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=aJO1gx3guxAGdbiSsppjLi/ecLyJXNw0Gg5hRWKzvGYNkthWPe6HxUEShvP6kI0Uf 4Io5OaILwcPJjrEUWtFgk+j9s1nvGy3kTx33WpfWCM8dUvbL4JkfUHV2CPLdDIg7VH gVVqW0ANscFmmb9ijCdQCdDgjdnsJzFySuOFL2Jd9EGSi8JVS2CL+RcKqROegZq69L ocAC6+blr1re3vYQ/DmXUKmYnvKuqLt7Q0QJhBP6ZP48ToeK9vimbI2PdESDM1C7kl SXduEgalAXlSwUmXxn3Zqyyf9e1FuqgoP4KhSl8jHXiBBGDw+fMh4BIkzcXiAmHtG6 9QuJ23t/CHrDA== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4Yk9xw2Grxz4wgp; Thu, 30 Jan 2025 18:43:48 +1100 (AEDT) Date: Thu, 30 Jan 2025 18:43:34 +1100 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH 7/7] Introduce passt-repair Message-ID: References: <20250127231532.672363-1-sbrivio@redhat.com> <20250127231532.672363-8-sbrivio@redhat.com> <20250128075131.268f85d1@elisabeth> <20250129080428.472744aa@elisabeth> <20250130055543.1020dec7@elisabeth> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="5tXv82z32UX4kuc6" Content-Disposition: inline In-Reply-To: <20250130055543.1020dec7@elisabeth> Message-ID-Hash: BZRNZD6RXNQFBQ4JH7CJANMJ4AKBKBVE X-Message-ID-Hash: BZRNZD6RXNQFBQ4JH7CJANMJ4AKBKBVE X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top, Laurent Vivier X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --5tXv82z32UX4kuc6 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Jan 30, 2025 at 05:55:43AM +0100, Stefano Brivio wrote: > On Thu, 30 Jan 2025 11:53:08 +1100 > David Gibson wrote: >=20 > > On Wed, Jan 29, 2025 at 08:04:28AM +0100, Stefano Brivio wrote: > > > On Wed, 29 Jan 2025 12:29:27 +1100 > > > David Gibson wrote: > > > =20 > > > > On Tue, Jan 28, 2025 at 07:51:31AM +0100, Stefano Brivio wrote: =20 > > > > > On Tue, 28 Jan 2025 12:51:59 +1100 > > > > > David Gibson wrote: > > > > > =20 > > > > > > On Tue, Jan 28, 2025 at 12:15:32AM +0100, Stefano Brivio wrote:= =20 > > > > > > > A privileged helper to set/clear TCP_REPAIR on sockets on beh= alf of > > > > > > > passt. Not used yet. > > > > > > >=20 > > > > > > > Signed-off-by: Stefano Brivio > > > > > > > --- > > > > > > > Makefile | 10 +++-- > > > > > > > passt-repair.c | 111 +++++++++++++++++++++++++++++++++++++++= ++++++++++ > > > > > > > 2 files changed, 118 insertions(+), 3 deletions(-) > > > > > > > create mode 100644 passt-repair.c > > > > > > >=20 > > > > > > > diff --git a/Makefile b/Makefile > > > > > > > index 1383875..1b71cb0 100644 > > > > > > > --- a/Makefile > > > > > > > +++ b/Makefile > > > > > > > @@ -42,7 +42,8 @@ PASST_SRCS =3D arch.c arp.c checksum.c conf= =2Ec dhcp.c dhcpv6.c flow.c fwd.c \ > > > > > > > tcp.c tcp_buf.c tcp_splice.c tcp_vu.c udp.c udp_flow.c udp_= vu.c util.c \ > > > > > > > vhost_user.c virtio.c vu_common.c > > > > > > > QRAP_SRCS =3D qrap.c > > > > > > > -SRCS =3D $(PASST_SRCS) $(QRAP_SRCS) > > > > > > > +PASST_REPAIR_SRCS =3D passt-repair.c > > > > > > > +SRCS =3D $(PASST_SRCS) $(QRAP_SRCS) $(PASST_REPAIR_SRCS) > > > > > > > =20 > > > > > > > MANPAGES =3D passt.1 pasta.1 qrap.1 > > > > > > > =20 > > > > > > > @@ -72,9 +73,9 @@ mandir ?=3D $(datarootdir)/man > > > > > > > man1dir ?=3D $(mandir)/man1 > > > > > > > =20 > > > > > > > ifeq ($(TARGET_ARCH),x86_64) > > > > > > > -BIN :=3D passt passt.avx2 pasta pasta.avx2 qrap > > > > > > > +BIN :=3D passt passt.avx2 pasta pasta.avx2 qrap passt-repair > > > > > > > else > > > > > > > -BIN :=3D passt pasta qrap > > > > > > > +BIN :=3D passt pasta qrap passt-repair > > > > > > > endif > > > > > > > =20 > > > > > > > all: $(BIN) $(MANPAGES) docs > > > > > > > @@ -101,6 +102,9 @@ pasta.avx2 pasta.1 pasta: pasta%: passt% > > > > > > > qrap: $(QRAP_SRCS) passt.h > > > > > > > $(CC) $(FLAGS) $(CFLAGS) $(CPPFLAGS) -DARCH=3D\"$(TARGET_AR= CH)\" $(QRAP_SRCS) -o qrap $(LDFLAGS) > > > > > > > =20 > > > > > > > +passt-repair: $(PASST_REPAIR_SRCS) > > > > > > > + $(CC) $(FLAGS) $(CFLAGS) $(CPPFLAGS) $(PASST_REPAIR_SRCS) -= o passt-repair $(LDFLAGS) > > > > > > > + > > > > > > > valgrind: EXTRA_SYSCALLS +=3D rt_sigprocmask rt_sigtimedwait= rt_sigaction \ > > > > > > > rt_sigreturn getpid gettid kill clock_gettime mmap \ > > > > > > > mmap2 munmap open unlink gettimeofday futex statx \ > > > > > > > diff --git a/passt-repair.c b/passt-repair.c > > > > > > > new file mode 100644 > > > > > > > index 0000000..e9b9609 > > > > > > > --- /dev/null > > > > > > > +++ b/passt-repair.c > > > > > > > @@ -0,0 +1,111 @@ > > > > > > > +// SPDX-License-Identifier: GPL-2.0-or-later > > > > > > > + > > > > > > > +/* PASST - Plug A Simple Socket Transport > > > > > > > + * for qemu/UNIX domain socket mode > > > > > > > + * > > > > > > > + * passt-repair.c - Privileged helper to set/clear TCP_REPAI= R on sockets > > > > > > > + * > > > > > > > + * Copyright (c) 2025 Red Hat GmbH > > > > > > > + * Author: Stefano Brivio > > > > > > > + * > > > > > > > + * Connect to passt via UNIX domain socket, receive sockets = via SCM_RIGHTS along > > > > > > > + * with commands mapping to TCP_REPAIR values, and switch re= pair mode on or > > > > > > > + * off. Reply by echoing the command. Exit if the command is= INT_MAX. > > > > > > > + */ > > > > > > > + > > > > > > > +#include > > > > > > > +#include > > > > > > > +#include > > > > > > > +#include > > > > > > > +#include > > > > > > > +#include > > > > > > > +#include > > > > > > > +#include > > > > > > > +#include > > > > > > > +#include > > > > > > > + > > > > > > > +#include > > > > > > > + > > > > > > > +#define SCM_MAX_FD 253 /* From Linux kernel (include/net/scm= =2Eh), not in UAPI */ > > > > > > > + > > > > > > > +int main(int argc, char **argv) > > > > > > > +{ > > > > > > > + char buf[CMSG_SPACE(sizeof(int) * SCM_MAX_FD)] > > > > > > > + __attribute__ ((aligned(__alignof__(struct cmsghdr)))); > > > > > > > + struct sockaddr_un a =3D { AF_UNIX, "" }; > > > > > > > + int cmd, fds[SCM_MAX_FD], s, ret, i; > > > > > > > + struct cmsghdr *cmsg; > > > > > > > + struct msghdr msg; > > > > > > > + struct iovec iov; > > > > > > > + > > > > > > > + iov =3D (struct iovec){ &cmd, sizeof(cmd) }; =20 > > > > > >=20 > > > > > > I mean, local to local, it's *probably* fine, but still a netwo= rk > > > > > > protocol not defined in terms of explicit width fields makes me > > > > > > nervous. I'd prefer to see the cmd being a packed structure wi= th > > > > > > fixed width elements. =20 > > > > >=20 > > > > > It actually is, because: > > > > >=20 > > > > > struct { > > > > > int cmd; > > > > > }; > > > > >=20 > > > > > is a packet structure with fixed width elements. Any architecture= we > > > > > build for (at least the ones I'm aware of) has a 32-bit int. We c= an > > > > > make it uint32_t if it makes you feel better. =20 > > > >=20 > > > > Sorry, I should have said "*explicitly* fixed width fields". So, y= es, > > > > uint32_t would make me feel better :) =20 > > >=20 > > > Changed to int8_t anyway meanwhile. We don't need all those bits. =20 > >=20 > > Works or me. > >=20 > > > > > > I also think we should do some sort of basic magic / version ex= change. > > > > > > I don't see any reason we'd need to extend the protocol, but I'd > > > > > > rather have the option if we have to. =20 > > > > >=20 > > > > > passt-repair will be packaged and distributed together with passt, > > > > > though. Versions must match. =20 > > > >=20 > > > > But nothing enforces that. =20 > > >=20 > > > Distribution packages. If I run claws-mail with the wrong version of, > > > say, libpixman, it won't start. If you don't use them, you're on your > > > own. =20 > >=20 > > But shared libraries *do* have versioning checks: there are defined > > compatibility semantics for sonames, and there can be symbol versions > > as well. > >=20 > > > > AIUI KubeVirt will be running passt-repair > > > > in a different context. Which means it may well be deployed by a > > > > different path than the passt binary =20 > > >=20 > > > No, that's not the way it works. It needs to match, in the sense that > > > 1. it's a KubeVirt requirement to have compatible packages between > > > distribution and the "base container image" and 2. this would most > > > likely be sourced from the "base container image" anyway. > > >=20 > > > I maintain the packages for four distributions, plus AppArmor and > > > SELinux policies upstream and downstream, and I take care of updating > > > the package in KubeVirt as well, so I guess I have a vague idea of > > > what's convenient, enforced, burdensome, and so on. > > > =20 > > > > which means however we > > > > distribute it's quite plausible that a downstream screwup could > > > > mismatch the versions. We should endeavour to have a reasonably > > > > graceful failure mode for that. =20 > > >=20 > > > Regardless of this, I think that *this one* is an interface (I wouldn= 't > > > even call it a protocol) that needs to be set in stone, except for > > > hypothetical (and highly unlikely) UAPI additions which we'll be anyw= ay > > > able to accommodate for easily. =20 > >=20 > > Ok, I can buy that, but it's a contradictory position to "versions > > must match". >=20 > Note: "Regardless of this". It's *another* consideration *on top of > that*. >=20 > 1. Versions (builds) match. >=20 > 2. And even if they didn't, it wouldn't be a problem, because this > interface will not change. Hm, ok. I'm way less convinced on both of those points. Which means I'd like to have a clear policy on whether we require versions to match or not. Which we prioritise affects design choices. > > > It's a single socket option with three possible values (for 13 years > > > now), of which we plan to use two. If we want this interface to do > > > anything else, it should be another interface. > > >=20 > > > So there's really no problem with this. > > >=20 > > > Besides, the helper runs with CAP_NET_ADMIN (even though CAP_NET_RAW > > > should ideally suffice), so it needs to be extremely simple and > > > auditable. =20 > >=20 > > Sending and checking a magic number is not a lot of complexity, even > > in something on this scale. >=20 > If you want to have multiple bytes (because I'm forecasting that you > won't be happy with 255 values), it's substantial complexity in > comparison to the current implementation. > > > > > And latency here might matter more than in > > > > > the rest of the migration process. =20 > > > > =20 > > > > > > Plus checking a magic number > > > > > > should make things less damaging and more debuggable if you wer= e to > > > > > > point the repair helper at an entirely unrelated unix socket in= stead > > > > > > of passt's repair socket. =20 > > > > >=20 > > > > > Maybe, yes, even though I don't really see good chances for that > > > > > mistake to happen. Feel free to post a proposal, of course. =20 > > > >=20 > > > > I disagree on the good chances for a mistake thing. In GSS I saw > > > > plenty of occasions where things that shouldn't be mismatched were = due > > > > to some packaging or user screwup. And that's before even consider= ing > > > > the way that KubeVirt deploys its various pieces seems to provide a > > > > number of opportunities to mess this up. > > > >=20 > > > > So, I'll see what I can come up with. I'm fine with requiring > > > > matching versions if it's actually checked. Maybe a magic derived > > > > from our git hash, or even our build-id. =20 > > >=20 > > > Both would make things significantly less usable, because they would > > > make different but compatible builds incompatible, and different > > > implementations rather inconvenient. =20 > >=20 > > Ok, so you're definitely now saying versions *don't* need to match. >=20 > They don't need to, no. They will match, but they don't need to. >=20 > > > For example, it might be a practical solution to have a Go > > > implementation of this in KubeVirt's virt-handler itself, but if it > > > needs to extract information or strings from the binary, that becomes > > > impractical. =20 > >=20 > > Ok... could we at least add just a magic number then. If we do ever > > need a new protocol we can change it, otherwise the protocol immutable > > for now. >=20 > Adding a non-byte magic number implies handling short reads and short > writes, which I think is absolutely unnecessary. Feel free to propose > an implementation, as usual. Ok, it's on my list. > If you are happy with a single byte magic number, then I suppose that, > given that we're just using three values, we could encode that > information using a combination of the remaining bits, which has the > advantage of not needing any specific implementation until it's > actually needed (never, I suppose), because passt-repair already > terminates on an unknown command value. Yeah, as you predicted, I'm not really happy with a 1 byte magic number. --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --5tXv82z32UX4kuc6 Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmebLaUACgkQzQJF27ox 2GcqwQ//YW0P7Ogzlsmr7c0/9sDy7sSJ3EMB4btsuBbpZURKoXiudClNR1WeMMJR GiHAV/FH87UjXIfINIoZGlhG5GTbexJiCuenUo3oxV9dO44YPq3Xt6U+rXut2xy9 SPjxfswcW/E45wCf0/lskDw8Wd4yBvRf5siJovdyIeBirqWIsCOYX0uCfZTOxSz7 Ma32LD5ypyw70byFCjCVyJV88Q6Un8A1/QZ941Y2AXvnG+LD/ORyfhyXIuyYGsil wsxbeOKG7EGdAcWs98h7Jeh4OFgpszaQhBC4rQQUEiSOqWbJO32esqtlJy9vzTqC +g7MBxpVNLlIsQ+kwAMbBw6Z5StNBP1dzHGHmR6Oy2//Juea3etRwcqyx6tspvvx 2Kn7XEv7as5GSnunLAY0uoik3hXFZjfZBEV1DGpIQRzIoY+eJHEZFQrR51Hz7xzC cZGif9RLmGpnMyttmBR9+3TkF0uul55EToqNkVKqi4DjlCwQRlrRJhZko6hmMY1M ioELhefRzx9TZexfhNBFHbZBwpDGboPA6WRzVuTLifJMv2VC6ulMEMAEzQvbsJVc PGW6SeITG1bGn3iblyHenMyCFmLXFuSOLWXMfXfKzo6xor73JgClOVeeLdZGNwgn eAn2Db+9OExVEO0hIhMMTezH4v4kgaYHZ1tD5XcTWdSnFhY2Nns= =Xc9n -----END PGP SIGNATURE----- --5tXv82z32UX4kuc6--