From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202412 header.b=FDe56E16; dkim-atps=neutral Received: from mail.ozlabs.org (gandalf.ozlabs.org [150.107.74.76]) by passt.top (Postfix) with ESMTPS id 08DFC5A0626 for ; Thu, 30 Jan 2025 01:53:30 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202412; t=1738198384; bh=bfPn3lRhuHdWT0cjZ/zczkzSYeO1Hrz5pm3Rkt9dQDs=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=FDe56E16+p1nvRtYVn9KLJQaEPjCD70ZyAc9or3H6wkVoP4Vb+s1C6+v1rw3sXKux O+kVPqn4Ph1KFb/F8R5K0fEISrgdB5msUadrUjnNqXhMVbn/TBbk/MsyG9BGPODLQo FUJGgc2gBYDIHzsby479GhubAFizxX0Q12f2peMo7L0uah4+48gPcwniblHCFVK6ge 2huxkNbTAcJf10EcIMSZnTTz2S57dv+aW7+KVdqpmmCGkWP0kN6qar55yjVm1Npnnm UcCa3pHFPaVob4XxvxHGAO8E+ESdO8CNWMkQ1ZZkbTe544dnqbr0u5xfynveE2EbKY KxzMmqHwVEK6w== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4Yk0r02wrnz4x6P; Thu, 30 Jan 2025 11:53:04 +1100 (AEDT) Date: Thu, 30 Jan 2025 11:53:08 +1100 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH 7/7] Introduce passt-repair Message-ID: References: <20250127231532.672363-1-sbrivio@redhat.com> <20250127231532.672363-8-sbrivio@redhat.com> <20250128075131.268f85d1@elisabeth> <20250129080428.472744aa@elisabeth> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="sWOWKbQQcZgpjwhS" Content-Disposition: inline In-Reply-To: <20250129080428.472744aa@elisabeth> Message-ID-Hash: NFAIIYVOFZ57RXRXQU6RFBWC3UV7KE4F X-Message-ID-Hash: NFAIIYVOFZ57RXRXQU6RFBWC3UV7KE4F X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top, Laurent Vivier X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --sWOWKbQQcZgpjwhS Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Jan 29, 2025 at 08:04:28AM +0100, Stefano Brivio wrote: > On Wed, 29 Jan 2025 12:29:27 +1100 > David Gibson wrote: >=20 > > On Tue, Jan 28, 2025 at 07:51:31AM +0100, Stefano Brivio wrote: > > > On Tue, 28 Jan 2025 12:51:59 +1100 > > > David Gibson wrote: > > > =20 > > > > On Tue, Jan 28, 2025 at 12:15:32AM +0100, Stefano Brivio wrote: =20 > > > > > A privileged helper to set/clear TCP_REPAIR on sockets on behalf = of > > > > > passt. Not used yet. > > > > >=20 > > > > > Signed-off-by: Stefano Brivio > > > > > --- > > > > > Makefile | 10 +++-- > > > > > passt-repair.c | 111 +++++++++++++++++++++++++++++++++++++++++++= ++++++ > > > > > 2 files changed, 118 insertions(+), 3 deletions(-) > > > > > create mode 100644 passt-repair.c > > > > >=20 > > > > > diff --git a/Makefile b/Makefile > > > > > index 1383875..1b71cb0 100644 > > > > > --- a/Makefile > > > > > +++ b/Makefile > > > > > @@ -42,7 +42,8 @@ PASST_SRCS =3D arch.c arp.c checksum.c conf.c d= hcp.c dhcpv6.c flow.c fwd.c \ > > > > > tcp.c tcp_buf.c tcp_splice.c tcp_vu.c udp.c udp_flow.c udp_vu.c= util.c \ > > > > > vhost_user.c virtio.c vu_common.c > > > > > QRAP_SRCS =3D qrap.c > > > > > -SRCS =3D $(PASST_SRCS) $(QRAP_SRCS) > > > > > +PASST_REPAIR_SRCS =3D passt-repair.c > > > > > +SRCS =3D $(PASST_SRCS) $(QRAP_SRCS) $(PASST_REPAIR_SRCS) > > > > > =20 > > > > > MANPAGES =3D passt.1 pasta.1 qrap.1 > > > > > =20 > > > > > @@ -72,9 +73,9 @@ mandir ?=3D $(datarootdir)/man > > > > > man1dir ?=3D $(mandir)/man1 > > > > > =20 > > > > > ifeq ($(TARGET_ARCH),x86_64) > > > > > -BIN :=3D passt passt.avx2 pasta pasta.avx2 qrap > > > > > +BIN :=3D passt passt.avx2 pasta pasta.avx2 qrap passt-repair > > > > > else > > > > > -BIN :=3D passt pasta qrap > > > > > +BIN :=3D passt pasta qrap passt-repair > > > > > endif > > > > > =20 > > > > > all: $(BIN) $(MANPAGES) docs > > > > > @@ -101,6 +102,9 @@ pasta.avx2 pasta.1 pasta: pasta%: passt% > > > > > qrap: $(QRAP_SRCS) passt.h > > > > > $(CC) $(FLAGS) $(CFLAGS) $(CPPFLAGS) -DARCH=3D\"$(TARGET_ARCH)\= " $(QRAP_SRCS) -o qrap $(LDFLAGS) > > > > > =20 > > > > > +passt-repair: $(PASST_REPAIR_SRCS) > > > > > + $(CC) $(FLAGS) $(CFLAGS) $(CPPFLAGS) $(PASST_REPAIR_SRCS) -o pa= sst-repair $(LDFLAGS) > > > > > + > > > > > valgrind: EXTRA_SYSCALLS +=3D rt_sigprocmask rt_sigtimedwait rt_= sigaction \ > > > > > rt_sigreturn getpid gettid kill clock_gettime mmap \ > > > > > mmap2 munmap open unlink gettimeofday futex statx \ > > > > > diff --git a/passt-repair.c b/passt-repair.c > > > > > new file mode 100644 > > > > > index 0000000..e9b9609 > > > > > --- /dev/null > > > > > +++ b/passt-repair.c > > > > > @@ -0,0 +1,111 @@ > > > > > +// SPDX-License-Identifier: GPL-2.0-or-later > > > > > + > > > > > +/* PASST - Plug A Simple Socket Transport > > > > > + * for qemu/UNIX domain socket mode > > > > > + * > > > > > + * passt-repair.c - Privileged helper to set/clear TCP_REPAIR on= sockets > > > > > + * > > > > > + * Copyright (c) 2025 Red Hat GmbH > > > > > + * Author: Stefano Brivio > > > > > + * > > > > > + * Connect to passt via UNIX domain socket, receive sockets via = SCM_RIGHTS along > > > > > + * with commands mapping to TCP_REPAIR values, and switch repair= mode on or > > > > > + * off. Reply by echoing the command. Exit if the command is INT= _MAX. > > > > > + */ > > > > > + > > > > > +#include > > > > > +#include > > > > > +#include > > > > > +#include > > > > > +#include > > > > > +#include > > > > > +#include > > > > > +#include > > > > > +#include > > > > > +#include > > > > > + > > > > > +#include > > > > > + > > > > > +#define SCM_MAX_FD 253 /* From Linux kernel (include/net/scm.h),= not in UAPI */ > > > > > + > > > > > +int main(int argc, char **argv) > > > > > +{ > > > > > + char buf[CMSG_SPACE(sizeof(int) * SCM_MAX_FD)] > > > > > + __attribute__ ((aligned(__alignof__(struct cmsghdr)))); > > > > > + struct sockaddr_un a =3D { AF_UNIX, "" }; > > > > > + int cmd, fds[SCM_MAX_FD], s, ret, i; > > > > > + struct cmsghdr *cmsg; > > > > > + struct msghdr msg; > > > > > + struct iovec iov; > > > > > + > > > > > + iov =3D (struct iovec){ &cmd, sizeof(cmd) }; =20 > > > >=20 > > > > I mean, local to local, it's *probably* fine, but still a network > > > > protocol not defined in terms of explicit width fields makes me > > > > nervous. I'd prefer to see the cmd being a packed structure with > > > > fixed width elements. =20 > > >=20 > > > It actually is, because: > > >=20 > > > struct { > > > int cmd; > > > }; > > >=20 > > > is a packet structure with fixed width elements. Any architecture we > > > build for (at least the ones I'm aware of) has a 32-bit int. We can > > > make it uint32_t if it makes you feel better. =20 > >=20 > > Sorry, I should have said "*explicitly* fixed width fields". So, yes, > > uint32_t would make me feel better :) >=20 > Changed to int8_t anyway meanwhile. We don't need all those bits. Works or me. > > > > I also think we should do some sort of basic magic / version exchan= ge. > > > > I don't see any reason we'd need to extend the protocol, but I'd > > > > rather have the option if we have to. =20 > > >=20 > > > passt-repair will be packaged and distributed together with passt, > > > though. Versions must match. =20 > >=20 > > But nothing enforces that. >=20 > Distribution packages. If I run claws-mail with the wrong version of, > say, libpixman, it won't start. If you don't use them, you're on your > own. But shared libraries *do* have versioning checks: there are defined compatibility semantics for sonames, and there can be symbol versions as well. > > AIUI KubeVirt will be running passt-repair > > in a different context. Which means it may well be deployed by a > > different path than the passt binary >=20 > No, that's not the way it works. It needs to match, in the sense that > 1. it's a KubeVirt requirement to have compatible packages between > distribution and the "base container image" and 2. this would most > likely be sourced from the "base container image" anyway. >=20 > I maintain the packages for four distributions, plus AppArmor and > SELinux policies upstream and downstream, and I take care of updating > the package in KubeVirt as well, so I guess I have a vague idea of > what's convenient, enforced, burdensome, and so on. >=20 > > which means however we > > distribute it's quite plausible that a downstream screwup could > > mismatch the versions. We should endeavour to have a reasonably > > graceful failure mode for that. >=20 > Regardless of this, I think that *this one* is an interface (I wouldn't > even call it a protocol) that needs to be set in stone, except for > hypothetical (and highly unlikely) UAPI additions which we'll be anyway > able to accommodate for easily. Ok, I can buy that, but it's a contradictory position to "versions must match". > It's a single socket option with three possible values (for 13 years > now), of which we plan to use two. If we want this interface to do > anything else, it should be another interface. >=20 > So there's really no problem with this. >=20 > Besides, the helper runs with CAP_NET_ADMIN (even though CAP_NET_RAW > should ideally suffice), so it needs to be extremely simple and > auditable. Sending and checking a magic number is not a lot of complexity, even in something on this scale. >=20 > > > And latency here might matter more than in > > > the rest of the migration process. =20 > >=20 > > > > Plus checking a magic number > > > > should make things less damaging and more debuggable if you were to > > > > point the repair helper at an entirely unrelated unix socket instead > > > > of passt's repair socket. =20 > > >=20 > > > Maybe, yes, even though I don't really see good chances for that > > > mistake to happen. Feel free to post a proposal, of course. =20 > >=20 > > I disagree on the good chances for a mistake thing. In GSS I saw > > plenty of occasions where things that shouldn't be mismatched were due > > to some packaging or user screwup. And that's before even considering > > the way that KubeVirt deploys its various pieces seems to provide a > > number of opportunities to mess this up. > >=20 > > So, I'll see what I can come up with. I'm fine with requiring > > matching versions if it's actually checked. Maybe a magic derived > > from our git hash, or even our build-id. >=20 > Both would make things significantly less usable, because they would > make different but compatible builds incompatible, and different > implementations rather inconvenient. Ok, so you're definitely now saying versions *don't* need to match. > For example, it might be a practical solution to have a Go > implementation of this in KubeVirt's virt-handler itself, but if it > needs to extract information or strings from the binary, that becomes > impractical. Ok... could we at least add just a magic number then. If we do ever need a new protocol we can change it, otherwise the protocol immutable for now. --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --sWOWKbQQcZgpjwhS Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmeazXMACgkQzQJF27ox 2GflNw//W3IcI5C5BVVJwy9bJn6fs8jTd4wRfDdq9vzwINa5ic/pqQvGumPkhCo+ YBOdn+Hsvmv0461XGR7VTkN5ghC7m8pqZu5BwRRpJPPKCeTsWv04PSPWG5Y9WmnO yvdEz9zp/2VSXz28ovoMoigLlyvBiurhZmH7+K2hz3OwiFMTh5ZZ6cGAa3gWNRqX AQCryphkHGwb9auqn1fDpxbH19CqGrBr9t5L/sbKpdLTvsYr457t1JzRcNEXlpP7 z5WoKOdyAaZ088OFxuWDCYgZw8UMGzSLKImcRpb2FOKMy8wXa2z2xU3MwxTxOeDz g7t1ZRfcchDIrUxyrOM4SqPj/O8CjHETlhxUTzzFu1rqL04jBgAMqlu9UG3mI523 Qn9mljXp6whrXvKCoFiHdToSxDllMOtYazzqRlZMmrD43yaMADo/er4cUGT3h5m2 i263MhqJc/LkmmIXazU1Vmd8qmu1x34QVXXJIdb4wUqug+OebSYsywffMI78Na5p d1TFDx3ddkjspAVy4A//l8NflH2xHsKA37zUqHwGiMdryi0Hvo3vJ5fV53N60EOZ XJgo2e3xAzttse0MuzDHPxwBpDz/gIpt0QjLZPmR6FGkPb0vNGmSYFNuwtR35Ery HAI47oUAOlve6C2gbrmrHTHNHySSpazLGe79erQD1dAG/rxwdqw= =PH+2 -----END PGP SIGNATURE----- --sWOWKbQQcZgpjwhS--