From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: passt.top; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=eY2L7Zdx; dkim-atps=neutral Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by passt.top (Postfix) with ESMTPS id D84045A061E for ; Thu, 30 Jan 2025 05:55:51 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1738212950; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ggPd/8tT6a+RagJr0ma2TDlSwcpTqF67mZVcssD6obs=; b=eY2L7Zdxl4IDyLpk3PxBroKcRSFVWCG2oxuMNvpaettFd1IfXwh7pA3aci+VWfUqFsCfZS ql8kgY5a2lPx54hjcRa5Vz9Pw+mDJiwuzhcBnlAqZNGvM2LmwMUZzeIOirkgOttkonhuWd DpglO1lYYPY+8OOYy5ZtOB4ROjUAs44= Received: from mail-wm1-f70.google.com (mail-wm1-f70.google.com [209.85.128.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-185-AMU9I_oXMJaw-9CBr_SLvA-1; Wed, 29 Jan 2025 23:55:49 -0500 X-MC-Unique: AMU9I_oXMJaw-9CBr_SLvA-1 X-Mimecast-MFC-AGG-ID: AMU9I_oXMJaw-9CBr_SLvA Received: by mail-wm1-f70.google.com with SMTP id 5b1f17b1804b1-436723bf7ffso2453815e9.3 for ; Wed, 29 Jan 2025 20:55:49 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738212947; x=1738817747; h=content-transfer-encoding:mime-version:organization:references :in-reply-to:message-id:subject:cc:to:from:date:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=ggPd/8tT6a+RagJr0ma2TDlSwcpTqF67mZVcssD6obs=; b=ZD7E8MRwLBrtlYTZkF7hBWbSgLNLE6SkVUpE4qWZ2PcePzcA3AUk2lRaqqiGHvNyfl IWoiSwKAY4YJffOyh2f+FzwcILi7272pFJS6OY7ImVpmG2YZMmQXdUp5usHF3aDa+7An uIr6aJ6PmhvfgAlmwaPu1ffMWNOYvsF/o/LgqagplWXolDRCKuel1mkhXIRlKPBSfaaL DAT5/MxeP6GlunVofCzXsaYpWHJbYFk74LHq9nqOSsIE1KJ4hoXelseD8WBoAjW36O8M qs47ZPuJkyeQKQa+F2gvcumYhm/g1tSbrE4ZB+ItoMtms/1q0le6Fm/vNV1JjnPpNvR5 hQeQ== X-Gm-Message-State: AOJu0YyP4fo8OtMn7H/3nBbcDz4mzVldGsBccIpg38VKZCEo+x3uWy+k NBYML0Jn5dbxJmuzVlkfmwoPzxhvZy5f9B5kpq6SMblREZ2NpVxuAHCaCw6HS1Pg3gZ9BMOTw7Z BmRDopqxx0aenxY4AEoT/rS0j1OpzYvZrRY/QTcQTL0o4XmomQTJph0cEPg== X-Gm-Gg: ASbGnctPfYeTHIc1TBqhl6Z5ns19y4ArutWRTX67LoJUEFQgEF2hjzYcjogYHbFN6s+ er/pYMDS2D6VElPsJGAX4vjsPServ255jnQZPsQ5Dc8CsZ6YBgjuFJPMyR7uW8KjDyPn1tdGi3I ZdCoqVlOqQ5Q5wOPhsj6a7UWxNrkJ5U05bmCUEdWQer9Lw680vYkmjOy9T+6AB3WyuCB7AApnFw 1vXHsG+vNq3iPJyP/4Cx+GiwQjOHAeZSjQGFCxz7jV+4G7eJo2tlPTwzLTqDiKbBVeCAtDPBACN 49qyHW2auPBYAhw6 X-Received: by 2002:adf:f943:0:b0:385:f996:1b8e with SMTP id ffacd0b85a97d-38c51951006mr4966194f8f.16.1738212947468; Wed, 29 Jan 2025 20:55:47 -0800 (PST) X-Google-Smtp-Source: AGHT+IHyYLKL9Mx7JjNTec0dJ9bCKpyGTk3whZwVOMz1kIXz4X+//X+vj7P7kjlr8IhpKjAOkukpZA== X-Received: by 2002:adf:f943:0:b0:385:f996:1b8e with SMTP id ffacd0b85a97d-38c51951006mr4966181f8f.16.1738212947037; Wed, 29 Jan 2025 20:55:47 -0800 (PST) Received: from maya.myfinge.rs (ifcgrfdd.trafficplex.cloud. [2a10:fc81:a806:d6a9::1]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-38c5c0ec02bsm783390f8f.13.2025.01.29.20.55.46 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 29 Jan 2025 20:55:46 -0800 (PST) Date: Thu, 30 Jan 2025 05:55:43 +0100 From: Stefano Brivio To: David Gibson Subject: Re: [PATCH 7/7] Introduce passt-repair Message-ID: <20250130055543.1020dec7@elisabeth> In-Reply-To: References: <20250127231532.672363-1-sbrivio@redhat.com> <20250127231532.672363-8-sbrivio@redhat.com> <20250128075131.268f85d1@elisabeth> <20250129080428.472744aa@elisabeth> Organization: Red Hat X-Mailer: Claws Mail 4.2.0 (GTK 3.24.41; x86_64-pc-linux-gnu) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: XBP1ocYY4tP7HZNAdQM8oDslyp0uhEJSP0J-Qt5usog_1738212948 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID-Hash: YZ657FLEG2TXOC4BUOYGQ532D4YWYEGU X-Message-ID-Hash: YZ657FLEG2TXOC4BUOYGQ532D4YWYEGU X-MailFrom: sbrivio@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top, Laurent Vivier X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Thu, 30 Jan 2025 11:53:08 +1100 David Gibson wrote: > On Wed, Jan 29, 2025 at 08:04:28AM +0100, Stefano Brivio wrote: > > On Wed, 29 Jan 2025 12:29:27 +1100 > > David Gibson wrote: > > > > > On Tue, Jan 28, 2025 at 07:51:31AM +0100, Stefano Brivio wrote: > > > > On Tue, 28 Jan 2025 12:51:59 +1100 > > > > David Gibson wrote: > > > > > > > > > On Tue, Jan 28, 2025 at 12:15:32AM +0100, Stefano Brivio wrote: > > > > > > A privileged helper to set/clear TCP_REPAIR on sockets on behalf of > > > > > > passt. Not used yet. > > > > > > > > > > > > Signed-off-by: Stefano Brivio > > > > > > --- > > > > > > Makefile | 10 +++-- > > > > > > passt-repair.c | 111 +++++++++++++++++++++++++++++++++++++++++++++++++ > > > > > > 2 files changed, 118 insertions(+), 3 deletions(-) > > > > > > create mode 100644 passt-repair.c > > > > > > > > > > > > diff --git a/Makefile b/Makefile > > > > > > index 1383875..1b71cb0 100644 > > > > > > --- a/Makefile > > > > > > +++ b/Makefile > > > > > > @@ -42,7 +42,8 @@ PASST_SRCS = arch.c arp.c checksum.c conf.c dhcp.c dhcpv6.c flow.c fwd.c \ > > > > > > tcp.c tcp_buf.c tcp_splice.c tcp_vu.c udp.c udp_flow.c udp_vu.c util.c \ > > > > > > vhost_user.c virtio.c vu_common.c > > > > > > QRAP_SRCS = qrap.c > > > > > > -SRCS = $(PASST_SRCS) $(QRAP_SRCS) > > > > > > +PASST_REPAIR_SRCS = passt-repair.c > > > > > > +SRCS = $(PASST_SRCS) $(QRAP_SRCS) $(PASST_REPAIR_SRCS) > > > > > > > > > > > > MANPAGES = passt.1 pasta.1 qrap.1 > > > > > > > > > > > > @@ -72,9 +73,9 @@ mandir ?= $(datarootdir)/man > > > > > > man1dir ?= $(mandir)/man1 > > > > > > > > > > > > ifeq ($(TARGET_ARCH),x86_64) > > > > > > -BIN := passt passt.avx2 pasta pasta.avx2 qrap > > > > > > +BIN := passt passt.avx2 pasta pasta.avx2 qrap passt-repair > > > > > > else > > > > > > -BIN := passt pasta qrap > > > > > > +BIN := passt pasta qrap passt-repair > > > > > > endif > > > > > > > > > > > > all: $(BIN) $(MANPAGES) docs > > > > > > @@ -101,6 +102,9 @@ pasta.avx2 pasta.1 pasta: pasta%: passt% > > > > > > qrap: $(QRAP_SRCS) passt.h > > > > > > $(CC) $(FLAGS) $(CFLAGS) $(CPPFLAGS) -DARCH=\"$(TARGET_ARCH)\" $(QRAP_SRCS) -o qrap $(LDFLAGS) > > > > > > > > > > > > +passt-repair: $(PASST_REPAIR_SRCS) > > > > > > + $(CC) $(FLAGS) $(CFLAGS) $(CPPFLAGS) $(PASST_REPAIR_SRCS) -o passt-repair $(LDFLAGS) > > > > > > + > > > > > > valgrind: EXTRA_SYSCALLS += rt_sigprocmask rt_sigtimedwait rt_sigaction \ > > > > > > rt_sigreturn getpid gettid kill clock_gettime mmap \ > > > > > > mmap2 munmap open unlink gettimeofday futex statx \ > > > > > > diff --git a/passt-repair.c b/passt-repair.c > > > > > > new file mode 100644 > > > > > > index 0000000..e9b9609 > > > > > > --- /dev/null > > > > > > +++ b/passt-repair.c > > > > > > @@ -0,0 +1,111 @@ > > > > > > +// SPDX-License-Identifier: GPL-2.0-or-later > > > > > > + > > > > > > +/* PASST - Plug A Simple Socket Transport > > > > > > + * for qemu/UNIX domain socket mode > > > > > > + * > > > > > > + * passt-repair.c - Privileged helper to set/clear TCP_REPAIR on sockets > > > > > > + * > > > > > > + * Copyright (c) 2025 Red Hat GmbH > > > > > > + * Author: Stefano Brivio > > > > > > + * > > > > > > + * Connect to passt via UNIX domain socket, receive sockets via SCM_RIGHTS along > > > > > > + * with commands mapping to TCP_REPAIR values, and switch repair mode on or > > > > > > + * off. Reply by echoing the command. Exit if the command is INT_MAX. > > > > > > + */ > > > > > > + > > > > > > +#include > > > > > > +#include > > > > > > +#include > > > > > > +#include > > > > > > +#include > > > > > > +#include > > > > > > +#include > > > > > > +#include > > > > > > +#include > > > > > > +#include > > > > > > + > > > > > > +#include > > > > > > + > > > > > > +#define SCM_MAX_FD 253 /* From Linux kernel (include/net/scm.h), not in UAPI */ > > > > > > + > > > > > > +int main(int argc, char **argv) > > > > > > +{ > > > > > > + char buf[CMSG_SPACE(sizeof(int) * SCM_MAX_FD)] > > > > > > + __attribute__ ((aligned(__alignof__(struct cmsghdr)))); > > > > > > + struct sockaddr_un a = { AF_UNIX, "" }; > > > > > > + int cmd, fds[SCM_MAX_FD], s, ret, i; > > > > > > + struct cmsghdr *cmsg; > > > > > > + struct msghdr msg; > > > > > > + struct iovec iov; > > > > > > + > > > > > > + iov = (struct iovec){ &cmd, sizeof(cmd) }; > > > > > > > > > > I mean, local to local, it's *probably* fine, but still a network > > > > > protocol not defined in terms of explicit width fields makes me > > > > > nervous. I'd prefer to see the cmd being a packed structure with > > > > > fixed width elements. > > > > > > > > It actually is, because: > > > > > > > > struct { > > > > int cmd; > > > > }; > > > > > > > > is a packet structure with fixed width elements. Any architecture we > > > > build for (at least the ones I'm aware of) has a 32-bit int. We can > > > > make it uint32_t if it makes you feel better. > > > > > > Sorry, I should have said "*explicitly* fixed width fields". So, yes, > > > uint32_t would make me feel better :) > > > > Changed to int8_t anyway meanwhile. We don't need all those bits. > > Works or me. > > > > > > I also think we should do some sort of basic magic / version exchange. > > > > > I don't see any reason we'd need to extend the protocol, but I'd > > > > > rather have the option if we have to. > > > > > > > > passt-repair will be packaged and distributed together with passt, > > > > though. Versions must match. > > > > > > But nothing enforces that. > > > > Distribution packages. If I run claws-mail with the wrong version of, > > say, libpixman, it won't start. If you don't use them, you're on your > > own. > > But shared libraries *do* have versioning checks: there are defined > compatibility semantics for sonames, and there can be symbol versions > as well. > > > > AIUI KubeVirt will be running passt-repair > > > in a different context. Which means it may well be deployed by a > > > different path than the passt binary > > > > No, that's not the way it works. It needs to match, in the sense that > > 1. it's a KubeVirt requirement to have compatible packages between > > distribution and the "base container image" and 2. this would most > > likely be sourced from the "base container image" anyway. > > > > I maintain the packages for four distributions, plus AppArmor and > > SELinux policies upstream and downstream, and I take care of updating > > the package in KubeVirt as well, so I guess I have a vague idea of > > what's convenient, enforced, burdensome, and so on. > > > > > which means however we > > > distribute it's quite plausible that a downstream screwup could > > > mismatch the versions. We should endeavour to have a reasonably > > > graceful failure mode for that. > > > > Regardless of this, I think that *this one* is an interface (I wouldn't > > even call it a protocol) that needs to be set in stone, except for > > hypothetical (and highly unlikely) UAPI additions which we'll be anyway > > able to accommodate for easily. > > Ok, I can buy that, but it's a contradictory position to "versions > must match". Note: "Regardless of this". It's *another* consideration *on top of that*. 1. Versions (builds) match. 2. And even if they didn't, it wouldn't be a problem, because this interface will not change. > > It's a single socket option with three possible values (for 13 years > > now), of which we plan to use two. If we want this interface to do > > anything else, it should be another interface. > > > > So there's really no problem with this. > > > > Besides, the helper runs with CAP_NET_ADMIN (even though CAP_NET_RAW > > should ideally suffice), so it needs to be extremely simple and > > auditable. > > Sending and checking a magic number is not a lot of complexity, even > in something on this scale. If you want to have multiple bytes (because I'm forecasting that you won't be happy with 255 values), it's substantial complexity in comparison to the current implementation. > > > > And latency here might matter more than in > > > > the rest of the migration process. > > > > > > > > Plus checking a magic number > > > > > should make things less damaging and more debuggable if you were to > > > > > point the repair helper at an entirely unrelated unix socket instead > > > > > of passt's repair socket. > > > > > > > > Maybe, yes, even though I don't really see good chances for that > > > > mistake to happen. Feel free to post a proposal, of course. > > > > > > I disagree on the good chances for a mistake thing. In GSS I saw > > > plenty of occasions where things that shouldn't be mismatched were due > > > to some packaging or user screwup. And that's before even considering > > > the way that KubeVirt deploys its various pieces seems to provide a > > > number of opportunities to mess this up. > > > > > > So, I'll see what I can come up with. I'm fine with requiring > > > matching versions if it's actually checked. Maybe a magic derived > > > from our git hash, or even our build-id. > > > > Both would make things significantly less usable, because they would > > make different but compatible builds incompatible, and different > > implementations rather inconvenient. > > Ok, so you're definitely now saying versions *don't* need to match. They don't need to, no. They will match, but they don't need to. > > For example, it might be a practical solution to have a Go > > implementation of this in KubeVirt's virt-handler itself, but if it > > needs to extract information or strings from the binary, that becomes > > impractical. > > Ok... could we at least add just a magic number then. If we do ever > need a new protocol we can change it, otherwise the protocol immutable > for now. Adding a non-byte magic number implies handling short reads and short writes, which I think is absolutely unnecessary. Feel free to propose an implementation, as usual. If you are happy with a single byte magic number, then I suppose that, given that we're just using three values, we could encode that information using a combination of the remaining bits, which has the advantage of not needing any specific implementation until it's actually needed (never, I suppose), because passt-repair already terminates on an unknown command value. -- Stefano