From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: passt-dev@passt.top, Laurent Vivier <lvivier@redhat.com>
Subject: Re: [PATCH 7/7] Introduce passt-repair
Date: Thu, 30 Jan 2025 18:43:34 +1100 [thread overview]
Message-ID: <Z5stppZzBjzc5Dv_@zatzit> (raw)
In-Reply-To: <20250130055543.1020dec7@elisabeth>
[-- Attachment #1: Type: text/plain, Size: 11939 bytes --]
On Thu, Jan 30, 2025 at 05:55:43AM +0100, Stefano Brivio wrote:
> On Thu, 30 Jan 2025 11:53:08 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
>
> > On Wed, Jan 29, 2025 at 08:04:28AM +0100, Stefano Brivio wrote:
> > > On Wed, 29 Jan 2025 12:29:27 +1100
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > >
> > > > On Tue, Jan 28, 2025 at 07:51:31AM +0100, Stefano Brivio wrote:
> > > > > On Tue, 28 Jan 2025 12:51:59 +1100
> > > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > >
> > > > > > On Tue, Jan 28, 2025 at 12:15:32AM +0100, Stefano Brivio wrote:
> > > > > > > A privileged helper to set/clear TCP_REPAIR on sockets on behalf of
> > > > > > > passt. Not used yet.
> > > > > > >
> > > > > > > Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> > > > > > > ---
> > > > > > > Makefile | 10 +++--
> > > > > > > passt-repair.c | 111 +++++++++++++++++++++++++++++++++++++++++++++++++
> > > > > > > 2 files changed, 118 insertions(+), 3 deletions(-)
> > > > > > > create mode 100644 passt-repair.c
> > > > > > >
> > > > > > > diff --git a/Makefile b/Makefile
> > > > > > > index 1383875..1b71cb0 100644
> > > > > > > --- a/Makefile
> > > > > > > +++ b/Makefile
> > > > > > > @@ -42,7 +42,8 @@ PASST_SRCS = arch.c arp.c checksum.c conf.c dhcp.c dhcpv6.c flow.c fwd.c \
> > > > > > > tcp.c tcp_buf.c tcp_splice.c tcp_vu.c udp.c udp_flow.c udp_vu.c util.c \
> > > > > > > vhost_user.c virtio.c vu_common.c
> > > > > > > QRAP_SRCS = qrap.c
> > > > > > > -SRCS = $(PASST_SRCS) $(QRAP_SRCS)
> > > > > > > +PASST_REPAIR_SRCS = passt-repair.c
> > > > > > > +SRCS = $(PASST_SRCS) $(QRAP_SRCS) $(PASST_REPAIR_SRCS)
> > > > > > >
> > > > > > > MANPAGES = passt.1 pasta.1 qrap.1
> > > > > > >
> > > > > > > @@ -72,9 +73,9 @@ mandir ?= $(datarootdir)/man
> > > > > > > man1dir ?= $(mandir)/man1
> > > > > > >
> > > > > > > ifeq ($(TARGET_ARCH),x86_64)
> > > > > > > -BIN := passt passt.avx2 pasta pasta.avx2 qrap
> > > > > > > +BIN := passt passt.avx2 pasta pasta.avx2 qrap passt-repair
> > > > > > > else
> > > > > > > -BIN := passt pasta qrap
> > > > > > > +BIN := passt pasta qrap passt-repair
> > > > > > > endif
> > > > > > >
> > > > > > > all: $(BIN) $(MANPAGES) docs
> > > > > > > @@ -101,6 +102,9 @@ pasta.avx2 pasta.1 pasta: pasta%: passt%
> > > > > > > qrap: $(QRAP_SRCS) passt.h
> > > > > > > $(CC) $(FLAGS) $(CFLAGS) $(CPPFLAGS) -DARCH=\"$(TARGET_ARCH)\" $(QRAP_SRCS) -o qrap $(LDFLAGS)
> > > > > > >
> > > > > > > +passt-repair: $(PASST_REPAIR_SRCS)
> > > > > > > + $(CC) $(FLAGS) $(CFLAGS) $(CPPFLAGS) $(PASST_REPAIR_SRCS) -o passt-repair $(LDFLAGS)
> > > > > > > +
> > > > > > > valgrind: EXTRA_SYSCALLS += rt_sigprocmask rt_sigtimedwait rt_sigaction \
> > > > > > > rt_sigreturn getpid gettid kill clock_gettime mmap \
> > > > > > > mmap2 munmap open unlink gettimeofday futex statx \
> > > > > > > diff --git a/passt-repair.c b/passt-repair.c
> > > > > > > new file mode 100644
> > > > > > > index 0000000..e9b9609
> > > > > > > --- /dev/null
> > > > > > > +++ b/passt-repair.c
> > > > > > > @@ -0,0 +1,111 @@
> > > > > > > +// SPDX-License-Identifier: GPL-2.0-or-later
> > > > > > > +
> > > > > > > +/* PASST - Plug A Simple Socket Transport
> > > > > > > + * for qemu/UNIX domain socket mode
> > > > > > > + *
> > > > > > > + * passt-repair.c - Privileged helper to set/clear TCP_REPAIR on sockets
> > > > > > > + *
> > > > > > > + * Copyright (c) 2025 Red Hat GmbH
> > > > > > > + * Author: Stefano Brivio <sbrivio@redhat.com>
> > > > > > > + *
> > > > > > > + * Connect to passt via UNIX domain socket, receive sockets via SCM_RIGHTS along
> > > > > > > + * with commands mapping to TCP_REPAIR values, and switch repair mode on or
> > > > > > > + * off. Reply by echoing the command. Exit if the command is INT_MAX.
> > > > > > > + */
> > > > > > > +
> > > > > > > +#include <sys/types.h>
> > > > > > > +#include <sys/socket.h>
> > > > > > > +#include <sys/un.h>
> > > > > > > +#include <errno.h>
> > > > > > > +#include <stdio.h>
> > > > > > > +#include <stdlib.h>
> > > > > > > +#include <string.h>
> > > > > > > +#include <limits.h>
> > > > > > > +#include <unistd.h>
> > > > > > > +#include <netdb.h>
> > > > > > > +
> > > > > > > +#include <netinet/tcp.h>
> > > > > > > +
> > > > > > > +#define SCM_MAX_FD 253 /* From Linux kernel (include/net/scm.h), not in UAPI */
> > > > > > > +
> > > > > > > +int main(int argc, char **argv)
> > > > > > > +{
> > > > > > > + char buf[CMSG_SPACE(sizeof(int) * SCM_MAX_FD)]
> > > > > > > + __attribute__ ((aligned(__alignof__(struct cmsghdr))));
> > > > > > > + struct sockaddr_un a = { AF_UNIX, "" };
> > > > > > > + int cmd, fds[SCM_MAX_FD], s, ret, i;
> > > > > > > + struct cmsghdr *cmsg;
> > > > > > > + struct msghdr msg;
> > > > > > > + struct iovec iov;
> > > > > > > +
> > > > > > > + iov = (struct iovec){ &cmd, sizeof(cmd) };
> > > > > >
> > > > > > I mean, local to local, it's *probably* fine, but still a network
> > > > > > protocol not defined in terms of explicit width fields makes me
> > > > > > nervous. I'd prefer to see the cmd being a packed structure with
> > > > > > fixed width elements.
> > > > >
> > > > > It actually is, because:
> > > > >
> > > > > struct {
> > > > > int cmd;
> > > > > };
> > > > >
> > > > > is a packet structure with fixed width elements. Any architecture we
> > > > > build for (at least the ones I'm aware of) has a 32-bit int. We can
> > > > > make it uint32_t if it makes you feel better.
> > > >
> > > > Sorry, I should have said "*explicitly* fixed width fields". So, yes,
> > > > uint32_t would make me feel better :)
> > >
> > > Changed to int8_t anyway meanwhile. We don't need all those bits.
> >
> > Works or me.
> >
> > > > > > I also think we should do some sort of basic magic / version exchange.
> > > > > > I don't see any reason we'd need to extend the protocol, but I'd
> > > > > > rather have the option if we have to.
> > > > >
> > > > > passt-repair will be packaged and distributed together with passt,
> > > > > though. Versions must match.
> > > >
> > > > But nothing enforces that.
> > >
> > > Distribution packages. If I run claws-mail with the wrong version of,
> > > say, libpixman, it won't start. If you don't use them, you're on your
> > > own.
> >
> > But shared libraries *do* have versioning checks: there are defined
> > compatibility semantics for sonames, and there can be symbol versions
> > as well.
> >
> > > > AIUI KubeVirt will be running passt-repair
> > > > in a different context. Which means it may well be deployed by a
> > > > different path than the passt binary
> > >
> > > No, that's not the way it works. It needs to match, in the sense that
> > > 1. it's a KubeVirt requirement to have compatible packages between
> > > distribution and the "base container image" and 2. this would most
> > > likely be sourced from the "base container image" anyway.
> > >
> > > I maintain the packages for four distributions, plus AppArmor and
> > > SELinux policies upstream and downstream, and I take care of updating
> > > the package in KubeVirt as well, so I guess I have a vague idea of
> > > what's convenient, enforced, burdensome, and so on.
> > >
> > > > which means however we
> > > > distribute it's quite plausible that a downstream screwup could
> > > > mismatch the versions. We should endeavour to have a reasonably
> > > > graceful failure mode for that.
> > >
> > > Regardless of this, I think that *this one* is an interface (I wouldn't
> > > even call it a protocol) that needs to be set in stone, except for
> > > hypothetical (and highly unlikely) UAPI additions which we'll be anyway
> > > able to accommodate for easily.
> >
> > Ok, I can buy that, but it's a contradictory position to "versions
> > must match".
>
> Note: "Regardless of this". It's *another* consideration *on top of
> that*.
>
> 1. Versions (builds) match.
>
> 2. And even if they didn't, it wouldn't be a problem, because this
> interface will not change.
Hm, ok. I'm way less convinced on both of those points. Which means
I'd like to have a clear policy on whether we require versions to
match or not. Which we prioritise affects design choices.
> > > It's a single socket option with three possible values (for 13 years
> > > now), of which we plan to use two. If we want this interface to do
> > > anything else, it should be another interface.
> > >
> > > So there's really no problem with this.
> > >
> > > Besides, the helper runs with CAP_NET_ADMIN (even though CAP_NET_RAW
> > > should ideally suffice), so it needs to be extremely simple and
> > > auditable.
> >
> > Sending and checking a magic number is not a lot of complexity, even
> > in something on this scale.
>
> If you want to have multiple bytes (because I'm forecasting that you
> won't be happy with 255 values), it's substantial complexity in
> comparison to the current implementation.
> > > > > And latency here might matter more than in
> > > > > the rest of the migration process.
> > > >
> > > > > > Plus checking a magic number
> > > > > > should make things less damaging and more debuggable if you were to
> > > > > > point the repair helper at an entirely unrelated unix socket instead
> > > > > > of passt's repair socket.
> > > > >
> > > > > Maybe, yes, even though I don't really see good chances for that
> > > > > mistake to happen. Feel free to post a proposal, of course.
> > > >
> > > > I disagree on the good chances for a mistake thing. In GSS I saw
> > > > plenty of occasions where things that shouldn't be mismatched were due
> > > > to some packaging or user screwup. And that's before even considering
> > > > the way that KubeVirt deploys its various pieces seems to provide a
> > > > number of opportunities to mess this up.
> > > >
> > > > So, I'll see what I can come up with. I'm fine with requiring
> > > > matching versions if it's actually checked. Maybe a magic derived
> > > > from our git hash, or even our build-id.
> > >
> > > Both would make things significantly less usable, because they would
> > > make different but compatible builds incompatible, and different
> > > implementations rather inconvenient.
> >
> > Ok, so you're definitely now saying versions *don't* need to match.
>
> They don't need to, no. They will match, but they don't need to.
>
> > > For example, it might be a practical solution to have a Go
> > > implementation of this in KubeVirt's virt-handler itself, but if it
> > > needs to extract information or strings from the binary, that becomes
> > > impractical.
> >
> > Ok... could we at least add just a magic number then. If we do ever
> > need a new protocol we can change it, otherwise the protocol immutable
> > for now.
>
> Adding a non-byte magic number implies handling short reads and short
> writes, which I think is absolutely unnecessary. Feel free to propose
> an implementation, as usual.
Ok, it's on my list.
> If you are happy with a single byte magic number, then I suppose that,
> given that we're just using three values, we could encode that
> information using a combination of the remaining bits, which has the
> advantage of not needing any specific implementation until it's
> actually needed (never, I suppose), because passt-repair already
> terminates on an unknown command value.
Yeah, as you predicted, I'm not really happy with a 1 byte magic
number.
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2025-01-30 7:44 UTC|newest]
Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-27 23:15 [PATCH 0/7] Draft, incomplete series introducing state migration Stefano Brivio
2025-01-27 23:15 ` [PATCH 1/7] icmp, udp: Pad time_t timestamp to 64-bit to ease " Stefano Brivio
2025-01-28 0:49 ` David Gibson
2025-01-28 6:48 ` Stefano Brivio
2025-01-27 23:15 ` [PATCH 2/7] flow, flow_table: Pad flow table entries to 128 bytes, hash entries to 32 bits Stefano Brivio
2025-01-28 0:50 ` David Gibson
2025-01-27 23:15 ` [PATCH 3/7] tcp_conn: Avoid 7-bit hole in struct tcp_splice_conn Stefano Brivio
2025-01-28 0:53 ` David Gibson
2025-01-28 6:48 ` Stefano Brivio
2025-01-29 1:02 ` David Gibson
2025-01-29 7:33 ` Stefano Brivio
2025-01-30 0:44 ` David Gibson
2025-01-30 4:55 ` Stefano Brivio
2025-01-30 7:27 ` David Gibson
2025-01-27 23:15 ` [PATCH 4/7] flow_table: Use size in extern declaration for flowtab Stefano Brivio
2025-01-27 23:15 ` [PATCH 5/7] util: Add read_remainder() and read_all_buf() Stefano Brivio
2025-01-28 0:59 ` David Gibson
2025-01-28 6:48 ` Stefano Brivio
2025-01-29 1:03 ` David Gibson
2025-01-29 7:33 ` Stefano Brivio
2025-01-30 0:44 ` David Gibson
2025-01-27 23:15 ` [PATCH 6/7] Introduce facilities for guest migration on top of vhost-user infrastructure Stefano Brivio
2025-01-28 1:40 ` David Gibson
2025-01-28 6:50 ` Stefano Brivio
2025-01-29 1:16 ` David Gibson
2025-01-29 7:33 ` Stefano Brivio
2025-01-30 0:48 ` David Gibson
2025-01-30 4:55 ` Stefano Brivio
2025-01-30 7:38 ` David Gibson
2025-01-30 8:32 ` Stefano Brivio
2025-01-30 8:54 ` David Gibson
2025-01-27 23:15 ` [PATCH 7/7] Introduce passt-repair Stefano Brivio
2025-01-27 23:31 ` Stefano Brivio
2025-01-28 1:51 ` David Gibson
2025-01-28 6:51 ` Stefano Brivio
2025-01-29 1:29 ` David Gibson
2025-01-29 7:04 ` Stefano Brivio
2025-01-30 0:53 ` David Gibson
2025-01-30 4:55 ` Stefano Brivio
2025-01-30 7:43 ` David Gibson [this message]
2025-01-30 7:56 ` Stefano Brivio
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z5stppZzBjzc5Dv_@zatzit \
--to=david@gibson.dropbear.id.au \
--cc=lvivier@redhat.com \
--cc=passt-dev@passt.top \
--cc=sbrivio@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).