public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: passt-dev@passt.top, Laurent Vivier <lvivier@redhat.com>
Subject: Re: [PATCH 7/7] Introduce passt-repair
Date: Thu, 30 Jan 2025 18:43:34 +1100	[thread overview]
Message-ID: <Z5stppZzBjzc5Dv_@zatzit> (raw)
In-Reply-To: <20250130055543.1020dec7@elisabeth>

[-- Attachment #1: Type: text/plain, Size: 11939 bytes --]

On Thu, Jan 30, 2025 at 05:55:43AM +0100, Stefano Brivio wrote:
> On Thu, 30 Jan 2025 11:53:08 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Wed, Jan 29, 2025 at 08:04:28AM +0100, Stefano Brivio wrote:
> > > On Wed, 29 Jan 2025 12:29:27 +1100
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > >   
> > > > On Tue, Jan 28, 2025 at 07:51:31AM +0100, Stefano Brivio wrote:  
> > > > > On Tue, 28 Jan 2025 12:51:59 +1100
> > > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > >     
> > > > > > On Tue, Jan 28, 2025 at 12:15:32AM +0100, Stefano Brivio wrote:    
> > > > > > > A privileged helper to set/clear TCP_REPAIR on sockets on behalf of
> > > > > > > passt. Not used yet.
> > > > > > > 
> > > > > > > Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> > > > > > > ---
> > > > > > >  Makefile       |  10 +++--
> > > > > > >  passt-repair.c | 111 +++++++++++++++++++++++++++++++++++++++++++++++++
> > > > > > >  2 files changed, 118 insertions(+), 3 deletions(-)
> > > > > > >  create mode 100644 passt-repair.c
> > > > > > > 
> > > > > > > diff --git a/Makefile b/Makefile
> > > > > > > index 1383875..1b71cb0 100644
> > > > > > > --- a/Makefile
> > > > > > > +++ b/Makefile
> > > > > > > @@ -42,7 +42,8 @@ PASST_SRCS = arch.c arp.c checksum.c conf.c dhcp.c dhcpv6.c flow.c fwd.c \
> > > > > > >  	tcp.c tcp_buf.c tcp_splice.c tcp_vu.c udp.c udp_flow.c udp_vu.c util.c \
> > > > > > >  	vhost_user.c virtio.c vu_common.c
> > > > > > >  QRAP_SRCS = qrap.c
> > > > > > > -SRCS = $(PASST_SRCS) $(QRAP_SRCS)
> > > > > > > +PASST_REPAIR_SRCS = passt-repair.c
> > > > > > > +SRCS = $(PASST_SRCS) $(QRAP_SRCS) $(PASST_REPAIR_SRCS)
> > > > > > >  
> > > > > > >  MANPAGES = passt.1 pasta.1 qrap.1
> > > > > > >  
> > > > > > > @@ -72,9 +73,9 @@ mandir		?= $(datarootdir)/man
> > > > > > >  man1dir		?= $(mandir)/man1
> > > > > > >  
> > > > > > >  ifeq ($(TARGET_ARCH),x86_64)
> > > > > > > -BIN := passt passt.avx2 pasta pasta.avx2 qrap
> > > > > > > +BIN := passt passt.avx2 pasta pasta.avx2 qrap passt-repair
> > > > > > >  else
> > > > > > > -BIN := passt pasta qrap
> > > > > > > +BIN := passt pasta qrap passt-repair
> > > > > > >  endif
> > > > > > >  
> > > > > > >  all: $(BIN) $(MANPAGES) docs
> > > > > > > @@ -101,6 +102,9 @@ pasta.avx2 pasta.1 pasta: pasta%: passt%
> > > > > > >  qrap: $(QRAP_SRCS) passt.h
> > > > > > >  	$(CC) $(FLAGS) $(CFLAGS) $(CPPFLAGS) -DARCH=\"$(TARGET_ARCH)\" $(QRAP_SRCS) -o qrap $(LDFLAGS)
> > > > > > >  
> > > > > > > +passt-repair: $(PASST_REPAIR_SRCS)
> > > > > > > +	$(CC) $(FLAGS) $(CFLAGS) $(CPPFLAGS) $(PASST_REPAIR_SRCS) -o passt-repair $(LDFLAGS)
> > > > > > > +
> > > > > > >  valgrind: EXTRA_SYSCALLS += rt_sigprocmask rt_sigtimedwait rt_sigaction	\
> > > > > > >  			    rt_sigreturn getpid gettid kill clock_gettime mmap \
> > > > > > >  			    mmap2 munmap open unlink gettimeofday futex statx \
> > > > > > > diff --git a/passt-repair.c b/passt-repair.c
> > > > > > > new file mode 100644
> > > > > > > index 0000000..e9b9609
> > > > > > > --- /dev/null
> > > > > > > +++ b/passt-repair.c
> > > > > > > @@ -0,0 +1,111 @@
> > > > > > > +// SPDX-License-Identifier: GPL-2.0-or-later
> > > > > > > +
> > > > > > > +/* PASST - Plug A Simple Socket Transport
> > > > > > > + *  for qemu/UNIX domain socket mode
> > > > > > > + *
> > > > > > > + * passt-repair.c - Privileged helper to set/clear TCP_REPAIR on sockets
> > > > > > > + *
> > > > > > > + * Copyright (c) 2025 Red Hat GmbH
> > > > > > > + * Author: Stefano Brivio <sbrivio@redhat.com>
> > > > > > > + *
> > > > > > > + * Connect to passt via UNIX domain socket, receive sockets via SCM_RIGHTS along
> > > > > > > + * with commands mapping to TCP_REPAIR values, and switch repair mode on or
> > > > > > > + * off. Reply by echoing the command. Exit if the command is INT_MAX.
> > > > > > > + */
> > > > > > > +
> > > > > > > +#include <sys/types.h>
> > > > > > > +#include <sys/socket.h>
> > > > > > > +#include <sys/un.h>
> > > > > > > +#include <errno.h>
> > > > > > > +#include <stdio.h>
> > > > > > > +#include <stdlib.h>
> > > > > > > +#include <string.h>
> > > > > > > +#include <limits.h>
> > > > > > > +#include <unistd.h>
> > > > > > > +#include <netdb.h>
> > > > > > > +
> > > > > > > +#include <netinet/tcp.h>
> > > > > > > +
> > > > > > > +#define SCM_MAX_FD 253 /* From Linux kernel (include/net/scm.h), not in UAPI */
> > > > > > > +
> > > > > > > +int main(int argc, char **argv)
> > > > > > > +{
> > > > > > > +	char buf[CMSG_SPACE(sizeof(int) * SCM_MAX_FD)]
> > > > > > > +	     __attribute__ ((aligned(__alignof__(struct cmsghdr))));
> > > > > > > +	struct sockaddr_un a = { AF_UNIX, "" };
> > > > > > > +	int cmd, fds[SCM_MAX_FD], s, ret, i;
> > > > > > > +	struct cmsghdr *cmsg;
> > > > > > > +	struct msghdr msg;
> > > > > > > +	struct iovec iov;
> > > > > > > +
> > > > > > > +	iov = (struct iovec){ &cmd, sizeof(cmd) };      
> > > > > > 
> > > > > > I mean, local to local, it's *probably* fine, but still a network
> > > > > > protocol not defined in terms of explicit width fields makes me
> > > > > > nervous.  I'd prefer to see the cmd being a packed structure with
> > > > > > fixed width elements.    
> > > > > 
> > > > > It actually is, because:
> > > > > 
> > > > > 	struct {
> > > > > 		int cmd;
> > > > > 	};
> > > > > 
> > > > > is a packet structure with fixed width elements. Any architecture we
> > > > > build for (at least the ones I'm aware of) has a 32-bit int. We can
> > > > > make it uint32_t if it makes you feel better.    
> > > > 
> > > > Sorry, I should have said "*explicitly* fixed width fields".  So, yes,
> > > > uint32_t would make me feel better :)  
> > > 
> > > Changed to int8_t anyway meanwhile. We don't need all those bits.  
> > 
> > Works or me.
> > 
> > > > > > I also think we should do some sort of basic magic / version exchange.
> > > > > > I don't see any reason we'd need to extend the protocol, but I'd
> > > > > > rather have the option if we have to.    
> > > > > 
> > > > > passt-repair will be packaged and distributed together with passt,
> > > > > though. Versions must match.    
> > > > 
> > > > But nothing enforces that.  
> > > 
> > > Distribution packages. If I run claws-mail with the wrong version of,
> > > say, libpixman, it won't start. If you don't use them, you're on your
> > > own.  
> > 
> > But shared libraries *do* have versioning checks: there are defined
> > compatibility semantics for sonames, and there can be symbol versions
> > as well.
> > 
> > > > AIUI KubeVirt will be running passt-repair
> > > > in a different context.  Which means it may well be deployed by a
> > > > different path than the passt binary  
> > > 
> > > No, that's not the way it works. It needs to match, in the sense that
> > > 1. it's a KubeVirt requirement to have compatible packages between
> > > distribution and the "base container image" and 2. this would most
> > > likely be sourced from the "base container image" anyway.
> > > 
> > > I maintain the packages for four distributions, plus AppArmor and
> > > SELinux policies upstream and downstream, and I take care of updating
> > > the package in KubeVirt as well, so I guess I have a vague idea of
> > > what's convenient, enforced, burdensome, and so on.
> > >   
> > > > which means however we
> > > > distribute it's quite plausible that a downstream screwup could
> > > > mismatch the versions.  We should endeavour to have a reasonably
> > > > graceful failure mode for that.  
> > > 
> > > Regardless of this, I think that *this one* is an interface (I wouldn't
> > > even call it a protocol) that needs to be set in stone, except for
> > > hypothetical (and highly unlikely) UAPI additions which we'll be anyway
> > > able to accommodate for easily.  
> > 
> > Ok, I can buy that, but it's a contradictory position to "versions
> > must match".
> 
> Note: "Regardless of this". It's *another* consideration *on top of
> that*.
> 
> 1. Versions (builds) match.
> 
> 2. And even if they didn't, it wouldn't be a problem, because this
>    interface will not change.

Hm, ok.  I'm way less convinced on both of those points.  Which means
I'd like to have a clear policy on whether we require versions to
match or not.  Which we prioritise affects design choices.

> > > It's a single socket option with three possible values (for 13 years
> > > now), of which we plan to use two. If we want this interface to do
> > > anything else, it should be another interface.
> > > 
> > > So there's really no problem with this.
> > > 
> > > Besides, the helper runs with CAP_NET_ADMIN (even though CAP_NET_RAW
> > > should ideally suffice), so it needs to be extremely simple and
> > > auditable.  
> > 
> > Sending and checking a magic number is not a lot of complexity, even
> > in something on this scale.
> 
> If you want to have multiple bytes (because I'm forecasting that you
> won't be happy with 255 values), it's substantial complexity in
> comparison to the current implementation.

> > > > > And latency here might matter more than in
> > > > > the rest of the migration process.    
> > > >   
> > > > > > Plus checking a magic number
> > > > > > should make things less damaging and more debuggable if you were to
> > > > > > point the repair helper at an entirely unrelated unix socket instead
> > > > > > of passt's repair socket.    
> > > > > 
> > > > > Maybe, yes, even though I don't really see good chances for that
> > > > > mistake to happen. Feel free to post a proposal, of course.    
> > > > 
> > > > I disagree on the good chances for a mistake thing.  In GSS I saw
> > > > plenty of occasions where things that shouldn't be mismatched were due
> > > > to some packaging or user screwup.  And that's before even considering
> > > > the way that KubeVirt deploys its various pieces seems to provide a
> > > > number of opportunities to mess this up.
> > > > 
> > > > So, I'll see what I can come up with.  I'm fine with requiring
> > > > matching versions if it's actually checked.  Maybe a magic derived
> > > > from our git hash, or even our build-id.  
> > > 
> > > Both would make things significantly less usable, because they would
> > > make different but compatible builds incompatible, and different
> > > implementations rather inconvenient.  
> > 
> > Ok, so you're definitely now saying versions *don't* need to match.
> 
> They don't need to, no. They will match, but they don't need to.
> 
> > > For example, it might be a practical solution to have a Go
> > > implementation of this in KubeVirt's virt-handler itself, but if it
> > > needs to extract information or strings from the binary, that becomes
> > > impractical.  
> > 
> > Ok... could we at least add just a magic number then.  If we do ever
> > need a new protocol we can change it, otherwise the protocol immutable
> > for now.
> 
> Adding a non-byte magic number implies handling short reads and short
> writes, which I think is absolutely unnecessary. Feel free to propose
> an implementation, as usual.

Ok, it's on my list.

> If you are happy with a single byte magic number, then I suppose that,
> given that we're just using three values, we could encode that
> information using a combination of the remaining bits, which has the
> advantage of not needing any specific implementation until it's
> actually needed (never, I suppose), because passt-repair already
> terminates on an unknown command value.

Yeah, as you predicted, I'm not really happy with a 1 byte magic
number.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2025-01-30  7:44 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-27 23:15 [PATCH 0/7] Draft, incomplete series introducing state migration Stefano Brivio
2025-01-27 23:15 ` [PATCH 1/7] icmp, udp: Pad time_t timestamp to 64-bit to ease " Stefano Brivio
2025-01-28  0:49   ` David Gibson
2025-01-28  6:48     ` Stefano Brivio
2025-01-27 23:15 ` [PATCH 2/7] flow, flow_table: Pad flow table entries to 128 bytes, hash entries to 32 bits Stefano Brivio
2025-01-28  0:50   ` David Gibson
2025-01-27 23:15 ` [PATCH 3/7] tcp_conn: Avoid 7-bit hole in struct tcp_splice_conn Stefano Brivio
2025-01-28  0:53   ` David Gibson
2025-01-28  6:48     ` Stefano Brivio
2025-01-29  1:02       ` David Gibson
2025-01-29  7:33         ` Stefano Brivio
2025-01-30  0:44           ` David Gibson
2025-01-30  4:55             ` Stefano Brivio
2025-01-30  7:27               ` David Gibson
2025-01-27 23:15 ` [PATCH 4/7] flow_table: Use size in extern declaration for flowtab Stefano Brivio
2025-01-27 23:15 ` [PATCH 5/7] util: Add read_remainder() and read_all_buf() Stefano Brivio
2025-01-28  0:59   ` David Gibson
2025-01-28  6:48     ` Stefano Brivio
2025-01-29  1:03       ` David Gibson
2025-01-29  7:33         ` Stefano Brivio
2025-01-30  0:44           ` David Gibson
2025-01-27 23:15 ` [PATCH 6/7] Introduce facilities for guest migration on top of vhost-user infrastructure Stefano Brivio
2025-01-28  1:40   ` David Gibson
2025-01-28  6:50     ` Stefano Brivio
2025-01-29  1:16       ` David Gibson
2025-01-29  7:33         ` Stefano Brivio
2025-01-30  0:48           ` David Gibson
2025-01-30  4:55             ` Stefano Brivio
2025-01-30  7:38               ` David Gibson
2025-01-30  8:32                 ` Stefano Brivio
2025-01-30  8:54                   ` David Gibson
2025-01-27 23:15 ` [PATCH 7/7] Introduce passt-repair Stefano Brivio
2025-01-27 23:31   ` Stefano Brivio
2025-01-28  1:51   ` David Gibson
2025-01-28  6:51     ` Stefano Brivio
2025-01-29  1:29       ` David Gibson
2025-01-29  7:04         ` Stefano Brivio
2025-01-30  0:53           ` David Gibson
2025-01-30  4:55             ` Stefano Brivio
2025-01-30  7:43               ` David Gibson [this message]
2025-01-30  7:56                 ` Stefano Brivio

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z5stppZzBjzc5Dv_@zatzit \
    --to=david@gibson.dropbear.id.au \
    --cc=lvivier@redhat.com \
    --cc=passt-dev@passt.top \
    --cc=sbrivio@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).