public inbox for passt-dev@passt.top
From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: passt-dev@passt.top, Laurent Vivier <lvivier@redhat.com>
Subject: Re: [PATCH v3 08/20] Introduce passt-repair
Date: Mon, 3 Feb 2025 12:46:44 +1100
Message-ID: <Z6AgBAOo0IoGuKUf@zatzit>
In-Reply-To: <20250131193953.3034031-9-sbrivio@redhat.com>

On Fri, Jan 31, 2025 at 08:39:41PM +0100, Stefano Brivio wrote:
> A privileged helper to set/clear TCP_REPAIR on sockets on behalf of
> passt. Not used yet.
> 
> From David's patch: add it to .gitignore, like our other executable
> targets.
> 
> Co-authored-by: David Gibson <david@gibson.dropbear.id.au>

I don't think a trivial change like the .gitignore really needs to be
commented and credited.

> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> ---
>  .gitignore     |   1 +
>  Makefile       |  10 +++--
>  passt-repair.c | 117 +++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 125 insertions(+), 3 deletions(-)
>  create mode 100644 passt-repair.c
> 
> diff --git a/.gitignore b/.gitignore
> index d1c8be9..5824a71 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -3,6 +3,7 @@
>  /passt.avx2
>  /pasta
>  /pasta.avx2
> +/passt-repair
>  /qrap
>  /pasta.1
>  /seccomp.h
> diff --git a/Makefile b/Makefile
> index 1383875..1b71cb0 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -42,7 +42,8 @@ PASST_SRCS = arch.c arp.c checksum.c conf.c dhcp.c dhcpv6.c flow.c fwd.c \
>  	tcp.c tcp_buf.c tcp_splice.c tcp_vu.c udp.c udp_flow.c udp_vu.c util.c \
>  	vhost_user.c virtio.c vu_common.c
>  QRAP_SRCS = qrap.c
> -SRCS = $(PASST_SRCS) $(QRAP_SRCS)
> +PASST_REPAIR_SRCS = passt-repair.c
> +SRCS = $(PASST_SRCS) $(QRAP_SRCS) $(PASST_REPAIR_SRCS)
>  
>  MANPAGES = passt.1 pasta.1 qrap.1
>  
> @@ -72,9 +73,9 @@ mandir		?= $(datarootdir)/man
>  man1dir		?= $(mandir)/man1
>  
>  ifeq ($(TARGET_ARCH),x86_64)
> -BIN := passt passt.avx2 pasta pasta.avx2 qrap
> +BIN := passt passt.avx2 pasta pasta.avx2 qrap passt-repair
>  else
> -BIN := passt pasta qrap
> +BIN := passt pasta qrap passt-repair
>  endif
>  
>  all: $(BIN) $(MANPAGES) docs
> @@ -101,6 +102,9 @@ pasta.avx2 pasta.1 pasta: pasta%: passt%
>  qrap: $(QRAP_SRCS) passt.h
>  	$(CC) $(FLAGS) $(CFLAGS) $(CPPFLAGS) -DARCH=\"$(TARGET_ARCH)\" $(QRAP_SRCS) -o qrap $(LDFLAGS)
>  
> +passt-repair: $(PASST_REPAIR_SRCS)
> +	$(CC) $(FLAGS) $(CFLAGS) $(CPPFLAGS) $(PASST_REPAIR_SRCS) -o passt-repair $(LDFLAGS)
> +
>  valgrind: EXTRA_SYSCALLS += rt_sigprocmask rt_sigtimedwait rt_sigaction	\
>  			    rt_sigreturn getpid gettid kill clock_gettime mmap \
>  			    mmap2 munmap open unlink gettimeofday futex statx \
> diff --git a/passt-repair.c b/passt-repair.c
> new file mode 100644
> index 0000000..988a52c
> --- /dev/null
> +++ b/passt-repair.c
> @@ -0,0 +1,117 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +
> +/* PASST - Plug A Simple Socket Transport
> + *  for qemu/UNIX domain socket mode
> + *
> + * PASTA - Pack A Subtle Tap Abstraction
> + *  for network namespace/tap device mode
> + *
> + * passt-repair.c - Privileged helper to set/clear TCP_REPAIR on sockets
> + *
> + * Copyright (c) 2025 Red Hat GmbH
> + * Author: Stefano Brivio <sbrivio@redhat.com>
> + *
> + * Connect to passt via UNIX domain socket, receive sockets via SCM_RIGHTS along
> + * with byte commands mapping to TCP_REPAIR values, and switch repair mode on or
> + * off. Reply by echoing the command. Exit on EOF.
> + */
> +
> +#include <sys/types.h>
> +#include <sys/socket.h>
> +#include <sys/un.h>
> +#include <errno.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <limits.h>
> +#include <unistd.h>
> +#include <netdb.h>
> +
> +#include <netinet/tcp.h>
> +
> +#define SCM_MAX_FD 253 /* From Linux kernel (include/net/scm.h), not in UAPI */
> +
> +int main(int argc, char **argv)
> +{
> +	char buf[CMSG_SPACE(sizeof(int) * SCM_MAX_FD)]
> +	     __attribute__ ((aligned(__alignof__(struct cmsghdr))));
> +	struct sockaddr_un a = { AF_UNIX, "" };
> +	int fds[SCM_MAX_FD], s, ret, i, n;
> +	int8_t cmd = INT8_MAX;
> +	struct cmsghdr *cmsg;
> +	struct msghdr msg;
> +	struct iovec iov;
> +
> +	iov = (struct iovec){ &cmd, sizeof(cmd) };
> +	msg = (struct msghdr){ NULL, 0, &iov, 1, buf, sizeof(buf), 0 };
> +	cmsg = CMSG_FIRSTHDR(&msg);
> +
> +	if (argc != 2) {
> +		fprintf(stderr, "Usage: %s PATH\n", argv[0]);
> +		return -1;
> +	}
> +
> +	ret = snprintf(a.sun_path, sizeof(a.sun_path), "%s", argv[1]);
> +	if (ret <= 0 || ret >= (int)sizeof(a.sun_path)) {
> +		fprintf(stderr, "Invalid socket path: %s\n", argv[1]);
> +		return -1;
> +	}
> +
> +	if ((s = socket(AF_UNIX, SOCK_STREAM, 0)) < 0) {
> +		perror("Failed to create AF_UNIX socket");
> +		return -1;
> +	}
> +
> +	if (connect(s, (struct sockaddr *)&a, sizeof(a))) {
> +		fprintf(stderr, "Failed to connect to %s: %s\n", argv[1],
> +			strerror(errno));
> +		return -1;
> +	}
> +
> +loop:
> +	ret = recvmsg(s, &msg, 0);
> +	if (ret < 0) {
> +		perror("Failed to receive message");
> +		return -1;
> +	}
> +
> +	if (!ret)	/* Done */
> +		return 0;
> +
> +	if (!cmsg ||
> +	    cmsg->cmsg_len < CMSG_LEN(sizeof(int)) ||
> +	    cmsg->cmsg_len > CMSG_LEN(sizeof(int) * SCM_MAX_FD) ||
> +	    cmsg->cmsg_type != SCM_RIGHTS)
> +		return -1;
> +
> +	n = cmsg->cmsg_len / CMSG_LEN(sizeof(int));
> +	memcpy(fds, CMSG_DATA(cmsg), sizeof(int) * n);
> +
> +	if (cmd != TCP_REPAIR_ON && cmd != TCP_REPAIR_OFF &&
> +	    cmd != TCP_REPAIR_OFF_NO_WP) {
> +		fprintf(stderr, "Unsupported command 0x%04x\n", cmd);
> +		return -1;
> +	}
> +
> +	for (i = 0; i < n; i++) {
> +		int o = cmd;
> +
> +		if (setsockopt(fds[i], SOL_TCP, TCP_REPAIR, &o, sizeof(o))) {
> +			fprintf(stderr,
> +				"Setting TCP_REPAIR to %i on socket %i: %s", o,
> +				fds[i], strerror(errno));
> +			return -1;
> +		}

So, I was thinking about this: I think we need to close() the fd
after setting TCP_REPAIR on it.  If we don't, that's essentially an
extra reference to the underlying kernel file object.  That means:

 * When we close() the fd in passt, the socket won't actually go away.
   I think this is probably the cause of the in-use ports you
   encountered.  The current approach of having passt exit after the
   migration causes passt-repair to exit as well, which frees up the
   additional references.

 * For incoming migrations, there's a similar problem: when a migrated
   connection comes to a proper close on the target, the socket will
   be held open by the extra fd in the target-side passt-repair.

 * At the moment, I don't think we expect more than two migrations for
   a single passt-repair instance (one in, and one out).  But,
   particularly for the case of multiple failed migration attempts, I
   don't think we want to count on that.  We're essentially leaking fd
   slots here, so passt-repair could run out of fds.
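
To make the points above concrete, here is a minimal sketch, not taken
from the patch.  It assumes that dup() is a fair stand-in for an fd
received via SCM_RIGHTS, since both give a second reference to the
same open file description.  The names repair_set_and_close(),
try_bind() and extra_ref_demo() are hypothetical, and IPPROTO_TCP is
used where the patch uses SOL_TCP.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: set TCP_REPAIR to 'cmd' on each received socket, then
 * drop our reference whether or not setsockopt() worked.
 * Returns 0 on success, -1 if any setsockopt() failed.
 */
int repair_set_and_close(const int *fds, int n, int cmd)
{
	int i, ret = 0;

	for (i = 0; i < n; i++) {
		if (setsockopt(fds[i], IPPROTO_TCP, TCP_REPAIR,
			       &cmd, sizeof(cmd)))
			ret = -1;
		close(fds[i]);	/* Don't keep the socket alive from here */
	}

	return ret;
}

/* Bind a fresh TCP socket to 127.0.0.1 and 'port' (network byte order).
 * Returns 0 on success, or the errno from socket() or bind().
 */
static int try_bind(in_port_t port)
{
	struct sockaddr_in a = { .sin_family = AF_INET, .sin_port = port,
				 .sin_addr.s_addr = htonl(INADDR_LOOPBACK) };
	int s = socket(AF_INET, SOCK_STREAM, 0), err = 0;

	if (s < 0)
		return errno;
	if (bind(s, (struct sockaddr *)&a, sizeof(a)))
		err = errno;
	close(s);

	return err;
}

/* Show that a second fd keeps a closed socket's port busy.
 * *held:     try_bind() result while the extra fd is still open
 * *released: try_bind() result once the extra fd is closed
 * Returns 0 if the demo could be set up, -1 otherwise.
 */
int extra_ref_demo(int *held, int *released)
{
	struct sockaddr_in a = { .sin_family = AF_INET,
				 .sin_addr.s_addr = htonl(INADDR_LOOPBACK) };
	socklen_t sl = sizeof(a);
	int s, extra;

	if ((s = socket(AF_INET, SOCK_STREAM, 0)) < 0)
		return -1;

	/* Port 0: let the kernel pick a free port, then look it up */
	if (bind(s, (struct sockaddr *)&a, sizeof(a)) || listen(s, 1) ||
	    getsockname(s, (struct sockaddr *)&a, &sl) ||
	    (extra = dup(s)) < 0) {
		close(s);
		return -1;
	}

	close(s);				/* "passt" closes its fd... */
	*held = try_bind(a.sin_port);		/* ...but the port is still taken */

	close(extra);				/* The helper drops its copy */
	*released = try_bind(a.sin_port);

	return 0;
}
```

On Linux I'd expect *held to come back as EADDRINUSE and *released as
0.  Setting TCP_REPAIR itself needs CAP_NET_ADMIN, so without it
repair_set_and_close() returns -1, but it still closes every fd.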

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson