From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: passt-dev@passt.top, Laurent Vivier <lvivier@redhat.com>
Subject: Re: [PATCH v3 08/20] Introduce passt-repair
Date: Mon, 3 Feb 2025 12:46:44 +1100 [thread overview]
Message-ID: <Z6AgBAOo0IoGuKUf@zatzit> (raw)
In-Reply-To: <20250131193953.3034031-9-sbrivio@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 6831 bytes --]
On Fri, Jan 31, 2025 at 08:39:41PM +0100, Stefano Brivio wrote:
> A privileged helper to set/clear TCP_REPAIR on sockets on behalf of
> passt. Not used yet.
>
> >From David's patch: add it to .gitignore, like our other executable
> targets.
>
> Co-authored-by: David Gibson <david@gibson.dropbear.id.au>
I don't think a trivial change like the .gitignore really needs to be
commented and credited.
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> ---
> .gitignore | 1 +
> Makefile | 10 +++--
> passt-repair.c | 117 +++++++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 125 insertions(+), 3 deletions(-)
> create mode 100644 passt-repair.c
>
> diff --git a/.gitignore b/.gitignore
> index d1c8be9..5824a71 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -3,6 +3,7 @@
> /passt.avx2
> /pasta
> /pasta.avx2
> +/passt-repair
> /qrap
> /pasta.1
> /seccomp.h
> diff --git a/Makefile b/Makefile
> index 1383875..1b71cb0 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -42,7 +42,8 @@ PASST_SRCS = arch.c arp.c checksum.c conf.c dhcp.c dhcpv6.c flow.c fwd.c \
> tcp.c tcp_buf.c tcp_splice.c tcp_vu.c udp.c udp_flow.c udp_vu.c util.c \
> vhost_user.c virtio.c vu_common.c
> QRAP_SRCS = qrap.c
> -SRCS = $(PASST_SRCS) $(QRAP_SRCS)
> +PASST_REPAIR_SRCS = passt-repair.c
> +SRCS = $(PASST_SRCS) $(QRAP_SRCS) $(PASST_REPAIR_SRCS)
>
> MANPAGES = passt.1 pasta.1 qrap.1
>
> @@ -72,9 +73,9 @@ mandir ?= $(datarootdir)/man
> man1dir ?= $(mandir)/man1
>
> ifeq ($(TARGET_ARCH),x86_64)
> -BIN := passt passt.avx2 pasta pasta.avx2 qrap
> +BIN := passt passt.avx2 pasta pasta.avx2 qrap passt-repair
> else
> -BIN := passt pasta qrap
> +BIN := passt pasta qrap passt-repair
> endif
>
> all: $(BIN) $(MANPAGES) docs
> @@ -101,6 +102,9 @@ pasta.avx2 pasta.1 pasta: pasta%: passt%
> qrap: $(QRAP_SRCS) passt.h
> $(CC) $(FLAGS) $(CFLAGS) $(CPPFLAGS) -DARCH=\"$(TARGET_ARCH)\" $(QRAP_SRCS) -o qrap $(LDFLAGS)
>
> +passt-repair: $(PASST_REPAIR_SRCS)
> + $(CC) $(FLAGS) $(CFLAGS) $(CPPFLAGS) $(PASST_REPAIR_SRCS) -o passt-repair $(LDFLAGS)
> +
> valgrind: EXTRA_SYSCALLS += rt_sigprocmask rt_sigtimedwait rt_sigaction \
> rt_sigreturn getpid gettid kill clock_gettime mmap \
> mmap2 munmap open unlink gettimeofday futex statx \
> diff --git a/passt-repair.c b/passt-repair.c
> new file mode 100644
> index 0000000..988a52c
> --- /dev/null
> +++ b/passt-repair.c
> @@ -0,0 +1,117 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +
> +/* PASST - Plug A Simple Socket Transport
> + * for qemu/UNIX domain socket mode
> + *
> + * PASTA - Pack A Subtle Tap Abstraction
> + * for network namespace/tap device mode
> + *
> + * passt-repair.c - Privileged helper to set/clear TCP_REPAIR on sockets
> + *
> + * Copyright (c) 2025 Red Hat GmbH
> + * Author: Stefano Brivio <sbrivio@redhat.com>
> + *
> + * Connect to passt via UNIX domain socket, receive sockets via SCM_RIGHTS along
> + * with byte commands mapping to TCP_REPAIR values, and switch repair mode on or
> + * off. Reply by echoing the command. Exit on EOF.
> + */
> +
> +#include <sys/types.h>
> +#include <sys/socket.h>
> +#include <sys/un.h>
> +#include <errno.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <limits.h>
> +#include <unistd.h>
> +#include <netdb.h>
> +
> +#include <netinet/tcp.h>
> +
> +#define SCM_MAX_FD 253 /* From Linux kernel (include/net/scm.h), not in UAPI */
> +
> +int main(int argc, char **argv)
> +{
> + char buf[CMSG_SPACE(sizeof(int) * SCM_MAX_FD)]
> + __attribute__ ((aligned(__alignof__(struct cmsghdr))));
> + struct sockaddr_un a = { AF_UNIX, "" };
> + int fds[SCM_MAX_FD], s, ret, i, n;
> + int8_t cmd = INT8_MAX;
> + struct cmsghdr *cmsg;
> + struct msghdr msg;
> + struct iovec iov;
> +
> + iov = (struct iovec){ &cmd, sizeof(cmd) };
> + msg = (struct msghdr){ NULL, 0, &iov, 1, buf, sizeof(buf), 0 };
> + cmsg = CMSG_FIRSTHDR(&msg);
> +
> + if (argc != 2) {
> + fprintf(stderr, "Usage: %s PATH\n", argv[0]);
> + return -1;
> + }
> +
> + ret = snprintf(a.sun_path, sizeof(a.sun_path), "%s", argv[1]);
> + if (ret <= 0 || ret >= (int)sizeof(a.sun_path)) {
> + fprintf(stderr, "Invalid socket path: %s\n", argv[1]);
> + return -1;
> + }
> +
> + if ((s = socket(AF_UNIX, SOCK_STREAM, 0)) < 0) {
> + perror("Failed to create AF_UNIX socket");
> + return -1;
> + }
> +
> + if (connect(s, (struct sockaddr *)&a, sizeof(a))) {
> + fprintf(stderr, "Failed to connect to %s: %s\n", argv[1],
> + strerror(errno));
> + return -1;
> + }
> +
> +loop:
> + ret = recvmsg(s, &msg, 0);
> + if (ret < 0) {
> + perror("Failed to receive message");
> + return -1;
> + }
> +
> + if (!ret) /* Done */
> + return 0;
> +
> + if (!cmsg ||
> + cmsg->cmsg_len < CMSG_LEN(sizeof(int)) ||
> + cmsg->cmsg_len > CMSG_LEN(sizeof(int) * SCM_MAX_FD) ||
> + cmsg->cmsg_type != SCM_RIGHTS)
> + return -1;
> +
> + n = cmsg->cmsg_len / CMSG_LEN(sizeof(int));
> + memcpy(fds, CMSG_DATA(cmsg), sizeof(int) * n);
> +
> + if (cmd != TCP_REPAIR_ON && cmd != TCP_REPAIR_OFF &&
> + cmd != TCP_REPAIR_OFF_NO_WP) {
> + fprintf(stderr, "Unsupported command 0x%04x\n", cmd);
> + return -1;
> + }
> +
> + for (i = 0; i < n; i++) {
> + int o = cmd;
> +
> + if (setsockopt(fds[i], SOL_TCP, TCP_REPAIR, &o, sizeof(o))) {
> + fprintf(stderr,
> + "Setting TCP_REPAIR to %i on socket %i: %s", o,
> + fds[i], strerror(errno));
> + return -1;
> + }
So, I was thinking about this: I think we need to close() the fd,
after calling TCP_REPAIR. If we don't, that's essentially an extra
reference to the underlying kernel file object. That means:
* When we close() the fd in passt, the socket won't actually go away.
I think this is probably the cause of the in use ports you
encountered. The current approach of exiting after the migrate is
causing passt-repair to also exit, freeing up the additional
references.
* For incoming migrations, there's: when a migrated
connection comes to a proper close on the target, the socket will
be held open by the extra fd in the target side passt-repair.
* At the moment, I don't think we expect more than two migrations for
a single passt-repair instance (one in, and one out). But,
particularly for the case of multiple failed migration attempts, I
don't think we want to count on that. We're essentially leaking fd
slots here, so passt-repair could run out of fds.
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2025-02-03 2:17 UTC|newest]
Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-31 19:39 [PATCH v3 00/20] Draft, incomplete series introducing state migration Stefano Brivio
2025-01-31 19:39 ` [PATCH v3 01/20] tcp: Always pass NULL event with EPOLL_CTL_DEL Stefano Brivio
2025-01-31 19:39 ` [PATCH v3 02/20] util: Rename and make global vu_remove_watch() Stefano Brivio
2025-01-31 19:39 ` [PATCH v3 03/20] icmp, udp: Pad time_t timestamp to 64-bit to ease state migration Stefano Brivio
2025-01-31 19:39 ` [PATCH v3 04/20] flow, flow_table: Pad flow table entries to 128 bytes, hash entries to 32 bits Stefano Brivio
2025-01-31 19:39 ` [PATCH v3 05/20] flow_table: Use size in extern declaration for flowtab Stefano Brivio
2025-01-31 19:39 ` [PATCH v3 06/20] util: Add read_remainder() and read_all_buf() Stefano Brivio
2025-01-31 19:39 ` [PATCH v3 07/20] Introduce facilities for guest migration on top of vhost-user infrastructure Stefano Brivio
2025-01-31 19:39 ` [PATCH v3 08/20] Introduce passt-repair Stefano Brivio
2025-02-03 1:46 ` David Gibson [this message]
2025-01-31 19:39 ` [PATCH v3 09/20] Add interfaces and configuration bits for passt-repair Stefano Brivio
2025-02-03 5:22 ` David Gibson
2025-01-31 19:39 ` [PATCH v3 10/20] flow, tcp: Basic pre-migration source handler to dump sequence numbers Stefano Brivio
2025-01-31 19:39 ` [PATCH v3 11/20] migrate: vu_migrate_{source,target}() aren't actually vu speciic Stefano Brivio
2025-01-31 19:39 ` [PATCH v3 12/20] migrate: Move repair_sock_init() to vu_init() Stefano Brivio
2025-01-31 19:39 ` [PATCH v3 13/20] migrate: Make more handling common rather than vhost-user specific Stefano Brivio
2025-01-31 19:39 ` [PATCH v3 14/20] migrate: Don't handle the migration channel through epoll Stefano Brivio
2025-02-03 1:50 ` David Gibson
2025-02-03 5:38 ` Stefano Brivio
2025-02-03 8:45 ` David Gibson
2025-02-03 2:16 ` David Gibson
2025-01-31 19:39 ` [PATCH v3 15/20] flow, flow_table: Export declaration of hash table Stefano Brivio
2025-01-31 19:39 ` [PATCH v3 16/20] vhost_user: Turn vhost-user message reports to trace() Stefano Brivio
2025-02-03 3:11 ` David Gibson
2025-02-03 6:10 ` Stefano Brivio
2025-02-03 8:47 ` David Gibson
2025-01-31 19:39 ` [PATCH v3 17/20] vhost_user: Make source quit after reporting migration state Stefano Brivio
2025-02-03 1:55 ` David Gibson
2025-02-03 6:09 ` Stefano Brivio
2025-02-03 8:52 ` David Gibson
2025-02-03 9:44 ` Stefano Brivio
2025-01-31 19:39 ` [PATCH v3 18/20] tcp: Get our socket port using getsockname() when connecting from guest Stefano Brivio
2025-02-03 2:05 ` David Gibson
2025-02-03 6:09 ` Stefano Brivio
2025-02-03 8:59 ` David Gibson
2025-02-03 9:45 ` Stefano Brivio
2025-01-31 19:39 ` [PATCH v3 19/20] tcp: Add HOSTSIDE(x), HOSTFLOW(x) macros Stefano Brivio
2025-02-03 2:06 ` David Gibson
2025-01-31 19:39 ` [PATCH v3 20/20] Implement target side of migration Stefano Brivio
2025-02-01 7:45 ` [PATCH v3 00/20] Draft, incomplete series introducing state migration Stefano Brivio
2025-02-03 2:18 ` David Gibson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z6AgBAOo0IoGuKUf@zatzit \
--to=david@gibson.dropbear.id.au \
--cc=lvivier@redhat.com \
--cc=passt-dev@passt.top \
--cc=sbrivio@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).