On Fri, Jan 31, 2025 at 08:39:41PM +0100, Stefano Brivio wrote: > A privileged helper to set/clear TCP_REPAIR on sockets on behalf of > passt. Not used yet. > > >From David's patch: add it to .gitignore, like our other executable > targets. > > Co-authored-by: David Gibson I don't think a trivial change like the .gitignore really needs to be commented and credited. > Signed-off-by: Stefano Brivio > --- > .gitignore | 1 + > Makefile | 10 +++-- > passt-repair.c | 117 +++++++++++++++++++++++++++++++++++++++++++++++++ > 3 files changed, 125 insertions(+), 3 deletions(-) > create mode 100644 passt-repair.c > > diff --git a/.gitignore b/.gitignore > index d1c8be9..5824a71 100644 > --- a/.gitignore > +++ b/.gitignore > @@ -3,6 +3,7 @@ > /passt.avx2 > /pasta > /pasta.avx2 > +/passt-repair > /qrap > /pasta.1 > /seccomp.h > diff --git a/Makefile b/Makefile > index 1383875..1b71cb0 100644 > --- a/Makefile > +++ b/Makefile > @@ -42,7 +42,8 @@ PASST_SRCS = arch.c arp.c checksum.c conf.c dhcp.c dhcpv6.c flow.c fwd.c \ > tcp.c tcp_buf.c tcp_splice.c tcp_vu.c udp.c udp_flow.c udp_vu.c util.c \ > vhost_user.c virtio.c vu_common.c > QRAP_SRCS = qrap.c > -SRCS = $(PASST_SRCS) $(QRAP_SRCS) > +PASST_REPAIR_SRCS = passt-repair.c > +SRCS = $(PASST_SRCS) $(QRAP_SRCS) $(PASST_REPAIR_SRCS) > > MANPAGES = passt.1 pasta.1 qrap.1 > > @@ -72,9 +73,9 @@ mandir ?= $(datarootdir)/man > man1dir ?= $(mandir)/man1 > > ifeq ($(TARGET_ARCH),x86_64) > -BIN := passt passt.avx2 pasta pasta.avx2 qrap > +BIN := passt passt.avx2 pasta pasta.avx2 qrap passt-repair > else > -BIN := passt pasta qrap > +BIN := passt pasta qrap passt-repair > endif > > all: $(BIN) $(MANPAGES) docs > @@ -101,6 +102,9 @@ pasta.avx2 pasta.1 pasta: pasta%: passt% > qrap: $(QRAP_SRCS) passt.h > $(CC) $(FLAGS) $(CFLAGS) $(CPPFLAGS) -DARCH=\"$(TARGET_ARCH)\" $(QRAP_SRCS) -o qrap $(LDFLAGS) > > +passt-repair: $(PASST_REPAIR_SRCS) > + $(CC) $(FLAGS) $(CFLAGS) $(CPPFLAGS) $(PASST_REPAIR_SRCS) -o passt-repair $(LDFLAGS) > + > valgrind: EXTRA_SYSCALLS += rt_sigprocmask rt_sigtimedwait rt_sigaction \ > rt_sigreturn getpid gettid kill clock_gettime mmap \ > mmap2 munmap open unlink gettimeofday futex statx \ > diff --git a/passt-repair.c b/passt-repair.c > new file mode 100644 > index 0000000..988a52c > --- /dev/null > +++ b/passt-repair.c > @@ -0,0 +1,117 @@ > +// SPDX-License-Identifier: GPL-2.0-or-later > + > +/* PASST - Plug A Simple Socket Transport > + * for qemu/UNIX domain socket mode > + * > + * PASTA - Pack A Subtle Tap Abstraction > + * for network namespace/tap device mode > + * > + * passt-repair.c - Privileged helper to set/clear TCP_REPAIR on sockets > + * > + * Copyright (c) 2025 Red Hat GmbH > + * Author: Stefano Brivio > + * > + * Connect to passt via UNIX domain socket, receive sockets via SCM_RIGHTS along > + * with byte commands mapping to TCP_REPAIR values, and switch repair mode on or > + * off. Reply by echoing the command. Exit on EOF. > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include > + > +#define SCM_MAX_FD 253 /* From Linux kernel (include/net/scm.h), not in UAPI */ > + > +int main(int argc, char **argv) > +{ > + char buf[CMSG_SPACE(sizeof(int) * SCM_MAX_FD)] > + __attribute__ ((aligned(__alignof__(struct cmsghdr)))); > + struct sockaddr_un a = { AF_UNIX, "" }; > + int fds[SCM_MAX_FD], s, ret, i, n; > + int8_t cmd = INT8_MAX; > + struct cmsghdr *cmsg; > + struct msghdr msg; > + struct iovec iov; > + > + iov = (struct iovec){ &cmd, sizeof(cmd) }; > + msg = (struct msghdr){ NULL, 0, &iov, 1, buf, sizeof(buf), 0 }; > + cmsg = CMSG_FIRSTHDR(&msg); > + > + if (argc != 2) { > + fprintf(stderr, "Usage: %s PATH\n", argv[0]); > + return -1; > + } > + > + ret = snprintf(a.sun_path, sizeof(a.sun_path), "%s", argv[1]); > + if (ret <= 0 || ret >= (int)sizeof(a.sun_path)) { > + fprintf(stderr, "Invalid socket path: %s\n", argv[1]); > + return -1; > + } > + > + if ((s = socket(AF_UNIX, SOCK_STREAM, 0)) < 0) { > + perror("Failed to create AF_UNIX socket"); > + return -1; > + } > + > + if (connect(s, (struct sockaddr *)&a, sizeof(a))) { > + fprintf(stderr, "Failed to connect to %s: %s\n", argv[1], > + strerror(errno)); > + return -1; > + } > + > +loop: > + ret = recvmsg(s, &msg, 0); > + if (ret < 0) { > + perror("Failed to receive message"); > + return -1; > + } > + > + if (!ret) /* Done */ > + return 0; > + > + if (!cmsg || > + cmsg->cmsg_len < CMSG_LEN(sizeof(int)) || > + cmsg->cmsg_len > CMSG_LEN(sizeof(int) * SCM_MAX_FD) || > + cmsg->cmsg_type != SCM_RIGHTS) > + return -1; > + > + n = cmsg->cmsg_len / CMSG_LEN(sizeof(int)); > + memcpy(fds, CMSG_DATA(cmsg), sizeof(int) * n); > + > + if (cmd != TCP_REPAIR_ON && cmd != TCP_REPAIR_OFF && > + cmd != TCP_REPAIR_OFF_NO_WP) { > + fprintf(stderr, "Unsupported command 0x%04x\n", cmd); > + return -1; > + } > + > + for (i = 0; i < n; i++) { > + int o = cmd; > + > + if (setsockopt(fds[i], SOL_TCP, TCP_REPAIR, &o, sizeof(o))) { > + fprintf(stderr, > + "Setting TCP_REPAIR to %i on socket %i: %s", o, > + fds[i], strerror(errno)); > + return -1; > + } So, I was thinking about this: I think we need to close() the fd, after calling TCP_REPAIR. If we don't, that's essentially an extra reference to the underlying kernel file object. That means: * When we close() the fd in passt, the socket won't actually go away. I think this is probably the cause of the in use ports you encountered. The current approach of exiting after the migrate is causing passt-repair to also exit, freeing up the additional references. * For incoming migrations, there's: when a migrated connection comes to a proper close on the target, the socket will be held open by the extra fd in the target side passt-repair. * At the moment, I don't think we expect more than two migrations for a single passt-repair instance (one in, and one out). But, particularly for the case of multiple failed migration attempts, I don't think we want to count on that. We're essentially leaking fd slots here, so passt-repair could run out of fds. -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson