From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202502 header.b=I2KnwVVM; dkim-atps=neutral Received: from mail.ozlabs.org (gandalf.ozlabs.org [150.107.74.76]) by passt.top (Postfix) with ESMTPS id 604E45A061C for ; Mon, 03 Feb 2025 03:17:02 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202502; t=1738549000; bh=MisSfHELhrb9uxdujtbVY76EByolZXSXcsPuBm8vKoY=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=I2KnwVVMWec8p1N7Ur/kVaSMszj5AErBeJg2tkoZzj87aNuYMMD537lydoldrMZWK m8UTB8L5W0BdNSLA6o13eaOZQJhfv45DIxqYhBBke99FUumICaP8Jvkjou1jJXRjQh QgeMBR/rICZNktGJX4INhFHA6JmrQVfw8ih4Buy4O7HFcvfx5F6zUjQIgbnrcjdvKF /pjN34lmb2J4nreT5PtpIE+NI/hodu89KDP9mJoovaHn1kD7lXqGv/W/7PRoco7w/7 C5DL64uO1BqenKNx16/ue7fo+6+Z1UhknC/1TPEJnj06h+lT7O+kr7rWAeipqawLzH A0DIGTPkKcmwA== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4YmVVc4Gzqz4wxm; Mon, 3 Feb 2025 13:16:40 +1100 (AEDT) Date: Mon, 3 Feb 2025 12:46:44 +1100 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH v3 08/20] Introduce passt-repair Message-ID: References: <20250131193953.3034031-1-sbrivio@redhat.com> <20250131193953.3034031-9-sbrivio@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="Pl5d57eSeekhE3hV" Content-Disposition: inline In-Reply-To: <20250131193953.3034031-9-sbrivio@redhat.com> Message-ID-Hash: Z6E65YJEW34M3LGQ5JLBVTSOO44SXUGM X-Message-ID-Hash: Z6E65YJEW34M3LGQ5JLBVTSOO44SXUGM X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top, Laurent Vivier X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --Pl5d57eSeekhE3hV Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Jan 31, 2025 at 08:39:41PM +0100, Stefano Brivio wrote: > A privileged helper to set/clear TCP_REPAIR on sockets on behalf of > passt. Not used yet. >=20 > >From David's patch: add it to .gitignore, like our other executable > targets. >=20 > Co-authored-by: David Gibson I don't think a trivial change like the .gitignore really needs to be commented and credited. > Signed-off-by: Stefano Brivio > --- > .gitignore | 1 + > Makefile | 10 +++-- > passt-repair.c | 117 +++++++++++++++++++++++++++++++++++++++++++++++++ > 3 files changed, 125 insertions(+), 3 deletions(-) > create mode 100644 passt-repair.c >=20 > diff --git a/.gitignore b/.gitignore > index d1c8be9..5824a71 100644 > --- a/.gitignore > +++ b/.gitignore > @@ -3,6 +3,7 @@ > /passt.avx2 > /pasta > /pasta.avx2 > +/passt-repair > /qrap > /pasta.1 > /seccomp.h > diff --git a/Makefile b/Makefile > index 1383875..1b71cb0 100644 > --- a/Makefile > +++ b/Makefile > @@ -42,7 +42,8 @@ PASST_SRCS =3D arch.c arp.c checksum.c conf.c dhcp.c dh= cpv6.c flow.c fwd.c \ > tcp.c tcp_buf.c tcp_splice.c tcp_vu.c udp.c udp_flow.c udp_vu.c util.c \ > vhost_user.c virtio.c vu_common.c > QRAP_SRCS =3D qrap.c > -SRCS =3D $(PASST_SRCS) $(QRAP_SRCS) > +PASST_REPAIR_SRCS =3D passt-repair.c > +SRCS =3D $(PASST_SRCS) $(QRAP_SRCS) $(PASST_REPAIR_SRCS) > =20 > MANPAGES =3D passt.1 pasta.1 qrap.1 > =20 > @@ -72,9 +73,9 @@ mandir ?=3D $(datarootdir)/man > man1dir ?=3D $(mandir)/man1 > =20 > ifeq ($(TARGET_ARCH),x86_64) > -BIN :=3D passt passt.avx2 pasta pasta.avx2 qrap > +BIN :=3D passt passt.avx2 pasta pasta.avx2 qrap passt-repair > else > -BIN :=3D passt pasta qrap > +BIN :=3D passt pasta qrap passt-repair > endif > =20 > all: $(BIN) $(MANPAGES) docs > @@ -101,6 +102,9 @@ pasta.avx2 pasta.1 pasta: pasta%: passt% > qrap: $(QRAP_SRCS) passt.h > $(CC) $(FLAGS) $(CFLAGS) $(CPPFLAGS) -DARCH=3D\"$(TARGET_ARCH)\" $(QRAP= _SRCS) -o qrap $(LDFLAGS) > =20 > +passt-repair: $(PASST_REPAIR_SRCS) > + $(CC) $(FLAGS) $(CFLAGS) $(CPPFLAGS) $(PASST_REPAIR_SRCS) -o passt-repa= ir $(LDFLAGS) > + > valgrind: EXTRA_SYSCALLS +=3D rt_sigprocmask rt_sigtimedwait rt_sigactio= n \ > rt_sigreturn getpid gettid kill clock_gettime mmap \ > mmap2 munmap open unlink gettimeofday futex statx \ > diff --git a/passt-repair.c b/passt-repair.c > new file mode 100644 > index 0000000..988a52c > --- /dev/null > +++ b/passt-repair.c > @@ -0,0 +1,117 @@ > +// SPDX-License-Identifier: GPL-2.0-or-later > + > +/* PASST - Plug A Simple Socket Transport > + * for qemu/UNIX domain socket mode > + * > + * PASTA - Pack A Subtle Tap Abstraction > + * for network namespace/tap device mode > + * > + * passt-repair.c - Privileged helper to set/clear TCP_REPAIR on sockets > + * > + * Copyright (c) 2025 Red Hat GmbH > + * Author: Stefano Brivio > + * > + * Connect to passt via UNIX domain socket, receive sockets via SCM_RIGH= TS along > + * with byte commands mapping to TCP_REPAIR values, and switch repair mo= de on or > + * off. Reply by echoing the command. Exit on EOF. > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include > + > +#define SCM_MAX_FD 253 /* From Linux kernel (include/net/scm.h), not in = UAPI */ > + > +int main(int argc, char **argv) > +{ > + char buf[CMSG_SPACE(sizeof(int) * SCM_MAX_FD)] > + __attribute__ ((aligned(__alignof__(struct cmsghdr)))); > + struct sockaddr_un a =3D { AF_UNIX, "" }; > + int fds[SCM_MAX_FD], s, ret, i, n; > + int8_t cmd =3D INT8_MAX; > + struct cmsghdr *cmsg; > + struct msghdr msg; > + struct iovec iov; > + > + iov =3D (struct iovec){ &cmd, sizeof(cmd) }; > + msg =3D (struct msghdr){ NULL, 0, &iov, 1, buf, sizeof(buf), 0 }; > + cmsg =3D CMSG_FIRSTHDR(&msg); > + > + if (argc !=3D 2) { > + fprintf(stderr, "Usage: %s PATH\n", argv[0]); > + return -1; > + } > + > + ret =3D snprintf(a.sun_path, sizeof(a.sun_path), "%s", argv[1]); > + if (ret <=3D 0 || ret >=3D (int)sizeof(a.sun_path)) { > + fprintf(stderr, "Invalid socket path: %s\n", argv[1]); > + return -1; > + } > + > + if ((s =3D socket(AF_UNIX, SOCK_STREAM, 0)) < 0) { > + perror("Failed to create AF_UNIX socket"); > + return -1; > + } > + > + if (connect(s, (struct sockaddr *)&a, sizeof(a))) { > + fprintf(stderr, "Failed to connect to %s: %s\n", argv[1], > + strerror(errno)); > + return -1; > + } > + > +loop: > + ret =3D recvmsg(s, &msg, 0); > + if (ret < 0) { > + perror("Failed to receive message"); > + return -1; > + } > + > + if (!ret) /* Done */ > + return 0; > + > + if (!cmsg || > + cmsg->cmsg_len < CMSG_LEN(sizeof(int)) || > + cmsg->cmsg_len > CMSG_LEN(sizeof(int) * SCM_MAX_FD) || > + cmsg->cmsg_type !=3D SCM_RIGHTS) > + return -1; > + > + n =3D cmsg->cmsg_len / CMSG_LEN(sizeof(int)); > + memcpy(fds, CMSG_DATA(cmsg), sizeof(int) * n); > + > + if (cmd !=3D TCP_REPAIR_ON && cmd !=3D TCP_REPAIR_OFF && > + cmd !=3D TCP_REPAIR_OFF_NO_WP) { > + fprintf(stderr, "Unsupported command 0x%04x\n", cmd); > + return -1; > + } > + > + for (i =3D 0; i < n; i++) { > + int o =3D cmd; > + > + if (setsockopt(fds[i], SOL_TCP, TCP_REPAIR, &o, sizeof(o))) { > + fprintf(stderr, > + "Setting TCP_REPAIR to %i on socket %i: %s", o, > + fds[i], strerror(errno)); > + return -1; > + } So, I was thinking about this: I think we need to close() the fd, after calling TCP_REPAIR. If we don't, that's essentially an extra reference to the underlying kernel file object. That means: * When we close() the fd in passt, the socket won't actually go away. I think this is probably the cause of the in use ports you encountered. The current approach of exiting after the migrate is causing passt-repair to also exit, freeing up the additional references. * For incoming migrations, there's: when a migrated connection comes to a proper close on the target, the socket will be held open by the extra fd in the target side passt-repair. * At the moment, I don't think we expect more than two migrations for a single passt-repair instance (one in, and one out). But, particularly for the case of multiple failed migration attempts, I don't think we want to count on that. We're essentially leaking fd slots here, so passt-repair could run out of fds. --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --Pl5d57eSeekhE3hV Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmegIAMACgkQzQJF27ox 2GdSLw//SdptTerUaYwKI4vdsu48UmBITsekDOdSchl7yvVbOULDbbZB5X+Rca01 Yyq8DNQpOiTB64S9i1UWWTv8Nhff8okeZiAv9E2aU4+KnlaIqOnYUx+hl8J8MQwp wJzp7y3V8QFGwjnv6d3RuI5uVtIsuIoOEZFFKs+ikv1QujY0YqOi+jp9ugLVftjc RYVBR+uWyfDCE1J3kEQgTu4Qcnf2UivqabGTPCnjVTjuUwUUJT0Umu4vpCj/Nt3P jA/Pq33ihrE9z1DreVx1/Qk3ODctvfkebV/Q4xdolzWdihhgg4ue/Jqd4f1PI0ms Pw0ZTbMjSirPX/tefeILwza16eaHs2eyCpK/13huAnSngbv5Iz+aKTKf3as5uWAe u7lSFBYGWQqr09KDx+c9pGMGS/4cGw5ysivWO6wpmsIOMPRA/+4eoz/AxUYIY/wy Y2ZduxM2i3ys79KUFnQRheC6Kgjbjm11hktoMwFZN8+5bgLXQOfoQVt971T4u5HK aUBhOW0V0XOo2WIi6ft7dNmFMG51nbRGJ0fD/6YbaDWpDpLBTfE/+3j0Eei4n6bj PqBV+dePHwH3RH3YIA9VBzKCOgEUK8ejZYvrvq/9aqF5M3/NJ9ReOBYJ2NZYv5iv MOF9LAVZuD7qsUZ01/8q1SYK3FZT++36E/9T4/0QvS6F6oq1zQc= =+Fdf -----END PGP SIGNATURE----- --Pl5d57eSeekhE3hV--