From mboxrd@z Thu Jan 1 00:00:00 1970
Authentication-Results: passt.top;
	dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au
Authentication-Results: passt.top;
	dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202412 header.b=np45SRnC;
	dkim-atps=neutral
Received: from mail.ozlabs.org (gandalf.ozlabs.org [150.107.74.76])
	by passt.top (Postfix) with ESMTPS id 751915A061D
	for ; Thu, 30 Jan 2025 02:33:21 +0100 (CET)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=gibson.dropbear.id.au; s=202412; t=1738200790;
	bh=hUjt7ApYcgyDUIpp3Eday9qcJD5i6X0HT8LvOpv+Op4=;
	h=Date:From:To:Cc:Subject:References:In-Reply-To:From;
	b=np45SRnCSrfWzrRVsGoPRKEGkQgP/Dbcvs6V9qpSB7tPq0U4o5nvZH/s6bA2UYrCo
	 xDskLO5g1Uf9KP66pEbVKx01pgFhNc9Iu5KxizRkCbezuiBqeGhy9GB7TZm7jl+4PB
	 dO04yRbCttEEuyiSgJ0TZoFqjA09Ny90a1Y00srWTyjdXBVNyDkYOn8CzlGLJGQz0z
	 23SSR7X594q6koGITvSXryUNvsSsqQQ/PQNmTDBzquVEjVJhyX7guZnwuGVqhfxmh7
	 QVDn3a6+IufNodp1szCfjvi9YknLLExo0ETL5yLpH8qZdzqDH5PXQpIEH78nNFHupE
	 e5Vjto4AsbSMw==
Received: by gandalf.ozlabs.org (Postfix, from userid 1007)
	id 4Yk1kG0YbMz4wb0; Thu, 30 Jan 2025 12:33:10 +1100 (AEDT)
Date: Thu, 30 Jan 2025 12:33:11 +1100
From: David Gibson
To: Stefano Brivio
Subject: Re: [PATCH v2 7/8] Add interfaces and configuration bits for passt-repair
Message-ID:
References: <20250128233940.1235855-1-sbrivio@redhat.com>
	<20250128233940.1235855-8-sbrivio@redhat.com>
	<20250129094610.148de3c6@elisabeth>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha256;
	protocol="application/pgp-signature"; boundary="HrAFDF5V+Z+wJLC8"
Content-Disposition: inline
In-Reply-To: <20250129094610.148de3c6@elisabeth>
Message-ID-Hash: FAMI2MREB5DOZZASAZZ5UJDWVOXDD2JD
X-Message-ID-Hash: FAMI2MREB5DOZZASAZZ5UJDWVOXDD2JD
X-MailFrom: dgibson@gandalf.ozlabs.org
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency;
	loop; banned-address; member-moderation; nonmember-moderation;
	administrivia; implicit-dest; max-recipients; max-size;
	news-moderation; no-subject; digests; suspicious-header
CC: passt-dev@passt.top, Laurent Vivier
X-Mailman-Version: 3.3.8
Precedence: list
List-Id: Development discussion and patches for passt
Archived-At:
Archived-At:
List-Archive:
List-Archive:
List-Help:
List-Owner:
List-Post:
List-Subscribe:
List-Unsubscribe:

--HrAFDF5V+Z+wJLC8
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Wed, Jan 29, 2025 at 09:46:10AM +0100, Stefano Brivio wrote:
> On Wed, 29 Jan 2025 17:09:07 +1100
> David Gibson wrote:
>
> > On Wed, Jan 29, 2025 at 12:39:39AM +0100, Stefano Brivio wrote:
> > > In vhost-user mode, by default, create a second UNIX domain socket
> > > accepting connections from passt-repair, with the usual listener
> > > socket.
> > >
> > > When we need to set or clear TCP_REPAIR on sockets, we'll send them
> > > via SCM_RIGHTS to passt-repair, who sets the socket option values we
> > > ask for.
> > >
> > > To that end, introduce batched functions to request TCP_REPAIR
> > > settings on sockets, so that we don't have to send a single message
> > > for each socket, on migration. When needed, repair_flush() will
> > > send the message and check for the reply.
> > >=20 > > > Signed-off-by: Stefano Brivio > > > --- > > > Makefile | 12 ++-- > > > conf.c | 46 ++++++++++-- > > > epoll_type.h | 4 ++ > > > passt.1 | 11 +++ > > > passt.c | 9 +++ > > > passt.h | 7 ++ > > > repair.c | 192 +++++++++++++++++++++++++++++++++++++++++++++++++= ++ > > > repair.h | 16 +++++ > > > tap.c | 65 +---------------- > > > util.c | 62 +++++++++++++++++ > > > util.h | 1 + > > > 11 files changed, 353 insertions(+), 72 deletions(-) > > > create mode 100644 repair.c > > > create mode 100644 repair.h > > >=20 > > > diff --git a/Makefile b/Makefile > > > index 1b71cb0..f67a20b 100644 > > > --- a/Makefile > > > +++ b/Makefile > > > @@ -38,9 +38,9 @@ FLAGS +=3D -DDUAL_STACK_SOCKETS=3D$(DUAL_STACK_SOCK= ETS) > > > =20 > > > PASST_SRCS =3D arch.c arp.c checksum.c conf.c dhcp.c dhcpv6.c flow.c= fwd.c \ > > > icmp.c igmp.c inany.c iov.c ip.c isolation.c lineread.c log.c mld.c= \ > > > - ndp.c netlink.c migrate.c packet.c passt.c pasta.c pcap.c pif.c tap= =2Ec \ > > > - tcp.c tcp_buf.c tcp_splice.c tcp_vu.c udp.c udp_flow.c udp_vu.c uti= l.c \ > > > - vhost_user.c virtio.c vu_common.c > > > + ndp.c netlink.c migrate.c packet.c passt.c pasta.c pcap.c pif.c \ > > > + repair.c tap.c tcp.c tcp_buf.c tcp_splice.c tcp_vu.c udp.c udp_flow= =2Ec \ > > > + udp_vu.c util.c vhost_user.c virtio.c vu_common.c > > > QRAP_SRCS =3D qrap.c > > > PASST_REPAIR_SRCS =3D passt-repair.c > > > SRCS =3D $(PASST_SRCS) $(QRAP_SRCS) $(PASST_REPAIR_SRCS) > > > @@ -50,9 +50,9 @@ MANPAGES =3D passt.1 pasta.1 qrap.1 > > > PASST_HEADERS =3D arch.h arp.h checksum.h conf.h dhcp.h dhcpv6.h flo= w.h fwd.h \ > > > flow_table.h icmp.h icmp_flow.h inany.h iov.h ip.h isolation.h \ > > > lineread.h log.h migrate.h ndp.h netlink.h packet.h passt.h pasta.h= \ > > > - pcap.h pif.h siphash.h tap.h tcp.h tcp_buf.h tcp_conn.h tcp_interna= l.h \ > > > - tcp_splice.h tcp_vu.h udp.h udp_flow.h udp_internal.h udp_vu.h util= =2Eh \ > > > - vhost_user.h virtio.h vu_common.h > > > + pcap.h pif.h repair.h 
siphash.h tap.h tcp.h tcp_buf.h tcp_conn.h \ > > > + tcp_internal.h tcp_splice.h tcp_vu.h udp.h udp_flow.h udp_internal.= h \ > > > + udp_vu.h util.h vhost_user.h virtio.h vu_common.h > > > HEADERS =3D $(PASST_HEADERS) seccomp.h > > > =20 > > > C :=3D \#include \nint main(){int a=3Dgetrandom(0, 0, = 0);} > > > diff --git a/conf.c b/conf.c > > > index df2b016..85dec44 100644 > > > --- a/conf.c > > > +++ b/conf.c > > > @@ -816,6 +816,9 @@ static void usage(const char *name, FILE *f, int = status) > > > " UNIX domain socket is provided by -s option\n" > > > " --print-capabilities print back-end capabilities in JSON forma= t,\n" > > > " only meaningful for vhost-user mode\n"); > > > + FPRINTF(f, > > > + " --repair-path PATH path for passt-repair(1)\n" =20 > >=20 > > Nit: as a privileged helper, should it be passt-repair(8)? >=20 > So, I spent a couple of minutes on this as I wrote this, and I spent a > bit longer now: the most authoritative definition I can find of section > 8 is from man-pages(7): >=20 > 8 System management commands > Commands like mount(8), many of which only root can > execute. >=20 > It's not really a system management command, and the idea is to run it > with CAP_NET_ADMIN, but not necessarily as root. So I would rather keep > it in section 1, unless there's some other conflicting definition I'm > not aware of. Yeah oh, on that basis (1) makes more sense than (8). > There's also the topic of where it should be installed (/sbin, > /usr/sbin/, /bin, /usr/bin). I'd pick /usr/bin, because /sbin doesn't > really mean much nowadays, and it's anyway fitting with the FHS 3.0: >=20 > Utilities used for system administration (and other root-only > commands) are stored in /sbin, /usr/ sbin, and /usr/local/sbin. > =20 > /sbin contains binaries essential for booting, restoring, recovering, > and/or repairing the system in addition to the binaries in /bin. 
> 18 Programs executed after /usr is known to be mounted (when there > are no problems) are generally placed into /usr/sbin. >=20 > ...and I don't think this helper would qualify for /sbin or /usr/sbin. >=20 > >=20 > > > + " default: append '.repair' to UNIX domain path\n"); > > > } > > > =20 > > > FPRINTF(f, > > > @@ -1240,8 +1243,30 @@ static void conf_nat(const char *arg, struct i= n_addr *addr4, > > > */ > > > static void conf_open_files(struct ctx *c) > > > { > > > - if (c->mode !=3D MODE_PASTA && c->fd_tap =3D=3D -1) > > > - c->fd_tap_listen =3D tap_sock_unix_open(c->sock_path); > > > + if (c->mode !=3D MODE_PASTA && c->fd_tap =3D=3D -1) { > > > + c->fd_tap_listen =3D sock_unix(c->sock_path); > > > + > > > + if (c->mode =3D=3D MODE_VU && strcmp(c->repair_path, "none")) { > > > + if (!strncmp(c->repair_path, "./", 2)) { > > > + memmove(c->repair_path, c->repair_path + 2, > > > + sizeof(c->repair_path) - 2); > > > + } =20 > >=20 > > Do you need this? Shouldn't "./whatever" be usable as-is? >=20 > Ah, yes, I didn't know. I explicitly added this for the '--repair-path > ./none' case, but it's not actually needed. I'll drop this. >=20 > > > + > > > + if (!*c->repair_path && > > > + snprintf_check(c->repair_path, > > > + sizeof(c->repair_path), "%s.repair", > > > + c->sock_path)) { > > > + warn("passt-repair path %s not usable", > > > + c->repair_path); =20 > >=20 > > I'd prefer a die() here - I think omitting a possibly expected > > feature, with just a warning that could easily be lost in the logs is > > not a good idea. >=20 > I was thinking about that, but should we really risk *not* starting > because with ".repair" the path is now too long, for a feature that, > realistically, most users won't actually use? >=20 > If we're started by any kind of framework (which is where we run the > risk of the warning being ignored), then there should be explicit > checks about the path and the usability of passt-repair (or equivalent). Hm, yeah, I guess. 
Honestly I'm not sure which option will cause us less trouble. > > > + c->fd_repair_listen =3D -1; > > > + } else { > > > + c->fd_repair_listen =3D sock_unix(c->repair_path); > > > + } > > > + } else { > > > + c->fd_repair_listen =3D -1; > > > + } > > > + c->fd_repair =3D -1; > > > + } > > > =20 > > > if (*c->pidfile) { > > > c->pidfile_fd =3D output_file_open(c->pidfile, O_WRONLY); > > > @@ -1354,9 +1379,12 @@ void conf(struct ctx *c, int argc, char **argv) > > > {"host-lo-to-ns-lo", no_argument, NULL, 23 }, > > > {"dns-host", required_argument, NULL, 24 }, > > > {"vhost-user", no_argument, NULL, 25 }, > > > + > > > /* vhost-user backend program convention */ > > > {"print-capabilities", no_argument, NULL, 26 }, > > > {"socket-path", required_argument, NULL, 's' }, > > > + > > > + {"repair-path", required_argument, NULL, 27 }, > > > { 0 }, > > > }; > > > const char *logname =3D (c->mode =3D=3D MODE_PASTA) ? "pasta" : "pa= sst"; > > > @@ -1824,8 +1852,8 @@ void conf(struct ctx *c, int argc, char **argv) > > > if (c->ifi4 && IN4_IS_ADDR_UNSPECIFIED(&c->ip4.guest_gw)) > > > c->no_dhcp =3D 1; > > > =20 > > > - /* Inbound port options & DNS can be parsed now (after IPv4/IPv6 > > > - * settings) > > > + /* Inbound port options, DNS, and --repair-path can be parsed now, = after > > > + * IPv4/IPv6 settings and --vhost-user. 
> > > */ > > > fwd_probe_ephemeral(); > > > udp_portmap_clear(); > > > @@ -1871,6 +1899,16 @@ void conf(struct ctx *c, int argc, char **argv) > > > } > > > =20 > > > die("Cannot use DNS address %s", optarg); > > > + } else if (name =3D=3D 27) { > > > + if (c->mode !=3D MODE_VU && strcmp(optarg, "none")) > > > + die("--repair-path is for vhost-user mode only"); > > > + > > > + if (snprintf_check(c->repair_path, > > > + sizeof(c->repair_path), "%s", > > > + optarg)) > > > + die("Invalid passt-repair path: %s", optarg); > > > + > > > + break; > > > } > > > } while (name !=3D -1); > > > =20 > > > diff --git a/epoll_type.h b/epoll_type.h > > > index fd9eac3..706238a 100644 > > > --- a/epoll_type.h > > > +++ b/epoll_type.h > > > @@ -42,6 +42,10 @@ enum epoll_type { > > > EPOLL_TYPE_VHOST_KICK, > > > /* vhost-user migration socket */ > > > EPOLL_TYPE_VHOST_MIGRATION, > > > + /* TCP_REPAIR helper listening socket */ > > > + EPOLL_TYPE_REPAIR_LISTEN, > > > + /* TCP_REPAIR helper socket */ > > > + EPOLL_TYPE_REPAIR, > > > =20 > > > EPOLL_NUM_TYPES, > > > }; > > > diff --git a/passt.1 b/passt.1 > > > index d9cd33e..63a3a01 100644 > > > --- a/passt.1 > > > +++ b/passt.1 > > > @@ -418,6 +418,17 @@ Enable vhost-user. The vhost-user command socket= is provided by \fB--socket\fR. > > > .BR \-\-print-capabilities > > > Print back-end capabilities in JSON format, only meaningful for vhos= t-user mode. > > > =20 > > > +.TP > > > +.BR \-\-repair-path " " \fIpath > > > +Path for UNIX domain socket used by the \fBpasst-repair\fR(1) helper= to connect =20 > >=20 > > passt-repair(8)? >=20 > See above. >=20 > > > +to \fBpasst\fR in order to set or clear the TCP_REPAIR option on soc= kets, during > > > +migration. \fB--repair-path none\fR disables this interface (if you = need to > > > +specify a socket path called "none" you can prefix the path by \fI./= \fR). 
> > > + > > > +Default, for \-\-vhost-user mode only, is to append \fI.repair\fR to= the path > > > +chosen for the hypervisor UNIX domain socket. No socket is created i= f not in > > > +\-\-vhost-user mode. > > > + > > > .TP > > > .BR \-F ", " \-\-fd " " \fIFD > > > Pass a pre-opened, connected socket to \fBpasst\fR. Usually the sock= et is opened > > > diff --git a/passt.c b/passt.c > > > index 184d4e5..1fa2ddd 100644 > > > --- a/passt.c > > > +++ b/passt.c > > > @@ -51,6 +51,7 @@ > > > #include "tcp_splice.h" > > > #include "ndp.h" > > > #include "vu_common.h" > > > +#include "repair.h" > > > =20 > > > #define EPOLL_EVENTS 8 > > > =20 > > > @@ -76,6 +77,8 @@ char *epoll_type_str[] =3D { > > > [EPOLL_TYPE_VHOST_CMD] =3D "vhost-user command socket", > > > [EPOLL_TYPE_VHOST_KICK] =3D "vhost-user kick socket", > > > [EPOLL_TYPE_VHOST_MIGRATION] =3D "vhost-user migration socket", > > > + [EPOLL_TYPE_REPAIR_LISTEN] =3D "TCP_REPAIR helper listening socket", > > > + [EPOLL_TYPE_REPAIR] =3D "TCP_REPAIR helper socket", > > > }; > > > static_assert(ARRAY_SIZE(epoll_type_str) =3D=3D EPOLL_NUM_TYPES, > > > "epoll_type_str[] doesn't match enum epoll_type"); > > > @@ -360,6 +363,12 @@ loop: > > > case EPOLL_TYPE_VHOST_MIGRATION: > > > vu_migrate(&c, eventmask); > > > break; > > > + case EPOLL_TYPE_REPAIR_LISTEN: > > > + repair_listen_handler(&c, eventmask); > > > + break; > > > + case EPOLL_TYPE_REPAIR: > > > + repair_handler(&c, eventmask); > > > + break; > > > default: > > > /* Can't happen */ > > > ASSERT(0); > > > diff --git a/passt.h b/passt.h > > > index 0dd4efa..85b0a10 100644 > > > --- a/passt.h > > > +++ b/passt.h > > > @@ -20,6 +20,7 @@ union epoll_ref; > > > #include "siphash.h" > > > #include "ip.h" > > > #include "inany.h" > > > +#include "migrate.h" > > > #include "flow.h" > > > #include "icmp.h" > > > #include "fwd.h" > > > @@ -193,6 +194,7 @@ struct ip6_ctx { > > > * @foreground: Run in foreground, don't log to stderr by default > > > * @nofile: Maximum number 
of open files (ulimit -n) > > > * @sock_path: Path for UNIX domain socket > > > + * @repair_path: TCP_REPAIR helper path, can be "none", empty for de= fault > > > * @pcap: Path for packet capture file > > > * @pidfile: Path to PID file, empty string if not configured > > > * @pidfile_fd: File descriptor for PID file, -1 if none > > > @@ -203,6 +205,8 @@ struct ip6_ctx { > > > * @epollfd: File descriptor for epoll instance > > > * @fd_tap_listen: File descriptor for listening AF_UNIX socket, if = any > > > * @fd_tap: AF_UNIX socket, tuntap device, or pre-opened socket > > > + * @fd_repair_listen: File descriptor for listening TCP_REPAIR socke= t, if any > > > + * @fd_repair: Connected AF_UNIX socket for TCP_REPAIR helper > > > * @our_tap_mac: Pasta/passt's MAC on the tap link > > > * @guest_mac: MAC address of guest or namespace, seen or configured > > > * @hash_secret: 128-bit secret for siphash functions > > > @@ -244,6 +248,7 @@ struct ctx { > > > int foreground; > > > int nofile; > > > char sock_path[UNIX_PATH_MAX]; > > > + char repair_path[UNIX_PATH_MAX]; > > > char pcap[PATH_MAX]; > > > =20 > > > char pidfile[PATH_MAX]; > > > @@ -260,6 +265,8 @@ struct ctx { > > > int epollfd; > > > int fd_tap_listen; > > > int fd_tap; > > > + int fd_repair_listen; > > > + int fd_repair; > > > unsigned char our_tap_mac[ETH_ALEN]; > > > unsigned char guest_mac[ETH_ALEN]; > > > uint64_t hash_secret[2]; > > > diff --git a/repair.c b/repair.c > > > new file mode 100644 > > > index 0000000..24966f5 > > > --- /dev/null > > > +++ b/repair.c > > > @@ -0,0 +1,192 @@ > > > +// SPDX-License-Identifier: GPL-2.0-or-later > > > + > > > +/* PASST - Plug A Simple Socket Transport > > > + * for qemu/UNIX domain socket mode > > > + * > > > + * PASTA - Pack A Subtle Tap Abstraction > > > + * for network namespace/tap device mode > > > + * > > > + * repair.c - Interface (server) for passt-repair, set/clear TCP_REP= AIR > > > + * > > > + * Copyright (c) 2025 Red Hat GmbH > > > + * Author: Stefano 
Brivio
> > > + */
> > > +
> > > +#include
> > > +#include
> > > +
> > > +#include "util.h"
> > > +#include "ip.h"
> > > +#include "passt.h"
> > > +#include "inany.h"
> > > +#include "flow.h"
> > > +#include "flow_table.h"
> > > +
> > > +#include "repair.h"
> > > +
> > > +#define SCM_MAX_FD 253 /* From Linux kernel (include/net/scm.h), not in UAPI */
> > > +
> > > +static int fds[SCM_MAX_FD];
> >
> > Even 'static', I'd prefer a longer name for a global variable.
>
> A longer name for this also means a longer name for nfds below, which
> is not necessarily practical, but I'll check and try to change these to
> repair_fds[] / repair_nfds if doable.
>
> > > +static int current_cmd;
> >
> > Is there any particular rationale behind these being globals, whereas
> > fd_repair is in struct ctx? AFAICT they're basically equally global
> > in practice.
>
> Yes: c->fd_repair_listen needs to be in struct ctx, and I want to keep
> this consistent with c->fd_tap_listen / c->fd_tap.
>
> Besides, the variables declared here are really hacks representing
> state for this compilation unit only.

Oh, right. Essentially these (static) globals are less global than
ctx, which is truly world global. Ok, that makes sense.
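
To illustrate the naming, here is a rough sketch of the batching state
with longer names, kept static to one file. The repair_fds[] /
repair_nfds names follow the suggestion above; repair_cmd, batch_add(),
batch_flush() and the flush counter are hypothetical, and the flush here
only counts calls instead of sending anything to a helper.

```c
#include <assert.h>

#define SCM_MAX_FD 253	/* same limit as in the patch */

/* State private to this compilation unit: nothing outside the file can
 * name these, unlike the members of struct ctx.
 */
static int repair_fds[SCM_MAX_FD];
static int repair_nfds;
static int repair_cmd;		/* hypothetical name for current_cmd */
static int flush_calls;		/* illustration only: counts flushes */

/* batch_flush() - Stand-in for repair_flush(): just empty the batch */
int batch_flush(void)
{
	if (!repair_nfds)
		return 0;

	flush_calls++;
	repair_nfds = 0;
	return 0;
}

/* batch_add() - Queue socket @s for command @cmd, flushing as needed */
int batch_add(int s, int cmd)
{
	int rc;

	/* A batch carries a single command: flush before switching */
	if (repair_nfds && repair_cmd != cmd) {
		if ((rc = batch_flush()))
			return rc;
	}

	assert(repair_nfds < SCM_MAX_FD);

	repair_cmd = cmd;
	repair_fds[repair_nfds++] = s;

	/* Batch is full: flush now so the next add has room */
	if (repair_nfds >= SCM_MAX_FD) {
		if ((rc = batch_flush()))
			return rc;
	}

	return 0;
}

/* Accessors, only so that the sketch can be checked from outside */
int batch_pending(void) { return repair_nfds; }
int batch_flushes(void) { return flush_calls; }
```

Switching command and filling the batch both trigger a flush here, which
mirrors what repair_set() does in the patch.
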
> > > +static int nfds; > > > + > > > +/** > > > + * repair_sock_init() - Start listening for connections on helper so= cket > > > + * @c: Execution context > > > + */ > > > +void repair_sock_init(const struct ctx *c) > > > +{ > > > + union epoll_ref ref =3D { .type =3D EPOLL_TYPE_REPAIR_LISTEN }; > > > + struct epoll_event ev =3D { 0 }; > > > + > > > + listen(c->fd_repair_listen, 0); > > > + > > > + ref.fd =3D c->fd_repair_listen; > > > + ev.events =3D EPOLLIN | EPOLLHUP | EPOLLET; > > > + ev.data.u64 =3D ref.u64; > > > + epoll_ctl(c->epollfd, EPOLL_CTL_ADD, c->fd_repair_listen, &ev); > > > +} > > > + > > > +/** > > > + * repair_listen_handler() - Handle events on TCP_REPAIR helper list= ening socket > > > + * @c: Execution context > > > + * @events: epoll events > > > + */ > > > +void repair_listen_handler(struct ctx *c, uint32_t events) > > > +{ > > > + union epoll_ref ref =3D { .type =3D EPOLL_TYPE_REPAIR }; > > > + struct epoll_event ev =3D { 0 }; > > > + struct ucred ucred; > > > + socklen_t len; > > > + > > > + if (events !=3D EPOLLIN) { > > > + debug("Spurious event 0x%04x on TCP_REPAIR helper socket", > > > + events); > > > + return; > > > + } > > > + > > > + len =3D sizeof(ucred); > > > + > > > + /* Another client is already connected: accept and close right away= =2E */ =20 > >=20 > > For the repair socket, last-connection-wins would make more sense to > > me than first-connection-wins. While hacking/debugging seems it might > > be useful to fix something in the passt-repair, re-run it and have it > > displace the stale version for existing passt instances. >=20 > In the whole debugging I've done so far I'm actually using passt-repair > (otherwise I don't even start it), so it already terminates once it's > done. >=20 > I think it's more secure to keep it like this, because it's more robust > against races. 
>=20 > Let's say we have an issue in KubeVirt's virt-handler so that the > migration starts a bit before the helper is started, and somebody > manages to use this time window to connect another helper, this attempt > will actually go unnoticed. >=20 > If passt-repair fails to connect, it will be very obvious. Ok, you convinced me. > > > + if (c->fd_repair !=3D -1) { > > > + int discard =3D accept4(c->fd_repair_listen, NULL, NULL, > > > + SOCK_NONBLOCK); > > > + > > > + if (discard =3D=3D -1) > > > + return; > > > + > > > + if (!getsockopt(discard, SOL_SOCKET, SO_PEERCRED, &ucred, &len)) > > > + info("Discarding TCP_REPAIR helper, PID %i", ucred.pid); > > > + > > > + close(discard); > > > + return; > > > + } > > > + > > > + c->fd_repair =3D accept4(c->fd_repair_listen, NULL, NULL, 0); > > > + > > > + if (!getsockopt(c->fd_repair, SOL_SOCKET, SO_PEERCRED, &ucred, &len= )) > > > + info("Accepted TCP_REPAIR helper, PID %i", ucred.pid); > > > + > > > + ref.fd =3D c->fd_repair; > > > + ev.events =3D EPOLLHUP | EPOLLET; > > > + ev.data.u64 =3D ref.u64; > > > + epoll_ctl(c->epollfd, EPOLL_CTL_ADD, c->fd_repair, &ev); > > > +} > > > + > > > +/** > > > + * repair_close() - Close connection to TCP_REPAIR helper > > > + * @c: Execution context > > > + */ > > > +void repair_close(struct ctx *c) > > > +{ > > > + debug("Closing TCP_REPAIR helper socket"); > > > + > > > + epoll_ctl(c->epollfd, EPOLL_CTL_DEL, c->fd_repair, NULL); > > > + close(c->fd_repair); > > > + c->fd_repair =3D -1; > > > +} > > > + > > > +/** > > > + * repair_handler() - Handle EPOLLHUP and EPOLLERR on TCP_REPAIR hel= per socket > > > + * @c: Execution context > > > + * @events: epoll events > > > + */ > > > +void repair_handler(struct ctx *c, uint32_t events) > > > +{ > > > + (void)events; > > > + > > > + repair_close(c); > > > +} > > > + > > > +/** > > > + * repair_flush() - Flush current set of sockets to helper, with cur= rent command > > > + * @c: Execution context > > > + * > > > + * Return: 0 on 
success, negative error code on failure
> > > + */
> > > +int repair_flush(struct ctx *c)
> > > +{
> > > +	struct iovec iov = { &((int8_t){ current_cmd }), sizeof(int8_t) };
> > > +	char buf[CMSG_SPACE(sizeof(int) * SCM_MAX_FD)]
> > > +		__attribute__ ((aligned(__alignof__(struct cmsghdr))));
> > > +	struct cmsghdr *cmsg;
> > > +	struct msghdr msg;
> > > +	int ret = 0;
> > > +
> > > +	if (!nfds)
> > > +		return 0;
> > > +
> > > +	msg = (struct msghdr){ NULL, 0, &iov, 1,
> > > +		buf, CMSG_SPACE(sizeof(int) * nfds), 0 };
> > > +	cmsg = CMSG_FIRSTHDR(&msg);
> > > +
> > > +	cmsg->cmsg_level = SOL_SOCKET;
> > > +	cmsg->cmsg_type = SCM_RIGHTS;
> > > +	cmsg->cmsg_len = CMSG_LEN(sizeof(int) * nfds);
> > > +	memcpy(CMSG_DATA(cmsg), fds, sizeof(int) * nfds);
> > > +
> > > +	nfds = 0;
> > > +
> > > +	if (sendmsg(c->fd_repair, &msg, 0) < 0) {
> > > +		ret = -errno;
> >
> > This error code won't be reported to the caller: you'll continue on to
> > the recv() below, which will return EBADF, clobbering ret.
>
> Oops, right, I'll return early in this case (or equivalent).
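
For illustration, a minimal sketch of the early-return shape, written as
a standalone function so that it compiles on its own. flush_sketch(),
BATCH_MAX and the explicit parameters are made up for the example and
aren't part of the patch; the real repair_flush() would presumably still
want to call repair_close() on both error paths.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define BATCH_MAX 253	/* mirrors SCM_MAX_FD from the patch */

/* flush_sketch() - Hypothetical standalone variant of repair_flush()
 * @sock:	Connected AF_UNIX socket to the helper
 * @cmd:	Command byte to send along with the descriptors
 * @fds:	Descriptors to pass via SCM_RIGHTS
 * @n:		Number of descriptors, between 1 and BATCH_MAX
 *
 * Return: 0 on success, negative error code on failure
 */
int flush_sketch(int sock, int8_t cmd, const int *fds, int n)
{
	char buf[CMSG_SPACE(sizeof(int) * BATCH_MAX)]
		__attribute__ ((aligned(__alignof__(struct cmsghdr))));
	struct iovec iov = { &cmd, sizeof(cmd) };
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;
	int8_t reply;

	assert(n > 0 && n <= BATCH_MAX);

	memset(buf, 0, sizeof(buf));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = buf;
	msg.msg_controllen = CMSG_SPACE(sizeof(int) * n);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int) * n);
	memcpy(CMSG_DATA(cmsg), fds, sizeof(int) * n);

	/* Return right away, so that a failing recv() can't replace this
	 * error code with its own
	 */
	if (sendmsg(sock, &msg, 0) < 0)
		return -errno;

	if (recv(sock, &reply, 1, 0) < 0)
		return -errno;

	return 0;
}
```

With this shape, the caller sees the sendmsg() error itself rather than
whatever recv() reports on a socket that was closed in the meantime.
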
>=20 > > > + err_perror("Failed to send sockets to TCP_REPAIR helper"); > > > + repair_close(c); > > > + } > > > + > > > + if (recv(c->fd_repair, &((int8_t){ 0 }), 1, 0) < 0) { > > > + ret =3D -errno; > > > + err_perror("Failed to receive reply from TCP_REPAIR helper"); > > > + repair_close(c); > > > + } > > > + > > > + return ret; > > > +} > > > + > > > +/** > > > + * repair_flush() - Add socket to TCP_REPAIR set with given command > > > + * @c: Execution context > > > + * @s: Socket to add > > > + * @cmd: TCP_REPAIR_ON, TCP_REPAIR_OFF, or TCP_REPAIR_OFF_NO_WP > > > + * > > > + * Return: 0 on success, negative error code on failure > > > + */ > > > +/* cppcheck-suppress unusedFunction */ > > > +int repair_set(struct ctx *c, int s, int cmd) > > > +{ > > > + int rc; > > > + > > > + if (nfds && current_cmd !=3D cmd) { > > > + if ((rc =3D repair_flush(c))) > > > + return rc; > > > + } > > > + > > > + current_cmd =3D cmd; > > > + fds[nfds++] =3D s; > > > + > > > + if (nfds >=3D SCM_MAX_FD) { > > > + if ((rc =3D repair_flush(c))) > > > + return rc; > > > + } > > > + > > > + return 0; > > > +} > > > diff --git a/repair.h b/repair.h > > > new file mode 100644 > > > index 0000000..693c515 > > > --- /dev/null > > > +++ b/repair.h > > > @@ -0,0 +1,16 @@ > > > +/* SPDX-License-Identifier: GPL-2.0-or-later > > > + * Copyright (c) 2025 Red Hat GmbH > > > + * Author: Stefano Brivio > > > + */ > > > +=20 > > > +#ifndef REPAIR_H > > > +#define REPAIR_H > > > + > > > +void repair_sock_init(const struct ctx *c); > > > +void repair_listen_handler(struct ctx *c, uint32_t events); > > > +void repair_handler(struct ctx *c, uint32_t events); > > > +void repair_close(struct ctx *c); > > > +int repair_flush(struct ctx *c); > > > +int repair_set(struct ctx *c, int s, int cmd); > > > + > > > +#endif /* REPAIR_H */ > > > diff --git a/tap.c b/tap.c > > > index cd32a90..0e60eb4 100644 > > > --- a/tap.c > > > +++ b/tap.c > > > @@ -56,6 +56,7 @@ > > > #include "netlink.h" > > > #include "pasta.h" 
> > > #include "packet.h" > > > +#include "repair.h" > > > #include "tap.h" > > > #include "log.h" > > > #include "vhost_user.h" > > > @@ -1151,68 +1152,6 @@ void tap_handler_pasta(struct ctx *c, uint32_t= events, > > > tap_pasta_input(c, now); > > > } > > > =20 > > > -/** > > > - * tap_sock_unix_open() - Create and bind AF_UNIX socket > > > - * @sock_path: Socket path. If empty, set on return (UNIX_SOCK_PATH = as prefix) > > > - * > > > - * Return: socket descriptor on success, won't return on failure > > > - */ > > > -int tap_sock_unix_open(char *sock_path) > > > -{ > > > - int fd =3D socket(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0); > > > - struct sockaddr_un addr =3D { > > > - .sun_family =3D AF_UNIX, > > > - }; > > > - int i; > > > - > > > - if (fd < 0) > > > - die_perror("Failed to open UNIX domain socket"); > > > - > > > - for (i =3D 1; i < UNIX_SOCK_MAX; i++) { > > > - char *path =3D addr.sun_path; > > > - int ex, ret; > > > - > > > - if (*sock_path) > > > - memcpy(path, sock_path, UNIX_PATH_MAX); > > > - else if (snprintf_check(path, UNIX_PATH_MAX - 1, > > > - UNIX_SOCK_PATH, i)) > > > - die_perror("Can't build UNIX domain socket path"); > > > - > > > - ex =3D socket(AF_UNIX, SOCK_STREAM | SOCK_NONBLOCK | SOCK_CLOEXEC, > > > - 0); > > > - if (ex < 0) > > > - die_perror("Failed to check for UNIX domain conflicts"); > > > - > > > - ret =3D connect(ex, (const struct sockaddr *)&addr, sizeof(addr)); > > > - if (!ret || (errno !=3D ENOENT && errno !=3D ECONNREFUSED && > > > - errno !=3D EACCES)) { > > > - if (*sock_path) > > > - die("Socket path %s already in use", path); > > > - > > > - close(ex); > > > - continue; > > > - } > > > - close(ex); > > > - > > > - unlink(path); > > > - ret =3D bind(fd, (const struct sockaddr *)&addr, sizeof(addr)); > > > - if (*sock_path && ret) > > > - die_perror("Failed to bind UNIX domain socket"); > > > - > > > - if (!ret) > > > - break; > > > - } > > > - > > > - if (i =3D=3D UNIX_SOCK_MAX) > > > - die_perror("Failed to bind UNIX 
domain socket"); > > > - > > > - info("UNIX domain socket bound at %s", addr.sun_path); > > > - if (!*sock_path) > > > - memcpy(sock_path, addr.sun_path, UNIX_PATH_MAX); > > > - > > > - return fd; > > > -} > > > - > > > /** > > > * tap_backend_show_hints() - Give help information to start QEMU > > > * @c: Execution context > > > @@ -1423,6 +1362,8 @@ void tap_backend_init(struct ctx *c) > > > tap_sock_tun_init(c); > > > break; > > > case MODE_VU: > > > + repair_sock_init(c); > > > + /* fall through */ > > > case MODE_PASST: > > > tap_sock_unix_init(c); > > > =20 > > > diff --git a/util.c b/util.c > > > index 36857d4..e98da74 100644 > > > --- a/util.c > > > +++ b/util.c > > > @@ -178,6 +178,68 @@ int sock_l4_sa(const struct ctx *c, enum epoll_t= ype type, > > > return fd; > > > } > > > =20 > > > +/** > > > + * sock_unix() - Create and bind AF_UNIX socket > > > + * @sock_path: Socket path. If empty, set on return (UNIX_SOCK_PATH = as prefix) > > > + * > > > + * Return: socket descriptor on success, won't return on failure > > > + */ =20 > >=20 > > I like making tap_sock_unix_open() more general. Could be split into > > a separate patch. >=20 > I need it here though, and keeping it here saves lines in the commit > message... Oh very well. --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. 
http://www.ozlabs.org/~dgibson

--HrAFDF5V+Z+wJLC8
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmea1tYACgkQzQJF27ox
2GdEBBAAl4qUzzj3/XXS2DSi8y3UPR1MspeRR7QKc59X7nueHDPvgyPgil6lm/92
MOdfjsiJkdIiB9qwqUFnt2SHva+ocOifU9EIkX/KUC4xa4M+WV+16RTI2jgb1DtR
sEm9fvNJ/d8rg7EO7VYUf2QxbPfRbZwPeLoePKWxLAS2k6xnAihyUsvc46XrJuto
kb0nDss15f2u6oAvxe4W130AMC9EnEUC9D0sfED1IHvR+GFQcblmv3jWAIMDcoBd
DEgOHblTqlrQiNhx6gtNb6t/VdwmLfatBw9TehzKd1CGyYIvgwmh+rYaYYHThsHz
ZkQXsw5K3/DLDwMWz9vUOo4BEIE5gMJG5QCWiLVSWf5E/N6EDYraGC1lCIFDT1a7
ee11NOEjPbvX4hRPY1253wAIljCqDArLaSAFQHo1dmQ4sp5C9+ECkQhsgL/T9PKD
VC8F7mx//8C/dM3i0xc3319lQD/tOMlnM0ZwO2s7J+5gWYZKBFx81c1Q7gp05oBv
2C59y49kr7w0uYIP1Tm3bOQusJgipxH4uqx0O03SRaL570mxIb5lsm8pqt2YM+K7
DTl+hvnNO79sZaS5ZUs8xbVQUxPzlRbiunMKumH0PphfTypA2ed12NEfLT3fY2M6
V5C8negP7JTjW7nLlN4sceX7vfy6u9vRvknuihfEXL7V+A/SO6Y=
=YTgO
-----END PGP SIGNATURE-----

--HrAFDF5V+Z+wJLC8--