From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: passt-dev@passt.top, Laurent Vivier <lvivier@redhat.com>
Subject: Re: [PATCH v2 7/8] Add interfaces and configuration bits for passt-repair
Date: Thu, 30 Jan 2025 12:33:11 +1100 [thread overview]
Message-ID: <Z5rW1yU3HkJblRc-@zatzit> (raw)
In-Reply-To: <20250129094610.148de3c6@elisabeth>
[-- Attachment #1: Type: text/plain, Size: 26584 bytes --]
On Wed, Jan 29, 2025 at 09:46:10AM +0100, Stefano Brivio wrote:
> On Wed, 29 Jan 2025 17:09:07 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
>
> > On Wed, Jan 29, 2025 at 12:39:39AM +0100, Stefano Brivio wrote:
> > > In vhost-user mode, by default, create a second UNIX domain socket
> > > accepting connections from passt-repair, with the usual listener
> > > socket.
> > >
> > > When we need to set or clear TCP_REPAIR on sockets, we'll send them
> > > via SCM_RIGHTS to passt-repair, who sets the socket option values we
> > > ask for.
> > >
> > > To that end, introduce batched functions to request TCP_REPAIR
> > > settings on sockets, so that we don't have to send a single message
> > > for each socket, on migration. When needed, repair_flush() will
> > > send the message and check for the reply.
> > >
> > > Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> > > ---
> > > Makefile | 12 ++--
> > > conf.c | 46 ++++++++++--
> > > epoll_type.h | 4 ++
> > > passt.1 | 11 +++
> > > passt.c | 9 +++
> > > passt.h | 7 ++
> > > repair.c | 192 +++++++++++++++++++++++++++++++++++++++++++++++++++
> > > repair.h | 16 +++++
> > > tap.c | 65 +----------------
> > > util.c | 62 +++++++++++++++++
> > > util.h | 1 +
> > > 11 files changed, 353 insertions(+), 72 deletions(-)
> > > create mode 100644 repair.c
> > > create mode 100644 repair.h
> > >
> > > diff --git a/Makefile b/Makefile
> > > index 1b71cb0..f67a20b 100644
> > > --- a/Makefile
> > > +++ b/Makefile
> > > @@ -38,9 +38,9 @@ FLAGS += -DDUAL_STACK_SOCKETS=$(DUAL_STACK_SOCKETS)
> > >
> > > PASST_SRCS = arch.c arp.c checksum.c conf.c dhcp.c dhcpv6.c flow.c fwd.c \
> > > icmp.c igmp.c inany.c iov.c ip.c isolation.c lineread.c log.c mld.c \
> > > - ndp.c netlink.c migrate.c packet.c passt.c pasta.c pcap.c pif.c tap.c \
> > > - tcp.c tcp_buf.c tcp_splice.c tcp_vu.c udp.c udp_flow.c udp_vu.c util.c \
> > > - vhost_user.c virtio.c vu_common.c
> > > + ndp.c netlink.c migrate.c packet.c passt.c pasta.c pcap.c pif.c \
> > > + repair.c tap.c tcp.c tcp_buf.c tcp_splice.c tcp_vu.c udp.c udp_flow.c \
> > > + udp_vu.c util.c vhost_user.c virtio.c vu_common.c
> > > QRAP_SRCS = qrap.c
> > > PASST_REPAIR_SRCS = passt-repair.c
> > > SRCS = $(PASST_SRCS) $(QRAP_SRCS) $(PASST_REPAIR_SRCS)
> > > @@ -50,9 +50,9 @@ MANPAGES = passt.1 pasta.1 qrap.1
> > > PASST_HEADERS = arch.h arp.h checksum.h conf.h dhcp.h dhcpv6.h flow.h fwd.h \
> > > flow_table.h icmp.h icmp_flow.h inany.h iov.h ip.h isolation.h \
> > > lineread.h log.h migrate.h ndp.h netlink.h packet.h passt.h pasta.h \
> > > - pcap.h pif.h siphash.h tap.h tcp.h tcp_buf.h tcp_conn.h tcp_internal.h \
> > > - tcp_splice.h tcp_vu.h udp.h udp_flow.h udp_internal.h udp_vu.h util.h \
> > > - vhost_user.h virtio.h vu_common.h
> > > + pcap.h pif.h repair.h siphash.h tap.h tcp.h tcp_buf.h tcp_conn.h \
> > > + tcp_internal.h tcp_splice.h tcp_vu.h udp.h udp_flow.h udp_internal.h \
> > > + udp_vu.h util.h vhost_user.h virtio.h vu_common.h
> > > HEADERS = $(PASST_HEADERS) seccomp.h
> > >
> > > C := \#include <sys/random.h>\nint main(){int a=getrandom(0, 0, 0);}
> > > diff --git a/conf.c b/conf.c
> > > index df2b016..85dec44 100644
> > > --- a/conf.c
> > > +++ b/conf.c
> > > @@ -816,6 +816,9 @@ static void usage(const char *name, FILE *f, int status)
> > > " UNIX domain socket is provided by -s option\n"
> > > " --print-capabilities print back-end capabilities in JSON format,\n"
> > > " only meaningful for vhost-user mode\n");
> > > + FPRINTF(f,
> > > + " --repair-path PATH path for passt-repair(1)\n"
> >
> > Nit: as a privileged helper, should it be passt-repair(8)?
>
> So, I spent a couple of minutes on this as I wrote this, and I spent a
> bit longer now: the most authoritative definition I can find of section
> 8 is from man-pages(7):
>
> 8 System management commands
> Commands like mount(8), many of which only root can
> execute.
>
> It's not really a system management command, and the idea is to run it
> with CAP_NET_ADMIN, but not necessarily as root. So I would rather keep
> it in section 1, unless there's some other conflicting definition I'm
> not aware of.
Yeah oh, on that basis (1) makes more sense than (8).
> There's also the topic of where it should be installed (/sbin,
> /usr/sbin/, /bin, /usr/bin). I'd pick /usr/bin, because /sbin doesn't
> really mean much nowadays, and it's anyway fitting with the FHS 3.0:
>
> Utilities used for system administration (and other root-only
> commands) are stored in /sbin, /usr/ sbin, and /usr/local/sbin.
>
> /sbin contains binaries essential for booting, restoring, recovering,
> and/or repairing the system in addition to the binaries in /bin.
> 18 Programs executed after /usr is known to be mounted (when there
> are no problems) are generally placed into /usr/sbin.
>
> ...and I don't think this helper would qualify for /sbin or /usr/sbin.
>
> >
> > > + " default: append '.repair' to UNIX domain path\n");
> > > }
> > >
> > > FPRINTF(f,
> > > @@ -1240,8 +1243,30 @@ static void conf_nat(const char *arg, struct in_addr *addr4,
> > > */
> > > static void conf_open_files(struct ctx *c)
> > > {
> > > - if (c->mode != MODE_PASTA && c->fd_tap == -1)
> > > - c->fd_tap_listen = tap_sock_unix_open(c->sock_path);
> > > + if (c->mode != MODE_PASTA && c->fd_tap == -1) {
> > > + c->fd_tap_listen = sock_unix(c->sock_path);
> > > +
> > > + if (c->mode == MODE_VU && strcmp(c->repair_path, "none")) {
> > > + if (!strncmp(c->repair_path, "./", 2)) {
> > > + memmove(c->repair_path, c->repair_path + 2,
> > > + sizeof(c->repair_path) - 2);
> > > + }
> >
> > Do you need this? Shouldn't "./whatever" be usable as-is?
>
> Ah, yes, I didn't know. I explicitly added this for the '--repair-path
> ./none' case, but it's not actually needed. I'll drop this.
>
> > > +
> > > + if (!*c->repair_path &&
> > > + snprintf_check(c->repair_path,
> > > + sizeof(c->repair_path), "%s.repair",
> > > + c->sock_path)) {
> > > + warn("passt-repair path %s not usable",
> > > + c->repair_path);
> >
> > I'd prefer a die() here - I think omitting a possibly expected
> > feature, with just a warning that could easily be lost in the logs is
> > not a good idea.
>
> I was thinking about that, but should we really risk *not* starting
> because with ".repair" the path is now too long, for a feature that,
> realistically, most users won't actually use?
>
> If we're started by any kind of framework (which is where we run the
> risk of the warning being ignored), then there should be explicit
> checks about the path and the usability of passt-repair (or equivalent).
Hm, yeah, I guess. Honestly I'm not sure which option will cause us
less trouble.
> > > + c->fd_repair_listen = -1;
> > > + } else {
> > > + c->fd_repair_listen = sock_unix(c->repair_path);
> > > + }
> > > + } else {
> > > + c->fd_repair_listen = -1;
> > > + }
> > > + c->fd_repair = -1;
> > > + }
> > >
> > > if (*c->pidfile) {
> > > c->pidfile_fd = output_file_open(c->pidfile, O_WRONLY);
> > > @@ -1354,9 +1379,12 @@ void conf(struct ctx *c, int argc, char **argv)
> > > {"host-lo-to-ns-lo", no_argument, NULL, 23 },
> > > {"dns-host", required_argument, NULL, 24 },
> > > {"vhost-user", no_argument, NULL, 25 },
> > > +
> > > /* vhost-user backend program convention */
> > > {"print-capabilities", no_argument, NULL, 26 },
> > > {"socket-path", required_argument, NULL, 's' },
> > > +
> > > + {"repair-path", required_argument, NULL, 27 },
> > > { 0 },
> > > };
> > > const char *logname = (c->mode == MODE_PASTA) ? "pasta" : "passt";
> > > @@ -1824,8 +1852,8 @@ void conf(struct ctx *c, int argc, char **argv)
> > > if (c->ifi4 && IN4_IS_ADDR_UNSPECIFIED(&c->ip4.guest_gw))
> > > c->no_dhcp = 1;
> > >
> > > - /* Inbound port options & DNS can be parsed now (after IPv4/IPv6
> > > - * settings)
> > > + /* Inbound port options, DNS, and --repair-path can be parsed now, after
> > > + * IPv4/IPv6 settings and --vhost-user.
> > > */
> > > fwd_probe_ephemeral();
> > > udp_portmap_clear();
> > > @@ -1871,6 +1899,16 @@ void conf(struct ctx *c, int argc, char **argv)
> > > }
> > >
> > > die("Cannot use DNS address %s", optarg);
> > > + } else if (name == 27) {
> > > + if (c->mode != MODE_VU && strcmp(optarg, "none"))
> > > + die("--repair-path is for vhost-user mode only");
> > > +
> > > + if (snprintf_check(c->repair_path,
> > > + sizeof(c->repair_path), "%s",
> > > + optarg))
> > > + die("Invalid passt-repair path: %s", optarg);
> > > +
> > > + break;
> > > }
> > > } while (name != -1);
> > >
> > > diff --git a/epoll_type.h b/epoll_type.h
> > > index fd9eac3..706238a 100644
> > > --- a/epoll_type.h
> > > +++ b/epoll_type.h
> > > @@ -42,6 +42,10 @@ enum epoll_type {
> > > EPOLL_TYPE_VHOST_KICK,
> > > /* vhost-user migration socket */
> > > EPOLL_TYPE_VHOST_MIGRATION,
> > > + /* TCP_REPAIR helper listening socket */
> > > + EPOLL_TYPE_REPAIR_LISTEN,
> > > + /* TCP_REPAIR helper socket */
> > > + EPOLL_TYPE_REPAIR,
> > >
> > > EPOLL_NUM_TYPES,
> > > };
> > > diff --git a/passt.1 b/passt.1
> > > index d9cd33e..63a3a01 100644
> > > --- a/passt.1
> > > +++ b/passt.1
> > > @@ -418,6 +418,17 @@ Enable vhost-user. The vhost-user command socket is provided by \fB--socket\fR.
> > > .BR \-\-print-capabilities
> > > Print back-end capabilities in JSON format, only meaningful for vhost-user mode.
> > >
> > > +.TP
> > > +.BR \-\-repair-path " " \fIpath
> > > +Path for UNIX domain socket used by the \fBpasst-repair\fR(1) helper to connect
> >
> > passt-repair(8)?
>
> See above.
>
> > > +to \fBpasst\fR in order to set or clear the TCP_REPAIR option on sockets, during
> > > +migration. \fB--repair-path none\fR disables this interface (if you need to
> > > +specify a socket path called "none" you can prefix the path by \fI./\fR).
> > > +
> > > +Default, for \-\-vhost-user mode only, is to append \fI.repair\fR to the path
> > > +chosen for the hypervisor UNIX domain socket. No socket is created if not in
> > > +\-\-vhost-user mode.
> > > +
> > > .TP
> > > .BR \-F ", " \-\-fd " " \fIFD
> > > Pass a pre-opened, connected socket to \fBpasst\fR. Usually the socket is opened
> > > diff --git a/passt.c b/passt.c
> > > index 184d4e5..1fa2ddd 100644
> > > --- a/passt.c
> > > +++ b/passt.c
> > > @@ -51,6 +51,7 @@
> > > #include "tcp_splice.h"
> > > #include "ndp.h"
> > > #include "vu_common.h"
> > > +#include "repair.h"
> > >
> > > #define EPOLL_EVENTS 8
> > >
> > > @@ -76,6 +77,8 @@ char *epoll_type_str[] = {
> > > [EPOLL_TYPE_VHOST_CMD] = "vhost-user command socket",
> > > [EPOLL_TYPE_VHOST_KICK] = "vhost-user kick socket",
> > > [EPOLL_TYPE_VHOST_MIGRATION] = "vhost-user migration socket",
> > > + [EPOLL_TYPE_REPAIR_LISTEN] = "TCP_REPAIR helper listening socket",
> > > + [EPOLL_TYPE_REPAIR] = "TCP_REPAIR helper socket",
> > > };
> > > static_assert(ARRAY_SIZE(epoll_type_str) == EPOLL_NUM_TYPES,
> > > "epoll_type_str[] doesn't match enum epoll_type");
> > > @@ -360,6 +363,12 @@ loop:
> > > case EPOLL_TYPE_VHOST_MIGRATION:
> > > vu_migrate(&c, eventmask);
> > > break;
> > > + case EPOLL_TYPE_REPAIR_LISTEN:
> > > + repair_listen_handler(&c, eventmask);
> > > + break;
> > > + case EPOLL_TYPE_REPAIR:
> > > + repair_handler(&c, eventmask);
> > > + break;
> > > default:
> > > /* Can't happen */
> > > ASSERT(0);
> > > diff --git a/passt.h b/passt.h
> > > index 0dd4efa..85b0a10 100644
> > > --- a/passt.h
> > > +++ b/passt.h
> > > @@ -20,6 +20,7 @@ union epoll_ref;
> > > #include "siphash.h"
> > > #include "ip.h"
> > > #include "inany.h"
> > > +#include "migrate.h"
> > > #include "flow.h"
> > > #include "icmp.h"
> > > #include "fwd.h"
> > > @@ -193,6 +194,7 @@ struct ip6_ctx {
> > > * @foreground: Run in foreground, don't log to stderr by default
> > > * @nofile: Maximum number of open files (ulimit -n)
> > > * @sock_path: Path for UNIX domain socket
> > > + * @repair_path: TCP_REPAIR helper path, can be "none", empty for default
> > > * @pcap: Path for packet capture file
> > > * @pidfile: Path to PID file, empty string if not configured
> > > * @pidfile_fd: File descriptor for PID file, -1 if none
> > > @@ -203,6 +205,8 @@ struct ip6_ctx {
> > > * @epollfd: File descriptor for epoll instance
> > > * @fd_tap_listen: File descriptor for listening AF_UNIX socket, if any
> > > * @fd_tap: AF_UNIX socket, tuntap device, or pre-opened socket
> > > + * @fd_repair_listen: File descriptor for listening TCP_REPAIR socket, if any
> > > + * @fd_repair: Connected AF_UNIX socket for TCP_REPAIR helper
> > > * @our_tap_mac: Pasta/passt's MAC on the tap link
> > > * @guest_mac: MAC address of guest or namespace, seen or configured
> > > * @hash_secret: 128-bit secret for siphash functions
> > > @@ -244,6 +248,7 @@ struct ctx {
> > > int foreground;
> > > int nofile;
> > > char sock_path[UNIX_PATH_MAX];
> > > + char repair_path[UNIX_PATH_MAX];
> > > char pcap[PATH_MAX];
> > >
> > > char pidfile[PATH_MAX];
> > > @@ -260,6 +265,8 @@ struct ctx {
> > > int epollfd;
> > > int fd_tap_listen;
> > > int fd_tap;
> > > + int fd_repair_listen;
> > > + int fd_repair;
> > > unsigned char our_tap_mac[ETH_ALEN];
> > > unsigned char guest_mac[ETH_ALEN];
> > > uint64_t hash_secret[2];
> > > diff --git a/repair.c b/repair.c
> > > new file mode 100644
> > > index 0000000..24966f5
> > > --- /dev/null
> > > +++ b/repair.c
> > > @@ -0,0 +1,192 @@
> > > +// SPDX-License-Identifier: GPL-2.0-or-later
> > > +
> > > +/* PASST - Plug A Simple Socket Transport
> > > + * for qemu/UNIX domain socket mode
> > > + *
> > > + * PASTA - Pack A Subtle Tap Abstraction
> > > + * for network namespace/tap device mode
> > > + *
> > > + * repair.c - Interface (server) for passt-repair, set/clear TCP_REPAIR
> > > + *
> > > + * Copyright (c) 2025 Red Hat GmbH
> > > + * Author: Stefano Brivio <sbrivio@redhat.com>
> > > + */
> > > +
> > > +#include <errno.h>
> > > +#include <sys/uio.h>
> > > +
> > > +#include "util.h"
> > > +#include "ip.h"
> > > +#include "passt.h"
> > > +#include "inany.h"
> > > +#include "flow.h"
> > > +#include "flow_table.h"
> > > +
> > > +#include "repair.h"
> > > +
> > > +#define SCM_MAX_FD 253 /* From Linux kernel (include/net/scm.h), not in UAPI */
> > > +
> > > +static int fds[SCM_MAX_FD];
> >
> > Even 'static', I'd prefer a longer name for a global variable.
>
> A longer name for this also means a longer name for nfds below, which
> is not necessarily practical, but I'll check and try to change these to
> repair_fds[] / repair_nfds if doable.
>
> > > +static int current_cmd;
> >
> > Is there any particular rationale behind these being globals, whereas
> > fd_repair is in struct ctx? AFAICT they're basically equally global
> > in practice.
>
> Yes: c->fd_repair_listen needs to be in struct ctx, and I want to keep
> this consistent with c->fd_tap_listen / c->fd_tap.
>
> Besides, the variables declared here are really hacks representing
> state for this compilation unit only.
Oh, right. Essentially these (static) globals are less global than
ctx, which is truly world global. Ok, that makes sense.
> > > +static int nfds;
> > > +
> > > +/**
> > > + * repair_sock_init() - Start listening for connections on helper socket
> > > + * @c: Execution context
> > > + */
> > > +void repair_sock_init(const struct ctx *c)
> > > +{
> > > + union epoll_ref ref = { .type = EPOLL_TYPE_REPAIR_LISTEN };
> > > + struct epoll_event ev = { 0 };
> > > +
> > > + listen(c->fd_repair_listen, 0);
> > > +
> > > + ref.fd = c->fd_repair_listen;
> > > + ev.events = EPOLLIN | EPOLLHUP | EPOLLET;
> > > + ev.data.u64 = ref.u64;
> > > + epoll_ctl(c->epollfd, EPOLL_CTL_ADD, c->fd_repair_listen, &ev);
> > > +}
> > > +
> > > +/**
> > > + * repair_listen_handler() - Handle events on TCP_REPAIR helper listening socket
> > > + * @c: Execution context
> > > + * @events: epoll events
> > > + */
> > > +void repair_listen_handler(struct ctx *c, uint32_t events)
> > > +{
> > > + union epoll_ref ref = { .type = EPOLL_TYPE_REPAIR };
> > > + struct epoll_event ev = { 0 };
> > > + struct ucred ucred;
> > > + socklen_t len;
> > > +
> > > + if (events != EPOLLIN) {
> > > + debug("Spurious event 0x%04x on TCP_REPAIR helper socket",
> > > + events);
> > > + return;
> > > + }
> > > +
> > > + len = sizeof(ucred);
> > > +
> > > + /* Another client is already connected: accept and close right away. */
> >
> > For the repair socket, last-connection-wins would make more sense to
> > me than first-connection-wins. While hacking/debugging seems it might
> > be useful to fix something in the passt-repair, re-run it and have it
> > displace the stale version for existing passt instances.
>
> In the whole debugging I've done so far I'm actually using passt-repair
> (otherwise I don't even start it), so it already terminates once it's
> done.
>
> I think it's more secure to keep it like this, because it's more robust
> against races.
>
> Let's say we have an issue in KubeVirt's virt-handler so that the
> migration starts a bit before the helper is started, and somebody
> manages to use this time window to connect another helper, this attempt
> will actually go unnoticed.
>
> If passt-repair fails to connect, it will be very obvious.
Ok, you convinced me.
> > > + if (c->fd_repair != -1) {
> > > + int discard = accept4(c->fd_repair_listen, NULL, NULL,
> > > + SOCK_NONBLOCK);
> > > +
> > > + if (discard == -1)
> > > + return;
> > > +
> > > + if (!getsockopt(discard, SOL_SOCKET, SO_PEERCRED, &ucred, &len))
> > > + info("Discarding TCP_REPAIR helper, PID %i", ucred.pid);
> > > +
> > > + close(discard);
> > > + return;
> > > + }
> > > +
> > > + c->fd_repair = accept4(c->fd_repair_listen, NULL, NULL, 0);
> > > +
> > > + if (!getsockopt(c->fd_repair, SOL_SOCKET, SO_PEERCRED, &ucred, &len))
> > > + info("Accepted TCP_REPAIR helper, PID %i", ucred.pid);
> > > +
> > > + ref.fd = c->fd_repair;
> > > + ev.events = EPOLLHUP | EPOLLET;
> > > + ev.data.u64 = ref.u64;
> > > + epoll_ctl(c->epollfd, EPOLL_CTL_ADD, c->fd_repair, &ev);
> > > +}
> > > +
> > > +/**
> > > + * repair_close() - Close connection to TCP_REPAIR helper
> > > + * @c: Execution context
> > > + */
> > > +void repair_close(struct ctx *c)
> > > +{
> > > + debug("Closing TCP_REPAIR helper socket");
> > > +
> > > + epoll_ctl(c->epollfd, EPOLL_CTL_DEL, c->fd_repair, NULL);
> > > + close(c->fd_repair);
> > > + c->fd_repair = -1;
> > > +}
> > > +
> > > +/**
> > > + * repair_handler() - Handle EPOLLHUP and EPOLLERR on TCP_REPAIR helper socket
> > > + * @c: Execution context
> > > + * @events: epoll events
> > > + */
> > > +void repair_handler(struct ctx *c, uint32_t events)
> > > +{
> > > + (void)events;
> > > +
> > > + repair_close(c);
> > > +}
> > > +
> > > +/**
> > > + * repair_flush() - Flush current set of sockets to helper, with current command
> > > + * @c: Execution context
> > > + *
> > > + * Return: 0 on success, negative error code on failure
> > > + */
> > > +int repair_flush(struct ctx *c)
> > > +{
> > > + struct iovec iov = { &((int8_t){ current_cmd }), sizeof(int8_t) };
> > > + char buf[CMSG_SPACE(sizeof(int) * SCM_MAX_FD)]
> > > + __attribute__ ((aligned(__alignof__(struct cmsghdr))));
> > > + struct cmsghdr *cmsg;
> > > + struct msghdr msg;
> > > + int ret = 0;
> > > +
> > > + if (!nfds)
> > > + return 0;
> > > +
> > > + msg = (struct msghdr){ NULL, 0, &iov, 1,
> > > + buf, CMSG_SPACE(sizeof(int) * nfds), 0 };
> > > + cmsg = CMSG_FIRSTHDR(&msg);
> > > +
> > > + cmsg->cmsg_level = SOL_SOCKET;
> > > + cmsg->cmsg_type = SCM_RIGHTS;
> > > + cmsg->cmsg_len = CMSG_LEN(sizeof(int) * nfds);
> > > + memcpy(CMSG_DATA(cmsg), fds, sizeof(int) * nfds);
> > > +
> > > + nfds = 0;
> > > +
> > > + if (sendmsg(c->fd_repair, &msg, 0) < 0) {
> > > + ret = -errno;
> >
> > This error code won't be reported to the caller: you'll continue on to
> > the recv() below, which will return EBADF, clobbering ret.
>
> Oops, right, I'll return early in this case (or equivalent).
>
> > > + err_perror("Failed to send sockets to TCP_REPAIR helper");
> > > + repair_close(c);
> > > + }
> > > +
> > > + if (recv(c->fd_repair, &((int8_t){ 0 }), 1, 0) < 0) {
> > > + ret = -errno;
> > > + err_perror("Failed to receive reply from TCP_REPAIR helper");
> > > + repair_close(c);
> > > + }
> > > +
> > > + return ret;
> > > +}
> > > +
> > > +/**
> > > + * repair_flush() - Add socket to TCP_REPAIR set with given command
> > > + * @c: Execution context
> > > + * @s: Socket to add
> > > + * @cmd: TCP_REPAIR_ON, TCP_REPAIR_OFF, or TCP_REPAIR_OFF_NO_WP
> > > + *
> > > + * Return: 0 on success, negative error code on failure
> > > + */
> > > +/* cppcheck-suppress unusedFunction */
> > > +int repair_set(struct ctx *c, int s, int cmd)
> > > +{
> > > + int rc;
> > > +
> > > + if (nfds && current_cmd != cmd) {
> > > + if ((rc = repair_flush(c)))
> > > + return rc;
> > > + }
> > > +
> > > + current_cmd = cmd;
> > > + fds[nfds++] = s;
> > > +
> > > + if (nfds >= SCM_MAX_FD) {
> > > + if ((rc = repair_flush(c)))
> > > + return rc;
> > > + }
> > > +
> > > + return 0;
> > > +}
> > > diff --git a/repair.h b/repair.h
> > > new file mode 100644
> > > index 0000000..693c515
> > > --- /dev/null
> > > +++ b/repair.h
> > > @@ -0,0 +1,16 @@
> > > +/* SPDX-License-Identifier: GPL-2.0-or-later
> > > + * Copyright (c) 2025 Red Hat GmbH
> > > + * Author: Stefano Brivio <sbrivio@redhat.com>
> > > + */
> > > +
> > > +#ifndef REPAIR_H
> > > +#define REPAIR_H
> > > +
> > > +void repair_sock_init(const struct ctx *c);
> > > +void repair_listen_handler(struct ctx *c, uint32_t events);
> > > +void repair_handler(struct ctx *c, uint32_t events);
> > > +void repair_close(struct ctx *c);
> > > +int repair_flush(struct ctx *c);
> > > +int repair_set(struct ctx *c, int s, int cmd);
> > > +
> > > +#endif /* REPAIR_H */
> > > diff --git a/tap.c b/tap.c
> > > index cd32a90..0e60eb4 100644
> > > --- a/tap.c
> > > +++ b/tap.c
> > > @@ -56,6 +56,7 @@
> > > #include "netlink.h"
> > > #include "pasta.h"
> > > #include "packet.h"
> > > +#include "repair.h"
> > > #include "tap.h"
> > > #include "log.h"
> > > #include "vhost_user.h"
> > > @@ -1151,68 +1152,6 @@ void tap_handler_pasta(struct ctx *c, uint32_t events,
> > > tap_pasta_input(c, now);
> > > }
> > >
> > > -/**
> > > - * tap_sock_unix_open() - Create and bind AF_UNIX socket
> > > - * @sock_path: Socket path. If empty, set on return (UNIX_SOCK_PATH as prefix)
> > > - *
> > > - * Return: socket descriptor on success, won't return on failure
> > > - */
> > > -int tap_sock_unix_open(char *sock_path)
> > > -{
> > > - int fd = socket(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0);
> > > - struct sockaddr_un addr = {
> > > - .sun_family = AF_UNIX,
> > > - };
> > > - int i;
> > > -
> > > - if (fd < 0)
> > > - die_perror("Failed to open UNIX domain socket");
> > > -
> > > - for (i = 1; i < UNIX_SOCK_MAX; i++) {
> > > - char *path = addr.sun_path;
> > > - int ex, ret;
> > > -
> > > - if (*sock_path)
> > > - memcpy(path, sock_path, UNIX_PATH_MAX);
> > > - else if (snprintf_check(path, UNIX_PATH_MAX - 1,
> > > - UNIX_SOCK_PATH, i))
> > > - die_perror("Can't build UNIX domain socket path");
> > > -
> > > - ex = socket(AF_UNIX, SOCK_STREAM | SOCK_NONBLOCK | SOCK_CLOEXEC,
> > > - 0);
> > > - if (ex < 0)
> > > - die_perror("Failed to check for UNIX domain conflicts");
> > > -
> > > - ret = connect(ex, (const struct sockaddr *)&addr, sizeof(addr));
> > > - if (!ret || (errno != ENOENT && errno != ECONNREFUSED &&
> > > - errno != EACCES)) {
> > > - if (*sock_path)
> > > - die("Socket path %s already in use", path);
> > > -
> > > - close(ex);
> > > - continue;
> > > - }
> > > - close(ex);
> > > -
> > > - unlink(path);
> > > - ret = bind(fd, (const struct sockaddr *)&addr, sizeof(addr));
> > > - if (*sock_path && ret)
> > > - die_perror("Failed to bind UNIX domain socket");
> > > -
> > > - if (!ret)
> > > - break;
> > > - }
> > > -
> > > - if (i == UNIX_SOCK_MAX)
> > > - die_perror("Failed to bind UNIX domain socket");
> > > -
> > > - info("UNIX domain socket bound at %s", addr.sun_path);
> > > - if (!*sock_path)
> > > - memcpy(sock_path, addr.sun_path, UNIX_PATH_MAX);
> > > -
> > > - return fd;
> > > -}
> > > -
> > > /**
> > > * tap_backend_show_hints() - Give help information to start QEMU
> > > * @c: Execution context
> > > @@ -1423,6 +1362,8 @@ void tap_backend_init(struct ctx *c)
> > > tap_sock_tun_init(c);
> > > break;
> > > case MODE_VU:
> > > + repair_sock_init(c);
> > > + /* fall through */
> > > case MODE_PASST:
> > > tap_sock_unix_init(c);
> > >
> > > diff --git a/util.c b/util.c
> > > index 36857d4..e98da74 100644
> > > --- a/util.c
> > > +++ b/util.c
> > > @@ -178,6 +178,68 @@ int sock_l4_sa(const struct ctx *c, enum epoll_type type,
> > > return fd;
> > > }
> > >
> > > +/**
> > > + * sock_unix() - Create and bind AF_UNIX socket
> > > + * @sock_path: Socket path. If empty, set on return (UNIX_SOCK_PATH as prefix)
> > > + *
> > > + * Return: socket descriptor on success, won't return on failure
> > > + */
> >
> > I like making tap_sock_unix_open() more general. Could be split into
> > a separate patch.
>
> I need it here though, and keeping it here saves lines in the commit
> message...
Oh very well.
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2025-01-30 1:33 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-28 23:39 [PATCH v2 0/8] Draft, incomplete series introducing state migration Stefano Brivio
2025-01-28 23:39 ` [PATCH v2 1/8] icmp, udp: Pad time_t timestamp to 64-bit to ease " Stefano Brivio
2025-01-29 1:34 ` David Gibson
2025-01-28 23:39 ` [PATCH v2 2/8] flow, flow_table: Pad flow table entries to 128 bytes, hash entries to 32 bits Stefano Brivio
2025-01-29 1:35 ` David Gibson
2025-01-28 23:39 ` [PATCH v2 3/8] flow_table: Use size in extern declaration for flowtab Stefano Brivio
2025-01-28 23:39 ` [PATCH v2 4/8] util: Add read_remainder() and read_all_buf() Stefano Brivio
2025-01-29 1:37 ` David Gibson
2025-01-28 23:39 ` [PATCH v2 5/8] Introduce facilities for guest migration on top of vhost-user infrastructure Stefano Brivio
2025-01-29 5:41 ` David Gibson
2025-01-29 8:46 ` Stefano Brivio
2025-01-30 1:28 ` David Gibson
2025-01-28 23:39 ` [PATCH v2 6/8] Introduce passt-repair Stefano Brivio
2025-01-28 23:39 ` [PATCH v2 7/8] Add interfaces and configuration bits for passt-repair Stefano Brivio
2025-01-29 6:09 ` David Gibson
2025-01-29 8:46 ` Stefano Brivio
2025-01-30 1:33 ` David Gibson [this message]
2025-01-28 23:39 ` [PATCH v2 8/8] flow, tcp: Basic pre-migration source handler to dump sequence numbers Stefano Brivio
2025-01-29 6:15 ` David Gibson
2025-01-29 8:46 ` Stefano Brivio
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z5rW1yU3HkJblRc-@zatzit \
--to=david@gibson.dropbear.id.au \
--cc=lvivier@redhat.com \
--cc=passt-dev@passt.top \
--cc=sbrivio@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).