From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: passt.top; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=QjINgSWe; dkim-atps=neutral Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by passt.top (Postfix) with ESMTPS id ADA6E5A061C for ; Tue, 28 Jan 2025 07:51:38 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1738047097; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=IACZ26PDYxOfjj+Ix/Gr8t928R90hcgCuklSEbjMWy4=; b=QjINgSWebv4RxW/2ZYkrd9WmRM0HDocZtfSUgOAtxeu2QyPmb0DZxy3jCkYBDY9o85EgHP hjCkeduTG6aN+LLTGbjZYVfoieEKX00LY08fVIwnvIlTACoRUV0pQX7ltJlQdZuHo/GGBj AuZB6XxxaPJX/hMeLup3u26rwqNLi34= Received: from mail-wm1-f72.google.com (mail-wm1-f72.google.com [209.85.128.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-299-jR06nFUkOzyW27wpun9YLw-1; Tue, 28 Jan 2025 01:51:35 -0500 X-MC-Unique: jR06nFUkOzyW27wpun9YLw-1 X-Mimecast-MFC-AGG-ID: jR06nFUkOzyW27wpun9YLw Received: by mail-wm1-f72.google.com with SMTP id 5b1f17b1804b1-436723bf7ffso39809835e9.3 for ; Mon, 27 Jan 2025 22:51:35 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738047094; x=1738651894; h=content-transfer-encoding:mime-version:organization:references :in-reply-to:message-id:subject:cc:to:from:date:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=IACZ26PDYxOfjj+Ix/Gr8t928R90hcgCuklSEbjMWy4=; b=ZOfTmjhPa31I3AHAdla/7u5QD5MtOVy4juvyQAp9ILYOzGvusJlHOsg5azaPMrU7rJ 3ZE6QhZ4OOnaFLlq6vVr72yGO75rn8ZDMVf8L6KD+AFi8pZ5bPurrXEqjLkuVqJWkB+V GAenNn0G42fPYUkVsgbeHOEXzkHe1PY1MJxg9yb/aig96+jH7fRSxPLdGB6QtmQb9RSI /zubuUXzu+8L1xxs5QDS0lXZjN9Mvr6pOztb42qm+dy/82IEwDv+0hg7scdxxu8F0sMr 0iEv+82ATTOXgB+dBIcNboPOzhGv3tVSyVC+/aNIUxSdK9GzW1KxhKF709E2Yk7HPxn0 chEg== X-Gm-Message-State: AOJu0YwWgpk5EUSoDMUccquFNsF8/oFDE3tI3aWvMnupGRqljzBJtpyp zkVxhv/n2Dx6u0rWgKe/i1eqiOk00zJrIKxzMfMVbJBC/j3EHfY7fcmL2XhP4ooTq/tJmDvxl/G eosezV0IuXHObDIXHm6zu+Khrrr45t/PV3y+2pJGNVBCUFEWwCQ== X-Gm-Gg: ASbGnctauB1t8iNtAt1m+DS67iaOJLePqzlY1MGsAczvGuKyHVYFa7wv4IA30rlpYUG fvtjlkEpDQrSKeN+d6RUoB1JamTvfzy9THAPmOHXnG8hio7+MsCcXVtOVF6lI9MafTFDku6Zo5N vyeMRhxLWyLaeWXCb10+b+ypNciQtJKAlVeFqrgJVZzPnBLG8Fno0XMXCqO27EKNs6fYYUlk6WP mwPKSitf9MPVNaXdrcAsJw1IJ67gZYvSQzlHcvlgS9694TOiYmrfxAEVq6taSFHGTnHzEGP4PgQ djMvf3Nnkj8aete5 X-Received: by 2002:a05:600c:4f47:b0:430:57e8:3c7e with SMTP id 5b1f17b1804b1-43891453ab8mr358735015e9.28.1738047094212; Mon, 27 Jan 2025 22:51:34 -0800 (PST) X-Google-Smtp-Source: AGHT+IF3pbSo+t8R7y5e8/BRt0Cz31eJNaxau1cYIE6g2MY52HAv2sJF2ldIzPpicL/UHX4Rl4c50w== X-Received: by 2002:a05:600c:4f47:b0:430:57e8:3c7e with SMTP id 5b1f17b1804b1-43891453ab8mr358734825e9.28.1738047093745; Mon, 27 Jan 2025 22:51:33 -0800 (PST) Received: from maya.myfinge.rs (ifcgrfdd.trafficplex.cloud. [2a10:fc81:a806:d6a9::1]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-438bd502007sm156768215e9.14.2025.01.27.22.51.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 27 Jan 2025 22:51:33 -0800 (PST) Date: Tue, 28 Jan 2025 07:51:31 +0100 From: Stefano Brivio To: David Gibson Subject: Re: [PATCH 7/7] Introduce passt-repair Message-ID: <20250128075131.268f85d1@elisabeth> In-Reply-To: References: <20250127231532.672363-1-sbrivio@redhat.com> <20250127231532.672363-8-sbrivio@redhat.com> Organization: Red Hat X-Mailer: Claws Mail 4.2.0 (GTK 3.24.41; x86_64-pc-linux-gnu) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: 76eWjG8w4ixCg9d-7tqpTlaMD7wevicOKnyOAYTSWbU_1738047094 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID-Hash: VKRY4RHO2XFF33SFJJ4AMHJIFPV3R2DF X-Message-ID-Hash: VKRY4RHO2XFF33SFJJ4AMHJIFPV3R2DF X-MailFrom: sbrivio@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top, Laurent Vivier X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Tue, 28 Jan 2025 12:51:59 +1100 David Gibson wrote: > On Tue, Jan 28, 2025 at 12:15:32AM +0100, Stefano Brivio wrote: > > A privileged helper to set/clear TCP_REPAIR on sockets on behalf of > > passt. Not used yet. > > > > Signed-off-by: Stefano Brivio > > --- > > Makefile | 10 +++-- > > passt-repair.c | 111 +++++++++++++++++++++++++++++++++++++++++++++++++ > > 2 files changed, 118 insertions(+), 3 deletions(-) > > create mode 100644 passt-repair.c > > > > diff --git a/Makefile b/Makefile > > index 1383875..1b71cb0 100644 > > --- a/Makefile > > +++ b/Makefile > > @@ -42,7 +42,8 @@ PASST_SRCS = arch.c arp.c checksum.c conf.c dhcp.c dhcpv6.c flow.c fwd.c \ > > tcp.c tcp_buf.c tcp_splice.c tcp_vu.c udp.c udp_flow.c udp_vu.c util.c \ > > vhost_user.c virtio.c vu_common.c > > QRAP_SRCS = qrap.c > > -SRCS = $(PASST_SRCS) $(QRAP_SRCS) > > +PASST_REPAIR_SRCS = passt-repair.c > > +SRCS = $(PASST_SRCS) $(QRAP_SRCS) $(PASST_REPAIR_SRCS) > > > > MANPAGES = passt.1 pasta.1 qrap.1 > > > > @@ -72,9 +73,9 @@ mandir ?= $(datarootdir)/man > > man1dir ?= $(mandir)/man1 > > > > ifeq ($(TARGET_ARCH),x86_64) > > -BIN := passt passt.avx2 pasta pasta.avx2 qrap > > +BIN := passt passt.avx2 pasta pasta.avx2 qrap passt-repair > > else > > -BIN := passt pasta qrap > > +BIN := passt pasta qrap passt-repair > > endif > > > > all: $(BIN) $(MANPAGES) docs > > @@ -101,6 +102,9 @@ pasta.avx2 pasta.1 pasta: pasta%: passt% > > qrap: $(QRAP_SRCS) passt.h > > $(CC) $(FLAGS) $(CFLAGS) $(CPPFLAGS) -DARCH=\"$(TARGET_ARCH)\" $(QRAP_SRCS) -o qrap $(LDFLAGS) > > > > +passt-repair: $(PASST_REPAIR_SRCS) > > + $(CC) $(FLAGS) $(CFLAGS) $(CPPFLAGS) $(PASST_REPAIR_SRCS) -o passt-repair $(LDFLAGS) > > + > > valgrind: EXTRA_SYSCALLS += rt_sigprocmask rt_sigtimedwait rt_sigaction \ > > rt_sigreturn getpid gettid kill clock_gettime mmap \ > > mmap2 munmap open unlink gettimeofday futex statx \ > > diff --git a/passt-repair.c b/passt-repair.c > > new file mode 100644 > > index 0000000..e9b9609 > > --- /dev/null > > +++ b/passt-repair.c > > @@ -0,0 +1,111 @@ > > +// SPDX-License-Identifier: GPL-2.0-or-later > > + > > +/* PASST - Plug A Simple Socket Transport > > + * for qemu/UNIX domain socket mode > > + * > > + * passt-repair.c - Privileged helper to set/clear TCP_REPAIR on sockets > > + * > > + * Copyright (c) 2025 Red Hat GmbH > > + * Author: Stefano Brivio > > + * > > + * Connect to passt via UNIX domain socket, receive sockets via SCM_RIGHTS along > > + * with commands mapping to TCP_REPAIR values, and switch repair mode on or > > + * off. Reply by echoing the command. Exit if the command is INT_MAX. > > + */ > > + > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > + > > +#include > > + > > +#define SCM_MAX_FD 253 /* From Linux kernel (include/net/scm.h), not in UAPI */ > > + > > +int main(int argc, char **argv) > > +{ > > + char buf[CMSG_SPACE(sizeof(int) * SCM_MAX_FD)] > > + __attribute__ ((aligned(__alignof__(struct cmsghdr)))); > > + struct sockaddr_un a = { AF_UNIX, "" }; > > + int cmd, fds[SCM_MAX_FD], s, ret, i; > > + struct cmsghdr *cmsg; > > + struct msghdr msg; > > + struct iovec iov; > > + > > + iov = (struct iovec){ &cmd, sizeof(cmd) }; > > I mean, local to local, it's *probably* fine, but still a network > protocol not defined in terms of explicit width fields makes me > nervous. I'd prefer to see the cmd being a packed structure with > fixed width elements. It actually is, because: struct { int cmd; }; is a packet structure with fixed width elements. Any architecture we build for (at least the ones I'm aware of) has a 32-bit int. We can make it uint32_t if it makes you feel better. > I also think we should do some sort of basic magic / version exchange. > I don't see any reason we'd need to extend the protocol, but I'd > rather have the option if we have to. passt-repair will be packaged and distributed together with passt, though. Versions must match. And latency here might matter more than in the rest of the migration process. > Plus checking a magic number > should make things less damaging and more debuggable if you were to > point the repair helper at an entirely unrelated unix socket instead > of passt's repair socket. Maybe, yes, even though I don't really see good chances for that mistake to happen. Feel free to post a proposal, of course. > > + msg = (struct msghdr){ NULL, 0, &iov, 1, buf, sizeof(buf), 0 }; > > + cmsg = CMSG_FIRSTHDR(&msg); > > + > > + if (argc != 2) { > > + fprintf(stderr, "Usage: %s PATH\n", argv[0]); > > + return -1; > > + } > > + > > + ret = snprintf(a.sun_path, sizeof(a.sun_path), "%s", argv[1]); > > + if (ret <= 0 || ret >= (int)sizeof(a.sun_path)) { > > + fprintf(stderr, "Invalid socket path: %s\n", argv[1]); > > + return -1; > > + } > > + > > + if ((s = socket(AF_UNIX, SOCK_STREAM, 0)) < 0) { > > Hmm.. would a datagram socket better suit our needs here? We need a connection though, so that passt knows when the helper is ready to get messages. It could be done with a synchronisation datagram but it looks more complicated to handle. By the way, with a connection, we could probably just close() the socket here instead of having a "quit" command. If you're referring to the fact we don't keep message boundaries, so we would in theory need to add short read handling to the recvmsg() below: I'd rather switch cmd to a single byte instead. You can't transfer less than that. > > + perror("Failed to create AF_UNIX socket"); > > + return -1; > > + } > > + > > + if (connect(s, (struct sockaddr *)&a, sizeof(a))) { > > + fprintf(stderr, "Failed to connect to %s: %s\n", argv[1], > > + strerror(errno)); > > + return -1; > > + } > > + > > + while (1) { > > + int n; > > + > > + if (recvmsg(s, &msg, 0) < 0) { > > + perror("Failed to receive message"); > > + return -1; > > + } > > + > > + if (!cmsg || > > + cmsg->cmsg_len < CMSG_LEN(sizeof(int)) || > > + cmsg->cmsg_len > CMSG_LEN(sizeof(int) * SCM_MAX_FD) || > > + cmsg->cmsg_type != SCM_RIGHTS) > > + return -1; > > + > > + n = cmsg->cmsg_len / CMSG_LEN(sizeof(int)); > > + memcpy(fds, CMSG_DATA(cmsg), sizeof(int) * n); > > + > > + switch (cmd) { > > + case INT_MAX: > > + return 0; > > + case TCP_REPAIR_ON: > > + case TCP_REPAIR_OFF: > > + case TCP_REPAIR_OFF_NO_WP: > > + for (i = 0; i < n; i++) { > > + if (setsockopt(fds[i], SOL_TCP, TCP_REPAIR, > > + &cmd, sizeof(int))) { > > + perror("Setting TCP_REPAIR"); > > + return -1; > > We probably want this to report errors back to passt, rather than just > dying in this case. That way if for some weird reason one socket > can't be placed in repair mode, we can still migrate all the other > connections. We implicitly report the error in the sense that we close the connection and passt will abort the migration. If you look at the handling of TCP_REPAIR in do_tcp_setsockopt(), you'll see that it either always fails (EPERM), or always succeeds. I mean, it's straightforward to implement, and we can just reply with a different command. But it's probably more meaningful and fitting to abort altogether. Besides, if we have to report exactly on which socket we failed, we won't be able to switch to a single-byte command protocol. > > + } > > + } > > + > > + /* Confirm setting by echoing the command back */ > > + if (send(s, &cmd, sizeof(int), 0) < 0) { > > + fprintf(stderr, "Reply to command %i: %s\n", > > + cmd, strerror(errno)); > > + return -1; > > + } > > + > > + break; > > + default: > > + fprintf(stderr, "Unsupported command 0x%04x\n", cmd); > > + return -1; > > + } > > + } > > +} -- Stefano