Date: Mon, 3 Feb 2025 10:45:05 +0100
From: Stefano Brivio
To: David Gibson
Subject: Re: [PATCH 6/7] Introduce facilities for guest migration on top of vhost-user infrastructure
Message-ID: <20250203104505.71d768ed@elisabeth>
References: <20250129083350.220a7ab0@elisabeth>
 <20250130055522.39acb265@elisabeth>
 <20250130093236.117c3fd0@elisabeth>
 <20250131063655.41a5861b@elisabeth>
 <20250131100919.0950ec1e@elisabeth>
 <20250203070928.54561e7e@elisabeth>
Organization: Red Hat
CC: passt-dev@passt.top, Laurent Vivier
List-Id: Development discussion and patches for passt

On Mon, 3 Feb 2025 20:06:28 +1100
David Gibson wrote:

> On Mon, Feb 03, 2025 at 07:09:28AM +0100, Stefano Brivio wrote:
> > On Mon, 3 Feb 2025 11:46:13 +1100
> > David Gibson wrote:
> >
> > > On Fri, Jan 31, 2025 at 10:09:19AM +0100, Stefano Brivio wrote:
> > > > Fixed, finally.
> > > > Some answers:
> > > >
> > > > On Fri, 31 Jan 2025 17:14:18 +1100
> > > > David Gibson wrote:
> > > >
> > > > > On Fri, Jan 31, 2025 at 06:36:55AM +0100, Stefano Brivio wrote:
> > > > > > On Thu, 30 Jan 2025 09:32:36 +0100
> > > > > > Stefano Brivio wrote:
> > > > > >
> > > > > > > I would like to quickly complete the whole flow first, because I think
> > > > > > > we can inform design and implementation decisions much better at that
> > > > > > > point
> > > > > >
> > > > > > So, there seems to be a problem with (testing?) this. I couldn't quite
> > > > > > understand the root cause yet, and it doesn't happen with the reference
> > > > > > source.c and target.c implementations I shared.
> > > > > >
> > > > > > Let's assume I have a connection in the source guest to 127.0.0.1:9091,
> > > > > > from 127.0.0.1:56350. After the migration, in the target, I get:
> > > > > >
> > > > > > ---
> > > > > > socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 79
> > > > > > setsockopt(79, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> > > > > > bind(79, {sa_family=AF_INET, sin_port=htons(56350), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
> > > > > > sendmsg(72, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\1", iov_len=1}], msg_iovlen=1, msg_control=[{cmsg_len=20, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS, cmsg_data=[79]}], msg_controllen=24, msg_flags=0}, 0) = 1
> > > > > > recvfrom(72, "\1", 1, 0, NULL, NULL) = 1
> > > > > > setsockopt(79, SOL_TCP, TCP_REPAIR_QUEUE, [2], 4) = 0
> > > > > > setsockopt(79, SOL_TCP, TCP_QUEUE_SEQ, [1788468535], 4) = 0
> > > > > > write(2, "77.6923: ", 977.6923: ) = 9
> > > > > > write(2, "Set send queue sequence for sock"..., 51Set send queue sequence for socket 79 to 1788468535) = 51
> > > > > > write(2, "\n", 1
> > > > > > ) = 1
> > > > > > setsockopt(79, SOL_TCP, TCP_REPAIR_QUEUE, [1], 4) = 0
> > > > > > setsockopt(79, SOL_TCP, TCP_QUEUE_SEQ, [115288604], 4) = 0
> > > > > > write(2, "77.6924: ", 977.6924: ) = 9
> > > > > > write(2, "Set receive queue sequence for s"..., 53Set receive queue sequence for socket 79 to 115288604) = 53
> > > > > > write(2, "\n", 1
> > > > > > ) = 1
> > > > > > connect(79, {sa_family=AF_INET, sin_port=htons(9091), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EADDRNOTAVAIL (Cannot assign requested address)
> > > > > > ---
> > > > > >
> > > > > > EADDRNOTAVAIL, according to the documentation, which seems to be
> > > > > > consistent with a glance at the implementation (that is, I must be
> > > > > > missing some issue in the kernel), should be returned on connect() if:
> > > > > >
> > > > > >        EADDRNOTAVAIL
> > > > > >               (Internet domain sockets) The socket referred to by
> > > > > >               sockfd had not previously been bound to an address
> > > > > >               and, upon attempting to bind it to an ephemeral
> > > > > >               port, it was determined that all port numbers in the
> > > > > >               ephemeral port range are currently in use. See the
> > > > > >               discussion of /proc/sys/net/ipv4/ip_local_port_range
> > > > > >               in ip(7).
> > > > > >
> > > > > > but well, of course it was bound.
> > > > > >
> > > > > > To a port, indeed, not a full address, that is, any (0.0.0.0) and
> > > > > > address port, but I think for the purposes of this description that
> > > > > > bind() call is enough.
> > > > >
> > > > > So, I was wondering if binding to 0.0.0.0 is sufficient for a repaired
> > > > > socket.
> > > >
> > > > It is.
> > > >
> > > > > Usually, of course, that 0.0.0.0 would be resolved to a real
> > > > > address at connect() time. But TCP_REPAIR's version of connect()
> > > > > bypasses a bunch of the usual connect logic, so maybe we need an
> > > > > explicit address here.
> > > >
> > > > No need.
> > >
> > > Ok.
> >
> > > > > ...but that doesn't explain the difference between passt and your test
> > > > > implementation.
> > > >
> > > > The difference that actually matters is that the test implementation
> > > > terminates, and that has the equivalent effect of switching off repair
> > > > mode for the closed sockets, which frees up all the associated context,
> > > > including the port.
> > > >
> > > > Usually, there are no valid operations on closed sockets (not even
> > > > close()). This is the first exception I ever met: you can set
> > > > TCP_REPAIR_OFF.
> > >
> > > I'm still confused by the specific sequence of events that's causing
> > > the problem. If a socket is closed with close(2) it should no longer
> > > exist, so I don't see how you could even attempt to do anything with
> > > it.
> > >
> > > Do you mean that the socket is shutdown(RD|WR)? Or that it's been
> > > closed by passt, but not by passt-repair? Or the other way around?
> > >
> > > I'd kind of assume that you _must_ close the socket while still in
> > > repair mode, since we want it to go away on the source without
> > > attempting to FIN or RST or anything.
> >
> > While the explanation for the issue is what you gave as comment to 8/20
> > (I need to close() the socket from passt-repair), let me answer here:
> > sure, I must close() it, and it was close()d by passt but not
> > passt-repair.
>
> Right, I realised the problem with the missing close in passt-repair
> after I wrote this.
> >
> > > > But there's a catch: you can't pass a closed socket in repair mode via
> > > > SCM_RIGHTS (well, I'm fairly sure nobody approached this level of
> > > > insanity before): you get EBADF (which is an understatement).
> > > >
> > > > And there's another catch: if you actually try to do that, even if it
> > > > fails, that has the same effect of clearing the socket entirely: you
> > > > free up the port.
> > >
> > > !?! this is even more baffling. Passing what's now an unrelated,
> > > unassigned integer as an fd is having some effect on a socket that was
> > > around!? If so that's a horrifying kernel bug.
> >
> > Nah, most likely not. The EBADF on a close()d socket is a bit
> > questionable (it should be EINVAL? Or a -1 socket in the
> > recipient?),
>
> You're not "passing a closed socket", that's nonsensical. You're
> trying to pass a stale fd that no longer refers to your socket.

Well, I'm just passing a number that doesn't happen to refer to a
current socket, but:

> EBADF is _exactly_ what should happen, regardless of whether or not
> the underlying socket is really closed, or if it's held open by
> another fd somewhere (a dup() or something passed to another process
> like in this case).

...EBADF on a sendmsg() means, in POSIX.1-2024:

  [EBADF]
    The socket argument is not a valid file descriptor.

and nothing else. This matches GNU/Linux documentation by the way. The
socket argument is *not* one of the file descriptors that you can pass
via SCM_RIGHTS.

I would argue that a more reasonable and less surprising behaviour would
be signalling that there's no socket to send with a -1 in the receiver.
Or omit it in ancillary data altogether.

POSIX specifies SCM_RIGHTS, but doesn't mention any error for ancillary
data.

> > but other than that, the explanation is that passing that closed socket
> > caused EOF in passt-repair, and passt-repair would quit, solving the
> > issue.
>
> Passing a bad fd caused an error on the sendmsg(), which caused an EOF
> on the other end. Which is a little odd, but again nothing to do with
> "passing a closed socket"; that's impossible - if the socket is closed
> there's no way to refer to it and so no way to even attempt sending
> it.

Right, so it shouldn't be sent, but the error doesn't match.

--
Stefano
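
The sendmsg() error discussed above can be reproduced without
TCP_REPAIR or any privileges. What follows is a minimal sketch, assuming
Linux and Python 3; send_fd() and stale_fd_errno() are hypothetical
helper names, not part of passt or passt-repair. It closes a TCP socket,
then tries to pass its old descriptor number as SCM_RIGHTS ancillary
data over a UNIX domain socket that is itself still open:

```python
import array
import errno
import socket


def send_fd(chan: socket.socket, fd: int) -> None:
    """Send one byte of data plus a single fd as SCM_RIGHTS ancillary data."""
    fds = array.array("i", [fd])
    chan.sendmsg([b"\x01"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def stale_fd_errno() -> int:
    """Return the errno from sendmsg() when SCM_RIGHTS carries a stale fd.

    Returns 0 if sendmsg() unexpectedly succeeds.
    """
    # The channel used for sendmsg(): a valid socket for the whole call.
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

    # Stand-in for the migrated TCP socket (assumption: no repair mode here).
    victim = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    stale = victim.fileno()
    victim.close()  # 'stale' is now just a number, not an open descriptor

    try:
        send_fd(sender, stale)
    except OSError as e:
        return e.errno
    finally:
        sender.close()
        receiver.close()
    return 0


if __name__ == "__main__":
    code = stale_fd_errno()
    # Print the symbolic errno name, or a note if nothing failed.
    print(errno.errorcode.get(code, "no error"))
```

The socket argument of sendmsg() stays valid throughout, so any EBADF
reported here can only refer to the descriptor carried in the ancillary
data, which is the case the thread is arguing about.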