Date: Mon, 3 Feb 2025 07:09:28 +0100
From: Stefano Brivio
To: David Gibson
CC: passt-dev@passt.top, Laurent Vivier
Subject: Re: [PATCH 6/7] Introduce facilities for guest migration on top of vhost-user infrastructure
Message-ID: <20250203070928.54561e7e@elisabeth>
Organization: Red Hat
List-Id: Development discussion and patches for passt

On Mon, 3 Feb 2025 11:46:13 +1100
David Gibson wrote:

> On Fri, Jan 31, 2025 at 10:09:19AM +0100, Stefano Brivio wrote:
> > Fixed, finally.
> > Some answers:
> >
> > On Fri, 31 Jan 2025 17:14:18 +1100
> > David Gibson wrote:
> >
> > > On Fri, Jan 31, 2025 at 06:36:55AM +0100, Stefano Brivio wrote:
> > > > On Thu, 30 Jan 2025 09:32:36 +0100
> > > > Stefano Brivio wrote:
> > > >
> > > > > I would like to quickly complete the whole flow first, because I think
> > > > > we can inform design and implementation decisions much better at that
> > > > > point
> > > >
> > > > So, there seems to be a problem with (testing?) this. I couldn't quite
> > > > understand the root cause yet, and it doesn't happen with the reference
> > > > source.c and target.c implementations I shared.
> > > >
> > > > Let's assume I have a connection in the source guest to 127.0.0.1:9091,
> > > > from 127.0.0.1:56350. After the migration, in the target, I get:
> > > >
> > > > ---
> > > > socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 79
> > > > setsockopt(79, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> > > > bind(79, {sa_family=AF_INET, sin_port=htons(56350), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
> > > > sendmsg(72, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\1", iov_len=1}], msg_iovlen=1, msg_control=[{cmsg_len=20, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS, cmsg_data=[79]}], msg_controllen=24, msg_flags=0}, 0) = 1
> > > > recvfrom(72, "\1", 1, 0, NULL, NULL) = 1
> > > > setsockopt(79, SOL_TCP, TCP_REPAIR_QUEUE, [2], 4) = 0
> > > > setsockopt(79, SOL_TCP, TCP_QUEUE_SEQ, [1788468535], 4) = 0
> > > > write(2, "77.6923: ", 977.6923: ) = 9
> > > > write(2, "Set send queue sequence for sock"..., 51Set send queue sequence for socket 79 to 1788468535) = 51
> > > > write(2, "\n", 1
> > > > ) = 1
> > > > setsockopt(79, SOL_TCP, TCP_REPAIR_QUEUE, [1], 4) = 0
> > > > setsockopt(79, SOL_TCP, TCP_QUEUE_SEQ, [115288604], 4) = 0
> > > > write(2, "77.6924: ", 977.6924: ) = 9
> > > > write(2, "Set receive queue sequence for s"..., 53Set receive queue sequence for socket 79 to 115288604) = 53
> > > > write(2, "\n", 1
> > > > ) = 1
> > > > connect(79,
> > > >     {sa_family=AF_INET, sin_port=htons(9091), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EADDRNOTAVAIL (Cannot assign requested address)
> > > > ---
> > > >
> > > > EADDRNOTAVAIL, according to the documentation, which seems to be
> > > > consistent with a glance at the implementation (that is, I must be
> > > > missing some issue in the kernel), should be returned on connect() if:
> > > >
> > > >   EADDRNOTAVAIL
> > > >       (Internet domain sockets) The socket referred to by
> > > >       sockfd had not previously been bound to an address
> > > >       and, upon attempting to bind it to an ephemeral
> > > >       port, it was determined that all port numbers in the
> > > >       ephemeral port range are currently in use. See the
> > > >       discussion of /proc/sys/net/ipv4/ip_local_port_range
> > > >       in ip(7).
> > > >
> > > > but well, of course it was bound.
> > > >
> > > > To a port, indeed, not a full address, that is, any (0.0.0.0) and
> > > > address port, but I think for the purposes of this description that
> > > > bind() call is enough.
> > >
> > > So, I was wondering if binding to 0.0.0.0 is sufficient for a repaired
> > > socket.
> >
> > It is.
> >
> > > Usually, of course, that 0.0.0.0 would be resolved to a real
> > > address at connect() time. But TCP_REPAIR's version of connect()
> > > bypasses a bunch of the usual connect logic, so maybe we need an
> > > explicit address here.
> >
> > No need.
>
> Ok.
>
> > > ...but that doesn't explain the difference between passt and your test
> > > implementation.
> >
> > The difference that actually matters is that the test implementation
> > terminates, and that has the equivalent effect of switching off repair
> > mode for the closed sockets, which frees up all the associated context,
> > including the port.
> >
> > Usually, there are no valid operations on closed sockets (not even
> > close()). This is the first exception I ever met: you can set
> > TCP_REPAIR_OFF.
>
> I'm still confused by the specific sequence of events that's causing
> the problem. If a socket is closed with close(2) it should no longer
> exist, so I don't see how you could even attempt to do anything with
> it.
>
> Do you mean that the socket is shutdown(RD|WR)? Or that it's been
> closed by passt, but not by passt-repair? Or the other way around?
>
> I'd kind of assume that you _must_ close the socket while still in
> repair mode, since we want it to go away on the source without
> attempting to FIN or RST or anything.

While the explanation for the issue is what you gave as comment to 8/20
(I need to close() the socket from passt-repair), let me answer here:
sure, I must close() it, and it was close()d by passt but not
passt-repair.

> > But there's a catch: you can't pass a closed socket in repair mode via
> > SCM_RIGHTS (well, I'm fairly sure nobody approached this level of
> > insanity before): you get EBADF (which is an understatement).
> >
> > And there's another catch: if you actually try to do that, even if it
> > fails, that has the same effect of clearing the socket entirely: you
> > free up the port.
>
> !?! this is even more baffling. Passing what's now an unrelated,
> unassigned integer as an fd is having some effect on a socket that was
> around!? If so that's a horrifying kernel bug.

Nah, most likely not. The EBADF on a close()d socket is a bit
questionable (it should be EINVAL? Or a -1 socket in the recipient?),
but other than that, the explanation is that passing that closed socket
caused EOF in passt-repair, and passt-repair would quit, solving the
issue.

-- 
Stefano