Date: Tue, 18 Mar 2025 09:28:23 +0100
From: Stefano Brivio
To: David Gibson
CC: passt-dev@passt.top
Subject: Re: Migration failure across bridge
Message-ID: <20250318092823.0bca8887@elisabeth>
Organization: Red Hat
List-Id: Development discussion and patches for passt

On Tue, 18 Mar 2025 16:21:58 +1100
David Gibson wrote:

> Continued investigating the problem with migration failing across a
> bridge.
>
> Good news is I've found the problem... or at least one problem.

\o/

> Bad
> news is we'll have to change the migration stream format to fix it.

Whoops, sorry, my bad.
And now, RFC 7323, section 3.2, contrary to RFC 1323 (also section
3.2), requires that we keep sending timestamps if we negotiated them:

   Once TSopt has been successfully negotiated, that is both <SYN> and
   <SYN,ACK> contain TSopt, the TSopt MUST be sent in every non-<RST>
   segment for the duration of the connection

...so we can't just disable them for migrated flows.

Strictly speaking, I don't think it's necessary to define a new version
of the format, because I'm really really sure nobody is using this yet,
other than for tests. If you want to use this as a chance to play
with/test a version bump, we can do it.

My preference would be to keep this as v1 anyway for the moment,
regardless of the *non*-breakage, for simplicity. That is, whoops,
migration is broken on 2025_02_17.a1e48a0.

> The packets are being dropped in tcp_validate_incoming() due to a
> failed PAWS check (skb drop reason "TCP_RFC7323_PAWS"). That in turn
> looks to be because we don't preserve TCP timestamp state across the
> migration. We preserve _whether_ TCP timestamps are active on the
> connection (TCPOPT_TIMESTAMP entry in TCP_REPAIR_OPTIONS), but we
> don't preserve the current timestamp values (TCP_TIMESTAMP socket
> option). The equivalent CRIU code is
>
> https://github.com/checkpoint-restore/criu/blob/d18912fc88f3dc7bde5fdfa3575691977eb21753/soccr/soccr.c#L266
>
> and
>
> https://github.com/checkpoint-restore/criu/blob/d18912fc88f3dc7bde5fdfa3575691977eb21753/soccr/soccr.c#L572
>
> I'll work on writing a fix tomorrow.
>
> Not yet sure why we didn't hit this with a local migration. I'm
> guessing some part of being a local connection means we're bypassing
> the PAWS check.

The TCP_TIMESTAMP option is documented... not where it should be
documented, grr:

  https://criu.org/index.php?title=TCP_connection#Timestamp

and I _guess_ that two guests using kvm-clock as clock source might
actually have the same jiffies, and from this description, same jiffies,
same timestamps.
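
To make the failure mode concrete, here is a minimal sketch in C of the
two pieces discussed above: the PAWS comparison that rejects segments
whose timestamp appears to go backwards, and reading/restoring the
TCP_TIMESTAMP socket option around a migration. The function names
paws_rejects(), ts_dump() and ts_restore() are made up for illustration;
this is neither passt nor CRIU code. The PAWS test is a simplification of
RFC 7323, section 5.3 (it ignores further conditions such as the validity
of TS.Recent and the RST bit), and ts_restore() assumes the socket was
already put in TCP_REPAIR mode, which is my understanding of when the
kernel accepts a write to TCP_TIMESTAMP.

```c
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Simplified PAWS test (after RFC 7323, section 5.3): a segment is not
 * acceptable if its TSval is older than the last TSval we recorded from
 * the peer, comparing in 32-bit modular arithmetic so that wraparound
 * is handled. Returns 1 if the segment would be rejected.
 */
int paws_rejects(uint32_t ts_recent, uint32_t seg_tsval)
{
	return (int32_t)(seg_tsval - ts_recent) < 0;
}

/* Source side (hypothetical helper): read the current timestamp value
 * of socket s, to be carried in the migration stream.
 * Returns 0 on success, -1 with errno set on failure.
 */
int ts_dump(int s, uint32_t *ts)
{
	socklen_t sl = sizeof(*ts);

	return getsockopt(s, IPPROTO_TCP, TCP_TIMESTAMP, ts, &sl);
}

/* Target side (hypothetical helper): set the timestamp value on socket
 * s so that the TSval we send continues from where the source left off.
 * Assumption: s is in TCP_REPAIR mode at this point.
 * Returns 0 on success, -1 with errno set on failure.
 */
int ts_restore(int s, uint32_t ts)
{
	return setsockopt(s, IPPROTO_TCP, TCP_TIMESTAMP, &ts, sizeof(ts));
}
```

If the target starts from an unrelated timestamp base, the peer sees
something like paws_rejects(last_tsval_from_source, tsval_from_target)
returning 1, and drops our segments until the values catch up. Carrying
the value from ts_dump() in the stream and applying it with ts_restore()
is meant to avoid exactly that.
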
Perhaps in your nested case not all guests are using kvm-clock, or
there's something else to it.

-- 
Stefano