From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 12 Mar 2025 21:39:10 +0100
From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson
Subject: Re: [PATCH v2] flow, repair: Wait for a short while for passt-repair to connect
Message-ID: <20250312213910.059118d6@elisabeth>
In-Reply-To:
References: <20250307224129.2789988-1-sbrivio@redhat.com>
 <20250311225532.7ddaa1cd@elisabeth>
Organization: Red Hat
CC: passt-dev@passt.top

On Wed, 12 Mar 2025 12:29:11 +1100
David Gibson wrote:

> On Tue, Mar 11, 2025 at 10:55:32PM +0100, Stefano Brivio wrote:
> > On Tue, 11 Mar 2025 12:13:46 +1100
> > David Gibson wrote:
> > 
> > > On Fri, Mar 07, 2025 at 11:41:29PM +0100, Stefano Brivio wrote:
> > > > ...and time out after that. This will be needed because of an upcoming
> > > > change to passt-repair enabling it to start before passt is started,
> > > > on both source and target, by means of an inotify watch.
> > > > 
> > > > Once the inotify watch triggers, passt-repair will connect right away,
> > > > but we have no guarantees that the connection completes before we
> > > > start the migration process, so wait for it (for a reasonable amount
> > > > of time).
> > > > 
> > > > Signed-off-by: Stefano Brivio
> > > 
> > > I still think it's ugly, of course, but I don't see a better way, so:
> > > 
> > > Reviewed-by: David Gibson
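
To make the mechanism concrete: assuming passt already has the repair
socket bound and listening, the wait boils down to something like the
sketch below. This is illustrative only, not the actual passt code; the
helper name and the constant are made up:

	/* Sketch: give the peer (passt-repair) up to 10 ms to connect to a
	 * listening UNIX domain socket, instead of blocking the migration
	 * indefinitely.  Returns the accepted socket, or -1 on timeout/error.
	 */
	#include <poll.h>
	#include <sys/socket.h>

	#define REPAIR_ACCEPT_TIMEOUT_MS	10

	static int accept_repair_conn(int listen_fd)
	{
		struct pollfd pfd = { .fd = listen_fd, .events = POLLIN };

		/* poll() returns 0 on timeout, -1 on error, and a positive
		 * value if a connection is pending on listen_fd
		 */
		if (poll(&pfd, 1, REPAIR_ACCEPT_TIMEOUT_MS) <= 0)
			return -1;

		return accept(listen_fd, NULL, NULL);
	}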

> > > > ---
> > > > v2:
> > > > 
> > > > - Use 10 ms as timeout instead of 100 ms. Given that I'm unable to
> > > >   migrate a simple guest with 256 MiB of memory and no storage other
> > > >   than an initramfs in less than 4 milliseconds, at least on my test
> > > >   system (rather fast CPU threads and memory interface), I think that
> > > >   10 ms shouldn't make a big difference in case passt-repair is not
> > > >   available for whatever reason
> > > 
> > > So, IIUC, that 4ms is the *total* migration time.
> > 
> > Ah, no, that's passt-to-passt in the migrate/basic test, to have a fair
> > comparison. That is:
> > 
> > $ git diff
> > diff --git a/migrate.c b/migrate.c
> > index 0fca77b..3d36843 100644
> > --- a/migrate.c
> > +++ b/migrate.c
> > @@ -286,6 +286,13 @@ void migrate_handler(struct ctx *c)
> >  	if (c->device_state_fd < 0)
> >  		return;
> >  
> > +#include <time.h>
> > +	{
> > +		struct timespec now;
> > +		clock_gettime(CLOCK_REALTIME, &now);
> > +		err("tv: %li.%li", now.tv_sec, now.tv_nsec);
> > +	}
> > +
> >  	debug("Handling migration request from fd: %d, target: %d",
> >  	      c->device_state_fd, c->migrate_target);
> 
> Ah. That still doesn't really measure the guest downtime, for two
> reasons:
>  * It measures from start of migration on the source to start of
>    migration on the target, largely ignoring the actual duration of
>    passt's processing
>  * It ignores everything that happens during the final migration
>    phase *except* for passt itself
> 
> But, it is necessarily a lower bound on the downtime, which I guess is
> enough in this instance.
> 
> > $ grep tv\: test/test_logs/context_passt_*.log
> > test/test_logs/context_passt_1.log:tv: 1741729630.368652064
> > test/test_logs/context_passt_2.log:tv: 1741729630.378664420
> > 
> > In this case it's 10 ms, but I can sometimes get 7 ms. This is with 512
> > MiB, but with 256 MiB I typically get 5 to 6 ms, and sometimes slightly
> > more than 4 ms. One flow or zero flows seem to make little difference.
> 
> Of course, because both ends of the measurement take place before we
> actually do anything. I wouldn't expect it to vary based on how much
> we're doing. All this really measures is the latency from notifying
> the source passt to notifying the target passt.
> 
> > > The concern here is
> > > not that we add to the total migration time, but that we add to the
> > > migration downtime, that is, the time the guest is not running
> > > anywhere. The downtime can be much smaller than the total migration
> > > time. Furthermore qemu has no way to account for this delay in its
> > > estimate of what the downtime will be - the time for transferring
> > > device state is pretty much assumed to be negligible in comparison to
> > > transferring guest memory contents. So, if qemu stops the guest at
> > > the point that the remaining memory transfer will just fit in the
> > > downtime limit, any delays we add will likely cause the downtime limit
> > > to be missed by that much.
> > > 
> > > Now, as it happens, the default downtime limit is 300ms, so an
> > > additional 10ms is probably fine (though 100ms really wasn't).
> > > Nonetheless the reasoning above isn't valid.
> > 
> > ~50 ms is actually quite easy to get with a few (8) gigabytes of
> > memory,
> 
> 50ms as measured above? That's a bit surprising, because there's no
> particular reason for it to depend on memory size. AFAICT
> SET_DEVICE_STATE_FD is called close to immediately before actually
> reading/writing the stream from the backend.

Oops, right, this figure I had in mind actually came from a rather
different measurement, that is, checking when the guest appeared to
resume from traffic captures with iperf3 running. I definitely can't
see this difference if I repeat the same measurement as above.

> The memory size will of course affect the total migration time, and
> maybe the downtime. As soon as qemu thinks it can transfer all
> remaining RAM within its downtime limit, qemu will go to the stopped
> phase. With a fast local to local connection, it's possible qemu
> could enter that stopped phase almost immediately.
> 
> > that's why 100 ms also looked fine to me, but sure, 10 ms
> > sounds more reasonable.

-- 
Stefano
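
P.S. For completeness, the 10 ms figure quoted above is simply the delta
between the two CLOCK_REALTIME stamps in the test logs. A standalone
example, with the values copied from the grep output above and a made-up
helper name:

	#include <stdio.h>
	#include <time.h>

	/* Difference b - a between two CLOCK_REALTIME stamps, in milliseconds
	 * (truncating; good enough to read off the order of magnitude)
	 */
	static long long stamp_delta_ms(const struct timespec *a,
					const struct timespec *b)
	{
		return (b->tv_sec - a->tv_sec) * 1000LL +
		       (b->tv_nsec - a->tv_nsec) / 1000000LL;
	}

	int main(void)
	{
		struct timespec src = { .tv_sec = 1741729630, .tv_nsec = 368652064 };
		struct timespec dst = { .tv_sec = 1741729630, .tv_nsec = 378664420 };

		printf("%lld ms\n", stamp_delta_ms(&src, &dst));	/* prints "10 ms" */
		return 0;
	}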