Date: Tue, 11 Mar 2025 22:55:32 +0100
From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson
Cc: passt-dev@passt.top
Subject: Re: [PATCH v2] flow, repair: Wait for a short while for passt-repair to connect
Message-ID: <20250311225532.7ddaa1cd@elisabeth>
References: <20250307224129.2789988-1-sbrivio@redhat.com>
Organization: Red Hat

On Tue, 11 Mar 2025 12:13:46 +1100
David Gibson wrote:

> On Fri, Mar 07, 2025 at 11:41:29PM +0100, Stefano Brivio wrote:
> > ...and time out after that. This will be needed because of an upcoming
> > change to passt-repair enabling it to start before passt is started,
> > on both source and target, by means of an inotify watch.
> > 
> > Once the inotify watch triggers, passt-repair will connect right away,
> > but we have no guarantees that the connection completes before we
> > start the migration process, so wait for it (for a reasonable amount
> > of time).
> > 
> > Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> 
> I still think it's ugly, of course, but I don't see a better way, so:
> 
> Reviewed-by: David Gibson
> 
> > ---
> > v2:
> > 
> > - Use 10 ms as timeout instead of 100 ms. Given that I'm unable to
> >   migrate a simple guest with 256 MiB of memory and no storage other
> >   than an initramfs in less than 4 milliseconds, at least on my test
> >   system (rather fast CPU threads and memory interface), I think that
> >   10 ms shouldn't make a big difference in case passt-repair is not
> >   available for whatever reason
> 
> So, IIUC, that 4ms is the *total* migration time.

Ah, no, that's passt-to-passt in the migrate/basic test, to have a fair
comparison. That is:

$ git diff
diff --git a/migrate.c b/migrate.c
index 0fca77b..3d36843 100644
--- a/migrate.c
+++ b/migrate.c
@@ -286,6 +286,13 @@ void migrate_handler(struct ctx *c)
 	if (c->device_state_fd < 0)
 		return;
 
+#include <time.h>
+	{
+		struct timespec now;
+		clock_gettime(CLOCK_REALTIME, &now);
+		err("tv: %li.%li", now.tv_sec, now.tv_nsec);
+	}
+
 	debug("Handling migration request from fd: %d, target: %d",
 	      c->device_state_fd, c->migrate_target);

$ grep tv\: test/test_logs/context_passt_*.log
test/test_logs/context_passt_1.log:tv: 1741729630.368652064
test/test_logs/context_passt_2.log:tv: 1741729630.378664420

In this case it's 10 ms, but I can sometimes get 7 ms. This is with
512 MiB, but with 256 MiB I typically get 5 to 6 ms, and sometimes
slightly more than 4 ms. One flow or zero flows seem to make little
difference.
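For reference, the bounded wait under discussion boils down to a poll()
with a short timeout on the socket passt-repair connects to. A minimal
sketch, assuming passt already has a listening UNIX domain socket; the
names repair_wait() and REPAIR_WAIT_MS are made up for illustration,
this is not the code from the actual patch:

#include <errno.h>
#include <poll.h>
#include <sys/socket.h>

#define REPAIR_WAIT_MS	10	/* v2 timeout: 10 ms instead of 100 ms */

/* repair_wait() - Wait for a short while for passt-repair to connect
 * @listen_fd:	Listening socket passt-repair will connect to
 *
 * Return: accepted connection socket, or -1 on timeout or error
 */
static int repair_wait(int listen_fd)
{
	struct pollfd pfd = { .fd = listen_fd, .events = POLLIN };
	int ret;

	do
		ret = poll(&pfd, 1, REPAIR_WAIT_MS);
	while (ret < 0 && errno == EINTR);	/* a signal restarts the wait */

	if (ret <= 0)	/* timeout or error */
		return -1;

	return accept(listen_fd, NULL, NULL);
}

On timeout this just returns -1 and migration proceeds without repair
mode, matching the "in case passt-repair is not available" fallback
mentioned above.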
> The concern here is
> not that we add to the total migration time, but that we add to the
> migration downtime, that is, the time the guest is not running
> anywhere. The downtime can be much smaller than the total migration
> time. Furthermore qemu has no way to account for this delay in its
> estimate of what the downtime will be - the time for transferring
> device state is pretty much assumed to be negligible in comparison to
> transferring guest memory contents. So, if qemu stops the guest at
> the point that the remaining memory transfer will just fit in the
> downtime limit, any delays we add will likely cause the downtime limit
> to be missed by that much.
> 
> Now, as it happens, the default downtime limit is 300ms, so an
> additional 10ms is probably fine (though 100ms really wasn't).
> Nonetheless the reasoning above isn't valid.

~50 ms is actually quite easy to get with a few (8) gigabytes of
memory, that's why 100 ms also looked fine to me, but sure, 10 ms
sounds more reasonable.

-- 
Stefano