Date: Thu, 13 Mar 2025 14:03:57 +1100
From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: passt-dev@passt.top
Subject: Re: [PATCH v2] flow, repair: Wait for a short while for passt-repair to connect
In-Reply-To: <20250312213910.059118d6@elisabeth>
References: <20250307224129.2789988-1-sbrivio@redhat.com>
	<20250311225532.7ddaa1cd@elisabeth>
	<20250312213910.059118d6@elisabeth>

On Wed, Mar 12, 2025 at 09:39:10PM +0100, Stefano Brivio wrote:
> On Wed, 12 Mar 2025 12:29:11 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
>
> > On Tue, Mar 11, 2025 at 10:55:32PM +0100, Stefano Brivio wrote:
> > > On Tue, 11 Mar 2025 12:13:46 +1100
> > > David Gibson <david@gibson.dropbear.id.au> wrote:

[snip]

> > > > Now, as it happens, the default downtime limit is 300ms, so an
> > > > additional 10ms is probably fine (though 100ms really wasn't).
> > > > Nonetheless the reasoning above isn't valid.
> > >
> > > ~50 ms is actually quite easy to get with a few (8) gigabytes of
> > > memory,
> >
> > 50ms as measured above? That's a bit surprising, because there's no
> > particular reason for it to depend on memory size. AFAICT
> > SET_DEVICE_STATE_FD is called close to immediately before actually
> > reading/writing the stream from the backend.
>
> Oops, right, this figure I had in mind actually came from a rather
> different measurement, that is, checking when the guest appeared to
> resume from traffic captures with iperf3 running.

Ok. That is a reasonable measure of the downtime, at least as long as
the guest is continuously trying to send, which it will be with
iperf3. That means adding a 100ms delay would triple the downtime,
which isn't really ok. With more RAM and/or less migration bandwidth,
the downtime would increase up to the 300ms limit; even in that case,
100ms would be a 33% (unaccounted for) increase, which still isn't
really ok.

> I definitely can't see this difference if I repeat the same
> measurement as above.
>
> > The memory size will of course affect the total migration time, and
> > maybe the downtime. As soon as qemu thinks it can transfer all
> > remaining RAM within its downtime limit, qemu will go to the
> > stopped phase. With a fast local to local connection, it's possible
> > qemu could enter that stopped phase almost immediately.
> >
> > > that's why 100 ms also looked fine to me, but sure, 10 ms
> > > sounds more reasonable.
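To put the arithmetic in one place, here's a back-of-envelope sketch
(plain C, purely illustrative, not code from the patch or from qemu;
the 50/100/300ms figures are just the ones discussed above):

#include <stdio.h>

int main(void)
{
	double measured_ms = 50.0;	/* downtime seen in iperf3 captures */
	double delay_ms = 100.0;	/* proposed wait for passt-repair */
	double limit_ms = 300.0;	/* qemu's default downtime limit */

	/* Measured case: the wait is pure overhead on top of ~50ms */
	printf("50ms case: %.0fms total (%.0fx the original downtime)\n",
	       measured_ms + delay_ms,
	       (measured_ms + delay_ms) / measured_ms);

	/* Worst case: qemu stops the guest only once it estimates the
	 * remaining RAM fits within the downtime limit, so with more
	 * RAM or less bandwidth the downtime approaches 300ms */
	printf("300ms case: %.0fms total (+%.0f%% unaccounted for)\n",
	       limit_ms + delay_ms, 100.0 * delay_ms / limit_ms);

	return 0;
}

Either way the extra wait eats a large, unaccounted-for chunk of the
downtime budget, which is why 10ms looks much safer than 100ms.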
-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson