Date: Tue, 11 Mar 2025 22:55:32 +0100
From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson
Cc: passt-dev@passt.top
Subject: Re: [PATCH v2] flow, repair: Wait for a short while for passt-repair to connect
Message-ID: <20250311225532.7ddaa1cd@elisabeth>
References: <20250307224129.2789988-1-sbrivio@redhat.com>
Organization: Red Hat

On Tue, 11 Mar 2025 12:13:46 +1100
David Gibson wrote:

> On Fri, Mar 07, 2025 at 11:41:29PM +0100, Stefano Brivio wrote:
> > ...and time out after that. This will be needed because of an upcoming
> > change to passt-repair enabling it to start before passt is started,
> > on both source and target, by means of an inotify watch.
> > 
> > Once the inotify watch triggers, passt-repair will connect right away,
> > but we have no guarantees that the connection completes before we
> > start the migration process, so wait for it (for a reasonable amount
> > of time).
> > 
> > Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> 
> I still think it's ugly, of course, but I don't see a better way, so:
> 
> Reviewed-by: David Gibson
> 
> > ---
> > v2:
> > 
> > - Use 10 ms as timeout instead of 100 ms. Given that I'm unable to
> >   migrate a simple guest with 256 MiB of memory and no storage other
> >   than an initramfs in less than 4 milliseconds, at least on my test
> >   system (rather fast CPU threads and memory interface), I think that
> >   10 ms shouldn't make a big difference in case passt-repair is not
> >   available for whatever reason
> 
> So, IIUC, that 4ms is the *total* migration time.

Ah, no, that's passt-to-passt in the migrate/basic test, to have a fair
comparison. That is:

$ git diff
diff --git a/migrate.c b/migrate.c
index 0fca77b..3d36843 100644
--- a/migrate.c
+++ b/migrate.c
@@ -286,6 +286,13 @@ void migrate_handler(struct ctx *c)
 	if (c->device_state_fd < 0)
 		return;
 
+#include <time.h>
+	{
+		struct timespec now;
+		clock_gettime(CLOCK_REALTIME, &now);
+		err("tv: %li.%li", now.tv_sec, now.tv_nsec);
+	}
+
 	debug("Handling migration request from fd: %d, target: %d",
 	      c->device_state_fd, c->migrate_target);

$ grep tv\: test/test_logs/context_passt_*.log
test/test_logs/context_passt_1.log:tv: 1741729630.368652064
test/test_logs/context_passt_2.log:tv: 1741729630.378664420

In this case it's 10 ms, but I can sometimes get 7 ms. This is with
512 MiB, but with 256 MiB I typically get 5 to 6 ms, and sometimes
slightly more than 4 ms. One flow or zero flows seem to make little
difference.
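For reference, the bounded wait under discussion boils down to a poll()
with a short timeout on the socket passt-repair connects to. A minimal
sketch, assuming passt already has a listening UNIX domain socket; the
names repair_wait() and REPAIR_WAIT_MS are made up for illustration,
this is not the code from the actual patch:

#include <errno.h>
#include <poll.h>
#include <sys/socket.h>

#define REPAIR_WAIT_MS	10	/* v2 timeout: 10 ms instead of 100 ms */

/* repair_wait() - Wait for a short while for passt-repair to connect
 * @listen_fd:	Listening socket passt-repair will connect to
 *
 * Return: accepted connection socket, or -1 on timeout or error
 */
static int repair_wait(int listen_fd)
{
	struct pollfd pfd = { .fd = listen_fd, .events = POLLIN };
	int ret;

	do
		ret = poll(&pfd, 1, REPAIR_WAIT_MS);
	while (ret < 0 && errno == EINTR);	/* a signal restarts the wait */

	if (ret <= 0)	/* timeout or error */
		return -1;

	return accept(listen_fd, NULL, NULL);
}

On timeout this just returns -1 and migration proceeds without repair
mode, matching the "in case passt-repair is not available" fallback
mentioned above.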
> The concern here is
> not that we add to the total migration time, but that we add to the
> migration downtime, that is, the time the guest is not running
> anywhere. The downtime can be much smaller than the total migration
> time. Furthermore qemu has no way to account for this delay in its
> estimate of what the downtime will be - the time for transferring
> device state is pretty much assumed to be negligible in comparison to
> transferring guest memory contents. So, if qemu stops the guest at
> the point that the remaining memory transfer will just fit in the
> downtime limit, any delays we add will likely cause the downtime limit
> to be missed by that much.
> 
> Now, as it happens, the default downtime limit is 300ms, so an
> additional 10ms is probably fine (though 100ms really wasn't).
> Nonetheless the reasoning above isn't valid.

~50 ms is actually quite easy to get with a few (8) gigabytes of
memory, that's why 100 ms also looked fine to me, but sure, 10 ms
sounds more reasonable.

-- 
Stefano