* [PATCH v2] flow, repair: Wait for a short while for passt-repair to connect
@ 2025-03-07 22:41 Stefano Brivio
2025-03-11 1:13 ` David Gibson
0 siblings, 1 reply; 6+ messages in thread
From: Stefano Brivio @ 2025-03-07 22:41 UTC (permalink / raw)
To: passt-dev; +Cc: David Gibson
...and time out after that. This will be needed because of an upcoming
change to passt-repair enabling it to start before passt is started,
on both source and target, by means of an inotify watch.
Once the inotify watch triggers, passt-repair will connect right away,
but we have no guarantees that the connection completes before we
start the migration process, so wait for it (for a reasonable amount
of time).
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
v2:
- Use 10 ms as timeout instead of 100 ms. Given that I'm unable to
migrate a simple guest with 256 MiB of memory and no storage other
than an initramfs in less than 4 milliseconds, at least on my test
system (rather fast CPU threads and memory interface), I think that
10 ms shouldn't make a big difference in case passt-repair is not
available for whatever reason
- Move the static assert next to the initialisation of 'tv'
flow.c | 20 ++++++++++++++++++++
repair.c | 31 +++++++++++++++++++++++++++++++
repair.h | 1 +
3 files changed, 52 insertions(+)
diff --git a/flow.c b/flow.c
index 749c498..5e64b79 100644
--- a/flow.c
+++ b/flow.c
@@ -911,6 +911,21 @@ static int flow_migrate_source_rollback(struct ctx *c, unsigned bound, int ret)
return ret;
}
+/**
+ * flow_migrate_need_repair() - Do we need to set repair mode for any flow?
+ *
+ * Return: true if repair mode is needed, false otherwise
+ */
+static bool flow_migrate_need_repair(void)
+{
+ union flow *flow;
+
+ foreach_established_tcp_flow(flow)
+ return true;
+
+ return false;
+}
+
/**
* flow_migrate_repair_all() - Turn repair mode on or off for all flows
* @c: Execution context
@@ -966,6 +981,9 @@ int flow_migrate_source_pre(struct ctx *c, const struct migrate_stage *stage,
(void)stage;
(void)fd;
+ if (flow_migrate_need_repair())
+ repair_wait(c);
+
if ((rc = flow_migrate_repair_all(c, true)))
return -rc;
@@ -1083,6 +1101,8 @@ int flow_migrate_target(struct ctx *c, const struct migrate_stage *stage,
if (!count)
return 0;
+ repair_wait(c);
+
if ((rc = flow_migrate_repair_all(c, true)))
return -rc;
diff --git a/repair.c b/repair.c
index 3ee089f..ebeb248 100644
--- a/repair.c
+++ b/repair.c
@@ -27,6 +27,10 @@
#define SCM_MAX_FD 253 /* From Linux kernel (include/net/scm.h), not in UAPI */
+/* Wait for a while for TCP_REPAIR helper to connect if it's not there yet */
+#define REPAIR_ACCEPT_TIMEOUT_MS 10
+#define REPAIR_ACCEPT_TIMEOUT_US (REPAIR_ACCEPT_TIMEOUT_MS * 1000)
+
/* Pending file descriptors for next repair_flush() call, or command change */
static int repair_fds[SCM_MAX_FD];
@@ -138,6 +142,33 @@ void repair_handler(struct ctx *c, uint32_t events)
repair_close(c);
}
+/**
+ * repair_wait() - Wait (with timeout) for TCP_REPAIR helper to connect
+ * @c: Execution context
+ */
+void repair_wait(struct ctx *c)
+{
+ struct timeval tv = { .tv_sec = 0,
+ .tv_usec = (long)(REPAIR_ACCEPT_TIMEOUT_US) };
+ static_assert(REPAIR_ACCEPT_TIMEOUT_US < 1000 * 1000);
+
+ if (c->fd_repair >= 0 || c->fd_repair_listen == -1)
+ return;
+
+ if (setsockopt(c->fd_repair_listen, SOL_SOCKET, SO_RCVTIMEO,
+ &tv, sizeof(tv))) {
+ err_perror("Set timeout on TCP_REPAIR listening socket");
+ return;
+ }
+
+ repair_listen_handler(c, EPOLLIN);
+
+ tv.tv_usec = 0;
+ if (setsockopt(c->fd_repair_listen, SOL_SOCKET, SO_RCVTIMEO,
+ &tv, sizeof(tv)))
+ err_perror("Clear timeout on TCP_REPAIR listening socket");
+}
+
/**
* repair_flush() - Flush current set of sockets to helper, with current command
* @c: Execution context
diff --git a/repair.h b/repair.h
index de279d6..1d37922 100644
--- a/repair.h
+++ b/repair.h
@@ -10,6 +10,7 @@ void repair_sock_init(const struct ctx *c);
void repair_listen_handler(struct ctx *c, uint32_t events);
void repair_handler(struct ctx *c, uint32_t events);
void repair_close(struct ctx *c);
+void repair_wait(struct ctx *c);
int repair_flush(struct ctx *c);
int repair_set(struct ctx *c, int s, int cmd);
--
2.43.0
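For illustration, the pattern repair_wait() relies on (SO_RCVTIMEO on the listening socket, so that accept() gives up after a bounded wait instead of blocking) can be sketched as a small standalone program. This is a simplified example with a made-up socket path, trimmed error handling and a hypothetical accept_with_timeout() helper, not the actual passt code, which instead drives repair_listen_handler() from its event loop:

/* Sketch: wait up to 10 ms for a helper to connect, then give up */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/un.h>
#include <unistd.h>

#define ACCEPT_TIMEOUT_US	(10 * 1000)	/* 10 ms, as in the patch */

/* accept_with_timeout() - accept() with a bounded wait via SO_RCVTIMEO */
static int accept_with_timeout(int lfd)
{
	struct timeval tv = { .tv_sec = 0, .tv_usec = ACCEPT_TIMEOUT_US };
	int fd, err;

	/* Make accept() return -1 with EAGAIN/EWOULDBLOCK on timeout */
	if (setsockopt(lfd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)))
		return -1;

	fd = accept(lfd, NULL, NULL);
	err = (fd < 0) ? errno : 0;

	/* Restore blocking behaviour for the normal event loop path */
	tv.tv_usec = 0;
	if (setsockopt(lfd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)))
		perror("clear listener timeout");

	errno = err;
	return fd;
}

int main(void)
{
	struct sockaddr_un a = { .sun_family = AF_UNIX,
				 .sun_path = "/tmp/repair-demo.sock" };
	int lfd = socket(AF_UNIX, SOCK_STREAM, 0), fd;

	unlink(a.sun_path);
	if (lfd < 0 || bind(lfd, (struct sockaddr *)&a, sizeof(a)) ||
	    listen(lfd, 1)) {
		perror("listener setup");
		return 1;
	}

	fd = accept_with_timeout(lfd);
	if (fd < 0)
		fprintf(stderr, "no helper connected within 10 ms: %s\n",
			strerror(errno));
	else
		close(fd);

	close(lfd);
	unlink(a.sun_path);
	return 0;
}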
* Re: [PATCH v2] flow, repair: Wait for a short while for passt-repair to connect
2025-03-07 22:41 [PATCH v2] flow, repair: Wait for a short while for passt-repair to connect Stefano Brivio
@ 2025-03-11 1:13 ` David Gibson
2025-03-11 21:55 ` Stefano Brivio
0 siblings, 1 reply; 6+ messages in thread
From: David Gibson @ 2025-03-11 1:13 UTC (permalink / raw)
To: Stefano Brivio; +Cc: passt-dev
On Fri, Mar 07, 2025 at 11:41:29PM +0100, Stefano Brivio wrote:
> ...and time out after that. This will be needed because of an upcoming
> change to passt-repair enabling it to start before passt is started,
> on both source and target, by means of an inotify watch.
>
> Once the inotify watch triggers, passt-repair will connect right away,
> but we have no guarantees that the connection completes before we
> start the migration process, so wait for it (for a reasonable amount
> of time).
>
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
I still think it's ugly, of course, but I don't see a better way, so:
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> v2:
>
> - Use 10 ms as timeout instead of 100 ms. Given that I'm unable to
> migrate a simple guest with 256 MiB of memory and no storage other
> than an initramfs in less than 4 milliseconds, at least on my test
> system (rather fast CPU threads and memory interface), I think that
> 10 ms shouldn't make a big difference in case passt-repair is not
> available for whatever reason
So, IIUC, that 4ms is the *total* migration time. The concern here is
not that we add to the total migration time, but that we add to the
migration downtime, that is, the time the guest is not running
anywhere. The downtime can be much smaller than the total migration
time. Furthermore qemu has no way to account for this delay in its
estimate of what the downtime will be - the time for transferring
device state is pretty much assumed to be negligible in comparison to
transferring guest memory contents. So, if qemu stops the guest at
the point that the remaining memory transfer will just fit in the
downtime limit, any delays we add will likely cause the downtime limit
to be missed by that much.
Now, as it happens, the default downtime limit is 300ms, so an
additional 10ms is probably fine (though 100ms really wasn't).
Nonetheless the reasoning above isn't valid.
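To put rough, illustrative numbers on that, using only the figures quoted in this thread (qemu stops the guest once it expects the remaining RAM to transfer within the downtime limit, so any unaccounted delay lands on top of it):

  default downtime limit:          300 ms
  + unaccounted  10 ms wait  ->  ~310 ms downtime,  ~3% over the limit
  + unaccounted 100 ms wait  ->  ~400 ms downtime, ~33% over the limit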
> - Move the static assert next to the initialisation of 'tv'
>
> flow.c | 20 ++++++++++++++++++++
> repair.c | 31 +++++++++++++++++++++++++++++++
> repair.h | 1 +
> 3 files changed, 52 insertions(+)
>
> diff --git a/flow.c b/flow.c
> index 749c498..5e64b79 100644
> --- a/flow.c
> +++ b/flow.c
> @@ -911,6 +911,21 @@ static int flow_migrate_source_rollback(struct ctx *c, unsigned bound, int ret)
> return ret;
> }
>
> +/**
> + * flow_migrate_need_repair() - Do we need to set repair mode for any flow?
> + *
> + * Return: true if repair mode is needed, false otherwise
> + */
> +static bool flow_migrate_need_repair(void)
> +{
> + union flow *flow;
> +
> + foreach_established_tcp_flow(flow)
> + return true;
> +
> + return false;
> +}
> +
> /**
> * flow_migrate_repair_all() - Turn repair mode on or off for all flows
> * @c: Execution context
> @@ -966,6 +981,9 @@ int flow_migrate_source_pre(struct ctx *c, const struct migrate_stage *stage,
> (void)stage;
> (void)fd;
>
> + if (flow_migrate_need_repair())
> + repair_wait(c);
> +
> if ((rc = flow_migrate_repair_all(c, true)))
> return -rc;
>
> @@ -1083,6 +1101,8 @@ int flow_migrate_target(struct ctx *c, const struct migrate_stage *stage,
> if (!count)
> return 0;
>
> + repair_wait(c);
> +
> if ((rc = flow_migrate_repair_all(c, true)))
> return -rc;
>
> diff --git a/repair.c b/repair.c
> index 3ee089f..ebeb248 100644
> --- a/repair.c
> +++ b/repair.c
> @@ -27,6 +27,10 @@
>
> #define SCM_MAX_FD 253 /* From Linux kernel (include/net/scm.h), not in UAPI */
>
> +/* Wait for a while for TCP_REPAIR helper to connect if it's not there yet */
> +#define REPAIR_ACCEPT_TIMEOUT_MS 10
> +#define REPAIR_ACCEPT_TIMEOUT_US (REPAIR_ACCEPT_TIMEOUT_MS * 1000)
> +
> /* Pending file descriptors for next repair_flush() call, or command change */
> static int repair_fds[SCM_MAX_FD];
>
> @@ -138,6 +142,33 @@ void repair_handler(struct ctx *c, uint32_t events)
> repair_close(c);
> }
>
> +/**
> + * repair_wait() - Wait (with timeout) for TCP_REPAIR helper to connect
> + * @c: Execution context
> + */
> +void repair_wait(struct ctx *c)
> +{
> + struct timeval tv = { .tv_sec = 0,
> + .tv_usec = (long)(REPAIR_ACCEPT_TIMEOUT_US) };
> + static_assert(REPAIR_ACCEPT_TIMEOUT_US < 1000 * 1000);
> +
> + if (c->fd_repair >= 0 || c->fd_repair_listen == -1)
> + return;
> +
> + if (setsockopt(c->fd_repair_listen, SOL_SOCKET, SO_RCVTIMEO,
> + &tv, sizeof(tv))) {
> + err_perror("Set timeout on TCP_REPAIR listening socket");
> + return;
> + }
> +
> + repair_listen_handler(c, EPOLLIN);
> +
> + tv.tv_usec = 0;
> + if (setsockopt(c->fd_repair_listen, SOL_SOCKET, SO_RCVTIMEO,
> + &tv, sizeof(tv)))
> + err_perror("Clear timeout on TCP_REPAIR listening socket");
> +}
> +
> /**
> * repair_flush() - Flush current set of sockets to helper, with current command
> * @c: Execution context
> diff --git a/repair.h b/repair.h
> index de279d6..1d37922 100644
> --- a/repair.h
> +++ b/repair.h
> @@ -10,6 +10,7 @@ void repair_sock_init(const struct ctx *c);
> void repair_listen_handler(struct ctx *c, uint32_t events);
> void repair_handler(struct ctx *c, uint32_t events);
> void repair_close(struct ctx *c);
> +void repair_wait(struct ctx *c);
> int repair_flush(struct ctx *c);
> int repair_set(struct ctx *c, int s, int cmd);
>
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
* Re: [PATCH v2] flow, repair: Wait for a short while for passt-repair to connect
2025-03-11 1:13 ` David Gibson
@ 2025-03-11 21:55 ` Stefano Brivio
2025-03-12 1:29 ` David Gibson
0 siblings, 1 reply; 6+ messages in thread
From: Stefano Brivio @ 2025-03-11 21:55 UTC (permalink / raw)
To: David Gibson; +Cc: passt-dev
On Tue, 11 Mar 2025 12:13:46 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:
> On Fri, Mar 07, 2025 at 11:41:29PM +0100, Stefano Brivio wrote:
> > ...and time out after that. This will be needed because of an upcoming
> > change to passt-repair enabling it to start before passt is started,
> > on both source and target, by means of an inotify watch.
> >
> > Once the inotify watch triggers, passt-repair will connect right away,
> > but we have no guarantees that the connection completes before we
> > start the migration process, so wait for it (for a reasonable amount
> > of time).
> >
> > Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
>
> I still think it's ugly, of course, but I don't see a better way, so:
>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
>
> > ---
> > v2:
> >
> > - Use 10 ms as timeout instead of 100 ms. Given that I'm unable to
> > migrate a simple guest with 256 MiB of memory and no storage other
> > than an initramfs in less than 4 milliseconds, at least on my test
> > system (rather fast CPU threads and memory interface), I think that
> > 10 ms shouldn't make a big difference in case passt-repair is not
> > available for whatever reason
>
> So, IIUC, that 4ms is the *total* migration time.
Ah, no, that's passt-to-passt in the migrate/basic test, to have a fair
comparison. That is:
$ git diff
diff --git a/migrate.c b/migrate.c
index 0fca77b..3d36843 100644
--- a/migrate.c
+++ b/migrate.c
@@ -286,6 +286,13 @@ void migrate_handler(struct ctx *c)
if (c->device_state_fd < 0)
return;
+#include <time.h>
+ {
+ struct timespec now;
+ clock_gettime(CLOCK_REALTIME, &now);
+ err("tv: %li.%li", now.tv_sec, now.tv_nsec);
+ }
+
debug("Handling migration request from fd: %d, target: %d",
c->device_state_fd, c->migrate_target);
$ grep tv\: test/test_logs/context_passt_*.log
test/test_logs/context_passt_1.log:tv: 1741729630.368652064
test/test_logs/context_passt_2.log:tv: 1741729630.378664420
In this case it's 10 ms, but I can sometimes get 7 ms. This is with 512
MiB, but with 256 MiB I typically get 5 to 6 ms, and sometimes slightly
more than 4 ms. One flow or zero flows seem to make little difference.
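For reference, the ~10 ms figure here is simply the difference between the two timestamps above:

  1741729630.378664420 - 1741729630.368652064 = 0.010012356 s  (~10 ms)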
> The concern here is
> not that we add to the total migration time, but that we add to the
> migration downtime, that is, the time the guest is not running
> anywhere. The downtime can be much smaller than the total migration
> time. Furthermore qemu has no way to account for this delay in its
> estimate of what the downtime will be - the time for transferring
> device state is pretty much assumed to be neglible in comparison to
> transferring guest memory contents. So, if qemu stops the guest at
> the point that the remaining memory transfer will just fit in the
> downtime limit, any delays we add will likely cause the downtime limit
> to be missed by that much.
>
> Now, as it happens, the default downtime limit is 300ms, so an
> additional 10ms is probably fine (though 100ms really wasn't).
> Nonetheless the reasoning above isn't valid.
~50 ms is actually quite easy to get with a few (8) gigabytes of
memory, that's why 100 ms also looked fine to me, but sure, 10 ms
sounds more reasonable.
--
Stefano
* Re: [PATCH v2] flow, repair: Wait for a short while for passt-repair to connect
2025-03-11 21:55 ` Stefano Brivio
@ 2025-03-12 1:29 ` David Gibson
2025-03-12 20:39 ` Stefano Brivio
0 siblings, 1 reply; 6+ messages in thread
From: David Gibson @ 2025-03-12 1:29 UTC (permalink / raw)
To: Stefano Brivio; +Cc: passt-dev
On Tue, Mar 11, 2025 at 10:55:32PM +0100, Stefano Brivio wrote:
> On Tue, 11 Mar 2025 12:13:46 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
>
> > On Fri, Mar 07, 2025 at 11:41:29PM +0100, Stefano Brivio wrote:
> > > ...and time out after that. This will be needed because of an upcoming
> > > change to passt-repair enabling it to start before passt is started,
> > > on both source and target, by means of an inotify watch.
> > >
> > > Once the inotify watch triggers, passt-repair will connect right away,
> > > but we have no guarantees that the connection completes before we
> > > start the migration process, so wait for it (for a reasonable amount
> > > of time).
> > >
> > > Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> >
> > I still think it's ugly, of course, but I don't see a better way, so:
> >
> > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> >
> > > ---
> > > v2:
> > >
> > > - Use 10 ms as timeout instead of 100 ms. Given that I'm unable to
> > > migrate a simple guest with 256 MiB of memory and no storage other
> > > than an initramfs in less than 4 milliseconds, at least on my test
> > > system (rather fast CPU threads and memory interface), I think that
> > > 10 ms shouldn't make a big difference in case passt-repair is not
> > > available for whatever reason
> >
> > So, IIUC, that 4ms is the *total* migration time.
>
> Ah, no, that's passt-to-passt in the migrate/basic test, to have a fair
> comparison. That is:
>
> $ git diff
> diff --git a/migrate.c b/migrate.c
> index 0fca77b..3d36843 100644
> --- a/migrate.c
> +++ b/migrate.c
> @@ -286,6 +286,13 @@ void migrate_handler(struct ctx *c)
> if (c->device_state_fd < 0)
> return;
>
> +#include <time.h>
> + {
> + struct timespec now;
> + clock_gettime(CLOCK_REALTIME, &now);
> + err("tv: %li.%li", now.tv_sec, now.tv_nsec);
> + }
> +
> debug("Handling migration request from fd: %d, target: %d",
> c->device_state_fd, c->migrate_target);
Ah. That still doesn't really measure the guest downtime, for two reasons:
* It measures from start of migration on the source to start of
migration on the target, largely ignoring the actual duration of
passt's processing
* It ignores everything that happens during the final migration
phase *except* for passt itself
But, it is necessarily a lower bound on the downtime, which I guess is
enough in this instance.
> $ grep tv\: test/test_logs/context_passt_*.log
> test/test_logs/context_passt_1.log:tv: 1741729630.368652064
> test/test_logs/context_passt_2.log:tv: 1741729630.378664420
>
> In this case it's 10 ms, but I can sometimes get 7 ms. This is with 512
> MiB, but with 256 MiB I typically get 5 to 6 ms, and sometimes slightly
> more than 4 ms. One flow or zero flows seem to make little difference.
Of course, because both ends of the measurement take place before we
actually do anything. I wouldn't expect it to vary based on how much
we're doing. All this really measures is the latency from notifying
the source passt to notifying the target passt.
> > The concern here is
> > not that we add to the total migration time, but that we add to the
> > migration downtime, that is, the time the guest is not running
> > anywhere. The downtime can be much smaller than the total migration
> > time. Furthermore qemu has no way to account for this delay in its
> > estimate of what the downtime will be - the time for transferring
> > device state is pretty much assumed to be negligible in comparison to
> > transferring guest memory contents. So, if qemu stops the guest at
> > the point that the remaining memory transfer will just fit in the
> > downtime limit, any delays we add will likely cause the downtime limit
> > to be missed by that much.
> >
> > Now, as it happens, the default downtime limit is 300ms, so an
> > additional 10ms is probably fine (though 100ms really wasn't).
> > Nonetheless the reasoning above isn't valid.
>
> ~50 ms is actually quite easy to get with a few (8) gigabytes of
> memory,
50ms as measured above? That's a bit surprising, because there's no
particular reason for it to depend on memory size. AFAICT
SET_DEVICE_STATE_FD is called close to immediately before actually
reading/writing the stream from the backend.
The memory size will of course affect the total migration time, and
maybe the downtime. As soon as qemu thinks it can transfer all
remaining RAM within its downtime limit, qemu will go to the stopped
phase. With a fast local to local connection, it's possible qemu
could enter that stopped phase almost immediately.
> that's why 100 ms also looked fine to me, but sure, 10 ms
> sounds more reasonable.
>
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
* Re: [PATCH v2] flow, repair: Wait for a short while for passt-repair to connect
2025-03-12 1:29 ` David Gibson
@ 2025-03-12 20:39 ` Stefano Brivio
2025-03-13 3:03 ` David Gibson
0 siblings, 1 reply; 6+ messages in thread
From: Stefano Brivio @ 2025-03-12 20:39 UTC (permalink / raw)
To: David Gibson; +Cc: passt-dev
On Wed, 12 Mar 2025 12:29:11 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:
> On Tue, Mar 11, 2025 at 10:55:32PM +0100, Stefano Brivio wrote:
> > On Tue, 11 Mar 2025 12:13:46 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >
> > > On Fri, Mar 07, 2025 at 11:41:29PM +0100, Stefano Brivio wrote:
> > > > ...and time out after that. This will be needed because of an upcoming
> > > > change to passt-repair enabling it to start before passt is started,
> > > > on both source and target, by means of an inotify watch.
> > > >
> > > > Once the inotify watch triggers, passt-repair will connect right away,
> > > > but we have no guarantees that the connection completes before we
> > > > start the migration process, so wait for it (for a reasonable amount
> > > > of time).
> > > >
> > > > Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> > >
> > > I still think it's ugly, of course, but I don't see a better way, so:
> > >
> > > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> > >
> > > > ---
> > > > v2:
> > > >
> > > > - Use 10 ms as timeout instead of 100 ms. Given that I'm unable to
> > > > migrate a simple guest with 256 MiB of memory and no storage other
> > > > than an initramfs in less than 4 milliseconds, at least on my test
> > > > system (rather fast CPU threads and memory interface), I think that
> > > > 10 ms shouldn't make a big difference in case passt-repair is not
> > > > available for whatever reason
> > >
> > > So, IIUC, that 4ms is the *total* migration time.
> >
> > Ah, no, that's passt-to-passt in the migrate/basic test, to have a fair
> > comparison. That is:
> >
> > $ git diff
> > diff --git a/migrate.c b/migrate.c
> > index 0fca77b..3d36843 100644
> > --- a/migrate.c
> > +++ b/migrate.c
> > @@ -286,6 +286,13 @@ void migrate_handler(struct ctx *c)
> > if (c->device_state_fd < 0)
> > return;
> >
> > +#include <time.h>
> > + {
> > + struct timespec now;
> > + clock_gettime(CLOCK_REALTIME, &now);
> > + err("tv: %li.%li", now.tv_sec, now.tv_nsec);
> > + }
> > +
> > debug("Handling migration request from fd: %d, target: %d",
> > c->device_state_fd, c->migrate_target);
>
> Ah. That still doesn't really measure the guest downtime, for two reasons:
> * It measures from start of migration on the source to start of
> migration on the target, largely ignoring the actual duration of
> passt's processing
> * It ignores everything that happens during the final migration
> phase *except* for passt itself
>
> But, it is necessarily a lower bound on the downtime, which I guess is
> enough in this instance.
>
> > $ grep tv\: test/test_logs/context_passt_*.log
> > test/test_logs/context_passt_1.log:tv: 1741729630.368652064
> > test/test_logs/context_passt_2.log:tv: 1741729630.378664420
> >
> > In this case it's 10 ms, but I can sometimes get 7 ms. This is with 512
> > MiB, but with 256 MiB I typically get 5 to 6 ms, and sometimes slightly
> > more than 4 ms. One flow or zero flows seem to make little difference.
>
> Of course, because both ends of the measurement take place before we
> actually do anything. I wouldn't expect it to vary based on how much
> we're doing. All this really measures is the latency from notifying
> the source passt to notifying the target passt.
>
> > > The concern here is
> > > not that we add to the total migration time, but that we add to the
> > > migration downtime, that is, the time the guest is not running
> > > anywhere. The downtime can be much smaller than the total migration
> > > time. Furthermore qemu has no way to account for this delay in its
> > > estimate of what the downtime will be - the time for transferring
> > > device state is pretty much assumed to be negligible in comparison to
> > > transferring guest memory contents. So, if qemu stops the guest at
> > > the point that the remaining memory transfer will just fit in the
> > > downtime limit, any delays we add will likely cause the downtime limit
> > > to be missed by that much.
> > >
> > > Now, as it happens, the default downtime limit is 300ms, so an
> > > additional 10ms is probably fine (though 100ms really wasn't).
> > > Nonetheless the reasoning above isn't valid.
> >
> > ~50 ms is actually quite easy to get with a few (8) gigabytes of
> > memory,
>
> 50ms as measured above? That's a bit surprising, because there's no
> particular reason for it to depend on memory size. AFAICT
> SET_DEVICE_STATE_FD is called close to immediately before actually
> reading/writing the stream from the backend.
Oops, right, this figure I had in mind actually came from a rather
different measurement, that is, checking when the guest appeared to
resume from traffic captures with iperf3 running.
I definitely can't see this difference if I repeat the same measurement
as above.
> The memory size will of course affect the total migration time, and
> maybe the downtime. As soon as qemu thinks it can transfer all
> remaining RAM within its downtime limit, qemu will go to the stopped
> phase. With a fast local to local connection, it's possible qemu
> could enter that stopped phase almost immediately.
>
> > that's why 100 ms also looked fine to me, but sure, 10 ms
> > sounds more reasonable.
--
Stefano
* Re: [PATCH v2] flow, repair: Wait for a short while for passt-repair to connect
2025-03-12 20:39 ` Stefano Brivio
@ 2025-03-13 3:03 ` David Gibson
0 siblings, 0 replies; 6+ messages in thread
From: David Gibson @ 2025-03-13 3:03 UTC (permalink / raw)
To: Stefano Brivio; +Cc: passt-dev
On Wed, Mar 12, 2025 at 09:39:10PM +0100, Stefano Brivio wrote:
> On Wed, 12 Mar 2025 12:29:11 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
>
> > On Tue, Mar 11, 2025 at 10:55:32PM +0100, Stefano Brivio wrote:
> > > On Tue, 11 Mar 2025 12:13:46 +1100
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
[snip]
> > > > Now, as it happens, the default downtime limit is 300ms, so an
> > > > additional 10ms is probably fine (though 100ms really wasn't).
> > > > Nonetheless the reasoning above isn't valid.
> > >
> > > ~50 ms is actually quite easy to get with a few (8) gigabytes of
> > > memory,
> >
> > 50ms as measured above? That's a bit surprising, because there's no
> > particular reason for it to depend on memory size. AFAICT
> > SET_DEVICE_STATE_FD is called close to immediately before actually
> > reading/writing the stream from the backend.
>
> Oops, right, this figure I had in mind actually came from a rather
> different measurement, that is, checking when the guest appeared to
> resume from traffic captures with iperf3 running.
Ok. That is a reasonable measure of the downtime, at least as long as
the guest is continuously trying to send, which it will with iperf3.
Which means by adding a 100ms delay, you'd triple the downtime, which
isn't really ok. With more RAM and/or smaller migration bandwidth
this would increase up to the 300ms limit. In that case 100ms would
still be a 33% (unaccounted for) increase, which still isn't really
ok.
> I definitely can't see this difference if I repeat the same measurement
> as above.
>
> > The memory size will of course affect the total migration time, and
> > maybe the downtime. As soon as qemu thinks it can transfer all
> > remaining RAM within its downtime limit, qemu will go to the stopped
> > phase. With a fast local to local connection, it's possible qemu
> > could enter that stopped phase almost immediately.
> >
> > > that's why 100 ms also looked fine to me, but sure, 10 ms
> > > sounds more reasonable.
>
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson