public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: passt-dev@passt.top
Subject: Re: [PATCH] flow, repair: Wait for a short while for passt-repair to connect
Date: Fri, 7 Mar 2025 13:59:08 +1100	[thread overview]
Message-ID: <Z8pg_KEeH_EqeMGa@zatzit> (raw)
In-Reply-To: <20250306190051.3924372-1-sbrivio@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 6579 bytes --]

On Thu, Mar 06, 2025 at 08:00:51PM +0100, Stefano Brivio wrote:
> ...and time out after that. This will be needed because of an upcoming
> change to passt-repair enabling it to start before passt is started,
> on both source and target, by means of an inotify watch.
> 
> Once the inotify watch triggers, passt-repair will connect right away,
> but we have no guarantees that the connection completes before we
> start the migration process, so wait for it (for a reasonable amount
> of time).
> 
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>

Hmm.  So I'm pretty confident this patch does what it sets out to do.
I'm increasingly dubious about this direction.

> ---
> I'm still working on the passt-repair change mentioned above. It's
> turning out to be a bit trickier than expected because to support
> typical libvirt migration usage, on the target, we also need to be
> able to wait for creation of a directory and, separately, for a socket
> file being added inside it.

Oof.  This significantly amplifies my concerns about the security
model: one of the most obvious ways to to bound who could potentially
convince passt-repair to connect is to make sure the directory exists
before hand: if you don't have permissions to create the socket in
that directory, passt-repair won't connect.

If we're also waiting for directories to appear, it makes this even
harder to analyse in terms of whether someone who's not supposed to
could convince passt-repair to connect to them.

Given the complexity you're hitting here, I'm really starting to think
maybe we should reverse our model: have passt connect to passt-repair.
The security model is easier to understand: permission to the socket
and its directory is permission to use TCP_REPAIR.  There's no awkward
waiting on either side - I think it might actually be less total code
chyurn.  Judging by the early queries from the Kubevirt guys it seems
to be the more obvious model people would expect.  On the other hand,
configuring the socket path is still an issue, and being able to tell
passt where to connect to through libvirt might become more important
than it is now.

> However, this part looks solid and it should do what we need. I
> tested it by starting passt-repair after sleep(1) in migrate/* tests,
> changing this timeout to a minute, on either source, target, or both.
> 
>  flow.c   | 20 ++++++++++++++++++++
>  repair.c | 31 +++++++++++++++++++++++++++++++
>  repair.h |  1 +
>  3 files changed, 52 insertions(+)
> 
> diff --git a/flow.c b/flow.c
> index 749c498..5e64b79 100644
> --- a/flow.c
> +++ b/flow.c
> @@ -911,6 +911,21 @@ static int flow_migrate_source_rollback(struct ctx *c, unsigned bound, int ret)
>  	return ret;
>  }
>  
> +/**
> + * flow_migrate_need_repair() - Do we need to set repair mode for any flow?
> + *
> + * Return: true if repair mode is needed, false otherwise
> + */
> +static bool flow_migrate_need_repair(void)
> +{
> +	union flow *flow;
> +
> +	foreach_established_tcp_flow(flow)
> +		return true;
> +
> +	return false;
> +}
> +
>  /**
>   * flow_migrate_repair_all() - Turn repair mode on or off for all flows
>   * @c:		Execution context
> @@ -966,6 +981,9 @@ int flow_migrate_source_pre(struct ctx *c, const struct migrate_stage *stage,
>  	(void)stage;
>  	(void)fd;
>  
> +	if (flow_migrate_need_repair())
> +		repair_wait(c);
> +
>  	if ((rc = flow_migrate_repair_all(c, true)))
>  		return -rc;
>  
> @@ -1083,6 +1101,8 @@ int flow_migrate_target(struct ctx *c, const struct migrate_stage *stage,
>  	if (!count)
>  		return 0;
>  
> +	repair_wait(c);
> +
>  	if ((rc = flow_migrate_repair_all(c, true)))
>  		return -rc;
>  
> diff --git a/repair.c b/repair.c
> index 3ee089f..ce5a4a4 100644
> --- a/repair.c
> +++ b/repair.c
> @@ -27,6 +27,11 @@
>  
>  #define SCM_MAX_FD 253 /* From Linux kernel (include/net/scm.h), not in UAPI */
>  
> +/* Wait for a while for TCP_REPAIR helper to connect if it's not there yet */
> +#define REPAIR_ACCEPT_TIMEOUT_MS	100

100ms is a pretty long time in the context of a stopped guest.  I
guess is is a worst case.  We could possibly mitigate it by doing the
wait when memory change logging goes on (so migration is starting, bu
the guest is not get stopped).  However at that point we can't rely on
flow_migrate_need_repair(), because we might have no flows, but one is
opened before the guest is stopped.

> +#define REPAIR_ACCEPT_TIMEOUT_US	(REPAIR_ACCEPT_TIMEOUT_MS * 1000)
> +static_assert(REPAIR_ACCEPT_TIMEOUT_US < 1000 * 1000);

In this context the static_assert() seems like a pretty arbitrary
bound.  To make the reason for it clearer, I'd suggest moving it to
where we construct the timeval.

> +
>  /* Pending file descriptors for next repair_flush() call, or command change */
>  static int repair_fds[SCM_MAX_FD];
>  
> @@ -138,6 +143,32 @@ void repair_handler(struct ctx *c, uint32_t events)
>  	repair_close(c);
>  }
>  
> +/**
> + * repair_wait() - Wait (with timeout) for TCP_REPAIR helper to connect
> + * @c:		Execution context
> + */
> +void repair_wait(struct ctx *c)
> +{
> +	struct timeval tv = { .tv_sec = 0,
> +			      .tv_usec = (long)(REPAIR_ACCEPT_TIMEOUT_US) };
> +
> +	if (c->fd_repair >= 0 || c->fd_repair_listen == -1)
> +		return;
> +
> +	if (setsockopt(c->fd_repair_listen, SOL_SOCKET, SO_RCVTIMEO,
> +		       &tv, sizeof(tv))) {
> +		err_perror("Set timeout on TCP_REPAIR listening socket");
> +		return;
> +	}
> +
> +	repair_listen_handler(c, EPOLLIN);
> +
> +	tv.tv_usec = 0;
> +	if (setsockopt(c->fd_repair_listen, SOL_SOCKET, SO_RCVTIMEO,
> +		       &tv, sizeof(tv)))
> +		err_perror("Clear timeout on TCP_REPAIR listening socket");
> +}
> +
>  /**
>   * repair_flush() - Flush current set of sockets to helper, with current command
>   * @c:		Execution context
> diff --git a/repair.h b/repair.h
> index de279d6..1d37922 100644
> --- a/repair.h
> +++ b/repair.h
> @@ -10,6 +10,7 @@ void repair_sock_init(const struct ctx *c);
>  void repair_listen_handler(struct ctx *c, uint32_t events);
>  void repair_handler(struct ctx *c, uint32_t events);
>  void repair_close(struct ctx *c);
> +void repair_wait(struct ctx *c);
>  int repair_flush(struct ctx *c);
>  int repair_set(struct ctx *c, int s, int cmd);
>  

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2025-03-07  2:59 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-06 19:00 [PATCH] flow, repair: Wait for a short while for passt-repair to connect Stefano Brivio
2025-03-07  2:59 ` David Gibson [this message]
2025-03-07  6:14   ` Stefano Brivio
2025-03-07  8:34     ` David Gibson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z8pg_KEeH_EqeMGa@zatzit \
    --to=david@gibson.dropbear.id.au \
    --cc=passt-dev@passt.top \
    --cc=sbrivio@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).