From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: passt-dev@passt.top
Subject: Re: [PATCH] flow, repair: Wait for a short while for passt-repair to connect
Date: Fri, 7 Mar 2025 07:14:45 +0100 [thread overview]
Message-ID: <20250307071445.7c26b2d1@elisabeth> (raw)
In-Reply-To: <Z8pg_KEeH_EqeMGa@zatzit>
On Fri, 7 Mar 2025 13:59:08 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:
> On Thu, Mar 06, 2025 at 08:00:51PM +0100, Stefano Brivio wrote:
> > ...and time out after that. This will be needed because of an upcoming
> > change to passt-repair enabling it to start before passt is started,
> > on both source and target, by means of an inotify watch.
> >
> > Once the inotify watch triggers, passt-repair will connect right away,
> > but we have no guarantees that the connection completes before we
> > start the migration process, so wait for it (for a reasonable amount
> > of time).
> >
> > Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
>
> Hmm. So I'm pretty confident this patch does what it sets out to do.
> I'm increasingly dubious about this direction.
>
> > ---
> > I'm still working on the passt-repair change mentioned above. It's
> > turning out to be a bit trickier than expected because to support
> > typical libvirt migration usage, on the target, we also need to be
> > able to wait for creation of a directory and, separately, for a socket
> > file being added inside it.
>
> Oof. This significantly amplifies my concerns about the security
> model: one of the most obvious ways to to bound who could potentially
> convince passt-repair to connect is to make sure the directory exists
> before hand: if you don't have permissions to create the socket in
> that directory, passt-repair won't connect.
>
> If we're also waiting for directories to appear, it makes this even
> harder to analyse in terms of whether someone who's not supposed to
> could convince passt-repair to connect to them.
I don't think your concern is particularly founded because in a general
case we have LSM policies preventing other processes from creating that
file in that directory, and in KubeVirt's case we simply have no other
processes that can access the pod mount namespace and do that.
But that part of the change is not strictly necessary, surely not for a
proper libvirt integration (libvirt knows the path) and also not for
the integration for KubeVirt (KubeVirt can build the path).
So, to dodge the obstructionism here, I'll happily drop that part. It
also saves me time.
> Given the complexity you're hitting here, I'm really starting to think
> maybe we should reverse our model: have passt connect to passt-repair.
> The security model is easier to understand: permission to the socket
> and its directory is permission to use TCP_REPAIR. There's no awkward
> waiting on either side - I think it might actually be less total code
> chyurn.
Of course it's a bit less code, but it turns a potential concern into
the actually expected situation: you have something permanently around
that allows any process (because you have no practical way to restrict
it, not in distributions and not in KubeVirt) to set/clear repair mode
on sockets. Unless you start introducing authentication, that is.
In the current model it's still a one-off operation, even with an
additional wait. You start passt-repair once to do its job, and once it
does the job, it terminates.
> Judging by the early queries from the Kubevirt guys it seems
> to be the more obvious model people would expect.
Note that you talked to people taking care of (almost exclusively)
networking there. Knowing myself a fair bit of code, architecture, and
history of security weaknesses and discussions around them, I'm pretty
sure you can't do the opposite.
The long-term solutions are anyway having libvirt provide a phase or
blocking event where this can be done, and, at least in KubeVirt's
case, eventually handle this with seitan.
> On the other hand,
> configuring the socket path is still an issue, and being able to tell
> passt where to connect to through libvirt might become more important
> than it is now.
>
> > However, this part looks solid and it should do what we need. I
> > tested it by starting passt-repair after sleep(1) in migrate/* tests,
> > changing this timeout to a minute, on either source, target, or both.
> >
> > flow.c | 20 ++++++++++++++++++++
> > repair.c | 31 +++++++++++++++++++++++++++++++
> > repair.h | 1 +
> > 3 files changed, 52 insertions(+)
> >
> > diff --git a/flow.c b/flow.c
> > index 749c498..5e64b79 100644
> > --- a/flow.c
> > +++ b/flow.c
> > @@ -911,6 +911,21 @@ static int flow_migrate_source_rollback(struct ctx *c, unsigned bound, int ret)
> > return ret;
> > }
> >
> > +/**
> > + * flow_migrate_need_repair() - Do we need to set repair mode for any flow?
> > + *
> > + * Return: true if repair mode is needed, false otherwise
> > + */
> > +static bool flow_migrate_need_repair(void)
> > +{
> > + union flow *flow;
> > +
> > + foreach_established_tcp_flow(flow)
> > + return true;
> > +
> > + return false;
> > +}
> > +
> > /**
> > * flow_migrate_repair_all() - Turn repair mode on or off for all flows
> > * @c: Execution context
> > @@ -966,6 +981,9 @@ int flow_migrate_source_pre(struct ctx *c, const struct migrate_stage *stage,
> > (void)stage;
> > (void)fd;
> >
> > + if (flow_migrate_need_repair())
> > + repair_wait(c);
> > +
> > if ((rc = flow_migrate_repair_all(c, true)))
> > return -rc;
> >
> > @@ -1083,6 +1101,8 @@ int flow_migrate_target(struct ctx *c, const struct migrate_stage *stage,
> > if (!count)
> > return 0;
> >
> > + repair_wait(c);
> > +
> > if ((rc = flow_migrate_repair_all(c, true)))
> > return -rc;
> >
> > diff --git a/repair.c b/repair.c
> > index 3ee089f..ce5a4a4 100644
> > --- a/repair.c
> > +++ b/repair.c
> > @@ -27,6 +27,11 @@
> >
> > #define SCM_MAX_FD 253 /* From Linux kernel (include/net/scm.h), not in UAPI */
> >
> > +/* Wait for a while for TCP_REPAIR helper to connect if it's not there yet */
> > +#define REPAIR_ACCEPT_TIMEOUT_MS 100
>
> 100ms is a pretty long time in the context of a stopped guest. I
> guess is is a worst case.
It's actually not a case I can reproduce at all, as expected:
passt-repair connects quickly enough once the socket is available.
> We could possibly mitigate it by doing the
Mitigate what exactly? The expected mode of operation is with
passt-repair available. The failure handling is a fallback. I can turn
it into 1 ms, it just seems to increase risk for no good reason.
> wait when memory change logging goes on (so migration is starting, bu
> the guest is not get stopped). However at that point we can't rely on
> flow_migrate_need_repair(), because we might have no flows, but one is
> opened before the guest is stopped.
Right, that complicates things quite a lot.
> > +#define REPAIR_ACCEPT_TIMEOUT_US (REPAIR_ACCEPT_TIMEOUT_MS * 1000)
> > +static_assert(REPAIR_ACCEPT_TIMEOUT_US < 1000 * 1000);
>
> In this context the static_assert() seems like a pretty arbitrary
> bound. To make the reason for it clearer, I'd suggest moving it to
> where we construct the timeval.
Oh, I didn't think of that. Sure, I'll move it.
> > +
> > /* Pending file descriptors for next repair_flush() call, or command change */
> > static int repair_fds[SCM_MAX_FD];
> >
> > @@ -138,6 +143,32 @@ void repair_handler(struct ctx *c, uint32_t events)
> > repair_close(c);
> > }
> >
> > +/**
> > + * repair_wait() - Wait (with timeout) for TCP_REPAIR helper to connect
> > + * @c: Execution context
> > + */
> > +void repair_wait(struct ctx *c)
> > +{
> > + struct timeval tv = { .tv_sec = 0,
> > + .tv_usec = (long)(REPAIR_ACCEPT_TIMEOUT_US) };
> > +
> > + if (c->fd_repair >= 0 || c->fd_repair_listen == -1)
> > + return;
> > +
> > + if (setsockopt(c->fd_repair_listen, SOL_SOCKET, SO_RCVTIMEO,
> > + &tv, sizeof(tv))) {
> > + err_perror("Set timeout on TCP_REPAIR listening socket");
> > + return;
> > + }
> > +
> > + repair_listen_handler(c, EPOLLIN);
> > +
> > + tv.tv_usec = 0;
> > + if (setsockopt(c->fd_repair_listen, SOL_SOCKET, SO_RCVTIMEO,
> > + &tv, sizeof(tv)))
> > + err_perror("Clear timeout on TCP_REPAIR listening socket");
> > +}
> > +
> > /**
> > * repair_flush() - Flush current set of sockets to helper, with current command
> > * @c: Execution context
> > diff --git a/repair.h b/repair.h
> > index de279d6..1d37922 100644
> > --- a/repair.h
> > +++ b/repair.h
> > @@ -10,6 +10,7 @@ void repair_sock_init(const struct ctx *c);
> > void repair_listen_handler(struct ctx *c, uint32_t events);
> > void repair_handler(struct ctx *c, uint32_t events);
> > void repair_close(struct ctx *c);
> > +void repair_wait(struct ctx *c);
> > int repair_flush(struct ctx *c);
> > int repair_set(struct ctx *c, int s, int cmd);
--
Stefano
next prev parent reply other threads:[~2025-03-07 6:14 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-06 19:00 [PATCH] flow, repair: Wait for a short while for passt-repair to connect Stefano Brivio
2025-03-07 2:59 ` David Gibson
2025-03-07 6:14 ` Stefano Brivio [this message]
2025-03-07 8:34 ` David Gibson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250307071445.7c26b2d1@elisabeth \
--to=sbrivio@redhat.com \
--cc=david@gibson.dropbear.id.au \
--cc=passt-dev@passt.top \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).