public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: passt-dev@passt.top
Subject: Re: [PATCH 2/2] migrate, flow: Don't attempt to migrate TCP flows without passt-repair
Date: Fri, 21 Feb 2025 17:37:18 +1100	[thread overview]
Message-ID: <Z7gfHldsxIDfCnlG@zatzit> (raw)
In-Reply-To: <20250221065912.404a1e88@elisabeth>

[-- Attachment #1: Type: text/plain, Size: 3734 bytes --]

On Fri, Feb 21, 2025 at 06:59:12AM +0100, Stefano Brivio wrote:
> On Fri, 21 Feb 2025 13:40:12 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Thu, Feb 20, 2025 at 11:38:00AM +0100, Stefano Brivio wrote:
> > > On Thu, 20 Feb 2025 21:18:06 +1100
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > >   
> > > > This sort of thing is, incidentally, why I did way back suggest the
> > > > possibility of passt-repair reporting failures per-fd, rather than
> > > > just per-batch.  
> > > 
> > > Sorry, I somehow missed that proposal, and I can't find any trace of
> > > it.  
> > 
> > It may have just been on IRC somewhere.
> > 
> > > But anyway, the problem is that if we fail to read a batch for any
> > > reason (invalid ancillary data... maybe always implying a kernel issue,
> > > but I'm not sure), you can't _reliably_ report per-fd failures.
> > > *Usually*, you can. Worth it?  
> > 
> > Ah, I see.  We could handle that by being able to report both per-fd
> > and "whole batch" failure (equivalent to failure on every fd), but
> > that would complexify the protocol, of course.
> 
> By the way, after having another look at the kernel interface and
> implementation: this makes no sense.
> 
> Either we're able to set repair mode for all the sockets, or for none
> of them (EPERM).

Well.. probably.  I suspect something sufficiently insane in an LSM
could break that rule.

> And if there's any invalid file descriptor in the set,
> we'll get EBADF for the whole sendmsg().
> 
> The stuff I'm proposing below is well beyond my threshold of things
> that make no sense to implement, but at least it limits damage in terms
> of complexity (and hence of potential impact on the actual usage,
> because that's what we're talking about here: a die() that makes no
> sense but now proves to be actually harmful). I wouldn't go any further
> than that.
> 
> > > In any case, if it's simple, we can still do it, because passt and
> > > passt-repair are distributed together. You can't pass back the file
> > > descriptors via SCM_RIGHTS though, because we want to close() them
> > > before we reply.
> > > 
> > > Another alternative could be that passt-repair reverts back the state
> > > of the file descriptors that were already switched, on failure.  
> > 
> > That might help a bit, we'd still need to rework the passt-side
> > interface to know what needs reverting at the right stage.
> 
> Not for what I'm proposing:
> 
> 1. passt sends 1 (TCP_REPAIR_ON) for sockets 2, 3
> 
> 2. passt-repair sets repair mode for 2
> 
> 3. passt-repair fails to set repair mode for 3
> 
> 4. passt-repair clears repair mode for 2

Again, it shouldn't happen in practice, but you will get a mess if you
ever managed to set repair mode, but failed to clear it.

There's a similar question in the other direction, if passt is trying
to REPAIR_OFF, and we fail part way through.  Should passt-repair
REPAIR_ON again?  I'm not sure it has the information to make a sane
choice here.

> 5. passt-repair closes the connection to signal failure
> 
> 6. passt knows that repair mode is *not* set for any socket in the batch
> 
> The interface remains the same (per-batch error only), but you can rely
> on the whole batch to have failed.
> 
> You already can, even with no changes in passt-repair (see above), by
> the way.

Well, I just finished implementing a simple way of reporting partial
failures.  I guess see what you think of it.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2025-02-21  6:41 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-20  6:03 [PATCH 0/2] RFC: More graceful handling of migration without passt-repair (UNTESTED) David Gibson
2025-02-20  6:03 ` [PATCH 1/2] migrate, flow: Trivially succeed if migrating with no flows David Gibson
2025-02-20  6:03 ` [PATCH 2/2] migrate, flow: Don't attempt to migrate TCP flows without passt-repair David Gibson
2025-02-20  8:07   ` Stefano Brivio
2025-02-20 10:18     ` David Gibson
2025-02-20 10:38       ` Stefano Brivio
2025-02-21  2:40         ` David Gibson
2025-02-21  5:59           ` Stefano Brivio
2025-02-21  6:37             ` David Gibson [this message]
2025-02-21  7:03               ` Stefano Brivio

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z7gfHldsxIDfCnlG@zatzit \
    --to=david@gibson.dropbear.id.au \
    --cc=passt-dev@passt.top \
    --cc=sbrivio@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).