From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 21 Feb 2025 17:37:18 +1100
From: David Gibson
To: Stefano Brivio
Subject: Re: [PATCH 2/2] migrate, flow: Don't attempt to migrate TCP flows without passt-repair
References: <20250220060318.1796504-1-david@gibson.dropbear.id.au>
 <20250220060318.1796504-3-david@gibson.dropbear.id.au>
 <20250220090726.43432475@elisabeth>
 <20250220113800.05be8f5f@elisabeth>
 <20250221065912.404a1e88@elisabeth>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha256;
 protocol="application/pgp-signature"; boundary="eQnpy3H0zrR5voOw"
Content-Disposition: inline
In-Reply-To: <20250221065912.404a1e88@elisabeth>
CC: passt-dev@passt.top
List-Id: Development discussion and patches for passt

--eQnpy3H0zrR5voOw
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Fri, Feb 21, 2025 at 06:59:12AM +0100, Stefano Brivio wrote:
> On Fri, 21 Feb 2025 13:40:12 +1100
> David Gibson wrote:
>
> > On Thu, Feb 20, 2025 at 11:38:00AM +0100, Stefano Brivio wrote:
> > > On Thu, 20 Feb 2025 21:18:06 +1100
> > > David Gibson wrote:
> > >
> > > > This sort of thing is, incidentally, why I did way back suggest the
> > > > possibility of passt-repair reporting failures per-fd, rather than
> > > > just per-batch.
> > >
> > > Sorry, I somehow missed that proposal, and I can't find any trace of
> > > it.
> >
> > It may have just been on IRC somewhere.
> >
> > > But anyway, the problem is that if we fail to read a batch for any
> > > reason (invalid ancillary data... maybe always implying a kernel issue,
> > > but I'm not sure), you can't _reliably_ report per-fd failures.
> > > *Usually*, you can. Worth it?
> >
> > Ah, I see. We could handle that by being able to report both per-fd
> > and "whole batch" failure (equivalent to failure on every fd), but
> > that would complexify the protocol, of course.
>
> By the way, after having another look at the kernel interface and
> implementation: this makes no sense.
>
> Either we're able to set repair mode for all the sockets, or for none
> of them (EPERM).

Well.. probably. I suspect something sufficiently insane in an LSM
could break that rule.

> And if there's any invalid file descriptor in the set,
> we'll get EBADF for the whole sendmsg().
>
> The stuff I'm proposing below is well beyond my threshold of things
> that make no sense to implement, but at least it limits damage in terms
> of complexity (and hence of potential impact on the actual usage,
> because that's what we're talking about here: a die() that makes no
> sense but now proves to be actually harmful). I wouldn't go any further
> than that.
>
> > > In any case, if it's simple, we can still do it, because passt and
> > > passt-repair are distributed together. You can't pass back the file
> > > descriptors via SCM_RIGHTS though, because we want to close() them
> > > before we reply.
> > >
> > > Another alternative could be that passt-repair reverts back the state
> > > of the file descriptors that were already switched, on failure.
> >
> > That might help a bit, we'd still need to rework the passt-side
> > interface to know what needs reverting at the right stage.
>
> Not for what I'm proposing:
>
> 1. passt sends 1 (TCP_REPAIR_ON) for sockets 2, 3
>
> 2. passt-repair sets repair mode for 2
>
> 3. passt-repair fails to set repair mode for 3
>
> 4. passt-repair clears repair mode for 2

Again, it shouldn't happen in practice, but you will get a mess if you
ever managed to set repair mode, but failed to clear it.

There's a similar question in the other direction, if passt is trying
to REPAIR_OFF, and we fail part way through. Should passt-repair
REPAIR_ON again? I'm not sure it has the information to make a sane
choice here.

> 5. passt-repair closes the connection to signal failure
>
> 6. passt knows that repair mode is *not* set for any socket in the batch
>
> The interface remains the same (per-batch error only), but you can rely
> on the whole batch to have failed.
>
> You already can, even with no changes in passt-repair (see above), by
> the way.
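
For illustration only, the six quoted steps amount to an all-or-nothing
batch with rollback. Below is a minimal sketch in C, not passt-repair's
actual code: repair_on_batch(), set_repair_fn, the SKETCH_* constants and
the fake_* test double are all hypothetical names made up for this
example. The per-socket operation is passed in as a callback so the logic
can be exercised without CAP_NET_ADMIN. The -2 return value is an
assumption added here to mark the case where clearing repair mode fails
after setting it succeeded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for this sketch; the values mirror the kernel's
 * TCP_REPAIR_OFF (0) and TCP_REPAIR_ON (1). */
enum { SKETCH_REPAIR_OFF = 0, SKETCH_REPAIR_ON = 1 };

/* Applies one command to one socket: 0 on success, -1 on failure.  A real
 * helper might wrap setsockopt(fd, SOL_TCP, TCP_REPAIR, &cmd, sizeof(cmd)). */
typedef int (*set_repair_fn)(int fd, int cmd);

/* Returns 0 if repair mode is now on for the whole batch, -1 if it is on
 * for none of it, -2 if a rollback failed and the state is unknown. */
static int repair_on_batch(const int *fds, size_t n, set_repair_fn set_repair)
{
	int rollback_failed = 0;
	size_t i;

	assert(fds != NULL && set_repair != NULL);

	for (i = 0; i < n; i++) {
		if (set_repair(fds[i], SKETCH_REPAIR_ON) == 0)
			continue;			/* step 2 */

		/* Step 3 just happened for fds[i]; step 4: undo fds[0..i-1] */
		while (i-- > 0)
			if (set_repair(fds[i], SKETCH_REPAIR_OFF) != 0)
				rollback_failed = 1;

		return rollback_failed ? -2 : -1;	/* caller does step 5 */
	}

	return 0;
}

/* Test double standing in for the kernel: records the last command applied
 * to each fd and refuses every command for one chosen fd. */
#define FAKE_MAX_FD 16
static int fake_state[FAKE_MAX_FD];
static int fake_fail_fd = -1;

static int fake_set_repair(int fd, int cmd)
{
	if (fd < 0 || fd >= FAKE_MAX_FD || fd == fake_fail_fd)
		return -1;

	fake_state[fd] = cmd;
	return 0;
}
```

With fake_fail_fd set to 3 and a batch of { 2, 3 }, the call returns -1
and socket 2 is back to SKETCH_REPAIR_OFF, which is the guarantee step 6
relies on. The sketch only reports a failed rollback; it does not try to
recover from it.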

Well, I just finished implementing a simple way of reporting partial
failures. I guess we'll see what you think of it.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

--eQnpy3H0zrR5voOw
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAme4Hx0ACgkQzQJF27ox
2Gcp9A//fhUeZLhfUSF6k/Z+N7UNduAf5IGuN4KIhvaSjN9Y+UXf/s4YtDvWHvCf
4LLLLL6Uxp/thGfLq3M/uI09Z6N0OS5Hcha9ErNgv4V4FlY5BFjLDipj7FtBgxB6
C2gFUz1BOMZ8tlffk8S7apqs7CgPE53fsUk1gvnzqn+whxQnGbSKghsTsm/6tG8a
SqbdgaDyq7sE0k3pr8h4WK0Ve45CEFcGI4maChkwT9wPH36ny1BAMruhrs0V/18N
kjGuX+khrqxEePaeCcQEnMCHPmACRAtYtb5YkIwn7zLPOCsYoSRJknHaNMJQg5zA
oUzrV7tardYqtbwrUe7VSD87WjadgJZthyZd/GE3jfRei8Yl39G7PtiifQfcEesK
IInqfDr+H0YWeE8iWQl1MtMr7t+b9SlgBofvO79R3WZ6NL6AYPN0bdOR2yvkb+SS
r9N5wJi6iwIM+YQrC8UZkS73x+vPhOApG5tT4gdyblWK2AhgTib/HfHApD0BFfY/
XUzOEKudvD907TAbk80sviH6eDd95x8D2mfGI54cfyVlET3lIm0Re4CYaCyEdHiP
mT4q3+oVo1b384hJ9fZZr6DKIiQHWu7cE8ULVte50hObHqJLW3PPRNdiEJTt3JcX
VeoE2ZAibsnVTpB020ifge/MRPtAphJ2RwPtT+4H1+Dp4b8l54Q=
=zncF
-----END PGP SIGNATURE-----

--eQnpy3H0zrR5voOw--