From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 25 Jul 2025 14:04:17 +1000
From: David Gibson
To: Stefano Brivio
CC: passt-dev@passt.top, Nir Dothan
Subject: Re: [PATCH v3] treewide: By default, don't quit source after migration, keep sockets open
References: <20250724172858.1189615-1-sbrivio@redhat.com>
In-Reply-To: <20250724172858.1189615-1-sbrivio@redhat.com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha512;
	protocol="application/pgp-signature"; boundary="6/f+in2MNt9icW64"
Content-Disposition: inline
List-Id: Development discussion and patches for passt

--6/f+in2MNt9icW64
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Thu, Jul 24, 2025 at 07:28:58PM +0200, Stefano Brivio wrote:
> We are hitting an issue in the KubeVirt integration where some data is
> still sent to the source instance even after migration is complete. As
> we exit, the kernel closes our sockets and resets connections. The
> resulting RST segments are sent to peers, effectively terminating
> connections that were meanwhile migrated.
>
> At the moment, this is not done intentionally, but in the future
> KubeVirt might enable OVN-Kubernetes features where source and
> destination nodes are explicitly getting mirrored traffic for a while,
> in order to decrease migration downtime.
>
> By default, don't quit after migration is completed on the source: the
> previous behaviour can be enabled with the new, but deprecated,
> --migrate-exit option. After migration (as source), the -1 / --one-off
> option has no effect.
>
> Also, by default, keep migrated TCP sockets open (in repair mode) as
> long as we're running, and ignore events on any epoll descriptor
> representing data channels. The previous behaviour can be enabled with
> the new, equally deprecated, --migrate-no-linger option.
>
> By keeping sockets open, and not exiting, we prevent the kernel
> running on the source node to send out RST segments if further data
> reaches us.
>
> Reported-by: Nir Dothan
> Signed-off-by: Stefano Brivio
> ---
> v2:
> - assorted changes in commit message
> - context variable ignore_linger becomes ignore_no_linger
> - new options are deprecated
> - don't ignore events on some descriptors, drop them from epoll
>
> v3:
> - Nir reported occasional failures (connections being reset)
>   with both v1 and v2, because, in KubeVirt's usage, we quit as
>   QEMU exits. Disable --one-off after migration as source, and
>   document this exception

This seems like an awful, awful hack. We're abandoning consistent
semantics on a wild guess as to what the layers above us need.
Specifically, --one-off used to mean that the layer above us didn't
need to manage passt's lifetime; it was tied to qemu's. Now the layer
above still needs to manage passt's lifetime manually, so what's the
point?

So, if it needs passt to outlive qemu, it should actually manage that
and not use --one-off.

Requiring passt to outlive qemu already seems pretty dubious to me:
having the source still connected when passt was quitting is one thing
- indeed it's arguably hard to avoid. Having it still connected when
*qemu* quits is much less defensible.

-- 
David Gibson (he or they)      | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
                               | around.
http://www.ozlabs.org/~dgibson

--6/f+in2MNt9icW64
Content-Type: application/pgp-signature; name=signature.asc

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCgAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmiDAjQACgkQzQJF27ox
2GdOUBAAgSme9uJJQiznYuLkbAXoR0CYauuwYERTTaXl0UWvD3xJmJioAwiaDuM7
XY482+NV8bjO3mJys9HXoPQzmGqbumd8VYoty3hVN9MZq5p9WGF2nawrWb5RnG84
lcmwXLsdt0AaIBtdEPYR0yn1GCSkhU40GmFq8XIO6TutkdWzez060SDJB7c1zmzm
7vXS7uOmDoFV1uTw3+iUyo/bV/+DQX/i+d6YoD6fTmvjy5/33j+Ua8MZOoYHn6Zv
UrlNRQfnhsg0xL5PADQ5mXvDb6ILAqI5eUTYbCS/ZPrInoK1s+XDkFTM55AEvvPm
1bh2WRg/pHyzaoFqQHrCvDb+agCOe6q6BeFU1o//EsDmHBkbZpouCiXVfgSflaXM
0Gb16bL6xz9nIK/meYIdECAo+e4UokCGXjyEH3SPujrPDGWBFkYnCCAR4FP6L11d
dGjVR4x4RNbY2K5ga3vuj36E6eMrAPMWEpI+XRsZE015aSYKh7st6kjqI/CVuol1
OAlBik88CofdRedy4DZjbFPiX4polh9C8WiNt92e4KuFr5A9mm4E8ZpOQuCv02d5
4aLD6ou959NyfpvCZQD5RbvhSD0HtMfY6gbRRb2vJB3e/53lD3rAvl3NVIGglrJz
7aGm84CBUJ0+z+jUaRoPKiFDF22BzEHOdXhT8SYlBNW7e3wyvrQ=
=AVkm
-----END PGP SIGNATURE-----

--6/f+in2MNt9icW64--