Date: Fri, 25 Jul 2025 07:10:58 +0200
From: Stefano Brivio
To: David Gibson
CC: passt-dev@passt.top, Nir Dothan
Subject: Re: [PATCH v3] treewide: By default, don't quit source after migration, keep sockets open
Message-ID: <20250725071058.0842f7a2@elisabeth>
References: <20250724172858.1189615-1-sbrivio@redhat.com>
List-Id: Development discussion and patches for passt

On Fri, 25 Jul 2025 14:04:17 +1000
David Gibson wrote:

> On Thu, Jul 24, 2025 at 07:28:58PM +0200, Stefano Brivio wrote:
> > We are hitting an issue in the KubeVirt integration where some data is
> > still sent to the source instance even after migration is complete. As
> > we exit, the kernel closes our sockets and resets connections. The
> > resulting RST segments are sent to peers, effectively terminating
> > connections that were meanwhile migrated.
> >
> > At the moment, this is not done intentionally, but in the future
> > KubeVirt might enable OVN-Kubernetes features where source and
> > destination nodes are explicitly getting mirrored traffic for a while,
> > in order to decrease migration downtime.
> >
> > By default, don't quit after migration is completed on the source: the
> > previous behaviour can be enabled with the new, but deprecated,
> > --migrate-exit option. After migration (as source), the -1 / --one-off
> > option has no effect.
> >
> > Also, by default, keep migrated TCP sockets open (in repair mode) as
> > long as we're running, and ignore events on any epoll descriptor
> > representing data channels. The previous behaviour can be enabled with
> > the new, equally deprecated, --migrate-no-linger option.
> >
> > By keeping sockets open, and not exiting, we prevent the kernel
> > running on the source node to send out RST segments if further data
> > reaches us.
> >
> > Reported-by: Nir Dothan
> > Signed-off-by: Stefano Brivio
> > ---
> > v2:
> > - assorted changes in commit message
> > - context variable ignore_linger becomes ignore_no_linger
> > - new options are deprecated
> > - don't ignore events on some descriptors, drop them from epoll
> >
> > v3:
> > - Nir reported occasional failures (connections being reset)
> >   with both v1 and v2, because, in KubeVirt's usage, we quit as
> >   QEMU exits. Disable --one-off after migration as source, and
> >   document this exception
>
> This seems like an awful, awful hack.

Well, of course, it is, and long term it should be fixed in either
KubeVirt or libvirt (even though I'm not sure how, see below) instead.

> We're abandoning consistent
> semantics on a wild guess as to what the layers above us need.

No, not really, we tested this and tested the alternative.

> Specifically, --once-off used to mean that the layer above us didn't

--one-off

> need to manage passt's lifetime; it was tied to qemu's.
> Now it still
> needs to manually manage passt's lifetime, so what's the point. So,
> if it needs passt to outlive qemu it should actually manage that and
> not use --once-off.

The main point is that it does *not* manually manage passt's lifetime
if there's no migration (which is the general case for libvirt and all
other users).

We don't have any other user with an implementation of the migration
workflow anyway (libvirt itself doesn't do that, yet). It's otherwise
unusable for KubeVirt. So I'd say let's fix it for the only user we
have.

> Requring passt to outlive qemu already seems pretty dubious to me:
> having the source still connected when passt was quitting is one thing
> - indeed it's arguably hard to avoid. Having it still connected when
> *qemu* quits is much less defensible.

The fundamental problem here is that there's an issue in KubeVirt (and
working around it is the whole point of this patch) which implies that
packets are sent to the source pod *for a while* after migration.

We found out that the guest is generally suspended during that while,
but sometimes it might even have already exited. The pod remains,
though, as long as it's needed. That's the only certainty we have.

So, do we want to drop --one-off from the libvirt integration, and have
libvirt manage passt's lifecycle entirely (note that all users outside
KubeVirt don't use migration, so we would make the general case vastly
more complicated for the sake of correctness for a single usage...)?

Well, we can try to do that. Except that libvirt doesn't know either
for how long this traffic will reach the source pod (that's a KubeVirt
concept). So it should implement the same hack: let it outlive QEMU on
migration... as long as we have that issue in KubeVirt.

But I asked KubeVirt people, and it turns out that it's extremely
complicated to fix this in KubeVirt.

So, actually, I don't see another way to fix this in the short term.
And without KubeVirt using this we could also drop the whole feature...

-- 
Stefano