Date: Thu, 20 Feb 2025 09:07:26 +0100
From: Stefano Brivio
To: David Gibson
CC: passt-dev@passt.top
Subject: Re: [PATCH 2/2] migrate, flow: Don't attempt to migrate TCP flows without passt-repair
Message-ID: <20250220090726.43432475@elisabeth>
In-Reply-To: <20250220060318.1796504-3-david@gibson.dropbear.id.au>
References: <20250220060318.1796504-1-david@gibson.dropbear.id.au> <20250220060318.1796504-3-david@gibson.dropbear.id.au>
Organization: Red Hat
List-Id: Development discussion and patches for passt

On Thu, 20 Feb 2025 17:03:18 +1100
David Gibson wrote:

> Migrating TCP flows requires passt-repair in order to use TCP_REPAIR. If
> passt-repair is not started, our failure mode is pretty ugly though: we'll
> attempt the migration, hitting various problems when we can't enter repair
> mode. In some cases we may not roll back these changes properly, meaning
> we break network connections on the source.
> 
> Our general approach is not to completely block migration if there are
> problems, but simply to break any flows we can't migrate. So, if we have
> no connection from passt-repair carry on with the migration, but don't
> attempt to migrate any TCP connections.
> 
> Signed-off-by: David Gibson
> ---
>  flow.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/flow.c b/flow.c
> index 6cf96c26..749c4984 100644
> --- a/flow.c
> +++ b/flow.c
> @@ -923,6 +923,10 @@ static int flow_migrate_repair_all(struct ctx *c, bool enable)
>  	union flow *flow;
>  	int rc;
>  
> +	/* If we don't have a repair helper, there's nothing we can do */
> +	if (c->fd_repair < 0)
> +		return 0;
> +

This doesn't fix the behaviour in a relatively likely failure mode:
passt-repair is there, but we can't communicate with it (LSM policy
issues or similar).

In that case, unconditionally terminating on failure in the rollback
function:

	if (tcp_flow_repair_off(c, &flow->tcp))
		die("Failed to roll back TCP_REPAIR mode");

	if (repair_flush(c))
		die("Failed to roll back TCP_REPAIR mode");

isn't a very productive thing to do: we go from an uneventful failure
where flows were not affected at all to a guest left without
connectivity.

That starts looking less robust than the alternative (e.g. what I
implemented in v12: silently fail and continue) at least without
https://patchew.org/QEMU/20250217092550.1172055-1-lvivier@redhat.com/
in a general case as well: if we continue, we'll have hanging flows
that will expire on timeout, but if we don't, again, we'll have a
guest without connectivity.

I understand that leaving flows around for that long might present a
relevant inconsistency, though.

So I'm wondering about some alternatives: actually, the rollback
function shouldn't be called at all in this case. Or it could just
(indirectly) call tcp_rst() on all the flows that were possibly
affected.

-- 
Stefano
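
A minimal sketch of the second alternative mentioned above: when rolling
back repair mode fails for a flow, reset that flow instead of terminating,
and leave untouched any flow that never entered repair mode. Everything
here is hypothetical and written only for illustration: struct
sketch_flow, sketch_repair_off(), sketch_rst() and sketch_rollback() are
stand-ins, not passt's flow table, tcp_flow_repair_off() or tcp_rst(), and
the helper_ok field merely simulates whether the repair helper can be
reached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a TCP flow entry (not passt's union flow) */
struct sketch_flow {
	bool in_repair;	/* socket was switched to repair mode */
	bool reset;	/* flow was reset instead of rolled back */
	bool helper_ok;	/* simulation: can the helper be reached? */
};

/* Stand-in for leaving repair mode: fails if the helper is unreachable */
static int sketch_repair_off(struct sketch_flow *f)
{
	if (!f->helper_ok)
		return -1;

	f->in_repair = false;
	return 0;
}

/* Stand-in for resetting a single flow */
static void sketch_rst(struct sketch_flow *f)
{
	f->in_repair = false;
	f->reset = true;
}

/* Roll back repair mode on n flows. Flows that can't be rolled back are
 * reset rather than treated as fatal. Returns the number of flows reset.
 */
static int sketch_rollback(struct sketch_flow *flows, size_t n)
{
	int nreset = 0;
	size_t i;

	assert(flows != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (!flows[i].in_repair)
			continue;	/* never affected: leave it alone */

		if (sketch_repair_off(&flows[i])) {
			sketch_rst(&flows[i]);
			nreset++;
		}
	}

	return nreset;
}
```

With this shape, a failure to reach the helper costs only the flows that
were actually put in repair mode, rather than connectivity for the whole
guest. Whether a reset is preferable to letting those flows expire on
timeout is the open question raised above, and this sketch doesn't settle
it.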