Date: Thu, 30 Jan 2025 09:32:36 +0100
From: Stefano Brivio
To: David Gibson
Cc: passt-dev@passt.top, Laurent Vivier
Subject: Re: [PATCH 6/7] Introduce facilities for guest migration on top of vhost-user infrastructure
Message-ID: <20250130093236.117c3fd0@elisabeth>
References: <20250127231532.672363-1-sbrivio@redhat.com> <20250127231532.672363-7-sbrivio@redhat.com> <20250128075001.3557d398@elisabeth> <20250129083350.220a7ab0@elisabeth> <20250130055522.39acb265@elisabeth>
Organization: Red Hat
List-Id: Development discussion and patches for passt

On Thu, 30 Jan 2025 18:38:22 +1100
David Gibson wrote:

> Right, but in the present draft you pay that cost whether or not
> you're actually using the flows. Unfortunately a busy server with
> heaps of active connections is exactly the case that's likely to be
> most sensitive to additional downtime, but there's not really any
> getting around that.
> A machine with a lot of state will need either
> high downtime or high migration bandwidth.

It's... sixteen megabytes. A KubeVirt node is only allowed to perform
up to _four_ migrations in parallel, and that's our main use case at
the moment. "High downtime" is kind of relative.

> But, I'm really hoping we can move relatively quickly to a model where
> a guest with only a handful of connections _doesn't_ have to pay that
> 128k flow cost - and can consequently migrate ok even with quite
> constrained migration bandwidth. In that scenario the size of the
> header could become significant.

I think what the full flow table transfer really buys us is code
that's a bit quicker to write (I just managed to properly set
sequences on the target; connections don't quite "flow" yet), but at
the cost of being relatively high-maintenance (as you mentioned, we
need to be careful about every single field) and easy to break.

I would like to quickly complete the whole flow first, because at that
point we can inform design and implementation decisions much better,
and we can be sure it's feasible. But I'm not particularly keen to
merge this patch as it is, if we can switch relatively swiftly to an
implementation where we model a smaller fixed-endian structure with
just the stuff we need. And again, to be a bit more sure of which
stuff we need in it, it's useful to have the full flow implemented.

Actually, the biggest complications I see in switching to that
approach, from the current point, are that we need to, I guess:

1. model arrays (not really complicated by itself)

2. have a temporary structure where we store flows, instead of using
   the flow table directly (meaning that the "data model" needs to
   logically decouple source and destination of the copy)

3. batch stuff to some extent.
   We'll call socket() and connect() once for each socket anyway,
   obviously, but sending one message to the TCP_REPAIR helper for
   each socket looks like a rather substantial and avoidable overhead.

> > > It's both easier to do
> > > and a bigger win in most cases. That would dramatically reduce
> > > the size sent here.
> >
> > Yep, feel free.
>
> It's on my queue for the next few days.

To me, this part actually looks like the biggest priority after (or
while) getting the whole thing to work, because we can then start
right away with a 'v1' that looks more sustainable. In that case I
would just get stuff working on x86_64, without even implementing
conversions, endianness switches, and so on.

--
Stefano
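For illustration, the "smaller fixed-endian structure" with batched
entries discussed above could look something like the sketch below.
All names and field choices here are hypothetical, not passt's actual
data model; the point is the versioned header in front of a
fixed-layout, network-order array:

```c
/* Sketch only: a hypothetical versioned, fixed-endian migration record
 * for TCP flows, with entries batched behind a single header. None of
 * these names come from passt; the fields are illustrative. */
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>	/* htonl() */

#define TCP_MIG_V1	1

/* One entry per migrated flow; multi-byte fields in network order */
struct tcp_flow_mig {
	uint8_t  saddr[16];	/* source address, IPv4-mapped for IPv4 */
	uint8_t  daddr[16];	/* destination address */
	uint16_t sport, dport;	/* ports, network order */
	uint32_t seq_snd;	/* send sequence, restored via TCP_REPAIR */
	uint32_t seq_rcv;	/* receive sequence */
	uint32_t flags;		/* ...plus whatever else proves necessary */
} __attribute__((packed));

/* Header in front of an array of entries, so the whole table goes out
 * as one batch instead of one message per flow */
struct tcp_mig_hdr {
	uint32_t version;
	uint32_t count;		/* number of tcp_flow_mig entries following */
} __attribute__((packed));

/* Serialise header plus entries into buf; entries are assumed to be
 * already in network byte order. Returns total length in bytes. */
static size_t tcp_mig_pack(uint8_t *buf,
			   const struct tcp_flow_mig *flows, uint32_t n)
{
	struct tcp_mig_hdr hdr = { htonl(TCP_MIG_V1), htonl(n) };

	memcpy(buf, &hdr, sizeof(hdr));
	memcpy(buf + sizeof(hdr), flows, (size_t)n * sizeof(*flows));

	return sizeof(hdr) + (size_t)n * sizeof(*flows);
}
```

On the target, the inverse would validate `version` before trusting
`count`, which is where a versioned header earns its keep.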
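On the batching point: several sockets can be handed to a repair
helper in a single sendmsg() by putting the whole fd array into one
SCM_RIGHTS control message, rather than one message per socket. A
sketch under assumed conventions (the one-byte command and the batch
cap are made up here, not passt's actual helper protocol):

```c
/* Sketch: pass a batch of TCP sockets to a repair helper over a Unix
 * domain socket in one sendmsg(), using a single SCM_RIGHTS control
 * message. Command byte and REPAIR_BATCH_MAX are hypothetical. */
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/socket.h>

#define REPAIR_BATCH_MAX 32	/* assumed cap, within kernel limits */

static ssize_t repair_send_batch(int helper, const int *fds, int n)
{
	char cmd = 1;		/* hypothetical "repair these" command */
	struct iovec iov = { .iov_base = &cmd, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int) * REPAIR_BATCH_MAX)];
		struct cmsghdr align;
	} u;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;

	if (n < 1 || n > REPAIR_BATCH_MAX)
		return -1;

	memset(&u, 0, sizeof(u));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = CMSG_SPACE(sizeof(int) * n);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int) * n);
	memcpy(CMSG_DATA(cmsg), fds, sizeof(int) * n);

	return sendmsg(helper, &msg, 0);
}
```

The helper receives fresh descriptors for the whole batch from the
ancillary data and can then set TCP_REPAIR on each, so the per-socket
round trip disappears.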