Date: Wed, 26 Feb 2025 09:09:48 +0100
From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson
Cc: passt-dev@passt.top
Subject: Re: [PATCH v2 0/2] More graceful handling of migration without passt-repair
Message-ID: <20250226090948.3d1fff91@elisabeth>
References: <20250225055132.3677190-1-david@gibson.dropbear.id.au> <20250225184316.407247f4@elisabeth>
Organization: Red Hat

On Wed, 26 Feb 2025 11:27:32 +1100
David Gibson wrote:

> On Tue, Feb 25, 2025 at 06:43:16PM +0100, Stefano Brivio wrote:
> > On Tue, 25 Feb 2025 16:51:30 +1100
> > David Gibson wrote:
> > 
> > > From Red Hat internal testing we've had some reports that if
> > > attempting to migrate without passt-repair, the failure mode is
> > > uglier than we'd like. The migration fails, which is somewhat
> > > expected, but we don't correctly roll things back on the source,
> > > so it breaks networking there as well.
> > >
> > > Handle this more gracefully, allowing the migration to proceed in
> > > this case, but allowing TCP connections to break.
> > >
> > > I've now tested this reasonably:
> > > * I get a clean migration if there are no active flows
> > > * Migration completes, although connections are broken if
> > >   passt-repair isn't connected
> > > * Basic test suite (minus perf)
> > >
> > > I didn't manage to test with libvirt yet, but I'm pretty convinced
> > > the behaviour should be better than it was.
> > 
> > I did, and it is. The series looks good to me and I would apply it
> > as it is, but I'm waiting a bit longer in case you want to try out
> > some variations based on my tests as well. Here's what I did.
> 
> [snip]
> 
> Thanks for the detailed instructions. More complex than I might have
> liked, but oh well.
> 
> > $ virsh migrate --verbose --p2p --live --unsafe alpine --tunneled qemu+ssh://88.198.0.161:10951/session
> > Migration: [97.59 %]error: End of file while reading data: : Input/output error
> > 
> > ...despite --verbose, the error doesn't tell much (perhaps I need
> > LIBVIRT_DEBUG=1 instead?), but passt terminates at this point.
With > > this series (I just used 'make install' from the local build), migration > > succeeds instead: > > > > $ virsh migrate --verbose --p2p --live --unsafe alpine --tunneled qemu+ssh://88.198.0.161:10951/session > > Migration: [100.00 %] > > > > Now, on the target, I still have to figure out how to tell libvirt > > to start QEMU and prepare for the migration (equivalent of '-incoming' > > as we use in our tests), instead of just starting a new instance like > > it does. Otherwise, I have no chance to start passt-repair there. > > Perhaps it has something to do with persistent mode described here: > > Ah. So I'm pretty sure virsh migrate will automatically start qemu > with --incoming on the target. ("-incoming"), yes, see src/qemu/qemu_migration.c, qemuMigrationDstPrepare(). > IIUC the problem here is more about > timing: we want it to start it early, so that we have a chance to > start passt-repair and let it connect before the migration actually > happens. For the timing itself, we could actually wait for passt-repair to be there, with a timeout (say, 100ms). We could also modify passt-repair to set up an inotify watcher if the socket isn't there yet. > Crud... I didn't think of this before. I don't know that there's any > sensible way to do this without having libvirt managing passt-repair > as well. But we can't really use it as we're assuming that passt-repair will run with capabilities virtqemud doesn't want/need. > I mean it's not impossible there's some option to do this, > but I doubt there's been any reason before for something outside of > libvirt to control the timing of the target qemu's creation. I think > we need to ask libvirt people about this. I'm looking into it (and perhaps virtiofsd had similar troubles?). > > https://libvirt.org/migration.html#configuration-file-handling > > Yeah.. I don't think this is relevant. > > > and --listen-address, but I'm not quite sure yet. > > > > That is, I could only test different failures (early one on source, or > > later one on target) with this, not a complete successful migration. > > > > > There are more fragile cases that I'm looking to fix, particularly the > > > die()s in flow_migrate_source_rollback() and elsewhere, however I ran > > > into various complications that I didn't manage to sort out today. > > > I'll continue looking at those tomorrow. I'm now pretty confident > > > that those additional fixes won't entirely supersede the changes in > > > this series, so it should be fine to apply these on their own. > > > > By the way, I think the somewhat less fragile/more obvious case where > > we fail clumsily is when the target doesn't have the same address as > > the source (among other possible addresses). In that case, we fail (and > > terminate) with a rather awkward: > > Ah, yes, that is a higher priority fragile case. > > > 93.7217: ERROR: Failed to bind socket for migrated flow: Cannot assign requested address > > 93.7218: ERROR: Flow 0 (TCP connection): Can't set up socket: (null), drop > > 93.7331: ERROR: Selecting TCP_SEND_QUEUE, socket 1: Socket operation on non-socket > > 93.7333: ERROR: Unexpected reply from TCP_REPAIR helper: -100 > > > > that's because, oops, I only took care of socket() failures in > > tcp_flow_repair_socket(), but not bind() failures (!). Sorry. > > No, you check for errors on both. Well, "check", yes, but I'm not even setting an error code. I haven't tried your 3/3 yet but look at "(null)" resulting from: flow_err(flow, "Can't set up socket: %s, drop", strerror_(rc)); ...rc is 0. 

> The problem is that in tcp_flow_migrate_target() we cancel the flow
> allocation and carry on - but the source will still send information
> for this flow, putting us out of sync with the stream.

That, too, yes.

> > Once that's fixed, flow_migrate_target() should also take care of
> > decreasing 'count' accordingly. I just had a glimpse but didn't
> > really try to sketch a fix.
> 
> Adjusting count won't do the job. Instead we'd need to keep the flow
> around, but marked as "dead" somehow, so that we read but discard
> the incoming information for it. The MIGRATING state I added in one
> of my drafts was supposed to help with this sort of thing. But
> that's quite a complex change.

I think it's great that you could (practically) solve it with three
lines...

> Hrm... at least in the near term, I think it might actually be
> easier to set IP_FREEBIND when we create sockets for in-migrating
> flows. That way we can process them normally, they just won't do
> much without the address set. It has the additional advantage that
> it should work if the higher layers only move the IP just after the
> migration, instead of in advance.

Perhaps we want it anyway (rough sketch at the end of this message),
but I wonder: what happens if we turn repair mode off while we're
bound to a non-local address? I suppose we won't send anything out,
but I'm not sure. If we send the first keep-alive segment with a
wrong address, we've probably ruined the connection.

Once I find a solution for the target libvirt/passt-repair thing (and
the remaining SELinux issues), I'll try to have a look at this too. I
haven't yet tried a migration with a mismatching address on the
target and passt-repair available.

-- 
Stefano
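
P.S.: For reference, this is roughly how I read the IP_FREEBIND idea
on the socket setup side (sketch with a made-up helper, not a patch;
on Linux, the freebind flag set via IP_FREEBIND applies to IPv6 binds
as well, and recent kernels also offer IPV6_FREEBIND):

  #include <errno.h>
  #include <netinet/in.h>
  #include <sys/socket.h>
  #include <unistd.h>

  /* Sketch: bind an address for an in-migrating flow even if it isn't
   * (yet) configured locally, so repair-mode setup can proceed before
   * the address is moved to the target.
   */
  static int repair_socket_freebind(const struct sockaddr *addr,
                                    socklen_t len)
  {
          int one = 1, rc;
          int s = socket(addr->sa_family, SOCK_STREAM | SOCK_CLOEXEC,
                         IPPROTO_TCP);

          if (s < 0)
                  return -errno;

          if (setsockopt(s, IPPROTO_IP, IP_FREEBIND,
                         &one, sizeof(one)) < 0 ||
              bind(s, addr, len) < 0) {
                  rc = -errno;
                  close(s);
                  return rc;
          }

          return s;
  }

That would let bind() succeed on the target before the address is
actually there, which is the "process them normally" part above.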