From mboxrd@z Thu Jan  1 00:00:00 1970
Authentication-Results: passt.top; dmarc=pass (p=none dis=none) header.from=redhat.com
Authentication-Results: passt.top;
	dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=PEz+tBTp;
	dkim-atps=neutral
Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124])
	by passt.top (Postfix) with ESMTPS id 57CC05A0008
	for <passt-dev@passt.top>; Tue, 25 Feb 2025 18:43:24 +0100 (CET)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
	s=mimecast20190719; t=1740505403;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:content-type:content-type:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=JPrj8E4Ft+iJF63GjqsiwE7cKhdV4J2jpWAS3UWVZSw=;
	b=PEz+tBTpYvzziEKubzLTwB+U7mLZRx82ERuos4tJWPWVFHriX+ckU3Ltm6kkJI0OFIwfy8
	+h90DmySknjpmEILX/Ymn4qWGnO1ouLrRpXD/pnsKW7dQ/4X9IibSnJzz+ifdf6uQGyfkW
	yJZayBpHarFvRZcwFegrCZizhaK7URQ=
Received: from mail-ej1-f69.google.com (mail-ej1-f69.google.com
 [209.85.218.69]) by relay.mimecast.com with ESMTP with STARTTLS
 (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id
 us-mta-141-v7FBNr3EM2mBfpMysAmNlw-1; Tue, 25 Feb 2025 12:43:21 -0500
X-MC-Unique: v7FBNr3EM2mBfpMysAmNlw-1
X-Mimecast-MFC-AGG-ID: v7FBNr3EM2mBfpMysAmNlw_1740505400
Received: by mail-ej1-f69.google.com with SMTP id a640c23a62f3a-abbae81829fso743112266b.3
        for <passt-dev@passt.top>; Tue, 25 Feb 2025 09:43:21 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1740505400; x=1741110200;
        h=content-transfer-encoding:mime-version:organization:references
         :in-reply-to:message-id:subject:cc:to:from:date:x-gm-message-state
         :from:to:cc:subject:date:message-id:reply-to;
        bh=JPrj8E4Ft+iJF63GjqsiwE7cKhdV4J2jpWAS3UWVZSw=;
        b=rUiww9zxT9ux9WxDIOATADvEmf3+aLXfwhLKpDca5o5IOAJNe2QwWHH9kPb/z3KVXI
         f+QSMIRIEJCG0kACx62Vw38hbG0P+3ORPaGGVdm1Rv5xDOpSpTaQ+Pr4xHEXHrTSvTTN
         PKkaqs1bWtc9/5ycvKuhaU9eTf1q/AoMhWBy+h/nU289zIOjLDu0ku6oblCzFVbwo4XL
         W7CVFn20eQPLe1/lwc2Ul2i0OoQDfug+kXxo11eG1shZG+pzUFR7b2eYWUG3WCeXwUmI
         xf9GWQf+jDBMuN+dMopfo2mxyxPNlXr+JxDb6K++fQX/0k6AAb2mzAHJQbTm3uS4qmjb
         ZObw==
X-Gm-Message-State: AOJu0YwHwV2sCbbYB91eceAFQ0x5rbV5u/KUDVGr26bDbn6ZMkcbQdNG
	RWOYOPuJtX4++Zb7wP9lXKDiwxUKM8uvh2TFv2Tn0hRLJAXJGNY9vNuD3E2IRxPzSplIhj3DOH8
	0piOmeE4ZBulmjC964bRd9BCHY+CD2nr2H2vLVlpZDCiTbNst8BhH85P6JA==
X-Gm-Gg: ASbGncvBzGzVyEhe11oK9le/WrzvAKH9gKjQ1s3OxvtyPe0nGbnr/oMMEOBRQQpASUD
	yp3t2ms3wlFZqfzmcAXq1z2xPikw2EcD+XKJWzkkUYHvl9phLQwPDkHAVS+4EkOWW2s8VxM+wxb
	W1CREf+XDq/NXZOGTM2/AL0CkZJHHElTjI/R7qVxSX//S7pZcEexuBhjIrYpivSeVzGTDJFFZ3V
	kcOV0+9mjF/Afyx8zJIoBPTUOXP2eG6wHmIEOi+I18NgBQGWemdwSbm77hVVMVoJ2pOYHyu7rwb
	bzCMWowVtWfYUbPJSBCsjQkb4bYANnM/L/c3x/bw0fMD
X-Received: by 2002:a17:907:94cc:b0:ab7:f0fa:1341 with SMTP id a640c23a62f3a-abed10fecbbmr398011166b.56.1740505399482;
        Tue, 25 Feb 2025 09:43:19 -0800 (PST)
X-Google-Smtp-Source: AGHT+IGu5lU10ywi/dPsX/PkRu1gEpc7/FKuWewXk2MUr+RMjoD01prxmtb4C6x9zWwR3WVJ5TeEJQ==
X-Received: by 2002:a17:907:94cc:b0:ab7:f0fa:1341 with SMTP id a640c23a62f3a-abed10fecbbmr398009266b.56.1740505399036;
        Tue, 25 Feb 2025 09:43:19 -0800 (PST)
Received: from maya.myfinge.rs (ifcgrfdd.trafficplex.cloud. [176.103.220.4])
        by smtp.gmail.com with ESMTPSA id a640c23a62f3a-abed2055011sm172104666b.156.2025.02.25.09.43.18
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Tue, 25 Feb 2025 09:43:18 -0800 (PST)
Date: Tue, 25 Feb 2025 18:43:16 +0100
From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Subject: Re: [PATCH v2 0/2] More graceful handling of migration without
 passt-repair
Message-ID: <20250225184316.407247f4@elisabeth>
In-Reply-To: <20250225055132.3677190-1-david@gibson.dropbear.id.au>
References: <20250225055132.3677190-1-david@gibson.dropbear.id.au>
Organization: Red Hat
X-Mailer: Claws Mail 4.2.0 (GTK 3.24.41; x86_64-pc-linux-gnu)
MIME-Version: 1.0
X-Mimecast-Spam-Score: 0
X-Mimecast-MFC-PROC-ID: UqBPM2dUm7EiHaLZgA1JPpKb1noVXK0saJRmgeG9tVQ_1740505400
X-Mimecast-Originator: redhat.com
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Message-ID-Hash: I2CB4RNAKRP4XB4MV46DJIOXWYII5WZM
X-Message-ID-Hash: I2CB4RNAKRP4XB4MV46DJIOXWYII5WZM
X-MailFrom: sbrivio@redhat.com
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header
CC: passt-dev@passt.top
X-Mailman-Version: 3.3.8
Precedence: list
List-Id: Development discussion and patches for passt <passt-dev.passt.top>
Archived-At: <https://archives.passt.top/passt-dev/20250225184316.407247f4@elisabeth/>
Archived-At: <https://passt.top/hyperkitty/list/passt-dev@passt.top/message/I2CB4RNAKRP4XB4MV46DJIOXWYII5WZM/>
List-Archive: <https://archives.passt.top/passt-dev/>
List-Archive: <https://passt.top/hyperkitty/list/passt-dev@passt.top/>
List-Help: <mailto:passt-dev-request@passt.top?subject=help>
List-Owner: <mailto:passt-dev-owner@passt.top>
List-Post: <mailto:passt-dev@passt.top>
List-Subscribe: <mailto:passt-dev-join@passt.top>
List-Unsubscribe: <mailto:passt-dev-leave@passt.top>

On Tue, 25 Feb 2025 16:51:30 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> From Red Hat internal testing we've had some reports that if
> attempting to migrate without passt-repair, the failure mode is uglier
> than we'd like.  The migration fails, which is somewhat expected, but
> we don't correctly roll things back on the source, so it breaks
> networking there as well.
> 
> Handle this more gracefully, allowing the migration to proceed in
> this case, but letting TCP connections break.
> 
> I've now tested this reasonably:
>  * I get a clean migration if there are no active flows
>  * Migration completes, although connections are broken if
>    passt-repair isn't connected
>  * Basic test suite (minus perf)
> 
> I didn't manage to test with libvirt yet, but I'm pretty convinced the
> behaviour should be better than it was.

I did, and it is. The series looks good to me and I would apply it as
it is, but I'm waiting a bit longer in case you want to try out some
variations based on my tests as well. Here's what I did.

L0 is Debian testing, L1 are two similar (virt-clone'd) instances of
RHEL 9.5 (with passt-0^20250217.ga1e48a0-1.el9.x86_64 or local build with
this series, qemu-kvm-9.1.0-14.el9.x86_64, libvirt-10.10.0-7.el9.x86_64),
and L2 is Alpine 3.21-ish.

The two L1 instances (hosting the source and target guests), of course,
don't need to be run under libvirt, but they do in my case. They are
connected by passt, so that they share the same address internally, but
I'm forwarding different SSH ports to them.

Relevant libvirt XML snippets for L1 instances:

    <interface type='user'>
      <mac address='52:54:00:8a:9e:c2'/>
      <portForward proto='tcp'>
        <range start='1295' to='22'/>
      </portForward>
      <model type='virtio'/>
      <backend type='passt'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>

and:

    <interface type='user'>
      <mac address='52:54:00:b8:99:8c'/>
      <portForward proto='tcp'>
        <range start='11951' to='22'/>
      </portForward>
      <model type='virtio'/>
      <backend type='passt'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>

...I didn't switch those to vhost-user mode yet.

I prepared the L2 guest on L1 with:

  $ wget https://dl-cdn.alpinelinux.org/alpine/v3.21/releases/cloud/nocloud_alpine-3.21.2-x86_64-bios-tiny-r0.qcow2
  $ virt-customize -a nocloud_alpine-3.21.2-x86_64-bios-tiny-r0.qcow2 --root-password password:root
  $ virt-install -d --name alpine --memory 1024 --noreboot --osinfo alpinelinux3.20 --network backend.type=passt,portForward0.proto=tcp,portForward0.range0.start=40922,portForward0.range0.to=2222 --import --disk nocloud_alpine-3.21.2-x86_64-bios-tiny-r0.qcow2

And made sure I can connect via SSH to the second (target node) L1 with:

  $ ssh-copy-id -f -p 11951 $GATEWAY

There are some known SELinux issues at this point that I'm still
working on (similar for AppArmor), so I *temporarily* set it to
permissive mode with 'setenforce 0', on L1. Some were not known,
though, and it's taking me longer than expected.

Now I can start passt-repair (or not) on the source L1 (node):

  # passt-repair /run/user/1001/libvirt/qemu/run/passt/8-alpine-net0.socket.repair

and open a TCP connection in the source L2 guest ('virsh console alpine',
then login as root/root):

  # apk add inetutils-telnet
  # telnet passt.top 80

and finally ask libvirt to migrate the guest. Note that I need
"--unsafe" because I didn't care about migrating storage (it's good
enough to have the guest memory for this test).

Without this series, migration fails on the source:

  $ virsh migrate --verbose --p2p --live --unsafe alpine --tunneled qemu+ssh://88.198.0.161:10951/session
  Migration: [97.59 %]error: End of file while reading data: : Input/output error

...despite --verbose, the error doesn't say much (perhaps I need
LIBVIRT_DEBUG=1 instead?), but passt terminates at this point. With
this series (I just used 'make install' from the local build), migration
succeeds instead:

  $ virsh migrate --verbose --p2p --live --unsafe alpine --tunneled qemu+ssh://88.198.0.161:10951/session
  Migration: [100.00 %]

Now, on the target, I still have to figure out how to tell libvirt
to start QEMU and prepare for the migration (equivalent of '-incoming'
as we use in our tests), instead of just starting a new instance like
it does. Otherwise, I have no chance to start passt-repair there.
Perhaps it has something to do with persistent mode described here:

  https://libvirt.org/migration.html#configuration-file-handling

and --listen-address, but I'm not quite sure yet.

That is, I could only test different failures (early one on source, or
later one on target) with this, not a complete successful migration.

> There are more fragile cases that I'm looking to fix, particularly the
> die()s in flow_migrate_source_rollback() and elsewhere, however I ran
> into various complications that I didn't manage to sort out today.
> I'll continue looking at those tomorrow.  I'm now pretty confident
> that those additional fixes won't entirely supersede the changes in
> this series, so it should be fine to apply these on their own.

By the way, I think the somewhat less fragile/more obvious case where
we fail clumsily is when the target doesn't have the same address as
the source (among other possible addresses). In that case, we fail (and
terminate) with a rather awkward:

  93.7217: ERROR:   Failed to bind socket for migrated flow: Cannot assign requested address
  93.7218: ERROR:   Flow 0 (TCP connection): Can't set up socket: (null), drop
  93.7331: ERROR:   Selecting TCP_SEND_QUEUE, socket 1: Socket operation on non-socket
  93.7333: ERROR:   Unexpected reply from TCP_REPAIR helper: -100

that's because, oops, I only took care of socket() failures in
tcp_flow_repair_socket(), but not bind() failures (!). Sorry.

Once that's fixed, flow_migrate_target() should also take care of
decreasing 'count' accordingly. I only had a quick look and didn't
really try to sketch a fix.

-- 
Stefano