On Fri, May 22, 2026 at 07:44:56AM +0200, Stefano Brivio wrote:
> On Fri, 22 May 2026 06:22:39 +0200
> Stefano Brivio <sbrivio@redhat.com> wrote:
> 
> > On Fri, 22 May 2026 01:13:33 +0200
> > Laurent Vivier <lvivier@redhat.com> wrote:
> > 
> > > On 5/21/26 10:30, Laurent Vivier wrote:  
> > > > On 5/20/26 22:53, Stefano Brivio wrote:    
> > > >> On Wed, 20 May 2026 18:18:52 +0200
> > > >> Stefano Brivio <sbrivio@redhat.com> wrote:
> > > >>    
> > > >>> On Wed, 20 May 2026 18:07:08 +0200
> > > >>> Stefano Brivio <sbrivio@redhat.com> wrote:
> > > >>>    
> > > >>>> On Wed, 20 May 2026 17:34:45 +0200
> > > >>>> Stefano Brivio <sbrivio@redhat.com> wrote:    
> > > >>>>> On Wed, 13 May 2026 13:52:08 +0200
> > > >>>>> Laurent Vivier <lvivier@redhat.com> wrote:    
> > > >>>>>> Currently, the vhost-user path assumes each virtqueue element contains
> > > >>>>>> exactly one iovec entry covering the entire frame.  This assumption
> > > >>>>>> breaks as some virtio-net drivers (notably iPXE) provide descriptors where the
> > > >>>>>> vnet header and the frame payload are in separate buffers, resulting in
> > > >>>>>> two iovec entries per virtqueue element.
> > > >>>>>>
> > > >>>>>> This series refactors the vhost-user data path so that frame lengths,
> > > >>>>>> header sizes, and padding are tracked and passed explicitly rather than
> > > >>>>>> being derived from iovec sizes.  This decoupling is a prerequisite for
> > > >>>>>> correctly handling padding of multi-buffer frames.    
> > > >>>>>
> > > >>>>> Sorry to bring (likely) bad news, but this series seems to introduce a
> > > >>>>> regression: I got the migration/rampstream_in tests fail twice in a
> > > >>>>> row, which I've never saw happening (I think I saw a single failure a
> > > >>>>> long time ago when the machine had a high CPU load, but nothing else).
> > > >>>>>
> > > >>>>> I'm currently bisecting and the bisect seems to point towards the end
> > > >>>>> of the series (probably 10/10), but I haven't finished yet. I'll keep
> > > >>>>> you posted. I haven't spotted anything that might cause issues there.    
> > > >>>>
> > > >>>> Yeah, that's the one :(
> > > >>>>
> > > >>>> $ git bisect bad
> > > >>>> db798fc60f4c5869cb53168354e068fb4dabd91a is the first bad commit
> > > >>>> commit db798fc60f4c5869cb53168354e068fb4dabd91a
> > > >>>> Author: Laurent Vivier <lvivier@redhat.com>
> > > >>>> Date:   Wed May 13 13:52:18 2026 +0200
> > > >>>>
> > > >>>>      vhost-user: Centralise Ethernet frame padding in vu_collect() and vu_pad()    
> > > > 
> > > > I checked on my system with the commit previous to this series,
> > > > bcc3d37a6e01 ("util: Fix changes to assert_with_msg()") and rampstream_in fails too (not 
> > > > everytime).
> > > >     
> > > >  > TCP/IPv4: sequence check, ramps, inbound    
> > > > ...failed.
> > > > 
> > > > and rampstream_out hangs sometime too.
> > > > 
> > > > I'm going to try with ealier commits.    
> > > 
> > > For me the problem can happen with any commit...
> > > 
> > > As it depends on the execution path and on the load and speed of the system it looks like 
> > > a race condition.  
> > 
> > Hah, thanks for checking. Maybe...
> > 
> > > Did you try to test on a host with a kernel patched with
> > > "[PATCH net v2 0/2] Fix race condition between TCP_REPAIR dump and data receive" ?  
> > 
> > Now I tried, and yes, the test doesn't hang anymore! I seem to have an
> > issue with teardown functions on recent kernels (current net.git HEAD
> > more or less):
> > 
> > ---
> > + teardown_migrate
> > + cat /tmp/passt-tests-VVtLn0/migrate/qemu_1.pid
> > + /home/sbrivio/passt/test/nstool exec /tmp/passt-tests-VVtLn0/migrate/ns1.hold -- kill 16
> > qemu-system-x86_64: terminating on signal 15 from pid 34 ()
> > + cat /tmp/passt-tests-VVtLn0/migrate/qemu_2.pid
> > + /home/sbrivio/passt/test/nstool exec /tmp/passt-tests-VVtLn0/migrate/ns1.hold -- kill 15
> > 18.8974: ================ Vhost user message ================
> > 18.8974: Request: VHOST_USER_GET_VRING_BASE (11)
> > 18.8974: Flags:   0x1
> > 18.8974: Size:    8
> > 18.8974: State.index: 0
> > 18.8975: ================ Vhost user message ================
> > 18.8975: Request: VHOST_USER_GET_VRING_BASE (11)
> > 18.8975: Flags:   0x1
> > 18.8975: Size:    8
> > 18.8975: State.index: 1
> > qemu-system-x86_64: terminating on signal 15 from pid 35 ()
> > 18.7961: Client connection closed
> > 18.7962: Closing TCP_REPAIR helper socket
> > + context_wait qemu_1
> > + __name=qemu_1
> > + __pidfile=/tmp/passt-tests-VVtLn0/migrate/context_qemu_1.pid
> > + cat /tmp/passt-tests-VVtLn0/migrate/context_qemu_1.pid
> > + rc=0
> > + rm /tmp/passt-tests-VVtLn0/migrate/context_passt_repair_2.stdout.9pwpVbQr /tmp/passt-tests-VVtLn0/migrate/context_passt_repair_2.stderr.dSY5hBu1
> > + __pid=67766
> > + rm /tmp/passt-tests-VVtLn0/migrate/context_qemu_1.pid
> > + [ 1 -eq 1 ]
> > + echo [Exit code: 0]
> > + echo -n passt_repair_2$ 
> > + return 0
> > 18.9016: Client connection closed
> > 18.9018: Closing TCP_REPAIR helper socket
> > + wait 67766
> > + rc=0
> > + rm /tmp/passt-tests-VVtLn0/migrate/context_passt_repair_1.stdout.JEyDGxXe /tmp/passt-tests-VVtLn0/migrate/context_passt_repair_1.stderr.WU550iEI
> > + [ 1 -eq 1 ]
> > + echo [Exit code: 0]
> > + echo -n passt_repair_1$ 
> > + return 0
> > + rc=0
> > + rm /tmp/passt-tests-VVtLn0/migrate/context_qemu_2.stdout.Dm8EAhfl /tmp/passt-tests-VVtLn0/migrate/context_qemu_2.stderr.207qJYPA
> > + [ 1 -eq 1 ]
> > + echo [Exit code: 0]
> > + echo -n qemu_2$ 
> > + return 0
> > 2026/05/22 04:08:23 socat[73089] E connect(5, AF=40 cid:94558 port:22, 16): Connection timed out
> > Connection closed by UNKNOWN port 65535
> > ...
> > ---
> > 
> > it looks like we stop QEMU a bit too early. But it should be unrelated.
> > 
> > I'm now trying to find some kind of workaround for existing (not fixed)
> > kernel versions. Maybe stopping rampstream_in for a moment or something
> > like that.
> 
> For some weird reason even very blatant throttling (100 ms - 1 s delays
> every 10000 ramps, or an explicit 500 ms pause via signal before
> migration) doesn't help.
> 
> So it doesn't seem to be *that* kind of race. I should probably check
> the same exact kernel version with fix and without...

If it's due to the kernel not stopping the queues on REPAIR, then the
only real way to fix the test is to cut off the source machine's
network before we trigger migration.  That could be done with
netfilter (in a user+netns).  But probably more natural would be to
not do the migration between local passt instances, but actually
between two host namespaces, with separate netifs for external
connectivity and for the migration.  Remove the external netif on the
source, then trigger migration, then add the external netif on the
destination.

It's quite a bit of hassle :(.  But it does model something much
closer to a real migration scenario.  As a bonus it would mean we'd no
longer rely on the hack of guessing when to exit the source passt in
order to allow the destination passt to bind.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson