From: Stefano Brivio <sbrivio@redhat.com>
To: Laurent Vivier <lvivier@redhat.com>
Cc: passt-dev@passt.top, Jon Maloy <jmaloy@redhat.com>,
David GIbson <david@gibson.dropbear.id.au>
Subject: Re: [PATCH v4 00/10] vhost-user: Preparatory series for multiple iovec entries per virtqueue element
Date: Fri, 22 May 2026 06:22:40 +0200 (CEST) [thread overview]
Message-ID: <20260522062239.4fcd3314@elisabeth> (raw)
In-Reply-To: <50d79312-0493-4af0-b0bc-7c590885cbd2@redhat.com>
On Fri, 22 May 2026 01:13:33 +0200
Laurent Vivier <lvivier@redhat.com> wrote:
> On 5/21/26 10:30, Laurent Vivier wrote:
> > On 5/20/26 22:53, Stefano Brivio wrote:
> >> On Wed, 20 May 2026 18:18:52 +0200
> >> Stefano Brivio <sbrivio@redhat.com> wrote:
> >>
> >>> On Wed, 20 May 2026 18:07:08 +0200
> >>> Stefano Brivio <sbrivio@redhat.com> wrote:
> >>>
> >>>> On Wed, 20 May 2026 17:34:45 +0200
> >>>> Stefano Brivio <sbrivio@redhat.com> wrote:
> >>>>> On Wed, 13 May 2026 13:52:08 +0200
> >>>>> Laurent Vivier <lvivier@redhat.com> wrote:
> >>>>>> Currently, the vhost-user path assumes each virtqueue element contains
> >>>>>> exactly one iovec entry covering the entire frame. This assumption
> >>>>>> breaks as some virtio-net drivers (notably iPXE) provide descriptors where the
> >>>>>> vnet header and the frame payload are in separate buffers, resulting in
> >>>>>> two iovec entries per virtqueue element.
> >>>>>>
> >>>>>> This series refactors the vhost-user data path so that frame lengths,
> >>>>>> header sizes, and padding are tracked and passed explicitly rather than
> >>>>>> being derived from iovec sizes. This decoupling is a prerequisite for
> >>>>>> correctly handling padding of multi-buffer frames.
> >>>>>
> >>>>> Sorry to bring (likely) bad news, but this series seems to introduce a
> >>>>> regression: I got the migration/rampstream_in tests fail twice in a
> >>>>> row, which I've never saw happening (I think I saw a single failure a
> >>>>> long time ago when the machine had a high CPU load, but nothing else).
> >>>>>
> >>>>> I'm currently bisecting and the bisect seems to point towards the end
> >>>>> of the series (probably 10/10), but I haven't finished yet. I'll keep
> >>>>> you posted. I haven't spotted anything that might cause issues there.
> >>>>
> >>>> Yeah, that's the one :(
> >>>>
> >>>> $ git bisect bad
> >>>> db798fc60f4c5869cb53168354e068fb4dabd91a is the first bad commit
> >>>> commit db798fc60f4c5869cb53168354e068fb4dabd91a
> >>>> Author: Laurent Vivier <lvivier@redhat.com>
> >>>> Date: Wed May 13 13:52:18 2026 +0200
> >>>>
> >>>> vhost-user: Centralise Ethernet frame padding in vu_collect() and vu_pad()
> >
> > I checked on my system with the commit previous to this series,
> > bcc3d37a6e01 ("util: Fix changes to assert_with_msg()") and rampstream_in fails too (not
> > everytime).
> >
> > > TCP/IPv4: sequence check, ramps, inbound
> > ...failed.
> >
> > and rampstream_out hangs sometime too.
> >
> > I'm going to try with ealier commits.
>
> For me the problem can happen with any commit...
>
> As it depends on the execution path and on the load and speed of the system it looks like
> a race condition.
Hah, thanks for checking. Maybe...
> Did you try to test on a host with a kernel patched with
> "[PATCH net v2 0/2] Fix race condition between TCP_REPAIR dump and data receive" ?
Now I tried, and yes, the test doesn't hang anymore! I seem to have an
issue with teardown functions on recent kernels (current net.git HEAD
more or less):
---
+ teardown_migrate
+ cat /tmp/passt-tests-VVtLn0/migrate/qemu_1.pid
+ /home/sbrivio/passt/test/nstool exec /tmp/passt-tests-VVtLn0/migrate/ns1.hold -- kill 16
qemu-system-x86_64: terminating on signal 15 from pid 34 ()
+ cat /tmp/passt-tests-VVtLn0/migrate/qemu_2.pid
+ /home/sbrivio/passt/test/nstool exec /tmp/passt-tests-VVtLn0/migrate/ns1.hold -- kill 15
18.8974: ================ Vhost user message ================
18.8974: Request: VHOST_USER_GET_VRING_BASE (11)
18.8974: Flags: 0x1
18.8974: Size: 8
18.8974: State.index: 0
18.8975: ================ Vhost user message ================
18.8975: Request: VHOST_USER_GET_VRING_BASE (11)
18.8975: Flags: 0x1
18.8975: Size: 8
18.8975: State.index: 1
qemu-system-x86_64: terminating on signal 15 from pid 35 ()
18.7961: Client connection closed
18.7962: Closing TCP_REPAIR helper socket
+ context_wait qemu_1
+ __name=qemu_1
+ __pidfile=/tmp/passt-tests-VVtLn0/migrate/context_qemu_1.pid
+ cat /tmp/passt-tests-VVtLn0/migrate/context_qemu_1.pid
+ rc=0
+ rm /tmp/passt-tests-VVtLn0/migrate/context_passt_repair_2.stdout.9pwpVbQr /tmp/passt-tests-VVtLn0/migrate/context_passt_repair_2.stderr.dSY5hBu1
+ __pid=67766
+ rm /tmp/passt-tests-VVtLn0/migrate/context_qemu_1.pid
+ [ 1 -eq 1 ]
+ echo [Exit code: 0]
+ echo -n passt_repair_2$
+ return 0
18.9016: Client connection closed
18.9018: Closing TCP_REPAIR helper socket
+ wait 67766
+ rc=0
+ rm /tmp/passt-tests-VVtLn0/migrate/context_passt_repair_1.stdout.JEyDGxXe /tmp/passt-tests-VVtLn0/migrate/context_passt_repair_1.stderr.WU550iEI
+ [ 1 -eq 1 ]
+ echo [Exit code: 0]
+ echo -n passt_repair_1$
+ return 0
+ rc=0
+ rm /tmp/passt-tests-VVtLn0/migrate/context_qemu_2.stdout.Dm8EAhfl /tmp/passt-tests-VVtLn0/migrate/context_qemu_2.stderr.207qJYPA
+ [ 1 -eq 1 ]
+ echo [Exit code: 0]
+ echo -n qemu_2$
+ return 0
2026/05/22 04:08:23 socat[73089] E connect(5, AF=40 cid:94558 port:22, 16): Connection timed out
Connection closed by UNKNOWN port 65535
...
---
it looks like we stop QEMU a bit too early. But it should be unrelated.
I'm now trying to find some kind of workaround for existing (not fixed)
kernel versions. Maybe stopping rampstream_in for a moment or something
like that.
--
Stefano
next prev parent reply other threads:[~2026-05-22 4:22 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-13 11:52 Laurent Vivier
2026-05-13 11:52 ` [PATCH v4 01/10] iov: Introduce iov_memset() Laurent Vivier
2026-05-13 11:52 ` [PATCH v4 02/10] iov: Add iov_memcpy() to copy data between iovec arrays Laurent Vivier
2026-05-13 11:52 ` [PATCH v4 03/10] vu_common: Move vnethdr setup into vu_flush() Laurent Vivier
2026-05-13 11:52 ` [PATCH v4 04/10] udp_vu: Move virtqueue management from udp_vu_sock_recv() to its caller Laurent Vivier
2026-05-13 11:52 ` [PATCH v4 05/10] udp_vu: Pass iov explicitly to helpers instead of using file-scoped array Laurent Vivier
2026-05-13 11:52 ` [PATCH v4 06/10] checksum: Pass explicit L4 length to checksum functions Laurent Vivier
2026-05-13 11:52 ` [PATCH v4 07/10] pcap: Pass explicit L2 length to pcap_iov() Laurent Vivier
2026-05-13 11:52 ` [PATCH v4 08/10] vu_common: Pass explicit frame length to vu_flush() Laurent Vivier
2026-05-13 11:52 ` [PATCH v4 09/10] tcp: Pass explicit data length to tcp_fill_headers() Laurent Vivier
2026-05-13 11:52 ` [PATCH v4 10/10] vhost-user: Centralise Ethernet frame padding in vu_collect() and vu_pad() Laurent Vivier
2026-05-14 1:24 ` David Gibson
2026-05-20 0:52 ` [PATCH v4 00/10] vhost-user: Preparatory series for multiple iovec entries per virtqueue element Stefano Brivio
2026-05-20 15:34 ` Stefano Brivio
2026-05-20 16:07 ` Stefano Brivio
2026-05-20 16:18 ` Stefano Brivio
2026-05-20 20:53 ` Stefano Brivio
2026-05-21 8:30 ` Laurent Vivier
2026-05-21 23:13 ` Laurent Vivier
2026-05-22 4:22 ` Stefano Brivio [this message]
2026-05-22 5:44 ` Stefano Brivio
2026-05-22 6:15 ` David GIbson
2026-05-22 6:23 ` Stefano Brivio
2026-05-22 6:36 ` David GIbson
2026-05-22 6:45 ` Stefano Brivio
2026-05-22 12:04 ` Stefano Brivio
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260522062239.4fcd3314@elisabeth \
--to=sbrivio@redhat.com \
--cc=david@gibson.dropbear.id.au \
--cc=jmaloy@redhat.com \
--cc=lvivier@redhat.com \
--cc=passt-dev@passt.top \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).