From: Stefano Brivio
To: David GIbson
CC: Laurent Vivier, passt-dev@passt.top, Jon Maloy
Subject: Re: [PATCH v4 00/10] vhost-user: Preparatory series for multiple iovec entries per virtqueue element
Date: Fri, 22 May 2026 08:45:34 +0200 (CEST)
Message-ID: <20260522084533.6094dbd5@elisabeth>
Organization: Red Hat
List-Id: Development discussion and patches for passt

On Fri, 22 May 2026 16:36:52 +1000
David GIbson wrote:

> On Fri, May 22, 2026 at 08:23:50AM +0200, Stefano Brivio wrote:
> > On Fri, 22 May 2026 16:15:08 +1000
> > David GIbson wrote:
> >
> > > On Fri, May 22, 2026 at 07:44:56AM +0200, Stefano Brivio wrote:
> > > > On Fri, 22 May 2026 06:22:39 +0200
> > > > Stefano Brivio wrote:
> > > >
> > > > > On Fri, 22 May 2026 01:13:33 +0200
> > > > > Laurent Vivier wrote:
> > > > >
> > > > > > On 5/21/26 10:30, Laurent Vivier wrote:
> > > > > > > On 5/20/26 22:53, Stefano Brivio wrote:
> > > > > > >> On Wed, 20 May 2026 18:18:52 +0200
> > > > > > >> Stefano Brivio wrote:
> > > > > > >>
> > > > > > >>> On Wed, 20 May 2026 18:07:08 +0200
> > > > > > >>> Stefano Brivio wrote:
> > > > > > >>>
> > > > > > >>>> On Wed, 20 May 2026 17:34:45 +0200
> > > > > > >>>> Stefano Brivio wrote:
> > > > > > >>>>> On Wed, 13 May 2026 13:52:08 +0200
> > > > > > >>>>> Laurent Vivier wrote:
> > > > > > >>>>>> Currently, the vhost-user path assumes each virtqueue element contains
> > > > > > >>>>>> exactly one iovec entry covering the entire frame.  This assumption
> > > > > > >>>>>> breaks as some virtio-net drivers (notably iPXE) provide descriptors where the
> > > > > > >>>>>> vnet header and the frame payload are in separate buffers, resulting in
> > > > > > >>>>>> two iovec entries per virtqueue element.
> > > > > > >>>>>>
> > > > > > >>>>>> This series refactors the vhost-user data path so that frame lengths,
> > > > > > >>>>>> header sizes, and padding are tracked and passed explicitly rather than
> > > > > > >>>>>> being derived from iovec sizes.  This decoupling is a prerequisite for
> > > > > > >>>>>> correctly handling padding of multi-buffer frames.
> > > > > > >>>>>
> > > > > > >>>>> Sorry to bring (likely) bad news, but this series seems to introduce a
> > > > > > >>>>> regression: I got the migration/rampstream_in tests failing twice in a
> > > > > > >>>>> row, which I've never seen happening (I think I saw a single failure a
> > > > > > >>>>> long time ago when the machine had a high CPU load, but nothing else).
> > > > > > >>>>>
> > > > > > >>>>> I'm currently bisecting and the bisect seems to point towards the end
> > > > > > >>>>> of the series (probably 10/10), but I haven't finished yet. I'll keep
> > > > > > >>>>> you posted. I haven't spotted anything that might cause issues there.
> > > > > > >>>>
> > > > > > >>>> Yeah, that's the one :(
> > > > > > >>>>
> > > > > > >>>> $ git bisect bad
> > > > > > >>>> db798fc60f4c5869cb53168354e068fb4dabd91a is the first bad commit
> > > > > > >>>> commit db798fc60f4c5869cb53168354e068fb4dabd91a
> > > > > > >>>> Author: Laurent Vivier
> > > > > > >>>> Date:   Wed May 13 13:52:18 2026 +0200
> > > > > > >>>>
> > > > > > >>>>     vhost-user: Centralise Ethernet frame padding in vu_collect() and vu_pad()
> > > > > > >
> > > > > > > I checked on my system with the commit previous to this series,
> > > > > > > bcc3d37a6e01 ("util: Fix changes to assert_with_msg()") and rampstream_in fails too (not
> > > > > > > every time).
> > > > > > >
> > > > > > > > TCP/IPv4: sequence check, ramps, inbound
> > > > > > > ...failed.
> > > > > > >
> > > > > > > and rampstream_out hangs sometimes too.
> > > > > > >
> > > > > > > I'm going to try with earlier commits.
> > > > > >
> > > > > > For me the problem can happen with any commit...
> > > > > >
> > > > > > As it depends on the execution path and on the load and speed of the system it looks like
> > > > > > a race condition.
> > > > >
> > > > > Hah, thanks for checking. Maybe...
> > > > >
> > > > > > Did you try to test on a host with a kernel patched with
> > > > > > "[PATCH net v2 0/2] Fix race condition between TCP_REPAIR dump and data receive" ?
> > > > >
> > > > > Now I tried, and yes, the test doesn't hang anymore! I seem to have an
> > > > > issue with teardown functions on recent kernels (current net.git HEAD
> > > > > more or less):
> > > > >
> > > > > ---
> > > > > + teardown_migrate
> > > > > + cat /tmp/passt-tests-VVtLn0/migrate/qemu_1.pid
> > > > > + /home/sbrivio/passt/test/nstool exec /tmp/passt-tests-VVtLn0/migrate/ns1.hold -- kill 16
> > > > > qemu-system-x86_64: terminating on signal 15 from pid 34 ()
> > > > > + cat /tmp/passt-tests-VVtLn0/migrate/qemu_2.pid
> > > > > + /home/sbrivio/passt/test/nstool exec /tmp/passt-tests-VVtLn0/migrate/ns1.hold -- kill 15
> > > > > 18.8974: ================ Vhost user message ================
> > > > > 18.8974: Request: VHOST_USER_GET_VRING_BASE (11)
> > > > > 18.8974: Flags: 0x1
> > > > > 18.8974: Size: 8
> > > > > 18.8974: State.index: 0
> > > > > 18.8975: ================ Vhost user message ================
> > > > > 18.8975: Request: VHOST_USER_GET_VRING_BASE (11)
> > > > > 18.8975: Flags: 0x1
> > > > > 18.8975: Size: 8
> > > > > 18.8975: State.index: 1
> > > > > qemu-system-x86_64: terminating on signal 15 from pid 35 ()
> > > > > 18.7961: Client connection closed
> > > > > 18.7962: Closing TCP_REPAIR helper socket
> > > > > + context_wait qemu_1
> > > > > + __name=qemu_1
> > > > > + __pidfile=/tmp/passt-tests-VVtLn0/migrate/context_qemu_1.pid
> > > > > + cat /tmp/passt-tests-VVtLn0/migrate/context_qemu_1.pid
> > > > > + rc=0
> > > > > + rm /tmp/passt-tests-VVtLn0/migrate/context_passt_repair_2.stdout.9pwpVbQr /tmp/passt-tests-VVtLn0/migrate/context_passt_repair_2.stderr.dSY5hBu1
> > > > > + __pid=67766
> > > > > + rm /tmp/passt-tests-VVtLn0/migrate/context_qemu_1.pid
> > > > > + [ 1 -eq 1 ]
> > > > > + echo [Exit code: 0]
> > > > > + echo -n passt_repair_2$
> > > > > + return 0
> > > > > 18.9016: Client connection closed
> > > > > 18.9018: Closing TCP_REPAIR helper socket
> > > > > + wait 67766
> > > > > + rc=0
> > > > > + rm /tmp/passt-tests-VVtLn0/migrate/context_passt_repair_1.stdout.JEyDGxXe /tmp/passt-tests-VVtLn0/migrate/context_passt_repair_1.stderr.WU550iEI
> > > > > + [ 1 -eq 1 ]
> > > > > + echo [Exit code: 0]
> > > > > + echo -n passt_repair_1$
> > > > > + return 0
> > > > > + rc=0
> > > > > + rm /tmp/passt-tests-VVtLn0/migrate/context_qemu_2.stdout.Dm8EAhfl /tmp/passt-tests-VVtLn0/migrate/context_qemu_2.stderr.207qJYPA
> > > > > + [ 1 -eq 1 ]
> > > > > + echo [Exit code: 0]
> > > > > + echo -n qemu_2$
> > > > > + return 0
> > > > > 2026/05/22 04:08:23 socat[73089] E connect(5, AF=40 cid:94558 port:22, 16): Connection timed out
> > > > > Connection closed by UNKNOWN port 65535
> > > > > ...
> > > > > ---
> > > > >
> > > > > It looks like we stop QEMU a bit too early. But it should be unrelated.
> > > > >
> > > > > I'm now trying to find some kind of workaround for existing (not fixed)
> > > > > kernel versions. Maybe stopping rampstream_in for a moment or something
> > > > > like that.
> > > >
> > > > For some weird reason even very blatant throttling (100 ms - 1 s delays
> > > > every 10000 ramps, or an explicit 500 ms pause via signal before
> > > > migration) doesn't help.
> > > >
> > > > So it doesn't seem to be *that* kind of race. I should probably check
> > > > the same exact kernel version with fix and without...
> > >
> > > If it's due to the kernel not stopping the queues on REPAIR, then the
> > > only real way to fix the test is to cut off the source machine's
> > > network before we trigger migration.
> >
> > Well, that's a rather complicated way to do it. One could simply stop
> > the traffic instead.
>
> I don't know that "simply" is quite so simple. You can suspend the
> source of the data, but you need to wait a difficult-to-ascertain
> amount of time for that to make it to the guest, and all the acks to
> come back.

Looking at captures, that part seems to be around 1-2 ms, so I'm
waiting 100 ms.

> For rampstream_out it's worse: the source is in the guest which isn't
> supposed to know about the migration in advance, so you can't really
> stop it without stopping the guest's whole network.

But we don't have a problem with that one.

-- 
Stefano