From: Stefano Brivio
To: Laurent Vivier
CC: passt-dev@passt.top, Jon Maloy, David Gibson
Subject: Re: [PATCH v4 00/10] vhost-user: Preparatory series for multiple iovec entries per virtqueue element
Date: Fri, 22 May 2026 06:22:40 +0200 (CEST)

On Fri, 22 May 2026 01:13:33 +0200
Laurent Vivier wrote:

> On 5/21/26 10:30, Laurent Vivier wrote:
> > On 5/20/26 22:53, Stefano Brivio wrote:
> >> On Wed, 20 May 2026 18:18:52 +0200
> >> Stefano Brivio wrote:
> >>
> >>> On Wed, 20 May 2026 18:07:08 +0200
> >>> Stefano Brivio wrote:
> >>>
> >>>> On Wed, 20 May 2026 17:34:45 +0200
> >>>> Stefano Brivio wrote:
> >>>>> On Wed, 13 May 2026 13:52:08 +0200
> >>>>> Laurent Vivier wrote:
> >>>>>> Currently, the vhost-user path assumes each virtqueue element contains
> >>>>>> exactly one iovec entry covering the entire frame.  This assumption
> >>>>>> breaks as some virtio-net drivers (notably iPXE) provide descriptors where the
> >>>>>> vnet header and the frame payload are in separate buffers, resulting in
> >>>>>> two iovec entries per virtqueue element.
> >>>>>>
> >>>>>> This series refactors the vhost-user data path so that frame lengths,
> >>>>>> header sizes, and padding are tracked and passed explicitly rather than
> >>>>>> being derived from iovec sizes.  This decoupling is a prerequisite for
> >>>>>> correctly handling padding of multi-buffer frames.
> >>>>>
> >>>>> Sorry to bring (likely) bad news, but this series seems to introduce a
> >>>>> regression: I got the migration/rampstream_in tests to fail twice in a
> >>>>> row, which I had never seen happening (I think I saw a single failure a
> >>>>> long time ago when the machine had a high CPU load, but nothing else).
> >>>>>
> >>>>> I'm currently bisecting and the bisect seems to point towards the end
> >>>>> of the series (probably 10/10), but I haven't finished yet. I'll keep
> >>>>> you posted. I haven't spotted anything that might cause issues there.
> >>>>
> >>>> Yeah, that's the one :(
> >>>>
> >>>> $ git bisect bad
> >>>> db798fc60f4c5869cb53168354e068fb4dabd91a is the first bad commit
> >>>> commit db798fc60f4c5869cb53168354e068fb4dabd91a
> >>>> Author: Laurent Vivier
> >>>> Date:   Wed May 13 13:52:18 2026 +0200
> >>>>
> >>>>     vhost-user: Centralise Ethernet frame padding in vu_collect() and vu_pad()
> >
> > I checked on my system with the commit previous to this series,
> > bcc3d37a6e01 ("util: Fix changes to assert_with_msg()") and rampstream_in fails too (not
> > every time).
> >
> > > TCP/IPv4: sequence check, ramps, inbound
> > ...failed.
> >
> > and rampstream_out hangs sometimes too.
> >
> > I'm going to try with earlier commits.
>
> For me the problem can happen with any commit...
>
> As it depends on the execution path and on the load and speed of the system it looks like
> a race condition.

Hah, thanks for checking. Maybe...

> Did you try to test on a host with a kernel patched with
> "[PATCH net v2 0/2] Fix race condition between TCP_REPAIR dump and data receive" ?

Now I tried, and yes, the test doesn't hang anymore!
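For anyone following along: the quoted cover letter describes frames whose vnet header and payload may sit in separate buffers of a single virtqueue element, with frame length, header size and padding tracked explicitly instead of being inferred from iovec sizes. A minimal C sketch of that idea follows. It is not the code from the series: fill_element() and MIN_FRAME_LEN are hypothetical names, and it assumes a 60-byte minimum frame (the value of ETH_ZLEN, which excludes the FCS), zero padding, and a caller that already knows the header length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical name: minimum Ethernet frame length without FCS (60 bytes,
 * the same value as ETH_ZLEN). */
#define MIN_FRAME_LEN 60

static size_t min_size(size_t a, size_t b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical helper: write a vnet header (hdr_len bytes), then the frame
 * (frame_len bytes), then zero padding up to MIN_FRAME_LEN, across however
 * many buffers the element provides. Returns the number of bytes written,
 * or 0 if the buffers can't hold the header plus the padded frame.
 */
static size_t fill_element(const struct iovec *iov, size_t iov_cnt,
			   const void *hdr, size_t hdr_len,
			   const void *frame, size_t frame_len)
{
	size_t padded = frame_len < MIN_FRAME_LEN ? MIN_FRAME_LEN : frame_len;
	size_t total = hdr_len + padded;
	size_t cap = 0, done = 0, i;

	for (i = 0; i < iov_cnt; i++)
		cap += iov[i].iov_len;
	if (cap < total)
		return 0;

	for (i = 0; i < iov_cnt && done < total; i++) {
		char *dst = iov[i].iov_base;
		size_t off = 0;

		while (off < iov[i].iov_len && done < total) {
			size_t room = iov[i].iov_len - off, n;

			if (done < hdr_len) {			/* vnet header */
				n = min_size(room, hdr_len - done);
				memcpy(dst + off, (const char *)hdr + done, n);
			} else if (done < hdr_len + frame_len) {	/* payload */
				n = min_size(room, hdr_len + frame_len - done);
				memcpy(dst + off,
				       (const char *)frame + (done - hdr_len), n);
			} else {				/* padding */
				n = min_size(room, total - done);
				memset(dst + off, 0, n);
			}
			off += n;
			done += n;
		}
	}

	/* cap >= total was checked above, so everything must have fit */
	assert(done == total);
	return done;
}
```

Because hdr_len and frame_len are passed in rather than derived from iov_len, the same call handles one buffer covering the whole element and the two-buffer layout the cover letter attributes to iPXE.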
I seem to have an issue with teardown functions on recent kernels (current net.git HEAD, more or less):

---
+ teardown_migrate
+ cat /tmp/passt-tests-VVtLn0/migrate/qemu_1.pid
+ /home/sbrivio/passt/test/nstool exec /tmp/passt-tests-VVtLn0/migrate/ns1.hold -- kill 16
qemu-system-x86_64: terminating on signal 15 from pid 34 ()
+ cat /tmp/passt-tests-VVtLn0/migrate/qemu_2.pid
+ /home/sbrivio/passt/test/nstool exec /tmp/passt-tests-VVtLn0/migrate/ns1.hold -- kill 15
18.8974: ================ Vhost user message ================
18.8974: Request: VHOST_USER_GET_VRING_BASE (11)
18.8974: Flags: 0x1
18.8974: Size: 8
18.8974: State.index: 0
18.8975: ================ Vhost user message ================
18.8975: Request: VHOST_USER_GET_VRING_BASE (11)
18.8975: Flags: 0x1
18.8975: Size: 8
18.8975: State.index: 1
qemu-system-x86_64: terminating on signal 15 from pid 35 ()
18.7961: Client connection closed
18.7962: Closing TCP_REPAIR helper socket
+ context_wait qemu_1
+ __name=qemu_1
+ __pidfile=/tmp/passt-tests-VVtLn0/migrate/context_qemu_1.pid
+ cat /tmp/passt-tests-VVtLn0/migrate/context_qemu_1.pid
+ rc=0
+ rm /tmp/passt-tests-VVtLn0/migrate/context_passt_repair_2.stdout.9pwpVbQr /tmp/passt-tests-VVtLn0/migrate/context_passt_repair_2.stderr.dSY5hBu1
+ __pid=67766
+ rm /tmp/passt-tests-VVtLn0/migrate/context_qemu_1.pid
+ [ 1 -eq 1 ]
+ echo [Exit code: 0]
+ echo -n passt_repair_2$
+ return 0
18.9016: Client connection closed
18.9018: Closing TCP_REPAIR helper socket
+ wait 67766
+ rc=0
+ rm /tmp/passt-tests-VVtLn0/migrate/context_passt_repair_1.stdout.JEyDGxXe /tmp/passt-tests-VVtLn0/migrate/context_passt_repair_1.stderr.WU550iEI
+ [ 1 -eq 1 ]
+ echo [Exit code: 0]
+ echo -n passt_repair_1$
+ return 0
+ rc=0
+ rm /tmp/passt-tests-VVtLn0/migrate/context_qemu_2.stdout.Dm8EAhfl /tmp/passt-tests-VVtLn0/migrate/context_qemu_2.stderr.207qJYPA
+ [ 1 -eq 1 ]
+ echo [Exit code: 0]
+ echo -n qemu_2$
+ return 0
2026/05/22 04:08:23 socat[73089] E connect(5, AF=40 cid:94558 port:22, 16): Connection timed out
Connection closed by UNKNOWN port 65535
...
---

It looks like we stop QEMU a bit too early, but that should be unrelated.

I'm now trying to find some kind of workaround for existing (unfixed) kernel versions. Maybe stopping rampstream_in for a moment, or something like that.

-- 
Stefano