From: Stefano Brivio
To: Laurent Vivier
CC: passt-dev@passt.top, Jon Maloy, David Gibson
Subject: Re: [PATCH v4 00/10] vhost-user: Preparatory series for multiple iovec entries per virtqueue element
Date: Tue, 26 May 2026 09:59:56 +0200 (CEST)
Message-ID: <20260526095955.008a6ea1@elisabeth>
Organization: Red Hat
List-Id: Development discussion and patches for passt

On Tue, 26 May 2026 09:31:51 +0200
Laurent Vivier wrote:

> On 5/22/26 14:04, Stefano Brivio wrote:
> > On Fri, 22 May 2026 07:44:55 +0200
> > Stefano Brivio wrote:
> >
> >> On Fri, 22 May 2026 06:22:39 +0200
> >> Stefano Brivio wrote:
> >>
> >>> On Fri, 22 May 2026 01:13:33 +0200
> >>> Laurent Vivier wrote:
> >>>
> >>>> On 5/21/26 10:30, Laurent Vivier wrote:
> >>>>> On 5/20/26 22:53, Stefano Brivio wrote:
> >>>>>> On Wed, 20 May 2026 18:18:52 +0200
> >>>>>> Stefano Brivio wrote:
> >>>>>>
> >>>>>>> On Wed, 20 May 2026 18:07:08 +0200
> >>>>>>> Stefano Brivio wrote:
> >>>>>>>
> >>>>>>>> On Wed, 20 May 2026 17:34:45 +0200
> >>>>>>>> Stefano Brivio wrote:
> >>>>>>>>> On Wed, 13 May 2026 13:52:08 +0200
> >>>>>>>>> Laurent Vivier wrote:
> >>>>>>>>>> Currently, the vhost-user path assumes each virtqueue element contains
> >>>>>>>>>> exactly one iovec entry covering the entire frame.  This assumption
> >>>>>>>>>> breaks as some virtio-net drivers (notably iPXE) provide descriptors where the
> >>>>>>>>>> vnet header and the frame payload are in separate buffers, resulting in
> >>>>>>>>>> two iovec entries per virtqueue element.
> >>>>>>>>>>
> >>>>>>>>>> This series refactors the vhost-user data path so that frame lengths,
> >>>>>>>>>> header sizes, and padding are tracked and passed explicitly rather than
> >>>>>>>>>> being derived from iovec sizes.  This decoupling is a prerequisite for
> >>>>>>>>>> correctly handling padding of multi-buffer frames.
> >>>>>>>>>
> >>>>>>>>> Sorry to bring (likely) bad news, but this series seems to introduce a
> >>>>>>>>> regression: I got the migration/rampstream_in tests failing twice in a
> >>>>>>>>> row, which I've never seen happening (I think I saw a single failure a
> >>>>>>>>> long time ago when the machine had a high CPU load, but nothing else).
> >>>>>>>>>
> >>>>>>>>> I'm currently bisecting and the bisect seems to point towards the end
> >>>>>>>>> of the series (probably 10/10), but I haven't finished yet. I'll keep
> >>>>>>>>> you posted. I haven't spotted anything that might cause issues there.
> >>>>>>>>
> >>>>>>>> Yeah, that's the one :(
> >>>>>>>>
> >>>>>>>> $ git bisect bad
> >>>>>>>> db798fc60f4c5869cb53168354e068fb4dabd91a is the first bad commit
> >>>>>>>> commit db798fc60f4c5869cb53168354e068fb4dabd91a
> >>>>>>>> Author: Laurent Vivier
> >>>>>>>> Date:   Wed May 13 13:52:18 2026 +0200
> >>>>>>>>
> >>>>>>>>     vhost-user: Centralise Ethernet frame padding in vu_collect() and vu_pad()
> >>>>>
> >>>>> I checked on my system with the commit previous to this series,
> >>>>> bcc3d37a6e01 ("util: Fix changes to assert_with_msg()") and rampstream_in fails too (not
> >>>>> every time).
> >>>>>
> >>>>> > TCP/IPv4: sequence check, ramps, inbound
> >>>>> ...failed.
> >>>>>
> >>>>> and rampstream_out hangs sometimes too.
> >>>>>
> >>>>> I'm going to try with earlier commits.
> >>>>
> >>>> For me the problem can happen with any commit...
> >>>>
> >>>> As it depends on the execution path and on the load and speed of the system it looks like
> >>>> a race condition.
> >>>
> >>> Hah, thanks for checking. Maybe...
> >>>
> >>>> Did you try to test on a host with a kernel patched with
> >>>> "[PATCH net v2 0/2] Fix race condition between TCP_REPAIR dump and data receive" ?
> >>>
> >>> Now I tried, and yes, the test doesn't hang anymore! I seem to have an
> >>> issue with teardown functions on recent kernels (current net.git HEAD
> >>> more or less):
> >>>
> >>> ---
> >>> [...]
> >>>
> >>> 2026/05/22 04:08:23 socat[73089] E connect(5, AF=40 cid:94558 port:22, 16): Connection timed out
> >>> Connection closed by UNKNOWN port 65535
> >>> ...
> >>> ---
> >>>
> >>> it looks like we stop QEMU a bit too early. But it should be unrelated.
> >
> > Oops, I forgot to upgrade QEMU on the virtual machine I was using to
> > test those kernel builds, I had a somewhat outdated 8.1 version and it
> > failed migration for unrelated reasons. It works with 11.0.
> >
> > Back to kernel versions: the "problem" is that with a recent
> > net-next.git HEAD, with or without my fix, in a nested VM, the test
> > always passes (20/20). And I can't easily test things non-nested.
> >
> > I guess I could just skip that test, for the moment, in the set I run on
> > git push, and run it manually in the virtual machine.
> >
> > But judging from captures (test_logs/pasta_1.pcap from PCAP=1 ./run)
> > I'm fairly sure it's not *that* issue:
> >
> > 465 12.141763 192.0.2.1 → 88.198.0.164 58451 TCP [TCP Window Full] 34416 → 10001 [PSH, ACK] Seq=10002100 Ack=1 Win=65536 Len=58397
> > 466 12.187195 88.198.0.164 → 192.0.2.1 54 TCP [TCP ZeroWindow] 10001 → 34416 [ACK] Seq=1 Ack=10060497 Win=0 Len=0
> > 467 13.187281 192.0.2.1 → 88.198.0.164 4150 TCP 34416 → 10001 [PSH, ACK] Seq=10060497 Ack=1 Win=65536 Len=4096
> >
> > last data transfer from client (rampstream):
> >
> > 468 13.187358 88.198.0.164 → 192.0.2.1 54 TCP [TCP ZeroWindow] 10001 → 34416 [ACK] Seq=1 Ack=10060497 Win=0 Len=0
> >
> > everything acknowledged, migration starts now:
> >
> > 469 14.143217 fe80::f471:c3ff:fe10:4e45 → ff02::2 70 ICMPv6 Router Solicitation from f6:71:c3:10:4e:45
> > 470 14.687123 88.198.0.164 → 192.0.2.1 54 TCP [TCP ZeroWindow] [TCP Keep-Alive] 10001 → 34416 [ACK] Seq=0 Ack=10060497 Win=0 Len=0
> >
> > migration completed: and we acknowledge the right sequence (10060497),
> > so it didn't jump forward.
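
As a cross-check of the sequence numbers in the capture above: frame
465 carries Seq=10002100 with Len=58397, so the next byte expected is
10002100 + 58397 = 10060497, which is the value acknowledged in frames
466, 468 and 470. A minimal sketch of that arithmetic follows. The
function names are made up for illustration and are not passt
functions; the only assumption is that TCP sequence numbers are 32-bit
and wrap around.

```c
#include <stdint.h>

/* Hypothetical helper: sequence number of the first byte after a segment.
 * Unsigned 32-bit addition wraps modulo 2^32, like TCP sequence space.
 */
uint32_t tcp_seq_after(uint32_t seq, uint32_t len)
{
	return seq + len;
}

/* Hypothetical helper: 1 if ack acknowledges exactly the end of the segment
 * starting at seq with length len, 0 otherwise.
 */
int tcp_ack_matches(uint32_t seq, uint32_t len, uint32_t ack)
{
	return tcp_seq_after(seq, len) == ack;
}
```

With the values from frames 465 and 470, tcp_ack_matches(10002100,
58397, 10060497) evaluates to 1, which matches the observation that the
acknowledged sequence did not jump forward across the migration.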
> >
> > But starting from this point:
> >
> > 471 14.687265 192.0.2.1 → 88.198.0.164 60 TCP 34416 → 10001 [ACK] Seq=10060497 Ack=1 Win=65536 Len=0
> > 472 16.687412 192.0.2.1 → 88.198.0.164 4150 TCP [TCP Retransmission] 34416 → 10001 [PSH, ACK] Seq=10060497 Ack=1 Win=65536 Len=4096
> > 473 16.687450 88.198.0.164 → 192.0.2.1 54 TCP [TCP ZeroWindow] 10001 → 34416 [ACK] Seq=1 Ack=10060497 Win=0 Len=0
> > 474 20.687650 192.0.2.1 → 88.198.0.164 4150 TCP [TCP Retransmission] 34416 → 10001 [PSH, ACK] Seq=10060497 Ack=1 Win=65536 Len=4096
> > 475 20.687692 88.198.0.164 → 192.0.2.1 54 TCP [TCP ZeroWindow] 10001 → 34416 [ACK] Seq=1 Ack=10060497 Win=0 Len=0
> > 476 28.687817 192.0.2.1 → 88.198.0.164 4150 TCP [TCP Retransmission] 34416 → 10001 [PSH, ACK] Seq=10060497 Ack=1 Win=65536 Len=4096
> >
> > we keep advertising a zero window (that's the kernel doing it really),
> > as if we were unable to dequeue data.
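
The timestamps of frames 471, 472, 474 and 476 above are spaced by
roughly 2 s, 4 s and 8 s, which would be consistent with a sender
backing off exponentially while the peer's window stays at zero. A
small sketch to check that kind of spacing on a list of capture
timestamps follows. The function name and the tolerance parameter are
assumptions for illustration, not part of passt or of any capture
tool.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if every gap between consecutive
 * timestamps is about twice the previous gap, within a relative
 * tolerance tol (for example 0.01 for 1%), and 0 otherwise.
 * At least three timestamps are needed to compare two gaps.
 */
int gaps_double(const double *ts, size_t n, double tol)
{
	size_t i;

	if (n < 3)
		return 0;

	for (i = 0; i + 2 < n; i++) {
		double prev = ts[i + 1] - ts[i];
		double next = ts[i + 2] - ts[i + 1];

		if (prev <= 0)
			return 0;

		if (next < 2 * prev * (1 - tol) || next > 2 * prev * (1 + tol))
			return 0;
	}

	return 1;
}
```

Called with the four timestamps 14.687265, 16.687412, 20.687650 and
28.687817 and a tolerance of 0.01, this returns 1.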
> >
> > I enabled --trace just for the target instance of passt, and I don't
> > see anything suspicious there:
> >
> > 13.0958: Receiving 1 flows
> > 13.0958: Flow 0 (NEW): FREE -> NEW
> > 13.0958: Flow 0 (TCP connection): TGT -> TYPED
> > 13.0958: Flow 0 (TCP connection): HOST [192.0.2.1]:49892 -> [88.198.0.164]:10001 => TAP [192.0.2.1]:49892 -> [88.198.0.164]:10001
> > 13.0958: Flow 0 (TCP connection): Side 1 hash table insert: bucket: 138154
> > 13.0958: Flow 0 (TCP connection): TYPED -> ACTIVE
> > 13.0958: Flow 0 (TCP connection): HOST [192.0.2.1]:49892 -> [88.198.0.164]:10001 => TAP [192.0.2.1]:49892 -> [88.198.0.164]:10001
> > 13.0959: Flow 0 (TCP connection): Extended migration data, socket 83 sequences send 3121929544 receive 1643895001
> > 13.0959: Flow 0 (TCP connection): pending queues: send 0 not sent 0 receive 3500081
> > 13.0959: Flow 0 (TCP connection): window: snd_wl1 1647395082 snd_wnd 65536 max 65536 rcv_wnd 0 rcv_wup 1647395082
> > 13.0959: Flow 0 (TCP connection): SO_PEEK_OFF disabled offset=0
> > 13.0985: Got packet, but RX virtqueue not usable yet
> > 13.0985: Closing migration channel, fd: 82
> > 13.0985: Closing TCP_REPAIR helper socket
> > 13.0985: passt: epoll event on vhost-user command socket 77 (events: 0x00000001)
> >
> > then the usual VHOST_USER_CHECK_DEVICE_STATE and VHOST_USER_SET_VRING_ENABLE
> > commands. After that, a tight loop of:
> >
> > 13.0986: passt: epoll event on connected TCP socket 83 (events: 0x00000001)
> > 13.0986: Got packet, but RX virtqueue not usable yet
> > 13.0986: passt: epoll event on connected TCP socket 83 (events: 0x00000001)
> > 13.0986: Got packet, but RX virtqueue not usable yet
> >
> > until we go further with the vhost-user setup. I still see this message
> > which I had never noticed (but I didn't try to bisect around it):
> >
> > 13.1006: ================ Vhost user message ================
> > 13.1006: Request: VHOST_USER_SET_VRING_ADDR (9)
> > [...]
> > 13.1006: Last avail index != used index: 3252 != 3027
> >
> > and then after VHOST_USER_SET_VRING_CALL, and:
> >
> > 13.1008: passt: epoll event on vhost-user kick socket 78 (events: 0x00000001)
> > 13.1008: vhost-user: got kick_data: 0000000000000001 idx: 1
> >
> > it's just a tight loop of:
> >
> > 13.1008: passt: epoll event on connected TCP socket 83 (events: 0x00000001)
> > 13.1008: passt: epoll event on connected TCP socket 83 (events: 0x00000001)
> > 13.1008: passt: epoll event on connected TCP socket 83 (events: 0x00000001)
> >
> > as if we weren't dequeueing anything from there.
> >
> > I start suspecting we might be hitting two different issues: perhaps
> > things fail on your setup because of the kernel bug with TCP_REPAIR not
> > freezing the queue, and they fail on my setup for some other reason.
> >
> > For me it's very deterministic though: with patch 10/10 things always
> > fail, and without it they never fail.
> >
> > I guess I'll add more prints and check for more messages before/after
> > that patch.
> >
>
> In fact there is a buffer leak because iov_skip_bytes() doesn't correctly compute the
> number of used elements and then we don't release all the unused buffers.
>
> I'm trying to fix that.
>
> Please try with series "[PATCH v7 0/4] vhost-user,tcp: Handle multiple iovec entries per
> virtqueue element" applied, it reworks this part.

I'm trying it now. If that totally reworks this part and it fixes
things and it's ready to be merged (sorry, I didn't manage to have a
look yet) I don't think it's strictly necessary to figure out the leak.

-- 
Stefano
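
For reference, the kind of counting the quoted message refers to can be
sketched as below. This is only an illustration of how "number of iovec
entries used by len bytes" can be computed, and of why it is easy to
get wrong by one: an entry index derived from a byte offset and a count
of entries touched differ depending on whether the data ends inside an
entry or exactly on its boundary. The helper name is hypothetical; it
is not passt's iov_skip_bytes(), and it is not the fix from the v7
series.

```c
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical helper: number of entries in iov[0..n) that hold at least
 * one of the first len bytes. If len exceeds the total size, all n
 * entries count as used.
 */
size_t iov_entries_used(const struct iovec *iov, size_t n, size_t len)
{
	size_t i;

	for (i = 0; i < n && len > 0; i++) {
		/* Data ends inside, or exactly at the end of, this entry */
		if (len <= iov[i].iov_len)
			return i + 1;

		len -= iov[i].iov_len;
	}

	return i;
}
```

With three entries of 10, 20 and 30 bytes, both 25 and 30 bytes of data
use two entries, while 31 bytes use three. Buffers past the returned
count are the ones that would have to be given back unused.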