From: Stefano Brivio
To: Laurent Vivier
CC: passt-dev@passt.top, Jon Maloy, David Gibson
Subject: Re: [PATCH v4 00/10] vhost-user: Preparatory series for multiple iovec entries per virtqueue element
Date: Tue, 26 May 2026 10:38:10 +0200 (CEST)
Message-ID: <20260526103809.54da7aac@elisabeth>
In-Reply-To: <20260526095955.008a6ea1@elisabeth>
References: <20260513115218.1662850-1-lvivier@redhat.com>
 <20260520173445.0658dfef@elisabeth>
 <20260520180708.275ec4de@elisabeth>
 <20260520181852.1f0119ff@elisabeth>
 <20260520225340.54490a21@elisabeth>
 <50d79312-0493-4af0-b0bc-7c590885cbd2@redhat.com>
 <20260522062239.4fcd3314@elisabeth>
 <20260522074455.15e6cc3e@elisabeth>
 <20260522140414.6eaa8f43@elisabeth>
 <20260526095955.008a6ea1@elisabeth>
Organization: Red Hat
List-Id: Development discussion and patches for passt

On Tue, 26 May 2026 09:59:55 +0200
Stefano Brivio wrote:

> On Tue, 26 May 2026 09:31:51 +0200
> Laurent Vivier wrote:
>
> > On 5/22/26 14:04, Stefano Brivio wrote:
> > > On Fri, 22 May 2026 07:44:55 +0200
> > > Stefano Brivio wrote:
> > >
> > >> On Fri, 22 May 2026 06:22:39 +0200
> > >> Stefano Brivio wrote:
> > >>
> > >>> On Fri, 22 May 2026 01:13:33 +0200
> > >>> Laurent Vivier wrote:
> > >>>
> > >>>> On 5/21/26 10:30, Laurent Vivier wrote:
> > >>>>> On 5/20/26 22:53, Stefano Brivio wrote:
> > >>>>>> On Wed, 20 May 2026 18:18:52 +0200
> > >>>>>> Stefano Brivio wrote:
> > >>>>>>
> > >>>>>>> On Wed, 20 May 2026 18:07:08 +0200
> > >>>>>>> Stefano Brivio wrote:
> > >>>>>>>
> > >>>>>>>> On Wed, 20 May 2026 17:34:45 +0200
> > >>>>>>>> Stefano Brivio wrote:
> > >>>>>>>>> On Wed, 13 May 2026 13:52:08 +0200
> > >>>>>>>>> Laurent Vivier wrote:
> > >>>>>>>>>> Currently, the vhost-user path assumes each virtqueue element contains
> > >>>>>>>>>> exactly one iovec entry covering the entire frame.  This assumption
> > >>>>>>>>>> breaks as some virtio-net drivers (notably iPXE) provide descriptors where the
> > >>>>>>>>>> vnet header and the frame payload are in separate buffers, resulting in
> > >>>>>>>>>> two iovec entries per virtqueue element.
> > >>>>>>>>>>
> > >>>>>>>>>> This series refactors the vhost-user data path so that frame lengths,
> > >>>>>>>>>> header sizes, and padding are tracked and passed explicitly rather than
> > >>>>>>>>>> being derived from iovec sizes.  This decoupling is a prerequisite for
> > >>>>>>>>>> correctly handling padding of multi-buffer frames.
> > >>>>>>>>>
> > >>>>>>>>> Sorry to bring (likely) bad news, but this series seems to introduce a
> > >>>>>>>>> regression: I saw the migration/rampstream_in tests fail twice in a
> > >>>>>>>>> row, which I had never seen happen (I think I saw a single failure a
> > >>>>>>>>> long time ago when the machine had a high CPU load, but nothing else).
> > >>>>>>>>>
> > >>>>>>>>> I'm currently bisecting and the bisect seems to point towards the end
> > >>>>>>>>> of the series (probably 10/10), but I haven't finished yet. I'll keep
> > >>>>>>>>> you posted. I haven't spotted anything that might cause issues there.
> > >>>>>>>>
> > >>>>>>>> Yeah, that's the one :(
> > >>>>>>>>
> > >>>>>>>> $ git bisect bad
> > >>>>>>>> db798fc60f4c5869cb53168354e068fb4dabd91a is the first bad commit
> > >>>>>>>> commit db798fc60f4c5869cb53168354e068fb4dabd91a
> > >>>>>>>> Author: Laurent Vivier
> > >>>>>>>> Date:   Wed May 13 13:52:18 2026 +0200
> > >>>>>>>>
> > >>>>>>>>     vhost-user: Centralise Ethernet frame padding in vu_collect() and vu_pad()
> > >>>>>
> > >>>>> I checked on my system with the commit preceding this series,
> > >>>>> bcc3d37a6e01 ("util: Fix changes to assert_with_msg()") and rampstream_in fails too (not
> > >>>>> every time).
> > >>>>>
> > >>>>> > TCP/IPv4: sequence check, ramps, inbound
> > >>>>> ...failed.
> > >>>>>
> > >>>>> and rampstream_out sometimes hangs too.
> > >>>>>
> > >>>>> I'm going to try with earlier commits.
> > >>>>
> > >>>> For me the problem can happen with any commit...
> > >>>>
> > >>>> As it depends on the execution path and on the load and speed of the system it looks like
> > >>>> a race condition.
> > >>>
> > >>> Hah, thanks for checking. Maybe...
> > >>>
> > >>>> Did you try to test on a host with a kernel patched with
> > >>>> "[PATCH net v2 0/2] Fix race condition between TCP_REPAIR dump and data receive" ?
> > >>>
> > >>> Now I tried, and yes, the test doesn't hang anymore! I seem to have an
> > >>> issue with teardown functions on recent kernels (current net.git HEAD
> > >>> more or less):
> > >>>
> > >>> ---
> > >>> [...]
> > >>>
> > >>> 2026/05/22 04:08:23 socat[73089] E connect(5, AF=40 cid:94558 port:22, 16): Connection timed out
> > >>> Connection closed by UNKNOWN port 65535
> > >>> ...
> > >>> ---
> > >>>
> > >>> it looks like we stop QEMU a bit too early. But it should be unrelated.
> > >
> > > Oops, I forgot to upgrade QEMU on the virtual machine I was using to
> > > test those kernel builds, I had a somewhat outdated 8.1 version and it
> > > failed migration for unrelated reasons. It works with 11.0.
> > >
> > > Back to kernel versions: the "problem" is that with a recent
> > > net-next.git HEAD, with or without my fix, in a nested VM, the test
> > > always passes (20/20). And I can't easily test things non-nested.
> > >
> > > I guess I could just skip that test in the set I run on git push,
> > > and run it manually in the virtual machine, for the moment.
> > >
> > > But judging from captures (test_logs/pasta_1.pcap from PCAP=1 ./run)
> > > I'm fairly sure it's not *that* issue:
> > >
> > >   465 12.141763 192.0.2.1 → 88.198.0.164 58451 TCP [TCP Window Full] 34416 → 10001 [PSH, ACK] Seq=10002100 Ack=1 Win=65536 Len=58397
> > >   466 12.187195 88.198.0.164 → 192.0.2.1 54 TCP [TCP ZeroWindow] 10001 → 34416 [ACK] Seq=1 Ack=10060497 Win=0 Len=0
> > >   467 13.187281 192.0.2.1 → 88.198.0.164 4150 TCP 34416 → 10001 [PSH, ACK] Seq=10060497 Ack=1 Win=65536 Len=4096
> > >
> > > last data transfer from client (rampstream):
> > >
> > >   468 13.187358 88.198.0.164 → 192.0.2.1 54 TCP [TCP ZeroWindow] 10001 → 34416 [ACK] Seq=1 Ack=10060497 Win=0 Len=0
> > >
> > > everything acknowledged, migration starts now:
> > >
> > >   469 14.143217 fe80::f471:c3ff:fe10:4e45 → ff02::2 70 ICMPv6 Router Solicitation from f6:71:c3:10:4e:45
> > >   470 14.687123 88.198.0.164 → 192.0.2.1 54 TCP [TCP ZeroWindow] [TCP Keep-Alive] 10001 → 34416 [ACK] Seq=0 Ack=10060497 Win=0 Len=0
> > >
> > > migration completed: and we acknowledge the right sequence (10060497),
> > > so it didn't jump forward.
> > >
> > > But starting from this point:
> > >
> > >   471 14.687265 192.0.2.1 → 88.198.0.164 60 TCP 34416 → 10001 [ACK] Seq=10060497 Ack=1 Win=65536 Len=0
> > >   472 16.687412 192.0.2.1 → 88.198.0.164 4150 TCP [TCP Retransmission] 34416 → 10001 [PSH, ACK] Seq=10060497 Ack=1 Win=65536 Len=4096
> > >   473 16.687450 88.198.0.164 → 192.0.2.1 54 TCP [TCP ZeroWindow] 10001 → 34416 [ACK] Seq=1 Ack=10060497 Win=0 Len=0
> > >   474 20.687650 192.0.2.1 → 88.198.0.164 4150 TCP [TCP Retransmission] 34416 → 10001 [PSH, ACK] Seq=10060497 Ack=1 Win=65536 Len=4096
> > >   475 20.687692 88.198.0.164 → 192.0.2.1 54 TCP [TCP ZeroWindow] 10001 → 34416 [ACK] Seq=1 Ack=10060497 Win=0 Len=0
> > >   476 28.687817 192.0.2.1 → 88.198.0.164 4150 TCP [TCP Retransmission] 34416 → 10001 [PSH, ACK] Seq=10060497 Ack=1 Win=65536 Len=4096
> > >
> > > we keep advertising a zero window (that's the kernel doing it really),
> > > as if we were unable to dequeue data.
> > >
> > > I enabled --trace just for the target instance of passt, and I don't
> > > see anything suspicious there:
> > >
> > > 13.0958: Receiving 1 flows
> > > 13.0958: Flow 0 (NEW): FREE -> NEW
> > > 13.0958: Flow 0 (TCP connection): TGT -> TYPED
> > > 13.0958: Flow 0 (TCP connection): HOST [192.0.2.1]:49892 -> [88.198.0.164]:10001 => TAP [192.0.2.1]:49892 -> [88.198.0.164]:10001
> > > 13.0958: Flow 0 (TCP connection): Side 1 hash table insert: bucket: 138154
> > > 13.0958: Flow 0 (TCP connection): TYPED -> ACTIVE
> > > 13.0958: Flow 0 (TCP connection): HOST [192.0.2.1]:49892 -> [88.198.0.164]:10001 => TAP [192.0.2.1]:49892 -> [88.198.0.164]:10001
> > > 13.0959: Flow 0 (TCP connection): Extended migration data, socket 83 sequences send 3121929544 receive 1643895001
> > > 13.0959: Flow 0 (TCP connection): pending queues: send 0 not sent 0 receive 3500081
> > > 13.0959: Flow 0 (TCP connection): window: snd_wl1 1647395082 snd_wnd 65536 max 65536 rcv_wnd 0 rcv_wup 1647395082
> > > 13.0959: Flow 0 (TCP connection): SO_PEEK_OFF disabled offset=0
> > > 13.0985: Got packet, but RX virtqueue not usable yet
> > > 13.0985: Closing migration channel, fd: 82
> > > 13.0985: Closing TCP_REPAIR helper socket
> > > 13.0985: passt: epoll event on vhost-user command socket 77 (events: 0x00000001)
> > >
> > > then the usual VHOST_USER_CHECK_DEVICE_STATE and VHOST_USER_SET_VRING_ENABLE
> > > commands. After that, a tight loop of:
> > >
> > > 13.0986: passt: epoll event on connected TCP socket 83 (events: 0x00000001)
> > > 13.0986: Got packet, but RX virtqueue not usable yet
> > > 13.0986: passt: epoll event on connected TCP socket 83 (events: 0x00000001)
> > > 13.0986: Got packet, but RX virtqueue not usable yet
> > >
> > > until we go further with the vhost-user setup.
> > > I still see this message
> > > which I had never noticed (but I didn't try to bisect around it):
> > >
> > > 13.1006: ================ Vhost user message ================
> > > 13.1006: Request: VHOST_USER_SET_VRING_ADDR (9)
> > > [...]
> > > 13.1006: Last avail index != used index: 3252 != 3027
> > >
> > > and then after VHOST_USER_SET_VRING_CALL, and:
> > >
> > > 13.1008: passt: epoll event on vhost-user kick socket 78 (events: 0x00000001)
> > > 13.1008: vhost-user: got kick_data: 0000000000000001 idx: 1
> > >
> > > it's just a tight loop of:
> > >
> > > 13.1008: passt: epoll event on connected TCP socket 83 (events: 0x00000001)
> > > 13.1008: passt: epoll event on connected TCP socket 83 (events: 0x00000001)
> > > 13.1008: passt: epoll event on connected TCP socket 83 (events: 0x00000001)
> > >
> > > as if we weren't dequeueing anything from there.
> > >
> > > I'm starting to suspect we might be hitting two different issues: perhaps
> > > things fail on your setup because of the kernel bug with TCP_REPAIR not
> > > freezing the queue, and they fail on my setup for some other reason.
> > >
> > > For me it's very deterministic though: with patch 10/10 things always
> > > fail, and without it they never fail.
> > >
> > > I guess I'll add more prints and check for more messages before/after
> > > that patch.
> > >
> >
> > In fact there is a buffer leak because iov_skip_bytes() doesn't correctly compute the
> > number of used elements and then we don't release all the unused buffers.
> >
> > I'm trying to fix that.
> >
> > Please try with series "[PATCH v7 0/4] vhost-user,tcp: Handle multiple iovec entries per
> > virtqueue element" applied, it reworks this part.
>
> I'm trying it now. If that totally reworks this part and it fixes
> things and it's ready to be merged (sorry, I didn't manage to have a
> look yet) I don't think it's strictly necessary to figure out the
> leak.

All tests pass with it, rampstream_in passed 20/20 times.

Should I go ahead and merge both series (UDP and TCP, they both look
ready) or do you still need to figure out the buffer leak first for
other reasons?

-- 
Stefano
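
As a side note on the leak Laurent describes above: the bookkeeping
comes down to knowing how many iovec entries a frame of a given length
actually touches, so that the remaining ones can be handed back. A
minimal sketch of that counting follows. It assumes nothing about the
real iov_skip_bytes(): iov_entries_used() is a hypothetical helper
written for illustration only, not passt code.

```c
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical helper, not part of passt: return how many of the @n entries
 * in @iov are needed to hold @len bytes. A partially filled entry counts as
 * used. Entries from the returned index onwards are untouched, so those are
 * the ones a caller would have to give back.
 */
static size_t iov_entries_used(const struct iovec *iov, size_t n, size_t len)
{
	size_t i;

	for (i = 0; i < n && len > 0; i++) {
		if (len <= iov[i].iov_len)
			return i + 1;	/* the frame ends inside entry i */

		len -= iov[i].iov_len;
	}

	/* 0 if @len is 0, @n if the frame needs every entry (or more) */
	return i;
}
```

With a 12-byte header buffer followed by two 1500-byte buffers, a
100-byte frame uses two entries, so only the third one is left over.
Counting one entry too many or too few at this step is the kind of
off-by-one that would leave buffers unreleased.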