From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202412 header.b=Gq4ifq5G; dkim-atps=neutral Received: from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by passt.top (Postfix) with ESMTPS id C4AD05A0271 for ; Thu, 16 Jan 2025 00:51:54 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202412; t=1736985090; bh=5mk/w7fd0KWf++50cJg7j8PcBsclFF5YsOjq2xCtCvE=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=Gq4ifq5GEh6qf9EtemUG8xiQ3Rtxv4scBg3uv6U3/CxEI/JCjHwAAE4Vd8cnP2MqK O7wtVlLJ/INgQrkPONyCxD6Bk1hPPTjJ8kXHf7YmqNCrgQzaTtpmi09y3lGh++Eh/3 e3l2NgzCBo9/Nst+JwixjNydva4TbNxDzDC7OM3EDVmD34OBwLf9Wfcpr4AmYd7uCh uJI6II8H1KcVXsUb7sZ9u/4VkHQ8q6LmfFZx+vO8uYhpRPelV3t47iik+xVYRDVeSg coxMyHc9jUxsIPyeogaKMc9gGyX0JgRY09ZKuFAgbvlJXgbsjMZ7buhy1lapURTkv0 lSbLzm979sfQw== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4YYN7Q3PMpz4xRj; Thu, 16 Jan 2025 10:51:30 +1100 (AEDT) Date: Thu, 16 Jan 2025 10:51:33 +1100 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH] vhost_user: fix multibuffer from linux Message-ID: References: <20250115162230.813861-1-lvivier@redhat.com> <20250115233302.23b24862@elisabeth> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="Lvds72M0UK00hQDK" Content-Disposition: inline In-Reply-To: <20250115233302.23b24862@elisabeth> Message-ID-Hash: V652SGWCFTIY3QYQVLMUGXLTYZU2HDM7 X-Message-ID-Hash: V652SGWCFTIY3QYQVLMUGXLTYZU2HDM7 X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: Laurent Vivier , passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --Lvds72M0UK00hQDK Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Jan 15, 2025 at 11:33:02PM +0100, Stefano Brivio wrote: > On Wed, 15 Jan 2025 17:22:30 +0100 > Laurent Vivier wrote: >=20 > > Under some conditions, linux can provide several buffers > > in the same element (multiple entries in the iovec array). > >=20 > > I didn't identify what changed between the kernel guest that > > provides one buffer and the one that provides several > > (doesn't seem to be a kernel change or a configuration change). >=20 > Perhaps memory pressure, or different page accounting between kernels? >=20 > > Fix the following assert: > >=20 > > ASSERTION FAILED in virtqueue_map_desc (virtio.c:402): num_sg < max_num= _sg > >=20 > > What I can see is the buffer can be splitted in two iovecs: > > - vnet header > > - packet data > >=20 > > This change manages this special case but the real fix will be to allow > > tap_add_packet() to manage iovec array. > >=20 > > Signed-off-by: Laurent Vivier >=20 > Applied. So, we obviously want this as a short term fix. However, this seems like it's still not very robust. IIUC, kernel in theory could arbitrarily split the packet, not just breaking it neatly at the header boundary. I think we should try to handle this more generally. > I just wonder, if it makes sense as a follow-up: >=20 > > --- > > vu_common.c | 28 ++++++++++++++++++++++------ > > 1 file changed, 22 insertions(+), 6 deletions(-) > >=20 > > diff --git a/vu_common.c b/vu_common.c > > index 6d365bea5fe2..431fba6be0c0 100644 > > --- a/vu_common.c > > +++ b/vu_common.c > > @@ -18,6 +18,8 @@ > > #include "pcap.h" > > #include "vu_common.h" > > =20 > > +#define VU_MAX_TX_BUFFER_NB 2 > > + > > /** > > * vu_packet_check_range() - Check if a given memory zone is contained= in > > * a mapped guest memory region > > @@ -168,10 +170,15 @@ static void vu_handle_tx(struct vu_dev *vdev, int= index, > > =20 > > count =3D 0; > > out_sg_count =3D 0; > > - while (count < VIRTQUEUE_MAX_SIZE) { > > + while (count < VIRTQUEUE_MAX_SIZE && > > + out_sg_count + VU_MAX_TX_BUFFER_NB <=3D VIRTQUEUE_MAX_SIZE) { > > int ret; > > =20 > > - vu_set_element(&elem[count], &out_sg[out_sg_count], NULL); > > + elem[count].out_num =3D VU_MAX_TX_BUFFER_NB; > > + elem[count].out_sg =3D &out_sg[out_sg_count]; > > + elem[count].in_num =3D 0; > > + elem[count].in_sg =3D NULL; > > + > > ret =3D vu_queue_pop(vdev, vq, &elem[count]); > > if (ret < 0) > > break; > > @@ -181,11 +188,20 @@ static void vu_handle_tx(struct vu_dev *vdev, int= index, > > warn("virtio-net transmit queue contains no out buffers"); > > break; > > } > > - ASSERT(elem[count].out_num =3D=3D 1); > > + if (elem[count].out_num =3D=3D 1) { > > + tap_add_packet(vdev->context, > > + elem[count].out_sg[0].iov_len - hdrlen, > > + (char *)elem[count].out_sg[0].iov_base + > > + hdrlen); > > + } else { > > + /* vnet header can be in a separate iovec */ > > + ASSERT(elem[count].out_num =3D=3D 2); >=20 > I suppose we don't have strong guarantees about this. What about > discarding the packet with a debug() message, at least until we have a > more elegant solution, if this happens? >=20 > For UDP and ICMP, that's the best thing we can do. >=20 > For TCP, we could just discard a part of it, and the peer would tell > our guest, but it's surely not practical to look into the packet here, > so dropping it altogether would look reasonable. >=20 > > + ASSERT(elem[count].out_sg[0].iov_len =3D=3D (size_t)hdrlen); >=20 > And similarly here (with an err() message), even though there's probably > an issue in the hypervisor if this happens, but it doesn't mean we're > doomed. Right. If the kernel is splitting every packet this way, then we probably are doomed, but we don't know it for certain. In general ASSERT()s should indicate a definite bug in *our* code, not unexpected behaviour from something we interact with, so err() and dropping the packet would be more appropriate IMO. >=20 > > + tap_add_packet(vdev->context, > > + elem[count].out_sg[1].iov_len, > > + (char *)elem[count].out_sg[1].iov_base); > > + } > > =20 > > - tap_add_packet(vdev->context, > > - elem[count].out_sg[0].iov_len - hdrlen, > > - (char *)elem[count].out_sg[0].iov_base + hdrlen); > > count++; > > } > > tap_handler(vdev->context, now); >=20 --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --Lvds72M0UK00hQDK Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmeISfgACgkQzQJF27ox 2Gesog//VTv0xQLEJT+mQot4dfit/I2g4Y5WxGOGet8RDGhNlHbO3q2E8YWWKFpf Jt3Ze/9wSQ/p679HRM22NKcZfh0qG9WLv8DAY4fypJ01SIlsy2yNmFhOCNAFcn27 r8FnsJ5MQmXacon4FjB+EIk3A9Xd13vvGDEaNDiwoMcn2VIlZq0hw9itC/UEQoRx NEnpiX6W6pKz/hvya7qMMRh9hWzDOjyKee3+CxGdURiTDNcD1mNb7OLBkpYs6/7H oYhZrjjBhfcSqc9dpCihX1xP/VsOT3lrrqHtvtUH2neLTcpGo2xv2FLh+FZVDYTu aN/9JUWq5Y240GhZiG019gj5txvs2LLy9iazRsVAlzmbjnm2b7F7AVYZPAncbiGh aqlCzYy3gOCXK3yHO25SQBuuQYIBl/UYqWiNeabecoRojDOcZfuczcb18d3mRiqe C71qDUuATs/wvBGjI9D8bDE0Gu7qfeFus74q0s5AyIej5py5DNovcoEDRz4qAY8c 0i266+iOCLODXngVjMnmGagc9MpM0nR0YVWUkiGVLRhWLIV4nYvBXyhB1msWasqX O64HP5wB10oSDLYW2mzRze4TIYYus761Mt9VdwUBaws3+kQ22HngY4lH9hJoxGtP /8gSFJZwlP3IbUnwPzVIPUyPX8tFfqg+qGoKJGQ4eU/64zDv3v4= =dxJT -----END PGP SIGNATURE----- --Lvds72M0UK00hQDK--