* [PATCH] vhost_user: fix multibuffer from linux @ 2025-01-15 16:22 Laurent Vivier 2025-01-15 22:33 ` Stefano Brivio 0 siblings, 1 reply; 3+ messages in thread From: Laurent Vivier @ 2025-01-15 16:22 UTC (permalink / raw) To: passt-dev; +Cc: Laurent Vivier Under some conditions, linux can provide several buffers in the same element (multiple entries in the iovec array). I didn't identify what changed between the kernel guest that provides one buffer and the one that provides several (doesn't seem to be a kernel change or a configuration change). Fix the following assert: ASSERTION FAILED in virtqueue_map_desc (virtio.c:402): num_sg < max_num_sg What I can see is the buffer can be splitted in two iovecs: - vnet header - packet data This change manages this special case but the real fix will be to allow tap_add_packet() to manage iovec array. Signed-off-by: Laurent Vivier <lvivier@redhat.com> --- vu_common.c | 28 ++++++++++++++++++++++------ 1 file changed, 22 insertions(+), 6 deletions(-) diff --git a/vu_common.c b/vu_common.c index 6d365bea5fe2..431fba6be0c0 100644 --- a/vu_common.c +++ b/vu_common.c @@ -18,6 +18,8 @@ #include "pcap.h" #include "vu_common.h" +#define VU_MAX_TX_BUFFER_NB 2 + /** * vu_packet_check_range() - Check if a given memory zone is contained in * a mapped guest memory region @@ -168,10 +170,15 @@ static void vu_handle_tx(struct vu_dev *vdev, int index, count = 0; out_sg_count = 0; - while (count < VIRTQUEUE_MAX_SIZE) { + while (count < VIRTQUEUE_MAX_SIZE && + out_sg_count + VU_MAX_TX_BUFFER_NB <= VIRTQUEUE_MAX_SIZE) { int ret; - vu_set_element(&elem[count], &out_sg[out_sg_count], NULL); + elem[count].out_num = VU_MAX_TX_BUFFER_NB; + elem[count].out_sg = &out_sg[out_sg_count]; + elem[count].in_num = 0; + elem[count].in_sg = NULL; + ret = vu_queue_pop(vdev, vq, &elem[count]); if (ret < 0) break; @@ -181,11 +188,20 @@ static void vu_handle_tx(struct vu_dev *vdev, int index, warn("virtio-net transmit queue contains no out buffers"); break; } - ASSERT(elem[count].out_num == 1); + if (elem[count].out_num == 1) { + tap_add_packet(vdev->context, + elem[count].out_sg[0].iov_len - hdrlen, + (char *)elem[count].out_sg[0].iov_base + + hdrlen); + } else { + /* vnet header can be in a separate iovec */ + ASSERT(elem[count].out_num == 2); + ASSERT(elem[count].out_sg[0].iov_len == (size_t)hdrlen); + tap_add_packet(vdev->context, + elem[count].out_sg[1].iov_len, + (char *)elem[count].out_sg[1].iov_base); + } - tap_add_packet(vdev->context, - elem[count].out_sg[0].iov_len - hdrlen, - (char *)elem[count].out_sg[0].iov_base + hdrlen); count++; } tap_handler(vdev->context, now); -- 2.47.1 ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] vhost_user: fix multibuffer from linux 2025-01-15 16:22 [PATCH] vhost_user: fix multibuffer from linux Laurent Vivier @ 2025-01-15 22:33 ` Stefano Brivio 2025-01-15 23:51 ` David Gibson 0 siblings, 1 reply; 3+ messages in thread From: Stefano Brivio @ 2025-01-15 22:33 UTC (permalink / raw) To: Laurent Vivier; +Cc: passt-dev On Wed, 15 Jan 2025 17:22:30 +0100 Laurent Vivier <lvivier@redhat.com> wrote: > Under some conditions, linux can provide several buffers > in the same element (multiple entries in the iovec array). > > I didn't identify what changed between the kernel guest that > provides one buffer and the one that provides several > (doesn't seem to be a kernel change or a configuration change). Perhaps memory pressure, or different page accounting between kernels? > Fix the following assert: > > ASSERTION FAILED in virtqueue_map_desc (virtio.c:402): num_sg < max_num_sg > > What I can see is the buffer can be splitted in two iovecs: > - vnet header > - packet data > > This change manages this special case but the real fix will be to allow > tap_add_packet() to manage iovec array. > > Signed-off-by: Laurent Vivier <lvivier@redhat.com> Applied. I just wonder, if it makes sense as a follow-up: > --- > vu_common.c | 28 ++++++++++++++++++++++------ > 1 file changed, 22 insertions(+), 6 deletions(-) > > diff --git a/vu_common.c b/vu_common.c > index 6d365bea5fe2..431fba6be0c0 100644 > --- a/vu_common.c > +++ b/vu_common.c > @@ -18,6 +18,8 @@ > #include "pcap.h" > #include "vu_common.h" > > +#define VU_MAX_TX_BUFFER_NB 2 > + > /** > * vu_packet_check_range() - Check if a given memory zone is contained in > * a mapped guest memory region > @@ -168,10 +170,15 @@ static void vu_handle_tx(struct vu_dev *vdev, int index, > > count = 0; > out_sg_count = 0; > - while (count < VIRTQUEUE_MAX_SIZE) { > + while (count < VIRTQUEUE_MAX_SIZE && > + out_sg_count + VU_MAX_TX_BUFFER_NB <= VIRTQUEUE_MAX_SIZE) { > int ret; > > - vu_set_element(&elem[count], &out_sg[out_sg_count], NULL); > + elem[count].out_num = VU_MAX_TX_BUFFER_NB; > + elem[count].out_sg = &out_sg[out_sg_count]; > + elem[count].in_num = 0; > + elem[count].in_sg = NULL; > + > ret = vu_queue_pop(vdev, vq, &elem[count]); > if (ret < 0) > break; > @@ -181,11 +188,20 @@ static void vu_handle_tx(struct vu_dev *vdev, int index, > warn("virtio-net transmit queue contains no out buffers"); > break; > } > - ASSERT(elem[count].out_num == 1); > + if (elem[count].out_num == 1) { > + tap_add_packet(vdev->context, > + elem[count].out_sg[0].iov_len - hdrlen, > + (char *)elem[count].out_sg[0].iov_base + > + hdrlen); > + } else { > + /* vnet header can be in a separate iovec */ > + ASSERT(elem[count].out_num == 2); I suppose we don't have strong guarantees about this. What about discarding the packet with a debug() message, at least until we have a more elegant solution, if this happens? For UDP and ICMP, that's the best thing we can do. For TCP, we could just discard a part of it, and the peer would tell our guest, but it's surely not practical to look into the packet here, so dropping it altogether would look reasonable. > + ASSERT(elem[count].out_sg[0].iov_len == (size_t)hdrlen); And similarly here (with an err() message), even though there's probably an issue in the hypervisor if this happens, but it doesn't mean we're doomed. > + tap_add_packet(vdev->context, > + elem[count].out_sg[1].iov_len, > + (char *)elem[count].out_sg[1].iov_base); > + } > > - tap_add_packet(vdev->context, > - elem[count].out_sg[0].iov_len - hdrlen, > - (char *)elem[count].out_sg[0].iov_base + hdrlen); > count++; > } > tap_handler(vdev->context, now); -- Stefano ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] vhost_user: fix multibuffer from linux 2025-01-15 22:33 ` Stefano Brivio @ 2025-01-15 23:51 ` David Gibson 0 siblings, 0 replies; 3+ messages in thread From: David Gibson @ 2025-01-15 23:51 UTC (permalink / raw) To: Stefano Brivio; +Cc: Laurent Vivier, passt-dev [-- Attachment #1: Type: text/plain, Size: 4652 bytes --] On Wed, Jan 15, 2025 at 11:33:02PM +0100, Stefano Brivio wrote: > On Wed, 15 Jan 2025 17:22:30 +0100 > Laurent Vivier <lvivier@redhat.com> wrote: > > > Under some conditions, linux can provide several buffers > > in the same element (multiple entries in the iovec array). > > > > I didn't identify what changed between the kernel guest that > > provides one buffer and the one that provides several > > (doesn't seem to be a kernel change or a configuration change). > > Perhaps memory pressure, or different page accounting between kernels? > > > Fix the following assert: > > > > ASSERTION FAILED in virtqueue_map_desc (virtio.c:402): num_sg < max_num_sg > > > > What I can see is the buffer can be splitted in two iovecs: > > - vnet header > > - packet data > > > > This change manages this special case but the real fix will be to allow > > tap_add_packet() to manage iovec array. > > > > Signed-off-by: Laurent Vivier <lvivier@redhat.com> > > Applied. So, we obviously want this as a short term fix. However, this seems like it's still not very robust. IIUC, kernel in theory could arbitrarily split the packet, not just breaking it neatly at the header boundary. I think we should try to handle this more generally. > I just wonder, if it makes sense as a follow-up: > > > --- > > vu_common.c | 28 ++++++++++++++++++++++------ > > 1 file changed, 22 insertions(+), 6 deletions(-) > > > > diff --git a/vu_common.c b/vu_common.c > > index 6d365bea5fe2..431fba6be0c0 100644 > > --- a/vu_common.c > > +++ b/vu_common.c > > @@ -18,6 +18,8 @@ > > #include "pcap.h" > > #include "vu_common.h" > > > > +#define VU_MAX_TX_BUFFER_NB 2 > > + > > /** > > * vu_packet_check_range() - Check if a given memory zone is contained in > > * a mapped guest memory region > > @@ -168,10 +170,15 @@ static void vu_handle_tx(struct vu_dev *vdev, int index, > > > > count = 0; > > out_sg_count = 0; > > - while (count < VIRTQUEUE_MAX_SIZE) { > > + while (count < VIRTQUEUE_MAX_SIZE && > > + out_sg_count + VU_MAX_TX_BUFFER_NB <= VIRTQUEUE_MAX_SIZE) { > > int ret; > > > > - vu_set_element(&elem[count], &out_sg[out_sg_count], NULL); > > + elem[count].out_num = VU_MAX_TX_BUFFER_NB; > > + elem[count].out_sg = &out_sg[out_sg_count]; > > + elem[count].in_num = 0; > > + elem[count].in_sg = NULL; > > + > > ret = vu_queue_pop(vdev, vq, &elem[count]); > > if (ret < 0) > > break; > > @@ -181,11 +188,20 @@ static void vu_handle_tx(struct vu_dev *vdev, int index, > > warn("virtio-net transmit queue contains no out buffers"); > > break; > > } > > - ASSERT(elem[count].out_num == 1); > > + if (elem[count].out_num == 1) { > > + tap_add_packet(vdev->context, > > + elem[count].out_sg[0].iov_len - hdrlen, > > + (char *)elem[count].out_sg[0].iov_base + > > + hdrlen); > > + } else { > > + /* vnet header can be in a separate iovec */ > > + ASSERT(elem[count].out_num == 2); > > I suppose we don't have strong guarantees about this. What about > discarding the packet with a debug() message, at least until we have a > more elegant solution, if this happens? > > For UDP and ICMP, that's the best thing we can do. > > For TCP, we could just discard a part of it, and the peer would tell > our guest, but it's surely not practical to look into the packet here, > so dropping it altogether would look reasonable. > > > + ASSERT(elem[count].out_sg[0].iov_len == (size_t)hdrlen); > > And similarly here (with an err() message), even though there's probably > an issue in the hypervisor if this happens, but it doesn't mean we're > doomed. Right. If the kernel is splitting every packet this way, then we probably are doomed, but we don't know it for certain. In general ASSERT()s should indicate a definite bug in *our* code, not unexpected behaviour from something we interact with, so err() and dropping the packet would be more appropriate IMO. > > > + tap_add_packet(vdev->context, > > + elem[count].out_sg[1].iov_len, > > + (char *)elem[count].out_sg[1].iov_base); > > + } > > > > - tap_add_packet(vdev->context, > > - elem[count].out_sg[0].iov_len - hdrlen, > > - (char *)elem[count].out_sg[0].iov_base + hdrlen); > > count++; > > } > > tap_handler(vdev->context, now); > -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2025-01-15 23:51 UTC | newest] Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2025-01-15 16:22 [PATCH] vhost_user: fix multibuffer from linux Laurent Vivier 2025-01-15 22:33 ` Stefano Brivio 2025-01-15 23:51 ` David Gibson
Code repositories for project(s) associated with this public inbox https://passt.top/passt This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for IMAP folder(s).