On Wed, Jan 15, 2025 at 11:33:02PM +0100, Stefano Brivio wrote: > On Wed, 15 Jan 2025 17:22:30 +0100 > Laurent Vivier wrote: > > > Under some conditions, linux can provide several buffers > > in the same element (multiple entries in the iovec array). > > > > I didn't identify what changed between the kernel guest that > > provides one buffer and the one that provides several > > (doesn't seem to be a kernel change or a configuration change). > > Perhaps memory pressure, or different page accounting between kernels? > > > Fix the following assert: > > > > ASSERTION FAILED in virtqueue_map_desc (virtio.c:402): num_sg < max_num_sg > > > > What I can see is the buffer can be splitted in two iovecs: > > - vnet header > > - packet data > > > > This change manages this special case but the real fix will be to allow > > tap_add_packet() to manage iovec array. > > > > Signed-off-by: Laurent Vivier > > Applied. So, we obviously want this as a short term fix. However, this seems like it's still not very robust. IIUC, kernel in theory could arbitrarily split the packet, not just breaking it neatly at the header boundary. I think we should try to handle this more generally. > I just wonder, if it makes sense as a follow-up: > > > --- > > vu_common.c | 28 ++++++++++++++++++++++------ > > 1 file changed, 22 insertions(+), 6 deletions(-) > > > > diff --git a/vu_common.c b/vu_common.c > > index 6d365bea5fe2..431fba6be0c0 100644 > > --- a/vu_common.c > > +++ b/vu_common.c > > @@ -18,6 +18,8 @@ > > #include "pcap.h" > > #include "vu_common.h" > > > > +#define VU_MAX_TX_BUFFER_NB 2 > > + > > /** > > * vu_packet_check_range() - Check if a given memory zone is contained in > > * a mapped guest memory region > > @@ -168,10 +170,15 @@ static void vu_handle_tx(struct vu_dev *vdev, int index, > > > > count = 0; > > out_sg_count = 0; > > - while (count < VIRTQUEUE_MAX_SIZE) { > > + while (count < VIRTQUEUE_MAX_SIZE && > > + out_sg_count + VU_MAX_TX_BUFFER_NB <= VIRTQUEUE_MAX_SIZE) { > > int ret; > > > > - vu_set_element(&elem[count], &out_sg[out_sg_count], NULL); > > + elem[count].out_num = VU_MAX_TX_BUFFER_NB; > > + elem[count].out_sg = &out_sg[out_sg_count]; > > + elem[count].in_num = 0; > > + elem[count].in_sg = NULL; > > + > > ret = vu_queue_pop(vdev, vq, &elem[count]); > > if (ret < 0) > > break; > > @@ -181,11 +188,20 @@ static void vu_handle_tx(struct vu_dev *vdev, int index, > > warn("virtio-net transmit queue contains no out buffers"); > > break; > > } > > - ASSERT(elem[count].out_num == 1); > > + if (elem[count].out_num == 1) { > > + tap_add_packet(vdev->context, > > + elem[count].out_sg[0].iov_len - hdrlen, > > + (char *)elem[count].out_sg[0].iov_base + > > + hdrlen); > > + } else { > > + /* vnet header can be in a separate iovec */ > > + ASSERT(elem[count].out_num == 2); > > I suppose we don't have strong guarantees about this. What about > discarding the packet with a debug() message, at least until we have a > more elegant solution, if this happens? > > For UDP and ICMP, that's the best thing we can do. > > For TCP, we could just discard a part of it, and the peer would tell > our guest, but it's surely not practical to look into the packet here, > so dropping it altogether would look reasonable. > > > + ASSERT(elem[count].out_sg[0].iov_len == (size_t)hdrlen); > > And similarly here (with an err() message), even though there's probably > an issue in the hypervisor if this happens, but it doesn't mean we're > doomed. Right. If the kernel is splitting every packet this way, then we probably are doomed, but we don't know it for certain. In general ASSERT()s should indicate a definite bug in *our* code, not unexpected behaviour from something we interact with, so err() and dropping the packet would be more appropriate IMO. > > > + tap_add_packet(vdev->context, > > + elem[count].out_sg[1].iov_len, > > + (char *)elem[count].out_sg[1].iov_base); > > + } > > > > - tap_add_packet(vdev->context, > > - elem[count].out_sg[0].iov_len - hdrlen, > > - (char *)elem[count].out_sg[0].iov_base + hdrlen); > > count++; > > } > > tap_handler(vdev->context, now); > -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson