From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202602 header.b=jx1c2+Yq; dkim-atps=neutral Received: from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by passt.top (Postfix) with ESMTPS id 0D8AD5A026D for ; Wed, 25 Mar 2026 06:07:20 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202602; t=1774415237; bh=ajJs/Kqu2uUlp4xIS1vFyWL1NIvBsuJEBE7oUwBo+WQ=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=jx1c2+YqgAXxz4vuMgjYFUjPNbGSE4bkJRnSUolivpIbVFYSQdv2gGrvIWE5Fx7+8 qC/5aT7LaPRoPX0gJTQY3+10mDit0cEBizb6YKZ4Kcl4be/LIdIxgwK7/8/4WV9oVy YNgZzRJaM3MB+YERBRL6DryRy8SCVSccZKIF5P6Zqsol3eecKAhmfVGxNT0sYzbIa4 I0CiX10WU8dU/VYBHvgyK8s27Zh/v+faDi6Z9arENmzvP+wj5aNRqEiEdRJFUEC6SH U5oJt+JeCzxZ0taFcE5cdcN9xtaHTm5xh2XtDyYPtuDzvkJc4TIdoGNpObjJ3PXiAl /GqHVB/pplnyg== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4fgZdx2N0dz4wH2; Wed, 25 Mar 2026 16:07:17 +1100 (AEDT) Date: Wed, 25 Mar 2026 16:06:14 +1100 From: David Gibson To: Laurent Vivier Subject: Re: [PATCH 4/7] tcp_vu: Support multibuffer frames in tcp_vu_sock_recv() Message-ID: References: <20260323165259.1253482-1-lvivier@redhat.com> <20260323165259.1253482-5-lvivier@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="0f0pTBzw+8YWHyyR" Content-Disposition: inline In-Reply-To: <20260323165259.1253482-5-lvivier@redhat.com> Message-ID-Hash: FBNYXJRP4U5FN4HTUZXZ7ZNSZXWDQP3B X-Message-ID-Hash: FBNYXJRP4U5FN4HTUZXZ7ZNSZXWDQP3B X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --0f0pTBzw+8YWHyyR Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, Mar 23, 2026 at 05:52:56PM +0100, Laurent Vivier wrote: > Previously, tcp_vu_sock_recv() assumed a 1:1 mapping between virtqueue > elements and iovecs (one iovec per element), enforced by an ASSERT. > This prevented the use of virtqueue elements with multiple buffers > (e.g. when mergeable rx buffers are not negotiated and headers are > provided in a separate buffer). >=20 > Introduce a struct vu_frame to track per-frame metadata: the range of > elements and iovecs that make up each frame, and the frame's total size. > This replaces the head[] array which only tracked element indices. >=20 > A separate iov_msg[] array is built for recvmsg() by cloning the data > portions (after stripping headers) using iov_tail helpers. >=20 > Then a frame truncation after recvmsg() properly walks the frame and > element arrays to adjust iovec counts and element counts. >=20 > Signed-off-by: Laurent Vivier > --- > tcp_vu.c | 161 ++++++++++++++++++++++++++++++++++++------------------- > 1 file changed, 107 insertions(+), 54 deletions(-) >=20 > diff --git a/tcp_vu.c b/tcp_vu.c > index a39f6ea018e9..96cd9da1caae 100644 > --- a/tcp_vu.c > +++ b/tcp_vu.c > @@ -35,9 +35,24 @@ > #include "vu_common.h" > #include > =20 > -static struct iovec iov_vu[VIRTQUEUE_MAX_SIZE + DISCARD_IOV_NUM]; > +static struct iovec iov_vu[VIRTQUEUE_MAX_SIZE]; > static struct vu_virtq_element elem[VIRTQUEUE_MAX_SIZE]; > -static int head[VIRTQUEUE_MAX_SIZE + 1]; > + > +/** > + * struct vu_frame - Descriptor for a TCP frame mapped to virtqueue elem= ents > + * @idx_element: Index of first element in elem[] for this frame > + * @num_element: Number of virtqueue elements used by this frame > + * @idx_iovec: Index of first iovec in iov_vu[] for this frame > + * @num_iovec: Number of iovecs covering this frame's buffers > + * @size: Total frame size including all headers "all headers" is a bit context dependent. I assume it includes the Ethernet header, but I'm not sure about virtio_net_hdr_mrg_rxbuf. If it doesn't this should be called l2len. If it does.. maybe we need to invent a term. Usually I wouldn't consider the "frame" to include like mrg_rxbuf (I'd consider that a "hw" descriptor followed by the frame). > + */ > +static struct vu_frame { > + int idx_element; > + int num_element; > + int idx_iovec; > + int num_iovec; > + size_t size; > +} frame[VIRTQUEUE_MAX_SIZE]; > =20 > /** > * tcp_vu_hdrlen() - Sum size of all headers, from TCP to virtio-net > @@ -173,8 +188,8 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_= tap_conn *conn, int flags) > * @v6: Set for IPv6 connections > * @already_sent: Number of bytes already sent > * @fillsize: Maximum bytes to fill in guest-side receiving window > - * @iov_cnt: number of iov (output) > - * @head_cnt: Pointer to store the count of head iov entries (output) > + * @elem_used: number of element (output) > + * @frame_cnt: Pointer to store the number of frames (output) > * > * Return: number of bytes received from the socket, or a negative error= code > * on failure. > @@ -182,58 +197,77 @@ int tcp_vu_send_flag(const struct ctx *c, struct tc= p_tap_conn *conn, int flags) > static ssize_t tcp_vu_sock_recv(const struct ctx *c, struct vu_virtq *vq, > const struct tcp_tap_conn *conn, bool v6, > uint32_t already_sent, size_t fillsize, > - int *iov_cnt, int *head_cnt) > + int *elem_used, int *frame_cnt) > { > + static struct iovec iov_msg[VIRTQUEUE_MAX_SIZE + DISCARD_IOV_NUM]; What's the rationale for making this static? > const struct vu_dev *vdev =3D c->vdev; > struct msghdr mh_sock =3D { 0 }; > uint16_t mss =3D MSS_GET(conn); > size_t hdrlen, iov_used; > int s =3D conn->sock; > + ssize_t ret, dlen; > int elem_cnt; > - ssize_t ret; > - int i; > - > - *iov_cnt =3D 0; > + int i, j; > =20 > hdrlen =3D tcp_vu_hdrlen(v6); > =20 > + *elem_used =3D 0; > + > iov_used =3D 0; > elem_cnt =3D 0; > - *head_cnt =3D 0; > + *frame_cnt =3D 0; > while (fillsize > 0 && elem_cnt < ARRAY_SIZE(elem) && > - iov_used < VIRTQUEUE_MAX_SIZE) { > - size_t frame_size, dlen, in_total; > - struct iovec *iov; > + iov_used < ARRAY_SIZE(iov_vu) && > + *frame_cnt < ARRAY_SIZE(frame)) { > + size_t frame_size, in_total; > int cnt; > =20 > cnt =3D vu_collect(vdev, vq, &elem[elem_cnt], > ARRAY_SIZE(elem) - elem_cnt, > - &iov_vu[DISCARD_IOV_NUM + iov_used], > - VIRTQUEUE_MAX_SIZE - iov_used, &in_total, > + &iov_vu[iov_used], > + ARRAY_SIZE(iov_vu) - iov_used, &in_total, > MIN(mss, fillsize) + hdrlen, > &frame_size); > if (cnt =3D=3D 0) > break; > - assert((size_t)cnt =3D=3D in_total); /* one iovec per element */ > + > + frame[*frame_cnt].idx_element =3D elem_cnt; > + frame[*frame_cnt].num_element =3D cnt; > + frame[*frame_cnt].idx_iovec =3D iov_used; > + frame[*frame_cnt].num_iovec =3D in_total; > + frame[*frame_cnt].size =3D frame_size; > + (*frame_cnt)++; > =20 > iov_used +=3D in_total; > - dlen =3D frame_size - hdrlen; > + elem_cnt +=3D cnt; > =20 > - /* reserve space for headers in iov */ > - iov =3D &elem[elem_cnt].in_sg[0]; > - assert(iov->iov_len >=3D hdrlen); > + fillsize -=3D frame_size - hdrlen; > + } > =20 > - iov->iov_base =3D (char *)iov->iov_base + hdrlen; > - iov->iov_len -=3D hdrlen; > - head[(*head_cnt)++] =3D elem_cnt; > + /* build an iov array without headers */ > + for (i =3D 0, j =3D DISCARD_IOV_NUM; i < *frame_cnt && > + j < ARRAY_SIZE(iov_msg); i++) { > + struct iov_tail data; > + ssize_t cnt; > =20 > - fillsize -=3D dlen; > - elem_cnt +=3D cnt; > + data =3D IOV_TAIL(&iov_vu[frame[i].idx_iovec], > + frame[i].num_iovec, 0); > + iov_drop_header(&data, hdrlen); > + > + cnt =3D iov_tail_clone(&iov_msg[j], ARRAY_SIZE(iov_msg) - j, > + &data); > + if (cnt =3D=3D -1) > + die("Missing entries in iov_msg"); IIUC that would indicate a bug in the sizing / setup of the arrays, in which case it should be an assert(). > + > + j +=3D cnt; > } > =20 > - if (tcp_prepare_iov(&mh_sock, iov_vu, already_sent, elem_cnt)) > + if (tcp_prepare_iov(&mh_sock, iov_msg, already_sent, > + j - DISCARD_IOV_NUM)) { > /* Expect caller to do a TCP reset */ > + vu_queue_rewind(vq, elem_cnt); > return -1; > + } > =20 > do > ret =3D recvmsg(s, &mh_sock, MSG_PEEK); > @@ -247,28 +281,47 @@ static ssize_t tcp_vu_sock_recv(const struct ctx *c= , struct vu_virtq *vq, > if (!peek_offset_cap) > ret -=3D already_sent; > =20 > - i =3D vu_pad(&iov_vu[DISCARD_IOV_NUM], iov_used, hdrlen, ret); > + dlen =3D ret; > =20 > - /* adjust head count */ > - while (*head_cnt > 0 && head[*head_cnt - 1] >=3D i) > - (*head_cnt)--; > + /* truncate frame */ > + *elem_used =3D 0; > + for (i =3D 0; i < *frame_cnt; i++) { > + struct vu_frame *f =3D &frame[i]; > =20 > - /* mark end of array */ > - head[*head_cnt] =3D i; > - *iov_cnt =3D i; > + if ((size_t)ret <=3D f->size - hdrlen) { > + unsigned cnt; > =20 > - /* release unused buffers */ > - vu_queue_rewind(vq, elem_cnt - i); > + cnt =3D vu_pad(&iov_vu[f->idx_iovec], f->num_iovec, > + 0, hdrlen + ret); > =20 > - /* restore space for headers in iov */ > - for (i =3D 0; i < *head_cnt; i++) { > - struct iovec *iov =3D &elem[head[i]].in_sg[0]; > + f->size =3D ret + hdrlen; > + f->num_iovec =3D cnt; > =20 > - iov->iov_base =3D (char *)iov->iov_base - hdrlen; > - iov->iov_len +=3D hdrlen; > + for (j =3D 0; j < f->num_element; j++) { > + struct vu_virtq_element *e; > + > + e =3D &elem[f->idx_element + j]; > + if (cnt <=3D e->in_num) { > + e->in_num =3D cnt; > + j++; > + break; > + } > + cnt -=3D e->in_num; > + } > + f->num_element =3D j; > + *elem_used +=3D j; > + i++; > + break; > + } > + *elem_used +=3D f->num_element; > + ret -=3D f->size - hdrlen; > } > + *frame_cnt =3D i; > =20 > - return ret; > + /* release unused buffers */ > + vu_queue_rewind(vq, elem_cnt - *elem_used); > + > + return dlen; > } > =20 > /** > @@ -341,7 +394,7 @@ int tcp_vu_data_from_sock(const struct ctx *c, struct= tcp_tap_conn *conn) > struct vu_dev *vdev =3D c->vdev; > struct vu_virtq *vq =3D &vdev->vq[VHOST_USER_RX_QUEUE]; > ssize_t len, previous_dlen; > - int i, iov_cnt, head_cnt; > + int i, elem_cnt, frame_cnt; > size_t hdrlen, fillsize; > int v6 =3D CONN_V6(conn); > uint32_t already_sent; > @@ -381,7 +434,7 @@ int tcp_vu_data_from_sock(const struct ctx *c, struct= tcp_tap_conn *conn) > * data from the socket > */ > len =3D tcp_vu_sock_recv(c, vq, conn, v6, already_sent, fillsize, > - &iov_cnt, &head_cnt); > + &elem_cnt, &frame_cnt); > if (len < 0) { > if (len !=3D -EAGAIN && len !=3D -EWOULDBLOCK) { > tcp_rst(c, conn); > @@ -395,6 +448,7 @@ int tcp_vu_data_from_sock(const struct ctx *c, struct= tcp_tap_conn *conn) > } > =20 > if (!len) { > + vu_queue_rewind(vq, elem_cnt); > if (already_sent) { > conn_flag(c, conn, STALLED); > } else if ((conn->events & (SOCK_FIN_RCVD | TAP_FIN_SENT)) =3D=3D > @@ -432,33 +486,32 @@ int tcp_vu_data_from_sock(const struct ctx *c, stru= ct tcp_tap_conn *conn) > */ > =20 > hdrlen =3D tcp_vu_hdrlen(v6); > - for (i =3D 0, previous_dlen =3D -1, check =3D -1; i < head_cnt; i++) { > - struct iovec *iov =3D &elem[head[i]].in_sg[0]; > - int buf_cnt =3D head[i + 1] - head[i]; > - size_t frame_size =3D iov_size(iov, buf_cnt); > - bool push =3D i =3D=3D head_cnt - 1; > + for (i =3D 0, previous_dlen =3D -1, check =3D -1; i < frame_cnt; i++) { > + struct iovec *iov =3D &iov_vu[frame[i].idx_iovec]; > + int iov_cnt =3D frame[i].num_iovec; > + bool push =3D i =3D=3D frame_cnt - 1; > ssize_t dlen; > =20 > - assert(frame_size >=3D hdrlen); > + assert(frame[i].size >=3D hdrlen); > =20 > - dlen =3D frame_size - hdrlen; > - vu_set_vnethdr(iov->iov_base, buf_cnt); > + dlen =3D frame[i].size - hdrlen; > + vu_set_vnethdr(iov->iov_base, frame[i].num_element); > =20 > /* The IPv4 header checksum varies only with dlen */ > if (previous_dlen !=3D dlen) > check =3D -1; > previous_dlen =3D dlen; > =20 > - tcp_vu_prepare(c, conn, iov, buf_cnt, &check, !*c->pcap, push); > + tcp_vu_prepare(c, conn, iov, iov_cnt, &check, !*c->pcap, push); > =20 > if (*c->pcap) > - pcap_iov(iov, buf_cnt, VNET_HLEN); > + pcap_iov(iov, iov_cnt, VNET_HLEN); > =20 > conn->seq_to_tap +=3D dlen; > } > =20 > /* send packets */ > - vu_flush(vdev, vq, elem, iov_cnt); > + vu_flush(vdev, vq, elem, elem_cnt); > =20 > conn_flag(c, conn, ACK_FROM_TAP_DUE); > =20 > --=20 > 2.53.0 >=20 --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --0f0pTBzw+8YWHyyR Content-Type: application/pgp-signature; name=signature.asc -----BEGIN PGP SIGNATURE----- iQIzBAEBCgAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmnDbUUACgkQzQJF27ox 2GfI8hAAox8rcfVtaf6vMVOa24WuSprAgqXLWn5NXs7AyALQlfwAOCWfeHeyVE/L S83d8Hw698WEKKzDAZnNISUWYbOA3Hwocw5vkJn+UFUdf09mFtRozLUE3TcBuOJg VdsqYeQiGoE/zksnQJimgjp503s0rS6R8RNb9sN7FTHTWoFXVPdWUZ0eohT1x20T YNCFAt+VgXrlpboKNziH6LifZ24E1PRSbE3bxWvJN6Qd991Ufy6wzyy2Rg7KHG2q 8r962j+x2joOpo8lASuB/7PONYjXAFjLJgPYut1I4RzVZcX+erwOsOyocNDFWzOl jiNKV0q3qy1EX/XtfX69GkRzM2PZpnEnlbPfG2RV31ZIkajkADnBynR+RlqK2fWx RR9L+Oc2ERW1gsiIlebhGhVf8PhJ7lY5KTfsUjdA3gnbRTBSMJ0Q9NJZUG9NVu8p tOoqLoEmwkr0aBZzq0CacAPSJdoWwMRB6edE3CrfeBMjso+PdoanRf7UFiVC1V+p 7sYr+bSena+Dxh/SYVUtV8/QQQJkfPbyi/zS1mJyRk+W17kmcfm2Sc2s/3XwIEcF SElTKYFv5c36xVdy5RbIgMW+e7zu2I9VigClBG0y1lXadDuH10O1odXeo9FyBQSL ICKw+m1JjS0XjyRDq1hjvhQ7t7qTdzb/uru6UYGIujjtgKGACZE= =s8Qa -----END PGP SIGNATURE----- --0f0pTBzw+8YWHyyR--