From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 24 Mar 2026 09:04:42 +0100
Subject: Re: [PATCH v4 1/5] vhost-user: Centralise Ethernet frame padding in vu_collect(), vu_pad() and vu_flush()
From: Laurent Vivier <lvivier@redhat.com>
To: David Gibson
References: <20260323143151.538673-1-lvivier@redhat.com> <20260323143151.538673-2-lvivier@redhat.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Cc: passt-dev@passt.top
List-Id: Development discussion and patches for passt

On 3/24/26 02:56, David Gibson wrote:
> On Mon, Mar 23, 2026 at 03:31:47PM +0100, Laurent Vivier wrote:
>> The per-protocol padding done by vu_pad() in tcp_vu.c and udp_vu.c was
>> only correct for single-buffer frames, and assumed the padding area always
>> fell within the first iov. It also relied on each caller computing the
>> right MAX(..., ETH_ZLEN + VNET_HLEN) size for vu_collect() and calling
>> vu_pad() at the right point.
>>
>> Centralise padding logic into three shared vhost-user helpers instead:
>>
>> - vu_collect() now ensures at least ETH_ZLEN + VNET_HLEN bytes of buffer
>>   space are collected, so there is always room for a minimum-sized frame.
>>
>> - vu_pad() replaces the old single-iov helper with a new implementation
>>   that takes a full iovec array plus a 'skipped' byte count. It uses a
>>   new iov_memset() helper in iov.c to zero-fill the padding area across
>>   iovec boundaries, then calls iov_truncate() to set the logical frame
>>   size.
>>
>> - vu_flush() computes the actual frame length (accounting for
>>   VIRTIO_NET_F_MRG_RXBUF multi-buffer frames) and passes the padded
>>   length to vu_queue_fill().
>>
>> Callers in tcp_vu.c, udp_vu.c and vu_send_single() now use the new
>> vu_pad() in place of the old pad-then-truncate sequences and the
>> MAX(..., ETH_ZLEN + VNET_HLEN) size calculations passed to vu_collect().
>>
>> Centralising padding here will also ease the move to multi-iovec per
>> element support, since there will be a single place to update.
>>
>> In vu_send_single(), fix padding, truncation and data copy to use the
>> requested frame size rather than the total available buffer space from
>> vu_collect(), which could be larger. Also add matching padding, truncation
>> and explicit size to vu_collect() for the DUP_ACK path in
>> tcp_vu_send_flag().
>
> Something I'm struggling to reason about throughout these series is
> the distinction between three things. I'm going to call them 1)
> "available buffer" - the space we've gotten from the guest into which
> we can potentially put data, 2) "used buffer" - the space we actually
> filled with data or padding and 3) "packet" - the space filled with
> the actual frame data, not counting padding.
>
> Actually, maybe there are 4: 1) the set of buffers we get from
> vu_collect() and 0) the buffers vu_collect() gathered before it
> truncated them to the requested size.
>
> We juggle iov variables that at various points could be one of those
> three, and I'm often not clear on which one it's representing at a
> given moment. An iov_truncate() can change from (1) to (2) or (2) to
> (3) safely, but moving in the other direction is never _obviously_
> safe - it relies on knowing where this specific iov came from and how
> it was allocated.
>
> Stefano and I were discussing last night whether an explicit and
> consistent terminology for those three things through the code might
> help.
>
>>
>> Signed-off-by: Laurent Vivier
>> ---
>>  iov.c       | 26 ++++++++++++++++++
>>  iov.h       |  2 ++
>>  tcp_vu.c    | 22 +++++-----------
>>  udp_vu.c    |  9 ++-----
>>  vu_common.c | 76 ++++++++++++++++++++++++++++++++++++-----------------
>>  vu_common.h |  2 +-
>>  6 files changed, 90 insertions(+), 47 deletions(-)
>>
>> diff --git a/iov.c b/iov.c
>> index ae0743931d18..8134b8c9f988 100644
>> --- a/iov.c
>> +++ b/iov.c
>> @@ -170,6 +170,32 @@ size_t iov_truncate(struct iovec *iov, size_t iov_cnt, size_t size)
>>  	return i;
>>  }
>>
>> +/**
>> + * iov_memset() - Set bytes of an IO vector to a given value
>> + * @iov:	IO vector
>> + * @iov_cnt:	Number of elements in @iov
>> + * @offset:	Byte offset in the iovec at which to start
>> + * @c:		Byte value to fill with
>> + * @length:	Number of bytes to set
>> + * Will write less than @length bytes if it runs out of space in
>> + * the iov
>> + */
>> +void iov_memset(const struct iovec *iov, size_t iov_cnt, size_t offset, int c,
>> +		size_t length)
>> +{
>> +	size_t i;
>> +
>> +	i = iov_skip_bytes(iov, iov_cnt, offset, &offset);
>> +
>> +	for ( ; i < iov_cnt && length; i++) {
>> +		size_t n = MIN(iov[i].iov_len - offset, length);
>> +
>> +		memset((char *)iov[i].iov_base + offset, c, n);
>> +		offset = 0;
>> +		length -= n;
>> +	}
>> +}
>> +
>>  /**
>>   * iov_tail_prune() - Remove any unneeded buffers from an IOV tail
>>   * @tail:	IO vector tail (modified)
>> diff --git a/iov.h b/iov.h
>> index b4e50b0fca5a..d295d05b3bab 100644
>> --- a/iov.h
>> +++ b/iov.h
>> @@ -30,6 +30,8 @@ size_t iov_to_buf(const struct iovec *iov, size_t iov_cnt,
>>  		  size_t offset, void *buf, size_t bytes);
>>  size_t iov_size(const struct iovec *iov, size_t iov_cnt);
>>  size_t iov_truncate(struct iovec *iov, size_t iov_cnt, size_t size);
>> +void iov_memset(const struct iovec *iov, size_t iov_cnt, size_t offset, int c,
>> +		size_t length);
>>
>>  /*
>>   * DOC: Theory of Operation, struct iov_tail
>> diff --git a/tcp_vu.c b/tcp_vu.c
>> index dc0e17c0f03f..3001defb5467 100644
>> --- a/tcp_vu.c
>> +++ b/tcp_vu.c
>> @@ -72,12 +72,12 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags)
>>  	struct vu_dev *vdev = c->vdev;
>>  	struct vu_virtq *vq = &vdev->vq[VHOST_USER_RX_QUEUE];
>>  	struct vu_virtq_element flags_elem[2];
>> -	size_t optlen, hdrlen, l2len;
>>  	struct ipv6hdr *ip6h = NULL;
>>  	struct iphdr *ip4h = NULL;
>>  	struct iovec flags_iov[2];
>>  	struct tcp_syn_opts *opts;
>>  	struct iov_tail payload;
>> +	size_t optlen, hdrlen;
>>  	struct tcphdr *th;
>>  	struct ethhdr *eh;
>>  	uint32_t seq;
>> @@ -89,7 +89,7 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags)
>>
>>  	elem_cnt = vu_collect(vdev, vq, &flags_elem[0], 1,
>>  			      &flags_iov[0], 1, NULL,
>> -			      MAX(hdrlen + sizeof(*opts), ETH_ZLEN + VNET_HLEN), NULL);
>> +			      hdrlen + sizeof(*opts), NULL);
>
> So vu_collect() now takes an upper bound on (3) as the parameter,
> instead of an upper bound on (2).  Seems reasonable, but it does have
> the side effect that this value is no longer an upper bound on (1).
>
>>  	if (elem_cnt != 1)
>>  		return -1;
>>
>> @@ -131,7 +131,7 @@
>>  		return ret;
>>  	}
>>
>> -	iov_truncate(&flags_iov[0], 1, hdrlen + optlen);
>> +	vu_pad(&flags_iov[0], 1, 0, hdrlen + optlen);
>>  	payload = IOV_TAIL(flags_elem[0].in_sg, 1, hdrlen);
>>
>>  	if (flags & KEEPALIVE)
>> @@ -140,9 +140,6 @@
>>  	tcp_fill_headers(c, conn, eh, ip4h, ip6h, th, &payload,
>>  			 NULL, seq, !*c->pcap);
>>
>> -	l2len = optlen + hdrlen - VNET_HLEN;
>> -	vu_pad(&flags_elem[0].in_sg[0], l2len);
>> -
>>  	if (*c->pcap)
>>  		pcap_iov(&flags_elem[0].in_sg[0], 1, VNET_HLEN);
>>  	nb_ack = 1;
>> @@ -150,10 +147,11 @@
>>  	if (flags & DUP_ACK) {
>>  		elem_cnt = vu_collect(vdev, vq, &flags_elem[1], 1,
>>  				      &flags_iov[1], 1, NULL,
>> -				      flags_elem[0].in_sg[0].iov_len, NULL);
>> +				      hdrlen + optlen, NULL);
>>  		if (elem_cnt == 1 &&
>>  		    flags_elem[1].in_sg[0].iov_len >=
>>  		    flags_elem[0].in_sg[0].iov_len) {
>> +			vu_pad(&flags_iov[1], 1, 0, hdrlen + optlen);
>>  			memcpy(flags_elem[1].in_sg[0].iov_base,
>>  			       flags_elem[0].in_sg[0].iov_base,
>>  			       flags_elem[0].in_sg[0].iov_len);
>> @@ -213,7 +211,7 @@ static ssize_t tcp_vu_sock_recv(const struct ctx *c, struct vu_virtq *vq,
>>  				ARRAY_SIZE(elem) - elem_cnt,
>>  				&iov_vu[DISCARD_IOV_NUM + iov_used],
>>  				VIRTQUEUE_MAX_SIZE - iov_used, &in_total,
>> -				MAX(MIN(mss, fillsize) + hdrlen, ETH_ZLEN + VNET_HLEN),
>> +				MIN(mss, fillsize) + hdrlen,
>>  				&frame_size);
>>  		if (cnt == 0)
>>  			break;
>> @@ -249,8 +247,7 @@ static ssize_t tcp_vu_sock_recv(const struct ctx *c, struct vu_virtq *vq,
>>  	if (!peek_offset_cap)
>>  		ret -= already_sent;
>>
>> -	/* adjust iov number and length of the last iov */
>> -	i = iov_truncate(&iov_vu[DISCARD_IOV_NUM], iov_used, ret);
>> +	i = vu_pad(&iov_vu[DISCARD_IOV_NUM], iov_used, hdrlen, ret);
>>
>>  	/* adjust head count */
>>  	while (*head_cnt > 0 && head[*head_cnt - 1] >= i)
>> @@ -446,7 +443,6 @@ int tcp_vu_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn)
>>  		size_t frame_size = iov_size(iov, buf_cnt);
>>  		bool push = i == head_cnt - 1;
>>  		ssize_t dlen;
>> -		size_t l2len;
>>
>>  		assert(frame_size >= hdrlen);
>>
>> @@ -460,10 +456,6 @@ int tcp_vu_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn)
>>
>>  		tcp_vu_prepare(c, conn, iov, buf_cnt, &check, !*c->pcap, push);
>>
>> -		/* Pad first/single buffer only, it's at least ETH_ZLEN long */
>> -		l2len = dlen + hdrlen - VNET_HLEN;
>> -		vu_pad(iov, l2len);
>> -
>>  		if (*c->pcap)
>>  			pcap_iov(iov, buf_cnt, VNET_HLEN);
>>
>> diff --git a/udp_vu.c b/udp_vu.c
>> index cc69654398f0..6a1e1696b19f 100644
>> --- a/udp_vu.c
>> +++ b/udp_vu.c
>> @@ -73,8 +73,7 @@ static int udp_vu_sock_recv(const struct ctx *c, struct vu_virtq *vq, int s,
>>  	const struct vu_dev *vdev = c->vdev;
>>  	int elem_cnt, elem_used, iov_used;
>>  	struct msghdr msg = { 0 };
>> -	size_t hdrlen, l2len;
>> -	size_t iov_cnt;
>> +	size_t iov_cnt, hdrlen;
>>
>>  	assert(!c->no_udp);
>>
>> @@ -117,13 +116,9 @@ static int udp_vu_sock_recv(const struct ctx *c, struct vu_virtq *vq, int s,
>>  	iov_vu[0].iov_base = (char *)iov_vu[0].iov_base - hdrlen;
>>  	iov_vu[0].iov_len += hdrlen;
>>
>> -	iov_used = iov_truncate(iov_vu, iov_cnt, *dlen + hdrlen);
>> +	iov_used = vu_pad(iov_vu, iov_cnt, 0, *dlen + hdrlen);
>>  	elem_used = iov_used;	/* one iovec per element */
>>
>> -	/* pad frame to 60 bytes: first buffer is at least ETH_ZLEN long */
>> -	l2len = *dlen + hdrlen - VNET_HLEN;
>> -	vu_pad(&iov_vu[0], l2len);
>> -
>>  	vu_set_vnethdr(iov_vu[0].iov_base, elem_used);
>>
>>  	/* release unused buffers */
>> diff --git a/vu_common.c b/vu_common.c
>> index 92381cd33c85..808eb2b69281 100644
>> --- a/vu_common.c
>> +++ b/vu_common.c
>> @@ -74,6 +74,7 @@ int vu_collect(const struct vu_dev *vdev, struct vu_virtq *vq,
>>  	size_t current_iov = 0;
>>  	int elem_cnt = 0;
>>
>> +	size = MAX(size, ETH_ZLEN + VNET_HLEN); /* Ethernet minimum size */
>>  	while (current_size < size && elem_cnt < max_elem &&
>>  	       current_iov < max_in_sg) {
>>  		int ret;
>> @@ -113,6 +114,25 @@ int vu_collect(const struct vu_dev *vdev, struct vu_virtq *vq,
>>  	return elem_cnt;
>>  }
>>
>> +/**
>> + * vu_pad() - Pad short frames to minimum Ethernet length and truncate iovec
>> + * @iov:	Pointer to iovec array
>> + * @cnt:	Number of entries in @iov
>> + * @skipped:	Bytes already accounted for in the frame but not in @iov
>> + * @size:	Data length in @iov
>> + *
>> + * Return: number of iovec entries after truncation
>> + */
>> +size_t vu_pad(struct iovec *iov, size_t cnt, size_t skipped, size_t size)
>> +{
>> +	if (skipped + size < ETH_ZLEN + VNET_HLEN) {
>> +		iov_memset(iov, cnt, size, 0,
>> +			   ETH_ZLEN + VNET_HLEN - (skipped + size));
>> +	}
>> +
>> +	return iov_truncate(iov, cnt, size);
>> +}
>> +
>>  /**
>>   * vu_set_vnethdr() - set virtio-net headers
>>   * @vnethdr:	Address of the header to set
>> @@ -137,12 +157,32 @@ void vu_set_vnethdr(struct virtio_net_hdr_mrg_rxbuf *vnethdr, int num_buffers)
>>  void vu_flush(const struct vu_dev *vdev, struct vu_virtq *vq,
>>  	      struct vu_virtq_element *elem, int elem_cnt)
>>  {
>> -	int i;
>> -
>> -	for (i = 0; i < elem_cnt; i++) {
>> -		size_t elem_size = iov_size(elem[i].in_sg, elem[i].in_num);
>> -
>> -		vu_queue_fill(vdev, vq, &elem[i], elem_size, i);
>> +	int i, j, num_elements;
>> +
>> +	for (i = 0; i < elem_cnt; i += num_elements) {
>> +		const struct virtio_net_hdr_mrg_rxbuf *vnethdr;
>> +		size_t len, padding, elem_size;
>> +
>> +		vnethdr = elem[i].in_sg[0].iov_base;
>> +		num_elements = le16toh(vnethdr->num_buffers);
>> +
>> +		len = 0;
>> +		for (j = 0; j < num_elements - 1; j++) {
>> +			elem_size = iov_size(elem[i + j].in_sg,
>> +					     elem[i + j].in_num);
>> +			vu_queue_fill(vdev, vq, &elem[i + j],
>> +				      elem_size, i + j);
>> +			len += elem_size;
>> +		}
>> +		/* pad the last element to have an Ethernet minimum size */
>> +		elem_size = iov_size(elem[i + j].in_sg, elem[i + j].in_num);
>> +		if (ETH_ZLEN + VNET_HLEN > len + elem_size)
>> +			padding = ETH_ZLEN + VNET_HLEN - (len + elem_size);
>> +		else
>> +			padding = 0;
>> +
>> +		vu_queue_fill(vdev, vq, &elem[i + j], elem_size + padding,
>
> So here we go from (3) to (2) which as discussed requires a very broad
> understanding to know is safe.
>
> Hrm.. I might have missed something, but would this approach work?
> The idea is to always delay iov truncation to the last possible
> moment, so we never have to try reversing it.
>  * vu_collect() takes an upper bound on data length, as in this patch
>  * vu_collect() pads that to at least ETH_ZLEN, again as in this patch
>  * vu_collect() doesn't do an iov_truncate(), it's explicitly allowed
>    to return more buffer space than you asked for (effectively, it
>    returns (0))
>  * Protocols never iov_truncate(), but keep track of the data length
>    in a separate variable
>  * vu_flush() takes the data length in addition to the other
>    parameters.  It does the pad and truncation immediately before
>    putting the iov in the queue

Yes, it's another way to pad the frame. I didn't do it like that because
I didn't want to track the data size separately. But you're right, it's
cleaner and I think it makes the code more robust.

I'm updating the series in this way.

Thanks,
Laurent