From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: passt.top; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=iJB3LdPz; dkim-atps=neutral Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by passt.top (Postfix) with ESMTPS id D49665A0265 for ; Mon, 11 May 2026 11:50:34 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1778493033; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=xQcd0nIluKY42pl497HTHBDekIIXnv6AU/x4rODvgAo=; b=iJB3LdPzjRFpaOA4kWSblks2rwX/VN3sdDEgWp09d2B/MF6+ul5bj2uUlQvHKfmGLxHrlx L68eCeci3dR/nlOl6WN4tos4zjMQGF1GRO/vW51IxxDDzOyTEwsADviLFyenaqp6TN1fIj Z/B9nTDQ0rSKFNltLZo/F5+NFJdHGGA= Received: from mail-qk1-f198.google.com (mail-qk1-f198.google.com [209.85.222.198]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-661-IzFmM36zN5uUmiH5FYUvCg-1; Mon, 11 May 2026 05:50:32 -0400 X-MC-Unique: IzFmM36zN5uUmiH5FYUvCg-1 X-Mimecast-MFC-AGG-ID: IzFmM36zN5uUmiH5FYUvCg_1778493032 Received: by mail-qk1-f198.google.com with SMTP id af79cd13be357-909120afff9so146845385a.1 for ; Mon, 11 May 2026 02:50:32 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1778493032; x=1779097832; h=content-transfer-encoding:in-reply-to:from:content-language :references:to:subject:user-agent:mime-version:date:message-id :x-gm-gg:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=xQcd0nIluKY42pl497HTHBDekIIXnv6AU/x4rODvgAo=; b=esREO+ZWXg6Yu2SfzYrsFya0uihlY2VlFp+qMt3KoRY/NLusPLmVtAD6ch7VY2382F aulbku0sJNVwCtvUfUTj9ElXQk2w0QfX51hxEXD3lA/IQz8EBvFh1d/nzzXeMMVIEQj+ I5FRcDW/xkpsZmxqe2odh4JNH9GUdZf1ACJQxonMJSzMVsiFt1U3taHrn17ca+UMqN1W KeQ97UE2OCSlzskRkiguYTKWw0L9K0V35OVlH52X/szxjNPiI3BevGpbIFzRViNefWTi uKo7TJGYiqydLzosHhK5tpqmqor0OYHCwOhMAcU1dPDpJSiV+HrNzs8Mj7k5m0kAAP/X /oyw== X-Forwarded-Encrypted: i=1; AFNElJ8E73TGJBeyILptq3eamdnXulDot0yd6d3SV+JJKQB/ZA/PDUtICwzPeoPNVoJHppmWqA/ge1gMFWg=@passt.top X-Gm-Message-State: AOJu0Yz0BE/mqIipf6I60RzZXQ8xV6PuNGH7B5KxUV7rVpVbyerlBP/o +6ycPskc3OibLYBw5QnEod4AhiuzK/IDF/SyAR1WLXfZeg9IRQ9ino5Pw5Ze3ap/+8/sTSQ+30S KIqvsBl4z23DeUEowD+hbZ1gHhKxLUyOnnL1evWki94T08RK0hE7uog== X-Gm-Gg: Acq92OFnbFnBJKGqWAEAO/R8M2nNKRNFaS+srlHwbZWhVF815GXP6NtLSOc+KOcxXQw /twm6SG9ibma2XNTz3vB+ETFCVNqBqIEQ0PKWB6++BMPAPsWl4n6DY4FqxuvB911PokctsOEGIa knZ9MRc2nZK2hfGzpCnbc06Dx8yxw+4N8iBqJIvFai7jHo1QdBaP9SywdqPywwA6IFrvNkXqbfu txifuo5uuivRfX09mGllpH2GMGYQFXRwMvIlNlbKDexs9D8nqiaTThqzQripLpcmKVrOR+t5bCE XUcaEP0wVcbOwrLcDw9kAmn9omdJSusOZcp8DYHUw7649A/90ybrDssSZWYWVj6GSFPvkzb80TY PP4FbFjnM/18yIiUSgqzEFsmNo6RLDYSJ5pmTVotEpCCmDHSf+iRvy5/rJeEwcR+oWeo2VaLi1t 4h6pWzZs7R40Kn X-Received: by 2002:a05:620a:664a:b0:909:c82b:9755 with SMTP id af79cd13be357-909c82ba054mr710514685a.10.1778493031467; Mon, 11 May 2026 02:50:31 -0700 (PDT) X-Received: by 2002:a05:620a:664a:b0:909:c82b:9755 with SMTP id af79cd13be357-909c82ba054mr710512285a.10.1778493030789; Mon, 11 May 2026 02:50:30 -0700 (PDT) Received: from [192.168.2.15] (lnsm4-toronto63-142-116-28-118.internet.virginmobile.ca. [142.116.28.118]) by smtp.gmail.com with ESMTPSA id af79cd13be357-8fc2cd04ddesm3069171685a.43.2026.05.11.02.50.30 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 11 May 2026 02:50:30 -0700 (PDT) Message-ID: Date: Mon, 11 May 2026 05:50:29 -0400 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v3 07/10] pcap: Pass explicit L2 length to pcap_iov() To: Laurent Vivier , passt-dev@passt.top References: <20260416155721.3807225-1-lvivier@redhat.com> <20260416155721.3807225-8-lvivier@redhat.com> From: Jon Maloy In-Reply-To: <20260416155721.3807225-8-lvivier@redhat.com> X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: ttawxSztU46OZRe19xr6qPr2KmjIJWjw-kjFVZk5Ne4_1778493032 X-Mimecast-Originator: redhat.com Content-Language: en-US Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Message-ID-Hash: ZKGDPU3EZFEODPH2M6ZLYEJ3YIR4RF3L X-Message-ID-Hash: ZKGDPU3EZFEODPH2M6ZLYEJ3YIR4RF3L X-MailFrom: jmaloy@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On 2026-04-16 11:57, Laurent Vivier wrote: > With vhost-user multibuffer frames, the iov can be larger than the > actual L2 frame. The previous approach of computing L2 length as > iov_size() - offset would overcount and write extra bytes into the > pcap file. > > Pass the L2 frame length explicitly to pcap_frame() and pcap_iov(), > and write exactly that many bytes instead of the full iov remainder. > > Signed-off-by: Laurent Vivier Reviewed-by: Jon Maloy See below. > --- > pcap.c | 28 +++++++++++++++++++--------- > pcap.h | 2 +- > tap.c | 6 ++++-- > tcp_vu.c | 14 ++++++++------ [...] > @@ -109,8 +115,11 @@ void pcap_multiple(const struct iovec *iov, size_t frame_parts, unsigned int n, > if (clock_gettime(CLOCK_REALTIME, &now)) > err_perror("Failed to get CLOCK_REALTIME time"); > > - for (i = 0; i < n; i++) > - pcap_frame(iov + i * frame_parts, frame_parts, offset, &now); > + for (i = 0; i < n; i++) { > + pcap_frame(iov + i * frame_parts, frame_parts, offset, > + iov_size(iov + i * frame_parts, frame_parts) - offset, > + &now); > + } > } > > /** > @@ -120,8 +129,9 @@ void pcap_multiple(const struct iovec *iov, size_t frame_parts, unsigned int n, > * containing packet data to write, including L2 header > * @iovcnt: Number of buffers (@iov entries) > * @offset: Offset of the L2 frame within the full data length > + * @l2len: Length of L2 frame data to capture > */ > -void pcap_iov(const struct iovec *iov, size_t iovcnt, size_t offset) > +void pcap_iov(const struct iovec *iov, size_t iovcnt, size_t offset, size_t l2len) This line should be wrapped. /jon > { > struct timespec now = { 0 }; > > @@ -131,7 +141,7 @@ void pcap_iov(const struct iovec *iov, size_t iovcnt, size_t offset) > if (clock_gettime(CLOCK_REALTIME, &now)) > err_perror("Failed to get CLOCK_REALTIME time"); > > - pcap_frame(iov, iovcnt, offset, &now); > + pcap_frame(iov, iovcnt, offset, l2len, &now); > } > > /** > diff --git a/pcap.h b/pcap.h > index dface5df4ee6..c171257cbd73 100644 > --- a/pcap.h > +++ b/pcap.h > @@ -13,7 +13,7 @@ extern int pcap_fd; > void pcap(const char *pkt, size_t l2len); > void pcap_multiple(const struct iovec *iov, size_t frame_parts, unsigned int n, > size_t offset); > -void pcap_iov(const struct iovec *iov, size_t iovcnt, size_t offset); > +void pcap_iov(const struct iovec *iov, size_t iovcnt, size_t offset, size_t l2len); > void pcap_init(struct ctx *c); > > #endif /* PCAP_H */ > diff --git a/tap.c b/tap.c > index 41a61a36c279..41ba2b8666a5 100644 > --- a/tap.c > +++ b/tap.c > @@ -500,7 +500,8 @@ static size_t tap_send_frames_passt(const struct ctx *c, > /* Number of unsent or partially sent buffers for the frame */ > size_t rembufs = bufs_per_frame - (i % bufs_per_frame); > > - if (write_remainder(c->fd_tap, &iov[i], rembufs, buf_offset) < 0) { > + if (write_remainder(c->fd_tap, &iov[i], rembufs, buf_offset, > + SIZE_MAX) < 0) { > err_perror("tap: partial frame send"); > return i; > } > @@ -1102,10 +1103,11 @@ void tap_handler(struct ctx *c, const struct timespec *now) > void tap_add_packet(struct ctx *c, struct iov_tail *data, > const struct timespec *now) > { > + size_t l2len = iov_tail_size(data); > struct ethhdr eh_storage; > const struct ethhdr *eh; > > - pcap_iov(data->iov, data->cnt, data->off); > + pcap_iov(data->iov, data->cnt, data->off, l2len); > > eh = IOV_PEEK_HEADER(data, eh_storage); > if (!eh) > diff --git a/tcp_vu.c b/tcp_vu.c > index 0cd01190d612..7b7ea9c789b1 100644 > --- a/tcp_vu.c > +++ b/tcp_vu.c > @@ -128,7 +128,8 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags) > return ret; > } > > - iov_truncate(&flags_iov[0], 1, hdrlen + optlen); > + l2len = hdrlen + optlen - VNET_HLEN; > + iov_truncate(&flags_iov[0], 1, l2len + VNET_HLEN); > payload = IOV_TAIL(flags_elem[0].in_sg, 1, hdrlen); > > if (flags & KEEPALIVE) > @@ -137,13 +138,12 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags) > tcp_fill_headers(c, conn, eh, ip4h, ip6h, th, &payload, > NULL, seq, !*c->pcap); > > - l2len = optlen + hdrlen - VNET_HLEN; > vu_pad(&flags_elem[0].in_sg[0], l2len); > > vu_flush(vdev, vq, flags_elem, 1); > > if (*c->pcap) > - pcap_iov(&flags_elem[0].in_sg[0], 1, VNET_HLEN); > + pcap_iov(&flags_elem[0].in_sg[0], 1, VNET_HLEN, l2len); > > if (flags & DUP_ACK) { > elem_cnt = vu_collect(vdev, vq, &flags_elem[1], 1, > @@ -158,8 +158,10 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags) > > vu_flush(vdev, vq, &flags_elem[1], 1); > > - if (*c->pcap) > - pcap_iov(&flags_elem[1].in_sg[0], 1, VNET_HLEN); > + if (*c->pcap) { > + pcap_iov(&flags_elem[1].in_sg[0], 1, VNET_HLEN, > + l2len); > + } > } > } > vu_queue_notify(vdev, vq); > @@ -464,7 +466,7 @@ int tcp_vu_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn) > vu_flush(vdev, vq, &elem[head[i]], buf_cnt); > > if (*c->pcap) > - pcap_iov(iov, buf_cnt, VNET_HLEN); > + pcap_iov(iov, buf_cnt, VNET_HLEN, l2len); > > conn->seq_to_tap += dlen; > } > diff --git a/udp_vu.c b/udp_vu.c > index 1a73d997f683..76242d423778 100644 > --- a/udp_vu.c > +++ b/udp_vu.c > @@ -182,6 +182,7 @@ void udp_vu_sock_to_tap(const struct ctx *c, int s, int n, flow_sidx_t tosidx) > static struct iovec iov_vu[VIRTQUEUE_MAX_SIZE]; > struct vu_dev *vdev = c->vdev; > struct vu_virtq *vq = &vdev->vq[VHOST_USER_RX_QUEUE]; > + size_t hdrlen = udp_vu_hdrlen(v6); > int i; > > assert(!c->no_udp); > @@ -227,7 +228,8 @@ void udp_vu_sock_to_tap(const struct ctx *c, int s, int n, flow_sidx_t tosidx) > udp_vu_prepare(c, iov_vu, toside, dlen); > if (*c->pcap) { > udp_vu_csum(toside, iov_vu, iov_cnt, dlen); > - pcap_iov(iov_vu, iov_cnt, VNET_HLEN); > + pcap_iov(iov_vu, iov_cnt, VNET_HLEN, > + hdrlen + dlen - VNET_HLEN); > } > vu_flush(vdev, vq, elem, elem_used); > vu_queue_notify(vdev, vq); > diff --git a/util.c b/util.c > index 73c9d51d7b4a..141aad275869 100644 > --- a/util.c > +++ b/util.c > @@ -722,31 +722,54 @@ int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags, > * @iov: IO vector > * @iovcnt: Number of entries in @iov > * @skip: Number of bytes of the vector to skip writing > + * @length: Number of bytes of the vector to write > * > * Return: 0 on success, -1 on error (with errno set) > * > * #syscalls writev > */ > -int write_remainder(int fd, const struct iovec *iov, size_t iovcnt, size_t skip) > +int write_remainder(int fd, const struct iovec *iov, size_t iovcnt, > + size_t skip, size_t length) > { > size_t i = 0, offset; > > - while ((i += iov_skip_bytes(iov + i, iovcnt - i, skip, &offset)) < iovcnt) { > + while (length && > + (i += iov_skip_bytes(iov + i, iovcnt - i, skip, &offset)) < iovcnt) { > ssize_t rc; > + size_t end; > > if (offset) { > + size_t len = MIN(length, iov[i].iov_len - offset); > + > /* Write the remainder of the partially written buffer */ > if (write_all_buf(fd, (char *)iov[i].iov_base + offset, > - iov[i].iov_len - offset) < 0) > + len) < 0) > return -1; > + > + length -= len; > i++; > + > + if (!length || i >= iovcnt) > + break; > + } > + > + end = iov_skip_bytes(iov + i, iovcnt - i, length, NULL); > + > + /* Write a trailing partial buffer */ > + if (!end) { > + size_t len = MIN(length, iov[i].iov_len); > + > + if (write_all_buf(fd, iov[i].iov_base, len) < 0) > + return -1; > + break; > } > > /* Write as much of the remaining whole buffers as we can */ > - rc = writev(fd, &iov[i], iovcnt - i); > + rc = writev(fd, &iov[i], end); > if (rc < 0) > return -1; > > + length -= rc; > skip = rc; > } > return 0; > diff --git a/util.h b/util.h > index 92aeabc86b52..888093277896 100644 > --- a/util.h > +++ b/util.h > @@ -235,7 +235,8 @@ int fls(unsigned long x); > int ilog2(unsigned long x); > int write_file(const char *path, const char *buf); > intmax_t read_file_integer(const char *path, intmax_t fallback); > -int write_remainder(int fd, const struct iovec *iov, size_t iovcnt, size_t skip); > +int write_remainder(int fd, const struct iovec *iov, size_t iovcnt, > + size_t skip, size_t length); > int read_remainder(int fd, const struct iovec *iov, size_t cnt, size_t skip); > void close_open_files(int argc, char **argv); > bool snprintf_check(char *str, size_t size, const char *format, ...); > diff --git a/vu_common.c b/vu_common.c > index 57949ca32309..f254cb67ec78 100644 > --- a/vu_common.c > +++ b/vu_common.c > @@ -268,7 +268,7 @@ int vu_send_single(const struct ctx *c, const void *buf, size_t size) > iov_from_buf(in_sg, in_total, VNET_HLEN, buf, total); > > if (*c->pcap) > - pcap_iov(in_sg, in_total, VNET_HLEN); > + pcap_iov(in_sg, in_total, VNET_HLEN, size); > > vu_flush(vdev, vq, elem, elem_cnt); > vu_queue_notify(vdev, vq);