public inbox for passt-dev@passt.top
From: Laurent Vivier <lvivier@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: passt-dev@passt.top
Subject: Re: [PATCH v9 3/3] udp: Pass iov_tail to udp_update_hdr4()/udp_update_hdr6()
Date: Tue, 19 May 2026 10:03:39 +0200
Message-ID: <933734ef-5d4c-40dd-9fee-bb8f182f0921@redhat.com>
In-Reply-To: <agvwS0pUl8iAdJWy@zatzit>

On 5/19/26 07:08, David Gibson wrote:
> On Mon, May 18, 2026 at 02:26:25PM +0200, Laurent Vivier wrote:
>> Change udp_update_hdr4() and udp_update_hdr6() to take an iov_tail
>> pointing at the UDP frame instead of a contiguous udp_payload_t buffer
>> and explicit data length.  This lets vhost-user pass scatter-gather
>> virtqueue buffers directly without an intermediate copy.
>>
>> The UDP header is built into a local struct udphdr and written back with
>> IOV_PUSH_HEADER().  On the tap side, udp_tap_prepare() wraps the
>> existing udp_payload_t in a two-element iov to match the new interface.
>>
>> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> 
> Alas, this still has a potentially aliased memcpy(), see below.

Did you see that I updated patch 2 to avoid the memcpy() if the header is
already in place? (I think I should have removed your R-b...)
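
How patch 2 skips the copy is not shown in this message. As a minimal
sketch of the idea only, assuming a hypothetical helper
sketch_push_header() (not the actual iov_push_header_()), an
equal-pointer check before the memcpy() could look like this:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not the passt implementation: copy a header of
 * @len bytes from @src to @dst, skipping the copy when the header is
 * already in place.  memcpy() on overlapping regions is undefined
 * behaviour, so the equal-pointer case is handled before calling it.
 *
 * Return: number of bytes copied (0 if the header was already in place)
 */
static size_t sketch_push_header(void *dst, const void *src, size_t len)
{
	assert(dst && src);

	if (dst == src)
		return 0;	/* already in place, nothing to copy */

	memcpy(dst, src, len);
	return len;
}
```

This only covers exact aliasing (same start address). A partially
overlapping source would still need memmove() or a local copy.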

Thanks,
Laurent

> 
>> ---
>>   iov.c          |  1 -
>>   udp.c          | 73 ++++++++++++++++++++++---------------------
>>   udp_internal.h |  4 +--
>>   udp_vu.c       | 85 ++++++++++++++++++++++++++------------------------
>>   4 files changed, 85 insertions(+), 78 deletions(-)
>>
>> diff --git a/iov.c b/iov.c
>> index 6a5d7d35b67f..9248ba95a9f2 100644
>> --- a/iov.c
>> +++ b/iov.c
>> @@ -367,7 +367,6 @@ void *iov_peek_header_(struct iov_tail *tail, void *v, size_t len, size_t align)
>>    *
>>    * Return: number of bytes written
>>    */
>> -/* cppcheck-suppress unusedFunction */
>>   size_t iov_push_header_(struct iov_tail *tail, const void *v, size_t len)
>>   {
>>   	size_t l;
>> diff --git a/udp.c b/udp.c
>> index 66dc7766868c..8ea8ac848dab 100644
>> --- a/udp.c
>> +++ b/udp.c
>> @@ -255,20 +255,21 @@ static void udp_iov_init(const struct ctx *c)
>>   /**
>>    * udp_update_hdr4() - Update headers for one IPv4 datagram
>>    * @ip4h:		Pre-filled IPv4 header (except for tot_len and saddr)
>> - * @bp:			Pointer to udp_payload_t to update
>> + * @payload:		iov_tail including UDP header on entry, excluding it on exit
> 
> It's not necessarily essential to correctness, but I think things will
> be cleaner if you make this take @uh (as a pointer) and @payload
> (excluding the UDP header) as separate parameters.
> 
> For starters @payload can then be const and won't cover different
> things on entry and exit.
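
A rough sketch of the interface shape suggested above, with the header
as an output pointer and the payload read-only. Everything here is an
assumption for illustration: struct sketch_udphdr stands in for struct
udphdr, a plain iovec array stands in for a const iov_tail, and the
checksum is left out.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Stand-in for struct udphdr, defined here to keep the sketch
 * self-contained
 */
struct sketch_udphdr {
	uint16_t source;
	uint16_t dest;
	uint16_t len;
	uint16_t check;
};

/* Hypothetical: fill @uh for a payload described by a read-only iovec
 * array.  The payload is never modified and covers the same bytes on
 * entry and on exit.
 *
 * Return: size of datagram (UDP header + data)
 */
static size_t sketch_update_uh(struct sketch_udphdr *uh,
			       const struct iovec *payload, size_t cnt,
			       uint16_t sport, uint16_t dport)
{
	size_t dlen = 0, i;

	for (i = 0; i < cnt; i++)
		dlen += payload[i].iov_len;

	uh->source = htons(sport);
	uh->dest = htons(dport);
	uh->len = htons((uint16_t)(dlen + sizeof(*uh)));
	uh->check = 0;		/* checksum omitted in this sketch */

	return dlen + sizeof(*uh);
}
```

With this shape the caller owns the header storage and decides when and
where to write it, so the function itself never copies into the frame.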
> 
>>    * @toside:		Flowside for destination side
>>    * @dlen:		Length of UDP payload
>>    * @no_udp_csum:	Do not set UDP checksum
>>    *
>> - * Return: size of IPv4 payload (UDP header + data)
>> + * Return: size of datagram (UDP header + data)
>>    */
>> -size_t udp_update_hdr4(struct iphdr *ip4h, struct udp_payload_t *bp,
>> +size_t udp_update_hdr4(struct iphdr *ip4h, struct iov_tail *payload,
>>   		       const struct flowside *toside, size_t dlen,
>>   		       bool no_udp_csum)
>>   {
>>   	const struct in_addr *src = inany_v4(&toside->oaddr);
>>   	const struct in_addr *dst = inany_v4(&toside->eaddr);
>> -	size_t l4len = dlen + sizeof(bp->uh);
>> +	struct udphdr uh;
>> +	size_t l4len = dlen + sizeof(uh);
>>   	size_t l3len = l4len + sizeof(*ip4h);
>>   
>>   	assert(src && dst);
>> @@ -278,19 +279,18 @@ size_t udp_update_hdr4(struct iphdr *ip4h, struct udp_payload_t *bp,
>>   	ip4h->saddr = src->s_addr;
>>   	ip4h->check = csum_ip4_header(l3len, IPPROTO_UDP, *src, *dst);
>>   
>> -	bp->uh.source = htons(toside->oport);
>> -	bp->uh.dest = htons(toside->eport);
>> -	bp->uh.len = htons(l4len);
>> +	uh.source = htons(toside->oport);
>> +	uh.dest = htons(toside->eport);
>> +	uh.len = htons(l4len);
>>   	if (no_udp_csum) {
>> -		bp->uh.check = 0;
>> +		uh.check = 0;
>>   	} else {
>> -		const struct iovec iov = {
>> -			.iov_base = bp->data,
>> -			.iov_len = dlen
>> -		};
>> -		struct iov_tail data = IOV_TAIL(&iov, 1, 0);
>> -		csum_udp4(&bp->uh, *src, *dst, &data, dlen);
>> +		struct iov_tail data = *payload;
>> +
>> +		IOV_DROP_HEADER(&data, struct udphdr);
> 
> This is dropped.
> 
>> +		csum_udp4(&uh, *src, *dst, &data, dlen);
>>   	}
>> +	IOV_PUSH_HEADER(payload, uh);
> 
> As is this.
> 
>>   
>>   	return l4len;
>>   }
>> @@ -299,18 +299,19 @@ size_t udp_update_hdr4(struct iphdr *ip4h, struct udp_payload_t *bp,
>>    * udp_update_hdr6() - Update headers for one IPv6 datagram
>>    * @ip6h:		Pre-filled IPv6 header (except for payload_len and
>>    * 			addresses)
>> - * @bp:			Pointer to udp_payload_t to update
>> + * @payload:		iov_tail including UDP header on entry, excluding it on exit
> 
> Same for IPv6.
> 
>>    * @toside:		Flowside for destination side
>>    * @dlen:		Length of UDP payload
>>    * @no_udp_csum:	Do not set UDP checksum
>>    *
>> - * Return: size of IPv6 payload (UDP header + data)
>> + * Return: size of datagram (UDP header + data)
>>    */
>> -size_t udp_update_hdr6(struct ipv6hdr *ip6h, struct udp_payload_t *bp,
>> +size_t udp_update_hdr6(struct ipv6hdr *ip6h, struct iov_tail *payload,
>>   		       const struct flowside *toside, size_t dlen,
>>   		       bool no_udp_csum)
>>   {
>> -	uint16_t l4len = dlen + sizeof(bp->uh);
>> +	struct udphdr uh;
>> +	uint16_t l4len = dlen + sizeof(uh);
>>   
>>   	ip6h->payload_len = htons(l4len);
>>   	ip6h->daddr = toside->eaddr.a6;
>> @@ -319,24 +320,24 @@ size_t udp_update_hdr6(struct ipv6hdr *ip6h, struct udp_payload_t *bp,
>>   	ip6h->nexthdr = IPPROTO_UDP;
>>   	ip6h->hop_limit = 255;
>>   
>> -	bp->uh.source = htons(toside->oport);
>> -	bp->uh.dest = htons(toside->eport);
>> -	bp->uh.len = ip6h->payload_len;
>> +	uh.source = htons(toside->oport);
>> +	uh.dest = htons(toside->eport);
>> +	uh.len = htons(l4len);
>> +
>>   	if (no_udp_csum) {
>>   		/* 0 is an invalid checksum for UDP IPv6 and dropped by
>>   		 * the kernel stack, even if the checksum is disabled by virtio
>>   		 * flags. We need to put any non-zero value here.
>>   		 */
>> -		bp->uh.check = 0xffff;
>> +		uh.check = 0xffff;
>>   	} else {
>> -		const struct iovec iov = {
>> -			.iov_base = bp->data,
>> -			.iov_len = dlen
>> -		};
>> -		struct iov_tail data = IOV_TAIL(&iov, 1, 0);
>> -		csum_udp6(&bp->uh, &toside->oaddr.a6, &toside->eaddr.a6, &data,
>> -			  dlen);
>> +		struct iov_tail data = *payload;
>> +
>> +		IOV_DROP_HEADER(&data, struct udphdr);
> 
> This is dropped.
> 
>> +		csum_udp6(&uh, &toside->oaddr.a6, &toside->eaddr.a6,
>> +			  &data, dlen);
>>   	}
>> +	IOV_PUSH_HEADER(payload, uh);
> 
> And this.
> 
>>   
>>   	return l4len;
>>   }
>> @@ -372,15 +373,18 @@ static void udp_tap_prepare(const struct mmsghdr *mmh,
>>   			    bool no_udp_csum)
>>   {
>>   	struct iovec (*tap_iov)[UDP_NUM_IOVS] = &udp_l2_iov[idx];
>> +	struct iov_tail payload = IOV_TAIL(&(*tap_iov)[UDP_IOV_PAYLOAD], 1, 0);
> 
> You can still construct an iov_tail with just the UDP payload here,
> either from tap_iov, or from mmh[idx].msg_hdr.msg_iov[].
> 
> Likewise you can obtain a suitable uh pointer easily.
> 
> 
>>   	struct ethhdr *eh = (*tap_iov)[UDP_IOV_ETH].iov_base;
>> -	struct udp_payload_t *bp = &udp_payload[idx];
>>   	struct udp_meta_t *bm = &udp_meta[idx];
>>   	size_t l4len, l2len;
>>   
>> +	l4len = sizeof(struct udphdr) + mmh[idx].msg_len;
>> +	(*tap_iov)[UDP_IOV_PAYLOAD].iov_len = l4len;
>> +
>>   	eth_update_mac(eh, NULL, tap_omac);
>>   	if (!inany_v4(&toside->eaddr) || !inany_v4(&toside->oaddr)) {
>> -		l4len = udp_update_hdr6(&bm->ip6h, bp, toside,
>> -					mmh[idx].msg_len, no_udp_csum);
>> +		udp_update_hdr6(&bm->ip6h, &payload, toside, mmh[idx].msg_len,
>> +			        no_udp_csum);
> 
> So this will still work.
> 
> 
>>   		l2len = MAX(l4len + sizeof(bm->ip6h) + ETH_HLEN, ETH_ZLEN);
>>   		tap_hdr_update(&bm->taph, l2len);
>> @@ -388,8 +392,8 @@ static void udp_tap_prepare(const struct mmsghdr *mmh,
>>   		eh->h_proto = htons_constant(ETH_P_IPV6);
>>   		(*tap_iov)[UDP_IOV_IP] = IOV_OF_LVALUE(bm->ip6h);
>>   	} else {
>> -		l4len = udp_update_hdr4(&bm->ip4h, bp, toside,
>> -					mmh[idx].msg_len, no_udp_csum);
>> +		udp_update_hdr4(&bm->ip4h, &payload, toside, mmh[idx].msg_len,
>> +				no_udp_csum);
> 
> As will this.
> 
>>   
>>   		l2len = MAX(l4len + sizeof(bm->ip4h) + ETH_HLEN, ETH_ZLEN);
>>   		tap_hdr_update(&bm->taph, l2len);
>> @@ -397,7 +401,6 @@ static void udp_tap_prepare(const struct mmsghdr *mmh,
>>   		eh->h_proto = htons_constant(ETH_P_IP);
>>   		(*tap_iov)[UDP_IOV_IP] = IOV_OF_LVALUE(bm->ip4h);
>>   	}
>> -	(*tap_iov)[UDP_IOV_PAYLOAD].iov_len = l4len;
>>   
>>   	udp_tap_pad(*tap_iov);
>>   }
>> diff --git a/udp_internal.h b/udp_internal.h
>> index 64e457748324..e6cbaab79519 100644
>> --- a/udp_internal.h
>> +++ b/udp_internal.h
>> @@ -25,10 +25,10 @@ struct udp_payload_t {
>>   } __attribute__ ((packed, aligned(__alignof__(unsigned int))));
>>   #endif
>>   
>> -size_t udp_update_hdr4(struct iphdr *ip4h, struct udp_payload_t *bp,
>> +size_t udp_update_hdr4(struct iphdr *ip4h, struct iov_tail *payload,
>>   		       const struct flowside *toside, size_t dlen,
>>   		       bool no_udp_csum);
>> -size_t udp_update_hdr6(struct ipv6hdr *ip6h, struct udp_payload_t *bp,
>> +size_t udp_update_hdr6(struct ipv6hdr *ip6h, struct iov_tail *payload,
>>   		       const struct flowside *toside, size_t dlen,
>>   		       bool no_udp_csum);
>>   void udp_sock_fwd(const struct ctx *c, int s, int rule_hint,
>> diff --git a/udp_vu.c b/udp_vu.c
>> index 74bf79d57969..36543f75638d 100644
>> --- a/udp_vu.c
>> +++ b/udp_vu.c
>> @@ -98,69 +98,73 @@ static ssize_t udp_vu_sock_recv(struct iovec *iov, size_t *cnt, int s, bool v6)
>>   /**
>>    * udp_vu_prepare() - Prepare the packet header
>>    * @c:		Execution context
>> - * @iov:	IO vector for the frame (including vnet header)
>> + * @data:	IO vector tail for the L2 frame, on return points to the L4 header
>>    * @toside:	Address information for one side of the flow
>>    * @dlen:	Packet data length
>>    */
>> -static void udp_vu_prepare(const struct ctx *c, const struct iovec *iov,
>> -			     const struct flowside *toside, ssize_t dlen)
>> +static void udp_vu_prepare(const struct ctx *c, struct iov_tail *data,
>> +			     const struct flowside *toside, size_t dlen)
>>   {
>> -	struct ethhdr *eh;
>> +	bool ipv4 = inany_v4(&toside->eaddr) && inany_v4(&toside->oaddr);
>> +	struct ethhdr eh;
> 
> Create a local buffer for the UDP header.
> 
> In addition to @data we'll want another iov_tail as a parameter.
> I'll call it @payload since it will be const and contain only the UDP
> payload.  You already built a suitable iov array for the recvmsg(), so
> @payload can be constructed from that.
> 
>>   
>>   	/* ethernet header */
>> -	eh = vu_eth(iov[0].iov_base);
>> +	memcpy(eh.h_dest, c->guest_mac, sizeof(eh.h_dest));
>> +	memcpy(eh.h_source, c->our_tap_mac, sizeof(eh.h_source));
>>   
>> -	memcpy(eh->h_dest, c->guest_mac, sizeof(eh->h_dest));
>> -	memcpy(eh->h_source, c->our_tap_mac, sizeof(eh->h_source));
>> +	if (ipv4)
>> +		eh.h_proto = htons(ETH_P_IP);
>> +	else
>> +		eh.h_proto = htons(ETH_P_IPV6);
>> +	IOV_PUSH_HEADER(data, eh);
>>   
>>   	/* initialize header */
>> -	if (inany_v4(&toside->eaddr) && inany_v4(&toside->oaddr)) {
>> -		struct iphdr *iph = vu_ip(iov[0].iov_base);
>> -		struct udp_payload_t *bp = vu_payloadv4(iov[0].iov_base);
>> -
>> -		eh->h_proto = htons(ETH_P_IP);
>> +	if (ipv4) {
>> +		struct iov_tail datagram;
>> +		struct iphdr iph = (struct iphdr)L2_BUF_IP4_INIT(IPPROTO_UDP);
>>   
>> -		*iph = (struct iphdr)L2_BUF_IP4_INIT(IPPROTO_UDP);
>> +		datagram = *data;
>> +		IOV_DROP_HEADER(&datagram, struct iphdr);
>> +		udp_update_hdr4(&iph, &datagram, toside, dlen, true);
> 
> At the moment this is rather odd, because we skip ahead, push the UDP
> header inside udp_update_hdr4(), then rewind to push the IP header.
> If udp_update_hdr4() takes uh and @payload separately, it's more
> straightforward.  It will update both iph and uh, currently local
> buffers.  Then we can push iph followed by uh to @data.
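
To illustrate the "push iph followed by uh" ordering, here is a small
sketch of writing headers front to back into a scatter-gather buffer.
struct sketch_cursor and sketch_push() are hypothetical names; they are
similar in spirit to struct iov_tail and IOV_PUSH_HEADER() but are not
their definitions.

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical cursor over an iovec array */
struct sketch_cursor {
	const struct iovec *iov;	/* current entry */
	size_t cnt;			/* entries left, including current */
	size_t off;			/* byte offset into current entry */
};

/* Hypothetical: write @len bytes from @v at the cursor and advance it,
 * crossing iovec entry boundaries as needed.
 *
 * Return: number of bytes written (less than @len if space ran out)
 */
static size_t sketch_push(struct sketch_cursor *cur, const void *v, size_t len)
{
	const char *src = v;
	size_t done = 0;

	while (done < len && cur->cnt) {
		size_t room = cur->iov->iov_len - cur->off;
		size_t n = len - done < room ? len - done : room;

		if (n)
			memcpy((char *)cur->iov->iov_base + cur->off,
			       src + done, n);
		done += n;
		cur->off += n;

		if (cur->off == cur->iov->iov_len) {
			/* current entry is full, move to the next one */
			cur->iov++;
			cur->cnt--;
			cur->off = 0;
		}
	}

	return done;
}
```

Pushing the IP header and then the UDP header with the same cursor
leaves it at the start of the payload, with no need to skip ahead and
rewind.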
> 
>>   
>> -		udp_update_hdr4(iph, bp, toside, dlen, true);
>> +		IOV_PUSH_HEADER(data, iph);
>>   	} else {
>> -		struct ipv6hdr *ip6h = vu_ip(iov[0].iov_base);
>> -		struct udp_payload_t *bp = vu_payloadv6(iov[0].iov_base);
>> -
>> -		eh->h_proto = htons(ETH_P_IPV6);
>> +		struct iov_tail datagram;
>> +		struct ipv6hdr ip6h = (struct ipv6hdr)L2_BUF_IP6_INIT(IPPROTO_UDP);
>>   
>> -		*ip6h = (struct ipv6hdr)L2_BUF_IP6_INIT(IPPROTO_UDP);
>> +		datagram = *data;
>> +		IOV_DROP_HEADER(&datagram, struct ipv6hdr);
>> +		udp_update_hdr6(&ip6h, &datagram, toside, dlen, true);
> 
> Similar for IPv6.
> 
>>   
>> -		udp_update_hdr6(ip6h, bp, toside, dlen, true);
>> +		IOV_PUSH_HEADER(data, ip6h);
>>   	}
>>   }
>>   
>>   /**
>>    * udp_vu_csum() - Calculate and set checksum for a UDP packet
>>    * @toside:	Address information for one side of the flow
>> - * @iov:	IO vector for the frame
>> - * @cnt:	Number of IO vector entries
>> + * @data:	IO vector tail including UDP header on entry, excluding it on exit
>>    * @dlen:	Data length
>>    */
>> -static void udp_vu_csum(const struct flowside *toside, const struct iovec *iov,
>> -			size_t cnt, size_t dlen)
>> +static void udp_vu_csum(const struct flowside *toside, struct iov_tail *data,
>> +			size_t dlen)
>>   {
>>   	const struct in_addr *src4 = inany_v4(&toside->oaddr);
>>   	const struct in_addr *dst4 = inany_v4(&toside->eaddr);
>> -	char *base = iov[0].iov_base;
>> -	struct udp_payload_t *bp;
>> -	struct iov_tail data;
>> -
>> -	if (src4 && dst4) {
>> -		bp = vu_payloadv4(base);
>> -		data = IOV_TAIL(iov, cnt, (char *)&bp->data - base);
>> -		csum_udp4(&bp->uh, *src4, *dst4, &data, dlen);
>> -	} else {
>> -		bp = vu_payloadv6(base);
>> -		data = IOV_TAIL(iov, cnt, (char *)&bp->data - base);
>> -		csum_udp6(&bp->uh, &toside->oaddr.a6, &toside->eaddr.a6, &data,
>> -			  dlen);
>> -	}
>> +	struct iov_tail current = *data;
>> +	struct udphdr *uh, uh_storage;
>> +	bool ipv4 = src4 && dst4;
>> +
>> +	uh = IOV_REMOVE_HEADER(&current, uh_storage);
>> +	if (!uh)
>> +		return;
>> +
>> +	if (ipv4)
>> +		csum_udp4(uh, *src4, *dst4, &current, dlen);
>> +	else
>> +		csum_udp6(uh, &toside->oaddr.a6, &toside->eaddr.a6, &current, dlen);
>> +
>> +	IOV_PUSH_HEADER(data, *uh);
> 
> Here's the aliased memcpy.  uh points within data, and this will
> attempt to copy it back into the same place.  Better to have uh be a
> local buffer.
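
A minimal sketch of the local-buffer approach, under the assumption of
a contiguous header at @frame. struct sketch_uh and sketch_set_check()
are hypothetical names, not passt code. Each memcpy() has a local
variable on one side, so source and destination cannot overlap.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for struct udphdr, to keep the sketch self-contained */
struct sketch_uh {
	uint16_t source;
	uint16_t dest;
	uint16_t len;
	uint16_t check;
};

/* Hypothetical: set the checksum field of the UDP header stored at
 * @frame.  The header is copied out to local storage, updated there,
 * and copied back.
 */
static void sketch_set_check(void *frame, uint16_t check)
{
	struct sketch_uh uh;

	memcpy(&uh, frame, sizeof(uh));		/* frame -> local copy */
	uh.check = check;
	memcpy(frame, &uh, sizeof(uh));		/* local copy -> frame */
}
```

The cost is one extra 8-byte copy per datagram compared with updating
the header in place through a pointer into the frame.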
> 
>>   }
>>   
>>   /**
>> @@ -227,9 +231,10 @@ void udp_vu_sock_to_tap(const struct ctx *c, int s, int n, flow_sidx_t tosidx)
>>   		vu_queue_rewind(vq, elem_cnt - elem_used);
>>   
>>   		if (iov_cnt > 0) {
>> -			udp_vu_prepare(c, iov_vu, toside, dlen);
>> +			struct iov_tail data = IOV_TAIL(iov_vu, iov_cnt, VNET_HLEN);
>> +			udp_vu_prepare(c, &data, toside, dlen);
>>   			if (*c->pcap) {
>> -				udp_vu_csum(toside, iov_vu, iov_cnt, dlen);
>> +				udp_vu_csum(toside, &data, dlen);
>>   				pcap_iov(iov_vu, iov_cnt, VNET_HLEN,
>>   					 hdrlen + dlen - VNET_HLEN);
>>   			}
>> -- 
>> 2.54.0
>>
> 



Thread overview: 6+ messages
2026-05-18 12:26 [PATCH v9 0/3] vhost-user,udp: Handle multiple iovec entries per virtqueue element Laurent Vivier
2026-05-18 12:26 ` [PATCH v9 1/3] udp_vu: Allow virtqueue elements with multiple iovec entries Laurent Vivier
2026-05-18 12:26 ` [PATCH v9 2/3] iov: Introduce IOV_PUSH_HEADER() macro Laurent Vivier
2026-05-18 12:26 ` [PATCH v9 3/3] udp: Pass iov_tail to udp_update_hdr4()/udp_update_hdr6() Laurent Vivier
2026-05-19  5:08   ` David Gibson
2026-05-19  8:03     ` Laurent Vivier [this message]
