public inbox for passt-dev@passt.top
From: David Gibson <david@gibson.dropbear.id.au>
To: Laurent Vivier <lvivier@redhat.com>
Cc: passt-dev@passt.top
Subject: Re: [PATCH v4 5/5] udp: Pass iov_tail to udp_update_hdr4()/udp_update_hdr6()
Date: Tue, 24 Mar 2026 13:54:03 +1100
Message-ID: <acH8yzvh8BQ9Y2Na@zatzit>
In-Reply-To: <20260323143151.538673-6-lvivier@redhat.com>

On Mon, Mar 23, 2026 at 03:31:51PM +0100, Laurent Vivier wrote:
> Change udp_update_hdr4() and udp_update_hdr6() to take an iov_tail
> covering the full L3 frame (IP header + UDP header + data), instead of
> separate IP header, udp_payload_t, and data-length parameters.  The
> functions now use with_header() and IOV_DROP_HEADER() to access the IP
> and UDP headers directly from the iov_tail, and derive sizes via
> iov_tail_size() rather than an explicit length argument.
> 
> This decouples the header update functions from the udp_payload_t memory
> layout, which assumes all headers and data sit in a single contiguous
> buffer.  The vhost-user path uses virtqueue-provided scatter-gather
> buffers where this assumption does not hold; passing an iov_tail lets
> both the tap path and the vhost-user path share the same functions
> without layout-specific helpers.
> 
> On the vhost-user side, udp_vu_prepare() likewise switches to
> with_header() for the Ethernet header, and its caller now drops the
> vnet header before calling udp_vu_prepare() instead of having the
> function deal with it internally.
> 
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> ---
>  iov.c          |   1 -
>  udp.c          | 129 +++++++++++++++++++++++++++----------------------
>  udp_internal.h |   6 +--
>  udp_vu.c       |  51 +++++++++----------
>  4 files changed, 97 insertions(+), 90 deletions(-)
> 
> diff --git a/iov.c b/iov.c
> index 7fc9c3c78a32..c1eda9941f32 100644
> --- a/iov.c
> +++ b/iov.c
> @@ -316,7 +316,6 @@ void *iov_peek_header_(struct iov_tail *tail, void *v, size_t len, size_t align)
>   *
>   * Return: number of bytes written
>   */
> -/* cppcheck-suppress unusedFunction */
>  size_t iov_put_header_(const struct iov_tail *tail, const void *v, size_t len)
>  {
>  	size_t l = len;
> diff --git a/udp.c b/udp.c
> index 1fc5a42c5ca7..261f2e1b156c 100644
> --- a/udp.c
> +++ b/udp.c
> @@ -254,42 +254,42 @@ static void udp_iov_init(const struct ctx *c)
>  
>  /**
>   * udp_update_hdr4() - Update headers for one IPv4 datagram
> - * @ip4h:		Pre-filled IPv4 header (except for tot_len and saddr)
> - * @bp:			Pointer to udp_payload_t to update

I like removing @bp from this function.  I'm less enthusiastic about
removing @ip4h as a separate parameter.  I kind of like the idea of
having this take @ip4h, @uh and @payload as separate parameters - a
bit like csum_udp4().

I realise, however, that this makes the caller awkward, at least with
the current constraints of PUSH_HEADER / with_header().
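
For illustration only, a minimal sketch of what that separate-parameter
shape could look like for IPv4.  sketch_update_hdr4() and its address
and port arguments are hypothetical (the real function takes a struct
flowside), the data is described only by its length, and checksums are
left out entirely:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <netinet/udp.h>

/* Hypothetical helper, not passt code: the IP and UDP headers are passed
 * as separate pointers and the data only by its length, similar in spirit
 * to the parameter split of csum_udp4().  No checksums are computed here.
 */
static size_t sketch_update_hdr4(struct iphdr *ip4h, struct udphdr *uh,
				 size_t dlen,
				 struct in_addr src, struct in_addr dst,
				 in_port_t sport, in_port_t dport)
{
	size_t l4len = dlen + sizeof(*uh);	/* UDP header + data */
	size_t l3len = l4len + sizeof(*ip4h);	/* plus IPv4 header */

	assert(ip4h && uh);

	ip4h->tot_len = htons(l3len);
	ip4h->saddr = src.s_addr;
	ip4h->daddr = dst.s_addr;

	uh->source = htons(sport);
	uh->dest = htons(dport);
	uh->len = htons(l4len);
	uh->check = 0;	/* sketch only: UDP checksum not set */

	return l4len;
}
```

With this shape the caller has to hold valid pointers to both headers
up front, which is the awkward part when the headers come out of a
scatter-gather buffer rather than a contiguous one.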

> + * @payload:		UDP payload

This is no longer accurate: the iov_tail now has to cover the IP and
UDP headers as well, so "payload" is not a good name any more.
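
As a rough illustration of why the name no longer fits: the tail handed
to the function spans the IP header, the UDP header and the data, and
the L4 length is whatever remains after the IP header.  In the sketch
below, sketch_iov_size() is a hypothetical stand-in for iov_tail_size()
that assumes a tail with zero offset; neither helper is passt code:

```c
#include <stddef.h>
#include <sys/uio.h>
#include <netinet/ip.h>

/* Hypothetical stand-in for iov_tail_size(): total number of bytes
 * described by an iovec array, assuming the tail starts at offset 0.
 */
static size_t sketch_iov_size(const struct iovec *iov, size_t cnt)
{
	size_t i, len = 0;

	for (i = 0; i < cnt; i++)
		len += iov[i].iov_len;

	return len;
}

/* L4 length (UDP header + data) derived from a vector covering the whole
 * IPv4 L3 frame: the same subtraction the patched udp_update_hdr4() does.
 * Assumes the vector holds at least a full IPv4 header.
 */
static size_t sketch_l4len_v4(const struct iovec *iov, size_t cnt)
{
	return sketch_iov_size(iov, cnt) - sizeof(struct iphdr);
}
```

For a vector made of a 20-byte IPv4 header, an 8-byte UDP header and
100 bytes of data, the total is 128 bytes and the derived L4 length is
108, so what gets passed in is the L3 frame rather than the payload.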

>   * @toside:		Flowside for destination side
> - * @dlen:		Length of UDP payload
>   * @no_udp_csum:	Do not set UDP checksum
>   *
>   * Return: size of IPv4 payload (UDP header + data)
>   */
> -size_t udp_update_hdr4(struct iphdr *ip4h, struct udp_payload_t *bp,
> -		       const struct flowside *toside, size_t dlen,
> +size_t udp_update_hdr4(struct iov_tail *payload, const struct flowside *toside,
>  		       bool no_udp_csum)
>  {
>  	const struct in_addr *src = inany_v4(&toside->oaddr);
>  	const struct in_addr *dst = inany_v4(&toside->eaddr);
> -	size_t l4len = dlen + sizeof(bp->uh);
> -	size_t l3len = l4len + sizeof(*ip4h);
> +	size_t l3len = iov_tail_size(payload);
> +	size_t l4len = l3len - sizeof(struct iphdr);
>  
>  	assert(src && dst);
>  
> -	ip4h->tot_len = htons(l3len);
> -	ip4h->daddr = dst->s_addr;
> -	ip4h->saddr = src->s_addr;
> -	ip4h->check = csum_ip4_header(l3len, IPPROTO_UDP, *src, *dst);
> +	with_header(struct iphdr, ip4h, payload) {
> +		ip4h->tot_len = htons(l3len);
> +		ip4h->daddr = dst->s_addr;
> +		ip4h->saddr = src->s_addr;
> +		ip4h->check = csum_ip4_header(l3len, IPPROTO_UDP, *src, *dst);
> +	}
> +	IOV_DROP_HEADER(payload, struct iphdr);
> +
> +	with_header(struct udphdr, uh, payload) {
> +		uh->source = htons(toside->oport);
> +		uh->dest = htons(toside->eport);
> +		uh->len = htons(l4len);
> +		if (no_udp_csum) {
> +			uh->check = 0;
> +		} else {
> +			struct iov_tail data = *payload;
>  
> -	bp->uh.source = htons(toside->oport);
> -	bp->uh.dest = htons(toside->eport);
> -	bp->uh.len = htons(l4len);
> -	if (no_udp_csum) {
> -		bp->uh.check = 0;
> -	} else {
> -		const struct iovec iov = {
> -			.iov_base = bp->data,
> -			.iov_len = dlen
> -		};
> -		struct iov_tail data = IOV_TAIL(&iov, 1, 0);
> -		csum_udp4(&bp->uh, *src, *dst, &data);
> +			IOV_DROP_HEADER(&data, struct udphdr);
> +			csum_udp4(uh, *src, *dst, &data);
> +		}
>  	}
>  
>  	return l4len;
> @@ -297,44 +297,45 @@ size_t udp_update_hdr4(struct iphdr *ip4h, struct udp_payload_t *bp,
>  
>  /**
>   * udp_update_hdr6() - Update headers for one IPv6 datagram
> - * @ip6h:		Pre-filled IPv6 header (except for payload_len and
> - * 			addresses)
> - * @bp:			Pointer to udp_payload_t to update
> + * @payload:		UDP payload
>   * @toside:		Flowside for destination side
> - * @dlen:		Length of UDP payload
>   * @no_udp_csum:	Do not set UDP checksum
>   *
>   * Return: size of IPv6 payload (UDP header + data)
>   */
> -size_t udp_update_hdr6(struct ipv6hdr *ip6h, struct udp_payload_t *bp,
> -		       const struct flowside *toside, size_t dlen,
> +size_t udp_update_hdr6(struct iov_tail *payload, const struct flowside *toside,
>  		       bool no_udp_csum)
>  {
> -	uint16_t l4len = dlen + sizeof(bp->uh);
> -
> -	ip6h->payload_len = htons(l4len);
> -	ip6h->daddr = toside->eaddr.a6;
> -	ip6h->saddr = toside->oaddr.a6;
> -	ip6h->version = 6;
> -	ip6h->nexthdr = IPPROTO_UDP;
> -	ip6h->hop_limit = 255;
> -
> -	bp->uh.source = htons(toside->oport);
> -	bp->uh.dest = htons(toside->eport);
> -	bp->uh.len = ip6h->payload_len;
> -	if (no_udp_csum) {
> -		/* 0 is an invalid checksum for UDP IPv6 and dropped by
> -		 * the kernel stack, even if the checksum is disabled by virtio
> -		 * flags. We need to put any non-zero value here.
> -		 */
> -		bp->uh.check = 0xffff;
> -	} else {
> -		const struct iovec iov = {
> -			.iov_base = bp->data,
> -			.iov_len = dlen
> -		};
> -		struct iov_tail data = IOV_TAIL(&iov, 1, 0);
> -		csum_udp6(&bp->uh, &toside->oaddr.a6, &toside->eaddr.a6, &data);
> +	uint16_t l4len = iov_tail_size(payload) - sizeof(struct ipv6hdr);
> +
> +	with_header(struct ipv6hdr, ip6h, payload) {
> +		ip6h->payload_len = htons(l4len);
> +		ip6h->daddr = toside->eaddr.a6;
> +		ip6h->saddr = toside->oaddr.a6;
> +		ip6h->version = 6;
> +		ip6h->nexthdr = IPPROTO_UDP;
> +		ip6h->hop_limit = 255;
> +	}
> +	IOV_DROP_HEADER(payload, struct ipv6hdr);
> +
> +	with_header(struct udphdr, uh, payload) {
> +		uh->source = htons(toside->oport);
> +		uh->dest = htons(toside->eport);
> +		uh->len = htons(l4len);
> +		if (no_udp_csum) {
> +			/* 0 is an invalid checksum for UDP IPv6 and dropped by
> +			 * the kernel stack, even if the checksum is disabled
> +			 * by virtio flags. We need to put any non-zero value
> +			 * here.
> +			 */
> +			uh->check = 0xffff;
> +		} else {
> +			struct iov_tail data = *payload;
> +
> +			IOV_DROP_HEADER(&data, struct udphdr);
> +			csum_udp6(uh, &toside->oaddr.a6, &toside->eaddr.a6,
> +				  &data);
> +		}
>  	}
>  
>  	return l4len;
> @@ -374,12 +375,22 @@ static void udp_tap_prepare(const struct mmsghdr *mmh,
>  	struct ethhdr *eh = (*tap_iov)[UDP_IOV_ETH].iov_base;
>  	struct udp_payload_t *bp = &udp_payload[idx];
>  	struct udp_meta_t *bm = &udp_meta[idx];
> +	struct iovec iov[3];
> +	struct iov_tail payload = IOV_TAIL(iov, ARRAY_SIZE(iov), 0);
>  	size_t l4len, l2len;
>  
> +	iov[1].iov_base = &bp->uh;
> +	iov[1].iov_len = sizeof(bp->uh);
> +	iov[2].iov_base = bp->data;
> +	iov[2].iov_len = mmh[idx].msg_len;
> +
>  	eth_update_mac(eh, NULL, tap_omac);
>  	if (!inany_v4(&toside->eaddr) || !inany_v4(&toside->oaddr)) {
> -		l4len = udp_update_hdr6(&bm->ip6h, bp, toside,
> -					mmh[idx].msg_len, no_udp_csum);
> +
> +		iov[0].iov_base = &bm->ip6h;
> +		iov[0].iov_len = sizeof(bm->ip6h);
> +
> +		l4len = udp_update_hdr6(&payload, toside, no_udp_csum);
>  
>  		l2len = MAX(l4len + sizeof(bm->ip6h) + ETH_HLEN, ETH_ZLEN);
>  		tap_hdr_update(&bm->taph, l2len);
> @@ -387,8 +398,10 @@ static void udp_tap_prepare(const struct mmsghdr *mmh,
>  		eh->h_proto = htons_constant(ETH_P_IPV6);
>  		(*tap_iov)[UDP_IOV_IP] = IOV_OF_LVALUE(bm->ip6h);
>  	} else {
> -		l4len = udp_update_hdr4(&bm->ip4h, bp, toside,
> -					mmh[idx].msg_len, no_udp_csum);
> +		iov[0].iov_base = &bm->ip4h;
> +		iov[0].iov_len = sizeof(bm->ip4h);
> +
> +		l4len = udp_update_hdr4(&payload, toside, no_udp_csum);
>  
>  		l2len = MAX(l4len + sizeof(bm->ip4h) + ETH_HLEN, ETH_ZLEN);
>  		tap_hdr_update(&bm->taph, l2len);
> diff --git a/udp_internal.h b/udp_internal.h
> index 64e457748324..fb3017ae3251 100644
> --- a/udp_internal.h
> +++ b/udp_internal.h
> @@ -25,11 +25,9 @@ struct udp_payload_t {
>  } __attribute__ ((packed, aligned(__alignof__(unsigned int))));
>  #endif
>  
> -size_t udp_update_hdr4(struct iphdr *ip4h, struct udp_payload_t *bp,
> -		       const struct flowside *toside, size_t dlen,
> +size_t udp_update_hdr4(struct iov_tail *payload, const struct flowside *toside,
>  		       bool no_udp_csum);
> -size_t udp_update_hdr6(struct ipv6hdr *ip6h, struct udp_payload_t *bp,
> -		       const struct flowside *toside, size_t dlen,
> +size_t udp_update_hdr6(struct iov_tail *payload, const struct flowside *toside,
>  		       bool no_udp_csum);
>  void udp_sock_fwd(const struct ctx *c, int s, int rule_hint,
>  		  uint8_t frompif, in_port_t port, const struct timespec *now);
> diff --git a/udp_vu.c b/udp_vu.c
> index dc949e30a793..80391b4f8788 100644
> --- a/udp_vu.c
> +++ b/udp_vu.c
> @@ -93,42 +93,39 @@ static ssize_t udp_vu_sock_recv(struct iovec *iov, size_t *cnt, int s, bool v6)
>   * @c:		Execution context
>   * @data:	IO vector tail for the frame
>   * @toside:	Address information for one side of the flow
> - * @dlen:	Packet data length
>   *
>   * Return: Layer-4 length
>   */
>  static size_t udp_vu_prepare(const struct ctx *c, const struct iov_tail *data,
> -			     const struct flowside *toside, ssize_t dlen)
> +			     const struct flowside *toside)
>  {
> -	const struct iovec *iov = data->iov;
> -	struct ethhdr *eh;
> +	bool ipv4 = inany_v4(&toside->eaddr) && inany_v4(&toside->oaddr);
> +	struct iov_tail payload = *data;
>  	size_t l4len;
>  
>  	/* ethernet header */
> -	eh = vu_eth(iov[0].iov_base);
> -
> -	memcpy(eh->h_dest, c->guest_mac, sizeof(eh->h_dest));
> -	memcpy(eh->h_source, c->our_tap_mac, sizeof(eh->h_source));
> +	with_header(struct ethhdr, eh, &payload) {
> +		memcpy(eh->h_dest, c->guest_mac, sizeof(eh->h_dest));
> +		memcpy(eh->h_source, c->our_tap_mac, sizeof(eh->h_source));
> +
> +		if (ipv4)
> +			eh->h_proto = htons(ETH_P_IP);
> +		else
> +			eh->h_proto = htons(ETH_P_IPV6);
> +	}
> +	IOV_DROP_HEADER(&payload, struct ethhdr);
>  
>  	/* initialize header */
> -	if (inany_v4(&toside->eaddr) && inany_v4(&toside->oaddr)) {
> -		struct iphdr *iph = vu_ip(iov[0].iov_base);
> -		struct udp_payload_t *bp = vu_payloadv4(iov[0].iov_base);
> +	if (ipv4) {
> +		with_header(struct iphdr, iph, &payload)
> +			*iph = (struct iphdr)L2_BUF_IP4_INIT(IPPROTO_UDP);
>  
> -		eh->h_proto = htons(ETH_P_IP);
> -
> -		*iph = (struct iphdr)L2_BUF_IP4_INIT(IPPROTO_UDP);
> -
> -		l4len = udp_update_hdr4(iph, bp, toside, dlen, true);
> +		l4len = udp_update_hdr4(&payload, toside, true);
>  	} else {
> -		struct ipv6hdr *ip6h = vu_ip(iov[0].iov_base);
> -		struct udp_payload_t *bp = vu_payloadv6(iov[0].iov_base);
> -
> -		eh->h_proto = htons(ETH_P_IPV6);
> -
> -		*ip6h = (struct ipv6hdr)L2_BUF_IP6_INIT(IPPROTO_UDP);
> +		with_header(struct ipv6hdr, ip6h, &payload)
> +			*ip6h = (struct ipv6hdr)L2_BUF_IP6_INIT(IPPROTO_UDP);
>  
> -		l4len = udp_update_hdr6(ip6h, bp, toside, dlen, true);
> +		l4len = udp_update_hdr6(&payload, toside, true);
>  	}
>  
>  	return l4len;
> @@ -137,7 +134,7 @@ static size_t udp_vu_prepare(const struct ctx *c, const struct iov_tail *data,
>  /**
>   * udp_vu_csum() - Calculate and set checksum for a UDP packet
>   * @toside:	Address information for one side of the flow
> - * @data:	IO vector tail for the frame (including vnet header)
> + * @data:	IO vector tail for the L2 frame
>   */
>  static void udp_vu_csum(const struct flowside *toside,
>  			const struct iov_tail *data)
> @@ -148,7 +145,6 @@ static void udp_vu_csum(const struct flowside *toside,
>  	struct udphdr *uh, uh_storage;
>  	bool ipv4 = src4 && dst4;
>  
> -	IOV_DROP_HEADER(&payload, struct virtio_net_hdr_mrg_rxbuf);
>  	IOV_DROP_HEADER(&payload, struct ethhdr);
>  	if (ipv4)
>  		IOV_DROP_HEADER(&payload, struct iphdr);
> @@ -225,10 +221,11 @@ void udp_vu_sock_to_tap(const struct ctx *c, int s, int n, flow_sidx_t tosidx)
>  		if (iov_cnt > 0) {
>  			struct iov_tail data = IOV_TAIL(iov_vu, iov_cnt, 0);
>  			vu_set_vnethdr(iov_vu[0].iov_base, elem_used);
> -			udp_vu_prepare(c, &data, toside, dlen);
> +			iov_drop_header(&data, VNET_HLEN);
> +			udp_vu_prepare(c, &data, toside);
>  			if (*c->pcap) {
>  				udp_vu_csum(toside, &data);
> -				pcap_iov(data.iov, data.cnt, VNET_HLEN);
> +				pcap_iov(data.iov, data.cnt, data.off);
>  			}
>  			vu_flush(vdev, vq, elem, elem_used);
>  		}
> -- 
> 2.53.0
> 

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson
