* [PATCH v8 00/30] Introduce discontiguous frames management
@ 2025-08-05 15:45 Laurent Vivier
2025-08-05 15:45 ` [PATCH v8 01/30] arp: Don't mix incoming and outgoing buffers Laurent Vivier
` (29 more replies)
0 siblings, 30 replies; 66+ messages in thread
From: Laurent Vivier @ 2025-08-05 15:45 UTC (permalink / raw)
To: passt-dev; +Cc: Laurent Vivier
This series introduces iov_tail to convey frame information
between functions.
This is only an API change: for the moment, the memory pool
can only store contiguous buffers, so, except for one special
case in vhost-user, we only handle iovec arrays with a single
entry.
v8:
- rebase
- rework the two last patches to store the iovec in the p->pkt array
v7:
- Add a patch to fix comment style of 'Return:'
- Fix ignore_arp()/accept_arp()
- Fix coverity error
- Fix several comments
v6:
- Replaced iov_slice() with the clearer iov_tail_clone()
for creating iovec subsets.
- Standardized local header variable names (to *_storage suffix).
- Renamed functions for better semantics (e.g., ignore_arp to
accept_arp, packet_data to packet_get).
- Corrected OPTLEN_MAX definition in TCP.
- Addressed minor logic issues (e.g., DHCPv6 FQDN flags, NDP null check).
- Updated ipv6_l4hdr() return type to boolean.
- Improved comments and documentation across several modules.
v5:
- store in the pool iovec array with several entries
v4:
Prepare to introduce iovec array in the pool:
- pass iov_tail rather than pool to ndp, icmp, dhcp, dhcpv6 and arp
- remove unused pool macros
- add memory regions in the pool structure, this will allow us to use
the buf pointer to store the iovec array for vhost-user
v3:
Address comments from David
Laurent Vivier (30):
arp: Don't mix incoming and outgoing buffers
iov: Introduce iov_tail_clone() and iov_tail_drop().
iov: Update IOV_REMOVE_HEADER() and IOV_PEEK_HEADER()
tap: Use iov_tail with tap_add_packet()
packet: Use iov_tail with packet_add()
packet: Add packet_data()
arp: Convert to iov_tail
ndp: Convert to iov_tail
icmp: Convert to iov_tail
udp: Convert to iov_tail
tcp: Convert tcp_tap_handler() to use iov_tail
tcp: Convert tcp_data_from_tap() to use iov_tail
dhcpv6: move offset initialization out of dhcpv6_opt()
dhcpv6: Extract sending of NotOnLink status
dhcpv6: Convert to iov_tail
dhcpv6: Use iov_tail in dhcpv6_opt()
dhcp: Convert to iov_tail
ip: Use iov_tail in ipv6_l4hdr()
tap: Convert tap4_handler() to iov_tail
tap: Convert tap6_handler() to iov_tail
packet: rename packet_data() to packet_get()
arp: use iov_tail rather than pool
dhcp: use iov_tail rather than pool
dhcpv6: use iov_tail rather than pool
icmp: use iov_tail rather than pool
ndp: use iov_tail rather than pool
packet: remove PACKET_POOL() and PACKET_POOL_P()
packet: remove unused parameter from PACKET_POOL_DECL()
packet: Refactor vhost-user memory region handling
packet: Add support for multi-vector packets
arp.c | 86 +++++++++++++-------
arp.h | 2 +-
dhcp.c | 48 ++++++-----
dhcp.h | 2 +-
dhcpv6.c | 223 +++++++++++++++++++++++++++++++--------------------
dhcpv6.h | 2 +-
icmp.c | 41 ++++++----
icmp.h | 2 +-
iov.c | 102 ++++++++++++++++++++---
iov.h | 58 ++++++++++----
ip.c | 32 ++++----
ip.h | 3 +-
ndp.c | 16 +++-
ndp.h | 4 +-
packet.c | 139 +++++++++++++++++---------------
packet.h | 45 ++++-------
pcap.c | 1 +
tap.c | 119 +++++++++++++++------------
tap.h | 4 +-
tcp.c | 61 +++++++++-----
tcp_buf.c | 2 +-
udp.c | 33 +++++---
vhost_user.c | 28 +++----
virtio.c | 4 +-
virtio.h | 18 ++++-
vu_common.c | 48 ++++-------
26 files changed, 689 insertions(+), 434 deletions(-)
--
2.49.0
* [PATCH v8 01/30] arp: Don't mix incoming and outgoing buffers
2025-08-05 15:45 [PATCH v8 00/30] Introduce discontiguous frames management Laurent Vivier
@ 2025-08-05 15:45 ` Laurent Vivier
2025-08-05 15:46 ` [PATCH v8 02/30] iov: Introduce iov_tail_clone() and iov_tail_drop() Laurent Vivier
` (28 subsequent siblings)
29 siblings, 0 replies; 66+ messages in thread
From: Laurent Vivier @ 2025-08-05 15:45 UTC (permalink / raw)
To: passt-dev; +Cc: Laurent Vivier, David Gibson
Don't use the memory of the incoming packet to build the outgoing buffer,
as it can be memory of the TX queue in the case of vhost-user.
Moreover, with vhost-user, the packet can be split across several
iovec entries, and it's easier to rebuild it in a buffer than to update
an existing iovec array.
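As a minimal sketch of the idea (not the code from this patch), the reply can be assembled in a buffer owned by the caller while the request is only ever read through a const pointer. The type `struct arp_body_sketch` and the function `arp_reply_sketch()` are hypothetical names used for illustration; the field names mirror the sha/sip/tha/tip fields seen in the diff below.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified ARP payload for IPv4 over Ethernet */
struct arp_body_sketch {
	uint8_t sha[6];	/* sender hardware address */
	uint8_t sip[4];	/* sender IPv4 address */
	uint8_t tha[6];	/* target hardware address */
	uint8_t tip[4];	/* target IPv4 address */
};

/* Fill @resp from @req without writing to @req: the request may live in
 * memory we don't own (for example a guest TX queue), so it stays const.
 */
static void arp_reply_sketch(struct arp_body_sketch *resp,
			     const struct arp_body_sketch *req,
			     const uint8_t our_mac[6])
{
	memcpy(resp->sha, our_mac, sizeof(resp->sha));
	memcpy(resp->sip, req->tip, sizeof(resp->sip));
	memcpy(resp->tha, req->sha, sizeof(resp->tha));
	memcpy(resp->tip, req->sip, sizeof(resp->tip));
}
```

Because nothing is swapped in place, no temporary is needed for the IP addresses, and the same function works whether the request is contiguous or was first gathered from several iovec entries.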
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
arp.c | 84 ++++++++++++++++++++++++++++++++++++++---------------------
1 file changed, 55 insertions(+), 29 deletions(-)
diff --git a/arp.c b/arp.c
index fc482bbd9938..9f1fedeafec0 100644
--- a/arp.c
+++ b/arp.c
@@ -31,56 +31,82 @@
#include "tap.h"
/**
- * arp() - Check if this is a supported ARP message, reply as needed
+ * ignore_arp() - Check if we should ignore this ARP message
* @c: Execution context
- * @p: Packet pool, single packet with Ethernet buffer
+ * @ah: ARP header
+ * @am: ARP message
*
- * Return: 1 if handled, -1 on failure
+ * Return: true if the ARP message should be ignored, false otherwise
*/
-int arp(const struct ctx *c, const struct pool *p)
+static bool ignore_arp(const struct ctx *c,
+ const struct arphdr *ah, const struct arpmsg *am)
{
- unsigned char swap[4];
- struct ethhdr *eh;
- struct arphdr *ah;
- struct arpmsg *am;
- size_t l2len;
-
- eh = packet_get(p, 0, 0, sizeof(*eh), NULL);
- ah = packet_get(p, 0, sizeof(*eh), sizeof(*ah), NULL);
- am = packet_get(p, 0, sizeof(*eh) + sizeof(*ah), sizeof(*am), NULL);
-
- if (!eh || !ah || !am)
- return -1;
-
if (ah->ar_hrd != htons(ARPHRD_ETHER) ||
ah->ar_pro != htons(ETH_P_IP) ||
ah->ar_hln != ETH_ALEN ||
ah->ar_pln != 4 ||
ah->ar_op != htons(ARPOP_REQUEST))
- return 1;
+ return true;
/* Discard announcements, but not 0.0.0.0 "probes" */
if (memcmp(am->sip, &in4addr_any, sizeof(am->sip)) &&
!memcmp(am->sip, am->tip, sizeof(am->sip)))
- return 1;
+ return true;
/* Don't resolve the guest's assigned address, either. */
if (!memcmp(am->tip, &c->ip4.addr, sizeof(am->tip)))
+ return true;
+
+ return false;
+}
+
+/**
+ * arp() - Check if this is a supported ARP message, reply as needed
+ * @c: Execution context
+ * @p: Packet pool, single packet with Ethernet buffer
+ *
+ * Return: 1 if handled, -1 on failure
+ */
+int arp(const struct ctx *c, const struct pool *p)
+{
+ struct {
+ struct ethhdr eh;
+ struct arphdr ah;
+ struct arpmsg am;
+ } __attribute__((__packed__)) resp;
+ const struct ethhdr *eh;
+ const struct arphdr *ah;
+ const struct arpmsg *am;
+
+ eh = packet_get(p, 0, 0, sizeof(*eh), NULL);
+ ah = packet_get(p, 0, sizeof(*eh), sizeof(*ah), NULL);
+ am = packet_get(p, 0, sizeof(*eh) + sizeof(*ah), sizeof(*am), NULL);
+
+ if (!eh || !ah || !am)
+ return -1;
+
+ if (ignore_arp(c, ah, am))
return 1;
- ah->ar_op = htons(ARPOP_REPLY);
- memcpy(am->tha, am->sha, sizeof(am->tha));
- memcpy(am->sha, c->our_tap_mac, sizeof(am->sha));
+ /* Ethernet header */
+ resp.eh.h_proto = htons(ETH_P_ARP);
+ memcpy(resp.eh.h_dest, eh->h_source, sizeof(resp.eh.h_dest));
+ memcpy(resp.eh.h_source, c->our_tap_mac, sizeof(resp.eh.h_source));
- memcpy(swap, am->tip, sizeof(am->tip));
- memcpy(am->tip, am->sip, sizeof(am->tip));
- memcpy(am->sip, swap, sizeof(am->sip));
+ /* ARP header */
+ resp.ah.ar_op = htons(ARPOP_REPLY);
+ resp.ah.ar_hrd = ah->ar_hrd;
+ resp.ah.ar_pro = ah->ar_pro;
+ resp.ah.ar_hln = ah->ar_hln;
+ resp.ah.ar_pln = ah->ar_pln;
- l2len = sizeof(*eh) + sizeof(*ah) + sizeof(*am);
- memcpy(eh->h_dest, eh->h_source, sizeof(eh->h_dest));
- memcpy(eh->h_source, c->our_tap_mac, sizeof(eh->h_source));
+ /* ARP message */
+ memcpy(resp.am.sha, c->our_tap_mac, sizeof(resp.am.sha));
+ memcpy(resp.am.sip, am->tip, sizeof(resp.am.sip));
+ memcpy(resp.am.tha, am->sha, sizeof(resp.am.tha));
+ memcpy(resp.am.tip, am->sip, sizeof(resp.am.tip));
- tap_send_single(c, eh, l2len);
+ tap_send_single(c, &resp, sizeof(resp));
return 1;
}
--
2.49.0
* [PATCH v8 02/30] iov: Introduce iov_tail_clone() and iov_tail_drop().
2025-08-05 15:45 [PATCH v8 00/30] Introduce discontiguous frames management Laurent Vivier
2025-08-05 15:45 ` [PATCH v8 01/30] arp: Don't mix incoming and outgoing buffers Laurent Vivier
@ 2025-08-05 15:46 ` Laurent Vivier
2025-08-06 1:32 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 03/30] iov: Update IOV_REMOVE_HEADER() and IOV_PEEK_HEADER() Laurent Vivier
` (27 subsequent siblings)
29 siblings, 1 reply; 66+ messages in thread
From: Laurent Vivier @ 2025-08-05 15:46 UTC (permalink / raw)
To: passt-dev; +Cc: Laurent Vivier
These utilities enhance iov_tail manipulation, useful for
efficient packet processing by enabling iovec array cloning and
header stripping without data copies.
- iov_tail_drop(): Discards a specified number of bytes from the
beginning of an iov_tail by advancing its internal offset and pruning
consumed elements.
- iov_tail_clone(): Clones an iov_tail into an iovec array, adjusting the
first iovec entry to remove the iov_tail offset.
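The two operations described above can be sketched in a self-contained way as follows. This is an illustration under assumptions, not the implementation in iov.c: `struct tail_sketch` and the `tail_sketch_*()` functions are hypothetical stand-ins for struct iov_tail and its helpers, and the pruning logic is a simplified guess at what iov_tail_prune() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical stand-in for struct iov_tail */
struct tail_sketch {
	const struct iovec *iov;	/* remaining iovec entries */
	size_t cnt;			/* number of entries in @iov */
	size_t off;			/* bytes to skip from the start of @iov */
};

/* Skip entries entirely covered by @off; true if any entry remains */
static bool tail_sketch_prune(struct tail_sketch *t)
{
	while (t->cnt && t->off >= t->iov[0].iov_len) {
		t->off -= t->iov[0].iov_len;
		t->iov++;
		t->cnt--;
	}
	return t->cnt > 0;
}

/* Drop @len bytes from the front: only the offset moves, no data is copied */
static bool tail_sketch_drop(struct tail_sketch *t, size_t len)
{
	t->off += len;
	return tail_sketch_prune(t);
}

/* Make @dst reference the remaining bytes; -1 if @dst has too few entries */
static ssize_t tail_sketch_clone(struct iovec *dst, size_t dst_cnt,
				 struct tail_sketch *t)
{
	size_t off, i;

	assert(dst && t);

	if (!tail_sketch_prune(t))
		return 0;
	if (t->cnt > dst_cnt)
		return -1;

	/* After pruning, the offset only applies to the first entry */
	off = t->off;
	for (i = 0; i < t->cnt; i++) {
		dst[i].iov_base = (char *)t->iov[i].iov_base + off;
		dst[i].iov_len = t->iov[i].iov_len - off;
		off = 0;
	}
	return (ssize_t)i;
}
```

In this sketch, dropping a header that fills the whole first entry leaves a tail with one entry fewer, and cloning it yields an iovec array that starts directly at the payload and still points into the original buffers.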
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
iov.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
iov.h | 3 +++
2 files changed, 55 insertions(+)
diff --git a/iov.c b/iov.c
index edf0444d1955..9d282d4af461 100644
--- a/iov.c
+++ b/iov.c
@@ -192,6 +192,21 @@ size_t iov_tail_size(struct iov_tail *tail)
return iov_size(tail->iov, tail->cnt) - tail->off;
}
+/**
+ * iov_tail_drop() - Discard a header from an IOV tail
+ * @tail: IO vector tail
+ * @len: length to move the head of the tail
+ *
+ * Return: true if the item still contains any bytes, otherwise false
+ */
+/* cppcheck-suppress unusedFunction */
+bool iov_tail_drop(struct iov_tail *tail, size_t len)
+{
+ tail->off = tail->off + len;
+
+ return iov_tail_prune(tail);
+}
+
/**
* iov_peek_header_() - Get pointer to a header from an IOV tail
* @tail: IOV tail to get header from
@@ -248,3 +263,40 @@ void *iov_remove_header_(struct iov_tail *tail, size_t len, size_t align)
tail->off = tail->off + len;
return p;
}
+
+/**
+ * iov_tail_clone() - Assign iov references referencing a subset of the data
+ * in an iov_tail
+ *
+ * @dst_iov: Pointer to the destination array of struct iovec describing
+ * the scatter/gather I/O vector to shallow copy to.
+ * @dst_iov_cnt: Maximum number of elements in the destination iov array.
+ * @tail: Pointer to the source iov_tail
+ *
+ * Return: the number of elements successfully referenced from the destination
+ * iov array, a negative value if there is not enough room in the
+ * destination iov array
+ */
+/* cppcheck-suppress unusedFunction */
+ssize_t iov_tail_clone(struct iovec *dst_iov, size_t dst_iov_cnt,
+ struct iov_tail *tail)
+{
+ const struct iovec *iov = &tail->iov[0];
+ size_t iov_cnt = tail->cnt;
+ size_t offset = tail->off;
+ unsigned int i, j;
+
+ i = iov_skip_bytes(iov, iov_cnt, offset, &offset);
+
+ /* assign iov references referencing a subset of the source one */
+ for (j = 0; i < iov_cnt && j < dst_iov_cnt; i++, j++) {
+ dst_iov[j].iov_base = (char *)iov[i].iov_base + offset;
+ dst_iov[j].iov_len = iov[i].iov_len - offset;
+ offset = 0;
+ }
+
+ if (j == dst_iov_cnt && i != iov_cnt)
+ return -1;
+
+ return j;
+}
diff --git a/iov.h b/iov.h
index 3fc96ab9755a..bf9820ac52ab 100644
--- a/iov.h
+++ b/iov.h
@@ -72,8 +72,11 @@ struct iov_tail {
bool iov_tail_prune(struct iov_tail *tail);
size_t iov_tail_size(struct iov_tail *tail);
+bool iov_tail_drop(struct iov_tail *tail, size_t len);
void *iov_peek_header_(struct iov_tail *tail, size_t len, size_t align);
void *iov_remove_header_(struct iov_tail *tail, size_t len, size_t align);
+ssize_t iov_tail_clone(struct iovec *dst_iov, size_t dst_iov_cnt,
+ struct iov_tail *tail);
/**
* IOV_PEEK_HEADER() - Get typed pointer to a header from an IOV tail
--
2.49.0
* [PATCH v8 03/30] iov: Update IOV_REMOVE_HEADER() and IOV_PEEK_HEADER()
2025-08-05 15:45 [PATCH v8 00/30] Introduce discontiguous frames management Laurent Vivier
2025-08-05 15:45 ` [PATCH v8 01/30] arp: Don't mix incoming and outgoing buffers Laurent Vivier
2025-08-05 15:46 ` [PATCH v8 02/30] iov: Introduce iov_tail_clone() and iov_tail_drop() Laurent Vivier
@ 2025-08-05 15:46 ` Laurent Vivier
2025-08-06 1:45 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 04/30] tap: Use iov_tail with tap_add_packet() Laurent Vivier
` (26 subsequent siblings)
29 siblings, 1 reply; 66+ messages in thread
From: Laurent Vivier @ 2025-08-05 15:46 UTC (permalink / raw)
To: passt-dev; +Cc: Laurent Vivier
Provide a temporary variable of the wanted type to store
the header if the memory in the iovec array is not contiguous.
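A rough sketch of this fallback, assuming a simplified interface: return a pointer straight into the iovec when the header sits in a single entry, otherwise gather the bytes into storage provided by the caller. `peek_header_sketch()` is a hypothetical name, and unlike the real helpers it ignores alignment, so it is only suitable as shown for byte-aligned headers.

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical helper: get @len bytes located @off bytes into @iov.
 * Returns a pointer into @iov if the bytes are contiguous, @storage if they
 * had to be gathered, NULL if @iov holds fewer than @off + @len bytes.
 */
static const void *peek_header_sketch(const struct iovec *iov, size_t cnt,
				      size_t off, void *storage, size_t len)
{
	size_t i, copied = 0;

	/* Find the entry containing the byte at @off */
	for (i = 0; i < cnt && off >= iov[i].iov_len; i++)
		off -= iov[i].iov_len;
	if (i == cnt)
		return NULL;

	/* Fast path: the header fits in one entry, no copy needed */
	if (iov[i].iov_len - off >= len)
		return (const char *)iov[i].iov_base + off;

	/* Slow path: gather the pieces into the caller's storage */
	for (; i < cnt && copied < len; i++) {
		size_t n = iov[i].iov_len - off;

		if (n > len - copied)
			n = len - copied;
		memcpy((char *)storage + copied,
		       (const char *)iov[i].iov_base + off, n);
		copied += n;
		off = 0;
	}

	return copied == len ? storage : NULL;
}
```

The caller cannot tell from the pointer alone whether it refers to the original buffer or to the copy, so a header obtained this way is best treated as read-only.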
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
iov.c | 55 +++++++++++++++++++++++++++++++++++++++++++++----------
iov.h | 55 +++++++++++++++++++++++++++++++++++++++++--------------
tcp_buf.c | 2 +-
3 files changed, 87 insertions(+), 25 deletions(-)
diff --git a/iov.c b/iov.c
index 9d282d4af461..d39bb099fa69 100644
--- a/iov.c
+++ b/iov.c
@@ -109,7 +109,7 @@ size_t iov_from_buf(const struct iovec *iov, size_t iov_cnt,
*
* Return: the number of bytes successfully copied.
*/
-/* cppcheck-suppress unusedFunction */
+/* cppcheck-suppress [staticFunction] */
size_t iov_to_buf(const struct iovec *iov, size_t iov_cnt,
size_t offset, void *buf, size_t bytes)
{
@@ -127,6 +127,7 @@ size_t iov_to_buf(const struct iovec *iov, size_t iov_cnt,
/* copying data */
for (copied = 0; copied < bytes && i < iov_cnt; i++) {
size_t len = MIN(iov[i].iov_len - offset, bytes - copied);
+ /* NOLINTNEXTLINE(clang-analyzer-core.NonNullParamChecker) */
memcpy((char *)buf + copied, (char *)iov[i].iov_base + offset,
len);
copied += len;
@@ -208,7 +209,7 @@ bool iov_tail_drop(struct iov_tail *tail, size_t len)
}
/**
- * iov_peek_header_() - Get pointer to a header from an IOV tail
+ * iov_check_header() - Check if a header can be accessed
* @tail: IOV tail to get header from
* @len: Length of header to get, in bytes
* @align: Required alignment of header, in bytes
@@ -219,8 +220,7 @@ bool iov_tail_drop(struct iov_tail *tail, size_t len)
* overruns the IO vector, is not contiguous or doesn't have the
* requested alignment.
*/
-/* cppcheck-suppress [staticFunction,unmatchedSuppression] */
-void *iov_peek_header_(struct iov_tail *tail, size_t len, size_t align)
+static void *iov_check_header(struct iov_tail *tail, size_t len, size_t align)
{
char *p;
@@ -240,27 +240,62 @@ void *iov_peek_header_(struct iov_tail *tail, size_t len, size_t align)
return p;
}
+/**
+ * iov_peek_header_() - Get pointer to a header from an IOV tail
+ * @tail: IOV tail to get header from
+ * @v: Temporary memory to use if the memory in @tail
+ * is discontinuous
+ * @len: Length of header to get, in bytes
+ * @align: Required alignment of header, in bytes
+ *
+ * @tail may be pruned, but will represent the same bytes as before.
+ *
+ * Return: pointer to the first @len logical bytes of the tail, or to
+ * a copy if that overruns the IO vector, is not contiguous or
+ * doesn't have the requested alignment. NULL if that overruns the
+ * IO vector.
+ */
+/* cppcheck-suppress [staticFunction,unmatchedSuppression] */
+void *iov_peek_header_(struct iov_tail *tail, void *v, size_t len, size_t align)
+{
+ char *p = iov_check_header(tail, len, align);
+ size_t l;
+
+ if (p)
+ return p;
+
+ l = iov_to_buf(tail->iov, tail->cnt, tail->off, v, len);
+ if (l != len)
+ return NULL;
+
+ return v;
+}
+
/**
* iov_remove_header_() - Remove a header from an IOV tail
* @tail: IOV tail to remove header from (modified)
+ * @v: Temporary memory to use if the memory in @tail
+ * is discontinuous
* @len: Length of header to remove, in bytes
* @align: Required alignment of header, in bytes
*
* On success, @tail is updated so that it longer includes the bytes of the
* returned header.
*
- * Return: pointer to the first @len logical bytes of the tail, NULL if that
- * overruns the IO vector, is not contiguous or doesn't have the
- * requested alignment.
+ * Return: pointer to the first @len logical bytes of the tail, or to
+ * a copy if that overruns the IO vector, is not contiguous or
+ * doesn't have the requested alignment. NULL if that overruns the
+ * IO vector.
*/
-void *iov_remove_header_(struct iov_tail *tail, size_t len, size_t align)
+void *iov_remove_header_(struct iov_tail *tail, void *v, size_t len, size_t align)
{
- char *p = iov_peek_header_(tail, len, align);
+ char *p = iov_peek_header_(tail, v, len, align);
if (!p)
return NULL;
tail->off = tail->off + len;
+
return p;
}
@@ -275,7 +310,7 @@ void *iov_remove_header_(struct iov_tail *tail, size_t len, size_t align)
*
* Return: the number of elements successfully referenced from the destination
* iov array, a negative value if there is not enough room in the
- * destination iov array
+ * destination iov array
*/
/* cppcheck-suppress unusedFunction */
ssize_t iov_tail_clone(struct iovec *dst_iov, size_t dst_iov_cnt,
diff --git a/iov.h b/iov.h
index bf9820ac52ab..ccdb690ef3f1 100644
--- a/iov.h
+++ b/iov.h
@@ -70,41 +70,68 @@ struct iov_tail {
#define IOV_TAIL(iov_, cnt_, off_) \
(struct iov_tail){ .iov = (iov_), .cnt = (cnt_), .off = (off_) }
+/**
+ * IOV_TAIL_FROM_BUF() - Create a new IOV tail from a buffer
+ * @buf_: Buffer address to use in the iovec
+ * @len_: Buffer size
+ * @off_: Byte offset in the buffer where the tail begins
+ */
+#define IOV_TAIL_FROM_BUF(buf_, len_, off_) \
+ IOV_TAIL((&(const struct iovec){ .iov_base = (buf_), \
+ .iov_len = (len_) }), \
+ 1, \
+ (off_))
+
bool iov_tail_prune(struct iov_tail *tail);
size_t iov_tail_size(struct iov_tail *tail);
bool iov_tail_drop(struct iov_tail *tail, size_t len);
-void *iov_peek_header_(struct iov_tail *tail, size_t len, size_t align);
-void *iov_remove_header_(struct iov_tail *tail, size_t len, size_t align);
+void *iov_peek_header_(struct iov_tail *tail, void *v, size_t len, size_t align);
+void *iov_remove_header_(struct iov_tail *tail, void *v, size_t len, size_t align);
ssize_t iov_tail_clone(struct iovec *dst_iov, size_t dst_iov_cnt,
struct iov_tail *tail);
/**
* IOV_PEEK_HEADER() - Get typed pointer to a header from an IOV tail
* @tail_: IOV tail to get header from
- * @type_: Data type of the header
+ * @var_: Temporary buffer of the type of the header to use if
+ * the memory in the iovec array is not contiguous.
*
* @tail_ may be pruned, but will represent the same bytes as before.
*
- * Return: pointer of type (@type_ *) located at the start of @tail_, NULL if
- * we can't get a contiguous and aligned pointer.
+ * Return: pointer of type (@type_ *) located at the start of @tail_
+ * or to @var_ if iovec memory is not contiguous, NULL if
+ * that overruns the iovec.
*/
-#define IOV_PEEK_HEADER(tail_, type_) \
- ((type_ *)(iov_peek_header_((tail_), \
- sizeof(type_), __alignof__(type_))))
+
+#define IOV_PEEK_HEADER(tail_, var_) \
+ ((__typeof__(var_) *)(iov_peek_header_((tail_), &(var_), \
+ sizeof(var_), \
+ __alignof__(var_))))
/**
* IOV_REMOVE_HEADER() - Remove and return typed header from an IOV tail
* @tail_: IOV tail to remove header from (modified)
- * @type_: Data type of the header to remove
+ * @var_: Temporary buffer of the type of the header to use if
+ * the memory in the iovec array is not contiguous.
*
* On success, @tail_ is updated so that it longer includes the bytes of the
* returned header.
*
- * Return: pointer of type (@type_ *) located at the old start of @tail_, NULL
- * if we can't get a contiguous and aligned pointer.
+ * Return: pointer of type (@type_ *) located at the start of @tail_
+ * or to @var_ if iovec memory is not contiguous, NULL if
+ * that overruns the iovec.
+ */
+
+#define IOV_REMOVE_HEADER(tail_, var_) \
+ ((__typeof__(var_) *)(iov_remove_header_((tail_), &(var_), \
+ sizeof(var_), __alignof__(var_))))
+
+/** IOV_DROP_HEADER() - Remove a typed header from an IOV tail
+ * @tail_: IOV tail to remove header from (modified)
+ * @type_: Data type of the header to remove
+ *
+ * Return: true if the tail still contains any bytes, otherwise false
*/
-#define IOV_REMOVE_HEADER(tail_, type_) \
- ((type_ *)(iov_remove_header_((tail_), \
- sizeof(type_), __alignof__(type_))))
+#define IOV_DROP_HEADER(tail_, type_) iov_tail_drop((tail_), sizeof(type_))
#endif /* IOVEC_H */
diff --git a/tcp_buf.c b/tcp_buf.c
index d1fca676c9a7..bc898de86919 100644
--- a/tcp_buf.c
+++ b/tcp_buf.c
@@ -160,7 +160,7 @@ static void tcp_l2_buf_fill_headers(const struct tcp_tap_conn *conn,
uint32_t seq, bool no_tcp_csum)
{
struct iov_tail tail = IOV_TAIL(&iov[TCP_IOV_PAYLOAD], 1, 0);
- struct tcphdr *th = IOV_REMOVE_HEADER(&tail, struct tcphdr);
+ struct tcphdr th_storage, *th = IOV_REMOVE_HEADER(&tail, th_storage);
struct tap_hdr *taph = iov[TCP_IOV_TAP].iov_base;
const struct flowside *tapside = TAPFLOW(conn);
const struct in_addr *a4 = inany_v4(&tapside->oaddr);
--
2.49.0
* [PATCH v8 04/30] tap: Use iov_tail with tap_add_packet()
2025-08-05 15:45 [PATCH v8 00/30] Introduce discontiguous frames management Laurent Vivier
` (2 preceding siblings ...)
2025-08-05 15:46 ` [PATCH v8 03/30] iov: Update IOV_REMOVE_HEADER() and IOV_PEEK_HEADER() Laurent Vivier
@ 2025-08-05 15:46 ` Laurent Vivier
2025-08-06 1:56 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 05/30] packet: Use iov_tail with packet_add() Laurent Vivier
` (25 subsequent siblings)
29 siblings, 1 reply; 66+ messages in thread
From: Laurent Vivier @ 2025-08-05 15:46 UTC (permalink / raw)
To: passt-dev; +Cc: Laurent Vivier, David Gibson
Use IOV_PEEK_HEADER() to get the Ethernet header from the iovec.
Move the workaround for multi-entry iovec arrays from vu_handle_tx() to
tap_add_packet(). Removing the offset from the iovec array should
reduce the iovec count to 1.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
iov.c | 1 -
pcap.c | 1 +
tap.c | 30 +++++++++++++++++++++---------
tap.h | 3 +--
vu_common.c | 26 +++++---------------------
5 files changed, 28 insertions(+), 33 deletions(-)
diff --git a/iov.c b/iov.c
index d39bb099fa69..97e4ea733540 100644
--- a/iov.c
+++ b/iov.c
@@ -200,7 +200,6 @@ size_t iov_tail_size(struct iov_tail *tail)
*
* Return: true if the item still contains any bytes, otherwise false
*/
-/* cppcheck-suppress unusedFunction */
bool iov_tail_drop(struct iov_tail *tail, size_t len)
{
tail->off = tail->off + len;
diff --git a/pcap.c b/pcap.c
index 46d11a2a6daa..03adc4c55f4b 100644
--- a/pcap.c
+++ b/pcap.c
@@ -74,6 +74,7 @@ static void pcap_frame(const struct iovec *iov, size_t iovcnt,
* @pkt: Pointer to data buffer, including L2 headers
* @l2len: L2 frame length
*/
+/* cppcheck-suppress unusedFunction */
void pcap(const char *pkt, size_t l2len)
{
struct iovec iov = { (char *)pkt, l2len };
diff --git a/tap.c b/tap.c
index 6db5d88b1760..c5520bf3bc76 100644
--- a/tap.c
+++ b/tap.c
@@ -1070,24 +1070,29 @@ void tap_handler(struct ctx *c, const struct timespec *now)
/**
* tap_add_packet() - Queue/capture packet, update notion of guest MAC address
* @c: Execution context
- * @l2len: Total L2 packet length
- * @p: Packet buffer
+ * @data: Packet to add to the pool
* @now: Current timestamp
*/
-void tap_add_packet(struct ctx *c, ssize_t l2len, char *p,
+void tap_add_packet(struct ctx *c, struct iov_tail *data,
const struct timespec *now)
{
+ struct ethhdr eh_storage;
const struct ethhdr *eh;
- pcap(p, l2len);
+ pcap_iov(data->iov, data->cnt, data->off);
- eh = (struct ethhdr *)p;
+ eh = IOV_PEEK_HEADER(data, eh_storage);
+ if (!eh)
+ return;
if (memcmp(c->guest_mac, eh->h_source, ETH_ALEN)) {
memcpy(c->guest_mac, eh->h_source, ETH_ALEN);
proto_update_l2_buf(c->guest_mac, NULL);
}
+ iov_tail_prune(data);
+ ASSERT(data->cnt == 1); /* packet_add() doesn't support iovec */
+
switch (ntohs(eh->h_proto)) {
case ETH_P_ARP:
case ETH_P_IP:
@@ -1095,14 +1100,16 @@ void tap_add_packet(struct ctx *c, ssize_t l2len, char *p,
tap4_handler(c, pool_tap4, now);
pool_flush(pool_tap4);
}
- packet_add(pool_tap4, l2len, p);
+ packet_add(pool_tap4, data->iov[0].iov_len - data->off,
+ (char *)data->iov[0].iov_base + data->off);
break;
case ETH_P_IPV6:
if (pool_full(pool_tap6)) {
tap6_handler(c, pool_tap6, now);
pool_flush(pool_tap6);
}
- packet_add(pool_tap6, l2len, p);
+ packet_add(pool_tap6, data->iov[0].iov_len - data->off,
+ (char *)data->iov[0].iov_base + data->off);
break;
default:
break;
@@ -1168,6 +1175,7 @@ static void tap_passt_input(struct ctx *c, const struct timespec *now)
while (n >= (ssize_t)sizeof(uint32_t)) {
uint32_t l2len = ntohl_unaligned(p);
+ struct iov_tail data;
if (l2len < sizeof(struct ethhdr) || l2len > L2_MAX_LEN_PASST) {
err("Bad frame size from guest, resetting connection");
@@ -1182,7 +1190,8 @@ static void tap_passt_input(struct ctx *c, const struct timespec *now)
p += sizeof(uint32_t);
n -= sizeof(uint32_t);
- tap_add_packet(c, l2len, p, now);
+ data = IOV_TAIL_FROM_BUF(p, l2len, 0);
+ tap_add_packet(c, &data, now);
p += l2len;
n -= l2len;
@@ -1226,6 +1235,8 @@ static void tap_pasta_input(struct ctx *c, const struct timespec *now)
for (n = 0;
n <= (ssize_t)(sizeof(pkt_buf) - L2_MAX_LEN_PASTA);
n += len) {
+ struct iov_tail data;
+
len = read(c->fd_tap, pkt_buf + n, L2_MAX_LEN_PASTA);
if (len == 0) {
@@ -1247,7 +1258,8 @@ static void tap_pasta_input(struct ctx *c, const struct timespec *now)
len > (ssize_t)L2_MAX_LEN_PASTA)
continue;
- tap_add_packet(c, len, pkt_buf + n, now);
+ data = IOV_TAIL_FROM_BUF(pkt_buf + n, len, 0);
+ tap_add_packet(c, &data, now);
}
tap_handler(c, now);
diff --git a/tap.h b/tap.h
index 936ae9371fd6..ce5510882d5d 100644
--- a/tap.h
+++ b/tap.h
@@ -119,7 +119,6 @@ void tap_sock_update_pool(void *base, size_t size);
void tap_backend_init(struct ctx *c);
void tap_flush_pools(void);
void tap_handler(struct ctx *c, const struct timespec *now);
-void tap_add_packet(struct ctx *c, ssize_t l2len, char *p,
+void tap_add_packet(struct ctx *c, struct iov_tail *data,
const struct timespec *now);
-
#endif /* TAP_H */
diff --git a/vu_common.c b/vu_common.c
index 5e6fd4a8261f..b77b21420c57 100644
--- a/vu_common.c
+++ b/vu_common.c
@@ -163,7 +163,6 @@ static void vu_handle_tx(struct vu_dev *vdev, int index,
struct vu_virtq_element elem[VIRTQUEUE_MAX_SIZE];
struct iovec out_sg[VIRTQUEUE_MAX_SIZE];
struct vu_virtq *vq = &vdev->vq[index];
- int hdrlen = sizeof(struct virtio_net_hdr_mrg_rxbuf);
int out_sg_count;
int count;
@@ -176,6 +175,7 @@ static void vu_handle_tx(struct vu_dev *vdev, int index,
while (count < VIRTQUEUE_MAX_SIZE &&
out_sg_count + VU_MAX_TX_BUFFER_NB <= VIRTQUEUE_MAX_SIZE) {
int ret;
+ struct iov_tail data;
elem[count].out_num = VU_MAX_TX_BUFFER_NB;
elem[count].out_sg = &out_sg[out_sg_count];
@@ -191,26 +191,10 @@ static void vu_handle_tx(struct vu_dev *vdev, int index,
warn("virtio-net transmit queue contains no out buffers");
break;
}
- if (elem[count].out_num == 1) {
- tap_add_packet(vdev->context,
- elem[count].out_sg[0].iov_len - hdrlen,
- (char *)elem[count].out_sg[0].iov_base +
- hdrlen, now);
- } else {
- /* vnet header can be in a separate iovec */
- if (elem[count].out_num != 2) {
- debug("virtio-net transmit queue contains more than one buffer ([%d]: %u)",
- count, elem[count].out_num);
- } else if (elem[count].out_sg[0].iov_len != (size_t)hdrlen) {
- debug("virtio-net transmit queue entry not aligned on hdrlen ([%d]: %d != %zu)",
- count, hdrlen, elem[count].out_sg[0].iov_len);
- } else {
- tap_add_packet(vdev->context,
- elem[count].out_sg[1].iov_len,
- (char *)elem[count].out_sg[1].iov_base,
- now);
- }
- }
+
+ data = IOV_TAIL(elem[count].out_sg, elem[count].out_num, 0);
+ if (IOV_DROP_HEADER(&data, struct virtio_net_hdr_mrg_rxbuf))
+ tap_add_packet(vdev->context, &data, now);
count++;
}
--
2.49.0
^ permalink raw reply related [flat|nested] 66+ messages in thread
* [PATCH v8 05/30] packet: Use iov_tail with packet_add()
2025-08-05 15:45 [PATCH v8 00/30] Introduce discontiguous frames management Laurent Vivier
` (3 preceding siblings ...)
2025-08-05 15:46 ` [PATCH v8 04/30] tap: Use iov_tail with tap_add_packet() Laurent Vivier
@ 2025-08-05 15:46 ` Laurent Vivier
2025-08-05 15:46 ` [PATCH v8 06/30] packet: Add packet_data() Laurent Vivier
` (24 subsequent siblings)
29 siblings, 0 replies; 66+ messages in thread
From: Laurent Vivier @ 2025-08-05 15:46 UTC (permalink / raw)
To: passt-dev; +Cc: Laurent Vivier, David Gibson
Modify the interface of packet_add_do() to take an iov_tail
rather than a memory pointer and a length.
Internally, it still supports only an iovec array that has a
single entry after pruning. An iovec array with several entries
is accepted only if the offset lets iov_tail_prune() reduce the
number of entries to 1.
tap4_handler() and tap6_handler() are updated to create an
iov_tail value using IOV_TAIL_FROM_BUF() from the buffer and
the length.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
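To illustrate the single-entry restriction described in the commit
message, here is a minimal sketch in C. It is not passt's code:
struct tail_sketch, tail_prune_sketch() and tail_single_buf() are
hypothetical stand-ins, and only the field names (iov, cnt, off) are
taken from the patch. The sketch returns NULL where packet_add_do()
would hit its ASSERT().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical stand-in for struct iov_tail: an iovec array plus a byte
 * offset counted from the start of the first entry. */
struct tail_sketch {
	const struct iovec *iov;
	size_t cnt;
	size_t off;
};

/* Skip the buffers wholly covered by the offset.
 * Return true if some data is left, false otherwise. */
static bool tail_prune_sketch(struct tail_sketch *t)
{
	size_t i = 0;

	while (i < t->cnt && t->off >= t->iov[i].iov_len) {
		t->off -= t->iov[i].iov_len;
		i++;
	}
	t->iov += i;
	t->cnt -= i;

	/* After pruning, the offset must fall inside the first buffer */
	assert(t->cnt == 0 || t->off < t->iov[0].iov_len);

	return t->cnt > 0;
}

/* Return the one contiguous range left after pruning and store its
 * length in @len, or return NULL if the tail is empty or still spans
 * more than one buffer. */
static const char *tail_single_buf(struct tail_sketch *t, size_t *len)
{
	if (!tail_prune_sketch(t) || t->cnt != 1)
		return NULL;

	*len = t->iov[0].iov_len - t->off;
	return (const char *)t->iov[0].iov_base + t->off;
}
```

With a two-entry array, an offset that lies past the first entry
leaves a single entry after pruning and is accepted. An offset inside
the first entry leaves two entries and is rejected.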
packet.c | 15 ++++++++++++---
packet.h | 7 ++++---
tap.c | 32 ++++++++++++++++++--------------
3 files changed, 34 insertions(+), 20 deletions(-)
diff --git a/packet.c b/packet.c
index 72c61580be1e..98ded4e27aae 100644
--- a/packet.c
+++ b/packet.c
@@ -87,15 +87,16 @@ bool pool_full(const struct pool *p)
/**
* packet_add_do() - Add data as packet descriptor to given pool
* @p: Existing pool
- * @len: Length of new descriptor
- * @start: Start of data
+ * @data: Data to add
* @func: For tracing: name of calling function
* @line: For tracing: caller line of function call
*/
-void packet_add_do(struct pool *p, size_t len, const char *start,
+void packet_add_do(struct pool *p, struct iov_tail *data,
const char *func, int line)
{
size_t idx = p->count;
+ const char *start;
+ size_t len;
if (pool_full(p)) {
debug("add packet index %zu to pool with size %zu, %s:%i",
@@ -103,6 +104,14 @@ void packet_add_do(struct pool *p, size_t len, const char *start,
return;
}
+ if (!iov_tail_prune(data))
+ return;
+
+ ASSERT(data->cnt == 1); /* we don't support iovec */
+
+ len = data->iov[0].iov_len - data->off;
+ start = (char *)data->iov[0].iov_base + data->off;
+
if (packet_check_range(p, start, len, func, line))
return;
diff --git a/packet.h b/packet.h
index c94780a5ea54..af40b39b5251 100644
--- a/packet.h
+++ b/packet.h
@@ -7,6 +7,7 @@
#define PACKET_H
#include <stdbool.h>
+#include "iov.h"
/* Maximum size of a single packet stored in pool, including headers */
#define PACKET_MAX_LEN ((size_t)UINT16_MAX)
@@ -30,7 +31,7 @@ struct pool {
};
int vu_packet_check_range(void *buf, const char *ptr, size_t len);
-void packet_add_do(struct pool *p, size_t len, const char *start,
+void packet_add_do(struct pool *p, struct iov_tail *data,
const char *func, int line);
void *packet_get_try_do(const struct pool *p, const size_t idx,
size_t offset, size_t len, size_t *left,
@@ -41,8 +42,8 @@ void *packet_get_do(const struct pool *p, const size_t idx,
bool pool_full(const struct pool *p);
void pool_flush(struct pool *p);
-#define packet_add(p, len, start) \
- packet_add_do(p, len, start, __func__, __LINE__)
+#define packet_add(p, data) \
+ packet_add_do(p, data, __func__, __LINE__)
#define packet_get_try(p, idx, offset, len, left) \
packet_get_try_do(p, idx, offset, len, left, __func__, __LINE__)
diff --git a/tap.c b/tap.c
index c5520bf3bc76..8d2b118152f1 100644
--- a/tap.c
+++ b/tap.c
@@ -709,6 +709,7 @@ resume:
size_t l2len, l3len, hlen, l4len;
const struct ethhdr *eh;
const struct udphdr *uh;
+ struct iov_tail data;
struct iphdr *iph;
const char *l4h;
@@ -720,7 +721,8 @@ resume:
if (ntohs(eh->h_proto) == ETH_P_ARP) {
PACKET_POOL_P(pkt, 1, in->buf, in->buf_size);
- packet_add(pkt, l2len, (char *)eh);
+ data = IOV_TAIL_FROM_BUF((void *)eh, l2len, 0);
+ packet_add(pkt, &data);
arp(c, pkt);
continue;
}
@@ -765,7 +767,8 @@ resume:
tap_packet_debug(iph, NULL, NULL, 0, NULL, 1);
- packet_add(pkt, l4len, l4h);
+ data = IOV_TAIL_FROM_BUF((void *)l4h, l4len, 0);
+ packet_add(pkt, &data);
icmp_tap_handler(c, PIF_TAP, AF_INET,
&iph->saddr, &iph->daddr,
pkt, now);
@@ -779,7 +782,8 @@ resume:
if (iph->protocol == IPPROTO_UDP) {
PACKET_POOL_P(pkt, 1, in->buf, in->buf_size);
- packet_add(pkt, l2len, (char *)eh);
+ data = IOV_TAIL_FROM_BUF((void *)eh, l2len, 0);
+ packet_add(pkt, &data);
if (dhcp(c, pkt))
continue;
}
@@ -830,7 +834,8 @@ resume:
#undef L4_SET
append:
- packet_add((struct pool *)&seq->p, l4len, l4h);
+ data = IOV_TAIL_FROM_BUF((void *)l4h, l4len, 0);
+ packet_add((struct pool *)&seq->p, &data);
}
for (j = 0, seq = tap4_l4; j < seq_count; j++, seq++) {
@@ -886,6 +891,7 @@ resume:
struct in6_addr *saddr, *daddr;
const struct ethhdr *eh;
const struct udphdr *uh;
+ struct iov_tail data;
struct ipv6hdr *ip6h;
uint8_t proto;
char *l4h;
@@ -939,7 +945,8 @@ resume:
if (l4len < sizeof(struct icmp6hdr))
continue;
- packet_add(pkt, l4len, l4h);
+ data = IOV_TAIL_FROM_BUF(l4h, l4len, 0);
+ packet_add(pkt, &data);
if (ndp(c, (struct icmp6hdr *)l4h, saddr, pkt))
continue;
@@ -958,7 +965,8 @@ resume:
if (proto == IPPROTO_UDP) {
PACKET_POOL_P(pkt, 1, in->buf, in->buf_size);
- packet_add(pkt, l4len, l4h);
+ data = IOV_TAIL_FROM_BUF(l4h, l4len, 0);
+ packet_add(pkt, &data);
if (dhcpv6(c, pkt, saddr, daddr))
continue;
@@ -1014,7 +1022,8 @@ resume:
#undef L4_SET
append:
- packet_add((struct pool *)&seq->p, l4len, l4h);
+ data = IOV_TAIL_FROM_BUF(l4h, l4len, 0);
+ packet_add((struct pool *)&seq->p, &data);
}
for (j = 0, seq = tap6_l4; j < seq_count; j++, seq++) {
@@ -1090,9 +1099,6 @@ void tap_add_packet(struct ctx *c, struct iov_tail *data,
proto_update_l2_buf(c->guest_mac, NULL);
}
- iov_tail_prune(data);
- ASSERT(data->cnt == 1); /* packet_add() doesn't support iovec */
-
switch (ntohs(eh->h_proto)) {
case ETH_P_ARP:
case ETH_P_IP:
@@ -1100,16 +1106,14 @@ void tap_add_packet(struct ctx *c, struct iov_tail *data,
tap4_handler(c, pool_tap4, now);
pool_flush(pool_tap4);
}
- packet_add(pool_tap4, data->iov[0].iov_len - data->off,
- (char *)data->iov[0].iov_base + data->off);
+ packet_add(pool_tap4, data);
break;
case ETH_P_IPV6:
if (pool_full(pool_tap6)) {
tap6_handler(c, pool_tap6, now);
pool_flush(pool_tap6);
}
- packet_add(pool_tap6, data->iov[0].iov_len - data->off,
- (char *)data->iov[0].iov_base + data->off);
+ packet_add(pool_tap6, data);
break;
default:
break;
--
2.49.0
^ permalink raw reply related [flat|nested] 66+ messages in thread
* [PATCH v8 06/30] packet: Add packet_data()
2025-08-05 15:45 [PATCH v8 00/30] Introduce discontiguous frames management Laurent Vivier
` (4 preceding siblings ...)
2025-08-05 15:46 ` [PATCH v8 05/30] packet: Use iov_tail with packet_add() Laurent Vivier
@ 2025-08-05 15:46 ` Laurent Vivier
2025-08-06 2:14 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 07/30] arp: Convert to iov_tail Laurent Vivier
` (23 subsequent siblings)
29 siblings, 1 reply; 66+ messages in thread
From: Laurent Vivier @ 2025-08-05 15:46 UTC (permalink / raw)
To: passt-dev; +Cc: Laurent Vivier
packet_data() gets the data range of a packet descriptor in a
given pool.
It returns the packet memory as an iov_tail.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
packet.c | 42 ++++++++++++++++++++++++++++++++++++++++++
packet.h | 5 +++++
2 files changed, 47 insertions(+)
diff --git a/packet.c b/packet.c
index 98ded4e27aae..82adc9fd1a39 100644
--- a/packet.c
+++ b/packet.c
@@ -190,6 +190,48 @@ void *packet_get_do(const struct pool *p, const size_t idx,
return r;
}
+/**
+ * packet_data_do() - Get data range from packet descriptor from given pool
+ * @p: Packet pool
+ * @idx: Index of packet descriptor in pool
+ * @data: IOV tail to store the address of the data (output)
+ * @func: For tracing: name of calling function, NULL means no trace()
+ * @line: For tracing: caller line of function call
+ *
+ * Return: false if packet index is invalid, true otherwise.
+ * If something wrong with @data, don't return at all (assert).
+ */
+/* cppcheck-suppress unusedFunction */
+bool packet_data_do(const struct pool *p, size_t idx,
+ struct iov_tail *data,
+ const char *func, int line)
+{
+ size_t i;
+
+ ASSERT_WITH_MSG(p->count <= p->size,
+ "Corrupted pool count: %zu, size: %zu, %s:%i",
+ p->count, p->size, func, line);
+
+ if (idx >= p->count) {
+ debug("packet %zu from pool size: %zu, count: %zu, "
+ "%s:%i", idx, p->size, p->count, func, line);
+ return false;
+ }
+
+ data->cnt = 1;
+ data->off = 0;
+ data->iov = &p->pkt[idx];
+
+ for (i = 0; i < data->cnt; i++) {
+ ASSERT_WITH_MSG(!packet_check_range(p, data->iov[i].iov_base,
+ data->iov[i].iov_len,
+ func, line),
+ "Corrupt packet pool, %s:%i", func, line);
+ }
+
+ return true;
+}
+
/**
* pool_flush() - Flush a packet pool
* @p: Pointer to packet pool
diff --git a/packet.h b/packet.h
index af40b39b5251..062afb978124 100644
--- a/packet.h
+++ b/packet.h
@@ -39,6 +39,9 @@ void *packet_get_try_do(const struct pool *p, const size_t idx,
void *packet_get_do(const struct pool *p, const size_t idx,
size_t offset, size_t len, size_t *left,
const char *func, int line);
+bool packet_data_do(const struct pool *p, const size_t idx,
+ struct iov_tail *data,
+ const char *func, int line);
bool pool_full(const struct pool *p);
void pool_flush(struct pool *p);
@@ -49,6 +52,8 @@ void pool_flush(struct pool *p);
packet_get_try_do(p, idx, offset, len, left, __func__, __LINE__)
#define packet_get(p, idx, offset, len, left) \
packet_get_do(p, idx, offset, len, left, __func__, __LINE__)
+#define packet_data(p, idx, data) \
+ packet_data_do(p, idx, data, __func__, __LINE__)
#define PACKET_POOL_DECL(_name, _size, _buf) \
struct _name ## _t { \
--
2.49.0
^ permalink raw reply related [flat|nested] 66+ messages in thread
* [PATCH v8 07/30] arp: Convert to iov_tail
2025-08-05 15:45 [PATCH v8 00/30] Introduce discontiguous frames management Laurent Vivier
` (5 preceding siblings ...)
2025-08-05 15:46 ` [PATCH v8 06/30] packet: Add packet_data() Laurent Vivier
@ 2025-08-05 15:46 ` Laurent Vivier
2025-08-06 2:17 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 08/30] ndp: " Laurent Vivier
` (22 subsequent siblings)
29 siblings, 1 reply; 66+ messages in thread
From: Laurent Vivier @ 2025-08-05 15:46 UTC (permalink / raw)
To: passt-dev; +Cc: Laurent Vivier, David Gibson
Use packet_data() and extract headers using IOV_REMOVE_HEADER()
rather than packet_get().
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
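To show why each header gets a *_storage variable, here is a minimal
sketch in C of header extraction from a possibly discontiguous frame.
It rests on an assumption about the semantics: the header is returned
in place when it is contiguous and copied into the caller's storage
when it straddles buffers. struct tail_sketch and
remove_header_sketch() are hypothetical names, not passt's
IOV_REMOVE_HEADER(), and alignment handling is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical stand-in for struct iov_tail */
struct tail_sketch {
	const struct iovec *iov;
	size_t cnt;
	size_t off;	/* bytes already consumed, from start of iov[0] */
};

/* Consume @len bytes from the front of @t.
 * Return a pointer into the frame if the bytes are contiguous, @storage
 * if they had to be gathered, NULL (tail unchanged) if too few remain. */
static void *remove_header_sketch(struct tail_sketch *t, void *storage,
				  size_t len)
{
	size_t off = t->off, copied = 0, i;
	char *dst = storage;

	assert(storage);

	/* Fast path: the header lies entirely within the first buffer */
	if (t->cnt && off <= t->iov[0].iov_len &&
	    t->iov[0].iov_len - off >= len) {
		t->off += len;
		return (char *)t->iov[0].iov_base + off;
	}

	/* Slow path: gather the header from consecutive buffers */
	for (i = 0; i < t->cnt && copied < len; i++) {
		size_t avail, n;

		if (off >= t->iov[i].iov_len) {
			off -= t->iov[i].iov_len;
			continue;
		}

		avail = t->iov[i].iov_len - off;
		n = avail < len - copied ? avail : len - copied;
		memcpy(dst + copied, (char *)t->iov[i].iov_base + off, n);
		copied += n;
		off = 0;
	}

	if (copied < len)
		return NULL;

	t->off += len;
	return storage;
}
```

A caller in the style of arp() would chain such calls and check every
returned pointer for NULL before using it.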
arp.c | 12 +++++++++---
packet.c | 1 -
2 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/arp.c b/arp.c
index 9f1fedeafec0..b3ac42082841 100644
--- a/arp.c
+++ b/arp.c
@@ -74,14 +74,20 @@ int arp(const struct ctx *c, const struct pool *p)
struct arphdr ah;
struct arpmsg am;
} __attribute__((__packed__)) resp;
+ struct arphdr ah_storage;
+ struct ethhdr eh_storage;
+ struct arpmsg am_storage;
const struct ethhdr *eh;
const struct arphdr *ah;
const struct arpmsg *am;
+ struct iov_tail data;
- eh = packet_get(p, 0, 0, sizeof(*eh), NULL);
- ah = packet_get(p, 0, sizeof(*eh), sizeof(*ah), NULL);
- am = packet_get(p, 0, sizeof(*eh) + sizeof(*ah), sizeof(*am), NULL);
+ if (!packet_data(p, 0, &data))
+ return -1;
+ eh = IOV_REMOVE_HEADER(&data, eh_storage);
+ ah = IOV_REMOVE_HEADER(&data, ah_storage);
+ am = IOV_REMOVE_HEADER(&data, am_storage);
if (!eh || !ah || !am)
return -1;
diff --git a/packet.c b/packet.c
index 82adc9fd1a39..34b1722b9a03 100644
--- a/packet.c
+++ b/packet.c
@@ -201,7 +201,6 @@ void *packet_get_do(const struct pool *p, const size_t idx,
* Return: false if packet index is invalid, true otherwise.
* If something wrong with @data, don't return at all (assert).
*/
-/* cppcheck-suppress unusedFunction */
bool packet_data_do(const struct pool *p, size_t idx,
struct iov_tail *data,
const char *func, int line)
--
2.49.0
^ permalink raw reply related [flat|nested] 66+ messages in thread
* [PATCH v8 08/30] ndp: Convert to iov_tail
2025-08-05 15:45 [PATCH v8 00/30] Introduce discontiguous frames management Laurent Vivier
` (6 preceding siblings ...)
2025-08-05 15:46 ` [PATCH v8 07/30] arp: Convert to iov_tail Laurent Vivier
@ 2025-08-05 15:46 ` Laurent Vivier
2025-08-05 15:46 ` [PATCH v8 09/30] icmp: " Laurent Vivier
` (21 subsequent siblings)
29 siblings, 0 replies; 66+ messages in thread
From: Laurent Vivier @ 2025-08-05 15:46 UTC (permalink / raw)
To: passt-dev; +Cc: Laurent Vivier, David Gibson
Use packet_data() and extract headers using IOV_REMOVE_HEADER()
rather than packet_get().
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
ndp.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/ndp.c b/ndp.c
index 3e1549456839..5de4e508dc52 100644
--- a/ndp.c
+++ b/ndp.c
@@ -350,9 +350,14 @@ int ndp(const struct ctx *c, const struct icmp6hdr *ih,
return 1;
if (ih->icmp6_type == NS) {
+ struct ndp_ns ns_storage;
const struct ndp_ns *ns;
+ struct iov_tail data;
- ns = packet_get(p, 0, 0, sizeof(struct ndp_ns), NULL);
+ if (!packet_data(p, 0, &data))
+ return -1;
+
+ ns = IOV_REMOVE_HEADER(&data, ns_storage);
if (!ns)
return -1;
--
2.49.0
^ permalink raw reply related [flat|nested] 66+ messages in thread
* [PATCH v8 09/30] icmp: Convert to iov_tail
2025-08-05 15:45 [PATCH v8 00/30] Introduce discontiguous frames management Laurent Vivier
` (7 preceding siblings ...)
2025-08-05 15:46 ` [PATCH v8 08/30] ndp: " Laurent Vivier
@ 2025-08-05 15:46 ` Laurent Vivier
2025-08-06 2:20 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 10/30] udp: " Laurent Vivier
` (20 subsequent siblings)
29 siblings, 1 reply; 66+ messages in thread
From: Laurent Vivier @ 2025-08-05 15:46 UTC (permalink / raw)
To: passt-dev; +Cc: Laurent Vivier
Use packet_data() and extract headers using IOV_PEEK_HEADER()
rather than packet_get().
Introduce iov_tail_msghdr() to convert an iov_tail to a msghdr,
and relay the request with sendmsg() rather than sendto().
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
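To illustrate the sendto() to sendmsg() switch, here is a minimal
sketch in C that sends two discontiguous buffers as one datagram.
msghdr_from_iov() and send_gathered() are hypothetical helpers, not
passt's iov_tail_msghdr(). The sketch uses an AF_UNIX socket pair as
a stand-in for the ping socket and assumes Linux for MSG_NOSIGNAL.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: fill @msh so that sendmsg() transmits the @cnt
 * buffers of @iov as a single message to @name (may be NULL). */
static void msghdr_from_iov(struct msghdr *msh, struct iovec *iov, size_t cnt,
			    void *name, socklen_t namelen)
{
	assert(msh && (iov || !cnt));

	memset(msh, 0, sizeof(*msh));
	msh->msg_name = name;
	msh->msg_namelen = namelen;
	msh->msg_iov = iov;
	msh->msg_iovlen = cnt;
}

/* Send two separate buffers as one datagram over a socket pair.
 * Return the number of bytes received on the other end, -1 on error. */
static ssize_t send_gathered(char *out, size_t outlen)
{
	char a[] = "ping", b[] = "-data";
	struct iovec iov[2] = {
		{ .iov_base = a, .iov_len = sizeof(a) - 1 },
		{ .iov_base = b, .iov_len = sizeof(b) - 1 },
	};
	struct msghdr msh;
	ssize_t n = -1;
	int sv[2];

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return -1;

	msghdr_from_iov(&msh, iov, 2, NULL, 0);
	if (sendmsg(sv[0], &msh, MSG_NOSIGNAL) >= 0)
		n = recv(sv[1], out, outlen, 0);

	close(sv[0]);
	close(sv[1]);

	return n;
}
```

The receiver sees one message made of both buffers, so the sender
never has to copy the frame into a contiguous area first.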
icmp.c | 25 ++++++++++++++-----------
iov.c | 23 +++++++++++++++++++++++
iov.h | 2 ++
3 files changed, 39 insertions(+), 11 deletions(-)
diff --git a/icmp.c b/icmp.c
index 95f38c1e2a3a..fdfc857b5ae8 100644
--- a/icmp.c
+++ b/icmp.c
@@ -241,25 +241,27 @@ int icmp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
struct icmp_ping_flow *pingf;
const struct flowside *tgt;
union sockaddr_inany sa;
- size_t dlen, l4len;
+ struct iov_tail data;
+ struct msghdr msh;
uint16_t id, seq;
union flow *flow;
uint8_t proto;
socklen_t sl;
- void *pkt;
(void)saddr;
ASSERT(pif == PIF_TAP);
+ if (!packet_data(p, 0, &data))
+ return -1;
+
if (af == AF_INET) {
+ struct icmphdr ih_storage;
const struct icmphdr *ih;
- if (!(pkt = packet_get(p, 0, 0, sizeof(*ih), &dlen)))
+ ih = IOV_PEEK_HEADER(&data, ih_storage);
+ if (!ih)
return 1;
- ih = (struct icmphdr *)pkt;
- l4len = dlen + sizeof(*ih);
-
if (ih->type != ICMP_ECHO)
return 1;
@@ -267,14 +269,13 @@ int icmp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
id = ntohs(ih->un.echo.id);
seq = ntohs(ih->un.echo.sequence);
} else if (af == AF_INET6) {
+ struct icmp6hdr ih_storage;
const struct icmp6hdr *ih;
- if (!(pkt = packet_get(p, 0, 0, sizeof(*ih), &dlen)))
+ ih = IOV_PEEK_HEADER(&data, ih_storage);
+ if (!ih)
return 1;
- ih = (struct icmp6hdr *)pkt;
- l4len = dlen + sizeof(*ih);
-
if (ih->icmp6_type != ICMPV6_ECHO_REQUEST)
return 1;
@@ -298,8 +299,10 @@ int icmp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
ASSERT(flow_proto[pingf->f.type] == proto);
pingf->ts = now->tv_sec;
+
pif_sockaddr(c, &sa, &sl, PIF_HOST, &tgt->eaddr, 0);
- if (sendto(pingf->sock, pkt, l4len, MSG_NOSIGNAL, &sa.sa, sl) < 0) {
+ iov_tail_msghdr(&msh, &data, &sa, sl);
+ if (sendmsg(pingf->sock, &msh, MSG_NOSIGNAL) < 0) {
flow_dbg_perror(pingf, "failed to relay request to socket");
} else {
flow_dbg(pingf,
diff --git a/iov.c b/iov.c
index 97e4ea733540..9d99beb32532 100644
--- a/iov.c
+++ b/iov.c
@@ -158,6 +158,29 @@ size_t iov_size(const struct iovec *iov, size_t iov_cnt)
return len;
}
+/**
+ * iov_tail_msghdr - Initialize a msghdr from an IOV tail structure
+ * @msh: msghdr to initialize
+ * @tail: iov_tail to use to set msg_iov and msg_iovlen
+ * @msg_name: Pointer to set to msg_name
+ * @msg_namelen: Size of @msg_name
+ */
+void iov_tail_msghdr(struct msghdr *msh, struct iov_tail *tail,
+ void *msg_name, socklen_t msg_namelen)
+{
+ iov_tail_prune(tail);
+
+ ASSERT(tail->off == 0);
+
+ msh->msg_name = msg_name;
+ msh->msg_namelen = msg_namelen;
+ msh->msg_iov = (struct iovec *)tail->iov;
+ msh->msg_iovlen = tail->cnt;
+ msh->msg_control = NULL;
+ msh->msg_controllen = 0;
+ msh->msg_flags = 0;
+}
+
/**
* iov_tail_prune() - Remove any unneeded buffers from an IOV tail
* @tail: IO vector tail (modified)
diff --git a/iov.h b/iov.h
index ccdb690ef3f1..75c3b07a87e3 100644
--- a/iov.h
+++ b/iov.h
@@ -82,6 +82,8 @@ struct iov_tail {
1, \
(off_))
+void iov_tail_msghdr(struct msghdr *msh, struct iov_tail *tail,
+ void *msg_name, socklen_t msg_namelen);
bool iov_tail_prune(struct iov_tail *tail);
size_t iov_tail_size(struct iov_tail *tail);
bool iov_tail_drop(struct iov_tail *tail, size_t len);
--
2.49.0
^ permalink raw reply related [flat|nested] 66+ messages in thread
* [PATCH v8 10/30] udp: Convert to iov_tail
2025-08-05 15:45 [PATCH v8 00/30] Introduce discontiguous frames management Laurent Vivier
` (8 preceding siblings ...)
2025-08-05 15:46 ` [PATCH v8 09/30] icmp: " Laurent Vivier
@ 2025-08-05 15:46 ` Laurent Vivier
2025-08-06 2:23 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 11/30] tcp: Convert tcp_tap_handler() to use iov_tail Laurent Vivier
` (19 subsequent siblings)
29 siblings, 1 reply; 66+ messages in thread
From: Laurent Vivier @ 2025-08-05 15:46 UTC (permalink / raw)
To: passt-dev; +Cc: Laurent Vivier
Use packet_data() and extract headers using IOV_REMOVE_HEADER()
and IOV_PEEK_HEADER() rather than packet_get().
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
iov.c | 1 -
udp.c | 33 ++++++++++++++++++++++-----------
2 files changed, 22 insertions(+), 12 deletions(-)
diff --git a/iov.c b/iov.c
index 9d99beb32532..f519eb3cfeaf 100644
--- a/iov.c
+++ b/iov.c
@@ -334,7 +334,6 @@ void *iov_remove_header_(struct iov_tail *tail, void *v, size_t len, size_t alig
* iov array, a negative value if there is not enough room in the
* destination iov array
*/
-/* cppcheck-suppress unusedFunction */
ssize_t iov_tail_clone(struct iovec *dst_iov, size_t dst_iov_cnt,
struct iov_tail *tail)
{
diff --git a/udp.c b/udp.c
index 75edc2054d4a..3c25f2e0ae97 100644
--- a/udp.c
+++ b/udp.c
@@ -978,9 +978,11 @@ int udp_tap_handler(const struct ctx *c, uint8_t pif,
struct mmsghdr mm[UIO_MAXIOV];
union sockaddr_inany to_sa;
struct iovec m[UIO_MAXIOV];
+ struct udphdr uh_storage;
const struct udphdr *uh;
struct udp_flow *uflow;
- int i, s, count = 0;
+ int i, j, s, count = 0;
+ struct iov_tail data;
flow_sidx_t tosidx;
in_port_t src, dst;
uint8_t topif;
@@ -988,7 +990,10 @@ int udp_tap_handler(const struct ctx *c, uint8_t pif,
ASSERT(!c->no_udp);
- uh = packet_get(p, idx, 0, sizeof(*uh), NULL);
+ if (!packet_data(p, idx, &data))
+ return 1;
+
+ uh = IOV_PEEK_HEADER(&data, uh_storage);
if (!uh)
return 1;
@@ -1025,23 +1030,29 @@ int udp_tap_handler(const struct ctx *c, uint8_t pif,
pif_sockaddr(c, &to_sa, &sl, topif, &toside->eaddr, toside->eport);
- for (i = 0; i < (int)p->count - idx; i++) {
- struct udphdr *uh_send;
- size_t len;
+ for (i = 0, j = 0; i < (int)p->count - idx && j < UIO_MAXIOV; i++) {
+ const struct udphdr *uh_send;
- uh_send = packet_get(p, idx + i, 0, sizeof(*uh), &len);
+ if (!packet_data(p, idx + i, &data))
+ return p->count - idx;
+
+ uh_send = IOV_REMOVE_HEADER(&data, uh_storage);
if (!uh_send)
return p->count - idx;
mm[i].msg_hdr.msg_name = &to_sa;
mm[i].msg_hdr.msg_namelen = sl;
- if (len) {
- m[i].iov_base = (char *)(uh_send + 1);
- m[i].iov_len = len;
+ if (data.cnt) {
+ int cnt;
+
+ cnt = iov_tail_clone(&m[j], UIO_MAXIOV - j, &data);
+ if (cnt < 0)
+ return p->count - idx;
- mm[i].msg_hdr.msg_iov = m + i;
- mm[i].msg_hdr.msg_iovlen = 1;
+ mm[i].msg_hdr.msg_iov = &m[j];
+ mm[i].msg_hdr.msg_iovlen = cnt;
+ j += cnt;
} else {
mm[i].msg_hdr.msg_iov = NULL;
mm[i].msg_hdr.msg_iovlen = 0;
--
2.49.0
^ permalink raw reply related [flat|nested] 66+ messages in thread
* [PATCH v8 11/30] tcp: Convert tcp_tap_handler() to use iov_tail
2025-08-05 15:45 [PATCH v8 00/30] Introduce discontiguous frames management Laurent Vivier
` (9 preceding siblings ...)
2025-08-05 15:46 ` [PATCH v8 10/30] udp: " Laurent Vivier
@ 2025-08-05 15:46 ` Laurent Vivier
2025-08-06 2:35 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 12/30] tcp: Convert tcp_data_from_tap() " Laurent Vivier
` (18 subsequent siblings)
29 siblings, 1 reply; 66+ messages in thread
From: Laurent Vivier @ 2025-08-05 15:46 UTC (permalink / raw)
To: passt-dev; +Cc: Laurent Vivier
Use packet_data() and extract headers using IOV_REMOVE_HEADER()
and iov_remove_header_() rather than packet_get().
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
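The bound that this patch names OPTLEN_MAX follows from the width of
the TCP data offset field. As a worked example, here is a minimal
sketch in C of that arithmetic. DOFF_BITS, DOFF_MIN,
OPTLEN_MAX_SKETCH and tcp_optlen_sketch() are hypothetical names.
Returning 0 for a data offset below 5 is a choice made for this
sketch only, not the behaviour of tcp_tap_handler().

```c
#include <assert.h>
#include <stddef.h>

#define DOFF_BITS	4	/* width of the TCP data offset field */
#define DOFF_MIN	5	/* 32-bit words in a header without options */

/* (15 - 5) words of 4 bytes each: 40 bytes of options at most */
#define OPTLEN_MAX_SKETCH (((1UL << DOFF_BITS) - 1 - DOFF_MIN) * 4UL)

/* Return the number of option bytes implied by data offset @doff,
 * clamped to OPTLEN_MAX_SKETCH, or 0 if @doff is below the minimum. */
static size_t tcp_optlen_sketch(unsigned int doff)
{
	size_t optlen;

	assert(doff < (1U << DOFF_BITS));

	if (doff < DOFF_MIN)
		return 0;

	optlen = (doff - DOFF_MIN) * 4UL;

	return optlen < OPTLEN_MAX_SKETCH ? optlen : OPTLEN_MAX_SKETCH;
}
```

A buffer of OPTLEN_MAX_SKETCH bytes is therefore always large enough
to hold the options of any header that the data offset can describe.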
tcp.c | 31 ++++++++++++++++++++++++-------
1 file changed, 24 insertions(+), 7 deletions(-)
diff --git a/tcp.c b/tcp.c
index 957b498db32d..f1048d7230c9 100644
--- a/tcp.c
+++ b/tcp.c
@@ -310,6 +310,16 @@
#include "tcp_buf.h"
#include "tcp_vu.h"
+/*
+ * The size of TCP header (including options) is given by doff (Data Offset)
+ * that is a 4-bit value specifying the number of 32-bit words in the header.
+ * The maximum value of doff is 15 [(1 << 4) - 1].
+ * The maximum length in bytes of options is 15 minus the number of 32-bit
+ * words in the minimal TCP header (5) multiplied by the length of a 32-bit
+ * word (4).
+ */
+#define OPTLEN_MAX (((1UL << 4) - 1 - 5) * 4UL)
+
#ifndef __USE_MISC
/* From Linux UAPI, missing in netinet/tcp.h provided by musl */
struct tcp_repair_opt {
@@ -1957,8 +1967,11 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
const struct pool *p, int idx, const struct timespec *now)
{
struct tcp_tap_conn *conn;
+ struct tcphdr th_storage;
const struct tcphdr *th;
- size_t optlen, len;
+ char optsc[OPTLEN_MAX];
+ struct iov_tail data;
+ size_t optlen, l4len;
const char *opts;
union flow *flow;
flow_sidx_t sidx;
@@ -1967,15 +1980,19 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
(void)pif;
- th = packet_get(p, idx, 0, sizeof(*th), &len);
+ if (!packet_data(p, idx, &data))
+ return 1;
+
+ l4len = iov_tail_size(&data);
+
+ th = IOV_REMOVE_HEADER(&data, th_storage);
if (!th)
return 1;
- len += sizeof(*th);
optlen = th->doff * 4UL - sizeof(*th);
/* Static checkers might fail to see this: */
- optlen = MIN(optlen, ((1UL << 4) /* from doff width */ - 6) * 4UL);
- opts = packet_get(p, idx, sizeof(*th), optlen, NULL);
+ optlen = MIN(optlen, OPTLEN_MAX);
+ opts = (char *)iov_remove_header_(&data, &optsc[0], optlen, 1);
sidx = flow_lookup_af(c, IPPROTO_TCP, PIF_TAP, af, saddr, daddr,
ntohs(th->source), ntohs(th->dest));
@@ -1987,7 +2004,7 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
tcp_conn_from_tap(c, af, saddr, daddr, th,
opts, optlen, now);
else
- tcp_rst_no_conn(c, af, saddr, daddr, flow_lbl, th, len);
+ tcp_rst_no_conn(c, af, saddr, daddr, flow_lbl, th, l4len);
return 1;
}
@@ -1995,7 +2012,7 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
ASSERT(pif_at_sidx(sidx) == PIF_TAP);
conn = &flow->tcp;
- flow_trace(conn, "packet length %zu from tap", len);
+ flow_trace(conn, "packet length %zu from tap", l4len);
if (th->rst) {
conn_event(c, conn, CLOSED);
--
2.49.0
* [PATCH v8 12/30] tcp: Convert tcp_data_from_tap() to use iov_tail
2025-08-05 15:45 [PATCH v8 00/30] Introduce discontiguous frames management Laurent Vivier
` (10 preceding siblings ...)
2025-08-05 15:46 ` [PATCH v8 11/30] tcp: Convert tcp_tap_handler() to use iov_tail Laurent Vivier
@ 2025-08-05 15:46 ` Laurent Vivier
2025-08-06 2:37 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 13/30] dhcpv6: move offset initialization out of dhcpv6_opt() Laurent Vivier
` (17 subsequent siblings)
29 siblings, 1 reply; 66+ messages in thread
From: Laurent Vivier @ 2025-08-05 15:46 UTC (permalink / raw)
To: passt-dev; +Cc: Laurent Vivier
Use packet_data() and extract headers using IOV_PEEK_HEADER()
rather than packet_get().
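A minimal sketch of the peek-then-drop pattern this conversion relies on, assuming a simplified stand-in for the tail type. The names sketch_tail, sketch_size(), sketch_peek(), sketch_drop() and sketch_payload_len() are hypothetical and are not passt's API; the real IOV_PEEK_HEADER() and iov_tail_drop() may behave differently (this sketch, for instance, always copies the header into caller-provided storage).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical stand-in for an iovec "tail": an iovec array plus a byte
 * offset marking where the unconsumed data starts. */
struct sketch_tail {
	const struct iovec *iov;
	size_t cnt;
	size_t off;
};

/* Bytes left between the current offset and the end of the array */
static size_t sketch_size(const struct sketch_tail *t)
{
	size_t total = 0, i;

	for (i = 0; i < t->cnt; i++)
		total += t->iov[i].iov_len;

	return total > t->off ? total - t->off : 0;
}

/* Copy @len bytes starting at the current offset into @buf, without
 * consuming them. Return @buf, or NULL if fewer than @len bytes remain. */
static const void *sketch_peek(const struct sketch_tail *t, void *buf,
			       size_t len)
{
	size_t skip = t->off, done = 0, i;

	if (sketch_size(t) < len)
		return NULL;

	for (i = 0; i < t->cnt && done < len; i++) {
		const char *base = t->iov[i].iov_base;
		size_t n = t->iov[i].iov_len;

		if (skip >= n) {	/* entry lies entirely before offset */
			skip -= n;
			continue;
		}
		n -= skip;
		if (n > len - done)
			n = len - done;
		memcpy((char *)buf + done, base + skip, n);
		done += n;
		skip = 0;
	}
	return buf;
}

/* Consume up to @len bytes; return non-zero if that many were available */
static int sketch_drop(struct sketch_tail *t, size_t len)
{
	size_t left = sketch_size(t);

	t->off += len < left ? len : left;
	return len <= left;
}

/* Peek at a TCP header, validate the data offset, then drop the header
 * (including options). Return the payload length, or -1 if malformed. */
static long sketch_payload_len(struct sketch_tail *t)
{
	unsigned char hdr[20];	/* minimal TCP header */
	size_t len, off;

	if (!sketch_peek(t, hdr, sizeof(hdr)))
		return -1;

	len = sketch_size(t);
	off = (size_t)(hdr[12] >> 4) * 4;	/* doff, in 32-bit words */
	if (off < sizeof(hdr) || off > len)
		return -1;

	sketch_drop(t, off);
	return (long)sketch_size(t);
}
```

Because the header is copied out before anything is consumed, the same code path works whether the header sits in one iovec entry or straddles several, which is the case the old single-pointer packet_get() result could not represent.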
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
tcp.c | 30 +++++++++++++++++++-----------
1 file changed, 19 insertions(+), 11 deletions(-)
diff --git a/tcp.c b/tcp.c
index f1048d7230c9..e0efc4cacb9b 100644
--- a/tcp.c
+++ b/tcp.c
@@ -1651,16 +1651,22 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn,
for (i = idx, iov_i = 0; i < (int)p->count; i++) {
uint32_t seq, seq_offset, ack_seq;
+ struct tcphdr th_storage;
const struct tcphdr *th;
- char *data;
- size_t off;
+ struct iov_tail data;
+ size_t off, size;
+ int count;
- th = packet_get(p, i, 0, sizeof(*th), &len);
+ if (!packet_data(p, i, &data))
+ return -1;
+
+ th = IOV_PEEK_HEADER(&data, th_storage);
if (!th)
return -1;
- len += sizeof(*th);
+ len = iov_tail_size(&data);
off = th->doff * 4UL;
+
if (off < sizeof(*th) || off > len)
return -1;
@@ -1670,9 +1676,7 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn,
}
len -= off;
- data = packet_get(p, i, off, len, NULL);
- if (!data)
- continue;
+ iov_tail_drop(&data, off);
seq = ntohl(th->seq);
if (SEQ_LT(seq, conn->seq_from_tap) && len <= 1) {
@@ -1746,10 +1750,14 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn,
continue;
}
- tcp_iov[iov_i].iov_base = data + seq_offset;
- tcp_iov[iov_i].iov_len = len - seq_offset;
- seq_from_tap += tcp_iov[iov_i].iov_len;
- iov_i++;
+ iov_tail_drop(&data, seq_offset);
+ size = len - seq_offset;
+ count = iov_tail_clone(&tcp_iov[iov_i], UIO_MAXIOV - iov_i,
+ &data);
+ if (count < 0)
+ break;
+ seq_from_tap += size;
+ iov_i += count;
if (keep == i)
keep = -1;
--
2.49.0
* [PATCH v8 13/30] dhcpv6: move offset initialization out of dhcpv6_opt()
2025-08-05 15:45 [PATCH v8 00/30] Introduce discontiguous frames management Laurent Vivier
` (11 preceding siblings ...)
2025-08-05 15:46 ` [PATCH v8 12/30] tcp: Convert tcp_data_from_tap() " Laurent Vivier
@ 2025-08-05 15:46 ` Laurent Vivier
2025-08-05 15:46 ` [PATCH v8 14/30] dhcpv6: Extract sending of NotOnLink status Laurent Vivier
` (16 subsequent siblings)
29 siblings, 0 replies; 66+ messages in thread
From: Laurent Vivier @ 2025-08-05 15:46 UTC (permalink / raw)
To: passt-dev; +Cc: Laurent Vivier, David Gibson
No functional change.
Currently, if dhcpv6_opt() is called with offset set to 0, it sets the
offset to the start of the DHCPv6 options.
To simplify the use of iov_tail in a later patch, move this initialization
out of the function, and replace every call passing 0 with a call passing
the offset of the DHCPv6 options.
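A minimal sketch of an option lookup that expects the caller to supply the starting offset, assuming a flat buffer holding the UDP header, the DHCPv6 message header and the options. SKETCH_HDR_SIZE, sketch_be16() and sketch_opt() are hypothetical names for illustration only; the value 12 assumes an 8-byte UDP header followed by a 4-byte DHCPv6 message header (message type plus transaction ID).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed: 8-byte UDP header + 4-byte DHCPv6 message header */
#define SKETCH_HDR_SIZE 12

/* Read a 16-bit big-endian value */
static uint16_t sketch_be16(const unsigned char *p)
{
	return (uint16_t)(p[0] << 8 | p[1]);
}

/* Scan options starting at *offset for @type (host order here).
 * Return a pointer to the matching option header and leave *offset on it,
 * or return NULL if the option is missing or an option is truncated. */
static const unsigned char *sketch_opt(const unsigned char *msg, size_t len,
				       size_t *offset, uint16_t type)
{
	/* The callee no longer treats 0 as "start of options" */
	assert(*offset >= SKETCH_HDR_SIZE);

	while (*offset + 4 <= len) {
		const unsigned char *o = msg + *offset;
		size_t opt_len = 4 + (size_t)sketch_be16(o + 2);

		if (opt_len > len - *offset)
			return NULL;
		if (sketch_be16(o) == type)
			return o;

		*offset += opt_len;
	}
	return NULL;
}
```

Dropping the magic value 0 means every caller states where parsing begins, which is the property a later conversion to a consumable buffer view needs.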
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
dhcpv6.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/dhcpv6.c b/dhcpv6.c
index ba16c664ee24..1e540bb84f0d 100644
--- a/dhcpv6.c
+++ b/dhcpv6.c
@@ -54,14 +54,14 @@ struct opt_hdr {
uint16_t l;
} __attribute__((packed));
+#define UDP_MSG_HDR_SIZE (sizeof(struct udphdr) + sizeof(struct msg_hdr))
# define OPT_SIZE_CONV(x) (htons_constant(x))
#define OPT_SIZE(x) OPT_SIZE_CONV(sizeof(struct opt_##x) - \
sizeof(struct opt_hdr))
#define OPT_VSIZE(x) (sizeof(struct opt_##x) - \
sizeof(struct opt_hdr))
#define OPT_MAX_SIZE IPV6_MIN_MTU - (sizeof(struct ipv6hdr) + \
- sizeof(struct udphdr) + \
- sizeof(struct msg_hdr))
+ UDP_MSG_HDR_SIZE)
/**
* struct opt_client_id - DHCPv6 Client Identifier option
@@ -292,8 +292,7 @@ static struct opt_hdr *dhcpv6_opt(const struct pool *p, size_t *offset,
struct opt_hdr *o;
size_t left;
- if (!*offset)
- *offset = sizeof(struct udphdr) + sizeof(struct msg_hdr);
+ ASSERT(*offset >= UDP_MSG_HDR_SIZE);
while ((o = packet_get_try(p, 0, *offset, sizeof(*o), &left))) {
unsigned int opt_len = ntohs(o->l) + sizeof(*o);
@@ -329,7 +328,7 @@ static struct opt_hdr *dhcpv6_ia_notonlink(const struct pool *p,
size_t offset;
foreach(ia_type, ia_types) {
- offset = 0;
+ offset = UDP_MSG_HDR_SIZE;
while ((ia = dhcpv6_opt(p, &offset, *ia_type))) {
if (ntohs(ia->l) < OPT_VSIZE(ia_na))
return NULL;
@@ -466,8 +465,9 @@ static size_t dhcpv6_client_fqdn_fill(const struct pool *p, const struct ctx *c,
o = (struct opt_client_fqdn *)(buf + offset);
encode_domain_name(o->domain_name, c->fqdn);
- req_opt = (struct opt_client_fqdn *)dhcpv6_opt(p, &(size_t){ 0 },
- OPT_CLIENT_FQDN);
+ req_opt = (struct opt_client_fqdn *)dhcpv6_opt(p,
+ &(size_t){ UDP_MSG_HDR_SIZE },
+ OPT_CLIENT_FQDN);
if (req_opt && req_opt->flags & 0x01 /* S flag */)
o->flags = 0x02 /* O flag */;
else
@@ -524,15 +524,15 @@ int dhcpv6(struct ctx *c, const struct pool *p,
if (!mh)
return -1;
- client_id = dhcpv6_opt(p, &(size_t){ 0 }, OPT_CLIENTID);
+ client_id = dhcpv6_opt(p, &(size_t){ UDP_MSG_HDR_SIZE }, OPT_CLIENTID);
if (!client_id || ntohs(client_id->l) > OPT_VSIZE(client_id))
return -1;
- server_id = dhcpv6_opt(p, &(size_t){ 0 }, OPT_SERVERID);
+ server_id = dhcpv6_opt(p, &(size_t){ UDP_MSG_HDR_SIZE }, OPT_SERVERID);
if (server_id && ntohs(server_id->l) != OPT_VSIZE(server_id))
return -1;
- ia = dhcpv6_opt(p, &(size_t){ 0 }, OPT_IA_NA);
+ ia = dhcpv6_opt(p, &(size_t){ UDP_MSG_HDR_SIZE }, OPT_IA_NA);
if (ia && ntohs(ia->l) < MIN(OPT_VSIZE(ia_na), OPT_VSIZE(ia_ta)))
return -1;
@@ -582,7 +582,7 @@ int dhcpv6(struct ctx *c, const struct pool *p,
memcmp(&resp.server_id, server_id, sizeof(resp.server_id)))
return -1;
- if (ia || dhcpv6_opt(p, &(size_t){ 0 }, OPT_IA_TA))
+ if (ia || dhcpv6_opt(p, &(size_t){ UDP_MSG_HDR_SIZE }, OPT_IA_TA))
return -1;
info("DHCPv6: received INFORMATION_REQUEST, sending REPLY");
--
2.49.0
* [PATCH v8 14/30] dhcpv6: Extract sending of NotOnLink status
2025-08-05 15:45 [PATCH v8 00/30] Introduce discontiguous frames management Laurent Vivier
` (12 preceding siblings ...)
2025-08-05 15:46 ` [PATCH v8 13/30] dhcpv6: move offset initialization out of dhcpv6_opt() Laurent Vivier
@ 2025-08-05 15:46 ` Laurent Vivier
2025-08-05 15:46 ` [PATCH v8 15/30] dhcpv6: Convert to iov_tail Laurent Vivier
` (15 subsequent siblings)
29 siblings, 0 replies; 66+ messages in thread
From: Laurent Vivier @ 2025-08-05 15:46 UTC (permalink / raw)
To: passt-dev; +Cc: Laurent Vivier, David Gibson
Extract code from dhcpv6() into a new function, dhcpv6_send_ia_notonlink().
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
dhcpv6.c | 60 ++++++++++++++++++++++++++++++++++++--------------------
1 file changed, 39 insertions(+), 21 deletions(-)
diff --git a/dhcpv6.c b/dhcpv6.c
index 1e540bb84f0d..8e6c29df205d 100644
--- a/dhcpv6.c
+++ b/dhcpv6.c
@@ -357,6 +357,44 @@ err:
return ia;
}
+/**
+ * dhcpv6_send_ia_notonlink() - Send NotOnLink status
+ * @c: Execution context
+ * @ia: Pointer to non-appropriate IA_NA or IA_TA
+ * @client_id: Client ID message option
+ * xid: Transaction ID for message exchange
+ */
+static void dhcpv6_send_ia_notonlink(struct ctx *c, struct opt_hdr *ia,
+ const struct opt_hdr *client_id,
+ uint32_t xid)
+{
+ const struct in6_addr *src = &c->ip6.our_tap_ll;
+ size_t n;
+
+ info("DHCPv6: received CONFIRM with inappropriate IA,"
+ " sending NotOnLink status in REPLY");
+
+ ia->l = htons(OPT_VSIZE(ia_na) + sizeof(sc_not_on_link));
+
+ n = sizeof(struct opt_ia_na);
+ memcpy(resp_not_on_link.var, ia, n);
+ memcpy(resp_not_on_link.var + n, &sc_not_on_link,
+ sizeof(sc_not_on_link));
+
+ n += sizeof(sc_not_on_link);
+ memcpy(resp_not_on_link.var + n, client_id,
+ sizeof(struct opt_hdr) + ntohs(client_id->l));
+
+ n += sizeof(struct opt_hdr) + ntohs(client_id->l);
+
+ n = offsetof(struct resp_not_on_link_t, var) + n;
+
+ resp_not_on_link.hdr.xid = xid;
+
+ tap_udp6_send(c, src, 547, tap_ip6_daddr(c, src), 546,
+ xid, &resp_not_on_link, n);
+}
+
/**
* dhcpv6_dns_fill() - Fill in DNS Servers and Domain Search list options
* @c: Execution context
@@ -549,28 +587,8 @@ int dhcpv6(struct ctx *c, const struct pool *p,
return -1;
if ((bad_ia = dhcpv6_ia_notonlink(p, &c->ip6.addr))) {
- info("DHCPv6: received CONFIRM with inappropriate IA,"
- " sending NotOnLink status in REPLY");
-
- bad_ia->l = htons(OPT_VSIZE(ia_na) +
- sizeof(sc_not_on_link));
- n = sizeof(struct opt_ia_na);
- memcpy(resp_not_on_link.var, bad_ia, n);
-
- memcpy(resp_not_on_link.var + n,
- &sc_not_on_link, sizeof(sc_not_on_link));
- n += sizeof(sc_not_on_link);
-
- memcpy(resp_not_on_link.var + n, client_id,
- sizeof(struct opt_hdr) + ntohs(client_id->l));
- n += sizeof(struct opt_hdr) + ntohs(client_id->l);
-
- n = offsetof(struct resp_not_on_link_t, var) + n;
-
- resp_not_on_link.hdr.xid = mh->xid;
- tap_udp6_send(c, src, 547, tap_ip6_daddr(c, src), 546,
- mh->xid, &resp_not_on_link, n);
+ dhcpv6_send_ia_notonlink(c, bad_ia, client_id, mh->xid);
return 1;
}
--
2.49.0
* [PATCH v8 15/30] dhcpv6: Convert to iov_tail
2025-08-05 15:45 [PATCH v8 00/30] Introduce discontiguous frames management Laurent Vivier
` (13 preceding siblings ...)
2025-08-05 15:46 ` [PATCH v8 14/30] dhcpv6: Extract sending of NotOnLink status Laurent Vivier
@ 2025-08-05 15:46 ` Laurent Vivier
2025-08-05 15:46 ` [PATCH v8 16/30] dhcpv6: Use iov_tail in dhcpv6_opt() Laurent Vivier
` (14 subsequent siblings)
29 siblings, 0 replies; 66+ messages in thread
From: Laurent Vivier @ 2025-08-05 15:46 UTC (permalink / raw)
To: passt-dev; +Cc: Laurent Vivier, David Gibson
Use packet_data() and extract headers using IOV_REMOVE_HEADER()
and IOV_PEEK_HEADER() rather than packet_get().
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
dhcpv6.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/dhcpv6.c b/dhcpv6.c
index 8e6c29df205d..ae06e646f92f 100644
--- a/dhcpv6.c
+++ b/dhcpv6.c
@@ -533,12 +533,18 @@ int dhcpv6(struct ctx *c, const struct pool *p,
{
const struct opt_hdr *client_id, *server_id, *ia;
const struct in6_addr *src;
+ struct msg_hdr mh_storage;
const struct msg_hdr *mh;
+ struct udphdr uh_storage;
const struct udphdr *uh;
struct opt_hdr *bad_ia;
+ struct iov_tail data;
size_t mlen, n;
- uh = packet_get(p, 0, 0, sizeof(*uh), &mlen);
+ if (!packet_data(p, 0, &data))
+ return -1;
+
+ uh = IOV_REMOVE_HEADER(&data, uh_storage);
if (!uh)
return -1;
@@ -551,6 +557,7 @@ int dhcpv6(struct ctx *c, const struct pool *p,
if (!IN6_IS_ADDR_MULTICAST(daddr))
return -1;
+ mlen = iov_tail_size(&data);
if (mlen + sizeof(*uh) != ntohs(uh->len) || mlen < sizeof(*mh))
return -1;
@@ -558,7 +565,7 @@ int dhcpv6(struct ctx *c, const struct pool *p,
src = &c->ip6.our_tap_ll;
- mh = packet_get(p, 0, sizeof(*uh), sizeof(*mh), NULL);
+ mh = IOV_PEEK_HEADER(&data, mh_storage);
if (!mh)
return -1;
--
2.49.0
* [PATCH v8 16/30] dhcpv6: Use iov_tail in dhcpv6_opt()
2025-08-05 15:45 [PATCH v8 00/30] Introduce discontiguous frames management Laurent Vivier
` (14 preceding siblings ...)
2025-08-05 15:46 ` [PATCH v8 15/30] dhcpv6: Convert to iov_tail Laurent Vivier
@ 2025-08-05 15:46 ` Laurent Vivier
2025-08-06 4:14 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 17/30] dhcp: Convert to iov_tail Laurent Vivier
` (13 subsequent siblings)
29 siblings, 1 reply; 66+ messages in thread
From: Laurent Vivier @ 2025-08-05 15:46 UTC (permalink / raw)
To: passt-dev; +Cc: Laurent Vivier
Refactor dhcpv6_opt() and its callers to parse options through an iov_tail
instead of tracking a byte offset into the packet pool.
Its signature is now `bool dhcpv6_opt(struct iov_tail *data, uint16_t type)`.
`*data` (in/out) points to the matching option when the function returns `true`,
and is left unmodified when it returns `false`.
The main dhcpv6() function uses IOV_REMOVE_HEADER for the msg_hdr, then
passes the iov_tail (now at options start) to the new dhcpv6_opt().
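A minimal sketch of the in/out convention described above, assuming a contiguous view in place of a real iov_tail. struct sketch_view and sketch_find_opt() are hypothetical names, and unlike the patch this sketch compares the option type in host order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical contiguous stand-in for an iovec tail */
struct sketch_view {
	const unsigned char *p;
	size_t len;
};

/* Advance @data to the option with @type (host order here).
 * Return true with @data pointing at the option header, or false with
 * @data restored to where it started. */
static bool sketch_find_opt(struct sketch_view *data, uint16_t type)
{
	struct sketch_view head = *data;	/* saved for the failure path */

	while (data->len >= 4) {
		uint16_t t = (uint16_t)(data->p[0] << 8 | data->p[1]);
		size_t opt_len = 4 + (size_t)(data->p[2] << 8 | data->p[3]);

		if (opt_len > data->len)	/* truncated option */
			break;
		if (t == type)
			return true;

		data->p += opt_len;
		data->len -= opt_len;
	}

	*data = head;
	return false;
}
```

A caller that needs several independent lookups copies the view first, much as dhcpv6() assigns `opt = data` before each search, so that one successful lookup does not move the starting point of the next.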
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
dhcpv6.c | 179 ++++++++++++++++++++++++++++++++-----------------------
iov.c | 1 -
2 files changed, 104 insertions(+), 76 deletions(-)
diff --git a/dhcpv6.c b/dhcpv6.c
index ae06e646f92f..e93acaf9955e 100644
--- a/dhcpv6.c
+++ b/dhcpv6.c
@@ -280,112 +280,125 @@ static struct resp_not_on_link_t {
/**
* dhcpv6_opt() - Get option from DHCPv6 message
- * @p: Packet pool, single packet with UDP header
- * @offset: Offset to look at, 0: end of header, set to option start
+ * @data: Buffer with options, set to matching option on return
* @type: Option type to look up, network order
*
- * Return: pointer to option header, or NULL on malformed or missing option
+ * Return: true if found and @data points to the option header,
+ * or false on malformed or missing option and @data is
+ * unmodified.
*/
-static struct opt_hdr *dhcpv6_opt(const struct pool *p, size_t *offset,
- uint16_t type)
+static bool dhcpv6_opt(struct iov_tail *data, uint16_t type)
{
- struct opt_hdr *o;
- size_t left;
+ struct iov_tail head = *data;
+ struct opt_hdr o_storage;
+ const struct opt_hdr *o;
- ASSERT(*offset >= UDP_MSG_HDR_SIZE);
-
- while ((o = packet_get_try(p, 0, *offset, sizeof(*o), &left))) {
+ while ((o = IOV_PEEK_HEADER(data, o_storage))) {
unsigned int opt_len = ntohs(o->l) + sizeof(*o);
- if (ntohs(o->l) > left)
- return NULL;
+ if (opt_len > iov_tail_size(data))
+ break;
if (o->t == type)
- return o;
+ return true;
- *offset += opt_len;
+ iov_tail_drop(data, opt_len);
}
- return NULL;
+ *data = head;
+ return false;
}
/**
* dhcpv6_ia_notonlink() - Check if any IA contains non-appropriate addresses
- * @p: Packet pool, single packet starting from UDP header
+ * @data: Data to look at, packet starting from UDP header (input/output)
* @la: Address we want to lease to the client
*
- * Return: pointer to non-appropriate IA_NA or IA_TA, if any, NULL otherwise
+ * Return: true and @data points to non-appropriate IA_NA or IA_TA, if any,
+ * false otherwise and @data is unmodified
*/
-static struct opt_hdr *dhcpv6_ia_notonlink(const struct pool *p,
- struct in6_addr *la)
+static bool dhcpv6_ia_notonlink(struct iov_tail *data,
+ struct in6_addr *la)
{
int ia_types[2] = { OPT_IA_NA, OPT_IA_TA }, *ia_type;
+ struct opt_ia_addr opt_addr_storage;
const struct opt_ia_addr *opt_addr;
+ struct iov_tail current, ia_base;
+ struct opt_ia_na ia_storage;
char buf[INET6_ADDRSTRLEN];
+ const struct opt_ia_na *ia;
struct in6_addr req_addr;
+ struct opt_hdr h_storage;
const struct opt_hdr *h;
- struct opt_hdr *ia;
- size_t offset;
foreach(ia_type, ia_types) {
- offset = UDP_MSG_HDR_SIZE;
- while ((ia = dhcpv6_opt(p, &offset, *ia_type))) {
- if (ntohs(ia->l) < OPT_VSIZE(ia_na))
- return NULL;
-
- offset += sizeof(struct opt_ia_na);
+ current = *data;
+ while (dhcpv6_opt(&current, *ia_type)) {
+ ia_base = current;
+ ia = IOV_REMOVE_HEADER(&current, ia_storage);
+ if (!ia || ntohs(ia->hdr.l) < OPT_VSIZE(ia_na))
+ goto notfound;
+
+ while (dhcpv6_opt(&current, OPT_IAAADR)) {
+ h = IOV_PEEK_HEADER(&current, h_storage);
+ if (!h || ntohs(h->l) != OPT_VSIZE(ia_addr))
+ goto notfound;
+
+ opt_addr = IOV_REMOVE_HEADER(&current,
+ opt_addr_storage);
+ if (!opt_addr)
+ goto notfound;
- while ((h = dhcpv6_opt(p, &offset, OPT_IAAADR))) {
- if (ntohs(h->l) != OPT_VSIZE(ia_addr))
- return NULL;
-
- opt_addr = (const struct opt_ia_addr *)h;
req_addr = opt_addr->addr;
if (!IN6_ARE_ADDR_EQUAL(la, &req_addr))
- goto err;
-
- offset += sizeof(struct opt_ia_addr);
+ goto notonlink;
}
}
}
- return NULL;
+notfound:
+ return false;
-err:
+notonlink:
info("DHCPv6: requested address %s not on link",
inet_ntop(AF_INET6, &req_addr, buf, sizeof(buf)));
- return ia;
+ *data = ia_base;
+ return true;
}
/**
* dhcpv6_send_ia_notonlink() - Send NotOnLink status
- * @c: Execution context
- * @ia: Pointer to non-appropriate IA_NA or IA_TA
- * @client_id: Client ID message option
- * xid: Transaction ID for message exchange
+ * @c: Execution context
+ * @ia_base: Non-appropriate IA_NA or IA_TA base
+ * @client_id_base: Client ID message option base
+ * @len: Client ID length
+ * @xid: Transaction ID for message exchange
*/
-static void dhcpv6_send_ia_notonlink(struct ctx *c, struct opt_hdr *ia,
- const struct opt_hdr *client_id,
- uint32_t xid)
+static void dhcpv6_send_ia_notonlink(struct ctx *c,
+ const struct iov_tail *ia_base,
+ const struct iov_tail *client_id_base,
+ int len, uint32_t xid)
{
const struct in6_addr *src = &c->ip6.our_tap_ll;
+ struct opt_hdr *ia = (struct opt_hdr *)resp_not_on_link.var;
size_t n;
info("DHCPv6: received CONFIRM with inappropriate IA,"
" sending NotOnLink status in REPLY");
- ia->l = htons(OPT_VSIZE(ia_na) + sizeof(sc_not_on_link));
-
n = sizeof(struct opt_ia_na);
- memcpy(resp_not_on_link.var, ia, n);
+ iov_to_buf(&ia_base->iov[0], ia_base->cnt, ia_base->off,
+ resp_not_on_link.var, n);
+ ia->l = htons(OPT_VSIZE(ia_na) + sizeof(sc_not_on_link));
memcpy(resp_not_on_link.var + n, &sc_not_on_link,
sizeof(sc_not_on_link));
n += sizeof(sc_not_on_link);
- memcpy(resp_not_on_link.var + n, client_id,
- sizeof(struct opt_hdr) + ntohs(client_id->l));
+ iov_to_buf(&client_id_base->iov[0], client_id_base->cnt,
+ client_id_base->off, resp_not_on_link.var + n,
+ sizeof(struct opt_hdr) + len);
- n += sizeof(struct opt_hdr) + ntohs(client_id->l);
+ n += sizeof(struct opt_hdr) + len;
n = offsetof(struct resp_not_on_link_t, var) + n;
@@ -474,17 +487,19 @@ search:
/**
* dhcpv6_client_fqdn_fill() - Fill in client FQDN option
+ * @data: Data to look at
* @c: Execution context
* @buf: Response message buffer where options will be appended
* @offset: Offset in message buffer for new options
*
* Return: updated length of response message buffer.
*/
-static size_t dhcpv6_client_fqdn_fill(const struct pool *p, const struct ctx *c,
+static size_t dhcpv6_client_fqdn_fill(const struct iov_tail *data,
+ const struct ctx *c,
char *buf, int offset)
{
- struct opt_client_fqdn const *req_opt;
+ struct iov_tail current = *data;
struct opt_client_fqdn *o;
size_t opt_len;
@@ -502,14 +517,16 @@ static size_t dhcpv6_client_fqdn_fill(const struct pool *p, const struct ctx *c,
}
o = (struct opt_client_fqdn *)(buf + offset);
+ o->flags = 0x00;
encode_domain_name(o->domain_name, c->fqdn);
- req_opt = (struct opt_client_fqdn *)dhcpv6_opt(p,
- &(size_t){ UDP_MSG_HDR_SIZE },
- OPT_CLIENT_FQDN);
- if (req_opt && req_opt->flags & 0x01 /* S flag */)
- o->flags = 0x02 /* O flag */;
- else
- o->flags = 0x00;
+ if (dhcpv6_opt(&current, OPT_CLIENT_FQDN)) {
+ struct opt_client_fqdn req_opt_storage;
+ struct opt_client_fqdn const *req_opt;
+
+ req_opt = IOV_PEEK_HEADER(&current, req_opt_storage);
+ if (req_opt && req_opt->flags & 0x01 /* S flag */)
+ o->flags = 0x02 /* O flag */;
+ }
opt_len++;
@@ -531,14 +548,18 @@ static size_t dhcpv6_client_fqdn_fill(const struct pool *p, const struct ctx *c,
int dhcpv6(struct ctx *c, const struct pool *p,
const struct in6_addr *saddr, const struct in6_addr *daddr)
{
- const struct opt_hdr *client_id, *server_id, *ia;
+ const struct opt_server_id *server_id = NULL;
+ struct iov_tail data, opt, client_id_base;
+ const struct opt_hdr *client_id = NULL;
+ struct opt_server_id server_id_storage;
+ const struct opt_ia_na *ia = NULL;
+ struct opt_hdr client_id_storage;
+ struct opt_ia_na ia_storage;
const struct in6_addr *src;
struct msg_hdr mh_storage;
const struct msg_hdr *mh;
struct udphdr uh_storage;
const struct udphdr *uh;
- struct opt_hdr *bad_ia;
- struct iov_tail data;
size_t mlen, n;
if (!packet_data(p, 0, &data))
@@ -565,20 +586,26 @@ int dhcpv6(struct ctx *c, const struct pool *p,
src = &c->ip6.our_tap_ll;
- mh = IOV_PEEK_HEADER(&data, mh_storage);
+ mh = IOV_REMOVE_HEADER(&data, mh_storage);
if (!mh)
return -1;
- client_id = dhcpv6_opt(p, &(size_t){ UDP_MSG_HDR_SIZE }, OPT_CLIENTID);
+ client_id_base = data;
+ if (dhcpv6_opt(&client_id_base, OPT_CLIENTID))
+ client_id = IOV_PEEK_HEADER(&client_id_base, client_id_storage);
if (!client_id || ntohs(client_id->l) > OPT_VSIZE(client_id))
return -1;
- server_id = dhcpv6_opt(p, &(size_t){ UDP_MSG_HDR_SIZE }, OPT_SERVERID);
- if (server_id && ntohs(server_id->l) != OPT_VSIZE(server_id))
+ opt = data;
+ if (dhcpv6_opt(&opt, OPT_SERVERID))
+ server_id = IOV_PEEK_HEADER(&opt, server_id_storage);
+ if (server_id && ntohs(server_id->hdr.l) != OPT_VSIZE(server_id))
return -1;
- ia = dhcpv6_opt(p, &(size_t){ UDP_MSG_HDR_SIZE }, OPT_IA_NA);
- if (ia && ntohs(ia->l) < MIN(OPT_VSIZE(ia_na), OPT_VSIZE(ia_ta)))
+ opt = data;
+ if (dhcpv6_opt(&opt, OPT_IA_NA))
+ ia = IOV_PEEK_HEADER(&opt, ia_storage);
+ if (ia && ntohs(ia->hdr.l) < MIN(OPT_VSIZE(ia_na), OPT_VSIZE(ia_ta)))
return -1;
resp.hdr.type = TYPE_REPLY;
@@ -593,9 +620,10 @@ int dhcpv6(struct ctx *c, const struct pool *p,
if (mh->type == TYPE_CONFIRM && server_id)
return -1;
- if ((bad_ia = dhcpv6_ia_notonlink(p, &c->ip6.addr))) {
+ if (dhcpv6_ia_notonlink(&data, &c->ip6.addr)) {
- dhcpv6_send_ia_notonlink(c, bad_ia, client_id, mh->xid);
+ dhcpv6_send_ia_notonlink(c, &data, &client_id_base,
+ ntohs(client_id->l), mh->xid);
return 1;
}
@@ -607,7 +635,7 @@ int dhcpv6(struct ctx *c, const struct pool *p,
memcmp(&resp.server_id, server_id, sizeof(resp.server_id)))
return -1;
- if (ia || dhcpv6_opt(p, &(size_t){ UDP_MSG_HDR_SIZE }, OPT_IA_TA))
+ if (ia || dhcpv6_opt(&data, OPT_IA_TA))
return -1;
info("DHCPv6: received INFORMATION_REQUEST, sending REPLY");
@@ -633,13 +661,14 @@ int dhcpv6(struct ctx *c, const struct pool *p,
if (ia)
resp.ia_na.iaid = ((struct opt_ia_na *)ia)->iaid;
- memcpy(&resp.client_id, client_id,
- ntohs(client_id->l) + sizeof(struct opt_hdr));
+ iov_to_buf(&client_id_base.iov[0], client_id_base.cnt,
+ client_id_base.off, &resp.client_id,
+ ntohs(client_id->l) + sizeof(struct opt_hdr));
n = offsetof(struct resp_t, client_id) +
sizeof(struct opt_hdr) + ntohs(client_id->l);
n = dhcpv6_dns_fill(c, (char *)&resp, n);
- n = dhcpv6_client_fqdn_fill(p, c, (char *)&resp, n);
+ n = dhcpv6_client_fqdn_fill(&data, c, (char *)&resp, n);
resp.hdr.xid = mh->xid;
diff --git a/iov.c b/iov.c
index f519eb3cfeaf..d17d4dd3da09 100644
--- a/iov.c
+++ b/iov.c
@@ -109,7 +109,6 @@ size_t iov_from_buf(const struct iovec *iov, size_t iov_cnt,
*
* Return: the number of bytes successfully copied.
*/
-/* cppcheck-suppress [staticFunction] */
size_t iov_to_buf(const struct iovec *iov, size_t iov_cnt,
size_t offset, void *buf, size_t bytes)
{
--
2.49.0
^ permalink raw reply related [flat|nested] 66+ messages in thread
* [PATCH v8 17/30] dhcp: Convert to iov_tail
2025-08-05 15:45 [PATCH v8 00/30] Introduce discontiguous frames management Laurent Vivier
` (15 preceding siblings ...)
2025-08-05 15:46 ` [PATCH v8 16/30] dhcpv6: Use iov_tail in dhcpv6_opt() Laurent Vivier
@ 2025-08-05 15:46 ` Laurent Vivier
2025-08-06 4:38 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 18/30] ip: Use iov_tail in ipv6_l4hdr() Laurent Vivier
` (12 subsequent siblings)
29 siblings, 1 reply; 66+ messages in thread
From: Laurent Vivier @ 2025-08-05 15:46 UTC (permalink / raw)
To: passt-dev; +Cc: Laurent Vivier
Use packet_data() and extract headers using IOV_REMOVE_HEADER()
and IOV_PEEK_HEADER() rather than packet_get().
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
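Note, not part of the patch: the option loop below walks type/length/value
records through an iov_tail instead of indexing a flat buffer. The following
is a minimal standalone sketch of that idea, under stated assumptions:
`struct tail`, tail_size(), tail_pull() and parse_opts() are hypothetical
stand-ins written for this note, not passt's iov_tail API, and the one-byte
pad and end options are ignored for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical stand-in for an iov_tail: an iovec array plus a byte offset
 * into its first entry. */
struct tail {
	const struct iovec *iov;
	size_t cnt;
	size_t off;
};

/* Number of bytes left in the tail */
static size_t tail_size(const struct tail *t)
{
	size_t i, n = 0;

	for (i = 0; i < t->cnt; i++)
		n += t->iov[i].iov_len;

	return n > t->off ? n - t->off : 0;
}

/* Copy @len bytes from the front of the tail to @buf and consume them.
 * Return: 1 on success, 0 (tail untouched) if fewer than @len bytes remain
 */
static int tail_pull(struct tail *t, void *buf, size_t len)
{
	uint8_t *out = buf;

	if (tail_size(t) < len)
		return 0;

	while (len) {
		size_t avail = t->iov[0].iov_len - t->off;
		size_t n = avail < len ? avail : len;

		memcpy(out, (const uint8_t *)t->iov[0].iov_base + t->off, n);
		out += n;
		len -= n;
		t->off += n;

		/* Current entry exhausted: move on to the next one */
		if (t->off == t->iov[0].iov_len && t->cnt > 1) {
			t->iov++;
			t->cnt--;
			t->off = 0;
		}
	}

	return 1;
}

struct opt {
	int len;		/* -1 if the option was not seen */
	uint8_t val[255];
};

/* Walk type/length/value options, which may straddle iovec entries.
 * Return: 0 on success, -1 if an option is truncated
 */
static int parse_opts(struct tail *t, struct opt opts[256])
{
	int i;

	assert(t && opts);

	for (i = 0; i < 256; i++)
		opts[i].len = -1;

	while (tail_size(t) >= 2) {
		uint8_t hdr[2];	/* hdr[0]: type, hdr[1]: length */

		if (!tail_pull(t, hdr, sizeof(hdr)))
			return -1;
		if (!tail_pull(t, opts[hdr[0]].val, hdr[1]))
			return -1;
		opts[hdr[0]].len = hdr[1];
	}

	return 0;
}
```

Because every read copies into caller-provided storage, the parser never
needs the option bytes to be contiguous in memory, which is the property the
*_storage variables provide in the patch.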
dhcp.c | 46 ++++++++++++++++++++++++++++------------------
1 file changed, 28 insertions(+), 18 deletions(-)
diff --git a/dhcp.c b/dhcp.c
index b0de04be6f27..cf73d4b07767 100644
--- a/dhcp.c
+++ b/dhcp.c
@@ -302,27 +302,33 @@ static void opt_set_dns_search(const struct ctx *c, size_t max_len)
*/
int dhcp(const struct ctx *c, const struct pool *p)
{
- size_t mlen, dlen, offset = 0, opt_len, opt_off = 0;
char macstr[ETH_ADDRSTRLEN];
+ size_t mlen, dlen, opt_len;
struct in_addr mask, dst;
+ struct ethhdr eh_storage;
+ struct iphdr iph_storage;
+ struct udphdr uh_storage;
const struct ethhdr *eh;
const struct iphdr *iph;
const struct udphdr *uh;
+ struct iov_tail data;
struct msg const *m;
struct msg reply;
unsigned int i;
+ struct msg m_storage;
- eh = packet_get(p, 0, offset, sizeof(*eh), NULL);
- offset += sizeof(*eh);
+ if (!packet_data(p, 0, &data))
+ return -1;
- iph = packet_get(p, 0, offset, sizeof(*iph), NULL);
+ eh = IOV_REMOVE_HEADER(&data, eh_storage);
+ iph = IOV_PEEK_HEADER(&data, iph_storage);
if (!eh || !iph)
return -1;
- offset += iph->ihl * 4UL;
- uh = packet_get(p, 0, offset, sizeof(*uh), &mlen);
- offset += sizeof(*uh);
+ if (!iov_tail_drop(&data, iph->ihl * 4UL))
+ return -1;
+ uh = IOV_REMOVE_HEADER(&data, uh_storage);
if (!uh)
return -1;
@@ -332,7 +338,10 @@ int dhcp(const struct ctx *c, const struct pool *p)
if (c->no_dhcp)
return 1;
- m = packet_get(p, 0, offset, offsetof(struct msg, o), &opt_len);
+ mlen = iov_tail_size(&data);
+ m = (struct msg const *)iov_remove_header_(&data, &m_storage,
+ offsetof(struct msg, o),
+ __alignof__(struct msg));
if (!m ||
mlen != ntohs(uh->len) - sizeof(*uh) ||
mlen < offsetof(struct msg, o) ||
@@ -355,27 +364,28 @@ int dhcp(const struct ctx *c, const struct pool *p)
memset(&reply.file, 0, sizeof(reply.file));
reply.magic = m->magic;
- offset += offsetof(struct msg, o);
-
for (i = 0; i < ARRAY_SIZE(opts); i++)
opts[i].clen = -1;
- while (opt_off + 2 < opt_len) {
- const uint8_t *olen, *val;
+ opt_len = iov_tail_size(&data);
+ while (opt_len >= 2) {
+ uint8_t olen_storage, type_storage;
+ const uint8_t *olen;
uint8_t *type;
- type = packet_get(p, 0, offset + opt_off, 1, NULL);
- olen = packet_get(p, 0, offset + opt_off + 1, 1, NULL);
+ type = IOV_REMOVE_HEADER(&data, type_storage);
+ olen = IOV_REMOVE_HEADER(&data, olen_storage);
if (!type || !olen)
return -1;
- val = packet_get(p, 0, offset + opt_off + 2, *olen, NULL);
- if (!val)
+ opt_len = iov_tail_size(&data);
+ if (opt_len < *olen)
return -1;
- memcpy(&opts[*type].c, val, *olen);
+ iov_to_buf(&data.iov[0], data.cnt, data.off, &opts[*type].c, *olen);
opts[*type].clen = *olen;
- opt_off += *olen + 2;
+ iov_tail_drop(&data, *olen);
+ opt_len -= *olen;
}
opts[80].slen = -1;
--
2.49.0
^ permalink raw reply related [flat|nested] 66+ messages in thread
* [PATCH v8 18/30] ip: Use iov_tail in ipv6_l4hdr()
2025-08-05 15:45 [PATCH v8 00/30] Introduce discontiguous frames management Laurent Vivier
` (16 preceding siblings ...)
2025-08-05 15:46 ` [PATCH v8 17/30] dhcp: Convert to iov_tail Laurent Vivier
@ 2025-08-05 15:46 ` Laurent Vivier
2025-08-06 5:12 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 19/30] tap: Convert tap4_handler() to iov_tail Laurent Vivier
` (11 subsequent siblings)
29 siblings, 1 reply; 66+ messages in thread
From: Laurent Vivier @ 2025-08-05 15:46 UTC (permalink / raw)
To: passt-dev; +Cc: Laurent Vivier
Pass an iov_tail to ipv6_l4hdr() and extract headers using
IOV_REMOVE_HEADER() and IOV_PEEK_HEADER() rather than packet_get().
The function now returns a bool instead of a pointer to the L4 header.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
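Note, not part of the patch: a minimal standalone sketch of walking IPv6
extension headers with a consuming cursor and a boolean result, in the spirit
of the new ipv6_l4hdr() signature. Everything here is an assumption made for
illustration: `struct cursor`, nh_is_ext() and l4_find() are hypothetical
names, the cursor is a flat buffer rather than an iov_tail, only the three
extension headers with the generic (next header, length) layout are
recognised, and every extension header is consumed before the payload length
is reported, which is a choice of this sketch rather than a description of
the patched function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IP6_HDR_LEN	40	/* fixed IPv6 header size */

/* Hypothetical flat-buffer cursor standing in for an iov_tail */
struct cursor {
	const uint8_t *p;
	size_t len;
};

/* Extension headers using the generic layout: hop-by-hop (0), routing (43),
 * destination options (60). Assumption: deliberately a small subset.
 */
static bool nh_is_ext(uint8_t nh)
{
	return nh == 0 || nh == 43 || nh == 60;
}

/* Advance @c to the L4 header, report protocol and remaining length.
 * Return: true if found, false on truncation or "No Next Header" (59);
 * outputs are left untouched on failure
 */
static bool l4_find(struct cursor *c, uint8_t *proto, size_t *dlen)
{
	uint8_t nh;

	assert(c && proto && dlen);

	if (c->len < IP6_HDR_LEN)
		return false;

	nh = c->p[6];			/* Next Header field */
	c->p += IP6_HDR_LEN;
	c->len -= IP6_HDR_LEN;

	while (nh_is_ext(nh)) {
		size_t hdrlen;

		if (c->len < 2)
			return false;

		/* Length is in 8-octet units, not counting the first 8 */
		hdrlen = ((size_t)c->p[1] + 1) * 8;
		if (c->len < hdrlen)
			return false;

		nh = c->p[0];
		c->p += hdrlen;
		c->len -= hdrlen;
	}

	if (nh == 59)			/* No Next Header */
		return false;

	*proto = nh;
	*dlen = c->len;
	return true;
}
```

Returning a bool and advancing the cursor, rather than returning a pointer,
avoids promising the caller that the L4 header is contiguous in memory.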
ip.c | 32 +++++++++++++++-----------------
ip.h | 3 +--
packet.c | 1 +
tap.c | 4 +++-
4 files changed, 20 insertions(+), 20 deletions(-)
diff --git a/ip.c b/ip.c
index 2cc7f6548aff..50bd69a70596 100644
--- a/ip.c
+++ b/ip.c
@@ -23,50 +23,48 @@
/**
* ipv6_l4hdr() - Find pointer to L4 header in IPv6 packet and extract protocol
- * @p: Packet pool, packet number @idx has IPv6 header at @offset
- * @idx: Index of packet in pool
- * @offset: Pre-calculated IPv6 header offset
+ * @data: IPv6 packet
* @proto: Filled with L4 protocol number
* @dlen: Data length (payload excluding header extensions), set on return
*
- * Return: pointer to L4 header, NULL if not found
+ * Return: true if the L4 header is found and @data, @proto, @dlen are set,
+ * false on error. Outputs are indeterminate on failure.
*/
-char *ipv6_l4hdr(const struct pool *p, int idx, size_t offset, uint8_t *proto,
- size_t *dlen)
+bool ipv6_l4hdr(struct iov_tail *data, uint8_t *proto, size_t *dlen)
{
+ struct ipv6_opt_hdr o_storage;
const struct ipv6_opt_hdr *o;
+ struct ipv6hdr ip6h_storage;
const struct ipv6hdr *ip6h;
- char *base;
int hdrlen;
uint8_t nh;
- base = packet_get(p, idx, 0, 0, NULL);
- ip6h = packet_get(p, idx, offset, sizeof(*ip6h), dlen);
+ ip6h = IOV_REMOVE_HEADER(data, ip6h_storage);
if (!ip6h)
- return NULL;
-
- offset += sizeof(*ip6h);
+ return false;
+ *dlen = iov_tail_size(data);
nh = ip6h->nexthdr;
if (!IPV6_NH_OPT(nh))
goto found;
- while ((o = packet_get_try(p, idx, offset, sizeof(*o), dlen))) {
+ while ((o = IOV_PEEK_HEADER(data, o_storage))) {
+ *dlen = iov_tail_size(data) - sizeof(*o);
nh = o->nexthdr;
hdrlen = (o->hdrlen + 1) * 8;
if (IPV6_NH_OPT(nh))
- offset += hdrlen;
+ iov_tail_drop(data, hdrlen);
else
goto found;
}
- return NULL;
+ return false;
found:
if (nh == 59)
- return NULL;
+ return false;
*proto = nh;
- return base + offset;
+ return true;
}
diff --git a/ip.h b/ip.h
index 24509d9c11cd..5830b92302e2 100644
--- a/ip.h
+++ b/ip.h
@@ -115,8 +115,7 @@ static inline uint32_t ip6_get_flow_lbl(const struct ipv6hdr *ip6h)
ip6h->flow_lbl[2];
}
-char *ipv6_l4hdr(const struct pool *p, int idx, size_t offset, uint8_t *proto,
- size_t *dlen);
+bool ipv6_l4hdr(struct iov_tail *data, uint8_t *proto, size_t *dlen);
/* IPv6 link-local all-nodes multicast address, ff02::1 */
static const struct in6_addr in6addr_ll_all_nodes = {
diff --git a/packet.c b/packet.c
index 34b1722b9a03..014b353cdf8b 100644
--- a/packet.c
+++ b/packet.c
@@ -133,6 +133,7 @@ void packet_add_do(struct pool *p, struct iov_tail *data,
*
* Return: pointer to start of data range, NULL on invalid range or descriptor
*/
+/* cppcheck-suppress [staticFunction] */
void *packet_get_try_do(const struct pool *p, size_t idx, size_t offset,
size_t len, size_t *left, const char *func, int line)
{
diff --git a/tap.c b/tap.c
index 8d2b118152f1..d7852fad6069 100644
--- a/tap.c
+++ b/tap.c
@@ -911,8 +911,10 @@ resume:
if (plen != check)
continue;
- if (!(l4h = ipv6_l4hdr(in, i, sizeof(*eh), &proto, &l4len)))
+ data = IOV_TAIL_FROM_BUF(ip6h, sizeof(*ip6h) + check, 0);
+ if (!ipv6_l4hdr(&data, &proto, &l4len))
continue;
+ l4h = (char *)data.iov[0].iov_base + data.off;
if (IN6_IS_ADDR_LOOPBACK(saddr) || IN6_IS_ADDR_LOOPBACK(daddr)) {
char sstr[INET6_ADDRSTRLEN], dstr[INET6_ADDRSTRLEN];
--
2.49.0
^ permalink raw reply related [flat|nested] 66+ messages in thread
* [PATCH v8 19/30] tap: Convert tap4_handler() to iov_tail
2025-08-05 15:45 [PATCH v8 00/30] Introduce discontiguous frames management Laurent Vivier
` (17 preceding siblings ...)
2025-08-05 15:46 ` [PATCH v8 18/30] ip: Use iov_tail in ipv6_l4hdr() Laurent Vivier
@ 2025-08-05 15:46 ` Laurent Vivier
2025-08-06 5:17 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 20/30] tap: Convert tap6_handler() " Laurent Vivier
` (10 subsequent siblings)
29 siblings, 1 reply; 66+ messages in thread
From: Laurent Vivier @ 2025-08-05 15:46 UTC (permalink / raw)
To: passt-dev; +Cc: Laurent Vivier
Use packet_data() and extract headers using IOV_PEEK_HEADER()
rather than packet_get().
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
tap.c | 33 ++++++++++++++++++++-------------
1 file changed, 20 insertions(+), 13 deletions(-)
diff --git a/tap.c b/tap.c
index d7852fad6069..4fbcad3b385f 100644
--- a/tap.c
+++ b/tap.c
@@ -706,28 +706,34 @@ static int tap4_handler(struct ctx *c, const struct pool *in,
i = 0;
resume:
for (seq_count = 0, seq = NULL; i < in->count; i++) {
- size_t l2len, l3len, hlen, l4len;
+ size_t l3len, hlen, l4len;
+ struct ethhdr eh_storage;
+ struct iphdr iph_storage;
+ struct udphdr uh_storage;
const struct ethhdr *eh;
const struct udphdr *uh;
struct iov_tail data;
struct iphdr *iph;
- const char *l4h;
- packet_get(in, i, 0, 0, &l2len);
+ if (!packet_data(in, i, &data))
+ continue;
- eh = packet_get(in, i, 0, sizeof(*eh), &l3len);
+ eh = IOV_PEEK_HEADER(&data, eh_storage);
if (!eh)
continue;
if (ntohs(eh->h_proto) == ETH_P_ARP) {
PACKET_POOL_P(pkt, 1, in->buf, in->buf_size);
- data = IOV_TAIL_FROM_BUF((void *)eh, l2len, 0);
packet_add(pkt, &data);
arp(c, pkt);
continue;
}
- iph = packet_get(in, i, sizeof(*eh), sizeof(*iph), NULL);
+ if (!iov_tail_drop(&data, sizeof(*eh)))
+ continue;
+ l3len = iov_tail_size(&data);
+
+ iph = IOV_PEEK_HEADER(&data, iph_storage);
if (!iph)
continue;
@@ -755,8 +761,9 @@ resume:
if (iph->saddr && c->ip4.addr_seen.s_addr != iph->saddr)
c->ip4.addr_seen.s_addr = iph->saddr;
- l4h = packet_get(in, i, sizeof(*eh) + hlen, l4len, NULL);
- if (!l4h)
+ if (!iov_tail_drop(&data, hlen))
+ continue;
+ if (iov_tail_size(&data) != l4len)
continue;
if (iph->protocol == IPPROTO_ICMP) {
@@ -767,7 +774,6 @@ resume:
tap_packet_debug(iph, NULL, NULL, 0, NULL, 1);
- data = IOV_TAIL_FROM_BUF((void *)l4h, l4len, 0);
packet_add(pkt, &data);
icmp_tap_handler(c, PIF_TAP, AF_INET,
&iph->saddr, &iph->daddr,
@@ -775,15 +781,17 @@ resume:
continue;
}
- uh = packet_get(in, i, sizeof(*eh) + hlen, sizeof(*uh), NULL);
+ uh = IOV_PEEK_HEADER(&data, uh_storage);
if (!uh)
continue;
if (iph->protocol == IPPROTO_UDP) {
+ struct iov_tail eh_data;
+
PACKET_POOL_P(pkt, 1, in->buf, in->buf_size);
- data = IOV_TAIL_FROM_BUF((void *)eh, l2len, 0);
- packet_add(pkt, &data);
+ packet_data(in, i, &eh_data);
+ packet_add(pkt, &eh_data);
if (dhcp(c, pkt))
continue;
}
@@ -834,7 +842,6 @@ resume:
#undef L4_SET
append:
- data = IOV_TAIL_FROM_BUF((void *)l4h, l4len, 0);
packet_add((struct pool *)&seq->p, &data);
}
--
2.49.0
^ permalink raw reply related [flat|nested] 66+ messages in thread
* [PATCH v8 20/30] tap: Convert tap6_handler() to iov_tail
2025-08-05 15:45 [PATCH v8 00/30] Introduce discontiguous frames management Laurent Vivier
` (18 preceding siblings ...)
2025-08-05 15:46 ` [PATCH v8 19/30] tap: Convert tap4_handler() to iov_tail Laurent Vivier
@ 2025-08-05 15:46 ` Laurent Vivier
2025-08-06 6:21 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 21/30] packet: rename packet_data() to packet_get() Laurent Vivier
` (9 subsequent siblings)
29 siblings, 1 reply; 66+ messages in thread
From: Laurent Vivier @ 2025-08-05 15:46 UTC (permalink / raw)
To: passt-dev; +Cc: Laurent Vivier
Use packet_data() and extract headers using IOV_REMOVE_HEADER()
and IOV_PEEK_HEADER() rather than packet_get().
Remove packet_get() as it is not used anymore.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
packet.c | 70 --------------------------------------------------------
packet.h | 11 ---------
tap.c | 25 ++++++++++++--------
3 files changed, 16 insertions(+), 90 deletions(-)
diff --git a/packet.c b/packet.c
index 014b353cdf8b..5da18bafa576 100644
--- a/packet.c
+++ b/packet.c
@@ -121,76 +121,6 @@ void packet_add_do(struct pool *p, struct iov_tail *data,
p->count++;
}
-/**
- * packet_get_try_do() - Get data range from packet descriptor from given pool
- * @p: Packet pool
- * @idx: Index of packet descriptor in pool
- * @offset: Offset of data range in packet descriptor
- * @len: Length of desired data range
- * @left: Length of available data after range, set on return, can be NULL
- * @func: For tracing: name of calling function
- * @line: For tracing: caller line of function call
- *
- * Return: pointer to start of data range, NULL on invalid range or descriptor
- */
-/* cppcheck-suppress [staticFunction] */
-void *packet_get_try_do(const struct pool *p, size_t idx, size_t offset,
- size_t len, size_t *left, const char *func, int line)
-{
- char *ptr;
-
- ASSERT_WITH_MSG(p->count <= p->size,
- "Corrupt pool count: %zu, size: %zu, %s:%i",
- p->count, p->size, func, line);
-
- if (idx >= p->count) {
- debug("packet %zu from pool count: %zu, %s:%i",
- idx, p->count, func, line);
- return NULL;
- }
-
- if (offset > p->pkt[idx].iov_len ||
- len > (p->pkt[idx].iov_len - offset))
- return NULL;
-
- ptr = (char *)p->pkt[idx].iov_base + offset;
-
- ASSERT_WITH_MSG(!packet_check_range(p, ptr, len, func, line),
- "Corrupt packet pool, %s:%i", func, line);
-
- if (left)
- *left = p->pkt[idx].iov_len - offset - len;
-
- return ptr;
-}
-
-/**
- * packet_get_do() - Get data range from packet descriptor from given pool
- * @p: Packet pool
- * @idx: Index of packet descriptor in pool
- * @offset: Offset of data range in packet descriptor
- * @len: Length of desired data range
- * @left: Length of available data after range, set on return, can be NULL
- * @func: For tracing: name of calling function
- * @line: For tracing: caller line of function call
- *
- * Return: as packet_get_try_do() but log a trace message when returning NULL
- */
-void *packet_get_do(const struct pool *p, const size_t idx,
- size_t offset, size_t len, size_t *left,
- const char *func, int line)
-{
- void *r = packet_get_try_do(p, idx, offset, len, left, func, line);
-
- if (!r) {
- trace("missing packet data length %zu, offset %zu from "
- "length %zu, %s:%i",
- len, offset, p->pkt[idx].iov_len, func, line);
- }
-
- return r;
-}
-
/**
* packet_data_do() - Get data range from packet descriptor from given pool
* @p: Packet pool
diff --git a/packet.h b/packet.h
index 062afb978124..dab8274fa5c5 100644
--- a/packet.h
+++ b/packet.h
@@ -33,12 +33,6 @@ struct pool {
int vu_packet_check_range(void *buf, const char *ptr, size_t len);
void packet_add_do(struct pool *p, struct iov_tail *data,
const char *func, int line);
-void *packet_get_try_do(const struct pool *p, const size_t idx,
- size_t offset, size_t len, size_t *left,
- const char *func, int line);
-void *packet_get_do(const struct pool *p, const size_t idx,
- size_t offset, size_t len, size_t *left,
- const char *func, int line);
bool packet_data_do(const struct pool *p, const size_t idx,
struct iov_tail *data,
const char *func, int line);
@@ -47,11 +41,6 @@ void pool_flush(struct pool *p);
#define packet_add(p, data) \
packet_add_do(p, data, __func__, __LINE__)
-
-#define packet_get_try(p, idx, offset, len, left) \
- packet_get_try_do(p, idx, offset, len, left, __func__, __LINE__)
-#define packet_get(p, idx, offset, len, left) \
- packet_get_do(p, idx, offset, len, left, __func__, __LINE__)
#define packet_data(p, idx, data) \
packet_data_do(p, idx, data, __func__, __LINE__)
diff --git a/tap.c b/tap.c
index 4fbcad3b385f..983f39ee8ee8 100644
--- a/tap.c
+++ b/tap.c
@@ -896,21 +896,28 @@ resume:
for (seq_count = 0, seq = NULL; i < in->count; i++) {
size_t l4len, plen, check;
struct in6_addr *saddr, *daddr;
+ struct ipv6hdr ip6h_storage;
+ struct ethhdr eh_storage;
+ struct udphdr uh_storage;
const struct ethhdr *eh;
const struct udphdr *uh;
struct iov_tail data;
struct ipv6hdr *ip6h;
uint8_t proto;
- char *l4h;
- eh = packet_get(in, i, 0, sizeof(*eh), NULL);
+ if (!packet_data(in, i, &data))
+ return -1;
+
+ eh = IOV_REMOVE_HEADER(&data, eh_storage);
if (!eh)
continue;
- ip6h = packet_get(in, i, sizeof(*eh), sizeof(*ip6h), &check);
+ ip6h = IOV_PEEK_HEADER(&data, ip6h_storage);
if (!ip6h)
continue;
+ check = iov_tail_size(&data) - sizeof(*ip6h);
+
saddr = &ip6h->saddr;
daddr = &ip6h->daddr;
@@ -918,10 +925,8 @@ resume:
if (plen != check)
continue;
- data = IOV_TAIL_FROM_BUF(ip6h, sizeof(*ip6h) + check, 0);
if (!ipv6_l4hdr(&data, &proto, &l4len))
continue;
- l4h = (char *)data.iov[0].iov_base + data.off;
if (IN6_IS_ADDR_LOOPBACK(saddr) || IN6_IS_ADDR_LOOPBACK(daddr)) {
char sstr[INET6_ADDRSTRLEN], dstr[INET6_ADDRSTRLEN];
@@ -946,6 +951,8 @@ resume:
}
if (proto == IPPROTO_ICMPV6) {
+ struct icmp6hdr l4h_storage;
+ const struct icmp6hdr *l4h;
PACKET_POOL_P(pkt, 1, in->buf, in->buf_size);
if (c->no_icmp)
@@ -954,9 +961,9 @@ resume:
if (l4len < sizeof(struct icmp6hdr))
continue;
- data = IOV_TAIL_FROM_BUF(l4h, l4len, 0);
packet_add(pkt, &data);
+ l4h = IOV_PEEK_HEADER(&data, l4h_storage);
if (ndp(c, (struct icmp6hdr *)l4h, saddr, pkt))
continue;
@@ -969,12 +976,13 @@ resume:
if (l4len < sizeof(*uh))
continue;
- uh = (struct udphdr *)l4h;
+ uh = IOV_PEEK_HEADER(&data, uh_storage);
+ if (!uh)
+ continue;
if (proto == IPPROTO_UDP) {
PACKET_POOL_P(pkt, 1, in->buf, in->buf_size);
- data = IOV_TAIL_FROM_BUF(l4h, l4len, 0);
packet_add(pkt, &data);
if (dhcpv6(c, pkt, saddr, daddr))
@@ -1031,7 +1039,6 @@ resume:
#undef L4_SET
append:
- data = IOV_TAIL_FROM_BUF(l4h, l4len, 0);
packet_add((struct pool *)&seq->p, &data);
}
--
2.49.0
^ permalink raw reply related [flat|nested] 66+ messages in thread
* [PATCH v8 21/30] packet: rename packet_data() to packet_get()
2025-08-05 15:45 [PATCH v8 00/30] Introduce discontiguous frames management Laurent Vivier
` (19 preceding siblings ...)
2025-08-05 15:46 ` [PATCH v8 20/30] tap: Convert tap6_handler() " Laurent Vivier
@ 2025-08-05 15:46 ` Laurent Vivier
2025-08-06 6:22 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 22/30] arp: use iov_tail rather than pool Laurent Vivier
` (8 subsequent siblings)
29 siblings, 1 reply; 66+ messages in thread
From: Laurent Vivier @ 2025-08-05 15:46 UTC (permalink / raw)
To: passt-dev; +Cc: Laurent Vivier
As we have removed packet_get(), we can rename packet_data() to packet_get()
as the name is clearer.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
arp.c | 2 +-
dhcp.c | 2 +-
dhcpv6.c | 2 +-
icmp.c | 2 +-
ndp.c | 2 +-
packet.c | 8 ++++----
packet.h | 9 ++++-----
tap.c | 6 +++---
tcp.c | 4 ++--
udp.c | 4 ++--
10 files changed, 20 insertions(+), 21 deletions(-)
diff --git a/arp.c b/arp.c
index b3ac42082841..8b97df633e70 100644
--- a/arp.c
+++ b/arp.c
@@ -82,7 +82,7 @@ int arp(const struct ctx *c, const struct pool *p)
const struct arpmsg *am;
struct iov_tail data;
- if (!packet_data(p, 0, &data))
+ if (!packet_get(p, 0, &data))
return -1;
eh = IOV_REMOVE_HEADER(&data, eh_storage);
diff --git a/dhcp.c b/dhcp.c
index cf73d4b07767..47317f334945 100644
--- a/dhcp.c
+++ b/dhcp.c
@@ -317,7 +317,7 @@ int dhcp(const struct ctx *c, const struct pool *p)
unsigned int i;
struct msg m_storage;
- if (!packet_data(p, 0, &data))
+ if (!packet_get(p, 0, &data))
return -1;
eh = IOV_REMOVE_HEADER(&data, eh_storage);
diff --git a/dhcpv6.c b/dhcpv6.c
index e93acaf9955e..f54a75c642df 100644
--- a/dhcpv6.c
+++ b/dhcpv6.c
@@ -562,7 +562,7 @@ int dhcpv6(struct ctx *c, const struct pool *p,
const struct udphdr *uh;
size_t mlen, n;
- if (!packet_data(p, 0, &data))
+ if (!packet_get(p, 0, &data))
return -1;
uh = IOV_REMOVE_HEADER(&data, uh_storage);
diff --git a/icmp.c b/icmp.c
index fdfc857b5ae8..71c496540310 100644
--- a/icmp.c
+++ b/icmp.c
@@ -251,7 +251,7 @@ int icmp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
(void)saddr;
ASSERT(pif == PIF_TAP);
- if (!packet_data(p, 0, &data))
+ if (!packet_get(p, 0, &data))
return -1;
if (af == AF_INET) {
diff --git a/ndp.c b/ndp.c
index 5de4e508dc52..ba87a0aaa6e9 100644
--- a/ndp.c
+++ b/ndp.c
@@ -354,7 +354,7 @@ int ndp(const struct ctx *c, const struct icmp6hdr *ih,
const struct ndp_ns *ns;
struct iov_tail data;
- if (!packet_data(p, 0, &data))
+ if (!packet_get(p, 0, &data))
return -1;
ns = IOV_REMOVE_HEADER(&data, ns_storage);
diff --git a/packet.c b/packet.c
index 5da18bafa576..cbc43c2fc22d 100644
--- a/packet.c
+++ b/packet.c
@@ -122,7 +122,7 @@ void packet_add_do(struct pool *p, struct iov_tail *data,
}
/**
- * packet_data_do() - Get data range from packet descriptor from given pool
+ * packet_get_do() - Get data range from packet descriptor from given pool
* @p: Packet pool
* @idx: Index of packet descriptor in pool
* @data: IOV tail to store the address of the data (output)
@@ -132,9 +132,9 @@ void packet_add_do(struct pool *p, struct iov_tail *data,
* Return: false if packet index is invalid, true otherwise.
* If something wrong with @data, don't return at all (assert).
*/
-bool packet_data_do(const struct pool *p, size_t idx,
- struct iov_tail *data,
- const char *func, int line)
+bool packet_get_do(const struct pool *p, size_t idx,
+ struct iov_tail *data,
+ const char *func, int line)
{
size_t i;
diff --git a/packet.h b/packet.h
index dab8274fa5c5..7afe80ef3fcf 100644
--- a/packet.h
+++ b/packet.h
@@ -33,16 +33,15 @@ struct pool {
int vu_packet_check_range(void *buf, const char *ptr, size_t len);
void packet_add_do(struct pool *p, struct iov_tail *data,
const char *func, int line);
-bool packet_data_do(const struct pool *p, const size_t idx,
- struct iov_tail *data,
- const char *func, int line);
+bool packet_get_do(const struct pool *p, const size_t idx,
+ struct iov_tail *data, const char *func, int line);
bool pool_full(const struct pool *p);
void pool_flush(struct pool *p);
#define packet_add(p, data) \
packet_add_do(p, data, __func__, __LINE__)
-#define packet_data(p, idx, data) \
- packet_data_do(p, idx, data, __func__, __LINE__)
+#define packet_get(p, idx, data) \
+ packet_get_do(p, idx, data, __func__, __LINE__)
#define PACKET_POOL_DECL(_name, _size, _buf) \
struct _name ## _t { \
diff --git a/tap.c b/tap.c
index 983f39ee8ee8..1d2e6fd802e9 100644
--- a/tap.c
+++ b/tap.c
@@ -715,7 +715,7 @@ resume:
struct iov_tail data;
struct iphdr *iph;
- if (!packet_data(in, i, &data))
+ if (!packet_get(in, i, &data))
continue;
eh = IOV_PEEK_HEADER(&data, eh_storage);
@@ -790,7 +790,7 @@ resume:
PACKET_POOL_P(pkt, 1, in->buf, in->buf_size);
- packet_data(in, i, &eh_data);
+ packet_get(in, i, &eh_data);
packet_add(pkt, &eh_data);
if (dhcp(c, pkt))
continue;
@@ -905,7 +905,7 @@ resume:
struct ipv6hdr *ip6h;
uint8_t proto;
- if (!packet_data(in, i, &data))
+ if (!packet_get(in, i, &data))
return -1;
eh = IOV_REMOVE_HEADER(&data, eh_storage);
diff --git a/tcp.c b/tcp.c
index e0efc4cacb9b..4ba066fd1cac 100644
--- a/tcp.c
+++ b/tcp.c
@@ -1657,7 +1657,7 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn,
size_t off, size;
int count;
- if (!packet_data(p, i, &data))
+ if (!packet_get(p, i, &data))
return -1;
th = IOV_PEEK_HEADER(&data, th_storage);
@@ -1988,7 +1988,7 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
(void)pif;
- if (!packet_data(p, idx, &data))
+ if (!packet_get(p, idx, &data))
return 1;
l4len = iov_tail_size(&data);
diff --git a/udp.c b/udp.c
index 3c25f2e0ae97..86585b7e0942 100644
--- a/udp.c
+++ b/udp.c
@@ -990,7 +990,7 @@ int udp_tap_handler(const struct ctx *c, uint8_t pif,
ASSERT(!c->no_udp);
- if (!packet_data(p, idx, &data))
+ if (!packet_get(p, idx, &data))
return 1;
uh = IOV_PEEK_HEADER(&data, uh_storage);
@@ -1033,7 +1033,7 @@ int udp_tap_handler(const struct ctx *c, uint8_t pif,
for (i = 0, j = 0; i < (int)p->count - idx && j < UIO_MAXIOV; i++) {
const struct udphdr *uh_send;
- if (!packet_data(p, idx + i, &data))
+ if (!packet_get(p, idx + i, &data))
return p->count - idx;
uh_send = IOV_REMOVE_HEADER(&data, uh_storage);
--
2.49.0
^ permalink raw reply related [flat|nested] 66+ messages in thread
* [PATCH v8 22/30] arp: use iov_tail rather than pool
2025-08-05 15:45 [PATCH v8 00/30] Introduce discontiguous frames management Laurent Vivier
` (20 preceding siblings ...)
2025-08-05 15:46 ` [PATCH v8 21/30] packet: rename packet_data() to packet_get() Laurent Vivier
@ 2025-08-05 15:46 ` Laurent Vivier
2025-08-06 6:24 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 23/30] dhcp: " Laurent Vivier
` (7 subsequent siblings)
29 siblings, 1 reply; 66+ messages in thread
From: Laurent Vivier @ 2025-08-05 15:46 UTC (permalink / raw)
To: passt-dev; +Cc: Laurent Vivier
The arp() function signature is changed to accept `struct iov_tail *data`
directly, replacing the previous `const struct pool *p` parameter.
Consequently, arp() no longer fetches packet data internally using
packet_get(), streamlining its logic.
This simplifies callers like tap4_handler(), which now pass the iov_tail
for the L2 ARP frame directly, removing intermediate pool handling.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
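Note, not part of the patch: a minimal standalone sketch of the
header-removal pattern that arp() now applies to the tail it is given.
struct tail and tail_pull() are hypothetical stand-ins for struct iov_tail
and IOV_REMOVE_HEADER(); the real helpers live in iov.c and iov.h and may
behave differently (alignment handling, for one, is ignored here).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical stand-in for struct iov_tail: an iovec array plus the
 * number of bytes already consumed from its start.
 */
struct tail {
	const struct iovec *iov;
	size_t cnt;
	size_t off;
};

/* Remove @len bytes of header from the front of @t.
 *
 * If the header sits in a single buffer, a pointer into that buffer is
 * returned and @storage is left alone. If it straddles buffers, the bytes
 * are gathered into @storage, which is returned instead.
 *
 * Return: pointer to the header, NULL (with @t unchanged) if fewer than
 *	   @len bytes remain
 */
static const void *tail_pull(struct tail *t, void *storage, size_t len)
{
	size_t off = t->off, i = 0, done = 0;
	const void *ret;

	assert(t && storage);

	/* Skip buffers that lie entirely before the offset */
	while (i < t->cnt && off >= t->iov[i].iov_len)
		off -= t->iov[i++].iov_len;

	if (i < t->cnt && t->iov[i].iov_len - off >= len) {
		ret = (const char *)t->iov[i].iov_base + off;
	} else {
		for (; i < t->cnt && done < len; i++, off = 0) {
			size_t n = t->iov[i].iov_len - off;

			if (n > len - done)
				n = len - done;
			memcpy((char *)storage + done,
			       (const char *)t->iov[i].iov_base + off, n);
			done += n;
		}
		if (done < len)
			return NULL;
		ret = storage;
	}

	t->off += len;
	return ret;
}
```

A handler written this way does not care where the frame comes from: the
caller supplies the tail, and the storage variable is only touched when a
header spans two buffers.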
arp.c | 14 +++++---------
arp.h | 2 +-
tap.c | 5 +----
3 files changed, 7 insertions(+), 14 deletions(-)
diff --git a/arp.c b/arp.c
index 8b97df633e70..44677ad15b93 100644
--- a/arp.c
+++ b/arp.c
@@ -63,11 +63,11 @@ static bool ignore_arp(const struct ctx *c,
/**
* arp() - Check if this is a supported ARP message, reply as needed
* @c: Execution context
- * @p: Packet pool, single packet with Ethernet buffer
+ * @data: Single packet with Ethernet buffer
*
* Return: 1 if handled, -1 on failure
*/
-int arp(const struct ctx *c, const struct pool *p)
+int arp(const struct ctx *c, struct iov_tail *data)
{
struct {
struct ethhdr eh;
@@ -80,14 +80,10 @@ int arp(const struct ctx *c, const struct pool *p)
const struct ethhdr *eh;
const struct arphdr *ah;
const struct arpmsg *am;
- struct iov_tail data;
- if (!packet_get(p, 0, &data))
- return -1;
-
- eh = IOV_REMOVE_HEADER(&data, eh_storage);
- ah = IOV_REMOVE_HEADER(&data, ah_storage);
- am = IOV_REMOVE_HEADER(&data, am_storage);
+ eh = IOV_REMOVE_HEADER(data, eh_storage);
+ ah = IOV_REMOVE_HEADER(data, ah_storage);
+ am = IOV_REMOVE_HEADER(data, am_storage);
if (!eh || !ah || !am)
return -1;
diff --git a/arp.h b/arp.h
index ac5cd16e47f4..86bcbf878eda 100644
--- a/arp.h
+++ b/arp.h
@@ -20,6 +20,6 @@ struct arpmsg {
unsigned char tip[4];
} __attribute__((__packed__));
-int arp(const struct ctx *c, const struct pool *p);
+int arp(const struct ctx *c, struct iov_tail *data);
#endif /* ARP_H */
diff --git a/tap.c b/tap.c
index 1d2e6fd802e9..ace735cfc136 100644
--- a/tap.c
+++ b/tap.c
@@ -722,10 +722,7 @@ resume:
if (!eh)
continue;
if (ntohs(eh->h_proto) == ETH_P_ARP) {
- PACKET_POOL_P(pkt, 1, in->buf, in->buf_size);
-
- packet_add(pkt, &data);
- arp(c, pkt);
+ arp(c, &data);
continue;
}
--
2.49.0
* [PATCH v8 23/30] dhcp: use iov_tail rather than pool
2025-08-05 15:45 [PATCH v8 00/30] Introduce discontiguous frames management Laurent Vivier
` (21 preceding siblings ...)
2025-08-05 15:46 ` [PATCH v8 22/30] arp: use iov_tail rather than pool Laurent Vivier
@ 2025-08-05 15:46 ` Laurent Vivier
2025-08-06 6:26 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 24/30] dhcpv6: " Laurent Vivier
` (6 subsequent siblings)
29 siblings, 1 reply; 66+ messages in thread
From: Laurent Vivier @ 2025-08-05 15:46 UTC (permalink / raw)
To: passt-dev; +Cc: Laurent Vivier
This patch refactors the dhcp() function to accept `struct iov_tail *data`
directly as its packet input, replacing the previous `const struct pool *p`
parameter. Consequently, dhcp() no longer fetches packet data internally
using packet_get().
This change simplifies callers, such as tap4_handler(), which now pass
the iov_tail representing the L2 frame directly to dhcp(). This removes
the need for intermediate packet pool handling for DHCP processing.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
dhcp.c | 32 ++++++++++++++------------------
dhcp.h | 2 +-
tap.c | 5 +----
3 files changed, 16 insertions(+), 23 deletions(-)
diff --git a/dhcp.c b/dhcp.c
index 47317f334945..cad1d037cee1 100644
--- a/dhcp.c
+++ b/dhcp.c
@@ -296,11 +296,11 @@ static void opt_set_dns_search(const struct ctx *c, size_t max_len)
/**
* dhcp() - Check if this is a DHCP message, reply as needed
* @c: Execution context
- * @p: Packet pool, single packet with Ethernet buffer
+ * @data: Single packet with Ethernet buffer
*
* Return: 0 if it's not a DHCP message, 1 if handled, -1 on failure
*/
-int dhcp(const struct ctx *c, const struct pool *p)
+int dhcp(const struct ctx *c, struct iov_tail *data)
{
char macstr[ETH_ADDRSTRLEN];
size_t mlen, dlen, opt_len;
@@ -311,24 +311,20 @@ int dhcp(const struct ctx *c, const struct pool *p)
const struct ethhdr *eh;
const struct iphdr *iph;
const struct udphdr *uh;
- struct iov_tail data;
struct msg const *m;
struct msg reply;
unsigned int i;
struct msg m_storage;
- if (!packet_get(p, 0, &data))
- return -1;
-
- eh = IOV_REMOVE_HEADER(&data, eh_storage);
- iph = IOV_PEEK_HEADER(&data, iph_storage);
+ eh = IOV_REMOVE_HEADER(data, eh_storage);
+ iph = IOV_PEEK_HEADER(data, iph_storage);
if (!eh || !iph)
return -1;
- if (!iov_tail_drop(&data, iph->ihl * 4UL))
+ if (!iov_tail_drop(data, iph->ihl * 4UL))
return -1;
- uh = IOV_REMOVE_HEADER(&data, uh_storage);
+ uh = IOV_REMOVE_HEADER(data, uh_storage);
if (!uh)
return -1;
@@ -338,8 +334,8 @@ int dhcp(const struct ctx *c, const struct pool *p)
if (c->no_dhcp)
return 1;
- mlen = iov_tail_size(&data);
- m = (struct msg const *)iov_remove_header_(&data, &m_storage,
+ mlen = iov_tail_size(data);
+ m = (struct msg const *)iov_remove_header_(data, &m_storage,
offsetof(struct msg, o),
__alignof__(struct msg));
if (!m ||
@@ -367,24 +363,24 @@ int dhcp(const struct ctx *c, const struct pool *p)
for (i = 0; i < ARRAY_SIZE(opts); i++)
opts[i].clen = -1;
- opt_len = iov_tail_size(&data);
+ opt_len = iov_tail_size(data);
while (opt_len >= 2) {
uint8_t olen_storage, type_storage;
const uint8_t *olen;
uint8_t *type;
- type = IOV_REMOVE_HEADER(&data, type_storage);
- olen = IOV_REMOVE_HEADER(&data, olen_storage);
+ type = IOV_REMOVE_HEADER(data, type_storage);
+ olen = IOV_REMOVE_HEADER(data, olen_storage);
if (!type || !olen)
return -1;
- opt_len = iov_tail_size(&data);
+ opt_len = iov_tail_size(data);
if (opt_len < *olen)
return -1;
- iov_to_buf(&data.iov[0], data.cnt, data.off, &opts[*type].c, *olen);
+ iov_to_buf(&data->iov[0], data->cnt, data->off, &opts[*type].c, *olen);
opts[*type].clen = *olen;
- iov_tail_drop(&data, *olen);
+ iov_tail_drop(data, *olen);
opt_len -= *olen;
}
diff --git a/dhcp.h b/dhcp.h
index 87aeecd8dec8..cd50c99b8856 100644
--- a/dhcp.h
+++ b/dhcp.h
@@ -6,7 +6,7 @@
#ifndef DHCP_H
#define DHCP_H
-int dhcp(const struct ctx *c, const struct pool *p);
+int dhcp(const struct ctx *c, struct iov_tail *data);
void dhcp_init(void);
#endif /* DHCP_H */
diff --git a/tap.c b/tap.c
index ace735cfc136..7d7e89304723 100644
--- a/tap.c
+++ b/tap.c
@@ -785,11 +785,8 @@ resume:
if (iph->protocol == IPPROTO_UDP) {
struct iov_tail eh_data;
- PACKET_POOL_P(pkt, 1, in->buf, in->buf_size);
-
packet_get(in, i, &eh_data);
- packet_add(pkt, &eh_data);
- if (dhcp(c, pkt))
+ if (dhcp(c, &eh_data))
continue;
}
--
2.49.0
* [PATCH v8 24/30] dhcpv6: use iov_tail rather than pool
2025-08-05 15:45 [PATCH v8 00/30] Introduce discontiguous frames management Laurent Vivier
` (22 preceding siblings ...)
2025-08-05 15:46 ` [PATCH v8 23/30] dhcp: " Laurent Vivier
@ 2025-08-05 15:46 ` Laurent Vivier
2025-08-06 6:27 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 25/30] icmp: " Laurent Vivier
` (5 subsequent siblings)
29 siblings, 1 reply; 66+ messages in thread
From: Laurent Vivier @ 2025-08-05 15:46 UTC (permalink / raw)
To: passt-dev; +Cc: Laurent Vivier
This patch refactors the dhcpv6() function to accept `struct iov_tail *data`
directly as its packet input, replacing the `const struct pool *p` parameter.
Consequently, dhcpv6() no longer fetches packet data internally using
packet_get().
This change simplifies callers, such as tap6_handler(), which now pass
the iov_tail representing the L4 UDP segment (DHCPv6 message) directly.
This removes the need for intermediate packet pool handling.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
dhcpv6.c | 29 +++++++++++++----------------
dhcpv6.h | 2 +-
tap.c | 6 ++----
3 files changed, 16 insertions(+), 21 deletions(-)
diff --git a/dhcpv6.c b/dhcpv6.c
index f54a75c642df..52611d742b0b 100644
--- a/dhcpv6.c
+++ b/dhcpv6.c
@@ -539,19 +539,19 @@ static size_t dhcpv6_client_fqdn_fill(const struct iov_tail *data,
/**
* dhcpv6() - Check if this is a DHCPv6 message, reply as needed
* @c: Execution context
- * @p: Packet pool, single packet starting from UDP header
+ * @data: Single packet starting from UDP header
* @saddr: Source IPv6 address of original message
* @daddr: Destination IPv6 address of original message
*
* Return: 0 if it's not a DHCPv6 message, 1 if handled, -1 on failure
*/
-int dhcpv6(struct ctx *c, const struct pool *p,
+int dhcpv6(struct ctx *c, struct iov_tail *data,
const struct in6_addr *saddr, const struct in6_addr *daddr)
{
const struct opt_server_id *server_id = NULL;
- struct iov_tail data, opt, client_id_base;
const struct opt_hdr *client_id = NULL;
struct opt_server_id server_id_storage;
+ struct iov_tail opt, client_id_base;
const struct opt_ia_na *ia = NULL;
struct opt_hdr client_id_storage;
struct opt_ia_na ia_storage;
@@ -562,10 +562,7 @@ int dhcpv6(struct ctx *c, const struct pool *p,
const struct udphdr *uh;
size_t mlen, n;
- if (!packet_get(p, 0, &data))
- return -1;
-
- uh = IOV_REMOVE_HEADER(&data, uh_storage);
+ uh = IOV_REMOVE_HEADER(data, uh_storage);
if (!uh)
return -1;
@@ -578,7 +575,7 @@ int dhcpv6(struct ctx *c, const struct pool *p,
if (!IN6_IS_ADDR_MULTICAST(daddr))
return -1;
- mlen = iov_tail_size(&data);
+ mlen = iov_tail_size(data);
if (mlen + sizeof(*uh) != ntohs(uh->len) || mlen < sizeof(*mh))
return -1;
@@ -586,23 +583,23 @@ int dhcpv6(struct ctx *c, const struct pool *p,
src = &c->ip6.our_tap_ll;
- mh = IOV_REMOVE_HEADER(&data, mh_storage);
+ mh = IOV_REMOVE_HEADER(data, mh_storage);
if (!mh)
return -1;
- client_id_base = data;
+ client_id_base = *data;
if (dhcpv6_opt(&client_id_base, OPT_CLIENTID))
client_id = IOV_PEEK_HEADER(&client_id_base, client_id_storage);
if (!client_id || ntohs(client_id->l) > OPT_VSIZE(client_id))
return -1;
- opt = data;
+ opt = *data;
if (dhcpv6_opt(&opt, OPT_SERVERID))
server_id = IOV_PEEK_HEADER(&opt, server_id_storage);
if (server_id && ntohs(server_id->hdr.l) != OPT_VSIZE(server_id))
return -1;
- opt = data;
+ opt = *data;
if (dhcpv6_opt(&opt, OPT_IA_NA))
ia = IOV_PEEK_HEADER(&opt, ia_storage);
if (ia && ntohs(ia->hdr.l) < MIN(OPT_VSIZE(ia_na), OPT_VSIZE(ia_ta)))
@@ -620,9 +617,9 @@ int dhcpv6(struct ctx *c, const struct pool *p,
if (mh->type == TYPE_CONFIRM && server_id)
return -1;
- if (dhcpv6_ia_notonlink(&data, &c->ip6.addr)) {
+ if (dhcpv6_ia_notonlink(data, &c->ip6.addr)) {
- dhcpv6_send_ia_notonlink(c, &data, &client_id_base,
+ dhcpv6_send_ia_notonlink(c, data, &client_id_base,
ntohs(client_id->l), mh->xid);
return 1;
@@ -635,7 +632,7 @@ int dhcpv6(struct ctx *c, const struct pool *p,
memcmp(&resp.server_id, server_id, sizeof(resp.server_id)))
return -1;
- if (ia || dhcpv6_opt(&data, OPT_IA_TA))
+ if (ia || dhcpv6_opt(data, OPT_IA_TA))
return -1;
info("DHCPv6: received INFORMATION_REQUEST, sending REPLY");
@@ -668,7 +665,7 @@ int dhcpv6(struct ctx *c, const struct pool *p,
n = offsetof(struct resp_t, client_id) +
sizeof(struct opt_hdr) + ntohs(client_id->l);
n = dhcpv6_dns_fill(c, (char *)&resp, n);
- n = dhcpv6_client_fqdn_fill(&data, c, (char *)&resp, n);
+ n = dhcpv6_client_fqdn_fill(data, c, (char *)&resp, n);
resp.hdr.xid = mh->xid;
diff --git a/dhcpv6.h b/dhcpv6.h
index 580998862227..c706dfdbb2ac 100644
--- a/dhcpv6.h
+++ b/dhcpv6.h
@@ -6,7 +6,7 @@
#ifndef DHCPV6_H
#define DHCPV6_H
-int dhcpv6(struct ctx *c, const struct pool *p,
+int dhcpv6(struct ctx *c, struct iov_tail *data,
struct in6_addr *saddr, struct in6_addr *daddr);
void dhcpv6_init(const struct ctx *c);
diff --git a/tap.c b/tap.c
index 7d7e89304723..3262b44c4287 100644
--- a/tap.c
+++ b/tap.c
@@ -975,11 +975,9 @@ resume:
continue;
if (proto == IPPROTO_UDP) {
- PACKET_POOL_P(pkt, 1, in->buf, in->buf_size);
-
- packet_add(pkt, &data);
+ struct iov_tail uh_data = data;
- if (dhcpv6(c, pkt, saddr, daddr))
+ if (dhcpv6(c, &uh_data, saddr, daddr))
continue;
}
--
2.49.0
* [PATCH v8 25/30] icmp: use iov_tail rather than pool
2025-08-05 15:45 [PATCH v8 00/30] Introduce discontiguous frames management Laurent Vivier
` (23 preceding siblings ...)
2025-08-05 15:46 ` [PATCH v8 24/30] dhcpv6: " Laurent Vivier
@ 2025-08-05 15:46 ` Laurent Vivier
2025-08-06 6:29 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 26/30] ndp: " Laurent Vivier
` (4 subsequent siblings)
29 siblings, 1 reply; 66+ messages in thread
From: Laurent Vivier @ 2025-08-05 15:46 UTC (permalink / raw)
To: passt-dev; +Cc: Laurent Vivier
As the iov_tail has a non-zero offset (because of the presence of
packet headers in the iov array), we must copy it to a new
iov array (using iov_tail_clone()) to pass it to sendmsg().
We can no longer use iov_tail_msghdr(), so remove it.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
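Note, not part of the patch: a minimal standalone sketch of why a tail
with a non-zero offset needs a fresh iovec array before sendmsg(), which
only takes whole iovec entries. struct tail and tail_clone() are
hypothetical stand-ins for struct iov_tail and iov_tail_clone(); the real
implementation lives in iov.c and may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical stand-in for struct iov_tail: an iovec array plus a byte
 * offset into it (here, the headers already consumed).
 */
struct tail {
	const struct iovec *iov;
	size_t cnt;
	size_t off;
};

/* Describe the part of @t past @t->off in @dst, using at most @dst_cnt
 * entries. The first entry is adjusted to start at the offset. Only the
 * descriptors are written, the packet bytes themselves are not copied.
 *
 * Return: number of entries used in @dst, -1 if @dst is too small
 */
static int tail_clone(struct iovec *dst, size_t dst_cnt, const struct tail *t)
{
	size_t off = t->off, i, n = 0;

	assert(dst && t);

	for (i = 0; i < t->cnt; i++) {
		if (off >= t->iov[i].iov_len) {
			/* Buffer entirely before the offset (or empty) */
			off -= t->iov[i].iov_len;
			continue;
		}

		if (n == dst_cnt)
			return -1;

		dst[n].iov_base = (char *)t->iov[i].iov_base + off;
		dst[n].iov_len = t->iov[i].iov_len - off;
		off = 0;
		n++;
	}

	return (int)n;
}
```

The resulting array can be assigned to msg_iov, with the returned count as
msg_iovlen, which is how the patch fills struct msghdr from an array of
MAX_IOV_ICMP entries.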
icmp.c | 30 +++++++++++++++++++-----------
icmp.h | 2 +-
iov.c | 23 -----------------------
iov.h | 2 --
tap.c | 7 ++-----
5 files changed, 22 insertions(+), 42 deletions(-)
diff --git a/icmp.c b/icmp.c
index 71c496540310..be800e30c369 100644
--- a/icmp.c
+++ b/icmp.c
@@ -44,6 +44,7 @@
#define ICMP_ECHO_TIMEOUT 60 /* s, timeout for ICMP socket activity */
#define ICMP_NUM_IDS (1U << 16)
+#define MAX_IOV_ICMP 16 /* Arbitrary, should be enough */
/**
* ping_at_sidx() - Get ping specific flow at given sidx
@@ -229,36 +230,33 @@ cancel:
* @af: Address family, AF_INET or AF_INET6
* @saddr: Source address
* @daddr: Destination address
- * @p: Packet pool, single packet with ICMP/ICMPv6 header
+ * @data: Single packet with ICMP/ICMPv6 header
* @now: Current timestamp
*
* Return: count of consumed packets (always 1, even if malformed)
*/
int icmp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
const void *saddr, const void *daddr,
- const struct pool *p, const struct timespec *now)
+ struct iov_tail *data, const struct timespec *now)
{
+ struct iovec iov[MAX_IOV_ICMP];
struct icmp_ping_flow *pingf;
const struct flowside *tgt;
union sockaddr_inany sa;
- struct iov_tail data;
struct msghdr msh;
uint16_t id, seq;
union flow *flow;
uint8_t proto;
- socklen_t sl;
+ int cnt;
(void)saddr;
ASSERT(pif == PIF_TAP);
- if (!packet_get(p, 0, &data))
- return -1;
-
if (af == AF_INET) {
struct icmphdr ih_storage;
const struct icmphdr *ih;
- ih = IOV_PEEK_HEADER(&data, ih_storage);
+ ih = IOV_PEEK_HEADER(data, ih_storage);
if (!ih)
return 1;
@@ -272,7 +270,7 @@ int icmp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
struct icmp6hdr ih_storage;
const struct icmp6hdr *ih;
- ih = IOV_PEEK_HEADER(&data, ih_storage);
+ ih = IOV_PEEK_HEADER(data, ih_storage);
if (!ih)
return 1;
@@ -286,6 +284,10 @@ int icmp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
ASSERT(0);
}
+ cnt = iov_tail_clone(&iov[0], MAX_IOV_ICMP, data);
+ if (cnt < 0)
+ return 1;
+
flow = flow_at_sidx(flow_lookup_af(c, proto, PIF_TAP,
af, saddr, daddr, id, id));
@@ -300,8 +302,14 @@ int icmp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
pingf->ts = now->tv_sec;
- pif_sockaddr(c, &sa, &sl, PIF_HOST, &tgt->eaddr, 0);
- iov_tail_msghdr(&msh, &data, &sa, sl);
+ pif_sockaddr(c, &sa, &msh.msg_namelen, PIF_HOST, &tgt->eaddr, 0);
+ msh.msg_name = &sa;
+ msh.msg_iov = iov;
+ msh.msg_iovlen = cnt;
+ msh.msg_control = NULL;
+ msh.msg_controllen = 0;
+ msh.msg_flags = 0;
+
if (sendmsg(pingf->sock, &msh, MSG_NOSIGNAL) < 0) {
flow_dbg_perror(pingf, "failed to relay request to socket");
} else {
diff --git a/icmp.h b/icmp.h
index 5ce22b5eca1f..d1cecb20e29d 100644
--- a/icmp.h
+++ b/icmp.h
@@ -14,7 +14,7 @@ struct icmp_ping_flow;
void icmp_sock_handler(const struct ctx *c, union epoll_ref ref);
int icmp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
const void *saddr, const void *daddr,
- const struct pool *p, const struct timespec *now);
+ struct iov_tail *data, const struct timespec *now);
void icmp_init(void);
/**
diff --git a/iov.c b/iov.c
index d17d4dd3da09..1d734acdfea6 100644
--- a/iov.c
+++ b/iov.c
@@ -157,29 +157,6 @@ size_t iov_size(const struct iovec *iov, size_t iov_cnt)
return len;
}
-/**
- * iov_tail_msghdr - Initialize a msghdr from an IOV tail structure
- * @msh: msghdr to initialize
- * @tail: iov_tail to use to set msg_iov and msg_iovlen
- * @msg_name: Pointer to set to msg_name
- * @msg_namelen: Size of @msg_name
- */
-void iov_tail_msghdr(struct msghdr *msh, struct iov_tail *tail,
- void *msg_name, socklen_t msg_namelen)
-{
- iov_tail_prune(tail);
-
- ASSERT(tail->off == 0);
-
- msh->msg_name = msg_name;
- msh->msg_namelen = msg_namelen;
- msh->msg_iov = (struct iovec *)tail->iov;
- msh->msg_iovlen = tail->cnt;
- msh->msg_control = NULL;
- msh->msg_controllen = 0;
- msh->msg_flags = 0;
-}
-
/**
* iov_tail_prune() - Remove any unneeded buffers from an IOV tail
* @tail: IO vector tail (modified)
diff --git a/iov.h b/iov.h
index 75c3b07a87e3..ccdb690ef3f1 100644
--- a/iov.h
+++ b/iov.h
@@ -82,8 +82,6 @@ struct iov_tail {
1, \
(off_))
-void iov_tail_msghdr(struct msghdr *msh, struct iov_tail *tail,
- void *msg_name, socklen_t msg_namelen);
bool iov_tail_prune(struct iov_tail *tail);
size_t iov_tail_size(struct iov_tail *tail);
bool iov_tail_drop(struct iov_tail *tail, size_t len);
diff --git a/tap.c b/tap.c
index 3262b44c4287..48152a84674c 100644
--- a/tap.c
+++ b/tap.c
@@ -764,17 +764,14 @@ resume:
continue;
if (iph->protocol == IPPROTO_ICMP) {
- PACKET_POOL_P(pkt, 1, in->buf, in->buf_size);
-
if (c->no_icmp)
continue;
tap_packet_debug(iph, NULL, NULL, 0, NULL, 1);
- packet_add(pkt, &data);
icmp_tap_handler(c, PIF_TAP, AF_INET,
&iph->saddr, &iph->daddr,
- pkt, now);
+ &data, now);
continue;
}
@@ -964,7 +961,7 @@ resume:
tap_packet_debug(NULL, ip6h, NULL, proto, NULL, 1);
icmp_tap_handler(c, PIF_TAP, AF_INET6,
- saddr, daddr, pkt, now);
+ saddr, daddr, &data, now);
continue;
}
--
2.49.0
* [PATCH v8 26/30] ndp: use iov_tail rather than pool
2025-08-05 15:45 [PATCH v8 00/30] Introduce discontiguous frames management Laurent Vivier
` (24 preceding siblings ...)
2025-08-05 15:46 ` [PATCH v8 25/30] icmp: " Laurent Vivier
@ 2025-08-05 15:46 ` Laurent Vivier
2025-08-06 6:31 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 27/30] packet: remove PACKET_POOL() and PACKET_POOL_P() Laurent Vivier
` (3 subsequent siblings)
29 siblings, 1 reply; 66+ messages in thread
From: Laurent Vivier @ 2025-08-05 15:46 UTC (permalink / raw)
To: passt-dev; +Cc: Laurent Vivier
The ndp() function signature is changed to accept `struct iov_tail *data`
directly, replacing the previous `const struct pool *p` and
`const struct icmp6hdr *ih` parameters.
This change simplifies callers, like tap6_handler(), which now provide
the iov_tail representing the L4 ICMPv6 segment directly to ndp().
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
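Note, not part of the patch: tap6_handler() hands ndp() a copy of the tail
(ndp_data) rather than data itself, presumably so that anything ndp()
consumes stays invisible to icmp_tap_handler(), which receives data
afterwards. A minimal sketch of that by-value copy, using hypothetical
stand-ins (struct tail, tail_size(), first_handler(), dispatch()) rather
than the real passt helpers:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical stand-in for struct iov_tail */
struct tail {
	const struct iovec *iov;
	size_t cnt;
	size_t off;
};

/* Return: bytes remaining in @t past its offset */
static size_t tail_size(const struct tail *t)
{
	size_t total = 0, i;

	for (i = 0; i < t->cnt; i++)
		total += t->iov[i].iov_len;

	return total > t->off ? total - t->off : 0;
}

/* Hypothetical first handler: consumes @hlen bytes of header from @t.
 *
 * Return: 1 if the header was present and removed, 0 otherwise
 */
static int first_handler(struct tail *t, size_t hlen)
{
	assert(t);

	if (tail_size(t) < hlen)
		return 0;

	t->off += hlen;
	return 1;
}

/* Caller: give the first handler a by-value copy, so that the original
 * tail still starts at the header. Only the small descriptor is copied,
 * not the iovec array or the packet bytes.
 *
 * Return: bytes still visible through @data afterwards
 */
static size_t dispatch(const struct tail *data, size_t hlen)
{
	struct tail copy = *data;

	first_handler(&copy, hlen);

	return tail_size(data);
}
```

Since the descriptor is three words long, the copy is cheap compared with
building a temporary single-packet pool for each handler call.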
ndp.c | 19 +++++++++++--------
ndp.h | 4 ++--
tap.c | 10 +++-------
3 files changed, 16 insertions(+), 17 deletions(-)
diff --git a/ndp.c b/ndp.c
index ba87a0aaa6e9..eb090cd2c5a7 100644
--- a/ndp.c
+++ b/ndp.c
@@ -336,13 +336,20 @@ static void ndp_ra(const struct ctx *c, const struct in6_addr *dst)
* ndp() - Check for NDP solicitations, reply as needed
* @c: Execution context
* @saddr: Source IPv6 address
- * @p: Packet pool
+ * @data: Single packet with ICMPv6 header
*
* Return: 0 if not handled here, 1 if handled, -1 on failure
*/
-int ndp(const struct ctx *c, const struct icmp6hdr *ih,
- const struct in6_addr *saddr, const struct pool *p)
+int ndp(const struct ctx *c, const struct in6_addr *saddr,
+ struct iov_tail *data)
{
+ struct icmp6hdr ih_storage;
+ const struct icmp6hdr *ih;
+
+ ih = IOV_PEEK_HEADER(data, ih_storage);
+ if (!ih)
+ return -1;
+
if (ih->icmp6_type < RS || ih->icmp6_type > NA)
return 0;
@@ -352,12 +359,8 @@ int ndp(const struct ctx *c, const struct icmp6hdr *ih,
if (ih->icmp6_type == NS) {
struct ndp_ns ns_storage;
const struct ndp_ns *ns;
- struct iov_tail data;
-
- if (!packet_get(p, 0, &data))
- return -1;
- ns = IOV_REMOVE_HEADER(&data, ns_storage);
+ ns = IOV_REMOVE_HEADER(data, ns_storage);
if (!ns)
return -1;
diff --git a/ndp.h b/ndp.h
index 41c2000356ec..b1dd5e82c085 100644
--- a/ndp.h
+++ b/ndp.h
@@ -8,8 +8,8 @@
struct icmp6hdr;
-int ndp(const struct ctx *c, const struct icmp6hdr *ih,
- const struct in6_addr *saddr, const struct pool *p);
+int ndp(const struct ctx *c, const struct in6_addr *saddr,
+ struct iov_tail *data);
void ndp_timer(const struct ctx *c, const struct timespec *now);
#endif /* NDP_H */
diff --git a/tap.c b/tap.c
index 48152a84674c..d327ec0c3d54 100644
--- a/tap.c
+++ b/tap.c
@@ -942,9 +942,7 @@ resume:
}
if (proto == IPPROTO_ICMPV6) {
- struct icmp6hdr l4h_storage;
- const struct icmp6hdr *l4h;
- PACKET_POOL_P(pkt, 1, in->buf, in->buf_size);
+ struct iov_tail ndp_data;
if (c->no_icmp)
continue;
@@ -952,10 +950,8 @@ resume:
if (l4len < sizeof(struct icmp6hdr))
continue;
- packet_add(pkt, &data);
-
- l4h = IOV_PEEK_HEADER(&data, l4h_storage);
- if (ndp(c, (struct icmp6hdr *)l4h, saddr, pkt))
+ ndp_data = data;
+ if (ndp(c, saddr, &ndp_data))
continue;
tap_packet_debug(NULL, ip6h, NULL, proto, NULL, 1);
--
2.49.0
* [PATCH v8 27/30] packet: remove PACKET_POOL() and PACKET_POOL_P()
2025-08-05 15:45 [PATCH v8 00/30] Introduce discontiguous frames management Laurent Vivier
` (25 preceding siblings ...)
2025-08-05 15:46 ` [PATCH v8 26/30] ndp: " Laurent Vivier
@ 2025-08-05 15:46 ` Laurent Vivier
2025-08-06 6:32 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 28/30] packet: remove unused parameter from PACKET_POOL_DECL() Laurent Vivier
` (2 subsequent siblings)
29 siblings, 1 reply; 66+ messages in thread
From: Laurent Vivier @ 2025-08-05 15:46 UTC (permalink / raw)
To: passt-dev; +Cc: Laurent Vivier
These macros are no longer used following the refactoring of packet
handlers to directly use iov_tail. Callers no longer require PACKET_POOL_P
for temporary pools, and PACKET_POOL can be replaced by PACKET_POOL_DECL
and separate initialization if needed.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
packet.h | 9 ---------
1 file changed, 9 deletions(-)
diff --git a/packet.h b/packet.h
index 7afe80ef3fcf..286b6b9994db 100644
--- a/packet.h
+++ b/packet.h
@@ -59,19 +59,10 @@ struct _name ## _t { \
.size = _size, \
}
-#define PACKET_POOL(name, size, buf, buf_size) \
- PACKET_POOL_DECL(name, size, buf) name = \
- PACKET_POOL_INIT_NOCAST(size, buf, buf_size)
-
#define PACKET_INIT(name, size, buf, buf_size) \
(struct name ## _t) PACKET_POOL_INIT_NOCAST(size, buf, buf_size)
#define PACKET_POOL_NOINIT(name, size, buf) \
PACKET_POOL_DECL(name, size, buf) name ## _storage; \
static struct pool *name = (struct pool *)&name ## _storage
-
-#define PACKET_POOL_P(name, size, buf, buf_size) \
- PACKET_POOL(name ## _storage, size, buf, buf_size); \
- struct pool *name = (struct pool *)&name ## _storage
-
#endif /* PACKET_H */
--
2.49.0
* [PATCH v8 28/30] packet: remove unused parameter from PACKET_POOL_DECL()
2025-08-05 15:45 [PATCH v8 00/30] Introduce discontiguous frames management Laurent Vivier
` (26 preceding siblings ...)
2025-08-05 15:46 ` [PATCH v8 27/30] packet: remove PACKET_POOL() and PACKET_POOL_P() Laurent Vivier
@ 2025-08-05 15:46 ` Laurent Vivier
2025-08-06 6:33 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 29/30] packet: Refactor vhost-user memory region handling Laurent Vivier
2025-08-05 15:46 ` [PATCH v8 30/30] packet: Add support for multi-vector packets Laurent Vivier
29 siblings, 1 reply; 66+ messages in thread
From: Laurent Vivier @ 2025-08-05 15:46 UTC (permalink / raw)
To: passt-dev; +Cc: Laurent Vivier
The _buf parameter of PACKET_POOL_DECL() is not used in the macro. Remove it.
Also remove it from PACKET_POOL_NOINIT(), which only took it to pass it
on to PACKET_POOL_DECL().
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
packet.h | 6 +++---
tap.c | 6 +++---
2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/packet.h b/packet.h
index 286b6b9994db..43b9022075d1 100644
--- a/packet.h
+++ b/packet.h
@@ -43,7 +43,7 @@ void pool_flush(struct pool *p);
#define packet_get(p, idx, data) \
packet_get_do(p, idx, data, __func__, __LINE__)
-#define PACKET_POOL_DECL(_name, _size, _buf) \
+#define PACKET_POOL_DECL(_name, _size) \
struct _name ## _t { \
char *buf; \
size_t buf_size; \
@@ -62,7 +62,7 @@ struct _name ## _t { \
#define PACKET_INIT(name, size, buf, buf_size) \
(struct name ## _t) PACKET_POOL_INIT_NOCAST(size, buf, buf_size)
-#define PACKET_POOL_NOINIT(name, size, buf) \
- PACKET_POOL_DECL(name, size, buf) name ## _storage; \
+#define PACKET_POOL_NOINIT(name, size) \
+ PACKET_POOL_DECL(name, size) name ## _storage; \
static struct pool *name = (struct pool *)&name ## _storage
#endif /* PACKET_H */
diff --git a/tap.c b/tap.c
index d327ec0c3d54..bbc786468455 100644
--- a/tap.c
+++ b/tap.c
@@ -95,8 +95,8 @@ CHECK_FRAME_LEN(L2_MAX_LEN_VU);
ETH_HLEN + sizeof(struct ipv6hdr) + sizeof(struct udphdr))
/* IPv4 (plus ARP) and IPv6 message batches from tap/guest to IP handlers */
-static PACKET_POOL_NOINIT(pool_tap4, TAP_MSGS_IP4, pkt_buf);
-static PACKET_POOL_NOINIT(pool_tap6, TAP_MSGS_IP6, pkt_buf);
+static PACKET_POOL_NOINIT(pool_tap4, TAP_MSGS_IP4);
+static PACKET_POOL_NOINIT(pool_tap6, TAP_MSGS_IP6);
#define TAP_SEQS 128 /* Different L4 tuples in one batch */
#define FRAGMENT_MSG_RATE 10 /* # seconds between fragment warnings */
@@ -555,7 +555,7 @@ void eth_update_mac(struct ethhdr *eh,
memcpy(eh->h_source, eth_s, sizeof(eh->h_source));
}
-PACKET_POOL_DECL(pool_l4, UIO_MAXIOV, pkt_buf);
+PACKET_POOL_DECL(pool_l4, UIO_MAXIOV);
/**
* struct l4_seq4_t - Message sequence for one protocol handler call, IPv4
--
2.49.0
* [PATCH v8 29/30] packet: Refactor vhost-user memory region handling
2025-08-05 15:45 [PATCH v8 00/30] Introduce discontiguous frames management Laurent Vivier
` (27 preceding siblings ...)
2025-08-05 15:46 ` [PATCH v8 28/30] packet: remove unused parameter from PACKET_POOL_DECL() Laurent Vivier
@ 2025-08-05 15:46 ` Laurent Vivier
2025-08-07 6:10 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 30/30] packet: Add support for multi-vector packets Laurent Vivier
29 siblings, 1 reply; 66+ messages in thread
From: Laurent Vivier @ 2025-08-05 15:46 UTC (permalink / raw)
To: passt-dev; +Cc: Laurent Vivier
This patch refactors the handling of vhost-user memory regions by
introducing a new `struct vdev_memory` to encapsulate the regions
array and their count (`nregions`) within the main `vu_dev` structure.
This new `vdev_memory` structure is then passed to the packet pool by
re-using the existing `p->buf` field. A `p->buf_size` of 0 indicates
that `p->buf` holds a pointer to `struct vdev_memory` instead of a
regular packet buffer. A new helper, `get_vdev_memory()`, is added to
abstract this access pattern.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
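Note, not part of the patch: a minimal standalone sketch of the convention
described above, where a zero buffer size marks the buffer pointer as
referring to memory-region metadata instead of packet storage.
struct regions, struct pool_like and pool_regions() are hypothetical
stand-ins for struct vdev_memory, struct pool and get_vdev_memory().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vdev_memory */
struct regions {
	unsigned int nregions;
};

/* Hypothetical stand-in for the relevant fields of struct pool */
struct pool_like {
	char *buf;		/* Packet buffer, or struct regions if !buf_size */
	size_t buf_size;	/* 0 means @buf points to a struct regions */
};

/* Return: the regions behind @p, NULL if @p has a regular packet buffer */
static struct regions *pool_regions(const struct pool_like *p)
{
	assert(p);

	if (p->buf_size)
		return NULL;

	return (struct regions *)p->buf;
}
```

In this sketch a pool with size 0 and a NULL buffer also yields NULL, so a
single NULL check on the helper's result covers both "regular buffer" and
"no regions set yet".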
packet.c | 18 ++++++++++++++++--
packet.h | 6 ++++--
tap.c | 4 ++--
tap.h | 1 -
vhost_user.c | 28 +++++++++++-----------------
virtio.c | 4 ++--
virtio.h | 18 ++++++++++++++----
vu_common.c | 22 ++++++++++++----------
8 files changed, 61 insertions(+), 40 deletions(-)
diff --git a/packet.c b/packet.c
index cbc43c2fc22d..4b93688509a4 100644
--- a/packet.c
+++ b/packet.c
@@ -22,6 +22,20 @@
#include "util.h"
#include "log.h"
+/**
+ * get_vdev_memory() - Return a pointer to the memory regions of the pool
+ * @p: Packet pool
+ *
+ * Return: Null if none, otherwise a pointer to vdev_memory structure
+ */
+static struct vdev_memory *get_vdev_memory(const struct pool *p)
+{
+ if (p->buf_size)
+ return NULL;
+
+ return (struct vdev_memory *)p->buf;
+}
+
/**
* packet_check_range() - Check if a memory range is valid for a pool
* @p: Packet pool
@@ -41,10 +55,10 @@ static int packet_check_range(const struct pool *p, const char *ptr, size_t len,
return -1;
}
- if (p->buf_size == 0) {
+ if (get_vdev_memory(p)) {
int ret;
- ret = vu_packet_check_range((void *)p->buf, ptr, len);
+ ret = vu_packet_check_range(get_vdev_memory(p), ptr, len);
if (ret == -1)
debug("cannot find region, %s:%i", func, line);
diff --git a/packet.h b/packet.h
index 43b9022075d1..e51cbd19fdc4 100644
--- a/packet.h
+++ b/packet.h
@@ -8,6 +8,7 @@
#include <stdbool.h>
#include "iov.h"
+#include "virtio.h"
/* Maximum size of a single packet stored in pool, including headers */
#define PACKET_MAX_LEN ((size_t)UINT16_MAX)
@@ -15,7 +16,7 @@
/**
* struct pool - Generic pool of packets stored in a buffer
* @buf: Buffer storing packet descriptors,
- * a struct vu_dev_region array for passt vhost-user mode
+ * a struct vdev_memory for passt vhost-user mode
* @buf_size: Total size of buffer,
* 0 for passt vhost-user mode
* @size: Number of usable descriptors for the pool
@@ -30,7 +31,8 @@ struct pool {
struct iovec pkt[];
};
-int vu_packet_check_range(void *buf, const char *ptr, size_t len);
+int vu_packet_check_range(struct vdev_memory *memory,
+ const char *ptr, size_t len);
void packet_add_do(struct pool *p, struct iov_tail *data,
const char *func, int line);
bool packet_get_do(const struct pool *p, const size_t idx,
diff --git a/tap.c b/tap.c
index bbc786468455..9fd00915bb01 100644
--- a/tap.c
+++ b/tap.c
@@ -1458,7 +1458,7 @@ static void tap_sock_tun_init(struct ctx *c)
* @base: Buffer base
* @size Buffer size
*/
-void tap_sock_update_pool(void *base, size_t size)
+static void tap_sock_update_pool(void *base, size_t size)
{
int i;
@@ -1479,8 +1479,8 @@ void tap_sock_update_pool(void *base, size_t size)
void tap_backend_init(struct ctx *c)
{
if (c->mode == MODE_VU) {
- tap_sock_update_pool(NULL, 0);
vu_init(c);
+ tap_sock_update_pool(&c->vdev->memory, 0);
} else {
tap_sock_update_pool(pkt_buf, sizeof(pkt_buf));
}
diff --git a/tap.h b/tap.h
index ce5510882d5d..21db4d219ecb 100644
--- a/tap.h
+++ b/tap.h
@@ -115,7 +115,6 @@ void tap_handler_passt(struct ctx *c, uint32_t events,
const struct timespec *now);
int tap_sock_unix_open(char *sock_path);
void tap_sock_reset(struct ctx *c);
-void tap_sock_update_pool(void *base, size_t size);
void tap_backend_init(struct ctx *c);
void tap_flush_pools(void);
void tap_handler(struct ctx *c, const struct timespec *now);
diff --git a/vhost_user.c b/vhost_user.c
index c1522d549f00..f97ec6064cac 100644
--- a/vhost_user.c
+++ b/vhost_user.c
@@ -137,8 +137,8 @@ static void *qva_to_va(struct vu_dev *dev, uint64_t qemu_addr)
unsigned int i;
/* Find matching memory region. */
- for (i = 0; i < dev->nregions; i++) {
- const struct vu_dev_region *r = &dev->regions[i];
+ for (i = 0; i < dev->memory.nregions; i++) {
+ const struct vu_dev_region *r = &dev->memory.regions[i];
if ((qemu_addr >= r->qva) && (qemu_addr < (r->qva + r->size))) {
/* NOLINTNEXTLINE(performance-no-int-to-ptr) */
@@ -428,8 +428,8 @@ static bool vu_set_mem_table_exec(struct vu_dev *vdev,
struct vhost_user_memory m = vmsg->payload.memory, *memory = &m;
unsigned int i;
- for (i = 0; i < vdev->nregions; i++) {
- const struct vu_dev_region *r = &vdev->regions[i];
+ for (i = 0; i < vdev->memory.nregions; i++) {
+ const struct vu_dev_region *r = &vdev->memory.regions[i];
if (r->mmap_addr) {
/* NOLINTNEXTLINE(performance-no-int-to-ptr) */
@@ -437,12 +437,12 @@ static bool vu_set_mem_table_exec(struct vu_dev *vdev,
r->size + r->mmap_offset);
}
}
- vdev->nregions = memory->nregions;
+ vdev->memory.nregions = memory->nregions;
debug("vhost-user nregions: %u", memory->nregions);
- for (i = 0; i < vdev->nregions; i++) {
+ for (i = 0; i < vdev->memory.nregions; i++) {
struct vhost_user_memory_region *msg_region = &memory->regions[i];
- struct vu_dev_region *dev_region = &vdev->regions[i];
+ struct vu_dev_region *dev_region = &vdev->memory.regions[i];
void *mmap_addr;
debug("vhost-user region %d", i);
@@ -484,13 +484,7 @@ static bool vu_set_mem_table_exec(struct vu_dev *vdev,
}
}
- /* As vu_packet_check_range() has no access to the number of
- * memory regions, mark the end of the array with mmap_addr = 0
- */
- ASSERT(vdev->nregions < VHOST_USER_MAX_RAM_SLOTS - 1);
- vdev->regions[vdev->nregions].mmap_addr = 0;
-
- tap_sock_update_pool(vdev->regions, 0);
+ ASSERT(vdev->memory.nregions < VHOST_USER_MAX_RAM_SLOTS);
return false;
}
@@ -1106,8 +1100,8 @@ void vu_cleanup(struct vu_dev *vdev)
vq->vring.avail = 0;
}
- for (i = 0; i < vdev->nregions; i++) {
- const struct vu_dev_region *r = &vdev->regions[i];
+ for (i = 0; i < vdev->memory.nregions; i++) {
+ const struct vu_dev_region *r = &vdev->memory.regions[i];
if (r->mmap_addr) {
/* NOLINTNEXTLINE(performance-no-int-to-ptr) */
@@ -1115,7 +1109,7 @@ void vu_cleanup(struct vu_dev *vdev)
r->size + r->mmap_offset);
}
}
- vdev->nregions = 0;
+ vdev->memory.nregions = 0;
vu_close_log(vdev);
diff --git a/virtio.c b/virtio.c
index ed7842b4c78a..bd388c2dfc7f 100644
--- a/virtio.c
+++ b/virtio.c
@@ -102,8 +102,8 @@ static void *vu_gpa_to_va(const struct vu_dev *dev, uint64_t *plen,
return NULL;
/* Find matching memory region. */
- for (i = 0; i < dev->nregions; i++) {
- const struct vu_dev_region *r = &dev->regions[i];
+ for (i = 0; i < dev->memory.nregions; i++) {
+ const struct vu_dev_region *r = &dev->memory.regions[i];
if ((guest_addr >= r->gpa) &&
(guest_addr < (r->gpa + r->size))) {
diff --git a/virtio.h b/virtio.h
index 32757458ea95..b55cc4042521 100644
--- a/virtio.h
+++ b/virtio.h
@@ -96,11 +96,22 @@ struct vu_dev_region {
*/
#define VHOST_USER_MAX_RAM_SLOTS 32
+/**
+ * struct vdev_memory - Describes the shared memory regions for a vhost-user
+ * device
+ * @nregions: Number of shared memory regions
+ * @regions: Guest shared memory regions
+ */
+struct vdev_memory {
+ uint32_t nregions;
+ struct vu_dev_region regions[VHOST_USER_MAX_RAM_SLOTS];
+};
+
/**
* struct vu_dev - vhost-user device information
* @context: Execution context
- * @nregions: Number of shared memory regions
- * @regions: Guest shared memory regions
+ * @memory: Shared memory regions
+ * @vq: Virtqueues of the device
* @features: Vhost-user features
* @protocol_features: Vhost-user protocol features
* @log_call_fd: Eventfd to report logging update
@@ -109,8 +120,7 @@ struct vu_dev_region {
*/
struct vu_dev {
struct ctx *context;
- uint32_t nregions;
- struct vu_dev_region regions[VHOST_USER_MAX_RAM_SLOTS];
+ struct vdev_memory memory;
struct vu_virtq vq[VHOST_USER_MAX_QUEUES];
uint64_t features;
uint64_t protocol_features;
diff --git a/vu_common.c b/vu_common.c
index b77b21420c57..b716070ea3c3 100644
--- a/vu_common.c
+++ b/vu_common.c
@@ -25,26 +25,28 @@
/**
* vu_packet_check_range() - Check if a given memory zone is contained in
* a mapped guest memory region
- * @buf: Array of the available memory regions
+ * @memory: Array of the available memory regions
* @ptr: Start of desired data range
- * @size: Length of desired data range
+ * @len: Length of desired data range
*
* Return: 0 if the zone is in a mapped memory region, -1 otherwise
*/
-int vu_packet_check_range(void *buf, const char *ptr, size_t len)
+int vu_packet_check_range(struct vdev_memory *memory,
+ const char *ptr, size_t len)
{
- struct vu_dev_region *dev_region;
+ struct vu_dev_region *dev_region = memory->regions;
+ unsigned int i;
- for (dev_region = buf; dev_region->mmap_addr; dev_region++) {
- uintptr_t base_addr = dev_region->mmap_addr +
- dev_region->mmap_offset;
+ for (i = 0; i < memory->nregions; i++) {
+ uintptr_t base_addr = dev_region[i].mmap_addr +
+ dev_region[i].mmap_offset;
/* NOLINTNEXTLINE(performance-no-int-to-ptr) */
const char *base = (const char *)base_addr;
- ASSERT(base_addr >= dev_region->mmap_addr);
+ ASSERT(base_addr >= dev_region[i].mmap_addr);
- if (len <= dev_region->size && base <= ptr &&
- (size_t)(ptr - base) <= dev_region->size - len)
+ if (len <= dev_region[i].size && base <= ptr &&
+ (size_t)(ptr - base) <= dev_region[i].size - len)
return 0;
}
--
2.49.0
^ permalink raw reply related [flat|nested] 66+ messages in thread
* [PATCH v8 30/30] packet: Add support for multi-vector packets
2025-08-05 15:45 [PATCH v8 00/30] Introduce discontiguous frames management Laurent Vivier
` (28 preceding siblings ...)
2025-08-05 15:46 ` [PATCH v8 29/30] packet: Refactor vhost-user memory region handling Laurent Vivier
@ 2025-08-05 15:46 ` Laurent Vivier
2025-08-07 6:17 ` David Gibson
29 siblings, 1 reply; 66+ messages in thread
From: Laurent Vivier @ 2025-08-05 15:46 UTC (permalink / raw)
To: passt-dev; +Cc: Laurent Vivier
The packet pool was previously limited to handling packets contained
within a single buffer.
This patch extends the packet pool to support iovec arrays,
allowing a single logical packet to be composed of multiple iovecs.
To accommodate this, the storage format within the pool is modified.
For a multi-vector packet, a header entry is now stored first with
iov_base = NULL and iov_len holding the number of subsequent
vectors. The actual data vectors are then stored in the following
pool slots.
The packet_add_do() and packet_get_do() functions are updated to
manage this new format for storing and retrieving packets. The
pool_full() check is also adjusted to ensure there is enough
space for all vectors of a new packet before adding it.
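As a rough illustration of the storage format described above, here is
a minimal standalone sketch. struct demo_pool, demo_slots_needed(),
demo_add() and demo_get() are hypothetical stand-ins, not the functions
touched by this patch. The sketch assumes data vectors never have a NULL
iov_base, and its capacity check is its own, not necessarily the one
pool_full() implements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/uio.h>

#define DEMO_POOL_SIZE 8	/* hypothetical capacity, illustration only */

/* Simplified stand-in for struct pool: just the slot array and its count */
struct demo_pool {
	size_t count;
	struct iovec pkt[DEMO_POOL_SIZE];
};

/* Slots needed for a packet of @cnt vectors: one per vector, plus one
 * header slot when there is more than one vector
 */
static size_t demo_slots_needed(size_t cnt)
{
	return cnt + (cnt > 1);
}

/* Store @cnt vectors as one packet, return false if they can't be stored */
static bool demo_add(struct demo_pool *p, const struct iovec *iov, size_t cnt)
{
	size_t idx = p->count, i;

	assert(p->count <= DEMO_POOL_SIZE);

	if (cnt == 0 || demo_slots_needed(cnt) > DEMO_POOL_SIZE - idx)
		return false;

	/* Assumption of this sketch: NULL is reserved for header slots */
	for (i = 0; i < cnt; i++)
		if (!iov[i].iov_base)
			return false;

	if (cnt > 1) {
		/* Header slot: NULL base, length holds the vector count */
		p->pkt[idx].iov_base = NULL;
		p->pkt[idx].iov_len = cnt;
		idx++;
	}

	for (i = 0; i < cnt; i++)
		p->pkt[idx++] = iov[i];

	p->count = idx;
	return true;
}

/* Find the packet starting at slot @idx: set @iov to its first data vector
 * and return the number of vectors, 0 if @idx is out of range
 */
static size_t demo_get(const struct demo_pool *p, size_t idx,
		       const struct iovec **iov)
{
	if (idx >= p->count)
		return 0;

	if (p->pkt[idx].iov_base) {
		*iov = &p->pkt[idx];
		return 1;
	}

	*iov = &p->pkt[idx + 1];
	return p->pkt[idx].iov_len;
}
```

With this layout a single-vector packet costs one slot, exactly as
before, and only multi-vector packets pay for the extra header slot.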
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
packet.c | 50 +++++++++++++++++++++++++++++++++-----------------
packet.h | 2 +-
tap.c | 4 ++--
3 files changed, 36 insertions(+), 20 deletions(-)
diff --git a/packet.c b/packet.c
index 4b93688509a4..d697232d951a 100644
--- a/packet.c
+++ b/packet.c
@@ -90,12 +90,13 @@ static int packet_check_range(const struct pool *p, const char *ptr, size_t len,
/**
* pool_full() - Is a packet pool full?
* @p: Pointer to packet pool
+ * @data: check data can fit in the pool
*
- * Return: true if the pool is full, false if more packets can be added
+ * Return: true if the pool is full, false if data can be added
*/
-bool pool_full(const struct pool *p)
+bool pool_full(const struct pool *p, const struct iov_tail *data)
{
- return p->count >= p->size;
+ return p->count + data->cnt + (data->cnt > 1) >= p->size;
}
/**
@@ -108,11 +109,9 @@ bool pool_full(const struct pool *p)
void packet_add_do(struct pool *p, struct iov_tail *data,
const char *func, int line)
{
- size_t idx = p->count;
- const char *start;
- size_t len;
+ size_t idx = p->count, i, offset;
- if (pool_full(p)) {
+ if (pool_full(p, data)) {
debug("add packet index %zu to pool with size %zu, %s:%i",
idx, p->size, func, line);
return;
@@ -121,18 +120,30 @@ void packet_add_do(struct pool *p, struct iov_tail *data,
if (!iov_tail_prune(data))
return;
- ASSERT(data->cnt == 1); /* we don't support iovec */
+ if (data->cnt > 1) {
+ p->pkt[idx].iov_base = NULL;
+ p->pkt[idx].iov_len = data->cnt;
+ idx++;
+ }
- len = data->iov[0].iov_len - data->off;
- start = (char *)data->iov[0].iov_base + data->off;
+ offset = data->off;
+ for (i = 0; i < data->cnt; i++) {
+ const char *start;
+ size_t len;
- if (packet_check_range(p, start, len, func, line))
- return;
+ len = data->iov[i].iov_len - offset;
+ start = (char *)data->iov[i].iov_base + offset;
+ offset = 0;
- p->pkt[idx].iov_base = (void *)start;
- p->pkt[idx].iov_len = len;
+ if (packet_check_range(p, start, len, func, line))
+ return;
- p->count++;
+ p->pkt[idx].iov_base = (void *)start;
+ p->pkt[idx].iov_len = len;
+ idx++;
+ }
+
+ p->count = idx;
}
/**
@@ -162,9 +173,14 @@ bool packet_get_do(const struct pool *p, size_t idx,
return false;
}
- data->cnt = 1;
+ if (p->pkt[idx].iov_base) {
+ data->cnt = 1;
+ data->iov = &p->pkt[idx];
+ } else {
+ data->cnt = p->pkt[idx].iov_len;
+ data->iov = &p->pkt[idx + 1];
+ }
data->off = 0;
- data->iov = &p->pkt[idx];
for (i = 0; i < data->cnt; i++) {
ASSERT_WITH_MSG(!packet_check_range(p, data->iov[i].iov_base,
diff --git a/packet.h b/packet.h
index e51cbd19fdc4..67dc7deb17db 100644
--- a/packet.h
+++ b/packet.h
@@ -37,7 +37,7 @@ void packet_add_do(struct pool *p, struct iov_tail *data,
const char *func, int line);
bool packet_get_do(const struct pool *p, const size_t idx,
struct iov_tail *data, const char *func, int line);
-bool pool_full(const struct pool *p);
+bool pool_full(const struct pool *p, const struct iov_tail *data);
void pool_flush(struct pool *p);
#define packet_add(p, data) \
diff --git a/tap.c b/tap.c
index 9fd00915bb01..95688b22fcb7 100644
--- a/tap.c
+++ b/tap.c
@@ -1103,14 +1103,14 @@ void tap_add_packet(struct ctx *c, struct iov_tail *data,
switch (ntohs(eh->h_proto)) {
case ETH_P_ARP:
case ETH_P_IP:
- if (pool_full(pool_tap4)) {
+ if (pool_full(pool_tap4, data)) {
tap4_handler(c, pool_tap4, now);
pool_flush(pool_tap4);
}
packet_add(pool_tap4, data);
break;
case ETH_P_IPV6:
- if (pool_full(pool_tap6)) {
+ if (pool_full(pool_tap6, data)) {
tap6_handler(c, pool_tap6, now);
pool_flush(pool_tap6);
}
--
2.49.0
^ permalink raw reply related [flat|nested] 66+ messages in thread
* Re: [PATCH v8 02/30] iov: Introduce iov_tail_clone() and iov_tail_drop().
2025-08-05 15:46 ` [PATCH v8 02/30] iov: Introduce iov_tail_clone() and iov_tail_drop() Laurent Vivier
@ 2025-08-06 1:32 ` David Gibson
0 siblings, 0 replies; 66+ messages in thread
From: David Gibson @ 2025-08-06 1:32 UTC (permalink / raw)
To: Laurent Vivier; +Cc: passt-dev
[-- Attachment #1: Type: text/plain, Size: 4425 bytes --]
On Tue, Aug 05, 2025 at 05:46:00PM +0200, Laurent Vivier wrote:
> These utilities enhance iov_tail manipulation, useful for
> efficient packet processing by enabling iovec array cloning and
> header stripping without data copies.
>
> - iov_tail_drop(): Discards a specified number of bytes from the
> beginning of an iov_tail by advancing its internal offset and pruning
> consumed elements.
>
> - iov_tail_clone(): Clone an iov_tail into an iovec array, adjusting the
> first iovec entry to remove the iov_tail offset.
>
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
To the extent that the functions look useful and the implementations
look correct; a couple of suggested improvements below.
> ---
> iov.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> iov.h | 3 +++
> 2 files changed, 55 insertions(+)
>
> diff --git a/iov.c b/iov.c
> index edf0444d1955..9d282d4af461 100644
> --- a/iov.c
> +++ b/iov.c
> @@ -192,6 +192,21 @@ size_t iov_tail_size(struct iov_tail *tail)
> return iov_size(tail->iov, tail->cnt) - tail->off;
> }
>
> +/**
> + * iov_tail_drop() - Discard a header from an IOV tail
Maybe _drop_hdr() to clarify that it's dropping specifically from the
start. Or maybe not, I'm torn between brevity and specificity.
> + * @tail: IO vector tail
> + * @len: length to move the head of the tail
> + *
> + * Return: true if the item still contains any bytes, otherwise false
> + */
> +/* cppcheck-suppress unusedFunction */
> +bool iov_tail_drop(struct iov_tail *tail, size_t len)
> +{
> + tail->off = tail->off + len;
> +
> + return iov_tail_prune(tail);
> +}
> +
> /**
> * iov_peek_header_() - Get pointer to a header from an IOV tail
> * @tail: IOV tail to get header from
> @@ -248,3 +263,40 @@ void *iov_remove_header_(struct iov_tail *tail, size_t len, size_t align)
> tail->off = tail->off + len;
> return p;
> }
> +
> +/**
> + * iov_tail_clone() - Assign iov references referencing a subset of the data
> + * in an iov_tail
> + *
> + * @dst_iov: Pointer to the destination array of struct iovec describing
> + * the scatter/gather I/O vector to shallow copy to.
> + * @dst_iov_cnt: Maximum number of elements in the destination iov array.
> + * @tail: Pointer to the source iov_tail
> + *
> + * Return: the number of elements successfully referenced from the destination
> + * iov array, a negative value if there is not enough room in the
> + * destination iov array
> + */
> +/* cppcheck-suppress unusedFunction */
> +ssize_t iov_tail_clone(struct iovec *dst_iov, size_t dst_iov_cnt,
> + struct iov_tail *tail)
> +{
> + const struct iovec *iov = &tail->iov[0];
> + size_t iov_cnt = tail->cnt;
> + size_t offset = tail->off;
> + unsigned int i, j;
> +
> + i = iov_skip_bytes(iov, iov_cnt, offset, &offset);
If you prune first, then you can assume that offset is within the
first entry, and you'll also know that you need exactly tail->cnt
entries in the destination.
You could then:
if (tail->off != 0)
/* copy partial first entry */
/* memcpy the rest */
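To make the suggestion above concrete, here is a rough standalone
sketch of the prune-first approach. struct demo_tail, demo_prune() and
demo_clone() are hypothetical stand-ins written for this example, not
the code in iov.c. demo_prune() only approximates what iov_tail_prune()
is assumed to do. The sketch copies every entry and then trims the
first one, which should be equivalent to handling the partial first
entry separately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Stand-in for struct iov_tail, with the three fields used in the patch */
struct demo_tail {
	const struct iovec *iov;
	size_t cnt;
	size_t off;
};

/* Drop leading entries entirely covered by the offset.
 * Return true if any entry remains.
 */
static bool demo_prune(struct demo_tail *t)
{
	while (t->cnt && t->off >= t->iov[0].iov_len) {
		t->off -= t->iov[0].iov_len;
		t->iov++;
		t->cnt--;
	}

	return t->cnt > 0;
}

/* Clone @t into @dst. After pruning, the offset lies within the first
 * entry, so exactly t->cnt destination entries are needed.
 * Return the number of entries used, -1 if @dst is too small.
 */
static ssize_t demo_clone(struct iovec *dst, size_t dst_cnt,
			  struct demo_tail *t)
{
	if (!demo_prune(t))
		return 0;

	if (t->cnt > dst_cnt)
		return -1;

	assert(t->off < t->iov[0].iov_len);

	/* Shallow copy of all entries, then trim the offset from the first */
	memcpy(dst, t->iov, t->cnt * sizeof(*dst));
	dst[0].iov_base = (char *)dst[0].iov_base + t->off;
	dst[0].iov_len -= t->off;

	return (ssize_t)t->cnt;
}
```

One side effect to keep in mind: in this form the source tail is pruned
in place, so callers see a modified (but equivalent) tail afterwards.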
> + /* assign iov references referencing a subset of the source one */
> + for (j = 0; i < iov_cnt && j < dst_iov_cnt; i++, j++) {
> + dst_iov[j].iov_base = (char *)iov[i].iov_base + offset;
> + dst_iov[j].iov_len = iov[i].iov_len - offset;
> + offset = 0;
> + }
> +
> + if (j == dst_iov_cnt && i != iov_cnt)
> + return -1;
> +
> + return j;
> +}
> diff --git a/iov.h b/iov.h
> index 3fc96ab9755a..bf9820ac52ab 100644
> --- a/iov.h
> +++ b/iov.h
> @@ -72,8 +72,11 @@ struct iov_tail {
>
> bool iov_tail_prune(struct iov_tail *tail);
> size_t iov_tail_size(struct iov_tail *tail);
> +bool iov_tail_drop(struct iov_tail *tail, size_t len);
> void *iov_peek_header_(struct iov_tail *tail, size_t len, size_t align);
> void *iov_remove_header_(struct iov_tail *tail, size_t len, size_t align);
> +ssize_t iov_tail_clone(struct iovec *dst_iov, size_t dst_iov_cnt,
> + struct iov_tail *tail);
>
> /**
> * IOV_PEEK_HEADER() - Get typed pointer to a header from an IOV tail
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v8 03/30] iov: Update IOV_REMOVE_HEADER() and IOV_PEEK_HEADER()
2025-08-05 15:46 ` [PATCH v8 03/30] iov: Update IOV_REMOVE_HEADER() and IOV_PEEK_HEADER() Laurent Vivier
@ 2025-08-06 1:45 ` David Gibson
0 siblings, 0 replies; 66+ messages in thread
From: David Gibson @ 2025-08-06 1:45 UTC (permalink / raw)
To: Laurent Vivier; +Cc: passt-dev
[-- Attachment #1: Type: text/plain, Size: 10093 bytes --]
On Tue, Aug 05, 2025 at 05:46:01PM +0200, Laurent Vivier wrote:
> Provide a temporary variable of the wanted type to store
> the header if the memory in the iovec array is not contiguous.
>
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> ---
> iov.c | 55 +++++++++++++++++++++++++++++++++++++++++++++----------
> iov.h | 55 +++++++++++++++++++++++++++++++++++++++++--------------
> tcp_buf.c | 2 +-
> 3 files changed, 87 insertions(+), 25 deletions(-)
>
> diff --git a/iov.c b/iov.c
> index 9d282d4af461..d39bb099fa69 100644
> --- a/iov.c
> +++ b/iov.c
> @@ -109,7 +109,7 @@ size_t iov_from_buf(const struct iovec *iov, size_t iov_cnt,
> *
> * Return: the number of bytes successfully copied.
> */
> -/* cppcheck-suppress unusedFunction */
> +/* cppcheck-suppress [staticFunction] */
> size_t iov_to_buf(const struct iovec *iov, size_t iov_cnt,
> size_t offset, void *buf, size_t bytes)
> {
> @@ -127,6 +127,7 @@ size_t iov_to_buf(const struct iovec *iov, size_t iov_cnt,
> /* copying data */
> for (copied = 0; copied < bytes && i < iov_cnt; i++) {
> size_t len = MIN(iov[i].iov_len - offset, bytes - copied);
> + /* NOLINTNEXTLINE(clang-analyzer-core.NonNullParamChecker) */
Which parameter was clang-tidy complaining about?
> memcpy((char *)buf + copied, (char *)iov[i].iov_base + offset,
> len);
> copied += len;
> @@ -208,7 +209,7 @@ bool iov_tail_drop(struct iov_tail *tail, size_t len)
> }
>
> /**
> - * iov_peek_header_() - Get pointer to a header from an IOV tail
> + * iov_check_header() - Check if a header can be accessed
> * @tail: IOV tail to get header from
> * @len: Length of header to get, in bytes
> * @align: Required alignment of header, in bytes
> @@ -219,8 +220,7 @@ bool iov_tail_drop(struct iov_tail *tail, size_t len)
> * overruns the IO vector, is not contiguous or doesn't have the
> * requested alignment.
> */
> -/* cppcheck-suppress [staticFunction,unmatchedSuppression] */
> -void *iov_peek_header_(struct iov_tail *tail, size_t len, size_t align)
> +static void *iov_check_header(struct iov_tail *tail, size_t len, size_t align)
> {
> char *p;
>
> @@ -240,27 +240,62 @@ void *iov_peek_header_(struct iov_tail *tail, size_t len, size_t align)
> return p;
> }
>
> +/**
> + * iov_peek_header_() - Get pointer to a header from an IOV tail
> + * @tail: IOV tail to get header from
> + * @v: Temporary memory to use if the memory in @tail
> + * is discontinuous
> + * @len: Length of header to get, in bytes
> + * @align: Required alignment of header, in bytes
> + *
> + * @tail may be pruned, but will represent the same bytes as before.
> + *
> + * Return: pointer to the first @len logical bytes of the tail, or to
> + * a copy if that overruns the IO vector, is not contiguous or
> + * doesn't have the requested alignment. NULL if that overruns the
> + * IO vector.
> + */
> +/* cppcheck-suppress [staticFunction,unmatchedSuppression] */
> +void *iov_peek_header_(struct iov_tail *tail, void *v, size_t len, size_t align)
> +{
> + char *p = iov_check_header(tail, len, align);
> + size_t l;
> +
> + if (p)
> + return p;
> +
> + l = iov_to_buf(tail->iov, tail->cnt, tail->off, v, len);
> + if (l != len)
> + return NULL;
> +
> + return v;
> +}
> +
> /**
> * iov_remove_header_() - Remove a header from an IOV tail
> * @tail: IOV tail to remove header from (modified)
> + * @v: Temporary memory to use if the memory in @tail
> + * is discontinuous
> * @len: Length of header to remove, in bytes
> * @align: Required alignment of header, in bytes
> *
> * On success, @tail is updated so that it longer includes the bytes of the
> * returned header.
> *
> - * Return: pointer to the first @len logical bytes of the tail, NULL if that
> - * overruns the IO vector, is not contiguous or doesn't have the
> - * requested alignment.
> + * Return: pointer to the first @len logical bytes of the tail, or to
> + * a copy if that overruns the IO vector, is not contiguous or
> + * doesn't have the requested alignment. NULL if that overruns the
> + * IO vector.
> */
> -void *iov_remove_header_(struct iov_tail *tail, size_t len, size_t align)
> +void *iov_remove_header_(struct iov_tail *tail, void *v, size_t len, size_t align)
> {
> - char *p = iov_peek_header_(tail, len, align);
> + char *p = iov_peek_header_(tail, v, len, align);
>
> if (!p)
> return NULL;
>
> tail->off = tail->off + len;
Do you want to use your new iov_tail_drop() here?
> +
> return p;
> }
>
> @@ -275,7 +310,7 @@ void *iov_remove_header_(struct iov_tail *tail, size_t len, size_t align)
> *
> * Return: the number of elements successfully referenced from the destination
> * iov array, a negative value if there is not enough room in the
> - * destination iov array
> + * destination iov array
> */
> /* cppcheck-suppress unusedFunction */
> ssize_t iov_tail_clone(struct iovec *dst_iov, size_t dst_iov_cnt,
> diff --git a/iov.h b/iov.h
> index bf9820ac52ab..ccdb690ef3f1 100644
> --- a/iov.h
> +++ b/iov.h
> @@ -70,41 +70,68 @@ struct iov_tail {
> #define IOV_TAIL(iov_, cnt_, off_) \
> (struct iov_tail){ .iov = (iov_), .cnt = (cnt_), .off = (off_) }
>
> +/**
> + * IOV_TAIL_FROM_BUF() - Create a new IOV tail from a buffer
> + * @buf_: Buffer address to use in the iovec
> + * @len_: Buffer size
> + * @off_: Byte offset in the buffer where the tail begins
> + */
> +#define IOV_TAIL_FROM_BUF(buf_, len_, off_) \
> + IOV_TAIL((&(const struct iovec){ .iov_base = (buf_), \
> + .iov_len = (len_) }), \
> + 1, \
> + (off_))
> +
This seems unrelated to the rest of the patch, does it belong in a
different one?
Also, given that you're constructing the iovec entries yourself, why
not set tail->off to zero and update iov_base and iov_len accordingly?
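Something along these lines, perhaps. This is only a sketch:
struct demo_iov_tail, DEMO_TAIL_FROM_BUF() and demo_tail_bytes() are
made-up names for the example, and the macro assumes the offset does
not exceed the length and evaluates the offset argument twice.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Stand-in for struct iov_tail, with the three fields used in the patch */
struct demo_iov_tail {
	const struct iovec *iov;
	size_t cnt;
	size_t off;
};

/* Build a single-entry tail that already starts at the wanted byte: the
 * offset is folded into iov_base and iov_len, so .off is always zero.
 * Assumes off_ <= len_. The compound literal lives as long as the
 * enclosing block.
 */
#define DEMO_TAIL_FROM_BUF(buf_, len_, off_)				\
	((struct demo_iov_tail){					\
		.iov = &(const struct iovec){				\
			.iov_base = (char *)(buf_) + (off_),		\
			.iov_len = (len_) - (off_) },			\
		.cnt = 1,						\
		.off = 0 })

/* Bytes left in a tail built by the macro above: no offset to subtract */
static size_t demo_tail_bytes(const struct demo_iov_tail *t)
{
	assert(t->cnt == 1 && t->off == 0);

	return t->iov[0].iov_len;
}
```

A tail built this way is pruned by construction, which would fit with
an "always pruned" invariant if we ever go that way.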
> bool iov_tail_prune(struct iov_tail *tail);
> size_t iov_tail_size(struct iov_tail *tail);
> bool iov_tail_drop(struct iov_tail *tail, size_t len);
> -void *iov_peek_header_(struct iov_tail *tail, size_t len, size_t align);
> -void *iov_remove_header_(struct iov_tail *tail, size_t len, size_t align);
> +void *iov_peek_header_(struct iov_tail *tail, void *v, size_t len, size_t align);
> +void *iov_remove_header_(struct iov_tail *tail, void *v, size_t len, size_t align);
> ssize_t iov_tail_clone(struct iovec *dst_iov, size_t dst_iov_cnt,
> struct iov_tail *tail);
>
> /**
> * IOV_PEEK_HEADER() - Get typed pointer to a header from an IOV tail
> * @tail_: IOV tail to get header from
> - * @type_: Data type of the header
> + * @var_: Temporary buffer of the type of the header to use if
> + * the memory in the iovec array is not contiguous.
> *
> * @tail_ may be pruned, but will represent the same bytes as before.
> *
> - * Return: pointer of type (@type_ *) located at the start of @tail_, NULL if
> - * we can't get a contiguous and aligned pointer.
> + * Return: pointer of type (@type_ *) located at the start of @tail_
> + * or to @var_ if iovec memory is not contiguous, NULL if
> + * that overruns the iovec.
The NULL case can't happen if given a variable of the right type.
> */
> -#define IOV_PEEK_HEADER(tail_, type_) \
> - ((type_ *)(iov_peek_header_((tail_), \
> - sizeof(type_), __alignof__(type_))))
> +
> +#define IOV_PEEK_HEADER(tail_, var_) \
> + ((__typeof__(var_) *)(iov_peek_header_((tail_), &(var_), \
> + sizeof(var_), \
> + __alignof__(var_))))
>
> /**
> * IOV_REMOVE_HEADER() - Remove and return typed header from an IOV tail
> * @tail_: IOV tail to remove header from (modified)
> - * @type_: Data type of the header to remove
> + * @var_: Temporary buffer of the type of the header to use if
> + * the memory in the iovec array is not contiguous.
> *
> * On success, @tail_ is updated so that it longer includes the bytes of the
> * returned header.
> *
> - * Return: pointer of type (@type_ *) located at the old start of @tail_, NULL
> - * if we can't get a contiguous and aligned pointer.
> + * Return: pointer of type (@type_ *) located at the start of @tail_
> + * or to @var_ if iovec memory is not contiguous, NULL if
> + * that overruns the iovec.
Again, NULL case can't happen.
> + */
> +
> +#define IOV_REMOVE_HEADER(tail_, var_) \
> + ((__typeof__(var_) *)(iov_remove_header_((tail_), &(var_), \
> + sizeof(var_), __alignof__(var_))))
> +
> +/** IOV_DROP_HEADER() - Remove a typed header from an IOV tail
> + * @tail_: IOV tail to remove header from (modified)
> + * @type_: Data type of the header to remove
> + *
> + * Return: true if the tail still contains any bytes, otherwise false
> */
> -#define IOV_REMOVE_HEADER(tail_, type_) \
> - ((type_ *)(iov_remove_header_((tail_), \
> - sizeof(type_), __alignof__(type_))))
> +#define IOV_DROP_HEADER(tail_, type_) iov_tail_drop((tail_), sizeof(type_))
>
> #endif /* IOVEC_H */
> diff --git a/tcp_buf.c b/tcp_buf.c
> index d1fca676c9a7..bc898de86919 100644
> --- a/tcp_buf.c
> +++ b/tcp_buf.c
> @@ -160,7 +160,7 @@ static void tcp_l2_buf_fill_headers(const struct tcp_tap_conn *conn,
> uint32_t seq, bool no_tcp_csum)
> {
> struct iov_tail tail = IOV_TAIL(&iov[TCP_IOV_PAYLOAD], 1, 0);
> - struct tcphdr *th = IOV_REMOVE_HEADER(&tail, struct tcphdr);
> + struct tcphdr th_storage, *th = IOV_REMOVE_HEADER(&tail, th_storage);
> struct tap_hdr *taph = iov[TCP_IOV_TAP].iov_base;
> const struct flowside *tapside = TAPFLOW(conn);
> const struct in_addr *a4 = inany_v4(&tapside->oaddr);
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v8 04/30] tap: Use iov_tail with tap_add_packet()
2025-08-05 15:46 ` [PATCH v8 04/30] tap: Use iov_tail with tap_add_packet() Laurent Vivier
@ 2025-08-06 1:56 ` David Gibson
0 siblings, 0 replies; 66+ messages in thread
From: David Gibson @ 2025-08-06 1:56 UTC (permalink / raw)
To: Laurent Vivier; +Cc: passt-dev
[-- Attachment #1: Type: text/plain, Size: 8105 bytes --]
On Tue, Aug 05, 2025 at 05:46:02PM +0200, Laurent Vivier wrote:
> Use IOV_PEEK_HEADER() to get the ethernet header from the iovec.
>
> Move the workaround about multiple iovec array from vu_handle_tx() to
> tap_add_packet(). Removing the offset out of the iovec array should
> reduce the iovec count to 1.
>
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> iov.c | 1 -
> pcap.c | 1 +
> tap.c | 30 +++++++++++++++++++++---------
> tap.h | 3 +--
> vu_common.c | 26 +++++---------------------
> 5 files changed, 28 insertions(+), 33 deletions(-)
>
> diff --git a/iov.c b/iov.c
> index d39bb099fa69..97e4ea733540 100644
> --- a/iov.c
> +++ b/iov.c
> @@ -200,7 +200,6 @@ size_t iov_tail_size(struct iov_tail *tail)
> *
> * Return: true if the item still contains any bytes, otherwise false
> */
> -/* cppcheck-suppress unusedFunction */
> bool iov_tail_drop(struct iov_tail *tail, size_t len)
> {
> tail->off = tail->off + len;
> diff --git a/pcap.c b/pcap.c
> index 46d11a2a6daa..03adc4c55f4b 100644
> --- a/pcap.c
> +++ b/pcap.c
> @@ -74,6 +74,7 @@ static void pcap_frame(const struct iovec *iov, size_t iovcnt,
> * @pkt: Pointer to data buffer, including L2 headers
> * @l2len: L2 frame length
> */
> +/* cppcheck-suppress unusedFunction */
Are we going to get any new users of this? Maybe we can just remove
it now.
> void pcap(const char *pkt, size_t l2len)
> {
> struct iovec iov = { (char *)pkt, l2len };
> diff --git a/tap.c b/tap.c
> index 6db5d88b1760..c5520bf3bc76 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -1070,24 +1070,29 @@ void tap_handler(struct ctx *c, const struct timespec *now)
> /**
> * tap_add_packet() - Queue/capture packet, update notion of guest MAC address
> * @c: Execution context
> - * @l2len: Total L2 packet length
> - * @p: Packet buffer
> + * @data: Packet to add to the pool
> * @now: Current timestamp
> */
> -void tap_add_packet(struct ctx *c, ssize_t l2len, char *p,
> +void tap_add_packet(struct ctx *c, struct iov_tail *data,
> const struct timespec *now)
> {
> + struct ethhdr eh_storage;
> const struct ethhdr *eh;
>
> - pcap(p, l2len);
> + pcap_iov(data->iov, data->cnt, data->off);
>
> - eh = (struct ethhdr *)p;
> + eh = IOV_PEEK_HEADER(data, eh_storage);
> + if (!eh)
> + return;
>
> if (memcmp(c->guest_mac, eh->h_source, ETH_ALEN)) {
> memcpy(c->guest_mac, eh->h_source, ETH_ALEN);
> proto_update_l2_buf(c->guest_mac, NULL);
> }
>
> + iov_tail_prune(data);
> + ASSERT(data->cnt == 1); /* packet_add() doesn't support iovec */
So.. the IOV_PEEK_HEADER will have already pruned, but I think it
would be a layering violation to assume that here.
I'm wondering if we should change the invariants for the iov_tail
structure and say they must *always* be pruned, outside of the innards
of the iov_tail handling. That would mean pruning in all
"constructors" of an iov_tail. It would also mean moving the prune to
the end of iov_tail_drop() instead of the beginning. It would have
the nice bonus that peek_header could take the iov_tail as a const *.
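A rough sketch of what an "always pruned" invariant could look like.
This is not the code in iov.c: struct sketch_tail and all sketch_*()
names are hypothetical, simplified stand-ins, and plain assert() stands
in for ASSERT(). The idea is that the constructor and the drop helper
are the only places that prune, so the peek helper can rely on the
invariant and take a const pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Simplified stand-in for struct iov_tail; names here are hypothetical */
struct sketch_tail {
	const struct iovec *iov;	/* remaining buffers */
	size_t cnt;			/* number of remaining buffers */
	size_t off;			/* byte offset into iov[0] */
};

/* Restore the invariant: off < iov[0].iov_len, or cnt == 0 and off == 0 */
void sketch_prune(struct sketch_tail *t)
{
	while (t->cnt && t->off >= t->iov[0].iov_len) {
		t->off -= t->iov[0].iov_len;
		t->iov++;
		t->cnt--;
	}
	if (!t->cnt)
		t->off = 0;
}

/* "Constructor": prunes once, so callers always get a pruned tail */
struct sketch_tail sketch_tail_init(const struct iovec *iov, size_t cnt,
				    size_t off)
{
	struct sketch_tail t = { iov, cnt, off };

	sketch_prune(&t);
	return t;
}

/* Advance by len bytes, pruning at the end; true if any data is left */
bool sketch_drop(struct sketch_tail *t, size_t len)
{
	t->off += len;
	sketch_prune(t);
	return t->cnt != 0;
}

/* Peek len bytes without modifying the tail, hence the const pointer.
 * Returns a pointer into the first buffer if the bytes are contiguous,
 * a copy in storage if they span buffers, NULL if there are too few.
 */
const void *sketch_peek(const struct sketch_tail *t, void *storage, size_t len)
{
	size_t off = t->off, done = 0, i;

	/* Relies on the invariant instead of pruning here */
	assert(!t->cnt || t->off < t->iov[0].iov_len);

	if (t->cnt && t->iov[0].iov_len - off >= len)
		return (const char *)t->iov[0].iov_base + off;

	for (i = 0; i < t->cnt && done < len; i++) {
		size_t n = t->iov[i].iov_len - off;

		if (n > len - done)
			n = len - done;
		memcpy((char *)storage + done,
		       (const char *)t->iov[i].iov_base + off, n);
		done += n;
		off = 0;
	}

	return done == len ? storage : NULL;
}
```

The cost of this approach is that anything building a tail by hand
(rather than through a constructor) would break the invariant, so it
only works if all construction sites are converted.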
> switch (ntohs(eh->h_proto)) {
> case ETH_P_ARP:
> case ETH_P_IP:
> @@ -1095,14 +1100,16 @@ void tap_add_packet(struct ctx *c, ssize_t l2len, char *p,
> tap4_handler(c, pool_tap4, now);
> pool_flush(pool_tap4);
> }
> - packet_add(pool_tap4, l2len, p);
> + packet_add(pool_tap4, data->iov[0].iov_len - data->off,
> + (char *)data->iov[0].iov_base + data->off);
> break;
> case ETH_P_IPV6:
> if (pool_full(pool_tap6)) {
> tap6_handler(c, pool_tap6, now);
> pool_flush(pool_tap6);
> }
> - packet_add(pool_tap6, l2len, p);
> + packet_add(pool_tap6, data->iov[0].iov_len - data->off,
> + (char *)data->iov[0].iov_base + data->off);
> break;
> default:
> break;
> @@ -1168,6 +1175,7 @@ static void tap_passt_input(struct ctx *c, const struct timespec *now)
>
> while (n >= (ssize_t)sizeof(uint32_t)) {
> uint32_t l2len = ntohl_unaligned(p);
> + struct iov_tail data;
>
> if (l2len < sizeof(struct ethhdr) || l2len > L2_MAX_LEN_PASST) {
> err("Bad frame size from guest, resetting connection");
> @@ -1182,7 +1190,8 @@ static void tap_passt_input(struct ctx *c, const struct timespec *now)
> p += sizeof(uint32_t);
> n -= sizeof(uint32_t);
>
> - tap_add_packet(c, l2len, p, now);
> + data = IOV_TAIL_FROM_BUF(p, l2len, 0);
> + tap_add_packet(c, &data, now);
>
> p += l2len;
> n -= l2len;
> @@ -1226,6 +1235,8 @@ static void tap_pasta_input(struct ctx *c, const struct timespec *now)
> for (n = 0;
> n <= (ssize_t)(sizeof(pkt_buf) - L2_MAX_LEN_PASTA);
> n += len) {
> + struct iov_tail data;
> +
> len = read(c->fd_tap, pkt_buf + n, L2_MAX_LEN_PASTA);
>
> if (len == 0) {
> @@ -1247,7 +1258,8 @@ static void tap_pasta_input(struct ctx *c, const struct timespec *now)
> len > (ssize_t)L2_MAX_LEN_PASTA)
> continue;
>
> - tap_add_packet(c, len, pkt_buf + n, now);
> + data = IOV_TAIL_FROM_BUF(pkt_buf + n, len, 0);
> + tap_add_packet(c, &data, now);
> }
>
> tap_handler(c, now);
> diff --git a/tap.h b/tap.h
> index 936ae9371fd6..ce5510882d5d 100644
> --- a/tap.h
> +++ b/tap.h
> @@ -119,7 +119,6 @@ void tap_sock_update_pool(void *base, size_t size);
> void tap_backend_init(struct ctx *c);
> void tap_flush_pools(void);
> void tap_handler(struct ctx *c, const struct timespec *now);
> -void tap_add_packet(struct ctx *c, ssize_t l2len, char *p,
> +void tap_add_packet(struct ctx *c, struct iov_tail *data,
> const struct timespec *now);
> -
> #endif /* TAP_H */
> diff --git a/vu_common.c b/vu_common.c
> index 5e6fd4a8261f..b77b21420c57 100644
> --- a/vu_common.c
> +++ b/vu_common.c
> @@ -163,7 +163,6 @@ static void vu_handle_tx(struct vu_dev *vdev, int index,
> struct vu_virtq_element elem[VIRTQUEUE_MAX_SIZE];
> struct iovec out_sg[VIRTQUEUE_MAX_SIZE];
> struct vu_virtq *vq = &vdev->vq[index];
> - int hdrlen = sizeof(struct virtio_net_hdr_mrg_rxbuf);
> int out_sg_count;
> int count;
>
> @@ -176,6 +175,7 @@ static void vu_handle_tx(struct vu_dev *vdev, int index,
> while (count < VIRTQUEUE_MAX_SIZE &&
> out_sg_count + VU_MAX_TX_BUFFER_NB <= VIRTQUEUE_MAX_SIZE) {
> int ret;
> + struct iov_tail data;
>
> elem[count].out_num = VU_MAX_TX_BUFFER_NB;
> elem[count].out_sg = &out_sg[out_sg_count];
> @@ -191,26 +191,10 @@ static void vu_handle_tx(struct vu_dev *vdev, int index,
> warn("virtio-net transmit queue contains no out buffers");
> break;
> }
> - if (elem[count].out_num == 1) {
> - tap_add_packet(vdev->context,
> - elem[count].out_sg[0].iov_len - hdrlen,
> - (char *)elem[count].out_sg[0].iov_base +
> - hdrlen, now);
> - } else {
> - /* vnet header can be in a separate iovec */
> - if (elem[count].out_num != 2) {
> - debug("virtio-net transmit queue contains more than one buffer ([%d]: %u)",
> - count, elem[count].out_num);
> - } else if (elem[count].out_sg[0].iov_len != (size_t)hdrlen) {
> - debug("virtio-net transmit queue entry not aligned on hdrlen ([%d]: %d != %zu)",
> - count, hdrlen, elem[count].out_sg[0].iov_len);
> - } else {
> - tap_add_packet(vdev->context,
> - elem[count].out_sg[1].iov_len,
> - (char *)elem[count].out_sg[1].iov_base,
> - now);
> - }
> - }
> +
> + data = IOV_TAIL(elem[count].out_sg, elem[count].out_num, 0);
> + if (IOV_DROP_HEADER(&data, struct virtio_net_hdr_mrg_rxbuf))
> + tap_add_packet(vdev->context, &data, now);
Nice simplification here.
>
> count++;
> }
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v8 06/30] packet: Add packet_data()
2025-08-05 15:46 ` [PATCH v8 06/30] packet: Add packet_data() Laurent Vivier
@ 2025-08-06 2:14 ` David Gibson
0 siblings, 0 replies; 66+ messages in thread
From: David Gibson @ 2025-08-06 2:14 UTC (permalink / raw)
To: Laurent Vivier; +Cc: passt-dev
[-- Attachment #1: Type: text/plain, Size: 3403 bytes --]
On Tue, Aug 05, 2025 at 05:46:04PM +0200, Laurent Vivier wrote:
> packet_data() gets the data range from a packet descriptor from a
> given pool.
>
> It uses iov_tail to return the packet memory.
>
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
I assume the idea is to eventually replace packet_get() with this?
> ---
> packet.c | 42 ++++++++++++++++++++++++++++++++++++++++++
> packet.h | 5 +++++
> 2 files changed, 47 insertions(+)
>
> diff --git a/packet.c b/packet.c
> index 98ded4e27aae..82adc9fd1a39 100644
> --- a/packet.c
> +++ b/packet.c
> @@ -190,6 +190,48 @@ void *packet_get_do(const struct pool *p, const size_t idx,
> return r;
> }
>
> +/**
> + * packet_data_do() - Get data range from packet descriptor from given pool
> + * @p: Packet pool
> + * @idx: Index of packet descriptor in pool
> + * @data: IOV tail to store the address of the data (output)
> + * @func: For tracing: name of calling function, NULL means no trace()
> + * @line: For tracing: caller line of function call
> + *
> + * Return: false if packet index is invalid, true otherwise.
> + * If something wrong with @data, don't return at all (assert).
> + */
> +/* cppcheck-suppress unusedFunction */
> +bool packet_data_do(const struct pool *p, size_t idx,
> + struct iov_tail *data,
> + const char *func, int line)
> +{
> + size_t i;
> +
> + ASSERT_WITH_MSG(p->count <= p->size,
> + "Corrupted pool count: %zu, size: %zu, %s:%i",
> + p->count, p->size, func, line);
> +
> + if (idx >= p->count) {
> + debug("packet %zu from pool size: %zu, count: %zu, "
> + "%s:%i", idx, p->size, p->count, func, line);
> + return false;
> + }
> +
> + data->cnt = 1;
> + data->off = 0;
> + data->iov = &p->pkt[idx];
> +
> + for (i = 0; i < data->cnt; i++) {
> + ASSERT_WITH_MSG(!packet_check_range(p, data->iov[i].iov_base,
> + data->iov[i].iov_len,
> + func, line),
> + "Corrupt packet pool, %s:%i", func, line);
> + }
> +
> + return true;
> +}
> +
> /**
> * pool_flush() - Flush a packet pool
> * @p: Pointer to packet pool
> diff --git a/packet.h b/packet.h
> index af40b39b5251..062afb978124 100644
> --- a/packet.h
> +++ b/packet.h
> @@ -39,6 +39,9 @@ void *packet_get_try_do(const struct pool *p, const size_t idx,
> void *packet_get_do(const struct pool *p, const size_t idx,
> size_t offset, size_t len, size_t *left,
> const char *func, int line);
> +bool packet_data_do(const struct pool *p, const size_t idx,
> + struct iov_tail *data,
> + const char *func, int line);
> bool pool_full(const struct pool *p);
> void pool_flush(struct pool *p);
>
> @@ -49,6 +52,8 @@ void pool_flush(struct pool *p);
> packet_get_try_do(p, idx, offset, len, left, __func__, __LINE__)
> #define packet_get(p, idx, offset, len, left) \
> packet_get_do(p, idx, offset, len, left, __func__, __LINE__)
> +#define packet_data(p, idx, data) \
> + packet_data_do(p, idx, data, __func__, __LINE__)
>
> #define PACKET_POOL_DECL(_name, _size, _buf) \
> struct _name ## _t { \
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v8 07/30] arp: Convert to iov_tail
2025-08-05 15:46 ` [PATCH v8 07/30] arp: Convert to iov_tail Laurent Vivier
@ 2025-08-06 2:17 ` David Gibson
2025-08-07 12:58 ` Laurent Vivier
0 siblings, 1 reply; 66+ messages in thread
From: David Gibson @ 2025-08-06 2:17 UTC (permalink / raw)
To: Laurent Vivier; +Cc: passt-dev
[-- Attachment #1: Type: text/plain, Size: 2360 bytes --]
On Tue, Aug 05, 2025 at 05:46:05PM +0200, Laurent Vivier wrote:
> Use packet_data() and extract headers using IOV_REMOVE_HEADER()
> rather than packet_get().
>
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Still R-b, but making an observation below that's perhaps more
relevant to the previous patch.
> ---
> arp.c | 12 +++++++++---
> packet.c | 1 -
> 2 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/arp.c b/arp.c
> index 9f1fedeafec0..b3ac42082841 100644
> --- a/arp.c
> +++ b/arp.c
> @@ -74,14 +74,20 @@ int arp(const struct ctx *c, const struct pool *p)
> struct arphdr ah;
> struct arpmsg am;
> } __attribute__((__packed__)) resp;
> + struct arphdr ah_storage;
> + struct ethhdr eh_storage;
> + struct arpmsg am_storage;
> const struct ethhdr *eh;
> const struct arphdr *ah;
> const struct arpmsg *am;
> + struct iov_tail data;
>
> - eh = packet_get(p, 0, 0, sizeof(*eh), NULL);
> - ah = packet_get(p, 0, sizeof(*eh), sizeof(*ah), NULL);
> - am = packet_get(p, 0, sizeof(*eh) + sizeof(*ah), sizeof(*am), NULL);
> + if (!packet_data(p, 0, &data))
> + return -1;
The only case where packet_data() will return false is if you give it
a bad packet index. That should never happen, by construction. So
I'm wondering if that should be an ASSERT() in packet_data() rather
than a return value.
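For illustration only, an assert-based variant might look roughly like
the following. The sketch_pool and sketch_tail types are cut-down,
hypothetical stand-ins for the real pool and iov_tail structures, and
standard assert() stands in for ASSERT(); the range check done by
packet_check_range() is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

#define SKETCH_POOL_SIZE	4

/* Hypothetical, cut-down versions of the pool and tail structures */
struct sketch_pool {
	size_t size;				/* capacity of pkt[] */
	size_t count;				/* descriptors in use */
	struct iovec pkt[SKETCH_POOL_SIZE];	/* one iovec per packet */
};

struct sketch_tail {
	const struct iovec *iov;
	size_t cnt;
	size_t off;
};

/* A bad index is treated as a programming error, so there is nothing
 * to return and callers lose one error path.
 */
void sketch_packet_data(const struct sketch_pool *p, size_t idx,
			struct sketch_tail *data)
{
	assert(p->count <= p->size);
	assert(idx < p->count);

	data->iov = &p->pkt[idx];
	data->cnt = 1;
	data->off = 0;
}
```

With this shape, a caller such as arp() would go straight to header
extraction and only check the IOV_REMOVE_HEADER() results.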
> + eh = IOV_REMOVE_HEADER(&data, eh_storage);
> + ah = IOV_REMOVE_HEADER(&data, ah_storage);
> + am = IOV_REMOVE_HEADER(&data, am_storage);
> if (!eh || !ah || !am)
> return -1;
>
> diff --git a/packet.c b/packet.c
> index 82adc9fd1a39..34b1722b9a03 100644
> --- a/packet.c
> +++ b/packet.c
> @@ -201,7 +201,6 @@ void *packet_get_do(const struct pool *p, const size_t idx,
> * Return: false if packet index is invalid, true otherwise.
> * If something wrong with @data, don't return at all (assert).
> */
> -/* cppcheck-suppress unusedFunction */
> bool packet_data_do(const struct pool *p, size_t idx,
> struct iov_tail *data,
> const char *func, int line)
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v8 09/30] icmp: Convert to iov_tail
2025-08-05 15:46 ` [PATCH v8 09/30] icmp: " Laurent Vivier
@ 2025-08-06 2:20 ` David Gibson
0 siblings, 0 replies; 66+ messages in thread
From: David Gibson @ 2025-08-06 2:20 UTC (permalink / raw)
To: Laurent Vivier; +Cc: passt-dev
[-- Attachment #1: Type: text/plain, Size: 4381 bytes --]
On Tue, Aug 05, 2025 at 05:46:07PM +0200, Laurent Vivier wrote:
> Use packet_data() and extract headers using IOV_PEEK_HEADER()
> rather than packet_get().
>
> Introduce iov_tail_msghdr() to convert iov_tail to msghdr
>
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> ---
> icmp.c | 25 ++++++++++++++-----------
> iov.c | 23 +++++++++++++++++++++++
> iov.h | 2 ++
> 3 files changed, 39 insertions(+), 11 deletions(-)
>
> diff --git a/icmp.c b/icmp.c
> index 95f38c1e2a3a..fdfc857b5ae8 100644
> --- a/icmp.c
> +++ b/icmp.c
> @@ -241,25 +241,27 @@ int icmp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
> struct icmp_ping_flow *pingf;
> const struct flowside *tgt;
> union sockaddr_inany sa;
> - size_t dlen, l4len;
> + struct iov_tail data;
> + struct msghdr msh;
> uint16_t id, seq;
> union flow *flow;
> uint8_t proto;
> socklen_t sl;
> - void *pkt;
>
> (void)saddr;
> ASSERT(pif == PIF_TAP);
>
> + if (!packet_data(p, 0, &data))
> + return -1;
> +
> if (af == AF_INET) {
> + struct icmphdr ih_storage;
> const struct icmphdr *ih;
>
> - if (!(pkt = packet_get(p, 0, 0, sizeof(*ih), &dlen)))
> + ih = IOV_PEEK_HEADER(&data, ih_storage);
> + if (!ih)
> return 1;
>
> - ih = (struct icmphdr *)pkt;
> - l4len = dlen + sizeof(*ih);
> -
> if (ih->type != ICMP_ECHO)
> return 1;
>
> @@ -267,14 +269,13 @@ int icmp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
> id = ntohs(ih->un.echo.id);
> seq = ntohs(ih->un.echo.sequence);
> } else if (af == AF_INET6) {
> + struct icmp6hdr ih_storage;
> const struct icmp6hdr *ih;
>
> - if (!(pkt = packet_get(p, 0, 0, sizeof(*ih), &dlen)))
> + ih = IOV_PEEK_HEADER(&data, ih_storage);
> + if (!ih)
> return 1;
>
> - ih = (struct icmp6hdr *)pkt;
> - l4len = dlen + sizeof(*ih);
> -
> if (ih->icmp6_type != ICMPV6_ECHO_REQUEST)
> return 1;
>
> @@ -298,8 +299,10 @@ int icmp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
> ASSERT(flow_proto[pingf->f.type] == proto);
> pingf->ts = now->tv_sec;
>
> +
Bogus extra blank line.
> pif_sockaddr(c, &sa, &sl, PIF_HOST, &tgt->eaddr, 0);
> - if (sendto(pingf->sock, pkt, l4len, MSG_NOSIGNAL, &sa.sa, sl) < 0) {
> + iov_tail_msghdr(&msh, &data, &sa, sl);
> + if (sendmsg(pingf->sock, &msh, MSG_NOSIGNAL) < 0) {
> flow_dbg_perror(pingf, "failed to relay request to socket");
> } else {
> flow_dbg(pingf,
> diff --git a/iov.c b/iov.c
> index 97e4ea733540..9d99beb32532 100644
> --- a/iov.c
> +++ b/iov.c
> @@ -158,6 +158,29 @@ size_t iov_size(const struct iovec *iov, size_t iov_cnt)
> return len;
> }
>
> +/**
> + * iov_tail_msghdr - Initialize a msghdr from an IOV tail structure
> + * @msh: msghdr to initialize
> + * @tail: iov_tail to use to set msg_iov and msg_iovlen
> + * @msg_name: Pointer to set to msg_name
> + * @msg_namelen: Size of @msg_name
> + */
> +void iov_tail_msghdr(struct msghdr *msh, struct iov_tail *tail,
> + void *msg_name, socklen_t msg_namelen)
> +{
> + iov_tail_prune(tail);
> +
> + ASSERT(tail->off == 0);
Oof, this is a pretty nasty non-obvious constraint on calling this
function. The whole point of iov_tails is the offset, but this
function won't work in that case.
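One possible way around the constraint, sketched under assumptions:
copy the remaining iovec entries into a caller-provided scratch array,
adjusting the first entry for the offset, and point the msghdr at that
copy. This is similar in spirit to what iov_tail_clone() does
elsewhere in the series, but the code below is a standalone
illustration with hypothetical names (sketch_tail,
sketch_tail_msghdr()), not the series' implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Simplified stand-in for struct iov_tail (hypothetical) */
struct sketch_tail {
	const struct iovec *iov;
	size_t cnt;
	size_t off;
};

/* Fill msh from a tail with any offset, using scratch[] to hold the
 * adjusted iovec entries. Returns 0 on success, -1 if scratch[] is too
 * small. The tail itself is not modified.
 */
int sketch_tail_msghdr(struct msghdr *msh, const struct sketch_tail *t,
		       struct iovec *scratch, size_t scratch_cnt,
		       void *name, socklen_t namelen)
{
	size_t off = t->off, n = 0, i;

	assert(scratch || !scratch_cnt);

	for (i = 0; i < t->cnt; i++) {
		size_t len = t->iov[i].iov_len;

		if (off >= len) {	/* buffer fully consumed, skip it */
			off -= len;
			continue;
		}
		if (n == scratch_cnt)
			return -1;

		scratch[n].iov_base = (char *)t->iov[i].iov_base + off;
		scratch[n].iov_len = len - off;
		off = 0;
		n++;
	}

	memset(msh, 0, sizeof(*msh));
	msh->msg_name = name;
	msh->msg_namelen = namelen;
	msh->msg_iov = scratch;
	msh->msg_iovlen = n;

	return 0;
}
```

The trade-off is an extra iovec array on the caller's stack, in
exchange for dropping the tail->off == 0 requirement.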
> + msh->msg_name = msg_name;
> + msh->msg_namelen = msg_namelen;
> + msh->msg_iov = (struct iovec *)tail->iov;
> + msh->msg_iovlen = tail->cnt;
> + msh->msg_control = NULL;
> + msh->msg_controllen = 0;
> + msh->msg_flags = 0;
> +}
> +
> /**
> * iov_tail_prune() - Remove any unneeded buffers from an IOV tail
> * @tail: IO vector tail (modified)
> diff --git a/iov.h b/iov.h
> index ccdb690ef3f1..75c3b07a87e3 100644
> --- a/iov.h
> +++ b/iov.h
> @@ -82,6 +82,8 @@ struct iov_tail {
> 1, \
> (off_))
>
> +void iov_tail_msghdr(struct msghdr *msh, struct iov_tail *tail,
> + void *msg_name, socklen_t msg_namelen);
> bool iov_tail_prune(struct iov_tail *tail);
> size_t iov_tail_size(struct iov_tail *tail);
> bool iov_tail_drop(struct iov_tail *tail, size_t len);
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v8 10/30] udp: Convert to iov_tail
2025-08-05 15:46 ` [PATCH v8 10/30] udp: " Laurent Vivier
@ 2025-08-06 2:23 ` David Gibson
0 siblings, 0 replies; 66+ messages in thread
From: David Gibson @ 2025-08-06 2:23 UTC (permalink / raw)
To: Laurent Vivier; +Cc: passt-dev
[-- Attachment #1: Type: text/plain, Size: 3151 bytes --]
On Tue, Aug 05, 2025 at 05:46:08PM +0200, Laurent Vivier wrote:
> Use packet_data() and extract headers using IOV_REMOVE_HEADER()
> and IOV_PEEK_HEADER() rather than packet_get().
>
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> iov.c | 1 -
> udp.c | 33 ++++++++++++++++++++++-----------
> 2 files changed, 22 insertions(+), 12 deletions(-)
>
> diff --git a/iov.c b/iov.c
> index 9d99beb32532..f519eb3cfeaf 100644
> --- a/iov.c
> +++ b/iov.c
> @@ -334,7 +334,6 @@ void *iov_remove_header_(struct iov_tail *tail, void *v, size_t len, size_t alig
> * iov array, a negative value if there is not enough room in the
> * destination iov array
> */
> -/* cppcheck-suppress unusedFunction */
> ssize_t iov_tail_clone(struct iovec *dst_iov, size_t dst_iov_cnt,
> struct iov_tail *tail)
> {
> diff --git a/udp.c b/udp.c
> index 75edc2054d4a..3c25f2e0ae97 100644
> --- a/udp.c
> +++ b/udp.c
> @@ -978,9 +978,11 @@ int udp_tap_handler(const struct ctx *c, uint8_t pif,
> struct mmsghdr mm[UIO_MAXIOV];
> union sockaddr_inany to_sa;
> struct iovec m[UIO_MAXIOV];
> + struct udphdr uh_storage;
> const struct udphdr *uh;
> struct udp_flow *uflow;
> - int i, s, count = 0;
> + int i, j, s, count = 0;
> + struct iov_tail data;
> flow_sidx_t tosidx;
> in_port_t src, dst;
> uint8_t topif;
> @@ -988,7 +990,10 @@ int udp_tap_handler(const struct ctx *c, uint8_t pif,
>
> ASSERT(!c->no_udp);
>
> - uh = packet_get(p, idx, 0, sizeof(*uh), NULL);
> + if (!packet_data(p, idx, &data))
> + return 1;
> +
> + uh = IOV_PEEK_HEADER(&data, uh_storage);
> if (!uh)
> return 1;
>
> @@ -1025,23 +1030,29 @@ int udp_tap_handler(const struct ctx *c, uint8_t pif,
>
> pif_sockaddr(c, &to_sa, &sl, topif, &toside->eaddr, toside->eport);
>
> - for (i = 0; i < (int)p->count - idx; i++) {
> - struct udphdr *uh_send;
> - size_t len;
> + for (i = 0, j = 0; i < (int)p->count - idx && j < UIO_MAXIOV; i++) {
> + const struct udphdr *uh_send;
>
> - uh_send = packet_get(p, idx + i, 0, sizeof(*uh), &len);
> + if (!packet_data(p, idx + i, &data))
> + return p->count - idx;
> +
> + uh_send = IOV_REMOVE_HEADER(&data, uh_storage);
> if (!uh_send)
> return p->count - idx;
>
> mm[i].msg_hdr.msg_name = &to_sa;
> mm[i].msg_hdr.msg_namelen = sl;
>
> - if (len) {
> - m[i].iov_base = (char *)(uh_send + 1);
> - m[i].iov_len = len;
> + if (data.cnt) {
> + int cnt;
> +
> + cnt = iov_tail_clone(&m[j], UIO_MAXIOV - j, &data);
> + if (cnt < 0)
> + return p->count - idx;
>
> - mm[i].msg_hdr.msg_iov = m + i;
> - mm[i].msg_hdr.msg_iovlen = 1;
> + mm[i].msg_hdr.msg_iov = &m[j];
> + mm[i].msg_hdr.msg_iovlen = cnt;
> + j += cnt;
> } else {
> mm[i].msg_hdr.msg_iov = NULL;
> mm[i].msg_hdr.msg_iovlen = 0;
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v8 11/30] tcp: Convert tcp_tap_handler() to use iov_tail
2025-08-05 15:46 ` [PATCH v8 11/30] tcp: Convert tcp_tap_handler() to use iov_tail Laurent Vivier
@ 2025-08-06 2:35 ` David Gibson
0 siblings, 0 replies; 66+ messages in thread
From: David Gibson @ 2025-08-06 2:35 UTC (permalink / raw)
To: Laurent Vivier; +Cc: passt-dev
[-- Attachment #1: Type: text/plain, Size: 3465 bytes --]
On Tue, Aug 05, 2025 at 05:46:09PM +0200, Laurent Vivier wrote:
> Use packet_data() and extract headers using IOV_REMOVE_HEADER()
> and iov_remove_header_() rather than packet_get().
>
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> tcp.c | 31 ++++++++++++++++++++++++-------
> 1 file changed, 24 insertions(+), 7 deletions(-)
>
> diff --git a/tcp.c b/tcp.c
> index 957b498db32d..f1048d7230c9 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -310,6 +310,16 @@
> #include "tcp_buf.h"
> #include "tcp_vu.h"
>
> +/*
> + * The size of TCP header (including options) is given by doff (Data Offset)
> + * that is a 4-bit value specifying the number of 32-bit words in the header.
> + * The maximum value of doff is 15 [(1 << 4) - 1].
> + * The maximum length in bytes of options is 15 minus the number of 32-bit
> + * words in the minimal TCP header (5) multiplied by the length of a 32-bit
> + * word (4).
> + */
> +#define OPTLEN_MAX (((1UL << 4) - 1 - 5) * 4UL)
> +
> #ifndef __USE_MISC
> /* From Linux UAPI, missing in netinet/tcp.h provided by musl */
> struct tcp_repair_opt {
> @@ -1957,8 +1967,11 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
> const struct pool *p, int idx, const struct timespec *now)
> {
> struct tcp_tap_conn *conn;
> + struct tcphdr th_storage;
> const struct tcphdr *th;
> - size_t optlen, len;
> + char optsc[OPTLEN_MAX];
> + struct iov_tail data;
> + size_t optlen, l4len;
> const char *opts;
> union flow *flow;
> flow_sidx_t sidx;
> @@ -1967,15 +1980,19 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
>
> (void)pif;
>
> - th = packet_get(p, idx, 0, sizeof(*th), &len);
> + if (!packet_data(p, idx, &data))
> + return 1;
> +
> + l4len = iov_tail_size(&data);
> +
> + th = IOV_REMOVE_HEADER(&data, th_storage);
> if (!th)
> return 1;
> - len += sizeof(*th);
>
> optlen = th->doff * 4UL - sizeof(*th);
> /* Static checkers might fail to see this: */
> - optlen = MIN(optlen, ((1UL << 4) /* from doff width */ - 6) * 4UL);
> - opts = packet_get(p, idx, sizeof(*th), optlen, NULL);
> + optlen = MIN(optlen, OPTLEN_MAX);
Pre-existing, but should we just drop packets with too many options,
rather than truncating the options?
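To make the alternative concrete, here is a small sketch of validating
the header length up front instead of clamping. Since doff is a 4-bit
field, doff * 4 can't exceed 60 bytes, so with a 20-byte base header
the MIN() in the hunk seems to matter mainly when doff < 5 makes the
unsigned subtraction wrap (or for static checkers). The function name
and its interface are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_TCP_HDR_MIN	20	/* bytes, i.e. doff == 5 */

/* Compute the option length from doff (in 32-bit words) and the L4
 * length. Returns false if the header is shorter than the minimum or
 * claims more bytes than the segment holds, so the caller can drop
 * the packet rather than parse truncated options.
 */
bool sketch_tcp_optlen(unsigned int doff, size_t l4len, size_t *optlen)
{
	size_t hlen = doff * 4UL;

	assert(optlen);

	if (hlen < SKETCH_TCP_HDR_MIN || hlen > l4len)
		return false;

	*optlen = hlen - SKETCH_TCP_HDR_MIN;
	return true;
}
```

A caller would return early on false, which is the "drop" behaviour,
and otherwise use *optlen (at most 40 for doff <= 15) as-is.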
> + opts = (char *)iov_remove_header_(&data, &optsc[0], optlen, 1);
>
> sidx = flow_lookup_af(c, IPPROTO_TCP, PIF_TAP, af, saddr, daddr,
> ntohs(th->source), ntohs(th->dest));
> @@ -1987,7 +2004,7 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
> tcp_conn_from_tap(c, af, saddr, daddr, th,
> opts, optlen, now);
> else
> - tcp_rst_no_conn(c, af, saddr, daddr, flow_lbl, th, len);
> + tcp_rst_no_conn(c, af, saddr, daddr, flow_lbl, th, l4len);
> return 1;
> }
>
> @@ -1995,7 +2012,7 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
> ASSERT(pif_at_sidx(sidx) == PIF_TAP);
> conn = &flow->tcp;
>
> - flow_trace(conn, "packet length %zu from tap", len);
> + flow_trace(conn, "packet length %zu from tap", l4len);
>
> if (th->rst) {
> conn_event(c, conn, CLOSED);
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v8 12/30] tcp: Convert tcp_data_from_tap() to use iov_tail
2025-08-05 15:46 ` [PATCH v8 12/30] tcp: Convert tcp_data_from_tap() " Laurent Vivier
@ 2025-08-06 2:37 ` David Gibson
0 siblings, 0 replies; 66+ messages in thread
From: David Gibson @ 2025-08-06 2:37 UTC (permalink / raw)
To: Laurent Vivier; +Cc: passt-dev
[-- Attachment #1: Type: text/plain, Size: 2344 bytes --]
On Tue, Aug 05, 2025 at 05:46:10PM +0200, Laurent Vivier wrote:
> Use packet_data() and extract headers using IOV_PEEK_HEADER()
> rather than packet_get().
>
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> tcp.c | 30 +++++++++++++++++++-----------
> 1 file changed, 19 insertions(+), 11 deletions(-)
>
> diff --git a/tcp.c b/tcp.c
> index f1048d7230c9..e0efc4cacb9b 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -1651,16 +1651,22 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn,
>
> for (i = idx, iov_i = 0; i < (int)p->count; i++) {
> uint32_t seq, seq_offset, ack_seq;
> + struct tcphdr th_storage;
> const struct tcphdr *th;
> - char *data;
> - size_t off;
> + struct iov_tail data;
> + size_t off, size;
> + int count;
>
> - th = packet_get(p, i, 0, sizeof(*th), &len);
> + if (!packet_data(p, i, &data))
> + return -1;
> +
> + th = IOV_PEEK_HEADER(&data, th_storage);
> if (!th)
> return -1;
> - len += sizeof(*th);
> + len = iov_tail_size(&data);
>
> off = th->doff * 4UL;
> +
> if (off < sizeof(*th) || off > len)
> return -1;
>
> @@ -1670,9 +1676,7 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn,
> }
>
> len -= off;
> - data = packet_get(p, i, off, len, NULL);
> - if (!data)
> - continue;
> + iov_tail_drop(&data, off);
>
> seq = ntohl(th->seq);
> if (SEQ_LT(seq, conn->seq_from_tap) && len <= 1) {
> @@ -1746,10 +1750,14 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn,
> continue;
> }
>
> - tcp_iov[iov_i].iov_base = data + seq_offset;
> - tcp_iov[iov_i].iov_len = len - seq_offset;
> - seq_from_tap += tcp_iov[iov_i].iov_len;
> - iov_i++;
> + iov_tail_drop(&data, seq_offset);
> + size = len - seq_offset;
> + count = iov_tail_clone(&tcp_iov[iov_i], UIO_MAXIOV - iov_i,
> + &data);
> + if (count < 0)
> + break;
> + seq_from_tap += size;
> + iov_i += count;
>
> if (keep == i)
> keep = -1;
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v8 16/30] dhcpv6: Use iov_tail in dhcpv6_opt()
2025-08-05 15:46 ` [PATCH v8 16/30] dhcpv6: Use iov_tail in dhcpv6_opt() Laurent Vivier
@ 2025-08-06 4:14 ` David Gibson
2025-08-08 13:59 ` Laurent Vivier
0 siblings, 1 reply; 66+ messages in thread
From: David Gibson @ 2025-08-06 4:14 UTC (permalink / raw)
To: Laurent Vivier; +Cc: passt-dev
[-- Attachment #1: Type: text/plain, Size: 13085 bytes --]
On Tue, Aug 05, 2025 at 05:46:14PM +0200, Laurent Vivier wrote:
> dhcpv6_opt() and its callers are refactored for iov_tail option parsing,
> replacing direct offset management for improved robustness.
>
> Its signature is now `bool dhcpv6_opt(iov_tail *data, type)`. `*data` (in/out)
> points to a found option on `true` return or is restored on `false`.
> The main dhcpv6() function uses IOV_REMOVE_HEADER for the msg_hdr, then
> passes the iov_tail (now at options start) to the new dhcpv6_opt().
>
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Hmm. I'm not sure this is a great use case for iov_tail - the code is
repeatedly scanning the same options, so there's a whole bunch of
rewinding. It works, but it seems awkward.
DHCP is a slow path, anyway, so maybe we'd be better off just
linearizing the entire packet and using plain old pointers to scan
through the options.
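As a sketch of the "plain old pointers" approach: once the options
area has been copied into a flat buffer (for example with
iov_to_buf(), which this patch already uses for the reply), a lookup
can restart from the beginning of the buffer at no cost. The helper
below is hypothetical and only assumes the standard DHCPv6 option
layout of a 2-byte code followed by a 2-byte length, both in network
order. Unlike dhcpv6_opt(), it takes the type in host order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_OPT_HDR_LEN	4	/* 2-byte code + 2-byte length */

/* Read a big-endian 16-bit value without alignment assumptions */
static uint16_t sketch_get_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Find the first option of the given type (host order) in a flat
 * buffer. Returns a pointer to the option header, or NULL if the
 * option is missing or an option would run past the buffer.
 */
const uint8_t *sketch_opt_find(const uint8_t *buf, size_t len, uint16_t type)
{
	size_t off = 0;

	assert(buf || !len);

	while (len - off >= SKETCH_OPT_HDR_LEN) {
		size_t opt_len = SKETCH_OPT_HDR_LEN +
				 sketch_get_be16(buf + off + 2);

		if (opt_len > len - off)
			return NULL;	/* malformed: truncated option */

		if (sketch_get_be16(buf + off) == type)
			return buf + off;

		off += opt_len;
	}

	return NULL;
}
```

Nested lookups, such as IA address options inside an IA_NA, would call
the same helper on the sub-range following the enclosing option's
fixed fields, with no tail to save and restore.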
> ---
> dhcpv6.c | 179 ++++++++++++++++++++++++++++++++-----------------------
> iov.c | 1 -
> 2 files changed, 104 insertions(+), 76 deletions(-)
>
> diff --git a/dhcpv6.c b/dhcpv6.c
> index ae06e646f92f..e93acaf9955e 100644
> --- a/dhcpv6.c
> +++ b/dhcpv6.c
> @@ -280,112 +280,125 @@ static struct resp_not_on_link_t {
>
> /**
> * dhcpv6_opt() - Get option from DHCPv6 message
> - * @p: Packet pool, single packet with UDP header
> - * @offset: Offset to look at, 0: end of header, set to option start
> + * @data: Buffer with options, set to matching option on return
> * @type: Option type to look up, network order
> *
> - * Return: pointer to option header, or NULL on malformed or missing option
> + * Return: true if found and @data points to the option header,
> + * or false on malformed or missing option and @data is
> + * unmodified.
> */
> -static struct opt_hdr *dhcpv6_opt(const struct pool *p, size_t *offset,
> - uint16_t type)
> +static bool dhcpv6_opt(struct iov_tail *data, uint16_t type)
> {
> - struct opt_hdr *o;
> - size_t left;
> + struct iov_tail head = *data;
> + struct opt_hdr o_storage;
> + const struct opt_hdr *o;
>
> - ASSERT(*offset >= UDP_MSG_HDR_SIZE);
> -
> - while ((o = packet_get_try(p, 0, *offset, sizeof(*o), &left))) {
> + while ((o = IOV_PEEK_HEADER(data, o_storage))) {
> unsigned int opt_len = ntohs(o->l) + sizeof(*o);
>
> - if (ntohs(o->l) > left)
> - return NULL;
> + if (opt_len > iov_tail_size(data))
> + break;
>
> if (o->t == type)
> - return o;
> + return true;
>
> - *offset += opt_len;
> + iov_tail_drop(data, opt_len);
> }
>
> - return NULL;
> + *data = head;
> + return false;
> }
>
> /**
> * dhcpv6_ia_notonlink() - Check if any IA contains non-appropriate addresses
> - * @p: Packet pool, single packet starting from UDP header
> + * @data: Data to look at, packet starting from UDP header (input/output)
> * @la: Address we want to lease to the client
> *
> - * Return: pointer to non-appropriate IA_NA or IA_TA, if any, NULL otherwise
> + * Return: true and @data points to non-appropriate IA_NA or IA_TA, if any,
> + * false otherwise and @data is unmodified
> */
> -static struct opt_hdr *dhcpv6_ia_notonlink(const struct pool *p,
> - struct in6_addr *la)
> +static bool dhcpv6_ia_notonlink(struct iov_tail *data,
> + struct in6_addr *la)
> {
> int ia_types[2] = { OPT_IA_NA, OPT_IA_TA }, *ia_type;
> + struct opt_ia_addr opt_addr_storage;
> const struct opt_ia_addr *opt_addr;
> + struct iov_tail current, ia_base;
> + struct opt_ia_na ia_storage;
> char buf[INET6_ADDRSTRLEN];
> + const struct opt_ia_na *ia;
> struct in6_addr req_addr;
> + struct opt_hdr h_storage;
> const struct opt_hdr *h;
> - struct opt_hdr *ia;
> - size_t offset;
>
> foreach(ia_type, ia_types) {
> - offset = UDP_MSG_HDR_SIZE;
> - while ((ia = dhcpv6_opt(p, &offset, *ia_type))) {
> - if (ntohs(ia->l) < OPT_VSIZE(ia_na))
> - return NULL;
> -
> - offset += sizeof(struct opt_ia_na);
> + current = *data;
> + while (dhcpv6_opt(&current, *ia_type)) {
> + ia_base = current;
> + ia = IOV_REMOVE_HEADER(&current, ia_storage);
> + if (!ia || ntohs(ia->hdr.l) < OPT_VSIZE(ia_na))
> + goto notfound;
> +
> + while (dhcpv6_opt(&current, OPT_IAAADR)) {
> + h = IOV_PEEK_HEADER(&current, h_storage);
> + if (!h || ntohs(h->l) != OPT_VSIZE(ia_addr))
> + goto notfound;
> +
> + opt_addr = IOV_REMOVE_HEADER(&current,
> + opt_addr_storage);
> + if (!opt_addr)
> + goto notfound;
>
> - while ((h = dhcpv6_opt(p, &offset, OPT_IAAADR))) {
> - if (ntohs(h->l) != OPT_VSIZE(ia_addr))
> - return NULL;
> -
> - opt_addr = (const struct opt_ia_addr *)h;
> req_addr = opt_addr->addr;
> if (!IN6_ARE_ADDR_EQUAL(la, &req_addr))
> - goto err;
> -
> - offset += sizeof(struct opt_ia_addr);
> + goto notonlink;
> }
> }
> }
>
> - return NULL;
> +notfound:
> + return false;
>
> -err:
> +notonlink:
> info("DHCPv6: requested address %s not on link",
> inet_ntop(AF_INET6, &req_addr, buf, sizeof(buf)));
> - return ia;
> + *data = ia_base;
> + return true;
> }
>
> /**
> * dhcpv6_send_ia_notonlink() - Send NotOnLink status
> - * @c: Execution context
> - * @ia: Pointer to non-appropriate IA_NA or IA_TA
> - * @client_id: Client ID message option
> - * xid: Transaction ID for message exchange
> + * @c: Execution context
> + * @ia_base: Non-appropriate IA_NA or IA_TA base
> + * @client_id_base: Client ID message option base
> + * @len: Client ID length
> + * @xid: Transaction ID for message exchange
> */
> -static void dhcpv6_send_ia_notonlink(struct ctx *c, struct opt_hdr *ia,
> - const struct opt_hdr *client_id,
> - uint32_t xid)
> +static void dhcpv6_send_ia_notonlink(struct ctx *c,
> + const struct iov_tail *ia_base,
> + const struct iov_tail *client_id_base,
> + int len, uint32_t xid)
> {
> const struct in6_addr *src = &c->ip6.our_tap_ll;
> + struct opt_hdr *ia = (struct opt_hdr *)resp_not_on_link.var;
> size_t n;
>
> info("DHCPv6: received CONFIRM with inappropriate IA,"
> " sending NotOnLink status in REPLY");
>
> - ia->l = htons(OPT_VSIZE(ia_na) + sizeof(sc_not_on_link));
> -
> n = sizeof(struct opt_ia_na);
> - memcpy(resp_not_on_link.var, ia, n);
> + iov_to_buf(&ia_base->iov[0], ia_base->cnt, ia_base->off,
> + resp_not_on_link.var, n);
> + ia->l = htons(OPT_VSIZE(ia_na) + sizeof(sc_not_on_link));
> memcpy(resp_not_on_link.var + n, &sc_not_on_link,
> sizeof(sc_not_on_link));
>
> n += sizeof(sc_not_on_link);
> - memcpy(resp_not_on_link.var + n, client_id,
> - sizeof(struct opt_hdr) + ntohs(client_id->l));
> + iov_to_buf(&client_id_base->iov[0], client_id_base->cnt,
> + client_id_base->off, resp_not_on_link.var + n,
> + sizeof(struct opt_hdr) + len);
>
> - n += sizeof(struct opt_hdr) + ntohs(client_id->l);
> + n += sizeof(struct opt_hdr) + len;
>
> n = offsetof(struct resp_not_on_link_t, var) + n;
>
> @@ -474,17 +487,19 @@ search:
>
> /**
> * dhcpv6_client_fqdn_fill() - Fill in client FQDN option
> + * @data: Data to look at
> * @c: Execution context
> * @buf: Response message buffer where options will be appended
> * @offset: Offset in message buffer for new options
> *
> * Return: updated length of response message buffer.
> */
> -static size_t dhcpv6_client_fqdn_fill(const struct pool *p, const struct ctx *c,
> +static size_t dhcpv6_client_fqdn_fill(const struct iov_tail *data,
> + const struct ctx *c,
> char *buf, int offset)
>
> {
> - struct opt_client_fqdn const *req_opt;
> + struct iov_tail current = *data;
> struct opt_client_fqdn *o;
> size_t opt_len;
>
> @@ -502,14 +517,16 @@ static size_t dhcpv6_client_fqdn_fill(const struct pool *p, const struct ctx *c,
> }
>
> o = (struct opt_client_fqdn *)(buf + offset);
> + o->flags = 0x00;
> encode_domain_name(o->domain_name, c->fqdn);
> - req_opt = (struct opt_client_fqdn *)dhcpv6_opt(p,
> - &(size_t){ UDP_MSG_HDR_SIZE },
> - OPT_CLIENT_FQDN);
> - if (req_opt && req_opt->flags & 0x01 /* S flag */)
> - o->flags = 0x02 /* O flag */;
> - else
> - o->flags = 0x00;
> + if (dhcpv6_opt(&current, OPT_CLIENT_FQDN)) {
> + struct opt_client_fqdn req_opt_storage;
> + struct opt_client_fqdn const *req_opt;
> +
> + req_opt = IOV_PEEK_HEADER(&current, req_opt_storage);
> + if (req_opt && req_opt->flags & 0x01 /* S flag */)
> + o->flags = 0x02 /* O flag */;
> + }
>
> opt_len++;
>
> @@ -531,14 +548,18 @@ static size_t dhcpv6_client_fqdn_fill(const struct pool *p, const struct ctx *c,
> int dhcpv6(struct ctx *c, const struct pool *p,
> const struct in6_addr *saddr, const struct in6_addr *daddr)
> {
> - const struct opt_hdr *client_id, *server_id, *ia;
> + const struct opt_server_id *server_id = NULL;
> + struct iov_tail data, opt, client_id_base;
> + const struct opt_hdr *client_id = NULL;
> + struct opt_server_id server_id_storage;
> + const struct opt_ia_na *ia = NULL;
> + struct opt_hdr client_id_storage;
> + struct opt_ia_na ia_storage;
> const struct in6_addr *src;
> struct msg_hdr mh_storage;
> const struct msg_hdr *mh;
> struct udphdr uh_storage;
> const struct udphdr *uh;
> - struct opt_hdr *bad_ia;
> - struct iov_tail data;
> size_t mlen, n;
>
> if (!packet_data(p, 0, &data))
> @@ -565,20 +586,26 @@ int dhcpv6(struct ctx *c, const struct pool *p,
>
> src = &c->ip6.our_tap_ll;
>
> - mh = IOV_PEEK_HEADER(&data, mh_storage);
> + mh = IOV_REMOVE_HEADER(&data, mh_storage);
> if (!mh)
> return -1;
>
> - client_id = dhcpv6_opt(p, &(size_t){ UDP_MSG_HDR_SIZE }, OPT_CLIENTID);
> + client_id_base = data;
> + if (dhcpv6_opt(&client_id_base, OPT_CLIENTID))
> + client_id = IOV_PEEK_HEADER(&client_id_base, client_id_storage);
> if (!client_id || ntohs(client_id->l) > OPT_VSIZE(client_id))
> return -1;
>
> - server_id = dhcpv6_opt(p, &(size_t){ UDP_MSG_HDR_SIZE }, OPT_SERVERID);
> - if (server_id && ntohs(server_id->l) != OPT_VSIZE(server_id))
> + opt = data;
> + if (dhcpv6_opt(&opt, OPT_SERVERID))
> + server_id = IOV_PEEK_HEADER(&opt, server_id_storage);
> + if (server_id && ntohs(server_id->hdr.l) != OPT_VSIZE(server_id))
> return -1;
>
> - ia = dhcpv6_opt(p, &(size_t){ UDP_MSG_HDR_SIZE }, OPT_IA_NA);
> - if (ia && ntohs(ia->l) < MIN(OPT_VSIZE(ia_na), OPT_VSIZE(ia_ta)))
> + opt = data;
> + if (dhcpv6_opt(&opt, OPT_IA_NA))
> + ia = IOV_PEEK_HEADER(&opt, ia_storage);
> + if (ia && ntohs(ia->hdr.l) < MIN(OPT_VSIZE(ia_na), OPT_VSIZE(ia_ta)))
> return -1;
>
> resp.hdr.type = TYPE_REPLY;
> @@ -593,9 +620,10 @@ int dhcpv6(struct ctx *c, const struct pool *p,
> if (mh->type == TYPE_CONFIRM && server_id)
> return -1;
>
> - if ((bad_ia = dhcpv6_ia_notonlink(p, &c->ip6.addr))) {
> + if (dhcpv6_ia_notonlink(&data, &c->ip6.addr)) {
>
> - dhcpv6_send_ia_notonlink(c, bad_ia, client_id, mh->xid);
> + dhcpv6_send_ia_notonlink(c, &data, &client_id_base,
> + ntohs(client_id->l), mh->xid);
>
> return 1;
> }
> @@ -607,7 +635,7 @@ int dhcpv6(struct ctx *c, const struct pool *p,
> memcmp(&resp.server_id, server_id, sizeof(resp.server_id)))
> return -1;
>
> - if (ia || dhcpv6_opt(p, &(size_t){ UDP_MSG_HDR_SIZE }, OPT_IA_TA))
> + if (ia || dhcpv6_opt(&data, OPT_IA_TA))
> return -1;
>
> info("DHCPv6: received INFORMATION_REQUEST, sending REPLY");
> @@ -633,13 +661,14 @@ int dhcpv6(struct ctx *c, const struct pool *p,
> if (ia)
> resp.ia_na.iaid = ((struct opt_ia_na *)ia)->iaid;
>
> - memcpy(&resp.client_id, client_id,
> - ntohs(client_id->l) + sizeof(struct opt_hdr));
> + iov_to_buf(&client_id_base.iov[0], client_id_base.cnt,
> + client_id_base.off, &resp.client_id,
> + ntohs(client_id->l) + sizeof(struct opt_hdr));
>
> n = offsetof(struct resp_t, client_id) +
> sizeof(struct opt_hdr) + ntohs(client_id->l);
> n = dhcpv6_dns_fill(c, (char *)&resp, n);
> - n = dhcpv6_client_fqdn_fill(p, c, (char *)&resp, n);
> + n = dhcpv6_client_fqdn_fill(&data, c, (char *)&resp, n);
>
> resp.hdr.xid = mh->xid;
>
> diff --git a/iov.c b/iov.c
> index f519eb3cfeaf..d17d4dd3da09 100644
> --- a/iov.c
> +++ b/iov.c
> @@ -109,7 +109,6 @@ size_t iov_from_buf(const struct iovec *iov, size_t iov_cnt,
> *
> * Return: the number of bytes successfully copied.
> */
> -/* cppcheck-suppress [staticFunction] */
> size_t iov_to_buf(const struct iovec *iov, size_t iov_cnt,
> size_t offset, void *buf, size_t bytes)
> {
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
* Re: [PATCH v8 17/30] dhcp: Convert to iov_tail
2025-08-05 15:46 ` [PATCH v8 17/30] dhcp: Convert to iov_tail Laurent Vivier
@ 2025-08-06 4:38 ` David Gibson
2025-08-08 9:33 ` Laurent Vivier
0 siblings, 1 reply; 66+ messages in thread
From: David Gibson @ 2025-08-06 4:38 UTC (permalink / raw)
To: Laurent Vivier; +Cc: passt-dev
On Tue, Aug 05, 2025 at 05:46:15PM +0200, Laurent Vivier wrote:
> Use packet_data() and extract headers using IOV_REMOVE_HEADER()
> and IOV_PEEK_HEADER() rather than packet_get().
Unlike the previous patch, I think using iov_tail does work here,
because there's a single scan through the options, rather than
repeatedly scanning for options of specific types.
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> ---
> dhcp.c | 46 ++++++++++++++++++++++++++++------------------
> 1 file changed, 28 insertions(+), 18 deletions(-)
>
> diff --git a/dhcp.c b/dhcp.c
> index b0de04be6f27..cf73d4b07767 100644
> --- a/dhcp.c
> +++ b/dhcp.c
> @@ -302,27 +302,33 @@ static void opt_set_dns_search(const struct ctx *c, size_t max_len)
> */
> int dhcp(const struct ctx *c, const struct pool *p)
> {
> - size_t mlen, dlen, offset = 0, opt_len, opt_off = 0;
> char macstr[ETH_ADDRSTRLEN];
> + size_t mlen, dlen, opt_len;
> struct in_addr mask, dst;
> + struct ethhdr eh_storage;
> + struct iphdr iph_storage;
> + struct udphdr uh_storage;
> const struct ethhdr *eh;
> const struct iphdr *iph;
> const struct udphdr *uh;
> + struct iov_tail data;
> struct msg const *m;
Pre-existing, but I'm a bit baffled as to what the (const *) is doing here.
> struct msg reply;
> unsigned int i;
> + struct msg m_storage;
>
> - eh = packet_get(p, 0, offset, sizeof(*eh), NULL);
> - offset += sizeof(*eh);
> + if (!packet_data(p, 0, &data))
> + return -1;
>
> - iph = packet_get(p, 0, offset, sizeof(*iph), NULL);
> + eh = IOV_REMOVE_HEADER(&data, eh_storage);
> + iph = IOV_PEEK_HEADER(&data, iph_storage);
> if (!eh || !iph)
> return -1;
>
> - offset += iph->ihl * 4UL;
> - uh = packet_get(p, 0, offset, sizeof(*uh), &mlen);
> - offset += sizeof(*uh);
> + if (!iov_tail_drop(&data, iph->ihl * 4UL))
> + return -1;
>
> + uh = IOV_REMOVE_HEADER(&data, uh_storage);
> if (!uh)
> return -1;
>
> @@ -332,7 +338,10 @@ int dhcp(const struct ctx *c, const struct pool *p)
> if (c->no_dhcp)
> return 1;
>
> - m = packet_get(p, 0, offset, offsetof(struct msg, o), &opt_len);
> + mlen = iov_tail_size(&data);
> + m = (struct msg const *)iov_remove_header_(&data, &m_storage,
> + offsetof(struct msg, o),
> + __alignof__(struct msg));
> if (!m ||
> mlen != ntohs(uh->len) - sizeof(*uh) ||
> mlen < offsetof(struct msg, o) ||
> @@ -355,27 +364,28 @@ int dhcp(const struct ctx *c, const struct pool *p)
> memset(&reply.file, 0, sizeof(reply.file));
> reply.magic = m->magic;
>
> - offset += offsetof(struct msg, o);
> -
> for (i = 0; i < ARRAY_SIZE(opts); i++)
> opts[i].clen = -1;
>
> - while (opt_off + 2 < opt_len) {
> - const uint8_t *olen, *val;
> + opt_len = iov_tail_size(&data);
> + while (opt_len >= 2) {
> + uint8_t olen_storage, type_storage;
> + const uint8_t *olen;
> uint8_t *type;
>
> - type = packet_get(p, 0, offset + opt_off, 1, NULL);
> - olen = packet_get(p, 0, offset + opt_off + 1, 1, NULL);
> + type = IOV_REMOVE_HEADER(&data, type_storage);
> + olen = IOV_REMOVE_HEADER(&data, olen_storage);
It seems a bit mad to access single bytes via 8-byte pointers, but
it's probably not worth the hassle of handling it differently in this
one case.
> if (!type || !olen)
> return -1;
>
> - val = packet_get(p, 0, offset + opt_off + 2, *olen, NULL);
> - if (!val)
> + opt_len = iov_tail_size(&data);
> + if (opt_len < *olen)
> return -1;
>
> - memcpy(&opts[*type].c, val, *olen);
> + iov_to_buf(&data.iov[0], data.cnt, data.off, &opts[*type].c, *olen);
So, IIUC, if *olen is much too big, this is still safe..
> opts[*type].clen = *olen;
.. but recording *olen unedited as the length of the option is
probably wrong in that case.
> - opt_off += *olen + 2;
> + iov_tail_drop(&data, *olen);
> + opt_len -= *olen;
Isn't the stanza above doing the equivalent of an
iov_remove_header_()?
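To make the points above concrete, here is a minimal, self-contained sketch of a single-pass option scan over a flat buffer. It is not passt code: parse_opts_sketch(), struct opt_sketch and OPT_MAX are hypothetical names, and a plain byte array stands in for the iov_tail. It assumes, as the quoted loop does, that every option is encoded as type, length, value, and it records a length only after checking that the value fits in the remaining data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OPT_MAX 255	/* largest value a one-byte length can describe */

/* Hypothetical stand-in for one entry of an option table */
struct opt_sketch {
	int clen;		/* -1 if option absent, else stored length */
	uint8_t c[OPT_MAX];	/* option value as sent by the client */
};

/*
 * Scan type/length/value options from a flat buffer in one pass.
 * Returns 0 on success, -1 if an option claims more bytes than remain.
 */
static int parse_opts_sketch(const uint8_t *buf, size_t len,
			     struct opt_sketch opts[256])
{
	size_t off = 0;
	int i;

	assert(buf && opts);

	for (i = 0; i < 256; i++)
		opts[i].clen = -1;

	while (len - off >= 2) {
		uint8_t type = buf[off];
		uint8_t olen = buf[off + 1];

		off += 2;
		if (len - off < olen)	/* length byte overruns the data */
			return -1;

		memcpy(opts[type].c, buf + off, olen);
		opts[type].clen = olen;	/* recorded only after the check */
		off += olen;
	}

	return 0;
}
```

Because the length check happens before both the copy and the assignment to clen, the stored length can never exceed the bytes actually copied. Whether the real code should reject or truncate an oversized option is a design decision for the patch author, not something this sketch settles.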
> }
>
> opts[80].slen = -1;
* Re: [PATCH v8 18/30] ip: Use iov_tail in ipv6_l4hdr()
2025-08-05 15:46 ` [PATCH v8 18/30] ip: Use iov_tail in ipv6_l4hdr() Laurent Vivier
@ 2025-08-06 5:12 ` David Gibson
0 siblings, 0 replies; 66+ messages in thread
From: David Gibson @ 2025-08-06 5:12 UTC (permalink / raw)
To: Laurent Vivier; +Cc: passt-dev
On Tue, Aug 05, 2025 at 05:46:16PM +0200, Laurent Vivier wrote:
> Use packet_data() and extract headers using IOV_REMOVE_HEADER()
> and IOV_PEEK_HEADER() rather than packet_get().
>
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> ---
> ip.c | 32 +++++++++++++++-----------------
> ip.h | 3 +--
> packet.c | 1 +
> tap.c | 4 +++-
> 4 files changed, 20 insertions(+), 20 deletions(-)
>
> diff --git a/ip.c b/ip.c
> index 2cc7f6548aff..50bd69a70596 100644
> --- a/ip.c
> +++ b/ip.c
> @@ -23,50 +23,48 @@
>
> /**
> * ipv6_l4hdr() - Find pointer to L4 header in IPv6 packet and extract protocol
> - * @p: Packet pool, packet number @idx has IPv6 header at @offset
> - * @idx: Index of packet in pool
> - * @offset: Pre-calculated IPv6 header offset
> + * @data: IPv6 packet
> * @proto: Filled with L4 protocol number
> * @dlen: Data length (payload excluding header extensions), set on return
> *
> - * Return: pointer to L4 header, NULL if not found
> + * Return: true if the L4 header is found and @data, @proto, @dlen are set,
> + * false on error. Outputs are indeterminate on failure.
> */
> -char *ipv6_l4hdr(const struct pool *p, int idx, size_t offset, uint8_t *proto,
> - size_t *dlen)
> +bool ipv6_l4hdr(struct iov_tail *data, uint8_t *proto, size_t *dlen)
> {
> + struct ipv6_opt_hdr o_storage;
> const struct ipv6_opt_hdr *o;
> + struct ipv6hdr ip6h_storage;
> const struct ipv6hdr *ip6h;
> - char *base;
> int hdrlen;
> uint8_t nh;
>
> - base = packet_get(p, idx, 0, 0, NULL);
> - ip6h = packet_get(p, idx, offset, sizeof(*ip6h), dlen);
> + ip6h = IOV_REMOVE_HEADER(data, ip6h_storage);
> if (!ip6h)
> - return NULL;
> -
> - offset += sizeof(*ip6h);
> + return false;
> + *dlen = iov_tail_size(data);
>
> nh = ip6h->nexthdr;
> if (!IPV6_NH_OPT(nh))
> goto found;
>
> - while ((o = packet_get_try(p, idx, offset, sizeof(*o), dlen))) {
> + while ((o = IOV_PEEK_HEADER(data, o_storage))) {
> + *dlen = iov_tail_size(data) - sizeof(*o);
I don't think this is quite right. This removes the option header
from the total, but not the option body (if any). It will be
corrected on the next loop, via iov_tail_size() - at the cost of
rescanning the IOV. However, I think that means the body of the last
option will be incorrectly included in dlen.
You could update *dlen incrementally, but AFAICT you don't need it
internally, so you could just compute from iov_tail_size() after the
found label.
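As an illustration of the second suggestion, the following sketch walks extension headers over a flat buffer and computes the payload length exactly once, after the walk. It is a simplified stand-in, not the logic of the quoted patch: l4_find_sketch(), nh_is_opt() and NH_NONE are hypothetical names, the buffer is assumed to start just past the fixed IPv6 header, and only hop-by-hop (0), routing (43) and destination options (60) headers are treated as skippable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NH_NONE 59	/* IPv6 "No Next Header" */

/* Simplified: hop-by-hop (0), routing (43), destination options (60) */
static bool nh_is_opt(uint8_t nh)
{
	return nh == 0 || nh == 43 || nh == 60;
}

/*
 * Flat-buffer sketch of an extension header walk. @buf points just past
 * the fixed IPv6 header, @nh is that header's next-header field.
 * On success, *off is the L4 header offset and *dlen the L4 length.
 */
static bool l4_find_sketch(const uint8_t *buf, size_t len, uint8_t nh,
			   uint8_t *proto, size_t *off, size_t *dlen)
{
	size_t pos = 0;

	assert(buf && proto && off && dlen);

	while (nh_is_opt(nh)) {
		size_t hdrlen;

		if (len - pos < 2)
			return false;

		/* Length is in 8-byte units, not counting the first 8 bytes */
		hdrlen = ((size_t)buf[pos + 1] + 1) * 8;
		if (len - pos < hdrlen)
			return false;

		nh = buf[pos];		/* next-header field of this header */
		pos += hdrlen;		/* skip its header and body together */
	}

	if (nh == NH_NONE)
		return false;

	*proto = nh;
	*off = pos;
	*dlen = len - pos;	/* computed once, after all headers are gone */
	return true;
}
```

Since the length is derived from what remains after the loop, there is no intermediate value that could include the body of the last extension header.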
> nh = o->nexthdr;
> hdrlen = (o->hdrlen + 1) * 8;
>
> if (IPV6_NH_OPT(nh))
> - offset += hdrlen;
> + iov_tail_drop(data, hdrlen);
> else
> goto found;
> }
>
> - return NULL;
> + return false;
>
> found:
> if (nh == 59)
Pre-existing: it'd be nice to have a name for that bare 59.
> - return NULL;
> + return false;
>
> *proto = nh;
> - return base + offset;
> + return true;
> }
> diff --git a/ip.h b/ip.h
> index 24509d9c11cd..5830b92302e2 100644
> --- a/ip.h
> +++ b/ip.h
> @@ -115,8 +115,7 @@ static inline uint32_t ip6_get_flow_lbl(const struct ipv6hdr *ip6h)
> ip6h->flow_lbl[2];
> }
>
> -char *ipv6_l4hdr(const struct pool *p, int idx, size_t offset, uint8_t *proto,
> - size_t *dlen);
> +bool ipv6_l4hdr(struct iov_tail *data, uint8_t *proto, size_t *dlen);
>
> /* IPv6 link-local all-nodes multicast address, ff02::1 */
> static const struct in6_addr in6addr_ll_all_nodes = {
> diff --git a/packet.c b/packet.c
> index 34b1722b9a03..014b353cdf8b 100644
> --- a/packet.c
> +++ b/packet.c
> @@ -133,6 +133,7 @@ void packet_add_do(struct pool *p, struct iov_tail *data,
> *
> * Return: pointer to start of data range, NULL on invalid range or descriptor
> */
> +/* cppcheck-suppress [staticFunction] */
> void *packet_get_try_do(const struct pool *p, size_t idx, size_t offset,
> size_t len, size_t *left, const char *func, int line)
> {
> diff --git a/tap.c b/tap.c
> index 8d2b118152f1..d7852fad6069 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -911,8 +911,10 @@ resume:
> if (plen != check)
> continue;
>
> - if (!(l4h = ipv6_l4hdr(in, i, sizeof(*eh), &proto, &l4len)))
> + data = IOV_TAIL_FROM_BUF(ip6h, sizeof(*ip6h) + check, 0);
> + if (!ipv6_l4hdr(&data, &proto, &l4len))
> continue;
> + l4h = (char *)data.iov[0].iov_base + data.off;
>
> if (IN6_IS_ADDR_LOOPBACK(saddr) || IN6_IS_ADDR_LOOPBACK(daddr)) {
> char sstr[INET6_ADDRSTRLEN], dstr[INET6_ADDRSTRLEN];
* Re: [PATCH v8 19/30] tap: Convert tap4_handler() to iov_tail
2025-08-05 15:46 ` [PATCH v8 19/30] tap: Convert tap4_handler() to iov_tail Laurent Vivier
@ 2025-08-06 5:17 ` David Gibson
0 siblings, 0 replies; 66+ messages in thread
From: David Gibson @ 2025-08-06 5:17 UTC (permalink / raw)
To: Laurent Vivier; +Cc: passt-dev
On Tue, Aug 05, 2025 at 05:46:17PM +0200, Laurent Vivier wrote:
> Use packet_data() and extract headers using IOV_PEEK_HEADER()
> rather than packet_get().
>
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> tap.c | 33 ++++++++++++++++++++-------------
> 1 file changed, 20 insertions(+), 13 deletions(-)
>
> diff --git a/tap.c b/tap.c
> index d7852fad6069..4fbcad3b385f 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -706,28 +706,34 @@ static int tap4_handler(struct ctx *c, const struct pool *in,
> i = 0;
> resume:
> for (seq_count = 0, seq = NULL; i < in->count; i++) {
> - size_t l2len, l3len, hlen, l4len;
> + size_t l3len, hlen, l4len;
> + struct ethhdr eh_storage;
> + struct iphdr iph_storage;
> + struct udphdr uh_storage;
> const struct ethhdr *eh;
> const struct udphdr *uh;
> struct iov_tail data;
> struct iphdr *iph;
> - const char *l4h;
>
> - packet_get(in, i, 0, 0, &l2len);
> + if (!packet_data(in, i, &data))
> + continue;
>
> - eh = packet_get(in, i, 0, sizeof(*eh), &l3len);
> + eh = IOV_PEEK_HEADER(&data, eh_storage);
> if (!eh)
> continue;
> if (ntohs(eh->h_proto) == ETH_P_ARP) {
> PACKET_POOL_P(pkt, 1, in->buf, in->buf_size);
>
> - data = IOV_TAIL_FROM_BUF((void *)eh, l2len, 0);
> packet_add(pkt, &data);
> arp(c, pkt);
> continue;
> }
>
> - iph = packet_get(in, i, sizeof(*eh), sizeof(*iph), NULL);
> + if (!iov_tail_drop(&data, sizeof(*eh)))
> + continue;
> + l3len = iov_tail_size(&data);
> +
> + iph = IOV_PEEK_HEADER(&data, iph_storage);
> if (!iph)
> continue;
>
> @@ -755,8 +761,9 @@ resume:
> if (iph->saddr && c->ip4.addr_seen.s_addr != iph->saddr)
> c->ip4.addr_seen.s_addr = iph->saddr;
>
> - l4h = packet_get(in, i, sizeof(*eh) + hlen, l4len, NULL);
> - if (!l4h)
> + if (!iov_tail_drop(&data, hlen))
> + continue;
> + if (iov_tail_size(&data) != l4len)
> continue;
>
> if (iph->protocol == IPPROTO_ICMP) {
> @@ -767,7 +774,6 @@ resume:
>
> tap_packet_debug(iph, NULL, NULL, 0, NULL, 1);
>
> - data = IOV_TAIL_FROM_BUF((void *)l4h, l4len, 0);
> packet_add(pkt, &data);
> icmp_tap_handler(c, PIF_TAP, AF_INET,
> &iph->saddr, &iph->daddr,
> @@ -775,15 +781,17 @@ resume:
> continue;
> }
>
> - uh = packet_get(in, i, sizeof(*eh) + hlen, sizeof(*uh), NULL);
> + uh = IOV_PEEK_HEADER(&data, uh_storage);
> if (!uh)
> continue;
>
> if (iph->protocol == IPPROTO_UDP) {
> + struct iov_tail eh_data;
> +
> PACKET_POOL_P(pkt, 1, in->buf, in->buf_size);
>
> - data = IOV_TAIL_FROM_BUF((void *)eh, l2len, 0);
> - packet_add(pkt, &data);
> + packet_data(in, i, &eh_data);
> + packet_add(pkt, &eh_data);
> if (dhcp(c, pkt))
> continue;
> }
> @@ -834,7 +842,6 @@ resume:
> #undef L4_SET
>
> append:
> - data = IOV_TAIL_FROM_BUF((void *)l4h, l4len, 0);
> packet_add((struct pool *)&seq->p, &data);
> }
>
* Re: [PATCH v8 20/30] tap: Convert tap6_handler() to iov_tail
2025-08-05 15:46 ` [PATCH v8 20/30] tap: Convert tap6_handler() " Laurent Vivier
@ 2025-08-06 6:21 ` David Gibson
2025-08-08 13:57 ` Laurent Vivier
0 siblings, 1 reply; 66+ messages in thread
From: David Gibson @ 2025-08-06 6:21 UTC (permalink / raw)
To: Laurent Vivier; +Cc: passt-dev
On Tue, Aug 05, 2025 at 05:46:18PM +0200, Laurent Vivier wrote:
> Use packet_data() and extract headers using IOV_REMOVE_HEADER()
> and IOV_PEEK_HEADER() rather than packet_get().
>
> Remove packet_get() as it is not used anymore.
[snip]
> @@ -896,21 +896,28 @@ resume:
> for (seq_count = 0, seq = NULL; i < in->count; i++) {
> size_t l4len, plen, check;
> struct in6_addr *saddr, *daddr;
> + struct ipv6hdr ip6h_storage;
> + struct ethhdr eh_storage;
> + struct udphdr uh_storage;
> const struct ethhdr *eh;
> const struct udphdr *uh;
> struct iov_tail data;
> struct ipv6hdr *ip6h;
> uint8_t proto;
> - char *l4h;
>
> - eh = packet_get(in, i, 0, sizeof(*eh), NULL);
> + if (!packet_data(in, i, &data))
> + return -1;
> +
> + eh = IOV_REMOVE_HEADER(&data, eh_storage);
> if (!eh)
> continue;
>
> - ip6h = packet_get(in, i, sizeof(*eh), sizeof(*ip6h), &check);
> + ip6h = IOV_PEEK_HEADER(&data, ip6h_storage);
> if (!ip6h)
> continue;
You peek the IPv6 header here, but I haven't spotted where you remove
/ drop it before...
> + check = iov_tail_size(&data) - sizeof(*ip6h);
> +
> saddr = &ip6h->saddr;
> daddr = &ip6h->daddr;
>
> @@ -918,10 +925,8 @@ resume:
> if (plen != check)
> continue;
>
> - data = IOV_TAIL_FROM_BUF(ip6h, sizeof(*ip6h) + check, 0);
> if (!ipv6_l4hdr(&data, &proto, &l4len))
> continue;
> - l4h = (char *)data.iov[0].iov_base + data.off;
>
> if (IN6_IS_ADDR_LOOPBACK(saddr) || IN6_IS_ADDR_LOOPBACK(daddr)) {
> char sstr[INET6_ADDRSTRLEN], dstr[INET6_ADDRSTRLEN];
> @@ -946,6 +951,8 @@ resume:
> }
>
> if (proto == IPPROTO_ICMPV6) {
> + struct icmp6hdr l4h_storage;
> + const struct icmp6hdr *l4h;
> PACKET_POOL_P(pkt, 1, in->buf, in->buf_size);
>
> if (c->no_icmp)
> @@ -954,9 +961,9 @@ resume:
> if (l4len < sizeof(struct icmp6hdr))
> continue;
>
> - data = IOV_TAIL_FROM_BUF(l4h, l4len, 0);
> packet_add(pkt, &data);
>
> + l4h = IOV_PEEK_HEADER(&data, l4h_storage);
... peeking the next header here.
> if (ndp(c, (struct icmp6hdr *)l4h, saddr, pkt))
> continue;
>
> @@ -969,12 +976,13 @@ resume:
>
> if (l4len < sizeof(*uh))
> continue;
> - uh = (struct udphdr *)l4h;
> + uh = IOV_PEEK_HEADER(&data, uh_storage);
And here.
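For readers following the peek/remove question, this is a minimal sketch of the distinction, using a hypothetical flat cursor in place of an iov_tail. The names struct cursor, cursor_peek() and cursor_remove() are invented for illustration and are not passt's API. In the quoted series, ipv6_l4hdr() from patch 18 begins with IOV_REMOVE_HEADER(data, ip6h_storage), which appears to be where the IPv6 header is consumed before the later peeks, though that reading is an inference from the quoted diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat cursor, standing in for a one-entry iov_tail */
struct cursor {
	const uint8_t *base;	/* start of the frame */
	size_t len;		/* total frame length */
	size_t off;		/* bytes already consumed */
};

/* Return a pointer to the next @n bytes without consuming them */
static const void *cursor_peek(const struct cursor *c, size_t n)
{
	assert(c->off <= c->len);

	if (c->len - c->off < n)
		return NULL;
	return c->base + c->off;
}

/* Return a pointer to the next @n bytes and advance past them */
static const void *cursor_remove(struct cursor *c, size_t n)
{
	const void *p = cursor_peek(c, n);

	if (p)
		c->off += n;
	return p;
}
```

Peeking twice yields the same header, so a later peek only sees the next layer if something in between has removed or dropped the current one.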
> + if (!uh)
> + continue;
>
> if (proto == IPPROTO_UDP) {
> PACKET_POOL_P(pkt, 1, in->buf, in->buf_size);
>
> - data = IOV_TAIL_FROM_BUF(l4h, l4len, 0);
> packet_add(pkt, &data);
>
> if (dhcpv6(c, pkt, saddr, daddr))
> @@ -1031,7 +1039,6 @@ resume:
> #undef L4_SET
>
> append:
> - data = IOV_TAIL_FROM_BUF(l4h, l4len, 0);
> packet_add((struct pool *)&seq->p, &data);
> }
>
* Re: [PATCH v8 21/30] packet: rename packet_data() to packet_get()
2025-08-05 15:46 ` [PATCH v8 21/30] packet: rename packet_data() to packet_get() Laurent Vivier
@ 2025-08-06 6:22 ` David Gibson
0 siblings, 0 replies; 66+ messages in thread
From: David Gibson @ 2025-08-06 6:22 UTC (permalink / raw)
To: Laurent Vivier; +Cc: passt-dev
On Tue, Aug 05, 2025 at 05:46:19PM +0200, Laurent Vivier wrote:
> As we have removed packet_get(), we can rename packet_data() to packet_get()
> as the name is clearer.
>
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> arp.c | 2 +-
> dhcp.c | 2 +-
> dhcpv6.c | 2 +-
> icmp.c | 2 +-
> ndp.c | 2 +-
> packet.c | 8 ++++----
> packet.h | 9 ++++-----
> tap.c | 6 +++---
> tcp.c | 4 ++--
> udp.c | 4 ++--
> 10 files changed, 20 insertions(+), 21 deletions(-)
>
> diff --git a/arp.c b/arp.c
> index b3ac42082841..8b97df633e70 100644
> --- a/arp.c
> +++ b/arp.c
> @@ -82,7 +82,7 @@ int arp(const struct ctx *c, const struct pool *p)
> const struct arpmsg *am;
> struct iov_tail data;
>
> - if (!packet_data(p, 0, &data))
> + if (!packet_get(p, 0, &data))
> return -1;
>
> eh = IOV_REMOVE_HEADER(&data, eh_storage);
> diff --git a/dhcp.c b/dhcp.c
> index cf73d4b07767..47317f334945 100644
> --- a/dhcp.c
> +++ b/dhcp.c
> @@ -317,7 +317,7 @@ int dhcp(const struct ctx *c, const struct pool *p)
> unsigned int i;
> struct msg m_storage;
>
> - if (!packet_data(p, 0, &data))
> + if (!packet_get(p, 0, &data))
> return -1;
>
> eh = IOV_REMOVE_HEADER(&data, eh_storage);
> diff --git a/dhcpv6.c b/dhcpv6.c
> index e93acaf9955e..f54a75c642df 100644
> --- a/dhcpv6.c
> +++ b/dhcpv6.c
> @@ -562,7 +562,7 @@ int dhcpv6(struct ctx *c, const struct pool *p,
> const struct udphdr *uh;
> size_t mlen, n;
>
> - if (!packet_data(p, 0, &data))
> + if (!packet_get(p, 0, &data))
> return -1;
>
> uh = IOV_REMOVE_HEADER(&data, uh_storage);
> diff --git a/icmp.c b/icmp.c
> index fdfc857b5ae8..71c496540310 100644
> --- a/icmp.c
> +++ b/icmp.c
> @@ -251,7 +251,7 @@ int icmp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
> (void)saddr;
> ASSERT(pif == PIF_TAP);
>
> - if (!packet_data(p, 0, &data))
> + if (!packet_get(p, 0, &data))
> return -1;
>
> if (af == AF_INET) {
> diff --git a/ndp.c b/ndp.c
> index 5de4e508dc52..ba87a0aaa6e9 100644
> --- a/ndp.c
> +++ b/ndp.c
> @@ -354,7 +354,7 @@ int ndp(const struct ctx *c, const struct icmp6hdr *ih,
> const struct ndp_ns *ns;
> struct iov_tail data;
>
> - if (!packet_data(p, 0, &data))
> + if (!packet_get(p, 0, &data))
> return -1;
>
> ns = IOV_REMOVE_HEADER(&data, ns_storage);
> diff --git a/packet.c b/packet.c
> index 5da18bafa576..cbc43c2fc22d 100644
> --- a/packet.c
> +++ b/packet.c
> @@ -122,7 +122,7 @@ void packet_add_do(struct pool *p, struct iov_tail *data,
> }
>
> /**
> - * packet_data_do() - Get data range from packet descriptor from given pool
> + * packet_get_do() - Get data range from packet descriptor from given pool
> * @p: Packet pool
> * @idx: Index of packet descriptor in pool
> * @data: IOV tail to store the address of the data (output)
> @@ -132,9 +132,9 @@ void packet_add_do(struct pool *p, struct iov_tail *data,
> * Return: false if packet index is invalid, true otherwise.
> * If something wrong with @data, don't return at all (assert).
> */
> -bool packet_data_do(const struct pool *p, size_t idx,
> - struct iov_tail *data,
> - const char *func, int line)
> +bool packet_get_do(const struct pool *p, size_t idx,
> + struct iov_tail *data,
> + const char *func, int line)
> {
> size_t i;
>
> diff --git a/packet.h b/packet.h
> index dab8274fa5c5..7afe80ef3fcf 100644
> --- a/packet.h
> +++ b/packet.h
> @@ -33,16 +33,15 @@ struct pool {
> int vu_packet_check_range(void *buf, const char *ptr, size_t len);
> void packet_add_do(struct pool *p, struct iov_tail *data,
> const char *func, int line);
> -bool packet_data_do(const struct pool *p, const size_t idx,
> - struct iov_tail *data,
> - const char *func, int line);
> +bool packet_get_do(const struct pool *p, const size_t idx,
> + struct iov_tail *data, const char *func, int line);
> bool pool_full(const struct pool *p);
> void pool_flush(struct pool *p);
>
> #define packet_add(p, data) \
> packet_add_do(p, data, __func__, __LINE__)
> -#define packet_data(p, idx, data) \
> - packet_data_do(p, idx, data, __func__, __LINE__)
> +#define packet_get(p, idx, data) \
> + packet_get_do(p, idx, data, __func__, __LINE__)
>
> #define PACKET_POOL_DECL(_name, _size, _buf) \
> struct _name ## _t { \
> diff --git a/tap.c b/tap.c
> index 983f39ee8ee8..1d2e6fd802e9 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -715,7 +715,7 @@ resume:
> struct iov_tail data;
> struct iphdr *iph;
>
> - if (!packet_data(in, i, &data))
> + if (!packet_get(in, i, &data))
> continue;
>
> eh = IOV_PEEK_HEADER(&data, eh_storage);
> @@ -790,7 +790,7 @@ resume:
>
> PACKET_POOL_P(pkt, 1, in->buf, in->buf_size);
>
> - packet_data(in, i, &eh_data);
> + packet_get(in, i, &eh_data);
> packet_add(pkt, &eh_data);
> if (dhcp(c, pkt))
> continue;
> @@ -905,7 +905,7 @@ resume:
> struct ipv6hdr *ip6h;
> uint8_t proto;
>
> - if (!packet_data(in, i, &data))
> + if (!packet_get(in, i, &data))
> return -1;
>
> eh = IOV_REMOVE_HEADER(&data, eh_storage);
> diff --git a/tcp.c b/tcp.c
> index e0efc4cacb9b..4ba066fd1cac 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -1657,7 +1657,7 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn,
> size_t off, size;
> int count;
>
> - if (!packet_data(p, i, &data))
> + if (!packet_get(p, i, &data))
> return -1;
>
> th = IOV_PEEK_HEADER(&data, th_storage);
> @@ -1988,7 +1988,7 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
>
> (void)pif;
>
> - if (!packet_data(p, idx, &data))
> + if (!packet_get(p, idx, &data))
> return 1;
>
> l4len = iov_tail_size(&data);
> diff --git a/udp.c b/udp.c
> index 3c25f2e0ae97..86585b7e0942 100644
> --- a/udp.c
> +++ b/udp.c
> @@ -990,7 +990,7 @@ int udp_tap_handler(const struct ctx *c, uint8_t pif,
>
> ASSERT(!c->no_udp);
>
> - if (!packet_data(p, idx, &data))
> + if (!packet_get(p, idx, &data))
> return 1;
>
> uh = IOV_PEEK_HEADER(&data, uh_storage);
> @@ -1033,7 +1033,7 @@ int udp_tap_handler(const struct ctx *c, uint8_t pif,
> for (i = 0, j = 0; i < (int)p->count - idx && j < UIO_MAXIOV; i++) {
> const struct udphdr *uh_send;
>
> - if (!packet_data(p, idx + i, &data))
> + if (!packet_get(p, idx + i, &data))
> return p->count - idx;
>
> uh_send = IOV_REMOVE_HEADER(&data, uh_storage);
* Re: [PATCH v8 22/30] arp: use iov_tail rather than pool
2025-08-05 15:46 ` [PATCH v8 22/30] arp: use iov_tail rather than pool Laurent Vivier
@ 2025-08-06 6:24 ` David Gibson
0 siblings, 0 replies; 66+ messages in thread
From: David Gibson @ 2025-08-06 6:24 UTC (permalink / raw)
To: Laurent Vivier; +Cc: passt-dev
On Tue, Aug 05, 2025 at 05:46:20PM +0200, Laurent Vivier wrote:
> The arp() function signature is changed to accept `struct iov_tail *data`
> directly, replacing the previous `const struct pool *p` parameter.
> Consequently, arp() no longer fetches packet data internally using
> packet_data(), streamlining its logic.
>
> This simplifies callers like tap4_handler(), which now pass the iov_tail
> for the L2 ARP frame directly, removing intermediate pool handling.
>
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> arp.c | 14 +++++---------
> arp.h | 2 +-
> tap.c | 5 +----
> 3 files changed, 7 insertions(+), 14 deletions(-)
>
> diff --git a/arp.c b/arp.c
> index 8b97df633e70..44677ad15b93 100644
> --- a/arp.c
> +++ b/arp.c
> @@ -63,11 +63,11 @@ static bool ignore_arp(const struct ctx *c,
> /**
> * arp() - Check if this is a supported ARP message, reply as needed
> * @c: Execution context
> - * @p: Packet pool, single packet with Ethernet buffer
> + * @data: Single packet with Ethernet buffer
> *
> * Return: 1 if handled, -1 on failure
> */
> -int arp(const struct ctx *c, const struct pool *p)
> +int arp(const struct ctx *c, struct iov_tail *data)
> {
> struct {
> struct ethhdr eh;
> @@ -80,14 +80,10 @@ int arp(const struct ctx *c, const struct pool *p)
> const struct ethhdr *eh;
> const struct arphdr *ah;
> const struct arpmsg *am;
> - struct iov_tail data;
>
> - if (!packet_get(p, 0, &data))
> - return -1;
> -
> - eh = IOV_REMOVE_HEADER(&data, eh_storage);
> - ah = IOV_REMOVE_HEADER(&data, ah_storage);
> - am = IOV_REMOVE_HEADER(&data, am_storage);
> + eh = IOV_REMOVE_HEADER(data, eh_storage);
> + ah = IOV_REMOVE_HEADER(data, ah_storage);
> + am = IOV_REMOVE_HEADER(data, am_storage);
> if (!eh || !ah || !am)
> return -1;
>
> diff --git a/arp.h b/arp.h
> index ac5cd16e47f4..86bcbf878eda 100644
> --- a/arp.h
> +++ b/arp.h
> @@ -20,6 +20,6 @@ struct arpmsg {
> unsigned char tip[4];
> } __attribute__((__packed__));
>
> -int arp(const struct ctx *c, const struct pool *p);
> +int arp(const struct ctx *c, struct iov_tail *data);
>
> #endif /* ARP_H */
> diff --git a/tap.c b/tap.c
> index 1d2e6fd802e9..ace735cfc136 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -722,10 +722,7 @@ resume:
> if (!eh)
> continue;
> if (ntohs(eh->h_proto) == ETH_P_ARP) {
> - PACKET_POOL_P(pkt, 1, in->buf, in->buf_size);
> -
> - packet_add(pkt, &data);
> - arp(c, pkt);
> + arp(c, &data);
> continue;
> }
>
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
* Re: [PATCH v8 23/30] dhcp: use iov_tail rather than pool
2025-08-05 15:46 ` [PATCH v8 23/30] dhcp: " Laurent Vivier
@ 2025-08-06 6:26 ` David Gibson
0 siblings, 0 replies; 66+ messages in thread
From: David Gibson @ 2025-08-06 6:26 UTC (permalink / raw)
To: Laurent Vivier; +Cc: passt-dev
On Tue, Aug 05, 2025 at 05:46:21PM +0200, Laurent Vivier wrote:
> This patch refactors the dhcp() function to accept `struct iov_tail *data`
> directly as its packet input, replacing the previous `const struct pool *p`
> parameter. Consequently, dhcp() no longer fetches packet data internally
> using packet_get().
>
> This change simplifies callers, such as tap4_handler(), which now pass
> the iov_tail representing the L2 frame directly to dhcp(). This removes
> the need for intermediate packet pool handling for DHCP processing.
>
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> dhcp.c | 32 ++++++++++++++------------------
> dhcp.h | 2 +-
> tap.c | 5 +----
> 3 files changed, 16 insertions(+), 23 deletions(-)
>
> diff --git a/dhcp.c b/dhcp.c
> index 47317f334945..cad1d037cee1 100644
> --- a/dhcp.c
> +++ b/dhcp.c
> @@ -296,11 +296,11 @@ static void opt_set_dns_search(const struct ctx *c, size_t max_len)
> /**
> * dhcp() - Check if this is a DHCP message, reply as needed
> * @c: Execution context
> - * @p: Packet pool, single packet with Ethernet buffer
> + * @data: Single packet with Ethernet buffer
> *
> * Return: 0 if it's not a DHCP message, 1 if handled, -1 on failure
> */
> -int dhcp(const struct ctx *c, const struct pool *p)
> +int dhcp(const struct ctx *c, struct iov_tail *data)
> {
> char macstr[ETH_ADDRSTRLEN];
> size_t mlen, dlen, opt_len;
> @@ -311,24 +311,20 @@ int dhcp(const struct ctx *c, const struct pool *p)
> const struct ethhdr *eh;
> const struct iphdr *iph;
> const struct udphdr *uh;
> - struct iov_tail data;
> struct msg const *m;
> struct msg reply;
> unsigned int i;
> struct msg m_storage;
>
> - if (!packet_get(p, 0, &data))
> - return -1;
> -
> - eh = IOV_REMOVE_HEADER(&data, eh_storage);
> - iph = IOV_PEEK_HEADER(&data, iph_storage);
> + eh = IOV_REMOVE_HEADER(data, eh_storage);
> + iph = IOV_PEEK_HEADER(data, iph_storage);
> if (!eh || !iph)
> return -1;
>
> - if (!iov_tail_drop(&data, iph->ihl * 4UL))
> + if (!iov_tail_drop(data, iph->ihl * 4UL))
> return -1;
>
> - uh = IOV_REMOVE_HEADER(&data, uh_storage);
> + uh = IOV_REMOVE_HEADER(data, uh_storage);
> if (!uh)
> return -1;
>
> @@ -338,8 +334,8 @@ int dhcp(const struct ctx *c, const struct pool *p)
> if (c->no_dhcp)
> return 1;
>
> - mlen = iov_tail_size(&data);
> - m = (struct msg const *)iov_remove_header_(&data, &m_storage,
> + mlen = iov_tail_size(data);
> + m = (struct msg const *)iov_remove_header_(data, &m_storage,
> offsetof(struct msg, o),
> __alignof__(struct msg));
> if (!m ||
> @@ -367,24 +363,24 @@ int dhcp(const struct ctx *c, const struct pool *p)
> for (i = 0; i < ARRAY_SIZE(opts); i++)
> opts[i].clen = -1;
>
> - opt_len = iov_tail_size(&data);
> + opt_len = iov_tail_size(data);
> while (opt_len >= 2) {
> uint8_t olen_storage, type_storage;
> const uint8_t *olen;
> uint8_t *type;
>
> - type = IOV_REMOVE_HEADER(&data, type_storage);
> - olen = IOV_REMOVE_HEADER(&data, olen_storage);
> + type = IOV_REMOVE_HEADER(data, type_storage);
> + olen = IOV_REMOVE_HEADER(data, olen_storage);
> if (!type || !olen)
> return -1;
>
> - opt_len = iov_tail_size(&data);
> + opt_len = iov_tail_size(data);
> if (opt_len < *olen)
> return -1;
>
> - iov_to_buf(&data.iov[0], data.cnt, data.off, &opts[*type].c, *olen);
> + iov_to_buf(&data->iov[0], data->cnt, data->off, &opts[*type].c, *olen);
> opts[*type].clen = *olen;
> - iov_tail_drop(&data, *olen);
> + iov_tail_drop(data, *olen);
> opt_len -= *olen;
> }
>
> diff --git a/dhcp.h b/dhcp.h
> index 87aeecd8dec8..cd50c99b8856 100644
> --- a/dhcp.h
> +++ b/dhcp.h
> @@ -6,7 +6,7 @@
> #ifndef DHCP_H
> #define DHCP_H
>
> -int dhcp(const struct ctx *c, const struct pool *p);
> +int dhcp(const struct ctx *c, struct iov_tail *data);
> void dhcp_init(void);
>
> #endif /* DHCP_H */
> diff --git a/tap.c b/tap.c
> index ace735cfc136..7d7e89304723 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -785,11 +785,8 @@ resume:
> if (iph->protocol == IPPROTO_UDP) {
> struct iov_tail eh_data;
>
> - PACKET_POOL_P(pkt, 1, in->buf, in->buf_size);
> -
> packet_get(in, i, &eh_data);
> - packet_add(pkt, &eh_data);
> - if (dhcp(c, pkt))
> + if (dhcp(c, &eh_data))
> continue;
> }
>
* Re: [PATCH v8 24/30] dhcpv6: use iov_tail rather than pool
2025-08-05 15:46 ` [PATCH v8 24/30] dhcpv6: " Laurent Vivier
@ 2025-08-06 6:27 ` David Gibson
0 siblings, 0 replies; 66+ messages in thread
From: David Gibson @ 2025-08-06 6:27 UTC (permalink / raw)
To: Laurent Vivier; +Cc: passt-dev
On Tue, Aug 05, 2025 at 05:46:22PM +0200, Laurent Vivier wrote:
> This patch refactors the dhcpv6() function to accept `struct iov_tail *data`
> directly as its packet input, replacing the `const struct pool *p` parameter.
> Consequently, dhcpv6() no longer fetches packet data internally using
> packet_get().
>
> This change simplifies callers, such as tap6_handler(), which now pass
> the iov_tail representing the L4 UDP segment (DHCPv6 message) directly.
> This removes the need for intermediate packet pool handling.
>
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
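To illustrate the calling convention, here is a minimal sketch with
hypothetical names (not code from the patch). The handler consumes headers
from the tail it is given, so a caller that still needs the original
position, as tap6_handler() does with uh_data, passes a copy. struct
tail_sketch and the *_sketch() helpers are simplified stand-ins assumed for
the example, not the definitions in iov.h:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical, simplified stand-in for struct iov_tail */
struct tail_sketch {
	const struct iovec *iov;	/* Buffers backing the frame */
	size_t cnt;			/* Number of buffers in @iov */
	size_t off;			/* Bytes already consumed from the front */
};

/* Bytes still available after the consumed offset */
static size_t tail_size_sketch(const struct tail_sketch *t)
{
	size_t i, total = 0;

	for (i = 0; i < t->cnt; i++)
		total += t->iov[i].iov_len;

	return total > t->off ? total - t->off : 0;
}

/* Consume @len bytes from the front; return 0 if the tail is too short */
static int tail_drop_sketch(struct tail_sketch *t, size_t len)
{
	if (len > tail_size_sketch(t))
		return 0;

	t->off += len;
	return 1;
}

/* Hypothetical handler: consumes an 8-byte header, as dhcpv6() consumes
 * the UDP header. Return: 1 if payload remains, 0 if none, -1 if too short
 */
static int handler_sketch(struct tail_sketch *data)
{
	assert(data);

	if (!tail_drop_sketch(data, 8))
		return -1;

	return tail_size_sketch(data) > 0;
}
```

Copying the structure only copies the array pointer, count and offset, so
the caller's own tail keeps pointing at the UDP header for any handler
that runs afterwards.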
> ---
> dhcpv6.c | 29 +++++++++++++----------------
> dhcpv6.h | 2 +-
> tap.c | 6 ++----
> 3 files changed, 16 insertions(+), 21 deletions(-)
>
> diff --git a/dhcpv6.c b/dhcpv6.c
> index f54a75c642df..52611d742b0b 100644
> --- a/dhcpv6.c
> +++ b/dhcpv6.c
> @@ -539,19 +539,19 @@ static size_t dhcpv6_client_fqdn_fill(const struct iov_tail *data,
> /**
> * dhcpv6() - Check if this is a DHCPv6 message, reply as needed
> * @c: Execution context
> - * @p: Packet pool, single packet starting from UDP header
> + * @data: Single packet starting from UDP header
> * @saddr: Source IPv6 address of original message
> * @daddr: Destination IPv6 address of original message
> *
> * Return: 0 if it's not a DHCPv6 message, 1 if handled, -1 on failure
> */
> -int dhcpv6(struct ctx *c, const struct pool *p,
> +int dhcpv6(struct ctx *c, struct iov_tail *data,
> const struct in6_addr *saddr, const struct in6_addr *daddr)
> {
> const struct opt_server_id *server_id = NULL;
> - struct iov_tail data, opt, client_id_base;
> const struct opt_hdr *client_id = NULL;
> struct opt_server_id server_id_storage;
> + struct iov_tail opt, client_id_base;
> const struct opt_ia_na *ia = NULL;
> struct opt_hdr client_id_storage;
> struct opt_ia_na ia_storage;
> @@ -562,10 +562,7 @@ int dhcpv6(struct ctx *c, const struct pool *p,
> const struct udphdr *uh;
> size_t mlen, n;
>
> - if (!packet_get(p, 0, &data))
> - return -1;
> -
> - uh = IOV_REMOVE_HEADER(&data, uh_storage);
> + uh = IOV_REMOVE_HEADER(data, uh_storage);
> if (!uh)
> return -1;
>
> @@ -578,7 +575,7 @@ int dhcpv6(struct ctx *c, const struct pool *p,
> if (!IN6_IS_ADDR_MULTICAST(daddr))
> return -1;
>
> - mlen = iov_tail_size(&data);
> + mlen = iov_tail_size(data);
> if (mlen + sizeof(*uh) != ntohs(uh->len) || mlen < sizeof(*mh))
> return -1;
>
> @@ -586,23 +583,23 @@ int dhcpv6(struct ctx *c, const struct pool *p,
>
> src = &c->ip6.our_tap_ll;
>
> - mh = IOV_REMOVE_HEADER(&data, mh_storage);
> + mh = IOV_REMOVE_HEADER(data, mh_storage);
> if (!mh)
> return -1;
>
> - client_id_base = data;
> + client_id_base = *data;
> if (dhcpv6_opt(&client_id_base, OPT_CLIENTID))
> client_id = IOV_PEEK_HEADER(&client_id_base, client_id_storage);
> if (!client_id || ntohs(client_id->l) > OPT_VSIZE(client_id))
> return -1;
>
> - opt = data;
> + opt = *data;
> if (dhcpv6_opt(&opt, OPT_SERVERID))
> server_id = IOV_PEEK_HEADER(&opt, server_id_storage);
> if (server_id && ntohs(server_id->hdr.l) != OPT_VSIZE(server_id))
> return -1;
>
> - opt = data;
> + opt = *data;
> if (dhcpv6_opt(&opt, OPT_IA_NA))
> ia = IOV_PEEK_HEADER(&opt, ia_storage);
> if (ia && ntohs(ia->hdr.l) < MIN(OPT_VSIZE(ia_na), OPT_VSIZE(ia_ta)))
> @@ -620,9 +617,9 @@ int dhcpv6(struct ctx *c, const struct pool *p,
> if (mh->type == TYPE_CONFIRM && server_id)
> return -1;
>
> - if (dhcpv6_ia_notonlink(&data, &c->ip6.addr)) {
> + if (dhcpv6_ia_notonlink(data, &c->ip6.addr)) {
>
> - dhcpv6_send_ia_notonlink(c, &data, &client_id_base,
> + dhcpv6_send_ia_notonlink(c, data, &client_id_base,
> ntohs(client_id->l), mh->xid);
>
> return 1;
> @@ -635,7 +632,7 @@ int dhcpv6(struct ctx *c, const struct pool *p,
> memcmp(&resp.server_id, server_id, sizeof(resp.server_id)))
> return -1;
>
> - if (ia || dhcpv6_opt(&data, OPT_IA_TA))
> + if (ia || dhcpv6_opt(data, OPT_IA_TA))
> return -1;
>
> info("DHCPv6: received INFORMATION_REQUEST, sending REPLY");
> @@ -668,7 +665,7 @@ int dhcpv6(struct ctx *c, const struct pool *p,
> n = offsetof(struct resp_t, client_id) +
> sizeof(struct opt_hdr) + ntohs(client_id->l);
> n = dhcpv6_dns_fill(c, (char *)&resp, n);
> - n = dhcpv6_client_fqdn_fill(&data, c, (char *)&resp, n);
> + n = dhcpv6_client_fqdn_fill(data, c, (char *)&resp, n);
>
> resp.hdr.xid = mh->xid;
>
> diff --git a/dhcpv6.h b/dhcpv6.h
> index 580998862227..c706dfdbb2ac 100644
> --- a/dhcpv6.h
> +++ b/dhcpv6.h
> @@ -6,7 +6,7 @@
> #ifndef DHCPV6_H
> #define DHCPV6_H
>
> -int dhcpv6(struct ctx *c, const struct pool *p,
> +int dhcpv6(struct ctx *c, struct iov_tail *data,
> struct in6_addr *saddr, struct in6_addr *daddr);
> void dhcpv6_init(const struct ctx *c);
>
> diff --git a/tap.c b/tap.c
> index 7d7e89304723..3262b44c4287 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -975,11 +975,9 @@ resume:
> continue;
>
> if (proto == IPPROTO_UDP) {
> - PACKET_POOL_P(pkt, 1, in->buf, in->buf_size);
> -
> - packet_add(pkt, &data);
> + struct iov_tail uh_data = data;
>
> - if (dhcpv6(c, pkt, saddr, daddr))
> + if (dhcpv6(c, &uh_data, saddr, daddr))
> continue;
> }
>
* Re: [PATCH v8 25/30] icmp: use iov_tail rather than pool
2025-08-05 15:46 ` [PATCH v8 25/30] icmp: " Laurent Vivier
@ 2025-08-06 6:29 ` David Gibson
0 siblings, 0 replies; 66+ messages in thread
From: David Gibson @ 2025-08-06 6:29 UTC (permalink / raw)
To: Laurent Vivier; +Cc: passt-dev
On Tue, Aug 05, 2025 at 05:46:23PM +0200, Laurent Vivier wrote:
> As the iov_tail has a non-zero offset (because of the presence of
> packet headers in the iov array), we must copy it to a new
> iov array (using iov_tail_clone()) to pass it to sendmsg().
>
> We can no longer use iov_tail_msghdr(), so remove it.
Right, as I mentioned on that patch, it was always kind of a foot gun.
Was this the only user? Maybe just open code it earlier in the series
rather than introducing then removing the helper.
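A minimal sketch of why the copy is needed, assuming a simplified stand-in
for struct iov_tail. The names tail_sketch and tail_clone_sketch() are
hypothetical and this is not the helper in iov.c. sendmsg() only takes an
iovec array and a count, so the consumed offset has to be folded into the
base and length of the first entry:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical, simplified stand-in for struct iov_tail */
struct tail_sketch {
	const struct iovec *iov;	/* Buffers backing the frame */
	size_t cnt;			/* Number of buffers in @iov */
	size_t off;			/* Bytes already consumed from the front */
};

/**
 * tail_clone_sketch() - Copy the unconsumed part of a tail to a new array
 * @dst:	Destination iovec array
 * @dst_cnt:	Number of entries available in @dst
 * @tail:	Tail to copy from (not modified)
 *
 * Return: number of entries written to @dst, -1 if @dst is too small
 */
static int tail_clone_sketch(struct iovec *dst, size_t dst_cnt,
			     const struct tail_sketch *tail)
{
	size_t off, i, n = 0;

	assert(tail);
	off = tail->off;

	for (i = 0; i < tail->cnt; i++) {
		const struct iovec *src = &tail->iov[i];

		if (off >= src->iov_len) {	/* Fully consumed, skip it */
			off -= src->iov_len;
			continue;
		}

		if (n == dst_cnt)
			return -1;

		/* Only the first copied entry can start at a non-zero offset */
		dst[n].iov_base = (char *)src->iov_base + off;
		dst[n].iov_len = src->iov_len - off;
		off = 0;
		n++;
	}

	return (int)n;
}
```

The resulting array can be assigned to msg_iov with the returned count as
msg_iovlen, which is the shape of the open-coded msghdr setup in this patch.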
>
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> icmp.c | 30 +++++++++++++++++++-----------
> icmp.h | 2 +-
> iov.c | 23 -----------------------
> iov.h | 2 --
> tap.c | 7 ++-----
> 5 files changed, 22 insertions(+), 42 deletions(-)
>
> diff --git a/icmp.c b/icmp.c
> index 71c496540310..be800e30c369 100644
> --- a/icmp.c
> +++ b/icmp.c
> @@ -44,6 +44,7 @@
>
> #define ICMP_ECHO_TIMEOUT 60 /* s, timeout for ICMP socket activity */
> #define ICMP_NUM_IDS (1U << 16)
> +#define MAX_IOV_ICMP 16 /* Arbitrary, should be enough */
>
> /**
> * ping_at_sidx() - Get ping specific flow at given sidx
> @@ -229,36 +230,33 @@ cancel:
> * @af: Address family, AF_INET or AF_INET6
> * @saddr: Source address
> * @daddr: Destination address
> - * @p: Packet pool, single packet with ICMP/ICMPv6 header
> + * @data: Single packet with ICMP/ICMPv6 header
> * @now: Current timestamp
> *
> * Return: count of consumed packets (always 1, even if malformed)
> */
> int icmp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
> const void *saddr, const void *daddr,
> - const struct pool *p, const struct timespec *now)
> + struct iov_tail *data, const struct timespec *now)
> {
> + struct iovec iov[MAX_IOV_ICMP];
> struct icmp_ping_flow *pingf;
> const struct flowside *tgt;
> union sockaddr_inany sa;
> - struct iov_tail data;
> struct msghdr msh;
> uint16_t id, seq;
> union flow *flow;
> uint8_t proto;
> - socklen_t sl;
> + int cnt;
>
> (void)saddr;
> ASSERT(pif == PIF_TAP);
>
> - if (!packet_get(p, 0, &data))
> - return -1;
> -
> if (af == AF_INET) {
> struct icmphdr ih_storage;
> const struct icmphdr *ih;
>
> - ih = IOV_PEEK_HEADER(&data, ih_storage);
> + ih = IOV_PEEK_HEADER(data, ih_storage);
> if (!ih)
> return 1;
>
> @@ -272,7 +270,7 @@ int icmp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
> struct icmp6hdr ih_storage;
> const struct icmp6hdr *ih;
>
> - ih = IOV_PEEK_HEADER(&data, ih_storage);
> + ih = IOV_PEEK_HEADER(data, ih_storage);
> if (!ih)
> return 1;
>
> @@ -286,6 +284,10 @@ int icmp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
> ASSERT(0);
> }
>
> + cnt = iov_tail_clone(&iov[0], MAX_IOV_ICMP, data);
> + if (cnt < 0)
> + return 1;
> +
> flow = flow_at_sidx(flow_lookup_af(c, proto, PIF_TAP,
> af, saddr, daddr, id, id));
>
> @@ -300,8 +302,14 @@ int icmp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
> pingf->ts = now->tv_sec;
>
>
> - pif_sockaddr(c, &sa, &sl, PIF_HOST, &tgt->eaddr, 0);
> - iov_tail_msghdr(&msh, &data, &sa, sl);
> + pif_sockaddr(c, &sa, &msh.msg_namelen, PIF_HOST, &tgt->eaddr, 0);
> + msh.msg_name = &sa;
> + msh.msg_iov = iov;
> + msh.msg_iovlen = cnt;
> + msh.msg_control = NULL;
> + msh.msg_controllen = 0;
> + msh.msg_flags = 0;
> +
> if (sendmsg(pingf->sock, &msh, MSG_NOSIGNAL) < 0) {
> flow_dbg_perror(pingf, "failed to relay request to socket");
> } else {
> diff --git a/icmp.h b/icmp.h
> index 5ce22b5eca1f..d1cecb20e29d 100644
> --- a/icmp.h
> +++ b/icmp.h
> @@ -14,7 +14,7 @@ struct icmp_ping_flow;
> void icmp_sock_handler(const struct ctx *c, union epoll_ref ref);
> int icmp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
> const void *saddr, const void *daddr,
> - const struct pool *p, const struct timespec *now);
> + struct iov_tail *data, const struct timespec *now);
> void icmp_init(void);
>
> /**
> diff --git a/iov.c b/iov.c
> index d17d4dd3da09..1d734acdfea6 100644
> --- a/iov.c
> +++ b/iov.c
> @@ -157,29 +157,6 @@ size_t iov_size(const struct iovec *iov, size_t iov_cnt)
> return len;
> }
>
> -/**
> - * iov_tail_msghdr - Initialize a msghdr from an IOV tail structure
> - * @msh: msghdr to initialize
> - * @tail: iov_tail to use to set msg_iov and msg_iovlen
> - * @msg_name: Pointer to set to msg_name
> - * @msg_namelen: Size of @msg_name
> - */
> -void iov_tail_msghdr(struct msghdr *msh, struct iov_tail *tail,
> - void *msg_name, socklen_t msg_namelen)
> -{
> - iov_tail_prune(tail);
> -
> - ASSERT(tail->off == 0);
> -
> - msh->msg_name = msg_name;
> - msh->msg_namelen = msg_namelen;
> - msh->msg_iov = (struct iovec *)tail->iov;
> - msh->msg_iovlen = tail->cnt;
> - msh->msg_control = NULL;
> - msh->msg_controllen = 0;
> - msh->msg_flags = 0;
> -}
> -
> /**
> * iov_tail_prune() - Remove any unneeded buffers from an IOV tail
> * @tail: IO vector tail (modified)
> diff --git a/iov.h b/iov.h
> index 75c3b07a87e3..ccdb690ef3f1 100644
> --- a/iov.h
> +++ b/iov.h
> @@ -82,8 +82,6 @@ struct iov_tail {
> 1, \
> (off_))
>
> -void iov_tail_msghdr(struct msghdr *msh, struct iov_tail *tail,
> - void *msg_name, socklen_t msg_namelen);
> bool iov_tail_prune(struct iov_tail *tail);
> size_t iov_tail_size(struct iov_tail *tail);
> bool iov_tail_drop(struct iov_tail *tail, size_t len);
> diff --git a/tap.c b/tap.c
> index 3262b44c4287..48152a84674c 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -764,17 +764,14 @@ resume:
> continue;
>
> if (iph->protocol == IPPROTO_ICMP) {
> - PACKET_POOL_P(pkt, 1, in->buf, in->buf_size);
> -
> if (c->no_icmp)
> continue;
>
> tap_packet_debug(iph, NULL, NULL, 0, NULL, 1);
>
> - packet_add(pkt, &data);
> icmp_tap_handler(c, PIF_TAP, AF_INET,
> &iph->saddr, &iph->daddr,
> - pkt, now);
> + &data, now);
> continue;
> }
>
> @@ -964,7 +961,7 @@ resume:
> tap_packet_debug(NULL, ip6h, NULL, proto, NULL, 1);
>
> icmp_tap_handler(c, PIF_TAP, AF_INET6,
> - saddr, daddr, pkt, now);
> + saddr, daddr, &data, now);
> continue;
> }
>
* Re: [PATCH v8 26/30] ndp: use iov_tail rather than pool
2025-08-05 15:46 ` [PATCH v8 26/30] ndp: " Laurent Vivier
@ 2025-08-06 6:31 ` David Gibson
0 siblings, 0 replies; 66+ messages in thread
From: David Gibson @ 2025-08-06 6:31 UTC (permalink / raw)
To: Laurent Vivier; +Cc: passt-dev
On Tue, Aug 05, 2025 at 05:46:24PM +0200, Laurent Vivier wrote:
> The ndp() function signature is changed to accept `struct iov_tail *data`
> directly, replacing the previous `const struct pool *p` and
> `const struct icmp6hdr *ih` parameters.
>
> This change simplifies callers, like tap6_handler(), which now provide
> the iov_tail representing the L4 ICMPv6 segment directly to ndp().
>
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> ndp.c | 19 +++++++++++--------
> ndp.h | 4 ++--
> tap.c | 10 +++-------
> 3 files changed, 16 insertions(+), 17 deletions(-)
>
> diff --git a/ndp.c b/ndp.c
> index ba87a0aaa6e9..eb090cd2c5a7 100644
> --- a/ndp.c
> +++ b/ndp.c
> @@ -336,13 +336,20 @@ static void ndp_ra(const struct ctx *c, const struct in6_addr *dst)
> * ndp() - Check for NDP solicitations, reply as needed
> * @c: Execution context
> * @saddr: Source IPv6 address
> - * @p: Packet pool
> + * @data: Single packet with ICMPv6 header
> *
> * Return: 0 if not handled here, 1 if handled, -1 on failure
> */
> -int ndp(const struct ctx *c, const struct icmp6hdr *ih,
> - const struct in6_addr *saddr, const struct pool *p)
> +int ndp(const struct ctx *c, const struct in6_addr *saddr,
> + struct iov_tail *data)
> {
> + struct icmp6hdr ih_storage;
> + const struct icmp6hdr *ih;
> +
> + ih = IOV_PEEK_HEADER(data, ih_storage);
> + if (!ih)
> + return -1;
> +
> if (ih->icmp6_type < RS || ih->icmp6_type > NA)
> return 0;
>
> @@ -352,12 +359,8 @@ int ndp(const struct ctx *c, const struct icmp6hdr *ih,
> if (ih->icmp6_type == NS) {
> struct ndp_ns ns_storage;
> const struct ndp_ns *ns;
> - struct iov_tail data;
> -
> - if (!packet_get(p, 0, &data))
> - return -1;
>
> - ns = IOV_REMOVE_HEADER(&data, ns_storage);
> + ns = IOV_REMOVE_HEADER(data, ns_storage);
> if (!ns)
> return -1;
>
> diff --git a/ndp.h b/ndp.h
> index 41c2000356ec..b1dd5e82c085 100644
> --- a/ndp.h
> +++ b/ndp.h
> @@ -8,8 +8,8 @@
>
> struct icmp6hdr;
>
> -int ndp(const struct ctx *c, const struct icmp6hdr *ih,
> - const struct in6_addr *saddr, const struct pool *p);
> +int ndp(const struct ctx *c, const struct in6_addr *saddr,
> + struct iov_tail *data);
> void ndp_timer(const struct ctx *c, const struct timespec *now);
>
> #endif /* NDP_H */
> diff --git a/tap.c b/tap.c
> index 48152a84674c..d327ec0c3d54 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -942,9 +942,7 @@ resume:
> }
>
> if (proto == IPPROTO_ICMPV6) {
> - struct icmp6hdr l4h_storage;
> - const struct icmp6hdr *l4h;
> - PACKET_POOL_P(pkt, 1, in->buf, in->buf_size);
> + struct iov_tail ndp_data;
>
> if (c->no_icmp)
> continue;
> @@ -952,10 +950,8 @@ resume:
> if (l4len < sizeof(struct icmp6hdr))
> continue;
>
> - packet_add(pkt, &data);
> -
> - l4h = IOV_PEEK_HEADER(&data, l4h_storage);
> - if (ndp(c, (struct icmp6hdr *)l4h, saddr, pkt))
> + ndp_data = data;
> + if (ndp(c, saddr, &ndp_data))
> continue;
>
> tap_packet_debug(NULL, ip6h, NULL, proto, NULL, 1);
* Re: [PATCH v8 27/30] packet: remove PACKET_POOL() and PACKET_POOL_P()
2025-08-05 15:46 ` [PATCH v8 27/30] packet: remove PACKET_POOL() and PACKET_POOL_P() Laurent Vivier
@ 2025-08-06 6:32 ` David Gibson
0 siblings, 0 replies; 66+ messages in thread
From: David Gibson @ 2025-08-06 6:32 UTC (permalink / raw)
To: Laurent Vivier; +Cc: passt-dev
On Tue, Aug 05, 2025 at 05:46:25PM +0200, Laurent Vivier wrote:
> These macros are no longer used following the refactoring of packet
> handlers to directly use iov_tail. Callers no longer require PACKET_POOL_P
> for temporary pools, and PACKET_POOL can be replaced by PACKET_POOL_DECL
> and separate initialization if needed.
>
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> packet.h | 9 ---------
> 1 file changed, 9 deletions(-)
>
> diff --git a/packet.h b/packet.h
> index 7afe80ef3fcf..286b6b9994db 100644
> --- a/packet.h
> +++ b/packet.h
> @@ -59,19 +59,10 @@ struct _name ## _t { \
> .size = _size, \
> }
>
> -#define PACKET_POOL(name, size, buf, buf_size) \
> - PACKET_POOL_DECL(name, size, buf) name = \
> - PACKET_POOL_INIT_NOCAST(size, buf, buf_size)
> -
> #define PACKET_INIT(name, size, buf, buf_size) \
> (struct name ## _t) PACKET_POOL_INIT_NOCAST(size, buf, buf_size)
>
> #define PACKET_POOL_NOINIT(name, size, buf) \
> PACKET_POOL_DECL(name, size, buf) name ## _storage; \
> static struct pool *name = (struct pool *)&name ## _storage
> -
> -#define PACKET_POOL_P(name, size, buf, buf_size) \
> - PACKET_POOL(name ## _storage, size, buf, buf_size); \
> - struct pool *name = (struct pool *)&name ## _storage
> -
> #endif /* PACKET_H */
* Re: [PATCH v8 28/30] packet: remove unused parameter from PACKET_POOL_DECL()
2025-08-05 15:46 ` [PATCH v8 28/30] packet: remove unused parameter from PACKET_POOL_DECL() Laurent Vivier
@ 2025-08-06 6:33 ` David Gibson
0 siblings, 0 replies; 66+ messages in thread
From: David Gibson @ 2025-08-06 6:33 UTC (permalink / raw)
To: Laurent Vivier; +Cc: passt-dev
On Tue, Aug 05, 2025 at 05:46:26PM +0200, Laurent Vivier wrote:
> The _buf parameter of PACKET_POOL_DECL() is not used in the macro
> body, so remove it. Remove it from PACKET_POOL_NOINIT() as well, where
> it was only passed through to PACKET_POOL_DECL().
>
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> packet.h | 6 +++---
> tap.c | 6 +++---
> 2 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/packet.h b/packet.h
> index 286b6b9994db..43b9022075d1 100644
> --- a/packet.h
> +++ b/packet.h
> @@ -43,7 +43,7 @@ void pool_flush(struct pool *p);
> #define packet_get(p, idx, data) \
> packet_get_do(p, idx, data, __func__, __LINE__)
>
> -#define PACKET_POOL_DECL(_name, _size, _buf) \
> +#define PACKET_POOL_DECL(_name, _size) \
> struct _name ## _t { \
> char *buf; \
> size_t buf_size; \
> @@ -62,7 +62,7 @@ struct _name ## _t { \
> #define PACKET_INIT(name, size, buf, buf_size) \
> (struct name ## _t) PACKET_POOL_INIT_NOCAST(size, buf, buf_size)
>
> -#define PACKET_POOL_NOINIT(name, size, buf) \
> - PACKET_POOL_DECL(name, size, buf) name ## _storage; \
> +#define PACKET_POOL_NOINIT(name, size) \
> + PACKET_POOL_DECL(name, size) name ## _storage; \
> static struct pool *name = (struct pool *)&name ## _storage
> #endif /* PACKET_H */
> diff --git a/tap.c b/tap.c
> index d327ec0c3d54..bbc786468455 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -95,8 +95,8 @@ CHECK_FRAME_LEN(L2_MAX_LEN_VU);
> ETH_HLEN + sizeof(struct ipv6hdr) + sizeof(struct udphdr))
>
> /* IPv4 (plus ARP) and IPv6 message batches from tap/guest to IP handlers */
> -static PACKET_POOL_NOINIT(pool_tap4, TAP_MSGS_IP4, pkt_buf);
> -static PACKET_POOL_NOINIT(pool_tap6, TAP_MSGS_IP6, pkt_buf);
> +static PACKET_POOL_NOINIT(pool_tap4, TAP_MSGS_IP4);
> +static PACKET_POOL_NOINIT(pool_tap6, TAP_MSGS_IP6);
>
> #define TAP_SEQS 128 /* Different L4 tuples in one batch */
> #define FRAGMENT_MSG_RATE 10 /* # seconds between fragment warnings */
> @@ -555,7 +555,7 @@ void eth_update_mac(struct ethhdr *eh,
> memcpy(eh->h_source, eth_s, sizeof(eh->h_source));
> }
>
> -PACKET_POOL_DECL(pool_l4, UIO_MAXIOV, pkt_buf);
> +PACKET_POOL_DECL(pool_l4, UIO_MAXIOV);
>
> /**
> * struct l4_seq4_t - Message sequence for one protocol handler call, IPv4
* Re: [PATCH v8 29/30] packet: Refactor vhost-user memory region handling
2025-08-05 15:46 ` [PATCH v8 29/30] packet: Refactor vhost-user memory region handling Laurent Vivier
@ 2025-08-07 6:10 ` David Gibson
2025-08-07 9:05 ` Laurent Vivier
0 siblings, 1 reply; 66+ messages in thread
From: David Gibson @ 2025-08-07 6:10 UTC (permalink / raw)
To: Laurent Vivier; +Cc: passt-dev
On Tue, Aug 05, 2025 at 05:46:27PM +0200, Laurent Vivier wrote:
> This patch refactors the handling of vhost-user memory regions by
> introducing a new `struct vdev_memory` to encapsulate the regions
> array and their count (`nregions`) within the main `vu_dev` structure.
>
> This new `vdev_memory` structure is then passed to the packet pool by
> re-using the existing `p->buf` field. A `p->buf_size` of 0 indicates
> that `p->buf` holds a pointer to `struct vdev_memory` instead of a
> regular packet buffer. A new helper, `get_vdev_memory()`, is added to
> abstract this access pattern.
>
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> ---
> packet.c | 18 ++++++++++++++++--
> packet.h | 6 ++++--
> tap.c | 4 ++--
> tap.h | 1 -
> vhost_user.c | 28 +++++++++++-----------------
> virtio.c | 4 ++--
> virtio.h | 18 ++++++++++++++----
> vu_common.c | 22 ++++++++++++----------
> 8 files changed, 61 insertions(+), 40 deletions(-)
>
> diff --git a/packet.c b/packet.c
> index cbc43c2fc22d..4b93688509a4 100644
> --- a/packet.c
> +++ b/packet.c
> @@ -22,6 +22,20 @@
> #include "util.h"
> #include "log.h"
>
> +/**
> + * get_vdev_memory() - Return a pointer to the memory regions of the pool
> + * @p: Packet pool
> + *
> + * Return: NULL if none, otherwise a pointer to vdev_memory structure
> + */
> +static struct vdev_memory *get_vdev_memory(const struct pool *p)
> +{
> + if (p->buf_size)
> + return NULL;
> +
> + return (struct vdev_memory *)p->buf;
> +}
> +
> /**
> * packet_check_range() - Check if a memory range is valid for a pool
> * @p: Packet pool
> @@ -41,10 +55,10 @@ static int packet_check_range(const struct pool *p, const char *ptr, size_t len,
> return -1;
> }
>
> - if (p->buf_size == 0) {
> + if (get_vdev_memory(p)) {
> int ret;
>
> - ret = vu_packet_check_range((void *)p->buf, ptr, len);
> + ret = vu_packet_check_range(get_vdev_memory(p), ptr, len);
Seems like it would be marginally more natural to assign
get_vdev_memory() to a temporary in the if, then re-use it here.
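A sketch of that shape, using simplified hypothetical stand-ins
(pool_sketch, vdev_memory_sketch, region_check_sketch()) rather than the
real types and vu_packet_check_range():

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vdev_memory */
struct vdev_memory_sketch {
	unsigned int nregions;
};

/* Hypothetical stand-in for the buf/buf_size part of struct pool */
struct pool_sketch {
	char *buf;		/* Packet buffer, or memory regions if !buf_size */
	size_t buf_size;	/* 0 means @buf points to the memory regions */
};

/* Same access pattern as get_vdev_memory() in the patch */
static struct vdev_memory_sketch *get_memory_sketch(const struct pool_sketch *p)
{
	if (p->buf_size)
		return NULL;

	return (struct vdev_memory_sketch *)p->buf;
}

/* Placeholder for the region lookup: 0 if any region exists, -1 otherwise */
static int region_check_sketch(const struct vdev_memory_sketch *m)
{
	return m->nregions ? 0 : -1;
}

static int check_range_sketch(const struct pool_sketch *p)
{
	/* Look the pointer up once, then test and re-use the temporary */
	const struct vdev_memory_sketch *memory = get_memory_sketch(p);

	assert(p);

	if (memory)
		return region_check_sketch(memory);

	return 0;	/* Regular buffer checks would go here (elided) */
}
```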
> if (ret == -1)
> debug("cannot find region, %s:%i", func, line);
> diff --git a/packet.h b/packet.h
> index 43b9022075d1..e51cbd19fdc4 100644
> --- a/packet.h
> +++ b/packet.h
> @@ -8,6 +8,7 @@
>
> #include <stdbool.h>
> #include "iov.h"
> +#include "virtio.h"
>
> /* Maximum size of a single packet stored in pool, including headers */
> #define PACKET_MAX_LEN ((size_t)UINT16_MAX)
> @@ -15,7 +16,7 @@
> /**
> * struct pool - Generic pool of packets stored in a buffer
> * @buf: Buffer storing packet descriptors,
> - * a struct vu_dev_region array for passt vhost-user mode
> + * a struct vdev_memory for passt vhost-user mode
> * @buf_size: Total size of buffer,
> * 0 for passt vhost-user mode
> * @size: Number of usable descriptors for the pool
> @@ -30,7 +31,8 @@ struct pool {
> struct iovec pkt[];
> };
>
> -int vu_packet_check_range(void *buf, const char *ptr, size_t len);
> +int vu_packet_check_range(struct vdev_memory *memory,
> + const char *ptr, size_t len);
> void packet_add_do(struct pool *p, struct iov_tail *data,
> const char *func, int line);
> bool packet_get_do(const struct pool *p, const size_t idx,
> diff --git a/tap.c b/tap.c
> index bbc786468455..9fd00915bb01 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -1458,7 +1458,7 @@ static void tap_sock_tun_init(struct ctx *c)
> * @base: Buffer base
> * @size Buffer size
> */
> -void tap_sock_update_pool(void *base, size_t size)
> +static void tap_sock_update_pool(void *base, size_t size)
> {
> int i;
>
> @@ -1479,8 +1479,8 @@ void tap_sock_update_pool(void *base, size_t size)
> void tap_backend_init(struct ctx *c)
> {
> if (c->mode == MODE_VU) {
> - tap_sock_update_pool(NULL, 0);
> vu_init(c);
> + tap_sock_update_pool(&c->vdev->memory, 0);
> } else {
> tap_sock_update_pool(pkt_buf, sizeof(pkt_buf));
> }
> diff --git a/tap.h b/tap.h
> index ce5510882d5d..21db4d219ecb 100644
> --- a/tap.h
> +++ b/tap.h
> @@ -115,7 +115,6 @@ void tap_handler_passt(struct ctx *c, uint32_t events,
> const struct timespec *now);
> int tap_sock_unix_open(char *sock_path);
> void tap_sock_reset(struct ctx *c);
> -void tap_sock_update_pool(void *base, size_t size);
> void tap_backend_init(struct ctx *c);
> void tap_flush_pools(void);
> void tap_handler(struct ctx *c, const struct timespec *now);
> diff --git a/vhost_user.c b/vhost_user.c
> index c1522d549f00..f97ec6064cac 100644
> --- a/vhost_user.c
> +++ b/vhost_user.c
> @@ -137,8 +137,8 @@ static void *qva_to_va(struct vu_dev *dev, uint64_t qemu_addr)
> unsigned int i;
>
> /* Find matching memory region. */
> - for (i = 0; i < dev->nregions; i++) {
> - const struct vu_dev_region *r = &dev->regions[i];
> + for (i = 0; i < dev->memory.nregions; i++) {
> + const struct vu_dev_region *r = &dev->memory.regions[i];
>
> if ((qemu_addr >= r->qva) && (qemu_addr < (r->qva + r->size))) {
> /* NOLINTNEXTLINE(performance-no-int-to-ptr) */
> @@ -428,8 +428,8 @@ static bool vu_set_mem_table_exec(struct vu_dev *vdev,
> struct vhost_user_memory m = vmsg->payload.memory, *memory = &m;
> unsigned int i;
>
> - for (i = 0; i < vdev->nregions; i++) {
> - const struct vu_dev_region *r = &vdev->regions[i];
> + for (i = 0; i < vdev->memory.nregions; i++) {
> + const struct vu_dev_region *r = &vdev->memory.regions[i];
>
> if (r->mmap_addr) {
> /* NOLINTNEXTLINE(performance-no-int-to-ptr) */
> @@ -437,12 +437,12 @@ static bool vu_set_mem_table_exec(struct vu_dev *vdev,
> r->size + r->mmap_offset);
> }
> }
> - vdev->nregions = memory->nregions;
> + vdev->memory.nregions = memory->nregions;
>
> debug("vhost-user nregions: %u", memory->nregions);
> - for (i = 0; i < vdev->nregions; i++) {
> + for (i = 0; i < vdev->memory.nregions; i++) {
> struct vhost_user_memory_region *msg_region = &memory->regions[i];
> - struct vu_dev_region *dev_region = &vdev->regions[i];
> + struct vu_dev_region *dev_region = &vdev->memory.regions[i];
> void *mmap_addr;
>
> debug("vhost-user region %d", i);
> @@ -484,13 +484,7 @@ static bool vu_set_mem_table_exec(struct vu_dev *vdev,
> }
> }
>
> - /* As vu_packet_check_range() has no access to the number of
> - * memory regions, mark the end of the array with mmap_addr = 0
> - */
> - ASSERT(vdev->nregions < VHOST_USER_MAX_RAM_SLOTS - 1);
> - vdev->regions[vdev->nregions].mmap_addr = 0;
> -
> - tap_sock_update_pool(vdev->regions, 0);
> + ASSERT(vdev->memory.nregions < VHOST_USER_MAX_RAM_SLOTS);
It looks like the assertion is changing threshold by one, and I'm not
sure why.
>
> return false;
> }
> @@ -1106,8 +1100,8 @@ void vu_cleanup(struct vu_dev *vdev)
> vq->vring.avail = 0;
> }
>
> - for (i = 0; i < vdev->nregions; i++) {
> - const struct vu_dev_region *r = &vdev->regions[i];
> + for (i = 0; i < vdev->memory.nregions; i++) {
> + const struct vu_dev_region *r = &vdev->memory.regions[i];
>
> if (r->mmap_addr) {
> /* NOLINTNEXTLINE(performance-no-int-to-ptr) */
> @@ -1115,7 +1109,7 @@ void vu_cleanup(struct vu_dev *vdev)
> r->size + r->mmap_offset);
> }
> }
> - vdev->nregions = 0;
> + vdev->memory.nregions = 0;
>
> vu_close_log(vdev);
>
> diff --git a/virtio.c b/virtio.c
> index ed7842b4c78a..bd388c2dfc7f 100644
> --- a/virtio.c
> +++ b/virtio.c
> @@ -102,8 +102,8 @@ static void *vu_gpa_to_va(const struct vu_dev *dev, uint64_t *plen,
> return NULL;
>
> /* Find matching memory region. */
> - for (i = 0; i < dev->nregions; i++) {
> - const struct vu_dev_region *r = &dev->regions[i];
> + for (i = 0; i < dev->memory.nregions; i++) {
> + const struct vu_dev_region *r = &dev->memory.regions[i];
>
> if ((guest_addr >= r->gpa) &&
> (guest_addr < (r->gpa + r->size))) {
> diff --git a/virtio.h b/virtio.h
> index 32757458ea95..b55cc4042521 100644
> --- a/virtio.h
> +++ b/virtio.h
> @@ -96,11 +96,22 @@ struct vu_dev_region {
> */
> #define VHOST_USER_MAX_RAM_SLOTS 32
>
> +/**
> + * struct vdev_memory - Describes the shared memory regions for a vhost-user
> + * device
> + * @nregions: Number of shared memory regions
> + * @regions: Guest shared memory regions
> + */
> +struct vdev_memory {
> + uint32_t nregions;
> + struct vu_dev_region regions[VHOST_USER_MAX_RAM_SLOTS];
> +};
> +
> /**
> * struct vu_dev - vhost-user device information
> * @context: Execution context
> - * @nregions: Number of shared memory regions
> - * @regions: Guest shared memory regions
> + * @memory: Shared memory regions
> + * @vq: Virtqueues of the device
> * @features: Vhost-user features
> * @protocol_features: Vhost-user protocol features
> * @log_call_fd: Eventfd to report logging update
> @@ -109,8 +120,7 @@ struct vu_dev_region {
> */
> struct vu_dev {
> struct ctx *context;
> - uint32_t nregions;
> - struct vu_dev_region regions[VHOST_USER_MAX_RAM_SLOTS];
> + struct vdev_memory memory;
> struct vu_virtq vq[VHOST_USER_MAX_QUEUES];
> uint64_t features;
> uint64_t protocol_features;
> diff --git a/vu_common.c b/vu_common.c
> index b77b21420c57..b716070ea3c3 100644
> --- a/vu_common.c
> +++ b/vu_common.c
> @@ -25,26 +25,28 @@
> /**
> * vu_packet_check_range() - Check if a given memory zone is contained in
> * a mapped guest memory region
> - * @buf: Array of the available memory regions
> + * @memory: Array of the available memory regions
> * @ptr: Start of desired data range
> - * @size: Length of desired data range
> + * @len: Length of desired data range
> *
> * Return: 0 if the zone is in a mapped memory region, -1 otherwise
> */
> -int vu_packet_check_range(void *buf, const char *ptr, size_t len)
> +int vu_packet_check_range(struct vdev_memory *memory,
> + const char *ptr, size_t len)
> {
> - struct vu_dev_region *dev_region;
> + struct vu_dev_region *dev_region = memory->regions;
> + unsigned int i;
>
> - for (dev_region = buf; dev_region->mmap_addr; dev_region++) {
> - uintptr_t base_addr = dev_region->mmap_addr +
> - dev_region->mmap_offset;
> + for (i = 0; i < memory->nregions; i++) {
> + uintptr_t base_addr = dev_region[i].mmap_addr +
> + dev_region[i].mmap_offset;
> /* NOLINTNEXTLINE(performance-no-int-to-ptr) */
> const char *base = (const char *)base_addr;
>
> - ASSERT(base_addr >= dev_region->mmap_addr);
> + ASSERT(base_addr >= dev_region[i].mmap_addr);
>
> - if (len <= dev_region->size && base <= ptr &&
> - (size_t)(ptr - base) <= dev_region->size - len)
> + if (len <= dev_region[i].size && base <= ptr &&
> + (size_t)(ptr - base) <= dev_region[i].size - len)
> return 0;
> }
>
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
* Re: [PATCH v8 30/30] packet: Add support for multi-vector packets
2025-08-05 15:46 ` [PATCH v8 30/30] packet: Add support for multi-vector packets Laurent Vivier
@ 2025-08-07 6:17 ` David Gibson
0 siblings, 0 replies; 66+ messages in thread
From: David Gibson @ 2025-08-07 6:17 UTC (permalink / raw)
To: Laurent Vivier; +Cc: passt-dev
On Tue, Aug 05, 2025 at 05:46:28PM +0200, Laurent Vivier wrote:
> The packet pool was previously limited to handling packets contained
> within a single buffer.
>
> This patch extends the packet pool to support iovec array,
> allowing a single logical packet to be composed of multiple iovec.
>
> To accommodate this, the storage format within the pool is modified.
> For a multi-vector packet, a header entry is now stored first with
> iov_base = NULL and iov_len holding the number of subsequent
> vectors. The actual data vectors are then stored in the following
> pool slots.
>
> The packet_add_do() and packet_get_do() functions are updated to
> manage this new format for storing and retrieving packets. The
> pool_full() check is also adjusted to ensure there is enough
> space for all vectors of a new packet before adding it.
>
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> ---
> packet.c | 50 +++++++++++++++++++++++++++++++++-----------------
> packet.h | 2 +-
> tap.c | 4 ++--
> 3 files changed, 36 insertions(+), 20 deletions(-)
>
> diff --git a/packet.c b/packet.c
> index 4b93688509a4..d697232d951a 100644
> --- a/packet.c
> +++ b/packet.c
> @@ -90,12 +90,13 @@ static int packet_check_range(const struct pool *p, const char *ptr, size_t len,
> /**
> * pool_full() - Is a packet pool full?
> * @p: Pointer to packet pool
> + * @data: check data can fit in the pool
> *
> - * Return: true if the pool is full, false if more packets can be added
> + * Return: true if the pool is full, false if data can be added
> */
> -bool pool_full(const struct pool *p)
> +bool pool_full(const struct pool *p, const struct iov_tail *data)
Given the slightly changed semantics, I wonder if 'pool_can_fit()'
might be a better name now.
> {
> - return p->count >= p->size;
> + return p->count + data->cnt + (data->cnt > 1) >= p->size;
This test is only correct if data is already pruned. As I've said
elsewhere, it might be worth changing to the assumption that iov_tails
are pruned everywhere outside the iov_tail internal handling.
Oh.. also I think the new check is off by one (in the relatively safe
direction). It will say there's no room when there is just exactly
enough room.
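To make the off-by-one concrete, here is a minimal standalone sketch of the slot accounting, assuming the iov_tail has already been pruned. The names slots_needed(), pool_full_v8() and pool_cannot_fit() are hypothetical and exist only for this illustration; they are not part of the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Slots a packet with cnt vectors occupies in the pool: one per vector,
 * plus one header slot when there is more than one vector.
 */
static size_t slots_needed(size_t cnt)
{
	return cnt + (cnt > 1);
}

/* The check as posted in v8: it also reports "full" when the packet
 * would fit exactly into the remaining slots.
 */
static bool pool_full_v8(size_t count, size_t size, size_t cnt)
{
	return count + slots_needed(cnt) >= size;
}

/* Hypothetical alternative: refuse only when the packet really does
 * not fit.
 */
static bool pool_cannot_fit(size_t count, size_t size, size_t cnt)
{
	return count + slots_needed(cnt) > size;
}
```

With 7 of 8 slots used and a single-vector packet, the first variant says "full" although one slot is free, while the second accepts the packet.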
> }
>
> /**
> @@ -108,11 +109,9 @@ bool pool_full(const struct pool *p)
> void packet_add_do(struct pool *p, struct iov_tail *data,
> const char *func, int line)
> {
> - size_t idx = p->count;
> - const char *start;
> - size_t len;
> + size_t idx = p->count, i, offset;
>
> - if (pool_full(p)) {
> + if (pool_full(p, data)) {
> debug("add packet index %zu to pool with size %zu, %s:%i",
> idx, p->size, func, line);
> return;
> @@ -121,18 +120,30 @@ void packet_add_do(struct pool *p, struct iov_tail *data,
> if (!iov_tail_prune(data))
> return;
>
> - ASSERT(data->cnt == 1); /* we don't support iovec */
> + if (data->cnt > 1) {
> + p->pkt[idx].iov_base = NULL;
> + p->pkt[idx].iov_len = data->cnt;
> + idx++;
> + }
>
> - len = data->iov[0].iov_len - data->off;
> - start = (char *)data->iov[0].iov_base + data->off;
> + offset = data->off;
> + for (i = 0; i < data->cnt; i++) {
> + const char *start;
> + size_t len;
>
> - if (packet_check_range(p, start, len, func, line))
> - return;
> + len = data->iov[i].iov_len - offset;
> + start = (char *)data->iov[i].iov_base + offset;
> + offset = 0;
>
> - p->pkt[idx].iov_base = (void *)start;
> - p->pkt[idx].iov_len = len;
> + if (packet_check_range(p, start, len, func, line))
> + return;
>
> - p->count++;
> + p->pkt[idx].iov_base = (void *)start;
> + p->pkt[idx].iov_len = len;
> + idx++;
Hm. Isn't the above equivalent to iov_tail_clone()? Is calling
packet_check_range() on each chunk the only reason for open-coding it
here?
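For reference while reading this hunk, the header-slot layout can be sketched as a small standalone program. This is only an illustration under stated assumptions: struct mini_pool, mini_add() and mini_get() are hypothetical names, range checking is left out, lookups take the index of a packet's first slot, and data vectors are assumed never to have a NULL iov_base.

```c
#include <stddef.h>
#include <sys/uio.h>

#define POOL_SLOTS 8	/* illustrative size, not taken from the patch */

struct mini_pool {
	size_t count;			/* slots in use */
	struct iovec pkt[POOL_SLOTS];
};

/* Store cnt vectors as one packet.
 * Return: index of the packet's first slot, -1 if it does not fit.
 */
static int mini_add(struct mini_pool *p, const struct iovec *iov, size_t cnt)
{
	size_t idx = p->count, first = idx, i;

	if (cnt == 0 || idx + cnt + (cnt > 1) > POOL_SLOTS)
		return -1;

	if (cnt > 1) {
		/* Header slot: NULL base, length holds the vector count */
		p->pkt[idx].iov_base = NULL;
		p->pkt[idx].iov_len = cnt;
		idx++;
	}

	for (i = 0; i < cnt; i++)
		p->pkt[idx++] = iov[i];

	p->count = idx;
	return (int)first;
}

/* Look up the packet whose first slot is idx.
 * Return: number of vectors, with *iov set to the first data vector;
 * 0 if idx is out of range.
 */
static size_t mini_get(const struct mini_pool *p, size_t idx,
		       const struct iovec **iov)
{
	if (idx >= p->count)
		return 0;

	if (p->pkt[idx].iov_base) {
		*iov = &p->pkt[idx];		/* single-vector packet */
		return 1;
	}

	*iov = &p->pkt[idx + 1];		/* skip the header slot */
	return p->pkt[idx].iov_len;
}
```

A two-vector packet added after a single-vector one ends up using slots 1 to 3, so the pool count advances by three for one logical packet.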
> + }
> +
> + p->count = idx;
> }
>
> /**
> @@ -162,9 +173,14 @@ bool packet_get_do(const struct pool *p, size_t idx,
> return false;
> }
>
> - data->cnt = 1;
> + if (p->pkt[idx].iov_base) {
> + data->cnt = 1;
> + data->iov = &p->pkt[idx];
> + } else {
> + data->cnt = p->pkt[idx].iov_len;
> + data->iov = &p->pkt[idx + 1];
> + }
> data->off = 0;
> - data->iov = &p->pkt[idx];
>
> for (i = 0; i < data->cnt; i++) {
> ASSERT_WITH_MSG(!packet_check_range(p, data->iov[i].iov_base,
> diff --git a/packet.h b/packet.h
> index e51cbd19fdc4..67dc7deb17db 100644
> --- a/packet.h
> +++ b/packet.h
> @@ -37,7 +37,7 @@ void packet_add_do(struct pool *p, struct iov_tail *data,
> const char *func, int line);
> bool packet_get_do(const struct pool *p, const size_t idx,
> struct iov_tail *data, const char *func, int line);
> -bool pool_full(const struct pool *p);
> +bool pool_full(const struct pool *p, const struct iov_tail *data);
> void pool_flush(struct pool *p);
>
> #define packet_add(p, data) \
> diff --git a/tap.c b/tap.c
> index 9fd00915bb01..95688b22fcb7 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -1103,14 +1103,14 @@ void tap_add_packet(struct ctx *c, struct iov_tail *data,
> switch (ntohs(eh->h_proto)) {
> case ETH_P_ARP:
> case ETH_P_IP:
> - if (pool_full(pool_tap4)) {
> + if (pool_full(pool_tap4, data)) {
> tap4_handler(c, pool_tap4, now);
> pool_flush(pool_tap4);
> }
> packet_add(pool_tap4, data);
> break;
> case ETH_P_IPV6:
> - if (pool_full(pool_tap6)) {
> + if (pool_full(pool_tap6, data)) {
> tap6_handler(c, pool_tap6, now);
> pool_flush(pool_tap6);
> }
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
* Re: [PATCH v8 29/30] packet: Refactor vhost-user memory region handling
2025-08-07 6:10 ` David Gibson
@ 2025-08-07 9:05 ` Laurent Vivier
2025-08-07 11:44 ` David Gibson
0 siblings, 1 reply; 66+ messages in thread
From: Laurent Vivier @ 2025-08-07 9:05 UTC (permalink / raw)
To: David Gibson; +Cc: passt-dev
On 07/08/2025 08:10, David Gibson wrote:
>> @@ -437,12 +437,12 @@ static bool vu_set_mem_table_exec(struct vu_dev *vdev,
>> r->size + r->mmap_offset);
>> }
>> }
>> - vdev->nregions = memory->nregions;
>> + vdev->memory.nregions = memory->nregions;
>>
>> debug("vhost-user nregions: %u", memory->nregions);
>> - for (i = 0; i < vdev->nregions; i++) {
>> + for (i = 0; i < vdev->memory.nregions; i++) {
>> struct vhost_user_memory_region *msg_region = &memory->regions[i];
>> - struct vu_dev_region *dev_region = &vdev->regions[i];
>> + struct vu_dev_region *dev_region = &vdev->memory.regions[i];
>> void *mmap_addr;
>>
>> debug("vhost-user region %d", i);
>> @@ -484,13 +484,7 @@ static bool vu_set_mem_table_exec(struct vu_dev *vdev,
>> }
>> }
>>
>> - /* As vu_packet_check_range() has no access to the number of
>> - * memory regions, mark the end of the array with mmap_addr = 0
>> - */
>> - ASSERT(vdev->nregions < VHOST_USER_MAX_RAM_SLOTS - 1);
>> - vdev->regions[vdev->nregions].mmap_addr = 0;
>> -
>> - tap_sock_update_pool(vdev->regions, 0);
>> + ASSERT(vdev->memory.nregions < VHOST_USER_MAX_RAM_SLOTS);
> It looks like the assertion is changing threshold by one, and I'm not
> sure why.
It's because the previous version used the last slot to mark the end of the array with a
NULL memory region. Now that we have a counter, that's not needed anymore.
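As a rough illustration of the difference, assuming simplified stand-in types (MAX_SLOTS, struct region, struct memory and both helpers are hypothetical, not the passt definitions):

```c
#include <stddef.h>

#define MAX_SLOTS 4	/* stand-in for VHOST_USER_MAX_RAM_SLOTS */

struct region {
	unsigned long mmap_addr;
};

/* Old scheme: walk the array until a zero mmap_addr terminator, so one
 * slot always has to be kept free for the terminator.
 */
static size_t count_sentinel(const struct region *r)
{
	size_t n = 0;

	while (r[n].mmap_addr)
		n++;

	return n;
}

/* New scheme: the count travels with the array, no terminator needed */
struct memory {
	unsigned int nregions;
	struct region regions[MAX_SLOTS];
};

static size_t count_counted(const struct memory *m)
{
	return m->nregions;
}
```

In the terminated array at most MAX_SLOTS - 1 entries can describe real regions, while the counted structure can use all of them.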
Thanks,
Laurent
* Re: [PATCH v8 29/30] packet: Refactor vhost-user memory region handling
2025-08-07 9:05 ` Laurent Vivier
@ 2025-08-07 11:44 ` David Gibson
0 siblings, 0 replies; 66+ messages in thread
From: David Gibson @ 2025-08-07 11:44 UTC (permalink / raw)
To: Laurent Vivier; +Cc: passt-dev
On Thu, Aug 07, 2025 at 11:05:46AM +0200, Laurent Vivier wrote:
> On 07/08/2025 08:10, David Gibson wrote:
> > > @@ -437,12 +437,12 @@ static bool vu_set_mem_table_exec(struct vu_dev *vdev,
> > > r->size + r->mmap_offset);
> > > }
> > > }
> > > - vdev->nregions = memory->nregions;
> > > + vdev->memory.nregions = memory->nregions;
> > > debug("vhost-user nregions: %u", memory->nregions);
> > > - for (i = 0; i < vdev->nregions; i++) {
> > > + for (i = 0; i < vdev->memory.nregions; i++) {
> > > struct vhost_user_memory_region *msg_region = &memory->regions[i];
> > > - struct vu_dev_region *dev_region = &vdev->regions[i];
> > > + struct vu_dev_region *dev_region = &vdev->memory.regions[i];
> > > void *mmap_addr;
> > > debug("vhost-user region %d", i);
> > > @@ -484,13 +484,7 @@ static bool vu_set_mem_table_exec(struct vu_dev *vdev,
> > > }
> > > }
> > > - /* As vu_packet_check_range() has no access to the number of
> > > - * memory regions, mark the end of the array with mmap_addr = 0
> > > - */
> > > - ASSERT(vdev->nregions < VHOST_USER_MAX_RAM_SLOTS - 1);
> > > - vdev->regions[vdev->nregions].mmap_addr = 0;
> > > -
> > > - tap_sock_update_pool(vdev->regions, 0);
> > > + ASSERT(vdev->memory.nregions < VHOST_USER_MAX_RAM_SLOTS);
> > It looks like the assertion is changing threshold by one, and I'm not
> > sure why.
>
> It's because previous version was using the last slot to mark the end of the
> array with a NULL memory region. Now, we have a counter, it's not needed
> anymore.
Ah, ok. Might be worth mentioning that in the commit message.
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
* Re: [PATCH v8 07/30] arp: Convert to iov_tail
2025-08-06 2:17 ` David Gibson
@ 2025-08-07 12:58 ` Laurent Vivier
2025-08-07 13:11 ` Stefano Brivio
0 siblings, 1 reply; 66+ messages in thread
From: Laurent Vivier @ 2025-08-07 12:58 UTC (permalink / raw)
To: Stefano Brivio; +Cc: passt-dev, David Gibson
On 06/08/2025 04:17, David Gibson wrote:
> On Tue, Aug 05, 2025 at 05:46:05PM +0200, Laurent Vivier wrote:
>> Use packet_data() and extract headers using IOV_REMOVE_HEADER()
>> rather than packet_get().
>>
>> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
>> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
>
> Still R-b, but making an observation below that's perhaps more
> relevant to the previous patch.
>
>> ---
>> arp.c | 12 +++++++++---
>> packet.c | 1 -
>> 2 files changed, 9 insertions(+), 4 deletions(-)
>>
>> diff --git a/arp.c b/arp.c
>> index 9f1fedeafec0..b3ac42082841 100644
>> --- a/arp.c
>> +++ b/arp.c
>> @@ -74,14 +74,20 @@ int arp(const struct ctx *c, const struct pool *p)
>> struct arphdr ah;
>> struct arpmsg am;
>> } __attribute__((__packed__)) resp;
>> + struct arphdr ah_storage;
>> + struct ethhdr eh_storage;
>> + struct arpmsg am_storage;
>> const struct ethhdr *eh;
>> const struct arphdr *ah;
>> const struct arpmsg *am;
>> + struct iov_tail data;
>>
>> - eh = packet_get(p, 0, 0, sizeof(*eh), NULL);
>> - ah = packet_get(p, 0, sizeof(*eh), sizeof(*ah), NULL);
>> - am = packet_get(p, 0, sizeof(*eh) + sizeof(*ah), sizeof(*am), NULL);
>> + if (!packet_data(p, 0, &data))
>> + return -1;
>
> The only case where packet_data() will return false is if you give it
> a bad packet index. That should never happen, by construction. So
> I'm wondering if that should be an ASSERT() in packet_data() rather
> than a return value.
>
>
Stefano, what do you think of this idea?
Thanks,
Laurent
* Re: [PATCH v8 07/30] arp: Convert to iov_tail
2025-08-07 12:58 ` Laurent Vivier
@ 2025-08-07 13:11 ` Stefano Brivio
2025-08-13 2:21 ` David Gibson
0 siblings, 1 reply; 66+ messages in thread
From: Stefano Brivio @ 2025-08-07 13:11 UTC (permalink / raw)
To: Laurent Vivier; +Cc: passt-dev, David Gibson
On Thu, 7 Aug 2025 14:58:34 +0200
Laurent Vivier <lvivier@redhat.com> wrote:
> On 06/08/2025 04:17, David Gibson wrote:
> > On Tue, Aug 05, 2025 at 05:46:05PM +0200, Laurent Vivier wrote:
> >> Use packet_data() and extract headers using IOV_REMOVE_HEADER()
> >> rather than packet_get().
> >>
> >> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> >> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> >
> > Still R-b, but making an observation below that's perhaps more
> > relevant to the previous patch.
> >
> >> ---
> >> arp.c | 12 +++++++++---
> >> packet.c | 1 -
> >> 2 files changed, 9 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/arp.c b/arp.c
> >> index 9f1fedeafec0..b3ac42082841 100644
> >> --- a/arp.c
> >> +++ b/arp.c
> >> @@ -74,14 +74,20 @@ int arp(const struct ctx *c, const struct pool *p)
> >> struct arphdr ah;
> >> struct arpmsg am;
> >> } __attribute__((__packed__)) resp;
> >> + struct arphdr ah_storage;
> >> + struct ethhdr eh_storage;
> >> + struct arpmsg am_storage;
> >> const struct ethhdr *eh;
> >> const struct arphdr *ah;
> >> const struct arpmsg *am;
> >> + struct iov_tail data;
> >>
> >> - eh = packet_get(p, 0, 0, sizeof(*eh), NULL);
> >> - ah = packet_get(p, 0, sizeof(*eh), sizeof(*ah), NULL);
> >> - am = packet_get(p, 0, sizeof(*eh) + sizeof(*ah), sizeof(*am), NULL);
> >> + if (!packet_data(p, 0, &data))
> >> + return -1;
> >
> > The only case where packet_data() will return false is if you give it
> > a bad packet index. That should never happen, by construction. So
> > I'm wondering if that should be an ASSERT() in packet_data() rather
> > than a return value.
>
> Stefano, what do you think of this idea?
Well, yes, it *should* be by construction, but somewhere we might
eventually calculate that index (indirectly) using data we receive, and
I don't think we want to ASSERT() if somebody finds a way to make us
calculate a bad index.
It's not a strong objection against ASSERT(), though. It makes the code
marginally more terse and might help us find issues, too. I just have a
slight preference for a return value in this case anyway (better to
dodge a security issue and hide a functional issue than risking hitting
both, I think).
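The two options under discussion, reduced to a standalone sketch (struct mini_pool, get_checked() and get_asserted() are hypothetical names for illustration, not the packet_data() implementation):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mini_pool {
	size_t count;	/* number of packets stored */
};

/* Return-value variant: a bad index is reported and the caller decides
 * what to do, typically dropping the packet.
 */
static bool get_checked(const struct mini_pool *p, size_t idx)
{
	return idx < p->count;
}

/* Assert variant: a bad index is treated as an internal bug and the
 * process aborts.
 */
static void get_asserted(const struct mini_pool *p, size_t idx)
{
	assert(idx < p->count);
}
```

The first keeps running on an unexpected index and can hide a functional bug; the second surfaces the bug immediately, at the cost of terminating if an index ever turns out to be influenced by external input.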
--
Stefano
* Re: [PATCH v8 17/30] dhcp: Convert to iov_tail
2025-08-06 4:38 ` David Gibson
@ 2025-08-08 9:33 ` Laurent Vivier
2025-08-13 2:27 ` David Gibson
0 siblings, 1 reply; 66+ messages in thread
From: Laurent Vivier @ 2025-08-08 9:33 UTC (permalink / raw)
To: David Gibson; +Cc: passt-dev
On 06/08/2025 06:38, David Gibson wrote:
> On Tue, Aug 05, 2025 at 05:46:15PM +0200, Laurent Vivier wrote:
>> Use packet_data() and extract headers using IOV_REMOVE_HEADER()
>> and IOV_PEEK_HEADER() rather than packet_get().
>
> Unlike the previous patch, I think using iov_tail does work here,
> because there's a single scan through the options, rather than
> repeatedly scanning for options of specific types.
>
>> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
>> ---
>> dhcp.c | 46 ++++++++++++++++++++++++++++------------------
>> 1 file changed, 28 insertions(+), 18 deletions(-)
>>
>> diff --git a/dhcp.c b/dhcp.c
>> index b0de04be6f27..cf73d4b07767 100644
>> --- a/dhcp.c
>> +++ b/dhcp.c
>> @@ -302,27 +302,33 @@ static void opt_set_dns_search(const struct ctx *c, size_t max_len)
>> */
>> int dhcp(const struct ctx *c, const struct pool *p)
>> {
>> - size_t mlen, dlen, offset = 0, opt_len, opt_off = 0;
>> char macstr[ETH_ADDRSTRLEN];
>> + size_t mlen, dlen, opt_len;
>> struct in_addr mask, dst;
>> + struct ethhdr eh_storage;
>> + struct iphdr iph_storage;
>> + struct udphdr uh_storage;
>> const struct ethhdr *eh;
>> const struct iphdr *iph;
>> const struct udphdr *uh;
>> + struct iov_tail data;
>> struct msg const *m;
>
> Pre-existing, but I'm a bit baffled as to what the (const *) is doing here.
>
>> struct msg reply;
>> unsigned int i;
>> + struct msg m_storage;
>>
>> - eh = packet_get(p, 0, offset, sizeof(*eh), NULL);
>> - offset += sizeof(*eh);
>> + if (!packet_data(p, 0, &data))
>> + return -1;
>>
>> - iph = packet_get(p, 0, offset, sizeof(*iph), NULL);
>> + eh = IOV_REMOVE_HEADER(&data, eh_storage);
>> + iph = IOV_PEEK_HEADER(&data, iph_storage);
>> if (!eh || !iph)
>> return -1;
>>
>> - offset += iph->ihl * 4UL;
>> - uh = packet_get(p, 0, offset, sizeof(*uh), &mlen);
>> - offset += sizeof(*uh);
>> + if (!iov_tail_drop(&data, iph->ihl * 4UL))
>> + return -1;
>>
>> + uh = IOV_REMOVE_HEADER(&data, uh_storage);
>> if (!uh)
>> return -1;
>>
>> @@ -332,7 +338,10 @@ int dhcp(const struct ctx *c, const struct pool *p)
>> if (c->no_dhcp)
>> return 1;
>>
>> - m = packet_get(p, 0, offset, offsetof(struct msg, o), &opt_len);
>> + mlen = iov_tail_size(&data);
>> + m = (struct msg const *)iov_remove_header_(&data, &m_storage,
>> + offsetof(struct msg, o),
>> + __alignof__(struct msg));
>> if (!m ||
>> mlen != ntohs(uh->len) - sizeof(*uh) ||
>> mlen < offsetof(struct msg, o) ||
>> @@ -355,27 +364,28 @@ int dhcp(const struct ctx *c, const struct pool *p)
>> memset(&reply.file, 0, sizeof(reply.file));
>> reply.magic = m->magic;
>>
>> - offset += offsetof(struct msg, o);
>> -
>> for (i = 0; i < ARRAY_SIZE(opts); i++)
>> opts[i].clen = -1;
>>
>> - while (opt_off + 2 < opt_len) {
>> - const uint8_t *olen, *val;
>> + opt_len = iov_tail_size(&data);
>> + while (opt_len >= 2) {
>> + uint8_t olen_storage, type_storage;
>> + const uint8_t *olen;
>> uint8_t *type;
>>
>> - type = packet_get(p, 0, offset + opt_off, 1, NULL);
>> - olen = packet_get(p, 0, offset + opt_off + 1, 1, NULL);
>> + type = IOV_REMOVE_HEADER(&data, type_storage);
>> + olen = IOV_REMOVE_HEADER(&data, olen_storage);
>
> It seems a bit mad to access single bytes via 8-byte pointers, but
> it's probably not worth the hassle of handling it differently in this
> one case.
>
>> if (!type || !olen)
>> return -1;
>>
>> - val = packet_get(p, 0, offset + opt_off + 2, *olen, NULL);
>> - if (!val)
>> + opt_len = iov_tail_size(&data);
>> + if (opt_len < *olen)
>> return -1;
>>
>> - memcpy(&opts[*type].c, val, *olen);
>> + iov_to_buf(&data.iov[0], data.cnt, data.off, &opts[*type].c, *olen);
>
> So, IIUC, if *olen is much too big, this is still safe..
>
>> opts[*type].clen = *olen;
>
> .. but recording *olen unedited as the length of the option is
> probably wrong in that case.
I don't understand how to edit *olen. There is no change compared to the original code.
>
>> - opt_off += *olen + 2;
>> + iov_tail_drop(&data, *olen);
>> + opt_len -= *olen;
>
> Isn't the stanza above doing the equivalent of an
> iov_remove_header_()?
No, in fact iov_remove_header_() copies to the buffer only if the data are discontiguous; in
this case we want to copy the data unconditionally so that we can edit them later. We don't want to
edit the data in the iovec buffer.
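A simplified stand-in for that behaviour, to show why the returned pointer may alias the packet buffer. peek_bytes() is a hypothetical helper written for this message; it ignores the offset and alignment handling of the real iov_remove_header_():

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Return a pointer to the first len bytes described by iov[0..cnt-1].
 * If they all sit in the first vector, point straight into it (no copy).
 * Otherwise gather them into storage, which must hold at least len bytes.
 * Return: pointer to the bytes, NULL if fewer than len are available.
 */
static const void *peek_bytes(const struct iovec *iov, size_t cnt,
			      void *storage, size_t len)
{
	size_t copied = 0, i;

	if (cnt && iov[0].iov_len >= len)
		return iov[0].iov_base;	/* contiguous: aliases the packet */

	for (i = 0; i < cnt && copied < len; i++) {
		size_t n = iov[i].iov_len;

		if (n > len - copied)
			n = len - copied;

		memcpy((char *)storage + copied, iov[i].iov_base, n);
		copied += n;
	}

	return copied == len ? storage : NULL;
}
```

In the contiguous case, writing through the returned pointer would modify the packet itself, which is why the option contents are copied out with iov_to_buf() before being edited.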
>
>> }
>>
>> opts[80].slen = -1;
>
Thanks,
Laurent
* Re: [PATCH v8 20/30] tap: Convert tap6_handler() to iov_tail
2025-08-06 6:21 ` David Gibson
@ 2025-08-08 13:57 ` Laurent Vivier
2025-08-13 3:22 ` David Gibson
0 siblings, 1 reply; 66+ messages in thread
From: Laurent Vivier @ 2025-08-08 13:57 UTC (permalink / raw)
To: David Gibson; +Cc: passt-dev
On 06/08/2025 08:21, David Gibson wrote:
> On Tue, Aug 05, 2025 at 05:46:18PM +0200, Laurent Vivier wrote:
>> Use packet_data() and extract headers using IOV_REMOVE_HEADER()
>> and IOV_PEEK_HEADER() rather than packet_get().
>>
>> Remove packet_get() as it is not used anymore.
>
> [snip]
>> @@ -896,21 +896,28 @@ resume:
>> for (seq_count = 0, seq = NULL; i < in->count; i++) {
>> size_t l4len, plen, check;
>> struct in6_addr *saddr, *daddr;
>> + struct ipv6hdr ip6h_storage;
>> + struct ethhdr eh_storage;
>> + struct udphdr uh_storage;
>> const struct ethhdr *eh;
>> const struct udphdr *uh;
>> struct iov_tail data;
>> struct ipv6hdr *ip6h;
>> uint8_t proto;
>> - char *l4h;
>>
>> - eh = packet_get(in, i, 0, sizeof(*eh), NULL);
>> + if (!packet_data(in, i, &data))
>> + return -1;
>> +
>> + eh = IOV_REMOVE_HEADER(&data, eh_storage);
>> if (!eh)
>> continue;
>>
>> - ip6h = packet_get(in, i, sizeof(*eh), sizeof(*ip6h), &check);
>> + ip6h = IOV_PEEK_HEADER(&data, ip6h_storage);
>> if (!ip6h)
>> continue;
>
> You peek the IPv6 header here, but I haven't spotted where you remove
> / drop it before...
>
In fact, the data offset is modified by ipv6_l4hdr(), which scans the headers; the data
provided to ipv6_l4hdr() must point to an ipv6hdr (it also provides the proto, which
allows us to do the correct IOV_PEEK_HEADER()).
Thanks,
Laurent
* Re: [PATCH v8 16/30] dhcpv6: Use iov_tail in dhcpv6_opt()
2025-08-06 4:14 ` David Gibson
@ 2025-08-08 13:59 ` Laurent Vivier
2025-08-13 2:29 ` David Gibson
0 siblings, 1 reply; 66+ messages in thread
From: Laurent Vivier @ 2025-08-08 13:59 UTC (permalink / raw)
To: David Gibson; +Cc: passt-dev
On 06/08/2025 06:14, David Gibson wrote:
> On Tue, Aug 05, 2025 at 05:46:14PM +0200, Laurent Vivier wrote:
>> dhcpv6_opt() and its callers are refactored for iov_tail option parsing,
>> replacing direct offset management for improved robustness.
>>
>> Its signature is now `bool dhcpv6_opt(iov_tail *data, type)`. `*data` (in/out)
>> points to a found option on `true` return or is restored on `false`.
>> The main dhcpv6() function uses IOV_REMOVE_HEADER for the msg_hdr, then
>> passes the iov_tail (now at options start) to the new dhcpv6_opt().
>>
>> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
>
> Hmm. I'm not sure this is a great use case for iov_tail - the code is
> repeatedly scanning the same options, so there's a whole bunch of
> rewinding. It works, but it seems awkward.
>
> DHCP is a slow path, anyway, so maybe we'd be better off just
> linearizing the entire packet and using plain old pointers to scan
> through the options.
>
Well, I'd like to avoid rewriting this patch, as it has a lot of changes.
But if you think it's really needed, I will do it in a new version of the series.
Thanks,
Laurent
* Re: [PATCH v8 07/30] arp: Convert to iov_tail
2025-08-07 13:11 ` Stefano Brivio
@ 2025-08-13 2:21 ` David Gibson
0 siblings, 0 replies; 66+ messages in thread
From: David Gibson @ 2025-08-13 2:21 UTC (permalink / raw)
To: Stefano Brivio; +Cc: Laurent Vivier, passt-dev
On Thu, Aug 07, 2025 at 03:11:32PM +0200, Stefano Brivio wrote:
> On Thu, 7 Aug 2025 14:58:34 +0200
> Laurent Vivier <lvivier@redhat.com> wrote:
>
> > On 06/08/2025 04:17, David Gibson wrote:
> > > On Tue, Aug 05, 2025 at 05:46:05PM +0200, Laurent Vivier wrote:
> > >> Use packet_data() and extract headers using IOV_REMOVE_HEADER()
> > >> rather than packet_get().
> > >>
> > >> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> > >> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> > >
> > > Still R-b, but making an observation below that's perhaps more
> > > relevant to the previous patch.
> > >
> > >> ---
> > >> arp.c | 12 +++++++++---
> > >> packet.c | 1 -
> > >> 2 files changed, 9 insertions(+), 4 deletions(-)
> > >>
> > >> diff --git a/arp.c b/arp.c
> > >> index 9f1fedeafec0..b3ac42082841 100644
> > >> --- a/arp.c
> > >> +++ b/arp.c
> > >> @@ -74,14 +74,20 @@ int arp(const struct ctx *c, const struct pool *p)
> > >> struct arphdr ah;
> > >> struct arpmsg am;
> > >> } __attribute__((__packed__)) resp;
> > >> + struct arphdr ah_storage;
> > >> + struct ethhdr eh_storage;
> > >> + struct arpmsg am_storage;
> > >> const struct ethhdr *eh;
> > >> const struct arphdr *ah;
> > >> const struct arpmsg *am;
> > >> + struct iov_tail data;
> > >>
> > >> - eh = packet_get(p, 0, 0, sizeof(*eh), NULL);
> > >> - ah = packet_get(p, 0, sizeof(*eh), sizeof(*ah), NULL);
> > >> - am = packet_get(p, 0, sizeof(*eh) + sizeof(*ah), sizeof(*am), NULL);
> > >> + if (!packet_data(p, 0, &data))
> > >> + return -1;
> > >
> > > The only case where packet_data() will return false is if you give it
> > > a bad packet index. That should never happen, by construction. So
> > > I'm wondering if that should be an ASSERT() in packet_data() rather
> > > than a return value.
> >
> > Stefano, what do you think of this idea?
>
> Well, yes, it *should* be by construction, but somewhere we might
> eventually calculate that index (indirectly) using data we receive, and
> I don't think we want to ASSERT() if somebody finds a way to make us
> calculate a bad index.
I really can't imagine a scenario where we'd want to do that. The
order of things in the packet pool is entirely arbitrary, so anything
from outside can't have any data that would be relevant to finding an
index. So, in all cases we're either passing a (valid) index from one
part of our code to another, or scanning the entire pool. Any failure
in either case would represent a pretty unlikely bug on our side,
which makes ASSERT() the appropriate choice IMO.
> It's not a strong objection against ASSERT(), though. It makes the code
> marginally more terse and might help us find issues, too. I just have a
> slight preference for a return value in this case anyway (better to
> dodge a security issue and hide a functional issue than risking hitting
> both, I think).
>
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
* Re: [PATCH v8 17/30] dhcp: Convert to iov_tail
2025-08-08 9:33 ` Laurent Vivier
@ 2025-08-13 2:27 ` David Gibson
0 siblings, 0 replies; 66+ messages in thread
From: David Gibson @ 2025-08-13 2:27 UTC (permalink / raw)
To: Laurent Vivier; +Cc: passt-dev
On Fri, Aug 08, 2025 at 11:33:24AM +0200, Laurent Vivier wrote:
> On 06/08/2025 06:38, David Gibson wrote:
> > On Tue, Aug 05, 2025 at 05:46:15PM +0200, Laurent Vivier wrote:
> > > Use packet_data() and extract headers using IOV_REMOVE_HEADER()
> > > and IOV_PEEK_HEADER() rather than packet_get().
> >
> > Unlike the previous patch, I think using iov_tail does work here,
> > because there's a single scan through the options, rather than
> > repeatedly scanning for options of specific types.
> >
> > > Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> > > ---
> > > dhcp.c | 46 ++++++++++++++++++++++++++++------------------
> > > 1 file changed, 28 insertions(+), 18 deletions(-)
> > >
> > > diff --git a/dhcp.c b/dhcp.c
> > > index b0de04be6f27..cf73d4b07767 100644
> > > --- a/dhcp.c
> > > +++ b/dhcp.c
> > > @@ -302,27 +302,33 @@ static void opt_set_dns_search(const struct ctx *c, size_t max_len)
> > > */
> > > int dhcp(const struct ctx *c, const struct pool *p)
> > > {
> > > - size_t mlen, dlen, offset = 0, opt_len, opt_off = 0;
> > > char macstr[ETH_ADDRSTRLEN];
> > > + size_t mlen, dlen, opt_len;
> > > struct in_addr mask, dst;
> > > + struct ethhdr eh_storage;
> > > + struct iphdr iph_storage;
> > > + struct udphdr uh_storage;
> > > const struct ethhdr *eh;
> > > const struct iphdr *iph;
> > > const struct udphdr *uh;
> > > + struct iov_tail data;
> > > struct msg const *m;
> >
> > Pre-existing, but I'm a bit baffled as to what the (const *) is doing here.
> >
> > > struct msg reply;
> > > unsigned int i;
> > > + struct msg m_storage;
> > > - eh = packet_get(p, 0, offset, sizeof(*eh), NULL);
> > > - offset += sizeof(*eh);
> > > + if (!packet_data(p, 0, &data))
> > > + return -1;
> > > - iph = packet_get(p, 0, offset, sizeof(*iph), NULL);
> > > + eh = IOV_REMOVE_HEADER(&data, eh_storage);
> > > + iph = IOV_PEEK_HEADER(&data, iph_storage);
> > > if (!eh || !iph)
> > > return -1;
> > > - offset += iph->ihl * 4UL;
> > > - uh = packet_get(p, 0, offset, sizeof(*uh), &mlen);
> > > - offset += sizeof(*uh);
> > > + if (!iov_tail_drop(&data, iph->ihl * 4UL))
> > > + return -1;
> > > + uh = IOV_REMOVE_HEADER(&data, uh_storage);
> > > if (!uh)
> > > return -1;
> > > @@ -332,7 +338,10 @@ int dhcp(const struct ctx *c, const struct pool *p)
> > > if (c->no_dhcp)
> > > return 1;
> > > - m = packet_get(p, 0, offset, offsetof(struct msg, o), &opt_len);
> > > + mlen = iov_tail_size(&data);
> > > + m = (struct msg const *)iov_remove_header_(&data, &m_storage,
> > > + offsetof(struct msg, o),
> > > + __alignof__(struct msg));
> > > if (!m ||
> > > mlen != ntohs(uh->len) - sizeof(*uh) ||
> > > mlen < offsetof(struct msg, o) ||
> > > @@ -355,27 +364,28 @@ int dhcp(const struct ctx *c, const struct pool *p)
> > > memset(&reply.file, 0, sizeof(reply.file));
> > > reply.magic = m->magic;
> > > - offset += offsetof(struct msg, o);
> > > -
> > > for (i = 0; i < ARRAY_SIZE(opts); i++)
> > > opts[i].clen = -1;
> > > - while (opt_off + 2 < opt_len) {
> > > - const uint8_t *olen, *val;
> > > + opt_len = iov_tail_size(&data);
> > > + while (opt_len >= 2) {
> > > + uint8_t olen_storage, type_storage;
> > > + const uint8_t *olen;
> > > uint8_t *type;
> > > - type = packet_get(p, 0, offset + opt_off, 1, NULL);
> > > - olen = packet_get(p, 0, offset + opt_off + 1, 1, NULL);
> > > + type = IOV_REMOVE_HEADER(&data, type_storage);
> > > + olen = IOV_REMOVE_HEADER(&data, olen_storage);
> >
> > It seems a bit mad to access single bytes via 8-byte pointers, but
> > it's probably not worth the hassle of handling it differently in this
> > one case.
> >
> > > if (!type || !olen)
> > > return -1;
> > > - val = packet_get(p, 0, offset + opt_off + 2, *olen, NULL);
> > > - if (!val)
> > > + opt_len = iov_tail_size(&data);
> > > + if (opt_len < *olen)
> > > return -1;
> > > - memcpy(&opts[*type].c, val, *olen);
> > > + iov_to_buf(&data.iov[0], data.cnt, data.off, &opts[*type].c, *olen);
> >
> > So, IIUC, if *olen is much too big, this is still safe..
> >
> > > opts[*type].clen = *olen;
> >
> > .. but recording *olen unedited as the length of the option is
> > probably wrong in that case.
>
> I don't understand how to edit *olen. There is no change regarding the original code.
Sorry, I was concerned about a malformed packet which gave a long olen
when there isn't actually that much data there. I missed that you tested
for (opt_len < *olen) above, which handles that case.
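
As a rough sketch of why that test is sufficient, here is the same
type/length/value loop over a plain linear buffer. The names
(struct toy_opt, toy_parse_opts()) are hypothetical and this is not the
code in dhcp.c; like the loop in the patch, it treats every option as
type, length, value and does not special-case pad or end options.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-option storage, loosely modelled on opts[] in dhcp.c */
struct toy_opt {
	int clen;		/* -1 if the option was not seen */
	uint8_t c[255];		/* copy of the option payload */
};

/* Return 0 on success, -1 if an option claims more data than remains */
static int toy_parse_opts(const uint8_t *buf, size_t len,
			  struct toy_opt opts[256])
{
	size_t i;

	assert(buf && opts);

	for (i = 0; i < 256; i++)
		opts[i].clen = -1;

	while (len >= 2) {
		uint8_t type = buf[0], olen = buf[1];

		/* Consume the two-byte type/length header */
		buf += 2;
		len -= 2;

		/* A malformed, too-long olen is rejected before any copy */
		if (len < olen)
			return -1;

		memcpy(opts[type].c, buf, olen);
		opts[type].clen = olen;

		buf += olen;
		len -= olen;
	}

	return 0;
}
```

Because the length check comes before the copy, the recorded clen can
never exceed the amount of data actually present.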
> >
> > > - opt_off += *olen + 2;
> > > + iov_tail_drop(&data, *olen);
> > > + opt_len -= *olen;
> >
> > Isn't the stanza above doing the equivalent of an
> > iov_remove_header_()?
>
> No, in fact iov_remove_header_() copies to the buffer only if the data are
> discontiguous; in this case we want to copy the data unconditionally so we
> can edit them later. We don't want to edit the data in the iovec buffer.
Ah, right. Unconditionally linearizing / copying a header seems like
it might be a useful thing in more places. I wonder if we should make
that its own helper and we can use it both here and in the slow path
of iov_remove_header_().
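
One possible shape for such a helper, as a sketch only: struct toy_tail,
toy_tail_copy() and toy_tail_peek() are hypothetical stand-ins and do
not reflect the actual iov_tail API in iov.c. The unconditional copy is
its own function, and the conditional variant falls back to it on its
slow path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical stand-in for an iov_tail: an iovec array plus an offset */
struct toy_tail {
	const struct iovec *iov;
	size_t cnt;
	size_t off;	/* byte offset from the start of iov[0] */
};

/* Unconditionally copy len bytes to buf; false if the tail is too short
 * (buf may then hold a partial copy)
 */
static bool toy_tail_copy(const struct toy_tail *t, void *buf, size_t len)
{
	size_t off = t->off, done = 0, i;

	for (i = 0; i < t->cnt && done < len; i++) {
		size_t avail, n;

		if (off >= t->iov[i].iov_len) {
			off -= t->iov[i].iov_len;
			continue;
		}

		avail = t->iov[i].iov_len - off;
		n = len - done < avail ? len - done : avail;
		memcpy((char *)buf + done,
		       (const char *)t->iov[i].iov_base + off, n);
		done += n;
		off = 0;
	}

	return done == len;
}

/* Conditional variant: point into the frame if the bytes are contiguous,
 * otherwise linearize them into buf using the helper above
 */
static const void *toy_tail_peek(const struct toy_tail *t, void *buf,
				 size_t len)
{
	size_t off = t->off, i;

	for (i = 0; i < t->cnt; i++) {
		if (off < t->iov[i].iov_len)
			break;
		off -= t->iov[i].iov_len;
	}

	if (i < t->cnt && len <= t->iov[i].iov_len - off)
		return (const char *)t->iov[i].iov_base + off;

	return toy_tail_copy(t, buf, len) ? buf : NULL;
}
```

A caller that needs to edit the data afterwards, as in the DHCP option
loop, would call the copy helper directly; a caller that only reads
would use the peek variant and avoid the copy in the common case.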
* Re: [PATCH v8 16/30] dhcpv6: Use iov_tail in dhcpv6_opt()
2025-08-08 13:59 ` Laurent Vivier
@ 2025-08-13 2:29 ` David Gibson
0 siblings, 0 replies; 66+ messages in thread
From: David Gibson @ 2025-08-13 2:29 UTC (permalink / raw)
To: Laurent Vivier; +Cc: passt-dev
On Fri, Aug 08, 2025 at 03:59:51PM +0200, Laurent Vivier wrote:
> On 06/08/2025 06:14, David Gibson wrote:
> > On Tue, Aug 05, 2025 at 05:46:14PM +0200, Laurent Vivier wrote:
> > > dhcpv6_opt() and its callers are refactored for iov_tail option parsing,
> > > replacing direct offset management for improved robustness.
> > >
> > > Its signature is now `bool dhcpv6_opt(iov_tail *data, type)`. `*data` (in/out)
> > > points to a found option on `true` return or is restored on `false`.
> > > The main dhcpv6() function uses IOV_REMOVE_HEADER for the msg_hdr, then
> > > passes the iov_tail (now at options start) to the new dhcpv6_opt().
> > >
> > > Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> >
> > Hmm. I'm not sure this is a great use case for iov_tail - the code is
> > repeatedly scanning the same options, so there's a whole bunch of
> > rewinding. It works, but it seems awkward.
> >
> > DHCP is a slow path, anyway, so maybe we'd be better off just
> > linearizing the entire packet and using plain old pointers to scan
> > through the options.
> >
>
> Well, I'd like to avoid rewriting this patch, as it has a lot of
> changes.
Yeah, fair enough.
> But if you think it's really needed I will do it in a new version of
> the series.
No, I don't think it's important enough.
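
For reference, the "linearize and use plain pointers" alternative
mentioned earlier might look roughly like this. It is a sketch under
assumptions: toy_opt6_find() is a hypothetical name, the buffer is
assumed to already hold the linearized options, and the only protocol
detail relied on is that DHCPv6 options carry a 16-bit code and a
16-bit length in network byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Find the first option with the given code in a linear buffer.
 * Return a pointer to its payload and set *olen, or NULL if the option
 * is absent or an option header claims more data than remains.
 */
static const uint8_t *toy_opt6_find(const uint8_t *buf, size_t len,
				    uint16_t code, uint16_t *olen)
{
	assert(buf && olen);

	while (len >= 4) {
		uint16_t c = (uint16_t)(buf[0] << 8 | buf[1]);
		uint16_t l = (uint16_t)(buf[2] << 8 | buf[3]);

		if (len - 4 < l)
			return NULL;

		if (c == code) {
			*olen = l;
			return buf + 4;
		}

		buf += 4 + l;
		len -= 4 + l;
	}

	return NULL;
}
```

Each lookup simply restarts from the beginning of the buffer, so
repeated scans for different option types need no rewinding state.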
* Re: [PATCH v8 20/30] tap: Convert tap6_handler() to iov_tail
2025-08-08 13:57 ` Laurent Vivier
@ 2025-08-13 3:22 ` David Gibson
0 siblings, 0 replies; 66+ messages in thread
From: David Gibson @ 2025-08-13 3:22 UTC (permalink / raw)
To: Laurent Vivier; +Cc: passt-dev
On Fri, Aug 08, 2025 at 03:57:04PM +0200, Laurent Vivier wrote:
> On 06/08/2025 08:21, David Gibson wrote:
> > On Tue, Aug 05, 2025 at 05:46:18PM +0200, Laurent Vivier wrote:
> > > Use packet_data() and extract headers using IOV_REMOVE_HEADER()
> > > and IOV_PEEK_HEADER() rather than packet_get().
> > >
> > > Remove packet_get() as it is not used anymore.
> >
> > [snip]
> > > @@ -896,21 +896,28 @@ resume:
> > > for (seq_count = 0, seq = NULL; i < in->count; i++) {
> > > size_t l4len, plen, check;
> > > struct in6_addr *saddr, *daddr;
> > > + struct ipv6hdr ip6h_storage;
> > > + struct ethhdr eh_storage;
> > > + struct udphdr uh_storage;
> > > const struct ethhdr *eh;
> > > const struct udphdr *uh;
> > > struct iov_tail data;
> > > struct ipv6hdr *ip6h;
> > > uint8_t proto;
> > > - char *l4h;
> > > - eh = packet_get(in, i, 0, sizeof(*eh), NULL);
> > > + if (!packet_data(in, i, &data))
> > > + return -1;
> > > +
> > > + eh = IOV_REMOVE_HEADER(&data, eh_storage);
> > > if (!eh)
> > > continue;
> > > - ip6h = packet_get(in, i, sizeof(*eh), sizeof(*ip6h), &check);
> > > + ip6h = IOV_PEEK_HEADER(&data, ip6h_storage);
> > > if (!ip6h)
> > > continue;
> >
> > You peek the IPv6 header here, but I haven't spotted where you remove
> > / drop it before...
> >
>
> In fact, the data offset is modified by ipv6_l4hdr(), which scans the
> headers, and the data provided to ipv6_l4hdr() must point to an ipv6hdr (it
> also provides the proto, which allows doing the correct IOV_PEEK_HEADER()).
Ah, right.
end of thread, other threads:[~2025-08-13 3:27 UTC | newest]
2025-08-05 15:45 [PATCH v8 00/30] Introduce discontiguous frames management Laurent Vivier
2025-08-05 15:45 ` [PATCH v8 01/30] arp: Don't mix incoming and outgoing buffers Laurent Vivier
2025-08-05 15:46 ` [PATCH v8 02/30] iov: Introduce iov_tail_clone() and iov_tail_drop() Laurent Vivier
2025-08-06 1:32 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 03/30] iov: Update IOV_REMOVE_HEADER() and IOV_PEEK_HEADER() Laurent Vivier
2025-08-06 1:45 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 04/30] tap: Use iov_tail with tap_add_packet() Laurent Vivier
2025-08-06 1:56 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 05/30] packet: Use iov_tail with packet_add() Laurent Vivier
2025-08-05 15:46 ` [PATCH v8 06/30] packet: Add packet_data() Laurent Vivier
2025-08-06 2:14 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 07/30] arp: Convert to iov_tail Laurent Vivier
2025-08-06 2:17 ` David Gibson
2025-08-07 12:58 ` Laurent Vivier
2025-08-07 13:11 ` Stefano Brivio
2025-08-13 2:21 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 08/30] ndp: " Laurent Vivier
2025-08-05 15:46 ` [PATCH v8 09/30] icmp: " Laurent Vivier
2025-08-06 2:20 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 10/30] udp: " Laurent Vivier
2025-08-06 2:23 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 11/30] tcp: Convert tcp_tap_handler() to use iov_tail Laurent Vivier
2025-08-06 2:35 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 12/30] tcp: Convert tcp_data_from_tap() " Laurent Vivier
2025-08-06 2:37 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 13/30] dhcpv6: move offset initialization out of dhcpv6_opt() Laurent Vivier
2025-08-05 15:46 ` [PATCH v8 14/30] dhcpv6: Extract sending of NotOnLink status Laurent Vivier
2025-08-05 15:46 ` [PATCH v8 15/30] dhcpv6: Convert to iov_tail Laurent Vivier
2025-08-05 15:46 ` [PATCH v8 16/30] dhcpv6: Use iov_tail in dhcpv6_opt() Laurent Vivier
2025-08-06 4:14 ` David Gibson
2025-08-08 13:59 ` Laurent Vivier
2025-08-13 2:29 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 17/30] dhcp: Convert to iov_tail Laurent Vivier
2025-08-06 4:38 ` David Gibson
2025-08-08 9:33 ` Laurent Vivier
2025-08-13 2:27 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 18/30] ip: Use iov_tail in ipv6_l4hdr() Laurent Vivier
2025-08-06 5:12 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 19/30] tap: Convert tap4_handler() to iov_tail Laurent Vivier
2025-08-06 5:17 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 20/30] tap: Convert tap6_handler() " Laurent Vivier
2025-08-06 6:21 ` David Gibson
2025-08-08 13:57 ` Laurent Vivier
2025-08-13 3:22 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 21/30] packet: rename packet_data() to packet_get() Laurent Vivier
2025-08-06 6:22 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 22/30] arp: use iov_tail rather than pool Laurent Vivier
2025-08-06 6:24 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 23/30] dhcp: " Laurent Vivier
2025-08-06 6:26 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 24/30] dhcpv6: " Laurent Vivier
2025-08-06 6:27 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 25/30] icmp: " Laurent Vivier
2025-08-06 6:29 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 26/30] ndp: " Laurent Vivier
2025-08-06 6:31 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 27/30] packet: remove PACKET_POOL() and PACKET_POOL_P() Laurent Vivier
2025-08-06 6:32 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 28/30] packet: remove unused parameter from PACKET_POOL_DECL() Laurent Vivier
2025-08-06 6:33 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 29/30] packet: Refactor vhost-user memory region handling Laurent Vivier
2025-08-07 6:10 ` David Gibson
2025-08-07 9:05 ` Laurent Vivier
2025-08-07 11:44 ` David Gibson
2025-08-05 15:46 ` [PATCH v8 30/30] packet: Add support for multi-vector packets Laurent Vivier
2025-08-07 6:17 ` David Gibson