[PATCH v2 0/8] Add vhost-user support to passt (part 1)

public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed

* [PATCH v2 0/8] Add vhost-user support to passt (part 1)
@ 2024-02-14  8:56 Laurent Vivier
  2024-02-14  8:56 ` [PATCH v2 1/8] iov: add some functions to manage iovec Laurent Vivier
                   ` (7 more replies)
  0 siblings, 8 replies; 28+ messages in thread
From: Laurent Vivier @ 2024-02-14  8:56 UTC (permalink / raw)
  To: passt-dev; +Cc: Laurent Vivier

v2 comparing to vhost-user full part:
  - part 1 includes only preliminary patches (checksum, iovec, cleanup)
  - see detailed v2 history log in each patch.

Full series v1 available at:

  [PATCH 00/24] Add vhost-user support to passt.
  https://url.corp.redhat.com/passt-vhost-user-v1

Thanks,
Laurent

Laurent Vivier (8):
  iov: add some functions to manage iovec
  pcap: add pcap_iov()
  checksum: align buffers
  checksum: add csum_iov()
  util: move IP stuff from util.[ch] to ip.[ch]
  checksum: use csum_ip4_header() in udp.c and tcp.c
  checksum: introduce functions to compute the header part checksum for
    TCP/UDP
  tap: make tap_update_mac() generic

 Makefile     |  12 +--
 checksum.c   | 163 +++++++++++++++++++++------------------
 checksum.h   |  14 ++--
 conf.c       |   1 +
 dhcp.c       |   1 +
 flow.c       |   1 +
 icmp.c       |   1 +
 iov.c        | 212 +++++++++++++++++++++++++++++++++++++++++++++++++++
 iov.h        |  43 +++++++++++
 ip.c         |  72 +++++++++++++++++
 ip.h         |  86 +++++++++++++++++++++
 ndp.c        |   1 +
 pcap.c       |  61 +++++++++++++--
 pcap.h       |   1 +
 port_fwd.c   |   1 +
 qrap.c       |   1 +
 tap.c        |  32 ++++++--
 tap.h        |   2 +-
 tcp.c        |  73 ++++++------------
 tcp_splice.c |   1 +
 udp.c        |  39 ++++------
 util.c       |  55 -------------
 util.h       |  76 ------------------
 23 files changed, 640 insertions(+), 309 deletions(-)
 create mode 100644 iov.c
 create mode 100644 iov.h
 create mode 100644 ip.c
 create mode 100644 ip.h

-- 
2.42.0


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2 1/8] iov: add some functions to manage iovec
  2024-02-14  8:56 [PATCH v2 0/8] Add vhost-user support to passt (part 1) Laurent Vivier
@ 2024-02-14  8:56 ` Laurent Vivier
  2024-02-15  0:24   ` David Gibson
  2024-02-14  8:56 ` [PATCH v2 2/8] pcap: add pcap_iov() Laurent Vivier
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 28+ messages in thread
From: Laurent Vivier @ 2024-02-14  8:56 UTC (permalink / raw)
  To: passt-dev; +Cc: Laurent Vivier

Introduce functions to copy to/from a buffer from/to an iovec array,
to compute data length in in bytes of an iovec and to copy memory from
an iovec to another.

iov_from_buf(), iov_to_buf(), iov_size(), iov_copy().

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---

Notes:
    v2:
       - reorder added files in alphanetical order in Makefile
       - update comments, cosmetic cleanup
       - rename iov_from_buf_full/iov_to_buf_full to
         iov_fill_from_buf/iov_fill_to_buf
       - split loops that manage offset and bytes copy.
       - move iov_from_buf()/iov_to_buf() to iov.c

 Makefile |   8 +--
 iov.c    | 212 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 iov.h    |  43 +++++++++++
 3 files changed, 259 insertions(+), 4 deletions(-)
 create mode 100644 iov.c
 create mode 100644 iov.h

diff --git a/Makefile b/Makefile
index af4fa87e7e13..156398b3844e 100644
--- a/Makefile
+++ b/Makefile
@@ -45,16 +45,16 @@ FLAGS += -DVERSION=\"$(VERSION)\"
 FLAGS += -DDUAL_STACK_SOCKETS=$(DUAL_STACK_SOCKETS)
 
 PASST_SRCS = arch.c arp.c checksum.c conf.c dhcp.c dhcpv6.c flow.c icmp.c \
-	igmp.c isolation.c lineread.c log.c mld.c ndp.c netlink.c packet.c \
-	passt.c pasta.c pcap.c pif.c port_fwd.c tap.c tcp.c tcp_splice.c udp.c \
-	util.c
+	igmp.c iov.c isolation.c lineread.c log.c mld.c ndp.c netlink.c \
+	packet.c passt.c pasta.c pcap.c pif.c port_fwd.c tap.c tcp.c \
+	tcp_splice.c udp.c util.c
 QRAP_SRCS = qrap.c
 SRCS = $(PASST_SRCS) $(QRAP_SRCS)
 
 MANPAGES = passt.1 pasta.1 qrap.1
 
 PASST_HEADERS = arch.h arp.h checksum.h conf.h dhcp.h dhcpv6.h flow.h \
-	flow_table.h icmp.h inany.h isolation.h lineread.h log.h ndp.h \
+	flow_table.h icmp.h inany.h iov.h isolation.h lineread.h log.h ndp.h \
 	netlink.h packet.h passt.h pasta.h pcap.h pif.h port_fwd.h siphash.h \
 	tap.h tcp.h tcp_conn.h tcp_splice.h udp.h util.h
 HEADERS = $(PASST_HEADERS) seccomp.h
diff --git a/iov.c b/iov.c
new file mode 100644
index 000000000000..73dd5cf25d0d
--- /dev/null
+++ b/iov.c
@@ -0,0 +1,212 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * iov.h - helpers for using (partial) iovecs.
+ *
+ * Copyrigh (c) 2024 Red Hat
+ * Author: Laurent Vivier <lvivier@redhat.com>
+ *
+ * This file also contains code originally from QEMU include/qemu/iov.h
+ * and licensed under the following terms:
+ *
+ * Copyright (C) 2010 Red Hat, Inc.
+ *
+ * Author(s):
+ *  Amit Shah <amit.shah@redhat.com>
+ *  Michael Tokarev <mjt@tls.msk.ru>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Contributions after 2012-01-13 are licensed under the terms of the
+ * GNU GPL, version 2 or (at your option) any later version.
+ */
+#include <sys/socket.h>
+
+#include "util.h"
+#include "iov.h"
+
+/**
+ * iov_from_buf - Copy data from a buffer to a scatter/gather
+ *                I/O vector (struct iovec) efficiently.
+ *
+ * @iov:       Pointer to the array of struct iovec describing the
+ *             scatter/gather I/O vector.
+ * @iov_cnt:   Number of elements in the iov array.
+ * @offset:    Byte offset in the iov array where copying should start.
+ * @buf:       Pointer to the source buffer containing the data to copy.
+ * @bytes:     Total number of bytes to copy from buf to iov.
+ *
+ * Returns:    The number of bytes successfully copied.
+ */
+size_t iov_from_buf(const struct iovec *iov, unsigned int iov_cnt,
+		    size_t offset, const void *buf, size_t bytes)
+{
+	if (__builtin_constant_p(bytes) && iov_cnt &&
+		offset <= iov[0].iov_len && bytes <= iov[0].iov_len - offset) {
+		memcpy((char *)iov[0].iov_base + offset, buf, bytes);
+		return bytes;
+	}
+
+	return iov_fill_from_buf(iov, iov_cnt, offset, buf, bytes);
+}
+
+/**
+ * iov_to_buf - Copy data from a scatter/gather I/O vector (struct iovec) to
+ *		a buffer efficiently.
+ *
+ * @iov:       Pointer to the array of struct iovec describing the scatter/gather
+ *             I/O vector.
+ * @iov_cnt:   Number of elements in the iov array.
+ * @offset:    Offset within the first element of iov from where copying should start.
+ * @buf:       Pointer to the destination buffer where data will be copied.
+ * @bytes:     Total number of bytes to copy from iov to buf.
+ *
+ * Returns:    The number of bytes successfully copied.
+ */
+size_t iov_to_buf(const struct iovec *iov, unsigned int iov_cnt,
+		  size_t offset, void *buf, size_t bytes)
+{
+	if (__builtin_constant_p(bytes) && iov_cnt &&
+		offset <= iov[0].iov_len && bytes <= iov[0].iov_len - offset) {
+		memcpy(buf, (char *)iov[0].iov_base + offset, bytes);
+		return bytes;
+	}
+
+	return iov_fill_to_buf(iov, iov_cnt, offset, buf, bytes);
+}
+
+/**
+ * iov_fill_from_buf - Copy data from a buffer to a scatter/gather
+ *                     I/O vector (struct iovec) until either all bytes
+ *                     are copied or all elements in the vector are filled.
+ *
+ * @iov:       Pointer to the array of struct iovec describing the scatter/gather
+ *             I/O vector.
+ * @iov_cnt:   Number of elements in the iov array.
+ * @offset:    Byte offset in the iov array where copying should start.
+ * @buf:       Pointer to the source buffer containing the data to copy.
+ * @bytes:     Total number of bytes to copy from buf to iov.
+ *
+ * Returns:    The total number of bytes successfully copied
+ *
+ */
+size_t iov_fill_from_buf(const struct iovec *iov, unsigned int iov_cnt,
+			 size_t offset, const void *buf, size_t bytes)
+{
+	unsigned int i;
+	size_t copied;
+
+	/* skipping offset bytes in the iovec */
+	for (i = 0; offset >= iov[i].iov_len && i < iov_cnt; i++)
+		offset -= iov[i].iov_len;
+
+	/* copying data */
+	for (copied = 0; copied < bytes && i < iov_cnt; i++) {
+		size_t len = MIN(iov[i].iov_len - offset, bytes - copied);
+
+		memcpy((char *)iov[i].iov_base + offset, (char *)buf + copied,
+		       len);
+		copied += len;
+		offset = 0;
+	}
+
+	return copied;
+}
+
+/**
+ * iov_fill_to_buf - Copy data from a scatter/gather I/O vector (struct iovec)
+ *                   to a buffer until either all bytes are copied or all
+ *                   elements in the vector are read.
+ *
+ * @iov:       Pointer to the array of struct iovec describing the
+ *             scatter/gather I/O vector.
+ * @iov_cnt:   Number of elements in the iov array.
+ * @offset:    Byte offset in the iov array where copying should start.
+ * @buf:       Pointer to the destination buffer where data will be copied.
+ * @bytes:     Total number of bytes to copy from iov to buf.
+ *
+ * Returns:    The total number of bytes successfully copied
+ */
+size_t iov_fill_to_buf(const struct iovec *iov, unsigned int iov_cnt,
+		       size_t offset, void *buf, size_t bytes)
+{
+	unsigned int i;
+	size_t copied;
+
+	/* skipping offset bytes in the iovec */
+	for (i = 0; offset >= iov[i].iov_len && i < iov_cnt; i++)
+		offset -= iov[i].iov_len;
+
+	/* copying data */
+	for (copied = 0; copied < bytes && i < iov_cnt; i++) {
+		size_t len = MIN(iov[i].iov_len - offset, bytes - copied);
+		memcpy((char *)buf + copied, (char *)iov[i].iov_base + offset,
+		       len);
+		copied += len;
+		offset = 0;
+	}
+
+	return copied;
+}
+
+/**
+ * iov_size - Calculate the total size of a scatter/gather I/O vector
+ *            (struct iovec).
+ *
+ * @iov:       Pointer to the array of struct iovec describing the
+ *             scatter/gather I/O vector.
+ * @iov_cnt:   Number of elements in the iov array.
+ *
+ * Returns:    The total size in bytes.
+ */
+size_t iov_size(const struct iovec *iov, const unsigned int iov_cnt)
+{
+	size_t len;
+	unsigned int i;
+
+	for (i = 0, len = 0; i < iov_cnt; i++) {
+		len += iov[i].iov_len;
+	}
+	return len;
+}
+
+/**
+ * iov_copy - Copy data from one scatter/gather I/O vector (struct iovec) to
+ *            another.
+ *
+ * @dst_iov:      Pointer to the destination array of struct iovec describing
+ *                the scatter/gather I/O vector to copy to.
+ * @dst_iov_cnt:  Number of elements in the destination iov array.
+ * @iov:          Pointer to the source array of struct iovec describing
+ *                the scatter/gather I/O vector to copy from.
+ * @iov_cnt:      Number of elements in the source iov array.
+ * @offset:       Offset within the source iov from where copying should start.
+ * @bytes:        Total number of bytes to copy from iov to dst_iov.
+ *
+ * Returns:       The number of elements successfully copied to the destination
+ *                iov array.
+ */
+unsigned iov_copy(struct iovec *dst_iov, unsigned int dst_iov_cnt,
+		  const struct iovec *iov, unsigned int iov_cnt,
+		  size_t offset, size_t bytes)
+{
+	unsigned int i, j;
+	size_t len;
+
+	/* skipping offset bytes in the iovec */
+	for (i = 0; offset >= iov[i].iov_len && i < iov_cnt; i++)
+		offset -= iov[i].iov_len;
+
+	/* copying data */
+	for (j = 0; i < iov_cnt && j < dst_iov_cnt && bytes; i++) {
+		len = MIN(bytes, iov[i].iov_len - offset);
+
+		dst_iov[j].iov_base = (char *)iov[i].iov_base + offset;
+		dst_iov[j].iov_len = len;
+		j++;
+		bytes -= len;
+		offset = 0;
+	}
+
+	return j;
+}
diff --git a/iov.h b/iov.h
new file mode 100644
index 000000000000..0153acca9e62
--- /dev/null
+++ b/iov.h
@@ -0,0 +1,43 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * iov.c - helpers for using (partial) iovecs.
+ *
+ * Copyrigh (c) 2024 Red Hat
+ * Author: Laurent Vivier <lvivier@redhat.com>
+ *
+ * This file also contains code originally from QEMU include/qemu/iov.h
+ * and licensed under the following terms:
+ *
+ * Copyright (C) 2010 Red Hat, Inc.
+ *
+ * Author(s):
+ *  Amit Shah <amit.shah@redhat.com>
+ *  Michael Tokarev <mjt@tls.msk.ru>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Contributions after 2012-01-13 are licensed under the terms of the
+ * GNU GPL, version 2 or (at your option) any later version.
+ */
+
+#ifndef IOVEC_H
+#define IOVEC_H
+
+#include <unistd.h>
+#include <string.h>
+
+size_t iov_fill_from_buf(const struct iovec *iov, unsigned int iov_cnt,
+			 size_t offset, const void *buf, size_t bytes);
+size_t iov_fill_to_buf(const struct iovec *iov, const unsigned int iov_cnt,
+		       size_t offset, void *buf, size_t bytes);
+size_t iov_from_buf(const struct iovec *iov, unsigned int iov_cnt,
+                    size_t offset, const void *buf, size_t bytes);
+size_t iov_to_buf(const struct iovec *iov, unsigned int iov_cnt,
+                  size_t offset, void *buf, size_t bytes);
+
+size_t iov_size(const struct iovec *iov, const unsigned int iov_cnt);
+unsigned iov_copy(struct iovec *dst_iov, unsigned int dst_iov_cnt,
+		  const struct iovec *iov, unsigned int iov_cnt,
+		  size_t offset, size_t bytes);
+#endif /* IOVEC_H */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2 2/8] pcap: add pcap_iov()
  2024-02-14  8:56 [PATCH v2 0/8] Add vhost-user support to passt (part 1) Laurent Vivier
  2024-02-14  8:56 ` [PATCH v2 1/8] iov: add some functions to manage iovec Laurent Vivier
@ 2024-02-14  8:56 ` Laurent Vivier
  2024-02-15  0:35   ` David Gibson
  2024-02-16  5:30   ` Stefano Brivio
  2024-02-14  8:56 ` [PATCH v2 3/8] checksum: align buffers Laurent Vivier
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 28+ messages in thread
From: Laurent Vivier @ 2024-02-14  8:56 UTC (permalink / raw)
  To: passt-dev; +Cc: Laurent Vivier

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---

Notes:
    v2:
      - introduce pcap_header(), a common helper to write
        packet header
      - use writev() rather than write() in a loop
      - add functions comment

 pcap.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++-------
 pcap.h |  1 +
 2 files changed, 55 insertions(+), 7 deletions(-)

diff --git a/pcap.c b/pcap.c
index 501d52d4992b..3869a403dd0f 100644
--- a/pcap.c
+++ b/pcap.c
@@ -20,6 +20,7 @@
 #include <sys/time.h>
 #include <sys/types.h>
 #include <sys/stat.h>
+#include <sys/uio.h>
 #include <fcntl.h>
 #include <time.h>
 #include <errno.h>
@@ -31,6 +32,7 @@
 #include "util.h"
 #include "passt.h"
 #include "log.h"
+#include "iov.h"
 
 #define PCAP_VERSION_MINOR 4
 
@@ -65,6 +67,28 @@ struct pcap_pkthdr {
 	uint32_t len;
 };
 
+/*
+ * pcap_header - Write a pcap packet header to the pcap file descriptor (pcap_fd).
+ *
+ * @len:       Length of the packet data.
+ * @tv:        Pointer to a timeval struct containing the timestamp for the packet.
+ *
+ * Returns; -1 in case of error, otherwise, 0 to indicate success.
+ */
+static int pcap_header(size_t len, const struct timeval *tv)
+{
+	struct pcap_pkthdr h;
+
+	h.tv_sec = tv->tv_sec;
+	h.tv_usec = tv->tv_usec;
+	h.caplen = h.len = len;
+
+	if (write(pcap_fd, &h, sizeof(h)) < 0)
+		return -1;
+
+	return 0;
+}
+
 /**
  * pcap_frame() - Capture a single frame to pcap file with given timestamp
  * @pkt:	Pointer to data buffer, including L2 headers
@@ -75,13 +99,7 @@ struct pcap_pkthdr {
  */
 static int pcap_frame(const char *pkt, size_t len, const struct timeval *tv)
 {
-	struct pcap_pkthdr h;
-
-	h.tv_sec = tv->tv_sec;
-	h.tv_usec = tv->tv_usec;
-	h.caplen = h.len = len;
-
-	if (write(pcap_fd, &h, sizeof(h)) < 0 || write(pcap_fd, pkt, len) < 0)
+	if (pcap_header(len, tv) < 0 || write(pcap_fd, pkt, len) < 0)
 		return -errno;
 
 	return 0;
@@ -130,6 +148,35 @@ void pcap_multiple(const struct iovec *iov, unsigned int n, size_t offset)
 	}
 }
 
+/*
+ * pcap_iov - Write packet data described by a scatter/gather I/O vector (iov)
+ *             to a pcap file descriptor (pcap_fd).
+ *
+ * @iov:       Pointer to the array of struct iovec describing the scatter/gather
+ *             I/O vector containing packet data to write, including L2 header
+ * @n:         Number of elements in the iov array.
+ */
+void pcap_iov(const struct iovec *iov, unsigned int n)
+{
+	struct timeval tv;
+	size_t len;
+
+	if (pcap_fd == -1)
+		return;
+
+	gettimeofday(&tv, NULL);
+
+	len = iov_size(iov, n);
+
+	if (pcap_header(len, &tv) < 0) {
+		debug("Cannot write pcap header");
+		return;
+	}
+
+	if (writev(pcap_fd, iov, n) < 0)
+		debug("Cannot log packet using writev(), n = %u\n", n);
+}
+
 /**
  * pcap_init() - Initialise pcap file
  * @c:		Execution context
diff --git a/pcap.h b/pcap.h
index da5a7e846b72..732a0ddf14cc 100644
--- a/pcap.h
+++ b/pcap.h
@@ -8,6 +8,7 @@
 
 void pcap(const char *pkt, size_t len);
 void pcap_multiple(const struct iovec *iov, unsigned int n, size_t offset);
+void pcap_iov(const struct iovec *iov, unsigned int n);
 void pcap_init(struct ctx *c);
 
 #endif /* PCAP_H */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2 3/8] checksum: align buffers
  2024-02-14  8:56 [PATCH v2 0/8] Add vhost-user support to passt (part 1) Laurent Vivier
  2024-02-14  8:56 ` [PATCH v2 1/8] iov: add some functions to manage iovec Laurent Vivier
  2024-02-14  8:56 ` [PATCH v2 2/8] pcap: add pcap_iov() Laurent Vivier
@ 2024-02-14  8:56 ` Laurent Vivier
  2024-02-15  0:40   ` David Gibson
  2024-02-14  8:56 ` [PATCH v2 4/8] checksum: add csum_iov() Laurent Vivier
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 28+ messages in thread
From: Laurent Vivier @ 2024-02-14  8:56 UTC (permalink / raw)
  To: passt-dev; +Cc: Laurent Vivier

if buffer is not aligned use sum_16b() only on the not aligned
part, and then use csum_avx2() on the remaining part

Remove unneeded now function csum_unaligned().

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---

Notes:
    v2:
      - use ROUND_UP() and sizeof(__m256i)
      - fix function comment
      - remove csum_unaligned() and use csum() instead

 checksum.c | 47 ++++++++++++++++++++++++-----------------------
 1 file changed, 24 insertions(+), 23 deletions(-)

diff --git a/checksum.c b/checksum.c
index f21c9b7a14d1..65486b4625ba 100644
--- a/checksum.c
+++ b/checksum.c
@@ -56,6 +56,8 @@
 #include <linux/udp.h>
 #include <linux/icmpv6.h>
 
+#include "util.h"
+
 /* Checksums are optional for UDP over IPv4, so we usually just set
  * them to 0.  Change this to 1 to calculate real UDP over IPv4
  * checksums
@@ -110,20 +112,7 @@ uint16_t csum_fold(uint32_t sum)
 	return sum;
 }
 
-/**
- * csum_unaligned() - Compute TCP/IP-style checksum for not 32-byte aligned data
- * @buf:	Input data
- * @len:	Input length
- * @init:	Initial 32-bit checksum, 0 for no pre-computed checksum
- *
- * Return: 16-bit IPv4-style checksum
- */
-/* NOLINTNEXTLINE(clang-diagnostic-unknown-attributes) */
-__attribute__((optimize("-fno-strict-aliasing")))	/* See csum_16b() */
-uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init)
-{
-	return (uint16_t)~csum_fold(sum_16b(buf, len) + init);
-}
+uint16_t csum(const void *buf, size_t len, uint32_t init);
 
 /**
  * csum_ip4_header() - Calculate and set IPv4 header checksum
@@ -132,7 +121,7 @@ uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init)
 void csum_ip4_header(struct iphdr *ip4h)
 {
 	ip4h->check = 0;
-	ip4h->check = csum_unaligned(ip4h, (size_t)ip4h->ihl * 4, 0);
+	ip4h->check = csum(ip4h, (size_t)ip4h->ihl * 4, 0);
 }
 
 /**
@@ -159,7 +148,7 @@ void csum_udp4(struct udphdr *udp4hr,
 			+ htons(IPPROTO_UDP);
 		/* Add in partial checksum for the UDP header alone */
 		psum += sum_16b(udp4hr, sizeof(*udp4hr));
-		udp4hr->check = csum_unaligned(payload, len, psum);
+		udp4hr->check = csum(payload, len, psum);
 	}
 }
 
@@ -178,7 +167,7 @@ void csum_icmp4(struct icmphdr *icmp4hr, const void *payload, size_t len)
 	/* Partial checksum for ICMP header alone */
 	psum = sum_16b(icmp4hr, sizeof(*icmp4hr));
 
-	icmp4hr->checksum = csum_unaligned(payload, len, psum);
+	icmp4hr->checksum = csum(payload, len, psum);
 }
 
 /**
@@ -199,7 +188,7 @@ void csum_udp6(struct udphdr *udp6hr,
 	udp6hr->check = 0;
 	/* Add in partial checksum for the UDP header alone */
 	psum += sum_16b(udp6hr, sizeof(*udp6hr));
-	udp6hr->check = csum_unaligned(payload, len, psum);
+	udp6hr->check = csum(payload, len, psum);
 }
 
 /**
@@ -222,7 +211,7 @@ void csum_icmp6(struct icmp6hdr *icmp6hr,
 	icmp6hr->icmp6_cksum = 0;
 	/* Add in partial checksum for the ICMPv6 header alone */
 	psum += sum_16b(icmp6hr, sizeof(*icmp6hr));
-	icmp6hr->icmp6_cksum = csum_unaligned(payload, len, psum);
+	icmp6hr->icmp6_cksum = csum(payload, len, psum);
 }
 
 #ifdef __AVX2__
@@ -397,17 +386,29 @@ less_than_128_bytes:
 
 /**
  * csum() - Compute TCP/IP-style checksum
- * @buf:	Input buffer, must be aligned to 32-byte boundary
+ * @buf:	Input buffer
  * @len:	Input length
  * @init:	Initial 32-bit checksum, 0 for no pre-computed checksum
  *
- * Return: 16-bit folded, complemented checksum sum
+ * Return: 16-bit folded, complemented checksum
  */
 /* NOLINTNEXTLINE(clang-diagnostic-unknown-attributes) */
 __attribute__((optimize("-fno-strict-aliasing")))	/* See csum_16b() */
 uint16_t csum(const void *buf, size_t len, uint32_t init)
 {
-	return (uint16_t)~csum_fold(csum_avx2(buf, len, init));
+	intptr_t align = ROUND_UP((intptr_t)buf, sizeof(__m256i));
+	unsigned int pad = align - (intptr_t)buf;
+
+	if (len < pad)
+		pad = len;
+
+	if (pad)
+		init += sum_16b(buf, pad);
+
+	if (len > pad)
+		init = csum_avx2((void *)align, len - pad, init);
+
+	return (uint16_t)~csum_fold(init);
 }
 
 #else /* __AVX2__ */
@@ -424,7 +425,7 @@ uint16_t csum(const void *buf, size_t len, uint32_t init)
 __attribute__((optimize("-fno-strict-aliasing")))	/* See csum_16b() */
 uint16_t csum(const void *buf, size_t len, uint32_t init)
 {
-	return csum_unaligned(buf, len, init);
+	return (uint16_t)~csum_fold(sum_16b(buf, len) + init);
 }
 
 #endif /* !__AVX2__ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2 4/8] checksum: add csum_iov()
  2024-02-14  8:56 [PATCH v2 0/8] Add vhost-user support to passt (part 1) Laurent Vivier
                   ` (2 preceding siblings ...)
  2024-02-14  8:56 ` [PATCH v2 3/8] checksum: align buffers Laurent Vivier
@ 2024-02-14  8:56 ` Laurent Vivier
  2024-02-15  0:44   ` David Gibson
  2024-02-14  8:56 ` [PATCH v2 5/8] util: move IP stuff from util.[ch] to ip.[ch] Laurent Vivier
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 28+ messages in thread
From: Laurent Vivier @ 2024-02-14  8:56 UTC (permalink / raw)
  To: passt-dev; +Cc: Laurent Vivier

Introduce the function csum_unfolded() that computes the unfolded
32-bit checksum of a data buffer, and call it from csum() that returns
the folded value.

Introduce csum_iov() that computes the checksum using csum_folded() on
all vectors of the iovec array and returns the folded result.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---

Notes:
    v2:
      - fix typo and superfluous space
      - update comments

 checksum.c | 46 ++++++++++++++++++++++++++++++++++------------
 checksum.h |  1 +
 2 files changed, 35 insertions(+), 12 deletions(-)

diff --git a/checksum.c b/checksum.c
index 65486b4625ba..ac2bc49f7eb0 100644
--- a/checksum.c
+++ b/checksum.c
@@ -385,16 +385,16 @@ less_than_128_bytes:
 }
 
 /**
- * csum() - Compute TCP/IP-style checksum
- * @buf:	Input buffer
- * @len:	Input length
- * @init:	Initial 32-bit checksum, 0 for no pre-computed checksum
+ * csum_unfolded - Calculate the unfolded checksum of a data buffer.
  *
- * Return: 16-bit folded, complemented checksum
+ * @buf:   Input buffer
+ * @len:   Input length
+ * @init:  Initial 32-bit checksum, 0 for no pre-computed checksum
+ *
+ * Return: 32-bit unfolded, complemented checksum
  */
-/* NOLINTNEXTLINE(clang-diagnostic-unknown-attributes) */
 __attribute__((optimize("-fno-strict-aliasing")))	/* See csum_16b() */
-uint16_t csum(const void *buf, size_t len, uint32_t init)
+uint32_t csum_unfolded(const void *buf, size_t len, uint32_t init)
 {
 	intptr_t align = ROUND_UP((intptr_t)buf, sizeof(__m256i));
 	unsigned int pad = align - (intptr_t)buf;
@@ -408,16 +408,30 @@ uint16_t csum(const void *buf, size_t len, uint32_t init)
 	if (len > pad)
 		init = csum_avx2((void *)align, len - pad, init);
 
-	return (uint16_t)~csum_fold(init);
+	return init;
 }
-
 #else /* __AVX2__ */
+/**
+ * csum_unfolded - Calculate the unfolded checksum of a data buffer.
+ *
+ * @buf:   Input buffer
+ * @len:   Input length
+ * @init:  Initial 32-bit checksum, 0 for no pre-computed checksum
+ *
+ * Return: 32-bit unfolded, complemented checksum
+ */
+__attribute__((optimize("-fno-strict-aliasing")))	/* See csum_16b() */
+uint32_t csum_unfolded(const void *buf, size_t len, uint32_t init)
+{
+	return sum_16b(buf, len) + init;
+}
+#endif /* !__AVX2__ */
 
 /**
  * csum() - Compute TCP/IP-style checksum
  * @buf:	Input buffer
  * @len:	Input length
- * @sum:	Initial 32-bit checksum, 0 for no pre-computed checksum
+ * @init:	Initial 32-bit checksum, 0 for no pre-computed checksum
  *
  * Return: 16-bit folded, complemented checksum
  */
@@ -425,7 +439,15 @@ uint16_t csum(const void *buf, size_t len, uint32_t init)
 __attribute__((optimize("-fno-strict-aliasing")))	/* See csum_16b() */
 uint16_t csum(const void *buf, size_t len, uint32_t init)
 {
-	return (uint16_t)~csum_fold(sum_16b(buf, len) + init);
+	return (uint16_t)~csum_fold(csum_unfolded(buf, len, init));
 }
 
-#endif /* !__AVX2__ */
+uint16_t csum_iov(struct iovec *iov, unsigned int n, uint32_t init)
+{
+	unsigned int i;
+
+	for (i = 0; i < n; i++)
+		init = csum_unfolded(iov[i].iov_base, iov[i].iov_len, init);
+
+	return (uint16_t)~csum_fold(init);
+}
diff --git a/checksum.h b/checksum.h
index 21c0310d3804..6a20297a5826 100644
--- a/checksum.h
+++ b/checksum.h
@@ -25,5 +25,6 @@ void csum_icmp6(struct icmp6hdr *icmp6hr,
 		const struct in6_addr *saddr, const struct in6_addr *daddr,
 		const void *payload, size_t len);
 uint16_t csum(const void *buf, size_t len, uint32_t init);
+uint16_t csum_iov(struct iovec *iov, unsigned int n, uint32_t init);
 
 #endif /* CHECKSUM_H */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2 5/8] util: move IP stuff from util.[ch] to ip.[ch]
  2024-02-14  8:56 [PATCH v2 0/8] Add vhost-user support to passt (part 1) Laurent Vivier
                   ` (3 preceding siblings ...)
  2024-02-14  8:56 ` [PATCH v2 4/8] checksum: add csum_iov() Laurent Vivier
@ 2024-02-14  8:56 ` Laurent Vivier
  2024-02-15  2:29   ` David Gibson
  2024-02-14  8:56 ` [PATCH v2 6/8] checksum: use csum_ip4_header() in udp.c and tcp.c Laurent Vivier
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 28+ messages in thread
From: Laurent Vivier @ 2024-02-14  8:56 UTC (permalink / raw)
  To: passt-dev; +Cc: Laurent Vivier

Introduce ip.[ch] file to encapsulate IP protocol handling
functions and structures.
Modify various files to include the new header ip.h when it's
needed.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---

Notes:
    v2:
      - update rational and comments

 Makefile     |  8 ++---
 conf.c       |  1 +
 dhcp.c       |  1 +
 flow.c       |  1 +
 icmp.c       |  1 +
 ip.c         | 72 +++++++++++++++++++++++++++++++++++++++++++
 ip.h         | 86 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 ndp.c        |  1 +
 port_fwd.c   |  1 +
 qrap.c       |  1 +
 tap.c        |  1 +
 tcp.c        |  1 +
 tcp_splice.c |  1 +
 udp.c        |  1 +
 util.c       | 55 ---------------------------------
 util.h       | 76 ----------------------------------------------
 16 files changed, 173 insertions(+), 135 deletions(-)
 create mode 100644 ip.c
 create mode 100644 ip.h

diff --git a/Makefile b/Makefile
index 156398b3844e..e1ebb454bc6b 100644
--- a/Makefile
+++ b/Makefile
@@ -45,7 +45,7 @@ FLAGS += -DVERSION=\"$(VERSION)\"
 FLAGS += -DDUAL_STACK_SOCKETS=$(DUAL_STACK_SOCKETS)
 
 PASST_SRCS = arch.c arp.c checksum.c conf.c dhcp.c dhcpv6.c flow.c icmp.c \
-	igmp.c iov.c isolation.c lineread.c log.c mld.c ndp.c netlink.c \
+	igmp.c iov.c ip.c isolation.c lineread.c log.c mld.c ndp.c netlink.c \
 	packet.c passt.c pasta.c pcap.c pif.c port_fwd.c tap.c tcp.c \
 	tcp_splice.c udp.c util.c
 QRAP_SRCS = qrap.c
@@ -54,9 +54,9 @@ SRCS = $(PASST_SRCS) $(QRAP_SRCS)
 MANPAGES = passt.1 pasta.1 qrap.1
 
 PASST_HEADERS = arch.h arp.h checksum.h conf.h dhcp.h dhcpv6.h flow.h \
-	flow_table.h icmp.h inany.h iov.h isolation.h lineread.h log.h ndp.h \
-	netlink.h packet.h passt.h pasta.h pcap.h pif.h port_fwd.h siphash.h \
-	tap.h tcp.h tcp_conn.h tcp_splice.h udp.h util.h
+	flow_table.h icmp.h inany.h iov.h ip.h isolation.h lineread.h log.h \
+	ndp.h netlink.h packet.h passt.h pasta.h pcap.h pif.h port_fwd.h \
+	siphash.h tap.h tcp.h tcp_conn.h tcp_splice.h udp.h util.h
 HEADERS = $(PASST_HEADERS) seccomp.h
 
 C := \#include <linux/tcp.h>\nstruct tcp_info x = { .tcpi_snd_wnd = 0 };
diff --git a/conf.c b/conf.c
index 5e15b665be9c..93bfda331349 100644
--- a/conf.c
+++ b/conf.c
@@ -35,6 +35,7 @@
 #include <netinet/if_ether.h>
 
 #include "util.h"
+#include "ip.h"
 #include "passt.h"
 #include "netlink.h"
 #include "udp.h"
diff --git a/dhcp.c b/dhcp.c
index 110772867632..ff4834a3dce9 100644
--- a/dhcp.c
+++ b/dhcp.c
@@ -25,6 +25,7 @@
 #include <limits.h>
 
 #include "util.h"
+#include "ip.h"
 #include "checksum.h"
 #include "packet.h"
 #include "passt.h"
diff --git a/flow.c b/flow.c
index 5e94a7a949e5..73d52bda8774 100644
--- a/flow.c
+++ b/flow.c
@@ -11,6 +11,7 @@
 #include <string.h>
 
 #include "util.h"
+#include "ip.h"
 #include "passt.h"
 #include "siphash.h"
 #include "inany.h"
diff --git a/icmp.c b/icmp.c
index 9434fc5a7490..3b85a8578316 100644
--- a/icmp.c
+++ b/icmp.c
@@ -33,6 +33,7 @@
 
 #include "packet.h"
 #include "util.h"
+#include "ip.h"
 #include "passt.h"
 #include "tap.h"
 #include "log.h"
diff --git a/ip.c b/ip.c
new file mode 100644
index 000000000000..2cc7f6548aff
--- /dev/null
+++ b/ip.c
@@ -0,0 +1,72 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+/* PASST - Plug A Simple Socket Transport
+ *  for qemu/UNIX domain socket mode
+ *
+ * PASTA - Pack A Subtle Tap Abstraction
+ *  for network namespace/tap device mode
+ *
+ * ip.c - IP related functions
+ *
+ * Copyright (c) 2020-2021 Red Hat GmbH
+ * Author: Stefano Brivio <sbrivio@redhat.com>
+ */
+
+#include <stddef.h>
+#include "util.h"
+#include "ip.h"
+
+#define IPV6_NH_OPT(nh)							\
+	((nh) == 0   || (nh) == 43  || (nh) == 44  || (nh) == 50  ||	\
+	 (nh) == 51  || (nh) == 60  || (nh) == 135 || (nh) == 139 ||	\
+	 (nh) == 140 || (nh) == 253 || (nh) == 254)
+
+/**
+ * ipv6_l4hdr() - Find pointer to L4 header in IPv6 packet and extract protocol
+ * @p:		Packet pool, packet number @idx has IPv6 header at @offset
+ * @idx:	Index of packet in pool
+ * @offset:	Pre-calculated IPv6 header offset
+ * @proto:	Filled with L4 protocol number
+ * @dlen:	Data length (payload excluding header extensions), set on return
+ *
+ * Return: pointer to L4 header, NULL if not found
+ */
+char *ipv6_l4hdr(const struct pool *p, int idx, size_t offset, uint8_t *proto,
+		 size_t *dlen)
+{
+	const struct ipv6_opt_hdr *o;
+	const struct ipv6hdr *ip6h;
+	char *base;
+	int hdrlen;
+	uint8_t nh;
+
+	base = packet_get(p, idx, 0, 0, NULL);
+	ip6h = packet_get(p, idx, offset, sizeof(*ip6h), dlen);
+	if (!ip6h)
+		return NULL;
+
+	offset += sizeof(*ip6h);
+
+	nh = ip6h->nexthdr;
+	if (!IPV6_NH_OPT(nh))
+		goto found;
+
+	while ((o = packet_get_try(p, idx, offset, sizeof(*o), dlen))) {
+		nh = o->nexthdr;
+		hdrlen = (o->hdrlen + 1) * 8;
+
+		if (IPV6_NH_OPT(nh))
+			offset += hdrlen;
+		else
+			goto found;
+	}
+
+	return NULL;
+
+found:
+	if (nh == 59)
+		return NULL;
+
+	*proto = nh;
+	return base + offset;
+}
diff --git a/ip.h b/ip.h
new file mode 100644
index 000000000000..b2e08bc049f3
--- /dev/null
+++ b/ip.h
@@ -0,0 +1,86 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later
+ * Copyright (c) 2021 Red Hat GmbH
+ * Author: Stefano Brivio <sbrivio@redhat.com>
+ */
+
+#ifndef IP_H
+#define IP_H
+
+#include <netinet/ip.h>
+#include <netinet/ip6.h>
+
+#define IN4_IS_ADDR_UNSPECIFIED(a) \
+	((a)->s_addr == htonl_constant(INADDR_ANY))
+#define IN4_IS_ADDR_BROADCAST(a) \
+	((a)->s_addr == htonl_constant(INADDR_BROADCAST))
+#define IN4_IS_ADDR_LOOPBACK(a) \
+	(ntohl((a)->s_addr) >> IN_CLASSA_NSHIFT == IN_LOOPBACKNET)
+#define IN4_IS_ADDR_MULTICAST(a) \
+	(IN_MULTICAST(ntohl((a)->s_addr)))
+#define IN4_ARE_ADDR_EQUAL(a, b) \
+	(((struct in_addr *)(a))->s_addr == ((struct in_addr *)b)->s_addr)
+#define IN4ADDR_LOOPBACK_INIT \
+	{ .s_addr	= htonl_constant(INADDR_LOOPBACK) }
+#define IN4ADDR_ANY_INIT \
+	{ .s_addr	= htonl_constant(INADDR_ANY) }
+
+#define L2_BUF_IP4_INIT(proto)						\
+	{								\
+		.version	= 4,					\
+		.ihl		= 5,					\
+		.tos		= 0,					\
+		.tot_len	= 0,					\
+		.id		= 0,					\
+		.frag_off	= 0,					\
+		.ttl		= 0xff,					\
+		.protocol	= (proto),				\
+		.saddr		= 0,					\
+		.daddr		= 0,					\
+	}
+#define L2_BUF_IP4_PSUM(proto)	((uint32_t)htons_constant(0x4500) +	\
+				 (uint32_t)htons_constant(0xff00 | (proto)))
+
+#define L2_BUF_IP6_INIT(proto)						\
+	{								\
+		.priority	= 0,					\
+		.version	= 6,					\
+		.flow_lbl	= { 0 },				\
+		.payload_len	= 0,					\
+		.nexthdr	= (proto),				\
+		.hop_limit	= 255,					\
+		.saddr		= IN6ADDR_ANY_INIT,			\
+		.daddr		= IN6ADDR_ANY_INIT,			\
+	}
+
+struct ipv6hdr {
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wpedantic"
+#if __BYTE_ORDER == __BIG_ENDIAN
+	uint8_t			version:4,
+				priority:4;
+#else
+	uint8_t			priority:4,
+				version:4;
+#endif
+#pragma GCC diagnostic pop
+	uint8_t			flow_lbl[3];
+
+	uint16_t		payload_len;
+	uint8_t			nexthdr;
+	uint8_t			hop_limit;
+
+	struct in6_addr		saddr;
+	struct in6_addr		daddr;
+};
+
+struct ipv6_opt_hdr {
+	uint8_t			nexthdr;
+	uint8_t			hdrlen;
+	/*
+	 * TLV encoded option data follows.
+	 */
+} __attribute__((packed));	/* required for some archs */
+
+char *ipv6_l4hdr(const struct pool *p, int idx, size_t offset, uint8_t *proto,
+		 size_t *dlen);
+#endif /* IP_H */
diff --git a/ndp.c b/ndp.c
index 4c85ab8bcaee..c58f4b222b76 100644
--- a/ndp.c
+++ b/ndp.c
@@ -28,6 +28,7 @@
 
 #include "checksum.h"
 #include "util.h"
+#include "ip.h"
 #include "passt.h"
 #include "tap.h"
 #include "log.h"
diff --git a/port_fwd.c b/port_fwd.c
index 6f6c836c57ad..e1ec31e2232c 100644
--- a/port_fwd.c
+++ b/port_fwd.c
@@ -21,6 +21,7 @@
 #include <stdio.h>
 
 #include "util.h"
+#include "ip.h"
 #include "port_fwd.h"
 #include "passt.h"
 #include "lineread.h"
diff --git a/qrap.c b/qrap.c
index 97f350a4bf0b..d59670621731 100644
--- a/qrap.c
+++ b/qrap.c
@@ -32,6 +32,7 @@
 #include <linux/icmpv6.h>
 
 #include "util.h"
+#include "ip.h"
 #include "passt.h"
 #include "arp.h"
 
diff --git a/tap.c b/tap.c
index 396dee7eef25..3ea03f720d6d 100644
--- a/tap.c
+++ b/tap.c
@@ -45,6 +45,7 @@
 
 #include "checksum.h"
 #include "util.h"
+#include "ip.h"
 #include "passt.h"
 #include "arp.h"
 #include "dhcp.h"
diff --git a/tcp.c b/tcp.c
index 2ab443d5c3f2..45ef5146729a 100644
--- a/tcp.c
+++ b/tcp.c
@@ -289,6 +289,7 @@
 
 #include "checksum.h"
 #include "util.h"
+#include "ip.h"
 #include "passt.h"
 #include "tap.h"
 #include "siphash.h"
diff --git a/tcp_splice.c b/tcp_splice.c
index 26d32065cd47..66575ca95a1e 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -49,6 +49,7 @@
 #include <sys/socket.h>
 
 #include "util.h"
+#include "ip.h"
 #include "passt.h"
 #include "log.h"
 #include "tcp_splice.h"
diff --git a/udp.c b/udp.c
index b5b8f8a7cd5b..d514c864ab5b 100644
--- a/udp.c
+++ b/udp.c
@@ -112,6 +112,7 @@
 
 #include "checksum.h"
 #include "util.h"
+#include "ip.h"
 #include "passt.h"
 #include "tap.h"
 #include "pcap.h"
diff --git a/util.c b/util.c
index 21b35ff94db1..f73ea1d98a09 100644
--- a/util.c
+++ b/util.c
@@ -30,61 +30,6 @@
 #include "packet.h"
 #include "log.h"
 
-#define IPV6_NH_OPT(nh)							\
-	((nh) == 0   || (nh) == 43  || (nh) == 44  || (nh) == 50  ||	\
-	 (nh) == 51  || (nh) == 60  || (nh) == 135 || (nh) == 139 ||	\
-	 (nh) == 140 || (nh) == 253 || (nh) == 254)
-
-/**
- * ipv6_l4hdr() - Find pointer to L4 header in IPv6 packet and extract protocol
- * @p:		Packet pool, packet number @idx has IPv6 header at @offset
- * @idx:	Index of packet in pool
- * @offset:	Pre-calculated IPv6 header offset
- * @proto:	Filled with L4 protocol number
- * @dlen:	Data length (payload excluding header extensions), set on return
- *
- * Return: pointer to L4 header, NULL if not found
- */
-char *ipv6_l4hdr(const struct pool *p, int idx, size_t offset, uint8_t *proto,
-		 size_t *dlen)
-{
-	const struct ipv6_opt_hdr *o;
-	const struct ipv6hdr *ip6h;
-	char *base;
-	int hdrlen;
-	uint8_t nh;
-
-	base = packet_get(p, idx, 0, 0, NULL);
-	ip6h = packet_get(p, idx, offset, sizeof(*ip6h), dlen);
-	if (!ip6h)
-		return NULL;
-
-	offset += sizeof(*ip6h);
-
-	nh = ip6h->nexthdr;
-	if (!IPV6_NH_OPT(nh))
-		goto found;
-
-	while ((o = packet_get_try(p, idx, offset, sizeof(*o), dlen))) {
-		nh = o->nexthdr;
-		hdrlen = (o->hdrlen + 1) * 8;
-
-		if (IPV6_NH_OPT(nh))
-			offset += hdrlen;
-		else
-			goto found;
-	}
-
-	return NULL;
-
-found:
-	if (nh == 59)
-		return NULL;
-
-	*proto = nh;
-	return base + offset;
-}
-
 /**
  * sock_l4() - Create and bind socket for given L4, add to epoll list
  * @c:		Execution context
diff --git a/util.h b/util.h
index d2320f8cc99a..f7c3dfee9972 100644
--- a/util.h
+++ b/util.h
@@ -110,22 +110,6 @@
 #define	htonl_constant(x)	(__bswap_constant_32(x))
 #endif
 
-#define IN4_IS_ADDR_UNSPECIFIED(a) \
-	((a)->s_addr == htonl_constant(INADDR_ANY))
-#define IN4_IS_ADDR_BROADCAST(a) \
-	((a)->s_addr == htonl_constant(INADDR_BROADCAST))
-#define IN4_IS_ADDR_LOOPBACK(a) \
-	(ntohl((a)->s_addr) >> IN_CLASSA_NSHIFT == IN_LOOPBACKNET)
-#define IN4_IS_ADDR_MULTICAST(a) \
-	(IN_MULTICAST(ntohl((a)->s_addr)))
-#define IN4_ARE_ADDR_EQUAL(a, b) \
-	(((struct in_addr *)(a))->s_addr == ((struct in_addr *)b)->s_addr)
-#define IN4ADDR_LOOPBACK_INIT \
-	{ .s_addr	= htonl_constant(INADDR_LOOPBACK) }
-#define IN4ADDR_ANY_INIT \
-	{ .s_addr	= htonl_constant(INADDR_ANY) }
-
-
 #define NS_FN_STACK_SIZE	(RLIMIT_STACK_VAL * 1024 / 8)
 int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags,
 	     void *arg);
@@ -138,34 +122,6 @@ int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags,
 			 (void *)(arg));				\
 	} while (0)
 
-#define L2_BUF_IP4_INIT(proto)						\
-	{								\
-		.version	= 4,					\
-		.ihl		= 5,					\
-		.tos		= 0,					\
-		.tot_len	= 0,					\
-		.id		= 0,					\
-		.frag_off	= 0,					\
-		.ttl		= 0xff,					\
-		.protocol	= (proto),				\
-		.saddr		= 0,					\
-		.daddr		= 0,					\
-	}
-#define L2_BUF_IP4_PSUM(proto)	((uint32_t)htons_constant(0x4500) +	\
-				 (uint32_t)htons_constant(0xff00 | (proto)))
-
-#define L2_BUF_IP6_INIT(proto)						\
-	{								\
-		.priority	= 0,					\
-		.version	= 6,					\
-		.flow_lbl	= { 0 },				\
-		.payload_len	= 0,					\
-		.nexthdr	= (proto),				\
-		.hop_limit	= 255,					\
-		.saddr		= IN6ADDR_ANY_INIT,			\
-		.daddr		= IN6ADDR_ANY_INIT,			\
-	}
-
 #define RCVBUF_BIG		(2UL * 1024 * 1024)
 #define SNDBUF_BIG		(4UL * 1024 * 1024)
 #define SNDBUF_SMALL		(128UL * 1024)
@@ -173,45 +129,13 @@ int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags,
 #include <net/if.h>
 #include <limits.h>
 #include <stdint.h>
-#include <netinet/ip6.h>
 
 #include "packet.h"
 
 struct ctx;
 
-struct ipv6hdr {
-#pragma GCC diagnostic push
-#pragma GCC diagnostic ignored "-Wpedantic"
-#if __BYTE_ORDER == __BIG_ENDIAN
-	uint8_t			version:4,
-				priority:4;
-#else
-	uint8_t			priority:4,
-				version:4;
-#endif
-#pragma GCC diagnostic pop
-	uint8_t			flow_lbl[3];
-
-	uint16_t		payload_len;
-	uint8_t			nexthdr;
-	uint8_t			hop_limit;
-
-	struct in6_addr		saddr;
-	struct in6_addr		daddr;
-};
-
-struct ipv6_opt_hdr {
-	uint8_t			nexthdr;
-	uint8_t			hdrlen;
-	/*
-	 * TLV encoded option data follows.
-	 */
-} __attribute__((packed));	/* required for some archs */
-
 /* cppcheck-suppress funcArgNamesDifferent */
 __attribute__ ((weak)) int ffsl(long int i) { return __builtin_ffsl(i); }
-char *ipv6_l4hdr(const struct pool *p, int idx, size_t offset, uint8_t *proto,
-		 size_t *dlen);
 int sock_l4(const struct ctx *c, int af, uint8_t proto,
 	    const void *bind_addr, const char *ifname, uint16_t port,
 	    uint32_t data);
-- 
2.42.0


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2 6/8] checksum: use csum_ip4_header() in udp.c and tcp.c
  2024-02-14  8:56 [PATCH v2 0/8] Add vhost-user support to passt (part 1) Laurent Vivier
                   ` (4 preceding siblings ...)
  2024-02-14  8:56 ` [PATCH v2 5/8] util: move IP stuff from util.[ch] to ip.[ch] Laurent Vivier
@ 2024-02-14  8:56 ` Laurent Vivier
  2024-02-15  2:51   ` David Gibson
  2024-02-16  9:08   ` Stefano Brivio
  2024-02-14  8:56 ` [PATCH v2 7/8] checksum: introduce functions to compute the header part checksum for TCP/UDP Laurent Vivier
  2024-02-14  8:56 ` [PATCH v2 8/8] tap: make tap_update_mac() generic Laurent Vivier
  7 siblings, 2 replies; 28+ messages in thread
From: Laurent Vivier @ 2024-02-14  8:56 UTC (permalink / raw)
  To: passt-dev; +Cc: Laurent Vivier

We can find the same function to compute the IPv4 header
checksum in tcp.c, udp.c and tap.c

Use the function defined for tap.c, csum_ip4_header(), but
with the code used in tcp.c and udp.c as it doesn't need a fully
initialiazed IPv4 header, only protocol, tot_len, saddr and daddr.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---

Notes:
    v2:
      - use csum_ip4_header() from checksum.c
      - use code from tcp.c and udp.c in csum_ip4_header()
      - use "const struct iphfr *", check is not updated by the
        function but by the caller.

 checksum.c | 16 ++++++++++++----
 checksum.h |  2 +-
 tap.c      |  2 +-
 tcp.c      | 22 +---------------------
 udp.c      | 23 +++++------------------
 5 files changed, 20 insertions(+), 45 deletions(-)

diff --git a/checksum.c b/checksum.c
index ac2bc49f7eb0..5613187a1c82 100644
--- a/checksum.c
+++ b/checksum.c
@@ -57,6 +57,7 @@
 #include <linux/icmpv6.h>
 
 #include "util.h"
+#include "ip.h"
 
 /* Checksums are optional for UDP over IPv4, so we usually just set
  * them to 0.  Change this to 1 to calculate real UDP over IPv4
@@ -115,13 +116,20 @@ uint16_t csum_fold(uint32_t sum)
 uint16_t csum(const void *buf, size_t len, uint32_t init);
 
 /**
- * csum_ip4_header() - Calculate and set IPv4 header checksum
+ * csum_ip4_header() - Calculate IPv4 header checksum
  * @ip4h:	IPv4 header
  */
-void csum_ip4_header(struct iphdr *ip4h)
+uint16_t csum_ip4_header(const struct iphdr *ip4h)
 {
-	ip4h->check = 0;
-	ip4h->check = csum(ip4h, (size_t)ip4h->ihl * 4, 0);
+	uint32_t sum = L2_BUF_IP4_PSUM(ip4h->protocol);
+
+	sum += ip4h->tot_len;
+	sum += (ip4h->saddr >> 16) & 0xffff;
+	sum += ip4h->saddr & 0xffff;
+	sum += (ip4h->daddr >> 16) & 0xffff;
+	sum += ip4h->daddr & 0xffff;
+
+	return ~csum_fold(sum);
 }
 
 /**
diff --git a/checksum.h b/checksum.h
index 6a20297a5826..b87ecd720df5 100644
--- a/checksum.h
+++ b/checksum.h
@@ -13,7 +13,7 @@ struct icmp6hdr;
 uint32_t sum_16b(const void *buf, size_t len);
 uint16_t csum_fold(uint32_t sum);
 uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init);
-void csum_ip4_header(struct iphdr *ip4h);
+uint16_t csum_ip4_header(const struct iphdr *ip4h);
 void csum_udp4(struct udphdr *udp4hr,
 	       struct in_addr saddr, struct in_addr daddr,
 	       const void *payload, size_t len);
diff --git a/tap.c b/tap.c
index 3ea03f720d6d..70f36a55314f 100644
--- a/tap.c
+++ b/tap.c
@@ -160,7 +160,7 @@ static void *tap_push_ip4h(char *buf, struct in_addr src, struct in_addr dst,
 	ip4h->protocol = proto;
 	ip4h->saddr = src.s_addr;
 	ip4h->daddr = dst.s_addr;
-	csum_ip4_header(ip4h);
+	ip4h->check = csum_ip4_header(ip4h);
 	return ip4h + 1;
 }
 
diff --git a/tcp.c b/tcp.c
index 45ef5146729a..35e240f4ffc3 100644
--- a/tcp.c
+++ b/tcp.c
@@ -934,23 +934,6 @@ static void tcp_sock_set_bufsize(const struct ctx *c, int s)
 		trace("TCP: failed to set SO_SNDBUF to %i", v);
 }
 
-/**
- * tcp_update_check_ip4() - Update IPv4 with variable parts from stored one
- * @buf:	L2 packet buffer with final IPv4 header
- */
-static void tcp_update_check_ip4(struct tcp4_l2_buf_t *buf)
-{
-	uint32_t sum = L2_BUF_IP4_PSUM(IPPROTO_TCP);
-
-	sum += buf->iph.tot_len;
-	sum += (buf->iph.saddr >> 16) & 0xffff;
-	sum += buf->iph.saddr & 0xffff;
-	sum += (buf->iph.daddr >> 16) & 0xffff;
-	sum += buf->iph.daddr & 0xffff;
-
-	buf->iph.check = (uint16_t)~csum_fold(sum);
-}
-
 /**
  * tcp_update_check_tcp4() - Update TCP checksum from stored one
  * @buf:	L2 packet buffer with final IPv4 header
@@ -1393,10 +1376,7 @@ do {									\
 		b->iph.saddr = a4->s_addr;
 		b->iph.daddr = c->ip4.addr_seen.s_addr;
 
-		if (check)
-			b->iph.check = *check;
-		else
-			tcp_update_check_ip4(b);
+		b->iph.check = check ? *check : csum_ip4_header(&b->iph);
 
 		SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq);
 
diff --git a/udp.c b/udp.c
index d514c864ab5b..e645c800a823 100644
--- a/udp.c
+++ b/udp.c
@@ -270,23 +270,6 @@ static void udp_invert_portmap(struct udp_port_fwd *fwd)
 	}
 }
 
-/**
- * udp_update_check4() - Update checksum with variable parts from stored one
- * @buf:	L2 packet buffer with final IPv4 header
- */
-static void udp_update_check4(struct udp4_l2_buf_t *buf)
-{
-	uint32_t sum = L2_BUF_IP4_PSUM(IPPROTO_UDP);
-
-	sum += buf->iph.tot_len;
-	sum += (buf->iph.saddr >> 16) & 0xffff;
-	sum += buf->iph.saddr & 0xffff;
-	sum += (buf->iph.daddr >> 16) & 0xffff;
-	sum += buf->iph.daddr & 0xffff;
-
-	buf->iph.check = (uint16_t)~csum_fold(sum);
-}
-
 /**
  * udp_update_l2_buf() - Update L2 buffers with Ethernet and IPv4 addresses
  * @eth_d:	Ethernet destination address, NULL if unchanged
@@ -579,6 +562,9 @@ static void udp_splice_sendfrom(const struct ctx *c, unsigned start, unsigned n,
  *
  * Return: size of tap frame with headers
  */
+#pragma GCC diagnostic push
+/* ignore unaligned pointer value warning for &b->iph */
+#pragma GCC diagnostic ignored "-Waddress-of-packed-member"
 static size_t udp_update_hdr4(const struct ctx *c, int n, in_port_t dstport,
 			      const struct timespec *now)
 {
@@ -614,13 +600,14 @@ static size_t udp_update_hdr4(const struct ctx *c, int n, in_port_t dstport,
 		b->iph.saddr = b->s_in.sin_addr.s_addr;
 	}
 
-	udp_update_check4(b);
+	b->iph.check = csum_ip4_header(&b->iph);
 	b->uh.source = b->s_in.sin_port;
 	b->uh.dest = htons(dstport);
 	b->uh.len = htons(udp4_l2_mh_sock[n].msg_len + sizeof(b->uh));
 
 	return tap_iov_len(c, &b->taph, ip_len);
 }
+#pragma GCC diagnostic pop
 
 /**
  * udp_update_hdr6() - Update headers for one IPv6 datagram
-- 
2.42.0


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2 7/8] checksum: introduce functions to compute the header part checksum for TCP/UDP
  2024-02-14  8:56 [PATCH v2 0/8] Add vhost-user support to passt (part 1) Laurent Vivier
                   ` (5 preceding siblings ...)
  2024-02-14  8:56 ` [PATCH v2 6/8] checksum: use csum_ip4_header() in udp.c and tcp.c Laurent Vivier
@ 2024-02-14  8:56 ` Laurent Vivier
  2024-02-15  3:12   ` David Gibson
  2024-02-14  8:56 ` [PATCH v2 8/8] tap: make tap_update_mac() generic Laurent Vivier
  7 siblings, 1 reply; 28+ messages in thread
From: Laurent Vivier @ 2024-02-14  8:56 UTC (permalink / raw)
  To: passt-dev; +Cc: Laurent Vivier

The TCP and UDP checksums are computed using the data in the TCP/UDP
payload but also some informations in the IP header (protocol,
length, source and destination addresses).

We add two functions, proto_ipv4_header_psum() and
proto_ipv6_header_psum(), to compute the checksum of the IP
header part.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---

Notes:
    v2:
      - move new function to checksum.c
      - use _psum rather than _checksum in the name
      - replace csum_udp4() and csum_udp6() by the new function

 checksum.c | 70 ++++++++++++++++++++----------------------------------
 checksum.h | 11 ++++-----
 tap.c      | 19 +++++++++++++--
 tcp.c      | 42 +++++++++++++-------------------
 udp.c      | 11 +++++----
 5 files changed, 72 insertions(+), 81 deletions(-)

diff --git a/checksum.c b/checksum.c
index 5613187a1c82..90dad96ee2c1 100644
--- a/checksum.c
+++ b/checksum.c
@@ -59,12 +59,6 @@
 #include "util.h"
 #include "ip.h"
 
-/* Checksums are optional for UDP over IPv4, so we usually just set
- * them to 0.  Change this to 1 to calculate real UDP over IPv4
- * checksums
- */
-#define UDP4_REAL_CHECKSUMS	0
-
 /**
  * sum_16b() - Calculate sum of 16-bit words
  * @buf:	Input buffer
@@ -133,31 +127,23 @@ uint16_t csum_ip4_header(const struct iphdr *ip4h)
 }
 
 /**
- * csum_udp4() - Calculate and set checksum for a UDP over IPv4 packet
- * @udp4hr:	UDP header, initialised apart from checksum
- * @saddr:	IPv4 source address
- * @daddr:	IPv4 destination address
- * @payload:	ICMPv4 packet payload
- * @len:	Length of @payload (not including UDP)
+ * proto_ipv4_header_psum() - Calculates the partial checksum of an
+ * 			      IPv4 header for UDP or TCP
+ * @param:	ip4h Pointer to the IPv4 header structure
+ * @proto:	proto Protocol number
+ * Returns:	Partial checksum of the IPv4 header
  */
-void csum_udp4(struct udphdr *udp4hr,
-	       struct in_addr saddr, struct in_addr daddr,
-	       const void *payload, size_t len)
+uint32_t proto_ipv4_header_psum(struct iphdr *ip4h, uint8_t proto)
 {
-	/* UDP checksums are optional, so don't bother */
-	udp4hr->check = 0;
-
-	if (UDP4_REAL_CHECKSUMS) {
-		/* UNTESTED: if we did want real UDPv4 checksums, this
-		 * is roughly what we'd need */
-		uint32_t psum = csum_fold(saddr.s_addr)
-			+ csum_fold(daddr.s_addr)
-			+ htons(len + sizeof(*udp4hr))
-			+ htons(IPPROTO_UDP);
-		/* Add in partial checksum for the UDP header alone */
-		psum += sum_16b(udp4hr, sizeof(*udp4hr));
-		udp4hr->check = csum(payload, len, psum);
-	}
+	uint32_t sum = htons(proto);
+
+	sum += (ip4h->saddr >> 16) & 0xffff;
+	sum += ip4h->saddr & 0xffff;
+	sum += (ip4h->daddr >> 16) & 0xffff;
+	sum += ip4h->daddr & 0xffff;
+	sum += htons(ntohs(ip4h->tot_len) - 20);
+
+	return sum;
 }
 
 /**
@@ -179,24 +165,20 @@ void csum_icmp4(struct icmphdr *icmp4hr, const void *payload, size_t len)
 }
 
 /**
- * csum_udp6() - Calculate and set checksum for a UDP over IPv6 packet
- * @udp6hr:	UDP header, initialised apart from checksum
- * @payload:	UDP packet payload
- * @len:	Length of @payload (not including UDP header)
+ * proto_ipv6_header_psum() - Calculates the partial checksum of an
+ * 			      IPv6 header for UDP or TCP
+ * @param:	ip6h Pointer to the IPv4 header structure
+ * @proto:	proto Protocol number
+ * Returns:	Partial checksum of the IPv6 header
  */
-void csum_udp6(struct udphdr *udp6hr,
-	       const struct in6_addr *saddr, const struct in6_addr *daddr,
-	       const void *payload, size_t len)
+uint32_t proto_ipv6_header_psum(struct ipv6hdr *ip6h, uint8_t proto)
 {
-	/* Partial checksum for the pseudo-IPv6 header */
-	uint32_t psum = sum_16b(saddr, sizeof(*saddr)) +
-		        sum_16b(daddr, sizeof(*daddr)) +
-		        htons(len + sizeof(*udp6hr)) + htons(IPPROTO_UDP);
+	uint32_t sum = htons(proto) + ip6h->payload_len;
+
+	sum += sum_16b(&ip6h->saddr, sizeof(ip6h->saddr));
+	sum += sum_16b(&ip6h->daddr, sizeof(ip6h->daddr));
 
-	udp6hr->check = 0;
-	/* Add in partial checksum for the UDP header alone */
-	psum += sum_16b(udp6hr, sizeof(*udp6hr));
-	udp6hr->check = csum(payload, len, psum);
+	return sum;
 }
 
 /**
diff --git a/checksum.h b/checksum.h
index b87ecd720df5..10533f708853 100644
--- a/checksum.h
+++ b/checksum.h
@@ -6,24 +6,23 @@
 #ifndef CHECKSUM_H
 #define CHECKSUM_H
 
+struct iphdr;
 struct udphdr;
 struct icmphdr;
+struct ipv6hdr;
 struct icmp6hdr;
 
 uint32_t sum_16b(const void *buf, size_t len);
 uint16_t csum_fold(uint32_t sum);
 uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init);
 uint16_t csum_ip4_header(const struct iphdr *ip4h);
-void csum_udp4(struct udphdr *udp4hr,
-	       struct in_addr saddr, struct in_addr daddr,
-	       const void *payload, size_t len);
+uint32_t proto_ipv4_header_psum(struct iphdr *ip4h, uint8_t proto);
 void csum_icmp4(struct icmphdr *ih, const void *payload, size_t len);
-void csum_udp6(struct udphdr *udp6hr,
-	       const struct in6_addr *saddr, const struct in6_addr *daddr,
-	       const void *payload, size_t len);
+uint32_t proto_ipv6_header_psum(struct ipv6hdr *ip6h, uint8_t proto);
 void csum_icmp6(struct icmp6hdr *icmp6hr,
 		const struct in6_addr *saddr, const struct in6_addr *daddr,
 		const void *payload, size_t len);
+uint32_t csum_unfolded(const void *buf, size_t len, uint32_t init);
 uint16_t csum(const void *buf, size_t len, uint32_t init);
 uint16_t csum_iov(struct iovec *iov, unsigned int n, uint32_t init);
 
diff --git a/tap.c b/tap.c
index 70f36a55314f..02b51100d089 100644
--- a/tap.c
+++ b/tap.c
@@ -58,6 +58,12 @@
 #include "tap.h"
 #include "log.h"
 
+/* Checksums are optional for UDP over IPv4, so we usually just set
+ * them to 0.  Change this to 1 to calculate real UDP over IPv4
+ * checksums
+ */
+#define UDP4_REAL_CHECKSUMS     0
+
 /* IPv4 (plus ARP) and IPv6 message batches from tap/guest to IP handlers */
 static PACKET_POOL_NOINIT(pool_tap4, TAP_MSGS, pkt_buf);
 static PACKET_POOL_NOINIT(pool_tap6, TAP_MSGS, pkt_buf);
@@ -188,7 +194,12 @@ void tap_udp4_send(const struct ctx *c, struct in_addr src, in_port_t sport,
 	uh->source = htons(sport);
 	uh->dest = htons(dport);
 	uh->len = htons(udplen);
-	csum_udp4(uh, src, dst, in, len);
+	uh->check = 0;
+	if (UDP4_REAL_CHECKSUMS) {
+		uint32_t sum = proto_ipv4_header_psum(ip4h, IPPROTO_UDP);
+		sum = csum_unfolded(uh, sizeof(struct udphdr), sum);
+		uh->check = csum(in, len, sum);
+	}
 	memcpy(data, in, len);
 
 	if (tap_send(c, buf, len + (data - buf)) < 0)
@@ -271,11 +282,15 @@ void tap_udp6_send(const struct ctx *c,
 	void *uhp = tap_push_ip6h(ip6h, src, dst, udplen, IPPROTO_UDP, flow);
 	struct udphdr *uh = (struct udphdr *)uhp;
 	char *data = (char *)(uh + 1);
+	uint32_t sum;
 
 	uh->source = htons(sport);
 	uh->dest = htons(dport);
 	uh->len = htons(udplen);
-	csum_udp6(uh, src, dst, in, len);
+	uh->check = 0;
+	sum = proto_ipv6_header_psum(ip6h, IPPROTO_UDP);
+	sum = csum_unfolded(uh, sizeof(struct udphdr), sum);
+	uh->check = csum(in, len, sum);
 	memcpy(data, in, len);
 
 	if (tap_send(c, buf, len + (data - buf)) < 1)
diff --git a/tcp.c b/tcp.c
index 35e240f4ffc3..6a0020f708c0 100644
--- a/tcp.c
+++ b/tcp.c
@@ -938,39 +938,25 @@ static void tcp_sock_set_bufsize(const struct ctx *c, int s)
  * tcp_update_check_tcp4() - Update TCP checksum from stored one
  * @buf:	L2 packet buffer with final IPv4 header
  */
-static void tcp_update_check_tcp4(struct tcp4_l2_buf_t *buf)
+static uint16_t tcp_update_check_tcp4(struct iphdr *iph)
 {
-	uint16_t tlen = ntohs(buf->iph.tot_len) - 20;
-	uint32_t sum = htons(IPPROTO_TCP);
+	struct tcphdr *th = (struct tcphdr *)(iph + 1);
+	uint16_t tlen = ntohs(iph->tot_len) - 20;
+	uint32_t sum = proto_ipv4_header_psum(iph, IPPROTO_TCP);
 
-	sum += (buf->iph.saddr >> 16) & 0xffff;
-	sum += buf->iph.saddr & 0xffff;
-	sum += (buf->iph.daddr >> 16) & 0xffff;
-	sum += buf->iph.daddr & 0xffff;
-	sum += htons(ntohs(buf->iph.tot_len) - 20);
-
-	buf->th.check = 0;
-	buf->th.check = csum(&buf->th, tlen, sum);
+	return csum(th, tlen, sum);
 }
 
 /**
  * tcp_update_check_tcp6() - Calculate TCP checksum for IPv6
  * @buf:	L2 packet buffer with final IPv6 header
  */
-static void tcp_update_check_tcp6(struct tcp6_l2_buf_t *buf)
+static uint16_t tcp_update_check_tcp6(struct ipv6hdr *ip6h)
 {
-	int len = ntohs(buf->ip6h.payload_len) + sizeof(struct ipv6hdr);
-
-	buf->ip6h.hop_limit = IPPROTO_TCP;
-	buf->ip6h.version = 0;
-	buf->ip6h.nexthdr = 0;
+	struct tcphdr *th = (struct tcphdr *)(ip6h + 1);
+	uint32_t sum = proto_ipv6_header_psum(ip6h, IPPROTO_TCP);
 
-	buf->th.check = 0;
-	buf->th.check = csum(&buf->ip6h, len, 0);
-
-	buf->ip6h.hop_limit = 255;
-	buf->ip6h.version = 6;
-	buf->ip6h.nexthdr = IPPROTO_TCP;
+	return csum(th, ntohs(ip6h->payload_len), sum);
 }
 
 /**
@@ -1380,7 +1366,8 @@ do {									\
 
 		SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq);
 
-		tcp_update_check_tcp4(b);
+		b->th.check = 0;
+		b->th.check = tcp_update_check_tcp4(&b->iph);
 
 		tlen = tap_iov_len(c, &b->taph, ip_len);
 	} else {
@@ -1399,7 +1386,12 @@ do {									\
 
 		SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq);
 
-		tcp_update_check_tcp6(b);
+		b->th.check = 0;
+		b->th.check = tcp_update_check_tcp6(&b->ip6h);
+
+		b->ip6h.hop_limit = 255;
+		b->ip6h.version = 6;
+		b->ip6h.nexthdr = IPPROTO_TCP;
 
 		b->ip6h.flow_lbl[0] = (conn->sock >> 16) & 0xf;
 		b->ip6h.flow_lbl[1] = (conn->sock >> 8) & 0xff;
diff --git a/udp.c b/udp.c
index e645c800a823..bf24288d5751 100644
--- a/udp.c
+++ b/udp.c
@@ -618,6 +618,9 @@ static size_t udp_update_hdr4(const struct ctx *c, int n, in_port_t dstport,
  *
  * Return: size of tap frame with headers
  */
+#pragma GCC diagnostic push
+/* ignore unaligned pointer value warning for &b->ip6h */
+#pragma GCC diagnostic ignored "-Waddress-of-packed-member"
 static size_t udp_update_hdr6(const struct ctx *c, int n, in_port_t dstport,
 			      const struct timespec *now)
 {
@@ -673,16 +676,16 @@ static size_t udp_update_hdr6(const struct ctx *c, int n, in_port_t dstport,
 	b->uh.source = b->s_in6.sin6_port;
 	b->uh.dest = htons(dstport);
 	b->uh.len = b->ip6h.payload_len;
-
-	b->ip6h.hop_limit = IPPROTO_UDP;
-	b->ip6h.version = b->ip6h.nexthdr = b->uh.check = 0;
-	b->uh.check = csum(&b->ip6h, ip_len, 0);
+	b->uh.check = 0;
+	b->uh.check = csum(&b->uh, ntohs(b->ip6h.payload_len),
+			   proto_ipv6_header_psum(&b->ip6h, IPPROTO_UDP));
 	b->ip6h.version = 6;
 	b->ip6h.nexthdr = IPPROTO_UDP;
 	b->ip6h.hop_limit = 255;
 
 	return tap_iov_len(c, &b->taph, ip_len);
 }
+#pragma GCC diagnostic pop
 
 /**
  * udp_tap_send() - Prepare UDP datagrams and send to tap interface
-- 
2.42.0


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2 8/8] tap: make tap_update_mac() generic
  2024-02-14  8:56 [PATCH v2 0/8] Add vhost-user support to passt (part 1) Laurent Vivier
                   ` (6 preceding siblings ...)
  2024-02-14  8:56 ` [PATCH v2 7/8] checksum: introduce functions to compute the header part checksum for TCP/UDP Laurent Vivier
@ 2024-02-14  8:56 ` Laurent Vivier
  2024-02-15  3:13   ` David Gibson
  7 siblings, 1 reply; 28+ messages in thread
From: Laurent Vivier @ 2024-02-14  8:56 UTC (permalink / raw)
  To: passt-dev; +Cc: Laurent Vivier

Use ethhdr rather than tap_hdr.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---

Notes:
    v2:
      - update function comment
      - move the patch earlier in the series

 tap.c | 10 +++++-----
 tap.h |  2 +-
 tcp.c |  8 ++++----
 udp.c |  4 ++--
 4 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/tap.c b/tap.c
index 02b51100d089..9ffb0f0a88d4 100644
--- a/tap.c
+++ b/tap.c
@@ -457,18 +457,18 @@ size_t tap_send_frames(const struct ctx *c, const struct iovec *iov, size_t n)
 }
 
 /**
- * tap_update_mac() - Update tap L2 header with new Ethernet addresses
- * @taph:	Tap headers to update
+ * eth_update_mac() - Update tap L2 header with new Ethernet addresses
+ * @eh:		Ethernet headers to update
  * @eth_d:	Ethernet destination address, NULL if unchanged
  * @eth_s:	Ethernet source address, NULL if unchanged
  */
-void tap_update_mac(struct tap_hdr *taph,
+void eth_update_mac(struct ethhdr *eh,
 		    const unsigned char *eth_d, const unsigned char *eth_s)
 {
 	if (eth_d)
-		memcpy(taph->eh.h_dest, eth_d, sizeof(taph->eh.h_dest));
+		memcpy(eh->h_dest, eth_d, sizeof(eh->h_dest));
 	if (eth_s)
-		memcpy(taph->eh.h_source, eth_s, sizeof(taph->eh.h_source));
+		memcpy(eh->h_source, eth_s, sizeof(eh->h_source));
 }
 
 PACKET_POOL_DECL(pool_l4, UIO_MAXIOV, pkt_buf);
diff --git a/tap.h b/tap.h
index 466d91466c3d..437b9aa2b43f 100644
--- a/tap.h
+++ b/tap.h
@@ -74,7 +74,7 @@ void tap_icmp6_send(const struct ctx *c,
 		    const void *in, size_t len);
 int tap_send(const struct ctx *c, const void *data, size_t len);
 size_t tap_send_frames(const struct ctx *c, const struct iovec *iov, size_t n);
-void tap_update_mac(struct tap_hdr *taph,
+void eth_update_mac(struct ethhdr *eh,
 		    const unsigned char *eth_d, const unsigned char *eth_s);
 void tap_listen_handler(struct ctx *c, uint32_t events);
 void tap_handler_pasta(struct ctx *c, uint32_t events,
diff --git a/tcp.c b/tcp.c
index 6a0020f708c0..1c80299111f3 100644
--- a/tcp.c
+++ b/tcp.c
@@ -974,10 +974,10 @@ void tcp_update_l2_buf(const unsigned char *eth_d, const unsigned char *eth_s)
 		struct tcp4_l2_buf_t *b4 = &tcp4_l2_buf[i];
 		struct tcp6_l2_buf_t *b6 = &tcp6_l2_buf[i];
 
-		tap_update_mac(&b4->taph, eth_d, eth_s);
-		tap_update_mac(&b6->taph, eth_d, eth_s);
-		tap_update_mac(&b4f->taph, eth_d, eth_s);
-		tap_update_mac(&b6f->taph, eth_d, eth_s);
+		eth_update_mac(&b4->taph.eh, eth_d, eth_s);
+		eth_update_mac(&b6->taph.eh, eth_d, eth_s);
+		eth_update_mac(&b4f->taph.eh, eth_d, eth_s);
+		eth_update_mac(&b6f->taph.eh, eth_d, eth_s);
 	}
 }
 
diff --git a/udp.c b/udp.c
index bf24288d5751..97c1292f6b59 100644
--- a/udp.c
+++ b/udp.c
@@ -283,8 +283,8 @@ void udp_update_l2_buf(const unsigned char *eth_d, const unsigned char *eth_s)
 		struct udp4_l2_buf_t *b4 = &udp4_l2_buf[i];
 		struct udp6_l2_buf_t *b6 = &udp6_l2_buf[i];
 
-		tap_update_mac(&b4->taph, eth_d, eth_s);
-		tap_update_mac(&b6->taph, eth_d, eth_s);
+		eth_update_mac(&b4->taph.eh, eth_d, eth_s);
+		eth_update_mac(&b6->taph.eh, eth_d, eth_s);
 	}
 }
 
-- 
2.42.0


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/8] iov: add some functions to manage iovec
  2024-02-14  8:56 ` [PATCH v2 1/8] iov: add some functions to manage iovec Laurent Vivier
@ 2024-02-15  0:24   ` David Gibson
  2024-02-15  0:32     ` David Gibson
  2024-02-16  5:29     ` Stefano Brivio
  0 siblings, 2 replies; 28+ messages in thread
From: David Gibson @ 2024-02-15  0:24 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 12906 bytes --]

On Wed, Feb 14, 2024 at 09:56:21AM +0100, Laurent Vivier wrote:
> Introduce functions to copy to/from a buffer from/to an iovec array,
> to compute data length in in bytes of an iovec and to copy memory from
> an iovec to another.
> 
> iov_from_buf(), iov_to_buf(), iov_size(), iov_copy().
> 
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> ---
> 
> Notes:
>     v2:
>        - reorder added files in alphanetical order in Makefile
>        - update comments, cosmetic cleanup
>        - rename iov_from_buf_full/iov_to_buf_full to
>          iov_fill_from_buf/iov_fill_to_buf
>        - split loops that manage offset and bytes copy.
>        - move iov_from_buf()/iov_to_buf() to iov.c
> 
>  Makefile |   8 +--
>  iov.c    | 212 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  iov.h    |  43 +++++++++++
>  3 files changed, 259 insertions(+), 4 deletions(-)
>  create mode 100644 iov.c
>  create mode 100644 iov.h
> 
> diff --git a/Makefile b/Makefile
> index af4fa87e7e13..156398b3844e 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -45,16 +45,16 @@ FLAGS += -DVERSION=\"$(VERSION)\"
>  FLAGS += -DDUAL_STACK_SOCKETS=$(DUAL_STACK_SOCKETS)
>  
>  PASST_SRCS = arch.c arp.c checksum.c conf.c dhcp.c dhcpv6.c flow.c icmp.c \
> -	igmp.c isolation.c lineread.c log.c mld.c ndp.c netlink.c packet.c \
> -	passt.c pasta.c pcap.c pif.c port_fwd.c tap.c tcp.c tcp_splice.c udp.c \
> -	util.c
> +	igmp.c iov.c isolation.c lineread.c log.c mld.c ndp.c netlink.c \
> +	packet.c passt.c pasta.c pcap.c pif.c port_fwd.c tap.c tcp.c \
> +	tcp_splice.c udp.c util.c
>  QRAP_SRCS = qrap.c
>  SRCS = $(PASST_SRCS) $(QRAP_SRCS)
>  
>  MANPAGES = passt.1 pasta.1 qrap.1
>  
>  PASST_HEADERS = arch.h arp.h checksum.h conf.h dhcp.h dhcpv6.h flow.h \
> -	flow_table.h icmp.h inany.h isolation.h lineread.h log.h ndp.h \
> +	flow_table.h icmp.h inany.h iov.h isolation.h lineread.h log.h ndp.h \
>  	netlink.h packet.h passt.h pasta.h pcap.h pif.h port_fwd.h siphash.h \
>  	tap.h tcp.h tcp_conn.h tcp_splice.h udp.h util.h
>  HEADERS = $(PASST_HEADERS) seccomp.h
> diff --git a/iov.c b/iov.c
> new file mode 100644
> index 000000000000..73dd5cf25d0d
> --- /dev/null
> +++ b/iov.c
> @@ -0,0 +1,212 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * iov.h - helpers for using (partial) iovecs.
> + *
> + * Copyrigh (c) 2024 Red Hat

Typo: s/Copyrigh/Copyright/

AIUI, the "(c) 2024" has no real purpose, see
https://source.redhat.com/departments/legal/redhatintellectualproperty/intellectual_property_and_ip_litigation_wiki/copyright_notices_in_source_code

> + * Author: Laurent Vivier <lvivier@redhat.com>
> + *
> + * This file also contains code originally from QEMU include/qemu/iov.h
> + * and licensed under the following terms:
> + *
> + * Copyright (C) 2010 Red Hat, Inc.
> + *
> + * Author(s):
> + *  Amit Shah <amit.shah@redhat.com>
> + *  Michael Tokarev <mjt@tls.msk.ru>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> + * the COPYING file in the top-level directory.
> + *
> + * Contributions after 2012-01-13 are licensed under the terms of the
> + * GNU GPL, version 2 or (at your option) any later version.

The description of the provenance of the code and its authorship is
useful.  I don't think the second copyright notice is useful in this
case, since it's also Red Hat, like the first.  Likewise, I don't
think the GPL invocation is useful, since we're not changing that
license.

> + */
> +#include <sys/socket.h>
> +
> +#include "util.h"
> +#include "iov.h"
> +
> +/**
> + * iov_from_buf - Copy data from a buffer to a scatter/gather
> + *                I/O vector (struct iovec) efficiently.
> + *
> + * @iov:       Pointer to the array of struct iovec describing the
> + *             scatter/gather I/O vector.

I feel like an IO vector is a common enough concept that we could just
say "IO vector" rather than this rather wordy description.

> + * @iov_cnt:   Number of elements in the iov array.
> + * @offset:    Byte offset in the iov array where copying should start.
> + * @buf:       Pointer to the source buffer containing the data to copy.
> + * @bytes:     Total number of bytes to copy from buf to iov.
> + *
> + * Returns:    The number of bytes successfully copied.
> + */
> +size_t iov_from_buf(const struct iovec *iov, unsigned int iov_cnt,
> +		    size_t offset, const void *buf, size_t bytes)
> +{
> +	if (__builtin_constant_p(bytes) && iov_cnt &&
> +		offset <= iov[0].iov_len && bytes <= iov[0].iov_len - offset) {
> +		memcpy((char *)iov[0].iov_base + offset, buf, bytes);
> +		return bytes;
> +	}
> +
> +	return iov_fill_from_buf(iov, iov_cnt, offset, buf, bytes);
> +}
> +
> +/**
> + * iov_to_buf - Copy data from a scatter/gather I/O vector (struct iovec) to
> + *		a buffer efficiently.
> + *
> + * @iov:       Pointer to the array of struct iovec describing the scatter/gather
> + *             I/O vector.
> + * @iov_cnt:   Number of elements in the iov array.
> + * @offset:    Offset within the first element of iov from where copying should start.
> + * @buf:       Pointer to the destination buffer where data will be copied.
> + * @bytes:     Total number of bytes to copy from iov to buf.
> + *
> + * Returns:    The number of bytes successfully copied.
> + */
> +size_t iov_to_buf(const struct iovec *iov, unsigned int iov_cnt,
> +		  size_t offset, void *buf, size_t bytes)
> +{
> +	if (__builtin_constant_p(bytes) && iov_cnt &&
> +		offset <= iov[0].iov_len && bytes <= iov[0].iov_len - offset) {
> +		memcpy(buf, (char *)iov[0].iov_base + offset, bytes);
> +		return bytes;
> +	}
> +
> +	return iov_fill_to_buf(iov, iov_cnt, offset, buf, bytes);
> +}
> +
> +/**
> + * iov_fill_from_buf - Copy data from a buffer to a scatter/gather
> + *                     I/O vector (struct iovec) until either all bytes
> + *                     are copied or all elements in the vector are filled.
> + *
> + * @iov:       Pointer to the array of struct iovec describing the scatter/gather
> + *             I/O vector.
> + * @iov_cnt:   Number of elements in the iov array.
> + * @offset:    Byte offset in the iov array where copying should start.
> + * @buf:       Pointer to the source buffer containing the data to copy.
> + * @bytes:     Total number of bytes to copy from buf to iov.
> + *
> + * Returns:    The total number of bytes successfully copied
> + *
> + */
> +size_t iov_fill_from_buf(const struct iovec *iov, unsigned int iov_cnt,
> +			 size_t offset, const void *buf, size_t bytes)

We could just open code this in iov_from_buf(), since I don't think we
ever have a reason to call it directly.

> +{
> +	unsigned int i;
> +	size_t copied;
> +
> +	/* skipping offset bytes in the iovec */
> +	for (i = 0; offset >= iov[i].iov_len && i < iov_cnt; i++)
> +		offset -= iov[i].iov_len;
> +
> +	/* copying data */
> +	for (copied = 0; copied < bytes && i < iov_cnt; i++) {
> +		size_t len = MIN(iov[i].iov_len - offset, bytes - copied);
> +
> +		memcpy((char *)iov[i].iov_base + offset, (char *)buf + copied,
> +		       len);
> +		copied += len;
> +		offset = 0;
> +	}
> +
> +	return copied;
> +}
> +
> +/**
> + * iov_fill_to_buf - Copy data from a scatter/gather I/O vector (struct iovec)
> + *                   to a buffer until either all bytes are copied or all
> + *                   elements in the vector are read.
> + *
> + * @iov:       Pointer to the array of struct iovec describing the
> + *             scatter/gather I/O vector.
> + * @iov_cnt:   Number of elements in the iov array.
> + * @offset:    Byte offset in the iov array where copying should start.
> + * @buf:       Pointer to the destination buffer where data will be copied.
> + * @bytes:     Total number of bytes to copy from iov to buf.
> + *
> + * Returns:    The total number of bytes successfully copied
> + */
> +size_t iov_fill_to_buf(const struct iovec *iov, unsigned int iov_cnt,
> +		       size_t offset, void *buf, size_t bytes)
> +{
> +	unsigned int i;
> +	size_t copied;
> +
> +	/* skipping offset bytes in the iovec */
> +	for (i = 0; offset >= iov[i].iov_len && i < iov_cnt; i++)
> +		offset -= iov[i].iov_len;
> +
> +	/* copying data */
> +	for (copied = 0; copied < bytes && i < iov_cnt; i++) {
> +		size_t len = MIN(iov[i].iov_len - offset, bytes - copied);
> +		memcpy((char *)buf + copied, (char *)iov[i].iov_base + offset,
> +		       len);
> +		copied += len;
> +		offset = 0;
> +	}
> +
> +	return copied;
> +}
> +
> +/**
> + * iov_size - Calculate the total size of a scatter/gather I/O vector
> + *            (struct iovec).
> + *
> + * @iov:       Pointer to the array of struct iovec describing the
> + *             scatter/gather I/O vector.
> + * @iov_cnt:   Number of elements in the iov array.
> + *
> + * Returns:    The total size in bytes.
> + */
> +size_t iov_size(const struct iovec *iov, const unsigned int iov_cnt)
> +{
> +	size_t len;
> +	unsigned int i;

Other order for these locals please (longest to shortest).

> +	for (i = 0, len = 0; i < iov_cnt; i++) {
> +		len += iov[i].iov_len;
> +	}

No braces here (passt style, again).

> +	return len;
> +}
> +
> +/**
> + * iov_copy - Copy data from one scatter/gather I/O vector (struct iovec) to
> + *            another.
> + *
> + * @dst_iov:      Pointer to the destination array of struct iovec describing
> + *                the scatter/gather I/O vector to copy to.
> + * @dst_iov_cnt:  Number of elements in the destination iov array.
> + * @iov:          Pointer to the source array of struct iovec describing
> + *                the scatter/gather I/O vector to copy from.
> + * @iov_cnt:      Number of elements in the source iov array.
> + * @offset:       Offset within the source iov from where copying should start.
> + * @bytes:        Total number of bytes to copy from iov to dst_iov.
> + *
> + * Returns:       The number of elements successfully copied to the destination
> + *                iov array.
> + */
> +unsigned iov_copy(struct iovec *dst_iov, unsigned int dst_iov_cnt,
> +		  const struct iovec *iov, unsigned int iov_cnt,
> +		  size_t offset, size_t bytes)
> +{
> +	unsigned int i, j;
> +	size_t len;
> +
> +	/* skipping offset bytes in the iovec */
> +	for (i = 0; offset >= iov[i].iov_len && i < iov_cnt; i++)
> +		offset -= iov[i].iov_len;
> +
> +	/* copying data */
> +	for (j = 0; i < iov_cnt && j < dst_iov_cnt && bytes; i++) {
> +		len = MIN(bytes, iov[i].iov_len - offset);
> +
> +		dst_iov[j].iov_base = (char *)iov[i].iov_base + offset;
> +		dst_iov[j].iov_len = len;
> +		j++;
> +		bytes -= len;
> +		offset = 0;
> +	}
> +
> +	return j;
> +}
> diff --git a/iov.h b/iov.h
> new file mode 100644
> index 000000000000..0153acca9e62
> --- /dev/null
> +++ b/iov.h
> @@ -0,0 +1,43 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * iov.c - helpers for using (partial) iovecs.
> + *
> + * Copyrigh (c) 2024 Red Hat

Same typo again.

> + * Author: Laurent Vivier <lvivier@redhat.com>
> + *
> + * This file also contains code originally from QEMU include/qemu/iov.h
> + * and licensed under the following terms:
> + *
> + * Copyright (C) 2010 Red Hat, Inc.
> + *
> + * Author(s):
> + *  Amit Shah <amit.shah@redhat.com>
> + *  Michael Tokarev <mjt@tls.msk.ru>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> + * the COPYING file in the top-level directory.
> + *
> + * Contributions after 2012-01-13 are licensed under the terms of the
> + * GNU GPL, version 2 or (at your option) any later version.
> + */
> +
> +#ifndef IOVEC_H
> +#define IOVEC_H
> +
> +#include <unistd.h>
> +#include <string.h>
> +
> +size_t iov_fill_from_buf(const struct iovec *iov, unsigned int iov_cnt,
> +			 size_t offset, const void *buf, size_t bytes);
> +size_t iov_fill_to_buf(const struct iovec *iov, const unsigned int iov_cnt,
> +		       size_t offset, void *buf, size_t bytes);
> +size_t iov_from_buf(const struct iovec *iov, unsigned int iov_cnt,
> +                    size_t offset, const void *buf, size_t bytes);
> +size_t iov_to_buf(const struct iovec *iov, unsigned int iov_cnt,
> +                  size_t offset, void *buf, size_t bytes);
> +
> +size_t iov_size(const struct iovec *iov, const unsigned int iov_cnt);
> +unsigned iov_copy(struct iovec *dst_iov, unsigned int dst_iov_cnt,
> +		  const struct iovec *iov, unsigned int iov_cnt,
> +		  size_t offset, size_t bytes);
> +#endif /* IOVEC_H */

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/8] iov: add some functions to manage iovec
  2024-02-15  0:24   ` David Gibson
@ 2024-02-15  0:32     ` David Gibson
  2024-02-16  5:29     ` Stefano Brivio
  1 sibling, 0 replies; 28+ messages in thread
From: David Gibson @ 2024-02-15  0:32 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 1152 bytes --]

On Thu, Feb 15, 2024 at 11:24:38AM +1100, David Gibson wrote:
> On Wed, Feb 14, 2024 at 09:56:21AM +0100, Laurent Vivier wrote:
> > Introduce functions to copy to/from a buffer from/to an iovec array,
> > to compute data length in in bytes of an iovec and to copy memory from
> > an iovec to another.
> > 
> > iov_from_buf(), iov_to_buf(), iov_size(), iov_copy().
> > 
> > Signed-off-by: Laurent Vivier <lvivier@redhat.com>

[snip]
> > +size_t iov_from_buf(const struct iovec *iov, unsigned int iov_cnt,
> > +		    size_t offset, const void *buf, size_t bytes)

One other thing I didn't think of on my first reply: although it
probably doesn't matter in practice, struct msghdr uses a size_t for
the length of the vector.  So, I think it makes sense for us to
standardise on that too.  To confuse matters, writev() uses a (signed)
int, but we work with recvmsg() etc. more than we do with writev() so
I think size_t is a better choice.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/8] pcap: add pcap_iov()
  2024-02-14  8:56 ` [PATCH v2 2/8] pcap: add pcap_iov() Laurent Vivier
@ 2024-02-15  0:35   ` David Gibson
  2024-02-16  5:30   ` Stefano Brivio
  1 sibling, 0 replies; 28+ messages in thread
From: David Gibson @ 2024-02-15  0:35 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 4093 bytes --]

On Wed, Feb 14, 2024 at 09:56:22AM +0100, Laurent Vivier wrote:

Some kind of commit message, please, even if it's minimal.

> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> ---
> 
> Notes:
>     v2:
>       - introduce pcap_header(), a common helper to write
>         packet header
>       - use writev() rather than write() in a loop
>       - add functions comment
> 
>  pcap.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++-------
>  pcap.h |  1 +
>  2 files changed, 55 insertions(+), 7 deletions(-)
> 
> diff --git a/pcap.c b/pcap.c
> index 501d52d4992b..3869a403dd0f 100644
> --- a/pcap.c
> +++ b/pcap.c
> @@ -20,6 +20,7 @@
>  #include <sys/time.h>
>  #include <sys/types.h>
>  #include <sys/stat.h>
> +#include <sys/uio.h>
>  #include <fcntl.h>
>  #include <time.h>
>  #include <errno.h>
> @@ -31,6 +32,7 @@
>  #include "util.h"
>  #include "passt.h"
>  #include "log.h"
> +#include "iov.h"
>  
>  #define PCAP_VERSION_MINOR 4
>  
> @@ -65,6 +67,28 @@ struct pcap_pkthdr {
>  	uint32_t len;
>  };
>  
> +/*
> + * pcap_header - Write a pcap packet header to the pcap file descriptor (pcap_fd).
> + *
> + * @len:       Length of the packet data.
> + * @tv:        Pointer to a timeval struct containing the timestamp for the packet.

Just "timestamp for packet" would suffice.

> + *
> + * Returns; -1 in case of error, otherwise, 0 to indicate success.
> + */
> +static int pcap_header(size_t len, const struct timeval *tv)
> +{
> +	struct pcap_pkthdr h;
> +
> +	h.tv_sec = tv->tv_sec;
> +	h.tv_usec = tv->tv_usec;
> +	h.caplen = h.len = len;
> +
> +	if (write(pcap_fd, &h, sizeof(h)) < 0)
> +		return -1;
> +
> +	return 0;
> +}
> +
>  /**
>   * pcap_frame() - Capture a single frame to pcap file with given timestamp
>   * @pkt:	Pointer to data buffer, including L2 headers
> @@ -75,13 +99,7 @@ struct pcap_pkthdr {
>   */
>  static int pcap_frame(const char *pkt, size_t len, const struct timeval *tv)
>  {
> -	struct pcap_pkthdr h;
> -
> -	h.tv_sec = tv->tv_sec;
> -	h.tv_usec = tv->tv_usec;
> -	h.caplen = h.len = len;
> -
> -	if (write(pcap_fd, &h, sizeof(h)) < 0 || write(pcap_fd, pkt, len) < 0)
> +	if (pcap_header(len, tv) < 0 || write(pcap_fd, pkt, len) < 0)
>  		return -errno;
>  
>  	return 0;
> @@ -130,6 +148,35 @@ void pcap_multiple(const struct iovec *iov, unsigned int n, size_t offset)
>  	}
>  }
>  
> +/*
> + * pcap_iov - Write packet data described by a scatter/gather I/O vector (iov)
> + *             to a pcap file descriptor (pcap_fd).
> + *
> + * @iov:       Pointer to the array of struct iovec describing the scatter/gather
> + *             I/O vector containing packet data to write, including L2 header
> + * @n:         Number of elements in the iov array.
> + */
> +void pcap_iov(const struct iovec *iov, unsigned int n)
> +{
> +	struct timeval tv;
> +	size_t len;
> +
> +	if (pcap_fd == -1)
> +		return;
> +
> +	gettimeofday(&tv, NULL);
> +
> +	len = iov_size(iov, n);
> +
> +	if (pcap_header(len, &tv) < 0) {
> +		debug("Cannot write pcap header");
> +		return;
> +	}
> +
> +	if (writev(pcap_fd, iov, n) < 0)
> +		debug("Cannot log packet using writev(), n = %u\n", n);

I'm not convinced the length of the io vector is particularly useful
here.  strerror(errno) might be more useful, although the existing
pcap() helpers also don't print that.

> +}
> +
>  /**
>   * pcap_init() - Initialise pcap file
>   * @c:		Execution context
> diff --git a/pcap.h b/pcap.h
> index da5a7e846b72..732a0ddf14cc 100644
> --- a/pcap.h
> +++ b/pcap.h
> @@ -8,6 +8,7 @@
>  
>  void pcap(const char *pkt, size_t len);
>  void pcap_multiple(const struct iovec *iov, unsigned int n, size_t offset);
> +void pcap_iov(const struct iovec *iov, unsigned int n);
>  void pcap_init(struct ctx *c);
>  
>  #endif /* PCAP_H */

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 3/8] checksum: align buffers
  2024-02-14  8:56 ` [PATCH v2 3/8] checksum: align buffers Laurent Vivier
@ 2024-02-15  0:40   ` David Gibson
  0 siblings, 0 replies; 28+ messages in thread
From: David Gibson @ 2024-02-15  0:40 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 5086 bytes --]

On Wed, Feb 14, 2024 at 09:56:23AM +0100, Laurent Vivier wrote:
> if buffer is not aligned use sum_16b() only on the not aligned

Nit: s/if/If/

> part, and then use csum_avx2() on the remaining part
> 
> Remove unneeded now function csum_unaligned().
> 
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
> 
> Notes:
>     v2:
>       - use ROUND_UP() and sizeof(__m256i)
>       - fix function comment
>       - remove csum_unaligned() and use csum() instead
> 
>  checksum.c | 47 ++++++++++++++++++++++++-----------------------
>  1 file changed, 24 insertions(+), 23 deletions(-)
> 
> diff --git a/checksum.c b/checksum.c
> index f21c9b7a14d1..65486b4625ba 100644
> --- a/checksum.c
> +++ b/checksum.c
> @@ -56,6 +56,8 @@
>  #include <linux/udp.h>
>  #include <linux/icmpv6.h>
>  
> +#include "util.h"
> +
>  /* Checksums are optional for UDP over IPv4, so we usually just set
>   * them to 0.  Change this to 1 to calculate real UDP over IPv4
>   * checksums
> @@ -110,20 +112,7 @@ uint16_t csum_fold(uint32_t sum)
>  	return sum;
>  }
>  
> -/**
> - * csum_unaligned() - Compute TCP/IP-style checksum for not 32-byte aligned data
> - * @buf:	Input data
> - * @len:	Input length
> - * @init:	Initial 32-bit checksum, 0 for no pre-computed checksum
> - *
> - * Return: 16-bit IPv4-style checksum
> - */
> -/* NOLINTNEXTLINE(clang-diagnostic-unknown-attributes) */
> -__attribute__((optimize("-fno-strict-aliasing")))	/* See csum_16b() */
> -uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init)
> -{
> -	return (uint16_t)~csum_fold(sum_16b(buf, len) + init);
> -}
> +uint16_t csum(const void *buf, size_t len, uint32_t init);
>  
>  /**
>   * csum_ip4_header() - Calculate and set IPv4 header checksum
> @@ -132,7 +121,7 @@ uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init)
>  void csum_ip4_header(struct iphdr *ip4h)
>  {
>  	ip4h->check = 0;
> -	ip4h->check = csum_unaligned(ip4h, (size_t)ip4h->ihl * 4, 0);
> +	ip4h->check = csum(ip4h, (size_t)ip4h->ihl * 4, 0);
>  }
>  
>  /**
> @@ -159,7 +148,7 @@ void csum_udp4(struct udphdr *udp4hr,
>  			+ htons(IPPROTO_UDP);
>  		/* Add in partial checksum for the UDP header alone */
>  		psum += sum_16b(udp4hr, sizeof(*udp4hr));
> -		udp4hr->check = csum_unaligned(payload, len, psum);
> +		udp4hr->check = csum(payload, len, psum);
>  	}
>  }
>  
> @@ -178,7 +167,7 @@ void csum_icmp4(struct icmphdr *icmp4hr, const void *payload, size_t len)
>  	/* Partial checksum for ICMP header alone */
>  	psum = sum_16b(icmp4hr, sizeof(*icmp4hr));
>  
> -	icmp4hr->checksum = csum_unaligned(payload, len, psum);
> +	icmp4hr->checksum = csum(payload, len, psum);
>  }
>  
>  /**
> @@ -199,7 +188,7 @@ void csum_udp6(struct udphdr *udp6hr,
>  	udp6hr->check = 0;
>  	/* Add in partial checksum for the UDP header alone */
>  	psum += sum_16b(udp6hr, sizeof(*udp6hr));
> -	udp6hr->check = csum_unaligned(payload, len, psum);
> +	udp6hr->check = csum(payload, len, psum);
>  }
>  
>  /**
> @@ -222,7 +211,7 @@ void csum_icmp6(struct icmp6hdr *icmp6hr,
>  	icmp6hr->icmp6_cksum = 0;
>  	/* Add in partial checksum for the ICMPv6 header alone */
>  	psum += sum_16b(icmp6hr, sizeof(*icmp6hr));
> -	icmp6hr->icmp6_cksum = csum_unaligned(payload, len, psum);
> +	icmp6hr->icmp6_cksum = csum(payload, len, psum);
>  }
>  
>  #ifdef __AVX2__
> @@ -397,17 +386,29 @@ less_than_128_bytes:
>  
>  /**
>   * csum() - Compute TCP/IP-style checksum
> - * @buf:	Input buffer, must be aligned to 32-byte boundary
> + * @buf:	Input buffer
>   * @len:	Input length
>   * @init:	Initial 32-bit checksum, 0 for no pre-computed checksum
>   *
> - * Return: 16-bit folded, complemented checksum sum
> + * Return: 16-bit folded, complemented checksum
>   */
>  /* NOLINTNEXTLINE(clang-diagnostic-unknown-attributes) */
>  __attribute__((optimize("-fno-strict-aliasing")))	/* See csum_16b() */
>  uint16_t csum(const void *buf, size_t len, uint32_t init)
>  {
> -	return (uint16_t)~csum_fold(csum_avx2(buf, len, init));
> +	intptr_t align = ROUND_UP((intptr_t)buf, sizeof(__m256i));
> +	unsigned int pad = align - (intptr_t)buf;
> +
> +	if (len < pad)
> +		pad = len;
> +
> +	if (pad)
> +		init += sum_16b(buf, pad);
> +
> +	if (len > pad)
> +		init = csum_avx2((void *)align, len - pad, init);
> +
> +	return (uint16_t)~csum_fold(init);
>  }
>  
>  #else /* __AVX2__ */
> @@ -424,7 +425,7 @@ uint16_t csum(const void *buf, size_t len, uint32_t init)
>  __attribute__((optimize("-fno-strict-aliasing")))	/* See csum_16b() */
>  uint16_t csum(const void *buf, size_t len, uint32_t init)
>  {
> -	return csum_unaligned(buf, len, init);
> +	return (uint16_t)~csum_fold(sum_16b(buf, len) + init);
>  }
>  
>  #endif /* !__AVX2__ */

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 4/8] checksum: add csum_iov()
  2024-02-14  8:56 ` [PATCH v2 4/8] checksum: add csum_iov() Laurent Vivier
@ 2024-02-15  0:44   ` David Gibson
  0 siblings, 0 replies; 28+ messages in thread
From: David Gibson @ 2024-02-15  0:44 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 4268 bytes --]

On Wed, Feb 14, 2024 at 09:56:24AM +0100, Laurent Vivier wrote:
> Introduce the function csum_unfolded() that computes the unfolded
> 32-bit checksum of a data buffer, and call it from csum() that returns
> the folded value.
> 
> Introduce csum_iov() that computes the checksum using csum_folded() on
> all vectors of the iovec array and returns the folded result.
> 
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> ---
> 
> Notes:
>     v2:
>       - fix typo and superfluous space
>       - update comments
> 
>  checksum.c | 46 ++++++++++++++++++++++++++++++++++------------
>  checksum.h |  1 +
>  2 files changed, 35 insertions(+), 12 deletions(-)
> 
> diff --git a/checksum.c b/checksum.c
> index 65486b4625ba..ac2bc49f7eb0 100644
> --- a/checksum.c
> +++ b/checksum.c
> @@ -385,16 +385,16 @@ less_than_128_bytes:
>  }
>  
>  /**
> - * csum() - Compute TCP/IP-style checksum
> - * @buf:	Input buffer
> - * @len:	Input length
> - * @init:	Initial 32-bit checksum, 0 for no pre-computed checksum
> + * csum_unfolded - Calculate the unfolded checksum of a data buffer.
>   *
> - * Return: 16-bit folded, complemented checksum
> + * @buf:   Input buffer
> + * @len:   Input length
> + * @init:  Initial 32-bit checksum, 0 for no pre-computed checksum
> + *
> + * Return: 32-bit unfolded, complemented checksum

This function neither folds nor complements (indeed, you can't
complement until after you fold).

>   */
> -/* NOLINTNEXTLINE(clang-diagnostic-unknown-attributes) */
>  __attribute__((optimize("-fno-strict-aliasing")))	/* See csum_16b() */
> -uint16_t csum(const void *buf, size_t len, uint32_t init)
> +uint32_t csum_unfolded(const void *buf, size_t len, uint32_t init)
>  {
>  	intptr_t align = ROUND_UP((intptr_t)buf, sizeof(__m256i));
>  	unsigned int pad = align - (intptr_t)buf;
> @@ -408,16 +408,30 @@ uint16_t csum(const void *buf, size_t len, uint32_t init)
>  	if (len > pad)
>  		init = csum_avx2((void *)align, len - pad, init);
>  
> -	return (uint16_t)~csum_fold(init);
> +	return init;
>  }
> -
>  #else /* __AVX2__ */
> +/**
> + * csum_unfolded - Calculate the unfolded checksum of a data buffer.
> + *
> + * @buf:   Input buffer
> + * @len:   Input length
> + * @init:  Initial 32-bit checksum, 0 for no pre-computed checksum
> + *
> + * Return: 32-bit unfolded, complemented checksum
> + */
> +__attribute__((optimize("-fno-strict-aliasing")))	/* See csum_16b() */
> +uint32_t csum_unfolded(const void *buf, size_t len, uint32_t init)
> +{
> +	return sum_16b(buf, len) + init;
> +}
> +#endif /* !__AVX2__ */
>  
>  /**
>   * csum() - Compute TCP/IP-style checksum
>   * @buf:	Input buffer
>   * @len:	Input length
> - * @sum:	Initial 32-bit checksum, 0 for no pre-computed checksum
> + * @init:	Initial 32-bit checksum, 0 for no pre-computed checksum
>   *
>   * Return: 16-bit folded, complemented checksum
>   */
> @@ -425,7 +439,15 @@ uint16_t csum(const void *buf, size_t len, uint32_t init)
>  __attribute__((optimize("-fno-strict-aliasing")))	/* See csum_16b() */
>  uint16_t csum(const void *buf, size_t len, uint32_t init)
>  {
> -	return (uint16_t)~csum_fold(sum_16b(buf, len) + init);
> +	return (uint16_t)~csum_fold(csum_unfolded(buf, len, init));
>  }
>  
> -#endif /* !__AVX2__ */


Function comment, please.

> +uint16_t csum_iov(struct iovec *iov, unsigned int n, uint32_t init)
> +{
> +	unsigned int i;
> +
> +	for (i = 0; i < n; i++)
> +		init = csum_unfolded(iov[i].iov_base, iov[i].iov_len, init);
> +
> +	return (uint16_t)~csum_fold(init);
> +}
> diff --git a/checksum.h b/checksum.h
> index 21c0310d3804..6a20297a5826 100644
> --- a/checksum.h
> +++ b/checksum.h
> @@ -25,5 +25,6 @@ void csum_icmp6(struct icmp6hdr *icmp6hr,
>  		const struct in6_addr *saddr, const struct in6_addr *daddr,
>  		const void *payload, size_t len);
>  uint16_t csum(const void *buf, size_t len, uint32_t init);
> +uint16_t csum_iov(struct iovec *iov, unsigned int n, uint32_t init);
>  
>  #endif /* CHECKSUM_H */

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 5/8] util: move IP stuff from util.[ch] to ip.[ch]
  2024-02-14  8:56 ` [PATCH v2 5/8] util: move IP stuff from util.[ch] to ip.[ch] Laurent Vivier
@ 2024-02-15  2:29   ` David Gibson
  0 siblings, 0 replies; 28+ messages in thread
From: David Gibson @ 2024-02-15  2:29 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 15610 bytes --]

On Wed, Feb 14, 2024 at 09:56:25AM +0100, Laurent Vivier wrote:
> Introduce ip.[ch] file to encapsulate IP protocol handling
> functions and structures.
> Modify various files to include the new header ip.h when it's
> needed.

This one, and some of your other commit messages seems to be a bit
oddly wrapped, not that it really matters.

> 
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
> 
> Notes:
>     v2:
>       - update rational and comments
> 
>  Makefile     |  8 ++---
>  conf.c       |  1 +
>  dhcp.c       |  1 +
>  flow.c       |  1 +
>  icmp.c       |  1 +
>  ip.c         | 72 +++++++++++++++++++++++++++++++++++++++++++
>  ip.h         | 86 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  ndp.c        |  1 +
>  port_fwd.c   |  1 +
>  qrap.c       |  1 +
>  tap.c        |  1 +
>  tcp.c        |  1 +
>  tcp_splice.c |  1 +
>  udp.c        |  1 +
>  util.c       | 55 ---------------------------------
>  util.h       | 76 ----------------------------------------------
>  16 files changed, 173 insertions(+), 135 deletions(-)
>  create mode 100644 ip.c
>  create mode 100644 ip.h
> 
> diff --git a/Makefile b/Makefile
> index 156398b3844e..e1ebb454bc6b 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -45,7 +45,7 @@ FLAGS += -DVERSION=\"$(VERSION)\"
>  FLAGS += -DDUAL_STACK_SOCKETS=$(DUAL_STACK_SOCKETS)
>  
>  PASST_SRCS = arch.c arp.c checksum.c conf.c dhcp.c dhcpv6.c flow.c icmp.c \
> -	igmp.c iov.c isolation.c lineread.c log.c mld.c ndp.c netlink.c \
> +	igmp.c iov.c ip.c isolation.c lineread.c log.c mld.c ndp.c netlink.c \
>  	packet.c passt.c pasta.c pcap.c pif.c port_fwd.c tap.c tcp.c \
>  	tcp_splice.c udp.c util.c
>  QRAP_SRCS = qrap.c
> @@ -54,9 +54,9 @@ SRCS = $(PASST_SRCS) $(QRAP_SRCS)
>  MANPAGES = passt.1 pasta.1 qrap.1
>  
>  PASST_HEADERS = arch.h arp.h checksum.h conf.h dhcp.h dhcpv6.h flow.h \
> -	flow_table.h icmp.h inany.h iov.h isolation.h lineread.h log.h ndp.h \
> -	netlink.h packet.h passt.h pasta.h pcap.h pif.h port_fwd.h siphash.h \
> -	tap.h tcp.h tcp_conn.h tcp_splice.h udp.h util.h
> +	flow_table.h icmp.h inany.h iov.h ip.h isolation.h lineread.h log.h \
> +	ndp.h netlink.h packet.h passt.h pasta.h pcap.h pif.h port_fwd.h \
> +	siphash.h tap.h tcp.h tcp_conn.h tcp_splice.h udp.h util.h
>  HEADERS = $(PASST_HEADERS) seccomp.h
>  
>  C := \#include <linux/tcp.h>\nstruct tcp_info x = { .tcpi_snd_wnd = 0 };
> diff --git a/conf.c b/conf.c
> index 5e15b665be9c..93bfda331349 100644
> --- a/conf.c
> +++ b/conf.c
> @@ -35,6 +35,7 @@
>  #include <netinet/if_ether.h>
>  
>  #include "util.h"
> +#include "ip.h"
>  #include "passt.h"
>  #include "netlink.h"
>  #include "udp.h"
> diff --git a/dhcp.c b/dhcp.c
> index 110772867632..ff4834a3dce9 100644
> --- a/dhcp.c
> +++ b/dhcp.c
> @@ -25,6 +25,7 @@
>  #include <limits.h>
>  
>  #include "util.h"
> +#include "ip.h"
>  #include "checksum.h"
>  #include "packet.h"
>  #include "passt.h"
> diff --git a/flow.c b/flow.c
> index 5e94a7a949e5..73d52bda8774 100644
> --- a/flow.c
> +++ b/flow.c
> @@ -11,6 +11,7 @@
>  #include <string.h>
>  
>  #include "util.h"
> +#include "ip.h"
>  #include "passt.h"
>  #include "siphash.h"
>  #include "inany.h"
> diff --git a/icmp.c b/icmp.c
> index 9434fc5a7490..3b85a8578316 100644
> --- a/icmp.c
> +++ b/icmp.c
> @@ -33,6 +33,7 @@
>  
>  #include "packet.h"
>  #include "util.h"
> +#include "ip.h"
>  #include "passt.h"
>  #include "tap.h"
>  #include "log.h"
> diff --git a/ip.c b/ip.c
> new file mode 100644
> index 000000000000..2cc7f6548aff
> --- /dev/null
> +++ b/ip.c
> @@ -0,0 +1,72 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +
> +/* PASST - Plug A Simple Socket Transport
> + *  for qemu/UNIX domain socket mode
> + *
> + * PASTA - Pack A Subtle Tap Abstraction
> + *  for network namespace/tap device mode
> + *
> + * ip.c - IP related functions
> + *
> + * Copyright (c) 2020-2021 Red Hat GmbH
> + * Author: Stefano Brivio <sbrivio@redhat.com>
> + */
> +
> +#include <stddef.h>
> +#include "util.h"
> +#include "ip.h"
> +
> +#define IPV6_NH_OPT(nh)							\
> +	((nh) == 0   || (nh) == 43  || (nh) == 44  || (nh) == 50  ||	\
> +	 (nh) == 51  || (nh) == 60  || (nh) == 135 || (nh) == 139 ||	\
> +	 (nh) == 140 || (nh) == 253 || (nh) == 254)
> +
> +/**
> + * ipv6_l4hdr() - Find pointer to L4 header in IPv6 packet and extract protocol
> + * @p:		Packet pool, packet number @idx has IPv6 header at @offset
> + * @idx:	Index of packet in pool
> + * @offset:	Pre-calculated IPv6 header offset
> + * @proto:	Filled with L4 protocol number
> + * @dlen:	Data length (payload excluding header extensions), set on return
> + *
> + * Return: pointer to L4 header, NULL if not found
> + */
> +char *ipv6_l4hdr(const struct pool *p, int idx, size_t offset, uint8_t *proto,
> +		 size_t *dlen)
> +{
> +	const struct ipv6_opt_hdr *o;
> +	const struct ipv6hdr *ip6h;
> +	char *base;
> +	int hdrlen;
> +	uint8_t nh;
> +
> +	base = packet_get(p, idx, 0, 0, NULL);
> +	ip6h = packet_get(p, idx, offset, sizeof(*ip6h), dlen);
> +	if (!ip6h)
> +		return NULL;
> +
> +	offset += sizeof(*ip6h);
> +
> +	nh = ip6h->nexthdr;
> +	if (!IPV6_NH_OPT(nh))
> +		goto found;
> +
> +	while ((o = packet_get_try(p, idx, offset, sizeof(*o), dlen))) {
> +		nh = o->nexthdr;
> +		hdrlen = (o->hdrlen + 1) * 8;
> +
> +		if (IPV6_NH_OPT(nh))
> +			offset += hdrlen;
> +		else
> +			goto found;
> +	}
> +
> +	return NULL;
> +
> +found:
> +	if (nh == 59)
> +		return NULL;
> +
> +	*proto = nh;
> +	return base + offset;
> +}
> diff --git a/ip.h b/ip.h
> new file mode 100644
> index 000000000000..b2e08bc049f3
> --- /dev/null
> +++ b/ip.h
> @@ -0,0 +1,86 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later
> + * Copyright (c) 2021 Red Hat GmbH
> + * Author: Stefano Brivio <sbrivio@redhat.com>
> + */
> +
> +#ifndef IP_H
> +#define IP_H
> +
> +#include <netinet/ip.h>
> +#include <netinet/ip6.h>
> +
> +#define IN4_IS_ADDR_UNSPECIFIED(a) \
> +	((a)->s_addr == htonl_constant(INADDR_ANY))
> +#define IN4_IS_ADDR_BROADCAST(a) \
> +	((a)->s_addr == htonl_constant(INADDR_BROADCAST))
> +#define IN4_IS_ADDR_LOOPBACK(a) \
> +	(ntohl((a)->s_addr) >> IN_CLASSA_NSHIFT == IN_LOOPBACKNET)
> +#define IN4_IS_ADDR_MULTICAST(a) \
> +	(IN_MULTICAST(ntohl((a)->s_addr)))
> +#define IN4_ARE_ADDR_EQUAL(a, b) \
> +	(((struct in_addr *)(a))->s_addr == ((struct in_addr *)b)->s_addr)
> +#define IN4ADDR_LOOPBACK_INIT \
> +	{ .s_addr	= htonl_constant(INADDR_LOOPBACK) }
> +#define IN4ADDR_ANY_INIT \
> +	{ .s_addr	= htonl_constant(INADDR_ANY) }
> +
> +#define L2_BUF_IP4_INIT(proto)						\
> +	{								\
> +		.version	= 4,					\
> +		.ihl		= 5,					\
> +		.tos		= 0,					\
> +		.tot_len	= 0,					\
> +		.id		= 0,					\
> +		.frag_off	= 0,					\
> +		.ttl		= 0xff,					\
> +		.protocol	= (proto),				\
> +		.saddr		= 0,					\
> +		.daddr		= 0,					\
> +	}
> +#define L2_BUF_IP4_PSUM(proto)	((uint32_t)htons_constant(0x4500) +	\
> +				 (uint32_t)htons_constant(0xff00 | (proto)))
> +
> +#define L2_BUF_IP6_INIT(proto)						\
> +	{								\
> +		.priority	= 0,					\
> +		.version	= 6,					\
> +		.flow_lbl	= { 0 },				\
> +		.payload_len	= 0,					\
> +		.nexthdr	= (proto),				\
> +		.hop_limit	= 255,					\
> +		.saddr		= IN6ADDR_ANY_INIT,			\
> +		.daddr		= IN6ADDR_ANY_INIT,			\
> +	}
> +
> +struct ipv6hdr {
> +#pragma GCC diagnostic push
> +#pragma GCC diagnostic ignored "-Wpedantic"
> +#if __BYTE_ORDER == __BIG_ENDIAN
> +	uint8_t			version:4,
> +				priority:4;
> +#else
> +	uint8_t			priority:4,
> +				version:4;
> +#endif
> +#pragma GCC diagnostic pop
> +	uint8_t			flow_lbl[3];
> +
> +	uint16_t		payload_len;
> +	uint8_t			nexthdr;
> +	uint8_t			hop_limit;
> +
> +	struct in6_addr		saddr;
> +	struct in6_addr		daddr;
> +};
> +
> +struct ipv6_opt_hdr {
> +	uint8_t			nexthdr;
> +	uint8_t			hdrlen;
> +	/*
> +	 * TLV encoded option data follows.
> +	 */
> +} __attribute__((packed));	/* required for some archs */
> +
> +char *ipv6_l4hdr(const struct pool *p, int idx, size_t offset, uint8_t *proto,
> +		 size_t *dlen);
> +#endif /* IP_H */
> diff --git a/ndp.c b/ndp.c
> index 4c85ab8bcaee..c58f4b222b76 100644
> --- a/ndp.c
> +++ b/ndp.c
> @@ -28,6 +28,7 @@
>  
>  #include "checksum.h"
>  #include "util.h"
> +#include "ip.h"
>  #include "passt.h"
>  #include "tap.h"
>  #include "log.h"
> diff --git a/port_fwd.c b/port_fwd.c
> index 6f6c836c57ad..e1ec31e2232c 100644
> --- a/port_fwd.c
> +++ b/port_fwd.c
> @@ -21,6 +21,7 @@
>  #include <stdio.h>
>  
>  #include "util.h"
> +#include "ip.h"
>  #include "port_fwd.h"
>  #include "passt.h"
>  #include "lineread.h"
> diff --git a/qrap.c b/qrap.c
> index 97f350a4bf0b..d59670621731 100644
> --- a/qrap.c
> +++ b/qrap.c
> @@ -32,6 +32,7 @@
>  #include <linux/icmpv6.h>
>  
>  #include "util.h"
> +#include "ip.h"
>  #include "passt.h"
>  #include "arp.h"
>  
> diff --git a/tap.c b/tap.c
> index 396dee7eef25..3ea03f720d6d 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -45,6 +45,7 @@
>  
>  #include "checksum.h"
>  #include "util.h"
> +#include "ip.h"
>  #include "passt.h"
>  #include "arp.h"
>  #include "dhcp.h"
> diff --git a/tcp.c b/tcp.c
> index 2ab443d5c3f2..45ef5146729a 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -289,6 +289,7 @@
>  
>  #include "checksum.h"
>  #include "util.h"
> +#include "ip.h"
>  #include "passt.h"
>  #include "tap.h"
>  #include "siphash.h"
> diff --git a/tcp_splice.c b/tcp_splice.c
> index 26d32065cd47..66575ca95a1e 100644
> --- a/tcp_splice.c
> +++ b/tcp_splice.c
> @@ -49,6 +49,7 @@
>  #include <sys/socket.h>
>  
>  #include "util.h"
> +#include "ip.h"
>  #include "passt.h"
>  #include "log.h"
>  #include "tcp_splice.h"
> diff --git a/udp.c b/udp.c
> index b5b8f8a7cd5b..d514c864ab5b 100644
> --- a/udp.c
> +++ b/udp.c
> @@ -112,6 +112,7 @@
>  
>  #include "checksum.h"
>  #include "util.h"
> +#include "ip.h"
>  #include "passt.h"
>  #include "tap.h"
>  #include "pcap.h"
> diff --git a/util.c b/util.c
> index 21b35ff94db1..f73ea1d98a09 100644
> --- a/util.c
> +++ b/util.c
> @@ -30,61 +30,6 @@
>  #include "packet.h"
>  #include "log.h"
>  
> -#define IPV6_NH_OPT(nh)							\
> -	((nh) == 0   || (nh) == 43  || (nh) == 44  || (nh) == 50  ||	\
> -	 (nh) == 51  || (nh) == 60  || (nh) == 135 || (nh) == 139 ||	\
> -	 (nh) == 140 || (nh) == 253 || (nh) == 254)
> -
> -/**
> - * ipv6_l4hdr() - Find pointer to L4 header in IPv6 packet and extract protocol
> - * @p:		Packet pool, packet number @idx has IPv6 header at @offset
> - * @idx:	Index of packet in pool
> - * @offset:	Pre-calculated IPv6 header offset
> - * @proto:	Filled with L4 protocol number
> - * @dlen:	Data length (payload excluding header extensions), set on return
> - *
> - * Return: pointer to L4 header, NULL if not found
> - */
> -char *ipv6_l4hdr(const struct pool *p, int idx, size_t offset, uint8_t *proto,
> -		 size_t *dlen)
> -{
> -	const struct ipv6_opt_hdr *o;
> -	const struct ipv6hdr *ip6h;
> -	char *base;
> -	int hdrlen;
> -	uint8_t nh;
> -
> -	base = packet_get(p, idx, 0, 0, NULL);
> -	ip6h = packet_get(p, idx, offset, sizeof(*ip6h), dlen);
> -	if (!ip6h)
> -		return NULL;
> -
> -	offset += sizeof(*ip6h);
> -
> -	nh = ip6h->nexthdr;
> -	if (!IPV6_NH_OPT(nh))
> -		goto found;
> -
> -	while ((o = packet_get_try(p, idx, offset, sizeof(*o), dlen))) {
> -		nh = o->nexthdr;
> -		hdrlen = (o->hdrlen + 1) * 8;
> -
> -		if (IPV6_NH_OPT(nh))
> -			offset += hdrlen;
> -		else
> -			goto found;
> -	}
> -
> -	return NULL;
> -
> -found:
> -	if (nh == 59)
> -		return NULL;
> -
> -	*proto = nh;
> -	return base + offset;
> -}
> -
>  /**
>   * sock_l4() - Create and bind socket for given L4, add to epoll list
>   * @c:		Execution context
> diff --git a/util.h b/util.h
> index d2320f8cc99a..f7c3dfee9972 100644
> --- a/util.h
> +++ b/util.h
> @@ -110,22 +110,6 @@
>  #define	htonl_constant(x)	(__bswap_constant_32(x))
>  #endif
>  
> -#define IN4_IS_ADDR_UNSPECIFIED(a) \
> -	((a)->s_addr == htonl_constant(INADDR_ANY))
> -#define IN4_IS_ADDR_BROADCAST(a) \
> -	((a)->s_addr == htonl_constant(INADDR_BROADCAST))
> -#define IN4_IS_ADDR_LOOPBACK(a) \
> -	(ntohl((a)->s_addr) >> IN_CLASSA_NSHIFT == IN_LOOPBACKNET)
> -#define IN4_IS_ADDR_MULTICAST(a) \
> -	(IN_MULTICAST(ntohl((a)->s_addr)))
> -#define IN4_ARE_ADDR_EQUAL(a, b) \
> -	(((struct in_addr *)(a))->s_addr == ((struct in_addr *)b)->s_addr)
> -#define IN4ADDR_LOOPBACK_INIT \
> -	{ .s_addr	= htonl_constant(INADDR_LOOPBACK) }
> -#define IN4ADDR_ANY_INIT \
> -	{ .s_addr	= htonl_constant(INADDR_ANY) }
> -
> -
>  #define NS_FN_STACK_SIZE	(RLIMIT_STACK_VAL * 1024 / 8)
>  int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags,
>  	     void *arg);
> @@ -138,34 +122,6 @@ int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags,
>  			 (void *)(arg));				\
>  	} while (0)
>  
> -#define L2_BUF_IP4_INIT(proto)						\
> -	{								\
> -		.version	= 4,					\
> -		.ihl		= 5,					\
> -		.tos		= 0,					\
> -		.tot_len	= 0,					\
> -		.id		= 0,					\
> -		.frag_off	= 0,					\
> -		.ttl		= 0xff,					\
> -		.protocol	= (proto),				\
> -		.saddr		= 0,					\
> -		.daddr		= 0,					\
> -	}
> -#define L2_BUF_IP4_PSUM(proto)	((uint32_t)htons_constant(0x4500) +	\
> -				 (uint32_t)htons_constant(0xff00 | (proto)))
> -
> -#define L2_BUF_IP6_INIT(proto)						\
> -	{								\
> -		.priority	= 0,					\
> -		.version	= 6,					\
> -		.flow_lbl	= { 0 },				\
> -		.payload_len	= 0,					\
> -		.nexthdr	= (proto),				\
> -		.hop_limit	= 255,					\
> -		.saddr		= IN6ADDR_ANY_INIT,			\
> -		.daddr		= IN6ADDR_ANY_INIT,			\
> -	}
> -
>  #define RCVBUF_BIG		(2UL * 1024 * 1024)
>  #define SNDBUF_BIG		(4UL * 1024 * 1024)
>  #define SNDBUF_SMALL		(128UL * 1024)
> @@ -173,45 +129,13 @@ int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags,
>  #include <net/if.h>
>  #include <limits.h>
>  #include <stdint.h>
> -#include <netinet/ip6.h>
>  
>  #include "packet.h"
>  
>  struct ctx;
>  
> -struct ipv6hdr {
> -#pragma GCC diagnostic push
> -#pragma GCC diagnostic ignored "-Wpedantic"
> -#if __BYTE_ORDER == __BIG_ENDIAN
> -	uint8_t			version:4,
> -				priority:4;
> -#else
> -	uint8_t			priority:4,
> -				version:4;
> -#endif
> -#pragma GCC diagnostic pop
> -	uint8_t			flow_lbl[3];
> -
> -	uint16_t		payload_len;
> -	uint8_t			nexthdr;
> -	uint8_t			hop_limit;
> -
> -	struct in6_addr		saddr;
> -	struct in6_addr		daddr;
> -};
> -
> -struct ipv6_opt_hdr {
> -	uint8_t			nexthdr;
> -	uint8_t			hdrlen;
> -	/*
> -	 * TLV encoded option data follows.
> -	 */
> -} __attribute__((packed));	/* required for some archs */
> -
>  /* cppcheck-suppress funcArgNamesDifferent */
>  __attribute__ ((weak)) int ffsl(long int i) { return __builtin_ffsl(i); }
> -char *ipv6_l4hdr(const struct pool *p, int idx, size_t offset, uint8_t *proto,
> -		 size_t *dlen);
>  int sock_l4(const struct ctx *c, int af, uint8_t proto,
>  	    const void *bind_addr, const char *ifname, uint16_t port,
>  	    uint32_t data);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 6/8] checksum: use csum_ip4_header() in udp.c and tcp.c
  2024-02-14  8:56 ` [PATCH v2 6/8] checksum: use csum_ip4_header() in udp.c and tcp.c Laurent Vivier
@ 2024-02-15  2:51   ` David Gibson
  2024-02-16  9:08   ` Stefano Brivio
  1 sibling, 0 replies; 28+ messages in thread
From: David Gibson @ 2024-02-15  2:51 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 7813 bytes --]

On Wed, Feb 14, 2024 at 09:56:26AM +0100, Laurent Vivier wrote:
> We can find the same function to compute the IPv4 header
> checksum in tcp.c, udp.c and tap.c
> 
> Use the function defined for tap.c, csum_ip4_header(), but
> with the code used in tcp.c and udp.c as it doesn't need a fully
> initialiazed IPv4 header, only protocol, tot_len, saddr and daddr.
> 
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> ---
> 
> Notes:
>     v2:
>       - use csum_ip4_header() from checksum.c
>       - use code from tcp.c and udp.c in csum_ip4_header()
>       - use "const struct iphfr *", check is not updated by the
>         function but by the caller.
> 
>  checksum.c | 16 ++++++++++++----
>  checksum.h |  2 +-
>  tap.c      |  2 +-
>  tcp.c      | 22 +---------------------
>  udp.c      | 23 +++++------------------
>  5 files changed, 20 insertions(+), 45 deletions(-)
> 
> diff --git a/checksum.c b/checksum.c
> index ac2bc49f7eb0..5613187a1c82 100644
> --- a/checksum.c
> +++ b/checksum.c
> @@ -57,6 +57,7 @@
>  #include <linux/icmpv6.h>
>  
>  #include "util.h"
> +#include "ip.h"
>  
>  /* Checksums are optional for UDP over IPv4, so we usually just set
>   * them to 0.  Change this to 1 to calculate real UDP over IPv4
> @@ -115,13 +116,20 @@ uint16_t csum_fold(uint32_t sum)
>  uint16_t csum(const void *buf, size_t len, uint32_t init);
>  
>  /**
> - * csum_ip4_header() - Calculate and set IPv4 header checksum
> + * csum_ip4_header() - Calculate IPv4 header checksum
>   * @ip4h:	IPv4 header
>   */
> -void csum_ip4_header(struct iphdr *ip4h)
> +uint16_t csum_ip4_header(const struct iphdr *ip4h)
>  {
> -	ip4h->check = 0;
> -	ip4h->check = csum(ip4h, (size_t)ip4h->ihl * 4, 0);
> +	uint32_t sum = L2_BUF_IP4_PSUM(ip4h->protocol);

Hrm, it's probably not a huge deal, but this change has more
consequences than might be immediately apparent.

In the existing use cases, I was expecting L2_BUF_IP4_PSUM() to be
evaluated at compile time, because it's always passed a constant.
With this new formulation the setting of ip4h->protocol is far
separated from this checksum, so I doubt the compiler will be able to
deduce it always has the same value.  As well as extra computation
that could be an extra memory access, which is more significant.  Als,
the macro uses htons_constant(), which I guess works for
non-constants, but probably isn't ideal.

So, although it seems technically redundant, I'd suggest passing in
the protocol rather than reading it from the header, to preserve that
ability to constant fold where the protocol is statically known.
Well.. assuming the compiler inlines enough to propagate the constant
across the function call, which given we don't have a separate link
pass is possible.

Or, maybe we should rework this to take the addresses as parameters
too.  That does have a few advantages:

  * It makes it obvious exactly what this function requires, rather
    than having assumptions about what fields of the header must
    already be initialised
  * It should avoid the #pragma nonsense to avoid the unaligned
    warning
  * For at least some of the callsites, the addresses are probably
    already in registers, so it might save a couple of memory accesses

> +	sum += ip4h->tot_len;
> +	sum += (ip4h->saddr >> 16) & 0xffff;
> +	sum += ip4h->saddr & 0xffff;
> +	sum += (ip4h->daddr >> 16) & 0xffff;
> +	sum += ip4h->daddr & 0xffff;
> +
> +	return ~csum_fold(sum);
>  }
>  
>  /**
> diff --git a/checksum.h b/checksum.h
> index 6a20297a5826..b87ecd720df5 100644
> --- a/checksum.h
> +++ b/checksum.h
> @@ -13,7 +13,7 @@ struct icmp6hdr;
>  uint32_t sum_16b(const void *buf, size_t len);
>  uint16_t csum_fold(uint32_t sum);
>  uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init);
> -void csum_ip4_header(struct iphdr *ip4h);
> +uint16_t csum_ip4_header(const struct iphdr *ip4h);
>  void csum_udp4(struct udphdr *udp4hr,
>  	       struct in_addr saddr, struct in_addr daddr,
>  	       const void *payload, size_t len);
> diff --git a/tap.c b/tap.c
> index 3ea03f720d6d..70f36a55314f 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -160,7 +160,7 @@ static void *tap_push_ip4h(char *buf, struct in_addr src, struct in_addr dst,
>  	ip4h->protocol = proto;
>  	ip4h->saddr = src.s_addr;
>  	ip4h->daddr = dst.s_addr;
> -	csum_ip4_header(ip4h);
> +	ip4h->check = csum_ip4_header(ip4h);
>  	return ip4h + 1;
>  }
>  
> diff --git a/tcp.c b/tcp.c
> index 45ef5146729a..35e240f4ffc3 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -934,23 +934,6 @@ static void tcp_sock_set_bufsize(const struct ctx *c, int s)
>  		trace("TCP: failed to set SO_SNDBUF to %i", v);
>  }
>  
> -/**
> - * tcp_update_check_ip4() - Update IPv4 with variable parts from stored one
> - * @buf:	L2 packet buffer with final IPv4 header
> - */
> -static void tcp_update_check_ip4(struct tcp4_l2_buf_t *buf)
> -{
> -	uint32_t sum = L2_BUF_IP4_PSUM(IPPROTO_TCP);
> -
> -	sum += buf->iph.tot_len;
> -	sum += (buf->iph.saddr >> 16) & 0xffff;
> -	sum += buf->iph.saddr & 0xffff;
> -	sum += (buf->iph.daddr >> 16) & 0xffff;
> -	sum += buf->iph.daddr & 0xffff;
> -
> -	buf->iph.check = (uint16_t)~csum_fold(sum);
> -}
> -
>  /**
>   * tcp_update_check_tcp4() - Update TCP checksum from stored one
>   * @buf:	L2 packet buffer with final IPv4 header
> @@ -1393,10 +1376,7 @@ do {									\
>  		b->iph.saddr = a4->s_addr;
>  		b->iph.daddr = c->ip4.addr_seen.s_addr;
>  
> -		if (check)
> -			b->iph.check = *check;
> -		else
> -			tcp_update_check_ip4(b);
> +		b->iph.check = check ? *check : csum_ip4_header(&b->iph);
>  
>  		SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq);
>  
> diff --git a/udp.c b/udp.c
> index d514c864ab5b..e645c800a823 100644
> --- a/udp.c
> +++ b/udp.c
> @@ -270,23 +270,6 @@ static void udp_invert_portmap(struct udp_port_fwd *fwd)
>  	}
>  }
>  
> -/**
> - * udp_update_check4() - Update checksum with variable parts from stored one
> - * @buf:	L2 packet buffer with final IPv4 header
> - */
> -static void udp_update_check4(struct udp4_l2_buf_t *buf)
> -{
> -	uint32_t sum = L2_BUF_IP4_PSUM(IPPROTO_UDP);
> -
> -	sum += buf->iph.tot_len;
> -	sum += (buf->iph.saddr >> 16) & 0xffff;
> -	sum += buf->iph.saddr & 0xffff;
> -	sum += (buf->iph.daddr >> 16) & 0xffff;
> -	sum += buf->iph.daddr & 0xffff;
> -
> -	buf->iph.check = (uint16_t)~csum_fold(sum);
> -}
> -
>  /**
>   * udp_update_l2_buf() - Update L2 buffers with Ethernet and IPv4 addresses
>   * @eth_d:	Ethernet destination address, NULL if unchanged
> @@ -579,6 +562,9 @@ static void udp_splice_sendfrom(const struct ctx *c, unsigned start, unsigned n,
>   *
>   * Return: size of tap frame with headers
>   */
> +#pragma GCC diagnostic push
> +/* ignore unaligned pointer value warning for &b->iph */
> +#pragma GCC diagnostic ignored "-Waddress-of-packed-member"
>  static size_t udp_update_hdr4(const struct ctx *c, int n, in_port_t dstport,
>  			      const struct timespec *now)
>  {
> @@ -614,13 +600,14 @@ static size_t udp_update_hdr4(const struct ctx *c, int n, in_port_t dstport,
>  		b->iph.saddr = b->s_in.sin_addr.s_addr;
>  	}
>  
> -	udp_update_check4(b);
> +	b->iph.check = csum_ip4_header(&b->iph);
>  	b->uh.source = b->s_in.sin_port;
>  	b->uh.dest = htons(dstport);
>  	b->uh.len = htons(udp4_l2_mh_sock[n].msg_len + sizeof(b->uh));
>  
>  	return tap_iov_len(c, &b->taph, ip_len);
>  }
> +#pragma GCC diagnostic pop
>  
>  /**
>   * udp_update_hdr6() - Update headers for one IPv6 datagram

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 7/8] checksum: introduce functions to compute the header part checksum for TCP/UDP
  2024-02-14  8:56 ` [PATCH v2 7/8] checksum: introduce functions to compute the header part checksum for TCP/UDP Laurent Vivier
@ 2024-02-15  3:12   ` David Gibson
  0 siblings, 0 replies; 28+ messages in thread
From: David Gibson @ 2024-02-15  3:12 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 12656 bytes --]

On Wed, Feb 14, 2024 at 09:56:27AM +0100, Laurent Vivier wrote:
> The TCP and UDP checksums are computed using the data in the TCP/UDP
> payload but also some informations in the IP header (protocol,
> length, source and destination addresses).
> 
> We add two functions, proto_ipv4_header_psum() and
> proto_ipv6_header_psum(), to compute the checksum of the IP
> header part.
> 
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> ---
> 
> Notes:
>     v2:
>       - move new function to checksum.c
>       - use _psum rather than _checksum in the name
>       - replace csum_udp4() and csum_udp6() by the new function
> 
>  checksum.c | 70 ++++++++++++++++++++----------------------------------
>  checksum.h | 11 ++++-----
>  tap.c      | 19 +++++++++++++--
>  tcp.c      | 42 +++++++++++++-------------------
>  udp.c      | 11 +++++----
>  5 files changed, 72 insertions(+), 81 deletions(-)
> 
> diff --git a/checksum.c b/checksum.c
> index 5613187a1c82..90dad96ee2c1 100644
> --- a/checksum.c
> +++ b/checksum.c
> @@ -59,12 +59,6 @@
>  #include "util.h"
>  #include "ip.h"
>  
> -/* Checksums are optional for UDP over IPv4, so we usually just set
> - * them to 0.  Change this to 1 to calculate real UDP over IPv4
> - * checksums
> - */
> -#define UDP4_REAL_CHECKSUMS	0
> -
>  /**
>   * sum_16b() - Calculate sum of 16-bit words
>   * @buf:	Input buffer
> @@ -133,31 +127,23 @@ uint16_t csum_ip4_header(const struct iphdr *ip4h)
>  }
>  
>  /**
> - * csum_udp4() - Calculate and set checksum for a UDP over IPv4 packet
> - * @udp4hr:	UDP header, initialised apart from checksum
> - * @saddr:	IPv4 source address
> - * @daddr:	IPv4 destination address
> - * @payload:	ICMPv4 packet payload
> - * @len:	Length of @payload (not including UDP)
> + * proto_ipv4_header_psum() - Calculates the partial checksum of an
> + * 			      IPv4 header for UDP or TCP
> + * @param:	ip4h Pointer to the IPv4 header structure
> + * @proto:	proto Protocol number
> + * Returns:	Partial checksum of the IPv4 header
>   */
> -void csum_udp4(struct udphdr *udp4hr,
> -	       struct in_addr saddr, struct in_addr daddr,
> -	       const void *payload, size_t len)
> +uint32_t proto_ipv4_header_psum(struct iphdr *ip4h, uint8_t proto)

As per comments on the previous patch, I think there are some
advantages to passing the specific header fields as parameters, rather
than assuming they're already writen to the header structure.
Especially since that's closer to the interface of the pre-existing
functions.

>  {
> -	/* UDP checksums are optional, so don't bother */
> -	udp4hr->check = 0;
> -
> -	if (UDP4_REAL_CHECKSUMS) {
> -		/* UNTESTED: if we did want real UDPv4 checksums, this
> -		 * is roughly what we'd need */
> -		uint32_t psum = csum_fold(saddr.s_addr)
> -			+ csum_fold(daddr.s_addr)
> -			+ htons(len + sizeof(*udp4hr))
> -			+ htons(IPPROTO_UDP);
> -		/* Add in partial checksum for the UDP header alone */
> -		psum += sum_16b(udp4hr, sizeof(*udp4hr));
> -		udp4hr->check = csum(payload, len, psum);
> -	}
> +	uint32_t sum = htons(proto);
> +
> +	sum += (ip4h->saddr >> 16) & 0xffff;
> +	sum += ip4h->saddr & 0xffff;
> +	sum += (ip4h->daddr >> 16) & 0xffff;
> +	sum += ip4h->daddr & 0xffff;
> +	sum += htons(ntohs(ip4h->tot_len) - 20);
> +
> +	return sum;
>  }
>  
>  /**
> @@ -179,24 +165,20 @@ void csum_icmp4(struct icmphdr *icmp4hr, const void *payload, size_t len)
>  }
>  
>  /**
> - * csum_udp6() - Calculate and set checksum for a UDP over IPv6 packet
> - * @udp6hr:	UDP header, initialised apart from checksum
> - * @payload:	UDP packet payload
> - * @len:	Length of @payload (not including UDP header)
> + * proto_ipv6_header_psum() - Calculates the partial checksum of an
> + * 			      IPv6 header for UDP or TCP
> + * @param:	ip6h Pointer to the IPv4 header structure
> + * @proto:	proto Protocol number
> + * Returns:	Partial checksum of the IPv6 header
>   */
> -void csum_udp6(struct udphdr *udp6hr,
> -	       const struct in6_addr *saddr, const struct in6_addr *daddr,
> -	       const void *payload, size_t len)
> +uint32_t proto_ipv6_header_psum(struct ipv6hdr *ip6h, uint8_t proto)
>  {
> -	/* Partial checksum for the pseudo-IPv6 header */
> -	uint32_t psum = sum_16b(saddr, sizeof(*saddr)) +
> -		        sum_16b(daddr, sizeof(*daddr)) +
> -		        htons(len + sizeof(*udp6hr)) + htons(IPPROTO_UDP);
> +	uint32_t sum = htons(proto) + ip6h->payload_len;
> +
> +	sum += sum_16b(&ip6h->saddr, sizeof(ip6h->saddr));
> +	sum += sum_16b(&ip6h->daddr, sizeof(ip6h->daddr));
>  
> -	udp6hr->check = 0;
> -	/* Add in partial checksum for the UDP header alone */
> -	psum += sum_16b(udp6hr, sizeof(*udp6hr));
> -	udp6hr->check = csum(payload, len, psum);
> +	return sum;
>  }
>  
>  /**
> diff --git a/checksum.h b/checksum.h
> index b87ecd720df5..10533f708853 100644
> --- a/checksum.h
> +++ b/checksum.h
> @@ -6,24 +6,23 @@
>  #ifndef CHECKSUM_H
>  #define CHECKSUM_H
>  
> +struct iphdr;
>  struct udphdr;
>  struct icmphdr;
> +struct ipv6hdr;
>  struct icmp6hdr;
>  
>  uint32_t sum_16b(const void *buf, size_t len);
>  uint16_t csum_fold(uint32_t sum);
>  uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init);
>  uint16_t csum_ip4_header(const struct iphdr *ip4h);
> -void csum_udp4(struct udphdr *udp4hr,
> -	       struct in_addr saddr, struct in_addr daddr,
> -	       const void *payload, size_t len);
> +uint32_t proto_ipv4_header_psum(struct iphdr *ip4h, uint8_t proto);
>  void csum_icmp4(struct icmphdr *ih, const void *payload, size_t len);
> -void csum_udp6(struct udphdr *udp6hr,
> -	       const struct in6_addr *saddr, const struct in6_addr *daddr,
> -	       const void *payload, size_t len);
> +uint32_t proto_ipv6_header_psum(struct ipv6hdr *ip6h, uint8_t proto);
>  void csum_icmp6(struct icmp6hdr *icmp6hr,
>  		const struct in6_addr *saddr, const struct in6_addr *daddr,
>  		const void *payload, size_t len);
> +uint32_t csum_unfolded(const void *buf, size_t len, uint32_t init);
>  uint16_t csum(const void *buf, size_t len, uint32_t init);
>  uint16_t csum_iov(struct iovec *iov, unsigned int n, uint32_t init);
>  
> diff --git a/tap.c b/tap.c
> index 70f36a55314f..02b51100d089 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -58,6 +58,12 @@
>  #include "tap.h"
>  #include "log.h"
>  
> +/* Checksums are optional for UDP over IPv4, so we usually just set
> + * them to 0.  Change this to 1 to calculate real UDP over IPv4
> + * checksums
> + */
> +#define UDP4_REAL_CHECKSUMS     0
> +
>  /* IPv4 (plus ARP) and IPv6 message batches from tap/guest to IP handlers */
>  static PACKET_POOL_NOINIT(pool_tap4, TAP_MSGS, pkt_buf);
>  static PACKET_POOL_NOINIT(pool_tap6, TAP_MSGS, pkt_buf);
> @@ -188,7 +194,12 @@ void tap_udp4_send(const struct ctx *c, struct in_addr src, in_port_t sport,
>  	uh->source = htons(sport);
>  	uh->dest = htons(dport);
>  	uh->len = htons(udplen);
> -	csum_udp4(uh, src, dst, in, len);
> +	uh->check = 0;
> +	if (UDP4_REAL_CHECKSUMS) {
> +		uint32_t sum = proto_ipv4_header_psum(ip4h, IPPROTO_UDP);
> +		sum = csum_unfolded(uh, sizeof(struct udphdr), sum);
> +		uh->check = csum(in, len, sum);
> +	}
>  	memcpy(data, in, len);
>  
>  	if (tap_send(c, buf, len + (data - buf)) < 0)
> @@ -271,11 +282,15 @@ void tap_udp6_send(const struct ctx *c,
>  	void *uhp = tap_push_ip6h(ip6h, src, dst, udplen, IPPROTO_UDP, flow);
>  	struct udphdr *uh = (struct udphdr *)uhp;
>  	char *data = (char *)(uh + 1);
> +	uint32_t sum;
>  
>  	uh->source = htons(sport);
>  	uh->dest = htons(dport);
>  	uh->len = htons(udplen);
> -	csum_udp6(uh, src, dst, in, len);
> +	uh->check = 0;
> +	sum = proto_ipv6_header_psum(ip6h, IPPROTO_UDP);
> +	sum = csum_unfolded(uh, sizeof(struct udphdr), sum);
> +	uh->check = csum(in, len, sum);

I think it would still be good to have a single-call helper for the
UDP checksums since we need them in two places: here for the "slow
path" used by DHCP etc. and then in udp.c for the "fast path".

>  	memcpy(data, in, len);
>  
>  	if (tap_send(c, buf, len + (data - buf)) < 1)
> diff --git a/tcp.c b/tcp.c
> index 35e240f4ffc3..6a0020f708c0 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -938,39 +938,25 @@ static void tcp_sock_set_bufsize(const struct ctx *c, int s)
>   * tcp_update_check_tcp4() - Update TCP checksum from stored one
>   * @buf:	L2 packet buffer with final IPv4 header
>   */
> -static void tcp_update_check_tcp4(struct tcp4_l2_buf_t *buf)
> +static uint16_t tcp_update_check_tcp4(struct iphdr *iph)
>  {
> -	uint16_t tlen = ntohs(buf->iph.tot_len) - 20;
> -	uint32_t sum = htons(IPPROTO_TCP);
> +	struct tcphdr *th = (struct tcphdr *)(iph + 1);
> +	uint16_t tlen = ntohs(iph->tot_len) - 20;
> +	uint32_t sum = proto_ipv4_header_psum(iph, IPPROTO_TCP);
>  
> -	sum += (buf->iph.saddr >> 16) & 0xffff;
> -	sum += buf->iph.saddr & 0xffff;
> -	sum += (buf->iph.daddr >> 16) & 0xffff;
> -	sum += buf->iph.daddr & 0xffff;
> -	sum += htons(ntohs(buf->iph.tot_len) - 20);
> -
> -	buf->th.check = 0;
> -	buf->th.check = csum(&buf->th, tlen, sum);
> +	return csum(th, tlen, sum);
>  }
>  
>  /**
>   * tcp_update_check_tcp6() - Calculate TCP checksum for IPv6
>   * @buf:	L2 packet buffer with final IPv6 header
>   */
> -static void tcp_update_check_tcp6(struct tcp6_l2_buf_t *buf)
> +static uint16_t tcp_update_check_tcp6(struct ipv6hdr *ip6h)
>  {
> -	int len = ntohs(buf->ip6h.payload_len) + sizeof(struct ipv6hdr);
> -
> -	buf->ip6h.hop_limit = IPPROTO_TCP;
> -	buf->ip6h.version = 0;
> -	buf->ip6h.nexthdr = 0;
> +	struct tcphdr *th = (struct tcphdr *)(ip6h + 1);
> +	uint32_t sum = proto_ipv6_header_psum(ip6h, IPPROTO_TCP);
>  
> -	buf->th.check = 0;
> -	buf->th.check = csum(&buf->ip6h, len, 0);
> -
> -	buf->ip6h.hop_limit = 255;
> -	buf->ip6h.version = 6;
> -	buf->ip6h.nexthdr = IPPROTO_TCP;
> +	return csum(th, ntohs(ip6h->payload_len), sum);
>  }
>  
>  /**
> @@ -1380,7 +1366,8 @@ do {									\
>  
>  		SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq);
>  
> -		tcp_update_check_tcp4(b);
> +		b->th.check = 0;

I think this initialisation should be folded into
tcp_update_check_tcp4().  Otherwise th.check == 0 is a pretty
non-obvious pre-condition for that function.

> +		b->th.check = tcp_update_check_tcp4(&b->iph);
>  
>  		tlen = tap_iov_len(c, &b->taph, ip_len);
>  	} else {
> @@ -1399,7 +1386,12 @@ do {									\
>  
>  		SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq);
>  
> -		tcp_update_check_tcp6(b);
> +		b->th.check = 0;

Same for v6, of course.

> +		b->th.check = tcp_update_check_tcp6(&b->ip6h);
> +
> +		b->ip6h.hop_limit = 255;
> +		b->ip6h.version = 6;
> +		b->ip6h.nexthdr = IPPROTO_TCP;
>  
>  		b->ip6h.flow_lbl[0] = (conn->sock >> 16) & 0xf;
>  		b->ip6h.flow_lbl[1] = (conn->sock >> 8) & 0xff;
> diff --git a/udp.c b/udp.c
> index e645c800a823..bf24288d5751 100644
> --- a/udp.c
> +++ b/udp.c
> @@ -618,6 +618,9 @@ static size_t udp_update_hdr4(const struct ctx *c, int n, in_port_t dstport,

Hmm.. pre-existing bug(?) but udp_update_hdr4() should probably
respect the UDP4_REAL_CHECKSUMS option as well.  Using a common helper
for there and tap_udp4_send() which checks it would be nice.

>   *
>   * Return: size of tap frame with headers
>   */
> +#pragma GCC diagnostic push
> +/* ignore unaligned pointer value warning for &b->ip6h */
> +#pragma GCC diagnostic ignored "-Waddress-of-packed-member"
>  static size_t udp_update_hdr6(const struct ctx *c, int n, in_port_t dstport,
>  			      const struct timespec *now)
>  {
> @@ -673,16 +676,16 @@ static size_t udp_update_hdr6(const struct ctx *c, int n, in_port_t dstport,
>  	b->uh.source = b->s_in6.sin6_port;
>  	b->uh.dest = htons(dstport);
>  	b->uh.len = b->ip6h.payload_len;
> -
> -	b->ip6h.hop_limit = IPPROTO_UDP;
> -	b->ip6h.version = b->ip6h.nexthdr = b->uh.check = 0;
> -	b->uh.check = csum(&b->ip6h, ip_len, 0);
> +	b->uh.check = 0;
> +	b->uh.check = csum(&b->uh, ntohs(b->ip6h.payload_len),
> +			   proto_ipv6_header_psum(&b->ip6h, IPPROTO_UDP));
>  	b->ip6h.version = 6;
>  	b->ip6h.nexthdr = IPPROTO_UDP;
>  	b->ip6h.hop_limit = 255;
>  
>  	return tap_iov_len(c, &b->taph, ip_len);
>  }
> +#pragma GCC diagnostic pop
>  
>  /**
>   * udp_tap_send() - Prepare UDP datagrams and send to tap interface

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 8/8] tap: make tap_update_mac() generic
  2024-02-14  8:56 ` [PATCH v2 8/8] tap: make tap_update_mac() generic Laurent Vivier
@ 2024-02-15  3:13   ` David Gibson
  0 siblings, 0 replies; 28+ messages in thread
From: David Gibson @ 2024-02-15  3:13 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 3635 bytes --]

On Wed, Feb 14, 2024 at 09:56:28AM +0100, Laurent Vivier wrote:
> Use ethhdr rather than tap_hdr.
> 
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
> 
> Notes:
>     v2:
>       - update function comment
>       - move the patch earlier in the series
> 
>  tap.c | 10 +++++-----
>  tap.h |  2 +-
>  tcp.c |  8 ++++----
>  udp.c |  4 ++--
>  4 files changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/tap.c b/tap.c
> index 02b51100d089..9ffb0f0a88d4 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -457,18 +457,18 @@ size_t tap_send_frames(const struct ctx *c, const struct iovec *iov, size_t n)
>  }
>  
>  /**
> - * tap_update_mac() - Update tap L2 header with new Ethernet addresses
> - * @taph:	Tap headers to update
> + * eth_update_mac() - Update tap L2 header with new Ethernet addresses
> + * @eh:		Ethernet headers to update
>   * @eth_d:	Ethernet destination address, NULL if unchanged
>   * @eth_s:	Ethernet source address, NULL if unchanged
>   */
> -void tap_update_mac(struct tap_hdr *taph,
> +void eth_update_mac(struct ethhdr *eh,
>  		    const unsigned char *eth_d, const unsigned char *eth_s)
>  {
>  	if (eth_d)
> -		memcpy(taph->eh.h_dest, eth_d, sizeof(taph->eh.h_dest));
> +		memcpy(eh->h_dest, eth_d, sizeof(eh->h_dest));
>  	if (eth_s)
> -		memcpy(taph->eh.h_source, eth_s, sizeof(taph->eh.h_source));
> +		memcpy(eh->h_source, eth_s, sizeof(eh->h_source));
>  }
>  
>  PACKET_POOL_DECL(pool_l4, UIO_MAXIOV, pkt_buf);
> diff --git a/tap.h b/tap.h
> index 466d91466c3d..437b9aa2b43f 100644
> --- a/tap.h
> +++ b/tap.h
> @@ -74,7 +74,7 @@ void tap_icmp6_send(const struct ctx *c,
>  		    const void *in, size_t len);
>  int tap_send(const struct ctx *c, const void *data, size_t len);
>  size_t tap_send_frames(const struct ctx *c, const struct iovec *iov, size_t n);
> -void tap_update_mac(struct tap_hdr *taph,
> +void eth_update_mac(struct ethhdr *eh,
>  		    const unsigned char *eth_d, const unsigned char *eth_s);
>  void tap_listen_handler(struct ctx *c, uint32_t events);
>  void tap_handler_pasta(struct ctx *c, uint32_t events,
> diff --git a/tcp.c b/tcp.c
> index 6a0020f708c0..1c80299111f3 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -974,10 +974,10 @@ void tcp_update_l2_buf(const unsigned char *eth_d, const unsigned char *eth_s)
>  		struct tcp4_l2_buf_t *b4 = &tcp4_l2_buf[i];
>  		struct tcp6_l2_buf_t *b6 = &tcp6_l2_buf[i];
>  
> -		tap_update_mac(&b4->taph, eth_d, eth_s);
> -		tap_update_mac(&b6->taph, eth_d, eth_s);
> -		tap_update_mac(&b4f->taph, eth_d, eth_s);
> -		tap_update_mac(&b6f->taph, eth_d, eth_s);
> +		eth_update_mac(&b4->taph.eh, eth_d, eth_s);
> +		eth_update_mac(&b6->taph.eh, eth_d, eth_s);
> +		eth_update_mac(&b4f->taph.eh, eth_d, eth_s);
> +		eth_update_mac(&b6f->taph.eh, eth_d, eth_s);
>  	}
>  }
>  
> diff --git a/udp.c b/udp.c
> index bf24288d5751..97c1292f6b59 100644
> --- a/udp.c
> +++ b/udp.c
> @@ -283,8 +283,8 @@ void udp_update_l2_buf(const unsigned char *eth_d, const unsigned char *eth_s)
>  		struct udp4_l2_buf_t *b4 = &udp4_l2_buf[i];
>  		struct udp6_l2_buf_t *b6 = &udp6_l2_buf[i];
>  
> -		tap_update_mac(&b4->taph, eth_d, eth_s);
> -		tap_update_mac(&b6->taph, eth_d, eth_s);
> +		eth_update_mac(&b4->taph.eh, eth_d, eth_s);
> +		eth_update_mac(&b6->taph.eh, eth_d, eth_s);
>  	}
>  }
>  

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/8] iov: add some functions to manage iovec
  2024-02-15  0:24   ` David Gibson
  2024-02-15  0:32     ` David Gibson
@ 2024-02-16  5:29     ` Stefano Brivio
  1 sibling, 0 replies; 28+ messages in thread
From: Stefano Brivio @ 2024-02-16  5:29 UTC (permalink / raw)
  To: David Gibson; +Cc: Laurent Vivier, passt-dev

On Thu, 15 Feb 2024 11:24:38 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Wed, Feb 14, 2024 at 09:56:21AM +0100, Laurent Vivier wrote:
> >
> > [...]
> >
> > + * Author: Laurent Vivier <lvivier@redhat.com>
> > + *
> > + * This file also contains code originally from QEMU include/qemu/iov.h
> > + * and licensed under the following terms:
> > + *
> > + * Copyright (C) 2010 Red Hat, Inc.
> > + *
> > + * Author(s):
> > + *  Amit Shah <amit.shah@redhat.com>
> > + *  Michael Tokarev <mjt@tls.msk.ru>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2.  See
> > + * the COPYING file in the top-level directory.
> > + *
> > + * Contributions after 2012-01-13 are licensed under the terms of the
> > + * GNU GPL, version 2 or (at your option) any later version.  
> 
> The description of the provenance of the code and its authorship is
> useful.  I don't think the second copyright notice is useful in this
> case, since it's also Red Hat, like the first.  Likewise, I don't
> think the GPL invocation is useful, since we're not changing that
> license.

That's simply a full quote of the original terms. It's not required for
any purpose, but I think it's more convenient to just quote as-is
rather than editing bits outs of it.

-- 
Stefano


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/8] pcap: add pcap_iov()
  2024-02-14  8:56 ` [PATCH v2 2/8] pcap: add pcap_iov() Laurent Vivier
  2024-02-15  0:35   ` David Gibson
@ 2024-02-16  5:30   ` Stefano Brivio
  1 sibling, 0 replies; 28+ messages in thread
From: Stefano Brivio @ 2024-02-16  5:30 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: passt-dev

On Wed, 14 Feb 2024 09:56:22 +0100
Laurent Vivier <lvivier@redhat.com> wrote:

> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> ---
> 
> Notes:
>     v2:
>       - introduce pcap_header(), a common helper to write
>         packet header
>       - use writev() rather than write() in a loop
>       - add functions comment
> 
>  pcap.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++-------
>  pcap.h |  1 +
>  2 files changed, 55 insertions(+), 7 deletions(-)
> 
> diff --git a/pcap.c b/pcap.c
> index 501d52d4992b..3869a403dd0f 100644
> --- a/pcap.c
> +++ b/pcap.c
> @@ -20,6 +20,7 @@
>  #include <sys/time.h>
>  #include <sys/types.h>
>  #include <sys/stat.h>
> +#include <sys/uio.h>
>  #include <fcntl.h>
>  #include <time.h>
>  #include <errno.h>
> @@ -31,6 +32,7 @@
>  #include "util.h"
>  #include "passt.h"
>  #include "log.h"
> +#include "iov.h"
>  
>  #define PCAP_VERSION_MINOR 4
>  
> @@ -65,6 +67,28 @@ struct pcap_pkthdr {
>  	uint32_t len;
>  };
>  
> +/*
> + * pcap_header - Write a pcap packet header to the pcap file descriptor (pcap_fd).

Nit: pcap_header(). And "(pcap_fd)" doesn't seem to be a valid
reference (anymore?).

> + *
> + * @len:       Length of the packet data.
> + * @tv:        Pointer to a timeval struct containing the timestamp for the packet.
> + *
> + * Returns; -1 in case of error, otherwise, 0 to indicate success.

"Return: -1" ...I know, it's wrong in pcap_frame().

-- 
Stefano


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 6/8] checksum: use csum_ip4_header() in udp.c and tcp.c
  2024-02-14  8:56 ` [PATCH v2 6/8] checksum: use csum_ip4_header() in udp.c and tcp.c Laurent Vivier
  2024-02-15  2:51   ` David Gibson
@ 2024-02-16  9:08   ` Stefano Brivio
  2024-02-16 14:17     ` Laurent Vivier
  1 sibling, 1 reply; 28+ messages in thread
From: Stefano Brivio @ 2024-02-16  9:08 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: passt-dev

On Wed, 14 Feb 2024 09:56:26 +0100
Laurent Vivier <lvivier@redhat.com> wrote:

> We can find the same function to compute the IPv4 header
> checksum in tcp.c, udp.c and tap.c
> 
> Use the function defined for tap.c, csum_ip4_header(), but
> with the code used in tcp.c and udp.c as it doesn't need a fully
> initialiazed IPv4 header, only protocol, tot_len, saddr and daddr.
> 
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> ---
> 
> Notes:
>     v2:
>       - use csum_ip4_header() from checksum.c
>       - use code from tcp.c and udp.c in csum_ip4_header()
>       - use "const struct iphfr *", check is not updated by the
>         function but by the caller.
> 
>  checksum.c | 16 ++++++++++++----
>  checksum.h |  2 +-
>  tap.c      |  2 +-
>  tcp.c      | 22 +---------------------
>  udp.c      | 23 +++++------------------
>  5 files changed, 20 insertions(+), 45 deletions(-)
> 
> diff --git a/checksum.c b/checksum.c
> index ac2bc49f7eb0..5613187a1c82 100644
> --- a/checksum.c
> +++ b/checksum.c
> @@ -57,6 +57,7 @@
>  #include <linux/icmpv6.h>
>  
>  #include "util.h"
> +#include "ip.h"
>  
>  /* Checksums are optional for UDP over IPv4, so we usually just set
>   * them to 0.  Change this to 1 to calculate real UDP over IPv4
> @@ -115,13 +116,20 @@ uint16_t csum_fold(uint32_t sum)
>  uint16_t csum(const void *buf, size_t len, uint32_t init);
>  
>  /**
> - * csum_ip4_header() - Calculate and set IPv4 header checksum
> + * csum_ip4_header() - Calculate IPv4 header checksum
>   * @ip4h:	IPv4 header
>   */
> -void csum_ip4_header(struct iphdr *ip4h)
> +uint16_t csum_ip4_header(const struct iphdr *ip4h)
>  {
> -	ip4h->check = 0;
> -	ip4h->check = csum(ip4h, (size_t)ip4h->ihl * 4, 0);
> +	uint32_t sum = L2_BUF_IP4_PSUM(ip4h->protocol);
> +
> +	sum += ip4h->tot_len;
> +	sum += (ip4h->saddr >> 16) & 0xffff;
> +	sum += ip4h->saddr & 0xffff;
> +	sum += (ip4h->daddr >> 16) & 0xffff;
> +	sum += ip4h->daddr & 0xffff;
> +
> +	return ~csum_fold(sum);
>  }
>  
>  /**
> diff --git a/checksum.h b/checksum.h
> index 6a20297a5826..b87ecd720df5 100644
> --- a/checksum.h
> +++ b/checksum.h
> @@ -13,7 +13,7 @@ struct icmp6hdr;
>  uint32_t sum_16b(const void *buf, size_t len);
>  uint16_t csum_fold(uint32_t sum);
>  uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init);
> -void csum_ip4_header(struct iphdr *ip4h);
> +uint16_t csum_ip4_header(const struct iphdr *ip4h);
>  void csum_udp4(struct udphdr *udp4hr,
>  	       struct in_addr saddr, struct in_addr daddr,
>  	       const void *payload, size_t len);
> diff --git a/tap.c b/tap.c
> index 3ea03f720d6d..70f36a55314f 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -160,7 +160,7 @@ static void *tap_push_ip4h(char *buf, struct in_addr src, struct in_addr dst,
>  	ip4h->protocol = proto;
>  	ip4h->saddr = src.s_addr;
>  	ip4h->daddr = dst.s_addr;
> -	csum_ip4_header(ip4h);
> +	ip4h->check = csum_ip4_header(ip4h);
>  	return ip4h + 1;
>  }
>  
> diff --git a/tcp.c b/tcp.c
> index 45ef5146729a..35e240f4ffc3 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -934,23 +934,6 @@ static void tcp_sock_set_bufsize(const struct ctx *c, int s)
>  		trace("TCP: failed to set SO_SNDBUF to %i", v);
>  }
>  
> -/**
> - * tcp_update_check_ip4() - Update IPv4 with variable parts from stored one
> - * @buf:	L2 packet buffer with final IPv4 header
> - */
> -static void tcp_update_check_ip4(struct tcp4_l2_buf_t *buf)
> -{
> -	uint32_t sum = L2_BUF_IP4_PSUM(IPPROTO_TCP);
> -
> -	sum += buf->iph.tot_len;
> -	sum += (buf->iph.saddr >> 16) & 0xffff;
> -	sum += buf->iph.saddr & 0xffff;
> -	sum += (buf->iph.daddr >> 16) & 0xffff;
> -	sum += buf->iph.daddr & 0xffff;
> -
> -	buf->iph.check = (uint16_t)~csum_fold(sum);
> -}
> -
>  /**
>   * tcp_update_check_tcp4() - Update TCP checksum from stored one
>   * @buf:	L2 packet buffer with final IPv4 header
> @@ -1393,10 +1376,7 @@ do {									\
>  		b->iph.saddr = a4->s_addr;
>  		b->iph.daddr = c->ip4.addr_seen.s_addr;
>  
> -		if (check)
> -			b->iph.check = *check;
> -		else
> -			tcp_update_check_ip4(b);
> +		b->iph.check = check ? *check : csum_ip4_header(&b->iph);
>  
>  		SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq);
>  
> diff --git a/udp.c b/udp.c
> index d514c864ab5b..e645c800a823 100644
> --- a/udp.c
> +++ b/udp.c
> @@ -270,23 +270,6 @@ static void udp_invert_portmap(struct udp_port_fwd *fwd)
>  	}
>  }
>  
> -/**
> - * udp_update_check4() - Update checksum with variable parts from stored one
> - * @buf:	L2 packet buffer with final IPv4 header
> - */
> -static void udp_update_check4(struct udp4_l2_buf_t *buf)
> -{
> -	uint32_t sum = L2_BUF_IP4_PSUM(IPPROTO_UDP);
> -
> -	sum += buf->iph.tot_len;
> -	sum += (buf->iph.saddr >> 16) & 0xffff;
> -	sum += buf->iph.saddr & 0xffff;
> -	sum += (buf->iph.daddr >> 16) & 0xffff;
> -	sum += buf->iph.daddr & 0xffff;
> -
> -	buf->iph.check = (uint16_t)~csum_fold(sum);
> -}
> -
>  /**
>   * udp_update_l2_buf() - Update L2 buffers with Ethernet and IPv4 addresses
>   * @eth_d:	Ethernet destination address, NULL if unchanged
> @@ -579,6 +562,9 @@ static void udp_splice_sendfrom(const struct ctx *c, unsigned start, unsigned n,
>   *
>   * Return: size of tap frame with headers
>   */
> +#pragma GCC diagnostic push
> +/* ignore unaligned pointer value warning for &b->iph */
> +#pragma GCC diagnostic ignored "-Waddress-of-packed-member"
>  static size_t udp_update_hdr4(const struct ctx *c, int n, in_port_t dstport,
>  			      const struct timespec *now)
>  {
> @@ -614,13 +600,14 @@ static size_t udp_update_hdr4(const struct ctx *c, int n, in_port_t dstport,
>  		b->iph.saddr = b->s_in.sin_addr.s_addr;
>  	}
>  
> -	udp_update_check4(b);
> +	b->iph.check = csum_ip4_header(&b->iph);

Similar comment as I had on v1: I don't think this is safe.

If &b->iph is, say, 0x2000, it's all fine: when csum_ip4_header() needs
to access, say, ip4h->tot_len, it will dereference 0x2000 and look at
16 bits, 2 bytes into it.

If &b->iph is 0x2001, though, csum_ip4_header() will dereference 0x2001
and, on some architectures, boom.

You need to pass b, or, if possible, to align iph to a 4-bytes boundary.
There's a reason why I implemented it like it is now.

The current version is rather inconvenient and ugly, so it's great if
you manage to improve it this way, but you shouldn't risk dereferencing
unaligned pointers... unless you know for some reason that they are
aligned, of course.

-- 
Stefano


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 6/8] checksum: use csum_ip4_header() in udp.c and tcp.c
  2024-02-16  9:08   ` Stefano Brivio
@ 2024-02-16 14:17     ` Laurent Vivier
  2024-02-16 14:54       ` Stefano Brivio
  0 siblings, 1 reply; 28+ messages in thread
From: Laurent Vivier @ 2024-02-16 14:17 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

On 2/16/24 10:08, Stefano Brivio wrote:
> On Wed, 14 Feb 2024 09:56:26 +0100
> Laurent Vivier <lvivier@redhat.com> wrote:
>
>> ...
>>   /**
>>    * udp_update_l2_buf() - Update L2 buffers with Ethernet and IPv4 addresses
>>    * @eth_d:	Ethernet destination address, NULL if unchanged
>> @@ -579,6 +562,9 @@ static void udp_splice_sendfrom(const struct ctx *c, unsigned start, unsigned n,
>>    *
>>    * Return: size of tap frame with headers
>>    */
>> +#pragma GCC diagnostic push
>> +/* ignore unaligned pointer value warning for &b->iph */
>> +#pragma GCC diagnostic ignored "-Waddress-of-packed-member"
>>   static size_t udp_update_hdr4(const struct ctx *c, int n, in_port_t dstport,
>>   			      const struct timespec *now)
>>   {
>> @@ -614,13 +600,14 @@ static size_t udp_update_hdr4(const struct ctx *c, int n, in_port_t dstport,
>>   		b->iph.saddr = b->s_in.sin_addr.s_addr;
>>   	}
>>   
>> -	udp_update_check4(b);
>> +	b->iph.check = csum_ip4_header(&b->iph);
> Similar comment as I had on v1: I don't think this is safe.
>
> If &b->iph is, say, 0x2000, it's all fine: when csum_ip4_header() needs
> to access, say, ip4h->tot_len, it will dereference 0x2000 and look at
> 16 bits, 2 bytes into it.
>
> If &b->iph is 0x2001, though, csum_ip4_header() will dereference 0x2001
> and, on some architectures, boom.

I don't understand how &b->iph cannot be aligned as b should be aligned and b is defined 
using udp4_l2_buf_t structure with _attribute__ ((packed, aligned(__alignof__(unsigned 
int)))).

Thanks,
Laurent




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 6/8] checksum: use csum_ip4_header() in udp.c and tcp.c
  2024-02-16 14:17     ` Laurent Vivier
@ 2024-02-16 14:54       ` Stefano Brivio
  2024-02-16 18:05         ` Laurent Vivier
  0 siblings, 1 reply; 28+ messages in thread
From: Stefano Brivio @ 2024-02-16 14:54 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: passt-dev

On Fri, 16 Feb 2024 15:17:13 +0100
Laurent Vivier <lvivier@redhat.com> wrote:

> On 2/16/24 10:08, Stefano Brivio wrote:
> > On Wed, 14 Feb 2024 09:56:26 +0100
> > Laurent Vivier <lvivier@redhat.com> wrote:
> >  
> >> ...
> >>   /**
> >>    * udp_update_l2_buf() - Update L2 buffers with Ethernet and IPv4 addresses
> >>    * @eth_d:	Ethernet destination address, NULL if unchanged
> >> @@ -579,6 +562,9 @@ static void udp_splice_sendfrom(const struct ctx *c, unsigned start, unsigned n,
> >>    *
> >>    * Return: size of tap frame with headers
> >>    */
> >> +#pragma GCC diagnostic push
> >> +/* ignore unaligned pointer value warning for &b->iph */
> >> +#pragma GCC diagnostic ignored "-Waddress-of-packed-member"
> >>   static size_t udp_update_hdr4(const struct ctx *c, int n, in_port_t dstport,
> >>   			      const struct timespec *now)
> >>   {
> >> @@ -614,13 +600,14 @@ static size_t udp_update_hdr4(const struct ctx *c, int n, in_port_t dstport,
> >>   		b->iph.saddr = b->s_in.sin_addr.s_addr;
> >>   	}
> >>   
> >> -	udp_update_check4(b);
> >> +	b->iph.check = csum_ip4_header(&b->iph);  
> > Similar comment as I had on v1: I don't think this is safe.
> >
> > If &b->iph is, say, 0x2000, it's all fine: when csum_ip4_header() needs
> > to access, say, ip4h->tot_len, it will dereference 0x2000 and look at
> > 16 bits, 2 bytes into it.
> >
> > If &b->iph is 0x2001, though, csum_ip4_header() will dereference 0x2001
> > and, on some architectures, boom.  
> 
> I don't understand how &b->iph cannot be aligned as b should be aligned and b is defined 
> using udp4_l2_buf_t structure with _attribute__ ((packed, aligned(__alignof__(unsigned 
> int)))).

That's because of the size of struct tap_hdr (18 bytes). On, at least,
x86_64, armhf, and i686:

  $ pahole passt

  [...]

  struct udp4_l2_buf_t {
          struct sockaddr_in         s_in;                 /*     0    16 */
          struct tap_hdr             taph;                 /*    16    18 */
          struct iphdr               iph;                  /*    34    20 */

  [...]

...we could align the start of 'taph' by adding 2 bytes of padding before
it, note that the size of struct sockaddr_in doesn't depend on the
architecture. But then you can't dereference 'taph', which is probably
even worse.

-- 
Stefano


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 6/8] checksum: use csum_ip4_header() in udp.c and tcp.c
  2024-02-16 14:54       ` Stefano Brivio
@ 2024-02-16 18:05         ` Laurent Vivier
  2024-02-16 18:24           ` Stefano Brivio
  0 siblings, 1 reply; 28+ messages in thread
From: Laurent Vivier @ 2024-02-16 18:05 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

On 2/16/24 15:54, Stefano Brivio wrote:
> On Fri, 16 Feb 2024 15:17:13 +0100
> Laurent Vivier <lvivier@redhat.com> wrote:
>
>> On 2/16/24 10:08, Stefano Brivio wrote:
>>> On Wed, 14 Feb 2024 09:56:26 +0100
>>> Laurent Vivier <lvivier@redhat.com> wrote:
>>>   
>>>> ...
>>>>    /**
>>>>     * udp_update_l2_buf() - Update L2 buffers with Ethernet and IPv4 addresses
>>>>     * @eth_d:	Ethernet destination address, NULL if unchanged
>>>> @@ -579,6 +562,9 @@ static void udp_splice_sendfrom(const struct ctx *c, unsigned start, unsigned n,
>>>>     *
>>>>     * Return: size of tap frame with headers
>>>>     */
>>>> +#pragma GCC diagnostic push
>>>> +/* ignore unaligned pointer value warning for &b->iph */
>>>> +#pragma GCC diagnostic ignored "-Waddress-of-packed-member"
>>>>    static size_t udp_update_hdr4(const struct ctx *c, int n, in_port_t dstport,
>>>>    			      const struct timespec *now)
>>>>    {
>>>> @@ -614,13 +600,14 @@ static size_t udp_update_hdr4(const struct ctx *c, int n, in_port_t dstport,
>>>>    		b->iph.saddr = b->s_in.sin_addr.s_addr;
>>>>    	}
>>>>    
>>>> -	udp_update_check4(b);
>>>> +	b->iph.check = csum_ip4_header(&b->iph);
>>> Similar comment as I had on v1: I don't think this is safe.
>>>
>>> If &b->iph is, say, 0x2000, it's all fine: when csum_ip4_header() needs
>>> to access, say, ip4h->tot_len, it will dereference 0x2000 and look at
>>> 16 bits, 2 bytes into it.
>>>
>>> If &b->iph is 0x2001, though, csum_ip4_header() will dereference 0x2001
>>> and, on some architectures, boom.
>> I don't understand how &b->iph cannot be aligned as b should be aligned and b is defined
>> using udp4_l2_buf_t structure with _attribute__ ((packed, aligned(__alignof__(unsigned
>> int)))).
> That's because of the size of struct tap_hdr (18 bytes). On, at least,
> x86_64, armhf, and i686:
>
>    $ pahole passt
>
>    [...]
>
>    struct udp4_l2_buf_t {
>            struct sockaddr_in         s_in;                 /*     0    16 */
>            struct tap_hdr             taph;                 /*    16    18 */
>            struct iphdr               iph;                  /*    34    20 */
>
>    [...]
>
> ...we could align the start of 'taph' by adding 2 bytes of padding before
> it, note that the size of struct sockaddr_in doesn't depend on the
> architecture. But then you can't dereference 'taph', which is probably
> even worse.
>
So I think in the worst case iph is aligned on 2.

Do you know which architectures don't support this alignment?

Do you know if we will support this architecture?

I think I will send the v3 of my series without fixing that because I don't have enough 
time this week. I will address the problem later.

Thanks,
Laurent


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 6/8] checksum: use csum_ip4_header() in udp.c and tcp.c
  2024-02-16 18:05         ` Laurent Vivier
@ 2024-02-16 18:24           ` Stefano Brivio
  2024-02-17 14:22             ` Laurent Vivier
  0 siblings, 1 reply; 28+ messages in thread
From: Stefano Brivio @ 2024-02-16 18:24 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: passt-dev

On Fri, 16 Feb 2024 19:05:39 +0100
Laurent Vivier <lvivier@redhat.com> wrote:

> On 2/16/24 15:54, Stefano Brivio wrote:
> > On Fri, 16 Feb 2024 15:17:13 +0100
> > Laurent Vivier <lvivier@redhat.com> wrote:
> >  
> >> On 2/16/24 10:08, Stefano Brivio wrote:  
> >>> On Wed, 14 Feb 2024 09:56:26 +0100
> >>> Laurent Vivier <lvivier@redhat.com> wrote:
> >>>     
> >>>> ...
> >>>>    /**
> >>>>     * udp_update_l2_buf() - Update L2 buffers with Ethernet and IPv4 addresses
> >>>>     * @eth_d:	Ethernet destination address, NULL if unchanged
> >>>> @@ -579,6 +562,9 @@ static void udp_splice_sendfrom(const struct ctx *c, unsigned start, unsigned n,
> >>>>     *
> >>>>     * Return: size of tap frame with headers
> >>>>     */
> >>>> +#pragma GCC diagnostic push
> >>>> +/* ignore unaligned pointer value warning for &b->iph */
> >>>> +#pragma GCC diagnostic ignored "-Waddress-of-packed-member"
> >>>>    static size_t udp_update_hdr4(const struct ctx *c, int n, in_port_t dstport,
> >>>>    			      const struct timespec *now)
> >>>>    {
> >>>> @@ -614,13 +600,14 @@ static size_t udp_update_hdr4(const struct ctx *c, int n, in_port_t dstport,
> >>>>    		b->iph.saddr = b->s_in.sin_addr.s_addr;
> >>>>    	}
> >>>>    
> >>>> -	udp_update_check4(b);
> >>>> +	b->iph.check = csum_ip4_header(&b->iph);  
> >>> Similar comment as I had on v1: I don't think this is safe.
> >>>
> >>> If &b->iph is, say, 0x2000, it's all fine: when csum_ip4_header() needs
> >>> to access, say, ip4h->tot_len, it will dereference 0x2000 and look at
> >>> 16 bits, 2 bytes into it.
> >>>
> >>> If &b->iph is 0x2001, though, csum_ip4_header() will dereference 0x2001
> >>> and, on some architectures, boom.  
> >> I don't understand how &b->iph cannot be aligned as b should be aligned and b is defined
> >> using udp4_l2_buf_t structure with _attribute__ ((packed, aligned(__alignof__(unsigned
> >> int)))).  
> > That's because of the size of struct tap_hdr (18 bytes). On, at least,
> > x86_64, armhf, and i686:
> >
> >    $ pahole passt
> >
> >    [...]
> >
> >    struct udp4_l2_buf_t {
> >            struct sockaddr_in         s_in;                 /*     0    16 */
> >            struct tap_hdr             taph;                 /*    16    18 */
> >            struct iphdr               iph;                  /*    34    20 */
> >
> >    [...]
> >
> > ...we could align the start of 'taph' by adding 2 bytes of padding before
> > it, note that the size of struct sockaddr_in doesn't depend on the
> > architecture. But then you can't dereference 'taph', which is probably
> > even worse.
> >  
> So I think in the worst case iph is aligned on 2.

...in every case, actually.

> Do you know which architectures don't support this alignment?

I couldn't find a table, from experience / memory it's not a good idea
to do this especially on several MIPS flavours and 32-bit ARM. From a
kernel tree:

  $ grep -rn "select HAVE_EFFICIENT_UNALIGNED_ACCESS" arch/
  arch/arc/Kconfig:352:	select HAVE_EFFICIENT_UNALIGNED_ACCESS
  arch/x86/Kconfig:216:	select HAVE_EFFICIENT_UNALIGNED_ACCESS
  arch/arm64/Kconfig:204:	select HAVE_EFFICIENT_UNALIGNED_ACCESS
  arch/s390/Kconfig:174:	select HAVE_EFFICIENT_UNALIGNED_ACCESS
  arch/loongarch/Kconfig:114:	select HAVE_EFFICIENT_UNALIGNED_ACCESS if !ARCH_STRICT_ALIGN
  arch/powerpc/Kconfig:237:	select HAVE_EFFICIENT_UNALIGNED_ACCESS
  arch/m68k/Kconfig:30:	select HAVE_EFFICIENT_UNALIGNED_ACCESS if !CPU_HAS_NO_UNALIGNED
  arch/arm/Kconfig:98:	select HAVE_EFFICIENT_UNALIGNED_ACCESS if (CPU_V6 || CPU_V6K || CPU_V7) && MMU

these are the architectures on which, at least under some conditions or
on some CPUs, unaligned access are generally okay. It could be
problematic on everything else (again, from my experience, it will
actually be).

> Do you know if we will support this architecture?

I think we should try to be nice to all architectures currently
supported by the Linux kernel. We have some tests for a number of
architectures (currently disabled, but I give some a run from time to
time). And Debian packages are built for these architectures:

  https://buildd.debian.org/status/package.php?p=passt

> I think I will send the v3 of my series without fixing that because I don't have enough 
> time this week. I will address the problem later.

No problem! I will also try to spend a moment and see if there's some
reasonable solution I can suggest. Thanks,

-- 
Stefano


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 6/8] checksum: use csum_ip4_header() in udp.c and tcp.c
  2024-02-16 18:24           ` Stefano Brivio
@ 2024-02-17 14:22             ` Laurent Vivier
  2024-02-17 14:37               ` Stefano Brivio
  2024-02-19  2:06               ` David Gibson
  0 siblings, 2 replies; 28+ messages in thread
From: Laurent Vivier @ 2024-02-17 14:22 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

On 2/16/24 19:24, Stefano Brivio wrote:
> On Fri, 16 Feb 2024 19:05:39 +0100
> Laurent Vivier <lvivier@redhat.com> wrote:
>
> ...
>> I think I will send the v3 of my series without fixing that because I don't have enough
>> time this week. I will address the problem later.
> No problem! I will also try to spend a moment and see if there's some
> reasonable solution I can suggest. Thanks,
>
I can imagine 4 solutions:

  * to use inline functions (could it helps the compiler to manage the alignment problem?)
  * to use C macros
  * to use these new functions only with vhost-user as we know pointers will be aligned.
  * to include structure we want to address in a generic wrapperstructure that will
    unalign it as it is done with the current structure.

Thanks,
Laurent


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 6/8] checksum: use csum_ip4_header() in udp.c and tcp.c
  2024-02-17 14:22             ` Laurent Vivier
@ 2024-02-17 14:37               ` Stefano Brivio
  2024-02-19  2:06               ` David Gibson
  1 sibling, 0 replies; 28+ messages in thread
From: Stefano Brivio @ 2024-02-17 14:37 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: passt-dev

On Sat, 17 Feb 2024 15:22:12 +0100
Laurent Vivier <lvivier@redhat.com> wrote:

> On 2/16/24 19:24, Stefano Brivio wrote:
> > On Fri, 16 Feb 2024 19:05:39 +0100
> > Laurent Vivier <lvivier@redhat.com> wrote:
> >
> > ...  
> >> I think I will send the v3 of my series without fixing that because I don't have enough
> >> time this week. I will address the problem later.  
> > No problem! I will also try to spend a moment and see if there's some
> > reasonable solution I can suggest. Thanks,
> >  
> I can imagine 4 solutions:
> 
>   * to use inline functions (could it helps the compiler to manage the alignment problem?)

I guess in practice yes, but it could be formally complicated for a
compiler to make sure no instructions dereferencing those pointers will
be emitted, plus this is on the packet path and if the compiler decides
to *not* inline, we shouldn't force that.

>   * to use C macros

I'm not sure exactly how, I have some vague idea of what you might
mean, it could be quite awkward though.

>   * to use these new functions only with vhost-user as we know pointers will be aligned.

This is quite unlikely to help: the problem is that 802.3 (Ethernet)
frame headers are (without VLANs) 14 bytes.

If you align the start of the frame, and we need those frames (and
pointers to them) whenever we talk Layer-2, the rest can't be aligned
to 4-bytes boundary.

>   * to include structure we want to address in a generic wrapperstructure that will
>     unalign it as it is done with the current structure.

This sounds like the easiest and safest way to me. Note that pointers
to 'taph' can be happily dereferenced, too. You can pass around
pointers to that, instead of using 'iph'.

I used (almost everywhere?) the start of the buffer, but 'taph' is fine
as well.

-- 
Stefano

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 6/8] checksum: use csum_ip4_header() in udp.c and tcp.c
  2024-02-17 14:22             ` Laurent Vivier
  2024-02-17 14:37               ` Stefano Brivio
@ 2024-02-19  2:06               ` David Gibson
  1 sibling, 0 replies; 28+ messages in thread
From: David Gibson @ 2024-02-19  2:06 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: Stefano Brivio, passt-dev

[-- Attachment #1: Type: text/plain, Size: 1364 bytes --]

On Sat, Feb 17, 2024 at 03:22:12PM +0100, Laurent Vivier wrote:
> On 2/16/24 19:24, Stefano Brivio wrote:
> > On Fri, 16 Feb 2024 19:05:39 +0100
> > Laurent Vivier <lvivier@redhat.com> wrote:
> > 
> > ...
> > > I think I will send the v3 of my series without fixing that because I don't have enough
> > > time this week. I will address the problem later.
> > No problem! I will also try to spend a moment and see if there's some
> > reasonable solution I can suggest. Thanks,
> > 
> I can imagine 4 solutions:
> 
>  * to use inline functions (could it helps the compiler to manage the alignment problem?)
>  * to use C macros
>  * to use these new functions only with vhost-user as we know pointers will be aligned.
>  * to include structure we want to address in a generic wrapperstructure that will
>    unalign it as it is done with the current structure.

I think some of my earlier comments suggested passing some values,
rather than reading them from the iph - this would take us closer to
the "feed" style of csum calculation that we already use for siphash.
As a side effect, I think that will sidestep at least some of these
problems.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2024-02-19  2:07 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-14  8:56 [PATCH v2 0/8] Add vhost-user support to passt (part 1) Laurent Vivier
2024-02-14  8:56 ` [PATCH v2 1/8] iov: add some functions to manage iovec Laurent Vivier
2024-02-15  0:24   ` David Gibson
2024-02-15  0:32     ` David Gibson
2024-02-16  5:29     ` Stefano Brivio
2024-02-14  8:56 ` [PATCH v2 2/8] pcap: add pcap_iov() Laurent Vivier
2024-02-15  0:35   ` David Gibson
2024-02-16  5:30   ` Stefano Brivio
2024-02-14  8:56 ` [PATCH v2 3/8] checksum: align buffers Laurent Vivier
2024-02-15  0:40   ` David Gibson
2024-02-14  8:56 ` [PATCH v2 4/8] checksum: add csum_iov() Laurent Vivier
2024-02-15  0:44   ` David Gibson
2024-02-14  8:56 ` [PATCH v2 5/8] util: move IP stuff from util.[ch] to ip.[ch] Laurent Vivier
2024-02-15  2:29   ` David Gibson
2024-02-14  8:56 ` [PATCH v2 6/8] checksum: use csum_ip4_header() in udp.c and tcp.c Laurent Vivier
2024-02-15  2:51   ` David Gibson
2024-02-16  9:08   ` Stefano Brivio
2024-02-16 14:17     ` Laurent Vivier
2024-02-16 14:54       ` Stefano Brivio
2024-02-16 18:05         ` Laurent Vivier
2024-02-16 18:24           ` Stefano Brivio
2024-02-17 14:22             ` Laurent Vivier
2024-02-17 14:37               ` Stefano Brivio
2024-02-19  2:06               ` David Gibson
2024-02-14  8:56 ` [PATCH v2 7/8] checksum: introduce functions to compute the header part checksum for TCP/UDP Laurent Vivier
2024-02-15  3:12   ` David Gibson
2024-02-14  8:56 ` [PATCH v2 8/8] tap: make tap_update_mac() generic Laurent Vivier
2024-02-15  3:13   ` David Gibson

Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).