[PATCH v3 0/9] Add vhost-user support to passt (part 1)

public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed

* [PATCH v3 0/9] Add vhost-user support to passt (part 1)
@ 2024-02-17 15:07 Laurent Vivier
  2024-02-17 15:07 ` [PATCH v3 1/9] iov: add some functions to manage iovec Laurent Vivier
                   ` (9 more replies)
  0 siblings, 10 replies; 38+ messages in thread
From: Laurent Vivier @ 2024-02-17 15:07 UTC (permalink / raw)
  To: passt-dev; +Cc: Laurent Vivier

v3:
  - add a patch that has been extracted from:
    "tcp: extract buffer management from tcp_send_flag()"
    -> "tcp: Introduce ipv4_fill_headers()/ipv6_fill_headers()"
  - see detailed v3 history log in each patch
  - I didn't address the alignment problem when we provide a pointer
    to a sub-structure in the internal buffer structure.
    (for the last patches of the series).

v2 comparing to vhost-user full part:
  - part 1 includes only preliminary patches (checksum, iovec, cleanup)
  - see detailed v2 history log in each patch.

Full series v1 available at:

  [PATCH 00/24] Add vhost-user support to passt.
  https://url.corp.redhat.com/passt-vhost-user-v1

Thanks,
Laurent

Laurent Vivier (9):
  iov: add some functions to manage iovec
  pcap: add pcap_iov()
  checksum: align buffers
  checksum: add csum_iov()
  util: move IP stuff from util.[ch] to ip.[ch]
  checksum: use csum_ip4_header() in udp.c and tcp.c
  checksum: introduce functions to compute the header part checksum for
    TCP/UDP
  tap: make tap_update_mac() generic
  tcp: Introduce ipv4_fill_headers()/ipv6_fill_headers()

 Makefile     |  12 +--
 checksum.c   | 171 ++++++++++++++++++++++++++++++-----------
 checksum.h   |   9 ++-
 conf.c       |   1 +
 dhcp.c       |   1 +
 flow.c       |   1 +
 icmp.c       |   1 +
 iov.c        | 171 +++++++++++++++++++++++++++++++++++++++++
 iov.h        |  29 +++++++
 ip.c         |  72 +++++++++++++++++
 ip.h         |  86 +++++++++++++++++++++
 ndp.c        |   1 +
 pcap.c       |  62 +++++++++++++--
 pcap.h       |   1 +
 port_fwd.c   |   1 +
 qrap.c       |   1 +
 tap.c        |  14 ++--
 tap.h        |   2 +-
 tcp.c        | 213 +++++++++++++++++++++++++++++----------------------
 tcp_splice.c |   1 +
 udp.c        |  35 +++------
 util.c       |  55 -------------
 util.h       |  76 ------------------
 23 files changed, 703 insertions(+), 313 deletions(-)
 create mode 100644 iov.c
 create mode 100644 iov.h
 create mode 100644 ip.c
 create mode 100644 ip.h

-- 
2.42.0


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v3 1/9] iov: add some functions to manage iovec
  2024-02-17 15:07 [PATCH v3 0/9] Add vhost-user support to passt (part 1) Laurent Vivier
@ 2024-02-17 15:07 ` Laurent Vivier
  2024-02-19  2:45   ` David Gibson
  2024-02-17 15:07 ` [PATCH v3 2/9] pcap: add pcap_iov() Laurent Vivier
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 38+ messages in thread
From: Laurent Vivier @ 2024-02-17 15:07 UTC (permalink / raw)
  To: passt-dev; +Cc: Laurent Vivier

Introduce functions to copy to/from a buffer from/to an iovec array,
to compute data length in in bytes of an iovec and to copy memory from
an iovec to another.

iov_from_buf(), iov_to_buf(), iov_size(), iov_copy().

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---

Notes:
    v3:
       - update copyright statement
         (but keep it as proposed by Stefano)
       - open code of iov_fill_from_buf()/iov_fill_to_buf()
         into iov_frombuf()/iov_to_buf()
       - coding style cleanup
       - use size_t for the length of the vectors
    
    v2:
       - reorder added files in alphanetical order in Makefile
       - update comments, cosmetic cleanup
       - rename iov_from_buf_full/iov_to_buf_full to
         iov_fill_from_buf/iov_fill_to_buf
       - split loops that manage offset and bytes copy.
       - move iov_from_buf()/iov_to_buf() to iov.c

 Makefile |   8 +--
 iov.c    | 171 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 iov.h    |  29 ++++++++++
 3 files changed, 204 insertions(+), 4 deletions(-)
 create mode 100644 iov.c
 create mode 100644 iov.h

diff --git a/Makefile b/Makefile
index af4fa87e7e13..156398b3844e 100644
--- a/Makefile
+++ b/Makefile
@@ -45,16 +45,16 @@ FLAGS += -DVERSION=\"$(VERSION)\"
 FLAGS += -DDUAL_STACK_SOCKETS=$(DUAL_STACK_SOCKETS)
 
 PASST_SRCS = arch.c arp.c checksum.c conf.c dhcp.c dhcpv6.c flow.c icmp.c \
-	igmp.c isolation.c lineread.c log.c mld.c ndp.c netlink.c packet.c \
-	passt.c pasta.c pcap.c pif.c port_fwd.c tap.c tcp.c tcp_splice.c udp.c \
-	util.c
+	igmp.c iov.c isolation.c lineread.c log.c mld.c ndp.c netlink.c \
+	packet.c passt.c pasta.c pcap.c pif.c port_fwd.c tap.c tcp.c \
+	tcp_splice.c udp.c util.c
 QRAP_SRCS = qrap.c
 SRCS = $(PASST_SRCS) $(QRAP_SRCS)
 
 MANPAGES = passt.1 pasta.1 qrap.1
 
 PASST_HEADERS = arch.h arp.h checksum.h conf.h dhcp.h dhcpv6.h flow.h \
-	flow_table.h icmp.h inany.h isolation.h lineread.h log.h ndp.h \
+	flow_table.h icmp.h inany.h iov.h isolation.h lineread.h log.h ndp.h \
 	netlink.h packet.h passt.h pasta.h pcap.h pif.h port_fwd.h siphash.h \
 	tap.h tcp.h tcp_conn.h tcp_splice.h udp.h util.h
 HEADERS = $(PASST_HEADERS) seccomp.h
diff --git a/iov.c b/iov.c
new file mode 100644
index 000000000000..1135f87e2f45
--- /dev/null
+++ b/iov.c
@@ -0,0 +1,171 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * iov.h - helpers for using (partial) iovecs.
+ *
+ * Copyright Red Hat
+ * Author: Laurent Vivier <lvivier@redhat.com>
+ *
+ * This file also contains code originally from QEMU include/qemu/iov.h
+ * and licensed under the following terms:
+ *
+ * Copyright (C) 2010 Red Hat, Inc.
+ *
+ * Author(s):
+ *  Amit Shah <amit.shah@redhat.com>
+ *  Michael Tokarev <mjt@tls.msk.ru>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Contributions after 2012-01-13 are licensed under the terms of the
+ * GNU GPL, version 2 or (at your option) any later version.
+ */
+#include <sys/socket.h>
+
+#include "util.h"
+#include "iov.h"
+
+/**
+ * iov_from_buf - Copy data from a buffer to an I/O vector (struct iovec)
+ *                efficiently.
+ *
+ * @iov:       Pointer to the array of struct iovec describing the
+ *             scatter/gather I/O vector.
+ * @iov_cnt:   Number of elements in the iov array.
+ * @offset:    Byte offset in the iov array where copying should start.
+ * @buf:       Pointer to the source buffer containing the data to copy.
+ * @bytes:     Total number of bytes to copy from buf to iov.
+ *
+ * Returns:    The number of bytes successfully copied.
+ */
+size_t iov_from_buf(const struct iovec *iov, size_t iov_cnt,
+		    size_t offset, const void *buf, size_t bytes)
+{
+	unsigned int i;
+	size_t copied;
+
+	if (__builtin_constant_p(bytes) && iov_cnt &&
+		offset <= iov[0].iov_len && bytes <= iov[0].iov_len - offset) {
+		memcpy((char *)iov[0].iov_base + offset, buf, bytes);
+		return bytes;
+	}
+
+	/* skipping offset bytes in the iovec */
+	for (i = 0; offset >= iov[i].iov_len && i < iov_cnt; i++)
+		offset -= iov[i].iov_len;
+
+	/* copying data */
+	for (copied = 0; copied < bytes && i < iov_cnt; i++) {
+		size_t len = MIN(iov[i].iov_len - offset, bytes - copied);
+
+		memcpy((char *)iov[i].iov_base + offset, (char *)buf + copied,
+		       len);
+		copied += len;
+		offset = 0;
+	}
+
+	return copied;
+}
+
+/**
+ * iov_to_buf - Copy data from a scatter/gather I/O vector (struct iovec) to
+ *		a buffer efficiently.
+ *
+ * @iov:       Pointer to the array of struct iovec describing the scatter/gather
+ *             I/O vector.
+ * @iov_cnt:   Number of elements in the iov array.
+ * @offset:    Offset within the first element of iov from where copying should start.
+ * @buf:       Pointer to the destination buffer where data will be copied.
+ * @bytes:     Total number of bytes to copy from iov to buf.
+ *
+ * Returns:    The number of bytes successfully copied.
+ */
+size_t iov_to_buf(const struct iovec *iov, size_t iov_cnt,
+		  size_t offset, void *buf, size_t bytes)
+{
+	unsigned int i;
+	size_t copied;
+
+	if (__builtin_constant_p(bytes) && iov_cnt &&
+		offset <= iov[0].iov_len && bytes <= iov[0].iov_len - offset) {
+		memcpy(buf, (char *)iov[0].iov_base + offset, bytes);
+		return bytes;
+	}
+
+	/* skipping offset bytes in the iovec */
+	for (i = 0; offset >= iov[i].iov_len && i < iov_cnt; i++)
+		offset -= iov[i].iov_len;
+
+	/* copying data */
+	for (copied = 0; copied < bytes && i < iov_cnt; i++) {
+		size_t len = MIN(iov[i].iov_len - offset, bytes - copied);
+		memcpy((char *)buf + copied, (char *)iov[i].iov_base + offset,
+		       len);
+		copied += len;
+		offset = 0;
+	}
+
+	return copied;
+}
+
+/**
+ * iov_size - Calculate the total size of a scatter/gather I/O vector
+ *            (struct iovec).
+ *
+ * @iov:       Pointer to the array of struct iovec describing the
+ *             scatter/gather I/O vector.
+ * @iov_cnt:   Number of elements in the iov array.
+ *
+ * Returns:    The total size in bytes.
+ */
+size_t iov_size(const struct iovec *iov, size_t iov_cnt)
+{
+	unsigned int i;
+	size_t len;
+
+	for (i = 0, len = 0; i < iov_cnt; i++)
+		len += iov[i].iov_len;
+
+	return len;
+}
+
+/**
+ * iov_copy - Copy data from one scatter/gather I/O vector (struct iovec) to
+ *            another.
+ *
+ * @dst_iov:      Pointer to the destination array of struct iovec describing
+ *                the scatter/gather I/O vector to copy to.
+ * @dst_iov_cnt:  Number of elements in the destination iov array.
+ * @iov:          Pointer to the source array of struct iovec describing
+ *                the scatter/gather I/O vector to copy from.
+ * @iov_cnt:      Number of elements in the source iov array.
+ * @offset:       Offset within the source iov from where copying should start.
+ * @bytes:        Total number of bytes to copy from iov to dst_iov.
+ *
+ * Returns:       The number of elements successfully copied to the destination
+ *                iov array.
+ */
+unsigned iov_copy(struct iovec *dst_iov, size_t dst_iov_cnt,
+		  const struct iovec *iov, size_t iov_cnt,
+		  size_t offset, size_t bytes)
+{
+	unsigned int i, j;
+	size_t len;
+
+	/* skipping offset bytes in the iovec */
+	for (i = 0; offset >= iov[i].iov_len && i < iov_cnt; i++)
+		offset -= iov[i].iov_len;
+
+	/* copying data */
+	for (j = 0; i < iov_cnt && j < dst_iov_cnt && bytes; i++) {
+		len = MIN(bytes, iov[i].iov_len - offset);
+
+		dst_iov[j].iov_base = (char *)iov[i].iov_base + offset;
+		dst_iov[j].iov_len = len;
+		j++;
+		bytes -= len;
+		offset = 0;
+	}
+
+	return j;
+}
diff --git a/iov.h b/iov.h
new file mode 100644
index 000000000000..ee35a75d0870
--- /dev/null
+++ b/iov.h
@@ -0,0 +1,29 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * iov.c - helpers for using (partial) iovecs.
+ *
+ * Copyrigh Red Hat
+ * Author: Laurent Vivier <lvivier@redhat.com>
+ *
+ * This file also contains code originally from QEMU include/qemu/iov.h:
+ *
+ * Author(s):
+ *  Amit Shah <amit.shah@redhat.com>
+ *  Michael Tokarev <mjt@tls.msk.ru>
+ */
+
+#ifndef IOVEC_H
+#define IOVEC_H
+
+#include <unistd.h>
+#include <string.h>
+
+size_t iov_from_buf(const struct iovec *iov, size_t iov_cnt,
+                    size_t offset, const void *buf, size_t bytes);
+size_t iov_to_buf(const struct iovec *iov, size_t iov_cnt,
+                  size_t offset, void *buf, size_t bytes);
+size_t iov_size(const struct iovec *iov, size_t iov_cnt);
+unsigned iov_copy(struct iovec *dst_iov, size_t dst_iov_cnt,
+		  const struct iovec *iov, size_t iov_cnt,
+		  size_t offset, size_t bytes);
+#endif /* IOVEC_H */
-- 
@@ -0,0 +1,29 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * iov.c - helpers for using (partial) iovecs.
+ *
+ * Copyrigh Red Hat
+ * Author: Laurent Vivier <lvivier@redhat.com>
+ *
+ * This file also contains code originally from QEMU include/qemu/iov.h:
+ *
+ * Author(s):
+ *  Amit Shah <amit.shah@redhat.com>
+ *  Michael Tokarev <mjt@tls.msk.ru>
+ */
+
+#ifndef IOVEC_H
+#define IOVEC_H
+
+#include <unistd.h>
+#include <string.h>
+
+size_t iov_from_buf(const struct iovec *iov, size_t iov_cnt,
+                    size_t offset, const void *buf, size_t bytes);
+size_t iov_to_buf(const struct iovec *iov, size_t iov_cnt,
+                  size_t offset, void *buf, size_t bytes);
+size_t iov_size(const struct iovec *iov, size_t iov_cnt);
+unsigned iov_copy(struct iovec *dst_iov, size_t dst_iov_cnt,
+		  const struct iovec *iov, size_t iov_cnt,
+		  size_t offset, size_t bytes);
+#endif /* IOVEC_H */
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 2/9] pcap: add pcap_iov()
  2024-02-17 15:07 [PATCH v3 0/9] Add vhost-user support to passt (part 1) Laurent Vivier
  2024-02-17 15:07 ` [PATCH v3 1/9] iov: add some functions to manage iovec Laurent Vivier
@ 2024-02-17 15:07 ` Laurent Vivier
  2024-02-19  2:50   ` David Gibson
  2024-02-17 15:07 ` [PATCH v3 3/9] checksum: align buffers Laurent Vivier
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 38+ messages in thread
From: Laurent Vivier @ 2024-02-17 15:07 UTC (permalink / raw)
  To: passt-dev; +Cc: Laurent Vivier

Introduce a new function pcap_iov() to capture packet desribed by an IO
vector.

Move packet header writing to to pcap_header() and use it in
pcap_frame() and pcap_iov().

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---

Notes:
    v3:
      - update rational
      - update comment
      - use strerror(errno)
      - use size_t for io vector length
    
    v2:
      - introduce pcap_header(), a common helper to write
        packet header
      - use writev() rather than write() in a loop
      - add functions comment

 pcap.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++-------
 pcap.h |  1 +
 2 files changed, 56 insertions(+), 7 deletions(-)

diff --git a/pcap.c b/pcap.c
index 501d52d4992b..4e213eea8113 100644
--- a/pcap.c
+++ b/pcap.c
@@ -20,6 +20,7 @@
 #include <sys/time.h>
 #include <sys/types.h>
 #include <sys/stat.h>
+#include <sys/uio.h>
 #include <fcntl.h>
 #include <time.h>
 #include <errno.h>
@@ -31,6 +32,7 @@
 #include "util.h"
 #include "passt.h"
 #include "log.h"
+#include "iov.h"
 
 #define PCAP_VERSION_MINOR 4
 
@@ -65,6 +67,28 @@ struct pcap_pkthdr {
 	uint32_t len;
 };
 
+/*
+ * pcap_header() - Write a pcap packet header to pcap file
+ *
+ * @len:       Length of the packet data.
+ * @tv:        Timestamp for the packet.
+ *
+ * Return: -1 in case of error, otherwise, 0 to indicate success.
+ */
+static int pcap_header(size_t len, const struct timeval *tv)
+{
+	struct pcap_pkthdr h;
+
+	h.tv_sec = tv->tv_sec;
+	h.tv_usec = tv->tv_usec;
+	h.caplen = h.len = len;
+
+	if (write(pcap_fd, &h, sizeof(h)) < 0)
+		return -1;
+
+	return 0;
+}
+
 /**
  * pcap_frame() - Capture a single frame to pcap file with given timestamp
  * @pkt:	Pointer to data buffer, including L2 headers
@@ -75,13 +99,7 @@ struct pcap_pkthdr {
  */
 static int pcap_frame(const char *pkt, size_t len, const struct timeval *tv)
 {
-	struct pcap_pkthdr h;
-
-	h.tv_sec = tv->tv_sec;
-	h.tv_usec = tv->tv_usec;
-	h.caplen = h.len = len;
-
-	if (write(pcap_fd, &h, sizeof(h)) < 0 || write(pcap_fd, pkt, len) < 0)
+	if (pcap_header(len, tv) < 0 || write(pcap_fd, pkt, len) < 0)
 		return -errno;
 
 	return 0;
@@ -130,6 +148,36 @@ void pcap_multiple(const struct iovec *iov, unsigned int n, size_t offset)
 	}
 }
 
+/*
+ * pcap_iov - Write packet data described by a scatter/gather I/O vector (iov)
+ *             to a pcap file descriptor (pcap_fd).
+ *
+ * @iov:       Pointer to the array of struct iovec describing the scatter/gather
+ *             I/O vector containing packet data to write, including L2 header
+ * @n:         Number of elements in the iov array.
+ */
+void pcap_iov(const struct iovec *iov, size_t n)
+{
+	struct timeval tv;
+	size_t len;
+
+	if (pcap_fd == -1)
+		return;
+
+	gettimeofday(&tv, NULL);
+
+	len = iov_size(iov, n);
+
+	if (pcap_header(len, &tv) < 0) {
+		debug("Cannot write pcap header");
+		return;
+	}
+
+	if (writev(pcap_fd, iov, n) < 0)
+		debug("Cannot log packet using writev(): %s\n",
+		      strerror(errno));
+}
+
 /**
  * pcap_init() - Initialise pcap file
  * @c:		Execution context
diff --git a/pcap.h b/pcap.h
index da5a7e846b72..4950f617f4c8 100644
--- a/pcap.h
+++ b/pcap.h
@@ -8,6 +8,7 @@
 
 void pcap(const char *pkt, size_t len);
 void pcap_multiple(const struct iovec *iov, unsigned int n, size_t offset);
+void pcap_iov(const struct iovec *iov, size_t n);
 void pcap_init(struct ctx *c);
 
 #endif /* PCAP_H */
-- 
@@ -8,6 +8,7 @@
 
 void pcap(const char *pkt, size_t len);
 void pcap_multiple(const struct iovec *iov, unsigned int n, size_t offset);
+void pcap_iov(const struct iovec *iov, size_t n);
 void pcap_init(struct ctx *c);
 
 #endif /* PCAP_H */
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 3/9] checksum: align buffers
  2024-02-17 15:07 [PATCH v3 0/9] Add vhost-user support to passt (part 1) Laurent Vivier
  2024-02-17 15:07 ` [PATCH v3 1/9] iov: add some functions to manage iovec Laurent Vivier
  2024-02-17 15:07 ` [PATCH v3 2/9] pcap: add pcap_iov() Laurent Vivier
@ 2024-02-17 15:07 ` Laurent Vivier
  2024-02-17 15:07 ` [PATCH v3 4/9] checksum: add csum_iov() Laurent Vivier
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 38+ messages in thread
From: Laurent Vivier @ 2024-02-17 15:07 UTC (permalink / raw)
  To: passt-dev; +Cc: Laurent Vivier, David Gibson

If buffer is not aligned use sum_16b() only on the not aligned
part, and then use csum_avx2() on the remaining part

Remove unneeded now function csum_unaligned().

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---

Notes:
    v3:
      - Add David's R-b
    
    v2:
      - use ROUND_UP() and sizeof(__m256i)
      - fix function comment
      - remove csum_unaligned() and use csum() instead

 checksum.c | 47 ++++++++++++++++++++++++-----------------------
 1 file changed, 24 insertions(+), 23 deletions(-)

diff --git a/checksum.c b/checksum.c
index f21c9b7a14d1..65486b4625ba 100644
--- a/checksum.c
+++ b/checksum.c
@@ -56,6 +56,8 @@
 #include <linux/udp.h>
 #include <linux/icmpv6.h>
 
+#include "util.h"
+
 /* Checksums are optional for UDP over IPv4, so we usually just set
  * them to 0.  Change this to 1 to calculate real UDP over IPv4
  * checksums
@@ -110,20 +112,7 @@ uint16_t csum_fold(uint32_t sum)
 	return sum;
 }
 
-/**
- * csum_unaligned() - Compute TCP/IP-style checksum for not 32-byte aligned data
- * @buf:	Input data
- * @len:	Input length
- * @init:	Initial 32-bit checksum, 0 for no pre-computed checksum
- *
- * Return: 16-bit IPv4-style checksum
- */
-/* NOLINTNEXTLINE(clang-diagnostic-unknown-attributes) */
-__attribute__((optimize("-fno-strict-aliasing")))	/* See csum_16b() */
-uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init)
-{
-	return (uint16_t)~csum_fold(sum_16b(buf, len) + init);
-}
+uint16_t csum(const void *buf, size_t len, uint32_t init);
 
 /**
  * csum_ip4_header() - Calculate and set IPv4 header checksum
@@ -132,7 +121,7 @@ uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init)
 void csum_ip4_header(struct iphdr *ip4h)
 {
 	ip4h->check = 0;
-	ip4h->check = csum_unaligned(ip4h, (size_t)ip4h->ihl * 4, 0);
+	ip4h->check = csum(ip4h, (size_t)ip4h->ihl * 4, 0);
 }
 
 /**
@@ -159,7 +148,7 @@ void csum_udp4(struct udphdr *udp4hr,
 			+ htons(IPPROTO_UDP);
 		/* Add in partial checksum for the UDP header alone */
 		psum += sum_16b(udp4hr, sizeof(*udp4hr));
-		udp4hr->check = csum_unaligned(payload, len, psum);
+		udp4hr->check = csum(payload, len, psum);
 	}
 }
 
@@ -178,7 +167,7 @@ void csum_icmp4(struct icmphdr *icmp4hr, const void *payload, size_t len)
 	/* Partial checksum for ICMP header alone */
 	psum = sum_16b(icmp4hr, sizeof(*icmp4hr));
 
-	icmp4hr->checksum = csum_unaligned(payload, len, psum);
+	icmp4hr->checksum = csum(payload, len, psum);
 }
 
 /**
@@ -199,7 +188,7 @@ void csum_udp6(struct udphdr *udp6hr,
 	udp6hr->check = 0;
 	/* Add in partial checksum for the UDP header alone */
 	psum += sum_16b(udp6hr, sizeof(*udp6hr));
-	udp6hr->check = csum_unaligned(payload, len, psum);
+	udp6hr->check = csum(payload, len, psum);
 }
 
 /**
@@ -222,7 +211,7 @@ void csum_icmp6(struct icmp6hdr *icmp6hr,
 	icmp6hr->icmp6_cksum = 0;
 	/* Add in partial checksum for the ICMPv6 header alone */
 	psum += sum_16b(icmp6hr, sizeof(*icmp6hr));
-	icmp6hr->icmp6_cksum = csum_unaligned(payload, len, psum);
+	icmp6hr->icmp6_cksum = csum(payload, len, psum);
 }
 
 #ifdef __AVX2__
@@ -397,17 +386,29 @@ less_than_128_bytes:
 
 /**
  * csum() - Compute TCP/IP-style checksum
- * @buf:	Input buffer, must be aligned to 32-byte boundary
+ * @buf:	Input buffer
  * @len:	Input length
  * @init:	Initial 32-bit checksum, 0 for no pre-computed checksum
  *
- * Return: 16-bit folded, complemented checksum sum
+ * Return: 16-bit folded, complemented checksum
  */
 /* NOLINTNEXTLINE(clang-diagnostic-unknown-attributes) */
 __attribute__((optimize("-fno-strict-aliasing")))	/* See csum_16b() */
 uint16_t csum(const void *buf, size_t len, uint32_t init)
 {
-	return (uint16_t)~csum_fold(csum_avx2(buf, len, init));
+	intptr_t align = ROUND_UP((intptr_t)buf, sizeof(__m256i));
+	unsigned int pad = align - (intptr_t)buf;
+
+	if (len < pad)
+		pad = len;
+
+	if (pad)
+		init += sum_16b(buf, pad);
+
+	if (len > pad)
+		init = csum_avx2((void *)align, len - pad, init);
+
+	return (uint16_t)~csum_fold(init);
 }
 
 #else /* __AVX2__ */
@@ -424,7 +425,7 @@ uint16_t csum(const void *buf, size_t len, uint32_t init)
 __attribute__((optimize("-fno-strict-aliasing")))	/* See csum_16b() */
 uint16_t csum(const void *buf, size_t len, uint32_t init)
 {
-	return csum_unaligned(buf, len, init);
+	return (uint16_t)~csum_fold(sum_16b(buf, len) + init);
 }
 
 #endif /* !__AVX2__ */
-- 
@@ -56,6 +56,8 @@
 #include <linux/udp.h>
 #include <linux/icmpv6.h>
 
+#include "util.h"
+
 /* Checksums are optional for UDP over IPv4, so we usually just set
  * them to 0.  Change this to 1 to calculate real UDP over IPv4
  * checksums
@@ -110,20 +112,7 @@ uint16_t csum_fold(uint32_t sum)
 	return sum;
 }
 
-/**
- * csum_unaligned() - Compute TCP/IP-style checksum for not 32-byte aligned data
- * @buf:	Input data
- * @len:	Input length
- * @init:	Initial 32-bit checksum, 0 for no pre-computed checksum
- *
- * Return: 16-bit IPv4-style checksum
- */
-/* NOLINTNEXTLINE(clang-diagnostic-unknown-attributes) */
-__attribute__((optimize("-fno-strict-aliasing")))	/* See csum_16b() */
-uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init)
-{
-	return (uint16_t)~csum_fold(sum_16b(buf, len) + init);
-}
+uint16_t csum(const void *buf, size_t len, uint32_t init);
 
 /**
  * csum_ip4_header() - Calculate and set IPv4 header checksum
@@ -132,7 +121,7 @@ uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init)
 void csum_ip4_header(struct iphdr *ip4h)
 {
 	ip4h->check = 0;
-	ip4h->check = csum_unaligned(ip4h, (size_t)ip4h->ihl * 4, 0);
+	ip4h->check = csum(ip4h, (size_t)ip4h->ihl * 4, 0);
 }
 
 /**
@@ -159,7 +148,7 @@ void csum_udp4(struct udphdr *udp4hr,
 			+ htons(IPPROTO_UDP);
 		/* Add in partial checksum for the UDP header alone */
 		psum += sum_16b(udp4hr, sizeof(*udp4hr));
-		udp4hr->check = csum_unaligned(payload, len, psum);
+		udp4hr->check = csum(payload, len, psum);
 	}
 }
 
@@ -178,7 +167,7 @@ void csum_icmp4(struct icmphdr *icmp4hr, const void *payload, size_t len)
 	/* Partial checksum for ICMP header alone */
 	psum = sum_16b(icmp4hr, sizeof(*icmp4hr));
 
-	icmp4hr->checksum = csum_unaligned(payload, len, psum);
+	icmp4hr->checksum = csum(payload, len, psum);
 }
 
 /**
@@ -199,7 +188,7 @@ void csum_udp6(struct udphdr *udp6hr,
 	udp6hr->check = 0;
 	/* Add in partial checksum for the UDP header alone */
 	psum += sum_16b(udp6hr, sizeof(*udp6hr));
-	udp6hr->check = csum_unaligned(payload, len, psum);
+	udp6hr->check = csum(payload, len, psum);
 }
 
 /**
@@ -222,7 +211,7 @@ void csum_icmp6(struct icmp6hdr *icmp6hr,
 	icmp6hr->icmp6_cksum = 0;
 	/* Add in partial checksum for the ICMPv6 header alone */
 	psum += sum_16b(icmp6hr, sizeof(*icmp6hr));
-	icmp6hr->icmp6_cksum = csum_unaligned(payload, len, psum);
+	icmp6hr->icmp6_cksum = csum(payload, len, psum);
 }
 
 #ifdef __AVX2__
@@ -397,17 +386,29 @@ less_than_128_bytes:
 
 /**
  * csum() - Compute TCP/IP-style checksum
- * @buf:	Input buffer, must be aligned to 32-byte boundary
+ * @buf:	Input buffer
  * @len:	Input length
  * @init:	Initial 32-bit checksum, 0 for no pre-computed checksum
  *
- * Return: 16-bit folded, complemented checksum sum
+ * Return: 16-bit folded, complemented checksum
  */
 /* NOLINTNEXTLINE(clang-diagnostic-unknown-attributes) */
 __attribute__((optimize("-fno-strict-aliasing")))	/* See csum_16b() */
 uint16_t csum(const void *buf, size_t len, uint32_t init)
 {
-	return (uint16_t)~csum_fold(csum_avx2(buf, len, init));
+	intptr_t align = ROUND_UP((intptr_t)buf, sizeof(__m256i));
+	unsigned int pad = align - (intptr_t)buf;
+
+	if (len < pad)
+		pad = len;
+
+	if (pad)
+		init += sum_16b(buf, pad);
+
+	if (len > pad)
+		init = csum_avx2((void *)align, len - pad, init);
+
+	return (uint16_t)~csum_fold(init);
 }
 
 #else /* __AVX2__ */
@@ -424,7 +425,7 @@ uint16_t csum(const void *buf, size_t len, uint32_t init)
 __attribute__((optimize("-fno-strict-aliasing")))	/* See csum_16b() */
 uint16_t csum(const void *buf, size_t len, uint32_t init)
 {
-	return csum_unaligned(buf, len, init);
+	return (uint16_t)~csum_fold(sum_16b(buf, len) + init);
 }
 
 #endif /* !__AVX2__ */
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 4/9] checksum: add csum_iov()
  2024-02-17 15:07 [PATCH v3 0/9] Add vhost-user support to passt (part 1) Laurent Vivier
                   ` (2 preceding siblings ...)
  2024-02-17 15:07 ` [PATCH v3 3/9] checksum: align buffers Laurent Vivier
@ 2024-02-17 15:07 ` Laurent Vivier
  2024-02-19  2:52   ` David Gibson
  2024-02-17 15:07 ` [PATCH v3 5/9] util: move IP stuff from util.[ch] to ip.[ch] Laurent Vivier
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 38+ messages in thread
From: Laurent Vivier @ 2024-02-17 15:07 UTC (permalink / raw)
  To: passt-dev; +Cc: Laurent Vivier

Introduce the function csum_unfolded() that computes the unfolded
32-bit checksum of a data buffer, and call it from csum() that returns
the folded value.

Introduce csum_iov() that computes the checksum using csum_folded() on
all vectors of the iovec array and returns the folded result.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---

Notes:
    v3:
      - update comments
      - use size_t for the IO vectors length
      - include checksum.h in checksum.c
      - export csum_unfolded() (for later)
    
    v2:
      - fix typo and superfluous space
      - update comments

 checksum.c | 56 ++++++++++++++++++++++++++++++++++++++++++------------
 checksum.h |  2 ++
 2 files changed, 46 insertions(+), 12 deletions(-)

diff --git a/checksum.c b/checksum.c
index 65486b4625ba..74e3742bc6f6 100644
--- a/checksum.c
+++ b/checksum.c
@@ -57,6 +57,7 @@
 #include <linux/icmpv6.h>
 
 #include "util.h"
+#include "checksum.h"
 
 /* Checksums are optional for UDP over IPv4, so we usually just set
  * them to 0.  Change this to 1 to calculate real UDP over IPv4
@@ -385,16 +386,16 @@ less_than_128_bytes:
 }
 
 /**
- * csum() - Compute TCP/IP-style checksum
- * @buf:	Input buffer
- * @len:	Input length
- * @init:	Initial 32-bit checksum, 0 for no pre-computed checksum
+ * csum_unfolded - Calculate the unfolded checksum of a data buffer.
  *
- * Return: 16-bit folded, complemented checksum
+ * @buf:   Input buffer
+ * @len:   Input length
+ * @init:  Initial 32-bit checksum, 0 for no pre-computed checksum
+ *
+ * Return: 32-bit unfolded
  */
-/* NOLINTNEXTLINE(clang-diagnostic-unknown-attributes) */
 __attribute__((optimize("-fno-strict-aliasing")))	/* See csum_16b() */
-uint16_t csum(const void *buf, size_t len, uint32_t init)
+uint32_t csum_unfolded(const void *buf, size_t len, uint32_t init)
 {
 	intptr_t align = ROUND_UP((intptr_t)buf, sizeof(__m256i));
 	unsigned int pad = align - (intptr_t)buf;
@@ -408,16 +409,30 @@ uint16_t csum(const void *buf, size_t len, uint32_t init)
 	if (len > pad)
 		init = csum_avx2((void *)align, len - pad, init);
 
-	return (uint16_t)~csum_fold(init);
+	return init;
 }
-
 #else /* __AVX2__ */
+/**
+ * csum_unfolded - Calculate the unfolded checksum of a data buffer.
+ *
+ * @buf:   Input buffer
+ * @len:   Input length
+ * @init:  Initial 32-bit checksum, 0 for no pre-computed checksum
+ *
+ * Return: 32-bit unfolded checksum
+ */
+__attribute__((optimize("-fno-strict-aliasing")))	/* See csum_16b() */
+uint32_t csum_unfolded(const void *buf, size_t len, uint32_t init)
+{
+	return sum_16b(buf, len) + init;
+}
+#endif /* !__AVX2__ */
 
 /**
  * csum() - Compute TCP/IP-style checksum
  * @buf:	Input buffer
  * @len:	Input length
- * @sum:	Initial 32-bit checksum, 0 for no pre-computed checksum
+ * @init:	Initial 32-bit checksum, 0 for no pre-computed checksum
  *
  * Return: 16-bit folded, complemented checksum
  */
@@ -425,7 +440,24 @@ uint16_t csum(const void *buf, size_t len, uint32_t init)
 __attribute__((optimize("-fno-strict-aliasing")))	/* See csum_16b() */
 uint16_t csum(const void *buf, size_t len, uint32_t init)
 {
-	return (uint16_t)~csum_fold(sum_16b(buf, len) + init);
+	return (uint16_t)~csum_fold(csum_unfolded(buf, len, init));
 }
 
-#endif /* !__AVX2__ */
+/**
+ * csum_iov() - Calculates the unfolded checksum over an array of IO vectors
+ *
+ * @iov		Pointer to the array of IO vectors
+ * @n		Length of the array
+ * @init	Initial 32-bit checksum, 0 for no pre-computed checksum
+ *
+ * Return: 16-bit folded, complemented checksum
+ */
+uint16_t csum_iov(struct iovec *iov, size_t n, uint32_t init)
+{
+	unsigned int i;
+
+	for (i = 0; i < n; i++)
+		init = csum_unfolded(iov[i].iov_base, iov[i].iov_len, init);
+
+	return (uint16_t)~csum_fold(init);
+}
diff --git a/checksum.h b/checksum.h
index 21c0310d3804..dfa705a04a24 100644
--- a/checksum.h
+++ b/checksum.h
@@ -24,6 +24,8 @@ void csum_udp6(struct udphdr *udp6hr,
 void csum_icmp6(struct icmp6hdr *icmp6hr,
 		const struct in6_addr *saddr, const struct in6_addr *daddr,
 		const void *payload, size_t len);
+uint32_t csum_unfolded(const void *buf, size_t len, uint32_t init);
 uint16_t csum(const void *buf, size_t len, uint32_t init);
+uint16_t csum_iov(struct iovec *iov, size_t n, uint32_t init);
 
 #endif /* CHECKSUM_H */
-- 
@@ -24,6 +24,8 @@ void csum_udp6(struct udphdr *udp6hr,
 void csum_icmp6(struct icmp6hdr *icmp6hr,
 		const struct in6_addr *saddr, const struct in6_addr *daddr,
 		const void *payload, size_t len);
+uint32_t csum_unfolded(const void *buf, size_t len, uint32_t init);
 uint16_t csum(const void *buf, size_t len, uint32_t init);
+uint16_t csum_iov(struct iovec *iov, size_t n, uint32_t init);
 
 #endif /* CHECKSUM_H */
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 5/9] util: move IP stuff from util.[ch] to ip.[ch]
  2024-02-17 15:07 [PATCH v3 0/9] Add vhost-user support to passt (part 1) Laurent Vivier
                   ` (3 preceding siblings ...)
  2024-02-17 15:07 ` [PATCH v3 4/9] checksum: add csum_iov() Laurent Vivier
@ 2024-02-17 15:07 ` Laurent Vivier
  2024-02-19  2:59   ` David Gibson
  2024-02-17 15:07 ` [PATCH v3 6/9] checksum: use csum_ip4_header() in udp.c and tcp.c Laurent Vivier
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 38+ messages in thread
From: Laurent Vivier @ 2024-02-17 15:07 UTC (permalink / raw)
  To: passt-dev; +Cc: Laurent Vivier, David Gibson

Introduce ip.[ch] file to encapsulate IP protocol handling functions and
structures.  Modify various files to include the new header ip.h when
it's needed.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---

Notes:
    v3:
      - rewrap rational
      - add David's R-b
    
    v2:
      - update rational and comments

 Makefile     |  8 ++---
 conf.c       |  1 +
 dhcp.c       |  1 +
 flow.c       |  1 +
 icmp.c       |  1 +
 ip.c         | 72 +++++++++++++++++++++++++++++++++++++++++++
 ip.h         | 86 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 ndp.c        |  1 +
 port_fwd.c   |  1 +
 qrap.c       |  1 +
 tap.c        |  1 +
 tcp.c        |  1 +
 tcp_splice.c |  1 +
 udp.c        |  1 +
 util.c       | 55 ---------------------------------
 util.h       | 76 ----------------------------------------------
 16 files changed, 173 insertions(+), 135 deletions(-)
 create mode 100644 ip.c
 create mode 100644 ip.h

diff --git a/Makefile b/Makefile
index 156398b3844e..e1ebb454bc6b 100644
--- a/Makefile
+++ b/Makefile
@@ -45,7 +45,7 @@ FLAGS += -DVERSION=\"$(VERSION)\"
 FLAGS += -DDUAL_STACK_SOCKETS=$(DUAL_STACK_SOCKETS)
 
 PASST_SRCS = arch.c arp.c checksum.c conf.c dhcp.c dhcpv6.c flow.c icmp.c \
-	igmp.c iov.c isolation.c lineread.c log.c mld.c ndp.c netlink.c \
+	igmp.c iov.c ip.c isolation.c lineread.c log.c mld.c ndp.c netlink.c \
 	packet.c passt.c pasta.c pcap.c pif.c port_fwd.c tap.c tcp.c \
 	tcp_splice.c udp.c util.c
 QRAP_SRCS = qrap.c
@@ -54,9 +54,9 @@ SRCS = $(PASST_SRCS) $(QRAP_SRCS)
 MANPAGES = passt.1 pasta.1 qrap.1
 
 PASST_HEADERS = arch.h arp.h checksum.h conf.h dhcp.h dhcpv6.h flow.h \
-	flow_table.h icmp.h inany.h iov.h isolation.h lineread.h log.h ndp.h \
-	netlink.h packet.h passt.h pasta.h pcap.h pif.h port_fwd.h siphash.h \
-	tap.h tcp.h tcp_conn.h tcp_splice.h udp.h util.h
+	flow_table.h icmp.h inany.h iov.h ip.h isolation.h lineread.h log.h \
+	ndp.h netlink.h packet.h passt.h pasta.h pcap.h pif.h port_fwd.h \
+	siphash.h tap.h tcp.h tcp_conn.h tcp_splice.h udp.h util.h
 HEADERS = $(PASST_HEADERS) seccomp.h
 
 C := \#include <linux/tcp.h>\nstruct tcp_info x = { .tcpi_snd_wnd = 0 };
diff --git a/conf.c b/conf.c
index 5e15b665be9c..93bfda331349 100644
--- a/conf.c
+++ b/conf.c
@@ -35,6 +35,7 @@
 #include <netinet/if_ether.h>
 
 #include "util.h"
+#include "ip.h"
 #include "passt.h"
 #include "netlink.h"
 #include "udp.h"
diff --git a/dhcp.c b/dhcp.c
index 110772867632..ff4834a3dce9 100644
--- a/dhcp.c
+++ b/dhcp.c
@@ -25,6 +25,7 @@
 #include <limits.h>
 
 #include "util.h"
+#include "ip.h"
 #include "checksum.h"
 #include "packet.h"
 #include "passt.h"
diff --git a/flow.c b/flow.c
index 5e94a7a949e5..73d52bda8774 100644
--- a/flow.c
+++ b/flow.c
@@ -11,6 +11,7 @@
 #include <string.h>
 
 #include "util.h"
+#include "ip.h"
 #include "passt.h"
 #include "siphash.h"
 #include "inany.h"
diff --git a/icmp.c b/icmp.c
index 9434fc5a7490..3b85a8578316 100644
--- a/icmp.c
+++ b/icmp.c
@@ -33,6 +33,7 @@
 
 #include "packet.h"
 #include "util.h"
+#include "ip.h"
 #include "passt.h"
 #include "tap.h"
 #include "log.h"
diff --git a/ip.c b/ip.c
new file mode 100644
index 000000000000..2cc7f6548aff
--- /dev/null
+++ b/ip.c
@@ -0,0 +1,72 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+/* PASST - Plug A Simple Socket Transport
+ *  for qemu/UNIX domain socket mode
+ *
+ * PASTA - Pack A Subtle Tap Abstraction
+ *  for network namespace/tap device mode
+ *
+ * ip.c - IP related functions
+ *
+ * Copyright (c) 2020-2021 Red Hat GmbH
+ * Author: Stefano Brivio <sbrivio@redhat.com>
+ */
+
+#include <stddef.h>
+#include "util.h"
+#include "ip.h"
+
+#define IPV6_NH_OPT(nh)							\
+	((nh) == 0   || (nh) == 43  || (nh) == 44  || (nh) == 50  ||	\
+	 (nh) == 51  || (nh) == 60  || (nh) == 135 || (nh) == 139 ||	\
+	 (nh) == 140 || (nh) == 253 || (nh) == 254)
+
+/**
+ * ipv6_l4hdr() - Find pointer to L4 header in IPv6 packet and extract protocol
+ * @p:		Packet pool, packet number @idx has IPv6 header at @offset
+ * @idx:	Index of packet in pool
+ * @offset:	Pre-calculated IPv6 header offset
+ * @proto:	Filled with L4 protocol number
+ * @dlen:	Data length (payload excluding header extensions), set on return
+ *
+ * Return: pointer to L4 header, NULL if not found
+ */
+char *ipv6_l4hdr(const struct pool *p, int idx, size_t offset, uint8_t *proto,
+		 size_t *dlen)
+{
+	const struct ipv6_opt_hdr *o;
+	const struct ipv6hdr *ip6h;
+	char *base;
+	int hdrlen;
+	uint8_t nh;
+
+	base = packet_get(p, idx, 0, 0, NULL);
+	ip6h = packet_get(p, idx, offset, sizeof(*ip6h), dlen);
+	if (!ip6h)
+		return NULL;
+
+	offset += sizeof(*ip6h);
+
+	nh = ip6h->nexthdr;
+	if (!IPV6_NH_OPT(nh))
+		goto found;
+
+	while ((o = packet_get_try(p, idx, offset, sizeof(*o), dlen))) {
+		nh = o->nexthdr;
+		hdrlen = (o->hdrlen + 1) * 8;
+
+		if (IPV6_NH_OPT(nh))
+			offset += hdrlen;
+		else
+			goto found;
+	}
+
+	return NULL;
+
+found:
+	if (nh == 59)
+		return NULL;
+
+	*proto = nh;
+	return base + offset;
+}
diff --git a/ip.h b/ip.h
new file mode 100644
index 000000000000..b2e08bc049f3
--- /dev/null
+++ b/ip.h
@@ -0,0 +1,86 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later
+ * Copyright (c) 2021 Red Hat GmbH
+ * Author: Stefano Brivio <sbrivio@redhat.com>
+ */
+
+#ifndef IP_H
+#define IP_H
+
+#include <netinet/ip.h>
+#include <netinet/ip6.h>
+
+#define IN4_IS_ADDR_UNSPECIFIED(a) \
+	((a)->s_addr == htonl_constant(INADDR_ANY))
+#define IN4_IS_ADDR_BROADCAST(a) \
+	((a)->s_addr == htonl_constant(INADDR_BROADCAST))
+#define IN4_IS_ADDR_LOOPBACK(a) \
+	(ntohl((a)->s_addr) >> IN_CLASSA_NSHIFT == IN_LOOPBACKNET)
+#define IN4_IS_ADDR_MULTICAST(a) \
+	(IN_MULTICAST(ntohl((a)->s_addr)))
+#define IN4_ARE_ADDR_EQUAL(a, b) \
+	(((struct in_addr *)(a))->s_addr == ((struct in_addr *)b)->s_addr)
+#define IN4ADDR_LOOPBACK_INIT \
+	{ .s_addr	= htonl_constant(INADDR_LOOPBACK) }
+#define IN4ADDR_ANY_INIT \
+	{ .s_addr	= htonl_constant(INADDR_ANY) }
+
+#define L2_BUF_IP4_INIT(proto)						\
+	{								\
+		.version	= 4,					\
+		.ihl		= 5,					\
+		.tos		= 0,					\
+		.tot_len	= 0,					\
+		.id		= 0,					\
+		.frag_off	= 0,					\
+		.ttl		= 0xff,					\
+		.protocol	= (proto),				\
+		.saddr		= 0,					\
+		.daddr		= 0,					\
+	}
+#define L2_BUF_IP4_PSUM(proto)	((uint32_t)htons_constant(0x4500) +	\
+				 (uint32_t)htons_constant(0xff00 | (proto)))
+
+#define L2_BUF_IP6_INIT(proto)						\
+	{								\
+		.priority	= 0,					\
+		.version	= 6,					\
+		.flow_lbl	= { 0 },				\
+		.payload_len	= 0,					\
+		.nexthdr	= (proto),				\
+		.hop_limit	= 255,					\
+		.saddr		= IN6ADDR_ANY_INIT,			\
+		.daddr		= IN6ADDR_ANY_INIT,			\
+	}
+
+struct ipv6hdr {
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wpedantic"
+#if __BYTE_ORDER == __BIG_ENDIAN
+	uint8_t			version:4,
+				priority:4;
+#else
+	uint8_t			priority:4,
+				version:4;
+#endif
+#pragma GCC diagnostic pop
+	uint8_t			flow_lbl[3];
+
+	uint16_t		payload_len;
+	uint8_t			nexthdr;
+	uint8_t			hop_limit;
+
+	struct in6_addr		saddr;
+	struct in6_addr		daddr;
+};
+
+struct ipv6_opt_hdr {
+	uint8_t			nexthdr;
+	uint8_t			hdrlen;
+	/*
+	 * TLV encoded option data follows.
+	 */
+} __attribute__((packed));	/* required for some archs */
+
+char *ipv6_l4hdr(const struct pool *p, int idx, size_t offset, uint8_t *proto,
+		 size_t *dlen);
+#endif /* IP_H */
diff --git a/ndp.c b/ndp.c
index 4c85ab8bcaee..c58f4b222b76 100644
--- a/ndp.c
+++ b/ndp.c
@@ -28,6 +28,7 @@
 
 #include "checksum.h"
 #include "util.h"
+#include "ip.h"
 #include "passt.h"
 #include "tap.h"
 #include "log.h"
diff --git a/port_fwd.c b/port_fwd.c
index 6f6c836c57ad..e1ec31e2232c 100644
--- a/port_fwd.c
+++ b/port_fwd.c
@@ -21,6 +21,7 @@
 #include <stdio.h>
 
 #include "util.h"
+#include "ip.h"
 #include "port_fwd.h"
 #include "passt.h"
 #include "lineread.h"
diff --git a/qrap.c b/qrap.c
index 97f350a4bf0b..d59670621731 100644
--- a/qrap.c
+++ b/qrap.c
@@ -32,6 +32,7 @@
 #include <linux/icmpv6.h>
 
 #include "util.h"
+#include "ip.h"
 #include "passt.h"
 #include "arp.h"
 
diff --git a/tap.c b/tap.c
index 396dee7eef25..3ea03f720d6d 100644
--- a/tap.c
+++ b/tap.c
@@ -45,6 +45,7 @@
 
 #include "checksum.h"
 #include "util.h"
+#include "ip.h"
 #include "passt.h"
 #include "arp.h"
 #include "dhcp.h"
diff --git a/tcp.c b/tcp.c
index 2ab443d5c3f2..45ef5146729a 100644
--- a/tcp.c
+++ b/tcp.c
@@ -289,6 +289,7 @@
 
 #include "checksum.h"
 #include "util.h"
+#include "ip.h"
 #include "passt.h"
 #include "tap.h"
 #include "siphash.h"
diff --git a/tcp_splice.c b/tcp_splice.c
index 26d32065cd47..66575ca95a1e 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -49,6 +49,7 @@
 #include <sys/socket.h>
 
 #include "util.h"
+#include "ip.h"
 #include "passt.h"
 #include "log.h"
 #include "tcp_splice.h"
diff --git a/udp.c b/udp.c
index 933f24b81616..56b58bd8b43a 100644
--- a/udp.c
+++ b/udp.c
@@ -112,6 +112,7 @@
 
 #include "checksum.h"
 #include "util.h"
+#include "ip.h"
 #include "passt.h"
 #include "tap.h"
 #include "pcap.h"
diff --git a/util.c b/util.c
index 21b35ff94db1..f73ea1d98a09 100644
--- a/util.c
+++ b/util.c
@@ -30,61 +30,6 @@
 #include "packet.h"
 #include "log.h"
 
-#define IPV6_NH_OPT(nh)							\
-	((nh) == 0   || (nh) == 43  || (nh) == 44  || (nh) == 50  ||	\
-	 (nh) == 51  || (nh) == 60  || (nh) == 135 || (nh) == 139 ||	\
-	 (nh) == 140 || (nh) == 253 || (nh) == 254)
-
-/**
- * ipv6_l4hdr() - Find pointer to L4 header in IPv6 packet and extract protocol
- * @p:		Packet pool, packet number @idx has IPv6 header at @offset
- * @idx:	Index of packet in pool
- * @offset:	Pre-calculated IPv6 header offset
- * @proto:	Filled with L4 protocol number
- * @dlen:	Data length (payload excluding header extensions), set on return
- *
- * Return: pointer to L4 header, NULL if not found
- */
-char *ipv6_l4hdr(const struct pool *p, int idx, size_t offset, uint8_t *proto,
-		 size_t *dlen)
-{
-	const struct ipv6_opt_hdr *o;
-	const struct ipv6hdr *ip6h;
-	char *base;
-	int hdrlen;
-	uint8_t nh;
-
-	base = packet_get(p, idx, 0, 0, NULL);
-	ip6h = packet_get(p, idx, offset, sizeof(*ip6h), dlen);
-	if (!ip6h)
-		return NULL;
-
-	offset += sizeof(*ip6h);
-
-	nh = ip6h->nexthdr;
-	if (!IPV6_NH_OPT(nh))
-		goto found;
-
-	while ((o = packet_get_try(p, idx, offset, sizeof(*o), dlen))) {
-		nh = o->nexthdr;
-		hdrlen = (o->hdrlen + 1) * 8;
-
-		if (IPV6_NH_OPT(nh))
-			offset += hdrlen;
-		else
-			goto found;
-	}
-
-	return NULL;
-
-found:
-	if (nh == 59)
-		return NULL;
-
-	*proto = nh;
-	return base + offset;
-}
-
 /**
  * sock_l4() - Create and bind socket for given L4, add to epoll list
  * @c:		Execution context
diff --git a/util.h b/util.h
index d2320f8cc99a..f7c3dfee9972 100644
--- a/util.h
+++ b/util.h
@@ -110,22 +110,6 @@
 #define	htonl_constant(x)	(__bswap_constant_32(x))
 #endif
 
-#define IN4_IS_ADDR_UNSPECIFIED(a) \
-	((a)->s_addr == htonl_constant(INADDR_ANY))
-#define IN4_IS_ADDR_BROADCAST(a) \
-	((a)->s_addr == htonl_constant(INADDR_BROADCAST))
-#define IN4_IS_ADDR_LOOPBACK(a) \
-	(ntohl((a)->s_addr) >> IN_CLASSA_NSHIFT == IN_LOOPBACKNET)
-#define IN4_IS_ADDR_MULTICAST(a) \
-	(IN_MULTICAST(ntohl((a)->s_addr)))
-#define IN4_ARE_ADDR_EQUAL(a, b) \
-	(((struct in_addr *)(a))->s_addr == ((struct in_addr *)b)->s_addr)
-#define IN4ADDR_LOOPBACK_INIT \
-	{ .s_addr	= htonl_constant(INADDR_LOOPBACK) }
-#define IN4ADDR_ANY_INIT \
-	{ .s_addr	= htonl_constant(INADDR_ANY) }
-
-
 #define NS_FN_STACK_SIZE	(RLIMIT_STACK_VAL * 1024 / 8)
 int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags,
 	     void *arg);
@@ -138,34 +122,6 @@ int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags,
 			 (void *)(arg));				\
 	} while (0)
 
-#define L2_BUF_IP4_INIT(proto)						\
-	{								\
-		.version	= 4,					\
-		.ihl		= 5,					\
-		.tos		= 0,					\
-		.tot_len	= 0,					\
-		.id		= 0,					\
-		.frag_off	= 0,					\
-		.ttl		= 0xff,					\
-		.protocol	= (proto),				\
-		.saddr		= 0,					\
-		.daddr		= 0,					\
-	}
-#define L2_BUF_IP4_PSUM(proto)	((uint32_t)htons_constant(0x4500) +	\
-				 (uint32_t)htons_constant(0xff00 | (proto)))
-
-#define L2_BUF_IP6_INIT(proto)						\
-	{								\
-		.priority	= 0,					\
-		.version	= 6,					\
-		.flow_lbl	= { 0 },				\
-		.payload_len	= 0,					\
-		.nexthdr	= (proto),				\
-		.hop_limit	= 255,					\
-		.saddr		= IN6ADDR_ANY_INIT,			\
-		.daddr		= IN6ADDR_ANY_INIT,			\
-	}
-
 #define RCVBUF_BIG		(2UL * 1024 * 1024)
 #define SNDBUF_BIG		(4UL * 1024 * 1024)
 #define SNDBUF_SMALL		(128UL * 1024)
@@ -173,45 +129,13 @@ int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags,
 #include <net/if.h>
 #include <limits.h>
 #include <stdint.h>
-#include <netinet/ip6.h>
 
 #include "packet.h"
 
 struct ctx;
 
-struct ipv6hdr {
-#pragma GCC diagnostic push
-#pragma GCC diagnostic ignored "-Wpedantic"
-#if __BYTE_ORDER == __BIG_ENDIAN
-	uint8_t			version:4,
-				priority:4;
-#else
-	uint8_t			priority:4,
-				version:4;
-#endif
-#pragma GCC diagnostic pop
-	uint8_t			flow_lbl[3];
-
-	uint16_t		payload_len;
-	uint8_t			nexthdr;
-	uint8_t			hop_limit;
-
-	struct in6_addr		saddr;
-	struct in6_addr		daddr;
-};
-
-struct ipv6_opt_hdr {
-	uint8_t			nexthdr;
-	uint8_t			hdrlen;
-	/*
-	 * TLV encoded option data follows.
-	 */
-} __attribute__((packed));	/* required for some archs */
-
 /* cppcheck-suppress funcArgNamesDifferent */
 __attribute__ ((weak)) int ffsl(long int i) { return __builtin_ffsl(i); }
-char *ipv6_l4hdr(const struct pool *p, int idx, size_t offset, uint8_t *proto,
-		 size_t *dlen);
 int sock_l4(const struct ctx *c, int af, uint8_t proto,
 	    const void *bind_addr, const char *ifname, uint16_t port,
 	    uint32_t data);
-- 
@@ -110,22 +110,6 @@
 #define	htonl_constant(x)	(__bswap_constant_32(x))
 #endif
 
-#define IN4_IS_ADDR_UNSPECIFIED(a) \
-	((a)->s_addr == htonl_constant(INADDR_ANY))
-#define IN4_IS_ADDR_BROADCAST(a) \
-	((a)->s_addr == htonl_constant(INADDR_BROADCAST))
-#define IN4_IS_ADDR_LOOPBACK(a) \
-	(ntohl((a)->s_addr) >> IN_CLASSA_NSHIFT == IN_LOOPBACKNET)
-#define IN4_IS_ADDR_MULTICAST(a) \
-	(IN_MULTICAST(ntohl((a)->s_addr)))
-#define IN4_ARE_ADDR_EQUAL(a, b) \
-	(((struct in_addr *)(a))->s_addr == ((struct in_addr *)b)->s_addr)
-#define IN4ADDR_LOOPBACK_INIT \
-	{ .s_addr	= htonl_constant(INADDR_LOOPBACK) }
-#define IN4ADDR_ANY_INIT \
-	{ .s_addr	= htonl_constant(INADDR_ANY) }
-
-
 #define NS_FN_STACK_SIZE	(RLIMIT_STACK_VAL * 1024 / 8)
 int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags,
 	     void *arg);
@@ -138,34 +122,6 @@ int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags,
 			 (void *)(arg));				\
 	} while (0)
 
-#define L2_BUF_IP4_INIT(proto)						\
-	{								\
-		.version	= 4,					\
-		.ihl		= 5,					\
-		.tos		= 0,					\
-		.tot_len	= 0,					\
-		.id		= 0,					\
-		.frag_off	= 0,					\
-		.ttl		= 0xff,					\
-		.protocol	= (proto),				\
-		.saddr		= 0,					\
-		.daddr		= 0,					\
-	}
-#define L2_BUF_IP4_PSUM(proto)	((uint32_t)htons_constant(0x4500) +	\
-				 (uint32_t)htons_constant(0xff00 | (proto)))
-
-#define L2_BUF_IP6_INIT(proto)						\
-	{								\
-		.priority	= 0,					\
-		.version	= 6,					\
-		.flow_lbl	= { 0 },				\
-		.payload_len	= 0,					\
-		.nexthdr	= (proto),				\
-		.hop_limit	= 255,					\
-		.saddr		= IN6ADDR_ANY_INIT,			\
-		.daddr		= IN6ADDR_ANY_INIT,			\
-	}
-
 #define RCVBUF_BIG		(2UL * 1024 * 1024)
 #define SNDBUF_BIG		(4UL * 1024 * 1024)
 #define SNDBUF_SMALL		(128UL * 1024)
@@ -173,45 +129,13 @@ int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags,
 #include <net/if.h>
 #include <limits.h>
 #include <stdint.h>
-#include <netinet/ip6.h>
 
 #include "packet.h"
 
 struct ctx;
 
-struct ipv6hdr {
-#pragma GCC diagnostic push
-#pragma GCC diagnostic ignored "-Wpedantic"
-#if __BYTE_ORDER == __BIG_ENDIAN
-	uint8_t			version:4,
-				priority:4;
-#else
-	uint8_t			priority:4,
-				version:4;
-#endif
-#pragma GCC diagnostic pop
-	uint8_t			flow_lbl[3];
-
-	uint16_t		payload_len;
-	uint8_t			nexthdr;
-	uint8_t			hop_limit;
-
-	struct in6_addr		saddr;
-	struct in6_addr		daddr;
-};
-
-struct ipv6_opt_hdr {
-	uint8_t			nexthdr;
-	uint8_t			hdrlen;
-	/*
-	 * TLV encoded option data follows.
-	 */
-} __attribute__((packed));	/* required for some archs */
-
 /* cppcheck-suppress funcArgNamesDifferent */
 __attribute__ ((weak)) int ffsl(long int i) { return __builtin_ffsl(i); }
-char *ipv6_l4hdr(const struct pool *p, int idx, size_t offset, uint8_t *proto,
-		 size_t *dlen);
 int sock_l4(const struct ctx *c, int af, uint8_t proto,
 	    const void *bind_addr, const char *ifname, uint16_t port,
 	    uint32_t data);
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 6/9] checksum: use csum_ip4_header() in udp.c and tcp.c
  2024-02-17 15:07 [PATCH v3 0/9] Add vhost-user support to passt (part 1) Laurent Vivier
                   ` (4 preceding siblings ...)
  2024-02-17 15:07 ` [PATCH v3 5/9] util: move IP stuff from util.[ch] to ip.[ch] Laurent Vivier
@ 2024-02-17 15:07 ` Laurent Vivier
  2024-02-19  3:01   ` David Gibson
  2024-02-29 16:24   ` Stefano Brivio
  2024-02-17 15:07 ` [PATCH v3 7/9] checksum: introduce functions to compute the header part checksum for TCP/UDP Laurent Vivier
                   ` (3 subsequent siblings)
  9 siblings, 2 replies; 38+ messages in thread
From: Laurent Vivier @ 2024-02-17 15:07 UTC (permalink / raw)
  To: passt-dev; +Cc: Laurent Vivier

We can find the same function to compute the IPv4 header
checksum in tcp.c, udp.c and tap.c

Use the function defined for tap.c, csum_ip4_header(), but
with the code used in tcp.c and udp.c as it doesn't need a fully
initialiazed IPv4 header, only protocol, tot_len, saddr and daddr.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---

Notes:
    v3:
      - function parameters provide tot_len, saddr, daddr and protocol
       rather than an iphdr
    
    v2:
      - use csum_ip4_header() from checksum.c
      - use code from tcp.c and udp.c in csum_ip4_header()
      - use "const struct iphfr *", check is not updated by the
        function but by the caller.

 checksum.c | 17 +++++++++++++----
 checksum.h |  3 ++-
 tap.c      |  3 ++-
 tcp.c      | 24 +++---------------------
 udp.c      | 20 ++------------------
 5 files changed, 22 insertions(+), 45 deletions(-)

diff --git a/checksum.c b/checksum.c
index 74e3742bc6f6..511b296a9a80 100644
--- a/checksum.c
+++ b/checksum.c
@@ -57,6 +57,7 @@
 #include <linux/icmpv6.h>
 
 #include "util.h"
+#include "ip.h"
 #include "checksum.h"
 
 /* Checksums are optional for UDP over IPv4, so we usually just set
@@ -116,13 +117,21 @@ uint16_t csum_fold(uint32_t sum)
 uint16_t csum(const void *buf, size_t len, uint32_t init);
 
 /**
- * csum_ip4_header() - Calculate and set IPv4 header checksum
+ * csum_ip4_header() - Calculate IPv4 header checksum
  * @ip4h:	IPv4 header
  */
-void csum_ip4_header(struct iphdr *ip4h)
+uint16_t csum_ip4_header(uint16_t tot_len, uint8_t protocol,
+			 uint32_t saddr, uint32_t daddr)
 {
-	ip4h->check = 0;
-	ip4h->check = csum(ip4h, (size_t)ip4h->ihl * 4, 0);
+	uint32_t sum = L2_BUF_IP4_PSUM(protocol);
+
+	sum += tot_len;
+	sum += (saddr >> 16) & 0xffff;
+	sum += saddr & 0xffff;
+	sum += (daddr >> 16) & 0xffff;
+	sum += daddr & 0xffff;
+
+	return ~csum_fold(sum);
 }
 
 /**
diff --git a/checksum.h b/checksum.h
index dfa705a04a24..92db73612b6e 100644
--- a/checksum.h
+++ b/checksum.h
@@ -13,7 +13,8 @@ struct icmp6hdr;
 uint32_t sum_16b(const void *buf, size_t len);
 uint16_t csum_fold(uint32_t sum);
 uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init);
-void csum_ip4_header(struct iphdr *ip4h);
+uint16_t csum_ip4_header(uint16_t tot_len, uint8_t protocol,
+			 uint32_t saddr, uint32_t daddr);
 void csum_udp4(struct udphdr *udp4hr,
 	       struct in_addr saddr, struct in_addr daddr,
 	       const void *payload, size_t len);
diff --git a/tap.c b/tap.c
index 3ea03f720d6d..2d590e8525a0 100644
--- a/tap.c
+++ b/tap.c
@@ -160,7 +160,8 @@ static void *tap_push_ip4h(char *buf, struct in_addr src, struct in_addr dst,
 	ip4h->protocol = proto;
 	ip4h->saddr = src.s_addr;
 	ip4h->daddr = dst.s_addr;
-	csum_ip4_header(ip4h);
+	ip4h->check = csum_ip4_header(ip4h->tot_len, proto,
+				      src.s_addr, dst.s_addr);
 	return ip4h + 1;
 }
 
diff --git a/tcp.c b/tcp.c
index 45ef5146729a..8887656d3ee8 100644
--- a/tcp.c
+++ b/tcp.c
@@ -934,23 +934,6 @@ static void tcp_sock_set_bufsize(const struct ctx *c, int s)
 		trace("TCP: failed to set SO_SNDBUF to %i", v);
 }
 
-/**
- * tcp_update_check_ip4() - Update IPv4 with variable parts from stored one
- * @buf:	L2 packet buffer with final IPv4 header
- */
-static void tcp_update_check_ip4(struct tcp4_l2_buf_t *buf)
-{
-	uint32_t sum = L2_BUF_IP4_PSUM(IPPROTO_TCP);
-
-	sum += buf->iph.tot_len;
-	sum += (buf->iph.saddr >> 16) & 0xffff;
-	sum += buf->iph.saddr & 0xffff;
-	sum += (buf->iph.daddr >> 16) & 0xffff;
-	sum += buf->iph.daddr & 0xffff;
-
-	buf->iph.check = (uint16_t)~csum_fold(sum);
-}
-
 /**
  * tcp_update_check_tcp4() - Update TCP checksum from stored one
  * @buf:	L2 packet buffer with final IPv4 header
@@ -1393,10 +1376,9 @@ do {									\
 		b->iph.saddr = a4->s_addr;
 		b->iph.daddr = c->ip4.addr_seen.s_addr;
 
-		if (check)
-			b->iph.check = *check;
-		else
-			tcp_update_check_ip4(b);
+		b->iph.check = check ? *check :
+			       csum_ip4_header(b->iph.tot_len, IPPROTO_TCP,
+					       b->iph.saddr, b->iph.daddr);
 
 		SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq);
 
diff --git a/udp.c b/udp.c
index 56b58bd8b43a..cd34f659210b 100644
--- a/udp.c
+++ b/udp.c
@@ -270,23 +270,6 @@ static void udp_invert_portmap(struct udp_port_fwd *fwd)
 	}
 }
 
-/**
- * udp_update_check4() - Update checksum with variable parts from stored one
- * @buf:	L2 packet buffer with final IPv4 header
- */
-static void udp_update_check4(struct udp4_l2_buf_t *buf)
-{
-	uint32_t sum = L2_BUF_IP4_PSUM(IPPROTO_UDP);
-
-	sum += buf->iph.tot_len;
-	sum += (buf->iph.saddr >> 16) & 0xffff;
-	sum += buf->iph.saddr & 0xffff;
-	sum += (buf->iph.daddr >> 16) & 0xffff;
-	sum += buf->iph.daddr & 0xffff;
-
-	buf->iph.check = (uint16_t)~csum_fold(sum);
-}
-
 /**
  * udp_update_l2_buf() - Update L2 buffers with Ethernet and IPv4 addresses
  * @eth_d:	Ethernet destination address, NULL if unchanged
@@ -614,7 +597,8 @@ static size_t udp_update_hdr4(const struct ctx *c, int n, in_port_t dstport,
 		b->iph.saddr = b->s_in.sin_addr.s_addr;
 	}
 
-	udp_update_check4(b);
+	b->iph.check = csum_ip4_header(b->iph.tot_len, IPPROTO_UDP,
+				       b->iph.saddr, b->iph.daddr);
 	b->uh.source = b->s_in.sin_port;
 	b->uh.dest = htons(dstport);
 	b->uh.len = htons(udp4_l2_mh_sock[n].msg_len + sizeof(b->uh));
-- 
@@ -270,23 +270,6 @@ static void udp_invert_portmap(struct udp_port_fwd *fwd)
 	}
 }
 
-/**
- * udp_update_check4() - Update checksum with variable parts from stored one
- * @buf:	L2 packet buffer with final IPv4 header
- */
-static void udp_update_check4(struct udp4_l2_buf_t *buf)
-{
-	uint32_t sum = L2_BUF_IP4_PSUM(IPPROTO_UDP);
-
-	sum += buf->iph.tot_len;
-	sum += (buf->iph.saddr >> 16) & 0xffff;
-	sum += buf->iph.saddr & 0xffff;
-	sum += (buf->iph.daddr >> 16) & 0xffff;
-	sum += buf->iph.daddr & 0xffff;
-
-	buf->iph.check = (uint16_t)~csum_fold(sum);
-}
-
 /**
  * udp_update_l2_buf() - Update L2 buffers with Ethernet and IPv4 addresses
  * @eth_d:	Ethernet destination address, NULL if unchanged
@@ -614,7 +597,8 @@ static size_t udp_update_hdr4(const struct ctx *c, int n, in_port_t dstport,
 		b->iph.saddr = b->s_in.sin_addr.s_addr;
 	}
 
-	udp_update_check4(b);
+	b->iph.check = csum_ip4_header(b->iph.tot_len, IPPROTO_UDP,
+				       b->iph.saddr, b->iph.daddr);
 	b->uh.source = b->s_in.sin_port;
 	b->uh.dest = htons(dstport);
 	b->uh.len = htons(udp4_l2_mh_sock[n].msg_len + sizeof(b->uh));
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 7/9] checksum: introduce functions to compute the header part checksum for TCP/UDP
  2024-02-17 15:07 [PATCH v3 0/9] Add vhost-user support to passt (part 1) Laurent Vivier
                   ` (5 preceding siblings ...)
  2024-02-17 15:07 ` [PATCH v3 6/9] checksum: use csum_ip4_header() in udp.c and tcp.c Laurent Vivier
@ 2024-02-17 15:07 ` Laurent Vivier
  2024-02-19  3:08   ` David Gibson
  2024-02-17 15:07 ` [PATCH v3 8/9] tap: make tap_update_mac() generic Laurent Vivier
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 38+ messages in thread
From: Laurent Vivier @ 2024-02-17 15:07 UTC (permalink / raw)
  To: passt-dev; +Cc: Laurent Vivier

The TCP and UDP checksums are computed using the data in the TCP/UDP
payload but also some informations in the IP header (protocol,
length, source and destination addresses).

We add two functions, proto_ipv4_header_psum() and
proto_ipv6_header_psum(), to compute the checksum of the IP
header part.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---

Notes:
    v3:
      - function parameters provide tot_len, saddr, daddr and protocol
        rather than an iphdr/ipv6hdr
    
    v2:
      - move new function to checksum.c
      - use _psum rather than _checksum in the name
      - replace csum_udp4() and csum_udp6() by the new function

 checksum.c | 67 ++++++++++++++++++++++++++++++++++++++++++------------
 checksum.h |  4 ++++
 tcp.c      | 44 ++++++++++++++++-------------------
 udp.c      | 10 ++++----
 4 files changed, 81 insertions(+), 44 deletions(-)

diff --git a/checksum.c b/checksum.c
index 511b296a9a80..55bf1340a257 100644
--- a/checksum.c
+++ b/checksum.c
@@ -134,6 +134,30 @@ uint16_t csum_ip4_header(uint16_t tot_len, uint8_t protocol,
 	return ~csum_fold(sum);
 }
 
+/**
+ * proto_ipv4_header_psum() - Calculates the partial checksum of an
+ * 			      IPv4 header for UDP or TCP
+ * @tot_len:	Payload length
+ * @proto:	Protocol number
+ * @saddr:	Source address
+ * @daddr:	Destination address
+ * @proto:	proto Protocol number
+ * Returns:	Partial checksum of the IPv4 header
+ */
+uint32_t proto_ipv4_header_psum(uint16_t tot_len, uint8_t protocol,
+				uint32_t saddr, uint32_t daddr)
+{
+	uint32_t psum = htons(protocol);
+
+	psum += (saddr >> 16) & 0xffff;
+	psum += saddr & 0xffff;
+	psum += (daddr >> 16) & 0xffff;
+	psum += daddr & 0xffff;
+	psum += htons(ntohs(tot_len) - 20);
+
+	return psum;
+}
+
 /**
  * csum_udp4() - Calculate and set checksum for a UDP over IPv4 packet
  * @udp4hr:	UDP header, initialised apart from checksum
@@ -150,14 +174,10 @@ void csum_udp4(struct udphdr *udp4hr,
 	udp4hr->check = 0;
 
 	if (UDP4_REAL_CHECKSUMS) {
-		/* UNTESTED: if we did want real UDPv4 checksums, this
-		 * is roughly what we'd need */
-		uint32_t psum = csum_fold(saddr.s_addr)
-			+ csum_fold(daddr.s_addr)
-			+ htons(len + sizeof(*udp4hr))
-			+ htons(IPPROTO_UDP);
-		/* Add in partial checksum for the UDP header alone */
-		psum += sum_16b(udp4hr, sizeof(*udp4hr));
+		uint32_t psum = proto_ipv4_header_psum(len, IPPROTO_UDP,
+						       saddr.s_addr,
+						       daddr.s_addr);
+		psum = csum_unfolded(udp4hr, sizeof(struct udphdr), psum);
 		udp4hr->check = csum(payload, len, psum);
 	}
 }
@@ -180,6 +200,26 @@ void csum_icmp4(struct icmphdr *icmp4hr, const void *payload, size_t len)
 	icmp4hr->checksum = csum(payload, len, psum);
 }
 
+/**
+ * proto_ipv6_header_psum() - Calculates the partial checksum of an
+ * 			      IPv6 header for UDP or TCP
+ * @payload_len:	Payload length
+ * @proto:		Protocol number
+ * @saddr:		Source address
+ * @daddr:		Destination address
+ * Returns:	Partial checksum of the IPv6 header
+ */
+uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol,
+				struct in6_addr saddr, struct in6_addr daddr)
+{
+	uint32_t sum = htons(protocol) + payload_len;
+
+	sum += sum_16b(&saddr, sizeof(saddr));
+	sum += sum_16b(&daddr, sizeof(daddr));
+
+	return sum;
+}
+
 /**
  * csum_udp6() - Calculate and set checksum for a UDP over IPv6 packet
  * @udp6hr:	UDP header, initialised apart from checksum
@@ -190,14 +230,11 @@ void csum_udp6(struct udphdr *udp6hr,
 	       const struct in6_addr *saddr, const struct in6_addr *daddr,
 	       const void *payload, size_t len)
 {
-	/* Partial checksum for the pseudo-IPv6 header */
-	uint32_t psum = sum_16b(saddr, sizeof(*saddr)) +
-		        sum_16b(daddr, sizeof(*daddr)) +
-		        htons(len + sizeof(*udp6hr)) + htons(IPPROTO_UDP);
-
+	uint32_t psum = proto_ipv6_header_psum(len, IPPROTO_UDP,
+					       *saddr, *daddr);
 	udp6hr->check = 0;
-	/* Add in partial checksum for the UDP header alone */
-	psum += sum_16b(udp6hr, sizeof(*udp6hr));
+
+	psum = csum_unfolded(udp6hr, sizeof(struct udphdr), psum);
 	udp6hr->check = csum(payload, len, psum);
 }
 
diff --git a/checksum.h b/checksum.h
index 92db73612b6e..b2b5b8e8b77e 100644
--- a/checksum.h
+++ b/checksum.h
@@ -15,10 +15,14 @@ uint16_t csum_fold(uint32_t sum);
 uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init);
 uint16_t csum_ip4_header(uint16_t tot_len, uint8_t protocol,
 			 uint32_t saddr, uint32_t daddr);
+uint32_t proto_ipv4_header_psum(uint16_t tot_len, uint8_t protocol,
+				uint32_t saddr, uint32_t daddr);
 void csum_udp4(struct udphdr *udp4hr,
 	       struct in_addr saddr, struct in_addr daddr,
 	       const void *payload, size_t len);
 void csum_icmp4(struct icmphdr *ih, const void *payload, size_t len);
+uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol,
+				struct in6_addr saddr, struct in6_addr daddr);
 void csum_udp6(struct udphdr *udp6hr,
 	       const struct in6_addr *saddr, const struct in6_addr *daddr,
 	       const void *payload, size_t len);
diff --git a/tcp.c b/tcp.c
index 8887656d3ee8..8ee252131504 100644
--- a/tcp.c
+++ b/tcp.c
@@ -938,39 +938,29 @@ static void tcp_sock_set_bufsize(const struct ctx *c, int s)
  * tcp_update_check_tcp4() - Update TCP checksum from stored one
  * @buf:	L2 packet buffer with final IPv4 header
  */
-static void tcp_update_check_tcp4(struct tcp4_l2_buf_t *buf)
+static void tcp_update_check_tcp4(struct iphdr *iph)
 {
-	uint16_t tlen = ntohs(buf->iph.tot_len) - 20;
-	uint32_t sum = htons(IPPROTO_TCP);
+	struct tcphdr *th = (struct tcphdr *)(iph + 1);
+	uint16_t tlen = ntohs(iph->tot_len) - 20;
+	uint32_t sum = proto_ipv4_header_psum(iph->tot_len, IPPROTO_TCP,
+					      iph->saddr, iph->daddr);
 
-	sum += (buf->iph.saddr >> 16) & 0xffff;
-	sum += buf->iph.saddr & 0xffff;
-	sum += (buf->iph.daddr >> 16) & 0xffff;
-	sum += buf->iph.daddr & 0xffff;
-	sum += htons(ntohs(buf->iph.tot_len) - 20);
-
-	buf->th.check = 0;
-	buf->th.check = csum(&buf->th, tlen, sum);
+	th->check = 0;
+	th->check = csum(th, tlen, sum);
 }
 
 /**
  * tcp_update_check_tcp6() - Calculate TCP checksum for IPv6
  * @buf:	L2 packet buffer with final IPv6 header
  */
-static void tcp_update_check_tcp6(struct tcp6_l2_buf_t *buf)
+static void tcp_update_check_tcp6(struct ipv6hdr *ip6h)
 {
-	int len = ntohs(buf->ip6h.payload_len) + sizeof(struct ipv6hdr);
-
-	buf->ip6h.hop_limit = IPPROTO_TCP;
-	buf->ip6h.version = 0;
-	buf->ip6h.nexthdr = 0;
+	struct tcphdr *th = (struct tcphdr *)(ip6h + 1);
+	uint32_t sum = proto_ipv6_header_psum(ip6h->payload_len, IPPROTO_TCP,
+					      ip6h->saddr, ip6h->daddr);
 
-	buf->th.check = 0;
-	buf->th.check = csum(&buf->ip6h, len, 0);
-
-	buf->ip6h.hop_limit = 255;
-	buf->ip6h.version = 6;
-	buf->ip6h.nexthdr = IPPROTO_TCP;
+	th->check = 0;
+	th->check = csum(th, ntohs(ip6h->payload_len), sum);
 }
 
 /**
@@ -1382,7 +1372,7 @@ do {									\
 
 		SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq);
 
-		tcp_update_check_tcp4(b);
+		tcp_update_check_tcp4(&b->iph);
 
 		tlen = tap_iov_len(c, &b->taph, ip_len);
 	} else {
@@ -1401,7 +1391,11 @@ do {									\
 
 		SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq);
 
-		tcp_update_check_tcp6(b);
+		tcp_update_check_tcp6(&b->ip6h);
+
+		b->ip6h.hop_limit = 255;
+		b->ip6h.version = 6;
+		b->ip6h.nexthdr = IPPROTO_TCP;
 
 		b->ip6h.flow_lbl[0] = (conn->sock >> 16) & 0xf;
 		b->ip6h.flow_lbl[1] = (conn->sock >> 8) & 0xff;
diff --git a/udp.c b/udp.c
index cd34f659210b..e123b35a955c 100644
--- a/udp.c
+++ b/udp.c
@@ -670,10 +670,12 @@ static size_t udp_update_hdr6(const struct ctx *c, int n, in_port_t dstport,
 	b->uh.source = b->s_in6.sin6_port;
 	b->uh.dest = htons(dstport);
 	b->uh.len = b->ip6h.payload_len;
-
-	b->ip6h.hop_limit = IPPROTO_UDP;
-	b->ip6h.version = b->ip6h.nexthdr = b->uh.check = 0;
-	b->uh.check = csum(&b->ip6h, ip_len, 0);
+	b->uh.check = 0;
+	b->uh.check = csum(&b->uh, ntohs(b->ip6h.payload_len),
+			   proto_ipv6_header_psum(b->ip6h.payload_len,
+						  IPPROTO_UDP,
+						  b->ip6h.saddr,
+						  b->ip6h.daddr));
 	b->ip6h.version = 6;
 	b->ip6h.nexthdr = IPPROTO_UDP;
 	b->ip6h.hop_limit = 255;
-- 
@@ -670,10 +670,12 @@ static size_t udp_update_hdr6(const struct ctx *c, int n, in_port_t dstport,
 	b->uh.source = b->s_in6.sin6_port;
 	b->uh.dest = htons(dstport);
 	b->uh.len = b->ip6h.payload_len;
-
-	b->ip6h.hop_limit = IPPROTO_UDP;
-	b->ip6h.version = b->ip6h.nexthdr = b->uh.check = 0;
-	b->uh.check = csum(&b->ip6h, ip_len, 0);
+	b->uh.check = 0;
+	b->uh.check = csum(&b->uh, ntohs(b->ip6h.payload_len),
+			   proto_ipv6_header_psum(b->ip6h.payload_len,
+						  IPPROTO_UDP,
+						  b->ip6h.saddr,
+						  b->ip6h.daddr));
 	b->ip6h.version = 6;
 	b->ip6h.nexthdr = IPPROTO_UDP;
 	b->ip6h.hop_limit = 255;
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 8/9] tap: make tap_update_mac() generic
  2024-02-17 15:07 [PATCH v3 0/9] Add vhost-user support to passt (part 1) Laurent Vivier
                   ` (6 preceding siblings ...)
  2024-02-17 15:07 ` [PATCH v3 7/9] checksum: introduce functions to compute the header part checksum for TCP/UDP Laurent Vivier
@ 2024-02-17 15:07 ` Laurent Vivier
  2024-02-17 15:07 ` [PATCH v3 9/9] tcp: Introduce ipv4_fill_headers()/ipv6_fill_headers() Laurent Vivier
  2024-02-29 16:31 ` [PATCH v3 0/9] Add vhost-user support to passt (part 1) Stefano Brivio
  9 siblings, 0 replies; 38+ messages in thread
From: Laurent Vivier @ 2024-02-17 15:07 UTC (permalink / raw)
  To: passt-dev; +Cc: Laurent Vivier, David Gibson

Use ethhdr rather than tap_hdr.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---

Notes:
    v3:
      - add David's R-b
    
    v2:
      - update function comment
      - move the patch earlier in the series

 tap.c | 10 +++++-----
 tap.h |  2 +-
 tcp.c |  8 ++++----
 udp.c |  4 ++--
 4 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/tap.c b/tap.c
index 2d590e8525a0..d555f59cb6f7 100644
--- a/tap.c
+++ b/tap.c
@@ -443,18 +443,18 @@ size_t tap_send_frames(const struct ctx *c, const struct iovec *iov, size_t n)
 }
 
 /**
- * tap_update_mac() - Update tap L2 header with new Ethernet addresses
- * @taph:	Tap headers to update
+ * eth_update_mac() - Update tap L2 header with new Ethernet addresses
+ * @eh:		Ethernet headers to update
  * @eth_d:	Ethernet destination address, NULL if unchanged
  * @eth_s:	Ethernet source address, NULL if unchanged
  */
-void tap_update_mac(struct tap_hdr *taph,
+void eth_update_mac(struct ethhdr *eh,
 		    const unsigned char *eth_d, const unsigned char *eth_s)
 {
 	if (eth_d)
-		memcpy(taph->eh.h_dest, eth_d, sizeof(taph->eh.h_dest));
+		memcpy(eh->h_dest, eth_d, sizeof(eh->h_dest));
 	if (eth_s)
-		memcpy(taph->eh.h_source, eth_s, sizeof(taph->eh.h_source));
+		memcpy(eh->h_source, eth_s, sizeof(eh->h_source));
 }
 
 PACKET_POOL_DECL(pool_l4, UIO_MAXIOV, pkt_buf);
diff --git a/tap.h b/tap.h
index 466d91466c3d..437b9aa2b43f 100644
--- a/tap.h
+++ b/tap.h
@@ -74,7 +74,7 @@ void tap_icmp6_send(const struct ctx *c,
 		    const void *in, size_t len);
 int tap_send(const struct ctx *c, const void *data, size_t len);
 size_t tap_send_frames(const struct ctx *c, const struct iovec *iov, size_t n);
-void tap_update_mac(struct tap_hdr *taph,
+void eth_update_mac(struct ethhdr *eh,
 		    const unsigned char *eth_d, const unsigned char *eth_s);
 void tap_listen_handler(struct ctx *c, uint32_t events);
 void tap_handler_pasta(struct ctx *c, uint32_t events,
diff --git a/tcp.c b/tcp.c
index 8ee252131504..aa03c20712f6 100644
--- a/tcp.c
+++ b/tcp.c
@@ -978,10 +978,10 @@ void tcp_update_l2_buf(const unsigned char *eth_d, const unsigned char *eth_s)
 		struct tcp4_l2_buf_t *b4 = &tcp4_l2_buf[i];
 		struct tcp6_l2_buf_t *b6 = &tcp6_l2_buf[i];
 
-		tap_update_mac(&b4->taph, eth_d, eth_s);
-		tap_update_mac(&b6->taph, eth_d, eth_s);
-		tap_update_mac(&b4f->taph, eth_d, eth_s);
-		tap_update_mac(&b6f->taph, eth_d, eth_s);
+		eth_update_mac(&b4->taph.eh, eth_d, eth_s);
+		eth_update_mac(&b6->taph.eh, eth_d, eth_s);
+		eth_update_mac(&b4f->taph.eh, eth_d, eth_s);
+		eth_update_mac(&b6f->taph.eh, eth_d, eth_s);
 	}
 }
 
diff --git a/udp.c b/udp.c
index e123b35a955c..dc7f4559b50d 100644
--- a/udp.c
+++ b/udp.c
@@ -283,8 +283,8 @@ void udp_update_l2_buf(const unsigned char *eth_d, const unsigned char *eth_s)
 		struct udp4_l2_buf_t *b4 = &udp4_l2_buf[i];
 		struct udp6_l2_buf_t *b6 = &udp6_l2_buf[i];
 
-		tap_update_mac(&b4->taph, eth_d, eth_s);
-		tap_update_mac(&b6->taph, eth_d, eth_s);
+		eth_update_mac(&b4->taph.eh, eth_d, eth_s);
+		eth_update_mac(&b6->taph.eh, eth_d, eth_s);
 	}
 }
 
-- 
@@ -283,8 +283,8 @@ void udp_update_l2_buf(const unsigned char *eth_d, const unsigned char *eth_s)
 		struct udp4_l2_buf_t *b4 = &udp4_l2_buf[i];
 		struct udp6_l2_buf_t *b6 = &udp6_l2_buf[i];
 
-		tap_update_mac(&b4->taph, eth_d, eth_s);
-		tap_update_mac(&b6->taph, eth_d, eth_s);
+		eth_update_mac(&b4->taph.eh, eth_d, eth_s);
+		eth_update_mac(&b6->taph.eh, eth_d, eth_s);
 	}
 }
 
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 9/9] tcp: Introduce ipv4_fill_headers()/ipv6_fill_headers()
  2024-02-17 15:07 [PATCH v3 0/9] Add vhost-user support to passt (part 1) Laurent Vivier
                   ` (7 preceding siblings ...)
  2024-02-17 15:07 ` [PATCH v3 8/9] tap: make tap_update_mac() generic Laurent Vivier
@ 2024-02-17 15:07 ` Laurent Vivier
  2024-02-19  3:14   ` David Gibson
  2024-02-29 16:29   ` Stefano Brivio
  2024-02-29 16:31 ` [PATCH v3 0/9] Add vhost-user support to passt (part 1) Stefano Brivio
  9 siblings, 2 replies; 38+ messages in thread
From: Laurent Vivier @ 2024-02-17 15:07 UTC (permalink / raw)
  To: passt-dev; +Cc: Laurent Vivier

Replace the macro SET_TCP_HEADER_COMMON_V4_V6() by a new function
tcp_fill_header().

Move IPv4 and IPv6 code from tcp_l2_buf_fill_headers() to
tcp_fill_ipv4_header() and tcp_fill_ipv6_header()

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---

Notes:
    v3:
      - add to sub-series part 1
    
    v2:
      - extract header filling functions from
        "tcp: extract buffer management from tcp_send_flag()"
      - rename them tcp_fill_flag_header()/tcp_fill_ipv4_header(),
        tcp_fill_ipv6_header()
      - use upside-down Christmas tree arguments order
      - replace (void *) by (struct tcphdr *)

 tcp.c | 154 +++++++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 104 insertions(+), 50 deletions(-)

diff --git a/tcp.c b/tcp.c
index aa03c20712f6..bc57a4f6e611 100644
--- a/tcp.c
+++ b/tcp.c
@@ -1324,6 +1324,108 @@ void tcp_defer_handler(struct ctx *c)
 	tcp_l2_data_buf_flush(c);
 }
 
+/**
+ * tcp_fill_header() - Fill the TCP header fields for a given TCP segment.
+ *
+ * @th:		Pointer to the TCP header structure
+ * @conn:	Pointer to the TCP connection structure
+ * @seq:	Sequence number
+ */
+static void tcp_fill_header(struct tcphdr *th,
+			       const struct tcp_tap_conn *conn, uint32_t seq)
+{
+	th->source = htons(conn->fport);
+	th->dest = htons(conn->eport);
+	th->seq = htonl(seq);
+	th->ack_seq = htonl(conn->seq_ack_to_tap);
+	if (conn->events & ESTABLISHED)	{
+		th->window = htons(conn->wnd_to_tap);
+	} else {
+		unsigned wnd = conn->wnd_to_tap << conn->ws_to_tap;
+
+		th->window = htons(MIN(wnd, USHRT_MAX));
+	}
+}
+
+/**
+ * tcp_fill_ipv4_header() - Fill 802.3, IPv4, TCP headers in pre-cooked buffers
+ * @c:		Execution context
+ * @conn:	Connection pointer
+ * @iph:	Pointer to IPv4 header, immediately followed by a TCP header
+ * @plen:	Payload length (including TCP header options)
+ * @check:	Checksum, if already known
+ * @seq:	Sequence number for this segment
+ *
+ * Return: IP frame length including L2 headers, host order
+ */
+static size_t tcp_fill_ipv4_header(const struct ctx *c,
+				   const struct tcp_tap_conn *conn,
+				   struct iphdr *iph, size_t plen,
+				   const uint16_t *check, uint32_t seq)
+{
+	size_t ip_len = plen + sizeof(struct iphdr) + sizeof(struct tcphdr);
+	const struct in_addr *a4 = inany_v4(&conn->faddr);
+	struct tcphdr *th = (struct tcphdr *)(iph + 1);
+
+	iph->tot_len = htons(ip_len);
+	iph->saddr = a4->s_addr;
+	iph->daddr = c->ip4.addr_seen.s_addr;
+
+	iph->check = check ? *check :
+		     csum_ip4_header(iph->tot_len, IPPROTO_TCP,
+				     iph->saddr, iph->daddr);
+
+
+	tcp_fill_header(th, conn, seq);
+
+	tcp_update_check_tcp4(iph);
+
+	return ip_len;
+}
+
+/**
+ * tcp_fill_ipv6_header() - Fill 802.3, IPv6, TCP headers in pre-cooked buffers
+ * @c:		Execution context
+ * @conn:	Connection pointer
+ * @ip6h:	Pointer to IPv6 header, immediately followed by a TCP header
+ * @plen:	Payload length (including TCP header options)
+ * @check:	Checksum, if already known
+ * @seq:	Sequence number for this segment
+ *
+ * Return: The total length of the IPv6 packet, host order
+ */
+static size_t tcp_fill_ipv6_header(const struct ctx *c,
+				   const struct tcp_tap_conn *conn,
+				   struct ipv6hdr *ip6h, size_t plen,
+				   uint32_t seq)
+{
+	size_t ip_len = plen + sizeof(struct ipv6hdr) + sizeof(struct tcphdr);
+	struct tcphdr *th = (struct tcphdr *)(ip6h + 1);
+
+	ip6h->payload_len = htons(plen + sizeof(struct tcphdr));
+	ip6h->saddr = conn->faddr.a6;
+	if (IN6_IS_ADDR_LINKLOCAL(&ip6h->saddr))
+		ip6h->daddr = c->ip6.addr_ll_seen;
+	else
+		ip6h->daddr = c->ip6.addr_seen;
+
+	memset(ip6h->flow_lbl, 0, 3);
+
+	tcp_fill_header(th, conn, seq);
+
+	tcp_update_check_tcp6(ip6h);
+
+	ip6h->hop_limit = 255;
+	ip6h->version = 6;
+	ip6h->nexthdr = IPPROTO_TCP;
+
+	ip6h->flow_lbl[0] = (conn->sock >> 16) & 0xf;
+	ip6h->flow_lbl[1] = (conn->sock >> 8) & 0xff;
+	ip6h->flow_lbl[2] = (conn->sock >> 0) & 0xff;
+
+	return ip_len;
+}
+
 /**
  * tcp_l2_buf_fill_headers() - Fill 802.3, IP, TCP headers in pre-cooked buffers
  * @c:		Execution context
@@ -1343,67 +1445,19 @@ static size_t tcp_l2_buf_fill_headers(const struct ctx *c,
 	const struct in_addr *a4 = inany_v4(&conn->faddr);
 	size_t ip_len, tlen;
 
-#define SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq)			\
-do {									\
-	b->th.source = htons(conn->fport);				\
-	b->th.dest = htons(conn->eport);				\
-	b->th.seq = htonl(seq);						\
-	b->th.ack_seq = htonl(conn->seq_ack_to_tap);			\
-	if (conn->events & ESTABLISHED)	{				\
-		b->th.window = htons(conn->wnd_to_tap);			\
-	} else {							\
-		unsigned wnd = conn->wnd_to_tap << conn->ws_to_tap;	\
-									\
-		b->th.window = htons(MIN(wnd, USHRT_MAX));		\
-	}								\
-} while (0)
-
 	if (a4) {
 		struct tcp4_l2_buf_t *b = (struct tcp4_l2_buf_t *)p;
 
-		ip_len = plen + sizeof(struct iphdr) + sizeof(struct tcphdr);
-		b->iph.tot_len = htons(ip_len);
-		b->iph.saddr = a4->s_addr;
-		b->iph.daddr = c->ip4.addr_seen.s_addr;
-
-		b->iph.check = check ? *check :
-			       csum_ip4_header(b->iph.tot_len, IPPROTO_TCP,
-					       b->iph.saddr, b->iph.daddr);
-
-		SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq);
-
-		tcp_update_check_tcp4(&b->iph);
+		ip_len = tcp_fill_ipv4_header(c, conn, &b->iph, plen, check, seq);
 
 		tlen = tap_iov_len(c, &b->taph, ip_len);
 	} else {
 		struct tcp6_l2_buf_t *b = (struct tcp6_l2_buf_t *)p;
 
-		ip_len = plen + sizeof(struct ipv6hdr) + sizeof(struct tcphdr);
-
-		b->ip6h.payload_len = htons(plen + sizeof(struct tcphdr));
-		b->ip6h.saddr = conn->faddr.a6;
-		if (IN6_IS_ADDR_LINKLOCAL(&b->ip6h.saddr))
-			b->ip6h.daddr = c->ip6.addr_ll_seen;
-		else
-			b->ip6h.daddr = c->ip6.addr_seen;
-
-		memset(b->ip6h.flow_lbl, 0, 3);
-
-		SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq);
-
-		tcp_update_check_tcp6(&b->ip6h);
-
-		b->ip6h.hop_limit = 255;
-		b->ip6h.version = 6;
-		b->ip6h.nexthdr = IPPROTO_TCP;
-
-		b->ip6h.flow_lbl[0] = (conn->sock >> 16) & 0xf;
-		b->ip6h.flow_lbl[1] = (conn->sock >> 8) & 0xff;
-		b->ip6h.flow_lbl[2] = (conn->sock >> 0) & 0xff;
+		ip_len = tcp_fill_ipv6_header(c, conn, &b->ip6h, plen, seq);
 
 		tlen = tap_iov_len(c, &b->taph, ip_len);
 	}
-#undef SET_TCP_HEADER_COMMON_V4_V6
 
 	return tlen;
 }
-- 
@@ -1324,6 +1324,108 @@ void tcp_defer_handler(struct ctx *c)
 	tcp_l2_data_buf_flush(c);
 }
 
+/**
+ * tcp_fill_header() - Fill the TCP header fields for a given TCP segment.
+ *
+ * @th:		Pointer to the TCP header structure
+ * @conn:	Pointer to the TCP connection structure
+ * @seq:	Sequence number
+ */
+static void tcp_fill_header(struct tcphdr *th,
+			       const struct tcp_tap_conn *conn, uint32_t seq)
+{
+	th->source = htons(conn->fport);
+	th->dest = htons(conn->eport);
+	th->seq = htonl(seq);
+	th->ack_seq = htonl(conn->seq_ack_to_tap);
+	if (conn->events & ESTABLISHED)	{
+		th->window = htons(conn->wnd_to_tap);
+	} else {
+		unsigned wnd = conn->wnd_to_tap << conn->ws_to_tap;
+
+		th->window = htons(MIN(wnd, USHRT_MAX));
+	}
+}
+
+/**
+ * tcp_fill_ipv4_header() - Fill 802.3, IPv4, TCP headers in pre-cooked buffers
+ * @c:		Execution context
+ * @conn:	Connection pointer
+ * @iph:	Pointer to IPv4 header, immediately followed by a TCP header
+ * @plen:	Payload length (including TCP header options)
+ * @check:	Checksum, if already known
+ * @seq:	Sequence number for this segment
+ *
+ * Return: IP frame length including L2 headers, host order
+ */
+static size_t tcp_fill_ipv4_header(const struct ctx *c,
+				   const struct tcp_tap_conn *conn,
+				   struct iphdr *iph, size_t plen,
+				   const uint16_t *check, uint32_t seq)
+{
+	size_t ip_len = plen + sizeof(struct iphdr) + sizeof(struct tcphdr);
+	const struct in_addr *a4 = inany_v4(&conn->faddr);
+	struct tcphdr *th = (struct tcphdr *)(iph + 1);
+
+	iph->tot_len = htons(ip_len);
+	iph->saddr = a4->s_addr;
+	iph->daddr = c->ip4.addr_seen.s_addr;
+
+	iph->check = check ? *check :
+		     csum_ip4_header(iph->tot_len, IPPROTO_TCP,
+				     iph->saddr, iph->daddr);
+
+
+	tcp_fill_header(th, conn, seq);
+
+	tcp_update_check_tcp4(iph);
+
+	return ip_len;
+}
+
+/**
+ * tcp_fill_ipv6_header() - Fill 802.3, IPv6, TCP headers in pre-cooked buffers
+ * @c:		Execution context
+ * @conn:	Connection pointer
+ * @ip6h:	Pointer to IPv6 header, immediately followed by a TCP header
+ * @plen:	Payload length (including TCP header options)
+ * @check:	Checksum, if already known
+ * @seq:	Sequence number for this segment
+ *
+ * Return: The total length of the IPv6 packet, host order
+ */
+static size_t tcp_fill_ipv6_header(const struct ctx *c,
+				   const struct tcp_tap_conn *conn,
+				   struct ipv6hdr *ip6h, size_t plen,
+				   uint32_t seq)
+{
+	size_t ip_len = plen + sizeof(struct ipv6hdr) + sizeof(struct tcphdr);
+	struct tcphdr *th = (struct tcphdr *)(ip6h + 1);
+
+	ip6h->payload_len = htons(plen + sizeof(struct tcphdr));
+	ip6h->saddr = conn->faddr.a6;
+	if (IN6_IS_ADDR_LINKLOCAL(&ip6h->saddr))
+		ip6h->daddr = c->ip6.addr_ll_seen;
+	else
+		ip6h->daddr = c->ip6.addr_seen;
+
+	memset(ip6h->flow_lbl, 0, 3);
+
+	tcp_fill_header(th, conn, seq);
+
+	tcp_update_check_tcp6(ip6h);
+
+	ip6h->hop_limit = 255;
+	ip6h->version = 6;
+	ip6h->nexthdr = IPPROTO_TCP;
+
+	ip6h->flow_lbl[0] = (conn->sock >> 16) & 0xf;
+	ip6h->flow_lbl[1] = (conn->sock >> 8) & 0xff;
+	ip6h->flow_lbl[2] = (conn->sock >> 0) & 0xff;
+
+	return ip_len;
+}
+
 /**
  * tcp_l2_buf_fill_headers() - Fill 802.3, IP, TCP headers in pre-cooked buffers
  * @c:		Execution context
@@ -1343,67 +1445,19 @@ static size_t tcp_l2_buf_fill_headers(const struct ctx *c,
 	const struct in_addr *a4 = inany_v4(&conn->faddr);
 	size_t ip_len, tlen;
 
-#define SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq)			\
-do {									\
-	b->th.source = htons(conn->fport);				\
-	b->th.dest = htons(conn->eport);				\
-	b->th.seq = htonl(seq);						\
-	b->th.ack_seq = htonl(conn->seq_ack_to_tap);			\
-	if (conn->events & ESTABLISHED)	{				\
-		b->th.window = htons(conn->wnd_to_tap);			\
-	} else {							\
-		unsigned wnd = conn->wnd_to_tap << conn->ws_to_tap;	\
-									\
-		b->th.window = htons(MIN(wnd, USHRT_MAX));		\
-	}								\
-} while (0)
-
 	if (a4) {
 		struct tcp4_l2_buf_t *b = (struct tcp4_l2_buf_t *)p;
 
-		ip_len = plen + sizeof(struct iphdr) + sizeof(struct tcphdr);
-		b->iph.tot_len = htons(ip_len);
-		b->iph.saddr = a4->s_addr;
-		b->iph.daddr = c->ip4.addr_seen.s_addr;
-
-		b->iph.check = check ? *check :
-			       csum_ip4_header(b->iph.tot_len, IPPROTO_TCP,
-					       b->iph.saddr, b->iph.daddr);
-
-		SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq);
-
-		tcp_update_check_tcp4(&b->iph);
+		ip_len = tcp_fill_ipv4_header(c, conn, &b->iph, plen, check, seq);
 
 		tlen = tap_iov_len(c, &b->taph, ip_len);
 	} else {
 		struct tcp6_l2_buf_t *b = (struct tcp6_l2_buf_t *)p;
 
-		ip_len = plen + sizeof(struct ipv6hdr) + sizeof(struct tcphdr);
-
-		b->ip6h.payload_len = htons(plen + sizeof(struct tcphdr));
-		b->ip6h.saddr = conn->faddr.a6;
-		if (IN6_IS_ADDR_LINKLOCAL(&b->ip6h.saddr))
-			b->ip6h.daddr = c->ip6.addr_ll_seen;
-		else
-			b->ip6h.daddr = c->ip6.addr_seen;
-
-		memset(b->ip6h.flow_lbl, 0, 3);
-
-		SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq);
-
-		tcp_update_check_tcp6(&b->ip6h);
-
-		b->ip6h.hop_limit = 255;
-		b->ip6h.version = 6;
-		b->ip6h.nexthdr = IPPROTO_TCP;
-
-		b->ip6h.flow_lbl[0] = (conn->sock >> 16) & 0xf;
-		b->ip6h.flow_lbl[1] = (conn->sock >> 8) & 0xff;
-		b->ip6h.flow_lbl[2] = (conn->sock >> 0) & 0xff;
+		ip_len = tcp_fill_ipv6_header(c, conn, &b->ip6h, plen, seq);
 
 		tlen = tap_iov_len(c, &b->taph, ip_len);
 	}
-#undef SET_TCP_HEADER_COMMON_V4_V6
 
 	return tlen;
 }
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 1/9] iov: add some functions to manage iovec
  2024-02-17 15:07 ` [PATCH v3 1/9] iov: add some functions to manage iovec Laurent Vivier
@ 2024-02-19  2:45   ` David Gibson
  0 siblings, 0 replies; 38+ messages in thread
From: David Gibson @ 2024-02-19  2:45 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 9770 bytes --]

On Sat, Feb 17, 2024 at 04:07:17PM +0100, Laurent Vivier wrote:
> Introduce functions to copy to/from a buffer from/to an iovec array,
> to compute data length in in bytes of an iovec and to copy memory from
> an iovec to another.
> 
> iov_from_buf(), iov_to_buf(), iov_size(), iov_copy().
> 
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
> 
> Notes:
>     v3:
>        - update copyright statement
>          (but keep it as proposed by Stefano)
>        - open code of iov_fill_from_buf()/iov_fill_to_buf()
>          into iov_frombuf()/iov_to_buf()
>        - coding style cleanup
>        - use size_t for the length of the vectors
>     
>     v2:
>        - reorder added files in alphanetical order in Makefile
>        - update comments, cosmetic cleanup
>        - rename iov_from_buf_full/iov_to_buf_full to
>          iov_fill_from_buf/iov_fill_to_buf
>        - split loops that manage offset and bytes copy.
>        - move iov_from_buf()/iov_to_buf() to iov.c
> 
>  Makefile |   8 +--
>  iov.c    | 171 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  iov.h    |  29 ++++++++++
>  3 files changed, 204 insertions(+), 4 deletions(-)
>  create mode 100644 iov.c
>  create mode 100644 iov.h
> 
> diff --git a/Makefile b/Makefile
> index af4fa87e7e13..156398b3844e 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -45,16 +45,16 @@ FLAGS += -DVERSION=\"$(VERSION)\"
>  FLAGS += -DDUAL_STACK_SOCKETS=$(DUAL_STACK_SOCKETS)
>  
>  PASST_SRCS = arch.c arp.c checksum.c conf.c dhcp.c dhcpv6.c flow.c icmp.c \
> -	igmp.c isolation.c lineread.c log.c mld.c ndp.c netlink.c packet.c \
> -	passt.c pasta.c pcap.c pif.c port_fwd.c tap.c tcp.c tcp_splice.c udp.c \
> -	util.c
> +	igmp.c iov.c isolation.c lineread.c log.c mld.c ndp.c netlink.c \
> +	packet.c passt.c pasta.c pcap.c pif.c port_fwd.c tap.c tcp.c \
> +	tcp_splice.c udp.c util.c
>  QRAP_SRCS = qrap.c
>  SRCS = $(PASST_SRCS) $(QRAP_SRCS)
>  
>  MANPAGES = passt.1 pasta.1 qrap.1
>  
>  PASST_HEADERS = arch.h arp.h checksum.h conf.h dhcp.h dhcpv6.h flow.h \
> -	flow_table.h icmp.h inany.h isolation.h lineread.h log.h ndp.h \
> +	flow_table.h icmp.h inany.h iov.h isolation.h lineread.h log.h ndp.h \
>  	netlink.h packet.h passt.h pasta.h pcap.h pif.h port_fwd.h siphash.h \
>  	tap.h tcp.h tcp_conn.h tcp_splice.h udp.h util.h
>  HEADERS = $(PASST_HEADERS) seccomp.h
> diff --git a/iov.c b/iov.c
> new file mode 100644
> index 000000000000..1135f87e2f45
> --- /dev/null
> +++ b/iov.c
> @@ -0,0 +1,171 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * iov.h - helpers for using (partial) iovecs.
> + *
> + * Copyright Red Hat
> + * Author: Laurent Vivier <lvivier@redhat.com>
> + *
> + * This file also contains code originally from QEMU include/qemu/iov.h
> + * and licensed under the following terms:
> + *
> + * Copyright (C) 2010 Red Hat, Inc.
> + *
> + * Author(s):
> + *  Amit Shah <amit.shah@redhat.com>
> + *  Michael Tokarev <mjt@tls.msk.ru>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> + * the COPYING file in the top-level directory.
> + *
> + * Contributions after 2012-01-13 are licensed under the terms of the
> + * GNU GPL, version 2 or (at your option) any later version.
> + */
> +#include <sys/socket.h>
> +
> +#include "util.h"
> +#include "iov.h"
> +
> +/**
> + * iov_from_buf - Copy data from a buffer to an I/O vector (struct iovec)
> + *                efficiently.
> + *
> + * @iov:       Pointer to the array of struct iovec describing the
> + *             scatter/gather I/O vector.
> + * @iov_cnt:   Number of elements in the iov array.
> + * @offset:    Byte offset in the iov array where copying should start.
> + * @buf:       Pointer to the source buffer containing the data to copy.
> + * @bytes:     Total number of bytes to copy from buf to iov.
> + *
> + * Returns:    The number of bytes successfully copied.
> + */
> +size_t iov_from_buf(const struct iovec *iov, size_t iov_cnt,
> +		    size_t offset, const void *buf, size_t bytes)
> +{
> +	unsigned int i;
> +	size_t copied;
> +
> +	if (__builtin_constant_p(bytes) && iov_cnt &&
> +		offset <= iov[0].iov_len && bytes <= iov[0].iov_len - offset) {
> +		memcpy((char *)iov[0].iov_base + offset, buf, bytes);
> +		return bytes;
> +	}
> +
> +	/* skipping offset bytes in the iovec */
> +	for (i = 0; offset >= iov[i].iov_len && i < iov_cnt; i++)
> +		offset -= iov[i].iov_len;
> +
> +	/* copying data */
> +	for (copied = 0; copied < bytes && i < iov_cnt; i++) {
> +		size_t len = MIN(iov[i].iov_len - offset, bytes - copied);
> +
> +		memcpy((char *)iov[i].iov_base + offset, (char *)buf + copied,
> +		       len);
> +		copied += len;
> +		offset = 0;
> +	}
> +
> +	return copied;
> +}
> +
> +/**
> + * iov_to_buf - Copy data from a scatter/gather I/O vector (struct iovec) to
> + *		a buffer efficiently.
> + *
> + * @iov:       Pointer to the array of struct iovec describing the scatter/gather
> + *             I/O vector.
> + * @iov_cnt:   Number of elements in the iov array.
> + * @offset:    Offset within the first element of iov from where copying should start.
> + * @buf:       Pointer to the destination buffer where data will be copied.
> + * @bytes:     Total number of bytes to copy from iov to buf.
> + *
> + * Returns:    The number of bytes successfully copied.
> + */
> +size_t iov_to_buf(const struct iovec *iov, size_t iov_cnt,
> +		  size_t offset, void *buf, size_t bytes)
> +{
> +	unsigned int i;
> +	size_t copied;
> +
> +	if (__builtin_constant_p(bytes) && iov_cnt &&
> +		offset <= iov[0].iov_len && bytes <= iov[0].iov_len - offset) {
> +		memcpy(buf, (char *)iov[0].iov_base + offset, bytes);
> +		return bytes;
> +	}
> +
> +	/* skipping offset bytes in the iovec */
> +	for (i = 0; offset >= iov[i].iov_len && i < iov_cnt; i++)
> +		offset -= iov[i].iov_len;
> +
> +	/* copying data */
> +	for (copied = 0; copied < bytes && i < iov_cnt; i++) {
> +		size_t len = MIN(iov[i].iov_len - offset, bytes - copied);
> +		memcpy((char *)buf + copied, (char *)iov[i].iov_base + offset,
> +		       len);
> +		copied += len;
> +		offset = 0;
> +	}
> +
> +	return copied;
> +}
> +
> +/**
> + * iov_size - Calculate the total size of a scatter/gather I/O vector
> + *            (struct iovec).
> + *
> + * @iov:       Pointer to the array of struct iovec describing the
> + *             scatter/gather I/O vector.
> + * @iov_cnt:   Number of elements in the iov array.
> + *
> + * Returns:    The total size in bytes.
> + */
> +size_t iov_size(const struct iovec *iov, size_t iov_cnt)
> +{
> +	unsigned int i;
> +	size_t len;
> +
> +	for (i = 0, len = 0; i < iov_cnt; i++)
> +		len += iov[i].iov_len;
> +
> +	return len;
> +}
> +
> +/**
> + * iov_copy - Copy data from one scatter/gather I/O vector (struct iovec) to
> + *            another.
> + *
> + * @dst_iov:      Pointer to the destination array of struct iovec describing
> + *                the scatter/gather I/O vector to copy to.
> + * @dst_iov_cnt:  Number of elements in the destination iov array.
> + * @iov:          Pointer to the source array of struct iovec describing
> + *                the scatter/gather I/O vector to copy from.
> + * @iov_cnt:      Number of elements in the source iov array.
> + * @offset:       Offset within the source iov from where copying should start.
> + * @bytes:        Total number of bytes to copy from iov to dst_iov.
> + *
> + * Returns:       The number of elements successfully copied to the destination
> + *                iov array.
> + */
> +unsigned iov_copy(struct iovec *dst_iov, size_t dst_iov_cnt,
> +		  const struct iovec *iov, size_t iov_cnt,
> +		  size_t offset, size_t bytes)
> +{
> +	unsigned int i, j;
> +	size_t len;
> +
> +	/* skipping offset bytes in the iovec */
> +	for (i = 0; offset >= iov[i].iov_len && i < iov_cnt; i++)
> +		offset -= iov[i].iov_len;
> +
> +	/* copying data */
> +	for (j = 0; i < iov_cnt && j < dst_iov_cnt && bytes; i++) {
> +		len = MIN(bytes, iov[i].iov_len - offset);
> +
> +		dst_iov[j].iov_base = (char *)iov[i].iov_base + offset;
> +		dst_iov[j].iov_len = len;
> +		j++;
> +		bytes -= len;
> +		offset = 0;
> +	}
> +
> +	return j;
> +}
> diff --git a/iov.h b/iov.h
> new file mode 100644
> index 000000000000..ee35a75d0870
> --- /dev/null
> +++ b/iov.h
> @@ -0,0 +1,29 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * iov.c - helpers for using (partial) iovecs.
> + *
> + * Copyrigh Red Hat
> + * Author: Laurent Vivier <lvivier@redhat.com>
> + *
> + * This file also contains code originally from QEMU include/qemu/iov.h:
> + *
> + * Author(s):
> + *  Amit Shah <amit.shah@redhat.com>
> + *  Michael Tokarev <mjt@tls.msk.ru>
> + */
> +
> +#ifndef IOVEC_H
> +#define IOVEC_H
> +
> +#include <unistd.h>
> +#include <string.h>
> +
> +size_t iov_from_buf(const struct iovec *iov, size_t iov_cnt,
> +                    size_t offset, const void *buf, size_t bytes);
> +size_t iov_to_buf(const struct iovec *iov, size_t iov_cnt,
> +                  size_t offset, void *buf, size_t bytes);
> +size_t iov_size(const struct iovec *iov, size_t iov_cnt);
> +unsigned iov_copy(struct iovec *dst_iov, size_t dst_iov_cnt,
> +		  const struct iovec *iov, size_t iov_cnt,
> +		  size_t offset, size_t bytes);
> +#endif /* IOVEC_H */

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 2/9] pcap: add pcap_iov()
  2024-02-17 15:07 ` [PATCH v3 2/9] pcap: add pcap_iov() Laurent Vivier
@ 2024-02-19  2:50   ` David Gibson
  0 siblings, 0 replies; 38+ messages in thread
From: David Gibson @ 2024-02-19  2:50 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 4139 bytes --]

On Sat, Feb 17, 2024 at 04:07:18PM +0100, Laurent Vivier wrote:
> Introduce a new function pcap_iov() to capture packet desribed by an IO
> vector.
> 
> Move packet header writing to to pcap_header() and use it in
> pcap_frame() and pcap_iov().
> 
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
> 
> Notes:
>     v3:
>       - update rational
>       - update comment
>       - use strerror(errno)
>       - use size_t for io vector length
>     
>     v2:
>       - introduce pcap_header(), a common helper to write
>         packet header
>       - use writev() rather than write() in a loop
>       - add functions comment
> 
>  pcap.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++-------
>  pcap.h |  1 +
>  2 files changed, 56 insertions(+), 7 deletions(-)
> 
> diff --git a/pcap.c b/pcap.c
> index 501d52d4992b..4e213eea8113 100644
> --- a/pcap.c
> +++ b/pcap.c
> @@ -20,6 +20,7 @@
>  #include <sys/time.h>
>  #include <sys/types.h>
>  #include <sys/stat.h>
> +#include <sys/uio.h>
>  #include <fcntl.h>
>  #include <time.h>
>  #include <errno.h>
> @@ -31,6 +32,7 @@
>  #include "util.h"
>  #include "passt.h"
>  #include "log.h"
> +#include "iov.h"
>  
>  #define PCAP_VERSION_MINOR 4
>  
> @@ -65,6 +67,28 @@ struct pcap_pkthdr {
>  	uint32_t len;
>  };
>  
> +/*
> + * pcap_header() - Write a pcap packet header to pcap file
> + *
> + * @len:       Length of the packet data.
> + * @tv:        Timestamp for the packet.
> + *
> + * Return: -1 in case of error, otherwise, 0 to indicate success.
> + */
> +static int pcap_header(size_t len, const struct timeval *tv)
> +{
> +	struct pcap_pkthdr h;
> +
> +	h.tv_sec = tv->tv_sec;
> +	h.tv_usec = tv->tv_usec;
> +	h.caplen = h.len = len;
> +
> +	if (write(pcap_fd, &h, sizeof(h)) < 0)
> +		return -1;
> +
> +	return 0;
> +}
> +
>  /**
>   * pcap_frame() - Capture a single frame to pcap file with given timestamp
>   * @pkt:	Pointer to data buffer, including L2 headers
> @@ -75,13 +99,7 @@ struct pcap_pkthdr {
>   */
>  static int pcap_frame(const char *pkt, size_t len, const struct timeval *tv)
>  {
> -	struct pcap_pkthdr h;
> -
> -	h.tv_sec = tv->tv_sec;
> -	h.tv_usec = tv->tv_usec;
> -	h.caplen = h.len = len;
> -
> -	if (write(pcap_fd, &h, sizeof(h)) < 0 || write(pcap_fd, pkt, len) < 0)
> +	if (pcap_header(len, tv) < 0 || write(pcap_fd, pkt, len) < 0)
>  		return -errno;
>  
>  	return 0;
> @@ -130,6 +148,36 @@ void pcap_multiple(const struct iovec *iov, unsigned int n, size_t offset)
>  	}
>  }
>  
> +/*
> + * pcap_iov - Write packet data described by a scatter/gather I/O vector (iov)
> + *             to a pcap file descriptor (pcap_fd).
> + *
> + * @iov:       Pointer to the array of struct iovec describing the scatter/gather
> + *             I/O vector containing packet data to write, including L2 header
> + * @n:         Number of elements in the iov array.
> + */
> +void pcap_iov(const struct iovec *iov, size_t n)
> +{
> +	struct timeval tv;
> +	size_t len;
> +
> +	if (pcap_fd == -1)
> +		return;
> +
> +	gettimeofday(&tv, NULL);
> +
> +	len = iov_size(iov, n);
> +
> +	if (pcap_header(len, &tv) < 0) {
> +		debug("Cannot write pcap header");
> +		return;
> +	}
> +
> +	if (writev(pcap_fd, iov, n) < 0)
> +		debug("Cannot log packet using writev(): %s\n",
> +		      strerror(errno));
> +}
> +
>  /**
>   * pcap_init() - Initialise pcap file
>   * @c:		Execution context
> diff --git a/pcap.h b/pcap.h
> index da5a7e846b72..4950f617f4c8 100644
> --- a/pcap.h
> +++ b/pcap.h
> @@ -8,6 +8,7 @@
>  
>  void pcap(const char *pkt, size_t len);
>  void pcap_multiple(const struct iovec *iov, unsigned int n, size_t offset);
> +void pcap_iov(const struct iovec *iov, size_t n);
>  void pcap_init(struct ctx *c);
>  
>  #endif /* PCAP_H */

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 4/9] checksum: add csum_iov()
  2024-02-17 15:07 ` [PATCH v3 4/9] checksum: add csum_iov() Laurent Vivier
@ 2024-02-19  2:52   ` David Gibson
  0 siblings, 0 replies; 38+ messages in thread
From: David Gibson @ 2024-02-19  2:52 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 5011 bytes --]

On Sat, Feb 17, 2024 at 04:07:20PM +0100, Laurent Vivier wrote:
> Introduce the function csum_unfolded() that computes the unfolded
> 32-bit checksum of a data buffer, and call it from csum() that returns
> the folded value.
> 
> Introduce csum_iov() that computes the checksum using csum_folded() on
> all vectors of the iovec array and returns the folded result.
> 
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
> 
> Notes:
>     v3:
>       - update comments
>       - use size_t for the IO vectors length
>       - include checksum.h in checksum.c
>       - export csum_unfolded() (for later)
>     
>     v2:
>       - fix typo and superfluous space
>       - update comments
> 
>  checksum.c | 56 ++++++++++++++++++++++++++++++++++++++++++------------
>  checksum.h |  2 ++
>  2 files changed, 46 insertions(+), 12 deletions(-)
> 
> diff --git a/checksum.c b/checksum.c
> index 65486b4625ba..74e3742bc6f6 100644
> --- a/checksum.c
> +++ b/checksum.c
> @@ -57,6 +57,7 @@
>  #include <linux/icmpv6.h>
>  
>  #include "util.h"
> +#include "checksum.h"
>  
>  /* Checksums are optional for UDP over IPv4, so we usually just set
>   * them to 0.  Change this to 1 to calculate real UDP over IPv4
> @@ -385,16 +386,16 @@ less_than_128_bytes:
>  }
>  
>  /**
> - * csum() - Compute TCP/IP-style checksum
> - * @buf:	Input buffer
> - * @len:	Input length
> - * @init:	Initial 32-bit checksum, 0 for no pre-computed checksum
> + * csum_unfolded - Calculate the unfolded checksum of a data buffer.
>   *
> - * Return: 16-bit folded, complemented checksum
> + * @buf:   Input buffer
> + * @len:   Input length
> + * @init:  Initial 32-bit checksum, 0 for no pre-computed checksum
> + *
> + * Return: 32-bit unfolded
>   */
> -/* NOLINTNEXTLINE(clang-diagnostic-unknown-attributes) */
>  __attribute__((optimize("-fno-strict-aliasing")))	/* See csum_16b() */
> -uint16_t csum(const void *buf, size_t len, uint32_t init)
> +uint32_t csum_unfolded(const void *buf, size_t len, uint32_t init)
>  {
>  	intptr_t align = ROUND_UP((intptr_t)buf, sizeof(__m256i));
>  	unsigned int pad = align - (intptr_t)buf;
> @@ -408,16 +409,30 @@ uint16_t csum(const void *buf, size_t len, uint32_t init)
>  	if (len > pad)
>  		init = csum_avx2((void *)align, len - pad, init);
>  
> -	return (uint16_t)~csum_fold(init);
> +	return init;
>  }
> -
>  #else /* __AVX2__ */
> +/**
> + * csum_unfolded - Calculate the unfolded checksum of a data buffer.
> + *
> + * @buf:   Input buffer
> + * @len:   Input length
> + * @init:  Initial 32-bit checksum, 0 for no pre-computed checksum
> + *
> + * Return: 32-bit unfolded checksum
> + */
> +__attribute__((optimize("-fno-strict-aliasing")))	/* See csum_16b() */
> +uint32_t csum_unfolded(const void *buf, size_t len, uint32_t init)
> +{
> +	return sum_16b(buf, len) + init;
> +}
> +#endif /* !__AVX2__ */
>  
>  /**
>   * csum() - Compute TCP/IP-style checksum
>   * @buf:	Input buffer
>   * @len:	Input length
> - * @sum:	Initial 32-bit checksum, 0 for no pre-computed checksum
> + * @init:	Initial 32-bit checksum, 0 for no pre-computed checksum
>   *
>   * Return: 16-bit folded, complemented checksum
>   */
> @@ -425,7 +440,24 @@ uint16_t csum(const void *buf, size_t len, uint32_t init)
>  __attribute__((optimize("-fno-strict-aliasing")))	/* See csum_16b() */
>  uint16_t csum(const void *buf, size_t len, uint32_t init)
>  {
> -	return (uint16_t)~csum_fold(sum_16b(buf, len) + init);
> +	return (uint16_t)~csum_fold(csum_unfolded(buf, len, init));
>  }
>  
> -#endif /* !__AVX2__ */
> +/**
> + * csum_iov() - Calculates the unfolded checksum over an array of IO vectors
> + *
> + * @iov		Pointer to the array of IO vectors
> + * @n		Length of the array
> + * @init	Initial 32-bit checksum, 0 for no pre-computed checksum
> + *
> + * Return: 16-bit folded, complemented checksum
> + */
> +uint16_t csum_iov(struct iovec *iov, size_t n, uint32_t init)
> +{
> +	unsigned int i;
> +
> +	for (i = 0; i < n; i++)
> +		init = csum_unfolded(iov[i].iov_base, iov[i].iov_len, init);
> +
> +	return (uint16_t)~csum_fold(init);
> +}
> diff --git a/checksum.h b/checksum.h
> index 21c0310d3804..dfa705a04a24 100644
> --- a/checksum.h
> +++ b/checksum.h
> @@ -24,6 +24,8 @@ void csum_udp6(struct udphdr *udp6hr,
>  void csum_icmp6(struct icmp6hdr *icmp6hr,
>  		const struct in6_addr *saddr, const struct in6_addr *daddr,
>  		const void *payload, size_t len);
> +uint32_t csum_unfolded(const void *buf, size_t len, uint32_t init);
>  uint16_t csum(const void *buf, size_t len, uint32_t init);
> +uint16_t csum_iov(struct iovec *iov, size_t n, uint32_t init);
>  
>  #endif /* CHECKSUM_H */

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 5/9] util: move IP stuff from util.[ch] to ip.[ch]
  2024-02-17 15:07 ` [PATCH v3 5/9] util: move IP stuff from util.[ch] to ip.[ch] Laurent Vivier
@ 2024-02-19  2:59   ` David Gibson
  0 siblings, 0 replies; 38+ messages in thread
From: David Gibson @ 2024-02-19  2:59 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 15912 bytes --]

On Sat, Feb 17, 2024 at 04:07:21PM +0100, Laurent Vivier wrote:
> Introduce ip.[ch] file to encapsulate IP protocol handling functions and
> structures.  Modify various files to include the new header ip.h when
> it's needed.
> 
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> 
> Notes:
>     v3:
>       - rewrap rational

Tangential English usage note.  "rationale" is different from
"rational", though they're related.  "Rationale" is the noun - "an
explanation of the reason for the thing", "rational" is an adjective -
"sensible, logical, having a rationale".

(Except of course that in maths "rational" is also a noun - "a number
expressible as a ratio").

>       - add David's R-b
>     
>     v2:
>       - update rational and comments
> 
>  Makefile     |  8 ++---
>  conf.c       |  1 +
>  dhcp.c       |  1 +
>  flow.c       |  1 +
>  icmp.c       |  1 +
>  ip.c         | 72 +++++++++++++++++++++++++++++++++++++++++++
>  ip.h         | 86 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  ndp.c        |  1 +
>  port_fwd.c   |  1 +
>  qrap.c       |  1 +
>  tap.c        |  1 +
>  tcp.c        |  1 +
>  tcp_splice.c |  1 +
>  udp.c        |  1 +
>  util.c       | 55 ---------------------------------
>  util.h       | 76 ----------------------------------------------
>  16 files changed, 173 insertions(+), 135 deletions(-)
>  create mode 100644 ip.c
>  create mode 100644 ip.h
> 
> diff --git a/Makefile b/Makefile
> index 156398b3844e..e1ebb454bc6b 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -45,7 +45,7 @@ FLAGS += -DVERSION=\"$(VERSION)\"
>  FLAGS += -DDUAL_STACK_SOCKETS=$(DUAL_STACK_SOCKETS)
>  
>  PASST_SRCS = arch.c arp.c checksum.c conf.c dhcp.c dhcpv6.c flow.c icmp.c \
> -	igmp.c iov.c isolation.c lineread.c log.c mld.c ndp.c netlink.c \
> +	igmp.c iov.c ip.c isolation.c lineread.c log.c mld.c ndp.c netlink.c \
>  	packet.c passt.c pasta.c pcap.c pif.c port_fwd.c tap.c tcp.c \
>  	tcp_splice.c udp.c util.c
>  QRAP_SRCS = qrap.c
> @@ -54,9 +54,9 @@ SRCS = $(PASST_SRCS) $(QRAP_SRCS)
>  MANPAGES = passt.1 pasta.1 qrap.1
>  
>  PASST_HEADERS = arch.h arp.h checksum.h conf.h dhcp.h dhcpv6.h flow.h \
> -	flow_table.h icmp.h inany.h iov.h isolation.h lineread.h log.h ndp.h \
> -	netlink.h packet.h passt.h pasta.h pcap.h pif.h port_fwd.h siphash.h \
> -	tap.h tcp.h tcp_conn.h tcp_splice.h udp.h util.h
> +	flow_table.h icmp.h inany.h iov.h ip.h isolation.h lineread.h log.h \
> +	ndp.h netlink.h packet.h passt.h pasta.h pcap.h pif.h port_fwd.h \
> +	siphash.h tap.h tcp.h tcp_conn.h tcp_splice.h udp.h util.h
>  HEADERS = $(PASST_HEADERS) seccomp.h
>  
>  C := \#include <linux/tcp.h>\nstruct tcp_info x = { .tcpi_snd_wnd = 0 };
> diff --git a/conf.c b/conf.c
> index 5e15b665be9c..93bfda331349 100644
> --- a/conf.c
> +++ b/conf.c
> @@ -35,6 +35,7 @@
>  #include <netinet/if_ether.h>
>  
>  #include "util.h"
> +#include "ip.h"
>  #include "passt.h"
>  #include "netlink.h"
>  #include "udp.h"
> diff --git a/dhcp.c b/dhcp.c
> index 110772867632..ff4834a3dce9 100644
> --- a/dhcp.c
> +++ b/dhcp.c
> @@ -25,6 +25,7 @@
>  #include <limits.h>
>  
>  #include "util.h"
> +#include "ip.h"
>  #include "checksum.h"
>  #include "packet.h"
>  #include "passt.h"
> diff --git a/flow.c b/flow.c
> index 5e94a7a949e5..73d52bda8774 100644
> --- a/flow.c
> +++ b/flow.c
> @@ -11,6 +11,7 @@
>  #include <string.h>
>  
>  #include "util.h"
> +#include "ip.h"
>  #include "passt.h"
>  #include "siphash.h"
>  #include "inany.h"
> diff --git a/icmp.c b/icmp.c
> index 9434fc5a7490..3b85a8578316 100644
> --- a/icmp.c
> +++ b/icmp.c
> @@ -33,6 +33,7 @@
>  
>  #include "packet.h"
>  #include "util.h"
> +#include "ip.h"
>  #include "passt.h"
>  #include "tap.h"
>  #include "log.h"
> diff --git a/ip.c b/ip.c
> new file mode 100644
> index 000000000000..2cc7f6548aff
> --- /dev/null
> +++ b/ip.c
> @@ -0,0 +1,72 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +
> +/* PASST - Plug A Simple Socket Transport
> + *  for qemu/UNIX domain socket mode
> + *
> + * PASTA - Pack A Subtle Tap Abstraction
> + *  for network namespace/tap device mode
> + *
> + * ip.c - IP related functions
> + *
> + * Copyright (c) 2020-2021 Red Hat GmbH
> + * Author: Stefano Brivio <sbrivio@redhat.com>
> + */
> +
> +#include <stddef.h>
> +#include "util.h"
> +#include "ip.h"
> +
> +#define IPV6_NH_OPT(nh)							\
> +	((nh) == 0   || (nh) == 43  || (nh) == 44  || (nh) == 50  ||	\
> +	 (nh) == 51  || (nh) == 60  || (nh) == 135 || (nh) == 139 ||	\
> +	 (nh) == 140 || (nh) == 253 || (nh) == 254)
> +
> +/**
> + * ipv6_l4hdr() - Find pointer to L4 header in IPv6 packet and extract protocol
> + * @p:		Packet pool, packet number @idx has IPv6 header at @offset
> + * @idx:	Index of packet in pool
> + * @offset:	Pre-calculated IPv6 header offset
> + * @proto:	Filled with L4 protocol number
> + * @dlen:	Data length (payload excluding header extensions), set on return
> + *
> + * Return: pointer to L4 header, NULL if not found
> + */
> +char *ipv6_l4hdr(const struct pool *p, int idx, size_t offset, uint8_t *proto,
> +		 size_t *dlen)
> +{
> +	const struct ipv6_opt_hdr *o;
> +	const struct ipv6hdr *ip6h;
> +	char *base;
> +	int hdrlen;
> +	uint8_t nh;
> +
> +	base = packet_get(p, idx, 0, 0, NULL);
> +	ip6h = packet_get(p, idx, offset, sizeof(*ip6h), dlen);
> +	if (!ip6h)
> +		return NULL;
> +
> +	offset += sizeof(*ip6h);
> +
> +	nh = ip6h->nexthdr;
> +	if (!IPV6_NH_OPT(nh))
> +		goto found;
> +
> +	while ((o = packet_get_try(p, idx, offset, sizeof(*o), dlen))) {
> +		nh = o->nexthdr;
> +		hdrlen = (o->hdrlen + 1) * 8;
> +
> +		if (IPV6_NH_OPT(nh))
> +			offset += hdrlen;
> +		else
> +			goto found;
> +	}
> +
> +	return NULL;
> +
> +found:
> +	if (nh == 59)
> +		return NULL;
> +
> +	*proto = nh;
> +	return base + offset;
> +}
> diff --git a/ip.h b/ip.h
> new file mode 100644
> index 000000000000..b2e08bc049f3
> --- /dev/null
> +++ b/ip.h
> @@ -0,0 +1,86 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later
> + * Copyright (c) 2021 Red Hat GmbH
> + * Author: Stefano Brivio <sbrivio@redhat.com>
> + */
> +
> +#ifndef IP_H
> +#define IP_H
> +
> +#include <netinet/ip.h>
> +#include <netinet/ip6.h>
> +
> +#define IN4_IS_ADDR_UNSPECIFIED(a) \
> +	((a)->s_addr == htonl_constant(INADDR_ANY))
> +#define IN4_IS_ADDR_BROADCAST(a) \
> +	((a)->s_addr == htonl_constant(INADDR_BROADCAST))
> +#define IN4_IS_ADDR_LOOPBACK(a) \
> +	(ntohl((a)->s_addr) >> IN_CLASSA_NSHIFT == IN_LOOPBACKNET)
> +#define IN4_IS_ADDR_MULTICAST(a) \
> +	(IN_MULTICAST(ntohl((a)->s_addr)))
> +#define IN4_ARE_ADDR_EQUAL(a, b) \
> +	(((struct in_addr *)(a))->s_addr == ((struct in_addr *)b)->s_addr)
> +#define IN4ADDR_LOOPBACK_INIT \
> +	{ .s_addr	= htonl_constant(INADDR_LOOPBACK) }
> +#define IN4ADDR_ANY_INIT \
> +	{ .s_addr	= htonl_constant(INADDR_ANY) }
> +
> +#define L2_BUF_IP4_INIT(proto)						\
> +	{								\
> +		.version	= 4,					\
> +		.ihl		= 5,					\
> +		.tos		= 0,					\
> +		.tot_len	= 0,					\
> +		.id		= 0,					\
> +		.frag_off	= 0,					\
> +		.ttl		= 0xff,					\
> +		.protocol	= (proto),				\
> +		.saddr		= 0,					\
> +		.daddr		= 0,					\
> +	}
> +#define L2_BUF_IP4_PSUM(proto)	((uint32_t)htons_constant(0x4500) +	\
> +				 (uint32_t)htons_constant(0xff00 | (proto)))
> +
> +#define L2_BUF_IP6_INIT(proto)						\
> +	{								\
> +		.priority	= 0,					\
> +		.version	= 6,					\
> +		.flow_lbl	= { 0 },				\
> +		.payload_len	= 0,					\
> +		.nexthdr	= (proto),				\
> +		.hop_limit	= 255,					\
> +		.saddr		= IN6ADDR_ANY_INIT,			\
> +		.daddr		= IN6ADDR_ANY_INIT,			\
> +	}
> +
> +struct ipv6hdr {
> +#pragma GCC diagnostic push
> +#pragma GCC diagnostic ignored "-Wpedantic"
> +#if __BYTE_ORDER == __BIG_ENDIAN
> +	uint8_t			version:4,
> +				priority:4;
> +#else
> +	uint8_t			priority:4,
> +				version:4;
> +#endif
> +#pragma GCC diagnostic pop
> +	uint8_t			flow_lbl[3];
> +
> +	uint16_t		payload_len;
> +	uint8_t			nexthdr;
> +	uint8_t			hop_limit;
> +
> +	struct in6_addr		saddr;
> +	struct in6_addr		daddr;
> +};
> +
> +struct ipv6_opt_hdr {
> +	uint8_t			nexthdr;
> +	uint8_t			hdrlen;
> +	/*
> +	 * TLV encoded option data follows.
> +	 */
> +} __attribute__((packed));	/* required for some archs */
> +
> +char *ipv6_l4hdr(const struct pool *p, int idx, size_t offset, uint8_t *proto,
> +		 size_t *dlen);
> +#endif /* IP_H */
> diff --git a/ndp.c b/ndp.c
> index 4c85ab8bcaee..c58f4b222b76 100644
> --- a/ndp.c
> +++ b/ndp.c
> @@ -28,6 +28,7 @@
>  
>  #include "checksum.h"
>  #include "util.h"
> +#include "ip.h"
>  #include "passt.h"
>  #include "tap.h"
>  #include "log.h"
> diff --git a/port_fwd.c b/port_fwd.c
> index 6f6c836c57ad..e1ec31e2232c 100644
> --- a/port_fwd.c
> +++ b/port_fwd.c
> @@ -21,6 +21,7 @@
>  #include <stdio.h>
>  
>  #include "util.h"
> +#include "ip.h"
>  #include "port_fwd.h"
>  #include "passt.h"
>  #include "lineread.h"
> diff --git a/qrap.c b/qrap.c
> index 97f350a4bf0b..d59670621731 100644
> --- a/qrap.c
> +++ b/qrap.c
> @@ -32,6 +32,7 @@
>  #include <linux/icmpv6.h>
>  
>  #include "util.h"
> +#include "ip.h"
>  #include "passt.h"
>  #include "arp.h"
>  
> diff --git a/tap.c b/tap.c
> index 396dee7eef25..3ea03f720d6d 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -45,6 +45,7 @@
>  
>  #include "checksum.h"
>  #include "util.h"
> +#include "ip.h"
>  #include "passt.h"
>  #include "arp.h"
>  #include "dhcp.h"
> diff --git a/tcp.c b/tcp.c
> index 2ab443d5c3f2..45ef5146729a 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -289,6 +289,7 @@
>  
>  #include "checksum.h"
>  #include "util.h"
> +#include "ip.h"
>  #include "passt.h"
>  #include "tap.h"
>  #include "siphash.h"
> diff --git a/tcp_splice.c b/tcp_splice.c
> index 26d32065cd47..66575ca95a1e 100644
> --- a/tcp_splice.c
> +++ b/tcp_splice.c
> @@ -49,6 +49,7 @@
>  #include <sys/socket.h>
>  
>  #include "util.h"
> +#include "ip.h"
>  #include "passt.h"
>  #include "log.h"
>  #include "tcp_splice.h"
> diff --git a/udp.c b/udp.c
> index 933f24b81616..56b58bd8b43a 100644
> --- a/udp.c
> +++ b/udp.c
> @@ -112,6 +112,7 @@
>  
>  #include "checksum.h"
>  #include "util.h"
> +#include "ip.h"
>  #include "passt.h"
>  #include "tap.h"
>  #include "pcap.h"
> diff --git a/util.c b/util.c
> index 21b35ff94db1..f73ea1d98a09 100644
> --- a/util.c
> +++ b/util.c
> @@ -30,61 +30,6 @@
>  #include "packet.h"
>  #include "log.h"
>  
> -#define IPV6_NH_OPT(nh)							\
> -	((nh) == 0   || (nh) == 43  || (nh) == 44  || (nh) == 50  ||	\
> -	 (nh) == 51  || (nh) == 60  || (nh) == 135 || (nh) == 139 ||	\
> -	 (nh) == 140 || (nh) == 253 || (nh) == 254)
> -
> -/**
> - * ipv6_l4hdr() - Find pointer to L4 header in IPv6 packet and extract protocol
> - * @p:		Packet pool, packet number @idx has IPv6 header at @offset
> - * @idx:	Index of packet in pool
> - * @offset:	Pre-calculated IPv6 header offset
> - * @proto:	Filled with L4 protocol number
> - * @dlen:	Data length (payload excluding header extensions), set on return
> - *
> - * Return: pointer to L4 header, NULL if not found
> - */
> -char *ipv6_l4hdr(const struct pool *p, int idx, size_t offset, uint8_t *proto,
> -		 size_t *dlen)
> -{
> -	const struct ipv6_opt_hdr *o;
> -	const struct ipv6hdr *ip6h;
> -	char *base;
> -	int hdrlen;
> -	uint8_t nh;
> -
> -	base = packet_get(p, idx, 0, 0, NULL);
> -	ip6h = packet_get(p, idx, offset, sizeof(*ip6h), dlen);
> -	if (!ip6h)
> -		return NULL;
> -
> -	offset += sizeof(*ip6h);
> -
> -	nh = ip6h->nexthdr;
> -	if (!IPV6_NH_OPT(nh))
> -		goto found;
> -
> -	while ((o = packet_get_try(p, idx, offset, sizeof(*o), dlen))) {
> -		nh = o->nexthdr;
> -		hdrlen = (o->hdrlen + 1) * 8;
> -
> -		if (IPV6_NH_OPT(nh))
> -			offset += hdrlen;
> -		else
> -			goto found;
> -	}
> -
> -	return NULL;
> -
> -found:
> -	if (nh == 59)
> -		return NULL;
> -
> -	*proto = nh;
> -	return base + offset;
> -}
> -
>  /**
>   * sock_l4() - Create and bind socket for given L4, add to epoll list
>   * @c:		Execution context
> diff --git a/util.h b/util.h
> index d2320f8cc99a..f7c3dfee9972 100644
> --- a/util.h
> +++ b/util.h
> @@ -110,22 +110,6 @@
>  #define	htonl_constant(x)	(__bswap_constant_32(x))
>  #endif
>  
> -#define IN4_IS_ADDR_UNSPECIFIED(a) \
> -	((a)->s_addr == htonl_constant(INADDR_ANY))
> -#define IN4_IS_ADDR_BROADCAST(a) \
> -	((a)->s_addr == htonl_constant(INADDR_BROADCAST))
> -#define IN4_IS_ADDR_LOOPBACK(a) \
> -	(ntohl((a)->s_addr) >> IN_CLASSA_NSHIFT == IN_LOOPBACKNET)
> -#define IN4_IS_ADDR_MULTICAST(a) \
> -	(IN_MULTICAST(ntohl((a)->s_addr)))
> -#define IN4_ARE_ADDR_EQUAL(a, b) \
> -	(((struct in_addr *)(a))->s_addr == ((struct in_addr *)b)->s_addr)
> -#define IN4ADDR_LOOPBACK_INIT \
> -	{ .s_addr	= htonl_constant(INADDR_LOOPBACK) }
> -#define IN4ADDR_ANY_INIT \
> -	{ .s_addr	= htonl_constant(INADDR_ANY) }
> -
> -
>  #define NS_FN_STACK_SIZE	(RLIMIT_STACK_VAL * 1024 / 8)
>  int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags,
>  	     void *arg);
> @@ -138,34 +122,6 @@ int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags,
>  			 (void *)(arg));				\
>  	} while (0)
>  
> -#define L2_BUF_IP4_INIT(proto)						\
> -	{								\
> -		.version	= 4,					\
> -		.ihl		= 5,					\
> -		.tos		= 0,					\
> -		.tot_len	= 0,					\
> -		.id		= 0,					\
> -		.frag_off	= 0,					\
> -		.ttl		= 0xff,					\
> -		.protocol	= (proto),				\
> -		.saddr		= 0,					\
> -		.daddr		= 0,					\
> -	}
> -#define L2_BUF_IP4_PSUM(proto)	((uint32_t)htons_constant(0x4500) +	\
> -				 (uint32_t)htons_constant(0xff00 | (proto)))
> -
> -#define L2_BUF_IP6_INIT(proto)						\
> -	{								\
> -		.priority	= 0,					\
> -		.version	= 6,					\
> -		.flow_lbl	= { 0 },				\
> -		.payload_len	= 0,					\
> -		.nexthdr	= (proto),				\
> -		.hop_limit	= 255,					\
> -		.saddr		= IN6ADDR_ANY_INIT,			\
> -		.daddr		= IN6ADDR_ANY_INIT,			\
> -	}
> -
>  #define RCVBUF_BIG		(2UL * 1024 * 1024)
>  #define SNDBUF_BIG		(4UL * 1024 * 1024)
>  #define SNDBUF_SMALL		(128UL * 1024)
> @@ -173,45 +129,13 @@ int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags,
>  #include <net/if.h>
>  #include <limits.h>
>  #include <stdint.h>
> -#include <netinet/ip6.h>
>  
>  #include "packet.h"
>  
>  struct ctx;
>  
> -struct ipv6hdr {
> -#pragma GCC diagnostic push
> -#pragma GCC diagnostic ignored "-Wpedantic"
> -#if __BYTE_ORDER == __BIG_ENDIAN
> -	uint8_t			version:4,
> -				priority:4;
> -#else
> -	uint8_t			priority:4,
> -				version:4;
> -#endif
> -#pragma GCC diagnostic pop
> -	uint8_t			flow_lbl[3];
> -
> -	uint16_t		payload_len;
> -	uint8_t			nexthdr;
> -	uint8_t			hop_limit;
> -
> -	struct in6_addr		saddr;
> -	struct in6_addr		daddr;
> -};
> -
> -struct ipv6_opt_hdr {
> -	uint8_t			nexthdr;
> -	uint8_t			hdrlen;
> -	/*
> -	 * TLV encoded option data follows.
> -	 */
> -} __attribute__((packed));	/* required for some archs */
> -
>  /* cppcheck-suppress funcArgNamesDifferent */
>  __attribute__ ((weak)) int ffsl(long int i) { return __builtin_ffsl(i); }
> -char *ipv6_l4hdr(const struct pool *p, int idx, size_t offset, uint8_t *proto,
> -		 size_t *dlen);
>  int sock_l4(const struct ctx *c, int af, uint8_t proto,
>  	    const void *bind_addr, const char *ifname, uint16_t port,
>  	    uint32_t data);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 6/9] checksum: use csum_ip4_header() in udp.c and tcp.c
  2024-02-17 15:07 ` [PATCH v3 6/9] checksum: use csum_ip4_header() in udp.c and tcp.c Laurent Vivier
@ 2024-02-19  3:01   ` David Gibson
  2024-02-29 16:24   ` Stefano Brivio
  1 sibling, 0 replies; 38+ messages in thread
From: David Gibson @ 2024-02-19  3:01 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 6116 bytes --]

On Sat, Feb 17, 2024 at 04:07:22PM +0100, Laurent Vivier wrote:
> We can find the same function to compute the IPv4 header
> checksum in tcp.c, udp.c and tap.c
> 
> Use the function defined for tap.c, csum_ip4_header(), but
> with the code used in tcp.c and udp.c as it doesn't need a fully
> initialiazed IPv4 header, only protocol, tot_len, saddr and daddr.
> 
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
> 
> Notes:
>     v3:
>       - function parameters provide tot_len, saddr, daddr and protocol
>        rather than an iphdr
>     
>     v2:
>       - use csum_ip4_header() from checksum.c
>       - use code from tcp.c and udp.c in csum_ip4_header()
>       - use "const struct iphfr *", check is not updated by the
>         function but by the caller.
> 
>  checksum.c | 17 +++++++++++++----
>  checksum.h |  3 ++-
>  tap.c      |  3 ++-
>  tcp.c      | 24 +++---------------------
>  udp.c      | 20 ++------------------
>  5 files changed, 22 insertions(+), 45 deletions(-)
> 
> diff --git a/checksum.c b/checksum.c
> index 74e3742bc6f6..511b296a9a80 100644
> --- a/checksum.c
> +++ b/checksum.c
> @@ -57,6 +57,7 @@
>  #include <linux/icmpv6.h>
>  
>  #include "util.h"
> +#include "ip.h"
>  #include "checksum.h"
>  
>  /* Checksums are optional for UDP over IPv4, so we usually just set
> @@ -116,13 +117,21 @@ uint16_t csum_fold(uint32_t sum)
>  uint16_t csum(const void *buf, size_t len, uint32_t init);
>  
>  /**
> - * csum_ip4_header() - Calculate and set IPv4 header checksum
> + * csum_ip4_header() - Calculate IPv4 header checksum
>   * @ip4h:	IPv4 header
>   */
> -void csum_ip4_header(struct iphdr *ip4h)
> +uint16_t csum_ip4_header(uint16_t tot_len, uint8_t protocol,
> +			 uint32_t saddr, uint32_t daddr)
>  {
> -	ip4h->check = 0;
> -	ip4h->check = csum(ip4h, (size_t)ip4h->ihl * 4, 0);
> +	uint32_t sum = L2_BUF_IP4_PSUM(protocol);
> +
> +	sum += tot_len;
> +	sum += (saddr >> 16) & 0xffff;
> +	sum += saddr & 0xffff;
> +	sum += (daddr >> 16) & 0xffff;
> +	sum += daddr & 0xffff;
> +
> +	return ~csum_fold(sum);
>  }
>  
>  /**
> diff --git a/checksum.h b/checksum.h
> index dfa705a04a24..92db73612b6e 100644
> --- a/checksum.h
> +++ b/checksum.h
> @@ -13,7 +13,8 @@ struct icmp6hdr;
>  uint32_t sum_16b(const void *buf, size_t len);
>  uint16_t csum_fold(uint32_t sum);
>  uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init);
> -void csum_ip4_header(struct iphdr *ip4h);
> +uint16_t csum_ip4_header(uint16_t tot_len, uint8_t protocol,
> +			 uint32_t saddr, uint32_t daddr);
>  void csum_udp4(struct udphdr *udp4hr,
>  	       struct in_addr saddr, struct in_addr daddr,
>  	       const void *payload, size_t len);
> diff --git a/tap.c b/tap.c
> index 3ea03f720d6d..2d590e8525a0 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -160,7 +160,8 @@ static void *tap_push_ip4h(char *buf, struct in_addr src, struct in_addr dst,
>  	ip4h->protocol = proto;
>  	ip4h->saddr = src.s_addr;
>  	ip4h->daddr = dst.s_addr;
> -	csum_ip4_header(ip4h);
> +	ip4h->check = csum_ip4_header(ip4h->tot_len, proto,
> +				      src.s_addr, dst.s_addr);
>  	return ip4h + 1;
>  }
>  
> diff --git a/tcp.c b/tcp.c
> index 45ef5146729a..8887656d3ee8 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -934,23 +934,6 @@ static void tcp_sock_set_bufsize(const struct ctx *c, int s)
>  		trace("TCP: failed to set SO_SNDBUF to %i", v);
>  }
>  
> -/**
> - * tcp_update_check_ip4() - Update IPv4 with variable parts from stored one
> - * @buf:	L2 packet buffer with final IPv4 header
> - */
> -static void tcp_update_check_ip4(struct tcp4_l2_buf_t *buf)
> -{
> -	uint32_t sum = L2_BUF_IP4_PSUM(IPPROTO_TCP);
> -
> -	sum += buf->iph.tot_len;
> -	sum += (buf->iph.saddr >> 16) & 0xffff;
> -	sum += buf->iph.saddr & 0xffff;
> -	sum += (buf->iph.daddr >> 16) & 0xffff;
> -	sum += buf->iph.daddr & 0xffff;
> -
> -	buf->iph.check = (uint16_t)~csum_fold(sum);
> -}
> -
>  /**
>   * tcp_update_check_tcp4() - Update TCP checksum from stored one
>   * @buf:	L2 packet buffer with final IPv4 header
> @@ -1393,10 +1376,9 @@ do {									\
>  		b->iph.saddr = a4->s_addr;
>  		b->iph.daddr = c->ip4.addr_seen.s_addr;
>  
> -		if (check)
> -			b->iph.check = *check;
> -		else
> -			tcp_update_check_ip4(b);
> +		b->iph.check = check ? *check :
> +			       csum_ip4_header(b->iph.tot_len, IPPROTO_TCP,
> +					       b->iph.saddr, b->iph.daddr);
>  
>  		SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq);
>  
> diff --git a/udp.c b/udp.c
> index 56b58bd8b43a..cd34f659210b 100644
> --- a/udp.c
> +++ b/udp.c
> @@ -270,23 +270,6 @@ static void udp_invert_portmap(struct udp_port_fwd *fwd)
>  	}
>  }
>  
> -/**
> - * udp_update_check4() - Update checksum with variable parts from stored one
> - * @buf:	L2 packet buffer with final IPv4 header
> - */
> -static void udp_update_check4(struct udp4_l2_buf_t *buf)
> -{
> -	uint32_t sum = L2_BUF_IP4_PSUM(IPPROTO_UDP);
> -
> -	sum += buf->iph.tot_len;
> -	sum += (buf->iph.saddr >> 16) & 0xffff;
> -	sum += buf->iph.saddr & 0xffff;
> -	sum += (buf->iph.daddr >> 16) & 0xffff;
> -	sum += buf->iph.daddr & 0xffff;
> -
> -	buf->iph.check = (uint16_t)~csum_fold(sum);
> -}
> -
>  /**
>   * udp_update_l2_buf() - Update L2 buffers with Ethernet and IPv4 addresses
>   * @eth_d:	Ethernet destination address, NULL if unchanged
> @@ -614,7 +597,8 @@ static size_t udp_update_hdr4(const struct ctx *c, int n, in_port_t dstport,
>  		b->iph.saddr = b->s_in.sin_addr.s_addr;
>  	}
>  
> -	udp_update_check4(b);
> +	b->iph.check = csum_ip4_header(b->iph.tot_len, IPPROTO_UDP,
> +				       b->iph.saddr, b->iph.daddr);
>  	b->uh.source = b->s_in.sin_port;
>  	b->uh.dest = htons(dstport);
>  	b->uh.len = htons(udp4_l2_mh_sock[n].msg_len + sizeof(b->uh));

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 7/9] checksum: introduce functions to compute the header part checksum for TCP/UDP
  2024-02-17 15:07 ` [PATCH v3 7/9] checksum: introduce functions to compute the header part checksum for TCP/UDP Laurent Vivier
@ 2024-02-19  3:08   ` David Gibson
  2024-02-28 13:26     ` Laurent Vivier
  0 siblings, 1 reply; 38+ messages in thread
From: David Gibson @ 2024-02-19  3:08 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 9328 bytes --]

On Sat, Feb 17, 2024 at 04:07:23PM +0100, Laurent Vivier wrote:
> The TCP and UDP checksums are computed using the data in the TCP/UDP
> payload but also some informations in the IP header (protocol,
> length, source and destination addresses).
> 
> We add two functions, proto_ipv4_header_psum() and
> proto_ipv6_header_psum(), to compute the checksum of the IP
> header part.
> 
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> ---
> 
> Notes:
>     v3:
>       - function parameters provide tot_len, saddr, daddr and protocol
>         rather than an iphdr/ipv6hdr
>     
>     v2:
>       - move new function to checksum.c
>       - use _psum rather than _checksum in the name
>       - replace csum_udp4() and csum_udp6() by the new function
> 
>  checksum.c | 67 ++++++++++++++++++++++++++++++++++++++++++------------
>  checksum.h |  4 ++++
>  tcp.c      | 44 ++++++++++++++++-------------------
>  udp.c      | 10 ++++----
>  4 files changed, 81 insertions(+), 44 deletions(-)
> 
> diff --git a/checksum.c b/checksum.c
> index 511b296a9a80..55bf1340a257 100644
> --- a/checksum.c
> +++ b/checksum.c
> @@ -134,6 +134,30 @@ uint16_t csum_ip4_header(uint16_t tot_len, uint8_t protocol,
>  	return ~csum_fold(sum);
>  }
>  
> +/**
> + * proto_ipv4_header_psum() - Calculates the partial checksum of an
> + * 			      IPv4 header for UDP or TCP
> + * @tot_len:	Payload length
> + * @proto:	Protocol number
> + * @saddr:	Source address
> + * @daddr:	Destination address
> + * @proto:	proto Protocol number
> + * Returns:	Partial checksum of the IPv4 header
> + */
> +uint32_t proto_ipv4_header_psum(uint16_t tot_len, uint8_t protocol,
> +				uint32_t saddr, uint32_t daddr)
> +{
> +	uint32_t psum = htons(protocol);
> +
> +	psum += (saddr >> 16) & 0xffff;
> +	psum += saddr & 0xffff;
> +	psum += (daddr >> 16) & 0xffff;
> +	psum += daddr & 0xffff;
> +	psum += htons(ntohs(tot_len) - 20);
> +
> +	return psum;
> +}
> +
>  /**
>   * csum_udp4() - Calculate and set checksum for a UDP over IPv4 packet
>   * @udp4hr:	UDP header, initialised apart from checksum
> @@ -150,14 +174,10 @@ void csum_udp4(struct udphdr *udp4hr,
>  	udp4hr->check = 0;
>  
>  	if (UDP4_REAL_CHECKSUMS) {
> -		/* UNTESTED: if we did want real UDPv4 checksums, this
> -		 * is roughly what we'd need */
> -		uint32_t psum = csum_fold(saddr.s_addr)
> -			+ csum_fold(daddr.s_addr)
> -			+ htons(len + sizeof(*udp4hr))
> -			+ htons(IPPROTO_UDP);
> -		/* Add in partial checksum for the UDP header alone */
> -		psum += sum_16b(udp4hr, sizeof(*udp4hr));
> +		uint32_t psum = proto_ipv4_header_psum(len, IPPROTO_UDP,
> +						       saddr.s_addr,
> +						       daddr.s_addr);
> +		psum = csum_unfolded(udp4hr, sizeof(struct udphdr), psum);
>  		udp4hr->check = csum(payload, len, psum);
>  	}
>  }
> @@ -180,6 +200,26 @@ void csum_icmp4(struct icmphdr *icmp4hr, const void *payload, size_t len)
>  	icmp4hr->checksum = csum(payload, len, psum);
>  }
>  
> +/**
> + * proto_ipv6_header_psum() - Calculates the partial checksum of an
> + * 			      IPv6 header for UDP or TCP
> + * @payload_len:	Payload length
> + * @proto:		Protocol number
> + * @saddr:		Source address
> + * @daddr:		Destination address
> + * Returns:	Partial checksum of the IPv6 header
> + */
> +uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol,
> +				struct in6_addr saddr, struct in6_addr daddr)

Hrm, this is passing 2 16-byte IPv6 addresses by value, which might
not be what we want.

> +{
> +	uint32_t sum = htons(protocol) + payload_len;

Uh.. doesn't that need to be htons(payload_len + sizeof(struct
ipv6hdr)) rather than simply payload_len?

> +
> +	sum += sum_16b(&saddr, sizeof(saddr));
> +	sum += sum_16b(&daddr, sizeof(daddr));
> +
> +	return sum;
> +}
> +
>  /**
>   * csum_udp6() - Calculate and set checksum for a UDP over IPv6 packet
>   * @udp6hr:	UDP header, initialised apart from checksum
> @@ -190,14 +230,11 @@ void csum_udp6(struct udphdr *udp6hr,
>  	       const struct in6_addr *saddr, const struct in6_addr *daddr,
>  	       const void *payload, size_t len)
>  {
> -	/* Partial checksum for the pseudo-IPv6 header */
> -	uint32_t psum = sum_16b(saddr, sizeof(*saddr)) +
> -		        sum_16b(daddr, sizeof(*daddr)) +
> -		        htons(len + sizeof(*udp6hr)) + htons(IPPROTO_UDP);
> -
> +	uint32_t psum = proto_ipv6_header_psum(len, IPPROTO_UDP,
> +					       *saddr, *daddr);
>  	udp6hr->check = 0;
> -	/* Add in partial checksum for the UDP header alone */
> -	psum += sum_16b(udp6hr, sizeof(*udp6hr));
> +
> +	psum = csum_unfolded(udp6hr, sizeof(struct udphdr), psum);
>  	udp6hr->check = csum(payload, len, psum);
>  }
>  
> diff --git a/checksum.h b/checksum.h
> index 92db73612b6e..b2b5b8e8b77e 100644
> --- a/checksum.h
> +++ b/checksum.h
> @@ -15,10 +15,14 @@ uint16_t csum_fold(uint32_t sum);
>  uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init);
>  uint16_t csum_ip4_header(uint16_t tot_len, uint8_t protocol,
>  			 uint32_t saddr, uint32_t daddr);
> +uint32_t proto_ipv4_header_psum(uint16_t tot_len, uint8_t protocol,
> +				uint32_t saddr, uint32_t daddr);
>  void csum_udp4(struct udphdr *udp4hr,
>  	       struct in_addr saddr, struct in_addr daddr,
>  	       const void *payload, size_t len);
>  void csum_icmp4(struct icmphdr *ih, const void *payload, size_t len);
> +uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol,
> +				struct in6_addr saddr, struct in6_addr daddr);
>  void csum_udp6(struct udphdr *udp6hr,
>  	       const struct in6_addr *saddr, const struct in6_addr *daddr,
>  	       const void *payload, size_t len);
> diff --git a/tcp.c b/tcp.c
> index 8887656d3ee8..8ee252131504 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -938,39 +938,29 @@ static void tcp_sock_set_bufsize(const struct ctx *c, int s)
>   * tcp_update_check_tcp4() - Update TCP checksum from stored one
>   * @buf:	L2 packet buffer with final IPv4 header
>   */
> -static void tcp_update_check_tcp4(struct tcp4_l2_buf_t *buf)
> +static void tcp_update_check_tcp4(struct iphdr *iph)
>  {
> -	uint16_t tlen = ntohs(buf->iph.tot_len) - 20;
> -	uint32_t sum = htons(IPPROTO_TCP);
> +	struct tcphdr *th = (struct tcphdr *)(iph + 1);
> +	uint16_t tlen = ntohs(iph->tot_len) - 20;
> +	uint32_t sum = proto_ipv4_header_psum(iph->tot_len, IPPROTO_TCP,
> +					      iph->saddr, iph->daddr);
>  
> -	sum += (buf->iph.saddr >> 16) & 0xffff;
> -	sum += buf->iph.saddr & 0xffff;
> -	sum += (buf->iph.daddr >> 16) & 0xffff;
> -	sum += buf->iph.daddr & 0xffff;
> -	sum += htons(ntohs(buf->iph.tot_len) - 20);
> -
> -	buf->th.check = 0;
> -	buf->th.check = csum(&buf->th, tlen, sum);
> +	th->check = 0;
> +	th->check = csum(th, tlen, sum);
>  }
>  
>  /**
>   * tcp_update_check_tcp6() - Calculate TCP checksum for IPv6
>   * @buf:	L2 packet buffer with final IPv6 header
>   */
> -static void tcp_update_check_tcp6(struct tcp6_l2_buf_t *buf)
> +static void tcp_update_check_tcp6(struct ipv6hdr *ip6h)
>  {
> -	int len = ntohs(buf->ip6h.payload_len) + sizeof(struct ipv6hdr);
> -
> -	buf->ip6h.hop_limit = IPPROTO_TCP;
> -	buf->ip6h.version = 0;
> -	buf->ip6h.nexthdr = 0;
> +	struct tcphdr *th = (struct tcphdr *)(ip6h + 1);
> +	uint32_t sum = proto_ipv6_header_psum(ip6h->payload_len, IPPROTO_TCP,
> +					      ip6h->saddr, ip6h->daddr);
>  
> -	buf->th.check = 0;
> -	buf->th.check = csum(&buf->ip6h, len, 0);
> -
> -	buf->ip6h.hop_limit = 255;
> -	buf->ip6h.version = 6;
> -	buf->ip6h.nexthdr = IPPROTO_TCP;
> +	th->check = 0;
> +	th->check = csum(th, ntohs(ip6h->payload_len), sum);
>  }
>  
>  /**
> @@ -1382,7 +1372,7 @@ do {									\
>  
>  		SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq);
>  
> -		tcp_update_check_tcp4(b);
> +		tcp_update_check_tcp4(&b->iph);
>  
>  		tlen = tap_iov_len(c, &b->taph, ip_len);
>  	} else {
> @@ -1401,7 +1391,11 @@ do {									\
>  
>  		SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq);
>  
> -		tcp_update_check_tcp6(b);
> +		tcp_update_check_tcp6(&b->ip6h);
> +
> +		b->ip6h.hop_limit = 255;
> +		b->ip6h.version = 6;
> +		b->ip6h.nexthdr = IPPROTO_TCP;
>  
>  		b->ip6h.flow_lbl[0] = (conn->sock >> 16) & 0xf;
>  		b->ip6h.flow_lbl[1] = (conn->sock >> 8) & 0xff;
> diff --git a/udp.c b/udp.c
> index cd34f659210b..e123b35a955c 100644
> --- a/udp.c
> +++ b/udp.c
> @@ -670,10 +670,12 @@ static size_t udp_update_hdr6(const struct ctx *c, int n, in_port_t dstport,
>  	b->uh.source = b->s_in6.sin6_port;
>  	b->uh.dest = htons(dstport);
>  	b->uh.len = b->ip6h.payload_len;
> -
> -	b->ip6h.hop_limit = IPPROTO_UDP;
> -	b->ip6h.version = b->ip6h.nexthdr = b->uh.check = 0;
> -	b->uh.check = csum(&b->ip6h, ip_len, 0);
> +	b->uh.check = 0;
> +	b->uh.check = csum(&b->uh, ntohs(b->ip6h.payload_len),
> +			   proto_ipv6_header_psum(b->ip6h.payload_len,
> +						  IPPROTO_UDP,
> +						  b->ip6h.saddr,
> +						  b->ip6h.daddr));
>  	b->ip6h.version = 6;
>  	b->ip6h.nexthdr = IPPROTO_UDP;
>  	b->ip6h.hop_limit = 255;

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 9/9] tcp: Introduce ipv4_fill_headers()/ipv6_fill_headers()
  2024-02-17 15:07 ` [PATCH v3 9/9] tcp: Introduce ipv4_fill_headers()/ipv6_fill_headers() Laurent Vivier
@ 2024-02-19  3:14   ` David Gibson
  2024-02-29 16:29   ` Stefano Brivio
  1 sibling, 0 replies; 38+ messages in thread
From: David Gibson @ 2024-02-19  3:14 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 7661 bytes --]

On Sat, Feb 17, 2024 at 04:07:25PM +0100, Laurent Vivier wrote:
> Replace the macro SET_TCP_HEADER_COMMON_V4_V6() by a new function
> tcp_fill_header().
> 
> Move IPv4 and IPv6 code from tcp_l2_buf_fill_headers() to
> tcp_fill_ipv4_header() and tcp_fill_ipv6_header()
> 
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> ---
> 
> Notes:
>     v3:
>       - add to sub-series part 1
>     
>     v2:
>       - extract header filling functions from
>         "tcp: extract buffer management from tcp_send_flag()"
>       - rename them tcp_fill_flag_header()/tcp_fill_ipv4_header(),
>         tcp_fill_ipv6_header()
>       - use upside-down Christmas tree arguments order
>       - replace (void *) by (struct tcphdr *)
> 
>  tcp.c | 154 +++++++++++++++++++++++++++++++++++++++-------------------
>  1 file changed, 104 insertions(+), 50 deletions(-)
> 
> diff --git a/tcp.c b/tcp.c
> index aa03c20712f6..bc57a4f6e611 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -1324,6 +1324,108 @@ void tcp_defer_handler(struct ctx *c)
>  	tcp_l2_data_buf_flush(c);
>  }
>  
> +/**
> + * tcp_fill_header() - Fill the TCP header fields for a given TCP segment.
> + *
> + * @th:		Pointer to the TCP header structure
> + * @conn:	Pointer to the TCP connection structure
> + * @seq:	Sequence number
> + */
> +static void tcp_fill_header(struct tcphdr *th,
> +			       const struct tcp_tap_conn *conn, uint32_t seq)
> +{
> +	th->source = htons(conn->fport);
> +	th->dest = htons(conn->eport);
> +	th->seq = htonl(seq);
> +	th->ack_seq = htonl(conn->seq_ack_to_tap);
> +	if (conn->events & ESTABLISHED)	{
> +		th->window = htons(conn->wnd_to_tap);
> +	} else {
> +		unsigned wnd = conn->wnd_to_tap << conn->ws_to_tap;
> +
> +		th->window = htons(MIN(wnd, USHRT_MAX));
> +	}
> +}
> +
> +/**
> + * tcp_fill_ipv4_header() - Fill 802.3, IPv4, TCP headers in pre-cooked buffers
> + * @c:		Execution context
> + * @conn:	Connection pointer
> + * @iph:	Pointer to IPv4 header, immediately followed by a TCP header
> + * @plen:	Payload length (including TCP header options)
> + * @check:	Checksum, if already known
> + * @seq:	Sequence number for this segment
> + *
> + * Return: IP frame length including L2 headers, host order
> + */
> +static size_t tcp_fill_ipv4_header(const struct ctx *c,
> +				   const struct tcp_tap_conn *conn,
> +				   struct iphdr *iph, size_t plen,
> +				   const uint16_t *check, uint32_t seq)
> +{
> +	size_t ip_len = plen + sizeof(struct iphdr) + sizeof(struct tcphdr);
> +	const struct in_addr *a4 = inany_v4(&conn->faddr);
> +	struct tcphdr *th = (struct tcphdr *)(iph + 1);
> +
> +	iph->tot_len = htons(ip_len);
> +	iph->saddr = a4->s_addr;
> +	iph->daddr = c->ip4.addr_seen.s_addr;
> +
> +	iph->check = check ? *check :
> +		     csum_ip4_header(iph->tot_len, IPPROTO_TCP,
> +				     iph->saddr, iph->daddr);
> +
> +
> +	tcp_fill_header(th, conn, seq);
> +
> +	tcp_update_check_tcp4(iph);
> +
> +	return ip_len;
> +}
> +
> +/**
> + * tcp_fill_ipv6_header() - Fill 802.3, IPv6, TCP headers in pre-cooked buffers
> + * @c:		Execution context
> + * @conn:	Connection pointer
> + * @ip6h:	Pointer to IPv6 header, immediately followed by a TCP header
> + * @plen:	Payload length (including TCP header options)
> + * @check:	Checksum, if already known
> + * @seq:	Sequence number for this segment
> + *
> + * Return: The total length of the IPv6 packet, host order
> + */
> +static size_t tcp_fill_ipv6_header(const struct ctx *c,
> +				   const struct tcp_tap_conn *conn,
> +				   struct ipv6hdr *ip6h, size_t plen,
> +				   uint32_t seq)
> +{
> +	size_t ip_len = plen + sizeof(struct ipv6hdr) + sizeof(struct tcphdr);
> +	struct tcphdr *th = (struct tcphdr *)(ip6h + 1);
> +
> +	ip6h->payload_len = htons(plen + sizeof(struct tcphdr));
> +	ip6h->saddr = conn->faddr.a6;
> +	if (IN6_IS_ADDR_LINKLOCAL(&ip6h->saddr))
> +		ip6h->daddr = c->ip6.addr_ll_seen;
> +	else
> +		ip6h->daddr = c->ip6.addr_seen;
> +
> +	memset(ip6h->flow_lbl, 0, 3);
> +
> +	tcp_fill_header(th, conn, seq);
> +
> +	tcp_update_check_tcp6(ip6h);
> +
> +	ip6h->hop_limit = 255;
> +	ip6h->version = 6;
> +	ip6h->nexthdr = IPPROTO_TCP;
> +
> +	ip6h->flow_lbl[0] = (conn->sock >> 16) & 0xf;
> +	ip6h->flow_lbl[1] = (conn->sock >> 8) & 0xff;
> +	ip6h->flow_lbl[2] = (conn->sock >> 0) & 0xff;

IIUC, the reason part of the ip6h update is done after the TCP header
update, but part before was a consequence of how we did the
checksumming: we computed the pseudo-header checksum by doing a full
checksum operation over the partially filled header, meaning filling
the fields not in the pseudo-header had to be done afterwards.

Now that you've reworked the checksumming, that's no longer necessary,
so you could group all the ip6g initialisations together.   Oh.. and
also avoid the pre-filling of the flow_lbl with 0s before filling the
real values.

> +	return ip_len;
> +}
> +
>  /**
>   * tcp_l2_buf_fill_headers() - Fill 802.3, IP, TCP headers in pre-cooked buffers
>   * @c:		Execution context
> @@ -1343,67 +1445,19 @@ static size_t tcp_l2_buf_fill_headers(const struct ctx *c,
>  	const struct in_addr *a4 = inany_v4(&conn->faddr);
>  	size_t ip_len, tlen;
>  
> -#define SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq)			\
> -do {									\
> -	b->th.source = htons(conn->fport);				\
> -	b->th.dest = htons(conn->eport);				\
> -	b->th.seq = htonl(seq);						\
> -	b->th.ack_seq = htonl(conn->seq_ack_to_tap);			\
> -	if (conn->events & ESTABLISHED)	{				\
> -		b->th.window = htons(conn->wnd_to_tap);			\
> -	} else {							\
> -		unsigned wnd = conn->wnd_to_tap << conn->ws_to_tap;	\
> -									\
> -		b->th.window = htons(MIN(wnd, USHRT_MAX));		\
> -	}								\
> -} while (0)
> -
>  	if (a4) {
>  		struct tcp4_l2_buf_t *b = (struct tcp4_l2_buf_t *)p;
>  
> -		ip_len = plen + sizeof(struct iphdr) + sizeof(struct tcphdr);
> -		b->iph.tot_len = htons(ip_len);
> -		b->iph.saddr = a4->s_addr;
> -		b->iph.daddr = c->ip4.addr_seen.s_addr;
> -
> -		b->iph.check = check ? *check :
> -			       csum_ip4_header(b->iph.tot_len, IPPROTO_TCP,
> -					       b->iph.saddr, b->iph.daddr);
> -
> -		SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq);
> -
> -		tcp_update_check_tcp4(&b->iph);
> +		ip_len = tcp_fill_ipv4_header(c, conn, &b->iph, plen, check, seq);
>  
>  		tlen = tap_iov_len(c, &b->taph, ip_len);
>  	} else {
>  		struct tcp6_l2_buf_t *b = (struct tcp6_l2_buf_t *)p;
>  
> -		ip_len = plen + sizeof(struct ipv6hdr) + sizeof(struct tcphdr);
> -
> -		b->ip6h.payload_len = htons(plen + sizeof(struct tcphdr));
> -		b->ip6h.saddr = conn->faddr.a6;
> -		if (IN6_IS_ADDR_LINKLOCAL(&b->ip6h.saddr))
> -			b->ip6h.daddr = c->ip6.addr_ll_seen;
> -		else
> -			b->ip6h.daddr = c->ip6.addr_seen;
> -
> -		memset(b->ip6h.flow_lbl, 0, 3);
> -
> -		SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq);
> -
> -		tcp_update_check_tcp6(&b->ip6h);
> -
> -		b->ip6h.hop_limit = 255;
> -		b->ip6h.version = 6;
> -		b->ip6h.nexthdr = IPPROTO_TCP;
> -
> -		b->ip6h.flow_lbl[0] = (conn->sock >> 16) & 0xf;
> -		b->ip6h.flow_lbl[1] = (conn->sock >> 8) & 0xff;
> -		b->ip6h.flow_lbl[2] = (conn->sock >> 0) & 0xff;
> +		ip_len = tcp_fill_ipv6_header(c, conn, &b->ip6h, plen, seq);
>  
>  		tlen = tap_iov_len(c, &b->taph, ip_len);
>  	}
> -#undef SET_TCP_HEADER_COMMON_V4_V6
>  
>  	return tlen;
>  }

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 7/9] checksum: introduce functions to compute the header part checksum for TCP/UDP
  2024-02-19  3:08   ` David Gibson
@ 2024-02-28 13:26     ` Laurent Vivier
  2024-02-29  0:38       ` David Gibson
  0 siblings, 1 reply; 38+ messages in thread
From: Laurent Vivier @ 2024-02-28 13:26 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On 2/19/24 04:08, David Gibson wrote:
> On Sat, Feb 17, 2024 at 04:07:23PM +0100, Laurent Vivier wrote:
>> The TCP and UDP checksums are computed using the data in the TCP/UDP
>> payload but also some informations in the IP header (protocol,
>> length, source and destination addresses).
>>
>> We add two functions, proto_ipv4_header_psum() and
>> proto_ipv6_header_psum(), to compute the checksum of the IP
>> header part.
>>
>> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
>> ---
>>
>> Notes:
>>      v3:
>>        - function parameters provide tot_len, saddr, daddr and protocol
>>          rather than an iphdr/ipv6hdr
>>      
>>      v2:
>>        - move new function to checksum.c
>>        - use _psum rather than _checksum in the name
>>        - replace csum_udp4() and csum_udp6() by the new function
>>
>>   checksum.c | 67 ++++++++++++++++++++++++++++++++++++++++++------------
>>   checksum.h |  4 ++++
>>   tcp.c      | 44 ++++++++++++++++-------------------
>>   udp.c      | 10 ++++----
>>   4 files changed, 81 insertions(+), 44 deletions(-)
>>
>> diff --git a/checksum.c b/checksum.c
>> index 511b296a9a80..55bf1340a257 100644
>> --- a/checksum.c
>> +++ b/checksum.c
>> @@ -134,6 +134,30 @@ uint16_t csum_ip4_header(uint16_t tot_len, uint8_t protocol,
>>   	return ~csum_fold(sum);
>>   }
>>   
>> +/**
>> + * proto_ipv4_header_psum() - Calculates the partial checksum of an
>> + * 			      IPv4 header for UDP or TCP
>> + * @tot_len:	Payload length
>> + * @proto:	Protocol number
>> + * @saddr:	Source address
>> + * @daddr:	Destination address
>> + * @proto:	proto Protocol number
>> + * Returns:	Partial checksum of the IPv4 header
>> + */
>> +uint32_t proto_ipv4_header_psum(uint16_t tot_len, uint8_t protocol,
>> +				uint32_t saddr, uint32_t daddr)
>> +{
>> +	uint32_t psum = htons(protocol);
>> +
>> +	psum += (saddr >> 16) & 0xffff;
>> +	psum += saddr & 0xffff;
>> +	psum += (daddr >> 16) & 0xffff;
>> +	psum += daddr & 0xffff;
>> +	psum += htons(ntohs(tot_len) - 20);
>> +
>> +	return psum;
>> +}
>> +
>>   /**
>>    * csum_udp4() - Calculate and set checksum for a UDP over IPv4 packet
>>    * @udp4hr:	UDP header, initialised apart from checksum
>> @@ -150,14 +174,10 @@ void csum_udp4(struct udphdr *udp4hr,
>>   	udp4hr->check = 0;
>>   
>>   	if (UDP4_REAL_CHECKSUMS) {
>> -		/* UNTESTED: if we did want real UDPv4 checksums, this
>> -		 * is roughly what we'd need */
>> -		uint32_t psum = csum_fold(saddr.s_addr)
>> -			+ csum_fold(daddr.s_addr)
>> -			+ htons(len + sizeof(*udp4hr))
>> -			+ htons(IPPROTO_UDP);
>> -		/* Add in partial checksum for the UDP header alone */
>> -		psum += sum_16b(udp4hr, sizeof(*udp4hr));
>> +		uint32_t psum = proto_ipv4_header_psum(len, IPPROTO_UDP,
>> +						       saddr.s_addr,
>> +						       daddr.s_addr);
>> +		psum = csum_unfolded(udp4hr, sizeof(struct udphdr), psum);
>>   		udp4hr->check = csum(payload, len, psum);
>>   	}
>>   }
>> @@ -180,6 +200,26 @@ void csum_icmp4(struct icmphdr *icmp4hr, const void *payload, size_t len)
>>   	icmp4hr->checksum = csum(payload, len, psum);
>>   }
>>   
>> +/**
>> + * proto_ipv6_header_psum() - Calculates the partial checksum of an
>> + * 			      IPv6 header for UDP or TCP
>> + * @payload_len:	Payload length
>> + * @proto:		Protocol number
>> + * @saddr:		Source address
>> + * @daddr:		Destination address
>> + * Returns:	Partial checksum of the IPv6 header
>> + */
>> +uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol,
>> +				struct in6_addr saddr, struct in6_addr daddr)
> 
> Hrm, this is passing 2 16-byte IPv6 addresses by value, which might
> not be what we want.

The idea here is to avoid the pointer alignment problem (&ip6h->saddr and &ip6h->daddr can 
be misaligned).

Is it a better solution to copy the content of ip6h->saddr and ip6h->daddr to some local 
variables and then provide the pointers of the local variables to proto_ipv6_header_psum()?

> 
>> +{
>> +	uint32_t sum = htons(protocol) + payload_len;
> 
> Uh.. doesn't that need to be htons(payload_len + sizeof(struct
> ipv6hdr)) rather than simply payload_len?
> 

payload_len is:

- b->ip6h.payload_len (from udp_update_hdr6())
- ip6h->payload_len (from tcp_update_check_tcp6())

and in ip6h payload_len is:

- htons(udp6_l2_mh_sock[n].msg_len + sizeof(b->uh)) (from udp_update_hdr6())
- htons(plen + sizeof(struct tcphdr)) (from tcp_fill_ipv6_header())

So this is correct... but

csum_udp6() uses "len" from tap_udp6_send(), so there is a bug here.

but there is also a problem with proto_ipv4_header_psum() that need host endianness and 
tcp_update_check_tcp4() provides network endianness...

The first idea was to use the value from ip6h payload_len as it is already computed.
But mixing network endianness and host endianness appears to be a bad idea...

Thanks,
Laurent


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 7/9] checksum: introduce functions to compute the header part checksum for TCP/UDP
  2024-02-28 13:26     ` Laurent Vivier
@ 2024-02-29  0:38       ` David Gibson
  2024-02-29  7:05         ` Stefano Brivio
  0 siblings, 1 reply; 38+ messages in thread
From: David Gibson @ 2024-02-29  0:38 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 6650 bytes --]

On Wed, Feb 28, 2024 at 02:26:18PM +0100, Laurent Vivier wrote:
> On 2/19/24 04:08, David Gibson wrote:
> > On Sat, Feb 17, 2024 at 04:07:23PM +0100, Laurent Vivier wrote:
> > > The TCP and UDP checksums are computed using the data in the TCP/UDP
> > > payload but also some informations in the IP header (protocol,
> > > length, source and destination addresses).
> > > 
> > > We add two functions, proto_ipv4_header_psum() and
> > > proto_ipv6_header_psum(), to compute the checksum of the IP
> > > header part.
> > > 
> > > Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> > > ---
> > > 
> > > Notes:
> > >      v3:
> > >        - function parameters provide tot_len, saddr, daddr and protocol
> > >          rather than an iphdr/ipv6hdr
> > >      v2:
> > >        - move new function to checksum.c
> > >        - use _psum rather than _checksum in the name
> > >        - replace csum_udp4() and csum_udp6() by the new function
> > > 
> > >   checksum.c | 67 ++++++++++++++++++++++++++++++++++++++++++------------
> > >   checksum.h |  4 ++++
> > >   tcp.c      | 44 ++++++++++++++++-------------------
> > >   udp.c      | 10 ++++----
> > >   4 files changed, 81 insertions(+), 44 deletions(-)
> > > 
> > > diff --git a/checksum.c b/checksum.c
> > > index 511b296a9a80..55bf1340a257 100644
> > > --- a/checksum.c
> > > +++ b/checksum.c
> > > @@ -134,6 +134,30 @@ uint16_t csum_ip4_header(uint16_t tot_len, uint8_t protocol,
> > >   	return ~csum_fold(sum);
> > >   }
> > > +/**
> > > + * proto_ipv4_header_psum() - Calculates the partial checksum of an
> > > + * 			      IPv4 header for UDP or TCP
> > > + * @tot_len:	Payload length
> > > + * @proto:	Protocol number
> > > + * @saddr:	Source address
> > > + * @daddr:	Destination address
> > > + * @proto:	proto Protocol number
> > > + * Returns:	Partial checksum of the IPv4 header
> > > + */
> > > +uint32_t proto_ipv4_header_psum(uint16_t tot_len, uint8_t protocol,
> > > +				uint32_t saddr, uint32_t daddr)
> > > +{
> > > +	uint32_t psum = htons(protocol);
> > > +
> > > +	psum += (saddr >> 16) & 0xffff;
> > > +	psum += saddr & 0xffff;
> > > +	psum += (daddr >> 16) & 0xffff;
> > > +	psum += daddr & 0xffff;
> > > +	psum += htons(ntohs(tot_len) - 20);
> > > +
> > > +	return psum;
> > > +}
> > > +
> > >   /**
> > >    * csum_udp4() - Calculate and set checksum for a UDP over IPv4 packet
> > >    * @udp4hr:	UDP header, initialised apart from checksum
> > > @@ -150,14 +174,10 @@ void csum_udp4(struct udphdr *udp4hr,
> > >   	udp4hr->check = 0;
> > >   	if (UDP4_REAL_CHECKSUMS) {
> > > -		/* UNTESTED: if we did want real UDPv4 checksums, this
> > > -		 * is roughly what we'd need */
> > > -		uint32_t psum = csum_fold(saddr.s_addr)
> > > -			+ csum_fold(daddr.s_addr)
> > > -			+ htons(len + sizeof(*udp4hr))
> > > -			+ htons(IPPROTO_UDP);
> > > -		/* Add in partial checksum for the UDP header alone */
> > > -		psum += sum_16b(udp4hr, sizeof(*udp4hr));
> > > +		uint32_t psum = proto_ipv4_header_psum(len, IPPROTO_UDP,
> > > +						       saddr.s_addr,
> > > +						       daddr.s_addr);
> > > +		psum = csum_unfolded(udp4hr, sizeof(struct udphdr), psum);
> > >   		udp4hr->check = csum(payload, len, psum);
> > >   	}
> > >   }
> > > @@ -180,6 +200,26 @@ void csum_icmp4(struct icmphdr *icmp4hr, const void *payload, size_t len)
> > >   	icmp4hr->checksum = csum(payload, len, psum);
> > >   }
> > > +/**
> > > + * proto_ipv6_header_psum() - Calculates the partial checksum of an
> > > + * 			      IPv6 header for UDP or TCP
> > > + * @payload_len:	Payload length
> > > + * @proto:		Protocol number
> > > + * @saddr:		Source address
> > > + * @daddr:		Destination address
> > > + * Returns:	Partial checksum of the IPv6 header
> > > + */
> > > +uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol,
> > > +				struct in6_addr saddr, struct in6_addr daddr)
> > 
> > Hrm, this is passing 2 16-byte IPv6 addresses by value, which might
> > not be what we want.
> 
> The idea here is to avoid the pointer alignment problem (&ip6h->saddr and
> &ip6h->daddr can be misaligned).

Ah, right.  That's a neat idea, but I'm not sure it really helps: I
think it will just move the misaligned access from inside the function
to the call site, where we try to marshal the parameter from something
unaligned.

> Is it a better solution to copy the content of ip6h->saddr and ip6h->daddr
> to some local variables and then provide the pointers of the local variables
> to proto_ipv6_header_psum()?

Honestly, I'm not sure.

> > > +{
> > > +	uint32_t sum = htons(protocol) + payload_len;
> > 
> > Uh.. doesn't that need to be htons(payload_len + sizeof(struct
> > ipv6hdr)) rather than simply payload_len?
> > 
> 
> payload_len is:
> 
> - b->ip6h.payload_len (from udp_update_hdr6())
> - ip6h->payload_len (from tcp_update_check_tcp6())
> 
> and in ip6h payload_len is:
> 
> - htons(udp6_l2_mh_sock[n].msg_len + sizeof(b->uh)) (from udp_update_hdr6())
> - htons(plen + sizeof(struct tcphdr)) (from tcp_fill_ipv6_header())
> 
> So this is correct... but

Ah, right.  Not sure why I thought the ip6h length needed to be
included.

As a rule htons(x) + y is always suspect, because you generally can't
do math on values that aren't host endian.  We get away with it in
these csum functions because the way they're folded means the answers
end up the same - as long as we're consistent about it, anyway.

> csum_udp6() uses "len" from tap_udp6_send(), so there is a bug here.
> 
> but there is also a problem with proto_ipv4_header_psum() that need host
> endianness and tcp_update_check_tcp4() provides network endianness...
> 
> The first idea was to use the value from ip6h payload_len as it is already computed.
> But mixing network endianness and host endianness appears to be a bad idea...

Right.  As a rule I really dislike putting non-host-endian values in a
plain u16 local or parameter, because it's really easy to think it's
just a number, rather than a funny encoding of a number.  Likewise, I
think it's a lot easier to keep track of things if every field of a
struct has a strict endianness, which we never change in place, even
temporarily. (Ideally, in fact, I'd prefer to see non-host-endian
values always in an encapsulating type that won't let you do math on
them, but that's not always practical in C).

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 7/9] checksum: introduce functions to compute the header part checksum for TCP/UDP
  2024-02-29  0:38       ` David Gibson
@ 2024-02-29  7:05         ` Stefano Brivio
  2024-02-29  8:49           ` David Gibson
  0 siblings, 1 reply; 38+ messages in thread
From: Stefano Brivio @ 2024-02-29  7:05 UTC (permalink / raw)
  To: David Gibson, Laurent Vivier; +Cc: passt-dev

On Thu, 29 Feb 2024 11:38:53 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Wed, Feb 28, 2024 at 02:26:18PM +0100, Laurent Vivier wrote:
> > On 2/19/24 04:08, David Gibson wrote:  
> > > On Sat, Feb 17, 2024 at 04:07:23PM +0100, Laurent Vivier wrote:  
> > >
> > > [...]
> > >
> > > > +/**
> > > > + * proto_ipv6_header_psum() - Calculates the partial checksum of an
> > > > + * 			      IPv6 header for UDP or TCP
> > > > + * @payload_len:	Payload length
> > > > + * @proto:		Protocol number
> > > > + * @saddr:		Source address
> > > > + * @daddr:		Destination address
> > > > + * Returns:	Partial checksum of the IPv6 header
> > > > + */
> > > > +uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol,
> > > > +				struct in6_addr saddr, struct in6_addr daddr)  
> > > 
> > > Hrm, this is passing 2 16-byte IPv6 addresses by value, which might
> > > not be what we want.  
> > 
> > The idea here is to avoid the pointer alignment problem (&ip6h->saddr and
> > &ip6h->daddr can be misaligned).  
> 
> Ah, right.  That's a neat idea, but I'm not sure it really helps: I
> think it will just move the misaligned access from inside the function
> to the call site, where we try to marshal the parameter from something
> unaligned.

I haven't tested this yet, but note that this is generally okay: the
problem is *dereferencing* an unaligned pointer. But if you load memory
from an aligned pointer, and extract a value from this memory, it's all
fine.

Speaking MIPS, this is not safe on all CPU models:

	la	$1, 1002   # s1 now contains the value 1002
	lw	$2, 0($1)  # load word from memory at 1002 + 0 into s2

but this is:

	la	$1, 1000   # s1 now contains the value 1000
	la	$2, 1004   # s3 now contains the value 1004
	lw	$3, 0($1)  # load word from memory at 1000 + 0 into s3
	lw	$4, 0($3)  # load word from memory at 1004 + 0 into s4
	sll	$5, $3, 16 # 16-bit shift left s3 into s5
	srl	$6, $4, 16 # 16-bit shift right s4 into s6
	or	$2, $5, $6 # OR s5 and s6 into s2

On x86, as far as I know, mov will digest the equivalent of the first
version without issues.

> > Is it a better solution to copy the content of ip6h->saddr and ip6h->daddr
> > to some local variables and then provide the pointers of the local variables
> > to proto_ipv6_header_psum()?  
> 
> Honestly, I'm not sure.

I think it's pretty much the same. Let the compiler pass 16-byte
variables by value, and it will generally do this for us, but only if
needed.

-- 
Stefano


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 7/9] checksum: introduce functions to compute the header part checksum for TCP/UDP
  2024-02-29  7:05         ` Stefano Brivio
@ 2024-02-29  8:49           ` David Gibson
  2024-02-29  8:56             ` Stefano Brivio
  0 siblings, 1 reply; 38+ messages in thread
From: David Gibson @ 2024-02-29  8:49 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: Laurent Vivier, passt-dev

[-- Attachment #1: Type: text/plain, Size: 3422 bytes --]

On Thu, Feb 29, 2024 at 08:05:09AM +0100, Stefano Brivio wrote:
> On Thu, 29 Feb 2024 11:38:53 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Wed, Feb 28, 2024 at 02:26:18PM +0100, Laurent Vivier wrote:
> > > On 2/19/24 04:08, David Gibson wrote:  
> > > > On Sat, Feb 17, 2024 at 04:07:23PM +0100, Laurent Vivier wrote:  
> > > >
> > > > [...]
> > > >
> > > > > +/**
> > > > > + * proto_ipv6_header_psum() - Calculates the partial checksum of an
> > > > > + * 			      IPv6 header for UDP or TCP
> > > > > + * @payload_len:	Payload length
> > > > > + * @proto:		Protocol number
> > > > > + * @saddr:		Source address
> > > > > + * @daddr:		Destination address
> > > > > + * Returns:	Partial checksum of the IPv6 header
> > > > > + */
> > > > > +uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol,
> > > > > +				struct in6_addr saddr, struct in6_addr daddr)  
> > > > 
> > > > Hrm, this is passing 2 16-byte IPv6 addresses by value, which might
> > > > not be what we want.  
> > > 
> > > The idea here is to avoid the pointer alignment problem (&ip6h->saddr and
> > > &ip6h->daddr can be misaligned).  
> > 
> > Ah, right.  That's a neat idea, but I'm not sure it really helps: I
> > think it will just move the misaligned access from inside the function
> > to the call site, where we try to marshal the parameter from something
> > unaligned.
> 
> I haven't tested this yet, but note that this is generally okay: the
> problem is *dereferencing* an unaligned pointer. But if you load memory
> from an aligned pointer, and extract a value from this memory, it's all
> fine.

Right, that's kind of what I'm getting at.  Assuming this value starts
in an unaligned buffer, then in order to pass this by value the caller
will need to load from that unaligned pointer.  AFAIK, the compiler
will base the type of loads only on the pointed to type, which isn't
changed whether we dereference in the caller or the callee.

> 
> Speaking MIPS, this is not safe on all CPU models:
> 
> 	la	$1, 1002   # s1 now contains the value 1002
> 	lw	$2, 0($1)  # load word from memory at 1002 + 0 into s2
> 
> but this is:
> 
> 	la	$1, 1000   # s1 now contains the value 1000
> 	la	$2, 1004   # s3 now contains the value 1004
> 	lw	$3, 0($1)  # load word from memory at 1000 + 0 into s3
> 	lw	$4, 0($3)  # load word from memory at 1004 + 0 into s4
> 	sll	$5, $3, 16 # 16-bit shift left s3 into s5
> 	srl	$6, $4, 16 # 16-bit shift right s4 into s6
> 	or	$2, $5, $6 # OR s5 and s6 into s2

Right, but I don't think merely moving the dereference to the caller
will necessarily induce the compiler to generate this rather than the
former.

> On x86, as far as I know, mov will digest the equivalent of the first
> version without issues.
> 
> > > Is it a better solution to copy the content of ip6h->saddr and ip6h->daddr
> > > to some local variables and then provide the pointers of the local variables
> > > to proto_ipv6_header_psum()?  
> > 
> > Honestly, I'm not sure.
> 
> I think it's pretty much the same. Let the compiler pass 16-byte
> variables by value, and it will generally do this for us, but only if
> needed.
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 7/9] checksum: introduce functions to compute the header part checksum for TCP/UDP
  2024-02-29  8:49           ` David Gibson
@ 2024-02-29  8:56             ` Stefano Brivio
  2024-02-29 14:15               ` Stefano Brivio
  0 siblings, 1 reply; 38+ messages in thread
From: Stefano Brivio @ 2024-02-29  8:56 UTC (permalink / raw)
  To: David Gibson; +Cc: Laurent Vivier, passt-dev

On Thu, 29 Feb 2024 19:49:09 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Thu, Feb 29, 2024 at 08:05:09AM +0100, Stefano Brivio wrote:
> > On Thu, 29 Feb 2024 11:38:53 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> > > On Wed, Feb 28, 2024 at 02:26:18PM +0100, Laurent Vivier wrote:  
> > > > On 2/19/24 04:08, David Gibson wrote:    
> > > > > On Sat, Feb 17, 2024 at 04:07:23PM +0100, Laurent Vivier wrote:  
> > > > >
> > > > > [...]
> > > > >  
> > > > > > +/**
> > > > > > + * proto_ipv6_header_psum() - Calculates the partial checksum of an
> > > > > > + * 			      IPv6 header for UDP or TCP
> > > > > > + * @payload_len:	Payload length
> > > > > > + * @proto:		Protocol number
> > > > > > + * @saddr:		Source address
> > > > > > + * @daddr:		Destination address
> > > > > > + * Returns:	Partial checksum of the IPv6 header
> > > > > > + */
> > > > > > +uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol,
> > > > > > +				struct in6_addr saddr, struct in6_addr daddr)    
> > > > > 
> > > > > Hrm, this is passing 2 16-byte IPv6 addresses by value, which might
> > > > > not be what we want.    
> > > > 
> > > > The idea here is to avoid the pointer alignment problem (&ip6h->saddr and
> > > > &ip6h->daddr can be misaligned).    
> > > 
> > > Ah, right.  That's a neat idea, but I'm not sure it really helps: I
> > > think it will just move the misaligned access from inside the function
> > > to the call site, where we try to marshal the parameter from something
> > > unaligned.  
> > 
> > I haven't tested this yet, but note that this is generally okay: the
> > problem is *dereferencing* an unaligned pointer. But if you load memory
> > from an aligned pointer, and extract a value from this memory, it's all
> > fine.  
> 
> Right, that's kind of what I'm getting at.  Assuming this value starts
> in an unaligned buffer, then in order to pass this by value the caller
> will need to load from that unaligned pointer.  AFAIK, the compiler
> will base the type of loads only on the pointed to type, which isn't
> changed whether we dereference in the caller or the callee.
> 
> > 
> > Speaking MIPS, this is not safe on all CPU models:
> > 
> > 	la	$1, 1002   # s1 now contains the value 1002
> > 	lw	$2, 0($1)  # load word from memory at 1002 + 0 into s2
> > 
> > but this is:
> > 
> > 	la	$1, 1000   # s1 now contains the value 1000
> > 	la	$2, 1004   # s3 now contains the value 1004
> > 	lw	$3, 0($1)  # load word from memory at 1000 + 0 into s3
> > 	lw	$4, 0($3)  # load word from memory at 1004 + 0 into s4
> > 	sll	$5, $3, 16 # 16-bit shift left s3 into s5
> > 	srl	$6, $4, 16 # 16-bit shift right s4 into s6
> > 	or	$2, $5, $6 # OR s5 and s6 into s2  
> 
> Right, but I don't think merely moving the dereference to the caller
> will necessarily induce the compiler to generate this rather than the
> former.

Oh, oops, I didn't realise this was the case (I haven't reviewed the
patch yet).

-- 
Stefano


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 7/9] checksum: introduce functions to compute the header part checksum for TCP/UDP
  2024-02-29  8:56             ` Stefano Brivio
@ 2024-02-29 14:15               ` Stefano Brivio
  2024-02-29 23:09                 ` David Gibson
  0 siblings, 1 reply; 38+ messages in thread
From: Stefano Brivio @ 2024-02-29 14:15 UTC (permalink / raw)
  To: David Gibson; +Cc: Laurent Vivier, passt-dev

On Thu, 29 Feb 2024 09:56:25 +0100
Stefano Brivio <sbrivio@redhat.com> wrote:

> On Thu, 29 Feb 2024 19:49:09 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Thu, Feb 29, 2024 at 08:05:09AM +0100, Stefano Brivio wrote:  
> > > On Thu, 29 Feb 2024 11:38:53 +1100
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > >     
> > > > On Wed, Feb 28, 2024 at 02:26:18PM +0100, Laurent Vivier wrote:    
> > > > > On 2/19/24 04:08, David Gibson wrote:      
> > > > > > On Sat, Feb 17, 2024 at 04:07:23PM +0100, Laurent Vivier wrote:  
> > > > > >
> > > > > > [...]
> > > > > >    
> > > > > > > +/**
> > > > > > > + * proto_ipv6_header_psum() - Calculates the partial checksum of an
> > > > > > > + * 			      IPv6 header for UDP or TCP
> > > > > > > + * @payload_len:	Payload length
> > > > > > > + * @proto:		Protocol number
> > > > > > > + * @saddr:		Source address
> > > > > > > + * @daddr:		Destination address
> > > > > > > + * Returns:	Partial checksum of the IPv6 header
> > > > > > > + */
> > > > > > > +uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol,
> > > > > > > +				struct in6_addr saddr, struct in6_addr daddr)      
> > > > > > 
> > > > > > Hrm, this is passing 2 16-byte IPv6 addresses by value, which might
> > > > > > not be what we want.      
> > > > > 
> > > > > The idea here is to avoid the pointer alignment problem (&ip6h->saddr and
> > > > > &ip6h->daddr can be misaligned).      
> > > > 
> > > > Ah, right.  That's a neat idea, but I'm not sure it really helps: I
> > > > think it will just move the misaligned access from inside the function
> > > > to the call site, where we try to marshal the parameter from something
> > > > unaligned.    
> > > 
> > > I haven't tested this yet, but note that this is generally okay: the
> > > problem is *dereferencing* an unaligned pointer. But if you load memory
> > > from an aligned pointer, and extract a value from this memory, it's all
> > > fine.    
> > 
> > Right, that's kind of what I'm getting at.  Assuming this value starts
> > in an unaligned buffer, then in order to pass this by value the caller
> > will need to load from that unaligned pointer.  AFAIK, the compiler
> > will base the type of loads only on the pointed to type, which isn't
> > changed whether we dereference in the caller or the callee.
> >   
> > > 
> > > Speaking MIPS, this is not safe on all CPU models:
> > > 
> > > 	la	$1, 1002   # s1 now contains the value 1002
> > > 	lw	$2, 0($1)  # load word from memory at 1002 + 0 into s2
> > > 
> > > but this is:
> > > 
> > > 	la	$1, 1000   # s1 now contains the value 1000
> > > 	la	$2, 1004   # s3 now contains the value 1004
> > > 	lw	$3, 0($1)  # load word from memory at 1000 + 0 into s3
> > > 	lw	$4, 0($3)  # load word from memory at 1004 + 0 into s4
> > > 	sll	$5, $3, 16 # 16-bit shift left s3 into s5
> > > 	srl	$6, $4, 16 # 16-bit shift right s4 into s6
> > > 	or	$2, $5, $6 # OR s5 and s6 into s2    
> > 
> > Right, but I don't think merely moving the dereference to the caller
> > will necessarily induce the compiler to generate this rather than the
> > former.  
> 
> Oh, oops, I didn't realise this was the case (I haven't reviewed the
> patch yet).

...no, that's not the case. Dereferencing 'iph' from
struct tcp[46]_l2_buf_t is fine:

struct tcp4_l2_buf_t {
        uint8_t                    pad[2];               /*     0     2 */
        struct tap_hdr             taph;                 /*     2    18 */
        struct iphdr               iph;                  /*    20    20 */
	[...]
} __attribute__((__packed__));

struct tcp6_l2_buf_t {
        uint8_t                    pad[2];               /*     0     2 */
        struct tap_hdr             taph;                 /*     2    18 */
        struct ipv6hdr             ip6h;                 /*    20    40 */
	[...]
} __attribute__((__packed__));

The problematic structures are the UDP buffers:

struct udp4_l2_buf_t {
        struct sockaddr_in         s_in;                 /*     0    16 */
        struct tap_hdr             taph;                 /*    16    18 */
        struct iphdr               iph;                  /*    34    20 */
	[...]
} __attribute__((__aligned__(4)));

and for UDP, this patch is dereferencing buffer pointers only, not
pointers to headers.

Inconsistent if you want, but it's quite simple and it works, plus if you
had half a mind (at least for UDP) to split buffers into header and
payloads iovec parts... this doesn't need to be exceedingly elegant.

-- 
Stefano


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 6/9] checksum: use csum_ip4_header() in udp.c and tcp.c
  2024-02-17 15:07 ` [PATCH v3 6/9] checksum: use csum_ip4_header() in udp.c and tcp.c Laurent Vivier
  2024-02-19  3:01   ` David Gibson
@ 2024-02-29 16:24   ` Stefano Brivio
  2024-02-29 23:10     ` David Gibson
  1 sibling, 1 reply; 38+ messages in thread
From: Stefano Brivio @ 2024-02-29 16:24 UTC (permalink / raw)
  To: Laurent Vivier, David Gibson; +Cc: passt-dev

On Sat, 17 Feb 2024 16:07:22 +0100
Laurent Vivier <lvivier@redhat.com> wrote:

> We can find the same function to compute the IPv4 header
> checksum in tcp.c, udp.c and tap.c
> 
> Use the function defined for tap.c, csum_ip4_header(), but
> with the code used in tcp.c and udp.c as it doesn't need a fully
> initialiazed IPv4 header, only protocol, tot_len, saddr and daddr.
> 
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> ---
> 
> Notes:
>     v3:
>       - function parameters provide tot_len, saddr, daddr and protocol
>        rather than an iphdr
>     
>     v2:
>       - use csum_ip4_header() from checksum.c
>       - use code from tcp.c and udp.c in csum_ip4_header()
>       - use "const struct iphfr *", check is not updated by the
>         function but by the caller.
> 
>  checksum.c | 17 +++++++++++++----
>  checksum.h |  3 ++-
>  tap.c      |  3 ++-
>  tcp.c      | 24 +++---------------------
>  udp.c      | 20 ++------------------
>  5 files changed, 22 insertions(+), 45 deletions(-)
> 
> diff --git a/checksum.c b/checksum.c
> index 74e3742bc6f6..511b296a9a80 100644
> --- a/checksum.c
> +++ b/checksum.c
> @@ -57,6 +57,7 @@
>  #include <linux/icmpv6.h>
>  
>  #include "util.h"
> +#include "ip.h"
>  #include "checksum.h"
>  
>  /* Checksums are optional for UDP over IPv4, so we usually just set
> @@ -116,13 +117,21 @@ uint16_t csum_fold(uint32_t sum)
>  uint16_t csum(const void *buf, size_t len, uint32_t init);
>  
>  /**
> - * csum_ip4_header() - Calculate and set IPv4 header checksum
> + * csum_ip4_header() - Calculate IPv4 header checksum
>   * @ip4h:	IPv4 header
>   */
> -void csum_ip4_header(struct iphdr *ip4h)
> +uint16_t csum_ip4_header(uint16_t tot_len, uint8_t protocol,
> +			 uint32_t saddr, uint32_t daddr)
>  {
> -	ip4h->check = 0;
> -	ip4h->check = csum(ip4h, (size_t)ip4h->ihl * 4, 0);
> +	uint32_t sum = L2_BUF_IP4_PSUM(protocol);

Now that we use this macro, Coverity Scan realises that it's broken:

#define L2_BUF_IP4_PSUM(proto) ((uint32_t)htons_constant(0x4500) +     \
				(uint32_t)htons_constant(0xff00 | (proto)))

...but proto is eight (lower) bits, so this actually ignores 'proto'.

-- 
Stefano


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 9/9] tcp: Introduce ipv4_fill_headers()/ipv6_fill_headers()
  2024-02-17 15:07 ` [PATCH v3 9/9] tcp: Introduce ipv4_fill_headers()/ipv6_fill_headers() Laurent Vivier
  2024-02-19  3:14   ` David Gibson
@ 2024-02-29 16:29   ` Stefano Brivio
  1 sibling, 0 replies; 38+ messages in thread
From: Stefano Brivio @ 2024-02-29 16:29 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: passt-dev

On Sat, 17 Feb 2024 16:07:25 +0100
Laurent Vivier <lvivier@redhat.com> wrote:

> Replace the macro SET_TCP_HEADER_COMMON_V4_V6() by a new function
> tcp_fill_header().
> 
> Move IPv4 and IPv6 code from tcp_l2_buf_fill_headers() to
> tcp_fill_ipv4_header() and tcp_fill_ipv6_header()
> 
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> ---
> 
> Notes:
>     v3:
>       - add to sub-series part 1
>     
>     v2:
>       - extract header filling functions from
>         "tcp: extract buffer management from tcp_send_flag()"
>       - rename them tcp_fill_flag_header()/tcp_fill_ipv4_header(),
>         tcp_fill_ipv6_header()
>       - use upside-down Christmas tree arguments order
>       - replace (void *) by (struct tcphdr *)
> 
>  tcp.c | 154 +++++++++++++++++++++++++++++++++++++++-------------------
>  1 file changed, 104 insertions(+), 50 deletions(-)
> 
> diff --git a/tcp.c b/tcp.c
> index aa03c20712f6..bc57a4f6e611 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -1324,6 +1324,108 @@ void tcp_defer_handler(struct ctx *c)
>  	tcp_l2_data_buf_flush(c);
>  }
>  
> +/**
> + * tcp_fill_header() - Fill the TCP header fields for a given TCP segment.
> + *
> + * @th:		Pointer to the TCP header structure
> + * @conn:	Pointer to the TCP connection structure
> + * @seq:	Sequence number
> + */
> +static void tcp_fill_header(struct tcphdr *th,
> +			       const struct tcp_tap_conn *conn, uint32_t seq)
> +{
> +	th->source = htons(conn->fport);
> +	th->dest = htons(conn->eport);
> +	th->seq = htonl(seq);
> +	th->ack_seq = htonl(conn->seq_ack_to_tap);
> +	if (conn->events & ESTABLISHED)	{
> +		th->window = htons(conn->wnd_to_tap);
> +	} else {
> +		unsigned wnd = conn->wnd_to_tap << conn->ws_to_tap;
> +
> +		th->window = htons(MIN(wnd, USHRT_MAX));
> +	}
> +}
> +
> +/**
> + * tcp_fill_ipv4_header() - Fill 802.3, IPv4, TCP headers in pre-cooked buffers
> + * @c:		Execution context
> + * @conn:	Connection pointer
> + * @iph:	Pointer to IPv4 header, immediately followed by a TCP header
> + * @plen:	Payload length (including TCP header options)
> + * @check:	Checksum, if already known
> + * @seq:	Sequence number for this segment
> + *
> + * Return: IP frame length including L2 headers, host order
> + */
> +static size_t tcp_fill_ipv4_header(const struct ctx *c,
> +				   const struct tcp_tap_conn *conn,
> +				   struct iphdr *iph, size_t plen,
> +				   const uint16_t *check, uint32_t seq)
> +{
> +	size_t ip_len = plen + sizeof(struct iphdr) + sizeof(struct tcphdr);
> +	const struct in_addr *a4 = inany_v4(&conn->faddr);
> +	struct tcphdr *th = (struct tcphdr *)(iph + 1);
> +
> +	iph->tot_len = htons(ip_len);
> +	iph->saddr = a4->s_addr;

The reasoning behind the fact that a4 isn't NULL here is relatively
simple to follow: you already check for inany_v4(&conn->faddr) in the
caller, if it evaluates to true, you call this.

Still, it's a bit too convoluted for Coverity's taste. Could you
perhaps add an ASSERT(a4) before this block to make it obvious? It's a
bit annoying that we extract the address twice, but I don't see a much
better alternative compared to what you did.

-- 
Stefano


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 0/9] Add vhost-user support to passt (part 1)
  2024-02-17 15:07 [PATCH v3 0/9] Add vhost-user support to passt (part 1) Laurent Vivier
                   ` (8 preceding siblings ...)
  2024-02-17 15:07 ` [PATCH v3 9/9] tcp: Introduce ipv4_fill_headers()/ipv6_fill_headers() Laurent Vivier
@ 2024-02-29 16:31 ` Stefano Brivio
  9 siblings, 0 replies; 38+ messages in thread
From: Stefano Brivio @ 2024-02-29 16:31 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: passt-dev

On Sat, 17 Feb 2024 16:07:16 +0100
Laurent Vivier <lvivier@redhat.com> wrote:

> v3:
>   - add a patch that has been extracted from:
>     "tcp: extract buffer management from tcp_send_flag()"
>     -> "tcp: Introduce ipv4_fill_headers()/ipv6_fill_headers()"  
>   - see detailed v3 history log in each patch
>   - I didn't address the alignment problem when we provide a pointer
>     to a sub-structure in the internal buffer structure.
>     (for the last patches of the series).

Excluding pending comments, the series looks good to me.

-- 
Stefano


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 7/9] checksum: introduce functions to compute the header part checksum for TCP/UDP
  2024-02-29 14:15               ` Stefano Brivio
@ 2024-02-29 23:09                 ` David Gibson
  2024-03-01  6:56                   ` Stefano Brivio
  0 siblings, 1 reply; 38+ messages in thread
From: David Gibson @ 2024-02-29 23:09 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: Laurent Vivier, passt-dev

[-- Attachment #1: Type: text/plain, Size: 5489 bytes --]

On Thu, Feb 29, 2024 at 03:15:53PM +0100, Stefano Brivio wrote:
> On Thu, 29 Feb 2024 09:56:25 +0100
> Stefano Brivio <sbrivio@redhat.com> wrote:
> 
> > On Thu, 29 Feb 2024 19:49:09 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> > 
> > > On Thu, Feb 29, 2024 at 08:05:09AM +0100, Stefano Brivio wrote:  
> > > > On Thu, 29 Feb 2024 11:38:53 +1100
> > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > >     
> > > > > On Wed, Feb 28, 2024 at 02:26:18PM +0100, Laurent Vivier wrote:    
> > > > > > On 2/19/24 04:08, David Gibson wrote:      
> > > > > > > On Sat, Feb 17, 2024 at 04:07:23PM +0100, Laurent Vivier wrote:  
> > > > > > >
> > > > > > > [...]
> > > > > > >    
> > > > > > > > +/**
> > > > > > > > + * proto_ipv6_header_psum() - Calculates the partial checksum of an
> > > > > > > > + * 			      IPv6 header for UDP or TCP
> > > > > > > > + * @payload_len:	Payload length
> > > > > > > > + * @proto:		Protocol number
> > > > > > > > + * @saddr:		Source address
> > > > > > > > + * @daddr:		Destination address
> > > > > > > > + * Returns:	Partial checksum of the IPv6 header
> > > > > > > > + */
> > > > > > > > +uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol,
> > > > > > > > +				struct in6_addr saddr, struct in6_addr daddr)      
> > > > > > > 
> > > > > > > Hrm, this is passing 2 16-byte IPv6 addresses by value, which might
> > > > > > > not be what we want.      
> > > > > > 
> > > > > > The idea here is to avoid the pointer alignment problem (&ip6h->saddr and
> > > > > > &ip6h->daddr can be misaligned).      
> > > > > 
> > > > > Ah, right.  That's a neat idea, but I'm not sure it really helps: I
> > > > > think it will just move the misaligned access from inside the function
> > > > > to the call site, where we try to marshal the parameter from something
> > > > > unaligned.    
> > > > 
> > > > I haven't tested this yet, but note that this is generally okay: the
> > > > problem is *dereferencing* an unaligned pointer. But if you load memory
> > > > from an aligned pointer, and extract a value from this memory, it's all
> > > > fine.    
> > > 
> > > Right, that's kind of what I'm getting at.  Assuming this value starts
> > > in an unaligned buffer, then in order to pass this by value the caller
> > > will need to load from that unaligned pointer.  AFAIK, the compiler
> > > will base the type of loads only on the pointed to type, which isn't
> > > changed whether we dereference in the caller or the callee.
> > >   
> > > > 
> > > > Speaking MIPS, this is not safe on all CPU models:
> > > > 
> > > > 	la	$1, 1002   # s1 now contains the value 1002
> > > > 	lw	$2, 0($1)  # load word from memory at 1002 + 0 into s2
> > > > 
> > > > but this is:
> > > > 
> > > > 	la	$1, 1000   # s1 now contains the value 1000
> > > > 	la	$2, 1004   # s3 now contains the value 1004
> > > > 	lw	$3, 0($1)  # load word from memory at 1000 + 0 into s3
> > > > 	lw	$4, 0($3)  # load word from memory at 1004 + 0 into s4
> > > > 	sll	$5, $3, 16 # 16-bit shift left s3 into s5
> > > > 	srl	$6, $4, 16 # 16-bit shift right s4 into s6
> > > > 	or	$2, $5, $6 # OR s5 and s6 into s2    
> > > 
> > > Right, but I don't think merely moving the dereference to the caller
> > > will necessarily induce the compiler to generate this rather than the
> > > former.  
> > 
> > Oh, oops, I didn't realise this was the case (I haven't reviewed the
> > patch yet).
> 
> ...no, that's not the case. Dereferencing 'iph' from
> struct tcp[46]_l2_buf_t is fine:
> 
> struct tcp4_l2_buf_t {
>         uint8_t                    pad[2];               /*     0     2 */
>         struct tap_hdr             taph;                 /*     2    18 */
>         struct iphdr               iph;                  /*    20    20 */
> 	[...]
> } __attribute__((__packed__));
> 
> struct tcp6_l2_buf_t {
>         uint8_t                    pad[2];               /*     0     2 */
>         struct tap_hdr             taph;                 /*     2    18 */
>         struct ipv6hdr             ip6h;                 /*    20    40 */
> 	[...]
> } __attribute__((__packed__));
> 
> The problematic structures are the UDP buffers:
> 
> struct udp4_l2_buf_t {
>         struct sockaddr_in         s_in;                 /*     0    16 */
>         struct tap_hdr             taph;                 /*    16    18 */
>         struct iphdr               iph;                  /*    34    20 */
> 	[...]
> } __attribute__((__aligned__(4)));
> 
> and for UDP, this patch is dereferencing buffer pointers only, not
> pointers to headers.

Ok... but my point remains, I'm not seeing that passing the address by
value actually helps - it just seems to change whether we need to
handle the unaligned load in the caller or the callee.

> Inconsistent if you want, but it's quite simple and it works, plus if you
> had half a mind (at least for UDP) to split buffers into header and
> payloads iovec parts... this doesn't need to be exceedingly elegant.

That won't actually help this, since (for now at least) I intend to
handle all the headers, including UDP, as one blob.  The payload in
the second iov will be the UDP payload, not the IP payload.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 6/9] checksum: use csum_ip4_header() in udp.c and tcp.c
  2024-02-29 16:24   ` Stefano Brivio
@ 2024-02-29 23:10     ` David Gibson
  2024-03-01  7:58       ` Stefano Brivio
  0 siblings, 1 reply; 38+ messages in thread
From: David Gibson @ 2024-02-29 23:10 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: Laurent Vivier, passt-dev

[-- Attachment #1: Type: text/plain, Size: 2626 bytes --]

On Thu, Feb 29, 2024 at 05:24:06PM +0100, Stefano Brivio wrote:
> On Sat, 17 Feb 2024 16:07:22 +0100
> Laurent Vivier <lvivier@redhat.com> wrote:
> 
> > We can find the same function to compute the IPv4 header
> > checksum in tcp.c, udp.c and tap.c
> > 
> > Use the function defined for tap.c, csum_ip4_header(), but
> > with the code used in tcp.c and udp.c as it doesn't need a fully
> > initialiazed IPv4 header, only protocol, tot_len, saddr and daddr.
> > 
> > Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> > ---
> > 
> > Notes:
> >     v3:
> >       - function parameters provide tot_len, saddr, daddr and protocol
> >        rather than an iphdr
> >     
> >     v2:
> >       - use csum_ip4_header() from checksum.c
> >       - use code from tcp.c and udp.c in csum_ip4_header()
> >       - use "const struct iphfr *", check is not updated by the
> >         function but by the caller.
> > 
> >  checksum.c | 17 +++++++++++++----
> >  checksum.h |  3 ++-
> >  tap.c      |  3 ++-
> >  tcp.c      | 24 +++---------------------
> >  udp.c      | 20 ++------------------
> >  5 files changed, 22 insertions(+), 45 deletions(-)
> > 
> > diff --git a/checksum.c b/checksum.c
> > index 74e3742bc6f6..511b296a9a80 100644
> > --- a/checksum.c
> > +++ b/checksum.c
> > @@ -57,6 +57,7 @@
> >  #include <linux/icmpv6.h>
> >  
> >  #include "util.h"
> > +#include "ip.h"
> >  #include "checksum.h"
> >  
> >  /* Checksums are optional for UDP over IPv4, so we usually just set
> > @@ -116,13 +117,21 @@ uint16_t csum_fold(uint32_t sum)
> >  uint16_t csum(const void *buf, size_t len, uint32_t init);
> >  
> >  /**
> > - * csum_ip4_header() - Calculate and set IPv4 header checksum
> > + * csum_ip4_header() - Calculate IPv4 header checksum
> >   * @ip4h:	IPv4 header
> >   */
> > -void csum_ip4_header(struct iphdr *ip4h)
> > +uint16_t csum_ip4_header(uint16_t tot_len, uint8_t protocol,
> > +			 uint32_t saddr, uint32_t daddr)
> >  {
> > -	ip4h->check = 0;
> > -	ip4h->check = csum(ip4h, (size_t)ip4h->ihl * 4, 0);
> > +	uint32_t sum = L2_BUF_IP4_PSUM(protocol);
> 
> Now that we use this macro, Coverity Scan realises that it's broken:
> 
> #define L2_BUF_IP4_PSUM(proto) ((uint32_t)htons_constant(0x4500) +     \
> 				(uint32_t)htons_constant(0xff00 | (proto)))
> 
> ...but proto is eight (lower) bits, so this actually ignores 'proto'.

Uh... how so?

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 7/9] checksum: introduce functions to compute the header part checksum for TCP/UDP
  2024-02-29 23:09                 ` David Gibson
@ 2024-03-01  6:56                   ` Stefano Brivio
  2024-03-04  1:54                     ` David Gibson
  0 siblings, 1 reply; 38+ messages in thread
From: Stefano Brivio @ 2024-03-01  6:56 UTC (permalink / raw)
  To: David Gibson; +Cc: Laurent Vivier, passt-dev

On Fri, 1 Mar 2024 10:09:39 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Thu, Feb 29, 2024 at 03:15:53PM +0100, Stefano Brivio wrote:
> > On Thu, 29 Feb 2024 09:56:25 +0100
> > Stefano Brivio <sbrivio@redhat.com> wrote:
> >   
> > > On Thu, 29 Feb 2024 19:49:09 +1100
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > >   
> > > > On Thu, Feb 29, 2024 at 08:05:09AM +0100, Stefano Brivio wrote:    
> > > > > On Thu, 29 Feb 2024 11:38:53 +1100
> > > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > >       
> > > > > > On Wed, Feb 28, 2024 at 02:26:18PM +0100, Laurent Vivier wrote:      
> > > > > > > On 2/19/24 04:08, David Gibson wrote:        
> > > > > > > > On Sat, Feb 17, 2024 at 04:07:23PM +0100, Laurent Vivier wrote:  
> > > > > > > >
> > > > > > > > [...]
> > > > > > > >      
> > > > > > > > > +/**
> > > > > > > > > + * proto_ipv6_header_psum() - Calculates the partial checksum of an
> > > > > > > > > + * 			      IPv6 header for UDP or TCP
> > > > > > > > > + * @payload_len:	Payload length
> > > > > > > > > + * @proto:		Protocol number
> > > > > > > > > + * @saddr:		Source address
> > > > > > > > > + * @daddr:		Destination address
> > > > > > > > > + * Returns:	Partial checksum of the IPv6 header
> > > > > > > > > + */
> > > > > > > > > +uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol,
> > > > > > > > > +				struct in6_addr saddr, struct in6_addr daddr)        
> > > > > > > > 
> > > > > > > > Hrm, this is passing 2 16-byte IPv6 addresses by value, which might
> > > > > > > > not be what we want.        
> > > > > > > 
> > > > > > > The idea here is to avoid the pointer alignment problem (&ip6h->saddr and
> > > > > > > &ip6h->daddr can be misaligned).        
> > > > > > 
> > > > > > Ah, right.  That's a neat idea, but I'm not sure it really helps: I
> > > > > > think it will just move the misaligned access from inside the function
> > > > > > to the call site, where we try to marshal the parameter from something
> > > > > > unaligned.      
> > > > > 
> > > > > I haven't tested this yet, but note that this is generally okay: the
> > > > > problem is *dereferencing* an unaligned pointer. But if you load memory
> > > > > from an aligned pointer, and extract a value from this memory, it's all
> > > > > fine.      
> > > > 
> > > > Right, that's kind of what I'm getting at.  Assuming this value starts
> > > > in an unaligned buffer, then in order to pass this by value the caller
> > > > will need to load from that unaligned pointer.  AFAIK, the compiler
> > > > will base the type of loads only on the pointed to type, which isn't
> > > > changed whether we dereference in the caller or the callee.
> > > >     
> > > > > 
> > > > > Speaking MIPS, this is not safe on all CPU models:
> > > > > 
> > > > > 	la	$1, 1002   # s1 now contains the value 1002
> > > > > 	lw	$2, 0($1)  # load word from memory at 1002 + 0 into s2
> > > > > 
> > > > > but this is:
> > > > > 
> > > > > 	la	$1, 1000   # s1 now contains the value 1000
> > > > > 	la	$2, 1004   # s3 now contains the value 1004
> > > > > 	lw	$3, 0($1)  # load word from memory at 1000 + 0 into s3
> > > > > 	lw	$4, 0($3)  # load word from memory at 1004 + 0 into s4
> > > > > 	sll	$5, $3, 16 # 16-bit shift left s3 into s5
> > > > > 	srl	$6, $4, 16 # 16-bit shift right s4 into s6
> > > > > 	or	$2, $5, $6 # OR s5 and s6 into s2      
> > > > 
> > > > Right, but I don't think merely moving the dereference to the caller
> > > > will necessarily induce the compiler to generate this rather than the
> > > > former.    
> > > 
> > > Oh, oops, I didn't realise this was the case (I haven't reviewed the
> > > patch yet).  
> > 
> > ...no, that's not the case. Dereferencing 'iph' from
> > struct tcp[46]_l2_buf_t is fine:
> > 
> > struct tcp4_l2_buf_t {
> >         uint8_t                    pad[2];               /*     0     2 */
> >         struct tap_hdr             taph;                 /*     2    18 */
> >         struct iphdr               iph;                  /*    20    20 */
> > 	[...]
> > } __attribute__((__packed__));
> > 
> > struct tcp6_l2_buf_t {
> >         uint8_t                    pad[2];               /*     0     2 */
> >         struct tap_hdr             taph;                 /*     2    18 */
> >         struct ipv6hdr             ip6h;                 /*    20    40 */
> > 	[...]
> > } __attribute__((__packed__));
> > 
> > The problematic structures are the UDP buffers:
> > 
> > struct udp4_l2_buf_t {
> >         struct sockaddr_in         s_in;                 /*     0    16 */
> >         struct tap_hdr             taph;                 /*    16    18 */
> >         struct iphdr               iph;                  /*    34    20 */
> > 	[...]
> > } __attribute__((__aligned__(4)));
> > 
> > and for UDP, this patch is dereferencing buffer pointers only, not
> > pointers to headers.  
> 
> Ok... but my point remains, I'm not seeing that passing the address by
> value actually helps - it just seems to change whether we need to
> handle the unaligned load in the caller or the callee.

For UDP and IPv4 (from 6/9):

+       b->iph.check = csum_ip4_header(b->iph.tot_len, IPPROTO_UDP,
+                                      b->iph.saddr, b->iph.daddr);

and for IPv6 (this patch):

+       b->uh.check = csum(&b->uh, ntohs(b->ip6h.payload_len),
+                          proto_ipv6_header_psum(b->ip6h.payload_len,
+                                                 IPPROTO_UDP,
+                                                 b->ip6h.saddr,
+                                                 b->ip6h.daddr));

these cause loads starting from 'b', which is aligned, instead of
passing 'iph' or 'ip6h', unaligned, and loading from there.

This patch isn't just passing the address by value: it's also changing
the load operation. It doesn't do iph->tot_len, it goes back to the
load from 'b' with b->iph.tot_len.

> > Inconsistent if you want, but it's quite simple and it works, plus if you
> > had half a mind (at least for UDP) to split buffers into header and
> > payloads iovec parts... this doesn't need to be exceedingly elegant.  
> 
> That won't actually help this, since (for now at least) I intend to
> handle all the headers, including UDP, as one blob.  The payload in
> the second iov will be the UDP payload, not the IP payload.

Ah, right... and I guess you already told me.

-- 
Stefano


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 6/9] checksum: use csum_ip4_header() in udp.c and tcp.c
  2024-02-29 23:10     ` David Gibson
@ 2024-03-01  7:58       ` Stefano Brivio
  2024-03-01 12:23         ` David Gibson
  0 siblings, 1 reply; 38+ messages in thread
From: Stefano Brivio @ 2024-03-01  7:58 UTC (permalink / raw)
  To: David Gibson; +Cc: Laurent Vivier, passt-dev

On Fri, 1 Mar 2024 10:10:52 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Thu, Feb 29, 2024 at 05:24:06PM +0100, Stefano Brivio wrote:
> > On Sat, 17 Feb 2024 16:07:22 +0100
> > Laurent Vivier <lvivier@redhat.com> wrote:
> >   
> > > We can find the same function to compute the IPv4 header
> > > checksum in tcp.c, udp.c and tap.c
> > > 
> > > Use the function defined for tap.c, csum_ip4_header(), but
> > > with the code used in tcp.c and udp.c as it doesn't need a fully
> > > initialiazed IPv4 header, only protocol, tot_len, saddr and daddr.
> > > 
> > > Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> > > ---
> > > 
> > > Notes:
> > >     v3:
> > >       - function parameters provide tot_len, saddr, daddr and protocol
> > >        rather than an iphdr
> > >     
> > >     v2:
> > >       - use csum_ip4_header() from checksum.c
> > >       - use code from tcp.c and udp.c in csum_ip4_header()
> > >       - use "const struct iphfr *", check is not updated by the
> > >         function but by the caller.
> > > 
> > >  checksum.c | 17 +++++++++++++----
> > >  checksum.h |  3 ++-
> > >  tap.c      |  3 ++-
> > >  tcp.c      | 24 +++---------------------
> > >  udp.c      | 20 ++------------------
> > >  5 files changed, 22 insertions(+), 45 deletions(-)
> > > 
> > > diff --git a/checksum.c b/checksum.c
> > > index 74e3742bc6f6..511b296a9a80 100644
> > > --- a/checksum.c
> > > +++ b/checksum.c
> > > @@ -57,6 +57,7 @@
> > >  #include <linux/icmpv6.h>
> > >  
> > >  #include "util.h"
> > > +#include "ip.h"
> > >  #include "checksum.h"
> > >  
> > >  /* Checksums are optional for UDP over IPv4, so we usually just set
> > > @@ -116,13 +117,21 @@ uint16_t csum_fold(uint32_t sum)
> > >  uint16_t csum(const void *buf, size_t len, uint32_t init);
> > >  
> > >  /**
> > > - * csum_ip4_header() - Calculate and set IPv4 header checksum
> > > + * csum_ip4_header() - Calculate IPv4 header checksum
> > >   * @ip4h:	IPv4 header
> > >   */
> > > -void csum_ip4_header(struct iphdr *ip4h)
> > > +uint16_t csum_ip4_header(uint16_t tot_len, uint8_t protocol,
> > > +			 uint32_t saddr, uint32_t daddr)
> > >  {
> > > -	ip4h->check = 0;
> > > -	ip4h->check = csum(ip4h, (size_t)ip4h->ihl * 4, 0);
> > > +	uint32_t sum = L2_BUF_IP4_PSUM(protocol);  
> > 
> > Now that we use this macro, Coverity Scan realises that it's broken:
> > 
> > #define L2_BUF_IP4_PSUM(proto) ((uint32_t)htons_constant(0x4500) +     \
> > 				(uint32_t)htons_constant(0xff00 | (proto)))
> > 
> > ...but proto is eight (lower) bits, so this actually ignores 'proto'.  
> 
> Uh... how so?

Oops, sorry, it's not broken, and this is a false positive due to the
fact that __bswap_constant_16() (which htons_constant() resolves to, on
little-endian) is defined, for example in glibc, as:

#define __bswap_constant_16(x) \
     ((((x) >> 8) & 0xff) | (((x) & 0xff) << 8))

and in this case the first term of the | resolves to a constant value,
0xff, because 0xffxx >> 8 is 0xff for any value of xx.

I couldn't think of a "solution", yet.

-- 
Stefano


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 6/9] checksum: use csum_ip4_header() in udp.c and tcp.c
  2024-03-01  7:58       ` Stefano Brivio
@ 2024-03-01 12:23         ` David Gibson
  2024-03-04 17:04           ` Stefano Brivio
  0 siblings, 1 reply; 38+ messages in thread
From: David Gibson @ 2024-03-01 12:23 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: Laurent Vivier, passt-dev

[-- Attachment #1: Type: text/plain, Size: 3876 bytes --]

On Fri, Mar 01, 2024 at 08:58:45AM +0100, Stefano Brivio wrote:
> On Fri, 1 Mar 2024 10:10:52 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Thu, Feb 29, 2024 at 05:24:06PM +0100, Stefano Brivio wrote:
> > > On Sat, 17 Feb 2024 16:07:22 +0100
> > > Laurent Vivier <lvivier@redhat.com> wrote:
> > >   
> > > > We can find the same function to compute the IPv4 header
> > > > checksum in tcp.c, udp.c and tap.c
> > > > 
> > > > Use the function defined for tap.c, csum_ip4_header(), but
> > > > with the code used in tcp.c and udp.c as it doesn't need a fully
> > > > initialiazed IPv4 header, only protocol, tot_len, saddr and daddr.
> > > > 
> > > > Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> > > > ---
> > > > 
> > > > Notes:
> > > >     v3:
> > > >       - function parameters provide tot_len, saddr, daddr and protocol
> > > >        rather than an iphdr
> > > >     
> > > >     v2:
> > > >       - use csum_ip4_header() from checksum.c
> > > >       - use code from tcp.c and udp.c in csum_ip4_header()
> > > >       - use "const struct iphfr *", check is not updated by the
> > > >         function but by the caller.
> > > > 
> > > >  checksum.c | 17 +++++++++++++----
> > > >  checksum.h |  3 ++-
> > > >  tap.c      |  3 ++-
> > > >  tcp.c      | 24 +++---------------------
> > > >  udp.c      | 20 ++------------------
> > > >  5 files changed, 22 insertions(+), 45 deletions(-)
> > > > 
> > > > diff --git a/checksum.c b/checksum.c
> > > > index 74e3742bc6f6..511b296a9a80 100644
> > > > --- a/checksum.c
> > > > +++ b/checksum.c
> > > > @@ -57,6 +57,7 @@
> > > >  #include <linux/icmpv6.h>
> > > >  
> > > >  #include "util.h"
> > > > +#include "ip.h"
> > > >  #include "checksum.h"
> > > >  
> > > >  /* Checksums are optional for UDP over IPv4, so we usually just set
> > > > @@ -116,13 +117,21 @@ uint16_t csum_fold(uint32_t sum)
> > > >  uint16_t csum(const void *buf, size_t len, uint32_t init);
> > > >  
> > > >  /**
> > > > - * csum_ip4_header() - Calculate and set IPv4 header checksum
> > > > + * csum_ip4_header() - Calculate IPv4 header checksum
> > > >   * @ip4h:	IPv4 header
> > > >   */
> > > > -void csum_ip4_header(struct iphdr *ip4h)
> > > > +uint16_t csum_ip4_header(uint16_t tot_len, uint8_t protocol,
> > > > +			 uint32_t saddr, uint32_t daddr)
> > > >  {
> > > > -	ip4h->check = 0;
> > > > -	ip4h->check = csum(ip4h, (size_t)ip4h->ihl * 4, 0);
> > > > +	uint32_t sum = L2_BUF_IP4_PSUM(protocol);  
> > > 
> > > Now that we use this macro, Coverity Scan realises that it's broken:
> > > 
> > > #define L2_BUF_IP4_PSUM(proto) ((uint32_t)htons_constant(0x4500) +     \
> > > 				(uint32_t)htons_constant(0xff00 | (proto)))
> > > 
> > > ...but proto is eight (lower) bits, so this actually ignores 'proto'.  
> > 
> > Uh... how so?
> 
> Oops, sorry, it's not broken, and this is a false positive due to the
> fact that __bswap_constant_16() (which htons_constant() resolves to, on
> little-endian) is defined, for example in glibc, as:
> 
> #define __bswap_constant_16(x) \
>      ((((x) >> 8) & 0xff) | (((x) & 0xff) << 8))
> 
> and in this case the first term of the | resolves to a constant value,
> 0xff, because 0xffxx >> 8 is 0xff for any value of xx.

Right.  This really seems overzealous of coverity: it seems like any
occasion where the compiler would constant fold could result in a
similar warning.

> I couldn't think of a "solution", yet.

Making it an inline function rather than a macro might be enough to
convince Coverity.  Otherwise we could just mark it as a false
positive in the Coverity web interface.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 7/9] checksum: introduce functions to compute the header part checksum for TCP/UDP
  2024-03-01  6:56                   ` Stefano Brivio
@ 2024-03-04  1:54                     ` David Gibson
  2024-03-04 11:00                       ` Stefano Brivio
  0 siblings, 1 reply; 38+ messages in thread
From: David Gibson @ 2024-03-04  1:54 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: Laurent Vivier, passt-dev

[-- Attachment #1: Type: text/plain, Size: 6564 bytes --]

On Fri, Mar 01, 2024 at 07:56:51AM +0100, Stefano Brivio wrote:
> On Fri, 1 Mar 2024 10:09:39 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Thu, Feb 29, 2024 at 03:15:53PM +0100, Stefano Brivio wrote:
> > > On Thu, 29 Feb 2024 09:56:25 +0100
> > > Stefano Brivio <sbrivio@redhat.com> wrote:
> > >   
> > > > On Thu, 29 Feb 2024 19:49:09 +1100
> > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > >   
> > > > > On Thu, Feb 29, 2024 at 08:05:09AM +0100, Stefano Brivio wrote:    
> > > > > > On Thu, 29 Feb 2024 11:38:53 +1100
> > > > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > > >       
> > > > > > > On Wed, Feb 28, 2024 at 02:26:18PM +0100, Laurent Vivier wrote:      
> > > > > > > > On 2/19/24 04:08, David Gibson wrote:        
> > > > > > > > > On Sat, Feb 17, 2024 at 04:07:23PM +0100, Laurent Vivier wrote:  
> > > > > > > > >
> > > > > > > > > [...]
> > > > > > > > >      
> > > > > > > > > > +/**
> > > > > > > > > > + * proto_ipv6_header_psum() - Calculates the partial checksum of an
> > > > > > > > > > + * 			      IPv6 header for UDP or TCP
> > > > > > > > > > + * @payload_len:	Payload length
> > > > > > > > > > + * @proto:		Protocol number
> > > > > > > > > > + * @saddr:		Source address
> > > > > > > > > > + * @daddr:		Destination address
> > > > > > > > > > + * Returns:	Partial checksum of the IPv6 header
> > > > > > > > > > + */
> > > > > > > > > > +uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol,
> > > > > > > > > > +				struct in6_addr saddr, struct in6_addr daddr)        
> > > > > > > > > 
> > > > > > > > > Hrm, this is passing 2 16-byte IPv6 addresses by value, which might
> > > > > > > > > not be what we want.        
> > > > > > > > 
> > > > > > > > The idea here is to avoid the pointer alignment problem (&ip6h->saddr and
> > > > > > > > &ip6h->daddr can be misaligned).        
> > > > > > > 
> > > > > > > Ah, right.  That's a neat idea, but I'm not sure it really helps: I
> > > > > > > think it will just move the misaligned access from inside the function
> > > > > > > to the call site, where we try to marshal the parameter from something
> > > > > > > unaligned.      
> > > > > > 
> > > > > > I haven't tested this yet, but note that this is generally okay: the
> > > > > > problem is *dereferencing* an unaligned pointer. But if you load memory
> > > > > > from an aligned pointer, and extract a value from this memory, it's all
> > > > > > fine.      
> > > > > 
> > > > > Right, that's kind of what I'm getting at.  Assuming this value starts
> > > > > in an unaligned buffer, then in order to pass this by value the caller
> > > > > will need to load from that unaligned pointer.  AFAIK, the compiler
> > > > > will base the type of loads only on the pointed to type, which isn't
> > > > > changed whether we dereference in the caller or the callee.
> > > > >     
> > > > > > 
> > > > > > Speaking MIPS, this is not safe on all CPU models:
> > > > > > 
> > > > > > 	la	$1, 1002   # s1 now contains the value 1002
> > > > > > 	lw	$2, 0($1)  # load word from memory at 1002 + 0 into s2
> > > > > > 
> > > > > > but this is:
> > > > > > 
> > > > > > 	la	$1, 1000   # s1 now contains the value 1000
> > > > > > 	la	$2, 1004   # s3 now contains the value 1004
> > > > > > 	lw	$3, 0($1)  # load word from memory at 1000 + 0 into s3
> > > > > > 	lw	$4, 0($3)  # load word from memory at 1004 + 0 into s4
> > > > > > 	sll	$5, $3, 16 # 16-bit shift left s3 into s5
> > > > > > 	srl	$6, $4, 16 # 16-bit shift right s4 into s6
> > > > > > 	or	$2, $5, $6 # OR s5 and s6 into s2      
> > > > > 
> > > > > Right, but I don't think merely moving the dereference to the caller
> > > > > will necessarily induce the compiler to generate this rather than the
> > > > > former.    
> > > > 
> > > > Oh, oops, I didn't realise this was the case (I haven't reviewed the
> > > > patch yet).  
> > > 
> > > ...no, that's not the case. Dereferencing 'iph' from
> > > struct tcp[46]_l2_buf_t is fine:
> > > 
> > > struct tcp4_l2_buf_t {
> > >         uint8_t                    pad[2];               /*     0     2 */
> > >         struct tap_hdr             taph;                 /*     2    18 */
> > >         struct iphdr               iph;                  /*    20    20 */
> > > 	[...]
> > > } __attribute__((__packed__));
> > > 
> > > struct tcp6_l2_buf_t {
> > >         uint8_t                    pad[2];               /*     0     2 */
> > >         struct tap_hdr             taph;                 /*     2    18 */
> > >         struct ipv6hdr             ip6h;                 /*    20    40 */
> > > 	[...]
> > > } __attribute__((__packed__));
> > > 
> > > The problematic structures are the UDP buffers:
> > > 
> > > struct udp4_l2_buf_t {
> > >         struct sockaddr_in         s_in;                 /*     0    16 */
> > >         struct tap_hdr             taph;                 /*    16    18 */
> > >         struct iphdr               iph;                  /*    34    20 */
> > > 	[...]
> > > } __attribute__((__aligned__(4)));
> > > 
> > > and for UDP, this patch is dereferencing buffer pointers only, not
> > > pointers to headers.  
> > 
> > Ok... but my point remains, I'm not seeing that passing the address by
> > value actually helps - it just seems to change whether we need to
> > handle the unaligned load in the caller or the callee.
> 
> For UDP and IPv4 (from 6/9):
> 
> +       b->iph.check = csum_ip4_header(b->iph.tot_len, IPPROTO_UDP,
> +                                      b->iph.saddr, b->iph.daddr);
> 
> and for IPv6 (this patch):
> 
> +       b->uh.check = csum(&b->uh, ntohs(b->ip6h.payload_len),
> +                          proto_ipv6_header_psum(b->ip6h.payload_len,
> +                                                 IPPROTO_UDP,
> +                                                 b->ip6h.saddr,
> +                                                 b->ip6h.daddr));
> 
> these cause loads starting from 'b', which is aligned, instead of
> passing 'iph' or 'ip6h', unaligned, and loading from there.

No... the loads are still from b->ip6h.saddr, b->ip6h.daddr and
b->ip6h.payload_len.  Just because we're computing the offset a bit
differently doesn't change the load itself.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 7/9] checksum: introduce functions to compute the header part checksum for TCP/UDP
  2024-03-04  1:54                     ` David Gibson
@ 2024-03-04 11:00                       ` Stefano Brivio
  2024-03-04 22:47                         ` Stefano Brivio
  0 siblings, 1 reply; 38+ messages in thread
From: Stefano Brivio @ 2024-03-04 11:00 UTC (permalink / raw)
  To: David Gibson; +Cc: Laurent Vivier, passt-dev

On Mon, 4 Mar 2024 12:54:12 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Fri, Mar 01, 2024 at 07:56:51AM +0100, Stefano Brivio wrote:
> > On Fri, 1 Mar 2024 10:09:39 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> > > On Thu, Feb 29, 2024 at 03:15:53PM +0100, Stefano Brivio wrote:  
> > > > On Thu, 29 Feb 2024 09:56:25 +0100
> > > > Stefano Brivio <sbrivio@redhat.com> wrote:
> > > >     
> > > > > On Thu, 29 Feb 2024 19:49:09 +1100
> > > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > >     
> > > > > > On Thu, Feb 29, 2024 at 08:05:09AM +0100, Stefano Brivio wrote:      
> > > > > > > On Thu, 29 Feb 2024 11:38:53 +1100
> > > > > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > > > >         
> > > > > > > > On Wed, Feb 28, 2024 at 02:26:18PM +0100, Laurent Vivier wrote:        
> > > > > > > > > On 2/19/24 04:08, David Gibson wrote:          
> > > > > > > > > > On Sat, Feb 17, 2024 at 04:07:23PM +0100, Laurent Vivier wrote:  
> > > > > > > > > >
> > > > > > > > > > [...]
> > > > > > > > > >        
> > > > > > > > > > > +/**
> > > > > > > > > > > + * proto_ipv6_header_psum() - Calculates the partial checksum of an
> > > > > > > > > > > + * 			      IPv6 header for UDP or TCP
> > > > > > > > > > > + * @payload_len:	Payload length
> > > > > > > > > > > + * @proto:		Protocol number
> > > > > > > > > > > + * @saddr:		Source address
> > > > > > > > > > > + * @daddr:		Destination address
> > > > > > > > > > > + * Returns:	Partial checksum of the IPv6 header
> > > > > > > > > > > + */
> > > > > > > > > > > +uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol,
> > > > > > > > > > > +				struct in6_addr saddr, struct in6_addr daddr)          
> > > > > > > > > > 
> > > > > > > > > > Hrm, this is passing 2 16-byte IPv6 addresses by value, which might
> > > > > > > > > > not be what we want.          
> > > > > > > > > 
> > > > > > > > > The idea here is to avoid the pointer alignment problem (&ip6h->saddr and
> > > > > > > > > &ip6h->daddr can be misaligned).          
> > > > > > > > 
> > > > > > > > Ah, right.  That's a neat idea, but I'm not sure it really helps: I
> > > > > > > > think it will just move the misaligned access from inside the function
> > > > > > > > to the call site, where we try to marshal the parameter from something
> > > > > > > > unaligned.        
> > > > > > > 
> > > > > > > I haven't tested this yet, but note that this is generally okay: the
> > > > > > > problem is *dereferencing* an unaligned pointer. But if you load memory
> > > > > > > from an aligned pointer, and extract a value from this memory, it's all
> > > > > > > fine.        
> > > > > > 
> > > > > > Right, that's kind of what I'm getting at.  Assuming this value starts
> > > > > > in an unaligned buffer, then in order to pass this by value the caller
> > > > > > will need to load from that unaligned pointer.  AFAIK, the compiler
> > > > > > will base the type of loads only on the pointed to type, which isn't
> > > > > > changed whether we dereference in the caller or the callee.
> > > > > >       
> > > > > > > 
> > > > > > > Speaking MIPS, this is not safe on all CPU models:
> > > > > > > 
> > > > > > > 	la	$1, 1002   # s1 now contains the value 1002
> > > > > > > 	lw	$2, 0($1)  # load word from memory at 1002 + 0 into s2
> > > > > > > 
> > > > > > > but this is:
> > > > > > > 
> > > > > > > 	la	$1, 1000   # s1 now contains the value 1000
> > > > > > > 	la	$2, 1004   # s3 now contains the value 1004
> > > > > > > 	lw	$3, 0($1)  # load word from memory at 1000 + 0 into s3
> > > > > > > 	lw	$4, 0($3)  # load word from memory at 1004 + 0 into s4
> > > > > > > 	sll	$5, $3, 16 # 16-bit shift left s3 into s5
> > > > > > > 	srl	$6, $4, 16 # 16-bit shift right s4 into s6
> > > > > > > 	or	$2, $5, $6 # OR s5 and s6 into s2        
> > > > > > 
> > > > > > Right, but I don't think merely moving the dereference to the caller
> > > > > > will necessarily induce the compiler to generate this rather than the
> > > > > > former.      
> > > > > 
> > > > > Oh, oops, I didn't realise this was the case (I haven't reviewed the
> > > > > patch yet).    
> > > > 
> > > > ...no, that's not the case. Dereferencing 'iph' from
> > > > struct tcp[46]_l2_buf_t is fine:
> > > > 
> > > > struct tcp4_l2_buf_t {
> > > >         uint8_t                    pad[2];               /*     0     2 */
> > > >         struct tap_hdr             taph;                 /*     2    18 */
> > > >         struct iphdr               iph;                  /*    20    20 */
> > > > 	[...]
> > > > } __attribute__((__packed__));
> > > > 
> > > > struct tcp6_l2_buf_t {
> > > >         uint8_t                    pad[2];               /*     0     2 */
> > > >         struct tap_hdr             taph;                 /*     2    18 */
> > > >         struct ipv6hdr             ip6h;                 /*    20    40 */
> > > > 	[...]
> > > > } __attribute__((__packed__));
> > > > 
> > > > The problematic structures are the UDP buffers:
> > > > 
> > > > struct udp4_l2_buf_t {
> > > >         struct sockaddr_in         s_in;                 /*     0    16 */
> > > >         struct tap_hdr             taph;                 /*    16    18 */
> > > >         struct iphdr               iph;                  /*    34    20 */
> > > > 	[...]
> > > > } __attribute__((__aligned__(4)));
> > > > 
> > > > and for UDP, this patch is dereferencing buffer pointers only, not
> > > > pointers to headers.    
> > > 
> > > Ok... but my point remains, I'm not seeing that passing the address by
> > > value actually helps - it just seems to change whether we need to
> > > handle the unaligned load in the caller or the callee.  
> > 
> > For UDP and IPv4 (from 6/9):
> > 
> > +       b->iph.check = csum_ip4_header(b->iph.tot_len, IPPROTO_UDP,
> > +                                      b->iph.saddr, b->iph.daddr);
> > 
> > and for IPv6 (this patch):
> > 
> > +       b->uh.check = csum(&b->uh, ntohs(b->ip6h.payload_len),
> > +                          proto_ipv6_header_psum(b->ip6h.payload_len,
> > +                                                 IPPROTO_UDP,
> > +                                                 b->ip6h.saddr,
> > +                                                 b->ip6h.daddr));
> > 
> > these cause loads starting from 'b', which is aligned, instead of
> > passing 'iph' or 'ip6h', unaligned, and loading from there.  
> 
> No... the loads are still from b->ip6h.saddr, b->ip6h.daddr and
> b->ip6h.payload_len.

It depends how we define "loading from" -- the problem, in general, is
not the memory location per se, the problem is dereferencing memory
pointers.

I plan to try an example on MIPS in a bit, but meanwhile, this is what I
mean:

	lw	$2, 0($1)

and $1 needs to be aligned. Then, the compiler needs to know if this:

	lw	$2, 333($1)

is fine, or if there needs to be a load from another address. However,
we need to give the chance to the compiler to use an aligned pointer
(that is, 'b', not '&b->iph').

-- 
Stefano


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 6/9] checksum: use csum_ip4_header() in udp.c and tcp.c
  2024-03-01 12:23         ` David Gibson
@ 2024-03-04 17:04           ` Stefano Brivio
  0 siblings, 0 replies; 38+ messages in thread
From: Stefano Brivio @ 2024-03-04 17:04 UTC (permalink / raw)
  To: David Gibson, Laurent Vivier; +Cc: passt-dev

On Fri, 1 Mar 2024 23:23:41 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Fri, Mar 01, 2024 at 08:58:45AM +0100, Stefano Brivio wrote:
> > On Fri, 1 Mar 2024 10:10:52 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> > > On Thu, Feb 29, 2024 at 05:24:06PM +0100, Stefano Brivio wrote:  
> > > > On Sat, 17 Feb 2024 16:07:22 +0100
> > > > Laurent Vivier <lvivier@redhat.com> wrote:
> > > >     
> > > > > +uint16_t csum_ip4_header(uint16_t tot_len, uint8_t protocol,
> > > > > +			 uint32_t saddr, uint32_t daddr)
> > > > >  {
> > > > > -	ip4h->check = 0;
> > > > > -	ip4h->check = csum(ip4h, (size_t)ip4h->ihl * 4, 0);
> > > > > +	uint32_t sum = L2_BUF_IP4_PSUM(protocol);    
> > > > 
> > > > Now that we use this macro, Coverity Scan realises that it's broken:
> > > > 
> > > > #define L2_BUF_IP4_PSUM(proto) ((uint32_t)htons_constant(0x4500) +     \
> > > > 				(uint32_t)htons_constant(0xff00 | (proto)))
> > > > 
> > > > ...but proto is eight (lower) bits, so this actually ignores 'proto'.    
> > > 
> > > Uh... how so?  
> > 
> > Oops, sorry, it's not broken, and this is a false positive due to the
> > fact that __bswap_constant_16() (which htons_constant() resolves to, on
> > little-endian) is defined, for example in glibc, as:
> > 
> > #define __bswap_constant_16(x) \
> >      ((((x) >> 8) & 0xff) | (((x) & 0xff) << 8))
> > 
> > and in this case the first term of the | resolves to a constant value,
> > 0xff, because 0xffxx >> 8 is 0xff for any value of xx.  
> 
> Right.  This really seems overzealous of coverity: it seems like any
> occasion where the compiler would constant fold could result in a
> similar warning.
> 
> > I couldn't think of a "solution", yet.  
> 
> Making it an inline function rather than a macro might be enough to
> convince Coverity.  Otherwise we could just mark it as a false
> positive in the Coverity web interface.

The inline function didn't help per se, but while trying it out (with a
number of variations on it) I realised that, Coverity being overzealous
or not... 'proto' isn't a constant, so we shouldn't use
__bswap_constant_16(), unless we want to define three constants for the
Layer-4 protocols we support.

Switched to htons() as it obviously ought to be, problem solved. I'll
post this as follow-up patch to Laurent's series.

-- 
Stefano


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 7/9] checksum: introduce functions to compute the header part checksum for TCP/UDP
  2024-03-04 11:00                       ` Stefano Brivio
@ 2024-03-04 22:47                         ` Stefano Brivio
  2024-03-06  5:09                           ` David Gibson
  0 siblings, 1 reply; 38+ messages in thread
From: Stefano Brivio @ 2024-03-04 22:47 UTC (permalink / raw)
  To: David Gibson; +Cc: Laurent Vivier, passt-dev

On Mon, 4 Mar 2024 12:00:40 +0100
Stefano Brivio <sbrivio@redhat.com> wrote:

> On Mon, 4 Mar 2024 12:54:12 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Fri, Mar 01, 2024 at 07:56:51AM +0100, Stefano Brivio wrote:  
> > > On Fri, 1 Mar 2024 10:09:39 +1100
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > >     
> > > > On Thu, Feb 29, 2024 at 03:15:53PM +0100, Stefano Brivio wrote:    
> > > > > On Thu, 29 Feb 2024 09:56:25 +0100
> > > > > Stefano Brivio <sbrivio@redhat.com> wrote:
> > > > >       
> > > > > > On Thu, 29 Feb 2024 19:49:09 +1100
> > > > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > > >       
> > > > > > > On Thu, Feb 29, 2024 at 08:05:09AM +0100, Stefano Brivio wrote:        
> > > > > > > > On Thu, 29 Feb 2024 11:38:53 +1100
> > > > > > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > > > > >           
> > > > > > > > > On Wed, Feb 28, 2024 at 02:26:18PM +0100, Laurent Vivier wrote:          
> > > > > > > > > > On 2/19/24 04:08, David Gibson wrote:            
> > > > > > > > > > > On Sat, Feb 17, 2024 at 04:07:23PM +0100, Laurent Vivier wrote:  
> > > > > > > > > > >
> > > > > > > > > > > [...]
> > > > > > > > > > >          
> > > > > > > > > > > > +/**
> > > > > > > > > > > > + * proto_ipv6_header_psum() - Calculates the partial checksum of an
> > > > > > > > > > > > + * 			      IPv6 header for UDP or TCP
> > > > > > > > > > > > + * @payload_len:	Payload length
> > > > > > > > > > > > + * @proto:		Protocol number
> > > > > > > > > > > > + * @saddr:		Source address
> > > > > > > > > > > > + * @daddr:		Destination address
> > > > > > > > > > > > + * Returns:	Partial checksum of the IPv6 header
> > > > > > > > > > > > + */
> > > > > > > > > > > > +uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol,
> > > > > > > > > > > > +				struct in6_addr saddr, struct in6_addr daddr)            
> > > > > > > > > > > 
> > > > > > > > > > > Hrm, this is passing 2 16-byte IPv6 addresses by value, which might
> > > > > > > > > > > not be what we want.            
> > > > > > > > > > 
> > > > > > > > > > The idea here is to avoid the pointer alignment problem (&ip6h->saddr and
> > > > > > > > > > &ip6h->daddr can be misaligned).            
> > > > > > > > > 
> > > > > > > > > Ah, right.  That's a neat idea, but I'm not sure it really helps: I
> > > > > > > > > think it will just move the misaligned access from inside the function
> > > > > > > > > to the call site, where we try to marshal the parameter from something
> > > > > > > > > unaligned.          
> > > > > > > > 
> > > > > > > > I haven't tested this yet, but note that this is generally okay: the
> > > > > > > > problem is *dereferencing* an unaligned pointer. But if you load memory
> > > > > > > > from an aligned pointer, and extract a value from this memory, it's all
> > > > > > > > fine.          
> > > > > > > 
> > > > > > > Right, that's kind of what I'm getting at.  Assuming this value starts
> > > > > > > in an unaligned buffer, then in order to pass this by value the caller
> > > > > > > will need to load from that unaligned pointer.  AFAIK, the compiler
> > > > > > > will base the type of loads only on the pointed to type, which isn't
> > > > > > > changed whether we dereference in the caller or the callee.
> > > > > > >         
> > > > > > > > 
> > > > > > > > Speaking MIPS, this is not safe on all CPU models:
> > > > > > > > 
> > > > > > > > 	la	$1, 1002   # s1 now contains the value 1002
> > > > > > > > 	lw	$2, 0($1)  # load word from memory at 1002 + 0 into s2
> > > > > > > > 
> > > > > > > > but this is:
> > > > > > > > 
> > > > > > > > 	la	$1, 1000   # s1 now contains the value 1000
> > > > > > > > 	la	$2, 1004   # s3 now contains the value 1004
> > > > > > > > 	lw	$3, 0($1)  # load word from memory at 1000 + 0 into s3
> > > > > > > > 	lw	$4, 0($3)  # load word from memory at 1004 + 0 into s4
> > > > > > > > 	sll	$5, $3, 16 # 16-bit shift left s3 into s5
> > > > > > > > 	srl	$6, $4, 16 # 16-bit shift right s4 into s6
> > > > > > > > 	or	$2, $5, $6 # OR s5 and s6 into s2          
> > > > > > > 
> > > > > > > Right, but I don't think merely moving the dereference to the caller
> > > > > > > will necessarily induce the compiler to generate this rather than the
> > > > > > > former.        
> > > > > > 
> > > > > > Oh, oops, I didn't realise this was the case (I haven't reviewed the
> > > > > > patch yet).      
> > > > > 
> > > > > ...no, that's not the case. Dereferencing 'iph' from
> > > > > struct tcp[46]_l2_buf_t is fine:
> > > > > 
> > > > > struct tcp4_l2_buf_t {
> > > > >         uint8_t                    pad[2];               /*     0     2 */
> > > > >         struct tap_hdr             taph;                 /*     2    18 */
> > > > >         struct iphdr               iph;                  /*    20    20 */
> > > > > 	[...]
> > > > > } __attribute__((__packed__));
> > > > > 
> > > > > struct tcp6_l2_buf_t {
> > > > >         uint8_t                    pad[2];               /*     0     2 */
> > > > >         struct tap_hdr             taph;                 /*     2    18 */
> > > > >         struct ipv6hdr             ip6h;                 /*    20    40 */
> > > > > 	[...]
> > > > > } __attribute__((__packed__));
> > > > > 
> > > > > The problematic structures are the UDP buffers:
> > > > > 
> > > > > struct udp4_l2_buf_t {
> > > > >         struct sockaddr_in         s_in;                 /*     0    16 */
> > > > >         struct tap_hdr             taph;                 /*    16    18 */
> > > > >         struct iphdr               iph;                  /*    34    20 */
> > > > > 	[...]
> > > > > } __attribute__((__aligned__(4)));
> > > > > 
> > > > > and for UDP, this patch is dereferencing buffer pointers only, not
> > > > > pointers to headers.      
> > > > 
> > > > Ok... but my point remains, I'm not seeing that passing the address by
> > > > value actually helps - it just seems to change whether we need to
> > > > handle the unaligned load in the caller or the callee.    
> > > 
> > > For UDP and IPv4 (from 6/9):
> > > 
> > > +       b->iph.check = csum_ip4_header(b->iph.tot_len, IPPROTO_UDP,
> > > +                                      b->iph.saddr, b->iph.daddr);
> > > 
> > > and for IPv6 (this patch):
> > > 
> > > +       b->uh.check = csum(&b->uh, ntohs(b->ip6h.payload_len),
> > > +                          proto_ipv6_header_psum(b->ip6h.payload_len,
> > > +                                                 IPPROTO_UDP,
> > > +                                                 b->ip6h.saddr,
> > > +                                                 b->ip6h.daddr));
> > > 
> > > these cause loads starting from 'b', which is aligned, instead of
> > > passing 'iph' or 'ip6h', unaligned, and loading from there.    
> > 
> > No... the loads are still from b->ip6h.saddr, b->ip6h.daddr and
> > b->ip6h.payload_len.  
> 
> It depends how we define "loading from" -- the problem, in general, is
> not the memory location per se, the problem is dereferencing memory
> pointers.
> 
> I plan to try an example on MIPS in a bit [...]

Actually, armhf first (for clarity):

$ cat align.c
#include <stdio.h>
#include <stdint.h>

struct disarray {
    uint8_t oops;
    uint32_t v1;
    uint32_t v2;
} __attribute__((packed, aligned(__alignof__(unsigned int))));

void f1(uint32_t *v1) {
    *v1 += 42;
}

uint32_t f2(uint32_t v2) {
    return v2++;
}

int main()
{
    struct disarray d = { 0x55, 0xaa, 0xaa };

    f1(&d.v1);
    f2(d.v2);

    fprintf(stdout, "%08x %08x", d.v1, d.v2);
}

$ arm-linux-gnueabihf-gcc-12 -g -O0 -fno-stack-protector -fomit-frame-pointer -mno-unaligned-access -o align align.c
align.c: In function ‘main’:
align.c:22:8: warning: taking address of packed member of ‘struct disarray’ may result in an unaligned pointer value [-Waddress-of-packed-member]
   22 |     f1(&d.v1);
      |        ^~~~~

$ arm-linux-gnueabihf-objdump -S --disassemble=main align
[...]
    f1(&d.v1);
 562:	ab01      	add	r3, sp, #4
 564:	3301      	adds	r3, #1
 566:	4618      	mov	r0, r3
 568:	f7ff ffde 	bl	528 <f1>
[...]

before the call to f1(), the address in r3 is not aligned (we just
added #1), despite -mno-unaligned-access. I guess gcc can only warn
about that, but not fix it.

This:
  https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html

says:
  -munaligned-access
  -mno-unaligned-access

    Enables (or disables) reading and writing of 16- and 32- bit values from addresses that are not 16- or 32- bit aligned. By default unaligned access is disabled for all pre-ARMv6, all ARMv6-M and for ARMv8-M Baseline architectures, and enabled for all other architectures. If unaligned access is not enabled then words in packed data structures are accessed a byte at a time. 

Implying, I guess, that on those architectures unaligned accesses
shouldn't be done. I think Thumb mode also has issues with this, by
the way. 

And in f1() we just have a ldr from that address (passed on r0):
void f1(uint32_t *v1) {
 528:	b082      	sub	sp, #8
 52a:	9001      	str	r0, [sp, #4]
    *v1 += 42;
 52c:	9b01      	ldr	r3, [sp, #4]
 52e:	681b      	ldr	r3, [r3, #0]
 530:	f103 022a 	add.w	r2, r3, #42	@ 0x2a

$ arm-linux-gnueabihf-objdump -S --disassemble=f1 align
[...]
    *v1 += 42;
 52c:	9b01      	ldr	r3, [sp, #4]
 52e:	681b      	ldr	r3, [r3, #0]
 530:	f103 022a 	add.w	r2, r3, #42	@ 0x2a

...but the call to f2() is fine: we load with offset 8 from the stack
pointer, shift word right, load from offset 12, shift word left, OR:

$ arm-linux-gnueabihf-objdump -S --disassemble=main align
[...]
    f2(d.v2);
 56c:	9b02      	ldr	r3, [sp, #8]
 56e:	0a1b      	lsrs	r3, r3, #8
 570:	f89d 200c 	ldrb.w	r2, [sp, #12]
 574:	0612      	lsls	r2, r2, #24
 576:	4313      	orrs	r3, r2
 578:	4618      	mov	r0, r3
 57a:	f7ff ffe0 	bl	53e <f2>
[...]

Now on to MIPS (MIPS32):

$ mips-linux-gnu-gcc-12 -g -O0 -fno-stack-protector -fomit-frame-pointer -mno-unaligned-access -o align align.c
align.c: In function ‘main’:
align.c:22:8: warning: taking address of packed member of ‘struct disarray’ may result in an unaligned pointer value [-Waddress-of-packed-member]
   22 |     f1(&d.v1);
      |        ^~~~~

$ mips-linux-gnu-objdump -S --disassemble=main align
[...]
    f1(&d.v1);
 7bc:	27a20019 	addiu	v0,sp,25
 7c0:	00402025 	move	a0,v0
 7c4:	8f82802c 	lw	v0,-32724(gp)
 7c8:	0040c825 	move	t9,v0
 7cc:	0411ffe0 	bal	750 <f1>
 7d0:	00000000 	nop
 7d4:	8fbc0010 	lw	gp,16(sp)
[...]

'&d.v1' is passed in a0, again unaligned (stack pointer plus 25). And f1()
uses it just like that:

$ mips-linux-gnu-objdump -S --disassemble=f1 align
[...]
void f1(uint32_t *v1) {
 750:	afa40000 	sw	a0,0(sp)
    *v1 += 42;
 754:	8fa20000 	lw	v0,0(sp)
 758:	8c420000 	lw	v0,0(v0)
 75c:	2443002a 	addiu	v1,v0,42
[...]

while the call to f2() is, again, fine:

$ mips-linux-gnu-objdump -S --disassemble=main align
    f2(d.v2);
 7e0:	8ba2001d 	lwl	v0,29(sp)
 7e4:	9ba20020 	lwr	v0,32(sp)
 7e8:	00402025 	move	a0,v0
 7ec:	8f828030 	lw	v0,-32720(gp)
 7f0:	0040c825 	move	t9,v0
 7f4:	0411ffdf 	bal	774 <f2>
 7f8:	00000000 	nop
 7fc:	8fbc0010 	lw	gp,16(sp)

two loads, from stack pointer + 29 and stack pointer + 32. MIPS32 has lwl
and lwr (the infamous US4814976A patent, now expired) to avoid load plus
shift plus OR.

Now, you might argue that what I'm describing here might simply be gcc's
behaviour, and if gcc avoids unaligned loads as long as we don't pass
unaligned pointers around, that's not any better for us -- other compilers
might do things differently.

And... yes, packed structures are actually a GNU extension: C standards
don't say anything about loads like my f1(d.v2) call above, so all I'm
showing here is that a particular compiler is fine with these accesses,
but not unaligned pointers.

On the other hand, this seems to be a well established behaviour, and I
don't think we could realistically drop every load of unaligned *values*.
Unaligned pointers, we currently don't dereference any, because gcc warns
otherwise.

So, practically speaking, I guess as long as we avoid dereferencing
unaligned pointers, we should be fine?

-- 
Stefano


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 7/9] checksum: introduce functions to compute the header part checksum for TCP/UDP
  2024-03-04 22:47                         ` Stefano Brivio
@ 2024-03-06  5:09                           ` David Gibson
  2024-03-08  0:08                             ` Stefano Brivio
  0 siblings, 1 reply; 38+ messages in thread
From: David Gibson @ 2024-03-06  5:09 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: Laurent Vivier, passt-dev

[-- Attachment #1: Type: text/plain, Size: 13592 bytes --]

On Mon, Mar 04, 2024 at 11:47:17PM +0100, Stefano Brivio wrote:
> On Mon, 4 Mar 2024 12:00:40 +0100
> Stefano Brivio <sbrivio@redhat.com> wrote:
> 
> > On Mon, 4 Mar 2024 12:54:12 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> > 
> > > On Fri, Mar 01, 2024 at 07:56:51AM +0100, Stefano Brivio wrote:  
> > > > On Fri, 1 Mar 2024 10:09:39 +1100
> > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > >     
> > > > > On Thu, Feb 29, 2024 at 03:15:53PM +0100, Stefano Brivio wrote:    
> > > > > > On Thu, 29 Feb 2024 09:56:25 +0100
> > > > > > Stefano Brivio <sbrivio@redhat.com> wrote:
> > > > > >       
> > > > > > > On Thu, 29 Feb 2024 19:49:09 +1100
> > > > > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > > > >       
> > > > > > > > On Thu, Feb 29, 2024 at 08:05:09AM +0100, Stefano Brivio wrote:        
> > > > > > > > > On Thu, 29 Feb 2024 11:38:53 +1100
> > > > > > > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > > > > > >           
> > > > > > > > > > On Wed, Feb 28, 2024 at 02:26:18PM +0100, Laurent Vivier wrote:          
> > > > > > > > > > > On 2/19/24 04:08, David Gibson wrote:            
> > > > > > > > > > > > On Sat, Feb 17, 2024 at 04:07:23PM +0100, Laurent Vivier wrote:  
> > > > > > > > > > > >
> > > > > > > > > > > > [...]
> > > > > > > > > > > >          
> > > > > > > > > > > > > +/**
> > > > > > > > > > > > > + * proto_ipv6_header_psum() - Calculates the partial checksum of an
> > > > > > > > > > > > > + * 			      IPv6 header for UDP or TCP
> > > > > > > > > > > > > + * @payload_len:	Payload length
> > > > > > > > > > > > > + * @proto:		Protocol number
> > > > > > > > > > > > > + * @saddr:		Source address
> > > > > > > > > > > > > + * @daddr:		Destination address
> > > > > > > > > > > > > + * Returns:	Partial checksum of the IPv6 header
> > > > > > > > > > > > > + */
> > > > > > > > > > > > > +uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol,
> > > > > > > > > > > > > +				struct in6_addr saddr, struct in6_addr daddr)            
> > > > > > > > > > > > 
> > > > > > > > > > > > Hrm, this is passing 2 16-byte IPv6 addresses by value, which might
> > > > > > > > > > > > not be what we want.            
> > > > > > > > > > > 
> > > > > > > > > > > The idea here is to avoid the pointer alignment problem (&ip6h->saddr and
> > > > > > > > > > > &ip6h->daddr can be misaligned).            
> > > > > > > > > > 
> > > > > > > > > > Ah, right.  That's a neat idea, but I'm not sure it really helps: I
> > > > > > > > > > think it will just move the misaligned access from inside the function
> > > > > > > > > > to the call site, where we try to marshal the parameter from something
> > > > > > > > > > unaligned.          
> > > > > > > > > 
> > > > > > > > > I haven't tested this yet, but note that this is generally okay: the
> > > > > > > > > problem is *dereferencing* an unaligned pointer. But if you load memory
> > > > > > > > > from an aligned pointer, and extract a value from this memory, it's all
> > > > > > > > > fine.          
> > > > > > > > 
> > > > > > > > Right, that's kind of what I'm getting at.  Assuming this value starts
> > > > > > > > in an unaligned buffer, then in order to pass this by value the caller
> > > > > > > > will need to load from that unaligned pointer.  AFAIK, the compiler
> > > > > > > > will base the type of loads only on the pointed to type, which isn't
> > > > > > > > changed whether we dereference in the caller or the callee.
> > > > > > > >         
> > > > > > > > > 
> > > > > > > > > Speaking MIPS, this is not safe on all CPU models:
> > > > > > > > > 
> > > > > > > > > 	la	$1, 1002   # s1 now contains the value 1002
> > > > > > > > > 	lw	$2, 0($1)  # load word from memory at 1002 + 0 into s2
> > > > > > > > > 
> > > > > > > > > but this is:
> > > > > > > > > 
> > > > > > > > > 	la	$1, 1000   # s1 now contains the value 1000
> > > > > > > > > 	la	$2, 1004   # s3 now contains the value 1004
> > > > > > > > > 	lw	$3, 0($1)  # load word from memory at 1000 + 0 into s3
> > > > > > > > > 	lw	$4, 0($3)  # load word from memory at 1004 + 0 into s4
> > > > > > > > > 	sll	$5, $3, 16 # 16-bit shift left s3 into s5
> > > > > > > > > 	srl	$6, $4, 16 # 16-bit shift right s4 into s6
> > > > > > > > > 	or	$2, $5, $6 # OR s5 and s6 into s2          
> > > > > > > > 
> > > > > > > > Right, but I don't think merely moving the dereference to the caller
> > > > > > > > will necessarily induce the compiler to generate this rather than the
> > > > > > > > former.        
> > > > > > > 
> > > > > > > Oh, oops, I didn't realise this was the case (I haven't reviewed the
> > > > > > > patch yet).      
> > > > > > 
> > > > > > ...no, that's not the case. Dereferencing 'iph' from
> > > > > > struct tcp[46]_l2_buf_t is fine:
> > > > > > 
> > > > > > struct tcp4_l2_buf_t {
> > > > > >         uint8_t                    pad[2];               /*     0     2 */
> > > > > >         struct tap_hdr             taph;                 /*     2    18 */
> > > > > >         struct iphdr               iph;                  /*    20    20 */
> > > > > > 	[...]
> > > > > > } __attribute__((__packed__));
> > > > > > 
> > > > > > struct tcp6_l2_buf_t {
> > > > > >         uint8_t                    pad[2];               /*     0     2 */
> > > > > >         struct tap_hdr             taph;                 /*     2    18 */
> > > > > >         struct ipv6hdr             ip6h;                 /*    20    40 */
> > > > > > 	[...]
> > > > > > } __attribute__((__packed__));
> > > > > > 
> > > > > > The problematic structures are the UDP buffers:
> > > > > > 
> > > > > > struct udp4_l2_buf_t {
> > > > > >         struct sockaddr_in         s_in;                 /*     0    16 */
> > > > > >         struct tap_hdr             taph;                 /*    16    18 */
> > > > > >         struct iphdr               iph;                  /*    34    20 */
> > > > > > 	[...]
> > > > > > } __attribute__((__aligned__(4)));
> > > > > > 
> > > > > > and for UDP, this patch is dereferencing buffer pointers only, not
> > > > > > pointers to headers.      
> > > > > 
> > > > > Ok... but my point remains, I'm not seeing that passing the address by
> > > > > value actually helps - it just seems to change whether we need to
> > > > > handle the unaligned load in the caller or the callee.    
> > > > 
> > > > For UDP and IPv4 (from 6/9):
> > > > 
> > > > +       b->iph.check = csum_ip4_header(b->iph.tot_len, IPPROTO_UDP,
> > > > +                                      b->iph.saddr, b->iph.daddr);
> > > > 
> > > > and for IPv6 (this patch):
> > > > 
> > > > +       b->uh.check = csum(&b->uh, ntohs(b->ip6h.payload_len),
> > > > +                          proto_ipv6_header_psum(b->ip6h.payload_len,
> > > > +                                                 IPPROTO_UDP,
> > > > +                                                 b->ip6h.saddr,
> > > > +                                                 b->ip6h.daddr));
> > > > 
> > > > these cause loads starting from 'b', which is aligned, instead of
> > > > passing 'iph' or 'ip6h', unaligned, and loading from there.    
> > > 
> > > No... the loads are still from b->ip6h.saddr, b->ip6h.daddr and
> > > b->ip6h.payload_len.  
> > 
> > It depends how we define "loading from" -- the problem, in general, is
> > not the memory location per se, the problem is dereferencing memory
> > pointers.
> > 
> > I plan to try an example on MIPS in a bit [...]
> 
> Actually, armhf first (for clarity):
> 
> $ cat align.c
> #include <stdio.h>
> #include <stdint.h>
> 
> struct disarray {
>     uint8_t oops;
>     uint32_t v1;
>     uint32_t v2;
> } __attribute__((packed, aligned(__alignof__(unsigned int))));
> 
> void f1(uint32_t *v1) {
>     *v1 += 42;
> }
> 
> uint32_t f2(uint32_t v2) {
>     return v2++;
> }
> 
> int main()
> {
>     struct disarray d = { 0x55, 0xaa, 0xaa };
> 
>     f1(&d.v1);
>     f2(d.v2);
> 
>     fprintf(stdout, "%08x %08x", d.v1, d.v2);
> }
> 
> $ arm-linux-gnueabihf-gcc-12 -g -O0 -fno-stack-protector -fomit-frame-pointer -mno-unaligned-access -o align align.c
> align.c: In function ‘main’:
> align.c:22:8: warning: taking address of packed member of ‘struct disarray’ may result in an unaligned pointer value [-Waddress-of-packed-member]
>    22 |     f1(&d.v1);
>       |        ^~~~~
> 
> $ arm-linux-gnueabihf-objdump -S --disassemble=main align
> [...]
>     f1(&d.v1);
>  562:	ab01      	add	r3, sp, #4
>  564:	3301      	adds	r3, #1
>  566:	4618      	mov	r0, r3
>  568:	f7ff ffde 	bl	528 <f1>
> [...]
> 
> before the call to f1(), the address in r3 is not aligned (we just
> added #1), despite -mno-unaligned-access. I guess gcc can only warn
> about that, but not fix it.
> 
> This:
>   https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html
> 
> says:
>   -munaligned-access
>   -mno-unaligned-access
> 
>     Enables (or disables) reading and writing of 16- and 32- bit values from addresses that are not 16- or 32- bit aligned. By default unaligned access is disabled for all pre-ARMv6, all ARMv6-M and for ARMv8-M Baseline architectures, and enabled for all other architectures. If unaligned access is not enabled then words in packed data structures are accessed a byte at a time. 
> 
> Implying, I guess, that on those architectures unaligned accesses
> shouldn't be done. I think Thumb mode also has issues with this, by
> the way. 
> 
> And in f1() we just have a ldr from that address (passed on r0):
> void f1(uint32_t *v1) {
>  528:	b082      	sub	sp, #8
>  52a:	9001      	str	r0, [sp, #4]
>     *v1 += 42;
>  52c:	9b01      	ldr	r3, [sp, #4]
>  52e:	681b      	ldr	r3, [r3, #0]
>  530:	f103 022a 	add.w	r2, r3, #42	@ 0x2a
> 
> $ arm-linux-gnueabihf-objdump -S --disassemble=f1 align
> [...]
>     *v1 += 42;
>  52c:	9b01      	ldr	r3, [sp, #4]
>  52e:	681b      	ldr	r3, [r3, #0]
>  530:	f103 022a 	add.w	r2, r3, #42	@ 0x2a
> 
> ...but the call to f2() is fine: we load with offset 8 from the stack
> pointer, shift word right, load from offset 12, shift word left, OR:
> 
> $ arm-linux-gnueabihf-objdump -S --disassemble=main align
> [...]
>     f2(d.v2);
>  56c:	9b02      	ldr	r3, [sp, #8]
>  56e:	0a1b      	lsrs	r3, r3, #8
>  570:	f89d 200c 	ldrb.w	r2, [sp, #12]
>  574:	0612      	lsls	r2, r2, #24
>  576:	4313      	orrs	r3, r2
>  578:	4618      	mov	r0, r3
>  57a:	f7ff ffe0 	bl	53e <f2>
> [...]

Huh.  Ok, so I guess the compiler realises it's doing a load from a
packed structure and generates the necessary fixup code.  I thought it
would only consider the type of the actually loaded value.  Are you
sure it still does this correctly when optimization is enabled?

> 
> Now on to MIPS (MIPS32):
> 
> $ mips-linux-gnu-gcc-12 -g -O0 -fno-stack-protector -fomit-frame-pointer -mno-unaligned-access -o align align.c
> align.c: In function ‘main’:
> align.c:22:8: warning: taking address of packed member of ‘struct disarray’ may result in an unaligned pointer value [-Waddress-of-packed-member]
>    22 |     f1(&d.v1);
>       |        ^~~~~
> 
> $ mips-linux-gnu-objdump -S --disassemble=main align
> [...]
>     f1(&d.v1);
>  7bc:	27a20019 	addiu	v0,sp,25
>  7c0:	00402025 	move	a0,v0
>  7c4:	8f82802c 	lw	v0,-32724(gp)
>  7c8:	0040c825 	move	t9,v0
>  7cc:	0411ffe0 	bal	750 <f1>
>  7d0:	00000000 	nop
>  7d4:	8fbc0010 	lw	gp,16(sp)
> [...]
> 
> '&d.v1' is passed in a0, again unaligned (stack pointer plus 25). And f1()
> uses it just like that:
> 
> $ mips-linux-gnu-objdump -S --disassemble=f1 align
> [...]
> void f1(uint32_t *v1) {
>  750:	afa40000 	sw	a0,0(sp)
>     *v1 += 42;
>  754:	8fa20000 	lw	v0,0(sp)
>  758:	8c420000 	lw	v0,0(v0)
>  75c:	2443002a 	addiu	v1,v0,42
> [...]
> 
> while the call to f2() is, again, fine:
> 
> $ mips-linux-gnu-objdump -S --disassemble=main align
>     f2(d.v2);
>  7e0:	8ba2001d 	lwl	v0,29(sp)
>  7e4:	9ba20020 	lwr	v0,32(sp)
>  7e8:	00402025 	move	a0,v0
>  7ec:	8f828030 	lw	v0,-32720(gp)
>  7f0:	0040c825 	move	t9,v0
>  7f4:	0411ffdf 	bal	774 <f2>
>  7f8:	00000000 	nop
>  7fc:	8fbc0010 	lw	gp,16(sp)
> 
> two loads, from stack pointer + 29 and stack pointer + 32. MIPS32 has lwl
> and lwr (the infamous US4814976A patent, now expired) to avoid load plus
> shift plus OR.
> 
> Now, you might argue that what I'm describing here might simply be gcc's
> behaviour, and if gcc avoids unaligned loads as long as we don't pass
> unaligned pointers around, that's not any better for us -- other compilers
> might do things differently.
> 
> And... yes, packed structures are actually a GNU extension: C standards
> don't say anything about loads like my f1(d.v2) call above, so all I'm
> showing here is that a particular compiler is fine with these accesses,
> but not unaligned pointers.
> 
> On the other hand, this seems to be a well established behaviour, and I
> don't think we could realistically drop every load of unaligned *values*.
> Unaligned pointers, we currently don't dereference any, because gcc warns
> otherwise.
> 
> So, practically speaking, I guess as long as we avoid dereferencing
> unaligned pointers, we should be fine?
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 7/9] checksum: introduce functions to compute the header part checksum for TCP/UDP
  2024-03-06  5:09                           ` David Gibson
@ 2024-03-08  0:08                             ` Stefano Brivio
  2024-03-08  1:20                               ` David Gibson
  0 siblings, 1 reply; 38+ messages in thread
From: Stefano Brivio @ 2024-03-08  0:08 UTC (permalink / raw)
  To: David Gibson; +Cc: Laurent Vivier, passt-dev

On Wed, 6 Mar 2024 16:09:23 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Mon, Mar 04, 2024 at 11:47:17PM +0100, Stefano Brivio wrote:
> > On Mon, 4 Mar 2024 12:00:40 +0100
> > Stefano Brivio <sbrivio@redhat.com> wrote:
> >   
> > > On Mon, 4 Mar 2024 12:54:12 +1100
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > >   
> > > > On Fri, Mar 01, 2024 at 07:56:51AM +0100, Stefano Brivio wrote:    
> > > > > On Fri, 1 Mar 2024 10:09:39 +1100
> > > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > >       
> > > > > > On Thu, Feb 29, 2024 at 03:15:53PM +0100, Stefano Brivio wrote:      
> > > > > > > On Thu, 29 Feb 2024 09:56:25 +0100
> > > > > > > Stefano Brivio <sbrivio@redhat.com> wrote:
> > > > > > >         
> > > > > > > > On Thu, 29 Feb 2024 19:49:09 +1100
> > > > > > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > > > > >         
> > > > > > > > > On Thu, Feb 29, 2024 at 08:05:09AM +0100, Stefano Brivio wrote:          
> > > > > > > > > > On Thu, 29 Feb 2024 11:38:53 +1100
> > > > > > > > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > > > > > > >             
> > > > > > > > > > > On Wed, Feb 28, 2024 at 02:26:18PM +0100, Laurent Vivier wrote:            
> > > > > > > > > > > > On 2/19/24 04:08, David Gibson wrote:              
> > > > > > > > > > > > > On Sat, Feb 17, 2024 at 04:07:23PM +0100, Laurent Vivier wrote:  
> > > > > > > > > > > > >
> > > > > > > > > > > > > [...]
> > > > > > > > > > > > >            
> > > > > > > > > > > > > > +/**
> > > > > > > > > > > > > > + * proto_ipv6_header_psum() - Calculates the partial checksum of an
> > > > > > > > > > > > > > + * 			      IPv6 header for UDP or TCP
> > > > > > > > > > > > > > + * @payload_len:	Payload length
> > > > > > > > > > > > > > + * @proto:		Protocol number
> > > > > > > > > > > > > > + * @saddr:		Source address
> > > > > > > > > > > > > > + * @daddr:		Destination address
> > > > > > > > > > > > > > + * Returns:	Partial checksum of the IPv6 header
> > > > > > > > > > > > > > + */
> > > > > > > > > > > > > > +uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol,
> > > > > > > > > > > > > > +				struct in6_addr saddr, struct in6_addr daddr)              
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Hrm, this is passing 2 16-byte IPv6 addresses by value, which might
> > > > > > > > > > > > > not be what we want.              
> > > > > > > > > > > > 
> > > > > > > > > > > > The idea here is to avoid the pointer alignment problem (&ip6h->saddr and
> > > > > > > > > > > > &ip6h->daddr can be misaligned).              
> > > > > > > > > > > 
> > > > > > > > > > > Ah, right.  That's a neat idea, but I'm not sure it really helps: I
> > > > > > > > > > > think it will just move the misaligned access from inside the function
> > > > > > > > > > > to the call site, where we try to marshal the parameter from something
> > > > > > > > > > > unaligned.            
> > > > > > > > > > 
> > > > > > > > > > I haven't tested this yet, but note that this is generally okay: the
> > > > > > > > > > problem is *dereferencing* an unaligned pointer. But if you load memory
> > > > > > > > > > from an aligned pointer, and extract a value from this memory, it's all
> > > > > > > > > > fine.            
> > > > > > > > > 
> > > > > > > > > Right, that's kind of what I'm getting at.  Assuming this value starts
> > > > > > > > > in an unaligned buffer, then in order to pass this by value the caller
> > > > > > > > > will need to load from that unaligned pointer.  AFAIK, the compiler
> > > > > > > > > will base the type of loads only on the pointed to type, which isn't
> > > > > > > > > changed whether we dereference in the caller or the callee.
> > > > > > > > >           
> > > > > > > > > > 
> > > > > > > > > > Speaking MIPS, this is not safe on all CPU models:
> > > > > > > > > > 
> > > > > > > > > > 	la	$1, 1002   # s1 now contains the value 1002
> > > > > > > > > > 	lw	$2, 0($1)  # load word from memory at 1002 + 0 into s2
> > > > > > > > > > 
> > > > > > > > > > but this is:
> > > > > > > > > > 
> > > > > > > > > > 	la	$1, 1000   # s1 now contains the value 1000
> > > > > > > > > > 	la	$2, 1004   # s3 now contains the value 1004
> > > > > > > > > > 	lw	$3, 0($1)  # load word from memory at 1000 + 0 into s3
> > > > > > > > > > 	lw	$4, 0($3)  # load word from memory at 1004 + 0 into s4
> > > > > > > > > > 	sll	$5, $3, 16 # 16-bit shift left s3 into s5
> > > > > > > > > > 	srl	$6, $4, 16 # 16-bit shift right s4 into s6
> > > > > > > > > > 	or	$2, $5, $6 # OR s5 and s6 into s2            
> > > > > > > > > 
> > > > > > > > > Right, but I don't think merely moving the dereference to the caller
> > > > > > > > > will necessarily induce the compiler to generate this rather than the
> > > > > > > > > former.          
> > > > > > > > 
> > > > > > > > Oh, oops, I didn't realise this was the case (I haven't reviewed the
> > > > > > > > patch yet).        
> > > > > > > 
> > > > > > > ...no, that's not the case. Dereferencing 'iph' from
> > > > > > > struct tcp[46]_l2_buf_t is fine:
> > > > > > > 
> > > > > > > struct tcp4_l2_buf_t {
> > > > > > >         uint8_t                    pad[2];               /*     0     2 */
> > > > > > >         struct tap_hdr             taph;                 /*     2    18 */
> > > > > > >         struct iphdr               iph;                  /*    20    20 */
> > > > > > > 	[...]
> > > > > > > } __attribute__((__packed__));
> > > > > > > 
> > > > > > > struct tcp6_l2_buf_t {
> > > > > > >         uint8_t                    pad[2];               /*     0     2 */
> > > > > > >         struct tap_hdr             taph;                 /*     2    18 */
> > > > > > >         struct ipv6hdr             ip6h;                 /*    20    40 */
> > > > > > > 	[...]
> > > > > > > } __attribute__((__packed__));
> > > > > > > 
> > > > > > > The problematic structures are the UDP buffers:
> > > > > > > 
> > > > > > > struct udp4_l2_buf_t {
> > > > > > >         struct sockaddr_in         s_in;                 /*     0    16 */
> > > > > > >         struct tap_hdr             taph;                 /*    16    18 */
> > > > > > >         struct iphdr               iph;                  /*    34    20 */
> > > > > > > 	[...]
> > > > > > > } __attribute__((__aligned__(4)));
> > > > > > > 
> > > > > > > and for UDP, this patch is dereferencing buffer pointers only, not
> > > > > > > pointers to headers.        
> > > > > > 
> > > > > > Ok... but my point remains, I'm not seeing that passing the address by
> > > > > > value actually helps - it just seems to change whether we need to
> > > > > > handle the unaligned load in the caller or the callee.      
> > > > > 
> > > > > For UDP and IPv4 (from 6/9):
> > > > > 
> > > > > +       b->iph.check = csum_ip4_header(b->iph.tot_len, IPPROTO_UDP,
> > > > > +                                      b->iph.saddr, b->iph.daddr);
> > > > > 
> > > > > and for IPv6 (this patch):
> > > > > 
> > > > > +       b->uh.check = csum(&b->uh, ntohs(b->ip6h.payload_len),
> > > > > +                          proto_ipv6_header_psum(b->ip6h.payload_len,
> > > > > +                                                 IPPROTO_UDP,
> > > > > +                                                 b->ip6h.saddr,
> > > > > +                                                 b->ip6h.daddr));
> > > > > 
> > > > > these cause loads starting from 'b', which is aligned, instead of
> > > > > passing 'iph' or 'ip6h', unaligned, and loading from there.      
> > > > 
> > > > No... the loads are still from b->ip6h.saddr, b->ip6h.daddr and
> > > > b->ip6h.payload_len.    
> > > 
> > > It depends how we define "loading from" -- the problem, in general, is
> > > not the memory location per se, the problem is dereferencing memory
> > > pointers.
> > > 
> > > I plan to try an example on MIPS in a bit [...]  
> > 
> > Actually, armhf first (for clarity):
> > 
> > $ cat align.c
> > #include <stdio.h>
> > #include <stdint.h>
> > 
> > struct disarray {
> >     uint8_t oops;
> >     uint32_t v1;
> >     uint32_t v2;
> > } __attribute__((packed, aligned(__alignof__(unsigned int))));
> > 
> > void f1(uint32_t *v1) {
> >     *v1 += 42;
> > }
> > 
> > uint32_t f2(uint32_t v2) {
> >     return v2++;
> > }
> > 
> > int main()
> > {
> >     struct disarray d = { 0x55, 0xaa, 0xaa };
> > 
> >     f1(&d.v1);
> >     f2(d.v2);
> > 
> >     fprintf(stdout, "%08x %08x", d.v1, d.v2);
> > }
> > 
> > $ arm-linux-gnueabihf-gcc-12 -g -O0 -fno-stack-protector -fomit-frame-pointer -mno-unaligned-access -o align align.c
> > align.c: In function ‘main’:
> > align.c:22:8: warning: taking address of packed member of ‘struct disarray’ may result in an unaligned pointer value [-Waddress-of-packed-member]
> >    22 |     f1(&d.v1);
> >       |        ^~~~~
> > 
> > $ arm-linux-gnueabihf-objdump -S --disassemble=main align
> > [...]
> >     f1(&d.v1);
> >  562:	ab01      	add	r3, sp, #4
> >  564:	3301      	adds	r3, #1
> >  566:	4618      	mov	r0, r3
> >  568:	f7ff ffde 	bl	528 <f1>
> > [...]
> > 
> > before the call to f1(), the address in r3 is not aligned (we just
> > added #1), despite -mno-unaligned-access. I guess gcc can only warn
> > about that, but not fix it.
> > 
> > This:
> >   https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html
> > 
> > says:
> >   -munaligned-access
> >   -mno-unaligned-access
> > 
> >     Enables (or disables) reading and writing of 16- and 32- bit values from addresses that are not 16- or 32- bit aligned. By default unaligned access is disabled for all pre-ARMv6, all ARMv6-M and for ARMv8-M Baseline architectures, and enabled for all other architectures. If unaligned access is not enabled then words in packed data structures are accessed a byte at a time. 
> > 
> > Implying, I guess, that on those architectures unaligned accesses
> > shouldn't be done. I think Thumb mode also has issues with this, by
> > the way. 
> > 
> > And in f1() we just have a ldr from that address (passed on r0):
> > void f1(uint32_t *v1) {
> >  528:	b082      	sub	sp, #8
> >  52a:	9001      	str	r0, [sp, #4]
> >     *v1 += 42;
> >  52c:	9b01      	ldr	r3, [sp, #4]
> >  52e:	681b      	ldr	r3, [r3, #0]
> >  530:	f103 022a 	add.w	r2, r3, #42	@ 0x2a
> > 
> > $ arm-linux-gnueabihf-objdump -S --disassemble=f1 align
> > [...]
> >     *v1 += 42;
> >  52c:	9b01      	ldr	r3, [sp, #4]
> >  52e:	681b      	ldr	r3, [r3, #0]
> >  530:	f103 022a 	add.w	r2, r3, #42	@ 0x2a
> > 
> > ...but the call to f2() is fine: we load with offset 8 from the stack
> > pointer, shift word right, load from offset 12, shift word left, OR:
> > 
> > $ arm-linux-gnueabihf-objdump -S --disassemble=main align
> > [...]
> >     f2(d.v2);
> >  56c:	9b02      	ldr	r3, [sp, #8]
> >  56e:	0a1b      	lsrs	r3, r3, #8
> >  570:	f89d 200c 	ldrb.w	r2, [sp, #12]
> >  574:	0612      	lsls	r2, r2, #24
> >  576:	4313      	orrs	r3, r2
> >  578:	4618      	mov	r0, r3
> >  57a:	f7ff ffe0 	bl	53e <f2>
> > [...]  
> 
> Huh.  Ok, so I guess the compiler realises it's doing a load from a
> packed structure and generates the necessary fixup code.  I thought it
> would only consider the type of the actually loaded value.

Well, it can't just do that, because otherwise we couldn't use packed
structures on any architecture that doesn't support unaligned accesses,
right?

Once you pass a pointer to an unaligned value, though, the fixup
information is lost, and the compiler models a function as simply taking
a given pointer with a given type: the model doesn't include information
as to where the value is stored.

> Are you sure it still does this correctly when optimization is enabled?

I'm fairly sure, even though I couldn't find a "normative" reference (at
least as far as GNU extensions are concerned) guaranteeing that (but I
didn't try hard).

I had to move f1() and f2() to different compilation units, otherwise
with -O2 they get merged into main():

$ cat align.c
#include <stdio.h>
#include <stdint.h>

struct disarray {
    uint8_t oops;
    uint32_t v1;
    uint32_t v2;
} __attribute__((packed, aligned(__alignof__(unsigned int))));

void f1(uint32_t *v1);

uint32_t f2(uint32_t v2);

int main()
{
    struct disarray d = { 0x55, 0xaa, 0xaa };

    f1(&d.v1);
    f2(d.v2);

    fprintf(stdout, "%08x %08x", d.v1, d.v2);
}

$ cat align_f1.c
#include <stdint.h>

void f1(uint32_t *v1) {
    *v1 += 42;
}

$ cat align_f2.c
#include <stdint.h>

uint32_t f2(uint32_t v2) {
    return v2++;
}

$ arm-linux-gnueabihf-gcc-12 -g -O2 -fno-stack-protector -fomit-frame-pointer -mno-unaligned-access -o align align.c align_f1.c align_f2.c
align.c: In function ‘main’:
align.c:18:8: warning: taking address of packed member of ‘struct disarray’ may result in an unaligned pointer value [-Waddress-of-packed-member]
   18 |     f1(&d.v1);
      |        ^~~~~

...note the usual dance before f2() is called, but not before the call
to f1():

$ arm-linux-gnueabihf-objdump -S --disassemble=main align
[...]
    struct disarray d = { 0x55, 0xaa, 0xaa };
 43e:	e903 0007 	stmdb	r3, {r0, r1, r2}

    f1(&d.v1);
 442:	f10d 0005 	add.w	r0, sp, #5
 446:	f000 f8a5 	bl	594 <f1>
    f2(d.v2);
 44a:	f89d 300c 	ldrb.w	r3, [sp, #12]
 44e:	9802      	ldr	r0, [sp, #8]
 450:	061b      	lsls	r3, r3, #24
 452:	ea43 2010 	orr.w	r0, r3, r0, lsr #8
 456:	f000 f8a1 	bl	59c <f2>
[...]

-- 
Stefano


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 7/9] checksum: introduce functions to compute the header part checksum for TCP/UDP
  2024-03-08  0:08                             ` Stefano Brivio
@ 2024-03-08  1:20                               ` David Gibson
  0 siblings, 0 replies; 38+ messages in thread
From: David Gibson @ 2024-03-08  1:20 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: Laurent Vivier, passt-dev

[-- Attachment #1: Type: text/plain, Size: 12921 bytes --]

On Fri, Mar 08, 2024 at 01:08:45AM +0100, Stefano Brivio wrote:
> On Wed, 6 Mar 2024 16:09:23 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Mon, Mar 04, 2024 at 11:47:17PM +0100, Stefano Brivio wrote:
> > > On Mon, 4 Mar 2024 12:00:40 +0100
> > > Stefano Brivio <sbrivio@redhat.com> wrote:
> > >   
> > > > On Mon, 4 Mar 2024 12:54:12 +1100
> > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > >   
> > > > > On Fri, Mar 01, 2024 at 07:56:51AM +0100, Stefano Brivio wrote:    
> > > > > > On Fri, 1 Mar 2024 10:09:39 +1100
> > > > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > > >       
> > > > > > > On Thu, Feb 29, 2024 at 03:15:53PM +0100, Stefano Brivio wrote:      
> > > > > > > > On Thu, 29 Feb 2024 09:56:25 +0100
> > > > > > > > Stefano Brivio <sbrivio@redhat.com> wrote:
> > > > > > > >         
> > > > > > > > > On Thu, 29 Feb 2024 19:49:09 +1100
> > > > > > > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > > > > > >         
> > > > > > > > > > On Thu, Feb 29, 2024 at 08:05:09AM +0100, Stefano Brivio wrote:          
> > > > > > > > > > > On Thu, 29 Feb 2024 11:38:53 +1100
> > > > > > > > > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > > > > > > > >             
> > > > > > > > > > > > On Wed, Feb 28, 2024 at 02:26:18PM +0100, Laurent Vivier wrote:            
> > > > > > > > > > > > > On 2/19/24 04:08, David Gibson wrote:              
> > > > > > > > > > > > > > On Sat, Feb 17, 2024 at 04:07:23PM +0100, Laurent Vivier wrote:  
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > [...]
> > > > > > > > > > > > > >            
> > > > > > > > > > > > > > > +/**
> > > > > > > > > > > > > > > + * proto_ipv6_header_psum() - Calculates the partial checksum of an
> > > > > > > > > > > > > > > + * 			      IPv6 header for UDP or TCP
> > > > > > > > > > > > > > > + * @payload_len:	Payload length
> > > > > > > > > > > > > > > + * @proto:		Protocol number
> > > > > > > > > > > > > > > + * @saddr:		Source address
> > > > > > > > > > > > > > > + * @daddr:		Destination address
> > > > > > > > > > > > > > > + * Returns:	Partial checksum of the IPv6 header
> > > > > > > > > > > > > > > + */
> > > > > > > > > > > > > > > +uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol,
> > > > > > > > > > > > > > > +				struct in6_addr saddr, struct in6_addr daddr)              
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Hrm, this is passing 2 16-byte IPv6 addresses by value, which might
> > > > > > > > > > > > > > not be what we want.              
> > > > > > > > > > > > > 
> > > > > > > > > > > > > The idea here is to avoid the pointer alignment problem (&ip6h->saddr and
> > > > > > > > > > > > > &ip6h->daddr can be misaligned).              
> > > > > > > > > > > > 
> > > > > > > > > > > > Ah, right.  That's a neat idea, but I'm not sure it really helps: I
> > > > > > > > > > > > think it will just move the misaligned access from inside the function
> > > > > > > > > > > > to the call site, where we try to marshal the parameter from something
> > > > > > > > > > > > unaligned.            
> > > > > > > > > > > 
> > > > > > > > > > > I haven't tested this yet, but note that this is generally okay: the
> > > > > > > > > > > problem is *dereferencing* an unaligned pointer. But if you load memory
> > > > > > > > > > > from an aligned pointer, and extract a value from this memory, it's all
> > > > > > > > > > > fine.            
> > > > > > > > > > 
> > > > > > > > > > Right, that's kind of what I'm getting at.  Assuming this value starts
> > > > > > > > > > in an unaligned buffer, then in order to pass this by value the caller
> > > > > > > > > > will need to load from that unaligned pointer.  AFAIK, the compiler
> > > > > > > > > > will base the type of loads only on the pointed to type, which isn't
> > > > > > > > > > changed whether we dereference in the caller or the callee.
> > > > > > > > > >           
> > > > > > > > > > > 
> > > > > > > > > > > Speaking MIPS, this is not safe on all CPU models:
> > > > > > > > > > > 
> > > > > > > > > > > 	la	$1, 1002   # s1 now contains the value 1002
> > > > > > > > > > > 	lw	$2, 0($1)  # load word from memory at 1002 + 0 into s2
> > > > > > > > > > > 
> > > > > > > > > > > but this is:
> > > > > > > > > > > 
> > > > > > > > > > > 	la	$1, 1000   # s1 now contains the value 1000
> > > > > > > > > > > 	la	$2, 1004   # s3 now contains the value 1004
> > > > > > > > > > > 	lw	$3, 0($1)  # load word from memory at 1000 + 0 into s3
> > > > > > > > > > > 	lw	$4, 0($3)  # load word from memory at 1004 + 0 into s4
> > > > > > > > > > > 	sll	$5, $3, 16 # 16-bit shift left s3 into s5
> > > > > > > > > > > 	srl	$6, $4, 16 # 16-bit shift right s4 into s6
> > > > > > > > > > > 	or	$2, $5, $6 # OR s5 and s6 into s2            
> > > > > > > > > > 
> > > > > > > > > > Right, but I don't think merely moving the dereference to the caller
> > > > > > > > > > will necessarily induce the compiler to generate this rather than the
> > > > > > > > > > former.          
> > > > > > > > > 
> > > > > > > > > Oh, oops, I didn't realise this was the case (I haven't reviewed the
> > > > > > > > > patch yet).        
> > > > > > > > 
> > > > > > > > ...no, that's not the case. Dereferencing 'iph' from
> > > > > > > > struct tcp[46]_l2_buf_t is fine:
> > > > > > > > 
> > > > > > > > struct tcp4_l2_buf_t {
> > > > > > > >         uint8_t                    pad[2];               /*     0     2 */
> > > > > > > >         struct tap_hdr             taph;                 /*     2    18 */
> > > > > > > >         struct iphdr               iph;                  /*    20    20 */
> > > > > > > > 	[...]
> > > > > > > > } __attribute__((__packed__));
> > > > > > > > 
> > > > > > > > struct tcp6_l2_buf_t {
> > > > > > > >         uint8_t                    pad[2];               /*     0     2 */
> > > > > > > >         struct tap_hdr             taph;                 /*     2    18 */
> > > > > > > >         struct ipv6hdr             ip6h;                 /*    20    40 */
> > > > > > > > 	[...]
> > > > > > > > } __attribute__((__packed__));
> > > > > > > > 
> > > > > > > > The problematic structures are the UDP buffers:
> > > > > > > > 
> > > > > > > > struct udp4_l2_buf_t {
> > > > > > > >         struct sockaddr_in         s_in;                 /*     0    16 */
> > > > > > > >         struct tap_hdr             taph;                 /*    16    18 */
> > > > > > > >         struct iphdr               iph;                  /*    34    20 */
> > > > > > > > 	[...]
> > > > > > > > } __attribute__((__aligned__(4)));
> > > > > > > > 
> > > > > > > > and for UDP, this patch is dereferencing buffer pointers only, not
> > > > > > > > pointers to headers.        
> > > > > > > 
> > > > > > > Ok... but my point remains, I'm not seeing that passing the address by
> > > > > > > value actually helps - it just seems to change whether we need to
> > > > > > > handle the unaligned load in the caller or the callee.      
> > > > > > 
> > > > > > For UDP and IPv4 (from 6/9):
> > > > > > 
> > > > > > +       b->iph.check = csum_ip4_header(b->iph.tot_len, IPPROTO_UDP,
> > > > > > +                                      b->iph.saddr, b->iph.daddr);
> > > > > > 
> > > > > > and for IPv6 (this patch):
> > > > > > 
> > > > > > +       b->uh.check = csum(&b->uh, ntohs(b->ip6h.payload_len),
> > > > > > +                          proto_ipv6_header_psum(b->ip6h.payload_len,
> > > > > > +                                                 IPPROTO_UDP,
> > > > > > +                                                 b->ip6h.saddr,
> > > > > > +                                                 b->ip6h.daddr));
> > > > > > 
> > > > > > these cause loads starting from 'b', which is aligned, instead of
> > > > > > passing 'iph' or 'ip6h', unaligned, and loading from there.      
> > > > > 
> > > > > No... the loads are still from b->ip6h.saddr, b->ip6h.daddr and
> > > > > b->ip6h.payload_len.    
> > > > 
> > > > It depends how we define "loading from" -- the problem, in general, is
> > > > not the memory location per se, the problem is dereferencing memory
> > > > pointers.
> > > > 
> > > > I plan to try an example on MIPS in a bit [...]  
> > > 
> > > Actually, armhf first (for clarity):
> > > 
> > > $ cat align.c
> > > #include <stdio.h>
> > > #include <stdint.h>
> > > 
> > > struct disarray {
> > >     uint8_t oops;
> > >     uint32_t v1;
> > >     uint32_t v2;
> > > } __attribute__((packed, aligned(__alignof__(unsigned int))));
> > > 
> > > void f1(uint32_t *v1) {
> > >     *v1 += 42;
> > > }
> > > 
> > > uint32_t f2(uint32_t v2) {
> > >     return v2++;
> > > }
> > > 
> > > int main()
> > > {
> > >     struct disarray d = { 0x55, 0xaa, 0xaa };
> > > 
> > >     f1(&d.v1);
> > >     f2(d.v2);
> > > 
> > >     fprintf(stdout, "%08x %08x", d.v1, d.v2);
> > > }
> > > 
> > > $ arm-linux-gnueabihf-gcc-12 -g -O0 -fno-stack-protector -fomit-frame-pointer -mno-unaligned-access -o align align.c
> > > align.c: In function ‘main’:
> > > align.c:22:8: warning: taking address of packed member of ‘struct disarray’ may result in an unaligned pointer value [-Waddress-of-packed-member]
> > >    22 |     f1(&d.v1);
> > >       |        ^~~~~
> > > 
> > > $ arm-linux-gnueabihf-objdump -S --disassemble=main align
> > > [...]
> > >     f1(&d.v1);
> > >  562:	ab01      	add	r3, sp, #4
> > >  564:	3301      	adds	r3, #1
> > >  566:	4618      	mov	r0, r3
> > >  568:	f7ff ffde 	bl	528 <f1>
> > > [...]
> > > 
> > > before the call to f1(), the address in r3 is not aligned (we just
> > > added #1), despite -mno-unaligned-access. I guess gcc can only warn
> > > about that, but not fix it.
> > > 
> > > This:
> > >   https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html
> > > 
> > > says:
> > >   -munaligned-access
> > >   -mno-unaligned-access
> > > 
> > >     Enables (or disables) reading and writing of 16- and 32- bit values from addresses that are not 16- or 32- bit aligned. By default unaligned access is disabled for all pre-ARMv6, all ARMv6-M and for ARMv8-M Baseline architectures, and enabled for all other architectures. If unaligned access is not enabled then words in packed data structures are accessed a byte at a time. 
> > > 
> > > Implying, I guess, that on those architectures unaligned accesses
> > > shouldn't be done. I think Thumb mode also has issues with this, by
> > > the way. 
> > > 
> > > And in f1() we just have a ldr from that address (passed on r0):
> > > void f1(uint32_t *v1) {
> > >  528:	b082      	sub	sp, #8
> > >  52a:	9001      	str	r0, [sp, #4]
> > >     *v1 += 42;
> > >  52c:	9b01      	ldr	r3, [sp, #4]
> > >  52e:	681b      	ldr	r3, [r3, #0]
> > >  530:	f103 022a 	add.w	r2, r3, #42	@ 0x2a
> > > 
> > > $ arm-linux-gnueabihf-objdump -S --disassemble=f1 align
> > > [...]
> > >     *v1 += 42;
> > >  52c:	9b01      	ldr	r3, [sp, #4]
> > >  52e:	681b      	ldr	r3, [r3, #0]
> > >  530:	f103 022a 	add.w	r2, r3, #42	@ 0x2a
> > > 
> > > ...but the call to f2() is fine: we load with offset 8 from the stack
> > > pointer, shift word right, load from offset 12, shift word left, OR:
> > > 
> > > $ arm-linux-gnueabihf-objdump -S --disassemble=main align
> > > [...]
> > >     f2(d.v2);
> > >  56c:	9b02      	ldr	r3, [sp, #8]
> > >  56e:	0a1b      	lsrs	r3, r3, #8
> > >  570:	f89d 200c 	ldrb.w	r2, [sp, #12]
> > >  574:	0612      	lsls	r2, r2, #24
> > >  576:	4313      	orrs	r3, r2
> > >  578:	4618      	mov	r0, r3
> > >  57a:	f7ff ffe0 	bl	53e <f2>
> > > [...]  
> > 
> > Huh.  Ok, so I guess the compiler realises it's doing a load from a
> > packed structure and generates the necessary fixup code.  I thought it
> > would only consider the type of the actually loaded value.
> 
> Well, it can't just do that, because otherwise we couldn't use packed
> structures on any architecture that doesn't support unaligned accesses,
> right?

Hm.. yeah, I guess not, I hadn't thought it through that way.  I find
it difficult to reason about what will happen with packed structures,
since they explicitly break rules which are otherwise pretty much
universal.

> Once you pass a pointer to an unaligned value, though, the fixup
> information is lost, and the compiler models a function as simply taking
> a given pointer with a given type: the model doesn't include information
> as to where the value is stored.

Right.  Everything you've said about this now makes sense to me with
this realisation.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2024-03-08  1:20 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-17 15:07 [PATCH v3 0/9] Add vhost-user support to passt (part 1) Laurent Vivier
2024-02-17 15:07 ` [PATCH v3 1/9] iov: add some functions to manage iovec Laurent Vivier
2024-02-19  2:45   ` David Gibson
2024-02-17 15:07 ` [PATCH v3 2/9] pcap: add pcap_iov() Laurent Vivier
2024-02-19  2:50   ` David Gibson
2024-02-17 15:07 ` [PATCH v3 3/9] checksum: align buffers Laurent Vivier
2024-02-17 15:07 ` [PATCH v3 4/9] checksum: add csum_iov() Laurent Vivier
2024-02-19  2:52   ` David Gibson
2024-02-17 15:07 ` [PATCH v3 5/9] util: move IP stuff from util.[ch] to ip.[ch] Laurent Vivier
2024-02-19  2:59   ` David Gibson
2024-02-17 15:07 ` [PATCH v3 6/9] checksum: use csum_ip4_header() in udp.c and tcp.c Laurent Vivier
2024-02-19  3:01   ` David Gibson
2024-02-29 16:24   ` Stefano Brivio
2024-02-29 23:10     ` David Gibson
2024-03-01  7:58       ` Stefano Brivio
2024-03-01 12:23         ` David Gibson
2024-03-04 17:04           ` Stefano Brivio
2024-02-17 15:07 ` [PATCH v3 7/9] checksum: introduce functions to compute the header part checksum for TCP/UDP Laurent Vivier
2024-02-19  3:08   ` David Gibson
2024-02-28 13:26     ` Laurent Vivier
2024-02-29  0:38       ` David Gibson
2024-02-29  7:05         ` Stefano Brivio
2024-02-29  8:49           ` David Gibson
2024-02-29  8:56             ` Stefano Brivio
2024-02-29 14:15               ` Stefano Brivio
2024-02-29 23:09                 ` David Gibson
2024-03-01  6:56                   ` Stefano Brivio
2024-03-04  1:54                     ` David Gibson
2024-03-04 11:00                       ` Stefano Brivio
2024-03-04 22:47                         ` Stefano Brivio
2024-03-06  5:09                           ` David Gibson
2024-03-08  0:08                             ` Stefano Brivio
2024-03-08  1:20                               ` David Gibson
2024-02-17 15:07 ` [PATCH v3 8/9] tap: make tap_update_mac() generic Laurent Vivier
2024-02-17 15:07 ` [PATCH v3 9/9] tcp: Introduce ipv4_fill_headers()/ipv6_fill_headers() Laurent Vivier
2024-02-19  3:14   ` David Gibson
2024-02-29 16:29   ` Stefano Brivio
2024-02-29 16:31 ` [PATCH v3 0/9] Add vhost-user support to passt (part 1) Stefano Brivio

Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).