public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
* [PATCH 00/14] Clean up checksum and header generation for inbound packets
@ 2022-10-17  8:57 David Gibson
  2022-10-17  8:57 ` [PATCH 01/14] Add csum_icmp6() helper for calculating ICMPv6 checksums David Gibson
                   ` (13 more replies)
  0 siblings, 14 replies; 36+ messages in thread
From: David Gibson @ 2022-10-17  8:57 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, David Gibson

The main packet "fast paths" for UDP and TCP mostly just forward
packets rather than generating them from scratch.  However the control
paths for ICMP and DHCP sometimes generate packets more or less from
scratch.  Because these are relatively rare, it's not performance
critical.

The paths for sending these packets have some duplication of the
header generation.  There's also some layering violation in
tap_ip_send() which both generates IP headers and updates the L4 (UDP
or UCMP) checksum.

Finally that checksum generation is a little awkward: it temporarily
generates the IP pseudo header (or something close enough to serve) in
the place of the actual header, generates the checksum, then replaces
it with the real IP header.  This approach seems to be causing
miscompiles with some LTO optimization, because the stores to the
pseudo header are being moved or elided across the code calculating
the checksum.

This series addresses all of these.  We consolidate and clarify the
packet sending helpers, and use them in some places there was
previously duplicated code.  In the process we use new checksum
generation helpers which take a different approach which should avoid
the LTO problems (this aspect I haven't tested yet though).

David Gibson (14):
  Add csum_icmp6() helper for calculating ICMPv6 checksums
  Add csum_icmp4() helper for calculating ICMPv4 checksums
  Add csum_udp6() helper for calculating UDP over IPv6 checksums
  Add csum_udp4() helper for calculating UDP over IPv4 checksums
  Add csum_ip4_header() helper to calculate IPv4 header checksums
  Add helpers for normal inbound packet destination addresses
  Remove support for TCP packets from tap_ip_send()
  tap: Remove unhelpeful vnet_pre optimization from tap_send()
  Split tap_ip_send() into IPv4 and IPv6 specific functions
  tap: Split tap_ip6_send() into UDP and ICMP variants
  ndp: Remove unneeded eh_source parameter
  ndp: Use tap_icmp6_send() helper
  tap: Split tap_ip4_send() into UDP and ICMP variants
  dhcp: Use tap_udp4_send() helper in dhcp()

 arp.c      |   2 +-
 checksum.c | 122 ++++++++++++++++++-----
 checksum.h |  19 +++-
 dhcp.c     |  19 +---
 dhcpv6.c   |  21 +---
 icmp.c     |  12 +--
 ndp.c      |  28 +-----
 ndp.h      |   3 +-
 tap.c      | 286 ++++++++++++++++++++++++++++++++---------------------
 tap.h      |  19 +++-
 10 files changed, 323 insertions(+), 208 deletions(-)

-- 
2.37.3


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 01/14] Add csum_icmp6() helper for calculating ICMPv6 checksums
  2022-10-17  8:57 [PATCH 00/14] Clean up checksum and header generation for inbound packets David Gibson
@ 2022-10-17  8:57 ` David Gibson
  2022-10-18  3:01   ` Stefano Brivio
  2022-10-17  8:57 ` [PATCH 02/14] Add csum_icmp4() helper for calculating ICMPv4 checksums David Gibson
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: David Gibson @ 2022-10-17  8:57 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, David Gibson

At least two places in passt calculate ICMPv6 checksums, ndp() and
tap_ip_send().  Add a helper to handle this calculation in both places.
For future flexibility, the new helper takes parameters for the fields in
the IPv6 pseudo-header, so an IPv6 header or pseudo-header doesn't need to
be explicitly constructed.  It also allows the ICMPv6 header and payload to
be in separate buffers, although we don't use this yet.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 checksum.c | 27 +++++++++++++++++++++++++++
 checksum.h |  7 +++++++
 ndp.c      |  5 +----
 tap.c      |  6 ++----
 4 files changed, 37 insertions(+), 8 deletions(-)

diff --git a/checksum.c b/checksum.c
index 56ad01e..0e207c8 100644
--- a/checksum.c
+++ b/checksum.c
@@ -52,6 +52,8 @@
 #include <stddef.h>
 #include <stdint.h>
 
+#include <linux/icmpv6.h>
+
 /**
  * sum_16b() - Calculate sum of 16-bit words
  * @buf:	Input buffer
@@ -105,6 +107,31 @@ uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init)
 	return (uint16_t)~csum_fold(sum_16b(buf, len) + init);
 }
 
+/**
+ * csum_icmp6() - Calculate checksum for an ICMPv6 packet
+ * @icmp6hr:	ICMPv6 header, initialized apart from checksum
+ * @saddr:	IPv6 source address
+ * @daddr:	IPv6 destination address
+ * @payload:	ICMP packet payload
+ * @len:	Length of @payload (not including ICMPv6 header)
+ */
+void csum_icmp6(struct icmp6hdr *icmp6hr,
+		const struct in6_addr *saddr,
+		const struct in6_addr *daddr,
+		const void *payload,
+		size_t len)
+{
+	/* Partial checksum for the pseudo-IPv6 header */
+	uint32_t psum = sum_16b(saddr, sizeof(*saddr)) +
+		sum_16b(daddr, sizeof(*daddr)) +
+		htons(len + sizeof(*icmp6hr)) + htons(IPPROTO_ICMPV6);
+
+	icmp6hr->icmp6_cksum = 0;
+	/* Add in partial checksum for the ICMPv6 header alone */
+	psum += sum_16b(icmp6hr, sizeof(*icmp6hr));
+	icmp6hr->icmp6_cksum = csum_unaligned(payload, len, psum);
+}
+
 /**
  * csum_tcp4() - Calculate TCP checksum for IPv4 and set in place
  * @iph:	Packet buffer, IP header
diff --git a/checksum.h b/checksum.h
index 5418406..2c72200 100644
--- a/checksum.h
+++ b/checksum.h
@@ -6,9 +6,16 @@
 #ifndef CHECKSUM_H
 #define CHECKSUM_H
 
+struct icmp6hdr;
+
 uint32_t sum_16b(const void *buf, size_t len);
 uint16_t csum_fold(uint32_t sum);
 uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init);
+void csum_icmp6(struct icmp6hdr *ih,
+		const struct in6_addr *saddr,
+		const struct in6_addr *daddr,
+		const void *payload,
+		size_t len);
 void csum_tcp4(struct iphdr *iph);
 uint16_t csum(const void *buf, size_t len, uint32_t init);
 
diff --git a/ndp.c b/ndp.c
index dec36a9..03f1d06 100644
--- a/ndp.c
+++ b/ndp.c
@@ -189,10 +189,7 @@ dns_done:
 		ip6hr->saddr = c->ip6.addr_ll;
 
 	ip6hr->payload_len = htons(sizeof(*ihr) + len);
-	ip6hr->hop_limit = IPPROTO_ICMPV6;
-	ihr->icmp6_cksum = 0;
-	ihr->icmp6_cksum = csum_unaligned(ip6hr, sizeof(*ip6hr) +
-						 sizeof(*ihr) + len, 0);
+	csum_icmp6(ihr, &ip6hr->saddr, &ip6hr->daddr, ihr + 1, len);
 
 	ip6hr->version = 6;
 	ip6hr->nexthdr = IPPROTO_ICMPV6;
diff --git a/tap.c b/tap.c
index 8b6d9bc..aafc92b 100644
--- a/tap.c
+++ b/tap.c
@@ -191,10 +191,8 @@ void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto,
 		} else if (proto == IPPROTO_ICMPV6) {
 			struct icmp6hdr *ih = (struct icmp6hdr *)(ip6h + 1);
 
-			ih->icmp6_cksum = 0;
-			ih->icmp6_cksum = csum_unaligned(ip6h,
-							 len + sizeof(*ip6h),
-							 0);
+			csum_icmp6(ih, &ip6h->saddr, &ip6h->daddr,
+				   ih + 1, len - sizeof(*ih));
 		}
 		ip6h->version = 6;
 		ip6h->nexthdr = proto;
-- 
@@ -191,10 +191,8 @@ void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto,
 		} else if (proto == IPPROTO_ICMPV6) {
 			struct icmp6hdr *ih = (struct icmp6hdr *)(ip6h + 1);
 
-			ih->icmp6_cksum = 0;
-			ih->icmp6_cksum = csum_unaligned(ip6h,
-							 len + sizeof(*ip6h),
-							 0);
+			csum_icmp6(ih, &ip6h->saddr, &ip6h->daddr,
+				   ih + 1, len - sizeof(*ih));
 		}
 		ip6h->version = 6;
 		ip6h->nexthdr = proto;
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 02/14] Add csum_icmp4() helper for calculating ICMPv4 checksums
  2022-10-17  8:57 [PATCH 00/14] Clean up checksum and header generation for inbound packets David Gibson
  2022-10-17  8:57 ` [PATCH 01/14] Add csum_icmp6() helper for calculating ICMPv6 checksums David Gibson
@ 2022-10-17  8:57 ` David Gibson
  2022-10-18  3:01   ` Stefano Brivio
  2022-10-17  8:57 ` [PATCH 03/14] Add csum_udp6() helper for calculating UDP over IPv6 checksums David Gibson
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: David Gibson @ 2022-10-17  8:57 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, David Gibson

Although tap_ip_send() is currently the only place calculating ICMPv4
checksums, create a helper function for symmetry with ICMPv6.  For future
flexibility it allows the ICMPv6 header and payload to be in separate
buffers.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 checksum.c | 15 +++++++++++++++
 checksum.h |  2 ++
 tap.c      |  4 +---
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/checksum.c b/checksum.c
index 0e207c8..c8b6b42 100644
--- a/checksum.c
+++ b/checksum.c
@@ -52,6 +52,7 @@
 #include <stddef.h>
 #include <stdint.h>
 
+#include <linux/icmp.h>
 #include <linux/icmpv6.h>
 
 /**
@@ -107,6 +108,20 @@ uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init)
 	return (uint16_t)~csum_fold(sum_16b(buf, len) + init);
 }
 
+/**
+ * csum_icmp4() - Calculate checksum for an ICMPv4 packet
+ * @icmp4hr:	ICMPv4 header, initialized apart from checksum
+ * @payload:	ICMPv4 packet payload
+ * @len:	Length of @payload (not including ICMPv4 header)
+ */
+void csum_icmp4(struct icmphdr *icmp4hr, const void *payload, size_t len)
+{
+	/* Partial checksum for ICMPv4 header alone */
+	uint32_t hrsum = sum_16b(icmp4hr, sizeof(*icmp4hr));
+	icmp4hr->checksum = 0;
+	icmp4hr->checksum = csum_unaligned(payload, len, hrsum);
+}
+
 /**
  * csum_icmp6() - Calculate checksum for an ICMPv6 packet
  * @icmp6hr:	ICMPv6 header, initialized apart from checksum
diff --git a/checksum.h b/checksum.h
index 2c72200..ff95cf9 100644
--- a/checksum.h
+++ b/checksum.h
@@ -6,11 +6,13 @@
 #ifndef CHECKSUM_H
 #define CHECKSUM_H
 
+struct icmphdr;
 struct icmp6hdr;
 
 uint32_t sum_16b(const void *buf, size_t len);
 uint16_t csum_fold(uint32_t sum);
 uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init);
+void csum_icmp4(struct icmphdr *ih, const void *payload, size_t len);
 void csum_icmp6(struct icmp6hdr *ih,
 		const struct in6_addr *saddr,
 		const struct in6_addr *daddr,
diff --git a/tap.c b/tap.c
index aafc92b..f082901 100644
--- a/tap.c
+++ b/tap.c
@@ -148,9 +148,7 @@ void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto,
 			uh->check = 0;
 		} else if (iph->protocol == IPPROTO_ICMP) {
 			struct icmphdr *ih = (struct icmphdr *)(iph + 1);
-
-			ih->checksum = 0;
-			ih->checksum = csum_unaligned(ih, len, 0);
+			csum_icmp4(ih, ih + 1, len - sizeof(*ih));
 		}
 
 		if (tap_send(c, buf, len + sizeof(*iph) + sizeof(*eh), 1) < 0)
-- 
@@ -148,9 +148,7 @@ void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto,
 			uh->check = 0;
 		} else if (iph->protocol == IPPROTO_ICMP) {
 			struct icmphdr *ih = (struct icmphdr *)(iph + 1);
-
-			ih->checksum = 0;
-			ih->checksum = csum_unaligned(ih, len, 0);
+			csum_icmp4(ih, ih + 1, len - sizeof(*ih));
 		}
 
 		if (tap_send(c, buf, len + sizeof(*iph) + sizeof(*eh), 1) < 0)
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 03/14] Add csum_udp6() helper for calculating UDP over IPv6 checksums
  2022-10-17  8:57 [PATCH 00/14] Clean up checksum and header generation for inbound packets David Gibson
  2022-10-17  8:57 ` [PATCH 01/14] Add csum_icmp6() helper for calculating ICMPv6 checksums David Gibson
  2022-10-17  8:57 ` [PATCH 02/14] Add csum_icmp4() helper for calculating ICMPv4 checksums David Gibson
@ 2022-10-17  8:57 ` David Gibson
  2022-10-18  3:02   ` Stefano Brivio
  2022-10-17  8:57 ` [PATCH 04/14] Add csum_udp4() helper for calculating UDP over IPv4 checksums David Gibson
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: David Gibson @ 2022-10-17  8:57 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, David Gibson

Add a helper for calculating UDP checksums when used over IPv6
For future flexibility, the new helper takes parameters for the fields in
the IPv6 pseudo-header, so an IPv6 header or pseudo-header doesn't need to
be explicitly constructed.  It also allows the UDP header and payload to
be in separate buffers, although we don't use this yet.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 checksum.c | 23 +++++++++++++++++++++++
 checksum.h |  5 +++++
 tap.c      |  5 ++---
 3 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/checksum.c b/checksum.c
index c8b6b42..0849fb1 100644
--- a/checksum.c
+++ b/checksum.c
@@ -52,6 +52,7 @@
 #include <stddef.h>
 #include <stdint.h>
 
+#include <linux/udp.h>
 #include <linux/icmp.h>
 #include <linux/icmpv6.h>
 
@@ -122,6 +123,28 @@ void csum_icmp4(struct icmphdr *icmp4hr, const void *payload, size_t len)
 	icmp4hr->checksum = csum_unaligned(payload, len, hrsum);
 }
 
+/**
+ * csum_udp6() - Calculate checksum for a UDP over IPv6 packet
+ * @udp6hr:	UDP header, initialized apart from checksum
+ * @payload:	UDP packet payload
+ * @len:	Length of @payload (not including UDP header)
+ */
+void csum_udp6(struct udphdr *udp6hr,
+	       const struct in6_addr *saddr,
+	       const struct in6_addr *daddr,
+	       const void *payload, size_t len)
+{
+	/* Partial checksum for the pseudo-IPv6 header */
+	uint32_t psum = sum_16b(saddr, sizeof(*saddr)) +
+		sum_16b(daddr, sizeof(*daddr)) +
+		htons(len + sizeof(*udp6hr)) + htons(IPPROTO_UDP);
+
+	udp6hr->check = 0;
+	/* Add in partial checksum for the UDP header alone */
+	psum += sum_16b(udp6hr, sizeof(*udp6hr));
+	udp6hr->check = csum_unaligned(payload, len, psum);
+}
+
 /**
  * csum_icmp6() - Calculate checksum for an ICMPv6 packet
  * @icmp6hr:	ICMPv6 header, initialized apart from checksum
diff --git a/checksum.h b/checksum.h
index ff95cf9..1b9f48e 100644
--- a/checksum.h
+++ b/checksum.h
@@ -6,6 +6,7 @@
 #ifndef CHECKSUM_H
 #define CHECKSUM_H
 
+struct udphdr;
 struct icmphdr;
 struct icmp6hdr;
 
@@ -13,6 +14,10 @@ uint32_t sum_16b(const void *buf, size_t len);
 uint16_t csum_fold(uint32_t sum);
 uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init);
 void csum_icmp4(struct icmphdr *ih, const void *payload, size_t len);
+void csum_udp6(struct udphdr *udp6hr,
+	       const struct in6_addr *saddr,
+	       const struct in6_addr *daddr,
+	       const void *payload, size_t len);
 void csum_icmp6(struct icmp6hdr *ih,
 		const struct in6_addr *saddr,
 		const struct in6_addr *daddr,
diff --git a/tap.c b/tap.c
index f082901..9c197cb 100644
--- a/tap.c
+++ b/tap.c
@@ -183,9 +183,8 @@ void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto,
 		} else if (proto == IPPROTO_UDP) {
 			struct udphdr *uh = (struct udphdr *)(ip6h + 1);
 
-			uh->check = 0;
-			uh->check = csum_unaligned(ip6h, len + sizeof(*ip6h),
-						   0);
+			csum_udp6(uh, &ip6h->saddr, &ip6h->daddr,
+				  uh + 1, len - sizeof(*uh));
 		} else if (proto == IPPROTO_ICMPV6) {
 			struct icmp6hdr *ih = (struct icmp6hdr *)(ip6h + 1);
 
-- 
@@ -183,9 +183,8 @@ void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto,
 		} else if (proto == IPPROTO_UDP) {
 			struct udphdr *uh = (struct udphdr *)(ip6h + 1);
 
-			uh->check = 0;
-			uh->check = csum_unaligned(ip6h, len + sizeof(*ip6h),
-						   0);
+			csum_udp6(uh, &ip6h->saddr, &ip6h->daddr,
+				  uh + 1, len - sizeof(*uh));
 		} else if (proto == IPPROTO_ICMPV6) {
 			struct icmp6hdr *ih = (struct icmp6hdr *)(ip6h + 1);
 
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 04/14] Add csum_udp4() helper for calculating UDP over IPv4 checksums
  2022-10-17  8:57 [PATCH 00/14] Clean up checksum and header generation for inbound packets David Gibson
                   ` (2 preceding siblings ...)
  2022-10-17  8:57 ` [PATCH 03/14] Add csum_udp6() helper for calculating UDP over IPv6 checksums David Gibson
@ 2022-10-17  8:57 ` David Gibson
  2022-10-18  3:03   ` Stefano Brivio
  2022-10-17  8:57 ` [PATCH 05/14] Add csum_ip4_header() helper to calculate IPv4 header checksums David Gibson
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: David Gibson @ 2022-10-17  8:57 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, David Gibson

At least two places in passt fill in UDP over IPv4 checksums, although
since UDP checksums are optional with IPv4 that just amounts to storing
a 0 (in tap_ip_send()) or leaving a 0 from an earlier initialization (in
dhcp()).  For consistency, add a helper for this "calculation".

Just for the heck of it, add the option (compile time disabled for now) to
calculate real UDP checksums.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 checksum.c | 33 +++++++++++++++++++++++++++++++++
 checksum.h |  3 +++
 dhcp.c     |  2 +-
 tap.c      |  2 +-
 4 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/checksum.c b/checksum.c
index 0849fb1..72f1cfb 100644
--- a/checksum.c
+++ b/checksum.c
@@ -56,6 +56,11 @@
 #include <linux/icmp.h>
 #include <linux/icmpv6.h>
 
+/* Checksums are optional for UDP over IPv4, so we usually just set
+ * them to 0.  Change this 1 to calculate real UDP over IPv4 checksums
+ */
+#define UDP4_REAL_CHECKSUMS	0
+
 /**
  * sum_16b() - Calculate sum of 16-bit words
  * @buf:	Input buffer
@@ -109,6 +114,34 @@ uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init)
 	return (uint16_t)~csum_fold(sum_16b(buf, len) + init);
 }
 
+/**
+ * csum_udp4() - Calculate checksum for a UDP over IPv4 packet
+ * @udp4hr:	UDP header, initialized apart from checksum
+ * @saddr:	IPv4 source address
+ * @daddr:	IPv4 destination address
+ * @payload:	ICMPv4 packet payload
+ * @len:	Length of @payload (not including UDP)
+ */
+void csum_udp4(struct udphdr *udp4hr,
+	       in_addr_t saddr, in_addr_t daddr,
+	       const void *payload, size_t len)
+{
+	/* UDP checksums are optional, so don't bother */
+	udp4hr->check = 0;
+
+	if (UDP4_REAL_CHECKSUMS) {
+		/* UNTESTED: if we did want real UDPv4 checksums, this
+		 * is roughly what we'd need */
+		uint32_t psum = csum_fold(htonl(saddr))
+			+ csum_fold(htonl(daddr))
+			+ htons(len + sizeof(*udp4hr))
+			+ htons(IPPROTO_UDP);
+		/* Add in partial checksum for the UDP header alone */
+		psum += sum_16b(udp4hr, sizeof(*udp4hr));
+		udp4hr->check = csum_unaligned(payload, len, psum);
+	}
+}
+
 /**
  * csum_icmp4() - Calculate checksum for an ICMPv4 packet
  * @icmp4hr:	ICMPv4 header, initialized apart from checksum
diff --git a/checksum.h b/checksum.h
index 1b9f48e..a9502b9 100644
--- a/checksum.h
+++ b/checksum.h
@@ -13,6 +13,9 @@ struct icmp6hdr;
 uint32_t sum_16b(const void *buf, size_t len);
 uint16_t csum_fold(uint32_t sum);
 uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init);
+void csum_udp4(struct udphdr *udp4hr,
+	       in_addr_t saddr, in_addr_t daddr,
+	       const void *payload, size_t len);
 void csum_icmp4(struct icmphdr *ih, const void *payload, size_t len);
 void csum_udp6(struct udphdr *udp6hr,
 	       const struct in6_addr *saddr,
diff --git a/dhcp.c b/dhcp.c
index 7f0cc0b..8dcf645 100644
--- a/dhcp.c
+++ b/dhcp.c
@@ -364,9 +364,9 @@ int dhcp(const struct ctx *c, const struct pool *p)
 		opt_set_dns_search(c, sizeof(m->o));
 
 	uh->len = htons(len = offsetof(struct msg, o) + fill(m) + sizeof(*uh));
-	uh->check = 0;
 	uh->source = htons(67);
 	uh->dest = htons(68);
+	csum_udp4(uh, c->ip4.gw, c->ip4.addr, uh + 1, len - sizeof(*uh));
 
 	iph->tot_len = htons(len += sizeof(*iph));
 	iph->daddr = c->ip4.addr;
diff --git a/tap.c b/tap.c
index 9c197cb..58fc1de 100644
--- a/tap.c
+++ b/tap.c
@@ -145,7 +145,7 @@ void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto,
 		} else if (iph->protocol == IPPROTO_UDP) {
 			struct udphdr *uh = (struct udphdr *)(iph + 1);
 
-			uh->check = 0;
+			csum_udp4(uh, iph->saddr, iph->daddr, uh + 1, len - sizeof(*uh));
 		} else if (iph->protocol == IPPROTO_ICMP) {
 			struct icmphdr *ih = (struct icmphdr *)(iph + 1);
 			csum_icmp4(ih, ih + 1, len - sizeof(*ih));
-- 
@@ -145,7 +145,7 @@ void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto,
 		} else if (iph->protocol == IPPROTO_UDP) {
 			struct udphdr *uh = (struct udphdr *)(iph + 1);
 
-			uh->check = 0;
+			csum_udp4(uh, iph->saddr, iph->daddr, uh + 1, len - sizeof(*uh));
 		} else if (iph->protocol == IPPROTO_ICMP) {
 			struct icmphdr *ih = (struct icmphdr *)(iph + 1);
 			csum_icmp4(ih, ih + 1, len - sizeof(*ih));
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 05/14] Add csum_ip4_header() helper to calculate IPv4 header checksums
  2022-10-17  8:57 [PATCH 00/14] Clean up checksum and header generation for inbound packets David Gibson
                   ` (3 preceding siblings ...)
  2022-10-17  8:57 ` [PATCH 04/14] Add csum_udp4() helper for calculating UDP over IPv4 checksums David Gibson
@ 2022-10-17  8:57 ` David Gibson
  2022-10-18  3:03   ` Stefano Brivio
  2022-10-17  8:57 ` [PATCH 06/14] Add helpers for normal inbound packet destination addresses David Gibson
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: David Gibson @ 2022-10-17  8:57 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, David Gibson

We calculate IPv4 header checksums in at least two places, in dhcp() and
in tap_ip_send.  Add a helper to handle this calculation in both places.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 checksum.c | 6 ++++++
 checksum.h | 1 +
 dhcp.c     | 3 +--
 tap.c      | 3 +--
 4 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/checksum.c b/checksum.c
index 72f1cfb..f25a96a 100644
--- a/checksum.c
+++ b/checksum.c
@@ -114,6 +114,12 @@ uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init)
 	return (uint16_t)~csum_fold(sum_16b(buf, len) + init);
 }
 
+void csum_ip4_header(struct iphdr *ip4hr)
+{
+	ip4hr->check = 0;
+	ip4hr->check = csum_unaligned(ip4hr, (size_t)ip4hr->ihl * 4, 0);
+}
+
 /**
  * csum_udp4() - Calculate checksum for a UDP over IPv4 packet
  * @udp4hr:	UDP header, initialized apart from checksum
diff --git a/checksum.h b/checksum.h
index a9502b9..bdb2ed2 100644
--- a/checksum.h
+++ b/checksum.h
@@ -13,6 +13,7 @@ struct icmp6hdr;
 uint32_t sum_16b(const void *buf, size_t len);
 uint16_t csum_fold(uint32_t sum);
 uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init);
+void csum_ip4_header(struct iphdr *ip4hr);
 void csum_udp4(struct udphdr *udp4hr,
 	       in_addr_t saddr, in_addr_t daddr,
 	       const void *payload, size_t len);
diff --git a/dhcp.c b/dhcp.c
index 8dcf645..875e18b 100644
--- a/dhcp.c
+++ b/dhcp.c
@@ -371,8 +371,7 @@ int dhcp(const struct ctx *c, const struct pool *p)
 	iph->tot_len = htons(len += sizeof(*iph));
 	iph->daddr = c->ip4.addr;
 	iph->saddr = c->ip4.gw;
-	iph->check = 0;
-	iph->check = csum_unaligned(iph, (intptr_t)(iph->ihl * 4), 0);
+	csum_ip4_header(iph);
 
 	len += sizeof(*eh);
 	memcpy(eh->h_dest, eh->h_source, ETH_ALEN);
diff --git a/tap.c b/tap.c
index 58fc1de..de02c56 100644
--- a/tap.c
+++ b/tap.c
@@ -135,8 +135,7 @@ void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto,
 		iph->daddr = c->ip4.addr_seen;
 		memcpy(&iph->saddr, &src->s6_addr[12], 4);
 
-		iph->check = 0;
-		iph->check = csum_unaligned(iph, (size_t)iph->ihl * 4, 0);
+		csum_ip4_header(iph);
 
 		memcpy(data, in, len);
 
-- 
@@ -135,8 +135,7 @@ void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto,
 		iph->daddr = c->ip4.addr_seen;
 		memcpy(&iph->saddr, &src->s6_addr[12], 4);
 
-		iph->check = 0;
-		iph->check = csum_unaligned(iph, (size_t)iph->ihl * 4, 0);
+		csum_ip4_header(iph);
 
 		memcpy(data, in, len);
 
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 06/14] Add helpers for normal inbound packet destination addresses
  2022-10-17  8:57 [PATCH 00/14] Clean up checksum and header generation for inbound packets David Gibson
                   ` (4 preceding siblings ...)
  2022-10-17  8:57 ` [PATCH 05/14] Add csum_ip4_header() helper to calculate IPv4 header checksums David Gibson
@ 2022-10-17  8:57 ` David Gibson
  2022-10-18  3:04   ` Stefano Brivio
  2022-10-17  8:58 ` [PATCH 07/14] Remove support for TCP packets from tap_ip_send() David Gibson
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: David Gibson @ 2022-10-17  8:57 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, David Gibson

tap_ip_send() doesn't take a destination address, because it's specifically
for inbound packets, and the IP addresses of the guest/namespace are
already known to us.  Rather than open-coding this destination address
logic, make helper functions for it which will enable some later cleanups.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 tap.c | 29 ++++++++++++++++++++++++-----
 tap.h |  3 +++
 2 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/tap.c b/tap.c
index de02c56..41e8ff2 100644
--- a/tap.c
+++ b/tap.c
@@ -96,6 +96,28 @@ int tap_send(const struct ctx *c, const void *data, size_t len, int vnet_pre)
 	return write(c->fd_tap, (char *)data + (vnet_pre ? 4 : 0), len);
 }
 
+/**
+ * tap_ip4_daddr() - Normal IPv4 destination address for inbound packets
+ * @c:		Execution context
+ */
+in_addr_t tap_ip4_daddr(const struct ctx *c)
+{
+	return c->ip4.addr_seen;
+}
+
+/**
+ * tap_ip6_daddr() - Normal IPv4 destination address for inbound packets
+ * @c:		Execution context
+ * @src:	Source address
+ */
+const struct in6_addr *tap_ip6_daddr(const struct ctx *c,
+				     const struct in6_addr *src)
+{
+	if (IN6_IS_ADDR_LINKLOCAL(src))
+		return &c->ip6.addr_ll_seen;
+	return &c->ip6.addr_seen;
+}
+
 /**
  * tap_ip_send() - Send IP packet, with L2 headers, calculating L3/L4 checksums
  * @c:		Execution context
@@ -132,7 +154,7 @@ void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto,
 		iph->frag_off = 0;
 		iph->ttl = 255;
 		iph->protocol = proto;
-		iph->daddr = c->ip4.addr_seen;
+		iph->daddr = tap_ip4_daddr(c);
 		memcpy(&iph->saddr, &src->s6_addr[12], 4);
 
 		csum_ip4_header(iph);
@@ -163,10 +185,7 @@ void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto,
 		ip6h->priority = 0;
 
 		ip6h->saddr = *src;
-		if (IN6_IS_ADDR_LINKLOCAL(src))
-			ip6h->daddr = c->ip6.addr_ll_seen;
-		else
-			ip6h->daddr = c->ip6.addr_seen;
+		ip6h->daddr = *tap_ip6_daddr(c, src);
 
 		memcpy(data, in, len);
 
diff --git a/tap.h b/tap.h
index df3aec0..a6764b4 100644
--- a/tap.h
+++ b/tap.h
@@ -6,6 +6,9 @@
 #ifndef TAP_H
 #define TAP_H
 
+in_addr_t tap_ip4_daddr(const struct ctx *c);
+const struct in6_addr *tap_ip6_daddr(const struct ctx *c,
+				     const struct in6_addr *src);
 void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto,
 		 const char *in, size_t len, uint32_t flow);
 int tap_send(const struct ctx *c, const void *data, size_t len, int vnet_pre);
-- 
@@ -6,6 +6,9 @@
 #ifndef TAP_H
 #define TAP_H
 
+in_addr_t tap_ip4_daddr(const struct ctx *c);
+const struct in6_addr *tap_ip6_daddr(const struct ctx *c,
+				     const struct in6_addr *src);
 void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto,
 		 const char *in, size_t len, uint32_t flow);
 int tap_send(const struct ctx *c, const void *data, size_t len, int vnet_pre);
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 07/14] Remove support for TCP packets from tap_ip_send()
  2022-10-17  8:57 [PATCH 00/14] Clean up checksum and header generation for inbound packets David Gibson
                   ` (5 preceding siblings ...)
  2022-10-17  8:57 ` [PATCH 06/14] Add helpers for normal inbound packet destination addresses David Gibson
@ 2022-10-17  8:58 ` David Gibson
  2022-10-17  8:58 ` [PATCH 08/14] tap: Remove unhelpeful vnet_pre optimization from tap_send() David Gibson
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 36+ messages in thread
From: David Gibson @ 2022-10-17  8:58 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, David Gibson

tap_ip_send() is never used for TCP packets, we're unlikely to use it for
that in future, and the handling of TCP packets makes other cleanups
unnecessarily awkward.  Remove it.

This is the only user of csum_tcp4(), so we can remove that as well.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 checksum.c | 34 ----------------------------------
 checksum.h |  1 -
 tap.c      | 11 ++---------
 3 files changed, 2 insertions(+), 44 deletions(-)

diff --git a/checksum.c b/checksum.c
index f25a96a..887cfe3 100644
--- a/checksum.c
+++ b/checksum.c
@@ -209,40 +209,6 @@ void csum_icmp6(struct icmp6hdr *icmp6hr,
 	icmp6hr->icmp6_cksum = csum_unaligned(payload, len, psum);
 }
 
-/**
- * csum_tcp4() - Calculate TCP checksum for IPv4 and set in place
- * @iph:	Packet buffer, IP header
- */
-void csum_tcp4(struct iphdr *iph)
-{
-	uint16_t tlen = ntohs(iph->tot_len) - iph->ihl * 4, *p;
-	struct tcphdr *th;
-	uint32_t sum = 0;
-
-	th = (struct tcphdr *)((char *)iph + (intptr_t)(iph->ihl * 4));
-	p = (uint16_t *)th;
-
-	sum += (iph->saddr >> 16) & 0xffff;
-	sum += iph->saddr & 0xffff;
-	sum += (iph->daddr >> 16) & 0xffff;
-	sum += iph->daddr & 0xffff;
-
-	sum += htons(IPPROTO_TCP);
-	sum += htons(tlen);
-
-	th->check = 0;
-	while (tlen > 1) {
-		sum += *p++;
-		tlen -= 2;
-	}
-
-	if (tlen > 0) {
-		sum += *p & htons(0xff00);
-	}
-
-	th->check = (uint16_t)~csum_fold(sum);
-}
-
 #ifdef __AVX2__
 #include <immintrin.h>
 
diff --git a/checksum.h b/checksum.h
index bdb2ed2..eb3640e 100644
--- a/checksum.h
+++ b/checksum.h
@@ -27,7 +27,6 @@ void csum_icmp6(struct icmp6hdr *ih,
 		const struct in6_addr *daddr,
 		const void *payload,
 		size_t len);
-void csum_tcp4(struct iphdr *iph);
 uint16_t csum(const void *buf, size_t len, uint32_t init);
 
 #endif /* CHECKSUM_H */
diff --git a/tap.c b/tap.c
index 41e8ff2..3ad5d7c 100644
--- a/tap.c
+++ b/tap.c
@@ -161,9 +161,7 @@ void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto,
 
 		memcpy(data, in, len);
 
-		if (iph->protocol == IPPROTO_TCP) {
-			csum_tcp4(iph);
-		} else if (iph->protocol == IPPROTO_UDP) {
+		if (iph->protocol == IPPROTO_UDP) {
 			struct udphdr *uh = (struct udphdr *)(iph + 1);
 
 			csum_udp4(uh, iph->saddr, iph->daddr, uh + 1, len - sizeof(*uh));
@@ -192,13 +190,8 @@ void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto,
 		ip6h->hop_limit = proto;
 		ip6h->version = 0;
 		ip6h->nexthdr = 0;
-		if (proto == IPPROTO_TCP) {
-			struct tcphdr *th = (struct tcphdr *)(ip6h + 1);
 
-			th->check = 0;
-			th->check = csum_unaligned(ip6h, len + sizeof(*ip6h),
-						   0);
-		} else if (proto == IPPROTO_UDP) {
+		if (proto == IPPROTO_UDP) {
 			struct udphdr *uh = (struct udphdr *)(ip6h + 1);
 
 			csum_udp6(uh, &ip6h->saddr, &ip6h->daddr,
-- 
@@ -161,9 +161,7 @@ void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto,
 
 		memcpy(data, in, len);
 
-		if (iph->protocol == IPPROTO_TCP) {
-			csum_tcp4(iph);
-		} else if (iph->protocol == IPPROTO_UDP) {
+		if (iph->protocol == IPPROTO_UDP) {
 			struct udphdr *uh = (struct udphdr *)(iph + 1);
 
 			csum_udp4(uh, iph->saddr, iph->daddr, uh + 1, len - sizeof(*uh));
@@ -192,13 +190,8 @@ void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto,
 		ip6h->hop_limit = proto;
 		ip6h->version = 0;
 		ip6h->nexthdr = 0;
-		if (proto == IPPROTO_TCP) {
-			struct tcphdr *th = (struct tcphdr *)(ip6h + 1);
 
-			th->check = 0;
-			th->check = csum_unaligned(ip6h, len + sizeof(*ip6h),
-						   0);
-		} else if (proto == IPPROTO_UDP) {
+		if (proto == IPPROTO_UDP) {
 			struct udphdr *uh = (struct udphdr *)(ip6h + 1);
 
 			csum_udp6(uh, &ip6h->saddr, &ip6h->daddr,
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 08/14] tap: Remove unhelpeful vnet_pre optimization from tap_send()
  2022-10-17  8:57 [PATCH 00/14] Clean up checksum and header generation for inbound packets David Gibson
                   ` (6 preceding siblings ...)
  2022-10-17  8:58 ` [PATCH 07/14] Remove support for TCP packets from tap_ip_send() David Gibson
@ 2022-10-17  8:58 ` David Gibson
  2022-10-18  3:05   ` Stefano Brivio
  2022-10-17  8:58 ` [PATCH 09/14] Split tap_ip_send() into IPv4 and IPv6 specific functions David Gibson
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: David Gibson @ 2022-10-17  8:58 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, David Gibson

Callers of tap_send() can optionally use a small optimization by adding
extra space for the 4 byte length header used on the qemu socket interface.
tap_ip_send() is currently the only user of this, but this is used only
for "slow path" ICMP and DHCP packets, so there's not a lot of value to
the optimization.

Worse, having the two paths here complicates the interface and makes future
cleanups difficult, so just remove it.  I have some plans to bring back the
optimization in a more general way in future, but for now it's just in the
way.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 arp.c  |  2 +-
 dhcp.c |  2 +-
 ndp.c  |  2 +-
 tap.c  | 29 +++++++++--------------------
 tap.h  |  2 +-
 5 files changed, 13 insertions(+), 24 deletions(-)

diff --git a/arp.c b/arp.c
index 0ad97af..141d43f 100644
--- a/arp.c
+++ b/arp.c
@@ -81,7 +81,7 @@ int arp(const struct ctx *c, const struct pool *p)
 	memcpy(eh->h_dest,	eh->h_source,	sizeof(eh->h_dest));
 	memcpy(eh->h_source,	c->mac,		sizeof(eh->h_source));
 
-	if (tap_send(c, eh, len, 0) < 0)
+	if (tap_send(c, eh, len) < 0)
 		perror("ARP: send");
 
 	return 1;
diff --git a/dhcp.c b/dhcp.c
index 875e18b..2b3af82 100644
--- a/dhcp.c
+++ b/dhcp.c
@@ -377,7 +377,7 @@ int dhcp(const struct ctx *c, const struct pool *p)
 	memcpy(eh->h_dest, eh->h_source, ETH_ALEN);
 	memcpy(eh->h_source, c->mac, ETH_ALEN);
 
-	if (tap_send(c, eh, len, 0) < 0)
+	if (tap_send(c, eh, len) < 0)
 		perror("DHCP: send");
 
 	return 1;
diff --git a/ndp.c b/ndp.c
index 03f1d06..79be0cf 100644
--- a/ndp.c
+++ b/ndp.c
@@ -200,7 +200,7 @@ dns_done:
 	memcpy(ehr->h_source, c->mac, ETH_ALEN);
 	ehr->h_proto = htons(ETH_P_IPV6);
 
-	if (tap_send(c, ehr, len, 0) < 0)
+	if (tap_send(c, ehr, len) < 0)
 		perror("NDP: send");
 
 	return 1;
diff --git a/tap.c b/tap.c
index 3ad5d7c..ae75fac 100644
--- a/tap.c
+++ b/tap.c
@@ -66,34 +66,24 @@ static PACKET_POOL_NOINIT(pool_tap6, TAP_MSGS, pkt_buf);
  * @c:		Execution context
  * @data:	Packet buffer
  * @len:	Total L2 packet length
- * @vnet_pre:	Buffer has four-byte headroom
  *
  * Return: return code from send() or write()
  */
-int tap_send(const struct ctx *c, const void *data, size_t len, int vnet_pre)
+int tap_send(const struct ctx *c, const void *data, size_t len)
 {
-	if (vnet_pre)
-		pcap((char *)data + 4, len);
-	else
-		pcap(data, len);
+	pcap(data, len);
 
 	if (c->mode == MODE_PASST) {
 		int flags = MSG_NOSIGNAL | MSG_DONTWAIT;
+		uint32_t vnet_len = htonl(len);
 
-		if (vnet_pre) {
-			*((uint32_t *)data) = htonl(len);
-			len += 4;
-		} else {
-			uint32_t vnet_len = htonl(len);
-
-			if (send(c->fd_tap, &vnet_len, 4, flags) < 0)
-				return -1;
-		}
+		if (send(c->fd_tap, &vnet_len, 4, flags) < 0)
+			return -1;
 
 		return send(c->fd_tap, data, len, flags);
 	}
 
-	return write(c->fd_tap, (char *)data + (vnet_pre ? 4 : 0), len);
+	return write(c->fd_tap, (char *)data, len);
 }
 
 /**
@@ -131,10 +121,9 @@ void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto,
 		 const char *in, size_t len, uint32_t flow)
 {
 	char buf[USHRT_MAX];
-	char *pkt = buf + 4;
 	struct ethhdr *eh;
 
-	eh = (struct ethhdr *)pkt;
+	eh = (struct ethhdr *)buf;
 
 	/* TODO: ARP table lookup */
 	memcpy(eh->h_dest, c->mac_guest, ETH_ALEN);
@@ -170,7 +159,7 @@ void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto,
 			csum_icmp4(ih, ih + 1, len - sizeof(*ih));
 		}
 
-		if (tap_send(c, buf, len + sizeof(*iph) + sizeof(*eh), 1) < 0)
+		if (tap_send(c, buf, len + sizeof(*iph) + sizeof(*eh)) < 0)
 			debug("tap: failed to send %lu bytes (IPv4)", len);
 	} else {
 		struct ipv6hdr *ip6h = (struct ipv6hdr *)(eh + 1);
@@ -211,7 +200,7 @@ void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto,
 			ip6h->flow_lbl[2] = (flow >> 0) & 0xff;
 		}
 
-		if (tap_send(c, buf, len + sizeof(*ip6h) + sizeof(*eh), 1) < 1)
+		if (tap_send(c, buf, len + sizeof(*ip6h) + sizeof(*eh)) < 1)
 			debug("tap: failed to send %lu bytes (IPv6)", len);
 	}
 }
diff --git a/tap.h b/tap.h
index a6764b4..a8da8bb 100644
--- a/tap.h
+++ b/tap.h
@@ -11,7 +11,7 @@ const struct in6_addr *tap_ip6_daddr(const struct ctx *c,
 				     const struct in6_addr *src);
 void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto,
 		 const char *in, size_t len, uint32_t flow);
-int tap_send(const struct ctx *c, const void *data, size_t len, int vnet_pre);
+int tap_send(const struct ctx *c, const void *data, size_t len);
 void tap_handler(struct ctx *c, int fd, uint32_t events,
 		 const struct timespec *now);
 void tap_sock_init(struct ctx *c);
-- 
@@ -11,7 +11,7 @@ const struct in6_addr *tap_ip6_daddr(const struct ctx *c,
 				     const struct in6_addr *src);
 void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto,
 		 const char *in, size_t len, uint32_t flow);
-int tap_send(const struct ctx *c, const void *data, size_t len, int vnet_pre);
+int tap_send(const struct ctx *c, const void *data, size_t len);
 void tap_handler(struct ctx *c, int fd, uint32_t events,
 		 const struct timespec *now);
 void tap_sock_init(struct ctx *c);
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 09/14] Split tap_ip_send() into IPv4 and IPv6 specific functions
  2022-10-17  8:57 [PATCH 00/14] Clean up checksum and header generation for inbound packets David Gibson
                   ` (7 preceding siblings ...)
  2022-10-17  8:58 ` [PATCH 08/14] tap: Remove unhelpeful vnet_pre optimization from tap_send() David Gibson
@ 2022-10-17  8:58 ` David Gibson
  2022-10-18  3:06   ` Stefano Brivio
  2022-10-17  8:58 ` [PATCH 10/14] tap: Split tap_ip6_send() into UDP and ICMP variants David Gibson
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: David Gibson @ 2022-10-17  8:58 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, David Gibson

The IPv4 and IPv6 paths in tap_ip_send() have very little in common, and
it turns out that every caller (statically) knows if it is using IPv4 or
IPv6.  So split into separate tap_ip4_send() and tap_ip6_send() functions.
Use a new tap_l2_hdr() function for the very small common part.

While we're there, make some minor cleanups:
  - We were double writing some fields in the IPv6 header, so that it
    temporary matched the pseudo-header for checksum calculation.  With
    recent checksum reworks, this isn't neccessary any more.
  - We don't use any IPv4 header options, so use some sizeof() constructs
    instead of some open coded values for header length.
  - The comment used to say that the flow label was for TCP over IPv6, but
    in fact the only thing we used it for was ICMPv6

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 dhcpv6.c |   6 +-
 icmp.c   |  10 +---
 tap.c    | 176 +++++++++++++++++++++++++++++--------------------------
 tap.h    |   6 +-
 4 files changed, 102 insertions(+), 96 deletions(-)

diff --git a/dhcpv6.c b/dhcpv6.c
index e7640ce..7829968 100644
--- a/dhcpv6.c
+++ b/dhcpv6.c
@@ -531,8 +531,8 @@ int dhcpv6(struct ctx *c, const struct pool *p,
 
 			resp_not_on_link.hdr.xid = mh->xid;
 
-			tap_ip_send(c, src, IPPROTO_UDP,
-				    (char *)&resp_not_on_link, n, mh->xid);
+			tap_ip6_send(c, src, IPPROTO_UDP,
+				     (char *)&resp_not_on_link, n, mh->xid);
 
 			return 1;
 		}
@@ -580,7 +580,7 @@ int dhcpv6(struct ctx *c, const struct pool *p,
 
 	resp.hdr.xid = mh->xid;
 
-	tap_ip_send(c, src, IPPROTO_UDP, (char *)&resp, n, mh->xid);
+	tap_ip6_send(c, src, IPPROTO_UDP, (char *)&resp, n, mh->xid);
 	c->ip6.addr_seen = c->ip6.addr;
 
 	return 1;
diff --git a/icmp.c b/icmp.c
index 21ea2d7..61c2d90 100644
--- a/icmp.c
+++ b/icmp.c
@@ -69,10 +69,6 @@ static uint8_t icmp_act[IP_VERSIONS][DIV_ROUND_UP(ICMP_NUM_IDS, 8)];
 void icmp_sock_handler(const struct ctx *c, union epoll_ref ref,
 		       uint32_t events, const struct timespec *now)
 {
-	struct in6_addr a6 = { .s6_addr = {    0,    0,    0,    0,
-					       0,    0,    0,    0,
-					       0,    0, 0xff, 0xff,
-					       0,    0,    0,    0 } };
 	union icmp_epoll_ref *iref = &ref.r.p.icmp;
 	struct sockaddr_storage sr;
 	socklen_t sl = sizeof(sr);
@@ -109,7 +105,7 @@ void icmp_sock_handler(const struct ctx *c, union epoll_ref ref,
 			icmp_id_map[V6][id].seq = seq;
 		}
 
-		tap_ip_send(c, &sr6->sin6_addr, IPPROTO_ICMPV6, buf, n, 0);
+		tap_ip6_send(c, &sr6->sin6_addr, IPPROTO_ICMPV6, buf, n, 0);
 	} else {
 		struct sockaddr_in *sr4 = (struct sockaddr_in *)&sr;
 		struct icmphdr *ih = (struct icmphdr *)buf;
@@ -127,9 +123,7 @@ void icmp_sock_handler(const struct ctx *c, union epoll_ref ref,
 			icmp_id_map[V4][id].seq = seq;
 		}
 
-		memcpy(&a6.s6_addr[12], &sr4->sin_addr, sizeof(sr4->sin_addr));
-
-		tap_ip_send(c, &a6, IPPROTO_ICMP, buf, n, 0);
+		tap_ip4_send(c, sr4->sin_addr.s_addr, IPPROTO_ICMP, buf, n);
 	}
 }
 
diff --git a/tap.c b/tap.c
index ae75fac..45547ac 100644
--- a/tap.c
+++ b/tap.c
@@ -109,100 +109,110 @@ const struct in6_addr *tap_ip6_daddr(const struct ctx *c,
 }
 
 /**
- * tap_ip_send() - Send IP packet, with L2 headers, calculating L3/L4 checksums
+ * tap_l2_hdr() - Build an L2 header for an inbound packet
  * @c:		Execution context
- * @src:	IPv6 source address, IPv4-mapped for IPv4 sources
- * @proto:	L4 protocol number
- * @in:		Payload
- * @len:	L4 payload length
- * @flow:	Flow label for TCP over IPv6
+ * @buf:	Buffer address at which to generate header
+ * @proto:	Ethernet protocol number for L3
+ *
+ * Returns a pointer at which to write the packet's payload
  */
-void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto,
-		 const char *in, size_t len, uint32_t flow)
+static void *tap_l2_hdr(const struct ctx *c, void *buf, uint16_t proto)
 {
-	char buf[USHRT_MAX];
-	struct ethhdr *eh;
-
-	eh = (struct ethhdr *)buf;
+	struct ethhdr *eh = (struct ethhdr *)buf;
 
 	/* TODO: ARP table lookup */
 	memcpy(eh->h_dest, c->mac_guest, ETH_ALEN);
 	memcpy(eh->h_source, c->mac, ETH_ALEN);
+	eh->h_proto = ntohs(proto);
+	return eh + 1;
+}
 
-	if (IN6_IS_ADDR_V4MAPPED(src)) {
-		struct iphdr *iph = (struct iphdr *)(eh + 1);
-		char *data = (char *)(iph + 1);
-
-		eh->h_proto = ntohs(ETH_P_IP);
-
-		iph->version = 4;
-		iph->ihl = 5;
-		iph->tos = 0;
-		iph->tot_len = htons(len + 20);
-		iph->id = 0;
-		iph->frag_off = 0;
-		iph->ttl = 255;
-		iph->protocol = proto;
-		iph->daddr = tap_ip4_daddr(c);
-		memcpy(&iph->saddr, &src->s6_addr[12], 4);
-
-		csum_ip4_header(iph);
-
-		memcpy(data, in, len);
-
-		if (iph->protocol == IPPROTO_UDP) {
-			struct udphdr *uh = (struct udphdr *)(iph + 1);
-
-			csum_udp4(uh, iph->saddr, iph->daddr, uh + 1, len - sizeof(*uh));
-		} else if (iph->protocol == IPPROTO_ICMP) {
-			struct icmphdr *ih = (struct icmphdr *)(iph + 1);
-			csum_icmp4(ih, ih + 1, len - sizeof(*ih));
-		}
-
-		if (tap_send(c, buf, len + sizeof(*iph) + sizeof(*eh)) < 0)
-			debug("tap: failed to send %lu bytes (IPv4)", len);
-	} else {
-		struct ipv6hdr *ip6h = (struct ipv6hdr *)(eh + 1);
-		char *data = (char *)(ip6h + 1);
-
-		eh->h_proto = ntohs(ETH_P_IPV6);
-
-		memset(ip6h->flow_lbl, 0, 3);
-		ip6h->payload_len = htons(len);
-		ip6h->priority = 0;
-
-		ip6h->saddr = *src;
-		ip6h->daddr = *tap_ip6_daddr(c, src);
-
-		memcpy(data, in, len);
-
-		ip6h->hop_limit = proto;
-		ip6h->version = 0;
-		ip6h->nexthdr = 0;
-
-		if (proto == IPPROTO_UDP) {
-			struct udphdr *uh = (struct udphdr *)(ip6h + 1);
-
-			csum_udp6(uh, &ip6h->saddr, &ip6h->daddr,
-				  uh + 1, len - sizeof(*uh));
-		} else if (proto == IPPROTO_ICMPV6) {
-			struct icmp6hdr *ih = (struct icmp6hdr *)(ip6h + 1);
+/**
+ * tap_ip4_send() - Send IPv4 packet, with L2 headers, calculating L3/L4 checksums
+ * @c:		Execution context
+ * @src:	IPv4 source address
+ * @proto:	L4 protocol number
+ * @in:		Payload
+ * @len:	L4 payload length
+ */
+void tap_ip4_send(const struct ctx *c, in_addr_t src, uint8_t proto,
+		  const char *in, size_t len)
+{
+	char buf[USHRT_MAX];
+	struct iphdr *ip4h = (struct iphdr *)tap_l2_hdr(c, buf, ETH_P_IP);
+	char *data = (char *)(ip4h + 1);
+
+	ip4h->version = 4;
+	ip4h->ihl = sizeof(struct iphdr) / 4;
+	ip4h->tos = 0;
+	ip4h->tot_len = htons(len + sizeof(*ip4h));
+	ip4h->id = 0;
+	ip4h->frag_off = 0;
+	ip4h->ttl = 255;
+	ip4h->protocol = proto;
+	ip4h->saddr = src;
+	ip4h->daddr = tap_ip4_daddr(c);
+	csum_ip4_header(ip4h);
+
+	memcpy(data, in, len);
+
+	if (ip4h->protocol == IPPROTO_UDP) {
+		struct udphdr *uh = (struct udphdr *)(ip4h + 1);
+
+		csum_udp4(uh, ip4h->saddr, ip4h->daddr,
+			  uh + 1, len - sizeof(*uh));
+	} else if (ip4h->protocol == IPPROTO_ICMP) {
+		struct icmphdr *ih = (struct icmphdr *)(ip4h + 1);
+		csum_icmp4(ih, ih + 1, len - sizeof(*ih));
+	}
 
-			csum_icmp6(ih, &ip6h->saddr, &ip6h->daddr,
-				   ih + 1, len - sizeof(*ih));
-		}
-		ip6h->version = 6;
-		ip6h->nexthdr = proto;
-		ip6h->hop_limit = 255;
-		if (flow) {
-			ip6h->flow_lbl[0] = (flow >> 16) & 0xf;
-			ip6h->flow_lbl[1] = (flow >> 8) & 0xff;
-			ip6h->flow_lbl[2] = (flow >> 0) & 0xff;
-		}
+	if (tap_send(c, buf, len + (data - buf)) < 0)
+		debug("tap: failed to send %lu bytes (IPv4)", len);
+}
 
-		if (tap_send(c, buf, len + sizeof(*ip6h) + sizeof(*eh)) < 1)
-			debug("tap: failed to send %lu bytes (IPv6)", len);
+/**
+ * tap_ip6_send() - Send IPv6 packet, with L2 headers, calculating L3/L4 checksums
+ * @c:		Execution context
+ * @src:	IPv6 source address
+ * @proto:	L4 protocol number
+ * @in:		Payload
+ * @len:	L4 payload length
+ * @flow:	Flow label
+ */
+void tap_ip6_send(const struct ctx *c, const struct in6_addr *src,
+		  uint8_t proto, const char *in, size_t len, uint32_t flow)
+{
+	char buf[USHRT_MAX];
+	struct ipv6hdr *ip6h = (struct ipv6hdr *)tap_l2_hdr(c, buf, ETH_P_IPV6);
+	char *data = (char *)(ip6h + 1);
+
+	ip6h->payload_len = htons(len);
+	ip6h->priority = 0;
+	ip6h->version = 6;
+	ip6h->nexthdr = proto;
+	ip6h->hop_limit = 255;
+	ip6h->saddr = *src;
+	ip6h->daddr = *tap_ip6_daddr(c, src);
+	ip6h->flow_lbl[0] = (flow >> 16) & 0xf;
+	ip6h->flow_lbl[1] = (flow >> 8) & 0xff;
+	ip6h->flow_lbl[2] = (flow >> 0) & 0xff;
+
+	memcpy(data, in, len);
+
+	if (proto == IPPROTO_UDP) {
+		struct udphdr *uh = (struct udphdr *)(ip6h + 1);
+
+		csum_udp6(uh, &ip6h->saddr, &ip6h->daddr,
+			  uh + 1, len - sizeof(*uh));
+	} else if (proto == IPPROTO_ICMPV6) {
+		struct icmp6hdr *ih = (struct icmp6hdr *)(ip6h + 1);
+
+		csum_icmp6(ih, &ip6h->saddr, &ip6h->daddr,
+			   ih + 1, len - sizeof(*ih));
 	}
+
+	if (tap_send(c, buf, len + (data - buf)) < 1)
+		debug("tap: failed to send %lu bytes (IPv6)", len);
 }
 
 PACKET_POOL_DECL(pool_l4, UIO_MAXIOV, pkt_buf);
diff --git a/tap.h b/tap.h
index a8da8bb..011ba8e 100644
--- a/tap.h
+++ b/tap.h
@@ -9,8 +9,10 @@
 in_addr_t tap_ip4_daddr(const struct ctx *c);
 const struct in6_addr *tap_ip6_daddr(const struct ctx *c,
 				     const struct in6_addr *src);
-void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto,
-		 const char *in, size_t len, uint32_t flow);
+void tap_ip4_send(const struct ctx *c, in_addr_t src, uint8_t proto,
+		  const char *in, size_t len);
+void tap_ip6_send(const struct ctx *c, const struct in6_addr *src,
+		  uint8_t proto, const char *in, size_t len, uint32_t flow);
 int tap_send(const struct ctx *c, const void *data, size_t len);
 void tap_handler(struct ctx *c, int fd, uint32_t events,
 		 const struct timespec *now);
-- 
@@ -9,8 +9,10 @@
 in_addr_t tap_ip4_daddr(const struct ctx *c);
 const struct in6_addr *tap_ip6_daddr(const struct ctx *c,
 				     const struct in6_addr *src);
-void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto,
-		 const char *in, size_t len, uint32_t flow);
+void tap_ip4_send(const struct ctx *c, in_addr_t src, uint8_t proto,
+		  const char *in, size_t len);
+void tap_ip6_send(const struct ctx *c, const struct in6_addr *src,
+		  uint8_t proto, const char *in, size_t len, uint32_t flow);
 int tap_send(const struct ctx *c, const void *data, size_t len);
 void tap_handler(struct ctx *c, int fd, uint32_t events,
 		 const struct timespec *now);
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 10/14] tap: Split tap_ip6_send() into UDP and ICMP variants
  2022-10-17  8:57 [PATCH 00/14] Clean up checksum and header generation for inbound packets David Gibson
                   ` (8 preceding siblings ...)
  2022-10-17  8:58 ` [PATCH 09/14] Split tap_ip_send() into IPv4 and IPv6 specific functions David Gibson
@ 2022-10-17  8:58 ` David Gibson
  2022-10-17  8:58 ` [PATCH 11/14] ndp: Remove unneeded eh_source parameter David Gibson
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 36+ messages in thread
From: David Gibson @ 2022-10-17  8:58 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, David Gibson

tap_ip6_send() has special case logic to compute the checksums for UDP
and ICMP packets, which is a mild layering violation.  By using a suitable
helper we can split it into tap_udp6_send() and tap_icmp6_send() functions
without greatly increasing the code size, this removing that layering
violation.

We make some small changes to the interface while there.  In both cases
we make the destination IPv6 address a parameter, which will be useful
later.  For the UDP variant we make it take just the UDP payload, and it
will generate the UDP header.  For the ICMP variant we pass in the ICMP
header as before.  The inconsistency is because that's what seems to be
the more natural way to invoke the function in the callers in each case.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 dhcpv6.c | 21 +++------------
 icmp.c   |  3 ++-
 tap.c    | 79 +++++++++++++++++++++++++++++++++++++++-----------------
 tap.h    |  9 +++++--
 4 files changed, 68 insertions(+), 44 deletions(-)

diff --git a/dhcpv6.c b/dhcpv6.c
index 7829968..e763aed 100644
--- a/dhcpv6.c
+++ b/dhcpv6.c
@@ -208,15 +208,8 @@ struct msg_hdr {
 	uint32_t xid:24;
 } __attribute__((__packed__));
 
-#if __BYTE_ORDER == __BIG_ENDIAN
-#define UH_RESP {{{ 547, 546, 0, 0, }}}
-#else
-#define UH_RESP {{{ __bswap_constant_16(547), __bswap_constant_16(546), 0, 0 }}}
-#endif
-
 /**
  * struct resp_t - Normal advertise and reply message
- * @uh:			UDP header
  * @hdr:		DHCP message header
  * @server_id:		Server Identifier option
  * @ia_na:		Non-temporary Address option
@@ -226,7 +219,6 @@ struct msg_hdr {
  * @dns_search:		Domain Search List, here just for storage size
  */
 static struct resp_t {
-	struct udphdr  uh;
 	struct msg_hdr hdr;
 
 	struct opt_server_id server_id;
@@ -236,7 +228,6 @@ static struct resp_t {
 	struct opt_dns_servers dns_servers;
 	struct opt_dns_search dns_search;
 } __attribute__((__packed__)) resp = {
-	UH_RESP,
 	{ 0 },
 	SERVER_ID,
 
@@ -270,13 +261,11 @@ static const struct opt_status_code sc_not_on_link = {
 
 /**
  * struct resp_not_on_link_t - NotOnLink error (mandated by RFC 8415, 18.3.2.)
- * @uh:		UDP header
  * @hdr:	DHCP message header
  * @server_id:	Server Identifier option
  * @var:	Payload: IA_NA from client, status code, client ID
  */
 static struct resp_not_on_link_t {
-	struct udphdr  uh;
 	struct msg_hdr hdr;
 
 	struct opt_server_id server_id;
@@ -284,7 +273,6 @@ static struct resp_not_on_link_t {
 	uint8_t var[sizeof(struct opt_ia_na) + sizeof(struct opt_status_code) +
 		    sizeof(struct opt_client_id)];
 } __attribute__((__packed__)) resp_not_on_link = {
-	UH_RESP,
 	{ TYPE_REPLY, 0 },
 	SERVER_ID,
 	{ 0, },
@@ -527,12 +515,11 @@ int dhcpv6(struct ctx *c, const struct pool *p,
 			n += sizeof(struct opt_hdr) + ntohs(client_id->l);
 
 			n = offsetof(struct resp_not_on_link_t, var) + n;
-			resp_not_on_link.uh.len = htons(n);
 
 			resp_not_on_link.hdr.xid = mh->xid;
 
-			tap_ip6_send(c, src, IPPROTO_UDP,
-				     (char *)&resp_not_on_link, n, mh->xid);
+			tap_udp6_send(c, src, 547, tap_ip6_daddr(c, src), 546,
+				      mh->xid, &resp_not_on_link, n);
 
 			return 1;
 		}
@@ -576,11 +563,11 @@ int dhcpv6(struct ctx *c, const struct pool *p,
 	n = offsetof(struct resp_t, client_id) +
 	    sizeof(struct opt_hdr) + ntohs(client_id->l);
 	n = dhcpv6_dns_fill(c, (char *)&resp, n);
-	resp.uh.len = htons(n);
 
 	resp.hdr.xid = mh->xid;
 
-	tap_ip6_send(c, src, IPPROTO_UDP, (char *)&resp, n, mh->xid);
+	tap_udp6_send(c, src, 547, tap_ip6_daddr(c, src), 546,
+		      mh->xid, &resp, n);
 	c->ip6.addr_seen = c->ip6.addr;
 
 	return 1;
diff --git a/icmp.c b/icmp.c
index 61c2d90..6493ea9 100644
--- a/icmp.c
+++ b/icmp.c
@@ -105,7 +105,8 @@ void icmp_sock_handler(const struct ctx *c, union epoll_ref ref,
 			icmp_id_map[V6][id].seq = seq;
 		}
 
-		tap_ip6_send(c, &sr6->sin6_addr, IPPROTO_ICMPV6, buf, n, 0);
+		tap_icmp6_send(c, &sr6->sin6_addr,
+			       tap_ip6_daddr(c, &sr6->sin6_addr), buf, n);
 	} else {
 		struct sockaddr_in *sr4 = (struct sockaddr_in *)&sr;
 		struct icmphdr *ih = (struct icmphdr *)buf;
diff --git a/tap.c b/tap.c
index 45547ac..b0c1481 100644
--- a/tap.c
+++ b/tap.c
@@ -170,21 +170,11 @@ void tap_ip4_send(const struct ctx *c, in_addr_t src, uint8_t proto,
 		debug("tap: failed to send %lu bytes (IPv4)", len);
 }
 
-/**
- * tap_ip6_send() - Send IPv6 packet, with L2 headers, calculating L3/L4 checksums
- * @c:		Execution context
- * @src:	IPv6 source address
- * @proto:	L4 protocol number
- * @in:		Payload
- * @len:	L4 payload length
- * @flow:	Flow label
- */
-void tap_ip6_send(const struct ctx *c, const struct in6_addr *src,
-		  uint8_t proto, const char *in, size_t len, uint32_t flow)
+static void *tap_ip6_hdr(char *buf,
+			 const struct in6_addr *src, const struct in6_addr *dst,
+			 size_t len, uint8_t proto, uint32_t flow)
 {
-	char buf[USHRT_MAX];
-	struct ipv6hdr *ip6h = (struct ipv6hdr *)tap_l2_hdr(c, buf, ETH_P_IPV6);
-	char *data = (char *)(ip6h + 1);
+	struct ipv6hdr *ip6h = (struct ipv6hdr *)buf;
 
 	ip6h->payload_len = htons(len);
 	ip6h->priority = 0;
@@ -192,24 +182,65 @@ void tap_ip6_send(const struct ctx *c, const struct in6_addr *src,
 	ip6h->nexthdr = proto;
 	ip6h->hop_limit = 255;
 	ip6h->saddr = *src;
-	ip6h->daddr = *tap_ip6_daddr(c, src);
+	ip6h->daddr = *dst;
 	ip6h->flow_lbl[0] = (flow >> 16) & 0xf;
 	ip6h->flow_lbl[1] = (flow >> 8) & 0xff;
 	ip6h->flow_lbl[2] = (flow >> 0) & 0xff;
+	return ip6h + 1;
+}
 
+/**
+ * tap_udp6_send() - Send UDP over IPv6 packet
+ * @c:		Execution context
+ * @src:	IPv6 source address
+ * @sport:	UDP source port
+ * @dst:	IPv6 destination address
+ * @dport:	UDP destination port
+ * @flow:	Flow label
+ * @in:		UDP payload contents (not including UDP header)
+ * @len:	UDP payload length (not including UDP header)
+ */
+void tap_udp6_send(const struct ctx *c,
+		   const struct in6_addr *src, in_port_t sport,
+		   const struct in6_addr *dst, in_port_t dport,
+		   uint32_t flow, const void *in, size_t len)
+{
+	size_t udplen = len + sizeof(struct udphdr);
+	char buf[USHRT_MAX];
+	void *ip6h = tap_l2_hdr(c, buf, ETH_P_IPV6);
+	void *uhp = tap_ip6_hdr(ip6h, src, dst, udplen, IPPROTO_UDP, flow);
+	struct udphdr *uh = (struct udphdr *)uhp;
+	char *data = (char *)(uh + 1);
+
+	uh->source = htons(sport);
+	uh->dest = htons(dport);
+	uh->len = htons(udplen);
+	csum_udp6(uh, src, dst, in, len);
 	memcpy(data, in, len);
 
-	if (proto == IPPROTO_UDP) {
-		struct udphdr *uh = (struct udphdr *)(ip6h + 1);
+	if (tap_send(c, buf, len + (data - buf)) < 1)
+		debug("tap: failed to send %lu bytes (IPv6)", len);
+}
 
-		csum_udp6(uh, &ip6h->saddr, &ip6h->daddr,
-			  uh + 1, len - sizeof(*uh));
-	} else if (proto == IPPROTO_ICMPV6) {
-		struct icmp6hdr *ih = (struct icmp6hdr *)(ip6h + 1);
+/**
+ * tap_icmp6_send() - Send ICMPv6 packet
+ * @c:		Execution context
+ * @src:	IPv6 source address
+ * @dst:	IPv6 destination address
+ * @in:		ICMP packet, including ICMP header
+ * @len:	ICMP packet length, including ICMP header
+ */
+void tap_icmp6_send(const struct ctx *c,
+		    const struct in6_addr *src, const struct in6_addr *dst,
+		    void *in, size_t len)
+{
+	char buf[USHRT_MAX];
+	void *ip6h = tap_l2_hdr(c, buf, ETH_P_IPV6);
+	char *data = tap_ip6_hdr(ip6h, src, dst, len, IPPROTO_ICMPV6, 0);
+	struct icmp6hdr *icmp6h = (struct icmp6hdr *)data;
 
-		csum_icmp6(ih, &ip6h->saddr, &ip6h->daddr,
-			   ih + 1, len - sizeof(*ih));
-	}
+	memcpy(data, in, len);
+	csum_icmp6(icmp6h, src, dst, icmp6h + 1, len - sizeof(*icmp6h));
 
 	if (tap_send(c, buf, len + (data - buf)) < 1)
 		debug("tap: failed to send %lu bytes (IPv6)", len);
diff --git a/tap.h b/tap.h
index 011ba8e..d43c7a0 100644
--- a/tap.h
+++ b/tap.h
@@ -11,8 +11,13 @@ const struct in6_addr *tap_ip6_daddr(const struct ctx *c,
 				     const struct in6_addr *src);
 void tap_ip4_send(const struct ctx *c, in_addr_t src, uint8_t proto,
 		  const char *in, size_t len);
-void tap_ip6_send(const struct ctx *c, const struct in6_addr *src,
-		  uint8_t proto, const char *in, size_t len, uint32_t flow);
+void tap_udp6_send(const struct ctx *c,
+		   const struct in6_addr *src, in_port_t sport,
+		   const struct in6_addr *dst, in_port_t dport,
+		   uint32_t flow, const void *in, size_t len);
+void tap_icmp6_send(const struct ctx *c,
+		    const struct in6_addr *src, const struct in6_addr *dst,
+		    void *in, size_t len);
 int tap_send(const struct ctx *c, const void *data, size_t len);
 void tap_handler(struct ctx *c, int fd, uint32_t events,
 		 const struct timespec *now);
-- 
@@ -11,8 +11,13 @@ const struct in6_addr *tap_ip6_daddr(const struct ctx *c,
 				     const struct in6_addr *src);
 void tap_ip4_send(const struct ctx *c, in_addr_t src, uint8_t proto,
 		  const char *in, size_t len);
-void tap_ip6_send(const struct ctx *c, const struct in6_addr *src,
-		  uint8_t proto, const char *in, size_t len, uint32_t flow);
+void tap_udp6_send(const struct ctx *c,
+		   const struct in6_addr *src, in_port_t sport,
+		   const struct in6_addr *dst, in_port_t dport,
+		   uint32_t flow, const void *in, size_t len);
+void tap_icmp6_send(const struct ctx *c,
+		    const struct in6_addr *src, const struct in6_addr *dst,
+		    void *in, size_t len);
 int tap_send(const struct ctx *c, const void *data, size_t len);
 void tap_handler(struct ctx *c, int fd, uint32_t events,
 		 const struct timespec *now);
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 11/14] ndp: Remove unneeded eh_source parameter
  2022-10-17  8:57 [PATCH 00/14] Clean up checksum and header generation for inbound packets David Gibson
                   ` (9 preceding siblings ...)
  2022-10-17  8:58 ` [PATCH 10/14] tap: Split tap_ip6_send() into UDP and ICMP variants David Gibson
@ 2022-10-17  8:58 ` David Gibson
  2022-10-17  8:58 ` [PATCH 12/14] ndp: Use tap_icmp6_send() helper David Gibson
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 36+ messages in thread
From: David Gibson @ 2022-10-17  8:58 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, David Gibson

ndp() takes a parameter giving the ethernet source address of the packet
it is to respond to, which it uses to determine the destination address to
send the reply packet to.

This is not necessary, because the address will always be the guest's
MAC address.  Even if the guest has just changed MAC address, then either
tap_handler_passt() or tap_handler_pasta() - which are the only call paths
leading to ndp() will have updated c->mac_guest with the new value.

So, remove the parameter, and just use c->mac_guest, making it more
consistent with other paths where we construct packets to send inwards.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 ndp.c | 6 ++----
 ndp.h | 3 +--
 tap.c | 2 +-
 3 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/ndp.c b/ndp.c
index 79be0cf..f96b4b7 100644
--- a/ndp.c
+++ b/ndp.c
@@ -41,13 +41,11 @@
  * ndp() - Check for NDP solicitations, reply as needed
  * @c:		Execution context
  * @ih:		ICMPv6 header
- * @eh_source:	Source Ethernet address
  * @saddr	Source IPv6 address
  *
  * Return: 0 if not handled here, 1 if handled, -1 on failure
  */
-int ndp(struct ctx *c, const struct icmp6hdr *ih,
-	const unsigned char *eh_source, const struct in6_addr *saddr)
+int ndp(struct ctx *c, const struct icmp6hdr *ih, const struct in6_addr *saddr)
 {
 	char buf[BUFSIZ] = { 0 };
 	struct ipv6hdr *ip6hr;
@@ -196,7 +194,7 @@ dns_done:
 	ip6hr->hop_limit = 255;
 
 	len += sizeof(*ehr) + sizeof(*ip6hr) + sizeof(*ihr);
-	memcpy(ehr->h_dest, eh_source, ETH_ALEN);
+	memcpy(ehr->h_dest, c->mac_guest, ETH_ALEN);
 	memcpy(ehr->h_source, c->mac, ETH_ALEN);
 	ehr->h_proto = htons(ETH_P_IPV6);
 
diff --git a/ndp.h b/ndp.h
index d857425..b012747 100644
--- a/ndp.h
+++ b/ndp.h
@@ -6,7 +6,6 @@
 #ifndef NDP_H
 #define NDP_H
 
-int ndp(struct ctx *c, const struct icmp6hdr *ih,
-	const unsigned char *eh_source, const struct in6_addr *saddr);
+int ndp(struct ctx *c, const struct icmp6hdr *ih, const struct in6_addr *saddr);
 
 #endif /* NDP_H */
diff --git a/tap.c b/tap.c
index b0c1481..274f4ba 100644
--- a/tap.c
+++ b/tap.c
@@ -560,7 +560,7 @@ resume:
 			if (l4_len < sizeof(struct icmp6hdr))
 				continue;
 
-			if (ndp(c, (struct icmp6hdr *)l4h, eh->h_source, saddr))
+			if (ndp(c, (struct icmp6hdr *)l4h, saddr))
 				continue;
 
 			tap_packet_debug(NULL, ip6h, NULL, proto, NULL, 1);
-- 
@@ -560,7 +560,7 @@ resume:
 			if (l4_len < sizeof(struct icmp6hdr))
 				continue;
 
-			if (ndp(c, (struct icmp6hdr *)l4h, eh->h_source, saddr))
+			if (ndp(c, (struct icmp6hdr *)l4h, saddr))
 				continue;
 
 			tap_packet_debug(NULL, ip6h, NULL, proto, NULL, 1);
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 12/14] ndp: Use tap_icmp6_send() helper
  2022-10-17  8:57 [PATCH 00/14] Clean up checksum and header generation for inbound packets David Gibson
                   ` (10 preceding siblings ...)
  2022-10-17  8:58 ` [PATCH 11/14] ndp: Remove unneeded eh_source parameter David Gibson
@ 2022-10-17  8:58 ` David Gibson
  2022-10-17  8:58 ` [PATCH 13/14] tap: Split tap_ip4_send() into UDP and ICMP variants David Gibson
  2022-10-17  8:58 ` [PATCH 14/14] dhcp: Use tap_udp4_send() helper in dhcp() David Gibson
  13 siblings, 0 replies; 36+ messages in thread
From: David Gibson @ 2022-10-17  8:58 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, David Gibson

We send ICMPv6 packets to the guest from both icmp.c and from ndp.c.  The
case in ndp() manually constructs L2 and IPv6 headers, unlike the version
in icmp.c which uses the tap_icmp6_send() helper from tap.c  Now that we've
broaded the parameters of tap_icmp6_send() we can use it in ndp() as well
saving some duplicated logic.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 ndp.c | 21 ++++-----------------
 1 file changed, 4 insertions(+), 17 deletions(-)

diff --git a/ndp.c b/ndp.c
index f96b4b7..80e1f19 100644
--- a/ndp.c
+++ b/ndp.c
@@ -47,6 +47,7 @@
  */
 int ndp(struct ctx *c, const struct icmp6hdr *ih, const struct in6_addr *saddr)
 {
+	const struct in6_addr *rsaddr; /* src addr for reply */
 	char buf[BUFSIZ] = { 0 };
 	struct ipv6hdr *ip6hr;
 	struct icmp6hdr *ihr;
@@ -180,26 +181,12 @@ dns_done:
 	else
 		c->ip6.addr_seen = *saddr;
 
-	ip6hr->daddr = *saddr;
 	if (IN6_IS_ADDR_LINKLOCAL(&c->ip6.gw))
-		ip6hr->saddr = c->ip6.gw;
+		rsaddr = &c->ip6.gw;
 	else
-		ip6hr->saddr = c->ip6.addr_ll;
+		rsaddr = &c->ip6.addr_ll;
 
-	ip6hr->payload_len = htons(sizeof(*ihr) + len);
-	csum_icmp6(ihr, &ip6hr->saddr, &ip6hr->daddr, ihr + 1, len);
-
-	ip6hr->version = 6;
-	ip6hr->nexthdr = IPPROTO_ICMPV6;
-	ip6hr->hop_limit = 255;
-
-	len += sizeof(*ehr) + sizeof(*ip6hr) + sizeof(*ihr);
-	memcpy(ehr->h_dest, c->mac_guest, ETH_ALEN);
-	memcpy(ehr->h_source, c->mac, ETH_ALEN);
-	ehr->h_proto = htons(ETH_P_IPV6);
-
-	if (tap_send(c, ehr, len) < 0)
-		perror("NDP: send");
+	tap_icmp6_send(c, rsaddr, saddr, ihr, len + sizeof(*ihr));
 
 	return 1;
 }
-- 
@@ -47,6 +47,7 @@
  */
 int ndp(struct ctx *c, const struct icmp6hdr *ih, const struct in6_addr *saddr)
 {
+	const struct in6_addr *rsaddr; /* src addr for reply */
 	char buf[BUFSIZ] = { 0 };
 	struct ipv6hdr *ip6hr;
 	struct icmp6hdr *ihr;
@@ -180,26 +181,12 @@ dns_done:
 	else
 		c->ip6.addr_seen = *saddr;
 
-	ip6hr->daddr = *saddr;
 	if (IN6_IS_ADDR_LINKLOCAL(&c->ip6.gw))
-		ip6hr->saddr = c->ip6.gw;
+		rsaddr = &c->ip6.gw;
 	else
-		ip6hr->saddr = c->ip6.addr_ll;
+		rsaddr = &c->ip6.addr_ll;
 
-	ip6hr->payload_len = htons(sizeof(*ihr) + len);
-	csum_icmp6(ihr, &ip6hr->saddr, &ip6hr->daddr, ihr + 1, len);
-
-	ip6hr->version = 6;
-	ip6hr->nexthdr = IPPROTO_ICMPV6;
-	ip6hr->hop_limit = 255;
-
-	len += sizeof(*ehr) + sizeof(*ip6hr) + sizeof(*ihr);
-	memcpy(ehr->h_dest, c->mac_guest, ETH_ALEN);
-	memcpy(ehr->h_source, c->mac, ETH_ALEN);
-	ehr->h_proto = htons(ETH_P_IPV6);
-
-	if (tap_send(c, ehr, len) < 0)
-		perror("NDP: send");
+	tap_icmp6_send(c, rsaddr, saddr, ihr, len + sizeof(*ihr));
 
 	return 1;
 }
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 13/14] tap: Split tap_ip4_send() into UDP and ICMP variants
  2022-10-17  8:57 [PATCH 00/14] Clean up checksum and header generation for inbound packets David Gibson
                   ` (11 preceding siblings ...)
  2022-10-17  8:58 ` [PATCH 12/14] ndp: Use tap_icmp6_send() helper David Gibson
@ 2022-10-17  8:58 ` David Gibson
  2022-10-18  3:06   ` Stefano Brivio
  2022-10-17  8:58 ` [PATCH 14/14] dhcp: Use tap_udp4_send() helper in dhcp() David Gibson
  13 siblings, 1 reply; 36+ messages in thread
From: David Gibson @ 2022-10-17  8:58 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, David Gibson

tap_ip4_send() has special case logic to compute the checksums for UDP
and ICMP packets, which is a mild layering violation.  By using a suitable
helper we can split it into tap_udp4_send() and tap_icmp4_send() functions
without greatly increasing the code size, this removing that layering
violation.

We make some small changes to the interface while there.  In both cases
we make the destination IPv4 address a parameter, which will be useful
later.  For the UDP variant we make it take just the UDP payload, and it
will generate the UDP header.  For the ICMP variant we pass in the ICMP
header as before.  The inconsistency is because that's what seems to be
the more natural way to invoke the function in the callers in each case.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 icmp.c |  3 ++-
 tap.c  | 75 +++++++++++++++++++++++++++++++++++++++++-----------------
 tap.h  |  7 ++++--
 3 files changed, 60 insertions(+), 25 deletions(-)

diff --git a/icmp.c b/icmp.c
index 6493ea9..233acf9 100644
--- a/icmp.c
+++ b/icmp.c
@@ -124,7 +124,8 @@ void icmp_sock_handler(const struct ctx *c, union epoll_ref ref,
 			icmp_id_map[V4][id].seq = seq;
 		}
 
-		tap_ip4_send(c, sr4->sin_addr.s_addr, IPPROTO_ICMP, buf, n);
+		tap_icmp4_send(c, sr4->sin_addr.s_addr, tap_ip4_daddr(c),
+			       buf, n);
 	}
 }
 
diff --git a/tap.c b/tap.c
index 274f4ba..5792880 100644
--- a/tap.c
+++ b/tap.c
@@ -127,20 +127,10 @@ static void *tap_l2_hdr(const struct ctx *c, void *buf, uint16_t proto)
 	return eh + 1;
 }
 
-/**
- * tap_ip4_send() - Send IPv4 packet, with L2 headers, calculating L3/L4 checksums
- * @c:		Execution context
- * @src:	IPv4 source address
- * @proto:	L4 protocol number
- * @in:		Payload
- * @len:	L4 payload length
- */
-void tap_ip4_send(const struct ctx *c, in_addr_t src, uint8_t proto,
-		  const char *in, size_t len)
+static void *tap_ip4_hdr(char *buf, in_addr_t src, in_addr_t dst,
+			 size_t len, uint8_t proto)
 {
-	char buf[USHRT_MAX];
-	struct iphdr *ip4h = (struct iphdr *)tap_l2_hdr(c, buf, ETH_P_IP);
-	char *data = (char *)(ip4h + 1);
+	struct iphdr *ip4h = (struct iphdr *)buf;
 
 	ip4h->version = 4;
 	ip4h->ihl = sizeof(struct iphdr) / 4;
@@ -151,20 +141,61 @@ void tap_ip4_send(const struct ctx *c, in_addr_t src, uint8_t proto,
 	ip4h->ttl = 255;
 	ip4h->protocol = proto;
 	ip4h->saddr = src;
-	ip4h->daddr = tap_ip4_daddr(c);
+	ip4h->daddr = dst;
 	csum_ip4_header(ip4h);
+	return ip4h + 1;
+}
+
+/**
+ * tap_udp4_send() - Send UDP over IPv4 packet
+ * @c:		Execution context
+ * @src:	IPv4 source address
+ * @sport:	UDP source port
+ * @dst:	IPv4 destination address
+ * @dport:	UDP destination port
+ * @in:		UDP payload contents (not including UDP header)
+ * @len:	UDP payload length (not including UDP header)
+ */
+/* cppcheck-suppress unusedFunction */
+void tap_udp4_send(const struct ctx *c, in_addr_t src, in_port_t sport,
+		   in_addr_t dst, in_port_t dport,
+		   const void *in, size_t len)
+{
+	size_t udplen = len + sizeof(struct udphdr);
+	char buf[USHRT_MAX];
+	void *ip4h = tap_l2_hdr(c, buf, ETH_P_IP);
+	void *uhp = tap_ip4_hdr(ip4h, src, dst, udplen, IPPROTO_UDP);
+	struct udphdr *uh = (struct udphdr *)uhp;
+	char *data = (char *)(uh + 1);
 
+	uh->source = htons(sport);
+	uh->dest = htons(dport);
+	uh->len = htons(udplen);
+	csum_udp4(uh, src, dst, in, len);
 	memcpy(data, in, len);
 
-	if (ip4h->protocol == IPPROTO_UDP) {
-		struct udphdr *uh = (struct udphdr *)(ip4h + 1);
+	if (tap_send(c, buf, len + (data - buf)) < 0)
+		debug("tap: failed to send %lu bytes (IPv4)", len);
+}
 
-		csum_udp4(uh, ip4h->saddr, ip4h->daddr,
-			  uh + 1, len - sizeof(*uh));
-	} else if (ip4h->protocol == IPPROTO_ICMP) {
-		struct icmphdr *ih = (struct icmphdr *)(ip4h + 1);
-		csum_icmp4(ih, ih + 1, len - sizeof(*ih));
-	}
+/**
+ * tap_icmp4_send() - Send ICMPv4 packet
+ * @c:		Execution context
+ * @src:	IPv4 source address
+ * @dst:	IPv4 destination address
+ * @in:		ICMP packet, including ICMP header
+ * @len:	ICMP packet length, including ICMP header
+ */
+void tap_icmp4_send(const struct ctx *c, in_addr_t src, in_addr_t dst,
+		    void *in, size_t len)
+{
+	char buf[USHRT_MAX];
+	void *ip4h = tap_l2_hdr(c, buf, ETH_P_IP);
+	char *data = tap_ip4_hdr(ip4h, src, dst, len, IPPROTO_ICMP);
+	struct icmphdr *icmp4h = (struct icmphdr *)data;
+
+	memcpy(data, in, len);
+	csum_icmp4(icmp4h, icmp4h + 1, len - sizeof(*icmp4h));
 
 	if (tap_send(c, buf, len + (data - buf)) < 0)
 		debug("tap: failed to send %lu bytes (IPv4)", len);
diff --git a/tap.h b/tap.h
index d43c7a0..743bc58 100644
--- a/tap.h
+++ b/tap.h
@@ -7,10 +7,13 @@
 #define TAP_H
 
 in_addr_t tap_ip4_daddr(const struct ctx *c);
+void tap_udp4_send(const struct ctx *c, in_addr_t src, in_port_t sport,
+		   in_addr_t dst, in_port_t dport,
+		   const void *in, size_t len);
+void tap_icmp4_send(const struct ctx *c, in_addr_t src, in_addr_t dst,
+		    void *in, size_t len);
 const struct in6_addr *tap_ip6_daddr(const struct ctx *c,
 				     const struct in6_addr *src);
-void tap_ip4_send(const struct ctx *c, in_addr_t src, uint8_t proto,
-		  const char *in, size_t len);
 void tap_udp6_send(const struct ctx *c,
 		   const struct in6_addr *src, in_port_t sport,
 		   const struct in6_addr *dst, in_port_t dport,
-- 
@@ -7,10 +7,13 @@
 #define TAP_H
 
 in_addr_t tap_ip4_daddr(const struct ctx *c);
+void tap_udp4_send(const struct ctx *c, in_addr_t src, in_port_t sport,
+		   in_addr_t dst, in_port_t dport,
+		   const void *in, size_t len);
+void tap_icmp4_send(const struct ctx *c, in_addr_t src, in_addr_t dst,
+		    void *in, size_t len);
 const struct in6_addr *tap_ip6_daddr(const struct ctx *c,
 				     const struct in6_addr *src);
-void tap_ip4_send(const struct ctx *c, in_addr_t src, uint8_t proto,
-		  const char *in, size_t len);
 void tap_udp6_send(const struct ctx *c,
 		   const struct in6_addr *src, in_port_t sport,
 		   const struct in6_addr *dst, in_port_t dport,
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 14/14] dhcp: Use tap_udp4_send() helper in dhcp()
  2022-10-17  8:57 [PATCH 00/14] Clean up checksum and header generation for inbound packets David Gibson
                   ` (12 preceding siblings ...)
  2022-10-17  8:58 ` [PATCH 13/14] tap: Split tap_ip4_send() into UDP and ICMP variants David Gibson
@ 2022-10-17  8:58 ` David Gibson
  13 siblings, 0 replies; 36+ messages in thread
From: David Gibson @ 2022-10-17  8:58 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, David Gibson

The IPv4 specific dhcp() manually constructs L2 and IP headers to send its
DHCP reply packet, unlike its IPv6 equivalent in dhcpv6.c which uses the
tap_udp6_send() helper.  Now that we've broaded the parameters to
tap_udp4_send() we can use it in dhcp() to avoid some duplicated logic.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 dhcp.c | 18 ++----------------
 tap.c  |  1 -
 2 files changed, 2 insertions(+), 17 deletions(-)

diff --git a/dhcp.c b/dhcp.c
index 2b3af82..d22698a 100644
--- a/dhcp.c
+++ b/dhcp.c
@@ -363,22 +363,8 @@ int dhcp(const struct ctx *c, const struct pool *p)
 	if (!c->no_dhcp_dns_search)
 		opt_set_dns_search(c, sizeof(m->o));
 
-	uh->len = htons(len = offsetof(struct msg, o) + fill(m) + sizeof(*uh));
-	uh->source = htons(67);
-	uh->dest = htons(68);
-	csum_udp4(uh, c->ip4.gw, c->ip4.addr, uh + 1, len - sizeof(*uh));
-
-	iph->tot_len = htons(len += sizeof(*iph));
-	iph->daddr = c->ip4.addr;
-	iph->saddr = c->ip4.gw;
-	csum_ip4_header(iph);
-
-	len += sizeof(*eh);
-	memcpy(eh->h_dest, eh->h_source, ETH_ALEN);
-	memcpy(eh->h_source, c->mac, ETH_ALEN);
-
-	if (tap_send(c, eh, len) < 0)
-		perror("DHCP: send");
+	len = offsetof(struct msg, o) + fill(m);
+	tap_udp4_send(c, c->ip4.gw, 67, c->ip4.addr, 68, m, len);
 
 	return 1;
 }
diff --git a/tap.c b/tap.c
index 5792880..75f1e38 100644
--- a/tap.c
+++ b/tap.c
@@ -156,7 +156,6 @@ static void *tap_ip4_hdr(char *buf, in_addr_t src, in_addr_t dst,
  * @in:		UDP payload contents (not including UDP header)
  * @len:	UDP payload length (not including UDP header)
  */
-/* cppcheck-suppress unusedFunction */
 void tap_udp4_send(const struct ctx *c, in_addr_t src, in_port_t sport,
 		   in_addr_t dst, in_port_t dport,
 		   const void *in, size_t len)
-- 
@@ -156,7 +156,6 @@ static void *tap_ip4_hdr(char *buf, in_addr_t src, in_addr_t dst,
  * @in:		UDP payload contents (not including UDP header)
  * @len:	UDP payload length (not including UDP header)
  */
-/* cppcheck-suppress unusedFunction */
 void tap_udp4_send(const struct ctx *c, in_addr_t src, in_port_t sport,
 		   in_addr_t dst, in_port_t dport,
 		   const void *in, size_t len)
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 01/14] Add csum_icmp6() helper for calculating ICMPv6 checksums
  2022-10-17  8:57 ` [PATCH 01/14] Add csum_icmp6() helper for calculating ICMPv6 checksums David Gibson
@ 2022-10-18  3:01   ` Stefano Brivio
  2022-10-18 12:05     ` David Gibson
  0 siblings, 1 reply; 36+ messages in thread
From: Stefano Brivio @ 2022-10-18  3:01 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Mon, 17 Oct 2022 19:57:54 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> At least two places in passt calculate ICMPv6 checksums, ndp() and
> tap_ip_send().  Add a helper to handle this calculation in both places.
> For future flexibility, the new helper takes parameters for the fields in
> the IPv6 pseudo-header, so an IPv6 header or pseudo-header doesn't need to
> be explicitly constructed.  It also allows the ICMPv6 header and payload to
> be in separate buffers, although we don't use this yet.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  checksum.c | 27 +++++++++++++++++++++++++++
>  checksum.h |  7 +++++++
>  ndp.c      |  5 +----
>  tap.c      |  6 ++----
>  4 files changed, 37 insertions(+), 8 deletions(-)
> 
> diff --git a/checksum.c b/checksum.c
> index 56ad01e..0e207c8 100644
> --- a/checksum.c
> +++ b/checksum.c
> @@ -52,6 +52,8 @@
>  #include <stddef.h>
>  #include <stdint.h>
>  
> +#include <linux/icmpv6.h>
> +
>  /**
>   * sum_16b() - Calculate sum of 16-bit words
>   * @buf:	Input buffer
> @@ -105,6 +107,31 @@ uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init)
>  	return (uint16_t)~csum_fold(sum_16b(buf, len) + init);
>  }
>  
> +/**
> + * csum_icmp6() - Calculate checksum for an ICMPv6 packet

"Calculate and set" ...?

> + * @icmp6hr:	ICMPv6 header, initialized apart from checksum
> + * @saddr:	IPv6 source address
> + * @daddr:	IPv6 destination address
> + * @payload:	ICMP packet payload
> + * @len:	Length of @payload (not including ICMPv6 header)
> + */
> +void csum_icmp6(struct icmp6hdr *icmp6hr,
> +		const struct in6_addr *saddr,
> +		const struct in6_addr *daddr,

I think:
		const struct in6_addr *saddr, const struct in6_addr *daddr,

would be easier on eyes.

> +		const void *payload,
> +		size_t len)
> +{
> +	/* Partial checksum for the pseudo-IPv6 header */
> +	uint32_t psum = sum_16b(saddr, sizeof(*saddr)) +
> +		sum_16b(daddr, sizeof(*daddr)) +
> +		htons(len + sizeof(*icmp6hr)) + htons(IPPROTO_ICMPV6);

Maybe:

	uint32_t psum = sum_16b(saddr, sizeof(*saddr)) +
			sum_16b(daddr, sizeof(*daddr)) +
			htons(len + sizeof(*icmp6hr))  + htons(IPPROTO_ICMPV6);

for me, it turns things from "sum a bunch of things" to "addresses and
something else".

> +
> +	icmp6hr->icmp6_cksum = 0;
> +	/* Add in partial checksum for the ICMPv6 header alone */
> +	psum += sum_16b(icmp6hr, sizeof(*icmp6hr));
> +	icmp6hr->icmp6_cksum = csum_unaligned(payload, len, psum);
> +}
> +
>  /**
>   * csum_tcp4() - Calculate TCP checksum for IPv4 and set in place
>   * @iph:	Packet buffer, IP header
> diff --git a/checksum.h b/checksum.h
> index 5418406..2c72200 100644
> --- a/checksum.h
> +++ b/checksum.h
> @@ -6,9 +6,16 @@
>  #ifndef CHECKSUM_H
>  #define CHECKSUM_H
>  
> +struct icmp6hdr;
> +
>  uint32_t sum_16b(const void *buf, size_t len);
>  uint16_t csum_fold(uint32_t sum);
>  uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init);
> +void csum_icmp6(struct icmp6hdr *ih,
> +		const struct in6_addr *saddr,
> +		const struct in6_addr *daddr,
> +		const void *payload,
> +		size_t len);

It looks a bit like Haskell. ;) I would really use the horizontal space
we have.

>  void csum_tcp4(struct iphdr *iph);
>  uint16_t csum(const void *buf, size_t len, uint32_t init);
>  
> diff --git a/ndp.c b/ndp.c
> index dec36a9..03f1d06 100644
> --- a/ndp.c
> +++ b/ndp.c
> @@ -189,10 +189,7 @@ dns_done:
>  		ip6hr->saddr = c->ip6.addr_ll;
>  
>  	ip6hr->payload_len = htons(sizeof(*ihr) + len);
> -	ip6hr->hop_limit = IPPROTO_ICMPV6;
> -	ihr->icmp6_cksum = 0;
> -	ihr->icmp6_cksum = csum_unaligned(ip6hr, sizeof(*ip6hr) +
> -						 sizeof(*ihr) + len, 0);
> +	csum_icmp6(ihr, &ip6hr->saddr, &ip6hr->daddr, ihr + 1, len);

Nice to see this all going away!

>  	ip6hr->version = 6;
>  	ip6hr->nexthdr = IPPROTO_ICMPV6;
> diff --git a/tap.c b/tap.c
> index 8b6d9bc..aafc92b 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -191,10 +191,8 @@ void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto,
>  		} else if (proto == IPPROTO_ICMPV6) {
>  			struct icmp6hdr *ih = (struct icmp6hdr *)(ip6h + 1);
>  
> -			ih->icmp6_cksum = 0;
> -			ih->icmp6_cksum = csum_unaligned(ip6h,
> -							 len + sizeof(*ip6h),
> -							 0);
> +			csum_icmp6(ih, &ip6h->saddr, &ip6h->daddr,
> +				   ih + 1, len - sizeof(*ih));
>  		}
>  		ip6h->version = 6;
>  		ip6h->nexthdr = proto;

-- 
Stefano


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 02/14] Add csum_icmp4() helper for calculating ICMPv4 checksums
  2022-10-17  8:57 ` [PATCH 02/14] Add csum_icmp4() helper for calculating ICMPv4 checksums David Gibson
@ 2022-10-18  3:01   ` Stefano Brivio
  2022-10-18 12:06     ` David Gibson
  0 siblings, 1 reply; 36+ messages in thread
From: Stefano Brivio @ 2022-10-18  3:01 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Mon, 17 Oct 2022 19:57:55 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> Although tap_ip_send() is currently the only place calculating ICMPv4
> checksums, create a helper function for symmetry with ICMPv6.  For future
> flexibility it allows the ICMPv6 header and payload to be in separate
> buffers.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  checksum.c | 15 +++++++++++++++
>  checksum.h |  2 ++
>  tap.c      |  4 +---
>  3 files changed, 18 insertions(+), 3 deletions(-)
> 
> diff --git a/checksum.c b/checksum.c
> index 0e207c8..c8b6b42 100644
> --- a/checksum.c
> +++ b/checksum.c
> @@ -52,6 +52,7 @@
>  #include <stddef.h>
>  #include <stdint.h>
>  
> +#include <linux/icmp.h>
>  #include <linux/icmpv6.h>
>  
>  /**
> @@ -107,6 +108,20 @@ uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init)
>  	return (uint16_t)~csum_fold(sum_16b(buf, len) + init);
>  }
>  
> +/**
> + * csum_icmp4() - Calculate checksum for an ICMPv4 packet

"Calculate and set"? By the way, there's no such thing as ICMPv4 --
it's ICMP.

> + * @icmp4hr:	ICMPv4 header, initialized apart from checksum

...-ised, if you respin. For consistency, I would call this 'ih'.

> + * @payload:	ICMPv4 packet payload
> + * @len:	Length of @payload (not including ICMPv4 header)
> + */
> +void csum_icmp4(struct icmphdr *icmp4hr, const void *payload, size_t len)

I guess csum_icmp() is preferable. Indeed, for TCP and UDP 'tcp4' and
'udp4' make sense because those are the same protocols over IPv4 and
IPv6.

> +{
> +	/* Partial checksum for ICMPv4 header alone */
> +	uint32_t hrsum = sum_16b(icmp4hr, sizeof(*icmp4hr));

A white line would be nice here.

I would call this psum (same as in csum_icmp6()) or hdrsum, 'hr' isn't
really used for "header" elsewhere.

> +	icmp4hr->checksum = 0;
> +	icmp4hr->checksum = csum_unaligned(payload, len, hrsum);
> +}
> +
>  /**
>   * csum_icmp6() - Calculate checksum for an ICMPv6 packet
>   * @icmp6hr:	ICMPv6 header, initialized apart from checksum
> diff --git a/checksum.h b/checksum.h
> index 2c72200..ff95cf9 100644
> --- a/checksum.h
> +++ b/checksum.h
> @@ -6,11 +6,13 @@
>  #ifndef CHECKSUM_H
>  #define CHECKSUM_H
>  
> +struct icmphdr;
>  struct icmp6hdr;
>  
>  uint32_t sum_16b(const void *buf, size_t len);
>  uint16_t csum_fold(uint32_t sum);
>  uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init);
> +void csum_icmp4(struct icmphdr *ih, const void *payload, size_t len);
>  void csum_icmp6(struct icmp6hdr *ih,
>  		const struct in6_addr *saddr,
>  		const struct in6_addr *daddr,
> diff --git a/tap.c b/tap.c
> index aafc92b..f082901 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -148,9 +148,7 @@ void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto,
>  			uh->check = 0;
>  		} else if (iph->protocol == IPPROTO_ICMP) {
>  			struct icmphdr *ih = (struct icmphdr *)(iph + 1);
> -
> -			ih->checksum = 0;
> -			ih->checksum = csum_unaligned(ih, len, 0);
> +			csum_icmp4(ih, ih + 1, len - sizeof(*ih));
>  		}
>  
>  		if (tap_send(c, buf, len + sizeof(*iph) + sizeof(*eh), 1) < 0)

-- 
Stefano


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 03/14] Add csum_udp6() helper for calculating UDP over IPv6 checksums
  2022-10-17  8:57 ` [PATCH 03/14] Add csum_udp6() helper for calculating UDP over IPv6 checksums David Gibson
@ 2022-10-18  3:02   ` Stefano Brivio
  2022-10-18 12:06     ` David Gibson
  0 siblings, 1 reply; 36+ messages in thread
From: Stefano Brivio @ 2022-10-18  3:02 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Mon, 17 Oct 2022 19:57:56 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> Add a helper for calculating UDP checksums when used over IPv6
> For future flexibility, the new helper takes parameters for the fields in
> the IPv6 pseudo-header, so an IPv6 header or pseudo-header doesn't need to
> be explicitly constructed.  It also allows the UDP header and payload to
> be in separate buffers, although we don't use this yet.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  checksum.c | 23 +++++++++++++++++++++++
>  checksum.h |  5 +++++
>  tap.c      |  5 ++---
>  3 files changed, 30 insertions(+), 3 deletions(-)
> 
> diff --git a/checksum.c b/checksum.c
> index c8b6b42..0849fb1 100644
> --- a/checksum.c
> +++ b/checksum.c
> @@ -52,6 +52,7 @@
>  #include <stddef.h>
>  #include <stdint.h>
>  
> +#include <linux/udp.h>
>  #include <linux/icmp.h>
>  #include <linux/icmpv6.h>
>  
> @@ -122,6 +123,28 @@ void csum_icmp4(struct icmphdr *icmp4hr, const void *payload, size_t len)
>  	icmp4hr->checksum = csum_unaligned(payload, len, hrsum);
>  }
>  
> +/**
> + * csum_udp6() - Calculate checksum for a UDP over IPv6 packet

Calculate and set.

> + * @udp6hr:	UDP header, initialized apart from checksum

-ised.

> + * @payload:	UDP packet payload
> + * @len:	Length of @payload (not including UDP header)
> + */
> +void csum_udp6(struct udphdr *udp6hr,
> +	       const struct in6_addr *saddr,
> +	       const struct in6_addr *daddr,

You could use some horizontal space.

> +	       const void *payload, size_t len)
> +{
> +	/* Partial checksum for the pseudo-IPv6 header */
> +	uint32_t psum = sum_16b(saddr, sizeof(*saddr)) +
> +		sum_16b(daddr, sizeof(*daddr)) +
> +		htons(len + sizeof(*udp6hr)) + htons(IPPROTO_UDP);

Alignment:

	uint32_t psum = sum_16b(saddr, sizeof(*saddr)) +
			sum_16b(daddr, sizeof(*daddr)) +
			htons(len + sizeof(*udp6hr))   + htons(IPPROTO_UDP);

> +	udp6hr->check = 0;
> +	/* Add in partial checksum for the UDP header alone */
> +	psum += sum_16b(udp6hr, sizeof(*udp6hr));
> +	udp6hr->check = csum_unaligned(payload, len, psum);
> +}
> +
>  /**
>   * csum_icmp6() - Calculate checksum for an ICMPv6 packet
>   * @icmp6hr:	ICMPv6 header, initialized apart from checksum
> diff --git a/checksum.h b/checksum.h
> index ff95cf9..1b9f48e 100644
> --- a/checksum.h
> +++ b/checksum.h
> @@ -6,6 +6,7 @@
>  #ifndef CHECKSUM_H
>  #define CHECKSUM_H
>  
> +struct udphdr;
>  struct icmphdr;
>  struct icmp6hdr;
>  
> @@ -13,6 +14,10 @@ uint32_t sum_16b(const void *buf, size_t len);
>  uint16_t csum_fold(uint32_t sum);
>  uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init);
>  void csum_icmp4(struct icmphdr *ih, const void *payload, size_t len);
> +void csum_udp6(struct udphdr *udp6hr,
> +	       const struct in6_addr *saddr,
> +	       const struct in6_addr *daddr,
> +	       const void *payload, size_t len);

Use some horizontal space.

>  void csum_icmp6(struct icmp6hdr *ih,
>  		const struct in6_addr *saddr,
>  		const struct in6_addr *daddr,
> diff --git a/tap.c b/tap.c
> index f082901..9c197cb 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -183,9 +183,8 @@ void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto,
>  		} else if (proto == IPPROTO_UDP) {
>  			struct udphdr *uh = (struct udphdr *)(ip6h + 1);
>  
> -			uh->check = 0;
> -			uh->check = csum_unaligned(ip6h, len + sizeof(*ip6h),
> -						   0);
> +			csum_udp6(uh, &ip6h->saddr, &ip6h->daddr,
> +				  uh + 1, len - sizeof(*uh));
>  		} else if (proto == IPPROTO_ICMPV6) {
>  			struct icmp6hdr *ih = (struct icmp6hdr *)(ip6h + 1);
>  

-- 
Stefano


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 04/14] Add csum_udp4() helper for calculating UDP over IPv4 checksums
  2022-10-17  8:57 ` [PATCH 04/14] Add csum_udp4() helper for calculating UDP over IPv4 checksums David Gibson
@ 2022-10-18  3:03   ` Stefano Brivio
  2022-10-18 12:06     ` David Gibson
  0 siblings, 1 reply; 36+ messages in thread
From: Stefano Brivio @ 2022-10-18  3:03 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Mon, 17 Oct 2022 19:57:57 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> At least two places in passt fill in UDP over IPv4 checksums, although
> since UDP checksums are optional with IPv4 that just amounts to storing
> a 0 (in tap_ip_send()) or leaving a 0 from an earlier initialization (in
> dhcp()).  For consistency, add a helper for this "calculation".
> 
> Just for the heck of it, add the option (compile time disabled for now) to
> calculate real UDP checksums.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  checksum.c | 33 +++++++++++++++++++++++++++++++++
>  checksum.h |  3 +++
>  dhcp.c     |  2 +-
>  tap.c      |  2 +-
>  4 files changed, 38 insertions(+), 2 deletions(-)
> 
> diff --git a/checksum.c b/checksum.c
> index 0849fb1..72f1cfb 100644
> --- a/checksum.c
> +++ b/checksum.c
> @@ -56,6 +56,11 @@
>  #include <linux/icmp.h>
>  #include <linux/icmpv6.h>
>  
> +/* Checksums are optional for UDP over IPv4, so we usually just set
> + * them to 0.  Change this 1 to calculate real UDP over IPv4 checksums

to 1

> + */
> +#define UDP4_REAL_CHECKSUMS	0
> +
>  /**
>   * sum_16b() - Calculate sum of 16-bit words
>   * @buf:	Input buffer
> @@ -109,6 +114,34 @@ uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init)
>  	return (uint16_t)~csum_fold(sum_16b(buf, len) + init);
>  }
>  
> +/**
> + * csum_udp4() - Calculate checksum for a UDP over IPv4 packet

and set

> + * @udp4hr:	UDP header, initialized apart from checksum
> + * @saddr:	IPv4 source address
> + * @daddr:	IPv4 destination address
> + * @payload:	ICMPv4 packet payload
> + * @len:	Length of @payload (not including UDP)
> + */
> +void csum_udp4(struct udphdr *udp4hr,
> +	       in_addr_t saddr, in_addr_t daddr,
> +	       const void *payload, size_t len)
> +{
> +	/* UDP checksums are optional, so don't bother */
> +	udp4hr->check = 0;
> +
> +	if (UDP4_REAL_CHECKSUMS) {
> +		/* UNTESTED: if we did want real UDPv4 checksums, this
> +		 * is roughly what we'd need */
> +		uint32_t psum = csum_fold(htonl(saddr))
> +			+ csum_fold(htonl(daddr))
> +			+ htons(len + sizeof(*udp4hr))
> +			+ htons(IPPROTO_UDP);
> +		/* Add in partial checksum for the UDP header alone */
> +		psum += sum_16b(udp4hr, sizeof(*udp4hr));
> +		udp4hr->check = csum_unaligned(payload, len, psum);
> +	}
> +}
> +
>  /**
>   * csum_icmp4() - Calculate checksum for an ICMPv4 packet
>   * @icmp4hr:	ICMPv4 header, initialized apart from checksum
> diff --git a/checksum.h b/checksum.h
> index 1b9f48e..a9502b9 100644
> --- a/checksum.h
> +++ b/checksum.h
> @@ -13,6 +13,9 @@ struct icmp6hdr;
>  uint32_t sum_16b(const void *buf, size_t len);
>  uint16_t csum_fold(uint32_t sum);
>  uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init);
> +void csum_udp4(struct udphdr *udp4hr,
> +	       in_addr_t saddr, in_addr_t daddr,
> +	       const void *payload, size_t len);

Horizontal space.

-- 
Stefano


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 05/14] Add csum_ip4_header() helper to calculate IPv4 header checksums
  2022-10-17  8:57 ` [PATCH 05/14] Add csum_ip4_header() helper to calculate IPv4 header checksums David Gibson
@ 2022-10-18  3:03   ` Stefano Brivio
  2022-10-18 12:07     ` David Gibson
  0 siblings, 1 reply; 36+ messages in thread
From: Stefano Brivio @ 2022-10-18  3:03 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Mon, 17 Oct 2022 19:57:58 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> We calculate IPv4 header checksums in at least two places, in dhcp() and
> in tap_ip_send.  Add a helper to handle this calculation in both places.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  checksum.c | 6 ++++++
>  checksum.h | 1 +
>  dhcp.c     | 3 +--
>  tap.c      | 3 +--
>  4 files changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/checksum.c b/checksum.c
> index 72f1cfb..f25a96a 100644
> --- a/checksum.c
> +++ b/checksum.c
> @@ -114,6 +114,12 @@ uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init)
>  	return (uint16_t)~csum_fold(sum_16b(buf, len) + init);
>  }
>  

/**
 * csum_ip4_header() - Calculate and set IPv4 header checksum
 * @iph:	IPv4 header
 */

...I just tried to run Doxygen, I think it would be nice to have
eventually (especially for DOT call graphs), things don't look too bad.

> +void csum_ip4_header(struct iphdr *ip4hr)
> +{
> +	ip4hr->check = 0;
> +	ip4hr->check = csum_unaligned(ip4hr, (size_t)ip4hr->ihl * 4, 0);

iph, for consistency.

-- 
Stefano


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 06/14] Add helpers for normal inbound packet destination addresses
  2022-10-17  8:57 ` [PATCH 06/14] Add helpers for normal inbound packet destination addresses David Gibson
@ 2022-10-18  3:04   ` Stefano Brivio
  2022-10-18 12:07     ` David Gibson
  0 siblings, 1 reply; 36+ messages in thread
From: Stefano Brivio @ 2022-10-18  3:04 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Mon, 17 Oct 2022 19:57:59 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> tap_ip_send() doesn't take a destination address, because it's specifically
> for inbound packets, and the IP addresses of the guest/namespace are
> already known to us.  Rather than open-coding this destination address
> logic, make helper functions for it which will enable some later cleanups.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  tap.c | 29 ++++++++++++++++++++++++-----
>  tap.h |  3 +++
>  2 files changed, 27 insertions(+), 5 deletions(-)
> 
> diff --git a/tap.c b/tap.c
> index de02c56..41e8ff2 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -96,6 +96,28 @@ int tap_send(const struct ctx *c, const void *data, size_t len, int vnet_pre)
>  	return write(c->fd_tap, (char *)data + (vnet_pre ? 4 : 0), len);
>  }
>  
> +/**
> + * tap_ip4_daddr() - Normal IPv4 destination address for inbound packets
> + * @c:		Execution context

Given that the address is returned in network order, I think this would
be relevant here:

 * Return: IPv4 address, network order

> + */
> +in_addr_t tap_ip4_daddr(const struct ctx *c)
> +{
> +	return c->ip4.addr_seen;
> +}
> +
> +/**
> + * tap_ip6_daddr() - Normal IPv4 destination address for inbound packets
> + * @c:		Execution context
> + * @src:	Source address

 * Return: pointer to IPv6 address

> + */
> +const struct in6_addr *tap_ip6_daddr(const struct ctx *c,
> +				     const struct in6_addr *src)
> +{
> +	if (IN6_IS_ADDR_LINKLOCAL(src))
> +		return &c->ip6.addr_ll_seen;
> +	return &c->ip6.addr_seen;
> +}
> +
>  /**
>   * tap_ip_send() - Send IP packet, with L2 headers, calculating L3/L4 checksums
>   * @c:		Execution context
> @@ -132,7 +154,7 @@ void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto,
>  		iph->frag_off = 0;
>  		iph->ttl = 255;
>  		iph->protocol = proto;
> -		iph->daddr = c->ip4.addr_seen;
> +		iph->daddr = tap_ip4_daddr(c);
>  		memcpy(&iph->saddr, &src->s6_addr[12], 4);
>  
>  		csum_ip4_header(iph);
> @@ -163,10 +185,7 @@ void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto,
>  		ip6h->priority = 0;
>  
>  		ip6h->saddr = *src;
> -		if (IN6_IS_ADDR_LINKLOCAL(src))
> -			ip6h->daddr = c->ip6.addr_ll_seen;
> -		else
> -			ip6h->daddr = c->ip6.addr_seen;
> +		ip6h->daddr = *tap_ip6_daddr(c, src);
>  
>  		memcpy(data, in, len);
>  
> diff --git a/tap.h b/tap.h
> index df3aec0..a6764b4 100644
> --- a/tap.h
> +++ b/tap.h
> @@ -6,6 +6,9 @@
>  #ifndef TAP_H
>  #define TAP_H
>  
> +in_addr_t tap_ip4_daddr(const struct ctx *c);
> +const struct in6_addr *tap_ip6_daddr(const struct ctx *c,
> +				     const struct in6_addr *src);
>  void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto,
>  		 const char *in, size_t len, uint32_t flow);
>  int tap_send(const struct ctx *c, const void *data, size_t len, int vnet_pre);

-- 
Stefano


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 08/14] tap: Remove unhelpeful vnet_pre optimization from tap_send()
  2022-10-17  8:58 ` [PATCH 08/14] tap: Remove unhelpeful vnet_pre optimization from tap_send() David Gibson
@ 2022-10-18  3:05   ` Stefano Brivio
  2022-10-18 12:07     ` David Gibson
  0 siblings, 1 reply; 36+ messages in thread
From: Stefano Brivio @ 2022-10-18  3:05 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Mon, 17 Oct 2022 19:58:01 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> Callers of tap_send() can optionally use a small optimization by adding
> extra space for the 4 byte length header used on the qemu socket interface.
> tap_ip_send() is currently the only user of this, but this is used only
> for "slow path" ICMP and DHCP packets, so there's not a lot of value to
> the optimization.

Not anymore, definitely. It used to look quite bad in perf(1) when I
shuffled connection data around.

-- 
Stefano


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 09/14] Split tap_ip_send() into IPv4 and IPv6 specific functions
  2022-10-17  8:58 ` [PATCH 09/14] Split tap_ip_send() into IPv4 and IPv6 specific functions David Gibson
@ 2022-10-18  3:06   ` Stefano Brivio
  2022-10-18 12:07     ` David Gibson
  0 siblings, 1 reply; 36+ messages in thread
From: Stefano Brivio @ 2022-10-18  3:06 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Mon, 17 Oct 2022 19:58:02 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> The IPv4 and IPv6 paths in tap_ip_send() have very little in common, and
> it turns out that every caller (statically) knows if it is using IPv4 or
> IPv6.  So split into separate tap_ip4_send() and tap_ip6_send() functions.
> Use a new tap_l2_hdr() function for the very small common part.
> 
> While we're there, make some minor cleanups:
>   - We were double writing some fields in the IPv6 header, so that it
>     temporary matched the pseudo-header for checksum calculation.  With
>     recent checksum reworks, this isn't neccessary any more.
>   - We don't use any IPv4 header options, so use some sizeof() constructs
>     instead of some open coded values for header length.
>   - The comment used to say that the flow label was for TCP over IPv6, but
>     in fact the only thing we used it for was ICMPv6

...right, this used to be the data path.

> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  dhcpv6.c |   6 +-
>  icmp.c   |  10 +---
>  tap.c    | 176 +++++++++++++++++++++++++++++--------------------------
>  tap.h    |   6 +-
>  4 files changed, 102 insertions(+), 96 deletions(-)
> 
> diff --git a/dhcpv6.c b/dhcpv6.c
> index e7640ce..7829968 100644
> --- a/dhcpv6.c
> +++ b/dhcpv6.c
> @@ -531,8 +531,8 @@ int dhcpv6(struct ctx *c, const struct pool *p,
>  
>  			resp_not_on_link.hdr.xid = mh->xid;
>  
> -			tap_ip_send(c, src, IPPROTO_UDP,
> -				    (char *)&resp_not_on_link, n, mh->xid);
> +			tap_ip6_send(c, src, IPPROTO_UDP,
> +				     (char *)&resp_not_on_link, n, mh->xid);
>  
>  			return 1;
>  		}
> @@ -580,7 +580,7 @@ int dhcpv6(struct ctx *c, const struct pool *p,
>  
>  	resp.hdr.xid = mh->xid;
>  
> -	tap_ip_send(c, src, IPPROTO_UDP, (char *)&resp, n, mh->xid);
> +	tap_ip6_send(c, src, IPPROTO_UDP, (char *)&resp, n, mh->xid);
>  	c->ip6.addr_seen = c->ip6.addr;
>  
>  	return 1;
> diff --git a/icmp.c b/icmp.c
> index 21ea2d7..61c2d90 100644
> --- a/icmp.c
> +++ b/icmp.c
> @@ -69,10 +69,6 @@ static uint8_t icmp_act[IP_VERSIONS][DIV_ROUND_UP(ICMP_NUM_IDS, 8)];
>  void icmp_sock_handler(const struct ctx *c, union epoll_ref ref,
>  		       uint32_t events, const struct timespec *now)
>  {
> -	struct in6_addr a6 = { .s6_addr = {    0,    0,    0,    0,
> -					       0,    0,    0,    0,
> -					       0,    0, 0xff, 0xff,
> -					       0,    0,    0,    0 } };
>  	union icmp_epoll_ref *iref = &ref.r.p.icmp;
>  	struct sockaddr_storage sr;
>  	socklen_t sl = sizeof(sr);
> @@ -109,7 +105,7 @@ void icmp_sock_handler(const struct ctx *c, union epoll_ref ref,
>  			icmp_id_map[V6][id].seq = seq;
>  		}
>  
> -		tap_ip_send(c, &sr6->sin6_addr, IPPROTO_ICMPV6, buf, n, 0);
> +		tap_ip6_send(c, &sr6->sin6_addr, IPPROTO_ICMPV6, buf, n, 0);
>  	} else {
>  		struct sockaddr_in *sr4 = (struct sockaddr_in *)&sr;
>  		struct icmphdr *ih = (struct icmphdr *)buf;
> @@ -127,9 +123,7 @@ void icmp_sock_handler(const struct ctx *c, union epoll_ref ref,
>  			icmp_id_map[V4][id].seq = seq;
>  		}
>  
> -		memcpy(&a6.s6_addr[12], &sr4->sin_addr, sizeof(sr4->sin_addr));
> -
> -		tap_ip_send(c, &a6, IPPROTO_ICMP, buf, n, 0);
> +		tap_ip4_send(c, sr4->sin_addr.s_addr, IPPROTO_ICMP, buf, n);
>  	}
>  }
>  
> diff --git a/tap.c b/tap.c
> index ae75fac..45547ac 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -109,100 +109,110 @@ const struct in6_addr *tap_ip6_daddr(const struct ctx *c,
>  }
>  
>  /**
> - * tap_ip_send() - Send IP packet, with L2 headers, calculating L3/L4 checksums
> + * tap_l2_hdr() - Build an L2 header for an inbound packet
>   * @c:		Execution context
> - * @src:	IPv6 source address, IPv4-mapped for IPv4 sources
> - * @proto:	L4 protocol number
> - * @in:		Payload
> - * @len:	L4 payload length
> - * @flow:	Flow label for TCP over IPv6
> + * @buf:	Buffer address at which to generate header
> + * @proto:	Ethernet protocol number for L3
> + *
> + * Returns a pointer at which to write the packet's payload

 * Return: ...

-- 
Stefano


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 13/14] tap: Split tap_ip4_send() into UDP and ICMP variants
  2022-10-17  8:58 ` [PATCH 13/14] tap: Split tap_ip4_send() into UDP and ICMP variants David Gibson
@ 2022-10-18  3:06   ` Stefano Brivio
  2022-10-18 12:07     ` David Gibson
  0 siblings, 1 reply; 36+ messages in thread
From: Stefano Brivio @ 2022-10-18  3:06 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Mon, 17 Oct 2022 19:58:06 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> tap_ip4_send() has special case logic to compute the checksums for UDP
> and ICMP packets, which is a mild layering violation.  By using a suitable
> helper we can split it into tap_udp4_send() and tap_icmp4_send() functions
> without greatly increasing the code size, this removing that layering
> violation.
> 
> We make some small changes to the interface while there.  In both cases
> we make the destination IPv4 address a parameter, which will be useful
> later.  For the UDP variant we make it take just the UDP payload, and it
> will generate the UDP header.  For the ICMP variant we pass in the ICMP
> header as before.  The inconsistency is because that's what seems to be
> the more natural way to invoke the function in the callers in each case.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  icmp.c |  3 ++-
>  tap.c  | 75 +++++++++++++++++++++++++++++++++++++++++-----------------
>  tap.h  |  7 ++++--
>  3 files changed, 60 insertions(+), 25 deletions(-)
> 
> diff --git a/icmp.c b/icmp.c
> index 6493ea9..233acf9 100644
> --- a/icmp.c
> +++ b/icmp.c
> @@ -124,7 +124,8 @@ void icmp_sock_handler(const struct ctx *c, union epoll_ref ref,
>  			icmp_id_map[V4][id].seq = seq;
>  		}
>  
> -		tap_ip4_send(c, sr4->sin_addr.s_addr, IPPROTO_ICMP, buf, n);
> +		tap_icmp4_send(c, sr4->sin_addr.s_addr, tap_ip4_daddr(c),
> +			       buf, n);
>  	}
>  }
>  
> diff --git a/tap.c b/tap.c
> index 274f4ba..5792880 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -127,20 +127,10 @@ static void *tap_l2_hdr(const struct ctx *c, void *buf, uint16_t proto)
>  	return eh + 1;
>  }
>  
> -/**
> - * tap_ip4_send() - Send IPv4 packet, with L2 headers, calculating L3/L4 checksums
> - * @c:		Execution context
> - * @src:	IPv4 source address
> - * @proto:	L4 protocol number
> - * @in:		Payload
> - * @len:	L4 payload length
> - */
> -void tap_ip4_send(const struct ctx *c, in_addr_t src, uint8_t proto,
> -		  const char *in, size_t len)

I understand why you return ip(4)h + 1 here because I've just reviewed
9/14, I wouldn't know otherwise:

/**
 * tap_ip4_hdr() - Build IPv4 header for inbound packet, with checksum
 * @c:		Execution context
 * @src:	IPv4 source address, network order
 * @dst:	IPv4 destination address, network order
 * @len:	L4 payload length
 * @proto:	L4 protocol number
 *
 * Return: pointer to write payload to
 */
> +static void *tap_ip4_hdr(char *buf, in_addr_t src, in_addr_t dst,
> +			 size_t len, uint8_t proto)
>  {
> -	char buf[USHRT_MAX];
> -	struct iphdr *ip4h = (struct iphdr *)tap_l2_hdr(c, buf, ETH_P_IP);
> -	char *data = (char *)(ip4h + 1);
> +	struct iphdr *ip4h = (struct iphdr *)buf;
>  
>  	ip4h->version = 4;
>  	ip4h->ihl = sizeof(struct iphdr) / 4;
> @@ -151,20 +141,61 @@ void tap_ip4_send(const struct ctx *c, in_addr_t src, uint8_t proto,
>  	ip4h->ttl = 255;
>  	ip4h->protocol = proto;
>  	ip4h->saddr = src;
> -	ip4h->daddr = tap_ip4_daddr(c);
> +	ip4h->daddr = dst;
>  	csum_ip4_header(ip4h);
> +	return ip4h + 1;
> +}
> +
> +/**
> + * tap_udp4_send() - Send UDP over IPv4 packet
> + * @c:		Execution context
> + * @src:	IPv4 source address
> + * @sport:	UDP source port
> + * @dst:	IPv4 destination address
> + * @dport:	UDP destination port
> + * @in:		UDP payload contents (not including UDP header)
> + * @len:	UDP payload length (not including UDP header)
> + */
> +/* cppcheck-suppress unusedFunction */
> +void tap_udp4_send(const struct ctx *c, in_addr_t src, in_port_t sport,
> +		   in_addr_t dst, in_port_t dport,
> +		   const void *in, size_t len)
> +{
> +	size_t udplen = len + sizeof(struct udphdr);
> +	char buf[USHRT_MAX];
> +	void *ip4h = tap_l2_hdr(c, buf, ETH_P_IP);
> +	void *uhp = tap_ip4_hdr(ip4h, src, dst, udplen, IPPROTO_UDP);

Two observations:

- this saves one line and one cast, but it's really a bit unnatural that
  tap_ip4_hdr() doesn't point to the header it just made, or to nothing.

  I would rather have to +1 the return value or the original pointer
  instead or having this trick

> +	struct udphdr *uh = (struct udphdr *)uhp;
> +	char *data = (char *)(uh + 1);

- it's longer, but in my opinion clearer, if we split a bit more clearly
  the components of the packet, that is, something like (untested):

char buf[USHRT_MAX];
struct udphdr *uh;
struct iphdr *iph;
char *data;

iph = (struct iphdr *)tap_l2_hdr(c, buf, ETH_P_IP) + 1;
tap_ip_hdr(iph, src, dst, len + sizeof(uh), IPPROTO_UDP);

uh = (struct udphdr *)iph + 1;
uh->source = htons(sport);
uh->dest = htons(dport);
uh->len = htons(len + sizeof(uh));
csum_udp4(uh, src, dst, in, len);

data = uh + 1;
memcpy(data, in, len);

if (tap_send(c, buf, len + (data - buf)) < 0)
	debug("tap: failed to send %lu bytes (IPv4)", len);
>  
> +	uh->source = htons(sport);
> +	uh->dest = htons(dport);
> +	uh->len = htons(udplen);
> +	csum_udp4(uh, src, dst, in, len);
>  	memcpy(data, in, len);
>  
> -	if (ip4h->protocol == IPPROTO_UDP) {
> -		struct udphdr *uh = (struct udphdr *)(ip4h + 1);
> +	if (tap_send(c, buf, len + (data - buf)) < 0)
> +		debug("tap: failed to send %lu bytes (IPv4)", len);
> +}
>  
> -		csum_udp4(uh, ip4h->saddr, ip4h->daddr,
> -			  uh + 1, len - sizeof(*uh));
> -	} else if (ip4h->protocol == IPPROTO_ICMP) {
> -		struct icmphdr *ih = (struct icmphdr *)(ip4h + 1);
> -		csum_icmp4(ih, ih + 1, len - sizeof(*ih));
> -	}
> +/**
> + * tap_icmp4_send() - Send ICMPv4 packet
> + * @c:		Execution context
> + * @src:	IPv4 source address
> + * @dst:	IPv4 destination address
> + * @in:		ICMP packet, including ICMP header
> + * @len:	ICMP packet length, including ICMP header
> + */
> +void tap_icmp4_send(const struct ctx *c, in_addr_t src, in_addr_t dst,
> +		    void *in, size_t len)
> +{
> +	char buf[USHRT_MAX];
> +	void *ip4h = tap_l2_hdr(c, buf, ETH_P_IP);
> +	char *data = tap_ip4_hdr(ip4h, src, dst, len, IPPROTO_ICMP);
> +	struct icmphdr *icmp4h = (struct icmphdr *)data;

...same here, even though perhaps not so apparent.

> +
> +	memcpy(data, in, len);
> +	csum_icmp4(icmp4h, icmp4h + 1, len - sizeof(*icmp4h));
>  
>  	if (tap_send(c, buf, len + (data - buf)) < 0)
>  		debug("tap: failed to send %lu bytes (IPv4)", len);
> diff --git a/tap.h b/tap.h
> index d43c7a0..743bc58 100644
> --- a/tap.h
> +++ b/tap.h
> @@ -7,10 +7,13 @@
>  #define TAP_H
>  
>  in_addr_t tap_ip4_daddr(const struct ctx *c);
> +void tap_udp4_send(const struct ctx *c, in_addr_t src, in_port_t sport,
> +		   in_addr_t dst, in_port_t dport,
> +		   const void *in, size_t len);
> +void tap_icmp4_send(const struct ctx *c, in_addr_t src, in_addr_t dst,
> +		    void *in, size_t len);
>  const struct in6_addr *tap_ip6_daddr(const struct ctx *c,
>  				     const struct in6_addr *src);
> -void tap_ip4_send(const struct ctx *c, in_addr_t src, uint8_t proto,
> -		  const char *in, size_t len);
>  void tap_udp6_send(const struct ctx *c,
>  		   const struct in6_addr *src, in_port_t sport,
>  		   const struct in6_addr *dst, in_port_t dport,

-- 
Stefano


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 01/14] Add csum_icmp6() helper for calculating ICMPv6 checksums
  2022-10-18  3:01   ` Stefano Brivio
@ 2022-10-18 12:05     ` David Gibson
  0 siblings, 0 replies; 36+ messages in thread
From: David Gibson @ 2022-10-18 12:05 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 5171 bytes --]

On Tue, Oct 18, 2022 at 05:01:01AM +0200, Stefano Brivio wrote:
> On Mon, 17 Oct 2022 19:57:54 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > At least two places in passt calculate ICMPv6 checksums, ndp() and
> > tap_ip_send().  Add a helper to handle this calculation in both places.
> > For future flexibility, the new helper takes parameters for the fields in
> > the IPv6 pseudo-header, so an IPv6 header or pseudo-header doesn't need to
> > be explicitly constructed.  It also allows the ICMPv6 header and payload to
> > be in separate buffers, although we don't use this yet.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  checksum.c | 27 +++++++++++++++++++++++++++
> >  checksum.h |  7 +++++++
> >  ndp.c      |  5 +----
> >  tap.c      |  6 ++----
> >  4 files changed, 37 insertions(+), 8 deletions(-)
> > 
> > diff --git a/checksum.c b/checksum.c
> > index 56ad01e..0e207c8 100644
> > --- a/checksum.c
> > +++ b/checksum.c
> > @@ -52,6 +52,8 @@
> >  #include <stddef.h>
> >  #include <stdint.h>
> >  
> > +#include <linux/icmpv6.h>
> > +
> >  /**
> >   * sum_16b() - Calculate sum of 16-bit words
> >   * @buf:	Input buffer
> > @@ -105,6 +107,31 @@ uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init)
> >  	return (uint16_t)~csum_fold(sum_16b(buf, len) + init);
> >  }
> >  
> > +/**
> > + * csum_icmp6() - Calculate checksum for an ICMPv6 packet
> 
> "Calculate and set" ...?

Done.

> > + * @icmp6hr:	ICMPv6 header, initialized apart from checksum
> > + * @saddr:	IPv6 source address
> > + * @daddr:	IPv6 destination address
> > + * @payload:	ICMP packet payload
> > + * @len:	Length of @payload (not including ICMPv6 header)
> > + */
> > +void csum_icmp6(struct icmp6hdr *icmp6hr,
> > +		const struct in6_addr *saddr,
> > +		const struct in6_addr *daddr,
> 
> I think:
> 		const struct in6_addr *saddr, const struct in6_addr *daddr,
> 
> would be easier on eyes.

Done.  Not sure why I did it that way in the first place.

> > +		const void *payload,
> > +		size_t len)
> > +{
> > +	/* Partial checksum for the pseudo-IPv6 header */
> > +	uint32_t psum = sum_16b(saddr, sizeof(*saddr)) +
> > +		sum_16b(daddr, sizeof(*daddr)) +
> > +		htons(len + sizeof(*icmp6hr)) + htons(IPPROTO_ICMPV6);
> 
> Maybe:
> 
> 	uint32_t psum = sum_16b(saddr, sizeof(*saddr)) +
> 			sum_16b(daddr, sizeof(*daddr)) +
> 			htons(len + sizeof(*icmp6hr))  + htons(IPPROTO_ICMPV6);
> 
> for me, it turns things from "sum a bunch of things" to "addresses and
> something else".

Fair enough, done.

> > +
> > +	icmp6hr->icmp6_cksum = 0;
> > +	/* Add in partial checksum for the ICMPv6 header alone */
> > +	psum += sum_16b(icmp6hr, sizeof(*icmp6hr));
> > +	icmp6hr->icmp6_cksum = csum_unaligned(payload, len, psum);
> > +}
> > +
> >  /**
> >   * csum_tcp4() - Calculate TCP checksum for IPv4 and set in place
> >   * @iph:	Packet buffer, IP header
> > diff --git a/checksum.h b/checksum.h
> > index 5418406..2c72200 100644
> > --- a/checksum.h
> > +++ b/checksum.h
> > @@ -6,9 +6,16 @@
> >  #ifndef CHECKSUM_H
> >  #define CHECKSUM_H
> >  
> > +struct icmp6hdr;
> > +
> >  uint32_t sum_16b(const void *buf, size_t len);
> >  uint16_t csum_fold(uint32_t sum);
> >  uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init);
> > +void csum_icmp6(struct icmp6hdr *ih,
> > +		const struct in6_addr *saddr,
> > +		const struct in6_addr *daddr,
> > +		const void *payload,
> > +		size_t len);
> 
> It looks a bit like Haskell. ;) I would really use the horizontal space
> we have.
> 
> >  void csum_tcp4(struct iphdr *iph);
> >  uint16_t csum(const void *buf, size_t len, uint32_t init);
> >  
> > diff --git a/ndp.c b/ndp.c
> > index dec36a9..03f1d06 100644
> > --- a/ndp.c
> > +++ b/ndp.c
> > @@ -189,10 +189,7 @@ dns_done:
> >  		ip6hr->saddr = c->ip6.addr_ll;
> >  
> >  	ip6hr->payload_len = htons(sizeof(*ihr) + len);
> > -	ip6hr->hop_limit = IPPROTO_ICMPV6;
> > -	ihr->icmp6_cksum = 0;
> > -	ihr->icmp6_cksum = csum_unaligned(ip6hr, sizeof(*ip6hr) +
> > -						 sizeof(*ihr) + len, 0);
> > +	csum_icmp6(ihr, &ip6hr->saddr, &ip6hr->daddr, ihr + 1, len);
> 
> Nice to see this all going away!
> 
> >  	ip6hr->version = 6;
> >  	ip6hr->nexthdr = IPPROTO_ICMPV6;
> > diff --git a/tap.c b/tap.c
> > index 8b6d9bc..aafc92b 100644
> > --- a/tap.c
> > +++ b/tap.c
> > @@ -191,10 +191,8 @@ void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto,
> >  		} else if (proto == IPPROTO_ICMPV6) {
> >  			struct icmp6hdr *ih = (struct icmp6hdr *)(ip6h + 1);
> >  
> > -			ih->icmp6_cksum = 0;
> > -			ih->icmp6_cksum = csum_unaligned(ip6h,
> > -							 len + sizeof(*ip6h),
> > -							 0);
> > +			csum_icmp6(ih, &ip6h->saddr, &ip6h->daddr,
> > +				   ih + 1, len - sizeof(*ih));
> >  		}
> >  		ip6h->version = 6;
> >  		ip6h->nexthdr = proto;
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 02/14] Add csum_icmp4() helper for calculating ICMPv4 checksums
  2022-10-18  3:01   ` Stefano Brivio
@ 2022-10-18 12:06     ` David Gibson
  2022-10-18 12:28       ` Stefano Brivio
  0 siblings, 1 reply; 36+ messages in thread
From: David Gibson @ 2022-10-18 12:06 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 4051 bytes --]

On Tue, Oct 18, 2022 at 05:01:51AM +0200, Stefano Brivio wrote:
> On Mon, 17 Oct 2022 19:57:55 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > Although tap_ip_send() is currently the only place calculating ICMPv4
> > checksums, create a helper function for symmetry with ICMPv6.  For future
> > flexibility it allows the ICMPv6 header and payload to be in separate
> > buffers.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  checksum.c | 15 +++++++++++++++
> >  checksum.h |  2 ++
> >  tap.c      |  4 +---
> >  3 files changed, 18 insertions(+), 3 deletions(-)
> > 
> > diff --git a/checksum.c b/checksum.c
> > index 0e207c8..c8b6b42 100644
> > --- a/checksum.c
> > +++ b/checksum.c
> > @@ -52,6 +52,7 @@
> >  #include <stddef.h>
> >  #include <stdint.h>
> >  
> > +#include <linux/icmp.h>
> >  #include <linux/icmpv6.h>
> >  
> >  /**
> > @@ -107,6 +108,20 @@ uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init)
> >  	return (uint16_t)~csum_fold(sum_16b(buf, len) + init);
> >  }
> >  
> > +/**
> > + * csum_icmp4() - Calculate checksum for an ICMPv4 packet
> 
> "Calculate and set"?

Done.

> By the way, there's no such thing as ICMPv4 --
> it's ICMP.

Technically, yes, but I kind of wanted to make it clear at a glance
that these are IPv4 specific functions.  I'd also like to avoid the
implication that v4 is the "normal" sort.  I've changed from "ICMPv4"
to "ICMP" in the comments, but I've left the '4's in the various names

> > + * @icmp4hr:	ICMPv4 header, initialized apart from checksum
> 
> ...-ised, if you respin. For consistency, I would call this 'ih'.
> 
> > + * @payload:	ICMPv4 packet payload
> > + * @len:	Length of @payload (not including ICMPv4 header)
> > + */
> > +void csum_icmp4(struct icmphdr *icmp4hr, const void *payload, size_t len)
> 
> I guess csum_icmp() is preferable. Indeed, for TCP and UDP 'tcp4' and
> 'udp4' make sense because those are the same protocols over IPv4 and
> IPv6.

See above.

> > +{
> > +	/* Partial checksum for ICMPv4 header alone */
> > +	uint32_t hrsum = sum_16b(icmp4hr, sizeof(*icmp4hr));
> 
> A white line would be nice here.

Done.
> 
> I would call this psum (same as in csum_icmp6())

Changed to 'psum'.

> or hdrsum, 'hr' isn't
> really used for "header" elsewhere.

Well.. except as a suffix, 'ihr' etc.

> > +	icmp4hr->checksum = 0;
> > +	icmp4hr->checksum = csum_unaligned(payload, len, hrsum);
> > +}
> > +
> >  /**
> >   * csum_icmp6() - Calculate checksum for an ICMPv6 packet
> >   * @icmp6hr:	ICMPv6 header, initialized apart from checksum
> > diff --git a/checksum.h b/checksum.h
> > index 2c72200..ff95cf9 100644
> > --- a/checksum.h
> > +++ b/checksum.h
> > @@ -6,11 +6,13 @@
> >  #ifndef CHECKSUM_H
> >  #define CHECKSUM_H
> >  
> > +struct icmphdr;
> >  struct icmp6hdr;
> >  
> >  uint32_t sum_16b(const void *buf, size_t len);
> >  uint16_t csum_fold(uint32_t sum);
> >  uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init);
> > +void csum_icmp4(struct icmphdr *ih, const void *payload, size_t len);
> >  void csum_icmp6(struct icmp6hdr *ih,
> >  		const struct in6_addr *saddr,
> >  		const struct in6_addr *daddr,
> > diff --git a/tap.c b/tap.c
> > index aafc92b..f082901 100644
> > --- a/tap.c
> > +++ b/tap.c
> > @@ -148,9 +148,7 @@ void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto,
> >  			uh->check = 0;
> >  		} else if (iph->protocol == IPPROTO_ICMP) {
> >  			struct icmphdr *ih = (struct icmphdr *)(iph + 1);
> > -
> > -			ih->checksum = 0;
> > -			ih->checksum = csum_unaligned(ih, len, 0);
> > +			csum_icmp4(ih, ih + 1, len - sizeof(*ih));
> >  		}
> >  
> >  		if (tap_send(c, buf, len + sizeof(*iph) + sizeof(*eh), 1) < 0)
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 03/14] Add csum_udp6() helper for calculating UDP over IPv6 checksums
  2022-10-18  3:02   ` Stefano Brivio
@ 2022-10-18 12:06     ` David Gibson
  0 siblings, 0 replies; 36+ messages in thread
From: David Gibson @ 2022-10-18 12:06 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 4255 bytes --]

On Tue, Oct 18, 2022 at 05:02:31AM +0200, Stefano Brivio wrote:
> On Mon, 17 Oct 2022 19:57:56 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > Add a helper for calculating UDP checksums when used over IPv6
> > For future flexibility, the new helper takes parameters for the fields in
> > the IPv6 pseudo-header, so an IPv6 header or pseudo-header doesn't need to
> > be explicitly constructed.  It also allows the UDP header and payload to
> > be in separate buffers, although we don't use this yet.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  checksum.c | 23 +++++++++++++++++++++++
> >  checksum.h |  5 +++++
> >  tap.c      |  5 ++---
> >  3 files changed, 30 insertions(+), 3 deletions(-)
> > 
> > diff --git a/checksum.c b/checksum.c
> > index c8b6b42..0849fb1 100644
> > --- a/checksum.c
> > +++ b/checksum.c
> > @@ -52,6 +52,7 @@
> >  #include <stddef.h>
> >  #include <stdint.h>
> >  
> > +#include <linux/udp.h>
> >  #include <linux/icmp.h>
> >  #include <linux/icmpv6.h>
> >  
> > @@ -122,6 +123,28 @@ void csum_icmp4(struct icmphdr *icmp4hr, const void *payload, size_t len)
> >  	icmp4hr->checksum = csum_unaligned(payload, len, hrsum);
> >  }
> >  
> > +/**
> > + * csum_udp6() - Calculate checksum for a UDP over IPv6 packet
> 
> Calculate and set.

Done.

> > + * @udp6hr:	UDP header, initialized apart from checksum
> 
> -ised.

Done.

> > + * @payload:	UDP packet payload
> > + * @len:	Length of @payload (not including UDP header)
> > + */
> > +void csum_udp6(struct udphdr *udp6hr,
> > +	       const struct in6_addr *saddr,
> > +	       const struct in6_addr *daddr,
> 
> You could use some horizontal space.

Done.

> > +	       const void *payload, size_t len)
> > +{
> > +	/* Partial checksum for the pseudo-IPv6 header */
> > +	uint32_t psum = sum_16b(saddr, sizeof(*saddr)) +
> > +		sum_16b(daddr, sizeof(*daddr)) +
> > +		htons(len + sizeof(*udp6hr)) + htons(IPPROTO_UDP);
> 
> Alignment:
> 
> 	uint32_t psum = sum_16b(saddr, sizeof(*saddr)) +
> 			sum_16b(daddr, sizeof(*daddr)) +
> 			htons(len + sizeof(*udp6hr))   + htons(IPPROTO_UDP);

Done.

> > +	udp6hr->check = 0;
> > +	/* Add in partial checksum for the UDP header alone */
> > +	psum += sum_16b(udp6hr, sizeof(*udp6hr));
> > +	udp6hr->check = csum_unaligned(payload, len, psum);
> > +}
> > +
> >  /**
> >   * csum_icmp6() - Calculate checksum for an ICMPv6 packet
> >   * @icmp6hr:	ICMPv6 header, initialized apart from checksum
> > diff --git a/checksum.h b/checksum.h
> > index ff95cf9..1b9f48e 100644
> > --- a/checksum.h
> > +++ b/checksum.h
> > @@ -6,6 +6,7 @@
> >  #ifndef CHECKSUM_H
> >  #define CHECKSUM_H
> >  
> > +struct udphdr;
> >  struct icmphdr;
> >  struct icmp6hdr;
> >  
> > @@ -13,6 +14,10 @@ uint32_t sum_16b(const void *buf, size_t len);
> >  uint16_t csum_fold(uint32_t sum);
> >  uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init);
> >  void csum_icmp4(struct icmphdr *ih, const void *payload, size_t len);
> > +void csum_udp6(struct udphdr *udp6hr,
> > +	       const struct in6_addr *saddr,
> > +	       const struct in6_addr *daddr,
> > +	       const void *payload, size_t len);
> 
> Use some horizontal space.

Done.

> >  void csum_icmp6(struct icmp6hdr *ih,
> >  		const struct in6_addr *saddr,
> >  		const struct in6_addr *daddr,
> > diff --git a/tap.c b/tap.c
> > index f082901..9c197cb 100644
> > --- a/tap.c
> > +++ b/tap.c
> > @@ -183,9 +183,8 @@ void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto,
> >  		} else if (proto == IPPROTO_UDP) {
> >  			struct udphdr *uh = (struct udphdr *)(ip6h + 1);
> >  
> > -			uh->check = 0;
> > -			uh->check = csum_unaligned(ip6h, len + sizeof(*ip6h),
> > -						   0);
> > +			csum_udp6(uh, &ip6h->saddr, &ip6h->daddr,
> > +				  uh + 1, len - sizeof(*uh));
> >  		} else if (proto == IPPROTO_ICMPV6) {
> >  			struct icmp6hdr *ih = (struct icmp6hdr *)(ip6h + 1);
> >  
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 04/14] Add csum_udp4() helper for calculating UDP over IPv4 checksums
  2022-10-18  3:03   ` Stefano Brivio
@ 2022-10-18 12:06     ` David Gibson
  0 siblings, 0 replies; 36+ messages in thread
From: David Gibson @ 2022-10-18 12:06 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 3490 bytes --]

On Tue, Oct 18, 2022 at 05:03:09AM +0200, Stefano Brivio wrote:
11;rgb:ffff/ffff/ffff> On Mon, 17 Oct 2022 19:57:57 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > At least two places in passt fill in UDP over IPv4 checksums, although
> > since UDP checksums are optional with IPv4 that just amounts to storing
> > a 0 (in tap_ip_send()) or leaving a 0 from an earlier initialization (in
> > dhcp()).  For consistency, add a helper for this "calculation".
> > 
> > Just for the heck of it, add the option (compile time disabled for now) to
> > calculate real UDP checksums.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  checksum.c | 33 +++++++++++++++++++++++++++++++++
> >  checksum.h |  3 +++
> >  dhcp.c     |  2 +-
> >  tap.c      |  2 +-
> >  4 files changed, 38 insertions(+), 2 deletions(-)
> > 
> > diff --git a/checksum.c b/checksum.c
> > index 0849fb1..72f1cfb 100644
> > --- a/checksum.c
> > +++ b/checksum.c
> > @@ -56,6 +56,11 @@
> >  #include <linux/icmp.h>
> >  #include <linux/icmpv6.h>
> >  
> > +/* Checksums are optional for UDP over IPv4, so we usually just set
> > + * them to 0.  Change this 1 to calculate real UDP over IPv4 checksums
> 
> to 1

Done.

> > + */
> > +#define UDP4_REAL_CHECKSUMS	0
> > +
> >  /**
> >   * sum_16b() - Calculate sum of 16-bit words
> >   * @buf:	Input buffer
> > @@ -109,6 +114,34 @@ uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init)
> >  	return (uint16_t)~csum_fold(sum_16b(buf, len) + init);
> >  }
> >  
> > +/**
> > + * csum_udp4() - Calculate checksum for a UDP over IPv4 packet
> 
> and set

Done.

> > + * @udp4hr:	UDP header, initialized apart from checksum
> > + * @saddr:	IPv4 source address
> > + * @daddr:	IPv4 destination address
> > + * @payload:	ICMPv4 packet payload
> > + * @len:	Length of @payload (not including UDP)
> > + */
> > +void csum_udp4(struct udphdr *udp4hr,
> > +	       in_addr_t saddr, in_addr_t daddr,
> > +	       const void *payload, size_t len)
> > +{
> > +	/* UDP checksums are optional, so don't bother */
> > +	udp4hr->check = 0;
> > +
> > +	if (UDP4_REAL_CHECKSUMS) {
> > +		/* UNTESTED: if we did want real UDPv4 checksums, this
> > +		 * is roughly what we'd need */
> > +		uint32_t psum = csum_fold(htonl(saddr))
> > +			+ csum_fold(htonl(daddr))
> > +			+ htons(len + sizeof(*udp4hr))
> > +			+ htons(IPPROTO_UDP);
> > +		/* Add in partial checksum for the UDP header alone */
> > +		psum += sum_16b(udp4hr, sizeof(*udp4hr));
> > +		udp4hr->check = csum_unaligned(payload, len, psum);
> > +	}
> > +}
> > +
> >  /**
> >   * csum_icmp4() - Calculate checksum for an ICMPv4 packet
> >   * @icmp4hr:	ICMPv4 header, initialized apart from checksum
> > diff --git a/checksum.h b/checksum.h
> > index 1b9f48e..a9502b9 100644
> > --- a/checksum.h
> > +++ b/checksum.h
> > @@ -13,6 +13,9 @@ struct icmp6hdr;
> >  uint32_t sum_16b(const void *buf, size_t len);
> >  uint16_t csum_fold(uint32_t sum);
> >  uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init);
> > +void csum_udp4(struct udphdr *udp4hr,
> > +	       in_addr_t saddr, in_addr_t daddr,
> > +	       const void *payload, size_t len);
> 
> Horizontal space.

Done.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 05/14] Add csum_ip4_header() helper to calculate IPv4 header checksums
  2022-10-18  3:03   ` Stefano Brivio
@ 2022-10-18 12:07     ` David Gibson
  0 siblings, 0 replies; 36+ messages in thread
From: David Gibson @ 2022-10-18 12:07 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 1620 bytes --]

On Tue, Oct 18, 2022 at 05:03:49AM +0200, Stefano Brivio wrote:
> On Mon, 17 Oct 2022 19:57:58 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > We calculate IPv4 header checksums in at least two places, in dhcp() and
> > in tap_ip_send.  Add a helper to handle this calculation in both places.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  checksum.c | 6 ++++++
> >  checksum.h | 1 +
> >  dhcp.c     | 3 +--
> >  tap.c      | 3 +--
> >  4 files changed, 9 insertions(+), 4 deletions(-)
> > 
> > diff --git a/checksum.c b/checksum.c
> > index 72f1cfb..f25a96a 100644
> > --- a/checksum.c
> > +++ b/checksum.c
> > @@ -114,6 +114,12 @@ uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init)
> >  	return (uint16_t)~csum_fold(sum_16b(buf, len) + init);
> >  }
> >  
> 
> /**
>  * csum_ip4_header() - Calculate and set IPv4 header checksum
>  * @iph:	IPv4 header
>  */
> 
> ...I just tried to run Doxygen, I think it would be nice to have
> eventually (especially for DOT call graphs), things don't look too bad.
> 
> > +void csum_ip4_header(struct iphdr *ip4hr)
> > +{
> > +	ip4hr->check = 0;
> > +	ip4hr->check = csum_unaligned(ip4hr, (size_t)ip4hr->ihl * 4, 0);
> 
> iph, for consistency.

As noted before, I'd prefer to avoid the implication that IPv4 is
normal and IPv6 is special.  I have changed to just ip4h, though.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 06/14] Add helpers for normal inbound packet destination addresses
  2022-10-18  3:04   ` Stefano Brivio
@ 2022-10-18 12:07     ` David Gibson
  0 siblings, 0 replies; 36+ messages in thread
From: David Gibson @ 2022-10-18 12:07 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 3472 bytes --]

On Tue, Oct 18, 2022 at 05:04:41AM +0200, Stefano Brivio wrote:
11;rgb:ffff/ffff/ffff> On Mon, 17 Oct 2022 19:57:59 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > tap_ip_send() doesn't take a destination address, because it's specifically
> > for inbound packets, and the IP addresses of the guest/namespace are
> > already known to us.  Rather than open-coding this destination address
> > logic, make helper functions for it which will enable some later cleanups.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  tap.c | 29 ++++++++++++++++++++++++-----
> >  tap.h |  3 +++
> >  2 files changed, 27 insertions(+), 5 deletions(-)
> > 
> > diff --git a/tap.c b/tap.c
> > index de02c56..41e8ff2 100644
> > --- a/tap.c
> > +++ b/tap.c
> > @@ -96,6 +96,28 @@ int tap_send(const struct ctx *c, const void *data, size_t len, int vnet_pre)
> >  	return write(c->fd_tap, (char *)data + (vnet_pre ? 4 : 0), len);
> >  }
> >  
> > +/**
> > + * tap_ip4_daddr() - Normal IPv4 destination address for inbound packets
> > + * @c:		Execution context
> 
> Given that the address is returned in network order, I think this would
> be relevant here:
> 
>  * Return: IPv4 address, network order

Done.

> > + */
> > +in_addr_t tap_ip4_daddr(const struct ctx *c)
> > +{
> > +	return c->ip4.addr_seen;
> > +}
> > +
> > +/**
> > + * tap_ip6_daddr() - Normal IPv4 destination address for inbound packets
> > + * @c:		Execution context
> > + * @src:	Source address
> 
>  * Return: pointer to IPv6 address

Done.

> > + */
> > +const struct in6_addr *tap_ip6_daddr(const struct ctx *c,
> > +				     const struct in6_addr *src)
> > +{
> > +	if (IN6_IS_ADDR_LINKLOCAL(src))
> > +		return &c->ip6.addr_ll_seen;
> > +	return &c->ip6.addr_seen;
> > +}
> > +
> >  /**
> >   * tap_ip_send() - Send IP packet, with L2 headers, calculating L3/L4 checksums
> >   * @c:		Execution context
> > @@ -132,7 +154,7 @@ void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto,
> >  		iph->frag_off = 0;
> >  		iph->ttl = 255;
> >  		iph->protocol = proto;
> > -		iph->daddr = c->ip4.addr_seen;
> > +		iph->daddr = tap_ip4_daddr(c);
> >  		memcpy(&iph->saddr, &src->s6_addr[12], 4);
> >  
> >  		csum_ip4_header(iph);
> > @@ -163,10 +185,7 @@ void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto,
> >  		ip6h->priority = 0;
> >  
> >  		ip6h->saddr = *src;
> > -		if (IN6_IS_ADDR_LINKLOCAL(src))
> > -			ip6h->daddr = c->ip6.addr_ll_seen;
> > -		else
> > -			ip6h->daddr = c->ip6.addr_seen;
> > +		ip6h->daddr = *tap_ip6_daddr(c, src);
> >  
> >  		memcpy(data, in, len);
> >  
> > diff --git a/tap.h b/tap.h
> > index df3aec0..a6764b4 100644
> > --- a/tap.h
> > +++ b/tap.h
> > @@ -6,6 +6,9 @@
> >  #ifndef TAP_H
> >  #define TAP_H
> >  
> > +in_addr_t tap_ip4_daddr(const struct ctx *c);
> > +const struct in6_addr *tap_ip6_daddr(const struct ctx *c,
> > +				     const struct in6_addr *src);
> >  void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto,
> >  		 const char *in, size_t len, uint32_t flow);
> >  int tap_send(const struct ctx *c, const void *data, size_t len, int vnet_pre);
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 08/14] tap: Remove unhelpeful vnet_pre optimization from tap_send()
  2022-10-18  3:05   ` Stefano Brivio
@ 2022-10-18 12:07     ` David Gibson
  0 siblings, 0 replies; 36+ messages in thread
From: David Gibson @ 2022-10-18 12:07 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 893 bytes --]

On Tue, Oct 18, 2022 at 05:05:13AM +0200, Stefano Brivio wrote:
> On Mon, 17 Oct 2022 19:58:01 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > Callers of tap_send() can optionally use a small optimization by adding
> > extra space for the 4 byte length header used on the qemu socket interface.
> > tap_ip_send() is currently the only user of this, but this is used only
> > for "slow path" ICMP and DHCP packets, so there's not a lot of value to
> > the optimization.
> 
> Not anymore, definitely. It used to look quite bad in perf(1) when I
> shuffled connection data around.

Yeah.  In any case I think I can get an equivalent optimization back
at some point.


-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 09/14] Split tap_ip_send() into IPv4 and IPv6 specific functions
  2022-10-18  3:06   ` Stefano Brivio
@ 2022-10-18 12:07     ` David Gibson
  0 siblings, 0 replies; 36+ messages in thread
From: David Gibson @ 2022-10-18 12:07 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 4722 bytes --]

On Tue, Oct 18, 2022 at 05:06:11AM +0200, Stefano Brivio wrote:
> On Mon, 17 Oct 2022 19:58:02 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > The IPv4 and IPv6 paths in tap_ip_send() have very little in common, and
> > it turns out that every caller (statically) knows if it is using IPv4 or
> > IPv6.  So split into separate tap_ip4_send() and tap_ip6_send() functions.
> > Use a new tap_l2_hdr() function for the very small common part.
> > 
> > While we're there, make some minor cleanups:
> >   - We were double writing some fields in the IPv6 header, so that it
> >     temporary matched the pseudo-header for checksum calculation.  With
> >     recent checksum reworks, this isn't neccessary any more.
> >   - We don't use any IPv4 header options, so use some sizeof() constructs
> >     instead of some open coded values for header length.
> >   - The comment used to say that the flow label was for TCP over IPv6, but
> >     in fact the only thing we used it for was ICMPv6
> 
> ...right, this used to be the data path.

Makes sense.  Realized that comment isn't quite correct, because it
was DHCPv6 rather than ICMPv6 traffic that got the flow labels.

> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  dhcpv6.c |   6 +-
> >  icmp.c   |  10 +---
> >  tap.c    | 176 +++++++++++++++++++++++++++++--------------------------
> >  tap.h    |   6 +-
> >  4 files changed, 102 insertions(+), 96 deletions(-)
> > 
> > diff --git a/dhcpv6.c b/dhcpv6.c
> > index e7640ce..7829968 100644
> > --- a/dhcpv6.c
> > +++ b/dhcpv6.c
> > @@ -531,8 +531,8 @@ int dhcpv6(struct ctx *c, const struct pool *p,
> >  
> >  			resp_not_on_link.hdr.xid = mh->xid;
> >  
> > -			tap_ip_send(c, src, IPPROTO_UDP,
> > -				    (char *)&resp_not_on_link, n, mh->xid);
> > +			tap_ip6_send(c, src, IPPROTO_UDP,
> > +				     (char *)&resp_not_on_link, n, mh->xid);
> >  
> >  			return 1;
> >  		}
> > @@ -580,7 +580,7 @@ int dhcpv6(struct ctx *c, const struct pool *p,
> >  
> >  	resp.hdr.xid = mh->xid;
> >  
> > -	tap_ip_send(c, src, IPPROTO_UDP, (char *)&resp, n, mh->xid);
> > +	tap_ip6_send(c, src, IPPROTO_UDP, (char *)&resp, n, mh->xid);
> >  	c->ip6.addr_seen = c->ip6.addr;
> >  
> >  	return 1;
> > diff --git a/icmp.c b/icmp.c
> > index 21ea2d7..61c2d90 100644
> > --- a/icmp.c
> > +++ b/icmp.c
> > @@ -69,10 +69,6 @@ static uint8_t icmp_act[IP_VERSIONS][DIV_ROUND_UP(ICMP_NUM_IDS, 8)];
> >  void icmp_sock_handler(const struct ctx *c, union epoll_ref ref,
> >  		       uint32_t events, const struct timespec *now)
> >  {
> > -	struct in6_addr a6 = { .s6_addr = {    0,    0,    0,    0,
> > -					       0,    0,    0,    0,
> > -					       0,    0, 0xff, 0xff,
> > -					       0,    0,    0,    0 } };
> >  	union icmp_epoll_ref *iref = &ref.r.p.icmp;
> >  	struct sockaddr_storage sr;
> >  	socklen_t sl = sizeof(sr);
> > @@ -109,7 +105,7 @@ void icmp_sock_handler(const struct ctx *c, union epoll_ref ref,
> >  			icmp_id_map[V6][id].seq = seq;
> >  		}
> >  
> > -		tap_ip_send(c, &sr6->sin6_addr, IPPROTO_ICMPV6, buf, n, 0);
> > +		tap_ip6_send(c, &sr6->sin6_addr, IPPROTO_ICMPV6, buf, n, 0);
> >  	} else {
> >  		struct sockaddr_in *sr4 = (struct sockaddr_in *)&sr;
> >  		struct icmphdr *ih = (struct icmphdr *)buf;
> > @@ -127,9 +123,7 @@ void icmp_sock_handler(const struct ctx *c, union epoll_ref ref,
> >  			icmp_id_map[V4][id].seq = seq;
> >  		}
> >  
> > -		memcpy(&a6.s6_addr[12], &sr4->sin_addr, sizeof(sr4->sin_addr));
> > -
> > -		tap_ip_send(c, &a6, IPPROTO_ICMP, buf, n, 0);
> > +		tap_ip4_send(c, sr4->sin_addr.s_addr, IPPROTO_ICMP, buf, n);
> >  	}
> >  }
> >  
> > diff --git a/tap.c b/tap.c
> > index ae75fac..45547ac 100644
> > --- a/tap.c
> > +++ b/tap.c
> > @@ -109,100 +109,110 @@ const struct in6_addr *tap_ip6_daddr(const struct ctx *c,
> >  }
> >  
> >  /**
> > - * tap_ip_send() - Send IP packet, with L2 headers, calculating L3/L4 checksums
> > + * tap_l2_hdr() - Build an L2 header for an inbound packet
> >   * @c:		Execution context
> > - * @src:	IPv6 source address, IPv4-mapped for IPv4 sources
> > - * @proto:	L4 protocol number
> > - * @in:		Payload
> > - * @len:	L4 payload length
> > - * @flow:	Flow label for TCP over IPv6
> > + * @buf:	Buffer address at which to generate header
> > + * @proto:	Ethernet protocol number for L3
> > + *
> > + * Returns a pointer at which to write the packet's payload
> 
>  * Return: ...

Done.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 13/14] tap: Split tap_ip4_send() into UDP and ICMP variants
  2022-10-18  3:06   ` Stefano Brivio
@ 2022-10-18 12:07     ` David Gibson
  2022-10-18 12:27       ` Stefano Brivio
  0 siblings, 1 reply; 36+ messages in thread
From: David Gibson @ 2022-10-18 12:07 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 8680 bytes --]

On Tue, Oct 18, 2022 at 05:06:34AM +0200, Stefano Brivio wrote:
> On Mon, 17 Oct 2022 19:58:06 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > tap_ip4_send() has special case logic to compute the checksums for UDP
> > and ICMP packets, which is a mild layering violation.  By using a suitable
> > helper we can split it into tap_udp4_send() and tap_icmp4_send() functions
> > without greatly increasing the code size, this removing that layering
> > violation.
> > 
> > We make some small changes to the interface while there.  In both cases
> > we make the destination IPv4 address a parameter, which will be useful
> > later.  For the UDP variant we make it take just the UDP payload, and it
> > will generate the UDP header.  For the ICMP variant we pass in the ICMP
> > header as before.  The inconsistency is because that's what seems to be
> > the more natural way to invoke the function in the callers in each case.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  icmp.c |  3 ++-
> >  tap.c  | 75 +++++++++++++++++++++++++++++++++++++++++-----------------
> >  tap.h  |  7 ++++--
> >  3 files changed, 60 insertions(+), 25 deletions(-)
> > 
> > diff --git a/icmp.c b/icmp.c
> > index 6493ea9..233acf9 100644
> > --- a/icmp.c
> > +++ b/icmp.c
> > @@ -124,7 +124,8 @@ void icmp_sock_handler(const struct ctx *c, union epoll_ref ref,
> >  			icmp_id_map[V4][id].seq = seq;
> >  		}
> >  
> > -		tap_ip4_send(c, sr4->sin_addr.s_addr, IPPROTO_ICMP, buf, n);
> > +		tap_icmp4_send(c, sr4->sin_addr.s_addr, tap_ip4_daddr(c),
> > +			       buf, n);
> >  	}
> >  }
> >  
> > diff --git a/tap.c b/tap.c
> > index 274f4ba..5792880 100644
> > --- a/tap.c
> > +++ b/tap.c
> > @@ -127,20 +127,10 @@ static void *tap_l2_hdr(const struct ctx *c, void *buf, uint16_t proto)
> >  	return eh + 1;
> >  }
> >  
> > -/**
> > - * tap_ip4_send() - Send IPv4 packet, with L2 headers, calculating L3/L4 checksums
> > - * @c:		Execution context
> > - * @src:	IPv4 source address
> > - * @proto:	L4 protocol number
> > - * @in:		Payload
> > - * @len:	L4 payload length
> > - */
> > -void tap_ip4_send(const struct ctx *c, in_addr_t src, uint8_t proto,
> > -		  const char *in, size_t len)
> 
> I understand why you return ip(4)h + 1 here because I've just reviewed
> 9/14, I wouldn't know otherwise:
> 
> /**
>  * tap_ip4_hdr() - Build IPv4 header for inbound packet, with checksum
>  * @c:		Execution context
>  * @src:	IPv4 source address, network order
>  * @dst:	IPv4 destination address, network order
>  * @len:	L4 payload length
>  * @proto:	L4 protocol number
>  *
>  * Return: pointer to write payload to
>  */

Oops, yes, forgot to add a function comment.  Done.

> > +static void *tap_ip4_hdr(char *buf, in_addr_t src, in_addr_t dst,
> > +			 size_t len, uint8_t proto)
> >  {
> > -	char buf[USHRT_MAX];
> > -	struct iphdr *ip4h = (struct iphdr *)tap_l2_hdr(c, buf, ETH_P_IP);
> > -	char *data = (char *)(ip4h + 1);
> > +	struct iphdr *ip4h = (struct iphdr *)buf;
> >  
> >  	ip4h->version = 4;
> >  	ip4h->ihl = sizeof(struct iphdr) / 4;
> > @@ -151,20 +141,61 @@ void tap_ip4_send(const struct ctx *c, in_addr_t src, uint8_t proto,
> >  	ip4h->ttl = 255;
> >  	ip4h->protocol = proto;
> >  	ip4h->saddr = src;
> > -	ip4h->daddr = tap_ip4_daddr(c);
> > +	ip4h->daddr = dst;
> >  	csum_ip4_header(ip4h);
> > +	return ip4h + 1;
> > +}
> > +
> > +/**
> > + * tap_udp4_send() - Send UDP over IPv4 packet
> > + * @c:		Execution context
> > + * @src:	IPv4 source address
> > + * @sport:	UDP source port
> > + * @dst:	IPv4 destination address
> > + * @dport:	UDP destination port
> > + * @in:		UDP payload contents (not including UDP header)
> > + * @len:	UDP payload length (not including UDP header)
> > + */
> > +/* cppcheck-suppress unusedFunction */
> > +void tap_udp4_send(const struct ctx *c, in_addr_t src, in_port_t sport,
> > +		   in_addr_t dst, in_port_t dport,
> > +		   const void *in, size_t len)
> > +{
> > +	size_t udplen = len + sizeof(struct udphdr);
> > +	char buf[USHRT_MAX];
> > +	void *ip4h = tap_l2_hdr(c, buf, ETH_P_IP);
> > +	void *uhp = tap_ip4_hdr(ip4h, src, dst, udplen, IPPROTO_UDP);
> 
> Two observations:
> 
> - this saves one line and one cast, but it's really a bit unnatural that
>   tap_ip4_hdr() doesn't point to the header it just made, or to nothing.
> 
>   I would rather have to +1 the return value or the original pointer
>   instead or having this trick
> 
> > +	struct udphdr *uh = (struct udphdr *)uhp;
> > +	char *data = (char *)(uh + 1);
> 
> - it's longer, but in my opinion clearer, if we split a bit more clearly
>   the components of the packet, that is, something like (untested):

I don't really want to change this.  Yes, it's a bit counterintuitive
at first blush, but there's a reason for this approach.

This style of a function which generates a header then points *after*
it works even if the header it generates is of variable length.
Advancing to the payload in the caller doesn't (at least not without
breaking the abstraction I'm trying to generate with these helpers).

That's not just theoretical, because at some point I'd like to extend
the l2_hdr function to also allocate space for the qemu socket length
header.

I'm certainly open to name changes to make this behaviour more
obvious, but I think returning the payload pointer not the header
pointer makes for a better abstraction here.

> char buf[USHRT_MAX];
> struct udphdr *uh;
> struct iphdr *iph;
> char *data;
> 
> iph = (struct iphdr *)tap_l2_hdr(c, buf, ETH_P_IP) + 1;
> tap_ip_hdr(iph, src, dst, len + sizeof(uh), IPPROTO_UDP);
> 
> uh = (struct udphdr *)iph + 1;
> uh->source = htons(sport);
> uh->dest = htons(dport);
> uh->len = htons(len + sizeof(uh));
> csum_udp4(uh, src, dst, in, len);
> 
> data = uh + 1;
> memcpy(data, in, len);
> 
> if (tap_send(c, buf, len + (data - buf)) < 0)
> 	debug("tap: failed to send %lu bytes (IPv4)", len);
> >  
> > +	uh->source = htons(sport);
> > +	uh->dest = htons(dport);
> > +	uh->len = htons(udplen);
> > +	csum_udp4(uh, src, dst, in, len);
> >  	memcpy(data, in, len);
> >  
> > -	if (ip4h->protocol == IPPROTO_UDP) {
> > -		struct udphdr *uh = (struct udphdr *)(ip4h + 1);
> > +	if (tap_send(c, buf, len + (data - buf)) < 0)
> > +		debug("tap: failed to send %lu bytes (IPv4)", len);
> > +}
> >  
> > -		csum_udp4(uh, ip4h->saddr, ip4h->daddr,
> > -			  uh + 1, len - sizeof(*uh));
> > -	} else if (ip4h->protocol == IPPROTO_ICMP) {
> > -		struct icmphdr *ih = (struct icmphdr *)(ip4h + 1);
> > -		csum_icmp4(ih, ih + 1, len - sizeof(*ih));
> > -	}
> > +/**
> > + * tap_icmp4_send() - Send ICMPv4 packet
> > + * @c:		Execution context
> > + * @src:	IPv4 source address
> > + * @dst:	IPv4 destination address
> > + * @in:		ICMP packet, including ICMP header
> > + * @len:	ICMP packet length, including ICMP header
> > + */
> > +void tap_icmp4_send(const struct ctx *c, in_addr_t src, in_addr_t dst,
> > +		    void *in, size_t len)
> > +{
> > +	char buf[USHRT_MAX];
> > +	void *ip4h = tap_l2_hdr(c, buf, ETH_P_IP);
> > +	char *data = tap_ip4_hdr(ip4h, src, dst, len, IPPROTO_ICMP);
> > +	struct icmphdr *icmp4h = (struct icmphdr *)data;
> 
> ...same here, even though perhaps not so apparent.
> 
> > +
> > +	memcpy(data, in, len);
> > +	csum_icmp4(icmp4h, icmp4h + 1, len - sizeof(*icmp4h));
> >  
> >  	if (tap_send(c, buf, len + (data - buf)) < 0)
> >  		debug("tap: failed to send %lu bytes (IPv4)", len);
> > diff --git a/tap.h b/tap.h
> > index d43c7a0..743bc58 100644
> > --- a/tap.h
> > +++ b/tap.h
> > @@ -7,10 +7,13 @@
> >  #define TAP_H
> >  
> >  in_addr_t tap_ip4_daddr(const struct ctx *c);
> > +void tap_udp4_send(const struct ctx *c, in_addr_t src, in_port_t sport,
> > +		   in_addr_t dst, in_port_t dport,
> > +		   const void *in, size_t len);
> > +void tap_icmp4_send(const struct ctx *c, in_addr_t src, in_addr_t dst,
> > +		    void *in, size_t len);
> >  const struct in6_addr *tap_ip6_daddr(const struct ctx *c,
> >  				     const struct in6_addr *src);
> > -void tap_ip4_send(const struct ctx *c, in_addr_t src, uint8_t proto,
> > -		  const char *in, size_t len);
> >  void tap_udp6_send(const struct ctx *c,
> >  		   const struct in6_addr *src, in_port_t sport,
> >  		   const struct in6_addr *dst, in_port_t dport,
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 13/14] tap: Split tap_ip4_send() into UDP and ICMP variants
  2022-10-18 12:07     ` David Gibson
@ 2022-10-18 12:27       ` Stefano Brivio
  2022-10-18 23:54         ` David Gibson
  0 siblings, 1 reply; 36+ messages in thread
From: Stefano Brivio @ 2022-10-18 12:27 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Tue, 18 Oct 2022 23:07:58 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Tue, Oct 18, 2022 at 05:06:34AM +0200, Stefano Brivio wrote:
> > On Mon, 17 Oct 2022 19:58:06 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> > > tap_ip4_send() has special case logic to compute the checksums for UDP
> > > and ICMP packets, which is a mild layering violation.  By using a suitable
> > > helper we can split it into tap_udp4_send() and tap_icmp4_send() functions
> > > without greatly increasing the code size, this removing that layering
> > > violation.
> > > 
> > > We make some small changes to the interface while there.  In both cases
> > > we make the destination IPv4 address a parameter, which will be useful
> > > later.  For the UDP variant we make it take just the UDP payload, and it
> > > will generate the UDP header.  For the ICMP variant we pass in the ICMP
> > > header as before.  The inconsistency is because that's what seems to be
> > > the more natural way to invoke the function in the callers in each case.
> > > 
> > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > > ---
> > >  icmp.c |  3 ++-
> > >  tap.c  | 75 +++++++++++++++++++++++++++++++++++++++++-----------------
> > >  tap.h  |  7 ++++--
> > >  3 files changed, 60 insertions(+), 25 deletions(-)
> > > 
> > > diff --git a/icmp.c b/icmp.c
> > > index 6493ea9..233acf9 100644
> > > --- a/icmp.c
> > > +++ b/icmp.c
> > > @@ -124,7 +124,8 @@ void icmp_sock_handler(const struct ctx *c, union epoll_ref ref,
> > >  			icmp_id_map[V4][id].seq = seq;
> > >  		}
> > >  
> > > -		tap_ip4_send(c, sr4->sin_addr.s_addr, IPPROTO_ICMP, buf, n);
> > > +		tap_icmp4_send(c, sr4->sin_addr.s_addr, tap_ip4_daddr(c),
> > > +			       buf, n);
> > >  	}
> > >  }
> > >  
> > > diff --git a/tap.c b/tap.c
> > > index 274f4ba..5792880 100644
> > > --- a/tap.c
> > > +++ b/tap.c
> > > @@ -127,20 +127,10 @@ static void *tap_l2_hdr(const struct ctx *c, void *buf, uint16_t proto)
> > >  	return eh + 1;
> > >  }
> > >  
> > > -/**
> > > - * tap_ip4_send() - Send IPv4 packet, with L2 headers, calculating L3/L4 checksums
> > > - * @c:		Execution context
> > > - * @src:	IPv4 source address
> > > - * @proto:	L4 protocol number
> > > - * @in:		Payload
> > > - * @len:	L4 payload length
> > > - */
> > > -void tap_ip4_send(const struct ctx *c, in_addr_t src, uint8_t proto,
> > > -		  const char *in, size_t len)  
> > 
> > I understand why you return ip(4)h + 1 here because I've just reviewed
> > 9/14, I wouldn't know otherwise:
> > 
> > /**
> >  * tap_ip4_hdr() - Build IPv4 header for inbound packet, with checksum
> >  * @c:		Execution context
> >  * @src:	IPv4 source address, network order
> >  * @dst:	IPv4 destination address, network order
> >  * @len:	L4 payload length
> >  * @proto:	L4 protocol number
> >  *
> >  * Return: pointer to write payload to
> >  */  
> 
> Oops, yes, forgot to add a function comment.  Done.
> 
> > > +static void *tap_ip4_hdr(char *buf, in_addr_t src, in_addr_t dst,
> > > +			 size_t len, uint8_t proto)
> > >  {
> > > -	char buf[USHRT_MAX];
> > > -	struct iphdr *ip4h = (struct iphdr *)tap_l2_hdr(c, buf, ETH_P_IP);
> > > -	char *data = (char *)(ip4h + 1);
> > > +	struct iphdr *ip4h = (struct iphdr *)buf;
> > >  
> > >  	ip4h->version = 4;
> > >  	ip4h->ihl = sizeof(struct iphdr) / 4;
> > > @@ -151,20 +141,61 @@ void tap_ip4_send(const struct ctx *c, in_addr_t src, uint8_t proto,
> > >  	ip4h->ttl = 255;
> > >  	ip4h->protocol = proto;
> > >  	ip4h->saddr = src;
> > > -	ip4h->daddr = tap_ip4_daddr(c);
> > > +	ip4h->daddr = dst;
> > >  	csum_ip4_header(ip4h);
> > > +	return ip4h + 1;
> > > +}
> > > +
> > > +/**
> > > + * tap_udp4_send() - Send UDP over IPv4 packet
> > > + * @c:		Execution context
> > > + * @src:	IPv4 source address
> > > + * @sport:	UDP source port
> > > + * @dst:	IPv4 destination address
> > > + * @dport:	UDP destination port
> > > + * @in:		UDP payload contents (not including UDP header)
> > > + * @len:	UDP payload length (not including UDP header)
> > > + */
> > > +/* cppcheck-suppress unusedFunction */
> > > +void tap_udp4_send(const struct ctx *c, in_addr_t src, in_port_t sport,
> > > +		   in_addr_t dst, in_port_t dport,
> > > +		   const void *in, size_t len)
> > > +{
> > > +	size_t udplen = len + sizeof(struct udphdr);
> > > +	char buf[USHRT_MAX];
> > > +	void *ip4h = tap_l2_hdr(c, buf, ETH_P_IP);
> > > +	void *uhp = tap_ip4_hdr(ip4h, src, dst, udplen, IPPROTO_UDP);  
> > 
> > Two observations:
> > 
> > - this saves one line and one cast, but it's really a bit unnatural that
> >   tap_ip4_hdr() doesn't point to the header it just made, or to nothing.
> > 
> >   I would rather have to +1 the return value or the original pointer
> >   instead or having this trick
> >   
> > > +	struct udphdr *uh = (struct udphdr *)uhp;
> > > +	char *data = (char *)(uh + 1);  
> > 
> > - it's longer, but in my opinion clearer, if we split a bit more clearly
> >   the components of the packet, that is, something like (untested):  
> 
> I don't really want to change this.  Yes, it's a bit counterintuitive
> at first blush, but there's a reason for this approach.
> 
> This style of a function which generates a header then points *after*
> it works even if the header it generates is of variable length.
> Advancing to the payload in the caller doesn't (at least not without
> breaking the abstraction I'm trying to generate with these helpers).
> 
> That's not just theoretical, because at some point I'd like to extend
> the l2_hdr function to also allocate space for the qemu socket length
> header.
> 
> I'm certainly open to name changes to make this behaviour more
> obvious, but I think returning the payload pointer not the header
> pointer makes for a better abstraction here.

Hmm, yes, I think the variable length case is a very valid point, and
also in terms of abstraction I see the advantage. There are just two
things I can think of:

- passing the end pointer as argument (not as practical as your
  solution, though)

- naming it tap_ip4_push_hdr(), tap_ip4_hdr_after(),
  tap_ip4_hdr_goto_next(), tap_ip4_leave_header_behind()... I can't
  think of anything better at this point. I'll keep thinking, but at the
  moment I'd be fine even with the current name.

-- 
Stefano


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 02/14] Add csum_icmp4() helper for calculating ICMPv4 checksums
  2022-10-18 12:06     ` David Gibson
@ 2022-10-18 12:28       ` Stefano Brivio
  0 siblings, 0 replies; 36+ messages in thread
From: Stefano Brivio @ 2022-10-18 12:28 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Tue, 18 Oct 2022 23:06:11 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Tue, Oct 18, 2022 at 05:01:51AM +0200, Stefano Brivio wrote:
> > On Mon, 17 Oct 2022 19:57:55 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> > > Although tap_ip_send() is currently the only place calculating ICMPv4
> > > checksums, create a helper function for symmetry with ICMPv6.  For future
> > > flexibility it allows the ICMPv6 header and payload to be in separate
> > > buffers.
> > > 
> > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > > ---
> > >  checksum.c | 15 +++++++++++++++
> > >  checksum.h |  2 ++
> > >  tap.c      |  4 +---
> > >  3 files changed, 18 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/checksum.c b/checksum.c
> > > index 0e207c8..c8b6b42 100644
> > > --- a/checksum.c
> > > +++ b/checksum.c
> > > @@ -52,6 +52,7 @@
> > >  #include <stddef.h>
> > >  #include <stdint.h>
> > >  
> > > +#include <linux/icmp.h>
> > >  #include <linux/icmpv6.h>
> > >  
> > >  /**
> > > @@ -107,6 +108,20 @@ uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init)
> > >  	return (uint16_t)~csum_fold(sum_16b(buf, len) + init);
> > >  }
> > >  
> > > +/**
> > > + * csum_icmp4() - Calculate checksum for an ICMPv4 packet  
> > 
> > "Calculate and set"?  
> 
> Done.
> 
> > By the way, there's no such thing as ICMPv4 --
> > it's ICMP.  
> 
> Technically, yes, but I kind of wanted to make it clear at a glance
> that these are IPv4 specific functions.  I'd also like to avoid the
> implication that v4 is the "normal" sort.  I've changed from "ICMPv4"
> to "ICMP" in the comments, but I've left the '4's in the various names

Ah, yes, sure, makes sense, as long as we don't refer to "ICMPv4" in
the comments I'm fine with it. :)

-- 
Stefano


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 13/14] tap: Split tap_ip4_send() into UDP and ICMP variants
  2022-10-18 12:27       ` Stefano Brivio
@ 2022-10-18 23:54         ` David Gibson
  0 siblings, 0 replies; 36+ messages in thread
From: David Gibson @ 2022-10-18 23:54 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 7088 bytes --]

On Tue, Oct 18, 2022 at 02:27:04PM +0200, Stefano Brivio wrote:
> On Tue, 18 Oct 2022 23:07:58 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Tue, Oct 18, 2022 at 05:06:34AM +0200, Stefano Brivio wrote:
> > > On Mon, 17 Oct 2022 19:58:06 +1100
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > >   
> > > > tap_ip4_send() has special case logic to compute the checksums for UDP
> > > > and ICMP packets, which is a mild layering violation.  By using a suitable
> > > > helper we can split it into tap_udp4_send() and tap_icmp4_send() functions
> > > > without greatly increasing the code size, this removing that layering
> > > > violation.
> > > > 
> > > > We make some small changes to the interface while there.  In both cases
> > > > we make the destination IPv4 address a parameter, which will be useful
> > > > later.  For the UDP variant we make it take just the UDP payload, and it
> > > > will generate the UDP header.  For the ICMP variant we pass in the ICMP
> > > > header as before.  The inconsistency is because that's what seems to be
> > > > the more natural way to invoke the function in the callers in each case.
> > > > 
> > > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > > > ---
> > > >  icmp.c |  3 ++-
> > > >  tap.c  | 75 +++++++++++++++++++++++++++++++++++++++++-----------------
> > > >  tap.h  |  7 ++++--
> > > >  3 files changed, 60 insertions(+), 25 deletions(-)
> > > > 
> > > > diff --git a/icmp.c b/icmp.c
> > > > index 6493ea9..233acf9 100644
> > > > --- a/icmp.c
> > > > +++ b/icmp.c
> > > > @@ -124,7 +124,8 @@ void icmp_sock_handler(const struct ctx *c, union epoll_ref ref,
> > > >  			icmp_id_map[V4][id].seq = seq;
> > > >  		}
> > > >  
> > > > -		tap_ip4_send(c, sr4->sin_addr.s_addr, IPPROTO_ICMP, buf, n);
> > > > +		tap_icmp4_send(c, sr4->sin_addr.s_addr, tap_ip4_daddr(c),
> > > > +			       buf, n);
> > > >  	}
> > > >  }
> > > >  
> > > > diff --git a/tap.c b/tap.c
> > > > index 274f4ba..5792880 100644
> > > > --- a/tap.c
> > > > +++ b/tap.c
> > > > @@ -127,20 +127,10 @@ static void *tap_l2_hdr(const struct ctx *c, void *buf, uint16_t proto)
> > > >  	return eh + 1;
> > > >  }
> > > >  
> > > > -/**
> > > > - * tap_ip4_send() - Send IPv4 packet, with L2 headers, calculating L3/L4 checksums
> > > > - * @c:		Execution context
> > > > - * @src:	IPv4 source address
> > > > - * @proto:	L4 protocol number
> > > > - * @in:		Payload
> > > > - * @len:	L4 payload length
> > > > - */
> > > > -void tap_ip4_send(const struct ctx *c, in_addr_t src, uint8_t proto,
> > > > -		  const char *in, size_t len)  
> > > 
> > > I understand why you return ip(4)h + 1 here because I've just reviewed
> > > 9/14, I wouldn't know otherwise:
> > > 
> > > /**
> > >  * tap_ip4_hdr() - Build IPv4 header for inbound packet, with checksum
> > >  * @c:		Execution context
> > >  * @src:	IPv4 source address, network order
> > >  * @dst:	IPv4 destination address, network order
> > >  * @len:	L4 payload length
> > >  * @proto:	L4 protocol number
> > >  *
> > >  * Return: pointer to write payload to
> > >  */  
> > 
> > Oops, yes, forgot to add a function comment.  Done.
> > 
> > > > +static void *tap_ip4_hdr(char *buf, in_addr_t src, in_addr_t dst,
> > > > +			 size_t len, uint8_t proto)
> > > >  {
> > > > -	char buf[USHRT_MAX];
> > > > -	struct iphdr *ip4h = (struct iphdr *)tap_l2_hdr(c, buf, ETH_P_IP);
> > > > -	char *data = (char *)(ip4h + 1);
> > > > +	struct iphdr *ip4h = (struct iphdr *)buf;
> > > >  
> > > >  	ip4h->version = 4;
> > > >  	ip4h->ihl = sizeof(struct iphdr) / 4;
> > > > @@ -151,20 +141,61 @@ void tap_ip4_send(const struct ctx *c, in_addr_t src, uint8_t proto,
> > > >  	ip4h->ttl = 255;
> > > >  	ip4h->protocol = proto;
> > > >  	ip4h->saddr = src;
> > > > -	ip4h->daddr = tap_ip4_daddr(c);
> > > > +	ip4h->daddr = dst;
> > > >  	csum_ip4_header(ip4h);
> > > > +	return ip4h + 1;
> > > > +}
> > > > +
> > > > +/**
> > > > + * tap_udp4_send() - Send UDP over IPv4 packet
> > > > + * @c:		Execution context
> > > > + * @src:	IPv4 source address
> > > > + * @sport:	UDP source port
> > > > + * @dst:	IPv4 destination address
> > > > + * @dport:	UDP destination port
> > > > + * @in:		UDP payload contents (not including UDP header)
> > > > + * @len:	UDP payload length (not including UDP header)
> > > > + */
> > > > +/* cppcheck-suppress unusedFunction */
> > > > +void tap_udp4_send(const struct ctx *c, in_addr_t src, in_port_t sport,
> > > > +		   in_addr_t dst, in_port_t dport,
> > > > +		   const void *in, size_t len)
> > > > +{
> > > > +	size_t udplen = len + sizeof(struct udphdr);
> > > > +	char buf[USHRT_MAX];
> > > > +	void *ip4h = tap_l2_hdr(c, buf, ETH_P_IP);
> > > > +	void *uhp = tap_ip4_hdr(ip4h, src, dst, udplen, IPPROTO_UDP);  
> > > 
> > > Two observations:
> > > 
> > > - this saves one line and one cast, but it's really a bit unnatural that
> > >   tap_ip4_hdr() doesn't point to the header it just made, or to nothing.
> > > 
> > >   I would rather have to +1 the return value or the original pointer
> > >   instead or having this trick
> > >   
> > > > +	struct udphdr *uh = (struct udphdr *)uhp;
> > > > +	char *data = (char *)(uh + 1);  
> > > 
> > > - it's longer, but in my opinion clearer, if we split a bit more clearly
> > >   the components of the packet, that is, something like (untested):  
> > 
> > I don't really want to change this.  Yes, it's a bit counterintuitive
> > at first blush, but there's a reason for this approach.
> > 
> > This style of a function which generates a header then points *after*
> > it works even if the header it generates is of variable length.
> > Advancing to the payload in the caller doesn't (at least not without
> > breaking the abstraction I'm trying to generate with these helpers).
> > 
> > That's not just theoretical, because at some point I'd like to extend
> > the l2_hdr function to also allocate space for the qemu socket length
> > header.
> > 
> > I'm certainly open to name changes to make this behaviour more
> > obvious, but I think returning the payload pointer not the header
> > pointer makes for a better abstraction here.
> 
> Hmm, yes, I think the variable length case is a very valid point, and
> also in terms of abstraction I see the advantage. There are just two
> things I can think of:
> 
> - passing the end pointer as argument (not as practical as your
>   solution, though)
> 
> - naming it tap_ip4_push_hdr(), tap_ip4_hdr_after(),
>   tap_ip4_hdr_goto_next(), tap_ip4_leave_header_behind()... I can't
>   think of anything better at this point. I'll keep thinking, but at the
>   moment I'd be fine even with the current name.

I've gone with a variant of the 'push' naming, I think that makes it a
bit clearer.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2022-10-18 23:54 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-17  8:57 [PATCH 00/14] Clean up checksum and header generation for inbound packets David Gibson
2022-10-17  8:57 ` [PATCH 01/14] Add csum_icmp6() helper for calculating ICMPv6 checksums David Gibson
2022-10-18  3:01   ` Stefano Brivio
2022-10-18 12:05     ` David Gibson
2022-10-17  8:57 ` [PATCH 02/14] Add csum_icmp4() helper for calculating ICMPv4 checksums David Gibson
2022-10-18  3:01   ` Stefano Brivio
2022-10-18 12:06     ` David Gibson
2022-10-18 12:28       ` Stefano Brivio
2022-10-17  8:57 ` [PATCH 03/14] Add csum_udp6() helper for calculating UDP over IPv6 checksums David Gibson
2022-10-18  3:02   ` Stefano Brivio
2022-10-18 12:06     ` David Gibson
2022-10-17  8:57 ` [PATCH 04/14] Add csum_udp4() helper for calculating UDP over IPv4 checksums David Gibson
2022-10-18  3:03   ` Stefano Brivio
2022-10-18 12:06     ` David Gibson
2022-10-17  8:57 ` [PATCH 05/14] Add csum_ip4_header() helper to calculate IPv4 header checksums David Gibson
2022-10-18  3:03   ` Stefano Brivio
2022-10-18 12:07     ` David Gibson
2022-10-17  8:57 ` [PATCH 06/14] Add helpers for normal inbound packet destination addresses David Gibson
2022-10-18  3:04   ` Stefano Brivio
2022-10-18 12:07     ` David Gibson
2022-10-17  8:58 ` [PATCH 07/14] Remove support for TCP packets from tap_ip_send() David Gibson
2022-10-17  8:58 ` [PATCH 08/14] tap: Remove unhelpeful vnet_pre optimization from tap_send() David Gibson
2022-10-18  3:05   ` Stefano Brivio
2022-10-18 12:07     ` David Gibson
2022-10-17  8:58 ` [PATCH 09/14] Split tap_ip_send() into IPv4 and IPv6 specific functions David Gibson
2022-10-18  3:06   ` Stefano Brivio
2022-10-18 12:07     ` David Gibson
2022-10-17  8:58 ` [PATCH 10/14] tap: Split tap_ip6_send() into UDP and ICMP variants David Gibson
2022-10-17  8:58 ` [PATCH 11/14] ndp: Remove unneeded eh_source parameter David Gibson
2022-10-17  8:58 ` [PATCH 12/14] ndp: Use tap_icmp6_send() helper David Gibson
2022-10-17  8:58 ` [PATCH 13/14] tap: Split tap_ip4_send() into UDP and ICMP variants David Gibson
2022-10-18  3:06   ` Stefano Brivio
2022-10-18 12:07     ` David Gibson
2022-10-18 12:27       ` Stefano Brivio
2022-10-18 23:54         ` David Gibson
2022-10-17  8:58 ` [PATCH 14/14] dhcp: Use tap_udp4_send() helper in dhcp() David Gibson

Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).