public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
* [PATCH v2 00/18] RFC: Unify and simplify tap send path
@ 2022-12-09  5:42 David Gibson
  2022-12-09  5:42 ` [PATCH v2 01/18] pcap: Introduce pcap_frame() helper David Gibson
                   ` (18 more replies)
  0 siblings, 19 replies; 30+ messages in thread
From: David Gibson @ 2022-12-09  5:42 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

Although we have an abstraction for the "slow path" (DHCP, NDP) guest
bound packets, the TCP and UDP forwarding paths write directly to the
tap fd.  However, it turns out how they send frames to the tap device
is more similar than it originally appears.

This series unifies the low-level tap send functions for TCP and UDP,
and makes some clean ups along the way.

This is based on my earlier outstanding series.

Changes since v1:
 * Abstract tap header construction as well as frame send (a number of
   new patches)
 * Remove unneeded flags buf_bytes globals as well
 * Fix bug where we weren't correctly setting iov_len after the move
   to giving variable sized iovecs to send_frames().

David Gibson (18):
  pcap: Introduce pcap_frame() helper
  pcap: Replace pcapm() with pcap_multiple()
  tcp: Combine two parts of passt tap send path together
  tcp: Don't keep compute total bytes in a message until we need it
  tcp: Improve interface to tcp_l2_buf_flush()
  tcp: Combine two parts of pasta tap send path together
  tap, tcp: Move tap send path to tap.c
  util: Introduce hton*_constant() in place of #ifdefs
  tcp, udp: Use named field initializers in iov_init functions
  util: Parameterize ethernet header initializer macro
  tcp: Remove redundant and incorrect initialization from *_iov_init()
  tcp: Consolidate calculation of total frame size
  tap: Add "tap headers" abstraction
  tcp: Use abstracted tap header
  tap: Use different io vector bases depending on tap type
  udp: Use abstracted tap header
  udp: Use tap_send_frames()
  tap: Improve handling of partial frame sends

 dhcpv6.c |  50 +++--------
 pcap.c   |  78 +++++------------
 pcap.h   |   3 +-
 tap.c    | 123 +++++++++++++++++++++++++++
 tap.h    |  54 ++++++++++++
 tcp.c    | 254 +++++++++++++------------------------------------------
 udp.c    | 213 ++++++----------------------------------------
 udp.h    |   2 +-
 util.h   |  47 ++--------
 9 files changed, 303 insertions(+), 521 deletions(-)

-- 
2.38.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v2 01/18] pcap: Introduce pcap_frame() helper
  2022-12-09  5:42 [PATCH v2 00/18] RFC: Unify and simplify tap send path David Gibson
@ 2022-12-09  5:42 ` David Gibson
  2023-01-04 17:45   ` Stefano Brivio
  2022-12-09  5:42 ` [PATCH v2 02/18] pcap: Replace pcapm() with pcap_multiple() David Gibson
                   ` (17 subsequent siblings)
  18 siblings, 1 reply; 30+ messages in thread
From: David Gibson @ 2022-12-09  5:42 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

pcap(), pcapm() and pcapmm() duplicate some code, for the actual writing
to the capture file.  The main purpose pf pcapm() and pcampp() not calling
pcap seems to be to avoid repeatedly calling gettimeofday().  We can
accomplish that while still sharing code by adding a new helper which
takes the packet timestamp as a parameter.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 pcap.c | 59 +++++++++++++++++++++++-----------------------------------
 1 file changed, 23 insertions(+), 36 deletions(-)

diff --git a/pcap.c b/pcap.c
index 836688d..c9ac5cf 100644
--- a/pcap.c
+++ b/pcap.c
@@ -65,6 +65,24 @@ struct pcap_pkthdr {
 	uint32_t len;
 };
 
+/**
+ * pcap_frame() - Capture a single frame to pcap file with given timestamp
+ * @pkt:	Pointer to data buffer, including L2 headers
+ * @len:	L2 packet length
+ * @tv:		Timestamp
+ */
+static void pcap_frame(const char *pkt, size_t len, const struct timeval *tv)
+{
+	struct pcap_pkthdr h;
+
+	h.tv_sec = tv->tv_sec;
+	h.tv_usec = tv->tv_usec;
+	h.caplen = h.len = len;
+
+	if (write(pcap_fd, &h, sizeof(h)) < 0 || write(pcap_fd, pkt, len) < 0)
+		debug("Cannot log packet, length %lu", len);
+}
+
 /**
  * pcap() - Capture a single frame to pcap file
  * @pkt:	Pointer to data buffer, including L2 headers
@@ -72,19 +90,13 @@ struct pcap_pkthdr {
  */
 void pcap(const char *pkt, size_t len)
 {
-	struct pcap_pkthdr h;
 	struct timeval tv;
 
 	if (pcap_fd == -1)
 		return;
 
 	gettimeofday(&tv, NULL);
-	h.tv_sec = tv.tv_sec;
-	h.tv_usec = tv.tv_usec;
-	h.caplen = h.len = len;
-
-	if (write(pcap_fd, &h, sizeof(h)) < 0 || write(pcap_fd, pkt, len) < 0)
-		debug("Cannot log packet, length %lu", len);
+	pcap_frame(pkt, len, &tv);
 }
 
 /**
@@ -93,8 +105,6 @@ void pcap(const char *pkt, size_t len)
  */
 void pcapm(const struct msghdr *mh)
 {
-	struct pcap_pkthdr h;
-	struct iovec *iov;
 	struct timeval tv;
 	unsigned int i;
 
@@ -102,24 +112,12 @@ void pcapm(const struct msghdr *mh)
 		return;
 
 	gettimeofday(&tv, NULL);
-	h.tv_sec = tv.tv_sec;
-	h.tv_usec = tv.tv_usec;
 
 	for (i = 0; i < mh->msg_iovlen; i++) {
-		iov = &mh->msg_iov[i];
-
-		h.caplen = h.len = iov->iov_len - 4;
+		const struct iovec *iov = &mh->msg_iov[i];
 
-		if (write(pcap_fd, &h, sizeof(h)) < 0)
-			goto fail;
-		if (write(pcap_fd, (char *)iov->iov_base + 4,
-			  iov->iov_len - 4) < 0)
-			goto fail;
+		pcap_frame((char *)iov->iov_base + 4, iov->iov_len - 4, &tv);
 	}
-
-	return;
-fail:
-	debug("Cannot log packet, length %lu", iov->iov_len - 4);
 }
 
 /**
@@ -128,7 +126,6 @@ fail:
  */
 void pcapmm(const struct mmsghdr *mmh, unsigned int vlen)
 {
-	struct pcap_pkthdr h;
 	struct iovec *iov;
 	struct timeval tv;
 	unsigned int i, j;
@@ -137,8 +134,6 @@ void pcapmm(const struct mmsghdr *mmh, unsigned int vlen)
 		return;
 
 	gettimeofday(&tv, NULL);
-	h.tv_sec = tv.tv_sec;
-	h.tv_usec = tv.tv_usec;
 
 	for (i = 0; i < vlen; i++) {
 		const struct msghdr *mh = &mmh[i].msg_hdr;
@@ -146,18 +141,10 @@ void pcapmm(const struct mmsghdr *mmh, unsigned int vlen)
 		for (j = 0; j < mh->msg_iovlen; j++) {
 			iov = &mh->msg_iov[j];
 
-			h.caplen = h.len = iov->iov_len - 4;
-
-			if (write(pcap_fd, &h, sizeof(h)) < 0)
-				goto fail;
-			if (write(pcap_fd, (char *)iov->iov_base + 4,
-				  iov->iov_len - 4) < 0)
-				goto fail;
+			pcap_frame((char *)iov->iov_base + 4, iov->iov_len - 4,
+				   &tv);
 		}
 	}
-	return;
-fail:
-	debug("Cannot log packet, length %lu", iov->iov_len - 4);
 }
 
 /**
-- 
@@ -65,6 +65,24 @@ struct pcap_pkthdr {
 	uint32_t len;
 };
 
+/**
+ * pcap_frame() - Capture a single frame to pcap file with given timestamp
+ * @pkt:	Pointer to data buffer, including L2 headers
+ * @len:	L2 packet length
+ * @tv:		Timestamp
+ */
+static void pcap_frame(const char *pkt, size_t len, const struct timeval *tv)
+{
+	struct pcap_pkthdr h;
+
+	h.tv_sec = tv->tv_sec;
+	h.tv_usec = tv->tv_usec;
+	h.caplen = h.len = len;
+
+	if (write(pcap_fd, &h, sizeof(h)) < 0 || write(pcap_fd, pkt, len) < 0)
+		debug("Cannot log packet, length %lu", len);
+}
+
 /**
  * pcap() - Capture a single frame to pcap file
  * @pkt:	Pointer to data buffer, including L2 headers
@@ -72,19 +90,13 @@ struct pcap_pkthdr {
  */
 void pcap(const char *pkt, size_t len)
 {
-	struct pcap_pkthdr h;
 	struct timeval tv;
 
 	if (pcap_fd == -1)
 		return;
 
 	gettimeofday(&tv, NULL);
-	h.tv_sec = tv.tv_sec;
-	h.tv_usec = tv.tv_usec;
-	h.caplen = h.len = len;
-
-	if (write(pcap_fd, &h, sizeof(h)) < 0 || write(pcap_fd, pkt, len) < 0)
-		debug("Cannot log packet, length %lu", len);
+	pcap_frame(pkt, len, &tv);
 }
 
 /**
@@ -93,8 +105,6 @@ void pcap(const char *pkt, size_t len)
  */
 void pcapm(const struct msghdr *mh)
 {
-	struct pcap_pkthdr h;
-	struct iovec *iov;
 	struct timeval tv;
 	unsigned int i;
 
@@ -102,24 +112,12 @@ void pcapm(const struct msghdr *mh)
 		return;
 
 	gettimeofday(&tv, NULL);
-	h.tv_sec = tv.tv_sec;
-	h.tv_usec = tv.tv_usec;
 
 	for (i = 0; i < mh->msg_iovlen; i++) {
-		iov = &mh->msg_iov[i];
-
-		h.caplen = h.len = iov->iov_len - 4;
+		const struct iovec *iov = &mh->msg_iov[i];
 
-		if (write(pcap_fd, &h, sizeof(h)) < 0)
-			goto fail;
-		if (write(pcap_fd, (char *)iov->iov_base + 4,
-			  iov->iov_len - 4) < 0)
-			goto fail;
+		pcap_frame((char *)iov->iov_base + 4, iov->iov_len - 4, &tv);
 	}
-
-	return;
-fail:
-	debug("Cannot log packet, length %lu", iov->iov_len - 4);
 }
 
 /**
@@ -128,7 +126,6 @@ fail:
  */
 void pcapmm(const struct mmsghdr *mmh, unsigned int vlen)
 {
-	struct pcap_pkthdr h;
 	struct iovec *iov;
 	struct timeval tv;
 	unsigned int i, j;
@@ -137,8 +134,6 @@ void pcapmm(const struct mmsghdr *mmh, unsigned int vlen)
 		return;
 
 	gettimeofday(&tv, NULL);
-	h.tv_sec = tv.tv_sec;
-	h.tv_usec = tv.tv_usec;
 
 	for (i = 0; i < vlen; i++) {
 		const struct msghdr *mh = &mmh[i].msg_hdr;
@@ -146,18 +141,10 @@ void pcapmm(const struct mmsghdr *mmh, unsigned int vlen)
 		for (j = 0; j < mh->msg_iovlen; j++) {
 			iov = &mh->msg_iov[j];
 
-			h.caplen = h.len = iov->iov_len - 4;
-
-			if (write(pcap_fd, &h, sizeof(h)) < 0)
-				goto fail;
-			if (write(pcap_fd, (char *)iov->iov_base + 4,
-				  iov->iov_len - 4) < 0)
-				goto fail;
+			pcap_frame((char *)iov->iov_base + 4, iov->iov_len - 4,
+				   &tv);
 		}
 	}
-	return;
-fail:
-	debug("Cannot log packet, length %lu", iov->iov_len - 4);
 }
 
 /**
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 02/18] pcap: Replace pcapm() with pcap_multiple()
  2022-12-09  5:42 [PATCH v2 00/18] RFC: Unify and simplify tap send path David Gibson
  2022-12-09  5:42 ` [PATCH v2 01/18] pcap: Introduce pcap_frame() helper David Gibson
@ 2022-12-09  5:42 ` David Gibson
  2022-12-09  5:42 ` [PATCH v2 03/18] tcp: Combine two parts of passt tap send path together David Gibson
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: David Gibson @ 2022-12-09  5:42 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

pcapm() captures multiple frames from a msghdr, however the only thing it
cares about in the msghdr is the list of buffers, where it assumes there is
one frame to capture per buffer.  That's what we want for its single caller
but it's not the only obvious choice here (one frame per msghdr would
arguably make more sense in isolation).  In addition pcapm() has logic
that only makes sense in the context of the passt specific path its called
from: it skips the first 4 bytes of each buffer, because those have the
qemu vnet_len rather than the frame proper.

Make this clearer by replacing pcapm() with pcap_multiple() which more
explicitly takes one struct iovec per frame, and parameterizes how much of
each buffer to skip (i.e. the offset of the frame within the buffer).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 pcap.c | 16 ++++++++--------
 pcap.h |  2 +-
 tcp.c  |  3 ++-
 3 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/pcap.c b/pcap.c
index c9ac5cf..8af9021 100644
--- a/pcap.c
+++ b/pcap.c
@@ -100,10 +100,12 @@ void pcap(const char *pkt, size_t len)
 }
 
 /**
- * pcapm() - Capture multiple frames from message header to pcap file
- * @mh:		Pointer to sendmsg() message header buffer
+ * pcap_multiple() - Capture multiple frames
+ * @iov:	Array of iovecs, one entry per frame
+ * @n:		Number of frames to capture
+ * @offset:	Offset of the frame within each iovec buffer
  */
-void pcapm(const struct msghdr *mh)
+void pcap_multiple(const struct iovec *iov, unsigned int n, size_t offset)
 {
 	struct timeval tv;
 	unsigned int i;
@@ -113,11 +115,9 @@ void pcapm(const struct msghdr *mh)
 
 	gettimeofday(&tv, NULL);
 
-	for (i = 0; i < mh->msg_iovlen; i++) {
-		const struct iovec *iov = &mh->msg_iov[i];
-
-		pcap_frame((char *)iov->iov_base + 4, iov->iov_len - 4, &tv);
-	}
+	for (i = 0; i < n; i++)
+		pcap_frame((char *)iov[i].iov_base + offset,
+			   iov[i].iov_len - offset, &tv);
 }
 
 /**
diff --git a/pcap.h b/pcap.h
index 9e1736c..eafc89b 100644
--- a/pcap.h
+++ b/pcap.h
@@ -7,7 +7,7 @@
 #define PCAP_H
 
 void pcap(const char *pkt, size_t len);
-void pcapm(const struct msghdr *mh);
+void pcap_multiple(const struct iovec *iov, unsigned int n, size_t offset);
 void pcapmm(const struct mmsghdr *mmh, unsigned int vlen);
 void pcap_init(struct ctx *c);
 
diff --git a/tcp.c b/tcp.c
index cfdae06..ed65a9e 100644
--- a/tcp.c
+++ b/tcp.c
@@ -1468,7 +1468,8 @@ static void tcp_l2_buf_flush(struct ctx *c, struct msghdr *mh,
 		}
 	}
 	*buf_used = *buf_bytes = 0;
-	pcapm(mh);
+
+	pcap_multiple(mh->msg_iov, mh->msg_iovlen, sizeof(uint32_t));
 }
 
 /**
-- 
@@ -1468,7 +1468,8 @@ static void tcp_l2_buf_flush(struct ctx *c, struct msghdr *mh,
 		}
 	}
 	*buf_used = *buf_bytes = 0;
-	pcapm(mh);
+
+	pcap_multiple(mh->msg_iov, mh->msg_iovlen, sizeof(uint32_t));
 }
 
 /**
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 03/18] tcp: Combine two parts of passt tap send path together
  2022-12-09  5:42 [PATCH v2 00/18] RFC: Unify and simplify tap send path David Gibson
  2022-12-09  5:42 ` [PATCH v2 01/18] pcap: Introduce pcap_frame() helper David Gibson
  2022-12-09  5:42 ` [PATCH v2 02/18] pcap: Replace pcapm() with pcap_multiple() David Gibson
@ 2022-12-09  5:42 ` David Gibson
  2022-12-09  5:42 ` [PATCH v2 04/18] tcp: Don't keep compute total bytes in a message until we need it David Gibson
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: David Gibson @ 2022-12-09  5:42 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

tcp_l2_buf_flush() open codes the "primary" send of message to the passt
tap interface, but calls tcp_l2_buf_flush_part() to handle the case of a
short send.  Combine these two passt-specific operations into
tcp_l2_buf_flush_passt() which is a little cleaner and will enable furrther
cleanups.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 tcp.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/tcp.c b/tcp.c
index ed65a9e..fe788b4 100644
--- a/tcp.c
+++ b/tcp.c
@@ -1415,22 +1415,28 @@ static int tcp_l2_buf_write_one(struct ctx *c, const struct iovec *iov)
 }
 
 /**
- * tcp_l2_buf_flush_part() - Ensure a complete last message on partial sendmsg()
+ * tcp_l2_buf_flush_passt() - Send a message on the passt tap interface
  * @c:		Execution context
  * @mh:		Message header that was partially sent by sendmsg()
- * @sent:	Bytes already sent
+ * @buf_bytes:	Total number of bytes to send
  */
-static void tcp_l2_buf_flush_part(const struct ctx *c,
-				  const struct msghdr *mh, size_t sent)
+static void tcp_l2_buf_flush_passt(const struct ctx *c,
+				   const struct msghdr *mh, size_t buf_bytes)
 {
-	size_t end = 0, missing;
+	size_t end = 0, missing, sent;
 	struct iovec *iov;
 	unsigned int i;
+	ssize_t n;
 	char *p;
 
+	n = sendmsg(c->fd_tap, mh, MSG_NOSIGNAL | MSG_DONTWAIT);
+	if (n < 0 || ((sent = (size_t)n) == buf_bytes))
+		return;
+
+	/* Ensure a complete last message on partial sendmsg() */
 	for (i = 0, iov = mh->msg_iov; i < mh->msg_iovlen; i++, iov++) {
 		end += iov->iov_len;
-		if (end >= sent)
+		if (end >= (size_t)sent)
 			break;
 	}
 
@@ -1454,9 +1460,7 @@ static void tcp_l2_buf_flush(struct ctx *c, struct msghdr *mh,
 		return;
 
 	if (c->mode == MODE_PASST) {
-		size_t n = sendmsg(c->fd_tap, mh, MSG_NOSIGNAL | MSG_DONTWAIT);
-		if (n > 0 && n < *buf_bytes)
-			tcp_l2_buf_flush_part(c, mh, n);
+		tcp_l2_buf_flush_passt(c, mh, *buf_bytes);
 	} else {
 		size_t i;
 
-- 
@@ -1415,22 +1415,28 @@ static int tcp_l2_buf_write_one(struct ctx *c, const struct iovec *iov)
 }
 
 /**
- * tcp_l2_buf_flush_part() - Ensure a complete last message on partial sendmsg()
+ * tcp_l2_buf_flush_passt() - Send a message on the passt tap interface
  * @c:		Execution context
  * @mh:		Message header that was partially sent by sendmsg()
- * @sent:	Bytes already sent
+ * @buf_bytes:	Total number of bytes to send
  */
-static void tcp_l2_buf_flush_part(const struct ctx *c,
-				  const struct msghdr *mh, size_t sent)
+static void tcp_l2_buf_flush_passt(const struct ctx *c,
+				   const struct msghdr *mh, size_t buf_bytes)
 {
-	size_t end = 0, missing;
+	size_t end = 0, missing, sent;
 	struct iovec *iov;
 	unsigned int i;
+	ssize_t n;
 	char *p;
 
+	n = sendmsg(c->fd_tap, mh, MSG_NOSIGNAL | MSG_DONTWAIT);
+	if (n < 0 || ((sent = (size_t)n) == buf_bytes))
+		return;
+
+	/* Ensure a complete last message on partial sendmsg() */
 	for (i = 0, iov = mh->msg_iov; i < mh->msg_iovlen; i++, iov++) {
 		end += iov->iov_len;
-		if (end >= sent)
+		if (end >= (size_t)sent)
 			break;
 	}
 
@@ -1454,9 +1460,7 @@ static void tcp_l2_buf_flush(struct ctx *c, struct msghdr *mh,
 		return;
 
 	if (c->mode == MODE_PASST) {
-		size_t n = sendmsg(c->fd_tap, mh, MSG_NOSIGNAL | MSG_DONTWAIT);
-		if (n > 0 && n < *buf_bytes)
-			tcp_l2_buf_flush_part(c, mh, n);
+		tcp_l2_buf_flush_passt(c, mh, *buf_bytes);
 	} else {
 		size_t i;
 
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 04/18] tcp: Don't keep compute total bytes in a message until we need it
  2022-12-09  5:42 [PATCH v2 00/18] RFC: Unify and simplify tap send path David Gibson
                   ` (2 preceding siblings ...)
  2022-12-09  5:42 ` [PATCH v2 03/18] tcp: Combine two parts of passt tap send path together David Gibson
@ 2022-12-09  5:42 ` David Gibson
  2023-01-04 17:45   ` Stefano Brivio
  2022-12-09  5:42 ` [PATCH v2 05/18] tcp: Improve interface to tcp_l2_buf_flush() David Gibson
                   ` (14 subsequent siblings)
  18 siblings, 1 reply; 30+ messages in thread
From: David Gibson @ 2022-12-09  5:42 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

tcp[46]_l2_buf_bytes keep track of the total number of bytes we have
queued to send to the tap interface.  tcp_l2_buf_flush_passt() uses this
to determine if sendmsg() has sent all the data we requested, or whether
we need to resend a trailing portion.

However, the logic for finding where we're up to in the case of a short
sendmsg() can equally well tell whether we've had one at all, without
knowing the total number in advance.  This does require an extra loop after
each sendmsg(), but it's doing simple arithmetic on values we've already
been accessing, and it leads to overall simpler code.

tcp[46]_l2_flags_buf_bytes were being calculated, but never used for
anything, so simply remove them.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 tcp.c | 55 +++++++++++++++++++------------------------------------
 1 file changed, 19 insertions(+), 36 deletions(-)

diff --git a/tcp.c b/tcp.c
index fe788b4..4050ea6 100644
--- a/tcp.c
+++ b/tcp.c
@@ -476,7 +476,6 @@ static struct tcp4_l2_buf_t {
 tcp4_l2_buf[TCP_FRAMES_MEM];
 
 static unsigned int tcp4_l2_buf_used;
-static size_t tcp4_l2_buf_bytes;
 
 /**
  * tcp6_l2_buf_t - Pre-cooked IPv6 packet buffers for tap connections
@@ -507,7 +506,6 @@ struct tcp6_l2_buf_t {
 tcp6_l2_buf[TCP_FRAMES_MEM];
 
 static unsigned int tcp6_l2_buf_used;
-static size_t tcp6_l2_buf_bytes;
 
 /* recvmsg()/sendmsg() data for tap */
 static char 		tcp_buf_discard		[MAX_WINDOW];
@@ -555,7 +553,6 @@ static struct tcp4_l2_flags_buf_t {
 tcp4_l2_flags_buf[TCP_FRAMES_MEM];
 
 static unsigned int tcp4_l2_flags_buf_used;
-static size_t tcp4_l2_flags_buf_bytes;
 
 /**
  * tcp6_l2_flags_buf_t - IPv6 packet buffers for segments without data (flags)
@@ -585,7 +582,6 @@ static struct tcp6_l2_flags_buf_t {
 tcp6_l2_flags_buf[TCP_FRAMES_MEM];
 
 static unsigned int tcp6_l2_flags_buf_used;
-static size_t tcp6_l2_flags_buf_bytes;
 
 /* TCP connections */
 union tcp_conn tc[TCP_MAX_CONNS];
@@ -1418,19 +1414,17 @@ static int tcp_l2_buf_write_one(struct ctx *c, const struct iovec *iov)
  * tcp_l2_buf_flush_passt() - Send a message on the passt tap interface
  * @c:		Execution context
  * @mh:		Message header that was partially sent by sendmsg()
- * @buf_bytes:	Total number of bytes to send
  */
-static void tcp_l2_buf_flush_passt(const struct ctx *c,
-				   const struct msghdr *mh, size_t buf_bytes)
+static void tcp_l2_buf_flush_passt(const struct ctx *c, const struct msghdr *mh)
 {
-	size_t end = 0, missing, sent;
+	size_t end = 0, missing;
 	struct iovec *iov;
 	unsigned int i;
-	ssize_t n;
+	ssize_t sent;
 	char *p;
 
-	n = sendmsg(c->fd_tap, mh, MSG_NOSIGNAL | MSG_DONTWAIT);
-	if (n < 0 || ((sent = (size_t)n) == buf_bytes))
+	sent = sendmsg(c->fd_tap, mh, MSG_NOSIGNAL | MSG_DONTWAIT);
+	if (sent < 0)
 		return;
 
 	/* Ensure a complete last message on partial sendmsg() */
@@ -1441,6 +1435,9 @@ static void tcp_l2_buf_flush_passt(const struct ctx *c,
 	}
 
 	missing = end - sent;
+	if (!missing)
+		return;
+
 	p = (char *)iov->iov_base + iov->iov_len - missing;
 	if (send(c->fd_tap, p, missing, MSG_NOSIGNAL))
 		debug("TCP: failed to flush %lu missing bytes to tap", missing);
@@ -1451,19 +1448,18 @@ static void tcp_l2_buf_flush_passt(const struct ctx *c,
  * @c:		Execution context
  * @mh:		Message header pointing to buffers, msg_iovlen not set
  * @buf_used:	Pointer to count of used buffers, set to 0 on return
- * @buf_bytes:	Pointer to count of buffer bytes, set to 0 on return
  */
 static void tcp_l2_buf_flush(struct ctx *c, struct msghdr *mh,
-			     unsigned int *buf_used, size_t *buf_bytes)
+			     unsigned int *buf_used)
 {
+	size_t i;
+
 	if (!(mh->msg_iovlen = *buf_used))
 		return;
 
 	if (c->mode == MODE_PASST) {
-		tcp_l2_buf_flush_passt(c, mh, *buf_bytes);
+		tcp_l2_buf_flush_passt(c, mh);
 	} else {
-		size_t i;
-
 		for (i = 0; i < mh->msg_iovlen; i++) {
 			struct iovec *iov = &mh->msg_iov[i];
 
@@ -1471,7 +1467,7 @@ static void tcp_l2_buf_flush(struct ctx *c, struct msghdr *mh,
 				i--;
 		}
 	}
-	*buf_used = *buf_bytes = 0;
+	*buf_used = 0;
 
 	pcap_multiple(mh->msg_iov, mh->msg_iovlen, sizeof(uint32_t));
 }
@@ -1484,17 +1480,14 @@ static void tcp_l2_flags_buf_flush(struct ctx *c)
 {
 	struct msghdr mh = { 0 };
 	unsigned int *buf_used;
-	size_t *buf_bytes;
 
 	mh.msg_iov	= tcp6_l2_flags_iov;
 	buf_used	= &tcp6_l2_flags_buf_used;
-	buf_bytes	= &tcp6_l2_flags_buf_bytes;
-	tcp_l2_buf_flush(c, &mh, buf_used, buf_bytes);
+	tcp_l2_buf_flush(c, &mh, buf_used);
 
 	mh.msg_iov	= tcp4_l2_flags_iov;
 	buf_used	= &tcp4_l2_flags_buf_used;
-	buf_bytes	= &tcp4_l2_flags_buf_bytes;
-	tcp_l2_buf_flush(c, &mh, buf_used, buf_bytes);
+	tcp_l2_buf_flush(c, &mh, buf_used);
 }
 
 /**
@@ -1505,17 +1498,14 @@ static void tcp_l2_data_buf_flush(struct ctx *c)
 {
 	struct msghdr mh = { 0 };
 	unsigned int *buf_used;
-	size_t *buf_bytes;
 
 	mh.msg_iov = tcp6_l2_iov;
 	buf_used	= &tcp6_l2_buf_used;
-	buf_bytes	= &tcp6_l2_buf_bytes;
-	tcp_l2_buf_flush(c, &mh, buf_used, buf_bytes);
+	tcp_l2_buf_flush(c, &mh, buf_used);
 
 	mh.msg_iov = tcp4_l2_iov;
 	buf_used	= &tcp4_l2_buf_used;
-	buf_bytes	= &tcp4_l2_buf_bytes;
-	tcp_l2_buf_flush(c, &mh, buf_used, buf_bytes);
+	tcp_l2_buf_flush(c, &mh, buf_used);
 }
 
 /**
@@ -1829,11 +1819,6 @@ static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags)
 					  NULL, conn->seq_to_tap);
 	iov->iov_len = eth_len + sizeof(uint32_t);
 
-	if (CONN_V4(conn))
-		tcp4_l2_flags_buf_bytes += iov->iov_len;
-	else
-		tcp6_l2_flags_buf_bytes += iov->iov_len;
-
 	if (th->ack)
 		conn_flag(c, conn, ~ACK_TO_TAP_DUE);
 
@@ -1849,7 +1834,6 @@ static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags)
 			memcpy(b4 + 1, b4, sizeof(*b4));
 			(iov + 1)->iov_len = iov->iov_len;
 			tcp4_l2_flags_buf_used++;
-			tcp4_l2_flags_buf_bytes += iov->iov_len;
 		}
 
 		if (tcp4_l2_flags_buf_used > ARRAY_SIZE(tcp4_l2_flags_buf) - 2)
@@ -1859,7 +1843,6 @@ static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags)
 			memcpy(b6 + 1, b6, sizeof(*b6));
 			(iov + 1)->iov_len = iov->iov_len;
 			tcp6_l2_flags_buf_used++;
-			tcp6_l2_flags_buf_bytes += iov->iov_len;
 		}
 
 		if (tcp6_l2_flags_buf_used > ARRAY_SIZE(tcp6_l2_flags_buf) - 2)
@@ -2203,7 +2186,7 @@ static void tcp_data_to_tap(struct ctx *c, struct tcp_tap_conn *conn,
 		len = tcp_l2_buf_fill_headers(c, conn, b, plen, check, seq);
 
 		iov = tcp4_l2_iov + tcp4_l2_buf_used++;
-		tcp4_l2_buf_bytes += iov->iov_len = len + sizeof(b->vnet_len);
+		iov->iov_len = len + sizeof(b->vnet_len);
 		if (tcp4_l2_buf_used > ARRAY_SIZE(tcp4_l2_buf) - 1)
 			tcp_l2_data_buf_flush(c);
 	} else if (CONN_V6(conn)) {
@@ -2212,7 +2195,7 @@ static void tcp_data_to_tap(struct ctx *c, struct tcp_tap_conn *conn,
 		len = tcp_l2_buf_fill_headers(c, conn, b, plen, NULL, seq);
 
 		iov = tcp6_l2_iov + tcp6_l2_buf_used++;
-		tcp6_l2_buf_bytes += iov->iov_len = len + sizeof(b->vnet_len);
+		iov->iov_len = len + sizeof(b->vnet_len);
 		if (tcp6_l2_buf_used > ARRAY_SIZE(tcp6_l2_buf) - 1)
 			tcp_l2_data_buf_flush(c);
 	}
-- 
@@ -476,7 +476,6 @@ static struct tcp4_l2_buf_t {
 tcp4_l2_buf[TCP_FRAMES_MEM];
 
 static unsigned int tcp4_l2_buf_used;
-static size_t tcp4_l2_buf_bytes;
 
 /**
  * tcp6_l2_buf_t - Pre-cooked IPv6 packet buffers for tap connections
@@ -507,7 +506,6 @@ struct tcp6_l2_buf_t {
 tcp6_l2_buf[TCP_FRAMES_MEM];
 
 static unsigned int tcp6_l2_buf_used;
-static size_t tcp6_l2_buf_bytes;
 
 /* recvmsg()/sendmsg() data for tap */
 static char 		tcp_buf_discard		[MAX_WINDOW];
@@ -555,7 +553,6 @@ static struct tcp4_l2_flags_buf_t {
 tcp4_l2_flags_buf[TCP_FRAMES_MEM];
 
 static unsigned int tcp4_l2_flags_buf_used;
-static size_t tcp4_l2_flags_buf_bytes;
 
 /**
  * tcp6_l2_flags_buf_t - IPv6 packet buffers for segments without data (flags)
@@ -585,7 +582,6 @@ static struct tcp6_l2_flags_buf_t {
 tcp6_l2_flags_buf[TCP_FRAMES_MEM];
 
 static unsigned int tcp6_l2_flags_buf_used;
-static size_t tcp6_l2_flags_buf_bytes;
 
 /* TCP connections */
 union tcp_conn tc[TCP_MAX_CONNS];
@@ -1418,19 +1414,17 @@ static int tcp_l2_buf_write_one(struct ctx *c, const struct iovec *iov)
  * tcp_l2_buf_flush_passt() - Send a message on the passt tap interface
  * @c:		Execution context
  * @mh:		Message header that was partially sent by sendmsg()
- * @buf_bytes:	Total number of bytes to send
  */
-static void tcp_l2_buf_flush_passt(const struct ctx *c,
-				   const struct msghdr *mh, size_t buf_bytes)
+static void tcp_l2_buf_flush_passt(const struct ctx *c, const struct msghdr *mh)
 {
-	size_t end = 0, missing, sent;
+	size_t end = 0, missing;
 	struct iovec *iov;
 	unsigned int i;
-	ssize_t n;
+	ssize_t sent;
 	char *p;
 
-	n = sendmsg(c->fd_tap, mh, MSG_NOSIGNAL | MSG_DONTWAIT);
-	if (n < 0 || ((sent = (size_t)n) == buf_bytes))
+	sent = sendmsg(c->fd_tap, mh, MSG_NOSIGNAL | MSG_DONTWAIT);
+	if (sent < 0)
 		return;
 
 	/* Ensure a complete last message on partial sendmsg() */
@@ -1441,6 +1435,9 @@ static void tcp_l2_buf_flush_passt(const struct ctx *c,
 	}
 
 	missing = end - sent;
+	if (!missing)
+		return;
+
 	p = (char *)iov->iov_base + iov->iov_len - missing;
 	if (send(c->fd_tap, p, missing, MSG_NOSIGNAL))
 		debug("TCP: failed to flush %lu missing bytes to tap", missing);
@@ -1451,19 +1448,18 @@ static void tcp_l2_buf_flush_passt(const struct ctx *c,
  * @c:		Execution context
  * @mh:		Message header pointing to buffers, msg_iovlen not set
  * @buf_used:	Pointer to count of used buffers, set to 0 on return
- * @buf_bytes:	Pointer to count of buffer bytes, set to 0 on return
  */
 static void tcp_l2_buf_flush(struct ctx *c, struct msghdr *mh,
-			     unsigned int *buf_used, size_t *buf_bytes)
+			     unsigned int *buf_used)
 {
+	size_t i;
+
 	if (!(mh->msg_iovlen = *buf_used))
 		return;
 
 	if (c->mode == MODE_PASST) {
-		tcp_l2_buf_flush_passt(c, mh, *buf_bytes);
+		tcp_l2_buf_flush_passt(c, mh);
 	} else {
-		size_t i;
-
 		for (i = 0; i < mh->msg_iovlen; i++) {
 			struct iovec *iov = &mh->msg_iov[i];
 
@@ -1471,7 +1467,7 @@ static void tcp_l2_buf_flush(struct ctx *c, struct msghdr *mh,
 				i--;
 		}
 	}
-	*buf_used = *buf_bytes = 0;
+	*buf_used = 0;
 
 	pcap_multiple(mh->msg_iov, mh->msg_iovlen, sizeof(uint32_t));
 }
@@ -1484,17 +1480,14 @@ static void tcp_l2_flags_buf_flush(struct ctx *c)
 {
 	struct msghdr mh = { 0 };
 	unsigned int *buf_used;
-	size_t *buf_bytes;
 
 	mh.msg_iov	= tcp6_l2_flags_iov;
 	buf_used	= &tcp6_l2_flags_buf_used;
-	buf_bytes	= &tcp6_l2_flags_buf_bytes;
-	tcp_l2_buf_flush(c, &mh, buf_used, buf_bytes);
+	tcp_l2_buf_flush(c, &mh, buf_used);
 
 	mh.msg_iov	= tcp4_l2_flags_iov;
 	buf_used	= &tcp4_l2_flags_buf_used;
-	buf_bytes	= &tcp4_l2_flags_buf_bytes;
-	tcp_l2_buf_flush(c, &mh, buf_used, buf_bytes);
+	tcp_l2_buf_flush(c, &mh, buf_used);
 }
 
 /**
@@ -1505,17 +1498,14 @@ static void tcp_l2_data_buf_flush(struct ctx *c)
 {
 	struct msghdr mh = { 0 };
 	unsigned int *buf_used;
-	size_t *buf_bytes;
 
 	mh.msg_iov = tcp6_l2_iov;
 	buf_used	= &tcp6_l2_buf_used;
-	buf_bytes	= &tcp6_l2_buf_bytes;
-	tcp_l2_buf_flush(c, &mh, buf_used, buf_bytes);
+	tcp_l2_buf_flush(c, &mh, buf_used);
 
 	mh.msg_iov = tcp4_l2_iov;
 	buf_used	= &tcp4_l2_buf_used;
-	buf_bytes	= &tcp4_l2_buf_bytes;
-	tcp_l2_buf_flush(c, &mh, buf_used, buf_bytes);
+	tcp_l2_buf_flush(c, &mh, buf_used);
 }
 
 /**
@@ -1829,11 +1819,6 @@ static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags)
 					  NULL, conn->seq_to_tap);
 	iov->iov_len = eth_len + sizeof(uint32_t);
 
-	if (CONN_V4(conn))
-		tcp4_l2_flags_buf_bytes += iov->iov_len;
-	else
-		tcp6_l2_flags_buf_bytes += iov->iov_len;
-
 	if (th->ack)
 		conn_flag(c, conn, ~ACK_TO_TAP_DUE);
 
@@ -1849,7 +1834,6 @@ static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags)
 			memcpy(b4 + 1, b4, sizeof(*b4));
 			(iov + 1)->iov_len = iov->iov_len;
 			tcp4_l2_flags_buf_used++;
-			tcp4_l2_flags_buf_bytes += iov->iov_len;
 		}
 
 		if (tcp4_l2_flags_buf_used > ARRAY_SIZE(tcp4_l2_flags_buf) - 2)
@@ -1859,7 +1843,6 @@ static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags)
 			memcpy(b6 + 1, b6, sizeof(*b6));
 			(iov + 1)->iov_len = iov->iov_len;
 			tcp6_l2_flags_buf_used++;
-			tcp6_l2_flags_buf_bytes += iov->iov_len;
 		}
 
 		if (tcp6_l2_flags_buf_used > ARRAY_SIZE(tcp6_l2_flags_buf) - 2)
@@ -2203,7 +2186,7 @@ static void tcp_data_to_tap(struct ctx *c, struct tcp_tap_conn *conn,
 		len = tcp_l2_buf_fill_headers(c, conn, b, plen, check, seq);
 
 		iov = tcp4_l2_iov + tcp4_l2_buf_used++;
-		tcp4_l2_buf_bytes += iov->iov_len = len + sizeof(b->vnet_len);
+		iov->iov_len = len + sizeof(b->vnet_len);
 		if (tcp4_l2_buf_used > ARRAY_SIZE(tcp4_l2_buf) - 1)
 			tcp_l2_data_buf_flush(c);
 	} else if (CONN_V6(conn)) {
@@ -2212,7 +2195,7 @@ static void tcp_data_to_tap(struct ctx *c, struct tcp_tap_conn *conn,
 		len = tcp_l2_buf_fill_headers(c, conn, b, plen, NULL, seq);
 
 		iov = tcp6_l2_iov + tcp6_l2_buf_used++;
-		tcp6_l2_buf_bytes += iov->iov_len = len + sizeof(b->vnet_len);
+		iov->iov_len = len + sizeof(b->vnet_len);
 		if (tcp6_l2_buf_used > ARRAY_SIZE(tcp6_l2_buf) - 1)
 			tcp_l2_data_buf_flush(c);
 	}
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 05/18] tcp: Improve interface to tcp_l2_buf_flush()
  2022-12-09  5:42 [PATCH v2 00/18] RFC: Unify and simplify tap send path David Gibson
                   ` (3 preceding siblings ...)
  2022-12-09  5:42 ` [PATCH v2 04/18] tcp: Don't keep compute total bytes in a message until we need it David Gibson
@ 2022-12-09  5:42 ` David Gibson
  2023-01-04 17:45   ` Stefano Brivio
  2022-12-09  5:42 ` [PATCH v2 06/18] tcp: Combine two parts of pasta tap send path together David Gibson
                   ` (13 subsequent siblings)
  18 siblings, 1 reply; 30+ messages in thread
From: David Gibson @ 2022-12-09  5:42 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

Currently this takes a msghdr, but the only thing we actually care about
in there in is the io vector.  Make it take an io vector directly.  We also
have a weird side effect of zeroing @buf_used.  Just pass this by value
and zero it in the caller instead.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 tcp.c | 59 ++++++++++++++++++++++++-----------------------------------
 1 file changed, 24 insertions(+), 35 deletions(-)

diff --git a/tcp.c b/tcp.c
index 4050ea6..d96122d 100644
--- a/tcp.c
+++ b/tcp.c
@@ -1413,22 +1413,27 @@ static int tcp_l2_buf_write_one(struct ctx *c, const struct iovec *iov)
 /**
  * tcp_l2_buf_flush_passt() - Send a message on the passt tap interface
  * @c:		Execution context
- * @mh:		Message header that was partially sent by sendmsg()
+ * @iov:	Pointer to array of buffers, one per frame
+ * @n:		Number of buffers/frames to flush
  */
-static void tcp_l2_buf_flush_passt(const struct ctx *c, const struct msghdr *mh)
+static void tcp_l2_buf_flush_passt(const struct ctx *c,
+				   const struct iovec *iov, size_t n)
 {
+	struct msghdr mh = {
+		.msg_iov = (void *)iov,
+		.msg_iovlen = n,
+	};
 	size_t end = 0, missing;
-	struct iovec *iov;
 	unsigned int i;
 	ssize_t sent;
 	char *p;
 
-	sent = sendmsg(c->fd_tap, mh, MSG_NOSIGNAL | MSG_DONTWAIT);
+	sent = sendmsg(c->fd_tap, &mh, MSG_NOSIGNAL | MSG_DONTWAIT);
 	if (sent < 0)
 		return;
 
 	/* Ensure a complete last message on partial sendmsg() */
-	for (i = 0, iov = mh->msg_iov; i < mh->msg_iovlen; i++, iov++) {
+	for (i = 0; i < n; i++, iov++) {
 		end += iov->iov_len;
 		if (end >= (size_t)sent)
 			break;
@@ -1446,30 +1451,24 @@ static void tcp_l2_buf_flush_passt(const struct ctx *c, const struct msghdr *mh)
 /**
  * tcp_l2_flags_buf_flush() - Send out buffers for segments with or without data
  * @c:		Execution context
- * @mh:		Message header pointing to buffers, msg_iovlen not set
- * @buf_used:	Pointer to count of used buffers, set to 0 on return
  */
-static void tcp_l2_buf_flush(struct ctx *c, struct msghdr *mh,
-			     unsigned int *buf_used)
+static void tcp_l2_buf_flush(struct ctx *c, const struct iovec *iov, size_t n)
 {
 	size_t i;
 
-	if (!(mh->msg_iovlen = *buf_used))
+	if (!n)
 		return;
 
 	if (c->mode == MODE_PASST) {
-		tcp_l2_buf_flush_passt(c, mh);
+		tcp_l2_buf_flush_passt(c, iov, n);
 	} else {
-		for (i = 0; i < mh->msg_iovlen; i++) {
-			struct iovec *iov = &mh->msg_iov[i];
-
-			if (tcp_l2_buf_write_one(c, iov))
+		for (i = 0; i < n; i++) {
+			if (tcp_l2_buf_write_one(c, iov + i))
 				i--;
 		}
 	}
-	*buf_used = 0;
 
-	pcap_multiple(mh->msg_iov, mh->msg_iovlen, sizeof(uint32_t));
+	pcap_multiple(iov, n, sizeof(uint32_t));
 }
 
 /**
@@ -1478,16 +1477,11 @@ static void tcp_l2_buf_flush(struct ctx *c, struct msghdr *mh,
  */
 static void tcp_l2_flags_buf_flush(struct ctx *c)
 {
-	struct msghdr mh = { 0 };
-	unsigned int *buf_used;
+	tcp_l2_buf_flush(c, tcp6_l2_flags_iov, tcp6_l2_flags_buf_used);
+	tcp6_l2_flags_buf_used = 0;
 
-	mh.msg_iov	= tcp6_l2_flags_iov;
-	buf_used	= &tcp6_l2_flags_buf_used;
-	tcp_l2_buf_flush(c, &mh, buf_used);
-
-	mh.msg_iov	= tcp4_l2_flags_iov;
-	buf_used	= &tcp4_l2_flags_buf_used;
-	tcp_l2_buf_flush(c, &mh, buf_used);
+	tcp_l2_buf_flush(c, tcp4_l2_flags_iov, tcp4_l2_flags_buf_used);
+	tcp4_l2_flags_buf_used = 0;
 }
 
 /**
@@ -1496,16 +1490,11 @@ static void tcp_l2_flags_buf_flush(struct ctx *c)
  */
 static void tcp_l2_data_buf_flush(struct ctx *c)
 {
-	struct msghdr mh = { 0 };
-	unsigned int *buf_used;
-
-	mh.msg_iov = tcp6_l2_iov;
-	buf_used	= &tcp6_l2_buf_used;
-	tcp_l2_buf_flush(c, &mh, buf_used);
+	tcp_l2_buf_flush(c, tcp6_l2_iov, tcp6_l2_buf_used);
+	tcp6_l2_buf_used = 0;
 
-	mh.msg_iov = tcp4_l2_iov;
-	buf_used	= &tcp4_l2_buf_used;
-	tcp_l2_buf_flush(c, &mh, buf_used);
+	tcp_l2_buf_flush(c, tcp4_l2_iov, tcp4_l2_buf_used);
+	tcp4_l2_buf_used = 0;
 }
 
 /**
-- 
@@ -1413,22 +1413,27 @@ static int tcp_l2_buf_write_one(struct ctx *c, const struct iovec *iov)
 /**
  * tcp_l2_buf_flush_passt() - Send a message on the passt tap interface
  * @c:		Execution context
- * @mh:		Message header that was partially sent by sendmsg()
+ * @iov:	Pointer to array of buffers, one per frame
+ * @n:		Number of buffers/frames to flush
  */
-static void tcp_l2_buf_flush_passt(const struct ctx *c, const struct msghdr *mh)
+static void tcp_l2_buf_flush_passt(const struct ctx *c,
+				   const struct iovec *iov, size_t n)
 {
+	struct msghdr mh = {
+		.msg_iov = (void *)iov,
+		.msg_iovlen = n,
+	};
 	size_t end = 0, missing;
-	struct iovec *iov;
 	unsigned int i;
 	ssize_t sent;
 	char *p;
 
-	sent = sendmsg(c->fd_tap, mh, MSG_NOSIGNAL | MSG_DONTWAIT);
+	sent = sendmsg(c->fd_tap, &mh, MSG_NOSIGNAL | MSG_DONTWAIT);
 	if (sent < 0)
 		return;
 
 	/* Ensure a complete last message on partial sendmsg() */
-	for (i = 0, iov = mh->msg_iov; i < mh->msg_iovlen; i++, iov++) {
+	for (i = 0; i < n; i++, iov++) {
 		end += iov->iov_len;
 		if (end >= (size_t)sent)
 			break;
@@ -1446,30 +1451,24 @@ static void tcp_l2_buf_flush_passt(const struct ctx *c, const struct msghdr *mh)
 /**
  * tcp_l2_flags_buf_flush() - Send out buffers for segments with or without data
  * @c:		Execution context
- * @mh:		Message header pointing to buffers, msg_iovlen not set
- * @buf_used:	Pointer to count of used buffers, set to 0 on return
  */
-static void tcp_l2_buf_flush(struct ctx *c, struct msghdr *mh,
-			     unsigned int *buf_used)
+static void tcp_l2_buf_flush(struct ctx *c, const struct iovec *iov, size_t n)
 {
 	size_t i;
 
-	if (!(mh->msg_iovlen = *buf_used))
+	if (!n)
 		return;
 
 	if (c->mode == MODE_PASST) {
-		tcp_l2_buf_flush_passt(c, mh);
+		tcp_l2_buf_flush_passt(c, iov, n);
 	} else {
-		for (i = 0; i < mh->msg_iovlen; i++) {
-			struct iovec *iov = &mh->msg_iov[i];
-
-			if (tcp_l2_buf_write_one(c, iov))
+		for (i = 0; i < n; i++) {
+			if (tcp_l2_buf_write_one(c, iov + i))
 				i--;
 		}
 	}
-	*buf_used = 0;
 
-	pcap_multiple(mh->msg_iov, mh->msg_iovlen, sizeof(uint32_t));
+	pcap_multiple(iov, n, sizeof(uint32_t));
 }
 
 /**
@@ -1478,16 +1477,11 @@ static void tcp_l2_buf_flush(struct ctx *c, struct msghdr *mh,
  */
 static void tcp_l2_flags_buf_flush(struct ctx *c)
 {
-	struct msghdr mh = { 0 };
-	unsigned int *buf_used;
+	tcp_l2_buf_flush(c, tcp6_l2_flags_iov, tcp6_l2_flags_buf_used);
+	tcp6_l2_flags_buf_used = 0;
 
-	mh.msg_iov	= tcp6_l2_flags_iov;
-	buf_used	= &tcp6_l2_flags_buf_used;
-	tcp_l2_buf_flush(c, &mh, buf_used);
-
-	mh.msg_iov	= tcp4_l2_flags_iov;
-	buf_used	= &tcp4_l2_flags_buf_used;
-	tcp_l2_buf_flush(c, &mh, buf_used);
+	tcp_l2_buf_flush(c, tcp4_l2_flags_iov, tcp4_l2_flags_buf_used);
+	tcp4_l2_flags_buf_used = 0;
 }
 
 /**
@@ -1496,16 +1490,11 @@ static void tcp_l2_flags_buf_flush(struct ctx *c)
  */
 static void tcp_l2_data_buf_flush(struct ctx *c)
 {
-	struct msghdr mh = { 0 };
-	unsigned int *buf_used;
-
-	mh.msg_iov = tcp6_l2_iov;
-	buf_used	= &tcp6_l2_buf_used;
-	tcp_l2_buf_flush(c, &mh, buf_used);
+	tcp_l2_buf_flush(c, tcp6_l2_iov, tcp6_l2_buf_used);
+	tcp6_l2_buf_used = 0;
 
-	mh.msg_iov = tcp4_l2_iov;
-	buf_used	= &tcp4_l2_buf_used;
-	tcp_l2_buf_flush(c, &mh, buf_used);
+	tcp_l2_buf_flush(c, tcp4_l2_iov, tcp4_l2_buf_used);
+	tcp4_l2_buf_used = 0;
 }
 
 /**
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 06/18] tcp: Combine two parts of pasta tap send path together
  2022-12-09  5:42 [PATCH v2 00/18] RFC: Unify and simplify tap send path David Gibson
                   ` (4 preceding siblings ...)
  2022-12-09  5:42 ` [PATCH v2 05/18] tcp: Improve interface to tcp_l2_buf_flush() David Gibson
@ 2022-12-09  5:42 ` David Gibson
  2022-12-09  5:42 ` [PATCH v2 07/18] tap, tcp: Move tap send path to tap.c David Gibson
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: David Gibson @ 2022-12-09  5:42 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

tcp_l2_buf_flush() open codes the loop across each frame in a group, but
but calls tcp_l2_buf_write_one() to send each frame to the pasta tuntap
device.  Combine these two pasta-specific operations into
tcp_l2_buf_flush_pasta() which is a little cleaner and will enable further
cleanups.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 tcp.c | 40 ++++++++++++++++++----------------------
 1 file changed, 18 insertions(+), 22 deletions(-)

diff --git a/tcp.c b/tcp.c
index d96122d..9960a35 100644
--- a/tcp.c
+++ b/tcp.c
@@ -1391,23 +1391,25 @@ static void tcp_rst_do(struct ctx *c, struct tcp_tap_conn *conn);
 	} while (0)
 
 /**
- * tcp_l2_buf_write_one() - Write a single buffer to tap file descriptor
+ * tcp_l2_buf_flush_pasta() - Send frames on the pasta tap interface
  * @c:		Execution context
- * @iov:	struct iovec item pointing to buffer
- * @ts:		Current timestamp
- *
- * Return: 0 on success, negative error code on failure (tap reset possible)
+ * @iov:	Pointer to array of buffers, one per frame
+ * @n:		Number of buffers/frames to flush
  */
-static int tcp_l2_buf_write_one(struct ctx *c, const struct iovec *iov)
+static void tcp_l2_buf_flush_pasta(struct ctx *c,
+				  const struct iovec *iov, size_t n)
 {
-	if (write(c->fd_tap, (char *)iov->iov_base + 4, iov->iov_len - 4) < 0) {
-		debug("tap write: %s", strerror(errno));
-		if (errno != EAGAIN && errno != EWOULDBLOCK)
-			tap_handler(c, c->fd_tap, EPOLLERR, NULL);
-		return -errno;
-	}
+	size_t i;
 
-	return 0;
+	for (i = 0; i < n; i++) {
+		if (write(c->fd_tap, (char *)iov->iov_base + 4,
+			  iov->iov_len - 4) < 0) {
+			debug("tap write: %s", strerror(errno));
+			if (errno != EAGAIN && errno != EWOULDBLOCK)
+				tap_handler(c, c->fd_tap, EPOLLERR, NULL);
+			i--;
+		}
+	}
 }
 
 /**
@@ -1454,19 +1456,13 @@ static void tcp_l2_buf_flush_passt(const struct ctx *c,
  */
 static void tcp_l2_buf_flush(struct ctx *c, const struct iovec *iov, size_t n)
 {
-	size_t i;
-
 	if (!n)
 		return;
 
-	if (c->mode == MODE_PASST) {
+	if (c->mode == MODE_PASST)
 		tcp_l2_buf_flush_passt(c, iov, n);
-	} else {
-		for (i = 0; i < n; i++) {
-			if (tcp_l2_buf_write_one(c, iov + i))
-				i--;
-		}
-	}
+	else
+		tcp_l2_buf_flush_pasta(c, iov, n);
 
 	pcap_multiple(iov, n, sizeof(uint32_t));
 }
-- 
@@ -1391,23 +1391,25 @@ static void tcp_rst_do(struct ctx *c, struct tcp_tap_conn *conn);
 	} while (0)
 
 /**
- * tcp_l2_buf_write_one() - Write a single buffer to tap file descriptor
+ * tcp_l2_buf_flush_pasta() - Send frames on the pasta tap interface
  * @c:		Execution context
- * @iov:	struct iovec item pointing to buffer
- * @ts:		Current timestamp
- *
- * Return: 0 on success, negative error code on failure (tap reset possible)
+ * @iov:	Pointer to array of buffers, one per frame
+ * @n:		Number of buffers/frames to flush
  */
-static int tcp_l2_buf_write_one(struct ctx *c, const struct iovec *iov)
+static void tcp_l2_buf_flush_pasta(struct ctx *c,
+				  const struct iovec *iov, size_t n)
 {
-	if (write(c->fd_tap, (char *)iov->iov_base + 4, iov->iov_len - 4) < 0) {
-		debug("tap write: %s", strerror(errno));
-		if (errno != EAGAIN && errno != EWOULDBLOCK)
-			tap_handler(c, c->fd_tap, EPOLLERR, NULL);
-		return -errno;
-	}
+	size_t i;
 
-	return 0;
+	for (i = 0; i < n; i++) {
+		if (write(c->fd_tap, (char *)iov->iov_base + 4,
+			  iov->iov_len - 4) < 0) {
+			debug("tap write: %s", strerror(errno));
+			if (errno != EAGAIN && errno != EWOULDBLOCK)
+				tap_handler(c, c->fd_tap, EPOLLERR, NULL);
+			i--;
+		}
+	}
 }
 
 /**
@@ -1454,19 +1456,13 @@ static void tcp_l2_buf_flush_passt(const struct ctx *c,
  */
 static void tcp_l2_buf_flush(struct ctx *c, const struct iovec *iov, size_t n)
 {
-	size_t i;
-
 	if (!n)
 		return;
 
-	if (c->mode == MODE_PASST) {
+	if (c->mode == MODE_PASST)
 		tcp_l2_buf_flush_passt(c, iov, n);
-	} else {
-		for (i = 0; i < n; i++) {
-			if (tcp_l2_buf_write_one(c, iov + i))
-				i--;
-		}
-	}
+	else
+		tcp_l2_buf_flush_pasta(c, iov, n);
 
 	pcap_multiple(iov, n, sizeof(uint32_t));
 }
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 07/18] tap, tcp: Move tap send path to tap.c
  2022-12-09  5:42 [PATCH v2 00/18] RFC: Unify and simplify tap send path David Gibson
                   ` (5 preceding siblings ...)
  2022-12-09  5:42 ` [PATCH v2 06/18] tcp: Combine two parts of pasta tap send path together David Gibson
@ 2022-12-09  5:42 ` David Gibson
  2022-12-09  5:42 ` [PATCH v2 08/18] util: Introduce hton*_constant() in place of #ifdefs David Gibson
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: David Gibson @ 2022-12-09  5:42 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

The functions which do the final steps of sending TCP packets on through
the tap interface - tcp_l2_buf_flush*() - no longer have anything that's
actually specific to TCP in them, other than comments and names.  Move them
all to tap.c.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 tap.c | 84 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tap.h |  1 +
 tcp.c | 85 +++--------------------------------------------------------
 3 files changed, 89 insertions(+), 81 deletions(-)

diff --git a/tap.c b/tap.c
index 2e603ed..558a734 100644
--- a/tap.c
+++ b/tap.c
@@ -53,6 +53,7 @@
 #include "netlink.h"
 #include "pasta.h"
 #include "packet.h"
+#include "tap.h"
 #include "log.h"
 
 /* IPv4 (plus ARP) and IPv6 message batches from tap/guest to IP handlers */
@@ -302,6 +303,89 @@ void tap_icmp6_send(const struct ctx *c,
 		debug("tap: failed to send %lu bytes (IPv6)", len);
 }
 
+/**
+ * tap_send_frames_pasta() - Send multiple frames to the pasta tap
+ * @c:		Execution context
+ * @iov:	Array of buffers, each containing one frame
+ * @n:		Number of buffers/frames in @iov
+ *
+ * #syscalls:pasta write
+ */
+static void tap_send_frames_pasta(struct ctx *c,
+				  const struct iovec *iov, size_t n)
+{
+	size_t i;
+
+	for (i = 0; i < n; i++) {
+		if (write(c->fd_tap, (char *)iov->iov_base + 4,
+			  iov->iov_len - 4) < 0) {
+			debug("tap write: %s", strerror(errno));
+			if (errno != EAGAIN && errno != EWOULDBLOCK)
+				tap_handler(c, c->fd_tap, EPOLLERR, NULL);
+			i--;
+		}
+	}
+}
+
+/**
+ * tap_send_frames_passt() - Send multiple frames to the passt tap
+ * @c:		Execution context
+ * @iov:	Array of buffers, each containing one frame
+ * @n:		Number of buffers/frames in @iov
+ *
+ * #syscalls:passt sendmsg send
+ */
+static void tap_send_frames_passt(const struct ctx *c,
+				  const struct iovec *iov, size_t n)
+{
+	struct msghdr mh = {
+		.msg_iov = (void *)iov,
+		.msg_iovlen = n,
+	};
+	size_t end = 0, missing;
+	unsigned int i;
+	ssize_t sent;
+	char *p;
+
+	sent = sendmsg(c->fd_tap, &mh, MSG_NOSIGNAL | MSG_DONTWAIT);
+	if (sent < 0)
+		return;
+
+	/* Ensure a complete last message on partial sendmsg() */
+	for (i = 0; i < n; i++, iov++) {
+		end += iov->iov_len;
+		if (end >= (size_t)sent)
+			break;
+	}
+
+	missing = end - sent;
+	if (!missing)
+		return;
+
+	p = (char *)iov->iov_base + iov->iov_len - missing;
+	if (send(c->fd_tap, p, missing, MSG_NOSIGNAL))
+		debug("tap: failed to flush %lu missing bytes to tap", missing);
+}
+
+/**
+ * tap_send_frames() - Send out multiple prepared frames
+ * @c:		Execution context
+ * @iov:	Array of buffers, each containing one frame (with L2 headers)
+ * @n:		Number of buffers/frames in @iov
+ */
+void tap_send_frames(struct ctx *c, const struct iovec *iov, size_t n)
+{
+	if (!n)
+		return;
+
+	if (c->mode == MODE_PASST)
+		tap_send_frames_passt(c, iov, n);
+	else
+		tap_send_frames_pasta(c, iov, n);
+
+	pcap_multiple(iov, n, sizeof(uint32_t));
+}
+
 PACKET_POOL_DECL(pool_l4, UIO_MAXIOV, pkt_buf);
 
 /**
diff --git a/tap.h b/tap.h
index 674ab5c..ceac890 100644
--- a/tap.h
+++ b/tap.h
@@ -22,6 +22,7 @@ void tap_icmp6_send(const struct ctx *c,
 		    const struct in6_addr *src, const struct in6_addr *dst,
 		    void *in, size_t len);
 int tap_send(const struct ctx *c, const void *data, size_t len);
+void tap_send_frames(struct ctx *c, const struct iovec *iov, size_t n);
 void tap_handler(struct ctx *c, int fd, uint32_t events,
 		 const struct timespec *now);
 void tap_sock_init(struct ctx *c);
diff --git a/tcp.c b/tcp.c
index 9960a35..0c46a7e 100644
--- a/tcp.c
+++ b/tcp.c
@@ -1390,93 +1390,16 @@ static void tcp_rst_do(struct ctx *c, struct tcp_tap_conn *conn);
 		tcp_rst_do(c, conn);					\
 	} while (0)
 
-/**
- * tcp_l2_buf_flush_pasta() - Send frames on the pasta tap interface
- * @c:		Execution context
- * @iov:	Pointer to array of buffers, one per frame
- * @n:		Number of buffers/frames to flush
- */
-static void tcp_l2_buf_flush_pasta(struct ctx *c,
-				  const struct iovec *iov, size_t n)
-{
-	size_t i;
-
-	for (i = 0; i < n; i++) {
-		if (write(c->fd_tap, (char *)iov->iov_base + 4,
-			  iov->iov_len - 4) < 0) {
-			debug("tap write: %s", strerror(errno));
-			if (errno != EAGAIN && errno != EWOULDBLOCK)
-				tap_handler(c, c->fd_tap, EPOLLERR, NULL);
-			i--;
-		}
-	}
-}
-
-/**
- * tcp_l2_buf_flush_passt() - Send a message on the passt tap interface
- * @c:		Execution context
- * @iov:	Pointer to array of buffers, one per frame
- * @n:		Number of buffers/frames to flush
- */
-static void tcp_l2_buf_flush_passt(const struct ctx *c,
-				   const struct iovec *iov, size_t n)
-{
-	struct msghdr mh = {
-		.msg_iov = (void *)iov,
-		.msg_iovlen = n,
-	};
-	size_t end = 0, missing;
-	unsigned int i;
-	ssize_t sent;
-	char *p;
-
-	sent = sendmsg(c->fd_tap, &mh, MSG_NOSIGNAL | MSG_DONTWAIT);
-	if (sent < 0)
-		return;
-
-	/* Ensure a complete last message on partial sendmsg() */
-	for (i = 0; i < n; i++, iov++) {
-		end += iov->iov_len;
-		if (end >= (size_t)sent)
-			break;
-	}
-
-	missing = end - sent;
-	if (!missing)
-		return;
-
-	p = (char *)iov->iov_base + iov->iov_len - missing;
-	if (send(c->fd_tap, p, missing, MSG_NOSIGNAL))
-		debug("TCP: failed to flush %lu missing bytes to tap", missing);
-}
-
-/**
- * tcp_l2_flags_buf_flush() - Send out buffers for segments with or without data
- * @c:		Execution context
- */
-static void tcp_l2_buf_flush(struct ctx *c, const struct iovec *iov, size_t n)
-{
-	if (!n)
-		return;
-
-	if (c->mode == MODE_PASST)
-		tcp_l2_buf_flush_passt(c, iov, n);
-	else
-		tcp_l2_buf_flush_pasta(c, iov, n);
-
-	pcap_multiple(iov, n, sizeof(uint32_t));
-}
-
 /**
  * tcp_l2_flags_buf_flush() - Send out buffers for segments with no data (flags)
  * @c:		Execution context
  */
 static void tcp_l2_flags_buf_flush(struct ctx *c)
 {
-	tcp_l2_buf_flush(c, tcp6_l2_flags_iov, tcp6_l2_flags_buf_used);
+	tap_send_frames(c, tcp6_l2_flags_iov, tcp6_l2_flags_buf_used);
 	tcp6_l2_flags_buf_used = 0;
 
-	tcp_l2_buf_flush(c, tcp4_l2_flags_iov, tcp4_l2_flags_buf_used);
+	tap_send_frames(c, tcp4_l2_flags_iov, tcp4_l2_flags_buf_used);
 	tcp4_l2_flags_buf_used = 0;
 }
 
@@ -1486,10 +1409,10 @@ static void tcp_l2_flags_buf_flush(struct ctx *c)
  */
 static void tcp_l2_data_buf_flush(struct ctx *c)
 {
-	tcp_l2_buf_flush(c, tcp6_l2_iov, tcp6_l2_buf_used);
+	tap_send_frames(c, tcp6_l2_iov, tcp6_l2_buf_used);
 	tcp6_l2_buf_used = 0;
 
-	tcp_l2_buf_flush(c, tcp4_l2_iov, tcp4_l2_buf_used);
+	tap_send_frames(c, tcp4_l2_iov, tcp4_l2_buf_used);
 	tcp4_l2_buf_used = 0;
 }
 
-- 
@@ -1390,93 +1390,16 @@ static void tcp_rst_do(struct ctx *c, struct tcp_tap_conn *conn);
 		tcp_rst_do(c, conn);					\
 	} while (0)
 
-/**
- * tcp_l2_buf_flush_pasta() - Send frames on the pasta tap interface
- * @c:		Execution context
- * @iov:	Pointer to array of buffers, one per frame
- * @n:		Number of buffers/frames to flush
- */
-static void tcp_l2_buf_flush_pasta(struct ctx *c,
-				  const struct iovec *iov, size_t n)
-{
-	size_t i;
-
-	for (i = 0; i < n; i++) {
-		if (write(c->fd_tap, (char *)iov->iov_base + 4,
-			  iov->iov_len - 4) < 0) {
-			debug("tap write: %s", strerror(errno));
-			if (errno != EAGAIN && errno != EWOULDBLOCK)
-				tap_handler(c, c->fd_tap, EPOLLERR, NULL);
-			i--;
-		}
-	}
-}
-
-/**
- * tcp_l2_buf_flush_passt() - Send a message on the passt tap interface
- * @c:		Execution context
- * @iov:	Pointer to array of buffers, one per frame
- * @n:		Number of buffers/frames to flush
- */
-static void tcp_l2_buf_flush_passt(const struct ctx *c,
-				   const struct iovec *iov, size_t n)
-{
-	struct msghdr mh = {
-		.msg_iov = (void *)iov,
-		.msg_iovlen = n,
-	};
-	size_t end = 0, missing;
-	unsigned int i;
-	ssize_t sent;
-	char *p;
-
-	sent = sendmsg(c->fd_tap, &mh, MSG_NOSIGNAL | MSG_DONTWAIT);
-	if (sent < 0)
-		return;
-
-	/* Ensure a complete last message on partial sendmsg() */
-	for (i = 0; i < n; i++, iov++) {
-		end += iov->iov_len;
-		if (end >= (size_t)sent)
-			break;
-	}
-
-	missing = end - sent;
-	if (!missing)
-		return;
-
-	p = (char *)iov->iov_base + iov->iov_len - missing;
-	if (send(c->fd_tap, p, missing, MSG_NOSIGNAL))
-		debug("TCP: failed to flush %lu missing bytes to tap", missing);
-}
-
-/**
- * tcp_l2_flags_buf_flush() - Send out buffers for segments with or without data
- * @c:		Execution context
- */
-static void tcp_l2_buf_flush(struct ctx *c, const struct iovec *iov, size_t n)
-{
-	if (!n)
-		return;
-
-	if (c->mode == MODE_PASST)
-		tcp_l2_buf_flush_passt(c, iov, n);
-	else
-		tcp_l2_buf_flush_pasta(c, iov, n);
-
-	pcap_multiple(iov, n, sizeof(uint32_t));
-}
-
 /**
  * tcp_l2_flags_buf_flush() - Send out buffers for segments with no data (flags)
  * @c:		Execution context
  */
 static void tcp_l2_flags_buf_flush(struct ctx *c)
 {
-	tcp_l2_buf_flush(c, tcp6_l2_flags_iov, tcp6_l2_flags_buf_used);
+	tap_send_frames(c, tcp6_l2_flags_iov, tcp6_l2_flags_buf_used);
 	tcp6_l2_flags_buf_used = 0;
 
-	tcp_l2_buf_flush(c, tcp4_l2_flags_iov, tcp4_l2_flags_buf_used);
+	tap_send_frames(c, tcp4_l2_flags_iov, tcp4_l2_flags_buf_used);
 	tcp4_l2_flags_buf_used = 0;
 }
 
@@ -1486,10 +1409,10 @@ static void tcp_l2_flags_buf_flush(struct ctx *c)
  */
 static void tcp_l2_data_buf_flush(struct ctx *c)
 {
-	tcp_l2_buf_flush(c, tcp6_l2_iov, tcp6_l2_buf_used);
+	tap_send_frames(c, tcp6_l2_iov, tcp6_l2_buf_used);
 	tcp6_l2_buf_used = 0;
 
-	tcp_l2_buf_flush(c, tcp4_l2_iov, tcp4_l2_buf_used);
+	tap_send_frames(c, tcp4_l2_iov, tcp4_l2_buf_used);
 	tcp4_l2_buf_used = 0;
 }
 
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 08/18] util: Introduce hton*_constant() in place of #ifdefs
  2022-12-09  5:42 [PATCH v2 00/18] RFC: Unify and simplify tap send path David Gibson
                   ` (6 preceding siblings ...)
  2022-12-09  5:42 ` [PATCH v2 07/18] tap, tcp: Move tap send path to tap.c David Gibson
@ 2022-12-09  5:42 ` David Gibson
  2022-12-09  5:42 ` [PATCH v2 09/18] tcp, udp: Use named field initializers in iov_init functions David Gibson
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: David Gibson @ 2022-12-09  5:42 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

We have several places where we have fairly ugly #ifdefs on __BYTE_ORDER
where we need network order values in a constant expression (so we can't
use htons() or htonl()).  We can do this more cleanly by using a single
__BYTE_ORDER ifdef to define htons_constant() and htonl_constant()
macros, then using those in all the other places.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 dhcpv6.c | 50 +++++++++++++-------------------------------------
 util.h   | 37 +++++++++++--------------------------
 2 files changed, 24 insertions(+), 63 deletions(-)

diff --git a/dhcpv6.c b/dhcpv6.c
index e763aed..d0e6c7f 100644
--- a/dhcpv6.c
+++ b/dhcpv6.c
@@ -39,37 +39,21 @@
  */
 struct opt_hdr {
 	uint16_t t;
-#if __BYTE_ORDER == __BIG_ENDIAN
-# define OPT_CLIENTID		1
-# define OPT_SERVERID		2
-# define OPT_IA_NA		3
-# define OPT_IA_TA		4
-# define OPT_IAAADR		5
-# define OPT_STATUS_CODE	13
-# define  STATUS_NOTONLINK	4
-# define OPT_DNS_SERVERS	23
-# define OPT_DNS_SEARCH		24
-#else
-# define OPT_CLIENTID		__bswap_constant_16(1)
-# define OPT_SERVERID		__bswap_constant_16(2)
-# define OPT_IA_NA		__bswap_constant_16(3)
-# define OPT_IA_TA		__bswap_constant_16(4)
-# define OPT_IAAADR		__bswap_constant_16(5)
-# define OPT_STATUS_CODE	__bswap_constant_16(13)
-# define  STATUS_NOTONLINK	__bswap_constant_16(4)
-# define OPT_DNS_SERVERS	__bswap_constant_16(23)
-# define OPT_DNS_SEARCH		__bswap_constant_16(24)
-#endif
+# define OPT_CLIENTID		htons_constant(1)
+# define OPT_SERVERID		htons_constant(2)
+# define OPT_IA_NA		htons_constant(3)
+# define OPT_IA_TA		htons_constant(4)
+# define OPT_IAAADR		htons_constant(5)
+# define OPT_STATUS_CODE	htons_constant(13)
+# define  STATUS_NOTONLINK	htons_constant(4)
+# define OPT_DNS_SERVERS	htons_constant(23)
+# define OPT_DNS_SEARCH		htons_constant(24)
 #define   STR_NOTONLINK		"Prefix not appropriate for link."
 
 	uint16_t l;
 } __attribute__((packed));
 
-#if __BYTE_ORDER == __BIG_ENDIAN
-# define OPT_SIZE_CONV(x)	(x)
-#else
-# define OPT_SIZE_CONV(x)	(__bswap_constant_16(x))
-#endif
+# define OPT_SIZE_CONV(x)	(htons_constant(x))
 #define OPT_SIZE(x)		OPT_SIZE_CONV(sizeof(struct opt_##x) -	\
 					      sizeof(struct opt_hdr))
 #define OPT_VSIZE(x)		(sizeof(struct opt_##x) - 		\
@@ -103,19 +87,11 @@ struct opt_server_id {
 	uint8_t duid_lladdr[ETH_ALEN];
 } __attribute__ ((packed));
 
-#if __BYTE_ORDER == __BIG_ENDIAN
-#define SERVER_ID {						\
+#define SERVER_ID {							\
 	{ OPT_SERVERID,	OPT_SIZE(server_id) },				\
-	  DUID_TYPE_LLT, ARPHRD_ETHER, 0, { 0 }				\
+	  htons_constant(DUID_TYPE_LLT),				\
+	  htons_constant(ARPHRD_ETHER), 0, { 0 }			\
 }
-#else
-#define SERVER_ID {						\
-	{ OPT_SERVERID,	OPT_SIZE(server_id) },				\
-	__bswap_constant_16(DUID_TYPE_LLT),				\
-	__bswap_constant_16(ARPHRD_ETHER),				\
-	0, { 0 }							\
-}
-#endif
 
 /**
  * struct opt_ia_na - Identity Association for Non-temporary Addresses Option
diff --git a/util.h b/util.h
index 3baec91..b4c139d 100644
--- a/util.h
+++ b/util.h
@@ -69,6 +69,14 @@
 #define MAC_ZERO		((uint8_t [ETH_ALEN]){ 0 })
 #define MAC_IS_ZERO(addr)	(!memcmp((addr), MAC_ZERO, ETH_ALEN))
 
+#if __BYTE_ORDER == __BIG_ENDIAN
+#define	htons_constant(x)	(x)
+#define	htonl_constant(x)	(x)
+#else
+#define	htons_constant(x)	(__bswap_constant_16(x))
+#define	htonl_constant(x)	(__bswap_constant_32(x))
+#endif
+
 #define IN4_IS_ADDR_UNSPECIFIED(a) \
 	((a)->s_addr == htonl(INADDR_ANY))
 #define IN4_IS_ADDR_BROADCAST(a) \
@@ -79,13 +87,8 @@
 	(IN_MULTICAST(ntohl((a)->s_addr)))
 #define IN4_ARE_ADDR_EQUAL(a, b) \
 	(((struct in_addr *)(a))->s_addr == ((struct in_addr *)b)->s_addr)
-#if __BYTE_ORDER == __BIG_ENDIAN
-#define IN4ADDR_LOOPBACK_INIT \
-	{ .s_addr	= INADDR_LOOPBACK }
-#else
 #define IN4ADDR_LOOPBACK_INIT \
-	{ .s_addr	= __bswap_constant_32(INADDR_LOOPBACK) }
-#endif
+	{ .s_addr	= htonl_constant(INADDR_LOOPBACK) }
 
 #define NS_FN_STACK_SIZE	(RLIMIT_STACK_VAL * 1024 / 8)
 int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags,
@@ -99,37 +102,19 @@ int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags,
 			 (void *)(arg));				\
 	} while (0)
 
-#if __BYTE_ORDER == __BIG_ENDIAN
-#define L2_BUF_ETH_IP4_INIT						\
-	{								\
-		.h_dest		= { 0 },				\
-		.h_source	= { 0 },				\
-		.h_proto	= ETH_P_IP,				\
-	}
-#else
 #define L2_BUF_ETH_IP4_INIT						\
 	{								\
 		.h_dest		= { 0 },				\
 		.h_source	= { 0 },				\
-		.h_proto	= __bswap_constant_16(ETH_P_IP),	\
+		.h_proto	= htons_constant(ETH_P_IP),		\
 	}
-#endif
 
-#if __BYTE_ORDER == __BIG_ENDIAN
-#define L2_BUF_ETH_IP6_INIT						\
-	{								\
-		.h_dest		= { 0 },				\
-		.h_source	= { 0 },				\
-		.h_proto	= ETH_P_IPV6,				\
-	}
-#else
 #define L2_BUF_ETH_IP6_INIT						\
 	{								\
 		.h_dest		= { 0 },				\
 		.h_source	= { 0 },				\
-		.h_proto	= __bswap_constant_16(ETH_P_IPV6),	\
+		.h_proto	= htons_constant(ETH_P_IPV6),		\
 	}
-#endif
 
 #define L2_BUF_IP4_INIT(proto)						\
 	{								\
-- 
@@ -69,6 +69,14 @@
 #define MAC_ZERO		((uint8_t [ETH_ALEN]){ 0 })
 #define MAC_IS_ZERO(addr)	(!memcmp((addr), MAC_ZERO, ETH_ALEN))
 
+#if __BYTE_ORDER == __BIG_ENDIAN
+#define	htons_constant(x)	(x)
+#define	htonl_constant(x)	(x)
+#else
+#define	htons_constant(x)	(__bswap_constant_16(x))
+#define	htonl_constant(x)	(__bswap_constant_32(x))
+#endif
+
 #define IN4_IS_ADDR_UNSPECIFIED(a) \
 	((a)->s_addr == htonl(INADDR_ANY))
 #define IN4_IS_ADDR_BROADCAST(a) \
@@ -79,13 +87,8 @@
 	(IN_MULTICAST(ntohl((a)->s_addr)))
 #define IN4_ARE_ADDR_EQUAL(a, b) \
 	(((struct in_addr *)(a))->s_addr == ((struct in_addr *)b)->s_addr)
-#if __BYTE_ORDER == __BIG_ENDIAN
-#define IN4ADDR_LOOPBACK_INIT \
-	{ .s_addr	= INADDR_LOOPBACK }
-#else
 #define IN4ADDR_LOOPBACK_INIT \
-	{ .s_addr	= __bswap_constant_32(INADDR_LOOPBACK) }
-#endif
+	{ .s_addr	= htonl_constant(INADDR_LOOPBACK) }
 
 #define NS_FN_STACK_SIZE	(RLIMIT_STACK_VAL * 1024 / 8)
 int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags,
@@ -99,37 +102,19 @@ int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags,
 			 (void *)(arg));				\
 	} while (0)
 
-#if __BYTE_ORDER == __BIG_ENDIAN
-#define L2_BUF_ETH_IP4_INIT						\
-	{								\
-		.h_dest		= { 0 },				\
-		.h_source	= { 0 },				\
-		.h_proto	= ETH_P_IP,				\
-	}
-#else
 #define L2_BUF_ETH_IP4_INIT						\
 	{								\
 		.h_dest		= { 0 },				\
 		.h_source	= { 0 },				\
-		.h_proto	= __bswap_constant_16(ETH_P_IP),	\
+		.h_proto	= htons_constant(ETH_P_IP),		\
 	}
-#endif
 
-#if __BYTE_ORDER == __BIG_ENDIAN
-#define L2_BUF_ETH_IP6_INIT						\
-	{								\
-		.h_dest		= { 0 },				\
-		.h_source	= { 0 },				\
-		.h_proto	= ETH_P_IPV6,				\
-	}
-#else
 #define L2_BUF_ETH_IP6_INIT						\
 	{								\
 		.h_dest		= { 0 },				\
 		.h_source	= { 0 },				\
-		.h_proto	= __bswap_constant_16(ETH_P_IPV6),	\
+		.h_proto	= htons_constant(ETH_P_IPV6),		\
 	}
-#endif
 
 #define L2_BUF_IP4_INIT(proto)						\
 	{								\
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 09/18] tcp, udp: Use named field initializers in iov_init functions
  2022-12-09  5:42 [PATCH v2 00/18] RFC: Unify and simplify tap send path David Gibson
                   ` (7 preceding siblings ...)
  2022-12-09  5:42 ` [PATCH v2 08/18] util: Introduce hton*_constant() in place of #ifdefs David Gibson
@ 2022-12-09  5:42 ` David Gibson
  2022-12-09  5:42 ` [PATCH v2 10/18] util: Parameterize ethernet header initializer macro David Gibson
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: David Gibson @ 2022-12-09  5:42 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

Both the TCP and UDP iov_init functions have some large structure literals
defined in "field order" style.  These are pretty hard to read since it's
not obvious what value corresponds to what field.  Use named field style
initializers instead to make this clearer.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 tcp.c | 26 ++++++++++++--------------
 udp.c | 13 ++++---------
 2 files changed, 16 insertions(+), 23 deletions(-)

diff --git a/tcp.c b/tcp.c
index 0c46a7e..79fc733 100644
--- a/tcp.c
+++ b/tcp.c
@@ -1058,18 +1058,17 @@ static void tcp_sock4_iov_init(void)
 	int i;
 
 	for (i = 0; i < ARRAY_SIZE(tcp4_l2_buf); i++) {
-		tcp4_l2_buf[i] = (struct tcp4_l2_buf_t) { 0, 0,
-			{ 0 },
-			0, L2_BUF_ETH_IP4_INIT, L2_BUF_IP4_INIT(IPPROTO_TCP),
-			{ .doff = sizeof(struct tcphdr) / 4, .ack = 1 }, { 0 },
+		tcp4_l2_buf[i] = (struct tcp4_l2_buf_t) {
+			.eh = L2_BUF_ETH_IP4_INIT,
+			.iph = L2_BUF_IP4_INIT(IPPROTO_TCP),
+			.th = { .doff = sizeof(struct tcphdr) / 4, .ack = 1 }
 		};
 	}
 
 	for (i = 0; i < ARRAY_SIZE(tcp4_l2_flags_buf); i++) {
-		tcp4_l2_flags_buf[i] = (struct tcp4_l2_flags_buf_t) { 0, 0,
-			{ 0 },
-			0, L2_BUF_ETH_IP4_INIT, L2_BUF_IP4_INIT(IPPROTO_TCP),
-			{ 0 }, { 0 },
+		tcp4_l2_flags_buf[i] = (struct tcp4_l2_flags_buf_t) {
+			.eh = L2_BUF_ETH_IP4_INIT,
+			.iph = L2_BUF_IP4_INIT(IPPROTO_TCP)
 		};
 	}
 
@@ -1092,17 +1091,16 @@ static void tcp_sock6_iov_init(void)
 
 	for (i = 0; i < ARRAY_SIZE(tcp6_l2_buf); i++) {
 		tcp6_l2_buf[i] = (struct tcp6_l2_buf_t) {
-			{ 0 },
-			0, L2_BUF_ETH_IP6_INIT, L2_BUF_IP6_INIT(IPPROTO_TCP),
-			{ .doff = sizeof(struct tcphdr) / 4, .ack = 1 }, { 0 },
+			.eh = L2_BUF_ETH_IP6_INIT,
+			.ip6h = L2_BUF_IP6_INIT(IPPROTO_TCP),
+			.th = { .doff = sizeof(struct tcphdr) / 4, .ack = 1 }
 		};
 	}
 
 	for (i = 0; i < ARRAY_SIZE(tcp6_l2_flags_buf); i++) {
 		tcp6_l2_flags_buf[i] = (struct tcp6_l2_flags_buf_t) {
-			{ 0 },
-			0, L2_BUF_ETH_IP6_INIT, L2_BUF_IP6_INIT(IPPROTO_TCP),
-			{ 0 }, { 0 },
+			.eh = L2_BUF_ETH_IP6_INIT,
+			.ip6h = L2_BUF_IP6_INIT(IPPROTO_TCP)
 		};
 	}
 
diff --git a/udp.c b/udp.c
index f7b9bdc..ffedca1 100644
--- a/udp.c
+++ b/udp.c
@@ -329,9 +329,8 @@ static void udp_sock4_iov_init(const struct ctx *c)
 
 	for (i = 0; i < ARRAY_SIZE(udp4_l2_buf); i++) {
 		udp4_l2_buf[i] = (struct udp4_l2_buf_t) {
-			{ 0 }, 0, 0,
-			L2_BUF_ETH_IP4_INIT, L2_BUF_IP4_INIT(IPPROTO_UDP),
-			{{{ 0 }}}, { 0 },
+			.eh = L2_BUF_ETH_IP4_INIT,
+			.iph = L2_BUF_IP4_INIT(IPPROTO_UDP)
 		};
 	}
 
@@ -371,12 +370,8 @@ static void udp_sock6_iov_init(const struct ctx *c)
 
 	for (i = 0; i < ARRAY_SIZE(udp6_l2_buf); i++) {
 		udp6_l2_buf[i] = (struct udp6_l2_buf_t) {
-			{ 0 },
-#ifdef __AVX2__
-			{ 0 },
-#endif
-			0, L2_BUF_ETH_IP6_INIT, L2_BUF_IP6_INIT(IPPROTO_UDP),
-			{{{ 0 }}}, { 0 },
+			.eh = L2_BUF_ETH_IP6_INIT,
+			.ip6h = L2_BUF_IP6_INIT(IPPROTO_UDP)
 		};
 	}
 
-- 
@@ -329,9 +329,8 @@ static void udp_sock4_iov_init(const struct ctx *c)
 
 	for (i = 0; i < ARRAY_SIZE(udp4_l2_buf); i++) {
 		udp4_l2_buf[i] = (struct udp4_l2_buf_t) {
-			{ 0 }, 0, 0,
-			L2_BUF_ETH_IP4_INIT, L2_BUF_IP4_INIT(IPPROTO_UDP),
-			{{{ 0 }}}, { 0 },
+			.eh = L2_BUF_ETH_IP4_INIT,
+			.iph = L2_BUF_IP4_INIT(IPPROTO_UDP)
 		};
 	}
 
@@ -371,12 +370,8 @@ static void udp_sock6_iov_init(const struct ctx *c)
 
 	for (i = 0; i < ARRAY_SIZE(udp6_l2_buf); i++) {
 		udp6_l2_buf[i] = (struct udp6_l2_buf_t) {
-			{ 0 },
-#ifdef __AVX2__
-			{ 0 },
-#endif
-			0, L2_BUF_ETH_IP6_INIT, L2_BUF_IP6_INIT(IPPROTO_UDP),
-			{{{ 0 }}}, { 0 },
+			.eh = L2_BUF_ETH_IP6_INIT,
+			.ip6h = L2_BUF_IP6_INIT(IPPROTO_UDP)
 		};
 	}
 
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 10/18] util: Parameterize ethernet header initializer macro
  2022-12-09  5:42 [PATCH v2 00/18] RFC: Unify and simplify tap send path David Gibson
                   ` (8 preceding siblings ...)
  2022-12-09  5:42 ` [PATCH v2 09/18] tcp, udp: Use named field initializers in iov_init functions David Gibson
@ 2022-12-09  5:42 ` David Gibson
  2022-12-09  5:42 ` [PATCH v2 11/18] tcp: Remove redundant and incorrect initialization from *_iov_init() David Gibson
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: David Gibson @ 2022-12-09  5:42 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

We have separate IPv4 and IPv6 versions of a macro to construct an
initializer for ethernet headers.  However, now that we have htons_constant
it's easy to simply paramterize this with the ethernet protocol number.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 tcp.c  |  8 ++++----
 udp.c  |  4 ++--
 util.h | 11 ++---------
 3 files changed, 8 insertions(+), 15 deletions(-)

diff --git a/tcp.c b/tcp.c
index 79fc733..b30e8e6 100644
--- a/tcp.c
+++ b/tcp.c
@@ -1059,7 +1059,7 @@ static void tcp_sock4_iov_init(void)
 
 	for (i = 0; i < ARRAY_SIZE(tcp4_l2_buf); i++) {
 		tcp4_l2_buf[i] = (struct tcp4_l2_buf_t) {
-			.eh = L2_BUF_ETH_IP4_INIT,
+			.eh = L2_BUF_ETH_INIT(ETH_P_IP),
 			.iph = L2_BUF_IP4_INIT(IPPROTO_TCP),
 			.th = { .doff = sizeof(struct tcphdr) / 4, .ack = 1 }
 		};
@@ -1067,7 +1067,7 @@ static void tcp_sock4_iov_init(void)
 
 	for (i = 0; i < ARRAY_SIZE(tcp4_l2_flags_buf); i++) {
 		tcp4_l2_flags_buf[i] = (struct tcp4_l2_flags_buf_t) {
-			.eh = L2_BUF_ETH_IP4_INIT,
+			.eh = L2_BUF_ETH_INIT(ETH_P_IP),
 			.iph = L2_BUF_IP4_INIT(IPPROTO_TCP)
 		};
 	}
@@ -1091,7 +1091,7 @@ static void tcp_sock6_iov_init(void)
 
 	for (i = 0; i < ARRAY_SIZE(tcp6_l2_buf); i++) {
 		tcp6_l2_buf[i] = (struct tcp6_l2_buf_t) {
-			.eh = L2_BUF_ETH_IP6_INIT,
+			.eh = L2_BUF_ETH_INIT(ETH_P_IPV6),
 			.ip6h = L2_BUF_IP6_INIT(IPPROTO_TCP),
 			.th = { .doff = sizeof(struct tcphdr) / 4, .ack = 1 }
 		};
@@ -1099,7 +1099,7 @@ static void tcp_sock6_iov_init(void)
 
 	for (i = 0; i < ARRAY_SIZE(tcp6_l2_flags_buf); i++) {
 		tcp6_l2_flags_buf[i] = (struct tcp6_l2_flags_buf_t) {
-			.eh = L2_BUF_ETH_IP6_INIT,
+			.eh = L2_BUF_ETH_INIT(ETH_P_IPV6),
 			.ip6h = L2_BUF_IP6_INIT(IPPROTO_TCP)
 		};
 	}
diff --git a/udp.c b/udp.c
index ffedca1..5609836 100644
--- a/udp.c
+++ b/udp.c
@@ -329,7 +329,7 @@ static void udp_sock4_iov_init(const struct ctx *c)
 
 	for (i = 0; i < ARRAY_SIZE(udp4_l2_buf); i++) {
 		udp4_l2_buf[i] = (struct udp4_l2_buf_t) {
-			.eh = L2_BUF_ETH_IP4_INIT,
+			.eh = L2_BUF_ETH_INIT(ETH_P_IP),
 			.iph = L2_BUF_IP4_INIT(IPPROTO_UDP)
 		};
 	}
@@ -370,7 +370,7 @@ static void udp_sock6_iov_init(const struct ctx *c)
 
 	for (i = 0; i < ARRAY_SIZE(udp6_l2_buf); i++) {
 		udp6_l2_buf[i] = (struct udp6_l2_buf_t) {
-			.eh = L2_BUF_ETH_IP6_INIT,
+			.eh = L2_BUF_ETH_INIT(ETH_P_IPV6),
 			.ip6h = L2_BUF_IP6_INIT(IPPROTO_UDP)
 		};
 	}
diff --git a/util.h b/util.h
index b4c139d..b5be7a6 100644
--- a/util.h
+++ b/util.h
@@ -102,18 +102,11 @@ int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags,
 			 (void *)(arg));				\
 	} while (0)
 
-#define L2_BUF_ETH_IP4_INIT						\
+#define L2_BUF_ETH_INIT(proto)						\
 	{								\
 		.h_dest		= { 0 },				\
 		.h_source	= { 0 },				\
-		.h_proto	= htons_constant(ETH_P_IP),		\
-	}
-
-#define L2_BUF_ETH_IP6_INIT						\
-	{								\
-		.h_dest		= { 0 },				\
-		.h_source	= { 0 },				\
-		.h_proto	= htons_constant(ETH_P_IPV6),		\
+		.h_proto	= htons_constant(proto),		\
 	}
 
 #define L2_BUF_IP4_INIT(proto)						\
-- 
@@ -102,18 +102,11 @@ int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags,
 			 (void *)(arg));				\
 	} while (0)
 
-#define L2_BUF_ETH_IP4_INIT						\
+#define L2_BUF_ETH_INIT(proto)						\
 	{								\
 		.h_dest		= { 0 },				\
 		.h_source	= { 0 },				\
-		.h_proto	= htons_constant(ETH_P_IP),		\
-	}
-
-#define L2_BUF_ETH_IP6_INIT						\
-	{								\
-		.h_dest		= { 0 },				\
-		.h_source	= { 0 },				\
-		.h_proto	= htons_constant(ETH_P_IPV6),		\
+		.h_proto	= htons_constant(proto),		\
 	}
 
 #define L2_BUF_IP4_INIT(proto)						\
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 11/18] tcp: Remove redundant and incorrect initialization from *_iov_init()
  2022-12-09  5:42 [PATCH v2 00/18] RFC: Unify and simplify tap send path David Gibson
                   ` (9 preceding siblings ...)
  2022-12-09  5:42 ` [PATCH v2 10/18] util: Parameterize ethernet header initializer macro David Gibson
@ 2022-12-09  5:42 ` David Gibson
  2022-12-09  5:42 ` [PATCH v2 12/18] tcp: Consolidate calculation of total frame size David Gibson
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: David Gibson @ 2022-12-09  5:42 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

tcp_sock[46]_iov_init() initialize the length of each iovec buffer to
MSS_DEFAULT.  That will always be overwritten before use in
tcp_data_to_tap, so it's redundant.  It also wasn't correct, because it
didn't correctly account for the header lengths in all cases.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 tcp.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/tcp.c b/tcp.c
index b30e8e6..0920c2a 100644
--- a/tcp.c
+++ b/tcp.c
@@ -1072,10 +1072,8 @@ static void tcp_sock4_iov_init(void)
 		};
 	}
 
-	for (i = 0, iov = tcp4_l2_iov; i < TCP_FRAMES_MEM; i++, iov++) {
+	for (i = 0, iov = tcp4_l2_iov; i < TCP_FRAMES_MEM; i++, iov++)
 		iov->iov_base = &tcp4_l2_buf[i].vnet_len;
-		iov->iov_len = MSS_DEFAULT;
-	}
 
 	for (i = 0, iov = tcp4_l2_flags_iov; i < TCP_FRAMES_MEM; i++, iov++)
 		iov->iov_base = &tcp4_l2_flags_buf[i].vnet_len;
@@ -1104,10 +1102,8 @@ static void tcp_sock6_iov_init(void)
 		};
 	}
 
-	for (i = 0, iov = tcp6_l2_iov; i < TCP_FRAMES_MEM; i++, iov++) {
+	for (i = 0, iov = tcp6_l2_iov; i < TCP_FRAMES_MEM; i++, iov++)
 		iov->iov_base = &tcp6_l2_buf[i].vnet_len;
-		iov->iov_len = MSS_DEFAULT;
-	}
 
 	for (i = 0, iov = tcp6_l2_flags_iov; i < TCP_FRAMES_MEM; i++, iov++)
 		iov->iov_base = &tcp6_l2_flags_buf[i].vnet_len;
-- 
@@ -1072,10 +1072,8 @@ static void tcp_sock4_iov_init(void)
 		};
 	}
 
-	for (i = 0, iov = tcp4_l2_iov; i < TCP_FRAMES_MEM; i++, iov++) {
+	for (i = 0, iov = tcp4_l2_iov; i < TCP_FRAMES_MEM; i++, iov++)
 		iov->iov_base = &tcp4_l2_buf[i].vnet_len;
-		iov->iov_len = MSS_DEFAULT;
-	}
 
 	for (i = 0, iov = tcp4_l2_flags_iov; i < TCP_FRAMES_MEM; i++, iov++)
 		iov->iov_base = &tcp4_l2_flags_buf[i].vnet_len;
@@ -1104,10 +1102,8 @@ static void tcp_sock6_iov_init(void)
 		};
 	}
 
-	for (i = 0, iov = tcp6_l2_iov; i < TCP_FRAMES_MEM; i++, iov++) {
+	for (i = 0, iov = tcp6_l2_iov; i < TCP_FRAMES_MEM; i++, iov++)
 		iov->iov_base = &tcp6_l2_buf[i].vnet_len;
-		iov->iov_len = MSS_DEFAULT;
-	}
 
 	for (i = 0, iov = tcp6_l2_flags_iov; i < TCP_FRAMES_MEM; i++, iov++)
 		iov->iov_base = &tcp6_l2_flags_buf[i].vnet_len;
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 12/18] tcp: Consolidate calculation of total frame size
  2022-12-09  5:42 [PATCH v2 00/18] RFC: Unify and simplify tap send path David Gibson
                   ` (10 preceding siblings ...)
  2022-12-09  5:42 ` [PATCH v2 11/18] tcp: Remove redundant and incorrect initialization from *_iov_init() David Gibson
@ 2022-12-09  5:42 ` David Gibson
  2022-12-09  5:42 ` [PATCH v2 13/18] tap: Add "tap headers" abstraction David Gibson
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: David Gibson @ 2022-12-09  5:42 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

tcp_l2_buf_fill_headers() returns the size of the generated frame including
the ethernet header.  The caller then adds on the size of the vnet_len
field to get the total frame size to be passed to the tap device.

Outside the tap code, though, we never care about the ethernet header size
only the final total size we need to put into an iovec.  So, consolidate
the total frame size calculation within tcp_l2_buf_fill_headers().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 tcp.c | 35 ++++++++++++++++-------------------
 1 file changed, 16 insertions(+), 19 deletions(-)

diff --git a/tcp.c b/tcp.c
index 0920c2a..8bcb039 100644
--- a/tcp.c
+++ b/tcp.c
@@ -1447,7 +1447,7 @@ void tcp_defer_handler(struct ctx *c)
  * @check:	Checksum, if already known
  * @seq:	Sequence number for this segment
  *
- * Return: 802.3 length, host order
+ * Return: frame length including L2 headers, host order
  */
 static size_t tcp_l2_buf_fill_headers(const struct ctx *c,
 				      const struct tcp_tap_conn *conn,
@@ -1455,7 +1455,7 @@ static size_t tcp_l2_buf_fill_headers(const struct ctx *c,
 				      const uint16_t *check, uint32_t seq)
 {
 	const struct in_addr *a4 = inany_v4(&conn->addr);
-	size_t ip_len, eth_len;
+	size_t ip_len, tlen;
 
 #define SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq)			\
 do {									\
@@ -1489,9 +1489,10 @@ do {									\
 
 		tcp_update_check_tcp4(b);
 
-		eth_len = ip_len + sizeof(struct ethhdr);
+		tlen = ip_len + sizeof(struct ethhdr);
 		if (c->mode == MODE_PASST)
-			b->vnet_len = htonl(eth_len);
+			b->vnet_len = htonl(tlen);
+		tlen += sizeof(b->vnet_len);
 	} else {
 		struct tcp6_l2_buf_t *b = (struct tcp6_l2_buf_t *)p;
 
@@ -1514,14 +1515,14 @@ do {									\
 		b->ip6h.flow_lbl[1] = (conn->sock >> 8) & 0xff;
 		b->ip6h.flow_lbl[2] = (conn->sock >> 0) & 0xff;
 
-		eth_len = ip_len + sizeof(struct ethhdr);
+		tlen = ip_len + sizeof(struct ethhdr);
 		if (c->mode == MODE_PASST)
-			b->vnet_len = htonl(eth_len);
+			b->vnet_len = htonl(tlen);
+		tlen += sizeof(b->vnet_len);
 	}
-
 #undef SET_TCP_HEADER_COMMON_V4_V6
 
-	return eth_len;
+	return tlen;
 }
 
 /**
@@ -1627,8 +1628,8 @@ static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags)
 	struct tcp6_l2_flags_buf_t *b6 = NULL;
 	struct tcp_info tinfo = { 0 };
 	socklen_t sl = sizeof(tinfo);
-	size_t optlen = 0, eth_len;
 	int s = conn->sock;
+	size_t optlen = 0;
 	struct iovec *iov;
 	struct tcphdr *th;
 	char *data;
@@ -1717,9 +1718,8 @@ static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags)
 	th->syn = !!(flags & SYN);
 	th->fin = !!(flags & FIN);
 
-	eth_len = tcp_l2_buf_fill_headers(c, conn, p, optlen,
-					  NULL, conn->seq_to_tap);
-	iov->iov_len = eth_len + sizeof(uint32_t);
+	iov->iov_len = tcp_l2_buf_fill_headers(c, conn, p, optlen,
+					       NULL, conn->seq_to_tap);
 
 	if (th->ack)
 		conn_flag(c, conn, ~ACK_TO_TAP_DUE);
@@ -2079,25 +2079,22 @@ static void tcp_data_to_tap(struct ctx *c, struct tcp_tap_conn *conn,
 			    ssize_t plen, int no_csum, uint32_t seq)
 {
 	struct iovec *iov;
-	size_t len;
 
 	if (CONN_V4(conn)) {
 		struct tcp4_l2_buf_t *b = &tcp4_l2_buf[tcp4_l2_buf_used];
 		uint16_t *check = no_csum ? &(b - 1)->iph.check : NULL;
 
-		len = tcp_l2_buf_fill_headers(c, conn, b, plen, check, seq);
-
 		iov = tcp4_l2_iov + tcp4_l2_buf_used++;
-		iov->iov_len = len + sizeof(b->vnet_len);
+		iov->iov_len = tcp_l2_buf_fill_headers(c, conn, b, plen,
+						       check, seq);
 		if (tcp4_l2_buf_used > ARRAY_SIZE(tcp4_l2_buf) - 1)
 			tcp_l2_data_buf_flush(c);
 	} else if (CONN_V6(conn)) {
 		struct tcp6_l2_buf_t *b = &tcp6_l2_buf[tcp6_l2_buf_used];
 
-		len = tcp_l2_buf_fill_headers(c, conn, b, plen, NULL, seq);
-
 		iov = tcp6_l2_iov + tcp6_l2_buf_used++;
-		iov->iov_len = len + sizeof(b->vnet_len);
+		iov->iov_len = tcp_l2_buf_fill_headers(c, conn, b, plen,
+						       NULL, seq);
 		if (tcp6_l2_buf_used > ARRAY_SIZE(tcp6_l2_buf) - 1)
 			tcp_l2_data_buf_flush(c);
 	}
-- 
@@ -1447,7 +1447,7 @@ void tcp_defer_handler(struct ctx *c)
  * @check:	Checksum, if already known
  * @seq:	Sequence number for this segment
  *
- * Return: 802.3 length, host order
+ * Return: frame length including L2 headers, host order
  */
 static size_t tcp_l2_buf_fill_headers(const struct ctx *c,
 				      const struct tcp_tap_conn *conn,
@@ -1455,7 +1455,7 @@ static size_t tcp_l2_buf_fill_headers(const struct ctx *c,
 				      const uint16_t *check, uint32_t seq)
 {
 	const struct in_addr *a4 = inany_v4(&conn->addr);
-	size_t ip_len, eth_len;
+	size_t ip_len, tlen;
 
 #define SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq)			\
 do {									\
@@ -1489,9 +1489,10 @@ do {									\
 
 		tcp_update_check_tcp4(b);
 
-		eth_len = ip_len + sizeof(struct ethhdr);
+		tlen = ip_len + sizeof(struct ethhdr);
 		if (c->mode == MODE_PASST)
-			b->vnet_len = htonl(eth_len);
+			b->vnet_len = htonl(tlen);
+		tlen += sizeof(b->vnet_len);
 	} else {
 		struct tcp6_l2_buf_t *b = (struct tcp6_l2_buf_t *)p;
 
@@ -1514,14 +1515,14 @@ do {									\
 		b->ip6h.flow_lbl[1] = (conn->sock >> 8) & 0xff;
 		b->ip6h.flow_lbl[2] = (conn->sock >> 0) & 0xff;
 
-		eth_len = ip_len + sizeof(struct ethhdr);
+		tlen = ip_len + sizeof(struct ethhdr);
 		if (c->mode == MODE_PASST)
-			b->vnet_len = htonl(eth_len);
+			b->vnet_len = htonl(tlen);
+		tlen += sizeof(b->vnet_len);
 	}
-
 #undef SET_TCP_HEADER_COMMON_V4_V6
 
-	return eth_len;
+	return tlen;
 }
 
 /**
@@ -1627,8 +1628,8 @@ static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags)
 	struct tcp6_l2_flags_buf_t *b6 = NULL;
 	struct tcp_info tinfo = { 0 };
 	socklen_t sl = sizeof(tinfo);
-	size_t optlen = 0, eth_len;
 	int s = conn->sock;
+	size_t optlen = 0;
 	struct iovec *iov;
 	struct tcphdr *th;
 	char *data;
@@ -1717,9 +1718,8 @@ static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags)
 	th->syn = !!(flags & SYN);
 	th->fin = !!(flags & FIN);
 
-	eth_len = tcp_l2_buf_fill_headers(c, conn, p, optlen,
-					  NULL, conn->seq_to_tap);
-	iov->iov_len = eth_len + sizeof(uint32_t);
+	iov->iov_len = tcp_l2_buf_fill_headers(c, conn, p, optlen,
+					       NULL, conn->seq_to_tap);
 
 	if (th->ack)
 		conn_flag(c, conn, ~ACK_TO_TAP_DUE);
@@ -2079,25 +2079,22 @@ static void tcp_data_to_tap(struct ctx *c, struct tcp_tap_conn *conn,
 			    ssize_t plen, int no_csum, uint32_t seq)
 {
 	struct iovec *iov;
-	size_t len;
 
 	if (CONN_V4(conn)) {
 		struct tcp4_l2_buf_t *b = &tcp4_l2_buf[tcp4_l2_buf_used];
 		uint16_t *check = no_csum ? &(b - 1)->iph.check : NULL;
 
-		len = tcp_l2_buf_fill_headers(c, conn, b, plen, check, seq);
-
 		iov = tcp4_l2_iov + tcp4_l2_buf_used++;
-		iov->iov_len = len + sizeof(b->vnet_len);
+		iov->iov_len = tcp_l2_buf_fill_headers(c, conn, b, plen,
+						       check, seq);
 		if (tcp4_l2_buf_used > ARRAY_SIZE(tcp4_l2_buf) - 1)
 			tcp_l2_data_buf_flush(c);
 	} else if (CONN_V6(conn)) {
 		struct tcp6_l2_buf_t *b = &tcp6_l2_buf[tcp6_l2_buf_used];
 
-		len = tcp_l2_buf_fill_headers(c, conn, b, plen, NULL, seq);
-
 		iov = tcp6_l2_iov + tcp6_l2_buf_used++;
-		iov->iov_len = len + sizeof(b->vnet_len);
+		iov->iov_len = tcp_l2_buf_fill_headers(c, conn, b, plen,
+						       NULL, seq);
 		if (tcp6_l2_buf_used > ARRAY_SIZE(tcp6_l2_buf) - 1)
 			tcp_l2_data_buf_flush(c);
 	}
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 13/18] tap: Add "tap headers" abstraction
  2022-12-09  5:42 [PATCH v2 00/18] RFC: Unify and simplify tap send path David Gibson
                   ` (11 preceding siblings ...)
  2022-12-09  5:42 ` [PATCH v2 12/18] tcp: Consolidate calculation of total frame size David Gibson
@ 2022-12-09  5:42 ` David Gibson
  2022-12-09  5:42 ` [PATCH v2 14/18] tcp: Use abstracted tap header David Gibson
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: David Gibson @ 2022-12-09  5:42 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

Currently both the TCP and UDP code need to deal in various places with the
details of the L2 headers, and also the tap-specific "vnet_len" header.
This makes abstracting the tap interface to new backends (e.g. vhost-user
or tun) more difficult.

To improve this abstraction, create a new 'tap_hdr' structure which
represents both L2 (always Ethernet at the moment, but might be vary in
future) and any additional tap specific headers (such as the qemu socket's
vnet_len field).  Provide helper functions and macros to initialize, update
and use it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 tap.c | 15 +++++++++++++++
 tap.h | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 66 insertions(+)

diff --git a/tap.c b/tap.c
index 558a734..d0dd72c 100644
--- a/tap.c
+++ b/tap.c
@@ -386,6 +386,21 @@ void tap_send_frames(struct ctx *c, const struct iovec *iov, size_t n)
 	pcap_multiple(iov, n, sizeof(uint32_t));
 }
 
+/**
+ * tap_update_mac() - Update tap L2 header with new Ethernet addresses
+ * @taph:	Tap headers to update
+ * @eth_d:	Ethernet destination address, NULL if unchanged
+ * @eth_s:	Ethernet source address, NULL if unchanged
+ */
+void tap_update_mac(struct tap_hdr *taph,
+		    const unsigned char *eth_d, const unsigned char *eth_s)
+{
+	if (eth_d)
+		memcpy(taph->eh.h_dest, eth_d, sizeof(taph->eh.h_dest));
+	if (eth_s)
+		memcpy(taph->eh.h_source, eth_s, sizeof(taph->eh.h_source));
+}
+
 PACKET_POOL_DECL(pool_l4, UIO_MAXIOV, pkt_buf);
 
 /**
diff --git a/tap.h b/tap.h
index ceac890..8fe460a 100644
--- a/tap.h
+++ b/tap.h
@@ -6,6 +6,55 @@
 #ifndef TAP_H
 #define TAP_H
 
+/**
+ * struct tap_hdr - L2 and tap specific headers
+ * @vnet_len:	Frame length (for qemu socket transport)
+ * @eh:		Ethernet header
+ */
+struct tap_hdr {
+	uint32_t vnet_len;
+	struct ethhdr eh;
+} __attribute__((packed));
+
+#define TAP_HDR_INIT(proto) { .eh.h_proto = htons_constant(proto) }
+
+static inline size_t tap_hdr_len_(const struct ctx *c)
+{
+	(void)c;
+	return sizeof(struct tap_hdr);
+}
+
+/**
+ * tap_iov_base() - Find start of tap frame
+ * @c:		Execution context
+ * @taph:	Pointer to L2 header buffer
+ *
+ * Returns: pointer to the start of tap frame - suitable for an
+ *          iov_base to be passed to tap_send_frames())
+ */
+static inline void *tap_iov_base(const struct ctx *c, struct tap_hdr *taph)
+{
+	return (char *)(taph + 1) - tap_hdr_len_(c);
+}
+
+/**
+ * tap_iov_len() - Finalize tap frame and return total length
+ * @c:		Execution context
+ * @taph:	Tap header to finalize
+ * @plen:	L2 payload length (excludes L2 and tap specific headers)
+ *
+ * Returns: length of the tap frame including L2 and tap specific
+ *          headers - suitable for an iov_len to be passed to
+ *          tap_send_frames()
+ */
+static inline size_t tap_iov_len(const struct ctx *c, struct tap_hdr *taph,
+				 size_t plen)
+{
+	if (c->mode == MODE_PASST)
+		taph->vnet_len = htonl(plen + sizeof(taph->eh));
+	return plen + tap_hdr_len_(c);
+}
+
 struct in_addr tap_ip4_daddr(const struct ctx *c);
 void tap_udp4_send(const struct ctx *c, struct in_addr src, in_port_t sport,
 		   struct in_addr dst, in_port_t dport,
@@ -23,6 +72,8 @@ void tap_icmp6_send(const struct ctx *c,
 		    void *in, size_t len);
 int tap_send(const struct ctx *c, const void *data, size_t len);
 void tap_send_frames(struct ctx *c, const struct iovec *iov, size_t n);
+void tap_update_mac(struct tap_hdr *taph,
+		    const unsigned char *eth_d, const unsigned char *eth_s);
 void tap_handler(struct ctx *c, int fd, uint32_t events,
 		 const struct timespec *now);
 void tap_sock_init(struct ctx *c);
-- 
@@ -6,6 +6,55 @@
 #ifndef TAP_H
 #define TAP_H
 
+/**
+ * struct tap_hdr - L2 and tap specific headers
+ * @vnet_len:	Frame length (for qemu socket transport)
+ * @eh:		Ethernet header
+ */
+struct tap_hdr {
+	uint32_t vnet_len;
+	struct ethhdr eh;
+} __attribute__((packed));
+
+#define TAP_HDR_INIT(proto) { .eh.h_proto = htons_constant(proto) }
+
+static inline size_t tap_hdr_len_(const struct ctx *c)
+{
+	(void)c;
+	return sizeof(struct tap_hdr);
+}
+
+/**
+ * tap_iov_base() - Find start of tap frame
+ * @c:		Execution context
+ * @taph:	Pointer to L2 header buffer
+ *
+ * Returns: pointer to the start of tap frame - suitable for an
+ *          iov_base to be passed to tap_send_frames())
+ */
+static inline void *tap_iov_base(const struct ctx *c, struct tap_hdr *taph)
+{
+	return (char *)(taph + 1) - tap_hdr_len_(c);
+}
+
+/**
+ * tap_iov_len() - Finalize tap frame and return total length
+ * @c:		Execution context
+ * @taph:	Tap header to finalize
+ * @plen:	L2 payload length (excludes L2 and tap specific headers)
+ *
+ * Returns: length of the tap frame including L2 and tap specific
+ *          headers - suitable for an iov_len to be passed to
+ *          tap_send_frames()
+ */
+static inline size_t tap_iov_len(const struct ctx *c, struct tap_hdr *taph,
+				 size_t plen)
+{
+	if (c->mode == MODE_PASST)
+		taph->vnet_len = htonl(plen + sizeof(taph->eh));
+	return plen + tap_hdr_len_(c);
+}
+
 struct in_addr tap_ip4_daddr(const struct ctx *c);
 void tap_udp4_send(const struct ctx *c, struct in_addr src, in_port_t sport,
 		   struct in_addr dst, in_port_t dport,
@@ -23,6 +72,8 @@ void tap_icmp6_send(const struct ctx *c,
 		    void *in, size_t len);
 int tap_send(const struct ctx *c, const void *data, size_t len);
 void tap_send_frames(struct ctx *c, const struct iovec *iov, size_t n);
+void tap_update_mac(struct tap_hdr *taph,
+		    const unsigned char *eth_d, const unsigned char *eth_s);
 void tap_handler(struct ctx *c, int fd, uint32_t events,
 		 const struct timespec *now);
 void tap_sock_init(struct ctx *c);
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 14/18] tcp: Use abstracted tap header
  2022-12-09  5:42 [PATCH v2 00/18] RFC: Unify and simplify tap send path David Gibson
                   ` (12 preceding siblings ...)
  2022-12-09  5:42 ` [PATCH v2 13/18] tap: Add "tap headers" abstraction David Gibson
@ 2022-12-09  5:42 ` David Gibson
  2022-12-09  5:42 ` [PATCH v2 15/18] tap: Use different io vector bases depending on tap type David Gibson
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: David Gibson @ 2022-12-09  5:42 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

Update the TCP code to use the tap layer abstractions for initializing and
updating the L2 and lower headers.  This will make adding other tap
backends in future easier.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 tcp.c | 85 +++++++++++++++++++++--------------------------------------
 1 file changed, 30 insertions(+), 55 deletions(-)

diff --git a/tcp.c b/tcp.c
index 8bcb039..26037b3 100644
--- a/tcp.c
+++ b/tcp.c
@@ -330,8 +330,7 @@ struct tcp4_l2_head {	/* For MSS4 macro: keep in sync with tcp4_l2_buf_t */
 #else
 	uint8_t pad[2];
 #endif
-	uint32_t vnet_len;
-	struct ethhdr eh;
+	struct tap_hdr taph;
 	struct iphdr iph;
 	struct tcphdr th;
 #ifdef __AVX2__
@@ -346,8 +345,7 @@ struct tcp6_l2_head {	/* For MSS6 macro: keep in sync with tcp6_l2_buf_t */
 #else
 	uint8_t pad[2];
 #endif
-	uint32_t vnet_len;
-	struct ethhdr eh;
+	struct tap_hdr taph;
 	struct ipv6hdr ip6h;
 	struct tcphdr th;
 #ifdef __AVX2__
@@ -448,8 +446,7 @@ static union inany_addr low_rtt_dst[LOW_RTT_TABLE_SIZE];
  * @psum:	Partial IP header checksum (excluding tot_len and saddr)
  * @tsum:	Partial TCP header checksum (excluding length and saddr)
  * @pad:	Align TCP header to 32 bytes, for AVX2 checksum calculation only
- * @vnet_len:	4-byte qemu vnet buffer length descriptor, only for passt mode
- * @eh:		Pre-filled Ethernet header
+ * @taph:	Tap-level headers (partially pre-filled)
  * @iph:	Pre-filled IP header (except for tot_len and saddr)
  * @uh:		Headroom for TCP header
  * @data:	Storage for TCP payload
@@ -462,8 +459,7 @@ static struct tcp4_l2_buf_t {
 #else
 	uint8_t pad[2];		/*	align iph to 4 bytes	8 */
 #endif
-	uint32_t vnet_len;	/* 26				10 */
-	struct ethhdr eh;	/* 30				14 */
+	struct tap_hdr taph;	/* 26				10 */
 	struct iphdr iph;	/* 44				28 */
 	struct tcphdr th;	/* 64				48 */
 	uint8_t data[MSS4];	/* 84				68 */
@@ -480,8 +476,7 @@ static unsigned int tcp4_l2_buf_used;
 /**
  * tcp6_l2_buf_t - Pre-cooked IPv6 packet buffers for tap connections
  * @pad:	Align IPv6 header for checksum calculation to 32B (AVX2) or 4B
- * @vnet_len:	4-byte qemu vnet buffer length descriptor, only for passt mode
- * @eh:		Pre-filled Ethernet header
+ * @taph:	Tap-level headers (partially pre-filled)
  * @ip6h:	Pre-filled IP header (except for payload_len and addresses)
  * @th:		Headroom for TCP header
  * @data:	Storage for TCP payload
@@ -492,8 +487,7 @@ struct tcp6_l2_buf_t {
 #else
 	uint8_t pad[2];		/*	align ip6h to 4 bytes	0 */
 #endif
-	uint32_t vnet_len;	/* 14				2 */
-	struct ethhdr eh;	/* 18				6 */
+	struct tap_hdr taph;	/* 14				2 */
 	struct ipv6hdr ip6h;	/* 32				20 */
 	struct tcphdr th;	/* 72				60 */
 	uint8_t data[MSS6];	/* 92				80 */
@@ -526,8 +520,7 @@ static struct iovec	tcp_iov			[UIO_MAXIOV];
  * @psum:	Partial IP header checksum (excluding tot_len and saddr)
  * @tsum:	Partial TCP header checksum (excluding length and saddr)
  * @pad:	Align TCP header to 32 bytes, for AVX2 checksum calculation only
- * @vnet_len:	4-byte qemu vnet buffer length descriptor, only for passt mode
- * @eh:		Pre-filled Ethernet header
+ * @taph:	Tap-level headers (partially pre-filled)
  * @iph:	Pre-filled IP header (except for tot_len and saddr)
  * @th:		Headroom for TCP header
  * @opts:	Headroom for TCP options
@@ -540,8 +533,7 @@ static struct tcp4_l2_flags_buf_t {
 #else
 	uint8_t pad[2];		/*	align iph to 4 bytes	8 */
 #endif
-	uint32_t vnet_len;	/* 26				10 */
-	struct ethhdr eh;	/* 30				14 */
+	struct tap_hdr taph;	/* 26				10 */
 	struct iphdr iph;	/* 44				28 */
 	struct tcphdr th;	/* 64				48 */
 	char opts[OPT_MSS_LEN + OPT_WS_LEN + 1];
@@ -557,8 +549,7 @@ static unsigned int tcp4_l2_flags_buf_used;
 /**
  * tcp6_l2_flags_buf_t - IPv6 packet buffers for segments without data (flags)
  * @pad:	Align IPv6 header for checksum calculation to 32B (AVX2) or 4B
- * @vnet_len:	4-byte qemu vnet buffer length descriptor, only for passt mode
- * @eh:		Pre-filled Ethernet header
+ * @taph:	Tap-level headers (partially pre-filled)
  * @ip6h:	Pre-filled IP header (except for payload_len and addresses)
  * @th:		Headroom for TCP header
  * @opts:	Headroom for TCP options
@@ -569,8 +560,7 @@ static struct tcp6_l2_flags_buf_t {
 #else
 	uint8_t pad[2];		/*	align ip6h to 4 bytes		   0 */
 #endif
-	uint32_t vnet_len;	/* 14					   2 */
-	struct ethhdr eh;	/* 18					   6 */
+	struct tap_hdr taph;	/* 14					   2 */
 	struct ipv6hdr ip6h;	/* 32					  20 */
 	struct tcphdr th	/* 72 */ __attribute__ ((aligned(4))); /* 60 */ 
 	char opts[OPT_MSS_LEN + OPT_WS_LEN + 1];
@@ -1013,21 +1003,10 @@ void tcp_update_l2_buf(const unsigned char *eth_d, const unsigned char *eth_s,
 		struct tcp4_l2_buf_t *b4 = &tcp4_l2_buf[i];
 		struct tcp6_l2_buf_t *b6 = &tcp6_l2_buf[i];
 
-		if (eth_d) {
-			memcpy(b4->eh.h_dest, eth_d, ETH_ALEN);
-			memcpy(b6->eh.h_dest, eth_d, ETH_ALEN);
-
-			memcpy(b4f->eh.h_dest, eth_d, ETH_ALEN);
-			memcpy(b6f->eh.h_dest, eth_d, ETH_ALEN);
-		}
-
-		if (eth_s) {
-			memcpy(b4->eh.h_source, eth_s, ETH_ALEN);
-			memcpy(b6->eh.h_source, eth_s, ETH_ALEN);
-
-			memcpy(b4f->eh.h_source, eth_s, ETH_ALEN);
-			memcpy(b6f->eh.h_source, eth_s, ETH_ALEN);
-		}
+		tap_update_mac(&b4->taph, eth_d, eth_s);
+		tap_update_mac(&b6->taph, eth_d, eth_s);
+		tap_update_mac(&b4f->taph, eth_d, eth_s);
+		tap_update_mac(&b6f->taph, eth_d, eth_s);
 
 		if (ip_da) {
 			b4f->iph.daddr = b4->iph.daddr = ip_da->s_addr;
@@ -1051,15 +1030,16 @@ void tcp_update_l2_buf(const unsigned char *eth_d, const unsigned char *eth_s,
 
 /**
  * tcp_sock4_iov_init() - Initialise scatter-gather L2 buffers for IPv4 sockets
+ * @c:		Execution context
  */
-static void tcp_sock4_iov_init(void)
+static void tcp_sock4_iov_init(const struct ctx *c)
 {
 	struct iovec *iov;
 	int i;
 
 	for (i = 0; i < ARRAY_SIZE(tcp4_l2_buf); i++) {
 		tcp4_l2_buf[i] = (struct tcp4_l2_buf_t) {
-			.eh = L2_BUF_ETH_INIT(ETH_P_IP),
+			.taph = TAP_HDR_INIT(ETH_P_IP),
 			.iph = L2_BUF_IP4_INIT(IPPROTO_TCP),
 			.th = { .doff = sizeof(struct tcphdr) / 4, .ack = 1 }
 		};
@@ -1067,29 +1047,30 @@ static void tcp_sock4_iov_init(void)
 
 	for (i = 0; i < ARRAY_SIZE(tcp4_l2_flags_buf); i++) {
 		tcp4_l2_flags_buf[i] = (struct tcp4_l2_flags_buf_t) {
-			.eh = L2_BUF_ETH_INIT(ETH_P_IP),
+			.taph = TAP_HDR_INIT(ETH_P_IP),
 			.iph = L2_BUF_IP4_INIT(IPPROTO_TCP)
 		};
 	}
 
 	for (i = 0, iov = tcp4_l2_iov; i < TCP_FRAMES_MEM; i++, iov++)
-		iov->iov_base = &tcp4_l2_buf[i].vnet_len;
+		iov->iov_base = tap_iov_base(c, &tcp4_l2_buf[i].taph);
 
 	for (i = 0, iov = tcp4_l2_flags_iov; i < TCP_FRAMES_MEM; i++, iov++)
-		iov->iov_base = &tcp4_l2_flags_buf[i].vnet_len;
+		iov->iov_base = tap_iov_base(c, &tcp4_l2_flags_buf[i].taph);
 }
 
 /**
  * tcp_sock6_iov_init() - Initialise scatter-gather L2 buffers for IPv6 sockets
+ * @c:		Execution context
  */
-static void tcp_sock6_iov_init(void)
+static void tcp_sock6_iov_init(const struct ctx *c)
 {
 	struct iovec *iov;
 	int i;
 
 	for (i = 0; i < ARRAY_SIZE(tcp6_l2_buf); i++) {
 		tcp6_l2_buf[i] = (struct tcp6_l2_buf_t) {
-			.eh = L2_BUF_ETH_INIT(ETH_P_IPV6),
+			.taph = TAP_HDR_INIT(ETH_P_IPV6),
 			.ip6h = L2_BUF_IP6_INIT(IPPROTO_TCP),
 			.th = { .doff = sizeof(struct tcphdr) / 4, .ack = 1 }
 		};
@@ -1097,16 +1078,16 @@ static void tcp_sock6_iov_init(void)
 
 	for (i = 0; i < ARRAY_SIZE(tcp6_l2_flags_buf); i++) {
 		tcp6_l2_flags_buf[i] = (struct tcp6_l2_flags_buf_t) {
-			.eh = L2_BUF_ETH_INIT(ETH_P_IPV6),
+			.taph = TAP_HDR_INIT(ETH_P_IPV6),
 			.ip6h = L2_BUF_IP6_INIT(IPPROTO_TCP)
 		};
 	}
 
 	for (i = 0, iov = tcp6_l2_iov; i < TCP_FRAMES_MEM; i++, iov++)
-		iov->iov_base = &tcp6_l2_buf[i].vnet_len;
+		iov->iov_base = tap_iov_base(c, &tcp6_l2_buf[i].taph);
 
 	for (i = 0, iov = tcp6_l2_flags_iov; i < TCP_FRAMES_MEM; i++, iov++)
-		iov->iov_base = &tcp6_l2_flags_buf[i].vnet_len;
+		iov->iov_base = tap_iov_base(c, &tcp6_l2_flags_buf[i].taph);
 }
 
 /**
@@ -1489,10 +1470,7 @@ do {									\
 
 		tcp_update_check_tcp4(b);
 
-		tlen = ip_len + sizeof(struct ethhdr);
-		if (c->mode == MODE_PASST)
-			b->vnet_len = htonl(tlen);
-		tlen += sizeof(b->vnet_len);
+		tlen = tap_iov_len(c, &b->taph, ip_len);
 	} else {
 		struct tcp6_l2_buf_t *b = (struct tcp6_l2_buf_t *)p;
 
@@ -1515,10 +1493,7 @@ do {									\
 		b->ip6h.flow_lbl[1] = (conn->sock >> 8) & 0xff;
 		b->ip6h.flow_lbl[2] = (conn->sock >> 0) & 0xff;
 
-		tlen = ip_len + sizeof(struct ethhdr);
-		if (c->mode == MODE_PASST)
-			b->vnet_len = htonl(tlen);
-		tlen += sizeof(b->vnet_len);
+		tlen = tap_iov_len(c, &b->taph, ip_len);
 	}
 #undef SET_TCP_HEADER_COMMON_V4_V6
 
@@ -3113,10 +3088,10 @@ int tcp_init(struct ctx *c)
 		tcp_l2_mh[i] = (struct mmsghdr) { .msg_hdr.msg_iovlen = 1 };
 
 	if (c->ifi4)
-		tcp_sock4_iov_init();
+		tcp_sock4_iov_init(c);
 
 	if (c->ifi6)
-		tcp_sock6_iov_init();
+		tcp_sock6_iov_init(c);
 
 	memset(init_sock_pool4,		0xff,	sizeof(init_sock_pool4));
 	memset(init_sock_pool6,		0xff,	sizeof(init_sock_pool6));
-- 
@@ -330,8 +330,7 @@ struct tcp4_l2_head {	/* For MSS4 macro: keep in sync with tcp4_l2_buf_t */
 #else
 	uint8_t pad[2];
 #endif
-	uint32_t vnet_len;
-	struct ethhdr eh;
+	struct tap_hdr taph;
 	struct iphdr iph;
 	struct tcphdr th;
 #ifdef __AVX2__
@@ -346,8 +345,7 @@ struct tcp6_l2_head {	/* For MSS6 macro: keep in sync with tcp6_l2_buf_t */
 #else
 	uint8_t pad[2];
 #endif
-	uint32_t vnet_len;
-	struct ethhdr eh;
+	struct tap_hdr taph;
 	struct ipv6hdr ip6h;
 	struct tcphdr th;
 #ifdef __AVX2__
@@ -448,8 +446,7 @@ static union inany_addr low_rtt_dst[LOW_RTT_TABLE_SIZE];
  * @psum:	Partial IP header checksum (excluding tot_len and saddr)
  * @tsum:	Partial TCP header checksum (excluding length and saddr)
  * @pad:	Align TCP header to 32 bytes, for AVX2 checksum calculation only
- * @vnet_len:	4-byte qemu vnet buffer length descriptor, only for passt mode
- * @eh:		Pre-filled Ethernet header
+ * @taph:	Tap-level headers (partially pre-filled)
  * @iph:	Pre-filled IP header (except for tot_len and saddr)
  * @uh:		Headroom for TCP header
  * @data:	Storage for TCP payload
@@ -462,8 +459,7 @@ static struct tcp4_l2_buf_t {
 #else
 	uint8_t pad[2];		/*	align iph to 4 bytes	8 */
 #endif
-	uint32_t vnet_len;	/* 26				10 */
-	struct ethhdr eh;	/* 30				14 */
+	struct tap_hdr taph;	/* 26				10 */
 	struct iphdr iph;	/* 44				28 */
 	struct tcphdr th;	/* 64				48 */
 	uint8_t data[MSS4];	/* 84				68 */
@@ -480,8 +476,7 @@ static unsigned int tcp4_l2_buf_used;
 /**
  * tcp6_l2_buf_t - Pre-cooked IPv6 packet buffers for tap connections
  * @pad:	Align IPv6 header for checksum calculation to 32B (AVX2) or 4B
- * @vnet_len:	4-byte qemu vnet buffer length descriptor, only for passt mode
- * @eh:		Pre-filled Ethernet header
+ * @taph:	Tap-level headers (partially pre-filled)
  * @ip6h:	Pre-filled IP header (except for payload_len and addresses)
  * @th:		Headroom for TCP header
  * @data:	Storage for TCP payload
@@ -492,8 +487,7 @@ struct tcp6_l2_buf_t {
 #else
 	uint8_t pad[2];		/*	align ip6h to 4 bytes	0 */
 #endif
-	uint32_t vnet_len;	/* 14				2 */
-	struct ethhdr eh;	/* 18				6 */
+	struct tap_hdr taph;	/* 14				2 */
 	struct ipv6hdr ip6h;	/* 32				20 */
 	struct tcphdr th;	/* 72				60 */
 	uint8_t data[MSS6];	/* 92				80 */
@@ -526,8 +520,7 @@ static struct iovec	tcp_iov			[UIO_MAXIOV];
  * @psum:	Partial IP header checksum (excluding tot_len and saddr)
  * @tsum:	Partial TCP header checksum (excluding length and saddr)
  * @pad:	Align TCP header to 32 bytes, for AVX2 checksum calculation only
- * @vnet_len:	4-byte qemu vnet buffer length descriptor, only for passt mode
- * @eh:		Pre-filled Ethernet header
+ * @taph:	Tap-level headers (partially pre-filled)
  * @iph:	Pre-filled IP header (except for tot_len and saddr)
  * @th:		Headroom for TCP header
  * @opts:	Headroom for TCP options
@@ -540,8 +533,7 @@ static struct tcp4_l2_flags_buf_t {
 #else
 	uint8_t pad[2];		/*	align iph to 4 bytes	8 */
 #endif
-	uint32_t vnet_len;	/* 26				10 */
-	struct ethhdr eh;	/* 30				14 */
+	struct tap_hdr taph;	/* 26				10 */
 	struct iphdr iph;	/* 44				28 */
 	struct tcphdr th;	/* 64				48 */
 	char opts[OPT_MSS_LEN + OPT_WS_LEN + 1];
@@ -557,8 +549,7 @@ static unsigned int tcp4_l2_flags_buf_used;
 /**
  * tcp6_l2_flags_buf_t - IPv6 packet buffers for segments without data (flags)
  * @pad:	Align IPv6 header for checksum calculation to 32B (AVX2) or 4B
- * @vnet_len:	4-byte qemu vnet buffer length descriptor, only for passt mode
- * @eh:		Pre-filled Ethernet header
+ * @taph:	Tap-level headers (partially pre-filled)
  * @ip6h:	Pre-filled IP header (except for payload_len and addresses)
  * @th:		Headroom for TCP header
  * @opts:	Headroom for TCP options
@@ -569,8 +560,7 @@ static struct tcp6_l2_flags_buf_t {
 #else
 	uint8_t pad[2];		/*	align ip6h to 4 bytes		   0 */
 #endif
-	uint32_t vnet_len;	/* 14					   2 */
-	struct ethhdr eh;	/* 18					   6 */
+	struct tap_hdr taph;	/* 14					   2 */
 	struct ipv6hdr ip6h;	/* 32					  20 */
 	struct tcphdr th	/* 72 */ __attribute__ ((aligned(4))); /* 60 */ 
 	char opts[OPT_MSS_LEN + OPT_WS_LEN + 1];
@@ -1013,21 +1003,10 @@ void tcp_update_l2_buf(const unsigned char *eth_d, const unsigned char *eth_s,
 		struct tcp4_l2_buf_t *b4 = &tcp4_l2_buf[i];
 		struct tcp6_l2_buf_t *b6 = &tcp6_l2_buf[i];
 
-		if (eth_d) {
-			memcpy(b4->eh.h_dest, eth_d, ETH_ALEN);
-			memcpy(b6->eh.h_dest, eth_d, ETH_ALEN);
-
-			memcpy(b4f->eh.h_dest, eth_d, ETH_ALEN);
-			memcpy(b6f->eh.h_dest, eth_d, ETH_ALEN);
-		}
-
-		if (eth_s) {
-			memcpy(b4->eh.h_source, eth_s, ETH_ALEN);
-			memcpy(b6->eh.h_source, eth_s, ETH_ALEN);
-
-			memcpy(b4f->eh.h_source, eth_s, ETH_ALEN);
-			memcpy(b6f->eh.h_source, eth_s, ETH_ALEN);
-		}
+		tap_update_mac(&b4->taph, eth_d, eth_s);
+		tap_update_mac(&b6->taph, eth_d, eth_s);
+		tap_update_mac(&b4f->taph, eth_d, eth_s);
+		tap_update_mac(&b6f->taph, eth_d, eth_s);
 
 		if (ip_da) {
 			b4f->iph.daddr = b4->iph.daddr = ip_da->s_addr;
@@ -1051,15 +1030,16 @@ void tcp_update_l2_buf(const unsigned char *eth_d, const unsigned char *eth_s,
 
 /**
  * tcp_sock4_iov_init() - Initialise scatter-gather L2 buffers for IPv4 sockets
+ * @c:		Execution context
  */
-static void tcp_sock4_iov_init(void)
+static void tcp_sock4_iov_init(const struct ctx *c)
 {
 	struct iovec *iov;
 	int i;
 
 	for (i = 0; i < ARRAY_SIZE(tcp4_l2_buf); i++) {
 		tcp4_l2_buf[i] = (struct tcp4_l2_buf_t) {
-			.eh = L2_BUF_ETH_INIT(ETH_P_IP),
+			.taph = TAP_HDR_INIT(ETH_P_IP),
 			.iph = L2_BUF_IP4_INIT(IPPROTO_TCP),
 			.th = { .doff = sizeof(struct tcphdr) / 4, .ack = 1 }
 		};
@@ -1067,29 +1047,30 @@ static void tcp_sock4_iov_init(void)
 
 	for (i = 0; i < ARRAY_SIZE(tcp4_l2_flags_buf); i++) {
 		tcp4_l2_flags_buf[i] = (struct tcp4_l2_flags_buf_t) {
-			.eh = L2_BUF_ETH_INIT(ETH_P_IP),
+			.taph = TAP_HDR_INIT(ETH_P_IP),
 			.iph = L2_BUF_IP4_INIT(IPPROTO_TCP)
 		};
 	}
 
 	for (i = 0, iov = tcp4_l2_iov; i < TCP_FRAMES_MEM; i++, iov++)
-		iov->iov_base = &tcp4_l2_buf[i].vnet_len;
+		iov->iov_base = tap_iov_base(c, &tcp4_l2_buf[i].taph);
 
 	for (i = 0, iov = tcp4_l2_flags_iov; i < TCP_FRAMES_MEM; i++, iov++)
-		iov->iov_base = &tcp4_l2_flags_buf[i].vnet_len;
+		iov->iov_base = tap_iov_base(c, &tcp4_l2_flags_buf[i].taph);
 }
 
 /**
  * tcp_sock6_iov_init() - Initialise scatter-gather L2 buffers for IPv6 sockets
+ * @c:		Execution context
  */
-static void tcp_sock6_iov_init(void)
+static void tcp_sock6_iov_init(const struct ctx *c)
 {
 	struct iovec *iov;
 	int i;
 
 	for (i = 0; i < ARRAY_SIZE(tcp6_l2_buf); i++) {
 		tcp6_l2_buf[i] = (struct tcp6_l2_buf_t) {
-			.eh = L2_BUF_ETH_INIT(ETH_P_IPV6),
+			.taph = TAP_HDR_INIT(ETH_P_IPV6),
 			.ip6h = L2_BUF_IP6_INIT(IPPROTO_TCP),
 			.th = { .doff = sizeof(struct tcphdr) / 4, .ack = 1 }
 		};
@@ -1097,16 +1078,16 @@ static void tcp_sock6_iov_init(void)
 
 	for (i = 0; i < ARRAY_SIZE(tcp6_l2_flags_buf); i++) {
 		tcp6_l2_flags_buf[i] = (struct tcp6_l2_flags_buf_t) {
-			.eh = L2_BUF_ETH_INIT(ETH_P_IPV6),
+			.taph = TAP_HDR_INIT(ETH_P_IPV6),
 			.ip6h = L2_BUF_IP6_INIT(IPPROTO_TCP)
 		};
 	}
 
 	for (i = 0, iov = tcp6_l2_iov; i < TCP_FRAMES_MEM; i++, iov++)
-		iov->iov_base = &tcp6_l2_buf[i].vnet_len;
+		iov->iov_base = tap_iov_base(c, &tcp6_l2_buf[i].taph);
 
 	for (i = 0, iov = tcp6_l2_flags_iov; i < TCP_FRAMES_MEM; i++, iov++)
-		iov->iov_base = &tcp6_l2_flags_buf[i].vnet_len;
+		iov->iov_base = tap_iov_base(c, &tcp6_l2_flags_buf[i].taph);
 }
 
 /**
@@ -1489,10 +1470,7 @@ do {									\
 
 		tcp_update_check_tcp4(b);
 
-		tlen = ip_len + sizeof(struct ethhdr);
-		if (c->mode == MODE_PASST)
-			b->vnet_len = htonl(tlen);
-		tlen += sizeof(b->vnet_len);
+		tlen = tap_iov_len(c, &b->taph, ip_len);
 	} else {
 		struct tcp6_l2_buf_t *b = (struct tcp6_l2_buf_t *)p;
 
@@ -1515,10 +1493,7 @@ do {									\
 		b->ip6h.flow_lbl[1] = (conn->sock >> 8) & 0xff;
 		b->ip6h.flow_lbl[2] = (conn->sock >> 0) & 0xff;
 
-		tlen = ip_len + sizeof(struct ethhdr);
-		if (c->mode == MODE_PASST)
-			b->vnet_len = htonl(tlen);
-		tlen += sizeof(b->vnet_len);
+		tlen = tap_iov_len(c, &b->taph, ip_len);
 	}
 #undef SET_TCP_HEADER_COMMON_V4_V6
 
@@ -3113,10 +3088,10 @@ int tcp_init(struct ctx *c)
 		tcp_l2_mh[i] = (struct mmsghdr) { .msg_hdr.msg_iovlen = 1 };
 
 	if (c->ifi4)
-		tcp_sock4_iov_init();
+		tcp_sock4_iov_init(c);
 
 	if (c->ifi6)
-		tcp_sock6_iov_init();
+		tcp_sock6_iov_init(c);
 
 	memset(init_sock_pool4,		0xff,	sizeof(init_sock_pool4));
 	memset(init_sock_pool6,		0xff,	sizeof(init_sock_pool6));
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 15/18] tap: Use different io vector bases depending on tap type
  2022-12-09  5:42 [PATCH v2 00/18] RFC: Unify and simplify tap send path David Gibson
                   ` (13 preceding siblings ...)
  2022-12-09  5:42 ` [PATCH v2 14/18] tcp: Use abstracted tap header David Gibson
@ 2022-12-09  5:42 ` David Gibson
  2022-12-09  5:42 ` [PATCH v2 16/18] udp: Use abstracted tap header David Gibson
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: David Gibson @ 2022-12-09  5:42 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

Currently tap_send_frames() expects the frames it is given to include the
vnet_len field, even in pasta mode which doesn't use it (although it need
not be initialized in that case).  To match, tap_iov_base() and
tap_iov_len() construct the frame in that way.

This will inconvenience future changes, so alter things to set the buffers
to include just the frame needed by the tap backend type.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 tap.c | 5 ++---
 tap.h | 6 ++++--
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/tap.c b/tap.c
index d0dd72c..5ec6b70 100644
--- a/tap.c
+++ b/tap.c
@@ -317,8 +317,7 @@ static void tap_send_frames_pasta(struct ctx *c,
 	size_t i;
 
 	for (i = 0; i < n; i++) {
-		if (write(c->fd_tap, (char *)iov->iov_base + 4,
-			  iov->iov_len - 4) < 0) {
+		if (write(c->fd_tap, (char *)iov->iov_base, iov->iov_len) < 0) {
 			debug("tap write: %s", strerror(errno));
 			if (errno != EAGAIN && errno != EWOULDBLOCK)
 				tap_handler(c, c->fd_tap, EPOLLERR, NULL);
@@ -383,7 +382,7 @@ void tap_send_frames(struct ctx *c, const struct iovec *iov, size_t n)
 	else
 		tap_send_frames_pasta(c, iov, n);
 
-	pcap_multiple(iov, n, sizeof(uint32_t));
+	pcap_multiple(iov, n, c->mode == MODE_PASST ? sizeof(uint32_t) : 0);
 }
 
 /**
diff --git a/tap.h b/tap.h
index 8fe460a..40cf480 100644
--- a/tap.h
+++ b/tap.h
@@ -20,8 +20,10 @@ struct tap_hdr {
 
 static inline size_t tap_hdr_len_(const struct ctx *c)
 {
-	(void)c;
-	return sizeof(struct tap_hdr);
+	if (c->mode == MODE_PASST)
+		return sizeof(struct tap_hdr);
+	else
+		return sizeof(struct ethhdr);
 }
 
 /**
-- 
@@ -20,8 +20,10 @@ struct tap_hdr {
 
 static inline size_t tap_hdr_len_(const struct ctx *c)
 {
-	(void)c;
-	return sizeof(struct tap_hdr);
+	if (c->mode == MODE_PASST)
+		return sizeof(struct tap_hdr);
+	else
+		return sizeof(struct ethhdr);
 }
 
 /**
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 16/18] udp: Use abstracted tap header
  2022-12-09  5:42 [PATCH v2 00/18] RFC: Unify and simplify tap send path David Gibson
                   ` (14 preceding siblings ...)
  2022-12-09  5:42 ` [PATCH v2 15/18] tap: Use different io vector bases depending on tap type David Gibson
@ 2022-12-09  5:42 ` David Gibson
  2022-12-09  5:42 ` [PATCH v2 17/18] udp: Use tap_send_frames() David Gibson
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: David Gibson @ 2022-12-09  5:42 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

Update the UDP code to use the tap layer abstractions for initializing and
updating the L2 and lower headers.  This will make adding other tap
backends in future easier.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 udp.c  | 79 ++++++++++++++++++----------------------------------------
 util.h |  7 ------
 2 files changed, 24 insertions(+), 62 deletions(-)

diff --git a/udp.c b/udp.c
index 5609836..910f15f 100644
--- a/udp.c
+++ b/udp.c
@@ -169,8 +169,7 @@ static uint8_t udp_act[IP_VERSIONS][UDP_ACT_TYPE_MAX][DIV_ROUND_UP(NUM_PORTS, 8)
  * udp4_l2_buf_t - Pre-cooked IPv4 packet buffers for tap connections
  * @s_in:	Source socket address, filled in by recvmmsg()
  * @psum:	Partial IP header checksum (excluding tot_len and saddr)
- * @vnet_len:	4-byte qemu vnet buffer length descriptor, only for passt mode
- * @eh:		Pre-filled Ethernet header
+ * @taph:	Tap-level headers (partially pre-filled)
  * @iph:	Pre-filled IP header (except for tot_len and saddr)
  * @uh:		Headroom for UDP header
  * @data:	Storage for UDP payload
@@ -179,8 +178,7 @@ static struct udp4_l2_buf_t {
 	struct sockaddr_in s_in;
 	uint32_t psum;
 
-	uint32_t vnet_len;
-	struct ethhdr eh;
+	struct tap_hdr taph;
 	struct iphdr iph;
 	struct udphdr uh;
 	uint8_t data[USHRT_MAX -
@@ -191,8 +189,7 @@ udp4_l2_buf[UDP_MAX_FRAMES];
 /**
  * udp6_l2_buf_t - Pre-cooked IPv6 packet buffers for tap connections
  * @s_in6:	Source socket address, filled in by recvmmsg()
- * @vnet_len:	4-byte qemu vnet buffer length descriptor, only for passt mode
- * @eh:		Pre-filled Ethernet header
+ * @taph:	Tap-level headers (partially pre-filled)
  * @ip6h:	Pre-filled IP header (except for payload_len and addresses)
  * @uh:		Headroom for UDP header
  * @data:	Storage for UDP payload
@@ -205,8 +202,7 @@ struct udp6_l2_buf_t {
 			  sizeof(uint32_t))];
 #endif
 
-	uint32_t vnet_len;
-	struct ethhdr eh;
+	struct tap_hdr taph;
 	struct ipv6hdr ip6h;
 	struct udphdr uh;
 	uint8_t data[USHRT_MAX -
@@ -294,15 +290,8 @@ void udp_update_l2_buf(const unsigned char *eth_d, const unsigned char *eth_s,
 		struct udp4_l2_buf_t *b4 = &udp4_l2_buf[i];
 		struct udp6_l2_buf_t *b6 = &udp6_l2_buf[i];
 
-		if (eth_d) {
-			memcpy(b4->eh.h_dest, eth_d, ETH_ALEN);
-			memcpy(b6->eh.h_dest, eth_d, ETH_ALEN);
-		}
-
-		if (eth_s) {
-			memcpy(b4->eh.h_source, eth_s, ETH_ALEN);
-			memcpy(b6->eh.h_source, eth_s, ETH_ALEN);
-		}
+		tap_update_mac(&b4->taph, eth_d, eth_s);
+		tap_update_mac(&b6->taph, eth_d, eth_s);
 
 		if (ip_da) {
 			b4->iph.daddr = ip_da->s_addr;
@@ -329,7 +318,7 @@ static void udp_sock4_iov_init(const struct ctx *c)
 
 	for (i = 0; i < ARRAY_SIZE(udp4_l2_buf); i++) {
 		udp4_l2_buf[i] = (struct udp4_l2_buf_t) {
-			.eh = L2_BUF_ETH_INIT(ETH_P_IP),
+			.taph = TAP_HDR_INIT(ETH_P_IP),
 			.iph = L2_BUF_IP4_INIT(IPPROTO_UDP)
 		};
 	}
@@ -346,16 +335,13 @@ static void udp_sock4_iov_init(const struct ctx *c)
 		mh->msg_iovlen			= 1;
 	}
 
-	for (i = 0, h = udp4_l2_mh_tap; i < UDP_MAX_FRAMES; i++, h++) {
-		struct msghdr *mh = &h->msg_hdr;
-
-		if (c->mode == MODE_PASTA)
-			udp4_l2_iov_tap[i].iov_base	= &udp4_l2_buf[i].eh;
-		else
-			udp4_l2_iov_tap[i].iov_base	= &udp4_l2_buf[i].vnet_len;
+	for (i = 0; i < UDP_MAX_FRAMES; i++) {
+		struct msghdr *mh = &udp4_l2_mh_tap[i].msg_hdr;
+		struct iovec *iov = &udp4_l2_iov_tap[i];
 
-		mh->msg_iov			= &udp4_l2_iov_tap[i];
-		mh->msg_iovlen			= 1;
+		iov->iov_base	= tap_iov_base(c, &udp4_l2_buf[i].taph);
+		mh->msg_iov	= iov;
+		mh->msg_iovlen	= 1;
 	}
 }
 
@@ -370,7 +356,7 @@ static void udp_sock6_iov_init(const struct ctx *c)
 
 	for (i = 0; i < ARRAY_SIZE(udp6_l2_buf); i++) {
 		udp6_l2_buf[i] = (struct udp6_l2_buf_t) {
-			.eh = L2_BUF_ETH_INIT(ETH_P_IPV6),
+			.taph = TAP_HDR_INIT(ETH_P_IPV6),
 			.ip6h = L2_BUF_IP6_INIT(IPPROTO_UDP)
 		};
 	}
@@ -387,16 +373,13 @@ static void udp_sock6_iov_init(const struct ctx *c)
 		mh->msg_iovlen			= 1;
 	}
 
-	for (i = 0, h = udp6_l2_mh_tap; i < UDP_MAX_FRAMES; i++, h++) {
-		struct msghdr *mh = &h->msg_hdr;
-
-		if (c->mode == MODE_PASTA)
-			udp6_l2_iov_tap[i].iov_base	= &udp6_l2_buf[i].eh;
-		else
-			udp6_l2_iov_tap[i].iov_base	= &udp6_l2_buf[i].vnet_len;
+	for (i = 0; i < UDP_MAX_FRAMES; i++) {
+		struct msghdr *mh = &udp6_l2_mh_tap[i].msg_hdr;
+		struct iovec *iov = &udp6_l2_iov_tap[i];
 
-		mh->msg_iov			= &udp6_l2_iov_tap[i];
-		mh->msg_iovlen			= 1;
+		iov->iov_base	= tap_iov_base(c, &udp6_l2_buf[i].taph);
+		mh->msg_iov	= iov;
+		mh->msg_iovlen	= 1;
 	}
 }
 
@@ -609,8 +592,8 @@ static size_t udp_update_hdr4(const struct ctx *c, int n, in_port_t dstport,
 			      const struct timespec *now)
 {
 	struct udp4_l2_buf_t *b = &udp4_l2_buf[n];
-	size_t ip_len, buf_len;
 	in_port_t src_port;
+	size_t ip_len;
 
 	ip_len = udp4_l2_mh_sock[n].msg_len + sizeof(b->iph) + sizeof(b->uh);
 
@@ -644,14 +627,7 @@ static size_t udp_update_hdr4(const struct ctx *c, int n, in_port_t dstport,
 	b->uh.dest = htons(dstport);
 	b->uh.len = htons(udp4_l2_mh_sock[n].msg_len + sizeof(b->uh));
 
-	buf_len = ip_len + sizeof(b->eh);
-
-	if (c->mode == MODE_PASST) {
-		b->vnet_len = htonl(buf_len);
-		buf_len += sizeof(b->vnet_len);
-	}
-
-	return buf_len;
+	return tap_iov_len(c, &b->taph, ip_len);
 }
 
 /**
@@ -667,9 +643,9 @@ static size_t udp_update_hdr6(const struct ctx *c, int n, in_port_t dstport,
 			      const struct timespec *now)
 {
 	struct udp6_l2_buf_t *b = &udp6_l2_buf[n];
-	size_t ip_len, buf_len;
 	struct in6_addr *src;
 	in_port_t src_port;
+	size_t ip_len;
 
 	src = &b->s_in6.sin6_addr;
 	src_port = ntohs(b->s_in6.sin6_port);
@@ -726,14 +702,7 @@ static size_t udp_update_hdr6(const struct ctx *c, int n, in_port_t dstport,
 	b->ip6h.nexthdr = IPPROTO_UDP;
 	b->ip6h.hop_limit = 255;
 
-	buf_len = ip_len + sizeof(b->eh);
-
-	if (c->mode == MODE_PASST) {
-		b->vnet_len = htonl(buf_len);
-		buf_len += sizeof(b->vnet_len);
-	}
-
-	return buf_len;
+	return tap_iov_len(c, &b->taph, ip_len);
 }
 
 /**
diff --git a/util.h b/util.h
index b5be7a6..4a1c03e 100644
--- a/util.h
+++ b/util.h
@@ -102,13 +102,6 @@ int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags,
 			 (void *)(arg));				\
 	} while (0)
 
-#define L2_BUF_ETH_INIT(proto)						\
-	{								\
-		.h_dest		= { 0 },				\
-		.h_source	= { 0 },				\
-		.h_proto	= htons_constant(proto),		\
-	}
-
 #define L2_BUF_IP4_INIT(proto)						\
 	{								\
 		.version	= 4,					\
-- 
@@ -102,13 +102,6 @@ int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags,
 			 (void *)(arg));				\
 	} while (0)
 
-#define L2_BUF_ETH_INIT(proto)						\
-	{								\
-		.h_dest		= { 0 },				\
-		.h_source	= { 0 },				\
-		.h_proto	= htons_constant(proto),		\
-	}
-
 #define L2_BUF_IP4_INIT(proto)						\
 	{								\
 		.version	= 4,					\
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 17/18] udp: Use tap_send_frames()
  2022-12-09  5:42 [PATCH v2 00/18] RFC: Unify and simplify tap send path David Gibson
                   ` (15 preceding siblings ...)
  2022-12-09  5:42 ` [PATCH v2 16/18] udp: Use abstracted tap header David Gibson
@ 2022-12-09  5:42 ` David Gibson
  2023-01-04 17:45   ` Stefano Brivio
  2022-12-09  5:42 ` [PATCH v2 18/18] tap: Improve handling of partial frame sends David Gibson
  2023-01-04 17:45 ` [PATCH v2 00/18] RFC: Unify and simplify tap send path Stefano Brivio
  18 siblings, 1 reply; 30+ messages in thread
From: David Gibson @ 2022-12-09  5:42 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

To send frames on the tap interface, the UDP uses a fairly complicated two
level batching.  First multiple frames are gathered into a single "message"
for the qemu stream socket, then multiple messages are send with
sendmmsg().  We now have tap_send_frames() which already deals with sending
a number of frames, including batching and handling partial sends.  Use
that to considerably simplify things.

This does make a couple of behavioural changes:
  * We no longer split messages to keep them under 64kiB, which comments
    say is necessary.  However the TCP code didn't have equivalent code, so
    either this isn't actually needed, or we should implement for both
    (which is now easier since it can be done in one place).
  * Previously when we got a partial send on UDP, we would resend the
    remainder of the entire "message", including multiple frames.  The
    common code now only resends the remainder of a single frame, simply
    dropping any frames which weren't even partially sent.  This is what
    TCP always did and is probably a better idea for UDP too.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 pcap.c |  27 -----------
 pcap.h |   1 -
 udp.c  | 141 +++------------------------------------------------------
 udp.h  |   2 +-
 4 files changed, 8 insertions(+), 163 deletions(-)

diff --git a/pcap.c b/pcap.c
index 8af9021..36f16b5 100644
--- a/pcap.c
+++ b/pcap.c
@@ -120,33 +120,6 @@ void pcap_multiple(const struct iovec *iov, unsigned int n, size_t offset)
 			   iov[i].iov_len - offset, &tv);
 }
 
-/**
- * pcapm() - Capture multiple frames from multiple message headers to pcap file
- * @mmh:	Pointer to first sendmmsg() header
- */
-void pcapmm(const struct mmsghdr *mmh, unsigned int vlen)
-{
-	struct iovec *iov;
-	struct timeval tv;
-	unsigned int i, j;
-
-	if (pcap_fd == -1)
-		return;
-
-	gettimeofday(&tv, NULL);
-
-	for (i = 0; i < vlen; i++) {
-		const struct msghdr *mh = &mmh[i].msg_hdr;
-
-		for (j = 0; j < mh->msg_iovlen; j++) {
-			iov = &mh->msg_iov[j];
-
-			pcap_frame((char *)iov->iov_base + 4, iov->iov_len - 4,
-				   &tv);
-		}
-	}
-}
-
 /**
  * pcap_init() - Initialise pcap file
  * @c:		Execution context
diff --git a/pcap.h b/pcap.h
index eafc89b..c2af3cf 100644
--- a/pcap.h
+++ b/pcap.h
@@ -8,7 +8,6 @@
 
 void pcap(const char *pkt, size_t len);
 void pcap_multiple(const struct iovec *iov, unsigned int n, size_t offset);
-void pcapmm(const struct mmsghdr *mmh, unsigned int vlen);
 void pcap_init(struct ctx *c);
 
 #endif /* PCAP_H */
diff --git a/udp.c b/udp.c
index 910f15f..596e59f 100644
--- a/udp.c
+++ b/udp.c
@@ -224,9 +224,6 @@ static struct iovec	udp6_l2_iov_tap		[UDP_MAX_FRAMES];
 static struct mmsghdr	udp4_l2_mh_sock		[UDP_MAX_FRAMES];
 static struct mmsghdr	udp6_l2_mh_sock		[UDP_MAX_FRAMES];
 
-static struct mmsghdr	udp4_l2_mh_tap		[UDP_MAX_FRAMES];
-static struct mmsghdr	udp6_l2_mh_tap		[UDP_MAX_FRAMES];
-
 /* recvmmsg()/sendmmsg() data for "spliced" connections */
 static struct iovec	udp4_iov_splice		[UDP_MAX_FRAMES];
 static struct iovec	udp6_iov_splice		[UDP_MAX_FRAMES];
@@ -336,12 +333,9 @@ static void udp_sock4_iov_init(const struct ctx *c)
 	}
 
 	for (i = 0; i < UDP_MAX_FRAMES; i++) {
-		struct msghdr *mh = &udp4_l2_mh_tap[i].msg_hdr;
 		struct iovec *iov = &udp4_l2_iov_tap[i];
 
-		iov->iov_base	= tap_iov_base(c, &udp4_l2_buf[i].taph);
-		mh->msg_iov	= iov;
-		mh->msg_iovlen	= 1;
+		iov->iov_base = tap_iov_base(c, &udp4_l2_buf[i].taph);
 	}
 }
 
@@ -374,12 +368,9 @@ static void udp_sock6_iov_init(const struct ctx *c)
 	}
 
 	for (i = 0; i < UDP_MAX_FRAMES; i++) {
-		struct msghdr *mh = &udp6_l2_mh_tap[i].msg_hdr;
 		struct iovec *iov = &udp6_l2_iov_tap[i];
 
-		iov->iov_base	= tap_iov_base(c, &udp6_l2_buf[i].taph);
-		mh->msg_iov	= iov;
-		mh->msg_iovlen	= 1;
+		iov->iov_base = tap_iov_base(c, &udp6_l2_buf[i].taph);
 	}
 }
 
@@ -705,102 +696,6 @@ static size_t udp_update_hdr6(const struct ctx *c, int n, in_port_t dstport,
 	return tap_iov_len(c, &b->taph, ip_len);
 }
 
-/**
- * udp_tap_send_pasta() - Send datagrams to the pasta tap interface
- * @c:		Execution context
- * @mmh:	Array of message headers to send
- * @n:		Number of message headers to send
- *
- * #syscalls:pasta write
- */
-static void udp_tap_send_pasta(const struct ctx *c, struct mmsghdr *mmh,
-			       unsigned int n)
-{
-	unsigned int i, j;
-
-	for (i = 0; i < n; i++) {
-		for (j = 0; j < mmh[i].msg_hdr.msg_iovlen; j++) {
-			struct iovec *iov = &mmh[i].msg_hdr.msg_iov[j];
-
-			/* We can't use writev() because the tap
-			 * character device relies on the write()
-			 * boundaries to discern frame boundaries
-			 */
-			if (write(c->fd_tap, iov->iov_base, iov->iov_len) < 0)
-				debug("tap write: %s", strerror(errno));
-			else
-				pcap(iov->iov_base, iov->iov_len);
-		}
-	}
-}
-
-/**
- * udp_tap_send_passt() - Send datagrams to the passt tap interface
- * @c:		Execution context
- * @mmh:	Array of message headers to send
- * @n:		Number of message headers to send
- *
- * #syscalls:passt sendmmsg sendmsg
- */
-static void udp_tap_send_passt(const struct ctx *c, struct mmsghdr *mmh, int n)
-{
-	struct msghdr *last_mh;
-	ssize_t missing = 0;
-	size_t msg_len = 0;
-	unsigned int i;
-	int ret;
-
-	ret = sendmmsg(c->fd_tap, mmh, n, MSG_NOSIGNAL | MSG_DONTWAIT);
-	if (ret <= 0)
-		return;
-
-	/* If we lose some messages to sendmmsg() here, fine, it's UDP. However,
-	 * the last message needs to be delivered completely, otherwise qemu
-	 * will fail to reassemble the next message and close the connection. Go
-	 * through headers from the last sent message, counting bytes, and, if
-	 * and as soon as we see more bytes than sendmmsg() sent, re-send the
-	 * rest with a blocking call.
-	 *
-	 * In pictures, given this example:
-	 *
-	 *				 	iov #0  iov #1  iov #2  iov #3
-	 * tap_mmh[ret - 1].msg_hdr:		....    ......  .....   ......
-	 * tap_mmh[ret - 1].msg_len:	7	....    ...
-	 *
-	 * when 'msglen' reaches:	10		      ^
-	 * and 'missing' below is:	3	           ---
-	 *
-	 * re-send everything from here:		   ^--  -----   ------
-	 */
-	last_mh = &mmh[ret - 1].msg_hdr;
-	for (i = 0; i < last_mh->msg_iovlen; i++) {
-		if (missing <= 0) {
-			msg_len += last_mh->msg_iov[i].iov_len;
-			missing = msg_len - mmh[ret - 1].msg_len;
-		}
-
-		if (missing > 0) {
-			uint8_t **iov_base;
-			int first_offset;
-
-			iov_base = (uint8_t **)&last_mh->msg_iov[i].iov_base;
-			first_offset = last_mh->msg_iov[i].iov_len - missing;
-			*iov_base += first_offset;
-			last_mh->msg_iov[i].iov_len = missing;
-
-			last_mh->msg_iov = &last_mh->msg_iov[i];
-
-			if (sendmsg(c->fd_tap, last_mh, MSG_NOSIGNAL) < 0)
-				debug("UDP: %li bytes to tap missing", missing);
-
-			*iov_base -= first_offset;
-			break;
-		}
-	}
-
-	pcapmm(mmh, ret);
-}
-
 /**
  * udp_tap_send() - Prepare UDP datagrams and send to tap interface
  * @c:		Execution context
@@ -812,25 +707,18 @@ static void udp_tap_send_passt(const struct ctx *c, struct mmsghdr *mmh, int n)
  *
  * Return: size of tap frame with headers
  */
-static void udp_tap_send(const struct ctx *c,
+static void udp_tap_send(struct ctx *c,
 			 unsigned int start, unsigned int n,
 			 in_port_t dstport, bool v6, const struct timespec *now)
 {
-	int msg_bufs = 0, msg_i = 0;
-	struct mmsghdr *tap_mmh;
 	struct iovec *tap_iov;
-	ssize_t msg_len = 0;
 	unsigned int i;
 
-	if (v6) {
-		tap_mmh = udp6_l2_mh_tap;
+	if (v6)
 		tap_iov = udp6_l2_iov_tap;
-	} else {
-		tap_mmh = udp4_l2_mh_tap;
+	else
 		tap_iov = udp4_l2_iov_tap;
-	}
 
-	tap_mmh[0].msg_hdr.msg_iov = &tap_iov[start];
 	for (i = start; i < start + n; i++) {
 		size_t buf_len;
 
@@ -840,24 +728,9 @@ static void udp_tap_send(const struct ctx *c,
 			buf_len = udp_update_hdr4(c, i, dstport, now);
 
 		tap_iov[i].iov_len = buf_len;
-
-		/* With bigger messages, qemu closes the connection. */
-		if (c->mode == MODE_PASST && msg_bufs &&
-		    msg_len + buf_len > SHRT_MAX) {
-			tap_mmh[msg_i].msg_hdr.msg_iovlen = msg_bufs;
-			msg_i++;
-			tap_mmh[msg_i].msg_hdr.msg_iov = &tap_iov[i];
-			msg_len = msg_bufs = 0;
-		}
-		msg_len += buf_len;
-		msg_bufs++;
 	}
-	tap_mmh[msg_i].msg_hdr.msg_iovlen = msg_bufs;
 
-	if (c->mode == MODE_PASTA)
-		udp_tap_send_pasta(c, tap_mmh, msg_i + 1);
-	else
-		udp_tap_send_passt(c, tap_mmh, msg_i + 1);
+	tap_send_frames(c, tap_iov + start, n);
 }
 
 /**
@@ -869,7 +742,7 @@ static void udp_tap_send(const struct ctx *c,
  *
  * #syscalls recvmmsg
  */
-void udp_sock_handler(const struct ctx *c, union epoll_ref ref, uint32_t events,
+void udp_sock_handler(struct ctx *c, union epoll_ref ref, uint32_t events,
 		      const struct timespec *now)
 {
 	in_port_t dstport = ref.r.p.udp.udp.port;
diff --git a/udp.h b/udp.h
index 2a03335..68082ea 100644
--- a/udp.h
+++ b/udp.h
@@ -8,7 +8,7 @@
 
 #define UDP_TIMER_INTERVAL		1000 /* ms */
 
-void udp_sock_handler(const struct ctx *c, union epoll_ref ref, uint32_t events,
+void udp_sock_handler(struct ctx *c, union epoll_ref ref, uint32_t events,
 		      const struct timespec *now);
 int udp_tap_handler(struct ctx *c, int af, const void *addr,
 		    const struct pool *p, const struct timespec *now);
-- 
@@ -8,7 +8,7 @@
 
 #define UDP_TIMER_INTERVAL		1000 /* ms */
 
-void udp_sock_handler(const struct ctx *c, union epoll_ref ref, uint32_t events,
+void udp_sock_handler(struct ctx *c, union epoll_ref ref, uint32_t events,
 		      const struct timespec *now);
 int udp_tap_handler(struct ctx *c, int af, const void *addr,
 		    const struct pool *p, const struct timespec *now);
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 18/18] tap: Improve handling of partial frame sends
  2022-12-09  5:42 [PATCH v2 00/18] RFC: Unify and simplify tap send path David Gibson
                   ` (16 preceding siblings ...)
  2022-12-09  5:42 ` [PATCH v2 17/18] udp: Use tap_send_frames() David Gibson
@ 2022-12-09  5:42 ` David Gibson
  2023-01-04 17:45   ` Stefano Brivio
  2023-01-04 17:45 ` [PATCH v2 00/18] RFC: Unify and simplify tap send path Stefano Brivio
  18 siblings, 1 reply; 30+ messages in thread
From: David Gibson @ 2022-12-09  5:42 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

In passt mode, when writing frames to the qemu socket, we might get a short
send.  If we ignored this and carried on, the qemu socket would get out of
sync, because the bytes we actually sent wouldn't correspond  to the length
header we already sent.  tap_send_frames_passt() handles that by doing a
a blocking send to complete the message, but it has a few flaws:
 * We only attempt to resend once: although it's unlikely in practice,
   nothing prevents the blocking send() from also being short
 * We print a debug error if send() returns non-zero.. but send() returns
   the number of bytes sent, so we actually want it to return the length
   of the remaining data.

Correct those flaws and also be a bit more thorough about reporting
problems here.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 tap.c | 51 ++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 38 insertions(+), 13 deletions(-)

diff --git a/tap.c b/tap.c
index 5ec6b70..4fb27ca 100644
--- a/tap.c
+++ b/tap.c
@@ -326,13 +326,39 @@ static void tap_send_frames_pasta(struct ctx *c,
 	}
 }
 
+/**
+ * tap_send_remainder() - Send remainder of a partiall sent frame
+ * @c:		Execution context
+ * @iov:	Partially sent buffer
+ * @offset:	Number of bytes already sent from @iov
+ *
+ * #syscalls:passt sendto
+ */
+static void tap_send_remainder(const struct ctx *c, const struct iovec *iov,
+			       size_t offset)
+{
+	const char *base = (char *)iov->iov_base;
+	size_t len = iov->iov_len;
+
+	while (offset < len) {
+		ssize_t sent = send(c->fd_tap, base + offset, len - offset,
+				    MSG_NOSIGNAL);
+		if (sent < 0) {
+			err("tap: partial frame send (missing %lu bytes): %s",
+			    len - offset, strerror(errno));
+			return;
+		}
+		offset += sent;
+	}
+}
+
 /**
  * tap_send_frames_passt() - Send multiple frames to the passt tap
  * @c:		Execution context
  * @iov:	Array of buffers, each containing one frame
  * @n:		Number of buffers/frames in @iov
  *
- * #syscalls:passt sendmsg send
+ * #syscalls:passt sendmsg
  */
 static void tap_send_frames_passt(const struct ctx *c,
 				  const struct iovec *iov, size_t n)
@@ -341,29 +367,28 @@ static void tap_send_frames_passt(const struct ctx *c,
 		.msg_iov = (void *)iov,
 		.msg_iovlen = n,
 	};
-	size_t end = 0, missing;
 	unsigned int i;
 	ssize_t sent;
-	char *p;
 
 	sent = sendmsg(c->fd_tap, &mh, MSG_NOSIGNAL | MSG_DONTWAIT);
 	if (sent < 0)
 		return;
 
-	/* Ensure a complete last message on partial sendmsg() */
-	for (i = 0; i < n; i++, iov++) {
-		end += iov->iov_len;
-		if (end >= (size_t)sent)
+	/* Check for any partial frames due to short send */
+	for (i = 0; i < n; i++) {
+		if ((size_t)sent < iov[i].iov_len)
 			break;
+		sent -= iov[i].iov_len;
 	}
 
-	missing = end - sent;
-	if (!missing)
-		return;
+	if (i < n && sent) {
+		/* A partial frame was sent */
+		tap_send_remainder(c, &iov[i], sent);
+		i++;
+	}
 
-	p = (char *)iov->iov_base + iov->iov_len - missing;
-	if (send(c->fd_tap, p, missing, MSG_NOSIGNAL))
-		debug("tap: failed to flush %lu missing bytes to tap", missing);
+	if (i < n)
+		debug("tap: dropped %lu frames due to short send", n - i);
 }
 
 /**
-- 
@@ -326,13 +326,39 @@ static void tap_send_frames_pasta(struct ctx *c,
 	}
 }
 
+/**
+ * tap_send_remainder() - Send remainder of a partiall sent frame
+ * @c:		Execution context
+ * @iov:	Partially sent buffer
+ * @offset:	Number of bytes already sent from @iov
+ *
+ * #syscalls:passt sendto
+ */
+static void tap_send_remainder(const struct ctx *c, const struct iovec *iov,
+			       size_t offset)
+{
+	const char *base = (char *)iov->iov_base;
+	size_t len = iov->iov_len;
+
+	while (offset < len) {
+		ssize_t sent = send(c->fd_tap, base + offset, len - offset,
+				    MSG_NOSIGNAL);
+		if (sent < 0) {
+			err("tap: partial frame send (missing %lu bytes): %s",
+			    len - offset, strerror(errno));
+			return;
+		}
+		offset += sent;
+	}
+}
+
 /**
  * tap_send_frames_passt() - Send multiple frames to the passt tap
  * @c:		Execution context
  * @iov:	Array of buffers, each containing one frame
  * @n:		Number of buffers/frames in @iov
  *
- * #syscalls:passt sendmsg send
+ * #syscalls:passt sendmsg
  */
 static void tap_send_frames_passt(const struct ctx *c,
 				  const struct iovec *iov, size_t n)
@@ -341,29 +367,28 @@ static void tap_send_frames_passt(const struct ctx *c,
 		.msg_iov = (void *)iov,
 		.msg_iovlen = n,
 	};
-	size_t end = 0, missing;
 	unsigned int i;
 	ssize_t sent;
-	char *p;
 
 	sent = sendmsg(c->fd_tap, &mh, MSG_NOSIGNAL | MSG_DONTWAIT);
 	if (sent < 0)
 		return;
 
-	/* Ensure a complete last message on partial sendmsg() */
-	for (i = 0; i < n; i++, iov++) {
-		end += iov->iov_len;
-		if (end >= (size_t)sent)
+	/* Check for any partial frames due to short send */
+	for (i = 0; i < n; i++) {
+		if ((size_t)sent < iov[i].iov_len)
 			break;
+		sent -= iov[i].iov_len;
 	}
 
-	missing = end - sent;
-	if (!missing)
-		return;
+	if (i < n && sent) {
+		/* A partial frame was sent */
+		tap_send_remainder(c, &iov[i], sent);
+		i++;
+	}
 
-	p = (char *)iov->iov_base + iov->iov_len - missing;
-	if (send(c->fd_tap, p, missing, MSG_NOSIGNAL))
-		debug("tap: failed to flush %lu missing bytes to tap", missing);
+	if (i < n)
+		debug("tap: dropped %lu frames due to short send", n - i);
 }
 
 /**
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 01/18] pcap: Introduce pcap_frame() helper
  2022-12-09  5:42 ` [PATCH v2 01/18] pcap: Introduce pcap_frame() helper David Gibson
@ 2023-01-04 17:45   ` Stefano Brivio
  2023-01-05  4:47     ` David Gibson
  0 siblings, 1 reply; 30+ messages in thread
From: Stefano Brivio @ 2023-01-04 17:45 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Fri,  9 Dec 2022 16:42:11 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> pcap(), pcapm() and pcapmm() duplicate some code, for the actual writing
> to the capture file.  The main purpose pf pcapm() and pcampp() not calling
> pcap seems to be to avoid repeatedly calling gettimeofday().  We can
> accomplish that while still sharing code by adding a new helper which
> takes the packet timestamp as a parameter.

Another purpose is to avoid logging errors for single messages in
batches -- if something goes wrong with the log file, it gets noisy to
the point of making passt unusable during throughput tests.

Sure, it's quite unlikely to happen during normal operation -- still,
should you re-spin, pcap_frame() could return errors and callers could
abort early in case.

-- 
Stefano


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 04/18] tcp: Don't keep compute total bytes in a message until we need it
  2022-12-09  5:42 ` [PATCH v2 04/18] tcp: Don't keep compute total bytes in a message until we need it David Gibson
@ 2023-01-04 17:45   ` Stefano Brivio
  2023-01-05  4:48     ` David Gibson
  0 siblings, 1 reply; 30+ messages in thread
From: Stefano Brivio @ 2023-01-04 17:45 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Fri,  9 Dec 2022 16:42:14 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> tcp[46]_l2_buf_bytes keep track of the total number of bytes we have
> queued to send to the tap interface.  tcp_l2_buf_flush_passt() uses this
> to determine if sendmsg() has sent all the data we requested, or whether
> we need to resend a trailing portion.

In the subject: s/compute/computed/.

-- 
Stefano


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 05/18] tcp: Improve interface to tcp_l2_buf_flush()
  2022-12-09  5:42 ` [PATCH v2 05/18] tcp: Improve interface to tcp_l2_buf_flush() David Gibson
@ 2023-01-04 17:45   ` Stefano Brivio
  2023-01-05  4:53     ` David Gibson
  0 siblings, 1 reply; 30+ messages in thread
From: Stefano Brivio @ 2023-01-04 17:45 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Fri,  9 Dec 2022 16:42:15 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> Currently this takes a msghdr, but the only thing we actually care about
> in there in is the io vector.  Make it take an io vector directly.  We also

s/in is/is/, if you re-spin.

> have a weird side effect of zeroing @buf_used.  Just pass this by value
> and zero it in the caller instead.

That was done on purpose: call a "flush" function, and that will set the
right value for @buf_used.

On the other hand, after 3/18, I think both options are equally
readable, and if you found it weird, so be it.

-- 
Stefano


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 17/18] udp: Use tap_send_frames()
  2022-12-09  5:42 ` [PATCH v2 17/18] udp: Use tap_send_frames() David Gibson
@ 2023-01-04 17:45   ` Stefano Brivio
  2023-01-05  4:54     ` David Gibson
  0 siblings, 1 reply; 30+ messages in thread
From: Stefano Brivio @ 2023-01-04 17:45 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Fri,  9 Dec 2022 16:42:27 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> To send frames on the tap interface, the UDP uses a fairly complicated two
> level batching.  First multiple frames are gathered into a single "message"
> for the qemu stream socket, then multiple messages are send with
> sendmmsg().  We now have tap_send_frames() which already deals with sending
> a number of frames, including batching and handling partial sends.  Use
> that to considerably simplify things.
> 
> This does make a couple of behavioural changes:
>   * We no longer split messages to keep them under 64kiB, which comments
>     say is necessary.  However the TCP code didn't have equivalent code, so
>     either this isn't actually needed, or we should implement for both
>     (which is now easier since it can be done in one place).

That used to be SHRT_MAX (32 KiB - 1), not 64 KiB. For some reason, if
I sent messages with multiple frames, with any one of them larger than
32 KiB, qemu used to close the connection.

But it looks like this was caused by some issue in the handling of short
sends rather than an issue in qemu, especially given that, with this
series applied, I don't hit the original issue (and I could predictably
reproduce it earlier).

-- 
Stefano


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 18/18] tap: Improve handling of partial frame sends
  2022-12-09  5:42 ` [PATCH v2 18/18] tap: Improve handling of partial frame sends David Gibson
@ 2023-01-04 17:45   ` Stefano Brivio
  2023-01-05  4:57     ` David Gibson
  0 siblings, 1 reply; 30+ messages in thread
From: Stefano Brivio @ 2023-01-04 17:45 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Fri,  9 Dec 2022 16:42:28 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> In passt mode, when writing frames to the qemu socket, we might get a short
> send.  If we ignored this and carried on, the qemu socket would get out of
> sync, because the bytes we actually sent wouldn't correspond  to the length
> header we already sent.  tap_send_frames_passt() handles that by doing a
> a blocking send to complete the message, but it has a few flaws:
>  * We only attempt to resend once: although it's unlikely in practice,
>    nothing prevents the blocking send() from also being short
>  * We print a debug error if send() returns non-zero.. but send() returns
>    the number of bytes sent, so we actually want it to return the length
>    of the remaining data.
> 
> Correct those flaws and also be a bit more thorough about reporting
> problems here.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  tap.c | 51 ++++++++++++++++++++++++++++++++++++++-------------
>  1 file changed, 38 insertions(+), 13 deletions(-)
> 
> diff --git a/tap.c b/tap.c
> index 5ec6b70..4fb27ca 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -326,13 +326,39 @@ static void tap_send_frames_pasta(struct ctx *c,
>  	}
>  }
>  
> +/**
> + * tap_send_remainder() - Send remainder of a partiall sent frame

s/partiall/partially/

> + * @c:		Execution context
> + * @iov:	Partially sent buffer
> + * @offset:	Number of bytes already sent from @iov
> + *
> + * #syscalls:passt sendto

We should probably drop this, it's actually send(2) on armv6/armv7 and
ppc64le, sendto(2) otherwise -- initially I sprinkled those annotations
all over the place to show where specific calls were issued (and make
it easier to drop them from the filter if calls disappeared) but I
think it's not really practical for send(3).

-- 
Stefano


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 00/18] RFC: Unify and simplify tap send path
  2022-12-09  5:42 [PATCH v2 00/18] RFC: Unify and simplify tap send path David Gibson
                   ` (17 preceding siblings ...)
  2022-12-09  5:42 ` [PATCH v2 18/18] tap: Improve handling of partial frame sends David Gibson
@ 2023-01-04 17:45 ` Stefano Brivio
  18 siblings, 0 replies; 30+ messages in thread
From: Stefano Brivio @ 2023-01-04 17:45 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Fri,  9 Dec 2022 16:42:10 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> Although we have an abstraction for the "slow path" (DHCP, NDP) guest
> bound packets, the TCP and UDP forwarding paths write directly to the
> tap fd.  However, it turns out how they send frames to the tap device
> is more similar than it originally appears.
> 
> This series unifies the low-level tap send functions for TCP and UDP,
> and makes some clean ups along the way.
> 
> This is based on my earlier outstanding series.

Other than the minor remarks I shared, this looks good to me, and
it shows no functional or performance impact in tests.

-- 
Stefano


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 01/18] pcap: Introduce pcap_frame() helper
  2023-01-04 17:45   ` Stefano Brivio
@ 2023-01-05  4:47     ` David Gibson
  0 siblings, 0 replies; 30+ messages in thread
From: David Gibson @ 2023-01-05  4:47 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 1121 bytes --]

On Wed, Jan 04, 2023 at 06:45:03PM +0100, Stefano Brivio wrote:
> On Fri,  9 Dec 2022 16:42:11 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > pcap(), pcapm() and pcapmm() duplicate some code, for the actual writing
> > to the capture file.  The main purpose pf pcapm() and pcampp() not calling
> > pcap seems to be to avoid repeatedly calling gettimeofday().  We can
> > accomplish that while still sharing code by adding a new helper which
> > takes the packet timestamp as a parameter.
> 
> Another purpose is to avoid logging errors for single messages in
> batches -- if something goes wrong with the log file, it gets noisy to
> the point of making passt unusable during throughput tests.
> 
> Sure, it's quite unlikely to happen during normal operation -- still,
> should you re-spin, pcap_frame() could return errors and callers could
> abort early in case.

Good point, adjusted.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 04/18] tcp: Don't keep compute total bytes in a message until we need it
  2023-01-04 17:45   ` Stefano Brivio
@ 2023-01-05  4:48     ` David Gibson
  0 siblings, 0 replies; 30+ messages in thread
From: David Gibson @ 2023-01-05  4:48 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 683 bytes --]

On Wed, Jan 04, 2023 at 06:45:08PM +0100, Stefano Brivio wrote:
> On Fri,  9 Dec 2022 16:42:14 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > tcp[46]_l2_buf_bytes keep track of the total number of bytes we have
> > queued to send to the tap interface.  tcp_l2_buf_flush_passt() uses this
> > to determine if sendmsg() has sent all the data we requested, or whether
> > we need to resend a trailing portion.
> 
> In the subject: s/compute/computed/.

Fixed.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 05/18] tcp: Improve interface to tcp_l2_buf_flush()
  2023-01-04 17:45   ` Stefano Brivio
@ 2023-01-05  4:53     ` David Gibson
  0 siblings, 0 replies; 30+ messages in thread
From: David Gibson @ 2023-01-05  4:53 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 1138 bytes --]

On Wed, Jan 04, 2023 at 06:45:16PM +0100, Stefano Brivio wrote:
> On Fri,  9 Dec 2022 16:42:15 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > Currently this takes a msghdr, but the only thing we actually care about
> > in there in is the io vector.  Make it take an io vector directly.  We also
> 
> s/in is/is/, if you re-spin.

Fixed.

> > have a weird side effect of zeroing @buf_used.  Just pass this by value
> > and zero it in the caller instead.
> 
> That was done on purpose: call a "flush" function, and that will set the
> right value for @buf_used.

Hm, I guess.  It seemed odd to me because most of the rest of the
management of the buf_used variables is open coded in the caller.

> On the other hand, after 3/18, I think both options are equally
> readable, and if you found it weird, so be it.

More to the point I think the old way obstructed the changes I make
later in the series.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 17/18] udp: Use tap_send_frames()
  2023-01-04 17:45   ` Stefano Brivio
@ 2023-01-05  4:54     ` David Gibson
  0 siblings, 0 replies; 30+ messages in thread
From: David Gibson @ 2023-01-05  4:54 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 1709 bytes --]

On Wed, Jan 04, 2023 at 06:45:25PM +0100, Stefano Brivio wrote:
> On Fri,  9 Dec 2022 16:42:27 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > To send frames on the tap interface, the UDP uses a fairly complicated two
> > level batching.  First multiple frames are gathered into a single "message"
> > for the qemu stream socket, then multiple messages are send with
> > sendmmsg().  We now have tap_send_frames() which already deals with sending
> > a number of frames, including batching and handling partial sends.  Use
> > that to considerably simplify things.
> > 
> > This does make a couple of behavioural changes:
> >   * We no longer split messages to keep them under 64kiB, which comments
> >     say is necessary.  However the TCP code didn't have equivalent code, so
> >     either this isn't actually needed, or we should implement for both
> >     (which is now easier since it can be done in one place).
> 
> That used to be SHRT_MAX (32 KiB - 1), not 64 KiB. For some reason, if
> I sent messages with multiple frames, with any one of them larger than
> 32 KiB, qemu used to close the connection.
> 
> But it looks like this was caused by some issue in the handling of short
> sends rather than an issue in qemu, especially given that, with this
> series applied, I don't hit the original issue (and I could predictably
> reproduce it earlier).

Right, I already updated the commit messages and some comments based
on the discussion we had last night.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 18/18] tap: Improve handling of partial frame sends
  2023-01-04 17:45   ` Stefano Brivio
@ 2023-01-05  4:57     ` David Gibson
  0 siblings, 0 replies; 30+ messages in thread
From: David Gibson @ 2023-01-05  4:57 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 2242 bytes --]

On Wed, Jan 04, 2023 at 06:45:33PM +0100, Stefano Brivio wrote:
> On Fri,  9 Dec 2022 16:42:28 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > In passt mode, when writing frames to the qemu socket, we might get a short
> > send.  If we ignored this and carried on, the qemu socket would get out of
> > sync, because the bytes we actually sent wouldn't correspond  to the length
> > header we already sent.  tap_send_frames_passt() handles that by doing a
> > a blocking send to complete the message, but it has a few flaws:
> >  * We only attempt to resend once: although it's unlikely in practice,
> >    nothing prevents the blocking send() from also being short
> >  * We print a debug error if send() returns non-zero.. but send() returns
> >    the number of bytes sent, so we actually want it to return the length
> >    of the remaining data.
> > 
> > Correct those flaws and also be a bit more thorough about reporting
> > problems here.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  tap.c | 51 ++++++++++++++++++++++++++++++++++++++-------------
> >  1 file changed, 38 insertions(+), 13 deletions(-)
> > 
> > diff --git a/tap.c b/tap.c
> > index 5ec6b70..4fb27ca 100644
> > --- a/tap.c
> > +++ b/tap.c
> > @@ -326,13 +326,39 @@ static void tap_send_frames_pasta(struct ctx *c,
> >  	}
> >  }
> >  
> > +/**
> > + * tap_send_remainder() - Send remainder of a partiall sent frame
> 
> s/partiall/partially/

Fixed.

> > + * @c:		Execution context
> > + * @iov:	Partially sent buffer
> > + * @offset:	Number of bytes already sent from @iov
> > + *
> > + * #syscalls:passt sendto
> 
> We should probably drop this, it's actually send(2) on armv6/armv7 and
> ppc64le, sendto(2) otherwise -- initially I sprinkled those annotations
> all over the place to show where specific calls were issued (and make
> it easier to drop them from the filter if calls disappeared) but I
> think it's not really practical for send(3).

Heh, right.  Done.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2023-01-05  5:45 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-09  5:42 [PATCH v2 00/18] RFC: Unify and simplify tap send path David Gibson
2022-12-09  5:42 ` [PATCH v2 01/18] pcap: Introduce pcap_frame() helper David Gibson
2023-01-04 17:45   ` Stefano Brivio
2023-01-05  4:47     ` David Gibson
2022-12-09  5:42 ` [PATCH v2 02/18] pcap: Replace pcapm() with pcap_multiple() David Gibson
2022-12-09  5:42 ` [PATCH v2 03/18] tcp: Combine two parts of passt tap send path together David Gibson
2022-12-09  5:42 ` [PATCH v2 04/18] tcp: Don't keep compute total bytes in a message until we need it David Gibson
2023-01-04 17:45   ` Stefano Brivio
2023-01-05  4:48     ` David Gibson
2022-12-09  5:42 ` [PATCH v2 05/18] tcp: Improve interface to tcp_l2_buf_flush() David Gibson
2023-01-04 17:45   ` Stefano Brivio
2023-01-05  4:53     ` David Gibson
2022-12-09  5:42 ` [PATCH v2 06/18] tcp: Combine two parts of pasta tap send path together David Gibson
2022-12-09  5:42 ` [PATCH v2 07/18] tap, tcp: Move tap send path to tap.c David Gibson
2022-12-09  5:42 ` [PATCH v2 08/18] util: Introduce hton*_constant() in place of #ifdefs David Gibson
2022-12-09  5:42 ` [PATCH v2 09/18] tcp, udp: Use named field initializers in iov_init functions David Gibson
2022-12-09  5:42 ` [PATCH v2 10/18] util: Parameterize ethernet header initializer macro David Gibson
2022-12-09  5:42 ` [PATCH v2 11/18] tcp: Remove redundant and incorrect initialization from *_iov_init() David Gibson
2022-12-09  5:42 ` [PATCH v2 12/18] tcp: Consolidate calculation of total frame size David Gibson
2022-12-09  5:42 ` [PATCH v2 13/18] tap: Add "tap headers" abstraction David Gibson
2022-12-09  5:42 ` [PATCH v2 14/18] tcp: Use abstracted tap header David Gibson
2022-12-09  5:42 ` [PATCH v2 15/18] tap: Use different io vector bases depending on tap type David Gibson
2022-12-09  5:42 ` [PATCH v2 16/18] udp: Use abstracted tap header David Gibson
2022-12-09  5:42 ` [PATCH v2 17/18] udp: Use tap_send_frames() David Gibson
2023-01-04 17:45   ` Stefano Brivio
2023-01-05  4:54     ` David Gibson
2022-12-09  5:42 ` [PATCH v2 18/18] tap: Improve handling of partial frame sends David Gibson
2023-01-04 17:45   ` Stefano Brivio
2023-01-05  4:57     ` David Gibson
2023-01-04 17:45 ` [PATCH v2 00/18] RFC: Unify and simplify tap send path Stefano Brivio

Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).