public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
* [PATCH 0/6] Allow more use of iovecs in pcap and tap interfaces
@ 2024-02-22  5:55 David Gibson
  2024-02-22  5:55 ` [PATCH 1/6] util: Add helper to find offset into io vector David Gibson
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: David Gibson @ 2024-02-22  5:55 UTC (permalink / raw)
  To: passt-dev, Stefano Brivio, Laurent Vivier; +Cc: David Gibson

Laurent, sorry I didn't spot this earlier.

While working on other stuff I stumbled across these patches I wrote
quite a while back (part of a larger series on indefinite hiatus).
The first two patches have some overlap with the preliminary iovec
patches in the vhost-user series (and will certainly conflict).  I do
think the pcap interface here is slightly better than the one in the
vhost-user series, although I did ack that.

Stefano, if you've already applied / run tests for Laurent's series
then go ahead with it; I'll rework this on top of those.

David Gibson (6):
  util: Add helper to find offset into io vector
  pcap: Update pcap_frame() to take an iovec and offset
  util: Add write_remainder() helper
  pcap: Handle short writes in pcap_frame()
  pcap: Allow pcap_frame() and pcap_multiple() to take multi-buffer
    frames
  tap: Use write_remainder() in tap_send_frames_passt()

 pcap.c | 54 ++++++++++++++++++++++++++++--------------------------
 pcap.h |  3 ++-
 tap.c  | 42 +++++++++---------------------------------
 util.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 util.h |  2 ++
 5 files changed, 96 insertions(+), 60 deletions(-)

-- 
2.43.2


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/6] util: Add helper to find offset into io vector
  2024-02-22  5:55 [PATCH 0/6] Allow more use of iovecs in pcap and tap interfaces David Gibson
@ 2024-02-22  5:55 ` David Gibson
  2024-02-27 14:23   ` Stefano Brivio
  2024-02-22  5:55 ` [PATCH 2/6] pcap: Update pcap_frame() to take an iovec and offset David Gibson
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: David Gibson @ 2024-02-22  5:55 UTC (permalink / raw)
  To: passt-dev, Stefano Brivio, Laurent Vivier; +Cc: David Gibson

tap_send_frames_passt() needs to determine which buffer element a byte
offset into an IO vector lies in.  We have some upcoming uses for similar
logic, so split this out into a helper function iov_offset().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 tap.c  | 13 +++++--------
 util.c | 23 +++++++++++++++++++++++
 util.h |  1 +
 3 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/tap.c b/tap.c
index 396dee7e..f15eba6e 100644
--- a/tap.c
+++ b/tap.c
@@ -390,22 +390,19 @@ static size_t tap_send_frames_passt(const struct ctx *c,
 		.msg_iovlen = n,
 	};
 	unsigned int i;
+	size_t offset;
 	ssize_t sent;
 
 	sent = sendmsg(c->fd_tap, &mh, MSG_NOSIGNAL | MSG_DONTWAIT);
 	if (sent < 0)
 		return 0;
 
-	/* Check for any partial frames due to short send */
-	for (i = 0; i < n; i++) {
-		if ((size_t)sent < iov[i].iov_len)
-			break;
-		sent -= iov[i].iov_len;
-	}
+	offset = (size_t)sent;
+	i = iov_offset(iov, n, &offset);
 
-	if (i < n && sent) {
+	if (i < n && offset) {
 		/* A partial frame was sent */
-		tap_send_remainder(c, &iov[i], sent);
+		tap_send_remainder(c, &iov[i], offset);
 		i++;
 	}
 
diff --git a/util.c b/util.c
index 21b35ff9..fb6a0430 100644
--- a/util.c
+++ b/util.c
@@ -574,3 +574,26 @@ int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags,
 	return clone(fn, stack_area + stack_size / 2, flags, arg);
 #endif
 }
+
+/* iov_offset() - interpret offset into an IO vector
+ * @iov:	IO vector
+ * @n:		Number of entries in @iov
+ * @offset:	Pointer to offset into IO vector
+ *
+ * Return: index I of iovec which contains the given offset, or @n if
+ *         the given offset is >= the total # of bytes in the vector.
+ *         *@offset is updated to be the byte offset into (@iov + I),
+ *         and is guaranteed to be less than @iov[I].iov_len
+ */
+size_t iov_offset(const struct iovec *iov, size_t n, size_t *offset)
+{
+	size_t i;
+
+	for (i = 0; i < n; i++) {
+	if (*offset < iov[i].iov_len)
+			break;
+		*offset -= iov[i].iov_len;
+	}
+
+	return i;
+}
diff --git a/util.h b/util.h
index d2320f8c..62fad6fe 100644
--- a/util.h
+++ b/util.h
@@ -229,6 +229,7 @@ void write_pidfile(int fd, pid_t pid);
 int __daemon(int pidfile_fd, int devnull_fd);
 int fls(unsigned long x);
 int write_file(const char *path, const char *buf);
+size_t iov_offset(const struct iovec *iov, size_t n, size_t *offset);
 
 /**
  * mod_sub() - Modular arithmetic subtraction
-- 
@@ -229,6 +229,7 @@ void write_pidfile(int fd, pid_t pid);
 int __daemon(int pidfile_fd, int devnull_fd);
 int fls(unsigned long x);
 int write_file(const char *path, const char *buf);
+size_t iov_offset(const struct iovec *iov, size_t n, size_t *offset);
 
 /**
  * mod_sub() - Modular arithmetic subtraction
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/6] pcap: Update pcap_frame() to take an iovec and offset
  2024-02-22  5:55 [PATCH 0/6] Allow more use of iovecs in pcap and tap interfaces David Gibson
  2024-02-22  5:55 ` [PATCH 1/6] util: Add helper to find offset into io vector David Gibson
@ 2024-02-22  5:55 ` David Gibson
  2024-02-27 14:23   ` Stefano Brivio
  2024-02-22  5:55 ` [PATCH 3/6] util: Add write_remainder() helper David Gibson
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: David Gibson @ 2024-02-22  5:55 UTC (permalink / raw)
  To: passt-dev, Stefano Brivio, Laurent Vivier; +Cc: David Gibson

Update the low-level helper pcap_frame() to take a struct iovec and
offset within it, rather than an explicit pointer and length for the
frame.  This moves the handling of an offset (to skip vnet_len) from
pcap_multiple() to pcap_frame().

This doesn't accomplish a great deal immediately, but will make
subsequent changes easier.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 pcap.c | 29 ++++++++++++-----------------
 1 file changed, 12 insertions(+), 17 deletions(-)

diff --git a/pcap.c b/pcap.c
index 501d52d4..8876a051 100644
--- a/pcap.c
+++ b/pcap.c
@@ -67,24 +67,25 @@ struct pcap_pkthdr {
 
 /**
  * pcap_frame() - Capture a single frame to pcap file with given timestamp
- * @pkt:	Pointer to data buffer, including L2 headers
- * @len:	L2 packet length
+ * @iov:	iovec referencing buffer containing frame (with L2 headers)
+ * @offset:	Offset of the frame from @iov->iov_base
  * @tv:		Timestamp
  *
  * Returns: 0 on success, -errno on error writing to the file
  */
-static int pcap_frame(const char *pkt, size_t len, const struct timeval *tv)
+static void pcap_frame(const struct iovec *iov, size_t offset,
+		       const struct timeval *tv)
 {
+	size_t len = iov->iov_len - offset;
 	struct pcap_pkthdr h;
 
 	h.tv_sec = tv->tv_sec;
 	h.tv_usec = tv->tv_usec;
 	h.caplen = h.len = len;
 
-	if (write(pcap_fd, &h, sizeof(h)) < 0 || write(pcap_fd, pkt, len) < 0)
-		return -errno;
-
-	return 0;
+	if (write(pcap_fd, &h, sizeof(h)) < 0 ||
+	    write(pcap_fd, (char *)iov->iov_base + offset, len) < 0)
+		debug("Cannot log packet, length %zu", len);
 }
 
 /**
@@ -94,14 +95,14 @@ static int pcap_frame(const char *pkt, size_t len, const struct timeval *tv)
  */
 void pcap(const char *pkt, size_t len)
 {
+	struct iovec iov = { (char *)pkt, len };
 	struct timeval tv;
 
 	if (pcap_fd == -1)
 		return;
 
 	gettimeofday(&tv, NULL);
-	if (pcap_frame(pkt, len, &tv) != 0)
-		debug("Cannot log packet, length %zu", len);
+	pcap_frame(&iov, 0, &tv);
 }
 
 /**
@@ -120,14 +121,8 @@ void pcap_multiple(const struct iovec *iov, unsigned int n, size_t offset)
 
 	gettimeofday(&tv, NULL);
 
-	for (i = 0; i < n; i++) {
-		if (pcap_frame((char *)iov[i].iov_base + offset,
-			       iov[i].iov_len - offset, &tv) != 0) {
-			debug("Cannot log packet, length %zu",
-			      iov->iov_len - offset);
-			return;
-		}
-	}
+	for (i = 0; i < n; i++)
+		pcap_frame(iov + i, offset, &tv);
 }
 
 /**
-- 
@@ -67,24 +67,25 @@ struct pcap_pkthdr {
 
 /**
  * pcap_frame() - Capture a single frame to pcap file with given timestamp
- * @pkt:	Pointer to data buffer, including L2 headers
- * @len:	L2 packet length
+ * @iov:	iovec referencing buffer containing frame (with L2 headers)
+ * @offset:	Offset of the frame from @iov->iov_base
  * @tv:		Timestamp
  *
  * Returns: 0 on success, -errno on error writing to the file
  */
-static int pcap_frame(const char *pkt, size_t len, const struct timeval *tv)
+static void pcap_frame(const struct iovec *iov, size_t offset,
+		       const struct timeval *tv)
 {
+	size_t len = iov->iov_len - offset;
 	struct pcap_pkthdr h;
 
 	h.tv_sec = tv->tv_sec;
 	h.tv_usec = tv->tv_usec;
 	h.caplen = h.len = len;
 
-	if (write(pcap_fd, &h, sizeof(h)) < 0 || write(pcap_fd, pkt, len) < 0)
-		return -errno;
-
-	return 0;
+	if (write(pcap_fd, &h, sizeof(h)) < 0 ||
+	    write(pcap_fd, (char *)iov->iov_base + offset, len) < 0)
+		debug("Cannot log packet, length %zu", len);
 }
 
 /**
@@ -94,14 +95,14 @@ static int pcap_frame(const char *pkt, size_t len, const struct timeval *tv)
  */
 void pcap(const char *pkt, size_t len)
 {
+	struct iovec iov = { (char *)pkt, len };
 	struct timeval tv;
 
 	if (pcap_fd == -1)
 		return;
 
 	gettimeofday(&tv, NULL);
-	if (pcap_frame(pkt, len, &tv) != 0)
-		debug("Cannot log packet, length %zu", len);
+	pcap_frame(&iov, 0, &tv);
 }
 
 /**
@@ -120,14 +121,8 @@ void pcap_multiple(const struct iovec *iov, unsigned int n, size_t offset)
 
 	gettimeofday(&tv, NULL);
 
-	for (i = 0; i < n; i++) {
-		if (pcap_frame((char *)iov[i].iov_base + offset,
-			       iov[i].iov_len - offset, &tv) != 0) {
-			debug("Cannot log packet, length %zu",
-			      iov->iov_len - offset);
-			return;
-		}
-	}
+	for (i = 0; i < n; i++)
+		pcap_frame(iov + i, offset, &tv);
 }
 
 /**
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/6] util: Add write_remainder() helper
  2024-02-22  5:55 [PATCH 0/6] Allow more use of iovecs in pcap and tap interfaces David Gibson
  2024-02-22  5:55 ` [PATCH 1/6] util: Add helper to find offset into io vector David Gibson
  2024-02-22  5:55 ` [PATCH 2/6] pcap: Update pcap_frame() to take an iovec and offset David Gibson
@ 2024-02-22  5:55 ` David Gibson
  2024-02-27 14:25   ` Stefano Brivio
  2024-02-22  5:56 ` [PATCH 4/6] pcap: Handle short writes in pcap_frame() David Gibson
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: David Gibson @ 2024-02-22  5:55 UTC (permalink / raw)
  To: passt-dev, Stefano Brivio, Laurent Vivier; +Cc: David Gibson

We have several places where we want to write(2) a buffer or buffers and we
do (or should) handle sort write()s by retrying until everything is
successfully written.  Add a helper for this in util.c.

This version has some differences from the typical write_all() function.
First, take an IO vector rather than a single buffer, because that will be
useful for some of our cases.  Second, allow it to take an parameter to
skip the first n bytes of the given buffers.  This will be usefl for some
of the cases we want, and also falls out quite naturally from the
implementation.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 util.c | 32 ++++++++++++++++++++++++++++++++
 util.h |  1 +
 2 files changed, 33 insertions(+)

diff --git a/util.c b/util.c
index fb6a0430..a475f2b5 100644
--- a/util.c
+++ b/util.c
@@ -19,6 +19,7 @@
 #include <arpa/inet.h>
 #include <net/ethernet.h>
 #include <sys/epoll.h>
+#include <sys/uio.h>
 #include <fcntl.h>
 #include <string.h>
 #include <time.h>
@@ -597,3 +598,34 @@ size_t iov_offset(const struct iovec *iov, size_t n, size_t *offset)
 
 	return i;
 }
+
+/* write_remainder() - write the tail of an IO vector to an fd
+ * @fd:		File descriptor
+ * @iov:	IO vector
+ * @iovcnt:	Number of entries in @iov
+ * @skip:	Number of bytes of the vector to skip writing
+ *
+ * Return: 0 on success, -1 on error (with errno set)
+ *
+ * #syscalls write writev
+ */
+int write_remainder(int fd, const struct iovec *iov, int iovcnt, size_t skip)
+{
+	int i;
+
+	while ((i = iov_offset(iov, iovcnt, &skip)) < iovcnt) {
+		ssize_t rc;
+
+		if (skip)
+			rc = write(fd, (char *)iov[i].iov_base + skip,
+				   iov[i].iov_len - skip);
+		else
+			rc = writev(fd, &iov[i], iovcnt - i);
+
+		if (rc < 0)
+			return -1;
+		skip += rc;
+	}
+
+	return 0;
+}
diff --git a/util.h b/util.h
index 62fad6fe..ee380f55 100644
--- a/util.h
+++ b/util.h
@@ -230,6 +230,7 @@ int __daemon(int pidfile_fd, int devnull_fd);
 int fls(unsigned long x);
 int write_file(const char *path, const char *buf);
 size_t iov_offset(const struct iovec *iov, size_t n, size_t *offset);
+int write_remainder(int fd, const struct iovec *iov, int iovcnt, size_t skip);
 
 /**
  * mod_sub() - Modular arithmetic subtraction
-- 
@@ -230,6 +230,7 @@ int __daemon(int pidfile_fd, int devnull_fd);
 int fls(unsigned long x);
 int write_file(const char *path, const char *buf);
 size_t iov_offset(const struct iovec *iov, size_t n, size_t *offset);
+int write_remainder(int fd, const struct iovec *iov, int iovcnt, size_t skip);
 
 /**
  * mod_sub() - Modular arithmetic subtraction
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 4/6] pcap: Handle short writes in pcap_frame()
  2024-02-22  5:55 [PATCH 0/6] Allow more use of iovecs in pcap and tap interfaces David Gibson
                   ` (2 preceding siblings ...)
  2024-02-22  5:55 ` [PATCH 3/6] util: Add write_remainder() helper David Gibson
@ 2024-02-22  5:56 ` David Gibson
  2024-02-22  5:56 ` [PATCH 5/6] pcap: Allow pcap_frame() and pcap_multiple() to take multi-buffer frames David Gibson
  2024-02-22  5:56 ` [PATCH 6/6] tap: Use write_remainder() in tap_send_frames_passt() David Gibson
  5 siblings, 0 replies; 16+ messages in thread
From: David Gibson @ 2024-02-22  5:56 UTC (permalink / raw)
  To: passt-dev, Stefano Brivio, Laurent Vivier; +Cc: David Gibson

Currently pcap_frame() assumes that if write() doesn't return an error, it
has written everything we want.  That's not necessarily true, because it
could return a short write.  That's not likely to happen on a regular file,
but there's not a lot of reason not to be robust here; it's conceivable we
might want to direct the pcap fd at a named pipe or similar.

So, make pcap_frame() handle short frames by using the write_remainder()
helper.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 pcap.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/pcap.c b/pcap.c
index 8876a051..eeb71a3c 100644
--- a/pcap.c
+++ b/pcap.c
@@ -77,15 +77,18 @@ static void pcap_frame(const struct iovec *iov, size_t offset,
 		       const struct timeval *tv)
 {
 	size_t len = iov->iov_len - offset;
-	struct pcap_pkthdr h;
-
-	h.tv_sec = tv->tv_sec;
-	h.tv_usec = tv->tv_usec;
-	h.caplen = h.len = len;
-
-	if (write(pcap_fd, &h, sizeof(h)) < 0 ||
-	    write(pcap_fd, (char *)iov->iov_base + offset, len) < 0)
-		debug("Cannot log packet, length %zu", len);
+	struct pcap_pkthdr h = {
+		.tv_sec = tv->tv_sec,
+		.tv_usec = tv->tv_usec,
+		.caplen = len,
+		.len = len
+	};
+	struct iovec hiov = { &h, sizeof(h) };
+
+	if (write_remainder(pcap_fd, &hiov, 1, 0) < 0 ||
+	    write_remainder(pcap_fd, iov, 1, offset) < 0)
+		debug("Cannot log packet, length %zu: %s",
+		      len, strerror(errno));
 }
 
 /**
-- 
@@ -77,15 +77,18 @@ static void pcap_frame(const struct iovec *iov, size_t offset,
 		       const struct timeval *tv)
 {
 	size_t len = iov->iov_len - offset;
-	struct pcap_pkthdr h;
-
-	h.tv_sec = tv->tv_sec;
-	h.tv_usec = tv->tv_usec;
-	h.caplen = h.len = len;
-
-	if (write(pcap_fd, &h, sizeof(h)) < 0 ||
-	    write(pcap_fd, (char *)iov->iov_base + offset, len) < 0)
-		debug("Cannot log packet, length %zu", len);
+	struct pcap_pkthdr h = {
+		.tv_sec = tv->tv_sec,
+		.tv_usec = tv->tv_usec,
+		.caplen = len,
+		.len = len
+	};
+	struct iovec hiov = { &h, sizeof(h) };
+
+	if (write_remainder(pcap_fd, &hiov, 1, 0) < 0 ||
+	    write_remainder(pcap_fd, iov, 1, offset) < 0)
+		debug("Cannot log packet, length %zu: %s",
+		      len, strerror(errno));
 }
 
 /**
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 5/6] pcap: Allow pcap_frame() and pcap_multiple() to take multi-buffer frames
  2024-02-22  5:55 [PATCH 0/6] Allow more use of iovecs in pcap and tap interfaces David Gibson
                   ` (3 preceding siblings ...)
  2024-02-22  5:56 ` [PATCH 4/6] pcap: Handle short writes in pcap_frame() David Gibson
@ 2024-02-22  5:56 ` David Gibson
  2024-02-27 14:26   ` Stefano Brivio
  2024-02-22  5:56 ` [PATCH 6/6] tap: Use write_remainder() in tap_send_frames_passt() David Gibson
  5 siblings, 1 reply; 16+ messages in thread
From: David Gibson @ 2024-02-22  5:56 UTC (permalink / raw)
  To: passt-dev, Stefano Brivio, Laurent Vivier; +Cc: David Gibson

pcap_frame() explicitly takes a single frame, and only allows a single
buffer (iovec) to be passed.  pcap_multiple() takes multiple buffers, but
explicitly expects exactly one frame per buffer.

Future changes are going to want to split single frames across multiple
buffers in some circumstances, so extend the pcap functions to allow for
that.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 pcap.c | 24 ++++++++++++++----------
 pcap.h |  3 ++-
 tap.c  |  2 +-
 3 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/pcap.c b/pcap.c
index eeb71a3c..9eb4f3d2 100644
--- a/pcap.c
+++ b/pcap.c
@@ -31,6 +31,7 @@
 #include "util.h"
 #include "passt.h"
 #include "log.h"
+#include "pcap.h"
 
 #define PCAP_VERSION_MINOR 4
 
@@ -67,14 +68,15 @@ struct pcap_pkthdr {
 
 /**
  * pcap_frame() - Capture a single frame to pcap file with given timestamp
- * @iov:	iovec referencing buffer containing frame (with L2 headers)
- * @offset:	Offset of the frame from @iov->iov_base
+ * @iov:	IO vector containing frame (with L2 headers and tap headers)
+ * @iovcnt:	Number of buffers (@iov entries) in frame
+ * @offset:	Byte offset of the L2 headers within @iov
  * @tv:		Timestamp
  *
  * Returns: 0 on success, -errno on error writing to the file
  */
-static void pcap_frame(const struct iovec *iov, size_t offset,
-		       const struct timeval *tv)
+static void pcap_frame(const struct iovec *iov, size_t iovcnt,
+		       size_t offset, const struct timeval *tv)
 {
 	size_t len = iov->iov_len - offset;
 	struct pcap_pkthdr h = {
@@ -86,7 +88,7 @@ static void pcap_frame(const struct iovec *iov, size_t offset,
 	struct iovec hiov = { &h, sizeof(h) };
 
 	if (write_remainder(pcap_fd, &hiov, 1, 0) < 0 ||
-	    write_remainder(pcap_fd, iov, 1, offset) < 0)
+	    write_remainder(pcap_fd, iov, iovcnt, offset) < 0)
 		debug("Cannot log packet, length %zu: %s",
 		      len, strerror(errno));
 }
@@ -105,16 +107,18 @@ void pcap(const char *pkt, size_t len)
 		return;
 
 	gettimeofday(&tv, NULL);
-	pcap_frame(&iov, 0, &tv);
+	pcap_frame(&iov, 1, 0, &tv);
 }
 
 /**
  * pcap_multiple() - Capture multiple frames
- * @iov:	Array of iovecs, one entry per frame
+ * @iov:	Array of iovecs, every @framebufs entries is one frame
+ * @framebufs:	Number of buffers per frame
  * @n:		Number of frames to capture
- * @offset:	Offset of the frame within each iovec buffer
+ * @offset:	Offset of the L2 frame within each iovec buffer
  */
-void pcap_multiple(const struct iovec *iov, unsigned int n, size_t offset)
+void pcap_multiple(const struct iovec *iov, size_t framebufs, unsigned int n,
+		   size_t offset)
 {
 	struct timeval tv;
 	unsigned int i;
@@ -125,7 +129,7 @@ void pcap_multiple(const struct iovec *iov, unsigned int n, size_t offset)
 	gettimeofday(&tv, NULL);
 
 	for (i = 0; i < n; i++)
-		pcap_frame(iov + i, offset, &tv);
+		pcap_frame(iov + i * framebufs, framebufs, offset, &tv);
 }
 
 /**
diff --git a/pcap.h b/pcap.h
index da5a7e84..b9a2257e 100644
--- a/pcap.h
+++ b/pcap.h
@@ -7,7 +7,8 @@
 #define PCAP_H
 
 void pcap(const char *pkt, size_t len);
-void pcap_multiple(const struct iovec *iov, unsigned int n, size_t offset);
+void pcap_multiple(const struct iovec *iov, size_t framebufs, unsigned int n,
+		   size_t offset);
 void pcap_init(struct ctx *c);
 
 #endif /* PCAP_H */
diff --git a/tap.c b/tap.c
index f15eba6e..7f0389f3 100644
--- a/tap.c
+++ b/tap.c
@@ -432,7 +432,7 @@ size_t tap_send_frames(const struct ctx *c, const struct iovec *iov, size_t n)
 	if (m < n)
 		debug("tap: failed to send %zu frames of %zu", n - m, n);
 
-	pcap_multiple(iov, m, c->mode == MODE_PASST ? sizeof(uint32_t) : 0);
+	pcap_multiple(iov, 1, n, c->mode == MODE_PASST ? sizeof(uint32_t) : 0);
 
 	return m;
 }
-- 
@@ -432,7 +432,7 @@ size_t tap_send_frames(const struct ctx *c, const struct iovec *iov, size_t n)
 	if (m < n)
 		debug("tap: failed to send %zu frames of %zu", n - m, n);
 
-	pcap_multiple(iov, m, c->mode == MODE_PASST ? sizeof(uint32_t) : 0);
+	pcap_multiple(iov, 1, n, c->mode == MODE_PASST ? sizeof(uint32_t) : 0);
 
 	return m;
 }
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 6/6] tap: Use write_remainder() in tap_send_frames_passt()
  2024-02-22  5:55 [PATCH 0/6] Allow more use of iovecs in pcap and tap interfaces David Gibson
                   ` (4 preceding siblings ...)
  2024-02-22  5:56 ` [PATCH 5/6] pcap: Allow pcap_frame() and pcap_multiple() to take multi-buffer frames David Gibson
@ 2024-02-22  5:56 ` David Gibson
  5 siblings, 0 replies; 16+ messages in thread
From: David Gibson @ 2024-02-22  5:56 UTC (permalink / raw)
  To: passt-dev, Stefano Brivio, Laurent Vivier; +Cc: David Gibson

When we determine we have sent a partial frame in tap_send_frames_passt(),
we call tap_send_remainder() to send the remainder of it.  The logic in
that function is very similar to that in the more general write_remainder()
except that it uses send() instead of write()/writev().  But we are dealing
specifically with the qemu socket here, which is a connected stream socket.
In that case write()s do the same thing as send() with the options we were
using, so we can just reuse write_remainder().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 tap.c | 29 ++++-------------------------
 1 file changed, 4 insertions(+), 25 deletions(-)

diff --git a/tap.c b/tap.c
index 7f0389f3..d2c66acb 100644
--- a/tap.c
+++ b/tap.c
@@ -348,30 +348,6 @@ static size_t tap_send_frames_pasta(const struct ctx *c,
 	return i;
 }
 
-/**
- * tap_send_remainder() - Send remainder of a partially sent frame
- * @c:		Execution context
- * @iov:	Partially sent buffer
- * @offset:	Number of bytes already sent from @iov
- */
-static void tap_send_remainder(const struct ctx *c, const struct iovec *iov,
-			       size_t offset)
-{
-	const char *base = (char *)iov->iov_base;
-	size_t len = iov->iov_len;
-
-	while (offset < len) {
-		ssize_t sent = send(c->fd_tap, base + offset, len - offset,
-				    MSG_NOSIGNAL);
-		if (sent < 0) {
-			err("tap: partial frame send (missing %zu bytes): %s",
-			    len - offset, strerror(errno));
-			return;
-		}
-		offset += sent;
-	}
-}
-
 /**
  * tap_send_frames_passt() - Send multiple frames to the passt tap
  * @c:		Execution context
@@ -402,7 +378,10 @@ static size_t tap_send_frames_passt(const struct ctx *c,
 
 	if (i < n && offset) {
 		/* A partial frame was sent */
-		tap_send_remainder(c, &iov[i], offset);
+		if (write_remainder(c->fd_tap, &iov[i], 1, offset) < 0) {
+			err("tap: partial frame send: %s", strerror(errno));
+			return i;
+		}
 		i++;
 	}
 
-- 
@@ -348,30 +348,6 @@ static size_t tap_send_frames_pasta(const struct ctx *c,
 	return i;
 }
 
-/**
- * tap_send_remainder() - Send remainder of a partially sent frame
- * @c:		Execution context
- * @iov:	Partially sent buffer
- * @offset:	Number of bytes already sent from @iov
- */
-static void tap_send_remainder(const struct ctx *c, const struct iovec *iov,
-			       size_t offset)
-{
-	const char *base = (char *)iov->iov_base;
-	size_t len = iov->iov_len;
-
-	while (offset < len) {
-		ssize_t sent = send(c->fd_tap, base + offset, len - offset,
-				    MSG_NOSIGNAL);
-		if (sent < 0) {
-			err("tap: partial frame send (missing %zu bytes): %s",
-			    len - offset, strerror(errno));
-			return;
-		}
-		offset += sent;
-	}
-}
-
 /**
  * tap_send_frames_passt() - Send multiple frames to the passt tap
  * @c:		Execution context
@@ -402,7 +378,10 @@ static size_t tap_send_frames_passt(const struct ctx *c,
 
 	if (i < n && offset) {
 		/* A partial frame was sent */
-		tap_send_remainder(c, &iov[i], offset);
+		if (write_remainder(c->fd_tap, &iov[i], 1, offset) < 0) {
+			err("tap: partial frame send: %s", strerror(errno));
+			return i;
+		}
 		i++;
 	}
 
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/6] util: Add helper to find offset into io vector
  2024-02-22  5:55 ` [PATCH 1/6] util: Add helper to find offset into io vector David Gibson
@ 2024-02-27 14:23   ` Stefano Brivio
  2024-02-27 23:27     ` David Gibson
  0 siblings, 1 reply; 16+ messages in thread
From: Stefano Brivio @ 2024-02-27 14:23 UTC (permalink / raw)
  To: David Gibson, Laurent Vivier; +Cc: passt-dev

On Thu, 22 Feb 2024 16:55:57 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> tap_send_frames_passt() needs to determine which buffer element a byte
> offset into an IO vector lies in.  We have some upcoming uses for similar
> logic, so split this out into a helper function iov_offset().
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  tap.c  | 13 +++++--------
>  util.c | 23 +++++++++++++++++++++++
>  util.h |  1 +

Laurent, I guess this will need to be moved to iov.h by your series, at
the point where you introduce the new header.

>  3 files changed, 29 insertions(+), 8 deletions(-)
> 
> diff --git a/tap.c b/tap.c
> index 396dee7e..f15eba6e 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -390,22 +390,19 @@ static size_t tap_send_frames_passt(const struct ctx *c,
>  		.msg_iovlen = n,
>  	};
>  	unsigned int i;
> +	size_t offset;
>  	ssize_t sent;
>  
>  	sent = sendmsg(c->fd_tap, &mh, MSG_NOSIGNAL | MSG_DONTWAIT);
>  	if (sent < 0)
>  		return 0;
>  
> -	/* Check for any partial frames due to short send */

Why would you drop this comment, though? I feel it's needed even more
as we don't open-code that any longer.

> -	for (i = 0; i < n; i++) {
> -		if ((size_t)sent < iov[i].iov_len)
> -			break;
> -		sent -= iov[i].iov_len;
> -	}
> +	offset = (size_t)sent;
> +	i = iov_offset(iov, n, &offset);

I think with these names and interface this becomes quite obscure: it
sounds like 'i' should be an offset at this point... and 'offset', I
have no idea (unless I read the comment to iov_offset()). Slightly
different proposal below.

>  
> -	if (i < n && sent) {
> +	if (i < n && offset) {
>  		/* A partial frame was sent */
> -		tap_send_remainder(c, &iov[i], sent);
> +		tap_send_remainder(c, &iov[i], offset);
>  		i++;
>  	}
>  
> diff --git a/util.c b/util.c
> index 21b35ff9..fb6a0430 100644
> --- a/util.c
> +++ b/util.c
> @@ -574,3 +574,26 @@ int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags,
>  	return clone(fn, stack_area + stack_size / 2, flags, arg);
>  #endif
>  }
> +
> +/* iov_offset() - interpret offset into an IO vector
> + * @iov:	IO vector
> + * @n:		Number of entries in @iov
> + * @offset:	Pointer to offset into IO vector
> + *
> + * Return: index I of iovec which contains the given offset, or @n if
> + *         the given offset is >= the total # of bytes in the vector.
> + *         *@offset is updated to be the byte offset into (@iov + I),
> + *         and is guaranteed to be less than @iov[I].iov_len
> + */
> +size_t iov_offset(const struct iovec *iov, size_t n, size_t *offset)

...what do you think of:

/* iov_entry_index() - Index and optionally entry offset, given global offset
 * @iov:		IO vector
 * @n:			Number of entries in @iov
 * @global_offset:	Global offset of byte in IO vector we're looking for
 * @entry_offset:	If not NULL, set on return: entry offset
 *
 * Return: index of IO vector entry for given byte offset, @n if not found
 *
 * Note: @entry_offset is guaranteed to be less than @iov[i].iov_len, where i is
 * the return value
 */

and tap_send_frames_passt() could (more) happily do:

	i = iov_entry_index(iov, n, sent, &entry_offset);
	if (i < n) {
		/* A partial frame was sent */
		tap_send_remainder(c, &iov[i], entry_offset);
		i++;
	}

> +{
> +	size_t i;
> +
> +	for (i = 0; i < n; i++) {
> +	if (*offset < iov[i].iov_len)

Indentation.

> +			break;
> +		*offset -= iov[i].iov_len;
> +	}
> +
> +	return i;
> +}
> diff --git a/util.h b/util.h
> index d2320f8c..62fad6fe 100644
> --- a/util.h
> +++ b/util.h
> @@ -229,6 +229,7 @@ void write_pidfile(int fd, pid_t pid);
>  int __daemon(int pidfile_fd, int devnull_fd);
>  int fls(unsigned long x);
>  int write_file(const char *path, const char *buf);
> +size_t iov_offset(const struct iovec *iov, size_t n, size_t *offset);
>  
>  /**
>   * mod_sub() - Modular arithmetic subtraction

-- 
Stefano


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/6] pcap: Update pcap_frame() to take an iovec and offset
  2024-02-22  5:55 ` [PATCH 2/6] pcap: Update pcap_frame() to take an iovec and offset David Gibson
@ 2024-02-27 14:23   ` Stefano Brivio
  0 siblings, 0 replies; 16+ messages in thread
From: Stefano Brivio @ 2024-02-27 14:23 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev, Laurent Vivier

On Thu, 22 Feb 2024 16:55:58 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> Update the low-level helper pcap_frame() to take a struct iovec and
> offset within it, rather than an explicit pointer and length for the
> frame.  This moves the handling of an offset (to skip vnet_len) from
> pcap_multiple() to pcap_frame().
> 
> This doesn't accomplish a great deal immediately, but will make
> subsequent changes easier.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  pcap.c | 29 ++++++++++++-----------------
>  1 file changed, 12 insertions(+), 17 deletions(-)
> 
> diff --git a/pcap.c b/pcap.c
> index 501d52d4..8876a051 100644
> --- a/pcap.c
> +++ b/pcap.c
> @@ -67,24 +67,25 @@ struct pcap_pkthdr {
>  
>  /**
>   * pcap_frame() - Capture a single frame to pcap file with given timestamp
> - * @pkt:	Pointer to data buffer, including L2 headers
> - * @len:	L2 packet length
> + * @iov:	iovec referencing buffer containing frame (with L2 headers)

For consistency with, say, 1/6: "IO vector"

-- 
Stefano


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/6] util: Add write_remainder() helper
  2024-02-22  5:55 ` [PATCH 3/6] util: Add write_remainder() helper David Gibson
@ 2024-02-27 14:25   ` Stefano Brivio
  2024-02-28  0:44     ` David Gibson
  0 siblings, 1 reply; 16+ messages in thread
From: Stefano Brivio @ 2024-02-27 14:25 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev, Laurent Vivier

On Thu, 22 Feb 2024 16:55:59 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> We have several places where we want to write(2) a buffer or buffers and we
> do (or should) handle sort write()s by retrying until everything is

After making sure that "handle sorting" doesn't exist (yet): is one
between "handle" and "sort" redundant, or is there another meaning to
this?

> successfully written.  Add a helper for this in util.c.
> 
> This version has some differences from the typical write_all() function.
> First, take an IO vector rather than a single buffer, because that will be
> useful for some of our cases.  Second, allow it to take an parameter to
> skip the first n bytes of the given buffers.  This will be usefl for some
> of the cases we want, and also falls out quite naturally from the
> implementation.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  util.c | 32 ++++++++++++++++++++++++++++++++
>  util.h |  1 +
>  2 files changed, 33 insertions(+)
> 
> diff --git a/util.c b/util.c
> index fb6a0430..a475f2b5 100644
> --- a/util.c
> +++ b/util.c
> @@ -19,6 +19,7 @@
>  #include <arpa/inet.h>
>  #include <net/ethernet.h>
>  #include <sys/epoll.h>
> +#include <sys/uio.h>
>  #include <fcntl.h>
>  #include <string.h>
>  #include <time.h>
> @@ -597,3 +598,34 @@ size_t iov_offset(const struct iovec *iov, size_t n, size_t *offset)
>  
>  	return i;
>  }
> +
> +/* write_remainder() - write the tail of an IO vector to an fd
> + * @fd:		File descriptor
> + * @iov:	IO vector
> + * @iovcnt:	Number of entries in @iov
> + * @skip:	Number of bytes of the vector to skip writing
> + *
> + * Return: 0 on success, -1 on error (with errno set)
> + *
> + * #syscalls write writev
> + */
> +int write_remainder(int fd, const struct iovec *iov, int iovcnt, size_t skip)
> +{
> +	int i;
> +
> +	while ((i = iov_offset(iov, iovcnt, &skip)) < iovcnt) {

With my proposal for 1/6 this becomes:

	while ((i = iov_entry_index(iov, iovcnt, skip, &offset)) < iovcnt) {

		if (offset)
			...

which I don't really find brilliant, but at least we don't do if (skip)
where 'skip' used to be something completely different.

> +		ssize_t rc;
> +
> +		if (skip)

Curly brackets here for consistency (undecided about readability to be
honest).

> +			rc = write(fd, (char *)iov[i].iov_base + skip,
> +				   iov[i].iov_len - skip);
> +		else
> +			rc = writev(fd, &iov[i], iovcnt - i);
> +
> +		if (rc < 0)
> +			return -1;
> +		skip += rc;
> +	}
> +
> +	return 0;
> +}
> diff --git a/util.h b/util.h
> index 62fad6fe..ee380f55 100644
> --- a/util.h
> +++ b/util.h
> @@ -230,6 +230,7 @@ int __daemon(int pidfile_fd, int devnull_fd);
>  int fls(unsigned long x);
>  int write_file(const char *path, const char *buf);
>  size_t iov_offset(const struct iovec *iov, size_t n, size_t *offset);
> +int write_remainder(int fd, const struct iovec *iov, int iovcnt, size_t skip);
>  
>  /**
>   * mod_sub() - Modular arithmetic subtraction

-- 
Stefano


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/6] pcap: Allow pcap_frame() and pcap_multiple() to take multi-buffer frames
  2024-02-22  5:56 ` [PATCH 5/6] pcap: Allow pcap_frame() and pcap_multiple() to take multi-buffer frames David Gibson
@ 2024-02-27 14:26   ` Stefano Brivio
  0 siblings, 0 replies; 16+ messages in thread
From: Stefano Brivio @ 2024-02-27 14:26 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev, Laurent Vivier

On Thu, 22 Feb 2024 16:56:01 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> pcap_frame() explicitly takes a single frame, and only allows a single
> buffer (iovec) to be passed.  pcap_multiple() takes multiple buffers, but
> explicitly expects exactly one frame per buffer.
> 
> Future changes are going to want to split single frames across multiple
> buffers in some circumstances, so extend the pcap functions to allow for
> that.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  pcap.c | 24 ++++++++++++++----------
>  pcap.h |  3 ++-
>  tap.c  |  2 +-
>  3 files changed, 17 insertions(+), 12 deletions(-)
> 
> diff --git a/pcap.c b/pcap.c
> index eeb71a3c..9eb4f3d2 100644
> --- a/pcap.c
> +++ b/pcap.c
> @@ -31,6 +31,7 @@
>  #include "util.h"
>  #include "passt.h"
>  #include "log.h"
> +#include "pcap.h"
>  
>  #define PCAP_VERSION_MINOR 4
>  
> @@ -67,14 +68,15 @@ struct pcap_pkthdr {
>  
>  /**
>   * pcap_frame() - Capture a single frame to pcap file with given timestamp
> - * @iov:	iovec referencing buffer containing frame (with L2 headers)
> - * @offset:	Offset of the frame from @iov->iov_base
> + * @iov:	IO vector containing frame (with L2 headers and tap headers)
> + * @iovcnt:	Number of buffers (@iov entries) in frame
> + * @offset:	Byte offset of the L2 headers within @iov

I swear I didn't read "Byte offset" from here before commenting on 1/6,
which is rather promising.

>   * @tv:		Timestamp
>   *
>   * Returns: 0 on success, -errno on error writing to the file
>   */
> -static void pcap_frame(const struct iovec *iov, size_t offset,
> -		       const struct timeval *tv)
> +static void pcap_frame(const struct iovec *iov, size_t iovcnt,
> +		       size_t offset, const struct timeval *tv)
>  {
>  	size_t len = iov->iov_len - offset;
>  	struct pcap_pkthdr h = {
> @@ -86,7 +88,7 @@ static void pcap_frame(const struct iovec *iov, size_t offset,
>  	struct iovec hiov = { &h, sizeof(h) };
>  
>  	if (write_remainder(pcap_fd, &hiov, 1, 0) < 0 ||
> -	    write_remainder(pcap_fd, iov, 1, offset) < 0)
> +	    write_remainder(pcap_fd, iov, iovcnt, offset) < 0)
>  		debug("Cannot log packet, length %zu: %s",
>  		      len, strerror(errno));
>  }
> @@ -105,16 +107,18 @@ void pcap(const char *pkt, size_t len)
>  		return;
>  
>  	gettimeofday(&tv, NULL);
> -	pcap_frame(&iov, 0, &tv);
> +	pcap_frame(&iov, 1, 0, &tv);
>  }
>  
>  /**
>   * pcap_multiple() - Capture multiple frames
> - * @iov:	Array of iovecs, one entry per frame
> + * @iov:	Array of iovecs, every @framebufs entries is one frame
> + * @framebufs:	Number of buffers per frame

I found this a bit hard to understand. What about:

* @iov:		Array of IO vectors, with item count @frame_parts * @n
* @frame_parts:	Number of IO vector items for each frame

?

The rest of the series looks good to me.

-- 
Stefano


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/6] util: Add helper to find offset into io vector
  2024-02-27 14:23   ` Stefano Brivio
@ 2024-02-27 23:27     ` David Gibson
  0 siblings, 0 replies; 16+ messages in thread
From: David Gibson @ 2024-02-27 23:27 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: Laurent Vivier, passt-dev

[-- Attachment #1: Type: text/plain, Size: 5226 bytes --]

On Tue, Feb 27, 2024 at 03:23:35PM +0100, Stefano Brivio wrote:
> On Thu, 22 Feb 2024 16:55:57 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > tap_send_frames_passt() needs to determine which buffer element a byte
> > offset into an IO vector lies in.  We have some upcoming uses for similar
> > logic, so split this out into a helper function iov_offset().
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  tap.c  | 13 +++++--------
> >  util.c | 23 +++++++++++++++++++++++
> >  util.h |  1 +
> 
> Laurent, I guess this will need to be moved to iov.h by your series, at
> the point where you introduce the new header.

To avoid that shuffle, I cherry picked Laurent's first patch into my
series, and rebased putting this function directly into iov.c.
Several of Laurent's functions can also be slightly simplified using
this helper, so I've done that.

> >  3 files changed, 29 insertions(+), 8 deletions(-)
> > 
> > diff --git a/tap.c b/tap.c
> > index 396dee7e..f15eba6e 100644
> > --- a/tap.c
> > +++ b/tap.c
> > @@ -390,22 +390,19 @@ static size_t tap_send_frames_passt(const struct ctx *c,
> >  		.msg_iovlen = n,
> >  	};
> >  	unsigned int i;
> > +	size_t offset;
> >  	ssize_t sent;
> >  
> >  	sent = sendmsg(c->fd_tap, &mh, MSG_NOSIGNAL | MSG_DONTWAIT);
> >  	if (sent < 0)
> >  		return 0;
> >  
> > -	/* Check for any partial frames due to short send */
> 
> Why would you drop this comment, though? I feel it's needed even more
> as we don't open-code that any longer.

I guess I thought it was redundant with the comment a few lines down?
Anyway, I've put it back in.

> 
> > -	for (i = 0; i < n; i++) {
> > -		if ((size_t)sent < iov[i].iov_len)
> > -			break;
> > -		sent -= iov[i].iov_len;
> > -	}
> > +	offset = (size_t)sent;
> > +	i = iov_offset(iov, n, &offset);
> 
> I think with these names and interface this becomes quite obscure: it
> sounds like 'i' should be an offset at this point... and 'offset', I
> have no idea (unless I read the comment to iov_offset()). Slightly
> different proposal below.

Yeah, that's fair.  I was think of "iov_offset" as "offset the IOV",
but "offset *of* the IOV" is probably a more obvious interpretation.

> >  
> > -	if (i < n && sent) {
> > +	if (i < n && offset) {
> >  		/* A partial frame was sent */
> > -		tap_send_remainder(c, &iov[i], sent);
> > +		tap_send_remainder(c, &iov[i], offset);
> >  		i++;
> >  	}
> >  
> > diff --git a/util.c b/util.c
> > index 21b35ff9..fb6a0430 100644
> > --- a/util.c
> > +++ b/util.c
> > @@ -574,3 +574,26 @@ int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags,
> >  	return clone(fn, stack_area + stack_size / 2, flags, arg);
> >  #endif
> >  }
> > +
> > +/* iov_offset() - interpret offset into an IO vector
> > + * @iov:	IO vector
> > + * @n:		Number of entries in @iov
> > + * @offset:	Pointer to offset into IO vector
> > + *
> > + * Return: index I of iovec which contains the given offset, or @n if
> > + *         the given offset is >= the total # of bytes in the vector.
> > + *         *@offset is updated to be the byte offset into (@iov + I),
> > + *         and is guaranteed to be less than @iov[I].iov_len
> > + */
> > +size_t iov_offset(const struct iovec *iov, size_t n, size_t *offset)
> 
> ...what do you think of:
> 
> /* iov_entry_index() - Index and optionally entry offset, given global offset

Hm, I find this name and and one line description confusing in a
different way from mine.  In my next spin, I've tried to synthesize
your suggestion and mine, into a version under the name
iov_skip_bytes().

>  * @iov:		IO vector
>  * @n:			Number of entries in @iov
>  * @global_offset:	Global offset of byte in IO vector we're looking for
>  * @entry_offset:	If not NULL, set on return: entry offset
>  *
>  * Return: index of IO vector entry for given byte offset, @n if not found
>  *
>  * Note: @entry_offset is guaranteed to be less than @iov[i].iov_len, where i is
>  * the return value
>  */
> 
> and tap_send_frames_passt() could (more) happily do:
> 
> 	i = iov_entry_index(iov, n, sent, &entry_offset);
> 	if (i < n) {
> 		/* A partial frame was sent */
> 		tap_send_remainder(c, &iov[i], entry_offset);
> 		i++;
> 	}
> 
> > +{
> > +	size_t i;
> > +
> > +	for (i = 0; i < n; i++) {
> > +	if (*offset < iov[i].iov_len)
> 
> Indentation.
> 
> > +			break;
> > +		*offset -= iov[i].iov_len;
> > +	}
> > +
> > +	return i;
> > +}
> > diff --git a/util.h b/util.h
> > index d2320f8c..62fad6fe 100644
> > --- a/util.h
> > +++ b/util.h
> > @@ -229,6 +229,7 @@ void write_pidfile(int fd, pid_t pid);
> >  int __daemon(int pidfile_fd, int devnull_fd);
> >  int fls(unsigned long x);
> >  int write_file(const char *path, const char *buf);
> > +size_t iov_offset(const struct iovec *iov, size_t n, size_t *offset);
> >  
> >  /**
> >   * mod_sub() - Modular arithmetic subtraction
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/6] util: Add write_remainder() helper
  2024-02-27 14:25   ` Stefano Brivio
@ 2024-02-28  0:44     ` David Gibson
  2024-02-28  6:24       ` Stefano Brivio
  0 siblings, 1 reply; 16+ messages in thread
From: David Gibson @ 2024-02-28  0:44 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, Laurent Vivier

[-- Attachment #1: Type: text/plain, Size: 4015 bytes --]

On Tue, Feb 27, 2024 at 03:25:51PM +0100, Stefano Brivio wrote:
> On Thu, 22 Feb 2024 16:55:59 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > We have several places where we want to write(2) a buffer or buffers and we
> > do (or should) handle sort write()s by retrying until everything is
> 
> After making sure that "handle sorting" doesn't exist (yet): is one
> between "handle" and "sort" redundant, or is there another meaning to
> this?

Oops.  s/sort/short/ is what makes this make sense.

> 
> > successfully written.  Add a helper for this in util.c.
> > 
> > This version has some differences from the typical write_all() function.
> > First, take an IO vector rather than a single buffer, because that will be
> > useful for some of our cases.  Second, allow it to take an parameter to
> > skip the first n bytes of the given buffers.  This will be usefl for some
> > of the cases we want, and also falls out quite naturally from the
> > implementation.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  util.c | 32 ++++++++++++++++++++++++++++++++
> >  util.h |  1 +
> >  2 files changed, 33 insertions(+)
> > 
> > diff --git a/util.c b/util.c
> > index fb6a0430..a475f2b5 100644
> > --- a/util.c
> > +++ b/util.c
> > @@ -19,6 +19,7 @@
> >  #include <arpa/inet.h>
> >  #include <net/ethernet.h>
> >  #include <sys/epoll.h>
> > +#include <sys/uio.h>
> >  #include <fcntl.h>
> >  #include <string.h>
> >  #include <time.h>
> > @@ -597,3 +598,34 @@ size_t iov_offset(const struct iovec *iov, size_t n, size_t *offset)
> >  
> >  	return i;
> >  }
> > +
> > +/* write_remainder() - write the tail of an IO vector to an fd
> > + * @fd:		File descriptor
> > + * @iov:	IO vector
> > + * @iovcnt:	Number of entries in @iov
> > + * @skip:	Number of bytes of the vector to skip writing
> > + *
> > + * Return: 0 on success, -1 on error (with errno set)
> > + *
> > + * #syscalls write writev
> > + */
> > +int write_remainder(int fd, const struct iovec *iov, int iovcnt, size_t skip)
> > +{
> > +	int i;
> > +
> > +	while ((i = iov_offset(iov, iovcnt, &skip)) < iovcnt) {
> 
> With my proposal for 1/6 this becomes:
> 
> 	while ((i = iov_entry_index(iov, iovcnt, skip, &offset)) < iovcnt) {

Hrm, the easiest conversion is to use (..., skip, &skip).  Using a new
variable means we'd need to feed that back into skip somehow on the
next loop.

> 
> 		if (offset)
> 			...
> 
> which I don't really find brilliant, but at least we don't do if (skip)
> where 'skip' used to be something completely different.

So.. the version I have now you might not like based on this comment,
because it still has 'skip' essentially become a local variable with a
different meaning from the one it has at entry.

> > +		ssize_t rc;
> > +
> > +		if (skip)
> 
> Curly brackets here for consistency (undecided about readability to be
> honest).

Uh.. consistency with what?  We don't typically brace single line
clauses in passt.

> 
> > +			rc = write(fd, (char *)iov[i].iov_base + skip,
> > +				   iov[i].iov_len - skip);
> > +		else
> > +			rc = writev(fd, &iov[i], iovcnt - i);
> > +
> > +		if (rc < 0)
> > +			return -1;
> > +		skip += rc;
> > +	}
> > +
> > +	return 0;
> > +}
> > diff --git a/util.h b/util.h
> > index 62fad6fe..ee380f55 100644
> > --- a/util.h
> > +++ b/util.h
> > @@ -230,6 +230,7 @@ int __daemon(int pidfile_fd, int devnull_fd);
> >  int fls(unsigned long x);
> >  int write_file(const char *path, const char *buf);
> >  size_t iov_offset(const struct iovec *iov, size_t n, size_t *offset);
> > +int write_remainder(int fd, const struct iovec *iov, int iovcnt, size_t skip);
> >  
> >  /**
> >   * mod_sub() - Modular arithmetic subtraction
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/6] util: Add write_remainder() helper
  2024-02-28  0:44     ` David Gibson
@ 2024-02-28  6:24       ` Stefano Brivio
  2024-02-28  9:04         ` David Gibson
  0 siblings, 1 reply; 16+ messages in thread
From: Stefano Brivio @ 2024-02-28  6:24 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev, Laurent Vivier

On Wed, 28 Feb 2024 11:44:28 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Tue, Feb 27, 2024 at 03:25:51PM +0100, Stefano Brivio wrote:
> > On Thu, 22 Feb 2024 16:55:59 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >
> > [...]
> >
> > > +
> > > +		if (skip)  
> > 
> > Curly brackets here for consistency (undecided about readability to be
> > honest).  
>
> Uh.. consistency with what?  We don't typically brace single line
> clauses in passt.
>
> > > +			rc = write(fd, (char *)iov[i].iov_base + skip,
> > > +				   iov[i].iov_len - skip);

These are two lines though. I've been trying to keep this consistent
with the Linux kernel's net/ and drivers/net/ style, where curly braces
are used for multiple lines, even if it's a single statement.

-- 
Stefano


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/6] util: Add write_remainder() helper
  2024-02-28  6:24       ` Stefano Brivio
@ 2024-02-28  9:04         ` David Gibson
  2024-02-28  9:22           ` Stefano Brivio
  0 siblings, 1 reply; 16+ messages in thread
From: David Gibson @ 2024-02-28  9:04 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, Laurent Vivier

[-- Attachment #1: Type: text/plain, Size: 1294 bytes --]

On Wed, Feb 28, 2024 at 07:24:10AM +0100, Stefano Brivio wrote:
> On Wed, 28 Feb 2024 11:44:28 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Tue, Feb 27, 2024 at 03:25:51PM +0100, Stefano Brivio wrote:
> > > On Thu, 22 Feb 2024 16:55:59 +1100
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > >
> > > [...]
> > >
> > > > +
> > > > +		if (skip)  
> > > 
> > > Curly brackets here for consistency (undecided about readability to be
> > > honest).  
> >
> > Uh.. consistency with what?  We don't typically brace single line
> > clauses in passt.
> >
> > > > +			rc = write(fd, (char *)iov[i].iov_base + skip,
> > > > +				   iov[i].iov_len - skip);
> 
> These are two lines though. I've been trying to keep this consistent
> with the Linux kernel's net/ and drivers/net/ style, where curly braces
> are used for multiple lines, even if it's a single statement.

Ah, fair enough.  It never occurred to me to consider physical lines
rather than logical lines.

Adjust and merge if you want, otherwise I'll respin with this change
tomorrow.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/6] util: Add write_remainder() helper
  2024-02-28  9:04         ` David Gibson
@ 2024-02-28  9:22           ` Stefano Brivio
  0 siblings, 0 replies; 16+ messages in thread
From: Stefano Brivio @ 2024-02-28  9:22 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev, Laurent Vivier

On Wed, 28 Feb 2024 20:04:55 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Wed, Feb 28, 2024 at 07:24:10AM +0100, Stefano Brivio wrote:
> > On Wed, 28 Feb 2024 11:44:28 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> > > On Tue, Feb 27, 2024 at 03:25:51PM +0100, Stefano Brivio wrote:  
> > > > On Thu, 22 Feb 2024 16:55:59 +1100
> > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > >
> > > > [...]
> > > >  
> > > > > +
> > > > > +		if (skip)    
> > > > 
> > > > Curly brackets here for consistency (undecided about readability to be
> > > > honest).    
> > >
> > > Uh.. consistency with what?  We don't typically brace single line
> > > clauses in passt.
> > >  
> > > > > +			rc = write(fd, (char *)iov[i].iov_base + skip,
> > > > > +				   iov[i].iov_len - skip);  
> > 
> > These are two lines though. I've been trying to keep this consistent
> > with the Linux kernel's net/ and drivers/net/ style, where curly braces
> > are used for multiple lines, even if it's a single statement.  
> 
> Ah, fair enough.  It never occurred to me to consider physical lines
> rather than logical lines.
> 
> Adjust and merge if you want, otherwise I'll respin with this change
> tomorrow.

Sure, let me adjust this on merge. I'm still reviewing the rest.

-- 
Stefano


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2024-02-28  9:23 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-22  5:55 [PATCH 0/6] Allow more use of iovecs in pcap and tap interfaces David Gibson
2024-02-22  5:55 ` [PATCH 1/6] util: Add helper to find offset into io vector David Gibson
2024-02-27 14:23   ` Stefano Brivio
2024-02-27 23:27     ` David Gibson
2024-02-22  5:55 ` [PATCH 2/6] pcap: Update pcap_frame() to take an iovec and offset David Gibson
2024-02-27 14:23   ` Stefano Brivio
2024-02-22  5:55 ` [PATCH 3/6] util: Add write_remainder() helper David Gibson
2024-02-27 14:25   ` Stefano Brivio
2024-02-28  0:44     ` David Gibson
2024-02-28  6:24       ` Stefano Brivio
2024-02-28  9:04         ` David Gibson
2024-02-28  9:22           ` Stefano Brivio
2024-02-22  5:56 ` [PATCH 4/6] pcap: Handle short writes in pcap_frame() David Gibson
2024-02-22  5:56 ` [PATCH 5/6] pcap: Allow pcap_frame() and pcap_multiple() to take multi-buffer frames David Gibson
2024-02-27 14:26   ` Stefano Brivio
2024-02-22  5:56 ` [PATCH 6/6] tap: Use write_remainder() in tap_send_frames_passt() David Gibson

Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).