[PATCH 00/11] Minor assorted fixes, mostly logging and tests

public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed

* [PATCH 00/11] Minor assorted fixes, mostly logging and tests
@ 2024-07-24 21:50 Stefano Brivio
  2024-07-24 21:50 ` [PATCH 01/11] tap: Don't quit if pasta gets EIO on writev() to tap, interface might be down Stefano Brivio
                   ` (11 more replies)
  0 siblings, 12 replies; 33+ messages in thread
From: Stefano Brivio @ 2024-07-24 21:50 UTC (permalink / raw)
  To: passt-dev

Here are a bunch of mostly unrelated assorted fixes for small issues
I found in the past few weeks but didn't really find time to report.

Stefano Brivio (11):
  tap: Don't quit if pasta gets EIO on writev() to tap, interface might
    be down
  tcp: Change SO_PEEK_OFF support message to debug()
  log: Drop newlines in the middle of the perror()-like messages
  log: Fix sub-second part in relative log time calculation
  log: Initialise timestamp for relative log time also if we use a log
    file
  test: Fix memory/passt tests, --netns-only is not a valid option for
    passt
  test: Update names of symbols and slabinfo entries
  test: iperf3 3.16 introduces multiple threads, drop our own
    implementation of that
  tap: Exit if we fail to bind a UNIX domain socket with explicit path
  tap: Discard guest data on length descriptor mismatch
  conf: Accept addresses enclosed by square brackets in port forwarding
    specifiers

 conf.c               | 24 ++++++++----
 flow.c               |  2 +-
 log.c                | 76 +++++++++++++++++++++-----------------
 log.h                | 19 +++++-----
 passt.c              |  2 +
 tap.c                | 18 ++++++---
 tcp.c                |  2 +-
 test/lib/perf_report |  2 +-
 test/lib/test        | 31 +++++-----------
 test/memory/passt    | 35 ++++++++----------
 test/passt.mem.mbuto |  9 +----
 test/perf/passt_tcp  | 39 ++++++++++----------
 test/perf/passt_udp  | 55 ++++++++++++++--------------
 test/perf/pasta_tcp  | 58 ++++++++++++++---------------
 test/perf/pasta_udp  | 87 ++++++++++++++++++++++----------------------
 15 files changed, 229 insertions(+), 230 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 01/11] tap: Don't quit if pasta gets EIO on writev() to tap, interface might be down
  2024-07-24 21:50 [PATCH 00/11] Minor assorted fixes, mostly logging and tests Stefano Brivio
@ 2024-07-24 21:50 ` Stefano Brivio
  2024-07-25  3:21   ` David Gibson
  2024-07-24 21:50 ` [PATCH 02/11] tcp: Change SO_PEEK_OFF support message to debug() Stefano Brivio
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Stefano Brivio @ 2024-07-24 21:50 UTC (permalink / raw)
  To: passt-dev

If we start pasta with some ports forwarded, but no --config-net, say:

  $ ./pasta -u 10001

and then use a local, non-loopback address to send traffic to that
port, say:

  $ socat -u FILE:test UDP4:192.0.2.1:10001

pasta writes to the tap file descriptor, but if the interface is down,
we get EIO and terminate.

By itself, what I'm doing in this case is not very useful (I simply
forgot to pass --config-net), but if we happen to have a DHCP client
in the network namespace, the interface might still be down while
somebody tries to send traffic to it, and exiting in that case is not
really helpful.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
 tap.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tap.c b/tap.c
index 32a7b09..6930ad8 100644
--- a/tap.c
+++ b/tap.c
@@ -324,6 +324,7 @@ static size_t tap_send_frames_pasta(const struct ctx *c,
 			case EINTR:
 			case ENOBUFS:
 			case ENOSPC:
+			case EIO:		/* interface down? */
 				break;
 			default:
 				die("Write error on tap device, exiting");
-- 
2.43.0


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 02/11] tcp: Change SO_PEEK_OFF support message to debug()
  2024-07-24 21:50 [PATCH 00/11] Minor assorted fixes, mostly logging and tests Stefano Brivio
  2024-07-24 21:50 ` [PATCH 01/11] tap: Don't quit if pasta gets EIO on writev() to tap, interface might be down Stefano Brivio
@ 2024-07-24 21:50 ` Stefano Brivio
  2024-07-25  3:22   ` David Gibson
  2024-07-24 21:50 ` [PATCH 03/11] log: Drop newlines in the middle of the perror()-like messages Stefano Brivio
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Stefano Brivio @ 2024-07-24 21:50 UTC (permalink / raw)
  To: passt-dev

This:

  $ ./pasta
  SO_PEEK_OFF not supported
  #

is a bit annoying, and might trick users who face other issues into
thinking that SO_PEEK_OFF not being supported on a given kernel is
an actual issue.

Even if SO_PEEK_OFF is supported by the kernel, that would be the
only message displayed there, with default options, which looks a bit
out of context.

Switch that to debug(): now that Podman users can pass --debug too, we
can find out quickly if it's supported or not, if SO_PEEK_OFF usage is
suspected of causing any issue.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
 tcp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tcp.c b/tcp.c
index c031f13..c0820ce 100644
--- a/tcp.c
+++ b/tcp.c
@@ -2524,7 +2524,7 @@ int tcp_init(struct ctx *c)
 
 	peek_offset_cap = (!c->ifi4 || tcp_probe_peek_offset_cap(AF_INET)) &&
 			  (!c->ifi6 || tcp_probe_peek_offset_cap(AF_INET6));
-	info("SO_PEEK_OFF%ssupported", peek_offset_cap ? " " : " not ");
+	debug("SO_PEEK_OFF%ssupported", peek_offset_cap ? " " : " not ");
 
 	return 0;
 }
-- 
2.43.0


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 03/11] log: Drop newlines in the middle of the perror()-like messages
  2024-07-24 21:50 [PATCH 00/11] Minor assorted fixes, mostly logging and tests Stefano Brivio
  2024-07-24 21:50 ` [PATCH 01/11] tap: Don't quit if pasta gets EIO on writev() to tap, interface might be down Stefano Brivio
  2024-07-24 21:50 ` [PATCH 02/11] tcp: Change SO_PEEK_OFF support message to debug() Stefano Brivio
@ 2024-07-24 21:50 ` Stefano Brivio
  2024-07-25  3:26   ` David Gibson
  2024-07-24 21:50 ` [PATCH 04/11] log: Fix sub-second part in relative log time calculation Stefano Brivio
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Stefano Brivio @ 2024-07-24 21:50 UTC (permalink / raw)
  To: passt-dev

Calling vlogmsg() twice from logmsg_perror() results in this beauty:

  $ ./pasta -i foo
  Invalid interface name foo
  : No such device

because the first part of the message, corresponding to the first
call, doesn't end with a newline, and vlogmsg() adds it.

Given that we can't easily append an argument (error description) to
a variadic list, add a 'newline' parameter to all the functions that
currently add a newline if missing, and disable that on the first call
to vlogmsg() from logmsg_perror(). Not very pretty but I can't think
of any solution that's less messy than this.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
 flow.c |  2 +-
 log.c  | 34 ++++++++++++++++++++++------------
 log.h  | 18 +++++++++---------
 3 files changed, 32 insertions(+), 22 deletions(-)

diff --git a/flow.c b/flow.c
index d7d548d..bd5fa2b 100644
--- a/flow.c
+++ b/flow.c
@@ -279,7 +279,7 @@ void flow_log_(const struct flow_common *f, int pri, const char *fmt, ...)
 	else
 		type_or_state = FLOW_TYPE(f);
 
-	logmsg(pri, "Flow %u (%s): %s", flow_idx(f), type_or_state, msg);
+	logmsg(true, pri, "Flow %u (%s): %s", flow_idx(f), type_or_state, msg);
 }
 
 /**
diff --git a/log.c b/log.c
index 9ed6bc1..54483e7 100644
--- a/log.c
+++ b/log.c
@@ -46,7 +46,14 @@ int		log_trace;		/* --trace mode enabled */
 bool		log_conf_parsed;	/* Logging options already parsed */
 bool		log_runtime;		/* Daemonised, or ready in foreground */
 
-void vlogmsg(int pri, const char *format, va_list ap)
+/**
+ * vlogmsg() - Print or send messages to log or output files as configured
+ * @newline:	Append newline at the end of the message, if missing
+ * @pri:	Facility and level map, same as priority for vsyslog()
+ * @format:	Message
+ * @ap:		Variable argument list
+ */
+void vlogmsg(bool newline, int pri, const char *format, va_list ap)
 {
 	bool debug_print = (log_mask & LOG_MASK(LOG_DEBUG)) && log_file == -1;
 	struct timespec tp;
@@ -63,9 +70,9 @@ void vlogmsg(int pri, const char *format, va_list ap)
 
 		va_copy(ap2, ap); /* Don't clobber ap, we need it again */
 		if (log_file != -1)
-			logfile_write(pri, format, ap2);
+			logfile_write(newline, pri, format, ap2);
 		else if (!(log_mask & LOG_MASK(LOG_DEBUG)))
-			passt_vsyslog(pri, format, ap2);
+			passt_vsyslog(newline, pri, format, ap2);
 
 		va_end(ap2);
 	}
@@ -73,22 +80,23 @@ void vlogmsg(int pri, const char *format, va_list ap)
 	if (debug_print || !log_conf_parsed ||
 	    (!log_runtime && (log_mask & LOG_MASK(LOG_PRI(pri))))) {
 		(void)vfprintf(stderr, format, ap);
-		if (format[strlen(format)] != '\n')
+		if (newline && format[strlen(format)] != '\n')
 			fprintf(stderr, "\n");
 	}
 }
 
 /**
  * logmsg() - vlogmsg() wrapper for variable argument lists
+ * @newline:	Append newline at the end of the message, if missing
  * @pri:	Facility and level map, same as priority for vsyslog()
  * @format:	Message
  */
-void logmsg(int pri, const char *format, ...)
+void logmsg(bool newline, int pri, const char *format, ...)
 {
 	va_list ap;
 
 	va_start(ap, format);
-	vlogmsg(pri, format, ap);
+	vlogmsg(newline, pri, format, ap);
 	va_end(ap);
 }
 
@@ -103,10 +111,10 @@ void logmsg_perror(int pri, const char *format, ...)
 	va_list ap;
 
 	va_start(ap, format);
-	vlogmsg(pri, format, ap);
+	vlogmsg(false, pri, format, ap);
 	va_end(ap);
 
-	logmsg(pri, ": %s", strerror(errno_copy));
+	logmsg(true, pri, ": %s", strerror(errno_copy));
 }
 
 /* Prefixes for log file messages, indexed by priority */
@@ -174,11 +182,12 @@ void __setlogmask(int mask)
 
 /**
  * passt_vsyslog() - vsyslog() implementation not using heap memory
+ * @newline:	Append newline at the end of the message, if missing
  * @pri:	Facility and level map, same as priority for vsyslog()
  * @format:	Same as vsyslog() format
  * @ap:		Same as vsyslog() ap
  */
-void passt_vsyslog(int pri, const char *format, va_list ap)
+void passt_vsyslog(bool newline, int pri, const char *format, va_list ap)
 {
 	char buf[BUFSIZ];
 	int n;
@@ -188,7 +197,7 @@ void passt_vsyslog(int pri, const char *format, va_list ap)
 
 	n += vsnprintf(buf + n, BUFSIZ - n, format, ap);
 
-	if (format[strlen(format)] != '\n')
+	if (newline && format[strlen(format)] != '\n')
 		n += snprintf(buf + n, BUFSIZ - n, "\n");
 
 	if (log_sock >= 0 && send(log_sock, buf, n, 0) != n && !log_runtime)
@@ -360,11 +369,12 @@ static int logfile_rotate(int fd, const struct timespec *now)
 
 /**
  * logfile_write() - Write entry to log file, trigger rotation if full
+ * @newline:	Append newline at the end of the message, if missing
  * @pri:	Facility and level map, same as priority for vsyslog()
  * @format:	Same as vsyslog() format
  * @ap:		Same as vsyslog() ap
  */
-void logfile_write(int pri, const char *format, va_list ap)
+void logfile_write(bool newline, int pri, const char *format, va_list ap)
 {
 	struct timespec now;
 	char buf[BUFSIZ];
@@ -380,7 +390,7 @@ void logfile_write(int pri, const char *format, va_list ap)
 
 	n += vsnprintf(buf + n, BUFSIZ - n, format, ap);
 
-	if (format[strlen(format)] != '\n')
+	if (newline && format[strlen(format)] != '\n')
 		n += snprintf(buf + n, BUFSIZ - n, "\n");
 
 	if ((log_written + n >= log_size) && logfile_rotate(log_file, &now))
diff --git a/log.h b/log.h
index 6f34e8d..51ddafa 100644
--- a/log.h
+++ b/log.h
@@ -13,16 +13,16 @@
 #define LOGFILE_CUT_RATIO		30	/* When full, cut ~30% size */
 #define LOGFILE_SIZE_MIN		(5UL * MAX(BUFSIZ, PAGE_SIZE))
 
-void vlogmsg(int pri, const char *format, va_list ap);
-void logmsg(int pri, const char *format, ...)
-	__attribute__((format(printf, 2, 3)));
+void vlogmsg(bool newline, int pri, const char *format, va_list ap);
+void logmsg(bool newline, int pri, const char *format, ...)
+	__attribute__((format(printf, 3, 4)));
 void logmsg_perror(int pri, const char *format, ...)
 	__attribute__((format(printf, 2, 3)));
 
-#define err(...)		logmsg(		LOG_ERR,	__VA_ARGS__)
-#define warn(...)		logmsg(		LOG_WARNING,	__VA_ARGS__)
-#define info(...)		logmsg(		LOG_INFO,	__VA_ARGS__)
-#define debug(...)		logmsg(		LOG_DEBUG,	__VA_ARGS__)
+#define err(...)		logmsg(true,	LOG_ERR,	__VA_ARGS__)
+#define warn(...)		logmsg(true,	LOG_WARNING,	__VA_ARGS__)
+#define info(...)		logmsg(true,	LOG_INFO,	__VA_ARGS__)
+#define debug(...)		logmsg(true,	LOG_DEBUG,	__VA_ARGS__)
 
 #define err_perror(...)		logmsg_perror(	LOG_ERR,	__VA_ARGS__)
 #define warn_perror(...)	logmsg_perror(	LOG_WARNING,	__VA_ARGS__)
@@ -54,8 +54,8 @@ void trace_init(int enable);
 
 void __openlog(const char *ident, int option, int facility);
 void logfile_init(const char *name, const char *path, size_t size);
-void passt_vsyslog(int pri, const char *format, va_list ap);
-void logfile_write(int pri, const char *format, va_list ap);
+void passt_vsyslog(bool newline, int pri, const char *format, va_list ap);
+void logfile_write(bool newline, int pri, const char *format, va_list ap);
 void __setlogmask(int mask);
 
 #endif /* LOG_H */
-- 
2.43.0


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 04/11] log: Fix sub-second part in relative log time calculation
  2024-07-24 21:50 [PATCH 00/11] Minor assorted fixes, mostly logging and tests Stefano Brivio
                   ` (2 preceding siblings ...)
  2024-07-24 21:50 ` [PATCH 03/11] log: Drop newlines in the middle of the perror()-like messages Stefano Brivio
@ 2024-07-24 21:50 ` Stefano Brivio
  2024-07-25  3:32   ` David Gibson
  2024-07-24 21:50 ` [PATCH 05/11] log: Initialise timestamp for relative log time also if we use a log file Stefano Brivio
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Stefano Brivio @ 2024-07-24 21:50 UTC (permalink / raw)
  To: passt-dev

For some reason, in commit 01efc71ddd25 ("log, conf: Add support for
logging to file"), I added calculations for relative logging
timestamps using the difference for the seconds part only, not for
accounting for the fractional part.

Fix that by storing the initial timestamp, log_start, as a timespec
struct, and by calculating differences of both seconds and nanoseconds
from the starting time. Do this in a macro as we need the same
calculation and format in a few places.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
 log.c | 42 ++++++++++++++++++++++--------------------
 1 file changed, 22 insertions(+), 20 deletions(-)

diff --git a/log.c b/log.c
index 54483e7..f57a54f 100644
--- a/log.c
+++ b/log.c
@@ -40,12 +40,21 @@ static size_t	log_written;		/* Currently used bytes in log file */
 static size_t	log_cut_size;		/* Bytes to cut at start on rotation */
 static char	log_header[BUFSIZ];	/* File header, written back on cuts */
 
-static time_t	log_start;		/* Start timestamp */
+static struct timespec log_start;	/* Start timestamp */
 
 int		log_trace;		/* --trace mode enabled */
 bool		log_conf_parsed;	/* Logging options already parsed */
 bool		log_runtime;		/* Daemonised, or ready in foreground */
 
+/**
+ * logtime_fmt_and_arg() - Build format and arguments to print relative log time
+ * @x:		Current timestamp
+ */
+#define logtime_fmt_and_arg(x)						\
+	"%lli.%04lli",							\
+	((long long int)(x)->tv_sec - log_start.tv_sec),		\
+	(((long long int)(x)->tv_nsec - log_start.tv_nsec) / (100L * 1000))
+
 /**
  * vlogmsg() - Print or send messages to log or output files as configured
  * @newline:	Append newline at the end of the message, if missing
@@ -60,9 +69,8 @@ void vlogmsg(bool newline, int pri, const char *format, va_list ap)
 
 	if (debug_print) {
 		clock_gettime(CLOCK_REALTIME, &tp);
-		fprintf(stderr, "%lli.%04lli: ",
-			(long long int)tp.tv_sec - log_start,
-			(long long int)tp.tv_nsec / (100L * 1000));
+		fprintf(stderr, logtime_fmt_and_arg(&tp));
+		fprintf(stderr, ": ");
 	}
 
 	if ((log_mask & LOG_MASK(LOG_PRI(pri))) || !log_conf_parsed) {
@@ -144,12 +152,9 @@ void trace_init(int enable)
  */
 void __openlog(const char *ident, int option, int facility)
 {
-	struct timespec tp;
-
 	(void)option;
 
-	clock_gettime(CLOCK_REALTIME, &tp);
-	log_start = tp.tv_sec;
+	clock_gettime(CLOCK_REALTIME, &log_start);
 
 	if (log_sock < 0) {
 		struct sockaddr_un a = { .sun_family = AF_UNIX, };
@@ -255,10 +260,8 @@ static void logfile_rotate_fallocate(int fd, const struct timespec *now)
 	if (read(fd, buf, BUFSIZ) == -1)
 		return;
 
-	n = snprintf(buf, BUFSIZ,
-		     "%s - log truncated at %lli.%04lli", log_header,
-		     (long long int)(now->tv_sec - log_start),
-		     (long long int)(now->tv_nsec / (100L * 1000)));
+	n =  snprintf(buf, BUFSIZ, "%s - log truncated at ", log_header);
+	n += snprintf(buf + n, BUFSIZ - n, logtime_fmt_and_arg(now));
 
 	/* Avoid partial lines by padding the header with spaces */
 	nl = memchr(buf + n + 1, '\n', BUFSIZ - n - 1);
@@ -288,10 +291,11 @@ static void logfile_rotate_move(int fd, const struct timespec *now)
 	char buf[BUFSIZ];
 	const char *nl;
 
-	header_len = snprintf(buf, BUFSIZ,
-			      "%s - log truncated at %lli.%04lli\n", log_header,
-			      (long long int)(now->tv_sec - log_start),
-			      (long long int)(now->tv_nsec / (100L * 1000)));
+	header_len =  snprintf(buf, BUFSIZ, "%s - log truncated at ",
+			       log_header);
+	header_len += snprintf(buf + header_len, BUFSIZ - header_len,
+			       logtime_fmt_and_arg(now));
+
 	if (lseek(fd, 0, SEEK_SET) == -1)
 		return;
 	if (write(fd, buf, header_len) == -1)
@@ -383,10 +387,8 @@ void logfile_write(bool newline, int pri, const char *format, va_list ap)
 	if (clock_gettime(CLOCK_REALTIME, &now))
 		return;
 
-	n = snprintf(buf, BUFSIZ, "%lli.%04lli: %s",
-		     (long long int)(now.tv_sec - log_start),
-		     (long long int)(now.tv_nsec / (100L * 1000)),
-		     logfile_prefix[pri]);
+	n  = snprintf(buf, BUFSIZ, logtime_fmt_and_arg(&now));
+	n += snprintf(buf + n, BUFSIZ - n, ": %s", logfile_prefix[pri]);
 
 	n += vsnprintf(buf + n, BUFSIZ - n, format, ap);
 
-- 
2.43.0


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 05/11] log: Initialise timestamp for relative log time also if we use a log file
  2024-07-24 21:50 [PATCH 00/11] Minor assorted fixes, mostly logging and tests Stefano Brivio
                   ` (3 preceding siblings ...)
  2024-07-24 21:50 ` [PATCH 04/11] log: Fix sub-second part in relative log time calculation Stefano Brivio
@ 2024-07-24 21:50 ` Stefano Brivio
  2024-07-25  3:35   ` David Gibson
  2024-07-24 21:50 ` [PATCH 06/11] test: Fix memory/passt tests, --netns-only is not a valid option for passt Stefano Brivio
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Stefano Brivio @ 2024-07-24 21:50 UTC (permalink / raw)
  To: passt-dev

...not just for debug messages. Otherwise, timestamps in the log file
are consistent but the starting point is not zero.

Do this right away as we enter main(), so that the resulting
timestamps are as closely as possible relative to when we start.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
 log.c   | 4 +---
 log.h   | 1 +
 passt.c | 2 ++
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/log.c b/log.c
index f57a54f..5f24c16 100644
--- a/log.c
+++ b/log.c
@@ -40,7 +40,7 @@ static size_t	log_written;		/* Currently used bytes in log file */
 static size_t	log_cut_size;		/* Bytes to cut at start on rotation */
 static char	log_header[BUFSIZ];	/* File header, written back on cuts */
 
-static struct timespec log_start;	/* Start timestamp */
+struct timespec	log_start;		/* Start timestamp */
 
 int		log_trace;		/* --trace mode enabled */
 bool		log_conf_parsed;	/* Logging options already parsed */
@@ -154,8 +154,6 @@ void __openlog(const char *ident, int option, int facility)
 {
 	(void)option;
 
-	clock_gettime(CLOCK_REALTIME, &log_start);
-
 	if (log_sock < 0) {
 		struct sockaddr_un a = { .sun_family = AF_UNIX, };
 
diff --git a/log.h b/log.h
index 51ddafa..e03199c 100644
--- a/log.h
+++ b/log.h
@@ -44,6 +44,7 @@ void logmsg_perror(int pri, const char *format, ...)
 extern int log_trace;
 extern bool log_conf_parsed;
 extern bool log_runtime;
+extern struct timespec log_start;
 
 void trace_init(int enable);
 #define trace(...)							\
diff --git a/passt.c b/passt.c
index eed74ec..72ad704 100644
--- a/passt.c
+++ b/passt.c
@@ -207,6 +207,8 @@ int main(int argc, char **argv)
 	struct timespec now;
 	struct sigaction sa;
 
+	clock_gettime(CLOCK_REALTIME, &log_start);
+
 	arch_avx2_exec(argv);
 
 	isolate_initial();
-- 
2.43.0


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 06/11] test: Fix memory/passt tests, --netns-only is not a valid option for passt
  2024-07-24 21:50 [PATCH 00/11] Minor assorted fixes, mostly logging and tests Stefano Brivio
                   ` (4 preceding siblings ...)
  2024-07-24 21:50 ` [PATCH 05/11] log: Initialise timestamp for relative log time also if we use a log file Stefano Brivio
@ 2024-07-24 21:50 ` Stefano Brivio
  2024-07-25  3:50   ` David Gibson
  2024-07-24 21:50 ` [PATCH 07/11] test: Update names of symbols and slabinfo entries Stefano Brivio
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Stefano Brivio @ 2024-07-24 21:50 UTC (permalink / raw)
  To: passt-dev

This used to work on my setup as I kept reusing an old mbuto
(initramfs) image, but since commit 65923ba79877 ("conf: Accept
duplicate and conflicting options, the last one wins"), --netns-only
is, as originally intended, a pasta-only option.

I had used --netns-only, here, to prevent passt from trying to detach
its own user namespace, which is not permitted as we're in a chroot,
see unshare(2). In turn, we need the chroot because passt can't pivot
root directly into its own empty filesystem using an initramfs.

Use switch_root into the tmpfs mountpoint instead of chroot, so that
we can still detach user namespaces.

Note that in the mbuto images, we can't switch to nobody as we have
no password entries at all, so we need to detach a further user
namespace before starting passt, to trick passt into running as UID
0.

Given the new sequence, it's now more convenient to directly switch
to a detached network namespace as well, which means we need to move
the initialisation of the dummy network from the init script into the
test script.

Reported-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
 test/memory/passt    | 13 ++++++++++---
 test/passt.mem.mbuto |  9 +--------
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/test/memory/passt b/test/memory/passt
index 1193af8..bf78c8f 100644
--- a/test/memory/passt
+++ b/test/memory/passt
@@ -44,7 +44,7 @@ endef
 def	start_stop_diff
 guest	sed /proc/slabinfo -ne 's/^\([^ ]* *[^ ]* *[^ ]* *[^ ]*\).*/\\\1/p' > /tmp/slabinfo.before
 guest	cat /proc/meminfo > /tmp/meminfo.before
-guest	/bin/passt.avx2 -l /tmp/log -s /tmp/sock -P /tmp/pid __OPTS__ --netns-only
+guest	/bin/passt.avx2 -l /tmp/log -s /tmp/sock -P /tmp/pid __OPTS__
 sleep	2
 guest	cat /proc/meminfo > /tmp/meminfo.after
 guest	sed /proc/slabinfo -ne 's/^\([^ ]* *[^ ]* *[^ ]* *[^ ]*\).*/\\\1/p' > /tmp/slabinfo.after
@@ -78,9 +78,16 @@ guest	mount -o bind /proc /test/proc
 guest	mount -o bind /dev /test/dev
 guest	cp -Lr /bin /lib /lib64 /usr /sbin /test/
 
+guest	exec switch_root /test /bin/sh
+
 guest	ulimit -Hn 300000
-guest	unshare -rUm -R /test
-guest	chroot .
+guest	unshare -rUn
+guest	ip link add eth0 type dummy
+guest	ip link set eth0 up
+guest	ip address add 192.0.2.2/24 dev eth0
+guest	ip address add 2001:db8::2/64 dev eth0
+guest	ip route add default via 192.0.2.1
+guest	ip -6 route add default via 2001:db8::1 dev eth0
 
 guest	meminfo_size() { grep "^$2:" $1 | tr -s ' ' | cut -f2 -d ' '; }
 guest	meminfo_diff() { echo $(( $(meminfo_size $2 $3) - $(meminfo_size $1 $3) )); }
diff --git a/test/passt.mem.mbuto b/test/passt.mem.mbuto
index 56f5139..532eae0 100755
--- a/test/passt.mem.mbuto
+++ b/test/passt.mem.mbuto
@@ -12,7 +12,7 @@
 
 PROGS="${PROGS:-ash,dash,bash chmod ip mount insmod mkdir ln cat chmod modprobe
        grep mknod sed chown sleep bc ls ps mount unshare chroot cp kill diff
-       head tail sort tr tee cut nm which}"
+       head tail sort tr tee cut nm which switch_root}"
 
 KMODS="${KMODS:- dummy}"
 
@@ -29,13 +29,6 @@ COPIES="${COPIES} ../passt.avx2,/bin/passt.avx2"
 FIXUP="${FIXUP}"'
 ln -s /bin /usr/bin
 chmod 777 /tmp
-ip link add eth0 type dummy
-ip link set eth0 up
-ip address add 192.0.2.2/24 dev eth0
-ip address add 2001:db8::2/64 dev eth0
-ip route add default via 192.0.2.1
-ip -6 route add default via 2001:db8::1 dev eth0
-sleep 2
 sh +m
 '
 
-- 
2.43.0


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 07/11] test: Update names of symbols and slabinfo entries
  2024-07-24 21:50 [PATCH 00/11] Minor assorted fixes, mostly logging and tests Stefano Brivio
                   ` (5 preceding siblings ...)
  2024-07-24 21:50 ` [PATCH 06/11] test: Fix memory/passt tests, --netns-only is not a valid option for passt Stefano Brivio
@ 2024-07-24 21:50 ` Stefano Brivio
  2024-07-25  3:54   ` David Gibson
  2024-07-24 21:50 ` [PATCH 08/11] test: iperf3 3.16 introduces multiple threads, drop our own implementation of that Stefano Brivio
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Stefano Brivio @ 2024-07-24 21:50 UTC (permalink / raw)
  To: passt-dev

Differences in allocated Acpi-Parse entries are gone (at least) since
the 6.1 Linux kernel series. I should run this on a 6.10 kernel,
eventually, and adjust things further, as needed.

Userspace symbols are also fairly different now: show whatever is more
than 1 MiB at the moment.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
 test/memory/passt | 22 +++++-----------------
 1 file changed, 5 insertions(+), 17 deletions(-)

diff --git a/test/memory/passt b/test/memory/passt
index bf78c8f..7e45724 100644
--- a/test/memory/passt
+++ b/test/memory/passt
@@ -110,27 +110,17 @@ info
 th	symbol MiB
 set	WHAT tcp_buf_discard
 nm_row
-set	WHAT tcp6_l2_buf
+set	WHAT flowtab
 nm_row
-set	WHAT tcp4_l2_buf
+set	WHAT tcp6_payload
 nm_row
-set	WHAT tc
+set	WHAT tcp4_payload
 nm_row
 set	WHAT pkt_buf
 nm_row
-set	WHAT udp_splice_map
+set	WHAT udp_payload
 nm_row
-set	WHAT udp6_l2_buf
-nm_row
-set	WHAT udp4_l2_buf
-nm_row
-set	WHAT udp_tap_map
-nm_row
-set	WHAT icmp_id_map
-nm_row
-set	WHAT udp_splice_buf
-nm_row
-set	WHAT tc_hash
+set	WHAT flow_hashtab
 nm_row
 set	WHAT pool_tap6_storage
 nm_row
@@ -149,8 +139,6 @@ set	WHAT pid
 slab_row
 set	WHAT dentry
 slab_row
-set	WHAT Acpi-Parse
-slab_row
 set	WHAT kmalloc-64
 slab_row
 set	WHAT kmalloc-32
-- 
2.43.0


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 08/11] test: iperf3 3.16 introduces multiple threads, drop our own implementation of that
  2024-07-24 21:50 [PATCH 00/11] Minor assorted fixes, mostly logging and tests Stefano Brivio
                   ` (6 preceding siblings ...)
  2024-07-24 21:50 ` [PATCH 07/11] test: Update names of symbols and slabinfo entries Stefano Brivio
@ 2024-07-24 21:50 ` Stefano Brivio
  2024-07-25  4:23   ` David Gibson
  2024-07-24 21:50 ` [PATCH 09/11] tap: Exit if we fail to bind a UNIX domain socket with explicit path Stefano Brivio
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Stefano Brivio @ 2024-07-24 21:50 UTC (permalink / raw)
  To: passt-dev

Starting from iperf3 version 3.16, -P / --parallel spawns multiple
clients as separate threads, instead of multiple streams serviced by
the same thread.

So we can drop our lib/test implementation to spawn several iperf3
client and server processes and finally simplify things quite a bit.

Adjust number of threads and UDP sending bandwidth to values that seem
to be more or less matching previous throughput tests on my setup.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
 test/lib/perf_report |  2 +-
 test/lib/test        | 31 +++++-----------
 test/perf/passt_tcp  | 39 ++++++++++----------
 test/perf/passt_udp  | 55 ++++++++++++++--------------
 test/perf/pasta_tcp  | 58 ++++++++++++++---------------
 test/perf/pasta_udp  | 87 ++++++++++++++++++++++----------------------
 6 files changed, 127 insertions(+), 145 deletions(-)

diff --git a/test/lib/perf_report b/test/lib/perf_report
index 67f9f4e..4d79fa2 100755
--- a/test/lib/perf_report
+++ b/test/lib/perf_report
@@ -18,7 +18,7 @@ PERF_LINK_COUNT=0
 PERF_JS="${LOGDIR}/web/perf.js"
 
 PERF_TEMPLATE_HTML="document.write('"'
-Throughput in Gbps, latency in µs. Threads are <span style="font-family: monospace;">iperf3</span> processes, <i>passt</i> and <i>pasta</i> are currently single-threaded.<br/>
+Throughput in Gbps, latency in µs. Threads are <span style="font-family: monospace;">iperf3</span> threads, <i>passt</i> and <i>pasta</i> are currently single-threaded.<br/>
 Click on numbers to show test execution. Measured at head, commit <span style="font-family: monospace;">__commit__</span>.
 
 <style type="text/CSS">
diff --git a/test/lib/test b/test/lib/test
index 1d571c3..c525f8e 100755
--- a/test/lib/test
+++ b/test/lib/test
@@ -15,18 +15,13 @@
 
 # test_iperf3s() - Start iperf3 server
 # $1:	Destination/server context
-# $2:	Port number, ${i} is translated to process index
-# $3:	Number of processes to run in parallel
+# $2:	Port number
 test_iperf3s() {
 	__sctx="${1}"
 	__port="${2}"
-	__procs="$((${3} - 1))"
 
 	pane_or_context_run_bg "${__sctx}" 				\
-		 'for i in $(seq 0 '${__procs}'); do'			\
-		 '	iperf3 -s -p'${__port}' &'			\
-		 '	echo $! > s${i}.pid; '				\
-		 'done'							\
+		 'iperf3 -s -p'${__port}' & echo $! > s.pid'		\
 
 	sleep 1		# Wait for server to be ready
 }
@@ -36,7 +31,7 @@ test_iperf3s() {
 test_iperf3k() {
 	__sctx="${1}"
 
-	pane_or_context_run "${__sctx}" 'kill -INT $(cat s*.pid); rm s*.pid'
+	pane_or_context_run "${__sctx}" 'kill -INT $(cat s.pid); rm s.pid'
 
 	sleep 3		# Wait for kernel to free up ports
 }
@@ -46,37 +41,29 @@ test_iperf3k() {
 # $2:	Source/client context
 # $3:	Destination name or address for client
 # $4:	Port number, ${i} is translated to process index
-# $5:	Number of processes to run in parallel
-# $6:	Run time, in seconds
+# $5:	Run time, in seconds
 # $@:	Client options
 test_iperf3() {
 	__var="${1}"; shift
 	__cctx="${1}"; shift
 	__dest="${1}"; shift
 	__port="${1}"; shift
-	__procs="$((${1} - 1))"; shift
 	__time="${1}"; shift
 
-	pane_or_context_run "${__cctx}" 'rm -f c*.json'
+	pane_or_context_run "${__cctx}" 'rm -f c.json'
 
         # A 1s wait for connection on what's basically a local link
         # indicates something is pretty wrong
         __timeout=1000
 	pane_or_context_run "${__cctx}" 				\
-		 '('							\
-		 '	for i in $(seq 0 '${__procs}'); do'		\
-		 '		iperf3 -J -c '${__dest}' -p '${__port}	\
-		 '		 --connect-timeout '${__timeout}	\
-		 '		 -t'${__time}' -i0 -T c${i} '"${@}"	\
-                 ' 		> c${i}.json &'				\
-		 '	done;'						\
-		 '	wait'						\
-		 ')'
+		 'iperf3 -J -c '${__dest}' -p '${__port}		\
+		 '	 --connect-timeout '${__timeout}		\
+		 '	 -t'${__time}' -i0 '"${@}"' > c.json'		\
 
 	__jval=".end.sum_received.bits_per_second"
 
 	__bw=$(pane_or_context_output "${__cctx}"			\
-		 'cat c*.json | jq -rMs "map('${__jval}') | add"')
+		 'cat c.json | jq -rMs "map('${__jval}') | add"')
 
 	TEST_ONE_subs="$(list_add_pair "${TEST_ONE_subs}" "__${__var}__" "${__bw}" )"
 }
diff --git a/test/perf/passt_tcp b/test/perf/passt_tcp
index 631a407..14343cb 100644
--- a/test/perf/passt_tcp
+++ b/test/perf/passt_tcp
@@ -37,34 +37,33 @@ hout	FREQ_PROCFS (echo "scale=1"; sed -n 's/cpu MHz.*: \([0-9]*\)\..*$/(\1+10^2\
 hout	FREQ_CPUFREQ (echo "scale=1"; printf '( %i + 10^5 / 2 ) / 10^6\n' $(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq) ) | bc -l
 hout	FREQ [ -n "__FREQ_CPUFREQ__" ] && echo __FREQ_CPUFREQ__ || echo __FREQ_PROCFS__
 
-set	THREADS 1
-set	STREAMS 8
+set	THREADS 4
 set	TIME 10
 set	OMIT 0.1
-set	OPTS -Z -P __STREAMS__ -l 1M -O__OMIT__
+set	OPTS -Z -P __THREADS__ -l 1M -O__OMIT__
 
-info	Throughput in Gbps, latency in µs, one thread at __FREQ__ GHz, __STREAMS__ streams
+info	Throughput in Gbps, latency in µs, __THREADS__ threads at __FREQ__ GHz
 report	passt tcp __THREADS__ __FREQ__
 
 th	MTU 256B 576B 1280B 1500B 9000B 65520B
 
 
 tr	TCP throughput over IPv6: guest to host
-iperf3s	ns 100${i}2 __THREADS__
+iperf3s	ns 10002
 
 bw	-
 bw	-
 guest	ip link set dev __IFNAME__ mtu 1280
-iperf3	BW guest __GW6__%__IFNAME__ 100${i}2 __THREADS__ __TIME__ __OPTS__ -w 4M
+iperf3	BW guest __GW6__%__IFNAME__ 10002 __TIME__ __OPTS__ -w 4M
 bw	__BW__ 1.2 1.5
 guest	ip link set dev __IFNAME__ mtu 1500
-iperf3	BW guest __GW6__%__IFNAME__ 100${i}2 __THREADS__ __TIME__ __OPTS__ -w 4M
+iperf3	BW guest __GW6__%__IFNAME__ 10002 __TIME__ __OPTS__ -w 4M
 bw	__BW__ 1.6 1.8
 guest	ip link set dev __IFNAME__ mtu 9000
-iperf3	BW guest __GW6__%__IFNAME__ 100${i}2 __THREADS__ __TIME__ __OPTS__ -w 8M
+iperf3	BW guest __GW6__%__IFNAME__ 10002 __TIME__ __OPTS__ -w 8M
 bw	__BW__ 4.0 5.0
 guest	ip link set dev __IFNAME__ mtu 65520
-iperf3	BW guest __GW6__%__IFNAME__ 100${i}2 __THREADS__ __TIME__ __OPTS__ -w 16M
+iperf3	BW guest __GW6__%__IFNAME__ 10002 __TIME__ __OPTS__ -w 16M
 bw	__BW__ 7.0 8.0
 
 iperf3k	ns
@@ -90,25 +89,25 @@ gout	LAT tcp_crr --nolog -6 -c -H __GW6__%__IFNAME__ | sed -n 's/^throughput=\(.
 lat	__LAT__ 500 400
 
 tr	TCP throughput over IPv4: guest to host
-iperf3s	ns 100${i}2 __THREADS__
+iperf3s	ns 10002
 
 guest	ip link set dev __IFNAME__ mtu 256
-iperf3	BW guest __GW__ 100${i}2 __THREADS__ __TIME__ __OPTS__ -w 1M
+iperf3	BW guest __GW__ 10002 __TIME__ __OPTS__ -w 1M
 bw	__BW__ 0.2 0.3
 guest	ip link set dev __IFNAME__ mtu 576
-iperf3	BW guest __GW__ 100${i}2 __THREADS__ __TIME__ __OPTS__ -w 1M
+iperf3	BW guest __GW__ 10002 __TIME__ __OPTS__ -w 1M
 bw	__BW__ 0.5 0.8
 guest	ip link set dev __IFNAME__ mtu 1280
-iperf3	BW guest __GW__ 100${i}2 __THREADS__ __TIME__ __OPTS__ -w 4M
+iperf3	BW guest __GW__ 10002 __TIME__ __OPTS__ -w 4M
 bw	__BW__ 1.2 1.5
 guest	ip link set dev __IFNAME__ mtu 1500
-iperf3	BW guest __GW__ 100${i}2 __THREADS__ __TIME__ __OPTS__ -w 4M
+iperf3	BW guest __GW__ 10002 __TIME__ __OPTS__ -w 4M
 bw	__BW__ 1.6 1.8
 guest	ip link set dev __IFNAME__ mtu 9000
-iperf3	BW guest __GW__ 100${i}2 __THREADS__ __TIME__ __OPTS__ -w 8M
+iperf3	BW guest __GW__ 10002 __TIME__ __OPTS__ -w 8M
 bw	__BW__ 4.0 5.0
 guest	ip link set dev __IFNAME__ mtu 65520
-iperf3	BW guest __GW__ 100${i}2 __THREADS__ __TIME__ __OPTS__ -w 16M
+iperf3	BW guest __GW__ 10002 __TIME__ __OPTS__ -w 16M
 bw	__BW__ 7.0 8.0
 
 iperf3k	ns
@@ -134,14 +133,14 @@ gout	LAT tcp_crr --nolog -4 -c -H __GW__ | sed -n 's/^throughput=\(.*\)/\1/p'
 lat	__LAT__ 500 400
 
 tr	TCP throughput over IPv6: host to guest
-iperf3s	guest 100${i}1 __THREADS__
+iperf3s	guest 10001
 
 bw	-
 bw	-
 bw	-
 bw	-
 bw	-
-iperf3	BW ns ::1 100${i}1 __THREADS__ __TIME__ __OPTS__
+iperf3	BW ns ::1 10001 __TIME__ __OPTS__
 bw	__BW__ 6.0 6.8
 
 iperf3k	guest
@@ -170,14 +169,14 @@ lat	__LAT__ 500 350
 
 
 tr	TCP throughput over IPv4: host to guest
-iperf3s	guest 100${i}1 __THREADS__
+iperf3s	guest 10001
 
 bw	-
 bw	-
 bw	-
 bw	-
 bw	-
-iperf3	BW ns 127.0.0.1 100${i}1 __THREADS__ __TIME__ __OPTS__
+iperf3	BW ns 127.0.0.1 10001 __TIME__ __OPTS__
 bw	__BW__ 6.0 6.8
 
 iperf3k	guest
diff --git a/test/perf/passt_udp b/test/perf/passt_udp
index 10f638f..8919280 100644
--- a/test/perf/passt_udp
+++ b/test/perf/passt_udp
@@ -30,30 +30,29 @@ hout	FREQ_PROCFS (echo "scale=1"; sed -n 's/cpu MHz.*: \([0-9]*\)\..*$/(\1+10^2\
 hout	FREQ_CPUFREQ (echo "scale=1"; printf '( %i + 10^5 / 2 ) / 10^6\n' $(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq) ) | bc -l
 hout	FREQ [ -n "__FREQ_CPUFREQ__" ] && echo __FREQ_CPUFREQ__ || echo __FREQ_PROCFS__
 
-set	THREADS 4
-set	STREAMS 1
+set	THREADS 2
 set	TIME 10
-set	OPTS -u -P __STREAMS__ --pacing-timer 1000
+set	OPTS -u -P __THREADS__ --pacing-timer 1000
 
-info	Throughput in Gbps, latency in µs, __THREADS__ threads at __FREQ__ GHz, one stream each
+info	Throughput in Gbps, latency in µs, __THREADS__ threads at __FREQ__ GHz
 
 report	passt udp __THREADS__ __FREQ__
 
 th	pktlen 256B 576B 1280B 1500B 9000B 65520B
 
 tr	UDP throughput over IPv6: guest to host
-iperf3s	ns 100${i}2 __THREADS__
+iperf3s	ns 10002
 # (datagram size) = (packet size) - 48: 40 bytes of IPv6 header, 8 of UDP header
 
 bw	-
 bw	-
-iperf3	BW guest __GW6__%__IFNAME__ 100${i}2 __THREADS__ __TIME__ __OPTS__ -b 2G -l 1232
+iperf3	BW guest __GW6__%__IFNAME__ 10002 __TIME__ __OPTS__ -b 3G -l 1232
 bw	__BW__ 0.8 1.2
-iperf3	BW guest __GW6__%__IFNAME__ 100${i}2 __THREADS__ __TIME__ __OPTS__ -b 3G -l 1452
+iperf3	BW guest __GW6__%__IFNAME__ 10002 __TIME__ __OPTS__ -b 4G -l 1452
 bw	__BW__ 1.0 1.5
-iperf3	BW guest __GW6__%__IFNAME__ 100${i}2 __THREADS__ __TIME__ __OPTS__ -b 5G -l 8952
+iperf3	BW guest __GW6__%__IFNAME__ 10002 __TIME__ __OPTS__ -b 8G -l 8952
 bw	__BW__ 4.0 5.0
-iperf3	BW guest __GW6__%__IFNAME__ 100${i}2 __THREADS__ __TIME__ __OPTS__ -b 7G -l 64372
+iperf3	BW guest __GW6__%__IFNAME__ 10002 __TIME__ __OPTS__ -b 15G -l 64372
 bw	__BW__ 4.0 5.0
 
 iperf3k	ns
@@ -70,20 +69,20 @@ lat	__LAT__ 200 150
 
 
 tr	UDP throughput over IPv4: guest to host
-iperf3s	ns 100${i}2 __THREADS__
+iperf3s	ns 10002
 # (datagram size) = (packet size) - 28: 20 bytes of IPv4 header, 8 of UDP header
 
-iperf3	BW guest __GW__ 100${i}2 __THREADS__ __TIME__ __OPTS__ -b 500M -l 228
+iperf3	BW guest __GW__ 10002 __TIME__ __OPTS__ -b 1G -l 228
 bw	__BW__ 0.0 0.0
-iperf3	BW guest __GW__ 100${i}2 __THREADS__ __TIME__ __OPTS__ -b 1G -l 548
+iperf3	BW guest __GW__ 10002 __TIME__ __OPTS__ -b 2G -l 548
 bw	__BW__ 0.4 0.6
-iperf3	BW guest __GW__ 100${i}2 __THREADS__ __TIME__ __OPTS__ -b 2G -l 1252
+iperf3	BW guest __GW__ 10002 __TIME__ __OPTS__ -b 3G -l 1252
 bw	__BW__ 0.8 1.2
-iperf3	BW guest __GW__ 100${i}2 __THREADS__ __TIME__ __OPTS__ -b 3G -l 1472
+iperf3	BW guest __GW__ 10002 __TIME__ __OPTS__ -b 4G -l 1472
 bw	__BW__ 1.0 1.5
-iperf3	BW guest __GW__ 100${i}2 __THREADS__ __TIME__ __OPTS__ -b 6G -l 8972
+iperf3	BW guest __GW__ 10002 __TIME__ __OPTS__ -b 8G -l 8972
 bw	__BW__ 4.0 5.0
-iperf3	BW guest __GW__ 100${i}2 __THREADS__ __TIME__ __OPTS__ -b 7G -l 65492
+iperf3	BW guest __GW__ 10002 __TIME__ __OPTS__ -b 15G -l 65492
 bw	__BW__ 4.0 5.0
 
 iperf3k	ns
@@ -100,18 +99,18 @@ lat	__LAT__ 200 150
 
 
 tr	UDP throughput over IPv6: host to guest
-iperf3s	guest 100${i}1 __THREADS__
+iperf3s	guest 10001
 # (datagram size) = (packet size) - 48: 40 bytes of IPv6 header, 8 of UDP header
 
 bw	-
 bw	-
-iperf3	BW ns ::1 100${i}1 __THREADS__ __TIME__ __OPTS__ -b 2G -l 1232
+iperf3	BW ns ::1 10001 __TIME__ __OPTS__ -b 3G -l 1232
 bw	__BW__ 0.8 1.2
-iperf3	BW ns ::1 100${i}1 __THREADS__ __TIME__ __OPTS__ -b 2G -l 1452
+iperf3	BW ns ::1 10001 __TIME__ __OPTS__ -b 4G -l 1452
 bw	__BW__ 1.0 1.5
-iperf3	BW ns ::1 100${i}1 __THREADS__ __TIME__ __OPTS__ -b 3G -l 8952
+iperf3	BW ns ::1 10001 __TIME__ __OPTS__ -b 8G -l 8952
 bw	__BW__ 3.0 4.0
-iperf3	BW ns ::1 100${i}1 __THREADS__ __TIME__ __OPTS__ -b 3G -l 64372
+iperf3	BW ns ::1 10001 __TIME__ __OPTS__ -b 15G -l 64372
 bw	__BW__ 3.0 4.0
 
 iperf3k	guest
@@ -129,20 +128,20 @@ lat	__LAT__ 200 150
 
 
 tr	UDP throughput over IPv4: host to guest
-iperf3s	guest 100${i}1 __THREADS__
+iperf3s	guest 10001
 # (datagram size) = (packet size) - 28: 20 bytes of IPv4 header, 8 of UDP header
 
-iperf3	BW ns 127.0.0.1 100${i}1 __THREADS__ __TIME__ __OPTS__ -b 1G -l 228
+iperf3	BW ns 127.0.0.1 10001 __TIME__ __OPTS__ -b 1G -l 228
 bw	__BW__ 0.0 0.0
-iperf3	BW ns 127.0.0.1 100${i}1 __THREADS__ __TIME__ __OPTS__ -b 1G -l 548
+iperf3	BW ns 127.0.0.1 10001 __TIME__ __OPTS__ -b 2G -l 548
 bw	__BW__ 0.4 0.6
-iperf3	BW ns 127.0.0.1 100${i}1 __THREADS__ __TIME__ __OPTS__ -b 3G -l 1252
+iperf3	BW ns 127.0.0.1 10001 __TIME__ __OPTS__ -b 3G -l 1252
 bw	__BW__ 0.8 1.2
-iperf3	BW ns 127.0.0.1 100${i}1 __THREADS__ __TIME__ __OPTS__ -b 3G -l 1472
+iperf3	BW ns 127.0.0.1 10001 __TIME__ __OPTS__ -b 4G -l 1472
 bw	__BW__ 1.0 1.5
-iperf3	BW ns 127.0.0.1 100${i}1 __THREADS__ __TIME__ __OPTS__ -b 3G -l 8972
+iperf3	BW ns 127.0.0.1 10001 __TIME__ __OPTS__ -b 8G -l 8972
 bw	__BW__ 3.0 4.0
-iperf3	BW ns 127.0.0.1 100${i}1 __THREADS__ __TIME__ __OPTS__ -b 3G -l 65492
+iperf3	BW ns 127.0.0.1 10001 __TIME__ __OPTS__ -b 15G -l 65492
 bw	__BW__ 3.0 4.0
 
 iperf3k	guest
diff --git a/test/perf/pasta_tcp b/test/perf/pasta_tcp
index 7777532..8d2f911 100644
--- a/test/perf/pasta_tcp
+++ b/test/perf/pasta_tcp
@@ -21,26 +21,25 @@ ns	/sbin/sysctl -w net.ipv4.tcp_wmem="131072 524288 134217728"
 ns	/sbin/sysctl -w net.ipv4.tcp_timestamps=0
 
 
-set	THREADS 2
-set	STREAMS 2
+set	THREADS 4
 set	TIME 10
 set	OMIT 0.1
-set	OPTS -Z -w 4M -l 1M -P __STREAMS__ -O__OMIT__
+set	OPTS -Z -w 4M -l 1M -P __THREADS__ -O__OMIT__
 
 hout	FREQ_PROCFS (echo "scale=1"; sed -n 's/cpu MHz.*: \([0-9]*\)\..*$/(\1+10^2\/2)\/10^3/p' /proc/cpuinfo) | bc -l | head -n1
 hout	FREQ_CPUFREQ (echo "scale=1"; printf '( %i + 10^5 / 2 ) / 10^6\n' $(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq) ) | bc -l
 hout	FREQ [ -n "__FREQ_CPUFREQ__" ] && echo __FREQ_CPUFREQ__ || echo __FREQ_PROCFS__
 
 
-info	Throughput in Gbps, latency in µs, __THREADS__ threads at __FREQ__ GHz, __STREAMS__ streams each
+info	Throughput in Gbps, latency in µs, __THREADS__ threads at __FREQ__ GHz
 report	pasta lo_tcp __THREADS__ __FREQ__
 
 th	MTU 65535B
 
 tr	TCP throughput over IPv6: ns to host
-iperf3s	host 100${i}3 __THREADS__
+iperf3s	host 10003
 
-iperf3	BW ns ::1 100${i}3 __THREADS__ __TIME__ __OPTS__
+iperf3	BW ns ::1 10003 __THREADS__ __TIME__ __OPTS__
 bw	__BW__ 15.0 20.0
 
 iperf3k	host
@@ -59,9 +58,9 @@ lat	__LAT__ 500 350
 
 
 tr	TCP throughput over IPv4: ns to host
-iperf3s	host 100${i}3 __THREADS__
+iperf3s	host 10003
 
-iperf3	BW ns 127.0.0.1 100${i}3 __THREADS__ __TIME__ __OPTS__
+iperf3	BW ns 127.0.0.1 10003 __THREADS__ __TIME__ __OPTS__
 bw	__BW__ 15.0 20.0
 
 iperf3k	host
@@ -79,9 +78,9 @@ hostw
 lat	__LAT__ 500 350
 
 tr	TCP throughput over IPv6: host to ns
-iperf3s	ns 100${i}2 __THREADS__
+iperf3s	ns 10002
 
-iperf3	BW host ::1 100${i}2 __THREADS__ __TIME__ __OPTS__
+iperf3	BW host ::1 10002 __TIME__ __OPTS__
 bw	__BW__ 15.0 20.0
 
 iperf3k	ns
@@ -100,9 +99,9 @@ lat	__LAT__ 1000 700
 
 
 tr	TCP throughput over IPv4: host to ns
-iperf3s	ns 100${i}2 __THREADS__
+iperf3s	ns 10002
 
-iperf3	BW host 127.0.0.1 100${i}2 __THREADS__ __TIME__ __OPTS__
+iperf3	BW host 127.0.0.1 10002 __TIME__ __OPTS__
 bw	__BW__ 15.0 20.0
 
 iperf3k	ns
@@ -126,29 +125,28 @@ test	pasta: throughput and latency (connections via tap)
 nsout	GW ip -j -4 route show|jq -rM '.[] | select(.dst == "default").gateway'
 nsout	GW6 ip -j -6 route show|jq -rM '.[] | select(.dst == "default").gateway'
 nsout	IFNAME ip -j link show | jq -rM '.[] | select(.link_type == "ether").ifname'
-set	THREADS 1
-set	STREAMS 2
-set	OPTS -Z -P __STREAMS__ -i1 -O__OMIT__
+set	THREADS 2
+set	OPTS -Z -P __THREADS__ -i1 -O__OMIT__
 
-info	Throughput in Gbps, latency in µs, one thread at __FREQ__ GHz, __STREAMS__ streams
+info	Throughput in Gbps, latency in µs, __THREADS__ threads at __FREQ__ GHz
 report	pasta tap_tcp __THREADS__ __FREQ__
 
 th	MTU 1500B 4000B 16384B 65520B
 
 tr	TCP throughput over IPv6: ns to host
-iperf3s	host 100${i}3 __THREADS__
+iperf3s	host 10003
 
 ns	ip link set dev __IFNAME__ mtu 1500
-iperf3	BW ns __GW6__%__IFNAME__ 100${i}3 __THREADS__ __TIME__ __OPTS__ -w 512k
+iperf3	BW ns __GW6__%__IFNAME__ 10003 __TIME__ __OPTS__ -w 512k
 bw	__BW__ 0.2 0.4
 ns	ip link set dev __IFNAME__ mtu 4000
-iperf3	BW ns __GW6__%__IFNAME__ 100${i}3 __THREADS__ __TIME__ __OPTS__ -w 1M
+iperf3	BW ns __GW6__%__IFNAME__ 10003 __TIME__ __OPTS__ -w 1M
 bw	__BW__ 0.3 0.5
 ns	ip link set dev __IFNAME__ mtu 16384
-iperf3	BW ns __GW6__%__IFNAME__ 100${i}3 __THREADS__ __TIME__ __OPTS__ -w 8M
+iperf3	BW ns __GW6__%__IFNAME__ 10003 __TIME__ __OPTS__ -w 8M
 bw	__BW__ 1.5 2.0
 ns	ip link set dev __IFNAME__ mtu 65520
-iperf3	BW ns __GW6__%__IFNAME__ 100${i}3 __THREADS__ __TIME__ __OPTS__ -w 8M
+iperf3	BW ns __GW6__%__IFNAME__ 10003 __TIME__ __OPTS__ -w 8M
 bw	__BW__ 2.0 2.5
 
 iperf3k	host
@@ -173,19 +171,19 @@ lat	__LAT__ 1500 500
 
 
 tr	TCP throughput over IPv4: ns to host
-iperf3s	host 100${i}3 __THREADS__
+iperf3s	host 10003
 
 ns	ip link set dev __IFNAME__ mtu 1500
-iperf3	BW ns __GW__ 100${i}3 __THREADS__ __TIME__ __OPTS__ -w 512k
+iperf3	BW ns __GW__ 10003 __TIME__ __OPTS__ -w 512k
 bw	__BW__ 0.2 0.4
 ns	ip link set dev __IFNAME__ mtu 4000
-iperf3	BW ns __GW__ 100${i}3 __THREADS__ __TIME__ __OPTS__ -w 1M
+iperf3	BW ns __GW__ 10003 __TIME__ __OPTS__ -w 1M
 bw	__BW__ 0.3 0.5
 ns	ip link set dev __IFNAME__ mtu 16384
-iperf3	BW ns __GW__ 100${i}3 __THREADS__ __TIME__ __OPTS__ -w 8M
+iperf3	BW ns __GW__ 10003 __TIME__ __OPTS__ -w 8M
 bw	__BW__ 1.5 2.0
 ns	ip link set dev __IFNAME__ mtu 65520
-iperf3	BW ns __GW__ 100${i}3 __THREADS__ __TIME__ __OPTS__ -w 8M
+iperf3	BW ns __GW__ 10003 __TIME__ __OPTS__ -w 8M
 bw	__BW__ 2.0 2.5
 
 iperf3k	host
@@ -209,14 +207,14 @@ hostw
 lat	__LAT__ 1500 500
 
 tr	TCP throughput over IPv6: host to ns
-iperf3s	ns 100${i}2 __THREADS__
+iperf3s	ns 10002
 
 nsout	IFNAME ip -j link show | jq -rM '.[] | select(.link_type == "ether").ifname'
 nsout	ADDR6 ip -j -6 addr show|jq -rM '.[] | select(.ifname == "__IFNAME__").addr_info[] | select(.scope == "global" and .prefixlen == 64).local'
 bw	-
 bw	-
 bw	-
-iperf3	BW host __ADDR6__ 100${i}2 __THREADS__ __TIME__ __OPTS__
+iperf3	BW host __ADDR6__ 10002 __TIME__ __OPTS__
 bw	__BW__ 8.0 10.0
 
 iperf3k	ns
@@ -242,13 +240,13 @@ lat	__LAT__ 5000 10000
 
 
 tr	TCP throughput over IPv4: host to ns
-iperf3s	ns 100${i}2 __THREADS__
+iperf3s	ns 10002
 
 nsout	ADDR ip -j -4 addr show|jq -rM '.[] | select(.ifname == "__IFNAME__").addr_info[0].local'
 bw	-
 bw	-
 bw	-
-iperf3	BW host __ADDR__ 100${i}2 __THREADS__ __TIME__ __OPTS__
+iperf3	BW host __ADDR__ 10002 __TIME__ __OPTS__
 bw	__BW__ 8.0 10.0
 
 iperf3k	ns
diff --git a/test/perf/pasta_udp b/test/perf/pasta_udp
index 5e3db1e..6acbfd3 100644
--- a/test/perf/pasta_udp
+++ b/test/perf/pasta_udp
@@ -21,11 +21,10 @@ hout	FREQ_CPUFREQ (echo "scale=1"; printf '( %i + 10^5 / 2 ) / 10^6\n' $(cat /sy
 hout	FREQ [ -n "__FREQ_CPUFREQ__" ] && echo __FREQ_CPUFREQ__ || echo __FREQ_PROCFS__
 
 set	THREADS 1
-set	STREAMS 4
 set	TIME 10
-set	OPTS -u -P __STREAMS__
+set	OPTS -u -P __THREADS__
 
-info	Throughput in Gbps, latency in µs, one thread at __FREQ__ GHz, __STREAMS__ streams
+info	Throughput in Gbps, latency in µs, one thread at __FREQ__ GHz
 
 report	pasta lo_udp 1 __FREQ__
 
@@ -33,16 +32,16 @@ th	pktlen 1500B 4000B 16384B 65535B
 
 
 tr	UDP throughput over IPv6: ns to host
-iperf3s	host 100${i}3 __THREADS__
+iperf3s	host 10003
 # (datagram size) = (packet size) - 48: 40 bytes of IPv6 header, 8 of UDP header
 
-iperf3	BW ns ::1 100${i}3 __THREADS__ __TIME__ __OPTS__ -b 3G -l 1452
+iperf3	BW ns ::1 10003 __TIME__ __OPTS__ -b 5G -l 1452
 bw	__BW__ 1.0 1.5
-iperf3	BW ns ::1 100${i}3 __THREADS__ __TIME__ __OPTS__ -b 3G -l 3972
+iperf3	BW ns ::1 10003 __TIME__ __OPTS__ -b 10G -l 3972
 bw	__BW__ 1.2 1.8
-iperf3	BW ns ::1 100${i}3 __THREADS__ __TIME__ __OPTS__ -b 10G -l 16336
+iperf3	BW ns ::1 10003 __TIME__ __OPTS__ -b 30G -l 16336
 bw	__BW__ 5.0 6.0
-iperf3	BW ns ::1 100${i}3 __THREADS__ __TIME__ __OPTS__ -b 15G -l 65487
+iperf3	BW ns ::1 10003 __TIME__ __OPTS__ -b 40G -l 65487
 bw	__BW__ 7.0 9.0
 
 iperf3k	host
@@ -58,16 +57,16 @@ lat	__LAT__ 200 150
 
 
 tr	UDP throughput over IPv4: ns to host
-iperf3s	host 100${i}3 __THREADS__
+iperf3s	host 10003
 # (datagram size) = (packet size) - 28: 20 bytes of IPv4 header, 8 of UDP header
 
-iperf3	BW ns 127.0.0.1 100${i}3 __THREADS__ __TIME__ __OPTS__ -b 3G -l 1372
+iperf3	BW ns 127.0.0.1 10003 __TIME__ __OPTS__ -b 5G -l 1372
 bw	__BW__ 1.0 1.5
-iperf3	BW ns 127.0.0.1 100${i}3 __THREADS__ __TIME__ __OPTS__ -b 3G -l 3972
+iperf3	BW ns 127.0.0.1 10003 __TIME__ __OPTS__ -b 10G -l 3972
 bw	__BW__ 1.2 1.8
-iperf3	BW ns 127.0.0.1 100${i}3 __THREADS__ __TIME__ __OPTS__ -b 10G -l 16356
+iperf3	BW ns 127.0.0.1 10003 __TIME__ __OPTS__ -b 30G -l 16356
 bw	__BW__ 5.0 6.0
-iperf3	BW ns 127.0.0.1 100${i}3 __THREADS__ __TIME__ __OPTS__ -b 15G -l 65507
+iperf3	BW ns 127.0.0.1 10003 __TIME__ __OPTS__ -b 40G -l 65507
 bw	__BW__ 7.0 9.0
 
 iperf3k	host
@@ -83,15 +82,15 @@ lat	__LAT__ 200 150
 
 
 tr	UDP throughput over IPv6: host to ns
-iperf3s	ns 100${i}2 __THREADS__
+iperf3s	ns 10002
 
-iperf3	BW host ::1 100${i}2 __THREADS__ __TIME__ __OPTS__ -b 3G -l 1452
+iperf3	BW host ::1 10002 __TIME__ __OPTS__ -b 5G -l 1452
 bw	__BW__ 1.0 1.5
-iperf3	BW host ::1 100${i}2 __THREADS__ __TIME__ __OPTS__ -b 3G -l 3972
+iperf3	BW host ::1 10002 __TIME__ __OPTS__ -b 10G -l 3972
 bw	__BW__ 1.2 1.8
-iperf3	BW host ::1 100${i}2 __THREADS__ __TIME__ __OPTS__ -b 10G -l 16336
+iperf3	BW host ::1 10002 __TIME__ __OPTS__ -b 30G -l 16336
 bw	__BW__ 5.0 6.0
-iperf3	BW host ::1 100${i}2 __THREADS__ __TIME__ __OPTS__ -b 15G -l 16336
+iperf3	BW host ::1 10002 __TIME__ __OPTS__ -b 40G -l 65487
 bw	__BW__ 7.0 9.0
 
 iperf3k	ns
@@ -107,14 +106,14 @@ lat	__LAT__ 200 150
 
 
 tr	UDP throughput over IPv4: host to ns
-iperf3s	ns 100${i}2 __THREADS__
-iperf3	BW host 127.0.0.1 100${i}2 __THREADS__ __TIME__ __OPTS__ -b 3G -l 1372
+iperf3s	ns 10002
+iperf3	BW host 127.0.0.1 10002 __TIME__ __OPTS__ -b 5G -l 1372
 bw	__BW__ 1.0 1.5
-iperf3	BW host 127.0.0.1 100${i}2 __THREADS__ __TIME__ __OPTS__ -b 3G -l 3972
+iperf3	BW host 127.0.0.1 10002 __TIME__ __OPTS__ -b 10G -l 3972
 bw	__BW__ 1.2 1.8
-iperf3	BW host 127.0.0.1 100${i}2 __THREADS__ __TIME__ __OPTS__ -b 10G -l 16356
+iperf3	BW host 127.0.0.1 10002 __TIME__ __OPTS__ -b 30G -l 16356
 bw	__BW__ 5.0 6.0
-iperf3	BW host 127.0.0.1 100${i}2 __THREADS__ __TIME__ __OPTS__ -b 15G -l 65507
+iperf3	BW host 127.0.0.1 10002 __TIME__ __OPTS__ -b 40G -l 65507
 bw	__BW__ 7.0 9.0
 
 iperf3k	ns
@@ -138,22 +137,22 @@ nsout	GW ip -j -4 route show|jq -rM '.[] | select(.dst == "default").gateway'
 nsout	GW6 ip -j -6 route show|jq -rM '.[] | select(.dst == "default").gateway'
 nsout	IFNAME ip -j link show | jq -rM '.[] | select(.link_type == "ether").ifname'
 
-info	Throughput in Gbps, latency in µs, one thread at __FREQ__ GHz, __STREAMS__ streams
+info	Throughput in Gbps, latency in µs, one thread at __FREQ__ GHz
 report	pasta tap_udp 1 __FREQ__
 
 th	pktlen 1500B 4000B 16384B 65520B
 
 tr	UDP throughput over IPv6: ns to host
-iperf3s	host 100${i}3 __THREADS__
+iperf3s	host 10003
 # (datagram size) = (packet size) - 48: 40 bytes of IPv6 header, 8 of UDP header
 
-iperf3	BW ns __GW6__%__IFNAME__ 100${i}3 __THREADS__ __TIME__ __OPTS__ -b 2G -l 1472
+iperf3	BW ns __GW6__%__IFNAME__ 10003 __TIME__ __OPTS__ -b 8G -l 1472
 bw	__BW__ 0.3 0.5
-iperf3	BW ns __GW6__%__IFNAME__ 100${i}3 __THREADS__ __TIME__ __OPTS__ -b 3G -l 3972
+iperf3	BW ns __GW6__%__IFNAME__ 10003 __TIME__ __OPTS__ -b 12G -l 3972
 bw	__BW__ 0.5 0.8
-iperf3	BW ns __GW6__%__IFNAME__ 100${i}3 __THREADS__ __TIME__ __OPTS__ -b 4G -l 16356
+iperf3	BW ns __GW6__%__IFNAME__ 10003 __TIME__ __OPTS__ -b 20G -l 16356
 bw	__BW__ 3.0 4.0
-iperf3	BW ns __GW6__%__IFNAME__ 100${i}3 __THREADS__ __TIME__ __OPTS__ -b 6G -l 65472
+iperf3	BW ns __GW6__%__IFNAME__ 10003 __TIME__ __OPTS__ -b 30G -l 65472
 bw	__BW__ 6.0 7.0
 
 iperf3k	host
@@ -169,16 +168,16 @@ lat	__LAT__ 200 150
 
 
 tr	UDP throughput over IPv4: ns to host
-iperf3s	host 100${i}3 __THREADS__
+iperf3s	host 10003
 # (datagram size) = (packet size) - 28: 20 bytes of IPv4 header, 8 of UDP header
 
-iperf3	BW ns __GW__ 100${i}3 __THREADS__ __TIME__ __OPTS__ -b 2G -l 1472
+iperf3	BW ns __GW__ 10003 __TIME__ __OPTS__ -b 8G -l 1472
 bw	__BW__ 0.3 0.5
-iperf3	BW ns __GW__ 100${i}3 __THREADS__ __TIME__ __OPTS__ -b 3G -l 3972
+iperf3	BW ns __GW__ 10003 __TIME__ __OPTS__ -b 12G -l 3972
 bw	__BW__ 0.5 0.8
-iperf3	BW ns __GW__ 100${i}3 __THREADS__ __TIME__ __OPTS__ -b 4G -l 16356
+iperf3	BW ns __GW__ 10003 __TIME__ __OPTS__ -b 20G -l 16356
 bw	__BW__ 3.0 4.0
-iperf3	BW ns __GW__ 100${i}3 __THREADS__ __TIME__ __OPTS__ -b 6G -l 65492
+iperf3	BW ns __GW__ 10003 __TIME__ __OPTS__ -b 30G -l 65492
 bw	__BW__ 6.0 7.0
 
 iperf3k	host
@@ -193,17 +192,17 @@ hostw
 lat	__LAT__ 200 150
 
 tr	UDP throughput over IPv6: host to ns
-iperf3s	ns 100${i}2 __THREADS__
+iperf3s	ns 10002
 
 nsout	IFNAME ip -j link show | jq -rM '.[] | select(.link_type == "ether").ifname'
 nsout	ADDR6 ip -j -6 addr show|jq -rM '.[] | select(.ifname == "__IFNAME__").addr_info[] | select(.scope == "global" and .prefixlen == 64).local'
-iperf3	BW host __ADDR6__ 100${i}2 __THREADS__ __TIME__ __OPTS__ -b 2G -l 1472
+iperf3	BW host __ADDR6__ 10002 __TIME__ __OPTS__ -b 8G -l 1472
 bw	__BW__ 0.3 0.5
-iperf3	BW host __ADDR6__ 100${i}2 __THREADS__ __TIME__ __OPTS__ -b 3G -l 3972
+iperf3	BW host __ADDR6__ 10002 __TIME__ __OPTS__ -b 12G -l 3972
 bw	__BW__ 0.5 0.8
-iperf3	BW host __ADDR6__ 100${i}2 __THREADS__ __TIME__ __OPTS__ -b 4G -l 16356
+iperf3	BW host __ADDR6__ 10002 __TIME__ __OPTS__ -b 20G -l 16356
 bw	__BW__ 3.0 4.0
-iperf3	BW host __ADDR6__ 100${i}2 __THREADS__ __TIME__ __OPTS__ -b 15G -l 65472
+iperf3	BW host __ADDR6__ 10002 __TIME__ __OPTS__ -b 30G -l 65472
 bw	__BW__ 7.0 9.0
 
 iperf3k	ns
@@ -219,16 +218,16 @@ lat	__LAT__ 200 150
 
 
 tr	UDP throughput over IPv4: host to ns
-iperf3s	ns 100${i}2 __THREADS__
+iperf3s	ns 10002
 
 nsout	ADDR ip -j -4 addr show|jq -rM '.[] | select(.ifname == "__IFNAME__").addr_info[0].local'
-iperf3	BW host __ADDR__ 100${i}2 __THREADS__ __TIME__ __OPTS__ -b 2G -l 1472
+iperf3	BW host __ADDR__ 10002 __TIME__ __OPTS__ -b 8G -l 1472
 bw	__BW__ 0.3 0.5
-iperf3	BW host __ADDR__ 100${i}2 __THREADS__ __TIME__ __OPTS__ -b 3G -l 3972
+iperf3	BW host __ADDR__ 10002 __TIME__ __OPTS__ -b 12G -l 3972
 bw	__BW__ 0.5 0.8
-iperf3	BW host __ADDR__ 100${i}2 __THREADS__ __TIME__ __OPTS__ -b 4G -l 16356
+iperf3	BW host __ADDR__ 10002 __TIME__ __OPTS__ -b 20G -l 16356
 bw	__BW__ 3.0 4.0
-iperf3	BW host __ADDR__ 100${i}2 __THREADS__ __TIME__ __OPTS__ -b 15G -l 65492
+iperf3	BW host __ADDR__ 10002 __TIME__ __OPTS__ -b 30G -l 65492
 bw	__BW__ 7.0 9.0
 
 iperf3k	ns
-- 
2.43.0


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 09/11] tap: Exit if we fail to bind a UNIX domain socket with explicit path
  2024-07-24 21:50 [PATCH 00/11] Minor assorted fixes, mostly logging and tests Stefano Brivio
                   ` (7 preceding siblings ...)
  2024-07-24 21:50 ` [PATCH 08/11] test: iperf3 3.16 introduces multiple threads, drop our own implementation of that Stefano Brivio
@ 2024-07-24 21:50 ` Stefano Brivio
  2024-07-25  4:00   ` David Gibson
  2024-07-24 21:50 ` [PATCH 10/11] tap: Discard guest data on length descriptor mismatch Stefano Brivio
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Stefano Brivio @ 2024-07-24 21:50 UTC (permalink / raw)
  To: passt-dev

In tap_sock_unix_open(), if we have a given path for the socket from
configuration, we don't need to loop over possible paths, so we exit
the loop on the first iteration, unconditionally.

But if we failed to bind() the socket to that explicit path, we should
exit, instead of continuing. Otherwise we'll pretend we're up and
running, but nobody can contact us, and this might be mildly confusing
for users.

Link: https://bugzilla.redhat.com/show_bug.cgi?id=2299474
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
 tap.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/tap.c b/tap.c
index 6930ad8..44bd444 100644
--- a/tap.c
+++ b/tap.c
@@ -1139,8 +1139,11 @@ int tap_sock_unix_open(char *sock_path)
 		close(ex);
 
 		unlink(path);
-		if (!bind(fd, (const struct sockaddr *)&addr, sizeof(addr)) ||
-		    *sock_path)
+		ret = bind(fd, (const struct sockaddr *)&addr, sizeof(addr));
+		if (*sock_path && ret)
+			die_perror("Failed to bind UNIX domain socket");
+
+		if (!ret)
 			break;
 	}
 
-- 
2.43.0


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 10/11] tap: Discard guest data on length descriptor mismatch
  2024-07-24 21:50 [PATCH 00/11] Minor assorted fixes, mostly logging and tests Stefano Brivio
                   ` (8 preceding siblings ...)
  2024-07-24 21:50 ` [PATCH 09/11] tap: Exit if we fail to bind a UNIX domain socket with explicit path Stefano Brivio
@ 2024-07-24 21:50 ` Stefano Brivio
  2024-07-25  4:37   ` David Gibson
  2024-07-24 21:50 ` [PATCH 11/11] conf: Accept addresses enclosed by square brackets in port forwarding specifiers Stefano Brivio
  2024-07-25 11:28 ` [PATCH 00/11] Minor assorted fixes, mostly logging and tests Stefano Brivio
  11 siblings, 1 reply; 33+ messages in thread
From: Stefano Brivio @ 2024-07-24 21:50 UTC (permalink / raw)
  To: passt-dev

This was reported by Matej a while ago, but we forgot to fix it. Even
if the hypervisor is necessarily trusted by passt, as it can in any
case terminate the guest or disrupt guest connectivity, it's a good
idea to be robust against possible issues.

Instead of resetting the connection to the hypervisor, just discard
the data we read with a single recv(), as we had a few cases where
QEMU would get the length descriptor wrong, in the past.

While at it, change l2len in tap_handler_passt() to uint32_t, as the
length descriptor is logically unsigned and 32-bit wide.

Reported-by: Matej Hrica <mhrica@redhat.com>
Suggested-by: Matej Hrica <mhrica@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
 tap.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/tap.c b/tap.c
index 44bd444..62ba6a4 100644
--- a/tap.c
+++ b/tap.c
@@ -1011,15 +1011,18 @@ redo:
 	}
 
 	while (n > (ssize_t)sizeof(uint32_t)) {
-		ssize_t l2len = ntohl(*(uint32_t *)p);
+		uint32_t l2len = ntohl(*(uint32_t *)p);
 
 		p += sizeof(uint32_t);
 		n -= sizeof(uint32_t);
 
+		if (l2len > (ssize_t)TAP_BUF_BYTES - n)
+			return;
+
 		/* At most one packet might not fit in a single read, and this
 		 * needs to be blocking.
 		 */
-		if (l2len > n) {
+		if (l2len > (size_t)n) {
 			rem = recv(c->fd_tap, p + n, l2len - n, 0);
 			if ((n += rem) != l2len)
 				return;
@@ -1028,8 +1031,7 @@ redo:
 		/* Complete the partial read above before discarding a malformed
 		 * frame, otherwise the stream will be inconsistent.
 		 */
-		if (l2len < (ssize_t)sizeof(struct ethhdr) ||
-		    l2len > (ssize_t)ETH_MAX_MTU)
+		if (l2len < sizeof(struct ethhdr) || l2len > ETH_MAX_MTU)
 			goto next;
 
 		tap_add_packet(c, l2len, p);
-- 
2.43.0


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 11/11] conf: Accept addresses enclosed by square brackets in port forwarding specifiers
  2024-07-24 21:50 [PATCH 00/11] Minor assorted fixes, mostly logging and tests Stefano Brivio
                   ` (9 preceding siblings ...)
  2024-07-24 21:50 ` [PATCH 10/11] tap: Discard guest data on length descriptor mismatch Stefano Brivio
@ 2024-07-24 21:50 ` Stefano Brivio
  2024-07-25  4:39   ` David Gibson
  2024-07-25 11:28 ` [PATCH 00/11] Minor assorted fixes, mostly logging and tests Stefano Brivio
  11 siblings, 1 reply; 33+ messages in thread
From: Stefano Brivio @ 2024-07-24 21:50 UTC (permalink / raw)
  To: passt-dev

Even though we don't use : as delimiter for the port, making square
brackets unneeded, RFC 3986, section 3.2.2, mandates them for IPv6
literals. We want IPv6 addresses there, but some users might still
specify them out of habit.

Same for IPv4 addresses: RFC 3986 doesn't specify square brackets for
IPv4 literals, but I had reports of users actually trying to use them
(they're accepted by many tools).

Allow square brackets for both IPv4 and IPv6 addresses, correct or
not, they're harmless anyway.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
 conf.c | 24 +++++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/conf.c b/conf.c
index 3cf9ed8..338742e 100644
--- a/conf.c
+++ b/conf.c
@@ -209,14 +209,24 @@ static void conf_ports(const struct ctx *c, char optname, const char *optarg,
 
 		}
 
-		if (ifname == buf + 1)		/* Interface without address */
+		if (ifname == buf + 1) {	/* Interface without address */
 			addr = NULL;
-		else if (inet_pton(AF_INET, buf, addr))
-			af = AF_INET;
-		else if (inet_pton(AF_INET6, buf, addr))
-			af = AF_INET6;
-		else
-			goto bad;
+		} else {
+			p = buf;
+
+			/* Allow square brackets for IPv4 too for convenience */
+			if (*p == '[' && p[strlen(p) - 1] == ']') {
+				p[strlen(p) - 1] = 0;
+				p++;
+			}
+
+			if (inet_pton(AF_INET, p, addr))
+				af = AF_INET;
+			else if (inet_pton(AF_INET6, p, addr))
+				af = AF_INET6;
+			else
+				goto bad;
+		}
 	} else {
 		spec = buf;
 
-- 
2.43.0


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 01/11] tap: Don't quit if pasta gets EIO on writev() to tap, interface might be down
  2024-07-24 21:50 ` [PATCH 01/11] tap: Don't quit if pasta gets EIO on writev() to tap, interface might be down Stefano Brivio
@ 2024-07-25  3:21   ` David Gibson
  0 siblings, 0 replies; 33+ messages in thread
From: David Gibson @ 2024-07-25  3:21 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 1457 bytes --]

On Wed, Jul 24, 2024 at 11:50:07PM +0200, Stefano Brivio wrote:
> If we start pasta with some ports forwarded, but no --config-net, say:
> 
>   $ ./pasta -u 10001
> 
> and then use a local, non-loopback address to send traffic to that
> port, say:
> 
>   $ socat -u FILE:test UDP4:192.0.2.1:10001
> 
> pasta writes to the tap file descriptor, but if the interface is down,
> we get EIO and terminate.
> 
> By itself, what I'm doing in this case is not very useful (I simply
> forgot to pass --config-net), but if we happen to have a DHCP client
> in the network namespace, the interface might still be down while
> somebody tries to send traffic to it, and exiting in that case is not
> really helpful.
> 
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>


> ---
>  tap.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/tap.c b/tap.c
> index 32a7b09..6930ad8 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -324,6 +324,7 @@ static size_t tap_send_frames_pasta(const struct ctx *c,
>  			case EINTR:
>  			case ENOBUFS:
>  			case ENOSPC:
> +			case EIO:		/* interface down? */
>  				break;
>  			default:
>  				die("Write error on tap device, exiting");

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 02/11] tcp: Change SO_PEEK_OFF support message to debug()
  2024-07-24 21:50 ` [PATCH 02/11] tcp: Change SO_PEEK_OFF support message to debug() Stefano Brivio
@ 2024-07-25  3:22   ` David Gibson
  0 siblings, 0 replies; 33+ messages in thread
From: David Gibson @ 2024-07-25  3:22 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 1496 bytes --]

On Wed, Jul 24, 2024 at 11:50:08PM +0200, Stefano Brivio wrote:
> This:
> 
>   $ ./pasta
>   SO_PEEK_OFF not supported
>   #
> 
> is a bit annoying, and might trick users who face other issues into
> thinking that SO_PEEK_OFF not being supported on a given kernel is
> an actual issue.
> 
> Even if SO_PEEK_OFF is supported by the kernel, that would be the
> only message displayed there, with default options, which looks a bit
> out of context.
> 
> Switch that to debug(): now that Podman users can pass --debug too, we
> can find out quickly if it's supported or not, if SO_PEEK_OFF usage is
> suspected of causing any issue.
> 
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  tcp.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tcp.c b/tcp.c
> index c031f13..c0820ce 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -2524,7 +2524,7 @@ int tcp_init(struct ctx *c)
>  
>  	peek_offset_cap = (!c->ifi4 || tcp_probe_peek_offset_cap(AF_INET)) &&
>  			  (!c->ifi6 || tcp_probe_peek_offset_cap(AF_INET6));
> -	info("SO_PEEK_OFF%ssupported", peek_offset_cap ? " " : " not ");
> +	debug("SO_PEEK_OFF%ssupported", peek_offset_cap ? " " : " not ");
>  
>  	return 0;
>  }

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 03/11] log: Drop newlines in the middle of the perror()-like messages
  2024-07-24 21:50 ` [PATCH 03/11] log: Drop newlines in the middle of the perror()-like messages Stefano Brivio
@ 2024-07-25  3:26   ` David Gibson
  2024-07-25 11:27     ` Stefano Brivio
  0 siblings, 1 reply; 33+ messages in thread
From: David Gibson @ 2024-07-25  3:26 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 8296 bytes --]

On Wed, Jul 24, 2024 at 11:50:09PM +0200, Stefano Brivio wrote:
> Calling vlogmsg() twice from logmsg_perror() results in this beauty:
> 
>   $ ./pasta -i foo
>   Invalid interface name foo
>   : No such device
> 
> because the first part of the message, corresponding to the first
> call, doesn't end with a newline, and vlogmsg() adds it.
> 
> Given that we can't easily append an argument (error description) to
> a variadic list, add a 'newline' parameter to all the functions that
> currently add a newline if missing, and disable that on the first call
> to vlogmsg() from logmsg_perror(). Not very pretty but I can't think
> of any solution that's less messy than this.
> 
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>

I think my personal inclination would be to rename all the
lowest-level functions slightly and remove the newline adding logic
unconditionally.  The create wrappers under the old name which add the
"\n".  I think that can be done in an easy macro, since the "\n" can
be constant string appended to the format string.  Just the special
paths that need to suppress the newline would call the low level "no
newline" variants.

But, I don't actually care that much so

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  flow.c |  2 +-
>  log.c  | 34 ++++++++++++++++++++++------------
>  log.h  | 18 +++++++++---------
>  3 files changed, 32 insertions(+), 22 deletions(-)
> 
> diff --git a/flow.c b/flow.c
> index d7d548d..bd5fa2b 100644
> --- a/flow.c
> +++ b/flow.c
> @@ -279,7 +279,7 @@ void flow_log_(const struct flow_common *f, int pri, const char *fmt, ...)
>  	else
>  		type_or_state = FLOW_TYPE(f);
>  
> -	logmsg(pri, "Flow %u (%s): %s", flow_idx(f), type_or_state, msg);
> +	logmsg(true, pri, "Flow %u (%s): %s", flow_idx(f), type_or_state, msg);
>  }
>  
>  /**
> diff --git a/log.c b/log.c
> index 9ed6bc1..54483e7 100644
> --- a/log.c
> +++ b/log.c
> @@ -46,7 +46,14 @@ int		log_trace;		/* --trace mode enabled */
>  bool		log_conf_parsed;	/* Logging options already parsed */
>  bool		log_runtime;		/* Daemonised, or ready in foreground */
>  
> -void vlogmsg(int pri, const char *format, va_list ap)
> +/**
> + * vlogmsg() - Print or send messages to log or output files as configured
> + * @newline:	Append newline at the end of the message, if missing
> + * @pri:	Facility and level map, same as priority for vsyslog()
> + * @format:	Message
> + * @ap:		Variable argument list
> + */
> +void vlogmsg(bool newline, int pri, const char *format, va_list ap)
>  {
>  	bool debug_print = (log_mask & LOG_MASK(LOG_DEBUG)) && log_file == -1;
>  	struct timespec tp;
> @@ -63,9 +70,9 @@ void vlogmsg(int pri, const char *format, va_list ap)
>  
>  		va_copy(ap2, ap); /* Don't clobber ap, we need it again */
>  		if (log_file != -1)
> -			logfile_write(pri, format, ap2);
> +			logfile_write(newline, pri, format, ap2);
>  		else if (!(log_mask & LOG_MASK(LOG_DEBUG)))
> -			passt_vsyslog(pri, format, ap2);
> +			passt_vsyslog(newline, pri, format, ap2);
>  
>  		va_end(ap2);
>  	}
> @@ -73,22 +80,23 @@ void vlogmsg(int pri, const char *format, va_list ap)
>  	if (debug_print || !log_conf_parsed ||
>  	    (!log_runtime && (log_mask & LOG_MASK(LOG_PRI(pri))))) {
>  		(void)vfprintf(stderr, format, ap);
> -		if (format[strlen(format)] != '\n')
> +		if (newline && format[strlen(format)] != '\n')
>  			fprintf(stderr, "\n");
>  	}
>  }
>  
>  /**
>   * logmsg() - vlogmsg() wrapper for variable argument lists
> + * @newline:	Append newline at the end of the message, if missing
>   * @pri:	Facility and level map, same as priority for vsyslog()
>   * @format:	Message
>   */
> -void logmsg(int pri, const char *format, ...)
> +void logmsg(bool newline, int pri, const char *format, ...)
>  {
>  	va_list ap;
>  
>  	va_start(ap, format);
> -	vlogmsg(pri, format, ap);
> +	vlogmsg(newline, pri, format, ap);
>  	va_end(ap);
>  }
>  
> @@ -103,10 +111,10 @@ void logmsg_perror(int pri, const char *format, ...)
>  	va_list ap;
>  
>  	va_start(ap, format);
> -	vlogmsg(pri, format, ap);
> +	vlogmsg(false, pri, format, ap);
>  	va_end(ap);
>  
> -	logmsg(pri, ": %s", strerror(errno_copy));
> +	logmsg(true, pri, ": %s", strerror(errno_copy));
>  }
>  
>  /* Prefixes for log file messages, indexed by priority */
> @@ -174,11 +182,12 @@ void __setlogmask(int mask)
>  
>  /**
>   * passt_vsyslog() - vsyslog() implementation not using heap memory
> + * @newline:	Append newline at the end of the message, if missing
>   * @pri:	Facility and level map, same as priority for vsyslog()
>   * @format:	Same as vsyslog() format
>   * @ap:		Same as vsyslog() ap
>   */
> -void passt_vsyslog(int pri, const char *format, va_list ap)
> +void passt_vsyslog(bool newline, int pri, const char *format, va_list ap)
>  {
>  	char buf[BUFSIZ];
>  	int n;
> @@ -188,7 +197,7 @@ void passt_vsyslog(int pri, const char *format, va_list ap)
>  
>  	n += vsnprintf(buf + n, BUFSIZ - n, format, ap);
>  
> -	if (format[strlen(format)] != '\n')
> +	if (newline && format[strlen(format)] != '\n')
>  		n += snprintf(buf + n, BUFSIZ - n, "\n");
>  
>  	if (log_sock >= 0 && send(log_sock, buf, n, 0) != n && !log_runtime)
> @@ -360,11 +369,12 @@ static int logfile_rotate(int fd, const struct timespec *now)
>  
>  /**
>   * logfile_write() - Write entry to log file, trigger rotation if full
> + * @newline:	Append newline at the end of the message, if missing
>   * @pri:	Facility and level map, same as priority for vsyslog()
>   * @format:	Same as vsyslog() format
>   * @ap:		Same as vsyslog() ap
>   */
> -void logfile_write(int pri, const char *format, va_list ap)
> +void logfile_write(bool newline, int pri, const char *format, va_list ap)
>  {
>  	struct timespec now;
>  	char buf[BUFSIZ];
> @@ -380,7 +390,7 @@ void logfile_write(int pri, const char *format, va_list ap)
>  
>  	n += vsnprintf(buf + n, BUFSIZ - n, format, ap);
>  
> -	if (format[strlen(format)] != '\n')
> +	if (newline && format[strlen(format)] != '\n')
>  		n += snprintf(buf + n, BUFSIZ - n, "\n");
>  
>  	if ((log_written + n >= log_size) && logfile_rotate(log_file, &now))
> diff --git a/log.h b/log.h
> index 6f34e8d..51ddafa 100644
> --- a/log.h
> +++ b/log.h
> @@ -13,16 +13,16 @@
>  #define LOGFILE_CUT_RATIO		30	/* When full, cut ~30% size */
>  #define LOGFILE_SIZE_MIN		(5UL * MAX(BUFSIZ, PAGE_SIZE))
>  
> -void vlogmsg(int pri, const char *format, va_list ap);
> -void logmsg(int pri, const char *format, ...)
> -	__attribute__((format(printf, 2, 3)));
> +void vlogmsg(bool newline, int pri, const char *format, va_list ap);
> +void logmsg(bool newline, int pri, const char *format, ...)
> +	__attribute__((format(printf, 3, 4)));
>  void logmsg_perror(int pri, const char *format, ...)
>  	__attribute__((format(printf, 2, 3)));
>  
> -#define err(...)		logmsg(		LOG_ERR,	__VA_ARGS__)
> -#define warn(...)		logmsg(		LOG_WARNING,	__VA_ARGS__)
> -#define info(...)		logmsg(		LOG_INFO,	__VA_ARGS__)
> -#define debug(...)		logmsg(		LOG_DEBUG,	__VA_ARGS__)
> +#define err(...)		logmsg(true,	LOG_ERR,	__VA_ARGS__)
> +#define warn(...)		logmsg(true,	LOG_WARNING,	__VA_ARGS__)
> +#define info(...)		logmsg(true,	LOG_INFO,	__VA_ARGS__)
> +#define debug(...)		logmsg(true,	LOG_DEBUG,	__VA_ARGS__)
>  
>  #define err_perror(...)		logmsg_perror(	LOG_ERR,	__VA_ARGS__)
>  #define warn_perror(...)	logmsg_perror(	LOG_WARNING,	__VA_ARGS__)
> @@ -54,8 +54,8 @@ void trace_init(int enable);
>  
>  void __openlog(const char *ident, int option, int facility);
>  void logfile_init(const char *name, const char *path, size_t size);
> -void passt_vsyslog(int pri, const char *format, va_list ap);
> -void logfile_write(int pri, const char *format, va_list ap);
> +void passt_vsyslog(bool newline, int pri, const char *format, va_list ap);
> +void logfile_write(bool newline, int pri, const char *format, va_list ap);
>  void __setlogmask(int mask);
>  
>  #endif /* LOG_H */

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 04/11] log: Fix sub-second part in relative log time calculation
  2024-07-24 21:50 ` [PATCH 04/11] log: Fix sub-second part in relative log time calculation Stefano Brivio
@ 2024-07-25  3:32   ` David Gibson
  2024-07-25  7:51     ` Stefano Brivio
  0 siblings, 1 reply; 33+ messages in thread
From: David Gibson @ 2024-07-25  3:32 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 2139 bytes --]

On Wed, Jul 24, 2024 at 11:50:10PM +0200, Stefano Brivio wrote:
> For some reason, in commit 01efc71ddd25 ("log, conf: Add support for
> logging to file"), I added calculations for relative logging
> timestamps using the difference for the seconds part only, not for
> accounting for the fractional part.
> 
> Fix that by storing the initial timestamp, log_start, as a timespec
> struct, and by calculating differences of both seconds and nanoseconds
> from the starting time. Do this in a macro as we need the same
> calculation and format in a few places.
> 
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> ---
>  log.c | 42 ++++++++++++++++++++++--------------------
>  1 file changed, 22 insertions(+), 20 deletions(-)
> 
> diff --git a/log.c b/log.c
> index 54483e7..f57a54f 100644
> --- a/log.c
> +++ b/log.c
> @@ -40,12 +40,21 @@ static size_t	log_written;		/* Currently used bytes in log file */
>  static size_t	log_cut_size;		/* Bytes to cut at start on rotation */
>  static char	log_header[BUFSIZ];	/* File header, written back on cuts */
>  
> -static time_t	log_start;		/* Start timestamp */
> +static struct timespec log_start;	/* Start timestamp */
>  
>  int		log_trace;		/* --trace mode enabled */
>  bool		log_conf_parsed;	/* Logging options already parsed */
>  bool		log_runtime;		/* Daemonised, or ready in foreground */
>  
> +/**
> + * logtime_fmt_and_arg() - Build format and arguments to print relative log time
> + * @x:		Current timestamp
> + */
> +#define logtime_fmt_and_arg(x)						\
> +	"%lli.%04lli",							\
> +	((long long int)(x)->tv_sec - log_start.tv_sec),		\
> +	(((long long int)(x)->tv_nsec - log_start.tv_nsec) / (100L * 1000))

This doesn't look right.  If x->tv_nsec - log_start.tv_nsec is
negative it will produce weird results.  Instead you need more complex
logic to carry a sufficient difference in the nsec over into the
seconds difference.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 05/11] log: Initialise timestamp for relative log time also if we use a log file
  2024-07-24 21:50 ` [PATCH 05/11] log: Initialise timestamp for relative log time also if we use a log file Stefano Brivio
@ 2024-07-25  3:35   ` David Gibson
  2024-07-25  7:51     ` Stefano Brivio
  0 siblings, 1 reply; 33+ messages in thread
From: David Gibson @ 2024-07-25  3:35 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 2413 bytes --]

On Wed, Jul 24, 2024 at 11:50:11PM +0200, Stefano Brivio wrote:
> ...not just for debug messages. Otherwise, timestamps in the log file
> are consistent but the starting point is not zero.
> 
> Do this right away as we enter main(), so that the resulting
> timestamps are as closely as possible relative to when we start.
> 
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

It does occur to me, though... when we're using relative timestamps,
should we be using the monotonic clock instead of the realtime clock?

> ---
>  log.c   | 4 +---
>  log.h   | 1 +
>  passt.c | 2 ++
>  3 files changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/log.c b/log.c
> index f57a54f..5f24c16 100644
> --- a/log.c
> +++ b/log.c
> @@ -40,7 +40,7 @@ static size_t	log_written;		/* Currently used bytes in log file */
>  static size_t	log_cut_size;		/* Bytes to cut at start on rotation */
>  static char	log_header[BUFSIZ];	/* File header, written back on cuts */
>  
> -static struct timespec log_start;	/* Start timestamp */
> +struct timespec	log_start;		/* Start timestamp */
>  
>  int		log_trace;		/* --trace mode enabled */
>  bool		log_conf_parsed;	/* Logging options already parsed */
> @@ -154,8 +154,6 @@ void __openlog(const char *ident, int option, int facility)
>  {
>  	(void)option;
>  
> -	clock_gettime(CLOCK_REALTIME, &log_start);
> -
>  	if (log_sock < 0) {
>  		struct sockaddr_un a = { .sun_family = AF_UNIX, };
>  
> diff --git a/log.h b/log.h
> index 51ddafa..e03199c 100644
> --- a/log.h
> +++ b/log.h
> @@ -44,6 +44,7 @@ void logmsg_perror(int pri, const char *format, ...)
>  extern int log_trace;
>  extern bool log_conf_parsed;
>  extern bool log_runtime;
> +extern struct timespec log_start;
>  
>  void trace_init(int enable);
>  #define trace(...)							\
> diff --git a/passt.c b/passt.c
> index eed74ec..72ad704 100644
> --- a/passt.c
> +++ b/passt.c
> @@ -207,6 +207,8 @@ int main(int argc, char **argv)
>  	struct timespec now;
>  	struct sigaction sa;
>  
> +	clock_gettime(CLOCK_REALTIME, &log_start);
> +
>  	arch_avx2_exec(argv);
>  
>  	isolate_initial();

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 06/11] test: Fix memory/passt tests, --netns-only is not a valid option for passt
  2024-07-24 21:50 ` [PATCH 06/11] test: Fix memory/passt tests, --netns-only is not a valid option for passt Stefano Brivio
@ 2024-07-25  3:50   ` David Gibson
  0 siblings, 0 replies; 33+ messages in thread
From: David Gibson @ 2024-07-25  3:50 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 4095 bytes --]

On Wed, Jul 24, 2024 at 11:50:12PM +0200, Stefano Brivio wrote:
> This used to work on my setup as I kept reusing an old mbuto
> (initramfs) image, but since commit 65923ba79877 ("conf: Accept
> duplicate and conflicting options, the last one wins"), --netns-only
> is, as originally intended, a pasta-only option.
> 
> I had used --netns-only, here, to prevent passt from trying to detach
> its own user namespace, which is not permitted as we're in a chroot,
> see unshare(2). In turn, we need the chroot because passt can't pivot
> root directly into its own empty filesystem using an initramfs.
> 
> Use switch_root into the tmpfs mountpoint instead of chroot, so that
> we can still detach user namespaces.
> 
> Note that in the mbuto images, we can't switch to nobody as we have
> no password entries at all, so we need to detach a further user
> namespace before starting passt, to trick passt into running as UID
> 0.
> 
> Given the new sequence, it's now more convenient to directly switch
> to a detached network namespace as well, which means we need to move
> the initialisation of the dummy network from the init script into the
> test script.
> 
> Reported-by: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>

Excellent, I can run these tests again.

Tested-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  test/memory/passt    | 13 ++++++++++---
>  test/passt.mem.mbuto |  9 +--------
>  2 files changed, 11 insertions(+), 11 deletions(-)
> 
> diff --git a/test/memory/passt b/test/memory/passt
> index 1193af8..bf78c8f 100644
> --- a/test/memory/passt
> +++ b/test/memory/passt
> @@ -44,7 +44,7 @@ endef
>  def	start_stop_diff
>  guest	sed /proc/slabinfo -ne 's/^\([^ ]* *[^ ]* *[^ ]* *[^ ]*\).*/\\\1/p' > /tmp/slabinfo.before
>  guest	cat /proc/meminfo > /tmp/meminfo.before
> -guest	/bin/passt.avx2 -l /tmp/log -s /tmp/sock -P /tmp/pid __OPTS__ --netns-only
> +guest	/bin/passt.avx2 -l /tmp/log -s /tmp/sock -P /tmp/pid __OPTS__
>  sleep	2
>  guest	cat /proc/meminfo > /tmp/meminfo.after
>  guest	sed /proc/slabinfo -ne 's/^\([^ ]* *[^ ]* *[^ ]* *[^ ]*\).*/\\\1/p' > /tmp/slabinfo.after
> @@ -78,9 +78,16 @@ guest	mount -o bind /proc /test/proc
>  guest	mount -o bind /dev /test/dev
>  guest	cp -Lr /bin /lib /lib64 /usr /sbin /test/
>  
> +guest	exec switch_root /test /bin/sh
> +
>  guest	ulimit -Hn 300000
> -guest	unshare -rUm -R /test
> -guest	chroot .
> +guest	unshare -rUn
> +guest	ip link add eth0 type dummy
> +guest	ip link set eth0 up
> +guest	ip address add 192.0.2.2/24 dev eth0
> +guest	ip address add 2001:db8::2/64 dev eth0
> +guest	ip route add default via 192.0.2.1
> +guest	ip -6 route add default via 2001:db8::1 dev eth0
>  
>  guest	meminfo_size() { grep "^$2:" $1 | tr -s ' ' | cut -f2 -d ' '; }
>  guest	meminfo_diff() { echo $(( $(meminfo_size $2 $3) - $(meminfo_size $1 $3) )); }
> diff --git a/test/passt.mem.mbuto b/test/passt.mem.mbuto
> index 56f5139..532eae0 100755
> --- a/test/passt.mem.mbuto
> +++ b/test/passt.mem.mbuto
> @@ -12,7 +12,7 @@
>  
>  PROGS="${PROGS:-ash,dash,bash chmod ip mount insmod mkdir ln cat chmod modprobe
>         grep mknod sed chown sleep bc ls ps mount unshare chroot cp kill diff
> -       head tail sort tr tee cut nm which}"
> +       head tail sort tr tee cut nm which switch_root}"
>  
>  KMODS="${KMODS:- dummy}"
>  
> @@ -29,13 +29,6 @@ COPIES="${COPIES} ../passt.avx2,/bin/passt.avx2"
>  FIXUP="${FIXUP}"'
>  ln -s /bin /usr/bin
>  chmod 777 /tmp
> -ip link add eth0 type dummy
> -ip link set eth0 up
> -ip address add 192.0.2.2/24 dev eth0
> -ip address add 2001:db8::2/64 dev eth0
> -ip route add default via 192.0.2.1
> -ip -6 route add default via 2001:db8::1 dev eth0
> -sleep 2
>  sh +m
>  '
>  

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 07/11] test: Update names of symbols and slabinfo entries
  2024-07-24 21:50 ` [PATCH 07/11] test: Update names of symbols and slabinfo entries Stefano Brivio
@ 2024-07-25  3:54   ` David Gibson
  0 siblings, 0 replies; 33+ messages in thread
From: David Gibson @ 2024-07-25  3:54 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 1859 bytes --]

On Wed, Jul 24, 2024 at 11:50:13PM +0200, Stefano Brivio wrote:
> Differences in allocated Acpi-Parse entries are gone (at least) since
> the 6.1 Linux kernel series. I should run this on a 6.10 kernel,
> eventually, and adjust things further, as needed.
> 
> Userspace symbols are also fairly different now: show whatever is more
> than 1 MiB at the moment.
> 
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>

And now I get more useful results here too.

Tested-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  test/memory/passt | 22 +++++-----------------
>  1 file changed, 5 insertions(+), 17 deletions(-)
> 
> diff --git a/test/memory/passt b/test/memory/passt
> index bf78c8f..7e45724 100644
> --- a/test/memory/passt
> +++ b/test/memory/passt
> @@ -110,27 +110,17 @@ info
>  th	symbol MiB
>  set	WHAT tcp_buf_discard
>  nm_row
> -set	WHAT tcp6_l2_buf
> +set	WHAT flowtab
>  nm_row
> -set	WHAT tcp4_l2_buf
> +set	WHAT tcp6_payload
>  nm_row
> -set	WHAT tc
> +set	WHAT tcp4_payload
>  nm_row
>  set	WHAT pkt_buf
>  nm_row
> -set	WHAT udp_splice_map
> +set	WHAT udp_payload
>  nm_row
> -set	WHAT udp6_l2_buf
> -nm_row
> -set	WHAT udp4_l2_buf
> -nm_row
> -set	WHAT udp_tap_map
> -nm_row
> -set	WHAT icmp_id_map
> -nm_row
> -set	WHAT udp_splice_buf
> -nm_row
> -set	WHAT tc_hash
> +set	WHAT flow_hashtab
>  nm_row
>  set	WHAT pool_tap6_storage
>  nm_row
> @@ -149,8 +139,6 @@ set	WHAT pid
>  slab_row
>  set	WHAT dentry
>  slab_row
> -set	WHAT Acpi-Parse
> -slab_row
>  set	WHAT kmalloc-64
>  slab_row
>  set	WHAT kmalloc-32

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 09/11] tap: Exit if we fail to bind a UNIX domain socket with explicit path
  2024-07-24 21:50 ` [PATCH 09/11] tap: Exit if we fail to bind a UNIX domain socket with explicit path Stefano Brivio
@ 2024-07-25  4:00   ` David Gibson
  0 siblings, 0 replies; 33+ messages in thread
From: David Gibson @ 2024-07-25  4:00 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 1452 bytes --]

On Wed, Jul 24, 2024 at 11:50:15PM +0200, Stefano Brivio wrote:
> In tap_sock_unix_open(), if we have a given path for the socket from
> configuration, we don't need to loop over possible paths, so we exit
> the loop on the first iteration, unconditionally.
> 
> But if we failed to bind() the socket to that explicit path, we should
> exit, instead of continuing. Otherwise we'll pretend we're up and
> running, but nobody can contact us, and this might be mildly confusing
> for users.
> 
> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2299474
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  tap.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/tap.c b/tap.c
> index 6930ad8..44bd444 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -1139,8 +1139,11 @@ int tap_sock_unix_open(char *sock_path)
>  		close(ex);
>  
>  		unlink(path);
> -		if (!bind(fd, (const struct sockaddr *)&addr, sizeof(addr)) ||
> -		    *sock_path)
> +		ret = bind(fd, (const struct sockaddr *)&addr, sizeof(addr));
> +		if (*sock_path && ret)
> +			die_perror("Failed to bind UNIX domain socket");
> +
> +		if (!ret)
>  			break;
>  	}
>  

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 08/11] test: iperf3 3.16 introduces multiple threads, drop our own implementation of that
  2024-07-24 21:50 ` [PATCH 08/11] test: iperf3 3.16 introduces multiple threads, drop our own implementation of that Stefano Brivio
@ 2024-07-25  4:23   ` David Gibson
  0 siblings, 0 replies; 33+ messages in thread
From: David Gibson @ 2024-07-25  4:23 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 902 bytes --]

On Wed, Jul 24, 2024 at 11:50:14PM +0200, Stefano Brivio wrote:
> Starting from iperf3 version 3.16, -P / --parallel spawns multiple
> clients as separate threads, instead of multiple streams serviced by
> the same thread.
> 
> So we can drop our lib/test implementation to spawn several iperf3
> client and server processes and finally simplify things quite a bit.
> 
> Adjust number of threads and UDP sending bandwidth to values that seem
> to be more or less matching previous throughput tests on my setup.
> 
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>

Lovely.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Tested-by: David Gibson <david@gibson.dropbear.id.au>

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 10/11] tap: Discard guest data on length descriptor mismatch
  2024-07-24 21:50 ` [PATCH 10/11] tap: Discard guest data on length descriptor mismatch Stefano Brivio
@ 2024-07-25  4:37   ` David Gibson
  2024-07-25  9:15     ` Stefano Brivio
  0 siblings, 1 reply; 33+ messages in thread
From: David Gibson @ 2024-07-25  4:37 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 3229 bytes --]

On Wed, Jul 24, 2024 at 11:50:16PM +0200, Stefano Brivio wrote:
> This was reported by Matej a while ago, but we forgot to fix it. Even
> if the hypervisor is necessarily trusted by passt, as it can in any
> case terminate the guest or disrupt guest connectivity, it's a good
> idea to be robust against possible issues.
> 
> Instead of resetting the connection to the hypervisor, just discard
> the data we read with a single recv(), as we had a few cases where
> QEMU would get the length descriptor wrong, in the past.
> 
> While at it, change l2len in tap_handler_passt() to uint32_t, as the
> length descriptor is logically unsigned and 32-bit wide.
> 
> Reported-by: Matej Hrica <mhrica@redhat.com>
> Suggested-by: Matej Hrica <mhrica@redhat.com>
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> ---
>  tap.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/tap.c b/tap.c
> index 44bd444..62ba6a4 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -1011,15 +1011,18 @@ redo:
>  	}
>  
>  	while (n > (ssize_t)sizeof(uint32_t)) {
> -		ssize_t l2len = ntohl(*(uint32_t *)p);
> +		uint32_t l2len = ntohl(*(uint32_t *)p);
>  
>  		p += sizeof(uint32_t);
>  		n -= sizeof(uint32_t);
>  
> +		if (l2len > (ssize_t)TAP_BUF_BYTES - n)
> +			return;

Neither the condition nor the action makes much sense to me here.
We're testing if the frame can fit in the the remaining buffer space.
But we may have already read part (or all) of the frame - i.e. it's
included in 'n'.  So I don't see how that condition is useful.

Then, simply returning doesn't seem right under pretty much any
circumstances - that discards some amount of data, and leaves us in an
unsynchronized state w.r.t. the frame boundaries.

If this is just supposed to be a sanity check on the frame length,
then I think we'd be better off with a fixed limit - 64kiB is the
obvious choice.  If we hit that, we can warn() and discard data up to
the end of the too-large frame.  That at least has a chance of letting
us recover and move on to future acceptable frames.

>  		/* At most one packet might not fit in a single read, and this
>  		 * needs to be blocking.
>  		 */
> -		if (l2len > n) {
> +		if (l2len > (size_t)n) {
>  			rem = recv(c->fd_tap, p + n, l2len - n, 0);
>  			if ((n += rem) != l2len)
>  				return;

Pre-existing, but a 'return' here basically lands us in a situation we
have no meaningful chance of recovering from.  A die() would be
preferable.  Better yet would be continuing to re-recv() until we have
the whole frame, similar to what we do for write_remainder().

> @@ -1028,8 +1031,7 @@ redo:
>  		/* Complete the partial read above before discarding a malformed
>  		 * frame, otherwise the stream will be inconsistent.
>  		 */
> -		if (l2len < (ssize_t)sizeof(struct ethhdr) ||
> -		    l2len > (ssize_t)ETH_MAX_MTU)
> +		if (l2len < sizeof(struct ethhdr) || l2len > ETH_MAX_MTU)
>  			goto next;
>  		tap_add_packet(c, l2len, p);

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 11/11] conf: Accept addresses enclosed by square brackets in port forwarding specifiers
  2024-07-24 21:50 ` [PATCH 11/11] conf: Accept addresses enclosed by square brackets in port forwarding specifiers Stefano Brivio
@ 2024-07-25  4:39   ` David Gibson
  2024-07-25 11:26     ` Stefano Brivio
  0 siblings, 1 reply; 33+ messages in thread
From: David Gibson @ 2024-07-25  4:39 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 2027 bytes --]

On Wed, Jul 24, 2024 at 11:50:17PM +0200, Stefano Brivio wrote:
> Even though we don't use : as delimiter for the port, making square
> brackets unneeded, RFC 3986, section 3.2.2, mandates them for IPv6
> literals. We want IPv6 addresses there, but some users might still
> specify them out of habit.
> 
> Same for IPv4 addresses: RFC 3986 doesn't specify square brackets for
> IPv4 literals, but I had reports of users actually trying to use them
> (they're accepted by many tools).
> 
> Allow square brackets for both IPv4 and IPv6 addresses, correct or
> not, they're harmless anyway.
> 
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  conf.c | 24 +++++++++++++++++-------
>  1 file changed, 17 insertions(+), 7 deletions(-)
> 
> diff --git a/conf.c b/conf.c
> index 3cf9ed8..338742e 100644
> --- a/conf.c
> +++ b/conf.c
> @@ -209,14 +209,24 @@ static void conf_ports(const struct ctx *c, char optname, const char *optarg,
>  
>  		}
>  
> -		if (ifname == buf + 1)		/* Interface without address */
> +		if (ifname == buf + 1) {	/* Interface without address */
>  			addr = NULL;
> -		else if (inet_pton(AF_INET, buf, addr))
> -			af = AF_INET;
> -		else if (inet_pton(AF_INET6, buf, addr))
> -			af = AF_INET6;
> -		else
> -			goto bad;
> +		} else {
> +			p = buf;
> +
> +			/* Allow square brackets for IPv4 too for convenience */
> +			if (*p == '[' && p[strlen(p) - 1] == ']') {
> +				p[strlen(p) - 1] = 0;

Nit: I think = '\0' would make the intention here clearer.

> +				p++;
> +			}
> +
> +			if (inet_pton(AF_INET, p, addr))
> +				af = AF_INET;
> +			else if (inet_pton(AF_INET6, p, addr))
> +				af = AF_INET6;
> +			else
> +				goto bad;
> +		}
>  	} else {
>  		spec = buf;
>  

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 04/11] log: Fix sub-second part in relative log time calculation
  2024-07-25  3:32   ` David Gibson
@ 2024-07-25  7:51     ` Stefano Brivio
  0 siblings, 0 replies; 33+ messages in thread
From: Stefano Brivio @ 2024-07-25  7:51 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Thu, 25 Jul 2024 13:32:52 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Wed, Jul 24, 2024 at 11:50:10PM +0200, Stefano Brivio wrote:
> >
> > [...]
> > 
> > +/**
> > + * logtime_fmt_and_arg() - Build format and arguments to print relative log time
> > + * @x:		Current timestamp
> > + */
> > +#define logtime_fmt_and_arg(x)						\
> > +	"%lli.%04lli",							\
> > +	((long long int)(x)->tv_sec - log_start.tv_sec),		\
> > +	(((long long int)(x)->tv_nsec - log_start.tv_nsec) / (100L * 1000))  
> 
> This doesn't look right.  If x->tv_nsec - log_start.tv_nsec is
> negative it will produce weird results.  Instead you need more complex
> logic to carry a sufficient difference in the nsec over into the
> seconds difference.

Oh, oops, of course.

I was hoping to recycle this from the Linux kernel, where we have
timespec_sub(), but that implementation looks a bit too complicated for
my taste. I'm adding something simpler now.

-- 
Stefano


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 05/11] log: Initialise timestamp for relative log time also if we use a log file
  2024-07-25  3:35   ` David Gibson
@ 2024-07-25  7:51     ` Stefano Brivio
  0 siblings, 0 replies; 33+ messages in thread
From: Stefano Brivio @ 2024-07-25  7:51 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Thu, 25 Jul 2024 13:35:25 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Wed, Jul 24, 2024 at 11:50:11PM +0200, Stefano Brivio wrote:
> > ...not just for debug messages. Otherwise, timestamps in the log file
> > are consistent but the starting point is not zero.
> > 
> > Do this right away as we enter main(), so that the resulting
> > timestamps are as closely as possible relative to when we start.
> > 
> > Signed-off-by: Stefano Brivio <sbrivio@redhat.com>  
> 
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> 
> It does occur to me, though... when we're using relative timestamps,
> should we be using the monotonic clock instead of the realtime clock?

Right, yes, I was under the impression that we could risk mixing those
with packet capture timestamps (where we want the realtime clock), but
actually it's never the case. I'll post another patch for that.

-- 
Stefano


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 10/11] tap: Discard guest data on length descriptor mismatch
  2024-07-25  4:37   ` David Gibson
@ 2024-07-25  9:15     ` Stefano Brivio
  2024-07-25 10:23       ` David Gibson
  0 siblings, 1 reply; 33+ messages in thread
From: Stefano Brivio @ 2024-07-25  9:15 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Thu, 25 Jul 2024 14:37:43 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Wed, Jul 24, 2024 at 11:50:16PM +0200, Stefano Brivio wrote:
> > This was reported by Matej a while ago, but we forgot to fix it. Even
> > if the hypervisor is necessarily trusted by passt, as it can in any
> > case terminate the guest or disrupt guest connectivity, it's a good
> > idea to be robust against possible issues.
> > 
> > Instead of resetting the connection to the hypervisor, just discard
> > the data we read with a single recv(), as we had a few cases where
> > QEMU would get the length descriptor wrong, in the past.
> > 
> > While at it, change l2len in tap_handler_passt() to uint32_t, as the
> > length descriptor is logically unsigned and 32-bit wide.
> > 
> > Reported-by: Matej Hrica <mhrica@redhat.com>
> > Suggested-by: Matej Hrica <mhrica@redhat.com>
> > Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> > ---
> >  tap.c | 10 ++++++----
> >  1 file changed, 6 insertions(+), 4 deletions(-)
> > 
> > diff --git a/tap.c b/tap.c
> > index 44bd444..62ba6a4 100644
> > --- a/tap.c
> > +++ b/tap.c
> > @@ -1011,15 +1011,18 @@ redo:
> >  	}
> >  
> >  	while (n > (ssize_t)sizeof(uint32_t)) {
> > -		ssize_t l2len = ntohl(*(uint32_t *)p);
> > +		uint32_t l2len = ntohl(*(uint32_t *)p);
> >  
> >  		p += sizeof(uint32_t);
> >  		n -= sizeof(uint32_t);
> >  
> > +		if (l2len > (ssize_t)TAP_BUF_BYTES - n)
> > +			return;  
> 
> Neither the condition nor the action makes much sense to me here.
> We're testing if the frame can fit in the the remaining buffer space.

Not really, we're just checking that the length descriptor fits the
remaining buffer space. We're using this in the second recv() below,
that's why it matters here.

> But we may have already read part (or all) of the frame - i.e. it's
> included in 'n'.  So I don't see how that condition is useful.

...that is, it has nothing to do with exceeding or not exceeding the
buffer on recv(), that's already taken care of by the recv() call,
implicitly.

> Then, simply returning doesn't seem right under pretty much any
> circumstances - that discards some amount of data, and leaves us in an
> unsynchronized state w.r.t. the frame boundaries.

That might happen, of course, but it might also happen that the
hypervisor sent us *one* corrupted buffer, and the next recv() will
read consistent data.

> If this is just supposed to be a sanity check on the frame length,
> then I think we'd be better off with a fixed limit - 64kiB is the
> obvious choice.

That's already checked below (l2len > ETH_MAX_MTU), and...

> If we hit that, we can warn() and discard data up to
> the end of the too-large frame.  That at least has a chance of letting
> us recover and move on to future acceptable frames.

that's exactly what we do in that case (goto next).

> >  		/* At most one packet might not fit in a single read, and this
> >  		 * needs to be blocking.
> >  		 */
> > -		if (l2len > n) {
> > +		if (l2len > (size_t)n) {
> >  			rem = recv(c->fd_tap, p + n, l2len - n, 0);
                                              ^^^^^^^^^^^^^^^^

This the reason why the check above is relevant.

> >  			if ((n += rem) != l2len)
> >  				return;  
> 
> Pre-existing, but a 'return' here basically lands us in a situation we
> have no meaningful chance of recovering from.  A die() would be
> preferable.  Better yet would be continuing to re-recv() until we have
> the whole frame, similar to what we do for write_remainder().

Same as above, it depends on what failure you're assuming. If it's just
one botched recv(), instead, we recv() again the next time and we
recover.

But yes, the first attempt should probably be to recv() the rest of the
frame. I didn't implement this because it adds complexity and I think
that, eventually, we should turn this into a proper ringbuffer anyway.

> > @@ -1028,8 +1031,7 @@ redo:
> >  		/* Complete the partial read above before discarding a malformed
> >  		 * frame, otherwise the stream will be inconsistent.
> >  		 */
> > -		if (l2len < (ssize_t)sizeof(struct ethhdr) ||
> > -		    l2len > (ssize_t)ETH_MAX_MTU)
> > +		if (l2len < sizeof(struct ethhdr) || l2len > ETH_MAX_MTU)
> >  			goto next;
> >  		tap_add_packet(c, l2len, p);  

-- 
Stefano


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 10/11] tap: Discard guest data on length descriptor mismatch
  2024-07-25  9:15     ` Stefano Brivio
@ 2024-07-25 10:23       ` David Gibson
  2024-07-25 11:09         ` Stefano Brivio
  0 siblings, 1 reply; 33+ messages in thread
From: David Gibson @ 2024-07-25 10:23 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 6085 bytes --]

On Thu, Jul 25, 2024 at 11:15:03AM +0200, Stefano Brivio wrote:
> On Thu, 25 Jul 2024 14:37:43 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Wed, Jul 24, 2024 at 11:50:16PM +0200, Stefano Brivio wrote:
> > > This was reported by Matej a while ago, but we forgot to fix it. Even
> > > if the hypervisor is necessarily trusted by passt, as it can in any
> > > case terminate the guest or disrupt guest connectivity, it's a good
> > > idea to be robust against possible issues.
> > > 
> > > Instead of resetting the connection to the hypervisor, just discard
> > > the data we read with a single recv(), as we had a few cases where
> > > QEMU would get the length descriptor wrong, in the past.
> > > 
> > > While at it, change l2len in tap_handler_passt() to uint32_t, as the
> > > length descriptor is logically unsigned and 32-bit wide.
> > > 
> > > Reported-by: Matej Hrica <mhrica@redhat.com>
> > > Suggested-by: Matej Hrica <mhrica@redhat.com>
> > > Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> > > ---
> > >  tap.c | 10 ++++++----
> > >  1 file changed, 6 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/tap.c b/tap.c
> > > index 44bd444..62ba6a4 100644
> > > --- a/tap.c
> > > +++ b/tap.c
> > > @@ -1011,15 +1011,18 @@ redo:
> > >  	}
> > >  
> > >  	while (n > (ssize_t)sizeof(uint32_t)) {
> > > -		ssize_t l2len = ntohl(*(uint32_t *)p);
> > > +		uint32_t l2len = ntohl(*(uint32_t *)p);
> > >  
> > >  		p += sizeof(uint32_t);
> > >  		n -= sizeof(uint32_t);
> > >  
> > > +		if (l2len > (ssize_t)TAP_BUF_BYTES - n)
> > > +			return;  
> > 
> > Neither the condition nor the action makes much sense to me here.
> > We're testing if the frame can fit in the the remaining buffer space.
> 
> Not really, we're just checking that the length descriptor fits the
> remaining buffer space. We're using this in the second recv() below,
> that's why it matters here.

But AFAICT, what we need to know is if the remainder of the frame fits
in the buffer.  That could be less than the length descriptor if we've
already recv()ed part of a frame.

> > But we may have already read part (or all) of the frame - i.e. it's
> > included in 'n'.  So I don't see how that condition is useful.
> 
> ...that is, it has nothing to do with exceeding or not exceeding the
> buffer on recv(), that's already taken care of by the recv() call,
> implicitly.
> 
> > Then, simply returning doesn't seem right under pretty much any
> > circumstances - that discards some amount of data, and leaves us in an
> > unsynchronized state w.r.t. the frame boundaries.
> 
> That might happen, of course, but it might also happen that the
> hypervisor sent us *one* corrupted buffer, and the next recv() will
> read consistent data.

Well, sure, it's possible, but it doesn't seem particularly likely to
me.  AFAICT this is a stream which we need every length field to
interpret properly.  If we lose one, or it's corrupted, I think we're
done for.

> > If this is just supposed to be a sanity check on the frame length,
> > then I think we'd be better off with a fixed limit - 64kiB is the
> > obvious choice.
> 
> That's already checked below (l2len > ETH_MAX_MTU), and...

Right.  I wonder if it would make sense to do that earlier.

> > If we hit that, we can warn() and discard data up to
> > the end of the too-large frame.  That at least has a chance of letting
> > us recover and move on to future acceptable frames.
> 
> that's exactly what we do in that case (goto next).

Only for the case that the length is too long, but not *too* long.  In
particular it needs to fit in the buffer to even get there.  If we
sanity checked the frame length earlier we could use MSG_TRUNC to
discard even a ludicrously large frame and still continue on to the
next one.

> > >  		/* At most one packet might not fit in a single read, and this
> > >  		 * needs to be blocking.
> > >  		 */
> > > -		if (l2len > n) {
> > > +		if (l2len > (size_t)n) {
> > >  			rem = recv(c->fd_tap, p + n, l2len - n, 0);
>                                               ^^^^^^^^^^^^^^^^
> 
> This the reason why the check above is relevant.

Relevant, sure, but I still don't think it's right.  Actually
(TAP_BUF_BYTES - n) is an even stranger quantity than I initially
thought.  It's the total space of the buffer minus the current partial
frame - counting *both* the stuff before our partial frame and after
it.

I think instead we need to check for (p + l2len > pkt_buf + TAP_BUF_BYTES).

> > >  			if ((n += rem) != l2len)
> > >  				return;  
> > 
> > Pre-existing, but a 'return' here basically lands us in a situation we
> > have no meaningful chance of recovering from.  A die() would be
> > preferable.  Better yet would be continuing to re-recv() until we have
> > the whole frame, similar to what we do for write_remainder().
> 
> Same as above, it depends on what failure you're assuming. If it's just
> one botched recv(), instead, we recv() again the next time and we
> recover.

Even if it's just one bad recv(), we still have no idea where we are
w.r.t. frame boundaries, so I can't see any way we could recover.

> But yes, the first attempt should probably be to recv() the rest of the
> frame. I didn't implement this because it adds complexity and I think
> that, eventually, we should turn this into a proper ringbuffer anyway.
> 
> > > @@ -1028,8 +1031,7 @@ redo:
> > >  		/* Complete the partial read above before discarding a malformed
> > >  		 * frame, otherwise the stream will be inconsistent.
> > >  		 */
> > > -		if (l2len < (ssize_t)sizeof(struct ethhdr) ||
> > > -		    l2len > (ssize_t)ETH_MAX_MTU)
> > > +		if (l2len < sizeof(struct ethhdr) || l2len > ETH_MAX_MTU)
> > >  			goto next;
> > >  		tap_add_packet(c, l2len, p);  
> 

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 10/11] tap: Discard guest data on length descriptor mismatch
  2024-07-25 10:23       ` David Gibson
@ 2024-07-25 11:09         ` Stefano Brivio
  2024-07-26  1:22           ` David Gibson
  0 siblings, 1 reply; 33+ messages in thread
From: Stefano Brivio @ 2024-07-25 11:09 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Thu, 25 Jul 2024 20:23:55 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Thu, Jul 25, 2024 at 11:15:03AM +0200, Stefano Brivio wrote:
> > On Thu, 25 Jul 2024 14:37:43 +1000
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> > > On Wed, Jul 24, 2024 at 11:50:16PM +0200, Stefano Brivio wrote:  
> > > > This was reported by Matej a while ago, but we forgot to fix it. Even
> > > > if the hypervisor is necessarily trusted by passt, as it can in any
> > > > case terminate the guest or disrupt guest connectivity, it's a good
> > > > idea to be robust against possible issues.
> > > > 
> > > > Instead of resetting the connection to the hypervisor, just discard
> > > > the data we read with a single recv(), as we had a few cases where
> > > > QEMU would get the length descriptor wrong, in the past.
> > > > 
> > > > While at it, change l2len in tap_handler_passt() to uint32_t, as the
> > > > length descriptor is logically unsigned and 32-bit wide.
> > > > 
> > > > Reported-by: Matej Hrica <mhrica@redhat.com>
> > > > Suggested-by: Matej Hrica <mhrica@redhat.com>
> > > > Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> > > > ---
> > > >  tap.c | 10 ++++++----
> > > >  1 file changed, 6 insertions(+), 4 deletions(-)
> > > > 
> > > > diff --git a/tap.c b/tap.c
> > > > index 44bd444..62ba6a4 100644
> > > > --- a/tap.c
> > > > +++ b/tap.c
> > > > @@ -1011,15 +1011,18 @@ redo:
> > > >  	}
> > > >  
> > > >  	while (n > (ssize_t)sizeof(uint32_t)) {
> > > > -		ssize_t l2len = ntohl(*(uint32_t *)p);
> > > > +		uint32_t l2len = ntohl(*(uint32_t *)p);
> > > >  
> > > >  		p += sizeof(uint32_t);
> > > >  		n -= sizeof(uint32_t);
> > > >  
> > > > +		if (l2len > (ssize_t)TAP_BUF_BYTES - n)
> > > > +			return;    
> > > 
> > > Neither the condition nor the action makes much sense to me here.
> > > We're testing if the frame can fit in the the remaining buffer space.  
> > 
> > Not really, we're just checking that the length descriptor fits the
> > remaining buffer space. We're using this in the second recv() below,
> > that's why it matters here.  
> 
> But AFAICT, what we need to know is if the remainder of the frame fits
> in the buffer.

It always does because of how TAP_BUF_BYTES and TAP_BUF_FILL are
defined.

> That could be less than the length descriptor if we've
> already recv()ed part of a frame.
> 
> > > But we may have already read part (or all) of the frame - i.e. it's
> > > included in 'n'.  So I don't see how that condition is useful.  
> > 
> > ...that is, it has nothing to do with exceeding or not exceeding the
> > buffer on recv(), that's already taken care of by the recv() call,
> > implicitly.
> >   
> > > Then, simply returning doesn't seem right under pretty much any
> > > circumstances - that discards some amount of data, and leaves us in an
> > > unsynchronized state w.r.t. the frame boundaries.  
> > 
> > That might happen, of course, but it might also happen that the
> > hypervisor sent us *one* corrupted buffer, and the next recv() will
> > read consistent data.  
> 
> Well, sure, it's possible, but it doesn't seem particularly likely to
> me.  AFAICT this is a stream which we need every length field to
> interpret properly.  If we lose one, or it's corrupted, I think we're
> done for.

In most cases, the content of one recv() corresponds to a given number
of whole frames, so what we're doing by returning here is, practically
speaking, similar to what you're suggesting below with MSG_TRUNC.

> > > If this is just supposed to be a sanity check on the frame length,
> > > then I think we'd be better off with a fixed limit - 64kiB is the
> > > obvious choice.  
> > 
> > That's already checked below (l2len > ETH_MAX_MTU), and...  
> 
> Right.  I wonder if it would make sense to do that earlier.

We can. But right now what I want to fix here is just that missing
check, because it actually causes passt to crash (even though we assume
a trusted hypervisor, so it's not really security relevant, but not nice
either).

> > > If we hit that, we can warn() and discard data up to
> > > the end of the too-large frame.  That at least has a chance of letting
> > > us recover and move on to future acceptable frames.  
> > 
> > that's exactly what we do in that case (goto next).  
> 
> Only for the case that the length is too long, but not *too* long.  In
> particular it needs to fit in the buffer to even get there.  If we
> sanity checked the frame length earlier we could use MSG_TRUNC to
> discard even a ludicrously large frame and still continue on to the
> next one.

We don't know if the length descriptor actually matches the length of
the frame, though. If you have a way you think is more robust, would
you mind sending a patch?

> > > >  		/* At most one packet might not fit in a single read, and this
> > > >  		 * needs to be blocking.
> > > >  		 */
> > > > -		if (l2len > n) {
> > > > +		if (l2len > (size_t)n) {
> > > >  			rem = recv(c->fd_tap, p + n, l2len - n, 0);  
> >                                               ^^^^^^^^^^^^^^^^
> > 
> > This the reason why the check above is relevant.  
> 
> Relevant, sure, but I still don't think it's right.  Actually
> (TAP_BUF_BYTES - n) is an even stranger quantity than I initially
> thought.  It's the total space of the buffer minus the current partial
> frame - counting *both* the stuff before our partial frame and after
> it.

That should be enough to make sure we don't have bad side effects on
this second recv(), but yes:

> I think instead we need to check for (p + l2len > pkt_buf + TAP_BUF_BYTES).

...that is actually more accurate. I can fix this up in another
version, unless you can think of a more comprehensive change that also
gives us better possibilities to recover from a corrupted stream.

> > > >  			if ((n += rem) != l2len)
> > > >  				return;    
> > > 
> > > Pre-existing, but a 'return' here basically lands us in a situation we
> > > have no meaningful chance of recovering from.  A die() would be
> > > preferable.  Better yet would be continuing to re-recv() until we have
> > > the whole frame, similar to what we do for write_remainder().  
> > 
> > Same as above, it depends on what failure you're assuming. If it's just
> > one botched recv(), instead, we recv() again the next time and we
> > recover.  
> 
> Even if it's just one bad recv(), we still have no idea where we are
> w.r.t. frame boundaries, so I can't see any way we could recover.

The idea is that the recv() is _usually_ big enough as to flush the
buffer anyway.

> > But yes, the first attempt should probably be to recv() the rest of the
> > frame. I didn't implement this because it adds complexity and I think
> > that, eventually, we should turn this into a proper ringbuffer anyway.
> >   
> > > > @@ -1028,8 +1031,7 @@ redo:
> > > >  		/* Complete the partial read above before discarding a malformed
> > > >  		 * frame, otherwise the stream will be inconsistent.
> > > >  		 */
> > > > -		if (l2len < (ssize_t)sizeof(struct ethhdr) ||
> > > > -		    l2len > (ssize_t)ETH_MAX_MTU)
> > > > +		if (l2len < sizeof(struct ethhdr) || l2len > ETH_MAX_MTU)
> > > >  			goto next;
> > > >  		tap_add_packet(c, l2len, p);    
> >   
> 

-- 
Stefano


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 11/11] conf: Accept addresses enclosed by square brackets in port forwarding specifiers
  2024-07-25  4:39   ` David Gibson
@ 2024-07-25 11:26     ` Stefano Brivio
  0 siblings, 0 replies; 33+ messages in thread
From: Stefano Brivio @ 2024-07-25 11:26 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Thu, 25 Jul 2024 14:39:49 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Wed, Jul 24, 2024 at 11:50:17PM +0200, Stefano Brivio wrote:
> > Even though we don't use : as delimiter for the port, making square
> > brackets unneeded, RFC 3986, section 3.2.2, mandates them for IPv6
> > literals. We want IPv6 addresses there, but some users might still
> > specify them out of habit.
> > 
> > Same for IPv4 addresses: RFC 3986 doesn't specify square brackets for
> > IPv4 literals, but I had reports of users actually trying to use them
> > (they're accepted by many tools).
> > 
> > Allow square brackets for both IPv4 and IPv6 addresses, correct or
> > not, they're harmless anyway.
> > 
> > Signed-off-by: Stefano Brivio <sbrivio@redhat.com>  
> 
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> 
> > ---
> >  conf.c | 24 +++++++++++++++++-------
> >  1 file changed, 17 insertions(+), 7 deletions(-)
> > 
> > diff --git a/conf.c b/conf.c
> > index 3cf9ed8..338742e 100644
> > --- a/conf.c
> > +++ b/conf.c
> > @@ -209,14 +209,24 @@ static void conf_ports(const struct ctx *c, char optname, const char *optarg,
> >  
> >  		}
> >  
> > -		if (ifname == buf + 1)		/* Interface without address */
> > +		if (ifname == buf + 1) {	/* Interface without address */
> >  			addr = NULL;
> > -		else if (inet_pton(AF_INET, buf, addr))
> > -			af = AF_INET;
> > -		else if (inet_pton(AF_INET6, buf, addr))
> > -			af = AF_INET6;
> > -		else
> > -			goto bad;
> > +		} else {
> > +			p = buf;
> > +
> > +			/* Allow square brackets for IPv4 too for convenience */
> > +			if (*p == '[' && p[strlen(p) - 1] == ']') {
> > +				p[strlen(p) - 1] = 0;  
> 
> Nit: I think = '\0' would make the intention here clearer.

Right, fixed on merge.

-- 
Stefano


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 03/11] log: Drop newlines in the middle of the perror()-like messages
  2024-07-25  3:26   ` David Gibson
@ 2024-07-25 11:27     ` Stefano Brivio
  2024-07-26  0:33       ` David Gibson
  0 siblings, 1 reply; 33+ messages in thread
From: Stefano Brivio @ 2024-07-25 11:27 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Thu, 25 Jul 2024 13:26:47 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Wed, Jul 24, 2024 at 11:50:09PM +0200, Stefano Brivio wrote:
> > Calling vlogmsg() twice from logmsg_perror() results in this beauty:
> > 
> >   $ ./pasta -i foo
> >   Invalid interface name foo
> >   : No such device
> > 
> > because the first part of the message, corresponding to the first
> > call, doesn't end with a newline, and vlogmsg() adds it.
> > 
> > Given that we can't easily append an argument (error description) to
> > a variadic list, add a 'newline' parameter to all the functions that
> > currently add a newline if missing, and disable that on the first call
> > to vlogmsg() from logmsg_perror(). Not very pretty but I can't think
> > of any solution that's less messy than this.
> > 
> > Signed-off-by: Stefano Brivio <sbrivio@redhat.com>  
> 
> I think my personal inclination would be to rename all the
> lowest-level functions slightly and remove the newline adding logic
> unconditionally.  The create wrappers under the old name which add the
> "\n".  I think that can be done in an easy macro, since the "\n" can
> be constant string appended to the format string.  Just the special
> paths that need to suppress the newline would call the low level "no
> newline" variants.

I gave it a try, but the problem is that the "easy macro" needs to be
conditional, depending on whether the newline is there or not: we don't
want to add a newline in case somebody already added it by mistake (or
habit).

That could probably be done as well with an intermediate function, but
it's getting a bit too complicated (at least for me right now).

> But, I don't actually care that much so
> 
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

-- 
Stefano


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 00/11] Minor assorted fixes, mostly logging and tests
  2024-07-24 21:50 [PATCH 00/11] Minor assorted fixes, mostly logging and tests Stefano Brivio
                   ` (10 preceding siblings ...)
  2024-07-24 21:50 ` [PATCH 11/11] conf: Accept addresses enclosed by square brackets in port forwarding specifiers Stefano Brivio
@ 2024-07-25 11:28 ` Stefano Brivio
  11 siblings, 0 replies; 33+ messages in thread
From: Stefano Brivio @ 2024-07-25 11:28 UTC (permalink / raw)
  To: passt-dev; +Cc: David Gibson

On Wed, 24 Jul 2024 23:50:06 +0200
Stefano Brivio <sbrivio@redhat.com> wrote:

> Here are a bunch of mostly unrelated assorted fixes for small issues
> I found in the past few weeks but didn't really find time to report.
> 
> Stefano Brivio (11):
>   tap: Don't quit if pasta gets EIO on writev() to tap, interface might
>     be down
>   tcp: Change SO_PEEK_OFF support message to debug()
>   log: Drop newlines in the middle of the perror()-like messages
>   log: Fix sub-second part in relative log time calculation
>   log: Initialise timestamp for relative log time also if we use a log
>     file
>   test: Fix memory/passt tests, --netns-only is not a valid option for
>     passt
>   test: Update names of symbols and slabinfo entries
>   test: iperf3 3.16 introduces multiple threads, drop our own
>     implementation of that
>   tap: Exit if we fail to bind a UNIX domain socket with explicit path
>   tap: Discard guest data on length descriptor mismatch
>   conf: Accept addresses enclosed by square brackets in port forwarding
>     specifiers

I applied these except for 4/11, 5/11, and 10/11.

I'll send a new version at least for 4/11 and 5/11 in a bit.

-- 
Stefano


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 03/11] log: Drop newlines in the middle of the perror()-like messages
  2024-07-25 11:27     ` Stefano Brivio
@ 2024-07-26  0:33       ` David Gibson
  0 siblings, 0 replies; 33+ messages in thread
From: David Gibson @ 2024-07-26  0:33 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 2246 bytes --]

On Thu, Jul 25, 2024 at 01:27:09PM +0200, Stefano Brivio wrote:
> On Thu, 25 Jul 2024 13:26:47 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Wed, Jul 24, 2024 at 11:50:09PM +0200, Stefano Brivio wrote:
> > > Calling vlogmsg() twice from logmsg_perror() results in this beauty:
> > > 
> > >   $ ./pasta -i foo
> > >   Invalid interface name foo
> > >   : No such device
> > > 
> > > because the first part of the message, corresponding to the first
> > > call, doesn't end with a newline, and vlogmsg() adds it.
> > > 
> > > Given that we can't easily append an argument (error description) to
> > > a variadic list, add a 'newline' parameter to all the functions that
> > > currently add a newline if missing, and disable that on the first call
> > > to vlogmsg() from logmsg_perror(). Not very pretty but I can't think
> > > of any solution that's less messy than this.
> > > 
> > > Signed-off-by: Stefano Brivio <sbrivio@redhat.com>  
> > 
> > I think my personal inclination would be to rename all the
> > lowest-level functions slightly and remove the newline adding logic
> > unconditionally.  The create wrappers under the old name which add the
> > "\n".  I think that can be done in an easy macro, since the "\n" can
> > be constant string appended to the format string.  Just the special
> > paths that need to suppress the newline would call the low level "no
> > newline" variants.
> 
> I gave it a try, but the problem is that the "easy macro" needs to be
> conditional, depending on whether the newline is there or not: we don't
> want to add a newline in case somebody already added it by mistake (or
> habit).

Well, I mean we could just add the newline and consider it a bug if
someone puts one in themselves.

> That could probably be done as well with an intermediate function, but
> it's getting a bit too complicated (at least for me right now).
> 
> > But, I don't actually care that much so
> > 
> > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> 

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 10/11] tap: Discard guest data on length descriptor mismatch
  2024-07-25 11:09         ` Stefano Brivio
@ 2024-07-26  1:22           ` David Gibson
  0 siblings, 0 replies; 33+ messages in thread
From: David Gibson @ 2024-07-26  1:22 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 10084 bytes --]

On Thu, Jul 25, 2024 at 01:09:19PM +0200, Stefano Brivio wrote:
> On Thu, 25 Jul 2024 20:23:55 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Thu, Jul 25, 2024 at 11:15:03AM +0200, Stefano Brivio wrote:
> > > On Thu, 25 Jul 2024 14:37:43 +1000
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > >   
> > > > On Wed, Jul 24, 2024 at 11:50:16PM +0200, Stefano Brivio wrote:  
> > > > > This was reported by Matej a while ago, but we forgot to fix it. Even
> > > > > if the hypervisor is necessarily trusted by passt, as it can in any
> > > > > case terminate the guest or disrupt guest connectivity, it's a good
> > > > > idea to be robust against possible issues.
> > > > > 
> > > > > Instead of resetting the connection to the hypervisor, just discard
> > > > > the data we read with a single recv(), as we had a few cases where
> > > > > QEMU would get the length descriptor wrong, in the past.
> > > > > 
> > > > > While at it, change l2len in tap_handler_passt() to uint32_t, as the
> > > > > length descriptor is logically unsigned and 32-bit wide.
> > > > > 
> > > > > Reported-by: Matej Hrica <mhrica@redhat.com>
> > > > > Suggested-by: Matej Hrica <mhrica@redhat.com>
> > > > > Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> > > > > ---
> > > > >  tap.c | 10 ++++++----
> > > > >  1 file changed, 6 insertions(+), 4 deletions(-)
> > > > > 
> > > > > diff --git a/tap.c b/tap.c
> > > > > index 44bd444..62ba6a4 100644
> > > > > --- a/tap.c
> > > > > +++ b/tap.c
> > > > > @@ -1011,15 +1011,18 @@ redo:
> > > > >  	}
> > > > >  
> > > > >  	while (n > (ssize_t)sizeof(uint32_t)) {
> > > > > -		ssize_t l2len = ntohl(*(uint32_t *)p);
> > > > > +		uint32_t l2len = ntohl(*(uint32_t *)p);
> > > > >  
> > > > >  		p += sizeof(uint32_t);
> > > > >  		n -= sizeof(uint32_t);
> > > > >  
> > > > > +		if (l2len > (ssize_t)TAP_BUF_BYTES - n)
> > > > > +			return;    
> > > > 
> > > > Neither the condition nor the action makes much sense to me here.
> > > > We're testing if the frame can fit in the the remaining buffer space.  
> > > 
> > > Not really, we're just checking that the length descriptor fits the
> > > remaining buffer space. We're using this in the second recv() below,
> > > that's why it matters here.  
> > 
> > But AFAICT, what we need to know is if the remainder of the frame fits
> > in the buffer.
> 
> It always does because of how TAP_BUF_BYTES and TAP_BUF_FILL are
> defined.

Assuming the frame is <= ETH_MAX_MTU bytes, which we haven't yet
verified.  But that's not really the point I was making.  The key word
above is *remainder*: we may have already read part of the frame, so
comparing the *total* frame length against the remaining buffer space
is incorrect.

> > That could be less than the length descriptor if we've
> > already recv()ed part of a frame.
> > 
> > > > But we may have already read part (or all) of the frame - i.e. it's
> > > > included in 'n'.  So I don't see how that condition is useful.  
> > > 
> > > ...that is, it has nothing to do with exceeding or not exceeding the
> > > buffer on recv(), that's already taken care of by the recv() call,
> > > implicitly.
> > >   
> > > > Then, simply returning doesn't seem right under pretty much any
> > > > circumstances - that discards some amount of data, and leaves us in an
> > > > unsynchronized state w.r.t. the frame boundaries.  
> > > 
> > > That might happen, of course, but it might also happen that the
> > > hypervisor sent us *one* corrupted buffer, and the next recv() will
> > > read consistent data.  
> > 
> > Well, sure, it's possible, but it doesn't seem particularly likely to
> > me.  AFAICT this is a stream which we need every length field to
> > interpret properly.  If we lose one, or it's corrupted, I think we're
> > done for.
> 
> In most cases, the content of one recv() corresponds to a given number
> of whole frames, so what we're doing by returning here is, practically
> speaking, similar to what you're suggesting below with MSG_TRUNC.

Or, I guess more technically, the recv() boundaries will probably
correspond to qemu's send() boundaries(), which are likely to
correspond to the frame boundaries.

But, I think I now see the root of our disagreement here.  If the
frame field lengths don't seem to make sense relative to the recv()
boundaries, do we trust the former (my position) or the latter (your
position, I think).

Thinking about it like that, I don't think either position makes a lot
of sense.  If these mismatch something has already gone horribly wrong
and we should probably just die(), or at least reset the tap socket.

> > > > If this is just supposed to be a sanity check on the frame length,
> > > > then I think we'd be better off with a fixed limit - 64kiB is the
> > > > obvious choice.  
> > > 
> > > That's already checked below (l2len > ETH_MAX_MTU), and...  
> > 
> > Right.  I wonder if it would make sense to do that earlier.
> 
> We can. But right now what I want to fix here is just that missing
> check, because it actually causes passt to crash (even though we assume
> a trusted hypervisor, so it's not really security relevant, but not nice
> either).

Ok, fair enough.

> > > > If we hit that, we can warn() and discard data up to
> > > > the end of the too-large frame.  That at least has a chance of letting
> > > > us recover and move on to future acceptable frames.  
> > > 
> > > that's exactly what we do in that case (goto next).  
> > 
> > Only for the case that the length is too long, but not *too* long.  In
> > particular it needs to fit in the buffer to even get there.  If we
> > sanity checked the frame length earlier we could use MSG_TRUNC to
> > discard even a ludicrously large frame and still continue on to the
> > next one.
> 
> We don't know if the length descriptor actually matches the length of
> the frame, though. If you have a way you think is more robust, would
> you mind sending a patch?

It's more that if the length descriptor doesn't match the actual
length of the frame, I think we're beyond saving.  The recv()
boundaries probably correspond to to frame boundaries, but that's not
guaranteed by SOCK_STREAM semantics, so I don't think we should ever
trust them _over_ the length fields we receive.

> > > > >  		/* At most one packet might not fit in a single read, and this
> > > > >  		 * needs to be blocking.
> > > > >  		 */
> > > > > -		if (l2len > n) {
> > > > > +		if (l2len > (size_t)n) {
> > > > >  			rem = recv(c->fd_tap, p + n, l2len - n, 0);  
> > >                                               ^^^^^^^^^^^^^^^^
> > > 
> > > This the reason why the check above is relevant.  
> > 
> > Relevant, sure, but I still don't think it's right.  Actually
> > (TAP_BUF_BYTES - n) is an even stranger quantity than I initially
> > thought.  It's the total space of the buffer minus the current partial
> > frame - counting *both* the stuff before our partial frame and after
> > it.
> 
> That should be enough to make sure we don't have bad side effects on
> this second recv(), but yes:

No, I don't think it is.  Suppose we get exactly TAP_BUF_FILL bytes in
the first recv().  (TAP_BUF_FILL-4) bytes of that is perfectly
ordinary, valid frames, which we process.  We've read the remaining 4
bytes as a field length, so:
	p == pkt_buf + TAP_BUF_FILL
	n == 0

Now suppose that last frame length field is bad, say
	l2len == 2*ETH_MAX_MTU

The test you've suggested:
	if (l2len > (ssize_t)TAP_BUF_BYTES - n)
will let us continue, but the recv()
	rem = recv(c->fd_tap, p + n, l2len - n, 0);                                                                
will overrun pkt_buf + TAP_BUF_BYTES.

That _probably_ won't happen, since the recv() is likely to end at a
frame boundary instead, but again SOCK_STREAM semantics don't
guarantee that.  And even if the kernel does always preserve
send()/recv() boundaries, that qemu might not transmit the frame
length as a separate send() (maybe on a debugging / slow path).

> > I think instead we need to check for (p + l2len > pkt_buf + TAP_BUF_BYTES).
> 
> ...that is actually more accurate. I can fix this up in another
> version, unless you can think of a more comprehensive change that also
> gives us better possibilities to recover from a corrupted stream.

So, as noted above, I don't think we should even attempt to recover
from a corrupted stream.  One could certainly design a stream protocol
that allows for recovery from corrupted portions, but the qemu -net
stream protocol isn't so designed.

> > > > >  			if ((n += rem) != l2len)
> > > > >  				return;    
> > > > 
> > > > Pre-existing, but a 'return' here basically lands us in a situation we
> > > > have no meaningful chance of recovering from.  A die() would be
> > > > preferable.  Better yet would be continuing to re-recv() until we have
> > > > the whole frame, similar to what we do for write_remainder().  
> > > 
> > > Same as above, it depends on what failure you're assuming. If it's just
> > > one botched recv(), instead, we recv() again the next time and we
> > > recover.  
> > 
> > Even if it's just one bad recv(), we still have no idea where we are
> > w.r.t. frame boundaries, so I can't see any way we could recover.
> 
> The idea is that the recv() is _usually_ big enough as to flush the
> buffer anyway.

Hrm, not something I'd really like to rely on.

So, I just realised there's another pre-existing problem here.  AFAICT
there's no guarantee that frame lengths are aligned - which means
after an odd sized frame, we'd be making an unaligned u32 read for the
frame length, which will crash on some platforms.

I'll have a look at all the above today and see what I can come up
with.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2024-07-26  1:27 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-07-24 21:50 [PATCH 00/11] Minor assorted fixes, mostly logging and tests Stefano Brivio
2024-07-24 21:50 ` [PATCH 01/11] tap: Don't quit if pasta gets EIO on writev() to tap, interface might be down Stefano Brivio
2024-07-25  3:21   ` David Gibson
2024-07-24 21:50 ` [PATCH 02/11] tcp: Change SO_PEEK_OFF support message to debug() Stefano Brivio
2024-07-25  3:22   ` David Gibson
2024-07-24 21:50 ` [PATCH 03/11] log: Drop newlines in the middle of the perror()-like messages Stefano Brivio
2024-07-25  3:26   ` David Gibson
2024-07-25 11:27     ` Stefano Brivio
2024-07-26  0:33       ` David Gibson
2024-07-24 21:50 ` [PATCH 04/11] log: Fix sub-second part in relative log time calculation Stefano Brivio
2024-07-25  3:32   ` David Gibson
2024-07-25  7:51     ` Stefano Brivio
2024-07-24 21:50 ` [PATCH 05/11] log: Initialise timestamp for relative log time also if we use a log file Stefano Brivio
2024-07-25  3:35   ` David Gibson
2024-07-25  7:51     ` Stefano Brivio
2024-07-24 21:50 ` [PATCH 06/11] test: Fix memory/passt tests, --netns-only is not a valid option for passt Stefano Brivio
2024-07-25  3:50   ` David Gibson
2024-07-24 21:50 ` [PATCH 07/11] test: Update names of symbols and slabinfo entries Stefano Brivio
2024-07-25  3:54   ` David Gibson
2024-07-24 21:50 ` [PATCH 08/11] test: iperf3 3.16 introduces multiple threads, drop our own implementation of that Stefano Brivio
2024-07-25  4:23   ` David Gibson
2024-07-24 21:50 ` [PATCH 09/11] tap: Exit if we fail to bind a UNIX domain socket with explicit path Stefano Brivio
2024-07-25  4:00   ` David Gibson
2024-07-24 21:50 ` [PATCH 10/11] tap: Discard guest data on length descriptor mismatch Stefano Brivio
2024-07-25  4:37   ` David Gibson
2024-07-25  9:15     ` Stefano Brivio
2024-07-25 10:23       ` David Gibson
2024-07-25 11:09         ` Stefano Brivio
2024-07-26  1:22           ` David Gibson
2024-07-24 21:50 ` [PATCH 11/11] conf: Accept addresses enclosed by square brackets in port forwarding specifiers Stefano Brivio
2024-07-25  4:39   ` David Gibson
2024-07-25 11:26     ` Stefano Brivio
2024-07-25 11:28 ` [PATCH 00/11] Minor assorted fixes, mostly logging and tests Stefano Brivio

Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).