public inbox for passt-dev@passt.top
* [PATCH 00/18] slirp4netns, sandboxing, Podman integration, assorted fixes
@ 2022-02-22  1:34 Stefano Brivio
  2022-02-22  1:34 ` [PATCH 01/18] slirp4netns: Look up pasta command, exit if not found Stefano Brivio
                   ` (18 more replies)
  0 siblings, 19 replies; 20+ messages in thread
From: Stefano Brivio @ 2022-02-22  1:34 UTC (permalink / raw)
  To: passt-dev

[-- Attachment #1: Type: text/plain, Size: 3915 bytes --]

This series:

- completes slirp4netns(1) compatibility of slirp4netns.sh and introduces
  equivalent features in pasta (patches 1/18, 2/18, 6/18, 9/18)

- enables namespace-based sandboxing that's _at least_ equivalent to
  the one implemented by slirp4netns (patches 3/18 and 4/18)

- carries a number of fixes for minor issues I found while doing this
  (patches 5/18, 7/18, 8/18, 10/18, 11/18)

- introduces a self-quit mechanism for pasta for easier integration with
  container runtimes (patch 12/18)

- fixes a few items in documentation and tests (patches 13/18 to 16/18)

- adds Podman integration as out-of-tree patch (patch 17/18)

- adds a demo for Podman operation with pasta and side-by-side
  comparison with slirp4netns (patch 18/18).

I already recorded the Podman demo:
  https://passt.top/builds/latest/web/demo_podman.webm


Stefano Brivio (18):
  slirp4netns: Look up pasta command, exit if not found
  slirp4netns: Add EXIT as condition for trap
  passt, pasta: Namespace-based sandboxing, defer seccomp policy
    application
  passt: Make process not dumpable after sandboxing
  Makefile, conf, passt: Drop passt4netns references, explicit argc
    check
  slirp4netns.sh: Implement API socket option for port forwarding
  conf: Don't print configuration on --quiet
  conf: Given IPv4 address and no netmask, assign RFC 790-style classes
  conf, udp: Introduce basic DNS forwarding
  udp: Allow loopback connections from host using configured unicast
    address
  tcp, udp: Receive batching doesn't pay off when writing single frames
    to tap
  pasta: By default, quit if filesystem-bound net namespace goes away
  test/distro/ubuntu: Use DEBIAN_FRONTEND=noninteractive for apt on
    22.04
  test/perf/passt_udp: Drop threshold for 256B test
  man page: Update REPORTING BUGS section
  README, hooks: Build HTML man page on push, add a link
  contrib: Add patch for Podman integration
  test: Add demo for Podman with pasta

 Makefile                                      |  10 +-
 README.md                                     |  18 +-
 conf.c                                        | 219 +++--
 ...001-libpod-Add-pasta-networking-mode.patch | 542 +++++++++++
 dhcp.c                                        |   5 +-
 dhcpv6.c                                      |   7 +
 hooks/pre-push                                |   3 +
 ndp.c                                         |   6 +-
 passt.1                                       |  92 +-
 passt.c                                       | 140 ++-
 passt.h                                       |  28 +-
 pasta.c                                       | 217 ++---
 pasta.h                                       |   2 +
 pcap.c                                        |   5 +-
 pcap.h                                        |   2 +-
 slirp4netns.sh                                | 198 +++-
 tap.c                                         |  58 +-
 tcp.c                                         |  49 +-
 test/demo/passt                               |   3 +-
 test/demo/pasta                               |   5 +-
 test/demo/podman                              | 843 ++++++++++++++++++
 test/distro/ubuntu                            |   1 +
 test/lib/layout                               |  38 +-
 test/lib/setup                                |  49 +-
 test/lib/term                                 |  10 +
 test/lib/test                                 |  35 +
 test/perf/passt_udp                           |   4 +-
 test/run                                      |   8 +
 udp.c                                         |  76 +-
 util.c                                        | 129 ++-
 util.h                                        |  12 +-
 31 files changed, 2430 insertions(+), 384 deletions(-)
 create mode 100644 contrib/podman/0001-libpod-Add-pasta-networking-mode.patch
 create mode 100644 test/demo/podman

-- 
2.34.1



* [PATCH 01/18] slirp4netns: Look up pasta command, exit if not found
  2022-02-22  1:34 [PATCH 00/18] slirp4netns, sandboxing, Podman integration, assorted fixes Stefano Brivio
@ 2022-02-22  1:34 ` Stefano Brivio
  2022-02-22  1:34 ` [PATCH 02/18] slirp4netns: Add EXIT as condition for trap Stefano Brivio
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Stefano Brivio @ 2022-02-22  1:34 UTC (permalink / raw)
  To: passt-dev

[-- Attachment #1: Type: text/plain, Size: 1164 bytes --]

Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>
---
 slirp4netns.sh | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/slirp4netns.sh b/slirp4netns.sh
index de74281..e6a6049 100755
--- a/slirp4netns.sh
+++ b/slirp4netns.sh
@@ -17,7 +17,10 @@
 
 PASTA_PID="$(mktemp)"
 PASTA_OPTS="-q --ipv4-only -a 10.0.2.0 -n 24 -g 10.0.2.2 -m 1500 --no-ndp --no-dhcpv6 --no-dhcp -P ${PASTA_PID}"
+PASTA="$(command -v ./pasta || command -v pasta || :)"
+
 USAGE_RET=1
+NOTFOUND_RET=127
 
 # add() - Add single option to $PASTA_OPTS
 # $1:	Option name, with or without argument
@@ -161,6 +164,8 @@ no_map_gw=0
 EFD=0
 RFD=0
 
+[ -z "${PASTA}" ] && echo "pasta command not found" && exit ${NOTFOUND_RET}
+
 while getopts ce:r:m:6a:hv-: OPT 2>/dev/null; do
 	if [ "${OPT}" = "-" ]; then
 		OPT="${OPTARG%%[= ]*}"
@@ -198,7 +203,7 @@ if [ ${v6} -eq 1 ]; then
 	add "-a $(gen_addr6) -g fd00::2 -D fd00::3"
 fi
 
-./pasta ${PASTA_OPTS} ${ns_spec} 2>/dev/null && \
+${PASTA} ${PASTA_OPTS} ${ns_spec} && \
 	[ ${RFD} -ne 0 ] && echo "1" >&${RFD}
 
 trap "kill $(cat ${PASTA_PID}); rm ${PASTA_PID}" INT TERM
-- 
2.34.1



* [PATCH 02/18] slirp4netns: Add EXIT as condition for trap
  2022-02-22  1:34 [PATCH 00/18] slirp4netns, sandboxing, Podman integration, assorted fixes Stefano Brivio
  2022-02-22  1:34 ` [PATCH 01/18] slirp4netns: Look up pasta command, exit if not found Stefano Brivio
@ 2022-02-22  1:34 ` Stefano Brivio
  2022-02-22  1:34 ` [PATCH 03/18] passt, pasta: Namespace-based sandboxing, defer seccomp policy application Stefano Brivio
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Stefano Brivio @ 2022-02-22  1:34 UTC (permalink / raw)
  To: passt-dev

[-- Attachment #1: Type: text/plain, Size: 632 bytes --]

...otherwise, we don't terminate pasta on regular exit, i.e.
on a read from the "exit" file descriptor.

Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>
---
 slirp4netns.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/slirp4netns.sh b/slirp4netns.sh
index e6a6049..518f581 100755
--- a/slirp4netns.sh
+++ b/slirp4netns.sh
@@ -206,7 +206,7 @@ fi
 ${PASTA} ${PASTA_OPTS} ${ns_spec} && \
 	[ ${RFD} -ne 0 ] && echo "1" >&${RFD}
 
-trap "kill $(cat ${PASTA_PID}); rm ${PASTA_PID}" INT TERM
+trap "kill $(cat ${PASTA_PID}); rm ${PASTA_PID}" INT TERM EXIT
 
 cat << EOF
 sent tapfd=5 for ${ifname}
-- 
2.34.1



* [PATCH 03/18] passt, pasta: Namespace-based sandboxing, defer seccomp policy application
  2022-02-22  1:34 [PATCH 00/18] slirp4netns, sandboxing, Podman integration, assorted fixes Stefano Brivio
  2022-02-22  1:34 ` [PATCH 01/18] slirp4netns: Look up pasta command, exit if not found Stefano Brivio
  2022-02-22  1:34 ` [PATCH 02/18] slirp4netns: Add EXIT as condition for trap Stefano Brivio
@ 2022-02-22  1:34 ` Stefano Brivio
  2022-02-22  1:34 ` [PATCH 04/18] passt: Make process not dumpable after sandboxing Stefano Brivio
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Stefano Brivio @ 2022-02-22  1:34 UTC (permalink / raw)
  To: passt-dev

[-- Attachment #1: Type: text/plain, Size: 39948 bytes --]

To reach a security level (at least) conceptually equivalent to the
one implemented by --enable-sandbox in slirp4netns, we need to create
a new mount namespace and pivot_root() into a new (empty) mountpoint,
so that passt and pasta can't access any filesystem resource after
initialisation.

While at it, also detach IPC, PID (only for passt, to prevent
vulnerabilities based on the knowledge of a target PID), and UTS
namespaces.

With this approach, if we apply the seccomp filters right after the
configuration step, the number of allowed syscalls grows further. To
prevent this, defer the application of seccomp policies until after
the initialisation phase, right before the main loop, which is where
we expect bad things to happen, potentially. This way, we get back to
22 allowed syscalls for passt and 34 for pasta, on x86_64.

While at it, move #syscalls notes to specific code paths wherever it
conceptually makes sense.

We have to open all the file handles we'll ever need before
sandboxing:

- the packet capture file can only be opened once, drop instance
  numbers from the default path and use the (pre-sandbox) PID instead

- /proc/net/tcp{,6} and /proc/net/udp{,6}, for automatic detection
  of bound ports in pasta mode, are now opened only once, before
  sandboxing, and their handles are stored in the execution context

- the UNIX domain socket for passt is also bound only once, before
  sandboxing: to reject clients after the first one, instead of
  closing the listening socket, keep it open, accept and immediately
  discard new connections if we already have a valid one

Clarify the (unchanged) behaviour for --netns-only in the man page.

To actually make passt and pasta processes run in a separate PID
namespace, we need to unshare(CLONE_NEWPID) before forking to
background (if configured to do so). Introduce a small daemon()
implementation, __daemon(), that additionally saves the PID file
before forking. While running in foreground, the process itself can't
move to a new PID namespace (a process can't change the notion of its
own PID): mention that in the man page.
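
To illustrate the idea, here is a minimal sketch of a daemon()
replacement along these lines. It is not the code from this patch:
daemonize() and pidfile_write() are hypothetical names, and both
descriptors are assumed to be opened by the caller before sandboxing.
After unshare(CLONE_NEWPID), the child would see itself as PID 1, so
in this sketch the parent records the PID that fork() returns, which
is the one visible from the original namespace.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write pid, followed by a newline, to an already open
 * descriptor, then close it. Returns 0 on success, -1 on error or if fd is -1.
 */
int pidfile_write(int fd, pid_t pid)
{
	char buf[32];
	int n;

	if (fd == -1)
		return -1;

	n = snprintf(buf, sizeof(buf), "%ld\n", (long)pid);
	if (n < 0 || n >= (int)sizeof(buf) || write(fd, buf, n) != n) {
		close(fd);
		return -1;
	}

	return close(fd);
}

/* Hypothetical daemon() replacement. Returns 0 in the child, -1 on failure;
 * on success, the parent exits and never returns.
 */
int daemonize(int pidfile_fd, int devnull_fd)
{
	pid_t pid = fork();

	if (pid < 0)
		return -1;

	if (pid > 0) {
		/* Parent: fork() reports the child's PID as seen from the
		 * parent's own PID namespace, which is the one to record.
		 */
		pidfile_write(pidfile_fd, pid);
		_exit(EXIT_SUCCESS);
	}

	if (pidfile_fd != -1)
		close(pidfile_fd);

	if (setsid() < 0)
		return -1;

	/* Detach from the terminal using the pre-opened /dev/null handle */
	if (dup2(devnull_fd, STDIN_FILENO) < 0 ||
	    dup2(devnull_fd, STDOUT_FILENO) < 0 ||
	    dup2(devnull_fd, STDERR_FILENO) < 0)
		return -1;

	if (devnull_fd > STDERR_FILENO)
		close(devnull_fd);

	return 0;
}
```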

For some reason, fork() in a detached PID namespace causes SIGTERM
and SIGQUIT to be ignored, even if the handler is still reported as
SIG_DFL: add a signal handler that just exits.

We can now drop most of the pasta_child_handler() implementation,
that took care of terminating all processes running in the same
namespace, if pasta started a shell: the shell itself is now the
init process in that namespace, and all children will terminate
once the init process exits.

Issuing 'echo $$' in a detached PID namespace won't return the
actual namespace PID as seen from the init namespace: adapt
demo and test setup scripts to reflect that.

Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>
---
 README.md       |   5 +-
 conf.c          |  45 +++++++------
 passt.1         |  15 +++--
 passt.c         | 126 ++++++++++++++++++++++--------------
 passt.h         |   7 +-
 pasta.c         | 165 +++++++++++++++++-------------------------------
 pcap.c          |   5 +-
 pcap.h          |   2 +-
 slirp4netns.sh  |   2 +-
 tap.c           |  58 ++++++++---------
 tcp.c           |  13 ++--
 test/demo/passt |   3 +-
 test/demo/pasta |   5 +-
 test/lib/setup  |  28 ++++----
 udp.c           |   7 +-
 util.c          | 129 ++++++++++++++++++++++++++++++++-----
 util.h          |  12 +++-
 17 files changed, 365 insertions(+), 262 deletions(-)

diff --git a/README.md b/README.md
index d16b705..1c8baf3 100644
--- a/README.md
+++ b/README.md
@@ -232,9 +232,10 @@ speeding up local connections, and usually requiring NAT. _pasta_:
   `seccomp`](/passt/tree/seccomp.sh))
 * ✅ root operation not allowed outside user namespaces
 * ✅ all capabilities dropped, other than `CAP_NET_BIND_SERVICE` (if granted)
+* ✅ with default options, user, mount, IPC, UTS, PID namespaces are detached
 * ✅ no external dependencies (other than a standard C library)
-* ✅ restrictive seccomp profiles (50 syscalls allowed for _passt_, 62 for
-  _pasta_)
+* ✅ restrictive seccomp profiles (22 syscalls allowed for _passt_, 34 for
+  _pasta_ on x86_64)
 * ✅ static checkers in continuous integration (clang-tidy, cppcheck)
 * 🛠️ rework of TCP state machine (flags instead of states), TCP timers, and code
   de-duplication
diff --git a/conf.c b/conf.c
index abe63a1..732d918 100644
--- a/conf.c
+++ b/conf.c
@@ -10,8 +10,6 @@
  *
  * Copyright (c) 2020-2021 Red Hat GmbH
  * Author: Stefano Brivio <sbrivio(a)redhat.com>
- *
- * #syscalls stat|statx
  */
 
 #include <arpa/inet.h>
@@ -46,31 +44,31 @@
  */
 void get_bound_ports(struct ctx *c, int ns, uint8_t proto)
 {
-	uint8_t *udp_map, *udp_exclude, *tcp_map, *tcp_exclude;
+	uint8_t *udp_map, *udp_excl, *tcp_map, *tcp_excl;
 
 	if (ns) {
 		udp_map = c->udp.port_to_tap;
-		udp_exclude = c->udp.port_to_init;
+		udp_excl = c->udp.port_to_init;
 		tcp_map = c->tcp.port_to_tap;
-		tcp_exclude = c->tcp.port_to_init;
+		tcp_excl = c->tcp.port_to_init;
 	} else {
 		udp_map = c->udp.port_to_init;
-		udp_exclude = c->udp.port_to_tap;
+		udp_excl = c->udp.port_to_tap;
 		tcp_map = c->tcp.port_to_init;
-		tcp_exclude = c->tcp.port_to_tap;
+		tcp_excl = c->tcp.port_to_tap;
 	}
 
 	if (proto == IPPROTO_UDP) {
 		memset(udp_map, 0, USHRT_MAX / 8);
-		procfs_scan_listen("udp",  udp_map, udp_exclude);
-		procfs_scan_listen("udp6", udp_map, udp_exclude);
+		procfs_scan_listen(c, IPPROTO_UDP, V4, ns, udp_map, udp_excl);
+		procfs_scan_listen(c, IPPROTO_UDP, V6, ns, udp_map, udp_excl);
 
-		procfs_scan_listen("tcp",  udp_map, udp_exclude);
-		procfs_scan_listen("tcp6", udp_map, udp_exclude);
+		procfs_scan_listen(c, IPPROTO_TCP, V4, ns, udp_map, udp_excl);
+		procfs_scan_listen(c, IPPROTO_TCP, V6, ns, udp_map, udp_excl);
 	} else if (proto == IPPROTO_TCP) {
 		memset(tcp_map, 0, USHRT_MAX / 8);
-		procfs_scan_listen("tcp",  tcp_map, tcp_exclude);
-		procfs_scan_listen("tcp6", tcp_map, tcp_exclude);
+		procfs_scan_listen(c, IPPROTO_TCP, V4, ns, tcp_map, tcp_excl);
+		procfs_scan_listen(c, IPPROTO_TCP, V6, ns, tcp_map, tcp_excl);
 	}
 }
 
@@ -367,7 +365,7 @@ static int conf_ns_check(void *arg)
 static int conf_ns_opt(struct ctx *c,
 		       char *nsdir, char *conf_userns, const char *optarg)
 {
-	int ufd = 0, nfd = 0, try, ret, netns_only_reset = c->netns_only;
+	int ufd = -1, nfd = -1, try, ret, netns_only_reset = c->netns_only;
 	char userns[PATH_MAX] = { 0 }, netns[PATH_MAX];
 	char *endptr;
 	pid_t pid;
@@ -416,7 +414,7 @@ static int conf_ns_opt(struct ctx *c,
 
 		nfd = open(netns, O_RDONLY);
 
-		if (nfd >= 0 && ufd >= 0) {
+		if (nfd >= 0 && (ufd >= 0 || c->netns_only)) {
 			c->pasta_netns_fd = nfd;
 			c->pasta_userns_fd = ufd;
 
@@ -425,10 +423,10 @@ static int conf_ns_opt(struct ctx *c,
 				return 0;
 		}
 
-		if (nfd > 0)
+		if (nfd >= 0)
 			close(nfd);
 
-		if (ufd > 0)
+		if (ufd >= 0)
 			close(ufd);
 	}
 
@@ -565,9 +563,9 @@ static void usage(const char *name)
 	info(   "    if FILE is not given, log to:");
 
 	if (strstr(name, "pasta") || strstr(name, "passt4netns"))
-		info("      /tmp/pasta_ISO8601-TIMESTAMP_INSTANCE-NUMBER.pcap");
+		info("      /tmp/pasta_ISO8601-TIMESTAMP_PID.pcap");
 	else
-		info("      /tmp/passt_ISO8601-TIMESTAMP_INSTANCE-NUMBER.pcap");
+		info("      /tmp/passt_ISO8601-TIMESTAMP_PID.pcap");
 
 	info(   "  -P, --pid FILE	Write own PID to the given file");
 	info(   "  -m, --mtu MTU	Assign MTU via DHCP/NDP");
@@ -664,7 +662,7 @@ pasta_opts:
 	info(   "    SPEC is as described above");
 	info(   "    default: auto");
 	info(   "  --userns NSPATH 	Target user namespace to join");
-	info(   "  --netns-only		Don't join or create user namespace");
+	info(   "  --netns-only		Don't join existing user namespace");
 	info(   "    implied if PATH or NAME are given without --userns");
 	info(   "  --nsrun-dir		Directory for nsfs mountpoints");
 	info(   "    default: " NETNS_RUN_DIR);
@@ -1170,7 +1168,7 @@ void conf(struct ctx *c, int argc, char **argv)
 		usage(argv[0]);
 	}
 
-	if (c->mode == MODE_PASTA && c->pasta_netns_fd <= 0)
+	if (c->mode == MODE_PASTA && c->pasta_netns_fd == -1)
 		pasta_start_ns(c);
 
 	if (nl_sock_init(c)) {
@@ -1216,6 +1214,11 @@ void conf(struct ctx *c, int argc, char **argv)
 	c->tcp.init_detect_ports = c->udp.init_detect_ports = 0;
 
 	if (c->mode == MODE_PASTA) {
+		c->proc_net_tcp[V4][0] = c->proc_net_tcp[V4][1] = -1;
+		c->proc_net_tcp[V6][0] = c->proc_net_tcp[V6][1] = -1;
+		c->proc_net_udp[V4][0] = c->proc_net_udp[V4][1] = -1;
+		c->proc_net_udp[V6][0] = c->proc_net_udp[V6][1] = -1;
+
 		if (!tcp_tap || tcp_tap == PORT_AUTO) {
 			c->tcp.ns_detect_ports = 1;
 			ns_ports_arg.proto = IPPROTO_TCP;
diff --git a/passt.1 b/passt.1
index b0d7d87..92681f6 100644
--- a/passt.1
+++ b/passt.1
@@ -80,7 +80,8 @@ Don't print informational messages.
 
 .TP
 .BR \-f ", " \-\-foreground
-Don't run in background.
+Don't run in background. This implies that the process is not moved to a
+detached PID namespace after starting, because the PID itself cannot change.
 Default is to fork into background.
 
 .TP
@@ -100,14 +101,13 @@ Capture tap-facing (that is, guest-side or namespace-side) network packets to
 
 If \fIfile\fR is not given, capture packets to
 
-	\fB/tmp/passt_\fIISO8601-timestamp\fR_\fIinstance-number\fB.pcap\fR
+	\fB/tmp/passt_\fIISO8601-timestamp\fR_\fIPID\fB.pcap\fR
 
 in \fBpasst\fR mode and to
 
-	\fB/tmp/pasta_\fIISO8601-timestamp\fR_\fIinstance-number\fB.pcap\fR
+	\fB/tmp/pasta_\fIISO8601-timestamp\fR_\fIPID\fB.pcap\fR
 
-in \fBpasta\fR mode, where \fIinstance-number\fR is a progressive count of
-other detected instances running on the same host.
+in \fBpasta\fR mode, where \fIPID\fR is the ID of the running process.
 
 .TP
 .BR \-P ", " \-\-pid " " \fIfile
@@ -379,8 +379,9 @@ This option requires PID, PATH or NAME to be specified.
 
 .TP
 .BR \-\-netns-only
-Join or create only the network namespace, not a user namespace. This is implied
-if PATH or NAME are given without \-\-userns.
+Join only a target network namespace, not a user namespace, and don't create one
+for sandboxing purposes either. This is implied if PATH or NAME are given
+without \-\-userns.
 
 .TP
 .BR \-\-nsrun-dir " " \fIpath
diff --git a/passt.c b/passt.c
index a8bb88e..508d525 100644
--- a/passt.c
+++ b/passt.c
@@ -30,7 +30,9 @@
 #include <sys/mman.h>
 #include <sys/resource.h>
 #include <sys/uio.h>
+#include <sys/syscall.h>
 #include <sys/wait.h>
+#include <sys/mount.h>
 #include <netinet/ip.h>
 #include <net/ethernet.h>
 #include <stdlib.h>
@@ -53,7 +55,6 @@
 #include <linux/seccomp.h>
 #include <linux/audit.h>
 #include <linux/filter.h>
-#include <linux/capability.h>
 #include <linux/icmpv6.h>
 
 #include "util.h"
@@ -228,42 +229,61 @@ static void check_root(void)
 }
 
 /**
- * drop_caps() - Drop capabilities we might have except for CAP_NET_BIND_SERVICE
+ * sandbox() - Unshare IPC, mount, PID, UTS, and user namespaces, "unmount" root
+ *
+ * Return: negative error code on failure, zero on success
  */
-static void drop_caps(void)
+static int sandbox(struct ctx *c)
 {
-	int i;
+	int flags = CLONE_NEWIPC | CLONE_NEWNS | CLONE_NEWUTS;
 
-	for (i = 0; i < 64; i++) {
-		if (i == CAP_NET_BIND_SERVICE)
-			continue;
+	errno = 0;
 
-		prctl(PR_CAPBSET_DROP, i, 0, 0, 0);
+	if (!c->netns_only) {
+		if (c->pasta_userns_fd == -1)
+			flags |= CLONE_NEWUSER;
+		else
+			setns(c->pasta_userns_fd, CLONE_NEWUSER);
 	}
-}
 
-/**
- * pid_file() - Write own PID to file, if configured
- * @c:		Execution context
- */
-static void pid_file(struct ctx *c) {
-	char pid_buf[12];
-	int pid_fd, n;
+	c->pasta_userns_fd = -1;
 
-	if (!*c->pid_file)
-		return;
+	/* If we run in foreground, we have no chance to actually move to a new
+	 * PID namespace. For passt, use CLONE_NEWPID anyway, in case somebody
+	 * ever gets around seccomp profiles -- there's no harm in passing it.
+	 */
+	if (!c->foreground || c->mode == MODE_PASST)
+		flags |= CLONE_NEWPID;
 
-	pid_fd = open(c->pid_file, O_CREAT | O_WRONLY, S_IRUSR | S_IWUSR);
-	if (pid_fd < 0)
-		return;
+	unshare(flags);
 
-	n = snprintf(pid_buf, sizeof(pid_buf), "%i\n", getpid());
+	mount("", "/", "", MS_UNBINDABLE | MS_REC, NULL);
+	mount("", TMPDIR, "tmpfs", MS_NODEV | MS_NOEXEC | MS_NOSUID | MS_RDONLY,
+	      "nr_inodes=2,nr_blocks=0");
+	chdir(TMPDIR);
+	syscall(SYS_pivot_root, ".", ".");
+	umount2(".", MNT_DETACH | UMOUNT_NOFOLLOW);
 
-	if (write(pid_fd, pid_buf, n) < 0) {
-		perror("PID file write");
-		exit(EXIT_FAILURE);
-	}
-	close(pid_fd);
+	if (errno)
+		return -errno;
+
+	drop_caps();	/* Relative to the new user namespace this time. */
+
+	return 0;
+}
+
+/**
+ * exit_handler() - Signal handler for SIGQUIT and SIGTERM
+ * @unused:	Unused, handler deals with SIGQUIT and SIGTERM only
+ *
+ * TODO: After unsharing the PID namespace and forking, SIG_DFL for SIGTERM and
+ * SIGQUIT unexpectedly doesn't cause the process to terminate, figure out why.
+ */
+void exit_handler(int signal)
+{
+	(void)signal;
+
+	exit(EXIT_SUCCESS);
 }
 
 /**
@@ -273,36 +293,36 @@ static void pid_file(struct ctx *c) {
  *
  * Return: non-zero on failure
  *
- * #syscalls read write open|openat close fork|clone dup2|dup3 ioctl writev
- * #syscalls socket bind connect getsockopt setsockopt recvfrom sendto shutdown
- * #syscalls accept4 accept listen set_robust_list getrlimit setrlimit
- * #syscalls openat fcntl lseek clone setsid exit exit_group getpid chdir
- * #syscalls epoll_ctl epoll_create1 epoll_wait|epoll_pwait epoll_pwait
- * #syscalls prlimit64 clock_gettime fstat|newfstat newfstatat syslog
- * #syscalls ppc64le:_llseek ppc64le:recv ppc64le:send ppc64le:getuid
- * #syscalls ppc64:_llseek ppc64:recv ppc64:send ppc64:getuid ppc64:ugetrlimit
- * #syscalls s390x:socketcall s390x:sigreturn
- * #syscalls:pasta rt_sigreturn|sigreturn ppc64:sigreturn ppc64:fcntl64
+ * #syscalls read write writev
+ * #syscalls socket bind connect getsockopt setsockopt s390x:socketcall close
+ * #syscalls recvfrom sendto shutdown ppc64le:recv ppc64le:send
+ * #syscalls accept4|accept listen
+ * #syscalls epoll_ctl epoll_wait|epoll_pwait epoll_pwait clock_gettime
  */
 int main(int argc, char **argv)
 {
+	int nfds, i, devnull_fd = -1, pidfile_fd = -1;
 	struct epoll_event events[EPOLL_EVENTS];
 	struct ctx c = { 0 };
 	struct rlimit limit;
 	struct timespec now;
+	struct sigaction sa;
 	char *log_name;
-	int nfds, i;
 
 #ifndef PASST_LEGACY_NO_OPTIONS
 	check_root();
 #endif
 	drop_caps();
 
-	if (strstr(argv[0], "pasta") || strstr(argv[0], "passt4netns")) {
-		struct sigaction sa;
+	c.pasta_userns_fd = c.pasta_netns_fd = c.fd_tap = c.fd_tap_listen = -1;
+
+	sigemptyset(&sa.sa_mask);
+	sa.sa_flags = 0;
+	sa.sa_handler = exit_handler;
+	sigaction(SIGTERM, &sa, NULL);
+	sigaction(SIGQUIT, &sa, NULL);
 
-		sigemptyset(&sa.sa_mask);
-		sa.sa_flags = 0;
+	if (strstr(argv[0], "pasta") || strstr(argv[0], "passt4netns")) {
 		sa.sa_handler = pasta_child_handler;
 		sigaction(SIGCHLD, &sa, NULL);
 		signal(SIGPIPE, SIG_IGN);
@@ -323,8 +343,6 @@ int main(int argc, char **argv)
 
 	conf(&c, argc, argv);
 
-	seccomp(&c);
-
 	if (!c.debug && (c.stderr || isatty(fileno(stdout))))
 		__openlog(log_name, LOG_PERROR, LOG_DAEMON);
 
@@ -369,12 +387,26 @@ int main(int argc, char **argv)
 	else
 		__setlogmask(LOG_UPTO(LOG_INFO));
 
-	if (!c.foreground && daemon(0, 0)) {
-		perror("daemon");
+	pcap_init(&c);
+
+	if (!c.foreground)
+		devnull_fd = open("/dev/null", O_RDWR);
+
+	if (*c.pid_file)
+		pidfile_fd = open(c.pid_file,
+				  O_CREAT | O_WRONLY, S_IRUSR | S_IWUSR);
+
+	if (sandbox(&c)) {
+		err("Failed to sandbox process, exiting\n");
 		exit(EXIT_FAILURE);
 	}
 
-	pid_file(&c);
+	if (!c.foreground)
+		__daemon(pidfile_fd, devnull_fd);
+	else
+		write_pidfile(pidfile_fd, getpid());
+
+	seccomp(&c);
 
 	timer_init(&c, &now);
 loop:
diff --git a/passt.h b/passt.h
index 0ef1897..d7011da 100644
--- a/passt.h
+++ b/passt.h
@@ -99,8 +99,10 @@ enum passt_modes {
  * @pcap:		Path for packet capture file
  * @pid_file:		Path to PID file, empty string if not configured
  * @pasta_netns_fd:	File descriptor for network namespace in pasta mode
- * @pasta_userns_fd:	File descriptor for user namespace in pasta mode
+ * @pasta_userns_fd:	Descriptor for user namespace to join, -1 once joined
  * @netns_only:		In pasta mode, don't join or create a user namespace
+ * @proc_net_tcp:	Stored handles for /proc/net/tcp{,6} in init and ns
+ * @proc_net_udp:	Stored handles for /proc/net/udp{,6} in init and ns
  * @epollfd:		File descriptor for epoll instance
  * @fd_tap_listen:	File descriptor for listening AF_UNIX socket, if any
  * @fd_tap:		File descriptor for AF_UNIX socket or tuntap device
@@ -155,6 +157,9 @@ struct ctx {
 	int pasta_userns_fd;
 	int netns_only;
 
+	int proc_net_tcp[IP_VERSIONS][2];
+	int proc_net_udp[IP_VERSIONS][2];
+
 	int epollfd;
 	int fd_tap_listen;
 	int fd_tap;
diff --git a/pasta.c b/pasta.c
index bce30d4..972cbcf 100644
--- a/pasta.c
+++ b/pasta.c
@@ -11,9 +11,8 @@
  * Copyright (c) 2020-2021 Red Hat GmbH
  * Author: Stefano Brivio <sbrivio(a)redhat.com>
  *
- * #syscalls:pasta clone unshare waitid kill execve exit_group rt_sigprocmask
- * #syscalls:pasta geteuid getdents64|getdents readlink|readlinkat setsid
- * #syscalls:pasta nanosleep clock_nanosleep
+ * #syscalls:pasta clone waitid exit exit_group rt_sigprocmask
+ * #syscalls:pasta rt_sigreturn|sigreturn ppc64:sigreturn s390x:sigreturn
  */
 
 #include <sched.h>
@@ -40,75 +39,8 @@
 #include "passt.h"
 #include "netlink.h"
 
-/* PID of child, in case we created a namespace, and its procfs link */
+/* PID of child, in case we created a namespace */
 static int pasta_child_pid;
-static char pasta_child_ns[PATH_MAX];
-
-/**
- * pasta_ns_cleanup() - Look for processes in namespace, terminate them
- */
-static void pasta_ns_cleanup(void)
-{
-	char proc_path[PATH_MAX], ns_link[PATH_MAX], buf[BUFSIZ];
-	int recheck = 0, found = 0, waited = 0;
-	int dir_fd, n;
-
-	if (!*pasta_child_ns)
-		return;
-
-loop:
-	if ((dir_fd = open("/proc", O_RDONLY | O_DIRECTORY)) < 0)
-		return;
-
-	while ((n = syscall(SYS_getdents64, dir_fd, buf, BUFSIZ)) > 0) {
-		struct dirent *dp = (struct dirent *)buf;
-		int pos = 0;
-
-		while (dp->d_reclen && pos < n) {
-			pid_t pid;
-
-			errno = 0;
-			pid = strtol(dp->d_name, NULL, 0);
-			if (!pid || errno)
-				goto next;
-
-			snprintf(proc_path, PATH_MAX, "/proc/%i/ns/net", pid);
-			if (readlink(proc_path, ns_link, PATH_MAX) < 0)
-				goto next;
-
-			if (!strncmp(ns_link, pasta_child_ns, PATH_MAX)) {
-				found = 1;
-				if (waited)
-					kill(pid, SIGKILL);
-				else
-					kill(pid, SIGQUIT);
-			}
-next:
-			dp = (struct dirent *)(buf + (pos += dp->d_reclen));
-		}
-	}
-
-	close(dir_fd);
-
-	if (!found)
-		return;
-
-	if (waited) {
-		if (recheck) {
-			info("Some processes in namespace didn't quit");
-		} else {
-			found = 0;
-			recheck = 1;
-			goto loop;
-		}
-		return;
-	}
-
-	info("Waiting for all processes in namespace to terminate");
-	sleep(1);
-	waited = 1;
-	goto loop;
-}
 
 /**
  * pasta_child_handler() - Exit once shell exits (if we started it), reap clones
@@ -120,12 +52,14 @@ void pasta_child_handler(int signal)
 
 	(void)signal;
 
+	if (signal != SIGCHLD)
+		return;
+
 	if (pasta_child_pid &&
 	    !waitid(P_PID, pasta_child_pid, &infop, WEXITED | WNOHANG)) {
-		if (infop.si_pid == pasta_child_pid) {
-			pasta_ns_cleanup();
+		if (infop.si_pid == pasta_child_pid)
 			exit(EXIT_SUCCESS);
-		}
+			/* Nothing to do, detached PID namespace going away */
 	}
 
 	waitid(P_ALL, 0, NULL, WEXITED | WNOHANG);
@@ -163,45 +97,31 @@ netns:
 }
 
 /**
- * pasta_start_ns() - Fork shell in new namespace if target ns is not given
+ * struct pasta_setup_ns_arg - Argument for pasta_setup_ns()
  * @c:		Execution context
+ * @euid:	Effective UID of caller
  */
-void pasta_start_ns(struct ctx *c)
+struct pasta_setup_ns_arg {
+	struct ctx *c;
+	int euid;
+};
+
+/**
+ * pasta_setup_ns() - Map credentials, enable access to ping sockets, run shell
+ * @arg:	See @pasta_setup_ns_arg
+ *
+ * Return: this function never returns
+ */
+static int pasta_setup_ns(void *arg)
 {
-	int euid = geteuid(), fd;
+	struct pasta_setup_ns_arg *a = (struct pasta_setup_ns_arg *)arg;
 	char *shell;
+	int fd;
 
-	c->foreground = 1;
-	if (!c->debug)
-		c->quiet = 1;
-
-	if ((pasta_child_pid = fork()) == -1) {
-		perror("fork");
-		exit(EXIT_FAILURE);
-	}
-
-	if (pasta_child_pid) {
-		char proc_path[PATH_MAX];
-
-		NS_CALL(pasta_wait_for_ns, c);
-
-		snprintf(proc_path, PATH_MAX, "/proc/%i/ns/net",
-			 pasta_child_pid);
-		if (readlink(proc_path, pasta_child_ns, PATH_MAX) < 0)
-			warn("Cannot read link to ns, won't clean up on exit");
-
-		return;
-	}
-
-	if (unshare(CLONE_NEWNET | (c->netns_only ? 0 : CLONE_NEWUSER))) {
-		perror("unshare");
-		exit(EXIT_FAILURE);
-	}
-
-	if (!c->netns_only) {
+	if (!a->c->netns_only) {
 		char buf[BUFSIZ];
 
-		snprintf(buf, BUFSIZ, "%i %i %i", 0, euid, 1);
+		snprintf(buf, BUFSIZ, "%i %i %i", 0, a->euid, 1);
 
 		fd = open("/proc/self/uid_map", O_WRONLY);
 		if (write(fd, buf, strlen(buf)) < 0)
@@ -234,6 +154,39 @@ void pasta_start_ns(struct ctx *c)
 	exit(EXIT_FAILURE);
 }
 
+/**
+ * pasta_start_ns() - Fork shell in new namespace if target ns is not given
+ * @c:		Execution context
+ */
+void pasta_start_ns(struct ctx *c)
+{
+	struct pasta_setup_ns_arg arg = { .c = c, .euid = geteuid() };
+	char ns_fn_stack[NS_FN_STACK_SIZE];
+
+	c->foreground = 1;
+	if (!c->debug)
+		c->quiet = 1;
+
+	pasta_child_pid = clone(pasta_setup_ns,
+				ns_fn_stack + sizeof(ns_fn_stack) / 2,
+				(c->netns_only ? 0 : CLONE_NEWNET) |
+				CLONE_NEWIPC | CLONE_NEWPID | CLONE_NEWUSER |
+				CLONE_NEWUTS,
+				(void *)&arg);
+
+	if (pasta_child_pid == -1) {
+		perror("clone");
+		exit(EXIT_FAILURE);
+	}
+
+	drop_caps();
+
+	if (pasta_child_pid) {
+		NS_CALL(pasta_wait_for_ns, c);
+		return;
+	}
+}
+
 /**
  * pasta_ns_conf() - Set up loopback and tap interfaces in namespace as needed
  * @c:		Execution context
diff --git a/pcap.c b/pcap.c
index e00fc45..9c617ce 100644
--- a/pcap.c
+++ b/pcap.c
@@ -167,9 +167,8 @@ fail:
 /**
  * pcap_init() - Initialise pcap file
  * @c:		Execution context
- * @index:	pcap name index: passt instance number or pasta netns socket
  */
-void pcap_init(struct ctx *c, int index)
+void pcap_init(struct ctx *c)
 {
 	struct timeval tv;
 
@@ -196,7 +195,7 @@ void pcap_init(struct ctx *c, int index)
 		snprintf(name + strlen(PCAP_PREFIX) + strlen(PCAP_ISO8601_STR),
 			 sizeof(name) - strlen(PCAP_PREFIX) -
 					strlen(PCAP_ISO8601_STR),
-			 "_%i.pcap", index);
+			 "_%i.pcap", getpid());
 
 		strncpy(c->pcap, name, PATH_MAX);
 	}
diff --git a/pcap.h b/pcap.h
index 26f4f35..73b5ed8 100644
--- a/pcap.h
+++ b/pcap.h
@@ -6,4 +6,4 @@
 void pcap(char *pkt, size_t len);
 void pcapm(struct msghdr *mh);
 void pcapmm(struct mmsghdr *mmh, unsigned int vlen);
-void pcap_init(struct ctx *c, int sock_index);
+void pcap_init(struct ctx *c);
diff --git a/slirp4netns.sh b/slirp4netns.sh
index 518f581..7c2188d 100755
--- a/slirp4netns.sh
+++ b/slirp4netns.sh
@@ -10,7 +10,7 @@
 #
 # slirp4netns.sh - Compatibility wrapper for pasta, behaving like slirp4netns(1)
 #
-# WARNING: Draft quality, not really tested, --enable-sandbox not supported yet
+# WARNING: Draft quality, not really tested
 #
 # Copyright (c) 2021 Red Hat GmbH
 # Author: Stefano Brivio <sbrivio(a)redhat.com>
diff --git a/tap.c b/tap.c
index 22db9c5..38004a5 100644
--- a/tap.c
+++ b/tap.c
@@ -11,7 +11,6 @@
  * Copyright (c) 2020-2021 Red Hat GmbH
  * Author: Stefano Brivio <sbrivio(a)redhat.com>
  *
- * #syscalls recvfrom sendto
  */
 
 #include <sched.h>
@@ -769,12 +768,10 @@ restart:
 }
 
 /**
- * tap_sock_init_unix() - Create and bind AF_UNIX socket, listen for connection
+ * tap_sock_unix_init() - Create and bind AF_UNIX socket, listen for connection
  * @c:		Execution context
- *
- * #syscalls:passt unlink|unlinkat
  */
-static void tap_sock_init_unix(struct ctx *c)
+static void tap_sock_unix_init(struct ctx *c)
 {
 	int fd = socket(AF_UNIX, SOCK_STREAM, 0), ex;
 	struct epoll_event ev = { 0 };
@@ -783,11 +780,6 @@ static void tap_sock_init_unix(struct ctx *c)
 	};
 	int i, ret;
 
-	if (c->fd_tap_listen != -1) {
-		epoll_ctl(c->epollfd, EPOLL_CTL_DEL, c->fd_tap_listen, &ev);
-		close(c->fd_tap_listen);
-	}
-
 	if (fd < 0) {
 		perror("UNIX socket");
 		exit(EXIT_FAILURE);
@@ -834,8 +826,6 @@ static void tap_sock_init_unix(struct ctx *c)
 	      S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH);
 #endif
 
-	pcap_init(c, i);
-
 	listen(fd, 0);
 
 	ev.data.fd = c->fd_tap_listen = fd;
@@ -852,19 +842,26 @@ static void tap_sock_init_unix(struct ctx *c)
 }
 
 /**
- * tap_sock_accept_unix() - Accept connection on listening socket
+ * tap_sock_unix_new() - Handle new connection on listening socket
  * @c:		Execution context
  */
-static void tap_sock_accept_unix(struct ctx *c)
+static void tap_sock_unix_new(struct ctx *c)
 {
 	struct epoll_event ev = { 0 };
 	int v = INT_MAX / 2;
 
-	c->fd_tap = accept(c->fd_tap_listen, NULL, NULL);
+	/* Another client is already connected: accept and close right away. */
+	if (c->fd_tap != -1) {
+		int discard = accept4(c->fd_tap_listen, NULL, NULL,
+				      SOCK_NONBLOCK);
+
+		if (discard != -1)
+			close(discard);
 
-	epoll_ctl(c->epollfd, EPOLL_CTL_DEL, c->fd_tap_listen, &ev);
-	close(c->fd_tap_listen);
-	c->fd_tap_listen = -1;
+		return;
+	}
+
+	c->fd_tap = accept4(c->fd_tap_listen, NULL, NULL, 0);
 
 	if (!c->low_rmem)
 		setsockopt(c->fd_tap, SOL_SOCKET, SO_RCVBUF, &v, sizeof(v));
@@ -884,8 +881,6 @@ static int tun_ns_fd = -1;
  * @c:		Execution context
  *
  * Return: 0
- *
- * #syscalls:pasta ioctl
  */
 static int tap_ns_tun(void *arg)
 {
@@ -907,7 +902,7 @@ static int tap_ns_tun(void *arg)
  * tap_sock_init_tun() - Set up tuntap file descriptor
  * @c:		Execution context
  */
-static void tap_sock_init_tun(struct ctx *c)
+static void tap_sock_tun_init(struct ctx *c)
 {
 	struct epoll_event ev = { 0 };
 
@@ -919,8 +914,6 @@ static void tap_sock_init_tun(struct ctx *c)
 
 	pasta_ns_conf(c);
 
-	pcap_init(c, c->pasta_netns_fd);
-
 	c->fd_tap = tun_ns_fd;
 
 	ev.data.fd = c->fd_tap;
@@ -937,12 +930,15 @@ void tap_sock_init(struct ctx *c)
 	if (c->fd_tap != -1) {
 		epoll_ctl(c->epollfd, EPOLL_CTL_DEL, c->fd_tap, NULL);
 		close(c->fd_tap);
+		c->fd_tap = -1;
 	}
 
-	if (c->mode == MODE_PASST)
-		tap_sock_init_unix(c);
-	else
-		tap_sock_init_tun(c);
+	if (c->mode == MODE_PASST) {
+		if (c->fd_tap_listen == -1)
+			tap_sock_unix_init(c);
+	} else {
+		tap_sock_tun_init(c);
+	}
 }
 
 /**
@@ -955,18 +951,18 @@ void tap_sock_init(struct ctx *c)
 void tap_handler(struct ctx *c, int fd, uint32_t events, struct timespec *now)
 {
 	if (fd == c->fd_tap_listen && events == EPOLLIN) {
-		tap_sock_accept_unix(c);
+		tap_sock_unix_new(c);
 		return;
 	}
 
 	if (events & (EPOLLRDHUP | EPOLLHUP | EPOLLERR))
-		goto fail;
+		goto reinit;
 
 	if ((c->mode == MODE_PASST && tap_handler_passt(c, now)) ||
 	    (c->mode == MODE_PASTA && tap_handler_pasta(c, now)))
-		goto fail;
+		goto reinit;
 
 	return;
-fail:
+reinit:
 	tap_sock_init(c);
 }
diff --git a/tcp.c b/tcp.c
index 723b18e..e4fac22 100644
--- a/tcp.c
+++ b/tcp.c
@@ -304,7 +304,7 @@
  * - SPLICE_FIN_TO:		FIN (EPOLLRDHUP) seen from connected socket
  * - SPLICE_FIN_BOTH:		FIN (EPOLLRDHUP) seen from both sides
  *
- * #syscalls pipe|pipe2 pipe2
+ * #syscalls:pasta pipe2|pipe fcntl ppc64:fcntl64
  */
 
 #include <sched.h>
@@ -3028,7 +3028,7 @@ static void tcp_conn_from_sock(struct ctx *c, union epoll_ref ref,
  * @ref:	epoll reference
  * @events:	epoll events bitmap
  *
- * #syscalls splice
+ * #syscalls:pasta splice
  */
 void tcp_sock_handler_splice(struct ctx *c, union epoll_ref ref,
 			     uint32_t events)
@@ -3374,7 +3374,7 @@ static void tcp_set_pipe_size(struct ctx *c)
 
 smaller:
 	for (i = 0; i < TCP_SPLICE_PIPE_POOL_SIZE * 2; i++) {
-		if (pipe(probe_pipe[i])) {
+		if (pipe2(probe_pipe[i], 0)) {
 			i++;
 			break;
 		}
@@ -3493,7 +3493,7 @@ static void tcp_sock_init_one(struct ctx *c, int ns, in_port_t port)
  * tcp_sock_init_ns() - Bind sockets in namespace for inbound connections
  * @arg:	Execution context
  *
- * Return: 0 on success, -1 on failure
+ * Return: 0
  */
 static int tcp_sock_init_ns(void *arg)
 {
@@ -3560,8 +3560,7 @@ static int tcp_sock_refill(void *arg)
 	int i, *p4, *p6;
 
 	if (a->ns) {
-		if (ns_enter(a->c))
-			return 0;
+		ns_enter(a->c);
 		p4 = ns_sock_pool4;
 		p6 = ns_sock_pool6;
 	} else {
@@ -3594,8 +3593,6 @@ static int tcp_sock_refill(void *arg)
  * @c:		Execution context
  *
  * Return: 0 on success, -1 on failure
- *
- * #syscalls getrandom
  */
 int tcp_sock_init(struct ctx *c, struct timespec *now)
 {
diff --git a/test/demo/passt b/test/demo/passt
index b5762aa..76aac86 100644
--- a/test/demo/passt
+++ b/test/demo/passt
@@ -84,7 +84,8 @@ say	Now let's run 'passt' in the new namespace, and
 nl
 say	  enter this namespace from the guest terminal too.
 sleep	3
-pout	TARGET_PID echo $$
+guest	pstree -p | grep pasta
+gout	TARGET_PID pstree -p | grep pasta | sed -n 's/.*(\([0-9].*\))$/\1/p'
 sleep	1
 
 passtb	./passt -f -t 5201,5203
diff --git a/test/demo/pasta b/test/demo/pasta
index f8f0cd0..c136965 100644
--- a/test/demo/pasta
+++ b/test/demo/pasta
@@ -58,7 +58,8 @@ say	For convenience, let's enter this namespace
 nl
 say	  from another terminal.
 sleep	3
-pout	TARGET_PID echo $$
+ns	pstree -p | grep pasta
+nsout	TARGET_PID pstree -p | grep pasta | sed -n 's/.*(\([0-9].*\))$/\1/p'
 sleep	1
 
 ns	nsenter -t __TARGET_PID__ -U -n --preserve-credentials
@@ -172,7 +173,7 @@ sleep	2
 passtb	perf record -g ./pasta
 sleep	2
 
-pout	TARGET_PID echo $$
+nsout	TARGET_PID pstree -p | grep pasta | sed -n 's/.*(\([0-9].*\))$/\1/p'
 sleep	1
 ns	nsenter -t __TARGET_PID__ -U -n --preserve-credentials
 sleep	5
diff --git a/test/lib/setup b/test/lib/setup
index ab51787..df21655 100755
--- a/test/lib/setup
+++ b/test/lib/setup
@@ -115,13 +115,14 @@ setup_passt_in_ns() {
 	[ ${PCAP} -eq 1 ] && __opts="${__opts} -p /tmp/pasta_with_passt.pcap"
 	[ ${DEBUG} -eq 1 ] && __opts="${__opts} -d"
 
-	pane_run PASST "./pasta ${__opts} -t 10001,10002,10011,10012 -T 10003,10013 -u 10001,10002,10011,10012 -U 10003,10013"
+	__pid_file="$(mktemp)"
+	pane_run PASST "./pasta ${__opts} -t 10001,10002,10011,10012 -T 10003,10013 -u 10001,10002,10011,10012 -U 10003,10013 -P ${__pid_file}"
 	sleep 1
 	pane_run PASST ''
 	pane_wait PASST
-	pane_run PASST 'echo $$'
-	pane_wait PASST
-	__ns_pid="$(pane_parse PASST)"
+	__pasta_pid="$(cat "${__pid_file}")"
+	__ns_pid="$(cat /proc/${__pasta_pid}/task/${__pasta_pid}/children | cut -f1 -d' ')"
+	rm "${__pid_file}"
 
 	pane_run GUEST "nsenter -t ${__ns_pid} -U -n --preserve-credentials"
 	pane_run NS "nsenter -t ${__ns_pid} -U -n --preserve-credentials"
@@ -172,15 +173,18 @@ setup_two_guests() {
 	#  10004            | as server |  to init  |  to guest  |  to ns #2
 	#  10005            |           |           |  as server |  to ns #2
 
+	__pid1_file="$(mktemp)"
+	__pid2_file="$(mktemp)"
+
 	__opts=
 	[ ${PCAP} -eq 1 ] && __opts="${__opts} -p /tmp/pasta_1.pcap"
 	[ ${DEBUG} -eq 1 ] && __opts="${__opts} -d"
-	pane_run PASST_1 "./pasta ${__opts} -t 10001,10002 -T 10003,10004 -u 10001,10002 -U 10003,10004"
+	pane_run PASST_1 "./pasta ${__opts} -P ${__pid1_file} -t 10001,10002 -T 10003,10004 -u 10001,10002 -U 10003,10004"
 
 	__opts=
 	[ ${PCAP} -eq 1 ] && __opts="${__opts} -p /tmp/pasta_2.pcap"
 	[ ${DEBUG} -eq 1 ] && __opts="${__opts} -d"
-	pane_run PASST_2 "./pasta ${__opts} -t 10004,10005 -T 10003,10001 -u 10004,10005 -U 10003,10001"
+	pane_run PASST_2 "./pasta ${__opts} -P ${__pid2_file} -t 10004,10005 -T 10003,10001 -u 10004,10005 -U 10003,10001"
 
 	sleep 1
 	pane_run PASST_1 ''
@@ -188,12 +192,12 @@ setup_two_guests() {
 
 	pane_wait PASST_1
 	pane_wait PASST_2
-	pane_run PASST_1 'echo $$'
-	pane_run PASST_2 'echo $$'
-	pane_wait PASST_1
-	pane_wait PASST_2
-	__ns1_pid="$(pane_parse PASST_1)"
-	__ns2_pid="$(pane_parse PASST_2)"
+	__pasta1_pid="$(cat "${__pid1_file}")"
+	__ns1_pid="$(cat /proc/${__pasta1_pid}/task/${__pasta1_pid}/children | cut -f1 -d' ')"
+	rm "${__pid1_file}"
+	__pasta2_pid="$(cat "${__pid2_file}")"
+	__ns2_pid="$(cat /proc/${__pasta2_pid}/task/${__pasta2_pid}/children | cut -f1 -d' ')"
+	rm "${__pid2_file}"
 
 	pane_run GUEST_1 "nsenter -t ${__ns1_pid} -U -n --preserve-credentials"
 	pane_run GUEST_2 "nsenter -t ${__ns2_pid} -U -n --preserve-credentials"
diff --git a/udp.c b/udp.c
index e1a9ecb..348f695 100644
--- a/udp.c
+++ b/udp.c
@@ -529,7 +529,9 @@ static int udp_splice_connect_ns(void *arg)
 
 	a = (struct udp_splice_connect_ns_arg *)arg;
 
-	ns_enter(a->c);
+	if (ns_enter(a->c))
+		return 0;
+
 	a->s = udp_splice_connect(a->c, a->v6, a->bound_sock, a->src, a->dst,
 				  UDP_BACK_TO_INIT);
 
@@ -1029,7 +1031,8 @@ int udp_sock_init_ns(void *arg)
 	struct ctx *c = (struct ctx *)arg;
 	int dst;
 
-	ns_enter(c);
+	if (ns_enter(c))
+		return 0;
 
 	for (dst = 0; dst < USHRT_MAX; dst++) {
 		if (!bitmap_isset(c->udp.port_to_init, dst))
diff --git a/util.c b/util.c
index 94d49a6..e9fca3b 100644
--- a/util.c
+++ b/util.c
@@ -16,6 +16,7 @@
 #include <stdio.h>
 #include <stdint.h>
 #include <stddef.h>
+#include <stdlib.h>
 #include <unistd.h>
 #include <arpa/inet.h>
 #include <net/ethernet.h>
@@ -23,6 +24,7 @@
 #include <netinet/tcp.h>
 #include <netinet/udp.h>
 #include <sys/epoll.h>
+#include <sys/prctl.h>
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <fcntl.h>
@@ -32,6 +34,8 @@
 #include <time.h>
 #include <errno.h>
 
+#include <linux/capability.h>
+
 #include "util.h"
 #include "passt.h"
 
@@ -431,31 +435,51 @@ char *line_read(char *buf, size_t len, int fd)
 
 /**
  * procfs_scan_listen() - Set bits for listening TCP or UDP sockets from procfs
- * @name:	Corresponding name of file under /proc/net/
+ * @proto:	IPPROTO_TCP or IPPROTO_UDP
+ * @ip_version:	IP version, V4 or V6
+ * @ns:		Use saved file descriptors for namespace if set
  * @map:	Bitmap where numbers of ports in listening state will be set
  * @exclude:	Bitmap of ports to exclude from setting (and clear)
+ *
+ * #syscalls:pasta lseek ppc64le:_llseek ppc64:_llseek
  */
-void procfs_scan_listen(char *name, uint8_t *map, uint8_t *exclude)
+void procfs_scan_listen(struct ctx *c, uint8_t proto, int ip_version, int ns,
+			uint8_t *map, uint8_t *exclude)
 {
-	char line[BUFSIZ], path[PATH_MAX];
+	char line[BUFSIZ], *path;
 	unsigned long port;
 	unsigned int state;
-	int fd;
+	int *fd;
 
-	snprintf(path, PATH_MAX, "/proc/net/%s", name);
-	if ((fd = open(path, O_RDONLY)) < 0)
+	if (proto == IPPROTO_TCP) {
+		fd = &c->proc_net_tcp[ip_version][ns];
+		if (ip_version == V4)
+			path = "/proc/net/tcp";
+		else
+			path = "/proc/net/tcp6";
+	} else {
+		fd = &c->proc_net_udp[ip_version][ns];
+		if (ip_version == V4)
+			path = "/proc/net/udp";
+		else
+			path = "/proc/net/udp6";
+	}
+
+	if (*fd != -1)
+		lseek(*fd, 0, SEEK_SET);
+	else if ((*fd = open(path, O_RDONLY)) < 0)
 		return;
 
 	*line = 0;
-	line_read(line, sizeof(line), fd);
-	while (line_read(line, sizeof(line), fd)) {
+	line_read(line, sizeof(line), *fd);
+	while (line_read(line, sizeof(line), *fd)) {
 		/* NOLINTNEXTLINE(cert-err34-c): != 2 if conversion fails */
 		if (sscanf(line, "%*u: %*x:%lx %*x:%*x %x", &port, &state) != 2)
 			continue;
 
 		/* See enum in kernel's include/net/tcp_states.h */
-		if ((strstr(name, "tcp") && state != 0x0a) ||
-		    (strstr(name, "udp") && state != 0x07))
+		if ((proto == IPPROTO_TCP && state != 0x0a) ||
+		    (proto == IPPROTO_UDP && state != 0x07))
 			continue;
 
 		if (bitmap_isset(exclude, port))
@@ -463,25 +487,98 @@ void procfs_scan_listen(char *name, uint8_t *map, uint8_t *exclude)
 		else
 			bitmap_set(map, port);
 	}
+}
 
-	close(fd);
+/**
+ * drop_caps() - Drop capabilities we might have except for CAP_NET_BIND_SERVICE
+ */
+void drop_caps(void)
+{
+	int i;
+
+	for (i = 0; i < 64; i++) {
+		if (i == CAP_NET_BIND_SERVICE)
+			continue;
+
+		prctl(PR_CAPBSET_DROP, i, 0, 0, 0);
+	}
 }
 
 /**
- * ns_enter() - Enter configured network and user namespaces
+ * ns_enter() - Enter configured user (unless already joined) and network ns
  * @c:		Execution context
  *
- * Return: 0 on success, -1 on failure
+ * Return: 0, won't return on failure
  *
  * #syscalls:pasta setns
  */
 int ns_enter(struct ctx *c)
 {
-	if (!c->netns_only && setns(c->pasta_userns_fd, CLONE_NEWUSER))
-		return -errno;
+	if (!c->netns_only &&
+	    c->pasta_userns_fd != -1 &&
+	    setns(c->pasta_userns_fd, CLONE_NEWUSER))
+		exit(EXIT_FAILURE);
 
 	if (setns(c->pasta_netns_fd, CLONE_NEWNET))
-		return -errno;
+		exit(EXIT_FAILURE);
+
+	return 0;
+}
+
+/**
+ * write_pidfile() - Write PID to file, if requested to do so, and close it
+ * @fd:		Open PID file descriptor, closed on exit, -1 to skip writing it
+ * @pid:	PID value to write
+ */
+void write_pidfile(int fd, pid_t pid) {
+	char pid_buf[12];
+	int n;
+
+	if (fd == -1)
+		return;
+
+	n = snprintf(pid_buf, sizeof(pid_buf), "%i\n", pid);
+
+	if (write(fd, pid_buf, n) < 0) {
+		perror("PID file write");
+		exit(EXIT_FAILURE);
+	}
+
+	close(fd);
+}
+
+/**
+ * __daemon() - daemon()-like function writing PID file before parent exits
+ * @pidfile_fd:	Open PID file descriptor
+ * @devnull_fd:	Open file descriptor for /dev/null
+ *
+ * Return: child PID on success, won't return on failure
+ */
+int __daemon(int pidfile_fd, int devnull_fd)
+{
+	pid_t pid = fork();
+
+	if (pid == -1) {
+		perror("fork");
+		exit(EXIT_FAILURE);
+	}
+
+	if (pid) {
+		write_pidfile(pidfile_fd, pid);
+		exit(EXIT_SUCCESS);
+	}
+
+	errno = 0;
+
+	setsid();
+
+	dup2(devnull_fd, STDIN_FILENO);
+	dup2(devnull_fd, STDOUT_FILENO);
+	dup2(devnull_fd, STDERR_FILENO);
+	close(devnull_fd);
+
+	if (errno)
+		exit(EXIT_FAILURE);
 
 	return 0;
 }
diff --git a/util.h b/util.h
index add4c1e..b7852e9 100644
--- a/util.h
+++ b/util.h
@@ -54,6 +54,12 @@ void debug(const char *format, ...);
 #define STRINGIFY(x)	#x
 #define STR(x)		STRINGIFY(x)
 
+#ifdef P_tmpdir
+#define TMPDIR		P_tmpdir
+#else
+#define TMPDIR		"/tmp"
+#endif
+
 #define V4		0
 #define V6		1
 #define IP_VERSIONS	2
@@ -202,5 +208,9 @@ void bitmap_set(uint8_t *map, int bit);
 void bitmap_clear(uint8_t *map, int bit);
 int bitmap_isset(const uint8_t *map, int bit);
 char *line_read(char *buf, size_t len, int fd);
-void procfs_scan_listen(char *name, uint8_t *map, uint8_t *exclude);
+void procfs_scan_listen(struct ctx *c, uint8_t proto, int ip_version, int ns,
+			uint8_t *map, uint8_t *exclude);
+void drop_caps(void);
 int ns_enter(struct ctx *c);
+void write_pidfile(int fd, pid_t pid);
+int __daemon(int pidfile_fd, int devnull_fd);
-- 
2.34.1



* [PATCH 04/18] passt: Make process not dumpable after sandboxing
  2022-02-22  1:34 [PATCH 00/18] slirp4netns, sandboxing, Podman integration, assorted fixes Stefano Brivio
                   ` (2 preceding siblings ...)
  2022-02-22  1:34 ` [PATCH 03/18] passt, pasta: Namespace-based sandboxing, defer seccomp policy application Stefano Brivio
@ 2022-02-22  1:34 ` Stefano Brivio
  2022-02-22  1:34 ` [PATCH 05/18] Makefile, conf, passt: Drop passt4netns references, explicit argc check Stefano Brivio
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Stefano Brivio @ 2022-02-22  1:34 UTC (permalink / raw)
  To: passt-dev

[-- Attachment #1: Type: text/plain, Size: 729 bytes --]

Two effects:

- ptrace() on passt and pasta can only be done by root, so that even
  if somebody gains access to the same user, they won't be able to
  inspect data passed in syscalls. Core dumps are not allowed either

- /proc/PID files are owned by root:root, and they can't be read by
  the user passt or pasta is running as
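
For reference, the effect can be checked in isolation with a minimal
sketch like the one below. It is not part of this patch:
make_undumpable() is a hypothetical helper name, and the sketch only
assumes the prctl(2) options PR_SET_DUMPABLE and PR_GET_DUMPABLE.

```c
#include <sys/prctl.h>

/* Hypothetical helper, not part of passt: clear the "dumpable" attribute
 * of the calling process and report the resulting value.
 *
 * Return: 0 if the process is not dumpable anymore, -1 if prctl() failed,
 *	   a positive value if the attribute is unexpectedly still set
 */
static int make_undumpable(void)
{
	if (prctl(PR_SET_DUMPABLE, 0, 0, 0, 0))
		return -1;

	/* PR_GET_DUMPABLE returns the current value of the attribute */
	return prctl(PR_GET_DUMPABLE, 0, 0, 0, 0);
}
```

Calling make_undumpable() early in main() and then listing
/proc/PID as the same user should show the ownership change described
above.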

Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>
---
 passt.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/passt.c b/passt.c
index 508d525..b5086d8 100644
--- a/passt.c
+++ b/passt.c
@@ -406,6 +406,8 @@ int main(int argc, char **argv)
 	else
 		write_pidfile(pidfile_fd, getpid());
 
+	prctl(PR_SET_DUMPABLE, 0);
+
 	seccomp(&c);
 
 	timer_init(&c, &now);
-- 
2.34.1



* [PATCH 05/18] Makefile, conf, passt: Drop passt4netns references, explicit argc check
  2022-02-22  1:34 [PATCH 00/18] slirp4netns, sandboxing, Podman integration, assorted fixes Stefano Brivio
                   ` (3 preceding siblings ...)
  2022-02-22  1:34 ` [PATCH 04/18] passt: Make process not dumpable after sandboxing Stefano Brivio
@ 2022-02-22  1:34 ` Stefano Brivio
  2022-02-22  1:34 ` [PATCH 06/18] slirp4netns.sh: Implement API socket option for port forwarding Stefano Brivio
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Stefano Brivio @ 2022-02-22  1:34 UTC (permalink / raw)
  To: passt-dev

[-- Attachment #1: Type: text/plain, Size: 4231 bytes --]

Nobody currently calls this as passt4netns: that was the name I used
before 'pasta'. Drop any reference before it's too late.

While at it, explicitly check that argc is greater than or equal to
one, just as a defensive measure: argv[0] being NULL is not an issue
anyway.
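
The resulting name-based mode selection can be sketched as a standalone
function, roughly as follows. This is an illustration only:
mode_from_argv() and MODE_NONE are hypothetical names (the patch exits
instead of returning a value), and the NULL check on argv[0] is an extra
precaution that the patch itself doesn't add.

```c
#include <stddef.h>
#include <string.h>

/* MODE_PASST and MODE_PASTA mirror the names used in passt.c, MODE_NONE is
 * a hypothetical value standing in for the exit(EXIT_FAILURE) paths.
 */
enum mode { MODE_NONE, MODE_PASST, MODE_PASTA };

/* Hypothetical helper: pick operation mode from the name we're called with */
static enum mode mode_from_argv(int argc, char **argv)
{
	if (argc < 1 || !argv[0])
		return MODE_NONE;

	/* Check for "pasta" first, same order as in the patch */
	if (strstr(argv[0], "pasta"))
		return MODE_PASTA;

	if (strstr(argv[0], "passt"))
		return MODE_PASST;

	return MODE_NONE;
}
```

With this, "./pasta" selects MODE_PASTA, "/usr/bin/passt" selects
MODE_PASST, and any other name (or an empty argument vector) is
rejected.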

Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>
---
 Makefile |  7 ++-----
 conf.c   | 12 ++++++------
 passt.c  |  9 +++++++--
 3 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/Makefile b/Makefile
index 5085578..8477cf0 100644
--- a/Makefile
+++ b/Makefile
@@ -62,7 +62,7 @@ endif
 
 prefix ?= /usr/local
 
-all: passt pasta passt4netns qrap
+all: passt pasta qrap
 
 avx2: CFLAGS += -Ofast -mavx2 -ftree-vectorize -funroll-loops
 avx2: clean all
@@ -81,16 +81,13 @@ pasta: passt
 	ln -s passt pasta
 	ln -s passt.1 pasta.1
 
-passt4netns: passt
-	ln -s passt passt4netns
-
 qrap: qrap.c passt.h
 	$(CC) $(CFLAGS) \
 		qrap.c -o qrap
 
 .PHONY: clean
 clean:
-	-${RM} passt *.o seccomp.h qrap pasta pasta.1 passt4netns \
+	-${RM} passt *.o seccomp.h qrap pasta pasta.1 \
 		passt.tar passt.tar.gz *.deb *.rpm
 
 install: passt pasta qrap
diff --git a/conf.c b/conf.c
index 732d918..2984ac2 100644
--- a/conf.c
+++ b/conf.c
@@ -532,7 +532,7 @@ static void conf_ip(struct ctx *c)
  */
 static void usage(const char *name)
 {
-	if (strstr(name, "pasta") || strstr(name, "passt4netns")) {
+	if (strstr(name, "pasta")) {
 		info("Usage: %s [OPTION]... [PID|PATH|NAME]", name);
 		info("");
 		info("Without PID|PATH|NAME, run the default shell in a new");
@@ -550,7 +550,7 @@ static void usage(const char *name)
 	info(   "    default: log to system logger only if started from a TTY");
 	info(   "  -h, --help		Display this help message and exit");
 
-	if (strstr(name, "pasta") || strstr(name, "passt4netns")) {
+	if (strstr(name, "pasta")) {
 		info(   "  -I, --ns-ifname NAME	namespace interface name");
 		info(   "    default: same interface name as external one");
 	} else {
@@ -562,7 +562,7 @@ static void usage(const char *name)
 	info(   "  -p, --pcap [FILE]	Log tap-facing traffic to pcap file");
 	info(   "    if FILE is not given, log to:");
 
-	if (strstr(name, "pasta") || strstr(name, "passt4netns"))
+	if (strstr(name, "pasta"))
 		info("      /tmp/pasta_ISO8601-TIMESTAMP_PID.pcap");
 	else
 		info("      /tmp/passt_ISO8601-TIMESTAMP_PID.pcap");
@@ -586,14 +586,14 @@ static void usage(const char *name)
 	info(   "  -D, --dns ADDR	Pass IPv4 or IPv6 address as DNS");
 	info(   "    can be specified multiple times");
 	info(   "    a single, empty option disables DNS information");
-	if (strstr(name, "pasta") || strstr(name, "passt4netns"))
+	if (strstr(name, "pasta"))
 		info(   "    default: don't send any addresses");
 	else
 		info(   "    default: use addresses from /etc/resolv.conf");
 
 	info(   "  -S, --search LIST	Space-separated list, search domains");
 	info(   "    a single, empty option disables the DNS search list");
-	if (strstr(name, "pasta") || strstr(name, "passt4netns"))
+	if (strstr(name, "pasta"))
 		info(   "    default: don't send any search list");
 	else
 		info(   "    default: use search list from /etc/resolv.conf");
@@ -609,7 +609,7 @@ static void usage(const char *name)
 	info(   "  -4, --ipv4-only	Enable IPv4 operation only");
 	info(   "  -6, --ipv6-only	Enable IPv6 operation only");
 
-	if (strstr(name, "pasta") || strstr(name, "passt4netns"))
+	if (strstr(name, "pasta"))
 		goto pasta_opts;
 
 	info(   "  -t, --tcp-ports SPEC	TCP port forwarding to guest");
diff --git a/passt.c b/passt.c
index b5086d8..67ad1c7 100644
--- a/passt.c
+++ b/passt.c
@@ -322,16 +322,21 @@ int main(int argc, char **argv)
 	sigaction(SIGTERM, &sa, NULL);
 	sigaction(SIGQUIT, &sa, NULL);
 
-	if (strstr(argv[0], "pasta") || strstr(argv[0], "passt4netns")) {
+	if (argc < 1)
+		exit(EXIT_FAILURE);
+
+	if (strstr(argv[0], "pasta")) {
 		sa.sa_handler = pasta_child_handler;
 		sigaction(SIGCHLD, &sa, NULL);
 		signal(SIGPIPE, SIG_IGN);
 
 		c.mode = MODE_PASTA;
 		log_name = "pasta";
-	} else {
+	} else if (strstr(argv[0], "passt")) {
 		c.mode = MODE_PASST;
 		log_name = "passt";
+	} else {
+		exit(EXIT_FAILURE);
 	}
 
 	if (madvise(pkt_buf, TAP_BUF_BYTES, MADV_HUGEPAGE))
-- 
2.34.1



* [PATCH 06/18] slirp4netns.sh: Implement API socket option for port forwarding
  2022-02-22  1:34 [PATCH 00/18] slirp4netns, sandboxing, Podman integration, assorted fixes Stefano Brivio
                   ` (4 preceding siblings ...)
  2022-02-22  1:34 ` [PATCH 05/18] Makefile, conf, passt: Drop passt4netns references, explicit argc check Stefano Brivio
@ 2022-02-22  1:34 ` Stefano Brivio
  2022-02-22  1:34 ` [PATCH 07/18] conf: Don't print configuration on --quiet Stefano Brivio
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Stefano Brivio @ 2022-02-22  1:34 UTC (permalink / raw)
  To: passt-dev

[-- Attachment #1: Type: text/plain, Size: 7037 bytes --]

Introduce the equivalent of the --api-socket option from slirp4netns:
spawn a subshell to handle requests, netcat binds to a UNIX domain
socket and jq parses messages.

Three minor differences compared to slirp4netns:

- IPv6 ports are forwarded too

- error messages are not as specific, for example we don't tell
  apart malformed JSON requests from invalid parameters

- host addresses are always 0.0.0.0 and ::1, pasta doesn't bind on
  specific addresses for different ports
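
As a rough illustration of how an add_hostfwd request maps to pasta
options, the validation done by api_insert() in the script can be
restated in C along these lines. This is a sketch, not code from the
series: hostfwd_to_opt() and its return convention are hypothetical,
only the -t/-u options and the 0-65535 port range come from the script
below.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical C restatement of the checks in api_insert(): build the pasta
 * port forwarding option for one add_hostfwd request.
 *
 * Return: 0 with option in @buf, 1 if a port is out of range (the entry is
 *	   accepted but forwards nothing), -1 on unknown protocol
 */
static int hostfwd_to_opt(const char *proto, long host_port, long guest_port,
			  char *buf, size_t len)
{
	const char *opt;

	if (!strcmp(proto, "tcp"))
		opt = "-t";
	else if (!strcmp(proto, "udp"))
		opt = "-u";
	else
		return -1;	/* reported as "bad arguments.proto" */

	if (host_port < 0 || host_port > 65535 ||
	    guest_port < 0 || guest_port > 65535) {
		if (len)
			*buf = 0;	/* stored as an empty entry */
		return 1;
	}

	snprintf(buf, len, "%s %li:%li", opt, host_port, guest_port);
	return 0;
}
```

For example, a request with proto "tcp", host_port 8080 and guest_port
80 yields "-t 8080:80", which the script appends to PORT_ARGS before
restarting pasta.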

Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>
---
 slirp4netns.sh | 189 +++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 182 insertions(+), 7 deletions(-)

diff --git a/slirp4netns.sh b/slirp4netns.sh
index 7c2188d..1784926 100755
--- a/slirp4netns.sh
+++ b/slirp4netns.sh
@@ -12,13 +12,20 @@
 #
 # WARNING: Draft quality, not really tested
 #
-# Copyright (c) 2021 Red Hat GmbH
+# Copyright (c) 2021-2022 Red Hat GmbH
 # Author: Stefano Brivio <sbrivio(a)redhat.com>
 
 PASTA_PID="$(mktemp)"
 PASTA_OPTS="-q --ipv4-only -a 10.0.2.0 -n 24 -g 10.0.2.2 -m 1500 --no-ndp --no-dhcpv6 --no-dhcp -P ${PASTA_PID}"
 PASTA="$(command -v ./pasta || command -v pasta || :)"
 
+API_SOCKET=
+API_DIR="$(mktemp -d)"
+PORTS_DIR="${API_DIR}/ports"
+FIFO_REQ="${API_DIR}/req.fifo"
+FIFO_RESP="${API_DIR}/resp.fifo"
+PORT_ARGS=
+
 USAGE_RET=1
 NOTFOUND_RET=127
 
@@ -112,6 +119,172 @@ opt() {
 	esac
 }
 
+# start() - Start pasta
+start() {
+	${PASTA} ${PASTA_OPTS} ${PORT_ARGS} ${ns_spec}
+	[ ${RFD} -ne 0 ] && echo "1" >&${RFD} || :
+}
+
+# stop() - Terminate pasta process
+stop() {
+	kill $(cat ${PASTA_PID})
+}
+
+# api_insert() - Handle add_hostfwd request, update PORT_ARGS
+# $1:	Protocol, "tcp" or "udp"
+# $2:	Host port
+# $3:	Guest port
+api_insert() {
+	__id=
+	__next_id=1	# slirp4netns starts from ID 1
+	PORT_ARGS=
+
+	for __entry in $(ls ${PORTS_DIR}); do
+		PORT_ARGS="${PORT_ARGS} $(cat "${PORTS_DIR}/${__entry}")"
+
+		if [ -z "${__id}" ] && [ ${__entry} -ne ${__next_id} ]; then
+			__id=${__next_id}
+		fi
+
+		__next_id=$((__next_id + 1))
+	done
+	[ -z "${__id}" ] && __id=${__next_id}
+
+	# Invalid ports are accepted by slirp4netns, store them as empty files.
+	# Unknown protocols aren't.
+
+	case ${1} in
+	"tcp") opt="-t" ;;
+	"udp") opt="-u" ;;
+	*)
+		echo '{"error":{"desc":"bad request: add_hostfwd: bad arguments.proto"}}'
+		return
+		;;
+	esac
+
+	if [ ${2} -ge 0 ] && [ ${2} -le 65535 ] && \
+	   [ ${3} -ge 0 ] && [ ${3} -le 65535 ]; then
+		echo "${opt} ${2}:${3}" > "${PORTS_DIR}/${__id}"
+		PORT_ARGS="${PORT_ARGS} ${opt} ${2}:${3}"
+	else
+		:> "${PORTS_DIR}/${__id}"
+	fi
+
+	echo "{ \"return\": {\"id\": ${__id}}}"
+
+	NEED_RESTART=1
+}
+
+# api_list_one() - Print a single port forwarding entry in JSON
+# $1:	ID
+# $2:	protocol option, -t or -u
+# $3:	host port
+# $4:	guest port
+api_list_one() {
+	[ "${2}" = "-t" ] && __proto="tcp" || __proto="udp"
+
+	printf '{"id": %i, "proto": "%s", "host_addr": "0.0.0.0", "host_port": %i, "guest_addr": "%s", "guest_port": %i}' \
+		"${1}" "${__proto}" "${3}" "${A4}" "${4}"
+}
+
+# api_list() - Handle list_hostfwd request: list port forwarding entries in JSON
+api_list() {
+	printf '{ "return": {"entries": ['
+
+	__first=1
+	for __entry in $(ls "${PORTS_DIR}"); do
+		[ ${__first} -eq 0 ] && printf ", " || __first=0
+		IFS=' :'
+		api_list_one ${__entry} $(cat ${PORTS_DIR}/${__entry})
+		unset IFS
+	done
+
+	printf ']}}'
+}
+
+# api_delete() - Handle remove_hostfwd request: delete entry, update PORT_ARGS
+# $1:	Entry ID -- caller *must* ensure it's a number
+api_delete() {
+	if [ ! -f "${PORTS_DIR}/${1}" ]; then
+		printf '{"error":{"desc":"bad request: remove_hostfwd: bad arguments.id"}}'
+		return
+	fi
+
+	rm "${PORTS_DIR}/${1}"
+
+	PORT_ARGS=
+	for __entry in $(ls ${PORTS_DIR}); do
+		PORT_ARGS="${PORT_ARGS} $(cat "${PORTS_DIR}/${__entry}")"
+	done
+
+	printf '{"return":{}}'
+
+	NEED_RESTART=1
+}
+
+# api_error() - Print generic error in JSON
+api_error() {
+	printf '{"error":{"desc":"bad request"}}'
+}
+
+# api_handler() - Entry point for slirp4netns-like API socket handler
+api_handler() {
+	trap 'exit 0' INT QUIT TERM
+	mkdir "${PORTS_DIR}"
+
+	while true; do
+		mkfifo "${FIFO_REQ}" "${FIFO_RESP}"
+
+		cat "${FIFO_RESP}" | nc -l -U "${API_SOCKET}" | \
+			tee /dev/null >"${FIFO_REQ}" & READER_PID=${!}
+
+		__req="$(dd count=1 2>/dev/null <${FIFO_REQ})"
+
+		>&2 echo "apifd event"
+		>&2 echo "api_handler: got request: ${__req}"
+
+		eval $(echo "${__req}" |
+			(jq -r 'to_entries | .[0] |
+			 .key + "=" + (.value | @sh)' ||
+			 printf 'execute=ERR'))
+
+		if [ "${execute}" != "list_hostfwd" ]; then
+			eval $(echo "${__req}" |
+				(jq -r '.arguments | to_entries | .[] |
+				 .key + "=" + (.value | @sh)' ||
+				 printf 'execute=ERR'))
+		fi
+
+		NEED_RESTART=0
+		case ${execute} in
+		"add_hostfwd")
+			api_insert "${proto}" "${host_port}" "${guest_port}"
+			__restart=1
+			;;
+		"list_hostfwd")
+			api_list
+			;;
+		"remove_hostfwd")
+			case ${id} in
+			''|*[!0-9]*)	api_error ;;
+			*)		api_delete "${id}"; __restart=1 ;;
+			esac
+			;;
+		*)
+			api_error
+			;;
+		esac >"${FIFO_RESP}"
+
+		kill ${READER_PID}
+
+		rm "${FIFO_REQ}" "${FIFO_RESP}"
+
+		[ ${NEED_RESTART} -eq 1 ] && { stop; start; }
+	done
+
+	exit 0
+}
+
 # usage() - Print slirpnetns(1) usage and exit indicating failure
 # $1:	Invalid option name, if any
 usage() {
@@ -177,7 +350,7 @@ while getopts ce:r:m:6a:hv-: OPT 2>/dev/null; do
 	r | ready-fd)		opt u32 RFD				      ;;
 	m | mtu)		opt mtu MTU && sub -m ${MTU}		      ;;
 	6 | enable-ipv6)	V6=1					      ;;
-	a | api-socket)		opt str API				      ;;
+	a | api-socket)		opt str API_SOCKET			      ;;
 	cidr)			opt net4 A4 M4 && sub -a ${A4} -n ${M4}	      ;;
 	disable-host-loopback)	add "--no-map-gw" && no_map_gw=1	      ;;
 	netns-type)		: Autodetected				      ;;
@@ -203,14 +376,15 @@ if [ ${v6} -eq 1 ]; then
 	add "-a $(gen_addr6) -g fd00::2 -D fd00::3"
 fi
 
-${PASTA} ${PASTA_OPTS} ${ns_spec} && \
-	[ ${RFD} -ne 0 ] && echo "1" >&${RFD}
+start
+[ -n "${API_SOCKET}" ] && api_handler </dev/null &
+trap "stop; rm -rf ${API_DIR}; rm -f ${API_SOCKET}; rm ${PASTA_PID}" EXIT
+trap 'exit 0' INT QUIT TERM
 
-trap "kill $(cat ${PASTA_PID}); rm ${PASTA_PID}" INT TERM EXIT
+>&2 echo "sent tapfd=5 for ${ifname}"
+>&2 echo "received tapfd=5"
 
 cat << EOF
-sent tapfd=5 for ${ifname}
-received tapfd=5
 Starting slirp
 * MTU:             ${MTU}
 * Network:         ${A4}
@@ -219,6 +393,7 @@ Starting slirp
 * DNS:             10.0.2.3
 * Recommended IP:  10.0.2.100
 EOF
+[ -n "${API_SOCKET}" ] && echo "* API socket:      ${API_SOCKET}"
 
 if [ ${no_map_gw} -eq 0 ]; then
 	echo "WARNING: 127.0.0.1:* on the host is accessible as 10.0.2.2 (set --disable-host-loopback to prohibit connecting to 127.0.0.1:*)"
-- 
2.34.1



* [PATCH 07/18] conf: Don't print configuration on --quiet
  2022-02-22  1:34 [PATCH 00/18] slirp4netns, sandboxing, Podman integration, assorted fixes Stefano Brivio
                   ` (5 preceding siblings ...)
  2022-02-22  1:34 ` [PATCH 06/18] slirp4netns.sh: Implement API socket option for port forwarding Stefano Brivio
@ 2022-02-22  1:34 ` Stefano Brivio
  2022-02-22  1:34 ` [PATCH 08/18] conf: Given IPv4 address and no netmask, assign RFC 790-style classes Stefano Brivio
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Stefano Brivio @ 2022-02-22  1:34 UTC (permalink / raw)
  To: passt-dev

[-- Attachment #1: Type: text/plain, Size: 355 bytes --]

Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>
---
 conf.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/conf.c b/conf.c
index 2984ac2..41895de 100644
--- a/conf.c
+++ b/conf.c
@@ -1239,5 +1239,6 @@ void conf(struct ctx *c, int argc, char **argv)
 		}
 	}
 
-	conf_print(c);
+	if (!c->quiet)
+		conf_print(c);
 }
-- 
2.34.1



* [PATCH 08/18] conf: Given IPv4 address and no netmask, assign RFC 790-style classes
  2022-02-22  1:34 [PATCH 00/18] slirp4netns, sandboxing, Podman integration, assorted fixes Stefano Brivio
                   ` (6 preceding siblings ...)
  2022-02-22  1:34 ` [PATCH 07/18] conf: Don't print configuration on --quiet Stefano Brivio
@ 2022-02-22  1:34 ` Stefano Brivio
  2022-02-22  1:34 ` [PATCH 09/18] conf, udp: Introduce basic DNS forwarding Stefano Brivio
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Stefano Brivio @ 2022-02-22  1:34 UTC (permalink / raw)
  To: passt-dev

[-- Attachment #1: Type: text/plain, Size: 1218 bytes --]

Provide a sane default, instead of /0, if an address is given, and it
doesn't correspond to any host address we could find via netlink.
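
As a rough illustration of the fallback that now also applies outside
the netlink branch, here is a standalone sketch of a class-based default
netmask. default_mask4() is a hypothetical name, not a function in this
tree, and it works in host byte order for readability, whereas the patch
uses the IN_CLASS*() macros and stores the result in network order:

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only: pick a class-based default
 * netmask for an IPv4 address. Argument and result are in host byte order.
 */
static uint32_t default_mask4(uint32_t addr)
{
	if ((addr & 0x80000000u) == 0)			/* 0xxx: class A, /8 */
		return 0xff000000u;
	if ((addr & 0xc0000000u) == 0x80000000u)	/* 10xx: class B, /16 */
		return 0xffff0000u;
	if ((addr & 0xe0000000u) == 0xc0000000u)	/* 110x: class C, /24 */
		return 0xffffff00u;

	return 0xffffffffu;	/* class D/E: fall back to a host mask, /32 */
}
```

With this, an address such as 10.0.2.1 given without a netmask would get
a /8 rather than /0.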

Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>
---
 conf.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/conf.c b/conf.c
index 41895de..279fdfe 100644
--- a/conf.c
+++ b/conf.c
@@ -468,17 +468,17 @@ static void conf_ip(struct ctx *c)
 
 			nl_addr(0, c->ifi, AF_INET, &c->addr4, &mask_len, NULL);
 			c->mask4 = htonl(0xffffffff << (32 - mask_len));
+		}
 
-			if (!c->mask4) {
-				if (IN_CLASSA(ntohl(c->addr4)))
-					c->mask4 = htonl(IN_CLASSA_NET);
-				else if (IN_CLASSB(ntohl(c->addr4)))
-					c->mask4 = htonl(IN_CLASSB_NET);
-				else if (IN_CLASSC(ntohl(c->addr4)))
-					c->mask4 = htonl(IN_CLASSC_NET);
-				else
-					c->mask4 = 0xffffffff;
-			}
+		if (!c->mask4) {
+			if (IN_CLASSA(ntohl(c->addr4)))
+				c->mask4 = htonl(IN_CLASSA_NET);
+			else if (IN_CLASSB(ntohl(c->addr4)))
+				c->mask4 = htonl(IN_CLASSB_NET);
+			else if (IN_CLASSC(ntohl(c->addr4)))
+				c->mask4 = htonl(IN_CLASSC_NET);
+			else
+				c->mask4 = 0xffffffff;
 		}
 
 		memcpy(&c->addr4_seen, &c->addr4, sizeof(c->addr4_seen));
-- 
2.34.1



* [PATCH 09/18] conf, udp: Introduce basic DNS forwarding
  2022-02-22  1:34 [PATCH 00/18] slirp4netns, sandboxing, Podman integration, assorted fixes Stefano Brivio
                   ` (7 preceding siblings ...)
  2022-02-22  1:34 ` [PATCH 08/18] conf: Given IPv4 address and no netmask, assign RFC 790-style classes Stefano Brivio
@ 2022-02-22  1:34 ` Stefano Brivio
  2022-02-22  1:34 ` [PATCH 10/18] udp: Allow loopback connections from host using configured unicast address Stefano Brivio
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Stefano Brivio @ 2022-02-22  1:34 UTC (permalink / raw)
  To: passt-dev

[-- Attachment #1: Type: text/plain, Size: 18182 bytes --]

For compatibility with libslirp/slirp4netns users: introduce a
mechanism, in the UDP routines, to map an address as seen by the guest
or namespace to the first configured IPv4 or IPv6 resolver address.
This can be enabled with the new --dns-forward option.
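
The mapping works in both directions. A minimal IPv4-only sketch follows,
under stated assumptions: struct dns_fwd4 and both helpers are
hypothetical stand-ins for the relevant parts of struct ctx and the UDP
handlers, and addresses are kept in host byte order here, while the patch
keeps them in network order:

```c
#include <stdint.h>

#define DNS_PORT 53

/* Illustrative state only, standing in for fields of struct ctx */
struct dns_fwd4 {
	uint32_t fwd;		/* address the guest sends queries to, 0: off */
	uint32_t resolver;	/* first configured IPv4 resolver */
};

/* Guest to host: destination address to use for an outbound datagram */
static uint32_t map_query_dst(const struct dns_fwd4 *d, uint32_t dst,
			      uint16_t dport)
{
	if (d->fwd && dst == d->fwd && dport == DNS_PORT)
		return d->resolver;

	return dst;
}

/* Host to guest: source address to present for an inbound datagram */
static uint32_t map_reply_src(const struct dns_fwd4 *d, uint32_t src,
			      uint16_t sport)
{
	if (d->fwd && src == d->resolver && sport == DNS_PORT)
		return d->fwd;

	return src;
}
```

Only UDP traffic on port 53 is rewritten, so other traffic to or from the
same addresses passes through unchanged.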

This implies that sourcing and using DNS addresses and search lists,
passed via command line or read from /etc/resolv.conf, is not bound
anymore to DHCP/DHCPv6/NDP usage: for example, pasta users might just
want to use addresses from /etc/resolv.conf as mapping target, while
not passing DNS options via DHCP.

Reflect this in all the involved code paths by differentiating
DHCP/DHCPv6/NDP usage from DNS configuration per se, and in the new
options --dhcp-dns, --dhcp-search for pasta, and --no-dhcp-dns,
--no-dhcp-search for passt.
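
The opt-in/opt-out split between the two modes could be sketched as
below. This is an illustration only: struct dns_flags and both helpers
are hypothetical names, mirroring the no_dhcp_dns and no_dhcp_dns_search
fields this patch adds to struct ctx:

```c
#include <stdbool.h>

/* Stand-ins for the patch's mode and flags, for illustration only */
enum mode { MODE_PASST, MODE_PASTA };

struct dns_flags {
	bool no_dhcp_dns;		/* don't advertise resolvers */
	bool no_dhcp_dns_search;	/* don't advertise the search list */
};

/* Defaults before option parsing: pasta has to opt in, passt can opt out */
static struct dns_flags dns_flags_default(enum mode m)
{
	struct dns_flags f = { false, false };

	if (m == MODE_PASTA)
		f.no_dhcp_dns = f.no_dhcp_dns_search = true;

	return f;
}

/* Number of resolvers to include in a DHCP/DHCPv6/NDP reply */
static int dns_advertised(const struct dns_flags *f, int configured)
{
	return f->no_dhcp_dns ? 0 : configured;
}
```

In this sketch, --dhcp-dns would clear no_dhcp_dns for pasta, and
--no-dhcp-dns would set it for passt. Configured resolvers remain
available for DNS forwarding either way.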

This should be the last bit to enable substantial compatibility
between slirp4netns.sh and slirp4netns(1): pass the --dns-forward
option from the script too.

Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>
---
 conf.c         | 102 +++++++++++++++++++++++++++++++++++++++----------
 dhcp.c         |   5 ++-
 dhcpv6.c       |   7 ++++
 ndp.c          |   6 ++-
 passt.1        |  63 +++++++++++++++++++++++++-----
 passt.h        |  14 +++++--
 slirp4netns.sh |   2 +-
 udp.c          |  16 ++++++++
 8 files changed, 177 insertions(+), 38 deletions(-)

diff --git a/conf.c b/conf.c
index 279fdfe..21e9bc0 100644
--- a/conf.c
+++ b/conf.c
@@ -279,7 +279,7 @@ static void get_dns(struct ctx *c)
 	dns4_set = !c->v4  || !!*dns4;
 	dns6_set = !c->v6  || !IN6_IS_ADDR_UNSPECIFIED(dns6);
 	dnss_set = !!*s->n || c->no_dns_search;
-	dns_set = dns4_set || dns6_set || c->no_dns;
+	dns_set = (dns4_set && dns6_set) || c->no_dns;
 
 	if (dns_set && dnss_set)
 		return;
@@ -583,21 +583,35 @@ static void usage(const char *name)
 	info(   "    default: gateway from interface with default route");
 	info(   "  -i, --interface NAME	Interface for addresses and routes");
 	info(   "    default: interface with first default route");
-	info(   "  -D, --dns ADDR	Pass IPv4 or IPv6 address as DNS");
+	info(   "  -D, --dns ADDR	Use IPv4 or IPv6 address as DNS");
 	info(   "    can be specified multiple times");
 	info(   "    a single, empty option disables DNS information");
 	if (strstr(name, "pasta"))
-		info(   "    default: don't send any addresses");
+		info(   "    default: don't use any addresses");
 	else
 		info(   "    default: use addresses from /etc/resolv.conf");
 
 	info(   "  -S, --search LIST	Space-separated list, search domains");
 	info(   "    a single, empty option disables the DNS search list");
 	if (strstr(name, "pasta"))
-		info(   "    default: don't send any search list");
+		info(   "    default: don't use any search list");
 	else
 		info(   "    default: use search list from /etc/resolv.conf");
 
+	if (strstr(name, "pasta"))
+		info("  --dhcp-dns:	\tPass DNS list via DHCP/DHCPv6/NDP");
+	else
+		info("  --no-dhcp-dns:	No DNS list in DHCP/DHCPv6/NDP");
+
+	if (strstr(name, "pasta"))
+		info("  --dhcp-search:	Pass list via DHCP/DHCPv6/NDP");
+	else
+		info("  --no-dhcp-search:	No list in DHCP/DHCPv6/NDP");
+
+	info(   "  --dns-forward ADDR	Forward DNS queries sent to ADDR");
+	info(   "    can be specified zero to two times (for IPv4 and IPv6)");
+	info(   "    default: don't forward DNS queries");
+
 	info(   "  --no-tcp		Disable TCP protocol handler");
 	info(   "  --no-udp		Disable UDP protocol handler");
 	info(   "  --no-icmp		Disable ICMP/ICMPv6 protocol handler");
@@ -699,22 +713,18 @@ void conf_print(struct ctx *c)
 			info("    router: %s",
 			     inet_ntop(AF_INET, &c->gw4,   buf4, sizeof(buf4)));
 		}
-	}
 
-	if (!c->no_dns && !(c->no_dhcp && c->no_ndp && c->no_dhcpv6)) {
 		for (i = 0; c->dns4[i]; i++) {
 			if (!i)
-				info("    DNS:");
+				info("DNS:");
 			inet_ntop(AF_INET, &c->dns4[i], buf4, sizeof(buf4));
-			info("        %s", buf4);
+			info("    %s", buf4);
 		}
-	}
 
-	if (!c->no_dns_search && !(c->no_dhcp && c->no_ndp && c->no_dhcpv6)) {
 		for (i = 0; *c->dns_search[i].n; i++) {
 			if (!i)
-				info("        search:");
-			info("            %s", c->dns_search[i].n);
+				info("DNS search list:");
+			info("    %s", c->dns_search[i].n);
 		}
 	}
 
@@ -728,7 +738,7 @@ void conf_print(struct ctx *c)
 		else if (!c->no_dhcpv6)
 			info("NDP:");
 		else
-			return;
+			goto dns6;
 
 		info("    assign: %s",
 		     inet_ntop(AF_INET6, &c->addr6, buf6, sizeof(buf6)));
@@ -737,17 +747,18 @@ void conf_print(struct ctx *c)
 		info("    our link-local: %s",
 		     inet_ntop(AF_INET6, &c->addr6_ll, buf6, sizeof(buf6)));
 
+dns6:
 		for (i = 0; !IN6_IS_ADDR_UNSPECIFIED(&c->dns6[i]); i++) {
 			if (!i)
-				info("    DNS:");
+				info("DNS:");
 			inet_ntop(AF_INET6, &c->dns6[i], buf6, sizeof(buf6));
-			info("        %s", buf6);
+			info("    %s", buf6);
 		}
 
 		for (i = 0; *c->dns_search[i].n; i++) {
 			if (!i)
-				info("        search:");
-			info("            %s", c->dns_search[i].n);
+				info("DNS search list:");
+			info("    %s", c->dns_search[i].n);
 		}
 	}
 }
@@ -797,6 +808,11 @@ void conf(struct ctx *c, int argc, char **argv)
 		{"nsrun-dir",	required_argument,	NULL,		3 },
 		{"config-net",	no_argument,		&c->pasta_conf_ns, 1 },
 		{"ns-mac-addr",	required_argument,	NULL,		4 },
+		{"dhcp-dns",	no_argument,		NULL,		5 },
+		{"no-dhcp-dns",	no_argument,		NULL,		6 },
+		{"dhcp-search", no_argument,		NULL,		7 },
+		{"no-dhcp-search", no_argument,		NULL,		8 },
+		{"dns-forward",	required_argument,	NULL,		9 },
 		{ 0 },
 	};
 	struct get_bound_ports_ns_arg ns_ports_arg = { .c = c };
@@ -808,6 +824,9 @@ void conf(struct ctx *c, int argc, char **argv)
 	int name, ret, mask, b, i;
 	uint32_t *dns4 = c->dns4;
 
+	if (c->mode == MODE_PASTA)
+		c->no_dhcp_dns = c->no_dhcp_dns_search = 1;
+
 	do {
 		enum conf_port_type *set = NULL;
 		const char *optstring;
@@ -873,6 +892,51 @@ void conf(struct ctx *c, int argc, char **argv)
 				c->mac_guest[i] = b;
 			}
 			break;
+		case 5:
+			if (c->mode != MODE_PASTA) {
+				err("--dhcp-dns is for pasta mode only");
+				usage(argv[0]);
+			}
+			c->no_dhcp_dns = 0;
+			break;
+		case 6:
+			if (c->mode != MODE_PASST) {
+				err("--no-dhcp-dns is for passt mode only");
+				usage(argv[0]);
+			}
+			c->no_dhcp_dns = 1;
+			break;
+		case 7:
+			if (c->mode != MODE_PASTA) {
+				err("--dhcp-search is for pasta mode only");
+				usage(argv[0]);
+			}
+			c->no_dhcp_dns_search = 0;
+			break;
+		case 8:
+			if (c->mode != MODE_PASST) {
+				err("--no-dhcp-search is for passt mode only");
+				usage(argv[0]);
+			}
+			c->no_dhcp_dns_search = 1;
+			break;
+		case 9:
+			if (IN6_IS_ADDR_UNSPECIFIED(&c->dns6_fwd)	&&
+			    inet_pton(AF_INET6, optarg, &c->dns6_fwd)	&&
+			    !IN6_IS_ADDR_UNSPECIFIED(&c->dns6_fwd)	&&
+			    !IN6_IS_ADDR_LOOPBACK(&c->dns6_fwd))
+				break;
+
+			if (c->dns4_fwd == INADDR_ANY			&&
+			    inet_pton(AF_INET, optarg, &c->dns4_fwd)	&&
+			    c->dns4_fwd != INADDR_ANY			&&
+			    c->dns4_fwd != INADDR_BROADCAST		&&
+			    c->dns4_fwd != htonl(INADDR_LOOPBACK))
+				break;
+
+			err("Invalid DNS forwarding address: %s", optarg);
+			usage(argv[0]);
+			break;
 		case 'd':
 			if (c->debug) {
 				err("Multiple --debug options given");
@@ -1189,10 +1253,6 @@ void conf(struct ctx *c, int argc, char **argv)
 	if (!c->mtu)
 		c->mtu = ROUND_DOWN(ETH_MAX_MTU - ETH_HLEN, sizeof(uint32_t));
 
-	if (c->mode == MODE_PASTA && dns4 == c->dns4 && dns6 == c->dns6)
-		c->no_dns = 1;
-	if (c->mode == MODE_PASTA && dnss == c->dns_search)
-		c->no_dns_search = 1;
 	get_dns(c);
 
 	if (!*c->pasta_ifn)
diff --git a/dhcp.c b/dhcp.c
index a052397..ab1249c 100644
--- a/dhcp.c
+++ b/dhcp.c
@@ -333,12 +333,13 @@ int dhcp(struct ctx *c, struct ethhdr *eh, size_t len)
 		opts[26].s[1] = c->mtu % 256;
 	}
 
-	for (i = 0, opts[6].slen = 0; c->dns4[i]; i++) {
+	for (i = 0, opts[6].slen = 0; !c->no_dhcp_dns && c->dns4[i]; i++) {
 		((uint32_t *)opts[6].s)[i] = c->dns4[i];
 		opts[6].slen += sizeof(uint32_t);
 	}
 
-	opt_set_dns_search(c, sizeof(m->o));
+	if (!c->no_dhcp_dns_search)
+		opt_set_dns_search(c, sizeof(m->o));
 
 	uh->len = htons(len = offsetof(struct msg, o) + fill(m) + sizeof(*uh));
 	uh->check = 0;
diff --git a/dhcpv6.c b/dhcpv6.c
index e4113bc..b79a8e9 100644
--- a/dhcpv6.c
+++ b/dhcpv6.c
@@ -394,6 +394,9 @@ static size_t dhcpv6_dns_fill(struct ctx *c, char *buf, int offset)
 	char *p = NULL;
 	int i;
 
+	if (c->no_dhcp_dns)
+		goto search;
+
 	for (i = 0; !IN6_IS_ADDR_UNSPECIFIED(&c->dns6[i]); i++) {
 		if (!i) {
 			srv = (struct opt_dns_servers *)(buf + offset);
@@ -410,6 +413,10 @@ static size_t dhcpv6_dns_fill(struct ctx *c, char *buf, int offset)
 	if (srv)
 		srv->hdr.l = htons(srv->hdr.l);
 
+search:
+	if (c->no_dhcp_dns_search)
+		return offset;
+
 	for (i = 0; *c->dns_search[i].n; i++) {
 		if (!i) {
 			srch = (struct opt_dns_search *)(buf + offset);
diff --git a/ndp.c b/ndp.c
index 386098c..6b1c1a8 100644
--- a/ndp.c
+++ b/ndp.c
@@ -127,6 +127,9 @@ int ndp(struct ctx *c, struct ethhdr *eh, size_t len)
 			p += 4;
 		}
 
+		if (c->no_dhcp_dns)
+			goto dns_done;
+
 		for (n = 0; !IN6_IS_ADDR_UNSPECIFIED(&c->dns6[n]); n++);
 		if (n) {
 			*p++ = 25;			/* RDNSS */
@@ -144,7 +147,7 @@ int ndp(struct ctx *c, struct ethhdr *eh, size_t len)
 				dns_s_len += strlen(c->dns_search[n].n) + 2;
 		}
 
-		if (dns_s_len) {
+		if (!c->no_dhcp_dns_search && dns_s_len) {
 			*p++ = 31;			/* DNSSL */
 			*p++ = (len + 8 - 1) / 8 + 1;	/* length */
 			p += 2;				/* reserved */
@@ -171,6 +174,7 @@ int ndp(struct ctx *c, struct ethhdr *eh, size_t len)
 			p += 8 - dns_s_len % 8;
 		}
 
+dns_done:
 		*p++ = 1;			/* source ll */
 		*p++ = 1;			/* length */
 		memcpy(p, c->mac, ETH_ALEN);
diff --git a/passt.1 b/passt.1
index 92681f6..7070a31 100644
--- a/passt.1
+++ b/passt.1
@@ -165,19 +165,62 @@ Default is to use the interface with the first default route.
 
 .TP
 .BR \-D ", " \-\-dns " " \fIaddr
-Assign IPv4 \fIaddr\fR via DHCP (option 23) or IPv6 \fIaddr\fR via NDP Router
-Advertisement (option type 25) and DHCPv6 (option 23) as DNS resolver.
+Use \fIaddr\fR (IPv4 or IPv6) for DHCP, DHCPv6, NDP or DNS forwarding, as
+configured (see options \fB--no-dhcp-dns\fR, \fB--dhcp-dns\fR,
+\fB--dns-forward\fR) instead of reading addresses from \fI/etc/resolv.conf\fR.
 This option can be specified multiple times, and a single, empty option disables
-DNS options altogether.
-In \fBpasst\fR mode, default is to use addresses from \fI/etc/resolv.conf\fR,
-and, in \fBpasta\fR mode, no addresses are sent by default.
+usage of DNS addresses altogether.
+
+.TP
+.BR \-D ", " \-\-dns " " \fIaddr
+Use \fIaddr\fR (IPv4 or IPv6) for DHCP, DHCPv6, NDP or DNS forwarding, as
+configured (see options \fB--no-dhcp-dns\fR, \fB--dhcp-dns\fR,
+\fB--dns-forward\fR) instead of reading addresses from \fI/etc/resolv.conf\fR.
+This option can be specified multiple times, and a single, empty option disables
+usage of DNS addresses altogether.
+
+.TP
+.BR \-\-dns-forward " " \fIaddr
+Map \fIaddr\fR (IPv4 or IPv6) as seen from guest or namespace to the first
+configured DNS resolver (with corresponding IP version). Mapping is limited to
+UDP traffic directed to port 53, and DNS answers are translated back with a
+reverse mapping.
+This option can be specified zero to two times (once for IPv4, once for IPv6).
+
 .TP
 .BR \-S ", " \-\-search " " \fIlist
-Assign space-separated \fIlist\fR via DHCP (option 119), via NDP Router
-Advertisement (option type 31) and DHCPv6 (option 24) as DNS domain search list.
-A single, empty option disables sending the DNS domain search list.
-In \fBpasst\fR mode, default is to use the search list from
-\fI/etc/resolv.conf\fR, and, in \fBpasta\fR mode, no list is sent by default.
+Use space-separated \fIlist\fR for DHCP, DHCPv6, and NDP purposes, instead of
+reading entries from \fI/etc/resolv.conf\fR. See options \fB--no-dhcp-search\fR
+and \fB--dhcp-search\fR. A single, empty option disables the DNS domain search
+list altogether.
+
+.TP
+.BR \-\-no-dhcp-dns
+In \fBpasst\fR mode, do not assign IPv4 addresses via DHCP (option 6) or IPv6
+addresses via NDP Router Advertisement (option type 25) and DHCPv6 (option 23)
+as DNS resolvers.
+By default, all the configured addresses are passed.
+
+.TP
+.BR \-\-dhcp-dns
+In \fBpasta\fR mode, assign IPv4 addresses via DHCP (option 6) or IPv6
+addresses via NDP Router Advertisement (option type 25) and DHCPv6 (option 23)
+as DNS resolvers.
+By default, configured addresses, if any, are not passed.
+
+.TP
+.BR \-\-no-dhcp-search
+In \fBpasst\fR mode, do not send the DNS domain search list via DHCP
+(option 119), via NDP Router Advertisement (option type 31) and DHCPv6 (option
+24).
+By default, the DNS domain search list resulting from configuration is passed.
+
+.TP
+.BR \-\-dhcp-search
+In \fBpasta\fR mode, send the DNS domain search list via DHCP (option
+119), via NDP Router Advertisement (option type 31) and DHCPv6 (option 24).
+By default, the DNS domain search list resulting from configuration is not
+passed.
 
 .TP
 .BR \-\-no-tcp
diff --git a/passt.h b/passt.h
index d7011da..2589ee7 100644
--- a/passt.h
+++ b/passt.h
@@ -114,6 +114,7 @@ enum passt_modes {
  * @mask4:		IPv4 netmask, network order
  * @gw4:		Default IPv4 gateway, network order
  * @dns4:		IPv4 DNS addresses, zero-terminated, network order
+ * @dns4_fwd:		Address forwarded (UDP) to first IPv4 DNS, network order
  * @dns_search:		DNS search list
  * @v6:			Enable IPv6 transport
  * @addr6:		IPv6 address for external, routable interface
@@ -121,7 +122,8 @@ enum passt_modes {
  * @addr6_seen:		Latest IPv6 global/site address seen as source from tap
  * @addr6_ll_seen:	Latest IPv6 link-local address seen as source from tap
  * @gw6:		Default IPv6 gateway
- * @dns4:		IPv4 DNS addresses, zero-terminated
+ * @dns6:		IPv6 DNS addresses, zero-terminated
+ * @dns6_fwd:		Address forwarded (UDP) to first IPv6 DNS, network order
  * @ifi:		Index of routable interface
  * @pasta_ifn:		Name of namespace interface for pasta
  * @pasta_ifn:		Index of namespace interface for pasta
@@ -133,8 +135,10 @@ enum passt_modes {
  * @no_icmp:		Disable ICMP operation
  * @icmp:		Context for ICMP protocol handler
  * @mtu:		MTU passed via DHCP/NDP
- * @no_dns:		Do not assign any DNS server via DHCP/DHCPv6/NDP
- * @no_dns_search:	Do not assign any DNS domain search via DHCP/DHCPv6/NDP
+ * @no_dns:		Do not source/use DNS servers for any purpose
+ * @no_dns_search:	Do not source/use domain search lists for any purpose
+ * @no_dhcp_dns:	Do not assign any DNS server via DHCP/DHCPv6/NDP
+ * @no_dhcp_dns_search:	Do not assign any DNS domain search via DHCP/DHCPv6/NDP
  * @no_dhcp:		Disable DHCP server
  * @no_dhcpv6:		Disable DHCPv6 server
  * @no_ndp:		Disable NDP handler altogether
@@ -172,6 +176,7 @@ struct ctx {
 	uint32_t mask4;
 	uint32_t gw4;
 	uint32_t dns4[MAXNS + 1];
+	uint32_t dns4_fwd;
 
 	struct fqdn dns_search[MAXDNSRCH];
 
@@ -182,6 +187,7 @@ struct ctx {
 	struct in6_addr addr6_ll_seen;
 	struct in6_addr gw6;
 	struct in6_addr dns6[MAXNS + 1];
+	struct in6_addr dns6_fwd;
 
 	unsigned int ifi;
 	char pasta_ifn[IF_NAMESIZE];
@@ -198,6 +204,8 @@ struct ctx {
 	int mtu;
 	int no_dns;
 	int no_dns_search;
+	int no_dhcp_dns;
+	int no_dhcp_dns_search;
 	int no_dhcp;
 	int no_dhcpv6;
 	int no_ndp;
diff --git a/slirp4netns.sh b/slirp4netns.sh
index 1784926..ff12a52 100755
--- a/slirp4netns.sh
+++ b/slirp4netns.sh
@@ -16,7 +16,7 @@
 # Author: Stefano Brivio <sbrivio(a)redhat.com>
 
 PASTA_PID="$(mktemp)"
-PASTA_OPTS="-q --ipv4-only -a 10.0.2.0 -n 24 -g 10.0.2.2 -m 1500 --no-ndp --no-dhcpv6 --no-dhcp -P ${PASTA_PID}"
+PASTA_OPTS="-q --ipv4-only -a 10.0.2.0 -n 24 -g 10.0.2.2 --dns-forward 10.0.2.3 -m 1500 --no-ndp --no-dhcpv6 --no-dhcp -P ${PASTA_PID}"
 PASTA="$(command -v ./pasta || command -v pasta || :)"
 
 API_SOCKET=
diff --git a/udp.c b/udp.c
index 348f695..2fc52d3 100644
--- a/udp.c
+++ b/udp.c
@@ -718,6 +718,12 @@ void udp_sock_handler(struct ctx *c, union epoll_ref ref, uint32_t events,
 					udp_tap_map[V6][src].loopback = 0;
 
 				bitmap_set(udp_act[V6][UDP_ACT_TAP], src);
+			} else if (!IN6_IS_ADDR_UNSPECIFIED(&c->dns6_fwd) &&
+				   !memcmp(&b->s_in6.sin6_addr, &c->dns6_fwd,
+					   sizeof(c->dns6_fwd)) &&
+				   ntohs(b->s_in6.sin6_port) == 53) {
+				b->ip6h.daddr = c->addr6_seen;
+				b->ip6h.saddr = c->dns6_fwd;
 			} else {
 				b->ip6h.daddr = c->addr6_seen;
 				b->ip6h.saddr = b->s_in6.sin6_addr;
@@ -797,6 +803,10 @@ void udp_sock_handler(struct ctx *c, union epoll_ref ref, uint32_t events,
 					udp_tap_map[V4][src].loopback = 1;
 
 				bitmap_set(udp_act[V4][UDP_ACT_TAP], src);
+			} else if (c->dns4_fwd &&
+				   s_addr == ntohl(c->dns4[0]) &&
+				   ntohs(b->s_in.sin_port) == 53) {
+				b->iph.saddr = c->dns4_fwd;
 			} else {
 				b->iph.saddr = b->s_in.sin_addr.s_addr;
 			}
@@ -958,6 +968,9 @@ int udp_tap_handler(struct ctx *c, int af, void *addr,
 				s_in.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
 			else
 				s_in.sin_addr.s_addr = c->addr4_seen;
+		} else if (s_in.sin_addr.s_addr == c->dns4_fwd &&
+			   ntohs(s_in.sin_port) == 53) {
+			s_in.sin_addr.s_addr = c->dns4[0];
 		}
 	} else {
 		s_in6 = (struct sockaddr_in6) {
@@ -976,6 +989,9 @@ int udp_tap_handler(struct ctx *c, int af, void *addr,
 				s_in6.sin6_addr = in6addr_loopback;
 			else
 				s_in6.sin6_addr = c->addr6_seen;
+		} else if (!memcmp(addr, &c->dns6_fwd, sizeof(c->dns6_fwd)) &&
+			   ntohs(s_in6.sin6_port) == 53) {
+			s_in6.sin6_addr = c->dns6[0];
 		} else if (IN6_IS_ADDR_LINKLOCAL(&s_in6.sin6_addr)) {
 			bind_to = BIND_LL;
 		}
-- 
2.34.1



* [PATCH 10/18] udp: Allow loopback connections from host using configured unicast address
  2022-02-22  1:34 [PATCH 00/18] slirp4netns, sandboxing, Podman integration, assorted fixes Stefano Brivio
                   ` (8 preceding siblings ...)
  2022-02-22  1:34 ` [PATCH 09/18] conf, udp: Introduce basic DNS forwarding Stefano Brivio
@ 2022-02-22  1:34 ` Stefano Brivio
  2022-02-22  1:34 ` [PATCH 11/18] tcp, udp: Receive batching doesn't pay off when writing single frames to tap Stefano Brivio
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Stefano Brivio @ 2022-02-22  1:34 UTC (permalink / raw)
  To: passt-dev

[-- Attachment #1: Type: text/plain, Size: 2982 bytes --]

Likely for testing purposes only: allow connections from host to
guest or namespace using, as connection target, the configured,
possibly global unicast address.

In this case, we have to map the destination address to a link-local
address, and for port-based tracked responses, the source address
needs to be again the unicast address: not loopback, not link-local.
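
The resulting choice of address for port-based tracked responses from
the tap side can be summarised with the sketch below. The enum, struct
local_bind and pick_reply_src() are hypothetical names used only for
illustration; the fields mirror ts_local, loopback and gua in struct
udp_tap_port:

```c
/* Which address a tracked response is sent from on the host side */
enum reply_src {
	SRC_LOOPBACK,	/* loopback address */
	SRC_CONFIGURED,	/* configured, possibly global unicast address */
	SRC_SEEN	/* latest address seen as source from tap */
};

/* Illustrative subset of the per-port tracking state */
struct local_bind {
	long ts_local;	/* timestamp of last local bind activity, 0: none */
	int loopback;	/* bind was reached via loopback address */
	int gua;	/* bind was reached via configured unicast address */
};

static enum reply_src pick_reply_src(const struct local_bind *b)
{
	if (!b->ts_local || b->loopback)
		return SRC_LOOPBACK;
	if (b->gua)
		return SRC_CONFIGURED;

	return SRC_SEEN;
}
```

Both flags are cleared when ts_local expires, so a stale entry falls
back to the loopback case.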

Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>
---
 udp.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/udp.c b/udp.c
index 2fc52d3..8129a89 100644
--- a/udp.c
+++ b/udp.c
@@ -125,13 +125,15 @@
  * @sock:	Socket bound to source port used as index
  * @ts:		Activity timestamp from tap, used for socket aging
  * @ts_local:	Timestamp of tap packet to gateway address, aging for local bind
- * @loopback:	Whether local bind should use loopback address as source
+ * @loopback:	Whether local bind maps to loopback address as source
+ * @gua:	Whether local bind maps to configured unicast address as source
  */
 struct udp_tap_port {
 	int sock;
 	time_t ts;
 	time_t ts_local;
 	int loopback;
+	int gua;
 };
 
 /**
@@ -701,10 +703,13 @@ void udp_sock_handler(struct ctx *c, union epoll_ref ref, uint32_t events,
 				b->ip6h.saddr = b->s_in6.sin6_addr;
 			} else if (IN6_IS_ADDR_LOOPBACK(&b->s_in6.sin6_addr) ||
 				   !memcmp(&b->s_in6.sin6_addr, &c->addr6_seen,
+					   sizeof(c->addr6)) ||
+				   !memcmp(&b->s_in6.sin6_addr, &c->addr6,
 					   sizeof(c->addr6))) {
 				in_port_t src = htons(b->s_in6.sin6_port);
 
 				b->ip6h.daddr = c->addr6_ll_seen;
+
 				if (IN6_IS_ADDR_LINKLOCAL(&c->gw6))
 					b->ip6h.saddr = c->gw6;
 				else
@@ -717,6 +722,12 @@ void udp_sock_handler(struct ctx *c, union epoll_ref ref, uint32_t events,
 				else
 					udp_tap_map[V6][src].loopback = 0;
 
+				if (!memcmp(&b->s_in6.sin6_addr, &c->addr6,
+						 sizeof(c->addr6)))
+					udp_tap_map[V6][src].gua = 1;
+				else
+					udp_tap_map[V6][src].gua = 0;
+
 				bitmap_set(udp_act[V6][UDP_ACT_TAP], src);
 			} else if (!IN6_IS_ADDR_UNSPECIFIED(&c->dns6_fwd) &&
 				   !memcmp(&b->s_in6.sin6_addr, &c->dns6_fwd,
@@ -987,6 +998,8 @@ int udp_tap_handler(struct ctx *c, int af, void *addr,
 			if (!udp_tap_map[V6][dst].ts_local ||
 			    udp_tap_map[V6][dst].loopback)
 				s_in6.sin6_addr = in6addr_loopback;
+			else if (udp_tap_map[V6][dst].gua)
+				s_in6.sin6_addr = c->addr6;
 			else
 				s_in6.sin6_addr = c->addr6_seen;
 		} else if (!memcmp(addr, &c->dns6_fwd, sizeof(c->dns6_fwd)) &&
@@ -1212,8 +1225,11 @@ static void udp_timer_one(struct ctx *c, int v6, enum udp_act_type type,
 		if (ts->tv_sec - tp->ts > UDP_CONN_TIMEOUT)
 			s = tp->sock;
 
-		if (ts->tv_sec - tp->ts_local > UDP_CONN_TIMEOUT)
+		if (ts->tv_sec - tp->ts_local > UDP_CONN_TIMEOUT) {
 			tp->ts_local = 0;
+			tp->loopback = 0;
+			tp->gua = 0;
+		}
 
 		break;
 	case UDP_ACT_INIT_CONN:
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 11/18] tcp, udp: Receive batching doesn't pay off when writing single frames to tap
  2022-02-22  1:34 [PATCH 00/18] slirp4netns, sandboxing, Podman integration, assorted fixes Stefano Brivio
                   ` (9 preceding siblings ...)
  2022-02-22  1:34 ` [PATCH 10/18] udp: Allow loopback connections from host using configured unicast address Stefano Brivio
@ 2022-02-22  1:34 ` Stefano Brivio
  2022-02-22  1:34 ` [PATCH 12/18] pasta: By default, quit if filesystem-bound net namespace goes away Stefano Brivio
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Stefano Brivio @ 2022-02-22  1:34 UTC (permalink / raw)
  To: passt-dev

[-- Attachment #1: Type: text/plain, Size: 8353 bytes --]

In pasta mode, when we get data from sockets and write it as single
frames to the tap device, we batch receive operations considerably,
and then (conceptually) split the data into many smaller writes.

It looked like an obvious choice, but performance is actually better
if we receive data in many small frame-sized recvmsg()/recvmmsg().

The syscall overhead with the previous behaviour, as observed with perf,
comes predominantly from write operations, but receiving data in
shorter chunks probably improves cache locality by a considerable
amount.

Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>
---
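A rough sketch of the idea, under simplifying assumptions: static
storage stays sized for the largest batch, while the number of frames
requested per receive call depends on the mode. The names tap_frames()
and recv_calls() are hypothetical and only illustrate the macro change
in the diff below.

```c
#include <assert.h>
#include <stddef.h>

enum mode { MODE_PASST, MODE_PASTA };	/* modes as named in the patch */

#define TAP_FRAMES_MEM	128	/* static storage: sized for the largest batch */

/* Hypothetical helper: frames to request in a single receive call */
static size_t tap_frames(enum mode mode)
{
	return mode == MODE_PASST ? TAP_FRAMES_MEM : 1;
}

/* Hypothetical helper: receive calls needed to drain @pending frames */
static size_t recv_calls(enum mode mode, size_t pending)
{
	size_t batch = tap_frames(mode);

	assert(batch > 0 && batch <= TAP_FRAMES_MEM);

	return (pending + batch - 1) / batch;	/* round up */
}
```

In pasta mode this trades more receive calls for frame-sized chunks,
which is the behaviour the commit message reports as faster.
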
 tcp.c | 36 ++++++++++++++++++++----------------
 udp.c | 33 +++++++++++++++++----------------
 2 files changed, 37 insertions(+), 32 deletions(-)

diff --git a/tcp.c b/tcp.c
index e4fac22..a3a9dfd 100644
--- a/tcp.c
+++ b/tcp.c
@@ -343,7 +343,9 @@
 #define MAX_TAP_CONNS			(128 * 1024)
 #define MAX_SPLICE_CONNS		(128 * 1024)
 
-#define TCP_TAP_FRAMES			256
+#define TCP_TAP_FRAMES_MEM		256
+#define TCP_TAP_FRAMES							\
+	(c->mode == MODE_PASST ? TCP_TAP_FRAMES_MEM : 1)
 
 #define MAX_PIPE_SIZE			(2UL * 1024 * 1024)
 
@@ -609,7 +611,7 @@ static struct tcp4_l2_buf_t {
 #else
 } __attribute__ ((packed, aligned(__alignof__(unsigned int))))
 #endif
-tcp4_l2_buf[TCP_TAP_FRAMES];
+tcp4_l2_buf[TCP_TAP_FRAMES_MEM];
 
 static unsigned int tcp4_l2_buf_used;
 static size_t tcp4_l2_buf_bytes;
@@ -640,21 +642,21 @@ struct tcp6_l2_buf_t {
 #else
 } __attribute__ ((packed, aligned(__alignof__(unsigned int))))
 #endif
-tcp6_l2_buf[TCP_TAP_FRAMES];
+tcp6_l2_buf[TCP_TAP_FRAMES_MEM];
 
 static unsigned int tcp6_l2_buf_used;
 static size_t tcp6_l2_buf_bytes;
 
 /* recvmsg()/sendmsg() data for tap */
 static char 		tcp_buf_discard		[MAX_WINDOW];
-static struct iovec	iov_sock		[TCP_TAP_FRAMES + 1];
+static struct iovec	iov_sock		[TCP_TAP_FRAMES_MEM + 1];
 
-static struct iovec	tcp4_l2_iov_tap		[TCP_TAP_FRAMES];
-static struct iovec	tcp6_l2_iov_tap		[TCP_TAP_FRAMES];
-static struct iovec	tcp4_l2_flags_iov_tap	[TCP_TAP_FRAMES];
-static struct iovec	tcp6_l2_flags_iov_tap	[TCP_TAP_FRAMES];
+static struct iovec	tcp4_l2_iov_tap		[TCP_TAP_FRAMES_MEM];
+static struct iovec	tcp6_l2_iov_tap		[TCP_TAP_FRAMES_MEM];
+static struct iovec	tcp4_l2_flags_iov_tap	[TCP_TAP_FRAMES_MEM];
+static struct iovec	tcp6_l2_flags_iov_tap	[TCP_TAP_FRAMES_MEM];
 
-static struct mmsghdr	tcp_l2_mh_tap		[TCP_TAP_FRAMES];
+static struct mmsghdr	tcp_l2_mh_tap		[TCP_TAP_FRAMES_MEM];
 
 /* sendmsg() to socket */
 static struct iovec	tcp_tap_iov		[UIO_MAXIOV];
@@ -688,7 +690,7 @@ static struct tcp4_l2_flags_buf_t {
 #else
 } __attribute__ ((packed, aligned(__alignof__(unsigned int))))
 #endif
-tcp4_l2_flags_buf[TCP_TAP_FRAMES];
+tcp4_l2_flags_buf[TCP_TAP_FRAMES_MEM];
 
 static int tcp4_l2_flags_buf_used;
 
@@ -717,7 +719,7 @@ static struct tcp6_l2_flags_buf_t {
 #else
 } __attribute__ ((packed, aligned(__alignof__(unsigned int))))
 #endif
-tcp6_l2_flags_buf[TCP_TAP_FRAMES];
+tcp6_l2_flags_buf[TCP_TAP_FRAMES_MEM];
 
 static int tcp6_l2_flags_buf_used;
 
@@ -916,7 +918,7 @@ void tcp_update_l2_buf(unsigned char *eth_d, unsigned char *eth_s,
 {
 	int i;
 
-	for (i = 0; i < TCP_TAP_FRAMES; i++) {
+	for (i = 0; i < TCP_TAP_FRAMES_MEM; i++) {
 		struct tcp4_l2_flags_buf_t *b4f = &tcp4_l2_flags_buf[i];
 		struct tcp6_l2_flags_buf_t *b6f = &tcp6_l2_flags_buf[i];
 		struct tcp4_l2_buf_t *b4 = &tcp4_l2_buf[i];
@@ -982,12 +984,13 @@ static void tcp_sock4_iov_init(void)
 		};
 	}
 
-	for (i = 0, iov = tcp4_l2_iov_tap; i < TCP_TAP_FRAMES; i++, iov++) {
+	for (i = 0, iov = tcp4_l2_iov_tap; i < TCP_TAP_FRAMES_MEM; i++, iov++) {
 		iov->iov_base = &tcp4_l2_buf[i].vnet_len;
 		iov->iov_len = MSS_DEFAULT;
 	}
 
-	for (i = 0, iov = tcp4_l2_flags_iov_tap; i < TCP_TAP_FRAMES; i++, iov++)
+	for (i = 0, iov = tcp4_l2_flags_iov_tap; i < TCP_TAP_FRAMES_MEM;
+	     i++, iov++)
 		iov->iov_base = &tcp4_l2_flags_buf[i].vnet_len;
 }
 
@@ -1015,12 +1018,13 @@ static void tcp_sock6_iov_init(void)
 		};
 	}
 
-	for (i = 0, iov = tcp6_l2_iov_tap; i < TCP_TAP_FRAMES; i++, iov++) {
+	for (i = 0, iov = tcp6_l2_iov_tap; i < TCP_TAP_FRAMES_MEM; i++, iov++) {
 		iov->iov_base = &tcp6_l2_buf[i].vnet_len;
 		iov->iov_len = MSS_DEFAULT;
 	}
 
-	for (i = 0, iov = tcp6_l2_flags_iov_tap; i < TCP_TAP_FRAMES; i++, iov++)
+	for (i = 0, iov = tcp6_l2_flags_iov_tap; i < TCP_TAP_FRAMES_MEM;
+	     i++, iov++)
 		iov->iov_base = &tcp6_l2_flags_buf[i].vnet_len;
 }
 
diff --git a/udp.c b/udp.c
index 8129a89..d4f3714 100644
--- a/udp.c
+++ b/udp.c
@@ -118,7 +118,8 @@
 
 #define UDP_CONN_TIMEOUT	180 /* s, timeout for ephemeral or local bind */
 #define UDP_SPLICE_FRAMES	128
-#define UDP_TAP_FRAMES		128
+#define UDP_TAP_FRAMES_MEM	128
+#define UDP_TAP_FRAMES		(c->mode == MODE_PASST ? UDP_TAP_FRAMES_MEM : 1)
 
 /**
  * struct udp_tap_port - Port tracking based on tap-facing source port
@@ -204,7 +205,7 @@ static struct udp4_l2_buf_t {
 	uint8_t data[USHRT_MAX -
 		     (sizeof(struct iphdr) + sizeof(struct udphdr))];
 } __attribute__ ((packed, aligned(__alignof__(unsigned int))))
-udp4_l2_buf[UDP_TAP_FRAMES];
+udp4_l2_buf[UDP_TAP_FRAMES_MEM];
 
 /**
  * udp6_l2_buf_t - Pre-cooked IPv6 packet buffers for tap connections
@@ -234,23 +235,23 @@ struct udp6_l2_buf_t {
 #else
 } __attribute__ ((packed, aligned(__alignof__(unsigned int))))
 #endif
-udp6_l2_buf[UDP_TAP_FRAMES];
+udp6_l2_buf[UDP_TAP_FRAMES_MEM];
 
 static struct sockaddr_storage udp_splice_namebuf;
 static uint8_t udp_splice_buf[UDP_SPLICE_FRAMES][USHRT_MAX];
 
 /* recvmmsg()/sendmmsg() data for tap */
-static struct iovec	udp4_l2_iov_sock	[UDP_TAP_FRAMES];
-static struct iovec	udp6_l2_iov_sock	[UDP_TAP_FRAMES];
+static struct iovec	udp4_l2_iov_sock	[UDP_TAP_FRAMES_MEM];
+static struct iovec	udp6_l2_iov_sock	[UDP_TAP_FRAMES_MEM];
 
-static struct iovec	udp4_l2_iov_tap		[UDP_TAP_FRAMES];
-static struct iovec	udp6_l2_iov_tap		[UDP_TAP_FRAMES];
+static struct iovec	udp4_l2_iov_tap		[UDP_TAP_FRAMES_MEM];
+static struct iovec	udp6_l2_iov_tap		[UDP_TAP_FRAMES_MEM];
 
-static struct mmsghdr	udp4_l2_mh_sock		[UDP_TAP_FRAMES];
-static struct mmsghdr	udp6_l2_mh_sock		[UDP_TAP_FRAMES];
+static struct mmsghdr	udp4_l2_mh_sock		[UDP_TAP_FRAMES_MEM];
+static struct mmsghdr	udp6_l2_mh_sock		[UDP_TAP_FRAMES_MEM];
 
-static struct mmsghdr	udp4_l2_mh_tap		[UDP_TAP_FRAMES];
-static struct mmsghdr	udp6_l2_mh_tap		[UDP_TAP_FRAMES];
+static struct mmsghdr	udp4_l2_mh_tap		[UDP_TAP_FRAMES_MEM];
+static struct mmsghdr	udp6_l2_mh_tap		[UDP_TAP_FRAMES_MEM];
 
 /* recvmmsg()/sendmmsg() data for "spliced" connections */
 static struct iovec	udp_splice_iov_recv	[UDP_SPLICE_FRAMES];
@@ -310,7 +311,7 @@ void udp_update_l2_buf(unsigned char *eth_d, unsigned char *eth_s,
 {
 	int i;
 
-	for (i = 0; i < UDP_TAP_FRAMES; i++) {
+	for (i = 0; i < UDP_TAP_FRAMES_MEM; i++) {
 		struct udp4_l2_buf_t *b4 = &udp4_l2_buf[i];
 		struct udp6_l2_buf_t *b6 = &udp6_l2_buf[i];
 
@@ -354,7 +355,7 @@ static void udp_sock4_iov_init(void)
 		};
 	}
 
-	for (i = 0, h = udp4_l2_mh_sock; i < UDP_TAP_FRAMES; i++, h++) {
+	for (i = 0, h = udp4_l2_mh_sock; i < UDP_TAP_FRAMES_MEM; i++, h++) {
 		struct msghdr *mh = &h->msg_hdr;
 
 		mh->msg_name			= &udp4_l2_buf[i].s_in;
@@ -366,7 +367,7 @@ static void udp_sock4_iov_init(void)
 		mh->msg_iovlen			= 1;
 	}
 
-	for (i = 0, h = udp4_l2_mh_tap; i < UDP_TAP_FRAMES; i++, h++) {
+	for (i = 0, h = udp4_l2_mh_tap; i < UDP_TAP_FRAMES_MEM; i++, h++) {
 		struct msghdr *mh = &h->msg_hdr;
 
 		udp4_l2_iov_tap[i].iov_base	= &udp4_l2_buf[i].vnet_len;
@@ -394,7 +395,7 @@ static void udp_sock6_iov_init(void)
 		};
 	}
 
-	for (i = 0, h = udp6_l2_mh_sock; i < UDP_TAP_FRAMES; i++, h++) {
+	for (i = 0, h = udp6_l2_mh_sock; i < UDP_TAP_FRAMES_MEM; i++, h++) {
 		struct msghdr *mh = &h->msg_hdr;
 
 		mh->msg_name			= &udp6_l2_buf[i].s_in6;
@@ -406,7 +407,7 @@ static void udp_sock6_iov_init(void)
 		mh->msg_iovlen			= 1;
 	}
 
-	for (i = 0, h = udp6_l2_mh_tap; i < UDP_TAP_FRAMES; i++, h++) {
+	for (i = 0, h = udp6_l2_mh_tap; i < UDP_TAP_FRAMES_MEM; i++, h++) {
 		struct msghdr *mh = &h->msg_hdr;
 
 		udp6_l2_iov_tap[i].iov_base	= &udp6_l2_buf[i].vnet_len;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 12/18] pasta: By default, quit if filesystem-bound net namespace goes away
  2022-02-22  1:34 [PATCH 00/18] slirp4netns, sandboxing, Podman integration, assorted fixes Stefano Brivio
                   ` (10 preceding siblings ...)
  2022-02-22  1:34 ` [PATCH 11/18] tcp, udp: Receive batching doesn't pay off when writing single frames to tap Stefano Brivio
@ 2022-02-22  1:34 ` Stefano Brivio
  2022-02-22  1:34 ` [PATCH 13/18] test/distro/ubuntu: Use DEBIAN_FRONTEND=noninteractive for apt on 22.04 Stefano Brivio
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Stefano Brivio @ 2022-02-22  1:34 UTC (permalink / raw)
  To: passt-dev

[-- Attachment #1: Type: text/plain, Size: 8483 bytes --]

This should be convenient for users managing filesystem-bound network
namespaces: monitor the base directory of the namespace and exit if
the namespace given as PATH or NAME target is deleted. We can't add
an inotify watch directly on the namespace directory: that won't work
with nsfs.

Add an option to disable this behaviour, --no-netns-quit.

Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>
---
 Makefile |  3 ++-
 conf.c   | 43 +++++++++++++++++++++++++++++++++----------
 passt.1  |  5 +++++
 passt.c  |  7 ++++++-
 passt.h  |  7 +++++++
 pasta.c  | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 pasta.h  |  2 ++
 7 files changed, 107 insertions(+), 12 deletions(-)

diff --git a/Makefile b/Makefile
index 8477cf0..28ef316 100644
--- a/Makefile
+++ b/Makefile
@@ -153,6 +153,7 @@ pkgs:
 # - android-cloexec-pipe
 # - android-cloexec-pipe2
 # - android-cloexec-epoll-create1
+# - android-cloexec-inotify-init1
 #	TODO: check, fix except for the few cases where we need to share fds
 #
 # - bugprone-narrowing-conversions
@@ -197,7 +198,7 @@ clang-tidy: $(wildcard *.c) $(wildcard *.h)
 	-cppcoreguidelines-avoid-magic-numbers,\
 	-readability-isolate-declaration,\
 	-android-cloexec-open,-android-cloexec-pipe,-android-cloexec-pipe2,\
-	-android-cloexec-epoll-create1,\
+	-android-cloexec-epoll-create1,-android-cloexec-inotify-init1,\
 	-bugprone-narrowing-conversions,\
 	-cppcoreguidelines-narrowing-conversions,\
 	-cppcoreguidelines-avoid-non-const-global-variables,\
diff --git a/conf.c b/conf.c
index 21e9bc0..9851575 100644
--- a/conf.c
+++ b/conf.c
@@ -20,6 +20,7 @@
 #include <sched.h>
 #include <sys/types.h>
 #include <sys/stat.h>
+#include <libgen.h>
 #include <limits.h>
 #include <stdlib.h>
 #include <stdint.h>
@@ -414,20 +415,34 @@ static int conf_ns_opt(struct ctx *c,
 
 		nfd = open(netns, O_RDONLY);
 
-		if (nfd >= 0 && (ufd >= 0 || c->netns_only)) {
-			c->pasta_netns_fd = nfd;
-			c->pasta_userns_fd = ufd;
+		if (nfd == -1 || (ufd == -1 && !c->netns_only)) {
+			if (nfd >= 0)
+				close(nfd);
 
-			NS_CALL(conf_ns_check, c);
-			if (c->pasta_netns_fd >= 0)
-				return 0;
+			if (ufd >= 0)
+				close(ufd);
+
+			continue;
 		}
 
-		if (nfd >= 0)
-			close(nfd);
+		c->pasta_netns_fd = nfd;
+		c->pasta_userns_fd = ufd;
+
+		NS_CALL(conf_ns_check, c);
+
+		if (c->pasta_netns_fd >= 0) {
+			char buf[PATH_MAX];
+
+			if (try == 0 || c->no_netns_quit)
+				return 0;
+
+			strncpy(buf, netns, PATH_MAX);
+			strncpy(c->netns_base, basename(buf), PATH_MAX - 1);
+			strncpy(buf, netns, PATH_MAX);
+			strncpy(c->netns_dir, dirname(buf), PATH_MAX - 1);
 
-		if (ufd >= 0)
-			close(ufd);
+			return 0;
+		}
 	}
 
 	c->netns_only = netns_only_reset;
@@ -813,6 +828,7 @@ void conf(struct ctx *c, int argc, char **argv)
 		{"dhcp-search", no_argument,		NULL,		7 },
 		{"no-dhcp-search", no_argument,		NULL,		8 },
 		{"dns-forward",	required_argument,	NULL,		9 },
+		{"no-netns-quit", no_argument,		NULL,		10 },
 		{ 0 },
 	};
 	struct get_bound_ports_ns_arg ns_ports_arg = { .c = c };
@@ -937,6 +953,13 @@ void conf(struct ctx *c, int argc, char **argv)
 			err("Invalid DNS forwarding address: %s", optarg);
 			usage(argv[0]);
 			break;
+		case 10:
+			if (c->mode != MODE_PASTA) {
+				err("--no-netns-quit is for pasta mode only");
+				usage(argv[0]);
+			}
+			c->no_netns_quit = 1;
+			break;
 		case 'd':
 			if (c->debug) {
 				err("Multiple --debug options given");
diff --git a/passt.1 b/passt.1
index 7070a31..485e1db 100644
--- a/passt.1
+++ b/passt.1
@@ -426,6 +426,11 @@ Join only a target network namespace, not a user namespace, and don't create one
 for sandboxing purposes either. This is implied if PATH or NAME are given
 without \-\-userns.
 
+.TP
+.BR \-\-no-netns-quit
+If the target network namespace is bound to the filesystem (that is, if PATH or
+NAME are given as target), do not exit once the network namespace is deleted.
+
 .TP
 .BR \-\-nsrun-dir " " \fIpath
 Directory for nsfs mountpoints, used as path prefix for names of namespaces.
diff --git a/passt.c b/passt.c
index 67ad1c7..36f0161 100644
--- a/passt.c
+++ b/passt.c
@@ -301,7 +301,7 @@ void exit_handler(int signal)
  */
 int main(int argc, char **argv)
 {
-	int nfds, i, devnull_fd = -1, pidfile_fd = -1;
+	int nfds, i, devnull_fd = -1, pidfile_fd = -1, quit_fd;
 	struct epoll_event events[EPOLL_EVENTS];
 	struct ctx c = { 0 };
 	struct rlimit limit;
@@ -357,6 +357,8 @@ int main(int argc, char **argv)
 		exit(EXIT_FAILURE);
 	}
 
+	quit_fd = pasta_netns_quit_init(&c);
+
 	if (getrlimit(RLIMIT_NOFILE, &limit)) {
 		perror("getrlimit");
 		exit(EXIT_FAILURE);
@@ -416,6 +418,7 @@ int main(int argc, char **argv)
 	seccomp(&c);
 
 	timer_init(&c, &now);
+
 loop:
 	nfds = epoll_wait(c.epollfd, events, EPOLL_EVENTS, TIMER_INTERVAL);
 	if (nfds == -1 && errno != EINTR) {
@@ -431,6 +434,8 @@ loop:
 
 		if (fd == c.fd_tap || fd == c.fd_tap_listen)
 			tap_handler(&c, fd, events[i].events, &now);
+		else if (fd == quit_fd)
+			pasta_netns_quit_handler(&c, fd);
 		else
 			sock_handler(&c, ref, events[i].events, &now);
 	}
diff --git a/passt.h b/passt.h
index 2589ee7..042f760 100644
--- a/passt.h
+++ b/passt.h
@@ -101,6 +101,9 @@ enum passt_modes {
  * @pasta_netns_fd:	File descriptor for network namespace in pasta mode
  * @pasta_userns_fd:	Descriptor for user namespace to join, -1 once joined
  * @netns_only:		In pasta mode, don't join or create a user namespace
+ * @no_netns_quit:	In pasta mode, don't exit if fs-bound namespace is gone
+ * @netns_base:		Base name for fs-bound namespace, if any, in pasta mode
+ * @netns_dir:		Directory of fs-bound namespace, if any, in pasta mode
  * @proc_net_tcp:	Stored handles for /proc/net/tcp{,6} in init and ns
  * @proc_net_udp:	Stored handles for /proc/net/udp{,6} in init and ns
  * @epollfd:		File descriptor for epoll instance
@@ -161,6 +164,10 @@ struct ctx {
 	int pasta_userns_fd;
 	int netns_only;
 
+	int no_netns_quit;
+	char netns_base[PATH_MAX];
+	char netns_dir[PATH_MAX];
+
 	int proc_net_tcp[IP_VERSIONS][2];
 	int proc_net_udp[IP_VERSIONS][2];
 
diff --git a/pasta.c b/pasta.c
index 972cbcf..e45cc92 100644
--- a/pasta.c
+++ b/pasta.c
@@ -24,6 +24,8 @@
 #include <stdint.h>
 #include <unistd.h>
 #include <syslog.h>
+#include <sys/epoll.h>
+#include <sys/inotify.h>
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <fcntl.h>
@@ -219,3 +221,53 @@ void pasta_ns_conf(struct ctx *c)
 
 	proto_update_l2_buf(c->mac_guest, NULL, NULL);
 }
+
+/**
+ * pasta_netns_quit_init() - Watch network namespace to quit once it's gone
+ * @c:		Execution context
+ *
+ * Return: inotify file descriptor, -1 on failure or if not needed/applicable
+ */
+int pasta_netns_quit_init(struct ctx *c)
+{
+	struct epoll_event ev = { .events = EPOLLIN };
+	int inotify_fd;
+
+	if (c->mode != MODE_PASTA || c->no_netns_quit || !*c->netns_base)
+		return -1;
+
+	if ((inotify_fd = inotify_init1(O_NONBLOCK)) < 0) {
+		perror("inotify_init(): won't quit once netns is gone");
+		return -1;
+	}
+
+	if (inotify_add_watch(inotify_fd, c->netns_dir, IN_DELETE) < 0) {
+		perror("inotify_add_watch(): won't quit once netns is gone");
+		return -1;
+	}
+
+	ev.data.fd = inotify_fd;
+	epoll_ctl(c->epollfd, EPOLL_CTL_ADD, inotify_fd, &ev);
+
+	return inotify_fd;
+}
+
+/**
+ * pasta_netns_quit_handler() - Handle ns directory events, exit if ns is gone
+ * @c:		Execution context
+ * @inotify_fd:	inotify file descriptor with watch on namespace directory
+ */
+void pasta_netns_quit_handler(struct ctx *c, int inotify_fd)
+{
+	char buf[sizeof(struct inotify_event) + NAME_MAX + 1];
+	struct inotify_event *in_ev = (struct inotify_event *)buf;
+
+	if (read(inotify_fd, buf, sizeof(buf)) < (ssize_t)sizeof(*in_ev))
+		return;
+
+	if (strncmp(in_ev->name, c->netns_base, sizeof(c->netns_base)))
+		return;
+
+	info("Namespace %s is gone, exiting", c->netns_base);
+	exit(EXIT_SUCCESS);
+}
diff --git a/pasta.h b/pasta.h
index 1fcd6a9..235bfb9 100644
--- a/pasta.h
+++ b/pasta.h
@@ -6,3 +6,5 @@
 void pasta_start_ns(struct ctx *c);
 void pasta_ns_conf(struct ctx *c);
 void pasta_child_handler(int signal);
+int pasta_netns_quit_init(struct ctx *c);
+void pasta_netns_quit_handler(struct ctx *c, int inotify_fd);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 13/18] test/distro/ubuntu: Use DEBIAN_FRONTEND=noninteractive for apt on 22.04
  2022-02-22  1:34 [PATCH 00/18] slirp4netns, sandboxing, Podman integration, assorted fixes Stefano Brivio
                   ` (11 preceding siblings ...)
  2022-02-22  1:34 ` [PATCH 12/18] pasta: By default, quit if filesystem-bound net namespace goes away Stefano Brivio
@ 2022-02-22  1:34 ` Stefano Brivio
  2022-02-22  1:34 ` [PATCH 14/18] test/perf/passt_udp: Drop threshold for 256B test Stefano Brivio
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Stefano Brivio @ 2022-02-22  1:34 UTC (permalink / raw)
  To: passt-dev

[-- Attachment #1: Type: text/plain, Size: 899 bytes --]

Removing the needrestart package doesn't seem to work anymore, and
I'm again getting prompts to restart services after installing gcc
and make: export DEBIAN_FRONTEND=noninteractive before installing
packages to avoid that.

Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>
---
 test/distro/ubuntu | 1 +
 1 file changed, 1 insertion(+)

diff --git a/test/distro/ubuntu b/test/distro/ubuntu
index b67c1f3..781daab 100644
--- a/test/distro/ubuntu
+++ b/test/distro/ubuntu
@@ -187,6 +187,7 @@ host	guestfish --rw -a __IMG__ -i copy-in __GUEST_FILES__ /root/
 host	./qrap 5 qemu-system-s390x -m 2048 -smp 2 -serial stdio -nodefaults -nographic __IMG__ -net socket,fd=5 -net nic,model=virtio -device virtio-rng-ccw
 
 host	service systemd-resolved stop
+host	export DEBIAN_FRONTEND=noninteractive
 host	apt-get -y remove needrestart snapd
 host	dhclient
 sleep	2
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 14/18] test/perf/passt_udp: Drop threshold for 256B test
  2022-02-22  1:34 [PATCH 00/18] slirp4netns, sandboxing, Podman integration, assorted fixes Stefano Brivio
                   ` (12 preceding siblings ...)
  2022-02-22  1:34 ` [PATCH 13/18] test/distro/ubuntu: Use DEBIAN_FRONTEND=noninteractive for apt on 22.04 Stefano Brivio
@ 2022-02-22  1:34 ` Stefano Brivio
  2022-02-22  1:34 ` [PATCH 15/18] man page: Update REPORTING BUGS section Stefano Brivio
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Stefano Brivio @ 2022-02-22  1:34 UTC (permalink / raw)
  To: passt-dev

[-- Attachment #1: Type: text/plain, Size: 1252 bytes --]

That test fails sometimes: it looks like iperf3 is still sending
initial messages that are too big. I'll need to figure out why,
but given that 256 bytes is not really an expected MTU, drop the
thresholds to zero for the time being.

Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>
---
 test/perf/passt_udp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/test/perf/passt_udp b/test/perf/passt_udp
index 349f429..ff4c73a 100644
--- a/test/perf/passt_udp
+++ b/test/perf/passt_udp
@@ -77,7 +77,7 @@ tr	UDP throughput over IPv4: guest to host
 guest	ip link set dev __IFNAME__ mtu 256
 iperf3c	guest __GW__ 100${i}2 __THREADS__ __OPTS__ -b 500M
 iperf3s	BW ns 100${i}2 __THREADS__
-bw	__BW__ 0.1 0.2
+bw	__BW__ 0.0 0.0
 guest	ip link set dev __IFNAME__ mtu 576
 iperf3c	guest __GW__ 100${i}2 __THREADS__ __OPTS__ -b 1G
 iperf3s	BW ns 100${i}2 __THREADS__
@@ -146,7 +146,7 @@ tr	UDP throughput over IPv4: host to guest
 ns	ip link set dev lo mtu 256
 iperf3c	ns 127.0.0.1 100${i}1 __THREADS__ __OPTS__ -b 1G
 iperf3s	BW guest 100${i}1 __THREADS__
-bw	__BW__ 0.1 0.2
+bw	__BW__ 0.0 0.0
 ns	ip link set dev lo mtu 576
 iperf3c	ns 127.0.0.1 100${i}1 __THREADS__ __OPTS__ -b 1G
 iperf3s	BW guest 100${i}1 __THREADS__
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 15/18] man page: Update REPORTING BUGS section
  2022-02-22  1:34 [PATCH 00/18] slirp4netns, sandboxing, Podman integration, assorted fixes Stefano Brivio
                   ` (13 preceding siblings ...)
  2022-02-22  1:34 ` [PATCH 14/18] test/perf/passt_udp: Drop threshold for 256B test Stefano Brivio
@ 2022-02-22  1:34 ` Stefano Brivio
  2022-02-22  1:34 ` [PATCH 16/18] README, hooks: Build HTML man page on push, add a link Stefano Brivio
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Stefano Brivio @ 2022-02-22  1:34 UTC (permalink / raw)
  To: passt-dev

[-- Attachment #1: Type: text/plain, Size: 1120 bytes --]

Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>
---
 passt.1 | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/passt.1 b/passt.1
index 485e1db..65b473b 100644
--- a/passt.1
+++ b/passt.1
@@ -1,5 +1,5 @@
 .\" SPDX-License-Identifier: AGPL-3.0-or-later
-.\" Copyright (c) 2020-2021 Red Hat GmbH
+.\" Copyright (c) 2020-2022 Red Hat GmbH
 .\" Author: Stefano Brivio <sbrivio(a)redhat.com>
 .TH passt 1
 
@@ -781,12 +781,13 @@ Stefano Brivio <sbrivio(a)redhat.com>
 
 .SH REPORTING BUGS
 
-No public bug tracker is available at this time. For the moment being, report
-issues to Stefano Brivio <sbrivio(a)redhat.com>.
+Please report issues on the bug tracker at https://passt.top/passt/bugs, or
+send a message to the passt-user(a)passt.top mailing list, see
+https://passt.top/passt/lists.
 
 .SH COPYRIGHT
 
-Copyright (c) 2020-2021 Red Hat GmbH.
+Copyright (c) 2020-2022 Red Hat GmbH.
 
 \fBpasst\fR and \fBpasta\fR are free software: you can redistribute them and/or
 modify them under the terms of the GNU Affero General Public License as
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 16/18] README, hooks: Build HTML man page on push, add a link
  2022-02-22  1:34 [PATCH 00/18] slirp4netns, sandboxing, Podman integration, assorted fixes Stefano Brivio
                   ` (14 preceding siblings ...)
  2022-02-22  1:34 ` [PATCH 15/18] man page: Update REPORTING BUGS section Stefano Brivio
@ 2022-02-22  1:34 ` Stefano Brivio
  2022-02-22  1:34 ` [PATCH 17/18] contrib: Add patch for Podman integration Stefano Brivio
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Stefano Brivio @ 2022-02-22  1:34 UTC (permalink / raw)
  To: passt-dev

[-- Attachment #1: Type: text/plain, Size: 905 bytes --]

Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>
---
 README.md      | 2 ++
 hooks/pre-push | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/README.md b/README.md
index 1c8baf3..51cc870 100644
--- a/README.md
+++ b/README.md
@@ -128,6 +128,8 @@ for TCP and UDP, respectively.
 - [Contribute](#contribute)
 - [Security and Vulnerability Reports](#security-and-vulnerability-reports)
 
+See also the [man page](/builds/latest/web/passt.1.html).
+
 ## Motivation
 
 ### passt
diff --git a/hooks/pre-push b/hooks/pre-push
index a5e4790..0498b0a 100755
--- a/hooks/pre-push
+++ b/hooks/pre-push
@@ -45,6 +45,9 @@ cd ..
 make static
 scp passt pasta qrap passt.1 pasta.1 qrap.1	"${USER_HOST}:${BIN}"
 
+man2html -M "/" passt.1 > passt.1.html
+scp passt.1.html				"${USER_HOST}:${WEB}/"
+
 make pkgs
 ssh "${USER_HOST}" 				"rm -f ${BIN}/*.deb"
 ssh "${USER_HOST}"				"rm -f ${BIN}/*.rpm"
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 17/18] contrib: Add patch for Podman integration
  2022-02-22  1:34 [PATCH 00/18] slirp4netns, sandboxing, Podman integration, assorted fixes Stefano Brivio
                   ` (15 preceding siblings ...)
  2022-02-22  1:34 ` [PATCH 16/18] README, hooks: Build HTML man page on push, add a link Stefano Brivio
@ 2022-02-22  1:34 ` Stefano Brivio
  2022-02-22  1:34 ` [PATCH 18/18] test: Add demo for Podman with pasta Stefano Brivio
  2022-02-22  9:07 ` [PATCH 00/18] slirp4netns, sandboxing, Podman integration, assorted fixes Stefano Brivio
  18 siblings, 0 replies; 20+ messages in thread
From: Stefano Brivio @ 2022-02-22  1:34 UTC (permalink / raw)
  To: passt-dev

[-- Attachment #1: Type: text/plain, Size: 39012 bytes --]

The patch introduces a "pasta" networking mode for rootless
containers, similar to the existing slirp4netns mode. Notable
differences are described in the commit message.

Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>
---
 ...001-libpod-Add-pasta-networking-mode.patch | 542 ++++++++++++++++++
 1 file changed, 542 insertions(+)
 create mode 100644 contrib/podman/0001-libpod-Add-pasta-networking-mode.patch

diff --git a/contrib/podman/0001-libpod-Add-pasta-networking-mode.patch b/contrib/podman/0001-libpod-Add-pasta-networking-mode.patch
new file mode 100644
index 0000000..98cb48b
--- /dev/null
+++ b/contrib/podman/0001-libpod-Add-pasta-networking-mode.patch
@@ -0,0 +1,542 @@
+From bcfd618a316097e5d2e1a20703b11beeb21b6899 Mon Sep 17 00:00:00 2001
+From: Stefano Brivio <sbrivio(a)redhat.com>
+Date: Sat, 19 Feb 2022 04:54:09 +0100
+Subject: [PATCH] libpod: Add pasta networking mode
+
+Conceptually equivalent to networking by means of slirp4netns(1),
+with a few practical differences:
+
+- pasta(1) forks to background once networking is configured in the
+  namespace and quits on its own once the namespace is deleted:
+  file descriptor synchronisation and PID tracking are not needed
+
+- port forwarding is configured via command line options at start-up,
+  instead of an API socket: this is taken care of right away as we're
+  about to start pasta
+
+- there's no need for further selection of port forwarding modes:
+  pasta behaves similarly to containers-rootlessport for local binds
+  (splice() instead of read()/write() pairs, without L2-L4
+  translation), and keeps the original source address for non-local
+  connections like slirp4netns does
+
+- IPv6 is enabled by default, it's not an experimental feature. It
+  can be disabled using additional options as documented
+
+- by default, addresses and routes are copied from the host, that is,
+  container users will see the same IP address and routes as if they
+  were in the init namespace context. The interface name is also
+  sourced from the host upstream interface with the first default
+  route in the routing table. This is also configurable as documented
+
+- by default, the host is reachable using the gateway address from
+  the container, unless the --no-map-gw option is passed
+
+- sandboxing and seccomp(2) policies cannot be disabled
+
+See https://passt.top for more details about pasta.
+
+Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>
+---
+SPDX-FileCopyrightText: 2021-2022 Red Hat GmbH <sbrivio(a)redhat.com>
+SPDX-License-Identifier: Apache-2.0
+
+ docs/source/markdown/podman-create.1.md     | 40 ++++++++++++-
+ docs/source/markdown/podman-pod-create.1.md | 33 +++++++++++
+ docs/source/markdown/podman-run.1.md        | 38 +++++++++++-
+ docs/source/markdown/podman.1.md            |  6 +-
+ libpod/networking_linux.go                  |  6 +-
+ libpod/networking_pasta.go                  | 64 +++++++++++++++++++++
+ pkg/namespaces/namespaces.go                |  6 ++
+ pkg/specgen/generate/namespaces.go          | 10 ++++
+ pkg/specgen/generate/pod_create.go          |  6 ++
+ pkg/specgen/namespaces.go                   | 18 +++++-
+ pkg/specgen/podspecgen.go                   |  2 +-
+ 11 files changed, 215 insertions(+), 14 deletions(-)
+ create mode 100644 libpod/networking_pasta.go
+
+diff --git a/docs/source/markdown/podman-create.1.md b/docs/source/markdown/podman-create.1.md
+index 2a0f3b738..5cc03bff3 100644
+--- a/docs/source/markdown/podman-create.1.md
++++ b/docs/source/markdown/podman-create.1.md
+@@ -699,12 +699,19 @@ Valid _mode_ values are:
+   - **interface_name**: Specify a name for the created network interface inside the container.
+ 
+   For example to set a static ipv4 address and a static mac address, use `--network bridge:ip=10.88.0.10,mac=44:33:22:11:00:99`.
++
+ - \<network name or ID\>[:OPTIONS,...]: Connect to a user-defined network; this is the network name or ID from a network created by **[podman network create](podman-network-create.1.md)**. Using the network name implies the bridge network mode. It is possible to specify the same options described under the bridge mode above. You can use the **--network** option multiple times to specify additional networks.
++
+ - **none**: Create a network namespace for the container but do not configure network interfaces for it, thus the container has no network connectivity.
++
+ - **container:**_id_: Reuse another container's network stack.
++
+ - **host**: Do not create a network namespace, the container will use the host's network. Note: The host mode gives the container full access to local system services such as D-bus and is therefore considered insecure.
++
+ - **ns:**_path_: Path to a network namespace to join.
++
+ - **private**: Create a new namespace for the container. This will use the **bridge** mode for rootfull containers and **slirp4netns** for rootless ones.
++
+ - **slirp4netns[:OPTIONS,...]**: use **slirp4netns**(1) to create a user network stack. This is the default for rootless containers. It is possible to specify these additional options:
+   - **allow_host_loopback=true|false**: Allow the slirp4netns to reach the host loopback IP (`10.0.2.2`, which is added to `/etc/hosts` as `host.containers.internal` for your convenience). Default is false.
+   - **mtu=MTU**: Specify the MTU to use for this network. (Default is `65520`).
+@@ -718,6 +725,30 @@ Valid _mode_ values are:
+   Note: Rootlesskit changes the source IP address of incoming packets to an IP address in the container network namespace, usually `10.0.2.100`. If your application requires the real source IP address, e.g. web server logs, use the slirp4netns port handler. The rootlesskit port handler is also used for rootless containers when connected to user-defined networks.
+   - **port_handler=slirp4netns**: Use the slirp4netns port forwarding, it is slower than rootlesskit but preserves the correct source IP address. This port handler cannot be used for user-defined networks.
+ 
++- **pasta[:OPTIONS,...]**: use **pasta**(1) to create a user-mode networking
++stack. By default, IPv4 and IPv6 addresses and routes, as well as the pod
++interface name, are copied from the host. If port forwarding isn't configured,
++ports will be forwarded dynamically as services are bound on either side (init
++namespace or container namespace). Port forwarding preserves the original source
++IP address. Options described in pasta(1) can be specified as comma-separated
++arguments. In terms of pasta(1) options, only **--config-net** is given by
++default, in order to configure networking when the container is started. Some
++examples:
++  - **pasta:--no-map-gw**: Don't allow the container to directly reach the host
++    using the gateway address, which would normally be mapped to a loopback or
++    link-local address.
++  - **pasta:--mtu,1500**: Specify a 1500 bytes MTU for the _tap_ interface in
++    the container.
++  - **pasta:--ipv4-only,-a,10.0.2.0,-n,24,-g,10.0.2.2,--dns-forward,10.0.2.3,-m,1500,--no-ndp,--no-dhcpv6,--no-dhcp**,
++    equivalent to default slirp4netns(1) options: disable IPv6, assign
++    `10.0.2.0/24` to the `tap0` interface in the container, with gateway
++    `10.0.2.2`, enable DNS forwarder reachable at `10.0.2.3`, set MTU to 1500
++    bytes, disable NDP, DHCPv6 and DHCP support.
++  - **pasta:--no-map-gw,-I,tap0,--ipv4-only,-a,10.0.2.0,-n,24,-g,10.0.2.2,--dns-forward,10.0.2.3,--no-ndp,--no-dhcpv6,--no-dhcp**,
++    equivalent to default slirp4netns(1) options with Podman overrides: same as
++    above, but leave the MTU at 65520 bytes, and don't map the gateway address
++    from the container to a local address.
++
+ #### **--network-alias**=*alias*
+ 
+ Add a network-scoped alias for the container, setting the alias for all networks that the container joins. To set a name only for a specific network, use the alias option as described under the **--network** option.
+@@ -1551,8 +1582,9 @@ In order for users to run rootless, there must be an entry for their username in
+ 
+ Rootless Podman works better if the fuse-overlayfs and slirp4netns packages are installed.
+ The fuse-overlayfs package provides a userspace overlay storage driver, otherwise users need to use
+-the vfs storage driver, which is diskspace expensive and does not perform well. slirp4netns is
+-required for VPN, without it containers need to be run with the --network=host flag.
++the vfs storage driver, which is diskspace expensive and does not perform well.
++slirp4netns or pasta is required for VPN; without either, containers need to be run
++with the --network=host flag.
+ 
+ ## ENVIRONMENT
+ 
+@@ -1601,7 +1633,9 @@ page.
+ NOTE: Use the environment variable `TMPDIR` to change the temporary storage location of downloaded container images. Podman defaults to use `/var/tmp`.
+ 
+ ## SEE ALSO
+-**[podman(1)](podman.1.md)**, **[podman-save(1)](podman-save.1.md)**, **[podman-ps(1)](podman-ps.1.md)**, **[podman-attach(1)](podman-attach.1.md)**, **[podman-pod-create(1)](podman-pod-create.1.md)**, **[podman-port(1)](podman-port.1.md)**, **[podman-start(1)](podman-start.1.md)**, **[podman-kill(1)](podman-kill.1.md)**, **[podman-stop(1)](podman-stop.1.md)**, **[podman-generate-systemd(1)](podman-generate-systemd.1.md)**, **[podman-rm(1)](podman-rm.1.md)**, **[subgid(5)](https://www.unix.com/man-page/linux/5/subgid)**, **[subuid(5)](https://www.unix.com/man-page/linux/5/subuid)**, **[containers.conf(5)](https://github.com/containers/common/blob/main/docs/containers.conf.5.md)**, **[systemd.unit(5)](https://www.freedesktop.org/software/systemd/man/systemd.unit.html)**, **[setsebool(8)](https://man7.org/linux/man-pages/man8/setsebool.8.html)**, **[slirp4netns(1)](https://github.com/rootless-containers/slirp4netns/blob/master/slirp4netns.1.md)**, **[fuse-overlayfs(1)](https://github.com/containers/fuse-overlayfs/blob/main/fuse-overlayfs.1.md)**, **proc(5)**, **[conmon(8)](https://github.com/containers/conmon/blob/main/docs/conmon.8.md)**, **personality(2)**
++**[podman(1)](podman.1.md)**, **[podman-save(1)](podman-save.1.md)**, **[podman-ps(1)](podman-ps.1.md)**, **[podman-attach(1)](podman-attach.1.md)**, **[podman-pod-create(1)](podman-pod-create.1.md)**, **[podman-port(1)](podman-port.1.md)**, **[podman-start(1)](podman-start.1.md)**, **[podman-kill(1)](podman-kill.1.md)**, **[podman-stop(1)](podman-stop.1.md)**, **[podman-generate-systemd(1)](podman-generate-systemd.1.md)**, **[podman-rm(1)](podman-rm.1.md)**, **[subgid(5)](https://www.unix.com/man-page/linux/5/subgid)**, **[subuid(5)](https://www.unix.com/man-page/linux/5/subuid)**, **[containers.conf(5)](https://github.com/containers/common/blob/main/docs/containers.conf.5.md)**, **[systemd.unit(5)](https://www.freedesktop.org/software/systemd/man/systemd.unit.html)**, **[setsebool(8)](https://man7.org/linux/man-pages/man8/setsebool.8.html)**, **[slirp4netns(1)](https://github.com/rootless-containers/slirp4netns/blob/master/slirp4netns.1.md)**,
++**[pasta(1)](https://passt.top/builds/latest/web/passt.1.html)**,
++**[fuse-overlayfs(1)](https://github.com/containers/fuse-overlayfs/blob/main/fuse-overlayfs.1.md)**, **proc(5)**, **[conmon(8)](https://github.com/containers/conmon/blob/main/docs/conmon.8.md)**, **personality(2)**
+ 
+ ## HISTORY
+ October 2017, converted from Docker documentation to Podman by Dan Walsh for Podman `<dwalsh(a)redhat.com>`
+diff --git a/docs/source/markdown/podman-pod-create.1.md b/docs/source/markdown/podman-pod-create.1.md
+index 8088e1d62..c94ac6061 100644
+--- a/docs/source/markdown/podman-pod-create.1.md
++++ b/docs/source/markdown/podman-pod-create.1.md
+@@ -175,12 +175,19 @@ Valid _mode_ values are:
+   - **interface_name**: Specify a name for the created network interface inside the container.
+ 
+   For example to set a static ipv4 address and a static mac address, use `--network bridge:ip=10.88.0.10,mac=44:33:22:11:00:99`.
++
+ - \<network name or ID\>[:OPTIONS,...]: Connect to a user-defined network; this is the network name or ID from a network created by **[podman network create](podman-network-create.1.md)**. Using the network name implies the bridge network mode. It is possible to specify the same options described under the bridge mode above. You can use the **--network** option multiple times to specify additional networks.
++
+ - **none**: Create a network namespace for the container but do not configure network interfaces for it, thus the container has no network connectivity.
++
+ - **container:**_id_: Reuse another container's network stack.
++
+ - **host**: Do not create a network namespace, the container will use the host's network. Note: The host mode gives the container full access to local system services such as D-bus and is therefore considered insecure.
++
+ - **ns:**_path_: Path to a network namespace to join.
++
+ - **private**: Create a new namespace for the container. This will use the **bridge** mode for rootfull containers and **slirp4netns** for rootless ones.
++
+ - **slirp4netns[:OPTIONS,...]**: use **slirp4netns**(1) to create a user network stack. This is the default for rootless containers. It is possible to specify these additional options:
+   - **allow_host_loopback=true|false**: Allow the slirp4netns to reach the host loopback IP (`10.0.2.2`, which is added to `/etc/hosts` as `host.containers.internal` for your convenience). Default is false.
+   - **mtu=MTU**: Specify the MTU to use for this network. (Default is `65520`).
+@@ -194,6 +201,30 @@ Valid _mode_ values are:
+   Note: Rootlesskit changes the source IP address of incoming packets to an IP address in the container network namespace, usually `10.0.2.100`. If your application requires the real source IP address, e.g. web server logs, use the slirp4netns port handler. The rootlesskit port handler is also used for rootless containers when connected to user-defined networks.
+   - **port_handler=slirp4netns**: Use the slirp4netns port forwarding, it is slower than rootlesskit but preserves the correct source IP address. This port handler cannot be used for user-defined networks.
+ 
++- **pasta[:OPTIONS,...]**: use **pasta**(1) to create a user-mode networking
++stack. By default, IPv4 and IPv6 addresses and routes, as well as the pod
++interface name, are copied from the host. If port forwarding isn't configured,
++ports will be forwarded dynamically as services are bound on either side (init
++namespace or container namespace). Port forwarding preserves the original source
++IP address. Options described in pasta(1) can be specified as comma-separated
++arguments. In terms of pasta(1) options, only **--config-net** is given by
++default, in order to configure networking when the container is started. Some
++examples:
++  - **pasta:--no-map-gw**: Don't allow the container to directly reach the host
++    using the gateway address, which would normally be mapped to a loopback or
++    link-local address.
++  - **pasta:--mtu,1500**: Specify a 1500 bytes MTU for the _tap_ interface in
++    the container.
++  - **pasta:--ipv4-only,-a,10.0.2.0,-n,24,-g,10.0.2.2,--dns-forward,10.0.2.3,-m,1500,--no-ndp,--no-dhcpv6,--no-dhcp**,
++    equivalent to default slirp4netns(1) options: disable IPv6, assign
++    `10.0.2.0/24` to the `tap0` interface in the container, with gateway
++    `10.0.2.2`, enable DNS forwarder reachable at `10.0.2.3`, set MTU to 1500
++    bytes, disable NDP, DHCPv6 and DHCP support.
++  - **pasta:--no-map-gw,-I,tap0,--ipv4-only,-a,10.0.2.0,-n,24,-g,10.0.2.2,--dns-forward,10.0.2.3,--no-ndp,--no-dhcpv6,--no-dhcp**,
++    equivalent to default slirp4netns(1) options with Podman overrides: same as
++    above, but leave the MTU at 65520 bytes, and don't map the gateway address
++    from the container to a local address.
++
+ #### **--network-alias**=*alias*
+ 
+ Add a network-scoped alias for the pod, setting the alias for all networks that the pod joins. To set a name only for a specific network, use the alias option as described under the **--network** option.
+@@ -527,6 +558,8 @@ $ podman pod create --network slirp4netns:outbound_addr=127.0.0.1,allow_host_loo
+ 
+ $ podman pod create --network slirp4netns:cidr=192.168.0.0/24
+ 
++$ podman pod create --network pasta
++
+ $ podman pod create --network net1:ip=10.89.1.5 --network net2:ip=10.89.10.10
+ ```
+ 
+diff --git a/docs/source/markdown/podman-run.1.md b/docs/source/markdown/podman-run.1.md
+index 239cf3b83..7c12f5e88 100644
+--- a/docs/source/markdown/podman-run.1.md
++++ b/docs/source/markdown/podman-run.1.md
+@@ -726,12 +726,19 @@ Valid _mode_ values are:
+   - **interface_name**: Specify a name for the created network interface inside the container.
+ 
+   For example to set a static ipv4 address and a static mac address, use `--network bridge:ip=10.88.0.10,mac=44:33:22:11:00:99`.
++
+ - \<network name or ID\>[:OPTIONS,...]: Connect to a user-defined network; this is the network name or ID from a network created by **[podman network create](podman-network-create.1.md)**. Using the network name implies the bridge network mode. It is possible to specify the same options described under the bridge mode above. You can use the **--network** option multiple times to specify additional networks.
++
+ - **none**: Create a network namespace for the container but do not configure network interfaces for it, thus the container has no network connectivity.
++
+ - **container:**_id_: Reuse another container's network stack.
++
+ - **host**: Do not create a network namespace, the container will use the host's network. Note: The host mode gives the container full access to local system services such as D-bus and is therefore considered insecure.
++
+ - **ns:**_path_: Path to a network namespace to join.
++
+ - **private**: Create a new namespace for the container. This will use the **bridge** mode for rootfull containers and **slirp4netns** for rootless ones.
++
+ - **slirp4netns[:OPTIONS,...]**: use **slirp4netns**(1) to create a user network stack. This is the default for rootless containers. It is possible to specify these additional options:
+   - **allow_host_loopback=true|false**: Allow the slirp4netns to reach the host loopback IP (`10.0.2.2`, which is added to `/etc/hosts` as `host.containers.internal` for your convenience). Default is false.
+   - **mtu=MTU**: Specify the MTU to use for this network. (Default is `65520`).
+@@ -745,6 +752,30 @@ Valid _mode_ values are:
+   Note: Rootlesskit changes the source IP address of incoming packets to an IP address in the container network namespace, usually `10.0.2.100`. If your application requires the real source IP address, e.g. web server logs, use the slirp4netns port handler. The rootlesskit port handler is also used for rootless containers when connected to user-defined networks.
+   - **port_handler=slirp4netns**: Use the slirp4netns port forwarding, it is slower than rootlesskit but preserves the correct source IP address. This port handler cannot be used for user-defined networks.
+ 
++- **pasta[:OPTIONS,...]**: use **pasta**(1) to create a user-mode networking
++stack. By default, IPv4 and IPv6 addresses and routes, as well as the pod
++interface name, are copied from the host. If port forwarding isn't configured,
++ports will be forwarded dynamically as services are bound on either side (init
++namespace or container namespace). Port forwarding preserves the original source
++IP address. Options described in pasta(1) can be specified as comma-separated
++arguments. In terms of pasta(1) options, only **--config-net** is given by
++default, in order to configure networking when the container is started. Some
++examples:
++  - **pasta:--no-map-gw**: Don't allow the container to directly reach the host
++    using the gateway address, which would normally be mapped to a loopback or
++    link-local address.
++  - **pasta:--mtu,1500**: Specify a 1500 bytes MTU for the _tap_ interface in
++    the container.
++  - **pasta:--ipv4-only,-a,10.0.2.0,-n,24,-g,10.0.2.2,--dns-forward,10.0.2.3,-m,1500,--no-ndp,--no-dhcpv6,--no-dhcp**,
++    equivalent to default slirp4netns(1) options: disable IPv6, assign
++    `10.0.2.0/24` to the `tap0` interface in the container, with gateway
++    `10.0.2.2`, enable DNS forwarder reachable at `10.0.2.3`, set MTU to 1500
++    bytes, disable NDP, DHCPv6 and DHCP support.
++  - **pasta:--no-map-gw,-I,tap0,--ipv4-only,-a,10.0.2.0,-n,24,-g,10.0.2.2,--dns-forward,10.0.2.3,--no-ndp,--no-dhcpv6,--no-dhcp**,
++    equivalent to default slirp4netns(1) options with Podman overrides: same as
++    above, but leave the MTU at 65520 bytes, and don't map the gateway address
++    from the container to a local address.
++
+ #### **--network-alias**=*alias*
+ 
+ Add a network-scoped alias for the container, setting the alias for all networks that the container joins. To set a name only for a specific network, use the alias option as described under the **--network** option.
+@@ -1935,8 +1966,9 @@ In order for users to run rootless, there must be an entry for their username in
+ 
+ Rootless Podman works better if the fuse-overlayfs and slirp4netns packages are installed.
+ The **fuse-overlayfs** package provides a userspace overlay storage driver, otherwise users need to use
+-the **vfs** storage driver, which is diskspace expensive and does not perform well. slirp4netns is
+-required for VPN, without it containers need to be run with the **--network=host** flag.
++the **vfs** storage driver, which is diskspace expensive and does not perform
++well. slirp4netns or pasta is required for VPN; without either, containers need to
++be run with the **--network=host** flag.
+ 
+ ## ENVIRONMENT
+ 
+@@ -1983,7 +2015,7 @@ page.
+ NOTE: Use the environment variable `TMPDIR` to change the temporary storage location of downloaded container images. Podman defaults to use `/var/tmp`.
+ 
+ ## SEE ALSO
+-**[podman(1)](podman.1.md)**, **[podman-save(1)](podman-save.1.md)**, **[podman-ps(1)](podman-ps.1.md)**, **[podman-attach(1)](podman-attach.1.md)**, **[podman-pod-create(1)](podman-pod-create.1.md)**, **[podman-port(1)](podman-port.1.md)**, **[podman-start(1)](podman-start.1.md)**, **[podman-kill(1)](podman-kill.1.md)**, **[podman-stop(1)](podman-stop.1.md)**, **[podman-generate-systemd(1)](podman-generate-systemd.1.md)**, **[podman-rm(1)](podman-rm.1.md)**, **[subgid(5)](https://www.unix.com/man-page/linux/5/subgid)**, **[subuid(5)](https://www.unix.com/man-page/linux/5/subuid)**, **[containers.conf(5)](https://github.com/containers/common/blob/main/docs/containers.conf.5.md)**, **[systemd.unit(5)](https://www.freedesktop.org/software/systemd/man/systemd.unit.html)**, **[setsebool(8)](https://man7.org/linux/man-pages/man8/setsebool.8.html)**, **[slirp4netns(1)](https://github.com/rootless-containers/slirp4netns/blob/master/slirp4netns.1.md)**, **[fuse-overlayfs(1)](https://github.com/containers/fuse-overlayfs/blob/main/fuse-overlayfs.1.md)**, **proc(5)**, **[conmon(8)](https://github.com/containers/conmon/blob/main/docs/conmon.8.md)**, **personality(2)**
++**[podman(1)](podman.1.md)**, **[podman-save(1)](podman-save.1.md)**, **[podman-ps(1)](podman-ps.1.md)**, **[podman-attach(1)](podman-attach.1.md)**, **[podman-pod-create(1)](podman-pod-create.1.md)**, **[podman-port(1)](podman-port.1.md)**, **[podman-start(1)](podman-start.1.md)**, **[podman-kill(1)](podman-kill.1.md)**, **[podman-stop(1)](podman-stop.1.md)**, **[podman-generate-systemd(1)](podman-generate-systemd.1.md)**, **[podman-rm(1)](podman-rm.1.md)**, **[subgid(5)](https://www.unix.com/man-page/linux/5/subgid)**, **[subuid(5)](https://www.unix.com/man-page/linux/5/subuid)**, **[containers.conf(5)](https://github.com/containers/common/blob/main/docs/containers.conf.5.md)**, **[systemd.unit(5)](https://www.freedesktop.org/software/systemd/man/systemd.unit.html)**, **[setsebool(8)](https://man7.org/linux/man-pages/man8/setsebool.8.html)**, **[slirp4netns(1)](https://github.com/rootless-containers/slirp4netns/blob/master/slirp4netns.1.md)**, **[pasta(1)](https://passt.top/builds/latest/web/passt.1.html)**, **[fuse-overlayfs(1)](https://github.com/containers/fuse-overlayfs/blob/main/fuse-overlayfs.1.md)**, **proc(5)**, **[conmon(8)](https://github.com/containers/conmon/blob/main/docs/conmon.8.md)**, **personality(2)**
+ 
+ ## HISTORY
+ September 2018, updated by Kunal Kushwaha `<kushwaha_kunal_v7(a)lab.ntt.co.jp>`
+diff --git a/docs/source/markdown/podman.1.md b/docs/source/markdown/podman.1.md
+index b318001e4..1ad808cba 100644
+--- a/docs/source/markdown/podman.1.md
++++ b/docs/source/markdown/podman.1.md
+@@ -95,7 +95,7 @@ Set libpod namespace. Namespaces are used to separate groups of containers and p
+ When namespace is set, created containers and pods will join the given namespace, and only containers and pods in the given namespace will be visible to Podman.
+ 
+ #### **--network-cmd-path**=*path*
+-Path to the command binary to use for setting up a network.  It is currently only used for setting up a slirp4netns network.  If "" is used then the binary is looked up using the $PATH environment variable.
++Path to the command binary to use for setting up a network.  It is currently only used for setting up a slirp4netns(1) or pasta(1) network.  If "" is used then the binary is looked up using the $PATH environment variable.
+ 
+ #### **--noout**
+ 
+@@ -409,7 +409,7 @@ See the `subuid(5)` and `subgid(5)` man pages for more information.
+ 
+ Images are pulled under `XDG_DATA_HOME` when specified, otherwise in the home directory of the user under `.local/share/containers/storage`.
+ 
+-Currently the slirp4netns package is required to be installed to create a network device, otherwise rootless containers need to run in the network namespace of the host.
++Currently either slirp4netns or pasta is required to be installed to create a network device, otherwise rootless containers need to run in the network namespace of the host.
+ 
+ In certain environments like HPC (High Performance Computing), users cannot take advantage of the additional UIDs and GIDs from the /etc/subuid and /etc/subgid systems.  However, in this environment, rootless Podman can operate with a single UID.  To make this work, set the `ignore_chown_errors` option in the /etc/containers/storage.conf or in ~/.config/containers/storage.conf files. This option tells Podman when pulling an image to ignore chown errors when attempting to change a file in a container image to match the non-root UID in the image. This means all files get saved as the user's UID. Note this could cause issues when running the container.
+ 
+@@ -422,7 +422,7 @@ The Network File System (NFS) and other distributed file systems (for example: L
+ For more information, please refer to the [Podman Troubleshooting Page](https://github.com/containers/podman/blob/main/troubleshooting.md).
+ 
+ ## SEE ALSO
+-**[containers-mounts.conf(5)](https://github.com/containers/common/blob/main/docs/containers-mounts.conf.5.md)**, **[containers.conf(5)](https://github.com/containers/common/blob/main/docs/containers.conf.5.md)**, **[containers-registries.conf(5)](https://github.com/containers/image/blob/main/docs/containers-registries.conf.5.md)**, **[containers-storage.conf(5)](https://github.com/containers/storage/blob/main/docs/containers-storage.conf.5.md)**, **[buildah(1)](https://github.com/containers/buildah/blob/main/docs/buildah.1.md)**, **oci-hooks(5)**, **[containers-policy.json(5)](https://github.com/containers/image/blob/main/docs/containers-policy.json.5.md)**, **[crun(1)](https://github.com/containers/crun/blob/main/crun.1.md)**, **[runc(8)](https://github.com/opencontainers/runc/blob/master/man/runc.8.md)**, **[subuid(5)](https://www.unix.com/man-page/linux/5/subuid)**, **[subgid(5)](https://www.unix.com/man-page/linux/5/subgid)**, **[slirp4netns(1)](https://github.com/rootless-containers/slirp4netns/blob/master/slirp4netns.1.md)**, **[conmon(8)](https://github.com/containers/conmon/blob/main/docs/conmon.8.md)**
++**[containers-mounts.conf(5)](https://github.com/containers/common/blob/main/docs/containers-mounts.conf.5.md)**, **[containers.conf(5)](https://github.com/containers/common/blob/main/docs/containers.conf.5.md)**, **[containers-registries.conf(5)](https://github.com/containers/image/blob/main/docs/containers-registries.conf.5.md)**, **[containers-storage.conf(5)](https://github.com/containers/storage/blob/main/docs/containers-storage.conf.5.md)**, **[buildah(1)](https://github.com/containers/buildah/blob/main/docs/buildah.1.md)**, **oci-hooks(5)**, **[containers-policy.json(5)](https://github.com/containers/image/blob/main/docs/containers-policy.json.5.md)**, **[crun(1)](https://github.com/containers/crun/blob/main/crun.1.md)**, **[runc(8)](https://github.com/opencontainers/runc/blob/master/man/runc.8.md)**, **[subuid(5)](https://www.unix.com/man-page/linux/5/subuid)**, **[subgid(5)](https://www.unix.com/man-page/linux/5/subgid)**, **[slirp4netns(1)](https://github.com/rootless-containers/slirp4netns/blob/master/slirp4netns.1.md)**, **[pasta(1)](https://passt.top/builds/latest/web/passt.1.html)**, **[conmon(8)](https://github.com/containers/conmon/blob/main/docs/conmon.8.md)**
+ 
+ ## HISTORY
+ Dec 2016, Originally compiled by Dan Walsh <dwalsh(a)redhat.com>
+diff --git a/libpod/networking_linux.go b/libpod/networking_linux.go
+index 19d5c7f76..183f815ba 100644
+--- a/libpod/networking_linux.go
++++ b/libpod/networking_linux.go
+@@ -636,6 +636,9 @@ func (r *Runtime) configureNetNS(ctr *Container, ctrNS ns.NetNS) (status map[str
+ 	if ctr.config.NetMode.IsSlirp4netns() {
+ 		return nil, r.setupSlirp4netns(ctr, ctrNS)
+ 	}
++	if ctr.config.NetMode.IsPasta() {
++		return nil, r.setupPasta(ctr, ctrNS)
++	}
+ 	networks, err := ctr.networks()
+ 	if err != nil {
+ 		return nil, err
+@@ -806,7 +809,8 @@ func (r *Runtime) teardownCNI(ctr *Container) error {
+ 		return err
+ 	}
+ 
+-	if !ctr.config.NetMode.IsSlirp4netns() && len(networks) > 0 {
++	if !ctr.config.NetMode.IsSlirp4netns() &&
++		!ctr.config.NetMode.IsPasta() && len(networks) > 0 {
+ 		netOpts, err := ctr.getNetworkOptions(networks)
+ 		if err != nil {
+ 			return err
+diff --git a/libpod/networking_pasta.go b/libpod/networking_pasta.go
+new file mode 100644
+index 000000000..71595c87c
+--- /dev/null
++++ b/libpod/networking_pasta.go
+@@ -0,0 +1,64 @@
++// SPDX-License-Identifier: Apache-2.0
++//
++// networking_pasta.go - Start pasta(1) to provide connectivity to the container
++//
++// Copyright (c) 2022 Red Hat GmbH
++// Author: Stefano Brivio <sbrivio(a)redhat.com>
++//
++// +build linux
++
++package libpod
++
++import (
++	"fmt"
++	"os/exec"
++	"strings"
++
++	"github.com/containernetworking/plugins/pkg/ns"
++	"github.com/pkg/errors"
++	"github.com/sirupsen/logrus"
++)
++
++func (r *Runtime) setupPasta(ctr *Container, netns ns.NetNS) error {
++	path := r.config.Engine.NetworkCmdPath
++	if path == "" {
++		var err error
++		path, err = exec.LookPath("pasta")
++		if err != nil {
++			logrus.Errorf("Could not find pasta, the network namespace won't be configured: %v", err)
++			return nil
++		}
++	}
++
++	cmdArgs := []string{}
++	cmdArgs = append(cmdArgs, "--config-net")
++
++	for _, i := range ctr.convertPortMappings() {
++		if i.Protocol == "tcp" {
++			cmdArgs = append(cmdArgs, "-t")
++		} else if i.Protocol == "udp" {
++			cmdArgs = append(cmdArgs, "-u")
++		} else {
++			logrus.Errorf("can't forward protocol: %s", i.Protocol)
++			return nil
++		}
++
++		arg := fmt.Sprintf("%d:%d", i.HostPort, i.ContainerPort)
++		cmdArgs = append(cmdArgs, arg)
++	}
++
++	cmdArgs = append(cmdArgs, ctr.config.NetworkOptions["pasta"]...)
++
++	cmdArgs = append(cmdArgs, netns.Path())
++
++	logrus.Debugf("pasta arguments: %s", strings.Join(cmdArgs, " "))
++
++	// pasta forks once ready, and quits once we delete the target namespace
++	out, err := exec.Command(path, cmdArgs...).CombinedOutput()
++	if err != nil {
++		// err is not always an *exec.ExitError (e.g. exec failure)
++		return errors.Wrapf(err, "failed to start pasta: %s", out)
++	}
++
++	return nil
++}
+diff --git a/pkg/namespaces/namespaces.go b/pkg/namespaces/namespaces.go
+index a7736aee0..0b2cb2b0b 100644
+--- a/pkg/namespaces/namespaces.go
++++ b/pkg/namespaces/namespaces.go
+@@ -19,6 +19,7 @@ const (
+ 	privateType   = "private"
+ 	shareableType = "shareable"
+ 	slirpType     = "slirp4netns"
++	pastaType     = "pasta"
+ )
+ 
+ // CgroupMode represents cgroup mode in the container.
+@@ -388,6 +389,11 @@ func (n NetworkMode) IsSlirp4netns() bool {
+ 	return n == slirpType || strings.HasPrefix(string(n), slirpType+":")
+ }
+ 
++// IsPasta indicates if we are running a rootless network stack using pasta
++func (n NetworkMode) IsPasta() bool {
++	return n == pastaType || strings.HasPrefix(string(n), pastaType+":")
++}
++
+ // IsNS indicates a network namespace passed in by path (ns:<path>)
+ func (n NetworkMode) IsNS() bool {
+ 	return strings.HasPrefix(string(n), nsType)
+diff --git a/pkg/specgen/generate/namespaces.go b/pkg/specgen/generate/namespaces.go
+index 3f77cbe76..a72be1731 100644
+--- a/pkg/specgen/generate/namespaces.go
++++ b/pkg/specgen/generate/namespaces.go
+@@ -258,6 +258,16 @@ func namespaceOptions(ctx context.Context, s *specgen.SpecGenerator, rt *libpod.
+ 			val = fmt.Sprintf("slirp4netns:%s", s.NetNS.Value)
+ 		}
+ 		toReturn = append(toReturn, libpod.WithNetNS(portMappings, expose, postConfigureNetNS, val, nil))
++	case specgen.Pasta:
++		portMappings, expose, err := createPortMappings(ctx, s, imageData)
++		if err != nil {
++			return nil, err
++		}
++		val := "pasta"
++		if s.NetNS.Value != "" {
++			val = fmt.Sprintf("pasta:%s", s.NetNS.Value)
++		}
++		toReturn = append(toReturn, libpod.WithNetNS(portMappings, expose, postConfigureNetNS, val, nil))
+ 	case specgen.Private:
+ 		fallthrough
+ 	case specgen.Bridge:
+diff --git a/pkg/specgen/generate/pod_create.go b/pkg/specgen/generate/pod_create.go
+index 68fda3ad7..0d64027a3 100644
+--- a/pkg/specgen/generate/pod_create.go
++++ b/pkg/specgen/generate/pod_create.go
+@@ -232,6 +232,12 @@ func MapSpec(p *specgen.PodSpecGenerator) (*specgen.SpecGenerator, error) {
+ 			p.InfraContainerSpec.NetworkOptions = p.NetworkOptions
+ 			p.InfraContainerSpec.NetNS.NSMode = specgen.NamespaceMode("slirp4netns")
+ 		}
++	case specgen.Pasta:
++		logrus.Debugf("Pod will use pasta")
++		if p.InfraContainerSpec.NetNS.NSMode != "host" {
++			p.InfraContainerSpec.NetworkOptions = p.NetworkOptions
++			p.InfraContainerSpec.NetNS.NSMode = specgen.NamespaceMode("pasta")
++		}
+ 	case specgen.NoNetwork:
+ 		logrus.Debugf("Pod will not use networking")
+ 		if len(p.InfraContainerSpec.PortMappings) > 0 ||
+diff --git a/pkg/specgen/namespaces.go b/pkg/specgen/namespaces.go
+index e672bc65f..c7d443661 100644
+--- a/pkg/specgen/namespaces.go
++++ b/pkg/specgen/namespaces.go
+@@ -47,6 +47,9 @@ const (
+ 	// be used.
+ 	// Only used with the network namespace, invalid otherwise.
+ 	Slirp NamespaceMode = "slirp4netns"
++	// Pasta indicates that a pasta network stack should be used.
++	// Only used with the network namespace, invalid otherwise.
++	Pasta NamespaceMode = "pasta"
+ 	// KeepId indicates a user namespace to keep the owner uid inside
+ 	// of the namespace itself.
+ 	// Only used with the user namespace, invalid otherwise.
+@@ -135,7 +138,7 @@ func validateNetNS(n *Namespace) error {
+ 		return nil
+ 	}
+ 	switch n.NSMode {
+-	case Slirp:
++	case Slirp, Pasta:
+ 		break
+ 	case "", Default, Host, Path, FromContainer, FromPod, Private, NoNetwork, Bridge:
+ 		break
+@@ -167,7 +170,7 @@ func (n *Namespace) validate() error {
+ 	switch n.NSMode {
+ 	case "", Default, Host, Path, FromContainer, FromPod, Private:
+ 		// Valid, do nothing
+-	case NoNetwork, Bridge, Slirp:
++	case NoNetwork, Bridge, Slirp, Pasta:
+ 		return errors.Errorf("cannot use network modes with non-network namespace")
+ 	default:
+ 		return errors.Errorf("invalid namespace type %s specified", n.NSMode)
+@@ -281,6 +284,8 @@ func ParseNetworkNamespace(ns string, rootlessDefaultCNI bool) (Namespace, map[s
+ 	switch {
+ 	case ns == string(Slirp), strings.HasPrefix(ns, string(Slirp)+":"):
+ 		toReturn.NSMode = Slirp
++	case ns == string(Pasta), strings.HasPrefix(ns, string(Pasta)+":"):
++		toReturn.NSMode = Pasta
+ 	case ns == string(FromPod):
+ 		toReturn.NSMode = FromPod
+ 	case ns == "" || ns == string(Default) || ns == string(Private):
+@@ -349,6 +354,13 @@ func ParseNetworkFlag(networks []string) (Namespace, map[string]types.PerNetwork
+ 			networkOptions[parts[0]] = strings.Split(parts[1], ",")
+ 		}
+ 		toReturn.NSMode = Slirp
++	case ns == string(Pasta), strings.HasPrefix(ns, string(Pasta)+":"):
++		parts := strings.SplitN(ns, ":", 2)
++		if len(parts) > 1 {
++			networkOptions = make(map[string][]string)
++			networkOptions[parts[0]] = strings.Split(parts[1], ",")
++		}
++		toReturn.NSMode = Pasta
+ 	case ns == string(FromPod):
+ 		toReturn.NSMode = FromPod
+ 	case ns == "" || ns == string(Default) || ns == string(Private):
+@@ -425,7 +437,7 @@ func ParseNetworkFlag(networks []string) (Namespace, map[string]types.PerNetwork
+ 			if parts[0] == "" {
+ 				return toReturn, nil, nil, errors.Wrapf(define.ErrInvalidArg, "network name cannot be empty")
+ 			}
+-			if util.StringInSlice(parts[0], []string{string(Bridge), string(Slirp), string(FromPod), string(NoNetwork),
++			if util.StringInSlice(parts[0], []string{string(Bridge), string(Slirp), string(Pasta), string(FromPod), string(NoNetwork),
+ 				string(Default), string(Private), string(Path), string(FromContainer), string(Host)}) {
+ 				return toReturn, nil, nil, errors.Wrapf(define.ErrInvalidArg, "can only set extra network names, selected mode %s conflicts with bridge", parts[0])
+ 			}
+diff --git a/pkg/specgen/podspecgen.go b/pkg/specgen/podspecgen.go
+index 759caa0c0..f95bbffc7 100644
+--- a/pkg/specgen/podspecgen.go
++++ b/pkg/specgen/podspecgen.go
+@@ -93,7 +93,7 @@ type PodNetworkConfig struct {
+ 	// PortMappings is a set of ports to map into the infra container.
+ 	// As, by default, containers share their network with the infra
+ 	// container, this will forward the ports to the entire pod.
+-	// Only available if NetNS is set to Bridge or Slirp.
++	// Only available if NetNS is set to Bridge, Slirp, or Pasta.
+ 	// Optional.
+ 	PortMappings []types.PortMapping `json:"portmappings,omitempty"`
+ 	// Map of networks names to ids the container should join to.
+-- 
+2.28.0
+
-- 
@@ -0,0 +1,542 @@
+From bcfd618a316097e5d2e1a20703b11beeb21b6899 Mon Sep 17 00:00:00 2001
+From: Stefano Brivio <sbrivio(a)redhat.com>
+Date: Sat, 19 Feb 2022 04:54:09 +0100
+Subject: [PATCH] libpod: Add pasta networking mode
+
+Conceptually equivalent to networking by means of slirp4netns(1),
+with a few practical differences:
+
+- pasta(1) forks to background once networking is configured in the
+  namespace and quits on its own once the namespace is deleted:
+  file descriptor synchronisation and PID tracking are not needed
+
+- port forwarding is configured via command line options at start-up,
+  instead of an API socket: this is taken care of right away as we're
+  about to start pasta
+
+- there's no need for further selection of port forwarding modes:
+  pasta behaves similarly to containers-rootlessport for local binds
+  (splice() instead of read()/write() pairs, without L2-L4
+  translation), and keeps the original source address for non-local
+  connections like slirp4netns does
+
+- IPv6 is enabled by default and is not an experimental feature. It
+  can be disabled using additional options as documented
+
+- by default, addresses and routes are copied from the host, that is,
+  container users will see the same IP address and routes as if they
+  were in the init namespace context. The interface name is also
+  sourced from the host upstream interface with the first default
+  route in the routing table. This is also configurable as documented
+
+- by default, the host is reachable using the gateway address from
+  the container, unless the --no-map-gw option is passed
+
+- sandboxing and seccomp(2) policies cannot be disabled
+
+See https://passt.top for more details about pasta.
+
+Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>
+---
+SPDX-FileCopyrightText: 2021-2022 Red Hat GmbH <sbrivio(a)redhat.com>
+SPDX-License-Identifier: Apache-2.0
+
+ docs/source/markdown/podman-create.1.md     | 40 ++++++++++++-
+ docs/source/markdown/podman-pod-create.1.md | 33 +++++++++++
+ docs/source/markdown/podman-run.1.md        | 38 +++++++++++-
+ docs/source/markdown/podman.1.md            |  6 +-
+ libpod/networking_linux.go                  |  6 +-
+ libpod/networking_pasta.go                  | 64 +++++++++++++++++++++
+ pkg/namespaces/namespaces.go                |  6 ++
+ pkg/specgen/generate/namespaces.go          | 10 ++++
+ pkg/specgen/generate/pod_create.go          |  6 ++
+ pkg/specgen/namespaces.go                   | 18 +++++-
+ pkg/specgen/podspecgen.go                   |  2 +-
+ 11 files changed, 215 insertions(+), 14 deletions(-)
+ create mode 100644 libpod/networking_pasta.go
+
+diff --git a/docs/source/markdown/podman-create.1.md b/docs/source/markdown/podman-create.1.md
+index 2a0f3b738..5cc03bff3 100644
+--- a/docs/source/markdown/podman-create.1.md
++++ b/docs/source/markdown/podman-create.1.md
+@@ -699,12 +699,19 @@ Valid _mode_ values are:
+   - **interface_name**: Specify a name for the created network interface inside the container.
+ 
+   For example to set a static ipv4 address and a static mac address, use `--network bridge:ip=10.88.0.10,mac=44:33:22:11:00:99`.
++
+ - \<network name or ID\>[:OPTIONS,...]: Connect to a user-defined network; this is the network name or ID from a network created by **[podman network create](podman-network-create.1.md)**. Using the network name implies the bridge network mode. It is possible to specify the same options described under the bridge mode above. You can use the **--network** option multiple times to specify additional networks.
++
+ - **none**: Create a network namespace for the container but do not configure network interfaces for it, thus the container has no network connectivity.
++
+ - **container:**_id_: Reuse another container's network stack.
++
+ - **host**: Do not create a network namespace, the container will use the host's network. Note: The host mode gives the container full access to local system services such as D-bus and is therefore considered insecure.
++
+ - **ns:**_path_: Path to a network namespace to join.
++
+ - **private**: Create a new namespace for the container. This will use the **bridge** mode for rootfull containers and **slirp4netns** for rootless ones.
++
+ - **slirp4netns[:OPTIONS,...]**: use **slirp4netns**(1) to create a user network stack. This is the default for rootless containers. It is possible to specify these additional options:
+   - **allow_host_loopback=true|false**: Allow the slirp4netns to reach the host loopback IP (`10.0.2.2`, which is added to `/etc/hosts` as `host.containers.internal` for your convenience). Default is false.
+   - **mtu=MTU**: Specify the MTU to use for this network. (Default is `65520`).
+@@ -718,6 +725,30 @@ Valid _mode_ values are:
+   Note: Rootlesskit changes the source IP address of incoming packets to an IP address in the container network namespace, usually `10.0.2.100`. If your application requires the real source IP address, e.g. web server logs, use the slirp4netns port handler. The rootlesskit port handler is also used for rootless containers when connected to user-defined networks.
+   - **port_handler=slirp4netns**: Use the slirp4netns port forwarding, it is slower than rootlesskit but preserves the correct source IP address. This port handler cannot be used for user-defined networks.
+ 
++- **pasta[:OPTIONS,...]**: use **pasta**(1) to create a user-mode networking
++stack. By default, IPv4 and IPv6 addresses and routes, as well as the pod
++interface name, are copied from the host. If port forwarding isn't configured,
++ports will be forwarded dynamically as services are bound on either side (init
++namespace or container namespace). Port forwarding preserves the original source
++IP address. Options described in pasta(1) can be specified as comma-separated
++arguments. In terms of pasta(1) options, only **--config-net** is given by
++default, in order to configure networking when the container is started. Some
++examples:
++  - **pasta:--no-map-gw**: Don't allow the container to directly reach the host
++    using the gateway address, which would normally be mapped to a loopback or
++    link-local address.
++  - **pasta:--mtu,1500**: Specify a 1500 bytes MTU for the _tap_ interface in
++    the container.
++  - **pasta:--ipv4-only,-a,10.0.2.0,-n,24,-g,10.0.2.2,--dns-forward,10.0.2.3,-m,1500,--no-ndp,--no-dhcpv6,--no-dhcp**,
++    equivalent to default slirp4netns(1) options: disable IPv6, assign
++    `10.0.2.0/24` to the `tap0` interface in the container, with gateway
++    `10.0.2.2`, enable DNS forwarder reachable at `10.0.2.3`, set MTU to 1500
++    bytes, disable NDP, DHCPv6 and DHCP support.
++  - **pasta:--no-map-gw,-I,tap0,--ipv4-only,-a,10.0.2.0,-n,24,-g,10.0.2.2,--dns-forward,10.0.2.3,--no-ndp,--no-dhcpv6,--no-dhcp**,
++    equivalent to default slirp4netns(1) options with Podman overrides: same as
++    above, but leave the MTU at 65520 bytes, and don't map the gateway address
++    from the container to a local address.
++
+ #### **--network-alias**=*alias*
+ 
+ Add a network-scoped alias for the container, setting the alias for all networks that the container joins. To set a name only for a specific network, use the alias option as described under the **--network** option.
+@@ -1551,8 +1582,9 @@ In order for users to run rootless, there must be an entry for their username in
+ 
+ Rootless Podman works better if the fuse-overlayfs and slirp4netns packages are installed.
+ The fuse-overlayfs package provides a userspace overlay storage driver, otherwise users need to use
+-the vfs storage driver, which is diskspace expensive and does not perform well. slirp4netns is
+-required for VPN, without it containers need to be run with the --network=host flag.
++the vfs storage driver, which is diskspace expensive and does not perform well.
++slirp4netns or pasta is required for VPN; without either, containers need to be
++run with the --network=host flag.
+ 
+ ## ENVIRONMENT
+ 
+@@ -1601,7 +1633,9 @@ page.
+ NOTE: Use the environment variable `TMPDIR` to change the temporary storage location of downloaded container images. Podman defaults to use `/var/tmp`.
+ 
+ ## SEE ALSO
+-**[podman(1)](podman.1.md)**, **[podman-save(1)](podman-save.1.md)**, **[podman-ps(1)](podman-ps.1.md)**, **[podman-attach(1)](podman-attach.1.md)**, **[podman-pod-create(1)](podman-pod-create.1.md)**, **[podman-port(1)](podman-port.1.md)**, **[podman-start(1)](podman-start.1.md)**, **[podman-kill(1)](podman-kill.1.md)**, **[podman-stop(1)](podman-stop.1.md)**, **[podman-generate-systemd(1)](podman-generate-systemd.1.md)**, **[podman-rm(1)](podman-rm.1.md)**, **[subgid(5)](https://www.unix.com/man-page/linux/5/subgid)**, **[subuid(5)](https://www.unix.com/man-page/linux/5/subuid)**, **[containers.conf(5)](https://github.com/containers/common/blob/main/docs/containers.conf.5.md)**, **[systemd.unit(5)](https://www.freedesktop.org/software/systemd/man/systemd.unit.html)**, **[setsebool(8)](https://man7.org/linux/man-pages/man8/setsebool.8.html)**, **[slirp4netns(1)](https://github.com/rootless-containers/slirp4netns/blob/master/slirp4netns.1.md)**, **[fuse-overlayfs(1)](https://github.com/containers/fuse-overlayfs/blob/main/fuse-overlayfs.1.md)**, **proc(5)**, **[conmon(8)](https://github.com/containers/conmon/blob/main/docs/conmon.8.md)**, **personality(2)**
++**[podman(1)](podman.1.md)**, **[podman-save(1)](podman-save.1.md)**, **[podman-ps(1)](podman-ps.1.md)**, **[podman-attach(1)](podman-attach.1.md)**, **[podman-pod-create(1)](podman-pod-create.1.md)**, **[podman-port(1)](podman-port.1.md)**, **[podman-start(1)](podman-start.1.md)**, **[podman-kill(1)](podman-kill.1.md)**, **[podman-stop(1)](podman-stop.1.md)**, **[podman-generate-systemd(1)](podman-generate-systemd.1.md)**, **[podman-rm(1)](podman-rm.1.md)**, **[subgid(5)](https://www.unix.com/man-page/linux/5/subgid)**, **[subuid(5)](https://www.unix.com/man-page/linux/5/subuid)**, **[containers.conf(5)](https://github.com/containers/common/blob/main/docs/containers.conf.5.md)**, **[systemd.unit(5)](https://www.freedesktop.org/software/systemd/man/systemd.unit.html)**, **[setsebool(8)](https://man7.org/linux/man-pages/man8/setsebool.8.html)**, **[slirp4netns(1)](https://github.com/rootless-containers/slirp4netns/blob/master/slirp4netns.1.md)**,
++**[pasta(1)](https://passt.top/builds/latest/web/passt.1.html)**,
++**[fuse-overlayfs(1)](https://github.com/containers/fuse-overlayfs/blob/main/fuse-overlayfs.1.md)**, **proc(5)**, **[conmon(8)](https://github.com/containers/conmon/blob/main/docs/conmon.8.md)**, **personality(2)**
+ 
+ ## HISTORY
+ October 2017, converted from Docker documentation to Podman by Dan Walsh for Podman `<dwalsh(a)redhat.com>`
+diff --git a/docs/source/markdown/podman-pod-create.1.md b/docs/source/markdown/podman-pod-create.1.md
+index 8088e1d62..c94ac6061 100644
+--- a/docs/source/markdown/podman-pod-create.1.md
++++ b/docs/source/markdown/podman-pod-create.1.md
+@@ -175,12 +175,19 @@ Valid _mode_ values are:
+   - **interface_name**: Specify a name for the created network interface inside the container.
+ 
+   For example to set a static ipv4 address and a static mac address, use `--network bridge:ip=10.88.0.10,mac=44:33:22:11:00:99`.
++
+ - \<network name or ID\>[:OPTIONS,...]: Connect to a user-defined network; this is the network name or ID from a network created by **[podman network create](podman-network-create.1.md)**. Using the network name implies the bridge network mode. It is possible to specify the same options described under the bridge mode above. You can use the **--network** option multiple times to specify additional networks.
++
+ - **none**: Create a network namespace for the container but do not configure network interfaces for it, thus the container has no network connectivity.
++
+ - **container:**_id_: Reuse another container's network stack.
++
+ - **host**: Do not create a network namespace, the container will use the host's network. Note: The host mode gives the container full access to local system services such as D-bus and is therefore considered insecure.
++
+ - **ns:**_path_: Path to a network namespace to join.
++
+ - **private**: Create a new namespace for the container. This will use the **bridge** mode for rootfull containers and **slirp4netns** for rootless ones.
++
+ - **slirp4netns[:OPTIONS,...]**: use **slirp4netns**(1) to create a user network stack. This is the default for rootless containers. It is possible to specify these additional options:
+   - **allow_host_loopback=true|false**: Allow the slirp4netns to reach the host loopback IP (`10.0.2.2`, which is added to `/etc/hosts` as `host.containers.internal` for your convenience). Default is false.
+   - **mtu=MTU**: Specify the MTU to use for this network. (Default is `65520`).
+@@ -194,6 +201,30 @@ Valid _mode_ values are:
+   Note: Rootlesskit changes the source IP address of incoming packets to an IP address in the container network namespace, usually `10.0.2.100`. If your application requires the real source IP address, e.g. web server logs, use the slirp4netns port handler. The rootlesskit port handler is also used for rootless containers when connected to user-defined networks.
+   - **port_handler=slirp4netns**: Use the slirp4netns port forwarding, it is slower than rootlesskit but preserves the correct source IP address. This port handler cannot be used for user-defined networks.
+ 
++- **pasta[:OPTIONS,...]**: use **pasta**(1) to create a user-mode networking
++stack. By default, IPv4 and IPv6 addresses and routes, as well as the pod
++interface name, are copied from the host. If port forwarding isn't configured,
++ports will be forwarded dynamically as services are bound on either side (init
++namespace or container namespace). Port forwarding preserves the original source
++IP address. Options described in pasta(1) can be specified as comma-separated
++arguments. In terms of pasta(1) options, only **--config-net** is given by
++default, in order to configure networking when the container is started. Some
++examples:
++  - **pasta:--no-map-gw**: Don't allow the container to directly reach the host
++    using the gateway address, which would normally be mapped to a loopback or
++    link-local address.
++  - **pasta:--mtu,1500**: Specify a 1500 bytes MTU for the _tap_ interface in
++    the container.
++  - **pasta:--ipv4-only,-a,10.0.2.0,-n,24,-g,10.0.2.2,--dns-forward,10.0.2.3,-m,1500,--no-ndp,--no-dhcpv6,--no-dhcp**,
++    equivalent to default slirp4netns(1) options: disable IPv6, assign
++    `10.0.2.0/24` to the `tap0` interface in the container, with gateway
++    `10.0.2.2`, enable DNS forwarder reachable at `10.0.2.3`, set MTU to 1500
++    bytes, disable NDP, DHCPv6 and DHCP support.
++  - **pasta:--no-map-gw,-I,tap0,--ipv4-only,-a,10.0.2.0,-n,24,-g,10.0.2.2,--dns-forward,10.0.2.3,--no-ndp,--no-dhcpv6,--no-dhcp**,
++    equivalent to default slirp4netns(1) options with Podman overrides: same as
++    above, but leave the MTU at 65520 bytes, and don't map the gateway address
++    from the container to a local address.
++
+ #### **--network-alias**=*alias*
+ 
+ Add a network-scoped alias for the pod, setting the alias for all networks that the pod joins. To set a name only for a specific network, use the alias option as described under the **--network** option.
+@@ -527,6 +558,8 @@ $ podman pod create --network slirp4netns:outbound_addr=127.0.0.1,allow_host_loo
+ 
+ $ podman pod create --network slirp4netns:cidr=192.168.0.0/24
+ 
++$ podman pod create --network pasta
++
+ $ podman pod create --network net1:ip=10.89.1.5 --network net2:ip=10.89.10.10
+ ```
+ 
+diff --git a/docs/source/markdown/podman-run.1.md b/docs/source/markdown/podman-run.1.md
+index 239cf3b83..7c12f5e88 100644
+--- a/docs/source/markdown/podman-run.1.md
++++ b/docs/source/markdown/podman-run.1.md
+@@ -726,12 +726,19 @@ Valid _mode_ values are:
+   - **interface_name**: Specify a name for the created network interface inside the container.
+ 
+   For example to set a static ipv4 address and a static mac address, use `--network bridge:ip=10.88.0.10,mac=44:33:22:11:00:99`.
++
+ - \<network name or ID\>[:OPTIONS,...]: Connect to a user-defined network; this is the network name or ID from a network created by **[podman network create](podman-network-create.1.md)**. Using the network name implies the bridge network mode. It is possible to specify the same options described under the bridge mode above. You can use the **--network** option multiple times to specify additional networks.
++
+ - **none**: Create a network namespace for the container but do not configure network interfaces for it, thus the container has no network connectivity.
++
+ - **container:**_id_: Reuse another container's network stack.
++
+ - **host**: Do not create a network namespace, the container will use the host's network. Note: The host mode gives the container full access to local system services such as D-bus and is therefore considered insecure.
++
+ - **ns:**_path_: Path to a network namespace to join.
++
+ - **private**: Create a new namespace for the container. This will use the **bridge** mode for rootfull containers and **slirp4netns** for rootless ones.
++
+ - **slirp4netns[:OPTIONS,...]**: use **slirp4netns**(1) to create a user network stack. This is the default for rootless containers. It is possible to specify these additional options:
+   - **allow_host_loopback=true|false**: Allow the slirp4netns to reach the host loopback IP (`10.0.2.2`, which is added to `/etc/hosts` as `host.containers.internal` for your convenience). Default is false.
+   - **mtu=MTU**: Specify the MTU to use for this network. (Default is `65520`).
+@@ -745,6 +752,30 @@ Valid _mode_ values are:
+   Note: Rootlesskit changes the source IP address of incoming packets to an IP address in the container network namespace, usually `10.0.2.100`. If your application requires the real source IP address, e.g. web server logs, use the slirp4netns port handler. The rootlesskit port handler is also used for rootless containers when connected to user-defined networks.
+   - **port_handler=slirp4netns**: Use the slirp4netns port forwarding, it is slower than rootlesskit but preserves the correct source IP address. This port handler cannot be used for user-defined networks.
+ 
++- **pasta[:OPTIONS,...]**: use **pasta**(1) to create a user-mode networking
++stack. By default, IPv4 and IPv6 addresses and routes, as well as the pod
++interface name, are copied from the host. If port forwarding isn't configured,
++ports will be forwarded dynamically as services are bound on either side (init
++namespace or container namespace). Port forwarding preserves the original source
++IP address. Options described in pasta(1) can be specified as comma-separated
++arguments. In terms of pasta(1) options, only **--config-net** is given by
++default, in order to configure networking when the container is started. Some
++examples:
++  - **pasta:--no-map-gw**: Don't allow the container to directly reach the host
++    using the gateway address, which would normally be mapped to a loopback or
++    link-local address.
++  - **pasta:--mtu,1500**: Specify a 1500 bytes MTU for the _tap_ interface in
++    the container.
++  - **pasta:--ipv4-only,-a,10.0.2.0,-n,24,-g,10.0.2.2,--dns-forward,10.0.2.3,-m,1500,--no-ndp,--no-dhcpv6,--no-dhcp**,
++    equivalent to default slirp4netns(1) options: disable IPv6, assign
++    `10.0.2.0/24` to the `tap0` interface in the container, with gateway
++    `10.0.2.2`, enable DNS forwarder reachable at `10.0.2.3`, set MTU to 1500
++    bytes, disable NDP, DHCPv6 and DHCP support.
++  - **pasta:--no-map-gw,-I,tap0,--ipv4-only,-a,10.0.2.0,-n,24,-g,10.0.2.2,--dns-forward,10.0.2.3,--no-ndp,--no-dhcpv6,--no-dhcp**,
++    equivalent to default slirp4netns(1) options with Podman overrides: same as
++    above, but leave the MTU at 65520 bytes, and don't map the gateway address
++    from the container to a local address.
++
+ #### **--network-alias**=*alias*
+ 
+ Add a network-scoped alias for the container, setting the alias for all networks that the container joins. To set a name only for a specific network, use the alias option as described under the **--network** option.
+@@ -1935,8 +1966,9 @@ In order for users to run rootless, there must be an entry for their username in
+ 
+ Rootless Podman works better if the fuse-overlayfs and slirp4netns packages are installed.
+ The **fuse-overlayfs** package provides a userspace overlay storage driver, otherwise users need to use
+-the **vfs** storage driver, which is diskspace expensive and does not perform well. slirp4netns is
+-required for VPN, without it containers need to be run with the **--network=host** flag.
++the **vfs** storage driver, which is diskspace expensive and does not perform
++well. slirp4netns or pasta is required for VPN; without either, containers need
++to be run with the **--network=host** flag.
+ 
+ ## ENVIRONMENT
+ 
+@@ -1983,7 +2015,7 @@ page.
+ NOTE: Use the environment variable `TMPDIR` to change the temporary storage location of downloaded container images. Podman defaults to use `/var/tmp`.
+ 
+ ## SEE ALSO
+-**[podman(1)](podman.1.md)**, **[podman-save(1)](podman-save.1.md)**, **[podman-ps(1)](podman-ps.1.md)**, **[podman-attach(1)](podman-attach.1.md)**, **[podman-pod-create(1)](podman-pod-create.1.md)**, **[podman-port(1)](podman-port.1.md)**, **[podman-start(1)](podman-start.1.md)**, **[podman-kill(1)](podman-kill.1.md)**, **[podman-stop(1)](podman-stop.1.md)**, **[podman-generate-systemd(1)](podman-generate-systemd.1.md)**, **[podman-rm(1)](podman-rm.1.md)**, **[subgid(5)](https://www.unix.com/man-page/linux/5/subgid)**, **[subuid(5)](https://www.unix.com/man-page/linux/5/subuid)**, **[containers.conf(5)](https://github.com/containers/common/blob/main/docs/containers.conf.5.md)**, **[systemd.unit(5)](https://www.freedesktop.org/software/systemd/man/systemd.unit.html)**, **[setsebool(8)](https://man7.org/linux/man-pages/man8/setsebool.8.html)**, **[slirp4netns(1)](https://github.com/rootless-containers/slirp4netns/blob/master/slirp4netns.1.md)**, **[fuse-overlayfs(1)](https://github.com/containers/fuse-overlayfs/blob/main/fuse-overlayfs.1.md)**, **proc(5)**, **[conmon(8)](https://github.com/containers/conmon/blob/main/docs/conmon.8.md)**, **personality(2)**
++**[podman(1)](podman.1.md)**, **[podman-save(1)](podman-save.1.md)**, **[podman-ps(1)](podman-ps.1.md)**, **[podman-attach(1)](podman-attach.1.md)**, **[podman-pod-create(1)](podman-pod-create.1.md)**, **[podman-port(1)](podman-port.1.md)**, **[podman-start(1)](podman-start.1.md)**, **[podman-kill(1)](podman-kill.1.md)**, **[podman-stop(1)](podman-stop.1.md)**, **[podman-generate-systemd(1)](podman-generate-systemd.1.md)**, **[podman-rm(1)](podman-rm.1.md)**, **[subgid(5)](https://www.unix.com/man-page/linux/5/subgid)**, **[subuid(5)](https://www.unix.com/man-page/linux/5/subuid)**, **[containers.conf(5)](https://github.com/containers/common/blob/main/docs/containers.conf.5.md)**, **[systemd.unit(5)](https://www.freedesktop.org/software/systemd/man/systemd.unit.html)**, **[setsebool(8)](https://man7.org/linux/man-pages/man8/setsebool.8.html)**, **[slirp4netns(1)](https://github.com/rootless-containers/slirp4netns/blob/master/slirp4netns.1.md)**, **[pasta(1)](https://passt.top/builds/latest/web/passt.1.html)**, **[fuse-overlayfs(1)](https://github.com/containers/fuse-overlayfs/blob/main/fuse-overlayfs.1.md)**, **proc(5)**, **[conmon(8)](https://github.com/containers/conmon/blob/main/docs/conmon.8.md)**, **personality(2)**
+ 
+ ## HISTORY
+ September 2018, updated by Kunal Kushwaha `<kushwaha_kunal_v7(a)lab.ntt.co.jp>`
+diff --git a/docs/source/markdown/podman.1.md b/docs/source/markdown/podman.1.md
+index b318001e4..1ad808cba 100644
+--- a/docs/source/markdown/podman.1.md
++++ b/docs/source/markdown/podman.1.md
+@@ -95,7 +95,7 @@ Set libpod namespace. Namespaces are used to separate groups of containers and p
+ When namespace is set, created containers and pods will join the given namespace, and only containers and pods in the given namespace will be visible to Podman.
+ 
+ #### **--network-cmd-path**=*path*
+-Path to the command binary to use for setting up a network.  It is currently only used for setting up a slirp4netns network.  If "" is used then the binary is looked up using the $PATH environment variable.
++Path to the command binary to use for setting up a network.  It is currently only used for setting up a slirp4netns(1) or pasta(1) network.  If "" is used then the binary is looked up using the $PATH environment variable.
+ 
+ #### **--noout**
+ 
+@@ -409,7 +409,7 @@ See the `subuid(5)` and `subgid(5)` man pages for more information.
+ 
+ Images are pulled under `XDG_DATA_HOME` when specified, otherwise in the home directory of the user under `.local/share/containers/storage`.
+ 
+-Currently the slirp4netns package is required to be installed to create a network device, otherwise rootless containers need to run in the network namespace of the host.
++Currently either slirp4netns or pasta is required to be installed to create a network device, otherwise rootless containers need to run in the network namespace of the host.
+ 
+ In certain environments like HPC (High Performance Computing), users cannot take advantage of the additional UIDs and GIDs from the /etc/subuid and /etc/subgid systems.  However, in this environment, rootless Podman can operate with a single UID.  To make this work, set the `ignore_chown_errors` option in the /etc/containers/storage.conf or in ~/.config/containers/storage.conf files. This option tells Podman when pulling an image to ignore chown errors when attempting to change a file in a container image to match the non-root UID in the image. This means all files get saved as the user's UID. Note this could cause issues when running the container.
+ 
+@@ -422,7 +422,7 @@ The Network File System (NFS) and other distributed file systems (for example: L
+ For more information, please refer to the [Podman Troubleshooting Page](https://github.com/containers/podman/blob/main/troubleshooting.md).
+ 
+ ## SEE ALSO
+-**[containers-mounts.conf(5)](https://github.com/containers/common/blob/main/docs/containers-mounts.conf.5.md)**, **[containers.conf(5)](https://github.com/containers/common/blob/main/docs/containers.conf.5.md)**, **[containers-registries.conf(5)](https://github.com/containers/image/blob/main/docs/containers-registries.conf.5.md)**, **[containers-storage.conf(5)](https://github.com/containers/storage/blob/main/docs/containers-storage.conf.5.md)**, **[buildah(1)](https://github.com/containers/buildah/blob/main/docs/buildah.1.md)**, **oci-hooks(5)**, **[containers-policy.json(5)](https://github.com/containers/image/blob/main/docs/containers-policy.json.5.md)**, **[crun(1)](https://github.com/containers/crun/blob/main/crun.1.md)**, **[runc(8)](https://github.com/opencontainers/runc/blob/master/man/runc.8.md)**, **[subuid(5)](https://www.unix.com/man-page/linux/5/subuid)**, **[subgid(5)](https://www.unix.com/man-page/linux/5/subgid)**, **[slirp4netns(1)](https://github.com/rootless-containers/slirp4netns/blob/master/slirp4netns.1.md)**, **[conmon(8)](https://github.com/containers/conmon/blob/main/docs/conmon.8.md)**
++**[containers-mounts.conf(5)](https://github.com/containers/common/blob/main/docs/containers-mounts.conf.5.md)**, **[containers.conf(5)](https://github.com/containers/common/blob/main/docs/containers.conf.5.md)**, **[containers-registries.conf(5)](https://github.com/containers/image/blob/main/docs/containers-registries.conf.5.md)**, **[containers-storage.conf(5)](https://github.com/containers/storage/blob/main/docs/containers-storage.conf.5.md)**, **[buildah(1)](https://github.com/containers/buildah/blob/main/docs/buildah.1.md)**, **oci-hooks(5)**, **[containers-policy.json(5)](https://github.com/containers/image/blob/main/docs/containers-policy.json.5.md)**, **[crun(1)](https://github.com/containers/crun/blob/main/crun.1.md)**, **[runc(8)](https://github.com/opencontainers/runc/blob/master/man/runc.8.md)**, **[subuid(5)](https://www.unix.com/man-page/linux/5/subuid)**, **[subgid(5)](https://www.unix.com/man-page/linux/5/subgid)**, **[slirp4netns(1)](https://github.com/rootless-containers/slirp4netns/blob/master/slirp4netns.1.md)**, **[pasta(1)](https://passt.top/builds/latest/web/passt.1.html)**, **[conmon(8)](https://github.com/containers/conmon/blob/main/docs/conmon.8.md)**
+ 
+ ## HISTORY
+ Dec 2016, Originally compiled by Dan Walsh <dwalsh(a)redhat.com>
+diff --git a/libpod/networking_linux.go b/libpod/networking_linux.go
+index 19d5c7f76..183f815ba 100644
+--- a/libpod/networking_linux.go
++++ b/libpod/networking_linux.go
+@@ -636,6 +636,9 @@ func (r *Runtime) configureNetNS(ctr *Container, ctrNS ns.NetNS) (status map[str
+ 	if ctr.config.NetMode.IsSlirp4netns() {
+ 		return nil, r.setupSlirp4netns(ctr, ctrNS)
+ 	}
++	if ctr.config.NetMode.IsPasta() {
++		return nil, r.setupPasta(ctr, ctrNS)
++	}
+ 	networks, err := ctr.networks()
+ 	if err != nil {
+ 		return nil, err
+@@ -806,7 +809,8 @@ func (r *Runtime) teardownCNI(ctr *Container) error {
+ 		return err
+ 	}
+ 
+-	if !ctr.config.NetMode.IsSlirp4netns() && len(networks) > 0 {
++	if !ctr.config.NetMode.IsSlirp4netns() &&
++		!ctr.config.NetMode.IsPasta() && len(networks) > 0 {
+ 		netOpts, err := ctr.getNetworkOptions(networks)
+ 		if err != nil {
+ 			return err
+diff --git a/libpod/networking_pasta.go b/libpod/networking_pasta.go
+new file mode 100644
+index 000000000..71595c87c
+--- /dev/null
++++ b/libpod/networking_pasta.go
+@@ -0,0 +1,64 @@
++// SPDX-License-Identifier: Apache-2.0
++//
++// networking_pasta.go - Start pasta(1) to provide connectivity to the container
++//
++// Copyright (c) 2022 Red Hat GmbH
++// Author: Stefano Brivio <sbrivio(a)redhat.com>
++//
++// +build linux
++
++package libpod
++
++import (
++	"fmt"
++	"os/exec"
++	"strings"
++
++	"github.com/containernetworking/plugins/pkg/ns"
++	"github.com/pkg/errors"
++	"github.com/sirupsen/logrus"
++)
++
++func (r *Runtime) setupPasta(ctr *Container, netns ns.NetNS) error {
++	path := r.config.Engine.NetworkCmdPath
++	if path == "" {
++		var err error
++		path, err = exec.LookPath("pasta")
++		if err != nil {
++			logrus.Errorf("Could not find pasta, the network namespace won't be configured: %v", err)
++			return nil
++		}
++	}
++
++	cmdArgs := []string{}
++	cmdArgs = append(cmdArgs, "--config-net")
++
++	for _, i := range ctr.convertPortMappings() {
++		if i.Protocol == "tcp" {
++			cmdArgs = append(cmdArgs, "-t")
++		} else if i.Protocol == "udp" {
++			cmdArgs = append(cmdArgs, "-u")
++		} else {
++			logrus.Errorf("can't forward protocol: %s", i.Protocol)
++			return nil
++		}
++
++		arg := fmt.Sprintf("%d:%d", i.HostPort, i.ContainerPort)
++		cmdArgs = append(cmdArgs, arg)
++	}
++
++	cmdArgs = append(cmdArgs, ctr.config.NetworkOptions["pasta"]...)
++
++	cmdArgs = append(cmdArgs, netns.Path())
++
++	logrus.Debugf("pasta arguments: %s", strings.Join(cmdArgs, " "))
++
++	// pasta forks once ready, and quits once we delete the target namespace
++	_, err := exec.Command(path, cmdArgs...).Output()
++	if exitErr, ok := err.(*exec.ExitError); ok {
++		return errors.Wrapf(err, "failed to start pasta: %s", exitErr.Stderr)
++	} else if err != nil {
++		return errors.Wrap(err, "failed to start pasta")
++	}
++	return nil
++}
+diff --git a/pkg/namespaces/namespaces.go b/pkg/namespaces/namespaces.go
+index a7736aee0..0b2cb2b0b 100644
+--- a/pkg/namespaces/namespaces.go
++++ b/pkg/namespaces/namespaces.go
+@@ -19,6 +19,7 @@ const (
+ 	privateType   = "private"
+ 	shareableType = "shareable"
+ 	slirpType     = "slirp4netns"
++	pastaType     = "pasta"
+ )
+ 
+ // CgroupMode represents cgroup mode in the container.
+@@ -388,6 +389,11 @@ func (n NetworkMode) IsSlirp4netns() bool {
+ 	return n == slirpType || strings.HasPrefix(string(n), slirpType+":")
+ }
+ 
++// IsPasta indicates if we are running a rootless network stack using pasta
++func (n NetworkMode) IsPasta() bool {
++	return n == pastaType || strings.HasPrefix(string(n), pastaType+":")
++}
++
+ // IsNS indicates a network namespace passed in by path (ns:<path>)
+ func (n NetworkMode) IsNS() bool {
+ 	return strings.HasPrefix(string(n), nsType)
+diff --git a/pkg/specgen/generate/namespaces.go b/pkg/specgen/generate/namespaces.go
+index 3f77cbe76..a72be1731 100644
+--- a/pkg/specgen/generate/namespaces.go
++++ b/pkg/specgen/generate/namespaces.go
+@@ -258,6 +258,16 @@ func namespaceOptions(ctx context.Context, s *specgen.SpecGenerator, rt *libpod.
+ 			val = fmt.Sprintf("slirp4netns:%s", s.NetNS.Value)
+ 		}
+ 		toReturn = append(toReturn, libpod.WithNetNS(portMappings, expose, postConfigureNetNS, val, nil))
++	case specgen.Pasta:
++		portMappings, expose, err := createPortMappings(ctx, s, imageData)
++		if err != nil {
++			return nil, err
++		}
++		val := "pasta"
++		if s.NetNS.Value != "" {
++			val = fmt.Sprintf("pasta:%s", s.NetNS.Value)
++		}
++		toReturn = append(toReturn, libpod.WithNetNS(portMappings, expose, postConfigureNetNS, val, nil))
+ 	case specgen.Private:
+ 		fallthrough
+ 	case specgen.Bridge:
+diff --git a/pkg/specgen/generate/pod_create.go b/pkg/specgen/generate/pod_create.go
+index 68fda3ad7..0d64027a3 100644
+--- a/pkg/specgen/generate/pod_create.go
++++ b/pkg/specgen/generate/pod_create.go
+@@ -232,6 +232,12 @@ func MapSpec(p *specgen.PodSpecGenerator) (*specgen.SpecGenerator, error) {
+ 			p.InfraContainerSpec.NetworkOptions = p.NetworkOptions
+ 			p.InfraContainerSpec.NetNS.NSMode = specgen.NamespaceMode("slirp4netns")
+ 		}
++	case specgen.Pasta:
++		logrus.Debugf("Pod will use pasta")
++		if p.InfraContainerSpec.NetNS.NSMode != "host" {
++			p.InfraContainerSpec.NetworkOptions = p.NetworkOptions
++			p.InfraContainerSpec.NetNS.NSMode = specgen.NamespaceMode("pasta")
++		}
+ 	case specgen.NoNetwork:
+ 		logrus.Debugf("Pod will not use networking")
+ 		if len(p.InfraContainerSpec.PortMappings) > 0 ||
+diff --git a/pkg/specgen/namespaces.go b/pkg/specgen/namespaces.go
+index e672bc65f..c7d443661 100644
+--- a/pkg/specgen/namespaces.go
++++ b/pkg/specgen/namespaces.go
+@@ -47,6 +47,9 @@ const (
+ 	// be used.
+ 	// Only used with the network namespace, invalid otherwise.
+ 	Slirp NamespaceMode = "slirp4netns"
++	// Pasta indicates that a pasta network stack should be used.
++	// Only used with the network namespace, invalid otherwise.
++	Pasta NamespaceMode = "pasta"
+ 	// KeepId indicates a user namespace to keep the owner uid inside
+ 	// of the namespace itself.
+ 	// Only used with the user namespace, invalid otherwise.
+@@ -135,7 +138,7 @@ func validateNetNS(n *Namespace) error {
+ 		return nil
+ 	}
+ 	switch n.NSMode {
+-	case Slirp:
++	case Slirp, Pasta:
+ 		break
+ 	case "", Default, Host, Path, FromContainer, FromPod, Private, NoNetwork, Bridge:
+ 		break
+@@ -167,7 +170,7 @@ func (n *Namespace) validate() error {
+ 	switch n.NSMode {
+ 	case "", Default, Host, Path, FromContainer, FromPod, Private:
+ 		// Valid, do nothing
+-	case NoNetwork, Bridge, Slirp:
++	case NoNetwork, Bridge, Slirp, Pasta:
+ 		return errors.Errorf("cannot use network modes with non-network namespace")
+ 	default:
+ 		return errors.Errorf("invalid namespace type %s specified", n.NSMode)
+@@ -281,6 +284,8 @@ func ParseNetworkNamespace(ns string, rootlessDefaultCNI bool) (Namespace, map[s
+ 	switch {
+ 	case ns == string(Slirp), strings.HasPrefix(ns, string(Slirp)+":"):
+ 		toReturn.NSMode = Slirp
++	case ns == string(Pasta), strings.HasPrefix(ns, string(Pasta)+":"):
++		toReturn.NSMode = Pasta
+ 	case ns == string(FromPod):
+ 		toReturn.NSMode = FromPod
+ 	case ns == "" || ns == string(Default) || ns == string(Private):
+@@ -349,6 +354,13 @@ func ParseNetworkFlag(networks []string) (Namespace, map[string]types.PerNetwork
+ 			networkOptions[parts[0]] = strings.Split(parts[1], ",")
+ 		}
+ 		toReturn.NSMode = Slirp
++	case ns == string(Pasta), strings.HasPrefix(ns, string(Pasta)+":"):
++		parts := strings.SplitN(ns, ":", 2)
++		if len(parts) > 1 {
++			networkOptions = make(map[string][]string)
++			networkOptions[parts[0]] = strings.Split(parts[1], ",")
++		}
++		toReturn.NSMode = Pasta
+ 	case ns == string(FromPod):
+ 		toReturn.NSMode = FromPod
+ 	case ns == "" || ns == string(Default) || ns == string(Private):
+@@ -425,7 +437,7 @@ func ParseNetworkFlag(networks []string) (Namespace, map[string]types.PerNetwork
+ 			if parts[0] == "" {
+ 				return toReturn, nil, nil, errors.Wrapf(define.ErrInvalidArg, "network name cannot be empty")
+ 			}
+-			if util.StringInSlice(parts[0], []string{string(Bridge), string(Slirp), string(FromPod), string(NoNetwork),
++			if util.StringInSlice(parts[0], []string{string(Bridge), string(Slirp), string(Pasta), string(FromPod), string(NoNetwork),
+ 				string(Default), string(Private), string(Path), string(FromContainer), string(Host)}) {
+ 				return toReturn, nil, nil, errors.Wrapf(define.ErrInvalidArg, "can only set extra network names, selected mode %s conflicts with bridge", parts[0])
+ 			}
+diff --git a/pkg/specgen/podspecgen.go b/pkg/specgen/podspecgen.go
+index 759caa0c0..f95bbffc7 100644
+--- a/pkg/specgen/podspecgen.go
++++ b/pkg/specgen/podspecgen.go
+@@ -93,7 +93,7 @@ type PodNetworkConfig struct {
+ 	// PortMappings is a set of ports to map into the infra container.
+ 	// As, by default, containers share their network with the infra
+ 	// container, this will forward the ports to the entire pod.
+-	// Only available if NetNS is set to Bridge or Slirp.
++	// Only available if NetNS is set to Bridge, Slirp, or Pasta.
+ 	// Optional.
+ 	PortMappings []types.PortMapping `json:"portmappings,omitempty"`
+ 	// Map of networks names to ids the container should join to.
+-- 
+2.28.0
+
-- 
2.34.1



* [PATCH 18/18] test: Add demo for Podman with pasta
  2022-02-22  1:34 [PATCH 00/18] slirp4netns, sandboxing, Podman integration, assorted fixes Stefano Brivio
                   ` (16 preceding siblings ...)
  2022-02-22  1:34 ` [PATCH 17/18] contrib: Add patch for Podman integration Stefano Brivio
@ 2022-02-22  1:34 ` Stefano Brivio
  2022-02-22  9:07 ` [PATCH 00/18] slirp4netns, sandboxing, Podman integration, assorted fixes Stefano Brivio
  18 siblings, 0 replies; 20+ messages in thread
From: Stefano Brivio @ 2022-02-22  1:34 UTC (permalink / raw)
  To: passt-dev

[-- Attachment #1: Type: text/plain, Size: 30782 bytes --]

...showing setup steps, some peculiarities such as the --net option, and
a general side-by-side comparison with slirp4netns(1), including
"quick" TCP and UDP throughput and latency benchmarks.

Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>
---
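A note on the --net option mentioned above. The following is a minimal
Go sketch, assuming the same SplitN/Split handling that the Podman patch
in 17/18 adds to ParseNetworkFlag, of how a value such as
"pasta:--pcap,demo.pcap" turns into a list of pasta(1) arguments.
splitNetOption is a hypothetical helper name used for illustration only,
it is not a function from Podman.

```go
package main

import (
	"fmt"
	"strings"
)

// splitNetOption is a hypothetical helper (not part of Podman). It mirrors
// the logic added to ParseNetworkFlag: everything after the first ':' is
// split on ',' and each item is passed to pasta as one argument.
func splitNetOption(ns string) []string {
	parts := strings.SplitN(ns, ":", 2)
	if len(parts) < 2 {
		// Plain "pasta": no extra options given.
		return nil
	}
	return strings.Split(parts[1], ",")
}

func main() {
	fmt.Println(splitNetOption("pasta:--pcap,demo.pcap"))
	fmt.Println(splitNetOption("pasta"))
}
```

Under this assumption, commas always separate arguments, so an option
and its value are written as two comma-separated items, as in
--net=pasta:--pcap,demo.pcap in the demo script.
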
 README.md        |  11 +-
 test/demo/podman | 798 +++++++++++++++++++++++++++++++++++++++++++++++
 test/lib/layout  |  38 ++-
 test/lib/setup   |  21 +-
 test/lib/term    |  10 +
 test/lib/test    |  35 +++
 test/run         |   8 +
 7 files changed, 915 insertions(+), 6 deletions(-)
 create mode 100644 test/demo/podman

diff --git a/README.md b/README.md
index 51cc870..16e91b9 100644
--- a/README.md
+++ b/README.md
@@ -398,9 +398,14 @@ is fully configurable with command line options.
 
 ### pasta
 
-<p><video id="demo_pasta_video" style="width: 70%; height: auto; max-height: 90%" controls>
- <source src="/builds/latest/web/demo_pasta.webm" type="video/webm">
-</video></p>
+<div style="display: grid; grid-template-columns: 1fr 1fr;">
+  <div><video id="demo_pasta_video" style="width: 100%; height: auto;" controls>
+    <source src="/builds/latest/web/demo_pasta.webm" type="video/webm">
+  </video>use pasta to create and connect a namespace</div>
+  <div><video id="demo_podman_video" style="width: 100%; height: auto;" controls>
+    <source src="/builds/latest/web/demo_podman.webm" type="video/webm">
+  </video>use Podman with pasta</div>
+</div>
 
 ### passt
 
diff --git a/test/demo/podman b/test/demo/podman
new file mode 100644
index 0000000..2586695
--- /dev/null
+++ b/test/demo/podman
@@ -0,0 +1,798 @@
+# SPDX-License-Identifier: AGPL-3.0-or-later
+#
+# PASST - Plug A Simple Socket Transport
+#  for qemu/UNIX domain socket mode
+#
+# PASTA - Pack A Subtle Tap Abstraction
+#  for network namespace/tap device mode
+#
+# test/demo/podman - Show pasta operation with Podman
+#
+# Copyright (c) 2022 Red Hat GmbH
+# Author: Stefano Brivio <sbrivio(a)redhat.com>
+
+onlyfor	podman
+
+set	OPTS -Z -w 4M -l 1M -P 2 -t5 --pacing-timer 10000
+set	OPTS_10s -Z -w 4M -l 1M -P 2 -t10 --pacing-timer 10000
+
+say	This is an overview of 
+em	Podman
+say	 using 
+em	pasta
+say	.
+nl
+nl
+sleep	3
+
+say	Let's fetch Podman
+sleep	1
+tempdir	TEMPDIR
+host	git -C __TEMPDIR__ clone https://github.com/containers/podman.git
+sleep	1
+
+say	, patch it
+sleep	1
+host	cp ../contrib/podman/0001-libpod-Add-pasta-networking-mode.patch __TEMPDIR__/podman
+host	cd __TEMPDIR__/podman
+host	patch -p1 < 0001-libpod-Add-pasta-networking-mode.patch
+sleep	1
+
+say	, and build it.
+host	make
+sleep	1
+
+nl
+nl
+say	By default, for 
+em	rootless
+say	 mode, Podman will pick
+nl
+em	slirp4netns
+say	 to operate the network.
+nl
+nl
+say	Let's start a container with it
+sleep	1
+
+ns1	cd __TEMPDIR__/podman
+ns1b	./bin/podman run --rm -ti alpine sh
+sleep	2
+
+say	,
+nl
+say	and one with 
+em	pasta
+say	 instead.
+
+ns2	cd __TEMPDIR__/podman
+ns2b	./bin/podman run --net=pasta --rm -ti alpine sh
+sleep	2
+
+nl
+nl
+say	We can observe some practical differences:
+nl
+
+ns1b	ip ad sh
+sleep	3
+say	- slirp4netns uses a predefined IPv4 address
+hl	NS1
+sleep	2
+
+ns2b	ip ad sh
+sleep	3
+say	,
+nl
+say	  pasta copies addresses from the host
+hl	NS2
+sleep	2
+
+nl
+say	- pasta enables IPv6 by default
+hl	NS2
+sleep	2
+
+nl
+say	- slirp4netns uses 
+em	tap0
+say	 as interface name
+hl	NS1
+sleep	2
+
+say	, pasta
+nl
+say	  takes an interface name from the host
+hl	NS2
+sleep	2
+
+nl
+say	- same for routes:
+
+ns1b	ip ro sh
+sleep	3
+say	 slirp4netns defines its own
+nl
+say	  gateway address
+hl	NS1
+sleep	2
+
+say	, pasta copies it from the host
+ns2b	ip ro sh
+ns2b	ip -6 ro sh
+sleep	5
+
+nl
+nl
+say	Let's check connectivity...
+sleep	2
+ns1b	wget risotto.milane.se
+ns2b	wget myfinge.rs
+sleep	2
+say	 fine.
+sleep	5
+nl
+nl
+
+say	Let's run a service in the container. We didn't
+nl
+say	configure port forwarding. With default options,
+nl
+say	pasta detects services bound inside and outside
+nl
+say	the container and forwards ports accordingly, so
+nl
+say	we don't need to restart it. Let's restart the
+nl
+say	container running with slirp4netns...
+sleep	5
+
+ns1b	exit
+sleep	2
+ns1b	podman run --rm -p 8080:8080/tcp -ti alpine sh
+sleep	5
+
+nl
+nl
+say	and now actually start the service
+ns1b	apk add thttpd
+ns2b	apk add thttpd
+ns1b	>index.html cat << EOF
+ns1b	<!doctype html><body>Hello via slirp4netns</body>
+ns1b	EOF
+ns2b	>index.html cat << EOF
+ns2b	<!doctype html><body>Hello via pasta</body>
+ns2b	EOF
+ns1b	thttpd -p 8080
+ns2b	thttpd -p 8081
+
+sleep	3
+say	, then check
+nl
+say	that it's accessible.
+sleep	3
+
+hostb	lynx http://127.0.0.1:8080/
+sleep	5
+hostb	q
+hostb	lynx http://[::1]:8081/
+sleep	5
+hostb	q
+sleep	2
+
+nl
+nl
+say	What about performance, you might ask.
+nl
+say	For simplicity, we'll measure between init
+nl
+say	namespace (the "host") and container. To do
+nl
+say	that, we need to allow the container direct
+nl
+say	access to the host, which needs an extra option
+nl
+say	in slirp4netns. Let's restart that container,
+nl
+say	while also mapping ports for iperf3 and neper,
+nl
+say	and enabling IPv6 for slirp4netns (experimental)
+nl
+say	too.
+sleep	3
+
+ns1	exit
+
+ns1b	podman run --rm --net=slirp4netns:allow_host_loopback=true,enable_ipv6=true -p 5201-5202:5201-5202/tcp -p 5201-5202:5201-5202/udp -ti alpine sh
+sleep	5
+nl
+nl
+say	pasta allows that by default, so we wouldn't need
+nl
+say	to touch the container using pasta, but let's
+nl
+say	take the chance to look at passing extra options
+nl
+say	there as well.
+nl
+nl
+ns2	exit
+
+say	Options after '--net=pasta:' are the same as
+nl
+say	documented for the command line of pasta(1).
+nl
+say	For example, we can enable packet captures
+sleep	3
+ns2b	./bin/podman run --net=pasta:--pcap,demo.pcap --rm -ti alpine sh
+sleep	5
+
+say	,
+nl
+say	and generate some traffic we can look at.
+nl
+sleep	2
+ns2b	wget -O - lameexcu.se
+sleep	2
+hostb	tshark -r demo.pcap tcp
+sleep	5
+
+nl
+say	But back to performance now. By the way,
+nl
+say	pasta doesn't detect bound UDP ports
+nl
+say	periodically (only when it starts), so we
+nl
+say	have to pass the ones we need explicitly.
+nl
+sleep	2
+ns2b	exit
+sleep	1
+ns2b	./bin/podman run --net=pasta:-U,5214 -p 5204:5204/udp --rm -ti alpine sh
+sleep	5
+
+nl
+say	In slirp4netns mode, Podman enables by
+nl
+say	default the port forwarder from 'rootlesskit'
+nl
+say	for better performance.
+nl
+say	However, it can't be used for non-local
+nl
+say	mappings (traffic without loopback source 
+nl
+em	and
+say	 destination) because it doesn't preserve
+nl
+say	the correct source address as it forwards
+nl
+say	packets to the container.
+sleep	3
+nl
+nl
+say	We'll check non-loopback mappings first for
+nl
+say	both pasta and slirp4netns, then restart the
+nl
+say	slirp4netns container with rootlesskit and
+nl
+say	switch to loopback mappings. pasta doesn't
+nl
+say	have this limitation.
+nl
+nl
+say	One last note: slirp4netns doesn't support
+nl
+say	forwarding of IPv6 ports (to the container):
+nl
+say	github.com/rootless-containers/slirp4netns/issues/253
+nl
+say	so we'll skip IPv6 tests for slirp4netns as
+nl
+say	port forwarder (on the path to the container).
+
+sleep	5
+ns1	exit
+ns1b	podman run --rm --net=slirp4netns:allow_host_loopback=true,enable_ipv6=true,port_handler=slirp4netns -p 5201-5202:5201-5202/tcp -p 5201-5202:5201-5202/udp -ti alpine sh
+sleep	3
+
+nl
+nl
+say	We'll use iperf3(1) for throughput
+sleep	2
+ns1b	apk add iperf3 jq bc
+ns2b	apk add iperf3 jq bc
+sleep	2
+say	 and static
+nl
+say	builds of neper (github.com/google/neper) for
+nl
+say	latency.
+ns1	wget lameexcu.se/tcp_rr; chmod 755 tcp_rr
+ns2	wget lameexcu.se/tcp_rr; chmod 755 tcp_rr
+ns1	wget lameexcu.se/tcp_crr; chmod 755 tcp_crr
+ns2	wget lameexcu.se/tcp_crr; chmod 755 tcp_crr
+ns1	wget lameexcu.se/udp_rr; chmod 755 udp_rr
+ns2	wget lameexcu.se/udp_rr; chmod 755 udp_rr
+sleep	5
+
+nl
+nl
+say	Everything is set now, let's start
+sleep	2
+hout	IFNAME ip -j li sh | jq -rM '.[] | select(.link_type == "ether").ifname'
+hout	ADDR4 ip -j -4 ad sh|jq -rM '.[] | select(.ifname == "__IFNAME__").addr_info[] | select(.scope == "global").local'
+hout	ADDR6 ip -j -6 ad sh|jq -rM '.[] | select(.ifname == "__IFNAME__").addr_info[] | select(.scope == "global").local'
+hout	GW4 ip -j -4 ro sh|jq -rM '.[] | select(.dst == "default").gateway'
+hout	GW6 ip -j -6 ro sh|jq -rM '.[] | select(.dst == "default").gateway'
+
+nl
+nl
+resize	INFO D 15
+info	Throughput in Gbps, latency in µs
+info	  non-loopback (tap) connections
+th	mode slirp4netns pasta
+
+tr	TCP/IPv6 to ns
+#ns1b	(iperf3 -s1J -p 5201 | jq -rM ".end.sum_received.bits_per_second" >t1) &
+#ns1b	iperf3 -s1J -p 5202 | jq -rM ".end.sum_received.bits_per_second" >t2
+#hostb	iperf3 -c __ADDR6__ -p 5201 __OPTS_10s__ & iperf3 -c __ADDR6__ -p 5202 __OPTS_10s__
+#sleep	15
+#ns1b	
+#ns1out	BW echo "$(cat t1) + $(cat t2)" | bc -l
+#bw	__BW__ 0.0 0.0
+bw	-
+ns2b	(iperf3 -s1J -p 5203 | jq -rM ".end.sum_received.bits_per_second" >t1) &
+ns2b	iperf3 -s1J -p 5204 | jq -rM ".end.sum_received.bits_per_second" >t2
+hostb	iperf3 -c __ADDR6__ -p 5203 -t5 -l 1M -Z & iperf3 -c __ADDR6__ -p 5204 -t5 -l 1M -Z
+sleep	10
+ns2b	
+ns2out	BW echo "$(cat t1) + $(cat t2)" | bc -l
+bw	__BW__ 0.0 0.0
+hostb	
+
+tl	  RR latency
+#ns1b	./tcp_rr -6 --nolog -C 5201 -P 5202
+#sleep	2
+#hout	LAT tcp_rr --nolog -c -H __ADDR6__ -C 5201 -P 5202 -l 5 | sed -n 's/^throughput=\(.*\)/\1/p'
+#lat	__LAT__ 100000 100000
+lat	-
+ns2b	./tcp_rr -6 --nolog -C 5203 -P 5204
+sleep	2
+hout	LAT tcp_rr --nolog -c -H __ADDR6__ -C 5203 -P 5204 -l 5 | sed -n 's/^throughput=\(.*\)/\1/p'
+lat	__LAT__ 100000 100000
+
+tl	  CRR latency
+#ns1b	./tcp_crr -6 --nolog -C 5201 -P 5202
+#sleep	2
+#hout	LAT tcp_crr --nolog -c -H __ADDR6__ -C 5201 -P 5202 -l 5 | sed -n 's/^throughput=\(.*\)/\1/p'
+#lat	__LAT__ 100000 100000
+lat	-
+ns2b	./tcp_crr -6 --nolog -C 5203 -P 5204
+sleep	2
+hout	LAT tcp_crr --nolog -c -H __ADDR6__ -C 5203 -P 5204 -l 5 | sed -n 's/^throughput=\(.*\)/\1/p'
+lat	__LAT__ 100000 100000
+
+tl	TCP/IPv4 to ns
+ns1b	(iperf3 -s1J -p 5201 | jq -rM ".end.sum_received.bits_per_second" >t1) &
+ns1b	iperf3 -s1J -p 5202 | jq -rM ".end.sum_received.bits_per_second" >t2
+hostb	iperf3 -c __ADDR4__ -p 5201 __OPTS__ & iperf3 -c __ADDR4__ -p 5202 __OPTS__
+sleep	10
+ns1b	
+ns1out	BW echo "$(cat t1) + $(cat t2)" | bc -l
+bw	__BW__ 0.0 0.0
+ns2b	(iperf3 -s1J -p 5203 | jq -rM ".end.sum_received.bits_per_second" >t1) &
+ns2b	iperf3 -s1J -p 5204 | jq -rM ".end.sum_received.bits_per_second" >t2
+hostb	iperf3 -c __ADDR4__ -p 5203 __OPTS__ & iperf3 -c __ADDR4__ -p 5204 __OPTS__
+sleep	10
+ns2b	
+ns2out	BW echo "$(cat t1) + $(cat t2)" | bc -l
+bw	__BW__ 0.0 0.0
+hostb	
+
+tl	  RR latency
+ns1b	./tcp_rr -4 --nolog -C 5201 -P 5202
+sleep	2
+hout	LAT tcp_rr --nolog -c -H __ADDR4__ -C 5201 -P 5202 -l 5 | sed -n 's/^throughput=\(.*\)/\1/p'
+lat	__LAT__ 100000 100000
+ns2b	./tcp_rr -4 --nolog -C 5203 -P 5204
+sleep	2
+hout	LAT tcp_rr --nolog -c -H __ADDR4__ -C 5203 -P 5204 -l 5 | sed -n 's/^throughput=\(.*\)/\1/p'
+lat	__LAT__ 100000 100000
+
+tl	  CRR latency
+ns1b	./tcp_crr -4 --nolog -C 5201 -P 5202
+sleep	2
+hout	LAT tcp_crr --nolog -c -H __ADDR4__ -C 5201 -P 5202 -l 5 | sed -n 's/^throughput=\(.*\)/\1/p'
+lat	__LAT__ 100000 100000
+ns2b	./tcp_crr -4 --nolog -C 5203 -P 5204
+sleep	2
+hout	LAT tcp_crr --nolog -c -H __ADDR4__ -C 5203 -P 5204 -l 5 | sed -n 's/^throughput=\(.*\)/\1/p'
+lat	__LAT__ 100000 100000
+
+tr	TCP/IPv6 to host
+hostb	(iperf3 -s1J -p 5211 | jq -rM ".end.sum_received.bits_per_second" >t1) &
+hostb	iperf3 -s1J -p 5212 | jq -rM ".end.sum_received.bits_per_second" >t2
+ns1b	iperf3 -c fd00::2 -p 5211 __OPTS__ & iperf3 -c fd00::2 -p 5212 __OPTS__
+sleep	10
+hostb	
+hout	BW echo "$(cat t1) + $(cat t2)" | bc -l
+bw	__BW__ 0.0 0.0
+hostb	(iperf3 -s1J -p 5213 | jq -rM ".end.sum_received.bits_per_second" >t1) &
+hostb	iperf3 -s1J -p 5214 | jq -rM ".end.sum_received.bits_per_second" >t2
+ns2b	iperf3 -c __GW6__%__IFNAME__ -p 5213 __OPTS__ & iperf3 -c __GW6__%__IFNAME__ -p 5214 __OPTS__
+sleep	10
+hostb	
+hout	BW echo "$(cat t1) + $(cat t2)" | bc -l
+bw	__BW__ 0.0 0.0
+ns1b	
+ns2b	
+
+tl	  RR latency
+hostb	tcp_rr -6 --nolog -C 5211 -P 5212
+sleep	2
+ns1out	LAT ./tcp_rr --nolog -c -H fd00::2 -C 5211 -P 5212 -l 5 | sed -n 's/^throughput=\(.*\)/\1/p'
+lat	__LAT__ 100000 100000
+hostb	tcp_rr -6 --nolog -C 5213 -P 5214
+sleep	2
+ns2out	LAT ./tcp_rr --nolog -c -H __GW6__%__IFNAME__ -C 5213 -P 5214 -l 5 | sed -n 's/^throughput=\(.*\)/\1/p'
+lat	__LAT__ 100000 100000
+
+tl	  CRR latency
+hostb	tcp_crr -6 --nolog -C 5211 -P 5212
+sleep	2
+ns1out	LAT ./tcp_crr --nolog -c -H fd00::2 -C 5211 -P 5212 -l 5 | sed -n 's/^throughput=\(.*\)/\1/p'
+lat	__LAT__ 100000 100000
+hostb	tcp_crr -6 --nolog -C 5213 -P 5214
+sleep	2
+ns2out	LAT ./tcp_crr --nolog -c -H __GW6__%__IFNAME__ -C 5213 -P 5214 -l 5 | sed -n 's/^throughput=\(.*\)/\1/p'
+lat	__LAT__ 100000 100000
+
+tl	TCP/IPv4 to host
+hostb	(iperf3 -s1J -p 5211 | jq -rM ".end.sum_received.bits_per_second" >t1) &
+hostb	iperf3 -s1J -p 5212 | jq -rM ".end.sum_received.bits_per_second" >t2
+ns1b	iperf3 -c 10.0.2.2 -p 5211 __OPTS__ & iperf3 -c 10.0.2.2 -p 5212 __OPTS__
+sleep	10
+hostb	
+hout	BW echo "$(cat t1) + $(cat t2)" | bc -l
+bw	__BW__ 0.0 0.0
+hostb	(iperf3 -s1J -p 5213 | jq -rM ".end.sum_received.bits_per_second" >t1) &
+hostb	iperf3 -s1J -p 5214 | jq -rM ".end.sum_received.bits_per_second" >t2
+ns2b	iperf3 -c __GW4__ -p 5213 __OPTS__ & iperf3 -c __GW4__ -p 5214 __OPTS__
+sleep	10
+hostb	
+hout	BW echo "$(cat t1) + $(cat t2)" | bc -l
+bw	__BW__ 0.0 0.0
+ns1b	
+ns2b	
+
+tl	  RR latency
+hostb	tcp_rr -4 --nolog -C 5211 -P 5212
+sleep	2
+ns1out	LAT ./tcp_rr --nolog -c -H 10.0.2.2 -C 5211 -P 5212 -l 5 | sed -n 's/^throughput=\(.*\)/\1/p'
+lat	__LAT__ 100000 100000
+hostb	tcp_rr -4 --nolog -C 5213 -P 5214
+sleep	2
+ns2out	LAT ./tcp_rr --nolog -c -H __GW4__ -C 5213 -P 5214 -l 5 | sed -n 's/^throughput=\(.*\)/\1/p'
+lat	__LAT__ 100000 100000
+
+tl	  CRR latency
+hostb	tcp_crr -4 --nolog -C 5211 -P 5212
+sleep	2
+ns1out	LAT ./tcp_crr --nolog -c -H 10.0.2.2 -C 5211 -P 5212 -l 5 | sed -n 's/^throughput=\(.*\)/\1/p'
+lat	__LAT__ 100000 100000
+hostb	tcp_crr -4 --nolog -C 5213 -P 5214
+sleep	2
+ns2out	LAT ./tcp_crr --nolog -c -H __GW4__ -C 5213 -P 5214 -l 5 | sed -n 's/^throughput=\(.*\)/\1/p'
+lat	__LAT__ 100000 100000
+
+sleep	5
+
+
+tr	UDP/IPv6 to ns
+#ns1b	iperf3 -s1J -p 5201 | jq -rM ".intervals[0].sum.bits_per_second" >t1
+#hostb	iperf3 -u -c __ADDR6__ -p 5201 -t5 -b 35G
+#sleep	10
+#ns1out	BW cat t1
+#bw	__BW__ 0.0 0.0
+bw	-
+ns2b	iperf3 -s1J -p 5204 | jq -rM ".intervals[0].sum.bits_per_second" >t1
+hostb	iperf3 -u -c __ADDR6__ -p 5204 -t5 -b 35G
+sleep	10
+ns2out	BW cat t1
+bw	__BW__ 0.0 0.0
+
+tl	  RR latency
+#ns1b	./udp_rr -6 --nolog -C 5201 -P 5202
+#sleep	2
+#hout	LAT udp_rr --nolog -c -H __ADDR6__ -C 5201 -P 5202 -l 5 | sed -n 's/^throughput=\(.*\)/\1/p'
+#lat	__LAT__ 100000 100000
+lat	-
+ns2b	./udp_rr -6 --nolog -C 5203 -P 5204
+sleep	2
+hout	LAT udp_rr --nolog -c -H __ADDR6__ -C 5203 -P 5204 -l 5 | sed -n 's/^throughput=\(.*\)/\1/p'
+lat	__LAT__ 100000 100000
+
+tl	UDP/IPv4 to ns
+ns1b	iperf3 -s1J -p 5201 | jq -rM ".intervals[0].sum.bits_per_second" >t1
+hostb	iperf3 -u -c __ADDR4__ -p 5201 -t5 -b 35G
+sleep	10
+ns1out	BW cat t1
+bw	__BW__ 0.0 0.0
+ns2b	iperf3 -s1J -p 5204 | jq -rM ".intervals[0].sum.bits_per_second" >t1
+hostb	iperf3 -u -c __ADDR4__ -p 5204 -t5 -b 35G
+sleep	10
+ns2out	BW cat t1
+bw	__BW__ 0.0 0.0
+
+tl	  RR latency
+ns1b	./udp_rr -4 --nolog -C 5201 -P 5202
+sleep	2
+hout	LAT udp_rr --nolog -c -H __ADDR4__ -C 5201 -P 5202 -l 5 | sed -n 's/^throughput=\(.*\)/\1/p'
+lat	__LAT__ 100000 100000
+ns2b	./udp_rr -4 --nolog -C 5203 -P 5204
+sleep	2
+hout	LAT udp_rr --nolog -c -H __ADDR4__ -C 5203 -P 5204 -l 5 | sed -n 's/^throughput=\(.*\)/\1/p'
+lat	__LAT__ 100000 100000
+
+
+ns1	exit
+ns1	podman run --rm --net=slirp4netns:allow_host_loopback=true,enable_ipv6=true -p 5201-5202:5201-5202/tcp -p 5201-5202:5201-5202/udp -ti alpine sh
+ns1	apk add iperf3 jq bc
+ns1	wget lameexcu.se/tcp_rr; chmod 755 tcp_rr
+ns1	wget lameexcu.se/tcp_crr; chmod 755 tcp_crr
+ns1	wget lameexcu.se/udp_rr; chmod 755 udp_rr
+info	
+info	
+info	  loopback (lo) connections
+th	mode rootlesskit pasta
+
+
+tr	TCP/IPv6 to ns
+ns1b	(iperf3 -s1J -p 5201 | jq -rM ".end.sum_received.bits_per_second" >t1) &
+ns1b	iperf3 -s1J -p 5202 | jq -rM ".end.sum_received.bits_per_second" >t2
+hostb	iperf3 -c ::1 -p 5201 -t5 -l 1M -Z & iperf3 -c ::1 -p 5202 -t5 -l 1M -Z
+sleep	10
+ns1b	
+ns1out	BW echo "$(cat t1) + $(cat t2)" | bc -l
+bw	__BW__ 0.0 0.0
+ns2b	(iperf3 -s1J -p 5203 | jq -rM ".end.sum_received.bits_per_second" >t1) &
+ns2b	iperf3 -s1J -p 5204 | jq -rM ".end.sum_received.bits_per_second" >t2
+hostb	iperf3 -c ::1 -p 5203 -t5 -l 1M -Z & iperf3 -c ::1 -p 5204 -t5 -l 1M -Z
+sleep	10
+ns2b	
+ns2out	BW echo "$(cat t1) + $(cat t2)" | bc -l
+bw	__BW__ 0.0 0.0
+hostb	
+
+tl	  RR latency
+ns1b	./tcp_rr -6 --nolog -C 5201 -P 5202
+sleep	2
+hout	LAT tcp_rr --nolog -c -H ::1 -C 5201 -P 5202 -l 5 | sed -n 's/^throughput=\(.*\)/\1/p'
+lat	__LAT__ 100000 100000
+ns2b	./tcp_rr -6 --nolog -C 5203 -P 5204
+sleep	2
+hout	LAT tcp_rr --nolog -c -H ::1 -C 5203 -P 5204 -l 5 | sed -n 's/^throughput=\(.*\)/\1/p'
+lat	__LAT__ 100000 100000
+
+tl	  CRR latency
+ns1b	./tcp_crr -6 --nolog -C 5201 -P 5202
+sleep	2
+hout	LAT tcp_crr --nolog -c -H ::1 -C 5201 -P 5202 -l 5 | sed -n 's/^throughput=\(.*\)/\1/p'
+lat	__LAT__ 100000 100000
+ns2b	./tcp_crr -6 --nolog -C 5203 -P 5204
+sleep	2
+hout	LAT tcp_crr --nolog -c -H ::1 -C 5203 -P 5204 -l 5 | sed -n 's/^throughput=\(.*\)/\1/p'
+lat	__LAT__ 100000 100000
+
+tl	TCP/IPv4 to ns
+ns1b	(iperf3 -s1J -p 5201 | jq -rM ".end.sum_received.bits_per_second" >t1) &
+ns1b	iperf3 -s1J -p 5202 | jq -rM ".end.sum_received.bits_per_second" >t2
+hostb	iperf3 -c 127.0.0.1 -p 5201 __OPTS__ & iperf3 -c 127.0.0.1 -p 5202 __OPTS__
+sleep	10
+ns1b	
+ns1out	BW echo "$(cat t1) + $(cat t2)" | bc -l
+bw	__BW__ 0.0 0.0
+ns2b	(iperf3 -s1J -p 5203 | jq -rM ".end.sum_received.bits_per_second" >t1) &
+ns2b	iperf3 -s1J -p 5204 | jq -rM ".end.sum_received.bits_per_second" >t2
+hostb	iperf3 -c 127.0.0.1 -p 5203 __OPTS__ & iperf3 -c 127.0.0.1 -p 5204 __OPTS__
+sleep	10
+ns2b	
+ns2out	BW echo "$(cat t1) + $(cat t2)" | bc -l
+bw	__BW__ 0.0 0.0
+hostb	
+
+tl	  RR latency
+ns1b	./tcp_rr -4 --nolog -C 5201 -P 5202
+sleep	2
+hout	LAT tcp_rr --nolog -c -H 127.0.0.1 -C 5201 -P 5202 -l 5 | sed -n 's/^throughput=\(.*\)/\1/p'
+lat	__LAT__ 100000 100000
+ns2b	./tcp_rr -4 --nolog -C 5203 -P 5204
+sleep	2
+hout	LAT tcp_rr --nolog -c -H 127.0.0.1 -C 5203 -P 5204 -l 5 | sed -n 's/^throughput=\(.*\)/\1/p'
+lat	__LAT__ 100000 100000
+
+tl	  CRR latency
+ns1b	./tcp_crr -4 --nolog -C 5201 -P 5202
+sleep	2
+hout	LAT tcp_crr --nolog -c -H 127.0.0.1 -C 5201 -P 5202 -l 5 | sed -n 's/^throughput=\(.*\)/\1/p'
+lat	__LAT__ 100000 100000
+ns2b	./tcp_crr -4 --nolog -C 5203 -P 5204
+sleep	2
+hout	LAT tcp_crr --nolog -c -H 127.0.0.1 -C 5203 -P 5204 -l 5 | sed -n 's/^throughput=\(.*\)/\1/p'
+lat	__LAT__ 100000 100000
+
+tr	TCP/IPv6 to host
+hostb	(iperf3 -s1J -p 5211 | jq -rM ".end.sum_received.bits_per_second" >t1) &
+hostb	iperf3 -s1J -p 5212 | jq -rM ".end.sum_received.bits_per_second" >t2
+ns1b	iperf3 -c fd00::2 -p 5211 __OPTS__ & iperf3 -c fd00::2 -p 5212 __OPTS__
+sleep	10
+hostb	
+hout	BW echo "$(cat t1) + $(cat t2)" | bc -l
+bw	__BW__ 0.0 0.0
+hostb	(iperf3 -s1J -p 5213 | jq -rM ".end.sum_received.bits_per_second" >t1) &
+hostb	iperf3 -s1J -p 5214 | jq -rM ".end.sum_received.bits_per_second" >t2
+ns2b	iperf3 -c ::1 -p 5213 __OPTS__ & iperf3 -c ::1 -p 5214 __OPTS__
+sleep	10
+hostb	
+hout	BW echo "$(cat t1) + $(cat t2)" | bc -l
+bw	__BW__ 0.0 0.0
+ns1b	
+ns2b	
+
+tl	  RR latency
+hostb	tcp_rr -6 --nolog -C 5211 -P 5212
+sleep	2
+ns1out	LAT ./tcp_rr --nolog -c -H fd00::2 -C 5211 -P 5212 -l 5 | sed -n 's/^throughput=\(.*\)/\1/p'
+lat	__LAT__ 100000 100000
+hostb	tcp_rr -6 --nolog -C 5213 -P 5214
+sleep	2
+ns2out	LAT ./tcp_rr --nolog -c -H ::1 -C 5213 -P 5214 -l 5 | sed -n 's/^throughput=\(.*\)/\1/p'
+lat	__LAT__ 100000 100000
+
+tl	  CRR latency
+hostb	tcp_crr -6 --nolog -C 5211 -P 5212
+sleep	2
+ns1out	LAT ./tcp_crr --nolog -c -H fd00::2 -C 5211 -P 5212 -l 5 | sed -n 's/^throughput=\(.*\)/\1/p'
+lat	__LAT__ 100000 100000
+hostb	tcp_crr -6 --nolog -C 5213 -P 5214
+sleep	2
+ns2out	LAT ./tcp_crr --nolog -c -H ::1 -C 5213 -P 5214 -l 5 | sed -n 's/^throughput=\(.*\)/\1/p'
+lat	__LAT__ 100000 100000
+
+tl	TCP/IPv4 to host
+hostb	(iperf3 -s1J -p 5211 | jq -rM ".end.sum_received.bits_per_second" >t1) &
+hostb	iperf3 -s1J -p 5212 | jq -rM ".end.sum_received.bits_per_second" >t2
+ns1b	iperf3 -c 10.0.2.2 -p 5211 __OPTS__ & iperf3 -c 10.0.2.2 -p 5212 __OPTS__
+sleep	10
+hostb	
+hout	BW echo "$(cat t1) + $(cat t2)" | bc -l
+bw	__BW__ 0.0 0.0
+hostb	(iperf3 -s1J -p 5213 | jq -rM ".end.sum_received.bits_per_second" >t1) &
+hostb	iperf3 -s1J -p 5214 | jq -rM ".end.sum_received.bits_per_second" >t2
+ns2b	iperf3 -c 127.0.0.1 -p 5213 __OPTS__ & iperf3 -c 127.0.0.1 -p 5214 __OPTS__
+sleep	10
+hostb	
+hout	BW echo "$(cat t1) + $(cat t2)" | bc -l
+bw	__BW__ 0.0 0.0
+ns1b	
+ns2b	
+
+tl	  RR latency
+hostb	tcp_rr -4 --nolog -C 5211 -P 5212
+sleep	2
+ns1out	LAT ./tcp_rr --nolog -c -H 10.0.2.2 -C 5211 -P 5212 -l 5 | sed -n 's/^throughput=\(.*\)/\1/p'
+lat	__LAT__ 100000 100000
+hostb	tcp_rr -4 --nolog -C 5213 -P 5214
+sleep	2
+ns2out	LAT ./tcp_rr --nolog -c -H 127.0.0.1 -C 5213 -P 5214 -l 5 | sed -n 's/^throughput=\(.*\)/\1/p'
+lat	__LAT__ 100000 100000
+
+tl	  CRR latency
+hostb	tcp_crr -4 --nolog -C 5211 -P 5212
+sleep	2
+ns1out	LAT ./tcp_crr --nolog -c -H 10.0.2.2 -C 5211 -P 5212 -l 5 | sed -n 's/^throughput=\(.*\)/\1/p'
+lat	__LAT__ 100000 100000
+hostb	tcp_crr -4 --nolog -C 5213 -P 5214
+sleep	2
+ns2out	LAT ./tcp_crr --nolog -c -H 127.0.0.1 -C 5213 -P 5214 -l 5 | sed -n 's/^throughput=\(.*\)/\1/p'
+lat	__LAT__ 100000 100000
+
+sleep	5
+
+
+tr	UDP/IPv6 to ns
+ns1b	iperf3 -s1J -p 5201 | jq -rM ".intervals[0].sum.bits_per_second" >t1
+hostb	iperf3 -u -c ::1 -p 5201 -t5 -b 35G
+sleep	10
+ns1out	BW cat t1
+bw	__BW__ 0.0 0.0
+ns2b	iperf3 -s1J -p 5204 | jq -rM ".intervals[0].sum.bits_per_second" >t1
+hostb	iperf3 -u -c ::1 -p 5204 -t5 -b 35G
+sleep	10
+ns2out	BW cat t1
+bw	__BW__ 0.0 0.0
+
+tl	  RR latency
+ns1b	./udp_rr -6 --nolog -C 5201 -P 5202
+sleep	2
+hout	LAT udp_rr --nolog -c -H ::1 -C 5201 -P 5202 -l 5 | sed -n 's/^throughput=\(.*\)/\1/p'
+lat	__LAT__ 100000 100000
+ns2b	./udp_rr -6 --nolog -C 5203 -P 5204
+sleep	2
+hout	LAT udp_rr --nolog -c -H ::1 -C 5203 -P 5204 -l 5 | sed -n 's/^throughput=\(.*\)/\1/p'
+lat	__LAT__ 100000 100000
+
+tl	UDP/IPv4 to ns
+ns1b	iperf3 -s1J -p 5201 | jq -rM ".intervals[0].sum.bits_per_second" >t1
+hostb	iperf3 -u -c 127.0.0.1 -p 5201 -t5 -b 35G
+sleep	10
+ns1out	BW cat t1
+bw	__BW__ 0.0 0.0
+ns2b	iperf3 -s1J -p 5204 | jq -rM ".intervals[0].sum.bits_per_second" >t1
+hostb	iperf3 -u -c 127.0.0.1 -p 5204 -t5 -b 35G
+sleep	10
+ns2out	BW cat t1
+bw	__BW__ 0.0 0.0
+
+tl	  RR latency
+ns1b	./udp_rr -6 --nolog -C 5201 -P 5202
+sleep	2
+hout	LAT udp_rr --nolog -c -H 127.0.0.1 -C 5201 -P 5202 -l 5 | sed -n 's/^throughput=\(.*\)/\1/p'
+lat	__LAT__ 100000 100000
+ns2b	./udp_rr -6 --nolog -C 5203 -P 5204
+sleep	2
+hout	LAT udp_rr --nolog -c -H 127.0.0.1 -C 5203 -P 5204 -l 5 | sed -n 's/^throughput=\(.*\)/\1/p'
+lat	__LAT__ 100000 100000
+
+tr	UDP/IPv6 to host
+hostb	iperf3 -s1J -p 5211 | jq -rM ".intervals[0].sum.bits_per_second" >t1
+ns1b	iperf3 -u -c fd00::2 -p 5211 -t5 -b 35G
+sleep	10
+hout	BW cat t1
+bw	__BW__ 0.0 0.0
+hostb	iperf3 -s1J -p 5214 | jq -rM ".intervals[0].sum.bits_per_second" >t1
+ns2b	iperf3 -u -c ::1 -p 5214 -t5 -b 35G
+sleep	10
+hout	BW cat t1
+bw	__BW__ 0.0 0.0
+
+tl	  RR latency
+hostb	udp_rr -6 --nolog -C 5211 -P 5212
+sleep	2
+ns1out	LAT ./udp_rr --nolog -c -H fd00::2 -C 5211 -P 5212 -l 5 | sed -n 's/^throughput=\(.*\)/\1/p'
+lat	__LAT__ 100000 100000
+hostb	udp_rr -6 --nolog -C 5213 -P 5214
+sleep	2
+ns2out	LAT ./udp_rr --nolog -c -H ::1 -C 5213 -P 5214 -l 5 | sed -n 's/^throughput=\(.*\)/\1/p'
+lat	__LAT__ 100000 100000
+
+tl	UDP/IPv4 to host
+hostb	iperf3 -s1J -p 5211 | jq -rM ".intervals[0].sum.bits_per_second" >t1
+ns1b	iperf3 -u -c 10.0.2.2 -p 5211 -t5 -b 35G
+sleep	10
+hout	BW cat t1
+bw	__BW__ 0.0 0.0
+hostb	iperf3 -s1J -p 5214 | jq -rM ".intervals[0].sum.bits_per_second" >t1
+ns2b	iperf3 -u -c 127.0.0.1 -p 5214 -t5 -b 35G
+sleep	10
+hout	BW cat t1
+bw	__BW__ 0.0 0.0
+
+tl	  RR latency
+hostb	udp_rr -6 --nolog -C 5211 -P 5212
+sleep	2
+ns1out	LAT ./udp_rr --nolog -c -H 10.0.2.2 -C 5211 -P 5212 -l 5 | sed -n 's/^throughput=\(.*\)/\1/p'
+lat	__LAT__ 100000 100000
+hostb	udp_rr -6 --nolog -C 5213 -P 5214
+sleep	2
+ns2out	LAT ./udp_rr --nolog -c -H 127.0.0.1 -C 5213 -P 5214 -l 5 | sed -n 's/^throughput=\(.*\)/\1/p'
+lat	__LAT__ 100000 100000
+
+
+nl
+nl
+say	Thanks for watching!
+sleep	15
diff --git a/test/lib/layout b/test/lib/layout
index 7802dac..2d6b197 100644
--- a/test/lib/layout
+++ b/test/lib/layout
@@ -207,7 +207,6 @@ layout_two_guests() {
 layout_demo_pasta() {
 	sleep 3
 
-	tmux kill-pane -a -t 0
 	cmd_write 0 cd ${BASEPATH}
 	cmd_write 0 clear
 	sleep 1
@@ -244,7 +243,6 @@ layout_demo_pasta() {
 layout_demo_passt() {
 	sleep 3
 
-	tmux kill-pane -a -t 0
 	cmd_write 0 cd ${BASEPATH}
 	cmd_write 0 clear
 	sleep 1
@@ -276,3 +274,39 @@ layout_demo_passt() {
 
 	sleep 1
 }
+
+# layout_demo_podman() - Four panes for pasta demo with Podman
+layout_demo_podman() {
+	sleep 3
+
+	cmd_write 0 cd ${BASEPATH}
+	cmd_write 0 clear
+	sleep 1
+	cmd_write 0 clear
+
+	tmux split-window -v -l '65%' -t passt_test
+	tmux split-window -h -t passt_test
+	tmux split-window -h -l '42%' -t passt_test:1.0
+
+	PANE_HOST=0
+	PANE_INFO=1
+	PANE_NS1=2
+	PANE_NS2=3
+
+	get_info_cols
+
+	tmux pipe-pane -O -t ${PANE_NS1} "cat >> ${LOGDIR}/pane_ns1.log"
+	tmux select-pane -t ${PANE_NS1} -T "Podman with slirp4netns"
+
+	tmux pipe-pane -O -t ${PANE_NS2} "cat >> ${LOGDIR}/pane_ns2.log"
+	tmux select-pane -t ${PANE_NS2} -T "Podman with pasta"
+
+	tmux send-keys -l -t ${PANE_INFO} 'while cat /tmp/.passt_test_log_pipe; do :; done'
+	tmux send-keys -t ${PANE_INFO} -N 100 C-m
+	tmux select-pane -t ${PANE_INFO} -T ""
+
+	tmux pipe-pane -O -t ${PANE_HOST} "cat >> ${LOGDIR}/pane_host.log"
+	tmux select-pane -t ${PANE_HOST} -T "host"
+
+	sleep 1
+}
diff --git a/test/lib/setup b/test/lib/setup
index df21655..b076eff 100755
--- a/test/lib/setup
+++ b/test/lib/setup
@@ -327,12 +327,31 @@ teardown_demo_passt() {
 	pane_wait GUEST
 	pane_wait HOST
 	pane_wait PASST
+
+	tmux kill-pane -a -t 0
+	tmux send-keys -t 0 "C-c"
 }
 
-# teardown_demo_pasta() - Exit namespace from remaining pane
+# teardown_demo_pasta() - Exit perf and namespace from remaining pane
 teardown_demo_pasta() {
+	tmux send-keys -t ${PANE_NS} "q"
+	pane_wait NS
 	tmux send-keys -t ${PANE_NS} "C-d"
 	pane_wait NS
+
+	tmux kill-pane -a -t 0
+	tmux send-keys -t 0 "C-c"
+}
+
+# teardown_demo_podman() - Exit namespaces
+teardown_demo_podman() {
+	tmux send-keys -t ${PANE_NS1} "C-d"
+	tmux send-keys -t ${PANE_NS2} "C-d"
+	pane_wait NS1
+	pane_wait NS2
+
+	tmux kill-pane -a -t 0
+	tmux send-keys -t 0 "C-c"
 }
 
 # setup() - Run setup_*() functions
diff --git a/test/lib/term b/test/lib/term
index cc6349f..e8a1d38 100755
--- a/test/lib/term
+++ b/test/lib/term
@@ -176,6 +176,15 @@ pane_highlight() {
 	sleep 3
 }
 
+# pane_resize() - Resize a pane given its name
+# $1:	Pane name
+# $2:	Direction: U, D, L, or R
+# $3:	Adjustment in lines or columns
+pane_resize() {
+	__pane_number=$(eval echo \$PANE_${1})
+	tmux resize-pane -${2} -t ${__pane_number} ${3}
+}
+
 # pane_run() - Issue a command in given pane name
 # $1:	Pane name
 # $@:	Command to issue
@@ -201,6 +210,7 @@ pane_wait() {
 		case ${__l} in
 		'$ ' | '# ' | '# # ' | *"$ " | *"# ") return ;;
 		*" #[m " | *" #[m [K" | *"]# ["*) return ;;
+		*' $ [6n' | *' # [6n' ) return ;;
 		esac
 	do sleep 0.1 || sleep 1; done
 }
diff --git a/test/lib/test b/test/lib/test
index 9f6f6e4..2854191 100755
--- a/test/lib/test
+++ b/test/lib/test
@@ -218,12 +218,32 @@ test_one_line() {
 		pane_run NS "${__arg}"
 		pane_wait NS
 		;;
+	"ns1")
+		pane_run NS1 "${__arg}"
+		pane_wait NS1
+		;;
+	"ns2")
+		pane_run NS2 "${__arg}"
+		pane_wait NS2
+		;;
 	"nsb")
 		pane_run NS "${__arg}"
 		;;
+	"ns1b")
+		pane_run NS1 "${__arg}"
+		;;
+	"ns2b")
+		pane_run NS2 "${__arg}"
+		;;
 	"nsw")
 		pane_wait NS
 		;;
+	"ns1w")
+		pane_wait NS1
+		;;
+	"ns2w")
+		pane_wait NS2
+		;;
 	"nstools")
 		pane_run NS 'which '"${__arg}"' >/dev/null || echo skip'
 		pane_wait NS
@@ -259,6 +279,18 @@ test_one_line() {
 		pane_wait NS
 		TEST_ONE_subs="$(list_add_pair "${TEST_ONE_subs}" "__${__varname}__" "$(pane_parse NS)")"
 		;;
+	"ns1out")
+		__varname="${__arg%% *}"
+		pane_run NS1 "${__arg#* }"
+		pane_wait NS1
+		TEST_ONE_subs="$(list_add_pair "${TEST_ONE_subs}" "__${__varname}__" "$(pane_parse NS1)")"
+		;;
+	"ns2out")
+		__varname="${__arg%% *}"
+		pane_run NS2 "${__arg#* }"
+		pane_wait NS2
+		TEST_ONE_subs="$(list_add_pair "${TEST_ONE_subs}" "__${__varname}__" "$(pane_parse NS2)")"
+		;;
 	"check")
 		info_check "${__arg}"
 		__nok=0
@@ -326,6 +358,9 @@ test_one_line() {
 	"killp")
 		pane_kill "${__arg}"
 		;;
+	"resize")
+		pane_resize ${__arg}
+		;;
 	*)
 		__def_body="$(eval printf \"\$TEST_ONE_DEF_$__cmd\")"
 		if [ -n "${__def_body}" ]; then
diff --git a/test/run b/test/run
index dadd983..c91122d 100755
--- a/test/run
+++ b/test/run
@@ -128,6 +128,14 @@ demo() {
 	MODE=pasta
 	test demo
 	video_stop 0
+	teardown demo_pasta
+
+	layout_demo_podman
+	video_grab demo_podman
+	MODE=podman
+	test demo
+	video_stop 0
+	teardown_demo_podman
 
 	return 0
 }
-- 
2.34.1
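
The new "ns1out"/"ns2out" directives and pane_resize() above rely on a few
shell idioms: splitting the directive argument at the first space, filtering
the "throughput=" line from the benchmark output, and resolving a PANE_<name>
variable by indirection. The sketch below isolates those idioms so they can
be tried outside tmux. The function names (directive_varname,
directive_command, extract_throughput, pane_number) are hypothetical and not
part of the test suite; only the expansions and the sed expression are taken
from the patch.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the passt test suite: they only mirror
# the parameter expansions and the sed filter used in the patch above.

# directive_varname() - Name part of an "*out" directive argument
# $1:	Argument, e.g. "LAT ./tcp_rr --nolog -c ..."
directive_varname() {
	printf '%s\n' "${1%% *}"	# drop everything from the first space on
}

# directive_command() - Command part of an "*out" directive argument
# $1:	Argument, same form as above
directive_command() {
	printf '%s\n' "${1#* }"		# drop everything up to the first space
}

# extract_throughput() - Print the value of a "throughput=" line from stdin
extract_throughput() {
	sed -n 's/^throughput=\(.*\)/\1/p'
}

# pane_number() - Resolve PANE_<name> by indirection, as pane_resize() does
# $1:	Pane name, e.g. NS1
pane_number() {
	eval echo \$PANE_${1}
}
```

With PANE_NS1 set as in layout_demo_podman(), pane_number NS1 yields the
tmux pane index that pane_resize() would pass to "tmux resize-pane -t".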



* Re: [PATCH 00/18] slirp4netns, sandboxing, Podman integration, assorted fixes
  2022-02-22  1:34 [PATCH 00/18] slirp4netns, sandboxing, Podman integration, assorted fixes Stefano Brivio
                   ` (17 preceding siblings ...)
  2022-02-22  1:34 ` [PATCH 18/18] test: Add demo for Podman with pasta Stefano Brivio
@ 2022-02-22  9:07 ` Stefano Brivio
  18 siblings, 0 replies; 20+ messages in thread
From: Stefano Brivio @ 2022-02-22  9:07 UTC (permalink / raw)
  To: passt-dev

[-- Attachment #1: Type: text/plain, Size: 556 bytes --]

On Tue, 22 Feb 2022 02:34:16 +0100
Stefano Brivio <sbrivio(a)redhat.com> wrote:

> [...]
>
> - adds a demo for Podman operation with pasta and side-by-side
>   comparison with slirp4netns (patch 18/18).
> 
> I already ran a demo recording for the Podman demo:
>   https://passt.top/builds/latest/web/demo_podman.webm

...forget about it: running cool-retro-term and ffmpeg threads on my box
while iperf3 is running isn't a good idea. I'm now switching the whole video
mess to asciinema. Preview:
	https://asciinema.org/a/jNz15xWEgj0COs2VT6kdfdJ9L

-- 
Stefano


Thread overview: 20+ messages
2022-02-22  1:34 [PATCH 00/18] slirp4netns, sandboxing, Podman integration, assorted fixes Stefano Brivio
2022-02-22  1:34 ` [PATCH 01/18] slirp4netns: Look up pasta command, exit if not found Stefano Brivio
2022-02-22  1:34 ` [PATCH 02/18] slirp4netns: Add EXIT as condition for trap Stefano Brivio
2022-02-22  1:34 ` [PATCH 03/18] passt, pasta: Namespace-based sandboxing, defer seccomp policy application Stefano Brivio
2022-02-22  1:34 ` [PATCH 04/18] passt: Make process not dumpable after sandboxing Stefano Brivio
2022-02-22  1:34 ` [PATCH 05/18] Makefile, conf, passt: Drop passt4netns references, explicit argc check Stefano Brivio
2022-02-22  1:34 ` [PATCH 06/18] slirp4netns.sh: Implement API socket option for port forwarding Stefano Brivio
2022-02-22  1:34 ` [PATCH 07/18] conf: Don't print configuration on --quiet Stefano Brivio
2022-02-22  1:34 ` [PATCH 08/18] conf: Given IPv4 address and no netmask, assign RFC 790-style classes Stefano Brivio
2022-02-22  1:34 ` [PATCH 09/18] conf, udp: Introduce basic DNS forwarding Stefano Brivio
2022-02-22  1:34 ` [PATCH 10/18] udp: Allow loopback connections from host using configured unicast address Stefano Brivio
2022-02-22  1:34 ` [PATCH 11/18] tcp, udp: Receive batching doesn't pay off when writing single frames to tap Stefano Brivio
2022-02-22  1:34 ` [PATCH 12/18] pasta: By default, quit if filesystem-bound net namespace goes away Stefano Brivio
2022-02-22  1:34 ` [PATCH 13/18] test/distro/ubuntu: Use DEBIAN_FRONTEND=noninteractive for apt on 22.04 Stefano Brivio
2022-02-22  1:34 ` [PATCH 14/18] test/perf/passt_udp: Drop threshold for 256B test Stefano Brivio
2022-02-22  1:34 ` [PATCH 15/18] man page: Update REPORTING BUGS section Stefano Brivio
2022-02-22  1:34 ` [PATCH 16/18] README, hooks: Build HTML man page on push, add a link Stefano Brivio
2022-02-22  1:34 ` [PATCH 17/18] contrib: Add patch for Podman integration Stefano Brivio
2022-02-22  1:34 ` [PATCH 18/18] test: Add demo for Podman with pasta Stefano Brivio
2022-02-22  9:07 ` [PATCH 00/18] slirp4netns, sandboxing, Podman integration, assorted fixes Stefano Brivio

Code repositories for project(s) associated with this public inbox

	https://passt.top/passt
