public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
* [PATCH 0/4] Passt Interface Identifiers
@ 2023-10-09  8:30 David Gibson
  2023-10-09  8:30 ` [PATCH 1/4] udp: Clean up ref initialisation in udp_sock_init() David Gibson
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: David Gibson @ 2023-10-09  8:30 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

For the generalisations to forwarding we want to accomplish with the
flow table, we will need a more formalised notion of passt
"interfaces" - that is way that passt communicates with network,
whether that be over L4 sockets (as on the host) or via an L2 tunnel.

This series makes a small start on this, by introducing a type to
identify passt interfaces.  We don't use it for a whole lot yet, but
it will become more useful in future.

David Gibson (4):
  udp: Clean up ref initialisation in udp_sock_init()
  pif: Introduce notion of passt/pasta interface
  pif: Record originating pif in listening socket refs
  pif: Pass originating pif to tap handler functions

 Makefile     |  2 +-
 icmp.c       |  4 +++-
 icmp.h       |  4 ++--
 passt.h      |  1 +
 pif.h        | 36 ++++++++++++++++++++++++++++++++++++
 tap.c        | 26 ++++++++++++++++----------
 tcp.c        | 11 ++++++++---
 tcp.h        |  7 ++++---
 tcp_splice.c | 10 ++++++----
 udp.c        | 35 ++++++++++++++++++-----------------
 udp.h        | 11 ++++++-----
 11 files changed, 101 insertions(+), 46 deletions(-)
 create mode 100644 pif.h

-- 
2.41.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/4] udp: Clean up ref initialisation in udp_sock_init()
  2023-10-09  8:30 [PATCH 0/4] Passt Interface Identifiers David Gibson
@ 2023-10-09  8:30 ` David Gibson
  2023-10-09  8:30 ` [PATCH 2/4] pif: Introduce notion of passt/pasta interface David Gibson
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: David Gibson @ 2023-10-09  8:30 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

udp_sock_init() has a number of paths that initialise uref differently.
However some of the fields are initialised the same way in all of them.
Move those fields into the original initialiser to save a few lines.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 udp.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/udp.c b/udp.c
index cadf393..db35859 100644
--- a/udp.c
+++ b/udp.c
@@ -969,7 +969,8 @@ int udp_tap_handler(struct ctx *c, int af, const void *saddr, const void *daddr,
 int udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
 		  const void *addr, const char *ifname, in_port_t port)
 {
-	union udp_epoll_ref uref = { .u32 = 0 };
+	union udp_epoll_ref uref = { .splice = (c->mode == MODE_PASTA),
+				     .orig = true };
 	int s, r4 = FD_REF_MAX + 1, r6 = FD_REF_MAX + 1;
 
 	if (ns) {
@@ -980,8 +981,6 @@ int udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
 
 	if ((af == AF_INET || af == AF_UNSPEC) && c->ifi4) {
 		uref.v6 = 0;
-		uref.splice = (c->mode == MODE_PASTA);
-		uref.orig = true;
 
 		if (!ns) {
 			r4 = s = sock_l4(c, AF_INET, IPPROTO_UDP, addr,
@@ -1001,8 +1000,6 @@ int udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
 
 	if ((af == AF_INET6 || af == AF_UNSPEC) && c->ifi6) {
 		uref.v6 = 1;
-		uref.splice = (c->mode == MODE_PASTA);
-		uref.orig = true;
 
 		if (!ns) {
 			r6 = s = sock_l4(c, AF_INET6, IPPROTO_UDP, addr,
-- 
@@ -969,7 +969,8 @@ int udp_tap_handler(struct ctx *c, int af, const void *saddr, const void *daddr,
 int udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
 		  const void *addr, const char *ifname, in_port_t port)
 {
-	union udp_epoll_ref uref = { .u32 = 0 };
+	union udp_epoll_ref uref = { .splice = (c->mode == MODE_PASTA),
+				     .orig = true };
 	int s, r4 = FD_REF_MAX + 1, r6 = FD_REF_MAX + 1;
 
 	if (ns) {
@@ -980,8 +981,6 @@ int udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
 
 	if ((af == AF_INET || af == AF_UNSPEC) && c->ifi4) {
 		uref.v6 = 0;
-		uref.splice = (c->mode == MODE_PASTA);
-		uref.orig = true;
 
 		if (!ns) {
 			r4 = s = sock_l4(c, AF_INET, IPPROTO_UDP, addr,
@@ -1001,8 +1000,6 @@ int udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
 
 	if ((af == AF_INET6 || af == AF_UNSPEC) && c->ifi6) {
 		uref.v6 = 1;
-		uref.splice = (c->mode == MODE_PASTA);
-		uref.orig = true;
 
 		if (!ns) {
 			r6 = s = sock_l4(c, AF_INET6, IPPROTO_UDP, addr,
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/4] pif: Introduce notion of passt/pasta interface
  2023-10-09  8:30 [PATCH 0/4] Passt Interface Identifiers David Gibson
  2023-10-09  8:30 ` [PATCH 1/4] udp: Clean up ref initialisation in udp_sock_init() David Gibson
@ 2023-10-09  8:30 ` David Gibson
  2023-10-09  8:30 ` [PATCH 3/4] pif: Record originating pif in listening socket refs David Gibson
  2023-10-09  8:30 ` [PATCH 4/4] pif: Pass originating pif to tap handler functions David Gibson
  3 siblings, 0 replies; 7+ messages in thread
From: David Gibson @ 2023-10-09  8:30 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

We have several possible ways of communicating with other entities.  We use
sockets to communicate with the host and other network sites, but also in
a different context to communicate "spliced" channels to a namespace.  We
also use a tuntap device or a qemu socket to communicate with the namespace
or guest.

For the time being these are just defined implicitly by how we structure
things.  However, there are other communication channels we want to use in
future (e.g. virtio-user), and we want to allow more flexible forwarding
between those.  To accomplish that we're going to want a specific way of
referring to those channels.

Introduce the concept of a "passt/pasta interface" or "pif" representing a
specific channel to communicate network data.  Each pif is assumed to be
associated with a specific network namespace in the broad sense (that is
as a place where IP addresses have a consistent meaning - not the Linux
specific sense).  But there could be multiple pifs communicating with the
same namespace (e.g. the spliced and tap interfaces in pasta).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 Makefile |  2 +-
 pif.h    | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 37 insertions(+), 1 deletion(-)
 create mode 100644 pif.h

diff --git a/Makefile b/Makefile
index 0324fdd..55972b1 100644
--- a/Makefile
+++ b/Makefile
@@ -53,7 +53,7 @@ MANPAGES = passt.1 pasta.1 qrap.1
 
 PASST_HEADERS = arch.h arp.h checksum.h conf.h dhcp.h dhcpv6.h icmp.h \
 	inany.h isolation.h lineread.h log.h ndp.h netlink.h packet.h passt.h \
-	pasta.h pcap.h port_fwd.h siphash.h tap.h tcp.h tcp_conn.h \
+	pasta.h pcap.h pif.h port_fwd.h siphash.h tap.h tcp.h tcp_conn.h \
 	tcp_splice.h udp.h util.h
 HEADERS = $(PASST_HEADERS) seccomp.h
 
diff --git a/pif.h b/pif.h
new file mode 100644
index 0000000..edd4fee
--- /dev/null
+++ b/pif.h
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later
+ * Copyright Red Hat
+ * Author: David Gibson <david@gibson.dropbear.id.au>
+ *
+ * Passt/pasta interface types and IDs
+ */
+#ifndef PIF_H
+#define PIF_H
+
+/**
+ * enum pif_type - Type of passt/pasta interface ("pif")
+ *
+ * Pifs can be an L4 level channel (sockets) or an L2 level channel (tap device
+ * or qemu socket).
+ */
+enum pif_type {
+	/* Invalid or not present pif */
+	PIF_NONE = 0,
+	/* Host socket interface */
+	PIF_HOST,
+	/* Qemu socket or namespace tuntap interface */
+	PIF_TAP,
+	/* Namespace socket interface for splicing */
+	PIF_SPLICE,
+};
+	
+/**
+ * pif_id - Index for passt/pasta interface ("pif")
+ *
+ * This represents a specific instance of an interface.  For now, we don't have
+ * multiple instances of the same type, so we use the same values as for
+ * pif_type here.  That might change in future.
+ */
+typedef uint8_t pif_id;
+
+#endif /* PIF_H */
-- 
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later
+ * Copyright Red Hat
+ * Author: David Gibson <david@gibson.dropbear.id.au>
+ *
+ * Passt/pasta interface types and IDs
+ */
+#ifndef PIF_H
+#define PIF_H
+
+/**
+ * enum pif_type - Type of passt/pasta interface ("pif")
+ *
+ * Pifs can be an L4 level channel (sockets) or an L2 level channel (tap device
+ * or qemu socket).
+ */
+enum pif_type {
+	/* Invalid or not present pif */
+	PIF_NONE = 0,
+	/* Host socket interface */
+	PIF_HOST,
+	/* Qemu socket or namespace tuntap interface */
+	PIF_TAP,
+	/* Namespace socket interface for splicing */
+	PIF_SPLICE,
+};
+	
+/**
+ * pif_id - Index for passt/pasta interface ("pif")
+ *
+ * This represents a specific instance of an interface.  For now, we don't have
+ * multiple instances of the same type, so we use the same values as for
+ * pif_type here.  That might change in future.
+ */
+typedef uint8_t pif_id;
+
+#endif /* PIF_H */
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 3/4] pif: Record originating pif in listening socket refs
  2023-10-09  8:30 [PATCH 0/4] Passt Interface Identifiers David Gibson
  2023-10-09  8:30 ` [PATCH 1/4] udp: Clean up ref initialisation in udp_sock_init() David Gibson
  2023-10-09  8:30 ` [PATCH 2/4] pif: Introduce notion of passt/pasta interface David Gibson
@ 2023-10-09  8:30 ` David Gibson
  2023-11-03  9:47   ` Stefano Brivio
  2023-10-09  8:30 ` [PATCH 4/4] pif: Pass originating pif to tap handler functions David Gibson
  3 siblings, 1 reply; 7+ messages in thread
From: David Gibson @ 2023-10-09  8:30 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

For certain socket types, we record in the epoll ref whether they're
sockets in the namespace, or on the host.  We now have the notion of "pif"
to indicate what "place" a socket is associated with, so generalise the
simple one-bit 'ns' to a pif id.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 passt.h      |  1 +
 tcp.c        |  5 +++--
 tcp.h        |  4 ++--
 tcp_splice.c | 10 ++++++----
 udp.c        | 23 ++++++++++++-----------
 udp.h        |  8 ++++----
 6 files changed, 28 insertions(+), 23 deletions(-)

diff --git a/passt.h b/passt.h
index 53defa4..cac720a 100644
--- a/passt.h
+++ b/passt.h
@@ -35,6 +35,7 @@ union epoll_ref;
 #include <assert.h>
 #include <sys/epoll.h>
 
+#include "pif.h"
 #include "packet.h"
 #include "icmp.h"
 #include "port_fwd.h"
diff --git a/tcp.c b/tcp.c
index c13b7de..bad8c38 100644
--- a/tcp.c
+++ b/tcp.c
@@ -2964,6 +2964,7 @@ static int tcp_sock_init_af(const struct ctx *c, int af, in_port_t port,
 {
 	union tcp_listen_epoll_ref tref = {
 		.port = port + c->tcp.fwd_in.delta[port],
+		.pif = PIF_HOST,
 	};
 	int s;
 
@@ -3025,7 +3026,7 @@ static void tcp_ns_sock_init4(const struct ctx *c, in_port_t port)
 {
 	union tcp_listen_epoll_ref tref = {
 		.port = port + c->tcp.fwd_out.delta[port],
-		.ns = true,
+		.pif = PIF_SPLICE,
 	};
 	struct in_addr loopback = { htonl(INADDR_LOOPBACK) };
 	int s;
@@ -3051,7 +3052,7 @@ static void tcp_ns_sock_init6(const struct ctx *c, in_port_t port)
 {
 	union tcp_listen_epoll_ref tref = {
 		.port = port + c->tcp.fwd_out.delta[port],
-		.ns = true,
+		.pif = PIF_SPLICE,
 	};
 	int s;
 
diff --git a/tcp.h b/tcp.h
index 6444d6a..3f6937c 100644
--- a/tcp.h
+++ b/tcp.h
@@ -41,13 +41,13 @@ union tcp_epoll_ref {
 /**
  * union tcp_listen_epoll_ref - epoll reference portion for TCP listening
  * @port:	Port number we're forwarding *to* (listening port plus delta)
- * @ns:		True if listening within the pasta namespace
+ * @pif:	Pif in which the socket is listening
  * @u32:	Opaque u32 value of reference
  */
 union tcp_listen_epoll_ref {
 	struct {
 		in_port_t	port;
-		bool		ns;
+		pif_id		pif;
 	};
 	uint32_t u32;
 };
diff --git a/tcp_splice.c b/tcp_splice.c
index 54fc317..19f5406 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -411,12 +411,12 @@ static int tcp_splice_connect(const struct ctx *c, struct tcp_splice_conn *conn,
  * @c:		Execution context
  * @conn:	Connection pointer
  * @port:	Destination port, host order
- * @outbound:	Connection request coming from namespace
+ * @pif:	Originating pif of the splice
  *
  * Return: return code from connect()
  */
 static int tcp_splice_new(const struct ctx *c, struct tcp_splice_conn *conn,
-			  in_port_t port, int outbound)
+			  in_port_t port, pif_id pif)
 {
 	int s = -1;
 
@@ -427,7 +427,7 @@ static int tcp_splice_new(const struct ctx *c, struct tcp_splice_conn *conn,
 	 * entering the ns anyway, so we might as well refill the
 	 * pool.
 	 */
-	if (outbound) {
+	if (pif == PIF_SPLICE) {
 		int *p = CONN_V6(conn) ? init_sock_pool6 : init_sock_pool4;
 		int af = CONN_V6(conn) ? AF_INET6 : AF_INET;
 
@@ -437,6 +437,8 @@ static int tcp_splice_new(const struct ctx *c, struct tcp_splice_conn *conn,
 	} else {
 		int *p = CONN_V6(conn) ? ns_sock_pool6 : ns_sock_pool4;
 
+		ASSERT(pif == PIF_HOST);
+
 		/* If pool is empty, refill it first */
 		if (p[TCP_SOCK_POOL_SIZE-1] < 0)
 			NS_CALL(tcp_sock_refill_ns, c);
@@ -516,7 +518,7 @@ bool tcp_splice_conn_from_sock(const struct ctx *c,
 	conn->c.spliced = true;
 	conn->a = s;
 
-	if (tcp_splice_new(c, conn, ref.port, ref.ns))
+	if (tcp_splice_new(c, conn, ref.port, ref.pif))
 		conn_flag(c, conn, CLOSING);
 
 	return true;
diff --git a/udp.c b/udp.c
index db35859..ce02979 100644
--- a/udp.c
+++ b/udp.c
@@ -365,7 +365,7 @@ static void udp_sock6_iov_init(const struct ctx *c)
  * @c:		Execution context
  * @v6:		Set for IPv6 sockets
  * @src:	Source port of original connection, host order
- * @splice:	UDP_BACK_TO_INIT from init, UDP_BACK_TO_NS from namespace
+ * @ns:		Does the splice originate in the ns or not
  *
  * Return: prepared socket, negative error code on failure
  *
@@ -375,16 +375,17 @@ int udp_splice_new(const struct ctx *c, int v6, in_port_t src, bool ns)
 {
 	struct epoll_event ev = { .events = EPOLLIN | EPOLLRDHUP | EPOLLHUP };
 	union epoll_ref ref = { .type = EPOLL_TYPE_UDP,
-				.udp = { .splice = true, .ns = ns,
-					 .v6 = v6, .port = src }
+				.udp = { .splice = true, .v6 = v6, .port = src }
 			      };
 	struct udp_splice_port *sp;
 	int act, s;
 
 	if (ns) {
+		ref.udp.pif = PIF_SPLICE;
 		sp = &udp_splice_ns[v6 ? V6 : V4][src];
 		act = UDP_ACT_SPLICE_NS;
 	} else {
+		ref.udp.pif = PIF_HOST;
 		sp = &udp_splice_init[v6 ? V6 : V4][src];
 		act = UDP_ACT_SPLICE_INIT;
 	}
@@ -495,15 +496,15 @@ static int udp_mmh_splice_port(bool v6, const struct mmsghdr *mmh)
  * @n:		Number of datagrams to send
  * @src:	Datagrams will be sent from this port (on origin side)
  * @dst:	Datagrams will be send to this port (on destination side)
+ * @from_pif:	Pif from which the packet originated
  * @v6:		Send as IPv6?
- * @from_ns:	If true send from pasta ns to init, otherwise reverse
  * @allow_new:	If true create sending socket if needed, if false discard
  *              if no sending socket is available
  * @now:	Timestamp
  */
 static void udp_splice_sendfrom(const struct ctx *c, unsigned start, unsigned n,
-				in_port_t src, in_port_t dst,
-				bool v6, bool from_ns, bool allow_new,
+				in_port_t src, in_port_t dst, pif_id from_pif,
+				bool v6, bool allow_new,
 				const struct timespec *now)
 {
 	struct mmsghdr *mmh_recv, *mmh_send;
@@ -518,7 +519,7 @@ static void udp_splice_sendfrom(const struct ctx *c, unsigned start, unsigned n,
 		mmh_send = udp4_mh_splice;
 	}
 
-	if (from_ns) {
+	if (from_pif == PIF_SPLICE) {
 		src += c->udp.fwd_in.rdelta[src];
 		s = udp_splice_init[v6][src].sock;
 		if (!s && allow_new)
@@ -530,6 +531,7 @@ static void udp_splice_sendfrom(const struct ctx *c, unsigned start, unsigned n,
 		udp_splice_ns[v6][dst].ts = now->tv_sec;
 		udp_splice_init[v6][src].ts = now->tv_sec;
 	} else {
+		ASSERT(from_pif == PIF_HOST);
 		src += c->udp.fwd_out.rdelta[src];
 		s = udp_splice_ns[v6][src].sock;
 		if (!s && allow_new) {
@@ -776,7 +778,7 @@ void udp_sock_handler(const struct ctx *c, union epoll_ref ref, uint32_t events,
 
 		if (splicefrom >= 0)
 			udp_splice_sendfrom(c, i, m, splicefrom, dstport,
-					    v6, ref.udp.ns, ref.udp.orig, now);
+					    ref.udp.pif, v6, ref.udp.orig, now);
 		else
 			udp_tap_send(c, i, m, dstport, v6, now);
 	}
@@ -974,8 +976,10 @@ int udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
 	int s, r4 = FD_REF_MAX + 1, r6 = FD_REF_MAX + 1;
 
 	if (ns) {
+		uref.pif = PIF_SPLICE;
 		uref.port = (in_port_t)(port + c->udp.fwd_out.f.delta[port]);
 	} else {
+		uref.pif = PIF_HOST;
 		uref.port = (in_port_t)(port + c->udp.fwd_in.f.delta[port]);
 	}
 
@@ -990,7 +994,6 @@ int udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
 			udp_splice_init[V4][port].sock = s < 0 ? -1 : s;
 		} else {
 			struct in_addr loopback = { htonl(INADDR_LOOPBACK) };
-			uref.ns = true;
 
 			r4 = s = sock_l4(c, AF_INET, IPPROTO_UDP, &loopback,
 					 ifname, port, uref.u32);
@@ -1008,8 +1011,6 @@ int udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
 			udp_tap_map[V6][uref.port].sock = s < 0 ? -1 : s;
 			udp_splice_init[V6][port].sock = s < 0 ? -1 : s;
 		} else {
-			uref.ns = true;
-
 			r6 = s = sock_l4(c, AF_INET6, IPPROTO_UDP,
 					 &in6addr_loopback,
 					 ifname, port, uref.u32);
diff --git a/udp.h b/udp.h
index 7837134..122a3d9 100644
--- a/udp.h
+++ b/udp.h
@@ -20,21 +20,21 @@ void udp_update_l2_buf(const unsigned char *eth_d, const unsigned char *eth_s);
 
 /**
  * union udp_epoll_ref - epoll reference portion for TCP connections
+ * @port:		Source port for connected sockets, bound port otherwise
+ * @pif:		Pif for this socket
  * @bound:		Set if this file descriptor is a bound socket
  * @splice:		Set if descriptor packets to be "spliced"
  * @orig:		Set if a spliced socket which can originate "connections"
- * @ns:			Set if this is a socket in the pasta network namespace
  * @v6:			Set for IPv6 sockets or connections
- * @port:		Source port for connected sockets, bound port otherwise
  * @u32:		Opaque u32 value of reference
  */
 union udp_epoll_ref {
 	struct {
+		in_port_t	port;
+		pif_id		pif;
 		bool		splice:1,
 				orig:1,
-				ns:1,
 				v6:1;
-		uint32_t	port:16;
 	};
 	uint32_t u32;
 };
-- 
@@ -20,21 +20,21 @@ void udp_update_l2_buf(const unsigned char *eth_d, const unsigned char *eth_s);
 
 /**
  * union udp_epoll_ref - epoll reference portion for TCP connections
+ * @port:		Source port for connected sockets, bound port otherwise
+ * @pif:		Pif for this socket
  * @bound:		Set if this file descriptor is a bound socket
  * @splice:		Set if descriptor packets to be "spliced"
  * @orig:		Set if a spliced socket which can originate "connections"
- * @ns:			Set if this is a socket in the pasta network namespace
  * @v6:			Set for IPv6 sockets or connections
- * @port:		Source port for connected sockets, bound port otherwise
  * @u32:		Opaque u32 value of reference
  */
 union udp_epoll_ref {
 	struct {
+		in_port_t	port;
+		pif_id		pif;
 		bool		splice:1,
 				orig:1,
-				ns:1,
 				v6:1;
-		uint32_t	port:16;
 	};
 	uint32_t u32;
 };
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 4/4] pif: Pass originating pif to tap handler functions
  2023-10-09  8:30 [PATCH 0/4] Passt Interface Identifiers David Gibson
                   ` (2 preceding siblings ...)
  2023-10-09  8:30 ` [PATCH 3/4] pif: Record originating pif in listening socket refs David Gibson
@ 2023-10-09  8:30 ` David Gibson
  3 siblings, 0 replies; 7+ messages in thread
From: David Gibson @ 2023-10-09  8:30 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

For now, packets passed to the various *_tap_handler() functions always
come from the single "tap" interface.  We want to allow the possibility to
broaden that in future.  As preparation for that, have the code in tap.c
pass the pif_id of the originating interface to each of those handler
functions.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 icmp.c |  4 +++-
 icmp.h |  4 ++--
 tap.c  | 26 ++++++++++++++++----------
 tcp.c  |  6 +++++-
 tcp.h  |  3 ++-
 udp.c  |  5 ++++-
 udp.h  |  3 ++-
 7 files changed, 34 insertions(+), 17 deletions(-)

diff --git a/icmp.c b/icmp.c
index 41b9f8b..da32ee8 100644
--- a/icmp.c
+++ b/icmp.c
@@ -148,6 +148,7 @@ void icmpv6_sock_handler(const struct ctx *c, union epoll_ref ref)
 /**
  * icmp_tap_handler() - Handle packets from tap
  * @c:		Execution context
+ * @pif:	Pif on which the packet is arriving
  * @af:		Address family, AF_INET or AF_INET6
  * @saddr:	Source address
  * @daddr:	Destination address
@@ -156,13 +157,14 @@ void icmpv6_sock_handler(const struct ctx *c, union epoll_ref ref)
  *
  * Return: count of consumed packets (always 1, even if malformed)
  */
-int icmp_tap_handler(const struct ctx *c, int af,
+int icmp_tap_handler(const struct ctx *c, pif_id pif, int af,
 		     const void *saddr, const void *daddr,
 		     const struct pool *p, const struct timespec *now)
 {
 	size_t plen;
 
 	(void)saddr;
+	(void)pif;
 
 	if (af == AF_INET) {
 		struct sockaddr_in sa = {
diff --git a/icmp.h b/icmp.h
index 00d10ea..c6c4b8f 100644
--- a/icmp.h
+++ b/icmp.h
@@ -12,8 +12,8 @@ struct ctx;
 
 void icmp_sock_handler(const struct ctx *c, union epoll_ref ref);
 void icmpv6_sock_handler(const struct ctx *c, union epoll_ref ref);
-int icmp_tap_handler(const struct ctx *c,
-		     int af, const void *saddr, const void *daddr,
+int icmp_tap_handler(const struct ctx *c, pif_id pif, int af,
+		     const void *saddr, const void *daddr,
 		     const struct pool *p, const struct timespec *now);
 void icmp_timer(const struct ctx *c, const struct timespec *ts);
 void icmp_init(void);
diff --git a/tap.c b/tap.c
index a7f6d8b..3a938f3 100644
--- a/tap.c
+++ b/tap.c
@@ -645,7 +645,8 @@ resume:
 			tap_packet_debug(iph, NULL, NULL, 0, NULL, 1);
 
 			packet_add(pkt, l4_len, l4h);
-			icmp_tap_handler(c, AF_INET, &iph->saddr, &iph->daddr,
+			icmp_tap_handler(c, PIF_TAP, AF_INET,
+					 &iph->saddr, &iph->daddr,
 					 pkt, now);
 			continue;
 		}
@@ -719,14 +720,16 @@ append:
 			if (c->no_tcp)
 				continue;
 			for (k = 0; k < p->count; )
-				k += tcp_tap_handler(c, AF_INET, &seq->saddr,
-						     &seq->daddr, p, k, now);
+				k += tcp_tap_handler(c, PIF_TAP, AF_INET,
+						     &seq->saddr, &seq->daddr,
+						     p, k, now);
 		} else if (seq->protocol == IPPROTO_UDP) {
 			if (c->no_udp)
 				continue;
 			for (k = 0; k < p->count; )
-				k += udp_tap_handler(c, AF_INET, &seq->saddr,
-						     &seq->daddr, p, k, now);
+				k += udp_tap_handler(c, PIF_TAP, AF_INET,
+						     &seq->saddr, &seq->daddr,
+						     p, k, now);
 		}
 	}
 
@@ -807,7 +810,8 @@ resume:
 			tap_packet_debug(NULL, ip6h, NULL, proto, NULL, 1);
 
 			packet_add(pkt, l4_len, l4h);
-			icmp_tap_handler(c, AF_INET6, saddr, daddr, pkt, now);
+			icmp_tap_handler(c, PIF_TAP, AF_INET6,
+					 saddr, daddr, pkt, now);
 			continue;
 		}
 
@@ -883,14 +887,16 @@ append:
 			if (c->no_tcp)
 				continue;
 			for (k = 0; k < p->count; )
-				k += tcp_tap_handler(c, AF_INET6, &seq->saddr,
-						     &seq->daddr, p, k, now);
+				k += tcp_tap_handler(c, PIF_TAP, AF_INET6,
+						     &seq->saddr, &seq->daddr,
+						     p, k, now);
 		} else if (seq->protocol == IPPROTO_UDP) {
 			if (c->no_udp)
 				continue;
 			for (k = 0; k < p->count; )
-				k += udp_tap_handler(c, AF_INET6, &seq->saddr,
-						     &seq->daddr, p, k, now);
+				k += udp_tap_handler(c, PIF_TAP, AF_INET6,
+						     &seq->saddr, &seq->daddr,
+						     p, k, now);
 		}
 	}
 
diff --git a/tcp.c b/tcp.c
index bad8c38..a921dcd 100644
--- a/tcp.c
+++ b/tcp.c
@@ -2562,6 +2562,7 @@ static void tcp_conn_from_sock_finish(struct ctx *c, struct tcp_tap_conn *conn,
 /**
  * tcp_tap_handler() - Handle packets from tap and state transitions
  * @c:		Execution context
+ * @pif:	Pif on which the packet is arriving
  * @af:		Address family, AF_INET or AF_INET6
  * @saddr:	Source address
  * @daddr:	Destination address
@@ -2571,7 +2572,8 @@ static void tcp_conn_from_sock_finish(struct ctx *c, struct tcp_tap_conn *conn,
  *
  * Return: count of consumed packets
  */
-int tcp_tap_handler(struct ctx *c, int af, const void *saddr, const void *daddr,
+int tcp_tap_handler(struct ctx *c, pif_id pif, int af,
+		    const void *saddr, const void *daddr,
 		    const struct pool *p, int idx, const struct timespec *now)
 {
 	struct tcp_tap_conn *conn;
@@ -2581,6 +2583,8 @@ int tcp_tap_handler(struct ctx *c, int af, const void *saddr, const void *daddr,
 	char *opts;
 	int count;
 
+	(void)pif;
+
 	th = packet_get(p, idx, 0, sizeof(*th), &len);
 	if (!th)
 		return 1;
diff --git a/tcp.h b/tcp.h
index 3f6937c..3872c16 100644
--- a/tcp.h
+++ b/tcp.h
@@ -17,7 +17,8 @@ void tcp_timer_handler(struct ctx *c, union epoll_ref ref);
 void tcp_listen_handler(struct ctx *c, union epoll_ref ref,
 			const struct timespec *now);
 void tcp_sock_handler(struct ctx *c, union epoll_ref ref, uint32_t events);
-int tcp_tap_handler(struct ctx *c, int af, const void *saddr, const void *daddr,
+int tcp_tap_handler(struct ctx *c, pif_id pif, int af,
+		    const void *saddr, const void *daddr,
 		    const struct pool *p, int idx, const struct timespec *now);
 int tcp_sock_init(const struct ctx *c, sa_family_t af, const void *addr,
 		  const char *ifname, in_port_t port);
diff --git a/udp.c b/udp.c
index ce02979..d838b70 100644
--- a/udp.c
+++ b/udp.c
@@ -787,6 +787,7 @@ void udp_sock_handler(const struct ctx *c, union epoll_ref ref, uint32_t events,
 /**
  * udp_tap_handler() - Handle packets from tap
  * @c:		Execution context
+ * @pif:	Pif on which the packet is arriving
  * @af:		Address family, AF_INET or AF_INET6
  * @saddr:	Source address
  * @daddr:	Destination address
@@ -798,7 +799,8 @@ void udp_sock_handler(const struct ctx *c, union epoll_ref ref, uint32_t events,
  *
  * #syscalls sendmmsg
  */
-int udp_tap_handler(struct ctx *c, int af, const void *saddr, const void *daddr,
+int udp_tap_handler(struct ctx *c, pif_id pif,
+		    int af, const void *saddr, const void *daddr,
 		    const struct pool *p, int idx, const struct timespec *now)
 {
 	struct mmsghdr mm[UIO_MAXIOV];
@@ -813,6 +815,7 @@ int udp_tap_handler(struct ctx *c, int af, const void *saddr, const void *daddr,
 
 	(void)c;
 	(void)saddr;
+	(void)pif;
 
 	uh = packet_get(p, idx, 0, sizeof(*uh), NULL);
 	if (!uh)
diff --git a/udp.h b/udp.h
index 122a3d9..d41932b 100644
--- a/udp.h
+++ b/udp.h
@@ -10,7 +10,8 @@
 
 void udp_sock_handler(const struct ctx *c, union epoll_ref ref, uint32_t events,
 		      const struct timespec *now);
-int udp_tap_handler(struct ctx *c, int af, const void *saddr, const void *daddr,
+int udp_tap_handler(struct ctx *c, pif_id pif, int af,
+		    const void *saddr, const void *daddr,
 		    const struct pool *p, int idx, const struct timespec *now);
 int udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
 		  const void *addr, const char *ifname, in_port_t port);
-- 
@@ -10,7 +10,8 @@
 
 void udp_sock_handler(const struct ctx *c, union epoll_ref ref, uint32_t events,
 		      const struct timespec *now);
-int udp_tap_handler(struct ctx *c, int af, const void *saddr, const void *daddr,
+int udp_tap_handler(struct ctx *c, pif_id pif, int af,
+		    const void *saddr, const void *daddr,
 		    const struct pool *p, int idx, const struct timespec *now);
 int udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
 		  const void *addr, const char *ifname, in_port_t port);
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 3/4] pif: Record originating pif in listening socket refs
  2023-10-09  8:30 ` [PATCH 3/4] pif: Record originating pif in listening socket refs David Gibson
@ 2023-11-03  9:47   ` Stefano Brivio
  2023-11-04  2:03     ` David Gibson
  0 siblings, 1 reply; 7+ messages in thread
From: Stefano Brivio @ 2023-11-03  9:47 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Mon,  9 Oct 2023 19:30:12 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> For certain socket types, we record in the epoll ref whether they're
> sockets in the namespace, or on the host.  We now have the notion of "pif"
> to indicate what "place" a socket is associated with, so generalise the
> simple one-bit 'ns' to a pif id.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  passt.h      |  1 +
>  tcp.c        |  5 +++--
>  tcp.h        |  4 ++--
>  tcp_splice.c | 10 ++++++----
>  udp.c        | 23 ++++++++++++-----------
>  udp.h        |  8 ++++----
>  6 files changed, 28 insertions(+), 23 deletions(-)
> 
> diff --git a/passt.h b/passt.h
> index 53defa4..cac720a 100644
> --- a/passt.h
> +++ b/passt.h
> @@ -35,6 +35,7 @@ union epoll_ref;
>  #include <assert.h>
>  #include <sys/epoll.h>
>  
> +#include "pif.h"
>  #include "packet.h"
>  #include "icmp.h"
>  #include "port_fwd.h"
> diff --git a/tcp.c b/tcp.c
> index c13b7de..bad8c38 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -2964,6 +2964,7 @@ static int tcp_sock_init_af(const struct ctx *c, int af, in_port_t port,
>  {
>  	union tcp_listen_epoll_ref tref = {
>  		.port = port + c->tcp.fwd_in.delta[port],
> +		.pif = PIF_HOST,
>  	};
>  	int s;
>  
> @@ -3025,7 +3026,7 @@ static void tcp_ns_sock_init4(const struct ctx *c, in_port_t port)
>  {
>  	union tcp_listen_epoll_ref tref = {
>  		.port = port + c->tcp.fwd_out.delta[port],
> -		.ns = true,
> +		.pif = PIF_SPLICE,
>  	};
>  	struct in_addr loopback = { htonl(INADDR_LOOPBACK) };
>  	int s;
> @@ -3051,7 +3052,7 @@ static void tcp_ns_sock_init6(const struct ctx *c, in_port_t port)
>  {
>  	union tcp_listen_epoll_ref tref = {
>  		.port = port + c->tcp.fwd_out.delta[port],
> -		.ns = true,
> +		.pif = PIF_SPLICE,
>  	};
>  	int s;
>  
> diff --git a/tcp.h b/tcp.h
> index 6444d6a..3f6937c 100644
> --- a/tcp.h
> +++ b/tcp.h
> @@ -41,13 +41,13 @@ union tcp_epoll_ref {
>  /**
>   * union tcp_listen_epoll_ref - epoll reference portion for TCP listening
>   * @port:	Port number we're forwarding *to* (listening port plus delta)
> - * @ns:		True if listening within the pasta namespace
> + * @pif:	Pif in which the socket is listening
>   * @u32:	Opaque u32 value of reference
>   */
>  union tcp_listen_epoll_ref {
>  	struct {
>  		in_port_t	port;
> -		bool		ns;
> +		pif_id		pif;
>  	};
>  	uint32_t u32;
>  };
> diff --git a/tcp_splice.c b/tcp_splice.c
> index 54fc317..19f5406 100644
> --- a/tcp_splice.c
> +++ b/tcp_splice.c
> @@ -411,12 +411,12 @@ static int tcp_splice_connect(const struct ctx *c, struct tcp_splice_conn *conn,
>   * @c:		Execution context
>   * @conn:	Connection pointer
>   * @port:	Destination port, host order
> - * @outbound:	Connection request coming from namespace
> + * @pif:	Originating pif of the splice
>   *
>   * Return: return code from connect()
>   */
>  static int tcp_splice_new(const struct ctx *c, struct tcp_splice_conn *conn,
> -			  in_port_t port, int outbound)
> +			  in_port_t port, pif_id pif)
>  {
>  	int s = -1;
>  
> @@ -427,7 +427,7 @@ static int tcp_splice_new(const struct ctx *c, struct tcp_splice_conn *conn,
>  	 * entering the ns anyway, so we might as well refill the
>  	 * pool.
>  	 */
> -	if (outbound) {
> +	if (pif == PIF_SPLICE) {
>  		int *p = CONN_V6(conn) ? init_sock_pool6 : init_sock_pool4;
>  		int af = CONN_V6(conn) ? AF_INET6 : AF_INET;
>  
> @@ -437,6 +437,8 @@ static int tcp_splice_new(const struct ctx *c, struct tcp_splice_conn *conn,
>  	} else {
>  		int *p = CONN_V6(conn) ? ns_sock_pool6 : ns_sock_pool4;
>  
> +		ASSERT(pif == PIF_HOST);
> +
>  		/* If pool is empty, refill it first */
>  		if (p[TCP_SOCK_POOL_SIZE-1] < 0)
>  			NS_CALL(tcp_sock_refill_ns, c);
> @@ -516,7 +518,7 @@ bool tcp_splice_conn_from_sock(const struct ctx *c,
>  	conn->c.spliced = true;
>  	conn->a = s;
>  
> -	if (tcp_splice_new(c, conn, ref.port, ref.ns))
> +	if (tcp_splice_new(c, conn, ref.port, ref.pif))
>  		conn_flag(c, conn, CLOSING);
>  
>  	return true;
> diff --git a/udp.c b/udp.c
> index db35859..ce02979 100644
> --- a/udp.c
> +++ b/udp.c
> @@ -365,7 +365,7 @@ static void udp_sock6_iov_init(const struct ctx *c)
>   * @c:		Execution context
>   * @v6:		Set for IPv6 sockets
>   * @src:	Source port of original connection, host order
> - * @splice:	UDP_BACK_TO_INIT from init, UDP_BACK_TO_NS from namespace
> + * @ns:		Does the splice originate in the ns or not
>   *
>   * Return: prepared socket, negative error code on failure
>   *
> @@ -375,16 +375,17 @@ int udp_splice_new(const struct ctx *c, int v6, in_port_t src, bool ns)
>  {
>  	struct epoll_event ev = { .events = EPOLLIN | EPOLLRDHUP | EPOLLHUP };
>  	union epoll_ref ref = { .type = EPOLL_TYPE_UDP,
> -				.udp = { .splice = true, .ns = ns,
> -					 .v6 = v6, .port = src }
> +				.udp = { .splice = true, .v6 = v6, .port = src }
>  			      };
>  	struct udp_splice_port *sp;
>  	int act, s;
>  
>  	if (ns) {
> +		ref.udp.pif = PIF_SPLICE;
>  		sp = &udp_splice_ns[v6 ? V6 : V4][src];
>  		act = UDP_ACT_SPLICE_NS;
>  	} else {
> +		ref.udp.pif = PIF_HOST;
>  		sp = &udp_splice_init[v6 ? V6 : V4][src];
>  		act = UDP_ACT_SPLICE_INIT;
>  	}
> @@ -495,15 +496,15 @@ static int udp_mmh_splice_port(bool v6, const struct mmsghdr *mmh)
>   * @n:		Number of datagrams to send
>   * @src:	Datagrams will be sent from this port (on origin side)
>   * @dst:	Datagrams will be send to this port (on destination side)
> + * @from_pif:	Pif from which the packet originated

"pif" isn't a word and not a proper acronym either... should we keep
this simpler and always spell it lowercase (we would do the same with,
say, "uint8_t") even at the beginning of a sentence? Not a strong
preference.

>   * @v6:		Send as IPv6?
> - * @from_ns:	If true send from pasta ns to init, otherwise reverse
>   * @allow_new:	If true create sending socket if needed, if false discard
>   *              if no sending socket is available
>   * @now:	Timestamp
>   */
>  static void udp_splice_sendfrom(const struct ctx *c, unsigned start, unsigned n,
> -				in_port_t src, in_port_t dst,
> -				bool v6, bool from_ns, bool allow_new,
> +				in_port_t src, in_port_t dst, pif_id from_pif,
> +				bool v6, bool allow_new,
>  				const struct timespec *now)
>  {
>  	struct mmsghdr *mmh_recv, *mmh_send;
> @@ -518,7 +519,7 @@ static void udp_splice_sendfrom(const struct ctx *c, unsigned start, unsigned n,
>  		mmh_send = udp4_mh_splice;
>  	}
>  
> -	if (from_ns) {
> +	if (from_pif == PIF_SPLICE) {
>  		src += c->udp.fwd_in.rdelta[src];
>  		s = udp_splice_init[v6][src].sock;
>  		if (!s && allow_new)
> @@ -530,6 +531,7 @@ static void udp_splice_sendfrom(const struct ctx *c, unsigned start, unsigned n,
>  		udp_splice_ns[v6][dst].ts = now->tv_sec;
>  		udp_splice_init[v6][src].ts = now->tv_sec;
>  	} else {
> +		ASSERT(from_pif == PIF_HOST);
>  		src += c->udp.fwd_out.rdelta[src];
>  		s = udp_splice_ns[v6][src].sock;
>  		if (!s && allow_new) {
> @@ -776,7 +778,7 @@ void udp_sock_handler(const struct ctx *c, union epoll_ref ref, uint32_t events,
>  
>  		if (splicefrom >= 0)
>  			udp_splice_sendfrom(c, i, m, splicefrom, dstport,
> -					    v6, ref.udp.ns, ref.udp.orig, now);
> +					    ref.udp.pif, v6, ref.udp.orig, now);
>  		else
>  			udp_tap_send(c, i, m, dstport, v6, now);
>  	}
> @@ -974,8 +976,10 @@ int udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
>  	int s, r4 = FD_REF_MAX + 1, r6 = FD_REF_MAX + 1;
>  
>  	if (ns) {
> +		uref.pif = PIF_SPLICE;
>  		uref.port = (in_port_t)(port + c->udp.fwd_out.f.delta[port]);
>  	} else {
> +		uref.pif = PIF_HOST;
>  		uref.port = (in_port_t)(port + c->udp.fwd_in.f.delta[port]);
>  	}
>  
> @@ -990,7 +994,6 @@ int udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
>  			udp_splice_init[V4][port].sock = s < 0 ? -1 : s;
>  		} else {
>  			struct in_addr loopback = { htonl(INADDR_LOOPBACK) };
> -			uref.ns = true;
>  
>  			r4 = s = sock_l4(c, AF_INET, IPPROTO_UDP, &loopback,
>  					 ifname, port, uref.u32);
> @@ -1008,8 +1011,6 @@ int udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
>  			udp_tap_map[V6][uref.port].sock = s < 0 ? -1 : s;
>  			udp_splice_init[V6][port].sock = s < 0 ? -1 : s;
>  		} else {
> -			uref.ns = true;
> -
>  			r6 = s = sock_l4(c, AF_INET6, IPPROTO_UDP,
>  					 &in6addr_loopback,
>  					 ifname, port, uref.u32);
> diff --git a/udp.h b/udp.h
> index 7837134..122a3d9 100644
> --- a/udp.h
> +++ b/udp.h
> @@ -20,21 +20,21 @@ void udp_update_l2_buf(const unsigned char *eth_d, const unsigned char *eth_s);
>  
>  /**
>   * union udp_epoll_ref - epoll reference portion for TCP connections
> + * @port:		Source port for connected sockets, bound port otherwise
> + * @pif:		Pif for this socket
>   * @bound:		Set if this file descriptor is a bound socket
>   * @splice:		Set if descriptor packets to be "spliced"
>   * @orig:		Set if a spliced socket which can originate "connections"
> - * @ns:			Set if this is a socket in the pasta network namespace
>   * @v6:			Set for IPv6 sockets or connections
> - * @port:		Source port for connected sockets, bound port otherwise
>   * @u32:		Opaque u32 value of reference
>   */
>  union udp_epoll_ref {
>  	struct {
> +		in_port_t	port;
> +		pif_id		pif;
>  		bool		splice:1,
>  				orig:1,
> -				ns:1,
>  				v6:1;
> -		uint32_t	port:16;

This is rather a comment to 2/4, but its application is more obvious
here: having 'uint8_t pif' would show clearly where it needs to be
placed in these types of struct, instead of hiding its type behind the
typedef.

The rest of the series looks good to me.

-- 
Stefano


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 3/4] pif: Record originating pif in listening socket refs
  2023-11-03  9:47   ` Stefano Brivio
@ 2023-11-04  2:03     ` David Gibson
  0 siblings, 0 replies; 7+ messages in thread
From: David Gibson @ 2023-11-04  2:03 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 10981 bytes --]

On Fri, Nov 03, 2023 at 10:47:36AM +0100, Stefano Brivio wrote:
> On Mon,  9 Oct 2023 19:30:12 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > For certain socket types, we record in the epoll ref whether they're
> > sockets in the namespace, or on the host.  We now have the notion of "pif"
> > to indicate what "place" a socket is associated with, so generalise the
> > simple one-bit 'ns' to a pif id.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  passt.h      |  1 +
> >  tcp.c        |  5 +++--
> >  tcp.h        |  4 ++--
> >  tcp_splice.c | 10 ++++++----
> >  udp.c        | 23 ++++++++++++-----------
> >  udp.h        |  8 ++++----
> >  6 files changed, 28 insertions(+), 23 deletions(-)
> > 
> > diff --git a/passt.h b/passt.h
> > index 53defa4..cac720a 100644
> > --- a/passt.h
> > +++ b/passt.h
> > @@ -35,6 +35,7 @@ union epoll_ref;
> >  #include <assert.h>
> >  #include <sys/epoll.h>
> >  
> > +#include "pif.h"
> >  #include "packet.h"
> >  #include "icmp.h"
> >  #include "port_fwd.h"
> > diff --git a/tcp.c b/tcp.c
> > index c13b7de..bad8c38 100644
> > --- a/tcp.c
> > +++ b/tcp.c
> > @@ -2964,6 +2964,7 @@ static int tcp_sock_init_af(const struct ctx *c, int af, in_port_t port,
> >  {
> >  	union tcp_listen_epoll_ref tref = {
> >  		.port = port + c->tcp.fwd_in.delta[port],
> > +		.pif = PIF_HOST,
> >  	};
> >  	int s;
> >  
> > @@ -3025,7 +3026,7 @@ static void tcp_ns_sock_init4(const struct ctx *c, in_port_t port)
> >  {
> >  	union tcp_listen_epoll_ref tref = {
> >  		.port = port + c->tcp.fwd_out.delta[port],
> > -		.ns = true,
> > +		.pif = PIF_SPLICE,
> >  	};
> >  	struct in_addr loopback = { htonl(INADDR_LOOPBACK) };
> >  	int s;
> > @@ -3051,7 +3052,7 @@ static void tcp_ns_sock_init6(const struct ctx *c, in_port_t port)
> >  {
> >  	union tcp_listen_epoll_ref tref = {
> >  		.port = port + c->tcp.fwd_out.delta[port],
> > -		.ns = true,
> > +		.pif = PIF_SPLICE,
> >  	};
> >  	int s;
> >  
> > diff --git a/tcp.h b/tcp.h
> > index 6444d6a..3f6937c 100644
> > --- a/tcp.h
> > +++ b/tcp.h
> > @@ -41,13 +41,13 @@ union tcp_epoll_ref {
> >  /**
> >   * union tcp_listen_epoll_ref - epoll reference portion for TCP listening
> >   * @port:	Port number we're forwarding *to* (listening port plus delta)
> > - * @ns:		True if listening within the pasta namespace
> > + * @pif:	Pif in which the socket is listening
> >   * @u32:	Opaque u32 value of reference
> >   */
> >  union tcp_listen_epoll_ref {
> >  	struct {
> >  		in_port_t	port;
> > -		bool		ns;
> > +		pif_id		pif;
> >  	};
> >  	uint32_t u32;
> >  };
> > diff --git a/tcp_splice.c b/tcp_splice.c
> > index 54fc317..19f5406 100644
> > --- a/tcp_splice.c
> > +++ b/tcp_splice.c
> > @@ -411,12 +411,12 @@ static int tcp_splice_connect(const struct ctx *c, struct tcp_splice_conn *conn,
> >   * @c:		Execution context
> >   * @conn:	Connection pointer
> >   * @port:	Destination port, host order
> > - * @outbound:	Connection request coming from namespace
> > + * @pif:	Originating pif of the splice
> >   *
> >   * Return: return code from connect()
> >   */
> >  static int tcp_splice_new(const struct ctx *c, struct tcp_splice_conn *conn,
> > -			  in_port_t port, int outbound)
> > +			  in_port_t port, pif_id pif)
> >  {
> >  	int s = -1;
> >  
> > @@ -427,7 +427,7 @@ static int tcp_splice_new(const struct ctx *c, struct tcp_splice_conn *conn,
> >  	 * entering the ns anyway, so we might as well refill the
> >  	 * pool.
> >  	 */
> > -	if (outbound) {
> > +	if (pif == PIF_SPLICE) {
> >  		int *p = CONN_V6(conn) ? init_sock_pool6 : init_sock_pool4;
> >  		int af = CONN_V6(conn) ? AF_INET6 : AF_INET;
> >  
> > @@ -437,6 +437,8 @@ static int tcp_splice_new(const struct ctx *c, struct tcp_splice_conn *conn,
> >  	} else {
> >  		int *p = CONN_V6(conn) ? ns_sock_pool6 : ns_sock_pool4;
> >  
> > +		ASSERT(pif == PIF_HOST);
> > +
> >  		/* If pool is empty, refill it first */
> >  		if (p[TCP_SOCK_POOL_SIZE-1] < 0)
> >  			NS_CALL(tcp_sock_refill_ns, c);
> > @@ -516,7 +518,7 @@ bool tcp_splice_conn_from_sock(const struct ctx *c,
> >  	conn->c.spliced = true;
> >  	conn->a = s;
> >  
> > -	if (tcp_splice_new(c, conn, ref.port, ref.ns))
> > +	if (tcp_splice_new(c, conn, ref.port, ref.pif))
> >  		conn_flag(c, conn, CLOSING);
> >  
> >  	return true;
> > diff --git a/udp.c b/udp.c
> > index db35859..ce02979 100644
> > --- a/udp.c
> > +++ b/udp.c
> > @@ -365,7 +365,7 @@ static void udp_sock6_iov_init(const struct ctx *c)
> >   * @c:		Execution context
> >   * @v6:		Set for IPv6 sockets
> >   * @src:	Source port of original connection, host order
> > - * @splice:	UDP_BACK_TO_INIT from init, UDP_BACK_TO_NS from namespace
> > + * @ns:		Does the splice originate in the ns or not
> >   *
> >   * Return: prepared socket, negative error code on failure
> >   *
> > @@ -375,16 +375,17 @@ int udp_splice_new(const struct ctx *c, int v6, in_port_t src, bool ns)
> >  {
> >  	struct epoll_event ev = { .events = EPOLLIN | EPOLLRDHUP | EPOLLHUP };
> >  	union epoll_ref ref = { .type = EPOLL_TYPE_UDP,
> > -				.udp = { .splice = true, .ns = ns,
> > -					 .v6 = v6, .port = src }
> > +				.udp = { .splice = true, .v6 = v6, .port = src }
> >  			      };
> >  	struct udp_splice_port *sp;
> >  	int act, s;
> >  
> >  	if (ns) {
> > +		ref.udp.pif = PIF_SPLICE;
> >  		sp = &udp_splice_ns[v6 ? V6 : V4][src];
> >  		act = UDP_ACT_SPLICE_NS;
> >  	} else {
> > +		ref.udp.pif = PIF_HOST;
> >  		sp = &udp_splice_init[v6 ? V6 : V4][src];
> >  		act = UDP_ACT_SPLICE_INIT;
> >  	}
> > @@ -495,15 +496,15 @@ static int udp_mmh_splice_port(bool v6, const struct mmsghdr *mmh)
> >   * @n:		Number of datagrams to send
> >   * @src:	Datagrams will be sent from this port (on origin side)
> >   * @dst:	Datagrams will be send to this port (on destination side)
> > + * @from_pif:	Pif from which the packet originated
> 
> "pif" isn't a word and not a proper acronym either... should we keep
> this simpler and always spell it lowercase (we would do the same with,
> say, "uint8_t") even at the beginning of a sentence? Not a strong
> preference.

Makes sense to me, changed throughout.

> >   * @v6:		Send as IPv6?
> > - * @from_ns:	If true send from pasta ns to init, otherwise reverse
> >   * @allow_new:	If true create sending socket if needed, if false discard
> >   *              if no sending socket is available
> >   * @now:	Timestamp
> >   */
> >  static void udp_splice_sendfrom(const struct ctx *c, unsigned start, unsigned n,
> > -				in_port_t src, in_port_t dst,
> > -				bool v6, bool from_ns, bool allow_new,
> > +				in_port_t src, in_port_t dst, pif_id from_pif,
> > +				bool v6, bool allow_new,
> >  				const struct timespec *now)
> >  {
> >  	struct mmsghdr *mmh_recv, *mmh_send;
> > @@ -518,7 +519,7 @@ static void udp_splice_sendfrom(const struct ctx *c, unsigned start, unsigned n,
> >  		mmh_send = udp4_mh_splice;
> >  	}
> >  
> > -	if (from_ns) {
> > +	if (from_pif == PIF_SPLICE) {
> >  		src += c->udp.fwd_in.rdelta[src];
> >  		s = udp_splice_init[v6][src].sock;
> >  		if (!s && allow_new)
> > @@ -530,6 +531,7 @@ static void udp_splice_sendfrom(const struct ctx *c, unsigned start, unsigned n,
> >  		udp_splice_ns[v6][dst].ts = now->tv_sec;
> >  		udp_splice_init[v6][src].ts = now->tv_sec;
> >  	} else {
> > +		ASSERT(from_pif == PIF_HOST);
> >  		src += c->udp.fwd_out.rdelta[src];
> >  		s = udp_splice_ns[v6][src].sock;
> >  		if (!s && allow_new) {
> > @@ -776,7 +778,7 @@ void udp_sock_handler(const struct ctx *c, union epoll_ref ref, uint32_t events,
> >  
> >  		if (splicefrom >= 0)
> >  			udp_splice_sendfrom(c, i, m, splicefrom, dstport,
> > -					    v6, ref.udp.ns, ref.udp.orig, now);
> > +					    ref.udp.pif, v6, ref.udp.orig, now);
> >  		else
> >  			udp_tap_send(c, i, m, dstport, v6, now);
> >  	}
> > @@ -974,8 +976,10 @@ int udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
> >  	int s, r4 = FD_REF_MAX + 1, r6 = FD_REF_MAX + 1;
> >  
> >  	if (ns) {
> > +		uref.pif = PIF_SPLICE;
> >  		uref.port = (in_port_t)(port + c->udp.fwd_out.f.delta[port]);
> >  	} else {
> > +		uref.pif = PIF_HOST;
> >  		uref.port = (in_port_t)(port + c->udp.fwd_in.f.delta[port]);
> >  	}
> >  
> > @@ -990,7 +994,6 @@ int udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
> >  			udp_splice_init[V4][port].sock = s < 0 ? -1 : s;
> >  		} else {
> >  			struct in_addr loopback = { htonl(INADDR_LOOPBACK) };
> > -			uref.ns = true;
> >  
> >  			r4 = s = sock_l4(c, AF_INET, IPPROTO_UDP, &loopback,
> >  					 ifname, port, uref.u32);
> > @@ -1008,8 +1011,6 @@ int udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
> >  			udp_tap_map[V6][uref.port].sock = s < 0 ? -1 : s;
> >  			udp_splice_init[V6][port].sock = s < 0 ? -1 : s;
> >  		} else {
> > -			uref.ns = true;
> > -
> >  			r6 = s = sock_l4(c, AF_INET6, IPPROTO_UDP,
> >  					 &in6addr_loopback,
> >  					 ifname, port, uref.u32);
> > diff --git a/udp.h b/udp.h
> > index 7837134..122a3d9 100644
> > --- a/udp.h
> > +++ b/udp.h
> > @@ -20,21 +20,21 @@ void udp_update_l2_buf(const unsigned char *eth_d, const unsigned char *eth_s);
> >  
> >  /**
> >   * union udp_epoll_ref - epoll reference portion for TCP connections
> > + * @port:		Source port for connected sockets, bound port otherwise
> > + * @pif:		Pif for this socket
> >   * @bound:		Set if this file descriptor is a bound socket
> >   * @splice:		Set if descriptor packets to be "spliced"
> >   * @orig:		Set if a spliced socket which can originate "connections"
> > - * @ns:			Set if this is a socket in the pasta network namespace
> >   * @v6:			Set for IPv6 sockets or connections
> > - * @port:		Source port for connected sockets, bound port otherwise
> >   * @u32:		Opaque u32 value of reference
> >   */
> >  union udp_epoll_ref {
> >  	struct {
> > +		in_port_t	port;
> > +		pif_id		pif;
> >  		bool		splice:1,
> >  				orig:1,
> > -				ns:1,
> >  				v6:1;
> > -		uint32_t	port:16;
> 
> This is rather a comment to 2/4, but its application is more obvious
> here: having 'uint8_t pif' would show clearly where it needs to be
> placed in these types of struct, instead of hiding its type behind the
> typedef.

Oof.  I have mixed thoughts about this.  In this instance, you're
right, that the typedef obscures what's going in here, in a packed
bitfield where it's pretty relevant.  In other places the type makes
it clearer what's valid information in that slot.  So I'm not really
sure which way to go.

> The rest of the series looks good to me.
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-11-04  2:03 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-09  8:30 [PATCH 0/4] Passt Interface Identifiers David Gibson
2023-10-09  8:30 ` [PATCH 1/4] udp: Clean up ref initialisation in udp_sock_init() David Gibson
2023-10-09  8:30 ` [PATCH 2/4] pif: Introduce notion of passt/pasta interface David Gibson
2023-10-09  8:30 ` [PATCH 3/4] pif: Record originating pif in listening socket refs David Gibson
2023-11-03  9:47   ` Stefano Brivio
2023-11-04  2:03     ` David Gibson
2023-10-09  8:30 ` [PATCH 4/4] pif: Pass originating pif to tap handler functions David Gibson

Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).