public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
* [PATCH 0/7] Assorted small fixes for UDP
@ 2024-02-21 23:21 David Gibson
  2024-02-21 23:21 ` [PATCH 1/7] udp: Don't attempt to translate a 0.0.0.0 source address David Gibson
                   ` (6 more replies)
  0 siblings, 7 replies; 15+ messages in thread
From: David Gibson @ 2024-02-21 23:21 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

Partly in looking at flow table support, partly for other reasons,
I've spotted a number of small fixes that can be made to the UDP code
(in addition to the large fixes I'm still working on).

These are ready to go now, so here they are.  They'll go before the
flow related addressing cleanups.

David Gibson (7):
  udp: Don't attempt to translate a 0.0.0.0 source address
  udp: Set pif in epoll reference for ephemeral host sockets
  udp: Unconditionally clear act flag in udp_timer_one()
  udp: Separate bound sockets from flags
  udp: Small streamline to udp_update_hdr4()
  udp: Fix incorrect usage of IPv6 state in IPv4 path
  udp: Remove unnecessary test for unspecified addr_out

 udp.c | 170 +++++++++++++++++++++++++++++++---------------------------
 1 file changed, 90 insertions(+), 80 deletions(-)

-- 
2.43.2


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/7] udp: Don't attempt to translate a 0.0.0.0 source address
  2024-02-21 23:21 [PATCH 0/7] Assorted small fixes for UDP David Gibson
@ 2024-02-21 23:21 ` David Gibson
  2024-02-22 17:46   ` Stefano Brivio
  2024-02-21 23:21 ` [PATCH 2/7] udp: Set pif in epoll reference for ephemeral host sockets David Gibson
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: David Gibson @ 2024-02-21 23:21 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

If an incoming packet has a source address of 0.0.0.0 we translate that to
the gateway address.  This doesn't really make sense, because we have no
way to do a reverse translation for reply packets.

Certain UDP protocols do use an unspecified source address in some
circumstances (e.g. DHCP).  These generally either require no reply, a
multicast reply, or provide a suitable reply address by other means.

In none of those cases does translating it in passt/pasta make sense.  The
best we can really do here is just leave it as is.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 udp.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/udp.c b/udp.c
index a3961bfd..d2f8027c 100644
--- a/udp.c
+++ b/udp.c
@@ -599,7 +599,6 @@ static size_t udp_update_hdr4(const struct ctx *c, int n, in_port_t dstport,
 	    src_port == 53) {
 		b->iph.saddr = c->ip4.dns_match.s_addr;
 	} else if (IN4_IS_ADDR_LOOPBACK(&b->s_in.sin_addr) ||
-		   IN4_IS_ADDR_UNSPECIFIED(&b->s_in.sin_addr)||
 		   IN4_ARE_ADDR_EQUAL(&b->s_in.sin_addr, &c->ip4.addr_seen)) {
 		b->iph.saddr = c->ip4.gw.s_addr;
 		udp_tap_map[V4][src_port].ts = now->tv_sec;
-- 
@@ -599,7 +599,6 @@ static size_t udp_update_hdr4(const struct ctx *c, int n, in_port_t dstport,
 	    src_port == 53) {
 		b->iph.saddr = c->ip4.dns_match.s_addr;
 	} else if (IN4_IS_ADDR_LOOPBACK(&b->s_in.sin_addr) ||
-		   IN4_IS_ADDR_UNSPECIFIED(&b->s_in.sin_addr)||
 		   IN4_ARE_ADDR_EQUAL(&b->s_in.sin_addr, &c->ip4.addr_seen)) {
 		b->iph.saddr = c->ip4.gw.s_addr;
 		udp_tap_map[V4][src_port].ts = now->tv_sec;
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/7] udp: Set pif in epoll reference for ephemeral host sockets
  2024-02-21 23:21 [PATCH 0/7] Assorted small fixes for UDP David Gibson
  2024-02-21 23:21 ` [PATCH 1/7] udp: Don't attempt to translate a 0.0.0.0 source address David Gibson
@ 2024-02-21 23:21 ` David Gibson
  2024-02-21 23:21 ` [PATCH 3/7] udp: Unconditionally clear act flag in udp_timer_one() David Gibson
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: David Gibson @ 2024-02-21 23:21 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

The udp_epoll_ref contains a field for the pif to which the socket belongs.
We fill this in for permanent sockets created with udp_sock_init() and for
spliced sockets, however, we omit it for ephemeral sockets created for
tap originated flows.

This is a bug, although we currently get away with it, because we don't
consult that field for such flows.  Correctly fill it in.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 udp.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/udp.c b/udp.c
index d2f8027c..03a5936e 100644
--- a/udp.c
+++ b/udp.c
@@ -868,7 +868,10 @@ int udp_tap_handler(struct ctx *c, uint8_t pif,
 		      src, dst, udp_tap_map[V4][src].sock);
 		if ((s = udp_tap_map[V4][src].sock) < 0) {
 			struct in_addr bind_addr = IN4ADDR_ANY_INIT;
-			union udp_epoll_ref uref = { .port = src };
+			union udp_epoll_ref uref = {
+				.port = src,
+				.pif = PIF_HOST,
+			};
 			const char *bind_if = NULL;
 
 			if (!IN6_IS_ADDR_LOOPBACK(&s_in.sin_addr))
@@ -916,7 +919,11 @@ int udp_tap_handler(struct ctx *c, uint8_t pif,
 		}
 
 		if ((s = udp_tap_map[V6][src].sock) < 0) {
-			union udp_epoll_ref uref = { .v6 = 1, .port = src };
+			union udp_epoll_ref uref = {
+				.v6 = 1,
+				.port = src,
+				.pif = PIF_HOST,
+			};
 			const char *bind_if = NULL;
 
 			if (!IN6_IS_ADDR_LOOPBACK(&s_in6.sin6_addr))
-- 
@@ -868,7 +868,10 @@ int udp_tap_handler(struct ctx *c, uint8_t pif,
 		      src, dst, udp_tap_map[V4][src].sock);
 		if ((s = udp_tap_map[V4][src].sock) < 0) {
 			struct in_addr bind_addr = IN4ADDR_ANY_INIT;
-			union udp_epoll_ref uref = { .port = src };
+			union udp_epoll_ref uref = {
+				.port = src,
+				.pif = PIF_HOST,
+			};
 			const char *bind_if = NULL;
 
 			if (!IN6_IS_ADDR_LOOPBACK(&s_in.sin_addr))
@@ -916,7 +919,11 @@ int udp_tap_handler(struct ctx *c, uint8_t pif,
 		}
 
 		if ((s = udp_tap_map[V6][src].sock) < 0) {
-			union udp_epoll_ref uref = { .v6 = 1, .port = src };
+			union udp_epoll_ref uref = {
+				.v6 = 1,
+				.port = src,
+				.pif = PIF_HOST,
+			};
 			const char *bind_if = NULL;
 
 			if (!IN6_IS_ADDR_LOOPBACK(&s_in6.sin6_addr))
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/7] udp: Unconditionally clear act flag in udp_timer_one()
  2024-02-21 23:21 [PATCH 0/7] Assorted small fixes for UDP David Gibson
  2024-02-21 23:21 ` [PATCH 1/7] udp: Don't attempt to translate a 0.0.0.0 source address David Gibson
  2024-02-21 23:21 ` [PATCH 2/7] udp: Set pif in epoll reference for ephemeral host sockets David Gibson
@ 2024-02-21 23:21 ` David Gibson
  2024-02-27 11:56   ` Stefano Brivio
  2024-02-21 23:21 ` [PATCH 4/7] udp: Separate bound sockets from flags David Gibson
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: David Gibson @ 2024-02-21 23:21 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

udp_timer_one() first checks for expiries of various sorts of sockets and
if necessary sets the 'sockp' variable to trigger a cleanup at the end.
If sockp is *not* set then, correctly, we don't attempt to close a
non-existent socket.  However, we also don't clear the flag in the
udp_act[] map, in which case we'll come back here and there will, again, be
nothing to be done.

So, clear the udp_act[] flag, even if we don't need to clean up a socket.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 udp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/udp.c b/udp.c
index 03a5936e..4200f849 100644
--- a/udp.c
+++ b/udp.c
@@ -1123,8 +1123,9 @@ static void udp_timer_one(struct ctx *c, int v6, enum udp_act_type type,
 		*sockp = -1;
 		epoll_ctl(c->epollfd, EPOLL_CTL_DEL, s, NULL);
 		close(s);
-		bitmap_clear(udp_act[v6 ? V6 : V4][type], port);
 	}
+
+	bitmap_clear(udp_act[v6 ? V6 : V4][type], port);
 }
 
 /**
-- 
@@ -1123,8 +1123,9 @@ static void udp_timer_one(struct ctx *c, int v6, enum udp_act_type type,
 		*sockp = -1;
 		epoll_ctl(c->epollfd, EPOLL_CTL_DEL, s, NULL);
 		close(s);
-		bitmap_clear(udp_act[v6 ? V6 : V4][type], port);
 	}
+
+	bitmap_clear(udp_act[v6 ? V6 : V4][type], port);
 }
 
 /**
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 4/7] udp: Separate bound sockets from flags
  2024-02-21 23:21 [PATCH 0/7] Assorted small fixes for UDP David Gibson
                   ` (2 preceding siblings ...)
  2024-02-21 23:21 ` [PATCH 3/7] udp: Unconditionally clear act flag in udp_timer_one() David Gibson
@ 2024-02-21 23:21 ` David Gibson
  2024-02-27 11:56   ` Stefano Brivio
  2024-02-21 23:21 ` [PATCH 5/7] udp: Small streamline to udp_update_hdr4() David Gibson
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: David Gibson @ 2024-02-21 23:21 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

struct udp_tap_port contains both a socket fd and a set of flags. But,
confusingly, these fields are not actually related to each other at all.

udp_tap_map[?][port].sock contains a host UDP socket bound to 'port' (or
-1 if we haven't opened such a socket).  That is, it is indexed by the
bound port == the host side forwarding port == destination for host->guest
packets == source for guest->host packets.

udp_tap_map[?][port].flags, however, contains flags which control how we
direct datagrams bound for 'port' from the tap interface.  That is, it is
indexed by the destination for guest->host packets == source for
host->guest packets == host side endpoint port.

Worse both aspects of this share the ts field used to control expiry.  When
we update this because we've used the socket in slot 'port' we could be
clobbering the flags timestamp for an unrelated flow of packets that
happens to have the same endpoint port as our forwarding port.

Clarify this by moving the flags out into a separate array, with its own
timestamp and expiry path.  As a bonus this means that we can now use the
same data structure for tracking the socket fds of "tap" and "splice" path
sockets.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 udp.c | 132 ++++++++++++++++++++++++++++++----------------------------
 1 file changed, 68 insertions(+), 64 deletions(-)

diff --git a/udp.c b/udp.c
index 4200f849..97518d92 100644
--- a/udp.c
+++ b/udp.c
@@ -121,40 +121,39 @@
 #define UDP_MAX_FRAMES		32  /* max # of frames to receive at once */
 
 /**
- * struct udp_tap_port - Port tracking based on tap-facing source port
- * @sock:	Socket bound to source port used as index
+ * struct udp_port_flags - Translation options by remote Port tracking based on tap-facing source port
  * @flags:	Flags for local bind, loopback address/unicast address as source
- * @ts:		Activity timestamp from tap, used for socket aging
+ * @ts:		Activity timestamp
  */
-struct udp_tap_port {
-	int sock;
-	uint8_t flags;
+struct udp_port_flags {
 #define PORT_LOCAL	BIT(0)
 #define PORT_LOOPBACK	BIT(1)
 #define PORT_GUA	BIT(2)
-
+	uint8_t flags;
 	time_t ts;
 };
+static struct udp_port_flags udp_port_flags[IP_VERSIONS][NUM_PORTS];
 
 /**
- * struct udp_splice_port - Bound socket for spliced communication
- * @sock:	Socket bound to index port
+ * struct udp_bound_port - Track bound UDP sockets
+ * @sock:	Socket bound to index port (or -1)
  * @ts:		Activity timestamp
  */
-struct udp_splice_port {
+struct udp_bound_port {
 	int sock;
 	time_t ts;
 };
 
-/* Port tracking, arrays indexed by packet source port (host order) */
-static struct udp_tap_port	udp_tap_map	[IP_VERSIONS][NUM_PORTS];
+/* Bound sockets indexed by bound port (host order) */
+static struct udp_bound_port	udp_tap_sock	[IP_VERSIONS][NUM_PORTS];
 
-/* "Spliced" sockets indexed by bound port (host order) */
-static struct udp_splice_port udp_splice_ns  [IP_VERSIONS][NUM_PORTS];
-static struct udp_splice_port udp_splice_init[IP_VERSIONS][NUM_PORTS];
+/* Bound sockets for splicing indexed by bound port (host order) */
+static struct udp_bound_port	udp_splice_ns	[IP_VERSIONS][NUM_PORTS];
+static struct udp_bound_port	udp_splice_init	[IP_VERSIONS][NUM_PORTS];
 
 enum udp_act_type {
-	UDP_ACT_TAP,
+	UDP_ACT_TAP_SOCK,
+	UDP_ACT_FLAGS,
 	UDP_ACT_SPLICE_NS,
 	UDP_ACT_SPLICE_INIT,
 	UDP_ACT_TYPE_MAX,
@@ -246,7 +245,7 @@ void udp_portmap_clear(void)
 	unsigned i;
 
 	for (i = 0; i < NUM_PORTS; i++) {
-		udp_tap_map[V4][i].sock = udp_tap_map[V6][i].sock = -1;
+		udp_tap_sock[V4][i].sock = udp_tap_sock[V6][i].sock = -1;
 		udp_splice_ns[V4][i].sock = udp_splice_ns[V6][i].sock = -1;
 		udp_splice_init[V4][i].sock = udp_splice_init[V6][i].sock = -1;
 	}
@@ -393,16 +392,16 @@ int udp_splice_new(const struct ctx *c, int v6, in_port_t src, bool ns)
 	union epoll_ref ref = { .type = EPOLL_TYPE_UDP,
 				.udp = { .splice = true, .v6 = v6, .port = src }
 			      };
-	struct udp_splice_port *sp;
+	struct udp_bound_port *bp;
 	int act, s;
 
 	if (ns) {
 		ref.udp.pif = PIF_SPLICE;
-		sp = &udp_splice_ns[v6 ? V6 : V4][src];
+		bp = &udp_splice_ns[v6 ? V6 : V4][src];
 		act = UDP_ACT_SPLICE_NS;
 	} else {
 		ref.udp.pif = PIF_HOST;
-		sp = &udp_splice_init[v6 ? V6 : V4][src];
+		bp = &udp_splice_init[v6 ? V6 : V4][src];
 		act = UDP_ACT_SPLICE_INIT;
 	}
 
@@ -437,7 +436,7 @@ int udp_splice_new(const struct ctx *c, int v6, in_port_t src, bool ns)
 			goto fail;
 	}
 
-	sp->sock = s;
+	bp->sock = s;
 	bitmap_set(udp_act[v6 ? V6 : V4][act], src);
 
 	ev.data.u64 = ref.u64;
@@ -601,15 +600,15 @@ static size_t udp_update_hdr4(const struct ctx *c, int n, in_port_t dstport,
 	} else if (IN4_IS_ADDR_LOOPBACK(&b->s_in.sin_addr) ||
 		   IN4_ARE_ADDR_EQUAL(&b->s_in.sin_addr, &c->ip4.addr_seen)) {
 		b->iph.saddr = c->ip4.gw.s_addr;
-		udp_tap_map[V4][src_port].ts = now->tv_sec;
-		udp_tap_map[V4][src_port].flags |= PORT_LOCAL;
+		udp_port_flags[V4][src_port].ts = now->tv_sec;
+		udp_port_flags[V4][src_port].flags |= PORT_LOCAL;
 
 		if (IN4_ARE_ADDR_EQUAL(&b->s_in.sin_addr.s_addr, &c->ip4.addr_seen))
-			udp_tap_map[V4][src_port].flags &= ~PORT_LOOPBACK;
+			udp_port_flags[V4][src_port].flags &= ~PORT_LOOPBACK;
 		else
-			udp_tap_map[V4][src_port].flags |= PORT_LOOPBACK;
+			udp_port_flags[V4][src_port].flags |= PORT_LOOPBACK;
 
-		bitmap_set(udp_act[V4][UDP_ACT_TAP], src_port);
+		bitmap_set(udp_act[V4][UDP_ACT_FLAGS], src_port);
 	} else {
 		b->iph.saddr = b->s_in.sin_addr.s_addr;
 	}
@@ -664,20 +663,20 @@ static size_t udp_update_hdr6(const struct ctx *c, int n, in_port_t dstport,
 		else
 			b->ip6h.saddr = c->ip6.addr_ll;
 
-		udp_tap_map[V6][src_port].ts = now->tv_sec;
-		udp_tap_map[V6][src_port].flags |= PORT_LOCAL;
+		udp_port_flags[V6][src_port].ts = now->tv_sec;
+		udp_port_flags[V6][src_port].flags |= PORT_LOCAL;
 
 		if (IN6_IS_ADDR_LOOPBACK(src))
-			udp_tap_map[V6][src_port].flags |= PORT_LOOPBACK;
+			udp_port_flags[V6][src_port].flags |= PORT_LOOPBACK;
 		else
-			udp_tap_map[V6][src_port].flags &= ~PORT_LOOPBACK;
+			udp_port_flags[V6][src_port].flags &= ~PORT_LOOPBACK;
 
 		if (IN6_ARE_ADDR_EQUAL(src, &c->ip6.addr))
-			udp_tap_map[V6][src_port].flags |= PORT_GUA;
+			udp_port_flags[V6][src_port].flags |= PORT_GUA;
 		else
-			udp_tap_map[V6][src_port].flags &= ~PORT_GUA;
+			udp_port_flags[V6][src_port].flags &= ~PORT_GUA;
 
-		bitmap_set(udp_act[V6][UDP_ACT_TAP], src_port);
+		bitmap_set(udp_act[V6][UDP_ACT_FLAGS], src_port);
 	} else {
 		b->ip6h.daddr = c->ip6.addr_seen;
 		b->ip6h.saddr = b->s_in6.sin6_addr;
@@ -857,16 +856,16 @@ int udp_tap_handler(struct ctx *c, uint8_t pif,
 			s_in.sin_addr = c->ip4.dns_host;
 		} else if (IN4_ARE_ADDR_EQUAL(&s_in.sin_addr, &c->ip4.gw) &&
 			   !c->no_map_gw) {
-			if (!(udp_tap_map[V4][dst].flags & PORT_LOCAL) ||
-			    (udp_tap_map[V4][dst].flags & PORT_LOOPBACK))
+			if (!(udp_port_flags[V4][dst].flags & PORT_LOCAL) ||
+			    (udp_port_flags[V4][dst].flags & PORT_LOOPBACK))
 				s_in.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
 			else
 				s_in.sin_addr = c->ip4.addr_seen;
 		}
 
 		debug("UDP from tap src=%hu dst=%hu, s=%d",
-		      src, dst, udp_tap_map[V4][src].sock);
-		if ((s = udp_tap_map[V4][src].sock) < 0) {
+		      src, dst, udp_tap_sock[V4][src].sock);
+		if ((s = udp_tap_sock[V4][src].sock) < 0) {
 			struct in_addr bind_addr = IN4ADDR_ANY_INIT;
 			union udp_epoll_ref uref = {
 				.port = src,
@@ -886,11 +885,11 @@ int udp_tap_handler(struct ctx *c, uint8_t pif,
 			if (s < 0)
 				return p->count - idx;
 
-			udp_tap_map[V4][src].sock = s;
-			bitmap_set(udp_act[V4][UDP_ACT_TAP], src);
+			udp_tap_sock[V4][src].sock = s;
+			bitmap_set(udp_act[V4][UDP_ACT_TAP_SOCK], src);
 		}
 
-		udp_tap_map[V4][src].ts = now->tv_sec;
+		udp_tap_sock[V4][src].ts = now->tv_sec;
 	} else {
 		s_in6 = (struct sockaddr_in6) {
 			.sin6_family = AF_INET6,
@@ -907,10 +906,10 @@ int udp_tap_handler(struct ctx *c, uint8_t pif,
 			s_in6.sin6_addr = c->ip6.dns_host;
 		} else if (IN6_ARE_ADDR_EQUAL(daddr, &c->ip6.gw) &&
 			   !c->no_map_gw) {
-			if (!(udp_tap_map[V6][dst].flags & PORT_LOCAL) ||
-			    (udp_tap_map[V6][dst].flags & PORT_LOOPBACK))
+			if (!(udp_port_flags[V6][dst].flags & PORT_LOCAL) ||
+			    (udp_port_flags[V6][dst].flags & PORT_LOOPBACK))
 				s_in6.sin6_addr = in6addr_loopback;
-			else if (udp_tap_map[V6][dst].flags & PORT_GUA)
+			else if (udp_port_flags[V6][dst].flags & PORT_GUA)
 				s_in6.sin6_addr = c->ip6.addr;
 			else
 				s_in6.sin6_addr = c->ip6.addr_seen;
@@ -918,7 +917,7 @@ int udp_tap_handler(struct ctx *c, uint8_t pif,
 			bind_addr = &c->ip6.addr_ll;
 		}
 
-		if ((s = udp_tap_map[V6][src].sock) < 0) {
+		if ((s = udp_tap_sock[V6][src].sock) < 0) {
 			union udp_epoll_ref uref = {
 				.v6 = 1,
 				.port = src,
@@ -939,11 +938,11 @@ int udp_tap_handler(struct ctx *c, uint8_t pif,
 			if (s < 0)
 				return p->count - idx;
 
-			udp_tap_map[V6][src].sock = s;
-			bitmap_set(udp_act[V6][UDP_ACT_TAP], src);
+			udp_tap_sock[V6][src].sock = s;
+			bitmap_set(udp_act[V6][UDP_ACT_TAP_SOCK], src);
 		}
 
-		udp_tap_map[V6][src].ts = now->tv_sec;
+		udp_tap_sock[V6][src].ts = now->tv_sec;
 	}
 
 	for (i = 0; i < (int)p->count - idx; i++) {
@@ -1015,7 +1014,7 @@ int udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
 			r4 = s = sock_l4(c, AF_INET, IPPROTO_UDP, addr,
 					 ifname, port, uref.u32);
 
-			udp_tap_map[V4][uref.port].sock = s < 0 ? -1 : s;
+			udp_tap_sock[V4][uref.port].sock = s < 0 ? -1 : s;
 			udp_splice_init[V4][port].sock = s < 0 ? -1 : s;
 		} else {
 			struct in_addr loopback = IN4ADDR_LOOPBACK_INIT;
@@ -1033,7 +1032,7 @@ int udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
 			r6 = s = sock_l4(c, AF_INET6, IPPROTO_UDP, addr,
 					 ifname, port, uref.u32);
 
-			udp_tap_map[V6][uref.port].sock = s < 0 ? -1 : s;
+			udp_tap_sock[V6][uref.port].sock = s < 0 ? -1 : s;
 			udp_splice_init[V6][port].sock = s < 0 ? -1 : s;
 		} else {
 			r6 = s = sock_l4(c, AF_INET6, IPPROTO_UDP,
@@ -1086,32 +1085,37 @@ static void udp_splice_iov_init(void)
 static void udp_timer_one(struct ctx *c, int v6, enum udp_act_type type,
 			  in_port_t port, const struct timespec *now)
 {
-	struct udp_splice_port *sp;
-	struct udp_tap_port *tp;
+	struct udp_bound_port *bp;
+	struct udp_port_flags *pf;
 	int *sockp = NULL;
 
 	switch (type) {
-	case UDP_ACT_TAP:
-		tp = &udp_tap_map[v6 ? V6 : V4][port];
+	case UDP_ACT_TAP_SOCK:
+		bp = &udp_tap_sock[v6 ? V6 : V4][port];
 
-		if (now->tv_sec - tp->ts > UDP_CONN_TIMEOUT) {
-			sockp = &tp->sock;
-			tp->flags = 0;
-		}
+		if (now->tv_sec - bp->ts > UDP_CONN_TIMEOUT)
+			sockp = &bp->sock;
+
+		break;
+	case UDP_ACT_FLAGS:
+		pf = &udp_port_flags[v6 ? V6 : V4][port];
+
+		if (now->tv_sec - pf->ts > UDP_CONN_TIMEOUT)
+			pf->flags = 0;
 
 		break;
 	case UDP_ACT_SPLICE_INIT:
-		sp = &udp_splice_init[v6 ? V6 : V4][port];
+		bp = &udp_splice_init[v6 ? V6 : V4][port];
 
-		if (now->tv_sec - sp->ts > UDP_CONN_TIMEOUT)
-			sockp = &sp->sock;
+		if (now->tv_sec - bp->ts > UDP_CONN_TIMEOUT)
+			sockp = &bp->sock;
 
 		break;
 	case UDP_ACT_SPLICE_NS:
-		sp = &udp_splice_ns[v6 ? V6 : V4][port];
+		bp = &udp_splice_ns[v6 ? V6 : V4][port];
 
-		if (now->tv_sec - sp->ts > UDP_CONN_TIMEOUT)
-			sockp = &sp->sock;
+		if (now->tv_sec - bp->ts > UDP_CONN_TIMEOUT)
+			sockp = &bp->sock;
 
 		break;
 	default:
@@ -1141,7 +1145,7 @@ static void udp_port_rebind(struct ctx *c, bool outbound)
 		= outbound ? c->udp.fwd_out.f.map : c->udp.fwd_in.f.map;
 	const uint8_t *rmap
 		= outbound ? c->udp.fwd_in.f.map : c->udp.fwd_out.f.map;
-	struct udp_splice_port (*socks)[NUM_PORTS]
+	struct udp_bound_port (*socks)[NUM_PORTS]
 		= outbound ? udp_splice_ns : udp_splice_init;
 	unsigned port;
 
-- 
@@ -121,40 +121,39 @@
 #define UDP_MAX_FRAMES		32  /* max # of frames to receive at once */
 
 /**
- * struct udp_tap_port - Port tracking based on tap-facing source port
- * @sock:	Socket bound to source port used as index
+ * struct udp_port_flags - Translation options by remote Port tracking based on tap-facing source port
  * @flags:	Flags for local bind, loopback address/unicast address as source
- * @ts:		Activity timestamp from tap, used for socket aging
+ * @ts:		Activity timestamp
  */
-struct udp_tap_port {
-	int sock;
-	uint8_t flags;
+struct udp_port_flags {
 #define PORT_LOCAL	BIT(0)
 #define PORT_LOOPBACK	BIT(1)
 #define PORT_GUA	BIT(2)
-
+	uint8_t flags;
 	time_t ts;
 };
+static struct udp_port_flags udp_port_flags[IP_VERSIONS][NUM_PORTS];
 
 /**
- * struct udp_splice_port - Bound socket for spliced communication
- * @sock:	Socket bound to index port
+ * struct udp_bound_port - Track bound UDP sockets
+ * @sock:	Socket bound to index port (or -1)
  * @ts:		Activity timestamp
  */
-struct udp_splice_port {
+struct udp_bound_port {
 	int sock;
 	time_t ts;
 };
 
-/* Port tracking, arrays indexed by packet source port (host order) */
-static struct udp_tap_port	udp_tap_map	[IP_VERSIONS][NUM_PORTS];
+/* Bound sockets indexed by bound port (host order) */
+static struct udp_bound_port	udp_tap_sock	[IP_VERSIONS][NUM_PORTS];
 
-/* "Spliced" sockets indexed by bound port (host order) */
-static struct udp_splice_port udp_splice_ns  [IP_VERSIONS][NUM_PORTS];
-static struct udp_splice_port udp_splice_init[IP_VERSIONS][NUM_PORTS];
+/* Bound sockets for splicing indexed by bound port (host order) */
+static struct udp_bound_port	udp_splice_ns	[IP_VERSIONS][NUM_PORTS];
+static struct udp_bound_port	udp_splice_init	[IP_VERSIONS][NUM_PORTS];
 
 enum udp_act_type {
-	UDP_ACT_TAP,
+	UDP_ACT_TAP_SOCK,
+	UDP_ACT_FLAGS,
 	UDP_ACT_SPLICE_NS,
 	UDP_ACT_SPLICE_INIT,
 	UDP_ACT_TYPE_MAX,
@@ -246,7 +245,7 @@ void udp_portmap_clear(void)
 	unsigned i;
 
 	for (i = 0; i < NUM_PORTS; i++) {
-		udp_tap_map[V4][i].sock = udp_tap_map[V6][i].sock = -1;
+		udp_tap_sock[V4][i].sock = udp_tap_sock[V6][i].sock = -1;
 		udp_splice_ns[V4][i].sock = udp_splice_ns[V6][i].sock = -1;
 		udp_splice_init[V4][i].sock = udp_splice_init[V6][i].sock = -1;
 	}
@@ -393,16 +392,16 @@ int udp_splice_new(const struct ctx *c, int v6, in_port_t src, bool ns)
 	union epoll_ref ref = { .type = EPOLL_TYPE_UDP,
 				.udp = { .splice = true, .v6 = v6, .port = src }
 			      };
-	struct udp_splice_port *sp;
+	struct udp_bound_port *bp;
 	int act, s;
 
 	if (ns) {
 		ref.udp.pif = PIF_SPLICE;
-		sp = &udp_splice_ns[v6 ? V6 : V4][src];
+		bp = &udp_splice_ns[v6 ? V6 : V4][src];
 		act = UDP_ACT_SPLICE_NS;
 	} else {
 		ref.udp.pif = PIF_HOST;
-		sp = &udp_splice_init[v6 ? V6 : V4][src];
+		bp = &udp_splice_init[v6 ? V6 : V4][src];
 		act = UDP_ACT_SPLICE_INIT;
 	}
 
@@ -437,7 +436,7 @@ int udp_splice_new(const struct ctx *c, int v6, in_port_t src, bool ns)
 			goto fail;
 	}
 
-	sp->sock = s;
+	bp->sock = s;
 	bitmap_set(udp_act[v6 ? V6 : V4][act], src);
 
 	ev.data.u64 = ref.u64;
@@ -601,15 +600,15 @@ static size_t udp_update_hdr4(const struct ctx *c, int n, in_port_t dstport,
 	} else if (IN4_IS_ADDR_LOOPBACK(&b->s_in.sin_addr) ||
 		   IN4_ARE_ADDR_EQUAL(&b->s_in.sin_addr, &c->ip4.addr_seen)) {
 		b->iph.saddr = c->ip4.gw.s_addr;
-		udp_tap_map[V4][src_port].ts = now->tv_sec;
-		udp_tap_map[V4][src_port].flags |= PORT_LOCAL;
+		udp_port_flags[V4][src_port].ts = now->tv_sec;
+		udp_port_flags[V4][src_port].flags |= PORT_LOCAL;
 
 		if (IN4_ARE_ADDR_EQUAL(&b->s_in.sin_addr.s_addr, &c->ip4.addr_seen))
-			udp_tap_map[V4][src_port].flags &= ~PORT_LOOPBACK;
+			udp_port_flags[V4][src_port].flags &= ~PORT_LOOPBACK;
 		else
-			udp_tap_map[V4][src_port].flags |= PORT_LOOPBACK;
+			udp_port_flags[V4][src_port].flags |= PORT_LOOPBACK;
 
-		bitmap_set(udp_act[V4][UDP_ACT_TAP], src_port);
+		bitmap_set(udp_act[V4][UDP_ACT_FLAGS], src_port);
 	} else {
 		b->iph.saddr = b->s_in.sin_addr.s_addr;
 	}
@@ -664,20 +663,20 @@ static size_t udp_update_hdr6(const struct ctx *c, int n, in_port_t dstport,
 		else
 			b->ip6h.saddr = c->ip6.addr_ll;
 
-		udp_tap_map[V6][src_port].ts = now->tv_sec;
-		udp_tap_map[V6][src_port].flags |= PORT_LOCAL;
+		udp_port_flags[V6][src_port].ts = now->tv_sec;
+		udp_port_flags[V6][src_port].flags |= PORT_LOCAL;
 
 		if (IN6_IS_ADDR_LOOPBACK(src))
-			udp_tap_map[V6][src_port].flags |= PORT_LOOPBACK;
+			udp_port_flags[V6][src_port].flags |= PORT_LOOPBACK;
 		else
-			udp_tap_map[V6][src_port].flags &= ~PORT_LOOPBACK;
+			udp_port_flags[V6][src_port].flags &= ~PORT_LOOPBACK;
 
 		if (IN6_ARE_ADDR_EQUAL(src, &c->ip6.addr))
-			udp_tap_map[V6][src_port].flags |= PORT_GUA;
+			udp_port_flags[V6][src_port].flags |= PORT_GUA;
 		else
-			udp_tap_map[V6][src_port].flags &= ~PORT_GUA;
+			udp_port_flags[V6][src_port].flags &= ~PORT_GUA;
 
-		bitmap_set(udp_act[V6][UDP_ACT_TAP], src_port);
+		bitmap_set(udp_act[V6][UDP_ACT_FLAGS], src_port);
 	} else {
 		b->ip6h.daddr = c->ip6.addr_seen;
 		b->ip6h.saddr = b->s_in6.sin6_addr;
@@ -857,16 +856,16 @@ int udp_tap_handler(struct ctx *c, uint8_t pif,
 			s_in.sin_addr = c->ip4.dns_host;
 		} else if (IN4_ARE_ADDR_EQUAL(&s_in.sin_addr, &c->ip4.gw) &&
 			   !c->no_map_gw) {
-			if (!(udp_tap_map[V4][dst].flags & PORT_LOCAL) ||
-			    (udp_tap_map[V4][dst].flags & PORT_LOOPBACK))
+			if (!(udp_port_flags[V4][dst].flags & PORT_LOCAL) ||
+			    (udp_port_flags[V4][dst].flags & PORT_LOOPBACK))
 				s_in.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
 			else
 				s_in.sin_addr = c->ip4.addr_seen;
 		}
 
 		debug("UDP from tap src=%hu dst=%hu, s=%d",
-		      src, dst, udp_tap_map[V4][src].sock);
-		if ((s = udp_tap_map[V4][src].sock) < 0) {
+		      src, dst, udp_tap_sock[V4][src].sock);
+		if ((s = udp_tap_sock[V4][src].sock) < 0) {
 			struct in_addr bind_addr = IN4ADDR_ANY_INIT;
 			union udp_epoll_ref uref = {
 				.port = src,
@@ -886,11 +885,11 @@ int udp_tap_handler(struct ctx *c, uint8_t pif,
 			if (s < 0)
 				return p->count - idx;
 
-			udp_tap_map[V4][src].sock = s;
-			bitmap_set(udp_act[V4][UDP_ACT_TAP], src);
+			udp_tap_sock[V4][src].sock = s;
+			bitmap_set(udp_act[V4][UDP_ACT_TAP_SOCK], src);
 		}
 
-		udp_tap_map[V4][src].ts = now->tv_sec;
+		udp_tap_sock[V4][src].ts = now->tv_sec;
 	} else {
 		s_in6 = (struct sockaddr_in6) {
 			.sin6_family = AF_INET6,
@@ -907,10 +906,10 @@ int udp_tap_handler(struct ctx *c, uint8_t pif,
 			s_in6.sin6_addr = c->ip6.dns_host;
 		} else if (IN6_ARE_ADDR_EQUAL(daddr, &c->ip6.gw) &&
 			   !c->no_map_gw) {
-			if (!(udp_tap_map[V6][dst].flags & PORT_LOCAL) ||
-			    (udp_tap_map[V6][dst].flags & PORT_LOOPBACK))
+			if (!(udp_port_flags[V6][dst].flags & PORT_LOCAL) ||
+			    (udp_port_flags[V6][dst].flags & PORT_LOOPBACK))
 				s_in6.sin6_addr = in6addr_loopback;
-			else if (udp_tap_map[V6][dst].flags & PORT_GUA)
+			else if (udp_port_flags[V6][dst].flags & PORT_GUA)
 				s_in6.sin6_addr = c->ip6.addr;
 			else
 				s_in6.sin6_addr = c->ip6.addr_seen;
@@ -918,7 +917,7 @@ int udp_tap_handler(struct ctx *c, uint8_t pif,
 			bind_addr = &c->ip6.addr_ll;
 		}
 
-		if ((s = udp_tap_map[V6][src].sock) < 0) {
+		if ((s = udp_tap_sock[V6][src].sock) < 0) {
 			union udp_epoll_ref uref = {
 				.v6 = 1,
 				.port = src,
@@ -939,11 +938,11 @@ int udp_tap_handler(struct ctx *c, uint8_t pif,
 			if (s < 0)
 				return p->count - idx;
 
-			udp_tap_map[V6][src].sock = s;
-			bitmap_set(udp_act[V6][UDP_ACT_TAP], src);
+			udp_tap_sock[V6][src].sock = s;
+			bitmap_set(udp_act[V6][UDP_ACT_TAP_SOCK], src);
 		}
 
-		udp_tap_map[V6][src].ts = now->tv_sec;
+		udp_tap_sock[V6][src].ts = now->tv_sec;
 	}
 
 	for (i = 0; i < (int)p->count - idx; i++) {
@@ -1015,7 +1014,7 @@ int udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
 			r4 = s = sock_l4(c, AF_INET, IPPROTO_UDP, addr,
 					 ifname, port, uref.u32);
 
-			udp_tap_map[V4][uref.port].sock = s < 0 ? -1 : s;
+			udp_tap_sock[V4][uref.port].sock = s < 0 ? -1 : s;
 			udp_splice_init[V4][port].sock = s < 0 ? -1 : s;
 		} else {
 			struct in_addr loopback = IN4ADDR_LOOPBACK_INIT;
@@ -1033,7 +1032,7 @@ int udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
 			r6 = s = sock_l4(c, AF_INET6, IPPROTO_UDP, addr,
 					 ifname, port, uref.u32);
 
-			udp_tap_map[V6][uref.port].sock = s < 0 ? -1 : s;
+			udp_tap_sock[V6][uref.port].sock = s < 0 ? -1 : s;
 			udp_splice_init[V6][port].sock = s < 0 ? -1 : s;
 		} else {
 			r6 = s = sock_l4(c, AF_INET6, IPPROTO_UDP,
@@ -1086,32 +1085,37 @@ static void udp_splice_iov_init(void)
 static void udp_timer_one(struct ctx *c, int v6, enum udp_act_type type,
 			  in_port_t port, const struct timespec *now)
 {
-	struct udp_splice_port *sp;
-	struct udp_tap_port *tp;
+	struct udp_bound_port *bp;
+	struct udp_port_flags *pf;
 	int *sockp = NULL;
 
 	switch (type) {
-	case UDP_ACT_TAP:
-		tp = &udp_tap_map[v6 ? V6 : V4][port];
+	case UDP_ACT_TAP_SOCK:
+		bp = &udp_tap_sock[v6 ? V6 : V4][port];
 
-		if (now->tv_sec - tp->ts > UDP_CONN_TIMEOUT) {
-			sockp = &tp->sock;
-			tp->flags = 0;
-		}
+		if (now->tv_sec - bp->ts > UDP_CONN_TIMEOUT)
+			sockp = &bp->sock;
+
+		break;
+	case UDP_ACT_FLAGS:
+		pf = &udp_port_flags[v6 ? V6 : V4][port];
+
+		if (now->tv_sec - pf->ts > UDP_CONN_TIMEOUT)
+			pf->flags = 0;
 
 		break;
 	case UDP_ACT_SPLICE_INIT:
-		sp = &udp_splice_init[v6 ? V6 : V4][port];
+		bp = &udp_splice_init[v6 ? V6 : V4][port];
 
-		if (now->tv_sec - sp->ts > UDP_CONN_TIMEOUT)
-			sockp = &sp->sock;
+		if (now->tv_sec - bp->ts > UDP_CONN_TIMEOUT)
+			sockp = &bp->sock;
 
 		break;
 	case UDP_ACT_SPLICE_NS:
-		sp = &udp_splice_ns[v6 ? V6 : V4][port];
+		bp = &udp_splice_ns[v6 ? V6 : V4][port];
 
-		if (now->tv_sec - sp->ts > UDP_CONN_TIMEOUT)
-			sockp = &sp->sock;
+		if (now->tv_sec - bp->ts > UDP_CONN_TIMEOUT)
+			sockp = &bp->sock;
 
 		break;
 	default:
@@ -1141,7 +1145,7 @@ static void udp_port_rebind(struct ctx *c, bool outbound)
 		= outbound ? c->udp.fwd_out.f.map : c->udp.fwd_in.f.map;
 	const uint8_t *rmap
 		= outbound ? c->udp.fwd_in.f.map : c->udp.fwd_out.f.map;
-	struct udp_splice_port (*socks)[NUM_PORTS]
+	struct udp_bound_port (*socks)[NUM_PORTS]
 		= outbound ? udp_splice_ns : udp_splice_init;
 	unsigned port;
 
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 5/7] udp: Small streamline to udp_update_hdr4()
  2024-02-21 23:21 [PATCH 0/7] Assorted small fixes for UDP David Gibson
                   ` (3 preceding siblings ...)
  2024-02-21 23:21 ` [PATCH 4/7] udp: Separate bound sockets from flags David Gibson
@ 2024-02-21 23:21 ` David Gibson
  2024-02-21 23:21 ` [PATCH 6/7] udp: Fix incorrect usage of IPv6 state in IPv4 path David Gibson
  2024-02-21 23:21 ` [PATCH 7/7] udp: Remove unnecessary test for unspecified addr_out David Gibson
  6 siblings, 0 replies; 15+ messages in thread
From: David Gibson @ 2024-02-21 23:21 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

Streamline the logic here slightly, by introducing a 'src' temporary for
brevity.  We also transform the logic for setting/clearing PORT_LOOPBACK.
This makes udp_update_hdr4() more closely match the corresponding logic
from udp_update_udp6().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 udp.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/udp.c b/udp.c
index 97518d92..87f440f5 100644
--- a/udp.c
+++ b/udp.c
@@ -583,6 +583,7 @@ static size_t udp_update_hdr4(const struct ctx *c, int n, in_port_t dstport,
 			      const struct timespec *now)
 {
 	struct udp4_l2_buf_t *b = &udp4_l2_buf[n];
+	struct in_addr *src;
 	in_port_t src_port;
 	size_t ip_len;
 
@@ -591,26 +592,26 @@ static size_t udp_update_hdr4(const struct ctx *c, int n, in_port_t dstport,
 	b->iph.tot_len = htons(ip_len);
 	b->iph.daddr = c->ip4.addr_seen.s_addr;
 
+	src = &b->s_in.sin_addr;
 	src_port = ntohs(b->s_in.sin_port);
 
 	if (!IN4_IS_ADDR_UNSPECIFIED(&c->ip4.dns_match) &&
-	    IN4_ARE_ADDR_EQUAL(&b->s_in.sin_addr, &c->ip4.dns_host) &&
-	    src_port == 53) {
+	    IN4_ARE_ADDR_EQUAL(src, &c->ip4.dns_host) && src_port == 53) {
 		b->iph.saddr = c->ip4.dns_match.s_addr;
-	} else if (IN4_IS_ADDR_LOOPBACK(&b->s_in.sin_addr) ||
-		   IN4_ARE_ADDR_EQUAL(&b->s_in.sin_addr, &c->ip4.addr_seen)) {
+	} else if (IN4_IS_ADDR_LOOPBACK(src) ||
+		   IN4_ARE_ADDR_EQUAL(src, &c->ip4.addr_seen)) {
 		b->iph.saddr = c->ip4.gw.s_addr;
 		udp_port_flags[V4][src_port].ts = now->tv_sec;
 		udp_port_flags[V4][src_port].flags |= PORT_LOCAL;
 
-		if (IN4_ARE_ADDR_EQUAL(&b->s_in.sin_addr.s_addr, &c->ip4.addr_seen))
-			udp_port_flags[V4][src_port].flags &= ~PORT_LOOPBACK;
-		else
+		if (IN4_IS_ADDR_LOOPBACK(src))
 			udp_port_flags[V4][src_port].flags |= PORT_LOOPBACK;
+		else
+			udp_port_flags[V4][src_port].flags &= ~PORT_LOOPBACK;
 
 		bitmap_set(udp_act[V4][UDP_ACT_FLAGS], src_port);
 	} else {
-		b->iph.saddr = b->s_in.sin_addr.s_addr;
+		b->iph.saddr = src->s_addr;
 	}
 
 	udp_update_check4(b);
-- 
@@ -583,6 +583,7 @@ static size_t udp_update_hdr4(const struct ctx *c, int n, in_port_t dstport,
 			      const struct timespec *now)
 {
 	struct udp4_l2_buf_t *b = &udp4_l2_buf[n];
+	struct in_addr *src;
 	in_port_t src_port;
 	size_t ip_len;
 
@@ -591,26 +592,26 @@ static size_t udp_update_hdr4(const struct ctx *c, int n, in_port_t dstport,
 	b->iph.tot_len = htons(ip_len);
 	b->iph.daddr = c->ip4.addr_seen.s_addr;
 
+	src = &b->s_in.sin_addr;
 	src_port = ntohs(b->s_in.sin_port);
 
 	if (!IN4_IS_ADDR_UNSPECIFIED(&c->ip4.dns_match) &&
-	    IN4_ARE_ADDR_EQUAL(&b->s_in.sin_addr, &c->ip4.dns_host) &&
-	    src_port == 53) {
+	    IN4_ARE_ADDR_EQUAL(src, &c->ip4.dns_host) && src_port == 53) {
 		b->iph.saddr = c->ip4.dns_match.s_addr;
-	} else if (IN4_IS_ADDR_LOOPBACK(&b->s_in.sin_addr) ||
-		   IN4_ARE_ADDR_EQUAL(&b->s_in.sin_addr, &c->ip4.addr_seen)) {
+	} else if (IN4_IS_ADDR_LOOPBACK(src) ||
+		   IN4_ARE_ADDR_EQUAL(src, &c->ip4.addr_seen)) {
 		b->iph.saddr = c->ip4.gw.s_addr;
 		udp_port_flags[V4][src_port].ts = now->tv_sec;
 		udp_port_flags[V4][src_port].flags |= PORT_LOCAL;
 
-		if (IN4_ARE_ADDR_EQUAL(&b->s_in.sin_addr.s_addr, &c->ip4.addr_seen))
-			udp_port_flags[V4][src_port].flags &= ~PORT_LOOPBACK;
-		else
+		if (IN4_IS_ADDR_LOOPBACK(src))
 			udp_port_flags[V4][src_port].flags |= PORT_LOOPBACK;
+		else
+			udp_port_flags[V4][src_port].flags &= ~PORT_LOOPBACK;
 
 		bitmap_set(udp_act[V4][UDP_ACT_FLAGS], src_port);
 	} else {
-		b->iph.saddr = b->s_in.sin_addr.s_addr;
+		b->iph.saddr = src->s_addr;
 	}
 
 	udp_update_check4(b);
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 6/7] udp: Fix incorrect usage of IPv6 state in IPv4 path
  2024-02-21 23:21 [PATCH 0/7] Assorted small fixes for UDP David Gibson
                   ` (4 preceding siblings ...)
  2024-02-21 23:21 ` [PATCH 5/7] udp: Small streamline to udp_update_hdr4() David Gibson
@ 2024-02-21 23:21 ` David Gibson
  2024-02-21 23:21 ` [PATCH 7/7] udp: Remove unnecessary test for unspecified addr_out David Gibson
  6 siblings, 0 replies; 15+ messages in thread
From: David Gibson @ 2024-02-21 23:21 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

When forwarding IPv4 packets in udp_tap_handler(), we incorrectly use an
IPv6 address test on our IPv4 address (which could cause an out of bounds
access), and possibly set our bind interface to the IPv6 interface based on
it.  Adjust to correctly look at the IPv4 address and IPv4 interface.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 udp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/udp.c b/udp.c
index 87f440f5..0204f1d2 100644
--- a/udp.c
+++ b/udp.c
@@ -874,8 +874,8 @@ int udp_tap_handler(struct ctx *c, uint8_t pif,
 			};
 			const char *bind_if = NULL;
 
-			if (!IN6_IS_ADDR_LOOPBACK(&s_in.sin_addr))
-				bind_if = c->ip6.ifname_out;
+			if (!IN4_IS_ADDR_LOOPBACK(&s_in.sin_addr))
+				bind_if = c->ip4.ifname_out;
 
 			if (!IN4_IS_ADDR_UNSPECIFIED(&c->ip4.addr_out) &&
 			    !IN4_IS_ADDR_LOOPBACK(&s_in.sin_addr))
-- 
@@ -874,8 +874,8 @@ int udp_tap_handler(struct ctx *c, uint8_t pif,
 			};
 			const char *bind_if = NULL;
 
-			if (!IN6_IS_ADDR_LOOPBACK(&s_in.sin_addr))
-				bind_if = c->ip6.ifname_out;
+			if (!IN4_IS_ADDR_LOOPBACK(&s_in.sin_addr))
+				bind_if = c->ip4.ifname_out;
 
 			if (!IN4_IS_ADDR_UNSPECIFIED(&c->ip4.addr_out) &&
 			    !IN4_IS_ADDR_LOOPBACK(&s_in.sin_addr))
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 7/7] udp: Remove unnecessary test for unspecified addr_out
  2024-02-21 23:21 [PATCH 0/7] Assorted small fixes for UDP David Gibson
                   ` (5 preceding siblings ...)
  2024-02-21 23:21 ` [PATCH 6/7] udp: Fix incorrect usage of IPv6 state in IPv4 path David Gibson
@ 2024-02-21 23:21 ` David Gibson
  6 siblings, 0 replies; 15+ messages in thread
From: David Gibson @ 2024-02-21 23:21 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

If the configured output address is unspecified, we don't set the bind
address to it when creating a new socket in udp_tap_handler().  That sounds
sensible, but what we're leaving the bind address as is, exactly, the
unspecified address, so this test makes no difference.  Remove it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 udp.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/udp.c b/udp.c
index 0204f1d2..304ba206 100644
--- a/udp.c
+++ b/udp.c
@@ -877,8 +877,7 @@ int udp_tap_handler(struct ctx *c, uint8_t pif,
 			if (!IN4_IS_ADDR_LOOPBACK(&s_in.sin_addr))
 				bind_if = c->ip4.ifname_out;
 
-			if (!IN4_IS_ADDR_UNSPECIFIED(&c->ip4.addr_out) &&
-			    !IN4_IS_ADDR_LOOPBACK(&s_in.sin_addr))
+			if (!IN4_IS_ADDR_LOOPBACK(&s_in.sin_addr))
 				bind_addr = c->ip4.addr_out;
 
 			s = sock_l4(c, AF_INET, IPPROTO_UDP, &bind_addr,
@@ -929,8 +928,7 @@ int udp_tap_handler(struct ctx *c, uint8_t pif,
 			if (!IN6_IS_ADDR_LOOPBACK(&s_in6.sin6_addr))
 				bind_if = c->ip6.ifname_out;
 
-			if (!IN6_IS_ADDR_UNSPECIFIED(&c->ip6.addr_out) &&
-			    !IN6_IS_ADDR_LOOPBACK(&s_in6.sin6_addr) &&
+			if (!IN6_IS_ADDR_LOOPBACK(&s_in6.sin6_addr) &&
 			    !IN6_IS_ADDR_LINKLOCAL(&s_in6.sin6_addr))
 				bind_addr = &c->ip6.addr_out;
 
-- 
@@ -877,8 +877,7 @@ int udp_tap_handler(struct ctx *c, uint8_t pif,
 			if (!IN4_IS_ADDR_LOOPBACK(&s_in.sin_addr))
 				bind_if = c->ip4.ifname_out;
 
-			if (!IN4_IS_ADDR_UNSPECIFIED(&c->ip4.addr_out) &&
-			    !IN4_IS_ADDR_LOOPBACK(&s_in.sin_addr))
+			if (!IN4_IS_ADDR_LOOPBACK(&s_in.sin_addr))
 				bind_addr = c->ip4.addr_out;
 
 			s = sock_l4(c, AF_INET, IPPROTO_UDP, &bind_addr,
@@ -929,8 +928,7 @@ int udp_tap_handler(struct ctx *c, uint8_t pif,
 			if (!IN6_IS_ADDR_LOOPBACK(&s_in6.sin6_addr))
 				bind_if = c->ip6.ifname_out;
 
-			if (!IN6_IS_ADDR_UNSPECIFIED(&c->ip6.addr_out) &&
-			    !IN6_IS_ADDR_LOOPBACK(&s_in6.sin6_addr) &&
+			if (!IN6_IS_ADDR_LOOPBACK(&s_in6.sin6_addr) &&
 			    !IN6_IS_ADDR_LINKLOCAL(&s_in6.sin6_addr))
 				bind_addr = &c->ip6.addr_out;
 
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/7] udp: Don't attempt to translate a 0.0.0.0 source address
  2024-02-21 23:21 ` [PATCH 1/7] udp: Don't attempt to translate a 0.0.0.0 source address David Gibson
@ 2024-02-22 17:46   ` Stefano Brivio
  2024-02-23  4:03     ` David Gibson
  0 siblings, 1 reply; 15+ messages in thread
From: Stefano Brivio @ 2024-02-22 17:46 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Thu, 22 Feb 2024 10:21:09 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> If an incoming packet has a source address of 0.0.0.0 we translate that to
> the gateway address.  This doesn't really make sense, because we have no
> way to do a reverse translation for reply packets.

Well, we would translate that back to a loopback address, which is fine
if we take 0.0.0.0 as "This host on this network". Actually, after my
previous note based on RFC 6890, I went and had a look at RFC 1122,
section 3.2.1.3, which also states that 0.0.0.0:

  MUST NOT be sent, except as a source address as part of an
  initialization procedure by which the host learns its own IP address.

...so I guess dropping it here is fine.

By the way, I added this originally as part of commit 6488c3e8489d
("tcp, udp: Replace loopback source address by gateway address") on the
basis that 0.0.0.0 could be used in lieu of a loopback address, but
sure, we shouldn't even get it from the kernel to start with.

> Certain UDP protocols do use an unspecified source address in some
> circumstances (e.g. DHCP).  These generally either require no reply, a
> multicast reply, or provide a suitable reply address by other means.
> 
> In none of those cases does translating it in passt/pasta make sense.  The
> best we can really do here is just leave it as is.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  udp.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/udp.c b/udp.c
> index a3961bfd..d2f8027c 100644
> --- a/udp.c
> +++ b/udp.c
> @@ -599,7 +599,6 @@ static size_t udp_update_hdr4(const struct ctx *c, int n, in_port_t dstport,
>  	    src_port == 53) {
>  		b->iph.saddr = c->ip4.dns_match.s_addr;
>  	} else if (IN4_IS_ADDR_LOOPBACK(&b->s_in.sin_addr) ||
> -		   IN4_IS_ADDR_UNSPECIFIED(&b->s_in.sin_addr)||
>  		   IN4_ARE_ADDR_EQUAL(&b->s_in.sin_addr, &c->ip4.addr_seen)) {
>  		b->iph.saddr = c->ip4.gw.s_addr;
>  		udp_tap_map[V4][src_port].ts = now->tv_sec;

-- 
Stefano


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/7] udp: Don't attempt to translate a 0.0.0.0 source address
  2024-02-22 17:46   ` Stefano Brivio
@ 2024-02-23  4:03     ` David Gibson
  0 siblings, 0 replies; 15+ messages in thread
From: David Gibson @ 2024-02-23  4:03 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 2978 bytes --]

On Thu, Feb 22, 2024 at 06:46:02PM +0100, Stefano Brivio wrote:
> On Thu, 22 Feb 2024 10:21:09 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > If an incoming packet has a source address of 0.0.0.0 we translate that to
> > the gateway address.  This doesn't really make sense, because we have no
> > way to do a reverse translation for reply packets.
> 
> Well, we would translate that back to a loopback address, which is fine
> if we take 0.0.0.0 as "This host on this network".

Not really, because "this host" has a different meaning to the sender
than it does to us.  For example, returning replies to DHCP broadcasts
to localhost would absolutely not be correct.  Of course, attempting
to run a DHCP server within passt/pasta sounds problematic, but my
point is that localhost is not a reasonable translation.

> Actually, after my
> previous note based on RFC 6890, I went and had a look at RFC 1122,
> section 3.2.1.3, which also states that 0.0.0.0:
> 
>   MUST NOT be sent, except as a source address as part of an
>   initialization procedure by which the host learns its own IP address.
> 
> ...so I guess dropping it here is fine.

I'm not dropping it: I'm leaving it untranslated.

> By the way, I added this originally as part of commit 6488c3e8489d
> ("tcp, udp: Replace loopback source address by gateway address") on the
> basis that 0.0.0.0 could be used in lieu of a loopback address, but
> sure, we shouldn't even get it from the kernel to start with.

Again, that's true for certain API calls, but AFAICT it's not true on
the wire.  At least it seems to be that way in practice, although I
haven't located an RFC to say that explicitly.

> > Certain UDP protocols do use an unspecified source address in some
> > circumstances (e.g. DHCP).  These generally either require no reply, a
> > multicast reply, or provide a suitable reply address by other means.
> > 
> > In none of those cases does translating it in passt/pasta make sense.  The
> > best we can really do here is just leave it as is.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  udp.c | 1 -
> >  1 file changed, 1 deletion(-)
> > 
> > diff --git a/udp.c b/udp.c
> > index a3961bfd..d2f8027c 100644
> > --- a/udp.c
> > +++ b/udp.c
> > @@ -599,7 +599,6 @@ static size_t udp_update_hdr4(const struct ctx *c, int n, in_port_t dstport,
> >  	    src_port == 53) {
> >  		b->iph.saddr = c->ip4.dns_match.s_addr;
> >  	} else if (IN4_IS_ADDR_LOOPBACK(&b->s_in.sin_addr) ||
> > -		   IN4_IS_ADDR_UNSPECIFIED(&b->s_in.sin_addr)||
> >  		   IN4_ARE_ADDR_EQUAL(&b->s_in.sin_addr, &c->ip4.addr_seen)) {
> >  		b->iph.saddr = c->ip4.gw.s_addr;
> >  		udp_tap_map[V4][src_port].ts = now->tv_sec;
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/7] udp: Unconditionally clear act flag in udp_timer_one()
  2024-02-21 23:21 ` [PATCH 3/7] udp: Unconditionally clear act flag in udp_timer_one() David Gibson
@ 2024-02-27 11:56   ` Stefano Brivio
  2024-02-28  1:54     ` David Gibson
  0 siblings, 1 reply; 15+ messages in thread
From: Stefano Brivio @ 2024-02-27 11:56 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

[Sorry, I wrote this comment a while ago and just now realised I didn't
send this out...]

On Thu, 22 Feb 2024 10:21:11 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> udp_timer_one() first checks for expiries of various sorts of sockets and
> if necessary sets the 'sockp' variable to trigger a cleanup at the end.
> If sockp is *not* set then, correctly, we don't attempt to close a
> non-existent socket.  However, we also don't clear the flag in the
> udp_act[] map, in which case we'll come back here and there will, again, be
> nothing to be done.

Well, but that's intended: if we didn't reach UDP_CONN_TIMEOUT for this
port, the socket should still be in udp_act[], meaning we'll check it
against activity timeouts and close on timeout.

Otherwise, we'll never check that port for the activity timeout, right?

> So, clear the udp_act[] flag, even if we don't need to clean up a socket.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  udp.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/udp.c b/udp.c
> index 03a5936e..4200f849 100644
> --- a/udp.c
> +++ b/udp.c
> @@ -1123,8 +1123,9 @@ static void udp_timer_one(struct ctx *c, int v6, enum udp_act_type type,
>  		*sockp = -1;
>  		epoll_ctl(c->epollfd, EPOLL_CTL_DEL, s, NULL);
>  		close(s);
> -		bitmap_clear(udp_act[v6 ? V6 : V4][type], port);
>  	}
> +
> +	bitmap_clear(udp_act[v6 ? V6 : V4][type], port);
>  }
>  
>  /**

-- 
Stefano


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/7] udp: Separate bound sockets from flags
  2024-02-21 23:21 ` [PATCH 4/7] udp: Separate bound sockets from flags David Gibson
@ 2024-02-27 11:56   ` Stefano Brivio
  2024-02-28  2:30     ` David Gibson
  0 siblings, 1 reply; 15+ messages in thread
From: Stefano Brivio @ 2024-02-27 11:56 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Thu, 22 Feb 2024 10:21:12 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> struct udp_tap_port contains both a socket fd and a set of flags. But,
> confusingly, these fields are not actually related to each other at all.
> 
> udp_tap_map[?][port].sock contains a host UDP socket bound to 'port' (or
> -1 if we haven't opened such a socket).  That is, it is indexed by the
> bound port == the host side forwarding port == destination for host->guest
> packets == source for guest->host packets.
> 
> udp_tap_map[?][port].flags, however, contains flags which control how we
> direct datagrams bound for 'port' from the tap interface.  That is, it is
> indexed by the destination for guest->host packets == source for
> host->guest packets == host side endpoint port.
> 
> Worse both aspects of this share the ts field used to control expiry.  When
> we update this because we've used the socket in slot 'port' we could be
> clobbering the flags timestamp for an unrelated flow of packets that
> happens to have the same endpoint port as our forwarding port.
> 
> Clarify this by moving the flags out into a separate array, with its own
> timestamp and expiry path.  As a bonus this means that we can now use the
> same data structure for tracking the socket fds of "tap" and "splice" path
> sockets.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  udp.c | 132 ++++++++++++++++++++++++++++++----------------------------
>  1 file changed, 68 insertions(+), 64 deletions(-)
> 
> diff --git a/udp.c b/udp.c
> index 4200f849..97518d92 100644
> --- a/udp.c
> +++ b/udp.c
> @@ -121,40 +121,39 @@
>  #define UDP_MAX_FRAMES		32  /* max # of frames to receive at once */
>  
>  /**
> - * struct udp_tap_port - Port tracking based on tap-facing source port
> - * @sock:	Socket bound to source port used as index
> + * struct udp_port_flags - Translation options by remote Port tracking based on tap-facing source port

I'm not sure what you mean by this -- perhaps "Address translation
options for bound source ports"?

>   * @flags:	Flags for local bind, loopback address/unicast address as source
> - * @ts:		Activity timestamp from tap, used for socket aging
> + * @ts:		Activity timestamp
>   */
> -struct udp_tap_port {
> -	int sock;
> -	uint8_t flags;
> +struct udp_port_flags {
>  #define PORT_LOCAL	BIT(0)
>  #define PORT_LOOPBACK	BIT(1)
>  #define PORT_GUA	BIT(2)
> -
> +	uint8_t flags;
>  	time_t ts;
>  };
> +static struct udp_port_flags udp_port_flags[IP_VERSIONS][NUM_PORTS];
>  
>  /**
> - * struct udp_splice_port - Bound socket for spliced communication
> - * @sock:	Socket bound to index port
> + * struct udp_bound_port - Track bound UDP sockets
> + * @sock:	Socket bound to index port (or -1)
>   * @ts:		Activity timestamp
>   */
> -struct udp_splice_port {
> +struct udp_bound_port {
>  	int sock;
>  	time_t ts;
>  };
>  
> -/* Port tracking, arrays indexed by packet source port (host order) */
> -static struct udp_tap_port	udp_tap_map	[IP_VERSIONS][NUM_PORTS];
> +/* Bound sockets indexed by bound port (host order) */
> +static struct udp_bound_port	udp_tap_sock	[IP_VERSIONS][NUM_PORTS];
>  
> -/* "Spliced" sockets indexed by bound port (host order) */
> -static struct udp_splice_port udp_splice_ns  [IP_VERSIONS][NUM_PORTS];
> -static struct udp_splice_port udp_splice_init[IP_VERSIONS][NUM_PORTS];
> +/* Bound sockets for splicing indexed by bound port (host order) */
> +static struct udp_bound_port	udp_splice_ns	[IP_VERSIONS][NUM_PORTS];
> +static struct udp_bound_port	udp_splice_init	[IP_VERSIONS][NUM_PORTS];
>  
>  enum udp_act_type {
> -	UDP_ACT_TAP,
> +	UDP_ACT_TAP_SOCK,
> +	UDP_ACT_FLAGS,

I'm having a hard time reviewing the rest because of this, I think.
Earlier, UDP_ACT_TAP meant that the activity timestamp was observed on
a non-spliced port.

Now, we have UDP_ACT_FLAGS which seems to mean that activity was seen
in the direction of (to) the "tap" side, and UDP_ACT_TAP_SOCK meaning
that activity was seen in the direction of (to) the socket.

But that's somehow the opposite of UDP_ACT_SPLICE_NS and
UDP_ACT_SPLICE_INIT -- even though they don't actually indicate
activity, rather the fact that activity timestamps refer to sockets that
originated in (*from*) the target namespace, or in the initial
namespace.

Anyway, I'm not quite sure: why do we need two separate flags for "tap"
(from/to) activity?

I mean, as long as we observe activity on a non-spliced flow, we'll
update the related timestamp, and we make sure the activity flag is
set. Can't it simply be UDP_ACT_TAP?

The rest of the series looks good to me.

>  	UDP_ACT_SPLICE_NS,
>  	UDP_ACT_SPLICE_INIT,
>  	UDP_ACT_TYPE_MAX,
> @@ -246,7 +245,7 @@ void udp_portmap_clear(void)
>  	unsigned i;
>  
>  	for (i = 0; i < NUM_PORTS; i++) {
> -		udp_tap_map[V4][i].sock = udp_tap_map[V6][i].sock = -1;
> +		udp_tap_sock[V4][i].sock = udp_tap_sock[V6][i].sock = -1;
>  		udp_splice_ns[V4][i].sock = udp_splice_ns[V6][i].sock = -1;
>  		udp_splice_init[V4][i].sock = udp_splice_init[V6][i].sock = -1;
>  	}
> @@ -393,16 +392,16 @@ int udp_splice_new(const struct ctx *c, int v6, in_port_t src, bool ns)
>  	union epoll_ref ref = { .type = EPOLL_TYPE_UDP,
>  				.udp = { .splice = true, .v6 = v6, .port = src }
>  			      };
> -	struct udp_splice_port *sp;
> +	struct udp_bound_port *bp;
>  	int act, s;
>  
>  	if (ns) {
>  		ref.udp.pif = PIF_SPLICE;
> -		sp = &udp_splice_ns[v6 ? V6 : V4][src];
> +		bp = &udp_splice_ns[v6 ? V6 : V4][src];
>  		act = UDP_ACT_SPLICE_NS;
>  	} else {
>  		ref.udp.pif = PIF_HOST;
> -		sp = &udp_splice_init[v6 ? V6 : V4][src];
> +		bp = &udp_splice_init[v6 ? V6 : V4][src];
>  		act = UDP_ACT_SPLICE_INIT;
>  	}
>  
> @@ -437,7 +436,7 @@ int udp_splice_new(const struct ctx *c, int v6, in_port_t src, bool ns)
>  			goto fail;
>  	}
>  
> -	sp->sock = s;
> +	bp->sock = s;
>  	bitmap_set(udp_act[v6 ? V6 : V4][act], src);
>  
>  	ev.data.u64 = ref.u64;
> @@ -601,15 +600,15 @@ static size_t udp_update_hdr4(const struct ctx *c, int n, in_port_t dstport,
>  	} else if (IN4_IS_ADDR_LOOPBACK(&b->s_in.sin_addr) ||
>  		   IN4_ARE_ADDR_EQUAL(&b->s_in.sin_addr, &c->ip4.addr_seen)) {
>  		b->iph.saddr = c->ip4.gw.s_addr;
> -		udp_tap_map[V4][src_port].ts = now->tv_sec;
> -		udp_tap_map[V4][src_port].flags |= PORT_LOCAL;
> +		udp_port_flags[V4][src_port].ts = now->tv_sec;
> +		udp_port_flags[V4][src_port].flags |= PORT_LOCAL;
>  
>  		if (IN4_ARE_ADDR_EQUAL(&b->s_in.sin_addr.s_addr, &c->ip4.addr_seen))
> -			udp_tap_map[V4][src_port].flags &= ~PORT_LOOPBACK;
> +			udp_port_flags[V4][src_port].flags &= ~PORT_LOOPBACK;
>  		else
> -			udp_tap_map[V4][src_port].flags |= PORT_LOOPBACK;
> +			udp_port_flags[V4][src_port].flags |= PORT_LOOPBACK;
>  
> -		bitmap_set(udp_act[V4][UDP_ACT_TAP], src_port);
> +		bitmap_set(udp_act[V4][UDP_ACT_FLAGS], src_port);
>  	} else {
>  		b->iph.saddr = b->s_in.sin_addr.s_addr;
>  	}
> @@ -664,20 +663,20 @@ static size_t udp_update_hdr6(const struct ctx *c, int n, in_port_t dstport,
>  		else
>  			b->ip6h.saddr = c->ip6.addr_ll;
>  
> -		udp_tap_map[V6][src_port].ts = now->tv_sec;
> -		udp_tap_map[V6][src_port].flags |= PORT_LOCAL;
> +		udp_port_flags[V6][src_port].ts = now->tv_sec;
> +		udp_port_flags[V6][src_port].flags |= PORT_LOCAL;
>  
>  		if (IN6_IS_ADDR_LOOPBACK(src))
> -			udp_tap_map[V6][src_port].flags |= PORT_LOOPBACK;
> +			udp_port_flags[V6][src_port].flags |= PORT_LOOPBACK;
>  		else
> -			udp_tap_map[V6][src_port].flags &= ~PORT_LOOPBACK;
> +			udp_port_flags[V6][src_port].flags &= ~PORT_LOOPBACK;
>  
>  		if (IN6_ARE_ADDR_EQUAL(src, &c->ip6.addr))
> -			udp_tap_map[V6][src_port].flags |= PORT_GUA;
> +			udp_port_flags[V6][src_port].flags |= PORT_GUA;
>  		else
> -			udp_tap_map[V6][src_port].flags &= ~PORT_GUA;
> +			udp_port_flags[V6][src_port].flags &= ~PORT_GUA;
>  
> -		bitmap_set(udp_act[V6][UDP_ACT_TAP], src_port);
> +		bitmap_set(udp_act[V6][UDP_ACT_FLAGS], src_port);
>  	} else {
>  		b->ip6h.daddr = c->ip6.addr_seen;
>  		b->ip6h.saddr = b->s_in6.sin6_addr;
> @@ -857,16 +856,16 @@ int udp_tap_handler(struct ctx *c, uint8_t pif,
>  			s_in.sin_addr = c->ip4.dns_host;
>  		} else if (IN4_ARE_ADDR_EQUAL(&s_in.sin_addr, &c->ip4.gw) &&
>  			   !c->no_map_gw) {
> -			if (!(udp_tap_map[V4][dst].flags & PORT_LOCAL) ||
> -			    (udp_tap_map[V4][dst].flags & PORT_LOOPBACK))
> +			if (!(udp_port_flags[V4][dst].flags & PORT_LOCAL) ||
> +			    (udp_port_flags[V4][dst].flags & PORT_LOOPBACK))
>  				s_in.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
>  			else
>  				s_in.sin_addr = c->ip4.addr_seen;
>  		}
>  
>  		debug("UDP from tap src=%hu dst=%hu, s=%d",
> -		      src, dst, udp_tap_map[V4][src].sock);
> -		if ((s = udp_tap_map[V4][src].sock) < 0) {
> +		      src, dst, udp_tap_sock[V4][src].sock);
> +		if ((s = udp_tap_sock[V4][src].sock) < 0) {
>  			struct in_addr bind_addr = IN4ADDR_ANY_INIT;
>  			union udp_epoll_ref uref = {
>  				.port = src,
> @@ -886,11 +885,11 @@ int udp_tap_handler(struct ctx *c, uint8_t pif,
>  			if (s < 0)
>  				return p->count - idx;
>  
> -			udp_tap_map[V4][src].sock = s;
> -			bitmap_set(udp_act[V4][UDP_ACT_TAP], src);
> +			udp_tap_sock[V4][src].sock = s;
> +			bitmap_set(udp_act[V4][UDP_ACT_TAP_SOCK], src);
>  		}
>  
> -		udp_tap_map[V4][src].ts = now->tv_sec;
> +		udp_tap_sock[V4][src].ts = now->tv_sec;
>  	} else {
>  		s_in6 = (struct sockaddr_in6) {
>  			.sin6_family = AF_INET6,
> @@ -907,10 +906,10 @@ int udp_tap_handler(struct ctx *c, uint8_t pif,
>  			s_in6.sin6_addr = c->ip6.dns_host;
>  		} else if (IN6_ARE_ADDR_EQUAL(daddr, &c->ip6.gw) &&
>  			   !c->no_map_gw) {
> -			if (!(udp_tap_map[V6][dst].flags & PORT_LOCAL) ||
> -			    (udp_tap_map[V6][dst].flags & PORT_LOOPBACK))
> +			if (!(udp_port_flags[V6][dst].flags & PORT_LOCAL) ||
> +			    (udp_port_flags[V6][dst].flags & PORT_LOOPBACK))
>  				s_in6.sin6_addr = in6addr_loopback;
> -			else if (udp_tap_map[V6][dst].flags & PORT_GUA)
> +			else if (udp_port_flags[V6][dst].flags & PORT_GUA)
>  				s_in6.sin6_addr = c->ip6.addr;
>  			else
>  				s_in6.sin6_addr = c->ip6.addr_seen;
> @@ -918,7 +917,7 @@ int udp_tap_handler(struct ctx *c, uint8_t pif,
>  			bind_addr = &c->ip6.addr_ll;
>  		}
>  
> -		if ((s = udp_tap_map[V6][src].sock) < 0) {
> +		if ((s = udp_tap_sock[V6][src].sock) < 0) {
>  			union udp_epoll_ref uref = {
>  				.v6 = 1,
>  				.port = src,
> @@ -939,11 +938,11 @@ int udp_tap_handler(struct ctx *c, uint8_t pif,
>  			if (s < 0)
>  				return p->count - idx;
>  
> -			udp_tap_map[V6][src].sock = s;
> -			bitmap_set(udp_act[V6][UDP_ACT_TAP], src);
> +			udp_tap_sock[V6][src].sock = s;
> +			bitmap_set(udp_act[V6][UDP_ACT_TAP_SOCK], src);
>  		}
>  
> -		udp_tap_map[V6][src].ts = now->tv_sec;
> +		udp_tap_sock[V6][src].ts = now->tv_sec;
>  	}
>  
>  	for (i = 0; i < (int)p->count - idx; i++) {
> @@ -1015,7 +1014,7 @@ int udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
>  			r4 = s = sock_l4(c, AF_INET, IPPROTO_UDP, addr,
>  					 ifname, port, uref.u32);
>  
> -			udp_tap_map[V4][uref.port].sock = s < 0 ? -1 : s;
> +			udp_tap_sock[V4][uref.port].sock = s < 0 ? -1 : s;
>  			udp_splice_init[V4][port].sock = s < 0 ? -1 : s;
>  		} else {
>  			struct in_addr loopback = IN4ADDR_LOOPBACK_INIT;
> @@ -1033,7 +1032,7 @@ int udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
>  			r6 = s = sock_l4(c, AF_INET6, IPPROTO_UDP, addr,
>  					 ifname, port, uref.u32);
>  
> -			udp_tap_map[V6][uref.port].sock = s < 0 ? -1 : s;
> +			udp_tap_sock[V6][uref.port].sock = s < 0 ? -1 : s;
>  			udp_splice_init[V6][port].sock = s < 0 ? -1 : s;
>  		} else {
>  			r6 = s = sock_l4(c, AF_INET6, IPPROTO_UDP,
> @@ -1086,32 +1085,37 @@ static void udp_splice_iov_init(void)
>  static void udp_timer_one(struct ctx *c, int v6, enum udp_act_type type,
>  			  in_port_t port, const struct timespec *now)
>  {
> -	struct udp_splice_port *sp;
> -	struct udp_tap_port *tp;
> +	struct udp_bound_port *bp;
> +	struct udp_port_flags *pf;
>  	int *sockp = NULL;
>  
>  	switch (type) {
> -	case UDP_ACT_TAP:
> -		tp = &udp_tap_map[v6 ? V6 : V4][port];
> +	case UDP_ACT_TAP_SOCK:
> +		bp = &udp_tap_sock[v6 ? V6 : V4][port];
>  
> -		if (now->tv_sec - tp->ts > UDP_CONN_TIMEOUT) {
> -			sockp = &tp->sock;
> -			tp->flags = 0;
> -		}
> +		if (now->tv_sec - bp->ts > UDP_CONN_TIMEOUT)
> +			sockp = &bp->sock;
> +
> +		break;
> +	case UDP_ACT_FLAGS:
> +		pf = &udp_port_flags[v6 ? V6 : V4][port];
> +
> +		if (now->tv_sec - pf->ts > UDP_CONN_TIMEOUT)
> +			pf->flags = 0;
>  
>  		break;
>  	case UDP_ACT_SPLICE_INIT:
> -		sp = &udp_splice_init[v6 ? V6 : V4][port];
> +		bp = &udp_splice_init[v6 ? V6 : V4][port];
>  
> -		if (now->tv_sec - sp->ts > UDP_CONN_TIMEOUT)
> -			sockp = &sp->sock;
> +		if (now->tv_sec - bp->ts > UDP_CONN_TIMEOUT)
> +			sockp = &bp->sock;
>  
>  		break;
>  	case UDP_ACT_SPLICE_NS:
> -		sp = &udp_splice_ns[v6 ? V6 : V4][port];
> +		bp = &udp_splice_ns[v6 ? V6 : V4][port];
>  
> -		if (now->tv_sec - sp->ts > UDP_CONN_TIMEOUT)
> -			sockp = &sp->sock;
> +		if (now->tv_sec - bp->ts > UDP_CONN_TIMEOUT)
> +			sockp = &bp->sock;
>  
>  		break;
>  	default:
> @@ -1141,7 +1145,7 @@ static void udp_port_rebind(struct ctx *c, bool outbound)
>  		= outbound ? c->udp.fwd_out.f.map : c->udp.fwd_in.f.map;
>  	const uint8_t *rmap
>  		= outbound ? c->udp.fwd_in.f.map : c->udp.fwd_out.f.map;
> -	struct udp_splice_port (*socks)[NUM_PORTS]
> +	struct udp_bound_port (*socks)[NUM_PORTS]
>  		= outbound ? udp_splice_ns : udp_splice_init;
>  	unsigned port;
>  

-- 
Stefano


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/7] udp: Unconditionally clear act flag in udp_timer_one()
  2024-02-27 11:56   ` Stefano Brivio
@ 2024-02-28  1:54     ` David Gibson
  0 siblings, 0 replies; 15+ messages in thread
From: David Gibson @ 2024-02-28  1:54 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 1904 bytes --]

On Tue, Feb 27, 2024 at 12:56:32PM +0100, Stefano Brivio wrote:
> [Sorry, I wrote this comment a while ago and just now realised I didn't
> send this out...]
> 
> On Thu, 22 Feb 2024 10:21:11 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > udp_timer_one() first checks for expiries of various sorts of sockets and
> > if necessary sets the 'sockp' variable to trigger a cleanup at the end.
> > If sockp is *not* set then, correctly, we don't attempt to close a
> > non-existent socket.  However, we also don't clear the flag in the
> > udp_act[] map, in which case we'll come back here and there will, again, be
> > nothing to be done.
> 
> Well, but that's intended: if we didn't reach UDP_CONN_TIMEOUT for this
> port, the socket should still be in udp_act[], meaning we'll check it
> against activity timeouts and close on timeout.
> 
> Otherwise, we'll never check that port for the activity timeout,
> right?

Ah, bother.  Yes, my patch is entirely wrong, I'll drop.

> 
> > So, clear the udp_act[] flag, even if we don't need to clean up a socket.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  udp.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/udp.c b/udp.c
> > index 03a5936e..4200f849 100644
> > --- a/udp.c
> > +++ b/udp.c
> > @@ -1123,8 +1123,9 @@ static void udp_timer_one(struct ctx *c, int v6, enum udp_act_type type,
> >  		*sockp = -1;
> >  		epoll_ctl(c->epollfd, EPOLL_CTL_DEL, s, NULL);
> >  		close(s);
> > -		bitmap_clear(udp_act[v6 ? V6 : V4][type], port);
> >  	}
> > +
> > +	bitmap_clear(udp_act[v6 ? V6 : V4][type], port);
> >  }
> >  
> >  /**
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/7] udp: Separate bound sockets from flags
  2024-02-27 11:56   ` Stefano Brivio
@ 2024-02-28  2:30     ` David Gibson
  2024-02-28  3:53       ` David Gibson
  0 siblings, 1 reply; 15+ messages in thread
From: David Gibson @ 2024-02-28  2:30 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 15779 bytes --]

On Tue, Feb 27, 2024 at 12:56:59PM +0100, Stefano Brivio wrote:
> On Thu, 22 Feb 2024 10:21:12 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > struct udp_tap_port contains both a socket fd and a set of flags. But,
> > confusingly, these fields are not actually related to each other at all.
> > 
> > udp_tap_map[?][port].sock contains a host UDP socket bound to 'port' (or
> > -1 if we haven't opened such a socket).  That is, it is indexed by the
> > bound port == the host side forwarding port == destination for host->guest
> > packets == source for guest->host packets.
> > 
> > udp_tap_map[?][port].flags, however, contains flags which control how we
> > direct datagrams bound for 'port' from the tap interface.  That is, it is
> > indexed by the destination for guest->host packets == source for
> > host->guest packets == host side endpoint port.
> > 
> > Worse both aspects of this share the ts field used to control expiry.  When
> > we update this because we've used the socket in slot 'port' we could be
> > clobbering the flags timestamp for an unrelated flow of packets that
> > happens to have the same endpoint port as our forwarding port.
> > 
> > Clarify this by moving the flags out into a separate array, with its own
> > timestamp and expiry path.  As a bonus this means that we can now use the
> > same data structure for tracking the socket fds of "tap" and "splice" path
> > sockets.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  udp.c | 132 ++++++++++++++++++++++++++++++----------------------------
> >  1 file changed, 68 insertions(+), 64 deletions(-)
> > 
> > diff --git a/udp.c b/udp.c
> > index 4200f849..97518d92 100644
> > --- a/udp.c
> > +++ b/udp.c
> > @@ -121,40 +121,39 @@
> >  #define UDP_MAX_FRAMES		32  /* max # of frames to receive at once */
> >  
> >  /**
> > - * struct udp_tap_port - Port tracking based on tap-facing source port
> > - * @sock:	Socket bound to source port used as index
> > + * struct udp_port_flags - Translation options by remote Port tracking based on tap-facing source port
> 
> I'm not sure what you mean by this -- perhaps "Address translation
> options for bound source ports"?

Oops, yes.  I think I partially rewrote an existing comment, leaving
an incoherent fusion of two different sentences.

> >   * @flags:	Flags for local bind, loopback address/unicast address as source
> > - * @ts:		Activity timestamp from tap, used for socket aging
> > + * @ts:		Activity timestamp
> >   */
> > -struct udp_tap_port {
> > -	int sock;
> > -	uint8_t flags;
> > +struct udp_port_flags {
> >  #define PORT_LOCAL	BIT(0)
> >  #define PORT_LOOPBACK	BIT(1)
> >  #define PORT_GUA	BIT(2)
> > -
> > +	uint8_t flags;
> >  	time_t ts;
> >  };
> > +static struct udp_port_flags udp_port_flags[IP_VERSIONS][NUM_PORTS];
> >  
> >  /**
> > - * struct udp_splice_port - Bound socket for spliced communication
> > - * @sock:	Socket bound to index port
> > + * struct udp_bound_port - Track bound UDP sockets
> > + * @sock:	Socket bound to index port (or -1)
> >   * @ts:		Activity timestamp
> >   */
> > -struct udp_splice_port {
> > +struct udp_bound_port {
> >  	int sock;
> >  	time_t ts;
> >  };
> >  
> > -/* Port tracking, arrays indexed by packet source port (host order) */
> > -static struct udp_tap_port	udp_tap_map	[IP_VERSIONS][NUM_PORTS];
> > +/* Bound sockets indexed by bound port (host order) */
> > +static struct udp_bound_port	udp_tap_sock	[IP_VERSIONS][NUM_PORTS];
> >  
> > -/* "Spliced" sockets indexed by bound port (host order) */
> > -static struct udp_splice_port udp_splice_ns  [IP_VERSIONS][NUM_PORTS];
> > -static struct udp_splice_port udp_splice_init[IP_VERSIONS][NUM_PORTS];
> > +/* Bound sockets for splicing indexed by bound port (host order) */
> > +static struct udp_bound_port	udp_splice_ns	[IP_VERSIONS][NUM_PORTS];
> > +static struct udp_bound_port	udp_splice_init	[IP_VERSIONS][NUM_PORTS];
> >  
> >  enum udp_act_type {
> > -	UDP_ACT_TAP,
> > +	UDP_ACT_TAP_SOCK,
> > +	UDP_ACT_FLAGS,
> 
> I'm having a hard time reviewing the rest because of this, I think.
> Earlier, UDP_ACT_TAP meant that the activity timestamp was observed on
> a non-spliced port.

Ah!  I'd read those as "ACTions we needed to take" in the expiry
timers, rather than tying them to specific activity.  It amounts to
the same thing pre-patch, but it makes more sense to me with that new
lens.

> Now, we have UDP_ACT_FLAGS which seems to mean that activity was seen
> in the direction of (to) the "tap" side, and UDP_ACT_TAP_SOCK meaning
> that activity was seen in the direction of (to) the socket.
> 
> But that's somehow the opposite of UDP_ACT_SPLICE_NS and
> UDP_ACT_SPLICE_INIT -- even though they don't actually indicate
> activity, rather the fact that activity timestamps refer to sockets that
> originated in (*from*) the target namespace, or in the initial
> namespace.
> 
> Anyway, I'm not quite sure: why do we need two separate flags for "tap"
> (from/to) activity?

We don't, this is an ugly side effect.  The problem is that in
scanning for expiries we only have a single port number.  But the
socket is associated with a particular forwarding port, while the
flags are associated with a particular endpoint port.  Without flow
tracking, we don't have a way to match one to the other.

The compromise here is - or is supposed to be - that we have
independent timestamps for the socket and the flags.  The two
timestamps associated with a single flow are supposed to be updated at
basically the same time - looks like I missed some things to do that
as well.

> I mean, as long as we observe activity on a non-spliced flow, we'll
> update the related timestamp, and we make sure the activity flag is
> set. Can't it simply be UDP_ACT_TAP?
> 
> The rest of the series looks good to me.
> 
> >  	UDP_ACT_SPLICE_NS,
> >  	UDP_ACT_SPLICE_INIT,
> >  	UDP_ACT_TYPE_MAX,
> > @@ -246,7 +245,7 @@ void udp_portmap_clear(void)
> >  	unsigned i;
> >  
> >  	for (i = 0; i < NUM_PORTS; i++) {
> > -		udp_tap_map[V4][i].sock = udp_tap_map[V6][i].sock = -1;
> > +		udp_tap_sock[V4][i].sock = udp_tap_sock[V6][i].sock = -1;
> >  		udp_splice_ns[V4][i].sock = udp_splice_ns[V6][i].sock = -1;
> >  		udp_splice_init[V4][i].sock = udp_splice_init[V6][i].sock = -1;
> >  	}
> > @@ -393,16 +392,16 @@ int udp_splice_new(const struct ctx *c, int v6, in_port_t src, bool ns)
> >  	union epoll_ref ref = { .type = EPOLL_TYPE_UDP,
> >  				.udp = { .splice = true, .v6 = v6, .port = src }
> >  			      };
> > -	struct udp_splice_port *sp;
> > +	struct udp_bound_port *bp;
> >  	int act, s;
> >  
> >  	if (ns) {
> >  		ref.udp.pif = PIF_SPLICE;
> > -		sp = &udp_splice_ns[v6 ? V6 : V4][src];
> > +		bp = &udp_splice_ns[v6 ? V6 : V4][src];
> >  		act = UDP_ACT_SPLICE_NS;
> >  	} else {
> >  		ref.udp.pif = PIF_HOST;
> > -		sp = &udp_splice_init[v6 ? V6 : V4][src];
> > +		bp = &udp_splice_init[v6 ? V6 : V4][src];
> >  		act = UDP_ACT_SPLICE_INIT;
> >  	}
> >  
> > @@ -437,7 +436,7 @@ int udp_splice_new(const struct ctx *c, int v6, in_port_t src, bool ns)
> >  			goto fail;
> >  	}
> >  
> > -	sp->sock = s;
> > +	bp->sock = s;
> >  	bitmap_set(udp_act[v6 ? V6 : V4][act], src);
> >  
> >  	ev.data.u64 = ref.u64;
> > @@ -601,15 +600,15 @@ static size_t udp_update_hdr4(const struct ctx *c, int n, in_port_t dstport,
> >  	} else if (IN4_IS_ADDR_LOOPBACK(&b->s_in.sin_addr) ||
> >  		   IN4_ARE_ADDR_EQUAL(&b->s_in.sin_addr, &c->ip4.addr_seen)) {
> >  		b->iph.saddr = c->ip4.gw.s_addr;
> > -		udp_tap_map[V4][src_port].ts = now->tv_sec;
> > -		udp_tap_map[V4][src_port].flags |= PORT_LOCAL;
> > +		udp_port_flags[V4][src_port].ts = now->tv_sec;
> > +		udp_port_flags[V4][src_port].flags |= PORT_LOCAL;
> >  
> >  		if (IN4_ARE_ADDR_EQUAL(&b->s_in.sin_addr.s_addr, &c->ip4.addr_seen))
> > -			udp_tap_map[V4][src_port].flags &= ~PORT_LOOPBACK;
> > +			udp_port_flags[V4][src_port].flags &= ~PORT_LOOPBACK;
> >  		else
> > -			udp_tap_map[V4][src_port].flags |= PORT_LOOPBACK;
> > +			udp_port_flags[V4][src_port].flags |= PORT_LOOPBACK;
> >  
> > -		bitmap_set(udp_act[V4][UDP_ACT_TAP], src_port);
> > +		bitmap_set(udp_act[V4][UDP_ACT_FLAGS], src_port);
> >  	} else {
> >  		b->iph.saddr = b->s_in.sin_addr.s_addr;
> >  	}
> > @@ -664,20 +663,20 @@ static size_t udp_update_hdr6(const struct ctx *c, int n, in_port_t dstport,
> >  		else
> >  			b->ip6h.saddr = c->ip6.addr_ll;
> >  
> > -		udp_tap_map[V6][src_port].ts = now->tv_sec;
> > -		udp_tap_map[V6][src_port].flags |= PORT_LOCAL;
> > +		udp_port_flags[V6][src_port].ts = now->tv_sec;
> > +		udp_port_flags[V6][src_port].flags |= PORT_LOCAL;
> >  
> >  		if (IN6_IS_ADDR_LOOPBACK(src))
> > -			udp_tap_map[V6][src_port].flags |= PORT_LOOPBACK;
> > +			udp_port_flags[V6][src_port].flags |= PORT_LOOPBACK;
> >  		else
> > -			udp_tap_map[V6][src_port].flags &= ~PORT_LOOPBACK;
> > +			udp_port_flags[V6][src_port].flags &= ~PORT_LOOPBACK;
> >  
> >  		if (IN6_ARE_ADDR_EQUAL(src, &c->ip6.addr))
> > -			udp_tap_map[V6][src_port].flags |= PORT_GUA;
> > +			udp_port_flags[V6][src_port].flags |= PORT_GUA;
> >  		else
> > -			udp_tap_map[V6][src_port].flags &= ~PORT_GUA;
> > +			udp_port_flags[V6][src_port].flags &= ~PORT_GUA;
> >  
> > -		bitmap_set(udp_act[V6][UDP_ACT_TAP], src_port);
> > +		bitmap_set(udp_act[V6][UDP_ACT_FLAGS], src_port);
> >  	} else {
> >  		b->ip6h.daddr = c->ip6.addr_seen;
> >  		b->ip6h.saddr = b->s_in6.sin6_addr;
> > @@ -857,16 +856,16 @@ int udp_tap_handler(struct ctx *c, uint8_t pif,
> >  			s_in.sin_addr = c->ip4.dns_host;
> >  		} else if (IN4_ARE_ADDR_EQUAL(&s_in.sin_addr, &c->ip4.gw) &&
> >  			   !c->no_map_gw) {
> > -			if (!(udp_tap_map[V4][dst].flags & PORT_LOCAL) ||
> > -			    (udp_tap_map[V4][dst].flags & PORT_LOOPBACK))
> > +			if (!(udp_port_flags[V4][dst].flags & PORT_LOCAL) ||
> > +			    (udp_port_flags[V4][dst].flags & PORT_LOOPBACK))
> >  				s_in.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
> >  			else
> >  				s_in.sin_addr = c->ip4.addr_seen;
> >  		}
> >  
> >  		debug("UDP from tap src=%hu dst=%hu, s=%d",
> > -		      src, dst, udp_tap_map[V4][src].sock);
> > -		if ((s = udp_tap_map[V4][src].sock) < 0) {
> > +		      src, dst, udp_tap_sock[V4][src].sock);
> > +		if ((s = udp_tap_sock[V4][src].sock) < 0) {
> >  			struct in_addr bind_addr = IN4ADDR_ANY_INIT;
> >  			union udp_epoll_ref uref = {
> >  				.port = src,
> > @@ -886,11 +885,11 @@ int udp_tap_handler(struct ctx *c, uint8_t pif,
> >  			if (s < 0)
> >  				return p->count - idx;
> >  
> > -			udp_tap_map[V4][src].sock = s;
> > -			bitmap_set(udp_act[V4][UDP_ACT_TAP], src);
> > +			udp_tap_sock[V4][src].sock = s;
> > +			bitmap_set(udp_act[V4][UDP_ACT_TAP_SOCK], src);
> >  		}
> >  
> > -		udp_tap_map[V4][src].ts = now->tv_sec;
> > +		udp_tap_sock[V4][src].ts = now->tv_sec;
> >  	} else {
> >  		s_in6 = (struct sockaddr_in6) {
> >  			.sin6_family = AF_INET6,
> > @@ -907,10 +906,10 @@ int udp_tap_handler(struct ctx *c, uint8_t pif,
> >  			s_in6.sin6_addr = c->ip6.dns_host;
> >  		} else if (IN6_ARE_ADDR_EQUAL(daddr, &c->ip6.gw) &&
> >  			   !c->no_map_gw) {
> > -			if (!(udp_tap_map[V6][dst].flags & PORT_LOCAL) ||
> > -			    (udp_tap_map[V6][dst].flags & PORT_LOOPBACK))
> > +			if (!(udp_port_flags[V6][dst].flags & PORT_LOCAL) ||
> > +			    (udp_port_flags[V6][dst].flags & PORT_LOOPBACK))
> >  				s_in6.sin6_addr = in6addr_loopback;
> > -			else if (udp_tap_map[V6][dst].flags & PORT_GUA)
> > +			else if (udp_port_flags[V6][dst].flags & PORT_GUA)
> >  				s_in6.sin6_addr = c->ip6.addr;
> >  			else
> >  				s_in6.sin6_addr = c->ip6.addr_seen;
> > @@ -918,7 +917,7 @@ int udp_tap_handler(struct ctx *c, uint8_t pif,
> >  			bind_addr = &c->ip6.addr_ll;
> >  		}
> >  
> > -		if ((s = udp_tap_map[V6][src].sock) < 0) {
> > +		if ((s = udp_tap_sock[V6][src].sock) < 0) {
> >  			union udp_epoll_ref uref = {
> >  				.v6 = 1,
> >  				.port = src,
> > @@ -939,11 +938,11 @@ int udp_tap_handler(struct ctx *c, uint8_t pif,
> >  			if (s < 0)
> >  				return p->count - idx;
> >  
> > -			udp_tap_map[V6][src].sock = s;
> > -			bitmap_set(udp_act[V6][UDP_ACT_TAP], src);
> > +			udp_tap_sock[V6][src].sock = s;
> > +			bitmap_set(udp_act[V6][UDP_ACT_TAP_SOCK], src);
> >  		}
> >  
> > -		udp_tap_map[V6][src].ts = now->tv_sec;
> > +		udp_tap_sock[V6][src].ts = now->tv_sec;
> >  	}
> >  
> >  	for (i = 0; i < (int)p->count - idx; i++) {
> > @@ -1015,7 +1014,7 @@ int udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
> >  			r4 = s = sock_l4(c, AF_INET, IPPROTO_UDP, addr,
> >  					 ifname, port, uref.u32);
> >  
> > -			udp_tap_map[V4][uref.port].sock = s < 0 ? -1 : s;
> > +			udp_tap_sock[V4][uref.port].sock = s < 0 ? -1 : s;
> >  			udp_splice_init[V4][port].sock = s < 0 ? -1 : s;
> >  		} else {
> >  			struct in_addr loopback = IN4ADDR_LOOPBACK_INIT;
> > @@ -1033,7 +1032,7 @@ int udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
> >  			r6 = s = sock_l4(c, AF_INET6, IPPROTO_UDP, addr,
> >  					 ifname, port, uref.u32);
> >  
> > -			udp_tap_map[V6][uref.port].sock = s < 0 ? -1 : s;
> > +			udp_tap_sock[V6][uref.port].sock = s < 0 ? -1 : s;
> >  			udp_splice_init[V6][port].sock = s < 0 ? -1 : s;
> >  		} else {
> >  			r6 = s = sock_l4(c, AF_INET6, IPPROTO_UDP,
> > @@ -1086,32 +1085,37 @@ static void udp_splice_iov_init(void)
> >  static void udp_timer_one(struct ctx *c, int v6, enum udp_act_type type,
> >  			  in_port_t port, const struct timespec *now)
> >  {
> > -	struct udp_splice_port *sp;
> > -	struct udp_tap_port *tp;
> > +	struct udp_bound_port *bp;
> > +	struct udp_port_flags *pf;
> >  	int *sockp = NULL;
> >  
> >  	switch (type) {
> > -	case UDP_ACT_TAP:
> > -		tp = &udp_tap_map[v6 ? V6 : V4][port];
> > +	case UDP_ACT_TAP_SOCK:
> > +		bp = &udp_tap_sock[v6 ? V6 : V4][port];
> >  
> > -		if (now->tv_sec - tp->ts > UDP_CONN_TIMEOUT) {
> > -			sockp = &tp->sock;
> > -			tp->flags = 0;
> > -		}
> > +		if (now->tv_sec - bp->ts > UDP_CONN_TIMEOUT)
> > +			sockp = &bp->sock;
> > +
> > +		break;
> > +	case UDP_ACT_FLAGS:
> > +		pf = &udp_port_flags[v6 ? V6 : V4][port];
> > +
> > +		if (now->tv_sec - pf->ts > UDP_CONN_TIMEOUT)
> > +			pf->flags = 0;
> >  
> >  		break;
> >  	case UDP_ACT_SPLICE_INIT:
> > -		sp = &udp_splice_init[v6 ? V6 : V4][port];
> > +		bp = &udp_splice_init[v6 ? V6 : V4][port];
> >  
> > -		if (now->tv_sec - sp->ts > UDP_CONN_TIMEOUT)
> > -			sockp = &sp->sock;
> > +		if (now->tv_sec - bp->ts > UDP_CONN_TIMEOUT)
> > +			sockp = &bp->sock;
> >  
> >  		break;
> >  	case UDP_ACT_SPLICE_NS:
> > -		sp = &udp_splice_ns[v6 ? V6 : V4][port];
> > +		bp = &udp_splice_ns[v6 ? V6 : V4][port];
> >  
> > -		if (now->tv_sec - sp->ts > UDP_CONN_TIMEOUT)
> > -			sockp = &sp->sock;
> > +		if (now->tv_sec - bp->ts > UDP_CONN_TIMEOUT)
> > +			sockp = &bp->sock;
> >  
> >  		break;
> >  	default:
> > @@ -1141,7 +1145,7 @@ static void udp_port_rebind(struct ctx *c, bool outbound)
> >  		= outbound ? c->udp.fwd_out.f.map : c->udp.fwd_in.f.map;
> >  	const uint8_t *rmap
> >  		= outbound ? c->udp.fwd_in.f.map : c->udp.fwd_out.f.map;
> > -	struct udp_splice_port (*socks)[NUM_PORTS]
> > +	struct udp_bound_port (*socks)[NUM_PORTS]
> >  		= outbound ? udp_splice_ns : udp_splice_init;
> >  	unsigned port;
> >  
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/7] udp: Separate bound sockets from flags
  2024-02-28  2:30     ` David Gibson
@ 2024-02-28  3:53       ` David Gibson
  0 siblings, 0 replies; 15+ messages in thread
From: David Gibson @ 2024-02-28  3:53 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 3183 bytes --]

On Wed, Feb 28, 2024 at 01:30:05PM +1100, David Gibson wrote:
> On Tue, Feb 27, 2024 at 12:56:59PM +0100, Stefano Brivio wrote:
> > On Thu, 22 Feb 2024 10:21:12 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:

[snip]
> > I'm having a hard time reviewing the rest because of this, I think.
> > Earlier, UDP_ACT_TAP meant that the activity timestamp was observed on
> > a non-spliced port.
> 
> Ah!  I'd read those as "ACTions we needed to take" in the expiry
> timers, rather than tying them to specific activity.  It amounts to
> the same thing pre-patch, but it makes more sense to me with that new
> lens.
> 
> > Now, we have UDP_ACT_FLAGS which seems to mean that activity was seen
> > in the direction of (to) the "tap" side, and UDP_ACT_TAP_SOCK meaning
> > that activity was seen in the direction of (to) the socket.
> > 
> > But that's somehow the opposite of UDP_ACT_SPLICE_NS and
> > UDP_ACT_SPLICE_INIT -- even though they don't actually indicate
> > activity, rather the fact that activity timestamps refer to sockets that
> > originated in (*from*) the target namespace, or in the initial
> > namespace.
> > 
> > Anyway, I'm not quite sure: why do we need two separate flags for "tap"
> > (from/to) activity?
> 
> We don't, this is an ugly side effect.  The problem is that in
> scanning for expiries we only have a single port number.  But the
> socket is associated with a particular forwarding port, while the
> flags are associated with a particular endpoint port.  Without flow
> tracking, we don't have a way to match one to the other.
> 
> The compromise here is - or is supposed to be - that we have
> independent timestamps for the socket and the flags.  The two
> timestamps associated with a single flow are supposed to be updated at
> basically the same time - looks like I missed some things to do that
> as well.
> 
> > I mean, as long as we observe activity on a non-spliced flow, we'll
> > update the related timestamp, and we make sure the activity flag is
> > set. Can't it simply be UDP_ACT_TAP?

Urgh.  So it seems like every time I re-examine this, I find more
pre-existing bugs.  So, part of what this patch was attempting to fix
originally is that previously, although we updated the timestamp host->guest
packets, we updated the wrong one for expiring the socket (based on
endpoint rather than forwarding port).

In trying to fix it up, I additionally realised, we currently don't
update the timestamp for host to guest packets at all, except in the
when remapping the src to gateway address, which as noted previously
itself doesn't honour the no_map_gw flag.

*Then* I realised that if we do update the correct timestamp and ACT
flag, we could now set an ACT flag for a socket created by -u, and
therefore cause a port that should remain open to be expired.

I don't think this any longer counts as a "small" fix, so I will
remove it from this series and tackle it elsewhere.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2024-02-28  3:55 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-21 23:21 [PATCH 0/7] Assorted small fixes for UDP David Gibson
2024-02-21 23:21 ` [PATCH 1/7] udp: Don't attempt to translate a 0.0.0.0 source address David Gibson
2024-02-22 17:46   ` Stefano Brivio
2024-02-23  4:03     ` David Gibson
2024-02-21 23:21 ` [PATCH 2/7] udp: Set pif in epoll reference for ephemeral host sockets David Gibson
2024-02-21 23:21 ` [PATCH 3/7] udp: Unconditionally clear act flag in udp_timer_one() David Gibson
2024-02-27 11:56   ` Stefano Brivio
2024-02-28  1:54     ` David Gibson
2024-02-21 23:21 ` [PATCH 4/7] udp: Separate bound sockets from flags David Gibson
2024-02-27 11:56   ` Stefano Brivio
2024-02-28  2:30     ` David Gibson
2024-02-28  3:53       ` David Gibson
2024-02-21 23:21 ` [PATCH 5/7] udp: Small streamline to udp_update_hdr4() David Gibson
2024-02-21 23:21 ` [PATCH 6/7] udp: Fix incorrect usage of IPv6 state in IPv4 path David Gibson
2024-02-21 23:21 ` [PATCH 7/7] udp: Remove unnecessary test for unspecified addr_out David Gibson

Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).