public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
* [PATCH v3 0/2] Reconstruct ICMP headers for failed UDP connect
@ 2025-02-19 19:30 Jon Maloy
  2025-02-19 19:30 ` [PATCH v3 1/2] tap: break out building of udp header from tap_udp4_send function Jon Maloy
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Jon Maloy @ 2025-02-19 19:30 UTC (permalink / raw)
  To: passt-dev, sbrivio, lvivier, dgibson, jmaloy

Reconstruct incoming ICMP headers for failed UDP connect and forward back
to local peer.

v2: - Added patch breaking out udp header creation from function
      tap_udp4_send().
    - Updated the ICMP creation by using the new function.
    - Added logics to find correct flow, depending on origin.
    - All done after feedback from David Gibson.
v3: - More changes after feedback from David Gibson.


Jon Maloy (2):
  tap: break out building of udp header from tap_udp4_send function
  udp: create and send ICMPv4 to local peer when applicable

 tap.c          | 36 ++++++++++++++-----
 tap.h          |  7 ++++
 udp.c          | 96 +++++++++++++++++++++++++++++++++++++++++++-------
 udp_internal.h |  3 +-
 udp_vu.c       |  4 +--
 5 files changed, 121 insertions(+), 25 deletions(-)

-- 
2.48.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v3 1/2] tap: break out building of udp header from tap_udp4_send function
  2025-02-19 19:30 [PATCH v3 0/2] Reconstruct ICMP headers for failed UDP connect Jon Maloy
@ 2025-02-19 19:30 ` Jon Maloy
  2025-02-20  1:08   ` David Gibson
  2025-02-19 19:30 ` [PATCH v3 2/2] udp: create and send ICMPv4 to local peer when applicable Jon Maloy
  2025-02-20  3:47 ` [PATCH v3 0/2] Reconstruct ICMP headers for failed UDP connect David Gibson
  2 siblings, 1 reply; 8+ messages in thread
From: Jon Maloy @ 2025-02-19 19:30 UTC (permalink / raw)
  To: passt-dev, sbrivio, lvivier, dgibson, jmaloy

We will need to build the UDP header at other locations than in function
tap_udp4_send(), so we break that part out to a separate function.

Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 tap.c | 32 +++++++++++++++++++++++++-------
 tap.h |  6 +++++-
 2 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/tap.c b/tap.c
index dfe6dcf..95d64bf 100644
--- a/tap.c
+++ b/tap.c
@@ -162,7 +162,7 @@ static void *tap_push_ip4h(struct iphdr *ip4h, struct in_addr src,
 }
 
 /**
- * tap_udp4_send() - Send UDP over IPv4 packet
+ * tap_push_uh4() - Build UDPv4 header with checksum
  * @c:		Execution context
  * @src:	IPv4 source address
  * @sport:	UDP source port
@@ -171,15 +171,11 @@ static void *tap_push_ip4h(struct iphdr *ip4h, struct in_addr src,
  * @in:		UDP payload contents (not including UDP header)
  * @dlen:	UDP payload length (not including UDP header)
  */
-void tap_udp4_send(const struct ctx *c, struct in_addr src, in_port_t sport,
+void *tap_push_uh4(struct udphdr *uh, struct in_addr src, in_port_t sport,
 		   struct in_addr dst, in_port_t dport,
 		   const void *in, size_t dlen)
 {
 	size_t l4len = dlen + sizeof(struct udphdr);
-	char buf[USHRT_MAX];
-	struct iphdr *ip4h = tap_push_l2h(c, buf, ETH_P_IP);
-	struct udphdr *uh = tap_push_ip4h(ip4h, src, dst, l4len, IPPROTO_UDP);
-	char *data = (char *)(uh + 1);
 	const struct iovec iov = {
 		.iov_base = (void *)in,
 		.iov_len = dlen
@@ -190,8 +186,30 @@ void tap_udp4_send(const struct ctx *c, struct in_addr src, in_port_t sport,
 	uh->dest = htons(dport);
 	uh->len = htons(l4len);
 	csum_udp4(uh, src, dst, &payload);
-	memcpy(data, in, dlen);
+	return uh + 1;
+}
+
+/**
+ * tap_udp4_send() - Send UDP over IPv4 packet
+ * @c:		Execution context
+ * @src:	IPv4 source address
+ * @sport:	UDP source port
+ * @dst:	IPv4 destination address
+ * @dport:	UDP destination port
+ * @in:		UDP payload contents (not including UDP header)
+ * @dlen:	UDP payload length (not including UDP header)
+ */
+void tap_udp4_send(const struct ctx *c, struct in_addr src, in_port_t sport,
+		   struct in_addr dst, in_port_t dport,
+		   const void *in, size_t dlen)
+{
+	size_t l4len = dlen + sizeof(struct udphdr);
+	char buf[USHRT_MAX];
+	struct iphdr *ip4h = tap_push_l2h(c, buf, ETH_P_IP);
+	struct udphdr *uh = tap_push_ip4h(ip4h, src, dst, l4len, IPPROTO_UDP);
+	char *data = tap_push_uh4(uh, src, sport, dst, dport, in, dlen);
 
+	memcpy(data, in, dlen);
 	tap_send_single(c, buf, dlen + (data - buf));
 }
 
diff --git a/tap.h b/tap.h
index dfbd8b9..3451343 100644
--- a/tap.h
+++ b/tap.h
@@ -8,6 +8,8 @@
 
 #define ETH_HDR_INIT(proto) { .h_proto = htons_constant(proto) }
 
+struct udphdr;
+
 /**
  * struct tap_hdr - tap backend specific headers
  * @vnet_len:	Frame length (for qemu socket transport)
@@ -43,7 +45,9 @@ static inline void tap_hdr_update(struct tap_hdr *thdr, size_t l2len)
 	if (thdr)
 		thdr->vnet_len = htonl(l2len);
 }
-
+void *tap_push_uh4(struct udphdr *uh, struct in_addr src, in_port_t sport,
+		   struct in_addr dst, in_port_t dport,
+		   const void *in, size_t dlen);
 void tap_udp4_send(const struct ctx *c, struct in_addr src, in_port_t sport,
 		   struct in_addr dst, in_port_t dport,
 		   const void *in, size_t dlen);
-- 
@@ -8,6 +8,8 @@
 
 #define ETH_HDR_INIT(proto) { .h_proto = htons_constant(proto) }
 
+struct udphdr;
+
 /**
  * struct tap_hdr - tap backend specific headers
  * @vnet_len:	Frame length (for qemu socket transport)
@@ -43,7 +45,9 @@ static inline void tap_hdr_update(struct tap_hdr *thdr, size_t l2len)
 	if (thdr)
 		thdr->vnet_len = htonl(l2len);
 }
-
+void *tap_push_uh4(struct udphdr *uh, struct in_addr src, in_port_t sport,
+		   struct in_addr dst, in_port_t dport,
+		   const void *in, size_t dlen);
 void tap_udp4_send(const struct ctx *c, struct in_addr src, in_port_t sport,
 		   struct in_addr dst, in_port_t dport,
 		   const void *in, size_t dlen);
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v3 2/2] udp: create and send ICMPv4 to local peer when applicable
  2025-02-19 19:30 [PATCH v3 0/2] Reconstruct ICMP headers for failed UDP connect Jon Maloy
  2025-02-19 19:30 ` [PATCH v3 1/2] tap: break out building of udp header from tap_udp4_send function Jon Maloy
@ 2025-02-19 19:30 ` Jon Maloy
  2025-02-20  3:13   ` David Gibson
  2025-02-20  3:47 ` [PATCH v3 0/2] Reconstruct ICMP headers for failed UDP connect David Gibson
  2 siblings, 1 reply; 8+ messages in thread
From: Jon Maloy @ 2025-02-19 19:30 UTC (permalink / raw)
  To: passt-dev, sbrivio, lvivier, dgibson, jmaloy

When a local peer sends a UDP message to a non-existing port on an
existing remote host, that host will return an ICMP message containing
the error code ICMP_PORT_UNREACH, plus the header and the first eight
bytes of the original message. If the sender socket has been connected,
it uses this message to issue a "Connection Refused" event to the user.

Until now, we have only read such events from the externally facing
socket, but we don't forward them back to the local sender because
we cannot read the ICMP message directly to user space. Because of
this, the local peer will hang and wait for a response that never
arrives.

We now fix this for IPv4 by recreating and forwarding a correct ICMP
message back to the internal sender. We synthesize the message based
on the information in the extended error structure, plus the returned
part of the original message body.

Note that for the sake of completeness, we even produce ICMP messages
for other error codes. We have noticed that at least ICMP_PROT_UNREACH
is propagated as an error event back to the user.

Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
v2: - Updated the ICMP creation to use the new function tap_push_uh4().
    - Added logics to find correct flow, depending on origin.
    - All done after feedback from David Gibson.
v3: - Passing parameter 'now' along with call to udp_sock_errs() call.
    - Corrected lookup of flow from listener socket
    - All done after feedback from David Gibson.
---
 tap.c          |  4 +--
 tap.h          |  3 ++
 udp.c          | 96 +++++++++++++++++++++++++++++++++++++++++++-------
 udp_internal.h |  3 +-
 udp_vu.c       |  4 +--
 5 files changed, 92 insertions(+), 18 deletions(-)

diff --git a/tap.c b/tap.c
index 95d64bf..902f076 100644
--- a/tap.c
+++ b/tap.c
@@ -142,8 +142,8 @@ static void *tap_push_l2h(const struct ctx *c, void *buf, uint16_t proto)
  *
  * Return: pointer at which to write the packet's payload
  */
-static void *tap_push_ip4h(struct iphdr *ip4h, struct in_addr src,
-			   struct in_addr dst, size_t l4len, uint8_t proto)
+void *tap_push_ip4h(struct iphdr *ip4h, struct in_addr src,
+		    struct in_addr dst, size_t l4len, uint8_t proto)
 {
 	uint16_t l3len = l4len + sizeof(*ip4h);
 
diff --git a/tap.h b/tap.h
index 3451343..5b326f5 100644
--- a/tap.h
+++ b/tap.h
@@ -45,9 +45,12 @@ static inline void tap_hdr_update(struct tap_hdr *thdr, size_t l2len)
 	if (thdr)
 		thdr->vnet_len = htonl(l2len);
 }
+
 void *tap_push_uh4(struct udphdr *uh, struct in_addr src, in_port_t sport,
 		   struct in_addr dst, in_port_t dport,
 		   const void *in, size_t dlen);
+void *tap_push_ip4h(struct iphdr *ip4h, struct in_addr src,
+		    struct in_addr dst, size_t l4len, uint8_t proto);
 void tap_udp4_send(const struct ctx *c, struct in_addr src, in_port_t sport,
 		   struct in_addr dst, in_port_t dport,
 		   const void *in, size_t dlen);
diff --git a/udp.c b/udp.c
index 923cc38..b9c53eb 100644
--- a/udp.c
+++ b/udp.c
@@ -87,6 +87,7 @@
 #include <netinet/in.h>
 #include <netinet/ip.h>
 #include <netinet/udp.h>
+#include <netinet/ip_icmp.h>
 #include <stdint.h>
 #include <stddef.h>
 #include <string.h>
@@ -402,25 +403,72 @@ static void udp_tap_prepare(const struct mmsghdr *mmh,
 	(*tap_iov)[UDP_IOV_PAYLOAD].iov_len = l4len;
 }
 
+/**
+ * udp_send_conn_fail_icmp4() - Construct and send ICMP to local peer
+ * @c:	          Execution context
+ * @ee:	  Extended error descriptor
+ * @ref:	  epoll reference
+ * @in:	  First bytes (max 8) of original UDP message body
+ * @dlen:	  Length of the read part of original UDP message body
+ */
+static void udp_send_conn_fail_icmp4(const struct ctx *c,
+				     const struct sock_extended_err *ee,
+				     const struct flowside *toside,
+				     void *in, size_t dlen)
+{
+	struct in_addr oaddr = toside->oaddr.v4mapped.a4;
+	struct in_addr eaddr = toside->eaddr.v4mapped.a4;
+	in_port_t eport = toside->eport;
+	in_port_t oport = toside->oport;
+	struct {
+		struct icmphdr icmp4h;
+		struct iphdr ip4h;
+		struct udphdr uh;
+		char data[8];
+	} __attribute__((packed, aligned(__alignof__(max_align_t)))) msg;
+	size_t msglen = sizeof(msg) - sizeof(msg.data) + dlen;
+
+	memset(&msg, 0, sizeof(msg));
+	msg.icmp4h.type = ee->ee_type;
+	msg.icmp4h.code = ee->ee_code;
+
+	/* Reconstruct the original headers as returned in the ICMP message */
+	tap_push_ip4h(&msg.ip4h, eaddr, oaddr, dlen, IPPROTO_UDP);
+	tap_push_uh4(&msg.uh, eaddr, eport, oaddr, oport, in, dlen);
+	memcpy(&msg.data, in, dlen);
+
+	tap_icmp4_send(c, oaddr, eaddr, &msg, msglen);
+}
+
 /**
  * udp_sock_recverr() - Receive and clear an error from a socket
- * @s:		Socket to receive from
+ * @c:	          Execution context
+ * @ref:	  epoll reference
+ * @now:	  Current timestamp
  *
  * Return: 1 if error received and processed, 0 if no more errors in queue, < 0
  *         if there was an error reading the queue
  *
  * #syscalls recvmsg
  */
-static int udp_sock_recverr(int s)
+static int udp_sock_recverr(const struct ctx *c, union epoll_ref ref,
+			    const struct timespec *now)
 {
 	const struct sock_extended_err *ee;
 	const struct cmsghdr *hdr;
+	union sockaddr_inany saddr;
 	char buf[CMSG_SPACE(sizeof(*ee))];
+	char udp_data[8];
+	int s = ref.fd;
+	struct iovec iov = {
+		.iov_base = udp_data,
+		.iov_len = sizeof(udp_data)
+	};
 	struct msghdr mh = {
-		.msg_name = NULL,
-		.msg_namelen = 0,
-		.msg_iov = NULL,
-		.msg_iovlen = 0,
+		.msg_name = &saddr,
+		.msg_namelen = sizeof(saddr),
+		.msg_iov = &iov,
+		.msg_iovlen = 1,
 		.msg_control = buf,
 		.msg_controllen = sizeof(buf),
 	};
@@ -450,8 +498,27 @@ static int udp_sock_recverr(int s)
 	}
 
 	ee = (const struct sock_extended_err *)CMSG_DATA(hdr);
-
-	/* TODO: When possible propagate and otherwise handle errors */
+	if (ee->ee_type == ICMP_DEST_UNREACH) {
+		flow_sidx_t sidx;
+		struct udp_flow *flow;
+		const struct flowside *toside;
+
+		if (ref.type == EPOLL_TYPE_UDP_LISTEN) {
+			sidx = flow_lookup_sa(c, IPPROTO_UDP, ref.udp.pif,
+					      &saddr, ref.udp.port);
+			flow = udp_at_sidx(sidx);
+			if (!flow) {
+				err("Unexpected cmsg reading error queue");
+				return -1;
+			}
+			flow->ts = now->tv_sec;
+			sidx = flow_sidx_opposite(sidx);
+		} else {
+			sidx = flow_sidx_opposite(ref.flowside);
+		}
+		toside = flowside_at_sidx(sidx);
+		udp_send_conn_fail_icmp4(c, ee, toside, udp_data, rc);
+	}
 	debug("%s error on UDP socket %i: %s",
 	      str_ee_origin(ee), s, strerror_(ee->ee_errno));
 
@@ -461,15 +528,18 @@ static int udp_sock_recverr(int s)
 /**
  * udp_sock_errs() - Process errors on a socket
  * @c:		Execution context
- * @s:		Socket to receive from
+ * @ref:        epoll reference
  * @events:	epoll events bitmap
+ * @now:	Current timestamp
  *
  * Return: Number of errors handled, or < 0 if we have an unrecoverable error
  */
-int udp_sock_errs(const struct ctx *c, int s, uint32_t events)
+int udp_sock_errs(const struct ctx *c, union epoll_ref ref, uint32_t events,
+		  const struct timespec *now)
 {
 	unsigned n_err = 0;
 	socklen_t errlen;
+	int s = ref.fd;
 	int rc, err;
 
 	ASSERT(!c->no_udp);
@@ -478,7 +548,7 @@ int udp_sock_errs(const struct ctx *c, int s, uint32_t events)
 		return 0; /* Nothing to do */
 
 	/* Empty the error queue */
-	while ((rc = udp_sock_recverr(s)) > 0)
+	while ((rc = udp_sock_recverr(c, ref, now)) > 0)
 		n_err += rc;
 
 	if (rc < 0)
@@ -558,7 +628,7 @@ static void udp_buf_listen_sock_handler(const struct ctx *c,
 	const socklen_t sasize = sizeof(udp_meta[0].s_in);
 	int n, i;
 
-	if (udp_sock_errs(c, ref.fd, events) < 0) {
+	if (udp_sock_errs(c, ref, events, now) < 0) {
 		err("UDP: Unrecoverable error on listening socket:"
 		    " (%s port %hu)", pif_name(ref.udp.pif), ref.udp.port);
 		/* FIXME: what now?  close/re-open socket? */
@@ -661,7 +731,7 @@ static void udp_buf_reply_sock_handler(const struct ctx *c, union epoll_ref ref,
 
 	from_s = uflow->s[ref.flowside.sidei];
 
-	if (udp_sock_errs(c, from_s, events) < 0) {
+	if (udp_sock_errs(c, ref, events, now) < 0) {
 		flow_err(uflow, "Unrecoverable error on reply socket");
 		flow_err_details(uflow);
 		udp_flow_close(c, uflow);
diff --git a/udp_internal.h b/udp_internal.h
index cc80e30..c5f8304 100644
--- a/udp_internal.h
+++ b/udp_internal.h
@@ -30,5 +30,6 @@ size_t udp_update_hdr4(struct iphdr *ip4h, struct udp_payload_t *bp,
 size_t udp_update_hdr6(struct ipv6hdr *ip6h, struct udp_payload_t *bp,
                        const struct flowside *toside, size_t dlen,
 		       bool no_udp_csum);
-int udp_sock_errs(const struct ctx *c, int s, uint32_t events);
+int udp_sock_errs(const struct ctx *c, union epoll_ref ref, uint32_t events,
+		  const struct timespec *now);
 #endif /* UDP_INTERNAL_H */
diff --git a/udp_vu.c b/udp_vu.c
index 4123510..0b1d3c6 100644
--- a/udp_vu.c
+++ b/udp_vu.c
@@ -227,7 +227,7 @@ void udp_vu_listen_sock_handler(const struct ctx *c, union epoll_ref ref,
 	struct vu_virtq *vq = &vdev->vq[VHOST_USER_RX_QUEUE];
 	int i;
 
-	if (udp_sock_errs(c, ref.fd, events) < 0) {
+	if (udp_sock_errs(c, ref, events, now) < 0) {
 		err("UDP: Unrecoverable error on listening socket:"
 		    " (%s port %hu)", pif_name(ref.udp.pif), ref.udp.port);
 		return;
@@ -302,7 +302,7 @@ void udp_vu_reply_sock_handler(const struct ctx *c, union epoll_ref ref,
 
 	ASSERT(!c->no_udp);
 
-	if (udp_sock_errs(c, from_s, events) < 0) {
+	if (udp_sock_errs(c, ref, events, now) < 0) {
 		flow_err(uflow, "Unrecoverable error on reply socket");
 		flow_err_details(uflow);
 		udp_flow_close(c, uflow);
-- 
@@ -227,7 +227,7 @@ void udp_vu_listen_sock_handler(const struct ctx *c, union epoll_ref ref,
 	struct vu_virtq *vq = &vdev->vq[VHOST_USER_RX_QUEUE];
 	int i;
 
-	if (udp_sock_errs(c, ref.fd, events) < 0) {
+	if (udp_sock_errs(c, ref, events, now) < 0) {
 		err("UDP: Unrecoverable error on listening socket:"
 		    " (%s port %hu)", pif_name(ref.udp.pif), ref.udp.port);
 		return;
@@ -302,7 +302,7 @@ void udp_vu_reply_sock_handler(const struct ctx *c, union epoll_ref ref,
 
 	ASSERT(!c->no_udp);
 
-	if (udp_sock_errs(c, from_s, events) < 0) {
+	if (udp_sock_errs(c, ref, events, now) < 0) {
 		flow_err(uflow, "Unrecoverable error on reply socket");
 		flow_err_details(uflow);
 		udp_flow_close(c, uflow);
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 1/2] tap: break out building of udp header from tap_udp4_send function
  2025-02-19 19:30 ` [PATCH v3 1/2] tap: break out building of udp header from tap_udp4_send function Jon Maloy
@ 2025-02-20  1:08   ` David Gibson
  0 siblings, 0 replies; 8+ messages in thread
From: David Gibson @ 2025-02-20  1:08 UTC (permalink / raw)
  To: Jon Maloy; +Cc: passt-dev, sbrivio, lvivier, dgibson

[-- Attachment #1: Type: text/plain, Size: 3839 bytes --]

On Wed, Feb 19, 2025 at 02:30:06PM -0500, Jon Maloy wrote:
> We will need to build the UDP header at other locations than in function
> tap_udp4_send(), so we break that part out to a separate function.
> 
> Signed-off-by: Jon Maloy <jmaloy@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  tap.c | 32 +++++++++++++++++++++++++-------
>  tap.h |  6 +++++-
>  2 files changed, 30 insertions(+), 8 deletions(-)
> 
> diff --git a/tap.c b/tap.c
> index dfe6dcf..95d64bf 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -162,7 +162,7 @@ static void *tap_push_ip4h(struct iphdr *ip4h, struct in_addr src,
>  }
>  
>  /**
> - * tap_udp4_send() - Send UDP over IPv4 packet
> + * tap_push_uh4() - Build UDPv4 header with checksum
>   * @c:		Execution context
>   * @src:	IPv4 source address
>   * @sport:	UDP source port
> @@ -171,15 +171,11 @@ static void *tap_push_ip4h(struct iphdr *ip4h, struct in_addr src,
>   * @in:		UDP payload contents (not including UDP header)
>   * @dlen:	UDP payload length (not including UDP header)
>   */
> -void tap_udp4_send(const struct ctx *c, struct in_addr src, in_port_t sport,
> +void *tap_push_uh4(struct udphdr *uh, struct in_addr src, in_port_t sport,
>  		   struct in_addr dst, in_port_t dport,
>  		   const void *in, size_t dlen)
>  {
>  	size_t l4len = dlen + sizeof(struct udphdr);
> -	char buf[USHRT_MAX];
> -	struct iphdr *ip4h = tap_push_l2h(c, buf, ETH_P_IP);
> -	struct udphdr *uh = tap_push_ip4h(ip4h, src, dst, l4len, IPPROTO_UDP);
> -	char *data = (char *)(uh + 1);
>  	const struct iovec iov = {
>  		.iov_base = (void *)in,
>  		.iov_len = dlen
> @@ -190,8 +186,30 @@ void tap_udp4_send(const struct ctx *c, struct in_addr src, in_port_t sport,
>  	uh->dest = htons(dport);
>  	uh->len = htons(l4len);
>  	csum_udp4(uh, src, dst, &payload);
> -	memcpy(data, in, dlen);
> +	return uh + 1;
> +}
> +
> +/**
> + * tap_udp4_send() - Send UDP over IPv4 packet
> + * @c:		Execution context
> + * @src:	IPv4 source address
> + * @sport:	UDP source port
> + * @dst:	IPv4 destination address
> + * @dport:	UDP destination port
> + * @in:		UDP payload contents (not including UDP header)
> + * @dlen:	UDP payload length (not including UDP header)
> + */
> +void tap_udp4_send(const struct ctx *c, struct in_addr src, in_port_t sport,
> +		   struct in_addr dst, in_port_t dport,
> +		   const void *in, size_t dlen)
> +{
> +	size_t l4len = dlen + sizeof(struct udphdr);
> +	char buf[USHRT_MAX];
> +	struct iphdr *ip4h = tap_push_l2h(c, buf, ETH_P_IP);
> +	struct udphdr *uh = tap_push_ip4h(ip4h, src, dst, l4len, IPPROTO_UDP);
> +	char *data = tap_push_uh4(uh, src, sport, dst, dport, in, dlen);
>  
> +	memcpy(data, in, dlen);
>  	tap_send_single(c, buf, dlen + (data - buf));
>  }
>  
> diff --git a/tap.h b/tap.h
> index dfbd8b9..3451343 100644
> --- a/tap.h
> +++ b/tap.h
> @@ -8,6 +8,8 @@
>  
>  #define ETH_HDR_INIT(proto) { .h_proto = htons_constant(proto) }
>  
> +struct udphdr;
> +
>  /**
>   * struct tap_hdr - tap backend specific headers
>   * @vnet_len:	Frame length (for qemu socket transport)
> @@ -43,7 +45,9 @@ static inline void tap_hdr_update(struct tap_hdr *thdr, size_t l2len)
>  	if (thdr)
>  		thdr->vnet_len = htonl(l2len);
>  }
> -
> +void *tap_push_uh4(struct udphdr *uh, struct in_addr src, in_port_t sport,
> +		   struct in_addr dst, in_port_t dport,
> +		   const void *in, size_t dlen);
>  void tap_udp4_send(const struct ctx *c, struct in_addr src, in_port_t sport,
>  		   struct in_addr dst, in_port_t dport,
>  		   const void *in, size_t dlen);

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 2/2] udp: create and send ICMPv4 to local peer when applicable
  2025-02-19 19:30 ` [PATCH v3 2/2] udp: create and send ICMPv4 to local peer when applicable Jon Maloy
@ 2025-02-20  3:13   ` David Gibson
  0 siblings, 0 replies; 8+ messages in thread
From: David Gibson @ 2025-02-20  3:13 UTC (permalink / raw)
  To: Jon Maloy; +Cc: passt-dev, sbrivio, lvivier, dgibson

[-- Attachment #1: Type: text/plain, Size: 6168 bytes --]

On Wed, Feb 19, 2025 at 02:30:07PM -0500, Jon Maloy wrote:
> When a local peer sends a UDP message to a non-existing port on an
> existing remote host, that host will return an ICMP message containing
> the error code ICMP_PORT_UNREACH, plus the header and the first eight
> bytes of the original message. If the sender socket has been connected,
> it uses this message to issue a "Connection Refused" event to the user.
> 
> Until now, we have only read such events from the externally facing
> socket, but we don't forward them back to the local sender because
> we cannot read the ICMP message directly to user space. Because of
> this, the local peer will hang and wait for a response that never
> arrives.
> 
> We now fix this for IPv4 by recreating and forwarding a correct ICMP
> message back to the internal sender. We synthesize the message based
> on the information in the extended error structure, plus the returned
> part of the original message body.
> 
> Note that for the sake of completeness, we even produce ICMP messages
> for other error codes. We have noticed that at least ICMP_PROT_UNREACH
> is propagated as an error event back to the user.
> 
> Signed-off-by: Jon Maloy <jmaloy@redhat.com>

Almost there...

[snip]
>  	ee = (const struct sock_extended_err *)CMSG_DATA(hdr);
> -
> -	/* TODO: When possible propagate and otherwise handle errors */
> +	if (ee->ee_type == ICMP_DEST_UNREACH) {
> +		flow_sidx_t sidx;
> +		struct udp_flow *flow;
> +		const struct flowside *toside;
> +
> +		if (ref.type == EPOLL_TYPE_UDP_LISTEN) {
> +			sidx = flow_lookup_sa(c, IPPROTO_UDP, ref.udp.pif,
> +					      &saddr, ref.udp.port);
> +			flow = udp_at_sidx(sidx);
> +			if (!flow) {
> +				err("Unexpected cmsg reading error queue");
> +				return -1;

This is too strong a response if you can't find a flow.  This will
happen if we get some random ICMP which doesn't match an existing
flow.  It could be a long delayed ICMP from some flow that already
expired, some peer confused by something, or a malicious actor.  This
will log a hard error (without rate limiting) and break scanning for
any additional errors.

I think we want just a trace() and return 1; here (we did receive and
process an error - but all we could do was drop it).

> +			}
> +			flow->ts = now->tv_sec;
> +			sidx = flow_sidx_opposite(sidx);
> +		} else {
> +			sidx = flow_sidx_opposite(ref.flowside);
> +		}
> +		toside = flowside_at_sidx(sidx);
> +		udp_send_conn_fail_icmp4(c, ee, toside, udp_data, rc);
> +	}
>  	debug("%s error on UDP socket %i: %s",
>  	      str_ee_origin(ee), s, strerror_(ee->ee_errno));
>  
> @@ -461,15 +528,18 @@ static int udp_sock_recverr(int s)
>  /**
>   * udp_sock_errs() - Process errors on a socket
>   * @c:		Execution context
> - * @s:		Socket to receive from
> + * @ref:        epoll reference
>   * @events:	epoll events bitmap
> + * @now:	Current timestamp
>   *
>   * Return: Number of errors handled, or < 0 if we have an unrecoverable error
>   */
> -int udp_sock_errs(const struct ctx *c, int s, uint32_t events)
> +int udp_sock_errs(const struct ctx *c, union epoll_ref ref, uint32_t events,
> +		  const struct timespec *now)
>  {
>  	unsigned n_err = 0;
>  	socklen_t errlen;
> +	int s = ref.fd;
>  	int rc, err;
>  
>  	ASSERT(!c->no_udp);
> @@ -478,7 +548,7 @@ int udp_sock_errs(const struct ctx *c, int s, uint32_t events)
>  		return 0; /* Nothing to do */
>  
>  	/* Empty the error queue */
> -	while ((rc = udp_sock_recverr(s)) > 0)
> +	while ((rc = udp_sock_recverr(c, ref, now)) > 0)
>  		n_err += rc;
>  
>  	if (rc < 0)
> @@ -558,7 +628,7 @@ static void udp_buf_listen_sock_handler(const struct ctx *c,
>  	const socklen_t sasize = sizeof(udp_meta[0].s_in);
>  	int n, i;
>  
> -	if (udp_sock_errs(c, ref.fd, events) < 0) {
> +	if (udp_sock_errs(c, ref, events, now) < 0) {
>  		err("UDP: Unrecoverable error on listening socket:"
>  		    " (%s port %hu)", pif_name(ref.udp.pif), ref.udp.port);
>  		/* FIXME: what now?  close/re-open socket? */
> @@ -661,7 +731,7 @@ static void udp_buf_reply_sock_handler(const struct ctx *c, union epoll_ref ref,
>  
>  	from_s = uflow->s[ref.flowside.sidei];
>  
> -	if (udp_sock_errs(c, from_s, events) < 0) {
> +	if (udp_sock_errs(c, ref, events, now) < 0) {
>  		flow_err(uflow, "Unrecoverable error on reply socket");
>  		flow_err_details(uflow);
>  		udp_flow_close(c, uflow);
> diff --git a/udp_internal.h b/udp_internal.h
> index cc80e30..c5f8304 100644
> --- a/udp_internal.h
> +++ b/udp_internal.h
> @@ -30,5 +30,6 @@ size_t udp_update_hdr4(struct iphdr *ip4h, struct udp_payload_t *bp,
>  size_t udp_update_hdr6(struct ipv6hdr *ip6h, struct udp_payload_t *bp,
>                         const struct flowside *toside, size_t dlen,
>  		       bool no_udp_csum);
> -int udp_sock_errs(const struct ctx *c, int s, uint32_t events);
> +int udp_sock_errs(const struct ctx *c, union epoll_ref ref, uint32_t events,
> +		  const struct timespec *now);
>  #endif /* UDP_INTERNAL_H */
> diff --git a/udp_vu.c b/udp_vu.c
> index 4123510..0b1d3c6 100644
> --- a/udp_vu.c
> +++ b/udp_vu.c
> @@ -227,7 +227,7 @@ void udp_vu_listen_sock_handler(const struct ctx *c, union epoll_ref ref,
>  	struct vu_virtq *vq = &vdev->vq[VHOST_USER_RX_QUEUE];
>  	int i;
>  
> -	if (udp_sock_errs(c, ref.fd, events) < 0) {
> +	if (udp_sock_errs(c, ref, events, now) < 0) {
>  		err("UDP: Unrecoverable error on listening socket:"
>  		    " (%s port %hu)", pif_name(ref.udp.pif), ref.udp.port);
>  		return;
> @@ -302,7 +302,7 @@ void udp_vu_reply_sock_handler(const struct ctx *c, union epoll_ref ref,
>  
>  	ASSERT(!c->no_udp);
>  
> -	if (udp_sock_errs(c, from_s, events) < 0) {
> +	if (udp_sock_errs(c, ref, events, now) < 0) {
>  		flow_err(uflow, "Unrecoverable error on reply socket");
>  		flow_err_details(uflow);
>  		udp_flow_close(c, uflow);

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 0/2] Reconstruct ICMP headers for failed UDP connect
  2025-02-19 19:30 [PATCH v3 0/2] Reconstruct ICMP headers for failed UDP connect Jon Maloy
  2025-02-19 19:30 ` [PATCH v3 1/2] tap: break out building of udp header from tap_udp4_send function Jon Maloy
  2025-02-19 19:30 ` [PATCH v3 2/2] udp: create and send ICMPv4 to local peer when applicable Jon Maloy
@ 2025-02-20  3:47 ` David Gibson
  2025-02-20 16:08   ` Jon Maloy
  2 siblings, 1 reply; 8+ messages in thread
From: David Gibson @ 2025-02-20  3:47 UTC (permalink / raw)
  To: Jon Maloy; +Cc: passt-dev, sbrivio, lvivier, dgibson

[-- Attachment #1: Type: text/plain, Size: 2505 bytes --]

On Wed, Feb 19, 2025 at 02:30:05PM -0500, Jon Maloy wrote:
> Reconstruct incoming ICMP headers for failed UDP connect and forward back
> to local peer.
> 
> v2: - Added patch breaking out udp header creation from function
>       tap_udp4_send().
>     - Updated the ICMP creation by using the new function.
>     - Added logics to find correct flow, depending on origin.
>     - All done after feedback from David Gibson.
> v3: - More changes after feedback from David Gibson.

I gave this a test.  The outbound "connection" version works nicely:

$ ./pasta --config-net socat STDIO UDP4:<gw address>:9999
Multiple default IPv4 routes, picked first
Multiple default IPv6 routes, picked first
qwer
2025/02/20 14:39:08 socat[1] E read(5, 0x55c12494f000, 8192): Connection refused

I also tried with an inbound "connection" to test the handling of
errors on listening sockets.  There I,

	1. Start a server in pasta
	2. Connect and write with a client on the host
	3. Kill the client
	4. Attempt to more data from the server

Without pasta in the way, this gives a similar Connection refused
error on the server socat.  With pasta in the way, it doesn't, even
with this patch.

Looking at an strace I suspect the problem is that the kernel doesn't
deliver EPOLLERR events to non-connect()ed sockets for ICMPs, so pasta
never knows to look for an error.  It think it works for plain socat,
because it - even in server mode - connect()s its socket.  pasta
doesn't however, and just sends the outbound packets via the original
"listening" socket.

The only way I can see to fix this would be to create connect()ed
sockets for both ends of a flow when we establish it.  I _think_ we can
have the listening and connected socket concurrently (at least with
REUSEADDR) with the connect()ed socket taking priority when it
matches.

However, it does have a complication: there's a brief time window
between bind() and connect() on the new socket, when it might pick up
packets that should go to the original listening socket.  We'd need to
decide what to do with that case.

In any case, it kind of looks like there's not much point trying to
handle error events on the listening socket (beyond a debug()
message), since I'm not sure such events are ever delivered.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 0/2] Reconstruct ICMP headers for failed UDP connect
  2025-02-20  3:47 ` [PATCH v3 0/2] Reconstruct ICMP headers for failed UDP connect David Gibson
@ 2025-02-20 16:08   ` Jon Maloy
  2025-02-21  2:25     ` David Gibson
  0 siblings, 1 reply; 8+ messages in thread
From: Jon Maloy @ 2025-02-20 16:08 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev, sbrivio, lvivier, dgibson



On 2025-02-19 22:47, David Gibson wrote:
> On Wed, Feb 19, 2025 at 02:30:05PM -0500, Jon Maloy wrote:
>> Reconstruct incoming ICMP headers for failed UDP connect and forward back
>> to local peer.
>>
>> v2: - Added patch breaking out udp header creation from function
>>        tap_udp4_send().
>>      - Updated the ICMP creation by using the new function.
>>      - Added logics to find correct flow, depending on origin.
>>      - All done after feedback from David Gibson.
>> v3: - More changes after feedback from David Gibson.
> 

[...]

> "listening" socket.
> 
> The only way I can see to fix this would be to create connect()ed
> sockets for both ends of a flow when we establish it.  I _think_ we can
> have the listening and connected socket concurrently (at least with
> REUSEADDR) with the connect()ed socket taking priority when it
> matches.
> 
> However, it does have a complication: there's a brief time window
> between bind() and connect() on the new socket, when it might pick up
> packets that should go to the original listening socket.  We'd need to
> decide what to do with that case.
> 
> In any case, it kind of looks like there's not much point trying to
> handle error events on the listening socket (beyond a debug()
> message), since I'm not sure such events are ever delivered.
> 

I eliminated the handling of listener sockets ICMPs in my latest version.
I can create a separate connected socket in a separate patch,
if you agree that is the way to go.

///jon


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 0/2] Reconstruct ICMP headers for failed UDP connect
  2025-02-20 16:08   ` Jon Maloy
@ 2025-02-21  2:25     ` David Gibson
  0 siblings, 0 replies; 8+ messages in thread
From: David Gibson @ 2025-02-21  2:25 UTC (permalink / raw)
  To: Jon Maloy; +Cc: passt-dev, sbrivio, lvivier, dgibson

[-- Attachment #1: Type: text/plain, Size: 2136 bytes --]

On Thu, Feb 20, 2025 at 11:08:28AM -0500, Jon Maloy wrote:
> 
> 
> On 2025-02-19 22:47, David Gibson wrote:
> > On Wed, Feb 19, 2025 at 02:30:05PM -0500, Jon Maloy wrote:
> > > Reconstruct incoming ICMP headers for failed UDP connect and forward back
> > > to local peer.
> > > 
> > > v2: - Added patch breaking out udp header creation from function
> > >        tap_udp4_send().
> > >      - Updated the ICMP creation by using the new function.
> > >      - Added logics to find correct flow, depending on origin.
> > >      - All done after feedback from David Gibson.
> > > v3: - More changes after feedback from David Gibson.
> > 
> 
> [...]
> 
> > "listening" socket.
> > 
> > The only way I can see to fix this would be to create connect()ed
> > sockets for both ends of a flow when we establish it.  I _think_ we can
> > have the listening and connected socket concurrently (at least with
> > REUSEADDR) with the connect()ed socket taking priority when it
> > matches.
> > 
> > However, it does have a complication: there's a brief time window
> > between bind() and connect() on the new socket, when it might pick up
> > packets that should go to the original listening socket.  We'd need to
> > decide what to do with that case.
> > 
> > In any case, it kind of looks like there's not much point trying to
> > handle error events on the listening socket (beyond a debug()
> > message), since I'm not sure such events are ever delivered.
> > 
> 
> I eliminated the handling of listener sockets ICMPs in my latest
> version.

I think that makes sense for now.

> I can create a separate connected socket in a separate patch,
> if you agree that is the way to go.

That's reasonable.  Fwiw someone momentarily popped up on the IRC and
suggested trying to read MSG_ERRQUEUE for the listening sockets on
EPOLLIN events.  I don't know if that will do anything useful, but you
could try.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-02-21  6:03 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-02-19 19:30 [PATCH v3 0/2] Reconstruct ICMP headers for failed UDP connect Jon Maloy
2025-02-19 19:30 ` [PATCH v3 1/2] tap: break out building of udp header from tap_udp4_send function Jon Maloy
2025-02-20  1:08   ` David Gibson
2025-02-19 19:30 ` [PATCH v3 2/2] udp: create and send ICMPv4 to local peer when applicable Jon Maloy
2025-02-20  3:13   ` David Gibson
2025-02-20  3:47 ` [PATCH v3 0/2] Reconstruct ICMP headers for failed UDP connect David Gibson
2025-02-20 16:08   ` Jon Maloy
2025-02-21  2:25     ` David Gibson

Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).