public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: David Gibson <david@gibson.dropbear.id.au>
To: Jon Maloy <jmaloy@redhat.com>
Cc: passt-dev@passt.top, sbrivio@redhat.com, lvivier@redhat.com,
	dgibson@redhat.com
Subject: Re: [2/4] udp: create and send ICMPv4 to local peer when applicable
Date: Wed, 5 Mar 2025 11:23:01 +1100	[thread overview]
Message-ID: <Z8eZZSGX3XqOljAF@zatzit> (raw)
In-Reply-To: <20250304224532.1770263-3-jmaloy@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 11327 bytes --]

On Tue, Mar 04, 2025 at 05:45:30PM -0500, Jon Maloy wrote:
> When a local peer sends a UDP message to a non-existing port on an
> existing remote host, that host will return an ICMP message containing
> the error code ICMP_PORT_UNREACH, plus the header and the first eight
> bytes of the original message. If the sender socket has been connected,
> it uses this message to issue a "Connection Refused" event to the user.
> 
> Until now, we have only read such events from the externally facing
> socket, but we don't forward them back to the local sender because
> we cannot read the ICMP message directly to user space. Because of
> this, the local peer will hang and wait for a response that never
> arrives.
> 
> We now fix this for IPv4 by recreating and forwarding a correct ICMP
> message back to the internal sender. We synthesize the message based
> on the information in the extended error structure, plus the returned
> part of the original message body.
> 
> Note that for the sake of completeness, we even produce ICMP messages
> for other error codes. We have noticed that at least ICMP_PROT_UNREACH
> is propagated as an error event back to the user.
> 
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Jon Maloy <jmaloy@redhat.com>
> ---
> v2: - Updated the ICMP creation to use the new function tap_push_uh4().
>     - Added logics to find correct flow, depending on origin.
>     - All done after feedback from David Gibson.
> v3: - Passing parameter 'now' along with call to udp_sock_errs() call.
>     - Corrected lookup of flow from listener socket
>     - All done after feedback from David Gibson.
> v4: - Eliminate any attempt to handle ICMPs from listener sockets.
>     - After feedback from David Gibson.
> v5: - Small reshuffle in anticipation of following commits.
> v6: - Introduced macro ICMP4_MAX_DLEN
> v7: - Introduced ASSERT() on data lenght size
>     - Added MTU for ICMPv4 ICMP_FRAG_NEEDED messages
> v8: - Distinguishing between ICMP source address and origininal UDP
>       destination address.
> v9: - Fixed wrong l4len value given to tap_push_ip4h()

Ah, yes.  I also missed that in the previous version.

Again,

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  tap.c          |  4 +--
>  tap.h          |  2 ++
>  udp.c          | 87 +++++++++++++++++++++++++++++++++++++++++++-------
>  udp_internal.h |  2 +-
>  udp_vu.c       |  4 +--
>  5 files changed, 82 insertions(+), 17 deletions(-)
> 
> diff --git a/tap.c b/tap.c
> index d5c8edd..97416ac 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -143,8 +143,8 @@ static void *tap_push_l2h(const struct ctx *c, void *buf, uint16_t proto)
>   *
>   * Return: pointer at which to write the packet's payload
>   */
> -static void *tap_push_ip4h(struct iphdr *ip4h, struct in_addr src,
> -			   struct in_addr dst, size_t l4len, uint8_t proto)
> +void *tap_push_ip4h(struct iphdr *ip4h, struct in_addr src,
> +		    struct in_addr dst, size_t l4len, uint8_t proto)
>  {
>  	uint16_t l3len = l4len + sizeof(*ip4h);
>  
> diff --git a/tap.h b/tap.h
> index 0b3eb93..37da21d 100644
> --- a/tap.h
> +++ b/tap.h
> @@ -47,6 +47,8 @@ static inline void tap_hdr_update(struct tap_hdr *thdr, size_t l2len)
>  void *tap_push_uh4(struct udphdr *uh, struct in_addr src, in_port_t sport,
>  		   struct in_addr dst, in_port_t dport,
>  		   const void *in, size_t dlen);
> +void *tap_push_ip4h(struct iphdr *ip4h, struct in_addr src,
> +		    struct in_addr dst, size_t l4len, uint8_t proto);
>  void tap_udp4_send(const struct ctx *c, struct in_addr src, in_port_t sport,
>  		   struct in_addr dst, in_port_t dport,
>  		   const void *in, size_t dlen);
> diff --git a/udp.c b/udp.c
> index 923cc38..c260e3d 100644
> --- a/udp.c
> +++ b/udp.c
> @@ -87,6 +87,7 @@
>  #include <netinet/in.h>
>  #include <netinet/ip.h>
>  #include <netinet/udp.h>
> +#include <netinet/ip_icmp.h>
>  #include <stdint.h>
>  #include <stddef.h>
>  #include <string.h>
> @@ -112,6 +113,9 @@
>  #include "udp_internal.h"
>  #include "udp_vu.h"
>  
> +/* Maximum UDP data to be returned in ICMP messages */
> +#define ICMP4_MAX_DLEN 8
> +
>  /* "Spliced" sockets indexed by bound port (host order) */
>  static int udp_splice_ns  [IP_VERSIONS][NUM_PORTS];
>  static int udp_splice_init[IP_VERSIONS][NUM_PORTS];
> @@ -402,25 +406,76 @@ static void udp_tap_prepare(const struct mmsghdr *mmh,
>  	(*tap_iov)[UDP_IOV_PAYLOAD].iov_len = l4len;
>  }
>  
> +/**
> + * udp_send_conn_fail_icmp4() - Construct and send ICMPv4 to local peer
> + * @c:		Execution context
> + * @ee:	Extended error descriptor
> + * @toside:	Destination side of flow
> + * @saddr:	Address of ICMP generating node
> + * @in:	First bytes (max 8) of original UDP message body
> + * @dlen:	Length of the read part of original UDP message body
> + */
> +static void udp_send_conn_fail_icmp4(const struct ctx *c,
> +				     const struct sock_extended_err *ee,
> +				     const struct flowside *toside,
> +				     struct in_addr saddr,
> +				     void *in, size_t dlen)
> +{
> +	struct in_addr oaddr = toside->oaddr.v4mapped.a4;
> +	struct in_addr eaddr = toside->eaddr.v4mapped.a4;
> +	in_port_t eport = toside->eport;
> +	in_port_t oport = toside->oport;
> +	struct {
> +		struct icmphdr icmp4h;
> +		struct iphdr ip4h;
> +		struct udphdr uh;
> +		char data[ICMP4_MAX_DLEN];
> +	} __attribute__((packed, aligned(__alignof__(max_align_t)))) msg;
> +	size_t msglen = sizeof(msg) - sizeof(msg.data) + dlen;
> +	size_t l4len = dlen + sizeof(struct udphdr);
> +
> +	ASSERT(dlen <= ICMP4_MAX_DLEN);
> +	memset(&msg, 0, sizeof(msg));
> +	msg.icmp4h.type = ee->ee_type;
> +	msg.icmp4h.code = ee->ee_code;
> +	if (ee->ee_type == ICMP_DEST_UNREACH && ee->ee_code == ICMP_FRAG_NEEDED)
> +		msg.icmp4h.un.frag.mtu = htons((uint16_t) ee->ee_info);
> +
> +	/* Reconstruct the original headers as returned in the ICMP message */
> +	tap_push_ip4h(&msg.ip4h, eaddr, oaddr, l4len, IPPROTO_UDP);
> +	tap_push_uh4(&msg.uh, eaddr, eport, oaddr, oport, in, dlen);
> +	memcpy(&msg.data, in, dlen);
> +
> +	tap_icmp4_send(c, saddr, eaddr, &msg, msglen);
> +}
> +
>  /**
>   * udp_sock_recverr() - Receive and clear an error from a socket
> - * @s:		Socket to receive from
> + * @c:		Execution context
> + * @ref:	epoll reference
>   *
>   * Return: 1 if error received and processed, 0 if no more errors in queue, < 0
>   *         if there was an error reading the queue
>   *
>   * #syscalls recvmsg
>   */
> -static int udp_sock_recverr(int s)
> +static int udp_sock_recverr(const struct ctx *c, union epoll_ref ref)
>  {
>  	const struct sock_extended_err *ee;
>  	const struct cmsghdr *hdr;
> +	union sockaddr_inany saddr;
>  	char buf[CMSG_SPACE(sizeof(*ee))];
> +	char data[ICMP4_MAX_DLEN];
> +	int s = ref.fd;
> +	struct iovec iov = {
> +		.iov_base = data,
> +		.iov_len = sizeof(data)
> +	};
>  	struct msghdr mh = {
> -		.msg_name = NULL,
> -		.msg_namelen = 0,
> -		.msg_iov = NULL,
> -		.msg_iovlen = 0,
> +		.msg_name = &saddr,
> +		.msg_namelen = sizeof(saddr),
> +		.msg_iov = &iov,
> +		.msg_iovlen = 1,
>  		.msg_control = buf,
>  		.msg_controllen = sizeof(buf),
>  	};
> @@ -450,8 +505,15 @@ static int udp_sock_recverr(int s)
>  	}
>  
>  	ee = (const struct sock_extended_err *)CMSG_DATA(hdr);
> +	if (ref.type == EPOLL_TYPE_UDP_REPLY) {
> +		flow_sidx_t sidx = flow_sidx_opposite(ref.flowside);
> +		const struct flowside *toside = flowside_at_sidx(sidx);
>  
> -	/* TODO: When possible propagate and otherwise handle errors */
> +		udp_send_conn_fail_icmp4(c, ee, toside, saddr.sa4.sin_addr,
> +					 data, rc);
> +	} else {
> +		trace("Ignoring received IP_RECVERR cmsg on listener socket");
> +	}
>  	debug("%s error on UDP socket %i: %s",
>  	      str_ee_origin(ee), s, strerror_(ee->ee_errno));
>  
> @@ -461,15 +523,16 @@ static int udp_sock_recverr(int s)
>  /**
>   * udp_sock_errs() - Process errors on a socket
>   * @c:		Execution context
> - * @s:		Socket to receive from
> + * @ref:	epoll reference
>   * @events:	epoll events bitmap
>   *
>   * Return: Number of errors handled, or < 0 if we have an unrecoverable error
>   */
> -int udp_sock_errs(const struct ctx *c, int s, uint32_t events)
> +int udp_sock_errs(const struct ctx *c, union epoll_ref ref, uint32_t events)
>  {
>  	unsigned n_err = 0;
>  	socklen_t errlen;
> +	int s = ref.fd;
>  	int rc, err;
>  
>  	ASSERT(!c->no_udp);
> @@ -478,7 +541,7 @@ int udp_sock_errs(const struct ctx *c, int s, uint32_t events)
>  		return 0; /* Nothing to do */
>  
>  	/* Empty the error queue */
> -	while ((rc = udp_sock_recverr(s)) > 0)
> +	while ((rc = udp_sock_recverr(c, ref)) > 0)
>  		n_err += rc;
>  
>  	if (rc < 0)
> @@ -558,7 +621,7 @@ static void udp_buf_listen_sock_handler(const struct ctx *c,
>  	const socklen_t sasize = sizeof(udp_meta[0].s_in);
>  	int n, i;
>  
> -	if (udp_sock_errs(c, ref.fd, events) < 0) {
> +	if (udp_sock_errs(c, ref, events) < 0) {
>  		err("UDP: Unrecoverable error on listening socket:"
>  		    " (%s port %hu)", pif_name(ref.udp.pif), ref.udp.port);
>  		/* FIXME: what now?  close/re-open socket? */
> @@ -661,7 +724,7 @@ static void udp_buf_reply_sock_handler(const struct ctx *c, union epoll_ref ref,
>  
>  	from_s = uflow->s[ref.flowside.sidei];
>  
> -	if (udp_sock_errs(c, from_s, events) < 0) {
> +	if (udp_sock_errs(c, ref, events) < 0) {
>  		flow_err(uflow, "Unrecoverable error on reply socket");
>  		flow_err_details(uflow);
>  		udp_flow_close(c, uflow);
> diff --git a/udp_internal.h b/udp_internal.h
> index cc80e30..3b081f5 100644
> --- a/udp_internal.h
> +++ b/udp_internal.h
> @@ -30,5 +30,5 @@ size_t udp_update_hdr4(struct iphdr *ip4h, struct udp_payload_t *bp,
>  size_t udp_update_hdr6(struct ipv6hdr *ip6h, struct udp_payload_t *bp,
>                         const struct flowside *toside, size_t dlen,
>  		       bool no_udp_csum);
> -int udp_sock_errs(const struct ctx *c, int s, uint32_t events);
> +int udp_sock_errs(const struct ctx *c, union epoll_ref ref, uint32_t events);
>  #endif /* UDP_INTERNAL_H */
> diff --git a/udp_vu.c b/udp_vu.c
> index 4123510..c26a223 100644
> --- a/udp_vu.c
> +++ b/udp_vu.c
> @@ -227,7 +227,7 @@ void udp_vu_listen_sock_handler(const struct ctx *c, union epoll_ref ref,
>  	struct vu_virtq *vq = &vdev->vq[VHOST_USER_RX_QUEUE];
>  	int i;
>  
> -	if (udp_sock_errs(c, ref.fd, events) < 0) {
> +	if (udp_sock_errs(c, ref, events) < 0) {
>  		err("UDP: Unrecoverable error on listening socket:"
>  		    " (%s port %hu)", pif_name(ref.udp.pif), ref.udp.port);
>  		return;
> @@ -302,7 +302,7 @@ void udp_vu_reply_sock_handler(const struct ctx *c, union epoll_ref ref,
>  
>  	ASSERT(!c->no_udp);
>  
> -	if (udp_sock_errs(c, from_s, events) < 0) {
> +	if (udp_sock_errs(c, ref, events) < 0) {
>  		flow_err(uflow, "Unrecoverable error on reply socket");
>  		flow_err_details(uflow);
>  		udp_flow_close(c, uflow);

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2025-03-05  0:23 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-04 22:45 [PATCH v10 0/4] Reconstruct incoming ICMP headers for failed UDP connect and forward back Jon Maloy
2025-03-04 22:45 ` [1/4] tap: break out building of udp header from tap_udp4_send function Jon Maloy
2025-03-04 22:45 ` [2/4] udp: create and send ICMPv4 to local peer when applicable Jon Maloy
2025-03-05  0:23   ` David Gibson [this message]
2025-03-04 22:45 ` [3/4] tap: break out building of udp header from tap_udp6_send function Jon Maloy
2025-03-04 22:45 ` [4/4] udp: create and send ICMPv6 to local peer when applicable Jon Maloy
  -- strict thread matches above, loose matches on Subject: below --
2025-03-04  1:29 [PATCH v9 0/4] Reconstruct incoming ICMP headers for failed UDP connect and forward back Jon Maloy
2025-03-04  1:29 ` [2/4] udp: create and send ICMPv4 to local peer when applicable Jon Maloy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z8eZZSGX3XqOljAF@zatzit \
    --to=david@gibson.dropbear.id.au \
    --cc=dgibson@redhat.com \
    --cc=jmaloy@redhat.com \
    --cc=lvivier@redhat.com \
    --cc=passt-dev@passt.top \
    --cc=sbrivio@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).