From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202502 header.b=YYz9RmdP; dkim-atps=neutral Received: from mail.ozlabs.org (gandalf.ozlabs.org [150.107.74.76]) by passt.top (Postfix) with ESMTPS id E3A105A078D for ; Wed, 05 Mar 2025 01:23:42 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202502; t=1741134206; bh=hV+3xmDPiUx5GObgH6Orn83m7jWknOGn15QDe+vy4lQ=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=YYz9RmdPKkta9lo1/K43KYWuvxpHqMS8cCCNcxgWxG4FiKlnpsK0EoQ6EJai82xsE rL0fUjKw94ls+7+BRrFj4mGEQ8Aa6F0oiKM/F7MUUZ2GRXSUf5WNkaADYg0pSePinY aoe6su1UFHKPjUodfMy1SOT/QRx/xUjWP/syD/t3qWD44OvgyoL4R1kaJ/J4Th7wf6 h7PxO36fqLV7CdX7Yp/bKN6dw+FVqoC1Qiy3qBQoXsIth3clN6qq0pd/mA5I4XNgXT GnmX+R+TORDOjZLebl14tItT1SxarN2t/hiNRoM13Pc4Uc+ynTDUmCN+fJdj6Aw4cY 4xYQR2Y/+bdFg== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4Z6tZ600Vxz4x3d; Wed, 5 Mar 2025 11:23:25 +1100 (AEDT) Date: Wed, 5 Mar 2025 11:23:01 +1100 From: David Gibson To: Jon Maloy Subject: Re: [2/4] udp: create and send ICMPv4 to local peer when applicable Message-ID: References: <20250304224532.1770263-1-jmaloy@redhat.com> <20250304224532.1770263-3-jmaloy@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="Xlc+N7Rdo+AZahsT" Content-Disposition: inline In-Reply-To: <20250304224532.1770263-3-jmaloy@redhat.com> Message-ID-Hash: J43USLD4PK5NOWKP5U4CUDIY3LAJK4PK X-Message-ID-Hash: J43USLD4PK5NOWKP5U4CUDIY3LAJK4PK X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top, sbrivio@redhat.com, lvivier@redhat.com, dgibson@redhat.com X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --Xlc+N7Rdo+AZahsT Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Mar 04, 2025 at 05:45:30PM -0500, Jon Maloy wrote: > When a local peer sends a UDP message to a non-existing port on an > existing remote host, that host will return an ICMP message containing > the error code ICMP_PORT_UNREACH, plus the header and the first eight > bytes of the original message. If the sender socket has been connected, > it uses this message to issue a "Connection Refused" event to the user. >=20 > Until now, we have only read such events from the externally facing > socket, but we don't forward them back to the local sender because > we cannot read the ICMP message directly to user space. Because of > this, the local peer will hang and wait for a response that never > arrives. >=20 > We now fix this for IPv4 by recreating and forwarding a correct ICMP > message back to the internal sender. We synthesize the message based > on the information in the extended error structure, plus the returned > part of the original message body. >=20 > Note that for the sake of completeness, we even produce ICMP messages > for other error codes. We have noticed that at least ICMP_PROT_UNREACH > is propagated as an error event back to the user. >=20 > Reviewed-by: David Gibson > Signed-off-by: Jon Maloy > --- > v2: - Updated the ICMP creation to use the new function tap_push_uh4(). > - Added logics to find correct flow, depending on origin. > - All done after feedback from David Gibson. > v3: - Passing parameter 'now' along with call to udp_sock_errs() call. > - Corrected lookup of flow from listener socket > - All done after feedback from David Gibson. > v4: - Eliminate any attempt to handle ICMPs from listener sockets. > - After feedback from David Gibson. > v5: - Small reshuffle in anticipation of following commits. > v6: - Introduced macro ICMP4_MAX_DLEN > v7: - Introduced ASSERT() on data lenght size > - Added MTU for ICMPv4 ICMP_FRAG_NEEDED messages > v8: - Distinguishing between ICMP source address and origininal UDP > destination address. > v9: - Fixed wrong l4len value given to tap_push_ip4h() Ah, yes. I also missed that in the previous version. Again, Reviewed-by: David Gibson > --- > tap.c | 4 +-- > tap.h | 2 ++ > udp.c | 87 +++++++++++++++++++++++++++++++++++++++++++------- > udp_internal.h | 2 +- > udp_vu.c | 4 +-- > 5 files changed, 82 insertions(+), 17 deletions(-) >=20 > diff --git a/tap.c b/tap.c > index d5c8edd..97416ac 100644 > --- a/tap.c > +++ b/tap.c > @@ -143,8 +143,8 @@ static void *tap_push_l2h(const struct ctx *c, void *= buf, uint16_t proto) > * > * Return: pointer at which to write the packet's payload > */ > -static void *tap_push_ip4h(struct iphdr *ip4h, struct in_addr src, > - struct in_addr dst, size_t l4len, uint8_t proto) > +void *tap_push_ip4h(struct iphdr *ip4h, struct in_addr src, > + struct in_addr dst, size_t l4len, uint8_t proto) > { > uint16_t l3len =3D l4len + sizeof(*ip4h); > =20 > diff --git a/tap.h b/tap.h > index 0b3eb93..37da21d 100644 > --- a/tap.h > +++ b/tap.h > @@ -47,6 +47,8 @@ static inline void tap_hdr_update(struct tap_hdr *thdr,= size_t l2len) > void *tap_push_uh4(struct udphdr *uh, struct in_addr src, in_port_t spor= t, > struct in_addr dst, in_port_t dport, > const void *in, size_t dlen); > +void *tap_push_ip4h(struct iphdr *ip4h, struct in_addr src, > + struct in_addr dst, size_t l4len, uint8_t proto); > void tap_udp4_send(const struct ctx *c, struct in_addr src, in_port_t sp= ort, > struct in_addr dst, in_port_t dport, > const void *in, size_t dlen); > diff --git a/udp.c b/udp.c > index 923cc38..c260e3d 100644 > --- a/udp.c > +++ b/udp.c > @@ -87,6 +87,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -112,6 +113,9 @@ > #include "udp_internal.h" > #include "udp_vu.h" > =20 > +/* Maximum UDP data to be returned in ICMP messages */ > +#define ICMP4_MAX_DLEN 8 > + > /* "Spliced" sockets indexed by bound port (host order) */ > static int udp_splice_ns [IP_VERSIONS][NUM_PORTS]; > static int udp_splice_init[IP_VERSIONS][NUM_PORTS]; > @@ -402,25 +406,76 @@ static void udp_tap_prepare(const struct mmsghdr *m= mh, > (*tap_iov)[UDP_IOV_PAYLOAD].iov_len =3D l4len; > } > =20 > +/** > + * udp_send_conn_fail_icmp4() - Construct and send ICMPv4 to local peer > + * @c: Execution context > + * @ee: Extended error descriptor > + * @toside: Destination side of flow > + * @saddr: Address of ICMP generating node > + * @in: First bytes (max 8) of original UDP message body > + * @dlen: Length of the read part of original UDP message body > + */ > +static void udp_send_conn_fail_icmp4(const struct ctx *c, > + const struct sock_extended_err *ee, > + const struct flowside *toside, > + struct in_addr saddr, > + void *in, size_t dlen) > +{ > + struct in_addr oaddr =3D toside->oaddr.v4mapped.a4; > + struct in_addr eaddr =3D toside->eaddr.v4mapped.a4; > + in_port_t eport =3D toside->eport; > + in_port_t oport =3D toside->oport; > + struct { > + struct icmphdr icmp4h; > + struct iphdr ip4h; > + struct udphdr uh; > + char data[ICMP4_MAX_DLEN]; > + } __attribute__((packed, aligned(__alignof__(max_align_t)))) msg; > + size_t msglen =3D sizeof(msg) - sizeof(msg.data) + dlen; > + size_t l4len =3D dlen + sizeof(struct udphdr); > + > + ASSERT(dlen <=3D ICMP4_MAX_DLEN); > + memset(&msg, 0, sizeof(msg)); > + msg.icmp4h.type =3D ee->ee_type; > + msg.icmp4h.code =3D ee->ee_code; > + if (ee->ee_type =3D=3D ICMP_DEST_UNREACH && ee->ee_code =3D=3D ICMP_FRA= G_NEEDED) > + msg.icmp4h.un.frag.mtu =3D htons((uint16_t) ee->ee_info); > + > + /* Reconstruct the original headers as returned in the ICMP message */ > + tap_push_ip4h(&msg.ip4h, eaddr, oaddr, l4len, IPPROTO_UDP); > + tap_push_uh4(&msg.uh, eaddr, eport, oaddr, oport, in, dlen); > + memcpy(&msg.data, in, dlen); > + > + tap_icmp4_send(c, saddr, eaddr, &msg, msglen); > +} > + > /** > * udp_sock_recverr() - Receive and clear an error from a socket > - * @s: Socket to receive from > + * @c: Execution context > + * @ref: epoll reference > * > * Return: 1 if error received and processed, 0 if no more errors in que= ue, < 0 > * if there was an error reading the queue > * > * #syscalls recvmsg > */ > -static int udp_sock_recverr(int s) > +static int udp_sock_recverr(const struct ctx *c, union epoll_ref ref) > { > const struct sock_extended_err *ee; > const struct cmsghdr *hdr; > + union sockaddr_inany saddr; > char buf[CMSG_SPACE(sizeof(*ee))]; > + char data[ICMP4_MAX_DLEN]; > + int s =3D ref.fd; > + struct iovec iov =3D { > + .iov_base =3D data, > + .iov_len =3D sizeof(data) > + }; > struct msghdr mh =3D { > - .msg_name =3D NULL, > - .msg_namelen =3D 0, > - .msg_iov =3D NULL, > - .msg_iovlen =3D 0, > + .msg_name =3D &saddr, > + .msg_namelen =3D sizeof(saddr), > + .msg_iov =3D &iov, > + .msg_iovlen =3D 1, > .msg_control =3D buf, > .msg_controllen =3D sizeof(buf), > }; > @@ -450,8 +505,15 @@ static int udp_sock_recverr(int s) > } > =20 > ee =3D (const struct sock_extended_err *)CMSG_DATA(hdr); > + if (ref.type =3D=3D EPOLL_TYPE_UDP_REPLY) { > + flow_sidx_t sidx =3D flow_sidx_opposite(ref.flowside); > + const struct flowside *toside =3D flowside_at_sidx(sidx); > =20 > - /* TODO: When possible propagate and otherwise handle errors */ > + udp_send_conn_fail_icmp4(c, ee, toside, saddr.sa4.sin_addr, > + data, rc); > + } else { > + trace("Ignoring received IP_RECVERR cmsg on listener socket"); > + } > debug("%s error on UDP socket %i: %s", > str_ee_origin(ee), s, strerror_(ee->ee_errno)); > =20 > @@ -461,15 +523,16 @@ static int udp_sock_recverr(int s) > /** > * udp_sock_errs() - Process errors on a socket > * @c: Execution context > - * @s: Socket to receive from > + * @ref: epoll reference > * @events: epoll events bitmap > * > * Return: Number of errors handled, or < 0 if we have an unrecoverable = error > */ > -int udp_sock_errs(const struct ctx *c, int s, uint32_t events) > +int udp_sock_errs(const struct ctx *c, union epoll_ref ref, uint32_t eve= nts) > { > unsigned n_err =3D 0; > socklen_t errlen; > + int s =3D ref.fd; > int rc, err; > =20 > ASSERT(!c->no_udp); > @@ -478,7 +541,7 @@ int udp_sock_errs(const struct ctx *c, int s, uint32_= t events) > return 0; /* Nothing to do */ > =20 > /* Empty the error queue */ > - while ((rc =3D udp_sock_recverr(s)) > 0) > + while ((rc =3D udp_sock_recverr(c, ref)) > 0) > n_err +=3D rc; > =20 > if (rc < 0) > @@ -558,7 +621,7 @@ static void udp_buf_listen_sock_handler(const struct = ctx *c, > const socklen_t sasize =3D sizeof(udp_meta[0].s_in); > int n, i; > =20 > - if (udp_sock_errs(c, ref.fd, events) < 0) { > + if (udp_sock_errs(c, ref, events) < 0) { > err("UDP: Unrecoverable error on listening socket:" > " (%s port %hu)", pif_name(ref.udp.pif), ref.udp.port); > /* FIXME: what now? close/re-open socket? */ > @@ -661,7 +724,7 @@ static void udp_buf_reply_sock_handler(const struct c= tx *c, union epoll_ref ref, > =20 > from_s =3D uflow->s[ref.flowside.sidei]; > =20 > - if (udp_sock_errs(c, from_s, events) < 0) { > + if (udp_sock_errs(c, ref, events) < 0) { > flow_err(uflow, "Unrecoverable error on reply socket"); > flow_err_details(uflow); > udp_flow_close(c, uflow); > diff --git a/udp_internal.h b/udp_internal.h > index cc80e30..3b081f5 100644 > --- a/udp_internal.h > +++ b/udp_internal.h > @@ -30,5 +30,5 @@ size_t udp_update_hdr4(struct iphdr *ip4h, struct udp_p= ayload_t *bp, > size_t udp_update_hdr6(struct ipv6hdr *ip6h, struct udp_payload_t *bp, > const struct flowside *toside, size_t dlen, > bool no_udp_csum); > -int udp_sock_errs(const struct ctx *c, int s, uint32_t events); > +int udp_sock_errs(const struct ctx *c, union epoll_ref ref, uint32_t eve= nts); > #endif /* UDP_INTERNAL_H */ > diff --git a/udp_vu.c b/udp_vu.c > index 4123510..c26a223 100644 > --- a/udp_vu.c > +++ b/udp_vu.c > @@ -227,7 +227,7 @@ void udp_vu_listen_sock_handler(const struct ctx *c, = union epoll_ref ref, > struct vu_virtq *vq =3D &vdev->vq[VHOST_USER_RX_QUEUE]; > int i; > =20 > - if (udp_sock_errs(c, ref.fd, events) < 0) { > + if (udp_sock_errs(c, ref, events) < 0) { > err("UDP: Unrecoverable error on listening socket:" > " (%s port %hu)", pif_name(ref.udp.pif), ref.udp.port); > return; > @@ -302,7 +302,7 @@ void udp_vu_reply_sock_handler(const struct ctx *c, u= nion epoll_ref ref, > =20 > ASSERT(!c->no_udp); > =20 > - if (udp_sock_errs(c, from_s, events) < 0) { > + if (udp_sock_errs(c, ref, events) < 0) { > flow_err(uflow, "Unrecoverable error on reply socket"); > flow_err_details(uflow); > udp_flow_close(c, uflow); --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --Xlc+N7Rdo+AZahsT Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmfHmWEACgkQzQJF27ox 2GejTBAAgpVELGZU6pnu9AkmgVPnXjql/P+dspKPSJyhj2GzejhpTmjQHwm2itK7 D2biP+xhp+9bZBLVnbOfeYlJkFsD2WQiEIZZETjRpc9IWCM3sivFUA1xqZJfKMih yLB1nvH08ogAqkNs3fwAjAjtPINXquPgE/+BD8Tx2R+4AYN9iy3ApDmd4tbADYIn HCdHciiwzw3cin8hyxv1qes5W7IHAlXWmyQKpruR1nAnqouEZBQofj9WcyC7E+2J LA6hMKNkDoTtp4TXwJuenAsS/5zynJ9oL/U0C4QqJg3B4ha38GG2J+ctXHbQa3zJ vp+6QEioQcOJPFCMQxEq54516YfUrueQ1uPeuFV5Ui/Cn7axUG1/gyf3GL8Xi5wl n4Y4R8ZhgqC3fR9/F8rHuxWC/rqelDixJ3a9BsDyEELwUsxahutSxUQ/ShAWijRi WZgqymKRFVDxVWRlgE83HUXUUogZZE+xx6FhBjNHYjKjeFxABwT3fyD61F90Kx9r W4tRVI0k8srs10L2VExcEsG+p+5dIvNmrFCZs4aFIISEkXBVNyHX7Dmcnul7dsR1 iuQ9B3rpintd1n1AkyocpanAWFYNoFrzbmzehpa2tb3eT8SBHNpDstwVSxiSao2L WAfiSqCWG/Zk1cj4zNRzuqc6i8ftVxI7P4KZRbi9t0Bnb8PcoME= =w/iE -----END PGP SIGNATURE----- --Xlc+N7Rdo+AZahsT--