Date: Fri, 4 Apr 2025 11:57:58 +1100
From: David Gibson
To: Jon Maloy
Cc: passt-dev@passt.top
Subject: Re: [PATCH v5] udp: support traceroute
References: <20250403222706.1036876-1-jmaloy@redhat.com>
List-Id: Development discussion and patches for passt

On Fri, Apr 04, 2025 at 10:40:27AM +1100, David Gibson wrote:
> On Thu, Apr 03, 2025 at 06:27:06PM -0400, Jon Maloy wrote:
> > Now that ICMP pass-through from socket-to-tap is in place, it is
> > easy to support UDP based traceroute functionality in direction
> > tap-to-socket.
> >
> > We fix that in this commit.
> >
> > Link: https://bugs.passt.top/show_bug.cgi?id=64
> > Signed-off-by: Jon Maloy
>
> Reviewed-by: David Gibson
>
> One comment below.

Oh.. wait.  I just realised this patch has a weird side effect for
flows which are initiated from outside, rather than from the guest.

If the flow is initiated from outside, it's maybe a bit unlikely, but
we could still get a non-default TTL from the guest on reply
datagrams.  That will trigger this code and alter the socket's TTL.
But in this case the socket is not a flow specific socket, but the
listening socket which initiated the flow, which could be handling
packets on many flows.  The "cached" TTL is stored per-flow, not
per-socket, so we won't look at the right value when we process
datagrams from other flows, and they may go out with the wrong TTL.

I think the obvious way to address this is to stop using the
"listening" socket to send datagrams for a flow, using a connect()ed
socket instead.  We have other reasons to do that too, and I'm working
on implementing that right now.

The question is whether this is a serious enough problem to delay this
until the "both sides connect()ed sockets" change is merged.
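To make the side effect above concrete: IP_TTL is a property of the
socket, not of any flow that happens to send through it.  A minimal
sketch, assuming Linux and plain IPv4 UDP sockets; the helper names are
made up for illustration and aren't part of passt or of the patch:

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: set the IPv4 TTL on socket @s, return 0 or -1 */
static int sock_set_ttl(int s, int ttl)
{
	return setsockopt(s, IPPROTO_IP, IP_TTL, &ttl, sizeof(ttl));
}

/* Hypothetical helper: read back the IPv4 TTL of socket @s, -1 on error */
static int sock_get_ttl(int s)
{
	socklen_t len = sizeof(int);
	int ttl;

	if (getsockopt(s, IPPROTO_IP, IP_TTL, &ttl, &len) < 0)
		return -1;
	return ttl;
}

/*
 * Check that IP_TTL is per-socket state: returns 1 if changing the TTL
 * on one socket leaves a second socket's TTL untouched, 0 if it
 * doesn't, -1 on error.
 */
static int ttl_is_per_socket(void)
{
	int a = socket(AF_INET, SOCK_DGRAM, 0);
	int b = socket(AF_INET, SOCK_DGRAM, 0);
	int before, ret = -1;

	if (a < 0 || b < 0)
		goto out;

	before = sock_get_ttl(b);
	if (before < 0 || sock_set_ttl(a, 1) < 0)
		goto out;

	/* @a now sends with TTL 1, @b keeps whatever it had before */
	ret = sock_get_ttl(a) == 1 && sock_get_ttl(b) == before;
out:
	if (a >= 0)
		close(a);
	if (b >= 0)
		close(b);
	return ret;
}
```

The flip side is the problem described above: a TTL set on a shared
listening socket applies to every datagram sent through it, whichever
flow it belongs to, whereas one socket per flow keeps the setting
isolated.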
Mitigating the problem we have:
 * Actually having multiple flows on a socket isn't super common (we
   handled this completely wrong before UDP flow table support, and
   no-one much noticed)
 * Non-default TTL packets on replies from the guest are probably
   uncommon
 * Wrong TTL packets sent out on different flows probably aren't
   particularly damaging in most cases

> > ---
> > v2: - Using ancillary data instead of setsockopt to transfer outgoing
> >       TTL.
> >     - Support IPv6
> > v3: - Storing ttl per packet instead of per flow. This may not be
> >       elegant, but much less intrusive than changing the flow
> >       criteria. This eliminates the need for the extra, flow-changing
> >       patch we introduced in v2.
> > v4: - Going back to something similar to the original solution, but
> >       storing current ttl in struct udp_flow, plus ensuring that all
> >       packets in a struct tap4_l4_t/tap6_l4_t instance have the same
> >       ttl. After input from David Gibson.
> > v5: - Some minor fixes after feedback from Stefano Brivio.
> > ---
> >  packet.h   |  2 ++
> >  tap.c      | 17 +++++++++++++----
> >  udp.c      | 19 ++++++++++++++++++-
> >  udp.h      |  3 ++-
> >  udp_flow.c |  1 +
> >  udp_flow.h |  4 +++-
> >  6 files changed, 39 insertions(+), 7 deletions(-)
> >
> > diff --git a/packet.h b/packet.h
> > index c94780a..e84e123 100644
> > --- a/packet.h
> > +++ b/packet.h
> > @@ -11,6 +11,8 @@
> >  /* Maximum size of a single packet stored in pool, including headers */
> >  #define PACKET_MAX_LEN	((size_t)UINT16_MAX)
> >
> > +#define DEFAULT_TTL 64
>
> This is still fixed, rather than either probing the sysctl or using
> getsockopt() to determine the initial value.  I don't think we want to
> delay this further to change that, but it could be a reasonable
> follow up improvement.
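
For reference, a minimal sketch of the getsockopt() variant mentioned
above.  It assumes Linux, where (as far as I know) IP_TTL on a fresh
socket reports the net.ipv4.ip_default_ttl sysctl value.
probe_default_ttl() is a hypothetical helper, not something in the
patch or in passt:

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: return the TTL the kernel assigns to a fresh
 * IPv4 UDP socket, or @fallback if it can't be determined.
 */
static int probe_default_ttl(int fallback)
{
	int s = socket(AF_INET, SOCK_DGRAM, 0);
	socklen_t len = sizeof(int);
	int ttl;

	if (s < 0)
		return fallback;

	/* Nothing was set on @s, so this reads the system default */
	if (getsockopt(s, IPPROTO_IP, IP_TTL, &ttl, &len) < 0 ||
	    ttl < 1 || ttl > 255)
		ttl = fallback;

	close(s);
	return ttl;
}
```

Something like probe_default_ttl(64) at start-up could then stand in
for the fixed constant, still falling back to 64 if the probe fails.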
>
> > +
> >  /**
> >   * struct pool - Generic pool of packets stored in a buffer
> >   * @buf:	Buffer storing packet descriptors,
> > diff --git a/tap.c b/tap.c
> > index 3a6fcbe..d630f6d 100644
> > --- a/tap.c
> > +++ b/tap.c
> > @@ -559,6 +559,7 @@ PACKET_POOL_DECL(pool_l4, UIO_MAXIOV, pkt_buf);
> >   * struct l4_seq4_t - Message sequence for one protocol handler call, IPv4
> >   * @msgs:	Count of messages in sequence
> >   * @protocol:	Protocol number
> > + * @ttl:	Time to live
> >   * @source:	Source port
> >   * @dest:	Destination port
> >   * @saddr:	Source address
> > @@ -567,6 +568,7 @@ PACKET_POOL_DECL(pool_l4, UIO_MAXIOV, pkt_buf);
> >   */
> >  static struct tap4_l4_t {
> >  	uint8_t protocol;
> > +	uint8_t ttl;
> >
> >  	uint16_t source;
> >  	uint16_t dest;
> > @@ -586,6 +588,7 @@ static struct tap4_l4_t {
> >   * @dest:	Destination port
> >   * @saddr:	Source address
> >   * @daddr:	Destination address
> > + * @hop_limit:	Hop limit
> >   * @msg:	Array of messages that can be handled in a single call
> >   */
> >  static struct tap6_l4_t {
> > @@ -598,6 +601,8 @@ static struct tap6_l4_t {
> >  	struct in6_addr saddr;
> >  	struct in6_addr daddr;
> >
> > +	uint8_t hop_limit;
> > +
> >  	struct pool_l4_t p;
> >  } tap6_l4[TAP_SEQS /* Arbitrary: TAP_MSGS in theory, so limit in users */];
> >
> > @@ -786,7 +791,8 @@ resume:
> >  #define L4_MATCH(iph, uh, seq) \
> >  	((seq)->protocol == (iph)->protocol && \
> >  	 (seq)->source == (uh)->source && (seq)->dest == (uh)->dest && \
> > -	 (seq)->saddr.s_addr == (iph)->saddr && (seq)->daddr.s_addr == (iph)->daddr)
> > +	 (seq)->saddr.s_addr == (iph)->saddr && \
> > +	 (seq)->daddr.s_addr == (iph)->daddr && (seq)->ttl == (iph)->ttl)
> >
> >  #define L4_SET(iph, uh, seq) \
> >  	do { \
> > @@ -795,6 +801,7 @@ resume:
> >  		(seq)->dest = (uh)->dest; \
> >  		(seq)->saddr.s_addr = (iph)->saddr; \
> >  		(seq)->daddr.s_addr = (iph)->daddr; \
> > +		(seq)->ttl = (iph)->ttl; \
> >  	} while (0)
> >
> >  	if (seq && L4_MATCH(iph, uh, seq) && seq->p.count < UIO_MAXIOV)
> > @@ -843,7 +850,7 @@ append:
> >  		for (k = 0; k < p->count; )
> >  			k += udp_tap_handler(c, PIF_TAP, AF_INET,
> >  					     &seq->saddr, &seq->daddr,
> > -					     p, k, now);
> > +					     seq->ttl, p, k, now);
> >  		}
> >  	}
> >
> > @@ -966,7 +973,8 @@ resume:
> >  	 (seq)->dest == (uh)->dest && \
> >  	 (seq)->flow_lbl == ip6_get_flow_lbl(ip6h) && \
> >  	 IN6_ARE_ADDR_EQUAL(&(seq)->saddr, saddr) && \
> > -	 IN6_ARE_ADDR_EQUAL(&(seq)->daddr, daddr))
> > +	 IN6_ARE_ADDR_EQUAL(&(seq)->daddr, daddr) && \
> > +	 (seq)->hop_limit == (ip6h)->hop_limit)
> >
> >  #define L4_SET(ip6h, proto, uh, seq) \
> >  	do { \
> > @@ -976,6 +984,7 @@ resume:
> >  		(seq)->flow_lbl = ip6_get_flow_lbl(ip6h); \
> >  		(seq)->saddr = *saddr; \
> >  		(seq)->daddr = *daddr; \
> > +		(seq)->hop_limit = (ip6h)->hop_limit; \
> >  	} while (0)
> >
> >  	if (seq && L4_MATCH(ip6h, proto, uh, seq) &&
> > @@ -1026,7 +1035,7 @@ append:
> >  		for (k = 0; k < p->count; )
> >  			k += udp_tap_handler(c, PIF_TAP, AF_INET6,
> >  					     &seq->saddr, &seq->daddr,
> > -					     p, k, now);
> > +					     seq->hop_limit, p, k, now);
> >  		}
> >  	}
> >
> > diff --git a/udp.c b/udp.c
> > index 39431d7..618a4e2 100644
> > --- a/udp.c
> > +++ b/udp.c
> > @@ -849,6 +849,7 @@ fail:
> >   * @af:	Address family, AF_INET or AF_INET6
> >   * @saddr:	Source address
> >   * @daddr:	Destination address
> > + * @ttl:	TTL or hop limit for packets to be sent in this call
> >   * @p:		Pool of UDP packets, with UDP headers
> >   * @idx:	Index of first packet to process
> >   * @now:	Current timestamp
> > @@ -859,7 +860,8 @@ fail:
> >   */
> >  int udp_tap_handler(const struct ctx *c, uint8_t pif,
> >  		    sa_family_t af, const void *saddr, const void *daddr,
> > -		    const struct pool *p, int idx, const struct timespec *now)
> > +		    uint8_t ttl, const struct pool *p, int idx,
> > +		    const struct timespec *now)
> >  {
> >  	const struct flowside *toside;
> >  	struct mmsghdr mm[UIO_MAXIOV];
> > @@ -938,6 +940,21 @@ int udp_tap_handler(const struct ctx *c, uint8_t pif,
> >  		mm[i].msg_hdr.msg_controllen = 0;
> >  		mm[i].msg_hdr.msg_flags = 0;
> >
> > +		if (ttl != uflow->ttl[tosidx.sidei]) {
> > +			uflow->ttl[tosidx.sidei] = ttl;
> > +			if (af == AF_INET) {
> > +				if (setsockopt(s, IPPROTO_IP, IP_TTL,
> > +					       &ttl, sizeof(ttl)) < 0)
> > +					flow_perror(uflow,
> > +						    "setsockopt IP_TTL");
> > +			} else {
> > +				if (setsockopt(s, IPPROTO_IPV6, IPV6_HOPLIMIT,
> > +					       &ttl, sizeof(ttl)) < 0)
> > +					flow_perror(uflow,
> > +						    "setsockopt IPV6_HOPLIMIT");
> > +			}
> > +		}
> > +
> >  		count++;
> >  	}
> >
> > diff --git a/udp.h b/udp.h
> > index de2df6d..a811475 100644
> > --- a/udp.h
> > +++ b/udp.h
> > @@ -15,7 +15,8 @@ void udp_reply_sock_handler(const struct ctx *c, union epoll_ref ref,
> >  			    uint32_t events, const struct timespec *now);
> >  int udp_tap_handler(const struct ctx *c, uint8_t pif,
> >  		    sa_family_t af, const void *saddr, const void *daddr,
> > -		    const struct pool *p, int idx, const struct timespec *now);
> > +		    uint8_t ttl, const struct pool *p, int idx,
> > +		    const struct timespec *now);
> >  int udp_sock_init(const struct ctx *c, int ns, const union inany_addr *addr,
> >  		  const char *ifname, in_port_t port);
> >  int udp_init(struct ctx *c);
> > diff --git a/udp_flow.c b/udp_flow.c
> > index bf4b896..39372c2 100644
> > --- a/udp_flow.c
> > +++ b/udp_flow.c
> > @@ -137,6 +137,7 @@ static flow_sidx_t udp_flow_new(const struct ctx *c, union flow *flow,
> >  	uflow = FLOW_SET_TYPE(flow, FLOW_UDP, udp);
> >  	uflow->ts = now->tv_sec;
> >  	uflow->s[INISIDE] = uflow->s[TGTSIDE] = -1;
> > +	uflow->ttl[INISIDE] = uflow->ttl[TGTSIDE] = DEFAULT_TTL;
> >
> >  	if (s_ini >= 0) {
> >  		/* When using auto port-scanning the listening port could go
> > diff --git a/udp_flow.h b/udp_flow.h
> > index 9a1b059..520de62 100644
> > --- a/udp_flow.h
> > +++ b/udp_flow.h
> > @@ -8,11 +8,12 @@
> >  #define UDP_FLOW_H
> >
> >  /**
> > - * struct udp - Descriptor for a flow of UDP packets
> > + * struct udp_flow - Descriptor for a flow of UDP packets
> >   * @f:		Generic flow information
> >   * @closed:	Flow is already closed
> >   * @ts:	Activity timestamp
> >   * @s:		Socket fd (or -1) for each side of the flow
> > + * @ttl:	TTL or hop_limit for both sides
> >   */
> >  struct udp_flow {
> >  	/* Must be first element */
> > @@ -21,6 +22,7 @@ struct udp_flow {
> >  	bool closed :1;
> >  	time_t ts;
> >  	int s[SIDES];
> > +	uint8_t ttl[SIDES];
> >  };
> >
> >  struct udp_flow *udp_at_sidx(flow_sidx_t sidx);
>

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson