From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202502 header.b=laqkxLOw; dkim-atps=neutral Received: from mail.ozlabs.org (gandalf.ozlabs.org [150.107.74.76]) by passt.top (Postfix) with ESMTPS id 706F15A0271 for ; Fri, 04 Apr 2025 01:35:28 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202502; t=1743723325; bh=jDM4ytB6xpFF3fuuvHznHbCHFjV3GdFmxIVsmluwx6Q=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=laqkxLOw2o/W/5BFtJiIKjQIMqPNRwwuU0CKEEDN1MwKtAF04RUcrIGaiTe3LovjW kF/QlLgMH15YNMiHXEieMbu/tYM3d3ZMiGsl0j9gW9AecFTXKG+2kPFMpSCgXFNWIx oLFNaXRItOmw3KVN/Q23CYawcaY7T55wcQdbPGGSILmj5RYI2AfIddZyaZE+szX5+2 RNUcBwGHD5J+u67dJCRLCY+8UixQKVrG1ajg/+CJibthiPNpNQTgVbUZLGSKMyuxB8 BqyCC/vC/DQnfr5VemdKYmx8fND9wwZNejRHqXd2GvWdPv9JO0AJcnr1o6K7CMHaJx R5T10fFsV8MKg== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4ZTJ4s45Xgz4x4d; Fri, 4 Apr 2025 10:35:25 +1100 (AEDT) Date: Fri, 4 Apr 2025 10:31:29 +1100 From: David Gibson To: Jon Maloy Subject: Re: [PATCH v4] udp: support traceroute Message-ID: References: <20250403022229.836067-1-jmaloy@redhat.com> <20250403174833.6d033172@elisabeth> <4986e27d-20d9-4b2b-883d-d696e84ec9cf@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="NzahVet5i6VvF6Ga" Content-Disposition: inline In-Reply-To: <4986e27d-20d9-4b2b-883d-d696e84ec9cf@redhat.com> Message-ID-Hash: FOUAHSCBWUVP55Z6SRRFGW2BTUHMYM2L X-Message-ID-Hash: FOUAHSCBWUVP55Z6SRRFGW2BTUHMYM2L X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --NzahVet5i6VvF6Ga Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Apr 03, 2025 at 04:27:12PM -0400, Jon Maloy wrote: >=20 >=20 > On 2025-04-03 11:48, Stefano Brivio wrote: > > The implementation looks solid to me, a list of nits (or a bit > > more) below. > >=20 > > By the way, I don't think you need to Cc: people who are already on > > this list unless you specifically want their attention. > >=20 > > On Wed, 2 Apr 2025 22:22:29 -0400 > > Jon Maloy wrote: > >=20 > > > Now that ICMP pass-through from socket-to-tap is in place, it is > > > easy to support UDP based traceroute functionality in direction > > > tap-to-socket. > > >=20 > > > We fix that in this commit. > > >=20 > > > Signed-off-by: Jon Maloy > >=20 > > This fixes https://bugs.passt.top/show_bug.cgi?id=3D64 ("Link:" tag) if= I > > understood correctly. > >=20 > > > --- > > > v2: - Using ancillary data instead of setsockopt to transfer outgoing > > > TTL. > > > - Support IPv6 > > > v3: - Storing ttl per packet instead of per flow. This may not be > > > elegant, but much less intrusive than changing the flow >=20 > [...] >=20 > > > @@ -11,6 +11,8 @@ > > > /* Maximum size of a single packet stored in pool, including header= s */ > > > #define PACKET_MAX_LEN ((size_t)UINT16_MAX) > > > +#define DEFAULT_TTL 64 > >=20 > > If I understood correctly, David's comment to this on v3: > >=20 > > https://archives.passt.top/passt-dev/Z-om3Ey-HR1Hj8UH@zatzit/ > >=20 > > was meant to imply that, as the default value can be changed via > > sysctl, the value set via sysctl could be read at start-up. I'm fine > > with 64 as well, by the way, with a slight preference for reading the > > value via sysctl. >=20 > I don't think the local host/container setting will have any effect > if the sending guest is a VM. That's true, but.. > The benefit is of this is dubious. =2E. uflow->ttl[] isn't so much representing what the guest set, as a cache of what the socket is sending and that *does* depend on the host value. >=20 > >=20 > > All this might go away, though, please read the comment to > > udp_flow_new() below, first. > >=20 > > > + > > > /** > > > * struct pool - Generic pool of packets stored in a buffer > > > * @buf: Buffer storing packet descriptors, > > > diff --git a/tap.c b/tap.c > > > index 3a6fcbe..e65d592 100644 > > > --- a/tap.c > > > +++ b/tap.c > > > @@ -563,6 +563,7 @@ PACKET_POOL_DECL(pool_l4, UIO_MAXIOV, pkt_buf); > > > * @dest: Destination port > > > * @saddr: Source address > > > * @daddr: Destination address > > > + * @ttl: Time to live > > > * @msg: Array of messages that can be handled in a single call > > > */ > > > static struct tap4_l4_t { > > > @@ -574,6 +575,8 @@ static struct tap4_l4_t { > > > struct in_addr saddr; > > > struct in_addr daddr; > > > + uint8_t ttl; > >=20 > > If you move this after 'protocol' you save 4 or 8 bytes depending on > > the architecture and, perhaps more importantly, with 64-byte cachelines, > > you can fit the set of fields involved in the L4_MATCH() comparison > > four times instead of three. If you have a look with pahole(1): > >=20 > Good point. I didn't notice. >=20 >=20 > [...] > > > const struct flowside *toside; > > > struct mmsghdr mm[UIO_MAXIOV]; > > > @@ -938,6 +940,19 @@ int udp_tap_handler(const struct ctx *c, uint8_t= pif, > > > mm[i].msg_hdr.msg_controllen =3D 0; > > > mm[i].msg_hdr.msg_flags =3D 0; > > > + if (ttl !=3D uflow->ttl[tosidx.sidei]) { > > > + uflow->ttl[tosidx.sidei] =3D ttl; > > > + if (af =3D=3D AF_INET) { > > > + if (setsockopt(s, IPPROTO_IP, IP_TTL, > > > + &ttl, sizeof(ttl)) < 0) > > > + perror("setsockopt (IP_TTL)"); > >=20 > > This would print to file descriptor 2 even if it's a socket. It should > > be err_perror() instead, but now we also have flow_perror() which > > prints flow index and type, given 'uflow' here, say: > >=20 > > flow_perror(uflow, "IP_TTL setsockopt"); > >=20 > > > + } else { > > > + if (setsockopt(s, IPPROTO_IPV6, IPV6_HOPLIMIT, > > > + &ttl, sizeof(ttl)) < 0) > > > + perror("setsockopt (IP_TTL)"); > >=20 > > ...and this is IPV6_HOPLIMIT, not IP_TTL, so perhaps: > >=20 > > flow_perror(uflow, > > "setsockopt IPV6_HOPLIMIT"); > >=20 > Ok. >=20 > > > + } > > > + } > > > + > > > count++; > > > } > > > diff --git a/udp.h b/udp.h > > > index de2df6d..041fad4 100644 > > > --- a/udp.h > > > +++ b/udp.h > > > @@ -15,7 +15,8 @@ void udp_reply_sock_handler(const struct ctx *c, un= ion epoll_ref ref, > > > uint32_t events, const struct timespec *now); > > > int udp_tap_handler(const struct ctx *c, uint8_t pif, > > > sa_family_t af, const void *saddr, const void *daddr, > > > - const struct pool *p, int idx, const struct timespec *now); > > > + uint8_t ttl, const struct pool *p, int idx, > >=20 > > Excess whitespace beetween 'uint8_t' and 'ttl'. > >=20 > > > + const struct timespec *now); > > > int udp_sock_init(const struct ctx *c, int ns, const union inany_ad= dr *addr, > > > const char *ifname, in_port_t port); > > > int udp_init(struct ctx *c); > > > diff --git a/udp_flow.c b/udp_flow.c > > > index bf4b896..39372c2 100644 > > > --- a/udp_flow.c > > > +++ b/udp_flow.c > > > @@ -137,6 +137,7 @@ static flow_sidx_t udp_flow_new(const struct ctx = *c, union flow *flow, > > > uflow =3D FLOW_SET_TYPE(flow, FLOW_UDP, udp); > > > uflow->ts =3D now->tv_sec; > > > uflow->s[INISIDE] =3D uflow->s[TGTSIDE] =3D -1; > > > + uflow->ttl[INISIDE] =3D uflow->ttl[TGTSIDE] =3D DEFAULT_TTL; > >=20 > > By the way, instead of using a default value, what about fetching the > > current value with getsockopt()? > >=20 > > One additional system call per UDP flow doesn't feel like a lot of > > overhead, and we can be sure it's correct, no matter if the user > > configures a different value before or after we start. > >=20 > This patch fixes UDP messaging tap->socket, and TTL may have any > value in the first arriving packet. Reading it from the socket here only > makes sense when I add the same support in direction socket->tap. > That is my next project. >=20 > > > if (s_ini >=3D 0) { > > > /* When using auto port-scanning the listening port could go > > > diff --git a/udp_flow.h b/udp_flow.h > > > index 9a1b059..606ac08 100644 > > > --- a/udp_flow.h > > > +++ b/udp_flow.h > > > @@ -21,6 +21,7 @@ struct udp_flow { > > > bool closed :1; > > > time_t ts; > > > int s[SIDES]; > > > + uint8_t ttl[SIDES]; > >=20 > > Ths should be added to the struct comment above, which, by mistake, > > seems to refer to 'struct udp' by the way (I would fix that right away > > while at it...). >=20 > ok. >=20 > ///jon >=20 > >=20 > > > }; > > > struct udp_flow *udp_at_sidx(flow_sidx_t sidx); > >=20 >=20 --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --NzahVet5i6VvF6Ga Content-Type: application/pgp-signature; name=signature.asc -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmfvGlAACgkQzQJF27ox 2GfO+hAAmu/sMvVVvbSRZG6I83O0U/MCEX4tiVeDTx3l1iq1XgTfPcfUSDq/vdpU W0la7NoIe7Px71RjlXxO4iQLUlH2ZJPQyOjDsr7cZTKSRLGyiSeIMHXuEsAkINFa 6XW1vb45cW9s1D3SyU0+JAks12EXiZLJrLqIMAvdPfdycl8HoxIADZfSw5+5dqI9 4slIkQlL12scg/3VYSveObBlDL+BHOjcLo5cFdTRkwZSRQpPBNUd4ZQZHifO0wBs Qqe6PLapJqr1Z0qulhxJ7U8l5XvQbXH1xDS2dhfn0bF5EndbytMNmGKlzRFXJwa7 fzil+umRLfCRzNc2vZRrqr5iUQzj4HtKtPfC/0Le1ocBXU6k0Nv4+eUmeXvD1Lvb 4bZSg6K7i0Pefbt3HYdEIdcLK51naLt9BC+jpbDolz9Wo48hlZlobM0bEvCiYDlr nzUa5y92+MZ9qCOMFnM0JI4j/PHjLjbcgYp7kp/GYpxCo2clkJhqa2iSPFP0vOHH OznAV8WQwD/VIVCyAyQWB550kNRHcjoTR1oYaOKKtXQtVoOHQ9ubrDuRfzMYWkeW ibSWECOheDF0EfHceqIVeCUYoHFNMAByMQzM+PEr2eiSOVr1wK/T5jnhIA3j9S6T PhNyaC24Hn9ZBbxqK4e1Lg0jHb+lAH7AwOikQ7Xt/9l9pzw4bqA= =EBuS -----END PGP SIGNATURE----- --NzahVet5i6VvF6Ga--