From: Jon Maloy <jmaloy@redhat.com>
To: passt-dev@passt.top
Subject: Re: [PATCH v4] udp: support traceroute
Date: Thu, 3 Apr 2025 16:27:12 -0400 [thread overview]
Message-ID: <4986e27d-20d9-4b2b-883d-d696e84ec9cf@redhat.com> (raw)
In-Reply-To: <20250403174833.6d033172@elisabeth>
On 2025-04-03 11:48, Stefano Brivio wrote:
> The implementation looks solid to me, a list of nits (or a bit
> more) below.
>
> By the way, I don't think you need to Cc: people who are already on
> this list unless you specifically want their attention.
>
> On Wed, 2 Apr 2025 22:22:29 -0400
> Jon Maloy <jmaloy@redhat.com> wrote:
>
>> Now that ICMP pass-through from socket-to-tap is in place, it is
>> easy to support UDP based traceroute functionality in direction
>> tap-to-socket.
>>
>> We fix that in this commit.
>>
>> Signed-off-by: Jon Maloy <jmaloy@redhat.com>
>
> This fixes https://bugs.passt.top/show_bug.cgi?id=64 ("Link:" tag) if I
> understood correctly.
>
>> ---
>> v2: - Using ancillary data instead of setsockopt to transfer outgoing
>> TTL.
>> - Support IPv6
>> v3: - Storing ttl per packet instead of per flow. This may not be
>> elegant, but much less intrusive than changing the flow
[...]
>> @@ -11,6 +11,8 @@
>> /* Maximum size of a single packet stored in pool, including headers */
>> #define PACKET_MAX_LEN ((size_t)UINT16_MAX)
>>
>> +#define DEFAULT_TTL 64
>
> If I understood correctly, David's comment to this on v3:
>
> https://archives.passt.top/passt-dev/Z-om3Ey-HR1Hj8UH@zatzit/
>
> was meant to imply that, as the default value can be changed via
> sysctl, the value set via sysctl could be read at start-up. I'm fine
> with 64 as well, by the way, with a slight preference for reading the
> value via sysctl.
I don't think the local host/container setting will have any effect
if the sending guest is a VM. The benefit is of this is dubious.
>
> All this might go away, though, please read the comment to
> udp_flow_new() below, first.
>
>> +
>> /**
>> * struct pool - Generic pool of packets stored in a buffer
>> * @buf: Buffer storing packet descriptors,
>> diff --git a/tap.c b/tap.c
>> index 3a6fcbe..e65d592 100644
>> --- a/tap.c
>> +++ b/tap.c
>> @@ -563,6 +563,7 @@ PACKET_POOL_DECL(pool_l4, UIO_MAXIOV, pkt_buf);
>> * @dest: Destination port
>> * @saddr: Source address
>> * @daddr: Destination address
>> + * @ttl: Time to live
>> * @msg: Array of messages that can be handled in a single call
>> */
>> static struct tap4_l4_t {
>> @@ -574,6 +575,8 @@ static struct tap4_l4_t {
>> struct in_addr saddr;
>> struct in_addr daddr;
>>
>> + uint8_t ttl;
>
> If you move this after 'protocol' you save 4 or 8 bytes depending on
> the architecture and, perhaps more importantly, with 64-byte cachelines,
> you can fit the set of fields involved in the L4_MATCH() comparison
> four times instead of three. If you have a look with pahole(1):
>
> --
> struct tap4_l4_t {
> uint8_t protocol; /* 0 1 */
>
> /* XXX 1 byte hole, try to pack */
>
> uint16_t source; /* 2 2 */
> uint16_t dest; /* 4 2 */
>
> /* XXX 2 bytes hole, try to pack */
>
> struct in_addr saddr; /* 8 4 */
> struct in_addr daddr; /* 12 4 */
> uint8_t ttl; /* 16 1 */
>
> /* XXX 7 bytes hole, try to pack */
>
> ...
> }
> --
>
> becomes:
>
> --
> struct tap4_l4_t {
> uint8_t protocol; /* 0 1 */
> uint8_t ttl; /* 1 1 */
> uint16_t source; /* 2 2 */
> uint16_t dest; /* 4 2 */
>
> /* XXX 2 bytes hole, try to pack */
>
> struct in_addr saddr; /* 8 4 */
> struct in_addr daddr; /* 12 4 */
> ...
> }
Good point. I didn't notice.
> --
>
> ...if you move it, please don't forget to update the comment to the
> struct.
>
>> +
>> struct pool_l4_t p;
[...]
>> const struct flowside *toside;
>> struct mmsghdr mm[UIO_MAXIOV];
>> @@ -938,6 +940,19 @@ int udp_tap_handler(const struct ctx *c, uint8_t pif,
>> mm[i].msg_hdr.msg_controllen = 0;
>> mm[i].msg_hdr.msg_flags = 0;
>>
>> + if (ttl != uflow->ttl[tosidx.sidei]) {
>> + uflow->ttl[tosidx.sidei] = ttl;
>> + if (af == AF_INET) {
>> + if (setsockopt(s, IPPROTO_IP, IP_TTL,
>> + &ttl, sizeof(ttl)) < 0)
>> + perror("setsockopt (IP_TTL)");
>
> This would print to file descriptor 2 even if it's a socket. It should
> be err_perror() instead, but now we also have flow_perror() which
> prints flow index and type, given 'uflow' here, say:
>
> flow_perror(uflow, "IP_TTL setsockopt");
>
>> + } else {
>> + if (setsockopt(s, IPPROTO_IPV6, IPV6_HOPLIMIT,
>> + &ttl, sizeof(ttl)) < 0)
>> + perror("setsockopt (IP_TTL)");
>
> ...and this is IPV6_HOPLIMIT, not IP_TTL, so perhaps:
>
> flow_perror(uflow,
> "setsockopt IPV6_HOPLIMIT");
>
Ok.
>> + }
>> + }
>> +
>> count++;
>> }
>>
>> diff --git a/udp.h b/udp.h
>> index de2df6d..041fad4 100644
>> --- a/udp.h
>> +++ b/udp.h
>> @@ -15,7 +15,8 @@ void udp_reply_sock_handler(const struct ctx *c, union epoll_ref ref,
>> uint32_t events, const struct timespec *now);
>> int udp_tap_handler(const struct ctx *c, uint8_t pif,
>> sa_family_t af, const void *saddr, const void *daddr,
>> - const struct pool *p, int idx, const struct timespec *now);
>> + uint8_t ttl, const struct pool *p, int idx,
>
> Excess whitespace beetween 'uint8_t' and 'ttl'.
>
>> + const struct timespec *now);
>> int udp_sock_init(const struct ctx *c, int ns, const union inany_addr *addr,
>> const char *ifname, in_port_t port);
>> int udp_init(struct ctx *c);
>> diff --git a/udp_flow.c b/udp_flow.c
>> index bf4b896..39372c2 100644
>> --- a/udp_flow.c
>> +++ b/udp_flow.c
>> @@ -137,6 +137,7 @@ static flow_sidx_t udp_flow_new(const struct ctx *c, union flow *flow,
>> uflow = FLOW_SET_TYPE(flow, FLOW_UDP, udp);
>> uflow->ts = now->tv_sec;
>> uflow->s[INISIDE] = uflow->s[TGTSIDE] = -1;
>> + uflow->ttl[INISIDE] = uflow->ttl[TGTSIDE] = DEFAULT_TTL;
>
> By the way, instead of using a default value, what about fetching the
> current value with getsockopt()?
>
> One additional system call per UDP flow doesn't feel like a lot of
> overhead, and we can be sure it's correct, no matter if the user
> configures a different value before or after we start.
>
This patch fixes UDP messaging tap->socket, and TTL may have any
value in the first arriving packet. Reading it from the socket here only
makes sense when I add the same support in direction socket->tap.
That is my next project.
>>
>> if (s_ini >= 0) {
>> /* When using auto port-scanning the listening port could go
>> diff --git a/udp_flow.h b/udp_flow.h
>> index 9a1b059..606ac08 100644
>> --- a/udp_flow.h
>> +++ b/udp_flow.h
>> @@ -21,6 +21,7 @@ struct udp_flow {
>> bool closed :1;
>> time_t ts;
>> int s[SIDES];
>> + uint8_t ttl[SIDES];
>
> Ths should be added to the struct comment above, which, by mistake,
> seems to refer to 'struct udp' by the way (I would fix that right away
> while at it...).
ok.
///jon
>
>> };
>>
>> struct udp_flow *udp_at_sidx(flow_sidx_t sidx);
>
next prev parent reply other threads:[~2025-04-03 20:27 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-03 2:22 [PATCH v4] udp: support traceroute Jon Maloy
2025-04-03 15:48 ` Stefano Brivio
2025-04-03 20:27 ` Jon Maloy [this message]
2025-04-03 23:31 ` David Gibson
2025-04-04 11:50 ` Stefano Brivio
2025-04-04 11:54 ` Stefano Brivio
2025-04-04 12:54 ` Jon Maloy
2025-04-04 13:02 ` Stefano Brivio
2025-04-04 13:35 ` Jon Maloy
2025-04-04 14:03 ` Stefano Brivio
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4986e27d-20d9-4b2b-883d-d696e84ec9cf@redhat.com \
--to=jmaloy@redhat.com \
--cc=passt-dev@passt.top \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).