From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <4986e27d-20d9-4b2b-883d-d696e84ec9cf@redhat.com>
Date: Thu, 3 Apr 2025 16:27:12 -0400
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [PATCH v4] udp: support traceroute
To: passt-dev@passt.top
References: <20250403022229.836067-1-jmaloy@redhat.com> <20250403174833.6d033172@elisabeth>
From: Jon Maloy <jmaloy@redhat.com>
In-Reply-To: <20250403174833.6d033172@elisabeth>
Content-Language: en-US
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
List-Id: Development discussion and patches for passt <passt-dev.passt.top>
Archived-At: <https://archives.passt.top/passt-dev/4986e27d-20d9-4b2b-883d-d696e84ec9cf@redhat.com/>

On 2025-04-03 11:48, Stefano Brivio wrote:
> The implementation looks solid to me, a list of nits (or a bit
> more) below.
> 
> By the way, I don't think you need to Cc: people who are already on
> this list unless you specifically want their attention.
> 
> On Wed,  2 Apr 2025 22:22:29 -0400
> Jon Maloy <jmaloy@redhat.com> wrote:
> 
>> Now that ICMP pass-through from socket-to-tap is in place, it is
>> easy to support UDP based traceroute functionality in direction
>> tap-to-socket.
>>
>> We fix that in this commit.
>>
>> Signed-off-by: Jon Maloy <jmaloy@redhat.com>
> 
> This fixes https://bugs.passt.top/show_bug.cgi?id=64 ("Link:" tag) if I
> understood correctly.
> 
>> ---
>> v2: - Using ancillary data instead of setsockopt to transfer outgoing
>>       TTL.
>>     - Support IPv6
>> v3: - Storing ttl per packet instead of per flow. This may not be
>>       elegant, but much less intrusive than changing the flow

[...]

>> @@ -11,6 +11,8 @@
>>  /* Maximum size of a single packet stored in pool, including headers */
>>  #define PACKET_MAX_LEN ((size_t)UINT16_MAX)
>>  
>> +#define DEFAULT_TTL 64
> 
> If I understood correctly, David's comment to this on v3:
> 
>   https://archives.passt.top/passt-dev/Z-om3Ey-HR1Hj8UH@zatzit/
> 
> was meant to imply that, as the default value can be changed via
> sysctl, the value set via sysctl could be read at start-up. I'm fine
> with 64 as well, by the way, with a slight preference for reading the
> value via sysctl.

I don't think the local host/container setting will have any effect if
the sending guest is a VM. The benefit of this is dubious.

> 
> All this might go away, though, please read the comment to
> udp_flow_new() below, first.
> 
>> +
>>  /**
>>   * struct pool - Generic pool of packets stored in a buffer
>>   * @buf: Buffer storing packet descriptors,
>> diff --git a/tap.c b/tap.c
>> index 3a6fcbe..e65d592 100644
>> --- a/tap.c
>> +++ b/tap.c
>> @@ -563,6 +563,7 @@ PACKET_POOL_DECL(pool_l4, UIO_MAXIOV, pkt_buf);
>>   * @dest: Destination port
>>   * @saddr: Source address
>>   * @daddr: Destination address
>> + * @ttl: Time to live
>>   * @msg: Array of messages that can be handled in a single call
>>   */
>>  static struct tap4_l4_t {
>> @@ -574,6 +575,8 @@ static struct tap4_l4_t {
>>      struct in_addr saddr;
>>      struct in_addr daddr;
>>  
>> +    uint8_t ttl;
> 
> If you move this after 'protocol' you save 4 or 8 bytes depending on
> the architecture and, perhaps more importantly, with 64-byte cachelines,
> you can fit the set of fields involved in the L4_MATCH() comparison
> four times instead of three. If you have a look with pahole(1):
> 
> --
> struct tap4_l4_t {
>         uint8_t                    protocol;             /*     0     1 */
> 
>         /* XXX 1 byte hole, try to pack */
> 
>         uint16_t                   source;               /*     2     2 */
>         uint16_t                   dest;                 /*     4     2 */
> 
>         /* XXX 2 bytes hole, try to pack */
> 
>         struct in_addr             saddr;                /*     8     4 */
>         struct in_addr             daddr;                /*    12     4 */
>         uint8_t                    ttl;                  /*    16     1 */
> 
>         /* XXX 7 bytes hole, try to pack */
> 
>         ...
> }
> --
> 
> becomes:
> 
> --
> struct tap4_l4_t {
>         uint8_t                    protocol;             /*     0     1 */
>         uint8_t                    ttl;                  /*     1     1 */
>         uint16_t                   source;               /*     2     2 */
>         uint16_t                   dest;                 /*     4     2 */
> 
>         /* XXX 2 bytes hole, try to pack */
> 
>         struct in_addr             saddr;                /*     8     4 */
>         struct in_addr             daddr;                /*    12     4 */
>         ...
> }

Good point. I didn't notice.

> --
> 
> ...if you move it, please don't forget to update the comment to the
> struct.
> 
>> +
>>      struct pool_l4_t p;

[...]
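
For illustration only (not part of the patch): a minimal sketch of the
layout effect described in the pahole(1) output above. The struct names
l4_before and l4_after, the helper l4_layout_saving(), and the 'void *p'
member standing in for 'struct pool_l4_t p' are all assumptions made to
keep the example self-contained; the real struct has more members.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <netinet/in.h>

/* Layout as posted: ttl placed after daddr (hypothetical reduced struct) */
struct l4_before {
	uint8_t protocol;
	uint16_t source;
	uint16_t dest;
	struct in_addr saddr;
	struct in_addr daddr;
	uint8_t ttl;
	void *p;	/* assumed stand-in for the pointer-aligned pool member */
};

/* Suggested layout: ttl fills the 1-byte hole right after protocol */
struct l4_after {
	uint8_t protocol;
	uint8_t ttl;
	uint16_t source;
	uint16_t dest;
	struct in_addr saddr;
	struct in_addr daddr;
	void *p;	/* assumed stand-in for the pointer-aligned pool member */
};

/* Compile-time checks of the offsets shown in the pahole(1) listings */
static_assert(offsetof(struct l4_before, ttl) == 16, "ttl follows daddr");
static_assert(offsetof(struct l4_after, ttl) == 1, "ttl fills the hole");

/* Bytes saved per entry by moving ttl next to protocol */
static size_t l4_layout_saving(void)
{
	return sizeof(struct l4_before) - sizeof(struct l4_after);
}
```

With this stand-in, l4_layout_saving() should come out as one pointer
size on common ABIs, which would correspond to the "4 or 8 bytes"
mentioned in the review.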
>>      const struct flowside *toside;
>>      struct mmsghdr mm[UIO_MAXIOV];
>> @@ -938,6 +940,19 @@ int udp_tap_handler(const struct ctx *c, uint8_t pif,
>>          mm[i].msg_hdr.msg_controllen = 0;
>>          mm[i].msg_hdr.msg_flags = 0;
>>  
>> +        if (ttl != uflow->ttl[tosidx.sidei]) {
>> +            uflow->ttl[tosidx.sidei] = ttl;
>> +            if (af == AF_INET) {
>> +                if (setsockopt(s, IPPROTO_IP, IP_TTL,
>> +                               &ttl, sizeof(ttl)) < 0)
>> +                    perror("setsockopt (IP_TTL)");
> 
> This would print to file descriptor 2 even if it's a socket. It should
> be err_perror() instead, but now we also have flow_perror() which
> prints flow index and type, given 'uflow' here, say:
> 
>     flow_perror(uflow, "IP_TTL setsockopt");
> 
>> +            } else {
>> +                if (setsockopt(s, IPPROTO_IPV6, IPV6_HOPLIMIT,
>> +                               &ttl, sizeof(ttl)) < 0)
>> +                    perror("setsockopt (IP_TTL)");
> 
> ...and this is IPV6_HOPLIMIT, not IP_TTL, so perhaps:
> 
>     flow_perror(uflow,
>                 "setsockopt IPV6_HOPLIMIT");
> 

Ok.

>> +            }
>> +        }
>> +
>>          count++;
>>      }
>>  
>> diff --git a/udp.h b/udp.h
>> index de2df6d..041fad4 100644
>> --- a/udp.h
>> +++ b/udp.h
>> @@ -15,7 +15,8 @@ void udp_reply_sock_handler(const struct ctx *c, union epoll_ref ref,
>>                              uint32_t events, const struct timespec *now);
>>  int udp_tap_handler(const struct ctx *c, uint8_t pif,
>>                      sa_family_t af, const void *saddr, const void *daddr,
>> -                    const struct pool *p, int idx, const struct timespec *now);
>> +                    uint8_t ttl, const struct pool *p, int idx,
> 
> Excess whitespace between 'uint8_t' and 'ttl'.
> 
>> +                    const struct timespec *now);
>>  int udp_sock_init(const struct ctx *c, int ns, const union inany_addr *addr,
>>                    const char *ifname, in_port_t port);
>>  int udp_init(struct ctx *c);
>> diff --git a/udp_flow.c b/udp_flow.c
>> index bf4b896..39372c2 100644
>> --- a/udp_flow.c
>> +++ b/udp_flow.c
>> @@ -137,6 +137,7 @@ static flow_sidx_t udp_flow_new(const struct ctx *c, union flow *flow,
>>      uflow = FLOW_SET_TYPE(flow, FLOW_UDP, udp);
>>      uflow->ts = now->tv_sec;
>>      uflow->s[INISIDE] = uflow->s[TGTSIDE] = -1;
>> +    uflow->ttl[INISIDE] = uflow->ttl[TGTSIDE] = DEFAULT_TTL;
> 
> By the way, instead of using a default value, what about fetching the
> current value with getsockopt()?
> 
> One additional system call per UDP flow doesn't feel like a lot of
> overhead, and we can be sure it's correct, no matter if the user
> configures a different value before or after we start.
> 

This patch fixes UDP messaging tap->socket, and TTL may have any value
in the first arriving packet. Reading it from the socket here only makes
sense when I add the same support in direction socket->tap. That is my
next project.

>>  
>>      if (s_ini >= 0) {
>>          /* When using auto port-scanning the listening port could go
>> diff --git a/udp_flow.h b/udp_flow.h
>> index 9a1b059..606ac08 100644
>> --- a/udp_flow.h
>> +++ b/udp_flow.h
>> @@ -21,6 +21,7 @@ struct udp_flow {
>>      bool closed :1;
>>      time_t ts;
>>      int s[SIDES];
>> +    uint8_t ttl[SIDES];
> 
> This should be added to the struct comment above, which, by mistake,
> seems to refer to 'struct udp' by the way (I would fix that right away
> while at it...).

ok.
///jon

> 
>>  };
>>  
>>  struct udp_flow *udp_at_sidx(flow_sidx_t sidx);
> 
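
For illustration only (not part of the patch): a minimal sketch of the
getsockopt() idea raised in the udp_flow_new() comment above, assuming
Linux. The helper name sock_ttl() is hypothetical and does not exist in
passt. For IPv6 the sketch queries IPV6_UNICAST_HOPS, which is an
assumption made here about the applicable option, not what the patch
uses.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * sock_ttl() - Hypothetical helper: query current unicast TTL / hop limit
 * @s:		Socket file descriptor
 * @af:		Address family, AF_INET or AF_INET6
 *
 * Return: TTL or hop limit reported for the socket, -1 on failure
 */
static int sock_ttl(int s, sa_family_t af)
{
	int level = (af == AF_INET) ? IPPROTO_IP : IPPROTO_IPV6;
	int opt = (af == AF_INET) ? IP_TTL : IPV6_UNICAST_HOPS;
	socklen_t len = sizeof(int);
	int ttl = -1;

	/* Both options are read into an int, not a uint8_t */
	if (getsockopt(s, level, opt, &ttl, &len) < 0)
		return -1;

	return ttl;
}
```

A caller could use something like sock_ttl(uflow->s[TGTSIDE], af) once
per new flow in place of a fixed default, at the cost of the one extra
system call per flow mentioned above.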