From: Jon Maloy
To: passt-dev@passt.top, sbrivio@redhat.com, lvivier@redhat.com, dgibson@redhat.com, jmaloy@redhat.com
Subject: [PATCH v4] udp: support traceroute
Date: Wed, 2 Apr 2025 22:22:29 -0400
Message-ID: <20250403022229.836067-1-jmaloy@redhat.com>
List-Id: Development discussion and patches for passt

Now that ICMP pass-through from socket to tap is in place, it is easy
to support UDP-based traceroute in the tap-to-socket direction. This
commit adds that support.

Signed-off-by: Jon Maloy
---
v2: - Using ancillary data instead of setsockopt to transfer outgoing
      TTL.
    - Support IPv6
v3: - Storing ttl per packet instead of per flow. This may not be
      elegant, but much less intrusive than changing the flow
      criteria. This eliminates the need for the extra, flow-changing
      patch we introduced in v2.
v4: - Going back to something similar to the original solution, but
      storing current ttl in struct udp_flow, plus ensuring that all
      packets in a struct tap4_l4_t/tap6_l4_t instance have the same
      ttl.
      After input from David Gibson.
---
 packet.h   |  2 ++
 tap.c      | 18 ++++++++++++++----
 udp.c      | 17 ++++++++++++++++-
 udp.h      |  3 ++-
 udp_flow.c |  1 +
 udp_flow.h |  1 +
 6 files changed, 36 insertions(+), 6 deletions(-)

diff --git a/packet.h b/packet.h
index c94780a..e84e123 100644
--- a/packet.h
+++ b/packet.h
@@ -11,6 +11,8 @@
 /* Maximum size of a single packet stored in pool, including headers */
 #define PACKET_MAX_LEN ((size_t)UINT16_MAX)
 
+#define DEFAULT_TTL 64
+
 /**
  * struct pool - Generic pool of packets stored in a buffer
  * @buf:	Buffer storing packet descriptors,
diff --git a/tap.c b/tap.c
index 3a6fcbe..e65d592 100644
--- a/tap.c
+++ b/tap.c
@@ -563,6 +563,7 @@ PACKET_POOL_DECL(pool_l4, UIO_MAXIOV, pkt_buf);
  * @dest:	Destination port
  * @saddr:	Source address
  * @daddr:	Destination address
+ * @ttl:	Time to live
  * @msg:	Array of messages that can be handled in a single call
  */
 static struct tap4_l4_t {
@@ -574,6 +575,8 @@ static struct tap4_l4_t {
 	struct in_addr saddr;
 	struct in_addr daddr;
 
+	uint8_t ttl;
+
 	struct pool_l4_t p;
 } tap4_l4[TAP_SEQS /* Arbitrary: TAP_MSGS in theory, so limit in users */];
 
@@ -586,6 +589,7 @@ static struct tap4_l4_t {
  * @dest:	Destination port
  * @saddr:	Source address
  * @daddr:	Destination address
+ * @hop_limit:	Hop limit
  * @msg:	Array of messages that can be handled in a single call
  */
 static struct tap6_l4_t {
@@ -598,6 +602,8 @@ static struct tap6_l4_t {
 	struct in6_addr saddr;
 	struct in6_addr daddr;
 
+	uint8_t hop_limit;
+
 	struct pool_l4_t p;
 } tap6_l4[TAP_SEQS /* Arbitrary: TAP_MSGS in theory, so limit in users */];
 
@@ -786,7 +792,8 @@ resume:
 #define L4_MATCH(iph, uh, seq) \
 	((seq)->protocol == (iph)->protocol && \
 	 (seq)->source == (uh)->source && (seq)->dest == (uh)->dest && \
-	 (seq)->saddr.s_addr == (iph)->saddr && (seq)->daddr.s_addr == (iph)->daddr)
+	 (seq)->saddr.s_addr == (iph)->saddr && \
+	 (seq)->daddr.s_addr == (iph)->daddr && (seq)->ttl == (iph)->ttl)
 
 #define L4_SET(iph, uh, seq) \
 	do { \
@@ -795,6 +802,7 @@ resume:
 		(seq)->dest = (uh)->dest; \
 		(seq)->saddr.s_addr = (iph)->saddr; \
 		(seq)->daddr.s_addr = (iph)->daddr; \
+		(seq)->ttl = (iph)->ttl; \
 	} while (0)
 
 		if (seq && L4_MATCH(iph, uh, seq) && seq->p.count < UIO_MAXIOV)
@@ -843,7 +851,7 @@ append:
 			for (k = 0; k < p->count; )
 				k += udp_tap_handler(c, PIF_TAP, AF_INET,
 						     &seq->saddr, &seq->daddr,
-						     p, k, now);
+						     seq->ttl, p, k, now);
 		}
 	}
 
@@ -966,7 +974,8 @@ resume:
 	 (seq)->dest == (uh)->dest && \
 	 (seq)->flow_lbl == ip6_get_flow_lbl(ip6h) && \
 	 IN6_ARE_ADDR_EQUAL(&(seq)->saddr, saddr) && \
-	 IN6_ARE_ADDR_EQUAL(&(seq)->daddr, daddr))
+	 IN6_ARE_ADDR_EQUAL(&(seq)->daddr, daddr) && \
+	 (seq)->hop_limit == (ip6h)->hop_limit)
 
 #define L4_SET(ip6h, proto, uh, seq) \
 	do { \
@@ -976,6 +985,7 @@ resume:
 		(seq)->flow_lbl = ip6_get_flow_lbl(ip6h); \
 		(seq)->saddr = *saddr; \
 		(seq)->daddr = *daddr; \
+		(seq)->hop_limit = (ip6h)->hop_limit; \
 	} while (0)
 
 		if (seq && L4_MATCH(ip6h, proto, uh, seq) &&
@@ -1026,7 +1036,7 @@ append:
 			for (k = 0; k < p->count; )
 				k += udp_tap_handler(c, PIF_TAP, AF_INET6,
 						     &seq->saddr, &seq->daddr,
-						     p, k, now);
+						     seq->hop_limit, p, k, now);
 		}
 	}
 
diff --git a/udp.c b/udp.c
index 39431d7..bc93292 100644
--- a/udp.c
+++ b/udp.c
@@ -849,6 +849,7 @@ fail:
  * @af:		Address family, AF_INET or AF_INET6
  * @saddr:	Source address
  * @daddr:	Destination address
+ * @ttl:	TTL or hop limit for packets to be sent in this call
  * @p:		Pool of UDP packets, with UDP headers
  * @idx:	Index of first packet to process
  * @now:	Current timestamp
@@ -859,7 +860,8 @@ fail:
  */
 int udp_tap_handler(const struct ctx *c, uint8_t pif,
 		    sa_family_t af, const void *saddr, const void *daddr,
-		    const struct pool *p, int idx, const struct timespec *now)
+		    uint8_t ttl, const struct pool *p, int idx,
+		    const struct timespec *now)
 {
 	const struct flowside *toside;
 	struct mmsghdr mm[UIO_MAXIOV];
@@ -938,6 +940,19 @@ int udp_tap_handler(const struct ctx *c, uint8_t pif,
 		mm[i].msg_hdr.msg_controllen = 0;
 		mm[i].msg_hdr.msg_flags = 0;
 
+		if (ttl != uflow->ttl[tosidx.sidei]) {
+			uflow->ttl[tosidx.sidei] = ttl;
+			if (af == AF_INET) {
+				if (setsockopt(s, IPPROTO_IP, IP_TTL,
+					       &ttl, sizeof(ttl)) < 0)
+					perror("setsockopt (IP_TTL)");
+			} else {
+				if (setsockopt(s, IPPROTO_IPV6, IPV6_UNICAST_HOPS,
+					       &(int){ ttl }, sizeof(int)) < 0)
+					perror("setsockopt (IPV6_UNICAST_HOPS)");
+			}
+		}
+
 		count++;
 	}
 
diff --git a/udp.h b/udp.h
index de2df6d..041fad4 100644
--- a/udp.h
+++ b/udp.h
@@ -15,7 +15,8 @@ void udp_reply_sock_handler(const struct ctx *c, union epoll_ref ref,
 			    uint32_t events, const struct timespec *now);
 int udp_tap_handler(const struct ctx *c, uint8_t pif,
 		    sa_family_t af, const void *saddr, const void *daddr,
-		    const struct pool *p, int idx, const struct timespec *now);
+		    uint8_t ttl, const struct pool *p, int idx,
+		    const struct timespec *now);
 int udp_sock_init(const struct ctx *c, int ns,
 		  const union inany_addr *addr, const char *ifname, in_port_t port);
 int udp_init(struct ctx *c);
diff --git a/udp_flow.c b/udp_flow.c
index bf4b896..39372c2 100644
--- a/udp_flow.c
+++ b/udp_flow.c
@@ -137,6 +137,7 @@ static flow_sidx_t udp_flow_new(const struct ctx *c, union flow *flow,
 	uflow = FLOW_SET_TYPE(flow, FLOW_UDP, udp);
 	uflow->ts = now->tv_sec;
 	uflow->s[INISIDE] = uflow->s[TGTSIDE] = -1;
+	uflow->ttl[INISIDE] = uflow->ttl[TGTSIDE] = DEFAULT_TTL;
 
 	if (s_ini >= 0) {
 		/* When using auto port-scanning the listening port could go
diff --git a/udp_flow.h b/udp_flow.h
index 9a1b059..606ac08 100644
--- a/udp_flow.h
+++ b/udp_flow.h
@@ -21,6 +21,7 @@ struct udp_flow {
 	bool closed :1;
 	time_t ts;
 	int s[SIDES];
+	uint8_t ttl[SIDES];
 };
 
 struct udp_flow *udp_at_sidx(flow_sidx_t sidx);
-- 
2.48.1
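
Illustration for readers of the udp.c hunk: the idea of caching the last
TTL applied to a socket, and calling setsockopt() only when the value
carried by the tap-side packets changes, can be sketched in isolation
roughly as follows. This is a minimal standalone sketch, assuming Linux
and a plain UDP socket. The helper name sock_set_ttl() and its cache
argument are hypothetical and are not part of passt or of this patch. It
uses IP_TTL for IPv4 and IPV6_UNICAST_HOPS for IPv6, passing an int in
both cases.

```c
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not passt code): apply @ttl to socket @s only if it
 * differs from the value cached in @cur, then update the cache.
 *
 * Return: 0 on success or if nothing had to change, -1 if setsockopt() failed
 */
static int sock_set_ttl(int s, sa_family_t af, uint8_t ttl, uint8_t *cur)
{
	int level = af == AF_INET ? IPPROTO_IP : IPPROTO_IPV6;
	int opt = af == AF_INET ? IP_TTL : IPV6_UNICAST_HOPS;
	int v = ttl;	/* pass the value as int for both address families */

	if (ttl == *cur)
		return 0;	/* unchanged: skip the system call */

	if (setsockopt(s, level, opt, &v, sizeof(v)) < 0) {
		perror("setsockopt");
		return -1;
	}

	*cur = ttl;	/* remember what the socket is now using */
	return 0;
}
```

A caller would keep one cached value per socket, initialised to the
expected default (the patch uses DEFAULT_TTL, 64), and call the helper
before sending each batch. Unlike the hunk above, this sketch updates the
cache only after setsockopt() succeeds, so a failed call is retried on the
next batch. That is a choice made for the sketch, not something the patch
does.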