Date: Tue, 14 May 2024 10:15:37 +1000
From: David Gibson
To: Stefano Brivio
CC: passt-dev@passt.top
Subject: Re: [PATCH v4 02/16] tcp: Maintain flowside information for "tap" connections
In-Reply-To: <20240513200722.3dc02874@elisabeth>

On Mon, May 13, 2024 at 08:07:22PM +0200, Stefano Brivio wrote:
> On Fri, 3 May 2024 11:11:21 +1000
> David Gibson wrote:
>
> > tcp_tap_conn has several fields to track addresses and ports as seen
> > by the guest/namespace. We now have general fields for this in the
> > common flowside struct so use those instead of protocol specific
> > fields. The flowside also has space for the guest side endpoint
> > address (local address from the guest's PoV) so we fill that in as
> > well.
> >
> > We didn't previously store equivalent information for the connection
> > as it appears to the host; that was implicit in the state of the host
> > side socket. For future generalisations of flow/connection tracking,
> > we're going to need that information, so populate the other flowside
> > in each flow table entry with as much of this information as we can
> > easily obtain. For connections initiated by the guest that's the
> > endpoint address and port. To get the forwarding address and port
> > we'd need to call getsockname() in general, so leave that blank for
> > now. For connections initiated from outside, we also have the
> > endpoint address from accept(). We have the forwarding port from the
> > epoll ref, but we leave the forwarding address blank.
> >
> > For now we just fill the information in without really using it for
> > anything.
> >
> > Signed-off-by: David Gibson
> > ---
> >  flow.h     |  1 -
> >  tcp.c      | 88 +++++++++++++++++++++++++++++++++++++-----------------
> >  tcp_conn.h |  8 -----
> >  3 files changed, 60 insertions(+), 37 deletions(-)
> >
> > diff --git a/flow.h b/flow.h
> > index f7fb537..88caa76 100644
> > --- a/flow.h
> > +++ b/flow.h
> > @@ -85,7 +85,6 @@ static inline void flowside_from_inany(struct flowside *fside, uint8_t pif,
> >   * If NULL is given for either address, the appropriate unspecified/any address
> >   * for the address family is substituted.
> >   */
> > -/* cppcheck-suppress unusedFunction */
> >  static inline void flowside_from_af(struct flowside *fside,
> >  				    uint8_t pif, sa_family_t af,
> >  				    const void *faddr, in_port_t fport,
> > diff --git a/tcp.c b/tcp.c
> > index 21d0af0..1835b86 100644
> > --- a/tcp.c
> > +++ b/tcp.c
> > @@ -372,7 +372,7 @@
> >  #define OPT_SACK		5
> >  #define OPT_TS			8
> >  
> > -#define CONN_V4(conn)		(!!inany_v4(&(conn)->faddr))
> > +#define CONN_V4(conn)		(!!inany_v4(&conn->f.side[TAPSIDE].faddr))
>
> ...which reminds me: I guess CONN_V4() and CONN_V6() should eventually
> go away, just like SPLICE_V6 in 7/16.

Yes. I've thought about doing that, but haven't quite gotten there yet.
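To make the two-flowside split from the commit message concrete, here is a minimal C sketch of how a guest-initiated IPv4 connection might populate both sides, and of what a CONN_V4()-style test reduces to if IPv4 addresses are stored IPv4-mapped (as the inany_v4() usage suggests). Everything here is hypothetical and simplified: struct flowside_sketch, fill_from_tap4() and the PIF_SKETCH_* values are not passt's actual flowside or inany API, and the sketch assumes the host-side endpoint is simply the guest's destination, ignoring any address translation the real code may apply.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical stand-in for struct flowside: addresses are kept as IPv6,
 * with IPv4 stored IPv4-mapped (::ffff:a.b.c.d) */
struct flowside_sketch {
	uint8_t pif;		/* interface this side belongs to */
	struct in6_addr eaddr;	/* endpoint address */
	struct in6_addr faddr;	/* forwarding address */
	in_port_t eport;	/* endpoint port, host byte order */
	in_port_t fport;	/* forwarding port, host byte order */
};

enum { PIF_SKETCH_TAP = 1, PIF_SKETCH_HOST = 2 };

/* Store an IPv4 address as an IPv4-mapped IPv6 address */
static void sketch_v4mapped(struct in6_addr *a6, const struct in_addr *a4)
{
	memset(a6, 0, sizeof(*a6));
	a6->s6_addr[10] = 0xff;
	a6->s6_addr[11] = 0xff;
	memcpy(&a6->s6_addr[12], a4, sizeof(*a4));
}

/* What a CONN_V4()-style check boils down to: is the tap side IPv4? */
static bool flowside_sketch_is_v4(const struct flowside_sketch *fs)
{
	return IN6_IS_ADDR_V4MAPPED(&fs->faddr);
}

/* Populate both sides for a guest-initiated IPv4 connection */
static void fill_from_tap4(struct flowside_sketch *tapside,
			   struct flowside_sketch *sockside,
			   const struct in_addr *saddr, in_port_t srcport,
			   const struct in_addr *daddr, in_port_t dstport)
{
	memset(tapside, 0, sizeof(*tapside));
	memset(sockside, 0, sizeof(*sockside));

	/* Guest side: everything is known from the guest's SYN */
	tapside->pif = PIF_SKETCH_TAP;
	sketch_v4mapped(&tapside->eaddr, saddr);	/* guest's local address */
	tapside->eport = srcport;
	sketch_v4mapped(&tapside->faddr, daddr);	/* guest's remote address */
	tapside->fport = dstport;

	/* Host side: only the endpoint is known (assumes no address
	 * translation); faddr/fport would need getsockname(), so they stay
	 * unspecified (::, port 0) */
	sockside->pif = PIF_SKETCH_HOST;
	sketch_v4mapped(&sockside->eaddr, daddr);
	sockside->eport = dstport;
}
```

The guest side mirrors the flowside_from_af(tapside, PIF_TAP, af, daddr, dstport, saddr, srcport) call in the patch, while the socket side leaves its forwarding address and port zeroed, matching the "leave that blank for now" approach described in the commit message.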

> >  #define CONN_V6(conn)		(!CONN_V4(conn))
> >  #define CONN_IS_CLOSING(conn)						\
> >  	((conn->events & ESTABLISHED) &&				\
> > @@ -795,10 +795,11 @@ static void conn_event_do(const struct ctx *c, struct tcp_tap_conn *conn,
> >   */
> >  static int tcp_rtt_dst_low(const struct tcp_tap_conn *conn)
> >  {
> > +	const struct flowside *tapside = &conn->f.side[TAPSIDE];
> >  	int i;
> >  
> >  	for (i = 0; i < LOW_RTT_TABLE_SIZE; i++)
> > -		if (inany_equals(&conn->faddr, low_rtt_dst + i))
> > +		if (inany_equals(&tapside->faddr, low_rtt_dst + i))
> >  			return 1;
> >  
> >  	return 0;
> > @@ -813,6 +814,7 @@ static void tcp_rtt_dst_check(const struct tcp_tap_conn *conn,
> >  			      const struct tcp_info *tinfo)
> >  {
> >  #ifdef HAS_MIN_RTT
> > +	const struct flowside *tapside = &conn->f.side[TAPSIDE];
> >  	int i, hole = -1;
> >  
> >  	if (!tinfo->tcpi_min_rtt ||
> > @@ -820,7 +822,7 @@ static void tcp_rtt_dst_check(const struct tcp_tap_conn *conn,
> >  		return;
> >  
> >  	for (i = 0; i < LOW_RTT_TABLE_SIZE; i++) {
> > -		if (inany_equals(&conn->faddr, low_rtt_dst + i))
> > +		if (inany_equals(&tapside->faddr, low_rtt_dst + i))
> >  			return;
> >  		if (hole == -1 && IN6_IS_ADDR_UNSPECIFIED(low_rtt_dst + i))
> >  			hole = i;
> > @@ -832,7 +834,7 @@ static void tcp_rtt_dst_check(const struct tcp_tap_conn *conn,
> >  	if (hole == -1)
> >  		return;
> >  
> > -	low_rtt_dst[hole++] = conn->faddr;
> > +	low_rtt_dst[hole++] = tapside->faddr;
> >  	if (hole == LOW_RTT_TABLE_SIZE)
> >  		hole = 0;
> >  	inany_from_af(low_rtt_dst + hole, AF_INET6, &in6addr_any);
> > @@ -1085,8 +1087,10 @@ static int tcp_hash_match(const struct tcp_tap_conn *conn,
> >  			  const union inany_addr *faddr,
> >  			  in_port_t eport, in_port_t fport)
> >  {
> > -	if (inany_equals(&conn->faddr, faddr) &&
> > -	    conn->eport == eport && conn->fport == fport)
> > +	const struct flowside *tapside = &conn->f.side[TAPSIDE];
> > +
> > +	if (inany_equals(&tapside->faddr, faddr) &&
> > +	    tapside->eport == eport && tapside->fport == fport)
> >  		return 1;
> >  
> >  	return 0;
> > @@ -1120,7 +1124,9 @@ static uint64_t tcp_hash(const struct ctx *c, const union inany_addr *faddr,
> >  static uint64_t tcp_conn_hash(const struct ctx *c,
> >  			      const struct tcp_tap_conn *conn)
> >  {
> > -	return tcp_hash(c, &conn->faddr, conn->eport, conn->fport);
> > +	const struct flowside *tapside = &conn->f.side[TAPSIDE];
> > +
> > +	return tcp_hash(c, &tapside->faddr, tapside->eport, tapside->fport);
> >  }
> >  
> >  /**
> > @@ -1302,10 +1308,12 @@ void tcp_defer_handler(struct ctx *c)
> >   * @seq:	Sequence number
> >   */
> >  static void tcp_fill_header(struct tcphdr *th,
> > -			       const struct tcp_tap_conn *conn, uint32_t seq)
> > +			    const struct tcp_tap_conn *conn, uint32_t seq)
> >  {
> > -	th->source = htons(conn->fport);
> > -	th->dest = htons(conn->eport);
> > +	const struct flowside *tapside = &conn->f.side[TAPSIDE];
> > +
> > +	th->source = htons(tapside->fport);
> > +	th->dest = htons(tapside->eport);
> >  	th->seq = htonl(seq);
> >  	th->ack_seq = htonl(conn->seq_ack_to_tap);
> >  	if (conn->events & ESTABLISHED) {
> > @@ -1337,7 +1345,8 @@ static size_t tcp_fill_headers4(const struct ctx *c,
> >  				size_t dlen, const uint16_t *check,
> >  				uint32_t seq)
> >  {
> > -	const struct in_addr *a4 = inany_v4(&conn->faddr);
> > +	const struct flowside *tapside = &conn->f.side[TAPSIDE];
> > +	const struct in_addr *a4 = inany_v4(&tapside->faddr);
> >  	size_t l4len = dlen + sizeof(*th);
> >  	size_t l3len = l4len + sizeof(*iph);
> >  
> > @@ -1379,10 +1388,11 @@ static size_t tcp_fill_headers6(const struct ctx *c,
> >  				struct ipv6hdr *ip6h, struct tcphdr *th,
> >  				size_t dlen, uint32_t seq)
> >  {
> > +	const struct flowside *tapside = &conn->f.side[TAPSIDE];
> >  	size_t l4len = dlen + sizeof(*th);
> >  
> >  	ip6h->payload_len = htons(l4len);
> > -	ip6h->saddr = conn->faddr.a6;
> > +	ip6h->saddr = tapside->faddr.a6;
> >  	if (IN6_IS_ADDR_LINKLOCAL(&ip6h->saddr))
> >  		ip6h->daddr = c->ip6.addr_ll_seen;
> >  	else
> > @@ -1421,9 +1431,7 @@ static size_t tcp_l2_buf_fill_headers(const struct ctx *c,
> >  				      struct iovec *iov, size_t dlen,
> >  				      const uint16_t *check, uint32_t seq)
> >  {
> > -	const struct in_addr *a4 = inany_v4(&conn->faddr);
> > -
> > -	if (a4) {
> > +	if (CONN_V4(conn)) {
> >  		return tcp_fill_headers4(c, conn, iov[TCP_IOV_TAP].iov_base,
> >  					 iov[TCP_IOV_IP].iov_base,
> >  					 iov[TCP_IOV_PAYLOAD].iov_base, dlen,
> > @@ -1738,7 +1746,7 @@ static void tcp_tap_window_update(struct tcp_tap_conn *conn, unsigned wnd)
> >  /**
> >   * tcp_seq_init() - Calculate initial sequence number according to RFC 6528
> >   * @c:		Execution context
> > - * @conn:	TCP connection, with faddr, fport and eport populated
> > + * @conn:	TCP connection, with tap flowside faddr, fport and eport
> >   * @now:	Current timestamp
> >   */
> >  static void tcp_seq_init(const struct ctx *c, struct tcp_tap_conn *conn,
> > @@ -1746,6 +1754,7 @@ static void tcp_seq_init(const struct ctx *c, struct tcp_tap_conn *conn,
> >  {
> >  	struct siphash_state state = SIPHASH_INIT(c->hash_secret);
> >  	union inany_addr aany;
> > +	const struct flowside *tapside = &conn->f.side[TAPSIDE];
>
> One line up.

Already fixed in the latest equivalent code.
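tcp_seq_init() follows RFC 6528, which defines ISN = M + F(localip, localport, remoteip, remoteport, secretkey), with M a clock and F() a pseudorandom function keyed by a secret. As the continuation of this hunk shows, the patch feeds the tap-side faddr, the guest's own address and (fport << 16 | eport) into SipHash, and uses a 32 ns clock in place of the RFC's 4 microsecond timer: 2^32 ticks of 32 ns is about 137.4 s, which is where the "overflows 32 bits every 137s" comment comes from. The sketch below is hypothetical (isn_sketch() is not a passt function) and swaps SipHash for FNV-1a purely to stay self-contained, so it only illustrates the structure of the computation and must not be used for real ISNs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* 64-bit FNV-1a, used here ONLY as a simple stand-in for the keyed SipHash
 * in the patch: it is not a PRF and must not be used for real ISNs */
static uint64_t fnv1a_feed(uint64_t h, const void *data, size_t len)
{
	const unsigned char *p = data;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= p[i];
		h *= 0x100000001b3ULL;	/* FNV-1a 64-bit prime */
	}
	return h;
}

/* RFC 6528: ISN = M + F(localip, localport, remoteip, remoteport, secretkey)
 * faddr/fport: guest's remote side, laddr/eport: guest's local side */
static uint32_t isn_sketch(uint64_t secret,
			   const uint8_t faddr[16], const uint8_t laddr[16],
			   uint16_t fport, uint16_t eport,
			   const struct timespec *now)
{
	uint64_t h = 0xcbf29ce484222325ULL;	/* FNV-1a offset basis */
	uint64_t ports = (uint64_t)fport << 16 | eport;
	uint32_t ticks;

	assert(now->tv_nsec >= 0 && now->tv_nsec < 1000000000L);

	/* F(): hash the secret together with the connection 4-tuple */
	h = fnv1a_feed(h, &secret, sizeof(secret));
	h = fnv1a_feed(h, faddr, 16);
	h = fnv1a_feed(h, laddr, 16);
	h = fnv1a_feed(h, &ports, sizeof(ports));

	/* M: 32 ns ticks; 2^32 ticks * 32 ns is ~137.4 s before wrapping */
	ticks = (uint32_t)(((uint64_t)now->tv_sec * 1000000000ULL +
			    (uint64_t)now->tv_nsec) >> 5);

	/* Truncating the hash to 32 bits is a choice of this sketch */
	return (uint32_t)h + ticks;
}
```

Two calls with the same 4-tuple and secret taken 320 ns apart differ by exactly 10 ticks, so successive ISNs for the same connection identity keep moving forward while different 4-tuples start from unrelated offsets.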

> >  	uint64_t hash;
> >  	uint32_t ns;
> >  
> > @@ -1754,10 +1763,10 @@ static void tcp_seq_init(const struct ctx *c, struct tcp_tap_conn *conn,
> >  	else
> >  		inany_from_af(&aany, AF_INET6, &c->ip6.addr);
> >  
> > -	inany_siphash_feed(&state, &conn->faddr);
> > +	inany_siphash_feed(&state, &tapside->faddr);
> >  	inany_siphash_feed(&state, &aany);
> >  	hash = siphash_final(&state, 36,
> > -			     (uint64_t)conn->fport << 16 | conn->eport);
> > +			     (uint64_t)tapside->fport << 16 | tapside->eport);
> >  
> >  	/* 32ns ticks, overflows 32 bits every 137s */
> >  	ns = (now->tv_sec * 1000000000 + now->tv_nsec) >> 5;
> > @@ -1945,6 +1954,7 @@ static void tcp_conn_from_tap(struct ctx *c, sa_family_t af,
> >  		.sin6_port = htons(dstport),
> >  		.sin6_addr = *(struct in6_addr *)daddr,
> >  	};
> > +	struct flowside *tapside, *sockside;
> >  	const struct sockaddr *sa;
> >  	struct tcp_tap_conn *conn;
> >  	union flow *flow;
> > @@ -1954,6 +1964,11 @@ static void tcp_conn_from_tap(struct ctx *c, sa_family_t af,
> >  	if (!(flow = flow_alloc()))
> >  		return;
> >  
> > +	tapside = &flow->f.side[TAPSIDE];
> > +	sockside = &flow->f.side[SOCKSIDE];
> > +
> > +	flowside_from_af(tapside, PIF_TAP, af, daddr, dstport, saddr, srcport);
> > +
> >  	if (af == AF_INET) {
> >  		if (IN4_IS_ADDR_UNSPECIFIED(saddr) ||
> >  		    IN4_IS_ADDR_BROADCAST(saddr) ||
> > @@ -2026,19 +2041,19 @@ static void tcp_conn_from_tap(struct ctx *c, sa_family_t af,
> >  	if (!(conn->wnd_from_tap = (htons(th->window) >> conn->ws_from_tap)))
> >  		conn->wnd_from_tap = 1;
> >  
> > -	inany_from_af(&conn->faddr, af, daddr);
> > +	sockside->pif = PIF_HOST;
> > +	sockside->eport = dstport;
> >  
> >  	if (af == AF_INET) {
> > +		inany_from_af(&sockside->eaddr, AF_INET, &addr4.sin_addr);
> >  		sa = (struct sockaddr *)&addr4;
> >  		sl = sizeof(addr4);
> >  	} else {
> > +		inany_from_af(&sockside->eaddr, AF_INET6, &addr6.sin6_addr);
> >  		sa = (struct sockaddr *)&addr6;
> >  		sl = sizeof(addr6);
> >  	}
> >  
> > -	conn->fport = dstport;
> > -	conn->eport = srcport;
> > -
> >  	conn->seq_init_from_tap = ntohl(th->seq);
> >  	conn->seq_from_tap = conn->seq_init_from_tap + 1;
> >  	conn->seq_ack_to_tap = conn->seq_from_tap;
> > @@ -2724,18 +2739,35 @@ static void tcp_tap_conn_from_sock(struct ctx *c, in_port_t dstport,
> >  				   const union sockaddr_inany *sa,
> >  				   const struct timespec *now)
> >  {
> > -	struct tcp_tap_conn *conn = FLOW_START(flow, FLOW_TCP, tcp, SOCKSIDE);
> > +	struct flowside *sockside = &flow->f.side[SOCKSIDE];
> > +	struct flowside *tapside = &flow->f.side[TAPSIDE];
> > +	struct tcp_tap_conn *conn;
> > +
> > +	sockside->pif = PIF_HOST;
> > +	inany_from_sockaddr(&sockside->eaddr, &sockside->eport, sa);
> > +	sockside->fport = dstport;
> > +
> > +	tapside->pif = PIF_TAP;
> > +	tapside->faddr = sockside->eaddr;
> > +	tapside->fport = sockside->eport;
> > +	tcp_snat_inbound(c, &tapside->faddr);
> > +	if (CONN_V4(flow)) {
> > +		inany_from_af(&tapside->eaddr, AF_INET, &c->ip4.addr_seen);
> > +	} else {
> > +		if (IN6_IS_ADDR_LINKLOCAL(&tapside->faddr.a6))
> > +			tapside->eaddr.a6 = c->ip6.addr_ll_seen;
> > +		else
> > +			tapside->eaddr.a6 = c->ip6.addr_seen;
> > +	}
> > +	tapside->eport = dstport + c->tcp.fwd_in.delta[dstport];
>
> Pre-existing, but I wonder: doesn't this port translation also belong
> to tcp_snat_inbound()?

Not really, because "snat" here is for "source nat". But in any case
both are subsumed into common NAT functions later in the series.
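For connections accepted from the host, the patch derives the tap side from the socket side: the peer address and port reported by accept() become the guest's forwarding address and port, and the guest-side endpoint port is dstport plus a per-port forwarding offset. A minimal sketch of that mapping follows. struct side_sketch, fill_from_sock() and fwd_delta[] are hypothetical stand-ins (fwd_delta[] playing the role of c->tcp.fwd_in.delta[]); the sketch assumes the offsets are stored as in_port_t so the sum wraps modulo 65536, omits tcp_snat_inbound(), and takes the guest address as a parameter instead of choosing between addr_seen and addr_ll_seen as the patch does.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <netinet/in.h>
#include <string.h>

/* Hypothetical, trimmed-down flowside: addresses and ports only */
struct side_sketch {
	struct in6_addr eaddr, faddr;	/* endpoint / forwarding address */
	in_port_t eport, fport;		/* endpoint / forwarding port */
};

/* Hypothetical stand-in for c->tcp.fwd_in.delta[]: per-port offsets, stored
 * as in_port_t so that the addition below wraps modulo 65536 */
static in_port_t fwd_delta[65536];

/* Populate both sides for a connection accepted on listening port dstport */
static void fill_from_sock(struct side_sketch *sockside,
			   struct side_sketch *tapside,
			   const struct in6_addr *peer, in_port_t peer_port,
			   in_port_t dstport, const struct in6_addr *guest)
{
	memset(sockside, 0, sizeof(*sockside));
	memset(tapside, 0, sizeof(*tapside));

	/* Host side: peer from accept(), port from the listening socket;
	 * the local (forwarding) address is left unspecified */
	sockside->eaddr = *peer;
	sockside->eport = peer_port;
	sockside->fport = dstport;

	/* Guest side: the host-side peer becomes the guest's remote end.
	 * Source NAT (tcp_snat_inbound() in the patch) is omitted here */
	tapside->faddr = sockside->eaddr;
	tapside->fport = sockside->eport;
	tapside->eaddr = *guest;
	tapside->eport = (in_port_t)(dstport + fwd_delta[dstport]);
}
```

For example, with an offset of (in_port_t)(80 - 8080) stored for port 8080, a connection accepted on host port 8080 reaches the guest on port 80.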

> > +
> > +	conn = FLOW_START(flow, FLOW_TCP, tcp, SOCKSIDE);
> >  
> >  	conn->sock = s;
> >  	conn->timer = -1;
> >  	conn->ws_to_tap = conn->ws_from_tap = 0;
> >  	conn_event(c, conn, SOCK_ACCEPTED);
> >  
> > -	inany_from_sockaddr(&conn->faddr, &conn->fport, sa);
> > -	conn->eport = dstport + c->tcp.fwd_in.delta[dstport];
> > -
> > -	tcp_snat_inbound(c, &conn->faddr);
> > -
> >  	tcp_seq_init(c, conn, now);
> >  	tcp_hash_insert(c, conn);
> >  
> > diff --git a/tcp_conn.h b/tcp_conn.h
> > index 1a07dd5..f55f144 100644
> > --- a/tcp_conn.h
> > +++ b/tcp_conn.h
> > @@ -23,9 +23,6 @@
> >   * @ws_to_tap:		Window scaling factor advertised to tap/guest
> >   * @sndbuf:		Sending buffer in kernel, rounded to 2 ^ SNDBUF_BITS
> >   * @seq_dup_ack_approx:	Last duplicate ACK number sent to tap
> > - * @faddr:		Guest side forwarding address (guest's remote address)
> > - * @eport:		Guest side endpoint port (guest's local port)
> > - * @fport:		Guest side forwarding port (guest's remote port)
> >   * @wnd_from_tap:	Last window size from tap, unscaled (as received)
> >   * @wnd_to_tap:		Sending window advertised to tap, unscaled (as sent)
> >   * @seq_to_tap:		Next sequence for packets to tap
> > @@ -91,11 +88,6 @@ struct tcp_tap_conn {
> >  
> >  	uint8_t		seq_dup_ack_approx;
> >  
> > -
> > -	union inany_addr faddr;
> > -	in_port_t	eport;
> > -	in_port_t	fport;
> > -
> >  	uint16_t	wnd_from_tap;
> >  	uint16_t	wnd_to_tap;
> >  
>

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you. NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson