From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from gandalf.ozlabs.org (gandalf.ozlabs.org [150.107.74.76]) by passt.top (Postfix) with ESMTPS id B54515A0271 for ; Tue, 16 Jan 2024 07:27:09 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202312; t=1705386425; bh=S7vXYDwABilhOH7VA7Q53rY8dIXqtwZCkWU55ASKwuA=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=No5Ngmqa4MVr9WBGqJy0SLICiC2hARc3GewjqrHn55ghyfzH9ay3V5qhINa9tgz87 5AT3uHbxy7eCbhEYwpoWU9ywwLbYRgQorpY2CtFdxJeBuv0nrULalRzoGWuIhNPUwM iZk5HuWwdvwQlBn5nO1ysE18CC4KcOfy4lL/iG186rm0nfDNPUrGT5eJ5ENMgWeDyZ Dq5+ZO4xKR4cveQqxoYSSwYHWNPgTL0aZkYtPQ0UIOX6fADi59FphyBETcvQ8WFjCN RjyeisMhB2nyLTMP7CXxJEzT7VpjaPMu7RokamzK7jzGGH5u7d8imPvzREMVhX9Lsy 2MJ0q8+j+a0GQ== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4TDfDn6tZ7z4x7q; Tue, 16 Jan 2024 17:27:05 +1100 (AEDT) Date: Tue, 16 Jan 2024 17:23:38 +1100 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH v3 02/15] tcp, flow: Maintain guest side flow information Message-ID: References: <20231221070237.1422557-1-david@gibson.dropbear.id.au> <20231221070237.1422557-3-david@gibson.dropbear.id.au> <20240113235105.2f0fb0ea@elisabeth> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="IKSaRY+xa04oIECG" Content-Disposition: inline In-Reply-To: <20240113235105.2f0fb0ea@elisabeth> Message-ID-Hash: 2DKP2BICDJNF227QCYJNIWRLXK3YVKI5 X-Message-ID-Hash: 2DKP2BICDJNF227QCYJNIWRLXK3YVKI5 X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --IKSaRY+xa04oIECG Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Sat, Jan 13, 2024 at 11:51:05PM +0100, Stefano Brivio wrote: > On Thu, 21 Dec 2023 18:02:24 +1100 > David Gibson wrote: >=20 > > tcp_tap_conn has several fields to track addresses and ports as seen by > > the guest/namespace. However we now have general fields for this in the > > common flowside struct. Use those instead of protocol specific fields. > >=20 > > So far we've only explicitly tracked the guest side forwarding address > > in the TCP connection - the remote address from the guest's point of > > view. The tap side endpoint address - the local address from the > > guest's point of view - was assumed to always be one of the handful of > > guest addresses we track as addr_seen (one each for IPv4, IPv6 global > > and IPv6 link-local). > >=20 > > struct flowside expects both addresses, and we will want to use the > > endpoint address in future. So, we need to add code to determine that > > address and record it in the flowside. > >=20 > > Signed-off-by: David Gibson > > --- > > flow.h | 20 +++++++++++++++ > > tcp.c | 75 ++++++++++++++++++++++++++++-------------------------- > > tcp_conn.h | 8 ------ > > 3 files changed, 59 insertions(+), 44 deletions(-) > >=20 > > diff --git a/flow.h b/flow.h > > index e090ba0..e7126e4 100644 > > --- a/flow.h > > +++ b/flow.h > > @@ -45,6 +45,26 @@ struct flowside { > > static_assert(_Alignof(struct flowside) =3D=3D _Alignof(uint32_t), > > "Unexpected alignment for struct flowside"); > > =20 > > +/** flowside_from_af - Initialize flowside from addresses >=20 > flowside_from_af() - ...from addresses and ports? Yes. I couldn't think of a way of spelling it out more directly without making it very long. I hoped this was clear enough by way of analogy with inany_from_af(). > > + * @fside: flowside to initialize > > + * @pif: pif if of this flowside >=20 > s/if/id/ or s/if/ID/. Fixed. > > + * @af: Address family (AF_INET or AF_INET6) > > + * @faddr: Forwarding address (pointer to in_addr or in6_addr) > > + * @fport: Forwarding port > > + * @eaddr: Endpoint address (pointer to in_addr or in6_addr) > > + * @eport: Endpoint port > > + */ > > +static inline void flowside_from_af(struct flowside *fside, uint8_t pi= f, int af, > > + const void *faddr, in_port_t fport, > > + const void *eaddr, in_port_t eport) > > +{ > > + fside->pif =3D pif; > > + inany_from_af(&fside->faddr, af, faddr); > > + inany_from_af(&fside->eaddr, af, eaddr); > > + fside->fport =3D fport; > > + fside->eport =3D eport; > > +} > > + > > /** flowside_complete - Check if flowside is fully initialized > > * @fside: flowside to check > > */ > > diff --git a/tcp.c b/tcp.c > > index ddabc34..7ef20b1 100644 > > --- a/tcp.c > > +++ b/tcp.c > > @@ -394,7 +394,9 @@ struct tcp6_l2_head { /* For MSS6 macro: keep in sy= nc with tcp6_l2_buf_t */ > > #define OPT_SACK 5 > > #define OPT_TS 8 > > =20 > > -#define CONN_V4(conn) (!!inany_v4(&(conn)->faddr)) > > +#define TAPFSIDE(conn) (&(conn)->f.side[TAPSIDE]) >=20 > After reviewing the patch I understand what TAPFSIDE() is for... but > I still think the name is pretty obscure. I'm struggling to come up > with a better idea though. Perhaps FLOWSIDE_TAP(conn), or: >=20 > #define FLOWSIDE(conn, _side) (&(conn)->f.side[(_side)]) >=20 > ? >=20 > Then sure, TAPFSIDE ans SOCKFSIDE have the advantage of brevity... but > for instance they don't spare any ugliness in tcp_tap_conn_from_sock() > compared to FLOWSIDE_TAP. Maybe they do in other places I didn't check. Right, the whole point of these macros is brevity - without it expressions we use it in become unreadably verbose. I think FLOWSIDE(conn, TAPSIDE) is too long. FLOWSIDE_TAP(conn) might be barely acceptable. > > + > > +#define CONN_V4(conn) (!!inany_v4(&TAPFSIDE(conn)->faddr)) > > #define CONN_V6(conn) (!CONN_V4(conn)) > > #define CONN_IS_CLOSING(conn) \ > > ((conn->events & ESTABLISHED) && \ > > @@ -845,7 +847,7 @@ static int tcp_rtt_dst_low(const struct tcp_tap_con= n *conn) > > int i; > > =20 > > for (i =3D 0; i < LOW_RTT_TABLE_SIZE; i++) > > - if (inany_equals(&conn->faddr, low_rtt_dst + i)) > > + if (inany_equals(&TAPFSIDE(conn)->faddr, low_rtt_dst + i)) > > return 1; > > =20 > > return 0; > > @@ -867,7 +869,7 @@ static void tcp_rtt_dst_check(const struct tcp_tap_= conn *conn, > > return; > > =20 > > for (i =3D 0; i < LOW_RTT_TABLE_SIZE; i++) { > > - if (inany_equals(&conn->faddr, low_rtt_dst + i)) > > + if (inany_equals(&TAPFSIDE(conn)->faddr, low_rtt_dst + i)) > > return; > > if (hole =3D=3D -1 && IN6_IS_ADDR_UNSPECIFIED(low_rtt_dst + i)) > > hole =3D i; > > @@ -879,7 +881,7 @@ static void tcp_rtt_dst_check(const struct tcp_tap_= conn *conn, > > if (hole =3D=3D -1) > > return; > > =20 > > - low_rtt_dst[hole++] =3D conn->faddr; > > + low_rtt_dst[hole++] =3D TAPFSIDE(conn)->faddr; > > if (hole =3D=3D LOW_RTT_TABLE_SIZE) > > hole =3D 0; > > inany_from_af(low_rtt_dst + hole, AF_INET6, &in6addr_any); > > @@ -1144,8 +1146,8 @@ static int tcp_hash_match(const struct tcp_tap_co= nn *conn, > > const union inany_addr *faddr, > > in_port_t eport, in_port_t fport) > > { > > - if (inany_equals(&conn->faddr, faddr) && > > - conn->eport =3D=3D eport && conn->fport =3D=3D fport) > > + if (inany_equals(&TAPFSIDE(conn)->faddr, faddr) && > > + TAPFSIDE(conn)->eport =3D=3D eport && TAPFSIDE(conn)->fport =3D= =3D fport) > > return 1; > > =20 > > return 0; > > @@ -1179,7 +1181,8 @@ static uint64_t tcp_hash(const struct ctx *c, con= st union inany_addr *faddr, > > static uint64_t tcp_conn_hash(const struct ctx *c, > > const struct tcp_tap_conn *conn) > > { > > - return tcp_hash(c, &conn->faddr, conn->eport, conn->fport); > > + return tcp_hash(c, &TAPFSIDE(conn)->faddr, > > + TAPFSIDE(conn)->eport, TAPFSIDE(conn)->fport); > > } > > =20 > > /** > > @@ -1365,13 +1368,13 @@ static size_t tcp_l2_buf_fill_headers(const str= uct ctx *c, > > void *p, size_t plen, > > const uint16_t *check, uint32_t seq) > > { > > - const struct in_addr *a4 =3D inany_v4(&conn->faddr); > > + const struct in_addr *a4 =3D inany_v4(&TAPFSIDE(conn)->faddr); > > size_t ip_len, tlen; > > =20 > > #define SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq) \ > > do { \ > > - b->th.source =3D htons(conn->fport); \ > > - b->th.dest =3D htons(conn->eport); \ > > + b->th.source =3D htons(TAPFSIDE(conn)->fport); \ > > + b->th.dest =3D htons(TAPFSIDE(conn)->eport); \ > > b->th.seq =3D htonl(seq); \ > > b->th.ack_seq =3D htonl(conn->seq_ack_to_tap); \ > > if (conn->events & ESTABLISHED) { \ > > @@ -1389,7 +1392,7 @@ do { \ > > ip_len =3D plen + sizeof(struct iphdr) + sizeof(struct tcphdr); > > b->iph.tot_len =3D htons(ip_len); > > b->iph.saddr =3D a4->s_addr; > > - b->iph.daddr =3D c->ip4.addr_seen.s_addr; > > + b->iph.daddr =3D inany_v4(&TAPFSIDE(conn)->eaddr)->s_addr; > > =20 > > if (check) > > b->iph.check =3D *check; > > @@ -1407,11 +1410,8 @@ do { \ > > ip_len =3D plen + sizeof(struct ipv6hdr) + sizeof(struct tcphdr); > > =20 > > b->ip6h.payload_len =3D htons(plen + sizeof(struct tcphdr)); > > - b->ip6h.saddr =3D conn->faddr.a6; > > - if (IN6_IS_ADDR_LINKLOCAL(&b->ip6h.saddr)) > > - b->ip6h.daddr =3D c->ip6.addr_ll_seen; > > - else > > - b->ip6h.daddr =3D c->ip6.addr_seen; > > + b->ip6h.saddr =3D TAPFSIDE(conn)->faddr.a6; > > + b->ip6h.daddr =3D TAPFSIDE(conn)->eaddr.a6; >=20 > As I was checking these assignments, it occurred to me that perhaps > laddr/raddr (local/remote), instead of faddr/eaddr, used in the sense of > the comment to struct flowside: >=20 > * @eaddr: Endpoint address (remote address from passt's PoV) > * @faddr: Forwarding address (local address from passt's PoV) >=20 > might be more obvious...? I'm not sure at this point. I don't think so. I introduced the endpoint/forwarding terminology specifically because local and remote often weren't clear. In this one case it might be ok, but there are lots of places where it's not necessarily obvious whether we're talking about local/remote w.r.t. passt or local/remote w.r.t. the guest. > > memset(b->ip6h.flow_lbl, 0, 3); > > =20 > > @@ -1739,26 +1739,21 @@ static void tcp_tap_window_update(struct tcp_ta= p_conn *conn, unsigned wnd) > > /** > > * tcp_seq_init() - Calculate initial sequence number according to RFC= 6528 > > * @c: Execution context > > - * @conn: TCP connection, with faddr, fport and eport populated > > + * @conn: TCP connection, with faddr, fport, eaddr, eport populated > > * @now: Current timestamp > > */ > > static void tcp_seq_init(const struct ctx *c, struct tcp_tap_conn *con= n, > > const struct timespec *now) > > { > > struct siphash_state state =3D SIPHASH_INIT(c->hash_secret); > > - union inany_addr aany; > > uint64_t hash; > > uint32_t ns; > > =20 > > - if (CONN_V4(conn)) > > - inany_from_af(&aany, AF_INET, &c->ip4.addr); > > - else > > - inany_from_af(&aany, AF_INET6, &c->ip6.addr); > > - > > - inany_siphash_feed(&state, &conn->faddr); > > - inany_siphash_feed(&state, &aany); > > + inany_siphash_feed(&state, &TAPFSIDE(conn)->faddr); > > + inany_siphash_feed(&state, &TAPFSIDE(conn)->eaddr); > > hash =3D siphash_final(&state, 36, > > - (uint64_t)conn->fport << 16 | conn->eport); > > + (uint64_t)TAPFSIDE(conn)->fport << 16 | > > + TAPFSIDE(conn)->eport); > > =20 > > /* 32ns ticks, overflows 32 bits every 137s */ > > ns =3D (now->tv_sec * 1000000000 + now->tv_nsec) >> 5; > > @@ -1925,8 +1920,6 @@ static void tcp_conn_from_tap(struct ctx *c, > > socklen_t sl; > > int s, mss; > > =20 > > - (void)saddr; > > - > > if (!(flow =3D flow_alloc())) > > return; > > =20 > > @@ -1953,6 +1946,10 @@ static void tcp_conn_from_tap(struct ctx *c, > > =20 > > conn =3D &flow->tcp; > > conn->f.type =3D FLOW_TCP; > > + flowside_from_af(TAPFSIDE(conn), PIF_TAP, af, > > + daddr, ntohs(th->dest), saddr, ntohs(th->source)); > > + ASSERT(flowside_complete(TAPFSIDE(conn))); > > + > > conn->sock =3D s; > > conn->timer =3D -1; > > conn_event(c, conn, TAP_SYN_RCVD); > > @@ -1972,8 +1969,6 @@ static void tcp_conn_from_tap(struct ctx *c, > > if (!(conn->wnd_from_tap =3D (htons(th->window) >> conn->ws_from_tap)= )) > > conn->wnd_from_tap =3D 1; > > =20 > > - inany_from_af(&conn->faddr, af, daddr); > > - > > if (af =3D=3D AF_INET) { > > sa =3D (struct sockaddr *)&addr4; > > sl =3D sizeof(addr4); > > @@ -1982,9 +1977,6 @@ static void tcp_conn_from_tap(struct ctx *c, > > sl =3D sizeof(addr6); > > } > > =20 > > - conn->fport =3D ntohs(th->dest); > > - conn->eport =3D ntohs(th->source); > > - > > conn->seq_init_from_tap =3D ntohl(th->seq); > > conn->seq_from_tap =3D conn->seq_init_from_tap + 1; > > conn->seq_ack_to_tap =3D conn->seq_from_tap; > > @@ -2674,10 +2666,21 @@ static void tcp_tap_conn_from_sock(struct ctx *= c, > > conn->ws_to_tap =3D conn->ws_from_tap =3D 0; > > conn_event(c, conn, SOCK_ACCEPTED); > > =20 > > - inany_from_sockaddr(&conn->faddr, &conn->fport, sa); > > - conn->eport =3D ref.port; > > + TAPFSIDE(conn)->pif =3D PIF_HOST; > > + inany_from_sockaddr(&TAPFSIDE(conn)->faddr, &TAPFSIDE(conn)->fport, s= a); > > + tcp_snat_inbound(c, &TAPFSIDE(conn)->faddr); > > + > > + if (CONN_V4(conn)) { > > + inany_from_af(&TAPFSIDE(conn)->eaddr, AF_INET, &c->ip4.addr_seen); > > + } else { > > + if (IN6_IS_ADDR_LINKLOCAL(&TAPFSIDE(conn)->faddr.a6)) > > + TAPFSIDE(conn)->eaddr.a6 =3D c->ip6.addr_ll_seen; > > + else > > + TAPFSIDE(conn)->eaddr.a6 =3D c->ip6.addr_seen; > > + } > > + TAPFSIDE(conn)->eport =3D ref.port; > > =20 > > - tcp_snat_inbound(c, &conn->faddr); > > + ASSERT(flowside_complete(TAPFSIDE(conn))); > > =20 > > tcp_seq_init(c, conn, now); > > tcp_hash_insert(c, conn); > > diff --git a/tcp_conn.h b/tcp_conn.h > > index 1a4dcf2..8d9c7bd 100644 > > --- a/tcp_conn.h > > +++ b/tcp_conn.h > > @@ -23,9 +23,6 @@ > > * @ws_to_tap: Window scaling factor advertised to tap/guest > > * @sndbuf: Sending buffer in kernel, rounded to 2 ^ SNDBUF_BITS > > * @seq_dup_ack_approx: Last duplicate ACK number sent to tap > > - * @faddr: Guest side forwarding address (guest's remote address) > > - * @eport: Guest side endpoint port (guest's local port) > > - * @fport: Guest side forwarding port (guest's remote port) > > * @wnd_from_tap: Last window size from tap, unscaled (as received) > > * @wnd_to_tap: Sending window advertised to tap, unscaled (as sent) > > * @seq_to_tap: Next sequence for packets to tap > > @@ -91,11 +88,6 @@ struct tcp_tap_conn { > > =20 > > uint8_t seq_dup_ack_approx; > > =20 > > - > > - union inany_addr faddr; > > - in_port_t eport; > > - in_port_t fport; > > - > > uint16_t wnd_from_tap; > > uint16_t wnd_to_tap; > > =20 >=20 > The rest of this patch looks all good to me, but I'm still reviewing > this series, currently at 6/15. >=20 --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --IKSaRY+xa04oIECG Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmWmIOkACgkQzQJF27ox 2GfpfA//UBUSG39W0tWR8I1emdUKCJOXyWtA0DrsmG98/Lds9wJ2LP9buEQPKEAL JxuKHZLaZGg2xyQOdsic6/O41xCqx2Vdc/+Xt++e/Yq86RzNI/AyVpawZj8AKJfs do4LdSyrdb2t48S90/lVA2iVt2QvhdEmW81i1z1i4Ob399a50DpA4KJn1jL0r4E4 Ip7W7jRk2ayJH5mj9ghWLKUsxjqKsEa3Q4Bu6ixMfYXnYrgppyNea0ghekuYfiUz GFoFVt1K2xVlt4AkZNFn5Ub60gUZa6Iqox2y7tnHkpFJ+l7UZhjbNz7fdkhqajZH vhXHsFkUYUETORo20NFWHu/fD+T7waENO7JZIVhGqXWw+kDEYZixHRzm30LLPEvh JTf70qQTf8uR+dcH0hZvwk9QYjEAunZfWx5u6qHPXzy/C/2FA8+TUQMIC07Svf+q VGlnfhmrVF7B5F57enfRkixVnMfdMLvVAIGPMTsrlUUWlQBjwVD8wnXvPdYct/EG kPFRNSdCT2GR5qzszVGtOYTgQRbOVWCKsaAMT0BmHBUCKErt2znGXFnF0HQxdUpc Nbr9xAGY8UAjvRacxvm/ltwMpA0h9Rc3sriymkeMYK1b0+MyxKiZWK3YPMzbgTgb eN79m/jeucPeBGlRMoPP9Q6ZxhKV58liPrtvBrwWMnEp31hKSYE= =8LGB -----END PGP SIGNATURE----- --IKSaRY+xa04oIECG--