From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from gandalf.ozlabs.org (gandalf.ozlabs.org [150.107.74.76]) by passt.top (Postfix) with ESMTPS id 7B3A95A0271 for ; Thu, 18 Jan 2024 02:15:23 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202312; t=1705540519; bh=Jt0h78d9Oh20NcaRd5GrX4NJCYqg4vimUNvhVJcK498=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=cpArDXtqjCbcCOHPbukm7uCldk8lFH2znDpxdSQ7gvdVLN18ibOM18hZFh+jBL3n9 anGhiAg6jVOkFg+6SXFoIkfGxx50W/ViKmXJf2D3vxw9TaIspTSaBGZzbTeQiNNIoH 9kYia2kzOpd8P5LbY37G8Ken1Zhe94bjDm03RP8kHoRKSxnfIhq7gChsnTAzFBsLeq PooMLRYgrSejQEpNOoQ+oXxDqeo+ohXiK1H2JMiCbeWBiW9Z6iZM3b33iBqniZOY8V jl3T/QSHrkeR6crZCdPlpGIzRNjmJi0HWE5DkhbVIfLRpMGt8u0rvKLUZrTc8BiUGO gE7g8roJNCPjQ== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4TFlD72qxpz4xP9; Thu, 18 Jan 2024 12:15:19 +1100 (AEDT) Date: Thu, 18 Jan 2024 12:01:32 +1100 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH v3 04/15] tcp_splice,flow: Maintain flow information for spliced connections Message-ID: References: <20231221070237.1422557-1-david@gibson.dropbear.id.au> <20231221070237.1422557-5-david@gibson.dropbear.id.au> <20240117205536.25e6de59@elisabeth> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="Dviz+g7hFahSWvGV" Content-Disposition: inline In-Reply-To: <20240117205536.25e6de59@elisabeth> Message-ID-Hash: O2T7DYWU2GT5PS62WHEZZBXSBOR6NJYL X-Message-ID-Hash: O2T7DYWU2GT5PS62WHEZZBXSBOR6NJYL X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --Dviz+g7hFahSWvGV Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Jan 17, 2024 at 08:59:14PM +0100, Stefano Brivio wrote: > On Thu, 21 Dec 2023 18:02:26 +1100 > David Gibson wrote: >=20 > > Every flow in the flow table now has space for the the addresses as see= n by > > both the host and guest side. We fill that information in for regular > > "tap" TCP connections, but not for spliced connections. > >=20 > > Fill in that information for spliced connections too, so it's now unifo= rmly > > available for all flow types (that are implemented so far). >=20 > I wonder if carrying the address for spliced connections is in any way > useful -- other than being obviously useful as a simplification (which > justifies this of course). The simplification / consistency is even more important than it seems right here. One of the big aims of this (though I haven't implemented it yet) is to allow our NAT to be done generically, rather that per-protocol. That requires having the addressing information in the common structure regardless of flow type. > That is, for a spliced connection, addresses and ports are kind of > meaningless to us once the connection is established: we operate > exclusively above Layer 4. >=20 > Also, conceptually, all that's there to represent for a spliced > connection is that addresses are loopback. >=20 > To be clear: I'm not suggesting any change to this -- I just want to > raise the conceptual inconsistency if it didn't occur to you. Hmm.. I would say in this case it's conceptual consistency leading to redundancy of information in practice rather than a conceptual inconsistency. > > Signed-off-by: David Gibson > > --- > > tcp.c | 35 +++++++++++++---------------- > > tcp_splice.c | 62 +++++++++++++++++++++++++++++++++++++--------------- > > tcp_splice.h | 3 +-- > > 3 files changed, 60 insertions(+), 40 deletions(-) > >=20 > > diff --git a/tcp.c b/tcp.c > > index 18ab3ac..6d77cf6 100644 > > --- a/tcp.c > > +++ b/tcp.c > > @@ -2658,32 +2658,23 @@ static void tcp_snat_inbound(const struct ctx *= c, union inany_addr *addr) > > * tcp_tap_conn_from_sock() - Initialize state for non-spliced connect= ion > > * @c: Execution context > > * @ref: epoll reference of listening socket > > - * @conn: connection structure to initialize > > + * @conn: connection structure (with TAPFSIDE(@conn) completed) > > * @s: Accepted socket > > - * @sa: Peer socket address (from accept()) > > * @now: Current timestamp > > - * > > - * Return: true if able to create a tap connection, false otherwise > > */ > > -static bool tcp_tap_conn_from_sock(struct ctx *c, > > +static void tcp_tap_conn_from_sock(struct ctx *c, > > union tcp_listen_epoll_ref ref, > > struct tcp_tap_conn *conn, int s, > > - const struct sockaddr *sa, > > const struct timespec *now) > > { > > + ASSERT(flowside_complete(SOCKFSIDE(conn))); > > + > > conn->f.type =3D FLOW_TCP; > > conn->sock =3D s; > > conn->timer =3D -1; > > conn->ws_to_tap =3D conn->ws_from_tap =3D 0; > > conn_event(c, conn, SOCK_ACCEPTED); > > =20 > > - if (flowside_from_sock(SOCKFSIDE(conn), PIF_HOST, s, NULL, sa) < 0) { > > - err("tcp: Failed to get local name, connection dropped"); > > - return false; > > - } > > - > > - ASSERT(flowside_complete(SOCKFSIDE(conn))); > > - > > TAPFSIDE(conn)->pif =3D PIF_TAP; > > TAPFSIDE(conn)->faddr =3D SOCKFSIDE(conn)->eaddr; > > TAPFSIDE(conn)->fport =3D SOCKFSIDE(conn)->eport; > > @@ -2712,8 +2703,6 @@ static bool tcp_tap_conn_from_sock(struct ctx *c, > > conn_flag(c, conn, ACK_FROM_TAP_DUE); > > =20 > > tcp_get_sndbuf(conn); > > - > > - return true; > > } > > =20 > > /** > > @@ -2737,15 +2726,21 @@ void tcp_listen_handler(struct ctx *c, union ep= oll_ref ref, > > if (s < 0) > > goto cancel; > > =20 > > - if (c->mode =3D=3D MODE_PASTA && > > - tcp_splice_conn_from_sock(c, ref.tcp_listen, &flow->tcp_splice, > > - s, (struct sockaddr *)&sa)) > > + if (flowside_from_sock(&flow->f.side[0], ref.tcp_listen.pif, s, > > + NULL, &sa) < 0) { > > + err("tcp: Failed to get local name, connection dropped"); > > + close(s); > > + flow_alloc_cancel(flow); > > return; > > + } > > =20 > > - if (tcp_tap_conn_from_sock(c, ref.tcp_listen, &flow->tcp, s, > > - (struct sockaddr *)&sa, now)) > > + if (c->mode =3D=3D MODE_PASTA && > > + tcp_splice_conn_from_sock(c, ref.tcp_listen, &flow->tcp_splice, s= )) > > return; > > =20 > > + tcp_tap_conn_from_sock(c, ref.tcp_listen, &flow->tcp, s, now); > > + return;=09 > > + > > cancel: > > /* Failed to create the connection */ > > if (s >=3D 0) > > diff --git a/tcp_splice.c b/tcp_splice.c > > index eec02fe..0faeb1b 100644 > > --- a/tcp_splice.c > > +++ b/tcp_splice.c > > @@ -72,6 +72,9 @@ static int ns_sock_pool6 [TCP_SOCK_POOL_SIZE]; > > /* Pool of pre-opened pipes */ > > static int splice_pipe_pool [TCP_SPLICE_PIPE_POOL_SIZE][2]; > > =20 > > +#define FSIDE0(conn) (&(conn)->f.side[0]) > > +#define FSIDE1(conn) (&(conn)->f.side[1]) > > + > > #define CONN_V6(x) (x->flags & SPLICE_V6) > > #define CONN_V4(x) (!CONN_V6(x)) > > #define CONN_HAS(conn, set) ((conn->events & (set)) =3D=3D (set)) > > @@ -280,9 +283,21 @@ bool tcp_splice_flow_defer(union flow *flow) > > static int tcp_splice_connect_finish(const struct ctx *c, > > struct tcp_splice_conn *conn) > > { > > + struct sockaddr_storage sa; > > + socklen_t sl =3D sizeof(sa); > > unsigned side; > > int i =3D 0; > > =20 > > + if (getsockname(conn->s[1], (struct sockaddr *)&sa, &sl) < 0) { > > + int ret =3D -errno; > > + conn_flag(c, conn, CLOSING); > > + return ret; > > + } > > + inany_from_sockaddr(&FSIDE1(conn)->faddr, &FSIDE1(conn)->fport, > > + (struct sockaddr *)&sa); > > + > > + ASSERT(flowside_complete(FSIDE1(conn))); > > + > > for (side =3D 0; side < SIDES; side++) { > > conn->pipe[side][0] =3D conn->pipe[side][1] =3D -1; > > =20 > > @@ -352,13 +367,24 @@ static int tcp_splice_connect(const struct ctx *c= , struct tcp_splice_conn *conn, > > conn->s[1]); > > } > > =20 > > + /* It would be nicer if we could initialise FSIDE1 all at once with > > + * flowaddrs_from_af() or flowaddrs_from_sock(). However, we can't g= et > > + * the forwarding port until the connect() has finished and we don't > > + * want to block to wait for it. Meanwhile we have the endpoint addr= ess >=20 > [...] endpoint address and port [...]. Or, if "address" includes the > port too, then the comment should also say "forwarding address", not > "forwarding port". >=20 > It's confusing otherwise: why is there anything special with the > endpoint *address* as opposed to the forwarding *port*? >=20 > > + * here, but don't have a place to stash it other than in the flowadd= rs > > + * itself. So, initialisation of FSIDE1 is split between here and > > + * tcp_splice_connect_finish(). Ugly but necessary. > > + */ > > if (CONN_V6(conn)) { > > sa =3D (struct sockaddr *)&addr6; > > sl =3D sizeof(addr6); > > + inany_from_af(&FSIDE1(conn)->eaddr, AF_INET6, &addr6.sin6_addr); > > } else { > > sa =3D (struct sockaddr *)&addr4; > > sl =3D sizeof(addr4); > > + inany_from_af(&FSIDE1(conn)->eaddr, AF_INET, &addr4.sin_addr); > > } > > + FSIDE1(conn)->eport =3D port; >=20 --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --Dviz+g7hFahSWvGV Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmWoeGsACgkQzQJF27ox 2GdSKw//UjRkNloQtrrP7Qcfrc7ENWw2HLo9avfzLZXpKODvclDpMybEH4PjBmpu ZEl7bdEM+gvvNOpO9xFe6m1eLsI+pBQVXaXUTHCwzZDPHK10B6M4f3xm7Z3Xm0FA ZWSvtv7j1tojvS0kDBi8sH9k/SL90oMnuerSizQtnkqzaQh1W1cACvKa+MFfYcQa Ee9WolPwm7m1l5sLYvMmUyK8Xv2t0WDUYnPOl6zPWfxGS6yxIchTphXAgOBe1oZs yAL4SWds/SlPgs23+QSeW/D3MFVBtZTRqzUm5KHXoQuJ8bMWzfuvegdqd9f5/Jzb 6V7DJOkf3rE0yvc2cjxU+Uz/8ubqHAWSCLKnVUpfbt3LqHZ6X9YwBCqzxrojFhm8 IeSMaMSwflYej+lYFCSi3F3ZlHxv3jI/G4ahdyAMGtilAMRI9kr1vttLbfUUOJJY Tq3zSWByzm4Y9GF8zDcJbkFRk0BjiPkPv/yWHXMOuaCpaQnBHUuclUby6Ki70mJP bL+OmTra/ry+oMEw4y7hBNlti8HjSJzQGmJGiJNtklolltjmw5rbk3vLhyu9kT5r dml4j3jeeavd6u+DXbZoaIDTJhWjUuNAc14yBfY0qrkFoLpJQPFPyCPXrVDq8tje j6gkg38+orghSAqqTAeAGw07hOjQhJMsBTU6TsTfPmb862ZiQoU= =StGc -----END PGP SIGNATURE----- --Dviz+g7hFahSWvGV--