From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from gandalf.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by passt.top (Postfix) with ESMTPS id 9795D5A026D for ; Tue, 11 Oct 2022 03:23:57 +0200 (CEST) Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4MmdN755ltz4xGj; Tue, 11 Oct 2022 12:23:51 +1100 (AEDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=201602; t=1665451431; bh=RSlI3M59rdEhc8mRxaU3m0VtjVgYRmVFheEu6YOvpF8=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=S3Ju+unob8aKdSjen8VCbufHVuO6nrXCbbbK3Bdw8W5djNebf8QOqMS3mWMs54rQX 2L5AzOPwn48Gk14gcK8uvLGtSnmUdRJWegBSTaAK4Pj9x5Mdh2GdmDh+UTONa/9+FT SuJDci2noFfI7vtmyrzwgNJzT52STcM6bAzAmgPU= Date: Tue, 11 Oct 2022 12:19:49 +1100 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH 2/3] tcp, tcp_splice: Fix port remapping for inbound, spliced connections Message-ID: References: <20221010233350.1198630-1-sbrivio@redhat.com> <20221010233350.1198630-3-sbrivio@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="W4lhU5kBttFOkhQe" Content-Disposition: inline In-Reply-To: <20221010233350.1198630-3-sbrivio@redhat.com> Message-ID-Hash: ERAZB4XR6BLTF4N4Z4ASG4FFXBNF5LTI X-Message-ID-Hash: ERAZB4XR6BLTF4N4Z4ASG4FFXBNF5LTI X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.3 Precedence: list List-Id: Development discussion and patches for passt Archived-At: <> Archived-At: List-Archive: <> List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --W4lhU5kBttFOkhQe Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Oct 11, 2022 at 01:33:49AM +0200, Stefano Brivio wrote: > It's indeed convenient to use the final destination port in the epoll > reference data, so that we don't have to check the configured port > deltas before establishing connections. However, this doesn't work if > we need to access the original port information to know what to do > once we receive a connection. >=20 > The existing implementation resulted in inbound, spliced connections > with a remapped destination port (and only those) to be attempted on > the outbound side, as we would check the wrong position in port > bitmaps to establish whether to use inbound or outbound sockets. >=20 > For listening spliced sockets, set the port index in the epoll > reference to the original port. Once we get a connection, use the > port delta arrays to find the final destination port, and the > original destination port to decide what socket pool we should use. >=20 > This avoids the need for a reverse mapping table as implemented for > UDP. >=20 > Fixes: 33482d5bf293 ("passt: Add PASTA mode, major rework") > Signed-off-by: Stefano Brivio > --- > tcp.c | 19 ++++++++----------- > tcp_splice.c | 18 ++++++++++++++---- > 2 files changed, 22 insertions(+), 15 deletions(-) >=20 > diff --git a/tcp.c b/tcp.c > index 63153b6..cc08353 100644 > --- a/tcp.c > +++ b/tcp.c > @@ -3084,15 +3084,16 @@ void tcp_sock_handler(struct ctx *c, union epoll_= ref ref, uint32_t events, > void tcp_sock_init(const struct ctx *c, int ns, sa_family_t af, > const void *addr, const char *ifname, in_port_t port) > { > + union tcp_epoll_ref tref_spliced =3D { .tcp.listen =3D 1, .tcp.splice = =3D 1 }; > union tcp_epoll_ref tref =3D { .tcp.listen =3D 1 }; > const void *bind_addr; > int s; > =20 > - if (ns) { > + if (ns) > tref.tcp.index =3D (in_port_t)(port + c->tcp.fwd_out.delta[port]); > - } else { > + else > tref.tcp.index =3D (in_port_t)(port + c->tcp.fwd_in.delta[port]); > - } > + tref_spliced.tcp.index =3D (in_port_t) port; > =20 > if (af =3D=3D AF_INET || af =3D=3D AF_UNSPEC) { > if (!addr && c->mode =3D=3D MODE_PASTA) > @@ -3100,8 +3101,7 @@ void tcp_sock_init(const struct ctx *c, int ns, sa_= family_t af, > else > bind_addr =3D addr; > =20 > - tref.tcp.v6 =3D 0; > - tref.tcp.splice =3D 0; > + tref.tcp.v6 =3D tref_spliced.tcp.v6 =3D 0; > =20 > if (!ns) { > s =3D sock_l4(c, AF_INET, IPPROTO_TCP, bind_addr, ifname, > @@ -3118,9 +3118,8 @@ void tcp_sock_init(const struct ctx *c, int ns, sa_= family_t af, > if (c->mode =3D=3D MODE_PASTA) { > bind_addr =3D &(uint32_t){ htonl(INADDR_LOOPBACK) }; > =20 > - tref.tcp.splice =3D 1; > s =3D sock_l4(c, AF_INET, IPPROTO_TCP, bind_addr, ifname, > - port, tref.u32); > + port, tref_spliced.u32); > if (s >=3D 0) > tcp_sock_set_bufsize(c, s); > else > @@ -3141,9 +3140,8 @@ void tcp_sock_init(const struct ctx *c, int ns, sa_= family_t af, > else > bind_addr =3D addr; > =20 > - tref.tcp.v6 =3D 1; > + tref.tcp.v6 =3D tref_spliced.tcp.v6 =3D 1; > =20 > - tref.tcp.splice =3D 0; > if (!ns) { > s =3D sock_l4(c, AF_INET6, IPPROTO_TCP, bind_addr, ifname, > port, tref.u32); > @@ -3159,9 +3157,8 @@ void tcp_sock_init(const struct ctx *c, int ns, sa_= family_t af, > if (c->mode =3D=3D MODE_PASTA) { > bind_addr =3D &in6addr_loopback; > =20 > - tref.tcp.splice =3D 1; > s =3D sock_l4(c, AF_INET6, IPPROTO_TCP, bind_addr, ifname, > - port, tref.u32); > + port, tref_spliced.u32); > if (s >=3D 0) > tcp_sock_set_bufsize(c, s); > else > diff --git a/tcp_splice.c b/tcp_splice.c > index 96c31c8..bf5f62c 100644 > --- a/tcp_splice.c > +++ b/tcp_splice.c > @@ -35,6 +35,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -512,13 +513,18 @@ static int tcp_splice_connect_ns(void *arg) > static int tcp_splice_new(const struct ctx *c, struct tcp_splice_conn *c= onn, > in_port_t port) > { > - struct tcp_splice_connect_ns_arg ns_arg =3D { c, conn, port, 0 }; > int *p, i, s =3D -1; > + bool inbound; > =20 > - if (bitmap_isset(c->tcp.fwd_in.map, port)) > + if (bitmap_isset(c->tcp.fwd_in.map, port)) { > p =3D CONN_V6(conn) ? ns_sock_pool6 : ns_sock_pool4; > - else > + port +=3D c->tcp.fwd_in.delta[port]; > + inbound =3D true; > + } else { > p =3D CONN_V6(conn) ? init_sock_pool6 : init_sock_pool4; > + port +=3D c->tcp.fwd_out.delta[port]; > + inbound =3D false; > + } This smells wrong to me. AFAICT there's nothing inherently wrong with forwarding inbound connections to port 5015 to port 5016 in the namespace *and* forwarding outbound connections to port 5015 to port 5016 on the host. So I think rather than consulting the port map here, each conn object should know if its an inward or outward connection. To make that work with epoll, we'd also need a bit in the ref which says whether it's a socket listening on the host or in the ns. > =20 > for (i =3D 0; i < TCP_SOCK_POOL_SIZE; i++, p++) { > SWAP(s, *p); > @@ -526,11 +532,15 @@ static int tcp_splice_new(const struct ctx *c, stru= ct tcp_splice_conn *conn, > break; > } > =20 > - if (s < 0 && bitmap_isset(c->tcp.fwd_in.map, port)) { > + /* No socket available in namespace: create a new one for connect() */ > + if (s < 0 && inbound) { > + struct tcp_splice_connect_ns_arg ns_arg =3D { c, conn, port, 0 }; > + > NS_CALL(tcp_splice_connect_ns, &ns_arg); > return ns_arg.ret; > } > =20 > + /* Otherwise, the socket will connect on the side it was created on */ > return tcp_splice_connect(c, conn, s, port); > } > =20 --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --W4lhU5kBttFOkhQe Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEoULxWu4/Ws0dB+XtgypY4gEwYSIFAmNExK4ACgkQgypY4gEw YSJmqRAAtdNa0zIGQE6+ZQzCjZ24/lEkIU0dKgHht+gBH5QZ0KHzhZgJctQhIOmu g2vlKpNbAcD3M04uzTq8Wwx9AHrVCcIxW+jQaYywV6pOdeKm7jJAht8LFVHQXggp E2x4xH7p2l6mrKjcxOs/w/dxzztThRN9HsOU9EB/pJrEyZIfSAMLufbhsxoFKf0u NtI832YlVWMu3Ejv4HDZNQcsIZkycBMX8D3swdB3LrMa0s0MNBJqoSd0xAakHIS7 x2Tc2Eln22Hp5IcyaVrh2+yUiPbwra9vo5aT3ADFYi67Z2RvxjkzuhxBPxolcjW1 +qbroOhHHooe0kfuPiBhn7BYEmLvO81V/7TdYZQ8xHnLpuaINII6FweLwSq6MK18 38joW7BGP8gRzmumfZjvepXK36U8VxOMRAlRHZ360pGLggDEcEQXn/cc3pWO3qvE pnqN9CuUYs5FqX4f9ApeuEqQlUaPQDBDwgYIkI30DdPQJN+zEDzSaDO8hhOSYLLi dQ4vIYHTD3+Axvi+Lye7iC894qEMk2LaPeWB8h+s2ufSprB0LYkw6/WEjQDByJCk IZG6Yw0kOC48XK9hvQjLhafoe4qXpWR/c0mQ0kUSzEE1DrSe66UFrRarkaE+kf5U GTKq0p4qRrRiXE1Iyf3eynLtVOLqUTkqFZ3w+29R/15/A7fYwV0= =rRwx -----END PGP SIGNATURE----- --W4lhU5kBttFOkhQe--