Date: Sun, 15 Jan 2023 10:31:08 +1000
From: David Gibson
To: Stefano Brivio
Cc: passt-dev@passt.top
Subject: Re: [PATCH 5/5] tcp: Improve handling of fallback if socket pool is empty on new splice
In-Reply-To: <20230112190927.502081e7@elisabeth>
References: <20230109005026.24512-1-david@gibson.dropbear.id.au> <20230109005026.24512-6-david@gibson.dropbear.id.au> <20230112190927.502081e7@elisabeth>
On Thu, Jan 12, 2023 at 07:09:27PM +0100, Stefano Brivio wrote:
> On Mon, 9 Jan 2023 11:50:26 +1100
> David Gibson wrote:
>
> > When creating a new spliced connection, we need to get a socket in the
> > other ns from the originating one. To avoid excessive ns switches we
> > usually get these from a pool refilled on a timer. However, if the pool
> > runs out we need a fallback. Currently that's done by passing -1 as the
> > socket to tcp_splice_connect() and running it in the target ns.
> >
> > This means that tcp_splice_connect() itself needs to have different cases
> > depending on whether it's given an existing socket or not, which is
> > a separate concern from what it's mostly doing. We change it to require
> > a suitable open socket to be passed in, ensuring in the caller that we
> > have one.
> >
> > This requires adding the fallback paths to the caller, tcp_splice_new().
> > We use slightly different approaches for a socket in the init ns versus
> > the guest ns.
> >
> > This also means that we no longer need to run tcp_splice_connect() itself
> > in the guest ns, which allows us to remove a bunch of boilerplate code.
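[For illustration, the fallback flow the commit message describes can be sketched roughly as below. This is a toy model, not passt's actual code: POOL_SIZE, pool_take() and get_sock() are hypothetical stand-ins for TCP_SOCK_POOL_SIZE, tcp_conn_pool_sock() and the init-side fallback to tcp_conn_new_sock().]

```c
/* Toy sketch (NOT passt's actual code) of the caller-side fallback:
 * take a pre-opened socket from a pool of descriptors; if the pool is
 * empty, open a fresh socket directly instead. */
#include <assert.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define POOL_SIZE 4	/* stand-in for TCP_SOCK_POOL_SIZE */

/* Take the first available descriptor from the pool, or -1 if empty */
static int pool_take(int pool[])
{
	for (int i = 0; i < POOL_SIZE; i++) {
		if (pool[i] >= 0) {
			int s = pool[i];

			pool[i] = -1;
			return s;
		}
	}
	return -1;
}

/* init-side path: on an empty pool, just open one socket directly,
 * without refilling, to keep latency down */
static int get_sock(int pool[])
{
	int s = pool_take(pool);

	if (s < 0)
		s = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK,
			   IPPROTO_TCP);
	return s;
}
```

The point of the change is that tcp_splice_connect() never sees the "no socket yet" case: the caller always resolves it first, one way or the other.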
> >
> > Signed-off-by: David Gibson
> > ---
> >  tcp.c        |  2 +-
> >  tcp_conn.h   |  1 +
> >  tcp_splice.c | 89 ++++++++++++++++++----------------------------------
> >  3 files changed, 32 insertions(+), 60 deletions(-)
> >
> > diff --git a/tcp.c b/tcp.c
> > index a540235..7a88dab 100644
> > --- a/tcp.c
> > +++ b/tcp.c
> > @@ -1849,7 +1849,7 @@ int tcp_conn_pool_sock(int pool[])
> >   *
> >   * Return: socket number on success, negative code if socket creation failed
> >   */
> > -static int tcp_conn_new_sock(const struct ctx *c, sa_family_t af)
> > +int tcp_conn_new_sock(const struct ctx *c, sa_family_t af)
> >  {
> >  	int s;
> >
> > diff --git a/tcp_conn.h b/tcp_conn.h
> > index c807e8b..a499f34 100644
> > --- a/tcp_conn.h
> > +++ b/tcp_conn.h
> > @@ -193,6 +193,7 @@ void tcp_table_compact(struct ctx *c, union tcp_conn *hole);
> >  void tcp_splice_destroy(struct ctx *c, union tcp_conn *conn_union);
> >  void tcp_splice_timer(struct ctx *c, union tcp_conn *conn_union);
> >  int tcp_conn_pool_sock(int pool[]);
> > +int tcp_conn_new_sock(const struct ctx *c, sa_family_t af);
> >  void tcp_sock_refill_pool(const struct ctx *c, int pool[], int af);
> >  void tcp_splice_refill(const struct ctx *c);
> >
> > diff --git a/tcp_splice.c b/tcp_splice.c
> > index 5a88975..1cd9fdc 100644
> > --- a/tcp_splice.c
> > +++ b/tcp_splice.c
> > @@ -88,6 +88,9 @@ static const char *tcp_splice_flag_str[] __attribute((__unused__)) = {
> >  	"RCVLOWAT_ACT_B", "CLOSING",
> >  };
> >
> > +/* Forward decl */
>
> s/decl/declaration/

Fixed.
> > +static int tcp_sock_refill_ns(void *arg);
> > +
> >  /**
> >   * tcp_splice_conn_epoll_events() - epoll events masks for given state
> >   * @events:	Connection event flags
> > @@ -348,12 +351,8 @@ static int tcp_splice_connect_finish(const struct ctx *c,
> >   * Return: 0 for connect() succeeded or in progress, negative value on error
> >   */
> >  static int tcp_splice_connect(const struct ctx *c, struct tcp_splice_conn *conn,
> > -			      int s, in_port_t port)
> > +			      int sock_conn, in_port_t port)
> >  {
> > -	int sock_conn = (s >= 0) ? s : socket(CONN_V6(conn) ? AF_INET6 :
> > -							      AF_INET,
> > -					      SOCK_STREAM | SOCK_NONBLOCK,
> > -					      IPPROTO_TCP);
> >  	struct sockaddr_in6 addr6 = {
> >  		.sin6_family = AF_INET6,
> >  		.sin6_port = htons(port),
> > @@ -367,19 +366,8 @@ static int tcp_splice_connect(const struct ctx *c, struct tcp_splice_conn *conn,
> >  	const struct sockaddr *sa;
> >  	socklen_t sl;
> >
> > -	if (sock_conn < 0)
> > -		return -errno;
> > -
> > -	if (sock_conn > SOCKET_MAX) {
> > -		close(sock_conn);
> > -		return -EIO;
> > -	}
> > -
> >  	conn->b = sock_conn;
> >
> > -	if (s < 0)
> > -		tcp_sock_set_bufsize(c, conn->b);
> > -
> >  	if (setsockopt(conn->b, SOL_TCP, TCP_QUICKACK,
> >  		       &((int){ 1 }), sizeof(int))) {
> >  		trace("TCP (spliced): failed to set TCP_QUICKACK on socket %i",
> > @@ -410,36 +398,6 @@ static int tcp_splice_connect(const struct ctx *c, struct tcp_splice_conn *conn,
> >  	return 0;
> >  }
> >
> > -/**
> > - * struct tcp_splice_connect_ns_arg - Arguments for tcp_splice_connect_ns()
> > - * @c:		Execution context
> > - * @conn:	Accepted inbound connection
> > - * @port:	Destination port, host order
> > - * @ret:	Return value of tcp_splice_connect_ns()
> > - */
> > -struct tcp_splice_connect_ns_arg {
> > -	const struct ctx *c;
> > -	struct tcp_splice_conn *conn;
> > -	in_port_t port;
> > -	int ret;
> > -};
> > -
> > -/**
> > - * tcp_splice_connect_ns() - Enter namespace and call tcp_splice_connect()
> > - * @arg:	See struct tcp_splice_connect_ns_arg
> > - *
> > - * Return: 0
> > - */
> > -static int tcp_splice_connect_ns(void *arg)
> > -{
> > -	struct tcp_splice_connect_ns_arg *a;
> > -
> > -	a = (struct tcp_splice_connect_ns_arg *)arg;
> > -	ns_enter(a->c);
> > -	a->ret = tcp_splice_connect(a->c, a->conn, -1, a->port);
> > -	return 0;
> > -}
> > -
> >  /**
> >   * tcp_splice_new() - Handle new spliced connection
> >   * @c:		Execution context
> > @@ -452,24 +410,37 @@ static int tcp_splice_connect_ns(void *arg)
> >  static int tcp_splice_new(const struct ctx *c, struct tcp_splice_conn *conn,
> >  			  in_port_t port, int outbound)
> >  {
> > -	int *p, s = -1;
> > -
> > -	if (outbound)
> > -		p = CONN_V6(conn) ? init_sock_pool6 : init_sock_pool4;
> > -	else
> > -		p = CONN_V6(conn) ? ns_sock_pool6 : ns_sock_pool4;
> > +	int s = -1;
> > +
> > +	/* If the pool is empty we take slightly different approaches
> > +	 * for init or ns sockets.  For init sockets we just open a
> > +	 * new one without refilling the pool to keep latency down.
> > +	 * For ns sockets, we're going to incur the latency of
> > +	 * entering the ns anyway, so we might as well refill the
> > +	 * pool.
> > +	 */
>
> Well, wait, one thing is the latency of clone() and setns(), another
> thing is the latency coming from a series of (up to) 32 socket() calls.
>
> It's much lower, indeed, but somewhat comparable. Could we just have
> the same approach for both cases?

Well, we could. Making both paths refill the pool is very simple, but
might introduce more latency than we really want for the easier init
case. Making both paths just open a single socket is a little bit
fiddlier, because we'll need some more boilerplate for another helper
that can be NS_CALL()ed, instead of reusing the refill one.

What we have here was the tradeoff I settled on, admittedly without
much quantitative idea of the latencies involved. Happy to accept your
call on a better one.

> > +	if (outbound) {
> > +		int *p = CONN_V6(conn) ? init_sock_pool6 : init_sock_pool4;
> > +		int af = CONN_V6(conn) ? AF_INET6 : AF_INET;
> > +
> > +		s = tcp_conn_pool_sock(p);
> > +		if (s < 0)
> > +			s = tcp_conn_new_sock(c, af);
> > +	} else {
> > +		int *p = CONN_V6(conn) ? ns_sock_pool6 : ns_sock_pool4;
> >
> > -	s = tcp_conn_pool_sock(p);
> > +		/* If pool is empty, refill it first */
> > +		if (p[TCP_SOCK_POOL_SIZE-1] < 0)
>
> [TCP_SOCK_POOL_SIZE - 1]
>

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
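[For illustration, the ns-side refill-first path discussed above can be sketched as a toy model. This is not passt's actual code: it assumes pool slots are filled in order, so a negative last slot means the pool is no longer full, and pool_refill() is a hypothetical stand-in for the NS_CALL()ed refill helper (tcp_sock_refill_ns()).]

```c
/* Toy sketch (NOT passt's actual code) of the ns-side fallback:
 * since entering the ns is paid for anyway, refill the whole pool
 * before taking a socket from it. */
#include <assert.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define POOL_SIZE 4	/* stand-in for TCP_SOCK_POOL_SIZE */

/* Open a socket into every empty slot (cf. tcp_sock_refill_pool()) */
static void pool_refill(int pool[])
{
	for (int i = 0; i < POOL_SIZE; i++) {
		if (pool[i] < 0)
			pool[i] = socket(AF_INET,
					 SOCK_STREAM | SOCK_NONBLOCK,
					 IPPROTO_TCP);
	}
}

/* ns-side path: if the last slot is unused, the pool needs a refill;
 * refill it, then take a descriptor */
static int pool_take_ns(int pool[])
{
	int s;

	if (pool[POOL_SIZE - 1] < 0)
		pool_refill(pool);

	s = pool[POOL_SIZE - 1];
	pool[POOL_SIZE - 1] = -1;
	return s;
}
```

This mirrors the tradeoff in the thread: the ns path amortizes one namespace entry over up to a pool's worth of socket() calls, while the init path (previous sketch) opens exactly one socket.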