public inbox for passt-dev@passt.top
From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: passt-dev@passt.top
Subject: Re: [PATCH 5/5] tcp: Improve handling of fallback if socket pool is empty on new splice
Date: Mon, 23 Jan 2023 18:47:31 +0100	[thread overview]
Message-ID: <20230123184731.61f06299@elisabeth> (raw)
In-Reply-To: <Y8NJTFUkkhGFeLnT@yekko>

On Sun, 15 Jan 2023 10:31:08 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Thu, Jan 12, 2023 at 07:09:27PM +0100, Stefano Brivio wrote:
> > On Mon,  9 Jan 2023 11:50:26 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> > > When creating a new spliced connection, we need to get a socket in the
> > > other ns from the originating one.  To avoid excessive ns switches we
> > > usually get these from a pool refilled on a timer.  However, if the pool
> > > runs out we need a fallback.  Currently that's done by passing -1 as the
> > > socket to tcp_splice_connect() and running it in the target ns.
> > > 
> > > This means that tcp_splice_connect() itself needs to have different cases
> > > depending on whether it's given an existing socket or not, which is
> > > a separate concern from what it's mostly doing.  We change it to require
> > > a suitable open socket to be passed in, and ensure in the caller that we
> > > have one.
> > > 
> > > This requires adding the fallback paths to the caller, tcp_splice_new().
> > > We use slightly different approaches for a socket in the init ns versus the
> > > guest ns.
> > > 
> > > This also means that we no longer need to run tcp_splice_connect() itself
> > > in the guest ns, which allows us to remove a bunch of boilerplate code.
> > > 
> > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > > ---
> > >  tcp.c        |  2 +-
> > >  tcp_conn.h   |  1 +
> > >  tcp_splice.c | 89 ++++++++++++++++++----------------------------------
> > >  3 files changed, 32 insertions(+), 60 deletions(-)
> > > 
> > > diff --git a/tcp.c b/tcp.c
> > > index a540235..7a88dab 100644
> > > --- a/tcp.c
> > > +++ b/tcp.c
> > > @@ -1849,7 +1849,7 @@ int tcp_conn_pool_sock(int pool[])
> > >   *
> > >   * Return: socket number on success, negative code if socket creation failed
> > >   */
> > > -static int tcp_conn_new_sock(const struct ctx *c, sa_family_t af)
> > > +int tcp_conn_new_sock(const struct ctx *c, sa_family_t af)
> > >  {
> > >  	int s;
> > >  
> > > diff --git a/tcp_conn.h b/tcp_conn.h
> > > index c807e8b..a499f34 100644
> > > --- a/tcp_conn.h
> > > +++ b/tcp_conn.h
> > > @@ -193,6 +193,7 @@ void tcp_table_compact(struct ctx *c, union tcp_conn *hole);
> > >  void tcp_splice_destroy(struct ctx *c, union tcp_conn *conn_union);
> > >  void tcp_splice_timer(struct ctx *c, union tcp_conn *conn_union);
> > >  int tcp_conn_pool_sock(int pool[]);
> > > +int tcp_conn_new_sock(const struct ctx *c, sa_family_t af);
> > >  void tcp_sock_refill_pool(const struct ctx *c, int pool[], int af);
> > >  void tcp_splice_refill(const struct ctx *c);
> > >  
> > > diff --git a/tcp_splice.c b/tcp_splice.c
> > > index 5a88975..1cd9fdc 100644
> > > --- a/tcp_splice.c
> > > +++ b/tcp_splice.c
> > > @@ -88,6 +88,9 @@ static const char *tcp_splice_flag_str[] __attribute((__unused__)) = {
> > >  	"RCVLOWAT_ACT_B", "CLOSING",
> > >  };
> > >  
> > > +/* Forward decl */  
> > 
> > s/decl/declaration/  
> 
> Fixed.
> 
> > > +static int tcp_sock_refill_ns(void *arg);
> > > +
> > >  /**
> > >   * tcp_splice_conn_epoll_events() - epoll events masks for given state
> > >   * @events:	Connection event flags
> > > @@ -348,12 +351,8 @@ static int tcp_splice_connect_finish(const struct ctx *c,
> > >   * Return: 0 for connect() succeeded or in progress, negative value on error
> > >   */
> > >  static int tcp_splice_connect(const struct ctx *c, struct tcp_splice_conn *conn,
> > > -			      int s, in_port_t port)
> > > +			      int sock_conn, in_port_t port)
> > >  {
> > > -	int sock_conn = (s >= 0) ? s : socket(CONN_V6(conn) ? AF_INET6 :
> > > -							      AF_INET,
> > > -					      SOCK_STREAM | SOCK_NONBLOCK,
> > > -					      IPPROTO_TCP);
> > >  	struct sockaddr_in6 addr6 = {
> > >  		.sin6_family = AF_INET6,
> > >  		.sin6_port = htons(port),
> > > @@ -367,19 +366,8 @@ static int tcp_splice_connect(const struct ctx *c, struct tcp_splice_conn *conn,
> > >  	const struct sockaddr *sa;
> > >  	socklen_t sl;
> > >  
> > > -	if (sock_conn < 0)
> > > -		return -errno;
> > > -
> > > -	if (sock_conn > SOCKET_MAX) {
> > > -		close(sock_conn);
> > > -		return -EIO;
> > > -	}
> > > -
> > >  	conn->b = sock_conn;
> > >  
> > > -	if (s < 0)
> > > -		tcp_sock_set_bufsize(c, conn->b);
> > > -
> > >  	if (setsockopt(conn->b, SOL_TCP, TCP_QUICKACK,
> > >  		       &((int){ 1 }), sizeof(int))) {
> > >  		trace("TCP (spliced): failed to set TCP_QUICKACK on socket %i",
> > > @@ -410,36 +398,6 @@ static int tcp_splice_connect(const struct ctx *c, struct tcp_splice_conn *conn,
> > >  	return 0;
> > >  }
> > >  
> > > -/**
> > > - * struct tcp_splice_connect_ns_arg - Arguments for tcp_splice_connect_ns()
> > > - * @c:		Execution context
> > > - * @conn:	Accepted inbound connection
> > > - * @port:	Destination port, host order
> > > - * @ret:	Return value of tcp_splice_connect_ns()
> > > - */
> > > -struct tcp_splice_connect_ns_arg {
> > > -	const struct ctx *c;
> > > -	struct tcp_splice_conn *conn;
> > > -	in_port_t port;
> > > -	int ret;
> > > -};
> > > -
> > > -/**
> > > - * tcp_splice_connect_ns() - Enter namespace and call tcp_splice_connect()
> > > - * @arg:	See struct tcp_splice_connect_ns_arg
> > > - *
> > > - * Return: 0
> > > - */
> > > -static int tcp_splice_connect_ns(void *arg)
> > > -{
> > > -	struct tcp_splice_connect_ns_arg *a;
> > > -
> > > -	a = (struct tcp_splice_connect_ns_arg *)arg;
> > > -	ns_enter(a->c);
> > > -	a->ret = tcp_splice_connect(a->c, a->conn, -1, a->port);
> > > -	return 0;
> > > -}
> > > -
> > >  /**
> > >   * tcp_splice_new() - Handle new spliced connection
> > >   * @c:		Execution context
> > > @@ -452,24 +410,37 @@ static int tcp_splice_connect_ns(void *arg)
> > >  static int tcp_splice_new(const struct ctx *c, struct tcp_splice_conn *conn,
> > >  			  in_port_t port, int outbound)
> > >  {
> > > -	int *p, s = -1;
> > > -
> > > -	if (outbound)
> > > -		p = CONN_V6(conn) ? init_sock_pool6 : init_sock_pool4;
> > > -	else
> > > -		p = CONN_V6(conn) ? ns_sock_pool6 : ns_sock_pool4;
> > > +	int s = -1;
> > > +
> > > +	/* If the pool is empty we take slightly different approaches
> > > +	 * for init or ns sockets.  For init sockets we just open a
> > > +	 * new one without refilling the pool to keep latency down.
> > > +	 * For ns sockets, we're going to incur the latency of
> > > +	 * entering the ns anyway, so we might as well refill the
> > > +	 * pool.
> > > +	 */  
> > 
> > Well, wait, one thing is the latency of clone() and setns(), another
> > thing is the latency coming from a series of (up to) 32 socket() calls.
> > 
> > It's much lower, indeed, but somewhat comparable. Could we just have the
> > same approach for both cases?  
> 
> Well, we could.  Making both paths refill the pool is very simple, but
> might introduce more latency than we really want for the easier init
> case.
> 
> Making both paths just open a single socket is a little bit fiddlier,
> because we'll need some more boilerplate for another helper that can
> be NS_CALL()ed, instead of reusing the refill one.
> 
> What we have here was the tradeoff I settled on, admittedly without
> much quantitative idea of the latencies involved.  Happy to accept your
> call on a better one.

Thinking about this a bit further, I guess it's actually okay to have
this path refill the pool -- after all, it only happens if the pool is
already depleted, which shouldn't be a frequent condition, in practice.

-- 
Stefano



Thread overview: 13+ messages
2023-01-09  0:50 [PATCH 0/5] Cleanups to tcp socket pool handling David Gibson
2023-01-09  0:50 ` [PATCH 1/5] tcp: Make a helper to refill each socket pool David Gibson
2023-01-12 18:09   ` Stefano Brivio
2023-01-15  0:22     ` David Gibson
2023-01-09  0:50 ` [PATCH 2/5] tcp: Split init and ns cases for tcp_sock_refill() David Gibson
2023-01-09  0:50 ` [PATCH 3/5] tcp: Move socket pool declarations around David Gibson
2023-01-12 18:09   ` Stefano Brivio
2023-01-15  0:26     ` David Gibson
2023-01-09  0:50 ` [PATCH 4/5] tcp: Split pool lookup from creating new sockets in tcp_conn_new_sock() David Gibson
2023-01-09  0:50 ` [PATCH 5/5] tcp: Improve handling of fallback if socket pool is empty on new splice David Gibson
2023-01-12 18:09   ` Stefano Brivio
2023-01-15  0:31     ` David Gibson
2023-01-23 17:47       ` Stefano Brivio [this message]

Code repositories for project(s) associated with this public inbox

	https://passt.top/passt
