From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from gandalf.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by passt.top (Postfix) with ESMTPS id 85CA85A0270 for ; Thu, 17 Nov 2022 03:08:55 +0100 (CET) Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4NCNcx2Jlbz4xZh; Thu, 17 Nov 2022 13:08:49 +1100 (AEDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=201602; t=1668650929; bh=cblMN7PPA272S3QyaFGu90xZiAd9RDAsWOwVtzO7ZGc=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=OeT8SG2mmrZhTo+rvAypLM5ZErapkD+VgsCdjI4ytMTwn4R20eGD7ol8qexVlMDno AXD7TA8cdXZRRfV4w1mgXbr4sscq1/LrMPfEjeoWWdsoA6YRLXSiK9rsdSN+n8n3vC QKPX03gJUs5mOEtJyq6aKwSJzrcZdhBrMnhaKcc4= Date: Thu, 17 Nov 2022 12:32:35 +1100 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH 14/32] tcp: Separate helpers to create ns listening sockets Message-ID: References: <20221116044212.3876516-1-david@gibson.dropbear.id.au> <20221116044212.3876516-15-david@gibson.dropbear.id.au> <20221117005138.17007cb1@elisabeth> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="vrWCFPPVDjjUViFr" Content-Disposition: inline In-Reply-To: <20221117005138.17007cb1@elisabeth> Message-ID-Hash: LJGEAMIW27S2JCVVYR4OE6HROYEDBPZW X-Message-ID-Hash: LJGEAMIW27S2JCVVYR4OE6HROYEDBPZW X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.3 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --vrWCFPPVDjjUViFr Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Nov 17, 2022 at 12:51:38AM +0100, Stefano Brivio wrote: > On Wed, 16 Nov 2022 15:41:54 +1100 > David Gibson wrote: >=20 > > tcp_sock_init*() can create either sockets listening on the host, or in > > the pasta network namespace (with @ns=3D=3D1). There are, however, a n= umber > > of differences in how these two cases work in practice though. "ns" > > sockets are only used in pasta mode, and they always lead to spliced > > connections only. The functions are also only ever called in "ns" mode > > with a NULL address and interface name, and it doesn't really make sense > > for them to be called any other way. > >=20 > > Later changes will introduce further differences in behaviour between t= hese > > two cases, so it makes more sense to use separate functions for creating > > the ns listening sockets than the regular external/host listening socke= ts. > > --- > > conf.c | 6 +-- > > tcp.c | 130 ++++++++++++++++++++++++++++++++++++++------------------- > > tcp.h | 4 +- > > 3 files changed, 92 insertions(+), 48 deletions(-) > >=20 > > diff --git a/conf.c b/conf.c > > index 3ad247e..2b39d18 100644 > > --- a/conf.c > > +++ b/conf.c > > @@ -209,7 +209,7 @@ static int conf_ports(const struct ctx *c, char opt= name, const char *optarg, > > =20 > > for (i =3D 0; i < PORT_EPHEMERAL_MIN; i++) { > > if (optname =3D=3D 't') > > - tcp_sock_init(c, 0, AF_UNSPEC, NULL, NULL, i); > > + tcp_sock_init(c, AF_UNSPEC, NULL, NULL, i); > > else if (optname =3D=3D 'u') > > udp_sock_init(c, 0, AF_UNSPEC, NULL, NULL, i); > > } > > @@ -287,7 +287,7 @@ static int conf_ports(const struct ctx *c, char opt= name, const char *optarg, > > bitmap_set(fwd->map, i); > > =20 > > if (optname =3D=3D 't') > > - tcp_sock_init(c, 0, af, addr, ifname, i); > > + tcp_sock_init(c, af, addr, ifname, i); > > else if (optname =3D=3D 'u') > > udp_sock_init(c, 0, af, addr, ifname, i); > > } > > @@ -333,7 +333,7 @@ static int conf_ports(const struct ctx *c, char opt= name, const char *optarg, > > fwd->delta[i] =3D mapped_range.first - orig_range.first; > > =20 > > if (optname =3D=3D 't') > > - tcp_sock_init(c, 0, af, addr, ifname, i); > > + tcp_sock_init(c, af, addr, ifname, i); > > else if (optname =3D=3D 'u') > > udp_sock_init(c, 0, af, addr, ifname, i); > > } > > diff --git a/tcp.c b/tcp.c > > index aac70cd..72d3b49 100644 > > --- a/tcp.c > > +++ b/tcp.c > > @@ -2987,15 +2987,15 @@ void tcp_sock_handler(struct ctx *c, union epol= l_ref ref, uint32_t events, > > /** > > * tcp_sock_init4() - Initialise listening sockets for a given IPv4 po= rt > > * @c: Execution context > > - * @ns: In pasta mode, if set, bind with loopback address in namespace > > * @addr: Pointer to address for binding, NULL if not configured > > * @ifname: Name of interface to bind to, NULL if not configured > > * @port: Port, host order > > */ > > -static void tcp_sock_init4(const struct ctx *c, int ns, const struct i= n_addr *addr, > > +static void tcp_sock_init4(const struct ctx *c, const struct in_addr *= addr, > > const char *ifname, in_port_t port) > > { > > - union tcp_epoll_ref tref =3D { .tcp.listen =3D 1, .tcp.outbound =3D n= s }; > > + in_port_t idx =3D port + c->tcp.fwd_in.delta[port]; > > + union tcp_epoll_ref tref =3D { .tcp.listen =3D 1, .tcp.index =3D idx = }; >=20 > Usual order here... You mean the reverse christmas tree thing? I can't do that here, because idx is used in the next declaration. > > bool spliced =3D false, tap =3D true; > > int s; > > =20 > > @@ -3006,14 +3006,9 @@ static void tcp_sock_init4(const struct ctx *c, = int ns, const struct in_addr *ad > > if (!addr) > > addr =3D &c->ip4.addr; > > =20 > > - tap =3D !ns && !IN4_IS_ADDR_LOOPBACK(addr); > > + tap =3D !IN4_IS_ADDR_LOOPBACK(addr); > > } > > =20 > > - if (ns) > > - tref.tcp.index =3D (in_port_t)(port + c->tcp.fwd_out.delta[port]); > > - else > > - tref.tcp.index =3D (in_port_t)(port + c->tcp.fwd_in.delta[port]); > > - > > if (tap) { > > s =3D sock_l4(c, AF_INET, IPPROTO_TCP, addr, ifname, port, > > tref.u32); > > @@ -3039,29 +3034,25 @@ static void tcp_sock_init4(const struct ctx *c,= int ns, const struct in_addr *ad > > else > > s =3D -1; > > =20 > > - if (c->tcp.fwd_out.mode =3D=3D FWD_AUTO) { > > - if (ns) > > - tcp_sock_ns[port][V4] =3D s; > > - else > > - tcp_sock_init_lo[port][V4] =3D s; > > - } > > + if (c->tcp.fwd_out.mode =3D=3D FWD_AUTO) > > + tcp_sock_init_lo[port][V4] =3D s; > > } > > } > > =20 > > /** > > * tcp_sock_init6() - Initialise listening sockets for a given IPv6 po= rt > > * @c: Execution context > > - * @ns: In pasta mode, if set, bind with loopback address in namespace > > * @addr: Pointer to address for binding, NULL if not configured > > * @ifname: Name of interface to bind to, NULL if not configured > > * @port: Port, host order > > */ > > -static void tcp_sock_init6(const struct ctx *c, int ns, > > +static void tcp_sock_init6(const struct ctx *c, > > const struct in6_addr *addr, const char *ifname, > > in_port_t port) > > { > > - union tcp_epoll_ref tref =3D { .tcp.listen =3D 1, .tcp.outbound =3D n= s, > > - .tcp.v6 =3D 1 }; > > + in_port_t idx =3D port + c->tcp.fwd_in.delta[port]; > > + union tcp_epoll_ref tref =3D { .tcp.listen =3D 1, .tcp.v6 =3D 1, > > + .tcp.index =3D idx }; >=20 > Excess whitespace. Fixed. > > bool spliced =3D false, tap =3D true; > > int s; > > =20 > > @@ -3073,14 +3064,9 @@ static void tcp_sock_init6(const struct ctx *c, = int ns, > > if (!addr) > > addr =3D &c->ip6.addr; > > =20 > > - tap =3D !ns && !IN6_IS_ADDR_LOOPBACK(addr); > > + tap =3D !IN6_IS_ADDR_LOOPBACK(addr); > > } > > =20 > > - if (ns) > > - tref.tcp.index =3D (in_port_t)(port + c->tcp.fwd_out.delta[port]); > > - else > > - tref.tcp.index =3D (in_port_t)(port + c->tcp.fwd_in.delta[port]); > > - > > if (tap) { > > s =3D sock_l4(c, AF_INET6, IPPROTO_TCP, addr, ifname, port, > > tref.u32); > > @@ -3105,40 +3091,99 @@ static void tcp_sock_init6(const struct ctx *c,= int ns, > > else > > s =3D -1; > > =20 > > - if (c->tcp.fwd_out.mode =3D=3D FWD_AUTO) { > > - if (ns) > > - tcp_sock_ns[port][V6] =3D s; > > - else > > - tcp_sock_init_lo[port][V6] =3D s; > > - } > > + if (c->tcp.fwd_out.mode =3D=3D FWD_AUTO) > > + tcp_sock_init_lo[port][V6] =3D s; > > } > > } > > =20 > > /** > > * tcp_sock_init() - Initialise listening sockets for a given port >=20 > Maybe we should now indicate this is for "inbound" connections only > ("for a given, inbound, port"?) Updated. > > * @c: Execution context > > - * @ns: In pasta mode, if set, bind with loopback address in namespace > > * @af: Address family to select a specific IP version, or AF_UNSPEC > > * @addr: Pointer to address for binding, NULL if not configured > > * @ifname: Name of interface to bind to, NULL if not configured > > * @port: Port, host order > > */ > > -void tcp_sock_init(const struct ctx *c, int ns, sa_family_t af, > > - const void *addr, const char *ifname, in_port_t port) > > +void tcp_sock_init(const struct ctx *c, sa_family_t af, const void *ad= dr, > > + const char *ifname, in_port_t port) > > { > > if ((af =3D=3D AF_INET || af =3D=3D AF_UNSPEC) && c->ifi4) > > - tcp_sock_init4(c, ns, addr, ifname, port); > > + tcp_sock_init4(c, addr, ifname, port); > > if ((af =3D=3D AF_INET6 || af =3D=3D AF_UNSPEC) && c->ifi6) > > - tcp_sock_init6(c, ns, addr, ifname, port); > > + tcp_sock_init6(c, addr, ifname, port); > > +} > > + > > +/** > > + * tcp_ns_sock_init4() - Init socket to listen for outbound IPv4 conne= ctions > > + * @c: Execution context > > + * @port: Port, host order > > + */ > > +static void tcp_ns_sock_init4(const struct ctx *c, in_port_t port) > > +{ > > + in_port_t idx =3D port + c->tcp.fwd_out.delta[port]; >=20 > Move after declaration of 'loopback'. Again, I can't do that because idx is used in the next declaration. > > + union tcp_epoll_ref tref =3D { .tcp.listen =3D 1, .tcp.outbound =3D 1, > > + .tcp.splice =3D 1, .tcp.index =3D idx }; > > + struct in_addr loopback =3D { htonl(INADDR_LOOPBACK) }; > > + int s; > > + > > + assert(c->mode =3D=3D MODE_PASTA); > > + > > + s =3D sock_l4(c, AF_INET, IPPROTO_TCP, &loopback, NULL, port, tref.u3= 2); > > + if (s >=3D 0) > > + tcp_sock_set_bufsize(c, s); > > + else > > + s =3D -1; > > + > > + if (c->tcp.fwd_out.mode =3D=3D FWD_AUTO) > > + tcp_sock_ns[port][V4] =3D s; > > } > > =20 > > /** > > - * tcp_sock_init_ns() - Bind sockets in namespace for outbound connect= ions > > + * tcp_ns_sock_init6() - Init socket to listen for outbound IPv6 conne= ctions > > + * @c: Execution context > > + * @port: Port, host order > > + */ > > +static void tcp_ns_sock_init6(const struct ctx *c, in_port_t port) > > +{ > > + in_port_t idx =3D port + c->tcp.fwd_out.delta[port]; > > + union tcp_epoll_ref tref =3D { .tcp.listen =3D 1, .tcp.outbound =3D 1, > > + .tcp.splice =3D 1, .tcp.v6 =3D 1, > > + .tcp.index =3D idx}; >=20 > Missing whitespace between 'idx' and }; Fixed. > > + int s; > > + > > + assert(c->mode =3D=3D MODE_PASTA); > > + > > + s =3D sock_l4(c, AF_INET6, IPPROTO_TCP, &in6addr_loopback, NULL, port, > > + tref.u32); > > + if (s >=3D 0) > > + tcp_sock_set_bufsize(c, s); > > + else > > + s =3D -1; > > + > > + if (c->tcp.fwd_out.mode =3D=3D FWD_AUTO) > > + tcp_sock_ns[port][V6] =3D s; > > +} > > + > > +/** > > + * tcp_ns_sock_init() - Init socket to listen for spliced outbound con= nections > > + * @c: Execution context > > + * @port: Port, host order > > + */ > > +void tcp_ns_sock_init(const struct ctx *c, in_port_t port) > > +{ > > + if (c->ifi4) > > + tcp_ns_sock_init4(c, port); > > + if (c->ifi6) > > + tcp_ns_sock_init6(c, port); > > +} > > + > > +/** > > + * tcp_ns_socks_init() - Bind sockets in namespace for outbound connec= tions > > * @arg: Execution context > > * > > * Return: 0 > > */ > > -static int tcp_sock_init_ns(void *arg) > > +static int tcp_ns_socks_init(void *arg) > > { > > struct ctx *c =3D (struct ctx *)arg; > > unsigned port; > > @@ -3149,7 +3194,7 @@ static int tcp_sock_init_ns(void *arg) > > if (!bitmap_isset(c->tcp.fwd_out.map, port)) > > continue; > > =20 > > - tcp_sock_init(c, 1, AF_UNSPEC, NULL, NULL, port); > > + tcp_ns_sock_init(c, port); > > } > > =20 > > return 0; > > @@ -3279,7 +3324,7 @@ int tcp_init(struct ctx *c) > > if (c->mode =3D=3D MODE_PASTA) { > > tcp_splice_init(c); > > =20 > > - NS_CALL(tcp_sock_init_ns, c); > > + NS_CALL(tcp_ns_socks_init, c); > > =20 > > refill_arg.ns =3D 1; > > NS_CALL(tcp_sock_refill, &refill_arg); > > @@ -3364,8 +3409,7 @@ static int tcp_port_rebind(void *arg) > > =20 > > if ((a->c->ifi4 && tcp_sock_ns[port][V4] =3D=3D -1) || > > (a->c->ifi6 && tcp_sock_ns[port][V6] =3D=3D -1)) > > - tcp_sock_init(a->c, 1, AF_UNSPEC, NULL, NULL, > > - port); > > + tcp_ns_sock_init(a->c, port); > > } > > } else { > > for (port =3D 0; port < NUM_PORTS; port++) { > > @@ -3398,7 +3442,7 @@ static int tcp_port_rebind(void *arg) > > =20 > > if ((a->c->ifi4 && tcp_sock_init_ext[port][V4] =3D=3D -1) || > > (a->c->ifi6 && tcp_sock_init_ext[port][V6] =3D=3D -1)) > > - tcp_sock_init(a->c, 0, AF_UNSPEC, NULL, NULL, > > + tcp_sock_init(a->c, AF_UNSPEC, NULL, NULL, > > port); > > } > > } > > diff --git a/tcp.h b/tcp.h > > index 49738ef..f4ed298 100644 > > --- a/tcp.h > > +++ b/tcp.h > > @@ -19,8 +19,8 @@ void tcp_sock_handler(struct ctx *c, union epoll_ref = ref, uint32_t events, > > const struct timespec *now); > > int tcp_tap_handler(struct ctx *c, int af, const void *addr, > > const struct pool *p, const struct timespec *now); > > -void tcp_sock_init(const struct ctx *c, int ns, sa_family_t af, > > - const void *addr, const char *ifname, in_port_t port); > > +void tcp_sock_init(const struct ctx *c, sa_family_t af, const void *ad= dr, > > + const char *ifname, in_port_t port); > > int tcp_init(struct ctx *c); > > void tcp_timer(struct ctx *c, const struct timespec *ts); > > void tcp_defer_handler(struct ctx *c); >=20 --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --vrWCFPPVDjjUViFr Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEoULxWu4/Ws0dB+XtgypY4gEwYSIFAmN1jy0ACgkQgypY4gEw YSJkhRAApYTH48RncBhujo/SRIWxlx7VtUQuiG8lA8gfBklEy0GC9mNU3k4DDwbM +QSRy2sjslqBqWHIAj4xURw3aYBO0nLJRm/EoUwoa9ACI7eJ9JlSwHE44tIlGgTi T/2wATkS5UeecyfWW6uHdLivveZqvP+RYIwjWF82jCUCyVCK6pnFhtyCcAYGACbf qV4rYN+VBqLyZxJN8vv8CWeG+WgRQGkQ1XRFmX4Y+0hVRqZb9+eH7bjbG0PpbMce 2GCdyufOyiD5365ZfQyrwbobaxGjJJDjEGXpxJJOP4n6jV7LME9xzxvAItRNEcQp 5JvwIg9vySXVPVVI1cRwXg80AXQjsIP1T4AjSLBpIkePsuZ0cwGIQ6cwf8boAGJj 9BbEA/7x64xfPqEC0lgImRrHSO9u11sfpBwm3AlsdBnNHxNPEdx5wVwx5GhJ2wsI kOraBwypnqGmVy5WA6ntTDxoH4tXzgodsvb+F2SH0dHgXPGOgVUJuDaRZQcvnt7S 6M/H4Xd88et3poC6FO20gjCiDm1vYGKgQfJzd5gaPh1eBOxpYAJdF4BmLkWRmzmh bw5Bl0HRMALiNFAPsoq7LYqYkt4hICeuIfwmK2F01KweTeuLbxAs24DsdTYXf+93 C91xr8L0JEwuzb80wFWeowQgib5udpu2wN/tOkyhxaoEGtn2+j4= =vJrW -----END PGP SIGNATURE----- --vrWCFPPVDjjUViFr--