From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from gandalf.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by passt.top (Postfix) with ESMTPS id 9AD9B5A026D for ; Sat, 4 Nov 2023 03:03:45 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=201602; t=1699063421; bh=XpuBhZ75XyGSmxdZmBXz6b0CPPd4I9RKnwPbCU3UeeE=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=dyhLFdqq9mdcM5Tv271whWXqTxMyu7H7T0XT+AhNHYfDz8joK8ICvI3gV5dznh5iP acG/8lYzGdggJqleSw0MQz/WJI46pCcu7eRqzYWwrdzCL9t03RaRQmu79pq4Ti11Gr BBVy0jHgupoVNeeVFnA6I5mHXEqdcOjkCLItgPN8= Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4SMgrY65ncz4xWC; Sat, 4 Nov 2023 13:03:41 +1100 (AEDT) Date: Sat, 4 Nov 2023 13:03:24 +1100 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH 3/4] pif: Record originating pif in listening socket refs Message-ID: References: <20231009083013.2837178-1-david@gibson.dropbear.id.au> <20231009083013.2837178-4-david@gibson.dropbear.id.au> <20231103104736.28a4913b@elisabeth> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="R1VpQxA1yi0IGvLS" Content-Disposition: inline In-Reply-To: <20231103104736.28a4913b@elisabeth> Message-ID-Hash: 45M6RLZJ4K26VOXKU5H4ZIPFG6XPX6R3 X-Message-ID-Hash: 45M6RLZJ4K26VOXKU5H4ZIPFG6XPX6R3 X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --R1VpQxA1yi0IGvLS Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Nov 03, 2023 at 10:47:36AM +0100, Stefano Brivio wrote: > On Mon, 9 Oct 2023 19:30:12 +1100 > David Gibson wrote: >=20 > > For certain socket types, we record in the epoll ref whether they're > > sockets in the namespace, or on the host. We now have the notion of "p= if" > > to indicate what "place" a socket is associated with, so generalise the > > simple one-bit 'ns' to a pif id. > >=20 > > Signed-off-by: David Gibson > > --- > > passt.h | 1 + > > tcp.c | 5 +++-- > > tcp.h | 4 ++-- > > tcp_splice.c | 10 ++++++---- > > udp.c | 23 ++++++++++++----------- > > udp.h | 8 ++++---- > > 6 files changed, 28 insertions(+), 23 deletions(-) > >=20 > > diff --git a/passt.h b/passt.h > > index 53defa4..cac720a 100644 > > --- a/passt.h > > +++ b/passt.h > > @@ -35,6 +35,7 @@ union epoll_ref; > > #include > > #include > > =20 > > +#include "pif.h" > > #include "packet.h" > > #include "icmp.h" > > #include "port_fwd.h" > > diff --git a/tcp.c b/tcp.c > > index c13b7de..bad8c38 100644 > > --- a/tcp.c > > +++ b/tcp.c > > @@ -2964,6 +2964,7 @@ static int tcp_sock_init_af(const struct ctx *c, = int af, in_port_t port, > > { > > union tcp_listen_epoll_ref tref =3D { > > .port =3D port + c->tcp.fwd_in.delta[port], > > + .pif =3D PIF_HOST, > > }; > > int s; > > =20 > > @@ -3025,7 +3026,7 @@ static void tcp_ns_sock_init4(const struct ctx *c= , in_port_t port) > > { > > union tcp_listen_epoll_ref tref =3D { > > .port =3D port + c->tcp.fwd_out.delta[port], > > - .ns =3D true, > > + .pif =3D PIF_SPLICE, > > }; > > struct in_addr loopback =3D { htonl(INADDR_LOOPBACK) }; > > int s; > > @@ -3051,7 +3052,7 @@ static void tcp_ns_sock_init6(const struct ctx *c= , in_port_t port) > > { > > union tcp_listen_epoll_ref tref =3D { > > .port =3D port + c->tcp.fwd_out.delta[port], > > - .ns =3D true, > > + .pif =3D PIF_SPLICE, > > }; > > int s; > > =20 > > diff --git a/tcp.h b/tcp.h > > index 6444d6a..3f6937c 100644 > > --- a/tcp.h > > +++ b/tcp.h > > @@ -41,13 +41,13 @@ union tcp_epoll_ref { > > /** > > * union tcp_listen_epoll_ref - epoll reference portion for TCP listen= ing > > * @port: Port number we're forwarding *to* (listening port plus delta) > > - * @ns: True if listening within the pasta namespace > > + * @pif: Pif in which the socket is listening > > * @u32: Opaque u32 value of reference > > */ > > union tcp_listen_epoll_ref { > > struct { > > in_port_t port; > > - bool ns; > > + pif_id pif; > > }; > > uint32_t u32; > > }; > > diff --git a/tcp_splice.c b/tcp_splice.c > > index 54fc317..19f5406 100644 > > --- a/tcp_splice.c > > +++ b/tcp_splice.c > > @@ -411,12 +411,12 @@ static int tcp_splice_connect(const struct ctx *c= , struct tcp_splice_conn *conn, > > * @c: Execution context > > * @conn: Connection pointer > > * @port: Destination port, host order > > - * @outbound: Connection request coming from namespace > > + * @pif: Originating pif of the splice > > * > > * Return: return code from connect() > > */ > > static int tcp_splice_new(const struct ctx *c, struct tcp_splice_conn = *conn, > > - in_port_t port, int outbound) > > + in_port_t port, pif_id pif) > > { > > int s =3D -1; > > =20 > > @@ -427,7 +427,7 @@ static int tcp_splice_new(const struct ctx *c, stru= ct tcp_splice_conn *conn, > > * entering the ns anyway, so we might as well refill the > > * pool. > > */ > > - if (outbound) { > > + if (pif =3D=3D PIF_SPLICE) { > > int *p =3D CONN_V6(conn) ? init_sock_pool6 : init_sock_pool4; > > int af =3D CONN_V6(conn) ? AF_INET6 : AF_INET; > > =20 > > @@ -437,6 +437,8 @@ static int tcp_splice_new(const struct ctx *c, stru= ct tcp_splice_conn *conn, > > } else { > > int *p =3D CONN_V6(conn) ? ns_sock_pool6 : ns_sock_pool4; > > =20 > > + ASSERT(pif =3D=3D PIF_HOST); > > + > > /* If pool is empty, refill it first */ > > if (p[TCP_SOCK_POOL_SIZE-1] < 0) > > NS_CALL(tcp_sock_refill_ns, c); > > @@ -516,7 +518,7 @@ bool tcp_splice_conn_from_sock(const struct ctx *c, > > conn->c.spliced =3D true; > > conn->a =3D s; > > =20 > > - if (tcp_splice_new(c, conn, ref.port, ref.ns)) > > + if (tcp_splice_new(c, conn, ref.port, ref.pif)) > > conn_flag(c, conn, CLOSING); > > =20 > > return true; > > diff --git a/udp.c b/udp.c > > index db35859..ce02979 100644 > > --- a/udp.c > > +++ b/udp.c > > @@ -365,7 +365,7 @@ static void udp_sock6_iov_init(const struct ctx *c) > > * @c: Execution context > > * @v6: Set for IPv6 sockets > > * @src: Source port of original connection, host order > > - * @splice: UDP_BACK_TO_INIT from init, UDP_BACK_TO_NS from namespace > > + * @ns: Does the splice originate in the ns or not > > * > > * Return: prepared socket, negative error code on failure > > * > > @@ -375,16 +375,17 @@ int udp_splice_new(const struct ctx *c, int v6, i= n_port_t src, bool ns) > > { > > struct epoll_event ev =3D { .events =3D EPOLLIN | EPOLLRDHUP | EPOLLH= UP }; > > union epoll_ref ref =3D { .type =3D EPOLL_TYPE_UDP, > > - .udp =3D { .splice =3D true, .ns =3D ns, > > - .v6 =3D v6, .port =3D src } > > + .udp =3D { .splice =3D true, .v6 =3D v6, .port =3D src } > > }; > > struct udp_splice_port *sp; > > int act, s; > > =20 > > if (ns) { > > + ref.udp.pif =3D PIF_SPLICE; > > sp =3D &udp_splice_ns[v6 ? V6 : V4][src]; > > act =3D UDP_ACT_SPLICE_NS; > > } else { > > + ref.udp.pif =3D PIF_HOST; > > sp =3D &udp_splice_init[v6 ? V6 : V4][src]; > > act =3D UDP_ACT_SPLICE_INIT; > > } > > @@ -495,15 +496,15 @@ static int udp_mmh_splice_port(bool v6, const str= uct mmsghdr *mmh) > > * @n: Number of datagrams to send > > * @src: Datagrams will be sent from this port (on origin side) > > * @dst: Datagrams will be send to this port (on destination side) > > + * @from_pif: Pif from which the packet originated >=20 > "pif" isn't a word and not a proper acronym either... should we keep > this simpler and always spell it lowercase (we would do the same with, > say, "uint8_t") even at the beginning of a sentence? Not a strong > preference. Makes sense to me, changed throughout. > > * @v6: Send as IPv6? > > - * @from_ns: If true send from pasta ns to init, otherwise reverse > > * @allow_new: If true create sending socket if needed, if false disca= rd > > * if no sending socket is available > > * @now: Timestamp > > */ > > static void udp_splice_sendfrom(const struct ctx *c, unsigned start, u= nsigned n, > > - in_port_t src, in_port_t dst, > > - bool v6, bool from_ns, bool allow_new, > > + in_port_t src, in_port_t dst, pif_id from_pif, > > + bool v6, bool allow_new, > > const struct timespec *now) > > { > > struct mmsghdr *mmh_recv, *mmh_send; > > @@ -518,7 +519,7 @@ static void udp_splice_sendfrom(const struct ctx *c= , unsigned start, unsigned n, > > mmh_send =3D udp4_mh_splice; > > } > > =20 > > - if (from_ns) { > > + if (from_pif =3D=3D PIF_SPLICE) { > > src +=3D c->udp.fwd_in.rdelta[src]; > > s =3D udp_splice_init[v6][src].sock; > > if (!s && allow_new) > > @@ -530,6 +531,7 @@ static void udp_splice_sendfrom(const struct ctx *c= , unsigned start, unsigned n, > > udp_splice_ns[v6][dst].ts =3D now->tv_sec; > > udp_splice_init[v6][src].ts =3D now->tv_sec; > > } else { > > + ASSERT(from_pif =3D=3D PIF_HOST); > > src +=3D c->udp.fwd_out.rdelta[src]; > > s =3D udp_splice_ns[v6][src].sock; > > if (!s && allow_new) { > > @@ -776,7 +778,7 @@ void udp_sock_handler(const struct ctx *c, union ep= oll_ref ref, uint32_t events, > > =20 > > if (splicefrom >=3D 0) > > udp_splice_sendfrom(c, i, m, splicefrom, dstport, > > - v6, ref.udp.ns, ref.udp.orig, now); > > + ref.udp.pif, v6, ref.udp.orig, now); > > else > > udp_tap_send(c, i, m, dstport, v6, now); > > } > > @@ -974,8 +976,10 @@ int udp_sock_init(const struct ctx *c, int ns, sa_= family_t af, > > int s, r4 =3D FD_REF_MAX + 1, r6 =3D FD_REF_MAX + 1; > > =20 > > if (ns) { > > + uref.pif =3D PIF_SPLICE; > > uref.port =3D (in_port_t)(port + c->udp.fwd_out.f.delta[port]); > > } else { > > + uref.pif =3D PIF_HOST; > > uref.port =3D (in_port_t)(port + c->udp.fwd_in.f.delta[port]); > > } > > =20 > > @@ -990,7 +994,6 @@ int udp_sock_init(const struct ctx *c, int ns, sa_f= amily_t af, > > udp_splice_init[V4][port].sock =3D s < 0 ? -1 : s; > > } else { > > struct in_addr loopback =3D { htonl(INADDR_LOOPBACK) }; > > - uref.ns =3D true; > > =20 > > r4 =3D s =3D sock_l4(c, AF_INET, IPPROTO_UDP, &loopback, > > ifname, port, uref.u32); > > @@ -1008,8 +1011,6 @@ int udp_sock_init(const struct ctx *c, int ns, sa= _family_t af, > > udp_tap_map[V6][uref.port].sock =3D s < 0 ? -1 : s; > > udp_splice_init[V6][port].sock =3D s < 0 ? -1 : s; > > } else { > > - uref.ns =3D true; > > - > > r6 =3D s =3D sock_l4(c, AF_INET6, IPPROTO_UDP, > > &in6addr_loopback, > > ifname, port, uref.u32); > > diff --git a/udp.h b/udp.h > > index 7837134..122a3d9 100644 > > --- a/udp.h > > +++ b/udp.h > > @@ -20,21 +20,21 @@ void udp_update_l2_buf(const unsigned char *eth_d, = const unsigned char *eth_s); > > =20 > > /** > > * union udp_epoll_ref - epoll reference portion for TCP connections > > + * @port: Source port for connected sockets, bound port otherwise > > + * @pif: Pif for this socket > > * @bound: Set if this file descriptor is a bound socket > > * @splice: Set if descriptor packets to be "spliced" > > * @orig: Set if a spliced socket which can originate "connections" > > - * @ns: Set if this is a socket in the pasta network namespace > > * @v6: Set for IPv6 sockets or connections > > - * @port: Source port for connected sockets, bound port otherwise > > * @u32: Opaque u32 value of reference > > */ > > union udp_epoll_ref { > > struct { > > + in_port_t port; > > + pif_id pif; > > bool splice:1, > > orig:1, > > - ns:1, > > v6:1; > > - uint32_t port:16; >=20 > This is rather a comment to 2/4, but its application is more obvious > here: having 'uint8_t pif' would show clearly where it needs to be > placed in these types of struct, instead of hiding its type behind the > typedef. Oof. I have mixed thoughts about this. In this instance, you're right, that the typedef obscures what's going in here, in a packed bitfield where it's pretty relevant. In other places the type makes it clearer what's valid information in that slot. So I'm not really sure which way to go. > The rest of the series looks good to me. >=20 --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --R1VpQxA1yi0IGvLS Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmVFploACgkQzQJF27ox 2GduCQ/+MU9noaqCXUIj/OAetM69cdjDw0s6HSTwxOYNrfffvBXZosZo4zm+Frjd GkEvSvtVFHjFEeaSmQHBFTs/6YH3y8HNl6X0zjV7W2ZBPD27iHBarIO1ouznlAOx l1DvgiMlHzRFY+Z36pNDhBkpbHDdnyDe8awGuqm9fPPv7Lxtjzh4Q+pNT6VOadOc nB1fzfEliMqBp3zEtH2Hc+JTvasNe/WguSZYkl5lkK2AJ0Dg+cZekppIqmNUNhK9 w1g/NSvT5eWPjxcITQAqUyoBPeDrdKwzYI+sPEWzJokswpFUUxhoG25uypajVvlw QAWpNKH1Fn129KMFEAomP809EzUQOjz0TUqcltNN+WbfgphAqLB1q0Rhr4EPC/O7 6hOXE4ZeG1rko/9a2BrZaDYrMThVH2wf0QvKUeAXQU9S+UGpfuQOun82bTikUTsS LvMCkk30Uad9/lCU+lZDvypCPfg0CdREzd9ignCHJRhcu628kEjWUfTvlvrkxH5P DDnxv8mqzd5Q26+v3XHGeEoYX90aE897JHt8+9xZtVUdO9M05oqd4d81zLB30d/8 +e96CihCdCySM6xPFfjgX6RCxMNVkqCY29Us42DVrLSkJPszviNAN1zBxJ2SSAcc 69/oHahPwNv2NAC1poUeEr8HGyjFlisIiZzg8xPFRVwAyQw977U= =pDzJ -----END PGP SIGNATURE----- --R1VpQxA1yi0IGvLS--