From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from gandalf.ozlabs.org (gandalf.ozlabs.org [150.107.74.76]) by passt.top (Postfix) with ESMTPS id D84AA5A026D for ; Thu, 7 Sep 2023 06:14:14 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=201602; t=1694060051; bh=FgijdifJdJfTCLXgyjWpxZA6/MJO935mxnYSXl1fmUM=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=QPRw8Hmcxdzq48a1Ur/mjX79EFsc6CVZlttPcqacZh4ahJpPxVP9xRapnSESY08pu C1jUhDay4jioapeKZQb/hL8jTvWr/xqVer3mnHGaXmOVLBmQ14d0jCWb4aVgYXMA48 H17bH8Ff2lyzsmgpnCN29qaReZyn6FGKa+TuRgII= Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4Rh5Tv3X5Zz4xFD; Thu, 7 Sep 2023 14:14:11 +1000 (AEST) Date: Thu, 7 Sep 2023 14:14:04 +1000 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH v2 10/10] tcp_splice: Fill out flowside information for spliced connections Message-ID: References: <20230828054146.48673-1-david@gibson.dropbear.id.au> <20230828054146.48673-11-david@gibson.dropbear.id.au> <20230907030253.7ec8c24d@elisabeth> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="3nssH11EBbiURYBE" Content-Disposition: inline In-Reply-To: <20230907030253.7ec8c24d@elisabeth> Message-ID-Hash: U5YDIGPZC3E23EQUE4VIQCXCM5F4VFYU X-Message-ID-Hash: U5YDIGPZC3E23EQUE4VIQCXCM5F4VFYU X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --3nssH11EBbiURYBE Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Sep 07, 2023 at 03:02:53AM +0200, Stefano Brivio wrote: > On Mon, 28 Aug 2023 15:41:46 +1000 > David Gibson wrote: >=20 > > Every flow in the flow table now has space for the the addresses as see= n by > > both the host and guest side. We fill that information in for regular > > "tap" TCP connections, but not for spliced connections. > >=20 > > Fill in that information for spliced connections too, so it's now unifo= rmly > > available for all flow types (that we've implemented so far). > >=20 > > Signed-off-by: David Gibson > > --- > > tcp.c | 46 +++++++++++++++++++--------------------------- > > tcp_splice.c | 40 ++++++++++++++++++++++++++-------------- > > tcp_splice.h | 3 +-- > > 3 files changed, 46 insertions(+), 43 deletions(-) > >=20 > > diff --git a/tcp.c b/tcp.c > > index 297134f..7459fc2 100644 > > --- a/tcp.c > > +++ b/tcp.c > > @@ -2639,37 +2639,25 @@ static void tcp_snat_inbound(const struct ctx *= c, union inany_addr *addr) > > * tcp_tap_conn_from_sock() - Initialize state for non-spliced connect= ion > > * @c: Execution context > > * @ref: epoll reference of listening socket > > - * @conn: connection structure to initialize > > + * @conn: connection structure (with TAPSIDE(@conn) completed) > > * @s: Accepted socket > > - * @sa: Peer socket address (from accept()) > > * @now: Current timestamp > > - * > > - * Return: true if able to create a tap connection, false otherwise > > */ > > -static bool tcp_tap_conn_from_sock(struct ctx *c, > > +static void tcp_tap_conn_from_sock(struct ctx *c, > > union tcp_listen_epoll_ref ref, > > struct tcp_tap_conn *conn, int s, > > - struct sockaddr *sa, > > const struct timespec *now) > > { > > char fsstr[FLOWSIDE_STRLEN]; > > =20 > > + ASSERT(flowside_complete(SOCKSIDE(conn))); > > + > > conn->f.type =3D FLOW_TCP; > > conn->sock =3D s; > > conn->timer =3D -1; > > conn->ws_to_tap =3D conn->ws_from_tap =3D 0; > > conn_event(c, conn, SOCK_ACCEPTED); > > =20 > > - if (flowside_getsockname(SOCKSIDE(conn), s) < 0) { > > - err("tcp: Failed to get local name, connection dropped"); > > - return false; > > - } > > - inany_from_sockaddr(&SOCKSIDE(conn)->eaddr, &SOCKSIDE(conn)->eport, s= a); > > - > > - ASSERT(flowside_complete(SOCKSIDE(conn))); > > - debug("TCP: index %li, new connection from socket, %s", FLOW_IDX(conn= ), > > - flowside_fmt(SOCKSIDE(conn), fsstr, sizeof(fsstr))); > > - > > TAPSIDE(conn)->faddr =3D SOCKSIDE(conn)->eaddr; > > TAPSIDE(conn)->fport =3D SOCKSIDE(conn)->eport; > > tcp_snat_inbound(c, &TAPSIDE(conn)->faddr); > > @@ -2699,8 +2687,6 @@ static bool tcp_tap_conn_from_sock(struct ctx *c, > > conn_flag(c, conn, ACK_FROM_TAP_DUE); > > =20 > > tcp_get_sndbuf(conn); > > - > > - return true; > > } > > =20 > > /** > > @@ -2712,6 +2698,7 @@ static bool tcp_tap_conn_from_sock(struct ctx *c, > > void tcp_listen_handler(struct ctx *c, union epoll_ref ref, > > const struct timespec *now) > > { > > + char fsstr[FLOWSIDE_STRLEN]; > > struct sockaddr_storage sa; > > union flow *flow; > > socklen_t sl; > > @@ -2730,20 +2717,25 @@ void tcp_listen_handler(struct ctx *c, union ep= oll_ref ref, > > if (s < 0) > > return; > > =20 > > - flow =3D flowtab + c->flow_count++; > > + flow =3D flowtab + c->flow_count; > > =20 > > - if (c->mode =3D=3D MODE_PASTA && > > - tcp_splice_conn_from_sock(c, ref.tcp_listen, &flow->tcp_splice, > > - s, (struct sockaddr *)&sa)) > > + if (flowside_getsockname(&flow->f.side[0], s) < 0) { > > + err("tcp: Failed to get local name, connection dropped"); > > + close(s); > > return; > > + } > > + inany_from_sockaddr(&flow->f.side[0].eaddr, &flow->f.side[0].eport, > > + &sa); > > + c->flow_count++; > > =20 > > - if (tcp_tap_conn_from_sock(c, ref.tcp_listen, &flow->tcp, s, > > - (struct sockaddr *)&sa, now)) > > + debug("TCP: index %li, new connection from socket, %s", FLOW_IDX(flow= ), > > + flowside_fmt(&flow->f.side[0], fsstr, sizeof(fsstr))); > > + > > + if (c->mode =3D=3D MODE_PASTA && > > + tcp_splice_conn_from_sock(c, ref.tcp_listen, &flow->tcp_splice, s= )) > > return; > > =20 > > - /* Failed to create the connection */ > > - close(s); > > - c->flow_count--; > > + tcp_tap_conn_from_sock(c, ref.tcp_listen, &flow->tcp, s, now); > > } > > =20 > > /** > > diff --git a/tcp_splice.c b/tcp_splice.c > > index 676e7e8..018d095 100644 > > --- a/tcp_splice.c > > +++ b/tcp_splice.c > > @@ -73,6 +73,9 @@ static int ns_sock_pool6 [TCP_SOCK_POOL_SIZE]; > > /* Pool of pre-opened pipes */ > > static int splice_pipe_pool [TCP_SPLICE_PIPE_POOL_SIZE][2][2]; > > =20 > > +#define ASIDE(conn) (&(conn)->f.side[0]) > > +#define BSIDE(conn) (&(conn)->f.side[1]) > > + > > #define CONN_V6(x) (x->flags & SPLICE_V6) > > #define CONN_V4(x) (!CONN_V6(x)) > > #define CONN_HAS(conn, set) ((conn->events & (set)) =3D=3D (set)) > > @@ -310,7 +313,16 @@ void tcp_splice_destroy(struct ctx *c, union flow = *flow) > > static int tcp_splice_connect_finish(const struct ctx *c, > > struct tcp_splice_conn *conn) > > { > > - int i; > > + char fsstr[FLOWSIDE_STRLEN]; > > + int i, rc; > > + > > + rc =3D flowside_getsockname(BSIDE(conn), conn->b); > > + if (rc) > > + return rc; > > + > > + ASSERT(flowside_complete(BSIDE(conn))); > > + debug("TCP (splice): index %li, connection forwarded, %s", FLOW_IDX(c= onn), > > + flowside_fmt(BSIDE(conn), fsstr, sizeof(fsstr))); > > =20 > > conn->pipe_a_b[0] =3D conn->pipe_b_a[0] =3D -1; > > conn->pipe_a_b[1] =3D conn->pipe_b_a[1] =3D -1; > > @@ -386,10 +398,13 @@ static int tcp_splice_connect(const struct ctx *c= , struct tcp_splice_conn *conn, > > if (CONN_V6(conn)) { > > sa =3D (struct sockaddr *)&addr6; > > sl =3D sizeof(addr6); > > + inany_from_af(&BSIDE(conn)->eaddr, AF_INET6, &addr6.sin6_addr); > > } else { > > sa =3D (struct sockaddr *)&addr4; > > sl =3D sizeof(addr4); > > + inany_from_af(&BSIDE(conn)->eaddr, AF_INET, &addr4.sin_addr); > > } > > + BSIDE(conn)->eport =3D port; > > =20 > > if (connect(conn->b, sa, sl)) { > > if (errno !=3D EINPROGRESS) { > > @@ -480,33 +495,30 @@ static void tcp_splice_dir(struct tcp_splice_conn= *conn, int ref_sock, > > * tcp_splice_conn_from_sock() - Attempt to init state for a spliced c= onnection > > * @c: Execution context > > * @ref: epoll reference of listening socket > > - * @conn: connection structure to initialize > > + * @conn: connection structure (with ASIDE(@conn) completed) > > * @s: Accepted socket > > - * @sa: Peer address of connection > > * > > * Return: true if able to create a spliced connection, false otherwise > > * #syscalls:pasta setsockopt > > */ > > bool tcp_splice_conn_from_sock(struct ctx *c, union tcp_listen_epoll_r= ef ref, > > - struct tcp_splice_conn *conn, int s, > > - const struct sockaddr *sa) > > + struct tcp_splice_conn *conn, int s) > > { > > - const struct in_addr *a4; > > - union inany_addr aany; > > - in_port_t port; > > + const struct in_addr *e4 =3D inany_v4(&ASIDE(conn)->eaddr); > > + const struct in_addr *f4 =3D inany_v4(&ASIDE(conn)->faddr); > > =20 > > ASSERT(c->mode =3D=3D MODE_PASTA); > > + ASSERT(flowside_complete(ASIDE(conn))); > > =20 > > - inany_from_sockaddr(&aany, &port, sa); > > - a4 =3D inany_v4(&aany); > > - > > - if (a4) { > > - if (!IN4_IS_ADDR_LOOPBACK(a4)) > > + if (e4) { > > + if (!IN4_IS_ADDR_LOOPBACK(e4)) > > return false; > > + ASSERT(f4 && IN4_IS_ADDR_LOOPBACK(f4)); >=20 > I can't follow this: the test you're replacing is actually (still) a > test used by tcp_listen_handler() unless I'm missing something. I'm not replacing a test - I'm just rewriting the same test, then adding an assert. > Returning false here should simply mean we can't use a spliced > connection, not that something is wrong. Yes. The 'if' checks if this is a spliceable connection - we have a loopback endpoint (remote to pasta) address. The assert is then checking that the forwarding address (local to pasta) is also loopback - i.e. that we don't have traffic that's going from 127.0.0.1 to a non-loopback address which would be nonsense. I guess this does indicate that the kernel has given us something weird, rather than an internal bug, so maybe it should be log an error and drop the connection rather than asserting. >=20 > > conn->flags =3D 0; > > } else { > > - if (!IN6_IS_ADDR_LOOPBACK(&aany.a6)) > > + if (!IN6_IS_ADDR_LOOPBACK(&ASIDE(conn)->eaddr.a6)) > > return false; > > + ASSERT(IN6_IS_ADDR_LOOPBACK(&ASIDE(conn)->faddr.a6)); >=20 > ...same here. >=20 > Everything else in the series looks good to me! It looks simpler (so > far) than I thought it would be. >=20 --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --3nssH11EBbiURYBE Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmT5TgYACgkQzQJF27ox 2Gfrdg/8C8JygBjaBCxKuxtaraiwlst+6ujwuG21mAs/mif+fVa5JnYEf3eNB7I2 EUudpxuOUgCN2DCNjC5pGHNkdoagEtD8qpM2kveIojf7E+oO+vSTsPs+G10VFeYn DLvyOfTy3fWAoOAbbeXpG2UEKHGbCdmWAbrruiuP65JKNIiAQjSlacZ2EFsVAdnA C7tSFBPfI4vUSQxm7R9mmIdeI2yC758hcOY4MHhDWbbfCHry6aFiZO8udWhpfQkg 4UxpNKt3opdOn/gJlkKZyzlMN5t8RFg7iDnJWtD+k8JnPbfDaQsgmrIuVH77UIG4 FQLJTIhOvUxg1x7rzi5EFTsup5GJfZ1vveu36591EHTTafYfdOdNtt2Lj6lLfVeu uOHHbPAkxInZM4JXH6kfLbmSe9XKTNtKjEsGPf7I2hxGbvIBigzLXqbST4ANOuXO aDzhaaweoV9dhCRZl4yfoeVugmr8tlcIpmYFgCEROnst5/WBACGNbCpBAXx0Ubs/ ca4C8l8tHF43vUTjqzYGMB47N+02wPG3mZT7egfmfkz1z+IObIC6SZmmNdNvWain 5ElV0pePIunnUtZiQ4RaldMvNbZ9pO8jdZvPMC+9uB7Z1xS2LoV1nIpUKgP1aeCV owbmh6H8JhhCa50wyGTvBXgpcXz4FO+fjQ3+c8bQ3vbY+P2OIGM= =xic0 -----END PGP SIGNATURE----- --3nssH11EBbiURYBE--