On Wed, Jan 17, 2024 at 08:59:14PM +0100, Stefano Brivio wrote: > On Thu, 21 Dec 2023 18:02:26 +1100 > David Gibson wrote: > > > Every flow in the flow table now has space for the the addresses as seen by > > both the host and guest side. We fill that information in for regular > > "tap" TCP connections, but not for spliced connections. > > > > Fill in that information for spliced connections too, so it's now uniformly > > available for all flow types (that are implemented so far). > > I wonder if carrying the address for spliced connections is in any way > useful -- other than being obviously useful as a simplification (which > justifies this of course). The simplification / consistency is even more important than it seems right here. One of the big aims of this (though I haven't implemented it yet) is to allow our NAT to be done generically, rather that per-protocol. That requires having the addressing information in the common structure regardless of flow type. > That is, for a spliced connection, addresses and ports are kind of > meaningless to us once the connection is established: we operate > exclusively above Layer 4. > > Also, conceptually, all that's there to represent for a spliced > connection is that addresses are loopback. > > To be clear: I'm not suggesting any change to this -- I just want to > raise the conceptual inconsistency if it didn't occur to you. Hmm.. I would say in this case it's conceptual consistency leading to redundancy of information in practice rather than a conceptual inconsistency. > > Signed-off-by: David Gibson > > --- > > tcp.c | 35 +++++++++++++---------------- > > tcp_splice.c | 62 +++++++++++++++++++++++++++++++++++++--------------- > > tcp_splice.h | 3 +-- > > 3 files changed, 60 insertions(+), 40 deletions(-) > > > > diff --git a/tcp.c b/tcp.c > > index 18ab3ac..6d77cf6 100644 > > --- a/tcp.c > > +++ b/tcp.c > > @@ -2658,32 +2658,23 @@ static void tcp_snat_inbound(const struct ctx *c, union inany_addr *addr) > > * tcp_tap_conn_from_sock() - Initialize state for non-spliced connection > > * @c: Execution context > > * @ref: epoll reference of listening socket > > - * @conn: connection structure to initialize > > + * @conn: connection structure (with TAPFSIDE(@conn) completed) > > * @s: Accepted socket > > - * @sa: Peer socket address (from accept()) > > * @now: Current timestamp > > - * > > - * Return: true if able to create a tap connection, false otherwise > > */ > > -static bool tcp_tap_conn_from_sock(struct ctx *c, > > +static void tcp_tap_conn_from_sock(struct ctx *c, > > union tcp_listen_epoll_ref ref, > > struct tcp_tap_conn *conn, int s, > > - const struct sockaddr *sa, > > const struct timespec *now) > > { > > + ASSERT(flowside_complete(SOCKFSIDE(conn))); > > + > > conn->f.type = FLOW_TCP; > > conn->sock = s; > > conn->timer = -1; > > conn->ws_to_tap = conn->ws_from_tap = 0; > > conn_event(c, conn, SOCK_ACCEPTED); > > > > - if (flowside_from_sock(SOCKFSIDE(conn), PIF_HOST, s, NULL, sa) < 0) { > > - err("tcp: Failed to get local name, connection dropped"); > > - return false; > > - } > > - > > - ASSERT(flowside_complete(SOCKFSIDE(conn))); > > - > > TAPFSIDE(conn)->pif = PIF_TAP; > > TAPFSIDE(conn)->faddr = SOCKFSIDE(conn)->eaddr; > > TAPFSIDE(conn)->fport = SOCKFSIDE(conn)->eport; > > @@ -2712,8 +2703,6 @@ static bool tcp_tap_conn_from_sock(struct ctx *c, > > conn_flag(c, conn, ACK_FROM_TAP_DUE); > > > > tcp_get_sndbuf(conn); > > - > > - return true; > > } > > > > /** > > @@ -2737,15 +2726,21 @@ void tcp_listen_handler(struct ctx *c, union epoll_ref ref, > > if (s < 0) > > goto cancel; > > > > - if (c->mode == MODE_PASTA && > > - tcp_splice_conn_from_sock(c, ref.tcp_listen, &flow->tcp_splice, > > - s, (struct sockaddr *)&sa)) > > + if (flowside_from_sock(&flow->f.side[0], ref.tcp_listen.pif, s, > > + NULL, &sa) < 0) { > > + err("tcp: Failed to get local name, connection dropped"); > > + close(s); > > + flow_alloc_cancel(flow); > > return; > > + } > > > > - if (tcp_tap_conn_from_sock(c, ref.tcp_listen, &flow->tcp, s, > > - (struct sockaddr *)&sa, now)) > > + if (c->mode == MODE_PASTA && > > + tcp_splice_conn_from_sock(c, ref.tcp_listen, &flow->tcp_splice, s)) > > return; > > > > + tcp_tap_conn_from_sock(c, ref.tcp_listen, &flow->tcp, s, now); > > + return; > > + > > cancel: > > /* Failed to create the connection */ > > if (s >= 0) > > diff --git a/tcp_splice.c b/tcp_splice.c > > index eec02fe..0faeb1b 100644 > > --- a/tcp_splice.c > > +++ b/tcp_splice.c > > @@ -72,6 +72,9 @@ static int ns_sock_pool6 [TCP_SOCK_POOL_SIZE]; > > /* Pool of pre-opened pipes */ > > static int splice_pipe_pool [TCP_SPLICE_PIPE_POOL_SIZE][2]; > > > > +#define FSIDE0(conn) (&(conn)->f.side[0]) > > +#define FSIDE1(conn) (&(conn)->f.side[1]) > > + > > #define CONN_V6(x) (x->flags & SPLICE_V6) > > #define CONN_V4(x) (!CONN_V6(x)) > > #define CONN_HAS(conn, set) ((conn->events & (set)) == (set)) > > @@ -280,9 +283,21 @@ bool tcp_splice_flow_defer(union flow *flow) > > static int tcp_splice_connect_finish(const struct ctx *c, > > struct tcp_splice_conn *conn) > > { > > + struct sockaddr_storage sa; > > + socklen_t sl = sizeof(sa); > > unsigned side; > > int i = 0; > > > > + if (getsockname(conn->s[1], (struct sockaddr *)&sa, &sl) < 0) { > > + int ret = -errno; > > + conn_flag(c, conn, CLOSING); > > + return ret; > > + } > > + inany_from_sockaddr(&FSIDE1(conn)->faddr, &FSIDE1(conn)->fport, > > + (struct sockaddr *)&sa); > > + > > + ASSERT(flowside_complete(FSIDE1(conn))); > > + > > for (side = 0; side < SIDES; side++) { > > conn->pipe[side][0] = conn->pipe[side][1] = -1; > > > > @@ -352,13 +367,24 @@ static int tcp_splice_connect(const struct ctx *c, struct tcp_splice_conn *conn, > > conn->s[1]); > > } > > > > + /* It would be nicer if we could initialise FSIDE1 all at once with > > + * flowaddrs_from_af() or flowaddrs_from_sock(). However, we can't get > > + * the forwarding port until the connect() has finished and we don't > > + * want to block to wait for it. Meanwhile we have the endpoint address > > [...] endpoint address and port [...]. Or, if "address" includes the > port too, then the comment should also say "forwarding address", not > "forwarding port". > > It's confusing otherwise: why is there anything special with the > endpoint *address* as opposed to the forwarding *port*? > > > + * here, but don't have a place to stash it other than in the flowaddrs > > + * itself. So, initialisation of FSIDE1 is split between here and > > + * tcp_splice_connect_finish(). Ugly but necessary. > > + */ > > if (CONN_V6(conn)) { > > sa = (struct sockaddr *)&addr6; > > sl = sizeof(addr6); > > + inany_from_af(&FSIDE1(conn)->eaddr, AF_INET6, &addr6.sin6_addr); > > } else { > > sa = (struct sockaddr *)&addr4; > > sl = sizeof(addr4); > > + inany_from_af(&FSIDE1(conn)->eaddr, AF_INET, &addr4.sin_addr); > > } > > + FSIDE1(conn)->eport = port; > -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson