From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by passt.top (Postfix) with ESMTP id 0CEBD5A005E for ; Tue, 11 Oct 2022 09:43:13 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1665474192; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=z54yrTka6Js2jt9EqB/Sv7Xf2hGOycbh4Q8Gcp1tlfQ=; b=Bbadku+a5zeHAIxjfXrPyPY3JTG+3ryDXzgDPkn++V8V0pVyZzvoiH5jLpSHgNueR3PbDx lZ4rpC7SbCnTDIz3xt0GF6w4tLfSe6k0H+0gJotSpLSyZFhOO6E5eZrWQySvqBV1Zss3Kp XBc1u4zKW9gDRgTNf51Qnr8MacBqR9U= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-442-YpDlgy2BP5K9LL8KwBfm7A-1; Tue, 11 Oct 2022 03:43:09 -0400 X-MC-Unique: YpDlgy2BP5K9LL8KwBfm7A-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.rdu2.redhat.com [10.11.54.1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 4C09480280D; Tue, 11 Oct 2022 07:43:09 +0000 (UTC) Received: from maya.cloud.tilaa.com (ovpn-208-3.brq.redhat.com [10.40.208.3]) by smtp.corp.redhat.com (Postfix) with ESMTPS id C4EB840C206B; Tue, 11 Oct 2022 07:43:08 +0000 (UTC) Date: Tue, 11 Oct 2022 09:42:57 +0200 From: Stefano Brivio To: David Gibson Subject: Re: [PATCH 2/3] tcp, tcp_splice: Fix port remapping for inbound, spliced connections Message-ID: <20221011094257.4df99a36@elisabeth> In-Reply-To: References: <20221010233350.1198630-1-sbrivio@redhat.com> <20221010233350.1198630-3-sbrivio@redhat.com> Organization: Red Hat MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.1 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID-Hash: OPSFUOBLRTMHROCP3RZZVZS33QRD5N2P X-Message-ID-Hash: OPSFUOBLRTMHROCP3RZZVZS33QRD5N2P X-MailFrom: sbrivio@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.3 Precedence: list List-Id: Development discussion and patches for passt Archived-At: <> Archived-At: List-Archive: <> List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Tue, 11 Oct 2022 12:19:49 +1100 David Gibson wrote: > On Tue, Oct 11, 2022 at 01:33:49AM +0200, Stefano Brivio wrote: > > It's indeed convenient to use the final destination port in the epoll > > reference data, so that we don't have to check the configured port > > deltas before establishing connections. However, this doesn't work if > > we need to access the original port information to know what to do > > once we receive a connection. > > > > The existing implementation resulted in inbound, spliced connections > > with a remapped destination port (and only those) to be attempted on > > the outbound side, as we would check the wrong position in port > > bitmaps to establish whether to use inbound or outbound sockets. > > > > For listening spliced sockets, set the port index in the epoll > > reference to the original port. Once we get a connection, use the > > port delta arrays to find the final destination port, and the > > original destination port to decide what socket pool we should use. > > > > This avoids the need for a reverse mapping table as implemented for > > UDP. > > > > Fixes: 33482d5bf293 ("passt: Add PASTA mode, major rework") > > Signed-off-by: Stefano Brivio > > --- > > tcp.c | 19 ++++++++----------- > > tcp_splice.c | 18 ++++++++++++++---- > > 2 files changed, 22 insertions(+), 15 deletions(-) > > > > diff --git a/tcp.c b/tcp.c > > index 63153b6..cc08353 100644 > > --- a/tcp.c > > +++ b/tcp.c > > @@ -3084,15 +3084,16 @@ void tcp_sock_handler(struct ctx *c, union epoll_ref ref, uint32_t events, > > void tcp_sock_init(const struct ctx *c, int ns, sa_family_t af, > > const void *addr, const char *ifname, in_port_t port) > > { > > + union tcp_epoll_ref tref_spliced = { .tcp.listen = 1, .tcp.splice = 1 }; > > union tcp_epoll_ref tref = { .tcp.listen = 1 }; > > const void *bind_addr; > > int s; > > > > - if (ns) { > > + if (ns) > > tref.tcp.index = (in_port_t)(port + c->tcp.fwd_out.delta[port]); > > - } else { > > + else > > tref.tcp.index = (in_port_t)(port + c->tcp.fwd_in.delta[port]); > > - } > > + tref_spliced.tcp.index = (in_port_t) port; > > > > if (af == AF_INET || af == AF_UNSPEC) { > > if (!addr && c->mode == MODE_PASTA) > > @@ -3100,8 +3101,7 @@ void tcp_sock_init(const struct ctx *c, int ns, sa_family_t af, > > else > > bind_addr = addr; > > > > - tref.tcp.v6 = 0; > > - tref.tcp.splice = 0; > > + tref.tcp.v6 = tref_spliced.tcp.v6 = 0; > > > > if (!ns) { > > s = sock_l4(c, AF_INET, IPPROTO_TCP, bind_addr, ifname, > > @@ -3118,9 +3118,8 @@ void tcp_sock_init(const struct ctx *c, int ns, sa_family_t af, > > if (c->mode == MODE_PASTA) { > > bind_addr = &(uint32_t){ htonl(INADDR_LOOPBACK) }; > > > > - tref.tcp.splice = 1; > > s = sock_l4(c, AF_INET, IPPROTO_TCP, bind_addr, ifname, > > - port, tref.u32); > > + port, tref_spliced.u32); > > if (s >= 0) > > tcp_sock_set_bufsize(c, s); > > else > > @@ -3141,9 +3140,8 @@ void tcp_sock_init(const struct ctx *c, int ns, sa_family_t af, > > else > > bind_addr = addr; > > > > - tref.tcp.v6 = 1; > > + tref.tcp.v6 = tref_spliced.tcp.v6 = 1; > > > > - tref.tcp.splice = 0; > > if (!ns) { > > s = sock_l4(c, AF_INET6, IPPROTO_TCP, bind_addr, ifname, > > port, tref.u32); > > @@ -3159,9 +3157,8 @@ void tcp_sock_init(const struct ctx *c, int ns, sa_family_t af, > > if (c->mode == MODE_PASTA) { > > bind_addr = &in6addr_loopback; > > > > - tref.tcp.splice = 1; > > s = sock_l4(c, AF_INET6, IPPROTO_TCP, bind_addr, ifname, > > - port, tref.u32); > > + port, tref_spliced.u32); > > if (s >= 0) > > tcp_sock_set_bufsize(c, s); > > else > > diff --git a/tcp_splice.c b/tcp_splice.c > > index 96c31c8..bf5f62c 100644 > > --- a/tcp_splice.c > > +++ b/tcp_splice.c > > @@ -35,6 +35,7 @@ > > #include > > #include > > #include > > +#include > > #include > > #include > > #include > > @@ -512,13 +513,18 @@ static int tcp_splice_connect_ns(void *arg) > > static int tcp_splice_new(const struct ctx *c, struct tcp_splice_conn *conn, > > in_port_t port) > > { > > - struct tcp_splice_connect_ns_arg ns_arg = { c, conn, port, 0 }; > > int *p, i, s = -1; > > + bool inbound; > > > > - if (bitmap_isset(c->tcp.fwd_in.map, port)) > > + if (bitmap_isset(c->tcp.fwd_in.map, port)) { > > p = CONN_V6(conn) ? ns_sock_pool6 : ns_sock_pool4; > > - else > > + port += c->tcp.fwd_in.delta[port]; > > + inbound = true; > > + } else { > > p = CONN_V6(conn) ? init_sock_pool6 : init_sock_pool4; > > + port += c->tcp.fwd_out.delta[port]; > > + inbound = false; > > + } > > This smells wrong to me. AFAICT there's nothing inherently wrong with > forwarding inbound connections to port 5015 to port 5016 in the > namespace *and* forwarding outbound connections to port 5015 to port > 5016 on the host. Hah, yes, right, I didn't think of that. I had originally implemented this before port remapping, and this possibility didn't occur to me later on. > So I think rather than consulting the port map here, each conn object > should know if its an inward or outward connection. To make that work > with epoll, we'd also need a bit in the ref which says whether it's a > socket listening on the host or in the ns. Even if it weren't for the issue you described, that looks much cleaner in any case, thanks for the tip, I'll do this. We still have 8 bits left there. And this way we could also keep the reference format consistent across spliced and non-spliced. -- Stefano