From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from imap.gmail.com [173.194.76.109] by localhost with POP3 (fetchmail-6.3.26) for (single-drop); Mon, 20 May 2024 07:57:28 +0200 (CEST) Received: by 2002:a05:6a10:9148:b0:55f:c3c0:ed08 with SMTP id n8csp190798pxb; Sun, 19 May 2024 22:56:51 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCXcMwqFXzDBZbRJqSzwvEpGcvZzwUF8ZsG8GNuZCW6Wonrm/TmvcNT/h0mWy5clqeXtReLn3ZCwDCPxTOAS0N1oNM8YKGAQSZA= X-Google-Smtp-Source: AGHT+IGn+nlIrBCw+3KGNNhSUYpS4u7udcJ2t85DcktmVSXzI1c7ZsXShFwidOCJuyL/MSbnKssH X-Received: by 2002:a25:9702:0:b0:de6:4ff:3164 with SMTP id 3f1490d57ef6-dee4f2eb5b2mr27063863276.36.1716184610922; Sun, 19 May 2024 22:56:50 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1716184610; cv=none; d=google.com; s=arc-20160816; b=M9hFxhcZjd/B+iRAKhzxYXMsrw9slOkpTSDxq9tKvAuwg/HMdf4lytGSzT7E+rROSX yl/n/ZeCwOEijMNlOX18WfFnJUfntPjE55qSS4B5PXe2X9E9uyR99vEhzgnABli0uc7q znkXkLKxXT8PbYNR5UnN7SiRYbT/QdzlahfpQKmztHYiQtLxwSIAUuIVcQv50j7u+0ne jE2jdks7MCIZmWXfNAm3Y+jlaRXSiJ65OtFi/MzRHjEsBXgKM5wN+OCSbYHmFWP1nTyB HjH/pGOkGZJ5NzYxWukr76+fhkXBBwsTXOOmDFzAnUOIz3KNrH85JR9Zo2JVGLyGY7hX Anrg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-unsubscribe:list-subscribe:list-post:list-owner:list-help :list-archive:list-archive:archived-at:archived-at:list-id :precedence:cc:message-id-hash:in-reply-to:content-disposition :mime-version:references:message-id:subject:to:from:date :dkim-signature:delivered-to; bh=KGkw1gxLAKjcK8Z6VhqySU2WvIgtJ5+OOFb+XXlKVGw=; fh=htis3z6nDPD3FFPmKX7SGQyhLv1B9tDlnJYWrC8d10o=; b=lD2m5cfOVAPVvWjXaw1Kf/UPDubEO6x7i5lx9ewSbD9nvb5HwFpbWGa5lqCXcbeEkA mkc+4VkiD6y10hhoc+qOb4xP5vUkxDW0BW/8vEiHTjtsedLj8GSenzA7864UeIP7QBO3 YLIfR86AQ1GSINDIm56j2CbTexVSt0K2Q//8oAsUESv48tdcvLgP/eOSmKfF4qYBNmJ2 wo+fKSYhoD0gxwgmBQ8SOHMghrayyk1Cisxkz+K0n0KbExSDjNGohzwDJnaJJFVQ2C9u JQw8YLD6IYj+w0AA6aTBGkRf0jK1sgzC1KVfIjbIHP8rVklTHRfFk/aOuLw/LZY6QHLq vpwQ==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=temperror (no key for signature) header.i=@gibson.dropbear.id.au header.s=202312 header.b=VoUMH8zg; spf=pass (google.com: domain of passt-dev-bounces@passt.top designates 88.198.0.164 as permitted sender) smtp.mailfrom=passt-dev-bounces@passt.top Return-Path: Received: from us-smtp-inbound-delivery-1.mimecast.com (us-smtp-delivery-1.mimecast.com. [170.10.128.131]) by mx.google.com with ESMTPS id 6a1803df08f44-6a9b2fd6087si27326646d6.487.2024.05.19.22.56.50 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 19 May 2024 22:56:50 -0700 (PDT) Received-SPF: pass (google.com: domain of passt-dev-bounces@passt.top designates 88.198.0.164 as permitted sender) client-ip=88.198.0.164; Authentication-Results: mx.google.com; dkim=temperror (no key for signature) header.i=@gibson.dropbear.id.au header.s=202312 header.b=VoUMH8zg; spf=pass (google.com: domain of passt-dev-bounces@passt.top designates 88.198.0.164 as permitted sender) smtp.mailfrom=passt-dev-bounces@passt.top Received: from mimecast-mx02.redhat.com (mx-ext.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-635-yyU29HaVNsWOQ-s5GF5gTw-1; Mon, 20 May 2024 01:56:49 -0400 X-MC-Unique: yyU29HaVNsWOQ-s5GF5gTw-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.rdu2.redhat.com [10.11.54.3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id F0A7C1C031B7 for ; Mon, 20 May 2024 05:56:48 +0000 (UTC) Received: by smtp.corp.redhat.com (Postfix) id ED41B1054820; Mon, 20 May 2024 05:56:48 +0000 (UTC) Received: from mimecast-mx02.redhat.com (mimecast02.extmail.prod.ext.rdu2.redhat.com [10.11.55.18]) by smtp.corp.redhat.com (Postfix) with ESMTPS id B4A0D105480A for ; Mon, 20 May 2024 05:56:48 +0000 (UTC) Received: from us-smtp-inbound-delivery-1.mimecast.com (us-smtp-delivery-1.mimecast.com [170.10.128.131]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 964F98058D4 for ; Mon, 20 May 2024 05:56:48 +0000 (UTC) Received: from passt.top (passt.top [88.198.0.164]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-517-ezYsxz37PXCqKFqcK5v-GA-1; Mon, 20 May 2024 01:56:43 -0400 X-MC-Unique: ezYsxz37PXCqKFqcK5v-GA-1 Received: from [127.0.0.1] (localhost.localdomain [127.0.0.1]) by passt.top (Postfix) with ESMTP id 6630F5A0305; Mon, 20 May 2024 07:56:41 +0200 (CEST) Received: from mail.ozlabs.org (gandalf.ozlabs.org [150.107.74.76]) by passt.top (Postfix) with ESMTPS id ABD1B5A004C for ; Mon, 20 May 2024 07:56:39 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202312; t=1716184595; bh=KGkw1gxLAKjcK8Z6VhqySU2WvIgtJ5+OOFb+XXlKVGw=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=VoUMH8zgn+wsGCVmJAUVigBgqIxTt/aeUthO/ZJJFgod2uAkkaqmrjBLXXZxfuGQv Lo3qLNx+cBiHEdgTuCTMXXoU0s4/Cz5jF/PhMGBPALNGtJUra9Q7UGTiBIs9yMmjIG aTgT5hC7FgpMMF/5qUbzp5Ptbd1jQ8V8VlIMENeC9CP2VGOzN6Av6JeYphYz/SOd+5 9CrdheVXpH2AlFiKhmDjfxPIIR7Ep2mgD3d0ni/NRQF2tWkMRq6xXGucHMrbnvbh75 pprfH+f3a0z5cTDE9KBOyh51DqcAUqtbb/bVyqFYRuu1LlVNiDS3TJavgNIs2jwFPx s4FWinKOBog6Q== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4VjRdv34CSz4wny; Mon, 20 May 2024 15:56:35 +1000 (AEST) Date: Mon, 20 May 2024 15:44:07 +1000 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH v5 18/19] flow, tcp: Flow based NAT and port forwarding for TCP Message-ID: References: <20240514010337.1104606-1-david@gibson.dropbear.id.au> <20240514010337.1104606-19-david@gibson.dropbear.id.au> <20240518001345.2d127b09@elisabeth> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="mI7V1mV7fOMlbuTr" Content-Disposition: inline In-Reply-To: <20240518001345.2d127b09@elisabeth> Message-ID-Hash: CA6ABM6JJX44JCPKSEVH6XJVYROUT63Z X-Message-ID-Hash: CA6ABM6JJX44JCPKSEVH6XJVYROUT63Z X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.3 --mI7V1mV7fOMlbuTr Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Sat, May 18, 2024 at 12:13:45AM +0200, Stefano Brivio wrote: > On Tue, 14 May 2024 11:03:36 +1000 > David Gibson wrote: >=20 > > Currently the code to translate host side addresses and ports to guest = side > > addresses and ports, and vice versa, is scattered across the TCP code. > > This includes both port redirection as controlled by the -t and -T opti= ons, > > and our special case NAT controlled by the --no-map-gw option. > >=20 > > Gather this logic into fwd_from_*() functions for each input interface > > in fwd.c which take protocol and address information for the initiating > > side and generates the pif and address information for the forwarded si= de. > > This performs any NAT or port forwarding needed. > >=20 > > We create a flow_forward() helper which applies those forwarding functi= ons > > as needed to automatically move a flow from INI to FWD state. For now = we > > leave the older flow_forward_af() function taking explicit addresses as > > a transitional tool. > >=20 > > Signed-off-by: David Gibson > > --- > > flow.c | 53 +++++++++++++++++++++++++ > > flow_table.h | 2 + > > fwd.c | 110 +++++++++++++++++++++++++++++++++++++++++++++++++++ > > fwd.h | 12 ++++++ > > tcp.c | 102 +++++++++++++++-------------------------------- > > tcp_splice.c | 63 ++--------------------------- > > tcp_splice.h | 5 +-- > > 7 files changed, 213 insertions(+), 134 deletions(-) > >=20 > > diff --git a/flow.c b/flow.c > > index 4942075..a6afe39 100644 > > --- a/flow.c > > +++ b/flow.c > > @@ -304,6 +304,59 @@ const struct flowside *flow_forward_af(union flow = *flow, uint8_t pif, > > return fwd; > > } > > =20 > > + > > +/** > > + * flow_forward() - Determine where flow should forward to, and move t= o FWD > > + * @c: Execution context > > + * @flow: Flow to forward > > + * @proto: Protocol > > + * > > + * Return: pointer to the forwarded flowside information > > + */ > > +const struct flowside *flow_forward(const struct ctx *c, union flow *f= low, > > + uint8_t proto) > > +{ > > + char estr[INANY_ADDRSTRLEN], fstr[INANY_ADDRSTRLEN]; > > + struct flow_common *f =3D &flow->f; > > + const struct flowside *ini =3D &f->side[INISIDE]; > > + struct flowside *fwd =3D &f->side[FWDSIDE]; > > + uint8_t pif1 =3D PIF_NONE; >=20 > This could now be 'pif_fwd' / 'pif_tgt', right? Good idea, changed. [snip] > > diff --git a/fwd.c b/fwd.c > > index b3d5a37..5fe2361 100644 > > --- a/fwd.c > > +++ b/fwd.c > > @@ -25,6 +25,7 @@ > > #include "fwd.h" > > #include "passt.h" > > #include "lineread.h" > > +#include "flow_table.h" > > =20 > > /* See enum in kernel's include/net/tcp_states.h */ > > #define UDP_LISTEN 0x07 > > @@ -154,3 +155,112 @@ void fwd_scan_ports_init(struct ctx *c) > > &c->tcp.fwd_out, &c->tcp.fwd_in); > > } > > } > > + > > +uint8_t fwd_from_tap(const struct ctx *c, uint8_t proto, > > + const struct flowside *a, struct flowside *b) >=20 > A function comment would be nice to have, albeit a bit redundant. Ah, yes. I meant to go back and add these, but obviously forgot. Fixed now. > Now > 'a' and 'b' could also be called 'ini' and 'tgt' I guess? Also a good idea, done. > > +{ > > + (void)proto; > > + > > + b->eaddr =3D a->faddr; > > + b->eport =3D a->fport; > > + > > + if (!c->no_map_gw) { > > + struct in_addr *v4 =3D inany_v4(&b->eaddr); > > + > > + if (v4 && IN4_ARE_ADDR_EQUAL(v4, &c->ip4.gw)) > > + *v4 =3D in4addr_loopback; > > + if (IN6_ARE_ADDR_EQUAL(&b->eaddr, &c->ip6.gw)) > > + b->eaddr.a6 =3D in6addr_loopback; >=20 > I haven't tested this, but I'm a bit lost: I thought that in this case > we would also set b->faddr here. Where does that happen? Ah.. right. So notionally we should set tgt->faddr here. However, because in this case we're forwarding to PIF_HOST we don't actually know tgt->faddr (or tgt->fport) without a getsockname() call, so we're leaving them blank. They will, in fact, be blank because we zero the entire entry in flow_alloc(). That's pretty non-obvious though, I'll change this to explicitly set faddr and fport with a comment. > > + } > > + > > + return PIF_HOST; > > +} > > + > > +uint8_t fwd_from_splice(const struct ctx *c, uint8_t proto, > > + const struct flowside *a, struct flowside *b) > > +{ > > + const struct in_addr *ae4 =3D inany_v4(&a->eaddr); > > + > > + if (!inany_is_loopback(&a->eaddr) || > > + (!inany_is_loopback(&a->faddr) && !inany_is_unspecified(&a->faddr= ))) { > > + char estr[INANY_ADDRSTRLEN], fstr[INANY_ADDRSTRLEN]; > > + > > + debug("Non loopback address on %s: [%s]:%hu -> [%s]:%hu", > > + pif_name(PIF_SPLICE), > > + inany_ntop(&a->eaddr, estr, sizeof(estr)), a->eport, > > + inany_ntop(&a->faddr, fstr, sizeof(fstr)), a->fport); > > + return PIF_NONE; > > + } > > + > > + if (ae4) > > + inany_from_af(&b->eaddr, AF_INET, &in4addr_loopback); > > + else > > + inany_from_af(&b->eaddr, AF_INET6, &in6addr_loopback); > > + > > + b->eport =3D a->fport; > > + > > + if (proto =3D=3D IPPROTO_TCP) > > + b->eport +=3D c->tcp.fwd_out.delta[b->eport]; > > + > > + return PIF_HOST; > > +} > > + > > +uint8_t fwd_from_host(const struct ctx *c, uint8_t proto, > > + const struct flowside *a, struct flowside *b) > > +{ > > + struct in_addr *bf4; > > + > > + if (c->mode =3D=3D MODE_PASTA && inany_is_loopback(&a->eaddr) && > > + proto =3D=3D IPPROTO_TCP) { > > + /* spliceable */ >=20 > Before we conclude this, does f->pif[INISIDE] =3D=3D PIF_HOST in the call= er > guarantee that inany_is_loopback(&a->faddr), too? Only in the sense that if we accept()ed a connection from a loopback address on a socket not bound to a loopback address (or ANY), then the kernel has done something wrong. This kind of has the inverse of the issue above: we don't necessarily know the forwarding address here - we only know that with either a getsockname(), or by looking at the bound address of the listening socket (which might be unspecified). > If not, we shouldn't > splice unless that's true as well. So I'm pretty confident what we do here is equivalent to what we did before. That might not be correct, but fixing that is for a different patch. Making problems like that more obvious is one of the advantages I expect for gathering all this forwarding logic into one place. > > + b->faddr =3D a->eaddr; > > + > > + if (inany_v4(&a->eaddr)) > > + inany_from_af(&b->eaddr, AF_INET, &in4addr_loopback); > > + else > > + inany_from_af(&b->eaddr, AF_INET6, &in6addr_loopback); > > + b->eport =3D a->fport; > > + if (proto =3D=3D IPPROTO_TCP) > > + b->eport +=3D c->tcp.fwd_in.delta[b->eport]; > > + > > + return PIF_SPLICE; > > + } > > + > > + b->faddr =3D a->eaddr; > > + b->fport =3D a->eport; > > + > > + bf4 =3D inany_v4(&b->faddr); > > + > > + if (bf4) { > > + if (IN4_IS_ADDR_LOOPBACK(bf4) || > > + IN4_IS_ADDR_UNSPECIFIED(bf4) || > > + IN4_ARE_ADDR_EQUAL(bf4, &c->ip4.addr_seen)) > > + *bf4 =3D c->ip4.gw; > > + } else { > > + struct in6_addr *bf6 =3D &b->faddr.a6; > > + > > + if (IN6_IS_ADDR_LOOPBACK(bf6) || > > + IN6_ARE_ADDR_EQUAL(bf6, &c->ip6.addr_seen) || > > + IN6_ARE_ADDR_EQUAL(bf6, &c->ip6.addr)) { > > + if (IN6_IS_ADDR_LINKLOCAL(&c->ip6.gw)) > > + *bf6 =3D c->ip6.gw; > > + else > > + *bf6 =3D c->ip6.addr_ll; > > + } > > + } > > + > > + if (bf4) { > > + inany_from_af(&b->eaddr, AF_INET, &c->ip4.addr_seen); > > + } else { > > + if (IN6_IS_ADDR_LINKLOCAL(&b->faddr.a6)) > > + b->eaddr.a6 =3D c->ip6.addr_ll_seen; > > + else > > + b->eaddr.a6 =3D c->ip6.addr_seen; > > + } > > + > > + b->eport =3D a->fport; > > + if (proto =3D=3D IPPROTO_TCP) > > + b->eport +=3D c->tcp.fwd_in.delta[b->eport]; >=20 > As we do this in any case, spliced or not spliced, I would find it less > confusing to have these assignments in common, earlier (I just spent > half an hour trying to figure out why you wouldn't set b->eport for the > non-spliced case...). Fair point. This was just because I thought my way through the two cases separately. I've made this stanza common. [snip] > > +static void tcp_tap_conn_from_sock(struct ctx *c, union flow *flow, in= t s, > > const struct timespec *now) > > { > > - union inany_addr saddr, daddr; /* FIXME: avoid bulky temporaries */ > > - struct tcp_tap_conn *conn; > > - in_port_t srcport; > > + struct tcp_tap_conn *conn =3D FLOW_SET_TYPE(flow, FLOW_TCP, tcp); > > uint64_t hash; > > =20 > > - inany_from_sockaddr(&saddr, &srcport, sa); > > - tcp_snat_inbound(c, &saddr); > > - > > - if (inany_v4(&saddr)) { > > - inany_from_af(&daddr, AF_INET, &c->ip4.addr_seen); > > - } else { > > - if (IN6_IS_ADDR_LINKLOCAL(&saddr)) > > - daddr.a6 =3D c->ip6.addr_ll_seen; > > - else > > - daddr.a6 =3D c->ip6.addr_seen; > > - } > > - dstport +=3D c->tcp.fwd_in.delta[dstport]; > > - > > - flow_forward_af(flow, PIF_TAP, AF_INET6, > > - &saddr, srcport, &daddr, dstport); > > - conn =3D FLOW_SET_TYPE(flow, FLOW_TCP, tcp); > > - > > +=09 >=20 > Excess newline and tab. Looks like I already fixed that. [snip] > > --- a/tcp_splice.c > > +++ b/tcp_splice.c > > @@ -395,71 +395,18 @@ static int tcp_conn_sock_ns(const struct ctx *c, = sa_family_t af) > > /** > > * tcp_splice_conn_from_sock() - Attempt to init state for a spliced c= onnection > > * @c: Execution context > > - * @pif0: pif id of side 0 > > - * @dstport: Side 0 destination port of connection > > * @flow: flow to initialise > > * @s0: Accepted (side 0) socket > > * @sa: Peer address of connection > > * > > - * Return: true if able to create a spliced connection, false otherwise >=20 > Not related to this patch, but I think we should probably describe in > the theory of operation for flows what's the threshold between calling > flow_alloc_cancel() on a flow (which would imply returning something > here, in case tcp_splice_connect() fails), and deferring that instead > to a CLOSING state. That's included in the new descriptions of the flow states. There might be a way to make it more obvious, but I'm not immediately sure of it. In any case the answer is: you can't cancel once you FLOW_ACTIVATE(). [snip] --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --mI7V1mV7fOMlbuTr Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmZK4xYACgkQzQJF27ox 2GdndRAApBOWbGjMRg7lyFY4yXZRj/2tuImtE3pcz3Y57roYox57OfQKWbg42H61 mNzmEy8x38M7BOXuU5Om2+SeGkbEWuOPz0v20ybFWEPYA+flR0Dc8sKQwB2L0sGc 4dHzO4tTuPaeSS9zYf97dJQCpnfO1xm/g1ADrgpxp08cGyU0XcqDDc6bRuRF8Y1S NqG9sGVy9puCChQrj8Tj6X95nZQJEmKCShALwk+n9tFaMbSkL7pYNmgtX7/4jmGK ObsVIQ6OXT+6wp1SumX25zbHO4/+FaN8XeIyU1A74Q6grNekaZwfNktHWm//EMKb 6qHSIB0M0LPFdTWgmbb7/D36HdFxRwgVTMRt/dNJ7nza0ovGtfPqf5uXjByuJ1dk Uw2aOoJ1hdDaoVnySoGwarAICCSlQq9pscHOhj9w1apKV9qExh/i5/Rts0Z8caJI yqnhwwL24HA7c+SaswtEtSJ4xhHAfRBlVJvkSwlpHxW2+IQsf7MeifT+Bq7MZeJb w6VEopVfpn1mtd6kQ2qwx5riHQ9bf5rnHeAG6xcMcLW7tJHsS93yuSeUg55bmFRc pO/xU+fOy6+jQBAVnTXnmFt1Wm3xoA7EueO5DT3t5Ru7k7fZ+dkVGTtJsNZ73gug 6nUewTw8fUucqgzKncyH8+x/nvCCNF99Jfe2owvORHIYKHmnY1o= =p8g6 -----END PGP SIGNATURE----- --mI7V1mV7fOMlbuTr--