From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by passt.top (Postfix) with ESMTPS id 13B3F5A0051 for ; Wed, 19 Jun 2024 03:50:58 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202312; t=1718761854; bh=yvgRf0pY1sNY2kKOwjyULQsMcwYrBETTzYCRSD+0BWk=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=rqHQqaQz00UOkT0QEYDrAlNX2YfV1nwLwWjlLY0KczJISXfPQHB/qXkEzQct+xj1j 23cAwdDdAKMOf4vsUlRpZL5xXWOtFe/pH2elEbewLRZUfBLCUqa7RntBvXXLdld7b7 0/gu8as+lqTNKA4ARS/sba5sA3ZuJTKayqOdZfd2NpqEexf2hHmQexGqIXQKKHM2qi L3nh6i6MafWfeGUFMFiFmkPtUWf/nKhf3IPYlXRgJ0X+6pWHmQaeu6QXE8U/5shRTh N5dtCv48t3mik92sDZEF9eTHpTwhvhZUkw14UkdTxEVYfiSu+Xuvw8rrVXplkYIXib YhQPpXOqdr8Dg== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4W3mmZ36zHz4wb7; Wed, 19 Jun 2024 11:50:54 +1000 (AEST) Date: Wed, 19 Jun 2024 11:27:09 +1000 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH v2] tcp: Don't rely on bind() to fail to decide that connection target is valid Message-ID: References: <20240618163057.1905355-1-sbrivio@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="hqxKDUomGvX4wWk9" Content-Disposition: inline In-Reply-To: <20240618163057.1905355-1-sbrivio@redhat.com> Message-ID-Hash: GPY5R66SGJBRQZJ4BZZLOLNEDO4HYHD2 X-Message-ID-Hash: GPY5R66SGJBRQZJ4BZZLOLNEDO4HYHD2 X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --hqxKDUomGvX4wWk9 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Jun 18, 2024 at 06:30:57PM +0200, Stefano Brivio wrote: > Commit e1a2e2780c91 ("tcp: Check if connection is local or low RTT > was seen before using large MSS") added a call to bind() before we > issue a connect() to the target for an outbound connection. >=20 > If bind() fails, but neither with EADDRNOTAVAIL, nor with EACCESS, we > can conclude that the target address is a local (host) address, and we > can use an unlimited MSS. >=20 > While at it, according to the reasoning of that commit, if bind() > succeeds, we would know right away that nobody is listening at that > (local) address and port, and we don't even need to call connect(): we > can just fail early and reset the connection attempt. >=20 > But if non-local binds are enabled via net.ipv4.ip_nonlocal_bind or > net.ipv6.ip_nonlocal_bind sysctl, binding to a non-local address will > actually succeed, so we can't rely on it to fail in general. >=20 > The visible issue with the existing behaviour is that we would reset > any outbound connection to non-local addresses, if non-local binds are > enabled. >=20 > Keep the significant optimisation for local addresses along with the > bind() call, but if it succeeds, don't draw any conclusion: close the > socket, grab another one, and proceed normally. >=20 > This will incur a small latency penalty if non-local binds are > enabled (we'll likely fetch an existing socket from the pool but > additionally call close()), or if the target is local but not bound: > we'll need to call connect() and get a failure before relaying that > failure back. >=20 > Link: https://github.com/containers/podman/issues/23003 > Signed-off-by: Stefano Brivio Reviewed-by: David Gibson > --- > v2: Avoid (bogus) reports of uninitialised 'sa' in bind() using an > assertion on 'af' (it must be AF_INET or AF_INET6) >=20 > tcp.c | 48 +++++++++++++++++++++++++++++++----------------- > 1 file changed, 31 insertions(+), 17 deletions(-) >=20 > diff --git a/tcp.c b/tcp.c > index 231f63b..698e7ec 100644 > --- a/tcp.c > +++ b/tcp.c > @@ -1623,6 +1623,9 @@ static void tcp_conn_from_tap(struct ctx *c, sa_fam= ily_t af, > =20 > flow_initiate(flow, PIF_TAP); > =20 > + flow_target(flow, PIF_HOST); > + conn =3D FLOW_SET_TYPE(flow, FLOW_TCP, tcp); > + > if (af =3D=3D AF_INET) { > if (IN4_IS_ADDR_UNSPECIFIED(saddr) || > IN4_IS_ADDR_BROADCAST(saddr) || > @@ -1639,6 +1642,9 @@ static void tcp_conn_from_tap(struct ctx *c, sa_fam= ily_t af, > dstport); > goto cancel; > } > + > + sa =3D (struct sockaddr *)&addr4; > + sl =3D sizeof(addr4); > } else if (af =3D=3D AF_INET6) { > if (IN6_IS_ADDR_UNSPECIFIED(saddr) || > IN6_IS_ADDR_MULTICAST(saddr) || srcport =3D=3D 0 || > @@ -1653,6 +1659,11 @@ static void tcp_conn_from_tap(struct ctx *c, sa_fa= mily_t af, > dstport); > goto cancel; > } > + > + sa =3D (struct sockaddr *)&addr6; > + sl =3D sizeof(addr6); > + } else { > + ASSERT(0); > } > =20 > if ((s =3D tcp_conn_sock(c, af)) < 0) > @@ -1665,6 +1676,26 @@ static void tcp_conn_from_tap(struct ctx *c, sa_fa= mily_t af, > addr6.sin6_addr =3D in6addr_loopback; > } > =20 > + /* Use bind() to check if the target address is local (EADDRINUSE or > + * similar) and already bound, and set the LOCAL flag in that case. > + * > + * If bind() succeeds, in general, we could infer that nobody (else) is > + * listening on that address and port and reset the connection attempt > + * early, but we can't rely on that if non-local binds are enabled, > + * because bind() would succeed for any non-local address we can reach. > + * > + * So, if bind() succeeds, close the socket, get a new one, and proceed. > + */ > + if (bind(s, sa, sl)) { > + if (errno !=3D EADDRNOTAVAIL && errno !=3D EACCES) > + conn_flag(c, conn, LOCAL); > + } else { > + /* Not a local, bound destination, inconclusive test */ > + close(s); > + if ((s =3D tcp_conn_sock(c, af)) < 0) > + goto cancel; > + } > + > if (af =3D=3D AF_INET6 && IN6_IS_ADDR_LINKLOCAL(&addr6.sin6_addr)) { > struct sockaddr_in6 addr6_ll =3D { > .sin6_family =3D AF_INET6, > @@ -1675,8 +1706,6 @@ static void tcp_conn_from_tap(struct ctx *c, sa_fam= ily_t af, > goto cancel; > } > =20 > - flow_target(flow, PIF_HOST); > - conn =3D FLOW_SET_TYPE(flow, FLOW_TCP, tcp); > conn->sock =3D s; > conn->timer =3D -1; > conn_event(c, conn, TAP_SYN_RCVD); > @@ -1698,14 +1727,6 @@ static void tcp_conn_from_tap(struct ctx *c, sa_fa= mily_t af, > =20 > inany_from_af(&conn->faddr, af, daddr); > =20 > - if (af =3D=3D AF_INET) { > - sa =3D (struct sockaddr *)&addr4; > - sl =3D sizeof(addr4); > - } else { > - sa =3D (struct sockaddr *)&addr6; > - sl =3D sizeof(addr6); > - } > - > conn->fport =3D dstport; > conn->eport =3D srcport; > =20 > @@ -1718,13 +1739,6 @@ static void tcp_conn_from_tap(struct ctx *c, sa_fa= mily_t af, > =20 > tcp_hash_insert(c, conn); > =20 > - if (!bind(s, sa, sl)) { > - tcp_rst(c, conn); /* Nobody is listening then */ > - goto cancel; > - } > - if (errno !=3D EADDRNOTAVAIL && errno !=3D EACCES) > - conn_flag(c, conn, LOCAL); > - > if ((af =3D=3D AF_INET && !IN4_IS_ADDR_LOOPBACK(&addr4.sin_addr)) || > (af =3D=3D AF_INET6 && !IN6_IS_ADDR_LOOPBACK(&addr6.sin6_addr) && > !IN6_IS_ADDR_LINKLOCAL(&addr6.sin6_addr))) --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --hqxKDUomGvX4wWk9 Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmZyM+wACgkQzQJF27ox 2GePYg/9GMPFJANkhjwmRVnPLwGvL4tcHFvU6P+irUHVLruVavSy639XbgAOh6Us NU89EddroKiYOvKd89gV6YHyuBE4HGY3psfoYyS/MkpUpa1QOUcedokUZWOHNARx FcYy6AN4BOl76bUJz6GCGunaNADMdBSzDkkXT+yschAjvUU2PumxvDwBHoT5jgL7 IZ1FZE1piBy0Ii+/OjeDlD0attp6QE2I3Vj0cLZt/ZGzhBmFXa+shDEEWVmD3Lna 69JcQ14zNbeb47b7uO0fuzU6E0OSKgXUWwpTzUr/GixhKsfUrtTF9l+fKVRf/t3j XAJnpmkVqWN65rNWkDdA0GNIS4X5I1bazA0JeNh0yTk+WTh2fgetL0UkLLl//Upd pHVN0yNXr2d6RIUp5Qx0bprDsq2Oxruy4gunmA5imZFrbO4w49KrJB9sA8jm6u9d L86YXY7l6fa0/vgpIzQe1E/KCmjigSg6UzrslbXtch2ToKVnpHkgMHpV19zE8R14 7YEe3Sf0bVeGlo3h41bVcuIJgqhaY9biwHM7QkGpMqEFUKbSZAV8Cd1tkQ/nAWpp 4uwKRt2qkKYiDbekKNZxifpJP1/hkMv6hVPirdDme3gfBJVphQ9BpGL5UhlW006A MKNBt0i34F0yAd8dR1Autp7qXoZ89IsTRY+2QpDeTZqZP6GpFv4= =K8pG -----END PGP SIGNATURE----- --hqxKDUomGvX4wWk9--