From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from gandalf.ozlabs.org (gandalf.ozlabs.org [150.107.74.76]) by passt.top (Postfix) with ESMTPS id 7312C5A0082 for ; Thu, 15 Dec 2022 01:33:25 +0100 (CET) Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4NXY9t03m8z4xTy; Thu, 15 Dec 2022 11:33:22 +1100 (AEDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=201602; t=1671064402; bh=AG+j5p0suy6j/LGYeZgun+J+2+kVgoO9ljxSOSD54Rk=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=OQyZvlUmT3SX8ej7lKcZcodopSduQa75epE+eNIBiwb5uzEXyoMfwjthTQm7zpLxa MiIPjhaunI4eCCEnTYBKGD1esYA8RLhbQBfqZsL2TRQXSLAFIY5VCVMxSkHxb/Mn3c Qmhw9VWjCOxIgecnjSKJn1bBTEVK1QrX/coHyfdU= Date: Thu, 15 Dec 2022 11:33:13 +1100 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH 7/8] udp: Decide whether to "splice" per datagram rather than per socket Message-ID: References: <20221205081425.2614425-1-david@gibson.dropbear.id.au> <20221205081425.2614425-8-david@gibson.dropbear.id.au> <20221213234918.0b51893d@elisabeth> <20221214113554.29c0c196@elisabeth> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="1w4RwC0FFaVRxOtz" Content-Disposition: inline In-Reply-To: <20221214113554.29c0c196@elisabeth> Message-ID-Hash: GX3O5XJUOZ5QNGXWAANNF2VITCVC6JG5 X-Message-ID-Hash: GX3O5XJUOZ5QNGXWAANNF2VITCVC6JG5 X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.3 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --1w4RwC0FFaVRxOtz Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Dec 14, 2022 at 11:35:54AM +0100, Stefano Brivio wrote: > On Wed, 14 Dec 2022 12:47:25 +1100 > David Gibson wrote: >=20 > > On Tue, Dec 13, 2022 at 11:49:18PM +0100, Stefano Brivio wrote: > > > On Mon, 5 Dec 2022 19:14:24 +1100 > > > David Gibson wrote: > > > =20 > > > > Currently we have special sockets for receiving datagrams from loca= host > > > > which can use the optimized "splice" path rather than going across = the tap > > > > interface. > > > >=20 > > > > We want to loosen this so that sockets can receive sockets that wil= l be > > > > forwarded by both the spliced and non-spliced paths. To do this, w= e alter > > > > the meaning of the @splice bit in the reference to mean that packets > > > > receieved on this socket *can* be spliced, not that they *will* be = spliced. > > > > They'll only actually be spliced if they come from 127.0.0.1 or ::1. > > > >=20 > > > > We can't (for now) remove the splice bit entirely, unlike with TCP.= Our > > > > gateway mapping means that if the ns initiates communication to the= gw > > > > address, we'll translate that to target 127.0.0.1 on the host side.= Reply > > > > packets will therefore have source address 127.0.0.1 when received = on the > > > > host, but these need to go via the tap path where that will be tran= slated > > > > back to the gateway address. We need the @splice bit to distinguis= h that > > > > case from packets going from localhost to a port mapped explicitly = with > > > > -u which should be spliced. > > > >=20 > > > > Signed-off-by: David Gibson > > > > --- > > > > udp.c | 54 +++++++++++++++++++++++++++++++++++------------------- > > > > udp.h | 2 +- > > > > 2 files changed, 36 insertions(+), 20 deletions(-) > > > >=20 > > > > diff --git a/udp.c b/udp.c > > > > index 6ccfe8c..011a157 100644 > > > > --- a/udp.c > > > > +++ b/udp.c > > > > @@ -513,16 +513,27 @@ static int udp_splice_new_ns(void *arg) > > > > } > > > > =20 > > > > /** > > > > - * sa_port() - Determine port from a sockaddr_in or sockaddr_in6 > > > > + * udp_mmh_splice_port() - Is source address of message suitable f= or splicing? > > > > * @v6: Is @sa a sockaddr_in6 (otherwise sockaddr_in)? > > > > - * @sa: Pointer to either sockaddr_in or sockaddr_in6 > > > > + * @mmh: mmsghdr of incoming message > > > > + * > > > > + * Return: if @sa refers to localhost (127.0.0.1 or ::1) the port = =66rom > > > > + * @sa, otherwise 0. > > > > + * > > > > + * NOTE: this relies on the fact that it's not valid to use UDP po= rt 0 =20 > > >=20 > > > The port is reserved by IANA indeed, but... it can actually be used. = On > > > Linux, you can bind() it and you can connect() to it. As far as I can > > > tell from the new version of udp_sock_handler() we would actually > > > misdirect packets in that case. =20 > >=20 > > Hm, ok. Given the IANA reservation, I think it would be acceptable to > > simply drop such packets - but if we were to make that choice we > > should do so explicitly, rather than misdirecting them. >=20 > Acceptable, sure, but... I don't know, it somehow doesn't look > desirable to me. The kernel doesn't enforce this, so I guess we > shouldn't either. >=20 > > > How bad would it be to use an int here? =20 > >=20 > > Pretty straightforward. Just means we have to use the somewhat > > abtruse "if (port <=3D USHRT_MAX)" or "if (port >=3D 0)" or something > > instead of just "if (port)". Should I go ahead and make that change? >=20 > Eh, I don't like it either, but... I guess it's better than the > alternative, so yes, thanks. Or pass port as a pointer, set on return. > I'm fine with both. I think the int is less ugly than the pointer. Ok, I'll make that change. > > > By the way, I think the comment should also mention that the port is > > > returned in host order. =20 > >=20 > > Ok, easily done. Generally I try to keep the endianness associated > > with the type, rather than attempting to document it for each variable > > (or even worse, each point in time for each variable). >=20 > Yes, I see, and it's a more valid approach than mine, but still mine > comes almost for free. Right. Actually rereading the function in question, it specifically says "port from @sa" and in the sockaddr, of course, it's network endian, so it could particularly do with clarification in this case. > By the way, I got distracted by this and I forgot about two things: >=20 > > +static in_port_t udp_mmh_splice_port(bool v6, const struct mmsghdr *mm= h) > > { > > - const struct sockaddr_in6 *sa6 =3D sa; > > - const struct sockaddr_in *sa4 =3D sa; > > + const struct sockaddr_in6 *sa6 =3D mmh->msg_hdr.msg_name; > > + const struct sockaddr_in *sa4 =3D mmh->msg_hdr.msg_name;; >=20 > Stray semicolon here. Fixed. > > + > > + if (v6 && IN6_IS_ADDR_LOOPBACK(&sa6->sin6_addr)) > > + return ntohs(sa6->sin6_port); > > =20 > > - return v6 ? ntohs(sa6->sin6_port) : ntohs(sa4->sin_port); > > + if (ntohl(sa4->sin_addr.s_addr) =3D=3D INADDR_LOOPBACK) > > + return ntohs(sa4->sin_port); >=20 > If it's IPv6, but not a loopback address, we'll check if > sa4->sin_addr.s_addr =3D=3D INADDR_LOOPBACK -- which might actually be tr= ue for > an IPv6, non-loopback address. Oops, yes, that's definitely wrong, and wrong enough to respin. > Also, I think we can happily "splice" for any loopback address, not > just 127.0.0.1. What about something like: Well.. yes and no. Yes, we can deliver packets in this way, but we'll lost track of the original from address, so reply packets will be misdirected. Hrm.. ISTR you mentioned some cases that worked despite that, so I'll make the change, and hope to fix it up better when I get to the NAT / splice semantics rework. > if (v6 && IN6_IS_ADDR_LOOPBACK(&sa6->sin6_addr)) > return ntohs(sa6->sin6_port); >=20 > if (!v4 && IN4_IS_ADDR_LOOPBACK(&sa4->sin_addr)) > return ntohs(sa4->sin_port); >=20 > return -1; >=20 > ? >=20 --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --1w4RwC0FFaVRxOtz Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEoULxWu4/Ws0dB+XtgypY4gEwYSIFAmOaayIACgkQgypY4gEw YSI9RBAA0pDYFBY97pScObSJvY0soEbcrZH869MWiCEfxl7rINoec/5rqd2aNvCi trqQw+FbFvaVmb+JsVK9a+GzUEfLBMYYPoyzhKYax9gB9tHKUXHL8CNi5oFR0XPk vbZirJax+cq0uz5A2+0tSOH4FncwwFq9ldyEya88nD8ubUFo0x8aEYYgs+gqJ7f8 Pbl071judhppSsrylu9jJM/ntuiO94xcjCiQveQA8ZgvxHnhYgLhnHE/qJgvaC0z ciwKNygLgK1saH5Z9lH8wlLuLQSzqawZiY34IT0qrUU5rGl9LCZ9LCcuJTrW8bkd UjEYfXdGY31LU/2RtjjmzF4wNrm8+ygYvQlb9PsUqb2gi8H7AHGObIwnJ/5P2qvU OdjwteiN8eRl/w75m5j72iU3OLIX26/ELNn+t03mJ726KwI+h+63xxP0AVx60AUJ eqRkB0EKQYGBXrL2EziN4JQZEVeBVl2YlyzYiJUpcK6nVIhuOZR0oMegN3ed21SR Su7VVYhDtAahZIUBu8fr7G0JbB8MQrNplHFTrKU0leykouiExC3MsF6OFuEYhV0v ZrU8yLaBPaKGmavzKUWDdJABQaE9nxuqnF5YsnKnHBkWSXwTRGfKZ4QWC86O0ba6 zt4GWoV/VXV+9PmQgtwMDS3/K/LPL4TUI4WZXXg79v766LLjMF4= =y96d -----END PGP SIGNATURE----- --1w4RwC0FFaVRxOtz--