From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 14 Dec 2022 11:35:54 +0100
From: Stefano Brivio
To: David Gibson
Subject: Re: [PATCH 7/8] udp: Decide whether to "splice" per datagram rather than per socket
Message-ID: <20221214113554.29c0c196@elisabeth>
References: <20221205081425.2614425-1-david@gibson.dropbear.id.au> <20221205081425.2614425-8-david@gibson.dropbear.id.au> <20221213234918.0b51893d@elisabeth>
Organization: Red Hat
CC: passt-dev@passt.top
List-Id: Development discussion and patches for passt

On Wed, 14 Dec 2022 12:47:25 +1100
David Gibson wrote:

> On Tue, Dec 13, 2022 at 11:49:18PM +0100, Stefano Brivio wrote:
> > On Mon, 5 Dec 2022 19:14:24 +1100
> > David Gibson wrote:
> >
> > > Currently we have special sockets for receiving datagrams from localhost
> > > which can use the optimized "splice" path rather than going across the tap
> > > interface.
> > >
> > > We want to loosen this so that sockets can receive datagrams that will be
> > > forwarded by both the spliced and non-spliced paths. To do this, we alter
> > > the meaning of the @splice bit in the reference to mean that packets
> > > received on this socket *can* be spliced, not that they *will* be spliced.
> > > They'll only actually be spliced if they come from 127.0.0.1 or ::1.
> > >
> > > We can't (for now) remove the splice bit entirely, unlike with TCP. Our
> > > gateway mapping means that if the ns initiates communication to the gw
> > > address, we'll translate that to target 127.0.0.1 on the host side. Reply
> > > packets will therefore have source address 127.0.0.1 when received on the
> > > host, but these need to go via the tap path where that will be translated
> > > back to the gateway address.
> > > We need the @splice bit to distinguish that
> > > case from packets going from localhost to a port mapped explicitly with
> > > -u which should be spliced.
> > >
> > > Signed-off-by: David Gibson
> > > ---
> > >  udp.c | 54 +++++++++++++++++++++++++++++++++++-------------------
> > >  udp.h | 2 +-
> > >  2 files changed, 36 insertions(+), 20 deletions(-)
> > >
> > > diff --git a/udp.c b/udp.c
> > > index 6ccfe8c..011a157 100644
> > > --- a/udp.c
> > > +++ b/udp.c
> > > @@ -513,16 +513,27 @@ static int udp_splice_new_ns(void *arg)
> > >  }
> > >
> > >  /**
> > > - * sa_port() - Determine port from a sockaddr_in or sockaddr_in6
> > > + * udp_mmh_splice_port() - Is source address of message suitable for splicing?
> > >   * @v6:	Is @sa a sockaddr_in6 (otherwise sockaddr_in)?
> > > - * @sa:	Pointer to either sockaddr_in or sockaddr_in6
> > > + * @mmh:	mmsghdr of incoming message
> > > + *
> > > + * Return: if @sa refers to localhost (127.0.0.1 or ::1) the port from
> > > + * @sa, otherwise 0.
> > > + *
> > > + * NOTE: this relies on the fact that it's not valid to use UDP port 0
> >
> > The port is reserved by IANA indeed, but... it can actually be used. On
> > Linux, you can bind() it and you can connect() to it. As far as I can
> > tell from the new version of udp_sock_handler() we would actually
> > misdirect packets in that case.
>
> Hm, ok. Given the IANA reservation, I think it would be acceptable to
> simply drop such packets - but if we were to make that choice we
> should do so explicitly, rather than misdirecting them.

Acceptable, sure, but... I don't know, it somehow doesn't look
desirable to me. The kernel doesn't enforce this, so I guess we
shouldn't either.

> > How bad would it be to use an int here?
>
> Pretty straightforward. Just means we have to use the somewhat
> abstruse "if (port <= USHRT_MAX)" or "if (port >= 0)" or something
> instead of just "if (port)". Should I go ahead and make that change?

Eh, I don't like it either, but...
I guess it's better than the alternative, so yes, thanks. Or pass port
as a pointer, set on return. I'm fine with both.

> > By the way, I think the comment should also mention that the port is
> > returned in host order.
>
> Ok, easily done. Generally I try to keep the endianness associated
> with the type, rather than attempting to document it for each variable
> (or even worse, each point in time for each variable).

Yes, I see, and it's a more valid approach than mine, but still mine
comes almost for free.

By the way, I got distracted by this and I forgot about two things:

> +static in_port_t udp_mmh_splice_port(bool v6, const struct mmsghdr *mmh)
>  {
> -	const struct sockaddr_in6 *sa6 = sa;
> -	const struct sockaddr_in *sa4 = sa;
> +	const struct sockaddr_in6 *sa6 = mmh->msg_hdr.msg_name;
> +	const struct sockaddr_in *sa4 = mmh->msg_hdr.msg_name;;

Stray semicolon here.

> +
> +	if (v6 && IN6_IS_ADDR_LOOPBACK(&sa6->sin6_addr))
> +		return ntohs(sa6->sin6_port);
>
> -	return v6 ? ntohs(sa6->sin6_port) : ntohs(sa4->sin_port);
> +	if (ntohl(sa4->sin_addr.s_addr) == INADDR_LOOPBACK)
> +		return ntohs(sa4->sin_port);

If it's IPv6, but not a loopback address, we'll check if
sa4->sin_addr.s_addr == INADDR_LOOPBACK -- which might actually be true
for an IPv6, non-loopback address.

Also, I think we can happily "splice" for any loopback address, not
just 127.0.0.1.

What about something like:

	if (v6 && IN6_IS_ADDR_LOOPBACK(&sa6->sin6_addr))
		return ntohs(sa6->sin6_port);

	if (!v6 && IN4_IS_ADDR_LOOPBACK(&sa4->sin_addr))
		return ntohs(sa4->sin_port);

	return -1;

?

-- 
Stefano
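
For reference, a minimal standalone sketch of the check discussed above,
under a few assumptions: the function takes the sockaddr pointer directly
rather than a struct mmsghdr, the names splice_port() and
in4_is_loopback() are hypothetical (not from the patch), and the IPv4
test is open-coded for 127.0.0.0/8 instead of using a passt macro. It
returns an int so that port 0 stays a valid result and -1 means "not
loopback". Branching on v6 first matters because sin_addr in struct
sockaddr_in sits at the same offset as sin6_flowinfo in struct
sockaddr_in6, so an IPv4 comparison on an IPv6 sockaddr would read the
flow information rather than an address.

```c
#include <stdbool.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper (not from the patch): true for any 127.0.0.0/8 address */
static bool in4_is_loopback(const struct in_addr *a)
{
	return (ntohl(a->s_addr) >> 24) == 127;
}

/* Hypothetical variant of the function under review.
 * Return: source port in host order if @sa is a loopback address, -1
 * otherwise. An int is used so that port 0 is not mistaken for "no match".
 */
static int splice_port(bool v6, const void *sa)
{
	const struct sockaddr_in6 *sa6 = sa;
	const struct sockaddr_in *sa4 = sa;

	/* Decide on the family first: never apply the IPv4 test to IPv6 data */
	if (v6)
		return IN6_IS_ADDR_LOOPBACK(&sa6->sin6_addr) ?
		       ntohs(sa6->sin6_port) : -1;

	return in4_is_loopback(&sa4->sin_addr) ? ntohs(sa4->sin_port) : -1;
}
```

A caller would then test "port >= 0" rather than "port", which is the
slightly awkward condition mentioned earlier in the thread.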