public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: passt-dev@passt.top
Subject: Re: [PATCH 7/8] udp: Decide whether to "splice" per datagram rather than per socket
Date: Wed, 14 Dec 2022 11:35:54 +0100	[thread overview]
Message-ID: <20221214113554.29c0c196@elisabeth> (raw)
In-Reply-To: <Y5krLbRLTmtpJjaE@yekko>

On Wed, 14 Dec 2022 12:47:25 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Tue, Dec 13, 2022 at 11:49:18PM +0100, Stefano Brivio wrote:
> > On Mon,  5 Dec 2022 19:14:24 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> > > Currently we have special sockets for receiving datagrams from locahost
> > > which can use the optimized "splice" path rather than going across the tap
> > > interface.
> > > 
> > > We want to loosen this so that sockets can receive sockets that will be
> > > forwarded by both the spliced and non-spliced paths.  To do this, we alter
> > > the meaning of the @splice bit in the reference to mean that packets
> > > receieved on this socket *can* be spliced, not that they *will* be spliced.
> > > They'll only actually be spliced if they come from 127.0.0.1 or ::1.
> > > 
> > > We can't (for now) remove the splice bit entirely, unlike with TCP.  Our
> > > gateway mapping means that if the ns initiates communication to the gw
> > > address, we'll translate that to target 127.0.0.1 on the host side.  Reply
> > > packets will therefore have source address 127.0.0.1 when received on the
> > > host, but these need to go via the tap path where that will be translated
> > > back to the gateway address.  We need the @splice bit to distinguish that
> > > case from packets going from localhost to a port mapped explicitly with
> > > -u which should be spliced.
> > > 
> > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > > ---
> > >  udp.c | 54 +++++++++++++++++++++++++++++++++++-------------------
> > >  udp.h |  2 +-
> > >  2 files changed, 36 insertions(+), 20 deletions(-)
> > > 
> > > diff --git a/udp.c b/udp.c
> > > index 6ccfe8c..011a157 100644
> > > --- a/udp.c
> > > +++ b/udp.c
> > > @@ -513,16 +513,27 @@ static int udp_splice_new_ns(void *arg)
> > >  }
> > >  
> > >  /**
> > > - * sa_port() - Determine port from a sockaddr_in or sockaddr_in6
> > > + * udp_mmh_splice_port() - Is source address of message suitable for splicing?
> > >   * @v6:		Is @sa a sockaddr_in6 (otherwise sockaddr_in)?
> > > - * @sa:		Pointer to either sockaddr_in or sockaddr_in6
> > > + * @mmh:	mmsghdr of incoming message
> > > + *
> > > + * Return: if @sa refers to localhost (127.0.0.1 or ::1) the port from
> > > + *         @sa, otherwise 0.
> > > + *
> > > + * NOTE: this relies on the fact that it's not valid to use UDP port 0  
> > 
> > The port is reserved by IANA indeed, but... it can actually be used. On
> > Linux, you can bind() it and you can connect() to it. As far as I can
> > tell from the new version of udp_sock_handler() we would actually
> > misdirect packets in that case.  
> 
> Hm, ok.  Given the IANA reservation, I think it would be acceptable to
> simply drop such packets - but if we were to make that choice we
> should do so explicitly, rather than misdirecting them.

Acceptable, sure, but... I don't know, it somehow doesn't look
desirable to me. The kernel doesn't enforce this, so I guess we
shouldn't either.

> > How bad would it be to use an int here?  
> 
> Pretty straightforward.  Just means we have to use the somewhat
> abtruse "if (port <= USHRT_MAX)" or "if (port >= 0)" or something
> instead of just "if (port)".  Should I go ahead and make that change?

Eh, I don't like it either, but... I guess it's better than the
alternative, so yes, thanks. Or pass port as a pointer, set on return.
I'm fine with both.

> > By the way, I think the comment should also mention that the port is
> > returned in host order.  
> 
> Ok, easily done.  Generally I try to keep the endianness associated
> with the type, rather than attempting to document it for each variable
> (or even worse, each point in time for each variable).

Yes, I see, and it's a more valid approach than mine, but still mine
comes almost for free.

By the way, I got distracted by this and I forgot about two things:

> +static in_port_t udp_mmh_splice_port(bool v6, const struct mmsghdr *mmh)
>  {
> -	const struct sockaddr_in6 *sa6 = sa;
> -	const struct sockaddr_in *sa4 = sa;
> +	const struct sockaddr_in6 *sa6 = mmh->msg_hdr.msg_name;
> +	const struct sockaddr_in *sa4 = mmh->msg_hdr.msg_name;;

Stray semicolon here.

> +
> +	if (v6 && IN6_IS_ADDR_LOOPBACK(&sa6->sin6_addr))
> +		return ntohs(sa6->sin6_port);
>  
> -	return v6 ? ntohs(sa6->sin6_port) : ntohs(sa4->sin_port);
> +	if (ntohl(sa4->sin_addr.s_addr) == INADDR_LOOPBACK)
> +		return ntohs(sa4->sin_port);

If it's IPv6, but not a loopback address, we'll check if
sa4->sin_addr.s_addr == INADDR_LOOPBACK -- which might actually be true for
an IPv6, non-loopback address.

Also, I think we can happily "splice" for any loopback address, not
just 127.0.0.1. What about something like:

	if (v6 && IN6_IS_ADDR_LOOPBACK(&sa6->sin6_addr))
		return ntohs(sa6->sin6_port);

	if (!v4 && IN4_IS_ADDR_LOOPBACK(&sa4->sin_addr))
		return ntohs(sa4->sin_port);

	return -1;

?

-- 
Stefano


  reply	other threads:[~2022-12-14 10:36 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-05  8:14 [PATCH 0/8] Don't use additional sockets for receiving "spliced" UDP communications David Gibson
2022-12-05  8:14 ` [PATCH 1/8] udp: Move sending pasta tap frames to the end of udp_sock_handler() David Gibson
2022-12-05  8:14 ` [PATCH 2/8] udp: Split sending to passt tap interface into separate function David Gibson
2022-12-05  8:14 ` [PATCH 3/8] udp: Split receive from preparation and send in udp_sock_handler() David Gibson
2022-12-05  8:14 ` [PATCH 4/8] udp: Receive multiple datagrams at once on the pasta sock->tap path David Gibson
2022-12-13 22:48   ` Stefano Brivio
2022-12-14  1:42     ` David Gibson
2022-12-14 10:35       ` Stefano Brivio
2022-12-20 10:42         ` Stefano Brivio
2022-12-21  6:00           ` David Gibson
2022-12-22  2:37             ` Stefano Brivio
2023-01-04  0:08             ` Stefano Brivio
2023-01-04  4:53               ` David Gibson
2022-12-05  8:14 ` [PATCH 5/8] udp: Pre-populate msg_names with local address David Gibson
2022-12-13 22:48   ` Stefano Brivio
2022-12-14  1:22     ` David Gibson
2022-12-05  8:14 ` [PATCH 6/8] udp: Unify udp_sock_handler_splice() with udp_sock_handler() David Gibson
2022-12-13 22:48   ` Stefano Brivio
2022-12-14  1:19     ` David Gibson
2022-12-14 10:35       ` Stefano Brivio
2022-12-05  8:14 ` [PATCH 7/8] udp: Decide whether to "splice" per datagram rather than per socket David Gibson
2022-12-13 22:49   ` Stefano Brivio
2022-12-14  1:47     ` David Gibson
2022-12-14 10:35       ` Stefano Brivio [this message]
2022-12-15  0:33         ` David Gibson
2022-12-05  8:14 ` [PATCH 8/8] udp: Don't use separate sockets to listen for spliced packets David Gibson
2022-12-06  6:45 ` [PATCH 0/8] Don't use additional sockets for receiving "spliced" UDP communications Stefano Brivio
2022-12-06  6:46   ` Stefano Brivio

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20221214113554.29c0c196@elisabeth \
    --to=sbrivio@redhat.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=passt-dev@passt.top \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).