public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: passt-dev@passt.top
Subject: Re: [PATCH 7/8] udp: Decide whether to "splice" per datagram rather than per socket
Date: Thu, 15 Dec 2022 11:33:13 +1100	[thread overview]
Message-ID: <Y5prSckYUr7qEvzX@yekko> (raw)
In-Reply-To: <20221214113554.29c0c196@elisabeth>

[-- Attachment #1: Type: text/plain, Size: 6209 bytes --]

On Wed, Dec 14, 2022 at 11:35:54AM +0100, Stefano Brivio wrote:
> On Wed, 14 Dec 2022 12:47:25 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Tue, Dec 13, 2022 at 11:49:18PM +0100, Stefano Brivio wrote:
> > > On Mon,  5 Dec 2022 19:14:24 +1100
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > >   
> > > > Currently we have special sockets for receiving datagrams from locahost
> > > > which can use the optimized "splice" path rather than going across the tap
> > > > interface.
> > > > 
> > > > We want to loosen this so that sockets can receive sockets that will be
> > > > forwarded by both the spliced and non-spliced paths.  To do this, we alter
> > > > the meaning of the @splice bit in the reference to mean that packets
> > > > receieved on this socket *can* be spliced, not that they *will* be spliced.
> > > > They'll only actually be spliced if they come from 127.0.0.1 or ::1.
> > > > 
> > > > We can't (for now) remove the splice bit entirely, unlike with TCP.  Our
> > > > gateway mapping means that if the ns initiates communication to the gw
> > > > address, we'll translate that to target 127.0.0.1 on the host side.  Reply
> > > > packets will therefore have source address 127.0.0.1 when received on the
> > > > host, but these need to go via the tap path where that will be translated
> > > > back to the gateway address.  We need the @splice bit to distinguish that
> > > > case from packets going from localhost to a port mapped explicitly with
> > > > -u which should be spliced.
> > > > 
> > > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > > > ---
> > > >  udp.c | 54 +++++++++++++++++++++++++++++++++++-------------------
> > > >  udp.h |  2 +-
> > > >  2 files changed, 36 insertions(+), 20 deletions(-)
> > > > 
> > > > diff --git a/udp.c b/udp.c
> > > > index 6ccfe8c..011a157 100644
> > > > --- a/udp.c
> > > > +++ b/udp.c
> > > > @@ -513,16 +513,27 @@ static int udp_splice_new_ns(void *arg)
> > > >  }
> > > >  
> > > >  /**
> > > > - * sa_port() - Determine port from a sockaddr_in or sockaddr_in6
> > > > + * udp_mmh_splice_port() - Is source address of message suitable for splicing?
> > > >   * @v6:		Is @sa a sockaddr_in6 (otherwise sockaddr_in)?
> > > > - * @sa:		Pointer to either sockaddr_in or sockaddr_in6
> > > > + * @mmh:	mmsghdr of incoming message
> > > > + *
> > > > + * Return: if @sa refers to localhost (127.0.0.1 or ::1) the port from
> > > > + *         @sa, otherwise 0.
> > > > + *
> > > > + * NOTE: this relies on the fact that it's not valid to use UDP port 0  
> > > 
> > > The port is reserved by IANA indeed, but... it can actually be used. On
> > > Linux, you can bind() it and you can connect() to it. As far as I can
> > > tell from the new version of udp_sock_handler() we would actually
> > > misdirect packets in that case.  
> > 
> > Hm, ok.  Given the IANA reservation, I think it would be acceptable to
> > simply drop such packets - but if we were to make that choice we
> > should do so explicitly, rather than misdirecting them.
> 
> Acceptable, sure, but... I don't know, it somehow doesn't look
> desirable to me. The kernel doesn't enforce this, so I guess we
> shouldn't either.
> 
> > > How bad would it be to use an int here?  
> > 
> > Pretty straightforward.  Just means we have to use the somewhat
> > abtruse "if (port <= USHRT_MAX)" or "if (port >= 0)" or something
> > instead of just "if (port)".  Should I go ahead and make that change?
> 
> Eh, I don't like it either, but... I guess it's better than the
> alternative, so yes, thanks. Or pass port as a pointer, set on return.
> I'm fine with both.

I think the int is less ugly than the pointer.  Ok, I'll make that change.

> > > By the way, I think the comment should also mention that the port is
> > > returned in host order.  
> > 
> > Ok, easily done.  Generally I try to keep the endianness associated
> > with the type, rather than attempting to document it for each variable
> > (or even worse, each point in time for each variable).
> 
> Yes, I see, and it's a more valid approach than mine, but still mine
> comes almost for free.

Right.  Actually rereading the function in question, it specifically
says "port from @sa" and in the sockaddr, of course, it's network
endian, so it could particularly do with clarification in this case.

> By the way, I got distracted by this and I forgot about two things:
> 
> > +static in_port_t udp_mmh_splice_port(bool v6, const struct mmsghdr *mmh)
> >  {
> > -	const struct sockaddr_in6 *sa6 = sa;
> > -	const struct sockaddr_in *sa4 = sa;
> > +	const struct sockaddr_in6 *sa6 = mmh->msg_hdr.msg_name;
> > +	const struct sockaddr_in *sa4 = mmh->msg_hdr.msg_name;;
> 
> Stray semicolon here.

Fixed.

> > +
> > +	if (v6 && IN6_IS_ADDR_LOOPBACK(&sa6->sin6_addr))
> > +		return ntohs(sa6->sin6_port);
> >  
> > -	return v6 ? ntohs(sa6->sin6_port) : ntohs(sa4->sin_port);
> > +	if (ntohl(sa4->sin_addr.s_addr) == INADDR_LOOPBACK)
> > +		return ntohs(sa4->sin_port);
> 
> If it's IPv6, but not a loopback address, we'll check if
> sa4->sin_addr.s_addr == INADDR_LOOPBACK -- which might actually be true for
> an IPv6, non-loopback address.

Oops, yes, that's definitely wrong, and wrong enough to respin.

> Also, I think we can happily "splice" for any loopback address, not
> just 127.0.0.1. What about something like:

Well.. yes and no.  Yes, we can deliver packets in this way, but we'll
lost track of the original from address, so reply packets will be
misdirected.

Hrm.. ISTR you mentioned some cases that worked despite that, so I'll
make the change, and hope to fix it up better when I get to the NAT /
splice semantics rework.

> 	if (v6 && IN6_IS_ADDR_LOOPBACK(&sa6->sin6_addr))
> 		return ntohs(sa6->sin6_port);
> 
> 	if (!v4 && IN4_IS_ADDR_LOOPBACK(&sa4->sin_addr))
> 		return ntohs(sa4->sin_port);
> 
> 	return -1;
> 
> ?
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2022-12-15  0:33 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-05  8:14 [PATCH 0/8] Don't use additional sockets for receiving "spliced" UDP communications David Gibson
2022-12-05  8:14 ` [PATCH 1/8] udp: Move sending pasta tap frames to the end of udp_sock_handler() David Gibson
2022-12-05  8:14 ` [PATCH 2/8] udp: Split sending to passt tap interface into separate function David Gibson
2022-12-05  8:14 ` [PATCH 3/8] udp: Split receive from preparation and send in udp_sock_handler() David Gibson
2022-12-05  8:14 ` [PATCH 4/8] udp: Receive multiple datagrams at once on the pasta sock->tap path David Gibson
2022-12-13 22:48   ` Stefano Brivio
2022-12-14  1:42     ` David Gibson
2022-12-14 10:35       ` Stefano Brivio
2022-12-20 10:42         ` Stefano Brivio
2022-12-21  6:00           ` David Gibson
2022-12-22  2:37             ` Stefano Brivio
2023-01-04  0:08             ` Stefano Brivio
2023-01-04  4:53               ` David Gibson
2022-12-05  8:14 ` [PATCH 5/8] udp: Pre-populate msg_names with local address David Gibson
2022-12-13 22:48   ` Stefano Brivio
2022-12-14  1:22     ` David Gibson
2022-12-05  8:14 ` [PATCH 6/8] udp: Unify udp_sock_handler_splice() with udp_sock_handler() David Gibson
2022-12-13 22:48   ` Stefano Brivio
2022-12-14  1:19     ` David Gibson
2022-12-14 10:35       ` Stefano Brivio
2022-12-05  8:14 ` [PATCH 7/8] udp: Decide whether to "splice" per datagram rather than per socket David Gibson
2022-12-13 22:49   ` Stefano Brivio
2022-12-14  1:47     ` David Gibson
2022-12-14 10:35       ` Stefano Brivio
2022-12-15  0:33         ` David Gibson [this message]
2022-12-05  8:14 ` [PATCH 8/8] udp: Don't use separate sockets to listen for spliced packets David Gibson
2022-12-06  6:45 ` [PATCH 0/8] Don't use additional sockets for receiving "spliced" UDP communications Stefano Brivio
2022-12-06  6:46   ` Stefano Brivio

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y5prSckYUr7qEvzX@yekko \
    --to=david@gibson.dropbear.id.au \
    --cc=passt-dev@passt.top \
    --cc=sbrivio@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).