public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
* UDP "splicing" fairly broken :(
@ 2022-11-17  5:07 David Gibson
  2022-11-17  7:21 ` Stefano Brivio
  0 siblings, 1 reply; 5+ messages in thread
From: David Gibson @ 2022-11-17  5:07 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 1564 bytes --]

In preparation for trying to implement dual stack sockets for UDP,
I've been getting my head around how the UDP splicing works.  Alas,
I'm pretty sure that it's broken if there's not a one-to-one
correspondence between init side source ports and ns side destination
ports.  That will typically be the case, but given its UDP there's no
guarantee.

In addition, UDP servers in the ns will not see the correct port
numbers with getpeername().  That's also true of spliced TCP
connections (see https://bugs.passt.top/show_bug.cgi?id=39), but it's
more likely to matter for UDP (I don't know of any TCP protocols that
care about source port number on the server side, but there are some
common UDP protocols that have at least port number conventions on
both sides).

I can expand on the details later, but pasta will do the wrong thing
in at least some circumstances for both a single init side socket
sendto()ing packets to multiple different ports in the ns/guest and
multiple init side sockets send()ing to the same port in the ns/guest.

I think I know how to fix it, but it's not a trivial job.  So, the
question is do I embark on this now, or do I just remove UDP
"splicing" entirely for the time being (other than a minimum required
to make -U work)?  That would unblock dual stack UDP sockets and we
can attempt to reoptimize this later.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: UDP "splicing" fairly broken :(
  2022-11-17  5:07 UDP "splicing" fairly broken :( David Gibson
@ 2022-11-17  7:21 ` Stefano Brivio
  2022-11-17 10:30   ` David Gibson
  0 siblings, 1 reply; 5+ messages in thread
From: Stefano Brivio @ 2022-11-17  7:21 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Thu, 17 Nov 2022 16:07:32 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> In preparation for trying to implement dual stack sockets for UDP,
> I've been getting my head around how the UDP splicing works.  Alas,
> I'm pretty sure that it's broken if there's not a one-to-one
> correspondence between init side source ports and ns side destination
> ports.  That will typically be the case, but given its UDP there's no
> guarantee.

I understand the concern below, but I don't understand this part, that
is: in which other way is it broken?

> In addition, UDP servers in the ns will not see the correct port
> numbers with getpeername().  That's also true of spliced TCP
> connections (see https://bugs.passt.top/show_bug.cgi?id=39), but it's
> more likely to matter for UDP (I don't know of any TCP protocols that
> care about source port number on the server side, but there are some
> common UDP protocols that have at least port number conventions on
> both sides).

I can think of DHCP and DNS, for which we offer special handling
somehow. Still, if the flow is started by the guest or container,
replies should really come with a source port matching the destination
port used initially.

For TCP, I don't see this is as an issue at all.

> I can expand on the details later, but pasta will do the wrong thing
> in at least some circumstances for both a single init side socket
> sendto()ing packets to multiple different ports in the ns/guest and
> multiple init side sockets send()ing to the same port in the ns/guest.
> 
> I think I know how to fix it, but it's not a trivial job.  So, the
> question is do I embark on this now, or do I just remove UDP
> "splicing" entirely for the time being (other than a minimum required
> to make -U work)?  That would unblock dual stack UDP sockets and we
> can attempt to reoptimize this later.

So, I'm not really sure what's broken here, but in any case, UDP
"splicing" doesn't offer as much value as the TCP one does, the
difference in packet rate is not that big. I don't see a problem if we
want to remove it temporarily.

The only real concern I have is how easy it would be to add it back
after a rework not taking that functionality into account.

-- 
Stefano


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: UDP "splicing" fairly broken :(
  2022-11-17  7:21 ` Stefano Brivio
@ 2022-11-17 10:30   ` David Gibson
  2022-11-17 20:02     ` Stefano Brivio
  0 siblings, 1 reply; 5+ messages in thread
From: David Gibson @ 2022-11-17 10:30 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 6997 bytes --]

On Thu, Nov 17, 2022 at 08:21:00AM +0100, Stefano Brivio wrote:
> On Thu, 17 Nov 2022 16:07:32 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > In preparation for trying to implement dual stack sockets for UDP,
> > I've been getting my head around how the UDP splicing works.  Alas,
> > I'm pretty sure that it's broken if there's not a one-to-one
> > correspondence between init side source ports and ns side destination
> > ports.  That will typically be the case, but given its UDP there's no
> > guarantee.
> 
> I understand the concern below, but I don't understand this part, that
> is: in which other way is it broken?

Uh.. other than?

> > In addition, UDP servers in the ns will not see the correct port
> > numbers with getpeername().  That's also true of spliced TCP
> > connections (see https://bugs.passt.top/show_bug.cgi?id=39), but it's
> > more likely to matter for UDP (I don't know of any TCP protocols that
> > care about source port number on the server side, but there are some
> > common UDP protocols that have at least port number conventions on
> > both sides).
> 
> I can think of DHCP and DNS, for which we offer special handling
> somehow. Still, if the flow is started by the guest or container,
> replies should really come with a source port matching the destination
> port used initially.

I'm not concerned so much about replies coming from a different port
as a server which expects initial requests from a particular port.
Still not that likely, but more likely than with TCP.

> For TCP, I don't see this is as an issue at all.

I largely agree.

> > I can expand on the details later, but pasta will do the wrong thing
> > in at least some circumstances for both a single init side socket
> > sendto()ing packets to multiple different ports in the ns/guest and
> > multiple init side sockets send()ing to the same port in the ns/guest.
> > 
> > I think I know how to fix it, but it's not a trivial job.  So, the
> > question is do I embark on this now, or do I just remove UDP
> > "splicing" entirely for the time being (other than a minimum required
> > to make -U work)?  That would unblock dual stack UDP sockets and we
> > can attempt to reoptimize this later.
> 
> So, I'm not really sure what's broken here, but in any case, UDP

I'll fill in the details below.

> "splicing" doesn't offer as much value as the TCP one does, the
> difference in packet rate is not that big. I don't see a problem if we
> want to remove it temporarily.

Ok, good to know.

> The only real concern I have is how easy it would be to add it back
> after a rework not taking that functionality into account.

I actually think this will fit better with the tap path once I've made
the dual stack socket changes.

Ok, for the details of the problem.  I'm only considering the case
where the host side initiates the communication.  I think there are
similar cases the other way, but I haven't thought them through.

Scenario 1: one source port, multiple destination ports

Here pasta is running with -u 200 -u 300

  1. Client on the host opens UDP socket A and binds it to localhost:100

  2. Client sends datagram 1 on socket A to localhost:200 with sendto()
  3. Datagram 1 is received by pasta on splice socket B bound to
     localhost:200
  4. Because of the -U 200, pasta handles this in
     udp_sock_handler_splice(), ref has splice==UDP_TO_NS
  5, recvmmsg() gets a single datagram, from source localhost:100, so
     src==100
  6. udp_splice_map[v6][100].ns_conn_sock is empty, so we call
     udp_splice_connect_ns()
      6.1. udp_splice_connect() creates socket B*, and connects it
           to localhost:200 in the namespace
      6.2. udp_splice_map[v6][100].ns_conn_sock is populated with
           socket B*
  7. sendmmsg() forwards the datagram to socket B*
  8. Datagram 1 correctly reaches port 200 within the ns

  9. Client sends datagram 2 on socket A to localhost:300 with
     sendto()
  11. Datagram 2 is received by pasta on socket C bound to
      localhost:300
  10. Again, pasta handles this in udp_sock_handler_splice() with
      UDP_TO_NS.  Again, src==100
  11. udp_splice_map[v6][100] is populated with socket B* from above
  12. sendmmsg() forwads datagram 2 to socket B*
* 13, Datagram 2 is incorrectly delivered to port 200 within the ns,
      instead of port 300

Scenario 2: multiple source ports, one destination port

Here pasta is running with -u 1000

  1. Client on the host opens socket A bound to localhost:2000
  2. Client on the host opens socket B bound to localhost:3000
  2. Client sends datagram 1 from socket A to localhost:1000 with
     sendto()
  3. Client sends datagram 2 from socket B to localhost:1000 with
     sendto()
  
  4. Datagram 1 and 2 are both received by pasta on socket C bound to
     localhost:1000, with UDP_TO_NS
  5. Datagram 1 and 2 happen to both be received by the same
     recvmmsg(), in that order
  6. udp_sock_handler_splice() only examines udp_mmh_recv[0] and so
     sets src==2000
  7. udp_splice_map[v6][2000].ns_conn_sock is unpopulated, so
     udp_splice_connect_ns() is called
       7.1 udp_splice_connect creates socket C* and connects it to
           localhost:1000 within the guest, let's say it gets
	   ephemeral bound port 50000.  It's tagged with
	   UDP_BACK_TO_INIT
       7.2 udp_splice_map[v6][2000].ns_conn_sock is populated with
           socket C*
       7.3 udp_splice_map[v6][50000].init_bound_sock is populated with
           socket C
       7.4 udp_splice_map[v6][50000].init_dst_port is populated with 2000
  8. sendmmsg() forwads datagrams 1 & 2 to socket C*
  9. Datagrams 1 & 2 correctly delivered to port 1000 in the namespace

  10. Server within the namespace receives datagram 1 with
      recvfrom().  From address is localhost:50000 (socket C*)
  11. Server sends reply datagram 1* to localhost:50000 within the ns
  12. Server receives datagram 2 with recvfrom().  From address is
      again localhost:50000 (socket C*)
  13. Server sends reply datagram 2* to localhost:50000 within the ns

  14. pasta receives datagrams 1* and 2* on socket C*.
      UDP_BACK_TO_INIT and dst==50000 from the epoll ref

  15. udp_sock_handler_splice() sets s to socket C from
      udp_splice_map[v6][50000].init_bound_sock, and send_dst to 2000
      from udp_splice_map[v6][50000]
  16. sendmmsg() forwards datagram 1* on socket C to localhost:2000
  17. Datagram 1* correctly received by socket A on localhost:2000
  18. sendmmsg() forwards datagram 2* on socket C to localhost:2000
* 19. Datagram 2* incorrectly received by socket A on localhost:2000
      instead of socket B on localhost:3000

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: UDP "splicing" fairly broken :(
  2022-11-17 10:30   ` David Gibson
@ 2022-11-17 20:02     ` Stefano Brivio
  2022-11-18  5:08       ` David Gibson
  0 siblings, 1 reply; 5+ messages in thread
From: Stefano Brivio @ 2022-11-17 20:02 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Thu, 17 Nov 2022 21:30:03 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Thu, Nov 17, 2022 at 08:21:00AM +0100, Stefano Brivio wrote:
> > On Thu, 17 Nov 2022 16:07:32 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> > > In preparation for trying to implement dual stack sockets for UDP,
> > > I've been getting my head around how the UDP splicing works.  Alas,
> > > I'm pretty sure that it's broken if there's not a one-to-one
> > > correspondence between init side source ports and ns side destination
> > > ports.  That will typically be the case, but given its UDP there's no
> > > guarantee.  
> > 
> > I understand the concern below, but I don't understand this part, that
> > is: in which other way is it broken?  
> 
> Uh.. other than?

Sorry, I didn't understand it's just broken for the cases below, but
not in case one source port corresponds one destination port (you wrote
it, I misread).

So, well, I disagree on it being fairly broken: it works in the most
common case. This doesn't really matter though:

- it's an obstacle to unify IPv4 and IPv6 sockets

- it's definitely not pretty -- it was more of a sketch I wanted to
  rewrite for a long time

- it doesn't make a huge difference in packet rates (far from having
  the same impact as the TCP spliced connections)

- it's broken in those (albeit probably uncommon) cases below

...let's drop it for now, and add back a saner version later, now that
we know in much better detail how it should work and where the problems
might be.

> > > In addition, UDP servers in the ns will not see the correct port
> > > numbers with getpeername().  That's also true of spliced TCP
> > > connections (see https://bugs.passt.top/show_bug.cgi?id=39), but it's
> > > more likely to matter for UDP (I don't know of any TCP protocols that
> > > care about source port number on the server side, but there are some
> > > common UDP protocols that have at least port number conventions on
> > > both sides).  
> > 
> > I can think of DHCP and DNS, for which we offer special handling
> > somehow. Still, if the flow is started by the guest or container,
> > replies should really come with a source port matching the destination
> > port used initially.  
> 
> I'm not concerned so much about replies coming from a different port
> as a server which expects initial requests from a particular port.
> Still not that likely, but more likely than with TCP.
> 
> > For TCP, I don't see this is as an issue at all.  
> 
> I largely agree.
> 
> > > I can expand on the details later, but pasta will do the wrong thing
> > > in at least some circumstances for both a single init side socket
> > > sendto()ing packets to multiple different ports in the ns/guest and
> > > multiple init side sockets send()ing to the same port in the ns/guest.
> > > 
> > > I think I know how to fix it, but it's not a trivial job.  So, the
> > > question is do I embark on this now, or do I just remove UDP
> > > "splicing" entirely for the time being (other than a minimum required
> > > to make -U work)?  That would unblock dual stack UDP sockets and we
> > > can attempt to reoptimize this later.  
> > 
> > So, I'm not really sure what's broken here, but in any case, UDP  
> 
> I'll fill in the details below.
> 
> > "splicing" doesn't offer as much value as the TCP one does, the
> > difference in packet rate is not that big. I don't see a problem if we
> > want to remove it temporarily.  
> 
> Ok, good to know.
> 
> > The only real concern I have is how easy it would be to add it back
> > after a rework not taking that functionality into account.  
> 
> I actually think this will fit better with the tap path once I've made
> the dual stack socket changes.
> 
> Ok, for the details of the problem.  I'm only considering the case
> where the host side initiates the communication.  I think there are
> similar cases the other way, but I haven't thought them through.
> 
> Scenario 1: one source port, multiple destination ports
> 
> Here pasta is running with -u 200 -u 300
> 
>   1. Client on the host opens UDP socket A and binds it to localhost:100
> 
>   2. Client sends datagram 1 on socket A to localhost:200 with sendto()
>   3. Datagram 1 is received by pasta on splice socket B bound to
>      localhost:200
>   4. Because of the -U 200, pasta handles this in

-u 200 here

>      udp_sock_handler_splice(), ref has splice==UDP_TO_NS
>   5, recvmmsg() gets a single datagram, from source localhost:100, so
>      src==100
>   6. udp_splice_map[v6][100].ns_conn_sock is empty, so we call
>      udp_splice_connect_ns()
>       6.1. udp_splice_connect() creates socket B*, and connects it
>            to localhost:200 in the namespace
>       6.2. udp_splice_map[v6][100].ns_conn_sock is populated with
>            socket B*
>   7. sendmmsg() forwards the datagram to socket B*
>   8. Datagram 1 correctly reaches port 200 within the ns
> 
>   9. Client sends datagram 2 on socket A to localhost:300 with
>      sendto()
>   11. Datagram 2 is received by pasta on socket C bound to
>       localhost:300
>   10. Again, pasta handles this in udp_sock_handler_splice() with
>       UDP_TO_NS.  Again, src==100
>   11. udp_splice_map[v6][100] is populated with socket B* from above
>   12. sendmmsg() forwads datagram 2 to socket B*
> * 13, Datagram 2 is incorrectly delivered to port 200 within the ns,
>       instead of port 300
> 
> Scenario 2: multiple source ports, one destination port
> 
> Here pasta is running with -u 1000
> 
>   1. Client on the host opens socket A bound to localhost:2000
>   2. Client on the host opens socket B bound to localhost:3000
>   2. Client sends datagram 1 from socket A to localhost:1000 with
>      sendto()
>   3. Client sends datagram 2 from socket B to localhost:1000 with
>      sendto()
>   
>   4. Datagram 1 and 2 are both received by pasta on socket C bound to
>      localhost:1000, with UDP_TO_NS
>   5. Datagram 1 and 2 happen to both be received by the same
>      recvmmsg(), in that order
>   6. udp_sock_handler_splice() only examines udp_mmh_recv[0] and so
>      sets src==2000
>   7. udp_splice_map[v6][2000].ns_conn_sock is unpopulated, so
>      udp_splice_connect_ns() is called
>        7.1 udp_splice_connect creates socket C* and connects it to
>            localhost:1000 within the guest, let's say it gets
> 	   ephemeral bound port 50000.  It's tagged with
> 	   UDP_BACK_TO_INIT
>        7.2 udp_splice_map[v6][2000].ns_conn_sock is populated with
>            socket C*
>        7.3 udp_splice_map[v6][50000].init_bound_sock is populated with
>            socket C
>        7.4 udp_splice_map[v6][50000].init_dst_port is populated with 2000
>   8. sendmmsg() forwads datagrams 1 & 2 to socket C*
>   9. Datagrams 1 & 2 correctly delivered to port 1000 in the namespace
> 
>   10. Server within the namespace receives datagram 1 with
>       recvfrom().  From address is localhost:50000 (socket C*)
>   11. Server sends reply datagram 1* to localhost:50000 within the ns
>   12. Server receives datagram 2 with recvfrom().  From address is
>       again localhost:50000 (socket C*)
>   13. Server sends reply datagram 2* to localhost:50000 within the ns
> 
>   14. pasta receives datagrams 1* and 2* on socket C*.
>       UDP_BACK_TO_INIT and dst==50000 from the epoll ref
> 
>   15. udp_sock_handler_splice() sets s to socket C from
>       udp_splice_map[v6][50000].init_bound_sock, and send_dst to 2000
>       from udp_splice_map[v6][50000]
>   16. sendmmsg() forwards datagram 1* on socket C to localhost:2000
>   17. Datagram 1* correctly received by socket A on localhost:2000
>   18. sendmmsg() forwards datagram 2* on socket C to localhost:2000
> * 19. Datagram 2* incorrectly received by socket A on localhost:2000
>       instead of socket B on localhost:3000

Thanks a lot for the details, issues are clear to me now.

-- 
Stefano


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: UDP "splicing" fairly broken :(
  2022-11-17 20:02     ` Stefano Brivio
@ 2022-11-18  5:08       ` David Gibson
  0 siblings, 0 replies; 5+ messages in thread
From: David Gibson @ 2022-11-18  5:08 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 9776 bytes --]

On Thu, Nov 17, 2022 at 09:02:35PM +0100, Stefano Brivio wrote:
> On Thu, 17 Nov 2022 21:30:03 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Thu, Nov 17, 2022 at 08:21:00AM +0100, Stefano Brivio wrote:
> > > On Thu, 17 Nov 2022 16:07:32 +1100
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > >   
> > > > In preparation for trying to implement dual stack sockets for UDP,
> > > > I've been getting my head around how the UDP splicing works.  Alas,
> > > > I'm pretty sure that it's broken if there's not a one-to-one
> > > > correspondence between init side source ports and ns side destination
> > > > ports.  That will typically be the case, but given its UDP there's no
> > > > guarantee.  
> > > 
> > > I understand the concern below, but I don't understand this part, that
> > > is: in which other way is it broken?  
> > 
> > Uh.. other than?
> 
> Sorry, I didn't understand it's just broken for the cases below, but
> not in case one source port corresponds one destination port (you wrote
> it, I misread).

Ah, right.

> So, well, I disagree on it being fairly broken: it works in the most
> common case. This doesn't really matter though:

So, I'm less sanguine about this, as you probably realize.  Yes, it's
not a common case, but the failure mode is nasty.  It's not just that
it falls back to a slower path, nor even that it just drops packets.
At best it's going to lead to very hard to debug failures.  At worst,
it's a security problem: the second scenario potentially lets an
unprivileged host process intercept replies to another user's requests
to a server in the ns, or interpose incorrect reponses to that other
user's requests.

> - it's an obstacle to unify IPv4 and IPv6 sockets
> 
> - it's definitely not pretty -- it was more of a sketch I wanted to
>   rewrite for a long time
> 
> - it doesn't make a huge difference in packet rates (far from having
>   the same impact as the TCP spliced connections)
> 
> - it's broken in those (albeit probably uncommon) cases below
> 
> ...let's drop it for now, and add back a saner version later, now that
> we know in much better detail how it should work and where the problems
> might be.

I had a look at this today, and unfortunately it's less
straightforward than I first thought.  The difficulty is that the
splicing isn't just a performance hack, it's also the only way that
the host can access servers within the ns bound explicitly to
127.0.0.1 or ::1.  I'm not particularly convinced allowing that's a
good idea (it's a behavioural difference between passt and pasta for
one).  But, removing it now would be a big semantic change, and if we
do so we should make it consistent for tcp as well.

Still looking and thinking through options.

> > > > In addition, UDP servers in the ns will not see the correct port
> > > > numbers with getpeername().  That's also true of spliced TCP
> > > > connections (see https://bugs.passt.top/show_bug.cgi?id=39), but it's
> > > > more likely to matter for UDP (I don't know of any TCP protocols that
> > > > care about source port number on the server side, but there are some
> > > > common UDP protocols that have at least port number conventions on
> > > > both sides).  
> > > 
> > > I can think of DHCP and DNS, for which we offer special handling
> > > somehow. Still, if the flow is started by the guest or container,
> > > replies should really come with a source port matching the destination
> > > port used initially.  
> > 
> > I'm not concerned so much about replies coming from a different port
> > as a server which expects initial requests from a particular port.
> > Still not that likely, but more likely than with TCP.
> > 
> > > For TCP, I don't see this is as an issue at all.  
> > 
> > I largely agree.
> > 
> > > > I can expand on the details later, but pasta will do the wrong thing
> > > > in at least some circumstances for both a single init side socket
> > > > sendto()ing packets to multiple different ports in the ns/guest and
> > > > multiple init side sockets send()ing to the same port in the ns/guest.
> > > > 
> > > > I think I know how to fix it, but it's not a trivial job.  So, the
> > > > question is do I embark on this now, or do I just remove UDP
> > > > "splicing" entirely for the time being (other than a minimum required
> > > > to make -U work)?  That would unblock dual stack UDP sockets and we
> > > > can attempt to reoptimize this later.  
> > > 
> > > So, I'm not really sure what's broken here, but in any case, UDP  
> > 
> > I'll fill in the details below.
> > 
> > > "splicing" doesn't offer as much value as the TCP one does, the
> > > difference in packet rate is not that big. I don't see a problem if we
> > > want to remove it temporarily.  
> > 
> > Ok, good to know.
> > 
> > > The only real concern I have is how easy it would be to add it back
> > > after a rework not taking that functionality into account.  
> > 
> > I actually think this will fit better with the tap path once I've made
> > the dual stack socket changes.
> > 
> > Ok, for the details of the problem.  I'm only considering the case
> > where the host side initiates the communication.  I think there are
> > similar cases the other way, but I haven't thought them through.
> > 
> > Scenario 1: one source port, multiple destination ports
> > 
> > Here pasta is running with -u 200 -u 300
> > 
> >   1. Client on the host opens UDP socket A and binds it to localhost:100
> > 
> >   2. Client sends datagram 1 on socket A to localhost:200 with sendto()
> >   3. Datagram 1 is received by pasta on splice socket B bound to
> >      localhost:200
> >   4. Because of the -U 200, pasta handles this in
> 
> -u 200 here
> 
> >      udp_sock_handler_splice(), ref has splice==UDP_TO_NS
> >   5, recvmmsg() gets a single datagram, from source localhost:100, so
> >      src==100
> >   6. udp_splice_map[v6][100].ns_conn_sock is empty, so we call
> >      udp_splice_connect_ns()
> >       6.1. udp_splice_connect() creates socket B*, and connects it
> >            to localhost:200 in the namespace
> >       6.2. udp_splice_map[v6][100].ns_conn_sock is populated with
> >            socket B*
> >   7. sendmmsg() forwards the datagram to socket B*
> >   8. Datagram 1 correctly reaches port 200 within the ns
> > 
> >   9. Client sends datagram 2 on socket A to localhost:300 with
> >      sendto()
> >   11. Datagram 2 is received by pasta on socket C bound to
> >       localhost:300
> >   10. Again, pasta handles this in udp_sock_handler_splice() with
> >       UDP_TO_NS.  Again, src==100
> >   11. udp_splice_map[v6][100] is populated with socket B* from above
> >   12. sendmmsg() forwads datagram 2 to socket B*
> > * 13, Datagram 2 is incorrectly delivered to port 200 within the ns,
> >       instead of port 300
> > 
> > Scenario 2: multiple source ports, one destination port
> > 
> > Here pasta is running with -u 1000
> > 
> >   1. Client on the host opens socket A bound to localhost:2000
> >   2. Client on the host opens socket B bound to localhost:3000
> >   2. Client sends datagram 1 from socket A to localhost:1000 with
> >      sendto()
> >   3. Client sends datagram 2 from socket B to localhost:1000 with
> >      sendto()
> >   
> >   4. Datagram 1 and 2 are both received by pasta on socket C bound to
> >      localhost:1000, with UDP_TO_NS
> >   5. Datagram 1 and 2 happen to both be received by the same
> >      recvmmsg(), in that order
> >   6. udp_sock_handler_splice() only examines udp_mmh_recv[0] and so
> >      sets src==2000
> >   7. udp_splice_map[v6][2000].ns_conn_sock is unpopulated, so
> >      udp_splice_connect_ns() is called
> >        7.1 udp_splice_connect creates socket C* and connects it to
> >            localhost:1000 within the guest, let's say it gets
> > 	   ephemeral bound port 50000.  It's tagged with
> > 	   UDP_BACK_TO_INIT
> >        7.2 udp_splice_map[v6][2000].ns_conn_sock is populated with
> >            socket C*
> >        7.3 udp_splice_map[v6][50000].init_bound_sock is populated with
> >            socket C
> >        7.4 udp_splice_map[v6][50000].init_dst_port is populated with 2000
> >   8. sendmmsg() forwads datagrams 1 & 2 to socket C*
> >   9. Datagrams 1 & 2 correctly delivered to port 1000 in the namespace
> > 
> >   10. Server within the namespace receives datagram 1 with
> >       recvfrom().  From address is localhost:50000 (socket C*)
> >   11. Server sends reply datagram 1* to localhost:50000 within the ns
> >   12. Server receives datagram 2 with recvfrom().  From address is
> >       again localhost:50000 (socket C*)
> >   13. Server sends reply datagram 2* to localhost:50000 within the ns
> > 
> >   14. pasta receives datagrams 1* and 2* on socket C*.
> >       UDP_BACK_TO_INIT and dst==50000 from the epoll ref
> > 
> >   15. udp_sock_handler_splice() sets s to socket C from
> >       udp_splice_map[v6][50000].init_bound_sock, and send_dst to 2000
> >       from udp_splice_map[v6][50000]
> >   16. sendmmsg() forwards datagram 1* on socket C to localhost:2000
> >   17. Datagram 1* correctly received by socket A on localhost:2000
> >   18. sendmmsg() forwards datagram 2* on socket C to localhost:2000
> > * 19. Datagram 2* incorrectly received by socket A on localhost:2000
> >       instead of socket B on localhost:3000
> 
> Thanks a lot for the details, issues are clear to me now.
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-11-18  5:25 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-17  5:07 UDP "splicing" fairly broken :( David Gibson
2022-11-17  7:21 ` Stefano Brivio
2022-11-17 10:30   ` David Gibson
2022-11-17 20:02     ` Stefano Brivio
2022-11-18  5:08       ` David Gibson

Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).