From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: passt-dev@passt.top
Subject: Re: UDP "splicing" fairly broken :(
Date: Thu, 17 Nov 2022 21:30:03 +1100 [thread overview]
Message-ID: <Y3YNKzFIz0J4RFYs@yekko> (raw)
In-Reply-To: <20221117082100.5fb5ee2d@elisabeth>
[-- Attachment #1: Type: text/plain, Size: 6997 bytes --]
On Thu, Nov 17, 2022 at 08:21:00AM +0100, Stefano Brivio wrote:
> On Thu, 17 Nov 2022 16:07:32 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
>
> > In preparation for trying to implement dual stack sockets for UDP,
> > I've been getting my head around how the UDP splicing works. Alas,
> > I'm pretty sure that it's broken if there's not a one-to-one
> > correspondence between init side source ports and ns side destination
> > ports. That will typically be the case, but given its UDP there's no
> > guarantee.
>
> I understand the concern below, but I don't understand this part, that
> is: in which other way is it broken?
Uh.. other than?
> > In addition, UDP servers in the ns will not see the correct port
> > numbers with getpeername(). That's also true of spliced TCP
> > connections (see https://bugs.passt.top/show_bug.cgi?id=39), but it's
> > more likely to matter for UDP (I don't know of any TCP protocols that
> > care about source port number on the server side, but there are some
> > common UDP protocols that have at least port number conventions on
> > both sides).
>
> I can think of DHCP and DNS, for which we offer special handling
> somehow. Still, if the flow is started by the guest or container,
> replies should really come with a source port matching the destination
> port used initially.
I'm not concerned so much about replies coming from a different port
as a server which expects initial requests from a particular port.
Still not that likely, but more likely than with TCP.
> For TCP, I don't see this is as an issue at all.
I largely agree.
> > I can expand on the details later, but pasta will do the wrong thing
> > in at least some circumstances for both a single init side socket
> > sendto()ing packets to multiple different ports in the ns/guest and
> > multiple init side sockets send()ing to the same port in the ns/guest.
> >
> > I think I know how to fix it, but it's not a trivial job. So, the
> > question is do I embark on this now, or do I just remove UDP
> > "splicing" entirely for the time being (other than a minimum required
> > to make -U work)? That would unblock dual stack UDP sockets and we
> > can attempt to reoptimize this later.
>
> So, I'm not really sure what's broken here, but in any case, UDP
I'll fill in the details below.
> "splicing" doesn't offer as much value as the TCP one does, the
> difference in packet rate is not that big. I don't see a problem if we
> want to remove it temporarily.
Ok, good to know.
> The only real concern I have is how easy it would be to add it back
> after a rework not taking that functionality into account.
I actually think this will fit better with the tap path once I've made
the dual stack socket changes.
Ok, for the details of the problem. I'm only considering the case
where the host side initiates the communication. I think there are
similar cases the other way, but I haven't thought them through.
Scenario 1: one source port, multiple destination ports
Here pasta is running with -u 200 -u 300
1. Client on the host opens UDP socket A and binds it to localhost:100
2. Client sends datagram 1 on socket A to localhost:200 with sendto()
3. Datagram 1 is received by pasta on splice socket B bound to
localhost:200
4. Because of the -U 200, pasta handles this in
udp_sock_handler_splice(), ref has splice==UDP_TO_NS
5, recvmmsg() gets a single datagram, from source localhost:100, so
src==100
6. udp_splice_map[v6][100].ns_conn_sock is empty, so we call
udp_splice_connect_ns()
6.1. udp_splice_connect() creates socket B*, and connects it
to localhost:200 in the namespace
6.2. udp_splice_map[v6][100].ns_conn_sock is populated with
socket B*
7. sendmmsg() forwards the datagram to socket B*
8. Datagram 1 correctly reaches port 200 within the ns
9. Client sends datagram 2 on socket A to localhost:300 with
sendto()
11. Datagram 2 is received by pasta on socket C bound to
localhost:300
10. Again, pasta handles this in udp_sock_handler_splice() with
UDP_TO_NS. Again, src==100
11. udp_splice_map[v6][100] is populated with socket B* from above
12. sendmmsg() forwads datagram 2 to socket B*
* 13, Datagram 2 is incorrectly delivered to port 200 within the ns,
instead of port 300
Scenario 2: multiple source ports, one destination port
Here pasta is running with -u 1000
1. Client on the host opens socket A bound to localhost:2000
2. Client on the host opens socket B bound to localhost:3000
2. Client sends datagram 1 from socket A to localhost:1000 with
sendto()
3. Client sends datagram 2 from socket B to localhost:1000 with
sendto()
4. Datagram 1 and 2 are both received by pasta on socket C bound to
localhost:1000, with UDP_TO_NS
5. Datagram 1 and 2 happen to both be received by the same
recvmmsg(), in that order
6. udp_sock_handler_splice() only examines udp_mmh_recv[0] and so
sets src==2000
7. udp_splice_map[v6][2000].ns_conn_sock is unpopulated, so
udp_splice_connect_ns() is called
7.1 udp_splice_connect creates socket C* and connects it to
localhost:1000 within the guest, let's say it gets
ephemeral bound port 50000. It's tagged with
UDP_BACK_TO_INIT
7.2 udp_splice_map[v6][2000].ns_conn_sock is populated with
socket C*
7.3 udp_splice_map[v6][50000].init_bound_sock is populated with
socket C
7.4 udp_splice_map[v6][50000].init_dst_port is populated with 2000
8. sendmmsg() forwads datagrams 1 & 2 to socket C*
9. Datagrams 1 & 2 correctly delivered to port 1000 in the namespace
10. Server within the namespace receives datagram 1 with
recvfrom(). From address is localhost:50000 (socket C*)
11. Server sends reply datagram 1* to localhost:50000 within the ns
12. Server receives datagram 2 with recvfrom(). From address is
again localhost:50000 (socket C*)
13. Server sends reply datagram 2* to localhost:50000 within the ns
14. pasta receives datagrams 1* and 2* on socket C*.
UDP_BACK_TO_INIT and dst==50000 from the epoll ref
15. udp_sock_handler_splice() sets s to socket C from
udp_splice_map[v6][50000].init_bound_sock, and send_dst to 2000
from udp_splice_map[v6][50000]
16. sendmmsg() forwards datagram 1* on socket C to localhost:2000
17. Datagram 1* correctly received by socket A on localhost:2000
18. sendmmsg() forwards datagram 2* on socket C to localhost:2000
* 19. Datagram 2* incorrectly received by socket A on localhost:2000
instead of socket B on localhost:3000
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2022-11-17 10:30 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-17 5:07 UDP "splicing" fairly broken :( David Gibson
2022-11-17 7:21 ` Stefano Brivio
2022-11-17 10:30 ` David Gibson [this message]
2022-11-17 20:02 ` Stefano Brivio
2022-11-18 5:08 ` David Gibson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y3YNKzFIz0J4RFYs@yekko \
--to=david@gibson.dropbear.id.au \
--cc=passt-dev@passt.top \
--cc=sbrivio@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).