From: David Gibson <david@gibson.dropbear.id.au>
To: passt-dev@passt.top, Stefano Brivio <sbrivio@redhat.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Subject: [PATCH v4 7/8] udp: Decide whether to "splice" per datagram rather than per socket
Date: Thu, 5 Jan 2023 15:26:24 +1100
Message-ID: <20230105042625.1981812-8-david@gibson.dropbear.id.au>
In-Reply-To: <20230105042625.1981812-1-david@gibson.dropbear.id.au>
Currently we have special sockets for receiving datagrams from localhost
which can use the optimized "splice" path rather than going across the tap
interface.

We want to loosen this so that sockets can receive datagrams that will be
forwarded by both the spliced and non-spliced paths. To do this, we alter
the meaning of the @splice bit in the reference to mean that packets
received on this socket *can* be spliced, not that they *will* be spliced.
They'll only actually be spliced if they come from 127.0.0.1 or ::1.

We can't (for now) remove the splice bit entirely, unlike with TCP. Our
gateway mapping means that if the ns initiates communication to the gw
address, we'll translate that to target 127.0.0.1 on the host side. Reply
packets will therefore have source address 127.0.0.1 when received on the
host, but these need to go via the tap path, where the source address will
be translated back to the gateway address. We need the @splice bit to
distinguish that case from packets going from localhost to a port mapped
explicitly with -u, which should be spliced.
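
For illustration only (this is not part of the patch): a minimal,
self-contained sketch of the per-datagram decision described above. The
names splice_port() and may_splice are hypothetical. The IPv4 check here
compares against exactly 127.0.0.1, as in the text above; the patch itself
uses the project's own IN4_IS_ADDR_LOOPBACK() macro, whose exact definition
is not shown in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/*
 * splice_port() - hypothetical stand-in for the per-datagram check
 * @may_splice:	Does the receiving socket allow splicing at all?
 * @v6:		Is @sa a sockaddr_in6 (otherwise sockaddr_in)?
 * @sa:		Source address of the received datagram
 *
 * Return: source port in host order if the datagram may be spliced and
 *         comes from 127.0.0.1 or ::1, otherwise -1 (use the tap path)
 */
static int splice_port(bool may_splice, bool v6, const void *sa)
{
	assert(sa != NULL);

	/* Socket not eligible: everything goes via tap, whatever the source */
	if (!may_splice)
		return -1;

	if (v6) {
		const struct sockaddr_in6 *sa6 = sa;

		if (IN6_IS_ADDR_LOOPBACK(&sa6->sin6_addr))
			return ntohs(sa6->sin6_port);
	} else {
		const struct sockaddr_in *sa4 = sa;

		/* Assumption: only the exact address 127.0.0.1 counts */
		if (sa4->sin_addr.s_addr == htonl(INADDR_LOOPBACK))
			return ntohs(sa4->sin_port);
	}

	return -1;
}
```

The may_splice argument mirrors the role of the @splice bit: a reply from
127.0.0.1 arriving on a socket without the bit set still yields -1, so it
takes the tap path and can have its source address translated back to the
gateway address.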
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
udp.c | 52 +++++++++++++++++++++++++++++++++-------------------
udp.h | 2 +-
2 files changed, 34 insertions(+), 20 deletions(-)
diff --git a/udp.c b/udp.c
index f160ecc..3f216ed 100644
--- a/udp.c
+++ b/udp.c
@@ -513,16 +513,25 @@ static int udp_splice_new_ns(void *arg)
}
/**
- * sa_port() - Determine port from a sockaddr_in or sockaddr_in6
+ * udp_mmh_splice_port() - Is source address of message suitable for splicing?
* @v6: Is @sa a sockaddr_in6 (otherwise sockaddr_in)?
- * @sa: Pointer to either sockaddr_in or sockaddr_in6
+ * @mmh: mmsghdr of incoming message
+ *
+ * Return: if @sa refers to localhost (127.0.0.1 or ::1) the port from
+ * @sa in host order, otherwise -1.
*/
-static in_port_t sa_port(bool v6, const void *sa)
+static int udp_mmh_splice_port(bool v6, const struct mmsghdr *mmh)
{
- const struct sockaddr_in6 *sa6 = sa;
- const struct sockaddr_in *sa4 = sa;
+ const struct sockaddr_in6 *sa6 = mmh->msg_hdr.msg_name;
+ const struct sockaddr_in *sa4 = mmh->msg_hdr.msg_name;
+
+ if (v6 && IN6_IS_ADDR_LOOPBACK(&sa6->sin6_addr))
+ return ntohs(sa6->sin6_port);
+
+ if (!v6 && IN4_IS_ADDR_LOOPBACK(&sa4->sin_addr))
+ return ntohs(sa4->sin_port);
- return v6 ? ntohs(sa6->sin6_port) : ntohs(sa4->sin_port);
+ return -1;
}
/**
@@ -926,23 +935,28 @@ void udp_sock_handler(const struct ctx *c, union epoll_ref ref, uint32_t events,
if (n <= 0)
return;
- if (!ref.r.p.udp.udp.splice) {
- udp_tap_send(c, 0, n, dstport, v6, now);
- return;
- }
-
for (i = 0; i < n; i += m) {
- in_port_t src = sa_port(v6, mmh_recv[i].msg_hdr.msg_name);
+ int splicefrom = -1;
+ m = n;
+
+ if (ref.r.p.udp.udp.splice) {
+ splicefrom = udp_mmh_splice_port(v6, mmh_recv + i);
+
+ for (m = 1; i + m < n; m++) {
+ int p;
- for (m = 1; i + m < n; m++) {
- void *mname = mmh_recv[i + m].msg_hdr.msg_name;
- if (sa_port(v6, mname) != src)
- break;
+ p = udp_mmh_splice_port(v6, mmh_recv + i + m);
+ if (p != splicefrom)
+ break;
+ }
}
- udp_splice_sendfrom(c, i, m, src, dstport, v6,
- ref.r.p.udp.udp.ns, ref.r.p.udp.udp.orig,
- now);
+ if (splicefrom >= 0)
+ udp_splice_sendfrom(c, i, m, splicefrom, dstport,
+ v6, ref.r.p.udp.udp.ns,
+ ref.r.p.udp.udp.orig, now);
+ else
+ udp_tap_send(c, i, m, dstport, v6, now);
}
}
diff --git a/udp.h b/udp.h
index 053991e..2a03335 100644
--- a/udp.h
+++ b/udp.h
@@ -22,7 +22,7 @@ void udp_update_l2_buf(const unsigned char *eth_d, const unsigned char *eth_s,
/**
* union udp_epoll_ref - epoll reference portion for TCP connections
* @bound: Set if this file descriptor is a bound socket
- * @splice: Set if descriptor is associated to "spliced" connection
+ * @splice: Set if descriptor packets to be "spliced"
* @orig: Set if a spliced socket which can originate "connections"
* @ns: Set if this is a socket in the pasta network namespace
* @v6: Set for IPv6 sockets or connections
--
2.39.0
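
Editorial note, not part of the patch: the reworked loop in
udp_sock_handler() walks the received batch in runs of consecutive
datagrams that share the same splice decision, and hands each run to either
the spliced or the tap sender. A rough standalone sketch of that grouping
follows. It assumes the per-datagram result (source port, or -1 for "send
via tap") has already been computed into an array; run_length(),
count_batches() and struct batch_counts are hypothetical names used only
for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tally of how many runs went to each path */
struct batch_counts {
	unsigned spliced;	/* Runs with a common loopback source port */
	unsigned tap;		/* Runs that must go across the tap path */
};

/*
 * run_length() - Count consecutive entries equal to key[i], from i
 * @key:	Per-datagram result: source port, or -1 for the tap path
 * @n:		Number of entries in @key
 * @i:		Index of the first entry of the run
 *
 * Return: number of entries in the run, at least 1
 */
static size_t run_length(const int *key, size_t n, size_t i)
{
	size_t m;

	assert(i < n);

	for (m = 1; i + m < n; m++) {
		if (key[i + m] != key[i])
			break;
	}

	return m;
}

/* Walk the whole batch run by run, as the handler loop does */
static struct batch_counts count_batches(const int *key, size_t n)
{
	struct batch_counts c = { 0, 0 };
	size_t i, m;

	for (i = 0; i < n; i += m) {
		m = run_length(key, n, i);

		if (key[i] >= 0)
			c.spliced++;	/* Would be one spliced send of m datagrams */
		else
			c.tap++;	/* Would be one tap send of m datagrams */
	}

	return c;
}
```

One difference from the patch: when the @splice bit is clear, the patch
skips the per-datagram check and treats the whole batch as a single tap run
(m = n). The sketch reaches the same result if every entry of the array is
-1.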