* [PATCH v2 01/16] udp: Also bind() connected ports for "splice" forwarding
2022-11-24 1:16 [PATCH v2 00/16] Simplify and correct handling of "spliced" UDP forwarding David Gibson
@ 2022-11-24 1:16 ` David Gibson
2022-11-25 1:47 ` Stefano Brivio
2022-11-24 1:16 ` [PATCH v2 02/16] udp: Separate tracking of inbound and outbound packet flows David Gibson
` (14 subsequent siblings)
15 siblings, 1 reply; 29+ messages in thread
From: David Gibson @ 2022-11-24 1:16 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
pasta handles "spliced" port forwarding by resending datagrams received on
a bound socket in the init namespace to a connected socket in the guest
namespace. This means there are actually three ports associated with each
"connection". First there's the source and destination ports of the
originating datagram. That's also the destination port of the forwarded
datagram, but the source port of the forwarded datagram is the kernel
allocated bound address of the connected socket.
However, by bind()ing as well as connect()ing the forwarding socket we can
choose the source port of the forwarded datagrams. By choosing it to match
the original source port we remove that surprising third port number and
no longer need to store port numbers in struct udp_splice_port.
As a bonus this means that the recipient of the packets will see the
original source port if they call getpeername(). This rarely matters, but
it can't hurt.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
udp.c | 84 +++++++++++++++++++++++------------------------------------
1 file changed, 32 insertions(+), 52 deletions(-)
diff --git a/udp.c b/udp.c
index 0aa6308..a025a48 100644
--- a/udp.c
+++ b/udp.c
@@ -51,20 +51,19 @@
* - send packet to udp4_splice_map[5000].ns_conn_sock
* - otherwise:
* - create new socket udp_splice_map[V4][5000].ns_conn_sock
+ * - bind in namespace to 127.0.0.1:5000
* - connect in namespace to 127.0.0.1:80 (note: this destination port
* might be remapped to another port instead)
- * - get source port of new connected socket (10000) with getsockname()
- * - add to epoll with reference: index = 10000, splice: UDP_BACK_TO_INIT
- * - set udp_splice_map[V4][10000].init_bound_sock to s
- * - set udp_splice_map[V4][10000].init_dst_port to 5000
+ * - add to epoll with reference: index = 5000, splice: UDP_BACK_TO_INIT
+ * - set udp_splice_map[V4][5000].init_bound_sock to s
* - update udp_splice_map[V4][5000].ns_conn_ts with current time
*
- * - reverse direction: 127.0.0.1:80 -> 127.0.0.1:10000 in namespace from
- * connected socket s, having epoll reference: index = 10000,
+ * - reverse direction: 127.0.0.1:80 -> 127.0.0.1:5000 in namespace from
+ * connected socket s, having epoll reference: index = 5000,
* splice = UDP_BACK_TO_INIT
- * - if udp_splice_map[V4][10000].init_bound_sock:
- * - send to udp_splice_map[V4][10000].init_bound_sock, with destination
- * port udp_splice_map[V4][10000].init_dst_port (5000)
+ * - if udp_splice_map[V4][5000].init_bound_sock:
+ * - send to udp_splice_map[V4][5000].init_bound_sock, with destination
+ * port 5000
* - otherwise, discard
*
* - from namespace to init:
@@ -75,19 +74,18 @@
* - send packet to udp4_splice_map[2000].init_conn_sock
* - otherwise:
* - create new socket udp_splice_map[V4][2000].init_conn_sock
+ * - bind in init to 127.0.0.1:2000
* - connect in init to 127.0.0.1:22 (note: this destination port
* might be remapped to another port instead)
- * - get source port of new connected socket (4000) with getsockname()
- * - add to epoll with reference: index = 4000, splice = UDP_BACK_TO_NS
- * - set udp_splice_map[V4][4000].ns_bound_sock to s
- * - set udp_splice_map[V4][4000].ns_dst_port to 2000
- * - update udp_splice_map[V4][4000].init_conn_ts with current time
+ * - add to epoll with reference: index = 2000, splice = UDP_BACK_TO_NS
+ * - set udp_splice_map[V4][2000].ns_bound_sock to s
+ * - update udp_splice_map[V4][2000].init_conn_ts with current time
*
- * - reverse direction: 127.0.0.1:22 -> 127.0.0.1:4000 in init from connected
- * socket s, having epoll reference: index = 4000, splice = UDP_BACK_TO_NS
- * - if udp_splice_map[V4][4000].ns_bound_sock:
- * - send to udp_splice_map[V4][4000].ns_bound_sock, with destination port
- * udp_splice_map[4000].ns_dst_port (2000)
+ * - reverse direction: 127.0.0.1:22 -> 127.0.0.1:2000 in init from connected
+ * socket s, having epoll reference: index = 2000, splice = UDP_BACK_TO_NS
+ * - if udp_splice_map[V4][2000].ns_bound_sock:
+ * - send to udp_splice_map[V4][2000].ns_bound_sock, with destination port
+ * 2000
* - otherwise, discard
*/
@@ -145,8 +143,6 @@ struct udp_tap_port {
* @init_conn_sock: Socket connected in init for namespace source port
* @ns_conn_ts: Timestamp of activity for socket connected in namespace
* @init_conn_ts: Timestamp of activity for socket connceted in init
- * @ns_dst_port: Destination port in namespace for init source port
- * @init_dst_port: Destination port in init for namespace source port
* @ns_bound_sock: Bound socket in namespace for this source port in init
* @init_bound_sock: Bound socket in init for this source port in namespace
*/
@@ -157,9 +153,6 @@ struct udp_splice_port {
time_t ns_conn_ts;
time_t init_conn_ts;
- in_port_t ns_dst_port;
- in_port_t init_dst_port;
-
int ns_bound_sock;
int init_bound_sock;
};
@@ -425,11 +418,10 @@ int udp_splice_connect(const struct ctx *c, int v6, int bound_sock,
{
struct epoll_event ev = { .events = EPOLLIN | EPOLLRDHUP | EPOLLHUP };
union epoll_ref ref = { .r.proto = IPPROTO_UDP,
- .r.p.udp.udp = { .splice = splice, .v6 = v6 }
+ .r.p.udp.udp = { .splice = splice, .v6 = v6,
+ .port = src }
};
- struct sockaddr_storage sa;
- struct udp_splice_port *sp;
- socklen_t sl = sizeof(sa);
+ struct udp_splice_port *sp = &udp_splice_map[v6 ? V6 : V4][src];
int s;
s = socket(v6 ? AF_INET6 : AF_INET, SOCK_DGRAM | SOCK_NONBLOCK,
@@ -448,46 +440,34 @@ int udp_splice_connect(const struct ctx *c, int v6, int bound_sock,
if (v6) {
struct sockaddr_in6 addr6 = {
.sin6_family = AF_INET6,
- .sin6_port = htons(dst),
+ .sin6_port = htons(src),
.sin6_addr = IN6ADDR_LOOPBACK_INIT,
};
+ if (bind(s, (struct sockaddr *)&addr6, sizeof(addr6)))
+ goto fail;
+ addr6.sin6_port = htons(dst);
if (connect(s, (struct sockaddr *)&addr6, sizeof(addr6)))
goto fail;
} else {
struct sockaddr_in addr4 = {
.sin_family = AF_INET,
- .sin_port = htons(dst),
+ .sin_port = htons(src),
.sin_addr = { .s_addr = htonl(INADDR_LOOPBACK) },
};
+ if (bind(s, (struct sockaddr *)&addr4, sizeof(addr4)))
+ goto fail;
+ addr4.sin_port = htons(dst);
if (connect(s, (struct sockaddr *)&addr4, sizeof(addr4)))
goto fail;
}
- if (getsockname(s, (struct sockaddr *)&sa, &sl))
- goto fail;
-
- if (v6) {
- struct sockaddr_in6 sa6;
-
- memcpy(&sa6, &sa, sizeof(sa6));
- ref.r.p.udp.udp.port = ntohs(sa6.sin6_port);
- } else {
- struct sockaddr_in sa4;
-
- memcpy(&sa4, &sa, sizeof(sa4));
- ref.r.p.udp.udp.port = ntohs(sa4.sin_port);
- }
-
- sp = &udp_splice_map[v6 ? V6 : V4][ref.r.p.udp.udp.port];
if (splice == UDP_BACK_TO_INIT) {
sp->init_bound_sock = bound_sock;
- sp->init_dst_port = src;
- udp_splice_map[v6 ? V6 : V4][src].ns_conn_sock = s;
+ sp->ns_conn_sock = s;
bitmap_set(udp_act[v6 ? V6 : V4][UDP_ACT_NS_CONN], src);
} else if (splice == UDP_BACK_TO_NS) {
sp->ns_bound_sock = bound_sock;
- sp->ns_dst_port = src;
- udp_splice_map[v6 ? V6 : V4][src].init_conn_sock = s;
+ sp->init_conn_sock = s;
bitmap_set(udp_act[v6 ? V6 : V4][UDP_ACT_INIT_CONN], src);
}
@@ -591,7 +571,7 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
if (!(s = udp_splice_map[v6][dst].init_bound_sock))
return;
- send_dst = udp_splice_map[v6][dst].init_dst_port;
+ send_dst = dst;
break;
case UDP_TO_INIT:
src += c->udp.fwd_in.rdelta[src];
@@ -608,7 +588,7 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
if (!(s = udp_splice_map[v6][dst].ns_bound_sock))
return;
- send_dst = udp_splice_map[v6][dst].ns_dst_port;
+ send_dst = dst;
break;
default:
return;
--
@@ -51,20 +51,19 @@
* - send packet to udp4_splice_map[5000].ns_conn_sock
* - otherwise:
* - create new socket udp_splice_map[V4][5000].ns_conn_sock
+ * - bind in namespace to 127.0.0.1:5000
* - connect in namespace to 127.0.0.1:80 (note: this destination port
* might be remapped to another port instead)
- * - get source port of new connected socket (10000) with getsockname()
- * - add to epoll with reference: index = 10000, splice: UDP_BACK_TO_INIT
- * - set udp_splice_map[V4][10000].init_bound_sock to s
- * - set udp_splice_map[V4][10000].init_dst_port to 5000
+ * - add to epoll with reference: index = 5000, splice: UDP_BACK_TO_INIT
+ * - set udp_splice_map[V4][5000].init_bound_sock to s
* - update udp_splice_map[V4][5000].ns_conn_ts with current time
*
- * - reverse direction: 127.0.0.1:80 -> 127.0.0.1:10000 in namespace from
- * connected socket s, having epoll reference: index = 10000,
+ * - reverse direction: 127.0.0.1:80 -> 127.0.0.1:5000 in namespace from
+ * connected socket s, having epoll reference: index = 5000,
* splice = UDP_BACK_TO_INIT
- * - if udp_splice_map[V4][10000].init_bound_sock:
- * - send to udp_splice_map[V4][10000].init_bound_sock, with destination
- * port udp_splice_map[V4][10000].init_dst_port (5000)
+ * - if udp_splice_map[V4][5000].init_bound_sock:
+ * - send to udp_splice_map[V4][5000].init_bound_sock, with destination
+ * port 5000
* - otherwise, discard
*
* - from namespace to init:
@@ -75,19 +74,18 @@
* - send packet to udp4_splice_map[2000].init_conn_sock
* - otherwise:
* - create new socket udp_splice_map[V4][2000].init_conn_sock
+ * - bind in init to 127.0.0.1:2000
* - connect in init to 127.0.0.1:22 (note: this destination port
* might be remapped to another port instead)
- * - get source port of new connected socket (4000) with getsockname()
- * - add to epoll with reference: index = 4000, splice = UDP_BACK_TO_NS
- * - set udp_splice_map[V4][4000].ns_bound_sock to s
- * - set udp_splice_map[V4][4000].ns_dst_port to 2000
- * - update udp_splice_map[V4][4000].init_conn_ts with current time
+ * - add to epoll with reference: index = 2000, splice = UDP_BACK_TO_NS
+ * - set udp_splice_map[V4][2000].ns_bound_sock to s
+ * - update udp_splice_map[V4][2000].init_conn_ts with current time
*
- * - reverse direction: 127.0.0.1:22 -> 127.0.0.1:4000 in init from connected
- * socket s, having epoll reference: index = 4000, splice = UDP_BACK_TO_NS
- * - if udp_splice_map[V4][4000].ns_bound_sock:
- * - send to udp_splice_map[V4][4000].ns_bound_sock, with destination port
- * udp_splice_map[4000].ns_dst_port (2000)
+ * - reverse direction: 127.0.0.1:22 -> 127.0.0.1:2000 in init from connected
+ * socket s, having epoll reference: index = 2000, splice = UDP_BACK_TO_NS
+ * - if udp_splice_map[V4][2000].ns_bound_sock:
+ * - send to udp_splice_map[V4][2000].ns_bound_sock, with destination port
+ * 2000
* - otherwise, discard
*/
@@ -145,8 +143,6 @@ struct udp_tap_port {
* @init_conn_sock: Socket connected in init for namespace source port
* @ns_conn_ts: Timestamp of activity for socket connected in namespace
* @init_conn_ts: Timestamp of activity for socket connceted in init
- * @ns_dst_port: Destination port in namespace for init source port
- * @init_dst_port: Destination port in init for namespace source port
* @ns_bound_sock: Bound socket in namespace for this source port in init
* @init_bound_sock: Bound socket in init for this source port in namespace
*/
@@ -157,9 +153,6 @@ struct udp_splice_port {
time_t ns_conn_ts;
time_t init_conn_ts;
- in_port_t ns_dst_port;
- in_port_t init_dst_port;
-
int ns_bound_sock;
int init_bound_sock;
};
@@ -425,11 +418,10 @@ int udp_splice_connect(const struct ctx *c, int v6, int bound_sock,
{
struct epoll_event ev = { .events = EPOLLIN | EPOLLRDHUP | EPOLLHUP };
union epoll_ref ref = { .r.proto = IPPROTO_UDP,
- .r.p.udp.udp = { .splice = splice, .v6 = v6 }
+ .r.p.udp.udp = { .splice = splice, .v6 = v6,
+ .port = src }
};
- struct sockaddr_storage sa;
- struct udp_splice_port *sp;
- socklen_t sl = sizeof(sa);
+ struct udp_splice_port *sp = &udp_splice_map[v6 ? V6 : V4][src];
int s;
s = socket(v6 ? AF_INET6 : AF_INET, SOCK_DGRAM | SOCK_NONBLOCK,
@@ -448,46 +440,34 @@ int udp_splice_connect(const struct ctx *c, int v6, int bound_sock,
if (v6) {
struct sockaddr_in6 addr6 = {
.sin6_family = AF_INET6,
- .sin6_port = htons(dst),
+ .sin6_port = htons(src),
.sin6_addr = IN6ADDR_LOOPBACK_INIT,
};
+ if (bind(s, (struct sockaddr *)&addr6, sizeof(addr6)))
+ goto fail;
+ addr6.sin6_port = htons(dst);
if (connect(s, (struct sockaddr *)&addr6, sizeof(addr6)))
goto fail;
} else {
struct sockaddr_in addr4 = {
.sin_family = AF_INET,
- .sin_port = htons(dst),
+ .sin_port = htons(src),
.sin_addr = { .s_addr = htonl(INADDR_LOOPBACK) },
};
+ if (bind(s, (struct sockaddr *)&addr4, sizeof(addr4)))
+ goto fail;
+ addr4.sin_port = htons(dst);
if (connect(s, (struct sockaddr *)&addr4, sizeof(addr4)))
goto fail;
}
- if (getsockname(s, (struct sockaddr *)&sa, &sl))
- goto fail;
-
- if (v6) {
- struct sockaddr_in6 sa6;
-
- memcpy(&sa6, &sa, sizeof(sa6));
- ref.r.p.udp.udp.port = ntohs(sa6.sin6_port);
- } else {
- struct sockaddr_in sa4;
-
- memcpy(&sa4, &sa, sizeof(sa4));
- ref.r.p.udp.udp.port = ntohs(sa4.sin_port);
- }
-
- sp = &udp_splice_map[v6 ? V6 : V4][ref.r.p.udp.udp.port];
if (splice == UDP_BACK_TO_INIT) {
sp->init_bound_sock = bound_sock;
- sp->init_dst_port = src;
- udp_splice_map[v6 ? V6 : V4][src].ns_conn_sock = s;
+ sp->ns_conn_sock = s;
bitmap_set(udp_act[v6 ? V6 : V4][UDP_ACT_NS_CONN], src);
} else if (splice == UDP_BACK_TO_NS) {
sp->ns_bound_sock = bound_sock;
- sp->ns_dst_port = src;
- udp_splice_map[v6 ? V6 : V4][src].init_conn_sock = s;
+ sp->init_conn_sock = s;
bitmap_set(udp_act[v6 ? V6 : V4][UDP_ACT_INIT_CONN], src);
}
@@ -591,7 +571,7 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
if (!(s = udp_splice_map[v6][dst].init_bound_sock))
return;
- send_dst = udp_splice_map[v6][dst].init_dst_port;
+ send_dst = dst;
break;
case UDP_TO_INIT:
src += c->udp.fwd_in.rdelta[src];
@@ -608,7 +588,7 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
if (!(s = udp_splice_map[v6][dst].ns_bound_sock))
return;
- send_dst = udp_splice_map[v6][dst].ns_dst_port;
+ send_dst = dst;
break;
default:
return;
--
2.38.1
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH v2 01/16] udp: Also bind() connected ports for "splice" forwarding
2022-11-24 1:16 ` [PATCH v2 01/16] udp: Also bind() connected ports for "splice" forwarding David Gibson
@ 2022-11-25 1:47 ` Stefano Brivio
2022-11-25 7:01 ` David Gibson
0 siblings, 1 reply; 29+ messages in thread
From: Stefano Brivio @ 2022-11-25 1:47 UTC (permalink / raw)
To: David Gibson; +Cc: passt-dev
On Thu, 24 Nov 2022 12:16:44 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:
> pasta handles "spliced" port forwarding by resending datagrams received on
> a bound socket in the init namespace to a connected socket in the guest
> namespace. This means there are actually three ports associated with each
> "connection". First there's the source and destination ports of the
> originating datagram. That's also the destination port of the forwarded
> datagram, but the source port of the forwarded datagram is the kernel
> allocated bound address of the connected socket.
>
> However, by bind()ing as well as connect()ing the forwarding socket we can
> choose the source port of the forwarded datagrams. By choosing it to match
> the original source port we remove that surprising third port number and
> no longer need to store port numbers in struct udp_splice_port.
If you wondered, I think the whole connect() with getsockname() thing
without a bind() came from the fundamental misconception I had that you
couldn't connect() a bound socket -- and I didn't quite think of
dropping connect() as you do in 3/16 anyway.
There's one minor problem this introduces: the source port of the
originating datagram now needs to be free in the init namespace. It's
still better than the alternative problem you fix in 16/16, though.
I'm wondering if we could, _once you're done with all this_ (it already
looks complicated enough), revisit the 'goto fail' in
udp_splice_connect() (now udp_splice_new()) when bind() fails, and just
proceed with an ephemeral port then.
Also, I haven't tried, but I'm not sure if this introduces some kind of
DoS possibility: even if pasta forwards a single port, it should be
possible for a remote host to make pasta bind to a large amount of
non-ephemeral ports.
Maybe it would make sense to think of a limit on how many ports a
single peer could cause pasta to bind.
I'm not sure yet how we could track peers without a separate address
storage (even though keeping an LRU array should be feasible) -- the
simpler alternative, limiting bound ports by destination port, would
offer an even more convenient way to a DoS.
On the other hand, this is exceedingly minor I guess. We're binding
ports in the namespace after all, and we can reuse bound sockets.
--
Stefano
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 01/16] udp: Also bind() connected ports for "splice" forwarding
2022-11-25 1:47 ` Stefano Brivio
@ 2022-11-25 7:01 ` David Gibson
0 siblings, 0 replies; 29+ messages in thread
From: David Gibson @ 2022-11-25 7:01 UTC (permalink / raw)
To: Stefano Brivio; +Cc: passt-dev
[-- Attachment #1: Type: text/plain, Size: 3257 bytes --]
On Fri, Nov 25, 2022 at 02:47:51AM +0100, Stefano Brivio wrote:
> On Thu, 24 Nov 2022 12:16:44 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
>
> > pasta handles "spliced" port forwarding by resending datagrams received on
> > a bound socket in the init namespace to a connected socket in the guest
> > namespace. This means there are actually three ports associated with each
> > "connection". First there's the source and destination ports of the
> > originating datagram. That's also the destination port of the forwarded
> > datagram, but the source port of the forwarded datagram is the kernel
> > allocated bound address of the connected socket.
> >
> > However, by bind()ing as well as connect()ing the forwarding socket we can
> > choose the source port of the forwarded datagrams. By choosing it to match
> > the original source port we remove that surprising third port number and
> > no longer need to store port numbers in struct udp_splice_port.
>
> If you wondered, I think the whole connect() with getsockname() thing
> without a bind() came from the fundamental misconception I had that you
> couldn't connect() a bound socket -- and I didn't quite think of
> dropping connect() as you do in 3/16 anyway.
>
> There's one minor problem this introduces: the source port of the
> originating datagram now needs to be free in the init namespace. It's
> still better than the alternative problem you fix in 16/16, though.
You mean in the target namespace? Which could be init or otherwise
depending on direction. That did occur to me, however, I believe the
only way it would be not free is if we have a fixed port mapping
occupying that port - in which case the problem goes away again in
8/16.
> I'm wondering if we could, _once you're done with all this_ (it already
> looks complicated enough), revisit the 'goto fail' in
> udp_splice_connect() (now udp_splice_new()) when bind() fails, and just
> proceed with an ephemeral port then.
Oof... I'd rather not, because then we'd have to have somewhere to
keep track of the original source port just for that unlikely edge
case.
> Also, I haven't tried, but I'm not sure if this introduces some kind of
> DoS possibility: even if pasta forwards a single port, it should be
> possible for a remote host to make pasta bind to a large amount of
> non-ephemeral ports.
Hm, yes, that's a point.
> Maybe it would make sense to think of a limit on how many ports a
> single peer could cause pasta to bind.
Maybe, yes.
> I'm not sure yet how we could track peers without a separate address
> storage (even though keeping an LRU array should be feasible) -- the
> simpler alternative, limiting bound ports by destination port, would
> offer an even more convenient way to a DoS.
>
> On the other hand, this is exceedingly minor I guess. We're binding
> ports in the namespace after all, and we can reuse bound sockets.
Right. I think this is something to address when and if it proves to
be a problem in practice.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH v2 02/16] udp: Separate tracking of inbound and outbound packet flows
2022-11-24 1:16 [PATCH v2 00/16] Simplify and correct handling of "spliced" UDP forwarding David Gibson
2022-11-24 1:16 ` [PATCH v2 01/16] udp: Also bind() connected ports for "splice" forwarding David Gibson
@ 2022-11-24 1:16 ` David Gibson
2022-11-25 1:47 ` Stefano Brivio
2022-11-24 1:16 ` [PATCH v2 03/16] udp: Always use sendto() rather than send() for forwarding spliced packets David Gibson
` (13 subsequent siblings)
15 siblings, 1 reply; 29+ messages in thread
From: David Gibson @ 2022-11-24 1:16 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
Each entry udp_splice_map[v6][N] keeps information about two essentially
unrelated packet flows. @ns_conn_sock, @ns_conn_ts and @init_bound_sock
track a packet flow from port N in the host init namespace to some other
port in the pasta namespace (the one @ns_conn_sock is connected to).
@init_conn_sock, @init_conn_ts and @ns_bound_sock track packet flow from
port N in the pasta namespace to some other port in the host init namespace
(the one @init_conn_sock is connected to).
Split udp_splice_map[][] into two separate tables for the two directions.
Each entry in each table is a 'struct udp_splice_flow' with @orig_sock
(previously the bound socket), @target_sock (previously the connected
socket) and @ts (the timeout for the target socket).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
udp.c | 111 +++++++++++++++++++++++++++++-----------------------------
1 file changed, 56 insertions(+), 55 deletions(-)
diff --git a/udp.c b/udp.c
index a025a48..4caf73e 100644
--- a/udp.c
+++ b/udp.c
@@ -47,44 +47,44 @@
*
* - forward direction: 127.0.0.1:5000 -> 127.0.0.1:80 in init from bound
* socket s, with epoll reference: index = 80, splice = UDP_TO_NS
- * - if udp_splice_map[V4][5000].ns_conn_sock:
- * - send packet to udp4_splice_map[5000].ns_conn_sock
+ * - if udp_splice_to_ns[V4][5000].target_sock:
+ * - send packet to udp_splice_to_ns[V4][5000].target_sock
* - otherwise:
- * - create new socket udp_splice_map[V4][5000].ns_conn_sock
+ * - create new socket udp_splice_to_ns[V4][5000].target_sock
* - bind in namespace to 127.0.0.1:5000
* - connect in namespace to 127.0.0.1:80 (note: this destination port
* might be remapped to another port instead)
* - add to epoll with reference: index = 5000, splice: UDP_BACK_TO_INIT
- * - set udp_splice_map[V4][5000].init_bound_sock to s
- * - update udp_splice_map[V4][5000].ns_conn_ts with current time
+ * - set udp_splice_to_ns[V4][5000].orig_sock to s
+ * - update udp_splice_to_ns[V4][5000].ts with current time
*
* - reverse direction: 127.0.0.1:80 -> 127.0.0.1:5000 in namespace from
* connected socket s, having epoll reference: index = 5000,
* splice = UDP_BACK_TO_INIT
- * - if udp_splice_map[V4][5000].init_bound_sock:
- * - send to udp_splice_map[V4][5000].init_bound_sock, with destination
- * port 5000
+ * - if udp_splice_to_ns[V4][5000].orig_sock:
+ * - send to udp_splice_to_ns[V4][5000].orig_sock, with destination port
+ * 5000
* - otherwise, discard
*
* - from namespace to init:
*
* - forward direction: 127.0.0.1:2000 -> 127.0.0.1:22 in namespace from bound
* socket s, with epoll reference: index = 22, splice = UDP_TO_INIT
- * - if udp4_splice_map[V4][2000].init_conn_sock:
- * - send packet to udp4_splice_map[2000].init_conn_sock
+ * - if udp4_splice_to_init[V4][2000].target_sock:
+ * - send packet to udp_splice_to_init[V4][2000].target_sock
* - otherwise:
- * - create new socket udp_splice_map[V4][2000].init_conn_sock
+ * - create new socket udp_splice_to_init[V4][2000].target_sock
* - bind in init to 127.0.0.1:2000
* - connect in init to 127.0.0.1:22 (note: this destination port
* might be remapped to another port instead)
* - add to epoll with reference: index = 2000, splice = UDP_BACK_TO_NS
- * - set udp_splice_map[V4][2000].ns_bound_sock to s
- * - update udp_splice_map[V4][2000].init_conn_ts with current time
+ * - set udp_splice_to_init[V4][2000].orig_sock to s
+ * - update udp_splice_to_init[V4][2000].ts with current time
*
* - reverse direction: 127.0.0.1:22 -> 127.0.0.1:2000 in init from connected
* socket s, having epoll reference: index = 2000, splice = UDP_BACK_TO_NS
- * - if udp_splice_map[V4][2000].ns_bound_sock:
- * - send to udp_splice_map[V4][2000].ns_bound_sock, with destination port
+ * - if udp_splice_to_init[V4][2000].orig_sock:
+ * - send to udp_splice_to_init[V4][2000].orig_sock, with destination port
* 2000
* - otherwise, discard
*/
@@ -138,28 +138,26 @@ struct udp_tap_port {
};
/**
- * struct udp_splice_port - Source port tracking for traffic between namespaces
- * @ns_conn_sock: Socket connected in namespace for init source port
- * @init_conn_sock: Socket connected in init for namespace source port
- * @ns_conn_ts: Timestamp of activity for socket connected in namespace
- * @init_conn_ts: Timestamp of activity for socket connceted in init
- * @ns_bound_sock: Bound socket in namespace for this source port in init
- * @init_bound_sock: Bound socket in init for this source port in namespace
+ * struct udp_splice_flow - Spliced "connection"
+ * @orig_sock: Originating socket, bound to dest port in source ns of
+ * originating datagram
+ * @target_sock: Target socket, bound to source port of originating
+ * datagram in dest ns, connected to dest port of
+ * originating datagram in dest ns
+ * @ts: Activity timestamp
*/
-struct udp_splice_port {
- int ns_conn_sock;
- int init_conn_sock;
-
- time_t ns_conn_ts;
- time_t init_conn_ts;
-
- int ns_bound_sock;
- int init_bound_sock;
+struct udp_splice_flow {
+ int orig_sock;
+ int target_sock;
+ time_t ts;
};
/* Port tracking, arrays indexed by packet source port (host order) */
static struct udp_tap_port udp_tap_map [IP_VERSIONS][NUM_PORTS];
-static struct udp_splice_port udp_splice_map [IP_VERSIONS][NUM_PORTS];
+
+/* Spliced "connections" indexed by originating source port (host order) */
+static struct udp_splice_flow udp_splice_to_ns [IP_VERSIONS][NUM_PORTS];
+static struct udp_splice_flow udp_splice_to_init[IP_VERSIONS][NUM_PORTS];
enum udp_act_type {
UDP_ACT_TAP,
@@ -421,8 +419,17 @@ int udp_splice_connect(const struct ctx *c, int v6, int bound_sock,
.r.p.udp.udp = { .splice = splice, .v6 = v6,
.port = src }
};
- struct udp_splice_port *sp = &udp_splice_map[v6 ? V6 : V4][src];
+ struct udp_splice_flow *flow;
int s;
+ int act;
+
+ if (splice == UDP_BACK_TO_INIT) {
+ flow = &udp_splice_to_ns[v6 ? V6 : V4][src];
+ act = UDP_ACT_NS_CONN;
+ } else {
+ flow = &udp_splice_to_init[v6 ? V6 : V4][src];
+ act = UDP_ACT_INIT_CONN;
+ }
s = socket(v6 ? AF_INET6 : AF_INET, SOCK_DGRAM | SOCK_NONBLOCK,
IPPROTO_UDP);
@@ -461,15 +468,9 @@ int udp_splice_connect(const struct ctx *c, int v6, int bound_sock,
goto fail;
}
- if (splice == UDP_BACK_TO_INIT) {
- sp->init_bound_sock = bound_sock;
- sp->ns_conn_sock = s;
- bitmap_set(udp_act[v6 ? V6 : V4][UDP_ACT_NS_CONN], src);
- } else if (splice == UDP_BACK_TO_NS) {
- sp->ns_bound_sock = bound_sock;
- sp->init_conn_sock = s;
- bitmap_set(udp_act[v6 ? V6 : V4][UDP_ACT_INIT_CONN], src);
- }
+ flow->orig_sock = bound_sock;
+ flow->target_sock = s;
+ bitmap_set(udp_act[v6 ? V6 : V4][act], src);
ev.data.u64 = ref.u64;
epoll_ctl(c->epollfd, EPOLL_CTL_ADD, s, &ev);
@@ -556,7 +557,7 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
case UDP_TO_NS:
src += c->udp.fwd_out.rdelta[src];
- if (!(s = udp_splice_map[v6][src].ns_conn_sock)) {
+ if (!(s = udp_splice_to_ns[v6][src].target_sock)) {
struct udp_splice_connect_ns_arg arg = {
c, v6, ref.r.s, src, dst, -1,
};
@@ -565,10 +566,10 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
if ((s = arg.s) < 0)
return;
}
- udp_splice_map[v6][src].ns_conn_ts = now->tv_sec;
+ udp_splice_to_ns[v6][src].ts = now->tv_sec;
break;
case UDP_BACK_TO_INIT:
- if (!(s = udp_splice_map[v6][dst].init_bound_sock))
+ if (!(s = udp_splice_to_ns[v6][dst].orig_sock))
return;
send_dst = dst;
@@ -576,16 +577,16 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
case UDP_TO_INIT:
src += c->udp.fwd_in.rdelta[src];
- if (!(s = udp_splice_map[v6][src].init_conn_sock)) {
+ if (!(s = udp_splice_to_init[v6][src].target_sock)) {
s = udp_splice_connect(c, v6, ref.r.s, src, dst,
UDP_BACK_TO_NS);
if (s < 0)
return;
}
- udp_splice_map[v6][src].init_conn_ts = now->tv_sec;
+ udp_splice_to_init[v6][src].ts = now->tv_sec;
break;
case UDP_BACK_TO_NS:
- if (!(s = udp_splice_map[v6][dst].ns_bound_sock))
+ if (!(s = udp_splice_to_init[v6][dst].orig_sock))
return;
send_dst = dst;
@@ -1286,7 +1287,7 @@ int udp_init(struct ctx *c)
static void udp_timer_one(struct ctx *c, int v6, enum udp_act_type type,
in_port_t port, const struct timespec *ts)
{
- struct udp_splice_port *sp;
+ struct udp_splice_flow *flow;
struct udp_tap_port *tp;
int s = -1;
@@ -1301,17 +1302,17 @@ static void udp_timer_one(struct ctx *c, int v6, enum udp_act_type type,
break;
case UDP_ACT_INIT_CONN:
- sp = &udp_splice_map[v6 ? V6 : V4][port];
+ flow = &udp_splice_to_init[v6 ? V6 : V4][port];
- if (ts->tv_sec - sp->init_conn_ts > UDP_CONN_TIMEOUT)
- s = sp->init_conn_sock;
+ if (ts->tv_sec - flow->ts > UDP_CONN_TIMEOUT)
+ s = flow->target_sock;
break;
case UDP_ACT_NS_CONN:
- sp = &udp_splice_map[v6 ? V6 : V4][port];
+ flow = &udp_splice_to_ns[v6 ? V6 : V4][port];
- if (ts->tv_sec - sp->ns_conn_ts > UDP_CONN_TIMEOUT)
- s = sp->ns_conn_sock;
+ if (ts->tv_sec - flow->ts > UDP_CONN_TIMEOUT)
+ s = flow->target_sock;
break;
default:
--
@@ -47,44 +47,44 @@
*
* - forward direction: 127.0.0.1:5000 -> 127.0.0.1:80 in init from bound
* socket s, with epoll reference: index = 80, splice = UDP_TO_NS
- * - if udp_splice_map[V4][5000].ns_conn_sock:
- * - send packet to udp4_splice_map[5000].ns_conn_sock
+ * - if udp_splice_to_ns[V4][5000].target_sock:
+ * - send packet to udp_splice_to_ns[V4][5000].target_sock
* - otherwise:
- * - create new socket udp_splice_map[V4][5000].ns_conn_sock
+ * - create new socket udp_splice_to_ns[V4][5000].target_sock
* - bind in namespace to 127.0.0.1:5000
* - connect in namespace to 127.0.0.1:80 (note: this destination port
* might be remapped to another port instead)
* - add to epoll with reference: index = 5000, splice: UDP_BACK_TO_INIT
- * - set udp_splice_map[V4][5000].init_bound_sock to s
- * - update udp_splice_map[V4][5000].ns_conn_ts with current time
+ * - set udp_splice_to_ns[V4][5000].orig_sock to s
+ * - update udp_splice_to_ns[V4][5000].ts with current time
*
* - reverse direction: 127.0.0.1:80 -> 127.0.0.1:5000 in namespace from
* connected socket s, having epoll reference: index = 5000,
* splice = UDP_BACK_TO_INIT
- * - if udp_splice_map[V4][5000].init_bound_sock:
- * - send to udp_splice_map[V4][5000].init_bound_sock, with destination
- * port 5000
+ * - if udp_splice_to_ns[V4][5000].orig_sock:
+ * - send to udp_splice_to_ns[V4][5000].orig_sock, with destination port
+ * 5000
* - otherwise, discard
*
* - from namespace to init:
*
* - forward direction: 127.0.0.1:2000 -> 127.0.0.1:22 in namespace from bound
* socket s, with epoll reference: index = 22, splice = UDP_TO_INIT
- * - if udp4_splice_map[V4][2000].init_conn_sock:
- * - send packet to udp4_splice_map[2000].init_conn_sock
+ * - if udp4_splice_to_init[V4][2000].target_sock:
+ * - send packet to udp_splice_to_init[V4][2000].target_sock
* - otherwise:
- * - create new socket udp_splice_map[V4][2000].init_conn_sock
+ * - create new socket udp_splice_to_init[V4][2000].target_sock
* - bind in init to 127.0.0.1:2000
* - connect in init to 127.0.0.1:22 (note: this destination port
* might be remapped to another port instead)
* - add to epoll with reference: index = 2000, splice = UDP_BACK_TO_NS
- * - set udp_splice_map[V4][2000].ns_bound_sock to s
- * - update udp_splice_map[V4][2000].init_conn_ts with current time
+ * - set udp_splice_to_init[V4][2000].orig_sock to s
+ * - update udp_splice_to_init[V4][2000].ts with current time
*
* - reverse direction: 127.0.0.1:22 -> 127.0.0.1:2000 in init from connected
* socket s, having epoll reference: index = 2000, splice = UDP_BACK_TO_NS
- * - if udp_splice_map[V4][2000].ns_bound_sock:
- * - send to udp_splice_map[V4][2000].ns_bound_sock, with destination port
+ * - if udp_splice_to_init[V4][2000].orig_sock:
+ * - send to udp_splice_to_init[V4][2000].orig_sock, with destination port
* 2000
* - otherwise, discard
*/
@@ -138,28 +138,26 @@ struct udp_tap_port {
};
/**
- * struct udp_splice_port - Source port tracking for traffic between namespaces
- * @ns_conn_sock: Socket connected in namespace for init source port
- * @init_conn_sock: Socket connected in init for namespace source port
- * @ns_conn_ts: Timestamp of activity for socket connected in namespace
- * @init_conn_ts: Timestamp of activity for socket connceted in init
- * @ns_bound_sock: Bound socket in namespace for this source port in init
- * @init_bound_sock: Bound socket in init for this source port in namespace
+ * struct udp_splice_flow - Spliced "connection"
+ * @orig_sock: Originating socket, bound to dest port in source ns of
+ * originating datagram
+ * @target_sock: Target socket, bound to source port of originating
+ * datagram in dest ns, connected to dest port of
+ * originating datagram in dest ns
+ * @ts: Activity timestamp
*/
-struct udp_splice_port {
- int ns_conn_sock;
- int init_conn_sock;
-
- time_t ns_conn_ts;
- time_t init_conn_ts;
-
- int ns_bound_sock;
- int init_bound_sock;
+struct udp_splice_flow {
+ int orig_sock;
+ int target_sock;
+ time_t ts;
};
/* Port tracking, arrays indexed by packet source port (host order) */
static struct udp_tap_port udp_tap_map [IP_VERSIONS][NUM_PORTS];
-static struct udp_splice_port udp_splice_map [IP_VERSIONS][NUM_PORTS];
+
+/* Spliced "connections" indexed by originating source port (host order) */
+static struct udp_splice_flow udp_splice_to_ns [IP_VERSIONS][NUM_PORTS];
+static struct udp_splice_flow udp_splice_to_init[IP_VERSIONS][NUM_PORTS];
enum udp_act_type {
UDP_ACT_TAP,
@@ -421,8 +419,17 @@ int udp_splice_connect(const struct ctx *c, int v6, int bound_sock,
.r.p.udp.udp = { .splice = splice, .v6 = v6,
.port = src }
};
- struct udp_splice_port *sp = &udp_splice_map[v6 ? V6 : V4][src];
+ struct udp_splice_flow *flow;
int s;
+ int act;
+
+ if (splice == UDP_BACK_TO_INIT) {
+ flow = &udp_splice_to_ns[v6 ? V6 : V4][src];
+ act = UDP_ACT_NS_CONN;
+ } else {
+ flow = &udp_splice_to_init[v6 ? V6 : V4][src];
+ act = UDP_ACT_INIT_CONN;
+ }
s = socket(v6 ? AF_INET6 : AF_INET, SOCK_DGRAM | SOCK_NONBLOCK,
IPPROTO_UDP);
@@ -461,15 +468,9 @@ int udp_splice_connect(const struct ctx *c, int v6, int bound_sock,
goto fail;
}
- if (splice == UDP_BACK_TO_INIT) {
- sp->init_bound_sock = bound_sock;
- sp->ns_conn_sock = s;
- bitmap_set(udp_act[v6 ? V6 : V4][UDP_ACT_NS_CONN], src);
- } else if (splice == UDP_BACK_TO_NS) {
- sp->ns_bound_sock = bound_sock;
- sp->init_conn_sock = s;
- bitmap_set(udp_act[v6 ? V6 : V4][UDP_ACT_INIT_CONN], src);
- }
+ flow->orig_sock = bound_sock;
+ flow->target_sock = s;
+ bitmap_set(udp_act[v6 ? V6 : V4][act], src);
ev.data.u64 = ref.u64;
epoll_ctl(c->epollfd, EPOLL_CTL_ADD, s, &ev);
@@ -556,7 +557,7 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
case UDP_TO_NS:
src += c->udp.fwd_out.rdelta[src];
- if (!(s = udp_splice_map[v6][src].ns_conn_sock)) {
+ if (!(s = udp_splice_to_ns[v6][src].target_sock)) {
struct udp_splice_connect_ns_arg arg = {
c, v6, ref.r.s, src, dst, -1,
};
@@ -565,10 +566,10 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
if ((s = arg.s) < 0)
return;
}
- udp_splice_map[v6][src].ns_conn_ts = now->tv_sec;
+ udp_splice_to_ns[v6][src].ts = now->tv_sec;
break;
case UDP_BACK_TO_INIT:
- if (!(s = udp_splice_map[v6][dst].init_bound_sock))
+ if (!(s = udp_splice_to_ns[v6][dst].orig_sock))
return;
send_dst = dst;
@@ -576,16 +577,16 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
case UDP_TO_INIT:
src += c->udp.fwd_in.rdelta[src];
- if (!(s = udp_splice_map[v6][src].init_conn_sock)) {
+ if (!(s = udp_splice_to_init[v6][src].target_sock)) {
s = udp_splice_connect(c, v6, ref.r.s, src, dst,
UDP_BACK_TO_NS);
if (s < 0)
return;
}
- udp_splice_map[v6][src].init_conn_ts = now->tv_sec;
+ udp_splice_to_init[v6][src].ts = now->tv_sec;
break;
case UDP_BACK_TO_NS:
- if (!(s = udp_splice_map[v6][dst].ns_bound_sock))
+ if (!(s = udp_splice_to_init[v6][dst].orig_sock))
return;
send_dst = dst;
@@ -1286,7 +1287,7 @@ int udp_init(struct ctx *c)
static void udp_timer_one(struct ctx *c, int v6, enum udp_act_type type,
in_port_t port, const struct timespec *ts)
{
- struct udp_splice_port *sp;
+ struct udp_splice_flow *flow;
struct udp_tap_port *tp;
int s = -1;
@@ -1301,17 +1302,17 @@ static void udp_timer_one(struct ctx *c, int v6, enum udp_act_type type,
break;
case UDP_ACT_INIT_CONN:
- sp = &udp_splice_map[v6 ? V6 : V4][port];
+ flow = &udp_splice_to_init[v6 ? V6 : V4][port];
- if (ts->tv_sec - sp->init_conn_ts > UDP_CONN_TIMEOUT)
- s = sp->init_conn_sock;
+ if (ts->tv_sec - flow->ts > UDP_CONN_TIMEOUT)
+ s = flow->target_sock;
break;
case UDP_ACT_NS_CONN:
- sp = &udp_splice_map[v6 ? V6 : V4][port];
+ flow = &udp_splice_to_ns[v6 ? V6 : V4][port];
- if (ts->tv_sec - sp->ns_conn_ts > UDP_CONN_TIMEOUT)
- s = sp->ns_conn_sock;
+ if (ts->tv_sec - flow->ts > UDP_CONN_TIMEOUT)
+ s = flow->target_sock;
break;
default:
--
2.38.1
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH v2 02/16] udp: Separate tracking of inbound and outbound packet flows
2022-11-24 1:16 ` [PATCH v2 02/16] udp: Separate tracking of inbound and outbound packet flows David Gibson
@ 2022-11-25 1:47 ` Stefano Brivio
2022-11-25 7:06 ` David Gibson
0 siblings, 1 reply; 29+ messages in thread
From: Stefano Brivio @ 2022-11-25 1:47 UTC (permalink / raw)
To: David Gibson; +Cc: passt-dev
Just two nits here:
On Thu, 24 Nov 2022 12:16:45 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:
> Each entry udp_splice_map[v6][N] keeps information about two essentially
> unrelated packet flows. @ns_conn_sock, @ns_conn_ts and @init_bound_sock
> track a packet flow from port N in the host init namespace to some other
> port in the pasta namespace (the one @ns_conn_sock is connected to).
> @init_conn_sock, @init_conn_ts and @ns_bound_sock track packet flow from
> port N in the pasta namespace to some other port in the host init namespace
> (the one @init_conn_sock is connected to).
>
> Split udp_splice_map[][] into two separate tables for the two directions.
> Each entry in each table is a 'struct udp_splice_flow' with @orig_sock
> (previously the bound socket), @target_sock (previously the connected
> socket) and @ts (the timeout for the target socket).
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> udp.c | 111 +++++++++++++++++++++++++++++-----------------------------
> 1 file changed, 56 insertions(+), 55 deletions(-)
>
> diff --git a/udp.c b/udp.c
> index a025a48..4caf73e 100644
> --- a/udp.c
> +++ b/udp.c
> @@ -47,44 +47,44 @@
> *
This comment still references struct udp_splice_port, it should now say
"see struct udp_spliced_flow" instead.
> * - forward direction: 127.0.0.1:5000 -> 127.0.0.1:80 in init from bound
> * socket s, with epoll reference: index = 80, splice = UDP_TO_NS
> - * - if udp_splice_map[V4][5000].ns_conn_sock:
> - * - send packet to udp4_splice_map[5000].ns_conn_sock
> + * - if udp_splice_to_ns[V4][5000].target_sock:
> + * - send packet to udp_splice_to_ns[V4][5000].target_sock
> * - otherwise:
> - * - create new socket udp_splice_map[V4][5000].ns_conn_sock
> + * - create new socket udp_splice_to_ns[V4][5000].target_sock
> * - bind in namespace to 127.0.0.1:5000
> * - connect in namespace to 127.0.0.1:80 (note: this destination port
> * might be remapped to another port instead)
> * - add to epoll with reference: index = 5000, splice: UDP_BACK_TO_INIT
> - * - set udp_splice_map[V4][5000].init_bound_sock to s
> - * - update udp_splice_map[V4][5000].ns_conn_ts with current time
> + * - set udp_splice_to_ns[V4][5000].orig_sock to s
> + * - update udp_splice_to_ns[V4][5000].ts with current time
> *
> * - reverse direction: 127.0.0.1:80 -> 127.0.0.1:5000 in namespace from
> * connected socket s, having epoll reference: index = 5000,
> * splice = UDP_BACK_TO_INIT
> - * - if udp_splice_map[V4][5000].init_bound_sock:
> - * - send to udp_splice_map[V4][5000].init_bound_sock, with destination
> - * port 5000
> + * - if udp_splice_to_ns[V4][5000].orig_sock:
> + * - send to udp_splice_to_ns[V4][5000].orig_sock, with destination port
> + * 5000
> * - otherwise, discard
> *
> * - from namespace to init:
> *
> * - forward direction: 127.0.0.1:2000 -> 127.0.0.1:22 in namespace from bound
> * socket s, with epoll reference: index = 22, splice = UDP_TO_INIT
> - * - if udp4_splice_map[V4][2000].init_conn_sock:
> - * - send packet to udp4_splice_map[2000].init_conn_sock
> + * - if udp4_splice_to_init[V4][2000].target_sock:
> + * - send packet to udp_splice_to_init[V4][2000].target_sock
> * - otherwise:
> - * - create new socket udp_splice_map[V4][2000].init_conn_sock
> + * - create new socket udp_splice_to_init[V4][2000].target_sock
> * - bind in init to 127.0.0.1:2000
> * - connect in init to 127.0.0.1:22 (note: this destination port
> * might be remapped to another port instead)
> * - add to epoll with reference: index = 2000, splice = UDP_BACK_TO_NS
> - * - set udp_splice_map[V4][2000].ns_bound_sock to s
> - * - update udp_splice_map[V4][2000].init_conn_ts with current time
> + * - set udp_splice_to_init[V4][2000].orig_sock to s
> + * - update udp_splice_to_init[V4][2000].ts with current time
> *
> * - reverse direction: 127.0.0.1:22 -> 127.0.0.1:2000 in init from connected
> * socket s, having epoll reference: index = 2000, splice = UDP_BACK_TO_NS
> - * - if udp_splice_map[V4][2000].ns_bound_sock:
> - * - send to udp_splice_map[V4][2000].ns_bound_sock, with destination port
> + * - if udp_splice_to_init[V4][2000].orig_sock:
> + * - send to udp_splice_to_init[V4][2000].orig_sock, with destination port
> * 2000
> * - otherwise, discard
> */
> @@ -138,28 +138,26 @@ struct udp_tap_port {
> };
>
> /**
> - * struct udp_splice_port - Source port tracking for traffic between namespaces
> - * @ns_conn_sock: Socket connected in namespace for init source port
> - * @init_conn_sock: Socket connected in init for namespace source port
> - * @ns_conn_ts: Timestamp of activity for socket connected in namespace
> - * @init_conn_ts: Timestamp of activity for socket connceted in init
> - * @ns_bound_sock: Bound socket in namespace for this source port in init
> - * @init_bound_sock: Bound socket in init for this source port in namespace
> + * struct udp_splice_flow - Spliced "connection"
> + * @orig_sock: Originating socket, bound to dest port in source ns of
> + * originating datagram
> + * @target_sock: Target socket, bound to source port of originating
> + * datagram in dest ns, connected to dest port of
> + * originating datagram in dest ns
> + * @ts: Activity timestamp
> */
> -struct udp_splice_port {
> - int ns_conn_sock;
> - int init_conn_sock;
> -
> - time_t ns_conn_ts;
> - time_t init_conn_ts;
> -
> - int ns_bound_sock;
> - int init_bound_sock;
> +struct udp_splice_flow {
> + int orig_sock;
> + int target_sock;
> + time_t ts;
> };
>
> /* Port tracking, arrays indexed by packet source port (host order) */
> static struct udp_tap_port udp_tap_map [IP_VERSIONS][NUM_PORTS];
> -static struct udp_splice_port udp_splice_map [IP_VERSIONS][NUM_PORTS];
> +
> +/* Spliced "connections" indexed by originating source port (host order) */
> +static struct udp_splice_flow udp_splice_to_ns [IP_VERSIONS][NUM_PORTS];
> +static struct udp_splice_flow udp_splice_to_init[IP_VERSIONS][NUM_PORTS];
>
> enum udp_act_type {
> UDP_ACT_TAP,
> @@ -421,8 +419,17 @@ int udp_splice_connect(const struct ctx *c, int v6, int bound_sock,
> .r.p.udp.udp = { .splice = splice, .v6 = v6,
> .port = src }
> };
> - struct udp_splice_port *sp = &udp_splice_map[v6 ? V6 : V4][src];
> + struct udp_splice_flow *flow;
> int s;
> + int act;
...and this should go before 'int s;'.
--
Stefano
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 02/16] udp: Separate tracking of inbound and outbound packet flows
2022-11-25 1:47 ` Stefano Brivio
@ 2022-11-25 7:06 ` David Gibson
0 siblings, 0 replies; 29+ messages in thread
From: David Gibson @ 2022-11-25 7:06 UTC (permalink / raw)
To: Stefano Brivio; +Cc: passt-dev
[-- Attachment #1: Type: text/plain, Size: 7416 bytes --]
On Fri, Nov 25, 2022 at 02:47:45AM +0100, Stefano Brivio wrote:
> Just two nits here:
>
> On Thu, 24 Nov 2022 12:16:45 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
>
> > Each entry udp_splice_map[v6][N] keeps information about two essentially
> > unrelated packet flows. @ns_conn_sock, @ns_conn_ts and @init_bound_sock
> > track a packet flow from port N in the host init namespace to some other
> > port in the pasta namespace (the one @ns_conn_sock is connected to).
> > @init_conn_sock, @init_conn_ts and @ns_bound_sock track packet flow from
> > port N in the pasta namespace to some other port in the host init namespace
> > (the one @init_conn_sock is connected to).
> >
> > Split udp_splice_map[][] into two separate tables for the two directions.
> > Each entry in each table is a 'struct udp_splice_flow' with @orig_sock
> > (previously the bound socket), @target_sock (previously the connected
> > socket) and @ts (the timeout for the target socket).
> >
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> > udp.c | 111 +++++++++++++++++++++++++++++-----------------------------
> > 1 file changed, 56 insertions(+), 55 deletions(-)
> >
> > diff --git a/udp.c b/udp.c
> > index a025a48..4caf73e 100644
> > --- a/udp.c
> > +++ b/udp.c
> > @@ -47,44 +47,44 @@
> > *
>
> This comment still references struct udp_splice_port, it should now say
> "see struct udp_spliced_flow" instead.
Fixed. Although that change is obsoleted later in the series.
> > * - forward direction: 127.0.0.1:5000 -> 127.0.0.1:80 in init from bound
> > * socket s, with epoll reference: index = 80, splice = UDP_TO_NS
> > - * - if udp_splice_map[V4][5000].ns_conn_sock:
> > - * - send packet to udp4_splice_map[5000].ns_conn_sock
> > + * - if udp_splice_to_ns[V4][5000].target_sock:
> > + * - send packet to udp_splice_to_ns[V4][5000].target_sock
> > * - otherwise:
> > - * - create new socket udp_splice_map[V4][5000].ns_conn_sock
> > + * - create new socket udp_splice_to_ns[V4][5000].target_sock
> > * - bind in namespace to 127.0.0.1:5000
> > * - connect in namespace to 127.0.0.1:80 (note: this destination port
> > * might be remapped to another port instead)
> > * - add to epoll with reference: index = 5000, splice: UDP_BACK_TO_INIT
> > - * - set udp_splice_map[V4][5000].init_bound_sock to s
> > - * - update udp_splice_map[V4][5000].ns_conn_ts with current time
> > + * - set udp_splice_to_ns[V4][5000].orig_sock to s
> > + * - update udp_splice_to_ns[V4][5000].ts with current time
> > *
> > * - reverse direction: 127.0.0.1:80 -> 127.0.0.1:5000 in namespace from
> > * connected socket s, having epoll reference: index = 5000,
> > * splice = UDP_BACK_TO_INIT
> > - * - if udp_splice_map[V4][5000].init_bound_sock:
> > - * - send to udp_splice_map[V4][5000].init_bound_sock, with destination
> > - * port 5000
> > + * - if udp_splice_to_ns[V4][5000].orig_sock:
> > + * - send to udp_splice_to_ns[V4][5000].orig_sock, with destination port
> > + * 5000
> > * - otherwise, discard
> > *
> > * - from namespace to init:
> > *
> > * - forward direction: 127.0.0.1:2000 -> 127.0.0.1:22 in namespace from bound
> > * socket s, with epoll reference: index = 22, splice = UDP_TO_INIT
> > - * - if udp4_splice_map[V4][2000].init_conn_sock:
> > - * - send packet to udp4_splice_map[2000].init_conn_sock
> > + * - if udp4_splice_to_init[V4][2000].target_sock:
> > + * - send packet to udp_splice_to_init[V4][2000].target_sock
> > * - otherwise:
> > - * - create new socket udp_splice_map[V4][2000].init_conn_sock
> > + * - create new socket udp_splice_to_init[V4][2000].target_sock
> > * - bind in init to 127.0.0.1:2000
> > * - connect in init to 127.0.0.1:22 (note: this destination port
> > * might be remapped to another port instead)
> > * - add to epoll with reference: index = 2000, splice = UDP_BACK_TO_NS
> > - * - set udp_splice_map[V4][2000].ns_bound_sock to s
> > - * - update udp_splice_map[V4][2000].init_conn_ts with current time
> > + * - set udp_splice_to_init[V4][2000].orig_sock to s
> > + * - update udp_splice_to_init[V4][2000].ts with current time
> > *
> > * - reverse direction: 127.0.0.1:22 -> 127.0.0.1:2000 in init from connected
> > * socket s, having epoll reference: index = 2000, splice = UDP_BACK_TO_NS
> > - * - if udp_splice_map[V4][2000].ns_bound_sock:
> > - * - send to udp_splice_map[V4][2000].ns_bound_sock, with destination port
> > + * - if udp_splice_to_init[V4][2000].orig_sock:
> > + * - send to udp_splice_to_init[V4][2000].orig_sock, with destination port
> > * 2000
> > * - otherwise, discard
> > */
> > @@ -138,28 +138,26 @@ struct udp_tap_port {
> > };
> >
> > /**
> > - * struct udp_splice_port - Source port tracking for traffic between namespaces
> > - * @ns_conn_sock: Socket connected in namespace for init source port
> > - * @init_conn_sock: Socket connected in init for namespace source port
> > - * @ns_conn_ts: Timestamp of activity for socket connected in namespace
> > - * @init_conn_ts: Timestamp of activity for socket connceted in init
> > - * @ns_bound_sock: Bound socket in namespace for this source port in init
> > - * @init_bound_sock: Bound socket in init for this source port in namespace
> > + * struct udp_splice_flow - Spliced "connection"
> > + * @orig_sock: Originating socket, bound to dest port in source ns of
> > + * originating datagram
> > + * @target_sock: Target socket, bound to source port of originating
> > + * datagram in dest ns, connected to dest port of
> > + * originating datagram in dest ns
> > + * @ts: Activity timestamp
> > */
> > -struct udp_splice_port {
> > - int ns_conn_sock;
> > - int init_conn_sock;
> > -
> > - time_t ns_conn_ts;
> > - time_t init_conn_ts;
> > -
> > - int ns_bound_sock;
> > - int init_bound_sock;
> > +struct udp_splice_flow {
> > + int orig_sock;
> > + int target_sock;
> > + time_t ts;
> > };
> >
> > /* Port tracking, arrays indexed by packet source port (host order) */
> > static struct udp_tap_port udp_tap_map [IP_VERSIONS][NUM_PORTS];
> > -static struct udp_splice_port udp_splice_map [IP_VERSIONS][NUM_PORTS];
> > +
> > +/* Spliced "connections" indexed by originating source port (host order) */
> > +static struct udp_splice_flow udp_splice_to_ns [IP_VERSIONS][NUM_PORTS];
> > +static struct udp_splice_flow udp_splice_to_init[IP_VERSIONS][NUM_PORTS];
> >
> > enum udp_act_type {
> > UDP_ACT_TAP,
> > @@ -421,8 +419,17 @@ int udp_splice_connect(const struct ctx *c, int v6, int bound_sock,
> > .r.p.udp.udp = { .splice = splice, .v6 = v6,
> > .port = src }
> > };
> > - struct udp_splice_port *sp = &udp_splice_map[v6 ? V6 : V4][src];
> > + struct udp_splice_flow *flow;
> > int s;
> > + int act;
>
> ...and this should go before 'int s;'.
Fixed.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH v2 03/16] udp: Always use sendto() rather than send() for forwarding spliced packets
2022-11-24 1:16 [PATCH v2 00/16] Simplify and correct handling of "spliced" UDP forwarding David Gibson
2022-11-24 1:16 ` [PATCH v2 01/16] udp: Also bind() connected ports for "splice" forwarding David Gibson
2022-11-24 1:16 ` [PATCH v2 02/16] udp: Separate tracking of inbound and outbound packet flows David Gibson
@ 2022-11-24 1:16 ` David Gibson
2022-11-24 1:16 ` [PATCH v2 04/16] udp: Don't connect "forward" sockets for spliced flows David Gibson
` (12 subsequent siblings)
15 siblings, 0 replies; 29+ messages in thread
From: David Gibson @ 2022-11-24 1:16 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
udp_sock_handler_splice() has two different ways of sending out packets
once it has determined the correct destination socket. For the originating
sockets (which are not connected) it uses sendto() to specify a specific
address. For the forward socket (which is connected) we use send().
However we know the correct destination address even for the forward socket
we do also know the correct destination address. We can use this to use
sendto() instead of send(), removing the need for two different paths and
some staging data structures.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
udp.c | 40 +++++++---------------------------------
1 file changed, 7 insertions(+), 33 deletions(-)
diff --git a/udp.c b/udp.c
index 4caf73e..1e78da5 100644
--- a/udp.c
+++ b/udp.c
@@ -48,7 +48,8 @@
* - forward direction: 127.0.0.1:5000 -> 127.0.0.1:80 in init from bound
* socket s, with epoll reference: index = 80, splice = UDP_TO_NS
* - if udp_splice_to_ns[V4][5000].target_sock:
- * - send packet to udp_splice_to_ns[V4][5000].target_sock
+ * - send packet to udp_splice_to_ns[V4][5000].target_sock, with
+ * destination port 80
* - otherwise:
* - create new socket udp_splice_to_ns[V4][5000].target_sock
* - bind in namespace to 127.0.0.1:5000
@@ -71,7 +72,8 @@
* - forward direction: 127.0.0.1:2000 -> 127.0.0.1:22 in namespace from bound
* socket s, with epoll reference: index = 22, splice = UDP_TO_INIT
* - if udp4_splice_to_init[V4][2000].target_sock:
- * - send packet to udp_splice_to_init[V4][2000].target_sock
+ * - send packet to udp_splice_to_init[V4][2000].target_sock, with
+ * destination port 22
* - otherwise:
* - create new socket udp_splice_to_init[V4][2000].target_sock
* - bind in init to 127.0.0.1:2000
@@ -244,9 +246,6 @@ static struct mmsghdr udp6_l2_mh_tap [UDP_TAP_FRAMES_MEM];
static struct iovec udp_iov_recv [UDP_SPLICE_FRAMES];
static struct mmsghdr udp_mmh_recv [UDP_SPLICE_FRAMES];
-static struct iovec udp_iov_send [UDP_SPLICE_FRAMES];
-static struct mmsghdr udp_mmh_send [UDP_SPLICE_FRAMES];
-
static struct iovec udp_iov_sendto [UDP_SPLICE_FRAMES];
static struct mmsghdr udp_mmh_sendto [UDP_SPLICE_FRAMES];
@@ -530,7 +529,7 @@ static int udp_splice_connect_ns(void *arg)
static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
uint32_t events, const struct timespec *now)
{
- in_port_t src, dst = ref.r.p.udp.udp.port, send_dst = 0;
+ in_port_t src, dst = ref.r.p.udp.udp.port;
struct msghdr *mh = &udp_mmh_recv[0].msg_hdr;
struct sockaddr_storage *sa_s = mh->msg_name;
int s, v6 = ref.r.p.udp.udp.v6, n, i;
@@ -571,8 +570,6 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
case UDP_BACK_TO_INIT:
if (!(s = udp_splice_to_ns[v6][dst].orig_sock))
return;
-
- send_dst = dst;
break;
case UDP_TO_INIT:
src += c->udp.fwd_in.rdelta[src];
@@ -588,25 +585,11 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
case UDP_BACK_TO_NS:
if (!(s = udp_splice_to_init[v6][dst].orig_sock))
return;
-
- send_dst = dst;
break;
default:
return;
}
- if (ref.r.p.udp.udp.splice == UDP_TO_NS ||
- ref.r.p.udp.udp.splice == UDP_TO_INIT) {
- for (i = 0; i < n; i++) {
- struct msghdr *mh_s = &udp_mmh_send[i].msg_hdr;
-
- mh_s->msg_iov->iov_len = udp_mmh_recv[i].msg_len;
- }
-
- sendmmsg(s, udp_mmh_send, n, MSG_NOSIGNAL);
- return;
- }
-
for (i = 0; i < n; i++) {
struct msghdr *mh_s = &udp_mmh_sendto[i].msg_hdr;
@@ -618,7 +601,7 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
((struct sockaddr_in6) {
.sin6_family = AF_INET6,
.sin6_addr = IN6ADDR_LOOPBACK_INIT,
- .sin6_port = htons(send_dst),
+ .sin6_port = htons(dst),
.sin6_scope_id = 0,
});
} else {
@@ -626,7 +609,7 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
((struct sockaddr_in) {
.sin_family = AF_INET,
.sin_addr = { .s_addr = htonl(INADDR_LOOPBACK) },
- .sin_port = htons(send_dst),
+ .sin_port = htons(dst),
.sin_zero = { 0 },
});
}
@@ -1229,15 +1212,6 @@ static void udp_splice_iov_init(void)
iov->iov_len = sizeof(udp_splice_buf[i]);
}
- for (i = 0, h = udp_mmh_send; i < UDP_SPLICE_FRAMES; i++, h++) {
- struct msghdr *mh = &h->msg_hdr;
-
- mh->msg_iov = &udp_iov_send[i];
- mh->msg_iovlen = 1;
- }
- for (i = 0, iov = udp_iov_send; i < UDP_SPLICE_FRAMES; i++, iov++)
- iov->iov_base = udp_splice_buf[i];
-
for (i = 0, h = udp_mmh_sendto; i < UDP_SPLICE_FRAMES; i++, h++) {
struct msghdr *mh = &h->msg_hdr;
--
@@ -48,7 +48,8 @@
* - forward direction: 127.0.0.1:5000 -> 127.0.0.1:80 in init from bound
* socket s, with epoll reference: index = 80, splice = UDP_TO_NS
* - if udp_splice_to_ns[V4][5000].target_sock:
- * - send packet to udp_splice_to_ns[V4][5000].target_sock
+ * - send packet to udp_splice_to_ns[V4][5000].target_sock, with
+ * destination port 80
* - otherwise:
* - create new socket udp_splice_to_ns[V4][5000].target_sock
* - bind in namespace to 127.0.0.1:5000
@@ -71,7 +72,8 @@
* - forward direction: 127.0.0.1:2000 -> 127.0.0.1:22 in namespace from bound
* socket s, with epoll reference: index = 22, splice = UDP_TO_INIT
* - if udp4_splice_to_init[V4][2000].target_sock:
- * - send packet to udp_splice_to_init[V4][2000].target_sock
+ * - send packet to udp_splice_to_init[V4][2000].target_sock, with
+ * destination port 22
* - otherwise:
* - create new socket udp_splice_to_init[V4][2000].target_sock
* - bind in init to 127.0.0.1:2000
@@ -244,9 +246,6 @@ static struct mmsghdr udp6_l2_mh_tap [UDP_TAP_FRAMES_MEM];
static struct iovec udp_iov_recv [UDP_SPLICE_FRAMES];
static struct mmsghdr udp_mmh_recv [UDP_SPLICE_FRAMES];
-static struct iovec udp_iov_send [UDP_SPLICE_FRAMES];
-static struct mmsghdr udp_mmh_send [UDP_SPLICE_FRAMES];
-
static struct iovec udp_iov_sendto [UDP_SPLICE_FRAMES];
static struct mmsghdr udp_mmh_sendto [UDP_SPLICE_FRAMES];
@@ -530,7 +529,7 @@ static int udp_splice_connect_ns(void *arg)
static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
uint32_t events, const struct timespec *now)
{
- in_port_t src, dst = ref.r.p.udp.udp.port, send_dst = 0;
+ in_port_t src, dst = ref.r.p.udp.udp.port;
struct msghdr *mh = &udp_mmh_recv[0].msg_hdr;
struct sockaddr_storage *sa_s = mh->msg_name;
int s, v6 = ref.r.p.udp.udp.v6, n, i;
@@ -571,8 +570,6 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
case UDP_BACK_TO_INIT:
if (!(s = udp_splice_to_ns[v6][dst].orig_sock))
return;
-
- send_dst = dst;
break;
case UDP_TO_INIT:
src += c->udp.fwd_in.rdelta[src];
@@ -588,25 +585,11 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
case UDP_BACK_TO_NS:
if (!(s = udp_splice_to_init[v6][dst].orig_sock))
return;
-
- send_dst = dst;
break;
default:
return;
}
- if (ref.r.p.udp.udp.splice == UDP_TO_NS ||
- ref.r.p.udp.udp.splice == UDP_TO_INIT) {
- for (i = 0; i < n; i++) {
- struct msghdr *mh_s = &udp_mmh_send[i].msg_hdr;
-
- mh_s->msg_iov->iov_len = udp_mmh_recv[i].msg_len;
- }
-
- sendmmsg(s, udp_mmh_send, n, MSG_NOSIGNAL);
- return;
- }
-
for (i = 0; i < n; i++) {
struct msghdr *mh_s = &udp_mmh_sendto[i].msg_hdr;
@@ -618,7 +601,7 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
((struct sockaddr_in6) {
.sin6_family = AF_INET6,
.sin6_addr = IN6ADDR_LOOPBACK_INIT,
- .sin6_port = htons(send_dst),
+ .sin6_port = htons(dst),
.sin6_scope_id = 0,
});
} else {
@@ -626,7 +609,7 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
((struct sockaddr_in) {
.sin_family = AF_INET,
.sin_addr = { .s_addr = htonl(INADDR_LOOPBACK) },
- .sin_port = htons(send_dst),
+ .sin_port = htons(dst),
.sin_zero = { 0 },
});
}
@@ -1229,15 +1212,6 @@ static void udp_splice_iov_init(void)
iov->iov_len = sizeof(udp_splice_buf[i]);
}
- for (i = 0, h = udp_mmh_send; i < UDP_SPLICE_FRAMES; i++, h++) {
- struct msghdr *mh = &h->msg_hdr;
-
- mh->msg_iov = &udp_iov_send[i];
- mh->msg_iovlen = 1;
- }
- for (i = 0, iov = udp_iov_send; i < UDP_SPLICE_FRAMES; i++, iov++)
- iov->iov_base = udp_splice_buf[i];
-
for (i = 0, h = udp_mmh_sendto; i < UDP_SPLICE_FRAMES; i++, h++) {
struct msghdr *mh = &h->msg_hdr;
--
2.38.1
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v2 04/16] udp: Don't connect "forward" sockets for spliced flows
2022-11-24 1:16 [PATCH v2 00/16] Simplify and correct handling of "spliced" UDP forwarding David Gibson
` (2 preceding siblings ...)
2022-11-24 1:16 ` [PATCH v2 03/16] udp: Always use sendto() rather than send() for forwarding spliced packets David Gibson
@ 2022-11-24 1:16 ` David Gibson
2022-11-25 1:47 ` Stefano Brivio
2022-11-24 1:16 ` [PATCH v2 05/16] udp: Remove the @bound field from union udp_epoll_ref David Gibson
` (11 subsequent siblings)
15 siblings, 1 reply; 29+ messages in thread
From: David Gibson @ 2022-11-24 1:16 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
Currently we connect() the socket we use to forward spliced UDP flows.
However, we now only ever use sendto() rather than send() on this socket
so there's not actually any need to connect it. Don't do so.
Rename a number of things that referred to "connect" or "conn" since that
would now be misleading.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
udp.c | 85 ++++++++++++++++++++++++-----------------------------------
1 file changed, 35 insertions(+), 50 deletions(-)
diff --git a/udp.c b/udp.c
index 1e78da5..ed095ec 100644
--- a/udp.c
+++ b/udp.c
@@ -45,23 +45,20 @@
*
* - from init to namespace:
*
- * - forward direction: 127.0.0.1:5000 -> 127.0.0.1:80 in init from bound
- * socket s, with epoll reference: index = 80, splice = UDP_TO_NS
+ * - forward direction: 127.0.0.1:5000 -> 127.0.0.1:80 in init from socket s,
+ * with epoll reference: index = 80, splice = UDP_TO_NS
* - if udp_splice_to_ns[V4][5000].target_sock:
* - send packet to udp_splice_to_ns[V4][5000].target_sock, with
* destination port 80
* - otherwise:
* - create new socket udp_splice_to_ns[V4][5000].target_sock
* - bind in namespace to 127.0.0.1:5000
- * - connect in namespace to 127.0.0.1:80 (note: this destination port
- * might be remapped to another port instead)
* - add to epoll with reference: index = 5000, splice: UDP_BACK_TO_INIT
* - set udp_splice_to_ns[V4][5000].orig_sock to s
* - update udp_splice_to_ns[V4][5000].ts with current time
*
- * - reverse direction: 127.0.0.1:80 -> 127.0.0.1:5000 in namespace from
- * connected socket s, having epoll reference: index = 5000,
- * splice = UDP_BACK_TO_INIT
+ * - reverse direction: 127.0.0.1:80 -> 127.0.0.1:5000 in namespace socket s,
+ * having epoll reference: index = 5000, splice = UDP_BACK_TO_INIT
* - if udp_splice_to_ns[V4][5000].orig_sock:
* - send to udp_splice_to_ns[V4][5000].orig_sock, with destination port
* 5000
@@ -69,7 +66,7 @@
*
* - from namespace to init:
*
- * - forward direction: 127.0.0.1:2000 -> 127.0.0.1:22 in namespace from bound
+ * - forward direction: 127.0.0.1:2000 -> 127.0.0.1:22 in namespace from
* socket s, with epoll reference: index = 22, splice = UDP_TO_INIT
* - if udp4_splice_to_init[V4][2000].target_sock:
* - send packet to udp_splice_to_init[V4][2000].target_sock, with
@@ -77,14 +74,12 @@
* - otherwise:
* - create new socket udp_splice_to_init[V4][2000].target_sock
* - bind in init to 127.0.0.1:2000
- * - connect in init to 127.0.0.1:22 (note: this destination port
- * might be remapped to another port instead)
* - add to epoll with reference: index = 2000, splice = UDP_BACK_TO_NS
* - set udp_splice_to_init[V4][2000].orig_sock to s
* - update udp_splice_to_init[V4][2000].ts with current time
*
- * - reverse direction: 127.0.0.1:22 -> 127.0.0.1:2000 in init from connected
- * socket s, having epoll reference: index = 2000, splice = UDP_BACK_TO_NS
+ * - reverse direction: 127.0.0.1:22 -> 127.0.0.1:2000 in init from socket s,
+ * having epoll reference: index = 2000, splice = UDP_BACK_TO_NS
* - if udp_splice_to_init[V4][2000].orig_sock:
* - send to udp_splice_to_init[V4][2000].orig_sock, with destination port
* 2000
@@ -144,8 +139,7 @@ struct udp_tap_port {
* @orig_sock: Originating socket, bound to dest port in source ns of
* originating datagram
* @target_sock: Target socket, bound to source port of originating
- * datagram in dest ns, connected to dest port of
- * originating datagram in dest ns
+ * datagram in dest ns
* @ts: Activity timestamp
*/
struct udp_splice_flow {
@@ -163,8 +157,8 @@ static struct udp_splice_flow udp_splice_to_init[IP_VERSIONS][NUM_PORTS];
enum udp_act_type {
UDP_ACT_TAP,
- UDP_ACT_NS_CONN,
- UDP_ACT_INIT_CONN,
+ UDP_ACT_SPLICE_NS,
+ UDP_ACT_SPLICE_INIT,
UDP_ACT_TYPE_MAX,
};
@@ -398,20 +392,19 @@ static void udp_sock6_iov_init(void)
}
/**
- * udp_splice_connect() - Create and connect socket for "spliced" binding
+ * udp_splice_new() - Create and prepare socket for "spliced" binding
* @c: Execution context
- * @v6: Set for IPv6 connections
+ * @v6: Set for IPv6 sockets
* @bound_sock: Originating bound socket
* @src: Source port of original connection, host order
- * @dst: Destination port of original connection, host order
* @splice: UDP_BACK_TO_INIT from init, UDP_BACK_TO_NS from namespace
*
- * Return: connected socket, negative error code on failure
+ * Return: Prepared socket, negative error code on failure
*
* #syscalls:pasta getsockname
*/
-int udp_splice_connect(const struct ctx *c, int v6, int bound_sock,
- in_port_t src, in_port_t dst, int splice)
+int udp_splice_new(const struct ctx *c, int v6, int bound_sock, in_port_t src,
+ int splice)
{
struct epoll_event ev = { .events = EPOLLIN | EPOLLRDHUP | EPOLLHUP };
union epoll_ref ref = { .r.proto = IPPROTO_UDP,
@@ -424,10 +417,10 @@ int udp_splice_connect(const struct ctx *c, int v6, int bound_sock,
if (splice == UDP_BACK_TO_INIT) {
flow = &udp_splice_to_ns[v6 ? V6 : V4][src];
- act = UDP_ACT_NS_CONN;
+ act = UDP_ACT_SPLICE_NS;
} else {
flow = &udp_splice_to_init[v6 ? V6 : V4][src];
- act = UDP_ACT_INIT_CONN;
+ act = UDP_ACT_SPLICE_INIT;
}
s = socket(v6 ? AF_INET6 : AF_INET, SOCK_DGRAM | SOCK_NONBLOCK,
@@ -451,9 +444,6 @@ int udp_splice_connect(const struct ctx *c, int v6, int bound_sock,
};
if (bind(s, (struct sockaddr *)&addr6, sizeof(addr6)))
goto fail;
- addr6.sin6_port = htons(dst);
- if (connect(s, (struct sockaddr *)&addr6, sizeof(addr6)))
- goto fail;
} else {
struct sockaddr_in addr4 = {
.sin_family = AF_INET,
@@ -462,9 +452,6 @@ int udp_splice_connect(const struct ctx *c, int v6, int bound_sock,
};
if (bind(s, (struct sockaddr *)&addr4, sizeof(addr4)))
goto fail;
- addr4.sin_port = htons(dst);
- if (connect(s, (struct sockaddr *)&addr4, sizeof(addr4)))
- goto fail;
}
flow->orig_sock = bound_sock;
@@ -481,40 +468,39 @@ fail:
}
/**
- * struct udp_splice_connect_ns_arg - Arguments for udp_splice_connect_ns()
+ * struct udp_splice_new_ns_arg - Arguments for udp_splice_new_ns()
* @c: Execution context
- * @v6: Set for inbound IPv6 connection
+ * @v6: Set for IPv6
* @bound_sock: Originating bound socket
- * @src: Source port of original connection, host order
- * @dst: Destination port of original connection, host order
+ * @src: Source port of originating datagram, host order
+ * @dst: Destination port of originating datagram, host order
* @s: Newly created socket or negative error code
*/
-struct udp_splice_connect_ns_arg {
+struct udp_splice_new_ns_arg {
const struct ctx *c;
int v6;
int bound_sock;
in_port_t src;
- in_port_t dst;
int s;
};
/**
- * udp_splice_connect_ns() - Enter namespace and call udp_splice_connect()
- * @arg: See struct udp_splice_connect_ns_arg
+ * udp_splice_new_ns() - Enter namespace and call udp_splice_new()
+ * @arg: See struct udp_splice_new_ns_arg
*
* Return: 0
*/
-static int udp_splice_connect_ns(void *arg)
+static int udp_splice_new_ns(void *arg)
{
- struct udp_splice_connect_ns_arg *a;
+ struct udp_splice_new_ns_arg *a;
- a = (struct udp_splice_connect_ns_arg *)arg;
+ a = (struct udp_splice_new_ns_arg *)arg;
if (ns_enter(a->c))
return 0;
- a->s = udp_splice_connect(a->c, a->v6, a->bound_sock, a->src, a->dst,
- UDP_BACK_TO_INIT);
+ a->s = udp_splice_new(a->c, a->v6, a->bound_sock, a->src,
+ UDP_BACK_TO_INIT);
return 0;
}
@@ -557,11 +543,11 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
src += c->udp.fwd_out.rdelta[src];
if (!(s = udp_splice_to_ns[v6][src].target_sock)) {
- struct udp_splice_connect_ns_arg arg = {
- c, v6, ref.r.s, src, dst, -1,
+ struct udp_splice_new_ns_arg arg = {
+ c, v6, ref.r.s, src, -1,
};
- NS_CALL(udp_splice_connect_ns, &arg);
+ NS_CALL(udp_splice_new_ns, &arg);
if ((s = arg.s) < 0)
return;
}
@@ -575,8 +561,7 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
src += c->udp.fwd_in.rdelta[src];
if (!(s = udp_splice_to_init[v6][src].target_sock)) {
- s = udp_splice_connect(c, v6, ref.r.s, src, dst,
- UDP_BACK_TO_NS);
+ s = udp_splice_new(c, v6, ref.r.s, src, UDP_BACK_TO_NS);
if (s < 0)
return;
}
@@ -1275,14 +1260,14 @@ static void udp_timer_one(struct ctx *c, int v6, enum udp_act_type type,
}
break;
- case UDP_ACT_INIT_CONN:
+ case UDP_ACT_SPLICE_INIT:
flow = &udp_splice_to_init[v6 ? V6 : V4][port];
if (ts->tv_sec - flow->ts > UDP_CONN_TIMEOUT)
s = flow->target_sock;
break;
- case UDP_ACT_NS_CONN:
+ case UDP_ACT_SPLICE_NS:
flow = &udp_splice_to_ns[v6 ? V6 : V4][port];
if (ts->tv_sec - flow->ts > UDP_CONN_TIMEOUT)
--
@@ -45,23 +45,20 @@
*
* - from init to namespace:
*
- * - forward direction: 127.0.0.1:5000 -> 127.0.0.1:80 in init from bound
- * socket s, with epoll reference: index = 80, splice = UDP_TO_NS
+ * - forward direction: 127.0.0.1:5000 -> 127.0.0.1:80 in init from socket s,
+ * with epoll reference: index = 80, splice = UDP_TO_NS
* - if udp_splice_to_ns[V4][5000].target_sock:
* - send packet to udp_splice_to_ns[V4][5000].target_sock, with
* destination port 80
* - otherwise:
* - create new socket udp_splice_to_ns[V4][5000].target_sock
* - bind in namespace to 127.0.0.1:5000
- * - connect in namespace to 127.0.0.1:80 (note: this destination port
- * might be remapped to another port instead)
* - add to epoll with reference: index = 5000, splice: UDP_BACK_TO_INIT
* - set udp_splice_to_ns[V4][5000].orig_sock to s
* - update udp_splice_to_ns[V4][5000].ts with current time
*
- * - reverse direction: 127.0.0.1:80 -> 127.0.0.1:5000 in namespace from
- * connected socket s, having epoll reference: index = 5000,
- * splice = UDP_BACK_TO_INIT
+ * - reverse direction: 127.0.0.1:80 -> 127.0.0.1:5000 in namespace socket s,
+ * having epoll reference: index = 5000, splice = UDP_BACK_TO_INIT
* - if udp_splice_to_ns[V4][5000].orig_sock:
* - send to udp_splice_to_ns[V4][5000].orig_sock, with destination port
* 5000
@@ -69,7 +66,7 @@
*
* - from namespace to init:
*
- * - forward direction: 127.0.0.1:2000 -> 127.0.0.1:22 in namespace from bound
+ * - forward direction: 127.0.0.1:2000 -> 127.0.0.1:22 in namespace from
* socket s, with epoll reference: index = 22, splice = UDP_TO_INIT
* - if udp4_splice_to_init[V4][2000].target_sock:
* - send packet to udp_splice_to_init[V4][2000].target_sock, with
@@ -77,14 +74,12 @@
* - otherwise:
* - create new socket udp_splice_to_init[V4][2000].target_sock
* - bind in init to 127.0.0.1:2000
- * - connect in init to 127.0.0.1:22 (note: this destination port
- * might be remapped to another port instead)
* - add to epoll with reference: index = 2000, splice = UDP_BACK_TO_NS
* - set udp_splice_to_init[V4][2000].orig_sock to s
* - update udp_splice_to_init[V4][2000].ts with current time
*
- * - reverse direction: 127.0.0.1:22 -> 127.0.0.1:2000 in init from connected
- * socket s, having epoll reference: index = 2000, splice = UDP_BACK_TO_NS
+ * - reverse direction: 127.0.0.1:22 -> 127.0.0.1:2000 in init from socket s,
+ * having epoll reference: index = 2000, splice = UDP_BACK_TO_NS
* - if udp_splice_to_init[V4][2000].orig_sock:
* - send to udp_splice_to_init[V4][2000].orig_sock, with destination port
* 2000
@@ -144,8 +139,7 @@ struct udp_tap_port {
* @orig_sock: Originating socket, bound to dest port in source ns of
* originating datagram
* @target_sock: Target socket, bound to source port of originating
- * datagram in dest ns, connected to dest port of
- * originating datagram in dest ns
+ * datagram in dest ns
* @ts: Activity timestamp
*/
struct udp_splice_flow {
@@ -163,8 +157,8 @@ static struct udp_splice_flow udp_splice_to_init[IP_VERSIONS][NUM_PORTS];
enum udp_act_type {
UDP_ACT_TAP,
- UDP_ACT_NS_CONN,
- UDP_ACT_INIT_CONN,
+ UDP_ACT_SPLICE_NS,
+ UDP_ACT_SPLICE_INIT,
UDP_ACT_TYPE_MAX,
};
@@ -398,20 +392,19 @@ static void udp_sock6_iov_init(void)
}
/**
- * udp_splice_connect() - Create and connect socket for "spliced" binding
+ * udp_splice_new() - Create and prepare socket for "spliced" binding
* @c: Execution context
- * @v6: Set for IPv6 connections
+ * @v6: Set for IPv6 sockets
* @bound_sock: Originating bound socket
* @src: Source port of original connection, host order
- * @dst: Destination port of original connection, host order
* @splice: UDP_BACK_TO_INIT from init, UDP_BACK_TO_NS from namespace
*
- * Return: connected socket, negative error code on failure
+ * Return: Prepared socket, negative error code on failure
*
* #syscalls:pasta getsockname
*/
-int udp_splice_connect(const struct ctx *c, int v6, int bound_sock,
- in_port_t src, in_port_t dst, int splice)
+int udp_splice_new(const struct ctx *c, int v6, int bound_sock, in_port_t src,
+ int splice)
{
struct epoll_event ev = { .events = EPOLLIN | EPOLLRDHUP | EPOLLHUP };
union epoll_ref ref = { .r.proto = IPPROTO_UDP,
@@ -424,10 +417,10 @@ int udp_splice_connect(const struct ctx *c, int v6, int bound_sock,
if (splice == UDP_BACK_TO_INIT) {
flow = &udp_splice_to_ns[v6 ? V6 : V4][src];
- act = UDP_ACT_NS_CONN;
+ act = UDP_ACT_SPLICE_NS;
} else {
flow = &udp_splice_to_init[v6 ? V6 : V4][src];
- act = UDP_ACT_INIT_CONN;
+ act = UDP_ACT_SPLICE_INIT;
}
s = socket(v6 ? AF_INET6 : AF_INET, SOCK_DGRAM | SOCK_NONBLOCK,
@@ -451,9 +444,6 @@ int udp_splice_connect(const struct ctx *c, int v6, int bound_sock,
};
if (bind(s, (struct sockaddr *)&addr6, sizeof(addr6)))
goto fail;
- addr6.sin6_port = htons(dst);
- if (connect(s, (struct sockaddr *)&addr6, sizeof(addr6)))
- goto fail;
} else {
struct sockaddr_in addr4 = {
.sin_family = AF_INET,
@@ -462,9 +452,6 @@ int udp_splice_connect(const struct ctx *c, int v6, int bound_sock,
};
if (bind(s, (struct sockaddr *)&addr4, sizeof(addr4)))
goto fail;
- addr4.sin_port = htons(dst);
- if (connect(s, (struct sockaddr *)&addr4, sizeof(addr4)))
- goto fail;
}
flow->orig_sock = bound_sock;
@@ -481,40 +468,39 @@ fail:
}
/**
- * struct udp_splice_connect_ns_arg - Arguments for udp_splice_connect_ns()
+ * struct udp_splice_new_ns_arg - Arguments for udp_splice_new_ns()
* @c: Execution context
- * @v6: Set for inbound IPv6 connection
+ * @v6: Set for IPv6
* @bound_sock: Originating bound socket
- * @src: Source port of original connection, host order
- * @dst: Destination port of original connection, host order
+ * @src: Source port of originating datagram, host order
+ * @dst: Destination port of originating datagram, host order
* @s: Newly created socket or negative error code
*/
-struct udp_splice_connect_ns_arg {
+struct udp_splice_new_ns_arg {
const struct ctx *c;
int v6;
int bound_sock;
in_port_t src;
- in_port_t dst;
int s;
};
/**
- * udp_splice_connect_ns() - Enter namespace and call udp_splice_connect()
- * @arg: See struct udp_splice_connect_ns_arg
+ * udp_splice_new_ns() - Enter namespace and call udp_splice_new()
+ * @arg: See struct udp_splice_new_ns_arg
*
* Return: 0
*/
-static int udp_splice_connect_ns(void *arg)
+static int udp_splice_new_ns(void *arg)
{
- struct udp_splice_connect_ns_arg *a;
+ struct udp_splice_new_ns_arg *a;
- a = (struct udp_splice_connect_ns_arg *)arg;
+ a = (struct udp_splice_new_ns_arg *)arg;
if (ns_enter(a->c))
return 0;
- a->s = udp_splice_connect(a->c, a->v6, a->bound_sock, a->src, a->dst,
- UDP_BACK_TO_INIT);
+ a->s = udp_splice_new(a->c, a->v6, a->bound_sock, a->src,
+ UDP_BACK_TO_INIT);
return 0;
}
@@ -557,11 +543,11 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
src += c->udp.fwd_out.rdelta[src];
if (!(s = udp_splice_to_ns[v6][src].target_sock)) {
- struct udp_splice_connect_ns_arg arg = {
- c, v6, ref.r.s, src, dst, -1,
+ struct udp_splice_new_ns_arg arg = {
+ c, v6, ref.r.s, src, -1,
};
- NS_CALL(udp_splice_connect_ns, &arg);
+ NS_CALL(udp_splice_new_ns, &arg);
if ((s = arg.s) < 0)
return;
}
@@ -575,8 +561,7 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
src += c->udp.fwd_in.rdelta[src];
if (!(s = udp_splice_to_init[v6][src].target_sock)) {
- s = udp_splice_connect(c, v6, ref.r.s, src, dst,
- UDP_BACK_TO_NS);
+ s = udp_splice_new(c, v6, ref.r.s, src, UDP_BACK_TO_NS);
if (s < 0)
return;
}
@@ -1275,14 +1260,14 @@ static void udp_timer_one(struct ctx *c, int v6, enum udp_act_type type,
}
break;
- case UDP_ACT_INIT_CONN:
+ case UDP_ACT_SPLICE_INIT:
flow = &udp_splice_to_init[v6 ? V6 : V4][port];
if (ts->tv_sec - flow->ts > UDP_CONN_TIMEOUT)
s = flow->target_sock;
break;
- case UDP_ACT_NS_CONN:
+ case UDP_ACT_SPLICE_NS:
flow = &udp_splice_to_ns[v6 ? V6 : V4][port];
if (ts->tv_sec - flow->ts > UDP_CONN_TIMEOUT)
--
2.38.1
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH v2 04/16] udp: Don't connect "forward" sockets for spliced flows
2022-11-24 1:16 ` [PATCH v2 04/16] udp: Don't connect "forward" sockets for spliced flows David Gibson
@ 2022-11-25 1:47 ` Stefano Brivio
2022-11-25 7:07 ` David Gibson
0 siblings, 1 reply; 29+ messages in thread
From: Stefano Brivio @ 2022-11-25 1:47 UTC (permalink / raw)
To: David Gibson; +Cc: passt-dev
Nit:
On Thu, 24 Nov 2022 12:16:47 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:
> Currently we connect() the socket we use to forward spliced UDP flows.
> However, we now only ever use sendto() rather than send() on this socket
> so there's not actually any need to connect it. Don't do so.
>
> Rename a number of things that referred to "connect" or "conn" since that
> would now be misleading.
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> udp.c | 85 ++++++++++++++++++++++++-----------------------------------
> 1 file changed, 35 insertions(+), 50 deletions(-)
>
> diff --git a/udp.c b/udp.c
> index 1e78da5..ed095ec 100644
> --- a/udp.c
> +++ b/udp.c
> @@ -45,23 +45,20 @@
> *
> * - from init to namespace:
> *
> - * - forward direction: 127.0.0.1:5000 -> 127.0.0.1:80 in init from bound
> - * socket s, with epoll reference: index = 80, splice = UDP_TO_NS
> + * - forward direction: 127.0.0.1:5000 -> 127.0.0.1:80 in init from socket s,
> + * with epoll reference: index = 80, splice = UDP_TO_NS
> * - if udp_splice_to_ns[V4][5000].target_sock:
> * - send packet to udp_splice_to_ns[V4][5000].target_sock, with
> * destination port 80
> * - otherwise:
> * - create new socket udp_splice_to_ns[V4][5000].target_sock
> * - bind in namespace to 127.0.0.1:5000
> - * - connect in namespace to 127.0.0.1:80 (note: this destination port
> - * might be remapped to another port instead)
> * - add to epoll with reference: index = 5000, splice: UDP_BACK_TO_INIT
> * - set udp_splice_to_ns[V4][5000].orig_sock to s
> * - update udp_splice_to_ns[V4][5000].ts with current time
> *
> - * - reverse direction: 127.0.0.1:80 -> 127.0.0.1:5000 in namespace from
> - * connected socket s, having epoll reference: index = 5000,
> - * splice = UDP_BACK_TO_INIT
> + * - reverse direction: 127.0.0.1:80 -> 127.0.0.1:5000 in namespace socket s,
> + * having epoll reference: index = 5000, splice = UDP_BACK_TO_INIT
> * - if udp_splice_to_ns[V4][5000].orig_sock:
> * - send to udp_splice_to_ns[V4][5000].orig_sock, with destination port
> * 5000
> @@ -69,7 +66,7 @@
> *
> * - from namespace to init:
> *
> - * - forward direction: 127.0.0.1:2000 -> 127.0.0.1:22 in namespace from bound
> + * - forward direction: 127.0.0.1:2000 -> 127.0.0.1:22 in namespace from
> * socket s, with epoll reference: index = 22, splice = UDP_TO_INIT
> * - if udp4_splice_to_init[V4][2000].target_sock:
> * - send packet to udp_splice_to_init[V4][2000].target_sock, with
> @@ -77,14 +74,12 @@
> * - otherwise:
> * - create new socket udp_splice_to_init[V4][2000].target_sock
> * - bind in init to 127.0.0.1:2000
> - * - connect in init to 127.0.0.1:22 (note: this destination port
> - * might be remapped to another port instead)
> * - add to epoll with reference: index = 2000, splice = UDP_BACK_TO_NS
> * - set udp_splice_to_init[V4][2000].orig_sock to s
> * - update udp_splice_to_init[V4][2000].ts with current time
> *
> - * - reverse direction: 127.0.0.1:22 -> 127.0.0.1:2000 in init from connected
> - * socket s, having epoll reference: index = 2000, splice = UDP_BACK_TO_NS
> + * - reverse direction: 127.0.0.1:22 -> 127.0.0.1:2000 in init from socket s,
> + * having epoll reference: index = 2000, splice = UDP_BACK_TO_NS
> * - if udp_splice_to_init[V4][2000].orig_sock:
> * - send to udp_splice_to_init[V4][2000].orig_sock, with destination port
> * 2000
> @@ -144,8 +139,7 @@ struct udp_tap_port {
> * @orig_sock: Originating socket, bound to dest port in source ns of
> * originating datagram
> * @target_sock: Target socket, bound to source port of originating
> - * datagram in dest ns, connected to dest port of
> - * originating datagram in dest ns
> + * datagram in dest ns
> * @ts: Activity timestamp
> */
> struct udp_splice_flow {
> @@ -163,8 +157,8 @@ static struct udp_splice_flow udp_splice_to_init[IP_VERSIONS][NUM_PORTS];
>
> enum udp_act_type {
> UDP_ACT_TAP,
> - UDP_ACT_NS_CONN,
> - UDP_ACT_INIT_CONN,
> + UDP_ACT_SPLICE_NS,
> + UDP_ACT_SPLICE_INIT,
> UDP_ACT_TYPE_MAX,
> };
>
> @@ -398,20 +392,19 @@ static void udp_sock6_iov_init(void)
> }
>
> /**
> - * udp_splice_connect() - Create and connect socket for "spliced" binding
> + * udp_splice_new() - Create and prepare socket for "spliced" binding
> * @c: Execution context
> - * @v6: Set for IPv6 connections
> + * @v6: Set for IPv6 sockets
> * @bound_sock: Originating bound socket
> * @src: Source port of original connection, host order
> - * @dst: Destination port of original connection, host order
> * @splice: UDP_BACK_TO_INIT from init, UDP_BACK_TO_NS from namespace
> *
> - * Return: connected socket, negative error code on failure
> + * Return: Prepared socket, negative error code on failure
prepared
--
Stefano
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 04/16] udp: Don't connect "forward" sockets for spliced flows
2022-11-25 1:47 ` Stefano Brivio
@ 2022-11-25 7:07 ` David Gibson
2022-12-01 18:49 ` Stefano Brivio
0 siblings, 1 reply; 29+ messages in thread
From: David Gibson @ 2022-11-25 7:07 UTC (permalink / raw)
To: Stefano Brivio; +Cc: passt-dev
[-- Attachment #1: Type: text/plain, Size: 5728 bytes --]
On Fri, Nov 25, 2022 at 02:47:59AM +0100, Stefano Brivio wrote:
> Nit:
>
> On Thu, 24 Nov 2022 12:16:47 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
>
> > Currently we connect() the socket we use to forward spliced UDP flows.
> > However, we now only ever use sendto() rather than send() on this socket
> > so there's not actually any need to connect it. Don't do so.
> >
> > Rename a number of things that referred to "connect" or "conn" since that
> > would now be misleading.
> >
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> > udp.c | 85 ++++++++++++++++++++++++-----------------------------------
> > 1 file changed, 35 insertions(+), 50 deletions(-)
> >
> > diff --git a/udp.c b/udp.c
> > index 1e78da5..ed095ec 100644
> > --- a/udp.c
> > +++ b/udp.c
> > @@ -45,23 +45,20 @@
> > *
> > * - from init to namespace:
> > *
> > - * - forward direction: 127.0.0.1:5000 -> 127.0.0.1:80 in init from bound
> > - * socket s, with epoll reference: index = 80, splice = UDP_TO_NS
> > + * - forward direction: 127.0.0.1:5000 -> 127.0.0.1:80 in init from socket s,
> > + * with epoll reference: index = 80, splice = UDP_TO_NS
> > * - if udp_splice_to_ns[V4][5000].target_sock:
> > * - send packet to udp_splice_to_ns[V4][5000].target_sock, with
> > * destination port 80
> > * - otherwise:
> > * - create new socket udp_splice_to_ns[V4][5000].target_sock
> > * - bind in namespace to 127.0.0.1:5000
> > - * - connect in namespace to 127.0.0.1:80 (note: this destination port
> > - * might be remapped to another port instead)
> > * - add to epoll with reference: index = 5000, splice: UDP_BACK_TO_INIT
> > * - set udp_splice_to_ns[V4][5000].orig_sock to s
> > * - update udp_splice_to_ns[V4][5000].ts with current time
> > *
> > - * - reverse direction: 127.0.0.1:80 -> 127.0.0.1:5000 in namespace from
> > - * connected socket s, having epoll reference: index = 5000,
> > - * splice = UDP_BACK_TO_INIT
> > + * - reverse direction: 127.0.0.1:80 -> 127.0.0.1:5000 in namespace socket s,
> > + * having epoll reference: index = 5000, splice = UDP_BACK_TO_INIT
> > * - if udp_splice_to_ns[V4][5000].orig_sock:
> > * - send to udp_splice_to_ns[V4][5000].orig_sock, with destination port
> > * 5000
> > @@ -69,7 +66,7 @@
> > *
> > * - from namespace to init:
> > *
> > - * - forward direction: 127.0.0.1:2000 -> 127.0.0.1:22 in namespace from bound
> > + * - forward direction: 127.0.0.1:2000 -> 127.0.0.1:22 in namespace from
> > * socket s, with epoll reference: index = 22, splice = UDP_TO_INIT
> > * - if udp4_splice_to_init[V4][2000].target_sock:
> > * - send packet to udp_splice_to_init[V4][2000].target_sock, with
> > @@ -77,14 +74,12 @@
> > * - otherwise:
> > * - create new socket udp_splice_to_init[V4][2000].target_sock
> > * - bind in init to 127.0.0.1:2000
> > - * - connect in init to 127.0.0.1:22 (note: this destination port
> > - * might be remapped to another port instead)
> > * - add to epoll with reference: index = 2000, splice = UDP_BACK_TO_NS
> > * - set udp_splice_to_init[V4][2000].orig_sock to s
> > * - update udp_splice_to_init[V4][2000].ts with current time
> > *
> > - * - reverse direction: 127.0.0.1:22 -> 127.0.0.1:2000 in init from connected
> > - * socket s, having epoll reference: index = 2000, splice = UDP_BACK_TO_NS
> > + * - reverse direction: 127.0.0.1:22 -> 127.0.0.1:2000 in init from socket s,
> > + * having epoll reference: index = 2000, splice = UDP_BACK_TO_NS
> > * - if udp_splice_to_init[V4][2000].orig_sock:
> > * - send to udp_splice_to_init[V4][2000].orig_sock, with destination port
> > * 2000
> > @@ -144,8 +139,7 @@ struct udp_tap_port {
> > * @orig_sock: Originating socket, bound to dest port in source ns of
> > * originating datagram
> > * @target_sock: Target socket, bound to source port of originating
> > - * datagram in dest ns, connected to dest port of
> > - * originating datagram in dest ns
> > + * datagram in dest ns
> > * @ts: Activity timestamp
> > */
> > struct udp_splice_flow {
> > @@ -163,8 +157,8 @@ static struct udp_splice_flow udp_splice_to_init[IP_VERSIONS][NUM_PORTS];
> >
> > enum udp_act_type {
> > UDP_ACT_TAP,
> > - UDP_ACT_NS_CONN,
> > - UDP_ACT_INIT_CONN,
> > + UDP_ACT_SPLICE_NS,
> > + UDP_ACT_SPLICE_INIT,
> > UDP_ACT_TYPE_MAX,
> > };
> >
> > @@ -398,20 +392,19 @@ static void udp_sock6_iov_init(void)
> > }
> >
> > /**
> > - * udp_splice_connect() - Create and connect socket for "spliced" binding
> > + * udp_splice_new() - Create and prepare socket for "spliced" binding
> > * @c: Execution context
> > - * @v6: Set for IPv6 connections
> > + * @v6: Set for IPv6 sockets
> > * @bound_sock: Originating bound socket
> > * @src: Source port of original connection, host order
> > - * @dst: Destination port of original connection, host order
> > * @splice: UDP_BACK_TO_INIT from init, UDP_BACK_TO_NS from namespace
> > *
> > - * Return: connected socket, negative error code on failure
> > + * Return: Prepared socket, negative error code on failure
>
> prepared
Fixed. Is there a particular rationale for capitalizing on the
parameters, but not the return description?
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 04/16] udp: Don't connect "forward" sockets for spliced flows
2022-11-25 7:07 ` David Gibson
@ 2022-12-01 18:49 ` Stefano Brivio
0 siblings, 0 replies; 29+ messages in thread
From: Stefano Brivio @ 2022-12-01 18:49 UTC (permalink / raw)
To: David Gibson; +Cc: passt-dev
On Fri, 25 Nov 2022 18:07:32 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:
> On Fri, Nov 25, 2022 at 02:47:59AM +0100, Stefano Brivio wrote:
> > Nit:
> >
> > On Thu, 24 Nov 2022 12:16:47 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >
> > > Currently we connect() the socket we use to forward spliced UDP flows.
> > > However, we now only ever use sendto() rather than send() on this socket
> > > so there's not actually any need to connect it. Don't do so.
> > >
> > > Rename a number of things that referred to "connect" or "conn" since that
> > > would now be misleading.
> > >
> > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > > ---
> > > udp.c | 85 ++++++++++++++++++++++++-----------------------------------
> > > 1 file changed, 35 insertions(+), 50 deletions(-)
> > >
> > > diff --git a/udp.c b/udp.c
> > > index 1e78da5..ed095ec 100644
> > > --- a/udp.c
> > > +++ b/udp.c
> > > @@ -45,23 +45,20 @@
> > > *
> > > * - from init to namespace:
> > > *
> > > - * - forward direction: 127.0.0.1:5000 -> 127.0.0.1:80 in init from bound
> > > - * socket s, with epoll reference: index = 80, splice = UDP_TO_NS
> > > + * - forward direction: 127.0.0.1:5000 -> 127.0.0.1:80 in init from socket s,
> > > + * with epoll reference: index = 80, splice = UDP_TO_NS
> > > * - if udp_splice_to_ns[V4][5000].target_sock:
> > > * - send packet to udp_splice_to_ns[V4][5000].target_sock, with
> > > * destination port 80
> > > * - otherwise:
> > > * - create new socket udp_splice_to_ns[V4][5000].target_sock
> > > * - bind in namespace to 127.0.0.1:5000
> > > - * - connect in namespace to 127.0.0.1:80 (note: this destination port
> > > - * might be remapped to another port instead)
> > > * - add to epoll with reference: index = 5000, splice: UDP_BACK_TO_INIT
> > > * - set udp_splice_to_ns[V4][5000].orig_sock to s
> > > * - update udp_splice_to_ns[V4][5000].ts with current time
> > > *
> > > - * - reverse direction: 127.0.0.1:80 -> 127.0.0.1:5000 in namespace from
> > > - * connected socket s, having epoll reference: index = 5000,
> > > - * splice = UDP_BACK_TO_INIT
> > > + * - reverse direction: 127.0.0.1:80 -> 127.0.0.1:5000 in namespace socket s,
> > > + * having epoll reference: index = 5000, splice = UDP_BACK_TO_INIT
> > > * - if udp_splice_to_ns[V4][5000].orig_sock:
> > > * - send to udp_splice_to_ns[V4][5000].orig_sock, with destination port
> > > * 5000
> > > @@ -69,7 +66,7 @@
> > > *
> > > * - from namespace to init:
> > > *
> > > - * - forward direction: 127.0.0.1:2000 -> 127.0.0.1:22 in namespace from bound
> > > + * - forward direction: 127.0.0.1:2000 -> 127.0.0.1:22 in namespace from
> > > * socket s, with epoll reference: index = 22, splice = UDP_TO_INIT
> > > * - if udp4_splice_to_init[V4][2000].target_sock:
> > > * - send packet to udp_splice_to_init[V4][2000].target_sock, with
> > > @@ -77,14 +74,12 @@
> > > * - otherwise:
> > > * - create new socket udp_splice_to_init[V4][2000].target_sock
> > > * - bind in init to 127.0.0.1:2000
> > > - * - connect in init to 127.0.0.1:22 (note: this destination port
> > > - * might be remapped to another port instead)
> > > * - add to epoll with reference: index = 2000, splice = UDP_BACK_TO_NS
> > > * - set udp_splice_to_init[V4][2000].orig_sock to s
> > > * - update udp_splice_to_init[V4][2000].ts with current time
> > > *
> > > - * - reverse direction: 127.0.0.1:22 -> 127.0.0.1:2000 in init from connected
> > > - * socket s, having epoll reference: index = 2000, splice = UDP_BACK_TO_NS
> > > + * - reverse direction: 127.0.0.1:22 -> 127.0.0.1:2000 in init from socket s,
> > > + * having epoll reference: index = 2000, splice = UDP_BACK_TO_NS
> > > * - if udp_splice_to_init[V4][2000].orig_sock:
> > > * - send to udp_splice_to_init[V4][2000].orig_sock, with destination port
> > > * 2000
> > > @@ -144,8 +139,7 @@ struct udp_tap_port {
> > > * @orig_sock: Originating socket, bound to dest port in source ns of
> > > * originating datagram
> > > * @target_sock: Target socket, bound to source port of originating
> > > - * datagram in dest ns, connected to dest port of
> > > - * originating datagram in dest ns
> > > + * datagram in dest ns
> > > * @ts: Activity timestamp
> > > */
> > > struct udp_splice_flow {
> > > @@ -163,8 +157,8 @@ static struct udp_splice_flow udp_splice_to_init[IP_VERSIONS][NUM_PORTS];
> > >
> > > enum udp_act_type {
> > > UDP_ACT_TAP,
> > > - UDP_ACT_NS_CONN,
> > > - UDP_ACT_INIT_CONN,
> > > + UDP_ACT_SPLICE_NS,
> > > + UDP_ACT_SPLICE_INIT,
> > > UDP_ACT_TYPE_MAX,
> > > };
> > >
> > > @@ -398,20 +392,19 @@ static void udp_sock6_iov_init(void)
> > > }
> > >
> > > /**
> > > - * udp_splice_connect() - Create and connect socket for "spliced" binding
> > > + * udp_splice_new() - Create and prepare socket for "spliced" binding
> > > * @c: Execution context
> > > - * @v6: Set for IPv6 connections
> > > + * @v6: Set for IPv6 sockets
> > > * @bound_sock: Originating bound socket
> > > * @src: Source port of original connection, host order
> > > - * @dst: Destination port of original connection, host order
> > > * @splice: UDP_BACK_TO_INIT from init, UDP_BACK_TO_NS from namespace
> > > *
> > > - * Return: connected socket, negative error code on failure
> > > + * Return: Prepared socket, negative error code on failure
> >
> > prepared
>
> Fixed. Is there a particular rationale for capitalizing on the
> parameters, but not the return description?
Sorry, I started reviewing your v4 and noticed I forgot to answer this.
It's actually inconsistent with the Kernel-doc style which pretty much
inspires this:
https://docs.kernel.org/doc-guide/kernel-doc.html
so I started like that because in kernel's net/ directory the
non-capitalised version is more common:
$ grep -rnI "Return: [A-Z][ a-z]" | wc -l
66
$ grep -rnI "Return: [a-z]" | wc -l
298
and I'm simply used to that -- but it's also true I added 19 of those.
My very personal interpretation of the rationale is that in this case we
form a sentence, and at the same time sentences certainly don't start
with '@'.
Actually, if we capitalise them, we could at least refer to those as
coding style guidelines. :) I'd rather "fix" this all at once when we
get to https://bugs.passt.top/show_bug.cgi?id=35.
--
Stefano
^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH v2 05/16] udp: Remove the @bound field from union udp_epoll_ref
2022-11-24 1:16 [PATCH v2 00/16] Simplify and correct handling of "spliced" UDP forwarding David Gibson
` (3 preceding siblings ...)
2022-11-24 1:16 ` [PATCH v2 04/16] udp: Don't connect "forward" sockets for spliced flows David Gibson
@ 2022-11-24 1:16 ` David Gibson
2022-11-24 1:16 ` [PATCH v2 06/16] udp: Split splice field in udp_epoll_ref into (mostly) independent bits David Gibson
` (10 subsequent siblings)
15 siblings, 0 replies; 29+ messages in thread
From: David Gibson @ 2022-11-24 1:16 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
We set this field, but nothing ever checked it.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
udp.c | 8 +++-----
udp.h | 3 +--
2 files changed, 4 insertions(+), 7 deletions(-)
diff --git a/udp.c b/udp.c
index ed095ec..2c5e4e9 100644
--- a/udp.c
+++ b/udp.c
@@ -952,8 +952,7 @@ int udp_tap_handler(struct ctx *c, int af, const void *addr,
sl = sizeof(s_in);
if (!(s = udp_tap_map[V4][src].sock)) {
- union udp_epoll_ref uref = { .udp.bound = 1,
- .udp.port = src };
+ union udp_epoll_ref uref = { .udp.port = src };
s = sock_l4(c, AF_INET, IPPROTO_UDP, NULL, NULL, src,
uref.u32);
@@ -1005,8 +1004,7 @@ int udp_tap_handler(struct ctx *c, int af, const void *addr,
}
if (!(s = udp_tap_map[V6][src].sock)) {
- union udp_epoll_ref uref = { .udp.bound = 1,
- .udp.v6 = 1,
+ union udp_epoll_ref uref = { .udp.v6 = 1,
.udp.port = src };
s = sock_l4(c, AF_INET6, IPPROTO_UDP, bind_addr, NULL,
@@ -1069,7 +1067,7 @@ int udp_tap_handler(struct ctx *c, int af, const void *addr,
void udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
const void *addr, const char *ifname, in_port_t port)
{
- union udp_epoll_ref uref = { .udp.bound = 1 };
+ union udp_epoll_ref uref = { .u32 = 0 };
const void *bind_addr;
int s;
diff --git a/udp.h b/udp.h
index 2ac8610..43bd28a 100644
--- a/udp.h
+++ b/udp.h
@@ -29,8 +29,7 @@ void udp_update_l2_buf(const unsigned char *eth_d, const unsigned char *eth_s,
*/
union udp_epoll_ref {
struct {
- uint32_t bound:1,
- splice:3,
+ uint32_t splice:3,
#define UDP_TO_NS 1
#define UDP_TO_INIT 2
#define UDP_BACK_TO_NS 3
--
@@ -29,8 +29,7 @@ void udp_update_l2_buf(const unsigned char *eth_d, const unsigned char *eth_s,
*/
union udp_epoll_ref {
struct {
- uint32_t bound:1,
- splice:3,
+ uint32_t splice:3,
#define UDP_TO_NS 1
#define UDP_TO_INIT 2
#define UDP_BACK_TO_NS 3
--
2.38.1
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v2 06/16] udp: Split splice field in udp_epoll_ref into (mostly) independent bits
2022-11-24 1:16 [PATCH v2 00/16] Simplify and correct handling of "spliced" UDP forwarding David Gibson
` (4 preceding siblings ...)
2022-11-24 1:16 ` [PATCH v2 05/16] udp: Remove the @bound field from union udp_epoll_ref David Gibson
@ 2022-11-24 1:16 ` David Gibson
2022-11-24 1:16 ` [PATCH v2 07/16] udp: Don't create double sockets for -U port David Gibson
` (9 subsequent siblings)
15 siblings, 0 replies; 29+ messages in thread
From: David Gibson @ 2022-11-24 1:16 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
The @splice field in union udp_epoll_ref can have a number of values for
different types of "spliced" packet flows. Split it into several single
bit fields with more or less independent meanings. The new @splice field
is just a boolean indicating whether the socket is associated with a
spliced flow, making it identical to the @splice fiend in tcp_epoll_ref.
The new bit @orig, indicates whether this is a socket which can originate
new udp packet flows (created with -u or -U) or a socket created on the
fly to handle reply socket. @ns indicates whether the socket lives in the
init namespace or the pasta namespace.
Making these bits more orthogonal to each other will simplify some future
cleanups.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
passt.h | 2 ++
udp.c | 53 ++++++++++++++++++++++++++---------------------------
udp.h | 15 +++++++--------
3 files changed, 35 insertions(+), 35 deletions(-)
diff --git a/passt.h b/passt.h
index 6649c0a..01d8893 100644
--- a/passt.h
+++ b/passt.h
@@ -31,6 +31,8 @@ struct tap_l4_msg {
union epoll_ref;
+#include <stdbool.h>
+
#include "packet.h"
#include "icmp.h"
#include "port_fwd.h"
diff --git a/udp.c b/udp.c
index 2c5e4e9..4c87e66 100644
--- a/udp.c
+++ b/udp.c
@@ -46,19 +46,20 @@
* - from init to namespace:
*
* - forward direction: 127.0.0.1:5000 -> 127.0.0.1:80 in init from socket s,
- * with epoll reference: index = 80, splice = UDP_TO_NS
+ * with epoll reference: index = 80, splice = 1, orig = 1, ns = 0
* - if udp_splice_to_ns[V4][5000].target_sock:
* - send packet to udp_splice_to_ns[V4][5000].target_sock, with
* destination port 80
* - otherwise:
* - create new socket udp_splice_to_ns[V4][5000].target_sock
* - bind in namespace to 127.0.0.1:5000
- * - add to epoll with reference: index = 5000, splice: UDP_BACK_TO_INIT
+ * - add to epoll with reference: index = 5000, splice = 1, orig = 0,
+ * ns = 1
* - set udp_splice_to_ns[V4][5000].orig_sock to s
* - update udp_splice_to_ns[V4][5000].ts with current time
*
* - reverse direction: 127.0.0.1:80 -> 127.0.0.1:5000 in namespace socket s,
- * having epoll reference: index = 5000, splice = UDP_BACK_TO_INIT
+ * having epoll reference: index = 5000, splice = 1, orig = 0, ns = 1
* - if udp_splice_to_ns[V4][5000].orig_sock:
* - send to udp_splice_to_ns[V4][5000].orig_sock, with destination port
* 5000
@@ -67,19 +68,20 @@
* - from namespace to init:
*
* - forward direction: 127.0.0.1:2000 -> 127.0.0.1:22 in namespace from
- * socket s, with epoll reference: index = 22, splice = UDP_TO_INIT
+ * socket s, with epoll reference: index = 22, splice = 1, orig = 1, ns = 1
* - if udp4_splice_to_init[V4][2000].target_sock:
* - send packet to udp_splice_to_init[V4][2000].target_sock, with
* destination port 22
* - otherwise:
* - create new socket udp_splice_to_init[V4][2000].target_sock
* - bind in init to 127.0.0.1:2000
- * - add to epoll with reference: index = 2000, splice = UDP_BACK_TO_NS
+ * - add to epoll with reference: index = 2000, splice = 1, orig = 0,
+ * ns = 0
* - set udp_splice_to_init[V4][2000].orig_sock to s
* - update udp_splice_to_init[V4][2000].ts with current time
*
* - reverse direction: 127.0.0.1:22 -> 127.0.0.1:2000 in init from socket s,
- * having epoll reference: index = 2000, splice = UDP_BACK_TO_NS
+ * having epoll reference: index = 2000, splice = 1, orig = 0, ns = 0
* - if udp_splice_to_init[V4][2000].orig_sock:
* - send to udp_splice_to_init[V4][2000].orig_sock, with destination port
* 2000
@@ -404,18 +406,18 @@ static void udp_sock6_iov_init(void)
* #syscalls:pasta getsockname
*/
int udp_splice_new(const struct ctx *c, int v6, int bound_sock, in_port_t src,
- int splice)
+ bool ns)
{
struct epoll_event ev = { .events = EPOLLIN | EPOLLRDHUP | EPOLLHUP };
union epoll_ref ref = { .r.proto = IPPROTO_UDP,
- .r.p.udp.udp = { .splice = splice, .v6 = v6,
- .port = src }
+ .r.p.udp.udp = { .splice = true, .ns = ns,
+ .v6 = v6, .port = src }
};
struct udp_splice_flow *flow;
int s;
int act;
- if (splice == UDP_BACK_TO_INIT) {
+ if (ns) {
flow = &udp_splice_to_ns[v6 ? V6 : V4][src];
act = UDP_ACT_SPLICE_NS;
} else {
@@ -499,8 +501,7 @@ static int udp_splice_new_ns(void *arg)
if (ns_enter(a->c))
return 0;
- a->s = udp_splice_new(a->c, a->v6, a->bound_sock, a->src,
- UDP_BACK_TO_INIT);
+ a->s = udp_splice_new(a->c, a->v6, a->bound_sock, a->src, true);
return 0;
}
@@ -538,8 +539,8 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
src = ntohs(sa->sin_port);
}
- switch (ref.r.p.udp.udp.splice) {
- case UDP_TO_NS:
+
+ if (ref.r.p.udp.udp.orig && !ref.r.p.udp.udp.ns) {
src += c->udp.fwd_out.rdelta[src];
if (!(s = udp_splice_to_ns[v6][src].target_sock)) {
@@ -551,27 +552,24 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
if ((s = arg.s) < 0)
return;
}
+
udp_splice_to_ns[v6][src].ts = now->tv_sec;
- break;
- case UDP_BACK_TO_INIT:
+ } else if (!ref.r.p.udp.udp.orig && ref.r.p.udp.udp.ns) {
if (!(s = udp_splice_to_ns[v6][dst].orig_sock))
return;
- break;
- case UDP_TO_INIT:
+ } else if (ref.r.p.udp.udp.orig && ref.r.p.udp.udp.ns) {
src += c->udp.fwd_in.rdelta[src];
if (!(s = udp_splice_to_init[v6][src].target_sock)) {
- s = udp_splice_new(c, v6, ref.r.s, src, UDP_BACK_TO_NS);
+ s = udp_splice_new(c, v6, ref.r.s, src, false);
if (s < 0)
return;
}
udp_splice_to_init[v6][src].ts = now->tv_sec;
- break;
- case UDP_BACK_TO_NS:
+ } else if (!ref.r.p.udp.udp.orig && !ref.r.p.udp.udp.ns) {
if (!(s = udp_splice_to_init[v6][dst].orig_sock))
return;
- break;
- default:
+ } else {
return;
}
@@ -1097,15 +1095,16 @@ void udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
if (c->mode == MODE_PASTA) {
bind_addr = &(uint32_t){ htonl(INADDR_LOOPBACK) };
- uref.udp.splice = UDP_TO_NS;
+ uref.udp.splice = uref.udp.orig = true;
sock_l4(c, AF_INET, IPPROTO_UDP, bind_addr, ifname,
port, uref.u32);
}
if (ns) {
+ uref.udp.splice = uref.udp.orig = uref.udp.ns = true;
+
bind_addr = &(uint32_t){ htonl(INADDR_LOOPBACK) };
- uref.udp.splice = UDP_TO_INIT;
sock_l4(c, AF_INET, IPPROTO_UDP, bind_addr, ifname,
port, uref.u32);
@@ -1130,7 +1129,7 @@ void udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
if (c->mode == MODE_PASTA) {
bind_addr = &in6addr_loopback;
- uref.udp.splice = UDP_TO_NS;
+ uref.udp.splice = uref.udp.orig = true;
sock_l4(c, AF_INET6, IPPROTO_UDP, bind_addr, ifname,
port, uref.u32);
@@ -1138,7 +1137,7 @@ void udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
if (ns) {
bind_addr = &in6addr_loopback;
- uref.udp.splice = UDP_TO_INIT;
+ uref.udp.splice = uref.udp.orig = uref.udp.ns = true;
sock_l4(c, AF_INET6, IPPROTO_UDP, bind_addr, ifname,
port, uref.u32);
diff --git a/udp.h b/udp.h
index 43bd28a..053991e 100644
--- a/udp.h
+++ b/udp.h
@@ -23,20 +23,19 @@ void udp_update_l2_buf(const unsigned char *eth_d, const unsigned char *eth_s,
* union udp_epoll_ref - epoll reference portion for TCP connections
* @bound: Set if this file descriptor is a bound socket
* @splice: Set if descriptor is associated to "spliced" connection
+ * @orig: Set if a spliced socket which can originate "connections"
+ * @ns: Set if this is a socket in the pasta network namespace
* @v6: Set for IPv6 sockets or connections
* @port: Source port for connected sockets, bound port otherwise
* @u32: Opaque u32 value of reference
*/
union udp_epoll_ref {
struct {
- uint32_t splice:3,
-#define UDP_TO_NS 1
-#define UDP_TO_INIT 2
-#define UDP_BACK_TO_NS 3
-#define UDP_BACK_TO_INIT 4
-
- v6:1,
- port:16;
+ bool splice:1,
+ orig:1,
+ ns:1,
+ v6:1;
+ uint32_t port:16;
} udp;
uint32_t u32;
};
--
@@ -23,20 +23,19 @@ void udp_update_l2_buf(const unsigned char *eth_d, const unsigned char *eth_s,
* union udp_epoll_ref - epoll reference portion for TCP connections
* @bound: Set if this file descriptor is a bound socket
* @splice: Set if descriptor is associated to "spliced" connection
+ * @orig: Set if a spliced socket which can originate "connections"
+ * @ns: Set if this is a socket in the pasta network namespace
* @v6: Set for IPv6 sockets or connections
* @port: Source port for connected sockets, bound port otherwise
* @u32: Opaque u32 value of reference
*/
union udp_epoll_ref {
struct {
- uint32_t splice:3,
-#define UDP_TO_NS 1
-#define UDP_TO_INIT 2
-#define UDP_BACK_TO_NS 3
-#define UDP_BACK_TO_INIT 4
-
- v6:1,
- port:16;
+ bool splice:1,
+ orig:1,
+ ns:1,
+ v6:1;
+ uint32_t port:16;
} udp;
uint32_t u32;
};
--
2.38.1
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v2 07/16] udp: Don't create double sockets for -U port
2022-11-24 1:16 [PATCH v2 00/16] Simplify and correct handling of "spliced" UDP forwarding David Gibson
` (5 preceding siblings ...)
2022-11-24 1:16 ` [PATCH v2 06/16] udp: Split splice field in udp_epoll_ref into (mostly) independent bits David Gibson
@ 2022-11-24 1:16 ` David Gibson
2022-11-24 1:16 ` [PATCH v2 08/16] udp: Re-use fixed bound sockets for packet forwarding when possible David Gibson
` (8 subsequent siblings)
15 siblings, 0 replies; 29+ messages in thread
From: David Gibson @ 2022-11-24 1:16 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
For each IP version udp_socket() has 3 possible calls to sock_l4(). One
is for the "non-spliced" bound socket in the init namespace, one for the
"spliced" bound socket in the init namespace and one for the "spliced"
bound socket in the pasta namespace.
However when this is called to create a socket in the pasta namspeace there
is a logic error which causes it to take the path for the init side spliced
socket as well as the ns socket. This essentially tries to create two
identical sockets on the ns side. Unsurprisingly the second bind() call
fails according to strace.
Correct this to only attempt to open one socket within the ns.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
udp.c | 32 ++++++++++++++------------------
1 file changed, 14 insertions(+), 18 deletions(-)
diff --git a/udp.c b/udp.c
index 4c87e66..f93dd8b 100644
--- a/udp.c
+++ b/udp.c
@@ -1091,17 +1091,15 @@ void udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
port, uref.u32);
udp_tap_map[V4][uref.udp.port].sock = s;
- }
-
- if (c->mode == MODE_PASTA) {
- bind_addr = &(uint32_t){ htonl(INADDR_LOOPBACK) };
- uref.udp.splice = uref.udp.orig = true;
- sock_l4(c, AF_INET, IPPROTO_UDP, bind_addr, ifname,
- port, uref.u32);
- }
+ if (c->mode == MODE_PASTA) {
+ bind_addr = &(uint32_t){ htonl(INADDR_LOOPBACK) };
+ uref.udp.splice = uref.udp.orig = true;
- if (ns) {
+ sock_l4(c, AF_INET, IPPROTO_UDP, bind_addr, ifname,
+ port, uref.u32);
+ }
+ } else {
uref.udp.splice = uref.udp.orig = uref.udp.ns = true;
bind_addr = &(uint32_t){ htonl(INADDR_LOOPBACK) };
@@ -1125,17 +1123,15 @@ void udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
port, uref.u32);
udp_tap_map[V6][uref.udp.port].sock = s;
- }
-
- if (c->mode == MODE_PASTA) {
- bind_addr = &in6addr_loopback;
- uref.udp.splice = uref.udp.orig = true;
- sock_l4(c, AF_INET6, IPPROTO_UDP, bind_addr, ifname,
- port, uref.u32);
- }
+ if (c->mode == MODE_PASTA) {
+ bind_addr = &in6addr_loopback;
+ uref.udp.splice = uref.udp.orig = true;
- if (ns) {
+ sock_l4(c, AF_INET6, IPPROTO_UDP, bind_addr, ifname,
+ port, uref.u32);
+ }
+ } else {
bind_addr = &in6addr_loopback;
uref.udp.splice = uref.udp.orig = uref.udp.ns = true;
--
@@ -1091,17 +1091,15 @@ void udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
port, uref.u32);
udp_tap_map[V4][uref.udp.port].sock = s;
- }
-
- if (c->mode == MODE_PASTA) {
- bind_addr = &(uint32_t){ htonl(INADDR_LOOPBACK) };
- uref.udp.splice = uref.udp.orig = true;
- sock_l4(c, AF_INET, IPPROTO_UDP, bind_addr, ifname,
- port, uref.u32);
- }
+ if (c->mode == MODE_PASTA) {
+ bind_addr = &(uint32_t){ htonl(INADDR_LOOPBACK) };
+ uref.udp.splice = uref.udp.orig = true;
- if (ns) {
+ sock_l4(c, AF_INET, IPPROTO_UDP, bind_addr, ifname,
+ port, uref.u32);
+ }
+ } else {
uref.udp.splice = uref.udp.orig = uref.udp.ns = true;
bind_addr = &(uint32_t){ htonl(INADDR_LOOPBACK) };
@@ -1125,17 +1123,15 @@ void udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
port, uref.u32);
udp_tap_map[V6][uref.udp.port].sock = s;
- }
-
- if (c->mode == MODE_PASTA) {
- bind_addr = &in6addr_loopback;
- uref.udp.splice = uref.udp.orig = true;
- sock_l4(c, AF_INET6, IPPROTO_UDP, bind_addr, ifname,
- port, uref.u32);
- }
+ if (c->mode == MODE_PASTA) {
+ bind_addr = &in6addr_loopback;
+ uref.udp.splice = uref.udp.orig = true;
- if (ns) {
+ sock_l4(c, AF_INET6, IPPROTO_UDP, bind_addr, ifname,
+ port, uref.u32);
+ }
+ } else {
bind_addr = &in6addr_loopback;
uref.udp.splice = uref.udp.orig = uref.udp.ns = true;
--
2.38.1
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v2 08/16] udp: Re-use fixed bound sockets for packet forwarding when possible
2022-11-24 1:16 [PATCH v2 00/16] Simplify and correct handling of "spliced" UDP forwarding David Gibson
` (6 preceding siblings ...)
2022-11-24 1:16 ` [PATCH v2 07/16] udp: Don't create double sockets for -U port David Gibson
@ 2022-11-24 1:16 ` David Gibson
2022-11-24 1:16 ` [PATCH v2 09/16] udp: Don't explicitly track originating socket for spliced "connections" David Gibson
` (7 subsequent siblings)
15 siblings, 0 replies; 29+ messages in thread
From: David Gibson @ 2022-11-24 1:16 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
When we look up udp_splice_to_ns[v6][src].target_sock in
udp_sock_handler_splice, all we really require of the socket is that it
be bound to port src in the pasta guest namespace. Similarly for
udp_splice_to_init but bound in the init namespace.
Usually these sockets are created temporarily by udp_splice_connect() and
cleaned up by udp_timer(). However, depending on the -u and -U options its
possible we have a permanent socket bound to the relevant port created by
udp_sock_init(). If such a socket exists, we could use it instead of
creating a temporary one. In fact we *must* use it, because we'll fail
trying to bind() a temporary one to the same port.
So allow this, store permanently bound sockets into udp_splice_to_{ns,init}
in udp_sock_init(). These won't get incorrectly removed by the timer
because we don't put a corresponding entry in the udp_act[] structure
which directs the timer what to clean up.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
udp.c | 22 +++++++++++++---------
1 file changed, 13 insertions(+), 9 deletions(-)
diff --git a/udp.c b/udp.c
index f93dd8b..65438de 100644
--- a/udp.c
+++ b/udp.c
@@ -153,7 +153,7 @@ struct udp_splice_flow {
/* Port tracking, arrays indexed by packet source port (host order) */
static struct udp_tap_port udp_tap_map [IP_VERSIONS][NUM_PORTS];
-/* Spliced "connections" indexed by originating source port (host order) */
+/* Spliced "connections" indexed by bound port of target_sock (host order) */
static struct udp_splice_flow udp_splice_to_ns [IP_VERSIONS][NUM_PORTS];
static struct udp_splice_flow udp_splice_to_init[IP_VERSIONS][NUM_PORTS];
@@ -1096,16 +1096,18 @@ void udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
bind_addr = &(uint32_t){ htonl(INADDR_LOOPBACK) };
uref.udp.splice = uref.udp.orig = true;
- sock_l4(c, AF_INET, IPPROTO_UDP, bind_addr, ifname,
- port, uref.u32);
+ s = sock_l4(c, AF_INET, IPPROTO_UDP, bind_addr,
+ ifname, port, uref.u32);
+ udp_splice_to_init[V4][port].target_sock = s;
}
} else {
uref.udp.splice = uref.udp.orig = uref.udp.ns = true;
bind_addr = &(uint32_t){ htonl(INADDR_LOOPBACK) };
- sock_l4(c, AF_INET, IPPROTO_UDP, bind_addr, ifname,
- port, uref.u32);
+ s = sock_l4(c, AF_INET, IPPROTO_UDP, bind_addr,
+ ifname, port, uref.u32);
+ udp_splice_to_ns[V4][port].target_sock = s;
}
}
@@ -1128,15 +1130,17 @@ void udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
bind_addr = &in6addr_loopback;
uref.udp.splice = uref.udp.orig = true;
- sock_l4(c, AF_INET6, IPPROTO_UDP, bind_addr, ifname,
- port, uref.u32);
+ s = sock_l4(c, AF_INET6, IPPROTO_UDP, bind_addr,
+ ifname, port, uref.u32);
+ udp_splice_to_init[V6][port].target_sock = s;
}
} else {
bind_addr = &in6addr_loopback;
uref.udp.splice = uref.udp.orig = uref.udp.ns = true;
- sock_l4(c, AF_INET6, IPPROTO_UDP, bind_addr, ifname,
- port, uref.u32);
+ s = sock_l4(c, AF_INET6, IPPROTO_UDP, bind_addr,
+ ifname, port, uref.u32);
+ udp_splice_to_ns[V6][port].target_sock = s;
}
}
}
--
@@ -153,7 +153,7 @@ struct udp_splice_flow {
/* Port tracking, arrays indexed by packet source port (host order) */
static struct udp_tap_port udp_tap_map [IP_VERSIONS][NUM_PORTS];
-/* Spliced "connections" indexed by originating source port (host order) */
+/* Spliced "connections" indexed by bound port of target_sock (host order) */
static struct udp_splice_flow udp_splice_to_ns [IP_VERSIONS][NUM_PORTS];
static struct udp_splice_flow udp_splice_to_init[IP_VERSIONS][NUM_PORTS];
@@ -1096,16 +1096,18 @@ void udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
bind_addr = &(uint32_t){ htonl(INADDR_LOOPBACK) };
uref.udp.splice = uref.udp.orig = true;
- sock_l4(c, AF_INET, IPPROTO_UDP, bind_addr, ifname,
- port, uref.u32);
+ s = sock_l4(c, AF_INET, IPPROTO_UDP, bind_addr,
+ ifname, port, uref.u32);
+ udp_splice_to_init[V4][port].target_sock = s;
}
} else {
uref.udp.splice = uref.udp.orig = uref.udp.ns = true;
bind_addr = &(uint32_t){ htonl(INADDR_LOOPBACK) };
- sock_l4(c, AF_INET, IPPROTO_UDP, bind_addr, ifname,
- port, uref.u32);
+ s = sock_l4(c, AF_INET, IPPROTO_UDP, bind_addr,
+ ifname, port, uref.u32);
+ udp_splice_to_ns[V4][port].target_sock = s;
}
}
@@ -1128,15 +1130,17 @@ void udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
bind_addr = &in6addr_loopback;
uref.udp.splice = uref.udp.orig = true;
- sock_l4(c, AF_INET6, IPPROTO_UDP, bind_addr, ifname,
- port, uref.u32);
+ s = sock_l4(c, AF_INET6, IPPROTO_UDP, bind_addr,
+ ifname, port, uref.u32);
+ udp_splice_to_init[V6][port].target_sock = s;
}
} else {
bind_addr = &in6addr_loopback;
uref.udp.splice = uref.udp.orig = uref.udp.ns = true;
- sock_l4(c, AF_INET6, IPPROTO_UDP, bind_addr, ifname,
- port, uref.u32);
+ s = sock_l4(c, AF_INET6, IPPROTO_UDP, bind_addr,
+ ifname, port, uref.u32);
+ udp_splice_to_ns[V6][port].target_sock = s;
}
}
}
--
2.38.1
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v2 09/16] udp: Don't explicitly track originating socket for spliced "connections"
2022-11-24 1:16 [PATCH v2 00/16] Simplify and correct handling of "spliced" UDP forwarding David Gibson
` (7 preceding siblings ...)
2022-11-24 1:16 ` [PATCH v2 08/16] udp: Re-use fixed bound sockets for packet forwarding when possible David Gibson
@ 2022-11-24 1:16 ` David Gibson
2022-11-25 1:48 ` Stefano Brivio
2022-11-24 1:16 ` [PATCH v2 10/16] udp: Update UDP "connection" timestamps in both directions David Gibson
` (6 subsequent siblings)
15 siblings, 1 reply; 29+ messages in thread
From: David Gibson @ 2022-11-24 1:16 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
When we look up udp_splice_to_ns[][].orig_sock in udp_sock_handler_splice()
we're finding the socket on which the originating packet for the
"connection" was received on. However, we don't specifically need this
socket to be the originating one - we just need one that's bound to the
the source port of this reply packet in the init namespace. We can look
this up in udp_splice_to_init[v6][src].target_sock, whose defining
characteristic is exactly that. The same applies with init and ns swapped.
In practice, of course, the port we locate this way will always be the
originating port, since we couldn't have started this "connection" if it
wasn't.
Change this, and we no longer need the @orig_sock field at all. That leaves just @target_sock which we rename to simply @sock. The
whole udp_splice_flow structure now more represents a single bound port
than a "flow" per se, so rename and recomment it accordingly. Likewise the
udp_splice_to_{ns,init} names are now misleading, since the ports in those
maps are used in both directions. Rename them to udp_splice_{ns,init}
indicating the location where the described socket is bound.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
udp.c | 111 +++++++++++++++++++++++++++-------------------------------
1 file changed, 51 insertions(+), 60 deletions(-)
diff --git a/udp.c b/udp.c
index 65438de..14e8ff2 100644
--- a/udp.c
+++ b/udp.c
@@ -47,44 +47,40 @@
*
* - forward direction: 127.0.0.1:5000 -> 127.0.0.1:80 in init from socket s,
* with epoll reference: index = 80, splice = 1, orig = 1, ns = 0
- * - if udp_splice_to_ns[V4][5000].target_sock:
- * - send packet to udp_splice_to_ns[V4][5000].target_sock, with
- * destination port 80
+ * - if udp_splice_ns[V4][5000].sock:
+ * - send packet to udp_splice_ns[V4][5000].sock, with destination port
+ * 80
* - otherwise:
- * - create new socket udp_splice_to_ns[V4][5000].target_sock
+ * - create new socket udp_splice_ns[V4][5000].sock
* - bind in namespace to 127.0.0.1:5000
* - add to epoll with reference: index = 5000, splice = 1, orig = 0,
* ns = 1
- * - set udp_splice_to_ns[V4][5000].orig_sock to s
- * - update udp_splice_to_ns[V4][5000].ts with current time
+ * - update udp_splice_ns[V4][5000].ts with current time
*
* - reverse direction: 127.0.0.1:80 -> 127.0.0.1:5000 in namespace socket s,
* having epoll reference: index = 5000, splice = 1, orig = 0, ns = 1
- * - if udp_splice_to_ns[V4][5000].orig_sock:
- * - send to udp_splice_to_ns[V4][5000].orig_sock, with destination port
- * 5000
+ * - if udp_splice_init[V4][80].sock:
+ * - send to udp_splice_init[V4][80].sock, with destination port 5000
* - otherwise, discard
*
* - from namespace to init:
*
* - forward direction: 127.0.0.1:2000 -> 127.0.0.1:22 in namespace from
* socket s, with epoll reference: index = 22, splice = 1, orig = 1, ns = 1
- * - if udp4_splice_to_init[V4][2000].target_sock:
- * - send packet to udp_splice_to_init[V4][2000].target_sock, with
- * destination port 22
+ * - if udp4_splice_init[V4][2000].sock:
+ * - send packet to udp_splice_init[V4][2000].sock, with destination
+ * port 22
* - otherwise:
- * - create new socket udp_splice_to_init[V4][2000].target_sock
+ * - create new socket udp_splice_init[V4][2000].sock
* - bind in init to 127.0.0.1:2000
* - add to epoll with reference: index = 2000, splice = 1, orig = 0,
* ns = 0
- * - set udp_splice_to_init[V4][2000].orig_sock to s
- * - update udp_splice_to_init[V4][2000].ts with current time
+ * - update udp_splice_init[V4][2000].ts with current time
*
* - reverse direction: 127.0.0.1:22 -> 127.0.0.1:2000 in init from socket s,
* having epoll reference: index = 2000, splice = 1, orig = 0, ns = 0
- * - if udp_splice_to_init[V4][2000].orig_sock:
- * - send to udp_splice_to_init[V4][2000].orig_sock, with destination port
- * 2000
+ * - if udp_splice_ns[V4][22].sock:
+ * - send to udp_splice_ns[V4][22].sock, with destination port 2000
* - otherwise, discard
*/
@@ -137,25 +133,21 @@ struct udp_tap_port {
};
/**
- * struct udp_splice_flow - Spliced "connection"
- * @orig_sock: Originating socket, bound to dest port in source ns of
- * originating datagram
- * @target_sock: Target socket, bound to source port of originating
- * datagram in dest ns
- * @ts: Activity timestamp
+ * struct udp_splice_port - Bound socket for spliced communication
+ * @sock: Socket bound to index port
+ * @ts: Activity timestamp
*/
-struct udp_splice_flow {
- int orig_sock;
- int target_sock;
+struct udp_splice_port {
+ int sock;
time_t ts;
};
/* Port tracking, arrays indexed by packet source port (host order) */
static struct udp_tap_port udp_tap_map [IP_VERSIONS][NUM_PORTS];
-/* Spliced "connections" indexed by bound port of target_sock (host order) */
-static struct udp_splice_flow udp_splice_to_ns [IP_VERSIONS][NUM_PORTS];
-static struct udp_splice_flow udp_splice_to_init[IP_VERSIONS][NUM_PORTS];
+/* "Spliced" sockets indexed by bound port (host order) */
+static struct udp_splice_port udp_splice_ns [IP_VERSIONS][NUM_PORTS];
+static struct udp_splice_port udp_splice_init[IP_VERSIONS][NUM_PORTS];
enum udp_act_type {
UDP_ACT_TAP,
@@ -397,7 +389,6 @@ static void udp_sock6_iov_init(void)
* udp_splice_new() - Create and prepare socket for "spliced" binding
* @c: Execution context
* @v6: Set for IPv6 sockets
- * @bound_sock: Originating bound socket
* @src: Source port of original connection, host order
* @splice: UDP_BACK_TO_INIT from init, UDP_BACK_TO_NS from namespace
*
@@ -405,23 +396,22 @@ static void udp_sock6_iov_init(void)
*
* #syscalls:pasta getsockname
*/
-int udp_splice_new(const struct ctx *c, int v6, int bound_sock, in_port_t src,
- bool ns)
+int udp_splice_new(const struct ctx *c, int v6, in_port_t src, bool ns)
{
struct epoll_event ev = { .events = EPOLLIN | EPOLLRDHUP | EPOLLHUP };
union epoll_ref ref = { .r.proto = IPPROTO_UDP,
.r.p.udp.udp = { .splice = true, .ns = ns,
.v6 = v6, .port = src }
};
- struct udp_splice_flow *flow;
+ struct udp_splice_port *sp;
int s;
int act;
if (ns) {
- flow = &udp_splice_to_ns[v6 ? V6 : V4][src];
+ sp = &udp_splice_ns[v6 ? V6 : V4][src];
act = UDP_ACT_SPLICE_NS;
} else {
- flow = &udp_splice_to_init[v6 ? V6 : V4][src];
+ sp = &udp_splice_init[v6 ? V6 : V4][src];
act = UDP_ACT_SPLICE_INIT;
}
@@ -456,8 +446,7 @@ int udp_splice_new(const struct ctx *c, int v6, int bound_sock, in_port_t src,
goto fail;
}
- flow->orig_sock = bound_sock;
- flow->target_sock = s;
+ sp->sock = s;
bitmap_set(udp_act[v6 ? V6 : V4][act], src);
ev.data.u64 = ref.u64;
@@ -473,7 +462,6 @@ fail:
* struct udp_splice_new_ns_arg - Arguments for udp_splice_new_ns()
* @c: Execution context
* @v6: Set for IPv6
- * @bound_sock: Originating bound socket
* @src: Source port of originating datagram, host order
* @dst: Destination port of originating datagram, host order
* @s: Newly created socket or negative error code
@@ -481,7 +469,6 @@ fail:
struct udp_splice_new_ns_arg {
const struct ctx *c;
int v6;
- int bound_sock;
in_port_t src;
int s;
};
@@ -501,7 +488,7 @@ static int udp_splice_new_ns(void *arg)
if (ns_enter(a->c))
return 0;
- a->s = udp_splice_new(a->c, a->v6, a->bound_sock, a->src, true);
+ a->s = udp_splice_new(a->c, a->v6, a->src, true);
return 0;
}
@@ -543,9 +530,9 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
if (ref.r.p.udp.udp.orig && !ref.r.p.udp.udp.ns) {
src += c->udp.fwd_out.rdelta[src];
- if (!(s = udp_splice_to_ns[v6][src].target_sock)) {
+ if (!(s = udp_splice_ns[v6][src].sock)) {
struct udp_splice_new_ns_arg arg = {
- c, v6, ref.r.s, src, -1,
+ c, v6, src, -1,
};
NS_CALL(udp_splice_new_ns, &arg);
@@ -553,21 +540,25 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
return;
}
- udp_splice_to_ns[v6][src].ts = now->tv_sec;
+ udp_splice_ns[v6][src].ts = now->tv_sec;
} else if (!ref.r.p.udp.udp.orig && ref.r.p.udp.udp.ns) {
- if (!(s = udp_splice_to_ns[v6][dst].orig_sock))
+ src += c->udp.fwd_in.rdelta[src];
+
+ if (!(s = udp_splice_init[v6][src].sock))
return;
} else if (ref.r.p.udp.udp.orig && ref.r.p.udp.udp.ns) {
src += c->udp.fwd_in.rdelta[src];
- if (!(s = udp_splice_to_init[v6][src].target_sock)) {
- s = udp_splice_new(c, v6, ref.r.s, src, false);
+ if (!(s = udp_splice_init[v6][src].sock)) {
+ s = udp_splice_new(c, v6, src, false);
if (s < 0)
return;
}
- udp_splice_to_init[v6][src].ts = now->tv_sec;
+ udp_splice_init[v6][src].ts = now->tv_sec;
} else if (!ref.r.p.udp.udp.orig && !ref.r.p.udp.udp.ns) {
- if (!(s = udp_splice_to_init[v6][dst].orig_sock))
+ src += c->udp.fwd_out.rdelta[src];
+
+ if (!(s = udp_splice_ns[v6][src].sock))
return;
} else {
return;
@@ -1098,7 +1089,7 @@ void udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
s = sock_l4(c, AF_INET, IPPROTO_UDP, bind_addr,
ifname, port, uref.u32);
- udp_splice_to_init[V4][port].target_sock = s;
+ udp_splice_init[V4][port].sock = s;
}
} else {
uref.udp.splice = uref.udp.orig = uref.udp.ns = true;
@@ -1107,7 +1098,7 @@ void udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
s = sock_l4(c, AF_INET, IPPROTO_UDP, bind_addr,
ifname, port, uref.u32);
- udp_splice_to_ns[V4][port].target_sock = s;
+ udp_splice_ns[V4][port].sock = s;
}
}
@@ -1132,7 +1123,7 @@ void udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
s = sock_l4(c, AF_INET6, IPPROTO_UDP, bind_addr,
ifname, port, uref.u32);
- udp_splice_to_init[V6][port].target_sock = s;
+ udp_splice_init[V6][port].sock = s;
}
} else {
bind_addr = &in6addr_loopback;
@@ -1140,7 +1131,7 @@ void udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
s = sock_l4(c, AF_INET6, IPPROTO_UDP, bind_addr,
ifname, port, uref.u32);
- udp_splice_to_ns[V6][port].target_sock = s;
+ udp_splice_ns[V6][port].sock = s;
}
}
}
@@ -1243,7 +1234,7 @@ int udp_init(struct ctx *c)
static void udp_timer_one(struct ctx *c, int v6, enum udp_act_type type,
in_port_t port, const struct timespec *ts)
{
- struct udp_splice_flow *flow;
+ struct udp_splice_port *sp;
struct udp_tap_port *tp;
int s = -1;
@@ -1258,17 +1249,17 @@ static void udp_timer_one(struct ctx *c, int v6, enum udp_act_type type,
break;
case UDP_ACT_SPLICE_INIT:
- flow = &udp_splice_to_init[v6 ? V6 : V4][port];
+ sp = &udp_splice_init[v6 ? V6 : V4][port];
- if (ts->tv_sec - flow->ts > UDP_CONN_TIMEOUT)
- s = flow->target_sock;
+ if (ts->tv_sec - sp->ts > UDP_CONN_TIMEOUT)
+ s = sp->sock;
break;
case UDP_ACT_SPLICE_NS:
- flow = &udp_splice_to_ns[v6 ? V6 : V4][port];
+ sp = &udp_splice_ns[v6 ? V6 : V4][port];
- if (ts->tv_sec - flow->ts > UDP_CONN_TIMEOUT)
- s = flow->target_sock;
+ if (ts->tv_sec - sp->ts > UDP_CONN_TIMEOUT)
+ s = sp->sock;
break;
default:
--
@@ -47,44 +47,40 @@
*
* - forward direction: 127.0.0.1:5000 -> 127.0.0.1:80 in init from socket s,
* with epoll reference: index = 80, splice = 1, orig = 1, ns = 0
- * - if udp_splice_to_ns[V4][5000].target_sock:
- * - send packet to udp_splice_to_ns[V4][5000].target_sock, with
- * destination port 80
+ * - if udp_splice_ns[V4][5000].sock:
+ * - send packet to udp_splice_ns[V4][5000].sock, with destination port
+ * 80
* - otherwise:
- * - create new socket udp_splice_to_ns[V4][5000].target_sock
+ * - create new socket udp_splice_ns[V4][5000].sock
* - bind in namespace to 127.0.0.1:5000
* - add to epoll with reference: index = 5000, splice = 1, orig = 0,
* ns = 1
- * - set udp_splice_to_ns[V4][5000].orig_sock to s
- * - update udp_splice_to_ns[V4][5000].ts with current time
+ * - update udp_splice_ns[V4][5000].ts with current time
*
* - reverse direction: 127.0.0.1:80 -> 127.0.0.1:5000 in namespace socket s,
* having epoll reference: index = 5000, splice = 1, orig = 0, ns = 1
- * - if udp_splice_to_ns[V4][5000].orig_sock:
- * - send to udp_splice_to_ns[V4][5000].orig_sock, with destination port
- * 5000
+ * - if udp_splice_init[V4][80].sock:
+ * - send to udp_splice_init[V4][80].sock, with destination port 5000
* - otherwise, discard
*
* - from namespace to init:
*
* - forward direction: 127.0.0.1:2000 -> 127.0.0.1:22 in namespace from
* socket s, with epoll reference: index = 22, splice = 1, orig = 1, ns = 1
- * - if udp4_splice_to_init[V4][2000].target_sock:
- * - send packet to udp_splice_to_init[V4][2000].target_sock, with
- * destination port 22
+ * - if udp4_splice_init[V4][2000].sock:
+ * - send packet to udp_splice_init[V4][2000].sock, with destination
+ * port 22
* - otherwise:
- * - create new socket udp_splice_to_init[V4][2000].target_sock
+ * - create new socket udp_splice_init[V4][2000].sock
* - bind in init to 127.0.0.1:2000
* - add to epoll with reference: index = 2000, splice = 1, orig = 0,
* ns = 0
- * - set udp_splice_to_init[V4][2000].orig_sock to s
- * - update udp_splice_to_init[V4][2000].ts with current time
+ * - update udp_splice_init[V4][2000].ts with current time
*
* - reverse direction: 127.0.0.1:22 -> 127.0.0.1:2000 in init from socket s,
* having epoll reference: index = 2000, splice = 1, orig = 0, ns = 0
- * - if udp_splice_to_init[V4][2000].orig_sock:
- * - send to udp_splice_to_init[V4][2000].orig_sock, with destination port
- * 2000
+ * - if udp_splice_ns[V4][22].sock:
+ * - send to udp_splice_ns[V4][22].sock, with destination port 2000
* - otherwise, discard
*/
@@ -137,25 +133,21 @@ struct udp_tap_port {
};
/**
- * struct udp_splice_flow - Spliced "connection"
- * @orig_sock: Originating socket, bound to dest port in source ns of
- * originating datagram
- * @target_sock: Target socket, bound to source port of originating
- * datagram in dest ns
- * @ts: Activity timestamp
+ * struct udp_splice_port - Bound socket for spliced communication
+ * @sock: Socket bound to index port
+ * @ts: Activity timestamp
*/
-struct udp_splice_flow {
- int orig_sock;
- int target_sock;
+struct udp_splice_port {
+ int sock;
time_t ts;
};
/* Port tracking, arrays indexed by packet source port (host order) */
static struct udp_tap_port udp_tap_map [IP_VERSIONS][NUM_PORTS];
-/* Spliced "connections" indexed by bound port of target_sock (host order) */
-static struct udp_splice_flow udp_splice_to_ns [IP_VERSIONS][NUM_PORTS];
-static struct udp_splice_flow udp_splice_to_init[IP_VERSIONS][NUM_PORTS];
+/* "Spliced" sockets indexed by bound port (host order) */
+static struct udp_splice_port udp_splice_ns [IP_VERSIONS][NUM_PORTS];
+static struct udp_splice_port udp_splice_init[IP_VERSIONS][NUM_PORTS];
enum udp_act_type {
UDP_ACT_TAP,
@@ -397,7 +389,6 @@ static void udp_sock6_iov_init(void)
* udp_splice_new() - Create and prepare socket for "spliced" binding
* @c: Execution context
* @v6: Set for IPv6 sockets
- * @bound_sock: Originating bound socket
* @src: Source port of original connection, host order
* @splice: UDP_BACK_TO_INIT from init, UDP_BACK_TO_NS from namespace
*
@@ -405,23 +396,22 @@ static void udp_sock6_iov_init(void)
*
* #syscalls:pasta getsockname
*/
-int udp_splice_new(const struct ctx *c, int v6, int bound_sock, in_port_t src,
- bool ns)
+int udp_splice_new(const struct ctx *c, int v6, in_port_t src, bool ns)
{
struct epoll_event ev = { .events = EPOLLIN | EPOLLRDHUP | EPOLLHUP };
union epoll_ref ref = { .r.proto = IPPROTO_UDP,
.r.p.udp.udp = { .splice = true, .ns = ns,
.v6 = v6, .port = src }
};
- struct udp_splice_flow *flow;
+ struct udp_splice_port *sp;
int s;
int act;
if (ns) {
- flow = &udp_splice_to_ns[v6 ? V6 : V4][src];
+ sp = &udp_splice_ns[v6 ? V6 : V4][src];
act = UDP_ACT_SPLICE_NS;
} else {
- flow = &udp_splice_to_init[v6 ? V6 : V4][src];
+ sp = &udp_splice_init[v6 ? V6 : V4][src];
act = UDP_ACT_SPLICE_INIT;
}
@@ -456,8 +446,7 @@ int udp_splice_new(const struct ctx *c, int v6, int bound_sock, in_port_t src,
goto fail;
}
- flow->orig_sock = bound_sock;
- flow->target_sock = s;
+ sp->sock = s;
bitmap_set(udp_act[v6 ? V6 : V4][act], src);
ev.data.u64 = ref.u64;
@@ -473,7 +462,6 @@ fail:
* struct udp_splice_new_ns_arg - Arguments for udp_splice_new_ns()
* @c: Execution context
* @v6: Set for IPv6
- * @bound_sock: Originating bound socket
* @src: Source port of originating datagram, host order
* @dst: Destination port of originating datagram, host order
* @s: Newly created socket or negative error code
@@ -481,7 +469,6 @@ fail:
struct udp_splice_new_ns_arg {
const struct ctx *c;
int v6;
- int bound_sock;
in_port_t src;
int s;
};
@@ -501,7 +488,7 @@ static int udp_splice_new_ns(void *arg)
if (ns_enter(a->c))
return 0;
- a->s = udp_splice_new(a->c, a->v6, a->bound_sock, a->src, true);
+ a->s = udp_splice_new(a->c, a->v6, a->src, true);
return 0;
}
@@ -543,9 +530,9 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
if (ref.r.p.udp.udp.orig && !ref.r.p.udp.udp.ns) {
src += c->udp.fwd_out.rdelta[src];
- if (!(s = udp_splice_to_ns[v6][src].target_sock)) {
+ if (!(s = udp_splice_ns[v6][src].sock)) {
struct udp_splice_new_ns_arg arg = {
- c, v6, ref.r.s, src, -1,
+ c, v6, src, -1,
};
NS_CALL(udp_splice_new_ns, &arg);
@@ -553,21 +540,25 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
return;
}
- udp_splice_to_ns[v6][src].ts = now->tv_sec;
+ udp_splice_ns[v6][src].ts = now->tv_sec;
} else if (!ref.r.p.udp.udp.orig && ref.r.p.udp.udp.ns) {
- if (!(s = udp_splice_to_ns[v6][dst].orig_sock))
+ src += c->udp.fwd_in.rdelta[src];
+
+ if (!(s = udp_splice_init[v6][src].sock))
return;
} else if (ref.r.p.udp.udp.orig && ref.r.p.udp.udp.ns) {
src += c->udp.fwd_in.rdelta[src];
- if (!(s = udp_splice_to_init[v6][src].target_sock)) {
- s = udp_splice_new(c, v6, ref.r.s, src, false);
+ if (!(s = udp_splice_init[v6][src].sock)) {
+ s = udp_splice_new(c, v6, src, false);
if (s < 0)
return;
}
- udp_splice_to_init[v6][src].ts = now->tv_sec;
+ udp_splice_init[v6][src].ts = now->tv_sec;
} else if (!ref.r.p.udp.udp.orig && !ref.r.p.udp.udp.ns) {
- if (!(s = udp_splice_to_init[v6][dst].orig_sock))
+ src += c->udp.fwd_out.rdelta[src];
+
+ if (!(s = udp_splice_ns[v6][src].sock))
return;
} else {
return;
@@ -1098,7 +1089,7 @@ void udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
s = sock_l4(c, AF_INET, IPPROTO_UDP, bind_addr,
ifname, port, uref.u32);
- udp_splice_to_init[V4][port].target_sock = s;
+ udp_splice_init[V4][port].sock = s;
}
} else {
uref.udp.splice = uref.udp.orig = uref.udp.ns = true;
@@ -1107,7 +1098,7 @@ void udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
s = sock_l4(c, AF_INET, IPPROTO_UDP, bind_addr,
ifname, port, uref.u32);
- udp_splice_to_ns[V4][port].target_sock = s;
+ udp_splice_ns[V4][port].sock = s;
}
}
@@ -1132,7 +1123,7 @@ void udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
s = sock_l4(c, AF_INET6, IPPROTO_UDP, bind_addr,
ifname, port, uref.u32);
- udp_splice_to_init[V6][port].target_sock = s;
+ udp_splice_init[V6][port].sock = s;
}
} else {
bind_addr = &in6addr_loopback;
@@ -1140,7 +1131,7 @@ void udp_sock_init(const struct ctx *c, int ns, sa_family_t af,
s = sock_l4(c, AF_INET6, IPPROTO_UDP, bind_addr,
ifname, port, uref.u32);
- udp_splice_to_ns[V6][port].target_sock = s;
+ udp_splice_ns[V6][port].sock = s;
}
}
}
@@ -1243,7 +1234,7 @@ int udp_init(struct ctx *c)
static void udp_timer_one(struct ctx *c, int v6, enum udp_act_type type,
in_port_t port, const struct timespec *ts)
{
- struct udp_splice_flow *flow;
+ struct udp_splice_port *sp;
struct udp_tap_port *tp;
int s = -1;
@@ -1258,17 +1249,17 @@ static void udp_timer_one(struct ctx *c, int v6, enum udp_act_type type,
break;
case UDP_ACT_SPLICE_INIT:
- flow = &udp_splice_to_init[v6 ? V6 : V4][port];
+ sp = &udp_splice_init[v6 ? V6 : V4][port];
- if (ts->tv_sec - flow->ts > UDP_CONN_TIMEOUT)
- s = flow->target_sock;
+ if (ts->tv_sec - sp->ts > UDP_CONN_TIMEOUT)
+ s = sp->sock;
break;
case UDP_ACT_SPLICE_NS:
- flow = &udp_splice_to_ns[v6 ? V6 : V4][port];
+ sp = &udp_splice_ns[v6 ? V6 : V4][port];
- if (ts->tv_sec - flow->ts > UDP_CONN_TIMEOUT)
- s = flow->target_sock;
+ if (ts->tv_sec - sp->ts > UDP_CONN_TIMEOUT)
+ s = sp->sock;
break;
default:
--
2.38.1
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH v2 09/16] udp: Don't explicitly track originating socket for spliced "connections"
2022-11-24 1:16 ` [PATCH v2 09/16] udp: Don't explicitly track originating socket for spliced "connections" David Gibson
@ 2022-11-25 1:48 ` Stefano Brivio
2022-11-25 7:09 ` David Gibson
0 siblings, 1 reply; 29+ messages in thread
From: Stefano Brivio @ 2022-11-25 1:48 UTC (permalink / raw)
To: David Gibson; +Cc: passt-dev
Nit:
On Thu, 24 Nov 2022 12:16:52 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:
> When we look up udp_splice_to_ns[][].orig_sock in udp_sock_handler_splice()
> we're finding the socket on which the originating packet for the
> "connection" was received on. However, we don't specifically need this
> socket to be the originating one - we just need one that's bound to the
> the source port of this reply packet in the init namespace. We can look
> this up in udp_splice_to_init[v6][src].target_sock, whose defining
> characteristic is exactly that. The same applies with init and ns swapped.
>
> In practice, of course, the port we locate this way will always be the
> originating port, since we couldn't have started this "connection" if it
> wasn't.
>
> Change this, and we no longer need the @orig_sock field at all. That leaves just @target_sock which we rename to simply @sock. The
Long line here.
--
Stefano
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 09/16] udp: Don't explicitly track originating socket for spliced "connections"
2022-11-25 1:48 ` Stefano Brivio
@ 2022-11-25 7:09 ` David Gibson
0 siblings, 0 replies; 29+ messages in thread
From: David Gibson @ 2022-11-25 7:09 UTC (permalink / raw)
To: Stefano Brivio; +Cc: passt-dev
[-- Attachment #1: Type: text/plain, Size: 1253 bytes --]
On Fri, Nov 25, 2022 at 02:48:05AM +0100, Stefano Brivio wrote:
> Nit:
>
> On Thu, 24 Nov 2022 12:16:52 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
>
> > When we look up udp_splice_to_ns[][].orig_sock in udp_sock_handler_splice()
> > we're finding the socket on which the originating packet for the
> > "connection" was received on. However, we don't specifically need this
> > socket to be the originating one - we just need one that's bound to the
> > the source port of this reply packet in the init namespace. We can look
> > this up in udp_splice_to_init[v6][src].target_sock, whose defining
> > characteristic is exactly that. The same applies with init and ns swapped.
> >
> > In practice, of course, the port we locate this way will always be the
> > originating port, since we couldn't have started this "connection" if it
> > wasn't.
> >
> > Change this, and we no longer need the @orig_sock field at all. That leaves just @target_sock which we rename to simply @sock. The
>
> Long line here.
Fixed.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH v2 10/16] udp: Update UDP "connection" timestamps in both directions
2022-11-24 1:16 [PATCH v2 00/16] Simplify and correct handling of "spliced" UDP forwarding David Gibson
` (8 preceding siblings ...)
2022-11-24 1:16 ` [PATCH v2 09/16] udp: Don't explicitly track originating socket for spliced "connections" David Gibson
@ 2022-11-24 1:16 ` David Gibson
2022-11-24 1:16 ` [PATCH v2 11/16] udp: Simplify udp_sock_handler_splice David Gibson
` (5 subsequent siblings)
15 siblings, 0 replies; 29+ messages in thread
From: David Gibson @ 2022-11-24 1:16 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
A UDP pseudo-connection between port A in the init namespace and port B in
the pasta guest namespace involves two sockets: udp_splice_init[v6][B]
and udp_splice_ns[v6][A]. The socket which originated this "connection"
will be permanent but the other one will be closed on a timeout.
When we get a packet from the originating socket, we update the timeout on
the other socket, but we don't do the same when we get a reply packet from
the other socket. However any activity on the "connection" probably
indicates that it's still in use. Without this we could incorrectly time
out a "connection" if it's using a protocol which involves a single
initiating packet, but which then gets continuing replies from the target.
Correct this by updating the timeout on both sockets for a packet in either
direction. This also updates the timestamps for the permanent originating
sockets which is unnecessary, but harmless.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
udp.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/udp.c b/udp.c
index 14e8ff2..206e9d3 100644
--- a/udp.c
+++ b/udp.c
@@ -55,12 +55,15 @@
* - bind in namespace to 127.0.0.1:5000
* - add to epoll with reference: index = 5000, splice = 1, orig = 0,
* ns = 1
- * - update udp_splice_ns[V4][5000].ts with current time
+ * - update udp_splice_init[V4][80].ts and udp_splice_ns[V4][5000].ts with
+ * current time
*
* - reverse direction: 127.0.0.1:80 -> 127.0.0.1:5000 in namespace socket s,
* having epoll reference: index = 5000, splice = 1, orig = 0, ns = 1
* - if udp_splice_init[V4][80].sock:
* - send to udp_splice_init[V4][80].sock, with destination port 5000
+ * - update udp_splice_init[V4][80].ts and udp_splice_ns[V4][5000].ts with
+ * current time
* - otherwise, discard
*
* - from namespace to init:
@@ -75,12 +78,15 @@
* - bind in init to 127.0.0.1:2000
* - add to epoll with reference: index = 2000, splice = 1, orig = 0,
* ns = 0
- * - update udp_splice_init[V4][2000].ts with current time
+ * - update udp_splice_ns[V4][22].ts and udp_splice_init[V4][2000].ts with
+ * current time
*
* - reverse direction: 127.0.0.1:22 -> 127.0.0.1:2000 in init from socket s,
* having epoll reference: index = 2000, splice = 1, orig = 0, ns = 0
* - if udp_splice_ns[V4][22].sock:
* - send to udp_splice_ns[V4][22].sock, with destination port 2000
+ * - update udp_splice_ns[V4][22].ts and udp_splice_init[V4][2000].ts with
+ * current time
* - otherwise, discard
*/
@@ -540,12 +546,16 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
return;
}
+ udp_splice_init[v6][dst].ts = now->tv_sec;
udp_splice_ns[v6][src].ts = now->tv_sec;
} else if (!ref.r.p.udp.udp.orig && ref.r.p.udp.udp.ns) {
src += c->udp.fwd_in.rdelta[src];
if (!(s = udp_splice_init[v6][src].sock))
return;
+
+ udp_splice_ns[v6][dst].ts = now->tv_sec;
+ udp_splice_init[v6][src].ts = now->tv_sec;
} else if (ref.r.p.udp.udp.orig && ref.r.p.udp.udp.ns) {
src += c->udp.fwd_in.rdelta[src];
@@ -554,12 +564,17 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
if (s < 0)
return;
}
+
+ udp_splice_ns[v6][dst].ts = now->tv_sec;
udp_splice_init[v6][src].ts = now->tv_sec;
} else if (!ref.r.p.udp.udp.orig && !ref.r.p.udp.udp.ns) {
src += c->udp.fwd_out.rdelta[src];
if (!(s = udp_splice_ns[v6][src].sock))
return;
+
+ udp_splice_init[v6][dst].ts = now->tv_sec;
+ udp_splice_ns[v6][src].ts = now->tv_sec;
} else {
return;
}
--
@@ -55,12 +55,15 @@
* - bind in namespace to 127.0.0.1:5000
* - add to epoll with reference: index = 5000, splice = 1, orig = 0,
* ns = 1
- * - update udp_splice_ns[V4][5000].ts with current time
+ * - update udp_splice_init[V4][80].ts and udp_splice_ns[V4][5000].ts with
+ * current time
*
* - reverse direction: 127.0.0.1:80 -> 127.0.0.1:5000 in namespace socket s,
* having epoll reference: index = 5000, splice = 1, orig = 0, ns = 1
* - if udp_splice_init[V4][80].sock:
* - send to udp_splice_init[V4][80].sock, with destination port 5000
+ * - update udp_splice_init[V4][80].ts and udp_splice_ns[V4][5000].ts with
+ * current time
* - otherwise, discard
*
* - from namespace to init:
@@ -75,12 +78,15 @@
* - bind in init to 127.0.0.1:2000
* - add to epoll with reference: index = 2000, splice = 1, orig = 0,
* ns = 0
- * - update udp_splice_init[V4][2000].ts with current time
+ * - update udp_splice_ns[V4][22].ts and udp_splice_init[V4][2000].ts with
+ * current time
*
* - reverse direction: 127.0.0.1:22 -> 127.0.0.1:2000 in init from socket s,
* having epoll reference: index = 2000, splice = 1, orig = 0, ns = 0
* - if udp_splice_ns[V4][22].sock:
* - send to udp_splice_ns[V4][22].sock, with destination port 2000
+ * - update udp_splice_ns[V4][22].ts and udp_splice_init[V4][2000].ts with
+ * current time
* - otherwise, discard
*/
@@ -540,12 +546,16 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
return;
}
+ udp_splice_init[v6][dst].ts = now->tv_sec;
udp_splice_ns[v6][src].ts = now->tv_sec;
} else if (!ref.r.p.udp.udp.orig && ref.r.p.udp.udp.ns) {
src += c->udp.fwd_in.rdelta[src];
if (!(s = udp_splice_init[v6][src].sock))
return;
+
+ udp_splice_ns[v6][dst].ts = now->tv_sec;
+ udp_splice_init[v6][src].ts = now->tv_sec;
} else if (ref.r.p.udp.udp.orig && ref.r.p.udp.udp.ns) {
src += c->udp.fwd_in.rdelta[src];
@@ -554,12 +564,17 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
if (s < 0)
return;
}
+
+ udp_splice_ns[v6][dst].ts = now->tv_sec;
udp_splice_init[v6][src].ts = now->tv_sec;
} else if (!ref.r.p.udp.udp.orig && !ref.r.p.udp.udp.ns) {
src += c->udp.fwd_out.rdelta[src];
if (!(s = udp_splice_ns[v6][src].sock))
return;
+
+ udp_splice_init[v6][dst].ts = now->tv_sec;
+ udp_splice_ns[v6][src].ts = now->tv_sec;
} else {
return;
}
--
2.38.1
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v2 11/16] udp: Simplify udp_sock_handler_splice
2022-11-24 1:16 [PATCH v2 00/16] Simplify and correct handling of "spliced" UDP forwarding David Gibson
` (9 preceding siblings ...)
2022-11-24 1:16 ` [PATCH v2 10/16] udp: Update UDP "connection" timestamps in both directions David Gibson
@ 2022-11-24 1:16 ` David Gibson
2022-11-24 1:16 ` [PATCH v2 12/16] udp: Make UDP_SPLICE_FRAMES and UDP_TAP_FRAMES_MEM the same thing David Gibson
` (4 subsequent siblings)
15 siblings, 0 replies; 29+ messages in thread
From: David Gibson @ 2022-11-24 1:16 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
Previous cleanups mean that we can now rework some complex ifs in
udp_sock_handler_splice() into a simpler set.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
udp.c | 47 +++++++++++++++--------------------------------
1 file changed, 15 insertions(+), 32 deletions(-)
diff --git a/udp.c b/udp.c
index 206e9d3..ec61022 100644
--- a/udp.c
+++ b/udp.c
@@ -533,50 +533,33 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
}
- if (ref.r.p.udp.udp.orig && !ref.r.p.udp.udp.ns) {
- src += c->udp.fwd_out.rdelta[src];
-
- if (!(s = udp_splice_ns[v6][src].sock)) {
- struct udp_splice_new_ns_arg arg = {
- c, v6, src, -1,
- };
-
- NS_CALL(udp_splice_new_ns, &arg);
- if ((s = arg.s) < 0)
- return;
- }
-
- udp_splice_init[v6][dst].ts = now->tv_sec;
- udp_splice_ns[v6][src].ts = now->tv_sec;
- } else if (!ref.r.p.udp.udp.orig && ref.r.p.udp.udp.ns) {
+ if (ref.r.p.udp.udp.ns) {
src += c->udp.fwd_in.rdelta[src];
+ s = udp_splice_init[v6][src].sock;
+ if (!s && ref.r.p.udp.udp.orig)
+ s = udp_splice_new(c, v6, src, false);
- if (!(s = udp_splice_init[v6][src].sock))
+ if (s < 0)
return;
udp_splice_ns[v6][dst].ts = now->tv_sec;
udp_splice_init[v6][src].ts = now->tv_sec;
- } else if (ref.r.p.udp.udp.orig && ref.r.p.udp.udp.ns) {
- src += c->udp.fwd_in.rdelta[src];
-
- if (!(s = udp_splice_init[v6][src].sock)) {
- s = udp_splice_new(c, v6, src, false);
- if (s < 0)
- return;
- }
-
- udp_splice_ns[v6][dst].ts = now->tv_sec;
- udp_splice_init[v6][src].ts = now->tv_sec;
- } else if (!ref.r.p.udp.udp.orig && !ref.r.p.udp.udp.ns) {
+ } else {
src += c->udp.fwd_out.rdelta[src];
+ s = udp_splice_ns[v6][src].sock;
+ if (!s && ref.r.p.udp.udp.orig) {
+ struct udp_splice_new_ns_arg arg = {
+ c, v6, src, -1,
+ };
- if (!(s = udp_splice_ns[v6][src].sock))
+ NS_CALL(udp_splice_new_ns, &arg);
+ s = arg.s;
+ }
+ if (s < 0)
return;
udp_splice_init[v6][dst].ts = now->tv_sec;
udp_splice_ns[v6][src].ts = now->tv_sec;
- } else {
- return;
}
for (i = 0; i < n; i++) {
--
@@ -533,50 +533,33 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
}
- if (ref.r.p.udp.udp.orig && !ref.r.p.udp.udp.ns) {
- src += c->udp.fwd_out.rdelta[src];
-
- if (!(s = udp_splice_ns[v6][src].sock)) {
- struct udp_splice_new_ns_arg arg = {
- c, v6, src, -1,
- };
-
- NS_CALL(udp_splice_new_ns, &arg);
- if ((s = arg.s) < 0)
- return;
- }
-
- udp_splice_init[v6][dst].ts = now->tv_sec;
- udp_splice_ns[v6][src].ts = now->tv_sec;
- } else if (!ref.r.p.udp.udp.orig && ref.r.p.udp.udp.ns) {
+ if (ref.r.p.udp.udp.ns) {
src += c->udp.fwd_in.rdelta[src];
+ s = udp_splice_init[v6][src].sock;
+ if (!s && ref.r.p.udp.udp.orig)
+ s = udp_splice_new(c, v6, src, false);
- if (!(s = udp_splice_init[v6][src].sock))
+ if (s < 0)
return;
udp_splice_ns[v6][dst].ts = now->tv_sec;
udp_splice_init[v6][src].ts = now->tv_sec;
- } else if (ref.r.p.udp.udp.orig && ref.r.p.udp.udp.ns) {
- src += c->udp.fwd_in.rdelta[src];
-
- if (!(s = udp_splice_init[v6][src].sock)) {
- s = udp_splice_new(c, v6, src, false);
- if (s < 0)
- return;
- }
-
- udp_splice_ns[v6][dst].ts = now->tv_sec;
- udp_splice_init[v6][src].ts = now->tv_sec;
- } else if (!ref.r.p.udp.udp.orig && !ref.r.p.udp.udp.ns) {
+ } else {
src += c->udp.fwd_out.rdelta[src];
+ s = udp_splice_ns[v6][src].sock;
+ if (!s && ref.r.p.udp.udp.orig) {
+ struct udp_splice_new_ns_arg arg = {
+ c, v6, src, -1,
+ };
- if (!(s = udp_splice_ns[v6][src].sock))
+ NS_CALL(udp_splice_new_ns, &arg);
+ s = arg.s;
+ }
+ if (s < 0)
return;
udp_splice_init[v6][dst].ts = now->tv_sec;
udp_splice_ns[v6][src].ts = now->tv_sec;
- } else {
- return;
}
for (i = 0; i < n; i++) {
--
2.38.1
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v2 12/16] udp: Make UDP_SPLICE_FRAMES and UDP_TAP_FRAMES_MEM the same thing
2022-11-24 1:16 [PATCH v2 00/16] Simplify and correct handling of "spliced" UDP forwarding David Gibson
` (10 preceding siblings ...)
2022-11-24 1:16 ` [PATCH v2 11/16] udp: Simplify udp_sock_handler_splice David Gibson
@ 2022-11-24 1:16 ` David Gibson
2022-11-24 1:16 ` [PATCH v2 13/16] udp: Add helper to extract port from a sockaddr_in or sockaddr_in6 David Gibson
` (3 subsequent siblings)
15 siblings, 0 replies; 29+ messages in thread
From: David Gibson @ 2022-11-24 1:16 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
These two constants have the same value, and there's not a lot of reason
they'd ever need to be different. Future changes will further integrate
the spliced and "tap" paths so that these need to be the same. So, merge
them into UDP_MAX_FRAMES.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
udp.c | 55 +++++++++++++++++++++++++++----------------------------
1 file changed, 27 insertions(+), 28 deletions(-)
diff --git a/udp.c b/udp.c
index ec61022..1db55e5 100644
--- a/udp.c
+++ b/udp.c
@@ -118,9 +118,8 @@
#include "log.h"
#define UDP_CONN_TIMEOUT 180 /* s, timeout for ephemeral or local bind */
-#define UDP_SPLICE_FRAMES 32
-#define UDP_TAP_FRAMES_MEM 32
-#define UDP_TAP_FRAMES (c->mode == MODE_PASST ? UDP_TAP_FRAMES_MEM : 1)
+#define UDP_MAX_FRAMES 32 /* max # of frames to receive at once */
+#define UDP_TAP_FRAMES (c->mode == MODE_PASST ? UDP_MAX_FRAMES : 1)
/**
* struct udp_tap_port - Port tracking based on tap-facing source port
@@ -188,7 +187,7 @@ static struct udp4_l2_buf_t {
uint8_t data[USHRT_MAX -
(sizeof(struct iphdr) + sizeof(struct udphdr))];
} __attribute__ ((packed, aligned(__alignof__(unsigned int))))
-udp4_l2_buf[UDP_TAP_FRAMES_MEM];
+udp4_l2_buf[UDP_MAX_FRAMES];
/**
* udp6_l2_buf_t - Pre-cooked IPv6 packet buffers for tap connections
@@ -218,30 +217,30 @@ struct udp6_l2_buf_t {
#else
} __attribute__ ((packed, aligned(__alignof__(unsigned int))))
#endif
-udp6_l2_buf[UDP_TAP_FRAMES_MEM];
+udp6_l2_buf[UDP_MAX_FRAMES];
static struct sockaddr_storage udp_splice_namebuf;
-static uint8_t udp_splice_buf[UDP_SPLICE_FRAMES][USHRT_MAX];
+static uint8_t udp_splice_buf[UDP_MAX_FRAMES][USHRT_MAX];
/* recvmmsg()/sendmmsg() data for tap */
-static struct iovec udp4_l2_iov_sock [UDP_TAP_FRAMES_MEM];
-static struct iovec udp6_l2_iov_sock [UDP_TAP_FRAMES_MEM];
+static struct iovec udp4_l2_iov_sock [UDP_MAX_FRAMES];
+static struct iovec udp6_l2_iov_sock [UDP_MAX_FRAMES];
-static struct iovec udp4_l2_iov_tap [UDP_TAP_FRAMES_MEM];
-static struct iovec udp6_l2_iov_tap [UDP_TAP_FRAMES_MEM];
+static struct iovec udp4_l2_iov_tap [UDP_MAX_FRAMES];
+static struct iovec udp6_l2_iov_tap [UDP_MAX_FRAMES];
-static struct mmsghdr udp4_l2_mh_sock [UDP_TAP_FRAMES_MEM];
-static struct mmsghdr udp6_l2_mh_sock [UDP_TAP_FRAMES_MEM];
+static struct mmsghdr udp4_l2_mh_sock [UDP_MAX_FRAMES];
+static struct mmsghdr udp6_l2_mh_sock [UDP_MAX_FRAMES];
-static struct mmsghdr udp4_l2_mh_tap [UDP_TAP_FRAMES_MEM];
-static struct mmsghdr udp6_l2_mh_tap [UDP_TAP_FRAMES_MEM];
+static struct mmsghdr udp4_l2_mh_tap [UDP_MAX_FRAMES];
+static struct mmsghdr udp6_l2_mh_tap [UDP_MAX_FRAMES];
/* recvmmsg()/sendmmsg() data for "spliced" connections */
-static struct iovec udp_iov_recv [UDP_SPLICE_FRAMES];
-static struct mmsghdr udp_mmh_recv [UDP_SPLICE_FRAMES];
+static struct iovec udp_iov_recv [UDP_MAX_FRAMES];
+static struct mmsghdr udp_mmh_recv [UDP_MAX_FRAMES];
-static struct iovec udp_iov_sendto [UDP_SPLICE_FRAMES];
-static struct mmsghdr udp_mmh_sendto [UDP_SPLICE_FRAMES];
+static struct iovec udp_iov_sendto [UDP_MAX_FRAMES];
+static struct mmsghdr udp_mmh_sendto [UDP_MAX_FRAMES];
/**
* udp_invert_portmap() - Compute reverse port translations for return packets
@@ -286,7 +285,7 @@ void udp_update_l2_buf(const unsigned char *eth_d, const unsigned char *eth_s,
{
int i;
- for (i = 0; i < UDP_TAP_FRAMES_MEM; i++) {
+ for (i = 0; i < UDP_MAX_FRAMES; i++) {
struct udp4_l2_buf_t *b4 = &udp4_l2_buf[i];
struct udp6_l2_buf_t *b6 = &udp6_l2_buf[i];
@@ -330,7 +329,7 @@ static void udp_sock4_iov_init(void)
};
}
- for (i = 0, h = udp4_l2_mh_sock; i < UDP_TAP_FRAMES_MEM; i++, h++) {
+ for (i = 0, h = udp4_l2_mh_sock; i < UDP_MAX_FRAMES; i++, h++) {
struct msghdr *mh = &h->msg_hdr;
mh->msg_name = &udp4_l2_buf[i].s_in;
@@ -342,7 +341,7 @@ static void udp_sock4_iov_init(void)
mh->msg_iovlen = 1;
}
- for (i = 0, h = udp4_l2_mh_tap; i < UDP_TAP_FRAMES_MEM; i++, h++) {
+ for (i = 0, h = udp4_l2_mh_tap; i < UDP_MAX_FRAMES; i++, h++) {
struct msghdr *mh = &h->msg_hdr;
udp4_l2_iov_tap[i].iov_base = &udp4_l2_buf[i].vnet_len;
@@ -370,7 +369,7 @@ static void udp_sock6_iov_init(void)
};
}
- for (i = 0, h = udp6_l2_mh_sock; i < UDP_TAP_FRAMES_MEM; i++, h++) {
+ for (i = 0, h = udp6_l2_mh_sock; i < UDP_MAX_FRAMES; i++, h++) {
struct msghdr *mh = &h->msg_hdr;
mh->msg_name = &udp6_l2_buf[i].s_in6;
@@ -382,7 +381,7 @@ static void udp_sock6_iov_init(void)
mh->msg_iovlen = 1;
}
- for (i = 0, h = udp6_l2_mh_tap; i < UDP_TAP_FRAMES_MEM; i++, h++) {
+ for (i = 0, h = udp6_l2_mh_tap; i < UDP_MAX_FRAMES; i++, h++) {
struct msghdr *mh = &h->msg_hdr;
udp6_l2_iov_tap[i].iov_base = &udp6_l2_buf[i].vnet_len;
@@ -517,7 +516,7 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
if (!(events & EPOLLIN))
return;
- n = recvmmsg(ref.r.s, udp_mmh_recv, UDP_SPLICE_FRAMES, 0, NULL);
+ n = recvmmsg(ref.r.s, udp_mmh_recv, UDP_MAX_FRAMES, 0, NULL);
if (n <= 0)
return;
@@ -1167,7 +1166,7 @@ static void udp_splice_iov_init(void)
struct iovec *iov;
int i;
- for (i = 0, h = udp_mmh_recv; i < UDP_SPLICE_FRAMES; i++, h++) {
+ for (i = 0, h = udp_mmh_recv; i < UDP_MAX_FRAMES; i++, h++) {
struct msghdr *mh = &h->msg_hdr;
if (!i) {
@@ -1178,12 +1177,12 @@ static void udp_splice_iov_init(void)
mh->msg_iov = &udp_iov_recv[i];
mh->msg_iovlen = 1;
}
- for (i = 0, iov = udp_iov_recv; i < UDP_SPLICE_FRAMES; i++, iov++) {
+ for (i = 0, iov = udp_iov_recv; i < UDP_MAX_FRAMES; i++, iov++) {
iov->iov_base = udp_splice_buf[i];
iov->iov_len = sizeof(udp_splice_buf[i]);
}
- for (i = 0, h = udp_mmh_sendto; i < UDP_SPLICE_FRAMES; i++, h++) {
+ for (i = 0, h = udp_mmh_sendto; i < UDP_MAX_FRAMES; i++, h++) {
struct msghdr *mh = &h->msg_hdr;
mh->msg_name = &udp_splice_namebuf;
@@ -1192,7 +1191,7 @@ static void udp_splice_iov_init(void)
mh->msg_iov = &udp_iov_sendto[i];
mh->msg_iovlen = 1;
}
- for (i = 0, iov = udp_iov_sendto; i < UDP_SPLICE_FRAMES; i++, iov++)
+ for (i = 0, iov = udp_iov_sendto; i < UDP_MAX_FRAMES; i++, iov++)
iov->iov_base = udp_splice_buf[i];
}
--
@@ -118,9 +118,8 @@
#include "log.h"
#define UDP_CONN_TIMEOUT 180 /* s, timeout for ephemeral or local bind */
-#define UDP_SPLICE_FRAMES 32
-#define UDP_TAP_FRAMES_MEM 32
-#define UDP_TAP_FRAMES (c->mode == MODE_PASST ? UDP_TAP_FRAMES_MEM : 1)
+#define UDP_MAX_FRAMES 32 /* max # of frames to receive at once */
+#define UDP_TAP_FRAMES (c->mode == MODE_PASST ? UDP_MAX_FRAMES : 1)
/**
* struct udp_tap_port - Port tracking based on tap-facing source port
@@ -188,7 +187,7 @@ static struct udp4_l2_buf_t {
uint8_t data[USHRT_MAX -
(sizeof(struct iphdr) + sizeof(struct udphdr))];
} __attribute__ ((packed, aligned(__alignof__(unsigned int))))
-udp4_l2_buf[UDP_TAP_FRAMES_MEM];
+udp4_l2_buf[UDP_MAX_FRAMES];
/**
* udp6_l2_buf_t - Pre-cooked IPv6 packet buffers for tap connections
@@ -218,30 +217,30 @@ struct udp6_l2_buf_t {
#else
} __attribute__ ((packed, aligned(__alignof__(unsigned int))))
#endif
-udp6_l2_buf[UDP_TAP_FRAMES_MEM];
+udp6_l2_buf[UDP_MAX_FRAMES];
static struct sockaddr_storage udp_splice_namebuf;
-static uint8_t udp_splice_buf[UDP_SPLICE_FRAMES][USHRT_MAX];
+static uint8_t udp_splice_buf[UDP_MAX_FRAMES][USHRT_MAX];
/* recvmmsg()/sendmmsg() data for tap */
-static struct iovec udp4_l2_iov_sock [UDP_TAP_FRAMES_MEM];
-static struct iovec udp6_l2_iov_sock [UDP_TAP_FRAMES_MEM];
+static struct iovec udp4_l2_iov_sock [UDP_MAX_FRAMES];
+static struct iovec udp6_l2_iov_sock [UDP_MAX_FRAMES];
-static struct iovec udp4_l2_iov_tap [UDP_TAP_FRAMES_MEM];
-static struct iovec udp6_l2_iov_tap [UDP_TAP_FRAMES_MEM];
+static struct iovec udp4_l2_iov_tap [UDP_MAX_FRAMES];
+static struct iovec udp6_l2_iov_tap [UDP_MAX_FRAMES];
-static struct mmsghdr udp4_l2_mh_sock [UDP_TAP_FRAMES_MEM];
-static struct mmsghdr udp6_l2_mh_sock [UDP_TAP_FRAMES_MEM];
+static struct mmsghdr udp4_l2_mh_sock [UDP_MAX_FRAMES];
+static struct mmsghdr udp6_l2_mh_sock [UDP_MAX_FRAMES];
-static struct mmsghdr udp4_l2_mh_tap [UDP_TAP_FRAMES_MEM];
-static struct mmsghdr udp6_l2_mh_tap [UDP_TAP_FRAMES_MEM];
+static struct mmsghdr udp4_l2_mh_tap [UDP_MAX_FRAMES];
+static struct mmsghdr udp6_l2_mh_tap [UDP_MAX_FRAMES];
/* recvmmsg()/sendmmsg() data for "spliced" connections */
-static struct iovec udp_iov_recv [UDP_SPLICE_FRAMES];
-static struct mmsghdr udp_mmh_recv [UDP_SPLICE_FRAMES];
+static struct iovec udp_iov_recv [UDP_MAX_FRAMES];
+static struct mmsghdr udp_mmh_recv [UDP_MAX_FRAMES];
-static struct iovec udp_iov_sendto [UDP_SPLICE_FRAMES];
-static struct mmsghdr udp_mmh_sendto [UDP_SPLICE_FRAMES];
+static struct iovec udp_iov_sendto [UDP_MAX_FRAMES];
+static struct mmsghdr udp_mmh_sendto [UDP_MAX_FRAMES];
/**
* udp_invert_portmap() - Compute reverse port translations for return packets
@@ -286,7 +285,7 @@ void udp_update_l2_buf(const unsigned char *eth_d, const unsigned char *eth_s,
{
int i;
- for (i = 0; i < UDP_TAP_FRAMES_MEM; i++) {
+ for (i = 0; i < UDP_MAX_FRAMES; i++) {
struct udp4_l2_buf_t *b4 = &udp4_l2_buf[i];
struct udp6_l2_buf_t *b6 = &udp6_l2_buf[i];
@@ -330,7 +329,7 @@ static void udp_sock4_iov_init(void)
};
}
- for (i = 0, h = udp4_l2_mh_sock; i < UDP_TAP_FRAMES_MEM; i++, h++) {
+ for (i = 0, h = udp4_l2_mh_sock; i < UDP_MAX_FRAMES; i++, h++) {
struct msghdr *mh = &h->msg_hdr;
mh->msg_name = &udp4_l2_buf[i].s_in;
@@ -342,7 +341,7 @@ static void udp_sock4_iov_init(void)
mh->msg_iovlen = 1;
}
- for (i = 0, h = udp4_l2_mh_tap; i < UDP_TAP_FRAMES_MEM; i++, h++) {
+ for (i = 0, h = udp4_l2_mh_tap; i < UDP_MAX_FRAMES; i++, h++) {
struct msghdr *mh = &h->msg_hdr;
udp4_l2_iov_tap[i].iov_base = &udp4_l2_buf[i].vnet_len;
@@ -370,7 +369,7 @@ static void udp_sock6_iov_init(void)
};
}
- for (i = 0, h = udp6_l2_mh_sock; i < UDP_TAP_FRAMES_MEM; i++, h++) {
+ for (i = 0, h = udp6_l2_mh_sock; i < UDP_MAX_FRAMES; i++, h++) {
struct msghdr *mh = &h->msg_hdr;
mh->msg_name = &udp6_l2_buf[i].s_in6;
@@ -382,7 +381,7 @@ static void udp_sock6_iov_init(void)
mh->msg_iovlen = 1;
}
- for (i = 0, h = udp6_l2_mh_tap; i < UDP_TAP_FRAMES_MEM; i++, h++) {
+ for (i = 0, h = udp6_l2_mh_tap; i < UDP_MAX_FRAMES; i++, h++) {
struct msghdr *mh = &h->msg_hdr;
udp6_l2_iov_tap[i].iov_base = &udp6_l2_buf[i].vnet_len;
@@ -517,7 +516,7 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
if (!(events & EPOLLIN))
return;
- n = recvmmsg(ref.r.s, udp_mmh_recv, UDP_SPLICE_FRAMES, 0, NULL);
+ n = recvmmsg(ref.r.s, udp_mmh_recv, UDP_MAX_FRAMES, 0, NULL);
if (n <= 0)
return;
@@ -1167,7 +1166,7 @@ static void udp_splice_iov_init(void)
struct iovec *iov;
int i;
- for (i = 0, h = udp_mmh_recv; i < UDP_SPLICE_FRAMES; i++, h++) {
+ for (i = 0, h = udp_mmh_recv; i < UDP_MAX_FRAMES; i++, h++) {
struct msghdr *mh = &h->msg_hdr;
if (!i) {
@@ -1178,12 +1177,12 @@ static void udp_splice_iov_init(void)
mh->msg_iov = &udp_iov_recv[i];
mh->msg_iovlen = 1;
}
- for (i = 0, iov = udp_iov_recv; i < UDP_SPLICE_FRAMES; i++, iov++) {
+ for (i = 0, iov = udp_iov_recv; i < UDP_MAX_FRAMES; i++, iov++) {
iov->iov_base = udp_splice_buf[i];
iov->iov_len = sizeof(udp_splice_buf[i]);
}
- for (i = 0, h = udp_mmh_sendto; i < UDP_SPLICE_FRAMES; i++, h++) {
+ for (i = 0, h = udp_mmh_sendto; i < UDP_MAX_FRAMES; i++, h++) {
struct msghdr *mh = &h->msg_hdr;
mh->msg_name = &udp_splice_namebuf;
@@ -1192,7 +1191,7 @@ static void udp_splice_iov_init(void)
mh->msg_iov = &udp_iov_sendto[i];
mh->msg_iovlen = 1;
}
- for (i = 0, iov = udp_iov_sendto; i < UDP_SPLICE_FRAMES; i++, iov++)
+ for (i = 0, iov = udp_iov_sendto; i < UDP_MAX_FRAMES; i++, iov++)
iov->iov_base = udp_splice_buf[i];
}
--
2.38.1
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v2 13/16] udp: Add helper to extract port from a sockaddr_in or sockaddr_in6
2022-11-24 1:16 [PATCH v2 00/16] Simplify and correct handling of "spliced" UDP forwarding David Gibson
` (11 preceding siblings ...)
2022-11-24 1:16 ` [PATCH v2 12/16] udp: Make UDP_SPLICE_FRAMES and UDP_TAP_FRAMES_MEM the same thing David Gibson
@ 2022-11-24 1:16 ` David Gibson
2022-11-25 1:48 ` Stefano Brivio
2022-11-24 1:16 ` [PATCH v2 14/16] udp: Unify buffers for tap and splice paths David Gibson
` (2 subsequent siblings)
15 siblings, 1 reply; 29+ messages in thread
From: David Gibson @ 2022-11-24 1:16 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
udp_sock_handler_splice() has a somewhat clunky if to extract the port from
a socket address which could be either IPv4 or IPv6. Future changes are
going to make this even more clunky, so introduce a helper function to
do this extraction.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
udp.c | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)
diff --git a/udp.c b/udp.c
index 1db55e5..a79e5ca 100644
--- a/udp.c
+++ b/udp.c
@@ -498,6 +498,19 @@ static int udp_splice_new_ns(void *arg)
return 0;
}
+/**
+ * sa_port() - Determine port from a sockaddr_in or sockaddr_in6
+ * @v6: Is @sa a sockaddr_in6 (otherwise sockaddr_in)
+ * @sa: Pointer to either sockaddr_in or sockaddr_in6
+ */
+static in_port_t sa_port(bool v6, const void *sa)
+{
+ const struct sockaddr_in6 *sa6 = sa;
+ const struct sockaddr_in *sa4 = sa;
+
+ return v6 ? ntohs(sa6->sin6_port) : ntohs(sa4->sin_port);
+}
+
/**
* udp_sock_handler_splice() - Handler for socket mapped to "spliced" connection
* @c: Execution context
@@ -509,8 +522,6 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
uint32_t events, const struct timespec *now)
{
in_port_t src, dst = ref.r.p.udp.udp.port;
- struct msghdr *mh = &udp_mmh_recv[0].msg_hdr;
- struct sockaddr_storage *sa_s = mh->msg_name;
int s, v6 = ref.r.p.udp.udp.v6, n, i;
if (!(events & EPOLLIN))
@@ -521,16 +532,7 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
if (n <= 0)
return;
- if (v6) {
- struct sockaddr_in6 *sa = (struct sockaddr_in6 *)sa_s;
-
- src = htons(sa->sin6_port);
- } else {
- struct sockaddr_in *sa = (struct sockaddr_in *)sa_s;
-
- src = ntohs(sa->sin_port);
- }
-
+ src = sa_port(v6, udp_mmh_recv[0].msg_hdr.msg_name);
if (ref.r.p.udp.udp.ns) {
src += c->udp.fwd_in.rdelta[src];
--
@@ -498,6 +498,19 @@ static int udp_splice_new_ns(void *arg)
return 0;
}
+/**
+ * sa_port() - Determine port from a sockaddr_in or sockaddr_in6
+ * @v6: Is @sa a sockaddr_in6 (otherwise sockaddr_in)
+ * @sa: Pointer to either sockaddr_in or sockaddr_in6
+ */
+static in_port_t sa_port(bool v6, const void *sa)
+{
+ const struct sockaddr_in6 *sa6 = sa;
+ const struct sockaddr_in *sa4 = sa;
+
+ return v6 ? ntohs(sa6->sin6_port) : ntohs(sa4->sin_port);
+}
+
/**
* udp_sock_handler_splice() - Handler for socket mapped to "spliced" connection
* @c: Execution context
@@ -509,8 +522,6 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
uint32_t events, const struct timespec *now)
{
in_port_t src, dst = ref.r.p.udp.udp.port;
- struct msghdr *mh = &udp_mmh_recv[0].msg_hdr;
- struct sockaddr_storage *sa_s = mh->msg_name;
int s, v6 = ref.r.p.udp.udp.v6, n, i;
if (!(events & EPOLLIN))
@@ -521,16 +532,7 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
if (n <= 0)
return;
- if (v6) {
- struct sockaddr_in6 *sa = (struct sockaddr_in6 *)sa_s;
-
- src = htons(sa->sin6_port);
- } else {
- struct sockaddr_in *sa = (struct sockaddr_in *)sa_s;
-
- src = ntohs(sa->sin_port);
- }
-
+ src = sa_port(v6, udp_mmh_recv[0].msg_hdr.msg_name);
if (ref.r.p.udp.udp.ns) {
src += c->udp.fwd_in.rdelta[src];
--
2.38.1
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH v2 13/16] udp: Add helper to extract port from a sockaddr_in or sockaddr_in6
2022-11-24 1:16 ` [PATCH v2 13/16] udp: Add helper to extract port from a sockaddr_in or sockaddr_in6 David Gibson
@ 2022-11-25 1:48 ` Stefano Brivio
2022-11-25 7:10 ` David Gibson
0 siblings, 1 reply; 29+ messages in thread
From: Stefano Brivio @ 2022-11-25 1:48 UTC (permalink / raw)
To: David Gibson; +Cc: passt-dev
Nit:
On Thu, 24 Nov 2022 12:16:56 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:
> udp_sock_handler_splice() has a somewhat clunky if to extract the port from
> a socket address which could be either IPv4 or IPv6. Future changes are
> going to make this even more clunky, so introduce a helper function to
> do this extraction.
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> udp.c | 26 ++++++++++++++------------
> 1 file changed, 14 insertions(+), 12 deletions(-)
>
> diff --git a/udp.c b/udp.c
> index 1db55e5..a79e5ca 100644
> --- a/udp.c
> +++ b/udp.c
> @@ -498,6 +498,19 @@ static int udp_splice_new_ns(void *arg)
> return 0;
> }
>
> +/**
> + * sa_port() - Determine port from a sockaddr_in or sockaddr_in6
> + * @v6: Is @sa a sockaddr_in6 (otherwise sockaddr_in)
Missing '?'
--
Stefano
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 13/16] udp: Add helper to extract port from a sockaddr_in or sockaddr_in6
2022-11-25 1:48 ` Stefano Brivio
@ 2022-11-25 7:10 ` David Gibson
0 siblings, 0 replies; 29+ messages in thread
From: David Gibson @ 2022-11-25 7:10 UTC (permalink / raw)
To: Stefano Brivio; +Cc: passt-dev
[-- Attachment #1: Type: text/plain, Size: 1182 bytes --]
On Fri, Nov 25, 2022 at 02:48:09AM +0100, Stefano Brivio wrote:
> Nit:
>
> On Thu, 24 Nov 2022 12:16:56 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
>
> > udp_sock_handler_splice() has a somewhat clunky if to extract the port from
> > a socket address which could be either IPv4 or IPv6. Future changes are
> > going to make this even more clunky, so introduce a helper function to
> > do this extraction.
> >
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> > udp.c | 26 ++++++++++++++------------
> > 1 file changed, 14 insertions(+), 12 deletions(-)
> >
> > diff --git a/udp.c b/udp.c
> > index 1db55e5..a79e5ca 100644
> > --- a/udp.c
> > +++ b/udp.c
> > @@ -498,6 +498,19 @@ static int udp_splice_new_ns(void *arg)
> > return 0;
> > }
> >
> > +/**
> > + * sa_port() - Determine port from a sockaddr_in or sockaddr_in6
> > + * @v6: Is @sa a sockaddr_in6 (otherwise sockaddr_in)
>
> Missing '?'
Fixed.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH v2 14/16] udp: Unify buffers for tap and splice paths
2022-11-24 1:16 [PATCH v2 00/16] Simplify and correct handling of "spliced" UDP forwarding David Gibson
` (12 preceding siblings ...)
2022-11-24 1:16 ` [PATCH v2 13/16] udp: Add helper to extract port from a sockaddr_in or sockaddr_in6 David Gibson
@ 2022-11-24 1:16 ` David Gibson
2022-11-24 1:16 ` [PATCH v2 15/16] udp: Split send half of udp_sock_handler_splice() from the receive half David Gibson
2022-11-24 1:16 ` [PATCH v2 16/16] udp: Correct splice forwarding when receiving from multiple sources David Gibson
15 siblings, 0 replies; 29+ messages in thread
From: David Gibson @ 2022-11-24 1:16 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
We maintain a set of buffers for UDP packets to be forwarded via the tap
interface in udp[46]_l2_buf. We then have a separate set of buffers for
packets to be "spliced" in udp_splice_buf[]. However, we only use one of
these at a time, so we can share the buffer space.
For the receiving splice packets we can not only re-use the data buffers
but also the udp[46]_l2_iov_sock and udp[46]_l2_mh_sock control structures.
For sending the splice packets we keep the same data buffers, but we need
specific control structures. We create udp[46]_iov_splice - we can't
reuse udp_l2_iov_sock[] because we need to write iov_len as we're writing
spliced packets, but the tap path expects iov_len to remain the same (it
only uses it for receive). Likewise we create udp[46]_mh_splice with the
mmsghdr structures for sending spliced packets. As well as needing to
reference different iovs, these need to all reference udp_splice_namebuf
instead of individual msg_name fields for each slot.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
udp.c | 71 ++++++++++++++++++++++++++---------------------------------
1 file changed, 31 insertions(+), 40 deletions(-)
diff --git a/udp.c b/udp.c
index a79e5ca..b7c71c2 100644
--- a/udp.c
+++ b/udp.c
@@ -219,9 +219,6 @@ struct udp6_l2_buf_t {
#endif
udp6_l2_buf[UDP_MAX_FRAMES];
-static struct sockaddr_storage udp_splice_namebuf;
-static uint8_t udp_splice_buf[UDP_MAX_FRAMES][USHRT_MAX];
-
/* recvmmsg()/sendmmsg() data for tap */
static struct iovec udp4_l2_iov_sock [UDP_MAX_FRAMES];
static struct iovec udp6_l2_iov_sock [UDP_MAX_FRAMES];
@@ -236,11 +233,13 @@ static struct mmsghdr udp4_l2_mh_tap [UDP_MAX_FRAMES];
static struct mmsghdr udp6_l2_mh_tap [UDP_MAX_FRAMES];
/* recvmmsg()/sendmmsg() data for "spliced" connections */
-static struct iovec udp_iov_recv [UDP_MAX_FRAMES];
-static struct mmsghdr udp_mmh_recv [UDP_MAX_FRAMES];
+static struct sockaddr_storage udp_splice_namebuf;
-static struct iovec udp_iov_sendto [UDP_MAX_FRAMES];
-static struct mmsghdr udp_mmh_sendto [UDP_MAX_FRAMES];
+static struct iovec udp4_iov_splice [UDP_MAX_FRAMES];
+static struct iovec udp6_iov_splice [UDP_MAX_FRAMES];
+
+static struct mmsghdr udp4_mh_splice [UDP_MAX_FRAMES];
+static struct mmsghdr udp6_mh_splice [UDP_MAX_FRAMES];
/**
* udp_invert_portmap() - Compute reverse port translations for return packets
@@ -523,16 +522,25 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
{
in_port_t src, dst = ref.r.p.udp.udp.port;
int s, v6 = ref.r.p.udp.udp.v6, n, i;
+ struct mmsghdr *mmh_recv, *mmh_send;
if (!(events & EPOLLIN))
return;
- n = recvmmsg(ref.r.s, udp_mmh_recv, UDP_MAX_FRAMES, 0, NULL);
+ if (v6) {
+ mmh_recv = udp6_l2_mh_sock;
+ mmh_send = udp6_mh_splice;
+ } else {
+ mmh_recv = udp4_l2_mh_sock;
+ mmh_send = udp4_mh_splice;
+ }
+
+ n = recvmmsg(ref.r.s, mmh_recv, UDP_MAX_FRAMES, 0, NULL);
if (n <= 0)
return;
- src = sa_port(v6, udp_mmh_recv[0].msg_hdr.msg_name);
+ src = sa_port(v6, mmh_recv[0].msg_hdr.msg_name);
if (ref.r.p.udp.udp.ns) {
src += c->udp.fwd_in.rdelta[src];
@@ -563,11 +571,8 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
udp_splice_ns[v6][src].ts = now->tv_sec;
}
- for (i = 0; i < n; i++) {
- struct msghdr *mh_s = &udp_mmh_sendto[i].msg_hdr;
-
- mh_s->msg_iov->iov_len = udp_mmh_recv[i].msg_len;
- }
+ for (i = 0; i < n; i++)
+ mmh_send[i].msg_hdr.msg_iov->iov_len = mmh_recv[i].msg_len;
if (v6) {
*((struct sockaddr_in6 *)&udp_splice_namebuf) =
@@ -587,7 +592,7 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
});
}
- sendmmsg(s, udp_mmh_sendto, n, MSG_NOSIGNAL);
+ sendmmsg(s, mmh_send, n, MSG_NOSIGNAL);
}
/**
@@ -1164,37 +1169,23 @@ int udp_sock_init_ns(void *arg)
*/
static void udp_splice_iov_init(void)
{
- struct mmsghdr *h;
- struct iovec *iov;
int i;
- for (i = 0, h = udp_mmh_recv; i < UDP_MAX_FRAMES; i++, h++) {
- struct msghdr *mh = &h->msg_hdr;
-
- if (!i) {
- mh->msg_name = &udp_splice_namebuf;
- mh->msg_namelen = sizeof(udp_splice_namebuf);
- }
-
- mh->msg_iov = &udp_iov_recv[i];
- mh->msg_iovlen = 1;
- }
- for (i = 0, iov = udp_iov_recv; i < UDP_MAX_FRAMES; i++, iov++) {
- iov->iov_base = udp_splice_buf[i];
- iov->iov_len = sizeof(udp_splice_buf[i]);
- }
+ for (i = 0; i < UDP_MAX_FRAMES; i++) {
+ struct msghdr *mh4 = &udp4_mh_splice[i].msg_hdr;
+ struct msghdr *mh6 = &udp6_mh_splice[i].msg_hdr;
- for (i = 0, h = udp_mmh_sendto; i < UDP_MAX_FRAMES; i++, h++) {
- struct msghdr *mh = &h->msg_hdr;
+ mh4->msg_name = mh6->msg_name = &udp_splice_namebuf;
+ mh4->msg_namelen = sizeof(udp_splice_namebuf);
+ mh6->msg_namelen = sizeof(udp_splice_namebuf);
- mh->msg_name = &udp_splice_namebuf;
- mh->msg_namelen = sizeof(udp_splice_namebuf);
+ udp4_iov_splice[i].iov_base = udp4_l2_buf[i].data;
+ udp6_iov_splice[i].iov_base = udp6_l2_buf[i].data;
- mh->msg_iov = &udp_iov_sendto[i];
- mh->msg_iovlen = 1;
+ mh4->msg_iov = &udp4_iov_splice[i];
+ mh6->msg_iov = &udp6_iov_splice[i];
+ mh4->msg_iovlen = mh6->msg_iovlen = 1;
}
- for (i = 0, iov = udp_iov_sendto; i < UDP_MAX_FRAMES; i++, iov++)
- iov->iov_base = udp_splice_buf[i];
}
/**
--
@@ -219,9 +219,6 @@ struct udp6_l2_buf_t {
#endif
udp6_l2_buf[UDP_MAX_FRAMES];
-static struct sockaddr_storage udp_splice_namebuf;
-static uint8_t udp_splice_buf[UDP_MAX_FRAMES][USHRT_MAX];
-
/* recvmmsg()/sendmmsg() data for tap */
static struct iovec udp4_l2_iov_sock [UDP_MAX_FRAMES];
static struct iovec udp6_l2_iov_sock [UDP_MAX_FRAMES];
@@ -236,11 +233,13 @@ static struct mmsghdr udp4_l2_mh_tap [UDP_MAX_FRAMES];
static struct mmsghdr udp6_l2_mh_tap [UDP_MAX_FRAMES];
/* recvmmsg()/sendmmsg() data for "spliced" connections */
-static struct iovec udp_iov_recv [UDP_MAX_FRAMES];
-static struct mmsghdr udp_mmh_recv [UDP_MAX_FRAMES];
+static struct sockaddr_storage udp_splice_namebuf;
-static struct iovec udp_iov_sendto [UDP_MAX_FRAMES];
-static struct mmsghdr udp_mmh_sendto [UDP_MAX_FRAMES];
+static struct iovec udp4_iov_splice [UDP_MAX_FRAMES];
+static struct iovec udp6_iov_splice [UDP_MAX_FRAMES];
+
+static struct mmsghdr udp4_mh_splice [UDP_MAX_FRAMES];
+static struct mmsghdr udp6_mh_splice [UDP_MAX_FRAMES];
/**
* udp_invert_portmap() - Compute reverse port translations for return packets
@@ -523,16 +522,25 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
{
in_port_t src, dst = ref.r.p.udp.udp.port;
int s, v6 = ref.r.p.udp.udp.v6, n, i;
+ struct mmsghdr *mmh_recv, *mmh_send;
if (!(events & EPOLLIN))
return;
- n = recvmmsg(ref.r.s, udp_mmh_recv, UDP_MAX_FRAMES, 0, NULL);
+ if (v6) {
+ mmh_recv = udp6_l2_mh_sock;
+ mmh_send = udp6_mh_splice;
+ } else {
+ mmh_recv = udp4_l2_mh_sock;
+ mmh_send = udp4_mh_splice;
+ }
+
+ n = recvmmsg(ref.r.s, mmh_recv, UDP_MAX_FRAMES, 0, NULL);
if (n <= 0)
return;
- src = sa_port(v6, udp_mmh_recv[0].msg_hdr.msg_name);
+ src = sa_port(v6, mmh_recv[0].msg_hdr.msg_name);
if (ref.r.p.udp.udp.ns) {
src += c->udp.fwd_in.rdelta[src];
@@ -563,11 +571,8 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
udp_splice_ns[v6][src].ts = now->tv_sec;
}
- for (i = 0; i < n; i++) {
- struct msghdr *mh_s = &udp_mmh_sendto[i].msg_hdr;
-
- mh_s->msg_iov->iov_len = udp_mmh_recv[i].msg_len;
- }
+ for (i = 0; i < n; i++)
+ mmh_send[i].msg_hdr.msg_iov->iov_len = mmh_recv[i].msg_len;
if (v6) {
*((struct sockaddr_in6 *)&udp_splice_namebuf) =
@@ -587,7 +592,7 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
});
}
- sendmmsg(s, udp_mmh_sendto, n, MSG_NOSIGNAL);
+ sendmmsg(s, mmh_send, n, MSG_NOSIGNAL);
}
/**
@@ -1164,37 +1169,23 @@ int udp_sock_init_ns(void *arg)
*/
static void udp_splice_iov_init(void)
{
- struct mmsghdr *h;
- struct iovec *iov;
int i;
- for (i = 0, h = udp_mmh_recv; i < UDP_MAX_FRAMES; i++, h++) {
- struct msghdr *mh = &h->msg_hdr;
-
- if (!i) {
- mh->msg_name = &udp_splice_namebuf;
- mh->msg_namelen = sizeof(udp_splice_namebuf);
- }
-
- mh->msg_iov = &udp_iov_recv[i];
- mh->msg_iovlen = 1;
- }
- for (i = 0, iov = udp_iov_recv; i < UDP_MAX_FRAMES; i++, iov++) {
- iov->iov_base = udp_splice_buf[i];
- iov->iov_len = sizeof(udp_splice_buf[i]);
- }
+ for (i = 0; i < UDP_MAX_FRAMES; i++) {
+ struct msghdr *mh4 = &udp4_mh_splice[i].msg_hdr;
+ struct msghdr *mh6 = &udp6_mh_splice[i].msg_hdr;
- for (i = 0, h = udp_mmh_sendto; i < UDP_MAX_FRAMES; i++, h++) {
- struct msghdr *mh = &h->msg_hdr;
+ mh4->msg_name = mh6->msg_name = &udp_splice_namebuf;
+ mh4->msg_namelen = sizeof(udp_splice_namebuf);
+ mh6->msg_namelen = sizeof(udp_splice_namebuf);
- mh->msg_name = &udp_splice_namebuf;
- mh->msg_namelen = sizeof(udp_splice_namebuf);
+ udp4_iov_splice[i].iov_base = udp4_l2_buf[i].data;
+ udp6_iov_splice[i].iov_base = udp6_l2_buf[i].data;
- mh->msg_iov = &udp_iov_sendto[i];
- mh->msg_iovlen = 1;
+ mh4->msg_iov = &udp4_iov_splice[i];
+ mh6->msg_iov = &udp6_iov_splice[i];
+ mh4->msg_iovlen = mh6->msg_iovlen = 1;
}
- for (i = 0, iov = udp_iov_sendto; i < UDP_MAX_FRAMES; i++, iov++)
- iov->iov_base = udp_splice_buf[i];
}
/**
--
2.38.1
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v2 15/16] udp: Split send half of udp_sock_handler_splice() from the receive half
2022-11-24 1:16 [PATCH v2 00/16] Simplify and correct handling of "spliced" UDP forwarding David Gibson
` (13 preceding siblings ...)
2022-11-24 1:16 ` [PATCH v2 14/16] udp: Unify buffers for tap and splice paths David Gibson
@ 2022-11-24 1:16 ` David Gibson
2022-11-24 1:16 ` [PATCH v2 16/16] udp: Correct splice forwarding when receiving from multiple sources David Gibson
15 siblings, 0 replies; 29+ messages in thread
From: David Gibson @ 2022-11-24 1:16 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
Move the part of udp_sock_handler_splice() concerned with sending out the
datagrams into a new udp_splice_sendfrom() helper. This will make later
cleanups easier.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
udp.c | 89 +++++++++++++++++++++++++++++++++++++----------------------
1 file changed, 56 insertions(+), 33 deletions(-)
diff --git a/udp.c b/udp.c
index b7c71c2..2311e7d 100644
--- a/udp.c
+++ b/udp.c
@@ -511,41 +511,29 @@ static in_port_t sa_port(bool v6, const void *sa)
}
/**
- * udp_sock_handler_splice() - Handler for socket mapped to "spliced" connection
+ * udp_splice_sendfrom() - Send datagrams from given port to given port
* @c: Execution context
- * @ref: epoll reference
- * @events: epoll events bitmap
- * @now: Current timestamp
+ * @mmh: Message descriptors to send
+ * @n: Number of datagrams to send
+ * @src: Datagrams will be sent from this port (on origin side)
+ * @dst: Datagrams will be send to this port (on destination side)
+ * @v6: Send as IPv6?
+ * @from_ns: If true send from pasta ns to init, otherwise reverse
+ * @allow_new: If true create sending socket if needed, if false discard
+ * if no sending socket is available
+ * @now: Timestamp
*/
-static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
- uint32_t events, const struct timespec *now)
+static void udp_splice_sendfrom(const struct ctx *c, struct mmsghdr *mmh, int n,
+ in_port_t src, in_port_t dst,
+ bool v6, bool from_ns, bool allow_new,
+ const struct timespec *now)
{
- in_port_t src, dst = ref.r.p.udp.udp.port;
- int s, v6 = ref.r.p.udp.udp.v6, n, i;
- struct mmsghdr *mmh_recv, *mmh_send;
-
- if (!(events & EPOLLIN))
- return;
-
- if (v6) {
- mmh_recv = udp6_l2_mh_sock;
- mmh_send = udp6_mh_splice;
- } else {
- mmh_recv = udp4_l2_mh_sock;
- mmh_send = udp4_mh_splice;
- }
-
- n = recvmmsg(ref.r.s, mmh_recv, UDP_MAX_FRAMES, 0, NULL);
-
- if (n <= 0)
- return;
-
- src = sa_port(v6, mmh_recv[0].msg_hdr.msg_name);
+ int s;
- if (ref.r.p.udp.udp.ns) {
+ if (from_ns) {
src += c->udp.fwd_in.rdelta[src];
s = udp_splice_init[v6][src].sock;
- if (!s && ref.r.p.udp.udp.orig)
+ if (!s && allow_new)
s = udp_splice_new(c, v6, src, false);
if (s < 0)
@@ -556,7 +544,7 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
} else {
src += c->udp.fwd_out.rdelta[src];
s = udp_splice_ns[v6][src].sock;
- if (!s && ref.r.p.udp.udp.orig) {
+ if (!s && allow_new) {
struct udp_splice_new_ns_arg arg = {
c, v6, src, -1,
};
@@ -571,8 +559,38 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
udp_splice_ns[v6][src].ts = now->tv_sec;
}
- for (i = 0; i < n; i++)
- mmh_send[i].msg_hdr.msg_iov->iov_len = mmh_recv[i].msg_len;
+ sendmmsg(s, mmh, n, MSG_NOSIGNAL);
+}
+
+/**
+ * udp_sock_handler_splice() - Handler for socket mapped to "spliced" connection
+ * @c: Execution context
+ * @ref: epoll reference
+ * @events: epoll events bitmap
+ * @now: Current timestamp
+ */
+static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
+ uint32_t events, const struct timespec *now)
+{
+ in_port_t src, dst = ref.r.p.udp.udp.port;
+ struct mmsghdr *mmh_recv, *mmh_send;
+ int v6 = ref.r.p.udp.udp.v6, n, i;
+
+ if (!(events & EPOLLIN))
+ return;
+
+ if (v6) {
+ mmh_recv = udp6_l2_mh_sock;
+ mmh_send = udp6_mh_splice;
+ } else {
+ mmh_recv = udp4_l2_mh_sock;
+ mmh_send = udp4_mh_splice;
+ }
+
+ n = recvmmsg(ref.r.s, mmh_recv, UDP_MAX_FRAMES, 0, NULL);
+
+ if (n <= 0)
+ return;
if (v6) {
*((struct sockaddr_in6 *)&udp_splice_namebuf) =
@@ -592,7 +610,12 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
});
}
- sendmmsg(s, mmh_send, n, MSG_NOSIGNAL);
+ for (i = 0; i < n; i++)
+ mmh_send[i].msg_hdr.msg_iov->iov_len = mmh_recv[i].msg_len;
+
+ src = sa_port(v6, mmh_recv[0].msg_hdr.msg_name);
+ udp_splice_sendfrom(c, mmh_send, n, src, ref.r.p.udp.udp.port,
+ v6, ref.r.p.udp.udp.ns, ref.r.p.udp.udp.orig, now);
}
/**
--
@@ -511,41 +511,29 @@ static in_port_t sa_port(bool v6, const void *sa)
}
/**
- * udp_sock_handler_splice() - Handler for socket mapped to "spliced" connection
+ * udp_splice_sendfrom() - Send datagrams from given port to given port
* @c: Execution context
- * @ref: epoll reference
- * @events: epoll events bitmap
- * @now: Current timestamp
+ * @mmh: Message descriptors to send
+ * @n: Number of datagrams to send
+ * @src: Datagrams will be sent from this port (on origin side)
+ * @dst: Datagrams will be send to this port (on destination side)
+ * @v6: Send as IPv6?
+ * @from_ns: If true send from pasta ns to init, otherwise reverse
+ * @allow_new: If true create sending socket if needed, if false discard
+ * if no sending socket is available
+ * @now: Timestamp
*/
-static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
- uint32_t events, const struct timespec *now)
+static void udp_splice_sendfrom(const struct ctx *c, struct mmsghdr *mmh, int n,
+ in_port_t src, in_port_t dst,
+ bool v6, bool from_ns, bool allow_new,
+ const struct timespec *now)
{
- in_port_t src, dst = ref.r.p.udp.udp.port;
- int s, v6 = ref.r.p.udp.udp.v6, n, i;
- struct mmsghdr *mmh_recv, *mmh_send;
-
- if (!(events & EPOLLIN))
- return;
-
- if (v6) {
- mmh_recv = udp6_l2_mh_sock;
- mmh_send = udp6_mh_splice;
- } else {
- mmh_recv = udp4_l2_mh_sock;
- mmh_send = udp4_mh_splice;
- }
-
- n = recvmmsg(ref.r.s, mmh_recv, UDP_MAX_FRAMES, 0, NULL);
-
- if (n <= 0)
- return;
-
- src = sa_port(v6, mmh_recv[0].msg_hdr.msg_name);
+ int s;
- if (ref.r.p.udp.udp.ns) {
+ if (from_ns) {
src += c->udp.fwd_in.rdelta[src];
s = udp_splice_init[v6][src].sock;
- if (!s && ref.r.p.udp.udp.orig)
+ if (!s && allow_new)
s = udp_splice_new(c, v6, src, false);
if (s < 0)
@@ -556,7 +544,7 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
} else {
src += c->udp.fwd_out.rdelta[src];
s = udp_splice_ns[v6][src].sock;
- if (!s && ref.r.p.udp.udp.orig) {
+ if (!s && allow_new) {
struct udp_splice_new_ns_arg arg = {
c, v6, src, -1,
};
@@ -571,8 +559,38 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
udp_splice_ns[v6][src].ts = now->tv_sec;
}
- for (i = 0; i < n; i++)
- mmh_send[i].msg_hdr.msg_iov->iov_len = mmh_recv[i].msg_len;
+ sendmmsg(s, mmh, n, MSG_NOSIGNAL);
+}
+
+/**
+ * udp_sock_handler_splice() - Handler for socket mapped to "spliced" connection
+ * @c: Execution context
+ * @ref: epoll reference
+ * @events: epoll events bitmap
+ * @now: Current timestamp
+ */
+static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
+ uint32_t events, const struct timespec *now)
+{
+ in_port_t src, dst = ref.r.p.udp.udp.port;
+ struct mmsghdr *mmh_recv, *mmh_send;
+ int v6 = ref.r.p.udp.udp.v6, n, i;
+
+ if (!(events & EPOLLIN))
+ return;
+
+ if (v6) {
+ mmh_recv = udp6_l2_mh_sock;
+ mmh_send = udp6_mh_splice;
+ } else {
+ mmh_recv = udp4_l2_mh_sock;
+ mmh_send = udp4_mh_splice;
+ }
+
+ n = recvmmsg(ref.r.s, mmh_recv, UDP_MAX_FRAMES, 0, NULL);
+
+ if (n <= 0)
+ return;
if (v6) {
*((struct sockaddr_in6 *)&udp_splice_namebuf) =
@@ -592,7 +610,12 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
});
}
- sendmmsg(s, mmh_send, n, MSG_NOSIGNAL);
+ for (i = 0; i < n; i++)
+ mmh_send[i].msg_hdr.msg_iov->iov_len = mmh_recv[i].msg_len;
+
+ src = sa_port(v6, mmh_recv[0].msg_hdr.msg_name);
+ udp_splice_sendfrom(c, mmh_send, n, src, ref.r.p.udp.udp.port,
+ v6, ref.r.p.udp.udp.ns, ref.r.p.udp.udp.orig, now);
}
/**
--
2.38.1
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v2 16/16] udp: Correct splice forwarding when receiving from multiple sources
2022-11-24 1:16 [PATCH v2 00/16] Simplify and correct handling of "spliced" UDP forwarding David Gibson
` (14 preceding siblings ...)
2022-11-24 1:16 ` [PATCH v2 15/16] udp: Split send half of udp_sock_handler_splice() from the receive half David Gibson
@ 2022-11-24 1:16 ` David Gibson
2022-11-29 5:55 ` David Gibson
15 siblings, 1 reply; 29+ messages in thread
From: David Gibson @ 2022-11-24 1:16 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
udp_sock_handler_splice() reads a whole batch of datagrams at once with
recvmmsg(). It then forwards them all via a single socket on the other
side, based on the source port.
However, it's entirely possible that the datagrams in the set have
different source ports, and thus ought to be forwarded via different
sockets on the destination side. In fact this situation arises with the
iperf -P4 throughput tests in our own test suite. AFAICT we only get away
with this because iperf3 is strictly one way and doesn't send reply packets
which would be misdirected because of the incorrect source ports.
Alter udp_sock_handler_splice() to split the packets it receives into
batches with the same source address and send each batch with a separate
sendmmsg().
For now we only look for already contiguous batches, which means that if
there are multiple active flows interleaved this is likely to degenerate
to batches of size 1. For now this is the simplest way to correct the
behaviour and we can try to optimize later.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
udp.c | 27 +++++++++++++++++++--------
1 file changed, 19 insertions(+), 8 deletions(-)
diff --git a/udp.c b/udp.c
index 2311e7d..ee5c2c5 100644
--- a/udp.c
+++ b/udp.c
@@ -572,9 +572,9 @@ static void udp_splice_sendfrom(const struct ctx *c, struct mmsghdr *mmh, int n,
static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
uint32_t events, const struct timespec *now)
{
- in_port_t src, dst = ref.r.p.udp.udp.port;
+ in_port_t dst = ref.r.p.udp.udp.port;
+ int v6 = ref.r.p.udp.udp.v6, n, i, m;
struct mmsghdr *mmh_recv, *mmh_send;
- int v6 = ref.r.p.udp.udp.v6, n, i;
if (!(events & EPOLLIN))
return;
@@ -610,12 +610,23 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
});
}
- for (i = 0; i < n; i++)
- mmh_send[i].msg_hdr.msg_iov->iov_len = mmh_recv[i].msg_len;
-
- src = sa_port(v6, mmh_recv[0].msg_hdr.msg_name);
- udp_splice_sendfrom(c, mmh_send, n, src, ref.r.p.udp.udp.port,
- v6, ref.r.p.udp.udp.ns, ref.r.p.udp.udp.orig, now);
+ for (i = 0; i < n; i += m) {
+ const struct mmsghdr *mmh = &mmh_recv[i];
+ in_port_t src = sa_port(v6, mmh->msg_hdr.msg_name);
+
+ m = 0;
+ do {
+ mmh_send[i + m].msg_hdr.msg_iov->iov_len = mmh->msg_len;
+ mmh++;
+ m++;
+ } while (sa_port(v6, mmh->msg_hdr.msg_name) == src);
+
+ udp_splice_sendfrom(c, mmh_send + i, m,
+ src, ref.r.p.udp.udp.port,
+ v6, ref.r.p.udp.udp.ns,
+ ref.r.p.udp.udp.orig,
+ now);
+ }
}
/**
--
@@ -572,9 +572,9 @@ static void udp_splice_sendfrom(const struct ctx *c, struct mmsghdr *mmh, int n,
static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
uint32_t events, const struct timespec *now)
{
- in_port_t src, dst = ref.r.p.udp.udp.port;
+ in_port_t dst = ref.r.p.udp.udp.port;
+ int v6 = ref.r.p.udp.udp.v6, n, i, m;
struct mmsghdr *mmh_recv, *mmh_send;
- int v6 = ref.r.p.udp.udp.v6, n, i;
if (!(events & EPOLLIN))
return;
@@ -610,12 +610,23 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
});
}
- for (i = 0; i < n; i++)
- mmh_send[i].msg_hdr.msg_iov->iov_len = mmh_recv[i].msg_len;
-
- src = sa_port(v6, mmh_recv[0].msg_hdr.msg_name);
- udp_splice_sendfrom(c, mmh_send, n, src, ref.r.p.udp.udp.port,
- v6, ref.r.p.udp.udp.ns, ref.r.p.udp.udp.orig, now);
+ for (i = 0; i < n; i += m) {
+ const struct mmsghdr *mmh = &mmh_recv[i];
+ in_port_t src = sa_port(v6, mmh->msg_hdr.msg_name);
+
+ m = 0;
+ do {
+ mmh_send[i + m].msg_hdr.msg_iov->iov_len = mmh->msg_len;
+ mmh++;
+ m++;
+ } while (sa_port(v6, mmh->msg_hdr.msg_name) == src);
+
+ udp_splice_sendfrom(c, mmh_send + i, m,
+ src, ref.r.p.udp.udp.port,
+ v6, ref.r.p.udp.udp.ns,
+ ref.r.p.udp.udp.orig,
+ now);
+ }
}
/**
--
2.38.1
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH v2 16/16] udp: Correct splice forwarding when receiving from multiple sources
2022-11-24 1:16 ` [PATCH v2 16/16] udp: Correct splice forwarding when receiving from multiple sources David Gibson
@ 2022-11-29 5:55 ` David Gibson
0 siblings, 0 replies; 29+ messages in thread
From: David Gibson @ 2022-11-29 5:55 UTC (permalink / raw)
To: passt-dev, Stefano Brivio
[-- Attachment #1: Type: text/plain, Size: 3008 bytes --]
On Thu, Nov 24, 2022 at 12:16:59PM +1100, David Gibson wrote:
> udp_sock_handler_splice() reads a whole batch of datagrams at once with
> recvmmsg(). It then forwards them all via a single socket on the other
> side, based on the source port.
>
> However, it's entirely possible that the datagrams in the set have
> different source ports, and thus ought to be forwarded via different
> sockets on the destination side. In fact this situation arises with the
> iperf -P4 throughput tests in our own test suite. AFAICT we only get away
> with this because iperf3 is strictly one way and doesn't send reply packets
> which would be misdirected because of the incorrect source ports.
>
> Alter udp_sock_handler_splice() to split the packets it receives into
> batches with the same source address and send each batch with a separate
> sendmmsg().
>
> For now we only look for already contiguous batches, which means that if
> there are multiple active flows interleaved this is likely to degenerate
> to batches of size 1. For now this is the simplest way to correct the
> behaviour and we can try to optimize later.
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This has a nasty bug, do not apply. I'll send a v3 shortly.
Specifically:
> ---
> udp.c | 27 +++++++++++++++++++--------
> 1 file changed, 19 insertions(+), 8 deletions(-)
>
> diff --git a/udp.c b/udp.c
> index 2311e7d..ee5c2c5 100644
> --- a/udp.c
> +++ b/udp.c
> @@ -572,9 +572,9 @@ static void udp_splice_sendfrom(const struct ctx *c, struct mmsghdr *mmh, int n,
> static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
> uint32_t events, const struct timespec *now)
> {
> - in_port_t src, dst = ref.r.p.udp.udp.port;
> + in_port_t dst = ref.r.p.udp.udp.port;
> + int v6 = ref.r.p.udp.udp.v6, n, i, m;
> struct mmsghdr *mmh_recv, *mmh_send;
> - int v6 = ref.r.p.udp.udp.v6, n, i;
>
> if (!(events & EPOLLIN))
> return;
> @@ -610,12 +610,23 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
> });
> }
>
> - for (i = 0; i < n; i++)
> - mmh_send[i].msg_hdr.msg_iov->iov_len = mmh_recv[i].msg_len;
> -
> - src = sa_port(v6, mmh_recv[0].msg_hdr.msg_name);
> - udp_splice_sendfrom(c, mmh_send, n, src, ref.r.p.udp.udp.port,
> - v6, ref.r.p.udp.udp.ns, ref.r.p.udp.udp.orig, now);
> + for (i = 0; i < n; i += m) {
> + const struct mmsghdr *mmh = &mmh_recv[i];
> + in_port_t src = sa_port(v6, mmh->msg_hdr.msg_name);
> +
> + m = 0;
> + do {
> + mmh_send[i + m].msg_hdr.msg_iov->iov_len = mmh->msg_len;
> + mmh++;
> + m++;
> + } while (sa_port(v6, mmh->msg_hdr.msg_name) == src);
I missed a bounds check here, so this inner loop can overrun the mmh array.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 29+ messages in thread