* [PATCH 1/3] tcp, tcp_splice: Adjust comments to current meaning of inbound and outbound
2022-10-10 23:33 [PATCH 0/3] Fixes for spliced connections Stefano Brivio
@ 2022-10-10 23:33 ` Stefano Brivio
2022-10-11 0:57 ` David Gibson
2022-10-10 23:33 ` [PATCH 2/3] tcp, tcp_splice: Fix port remapping for inbound, spliced connections Stefano Brivio
2022-10-10 23:33 ` [PATCH 3/3] tcp: Don't create 'tap' socket for ports that are bound to loopback only Stefano Brivio
2 siblings, 1 reply; 7+ messages in thread
From: Stefano Brivio @ 2022-10-10 23:33 UTC (permalink / raw)
To: passt-dev; +Cc: David Gibson
For tcp_sock_init_ns(), "inbound" connections used to be the ones
being established toward any listening socket we create, as opposed
to sockets we connect().
Similarly, tcp_splice_new() used to handle "inbound" connections in
the sense that they originated from listening sockets, and they would
in turn cause a connect() on an "outbound" socket.
Since commit 1128fa03fe73 ("Improve types and names for port
forwarding configuration"), though, inbound connections are more
broadly defined as the ones directed to guest or namepsace, and
outbound the ones originating from there.
Update comments for those two functions.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
tcp.c | 2 +-
tcp_splice.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/tcp.c b/tcp.c
index 7e82589..63153b6 100644
--- a/tcp.c
+++ b/tcp.c
@@ -3178,7 +3178,7 @@ void tcp_sock_init(const struct ctx *c, int ns, sa_family_t af,
}
/**
- * tcp_sock_init_ns() - Bind sockets in namespace for inbound connections
+ * tcp_sock_init_ns() - Bind sockets in namespace for outbound connections
* @arg: Execution context
*
* Return: 0
diff --git a/tcp_splice.c b/tcp_splice.c
index 4a015d0..96c31c8 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -502,7 +502,7 @@ static int tcp_splice_connect_ns(void *arg)
}
/**
- * tcp_splice_new() - Handle new inbound, spliced connection
+ * tcp_splice_new() - Handle new spliced connection
* @c: Execution context
* @conn: Connection pointer
* @port: Destination port, host order
--
@@ -502,7 +502,7 @@ static int tcp_splice_connect_ns(void *arg)
}
/**
- * tcp_splice_new() - Handle new inbound, spliced connection
+ * tcp_splice_new() - Handle new spliced connection
* @c: Execution context
* @conn: Connection pointer
* @port: Destination port, host order
--
2.35.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 1/3] tcp, tcp_splice: Adjust comments to current meaning of inbound and outbound
2022-10-10 23:33 ` [PATCH 1/3] tcp, tcp_splice: Adjust comments to current meaning of inbound and outbound Stefano Brivio
@ 2022-10-11 0:57 ` David Gibson
0 siblings, 0 replies; 7+ messages in thread
From: David Gibson @ 2022-10-11 0:57 UTC (permalink / raw)
To: Stefano Brivio; +Cc: passt-dev
[-- Attachment #1: Type: text/plain, Size: 2005 bytes --]
On Tue, Oct 11, 2022 at 01:33:48AM +0200, Stefano Brivio wrote:
> For tcp_sock_init_ns(), "inbound" connections used to be the ones
> being established toward any listening socket we create, as opposed
> to sockets we connect().
>
> Similarly, tcp_splice_new() used to handle "inbound" connections in
> the sense that they originated from listening sockets, and they would
> in turn cause a connect() on an "outbound" socket.
>
> Since commit 1128fa03fe73 ("Improve types and names for port
> forwarding configuration"), though, inbound connections are more
> broadly defined as the ones directed to guest or namepsace, and
> outbound the ones originating from there.
>
> Update comments for those two functions.
>
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> tcp.c | 2 +-
> tcp_splice.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tcp.c b/tcp.c
> index 7e82589..63153b6 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -3178,7 +3178,7 @@ void tcp_sock_init(const struct ctx *c, int ns, sa_family_t af,
> }
>
> /**
> - * tcp_sock_init_ns() - Bind sockets in namespace for inbound connections
> + * tcp_sock_init_ns() - Bind sockets in namespace for outbound connections
> * @arg: Execution context
> *
> * Return: 0
> diff --git a/tcp_splice.c b/tcp_splice.c
> index 4a015d0..96c31c8 100644
> --- a/tcp_splice.c
> +++ b/tcp_splice.c
> @@ -502,7 +502,7 @@ static int tcp_splice_connect_ns(void *arg)
> }
>
> /**
> - * tcp_splice_new() - Handle new inbound, spliced connection
> + * tcp_splice_new() - Handle new spliced connection
> * @c: Execution context
> * @conn: Connection pointer
> * @port: Destination port, host order
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 2/3] tcp, tcp_splice: Fix port remapping for inbound, spliced connections
2022-10-10 23:33 [PATCH 0/3] Fixes for spliced connections Stefano Brivio
2022-10-10 23:33 ` [PATCH 1/3] tcp, tcp_splice: Adjust comments to current meaning of inbound and outbound Stefano Brivio
@ 2022-10-10 23:33 ` Stefano Brivio
2022-10-11 1:19 ` David Gibson
2022-10-10 23:33 ` [PATCH 3/3] tcp: Don't create 'tap' socket for ports that are bound to loopback only Stefano Brivio
2 siblings, 1 reply; 7+ messages in thread
From: Stefano Brivio @ 2022-10-10 23:33 UTC (permalink / raw)
To: passt-dev; +Cc: David Gibson
It's indeed convenient to use the final destination port in the epoll
reference data, so that we don't have to check the configured port
deltas before establishing connections. However, this doesn't work if
we need to access the original port information to know what to do
once we receive a connection.
The existing implementation resulted in inbound, spliced connections
with a remapped destination port (and only those) to be attempted on
the outbound side, as we would check the wrong position in port
bitmaps to establish whether to use inbound or outbound sockets.
For listening spliced sockets, set the port index in the epoll
reference to the original port. Once we get a connection, use the
port delta arrays to find the final destination port, and the
original destination port to decide what socket pool we should use.
This avoids the need for a reverse mapping table as implemented for
UDP.
Fixes: 33482d5bf293 ("passt: Add PASTA mode, major rework")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
tcp.c | 19 ++++++++-----------
tcp_splice.c | 18 ++++++++++++++----
2 files changed, 22 insertions(+), 15 deletions(-)
diff --git a/tcp.c b/tcp.c
index 63153b6..cc08353 100644
--- a/tcp.c
+++ b/tcp.c
@@ -3084,15 +3084,16 @@ void tcp_sock_handler(struct ctx *c, union epoll_ref ref, uint32_t events,
void tcp_sock_init(const struct ctx *c, int ns, sa_family_t af,
const void *addr, const char *ifname, in_port_t port)
{
+ union tcp_epoll_ref tref_spliced = { .tcp.listen = 1, .tcp.splice = 1 };
union tcp_epoll_ref tref = { .tcp.listen = 1 };
const void *bind_addr;
int s;
- if (ns) {
+ if (ns)
tref.tcp.index = (in_port_t)(port + c->tcp.fwd_out.delta[port]);
- } else {
+ else
tref.tcp.index = (in_port_t)(port + c->tcp.fwd_in.delta[port]);
- }
+ tref_spliced.tcp.index = (in_port_t) port;
if (af == AF_INET || af == AF_UNSPEC) {
if (!addr && c->mode == MODE_PASTA)
@@ -3100,8 +3101,7 @@ void tcp_sock_init(const struct ctx *c, int ns, sa_family_t af,
else
bind_addr = addr;
- tref.tcp.v6 = 0;
- tref.tcp.splice = 0;
+ tref.tcp.v6 = tref_spliced.tcp.v6 = 0;
if (!ns) {
s = sock_l4(c, AF_INET, IPPROTO_TCP, bind_addr, ifname,
@@ -3118,9 +3118,8 @@ void tcp_sock_init(const struct ctx *c, int ns, sa_family_t af,
if (c->mode == MODE_PASTA) {
bind_addr = &(uint32_t){ htonl(INADDR_LOOPBACK) };
- tref.tcp.splice = 1;
s = sock_l4(c, AF_INET, IPPROTO_TCP, bind_addr, ifname,
- port, tref.u32);
+ port, tref_spliced.u32);
if (s >= 0)
tcp_sock_set_bufsize(c, s);
else
@@ -3141,9 +3140,8 @@ void tcp_sock_init(const struct ctx *c, int ns, sa_family_t af,
else
bind_addr = addr;
- tref.tcp.v6 = 1;
+ tref.tcp.v6 = tref_spliced.tcp.v6 = 1;
- tref.tcp.splice = 0;
if (!ns) {
s = sock_l4(c, AF_INET6, IPPROTO_TCP, bind_addr, ifname,
port, tref.u32);
@@ -3159,9 +3157,8 @@ void tcp_sock_init(const struct ctx *c, int ns, sa_family_t af,
if (c->mode == MODE_PASTA) {
bind_addr = &in6addr_loopback;
- tref.tcp.splice = 1;
s = sock_l4(c, AF_INET6, IPPROTO_TCP, bind_addr, ifname,
- port, tref.u32);
+ port, tref_spliced.u32);
if (s >= 0)
tcp_sock_set_bufsize(c, s);
else
diff --git a/tcp_splice.c b/tcp_splice.c
index 96c31c8..bf5f62c 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -35,6 +35,7 @@
#include <fcntl.h>
#include <limits.h>
#include <stdint.h>
+#include <stdbool.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
@@ -512,13 +513,18 @@ static int tcp_splice_connect_ns(void *arg)
static int tcp_splice_new(const struct ctx *c, struct tcp_splice_conn *conn,
in_port_t port)
{
- struct tcp_splice_connect_ns_arg ns_arg = { c, conn, port, 0 };
int *p, i, s = -1;
+ bool inbound;
- if (bitmap_isset(c->tcp.fwd_in.map, port))
+ if (bitmap_isset(c->tcp.fwd_in.map, port)) {
p = CONN_V6(conn) ? ns_sock_pool6 : ns_sock_pool4;
- else
+ port += c->tcp.fwd_in.delta[port];
+ inbound = true;
+ } else {
p = CONN_V6(conn) ? init_sock_pool6 : init_sock_pool4;
+ port += c->tcp.fwd_out.delta[port];
+ inbound = false;
+ }
for (i = 0; i < TCP_SOCK_POOL_SIZE; i++, p++) {
SWAP(s, *p);
@@ -526,11 +532,15 @@ static int tcp_splice_new(const struct ctx *c, struct tcp_splice_conn *conn,
break;
}
- if (s < 0 && bitmap_isset(c->tcp.fwd_in.map, port)) {
+ /* No socket available in namespace: create a new one for connect() */
+ if (s < 0 && inbound) {
+ struct tcp_splice_connect_ns_arg ns_arg = { c, conn, port, 0 };
+
NS_CALL(tcp_splice_connect_ns, &ns_arg);
return ns_arg.ret;
}
+ /* Otherwise, the socket will connect on the side it was created on */
return tcp_splice_connect(c, conn, s, port);
}
--
@@ -35,6 +35,7 @@
#include <fcntl.h>
#include <limits.h>
#include <stdint.h>
+#include <stdbool.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
@@ -512,13 +513,18 @@ static int tcp_splice_connect_ns(void *arg)
static int tcp_splice_new(const struct ctx *c, struct tcp_splice_conn *conn,
in_port_t port)
{
- struct tcp_splice_connect_ns_arg ns_arg = { c, conn, port, 0 };
int *p, i, s = -1;
+ bool inbound;
- if (bitmap_isset(c->tcp.fwd_in.map, port))
+ if (bitmap_isset(c->tcp.fwd_in.map, port)) {
p = CONN_V6(conn) ? ns_sock_pool6 : ns_sock_pool4;
- else
+ port += c->tcp.fwd_in.delta[port];
+ inbound = true;
+ } else {
p = CONN_V6(conn) ? init_sock_pool6 : init_sock_pool4;
+ port += c->tcp.fwd_out.delta[port];
+ inbound = false;
+ }
for (i = 0; i < TCP_SOCK_POOL_SIZE; i++, p++) {
SWAP(s, *p);
@@ -526,11 +532,15 @@ static int tcp_splice_new(const struct ctx *c, struct tcp_splice_conn *conn,
break;
}
- if (s < 0 && bitmap_isset(c->tcp.fwd_in.map, port)) {
+ /* No socket available in namespace: create a new one for connect() */
+ if (s < 0 && inbound) {
+ struct tcp_splice_connect_ns_arg ns_arg = { c, conn, port, 0 };
+
NS_CALL(tcp_splice_connect_ns, &ns_arg);
return ns_arg.ret;
}
+ /* Otherwise, the socket will connect on the side it was created on */
return tcp_splice_connect(c, conn, s, port);
}
--
2.35.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 2/3] tcp, tcp_splice: Fix port remapping for inbound, spliced connections
2022-10-10 23:33 ` [PATCH 2/3] tcp, tcp_splice: Fix port remapping for inbound, spliced connections Stefano Brivio
@ 2022-10-11 1:19 ` David Gibson
2022-10-11 7:42 ` Stefano Brivio
0 siblings, 1 reply; 7+ messages in thread
From: David Gibson @ 2022-10-11 1:19 UTC (permalink / raw)
To: Stefano Brivio; +Cc: passt-dev
[-- Attachment #1: Type: text/plain, Size: 5928 bytes --]
On Tue, Oct 11, 2022 at 01:33:49AM +0200, Stefano Brivio wrote:
> It's indeed convenient to use the final destination port in the epoll
> reference data, so that we don't have to check the configured port
> deltas before establishing connections. However, this doesn't work if
> we need to access the original port information to know what to do
> once we receive a connection.
>
> The existing implementation resulted in inbound, spliced connections
> with a remapped destination port (and only those) to be attempted on
> the outbound side, as we would check the wrong position in port
> bitmaps to establish whether to use inbound or outbound sockets.
>
> For listening spliced sockets, set the port index in the epoll
> reference to the original port. Once we get a connection, use the
> port delta arrays to find the final destination port, and the
> original destination port to decide what socket pool we should use.
>
> This avoids the need for a reverse mapping table as implemented for
> UDP.
>
> Fixes: 33482d5bf293 ("passt: Add PASTA mode, major rework")
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> ---
> tcp.c | 19 ++++++++-----------
> tcp_splice.c | 18 ++++++++++++++----
> 2 files changed, 22 insertions(+), 15 deletions(-)
>
> diff --git a/tcp.c b/tcp.c
> index 63153b6..cc08353 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -3084,15 +3084,16 @@ void tcp_sock_handler(struct ctx *c, union epoll_ref ref, uint32_t events,
> void tcp_sock_init(const struct ctx *c, int ns, sa_family_t af,
> const void *addr, const char *ifname, in_port_t port)
> {
> + union tcp_epoll_ref tref_spliced = { .tcp.listen = 1, .tcp.splice = 1 };
> union tcp_epoll_ref tref = { .tcp.listen = 1 };
> const void *bind_addr;
> int s;
>
> - if (ns) {
> + if (ns)
> tref.tcp.index = (in_port_t)(port + c->tcp.fwd_out.delta[port]);
> - } else {
> + else
> tref.tcp.index = (in_port_t)(port + c->tcp.fwd_in.delta[port]);
> - }
> + tref_spliced.tcp.index = (in_port_t) port;
>
> if (af == AF_INET || af == AF_UNSPEC) {
> if (!addr && c->mode == MODE_PASTA)
> @@ -3100,8 +3101,7 @@ void tcp_sock_init(const struct ctx *c, int ns, sa_family_t af,
> else
> bind_addr = addr;
>
> - tref.tcp.v6 = 0;
> - tref.tcp.splice = 0;
> + tref.tcp.v6 = tref_spliced.tcp.v6 = 0;
>
> if (!ns) {
> s = sock_l4(c, AF_INET, IPPROTO_TCP, bind_addr, ifname,
> @@ -3118,9 +3118,8 @@ void tcp_sock_init(const struct ctx *c, int ns, sa_family_t af,
> if (c->mode == MODE_PASTA) {
> bind_addr = &(uint32_t){ htonl(INADDR_LOOPBACK) };
>
> - tref.tcp.splice = 1;
> s = sock_l4(c, AF_INET, IPPROTO_TCP, bind_addr, ifname,
> - port, tref.u32);
> + port, tref_spliced.u32);
> if (s >= 0)
> tcp_sock_set_bufsize(c, s);
> else
> @@ -3141,9 +3140,8 @@ void tcp_sock_init(const struct ctx *c, int ns, sa_family_t af,
> else
> bind_addr = addr;
>
> - tref.tcp.v6 = 1;
> + tref.tcp.v6 = tref_spliced.tcp.v6 = 1;
>
> - tref.tcp.splice = 0;
> if (!ns) {
> s = sock_l4(c, AF_INET6, IPPROTO_TCP, bind_addr, ifname,
> port, tref.u32);
> @@ -3159,9 +3157,8 @@ void tcp_sock_init(const struct ctx *c, int ns, sa_family_t af,
> if (c->mode == MODE_PASTA) {
> bind_addr = &in6addr_loopback;
>
> - tref.tcp.splice = 1;
> s = sock_l4(c, AF_INET6, IPPROTO_TCP, bind_addr, ifname,
> - port, tref.u32);
> + port, tref_spliced.u32);
> if (s >= 0)
> tcp_sock_set_bufsize(c, s);
> else
> diff --git a/tcp_splice.c b/tcp_splice.c
> index 96c31c8..bf5f62c 100644
> --- a/tcp_splice.c
> +++ b/tcp_splice.c
> @@ -35,6 +35,7 @@
> #include <fcntl.h>
> #include <limits.h>
> #include <stdint.h>
> +#include <stdbool.h>
> #include <string.h>
> #include <time.h>
> #include <unistd.h>
> @@ -512,13 +513,18 @@ static int tcp_splice_connect_ns(void *arg)
> static int tcp_splice_new(const struct ctx *c, struct tcp_splice_conn *conn,
> in_port_t port)
> {
> - struct tcp_splice_connect_ns_arg ns_arg = { c, conn, port, 0 };
> int *p, i, s = -1;
> + bool inbound;
>
> - if (bitmap_isset(c->tcp.fwd_in.map, port))
> + if (bitmap_isset(c->tcp.fwd_in.map, port)) {
> p = CONN_V6(conn) ? ns_sock_pool6 : ns_sock_pool4;
> - else
> + port += c->tcp.fwd_in.delta[port];
> + inbound = true;
> + } else {
> p = CONN_V6(conn) ? init_sock_pool6 : init_sock_pool4;
> + port += c->tcp.fwd_out.delta[port];
> + inbound = false;
> + }
This smells wrong to me. AFAICT there's nothing inherently wrong with
forwarding inbound connections to port 5015 to port 5016 in the
namespace *and* forwarding outbound connections to port 5015 to port
5016 on the host.
So I think rather than consulting the port map here, each conn object
should know if its an inward or outward connection. To make that work
with epoll, we'd also need a bit in the ref which says whether it's a
socket listening on the host or in the ns.
>
> for (i = 0; i < TCP_SOCK_POOL_SIZE; i++, p++) {
> SWAP(s, *p);
> @@ -526,11 +532,15 @@ static int tcp_splice_new(const struct ctx *c, struct tcp_splice_conn *conn,
> break;
> }
>
> - if (s < 0 && bitmap_isset(c->tcp.fwd_in.map, port)) {
> + /* No socket available in namespace: create a new one for connect() */
> + if (s < 0 && inbound) {
> + struct tcp_splice_connect_ns_arg ns_arg = { c, conn, port, 0 };
> +
> NS_CALL(tcp_splice_connect_ns, &ns_arg);
> return ns_arg.ret;
> }
>
> + /* Otherwise, the socket will connect on the side it was created on */
> return tcp_splice_connect(c, conn, s, port);
> }
>
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/3] tcp, tcp_splice: Fix port remapping for inbound, spliced connections
2022-10-11 1:19 ` David Gibson
@ 2022-10-11 7:42 ` Stefano Brivio
0 siblings, 0 replies; 7+ messages in thread
From: Stefano Brivio @ 2022-10-11 7:42 UTC (permalink / raw)
To: David Gibson; +Cc: passt-dev
On Tue, 11 Oct 2022 12:19:49 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:
> On Tue, Oct 11, 2022 at 01:33:49AM +0200, Stefano Brivio wrote:
> > It's indeed convenient to use the final destination port in the epoll
> > reference data, so that we don't have to check the configured port
> > deltas before establishing connections. However, this doesn't work if
> > we need to access the original port information to know what to do
> > once we receive a connection.
> >
> > The existing implementation resulted in inbound, spliced connections
> > with a remapped destination port (and only those) to be attempted on
> > the outbound side, as we would check the wrong position in port
> > bitmaps to establish whether to use inbound or outbound sockets.
> >
> > For listening spliced sockets, set the port index in the epoll
> > reference to the original port. Once we get a connection, use the
> > port delta arrays to find the final destination port, and the
> > original destination port to decide what socket pool we should use.
> >
> > This avoids the need for a reverse mapping table as implemented for
> > UDP.
> >
> > Fixes: 33482d5bf293 ("passt: Add PASTA mode, major rework")
> > Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> > ---
> > tcp.c | 19 ++++++++-----------
> > tcp_splice.c | 18 ++++++++++++++----
> > 2 files changed, 22 insertions(+), 15 deletions(-)
> >
> > diff --git a/tcp.c b/tcp.c
> > index 63153b6..cc08353 100644
> > --- a/tcp.c
> > +++ b/tcp.c
> > @@ -3084,15 +3084,16 @@ void tcp_sock_handler(struct ctx *c, union epoll_ref ref, uint32_t events,
> > void tcp_sock_init(const struct ctx *c, int ns, sa_family_t af,
> > const void *addr, const char *ifname, in_port_t port)
> > {
> > + union tcp_epoll_ref tref_spliced = { .tcp.listen = 1, .tcp.splice = 1 };
> > union tcp_epoll_ref tref = { .tcp.listen = 1 };
> > const void *bind_addr;
> > int s;
> >
> > - if (ns) {
> > + if (ns)
> > tref.tcp.index = (in_port_t)(port + c->tcp.fwd_out.delta[port]);
> > - } else {
> > + else
> > tref.tcp.index = (in_port_t)(port + c->tcp.fwd_in.delta[port]);
> > - }
> > + tref_spliced.tcp.index = (in_port_t) port;
> >
> > if (af == AF_INET || af == AF_UNSPEC) {
> > if (!addr && c->mode == MODE_PASTA)
> > @@ -3100,8 +3101,7 @@ void tcp_sock_init(const struct ctx *c, int ns, sa_family_t af,
> > else
> > bind_addr = addr;
> >
> > - tref.tcp.v6 = 0;
> > - tref.tcp.splice = 0;
> > + tref.tcp.v6 = tref_spliced.tcp.v6 = 0;
> >
> > if (!ns) {
> > s = sock_l4(c, AF_INET, IPPROTO_TCP, bind_addr, ifname,
> > @@ -3118,9 +3118,8 @@ void tcp_sock_init(const struct ctx *c, int ns, sa_family_t af,
> > if (c->mode == MODE_PASTA) {
> > bind_addr = &(uint32_t){ htonl(INADDR_LOOPBACK) };
> >
> > - tref.tcp.splice = 1;
> > s = sock_l4(c, AF_INET, IPPROTO_TCP, bind_addr, ifname,
> > - port, tref.u32);
> > + port, tref_spliced.u32);
> > if (s >= 0)
> > tcp_sock_set_bufsize(c, s);
> > else
> > @@ -3141,9 +3140,8 @@ void tcp_sock_init(const struct ctx *c, int ns, sa_family_t af,
> > else
> > bind_addr = addr;
> >
> > - tref.tcp.v6 = 1;
> > + tref.tcp.v6 = tref_spliced.tcp.v6 = 1;
> >
> > - tref.tcp.splice = 0;
> > if (!ns) {
> > s = sock_l4(c, AF_INET6, IPPROTO_TCP, bind_addr, ifname,
> > port, tref.u32);
> > @@ -3159,9 +3157,8 @@ void tcp_sock_init(const struct ctx *c, int ns, sa_family_t af,
> > if (c->mode == MODE_PASTA) {
> > bind_addr = &in6addr_loopback;
> >
> > - tref.tcp.splice = 1;
> > s = sock_l4(c, AF_INET6, IPPROTO_TCP, bind_addr, ifname,
> > - port, tref.u32);
> > + port, tref_spliced.u32);
> > if (s >= 0)
> > tcp_sock_set_bufsize(c, s);
> > else
> > diff --git a/tcp_splice.c b/tcp_splice.c
> > index 96c31c8..bf5f62c 100644
> > --- a/tcp_splice.c
> > +++ b/tcp_splice.c
> > @@ -35,6 +35,7 @@
> > #include <fcntl.h>
> > #include <limits.h>
> > #include <stdint.h>
> > +#include <stdbool.h>
> > #include <string.h>
> > #include <time.h>
> > #include <unistd.h>
> > @@ -512,13 +513,18 @@ static int tcp_splice_connect_ns(void *arg)
> > static int tcp_splice_new(const struct ctx *c, struct tcp_splice_conn *conn,
> > in_port_t port)
> > {
> > - struct tcp_splice_connect_ns_arg ns_arg = { c, conn, port, 0 };
> > int *p, i, s = -1;
> > + bool inbound;
> >
> > - if (bitmap_isset(c->tcp.fwd_in.map, port))
> > + if (bitmap_isset(c->tcp.fwd_in.map, port)) {
> > p = CONN_V6(conn) ? ns_sock_pool6 : ns_sock_pool4;
> > - else
> > + port += c->tcp.fwd_in.delta[port];
> > + inbound = true;
> > + } else {
> > p = CONN_V6(conn) ? init_sock_pool6 : init_sock_pool4;
> > + port += c->tcp.fwd_out.delta[port];
> > + inbound = false;
> > + }
>
> This smells wrong to me. AFAICT there's nothing inherently wrong with
> forwarding inbound connections to port 5015 to port 5016 in the
> namespace *and* forwarding outbound connections to port 5015 to port
> 5016 on the host.
Hah, yes, right, I didn't think of that. I had originally implemented
this before port remapping, and this possibility didn't occur to me
later on.
> So I think rather than consulting the port map here, each conn object
> should know if its an inward or outward connection. To make that work
> with epoll, we'd also need a bit in the ref which says whether it's a
> socket listening on the host or in the ns.
Even if it weren't for the issue you described, that looks much cleaner
in any case, thanks for the tip, I'll do this.
We still have 8 bits left there. And this way we could also keep the
reference format consistent across spliced and non-spliced.
--
Stefano
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 3/3] tcp: Don't create 'tap' socket for ports that are bound to loopback only
2022-10-10 23:33 [PATCH 0/3] Fixes for spliced connections Stefano Brivio
2022-10-10 23:33 ` [PATCH 1/3] tcp, tcp_splice: Adjust comments to current meaning of inbound and outbound Stefano Brivio
2022-10-10 23:33 ` [PATCH 2/3] tcp, tcp_splice: Fix port remapping for inbound, spliced connections Stefano Brivio
@ 2022-10-10 23:33 ` Stefano Brivio
2 siblings, 0 replies; 7+ messages in thread
From: Stefano Brivio @ 2022-10-10 23:33 UTC (permalink / raw)
To: passt-dev; +Cc: David Gibson
If the user specifies an explicit loopback address for a port
binding, we're going to use that address for the 'tap' socket, and
the same exact address for the 'spliced' socket (because those are,
by definition, only bound to loopback addresses).
This means that the second binding will fail, and, unexpectedly, the
port is forwarded, but via tap device, which means the source address
in the namespace won't be a loopback address.
Make it explicit under which conditions we're creating which kind of
socket, by refactoring tcp_sock_init() into two separate functions
for IPv4 and IPv6 and gathering those conditions at the beginning.
Also, don't create spliced sockets if the user specifies explicitly
a non-loopback address, those are harmless but not desired either.
Fixes: 3c6ae625101a ("conf, tcp, udp: Allow address specification for forwarded ports")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
tcp.c | 178 ++++++++++++++++++++++++++++++++++++---------------------
util.h | 3 +
2 files changed, 116 insertions(+), 65 deletions(-)
diff --git a/tcp.c b/tcp.c
index cc08353..0a769d5 100644
--- a/tcp.c
+++ b/tcp.c
@@ -275,6 +275,7 @@
#include <netinet/in.h>
#include <netinet/ip.h>
#include <stdint.h>
+#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/epoll.h>
@@ -3073,107 +3074,154 @@ void tcp_sock_handler(struct ctx *c, union epoll_ref ref, uint32_t events,
}
/**
- * tcp_sock_init() - Initialise listening sockets for a given port
+ * tcp_sock_init4() - Initialise listening sockets for a given IPv4 port
* @c: Execution context
* @ns: In pasta mode, if set, bind with loopback address in namespace
- * @af: Address family to select a specific IP version, or AF_UNSPEC
* @addr: Pointer to address for binding, NULL if not configured
* @ifname: Name of interface to bind to, NULL if not configured
* @port: Port, host order
*/
-void tcp_sock_init(const struct ctx *c, int ns, sa_family_t af,
- const void *addr, const char *ifname, in_port_t port)
+static void tcp_sock_init4(const struct ctx *c, int ns, const in_addr_t *addr,
+ const char *ifname, in_port_t port)
{
- union tcp_epoll_ref tref_spliced = { .tcp.listen = 1, .tcp.splice = 1 };
union tcp_epoll_ref tref = { .tcp.listen = 1 };
- const void *bind_addr;
+ bool spliced = false, tap = true;
int s;
+ if (c->mode == MODE_PASTA) {
+ spliced = !addr ||
+ ntohl(*addr) == INADDR_ANY ||
+ IPV4_IS_LOOPBACK(ntohl(*addr));
+
+ if (!addr)
+ addr = &c->ip4.addr;
+
+ tap = !ns && !IPV4_IS_LOOPBACK(ntohl(*addr));
+ }
+
if (ns)
tref.tcp.index = (in_port_t)(port + c->tcp.fwd_out.delta[port]);
else
tref.tcp.index = (in_port_t)(port + c->tcp.fwd_in.delta[port]);
- tref_spliced.tcp.index = (in_port_t) port;
- if (af == AF_INET || af == AF_UNSPEC) {
- if (!addr && c->mode == MODE_PASTA)
- bind_addr = &c->ip4.addr;
+ if (tap) {
+ s = sock_l4(c, AF_INET, IPPROTO_TCP, addr, ifname, port,
+ tref.u32);
+ if (s >= 0)
+ tcp_sock_set_bufsize(c, s);
else
- bind_addr = addr;
+ s = -1;
- tref.tcp.v6 = tref_spliced.tcp.v6 = 0;
+ if (c->tcp.fwd_in.mode == FWD_AUTO)
+ tcp_sock_init_ext[port][V4] = s;
+ }
- if (!ns) {
- s = sock_l4(c, AF_INET, IPPROTO_TCP, bind_addr, ifname,
- port, tref.u32);
- if (s >= 0)
- tcp_sock_set_bufsize(c, s);
- else
- s = -1;
+ if (spliced) {
+ tref.tcp.splice = 1;
- if (c->tcp.fwd_in.mode == FWD_AUTO)
- tcp_sock_init_ext[port][V4] = s;
- }
+ tref.tcp.index = port;
+ addr = &(uint32_t){ htonl(INADDR_LOOPBACK) };
- if (c->mode == MODE_PASTA) {
- bind_addr = &(uint32_t){ htonl(INADDR_LOOPBACK) };
+ s = sock_l4(c, AF_INET, IPPROTO_TCP, addr, ifname, port,
+ tref.u32);
+ if (s >= 0)
+ tcp_sock_set_bufsize(c, s);
+ else
+ s = -1;
- s = sock_l4(c, AF_INET, IPPROTO_TCP, bind_addr, ifname,
- port, tref_spliced.u32);
- if (s >= 0)
- tcp_sock_set_bufsize(c, s);
+ if (c->tcp.fwd_out.mode == FWD_AUTO) {
+ if (ns)
+ tcp_sock_ns[port][V4] = s;
else
- s = -1;
-
- if (c->tcp.fwd_out.mode == FWD_AUTO) {
- if (ns)
- tcp_sock_ns[port][V4] = s;
- else
- tcp_sock_init_lo[port][V4] = s;
- }
+ tcp_sock_init_lo[port][V4] = s;
}
}
+}
+
+/**
+ * tcp_sock_init6() - Initialise listening sockets for a given IPv6 port
+ * @c: Execution context
+ * @ns: In pasta mode, if set, bind with loopback address in namespace
+ * @addr: Pointer to address for binding, NULL if not configured
+ * @ifname: Name of interface to bind to, NULL if not configured
+ * @port: Port, host order
+ */
+static void tcp_sock_init6(const struct ctx *c, int ns,
+ const struct in6_addr *addr, const char *ifname,
+ in_port_t port)
+{
+ union tcp_epoll_ref tref = { .tcp.listen = 1, .tcp.v6 = 1 };
+ bool spliced = false, tap = true;
+ int s;
+
+ if (c->mode == MODE_PASTA) {
+ spliced = !addr ||
+ IN6_IS_ADDR_UNSPECIFIED(addr) ||
+ IN6_IS_ADDR_LOOPBACK(addr);
+
+ if (!addr)
+ addr = &c->ip6.addr;
+
+ tap = !ns && !IN6_IS_ADDR_LOOPBACK(addr);
+ }
+
+ if (ns)
+ tref.tcp.index = (in_port_t)(port + c->tcp.fwd_out.delta[port]);
+ else
+ tref.tcp.index = (in_port_t)(port + c->tcp.fwd_in.delta[port]);
- if (af == AF_INET6 || af == AF_UNSPEC) {
- if (!addr && c->mode == MODE_PASTA)
- bind_addr = &c->ip6.addr;
+ if (tap) {
+ s = sock_l4(c, AF_INET6, IPPROTO_TCP, addr, ifname, port,
+ tref.u32);
+ if (s >= 0)
+ tcp_sock_set_bufsize(c, s);
else
- bind_addr = addr;
+ s = -1;
- tref.tcp.v6 = tref_spliced.tcp.v6 = 1;
+ if (c->tcp.fwd_in.mode == FWD_AUTO)
+ tcp_sock_init_ext[port][V6] = s;
+ }
- if (!ns) {
- s = sock_l4(c, AF_INET6, IPPROTO_TCP, bind_addr, ifname,
- port, tref.u32);
- if (s >= 0)
- tcp_sock_set_bufsize(c, s);
- else
- s = -1;
+ if (spliced) {
+ tref.tcp.splice = 1;
- if (c->tcp.fwd_in.mode == FWD_AUTO)
- tcp_sock_init_ext[port][V6] = s;
- }
+ tref.tcp.index = port;
+ addr = &in6addr_loopback;
- if (c->mode == MODE_PASTA) {
- bind_addr = &in6addr_loopback;
+ s = sock_l4(c, AF_INET6, IPPROTO_TCP, addr, ifname, port,
+ tref.u32);
+ if (s >= 0)
+ tcp_sock_set_bufsize(c, s);
+ else
+ s = -1;
- s = sock_l4(c, AF_INET6, IPPROTO_TCP, bind_addr, ifname,
- port, tref_spliced.u32);
- if (s >= 0)
- tcp_sock_set_bufsize(c, s);
+ if (c->tcp.fwd_out.mode == FWD_AUTO) {
+ if (ns)
+ tcp_sock_ns[port][V6] = s;
else
- s = -1;
-
- if (c->tcp.fwd_out.mode == FWD_AUTO) {
- if (ns)
- tcp_sock_ns[port][V6] = s;
- else
- tcp_sock_init_lo[port][V6] = s;
- }
+ tcp_sock_init_lo[port][V6] = s;
}
}
}
+/**
+ * tcp_sock_init() - Initialise listening sockets for a given port
+ * @c: Execution context
+ * @ns: In pasta mode, if set, bind with loopback address in namespace
+ * @af: Address family to select a specific IP version, or AF_UNSPEC
+ * @addr: Pointer to address for binding, NULL if not configured
+ * @ifname: Name of interface to bind to, NULL if not configured
+ * @port: Port, host order
+ */
+void tcp_sock_init(const struct ctx *c, int ns, sa_family_t af,
+ const void *addr, const char *ifname, in_port_t port)
+{
+ if (af == AF_INET || af == AF_UNSPEC)
+ tcp_sock_init4(c, ns, addr, ifname, port);
+ if (af == AF_INET6 || af == AF_UNSPEC)
+ tcp_sock_init6(c, ns, addr, ifname, port);
+}
+
/**
* tcp_sock_init_ns() - Bind sockets in namespace for outbound connections
* @arg: Execution context
diff --git a/util.h b/util.h
index 7dc3d18..b1eadfd 100644
--- a/util.h
+++ b/util.h
@@ -81,6 +81,9 @@
#define MAC_ZERO ((uint8_t [ETH_ALEN]){ 0 })
#define MAC_IS_ZERO(addr) (!memcmp((addr), MAC_ZERO, ETH_ALEN))
+#define IPV4_IS_LOOPBACK(addr) \
+ ((addr) >> IN_CLASSA_NSHIFT == IN_LOOPBACKNET)
+
#define NS_FN_STACK_SIZE (RLIMIT_STACK_VAL * 1024 / 4)
#define NS_CALL(fn, arg) \
do { \
--
@@ -81,6 +81,9 @@
#define MAC_ZERO ((uint8_t [ETH_ALEN]){ 0 })
#define MAC_IS_ZERO(addr) (!memcmp((addr), MAC_ZERO, ETH_ALEN))
+#define IPV4_IS_LOOPBACK(addr) \
+ ((addr) >> IN_CLASSA_NSHIFT == IN_LOOPBACKNET)
+
#define NS_FN_STACK_SIZE (RLIMIT_STACK_VAL * 1024 / 4)
#define NS_CALL(fn, arg) \
do { \
--
2.35.1
^ permalink raw reply related [flat|nested] 7+ messages in thread