* [PATCH v2 0/3] Fixes for spliced connections
@ 2022-10-12 15:45 Stefano Brivio
2022-10-12 15:45 ` [PATCH v2 1/3] tcp, tcp_splice: Adjust comments to current meaning of inbound and outbound Stefano Brivio
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Stefano Brivio @ 2022-10-12 15:45 UTC (permalink / raw)
To: passt-dev; +Cc: David Gibson
The most pressing problem fixed here is that spliced connections with
a remapped destination port would be attempted on the wrong side. This
is an old issue as indicated by the Fixes: tag.
Patch 1/3 restores sanity in comments before we attempt to fix the
issue, patch 2/3 fixes the actual issue, and patch 3/3 introduces
a minor rework on top to fix another issue, where if the user
explicitly configures a loopback address in a port binding we would
still create a non-spliced socket stealing the related connections.
v2:
- rewrite 2/3 with David's idea of using a bit in the epoll reference
- rebase, and use IPV4_IS_LOOPBACK in tcp_conn_from_sock() too
Stefano Brivio (3):
tcp, tcp_splice: Adjust comments to current meaning of inbound and
outbound
tcp, tcp_splice: Fix port remapping for inbound, spliced connections
Don't create 'tap' socket for ports that are bound to loopback only
tcp.c | 192 +++++++++++++++++++++++++++++++--------------------
tcp.h | 2 +
tcp_splice.c | 22 +++---
util.h | 3 +
4 files changed, 137 insertions(+), 82 deletions(-)
--
2.35.1
* [PATCH v2 1/3] tcp, tcp_splice: Adjust comments to current meaning of inbound and outbound
From: Stefano Brivio @ 2022-10-12 15:45 UTC (permalink / raw)
To: passt-dev; +Cc: David Gibson
For tcp_sock_init_ns(), "inbound" connections used to be the ones
being established toward any listening socket we create, as opposed
to sockets we connect().
Similarly, tcp_splice_new() used to handle "inbound" connections in
the sense that they originated from listening sockets, and they would
in turn cause a connect() on an "outbound" socket.
Since commit 1128fa03fe73 ("Improve types and names for port
forwarding configuration"), though, inbound connections are more
broadly defined as the ones directed to guest or namespace, and
outbound the ones originating from there.
Update comments for those two functions.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
tcp.c | 2 +-
tcp_splice.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/tcp.c b/tcp.c
index 7e82589..63153b6 100644
--- a/tcp.c
+++ b/tcp.c
@@ -3178,7 +3178,7 @@ void tcp_sock_init(const struct ctx *c, int ns, sa_family_t af,
}
/**
- * tcp_sock_init_ns() - Bind sockets in namespace for inbound connections
+ * tcp_sock_init_ns() - Bind sockets in namespace for outbound connections
* @arg: Execution context
*
* Return: 0
diff --git a/tcp_splice.c b/tcp_splice.c
index 4a015d0..96c31c8 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -502,7 +502,7 @@ static int tcp_splice_connect_ns(void *arg)
}
/**
- * tcp_splice_new() - Handle new inbound, spliced connection
+ * tcp_splice_new() - Handle new spliced connection
* @c: Execution context
* @conn: Connection pointer
* @port: Destination port, host order
--
2.35.1
* [PATCH v2 2/3] tcp, tcp_splice: Fix port remapping for inbound, spliced connections
From: Stefano Brivio @ 2022-10-12 15:45 UTC (permalink / raw)
To: passt-dev; +Cc: David Gibson
In pasta mode, when we receive a new inbound connection, we need to
select a socket that was created in the namespace to proceed and
connect() it to its final destination.
The existing condition might pick a wrong socket, though, if the
destination port is remapped, because we'll check the bitmap of
inbound ports using the remapped port (stored in the epoll reference)
as index, and not the original port.
Instead of using the port bitmap for this purpose, store this
information in the epoll reference itself, by adding a new 'outbound'
bit, that's set if the listening socket was created in the namespace,
and unset otherwise.
Then, use this bit to pick a socket on the right side.
Suggested-by: David Gibson <david@gibson.dropbear.id.au>
Fixes: 33482d5bf293 ("passt: Add PASTA mode, major rework")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
v2:
- use a bit in epoll reference to indicate whether a socket is for
inbound or outbound connections, instead of trying (and failing)
to rebuild that information from maps (David Gibson)
tcp.c | 7 +++----
tcp.h | 2 ++
tcp_splice.c | 20 +++++++++++++-------
3 files changed, 18 insertions(+), 11 deletions(-)
diff --git a/tcp.c b/tcp.c
index 63153b6..0d4ce57 100644
--- a/tcp.c
+++ b/tcp.c
@@ -3084,15 +3084,14 @@ void tcp_sock_handler(struct ctx *c, union epoll_ref ref, uint32_t events,
void tcp_sock_init(const struct ctx *c, int ns, sa_family_t af,
const void *addr, const char *ifname, in_port_t port)
{
- union tcp_epoll_ref tref = { .tcp.listen = 1 };
+ union tcp_epoll_ref tref = { .tcp.listen = 1, .tcp.outbound = ns };
const void *bind_addr;
int s;
- if (ns) {
+ if (ns)
tref.tcp.index = (in_port_t)(port + c->tcp.fwd_out.delta[port]);
- } else {
+ else
tref.tcp.index = (in_port_t)(port + c->tcp.fwd_in.delta[port]);
- }
if (af == AF_INET || af == AF_UNSPEC) {
if (!addr && c->mode == MODE_PASTA)
diff --git a/tcp.h b/tcp.h
index 7ba7ab7..72e7815 100644
--- a/tcp.h
+++ b/tcp.h
@@ -34,6 +34,7 @@ void tcp_update_l2_buf(const unsigned char *eth_d, const unsigned char *eth_s,
* union tcp_epoll_ref - epoll reference portion for TCP connections
* @listen: Set if this file descriptor is a listening socket
* @splice: Set if descriptor is associated to a spliced connection
+ * @outbound: Listening socket maps to outbound, spliced connection
* @v6: Set for IPv6 sockets or connections
* @timer: Reference is a timerfd descriptor for connection
* @index: Index of connection in table, or port for bound sockets
@@ -43,6 +44,7 @@ union tcp_epoll_ref {
struct {
uint32_t listen:1,
splice:1,
+ outbound:1,
v6:1,
timer:1,
index:20;
diff --git a/tcp_splice.c b/tcp_splice.c
index 96c31c8..99c5fa7 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -35,6 +35,7 @@
#include <fcntl.h>
#include <limits.h>
#include <stdint.h>
+#include <stdbool.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
@@ -506,19 +507,19 @@ static int tcp_splice_connect_ns(void *arg)
* @c: Execution context
* @conn: Connection pointer
* @port: Destination port, host order
+ * @outbound: Connection request coming from namespace
*
* Return: return code from connect()
*/
static int tcp_splice_new(const struct ctx *c, struct tcp_splice_conn *conn,
- in_port_t port)
+ in_port_t port, int outbound)
{
- struct tcp_splice_connect_ns_arg ns_arg = { c, conn, port, 0 };
int *p, i, s = -1;
- if (bitmap_isset(c->tcp.fwd_in.map, port))
- p = CONN_V6(conn) ? ns_sock_pool6 : ns_sock_pool4;
- else
+ if (outbound)
p = CONN_V6(conn) ? init_sock_pool6 : init_sock_pool4;
+ else
+ p = CONN_V6(conn) ? ns_sock_pool6 : ns_sock_pool4;
for (i = 0; i < TCP_SOCK_POOL_SIZE; i++, p++) {
SWAP(s, *p);
@@ -526,11 +527,15 @@ static int tcp_splice_new(const struct ctx *c, struct tcp_splice_conn *conn,
break;
}
- if (s < 0 && bitmap_isset(c->tcp.fwd_in.map, port)) {
+ /* No socket available in namespace: create a new one for connect() */
+ if (s < 0 && !outbound) {
+ struct tcp_splice_connect_ns_arg ns_arg = { c, conn, port, 0 };
+
NS_CALL(tcp_splice_connect_ns, &ns_arg);
return ns_arg.ret;
}
+ /* Otherwise, the socket will connect on the side it was created on */
return tcp_splice_connect(c, conn, s, port);
}
@@ -592,7 +597,8 @@ void tcp_sock_handler_splice(struct ctx *c, union epoll_ref ref,
conn->a = s;
conn->flags = ref.r.p.tcp.tcp.v6 ? SOCK_V6 : 0;
- if (tcp_splice_new(c, conn, ref.r.p.tcp.tcp.index))
+ if (tcp_splice_new(c, conn, ref.r.p.tcp.tcp.index,
+ ref.r.p.tcp.tcp.outbound))
conn_flag(c, conn, CLOSING);
return;
--
2.35.1
* [PATCH v2 3/3] tcp: Don't create 'tap' socket for ports that are bound to loopback only
From: Stefano Brivio @ 2022-10-12 15:45 UTC (permalink / raw)
To: passt-dev; +Cc: David Gibson
If the user specifies an explicit loopback address for a port
binding, we're going to use that address for the 'tap' socket, and
the same exact address for the 'spliced' socket (because those are,
by definition, only bound to loopback addresses).
This means that the second binding will fail and, unexpectedly, the
port is still forwarded, but via the tap device, which means the source
address in the namespace won't be a loopback address.
Make it explicit under which conditions we're creating which kind of
socket, by refactoring tcp_sock_init() into two separate functions
for IPv4 and IPv6 and gathering those conditions at the beginning.
Also, don't create spliced sockets if the user explicitly specifies
a non-loopback address: those are harmless, but not desired either.
Fixes: 3c6ae625101a ("conf, tcp, udp: Allow address specification for forwarded ports")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
v2:
- use new IPV4_IS_LOOPBACK macro in tcp_conn_from_sock() too
tcp.c | 183 +++++++++++++++++++++++++++++++++++----------------------
util.h | 3 +
2 files changed, 117 insertions(+), 69 deletions(-)
diff --git a/tcp.c b/tcp.c
index 0d4ce57..e6979a9 100644
--- a/tcp.c
+++ b/tcp.c
@@ -275,6 +275,7 @@
#include <netinet/in.h>
#include <netinet/ip.h>
#include <stdint.h>
+#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/epoll.h>
@@ -2906,8 +2907,8 @@ static void tcp_conn_from_sock(struct ctx *c, union epoll_ref ref,
memset(&conn->a.a4.zero, 0, sizeof(conn->a.a4.zero));
memset(&conn->a.a4.one, 0xff, sizeof(conn->a.a4.one));
- if (s_addr >> IN_CLASSA_NSHIFT == IN_LOOPBACKNET ||
- s_addr == INADDR_ANY || htonl(s_addr) == c->ip4.addr_seen)
+ if (IPV4_IS_LOOPBACK(s_addr) || s_addr == INADDR_ANY ||
+ htonl(s_addr) == c->ip4.addr_seen)
s_addr = ntohl(c->ip4.gw);
s_addr = htonl(s_addr);
@@ -3073,109 +3074,153 @@ void tcp_sock_handler(struct ctx *c, union epoll_ref ref, uint32_t events,
}
/**
- * tcp_sock_init() - Initialise listening sockets for a given port
+ * tcp_sock_init4() - Initialise listening sockets for a given IPv4 port
* @c: Execution context
* @ns: In pasta mode, if set, bind with loopback address in namespace
- * @af: Address family to select a specific IP version, or AF_UNSPEC
* @addr: Pointer to address for binding, NULL if not configured
* @ifname: Name of interface to bind to, NULL if not configured
* @port: Port, host order
*/
-void tcp_sock_init(const struct ctx *c, int ns, sa_family_t af,
- const void *addr, const char *ifname, in_port_t port)
+static void tcp_sock_init4(const struct ctx *c, int ns, const in_addr_t *addr,
+ const char *ifname, in_port_t port)
{
union tcp_epoll_ref tref = { .tcp.listen = 1, .tcp.outbound = ns };
- const void *bind_addr;
+ bool spliced = false, tap = true;
int s;
+ if (c->mode == MODE_PASTA) {
+ spliced = !addr ||
+ ntohl(*addr) == INADDR_ANY ||
+ IPV4_IS_LOOPBACK(ntohl(*addr));
+
+ if (!addr)
+ addr = &c->ip4.addr;
+
+ tap = !ns && !IPV4_IS_LOOPBACK(ntohl(*addr));
+ }
+
if (ns)
tref.tcp.index = (in_port_t)(port + c->tcp.fwd_out.delta[port]);
else
tref.tcp.index = (in_port_t)(port + c->tcp.fwd_in.delta[port]);
- if (af == AF_INET || af == AF_UNSPEC) {
- if (!addr && c->mode == MODE_PASTA)
- bind_addr = &c->ip4.addr;
+ if (tap) {
+ s = sock_l4(c, AF_INET, IPPROTO_TCP, addr, ifname, port,
+ tref.u32);
+ if (s >= 0)
+ tcp_sock_set_bufsize(c, s);
else
- bind_addr = addr;
+ s = -1;
- tref.tcp.v6 = 0;
- tref.tcp.splice = 0;
+ if (c->tcp.fwd_in.mode == FWD_AUTO)
+ tcp_sock_init_ext[port][V4] = s;
+ }
- if (!ns) {
- s = sock_l4(c, AF_INET, IPPROTO_TCP, bind_addr, ifname,
- port, tref.u32);
- if (s >= 0)
- tcp_sock_set_bufsize(c, s);
- else
- s = -1;
+ if (spliced) {
+ tref.tcp.splice = 1;
- if (c->tcp.fwd_in.mode == FWD_AUTO)
- tcp_sock_init_ext[port][V4] = s;
- }
+ addr = &(uint32_t){ htonl(INADDR_LOOPBACK) };
- if (c->mode == MODE_PASTA) {
- bind_addr = &(uint32_t){ htonl(INADDR_LOOPBACK) };
+ s = sock_l4(c, AF_INET, IPPROTO_TCP, addr, ifname, port,
+ tref.u32);
+ if (s >= 0)
+ tcp_sock_set_bufsize(c, s);
+ else
+ s = -1;
- tref.tcp.splice = 1;
- s = sock_l4(c, AF_INET, IPPROTO_TCP, bind_addr, ifname,
- port, tref.u32);
- if (s >= 0)
- tcp_sock_set_bufsize(c, s);
+ if (c->tcp.fwd_out.mode == FWD_AUTO) {
+ if (ns)
+ tcp_sock_ns[port][V4] = s;
else
- s = -1;
-
- if (c->tcp.fwd_out.mode == FWD_AUTO) {
- if (ns)
- tcp_sock_ns[port][V4] = s;
- else
- tcp_sock_init_lo[port][V4] = s;
- }
+ tcp_sock_init_lo[port][V4] = s;
}
}
+}
+
+/**
+ * tcp_sock_init6() - Initialise listening sockets for a given IPv6 port
+ * @c: Execution context
+ * @ns: In pasta mode, if set, bind with loopback address in namespace
+ * @addr: Pointer to address for binding, NULL if not configured
+ * @ifname: Name of interface to bind to, NULL if not configured
+ * @port: Port, host order
+ */
+static void tcp_sock_init6(const struct ctx *c, int ns,
+ const struct in6_addr *addr, const char *ifname,
+ in_port_t port)
+{
+ union tcp_epoll_ref tref = { .tcp.listen = 1, .tcp.outbound = ns,
+ .tcp.v6 = 1 };
+ bool spliced = false, tap = true;
+ int s;
+
+ if (c->mode == MODE_PASTA) {
+ spliced = !addr ||
+ IN6_IS_ADDR_UNSPECIFIED(addr) ||
+ IN6_IS_ADDR_LOOPBACK(addr);
+
+ if (!addr)
+ addr = &c->ip6.addr;
+
+ tap = !ns && !IN6_IS_ADDR_LOOPBACK(addr);
+ }
+
+ if (ns)
+ tref.tcp.index = (in_port_t)(port + c->tcp.fwd_out.delta[port]);
+ else
+ tref.tcp.index = (in_port_t)(port + c->tcp.fwd_in.delta[port]);
- if (af == AF_INET6 || af == AF_UNSPEC) {
- if (!addr && c->mode == MODE_PASTA)
- bind_addr = &c->ip6.addr;
+ if (tap) {
+ s = sock_l4(c, AF_INET6, IPPROTO_TCP, addr, ifname, port,
+ tref.u32);
+ if (s >= 0)
+ tcp_sock_set_bufsize(c, s);
else
- bind_addr = addr;
+ s = -1;
- tref.tcp.v6 = 1;
+ if (c->tcp.fwd_in.mode == FWD_AUTO)
+ tcp_sock_init_ext[port][V6] = s;
+ }
- tref.tcp.splice = 0;
- if (!ns) {
- s = sock_l4(c, AF_INET6, IPPROTO_TCP, bind_addr, ifname,
- port, tref.u32);
- if (s >= 0)
- tcp_sock_set_bufsize(c, s);
- else
- s = -1;
+ if (spliced) {
+ tref.tcp.splice = 1;
- if (c->tcp.fwd_in.mode == FWD_AUTO)
- tcp_sock_init_ext[port][V6] = s;
- }
+ addr = &in6addr_loopback;
- if (c->mode == MODE_PASTA) {
- bind_addr = &in6addr_loopback;
+ s = sock_l4(c, AF_INET6, IPPROTO_TCP, addr, ifname, port,
+ tref.u32);
+ if (s >= 0)
+ tcp_sock_set_bufsize(c, s);
+ else
+ s = -1;
- tref.tcp.splice = 1;
- s = sock_l4(c, AF_INET6, IPPROTO_TCP, bind_addr, ifname,
- port, tref.u32);
- if (s >= 0)
- tcp_sock_set_bufsize(c, s);
+ if (c->tcp.fwd_out.mode == FWD_AUTO) {
+ if (ns)
+ tcp_sock_ns[port][V6] = s;
else
- s = -1;
-
- if (c->tcp.fwd_out.mode == FWD_AUTO) {
- if (ns)
- tcp_sock_ns[port][V6] = s;
- else
- tcp_sock_init_lo[port][V6] = s;
- }
+ tcp_sock_init_lo[port][V6] = s;
}
}
}
+/**
+ * tcp_sock_init() - Initialise listening sockets for a given port
+ * @c: Execution context
+ * @ns: In pasta mode, if set, bind with loopback address in namespace
+ * @af: Address family to select a specific IP version, or AF_UNSPEC
+ * @addr: Pointer to address for binding, NULL if not configured
+ * @ifname: Name of interface to bind to, NULL if not configured
+ * @port: Port, host order
+ */
+void tcp_sock_init(const struct ctx *c, int ns, sa_family_t af,
+ const void *addr, const char *ifname, in_port_t port)
+{
+ if (af == AF_INET || af == AF_UNSPEC)
+ tcp_sock_init4(c, ns, addr, ifname, port);
+ if (af == AF_INET6 || af == AF_UNSPEC)
+ tcp_sock_init6(c, ns, addr, ifname, port);
+}
+
/**
* tcp_sock_init_ns() - Bind sockets in namespace for outbound connections
* @arg: Execution context
diff --git a/util.h b/util.h
index 7dc3d18..b1eadfd 100644
--- a/util.h
+++ b/util.h
@@ -81,6 +81,9 @@
#define MAC_ZERO ((uint8_t [ETH_ALEN]){ 0 })
#define MAC_IS_ZERO(addr) (!memcmp((addr), MAC_ZERO, ETH_ALEN))
+#define IPV4_IS_LOOPBACK(addr) \
+ ((addr) >> IN_CLASSA_NSHIFT == IN_LOOPBACKNET)
+
#define NS_FN_STACK_SIZE (RLIMIT_STACK_VAL * 1024 / 4)
#define NS_CALL(fn, arg) \
do { \
--
2.35.1
* Re: [PATCH v2 2/3] tcp, tcp_splice: Fix port remapping for inbound, spliced connections
From: David Gibson @ 2022-10-13 1:19 UTC (permalink / raw)
To: Stefano Brivio; +Cc: passt-dev
On Wed, Oct 12, 2022 at 05:45:53PM +0200, Stefano Brivio wrote:
> In pasta mode, when we receive a new inbound connection, we need to
> select a socket that was created in the namespace to proceed and
> connect() it to its final destination.
>
> The existing condition might pick a wrong socket, though, if the
> destination port is remapped, because we'll check the bitmap of
> inbound ports using the remapped port (stored in the epoll reference)
> as index, and not the original port.
>
> Instead of using the port bitmap for this purpose, store this
> information in the epoll reference itself, by adding a new 'outbound'
> bit, that's set if the listening socket was created in the namespace,
> and unset otherwise.
>
> Then, use this bit to pick a socket on the right side.
>
> Suggested-by: David Gibson <david@gibson.dropbear.id.au>
> Fixes: 33482d5bf293 ("passt: Add PASTA mode, major rework")
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> v2:
> - use a bit in epoll reference to indicate whether a socket is for
> inbound or outbound connections, instead of trying (and failing)
> to rebuild that information from maps (David Gibson)
>
> tcp.c | 7 +++----
> tcp.h | 2 ++
> tcp_splice.c | 20 +++++++++++++-------
> 3 files changed, 18 insertions(+), 11 deletions(-)
>
> diff --git a/tcp.c b/tcp.c
> index 63153b6..0d4ce57 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -3084,15 +3084,14 @@ void tcp_sock_handler(struct ctx *c, union epoll_ref ref, uint32_t events,
> void tcp_sock_init(const struct ctx *c, int ns, sa_family_t af,
> const void *addr, const char *ifname, in_port_t port)
> {
> - union tcp_epoll_ref tref = { .tcp.listen = 1 };
> + union tcp_epoll_ref tref = { .tcp.listen = 1, .tcp.outbound = ns };
> const void *bind_addr;
> int s;
>
> - if (ns) {
> + if (ns)
> tref.tcp.index = (in_port_t)(port + c->tcp.fwd_out.delta[port]);
> - } else {
> + else
> tref.tcp.index = (in_port_t)(port + c->tcp.fwd_in.delta[port]);
> - }
>
> if (af == AF_INET || af == AF_UNSPEC) {
> if (!addr && c->mode == MODE_PASTA)
> diff --git a/tcp.h b/tcp.h
> index 7ba7ab7..72e7815 100644
> --- a/tcp.h
> +++ b/tcp.h
> @@ -34,6 +34,7 @@ void tcp_update_l2_buf(const unsigned char *eth_d, const unsigned char *eth_s,
> * union tcp_epoll_ref - epoll reference portion for TCP connections
> * @listen: Set if this file descriptor is a listening socket
> * @splice: Set if descriptor is associated to a spliced connection
> + * @outbound: Listening socket maps to outbound, spliced connection
> * @v6: Set for IPv6 sockets or connections
> * @timer: Reference is a timerfd descriptor for connection
> * @index: Index of connection in table, or port for bound sockets
> @@ -43,6 +44,7 @@ union tcp_epoll_ref {
> struct {
> uint32_t listen:1,
> splice:1,
> + outbound:1,
> v6:1,
> timer:1,
> index:20;
> diff --git a/tcp_splice.c b/tcp_splice.c
> index 96c31c8..99c5fa7 100644
> --- a/tcp_splice.c
> +++ b/tcp_splice.c
> @@ -35,6 +35,7 @@
> #include <fcntl.h>
> #include <limits.h>
> #include <stdint.h>
> +#include <stdbool.h>
> #include <string.h>
> #include <time.h>
> #include <unistd.h>
> @@ -506,19 +507,19 @@ static int tcp_splice_connect_ns(void *arg)
> * @c: Execution context
> * @conn: Connection pointer
> * @port: Destination port, host order
> + * @outbound: Connection request coming from namespace
> *
> * Return: return code from connect()
> */
> static int tcp_splice_new(const struct ctx *c, struct tcp_splice_conn *conn,
> - in_port_t port)
> + in_port_t port, int outbound)
> {
> - struct tcp_splice_connect_ns_arg ns_arg = { c, conn, port, 0 };
> int *p, i, s = -1;
>
> - if (bitmap_isset(c->tcp.fwd_in.map, port))
> - p = CONN_V6(conn) ? ns_sock_pool6 : ns_sock_pool4;
> - else
> + if (outbound)
> p = CONN_V6(conn) ? init_sock_pool6 : init_sock_pool4;
> + else
> + p = CONN_V6(conn) ? ns_sock_pool6 : ns_sock_pool4;
>
> for (i = 0; i < TCP_SOCK_POOL_SIZE; i++, p++) {
> SWAP(s, *p);
> @@ -526,11 +527,15 @@ static int tcp_splice_new(const struct ctx *c, struct tcp_splice_conn *conn,
> break;
> }
>
> - if (s < 0 && bitmap_isset(c->tcp.fwd_in.map, port)) {
> + /* No socket available in namespace: create a new one for connect() */
> + if (s < 0 && !outbound) {
> + struct tcp_splice_connect_ns_arg ns_arg = { c, conn, port, 0 };
> +
> NS_CALL(tcp_splice_connect_ns, &ns_arg);
> return ns_arg.ret;
> }
>
> + /* Otherwise, the socket will connect on the side it was created on */
> return tcp_splice_connect(c, conn, s, port);
> }
>
> @@ -592,7 +597,8 @@ void tcp_sock_handler_splice(struct ctx *c, union epoll_ref ref,
> conn->a = s;
> conn->flags = ref.r.p.tcp.tcp.v6 ? SOCK_V6 : 0;
>
> - if (tcp_splice_new(c, conn, ref.r.p.tcp.tcp.index))
> + if (tcp_splice_new(c, conn, ref.r.p.tcp.tcp.index,
> + ref.r.p.tcp.tcp.outbound))
> conn_flag(c, conn, CLOSING);
>
> return;
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
Code repositories for project(s) associated with this public inbox
https://passt.top/passt