public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
* [PATCH] tcp: probe for SO_PEEK_OFF both in tcpv4 and tcp6
@ 2024-07-20 13:54 Jon Maloy
  2024-07-20 14:46 ` Jon Maloy
  2024-07-21  9:20 ` Stefano Brivio
  0 siblings, 2 replies; 5+ messages in thread
From: Jon Maloy @ 2024-07-20 13:54 UTC (permalink / raw)
  To: passt-dev, sbrivio, lvivier, dgibson, jmaloy

The recently added socket option SO_PEEK_OFF is not supported for
TCP/IPv6 sockets. Until we get that support into the kernel we need to
test for support in both protocols to set the global 'peek_offset_cap´
to true.

Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 tcp.c | 36 +++++++++++++++++++++++++-----------
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/tcp.c b/tcp.c
index c5431f1..32026ca 100644
--- a/tcp.c
+++ b/tcp.c
@@ -2717,6 +2717,28 @@ static void tcp_sock_refill_init(const struct ctx *c)
 	}
 }
 
+/**
+ * tcp_probe_peek_offset_cap() - Check if SO_PEEK_OFF is supported by kernel
+ * @af:		Address family, IPv4 or IPv6
+ *
+ * Return: true if supported, false otherwise
+ */
+bool tcp_probe_peek_offset_cap(int af)
+{
+	bool ret = false;
+	int s, optv = 0;
+
+	s = socket(af, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
+	if (s < 0) {
+		warn_perror("Temporary TCP socket creation failed");
+	} else {
+		if (!setsockopt(s, SOL_SOCKET, SO_PEEK_OFF, &optv, sizeof(int)))
+			ret = true;
+		close(s);
+	}
+	return ret;
+}
+
 /**
  * tcp_init() - Get initial sequence, hash secret, initialise per-socket data
  * @c:		Execution context
@@ -2725,8 +2747,7 @@ static void tcp_sock_refill_init(const struct ctx *c)
  */
 int tcp_init(struct ctx *c)
 {
-	unsigned int b, optv = 0;
-	int s;
+	unsigned int b;
 
 	ASSERT(!c->no_tcp);
 
@@ -2752,15 +2773,8 @@ int tcp_init(struct ctx *c)
 		NS_CALL(tcp_ns_socks_init, c);
 	}
 
-	/* Probe for SO_PEEK_OFF support */
-	s = socket(AF_INET, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
-	if (s < 0) {
-		warn_perror("Temporary TCP socket creation failed");
-	} else {
-		if (!setsockopt(s, SOL_SOCKET, SO_PEEK_OFF, &optv, sizeof(int)))
-			peek_offset_cap = true;
-		close(s);
-	}
+	peek_offset_cap = tcp_probe_peek_offset_cap(AF_INET) &&
+		tcp_probe_peek_offset_cap(AF_INET6);
 	info("SO_PEEK_OFF%ssupported", peek_offset_cap ? " " : " not ");
 
 	return 0;
-- 
@@ -2717,6 +2717,28 @@ static void tcp_sock_refill_init(const struct ctx *c)
 	}
 }
 
+/**
+ * tcp_probe_peek_offset_cap() - Check if SO_PEEK_OFF is supported by kernel
+ * @af:		Address family, IPv4 or IPv6
+ *
+ * Return: true if supported, false otherwise
+ */
+bool tcp_probe_peek_offset_cap(int af)
+{
+	bool ret = false;
+	int s, optv = 0;
+
+	s = socket(af, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
+	if (s < 0) {
+		warn_perror("Temporary TCP socket creation failed");
+	} else {
+		if (!setsockopt(s, SOL_SOCKET, SO_PEEK_OFF, &optv, sizeof(int)))
+			ret = true;
+		close(s);
+	}
+	return ret;
+}
+
 /**
  * tcp_init() - Get initial sequence, hash secret, initialise per-socket data
  * @c:		Execution context
@@ -2725,8 +2747,7 @@ static void tcp_sock_refill_init(const struct ctx *c)
  */
 int tcp_init(struct ctx *c)
 {
-	unsigned int b, optv = 0;
-	int s;
+	unsigned int b;
 
 	ASSERT(!c->no_tcp);
 
@@ -2752,15 +2773,8 @@ int tcp_init(struct ctx *c)
 		NS_CALL(tcp_ns_socks_init, c);
 	}
 
-	/* Probe for SO_PEEK_OFF support */
-	s = socket(AF_INET, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
-	if (s < 0) {
-		warn_perror("Temporary TCP socket creation failed");
-	} else {
-		if (!setsockopt(s, SOL_SOCKET, SO_PEEK_OFF, &optv, sizeof(int)))
-			peek_offset_cap = true;
-		close(s);
-	}
+	peek_offset_cap = tcp_probe_peek_offset_cap(AF_INET) &&
+		tcp_probe_peek_offset_cap(AF_INET6);
 	info("SO_PEEK_OFF%ssupported", peek_offset_cap ? " " : " not ");
 
 	return 0;
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] tcp: probe for SO_PEEK_OFF both in tcpv4 and tcp6
  2024-07-20 13:54 [PATCH] tcp: probe for SO_PEEK_OFF both in tcpv4 and tcp6 Jon Maloy
@ 2024-07-20 14:46 ` Jon Maloy
  2024-07-21  9:21   ` Stefano Brivio
  2024-07-21  9:20 ` Stefano Brivio
  1 sibling, 1 reply; 5+ messages in thread
From: Jon Maloy @ 2024-07-20 14:46 UTC (permalink / raw)
  To: passt-dev, sbrivio, lvivier, dgibson

My first approach to this was to condition the use of SO_PEEK_OFF with 
tcpv4, e.g., basically
a test like if (v4 && peek_offset_cap) {...} everywhere, but then I made 
an interesting discovery.

It turns out that, unless the ´-4' option is explicitly given on the 
command line, all sockets are
v6, even those that are later used as v4 sockets. So, the set_peek_off() 
call failed even
for supposedly v4 sockets.

I checked this by adding a printout to the tcp_listen_handler(), and 
noticed that all returns from
the accept4() call goes into the AF_INET6 branch, even when the client 
(iperf3) call is using an IPv4 address.
During traffic, the very same socket is marked as v4 in the tcp_tap_conn 
structure, and this seems to
have worked just fine until I added the set_peek_offset call().

I believe this is an issue that has been introduced during the last 
months, since I didn't start
using the ´-4' option consistently until some months ago, and then it 
worked.

Happy summer
///jon


On 2024-07-20 09:54, Jon Maloy wrote:
> The recently added socket option SO_PEEK_OFF is not supported for
> TCP/IPv6 sockets. Until we get that support into the kernel we need to
> test for support in both protocols to set the global 'peek_offset_cap´
> to true.
>
> Signed-off-by: Jon Maloy <jmaloy@redhat.com>
> ---
>   tcp.c | 36 +++++++++++++++++++++++++-----------
>   1 file changed, 25 insertions(+), 11 deletions(-)
>
> diff --git a/tcp.c b/tcp.c
> index c5431f1..32026ca 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -2717,6 +2717,28 @@ static void tcp_sock_refill_init(const struct ctx *c)
>   	}
>   }
>   
> +/**
> + * tcp_probe_peek_offset_cap() - Check if SO_PEEK_OFF is supported by kernel
> + * @af:		Address family, IPv4 or IPv6
> + *
> + * Return: true if supported, false otherwise
> + */
> +bool tcp_probe_peek_offset_cap(int af)
> +{
> +	bool ret = false;
> +	int s, optv = 0;
> +
> +	s = socket(af, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
> +	if (s < 0) {
> +		warn_perror("Temporary TCP socket creation failed");
> +	} else {
> +		if (!setsockopt(s, SOL_SOCKET, SO_PEEK_OFF, &optv, sizeof(int)))
> +			ret = true;
> +		close(s);
> +	}
> +	return ret;
> +}
> +
>   /**
>    * tcp_init() - Get initial sequence, hash secret, initialise per-socket data
>    * @c:		Execution context
> @@ -2725,8 +2747,7 @@ static void tcp_sock_refill_init(const struct ctx *c)
>    */
>   int tcp_init(struct ctx *c)
>   {
> -	unsigned int b, optv = 0;
> -	int s;
> +	unsigned int b;
>   
>   	ASSERT(!c->no_tcp);
>   
> @@ -2752,15 +2773,8 @@ int tcp_init(struct ctx *c)
>   		NS_CALL(tcp_ns_socks_init, c);
>   	}
>   
> -	/* Probe for SO_PEEK_OFF support */
> -	s = socket(AF_INET, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
> -	if (s < 0) {
> -		warn_perror("Temporary TCP socket creation failed");
> -	} else {
> -		if (!setsockopt(s, SOL_SOCKET, SO_PEEK_OFF, &optv, sizeof(int)))
> -			peek_offset_cap = true;
> -		close(s);
> -	}
> +	peek_offset_cap = tcp_probe_peek_offset_cap(AF_INET) &&
> +		tcp_probe_peek_offset_cap(AF_INET6);
>   	info("SO_PEEK_OFF%ssupported", peek_offset_cap ? " " : " not ");
>   
>   	return 0;


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] tcp: probe for SO_PEEK_OFF both in tcpv4 and tcp6
  2024-07-20 13:54 [PATCH] tcp: probe for SO_PEEK_OFF both in tcpv4 and tcp6 Jon Maloy
  2024-07-20 14:46 ` Jon Maloy
@ 2024-07-21  9:20 ` Stefano Brivio
  1 sibling, 0 replies; 5+ messages in thread
From: Stefano Brivio @ 2024-07-21  9:20 UTC (permalink / raw)
  To: Jon Maloy; +Cc: passt-dev, lvivier, dgibson

On Sat, 20 Jul 2024 09:54:53 -0400
Jon Maloy <jmaloy@redhat.com> wrote:

> The recently added socket option SO_PEEK_OFF is not supported for
> TCP/IPv6 sockets. Until we get that support into the kernel we need to
> test for support in both protocols to set the global 'peek_offset_cap´
> to true.
> 
> Signed-off-by: Jon Maloy <jmaloy@redhat.com>
> ---
>  tcp.c | 36 +++++++++++++++++++++++++-----------
>  1 file changed, 25 insertions(+), 11 deletions(-)
> 
> diff --git a/tcp.c b/tcp.c
> index c5431f1..32026ca 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -2717,6 +2717,28 @@ static void tcp_sock_refill_init(const struct ctx *c)
>  	}
>  }
>  
> +/**
> + * tcp_probe_peek_offset_cap() - Check if SO_PEEK_OFF is supported by kernel
> + * @af:		Address family, IPv4 or IPv6
> + *
> + * Return: true if supported, false otherwise
> + */
> +bool tcp_probe_peek_offset_cap(int af)
> +{
> +	bool ret = false;
> +	int s, optv = 0;
> +
> +	s = socket(af, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
> +	if (s < 0) {
> +		warn_perror("Temporary TCP socket creation failed");
> +	} else {
> +		if (!setsockopt(s, SOL_SOCKET, SO_PEEK_OFF, &optv, sizeof(int)))
> +			ret = true;
> +		close(s);
> +	}
> +	return ret;
> +}
> +
>  /**
>   * tcp_init() - Get initial sequence, hash secret, initialise per-socket data
>   * @c:		Execution context
> @@ -2725,8 +2747,7 @@ static void tcp_sock_refill_init(const struct ctx *c)
>   */
>  int tcp_init(struct ctx *c)
>  {
> -	unsigned int b, optv = 0;
> -	int s;
> +	unsigned int b;
>  
>  	ASSERT(!c->no_tcp);
>  
> @@ -2752,15 +2773,8 @@ int tcp_init(struct ctx *c)
>  		NS_CALL(tcp_ns_socks_init, c);
>  	}
>  
> -	/* Probe for SO_PEEK_OFF support */
> -	s = socket(AF_INET, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
> -	if (s < 0) {
> -		warn_perror("Temporary TCP socket creation failed");
> -	} else {
> -		if (!setsockopt(s, SOL_SOCKET, SO_PEEK_OFF, &optv, sizeof(int)))
> -			peek_offset_cap = true;
> -		close(s);
> -	}
> +	peek_offset_cap = tcp_probe_peek_offset_cap(AF_INET) &&
> +		tcp_probe_peek_offset_cap(AF_INET6);

I think we shouldn't probe for IPv4 SO_PEEK_OFF support if we're not
interested in it, and the same applies for IPv6: those two checks
should depend on whether c->ifi4 and c->ifi6 are set (following the
same logic as tcp_sock_init()).

In practice, since you just submitted the fix to have SO_PEEK_OFF
support also for IPv6 sockets, it doesn't matter so much, but it might
still be relevant for users who will stick to 6.9 and 6.10 kernel
versions for a while, for whatever reason.

Eventually, we might want to support IPv6 operation on IPv4-only hosts,
maybe even with default options, making this even less relevant.

But the day we get to it, it should be simpler to just replace all the
checks of c->ifi4 / c->ifi6 used to represent IPv4 / IPv6 support,
rather than replacing slightly different bits of logic.

-- 
Stefano


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] tcp: probe for SO_PEEK_OFF both in tcpv4 and tcp6
  2024-07-20 14:46 ` Jon Maloy
@ 2024-07-21  9:21   ` Stefano Brivio
  2024-07-22  0:55     ` David Gibson
  0 siblings, 1 reply; 5+ messages in thread
From: Stefano Brivio @ 2024-07-21  9:21 UTC (permalink / raw)
  To: Jon Maloy; +Cc: passt-dev, lvivier, dgibson

On Sat, 20 Jul 2024 10:46:07 -0400
Jon Maloy <jmaloy@redhat.com> wrote:

> My first approach to this was to condition the use of SO_PEEK_OFF with 
> tcpv4, e.g., basically
> a test like if (v4 && peek_offset_cap) {...} everywhere, but then I made 
> an interesting discovery.
> 
> It turns out that, unless the ´-4' option is explicitly given on the 
> command line, all sockets are
> v6, even those that are later used as v4 sockets.

Not necessarily: if IPv6 support is disabled for other reasons, such as
the fact that we didn't find any routable IPv6 address, sockets will
also be IPv4-only. See tcp_sock_init().

> So, the set_peek_off() 
> call failed even
> for supposedly v4 sockets.
> 
> I checked this by adding a printout to the tcp_listen_handler(), and 
> noticed that all returns from
> the accept4() call goes into the AF_INET6 branch, even when the client 
> (iperf3) call is using an IPv4 address.
> During traffic, the very same socket is marked as v4 in the tcp_tap_conn 
> structure, and this seems to
> have worked just fine until I added the set_peek_offset call().
> 
> I believe this is an issue that has been introduced during the last 
> months, since I didn't start
> using the ´-4' option consistently until some months ago, and then it 
> worked.

That's not actually an issue, it's intended, see commit 8e914238b6de
("tcp: Use dual stack sockets for port forwarding when possible").

Could it be that you enabled IPv6 on your setup meanwhile, and that's
why you see this now? I think I tested your changes on an isolated
IPv4-only setup, as I didn't run a kernel version with SO_PEEK_OFF
support at the time.

-- 
Stefano


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] tcp: probe for SO_PEEK_OFF both in tcpv4 and tcp6
  2024-07-21  9:21   ` Stefano Brivio
@ 2024-07-22  0:55     ` David Gibson
  0 siblings, 0 replies; 5+ messages in thread
From: David Gibson @ 2024-07-22  0:55 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: Jon Maloy, passt-dev, lvivier, dgibson

[-- Attachment #1: Type: text/plain, Size: 2278 bytes --]

On Sun, Jul 21, 2024 at 11:21:18AM +0200, Stefano Brivio wrote:
> On Sat, 20 Jul 2024 10:46:07 -0400
> Jon Maloy <jmaloy@redhat.com> wrote:
> 
> > My first approach to this was to condition the use of SO_PEEK_OFF with 
> > tcpv4, e.g., basically
> > a test like if (v4 && peek_offset_cap) {...} everywhere, but then I made 
> > an interesting discovery.
> > 
> > It turns out that, unless the ´-4' option is explicitly given on the 
> > command line, all sockets are
> > v6, even those that are later used as v4 sockets.
> 
> Not necessarily: if IPv6 support is disabled for other reasons, such as
> the fact that we didn't find any routable IPv6 address, sockets will
> also be IPv4-only. See tcp_sock_init().

Also, if your forwarding options give explicit IPv4 addresses (or even
"0.0.0.0") you'll get IPv4 sockets...

> > So, the set_peek_off() 
> > call failed even
> > for supposedly v4 sockets.
> > 
> > I checked this by adding a printout to the tcp_listen_handler(), and 
> > noticed that all returns from
> > the accept4() call goes into the AF_INET6 branch, even when the client 
> > (iperf3) call is using an IPv4 address.
> > During traffic, the very same socket is marked as v4 in the tcp_tap_conn 
> > structure, and this seems to
> > have worked just fine until I added the set_peek_offset call().
> > 
> > I believe this is an issue that has been introduced during the last 
> > months, since I didn't start
> > using the ´-4' option consistently until some months ago, and then it 
> > worked.
> 
> That's not actually an issue, it's intended, see commit 8e914238b6de
> ("tcp: Use dual stack sockets for port forwarding when possible").

..because this is exactly the reason.  We're using dual stack sockets
to reduce the amount of kernel memory we consume with many forwards.

> Could it be that you enabled IPv6 on your setup meanwhile, and that's
> why you see this now? I think I tested your changes on an isolated
> IPv4-only setup, as I didn't run a kernel version with SO_PEEK_OFF
> support at the time.
> 

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-07-22  0:57 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-07-20 13:54 [PATCH] tcp: probe for SO_PEEK_OFF both in tcpv4 and tcp6 Jon Maloy
2024-07-20 14:46 ` Jon Maloy
2024-07-21  9:21   ` Stefano Brivio
2024-07-22  0:55     ` David Gibson
2024-07-21  9:20 ` Stefano Brivio

Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).