public inbox for passt-dev@passt.top
* [PATCH v2] tcp: probe for SO_PEEK_OFF both in tcpv4 and tcp6
@ 2024-07-22 22:09 Stefano Brivio
  2024-07-23  6:57 ` David Gibson
  2024-07-23 20:29 ` Stefano Brivio
  0 siblings, 2 replies; 6+ messages in thread
From: Stefano Brivio @ 2024-07-22 22:09 UTC (permalink / raw)
  To: passt-dev; +Cc: Jon Maloy

From: Jon Maloy <jmaloy@redhat.com>

Based on an original patch by Jon Maloy:

--
The recently added socket option SO_PEEK_OFF is not supported for
TCP/IPv6 sockets. Until we get that support into the kernel we need to
test for support in both protocols to set the global 'peek_offset_cap'
to true.
--

Compared to the original patch:
- only check for SO_PEEK_OFF support for enabled IP versions
- use sa_family_t instead of int to pass the address family around

Fixes: e63d281871ef ("tcp: leverage support of SO_PEEK_OFF socket option when available")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
 tcp.c | 37 +++++++++++++++++++++++++------------
 1 file changed, 25 insertions(+), 12 deletions(-)

diff --git a/tcp.c b/tcp.c
index 0c66ac8..c031f13 100644
--- a/tcp.c
+++ b/tcp.c
@@ -2470,6 +2470,29 @@ static void tcp_sock_refill_init(const struct ctx *c)
 	}
 }
 
+/**
+ * tcp_probe_peek_offset_cap() - Check if SO_PEEK_OFF is supported by kernel
+ * @af:		Address family, IPv4 or IPv6
+ *
+ * Return: true if supported, false otherwise
+ */
+bool tcp_probe_peek_offset_cap(sa_family_t af)
+{
+	bool ret = false;
+	int s, optv = 0;
+
+	s = socket(af, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
+	if (s < 0) {
+		warn_perror("Temporary TCP socket creation failed");
+	} else {
+		if (!setsockopt(s, SOL_SOCKET, SO_PEEK_OFF, &optv, sizeof(int)))
+			ret = true;
+		close(s);
+	}
+
+	return ret;
+}
+
 /**
  * tcp_init() - Get initial sequence, hash secret, initialise per-socket data
  * @c:		Execution context
@@ -2478,9 +2501,6 @@ static void tcp_sock_refill_init(const struct ctx *c)
  */
 int tcp_init(struct ctx *c)
 {
-	unsigned int optv = 0;
-	int s;
-
 	ASSERT(!c->no_tcp);
 
 	if (c->ifi4)
@@ -2502,15 +2522,8 @@ int tcp_init(struct ctx *c)
 		NS_CALL(tcp_ns_socks_init, c);
 	}
 
-	/* Probe for SO_PEEK_OFF support */
-	s = socket(AF_INET, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
-	if (s < 0) {
-		warn_perror("Temporary TCP socket creation failed");
-	} else {
-		if (!setsockopt(s, SOL_SOCKET, SO_PEEK_OFF, &optv, sizeof(int)))
-			peek_offset_cap = true;
-		close(s);
-	}
+	peek_offset_cap = (!c->ifi4 || tcp_probe_peek_offset_cap(AF_INET)) &&
+			  (!c->ifi6 || tcp_probe_peek_offset_cap(AF_INET6));
 	info("SO_PEEK_OFF%ssupported", peek_offset_cap ? " " : " not ");
 
 	return 0;
-- 
2.43.0


* Re: [PATCH v2] tcp: probe for SO_PEEK_OFF both in tcpv4 and tcp6
  2024-07-22 22:09 [PATCH v2] tcp: probe for SO_PEEK_OFF both in tcpv4 and tcp6 Stefano Brivio
@ 2024-07-23  6:57 ` David Gibson
  2024-07-23 20:29 ` Stefano Brivio
  1 sibling, 0 replies; 6+ messages in thread
From: David Gibson @ 2024-07-23  6:57 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, Jon Maloy

On Tue, Jul 23, 2024 at 12:09:37AM +0200, Stefano Brivio wrote:
> From: Jon Maloy <jmaloy@redhat.com>
> 
> Based on an original patch by Jon Maloy:

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson


* Re: [PATCH v2] tcp: probe for SO_PEEK_OFF both in tcpv4 and tcp6
  2024-07-22 22:09 [PATCH v2] tcp: probe for SO_PEEK_OFF both in tcpv4 and tcp6 Stefano Brivio
  2024-07-23  6:57 ` David Gibson
@ 2024-07-23 20:29 ` Stefano Brivio
  2024-07-24  0:40   ` David Gibson
  1 sibling, 1 reply; 6+ messages in thread
From: Stefano Brivio @ 2024-07-23 20:29 UTC (permalink / raw)
  To: passt-dev; +Cc: Jon Maloy, David Gibson

On Tue, 23 Jul 2024 00:09:37 +0200
Stefano Brivio <sbrivio@redhat.com> wrote:

> From: Jon Maloy <jmaloy@redhat.com>
> 
> Based on an original patch by Jon Maloy:
> 
> --
> The recently added socket option SO_PEEK_OFF is not supported for
> TCP/IPv6 sockets. Until we get that support into the kernel we need to
> test for support in both protocols to set the global 'peek_offset_cap'
> to true.
> --
> 
> Compared to the original patch:
> - only check for SO_PEEK_OFF support for enabled IP versions
> - use sa_family_t instead of int to pass the address family around
> 
> Fixes: e63d281871ef ("tcp: leverage support of SO_PEEK_OFF socket option when available")

...so, with this, the probing issue is solved: on a 6.10 kernel,
SO_PEEK_OFF is not used, unless I disable IPv6 (with --ipv4-only / -4).

However, if I do disable IPv6 and run IPv4-only, then for some reason, at
least together with the flow table (applying just this patch to HEAD), I
get something that looks like one of the "old" TCP stalls. On the host:

  $ ./passt -f -t 10000 -4

and in the guest:

  # ip link set dev eth0 up
  # dhclient eth0
  # iperf3 -s -p 10000

back to the host:

  $ iperf3 -c 127.0.0.1 -p 10000
  Connecting to host 127.0.0.1, port 10000
  [  5] local 127.0.0.1 port 39046 connected to 127.0.0.1 port 10000
  [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
  [  5]   0.00-1.00   sec  11.2 MBytes  94.3 Mbits/sec    0   5.50 MBytes       
  [  5]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec    0   5.50 MBytes       
  [  5]   2.00-3.00   sec  0.00 Bytes  0.00 bits/sec    0   5.50 MBytes       

...the transfer never recovers.

I didn't really have time to debug this further.

At the moment I would be inclined to temporarily revert commit
e63d281871ef ("tcp: leverage support of SO_PEEK_OFF socket option when
available"), but it's not a good idea if this happens to be hiding some
(unlikely?) issue with the flow table.

-- 
Stefano


* Re: [PATCH v2] tcp: probe for SO_PEEK_OFF both in tcpv4 and tcp6
  2024-07-23 20:29 ` Stefano Brivio
@ 2024-07-24  0:40   ` David Gibson
  2024-07-24  3:31     ` David Gibson
  0 siblings, 1 reply; 6+ messages in thread
From: David Gibson @ 2024-07-24  0:40 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, Jon Maloy

On Tue, Jul 23, 2024 at 10:29:36PM +0200, Stefano Brivio wrote:
> On Tue, 23 Jul 2024 00:09:37 +0200
> Stefano Brivio <sbrivio@redhat.com> wrote:
> 
> > From: Jon Maloy <jmaloy@redhat.com>
> > 
> > Based on an original patch by Jon Maloy:
> > 
> 
> ...so, with this, the probing issue is solved: on a 6.10 kernel,
> SO_PEEK_OFF is not used, unless I disable IPv6 (with --ipv4-only / -4).
> 
> However, if I do disable IPv6 and run IPv4-only, then for some reason, at
> least together with the flow table (applying just this patch to HEAD), I
> get something that looks like one of the "old" TCP stalls. On the host:
> 
>   $ ./passt -f -t 10000 -4
> 
> and in the guest:
> 
>   # ip link set dev eth0 up
>   # dhclient eth0
>   # iperf3 -s -p 10000
> 
> back to the host:
> 
>   $ iperf3 -c 127.0.0.1 -p 10000
>   Connecting to host 127.0.0.1, port 10000
>   [  5] local 127.0.0.1 port 39046 connected to 127.0.0.1 port 10000
>   [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
>   [  5]   0.00-1.00   sec  11.2 MBytes  94.3 Mbits/sec    0   5.50 MBytes       
>   [  5]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec    0   5.50 MBytes       
>   [  5]   2.00-3.00   sec  0.00 Bytes  0.00 bits/sec    0   5.50 MBytes       
> 
> ...the transfer never recovers.

Bother.  I've reproduced and am debugging now.

> I didn't really have time to debug this further.
> 
> At the moment I would be inclined to temporarily revert commit
> e63d281871ef ("tcp: leverage support of SO_PEEK_OFF socket option when
> available"), but it's not a good idea if this happens to be hiding some
> (unlikely?) issue with the flow table.
> 

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson


* Re: [PATCH v2] tcp: probe for SO_PEEK_OFF both in tcpv4 and tcp6
  2024-07-24  0:40   ` David Gibson
@ 2024-07-24  3:31     ` David Gibson
  2024-07-24  7:29       ` Stefano Brivio
  0 siblings, 1 reply; 6+ messages in thread
From: David Gibson @ 2024-07-24  3:31 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, Jon Maloy

On Wed, Jul 24, 2024 at 10:40:15AM +1000, David Gibson wrote:
> On Tue, Jul 23, 2024 at 10:29:36PM +0200, Stefano Brivio wrote:
> > On Tue, 23 Jul 2024 00:09:37 +0200
> > Stefano Brivio <sbrivio@redhat.com> wrote:
> > 
> > > From: Jon Maloy <jmaloy@redhat.com>
> > > 
> > > Based on an original patch by Jon Maloy:
> > > 
> > 
> > ...so, with this, the probing issue is solved: on a 6.10 kernel,
> > SO_PEEK_OFF is not used, unless I disable IPv6 (with --ipv4-only / -4).
> > 
> > However, if I do disable IPv6 and run IPv4-only, then for some reason, at
> > least together with the flow table (applying just this patch to HEAD), I
> > get something that looks like one of the "old" TCP stalls. On the host:
> > 
> >   $ ./passt -f -t 10000 -4
> > 
> > and in the guest:
> > 
> >   # ip link set dev eth0 up
> >   # dhclient eth0
> >   # iperf3 -s -p 10000
> > 
> > back to the host:
> > 
> >   $ iperf3 -c 127.0.0.1 -p 10000
> >   Connecting to host 127.0.0.1, port 10000
> >   [  5] local 127.0.0.1 port 39046 connected to 127.0.0.1 port 10000
> >   [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> >   [  5]   0.00-1.00   sec  11.2 MBytes  94.3 Mbits/sec    0   5.50 MBytes       
> >   [  5]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec    0   5.50 MBytes       
> >   [  5]   2.00-3.00   sec  0.00 Bytes  0.00 bits/sec    0   5.50 MBytes       
> > 
> > ...the transfer never recovers.
> 
> Bother.  I've reproduced and am debugging now.

Found it.  Looks like one of the cases where we need to set
SO_PEEK_OFF was lost somewhere in the refactorings :(.

> > I didn't really have time to debug this further.
> > 
> > At the moment I would be inclined to temporarily revert commit
> > e63d281871ef ("tcp: leverage support of SO_PEEK_OFF socket option when
> > available"), but it's not a good idea if this happens to be hiding some
> > (unlikely?) issue with the flow table.
> > 
> 



-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson


* Re: [PATCH v2] tcp: probe for SO_PEEK_OFF both in tcpv4 and tcp6
  2024-07-24  3:31     ` David Gibson
@ 2024-07-24  7:29       ` Stefano Brivio
  0 siblings, 0 replies; 6+ messages in thread
From: Stefano Brivio @ 2024-07-24  7:29 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev, Jon Maloy

On Wed, 24 Jul 2024 13:31:49 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Wed, Jul 24, 2024 at 10:40:15AM +1000, David Gibson wrote:
> > On Tue, Jul 23, 2024 at 10:29:36PM +0200, Stefano Brivio wrote:  
> > > On Tue, 23 Jul 2024 00:09:37 +0200
> > > Stefano Brivio <sbrivio@redhat.com> wrote:
> > >   
> > > > From: Jon Maloy <jmaloy@redhat.com>
> > > > 
> > > > Based on an original patch by Jon Maloy:
> > > >   
> > > 
> > > ...so, with this, the probing issue is solved: on a 6.10 kernel,
> > > SO_PEEK_OFF is not used, unless I disable IPv6 (with --ipv4-only / -4).
> > > 
> > > However, if I do disable IPv6 and run IPv4-only, then for some reason, at
> > > least together with the flow table (applying just this patch to HEAD), I
> > > get something that looks like one of the "old" TCP stalls. On the host:
> > > 
> > >   $ ./passt -f -t 10000 -4
> > > 
> > > and in the guest:
> > > 
> > >   # ip link set dev eth0 up
> > >   # dhclient eth0
> > >   # iperf3 -s -p 10000
> > > 
> > > back to the host:
> > > 
> > >   $ iperf3 -c 127.0.0.1 -p 10000
> > >   Connecting to host 127.0.0.1, port 10000
> > >   [  5] local 127.0.0.1 port 39046 connected to 127.0.0.1 port 10000
> > >   [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> > >   [  5]   0.00-1.00   sec  11.2 MBytes  94.3 Mbits/sec    0   5.50 MBytes       
> > >   [  5]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec    0   5.50 MBytes       
> > >   [  5]   2.00-3.00   sec  0.00 Bytes  0.00 bits/sec    0   5.50 MBytes       
> > > 
> > > ...the transfer never recovers.  
> > 
> > Bother.  I've reproduced and am debugging now.  
> 
> Found it.  Looks like one of the cases where we need to set
> SO_PEEK_OFF was lost somewhere in the refactorings :(.

Hah, great, thanks, it fixes the issue on my setup as well. Re-running
all tests now...

-- 
Stefano


end of thread, other threads:[~2024-07-24  7:29 UTC | newest]

Thread overview: 6+ messages
2024-07-22 22:09 [PATCH v2] tcp: probe for SO_PEEK_OFF both in tcpv4 and tcp6 Stefano Brivio
2024-07-23  6:57 ` David Gibson
2024-07-23 20:29 ` Stefano Brivio
2024-07-24  0:40   ` David Gibson
2024-07-24  3:31     ` David Gibson
2024-07-24  7:29       ` Stefano Brivio
