* [PATCH v2] tcp: probe for SO_PEEK_OFF both in tcpv4 and tcp6
@ 2024-07-22 22:09 Stefano Brivio
2024-07-23 6:57 ` David Gibson
2024-07-23 20:29 ` Stefano Brivio
0 siblings, 2 replies; 6+ messages in thread
From: Stefano Brivio @ 2024-07-22 22:09 UTC
To: passt-dev; +Cc: Jon Maloy
From: Jon Maloy <jmaloy@redhat.com>
Based on an original patch by Jon Maloy:
--
The recently added socket option SO_PEEK_OFF is not supported for
TCP/IPv6 sockets. Until we get that support into the kernel we need to
test for support in both protocols to set the global 'peek_offset_cap'
to true.
--
Compared to the original patch:
- only check for SO_PEEK_OFF support for enabled IP versions
- use sa_family_t instead of int to pass the address family around
Fixes: e63d281871ef ("tcp: leverage support of SO_PEEK_OFF socket option when available")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
tcp.c | 37 +++++++++++++++++++++++++------------
1 file changed, 25 insertions(+), 12 deletions(-)
diff --git a/tcp.c b/tcp.c
index 0c66ac8..c031f13 100644
--- a/tcp.c
+++ b/tcp.c
@@ -2470,6 +2470,29 @@ static void tcp_sock_refill_init(const struct ctx *c)
 	}
 }

+/**
+ * tcp_probe_peek_offset_cap() - Check if SO_PEEK_OFF is supported by kernel
+ * @af: Address family, IPv4 or IPv6
+ *
+ * Return: true if supported, false otherwise
+ */
+bool tcp_probe_peek_offset_cap(sa_family_t af)
+{
+	bool ret = false;
+	int s, optv = 0;
+
+	s = socket(af, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
+	if (s < 0) {
+		warn_perror("Temporary TCP socket creation failed");
+	} else {
+		if (!setsockopt(s, SOL_SOCKET, SO_PEEK_OFF, &optv, sizeof(int)))
+			ret = true;
+		close(s);
+	}
+
+	return ret;
+}
+
 /**
  * tcp_init() - Get initial sequence, hash secret, initialise per-socket data
  * @c: Execution context
@@ -2478,9 +2501,6 @@ static void tcp_sock_refill_init(const struct ctx *c)
  */
 int tcp_init(struct ctx *c)
 {
-	unsigned int optv = 0;
-	int s;
-
 	ASSERT(!c->no_tcp);

 	if (c->ifi4)
@@ -2502,15 +2522,8 @@ int tcp_init(struct ctx *c)
 		NS_CALL(tcp_ns_socks_init, c);
 	}

-	/* Probe for SO_PEEK_OFF support */
-	s = socket(AF_INET, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
-	if (s < 0) {
-		warn_perror("Temporary TCP socket creation failed");
-	} else {
-		if (!setsockopt(s, SOL_SOCKET, SO_PEEK_OFF, &optv, sizeof(int)))
-			peek_offset_cap = true;
-		close(s);
-	}
+	peek_offset_cap = (!c->ifi4 || tcp_probe_peek_offset_cap(AF_INET)) &&
+			  (!c->ifi6 || tcp_probe_peek_offset_cap(AF_INET6));
 	info("SO_PEEK_OFF%ssupported", peek_offset_cap ? " " : " not ");

 	return 0;
--
2.43.0
* Re: [PATCH v2] tcp: probe for SO_PEEK_OFF both in tcpv4 and tcp6
2024-07-22 22:09 [PATCH v2] tcp: probe for SO_PEEK_OFF both in tcpv4 and tcp6 Stefano Brivio
@ 2024-07-23 6:57 ` David Gibson
2024-07-23 20:29 ` Stefano Brivio
1 sibling, 0 replies; 6+ messages in thread
From: David Gibson @ 2024-07-23 6:57 UTC
To: Stefano Brivio; +Cc: passt-dev, Jon Maloy
On Tue, Jul 23, 2024 at 12:09:37AM +0200, Stefano Brivio wrote:
> From: Jon Maloy <jmaloy@redhat.com>
>
> Based on an original patch by Jon Maloy:
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
* Re: [PATCH v2] tcp: probe for SO_PEEK_OFF both in tcpv4 and tcp6
2024-07-22 22:09 [PATCH v2] tcp: probe for SO_PEEK_OFF both in tcpv4 and tcp6 Stefano Brivio
2024-07-23 6:57 ` David Gibson
@ 2024-07-23 20:29 ` Stefano Brivio
2024-07-24 0:40 ` David Gibson
1 sibling, 1 reply; 6+ messages in thread
From: Stefano Brivio @ 2024-07-23 20:29 UTC
To: passt-dev; +Cc: Jon Maloy, David Gibson
On Tue, 23 Jul 2024 00:09:37 +0200
Stefano Brivio <sbrivio@redhat.com> wrote:
> From: Jon Maloy <jmaloy@redhat.com>
>
> Based on an original patch by Jon Maloy:
>
> --
> The recently added socket option SO_PEEK_OFF is not supported for
> TCP/IPv6 sockets. Until we get that support into the kernel we need to
> test for support in both protocols to set the global 'peek_offset_cap'
> to true.
> --
>
> Compared to the original patch:
> - only check for SO_PEEK_OFF support for enabled IP versions
> - use sa_family_t instead of int to pass the address family around
>
> Fixes: e63d281871ef ("tcp: leverage support of SO_PEEK_OFF socket option when available")
...so, with this, the probing issue is solved: on a 6.10 kernel,
SO_PEEK_OFF is not used, unless I disable IPv6 (with --ipv4-only / -4).
However, if I disable it, for some reason, resorting to IPv4, at least
together with the flow table (applying just this patch to HEAD), I get
something that looks like one of the "old" TCP stalls. On the host:
$ ./passt -f -t 10000 -4
and in the guest:
# ip link set dev eth0 up
# dhclient eth0
# iperf3 -s -p 10000
back to the host:
$ iperf3 -c 127.0.0.1 -p 10000
Connecting to host 127.0.0.1, port 10000
[ 5] local 127.0.0.1 port 39046 connected to 127.0.0.1 port 10000
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 11.2 MBytes 94.3 Mbits/sec 0 5.50 MBytes
[ 5] 1.00-2.00 sec 0.00 Bytes 0.00 bits/sec 0 5.50 MBytes
[ 5] 2.00-3.00 sec 0.00 Bytes 0.00 bits/sec 0 5.50 MBytes
...the transfer never recovers.
I didn't really have time to debug this further.
At the moment I would be inclined to temporarily revert commit
e63d281871ef ("tcp: leverage support of SO_PEEK_OFF socket option when
available"), but it's not a good idea if this happens to be hiding some
(unlikely?) issue with the flow table.
--
Stefano
* Re: [PATCH v2] tcp: probe for SO_PEEK_OFF both in tcpv4 and tcp6
2024-07-23 20:29 ` Stefano Brivio
@ 2024-07-24 0:40 ` David Gibson
2024-07-24 3:31 ` David Gibson
0 siblings, 1 reply; 6+ messages in thread
From: David Gibson @ 2024-07-24 0:40 UTC
To: Stefano Brivio; +Cc: passt-dev, Jon Maloy
On Tue, Jul 23, 2024 at 10:29:36PM +0200, Stefano Brivio wrote:
> On Tue, 23 Jul 2024 00:09:37 +0200
> Stefano Brivio <sbrivio@redhat.com> wrote:
>
> > From: Jon Maloy <jmaloy@redhat.com>
> >
> > Based on an original patch by Jon Maloy:
> >
>
> ...so, with this, the probing issue is solved: on a 6.10 kernel,
> SO_PEEK_OFF is not used, unless I disable IPv6 (with --ipv4-only / -4).
>
> However, if I disable it, for some reason, resorting to IPv4, at least
> together with the flow table (applying just this patch to HEAD), I get
> something that looks like one of the "old" TCP stalls. On the host:
>
> $ ./passt -f -t 10000 -4
>
> and in the guest:
>
> # ip link set dev eth0 up
> # dhclient eth0
> # iperf3 -s -p 10000
>
> back to the host:
>
> $ iperf3 -c 127.0.0.1 -p 10000
> Connecting to host 127.0.0.1, port 10000
> [ 5] local 127.0.0.1 port 39046 connected to 127.0.0.1 port 10000
> [ ID] Interval Transfer Bitrate Retr Cwnd
> [ 5] 0.00-1.00 sec 11.2 MBytes 94.3 Mbits/sec 0 5.50 MBytes
> [ 5] 1.00-2.00 sec 0.00 Bytes 0.00 bits/sec 0 5.50 MBytes
> [ 5] 2.00-3.00 sec 0.00 Bytes 0.00 bits/sec 0 5.50 MBytes
>
> ...the transfer never recovers.
Bother. I've reproduced and am debugging now.
> I didn't really have time to debug this further.
>
> At the moment I would be inclined to temporarily revert commit
> e63d281871ef ("tcp: leverage support of SO_PEEK_OFF socket option when
> available"), but it's not a good idea if this happens to be hiding some
> (unlikely?) issue with the flow table.
>
* Re: [PATCH v2] tcp: probe for SO_PEEK_OFF both in tcpv4 and tcp6
2024-07-24 0:40 ` David Gibson
@ 2024-07-24 3:31 ` David Gibson
2024-07-24 7:29 ` Stefano Brivio
0 siblings, 1 reply; 6+ messages in thread
From: David Gibson @ 2024-07-24 3:31 UTC
To: Stefano Brivio; +Cc: passt-dev, Jon Maloy
On Wed, Jul 24, 2024 at 10:40:15AM +1000, David Gibson wrote:
> On Tue, Jul 23, 2024 at 10:29:36PM +0200, Stefano Brivio wrote:
> > On Tue, 23 Jul 2024 00:09:37 +0200
> > Stefano Brivio <sbrivio@redhat.com> wrote:
> >
> > > From: Jon Maloy <jmaloy@redhat.com>
> > >
> > > Based on an original patch by Jon Maloy:
> > >
> >
> > ...so, with this, the probing issue is solved: on a 6.10 kernel,
> > SO_PEEK_OFF is not used, unless I disable IPv6 (with --ipv4-only / -4).
> >
> > However, if I disable it, for some reason, resorting to IPv4, at least
> > together with the flow table (applying just this patch to HEAD), I get
> > something that looks like one of the "old" TCP stalls. On the host:
> >
> > $ ./passt -f -t 10000 -4
> >
> > and in the guest:
> >
> > # ip link set dev eth0 up
> > # dhclient eth0
> > # iperf3 -s -p 10000
> >
> > back to the host:
> >
> > $ iperf3 -c 127.0.0.1 -p 10000
> > Connecting to host 127.0.0.1, port 10000
> > [ 5] local 127.0.0.1 port 39046 connected to 127.0.0.1 port 10000
> > [ ID] Interval Transfer Bitrate Retr Cwnd
> > [ 5] 0.00-1.00 sec 11.2 MBytes 94.3 Mbits/sec 0 5.50 MBytes
> > [ 5] 1.00-2.00 sec 0.00 Bytes 0.00 bits/sec 0 5.50 MBytes
> > [ 5] 2.00-3.00 sec 0.00 Bytes 0.00 bits/sec 0 5.50 MBytes
> >
> > ...the transfer never recovers.
>
> Bother. I've reproduced and am debugging now.
Found it. Looks like one of the cases where we need to set
SO_PEEK_OFF was lost somewhere in the refactorings :(.
> > I didn't really have time to debug this further.
> >
> > At the moment I would be inclined to temporarily revert commit
> > e63d281871ef ("tcp: leverage support of SO_PEEK_OFF socket option when
> > available"), but it's not a good idea if this happens to be hiding some
> > (unlikely?) issue with the flow table.
> >
>
* Re: [PATCH v2] tcp: probe for SO_PEEK_OFF both in tcpv4 and tcp6
2024-07-24 3:31 ` David Gibson
@ 2024-07-24 7:29 ` Stefano Brivio
0 siblings, 0 replies; 6+ messages in thread
From: Stefano Brivio @ 2024-07-24 7:29 UTC
To: David Gibson; +Cc: passt-dev, Jon Maloy
On Wed, 24 Jul 2024 13:31:49 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:
> On Wed, Jul 24, 2024 at 10:40:15AM +1000, David Gibson wrote:
> > On Tue, Jul 23, 2024 at 10:29:36PM +0200, Stefano Brivio wrote:
> > > On Tue, 23 Jul 2024 00:09:37 +0200
> > > Stefano Brivio <sbrivio@redhat.com> wrote:
> > >
> > > > From: Jon Maloy <jmaloy@redhat.com>
> > > >
> > > > Based on an original patch by Jon Maloy:
> > > >
> > >
> > > ...so, with this, the probing issue is solved: on a 6.10 kernel,
> > > SO_PEEK_OFF is not used, unless I disable IPv6 (with --ipv4-only / -4).
> > >
> > > However, if I disable it, for some reason, resorting to IPv4, at least
> > > together with the flow table (applying just this patch to HEAD), I get
> > > something that looks like one of the "old" TCP stalls. On the host:
> > >
> > > $ ./passt -f -t 10000 -4
> > >
> > > and in the guest:
> > >
> > > # ip link set dev eth0 up
> > > # dhclient eth0
> > > # iperf3 -s -p 10000
> > >
> > > back to the host:
> > >
> > > $ iperf3 -c 127.0.0.1 -p 10000
> > > Connecting to host 127.0.0.1, port 10000
> > > [ 5] local 127.0.0.1 port 39046 connected to 127.0.0.1 port 10000
> > > [ ID] Interval Transfer Bitrate Retr Cwnd
> > > [ 5] 0.00-1.00 sec 11.2 MBytes 94.3 Mbits/sec 0 5.50 MBytes
> > > [ 5] 1.00-2.00 sec 0.00 Bytes 0.00 bits/sec 0 5.50 MBytes
> > > [ 5] 2.00-3.00 sec 0.00 Bytes 0.00 bits/sec 0 5.50 MBytes
> > >
> > > ...the transfer never recovers.
> >
> > Bother. I've reproduced and am debugging now.
>
> Found it. Looks like one of the cases where we need to set
> SO_PEEK_OFF was lost somewhere in the refactorings :(.
Hah, great, thanks, it fixes the issue on my setup as well. Re-running
all tests now...
--
Stefano
Code repositories for project(s) associated with this public inbox
https://passt.top/passt