* [PATCH v3] tcp: add support for SO_PEEK_OFF
@ 2024-02-09 22:12 jmaloy
2024-02-11 23:17 ` Stefano Brivio
2024-02-13 10:49 ` Paolo Abeni
0 siblings, 2 replies; 22+ messages in thread
From: jmaloy @ 2024-02-09 22:12 UTC (permalink / raw)
To: netdev, davem
Cc: kuba, passt-dev, jmaloy, sbrivio, lvivier, dgibson, Paolo Abeni
From: Jon Maloy <jmaloy@redhat.com>
When reading received messages from a socket with MSG_PEEK, we may want
to read the contents with an offset, like we can do with pread/preadv()
when reading files. Currently, it is not possible to do that.
In this commit, we add support for the SO_PEEK_OFF socket option for TCP,
in a similar way it is done for Unix Domain sockets.
In the iperf3 log examples shown below, we can observe a throughput
improvement of 15-20 % in the direction host->namespace when using the
protocol splicer 'pasta' (https://passt.top).
This is a consistent result.
pasta(1) and passt(1) implement user-mode networking for network
namespaces (containers) and virtual machines by means of a translation
layer between Layer-2 network interface and native Layer-4 sockets
(TCP, UDP, ICMP/ICMPv6 echo).
Received TCP data pending delivery to the container/guest is kept in kernel
buffers until acknowledged, so the tool routinely needs to fetch new
data from the socket, skipping data that was already sent.
At the moment this is implemented using a dummy buffer passed to
recvmsg(). With this change, we don't need a dummy buffer and the
related buffer copy (copy_to_user()) anymore.
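For illustration only (not part of the change itself), a minimal sketch of
how a splicer such as pasta can use this, assuming a connected TCP socket
's', a scratch buffer 'buf' and a byte count 'acked' are already defined,
and <sys/socket.h> is included; error handling omitted:

    int off = 0;

    /* enable the kernel-maintained peek offset, once per socket */
    setsockopt(s, SOL_SOCKET, SO_PEEK_OFF, &off, sizeof(off));

    /* peek data not yet forwarded; the offset advances by the amount
     * returned, so the next MSG_PEEK continues where this one ended */
    ssize_t n = recv(s, buf, sizeof(buf), MSG_PEEK);

    /* once the peer has acknowledged 'acked' bytes, drop them from the
     * receive queue without copying them anywhere (MSG_TRUNC with a NULL
     * buffer, see tcp(7)); the kernel then moves the peek offset back by
     * the same amount */
    recv(s, NULL, acked, MSG_TRUNC);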
passt and pasta are supported in KubeVirt and libvirt/qemu.
jmaloy@freyr:~/passt$ perf record -g ./pasta --config-net -f
SO_PEEK_OFF not supported by kernel.
jmaloy@freyr:~/passt# iperf3 -s
-----------------------------------------------------------
Server listening on 5201 (test #1)
-----------------------------------------------------------
Accepted connection from 192.168.122.1, port 44822
[ 5] local 192.168.122.180 port 5201 connected to 192.168.122.1 port 44832
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 1.02 GBytes 8.78 Gbits/sec
[ 5] 1.00-2.00 sec 1.06 GBytes 9.08 Gbits/sec
[ 5] 2.00-3.00 sec 1.07 GBytes 9.15 Gbits/sec
[ 5] 3.00-4.00 sec 1.10 GBytes 9.46 Gbits/sec
[ 5] 4.00-5.00 sec 1.03 GBytes 8.85 Gbits/sec
[ 5] 5.00-6.00 sec 1.10 GBytes 9.44 Gbits/sec
[ 5] 6.00-7.00 sec 1.11 GBytes 9.56 Gbits/sec
[ 5] 7.00-8.00 sec 1.07 GBytes 9.20 Gbits/sec
[ 5] 8.00-9.00 sec 667 MBytes 5.59 Gbits/sec
[ 5] 9.00-10.00 sec 1.03 GBytes 8.83 Gbits/sec
[ 5] 10.00-10.04 sec 30.1 MBytes 6.36 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 5] 0.00-10.04 sec 10.3 GBytes 8.78 Gbits/sec receiver
-----------------------------------------------------------
Server listening on 5201 (test #2)
-----------------------------------------------------------
^Ciperf3: interrupt - the server has terminated
jmaloy@freyr:~/passt#
logout
[ perf record: Woken up 23 times to write data ]
[ perf record: Captured and wrote 5.696 MB perf.data (35580 samples) ]
jmaloy@freyr:~/passt$
jmaloy@freyr:~/passt$ perf record -g ./pasta --config-net -f
SO_PEEK_OFF supported by kernel.
jmaloy@freyr:~/passt# iperf3 -s
-----------------------------------------------------------
Server listening on 5201 (test #1)
-----------------------------------------------------------
Accepted connection from 192.168.122.1, port 52084
[ 5] local 192.168.122.180 port 5201 connected to 192.168.122.1 port 52098
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 1.32 GBytes 11.3 Gbits/sec
[ 5] 1.00-2.00 sec 1.19 GBytes 10.2 Gbits/sec
[ 5] 2.00-3.00 sec 1.26 GBytes 10.8 Gbits/sec
[ 5] 3.00-4.00 sec 1.36 GBytes 11.7 Gbits/sec
[ 5] 4.00-5.00 sec 1.33 GBytes 11.4 Gbits/sec
[ 5] 5.00-6.00 sec 1.21 GBytes 10.4 Gbits/sec
[ 5] 6.00-7.00 sec 1.31 GBytes 11.2 Gbits/sec
[ 5] 7.00-8.00 sec 1.25 GBytes 10.7 Gbits/sec
[ 5] 8.00-9.00 sec 1.33 GBytes 11.5 Gbits/sec
[ 5] 9.00-10.00 sec 1.24 GBytes 10.7 Gbits/sec
[ 5] 10.00-10.04 sec 56.0 MBytes 12.1 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 5] 0.00-10.04 sec 12.9 GBytes 11.0 Gbits/sec receiver
-----------------------------------------------------------
Server listening on 5201 (test #2)
-----------------------------------------------------------
^Ciperf3: interrupt - the server has terminated
logout
[ perf record: Woken up 20 times to write data ]
[ perf record: Captured and wrote 5.040 MB perf.data (33411 samples) ]
jmaloy@freyr:~/passt$
The perf record confirms this result. Below, we can observe that the
CPU spends significantly less time in the function ____sys_recvmsg()
when we have offset support.
Without offset support:
----------------------
jmaloy@freyr:~/passt$ perf report -q --symbol-filter=do_syscall_64 \
-p ____sys_recvmsg -x --stdio -i perf.data | head -1
46.32% 0.00% passt.avx2 [kernel.vmlinux] [k] do_syscall_64 ____sys_recvmsg
With offset support:
----------------------
jmaloy@freyr:~/passt$ perf report -q --symbol-filter=do_syscall_64 \
-p ____sys_recvmsg -x --stdio -i perf.data | head -1
28.12% 0.00% passt.avx2 [kernel.vmlinux] [k] do_syscall_64 ____sys_recvmsg
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
v3: - Applied changes suggested by Stefano Brivio and Paolo Abeni
---
net/ipv4/af_inet.c | 1 +
net/ipv4/tcp.c | 16 ++++++++++------
2 files changed, 11 insertions(+), 6 deletions(-)
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 4e635dd3d3c8..5f0e5d10c416 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1071,6 +1071,7 @@ const struct proto_ops inet_stream_ops = {
#endif
.splice_eof = inet_splice_eof,
.splice_read = tcp_splice_read,
+ .set_peek_off = sk_set_peek_off,
.read_sock = tcp_read_sock,
.read_skb = tcp_read_skb,
.sendmsg_locked = tcp_sendmsg_locked,
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 7e2481b9eae1..1c8cab14a32c 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1415,8 +1415,6 @@ static int tcp_peek_sndq(struct sock *sk, struct msghdr *msg, int len)
struct sk_buff *skb;
int copied = 0, err = 0;
- /* XXX -- need to support SO_PEEK_OFF */
-
skb_rbtree_walk(skb, &sk->tcp_rtx_queue) {
err = skb_copy_datagram_msg(skb, 0, msg, skb->len);
if (err)
@@ -2327,6 +2325,7 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len,
int target; /* Read at least this many bytes */
long timeo;
struct sk_buff *skb, *last;
+ u32 peek_offset = 0;
u32 urg_hole = 0;
err = -ENOTCONN;
@@ -2360,7 +2359,8 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len,
seq = &tp->copied_seq;
if (flags & MSG_PEEK) {
- peek_seq = tp->copied_seq;
+ peek_offset = max(sk_peek_offset(sk, flags), 0);
+ peek_seq = tp->copied_seq + peek_offset;
seq = &peek_seq;
}
@@ -2463,11 +2463,11 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len,
}
if ((flags & MSG_PEEK) &&
- (peek_seq - copied - urg_hole != tp->copied_seq)) {
+ (peek_seq - peek_offset - copied - urg_hole != tp->copied_seq)) {
net_dbg_ratelimited("TCP(%s:%d): Application bug, race in MSG_PEEK\n",
current->comm,
task_pid_nr(current));
- peek_seq = tp->copied_seq;
+ peek_seq = tp->copied_seq + peek_offset;
}
continue;
@@ -2508,7 +2508,10 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len,
WRITE_ONCE(*seq, *seq + used);
copied += used;
len -= used;
-
+ if (flags & MSG_PEEK)
+ sk_peek_offset_fwd(sk, used);
+ else
+ sk_peek_offset_bwd(sk, used);
tcp_rcv_space_adjust(sk);
skip_copy:
@@ -3007,6 +3010,7 @@ int tcp_disconnect(struct sock *sk, int flags)
__skb_queue_purge(&sk->sk_receive_queue);
WRITE_ONCE(tp->copied_seq, tp->rcv_nxt);
WRITE_ONCE(tp->urg_data, 0);
+ sk_set_peek_off(sk, -1);
tcp_write_queue_purge(sk);
tcp_fastopen_active_disable_ofo_check(sk);
skb_rbtree_purge(&tp->out_of_order_queue);
--
2.42.0
* Re: [PATCH v3] tcp: add support for SO_PEEK_OFF
2024-02-09 22:12 [PATCH v3] tcp: add support for SO_PEEK_OFF jmaloy
@ 2024-02-11 23:17 ` Stefano Brivio
2024-02-13 10:49 ` Paolo Abeni
1 sibling, 0 replies; 22+ messages in thread
From: Stefano Brivio @ 2024-02-11 23:17 UTC (permalink / raw)
To: jmaloy; +Cc: netdev, davem, kuba, passt-dev, lvivier, dgibson, Paolo Abeni
On Fri, 9 Feb 2024 17:12:33 -0500
jmaloy@redhat.com wrote:
> From: Jon Maloy <jmaloy@redhat.com>
>
> When reading received messages from a socket with MSG_PEEK, we may want
> to read the contents with an offset, like we can do with pread/preadv()
> when reading files. Currently, it is not possible to do that.
>
> In this commit, we add support for the SO_PEEK_OFF socket option for TCP,
> in a similar way it is done for Unix Domain sockets.
>
> In the iperf3 log examples shown below, we can observe a throughput
> improvement of 15-20 % in the direction host->namespace when using the
> protocol splicer 'pasta' (https://passt.top).
> This is a consistent result.
>
> pasta(1) and passt(1) implement user-mode networking for network
> namespaces (containers) and virtual machines by means of a translation
> layer between Layer-2 network interface and native Layer-4 sockets
> (TCP, UDP, ICMP/ICMPv6 echo).
>
> Received TCP data pending delivery to the container/guest is kept in kernel
> buffers until acknowledged, so the tool routinely needs to fetch new
> data from the socket, skipping data that was already sent.
>
> At the moment this is implemented using a dummy buffer passed to
> recvmsg(). With this change, we don't need a dummy buffer and the
> related buffer copy (copy_to_user()) anymore.
>
> passt and pasta are supported in KubeVirt and libvirt/qemu.
>
> jmaloy@freyr:~/passt$ perf record -g ./pasta --config-net -f
> SO_PEEK_OFF not supported by kernel.
>
> jmaloy@freyr:~/passt# iperf3 -s
> -----------------------------------------------------------
> Server listening on 5201 (test #1)
> -----------------------------------------------------------
> Accepted connection from 192.168.122.1, port 44822
> [ 5] local 192.168.122.180 port 5201 connected to 192.168.122.1 port 44832
> [ ID] Interval Transfer Bitrate
> [ 5] 0.00-1.00 sec 1.02 GBytes 8.78 Gbits/sec
> [ 5] 1.00-2.00 sec 1.06 GBytes 9.08 Gbits/sec
> [ 5] 2.00-3.00 sec 1.07 GBytes 9.15 Gbits/sec
> [ 5] 3.00-4.00 sec 1.10 GBytes 9.46 Gbits/sec
> [ 5] 4.00-5.00 sec 1.03 GBytes 8.85 Gbits/sec
> [ 5] 5.00-6.00 sec 1.10 GBytes 9.44 Gbits/sec
> [ 5] 6.00-7.00 sec 1.11 GBytes 9.56 Gbits/sec
> [ 5] 7.00-8.00 sec 1.07 GBytes 9.20 Gbits/sec
> [ 5] 8.00-9.00 sec 667 MBytes 5.59 Gbits/sec
> [ 5] 9.00-10.00 sec 1.03 GBytes 8.83 Gbits/sec
> [ 5] 10.00-10.04 sec 30.1 MBytes 6.36 Gbits/sec
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval Transfer Bitrate
> [ 5] 0.00-10.04 sec 10.3 GBytes 8.78 Gbits/sec receiver
> -----------------------------------------------------------
> Server listening on 5201 (test #2)
> -----------------------------------------------------------
> ^Ciperf3: interrupt - the server has terminated
> jmaloy@freyr:~/passt#
> logout
> [ perf record: Woken up 23 times to write data ]
> [ perf record: Captured and wrote 5.696 MB perf.data (35580 samples) ]
> jmaloy@freyr:~/passt$
>
> jmaloy@freyr:~/passt$ perf record -g ./pasta --config-net -f
> SO_PEEK_OFF supported by kernel.
>
> jmaloy@freyr:~/passt# iperf3 -s
> -----------------------------------------------------------
> Server listening on 5201 (test #1)
> -----------------------------------------------------------
> Accepted connection from 192.168.122.1, port 52084
> [ 5] local 192.168.122.180 port 5201 connected to 192.168.122.1 port 52098
> [ ID] Interval Transfer Bitrate
> [ 5] 0.00-1.00 sec 1.32 GBytes 11.3 Gbits/sec
> [ 5] 1.00-2.00 sec 1.19 GBytes 10.2 Gbits/sec
> [ 5] 2.00-3.00 sec 1.26 GBytes 10.8 Gbits/sec
> [ 5] 3.00-4.00 sec 1.36 GBytes 11.7 Gbits/sec
> [ 5] 4.00-5.00 sec 1.33 GBytes 11.4 Gbits/sec
> [ 5] 5.00-6.00 sec 1.21 GBytes 10.4 Gbits/sec
> [ 5] 6.00-7.00 sec 1.31 GBytes 11.2 Gbits/sec
> [ 5] 7.00-8.00 sec 1.25 GBytes 10.7 Gbits/sec
> [ 5] 8.00-9.00 sec 1.33 GBytes 11.5 Gbits/sec
> [ 5] 9.00-10.00 sec 1.24 GBytes 10.7 Gbits/sec
> [ 5] 10.00-10.04 sec 56.0 MBytes 12.1 Gbits/sec
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval Transfer Bitrate
> [ 5] 0.00-10.04 sec 12.9 GBytes 11.0 Gbits/sec receiver
> -----------------------------------------------------------
> Server listening on 5201 (test #2)
> -----------------------------------------------------------
> ^Ciperf3: interrupt - the server has terminated
> logout
> [ perf record: Woken up 20 times to write data ]
> [ perf record: Captured and wrote 5.040 MB perf.data (33411 samples) ]
> jmaloy@freyr:~/passt$
>
> The perf record confirms this result. Below, we can observe that the
> CPU spends significantly less time in the function ____sys_recvmsg()
> when we have offset support.
>
> Without offset support:
> ----------------------
> jmaloy@freyr:~/passt$ perf report -q --symbol-filter=do_syscall_64 \
> -p ____sys_recvmsg -x --stdio -i perf.data | head -1
> 46.32% 0.00% passt.avx2 [kernel.vmlinux] [k] do_syscall_64 ____sys_recvmsg
>
> With offset support:
> ----------------------
> jmaloy@freyr:~/passt$ perf report -q --symbol-filter=do_syscall_64 \
> -p ____sys_recvmsg -x --stdio -i perf.data | head -1
> 28.12% 0.00% passt.avx2 [kernel.vmlinux] [k] do_syscall_64 ____sys_recvmsg
>
> Suggested-by: Paolo Abeni <pabeni@redhat.com>
> Signed-off-by: Jon Maloy <jmaloy@redhat.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
--
Stefano
* Re: [PATCH v3] tcp: add support for SO_PEEK_OFF
2024-02-09 22:12 [PATCH v3] tcp: add support for SO_PEEK_OFF jmaloy
2024-02-11 23:17 ` Stefano Brivio
@ 2024-02-13 10:49 ` Paolo Abeni
2024-02-13 12:24 ` Eric Dumazet
1 sibling, 1 reply; 22+ messages in thread
From: Paolo Abeni @ 2024-02-13 10:49 UTC (permalink / raw)
To: Eric Dumazet
Cc: kuba, passt-dev, sbrivio, lvivier, dgibson, jmaloy, netdev, davem
Oops,
I just noticed Eric is missing from the recipients list, adding him
now.
On Fri, 2024-02-09 at 17:12 -0500, jmaloy@redhat.com wrote:
> From: Jon Maloy <jmaloy@redhat.com>
>
> When reading received messages from a socket with MSG_PEEK, we may want
> to read the contents with an offset, like we can do with pread/preadv()
> when reading files. Currently, it is not possible to do that.
>
> In this commit, we add support for the SO_PEEK_OFF socket option for TCP,
> in a similar way it is done for Unix Domain sockets.
>
> In the iperf3 log examples shown below, we can observe a throughput
> improvement of 15-20 % in the direction host->namespace when using the
> protocol splicer 'pasta' (https://passt.top).
> This is a consistent result.
>
> pasta(1) and passt(1) implement user-mode networking for network
> namespaces (containers) and virtual machines by means of a translation
> layer between Layer-2 network interface and native Layer-4 sockets
> (TCP, UDP, ICMP/ICMPv6 echo).
>
> Received TCP data pending delivery to the container/guest is kept in kernel
> buffers until acknowledged, so the tool routinely needs to fetch new
> data from the socket, skipping data that was already sent.
>
> At the moment this is implemented using a dummy buffer passed to
> recvmsg(). With this change, we don't need a dummy buffer and the
> related buffer copy (copy_to_user()) anymore.
>
> passt and pasta are supported in KubeVirt and libvirt/qemu.
>
> jmaloy@freyr:~/passt$ perf record -g ./pasta --config-net -f
> SO_PEEK_OFF not supported by kernel.
>
> jmaloy@freyr:~/passt# iperf3 -s
> -----------------------------------------------------------
> Server listening on 5201 (test #1)
> -----------------------------------------------------------
> Accepted connection from 192.168.122.1, port 44822
> [ 5] local 192.168.122.180 port 5201 connected to 192.168.122.1 port 44832
> [ ID] Interval Transfer Bitrate
> [ 5] 0.00-1.00 sec 1.02 GBytes 8.78 Gbits/sec
> [ 5] 1.00-2.00 sec 1.06 GBytes 9.08 Gbits/sec
> [ 5] 2.00-3.00 sec 1.07 GBytes 9.15 Gbits/sec
> [ 5] 3.00-4.00 sec 1.10 GBytes 9.46 Gbits/sec
> [ 5] 4.00-5.00 sec 1.03 GBytes 8.85 Gbits/sec
> [ 5] 5.00-6.00 sec 1.10 GBytes 9.44 Gbits/sec
> [ 5] 6.00-7.00 sec 1.11 GBytes 9.56 Gbits/sec
> [ 5] 7.00-8.00 sec 1.07 GBytes 9.20 Gbits/sec
> [ 5] 8.00-9.00 sec 667 MBytes 5.59 Gbits/sec
> [ 5] 9.00-10.00 sec 1.03 GBytes 8.83 Gbits/sec
> [ 5] 10.00-10.04 sec 30.1 MBytes 6.36 Gbits/sec
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval Transfer Bitrate
> [ 5] 0.00-10.04 sec 10.3 GBytes 8.78 Gbits/sec receiver
> -----------------------------------------------------------
> Server listening on 5201 (test #2)
> -----------------------------------------------------------
> ^Ciperf3: interrupt - the server has terminated
> jmaloy@freyr:~/passt#
> logout
> [ perf record: Woken up 23 times to write data ]
> [ perf record: Captured and wrote 5.696 MB perf.data (35580 samples) ]
> jmaloy@freyr:~/passt$
>
> jmaloy@freyr:~/passt$ perf record -g ./pasta --config-net -f
> SO_PEEK_OFF supported by kernel.
>
> jmaloy@freyr:~/passt# iperf3 -s
> -----------------------------------------------------------
> Server listening on 5201 (test #1)
> -----------------------------------------------------------
> Accepted connection from 192.168.122.1, port 52084
> [ 5] local 192.168.122.180 port 5201 connected to 192.168.122.1 port 52098
> [ ID] Interval Transfer Bitrate
> [ 5] 0.00-1.00 sec 1.32 GBytes 11.3 Gbits/sec
> [ 5] 1.00-2.00 sec 1.19 GBytes 10.2 Gbits/sec
> [ 5] 2.00-3.00 sec 1.26 GBytes 10.8 Gbits/sec
> [ 5] 3.00-4.00 sec 1.36 GBytes 11.7 Gbits/sec
> [ 5] 4.00-5.00 sec 1.33 GBytes 11.4 Gbits/sec
> [ 5] 5.00-6.00 sec 1.21 GBytes 10.4 Gbits/sec
> [ 5] 6.00-7.00 sec 1.31 GBytes 11.2 Gbits/sec
> [ 5] 7.00-8.00 sec 1.25 GBytes 10.7 Gbits/sec
> [ 5] 8.00-9.00 sec 1.33 GBytes 11.5 Gbits/sec
> [ 5] 9.00-10.00 sec 1.24 GBytes 10.7 Gbits/sec
> [ 5] 10.00-10.04 sec 56.0 MBytes 12.1 Gbits/sec
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval Transfer Bitrate
> [ 5] 0.00-10.04 sec 12.9 GBytes 11.0 Gbits/sec receiver
> -----------------------------------------------------------
> Server listening on 5201 (test #2)
> -----------------------------------------------------------
> ^Ciperf3: interrupt - the server has terminated
> logout
> [ perf record: Woken up 20 times to write data ]
> [ perf record: Captured and wrote 5.040 MB perf.data (33411 samples) ]
> jmaloy@freyr:~/passt$
>
> The perf record confirms this result. Below, we can observe that the
> CPU spends significantly less time in the function ____sys_recvmsg()
> when we have offset support.
>
> Without offset support:
> ----------------------
> jmaloy@freyr:~/passt$ perf report -q --symbol-filter=do_syscall_64 \
> -p ____sys_recvmsg -x --stdio -i perf.data | head -1
> 46.32% 0.00% passt.avx2 [kernel.vmlinux] [k] do_syscall_64 ____sys_recvmsg
>
> With offset support:
> ----------------------
> jmaloy@freyr:~/passt$ perf report -q --symbol-filter=do_syscall_64 \
> -p ____sys_recvmsg -x --stdio -i perf.data | head -1
> 28.12% 0.00% passt.avx2 [kernel.vmlinux] [k] do_syscall_64 ____sys_recvmsg
>
> Suggested-by: Paolo Abeni <pabeni@redhat.com>
> Signed-off-by: Jon Maloy <jmaloy@redhat.com>
>
> ---
> v3: - Applied changes suggested by Stefano Brivio and Paolo Abeni
> ---
> net/ipv4/af_inet.c | 1 +
> net/ipv4/tcp.c | 16 ++++++++++------
> 2 files changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index 4e635dd3d3c8..5f0e5d10c416 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -1071,6 +1071,7 @@ const struct proto_ops inet_stream_ops = {
> #endif
> .splice_eof = inet_splice_eof,
> .splice_read = tcp_splice_read,
> + .set_peek_off = sk_set_peek_off,
> .read_sock = tcp_read_sock,
> .read_skb = tcp_read_skb,
> .sendmsg_locked = tcp_sendmsg_locked,
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 7e2481b9eae1..1c8cab14a32c 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -1415,8 +1415,6 @@ static int tcp_peek_sndq(struct sock *sk, struct msghdr *msg, int len)
> struct sk_buff *skb;
> int copied = 0, err = 0;
>
> - /* XXX -- need to support SO_PEEK_OFF */
> -
> skb_rbtree_walk(skb, &sk->tcp_rtx_queue) {
> err = skb_copy_datagram_msg(skb, 0, msg, skb->len);
> if (err)
> @@ -2327,6 +2325,7 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len,
> int target; /* Read at least this many bytes */
> long timeo;
> struct sk_buff *skb, *last;
> + u32 peek_offset = 0;
> u32 urg_hole = 0;
>
> err = -ENOTCONN;
> @@ -2360,7 +2359,8 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len,
>
> seq = &tp->copied_seq;
> if (flags & MSG_PEEK) {
> - peek_seq = tp->copied_seq;
> + peek_offset = max(sk_peek_offset(sk, flags), 0);
> + peek_seq = tp->copied_seq + peek_offset;
> seq = &peek_seq;
> }
>
> @@ -2463,11 +2463,11 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len,
> }
>
> if ((flags & MSG_PEEK) &&
> - (peek_seq - copied - urg_hole != tp->copied_seq)) {
> + (peek_seq - peek_offset - copied - urg_hole != tp->copied_seq)) {
> net_dbg_ratelimited("TCP(%s:%d): Application bug, race in MSG_PEEK\n",
> current->comm,
> task_pid_nr(current));
> - peek_seq = tp->copied_seq;
> + peek_seq = tp->copied_seq + peek_offset;
> }
> continue;
>
> @@ -2508,7 +2508,10 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len,
> WRITE_ONCE(*seq, *seq + used);
> copied += used;
> len -= used;
> -
> + if (flags & MSG_PEEK)
> + sk_peek_offset_fwd(sk, used);
> + else
> + sk_peek_offset_bwd(sk, used);
> tcp_rcv_space_adjust(sk);
>
> skip_copy:
> @@ -3007,6 +3010,7 @@ int tcp_disconnect(struct sock *sk, int flags)
> __skb_queue_purge(&sk->sk_receive_queue);
> WRITE_ONCE(tp->copied_seq, tp->rcv_nxt);
> WRITE_ONCE(tp->urg_data, 0);
> + sk_set_peek_off(sk, -1);
> tcp_write_queue_purge(sk);
> tcp_fastopen_active_disable_ofo_check(sk);
> skb_rbtree_purge(&tp->out_of_order_queue);
* Re: [PATCH v3] tcp: add support for SO_PEEK_OFF
2024-02-13 10:49 ` Paolo Abeni
@ 2024-02-13 12:24 ` Eric Dumazet
2024-02-13 13:02 ` Paolo Abeni
0 siblings, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2024-02-13 12:24 UTC (permalink / raw)
To: Paolo Abeni
Cc: kuba, passt-dev, sbrivio, lvivier, dgibson, jmaloy, netdev, davem
On Tue, Feb 13, 2024 at 11:49 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> Oops,
>
> I just noticed Eric is missing from the recipients list, adding him
> now.
>
Hmmm thanks.
> On Fri, 2024-02-09 at 17:12 -0500, jmaloy@redhat.com wrote:
> > From: Jon Maloy <jmaloy@redhat.com>
> >
> > When reading received messages from a socket with MSG_PEEK, we may want
> > to read the contents with an offset, like we can do with pread/preadv()
> > when reading files. Currently, it is not possible to do that.
> >
> > In this commit, we add support for the SO_PEEK_OFF socket option for TCP,
> > in a similar way it is done for Unix Domain sockets.
> >
> > In the iperf3 log examples shown below, we can observe a throughput
> > improvement of 15-20 % in the direction host->namespace when using the
> > protocol splicer 'pasta' (https://passt.top).
> > This is a consistent result.
> >
> > pasta(1) and passt(1) implement user-mode networking for network
> > namespaces (containers) and virtual machines by means of a translation
> > layer between Layer-2 network interface and native Layer-4 sockets
> > (TCP, UDP, ICMP/ICMPv6 echo).
> >
> > Received TCP data pending delivery to the container/guest is kept in kernel
> > buffers until acknowledged, so the tool routinely needs to fetch new
> > data from the socket, skipping data that was already sent.
> >
> > At the moment this is implemented using a dummy buffer passed to
> > recvmsg(). With this change, we don't need a dummy buffer and the
> > related buffer copy (copy_to_user()) anymore.
> >
> > passt and pasta are supported in KubeVirt and libvirt/qemu.
> >
> > jmaloy@freyr:~/passt$ perf record -g ./pasta --config-net -f
> > SO_PEEK_OFF not supported by kernel.
> >
> > jmaloy@freyr:~/passt# iperf3 -s
> > -----------------------------------------------------------
> > Server listening on 5201 (test #1)
> > -----------------------------------------------------------
> > Accepted connection from 192.168.122.1, port 44822
> > [ 5] local 192.168.122.180 port 5201 connected to 192.168.122.1 port 44832
> > [ ID] Interval Transfer Bitrate
> > [ 5] 0.00-1.00 sec 1.02 GBytes 8.78 Gbits/sec
> > [ 5] 1.00-2.00 sec 1.06 GBytes 9.08 Gbits/sec
> > [ 5] 2.00-3.00 sec 1.07 GBytes 9.15 Gbits/sec
> > [ 5] 3.00-4.00 sec 1.10 GBytes 9.46 Gbits/sec
> > [ 5] 4.00-5.00 sec 1.03 GBytes 8.85 Gbits/sec
> > [ 5] 5.00-6.00 sec 1.10 GBytes 9.44 Gbits/sec
> > [ 5] 6.00-7.00 sec 1.11 GBytes 9.56 Gbits/sec
> > [ 5] 7.00-8.00 sec 1.07 GBytes 9.20 Gbits/sec
> > [ 5] 8.00-9.00 sec 667 MBytes 5.59 Gbits/sec
> > [ 5] 9.00-10.00 sec 1.03 GBytes 8.83 Gbits/sec
> > [ 5] 10.00-10.04 sec 30.1 MBytes 6.36 Gbits/sec
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID] Interval Transfer Bitrate
> > [ 5] 0.00-10.04 sec 10.3 GBytes 8.78 Gbits/sec receiver
> > -----------------------------------------------------------
> > Server listening on 5201 (test #2)
> > -----------------------------------------------------------
> > ^Ciperf3: interrupt - the server has terminated
> > jmaloy@freyr:~/passt#
> > logout
> > [ perf record: Woken up 23 times to write data ]
> > [ perf record: Captured and wrote 5.696 MB perf.data (35580 samples) ]
> > jmaloy@freyr:~/passt$
> >
> > jmaloy@freyr:~/passt$ perf record -g ./pasta --config-net -f
> > SO_PEEK_OFF supported by kernel.
> >
> > jmaloy@freyr:~/passt# iperf3 -s
> > -----------------------------------------------------------
> > Server listening on 5201 (test #1)
> > -----------------------------------------------------------
> > Accepted connection from 192.168.122.1, port 52084
> > [ 5] local 192.168.122.180 port 5201 connected to 192.168.122.1 port 52098
> > [ ID] Interval Transfer Bitrate
> > [ 5] 0.00-1.00 sec 1.32 GBytes 11.3 Gbits/sec
> > [ 5] 1.00-2.00 sec 1.19 GBytes 10.2 Gbits/sec
> > [ 5] 2.00-3.00 sec 1.26 GBytes 10.8 Gbits/sec
> > [ 5] 3.00-4.00 sec 1.36 GBytes 11.7 Gbits/sec
> > [ 5] 4.00-5.00 sec 1.33 GBytes 11.4 Gbits/sec
> > [ 5] 5.00-6.00 sec 1.21 GBytes 10.4 Gbits/sec
> > [ 5] 6.00-7.00 sec 1.31 GBytes 11.2 Gbits/sec
> > [ 5] 7.00-8.00 sec 1.25 GBytes 10.7 Gbits/sec
> > [ 5] 8.00-9.00 sec 1.33 GBytes 11.5 Gbits/sec
> > [ 5] 9.00-10.00 sec 1.24 GBytes 10.7 Gbits/sec
> > [ 5] 10.00-10.04 sec 56.0 MBytes 12.1 Gbits/sec
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID] Interval Transfer Bitrate
> > [ 5] 0.00-10.04 sec 12.9 GBytes 11.0 Gbits/sec receiver
> > -----------------------------------------------------------
> > Server listening on 5201 (test #2)
> > -----------------------------------------------------------
> > ^Ciperf3: interrupt - the server has terminated
> > logout
> > [ perf record: Woken up 20 times to write data ]
> > [ perf record: Captured and wrote 5.040 MB perf.data (33411 samples) ]
> > jmaloy@freyr:~/passt$
> >
> > The perf record confirms this result. Below, we can observe that the
> > CPU spends significantly less time in the function ____sys_recvmsg()
> > when we have offset support.
> >
> > Without offset support:
> > ----------------------
> > jmaloy@freyr:~/passt$ perf report -q --symbol-filter=do_syscall_64 \
> > -p ____sys_recvmsg -x --stdio -i perf.data | head -1
> > 46.32% 0.00% passt.avx2 [kernel.vmlinux] [k] do_syscall_64 ____sys_recvmsg
> >
> > With offset support:
> > ----------------------
> > jmaloy@freyr:~/passt$ perf report -q --symbol-filter=do_syscall_64 \
> > -p ____sys_recvmsg -x --stdio -i perf.data | head -1
> > 28.12% 0.00% passt.avx2 [kernel.vmlinux] [k] do_syscall_64 ____sys_recvmsg
> >
> > Suggested-by: Paolo Abeni <pabeni@redhat.com>
> > Signed-off-by: Jon Maloy <jmaloy@redhat.com>
> >
> > ---
> > v3: - Applied changes suggested by Stefano Brivio and Paolo Abeni
> > ---
> > net/ipv4/af_inet.c | 1 +
> > net/ipv4/tcp.c | 16 ++++++++++------
> > 2 files changed, 11 insertions(+), 6 deletions(-)
> >
> > diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> > index 4e635dd3d3c8..5f0e5d10c416 100644
> > --- a/net/ipv4/af_inet.c
> > +++ b/net/ipv4/af_inet.c
> > @@ -1071,6 +1071,7 @@ const struct proto_ops inet_stream_ops = {
> > #endif
> > .splice_eof = inet_splice_eof,
> > .splice_read = tcp_splice_read,
> > + .set_peek_off = sk_set_peek_off,
> > .read_sock = tcp_read_sock,
> > .read_skb = tcp_read_skb,
> > .sendmsg_locked = tcp_sendmsg_locked,
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index 7e2481b9eae1..1c8cab14a32c 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -1415,8 +1415,6 @@ static int tcp_peek_sndq(struct sock *sk, struct msghdr *msg, int len)
> > struct sk_buff *skb;
> > int copied = 0, err = 0;
> >
> > - /* XXX -- need to support SO_PEEK_OFF */
> > -
> > skb_rbtree_walk(skb, &sk->tcp_rtx_queue) {
> > err = skb_copy_datagram_msg(skb, 0, msg, skb->len);
> > if (err)
> > @@ -2327,6 +2325,7 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len,
> > int target; /* Read at least this many bytes */
> > long timeo;
> > struct sk_buff *skb, *last;
> > + u32 peek_offset = 0;
> > u32 urg_hole = 0;
> >
> > err = -ENOTCONN;
> > @@ -2360,7 +2359,8 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len,
> >
> > seq = &tp->copied_seq;
> > if (flags & MSG_PEEK) {
> > - peek_seq = tp->copied_seq;
> > + peek_offset = max(sk_peek_offset(sk, flags), 0);
> > + peek_seq = tp->copied_seq + peek_offset;
> > seq = &peek_seq;
> > }
> >
> > @@ -2463,11 +2463,11 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len,
> > }
> >
> > if ((flags & MSG_PEEK) &&
> > - (peek_seq - copied - urg_hole != tp->copied_seq)) {
> > + (peek_seq - peek_offset - copied - urg_hole != tp->copied_seq)) {
> > net_dbg_ratelimited("TCP(%s:%d): Application bug, race in MSG_PEEK\n",
> > current->comm,
> > task_pid_nr(current));
> > - peek_seq = tp->copied_seq;
> > + peek_seq = tp->copied_seq + peek_offset;
> > }
> > continue;
> >
> > @@ -2508,7 +2508,10 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len,
> > WRITE_ONCE(*seq, *seq + used);
> > copied += used;
> > len -= used;
> > -
> > + if (flags & MSG_PEEK)
> > + sk_peek_offset_fwd(sk, used);
> > + else
> > + sk_peek_offset_bwd(sk, used);
Yet another cache miss in TCP fast path...
We need to move sk_peek_off in a better location before we accept this patch.
I always thought MSG_PEEK was very inefficient, I am surprised we
allow arbitrary loops in recvmsg().
* Re: [PATCH v3] tcp: add support for SO_PEEK_OFF
2024-02-13 12:24 ` Eric Dumazet
@ 2024-02-13 13:02 ` Paolo Abeni
2024-02-13 13:34 ` Eric Dumazet
0 siblings, 1 reply; 22+ messages in thread
From: Paolo Abeni @ 2024-02-13 13:02 UTC (permalink / raw)
To: Eric Dumazet
Cc: kuba, passt-dev, sbrivio, lvivier, dgibson, jmaloy, netdev, davem
On Tue, 2024-02-13 at 13:24 +0100, Eric Dumazet wrote:
> On Tue, Feb 13, 2024 at 11:49 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> > > @@ -2508,7 +2508,10 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len,
> > > WRITE_ONCE(*seq, *seq + used);
> > > copied += used;
> > > len -= used;
> > > -
> > > + if (flags & MSG_PEEK)
> > > + sk_peek_offset_fwd(sk, used);
> > > + else
> > > + sk_peek_offset_bwd(sk, used);
>
> Yet another cache miss in TCP fast path...
>
> We need to move sk_peek_off in a better location before we accept this patch.
>
> I always thought MSG_PEEK was very inefficient, I am surprised we
> allow arbitrary loops in recvmsg().
Let me double check I read the above correctly: are you concerned by
the 'skb_queue_walk(&sk->sk_receive_queue, skb) {' loop that could
touch a lot of skbs/cachelines before reaching the relevant skb?
The end goal here is allowing a user-space application to read
incrementally/sequentially the received data while leaving them in
receive buffer.
I don't see a better option than MSG_PEEK, am I missing something?
Thanks,
Paolo
* Re: [PATCH v3] tcp: add support for SO_PEEK_OFF
2024-02-13 13:02 ` Paolo Abeni
@ 2024-02-13 13:34 ` Eric Dumazet
2024-02-13 15:28 ` Paolo Abeni
0 siblings, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2024-02-13 13:34 UTC (permalink / raw)
To: Paolo Abeni
Cc: kuba, passt-dev, sbrivio, lvivier, dgibson, jmaloy, netdev, davem
On Tue, Feb 13, 2024 at 2:02 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Tue, 2024-02-13 at 13:24 +0100, Eric Dumazet wrote:
> > On Tue, Feb 13, 2024 at 11:49 AM Paolo Abeni <pabeni@redhat.com> wrote:
> >
> > > > @@ -2508,7 +2508,10 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len,
> > > > WRITE_ONCE(*seq, *seq + used);
> > > > copied += used;
> > > > len -= used;
> > > > -
> > > > + if (flags & MSG_PEEK)
> > > > + sk_peek_offset_fwd(sk, used);
> > > > + else
> > > > + sk_peek_offset_bwd(sk, used);
> >
> > Yet another cache miss in TCP fast path...
> >
> > We need to move sk_peek_off in a better location before we accept this patch.
> >
> > I always thought MSG_PEEK was very inefficient, I am surprised we
> > allow arbitrary loops in recvmsg().
>
> Let me double check I read the above correctly: are you concerned by
> the 'skb_queue_walk(&sk->sk_receive_queue, skb) {' loop that could
> touch a lot of skbs/cachelines before reaching the relevant skb?
>
> The end goal here is allowing a user-space application to read
> incrementally/sequentially the received data while leaving them in
> receive buffer.
>
> I don't see a better option than MSG_PEEK, am I missing something?
This sk_peek_offset protocol, needing sk_peek_offset_bwd() in the non
MSG_PEEK case is very strange IMO.
Ideally, we should read/write over sk_peek_offset only when MSG_PEEK
is used by the caller.
That would only touch non fast paths.
Since the API is mono-threaded anyway, the caller should not rely on
the fact that normal recvmsg() call
would 'consume' sk_peek_offset.
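A rough sketch of that alternative in tcp_recvmsg_locked(), purely to
illustrate the suggestion (not code from this patch or from any tree):

    /* only the MSG_PEEK path would ever touch sk_peek_off; a plain
     * read leaves it alone, so the fast path stays unchanged */
    if (flags & MSG_PEEK)
            sk_peek_offset_fwd(sk, used);
    /* no sk_peek_offset_bwd() on the normal read path: the caller
     * would be expected to reset the offset itself when needed */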
* Re: [PATCH v3] tcp: add support for SO_PEEK_OFF
2024-02-13 13:34 ` Eric Dumazet
@ 2024-02-13 15:28 ` Paolo Abeni
2024-02-13 15:49 ` Eric Dumazet
0 siblings, 1 reply; 22+ messages in thread
From: Paolo Abeni @ 2024-02-13 15:28 UTC (permalink / raw)
To: Eric Dumazet
Cc: kuba, passt-dev, sbrivio, lvivier, dgibson, jmaloy, netdev, davem
On Tue, 2024-02-13 at 14:34 +0100, Eric Dumazet wrote:
> On Tue, Feb 13, 2024 at 2:02 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >
> > On Tue, 2024-02-13 at 13:24 +0100, Eric Dumazet wrote:
> > > On Tue, Feb 13, 2024 at 11:49 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > >
> > > > > @@ -2508,7 +2508,10 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len,
> > > > > WRITE_ONCE(*seq, *seq + used);
> > > > > copied += used;
> > > > > len -= used;
> > > > > -
> > > > > + if (flags & MSG_PEEK)
> > > > > + sk_peek_offset_fwd(sk, used);
> > > > > + else
> > > > > + sk_peek_offset_bwd(sk, used);
> > >
> > > Yet another cache miss in TCP fast path...
> > >
> > > We need to move sk_peek_off in a better location before we accept this patch.
> > >
> > > I always thought MSG_PEEK was very inefficient, I am surprised we
> > > allow arbitrary loops in recvmsg().
> >
> > Let me double check I read the above correctly: are you concerned by
> > the 'skb_queue_walk(&sk->sk_receive_queue, skb) {' loop that could
> > touch a lot of skbs/cachelines before reaching the relevant skb?
> >
> > The end goal here is allowing a user-space application to read
> > incrementally/sequentially the received data while leaving them in
> > receive buffer.
> >
> > I don't see a better option than MSG_PEEK, am I missing something?
>
>
> This sk_peek_offset protocol, needing sk_peek_offset_bwd() in the non
> MSG_PEEK case is very strange IMO.
>
> Ideally, we should read/write over sk_peek_offset only when MSG_PEEK
> is used by the caller.
>
> That would only touch non fast paths.
>
> Since the API is mono-threaded anyway, the caller should not rely on
> the fact that normal recvmsg() call
> would 'consume' sk_peek_offset.
Storing in sk_peek_seq the tcp next sequence number to be peeked should
avoid changes in the non MSG_PEEK cases.
AFAICS that would need a new get_peek_off() sock_op and a bit somewhere
(in sk_flags?) to discriminate when sk_peek_seq is actually set. Would
that be acceptable?
Thanks!
Paolo
* Re: [PATCH v3] tcp: add support for SO_PEEK_OFF
2024-02-13 15:28 ` Paolo Abeni
@ 2024-02-13 15:49 ` Eric Dumazet
2024-02-13 18:39 ` Paolo Abeni
2024-02-13 23:34 ` David Gibson
0 siblings, 2 replies; 22+ messages in thread
From: Eric Dumazet @ 2024-02-13 15:49 UTC (permalink / raw)
To: Paolo Abeni
Cc: kuba, passt-dev, sbrivio, lvivier, dgibson, jmaloy, netdev, davem
On Tue, Feb 13, 2024 at 4:28 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Tue, 2024-02-13 at 14:34 +0100, Eric Dumazet wrote:
> > On Tue, Feb 13, 2024 at 2:02 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > >
> > > On Tue, 2024-02-13 at 13:24 +0100, Eric Dumazet wrote:
> > > > On Tue, Feb 13, 2024 at 11:49 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > > >
> > > > > > @@ -2508,7 +2508,10 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len,
> > > > > > WRITE_ONCE(*seq, *seq + used);
> > > > > > copied += used;
> > > > > > len -= used;
> > > > > > -
> > > > > > + if (flags & MSG_PEEK)
> > > > > > + sk_peek_offset_fwd(sk, used);
> > > > > > + else
> > > > > > + sk_peek_offset_bwd(sk, used);
> > > >
> > > > Yet another cache miss in TCP fast path...
> > > >
> > > > We need to move sk_peek_off in a better location before we accept this patch.
> > > >
> > > > I always thought MSG_PEEK was very inefficient, I am surprised we
> > > > allow arbitrary loops in recvmsg().
> > >
> > > Let me double check I read the above correctly: are you concerned by
> > > the 'skb_queue_walk(&sk->sk_receive_queue, skb) {' loop that could
> > > touch a lot of skbs/cachelines before reaching the relevant skb?
> > >
> > > The end goal here is allowing a user-space application to read
> > > incrementally/sequentially the received data while leaving them in
> > > receive buffer.
> > >
> > > I don't see a better option than MSG_PEEK, am I missing something?
> >
> >
> > This sk_peek_offset protocol, needing sk_peek_offset_bwd() in the non
> > MSG_PEEK case is very strange IMO.
> >
> > Ideally, we should read/write over sk_peek_offset only when MSG_PEEK
> > is used by the caller.
> >
> > That would only touch non fast paths.
> >
> > Since the API is mono-threaded anyway, the caller should not rely on
> > the fact that normal recvmsg() call
> > would 'consume' sk_peek_offset.
>
> Storing in sk_peek_seq the tcp next sequence number to be peeked should
> avoid changes in the non MSG_PEEK cases.
>
> AFAICS that would need a new get_peek_off() sock_op and a bit somewhere
> (in sk_flags?) to discriminate when sk_peek_seq is actually set. Would
> that be acceptable?
We could have a parallel SO_PEEK_OFFSET option, reusing the same socket field.
The new semantic would be : Supported by TCP (so far), and tcp
recvmsg() only reads/writes this field when MSG_PEEK is used.
Applications would have to clear the values themselves.
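With those semantics, the caller would do something along these lines
(SO_PEEK_OFFSET is the hypothetical new option name used above, not an
existing constant; 's', 'buf', 'done', 'off' and 'n' are assumed to be
declared already):

    n = recv(s, buf, sizeof(buf), MSG_PEEK);  /* advances the peek offset */
    recv(s, NULL, done, MSG_TRUNC);           /* plain read: offset untouched */

    /* the application rewinds the offset itself after consuming data */
    off = 0;
    setsockopt(s, SOL_SOCKET, SO_PEEK_OFFSET, &off, sizeof(off));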
BTW I see the man pages say SO_PEEK_OFF is "is currently supported
only for unix(7) sockets"
* Re: [PATCH v3] tcp: add support for SO_PEEK_OFF
2024-02-13 15:49 ` Eric Dumazet
@ 2024-02-13 18:39 ` Paolo Abeni
2024-02-13 19:31 ` Eric Dumazet
2024-02-13 23:34 ` David Gibson
1 sibling, 1 reply; 22+ messages in thread
From: Paolo Abeni @ 2024-02-13 18:39 UTC (permalink / raw)
To: Eric Dumazet
Cc: kuba, passt-dev, sbrivio, lvivier, dgibson, jmaloy, netdev, davem
On Tue, 2024-02-13 at 16:49 +0100, Eric Dumazet wrote:
> On Tue, Feb 13, 2024 at 4:28 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > On Tue, 2024-02-13 at 14:34 +0100, Eric Dumazet wrote:
> > > This sk_peek_offset protocol, needing sk_peek_offset_bwd() in the non
> > > MSG_PEEK case is very strange IMO.
> > >
> > > Ideally, we should read/write over sk_peek_offset only when MSG_PEEK
> > > is used by the caller.
> > >
> > > That would only touch non fast paths.
> > >
> > > Since the API is mono-threaded anyway, the caller should not rely on
> > > the fact that normal recvmsg() call
> > > would 'consume' sk_peek_offset.
> >
> > Storing in sk_peek_seq the tcp next sequence number to be peeked should
> > avoid changes in the non MSG_PEEK cases.
> >
> > AFAICS that would need a new get_peek_off() sock_op and a bit somewhere
> > (in sk_flags?) to discriminate when sk_peek_seq is actually set. Would
> > that be acceptable?
>
> We could have a parallel SO_PEEK_OFFSET option, reusing the same socket field.
>
> The new semantic would be : Supported by TCP (so far), and tcp
> recvmsg() only reads/writes this field when MSG_PEEK is used.
> Applications would have to clear the values themselves.
I feel like there is some misunderstanding, or at least I can't follow.
Let me be more verbose, to try to clarify my reasoning.
Two consecutive recvmsg(MSG_PEEK) calls for TCP after SO_PEEK_OFF will
return adjacent data. AFAICS this is the same semantic currently
implemented by UDP and unix sockets.
Currently 'sk_peek_off' maintains the next offset to be peeked into the
current receive queue. To implement the above behaviour, tcp_recvmsg()
has to update 'sk_peek_off' after MSG_PEEK, to move the offset to the
next data, and after a plain read, to account for the data removed from
the receive queue.
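Concretely, with 100 bytes queued and SO_PEEK_OFF set to 0 (illustrative
byte counts, 's' and 'buf' assumed):

    recv(s, buf, 40, MSG_PEEK);  /* returns bytes 0..39,  peek offset -> 40 */
    recv(s, buf, 40, MSG_PEEK);  /* returns bytes 40..79, peek offset -> 80 */
    recv(s, buf, 50, 0);         /* consumes bytes 0..49, peek offset -> 30 */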
I proposed to introduce a tcp-specific set_peek_off doing something
like:
WRITE_ONCE(sk->sk_peek_off, tcp_sk(sk)->copied_seq + val);
so that the recvmsg will need to update sk_peek_off only for MSG_PEEK,
while retaining the semantic described above.
To keep the userspace interface unchanged that will need a paired
tcp_get_peek_off(), so that getsockopt(SO_PEEK_OFF) could return to the
user a plain offset. An additional bit flag will be needed to store the
information "the user-space enabled peek with offset".
I don't understand how a setsockopt(PEEK_OFFSET) variant would help
avoiding touching sk->sk_peek_offset?
Thanks!
Paolo
* Re: [PATCH v3] tcp: add support for SO_PEEK_OFF
2024-02-13 18:39 ` Paolo Abeni
@ 2024-02-13 19:31 ` Eric Dumazet
[not found] ` <20687849-ec5c-9ce5-0a18-cc80f5b64816@redhat.com>
0 siblings, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2024-02-13 19:31 UTC (permalink / raw)
To: Paolo Abeni
Cc: kuba, passt-dev, sbrivio, lvivier, dgibson, jmaloy, netdev, davem
On Tue, Feb 13, 2024 at 7:39 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Tue, 2024-02-13 at 16:49 +0100, Eric Dumazet wrote:
> > On Tue, Feb 13, 2024 at 4:28 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > On Tue, 2024-02-13 at 14:34 +0100, Eric Dumazet wrote:
> > > > This sk_peek_offset protocol, needing sk_peek_offset_bwd() in the non
> > > > MSG_PEEK case is very strange IMO.
> > > >
> > > > Ideally, we should read/write over sk_peek_offset only when MSG_PEEK
> > > > is used by the caller.
> > > >
> > > > That would only touch non fast paths.
> > > >
> > > > Since the API is mono-threaded anyway, the caller should not rely on
> > > > the fact that normal recvmsg() call
> > > > would 'consume' sk_peek_offset.
> > >
> > > Storing in sk_peek_seq the tcp next sequence number to be peeked should
> > > avoid changes in the non MSG_PEEK cases.
> > >
> > > AFAICS that would need a new get_peek_off() sock_op and a bit somewhere
> > > (in sk_flags?) to discriminate when sk_peek_seq is actually set. Would
> > > that be acceptable?
> >
> > We could have a parallel SO_PEEK_OFFSET option, reusing the same socket field.
> >
> > The new semantic would be : Supported by TCP (so far), and tcp
> > recvmsg() only reads/writes this field when MSG_PEEK is used.
> > Applications would have to clear the values themselves.
>
> I feel like there is some misunderstanding, or at least I can't follow.
> Let me be more verbose, to try to clarify my reasoning.
>
> Two consecutive recvmsg(MSG_PEEK) calls for TCP after SO_PEEK_OFF will
> return adjacent data. AFAICS this is the same semantic currently
> implemented by UDP and unix sockets.
>
> Currently 'sk_peek_off' maintains the next offset to be peeked into the
> current receive queue. To implement the above behaviour, tcp_recvmsg()
> has to update 'sk_peek_off' after MSG_PEEK, to move the offset to the
> next data, and after a plain read, to account for the data removed from
> the receive queue.
>
> I proposed to introduce a tcp-specific set_peek_off doing something
> like:
>
> WRITE_ONCE(sk->sk_peek_off, tcp_sk(sk)->copied_seq + val);
>
> so that the recvmsg will need to update sk_peek_off only for MSG_PEEK,
> while retaining the semantic described above.
>
> To keep the userspace interface unchanged that will need a paired
> tcp_get_peek_off(), so that getsockopt(SO_PEEK_OFF) could return to the
> user a plain offset. An additional bit flag will be needed to store the
> information "the user-space enabled peek with offset".
>
> I don't understand how a setsockopt(PEEK_OFFSET) variant would help
> avoiding touching sk->sk_peek_offset?
>
I was trying to avoid using an extra storage, I was not trying to
implement the alternative myself :0)
If recvmsg(MSG_PEEK) is supposed to auto-advance the peek_offset,
we probably need more than a mere 32-bit field.
> Thanks!
>
> Paolo
>
* Re: [PATCH v3] tcp: add support for SO_PEEK_OFF
2024-02-13 15:49 ` Eric Dumazet
2024-02-13 18:39 ` Paolo Abeni
@ 2024-02-13 23:34 ` David Gibson
2024-02-14 3:41 ` Eric Dumazet
2024-02-15 3:21 ` David Gibson
1 sibling, 2 replies; 22+ messages in thread
From: David Gibson @ 2024-02-13 23:34 UTC (permalink / raw)
To: Eric Dumazet
Cc: Paolo Abeni, kuba, passt-dev, sbrivio, lvivier, dgibson, jmaloy,
netdev, davem
On Tue, Feb 13, 2024 at 04:49:01PM +0100, Eric Dumazet wrote:
> On Tue, Feb 13, 2024 at 4:28 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >
> > On Tue, 2024-02-13 at 14:34 +0100, Eric Dumazet wrote:
> > > On Tue, Feb 13, 2024 at 2:02 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > >
> > > > On Tue, 2024-02-13 at 13:24 +0100, Eric Dumazet wrote:
> > > > > On Tue, Feb 13, 2024 at 11:49 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > > > >
> > > > > > > @@ -2508,7 +2508,10 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len,
> > > > > > > WRITE_ONCE(*seq, *seq + used);
> > > > > > > copied += used;
> > > > > > > len -= used;
> > > > > > > -
> > > > > > > + if (flags & MSG_PEEK)
> > > > > > > + sk_peek_offset_fwd(sk, used);
> > > > > > > + else
> > > > > > > + sk_peek_offset_bwd(sk, used);
> > > > >
> > > > > Yet another cache miss in TCP fast path...
> > > > >
> > > > > We need to move sk_peek_off in a better location before we accept this patch.
> > > > >
> > > > > I always thought MSG_PEEK was very inefficient, I am surprised we
> > > > > allow arbitrary loops in recvmsg().
> > > >
> > > > Let me double check I read the above correctly: are you concerned by
> > > > the 'skb_queue_walk(&sk->sk_receive_queue, skb) {' loop that could
> > > > touch a lot of skbs/cachelines before reaching the relevant skb?
> > > >
> > > > The end goal here is allowing a user-space application to read
> > > > incrementally/sequentially the received data while leaving them in
> > > > receive buffer.
> > > >
> > > > I don't see a better option than MSG_PEEK, am I missing something?
> > >
> > >
> > > This sk_peek_offset protocol, needing sk_peek_offset_bwd() in the non
> > > MSG_PEEK case is very strange IMO.
> > >
> > > Ideally, we should read/write over sk_peek_offset only when MSG_PEEK
> > > is used by the caller.
> > >
> > > That would only touch non fast paths.
> > >
> > > Since the API is mono-threaded anyway, the caller should not rely on
> > > the fact that normal recvmsg() call
> > > would 'consume' sk_peek_offset.
> >
> > Storing in sk_peek_seq the tcp next sequence number to be peeked should
> > avoid changes in the non MSG_PEEK cases.
> >
> > AFAICS that would need a new get_peek_off() sock_op and a bit somewhere
> > (in sk_flags?) to discriminate when sk_peek_seq is actually set. Would
> > that be acceptable?
>
> We could have a parallel SO_PEEK_OFFSET option, reusing the same socket field.
>
> The new semantic would be : Supported by TCP (so far), and tcp
> recvmsg() only reads/writes this field when MSG_PEEK is used.
> Applications would have to clear the values themselves.
Those semantics would likely defeat the purpose of using SO_PEEK_OFF
for our use case, since we'd need an additional setsockopt() for every
non-PEEK recv() (which are all MSG_TRUNC in our case).
> BTW I see the man pages say SO_PEEK_OFF is "is currently supported
> only for unix(7) sockets"
Yes, this patch is explicitly aiming to change that.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
* Re: [PATCH v3] tcp: add support for SO_PEEK_OFF
2024-02-13 23:34 ` David Gibson
@ 2024-02-14 3:41 ` Eric Dumazet
2024-02-15 3:16 ` David Gibson
2024-02-15 3:21 ` David Gibson
1 sibling, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2024-02-14 3:41 UTC (permalink / raw)
To: David Gibson
Cc: Paolo Abeni, kuba, passt-dev, sbrivio, lvivier, dgibson, jmaloy,
netdev, davem, Willem de Bruijn
On Wed, Feb 14, 2024 at 12:34 AM David Gibson
<david@gibson.dropbear.id.au> wrote:
>
>
> > BTW I see the man pages say SO_PEEK_OFF is "is currently supported
> > only for unix(7) sockets"
>
> Yes, this patch is explicitly aiming to change that.
I was pointing out that UDP is supposed to support it already, since 2016,
the manpage should mention UDP already.
Thanks.
* Re: [PATCH v3] tcp: add support for SO_PEEK_OFF
2024-02-14 3:41 ` Eric Dumazet
@ 2024-02-15 3:16 ` David Gibson
0 siblings, 0 replies; 22+ messages in thread
From: David Gibson @ 2024-02-15 3:16 UTC (permalink / raw)
To: Eric Dumazet
Cc: Paolo Abeni, kuba, passt-dev, sbrivio, lvivier, dgibson, jmaloy,
netdev, davem, Willem de Bruijn
On Wed, Feb 14, 2024 at 04:41:14AM +0100, Eric Dumazet wrote:
> On Wed, Feb 14, 2024 at 12:34 AM David Gibson
> <david@gibson.dropbear.id.au> wrote:
> >
> >
> > > BTW I see the man pages say SO_PEEK_OFF is "is currently supported
> > > only for unix(7) sockets"
> >
> > Yes, this patch is explicitly aiming to change that.
>
> I was pointing out that UDP is supposed to support it already, since 2016,
> the manpage should mention UDP already.
Oh, sorry, I misunderstood. I tend to forget about this applied to
datagram sockets, since we have no use for that.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
* Re: [PATCH v3] tcp: add support for SO_PEEK_OFF
2024-02-13 23:34 ` David Gibson
2024-02-14 3:41 ` Eric Dumazet
@ 2024-02-15 3:21 ` David Gibson
2024-02-15 9:15 ` Eric Dumazet
1 sibling, 1 reply; 22+ messages in thread
From: David Gibson @ 2024-02-15 3:21 UTC (permalink / raw)
To: Eric Dumazet
Cc: Paolo Abeni, kuba, passt-dev, sbrivio, lvivier, dgibson, jmaloy,
netdev, davem
On Wed, Feb 14, 2024 at 10:34:50AM +1100, David Gibson wrote:
> On Tue, Feb 13, 2024 at 04:49:01PM +0100, Eric Dumazet wrote:
> > On Tue, Feb 13, 2024 at 4:28 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > >
> > > On Tue, 2024-02-13 at 14:34 +0100, Eric Dumazet wrote:
> > > > On Tue, Feb 13, 2024 at 2:02 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > > >
> > > > > On Tue, 2024-02-13 at 13:24 +0100, Eric Dumazet wrote:
> > > > > > On Tue, Feb 13, 2024 at 11:49 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > > > > >
> > > > > > > > @@ -2508,7 +2508,10 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len,
> > > > > > > > WRITE_ONCE(*seq, *seq + used);
> > > > > > > > copied += used;
> > > > > > > > len -= used;
> > > > > > > > -
> > > > > > > > + if (flags & MSG_PEEK)
> > > > > > > > + sk_peek_offset_fwd(sk, used);
> > > > > > > > + else
> > > > > > > > + sk_peek_offset_bwd(sk, used);
> > > > > >
> > > > > > Yet another cache miss in TCP fast path...
> > > > > >
> > > > > > We need to move sk_peek_off in a better location before we accept this patch.
> > > > > >
> > > > > > I always thought MSG_PEEK was very inefficient, I am surprised we
> > > > > > allow arbitrary loops in recvmsg().
> > > > >
> > > > > Let me double check I read the above correctly: are you concerned by
> > > > > the 'skb_queue_walk(&sk->sk_receive_queue, skb) {' loop that could
> > > > > touch a lot of skbs/cachelines before reaching the relevant skb?
> > > > >
> > > > > The end goal here is allowing a user-space application to read
> > > > > incrementally/sequentially the received data while leaving them in
> > > > > receive buffer.
> > > > >
> > > > > I don't see a better option than MSG_PEEK, am I missing something?
> > > >
> > > >
> > > > This sk_peek_offset protocol, needing sk_peek_offset_bwd() in the non
> > > > MSG_PEEK case is very strange IMO.
> > > >
> > > > Ideally, we should read/write over sk_peek_offset only when MSG_PEEK
> > > > is used by the caller.
> > > >
> > > > That would only touch non fast paths.
> > > >
> > > > Since the API is mono-threaded anyway, the caller should not rely on
> > > > the fact that normal recvmsg() call
> > > > would 'consume' sk_peek_offset.
> > >
> > > Storing in sk_peek_seq the tcp next sequence number to be peeked should
> > > avoid changes in the non MSG_PEEK cases.
> > >
> > > AFAICS that would need a new get_peek_off() sock_op and a bit somewhere
> > > (in sk_flags?) to discriminate when sk_peek_seq is actually set. Would
> > > that be acceptable?
> >
> > We could have a parallel SO_PEEK_OFFSET option, reusing the same socket field.
> >
> > The new semantic would be : Supported by TCP (so far), and tcp
> > recvmsg() only reads/writes this field when MSG_PEEK is used.
> > Applications would have to clear the values themselves.
>
> Those semantics would likely defeat the purpose of using SO_PEEK_OFF
> for our use case, since we'd need an additional setsockopt() for every
> non-PEEK recv() (which are all MSG_TRUNC in our case).
Btw, Eric,
If you're concerned about the extra access added to the "regular" TCP
path, would you be happier with the original approach Jon proposed:
that allowed a user to essentially supply an offset to an individual
MSG_PEEK recvmsg() by inserting a dummy entry as msg_iov[0] with a
NULL pointer and length to skip.
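For reference, the caller side of that earlier interface looked roughly
like this (reconstructed from the description above, not copied from the
v1 patch; 'skip', 'buf' and 's' are assumed):

    struct iovec iov[2] = {
            { .iov_base = NULL, .iov_len = skip },        /* bytes to skip */
            { .iov_base = buf,  .iov_len = sizeof(buf) }, /* actual buffer */
    };
    struct msghdr msg = { .msg_iov = iov, .msg_iovlen = 2 };

    recvmsg(s, &msg, MSG_PEEK);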
It did the job for us, although I admit it's a little ugly, which I
presume is why Paolo suggested we investigate SO_PEEK_OFF instead. I
think the SO_PEEK_OFF approach is more elegant, but maybe the
performance impact outweighs that.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
* Re: [PATCH v3] tcp: add support for SO_PEEK_OFF
2024-02-15 3:21 ` David Gibson
@ 2024-02-15 9:15 ` Eric Dumazet
0 siblings, 0 replies; 22+ messages in thread
From: Eric Dumazet @ 2024-02-15 9:15 UTC (permalink / raw)
To: David Gibson
Cc: Paolo Abeni, kuba, passt-dev, sbrivio, lvivier, dgibson, jmaloy,
netdev, davem
On Thu, Feb 15, 2024 at 4:21 AM David Gibson
<david@gibson.dropbear.id.au> wrote:
> Btw, Eric,
>
> If you're concerned about the extra access added to the "regular" TCP
> path, would you be happier with the original approach Jon proposed:
> that allowed a user to essentially supply an offset to an individual
> MSG_PEEK recvmsg() by inserting a dummy entry as msg_iov[0] with a
> NULL pointer and length to skip.
>
> It did the job for us, although I admit it's a little ugly, which I
> presume is why Paolo suggested we investigate SO_PEEK_OFF instead. I
> think the SO_PEEK_OFF approach is more elegant, but maybe the
> performance impact outweighs that.
>
Sorry, this was too ugly.
SO_PEEK_OFF is way better/standard; we have to polish the
implementation so that it is zero-cost for the 99.9999% of users not
using MSG_PEEK.
Thread overview: 22+ messages
2024-02-09 22:12 [PATCH v3] tcp: add support for SO_PEEK_OFF jmaloy
2024-02-11 23:17 ` Stefano Brivio
2024-02-13 10:49 ` Paolo Abeni
2024-02-13 12:24 ` Eric Dumazet
2024-02-13 13:02 ` Paolo Abeni
2024-02-13 13:34 ` Eric Dumazet
2024-02-13 15:28 ` Paolo Abeni
2024-02-13 15:49 ` Eric Dumazet
2024-02-13 18:39 ` Paolo Abeni
2024-02-13 19:31 ` Eric Dumazet
[not found] ` <20687849-ec5c-9ce5-0a18-cc80f5b64816@redhat.com>
2024-02-15 17:41 ` Paolo Abeni
2024-02-15 17:46 ` Eric Dumazet
2024-02-15 22:24 ` Jon Maloy
2024-02-16 9:14 ` Paolo Abeni
2024-02-16 9:21 ` Eric Dumazet
[not found] ` <6a9f5dec-eb0c-51ef-0911-7345f50e08f0@redhat.com>
2024-02-16 10:55 ` Eric Dumazet
2024-02-19 2:02 ` David Gibson
2024-02-13 23:34 ` David Gibson
2024-02-14 3:41 ` Eric Dumazet
2024-02-15 3:16 ` David Gibson
2024-02-15 3:21 ` David Gibson
2024-02-15 9:15 ` Eric Dumazet