public inbox for passt-dev@passt.top
From: Jon Maloy <jmaloy@redhat.com>
To: passt-dev@passt.top, sbrivio@redhat.com, lvivier@redhat.com,
	dgibson@redhat.com
Subject: Re: [RFC net-next v2] tcp: add support for read with offset when using MSG_PEEK
Date: Wed, 6 Dec 2023 11:48:19 -0500
Message-ID: <c36ad7d6-ff76-8ca1-609b-987dd420af5c@redhat.com>
In-Reply-To: <20231205232028.1490809-1-jmaloy@redhat.com>

Note that I only sent this one to passt-dev, not netdev.
I would appreciate feedback, and possibly an Acked-by/Reviewed-by, as soon as
possible so that I can send it to netdev.

///jon

On 2023-12-05 18:20, Jon Maloy wrote:
> When reading received messages with MSG_PEEK, we sometimes have to read
> the leading bytes of the stream several times, only to reach the bytes
> we really want. This is clearly non-optimal.
>
> What we would want is something similar to pread/preadv(), but working
> even for tcp sockets. At the same time, we don't want to add any new
> arguments to the recv/recvmsg() calls.
>
> In this commit, we allow the user to set iovec.iov_base in the first
> vector entry to NULL. This tells the socket to skip the first entry,
> hence letting the iov_len field of that entry indicate the offset value.
> This way, there is no need to add any new arguments or flags.
>
> In the iperf3 log examples shown below, we can observe a throughput
> improvement of ~20 % in the direction host->namespace when using the
> protocol splicer 'passt'. This is a consistent result.
>
> $ ./passt/passt/pasta --config-net  -f
> MSG_PEEK with offset not supported.
> [root@fedora37 ~]# perf record iperf3 -s
> -----------------------------------------------------------
> Server listening on 5201 (test #1)
> -----------------------------------------------------------
> Accepted connection from 192.168.122.1, port 60344
> [  6] local 192.168.122.163 port 5201 connected to 192.168.122.1 port 60360
> [ ID] Interval           Transfer     Bitrate
> [...]
> [  6]  13.00-14.00  sec  2.54 GBytes  21.8 Gbits/sec
> [  6]  14.00-15.00  sec  2.52 GBytes  21.7 Gbits/sec
> [  6]  15.00-16.00  sec  2.50 GBytes  21.5 Gbits/sec
> [  6]  16.00-17.00  sec  2.49 GBytes  21.4 Gbits/sec
> [  6]  17.00-18.00  sec  2.51 GBytes  21.6 Gbits/sec
> [  6]  18.00-19.00  sec  2.48 GBytes  21.3 Gbits/sec
> [  6]  19.00-20.00  sec  2.49 GBytes  21.4 Gbits/sec
> [  6]  20.00-20.04  sec  87.4 MBytes  19.2 Gbits/sec
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate
> [  6]   0.00-20.04  sec  48.9 GBytes  21.0 Gbits/sec receiver
> -----------------------------------------------------------
>
> [jmaloy@fedora37 ~]$ ./passt/passt/pasta --config-net  -f
> MSG_PEEK with offset supported.
> [root@fedora37 ~]# perf record iperf3 -s
> -----------------------------------------------------------
> Server listening on 5201 (test #1)
> -----------------------------------------------------------
> Accepted connection from 192.168.122.1, port 46362
> [  6] local 192.168.122.163 port 5201 connected to 192.168.122.1 port 46374
> [ ID] Interval           Transfer     Bitrate
> [...]
> [  6]  12.00-13.00  sec  3.18 GBytes  27.3 Gbits/sec
> [  6]  13.00-14.00  sec  3.17 GBytes  27.3 Gbits/sec
> [  6]  14.00-15.00  sec  3.13 GBytes  26.9 Gbits/sec
> [  6]  15.00-16.00  sec  3.17 GBytes  27.3 Gbits/sec
> [  6]  16.00-17.00  sec  3.17 GBytes  27.2 Gbits/sec
> [  6]  17.00-18.00  sec  3.14 GBytes  27.0 Gbits/sec
> [  6]  18.00-19.00  sec  3.17 GBytes  27.2 Gbits/sec
> [  6]  19.00-20.00  sec  3.12 GBytes  26.8 Gbits/sec
> [  6]  20.00-20.04  sec   119 MBytes  25.5 Gbits/sec
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate
> [  6]   0.00-20.04  sec  59.4 GBytes  25.4 Gbits/sec receiver
> -----------------------------------------------------------
>
> Passt is used to support VMs in containers, such as KubeVirt, and
> is also generally supported in libvirt/QEMU since release 9.2 / 7.2.
>
> Signed-off-by: Jon Maloy <jmaloy@redhat.com>
> ---
>   net/ipv4/tcp.c | 15 +++++++++++++++
>   1 file changed, 15 insertions(+)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 53bcc17c91e4..e9d3b5bf2f66 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -2310,6 +2310,7 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len,
>   			      int *cmsg_flags)
>   {
>   	struct tcp_sock *tp = tcp_sk(sk);
> +	size_t peek_offset;
>   	int copied = 0;
>   	u32 peek_seq;
>   	u32 *seq;
> @@ -2353,6 +2354,20 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len,
>   	if (flags & MSG_PEEK) {
>   		peek_seq = tp->copied_seq;
>   		seq = &peek_seq;
> +		if (!msg->msg_iter.__iov[0].iov_base) {
> +			peek_offset = msg->msg_iter.__iov[0].iov_len;
> +			msg->msg_iter.__iov = &msg->msg_iter.__iov[1];
> +			if (msg->msg_iter.nr_segs <= 1)
> +				goto out;
> +			msg->msg_iter.nr_segs -= 1;
> +			if (msg->msg_iter.count <= peek_offset)
> +				goto out;
> +			msg->msg_iter.count -= peek_offset;
> +			if (len <= peek_offset)
> +				goto out;
> +			len -= peek_offset;
> +			*seq += peek_offset;
> +		}
>   	}
>   
>   	target = sock_rcvlowat(sk, flags & MSG_WAITALL, len);
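
For illustration, here is a minimal userspace sketch of the calling
convention described in the commit message. The helper name
peek_with_offset() and its "discard" parameter are hypothetical and not
part of the patch. With a non-NULL discard buffer it does what callers
have to do today (the leading bytes are copied out again on every peek);
with discard == NULL it builds the iovec the way the patch proposes,
which is only expected to work on a kernel carrying this RFC. The return
value in the NULL case is an assumption based on my reading of the diff
(copied starts at 0 after the sequence has been advanced).

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Peek at up to len bytes starting offset bytes into the receive queue.
 *
 * discard != NULL: portable fallback; the leading offset bytes are copied
 *                  into discard, which must hold at least offset bytes.
 * discard == NULL: convention proposed in the RFC; iov_base of the first
 *                  entry is NULL and its iov_len carries the offset.
 *
 * Returns the number of bytes stored in buf, or -1 on error.
 */
static ssize_t peek_with_offset(int fd, void *discard, size_t offset,
				void *buf, size_t len)
{
	struct iovec iov[2];
	struct msghdr msg;
	ssize_t n;

	assert(buf != NULL);

	iov[0].iov_base = discard;	/* NULL selects the proposed behaviour */
	iov[0].iov_len = offset;
	iov[1].iov_base = buf;
	iov[1].iov_len = len;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = iov;
	msg.msg_iovlen = 2;

	n = recvmsg(fd, &msg, MSG_PEEK);
	if (n < 0)
		return -1;

	/* Assumed: a patched kernel counts only the bytes past the offset */
	if (!discard)
		return n;

	/* Fallback: the count includes the discarded leading bytes */
	return (size_t)n > offset ? n - (ssize_t)offset : 0;
}
```

A caller would probably try the NULL form once at startup and fall back
to the discard buffer if it fails, which seems to be roughly what the
"MSG_PEEK with offset (not) supported" lines in the logs above reflect.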



Thread overview: 4+ messages
2023-12-05 23:20 [RFC net-next v2] tcp: add support for read with offset when using MSG_PEEK Jon Maloy
2023-12-06 16:48 ` Jon Maloy [this message]
2023-12-06 18:02 ` Stefano Brivio
2024-01-20 16:52 jmaloy
