From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: Laurent Vivier <lvivier@redhat.com>, passt-dev@passt.top
Subject: Re: [RFC] tcp: Replace TCP buffer structure by an iovec array
Date: Fri, 15 Mar 2024 10:46:55 +1000 [thread overview]
Message-ID: <ZfOaf1cgy_UP9-nR@zatzit> (raw)
In-Reply-To: <20240314172617.22c28caa@elisabeth>
[-- Attachment #1: Type: text/plain, Size: 5948 bytes --]
On Thu, Mar 14, 2024 at 05:26:17PM +0100, Stefano Brivio wrote:
> On Thu, 14 Mar 2024 16:54:02 +0100
> Laurent Vivier <lvivier@redhat.com> wrote:
>
> > On 3/14/24 16:47, Stefano Brivio wrote:
> > > On Thu, 14 Mar 2024 15:07:48 +0100
> > > Laurent Vivier <lvivier@redhat.com> wrote:
> > >
> > >> On 3/13/24 12:37, Stefano Brivio wrote:
> > >> ...
> > >>>> @@ -390,6 +414,42 @@ static size_t tap_send_frames_passt(const struct ctx *c,
> > >>>> return i;
> > >>>> }
> > >>>>
> > >>>> +/**
> > >>>> + * tap_send_iov_passt() - Send out multiple prepared frames
> > >>>
> > >>> ...I would argue that this function prepares frames as well. Maybe:
> > >>>
> > >>> * tap_send_iov_passt() - Prepare TCP_IOV_VNET parts and send multiple frames
> > >>>
> > >>>> + * @c: Execution context
> > >>>> + * @iov: Array of frames, each frames is divided in an array of iovecs.
> > >>>> + * The first entry of the iovec is updated to point to an
> > >>>> + * uint32_t storing the frame length.
> > >>>
> > >>> * @iov: Array of frames, each one a vector of parts, TCP_IOV_VNET blank
> > >>>
> > >>>> + * @n: Number of frames in @iov
> > >>>> + *
> > >>>> + * Return: number of frames actually sent
> > >>>> + */
> > >>>> +static size_t tap_send_iov_passt(const struct ctx *c,
> > >>>> + struct iovec iov[][TCP_IOV_NUM],
> > >>>> + size_t n)
> > >>>> +{
> > >>>> + unsigned int i;
> > >>>> +
> > >>>> + for (i = 0; i < n; i++) {
> > >>>> + uint32_t vnet_len;
> > >>>> + int j;
> > >>>> +
> > >>>> + vnet_len = 0;
> > >>>
> > >>> This could be initialised in the declaration (yes, it's "reset" at
> > >>> every loop iteration).
> > >>>
> > >>>> + for (j = TCP_IOV_ETH; j < TCP_IOV_NUM; j++)
> > >>>> + vnet_len += iov[i][j].iov_len;
> > >>>> +
> > >>>> + vnet_len = htonl(vnet_len);
> > >>>> + iov[i][TCP_IOV_VNET].iov_base = &vnet_len;
> > >>>> + iov[i][TCP_IOV_VNET].iov_len = sizeof(vnet_len);
> > >>>> +
> > >>>> + if (!tap_send_frames_passt(c, iov[i], TCP_IOV_NUM))
> > >>>
> > >>> ...which would now send a single frame at a time, but actually it can
> > >>> already send everything in one shot because it's using sendmsg(), if you
> > >>> move it outside of the loop and do something like (untested):
> > >>>
> > >>> return tap_send_frames_passt(c, iov, TCP_IOV_NUM * n);
> > >>>
> > >>>> + break;
> > >>>> + }
> > >>>> +
> > >>>> + return i;
> > >>>> +
> > >>>> +}
> > >>>> +
> > >>
> > >> I tried to do something like that but I have a performance drop:
> > >>
> > >> static size_t tap_send_iov_passt(const struct ctx *c,
> > >> struct iovec iov[][TCP_IOV_NUM],
> > >> size_t n)
> > >> {
> > >> unsigned int i;
> > >> uint32_t vnet_len[n];
> > >>
> > >> for (i = 0; i < n; i++) {
> > >> int j;
> > >>
> > >> vnet_len[i] = 0;
> > >> for (j = TCP_IOV_ETH; j < TCP_IOV_NUM; j++)
> > >> vnet_len[i] += iov[i][j].iov_len;
> > >>
> > >> vnet_len[i] = htonl(vnet_len[i]);
> > >> iov[i][TCP_IOV_VNET].iov_base = &vnet_len[i];
> > >> iov[i][TCP_IOV_VNET].iov_len = sizeof(uint32_t);
> > >> }
> > >>
> > >> return tap_send_frames_passt(c, &iov[0][0], TCP_IOV_NUM * n) / TCP_IOV_NUM;
> > >> }
> > >>
> > >> iperf3 -c localhost -p 10001 -t 60 -4
> > >>
> > >> berfore
> > >> [ ID] Interval Transfer Bitrate Retr
> > >> [ 5] 0.00-60.00 sec 33.0 GBytes 4.72 Gbits/sec 1 sender
> > >> [ 5] 0.00-60.06 sec 33.0 GBytes 4.72 Gbits/sec receiver
> > >>
> > >> after:
> > >> [ ID] Interval Transfer Bitrate Retr
> > >> [ 5] 0.00-60.00 sec 18.2 GBytes 2.60 Gbits/sec 0 sender
> > >> [ 5] 0.00-60.07 sec 18.2 GBytes 2.60 Gbits/sec receiver
> > >
> > > Weird, it looks like doing one sendmsg() per frame results in a higher
> > > throughput than one sendmsg() per multiple frames, which sounds rather
> > > absurd. Perhaps we should start looking into what perf(1) reports, in
> > > terms of both syscall overhead and cache misses.
> > >
> > > I'll have a look later today or tomorrow -- unless you have other
> > > ideas as to why this might happen...
> >
> > Perhaps in first case we only update one vnet_len and in the second case we have to update
> > an array of vnet_len, so there is an use of more cache lines?
We should be able to test this relatively easily, yes? By updating
all the vnet_len then using a single sendmsg().
> Yes, I'm wondering if for example this:
>
> iov[i][TCP_IOV_VNET].iov_base = &vnet_len[i];
>
> causes a prefetch of everything pointed by iov[i][...], so we would
> prefetch (and throw away) each buffer, one by one.
>
> Another interesting experiment to verify if this is the case could be
> to "flush" a few frames at a time (say, 4), with something like this on
> top of your original change (completely untested):
>
> [...]
>
> if (!((i + 1) % 4) &&
> !tap_send_frames_passt(c, iov[i / 4], TCP_IOV_NUM * 4))
> break;
> }
>
> if ((i + 1) % 4) {
> tap_send_frames_passt(c, iov[i / 4],
> TCP_IOV_NUM * ((i + 1) % 4));
> }
>
> Or maybe we could set vnet_len right after we receive data in the
> buffers.
I really hope we can avoid this. If we want to allow IPv4<->IPv6
translation, then we can't know the vnet_len until after we've done
our routing / translation.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2024-03-15 0:47 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-03-11 13:33 [RFC] tcp: Replace TCP buffer structure by an iovec array Laurent Vivier
2024-03-12 22:56 ` Stefano Brivio
2024-03-13 11:37 ` Stefano Brivio
2024-03-13 14:42 ` Laurent Vivier
2024-03-13 15:27 ` Stefano Brivio
2024-03-13 15:20 ` Laurent Vivier
2024-03-13 16:58 ` Stefano Brivio
2024-03-14 14:07 ` Laurent Vivier
2024-03-14 15:47 ` Stefano Brivio
2024-03-14 15:54 ` Laurent Vivier
2024-03-14 16:26 ` Stefano Brivio
2024-03-15 0:46 ` David Gibson [this message]
2024-03-14 4:22 ` David Gibson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZfOaf1cgy_UP9-nR@zatzit \
--to=david@gibson.dropbear.id.au \
--cc=lvivier@redhat.com \
--cc=passt-dev@passt.top \
--cc=sbrivio@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).