public inbox for passt-dev@passt.top
From: Stefano Brivio <sbrivio@redhat.com>
To: Laurent Vivier <lvivier@redhat.com>
Cc: passt-dev@passt.top
Subject: Re: [RFC] tcp: Replace TCP buffer structure by an iovec array
Date: Thu, 14 Mar 2024 17:26:17 +0100
Message-ID: <20240314172617.22c28caa@elisabeth>
In-Reply-To: <893d5b17-cb92-49bf-8752-7ba1d798ceeb@redhat.com>

On Thu, 14 Mar 2024 16:54:02 +0100
Laurent Vivier <lvivier@redhat.com> wrote:

> On 3/14/24 16:47, Stefano Brivio wrote:
> > On Thu, 14 Mar 2024 15:07:48 +0100
> > Laurent Vivier <lvivier@redhat.com> wrote:
> >   
> >> On 3/13/24 12:37, Stefano Brivio wrote:
> >> ...  
> >>>> @@ -390,6 +414,42 @@ static size_t tap_send_frames_passt(const struct ctx *c,
> >>>>    	return i;
> >>>>    }
> >>>>    
> >>>> +/**
> >>>> + * tap_send_iov_passt() - Send out multiple prepared frames  
> >>>
> >>> ...I would argue that this function prepares frames as well. Maybe:
> >>>
> >>>    * tap_send_iov_passt() - Prepare TCP_IOV_VNET parts and send multiple frames
> >>>      
> >>>> + * @c:		Execution context
> >>>> + * @iov:	Array of frames, each frames is divided in an array of iovecs.
> >>>> + *              The first entry of the iovec is updated to point to an
> >>>> + *              uint32_t storing the frame length.  
> >>>
> >>>    * @iov:	Array of frames, each one a vector of parts, TCP_IOV_VNET blank
> >>>      
> >>>> + * @n:		Number of frames in @iov
> >>>> + *
> >>>> + * Return: number of frames actually sent
> >>>> + */
> >>>> +static size_t tap_send_iov_passt(const struct ctx *c,
> >>>> +				 struct iovec iov[][TCP_IOV_NUM],
> >>>> +				 size_t n)
> >>>> +{
> >>>> +	unsigned int i;
> >>>> +
> >>>> +	for (i = 0; i < n; i++) {
> >>>> +		uint32_t vnet_len;
> >>>> +		int j;
> >>>> +
> >>>> +		vnet_len = 0;  
> >>>
> >>> This could be initialised in the declaration (yes, it's "reset" at
> >>> every loop iteration).
> >>>      
> >>>> +		for (j = TCP_IOV_ETH; j < TCP_IOV_NUM; j++)
> >>>> +			vnet_len += iov[i][j].iov_len;
> >>>> +
> >>>> +		vnet_len = htonl(vnet_len);
> >>>> +		iov[i][TCP_IOV_VNET].iov_base = &vnet_len;
> >>>> +		iov[i][TCP_IOV_VNET].iov_len = sizeof(vnet_len);
> >>>> +
> >>>> +		if (!tap_send_frames_passt(c, iov[i], TCP_IOV_NUM))  
> >>>
> >>> ...which would now send a single frame at a time, but actually it can
> >>> already send everything in one shot because it's using sendmsg(), if you
> >>> move it outside of the loop and do something like (untested):
> >>>
> >>> 	return tap_send_frames_passt(c, iov, TCP_IOV_NUM * n);
> >>>      
> >>>> +			break;
> >>>> +	}
> >>>> +
> >>>> +	return i;
> >>>> +
> >>>> +}
> >>>> +  
> >>
> >> I tried to do something like that, but I get a performance drop:
> >>
> >> static size_t tap_send_iov_passt(const struct ctx *c,
> >>                                    struct iovec iov[][TCP_IOV_NUM],
> >>                                    size_t n)
> >> {
> >>           unsigned int i;
> >>           uint32_t vnet_len[n];
> >>
> >>           for (i = 0; i < n; i++) {
> >>                   int j;
> >>
> >>                   vnet_len[i] = 0;
> >>                   for (j = TCP_IOV_ETH; j < TCP_IOV_NUM; j++)
> >>                           vnet_len[i] += iov[i][j].iov_len;
> >>
> >>                   vnet_len[i] = htonl(vnet_len[i]);
> >>                   iov[i][TCP_IOV_VNET].iov_base = &vnet_len[i];
> >>                   iov[i][TCP_IOV_VNET].iov_len = sizeof(uint32_t);
> >>           }
> >>
> >>           return tap_send_frames_passt(c, &iov[0][0], TCP_IOV_NUM * n) / TCP_IOV_NUM;
> >> }
> >>
> >> iperf3 -c localhost -p 10001  -t 60  -4
> >>
> >> before:
> >> [ ID] Interval           Transfer     Bitrate         Retr
> >> [  5]   0.00-60.00  sec  33.0 GBytes  4.72 Gbits/sec    1             sender
> >> [  5]   0.00-60.06  sec  33.0 GBytes  4.72 Gbits/sec                  receiver
> >>
> >> after:
> >> [ ID] Interval           Transfer     Bitrate         Retr
> >> [  5]   0.00-60.00  sec  18.2 GBytes  2.60 Gbits/sec    0             sender
> >> [  5]   0.00-60.07  sec  18.2 GBytes  2.60 Gbits/sec                  receiver  
> > 
> > Weird, it looks like doing one sendmsg() per frame results in a higher
> > throughput than one sendmsg() per multiple frames, which sounds rather
> > absurd. Perhaps we should start looking into what perf(1) reports, in
> > terms of both syscall overhead and cache misses.
> > 
> > I'll have a look later today or tomorrow -- unless you have other
> > ideas as to why this might happen...
> 
> Perhaps in the first case we only update a single vnet_len, while in the
> second case we have to update a whole array of vnet_len, so we touch
> more cache lines?

Yes, I'm wondering if for example this:

		iov[i][TCP_IOV_VNET].iov_base = &vnet_len[i];

causes a prefetch of everything pointed to by iov[i][...], so we would
prefetch (and throw away) each buffer, one by one.
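
If that's what happens, perf(1) should show it as a jump in cache
misses. Something like this while iperf3 runs (untested, and the
event names depend on the PMU, so adjust as needed) would let us
compare the two versions:

	perf stat -e L1-dcache-load-misses,LLC-load-misses,cycles \
		  -p "$(pgrep passt)" -- sleep 30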

Another interesting experiment to verify if this is the case could be
to "flush" a few frames at a time (say, 4), with something like this on
top of your original change (completely untested):

		[...]

		if (!((i + 1) % 4) &&
		    !tap_send_frames_passt(c, iov[i - 3], TCP_IOV_NUM * 4))
			break;
	}

	if (i % 4) {
		tap_send_frames_passt(c, iov[i - i % 4],
				      TCP_IOV_NUM * (i % 4));
	}
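
Spelled out as a whole function on top of your vnet_len[] array
(equally untested, relying on the same definitions as your patch,
just to show what I mean):

	static size_t tap_send_iov_passt(const struct ctx *c,
					 struct iovec iov[][TCP_IOV_NUM],
					 size_t n)
	{
		uint32_t vnet_len[n];
		unsigned int i;

		for (i = 0; i < n; i++) {
			int j;

			vnet_len[i] = 0;
			for (j = TCP_IOV_ETH; j < TCP_IOV_NUM; j++)
				vnet_len[i] += iov[i][j].iov_len;

			vnet_len[i] = htonl(vnet_len[i]);
			iov[i][TCP_IOV_VNET].iov_base = &vnet_len[i];
			iov[i][TCP_IOV_VNET].iov_len = sizeof(uint32_t);

			/* Flush batches of 4 frames as they complete */
			if (!((i + 1) % 4) &&
			    !tap_send_frames_passt(c, iov[i - 3],
						   TCP_IOV_NUM * 4))
				return i - 3;
		}

		/* Leftover frames, if n isn't a multiple of 4 */
		if (i % 4)
			tap_send_frames_passt(c, iov[i - i % 4],
					      TCP_IOV_NUM * (i % 4));

		return i;
	}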

Or maybe we could set vnet_len right after we receive data in the
buffers.
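
That is, something like this, with hypothetical names (tcp_frame and
tcp_frame_set_vnet_len() don't exist anywhere, it's just a sketch of
the layout): keep the vnet_len word next to the iovec array describing
its frame, and fill it in while the frame data is still hot in cache:

	struct tcp_frame {
		uint32_t vnet_len;	/* stored big-endian at fill time */
		struct iovec iov[TCP_IOV_NUM];
	};

	static void tcp_frame_set_vnet_len(struct tcp_frame *f)
	{
		uint32_t len = 0;
		int j;

		for (j = TCP_IOV_ETH; j < TCP_IOV_NUM; j++)
			len += f->iov[j].iov_len;

		f->vnet_len = htonl(len);
		f->iov[TCP_IOV_VNET].iov_base = &f->vnet_len;
		f->iov[TCP_IOV_VNET].iov_len = sizeof(f->vnet_len);
	}

This way, tap_send_iov_passt() wouldn't need to touch vnet_len at all
right before the sendmsg().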

-- 
Stefano


Thread overview: 13+ messages
2024-03-11 13:33 [RFC] tcp: Replace TCP buffer structure by an iovec array Laurent Vivier
2024-03-12 22:56 ` Stefano Brivio
2024-03-13 11:37 ` Stefano Brivio
2024-03-13 14:42   ` Laurent Vivier
2024-03-13 15:27     ` Stefano Brivio
2024-03-13 15:20   ` Laurent Vivier
2024-03-13 16:58     ` Stefano Brivio
2024-03-14 14:07   ` Laurent Vivier
2024-03-14 15:47     ` Stefano Brivio
2024-03-14 15:54       ` Laurent Vivier
2024-03-14 16:26         ` Stefano Brivio [this message]
2024-03-15  0:46           ` David Gibson
2024-03-14  4:22 ` David Gibson
