public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: passt-dev@passt.top
Subject: Re: [PATCH 4/8] udp: Receive multiple datagrams at once on the pasta sock->tap path
Date: Thu, 22 Dec 2022 03:37:00 +0100	[thread overview]
Message-ID: <20221222033700.3ca2b46e@elisabeth> (raw)
In-Reply-To: <Y6Kg+DJ3YR7K2LXR@yekko>

On Wed, 21 Dec 2022 17:00:24 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Tue, Dec 20, 2022 at 11:42:46AM +0100, Stefano Brivio wrote:
> > Sorry for the further delay,
> > 
> > On Wed, 14 Dec 2022 11:35:46 +0100
> > Stefano Brivio <sbrivio@redhat.com> wrote:
> >   
> > > On Wed, 14 Dec 2022 12:42:14 +1100
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > >   
> > > > On Tue, Dec 13, 2022 at 11:48:47PM +0100, Stefano Brivio wrote:    
> > > > > Sorry for the long delay here,
> > > > > 
> > > > > On Mon,  5 Dec 2022 19:14:21 +1100
> > > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > >       
> > > > > > Usually udp_sock_handler() will receive and forward multiple (up to 32)
> > > > > > datagrams in udp_sock_handler(), then forward them all to the tap
> > > > > > interface.  For unclear reasons, though, when in pasta mode we will only
> > > > > > receive and forward a single datagram at a time.  Change it to receive
> > > > > > multiple datagrams at once, like the other paths.      
> > > > > 
> > > > > This is explained in the commit message of 6c931118643c ("tcp, udp:
> > > > > Receive batching doesn't pay off when writing single frames to tap").
> > > > > 
> > > > > I think it's worth re-checking the throughput now as this path is a bit
> > > > > different, but unfortunately I didn't include this in the "perf" tests :(
> > > > > because at the time I introduced those I wasn't sure it even made sense to
> > > > > have traffic from the same host being directed to the tap device.
> > > > > 
> > > > > The iperf3 runs were I observed this are actually the ones from the Podman
> > > > > demo. Ideally that case should be also checked in the perf/pasta_udp tests.      
> > > > 
> > > > Hm, ok.
> > > >     
> > > > > How fundamental is this for the rest of the series? I couldn't find any
> > > > > actual dependency on this but I might be missing something.      
> > > > 
> > > > So the issue is that prior to this change in pasta we receive multiple
> > > > frames at once on the splice path, but one frame at a time on the tap
> > > > path.  By the end of this series we can't do that any more, because we
> > > > don't know before the recvmmsg() which one we'll be doing.    
> > > 
> > > Oh, right, I see. Then let me add this path to the perf/pasta_udp test
> > > and check how relevant this is now, I'll get back to you in a bit.  
> > 
> > I was checking the wrong path. With this:
> > 
> > diff --git a/test/perf/pasta_udp b/test/perf/pasta_udp
> > index 27ea724..973c2f4 100644
> > --- a/test/perf/pasta_udp
> > +++ b/test/perf/pasta_udp
> > @@ -31,6 +31,14 @@ report	pasta lo_udp 1 __FREQ__
> >  
> >  th	MTU 1500B 4000B 16384B 65535B
> >  
> > +tr	UDP throughput over IPv6: host to ns
> > +nsout	IFNAME ip -j link show | jq -rM '.[] | select(.link_type == "ether").ifname'
> > +nsout	ADDR6 ip -j -6 addr show|jq -rM '.[] | select(.ifname == "__IFNAME__").addr_info[] | select(.scope == "global" and .prefixlen == 64).local'
> > +bw	-
> > +bw	-
> > +bw	-
> > +iperf3	BW host ns __ADDR6__ 100${i}2 __THREADS__ __TIME__ __OPTS__ -b 15G
> > +bw	__BW__ 7.0 9.0
> >  
> >  tr	UDP throughput over IPv6: ns to host
> >  ns	ip link set dev lo mtu 1500
> > diff --git a/test/run b/test/run
> > index e07513f..b53182b 100755
> > --- a/test/run
> > +++ b/test/run
> > @@ -67,6 +67,14 @@ run() {
> >  	test build/clang_tidy
> >  	teardown build
> >  
> > +	VALGRIND=0
> > +	setup passt_in_ns
> > +	test passt/ndp
> > +	test passt/dhcp
> > +	test perf/pasta_udp
> > +	test passt_in_ns/shutdown
> > +	teardown passt_in_ns
> > +
> >  	setup pasta
> >  	test pasta/ndp
> >  	test pasta/dhcp  
> 
> Ah, ok.  Can we add that to the standard set of tests ASAP, please.

Yes -- that part itself was easy, but now I'm fighting against my own
finest write-only code that generates the JavaScript snippet for the
performance report (perf_fill_lines() in test/lib/perf_report -- and
this is not a suggestion to have a look at it ;)).

I'm trying to rework it a bit together with the "new" test.

> > I get 21.6 gbps after this series, and 29.7 gbps before -- it's quite
> > significant.  
> 
> Drat.
> 
> > And there's nothing strange in perf's output, really, the distribution
> > of overhead per functions is pretty much the same, but writing multiple
> > messages to the tap device just takes more cycles per message compared
> > to a single message.  
> 
> That's so weird.  It should be basically an identical set of write()s,
> except that they happen in a batch, rather than a bit spread out.  I
> guess it has to be some kind of cache locality thing.  I wonder if the
> difference would go away or reverse if we had a way to submit multiple
> frames with a single syscall.

I haven't tried, but to test this, I think we could actually just write
multiple frames in a single call, with subsequent headers and
everything, and the iperf3 server will simply report how many bytes it
received.

> > I'm a bit ashamed to propose this, but do you think about something
> > like:  
> 
> > 	if (c->mode == MODE_PASTA) { if (recvmmsg(ref.r.s, mmh_recv,
> > 		1, 0, NULL) <= 0) return;  
> 
> > 		if (udp_mmh_splice_port(v6, mmh_recv)) { n =
> > 			recvmmsg(ref.r.s, mmh_recv + 1, UDP_MAX_FRAMES
> > 			- 1, 0, NULL); }  
> 
> > 		if (n > 0) n++; else n = 1; } else { n =
> > 			recvmmsg(ref.r.s, mmh_recv, UDP_MAX_FRAMES, 0,
> > 			NULL); if (n <= 0) return; }  
> 
> > ? Other than the inherent ugliness, it looks like a good
> > approximation to me.  
> 
> Hmm.  Well, the first question is how much impact does going 1 message
> at a time have on the spliced throughput.  If it's not too bad, then
> we could just always go one at a time for pasta, regardless of
> splicing.  And we could even abstract that difference into the tap
> backend with a callback like tap_batch_size(c).

Right... it used to be significantly worse in the "spliced" case, I
checked that when I did that commit to use 1 instead of UDP_MAX_FRAME
in the other case, but I don't have data. I'll test this again.

-- 
Stefano


  reply	other threads:[~2022-12-22  2:37 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-05  8:14 [PATCH 0/8] Don't use additional sockets for receiving "spliced" UDP communications David Gibson
2022-12-05  8:14 ` [PATCH 1/8] udp: Move sending pasta tap frames to the end of udp_sock_handler() David Gibson
2022-12-05  8:14 ` [PATCH 2/8] udp: Split sending to passt tap interface into separate function David Gibson
2022-12-05  8:14 ` [PATCH 3/8] udp: Split receive from preparation and send in udp_sock_handler() David Gibson
2022-12-05  8:14 ` [PATCH 4/8] udp: Receive multiple datagrams at once on the pasta sock->tap path David Gibson
2022-12-13 22:48   ` Stefano Brivio
2022-12-14  1:42     ` David Gibson
2022-12-14 10:35       ` Stefano Brivio
2022-12-20 10:42         ` Stefano Brivio
2022-12-21  6:00           ` David Gibson
2022-12-22  2:37             ` Stefano Brivio [this message]
2023-01-04  0:08             ` Stefano Brivio
2023-01-04  4:53               ` David Gibson
2022-12-05  8:14 ` [PATCH 5/8] udp: Pre-populate msg_names with local address David Gibson
2022-12-13 22:48   ` Stefano Brivio
2022-12-14  1:22     ` David Gibson
2022-12-05  8:14 ` [PATCH 6/8] udp: Unify udp_sock_handler_splice() with udp_sock_handler() David Gibson
2022-12-13 22:48   ` Stefano Brivio
2022-12-14  1:19     ` David Gibson
2022-12-14 10:35       ` Stefano Brivio
2022-12-05  8:14 ` [PATCH 7/8] udp: Decide whether to "splice" per datagram rather than per socket David Gibson
2022-12-13 22:49   ` Stefano Brivio
2022-12-14  1:47     ` David Gibson
2022-12-14 10:35       ` Stefano Brivio
2022-12-15  0:33         ` David Gibson
2022-12-05  8:14 ` [PATCH 8/8] udp: Don't use separate sockets to listen for spliced packets David Gibson
2022-12-06  6:45 ` [PATCH 0/8] Don't use additional sockets for receiving "spliced" UDP communications Stefano Brivio
2022-12-06  6:46   ` Stefano Brivio

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20221222033700.3ca2b46e@elisabeth \
    --to=sbrivio@redhat.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=passt-dev@passt.top \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).