From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from gandalf.ozlabs.org (gandalf.ozlabs.org [150.107.74.76]) by passt.top (Postfix) with ESMTPS id 04E245A0271 for ; Fri, 15 Mar 2024 01:47:13 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202312; t=1710463626; bh=UlZtArojAYAcqrMvKsBuyMRm+vr4UYEy6FiuNTXi2FI=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=awogHdLxNFqPo/CXEsIo+UewhrIq+MPktXVsr7zEcsERQuYpA9+yTNlzadv0FDQYZ deYDZwXvyQAFXb4Xqgzo6uu87Gbv4qyxKZfstbYW7ud8yDwywR+E/IkFlk73y8LN8X hXy8dceu7PDcj4rEIS90qkHzdOIA+WMjK5Xwl+ermrC0fcM144bXtko3RYE0neMT5K bFEkhz/6rWnTi9xZ9P8yF7s2AKqA4lHORobYSqI+V8j8JNSuAnIRTCwM5SA0tqYY/3 PARWtif/c+2NMvcIGSV7vhMSplcag/IHHV8cysOc+aUQQHie5R1pXg6JA+BvFsEf5r 9xldOX6EZ3ppg== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4TwlvG4wlxz4wcK; Fri, 15 Mar 2024 11:47:06 +1100 (AEDT) Date: Fri, 15 Mar 2024 10:46:55 +1000 From: David Gibson To: Stefano Brivio Subject: Re: [RFC] tcp: Replace TCP buffer structure by an iovec array Message-ID: References: <20240311133356.1405001-1-lvivier@redhat.com> <20240313123725.7a37f311@elisabeth> <84cadd0b-4102-4bde-bad6-45705cca34ce@redhat.com> <20240314164707.75ee6501@elisabeth> <893d5b17-cb92-49bf-8752-7ba1d798ceeb@redhat.com> <20240314172617.22c28caa@elisabeth> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="Vi1lSvuE1bqpC9EL" Content-Disposition: inline In-Reply-To: <20240314172617.22c28caa@elisabeth> Message-ID-Hash: LVD3W75BK2R7ZWJKJ2RKDXW4JIR2Y42W X-Message-ID-Hash: LVD3W75BK2R7ZWJKJ2RKDXW4JIR2Y42W X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: Laurent Vivier , passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --Vi1lSvuE1bqpC9EL Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Mar 14, 2024 at 05:26:17PM +0100, Stefano Brivio wrote: > On Thu, 14 Mar 2024 16:54:02 +0100 > Laurent Vivier wrote: >=20 > > On 3/14/24 16:47, Stefano Brivio wrote: > > > On Thu, 14 Mar 2024 15:07:48 +0100 > > > Laurent Vivier wrote: > > > =20 > > >> On 3/13/24 12:37, Stefano Brivio wrote: > > >> ... =20 > > >>>> @@ -390,6 +414,42 @@ static size_t tap_send_frames_passt(const str= uct ctx *c, > > >>>> return i; > > >>>> } > > >>>> =20 > > >>>> +/** > > >>>> + * tap_send_iov_passt() - Send out multiple prepared frames =20 > > >>> > > >>> ...I would argue that this function prepares frames as well. Maybe: > > >>> > > >>> * tap_send_iov_passt() - Prepare TCP_IOV_VNET parts and send mul= tiple frames > > >>> =20 > > >>>> + * @c: Execution context > > >>>> + * @iov: Array of frames, each frames is divided in an array of i= ovecs. > > >>>> + * The first entry of the iovec is updated to point = to an > > >>>> + * uint32_t storing the frame length. =20 > > >>> > > >>> * @iov: Array of frames, each one a vector of parts, TCP_IOV_VNE= T blank > > >>> =20 > > >>>> + * @n: Number of frames in @iov > > >>>> + * > > >>>> + * Return: number of frames actually sent > > >>>> + */ > > >>>> +static size_t tap_send_iov_passt(const struct ctx *c, > > >>>> + struct iovec iov[][TCP_IOV_NUM], > > >>>> + size_t n) > > >>>> +{ > > >>>> + unsigned int i; > > >>>> + > > >>>> + for (i =3D 0; i < n; i++) { > > >>>> + uint32_t vnet_len; > > >>>> + int j; > > >>>> + > > >>>> + vnet_len =3D 0; =20 > > >>> > > >>> This could be initialised in the declaration (yes, it's "reset" at > > >>> every loop iteration). > > >>> =20 > > >>>> + for (j =3D TCP_IOV_ETH; j < TCP_IOV_NUM; j++) > > >>>> + vnet_len +=3D iov[i][j].iov_len; > > >>>> + > > >>>> + vnet_len =3D htonl(vnet_len); > > >>>> + iov[i][TCP_IOV_VNET].iov_base =3D &vnet_len; > > >>>> + iov[i][TCP_IOV_VNET].iov_len =3D sizeof(vnet_len); > > >>>> + > > >>>> + if (!tap_send_frames_passt(c, iov[i], TCP_IOV_NUM)) =20 > > >>> > > >>> ...which would now send a single frame at a time, but actually it c= an > > >>> already send everything in one shot because it's using sendmsg(), i= f you > > >>> move it outside of the loop and do something like (untested): > > >>> > > >>> return tap_send_frames_passt(c, iov, TCP_IOV_NUM * n); > > >>> =20 > > >>>> + break; > > >>>> + } > > >>>> + > > >>>> + return i; > > >>>> + > > >>>> +} > > >>>> + =20 > > >> > > >> I tried to do something like that but I have a performance drop: > > >> > > >> static size_t tap_send_iov_passt(const struct ctx *c, > > >> struct iovec iov[][TCP_IOV_NUM], > > >> size_t n) > > >> { > > >> unsigned int i; > > >> uint32_t vnet_len[n]; > > >> > > >> for (i =3D 0; i < n; i++) { > > >> int j; > > >> > > >> vnet_len[i] =3D 0; > > >> for (j =3D TCP_IOV_ETH; j < TCP_IOV_NUM; j++) > > >> vnet_len[i] +=3D iov[i][j].iov_len; > > >> > > >> vnet_len[i] =3D htonl(vnet_len[i]); > > >> iov[i][TCP_IOV_VNET].iov_base =3D &vnet_len[i]; > > >> iov[i][TCP_IOV_VNET].iov_len =3D sizeof(uint32_t); > > >> } > > >> > > >> return tap_send_frames_passt(c, &iov[0][0], TCP_IOV_NUM * = n) / TCP_IOV_NUM; > > >> } > > >> > > >> iperf3 -c localhost -p 10001 -t 60 -4 > > >> > > >> berfore > > >> [ ID] Interval Transfer Bitrate Retr > > >> [ 5] 0.00-60.00 sec 33.0 GBytes 4.72 Gbits/sec 1 = sender > > >> [ 5] 0.00-60.06 sec 33.0 GBytes 4.72 Gbits/sec = receiver > > >> > > >> after: > > >> [ ID] Interval Transfer Bitrate Retr > > >> [ 5] 0.00-60.00 sec 18.2 GBytes 2.60 Gbits/sec 0 = sender > > >> [ 5] 0.00-60.07 sec 18.2 GBytes 2.60 Gbits/sec = receiver =20 > > >=20 > > > Weird, it looks like doing one sendmsg() per frame results in a higher > > > throughput than one sendmsg() per multiple frames, which sounds rather > > > absurd. Perhaps we should start looking into what perf(1) reports, in > > > terms of both syscall overhead and cache misses. > > >=20 > > > I'll have a look later today or tomorrow -- unless you have other > > > ideas as to why this might happen... > >=20 > > Perhaps in first case we only update one vnet_len and in the second cas= e we have to update=20 > > an array of vnet_len, so there is an use of more cache lines? We should be able to test this relatively easily, yes? By updating all the vnet_len then using a single sendmsg(). > Yes, I'm wondering if for example this: >=20 > iov[i][TCP_IOV_VNET].iov_base =3D &vnet_len[i]; >=20 > causes a prefetch of everything pointed by iov[i][...], so we would > prefetch (and throw away) each buffer, one by one. >=20 > Another interesting experiment to verify if this is the case could be > to "flush" a few frames at a time (say, 4), with something like this on > top of your original change (completely untested): >=20 > [...] >=20 > if (!((i + 1) % 4) && > !tap_send_frames_passt(c, iov[i / 4], TCP_IOV_NUM * 4)) > break; > } >=20 > if ((i + 1) % 4) { > tap_send_frames_passt(c, iov[i / 4], > TCP_IOV_NUM * ((i + 1) % 4)); > } >=20 > Or maybe we could set vnet_len right after we receive data in the > buffers. I really hope we can avoid this. If we want to allow IPv4<->IPv6 translation, then we can't know the vnet_len until after we've done our routing / translation. --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --Vi1lSvuE1bqpC9EL Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmXzmlwACgkQzQJF27ox 2GegZxAAiuCMu3kuMTueYk/8zZaRaUJPpoTlTdhO4xnx/qyCsHGiny8hxhlApwFY kLCP9DFZ+iPsLa1R/o1H1NAmTM7IaKIadTxzz42aeO+DcE1FNKrAVk0J8OUFM9o0 +CGajCTmbEOyOspcLHLu686yNUKqK2s+6gaiitqLqpUBvu9+cVhwumOsSXXkaaza +V8fX5zcQYwi0V9KI/j1WPoTM3AWuU1NV4d8e8Wk+usWpxM9Bwp47WgAVM8h860b RvogJRNEUkgTjcyFdTGIz1rxRwlkBZ28OYxWPvUslA51P3HIg1uNSRxjbLBP7N/o WMfNlc6DOu732hRQ5G9ZnbYoAYxJnzjJ6PMUQ0kr9lMqgUAGTvhjO2SR25FiKSKK lAysEuz2b8Ug9t7WdWYUm7YW7UoVYEGIyXp7XPN1+AHCPLhnCDFMJOX3SK8Gg1A+ DTqoY6Ith6i/KtZ8DyMAi1YMmPQ9D9gZu+XdQM87pHdoWJw9vfVeUuwOhZsIYkEd y8fPCNg//oynHfy4ET5Bzqk8Uvvr4KEGqHDcmyGmy+fZMw7sdGJHi0MtvgjifOKp qmdBHAqzb/Dl8yMxz5hjZkL54o7F7jChd3X9CJ6Qlt3XyQEYBMxith3t5xrT+ZmN HYIS0vMkPLBQlJ4N7y5LEB7PJW0l9B2NXav7i3inzsQrUyb2+zc= =L3Jg -----END PGP SIGNATURE----- --Vi1lSvuE1bqpC9EL--