From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 10 Feb 2026 14:36:52 +0100
Subject: Re: checksum: add VSX fast path for POWER8/POWER9
From: Cédric Le Goater <clg@kaod.org>
To: jfiusdq, Laurent Vivier
CC: passt-dev@passt.top
References: <5LdZey3mMxwwzujKCRhh-ZEiJi9nQZTX4Q9YabzgvpDN3XkjpaDXWgfUiS7ig1SlB2HCy7ecA1V6bx5F1rhdpuoElNFGZ7i0Z9TccLSms7U=@proton.me> <00b3e4a1-005b-46a5-b6ce-bf444be9b72d@redhat.com>
List-Id: Development discussion and patches for passt

Hi,

On 2/7/26 23:31, jfiusdq wrote:
> Microbenchmark of the checksum function vs C version at different buffer sizes:
>
> Results (GB/s, higher is better; speedup = VSX / scalar):
>
> 64B: VSX 4.61 vs scalar 5.91 -> 0.78x (VSX slower for tiny buffers)
> 256B: VSX 10.91 vs scalar 7.57 -> 1.44x
> 1500B: VSX 13.88 vs scalar 6.89 -> 2.02x
> 16KB: VSX 14.53 vs scalar 6.96 -> 2.09x
> 64KB: VSX 15.15 vs scalar 6.85 -> 2.21x

Could you please share the microbenchmark?

Thanks,

C.

> On Friday, February 6th, 2026 at 3:17 PM, Laurent Vivier wrote:
>
>> On Thu, 05 Feb 2026 06:14:40 +0000, jfiusdq <jfiusdq@proton.me> wrote:
>>
>>> Tested with podman on Debian 13 for a while and works ok. It's
>>> difficult to run all the tests on POWER but 505-networking-pasta.bats
>>> test suite passes.
>>> ---
>>>  checksum.c | 110 +++++++++++++++++++++++++++++++++++++++++++++++++++--
>>>  1 file changed, 107 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/checksum.c b/checksum.c
>>> index 0c3837c..828f9ec 100644
>>> --- a/checksum.c
>>> +++ b/checksum.c
>>> @@ -281,7 +281,7 @@ void csum_icmp6(struct icmp6hdr *icmp6hr,
>>>  	icmp6hr->icmp6_cksum = csum(payload, dlen, psum);
>>>  }
>>>
>>> -#ifdef __AVX2__
>>> +#if defined(__AVX2__)
>>>  #include
>>>
>>>  /**
>>> @@ -479,7 +479,111 @@ uint32_t csum_unfolded(const void *buf, size_t len, uint32_t init)
>>>
>>>  	return init;
>>>  }
>>> -#else /* __AVX2__ */
>>> +#elif defined(__POWER9_VECTOR__) || defined(__POWER8_VECTOR__)
>>> +#include
>>> +
>>> +/*
>>> + * csum_vsx() - Compute 32-bit checksum using VSX SIMD instructions
>>> + * @buf:	Input buffer
>>> + * @len:	Input length
>>> + * @init:	Initial 32-bit checksum, 0 for no pre-computed checksum
>>> + *
>>> + * Return: 32-bit checksum, not complemented, not folded
>>> + */
>>> +/* NOLINTNEXTLINE(clang-diagnostic-unknown-attributes) */
>>> +__attribute__((optimize("-fno-strict-aliasing")))	/* See csum_16b() */
>>> +static uint32_t csum_vsx(const void *buf, size_t len, uint32_t init)
>>> +{
>>> +	const uint8_t *p = buf;
>>> +	vector unsigned int sum_even = vec_splat_u32(0);
>>> +	vector unsigned int sum_odd = vec_splat_u32(0);
>>> +	const vector unsigned short ones = vec_splat_u16(1);
>>> +	uint64_t sum64 = init;
>>> +
>>> +#ifdef __POWER9_VECTOR__
>>> +	while (len >= 64) {
>>> +		vector unsigned char v0b = vec_vsx_ld(0, p);
>>> +		vector unsigned char v1b = vec_vsx_ld(16, p);
>>> +		vector unsigned char v2b = vec_vsx_ld(32, p);
>>> +		vector unsigned char v3b = vec_vsx_ld(48, p);
>>> +		vector unsigned short v0 = (vector unsigned short)v0b;
>>> +		vector unsigned short v1 = (vector unsigned short)v1b;
>>> +		vector unsigned short v2 = (vector unsigned short)v2b;
>>> +		vector unsigned short v3 = (vector unsigned short)v3b;
>>> +
>>> +		sum_even = vec_add(sum_even, vec_mule(v0, ones));
>>> +		sum_odd = vec_add(sum_odd, vec_mulo(v0, ones));
>>> +		sum_even = vec_add(sum_even, vec_mule(v1, ones));
>>> +		sum_odd = vec_add(sum_odd, vec_mulo(v1, ones));
>>> +		sum_even = vec_add(sum_even, vec_mule(v2, ones));
>>> +		sum_odd = vec_add(sum_odd, vec_mulo(v2, ones));
>>> +		sum_even = vec_add(sum_even, vec_mule(v3, ones));
>>> +		sum_odd = vec_add(sum_odd, vec_mulo(v3, ones));
>>> +
>>> +		p += 64;
>>> +		len -= 64;
>>> +	}
>>> +#endif
>>> +
>>> +	while (len >= 32) {
>>> +		vector unsigned char v0b = vec_vsx_ld(0, p);
>>> +		vector unsigned char v1b = vec_vsx_ld(16, p);
>>> +		vector unsigned short v0 = (vector unsigned short)v0b;
>>> +		vector unsigned short v1 = (vector unsigned short)v1b;
>>> +
>>> +		sum_even = vec_add(sum_even, vec_mule(v0, ones));
>>> +		sum_odd = vec_add(sum_odd, vec_mulo(v0, ones));
>>> +		sum_even = vec_add(sum_even, vec_mule(v1, ones));
>>> +		sum_odd = vec_add(sum_odd, vec_mulo(v1, ones));
>>> +
>>> +		p += 32;
>>> +		len -= 32;
>>> +	}
>>> +
>>> +	while (len >= 16) {
>>> +		vector unsigned char v0b = vec_vsx_ld(0, p);
>>> +		vector unsigned short v0 = (vector unsigned short)v0b;
>>> +
>>> +		sum_even = vec_add(sum_even, vec_mule(v0, ones));
>>> +		sum_odd = vec_add(sum_odd, vec_mulo(v0, ones));
>>> +
>>> +		p += 16;
>>> +		len -= 16;
>>> +	}
>>> +
>>> +	{
>>> +		vector unsigned int sum32 = vec_add(sum_even, sum_odd);
>>> +		uint32_t partial[4] __attribute__((aligned(16)));
>>> +
>>> +		vec_st(sum32, 0, partial);
>>> +		sum64 += (uint64_t)partial[0] + partial[1] +
>>> +			 partial[2] + partial[3];
>>> +	}
>>> +
>>> +	sum64 += sum_16b(p, len);
>>> +
>>> +	sum64 = (sum64 >> 32) + (sum64 & 0xffffffff);
>>> +	sum64 += sum64 >> 32;
>>> +
>>> +	return (uint32_t)sum64;
>>> +}
>>> +
>>> +/*
>>> + * csum_unfolded() - Calculate the unfolded checksum of a data buffer.
>>> + *
>>> + * @buf:	Input buffer
>>> + * @len:	Input length
>>> + * @init:	Initial 32-bit checksum, 0 for no pre-computed checksum
>>> + *
>>> + * Return: 32-bit unfolded checksum
>>> + */
>>> +/* NOLINTNEXTLINE(clang-diagnostic-unknown-attributes) */
>>> +__attribute__((optimize("-fno-strict-aliasing")))	/* See csum_16b() */
>>> +uint32_t csum_unfolded(const void *buf, size_t len, uint32_t init)
>>> +{
>>> +	return csum_vsx(buf, len, init);
>>> +}
>>> +#else /* !__AVX2__ && !__POWER9_VECTOR__ && !__POWER8_VECTOR__ */
>>>  /*
>>>   * csum_unfolded() - Calculate the unfolded checksum of a data buffer.
>>>   *
>>> @@ -495,7 +599,7 @@ uint32_t csum_unfolded(const void *buf, size_t len, uint32_t init)
>>>  {
>>>  	return sum_16b(buf, len) + init;
>>>  }
>>> -#endif /* !__AVX2__ */
>>> +#endif /* !__AVX2__ && !__POWER9_VECTOR__ && !__POWER8_VECTOR__ */
>>>
>>>  /**
>>>   * csum_iov_tail() - Calculate unfolded checksum for the tail of an IO vector
>>> --
>>> 2.52.0
>>
>> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
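
For readers following the thread, the arithmetic in the quoted csum_vsx() can be modelled in portable C. The patch adds the even and odd 16-bit lanes into separate accumulators (vec_mule()/vec_mulo() against a vector of ones), sums them into a 64-bit total, and folds that to 32 bits with an end-around carry. The sketch below is only an illustration of that scheme, not the patch and not the requested microbenchmark. The names load16(), csum_model(), fold16_model() and csum_model_selftest() are hypothetical, the four vector lanes are collapsed into two scalar accumulators, and the tail handling (a trailing odd byte padded with zero in memory order) is an assumption rather than a copy of sum_16b().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: load one 16-bit word in memory order, alignment-safe */
static uint16_t load16(const uint8_t *p)
{
	uint16_t w;

	memcpy(&w, p, sizeof(w));
	return w;
}

/* Hypothetical scalar model of the even/odd accumulation in csum_vsx() */
static uint32_t csum_model(const void *buf, size_t len, uint32_t init)
{
	const uint8_t *p = buf;
	uint64_t sum_even = 0, sum_odd = 0, sum64 = init;

	/* Alternate 16-bit words between two accumulators */
	while (len >= 4) {
		sum_even += load16(p);
		sum_odd += load16(p + 2);
		p += 4;
		len -= 4;
	}

	/* Assumed tail handling: one remaining word, then one padded byte */
	if (len >= 2) {
		sum_even += load16(p);
		p += 2;
		len -= 2;
	}
	if (len) {
		uint16_t w = 0;

		memcpy(&w, p, 1);	/* zero pad follows the byte in memory */
		sum_even += w;
	}

	sum64 += sum_even + sum_odd;

	/* Fold 64 bits to 32 with end-around carry, as in the patch */
	sum64 = (sum64 >> 32) + (sum64 & 0xffffffff);
	sum64 += sum64 >> 32;

	return (uint32_t)sum64;
}

/* Hypothetical helper: fold a 32-bit sum to 16 bits, not complemented */
static uint16_t fold16_model(uint32_t sum)
{
	sum = (sum >> 16) + (sum & 0xffff);
	sum += sum >> 16;

	return (uint16_t)sum;
}

/* Check against the worked example in RFC 1071: bytes 00 01 f2 03 f4 f5
 * f6 f7 sum to dd f2 (in network byte order) before complementing.
 */
static void csum_model_selftest(void)
{
	static const uint8_t data[8] = { 0x00, 0x01, 0xf2, 0x03,
					 0xf4, 0xf5, 0xf6, 0xf7 };
	uint16_t folded = fold16_model(csum_model(data, sizeof(data), 0));
	uint8_t out[2];

	memcpy(out, &folded, sizeof(out));
	assert(out[0] == 0xdd && out[1] == 0xf2);
}
```

Comparing the folded result byte by byte in memory keeps the check independent of host endianness, since the one's-complement sum of byte-swapped words is the byte-swapped sum.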