Date: Sat, 07 Feb 2026 22:31:29 +0000
From: jfiusdq <jfiusdq@proton.me>
To: Laurent Vivier <lvivier@redhat.com>
Cc: passt-dev@passt.top
Subject: Re: checksum: add VSX fast path for POWER8/POWER9
In-Reply-To: <00b3e4a1-005b-46a5-b6ce-bf444be9b72d@redhat.com>
References: <5LdZey3mMxwwzujKCRhh-ZEiJi9nQZTX4Q9YabzgvpDN3XkjpaDXWgfUiS7ig1SlB2HCy7ecA1V6bx5F1rhdpuoElNFGZ7i0Z9TccLSms7U=@proton.me> <00b3e4a1-005b-46a5-b6ce-bf444be9b72d@redhat.com>
List-Id: Development discussion and patches for passt

Microbenchmark of the checksum function vs C version at different buffer sizes:

Results (GB/s, higher is better; speedup = VSX / scalar):

   64B: VSX  4.61 vs scalar 5.91 -> 0.78x (VSX slower for tiny buffers)
  256B: VSX 10.91 vs scalar 7.57 -> 1.44x
 1500B: VSX 13.88 vs scalar 6.89 -> 2.02x
  16KB: VSX 14.53 vs scalar 6.96 -> 2.09x
  64KB: VSX 15.15 vs scalar 6.85 -> 2.21x

On Friday, February 6th, 2026 at 3:17 PM, Laurent Vivier wrote:

>
>
> On Thu, 05 Feb 2026 06:14:40 +0000, jfiusdq <jfiusdq@proton.me> wrote:
>
> > Tested with podman on Debian 13 for a while and works ok. It's
> > difficult to run all the tests on POWER but 505-networking-pasta.bats
> > test suite passes.
> > ---
> >  checksum.c | 110 +++++++++++++++++++++++++++++++++++++++++++++++++++--
> >  1 file changed, 107 insertions(+), 3 deletions(-)
> >
> > diff --git a/checksum.c b/checksum.c
> > index 0c3837c..828f9ec 100644
> > --- a/checksum.c
> > +++ b/checksum.c
> > @@ -281,7 +281,7 @@ void csum_icmp6(struct icmp6hdr *icmp6hr,
> >  	icmp6hr->icmp6_cksum = csum(payload, dlen, psum);
> >  }
> >
> > -#ifdef __AVX2__
> > +#if defined(__AVX2__)
> >  #include <immintrin.h>
> >
> >  /**
> > @@ -479,7 +479,111 @@ uint32_t csum_unfolded(const void *buf, size_t len, uint32_t init)
> >
> >  	return init;
> >  }
> > -#else /* __AVX2__ */
> > +#elif defined(__POWER9_VECTOR__) || defined(__POWER8_VECTOR__)
> > +#include <altivec.h>
> > +
> > +/*
> > + * csum_vsx() - Compute 32-bit checksum using VSX SIMD instructions
> > + * @buf:	Input buffer
> > + * @len:	Input length
> > + * @init:	Initial 32-bit checksum, 0 for no pre-computed checksum
> > + *
> > + * Return: 32-bit checksum, not complemented, not folded
> > + */
> > +/* NOLINTNEXTLINE(clang-diagnostic-unknown-attributes) */
> > +__attribute__((optimize("-fno-strict-aliasing")))	/* See csum_16b() */
> > +static uint32_t csum_vsx(const void *buf, size_t len, uint32_t init)
> > +{
> > +	const uint8_t *p = buf;
> > +	vector unsigned int sum_even = vec_splat_u32(0);
> > +	vector unsigned int sum_odd = vec_splat_u32(0);
> > +	const vector unsigned short ones = vec_splat_u16(1);
> > +	uint64_t sum64 = init;
> > +
> > +#ifdef __POWER9_VECTOR__
> > +	while (len >= 64) {
> > +		vector unsigned char v0b = vec_vsx_ld(0, p);
> > +		vector unsigned char v1b = vec_vsx_ld(16, p);
> > +		vector unsigned char v2b = vec_vsx_ld(32, p);
> > +		vector unsigned char v3b = vec_vsx_ld(48, p);
> > +		vector unsigned short v0 = (vector unsigned short)v0b;
> > +		vector unsigned short v1 = (vector unsigned short)v1b;
> > +		vector unsigned short v2 = (vector unsigned short)v2b;
> > +		vector unsigned short v3 = (vector unsigned short)v3b;
> > +
> > +		sum_even = vec_add(sum_even, vec_mule(v0, ones));
> > +		sum_odd = vec_add(sum_odd, vec_mulo(v0, ones));
> > +		sum_even = vec_add(sum_even, vec_mule(v1, ones));
> > +		sum_odd = vec_add(sum_odd, vec_mulo(v1, ones));
> > +		sum_even = vec_add(sum_even, vec_mule(v2, ones));
> > +		sum_odd = vec_add(sum_odd, vec_mulo(v2, ones));
> > +		sum_even = vec_add(sum_even, vec_mule(v3, ones));
> > +		sum_odd = vec_add(sum_odd, vec_mulo(v3, ones));
> > +
> > +		p += 64;
> > +		len -= 64;
> > +	}
> > +#endif
> > +
> > +	while (len >= 32) {
> > +		vector unsigned char v0b = vec_vsx_ld(0, p);
> > +		vector unsigned char v1b = vec_vsx_ld(16, p);
> > +		vector unsigned short v0 = (vector unsigned short)v0b;
> > +		vector unsigned short v1 = (vector unsigned short)v1b;
> > +
> > +		sum_even = vec_add(sum_even, vec_mule(v0, ones));
> > +		sum_odd = vec_add(sum_odd, vec_mulo(v0, ones));
> > +		sum_even = vec_add(sum_even, vec_mule(v1, ones));
> > +		sum_odd = vec_add(sum_odd, vec_mulo(v1, ones));
> > +
> > +		p += 32;
> > +		len -= 32;
> > +	}
> > +
> > +	while (len >= 16) {
> > +		vector unsigned char v0b = vec_vsx_ld(0, p);
> > +		vector unsigned short v0 = (vector unsigned short)v0b;
> > +
> > +		sum_even = vec_add(sum_even, vec_mule(v0, ones));
> > +		sum_odd = vec_add(sum_odd, vec_mulo(v0, ones));
> > +
> > +		p += 16;
> > +		len -= 16;
> > +	}
> > +
> > +	{
> > +		vector unsigned int sum32 = vec_add(sum_even, sum_odd);
> > +		uint32_t partial[4] __attribute__((aligned(16)));
> > +
> > +		vec_st(sum32, 0, partial);
> > +		sum64 += (uint64_t)partial[0] + partial[1] +
> > +			 partial[2] + partial[3];
> > +	}
> > +
> > +	sum64 += sum_16b(p, len);
> > +
> > +	sum64 = (sum64 >> 32) + (sum64 & 0xffffffff);
> > +	sum64 += sum64 >> 32;
> > +
> > +	return (uint32_t)sum64;
> > +}
> > +
> > +/**
> > + * csum_unfolded() - Calculate the unfolded checksum of a data buffer.
> > + *
> > + * @buf:	Input buffer
> > + * @len:	Input length
> > + * @init:	Initial 32-bit checksum, 0 for no pre-computed checksum
> > + *
> > + * Return: 32-bit unfolded checksum
> > + */
> > +/* NOLINTNEXTLINE(clang-diagnostic-unknown-attributes) */
> > +__attribute__((optimize("-fno-strict-aliasing")))	/* See csum_16b() */
> > +uint32_t csum_unfolded(const void *buf, size_t len, uint32_t init)
> > +{
> > +	return csum_vsx(buf, len, init);
> > +}
> > +#else /* !__AVX2__ && !__POWER9_VECTOR__ && !__POWER8_VECTOR__ */
> >  /**
> >   * csum_unfolded() - Calculate the unfolded checksum of a data buffer.
> >   *
> > @@ -495,7 +599,7 @@ uint32_t csum_unfolded(const void *buf, size_t len, uint32_t init)
> >  {
> >  	return sum_16b(buf, len) + init;
> >  }
> > -#endif /* !__AVX2__ */
> > +#endif /* !__AVX2__ && !__POWER9_VECTOR__ && !__POWER8_VECTOR__ */
> >
> >  /**
> >   * csum_iov_tail() - Calculate unfolded checksum for the tail of an IO vector
> > --
> > 2.52.0
>
>
> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
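
For anyone who wants to cross-check the VSX path on a machine without running the full test suite, here is a minimal scalar sketch of the same arithmetic: add up 16-bit words in native byte order into a 64-bit accumulator, then fold to 32 bits the way csum_vsx() does at the end. The names csum_unfolded_ref() and csum_fold_ref() are hypothetical and not part of the patch or of checksum.c, and the handling of an odd trailing byte (padded with a zero byte) is an assumption about what sum_16b() does for the tail.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reference, not part of the patch: sum 16-bit words in
 * native byte order into a 64-bit accumulator, then fold to 32 bits.
 */
uint32_t csum_unfolded_ref(const void *buf, size_t len, uint32_t init)
{
	const uint8_t *p = buf;
	uint64_t sum64 = init;
	uint16_t word;

	while (len >= 2) {
		/* memcpy() keeps the load alignment- and aliasing-safe */
		memcpy(&word, p, sizeof(word));
		sum64 += word;
		p += 2;
		len -= 2;
	}

	if (len) {
		/* Assumption: an odd trailing byte is padded with zero */
		word = 0;
		memcpy(&word, p, 1);
		sum64 += word;
	}

	/* Same fold as at the end of csum_vsx(): add the high half back */
	sum64 = (sum64 >> 32) + (sum64 & 0xffffffff);
	sum64 += sum64 >> 32;

	return (uint32_t)sum64;
}

/* Hypothetical helper: fold a 32-bit unfolded sum down to 16 bits */
uint16_t csum_fold_ref(uint32_t sum)
{
	sum = (sum >> 16) + (sum & 0xffff);
	sum += sum >> 16;

	return (uint16_t)sum;
}
```

Folding both results to 16 bits and comparing them over random buffers of every length from 0 to a few hundred bytes should exercise the 64-, 32- and 16-byte loops as well as the scalar tail. As a fixed vector, the RFC 1071 example bytes 00 01 f2 03 f4 f5 f6 f7 should fold to the bytes dd f2 in memory on either endianness.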