From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <00b3e4a1-005b-46a5-b6ce-bf444be9b72d@redhat.com>
Date: Fri, 6 Feb 2026 16:17:49 +0100
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
From: Laurent Vivier <lvivier@redhat.com>
Subject: Re: checksum: add VSX fast path for POWER8/POWER9
To: jfiusdq, passt-dev@passt.top
References: <5LdZey3mMxwwzujKCRhh-ZEiJi9nQZTX4Q9YabzgvpDN3XkjpaDXWgfUiS7ig1SlB2HCy7ecA1V6bx5F1rhdpuoElNFGZ7i0Z9TccLSms7U=@proton.me>
In-Reply-To: <5LdZey3mMxwwzujKCRhh-ZEiJi9nQZTX4Q9YabzgvpDN3XkjpaDXWgfUiS7ig1SlB2HCy7ecA1V6bx5F1rhdpuoElNFGZ7i0Z9TccLSms7U=@proton.me>
Content-Language: en-US
Content-Type: text/plain;
 charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Precedence: list
List-Id: Development discussion and patches for passt

On Thu, 05 Feb 2026 06:14:40 +0000, jfiusdq wrote:
> Tested with podman on Debian 13 for a while and works ok. It's
> difficult to run all the tests on POWER but 505-networking-pasta.bats
> test suite passes.
> ---
>  checksum.c | 110 +++++++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 107 insertions(+), 3 deletions(-)
>
> diff --git a/checksum.c b/checksum.c
> index 0c3837c..828f9ec 100644
> --- a/checksum.c
> +++ b/checksum.c
> @@ -281,7 +281,7 @@ void csum_icmp6(struct icmp6hdr *icmp6hr,
>  	icmp6hr->icmp6_cksum = csum(payload, dlen, psum);
>  }
>
> -#ifdef __AVX2__
> +#if defined(__AVX2__)
>  #include
>
>  /**
> @@ -479,7 +479,111 @@ uint32_t csum_unfolded(const void *buf, size_t len, uint32_t init)
>
>  	return init;
>  }
> -#else /* __AVX2__ */
> +#elif defined(__POWER9_VECTOR__) || defined(__POWER8_VECTOR__)
> +#include
> +
> +/**
> + * csum_vsx() - Compute 32-bit checksum using VSX SIMD instructions
> + * @buf: Input buffer
> + * @len: Input length
> + * @init: Initial 32-bit checksum, 0 for no pre-computed checksum
> + *
> + * Return: 32-bit checksum, not complemented, not folded
> + */
> +/* NOLINTNEXTLINE(clang-diagnostic-unknown-attributes) */
> +__attribute__((optimize("-fno-strict-aliasing"))) /* See csum_16b() */
> +static uint32_t csum_vsx(const void *buf, size_t len, uint32_t init)
> +{
> +	const uint8_t *p = buf;
> +	vector unsigned int sum_even = vec_splat_u32(0);
> +	vector unsigned int sum_odd = vec_splat_u32(0);
> +	const vector unsigned short ones = vec_splat_u16(1);
> +	uint64_t sum64 = init;
> +
> +#ifdef __POWER9_VECTOR__
> +	while (len >= 64) {
> +		vector unsigned char v0b = vec_vsx_ld(0, p);
> +		vector unsigned char v1b = vec_vsx_ld(16, p);
> +		vector unsigned char v2b = vec_vsx_ld(32, p);
> +		vector unsigned char v3b = vec_vsx_ld(48, p);
> +		vector unsigned short v0 = (vector unsigned short)v0b;
> +		vector unsigned short v1 = (vector unsigned short)v1b;
> +		vector unsigned short v2 = (vector unsigned short)v2b;
> +		vector unsigned short v3 = (vector unsigned short)v3b;
> +
> +		sum_even = vec_add(sum_even, vec_mule(v0, ones));
> +		sum_odd = vec_add(sum_odd, vec_mulo(v0, ones));
> +		sum_even = vec_add(sum_even, vec_mule(v1, ones));
> +		sum_odd = vec_add(sum_odd, vec_mulo(v1, ones));
> +		sum_even = vec_add(sum_even, vec_mule(v2, ones));
> +		sum_odd = vec_add(sum_odd, vec_mulo(v2, ones));
> +		sum_even = vec_add(sum_even, vec_mule(v3, ones));
> +		sum_odd = vec_add(sum_odd, vec_mulo(v3, ones));
> +
> +		p += 64;
> +		len -= 64;
> +	}
> +#endif
> +
> +	while (len >= 32) {
> +		vector unsigned char v0b = vec_vsx_ld(0, p);
> +		vector unsigned char v1b = vec_vsx_ld(16, p);
> +		vector unsigned short v0 = (vector unsigned short)v0b;
> +		vector unsigned short v1 = (vector unsigned short)v1b;
> +
> +		sum_even = vec_add(sum_even, vec_mule(v0, ones));
> +		sum_odd = vec_add(sum_odd, vec_mulo(v0, ones));
> +		sum_even = vec_add(sum_even, vec_mule(v1, ones));
> +		sum_odd = vec_add(sum_odd, vec_mulo(v1, ones));
> +
> +		p += 32;
> +		len -= 32;
> +	}
> +
> +	while (len >= 16) {
> +		vector unsigned char v0b = vec_vsx_ld(0, p);
> +		vector unsigned short v0 = (vector unsigned short)v0b;
> +
> +		sum_even = vec_add(sum_even, vec_mule(v0, ones));
> +		sum_odd = vec_add(sum_odd, vec_mulo(v0, ones));
> +
> +		p += 16;
> +		len -= 16;
> +	}
> +
> +	{
> +		vector unsigned int sum32 = vec_add(sum_even, sum_odd);
> +		uint32_t partial[4] __attribute__((aligned(16)));
> +
> +		vec_st(sum32, 0, partial);
> +		sum64 += (uint64_t)partial[0] + partial[1] +
> +			 partial[2] + partial[3];
> +	}
> +
> +	sum64 += sum_16b(p, len);
> +
> +	sum64 = (sum64 >> 32) + (sum64 & 0xffffffff);
> +	sum64 += sum64 >> 32;
> +
> +	return (uint32_t)sum64;
> +}
> +
> +/**
> + * csum_unfolded() - Calculate the unfolded checksum of a data buffer.
> + *
> + * @buf: Input buffer
> + * @len: Input length
> + * @init: Initial 32-bit checksum, 0 for no pre-computed checksum
> + *
> + * Return: 32-bit unfolded checksum
> + */
> +/* NOLINTNEXTLINE(clang-diagnostic-unknown-attributes) */
> +__attribute__((optimize("-fno-strict-aliasing"))) /* See csum_16b() */
> +uint32_t csum_unfolded(const void *buf, size_t len, uint32_t init)
> +{
> +	return csum_vsx(buf, len, init);
> +}
> +#else /* !__AVX2__ && !__POWER9_VECTOR__ && !__POWER8_VECTOR__ */
>  /**
>   * csum_unfolded() - Calculate the unfolded checksum of a data buffer.
>   *
> @@ -495,7 +599,7 @@ uint32_t csum_unfolded(const void *buf, size_t len, uint32_t init)
>  {
>  	return sum_16b(buf, len) + init;
>  }
> -#endif /* !__AVX2__ */
> +#endif /* !__AVX2__ && !__POWER9_VECTOR__ && !__POWER8_VECTOR__ */
>
>  /**
>   * csum_iov_tail() - Calculate unfolded checksum for the tail of an IO vector
> --
> 2.52.0

Reviewed-by: Laurent Vivier <lvivier@redhat.com>
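
For anyone following the thread without POWER hardware, here is a portable sketch of the accumulation scheme used in csum_vsx() above. It is not part of the patch and uses no VSX intrinsics. It emulates the four even and four odd 32-bit lanes with plain arrays, applies the same 64-to-32-bit fold, and compares the result against a straightforward scalar sum. The names sum_words(), fold64(), csum_lanes(), csum_scalar() and paths_agree() are hypothetical helpers made up for this illustration. In particular, sum_words() only stands in for sum_16b(), whose handling of an odd trailing byte may differ. The sketch also assumes inputs small enough that no 32-bit lane wraps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for sum_16b(): sum native-endian 16-bit words,
 * zero-padding an odd trailing byte. */
static uint64_t sum_words(const uint8_t *p, size_t len)
{
	uint64_t sum = 0;
	uint16_t w;

	for (; len >= 2; p += 2, len -= 2) {
		memcpy(&w, p, sizeof(w));
		sum += w;
	}
	if (len) {
		w = 0;
		memcpy(&w, p, 1);
		sum += w;
	}
	return sum;
}

/* Same two-step fold as at the end of csum_vsx(): add the high half to
 * the low half, then add back the single possible carry. */
static uint32_t fold64(uint64_t sum64)
{
	sum64 = (sum64 >> 32) + (sum64 & 0xffffffff);
	sum64 += sum64 >> 32;
	return (uint32_t)sum64;
}

/* Emulate the vector loop: each 16-byte block holds eight 16-bit words;
 * even-indexed words go to one set of four 32-bit lanes, odd-indexed
 * words to another, and all lanes are summed at the end. */
static uint32_t csum_lanes(const void *buf, size_t len, uint32_t init)
{
	const uint8_t *p = buf;
	uint32_t sum_even[4] = { 0 }, sum_odd[4] = { 0 };
	uint64_t sum64 = init;
	uint16_t w[8];
	int i;

	while (len >= 16) {
		memcpy(w, p, sizeof(w));
		for (i = 0; i < 4; i++) {
			sum_even[i] += w[2 * i];
			sum_odd[i] += w[2 * i + 1];
		}
		p += 16;
		len -= 16;
	}

	for (i = 0; i < 4; i++)
		sum64 += (uint64_t)sum_even[i] + sum_odd[i];

	sum64 += sum_words(p, len);	/* tail shorter than 16 bytes */
	return fold64(sum64);
}

/* Scalar reference: one running 64-bit sum, folded the same way. */
static uint32_t csum_scalar(const void *buf, size_t len, uint32_t init)
{
	return fold64((uint64_t)init + sum_words(buf, len));
}

/* Return 1 if both paths agree on a pattern buffer of @len bytes. */
static int paths_agree(size_t len)
{
	uint8_t buf[256];
	size_t i;

	assert(len <= sizeof(buf));
	for (i = 0; i < len; i++)
		buf[i] = (uint8_t)(i * 37 + 11);

	return csum_lanes(buf, len, 0x1234) == csum_scalar(buf, len, 0x1234);
}
```

Calling paths_agree() with lengths on either side of a 16-byte boundary (15, 16, 33, ...) exercises both the block loop and the tail. Splitting words into even and odd lanes does not change the total, because the lanes are simply added back together before the fold.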