On Fri, Feb 02, 2024 at 03:11:30PM +0100, Laurent Vivier wrote:
> If the buffer is not 32-byte aligned, use sum_16b() only on the
> unaligned part, and then use csum_avx2() on the remaining part.
> 
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  checksum.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/checksum.c b/checksum.c
> index f21c9b7a14d1..c94980771c63 100644
> --- a/checksum.c
> +++ b/checksum.c
> @@ -407,7 +407,19 @@ less_than_128_bytes:
>  __attribute__((optimize("-fno-strict-aliasing")))	/* See csum_16b() */
>  uint16_t csum(const void *buf, size_t len, uint32_t init)
>  {
> -	return (uint16_t)~csum_fold(csum_avx2(buf, len, init));
> +	intptr_t align = ((intptr_t)buf + 0x1f) & ~(intptr_t)0x1f;

Wonder if it's worth adding an ALIGN_UP macro.
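
For illustration only, a minimal sketch of what such a macro could look
like. Both ALIGN_UP and the unaligned_head() helper are hypothetical names
that are not part of this patch, and the sketch assumes the alignment is a
power of two (32 here, matching the 0x1f mask above).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not in the patch: round x up to the next multiple
 * of a. Only valid when a is a power of two.
 */
#define ALIGN_UP(x, a)	(((x) + ((a) - 1)) & ~((a) - 1))

/* Hypothetical helper: number of bytes from buf up to the next 32-byte
 * boundary, capped at len. This mirrors the 'pad' computation in the patch.
 */
static size_t unaligned_head(const void *buf, size_t len)
{
	uintptr_t start = (uintptr_t)buf;
	size_t pad = ALIGN_UP(start, (uintptr_t)32) - start;

	return pad < len ? pad : len;
}
```

With something like this, the first two lines of the new csum() body could
collapse into a single pad computation, but that is a matter of taste.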

> +	unsigned int pad = align - (intptr_t)buf;
> +
> +	if (len < pad)
> +		pad = len;
> +
> +	if (pad)
> +		init += sum_16b(buf, pad);
> +
> +	if (len > pad)
> +		init = csum_avx2((void *)align, len - pad, init);
> +
> +	return (uint16_t)~csum_fold(init);
>  }
>  
>  #else /* __AVX2__ */

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson