From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from gandalf.ozlabs.org (gandalf.ozlabs.org [150.107.74.76]) by passt.top (Postfix) with ESMTPS id 1017F5A0281 for ; Thu, 15 Feb 2024 01:40:27 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202312; t=1707957620; bh=MRnYJJKGxdxAf9tqp7kW8nYXOonLfnVexiBq2Wp5GQU=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=Z+5w8C06TcWLB0E4UlPJzEiKApTg7ujrWxYoN9Oe3uOGggAoDecOyXjf9SJYJmFiE GdGZuZzMzTbzTtb9jrf3jlaiB70mcZdmxLetK5tdUhja+wpNMbXnd/qiiRhptmfvDc OBG41BD6Xhty0aH+lMD1usvZbJxDOG/gNiKgyyQkoaZklKWNrW90BglhuFzxkQvbmY TmC57zg9orHRB5+veCDy8AK5CF3Vp+yVvkjJxuo7uRbnwB8YCUOd3toQKb8b8YuFya N0jIgs0JMJDT1vc/hilUfS6w2NqkVSJoMDukbUoHRG1mlKNSvm1DMty5VOYhrskco4 aljuvzgxbF/Ug== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4TZx6r53Kyz4wx6; Thu, 15 Feb 2024 11:40:20 +1100 (AEDT) Date: Thu, 15 Feb 2024 11:40:09 +1100 From: David Gibson To: Laurent Vivier Subject: Re: [PATCH v2 3/8] checksum: align buffers Message-ID: References: <20240214085628.210783-1-lvivier@redhat.com> <20240214085628.210783-4-lvivier@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="WZYEWkimx/7jeKII" Content-Disposition: inline In-Reply-To: <20240214085628.210783-4-lvivier@redhat.com> Message-ID-Hash: MILLBCKTX5BDBSO4VJ5FATAWNAZVXIUY X-Message-ID-Hash: MILLBCKTX5BDBSO4VJ5FATAWNAZVXIUY X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --WZYEWkimx/7jeKII Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Feb 14, 2024 at 09:56:23AM +0100, Laurent Vivier wrote: > if buffer is not aligned use sum_16b() only on the not aligned Nit: s/if/If/ > part, and then use csum_avx2() on the remaining part >=20 > Remove unneeded now function csum_unaligned(). >=20 > Signed-off-by: Laurent Vivier Reviewed-by: David Gibson > --- >=20 > Notes: > v2: > - use ROUND_UP() and sizeof(__m256i) > - fix function comment > - remove csum_unaligned() and use csum() instead >=20 > checksum.c | 47 ++++++++++++++++++++++++----------------------- > 1 file changed, 24 insertions(+), 23 deletions(-) >=20 > diff --git a/checksum.c b/checksum.c > index f21c9b7a14d1..65486b4625ba 100644 > --- a/checksum.c > +++ b/checksum.c > @@ -56,6 +56,8 @@ > #include > #include > =20 > +#include "util.h" > + > /* Checksums are optional for UDP over IPv4, so we usually just set > * them to 0. Change this to 1 to calculate real UDP over IPv4 > * checksums > @@ -110,20 +112,7 @@ uint16_t csum_fold(uint32_t sum) > return sum; > } > =20 > -/** > - * csum_unaligned() - Compute TCP/IP-style checksum for not 32-byte alig= ned data > - * @buf: Input data > - * @len: Input length > - * @init: Initial 32-bit checksum, 0 for no pre-computed checksum > - * > - * Return: 16-bit IPv4-style checksum > - */ > -/* NOLINTNEXTLINE(clang-diagnostic-unknown-attributes) */ > -__attribute__((optimize("-fno-strict-aliasing"))) /* See csum_16b() */ > -uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init) > -{ > - return (uint16_t)~csum_fold(sum_16b(buf, len) + init); > -} > +uint16_t csum(const void *buf, size_t len, uint32_t init); > =20 > /** > * csum_ip4_header() - Calculate and set IPv4 header checksum > @@ -132,7 +121,7 @@ uint16_t csum_unaligned(const void *buf, size_t len, = uint32_t init) > void csum_ip4_header(struct iphdr *ip4h) > { > ip4h->check =3D 0; > - ip4h->check =3D csum_unaligned(ip4h, (size_t)ip4h->ihl * 4, 0); > + ip4h->check =3D csum(ip4h, (size_t)ip4h->ihl * 4, 0); > } > =20 > /** > @@ -159,7 +148,7 @@ void csum_udp4(struct udphdr *udp4hr, > + htons(IPPROTO_UDP); > /* Add in partial checksum for the UDP header alone */ > psum +=3D sum_16b(udp4hr, sizeof(*udp4hr)); > - udp4hr->check =3D csum_unaligned(payload, len, psum); > + udp4hr->check =3D csum(payload, len, psum); > } > } > =20 > @@ -178,7 +167,7 @@ void csum_icmp4(struct icmphdr *icmp4hr, const void *= payload, size_t len) > /* Partial checksum for ICMP header alone */ > psum =3D sum_16b(icmp4hr, sizeof(*icmp4hr)); > =20 > - icmp4hr->checksum =3D csum_unaligned(payload, len, psum); > + icmp4hr->checksum =3D csum(payload, len, psum); > } > =20 > /** > @@ -199,7 +188,7 @@ void csum_udp6(struct udphdr *udp6hr, > udp6hr->check =3D 0; > /* Add in partial checksum for the UDP header alone */ > psum +=3D sum_16b(udp6hr, sizeof(*udp6hr)); > - udp6hr->check =3D csum_unaligned(payload, len, psum); > + udp6hr->check =3D csum(payload, len, psum); > } > =20 > /** > @@ -222,7 +211,7 @@ void csum_icmp6(struct icmp6hdr *icmp6hr, > icmp6hr->icmp6_cksum =3D 0; > /* Add in partial checksum for the ICMPv6 header alone */ > psum +=3D sum_16b(icmp6hr, sizeof(*icmp6hr)); > - icmp6hr->icmp6_cksum =3D csum_unaligned(payload, len, psum); > + icmp6hr->icmp6_cksum =3D csum(payload, len, psum); > } > =20 > #ifdef __AVX2__ > @@ -397,17 +386,29 @@ less_than_128_bytes: > =20 > /** > * csum() - Compute TCP/IP-style checksum > - * @buf: Input buffer, must be aligned to 32-byte boundary > + * @buf: Input buffer > * @len: Input length > * @init: Initial 32-bit checksum, 0 for no pre-computed checksum > * > - * Return: 16-bit folded, complemented checksum sum > + * Return: 16-bit folded, complemented checksum > */ > /* NOLINTNEXTLINE(clang-diagnostic-unknown-attributes) */ > __attribute__((optimize("-fno-strict-aliasing"))) /* See csum_16b() */ > uint16_t csum(const void *buf, size_t len, uint32_t init) > { > - return (uint16_t)~csum_fold(csum_avx2(buf, len, init)); > + intptr_t align =3D ROUND_UP((intptr_t)buf, sizeof(__m256i)); > + unsigned int pad =3D align - (intptr_t)buf; > + > + if (len < pad) > + pad =3D len; > + > + if (pad) > + init +=3D sum_16b(buf, pad); > + > + if (len > pad) > + init =3D csum_avx2((void *)align, len - pad, init); > + > + return (uint16_t)~csum_fold(init); > } > =20 > #else /* __AVX2__ */ > @@ -424,7 +425,7 @@ uint16_t csum(const void *buf, size_t len, uint32_t i= nit) > __attribute__((optimize("-fno-strict-aliasing"))) /* See csum_16b() */ > uint16_t csum(const void *buf, size_t len, uint32_t init) > { > - return csum_unaligned(buf, len, init); > + return (uint16_t)~csum_fold(sum_16b(buf, len) + init); > } > =20 > #endif /* !__AVX2__ */ --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --WZYEWkimx/7jeKII Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmXNXWgACgkQzQJF27ox 2GcNGA//ZClbkvFiPHsQtPmMXkuC6+7YFOdGB43YqsirTgxaN4gW3Dzuob7jMPIx +wmYqBcf/NMjskUXaqK2lHrYIRN3F+2eACIKiR6ORKjwyniBVCe/tA6SMtR2VPTp ZJLUcN2uMfkS7thq4Uoi4+YgDFQa/HOVE8qznJlTVCRLSC4c/pqiohCUc3BxsfBm qAhEExB5bAjkqxS05zd2f5X+Q/hwvRa67GA1JIB28/GsnXaoR+Mc9CpKMaHGWJqn mtFfEnPNnsm6LLSpidcrUWxQxwoVVdfx86R0qgYjytasv3gx5FhL49RJOP9YALob Lngr2eQ5DhDFdjaJo0T/I74gvuA/FoS9HzG7xSdVkPDWRDulpqOzU8W/r00wyWsf QpdqJ2DFRD9jXLaVlE1buU24Fu/2BXQURC6QjBK0bbZRD3QhhiKjt9XXI3pEIGaF 92nWoGKhKFYsZTrdYTtGPa/3BdMSvNl2SZpi9wxELDCMYLqN3Jct6W+4pdaZTQtU VDQDa44rvuA4odijKlPnvhVS0qIHzbwzqxMnUoPxWhZOUuJH8SrfA7FkBsLT8JAW Qpzf8oWmMstFLQNp25Ul4uh2DMNDjE4rMBOqmuGnSrZPt4ZgdBDBaWc0By+iERke jzDvOLEuAcNsG3UVVEaHVB2FXVFUVrDh5u39fLWxVWhcMfyajKo= =G67+ -----END PGP SIGNATURE----- --WZYEWkimx/7jeKII--