From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202410 header.b=PbDn5KTO; dkim-atps=neutral Received: from mail.ozlabs.org (gandalf.ozlabs.org [150.107.74.76]) by passt.top (Postfix) with ESMTPS id 0BEDA5A004E for ; Mon, 04 Nov 2024 02:17:17 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202410; t=1730683030; bh=t4lnxh9eQFZlNVV2gl3ktmhCzfIsDVFQ6bU3tmV7ZnY=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=PbDn5KTOqTwptSXMRxmFPaga+al6B3ahFl5exlhioPqWmmZgIGQTjBN9wv4F6Dz79 vgPl/ADy8zRlmvLPLynCmpHSxR/oW46N7DubUXvtQsgK6OIJcLTKFdFEUTBRIaZm5t s3ZMiJE/2ZAGeJM3CyH+uoMoaYJ42dzhD1thiMrUuq3xB1ApnAJoj5etfauNks3vvh fj2zCluGoBGOeEtSW8FvOaHB9KVaZkYIoFBjW/gtGhwETNofxbLK9JU2B+bpFhFf27 xt8qyXIch21VaWSKhnIhHRYjUnAsKoU2bu5NDd4hmBoirLVvSwzwBGZtsysr1gYJEr YKpV5vlEjY3gQ== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4XhYTy2qwlz4x8X; Mon, 4 Nov 2024 12:17:10 +1100 (AEDT) Date: Mon, 4 Nov 2024 12:16:58 +1100 From: David Gibson To: Jon Maloy Subject: Re: [PATCH v2] tcp: unify payload and flags l2 frames array Message-ID: References: <20241102005132.1408340-1-jmaloy@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="Lwh07Q5/nv2zWIm7" Content-Disposition: inline In-Reply-To: <20241102005132.1408340-1-jmaloy@redhat.com> Message-ID-Hash: JY5OXM5MDLZXRQM7DWL5FMYKR2XPOGZI X-Message-ID-Hash: JY5OXM5MDLZXRQM7DWL5FMYKR2XPOGZI X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top, sbrivio@redhat.com, lvivier@redhat.com, dgibson@redhat.com X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --Lwh07Q5/nv2zWIm7 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Nov 01, 2024 at 08:51:32PM -0400, Jon Maloy wrote: > In order to reduce static memory and code footprint, we merge > the array for l2 flag frames into the one for payload frames. >=20 > This change also ensures that no flag message will be sent out > over the l2 media bypassing already queued payload messages. >=20 > Performance measurements with iperf3, where we force all > traffic via the tap queue, show no significant difference: >=20 > Dual traffic both directions sinmultaneously, with patch: > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D > host->ns: > -------- > [ ID] Interval Transfer Bitrate Retr > [ 5] 0.00-100.00 sec 36.3 GBytes 3.12 Gbits/sec 4759 se= nder > [ 5] 0.00-100.04 sec 36.3 GBytes 3.11 Gbits/sec rec= eiver >=20 > ns->host: > --------- > [ ID] Interval Transfer Bitrate > [ 5] 0.00-100.00 sec 321 GBytes 27.6 Gbits/sec rec= eiver Not really anything to do with this patch, but it's mildly concerning that the throughput is different by an order of magnitude between the two directions. We should probably look into that, in our copious free time :/. > Dual traffic both directions sinmultaneously, without patch: > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > host->ns: > -------- > [ ID] Interval Transfer Bitrate Retr > [ 5] 0.00-100.00 sec 35.0 GBytes 3.01 Gbits/sec 6001 se= nder > [ 5] 0.00-100.04 sec 34.8 GBytes 2.99 Gbits/sec rec= eiver >=20 > ns->host > -------- > [ ID] Interval Transfer Bitrate > [ 5] 0.00-100.00 sec 345 GBytes 29.6 Gbits/sec rec= eiver >=20 > Single connection, with patch: > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D > host->ns: > --------- > [ ID] Interval Transfer Bitrate Retr > [ 5] 0.00-100.00 sec 138 GBytes 11.8 Gbits/sec 922 sen= der > [ 5] 0.00-100.04 sec 138 GBytes 11.8 Gbits/sec rec= eiver >=20 > ns->host: > ----------- > [ ID] Interval Transfer Bitrate > [ 5] 0.00-100.00 sec 430 GBytes 36.9 Gbits/sec rec= eiver >=20 > Single connection, without patch: > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D > host->ns: > ------------ > [ ID] Interval Transfer Bitrate Retr > [ 5] 0.00-100.00 sec 139 GBytes 11.9 Gbits/sec 900 sen= der > [ 5] 0.00-100.04 sec 139 GBytes 11.9 Gbits/sec rec= eiver >=20 > ns->host: > --------- > [ ID] Interval Transfer Bitrate > [ 5] 0.00-100.00 sec 440 GBytes 37.8 Gbits/sec rec= eiver >=20 > Signed-off-by: Jon Maloy >=20 > ---- > v2: - Adapted to and rebased on latest release. > - Removed redundant tcp_flags_push() function. > - Added measurement results. >=20 > Signed-off-by: Jon Maloy Generally LGTM. One problem, which is pretty simple, but I think really does need to be fixed. See below. > --- > tcp.c | 1 - > tcp_buf.c | 66 ++++++++++++-------------------------------------- > tcp_buf.h | 1 - > tcp_internal.h | 15 ------------ > 4 files changed, 16 insertions(+), 67 deletions(-) >=20 > diff --git a/tcp.c b/tcp.c > index 10ad06a..b17d5fe 100644 > --- a/tcp.c > +++ b/tcp.c > @@ -936,7 +936,6 @@ bool tcp_flow_defer(const struct tcp_tap_conn *conn) > /* cppcheck-suppress [constParameterPointer, unmatchedSuppression] */ > void tcp_defer_handler(struct ctx *c) > { > - tcp_flags_flush(c); > tcp_payload_flush(c); > } > =20 > diff --git a/tcp_buf.c b/tcp_buf.c > index 274e313..58df4cc 100644 > --- a/tcp_buf.c > +++ b/tcp_buf.c > @@ -33,6 +33,7 @@ > #include "tcp_internal.h" > #include "tcp_buf.h" > =20 > +#define PAYLOAD_FLAGS htons(0x5010) /* doff =3D 5, ack =3D 1 */ > #define TCP_FRAMES_MEM 128 > #define TCP_FRAMES \ > (c->mode =3D=3D MODE_PASTA ? 1 : TCP_FRAMES_MEM) > @@ -59,22 +60,10 @@ static_assert(MSS6 <=3D sizeof(tcp_payload[0].data), = "MSS6 is greater than 65516") > static struct tcp_tap_conn *tcp_frame_conns[TCP_FRAMES_MEM]; > static unsigned int tcp_payload_used; > =20 > -static struct tap_hdr tcp_flags_tap_hdr[TCP_FRAMES_MEM]; > -/* IPv4 headers for TCP segment without payload */ > -static struct iphdr tcp4_flags_ip[TCP_FRAMES_MEM]; > -/* TCP segments without payload for IPv4 frames */ > -static struct tcp_flags_t tcp_flags[TCP_FRAMES_MEM]; > - > -static unsigned int tcp_flags_used; > - > -/* IPv6 headers for TCP segment without payload */ > -static struct ipv6hdr tcp6_flags_ip[TCP_FRAMES_MEM]; > - > /* recvmsg()/sendmsg() data for tap */ > static struct iovec iov_sock [TCP_FRAMES_MEM + 1]; > =20 > static struct iovec tcp_l2_iov[TCP_FRAMES_MEM][TCP_NUM_IOVS]; > -static struct iovec tcp_l2_flags_iov[TCP_FRAMES_MEM][TCP_NUM_IOVS]; > =20 > /** > * tcp_update_l2_buf() - Update Ethernet header buffers with addresses > @@ -107,13 +96,6 @@ void tcp_sock_iov_init(const struct ctx *c) > tcp_payload[i].th.ack =3D 1; > } > =20 > - for (i =3D 0; i < ARRAY_SIZE(tcp_flags); i++) { > - tcp6_flags_ip[i] =3D ip6; > - tcp4_flags_ip[i] =3D iph; > - tcp_flags[i].th.doff =3D sizeof(struct tcphdr) / 4; > - tcp_flags[i].th.ack =3D 1; > - } > - > for (i =3D 0; i < TCP_FRAMES_MEM; i++) { > struct iovec *iov =3D tcp_l2_iov[i]; > =20 > @@ -121,25 +103,6 @@ void tcp_sock_iov_init(const struct ctx *c) > iov[TCP_IOV_ETH].iov_len =3D sizeof(struct ethhdr); > iov[TCP_IOV_PAYLOAD].iov_base =3D &tcp_payload[i]; > } > - > - for (i =3D 0; i < TCP_FRAMES_MEM; i++) { > - struct iovec *iov =3D tcp_l2_flags_iov[i]; > - > - iov[TCP_IOV_TAP] =3D tap_hdr_iov(c, &tcp_flags_tap_hdr[i]); > - iov[TCP_IOV_ETH].iov_len =3D sizeof(struct ethhdr); > - iov[TCP_IOV_PAYLOAD].iov_base =3D &tcp_flags[i]; > - } > -} > - > -/** > - * tcp_flags_flush() - Send out buffers for segments with no data (flags) > - * @c: Execution context > - */ > -void tcp_flags_flush(const struct ctx *c) > -{ > - tap_send_frames(c, &tcp_l2_flags_iov[0][0], TCP_NUM_IOVS, > - tcp_flags_used); > - tcp_flags_used =3D 0; > } > =20 > /** > @@ -171,7 +134,7 @@ static void tcp_revert_seq(const struct ctx *c, struc= t tcp_tap_conn **conns, > } > =20 > /** > - * tcp_payload_flush() - Send out buffers for segments with data > + * tcp_payload_flush() - Send out buffers for segments with data or flags > * @c: Execution context > */ > void tcp_payload_flush(const struct ctx *c) > @@ -197,37 +160,35 @@ void tcp_payload_flush(const struct ctx *c) > */ > int tcp_buf_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, in= t flags) > { > - struct tcp_flags_t *payload; > + struct tcp_payload_t *payload; > struct iovec *iov; > size_t optlen; > size_t l4len; > uint32_t seq; > int ret; > =20 > - iov =3D tcp_l2_flags_iov[tcp_flags_used]; > + iov =3D tcp_l2_iov[tcp_payload_used]; > if (CONN_V4(conn)) { > - iov[TCP_IOV_IP] =3D IOV_OF_LVALUE(tcp4_flags_ip[tcp_flags_used]); > + iov[TCP_IOV_IP] =3D IOV_OF_LVALUE(tcp4_payload_ip[tcp_payload_used]); > iov[TCP_IOV_ETH].iov_base =3D &tcp4_eth_src; > } else { > - iov[TCP_IOV_IP] =3D IOV_OF_LVALUE(tcp6_flags_ip[tcp_flags_used]); > + iov[TCP_IOV_IP] =3D IOV_OF_LVALUE(tcp6_payload_ip[tcp_payload_used]); > iov[TCP_IOV_ETH].iov_base =3D &tcp6_eth_src; > } > =20 > payload =3D iov[TCP_IOV_PAYLOAD].iov_base; > seq =3D conn->seq_to_tap; > ret =3D tcp_prepare_flags(c, conn, flags, &payload->th, > - &payload->opts, &optlen); > + (struct tcp_syn_opts *)&payload->data, &optlen); > if (ret <=3D 0) > return ret; > =20 > - tcp_flags_used++; > + tcp_payload_used++; > l4len =3D tcp_l2_buf_fill_headers(conn, iov, optlen, NULL, seq, false); > iov[TCP_IOV_PAYLOAD].iov_len =3D l4len; > - > if (flags & DUP_ACK) { > - struct iovec *dup_iov; > + struct iovec *dup_iov =3D tcp_l2_iov[tcp_payload_used++]; > =20 > - dup_iov =3D tcp_l2_flags_iov[tcp_flags_used++]; > memcpy(dup_iov[TCP_IOV_TAP].iov_base, iov[TCP_IOV_TAP].iov_base, > iov[TCP_IOV_TAP].iov_len); > dup_iov[TCP_IOV_ETH].iov_base =3D iov[TCP_IOV_ETH].iov_base; > @@ -237,8 +198,8 @@ int tcp_buf_send_flag(const struct ctx *c, struct tcp= _tap_conn *conn, int flags) > dup_iov[TCP_IOV_PAYLOAD].iov_len =3D l4len; > } > =20 > - if (tcp_flags_used > TCP_FRAMES_MEM - 2) > - tcp_flags_flush(c); > + if (tcp_payload_used > TCP_FRAMES_MEM - 2) > + tcp_payload_flush(c); > =20 > return 0; > } > @@ -254,8 +215,10 @@ int tcp_buf_send_flag(const struct ctx *c, struct tc= p_tap_conn *conn, int flags) > static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *co= nn, > ssize_t dlen, int no_csum, uint32_t seq) > { > + struct tcp_payload_t *payload; > const uint16_t *check =3D NULL; > struct iovec *iov; > + uint16_t *flags; > size_t l4len; > =20 > conn->seq_to_tap =3D seq + dlen; > @@ -274,6 +237,9 @@ static void tcp_data_to_tap(const struct ctx *c, stru= ct tcp_tap_conn *conn, > iov[TCP_IOV_IP] =3D IOV_OF_LVALUE(tcp6_payload_ip[tcp_payload_used]); > iov[TCP_IOV_ETH].iov_base =3D &tcp6_eth_src; > } > + payload =3D iov[TCP_IOV_PAYLOAD].iov_base; > + flags =3D &payload->th.window - 1; > + *(flags) =3D PAYLOAD_FLAGS; I think this is likely to run afoul of TBAA rules, which could cause miscompiles because cc assumes *flags is not aliased with payload->th. Although it's more bulky, I think it's better to explicitly assign each of the fields in struct tcphdr. The compiler should be able to boil it down to the same instructions in any case. Note that now we're using the netinet version of struct tcphdr, it has an anonymous union so you can use th_flags, rather than the one bit fields (doesn't include doffset, though). > l4len =3D tcp_l2_buf_fill_headers(conn, iov, dlen, check, seq, false); > iov[TCP_IOV_PAYLOAD].iov_len =3D l4len; > if (++tcp_payload_used > TCP_FRAMES_MEM - 1) > diff --git a/tcp_buf.h b/tcp_buf.h > index 49c04d4..54f5e53 100644 > --- a/tcp_buf.h > +++ b/tcp_buf.h > @@ -7,7 +7,6 @@ > #define TCP_BUF_H > =20 > void tcp_sock_iov_init(const struct ctx *c); > -void tcp_flags_flush(const struct ctx *c); > void tcp_payload_flush(const struct ctx *c); > int tcp_buf_data_from_sock(const struct ctx *c, struct tcp_tap_conn *con= n); > int tcp_buf_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, in= t flags); > diff --git a/tcp_internal.h b/tcp_internal.h > index a5a47df..c846f60 100644 > --- a/tcp_internal.h > +++ b/tcp_internal.h > @@ -134,21 +134,6 @@ struct tcp_syn_opts { > .ws =3D TCP_OPT_WS(ws_), \ > }) > =20 > -/** > - * struct tcp_flags_t - TCP header and data to send zero-length > - * segments (flags) > - * @th: TCP header > - * @opts TCP options > - */ > -struct tcp_flags_t { > - struct tcphdr th; > - struct tcp_syn_opts opts; > -#ifdef __AVX2__ > -} __attribute__ ((packed, aligned(32))); > -#else > -} __attribute__ ((packed, aligned(__alignof__(unsigned int)))); > -#endif > - > extern char tcp_buf_discard [MAX_WINDOW]; > =20 > void conn_flag_do(const struct ctx *c, struct tcp_tap_conn *conn, --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --Lwh07Q5/nv2zWIm7 Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmcoIF0ACgkQzQJF27ox 2Gf2gBAAne1CmCxJ8NI28k2ik714bLP5ZzOjPUJdA+/maGmErK+Y7qFprp9YR+Hn W1DXTGLRWn/qOo1eZ805FWaIsuwIWVSo2eZqLEXC4Xg/fN8ig9YDzhG8uAvCN3Iw gUyemBX84GznzWri11EX57AAud/zYN6TcXbGRYl7Cn6Budtjfd2CN1ucm41Vxh+t SiitnwseDK9WtAlAPchVWsErx39eW6nfNA2Shk2kLFDHkZoFPAa9rMCwxAj3ngCp /PvFzy6ZLVtDZm9z6a9EKL2xs/6ElHptai40dGWBqk2oFFPmLUgm9hy9YW5B4N6r 91q8hkIwWWtDJ6z4eq6R76xb8WB0ZzJNqUsj9GXhwcLrmKNYMi8uPFMT373JgpFl w4cX0qNybSZDFTlhKAWOuyDbtqFkgwrpf9ubxpXtR/BnoraYuisPzHV5hYgUBO/c WGPFRkDNnk+vr6FyTyEi2563fL3LladK2EBDYqjBEhijiC0N+xvZP/m9Zhx4QYM8 Q9DCpRidoLf0/O1nwSYJQwAZ6h9TvjK7QrOchSEwzS2iJgEBbgqRyd3iHeSN4uON goptKN6N7bFALEsq94rnnFo95QcYxpAVOK8V/26yL/0PC4CAVcPD6D85qcN/IqrM OI8gij863v8z/W9ACpuIOsis8v5koyqbRsx3ogyjX0Ym29m+Eds= =WVqe -----END PGP SIGNATURE----- --Lwh07Q5/nv2zWIm7--