Date: Thu, 24 Jul 2025 11:33:51 +1000
From: David Gibson
To: Eugenio Pérez
Cc: passt-dev@passt.top, jasowang@redhat.com
Subject: Re: [RFC v2 11/11] tcp_buf: adding TCP tx circular buffer
In-Reply-To: <20250709174748.3514693-12-eperezma@redhat.com>

On Wed, Jul 09, 2025 at 07:47:48PM +0200, Eugenio Pérez wrote:
> Now both tcp_sock and tap uses the circular buffer as intended.
>
> Very lightly tested. Especially, paths like ring full or almost full
> that are checked before producing like
> tcp_payload_sock_used + fill_bufs > TCP_FRAMES_MEM.
>
> Processing the tx buffers in a circular buffer makes namespace rx go
> from to ~11.5Gbit/s. to ~17.26Gbit/s.
>
> TODO: Increase the tx queue length, as we spend a lot of descriptors in
> each request. Ideally, tx size should be at least
> bufs_per_frame*TCP_FRAMES_MEM, but maybe we got more performance with
> bigger queues.
>
> TODO: Sometimes we call tcp_buf_free_old_tap_xmit twice: one to free at
> least N used tx buffers and the next one in tcp_payload_flush. Maybe we
> can optimize it.
>
> Signed-off-by: Eugenio Pérez
> ---
>  tcp_buf.c | 130 ++++++++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 106 insertions(+), 24 deletions(-)
>
> diff --git a/tcp_buf.c b/tcp_buf.c
> index f74d22d..326af79 100644
> --- a/tcp_buf.c
> +++ b/tcp_buf.c
> @@ -53,13 +53,66 @@ static_assert(MSS6 <= sizeof(tcp_payload[0].data), "MSS6 is greater than 65516")
>
>  /* References tracking the owner connection of frames in the tap outqueue */
>  static struct tcp_tap_conn *tcp_frame_conns[TCP_FRAMES_MEM];
> -static unsigned int tcp_payload_sock_used, tcp_payload_tap_used;
> +
> +/*
> + * sock_head: Head of buffers available for writing.
> + * tcp_data_to_tap moves it
> + * forward, but errors queueing to vhost can move it backwards to tap_head
> + * again.
> + *
> + * tap_head: Head of buffers that have been sent to vhost. flush moves this
> + * forward.
> + *
> + * tail: Chasing index. Increments when vhost uses buffers.
> + *
> + * _used: Independent variables to tell between full and empty.

Hm.  I kind of hope there's a less bulky way of doing this.

> + */
> +static unsigned int tcp_payload_sock_head, tcp_payload_tap_head, tcp_payload_tail, tcp_payload_sock_used, tcp_payload_tap_used;
> +#define IS_POW2(y) (((y) > 0) && !((y) & ((y) - 1)))

Worth putting this in util.h as a separate patch.

> +static_assert(ARRAY_SIZE(tcp_payload) == TCP_FRAMES_MEM, "TCP_FRAMES_MEM is not the size of tcp_payload anymore");
> +static_assert(IS_POW2(TCP_FRAMES_MEM), "TCP_FRAMES_MEM must be a power of two");
> +
> +static size_t tcp_payload_cnt_to_end(size_t head, size_t tail)
> +{
> +	assert(head != tail);
> +	size_t end = ARRAY_SIZE(tcp_payload) - tail;
> +	size_t n = (head + end) % ARRAY_SIZE(tcp_payload);
> +
> +	return MIN(n, end);
> +}
> +
> +/* Count the number of items that has been written from sock to the
> + * curcular buffer and can be sent to tap.

s/curcular/circular/g

> + */
> +static size_t tcp_payload_tap_cnt(void)
> +{
> +	return tcp_payload_sock_used - tcp_payload_tap_used;
> +}
>
>  static void tcp_payload_sock_produce(size_t n)
>  {
> +	tcp_payload_sock_head = (tcp_payload_sock_head + n) % ARRAY_SIZE(tcp_payload);
>  	tcp_payload_sock_used += n;
>  }
>
> +/* Count the number of consecutive items that has been written from sock to the
> + * curcular buffer and can be sent to tap without having to wrap back to the
> + * beginning of the buffer.
> + */
> +static size_t tcp_payload_tap_cnt_to_end(void)
> +{
> +	if (tcp_payload_sock_head == tcp_payload_tap_head) {
> +		/* empty?
> +		 */
> +		if (tcp_payload_sock_used - tcp_payload_tap_used == 0)
> +			return 0;
> +
> +		/* full */
> +		return ARRAY_SIZE(tcp_payload) - tcp_payload_tap_head;
> +	}
> +
> +	return tcp_payload_cnt_to_end(tcp_payload_sock_head,
> +				      tcp_payload_tap_head);
> +}
> +
>  static struct iovec tcp_l2_iov[TCP_FRAMES_MEM][TCP_NUM_IOVS];
>
>  /**
> @@ -137,14 +190,13 @@ static void tcp_revert_seq(const struct ctx *c, struct tcp_tap_conn **conns,
>  	}
>  }
>
> -static void tcp_buf_free_old_tap_xmit(const struct ctx *c)
> +static void tcp_buf_free_old_tap_xmit(const struct ctx *c, size_t target)
>  {
> -	while (tcp_payload_tap_used) {
> -		tap_free_old_xmit(c, tcp_payload_tap_used);
> +	size_t n = tap_free_old_xmit(c, target);
>
> -		tcp_payload_tap_used = 0;
> -		tcp_payload_sock_used = 0;
> -	}
> +	tcp_payload_tail = (tcp_payload_tail + n) & (ARRAY_SIZE(tcp_payload) - 1);

Use % instead of & here: it's consistent with the other places, and
since the size is a power of two the compiler should be able to
optimize it to the same thing.
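
To illustrate why the two forms are interchangeable here, a minimal
standalone sketch follows. It is not passt code: RING_SIZE, struct ring,
ring_idx() and ring_used() are hypothetical names, with RING_SIZE standing
in for TCP_FRAMES_MEM. It assumes unsigned indices and a power-of-two size,
as the static_assert in this patch requires. It also shows one possible way
to tell full from empty without separate _used counters, namely free-running
counters that are reduced only when indexing. That is an assumption for
illustration, not something the patch or this review proposes.

```c
#include <assert.h>

/* Hypothetical ring size standing in for TCP_FRAMES_MEM; must be a power of two */
#define RING_SIZE 8u

static_assert((RING_SIZE & (RING_SIZE - 1)) == 0,
	      "RING_SIZE must be a power of two");

/* Free-running counters: only ever incremented, and reduced modulo RING_SIZE
 * just when they are used as an array index.
 */
struct ring {
	unsigned int head;	/* total number of items produced */
	unsigned int tail;	/* total number of items consumed */
};

/* Array slot for a counter value. For unsigned operands and a power-of-two
 * RING_SIZE this gives the same result as counter & (RING_SIZE - 1).
 */
static unsigned int ring_idx(unsigned int counter)
{
	return counter % RING_SIZE;
}

/* Items currently in the ring: 0 means empty, RING_SIZE means full.
 * Unsigned subtraction keeps this correct when the counters wrap around.
 */
static unsigned int ring_used(const struct ring *r)
{
	return r->head - r->tail;
}
```

With head and tail both at 0 the ring is empty. After adding RING_SIZE to
head, ring_idx(head) equals ring_idx(tail) again, but ring_used() reports
RING_SIZE, so the two states stay distinguishable. This relies on unsigned
wraparound happening at a power of two, which is always a multiple of a
power-of-two RING_SIZE.
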
> +	tcp_payload_tap_used -= n;
> +	tcp_payload_sock_used -= n;
>  }
>
>  /**
> @@ -153,16 +205,33 @@ static void tcp_buf_free_old_tap_xmit(const struct ctx *c)
>   */
>  void tcp_payload_flush(const struct ctx *c)
>  {
> -	size_t m;
> +	size_t m, n = tcp_payload_tap_cnt_to_end();
> +	struct iovec *head = &tcp_l2_iov[tcp_payload_tap_head][0];
>
> -	m = tap_send_frames(c, &tcp_l2_iov[0][0], TCP_NUM_IOVS,
> -			    tcp_payload_sock_used, true);
> -	if (m != tcp_payload_sock_used) {
> -		tcp_revert_seq(c, &tcp_frame_conns[m], &tcp_l2_iov[m],
> -			       tcp_payload_sock_used - m);
> -	}
> +	tcp_buf_free_old_tap_xmit(c, (size_t)-1);
> +	m = tap_send_frames(c, head, TCP_NUM_IOVS, n, true);
>  	tcp_payload_tap_used += m;
> -	tcp_buf_free_old_tap_xmit(c);
> +	tcp_payload_tap_head = (tcp_payload_tap_head + m) %
> +		ARRAY_SIZE(tcp_payload);
> +
> +	if (m != n) {
> +		n = tcp_payload_tap_cnt_to_end();
> +
> +		tcp_revert_seq(c, &tcp_frame_conns[tcp_payload_tap_head],
> +			       &tcp_l2_iov[tcp_payload_tap_head], n);
> +		/*
> +		 * circular buffer wrap case.
> +		 * TODO: Maybe it's better to adapt tcp_revert_seq.
> +		 */
> +		tcp_revert_seq(c, &tcp_frame_conns[0], &tcp_l2_iov[0],
> +			       tcp_payload_tap_cnt() - n);
> +
> +		tcp_payload_sock_head = tcp_payload_tap_head;
> +		tcp_payload_sock_used = tcp_payload_tap_used;
> +	} else if (tcp_payload_tap_cnt_to_end()) {
> +		/* circular buffer wrap case */
> +		tcp_payload_flush(c);
> +	}
>  }
>
>  /**
> @@ -209,14 +278,15 @@ int tcp_buf_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags)
>  	size_t optlen;
>  	size_t l4len;
>  	uint32_t seq;
> +	unsigned int i = tcp_payload_sock_head;
>  	int ret;
>
> -	iov = tcp_l2_iov[tcp_payload_sock_used];
> +	iov = tcp_l2_iov[i];
>  	if (CONN_V4(conn)) {
> -		iov[TCP_IOV_IP] = IOV_OF_LVALUE(tcp4_payload_ip[tcp_payload_sock_used]);
> +		iov[TCP_IOV_IP] = IOV_OF_LVALUE(tcp4_payload_ip[i]);
>  		iov[TCP_IOV_ETH].iov_base = &tcp4_eth_src;
>  	} else {
> -		iov[TCP_IOV_IP] = IOV_OF_LVALUE(tcp6_payload_ip[tcp_payload_sock_used]);
> +		iov[TCP_IOV_IP] = IOV_OF_LVALUE(tcp6_payload_ip[i]);
>  		iov[TCP_IOV_ETH].iov_base = &tcp6_eth_src;
>  	}
>
> @@ -228,13 +298,15 @@ int tcp_buf_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags)
>  		return ret;
>
>  	tcp_payload_sock_produce(1);
> +	i = tcp_payload_sock_head;
>  	l4len = optlen + sizeof(struct tcphdr);
>  	iov[TCP_IOV_PAYLOAD].iov_len = l4len;
>  	tcp_l2_buf_fill_headers(conn, iov, NULL, seq, false);
>
>  	if (flags & DUP_ACK) {
> -		struct iovec *dup_iov = tcp_l2_iov[tcp_payload_sock_used];
> +		struct iovec *dup_iov = tcp_l2_iov[i];
>  		tcp_payload_sock_produce(1);
> +		i = tcp_payload_sock_head;
>
>  		memcpy(dup_iov[TCP_IOV_TAP].iov_base, iov[TCP_IOV_TAP].iov_base,
>  		       iov[TCP_IOV_TAP].iov_len);
> @@ -246,7 +318,10 @@ int tcp_buf_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags)
>  	}
>
>  	if (tcp_payload_sock_used > TCP_FRAMES_MEM - 2) {
> +		tcp_buf_free_old_tap_xmit(c, 2);
>  		tcp_payload_flush(c);
> +		/* TODO how to fix this?
> +		 * original code didn't chech for success either */
> +		assert(tcp_payload_sock_used <= TCP_FRAMES_MEM - 2);
>  	}
>
>  	return 0;
> @@ -269,16 +344,17 @@ static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn,
>  	struct iovec *iov;
>
>  	conn->seq_to_tap = seq + dlen;
> -	tcp_frame_conns[tcp_payload_sock_used] = conn;
> -	iov = tcp_l2_iov[tcp_payload_sock_used];
> +	tcp_frame_conns[tcp_payload_sock_head] = conn;
> +	iov = tcp_l2_iov[tcp_payload_sock_head];
>  	if (CONN_V4(conn)) {
>  		if (no_csum) {
> -			struct iovec *iov_prev = tcp_l2_iov[tcp_payload_sock_used - 1];
> +			unsigned prev = (tcp_payload_sock_head - 1) % TCP_FRAMES_MEM;
> +			struct iovec *iov_prev = tcp_l2_iov[prev];
>  			struct iphdr *iph = iov_prev[TCP_IOV_IP].iov_base;
>
>  			check = &iph->check;
>  		}
> -		iov[TCP_IOV_IP] = IOV_OF_LVALUE(tcp4_payload_ip[tcp_payload_sock_used]);
> +		iov[TCP_IOV_IP] = IOV_OF_LVALUE(tcp4_payload_ip[tcp_payload_sock_head]);
>  		iov[TCP_IOV_ETH].iov_base = &tcp4_eth_src;
>  	} else if (CONN_V6(conn)) {
>  		iov[TCP_IOV_IP] = IOV_OF_LVALUE(tcp6_payload_ip[tcp_payload_sock_used]);
> @@ -294,8 +370,11 @@ static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn,
>  	tcp_l2_buf_fill_headers(conn, iov, check, seq, false);
>  	tcp_payload_sock_produce(1);
>  	if (tcp_payload_sock_used > TCP_FRAMES_MEM - 1) {
> +		tcp_buf_free_old_tap_xmit(c, 1);
>  		tcp_payload_flush(c);
> +		assert(tcp_payload_sock_used <= TCP_FRAMES_MEM - 1);
>  	}
> +
>  }
>
>  /**
> @@ -362,11 +441,14 @@ int tcp_buf_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn)
>  	}
>
>  	if (tcp_payload_sock_used + fill_bufs > TCP_FRAMES_MEM) {
> +		tcp_buf_free_old_tap_xmit(c, fill_bufs);
>  		tcp_payload_flush(c);
> +		/* TODO how to report this to upper layers?
> +		 */
> +		assert(tcp_payload_sock_used + fill_bufs <= TCP_FRAMES_MEM);
>  	}
>
>  	for (i = 0, iov = iov_sock + 1; i < fill_bufs; i++, iov++) {
> -		iov->iov_base = &tcp_payload[tcp_payload_sock_used + i].data;
> +		iov->iov_base = &tcp_payload[(tcp_payload_sock_head + i) % TCP_FRAMES_MEM].data;
>  		iov->iov_len = mss;
>  	}
>  	if (iov_rem)

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson