From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202506 header.b=BSrto1Kp; dkim-atps=neutral Received: from mail.ozlabs.org (gandalf.ozlabs.org [150.107.74.76]) by passt.top (Postfix) with ESMTPS id ADF1D5A0271 for ; Thu, 24 Jul 2025 03:03:50 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202506; t=1753318856; bh=iH8zdLw61P1hcNnjfcTcREH9hCLvMBNxIFBc7yiYkb0=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=BSrto1Kp1hmm4Ej2OKm1S1i1CSvLboooqRKqin2jJix4OgPLlPVMrNwUgp+uvJKcI HDXvirx3tUEXL0S11CGMQnLLCIuZ+IcJ53w+oXQOLPGcWGYox2Ptb+ysk+dGIp/L+3 28TigspJdAOmMgcRrOJoi+SFdf0Uiqm703nFzwC6JEzddeIX3IZda5jTrrNkEr6rid LUsjNjtOFuyBkdHSpw5EfrYEV0/ybAk2njBDzUmvojJe+oVJEN/zbuAxmIDxMKAo4M Nz97bwEmNBaNAIPxQCIQqZ9WAfaCp6Uv3AESWqZJwX34gHy4Y0rSxYU9ePVqGIidhm A1LJNJXiHLL0A== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4bnXkJ1VCsz4xZh; Thu, 24 Jul 2025 11:00:56 +1000 (AEST) Date: Thu, 24 Jul 2025 11:03:39 +1000 From: David Gibson To: Eugenio =?iso-8859-1?Q?P=E9rez?= Subject: Re: [RFC v2 09/11] tcp: start conversion to circular buffer Message-ID: References: <20250709174748.3514693-1-eperezma@redhat.com> <20250709174748.3514693-10-eperezma@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="aZ8vZJS8lwXIlxuT" Content-Disposition: inline In-Reply-To: <20250709174748.3514693-10-eperezma@redhat.com> Message-ID-Hash: F5A6BMVMT4QHCQS5NHHA4S44ZUOLEOAK X-Message-ID-Hash: F5A6BMVMT4QHCQS5NHHA4S44ZUOLEOAK X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top, jasowang@redhat.com X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --aZ8vZJS8lwXIlxuT Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Jul 09, 2025 at 07:47:46PM +0200, Eugenio P=E9rez wrote: > The vhost-kernel module is async by nature: the driver (pasta) places a > few buffers in the virtqueue and the device (vhost-kernel) trust the s/trust/trusts/ > driver will not modify them until it uses them. To implement it is not > possible with TCP at the moment, as tcp_buf trust it can reuse the > buffers as soon as tcp_payload_flush() finish. >=20 > To achieve async let's make tcp_buf work with a circular ring, so vhost > can transmit at the same time pasta is queing more data. When a buffer > is received from a TCP socket, the element is placed in the ring and > sock_head is moved: > [][][][] > ^ ^ > | | > | sock_head > | > tail > tap_head >=20 > When the data is sent to vhost through the tx queue, tap_head is moved > forward: > [][][][] > ^ ^ > | | > | sock_head > | tap_head > | > tail >=20 > Finally, the tail move forward when vhost has used the tx buffers, so > tcp_payload (and all lower protocol buffers) can be reused. > [][][][] > ^ > | > sock_head > tap_head > tail This all sounds good. I wonder if it might be clearer to do this circular queue conversion as a separate patch series. I think it makes sense even without the context of vhost (it's closer to how most network things work). > In the case of error queueing to the vhost virtqueue, sock_head moves > backwards. The only possible error is that the queue is full, as sock_head moves backwards? Or tap_head moves backwards? > virtio-net does not report success on packet sending. >=20 > Starting as simple as possible, and only implementing the count > variables in this patch so it keeps working as previously. The circular > behavior will be added on top. >=20 > From ~16BGbit/s to ~13Gbit/s compared with write(2) to the tap. I don't really understand what you're comparing here. > Signed-off-by: Eugenio P=E9rez > --- > tcp_buf.c | 63 +++++++++++++++++++++++++++++++++++-------------------- > 1 file changed, 40 insertions(+), 23 deletions(-) >=20 > diff --git a/tcp_buf.c b/tcp_buf.c > index 242086d..0437120 100644 > --- a/tcp_buf.c > +++ b/tcp_buf.c > @@ -53,7 +53,12 @@ static_assert(MSS6 <=3D sizeof(tcp_payload[0].data), "= MSS6 is greater than 65516") > =20 > /* References tracking the owner connection of frames in the tap outqueu= e */ > static struct tcp_tap_conn *tcp_frame_conns[TCP_FRAMES_MEM]; > -static unsigned int tcp_payload_used; > +static unsigned int tcp_payload_sock_used, tcp_payload_tap_used; I think the "payload" here is a hangover from when we had separate queues for flags-only and data-containing packets. We can probably drop it and make a bunch of names shorter. > +static void tcp_payload_sock_produce(size_t n) > +{ > + tcp_payload_sock_used +=3D n; > +} > =20 > static struct iovec tcp_l2_iov[TCP_FRAMES_MEM][TCP_NUM_IOVS]; > =20 > @@ -132,6 +137,16 @@ static void tcp_revert_seq(const struct ctx *c, stru= ct tcp_tap_conn **conns, > } > } > =20 > +static void tcp_buf_free_old_tap_xmit(void) > +{ > + while (tcp_payload_tap_used) { > + tap_free_old_xmit(tcp_payload_tap_used); > + > + tcp_payload_tap_used =3D 0; > + tcp_payload_sock_used =3D 0; > + } > +} > + > /** > * tcp_payload_flush() - Send out buffers for segments with data or flags > * @c: Execution context > @@ -141,12 +156,13 @@ void tcp_payload_flush(const struct ctx *c) > size_t m; > =20 > m =3D tap_send_frames(c, &tcp_l2_iov[0][0], TCP_NUM_IOVS, > - tcp_payload_used, false); > - if (m !=3D tcp_payload_used) { > + tcp_payload_sock_used, true); > + if (m !=3D tcp_payload_sock_used) { > tcp_revert_seq(c, &tcp_frame_conns[m], &tcp_l2_iov[m], > - tcp_payload_used - m); > + tcp_payload_sock_used - m); > } > - tcp_payload_used =3D 0; > + tcp_payload_tap_used +=3D m; > + tcp_buf_free_old_tap_xmit(); > } > =20 > /** > @@ -195,12 +211,12 @@ int tcp_buf_send_flag(const struct ctx *c, struct t= cp_tap_conn *conn, int flags) > uint32_t seq; > int ret; > =20 > - iov =3D tcp_l2_iov[tcp_payload_used]; > + iov =3D tcp_l2_iov[tcp_payload_sock_used]; > if (CONN_V4(conn)) { > - iov[TCP_IOV_IP] =3D IOV_OF_LVALUE(tcp4_payload_ip[tcp_payload_used]); > + iov[TCP_IOV_IP] =3D IOV_OF_LVALUE(tcp4_payload_ip[tcp_payload_sock_use= d]); > iov[TCP_IOV_ETH].iov_base =3D &tcp4_eth_src; > } else { > - iov[TCP_IOV_IP] =3D IOV_OF_LVALUE(tcp6_payload_ip[tcp_payload_used]); > + iov[TCP_IOV_IP] =3D IOV_OF_LVALUE(tcp6_payload_ip[tcp_payload_sock_use= d]); > iov[TCP_IOV_ETH].iov_base =3D &tcp6_eth_src; > } > =20 > @@ -211,13 +227,14 @@ int tcp_buf_send_flag(const struct ctx *c, struct t= cp_tap_conn *conn, int flags) > if (ret <=3D 0) > return ret; > =20 > - tcp_payload_used++; > + tcp_payload_sock_produce(1); > l4len =3D optlen + sizeof(struct tcphdr); > iov[TCP_IOV_PAYLOAD].iov_len =3D l4len; > tcp_l2_buf_fill_headers(conn, iov, NULL, seq, false); > =20 > if (flags & DUP_ACK) { > - struct iovec *dup_iov =3D tcp_l2_iov[tcp_payload_used++]; > + struct iovec *dup_iov =3D tcp_l2_iov[tcp_payload_sock_used]; > + tcp_payload_sock_produce(1); > =20 > memcpy(dup_iov[TCP_IOV_TAP].iov_base, iov[TCP_IOV_TAP].iov_base, > iov[TCP_IOV_TAP].iov_len); > @@ -228,8 +245,9 @@ int tcp_buf_send_flag(const struct ctx *c, struct tcp= _tap_conn *conn, int flags) > dup_iov[TCP_IOV_PAYLOAD].iov_len =3D l4len; > } > =20 > - if (tcp_payload_used > TCP_FRAMES_MEM - 2) > + if (tcp_payload_sock_used > TCP_FRAMES_MEM - 2) { > tcp_payload_flush(c); > + } No { } here in passt style. > =20 > return 0; > } > @@ -251,19 +269,19 @@ static void tcp_data_to_tap(const struct ctx *c, st= ruct tcp_tap_conn *conn, > struct iovec *iov; > =20 > conn->seq_to_tap =3D seq + dlen; > - tcp_frame_conns[tcp_payload_used] =3D conn; > - iov =3D tcp_l2_iov[tcp_payload_used]; > + tcp_frame_conns[tcp_payload_sock_used] =3D conn; > + iov =3D tcp_l2_iov[tcp_payload_sock_used]; > if (CONN_V4(conn)) { > if (no_csum) { > - struct iovec *iov_prev =3D tcp_l2_iov[tcp_payload_used - 1]; > + struct iovec *iov_prev =3D tcp_l2_iov[tcp_payload_sock_used - 1]; > struct iphdr *iph =3D iov_prev[TCP_IOV_IP].iov_base; > =20 > check =3D &iph->check; > } > - iov[TCP_IOV_IP] =3D IOV_OF_LVALUE(tcp4_payload_ip[tcp_payload_used]); > + iov[TCP_IOV_IP] =3D IOV_OF_LVALUE(tcp4_payload_ip[tcp_payload_sock_use= d]); > iov[TCP_IOV_ETH].iov_base =3D &tcp4_eth_src; > } else if (CONN_V6(conn)) { > - iov[TCP_IOV_IP] =3D IOV_OF_LVALUE(tcp6_payload_ip[tcp_payload_used]); > + iov[TCP_IOV_IP] =3D IOV_OF_LVALUE(tcp6_payload_ip[tcp_payload_sock_use= d]); > iov[TCP_IOV_ETH].iov_base =3D &tcp6_eth_src; > } > payload =3D iov[TCP_IOV_PAYLOAD].iov_base; > @@ -274,8 +292,10 @@ static void tcp_data_to_tap(const struct ctx *c, str= uct tcp_tap_conn *conn, > payload->th.psh =3D push; > iov[TCP_IOV_PAYLOAD].iov_len =3D dlen + sizeof(struct tcphdr); > tcp_l2_buf_fill_headers(conn, iov, check, seq, false); > - if (++tcp_payload_used > TCP_FRAMES_MEM - 1) > + tcp_payload_sock_produce(1); > + if (tcp_payload_sock_used > TCP_FRAMES_MEM - 1) { > tcp_payload_flush(c); > + } > } > =20 > /** > @@ -341,15 +361,12 @@ int tcp_buf_data_from_sock(const struct ctx *c, str= uct tcp_tap_conn *conn) > mh_sock.msg_iovlen =3D fill_bufs; > } > =20 > - if (tcp_payload_used + fill_bufs > TCP_FRAMES_MEM) { > + if (tcp_payload_sock_used + fill_bufs > TCP_FRAMES_MEM) { > tcp_payload_flush(c); > - > - /* Silence Coverity CWE-125 false positive */ > - tcp_payload_used =3D 0; > } > =20 > for (i =3D 0, iov =3D iov_sock + 1; i < fill_bufs; i++, iov++) { > - iov->iov_base =3D &tcp_payload[tcp_payload_used + i].data; > + iov->iov_base =3D &tcp_payload[tcp_payload_sock_used + i].data; > iov->iov_len =3D mss; > } > if (iov_rem) > @@ -407,7 +424,7 @@ int tcp_buf_data_from_sock(const struct ctx *c, struc= t tcp_tap_conn *conn) > dlen =3D mss; > seq =3D conn->seq_to_tap; > for (i =3D 0; i < send_bufs; i++) { > - int no_csum =3D i && i !=3D send_bufs - 1 && tcp_payload_used; > + int no_csum =3D i && i !=3D send_bufs - 1 && tcp_payload_sock_used; > bool push =3D false; > =20 > if (i =3D=3D send_bufs - 1) { --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --aZ8vZJS8lwXIlxuT Content-Type: application/pgp-signature; name=signature.asc -----BEGIN PGP SIGNATURE----- iQIzBAEBCgAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmiBhmoACgkQzQJF27ox 2GdT6g//TyXsTcs6xX/DrgcsF2HZASnJ/yPFSIh+0yJ0nlx17vxVqQNpdQhBRsfJ WDYE/8sXkHSawHQmC5Psr1JhZkGx7tV1JUaxZFGHXeKNwU29rZq120vvgN51jIFC QqM1EZSlKt672BX4JzvBE3McNkQN1o4SGK8LPU5a5t5f7sV6OXkBTbFyCtH7p4n8 jGHlA7bQ3hdNCxNGvE5oSv7sqTHViOWe0G7/cG9/XaM0SpZ/t1HN98epJv6se3wS NxqktbriooIFDELOioTmlVZya0zP9xNFgAXpY4NPIC3DOPEy2tFQVbzwNAOivYMJ 2cMq9PCQrdnXEbPJzb907u4C+Lf4WugCO7bIQxE2AmF37CHBdDXewgWg9QVQr/3/ 2SNPlE1mS2RPexBDcul4xbfH7hiVAl/2gdF2BhT0SZKddsDtpwYTBsBQMzltvGPu 1FnLJYmq2HIUuWluN/43z8DYTE6HJ8fGLpokWlR8bqbr89Rv/kcZE9rTdjOOJ+ZI lRK5u+sqWq0HLyEqW6R2gSlcY9fambiTHtZC7KqoBry5uthK75K2JknMKNcX6FXY 1LE9add3pzIJ7BVGvY0ZHstIwc7JXykIx9DSK3pIuScnhUM4aqzyXMpLfjqpGtrN wqV2dkWQXasCO0TWWXEKIOsfn3DXEk3+bgDfYicV4CnLbeU/HLA= =Lyid -----END PGP SIGNATURE----- --aZ8vZJS8lwXIlxuT--