From: Eugenio Perez Martin <eperezma@redhat.com>
Date: Mon, 28 Jul 2025 18:55:50 +0200
Subject: Re: [RFC v2 09/11] tcp: start conversion to circular buffer
To: David Gibson
CC: passt-dev@passt.top, jasowang@redhat.com
List-Id: Development discussion and patches for passt

On Thu, Jul 24, 2025 at 3:03 AM David Gibson wrote:
>
> On Wed, Jul 09, 2025 at 07:47:46PM +0200, Eugenio Pérez wrote:
> > The vhost-kernel module is async by nature: the driver (pasta) places a
> > few buffers in the virtqueue and the device (vhost-kernel) trust the
>
> s/trust/trusts/
>

Fixing in the next version.

> > driver will not modify them until it uses them. To implement it is not
> > possible with TCP at the moment, as tcp_buf trust it can reuse the
> > buffers as soon as tcp_payload_flush() finish.
> >
> > To achieve async let's make tcp_buf work with a circular ring, so vhost
> > can transmit at the same time pasta is queing more data. When a buffer
> > is received from a TCP socket, the element is placed in the ring and
> > sock_head is moved:
> >
> > [][][][]
> > ^ ^
> > | |
> > | sock_head
> > |
> > tail
> > tap_head
> >
> > When the data is sent to vhost through the tx queue, tap_head is moved
> > forward:
> >
> > [][][][]
> > ^     ^
> > |     |
> > |     sock_head
> > |     tap_head
> > |
> > tail
> >
> > Finally, the tail move forward when vhost has used the tx buffers, so
> > tcp_payload (and all lower protocol buffers) can be reused.
> >
> > [][][][]
> >       ^
> >       |
> >       sock_head
> >       tap_head
> >       tail
>
> This all sounds good.  I wonder if it might be clearer to do this
> circular queue conversion as a separate patch series.  I think it
> makes sense even without the context of vhost (it's closer to how most
> network things work).
>

Sure it can be done.

> > In the case of error queueing to the vhost virtqueue, sock_head moves
> > backwards.
> > The only possible error is that the queue is full, as
>
> sock_head moves backwards?  Or tap_head moves backwards?
>

Sock head moves backwards. Tap_head cannot move backwards, as vhost
does not have a way to report "the last X packets have not been sent".

> > virtio-net does not report success on packet sending.
> >
> > Starting as simple as possible, and only implementing the count
> > variables in this patch so it keeps working as previously. The circular
> > behavior will be added on top.
> >
> > From ~16BGbit/s to ~13Gbit/s compared with write(2) to the tap.
>
> I don't really understand what you're comparing here.
>

Sending through vhost-net vs. write(2) to the tap device.

> > Signed-off-by: Eugenio Pérez
> > ---
> >  tcp_buf.c | 63 +++++++++++++++++++++++++++++++++++--------------------
> >  1 file changed, 40 insertions(+), 23 deletions(-)
> >
> > diff --git a/tcp_buf.c b/tcp_buf.c
> > index 242086d..0437120 100644
> > --- a/tcp_buf.c
> > +++ b/tcp_buf.c
> > @@ -53,7 +53,12 @@ static_assert(MSS6 <= sizeof(tcp_payload[0].data), "MSS6 is greater than 65516")
> >
> >  /* References tracking the owner connection of frames in the tap outqueue */
> >  static struct tcp_tap_conn *tcp_frame_conns[TCP_FRAMES_MEM];
> > -static unsigned int tcp_payload_used;
> > +static unsigned int tcp_payload_sock_used, tcp_payload_tap_used;
>
> I think the "payload" here is a hangover from when we had separate
> queues for flags-only and data-containing packets.  We can probably
> drop it and make a bunch of names shorter.
>

Maybe we can shorten them even more if we isolate this in its own
circular_buffer.h or equivalent. UDP will also need it.
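Not the actual passt API, just to make the shape concrete: a minimal sketch of what such a shared circular_buffer.h could track. All names here are illustrative (cbuf, CBUF_SIZE standing in for TCP_FRAMES_MEM), and the modulo arithmetic is only one possible layout:

```c
#include <stdbool.h>
#include <stddef.h>

#define CBUF_SIZE 256	/* illustrative stand-in for TCP_FRAMES_MEM */

struct cbuf {
	size_t sock_head;	/* next slot to fill from the socket */
	size_t tap_head;	/* first slot queued to the device */
	size_t tail;		/* first slot the device has not released */
};

/* Slots filled from the socket but not yet queued to the device */
size_t cbuf_queued(const struct cbuf *b)
{
	return (b->sock_head + CBUF_SIZE - b->tap_head) % CBUF_SIZE;
}

/* Slots not yet released by the device (queued or still in flight) */
size_t cbuf_in_flight(const struct cbuf *b)
{
	return (b->sock_head + CBUF_SIZE - b->tail) % CBUF_SIZE;
}

/* Claim one slot after a socket read; one slot always stays empty so
 * that a full ring and an empty ring are distinguishable */
bool cbuf_produce(struct cbuf *b)
{
	if (cbuf_in_flight(b) == CBUF_SIZE - 1)
		return false;	/* full: caller flushes, sock_head stays put */
	b->sock_head = (b->sock_head + 1) % CBUF_SIZE;
	return true;
}

/* Queue everything produced so far: tap_head catches up to sock_head */
void cbuf_flush(struct cbuf *b)
{
	b->tap_head = b->sock_head;
}

/* The device reports n buffers used: tail advances, slots are reusable */
void cbuf_release(struct cbuf *b, size_t n)
{
	b->tail = (b->tail + n) % CBUF_SIZE;
}
```

In this sketch, cbuf_produce() refusing to advance when the ring is full plays the role of "sock_head moves backwards" on a failed queue: the producer side backs off until cbuf_release() frees slots.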
> > +static void tcp_payload_sock_produce(size_t n)
> > +{
> > +	tcp_payload_sock_used += n;
> > +}
> >
> >  static struct iovec tcp_l2_iov[TCP_FRAMES_MEM][TCP_NUM_IOVS];
> >
> > @@ -132,6 +137,16 @@ static void tcp_revert_seq(const struct ctx *c, struct tcp_tap_conn **conns,
> >  	}
> >  }
> >
> > +static void tcp_buf_free_old_tap_xmit(void)
> > +{
> > +	while (tcp_payload_tap_used) {
> > +		tap_free_old_xmit(tcp_payload_tap_used);
> > +
> > +		tcp_payload_tap_used = 0;
> > +		tcp_payload_sock_used = 0;
> > +	}
> > +}
> > +
> >  /**
> >   * tcp_payload_flush() - Send out buffers for segments with data or flags
> >   * @c:	Execution context
> > @@ -141,12 +156,13 @@ void tcp_payload_flush(const struct ctx *c)
> >  	size_t m;
> >
> >  	m = tap_send_frames(c, &tcp_l2_iov[0][0], TCP_NUM_IOVS,
> > -			    tcp_payload_used, false);
> > +			    tcp_payload_sock_used, true);
> > -	if (m != tcp_payload_used) {
> > +	if (m != tcp_payload_sock_used) {
> >  		tcp_revert_seq(c, &tcp_frame_conns[m], &tcp_l2_iov[m],
> > -			       tcp_payload_used - m);
> > +			       tcp_payload_sock_used - m);
> >  	}
> > -	tcp_payload_used = 0;
> > +	tcp_payload_tap_used += m;
> > +	tcp_buf_free_old_tap_xmit();
> >  }
> >
> >  /**
> > @@ -195,12 +211,12 @@ int tcp_buf_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags)
> >  	uint32_t seq;
> >  	int ret;
> >
> > -	iov = tcp_l2_iov[tcp_payload_used];
> > +	iov = tcp_l2_iov[tcp_payload_sock_used];
> >  	if (CONN_V4(conn)) {
> > -		iov[TCP_IOV_IP] = IOV_OF_LVALUE(tcp4_payload_ip[tcp_payload_used]);
> > +		iov[TCP_IOV_IP] = IOV_OF_LVALUE(tcp4_payload_ip[tcp_payload_sock_used]);
> >  		iov[TCP_IOV_ETH].iov_base = &tcp4_eth_src;
> >  	} else {
> > -		iov[TCP_IOV_IP] = IOV_OF_LVALUE(tcp6_payload_ip[tcp_payload_used]);
> > +		iov[TCP_IOV_IP] = IOV_OF_LVALUE(tcp6_payload_ip[tcp_payload_sock_used]);
> >  		iov[TCP_IOV_ETH].iov_base = &tcp6_eth_src;
> >  	}
> >
> > @@ -211,13 +227,14 @@ int tcp_buf_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags)
> >  	if (ret <= 0)
> >  		return ret;
> >
> > -	tcp_payload_used++;
> > +	tcp_payload_sock_produce(1);
> >  	l4len = optlen + sizeof(struct tcphdr);
> >  	iov[TCP_IOV_PAYLOAD].iov_len = l4len;
> >  	tcp_l2_buf_fill_headers(conn, iov, NULL, seq, false);
> >
> >  	if (flags & DUP_ACK) {
> > -		struct iovec *dup_iov = tcp_l2_iov[tcp_payload_used++];
> > +		struct iovec *dup_iov = tcp_l2_iov[tcp_payload_sock_used];
> > +		tcp_payload_sock_produce(1);
> >
> >  		memcpy(dup_iov[TCP_IOV_TAP].iov_base, iov[TCP_IOV_TAP].iov_base,
> >  		       iov[TCP_IOV_TAP].iov_len);
> > @@ -228,8 +245,9 @@ int tcp_buf_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags)
> >  		dup_iov[TCP_IOV_PAYLOAD].iov_len = l4len;
> >  	}
> >
> > -	if (tcp_payload_used > TCP_FRAMES_MEM - 2)
> > +	if (tcp_payload_sock_used > TCP_FRAMES_MEM - 2) {
> >  		tcp_payload_flush(c);
> > +	}
>
> No { } here in passt style.
>

Fixing in the next revision.

> >
> >  	return 0;
> >  }
> > @@ -251,19 +269,19 @@ static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn,
> >  	struct iovec *iov;
> >
> >  	conn->seq_to_tap = seq + dlen;
> > -	tcp_frame_conns[tcp_payload_used] = conn;
> > -	iov = tcp_l2_iov[tcp_payload_used];
> > +	tcp_frame_conns[tcp_payload_sock_used] = conn;
> > +	iov = tcp_l2_iov[tcp_payload_sock_used];
> >  	if (CONN_V4(conn)) {
> >  		if (no_csum) {
> > -			struct iovec *iov_prev = tcp_l2_iov[tcp_payload_used - 1];
> > +			struct iovec *iov_prev = tcp_l2_iov[tcp_payload_sock_used - 1];
> >  			struct iphdr *iph = iov_prev[TCP_IOV_IP].iov_base;
> >
> >  			check = &iph->check;
> >  		}
> > -		iov[TCP_IOV_IP] = IOV_OF_LVALUE(tcp4_payload_ip[tcp_payload_used]);
> > +		iov[TCP_IOV_IP] = IOV_OF_LVALUE(tcp4_payload_ip[tcp_payload_sock_used]);
> >  		iov[TCP_IOV_ETH].iov_base = &tcp4_eth_src;
> >  	} else if (CONN_V6(conn)) {
> > -		iov[TCP_IOV_IP] = IOV_OF_LVALUE(tcp6_payload_ip[tcp_payload_used]);
> > +		iov[TCP_IOV_IP] = IOV_OF_LVALUE(tcp6_payload_ip[tcp_payload_sock_used]);
> >  		iov[TCP_IOV_ETH].iov_base = &tcp6_eth_src;
> >  	}
> >  	payload = iov[TCP_IOV_PAYLOAD].iov_base;
> > @@ -274,8 +292,10 @@ static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn,
> >  	payload->th.psh = push;
> >  	iov[TCP_IOV_PAYLOAD].iov_len = dlen + sizeof(struct tcphdr);
> >  	tcp_l2_buf_fill_headers(conn, iov, check, seq, false);
> > -	if (++tcp_payload_used > TCP_FRAMES_MEM - 1)
> > +	tcp_payload_sock_produce(1);
> > +	if (tcp_payload_sock_used > TCP_FRAMES_MEM - 1) {
> >  		tcp_payload_flush(c);
> > +	}
> >  }
> >
> >  /**
> > @@ -341,15 +361,12 @@ int tcp_buf_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn)
> >  		mh_sock.msg_iovlen = fill_bufs;
> >  	}
> >
> > -	if (tcp_payload_used + fill_bufs > TCP_FRAMES_MEM) {
> > +	if (tcp_payload_sock_used + fill_bufs > TCP_FRAMES_MEM) {
> >  		tcp_payload_flush(c);
> > -
> > -		/* Silence Coverity CWE-125 false positive */
> > -		tcp_payload_used = 0;
> >  	}
> >
> >  	for (i = 0, iov = iov_sock + 1; i < fill_bufs; i++, iov++) {
> > -		iov->iov_base = &tcp_payload[tcp_payload_used + i].data;
> > +		iov->iov_base = &tcp_payload[tcp_payload_sock_used + i].data;
> >  		iov->iov_len = mss;
> >  	}
> >  	if (iov_rem)
> > @@ -407,7 +424,7 @@ int tcp_buf_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn)
> >  	dlen = mss;
> >  	seq = conn->seq_to_tap;
> >  	for (i = 0; i < send_bufs; i++) {
> > -		int no_csum = i && i != send_bufs - 1 && tcp_payload_used;
> > +		int no_csum = i && i != send_bufs - 1 && tcp_payload_sock_used;
> >  		bool push = false;
> >
> >  		if (i == send_bufs - 1) {
> > --
> David Gibson (he or they)		| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
> 					| around.
> http://www.ozlabs.org/~dgibson