From: Laurent Vivier
To: passt-dev@passt.top
Subject: [PATCH 00/24] Add vhost-user support to passt.
Date: Fri, 2 Feb 2024 15:11:27 +0100
Message-ID: <20240202141151.3762941-1-lvivier@redhat.com>
CC: Laurent Vivier

This series adds vhost-user support to passt, allowing passt to connect to the QEMU network backend through virtqueues rather than a socket.
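For readers new to virtqueues: instead of reading frames from a socket, a vhost-user backend receives chains of descriptors that point into guest memory and walks each chain to find the buffers. The sketch below shows that walk for a split virtqueue, assuming the descriptor layout from the virtio 1.x specification. The names desc_chain_to_iov(), guest_to_host() and guest_mem are hypothetical, and the toy "guest memory" is a local array. This is not the virtio.c code from this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Split-virtqueue descriptor, laid out as in the virtio 1.x specification.
 * Fields are little-endian on the wire; this sketch assumes a little-endian
 * host and skips the byte swapping a real implementation needs. */
struct vring_desc {
	uint64_t addr;	/* guest-physical address of the buffer */
	uint32_t len;	/* buffer length in bytes */
	uint16_t flags;	/* VRING_DESC_F_* bits */
	uint16_t next;	/* next descriptor index, valid if F_NEXT is set */
};

#define VRING_DESC_F_NEXT	1	/* chain continues at 'next' */
#define VRING_DESC_F_WRITE	2	/* device-writable buffer (RX); unused here */

/* Hypothetical guest RAM: in this toy, a guest address is an offset into
 * this array. A vhost-user backend would instead translate through the
 * memory regions it mapped from the front-end. */
static char guest_mem[4096];

/* Hypothetical translation helper: NULL if [gpa, gpa + len) is out of range. */
static void *guest_to_host(uint64_t gpa, uint32_t len)
{
	if (gpa > sizeof(guest_mem) || len > sizeof(guest_mem) - gpa)
		return NULL;
	return guest_mem + gpa;
}

/* Hypothetical helper: walk the descriptor chain starting at 'head' and
 * describe its buffers in 'iov'. Returns the number of entries used, or -1
 * for a bad index, an out-of-range buffer, or a chain longer than 'iov_max'
 * (which also catches chains that loop back on themselves). */
static int desc_chain_to_iov(const struct vring_desc *table, unsigned int qsize,
			     uint16_t head, struct iovec *iov, int iov_max)
{
	uint16_t i = head;
	int n = 0;

	assert(table && iov && iov_max > 0);

	for (;;) {
		if (i >= qsize || n >= iov_max)
			return -1;

		iov[n].iov_base = guest_to_host(table[i].addr, table[i].len);
		if (!iov[n].iov_base)
			return -1;
		iov[n].iov_len = table[i].len;
		n++;

		if (!(table[i].flags & VRING_DESC_F_NEXT))
			return n;
		i = table[i].next;
	}
}
```

For example, a chain 0 -> 2 -> 1 yields three iovec entries in chain order rather than table order. A chain that loops back on itself is rejected once it exceeds iov_max.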
With iperf3, we measure roughly a 10x throughput improvement, from about 7x (UDP over IPv4) to about 12.5x (TCP):

$ iperf3 -c localhost -p 10001 -t 60 -6 -u -b 50G
socket:
[  5]   0.00-60.04  sec  30.5 GBytes  4.36 Gbits/sec  0.065 ms  9127377/10125415 (90%)  receiver
vhost-user:
[  5]   0.00-60.05  sec   292 GBytes  41.8 Gbits/sec  0.007 ms  259805/9832736 (2.6%)  receiver

$ iperf3 -c localhost -p 10001 -t 60 -4 -u -b 50G
socket:
[  5]   0.00-60.04  sec  36.4 GBytes  5.21 Gbits/sec  0.048 ms  7535735/8728101 (86%)  receiver
vhost-user:
[  5]   0.00-60.05  sec   259 GBytes  37.0 Gbits/sec  0.003 ms  142594/8616705 (1.7%)  receiver

$ iperf3 -c localhost -p 10001 -t 60 -6
socket:
[  5]   0.00-60.00  sec  16.3 GBytes  2.33 Gbits/sec    0             sender
[  5]   0.00-60.06  sec  16.3 GBytes  2.32 Gbits/sec                  receiver
vhost-user:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-60.00  sec   205 GBytes  29.3 Gbits/sec    0             sender
[  5]   0.00-60.04  sec   205 GBytes  29.3 Gbits/sec                  receiver

$ iperf3 -c localhost -p 10001 -t 60 -4
socket:
[  5]   0.00-60.00  sec  16.1 GBytes  2.31 Gbits/sec    0             sender
[  5]   0.00-60.07  sec  16.1 GBytes  2.31 Gbits/sec                  receiver
vhost-user:
[  5]   0.00-60.00  sec   201 GBytes  28.7 Gbits/sec    0             sender
[  5]   0.00-60.04  sec   201 GBytes  28.7 Gbits/sec                  receiver

With QEMU, instead of connecting with:

  -netdev stream,id=s,server=off,addr.type=unix,addr.path=/tmp/passt_1.socket

we use:

  -chardev socket,id=chr0,path=/tmp/passt_1.socket
  -netdev vhost-user,id=netdev0,chardev=chr0
  -device virtio-net,netdev=netdev0
  -object memory-backend-memfd,id=memfd0,share=on,size=$RAMSIZE
  -numa node,memdev=memfd0

The memory backend (memfd with share=on) is needed so that guest memory can be shared between QEMU and passt.
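Why share=on matters: a vhost-user front-end such as QEMU sends the back-end the file descriptors that back guest RAM, and the back-end maps them into its own address space. Guest memory therefore has to be a MAP_SHARED mapping of a shareable file such as a memfd. As a Linux-only illustration (not QEMU or passt code; map_memfd_twice() is a hypothetical name), the sketch below maps one memfd twice in a single process to show that a write through one mapping is visible through the other. In the real setup, one mapping lives in QEMU and the other in passt.

```c
#define _GNU_SOURCE		/* for memfd_create() (glibc >= 2.27) */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: create an anonymous, shareable memory file of 'size'
 * bytes (what memory-backend-memfd provides) and map it twice with
 * MAP_SHARED. Returns the memfd, or -1 on error; *a and *b receive the two
 * mappings. In vhost-user, the fd would be passed to the back-end over the
 * UNIX socket and mmap()ed there instead of in the same process. */
static int map_memfd_twice(size_t size, char **a, char **b)
{
	int fd;

	assert(size > 0 && a && b);

	fd = memfd_create("guest-ram", 0);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)size) < 0)
		goto fail;

	*a = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (*a == MAP_FAILED)
		goto fail;

	*b = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (*b == MAP_FAILED) {
		munmap(*a, size);
		goto fail;
	}
	return fd;

fail:
	close(fd);
	return -1;
}
```

With a private (non-shared) mapping, the write would stay local to the writer. Without share=on, the back-end therefore could not see the frames the guest places in its buffers.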
The series starts by introducing new functions to manage iovecs and to compute checksums on unaligned memory (buffers provided by the guest cannot be aligned the way passt aligns its internal buffers):

  iov: add some functions to manage iovec
  pcap: add pcap_iov()
  checksum: align buffers
  checksum: add csum_iov()

We introduce new files, ip.c and ip.h, to provide generic IP functions:

  util: move IP stuff from util.[ch] to ip.[ch]
  ip: move duplicate IPv4 checksum function to ip.h

Then we extract the internal passt buffer management from the existing TCP and UDP functions, so that these functions can also be used with guest-provided buffers:

  tcp: extract buffer management from tcp_send_flag()
  tcp: extract buffer management from tcp_conn_tap_mss()
  tcp: rename functions that manage buffers
  tcp: move buffers management functions to their own file
  tap: make tap_update_mac() generic
  tap: export pool_flush()/tapX_handler()/packet_add()
  udp: move udpX_l2_buf_t and udpX_l2_mh_sock out of udp_update_hdrX()
  udp: rename udp_sock_handler() to udp_buf_sock_handler()
  packet: replace struct desc by struct iovec

As vhost-user is a variant of passt mode, the code is modified to test for (!MODE_PASTA) rather than (MODE_PASST || MODE_VU):

  vhost-user: compare mode MODE_PASTA and not MODE_PASST

We introduce virtio and vhost-user management functions:

  vhost-user: introduce virtio API
  vhost-user: introduce vhost-user API

Then comes a first version of vhost-user that copies data from the passt buffers to guest memory and vice versa, as is done on the socket path:

  vhost-user: add vhost-user

Finally, we remove the buffer copies on TX and RX (TCP/UDP):

  vhost-user: use guest buffer directly in vu_handle_tx()
  tcp: vhost-user RX nocopy
  udp: vhost-user RX nocopy
  vhost-user: remove tap_send_frames_vu()

Thanks,
Laurent

Laurent Vivier (24):
  iov: add some functions to manage iovec
  pcap: add pcap_iov()
  checksum: align buffers
  checksum: add csum_iov()
  util: move IP stuff from util.[ch] to ip.[ch]
  ip: move duplicate IPv4 checksum function to ip.h
  ip: introduce functions to compute the header part checksum for TCP/UDP
  tcp: extract buffer management from tcp_send_flag()
  tcp: extract buffer management from tcp_conn_tap_mss()
  tcp: rename functions that manage buffers
  tcp: move buffers management functions to their own file
  tap: make tap_update_mac() generic
  tap: export pool_flush()/tapX_handler()/packet_add()
  udp: move udpX_l2_buf_t and udpX_l2_mh_sock out of udp_update_hdrX()
  udp: rename udp_sock_handler() to udp_buf_sock_handler()
  packet: replace struct desc by struct iovec
  vhost-user: compare mode MODE_PASTA and not MODE_PASST
  vhost-user: introduce virtio API
  vhost-user: introduce vhost-user API
  vhost-user: add vhost-user
  vhost-user: use guest buffer directly in vu_handle_tx()
  tcp: vhost-user RX nocopy
  udp: vhost-user RX nocopy
  vhost-user: remove tap_send_frames_vu()

 Makefile       |    7 +-
 checksum.c     |   51 ++-
 checksum.h     |    1 +
 conf.c         |   33 +-
 dhcp.c         |    1 +
 flow.c         |    1 +
 icmp.c         |    1 +
 iov.c          |   78 ++++
 iov.h          |   46 +++
 ip.c           |   72 ++++
 ip.h           |  124 ++++++
 isolation.c    |   10 +-
 ndp.c          |    1 +
 packet.c       |   81 ++--
 packet.h       |   16 +-
 passt.c        |   18 +-
 passt.h        |   10 +
 pcap.c         |   32 ++
 pcap.h         |    1 +
 port_fwd.c     |    1 +
 qrap.c         |    1 +
 tap.c          |  226 +++++++----
 tap.h          |   13 +-
 tcp.c          |  789 ++++++------------------------------
 tcp.h          |    2 +-
 tcp_buf.c      |  569 ++++++++++++++++++++++++++
 tcp_buf.h      |   17 +
 tcp_internal.h |   78 ++++
 tcp_splice.c   |    1 +
 tcp_vu.c       |  447 +++++++++++++++++++++
 tcp_vu.h       |   10 +
 udp.c          |  171 ++++----
 udp.h          |    4 +-
 udp_internal.h |   21 +
 udp_vu.c       |  215 ++++++++++
 udp_vu.h       |    8 +
 util.c         |   55 ---
 util.h         |   83 +---
 vhost_user.c   | 1050 ++++++++++++++++++++++++++++++++++++++++++++++++
 vhost_user.h   |  137 +++++++
 virtio.c       |  484 ++++++++++++++++++++++
 virtio.h       |  121 ++++++
 42 files changed, 4041 insertions(+), 1046 deletions(-)
 create mode 100644 iov.c
 create mode 100644 iov.h
 create mode 100644 ip.c
 create mode 100644 ip.h
 create mode 100644 tcp_buf.c
 create mode 100644 tcp_buf.h
 create mode 100644 tcp_internal.h
 create mode 100644 tcp_vu.c
 create mode 100644 tcp_vu.h
 create mode 100644 udp_internal.h
 create mode 100644 udp_vu.c
 create mode 100644 udp_vu.h
 create mode 100644 vhost_user.c
 create mode 100644 vhost_user.h
 create mode 100644 virtio.c
 create mode 100644 virtio.h

-- 
2.42.0
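The series description mentions computing checksums over unaligned, guest-provided memory spread across an iovec (csum_iov()). The hypothetical csum_iov_sketch() below illustrates that problem; it is not passt's implementation. It computes the RFC 1071 Internet checksum over an iovec one byte at a time. Reading single bytes avoids unaligned 16-bit loads, and a segment of odd length leaves its last byte pending so that it pairs with the first byte of the next segment. A real implementation would process wider words for speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical helper: RFC 1071 one's-complement checksum over data
 * scattered across 'iovcnt' iovec entries. Data is treated as a stream of
 * big-endian 16-bit words; the returned value is meant to be stored in
 * network byte order. No alignment is assumed for any buffer. */
static uint16_t csum_iov_sketch(const struct iovec *iov, size_t iovcnt)
{
	uint64_t sum = 0;	/* 64 bits: no overflow for realistic sizes */
	int pending = 0;	/* 1 if 'hi' still waits for its low byte */
	uint8_t hi = 0;

	assert(iov || iovcnt == 0);

	for (size_t i = 0; i < iovcnt; i++) {
		const uint8_t *p = iov[i].iov_base;

		for (size_t j = 0; j < iov[i].iov_len; j++) {
			if (!pending) {
				hi = p[j];	/* first byte of a word */
				pending = 1;
			} else {
				sum += ((uint64_t)hi << 8) | p[j];
				pending = 0;
			}
		}
		/* An odd-length segment leaves 'pending' set, so its last
		 * byte pairs with the first byte of the next segment. */
	}
	if (pending)		/* pad a trailing lone byte with zero */
		sum += (uint64_t)hi << 8;

	while (sum >> 16)	/* fold carries back into the low 16 bits */
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```

For instance, the 20-byte IPv4 header 4500 0073 0000 4000 4011 0000 c0a8 0001 c0a8 00c7, split into segments of 3, 8 and 9 bytes starting at an odd address, yields 0xb861. With that value stored in the checksum field, the same call returns 0.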