From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: passt.top; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=Qfe+L8fv; dkim-atps=neutral Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by passt.top (Postfix) with ESMTP id 66E7C5A004E for ; Fri, 15 Nov 2024 11:09:07 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1731665346; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ZnTtNjuO34it5TbJNe27eIy7qi5pPTcf46CuOlYxD6g=; b=Qfe+L8fvgu9U3iZ1VQZWCqyo3kPP6RLmqRgqX/Qipzs8VRUygZhJmJDLFvi+WkjRsSzIDb y/MqO2NG8oN7slAkSyKwgMv1qeS8RlFPqCS9wjYcIH31TuXPTGKpTGELKrqnv+yrBjCjHJ urBc3QLudcuqLPpzMN2lsWrwS7lT0g4= Received: from mail-wm1-f71.google.com (mail-wm1-f71.google.com [209.85.128.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-449-Xj3i_mQgNF-sIz8JD9zCqg-1; Fri, 15 Nov 2024 05:09:05 -0500 X-MC-Unique: Xj3i_mQgNF-sIz8JD9zCqg-1 X-Mimecast-MFC-AGG-ID: Xj3i_mQgNF-sIz8JD9zCqg Received: by mail-wm1-f71.google.com with SMTP id 5b1f17b1804b1-4315a0f25afso12617175e9.3 for ; Fri, 15 Nov 2024 02:09:04 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1731665342; x=1732270142; h=content-transfer-encoding:mime-version:organization:references :in-reply-to:message-id:subject:cc:to:from:date:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=ZnTtNjuO34it5TbJNe27eIy7qi5pPTcf46CuOlYxD6g=; b=Ng7W9/C4/XUz20H5oNQefzbajstWPw3M4LpNkeZ7Me91pkjir+fKJERb2PNUHgzOXm jEy8tKqngzyPu62aiHTbG6ynEhLfD1SNRkbbzXSkYWufIfzs6tNLQMCd8UFgyM4rR+fY APOvCMLO7EQOco9epb8fCJDtSqg26p6koXVrQ7DMB7U3E/5PbTpArGmw3eBKi/hzj1i9 Ps617sB+TFMissBCC5jSikjKSlb2s+EaeFod/UzIvh3eHPNPJGm8BIQSvei2sGJO5YSy /16dgjm1XQnhW8DxNpyPjmQNt7GjdYMOZBrtc4sgJxvZ+yckzg8g3Bkp08UDw/R/XsW2 26Yg== X-Gm-Message-State: AOJu0YzOMyaToYV/Me9WGnWwhtk6vkcXnW3eoy2r8mkdFAoYaVQFXNO0 TZnEWssWupS6k3pfqYcnnx9QCDBGc0S+64sQvjdAq4L/HW9p7wWc8xTKvZetfwi1iie2kPUG8Kh qgijBmRr02rDu7R0YZm5zKW+5esETUFtlBEubL5b4Z9cRwVseBWT0Dk5hDpWXb58QedEqBtTSdm tqV/zVAU47En9YSUB7HQA2+PtpozIuk9ID X-Received: by 2002:a05:600c:4f4a:b0:42f:75cd:2566 with SMTP id 5b1f17b1804b1-432df7213e0mr16915425e9.2.1731665342469; Fri, 15 Nov 2024 02:09:02 -0800 (PST) X-Google-Smtp-Source: AGHT+IFX5hOyejSrTD25GGGsLDlnoN+jkYapeeAEPBT89TKS/NlDvnmq6LDDkhqjHCYou3YWLSyBjg== X-Received: by 2002:a05:600c:4f4a:b0:42f:75cd:2566 with SMTP id 5b1f17b1804b1-432df7213e0mr16915125e9.2.1731665341930; Fri, 15 Nov 2024 02:09:01 -0800 (PST) Received: from maya.myfinge.rs (ifcgrfdd.trafficplex.cloud. [2a10:fc81:a806:d6a9::1]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-432da27fcb4sm53213245e9.23.2024.11.15.02.09.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 15 Nov 2024 02:09:01 -0800 (PST) Date: Fri, 15 Nov 2024 11:08:55 +0100 From: Stefano Brivio To: Laurent Vivier Subject: Re: [PATCH v8 7/8] vhost-user: add vhost-user Message-ID: <20241115110855.4d2d9742@elisabeth> In-Reply-To: <03986996-a879-4109-8a7c-cc68c1c00464@redhat.com> References: <20241010122903.1188992-1-lvivier@redhat.com> <20241010122903.1188992-8-lvivier@redhat.com> <20241017021034.437f3757@elisabeth> <81955149-fc8f-47ef-b6fb-ce284e8e8b1b@redhat.com> <20241114152316.6843a427@elisabeth> <03986996-a879-4109-8a7c-cc68c1c00464@redhat.com> Organization: Red Hat X-Mailer: Claws Mail 4.2.0 (GTK 3.24.41; x86_64-pc-linux-gnu) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: pu133hCsQHmjuh-dcmxOQGYK_PGyUOn7xSLn34FkjZk_1731665344 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID-Hash: RU2VJIXZD44DCPVM2GIPOMTAJDAFZCWV X-Message-ID-Hash: RU2VJIXZD44DCPVM2GIPOMTAJDAFZCWV X-MailFrom: sbrivio@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Fri, 15 Nov 2024 09:30:45 +0100 Laurent Vivier wrote: > On 14/11/2024 15:23, Stefano Brivio wrote: > > On Thu, 14 Nov 2024 11:23:11 +0100 > > Laurent Vivier wrote: > > > >> On 17/10/2024 02:10, Stefano Brivio wrote: > >>>> +/** > >>>> + * tcp_vu_data_from_sock() - Handle new data from socket, queue to vhost-user, > >>>> + * in window > >>>> + * @c: Execution context > >>>> + * @conn: Connection pointer > >>>> + * > >>>> + * Return: Negative on connection reset, 0 otherwise > >>>> + */ > >>>> +int tcp_vu_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn) > >>>> +{ > >>>> + uint32_t wnd_scaled = conn->wnd_from_tap << conn->ws_from_tap; > >>>> + struct vu_dev *vdev = c->vdev; > >>>> + struct vu_virtq *vq = &vdev->vq[VHOST_USER_RX_QUEUE]; > >>>> + const struct flowside *tapside = TAPFLOW(conn); > >>>> + uint16_t mss = MSS_GET(conn); > >>>> + size_t l2_hdrlen, fillsize; > >>>> + int i, iov_cnt, iov_used; > >>>> + int v4 = CONN_V4(conn); > >>>> + uint32_t already_sent = 0; > >>>> + const uint16_t *check; > >>>> + struct iovec *first; > >>>> + int frame_size; > >>>> + int num_buffers; > >>>> + ssize_t len; > >>>> + > >>>> + if (!vu_queue_enabled(vq) || !vu_queue_started(vq)) { > >>>> + flow_err(conn, > >>>> + "Got packet, but RX virtqueue not usable yet"); > >>>> + return 0; > >>>> + } > >>>> + > >>>> + already_sent = conn->seq_to_tap - conn->seq_ack_from_tap; > >>>> + > >>>> + if (SEQ_LT(already_sent, 0)) { > >>>> + /* RFC 761, section 2.1. */ > >>>> + flow_trace(conn, "ACK sequence gap: ACK for %u, sent: %u", > >>>> + conn->seq_ack_from_tap, conn->seq_to_tap); > >>>> + conn->seq_to_tap = conn->seq_ack_from_tap; > >>>> + already_sent = 0; > >>>> + } > >>>> + > >>>> + if (!wnd_scaled || already_sent >= wnd_scaled) { > >>>> + conn_flag(c, conn, STALLED); > >>>> + conn_flag(c, conn, ACK_FROM_TAP_DUE); > >>>> + return 0; > >>>> + } > >>>> + > >>>> + /* Set up buffer descriptors we'll fill completely and partially. */ > >>>> + > >>>> + fillsize = wnd_scaled; > >>>> + > >>>> + if (peek_offset_cap) > >>>> + already_sent = 0; > >>>> + > >>>> + iov_vu[0].iov_base = tcp_buf_discard; > >>>> + iov_vu[0].iov_len = already_sent; > >>>> + fillsize -= already_sent; > >>>> + > >>>> + /* collect the buffers from vhost-user and fill them with the > >>>> + * data from the socket > >>>> + */ > >>>> + iov_cnt = tcp_vu_sock_recv(c, conn, v4, fillsize, &len); > >>>> + if (iov_cnt <= 0) > >>>> + return iov_cnt; > >>>> + > >>>> + len -= already_sent; > >>>> + if (len <= 0) { > >>>> + conn_flag(c, conn, STALLED); > >>>> + vu_queue_rewind(vq, iov_cnt); > >>>> + return 0; > >>>> + } > >>>> + > >>>> + conn_flag(c, conn, ~STALLED); > >>>> + > >>>> + /* Likely, some new data was acked too. */ > >>>> + tcp_update_seqack_wnd(c, conn, 0, NULL); > >>>> + > >>>> + /* initialize headers */ > >>>> + l2_hdrlen = tcp_vu_l2_hdrlen(!v4); > >>>> + iov_used = 0; > >>>> + num_buffers = 0; > >>>> + check = NULL; > >>>> + frame_size = 0; > >>>> + > >>>> + /* iov_vu is an array of buffers and the buffer size can be > >>>> + * smaller than the frame size we want to use but with > >>>> + * num_buffer we can merge several virtio iov buffers in one packet > >>>> + * we need only to set the packet headers in the first iov and > >>>> + * num_buffer to the number of iov entries > >>> ...this part is clear to me, what I don't understand is if we still > >>> have a way to guarantee that the sum of several buffers is big enough > >>> to fit frame_size bytes. > >> > >> We don't have this garantee. But I think it's the same for the socket version? > > > > Well, there we do: > > > > fill_bufs = DIV_ROUND_UP(wnd_scaled - already_sent, mss); > > if (fill_bufs > TCP_FRAMES) { > > fill_bufs = TCP_FRAMES; > > > > and we don't fetch more data than that from the socket (in one pass). > > > > Is this implicit in the i < iov_cnt loop condition here? That's the part > > I don't understand: how do we limit the amount of data we can dequeue > > from a socket in one single pass. > > > > In the loop "i < iov_cnt" is the number of available buffers collected previously. Usually > the size of one buffer is 1536 bytes. We join the buffers here (when > VIRTIO_NET_F_MRG_RXBUF is avaialble) to create frame with a size of "mss". > > iov_cnt is computed in tcp_vu_sock_recv(): this is the number of buffers we have collected > from the queue to have enough space to store fillsize bytes. But if we don't have enough > buffers in the queue ioc_cnt will be lower and the size of the data we will collect will > be truncated. Oh, I see, like I suspected. I find it clearer to actually iterate on the "source" (data we have to transfer) rather than on the destination (buffers we might fill), as we do in tcp_data_from_sock(), but there it's easier as we own the buffers and we know they all have the same size. Never mind then, I guess it's clear enough for now, maybe at some point we'll find a reasonable way to turn "i < iov_cnt" into a stop condition and iterate in the for loop on a different variable. -- Stefano