Date: Tue, 22 Oct 2024 20:19:14 +0200
From: Stefano Brivio <sbrivio@redhat.com>
To: Laurent Vivier
Cc: David Gibson, passt-dev@passt.top
Subject: Re: [PATCH v8 7/8] vhost-user: add vhost-user
Message-ID: <20241022201914.072f7c7d@elisabeth>
In-Reply-To: <20241022151953.3b99a7cd@elisabeth>
References: <20241010122903.1188992-1-lvivier@redhat.com>
	<20241010122903.1188992-8-lvivier@redhat.com>
	<20241015215438.1595b4d7@elisabeth>
	<20241017021031.1adb421e@elisabeth>
	<20241022151953.3b99a7cd@elisabeth>
Organization: Red Hat

On Tue, 22 Oct 2024 15:19:53 +0200
Stefano Brivio wrote:

> On Tue, 22 Oct 2024 14:59:19 +0200
> Laurent Vivier wrote:
>
> > On 17/10/2024 02:10, Stefano Brivio wrote:
> > > On Wed, 16 Oct 2024 11:41:34 +1100
> > > David Gibson wrote:
> > >
> > >> On Tue, Oct 15, 2024 at 09:54:38PM +0200, Stefano Brivio wrote:
> > >>> [Still partial review]
> > >> [snip]
> > >>>> +	if (peek_offset_cap)
> > >>>> +		already_sent = 0;
> > >>>> +
> > >>>> +	iov_vu[0].iov_base = tcp_buf_discard;
> > >>>> +	iov_vu[0].iov_len = already_sent;
> > >>>
> > >>> I think I had a similar comment on a previous revision. Now, I haven't
> > >>> tested this (yet) on a kernel with support for SO_PEEK_OFF on TCP, but
> > >>> I think this should eventually follow the same logic as the (updated)
> > >>> tcp_buf_data_from_sock(): we should use tcp_buf_discard only if
> > >>> (!peek_offset_cap).
> > >>>
> > >>> It's fine to always initialise VIRTQUEUE_MAX_SIZE iov_vu items,
> > >>> starting from 1, for simplicity. But I'm not sure if it's safe to pass
> > >>> a zero iov_len if (peek_offset_cap).
> > >>>
> > >>> I'll test that (unless you already did) -- if it works, we can fix
> > >>> this up later as well.
> > >>
> > >> I believe I tested it at some point, and I think we're already using
> > >> it somewhere.
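
(For reference, a minimal sketch of the SO_PEEK_OFF mechanism under
discussion -- illustrative only, not passt code: the function name is
made up, and it assumes a connected TCP socket on a kernel that
supports SO_PEEK_OFF for TCP:)

---
#include <sys/types.h>
#include <sys/socket.h>

/* Peek at queued data while skipping 'already_sent' bytes, without
 * draining the receive queue: data is consumed only when it's read
 * without MSG_PEEK, that is, once the guest has acknowledged it.
 */
ssize_t peek_unsent(int s, int already_sent, void *buf, size_t len)
{
	/* Set the peek offset: MSG_PEEK reads start there, and the
	 * kernel advances the offset as bytes are peeked.
	 */
	if (setsockopt(s, SOL_SOCKET, SO_PEEK_OFF,
		       &already_sent, sizeof(already_sent)))
		return -1;	/* no SO_PEEK_OFF for TCP: peek from zero
				 * and throw away 'already_sent' bytes,
				 * which is what tcp_buf_discard is for
				 */

	return recv(s, buf, len, MSG_PEEK);
}
---
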
> > > I tested it again just to be sure on a recent net.git kernel:
> > > sometimes the first test in passt_vu_in_ns/tcp, "TCP/IPv4: host to
> > > guest: big transfer" hangs on my setup, sometimes it's the
> > > "TCP/IPv4: ns to guest (using loopback address): big transfer" test
> > > instead.
> > >
> > > I can reproduce at least one of the two issues consistently (tests
> > > stopped 5 times out of 5).
> > >
> > > The socat client completes the transfer, the server is still waiting
> > > for something. I haven't taken captures yet or tried to re-send from
> > > the client.
> > >
> > > It all works (consistently) with an older kernel without support for
> > > SO_PEEK_OFF on TCP, but also on this kernel if I force
> > > peek_offset_cap to false in tcp_init().
> >
> > I have a fix for that, but there is an error I don't understand:
> > when I run the test twice, the second time I get:
> >
> > guest:
> > # socat -u TCP4-LISTEN:10001 OPEN:test_big.bin,create,trunc
> > # socat -u TCP4-LISTEN:10001 OPEN:test_big.bin,create,trunc
> > 2024/10/22 08:51:58 socat[1485] E bind(5, {AF=2 0.0.0.0:10001}, 16): Address already in use
> >
> > host:
> > $ socat -u OPEN:test/big.bin TCP4:127.0.0.1:10001
> >
> > If I wait a little, it works again several times, then fails again.
> >
> > Any idea?
>
> I guess the connection from the first test is not closed properly, so
> it's still in TIME_WAIT or similar. Given the topic, I can imagine that
> something goes wrong as we check:
>
> 	if (ack && conn->events & TAP_FIN_SENT &&
> 	    conn->seq_ack_from_tap == conn->seq_to_tap)
> 		conn_event(c, conn, TAP_FIN_ACKED);
>
> in tcp_data_from_tap(). Or something around FIN segments anyway.
>
> You can give socat the 'reuseaddr' option, say:
>
>   socat -u TCP4-LISTEN:10001,reuseaddr OPEN:test_big.bin,create,trunc
>
> to see if that's actually the case.
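
(A note on what 'reuseaddr' amounts to: it sets SO_REUSEADDR on the
listening socket before bind(), so that bind() succeeds even while the
previous connection on the same port lingers in TIME_WAIT -- otherwise
bind() fails with EADDRINUSE, which is exactly the error above. A
minimal sketch, with a made-up function name, not socat's actual code:)

---
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int listen_reuseaddr(in_port_t port)
{
	struct sockaddr_in a = {
		.sin_family	= AF_INET,
		.sin_port	= htons(port),
		.sin_addr	= { .s_addr = htonl(INADDR_ANY) },
	};
	int s = socket(AF_INET, SOCK_STREAM, 0);
	int one = 1;

	if (s < 0)
		return -1;

	/* Allow binding over a lingering TIME_WAIT entry */
	if (setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)))
		return -1;

	if (bind(s, (struct sockaddr *)&a, sizeof(a)) || listen(s, 1))
		return -1;

	return s;
}
---
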
> > The patch is:
> >
> > [...]
>
> I can try that in a bit.

...I was able to reproduce that only once, but I didn't have any
debugging option enabled. I kept retrying with your follow-up patch,
but I'm back to the same behaviour as I had without it (client
completing the transfer, server hanging).

That single time I reproduced it, both client and server exited, but I
couldn't restart the server right away on the same port.

Anyway, looking further into this: I'm using a recent net.git kernel,
with this series on top of current passt's HEAD, plus this follow-up
patch.

With --trace, I see that the transfer completes, then, once the client
is done, I get:

---
37.6288: Flow 0 (TCP connection): flag at tcp_vu_data_from_sock:417
37.6288: vhost-user: virtqueue can skip notify...
37.6288: Flow 0 (TCP connection): flag at tcp_vu_data_from_sock:477
37.6289: Flow 0 (TCP connection): timer expires in 2.000s
37.6289: passt: epoll event on connected TCP socket 75 (events: 0x00002001)
---

EPOLLRDHUP | EPOLLIN on the socket

---
37.6289: Flow 0 (TCP connection): event at tcp_sock_handler:2339
37.6289: Flow 0 (TCP connection): SOCK_FIN_RCVD: ESTABLISHED -> CLOSE_WAIT
---

FIN segment (inferred) from the socket

---
37.6289: Flow 0 (TCP connection): event at tcp_vu_sock_recv:262
37.6289: Flow 0 (TCP connection): TAP_FIN_SENT: CLOSE_WAIT -> LAST_ACK
---

FIN to the guest

---
37.6645: passt: epoll event on vhost-user kick socket 73 (events: 0x00000001)
37.6646: vhost-user: ot kick_data: 0000000000000001 idx:1
37.6646: tap: protocol 6, 88.198.0.164:6001 -> 88.198.0.161:53136 (1 packet)
37.6646: Flow 0 (TCP connection): packet length 20 from tap
37.6646: Flow 0 (TCP connection): flag at tcp_update_seqack_from_tap:1215
37.6646: Flow 0 (TCP connection): timer expires in 2.000s
37.6647: Flow 0 (TCP connection): flag at tcp_tap_handler:2095
37.6670: passt: epoll event on vhost-user kick socket 73 (events: 0x00000001)
37.6671: vhost-user: ot kick_data: 0000000000000001 idx:1
37.6671: tap: protocol 6, 88.198.0.164:6001 -> 88.198.0.161:53136 (2 packets)
37.6671: Flow 0 (TCP connection): packet length 20 from tap
37.6671: Flow 0 (TCP connection): flag at tcp_update_seqack_from_tap:1215
37.6671: Flow 0 (TCP connection): timer expires in 2.000s
37.6671: Flow 0 (TCP connection): flag at tcp_tap_handler:2095
37.6671: passt: epoll event on vhost-user kick socket 73 (events: 0x00000001)
37.6671: vhost-user: ot kick_data: 0000000000000002 idx:1
37.6672: tap: protocol 6, 88.198.0.164:6001 -> 88.198.0.161:53136 (1 packet)
37.6672: Flow 0 (TCP connection): packet length 20 from tap
37.6672: Flow 0 (TCP connection): flag at tcp_update_seqack_from_tap:1210
37.6672: Flow 0 (TCP connection): ACK_FROM_TAP_DUE dropped
37.6672: Flow 0 (TCP connection): event at tcp_data_from_tap:1922
37.6672: Flow 0 (TCP connection): TAP_FIN_ACKED: LAST_ACK -> LAST_ACK
---

everything ACKed from the guest, including the FIN

---
37.6672: Flow 0 (TCP connection): flag at tcp_tap_handler:2095
---

STALLED flag unset

...and now we're waiting for the guest to send us a FIN, so that we can
"send" it to the socket (shutdown(x, SHUT_WR)) and close the connection
in tap_sock_handler():

	if (conn->events & ESTABLISHED) {
		if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED))
			conn_event(c, conn, CLOSED);

...but that never comes.

I also took a capture (attached), in a separate attempt matching this
one (timestamps might look different), which also shows that all our
packets are acknowledged by the guest, but we never send a FIN to it.
This seems to be confirmed by stracing socat in the guest.
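
(If that FIN had gone out, the receiving socat in the guest would
eventually see it as a zero-length read on its connected socket -- and
the strace below shows that it never does. A minimal sketch of the two
halves involved, with made-up function names, not the actual passt or
socat code:)

---
#include <sys/socket.h>
#include <unistd.h>

/* passt side: forward the guest's FIN as a half-close on the host
 * socket; the read side stays open for anything still in flight.
 */
void forward_guest_fin(int s)
{
	shutdown(s, SHUT_WR);
}

/* guest side: an orderly shutdown by the peer shows up as EOF, a
 * zero-length return from read(), once the receive queue is drained.
 */
int saw_fin(int s, char *buf, size_t len)
{
	return read(s, buf, len) == 0;
}
---
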
The transfer ends like this:

read(6, "M6\263%\245\257\205\24\341\316\377\270\306\301\244\17\333\241/E\211/\243g\367\23\216-\346\306\22\356"..., 8192) = 3056
recvfrom(3, 0x7ffeb3145360, 519, MSG_DONTWAIT, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(3, 0x7ffeb3145860, 519, MSG_DONTWAIT, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
write(5, "M6\263%\245\257\205\24\341\316\377\270\306\301\244\17\333\241/E\211/\243g\367\23\216-\346\306\22\356"..., 3056) = 3056
recvfrom(3, 0x7ffeb3145860, 519, MSG_DONTWAIT, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(3, 0x7ffeb31458d0, 519, MSG_DONTWAIT, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
pselect6(7, [6], [5], [], NULL, NULL) = 1 (out [5])
recvfrom(3, 0x7ffeb31458d0, 519, MSG_DONTWAIT, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(3, 0x7ffeb31458d0, 519, MSG_DONTWAIT, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
pselect6(7, [6], [], [], NULL, NULL

...we should see an event on the socket and a zero-sized recvfrom() if
we had sent a FIN.

If we see TAP_FIN_SENT in the logs, it means that tcp_vu_send_flag() is
called, but it doesn't look like it's actually sending the FIN | ACK
flags we request in tcp_vu_sock_recv():

	int retf = tcp_vu_send_flag(c, conn, FIN | ACK);

So I added prints to tcp_vu_send_flag():

---
diff --git a/tcp_vu.c b/tcp_vu.c
index fc077c7..234d909 100644
--- a/tcp_vu.c
+++ b/tcp_vu.c
@@ -99,6 +99,8 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags)
 	int nb_ack;
 	int ret;
 
+	err("%s:%i", __func__, __LINE__);
+
 	hdrlen = tcp_vu_l2_hdrlen(CONN_V6(conn));
 
 	vu_init_elem(elem, iov_vu, 2);
@@ -109,6 +111,8 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags)
 	if (elem_cnt < 1)
 		return 0;
 
+	err("%s:%i", __func__, __LINE__);
+
 	vu_set_vnethdr(vdev, &iov_vu[0], 1, 0);
 
 	eh = vu_eth(iov_vu[0].iov_base);
@@ -139,6 +143,8 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags)
 		return ret;
 	}
 
+	err("%s:%i", __func__, __LINE__);
+
 	l4len = tcp_fill_headers4(conn, NULL, iph, payload, optlen, NULL,
 				  seq, true);
 	l2len = sizeof(*iph);
@@ -165,6 +171,8 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags)
 		return ret;
 	}
 
+	err("%s:%i", __func__, __LINE__);
+
 	l4len = tcp_fill_headers6(conn, NULL, ip6h, payload, optlen,
 				  seq, true);
 	l2len = sizeof(*ip6h);
@@ -196,6 +204,8 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags)
 
 	vu_flush(vdev, vq, elem, nb_ack);
 
+	err("%s:%i", __func__, __LINE__);
+
 	return 0;
 }
---

...and it looks like vu_collect_one_frame() returns 0, so
tcp_vu_send_flag() returns 0, and we think we sent the FIN segment to
the guest, but we didn't:

---
21.9120: Flow 0 (TCP connection): event at tcp_sock_handler:2339
21.9120: tcp_vu_send_flag:102
21.9120: Flow 0 (TCP connection): event at tcp_vu_sock_recv:272
21.9120: Flow 0 (TCP connection): TAP_FIN_SENT: CLOSE_WAIT -> LAST_ACK
---

...if vu_collect_one_frame() returned >= 1, as it usually does,
tcp_vu_send_flag() would reach at least line 114.

Side note: I thought I had commented on this in a previous revision,
but I can't find my comment anymore: if vu_collect_one_frame() returns
0, I guess that should be an error.

I don't know why vu_collect_one_frame() returns 0 here... I hope you
do. :)

-- 
Stefano