Date: Wed, 23 Oct 2024 18:23:53 +0200
From: Stefano Brivio
To: Laurent Vivier
CC: David Gibson, passt-dev@passt.top, Jon Maloy
Subject: Re: [PATCH v8 7/8] vhost-user: add vhost-user
Message-ID: <20241023182353.2e471c17@elisabeth>
In-Reply-To: <8ac883f3-f1ff-4cb7-85e7-02e590313261@redhat.com>

On Wed, 23 Oct 2024 17:27:49 +0200
Laurent Vivier wrote:

> On 22/10/2024 14:59, Laurent Vivier wrote:
> > On 17/10/2024 02:10, Stefano Brivio wrote:
> >> On Wed, 16 Oct 2024 11:41:34 +1100
> >> David Gibson wrote:
> >>
> >>> On Tue, Oct 15, 2024 at 09:54:38PM +0200, Stefano Brivio wrote:
> >>>> [Still partial review]
> >>> [snip]
> >>>>> +	if (peek_offset_cap)
> >>>>> +		already_sent = 0;
> >>>>> +
> >>>>> +	iov_vu[0].iov_base = tcp_buf_discard;
> >>>>> +	iov_vu[0].iov_len = already_sent;
> >>>>
> >>>> I think I had a similar comment to a previous revision. Now, I haven't
> >>>> tested this (yet) on a kernel with support for SO_PEEK_OFF on TCP, but
> >>>> I think this should eventually follow the same logic as the (updated)
> >>>> tcp_buf_data_from_sock(): we should use tcp_buf_discard only if
> >>>> (!peek_offset_cap).
> >>>>
> >>>> It's fine to always initialise VIRTQUEUE_MAX_SIZE iov_vu items,
> >>>> starting from 1, for simplicity. But I'm not sure if it's safe to pass a
> >>>> zero iov_len if (peek_offset_cap).
> >>>
> >>>> I'll test that (unless you already did) -- if it works, we can fix this
> >>>> up later as well.
> >>>
> >>> I believe I tested it at some point, and I think we're already using
> >>> it somewhere.
> >>
> >> I tested it again just to be sure on a recent net.git kernel: sometimes
> >> the first test in passt_vu_in_ns/tcp, "TCP/IPv4: host to guest: big
> >> transfer" hangs on my setup, sometimes it's the "TCP/IPv4: ns to guest
> >> (using loopback address): big transfer" test instead.
> >>
> >> I can reproduce at least one of the two issues consistently (tests
> >> stopped 5 times out of 5).
> >>
> >> The socat client completes the transfer, the server is still waiting
> >> for something. I haven't taken captures yet or tried to re-send from
> >> the client.
> >>
> >> It all works (consistently) with an older kernel without support for
> >> SO_PEEK_OFF on TCP, but also on this kernel if I force peek_offset_cap
> >> to false in tcp_init().
> >>
> >
> > I have a fix for that but there is an error I don't understand:
> > when I run the test twice, the second time I get:
> >
> > guest:
> > # socat -u TCP4-LISTEN:10001 OPEN:test_big.bin,create,trunc
> > # socat -u TCP4-LISTEN:10001 OPEN:test_big.bin,create,trunc
> > 2024/10/22 08:51:58 socat[1485] E bind(5, {AF=2 0.0.0.0:10001}, 16): Address already in use
> >
> > host:
> > $ socat -u OPEN:test/big.bin TCP4:127.0.0.1:10001
> >
> > If I wait a little, it works again several times and then fails again.
> >
> > Any idea?
> >
> > The patch is:
> > diff --git a/tcp_vu.c b/tcp_vu.c
> > index 78884c673215..83e40fb07a03 100644
> > --- a/tcp_vu.c
> > +++ b/tcp_vu.c
> > @@ -379,6 +379,10 @@ int tcp_vu_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn)
> >  			   conn->seq_ack_from_tap, conn->seq_to_tap);
> >  		conn->seq_to_tap = conn->seq_ack_from_tap;
> >  		already_sent = 0;
> > +		if (tcp_set_peek_offset(conn->sock, 0)) {
> > +			tcp_rst(c, conn);
> > +			return -1;
> > +		}
> >  	}
> >
> >  	if (!wnd_scaled || already_sent >= wnd_scaled) {
> > @@ -389,14 +393,13 @@ int tcp_vu_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn)
> >
> >  	/* Set up buffer descriptors we'll fill completely and partially. */
> >
> > -	fillsize = wnd_scaled;
> > +	fillsize = wnd_scaled - already_sent;
> >
> >  	if (peek_offset_cap)
> >  		already_sent = 0;
> >
> >  	iov_vu[0].iov_base = tcp_buf_discard;
> >  	iov_vu[0].iov_len = already_sent;
> > -	fillsize -= already_sent;
> >
> >  	/* collect the buffers from vhost-user and fill them with the
> >  	 * data from the socket
> >
>
> For the moment, I can see a behavior change of recvmsg() with the new kernel.
>
> Without peek_offset_cap, if no new data is available, it returns "already_sent",
> so it enters this path (found with tcp_vu.c, but the code samples are from tcp_buf.c):
>
> 	==> recvmsg() returns already_sent, so len > 0
>
> ...
> 	sendlen -= already_sent; ==> here sendlen becomes 0
>
> 	if (sendlen <= 0) {
> 		conn_flag(c, conn, STALLED);
> 		return 0;
> 	}
>
> With peek_offset_cap, it returns -1, so it enters this path:

This is expected, I think (and unfortunately not documented).

>
> 	if (len < 0)
> 		goto err;
> 	...
> err:
> 	if (errno != EAGAIN && errno != EWOULDBLOCK) {

But errno here should be EAGAIN, so yes, it looks buggy to me in the sense that:

> 		ret = -errno;
> 		tcp_rst(c, conn);
> 	}

we return 0 here without setting the STALLED flag.
While it should be fixed, that flag is some kind of optimisation, so
this doesn't really explain the issue that I mentioned in
20241022201914.072f7c7d@elisabeth:

  https://archives.passt.top/passt-dev/20241022201914.072f7c7d@elisabeth/

As a quick fix, you should probably do this in tcp_vu_data_from_sock():

	if (!peek_offset_cap)	/* add this condition */
		len -= already_sent;

	if (len <= 0 || (peek_offset_cap && len == -1 && errno == EAGAIN))
		/* change this condition */
		...

...or do you mean that, due to this behaviour, you don't call
vu_queue_rewind() and that causes trouble?

-- 
Stefano