From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by passt.top (Postfix) with ESMTPS id 698BD5A9D25 for ; Thu, 9 May 2024 02:28:33 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202312; t=1715214509; bh=SiHKdTszm2nUbrGkTJTMlXFN/fdot4rCuB7uLc8Nn1c=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=HaauXuEWh3gPBeQTRBvTgCZZbTbGpDvu/WA2p+10mhugG+2A23RSIY7kgTard573P CUsQPnOwi0oX9NaydOr64ClK0lTqf3S6x7+m56JF0LrtMmwQNJMtYGIjXXOMhDZE6m zSsrHTXcWv8GbyqR7XpWUvsdBDs58SVIAqJY4XprzWypg3RwXZv1LO+2V1xIlxk5v/ CIKnuwNObf/MYOqxgKpMk4KiLVkts0UfZQUJ1HIjyITM6dxouCO4l4EQ332xT+ROS9 doYqICsZFEYdTNVQPLZKTSxsSknp5pCjnperNwpR/1GcYDfBEOltLbmrf2/b0dJOJT hcj2AWba9Vwcg== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4VZXtP1J7Zz4x0K; Thu, 9 May 2024 10:28:29 +1000 (AEST) Date: Thu, 9 May 2024 10:28:22 +1000 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH] tcp_splice: Set OUT_WAIT_ flag whenever pipe isn't emptied Message-ID: References: <20240508090338.2735208-1-sbrivio@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="Yyz2oIwN1bRqKJo8" Content-Disposition: inline In-Reply-To: <20240508090338.2735208-1-sbrivio@redhat.com> Message-ID-Hash: MK52KFQ2VN3OC23AP6XLJEKIK2LNAGBU X-Message-ID-Hash: MK52KFQ2VN3OC23AP6XLJEKIK2LNAGBU X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --Yyz2oIwN1bRqKJo8 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, May 08, 2024 at 11:03:38AM +0200, Stefano Brivio wrote: > In tcp_splice_sock_handler(), if we get EAGAIN on the second splice(), > from pipe to receiving socket, that doesn't necessarily mean that the > pipe is empty: the receiver buffer might be full instead. >=20 > Hence, we can't use the 'never_read' flag to decide that there's > nothing to wait for: even if we didn't read anything from the sending > side in a given iteration, we might still have data to send in the > pipe. Use read/written counters, instead. >=20 > This fixes an issue where large bulk transfers would occasionally > hang. From a corresponding strace: >=20 > 0.000061 epoll_wait(4, [{events=3DEPOLLOUT, data=3D{u32=3D29442, u64= =3D12884931330}}], 8, 1000) =3D 1 > 0.005003 epoll_ctl(4, EPOLL_CTL_MOD, 211, {events=3DEPOLLIN|EPOLLRDH= UP, data=3D{u32=3D54018, u64=3D8589988610}}) =3D 0 > 0.000089 epoll_ctl(4, EPOLL_CTL_MOD, 115, {events=3DEPOLLIN|EPOLLRDH= UP, data=3D{u32=3D29442, u64=3D12884931330}}) =3D 0 > 0.000081 splice(211, NULL, 151, NULL, 1048576, SPLICE_F_MOVE|SPLICE_= F_NONBLOCK) =3D -1 EAGAIN (Resource temporarily unavailable) > 0.000073 splice(150, NULL, 115, NULL, 1048576, SPLICE_F_MOVE|SPLICE_= F_NONBLOCK) =3D 1048576 > 0.000087 splice(211, NULL, 151, NULL, 1048576, SPLICE_F_MOVE|SPLICE_= F_NONBLOCK) =3D -1 EAGAIN (Resource temporarily unavailable) > 0.000045 splice(150, NULL, 115, NULL, 1048576, SPLICE_F_MOVE|SPLICE_= F_NONBLOCK) =3D 520415 > 0.000060 splice(211, NULL, 151, NULL, 1048576, SPLICE_F_MOVE|SPLICE_= F_NONBLOCK) =3D -1 EAGAIN (Resource temporarily unavailable) > 0.000044 splice(150, NULL, 115, NULL, 1048576, SPLICE_F_MOVE|SPLICE_= F_NONBLOCK) =3D -1 EAGAIN (Resource temporarily unavailable) > 0.000044 epoll_wait(4, [], 8, 1000) =3D 0 >=20 > we're reading from socket 211 into to the pipe end numbered 151, > which connects to pipe end 150, and from there we're writing into > socket 115. >=20 > We initially drop EPOLLOUT from the set of monitored flags for socket > 115, because it already signaled it's ready for output. Then we read > nothing from socket 211 (the sender had nothing to send), and we keep > emptying the pipe into socket 115 (first 1048576 bytes, then 520415 > bytes). >=20 > This call of tcp_splice_sock_handler() ends with EAGAIN on the writing > side, and we just exit this function without setting the OUT_WAIT_1 > flag (and, in turn, EPOLLOUT for socket 115). However, it turns out, > the pipe wasn't actually emptied, and while socket 211 had nothing > more to send, we should have waited on socket 115 to be ready for > output again. >=20 > As a further step, we could consider not clearing EPOLLOUT at all, > unless the read/written counters match, but I'm first trying to fix > this ugly issue with a minimal patch. >=20 > Link: https://github.com/containers/podman/issues/22575 > Link: https://github.com/containers/podman/issues/22593 > Signed-off-by: Stefano Brivio Reviewed-by: David Gibson > --- > tcp_splice.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) >=20 > diff --git a/tcp_splice.c b/tcp_splice.c > index 42b7be0..4c36b72 100644 > --- a/tcp_splice.c > +++ b/tcp_splice.c > @@ -616,7 +616,7 @@ eintr: > if (errno !=3D EAGAIN) > goto close; > =20 > - if (never_read) > + if (conn->read[fromside] =3D=3D conn->written[fromside]) > break; > =20 > conn_event(c, conn, --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --Yyz2oIwN1bRqKJo8 Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmY8GJkACgkQzQJF27ox 2Ge+/A/7BYCCb+1pwadKZp55I69ug8+9Da0VavBQSPu+/LMhRcw5NqG31ua7xHUL rRTxQ6lTNMV0YQZ4ahpkquEyrrm8OnPDORP3iltq04220afMild8r6iH1ZVzYzG4 aSoFSzgKDsBOYp4BVy07JMMBYw5i9XtMNHlxnOHTo+qL9B6BdWGa/eSa/saJIFGz EMuKp9FqoezDXD+QiA52A16dcvYmGfZc4vd0KZ/c97cgm2KRmC5IwFLDA7puQwfI qoPAsR9vzRxcb3QCeQjwwrT7j3RB7F/RBksSBGd9T3k5XV+B2fkuM6Z/qrHA2Z7M 3Qj/ATF6m8Yi1PG4VhI+x8Qi+qwsasMJGl1IvNw4Z/rAzIoUIe/HcoB9Pq+ZKTvO xl3Rw6dqpL/vO+45vS401j8uSBQkxekAePynlVrAJx3TAXvrJ60CXQRpVnY0FpjF QS8cVJ/+CQjlcxU7fUhqDBi3BSwxyRBHxMUIa1TcZIiyzRm7Gq41kshBsirBsq2o /5zH213jxZfOzHPN58w6VXbSsayW7zupgo16A/+la7zy0JlIW9dm43puJH+Md3/L S/u/Vfgo+2kKREN1blnfYDzCvmZ31X/M5k63lpNeG6/6n8DqTpV2YWgDQGUgcm+c loH9dlEClRi4HyeAGpfat2uqrPQ3nwvXowAYj/lJFp3YyvQZhig= =I/Wb -----END PGP SIGNATURE----- --Yyz2oIwN1bRqKJo8--