Date: Mon, 19 Feb 2024 14:23:36 +1100
From: David Gibson
To: Stefano Brivio
Cc: passt-dev@passt.top
Subject: Re: [PATCH v2 16/22] tcp_splice: Improve error reporting on connect path
References: <20240206011734.884138-1-david@gibson.dropbear.id.au> <20240206011734.884138-17-david@gibson.dropbear.id.au> <20240218220124.1375ca39@elisabeth>
In-Reply-To: <20240218220124.1375ca39@elisabeth>
List-Id: Development discussion and patches for passt

On Sun, Feb 18, 2024 at 10:01:24PM +0100, Stefano Brivio wrote:
> On Tue, 6 Feb 2024 12:17:28 +1100
> David Gibson wrote:
>
> > This makes a number of changes to improve error reporting while connecting
> > a new spliced socket:
> >  * We use flow_err() and similar functions so all messages include info
> >    on which specific flow was affected
> >  * We use strerror() to interpret raw error values
> >  * We now report errors on connection (at "trace" level, since this would
> >    allow spamming the logs)
> >  * We also look up and report some details on EPOLLERR events, which can
> >    include connection errors, since we use a non-blocking connect().  Again
> >    we use "trace" level since this can spam the logs.
> >
> > Signed-off-by: David Gibson
> > ---
> >  tcp_splice.c | 24 ++++++++++++++++++++----
> >  1 file changed, 20 insertions(+), 4 deletions(-)
> >
> > diff --git a/tcp_splice.c b/tcp_splice.c
> > index 5ba9c8ea..49075e5c 100644
> > --- a/tcp_splice.c
> > +++ b/tcp_splice.c
> > @@ -349,8 +349,9 @@ static int tcp_splice_connect(const struct ctx *c, struct tcp_splice_conn *conn,
> >                  ASSERT(0);
> >
> >          if (conn->s[1] < 0) {
> > -                warn("Couldn't open connectable socket for splice (%d)",
> > -                     conn->s[1]);
> > +                flow_err(conn,
> > +                         "Couldn't open connectable socket for splice: %s",
> > +                         strerror(-conn->s[1]));
>
> It took me a bit to convince myself that we actually store negative
> error codes in pools, which sounds like a neat idea, except for two
> things:

Yeah, it took me a while to discover that too :)

> - in tcp_sock_refill_pool(), once we get an error from
>   tcp_conn_new_sock(), should we really continue to call it and try to
>   get more sockets?
>
>   Presumably, that will have something to do with some kind of resource
>   exhaustion, so perhaps we shouldn't risk making it worse and just
>   stop refilling the pool. If we do, we might have up to one negative
>   error code in the pool, I guess.

Some sort of exhaustion is certainly the most likely cause, yes.

> - if we have an error code in the pool, that error might have occurred
>   a while ago, but here, we're logging it at the moment we need that
>   socket.

Fair point.

> Maybe it would be simpler and saner to just have -1 values in the pool
> (error or no socket available) and report errors in tcp_conn_new_sock()
> instead?

Yeah, that makes sense.  I'll rework to that goal.

> I'm still reviewing patches 17/22 to 22/22, sorry for the delay.

--
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
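
For illustration only, the direction agreed above (plain -1 in the pool, the error reported where it happens, and no further refill attempts after the first failure) might look roughly like the sketch below. This is not the passt code: sketch_new_sock() and sketch_refill_pool() are hypothetical stand-ins for tcp_conn_new_sock() and tcp_sock_refill_pool(), POOL_SIZE is an arbitrary value, and the simulated EMFILE failure after three sockets is an assumption made just so the example is self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define POOL_SIZE 8	/* hypothetical; not passt's real pool size */

/* Simulation state (assumption): three sockets can be opened, then we
 * pretend the process ran out of file descriptors.
 */
static int sketch_budget = 3;
static int sketch_next_fd = 100;
static int sketch_errors_logged;

/* Hypothetical stand-in for tcp_conn_new_sock(): returns a fake fd, or
 * reports the error right away and returns a plain -1.
 */
static int sketch_new_sock(void)
{
	if (sketch_budget <= 0) {
		/* Log at the time of failure, not when the slot is used */
		fprintf(stderr, "Couldn't open socket: %s\n",
			strerror(EMFILE));
		sketch_errors_logged++;
		return -1;
	}

	sketch_budget--;
	return sketch_next_fd++;
}

/* Hypothetical stand-in for tcp_sock_refill_pool(): fill empty slots,
 * but stop at the first failure instead of retrying for every slot.
 */
static void sketch_refill_pool(int pool[], int n)
{
	int i;

	assert(pool && n >= 0);

	for (i = 0; i < n; i++) {
		if (pool[i] >= 0)
			continue;	/* slot already holds a socket */

		pool[i] = sketch_new_sock();
		if (pool[i] < 0)
			break;		/* likely exhaustion: don't make it worse */
	}
}
```

With every slot initialised to -1, one call to sketch_refill_pool() fills slots until the simulated failure, logs that failure once, and leaves the remaining slots at -1, so a later user of the pool only needs to check for a negative value and never has to interpret a stale error code.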