On Sun, Feb 18, 2024 at 10:01:24PM +0100, Stefano Brivio wrote: > On Tue, 6 Feb 2024 12:17:28 +1100 > David Gibson wrote: > > > This makes a number of changes to improve error reporting while connecting > > a new spliced socket: > > * We use flow_err() and similar functions so all messages include info > > on which specific flow was affected > > * We use strerror() to interpret raw error values > > * We now report errors on connection (at "trace" level, since this would > > allow spamming the logs) > > * We also look up and report some details on EPOLLERR events, which can > > include connection errors, since we use a non-blocking connect(). Again > > we use "trace" level since this can spam the logs. > > > > Signed-off-by: David Gibson > > --- > > tcp_splice.c | 24 ++++++++++++++++++++---- > > 1 file changed, 20 insertions(+), 4 deletions(-) > > > > diff --git a/tcp_splice.c b/tcp_splice.c > > index 5ba9c8ea..49075e5c 100644 > > --- a/tcp_splice.c > > +++ b/tcp_splice.c > > @@ -349,8 +349,9 @@ static int tcp_splice_connect(const struct ctx *c, struct tcp_splice_conn *conn, > > ASSERT(0); > > > > if (conn->s[1] < 0) { > > - warn("Couldn't open connectable socket for splice (%d)", > > - conn->s[1]); > > + flow_err(conn, > > + "Couldn't open connectable socket for splice: %s", > > + strerror(-conn->s[1])); > > It took me a bit to convince myself that we actually store negative > error codes in pools, which sounds like a neat idea, except for two > things: Yeah, it took me a while to discover that too :) > - in tcp_sock_refill_pool(), once we get an error from > tcp_conn_new_sock(), should we really continue to call it and try to > get more sockets? > > Presumably, that will have something to do with some kind of resource > exhaustion, so perhaps we shouldn't risk making it worse and just > stop refilling the pool. If we do, we might have up to one negative > error code in the pool, I guess. Some sort of exhaustion is certainly the most likely cause, yes. > - if we have an error code in the pool, that error might have occurred > a while ago, but here, we're logging it at the moment we need that > socket. Fair point. > Maybe it would be simpler and saner to just have -1 values in the pool > (error or no socket available) and report errors in tcp_conn_new_sock() > instead? Yeah, that makes sense. I'll rework to that goal. > I'm still reviewing patches 17/22 to 22/22, sorry for the delay. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson