On Wed, Jun 17, 2026 at 01:09:25AM +0200, Stefano Brivio wrote: > On Tue, 9 Jun 2026 12:32:26 +1000 > David Gibson wrote: > > > Most of out flow specific log messages are debug level for fear of flooding > > the logs, even when they report real error conditions that might be off > > of > > > significance. > > > > Now that we have the mechanisms for log message rate limiting, we can do > > better. Promote many flow related messages to warning or error level, with > > rate limiting. While we're there add ratelimiting to a handful of existing > > warning or error level messages. > > (Cc'ing Anshu) > > > They general heuristic is to promote messages that report a failure which > > The > > > is not something that should be triggered by the guest doing something > > weird. This mostly means failures from socket operations we expect to be > > legitimate. > > It all makes sense to me, just two nits below: > > > Adding the ratelimiting means plumbing the 'now' timestamp through much > > more of the code, hence the large churn. > > > > Signed-off-by: David Gibson [snip] > > @@ -593,7 +605,8 @@ void tcp_splice_sock_handler(struct ctx *c, union epoll_ref ref, > > > > rc = getsockopt(ref.fd, SOL_SOCKET, SO_ERROR, &err, &sl); > > if (rc) > > - flow_perror(conn, "Error retrieving SO_ERROR"); > > + flow_perror_ratelimit(conn, now, > > + "Error retrieving SO_ERROR"); > > Shouldn't this be flow_warn_ratelimit(), more or less in line with what > udp_sock_recverr() does, That wouldn't be in line with udp_sock_recverr(). udp_sock_recverr() uses a warn() when it can't propagate a specific error via ICMP. If it fails to read the errors at all, that's an err_perror(). (Should be flow_err_ratelimit(), I'll add a patch fixing that). > because we're not aborting any operation or > any connection, we're simply failing to read out errors which might not > exist for whatever reason? The severity here isn't so much because of the consequences of the error, but because failing to even read the error state is a very unexpected condition. The kernel should generally let us do that. So, this is something going wrong strictly on the host, rather than anything triggered by the guest. In that sense it's _more_ serious than the flow being killed because the guest tried to contact a peer that didn't exist or whatever. [snip] > > @@ -92,8 +95,9 @@ static int udp_flow_sock(const struct ctx *c, > > epoll_del(flow_epollfd(&uflow->f), s); > > close(s); > > > > - flow_dbg(uflow, "Couldn't connect flow socket: %s", > > - strerror_(-rc)); > > + flow_warn_ratelimit(uflow, now, > > + "Couldn't connect flow socket: %s", > > + strerror_(-rc)); > > This should eventually cause an error message in udp_sock_fwd(), but > regardless of that, for consistency, as we're actually throwing packets > away in this case, shouldn't this be flow_perror_ratelimit()? I'd tend towards no, because this can at least partly be triggered by the guest. We were getting this very error with an EADDRNOTAVAIL in https://github.com/podman-container-tools/podman/issues/23739 because the guest was trying to do a multicast that the host could find a source address for. Not one of the listed reasons for EADDRNOTAVAIL in connect(2), but there you go. I suspect you'll also get an error here if the guest attempts to connect to an address that's not routable from the host. > > > return rc; > > } > > uflow->s[sidei] = s; > > @@ -154,7 +158,7 @@ static flow_sidx_t udp_flow_new(const struct ctx *c, union flow *flow, > > > > flow_foreach_sidei(sidei) { > > if (pif_is_socket(uflow->f.pif[sidei])) > > - if (udp_flow_sock(c, uflow, sidei) < 0) > > + if (udp_flow_sock(c, uflow, sidei, now) < 0) > > goto cancel; > > } > > > > @@ -176,7 +180,7 @@ static flow_sidx_t udp_flow_new(const struct ctx *c, union flow *flow, > > goto cancel; > > } > > if (port != tgt->oport) { > > - flow_err(uflow, "Unexpected local port"); > > + flow_err_ratelimit(uflow, now, "Unexpected local port"); > > goto cancel; > > } > > } > > @@ -248,7 +252,8 @@ flow_sidx_t udp_flow_from_sock(const struct ctx *c, uint8_t pif, > > * been initiated from a socket bound to 0.0.0.0 or ::, we don't > > * know our address, so we have to leave it unpopulated. > > */ > > - flow_err(flow, "Invalid endpoint on UDP recvfrom()"); > > + flow_err_ratelimit(flow, now, > > + "Invalid endpoint on UDP recvfrom()"); > > flow_alloc_cancel(flow); > > return FLOW_SIDX_NONE; > > } > > -- > Stefano > -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson