On Fri, Oct 03, 2025 at 05:27:14PM +0200, Laurent Vivier wrote: > Move epoll_ctl() calls from sock_l4_sa() to the protocol-specific code > (icmp.c, pif.c, udp_flow.c) to give callers more control over epoll > registration. This allows sock_l4_sa() to focus solely on socket > creation and binding, while epoll management happens at a higher level. > > Remove the data parameter from sock_l4_sa() and flowside_sock_l4() as > it's no longer needed - callers now construct the full epoll_ref and > register the socket themselves after creation. I like this idea in principle. I've previously thought that doing the epoll registration in sock_l4_sa() was verging on a layering violation. However... [snip] > diff --git a/icmp.c b/icmp.c > index bd3108a21675..d6f0abe68269 100644 > --- a/icmp.c > +++ b/icmp.c > @@ -172,10 +172,11 @@ static struct icmp_ping_flow *icmp_ping_new(const struct ctx *c, > { > uint8_t proto = af == AF_INET ? IPPROTO_ICMP : IPPROTO_ICMPV6; > uint8_t flowtype = af == AF_INET ? FLOW_PING4 : FLOW_PING6; > - union epoll_ref ref = { .type = EPOLL_TYPE_PING }; > union flow *flow = flow_alloc(); > struct icmp_ping_flow *pingf; > const struct flowside *tgt; > + struct epoll_event ev; > + union epoll_ref ref; > > if (!flow) > return NULL; > @@ -196,9 +197,7 @@ static struct icmp_ping_flow *icmp_ping_new(const struct ctx *c, > > pingf->seq = -1; > > - ref.flowside = FLOW_SIDX(flow, TGTSIDE); > - pingf->sock = flowside_sock_l4(c, EPOLL_TYPE_PING, PIF_HOST, > - tgt, ref.data); > + pingf->sock = flowside_sock_l4(c, EPOLL_TYPE_PING, PIF_HOST, tgt); > > if (pingf->sock < 0) { > warn("Cannot open \"ping\" socket. You might need to:"); > @@ -210,6 +209,18 @@ static struct icmp_ping_flow *icmp_ping_new(const struct ctx *c, > if (pingf->sock > FD_REF_MAX) > goto cancel; > > + ref.type = EPOLL_TYPE_PING; > + ref.flowside = FLOW_SIDX(flow, TGTSIDE); > + ref.fd = pingf->sock; > + > + ev.events = EPOLLIN; > + ev.data.u64 = ref.u64; > + if (epoll_ctl(c->epollfd, EPOLL_CTL_ADD, pingf->sock, &ev) == -1) { > + warn_perror("L4 epoll_ctl"); > + close(pingf->sock); > + goto cancel; > + } > + ... this is uncomfortably bulky to have in every protocol. Could we maybe mitigate this with an epoll_add() helper of some sort that does at least some of the bit shuffling to construct the epoll data? -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson