From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202512 header.b=NtlHoc1O; dkim-atps=neutral Received: from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by passt.top (Postfix) with ESMTPS id 571E65A0624 for ; Fri, 26 Dec 2025 06:34:00 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202512; t=1766727236; bh=vRx7hrYZbsqMVGLxMAnqKCUFkLoHeFXMoMqw3N7oSWA=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=NtlHoc1ORZJ9EZNU0OiUZkgospuJNugwYqE3sKrKfsqAIbmsNw6HpaoGP2KYBOYj/ HYam4EEjjHYCJA1ynQlKjRI0GMxlVz3h3eOVBWRnwrhwt2t8o+NYEHGwjH5cZ9D/Dg 0+kKdCy6l7014bfAnE8VE+UXsGxULdf+j+xFWoe0hC68gXsI2SeHGpytds6MtberfU YF4cVvZHPi844eoHA2FdTgxuNCfi+OE+xQvj5lN+FH1zYpfwsdVqjzsevVFNPkHg22 /QSSsrIEiKFgZvTvgruKrsKzKyIjgd8lXhUHE8Jj7/2pWYO/TAXDMGclQrUSH2jAnZ iRLLxDkvXNy+w== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4dcvRm09bRz4wQG; Fri, 26 Dec 2025 16:33:56 +1100 (AEDT) Date: Fri, 26 Dec 2025 16:33:44 +1100 From: David Gibson To: Laurent Vivier Subject: Re: [PATCH 4/7] flow: Introduce flow_epoll_set() to centralize epoll operations Message-ID: References: <20251219164518.930012-1-lvivier@redhat.com> <20251219164518.930012-5-lvivier@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="br8hyx7wp+An/eAj" Content-Disposition: inline In-Reply-To: <20251219164518.930012-5-lvivier@redhat.com> Message-ID-Hash: AN3HCBIUI6MTJJKMEXAKCBQLYSSYZABM X-Message-ID-Hash: AN3HCBIUI6MTJJKMEXAKCBQLYSSYZABM X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --br8hyx7wp+An/eAj Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Dec 19, 2025 at 05:45:15PM +0100, Laurent Vivier wrote: > Currently, each flow type (TCP, TCP_SPLICE, PING, UDP) has its own > code to add or modify file descriptors in epoll. This leads to > duplicated boilerplate code across icmp.c, tcp.c, tcp_splice.c, and > udp_flow.c, each setting up epoll_ref unions and calling epoll_ctl() > with flow-type-specific details. >=20 > Introduce flow_epoll_set() in flow.c to handle epoll operations for > all flow types in a unified way. The function takes an explicit epoll > type parameter, allowing it to handle not only flow socket types but > also the TCP timer (EPOLL_TYPE_TCP_TIMER). >=20 > This will be needed to migrate queue pair from an epollfd to another. >=20 > Signed-off-by: Laurent Vivier Love the concept. Couple of warts in execution. > --- > flow.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++ > flow.h | 3 +++ > icmp.c | 9 ++----- > tcp.c | 39 +++++++++-------------------- > tcp_splice.c | 27 +++++++------------- > udp_flow.c | 12 +-------- > 6 files changed, 97 insertions(+), 63 deletions(-) >=20 > diff --git a/flow.c b/flow.c > index 4f53486586cd..9c98102e3616 100644 > --- a/flow.c > +++ b/flow.c > @@ -20,6 +20,7 @@ > #include "flow.h" > #include "flow_table.h" > #include "repair.h" > +#include "epoll_ctl.h" > =20 > const char *flow_state_str[] =3D { > [FLOW_STATE_FREE] =3D "FREE", > @@ -390,6 +391,75 @@ void flow_epollid_clear(struct flow_common *f) > f->epollid =3D EPOLLFD_ID_INVALID; > } > =20 > +/** > + * flow_epoll_set() - Add or modify epoll registration for a flow socket > + * @c: Execution context > + * @type: epoll type > + * @f: Flow to register socket for > + * @events: epoll events to watch for > + * @fd: File descriptor to register > + * @sidei: Side index of the flow > + * > + * Return: 0 on success, -1 on error (from epoll_ctl()) > + */ > +int flow_epoll_set(const struct ctx *c, enum epoll_type type, > + const struct flow_common *f, uint32_t events, int fd, > + unsigned int sidei) > +{ > + struct epoll_event ev; > + union epoll_ref ref; > + int epollfd; > + int m; > + > + ref.fd =3D fd; > + ref.type =3D type; > + > + switch (type) { > + case EPOLL_TYPE_TCP_SPLICE: > + case EPOLL_TYPE_TCP: > + ref.flowside =3D flow_sidx(f, sidei); > + if (flow_in_epoll(f)) { > + m =3D EPOLL_CTL_MOD; > + epollfd =3D flow_epollfd(f); > + } else { > + m =3D EPOLL_CTL_ADD; > + epollfd =3D c->epollfd; > + } It looks like the handling of flow_in_epoll() / flow_epoll() should be able to be common between all these epoll types. I'm guessing that doesn't quite work, because flow_in_epoll() isn't yet updated properly for all the types. Might this be cleaner if that was fixed first? > + break; > + case EPOLL_TYPE_TCP_TIMER: { > + const struct tcp_tap_conn *conn =3D &((union flow *)f)->tcp; > + > + ref.flow =3D flow_idx(f); > + m =3D conn->timer =3D=3D -1 ? EPOLL_CTL_ADD : EPOLL_CTL_MOD; > + epollfd =3D flow_epollfd(f); > + break; > + } > + case EPOLL_TYPE_PING: > + ref.flowside =3D flow_sidx(f, sidei); > + m =3D EPOLL_CTL_ADD; > + epollfd =3D flow_epollfd(f); > + break; That would allow this case to be the same as the TCP ones. > + case EPOLL_TYPE_UDP: { > + union { > + flow_sidx_t sidx; > + uint32_t data; > + } fref =3D { .sidx =3D flow_sidx(f, sidei) }; I'm not sure quite what this union is about. EPOLL_TYPE_UDP uses epoll_ref.flowside, same as the non-timer TCP types. epoll_ref.udp is only for udp EPOLL_TYPE_UDP_LISTEN. Arguably we should change the name to reflect that, but I'm working towards unifying the epoll_ref variants for the various listening sockets anyway. > + > + ref.data =3D fref.data; > + m =3D EPOLL_CTL_ADD; > + epollfd =3D flow_epollfd(f); Given the above, this should be identical to the EPOLL_TYPE_PING case already, and apart from the EPOLL_CTL_ADD handling, it's identical to the non-timer TCP cases. > + break; > + } > + default: > + ASSERT(0); > + } > + > + ev.events =3D events; > + ev.data.u64 =3D ref.u64; > + > + return epoll_ctl(epollfd, m, fd, &ev); > +} > + > /** > * flow_epollid_register() - Initialize the epoll id -> fd mapping > * @epollid: epoll id to associate to > diff --git a/flow.h b/flow.h > index b43b0b1dd7f2..388fab124238 100644 > --- a/flow.h > +++ b/flow.h > @@ -265,6 +265,9 @@ bool flow_in_epoll(const struct flow_common *f); > int flow_epollfd(const struct flow_common *f); > void flow_epollid_set(struct flow_common *f, int epollid); > void flow_epollid_clear(struct flow_common *f); > +int flow_epoll_set(const struct ctx *c, enum epoll_type type, > + const struct flow_common *f, uint32_t events, int fd, > + unsigned int sidei); > void flow_epollid_register(int epollid, int epollfd); > void flow_defer_handler(const struct ctx *c, const struct timespec *now); > int flow_migrate_source_early(struct ctx *c, const struct migrate_stage = *stage, > diff --git a/icmp.c b/icmp.c > index 9564c4963f7b..235759567060 100644 > --- a/icmp.c > +++ b/icmp.c > @@ -177,7 +177,6 @@ static struct icmp_ping_flow *icmp_ping_new(const str= uct ctx *c, > union flow *flow =3D flow_alloc(); > struct icmp_ping_flow *pingf; > const struct flowside *tgt; > - union epoll_ref ref; > =20 > if (!flow) > return NULL; > @@ -211,12 +210,8 @@ static struct icmp_ping_flow *icmp_ping_new(const st= ruct ctx *c, > goto cancel; > =20 > flow_epollid_set(&pingf->f, EPOLLFD_ID_DEFAULT); > - > - ref.type =3D EPOLL_TYPE_PING; > - ref.flowside =3D FLOW_SIDX(flow, TGTSIDE); > - ref.fd =3D pingf->sock; > - > - if (epoll_add(flow_epollfd(&pingf->f), EPOLLIN, ref) < 0) { > + if (flow_epoll_set(c, EPOLL_TYPE_PING, &pingf->f, EPOLLIN, pingf->sock, > + TGTSIDE) < 0) { > close(pingf->sock); > goto cancel; > } > diff --git a/tcp.c b/tcp.c > index 6e6156098c12..2907af282551 100644 > --- a/tcp.c > +++ b/tcp.c > @@ -530,14 +530,10 @@ static uint32_t tcp_conn_epoll_events(uint8_t event= s, uint8_t conn_flags) > */ > static int tcp_epoll_ctl(const struct ctx *c, struct tcp_tap_conn *conn) > { > - int m =3D flow_in_epoll(&conn->f) ? EPOLL_CTL_MOD : EPOLL_CTL_ADD; > - union epoll_ref ref =3D { .type =3D EPOLL_TYPE_TCP, .fd =3D conn->sock, > - .flowside =3D FLOW_SIDX(conn, !TAPSIDE(conn)), }; > - struct epoll_event ev =3D { .data.u64 =3D ref.u64 }; > - int epollfd =3D flow_in_epoll(&conn->f) ? flow_epollfd(&conn->f) > - : c->epollfd; > + uint32_t events; > =20 > if (conn->events =3D=3D CLOSED) { > + int epollfd =3D flow_in_epoll(&conn->f) ? flow_epollfd(&conn->f) : c->= epollfd; > if (flow_in_epoll(&conn->f)) > epoll_del(epollfd, conn->sock); > if (conn->timer !=3D -1) > @@ -545,22 +541,17 @@ static int tcp_epoll_ctl(const struct ctx *c, struc= t tcp_tap_conn *conn) > return 0; > } > =20 > - ev.events =3D tcp_conn_epoll_events(conn->events, conn->flags); > + events =3D tcp_conn_epoll_events(conn->events, conn->flags); > =20 > - if (epoll_ctl(epollfd, m, conn->sock, &ev)) > + if (flow_epoll_set(c, EPOLL_TYPE_TCP, &conn->f, events, conn->sock, > + !TAPSIDE(conn))) > return -errno; > =20 > flow_epollid_set(&conn->f, EPOLLFD_ID_DEFAULT); > =20 > if (conn->timer !=3D -1) { > - union epoll_ref ref_t =3D { .type =3D EPOLL_TYPE_TCP_TIMER, > - .fd =3D conn->timer, > - .flow =3D FLOW_IDX(conn) }; > - struct epoll_event ev_t =3D { .data.u64 =3D ref_t.u64, > - .events =3D EPOLLIN | EPOLLET }; > - > - if (epoll_ctl(flow_epollfd(&conn->f), EPOLL_CTL_MOD, > - conn->timer, &ev_t)) > + if (flow_epoll_set(c, EPOLL_TYPE_TCP_TIMER, &conn->f, > + EPOLLIN | EPOLLET, conn->timer, 0)) > return -errno; > } > =20 > @@ -581,11 +572,6 @@ static void tcp_timer_ctl(const struct ctx *c, struc= t tcp_tap_conn *conn) > return; > =20 > if (conn->timer =3D=3D -1) { > - union epoll_ref ref =3D { .type =3D EPOLL_TYPE_TCP_TIMER, > - .flow =3D FLOW_IDX(conn) }; > - struct epoll_event ev =3D { .data.u64 =3D ref.u64, > - .events =3D EPOLLIN | EPOLLET }; > - int epollfd =3D flow_epollfd(&conn->f); > int fd; > =20 > fd =3D timerfd_create(CLOCK_MONOTONIC, 0); > @@ -593,18 +579,17 @@ static void tcp_timer_ctl(const struct ctx *c, stru= ct tcp_tap_conn *conn) > flow_dbg_perror(conn, "failed to get timer"); > if (fd > -1) > close(fd); > - conn->timer =3D -1; > return; > } > - conn->timer =3D fd; > - ref.fd =3D conn->timer; > =20 > - if (epoll_ctl(epollfd, EPOLL_CTL_ADD, conn->timer, &ev)) { > + if (flow_epoll_set(c, EPOLL_TYPE_TCP_TIMER, &conn->f, > + EPOLLIN | EPOLLET, fd, 0)) { > flow_dbg_perror(conn, "failed to add timer"); > - close(conn->timer); > - conn->timer =3D -1; > + close(fd); > return; > } > + > + conn->timer =3D fd; > } > =20 > if (conn->flags & ACK_TO_TAP_DUE) { > diff --git a/tcp_splice.c b/tcp_splice.c > index bf4ff466de07..7367f435367b 100644 > --- a/tcp_splice.c > +++ b/tcp_splice.c > @@ -143,24 +143,15 @@ static uint32_t tcp_splice_conn_epoll_events(uint16= _t events, unsigned sidei) > static int tcp_splice_epoll_ctl(const struct ctx *c, > struct tcp_splice_conn *conn) > { > - int epollfd =3D flow_in_epoll(&conn->f) ? flow_epollfd(&conn->f) > - : c->epollfd; > - int m =3D flow_in_epoll(&conn->f) ? EPOLL_CTL_MOD : EPOLL_CTL_ADD; > - const union epoll_ref ref[SIDES] =3D { > - { .type =3D EPOLL_TYPE_TCP_SPLICE, .fd =3D conn->s[0], > - .flowside =3D FLOW_SIDX(conn, 0) }, > - { .type =3D EPOLL_TYPE_TCP_SPLICE, .fd =3D conn->s[1], > - .flowside =3D FLOW_SIDX(conn, 1) } > - }; > - struct epoll_event ev[SIDES] =3D { { .data.u64 =3D ref[0].u64 }, > - { .data.u64 =3D ref[1].u64 } }; > - > - ev[0].events =3D tcp_splice_conn_epoll_events(conn->events, 0); > - ev[1].events =3D tcp_splice_conn_epoll_events(conn->events, 1); > - > - > - if (epoll_ctl(epollfd, m, conn->s[0], &ev[0]) || > - epoll_ctl(epollfd, m, conn->s[1], &ev[1])) { > + uint32_t events[2]; > + > + events[0] =3D tcp_splice_conn_epoll_events(conn->events, 0); > + events[1] =3D tcp_splice_conn_epoll_events(conn->events, 1); > + > + if (flow_epoll_set(c, EPOLL_TYPE_TCP_SPLICE, &conn->f, events[0], > + conn->s[0], 0) || > + flow_epoll_set(c, EPOLL_TYPE_TCP_SPLICE, &conn->f, events[1], > + conn->s[1], 1)) { > int ret =3D -errno; > flow_perror(conn, "ERROR on epoll_ctl()"); > return ret; > diff --git a/udp_flow.c b/udp_flow.c > index 33f29f21e69e..42f3622d621c 100644 > --- a/udp_flow.c > +++ b/udp_flow.c > @@ -74,11 +74,6 @@ static int udp_flow_sock(const struct ctx *c, > { > const struct flowside *side =3D &uflow->f.side[sidei]; > uint8_t pif =3D uflow->f.pif[sidei]; > - union { > - flow_sidx_t sidx; > - uint32_t data; > - } fref =3D { .sidx =3D FLOW_SIDX(uflow, sidei) }; > - union epoll_ref ref; > int rc; > int s; > =20 > @@ -88,13 +83,8 @@ static int udp_flow_sock(const struct ctx *c, > return s; > } > =20 > - ref.type =3D EPOLL_TYPE_UDP; > - ref.data =3D fref.data; > - ref.fd =3D s; > - > flow_epollid_set(&uflow->f, EPOLLFD_ID_DEFAULT); > - > - rc =3D epoll_add(flow_epollfd(&uflow->f), EPOLLIN, ref); > + rc =3D flow_epoll_set(c, EPOLL_TYPE_UDP, &uflow->f, EPOLLIN, s, sidei); > if (rc < 0) { > close(s); > return rc; > --=20 > 2.51.1 >=20 --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --br8hyx7wp+An/eAj Content-Type: application/pgp-signature; name=signature.asc -----BEGIN PGP SIGNATURE----- iQIzBAEBCgAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmlOHh0ACgkQzQJF27ox 2Gc8Dg/+LEKdkWFGMMsytGYkWPDCeK3+1kihCPHLqlcbD0gVGOV0CY170gjkOT0N u4Z41SCBFMQhMgv4sgfUlbiISCmY6yxeN/IOD2NQPYGG0K1hojgN7GLhoRbv0Gj1 h5teRMWPxtk4ryaOMRWpODUuOj/Ro3AVPL576JM4b8iU2yXrU2ucDj8RB3L9/h5w s26eoGuZ5h+FpP+NZ9Q1WcHY+w60tgNF7tNf6dAWDhoyENOb8F8BCrTzfXtgc1dL gumHhv6snYBjDPga3E6CVxyLB8UvSMXokbrEpR/ocyrIm+zRaRYXHvrcAe5dJTOS PlFGoB+irOjVX+OWB+6UwCi2wmdJVEAwSBBF1HNRl2TvMjtKB+XYt/7TE4YXYwXG Z7Lo1iLOG+lQ31c71Unxd8Ior1P4vYKZIcHMwmoe/9/KVieE+lP+mYTPnVHUOTde 9vHlBVs2MIzXGN2ehBmr0BdzEtyybidcDe6f4+utwtIGOEVNwNpPcBkqfl45I2iD TupEJ6rCMcHMyHLCwWM2BduowdHV9gZ9c5uPnapNX900BTWRvlekalxZL+G7gJhe 41qE/CmcszQMJsgvxF4vknkp4hXKlKwdcCauExQsRVQLPLAgtIxk/9nr/z0IdvwS mvbQZdSbh9K4JsjHbzNdVGrfMunXoQo0vOo8ksk8PefaKr6p2yc= =WzPs -----END PGP SIGNATURE----- --br8hyx7wp+An/eAj--