* [PATCH v2 0/5] flow: Introduce flow_epoll_set() to centralize epoll operations
@ 2026-01-08 14:01 Laurent Vivier
2026-01-08 14:01 ` [PATCH v2 1/5] tcp: remove timer update in tcp_epoll_ctl() Laurent Vivier
` (4 more replies)
0 siblings, 5 replies; 10+ messages in thread
From: Laurent Vivier @ 2026-01-08 14:01 UTC (permalink / raw)
To: passt-dev; +Cc: Laurent Vivier
Currently, each protocol handler (TCP, TCP splice, ICMP/ping, UDP) has its
own code to add or modify file descriptors in epoll. This leads to
duplicated boilerplate across icmp.c, tcp.c, tcp_splice.c, and udp_flow.c,
with each setting up epoll_ref unions and calling epoll_ctl() with
flow-type-specific details.
This series introduces flow_epoll_set() in flow.c to handle epoll
operations for all flow types in a unified way. The function provides
a simpler interface where the caller passes the command, type, events,
and file descriptor, and flow_epoll_set() handles the epoll_ref setup
and epoll_ctl() call.
The API is:
int flow_epoll_set(int command, enum epoll_type type,
const struct flow_common *f, uint32_t events,
int fd, unsigned int sidei);
This centralized approach will be essential for queue pair migration in
the upcoming multithreading work, where flows need to be moved between
different epoll instances owned by different threads.
Preparatory patches:
- Patch 1 removes dead timer update code in tcp_epoll_ctl()
- Patch 2 removes unneeded epoll_ref indirection in udp_flow
- Patch 3 refactors udp_flow_sock() to assign the socket internally
- Patch 4 refactors tcp_splice_conn_epoll_events() to per-side computation
Core patch:
- Patch 5 introduces flow_epoll_set() used by all protocol handlers
Changes in v2:
- Added patch 1: Remove dead timer update in tcp_epoll_ctl() since flow
table compaction was eliminated
- Added patch 2: Remove unnecessary epoll_ref indirection in udp_flow.c
- Dropped v1 patch 6 (compute events inside flow_epoll_set())
- Dropped v1 patch 7 (retrieve fd from flow structure)
- Added epoll_ctl() command parameter (EPOLL_CTL_ADD/EPOLL_CTL_MOD)
- Callers must set epollfd via flow_epollid_set() before calling
flow_epoll_set()
- Removed FLOW_EPOLL_TIMER_ADD/FLOW_EPOLL_TIMER_MOD constants
- Reduced series from 7 to 5 patches
Laurent Vivier (5):
tcp: remove timer update in tcp_epoll_ctl()
udp_flow: remove unneeded epoll_ref indirection
udp_flow: Assign socket to flow inside udp_flow_sock()
tcp_splice: Refactor tcp_splice_conn_epoll_events() to per-side
computation
flow: Introduce flow_epoll_set() to centralize epoll operations
flow.c | 33 +++++++++++++++
flow.h | 3 ++
icmp.c | 11 ++---
tcp.c | 72 ++++++++++++++------------------
tcp_splice.c | 115 ++++++++++++++++++++++++---------------------------
udp_flow.c | 19 +++------
6 files changed, 131 insertions(+), 122 deletions(-)
--
2.52.0
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 1/5] tcp: remove timer update in tcp_epoll_ctl()
2026-01-08 14:01 [PATCH v2 0/5] flow: Introduce flow_epoll_set() to centralize epoll operations Laurent Vivier
@ 2026-01-08 14:01 ` Laurent Vivier
2026-01-08 23:26 ` David Gibson
2026-01-08 14:01 ` [PATCH v2 2/5] udp_flow: remove unneeded epoll_ref indirection Laurent Vivier
` (3 subsequent siblings)
4 siblings, 1 reply; 10+ messages in thread
From: Laurent Vivier @ 2026-01-08 14:01 UTC (permalink / raw)
To: passt-dev; +Cc: Laurent Vivier
Timer update in tcp_epoll_ctl() has been added by
bb708111833e ("treewide: Packet abstraction with mandatory boundary checks")
because epoll_ref stores "conn - tc" that can change in tcp_table_compact().
But since
e2e8219f13b8 ("flow, tcp: Consolidate flow pointer<->index helpers")
we use FLOW_IDX() and since
8981a720aac4 ("flow: Avoid moving flow entries to compact table")
flow table doesn't use a compaction procedure so FLOW_IDX() never changes.
Updating the timer event is now a no-op, remove it from tcp_epoll_ctl().
Fixes: 8981a720aac4 ("flow: Avoid moving flow entries to compact table")
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
tcp.c | 12 ------------
1 file changed, 12 deletions(-)
diff --git a/tcp.c b/tcp.c
index e7fa85f346b2..5141cdc7e839 100644
--- a/tcp.c
+++ b/tcp.c
@@ -552,18 +552,6 @@ static int tcp_epoll_ctl(const struct ctx *c, struct tcp_tap_conn *conn)
flow_epollid_set(&conn->f, EPOLLFD_ID_DEFAULT);
- if (conn->timer != -1) {
- union epoll_ref ref_t = { .type = EPOLL_TYPE_TCP_TIMER,
- .fd = conn->timer,
- .flow = FLOW_IDX(conn) };
- struct epoll_event ev_t = { .data.u64 = ref_t.u64,
- .events = EPOLLIN | EPOLLET };
-
- if (epoll_ctl(flow_epollfd(&conn->f), EPOLL_CTL_MOD,
- conn->timer, &ev_t))
- return -errno;
- }
-
return 0;
}
--
2.52.0
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 2/5] udp_flow: remove unneeded epoll_ref indirection
2026-01-08 14:01 [PATCH v2 0/5] flow: Introduce flow_epoll_set() to centralize epoll operations Laurent Vivier
2026-01-08 14:01 ` [PATCH v2 1/5] tcp: remove timer update in tcp_epoll_ctl() Laurent Vivier
@ 2026-01-08 14:01 ` Laurent Vivier
2026-01-08 23:26 ` David Gibson
2026-01-08 14:01 ` [PATCH v2 3/5] udp_flow: Assign socket to flow inside udp_flow_sock() Laurent Vivier
` (2 subsequent siblings)
4 siblings, 1 reply; 10+ messages in thread
From: Laurent Vivier @ 2026-01-08 14:01 UTC (permalink / raw)
To: passt-dev; +Cc: Laurent Vivier
The fref union was used to convert flow_sidx_t to uint32_t for
assignment to ref.data. This is unnecessary since epoll_ref already
contains a flowside member of type flow_sidx_t, so we can assign
directly.
This aligns with how icmp.c and other callers assign flow_sidx_t to
epoll_ref.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
udp_flow.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/udp_flow.c b/udp_flow.c
index 8907f2f72741..0ba788060db7 100644
--- a/udp_flow.c
+++ b/udp_flow.c
@@ -74,10 +74,6 @@ static int udp_flow_sock(const struct ctx *c,
{
const struct flowside *side = &uflow->f.side[sidei];
uint8_t pif = uflow->f.pif[sidei];
- union {
- flow_sidx_t sidx;
- uint32_t data;
- } fref = { .sidx = FLOW_SIDX(uflow, sidei) };
union epoll_ref ref;
int rc;
int s;
@@ -89,7 +85,7 @@ static int udp_flow_sock(const struct ctx *c,
}
ref.type = EPOLL_TYPE_UDP;
- ref.data = fref.data;
+ ref.flowside = FLOW_SIDX(uflow, sidei);
ref.fd = s;
flow_epollid_set(&uflow->f, EPOLLFD_ID_DEFAULT);
--
2.52.0
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 3/5] udp_flow: Assign socket to flow inside udp_flow_sock()
2026-01-08 14:01 [PATCH v2 0/5] flow: Introduce flow_epoll_set() to centralize epoll operations Laurent Vivier
2026-01-08 14:01 ` [PATCH v2 1/5] tcp: remove timer update in tcp_epoll_ctl() Laurent Vivier
2026-01-08 14:01 ` [PATCH v2 2/5] udp_flow: remove unneeded epoll_ref indirection Laurent Vivier
@ 2026-01-08 14:01 ` Laurent Vivier
2026-01-08 14:01 ` [PATCH v2 4/5] tcp_splice: Refactor tcp_splice_conn_epoll_events() to per-side computation Laurent Vivier
2026-01-08 14:01 ` [PATCH v2 5/5] flow: Introduce flow_epoll_set() to centralize epoll operations Laurent Vivier
4 siblings, 0 replies; 10+ messages in thread
From: Laurent Vivier @ 2026-01-08 14:01 UTC (permalink / raw)
To: passt-dev; +Cc: Laurent Vivier, David Gibson
Move the assignment of uflow->s[sidei] from the caller (udp_flow_new())
into udp_flow_sock() itself, placing it after the successful connect().
This is a pure refactoring with no functional change. The socket fd is
now assigned within udp_flow_sock() where the socket is created, rather
than requiring the caller to capture the return value. On error paths,
uflow->s[sidei] remains at its initialized value of -1 rather than being
set to the negative error code, which is semantically cleaner (though
functionally equivalent given the >= 0 check in udp_flow_close()).
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
udp_flow.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/udp_flow.c b/udp_flow.c
index 0ba788060db7..c4cf35c2c89d 100644
--- a/udp_flow.c
+++ b/udp_flow.c
@@ -105,6 +105,7 @@ static int udp_flow_sock(const struct ctx *c,
flow_dbg_perror(uflow, "Couldn't connect flow socket");
return rc;
}
+ uflow->s[sidei] = s;
/* It's possible, if unlikely, that we could receive some packets in
* between the bind() and connect() which may or may not be for this
@@ -159,7 +160,7 @@ static flow_sidx_t udp_flow_new(const struct ctx *c, union flow *flow,
flow_foreach_sidei(sidei) {
if (pif_is_socket(uflow->f.pif[sidei]))
- if ((uflow->s[sidei] = udp_flow_sock(c, uflow, sidei)) < 0)
+ if (udp_flow_sock(c, uflow, sidei) < 0)
goto cancel;
}
--
2.52.0
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 4/5] tcp_splice: Refactor tcp_splice_conn_epoll_events() to per-side computation
2026-01-08 14:01 [PATCH v2 0/5] flow: Introduce flow_epoll_set() to centralize epoll operations Laurent Vivier
` (2 preceding siblings ...)
2026-01-08 14:01 ` [PATCH v2 3/5] udp_flow: Assign socket to flow inside udp_flow_sock() Laurent Vivier
@ 2026-01-08 14:01 ` Laurent Vivier
2026-01-08 14:01 ` [PATCH v2 5/5] flow: Introduce flow_epoll_set() to centralize epoll operations Laurent Vivier
4 siblings, 0 replies; 10+ messages in thread
From: Laurent Vivier @ 2026-01-08 14:01 UTC (permalink / raw)
To: passt-dev; +Cc: Laurent Vivier, David Gibson
The function tcp_splice_conn_epoll_events() currently takes an array of
struct epoll_event and fills in the .events field for both sides using
flow_foreach_sidei() loops.
This works, but the function is doing two conceptually separate things
at once: computing events for side 0 and computing events for side 1.
The OUT_WAIT handling is particularly subtle, as it has cross-side
effects: when OUT_WAIT(sidei) is set, we add EPOLLOUT to ev[sidei] but
also remove EPOLLIN from ev[!sidei].
Refactor to make the function compute events for a single side at a
time, taking sidei as a parameter and returning uint32_t. This makes
the logic more focused and easier to follow. The cross-side effects of
OUT_WAIT are preserved by checking both OUT_WAIT(sidei) and
OUT_WAIT(!sidei) within each call.
The caller tcp_splice_epoll_ctl() now invokes the function twice, once
for each side, making the two-sided nature of the operation explicit.
No functional change.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
tcp_splice.c | 33 ++++++++++++++-------------------
1 file changed, 14 insertions(+), 19 deletions(-)
diff --git a/tcp_splice.c b/tcp_splice.c
index 440522449c13..bf4ff466de07 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -114,29 +114,23 @@ static struct tcp_splice_conn *conn_at_sidx(flow_sidx_t sidx)
* @events: Connection event flags
* @ev: Events to fill in, 0 is accepted socket, 1 is connecting socket
*/
-static void tcp_splice_conn_epoll_events(uint16_t events,
- struct epoll_event ev[])
+static uint32_t tcp_splice_conn_epoll_events(uint16_t events, unsigned sidei)
{
- unsigned sidei;
-
- flow_foreach_sidei(sidei)
- ev[sidei].events = 0;
+ uint32_t e = 0;
if (events & SPLICE_ESTABLISHED) {
- flow_foreach_sidei(sidei) {
- if (!(events & FIN_SENT(!sidei)))
- ev[sidei].events = EPOLLIN | EPOLLRDHUP;
- }
- } else if (events & SPLICE_CONNECT) {
- ev[1].events = EPOLLOUT;
+ if (!(events & FIN_SENT(!sidei)))
+ e = EPOLLIN | EPOLLRDHUP;
+ } else if (sidei == 1 && events & SPLICE_CONNECT) {
+ e = EPOLLOUT;
}
- flow_foreach_sidei(sidei) {
- if (events & OUT_WAIT(sidei)) {
- ev[sidei].events |= EPOLLOUT;
- ev[!sidei].events &= ~EPOLLIN;
- }
- }
+ if (events & OUT_WAIT(sidei))
+ e |= EPOLLOUT;
+ if (events & OUT_WAIT(!sidei))
+ e &= ~EPOLLIN;
+
+ return e;
}
/**
@@ -161,7 +155,8 @@ static int tcp_splice_epoll_ctl(const struct ctx *c,
struct epoll_event ev[SIDES] = { { .data.u64 = ref[0].u64 },
{ .data.u64 = ref[1].u64 } };
- tcp_splice_conn_epoll_events(conn->events, ev);
+ ev[0].events = tcp_splice_conn_epoll_events(conn->events, 0);
+ ev[1].events = tcp_splice_conn_epoll_events(conn->events, 1);
if (epoll_ctl(epollfd, m, conn->s[0], &ev[0]) ||
--
2.52.0
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 5/5] flow: Introduce flow_epoll_set() to centralize epoll operations
2026-01-08 14:01 [PATCH v2 0/5] flow: Introduce flow_epoll_set() to centralize epoll operations Laurent Vivier
` (3 preceding siblings ...)
2026-01-08 14:01 ` [PATCH v2 4/5] tcp_splice: Refactor tcp_splice_conn_epoll_events() to per-side computation Laurent Vivier
@ 2026-01-08 14:01 ` Laurent Vivier
2026-01-08 23:33 ` David Gibson
4 siblings, 1 reply; 10+ messages in thread
From: Laurent Vivier @ 2026-01-08 14:01 UTC (permalink / raw)
To: passt-dev; +Cc: Laurent Vivier
Currently, each flow type (TCP, TCP_SPLICE, PING, UDP) has its own
code to add or modify file descriptors in epoll. This leads to
duplicated boilerplate code across icmp.c, tcp.c, tcp_splice.c, and
udp_flow.c, each setting up epoll_ref unions and calling epoll_ctl()
with flow-type-specific details.
Introduce flow_epoll_set() in flow.c to handle epoll operations for
all flow types in a unified way. The function takes an explicit epoll
type parameter, allowing it to handle not only flow socket types but
also the TCP timer (EPOLL_TYPE_TCP_TIMER).
This will be needed to migrate queue pair from an epollfd to another.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
flow.c | 33 ++++++++++++++++++++
flow.h | 3 ++
icmp.c | 11 +++----
tcp.c | 64 ++++++++++++++++++++------------------
tcp_splice.c | 86 +++++++++++++++++++++++++---------------------------
udp_flow.c | 12 +++-----
6 files changed, 119 insertions(+), 90 deletions(-)
diff --git a/flow.c b/flow.c
index 4f53486586cd..73d24f9d3765 100644
--- a/flow.c
+++ b/flow.c
@@ -20,6 +20,7 @@
#include "flow.h"
#include "flow_table.h"
#include "repair.h"
+#include "epoll_ctl.h"
const char *flow_state_str[] = {
[FLOW_STATE_FREE] = "FREE",
@@ -390,6 +391,38 @@ void flow_epollid_clear(struct flow_common *f)
f->epollid = EPOLLFD_ID_INVALID;
}
+/**
+ * flow_epoll_set() - Add or modify epoll registration for a flow socket
+ * @command: epoll_ctl() command: EPOLL_CTL_ADD or EPOLL_CTL_MOD
+ * @type: epoll type
+ * @f: Flow to register socket for
+ * @events: epoll events to watch for
+ * @fd: File descriptor to register
+ * @sidei: Side index of the flow
+ *
+ * Return: 0 on success, -1 on error (from epoll_ctl())
+ */
+int flow_epoll_set(int command, enum epoll_type type,
+ const struct flow_common *f, uint32_t events, int fd,
+ unsigned int sidei)
+{
+ struct epoll_event ev;
+ union epoll_ref ref;
+
+ ref.type = type;
+ ref.fd = fd;
+
+ if (type == EPOLL_TYPE_TCP_TIMER)
+ ref.flow = flow_idx(f);
+ else
+ ref.flowside = flow_sidx(f, sidei);
+
+ ev.events = events;
+ ev.data.u64 = ref.u64;
+
+ return epoll_ctl(flow_epollfd(f), command, fd, &ev);
+}
+
/**
* flow_epollid_register() - Initialize the epoll id -> fd mapping
* @epollid: epoll id to associate to
diff --git a/flow.h b/flow.h
index b43b0b1dd7f2..bd5c6d90322b 100644
--- a/flow.h
+++ b/flow.h
@@ -265,6 +265,9 @@ bool flow_in_epoll(const struct flow_common *f);
int flow_epollfd(const struct flow_common *f);
void flow_epollid_set(struct flow_common *f, int epollid);
void flow_epollid_clear(struct flow_common *f);
+int flow_epoll_set(int command, enum epoll_type type,
+ const struct flow_common *f, uint32_t events, int fd,
+ unsigned int sidei);
void flow_epollid_register(int epollid, int epollfd);
void flow_defer_handler(const struct ctx *c, const struct timespec *now);
int flow_migrate_source_early(struct ctx *c, const struct migrate_stage *stage,
diff --git a/icmp.c b/icmp.c
index 9564c4963f7b..b6bb36d80715 100644
--- a/icmp.c
+++ b/icmp.c
@@ -177,7 +177,6 @@ static struct icmp_ping_flow *icmp_ping_new(const struct ctx *c,
union flow *flow = flow_alloc();
struct icmp_ping_flow *pingf;
const struct flowside *tgt;
- union epoll_ref ref;
if (!flow)
return NULL;
@@ -211,13 +210,11 @@ static struct icmp_ping_flow *icmp_ping_new(const struct ctx *c,
goto cancel;
flow_epollid_set(&pingf->f, EPOLLFD_ID_DEFAULT);
-
- ref.type = EPOLL_TYPE_PING;
- ref.flowside = FLOW_SIDX(flow, TGTSIDE);
- ref.fd = pingf->sock;
-
- if (epoll_add(flow_epollfd(&pingf->f), EPOLLIN, ref) < 0) {
+ if (flow_epoll_set(EPOLL_CTL_ADD, EPOLL_TYPE_PING, &pingf->f,
+ EPOLLIN, pingf->sock,
+ TGTSIDE) < 0) {
close(pingf->sock);
+ flow_epollid_clear(&pingf->f);
goto cancel;
}
diff --git a/tcp.c b/tcp.c
index 5141cdc7e839..d8cc11b377de 100644
--- a/tcp.c
+++ b/tcp.c
@@ -523,34 +523,44 @@ static uint32_t tcp_conn_epoll_events(uint8_t events, uint8_t conn_flags)
/**
* tcp_epoll_ctl() - Add/modify/delete epoll state from connection events
- * @c: Execution context
* @conn: Connection pointer
*
* Return: 0 on success, negative error code on failure (not on deletion)
*/
-static int tcp_epoll_ctl(const struct ctx *c, struct tcp_tap_conn *conn)
+static int tcp_epoll_ctl(struct tcp_tap_conn *conn)
{
- int m = flow_in_epoll(&conn->f) ? EPOLL_CTL_MOD : EPOLL_CTL_ADD;
- union epoll_ref ref = { .type = EPOLL_TYPE_TCP, .fd = conn->sock,
- .flowside = FLOW_SIDX(conn, !TAPSIDE(conn)), };
- struct epoll_event ev = { .data.u64 = ref.u64 };
- int epollfd = flow_in_epoll(&conn->f) ? flow_epollfd(&conn->f)
- : c->epollfd;
+ uint32_t events;
+ int m;
if (conn->events == CLOSED) {
- if (flow_in_epoll(&conn->f))
+ if (flow_in_epoll(&conn->f)) {
+ int epollfd = flow_epollfd(&conn->f);
+
epoll_del(epollfd, conn->sock);
- if (conn->timer != -1)
- epoll_del(epollfd, conn->timer);
+ if (conn->timer != -1)
+ epoll_del(epollfd, conn->timer);
+ }
+
return 0;
}
- ev.events = tcp_conn_epoll_events(conn->events, conn->flags);
+ events = tcp_conn_epoll_events(conn->events, conn->flags);
- if (epoll_ctl(epollfd, m, conn->sock, &ev))
- return -errno;
+ if (flow_in_epoll(&conn->f)) {
+ m = EPOLL_CTL_MOD;
+ } else {
+ flow_epollid_set(&conn->f, EPOLLFD_ID_DEFAULT);
+ m = EPOLL_CTL_ADD;
+ }
+
+ if (flow_epoll_set(m, EPOLL_TYPE_TCP, &conn->f, events, conn->sock,
+ !TAPSIDE(conn)) < 0) {
+ int ret = -errno;
- flow_epollid_set(&conn->f, EPOLLFD_ID_DEFAULT);
+ if (m == EPOLL_CTL_ADD)
+ flow_epollid_clear(&conn->f);
+ return ret;
+ }
return 0;
}
@@ -569,11 +579,6 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn)
return;
if (conn->timer == -1) {
- union epoll_ref ref = { .type = EPOLL_TYPE_TCP_TIMER,
- .flow = FLOW_IDX(conn) };
- struct epoll_event ev = { .data.u64 = ref.u64,
- .events = EPOLLIN | EPOLLET };
- int epollfd = flow_epollfd(&conn->f);
int fd;
fd = timerfd_create(CLOCK_MONOTONIC, 0);
@@ -581,18 +586,17 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn)
flow_dbg_perror(conn, "failed to get timer");
if (fd > -1)
close(fd);
- conn->timer = -1;
return;
}
- conn->timer = fd;
- ref.fd = conn->timer;
- if (epoll_ctl(epollfd, EPOLL_CTL_ADD, conn->timer, &ev)) {
+ if (flow_epoll_set(EPOLL_CTL_ADD, EPOLL_TYPE_TCP_TIMER,
+ &conn->f, EPOLLIN | EPOLLET, fd, 0) < 0) {
flow_dbg_perror(conn, "failed to add timer");
- close(conn->timer);
- conn->timer = -1;
+ close(fd);
return;
}
+
+ conn->timer = fd;
}
if (conn->flags & ACK_TO_TAP_DUE) {
@@ -669,7 +673,7 @@ void conn_flag_do(const struct ctx *c, struct tcp_tap_conn *conn,
}
if (flag == STALLED || flag == ~STALLED)
- tcp_epoll_ctl(c, conn);
+ tcp_epoll_ctl(conn);
if (flag == ACK_FROM_TAP_DUE || flag == ACK_TO_TAP_DUE ||
(flag == ~ACK_FROM_TAP_DUE && (conn->flags & ACK_TO_TAP_DUE)) ||
@@ -726,7 +730,7 @@ void conn_event_do(const struct ctx *c, struct tcp_tap_conn *conn,
} else {
if (event == CLOSED)
flow_hash_remove(c, TAP_SIDX(conn));
- tcp_epoll_ctl(c, conn);
+ tcp_epoll_ctl(conn);
}
if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED))
@@ -1742,7 +1746,7 @@ static void tcp_conn_from_tap(const struct ctx *c, sa_family_t af,
conn_event(c, conn, TAP_SYN_ACK_SENT);
}
- tcp_epoll_ctl(c, conn);
+ tcp_epoll_ctl(conn);
if (c->mode == MODE_VU) { /* To rebind to same oport after migration */
socklen_t sl = sizeof(sa);
@@ -3984,7 +3988,7 @@ int tcp_flow_migrate_target_ext(struct ctx *c, struct tcp_tap_conn *conn, int fd
tcp_send_flag(c, conn, ACK);
tcp_data_from_sock(c, conn);
- if ((rc = tcp_epoll_ctl(c, conn))) {
+ if ((rc = tcp_epoll_ctl(conn))) {
flow_dbg(conn,
"Failed to subscribe to epoll for migrated socket: %s",
strerror_(-rc));
diff --git a/tcp_splice.c b/tcp_splice.c
index bf4ff466de07..26e9845c39ee 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -135,37 +135,35 @@ static uint32_t tcp_splice_conn_epoll_events(uint16_t events, unsigned sidei)
/**
* tcp_splice_epoll_ctl() - Add/modify/delete epoll state from connection events
- * @c: Execution context
* @conn: Connection pointer
*
* Return: 0 on success, negative error code on failure (not on deletion)
*/
-static int tcp_splice_epoll_ctl(const struct ctx *c,
- struct tcp_splice_conn *conn)
+static int tcp_splice_epoll_ctl(struct tcp_splice_conn *conn)
{
- int epollfd = flow_in_epoll(&conn->f) ? flow_epollfd(&conn->f)
- : c->epollfd;
- int m = flow_in_epoll(&conn->f) ? EPOLL_CTL_MOD : EPOLL_CTL_ADD;
- const union epoll_ref ref[SIDES] = {
- { .type = EPOLL_TYPE_TCP_SPLICE, .fd = conn->s[0],
- .flowside = FLOW_SIDX(conn, 0) },
- { .type = EPOLL_TYPE_TCP_SPLICE, .fd = conn->s[1],
- .flowside = FLOW_SIDX(conn, 1) }
- };
- struct epoll_event ev[SIDES] = { { .data.u64 = ref[0].u64 },
- { .data.u64 = ref[1].u64 } };
-
- ev[0].events = tcp_splice_conn_epoll_events(conn->events, 0);
- ev[1].events = tcp_splice_conn_epoll_events(conn->events, 1);
-
-
- if (epoll_ctl(epollfd, m, conn->s[0], &ev[0]) ||
- epoll_ctl(epollfd, m, conn->s[1], &ev[1])) {
+ uint32_t events[2];
+ int m;
+
+ if (flow_in_epoll(&conn->f)) {
+ m = EPOLL_CTL_MOD;
+ } else {
+ flow_epollid_set(&conn->f, EPOLLFD_ID_DEFAULT);
+ m = EPOLL_CTL_ADD;
+ }
+
+ events[0] = tcp_splice_conn_epoll_events(conn->events, 0);
+ events[1] = tcp_splice_conn_epoll_events(conn->events, 1);
+
+ if (flow_epoll_set(m, EPOLL_TYPE_TCP_SPLICE, &conn->f, events[0],
+ conn->s[0], 0) ||
+ flow_epoll_set(m, EPOLL_TYPE_TCP_SPLICE, &conn->f, events[1],
+ conn->s[1], 1)) {
int ret = -errno;
flow_perror(conn, "ERROR on epoll_ctl()");
+ if (m == EPOLL_CTL_ADD)
+ flow_epollid_clear(&conn->f);
return ret;
}
- flow_epollid_set(&conn->f, EPOLLFD_ID_DEFAULT);
return 0;
}
@@ -205,7 +203,7 @@ static void conn_flag_do(struct tcp_splice_conn *conn,
}
}
-#define conn_flag(c, conn, flag) \
+#define conn_flag(conn, flag) \
do { \
flow_trace(conn, "flag at %s:%i", __func__, __LINE__); \
conn_flag_do(conn, flag); \
@@ -213,12 +211,10 @@ static void conn_flag_do(struct tcp_splice_conn *conn,
/**
* conn_event_do() - Set and log connection events, update epoll state
- * @c: Execution context
* @conn: Connection pointer
* @event: Connection event
*/
-static void conn_event_do(const struct ctx *c, struct tcp_splice_conn *conn,
- unsigned long event)
+static void conn_event_do(struct tcp_splice_conn *conn, unsigned long event)
{
if (event & (event - 1)) {
int flag_index = fls(~event);
@@ -240,14 +236,14 @@ static void conn_event_do(const struct ctx *c, struct tcp_splice_conn *conn,
flow_dbg(conn, "%s", tcp_splice_event_str[flag_index]);
}
- if (tcp_splice_epoll_ctl(c, conn))
- conn_flag(c, conn, CLOSING);
+ if (tcp_splice_epoll_ctl(conn))
+ conn_flag(conn, CLOSING);
}
-#define conn_event(c, conn, event) \
+#define conn_event(conn, event) \
do { \
flow_trace(conn, "event at %s:%i",__func__, __LINE__); \
- conn_event_do(c, conn, event); \
+ conn_event_do(conn, event); \
} while (0)
@@ -315,7 +311,7 @@ static int tcp_splice_connect_finish(const struct ctx *c,
if (pipe2(conn->pipe[sidei], O_NONBLOCK | O_CLOEXEC)) {
flow_perror(conn, "cannot create %d->%d pipe",
sidei, !sidei);
- conn_flag(c, conn, CLOSING);
+ conn_flag(conn, CLOSING);
return -EIO;
}
@@ -329,7 +325,7 @@ static int tcp_splice_connect_finish(const struct ctx *c,
}
if (!(conn->events & SPLICE_ESTABLISHED))
- conn_event(c, conn, SPLICE_ESTABLISHED);
+ conn_event(conn, SPLICE_ESTABLISHED);
return 0;
}
@@ -376,7 +372,7 @@ static int tcp_splice_connect(const struct ctx *c, struct tcp_splice_conn *conn)
pif_sockaddr(c, &sa, tgtpif, &tgt->eaddr, tgt->eport);
- conn_event(c, conn, SPLICE_CONNECT);
+ conn_event(conn, SPLICE_CONNECT);
if (connect(conn->s[1], &sa.sa, socklen_inany(&sa))) {
if (errno != EINPROGRESS) {
@@ -385,7 +381,7 @@ static int tcp_splice_connect(const struct ctx *c, struct tcp_splice_conn *conn)
return -errno;
}
} else {
- conn_event(c, conn, SPLICE_ESTABLISHED);
+ conn_event(conn, SPLICE_ESTABLISHED);
return tcp_splice_connect_finish(c, conn);
}
@@ -445,7 +441,7 @@ void tcp_splice_conn_from_sock(const struct ctx *c, union flow *flow, int s0)
flow_trace(conn, "failed to set TCP_QUICKACK on %i", s0);
if (tcp_splice_connect(c, conn))
- conn_flag(c, conn, CLOSING);
+ conn_flag(conn, CLOSING);
FLOW_ACTIVATE(conn);
}
@@ -494,14 +490,14 @@ void tcp_splice_sock_handler(struct ctx *c, union epoll_ref ref,
if (events & EPOLLOUT) {
fromsidei = !evsidei;
- conn_event(c, conn, ~OUT_WAIT(evsidei));
+ conn_event(conn, ~OUT_WAIT(evsidei));
} else {
fromsidei = evsidei;
}
if (events & EPOLLRDHUP)
/* For side 0 this is fake, but implied */
- conn_event(c, conn, FIN_RCVD(evsidei));
+ conn_event(conn, FIN_RCVD(evsidei));
swap:
eof = 0;
@@ -536,7 +532,7 @@ retry:
more = SPLICE_F_MORE;
if (conn->flags & lowat_set_flag)
- conn_flag(c, conn, lowat_act_flag);
+ conn_flag(conn, lowat_act_flag);
}
do
@@ -568,8 +564,8 @@ retry:
"Setting SO_RCVLOWAT %i: %s",
lowat, strerror_(errno));
} else {
- conn_flag(c, conn, lowat_set_flag);
- conn_flag(c, conn, lowat_act_flag);
+ conn_flag(conn, lowat_set_flag);
+ conn_flag(conn, lowat_act_flag);
}
}
@@ -583,7 +579,7 @@ retry:
if (conn->read[fromsidei] == conn->written[fromsidei])
break;
- conn_event(c, conn, OUT_WAIT(!fromsidei));
+ conn_event(conn, OUT_WAIT(!fromsidei));
break;
}
@@ -605,7 +601,7 @@ retry:
if ((conn->events & FIN_RCVD(sidei)) &&
!(conn->events & FIN_SENT(!sidei))) {
shutdown(conn->s[!sidei], SHUT_WR);
- conn_event(c, conn, FIN_SENT(!sidei));
+ conn_event(conn, FIN_SENT(!sidei));
}
}
}
@@ -626,7 +622,7 @@ retry:
return;
close:
- conn_flag(c, conn, CLOSING);
+ conn_flag(conn, CLOSING);
}
/**
@@ -762,10 +758,10 @@ void tcp_splice_timer(struct tcp_splice_conn *conn)
flow_trace(conn, "can't set SO_RCVLOWAT on %d",
conn->s[sidei]);
}
- conn_flag(c, conn, ~RCVLOWAT_SET(sidei));
+ conn_flag(conn, ~RCVLOWAT_SET(sidei));
}
}
flow_foreach_sidei(sidei)
- conn_flag(c, conn, ~RCVLOWAT_ACT(sidei));
+ conn_flag(conn, ~RCVLOWAT_ACT(sidei));
}
diff --git a/udp_flow.c b/udp_flow.c
index c4cf35c2c89d..b016a8095ec6 100644
--- a/udp_flow.c
+++ b/udp_flow.c
@@ -74,7 +74,6 @@ static int udp_flow_sock(const struct ctx *c,
{
const struct flowside *side = &uflow->f.side[sidei];
uint8_t pif = uflow->f.pif[sidei];
- union epoll_ref ref;
int rc;
int s;
@@ -84,14 +83,11 @@ static int udp_flow_sock(const struct ctx *c,
return s;
}
- ref.type = EPOLL_TYPE_UDP;
- ref.flowside = FLOW_SIDX(uflow, sidei);
- ref.fd = s;
-
flow_epollid_set(&uflow->f, EPOLLFD_ID_DEFAULT);
-
- rc = epoll_add(flow_epollfd(&uflow->f), EPOLLIN, ref);
- if (rc < 0) {
+ if (flow_epoll_set(EPOLL_CTL_ADD, EPOLL_TYPE_UDP, &uflow->f,
+ EPOLLIN, s, sidei) < 0) {
+ rc = -errno;
+ flow_epollid_clear(&uflow->f);
close(s);
return rc;
}
--
2.52.0
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 1/5] tcp: remove timer update in tcp_epoll_ctl()
2026-01-08 14:01 ` [PATCH v2 1/5] tcp: remove timer update in tcp_epoll_ctl() Laurent Vivier
@ 2026-01-08 23:26 ` David Gibson
0 siblings, 0 replies; 10+ messages in thread
From: David Gibson @ 2026-01-08 23:26 UTC (permalink / raw)
To: Laurent Vivier; +Cc: passt-dev
[-- Attachment #1: Type: text/plain, Size: 1801 bytes --]
On Thu, Jan 08, 2026 at 03:01:15PM +0100, Laurent Vivier wrote:
> Timer update in tcp_epoll_ctl() has been added by
> bb708111833e ("treewide: Packet abstraction with mandatory boundary checks")
> because epoll_ref stores "conn - tc" that can change in tcp_table_compact().
>
> But since
> e2e8219f13b8 ("flow, tcp: Consolidate flow pointer<->index helpers")
> we use FLOW_IDX() and since
> 8981a720aac4 ("flow: Avoid moving flow entries to compact table")
> flow table doesn't use a compaction procedure so FLOW_IDX() never changes.
>
> Updating the timer event is now a no-op, remove it from tcp_epoll_ctl().
>
> Fixes: 8981a720aac4 ("flow: Avoid moving flow entries to compact table")
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> tcp.c | 12 ------------
> 1 file changed, 12 deletions(-)
>
> diff --git a/tcp.c b/tcp.c
> index e7fa85f346b2..5141cdc7e839 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -552,18 +552,6 @@ static int tcp_epoll_ctl(const struct ctx *c, struct tcp_tap_conn *conn)
>
> flow_epollid_set(&conn->f, EPOLLFD_ID_DEFAULT);
>
> - if (conn->timer != -1) {
> - union epoll_ref ref_t = { .type = EPOLL_TYPE_TCP_TIMER,
> - .fd = conn->timer,
> - .flow = FLOW_IDX(conn) };
> - struct epoll_event ev_t = { .data.u64 = ref_t.u64,
> - .events = EPOLLIN | EPOLLET };
> -
> - if (epoll_ctl(flow_epollfd(&conn->f), EPOLL_CTL_MOD,
> - conn->timer, &ev_t))
> - return -errno;
> - }
> -
> return 0;
> }
>
> --
> 2.52.0
>
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 2/5] udp_flow: remove unneeded epoll_ref indirection
2026-01-08 14:01 ` [PATCH v2 2/5] udp_flow: remove unneeded epoll_ref indirection Laurent Vivier
@ 2026-01-08 23:26 ` David Gibson
0 siblings, 0 replies; 10+ messages in thread
From: David Gibson @ 2026-01-08 23:26 UTC (permalink / raw)
To: Laurent Vivier; +Cc: passt-dev
[-- Attachment #1: Type: text/plain, Size: 1513 bytes --]
On Thu, Jan 08, 2026 at 03:01:16PM +0100, Laurent Vivier wrote:
> The fref union was used to convert flow_sidx_t to uint32_t for
> assignment to ref.data. This is unnecessary since epoll_ref already
> contains a flowside member of type flow_sidx_t, so we can assign
> directly.
>
> This aligns with how icmp.c and other callers assign flow_sidx_t to
> epoll_ref.
>
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> udp_flow.c | 6 +-----
> 1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/udp_flow.c b/udp_flow.c
> index 8907f2f72741..0ba788060db7 100644
> --- a/udp_flow.c
> +++ b/udp_flow.c
> @@ -74,10 +74,6 @@ static int udp_flow_sock(const struct ctx *c,
> {
> const struct flowside *side = &uflow->f.side[sidei];
> uint8_t pif = uflow->f.pif[sidei];
> - union {
> - flow_sidx_t sidx;
> - uint32_t data;
> - } fref = { .sidx = FLOW_SIDX(uflow, sidei) };
> union epoll_ref ref;
> int rc;
> int s;
> @@ -89,7 +85,7 @@ static int udp_flow_sock(const struct ctx *c,
> }
>
> ref.type = EPOLL_TYPE_UDP;
> - ref.data = fref.data;
> + ref.flowside = FLOW_SIDX(uflow, sidei);
> ref.fd = s;
>
> flow_epollid_set(&uflow->f, EPOLLFD_ID_DEFAULT);
> --
> 2.52.0
>
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 5/5] flow: Introduce flow_epoll_set() to centralize epoll operations
2026-01-08 14:01 ` [PATCH v2 5/5] flow: Introduce flow_epoll_set() to centralize epoll operations Laurent Vivier
@ 2026-01-08 23:33 ` David Gibson
2026-01-09 9:26 ` Laurent Vivier
0 siblings, 1 reply; 10+ messages in thread
From: David Gibson @ 2026-01-08 23:33 UTC (permalink / raw)
To: Laurent Vivier; +Cc: passt-dev
[-- Attachment #1: Type: text/plain, Size: 17519 bytes --]
On Thu, Jan 08, 2026 at 03:01:19PM +0100, Laurent Vivier wrote:
> Currently, each flow type (TCP, TCP_SPLICE, PING, UDP) has its own
> code to add or modify file descriptors in epoll. This leads to
> duplicated boilerplate code across icmp.c, tcp.c, tcp_splice.c, and
> udp_flow.c, each setting up epoll_ref unions and calling epoll_ctl()
> with flow-type-specific details.
>
> Introduce flow_epoll_set() in flow.c to handle epoll operations for
> all flow types in a unified way. The function takes an explicit epoll
> type parameter, allowing it to handle not only flow socket types but
> also the TCP timer (EPOLL_TYPE_TCP_TIMER).
>
> This will be needed to migrate queue pair from an epollfd to another.
>
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
I'm not totally convinced if folding in TCP timers with the socket fds
makes sense. I think there's still room for improving consistency of
the epoll lifecycle (removing the in_epoll flag equivalent). But both
of those can be looked at later. So,
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> flow.c | 33 ++++++++++++++++++++
> flow.h | 3 ++
> icmp.c | 11 +++----
> tcp.c | 64 ++++++++++++++++++++------------------
> tcp_splice.c | 86 +++++++++++++++++++++++++---------------------------
> udp_flow.c | 12 +++-----
> 6 files changed, 119 insertions(+), 90 deletions(-)
>
> diff --git a/flow.c b/flow.c
> index 4f53486586cd..73d24f9d3765 100644
> --- a/flow.c
> +++ b/flow.c
> @@ -20,6 +20,7 @@
> #include "flow.h"
> #include "flow_table.h"
> #include "repair.h"
> +#include "epoll_ctl.h"
>
> const char *flow_state_str[] = {
> [FLOW_STATE_FREE] = "FREE",
> @@ -390,6 +391,38 @@ void flow_epollid_clear(struct flow_common *f)
> f->epollid = EPOLLFD_ID_INVALID;
> }
>
> +/**
> + * flow_epoll_set() - Add or modify epoll registration for a flow socket
> + * @command: epoll_ctl() command: EPOLL_CTL_ADD or EPOLL_CTL_MOD
> + * @type: epoll type
> + * @f: Flow to register socket for
> + * @events: epoll events to watch for
> + * @fd: File descriptor to register
> + * @sidei: Side index of the flow
> + *
> + * Return: 0 on success, -1 on error (from epoll_ctl())
> + */
> +int flow_epoll_set(int command, enum epoll_type type,
> + const struct flow_common *f, uint32_t events, int fd,
> + unsigned int sidei)
> +{
> + struct epoll_event ev;
> + union epoll_ref ref;
> +
> + ref.type = type;
> + ref.fd = fd;
> +
> + if (type == EPOLL_TYPE_TCP_TIMER)
> + ref.flow = flow_idx(f);
> + else
> + ref.flowside = flow_sidx(f, sidei);
> +
> + ev.events = events;
> + ev.data.u64 = ref.u64;
> +
> + return epoll_ctl(flow_epollfd(f), command, fd, &ev);
> +}
> +
> /**
> * flow_epollid_register() - Initialize the epoll id -> fd mapping
> * @epollid: epoll id to associate to
> diff --git a/flow.h b/flow.h
> index b43b0b1dd7f2..bd5c6d90322b 100644
> --- a/flow.h
> +++ b/flow.h
> @@ -265,6 +265,9 @@ bool flow_in_epoll(const struct flow_common *f);
> int flow_epollfd(const struct flow_common *f);
> void flow_epollid_set(struct flow_common *f, int epollid);
> void flow_epollid_clear(struct flow_common *f);
> +int flow_epoll_set(int command, enum epoll_type type,
> + const struct flow_common *f, uint32_t events, int fd,
> + unsigned int sidei);
> void flow_epollid_register(int epollid, int epollfd);
> void flow_defer_handler(const struct ctx *c, const struct timespec *now);
> int flow_migrate_source_early(struct ctx *c, const struct migrate_stage *stage,
> diff --git a/icmp.c b/icmp.c
> index 9564c4963f7b..b6bb36d80715 100644
> --- a/icmp.c
> +++ b/icmp.c
> @@ -177,7 +177,6 @@ static struct icmp_ping_flow *icmp_ping_new(const struct ctx *c,
> union flow *flow = flow_alloc();
> struct icmp_ping_flow *pingf;
> const struct flowside *tgt;
> - union epoll_ref ref;
>
> if (!flow)
> return NULL;
> @@ -211,13 +210,11 @@ static struct icmp_ping_flow *icmp_ping_new(const struct ctx *c,
> goto cancel;
>
> flow_epollid_set(&pingf->f, EPOLLFD_ID_DEFAULT);
> -
> - ref.type = EPOLL_TYPE_PING;
> - ref.flowside = FLOW_SIDX(flow, TGTSIDE);
> - ref.fd = pingf->sock;
> -
> - if (epoll_add(flow_epollfd(&pingf->f), EPOLLIN, ref) < 0) {
> + if (flow_epoll_set(EPOLL_CTL_ADD, EPOLL_TYPE_PING, &pingf->f,
> + EPOLLIN, pingf->sock,
> + TGTSIDE) < 0) {
> close(pingf->sock);
> + flow_epollid_clear(&pingf->f);
> goto cancel;
> }
>
> diff --git a/tcp.c b/tcp.c
> index 5141cdc7e839..d8cc11b377de 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -523,34 +523,44 @@ static uint32_t tcp_conn_epoll_events(uint8_t events, uint8_t conn_flags)
>
> /**
> * tcp_epoll_ctl() - Add/modify/delete epoll state from connection events
> - * @c: Execution context
> * @conn: Connection pointer
> *
> * Return: 0 on success, negative error code on failure (not on deletion)
> */
> -static int tcp_epoll_ctl(const struct ctx *c, struct tcp_tap_conn *conn)
> +static int tcp_epoll_ctl(struct tcp_tap_conn *conn)
> {
> - int m = flow_in_epoll(&conn->f) ? EPOLL_CTL_MOD : EPOLL_CTL_ADD;
> - union epoll_ref ref = { .type = EPOLL_TYPE_TCP, .fd = conn->sock,
> - .flowside = FLOW_SIDX(conn, !TAPSIDE(conn)), };
> - struct epoll_event ev = { .data.u64 = ref.u64 };
> - int epollfd = flow_in_epoll(&conn->f) ? flow_epollfd(&conn->f)
> - : c->epollfd;
> + uint32_t events;
> + int m;
>
> if (conn->events == CLOSED) {
> - if (flow_in_epoll(&conn->f))
> + if (flow_in_epoll(&conn->f)) {
> + int epollfd = flow_epollfd(&conn->f);
> +
> epoll_del(epollfd, conn->sock);
> - if (conn->timer != -1)
> - epoll_del(epollfd, conn->timer);
> + if (conn->timer != -1)
> + epoll_del(epollfd, conn->timer);
> + }
> +
> return 0;
> }
>
> - ev.events = tcp_conn_epoll_events(conn->events, conn->flags);
> + events = tcp_conn_epoll_events(conn->events, conn->flags);
>
> - if (epoll_ctl(epollfd, m, conn->sock, &ev))
> - return -errno;
> + if (flow_in_epoll(&conn->f)) {
> + m = EPOLL_CTL_MOD;
> + } else {
> + flow_epollid_set(&conn->f, EPOLLFD_ID_DEFAULT);
> + m = EPOLL_CTL_ADD;
> + }
> +
> + if (flow_epoll_set(m, EPOLL_TYPE_TCP, &conn->f, events, conn->sock,
> + !TAPSIDE(conn)) < 0) {
> + int ret = -errno;
>
> - flow_epollid_set(&conn->f, EPOLLFD_ID_DEFAULT);
> + if (m == EPOLL_CTL_ADD)
> + flow_epollid_clear(&conn->f);
> + return ret;
> + }
>
> return 0;
> }
> @@ -569,11 +579,6 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn)
> return;
>
> if (conn->timer == -1) {
> - union epoll_ref ref = { .type = EPOLL_TYPE_TCP_TIMER,
> - .flow = FLOW_IDX(conn) };
> - struct epoll_event ev = { .data.u64 = ref.u64,
> - .events = EPOLLIN | EPOLLET };
> - int epollfd = flow_epollfd(&conn->f);
> int fd;
>
> fd = timerfd_create(CLOCK_MONOTONIC, 0);
> @@ -581,18 +586,17 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn)
> flow_dbg_perror(conn, "failed to get timer");
> if (fd > -1)
> close(fd);
> - conn->timer = -1;
> return;
> }
> - conn->timer = fd;
> - ref.fd = conn->timer;
>
> - if (epoll_ctl(epollfd, EPOLL_CTL_ADD, conn->timer, &ev)) {
> + if (flow_epoll_set(EPOLL_CTL_ADD, EPOLL_TYPE_TCP_TIMER,
> + &conn->f, EPOLLIN | EPOLLET, fd, 0) < 0) {
> flow_dbg_perror(conn, "failed to add timer");
> - close(conn->timer);
> - conn->timer = -1;
> + close(fd);
> return;
> }
> +
> + conn->timer = fd;
> }
>
> if (conn->flags & ACK_TO_TAP_DUE) {
> @@ -669,7 +673,7 @@ void conn_flag_do(const struct ctx *c, struct tcp_tap_conn *conn,
> }
>
> if (flag == STALLED || flag == ~STALLED)
> - tcp_epoll_ctl(c, conn);
> + tcp_epoll_ctl(conn);
>
> if (flag == ACK_FROM_TAP_DUE || flag == ACK_TO_TAP_DUE ||
> (flag == ~ACK_FROM_TAP_DUE && (conn->flags & ACK_TO_TAP_DUE)) ||
> @@ -726,7 +730,7 @@ void conn_event_do(const struct ctx *c, struct tcp_tap_conn *conn,
> } else {
> if (event == CLOSED)
> flow_hash_remove(c, TAP_SIDX(conn));
> - tcp_epoll_ctl(c, conn);
> + tcp_epoll_ctl(conn);
> }
>
> if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED))
> @@ -1742,7 +1746,7 @@ static void tcp_conn_from_tap(const struct ctx *c, sa_family_t af,
> conn_event(c, conn, TAP_SYN_ACK_SENT);
> }
>
> - tcp_epoll_ctl(c, conn);
> + tcp_epoll_ctl(conn);
>
> if (c->mode == MODE_VU) { /* To rebind to same oport after migration */
> socklen_t sl = sizeof(sa);
> @@ -3984,7 +3988,7 @@ int tcp_flow_migrate_target_ext(struct ctx *c, struct tcp_tap_conn *conn, int fd
> tcp_send_flag(c, conn, ACK);
> tcp_data_from_sock(c, conn);
>
> - if ((rc = tcp_epoll_ctl(c, conn))) {
> + if ((rc = tcp_epoll_ctl(conn))) {
> flow_dbg(conn,
> "Failed to subscribe to epoll for migrated socket: %s",
> strerror_(-rc));
> diff --git a/tcp_splice.c b/tcp_splice.c
> index bf4ff466de07..26e9845c39ee 100644
> --- a/tcp_splice.c
> +++ b/tcp_splice.c
> @@ -135,37 +135,35 @@ static uint32_t tcp_splice_conn_epoll_events(uint16_t events, unsigned sidei)
>
> /**
> * tcp_splice_epoll_ctl() - Add/modify/delete epoll state from connection events
> - * @c: Execution context
> * @conn: Connection pointer
> *
> * Return: 0 on success, negative error code on failure (not on deletion)
> */
> -static int tcp_splice_epoll_ctl(const struct ctx *c,
> - struct tcp_splice_conn *conn)
> +static int tcp_splice_epoll_ctl(struct tcp_splice_conn *conn)
> {
> - int epollfd = flow_in_epoll(&conn->f) ? flow_epollfd(&conn->f)
> - : c->epollfd;
> - int m = flow_in_epoll(&conn->f) ? EPOLL_CTL_MOD : EPOLL_CTL_ADD;
> - const union epoll_ref ref[SIDES] = {
> - { .type = EPOLL_TYPE_TCP_SPLICE, .fd = conn->s[0],
> - .flowside = FLOW_SIDX(conn, 0) },
> - { .type = EPOLL_TYPE_TCP_SPLICE, .fd = conn->s[1],
> - .flowside = FLOW_SIDX(conn, 1) }
> - };
> - struct epoll_event ev[SIDES] = { { .data.u64 = ref[0].u64 },
> - { .data.u64 = ref[1].u64 } };
> -
> - ev[0].events = tcp_splice_conn_epoll_events(conn->events, 0);
> - ev[1].events = tcp_splice_conn_epoll_events(conn->events, 1);
> -
> -
> - if (epoll_ctl(epollfd, m, conn->s[0], &ev[0]) ||
> - epoll_ctl(epollfd, m, conn->s[1], &ev[1])) {
> + uint32_t events[2];
> + int m;
> +
> + if (flow_in_epoll(&conn->f)) {
> + m = EPOLL_CTL_MOD;
> + } else {
> + flow_epollid_set(&conn->f, EPOLLFD_ID_DEFAULT);
> + m = EPOLL_CTL_ADD;
> + }
> +
> + events[0] = tcp_splice_conn_epoll_events(conn->events, 0);
> + events[1] = tcp_splice_conn_epoll_events(conn->events, 1);
> +
> + if (flow_epoll_set(m, EPOLL_TYPE_TCP_SPLICE, &conn->f, events[0],
> + conn->s[0], 0) ||
> + flow_epoll_set(m, EPOLL_TYPE_TCP_SPLICE, &conn->f, events[1],
> + conn->s[1], 1)) {
> int ret = -errno;
> flow_perror(conn, "ERROR on epoll_ctl()");
> + if (m == EPOLL_CTL_ADD)
> + flow_epollid_clear(&conn->f);
> return ret;
> }
> - flow_epollid_set(&conn->f, EPOLLFD_ID_DEFAULT);
>
> return 0;
> }
> @@ -205,7 +203,7 @@ static void conn_flag_do(struct tcp_splice_conn *conn,
> }
> }
>
> -#define conn_flag(c, conn, flag) \
> +#define conn_flag(conn, flag) \
> do { \
> flow_trace(conn, "flag at %s:%i", __func__, __LINE__); \
> conn_flag_do(conn, flag); \
> @@ -213,12 +211,10 @@ static void conn_flag_do(struct tcp_splice_conn *conn,
>
> /**
> * conn_event_do() - Set and log connection events, update epoll state
> - * @c: Execution context
> * @conn: Connection pointer
> * @event: Connection event
> */
> -static void conn_event_do(const struct ctx *c, struct tcp_splice_conn *conn,
> - unsigned long event)
> +static void conn_event_do(struct tcp_splice_conn *conn, unsigned long event)
> {
> if (event & (event - 1)) {
> int flag_index = fls(~event);
> @@ -240,14 +236,14 @@ static void conn_event_do(const struct ctx *c, struct tcp_splice_conn *conn,
> flow_dbg(conn, "%s", tcp_splice_event_str[flag_index]);
> }
>
> - if (tcp_splice_epoll_ctl(c, conn))
> - conn_flag(c, conn, CLOSING);
> + if (tcp_splice_epoll_ctl(conn))
> + conn_flag(conn, CLOSING);
> }
>
> -#define conn_event(c, conn, event) \
> +#define conn_event(conn, event) \
> do { \
> flow_trace(conn, "event at %s:%i",__func__, __LINE__); \
> - conn_event_do(c, conn, event); \
> + conn_event_do(conn, event); \
> } while (0)
>
>
> @@ -315,7 +311,7 @@ static int tcp_splice_connect_finish(const struct ctx *c,
> if (pipe2(conn->pipe[sidei], O_NONBLOCK | O_CLOEXEC)) {
> flow_perror(conn, "cannot create %d->%d pipe",
> sidei, !sidei);
> - conn_flag(c, conn, CLOSING);
> + conn_flag(conn, CLOSING);
> return -EIO;
> }
>
> @@ -329,7 +325,7 @@ static int tcp_splice_connect_finish(const struct ctx *c,
> }
>
> if (!(conn->events & SPLICE_ESTABLISHED))
> - conn_event(c, conn, SPLICE_ESTABLISHED);
> + conn_event(conn, SPLICE_ESTABLISHED);
>
> return 0;
> }
> @@ -376,7 +372,7 @@ static int tcp_splice_connect(const struct ctx *c, struct tcp_splice_conn *conn)
>
> pif_sockaddr(c, &sa, tgtpif, &tgt->eaddr, tgt->eport);
>
> - conn_event(c, conn, SPLICE_CONNECT);
> + conn_event(conn, SPLICE_CONNECT);
>
> if (connect(conn->s[1], &sa.sa, socklen_inany(&sa))) {
> if (errno != EINPROGRESS) {
> @@ -385,7 +381,7 @@ static int tcp_splice_connect(const struct ctx *c, struct tcp_splice_conn *conn)
> return -errno;
> }
> } else {
> - conn_event(c, conn, SPLICE_ESTABLISHED);
> + conn_event(conn, SPLICE_ESTABLISHED);
> return tcp_splice_connect_finish(c, conn);
> }
>
> @@ -445,7 +441,7 @@ void tcp_splice_conn_from_sock(const struct ctx *c, union flow *flow, int s0)
> flow_trace(conn, "failed to set TCP_QUICKACK on %i", s0);
>
> if (tcp_splice_connect(c, conn))
> - conn_flag(c, conn, CLOSING);
> + conn_flag(conn, CLOSING);
>
> FLOW_ACTIVATE(conn);
> }
> @@ -494,14 +490,14 @@ void tcp_splice_sock_handler(struct ctx *c, union epoll_ref ref,
>
> if (events & EPOLLOUT) {
> fromsidei = !evsidei;
> - conn_event(c, conn, ~OUT_WAIT(evsidei));
> + conn_event(conn, ~OUT_WAIT(evsidei));
> } else {
> fromsidei = evsidei;
> }
>
> if (events & EPOLLRDHUP)
> /* For side 0 this is fake, but implied */
> - conn_event(c, conn, FIN_RCVD(evsidei));
> + conn_event(conn, FIN_RCVD(evsidei));
>
> swap:
> eof = 0;
> @@ -536,7 +532,7 @@ retry:
> more = SPLICE_F_MORE;
>
> if (conn->flags & lowat_set_flag)
> - conn_flag(c, conn, lowat_act_flag);
> + conn_flag(conn, lowat_act_flag);
> }
>
> do
> @@ -568,8 +564,8 @@ retry:
> "Setting SO_RCVLOWAT %i: %s",
> lowat, strerror_(errno));
> } else {
> - conn_flag(c, conn, lowat_set_flag);
> - conn_flag(c, conn, lowat_act_flag);
> + conn_flag(conn, lowat_set_flag);
> + conn_flag(conn, lowat_act_flag);
> }
> }
>
> @@ -583,7 +579,7 @@ retry:
> if (conn->read[fromsidei] == conn->written[fromsidei])
> break;
>
> - conn_event(c, conn, OUT_WAIT(!fromsidei));
> + conn_event(conn, OUT_WAIT(!fromsidei));
> break;
> }
>
> @@ -605,7 +601,7 @@ retry:
> if ((conn->events & FIN_RCVD(sidei)) &&
> !(conn->events & FIN_SENT(!sidei))) {
> shutdown(conn->s[!sidei], SHUT_WR);
> - conn_event(c, conn, FIN_SENT(!sidei));
> + conn_event(conn, FIN_SENT(!sidei));
> }
> }
> }
> @@ -626,7 +622,7 @@ retry:
> return;
>
> close:
> - conn_flag(c, conn, CLOSING);
> + conn_flag(conn, CLOSING);
> }
>
> /**
> @@ -762,10 +758,10 @@ void tcp_splice_timer(struct tcp_splice_conn *conn)
> flow_trace(conn, "can't set SO_RCVLOWAT on %d",
> conn->s[sidei]);
> }
> - conn_flag(c, conn, ~RCVLOWAT_SET(sidei));
> + conn_flag(conn, ~RCVLOWAT_SET(sidei));
> }
> }
>
> flow_foreach_sidei(sidei)
> - conn_flag(c, conn, ~RCVLOWAT_ACT(sidei));
> + conn_flag(conn, ~RCVLOWAT_ACT(sidei));
> }
> diff --git a/udp_flow.c b/udp_flow.c
> index c4cf35c2c89d..b016a8095ec6 100644
> --- a/udp_flow.c
> +++ b/udp_flow.c
> @@ -74,7 +74,6 @@ static int udp_flow_sock(const struct ctx *c,
> {
> const struct flowside *side = &uflow->f.side[sidei];
> uint8_t pif = uflow->f.pif[sidei];
> - union epoll_ref ref;
> int rc;
> int s;
>
> @@ -84,14 +83,11 @@ static int udp_flow_sock(const struct ctx *c,
> return s;
> }
>
> - ref.type = EPOLL_TYPE_UDP;
> - ref.flowside = FLOW_SIDX(uflow, sidei);
> - ref.fd = s;
> -
> flow_epollid_set(&uflow->f, EPOLLFD_ID_DEFAULT);
> -
> - rc = epoll_add(flow_epollfd(&uflow->f), EPOLLIN, ref);
> - if (rc < 0) {
> + if (flow_epoll_set(EPOLL_CTL_ADD, EPOLL_TYPE_UDP, &uflow->f,
> + EPOLLIN, s, sidei) < 0) {
> + rc = -errno;
> + flow_epollid_clear(&uflow->f);
> close(s);
> return rc;
> }
> --
> 2.52.0
>
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 5/5] flow: Introduce flow_epoll_set() to centralize epoll operations
2026-01-08 23:33 ` David Gibson
@ 2026-01-09 9:26 ` Laurent Vivier
0 siblings, 0 replies; 10+ messages in thread
From: Laurent Vivier @ 2026-01-09 9:26 UTC (permalink / raw)
To: David Gibson; +Cc: passt-dev
On 1/9/26 00:33, David Gibson wrote:
> On Thu, Jan 08, 2026 at 03:01:19PM +0100, Laurent Vivier wrote:
>> Currently, each flow type (TCP, TCP_SPLICE, PING, UDP) has its own
>> code to add or modify file descriptors in epoll. This leads to
>> duplicated boilerplate code across icmp.c, tcp.c, tcp_splice.c, and
>> udp_flow.c, each setting up epoll_ref unions and calling epoll_ctl()
>> with flow-type-specific details.
>>
>> Introduce flow_epoll_set() in flow.c to handle epoll operations for
>> all flow types in a unified way. The function takes an explicit epoll
>> type parameter, allowing it to handle not only flow socket types but
>> also the TCP timer (EPOLL_TYPE_TCP_TIMER).
>>
>> This will be needed to migrate queue pair from an epollfd to another.
>>
>> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
>
> I'm not totally convinced if folding in TCP timers with the socket fds
> makes sense. I think there's still room for improving consistency of
I agree. And as we don't need to manage the EPOLL_CTL_MOD for it anymore, the case is
simpler. Moreover, if we don't manage it in the function we can remove the epoll_type from
the parameters as we can rely on the flow type.
Thanks,
Laurent
> the epoll lifecycle (removing the in_epoll flag equivalent). But both
> of those can be looked at later. So,
>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
>
>> ---
>> flow.c | 33 ++++++++++++++++++++
>> flow.h | 3 ++
>> icmp.c | 11 +++----
>> tcp.c | 64 ++++++++++++++++++++------------------
>> tcp_splice.c | 86 +++++++++++++++++++++++++---------------------------
>> udp_flow.c | 12 +++-----
>> 6 files changed, 119 insertions(+), 90 deletions(-)
>>
>> diff --git a/flow.c b/flow.c
>> index 4f53486586cd..73d24f9d3765 100644
>> --- a/flow.c
>> +++ b/flow.c
>> @@ -20,6 +20,7 @@
>> #include "flow.h"
>> #include "flow_table.h"
>> #include "repair.h"
>> +#include "epoll_ctl.h"
>>
>> const char *flow_state_str[] = {
>> [FLOW_STATE_FREE] = "FREE",
>> @@ -390,6 +391,38 @@ void flow_epollid_clear(struct flow_common *f)
>> f->epollid = EPOLLFD_ID_INVALID;
>> }
>>
>> +/**
>> + * flow_epoll_set() - Add or modify epoll registration for a flow socket
>> + * @command: epoll_ctl() command: EPOLL_CTL_ADD or EPOLL_CTL_MOD
>> + * @type: epoll type
>> + * @f: Flow to register socket for
>> + * @events: epoll events to watch for
>> + * @fd: File descriptor to register
>> + * @sidei: Side index of the flow
>> + *
>> + * Return: 0 on success, -1 on error (from epoll_ctl())
>> + */
>> +int flow_epoll_set(int command, enum epoll_type type,
>> + const struct flow_common *f, uint32_t events, int fd,
>> + unsigned int sidei)
>> +{
>> + struct epoll_event ev;
>> + union epoll_ref ref;
>> +
>> + ref.type = type;
>> + ref.fd = fd;
>> +
>> + if (type == EPOLL_TYPE_TCP_TIMER)
>> + ref.flow = flow_idx(f);
>> + else
>> + ref.flowside = flow_sidx(f, sidei);
>> +
>> + ev.events = events;
>> + ev.data.u64 = ref.u64;
>> +
>> + return epoll_ctl(flow_epollfd(f), command, fd, &ev);
>> +}
>> +
>> /**
>> * flow_epollid_register() - Initialize the epoll id -> fd mapping
>> * @epollid: epoll id to associate to
>> diff --git a/flow.h b/flow.h
>> index b43b0b1dd7f2..bd5c6d90322b 100644
>> --- a/flow.h
>> +++ b/flow.h
>> @@ -265,6 +265,9 @@ bool flow_in_epoll(const struct flow_common *f);
>> int flow_epollfd(const struct flow_common *f);
>> void flow_epollid_set(struct flow_common *f, int epollid);
>> void flow_epollid_clear(struct flow_common *f);
>> +int flow_epoll_set(int command, enum epoll_type type,
>> + const struct flow_common *f, uint32_t events, int fd,
>> + unsigned int sidei);
>> void flow_epollid_register(int epollid, int epollfd);
>> void flow_defer_handler(const struct ctx *c, const struct timespec *now);
>> int flow_migrate_source_early(struct ctx *c, const struct migrate_stage *stage,
>> diff --git a/icmp.c b/icmp.c
>> index 9564c4963f7b..b6bb36d80715 100644
>> --- a/icmp.c
>> +++ b/icmp.c
>> @@ -177,7 +177,6 @@ static struct icmp_ping_flow *icmp_ping_new(const struct ctx *c,
>> union flow *flow = flow_alloc();
>> struct icmp_ping_flow *pingf;
>> const struct flowside *tgt;
>> - union epoll_ref ref;
>>
>> if (!flow)
>> return NULL;
>> @@ -211,13 +210,11 @@ static struct icmp_ping_flow *icmp_ping_new(const struct ctx *c,
>> goto cancel;
>>
>> flow_epollid_set(&pingf->f, EPOLLFD_ID_DEFAULT);
>> -
>> - ref.type = EPOLL_TYPE_PING;
>> - ref.flowside = FLOW_SIDX(flow, TGTSIDE);
>> - ref.fd = pingf->sock;
>> -
>> - if (epoll_add(flow_epollfd(&pingf->f), EPOLLIN, ref) < 0) {
>> + if (flow_epoll_set(EPOLL_CTL_ADD, EPOLL_TYPE_PING, &pingf->f,
>> + EPOLLIN, pingf->sock,
>> + TGTSIDE) < 0) {
>> close(pingf->sock);
>> + flow_epollid_clear(&pingf->f);
>> goto cancel;
>> }
>>
>> diff --git a/tcp.c b/tcp.c
>> index 5141cdc7e839..d8cc11b377de 100644
>> --- a/tcp.c
>> +++ b/tcp.c
>> @@ -523,34 +523,44 @@ static uint32_t tcp_conn_epoll_events(uint8_t events, uint8_t conn_flags)
>>
>> /**
>> * tcp_epoll_ctl() - Add/modify/delete epoll state from connection events
>> - * @c: Execution context
>> * @conn: Connection pointer
>> *
>> * Return: 0 on success, negative error code on failure (not on deletion)
>> */
>> -static int tcp_epoll_ctl(const struct ctx *c, struct tcp_tap_conn *conn)
>> +static int tcp_epoll_ctl(struct tcp_tap_conn *conn)
>> {
>> - int m = flow_in_epoll(&conn->f) ? EPOLL_CTL_MOD : EPOLL_CTL_ADD;
>> - union epoll_ref ref = { .type = EPOLL_TYPE_TCP, .fd = conn->sock,
>> - .flowside = FLOW_SIDX(conn, !TAPSIDE(conn)), };
>> - struct epoll_event ev = { .data.u64 = ref.u64 };
>> - int epollfd = flow_in_epoll(&conn->f) ? flow_epollfd(&conn->f)
>> - : c->epollfd;
>> + uint32_t events;
>> + int m;
>>
>> if (conn->events == CLOSED) {
>> - if (flow_in_epoll(&conn->f))
>> + if (flow_in_epoll(&conn->f)) {
>> + int epollfd = flow_epollfd(&conn->f);
>> +
>> epoll_del(epollfd, conn->sock);
>> - if (conn->timer != -1)
>> - epoll_del(epollfd, conn->timer);
>> + if (conn->timer != -1)
>> + epoll_del(epollfd, conn->timer);
>> + }
>> +
>> return 0;
>> }
>>
>> - ev.events = tcp_conn_epoll_events(conn->events, conn->flags);
>> + events = tcp_conn_epoll_events(conn->events, conn->flags);
>>
>> - if (epoll_ctl(epollfd, m, conn->sock, &ev))
>> - return -errno;
>> + if (flow_in_epoll(&conn->f)) {
>> + m = EPOLL_CTL_MOD;
>> + } else {
>> + flow_epollid_set(&conn->f, EPOLLFD_ID_DEFAULT);
>> + m = EPOLL_CTL_ADD;
>> + }
>> +
>> + if (flow_epoll_set(m, EPOLL_TYPE_TCP, &conn->f, events, conn->sock,
>> + !TAPSIDE(conn)) < 0) {
>> + int ret = -errno;
>>
>> - flow_epollid_set(&conn->f, EPOLLFD_ID_DEFAULT);
>> + if (m == EPOLL_CTL_ADD)
>> + flow_epollid_clear(&conn->f);
>> + return ret;
>> + }
>>
>> return 0;
>> }
>> @@ -569,11 +579,6 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn)
>> return;
>>
>> if (conn->timer == -1) {
>> - union epoll_ref ref = { .type = EPOLL_TYPE_TCP_TIMER,
>> - .flow = FLOW_IDX(conn) };
>> - struct epoll_event ev = { .data.u64 = ref.u64,
>> - .events = EPOLLIN | EPOLLET };
>> - int epollfd = flow_epollfd(&conn->f);
>> int fd;
>>
>> fd = timerfd_create(CLOCK_MONOTONIC, 0);
>> @@ -581,18 +586,17 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn)
>> flow_dbg_perror(conn, "failed to get timer");
>> if (fd > -1)
>> close(fd);
>> - conn->timer = -1;
>> return;
>> }
>> - conn->timer = fd;
>> - ref.fd = conn->timer;
>>
>> - if (epoll_ctl(epollfd, EPOLL_CTL_ADD, conn->timer, &ev)) {
>> + if (flow_epoll_set(EPOLL_CTL_ADD, EPOLL_TYPE_TCP_TIMER,
>> + &conn->f, EPOLLIN | EPOLLET, fd, 0) < 0) {
>> flow_dbg_perror(conn, "failed to add timer");
>> - close(conn->timer);
>> - conn->timer = -1;
>> + close(fd);
>> return;
>> }
>> +
>> + conn->timer = fd;
>> }
>>
>> if (conn->flags & ACK_TO_TAP_DUE) {
>> @@ -669,7 +673,7 @@ void conn_flag_do(const struct ctx *c, struct tcp_tap_conn *conn,
>> }
>>
>> if (flag == STALLED || flag == ~STALLED)
>> - tcp_epoll_ctl(c, conn);
>> + tcp_epoll_ctl(conn);
>>
>> if (flag == ACK_FROM_TAP_DUE || flag == ACK_TO_TAP_DUE ||
>> (flag == ~ACK_FROM_TAP_DUE && (conn->flags & ACK_TO_TAP_DUE)) ||
>> @@ -726,7 +730,7 @@ void conn_event_do(const struct ctx *c, struct tcp_tap_conn *conn,
>> } else {
>> if (event == CLOSED)
>> flow_hash_remove(c, TAP_SIDX(conn));
>> - tcp_epoll_ctl(c, conn);
>> + tcp_epoll_ctl(conn);
>> }
>>
>> if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED))
>> @@ -1742,7 +1746,7 @@ static void tcp_conn_from_tap(const struct ctx *c, sa_family_t af,
>> conn_event(c, conn, TAP_SYN_ACK_SENT);
>> }
>>
>> - tcp_epoll_ctl(c, conn);
>> + tcp_epoll_ctl(conn);
>>
>> if (c->mode == MODE_VU) { /* To rebind to same oport after migration */
>> socklen_t sl = sizeof(sa);
>> @@ -3984,7 +3988,7 @@ int tcp_flow_migrate_target_ext(struct ctx *c, struct tcp_tap_conn *conn, int fd
>> tcp_send_flag(c, conn, ACK);
>> tcp_data_from_sock(c, conn);
>>
>> - if ((rc = tcp_epoll_ctl(c, conn))) {
>> + if ((rc = tcp_epoll_ctl(conn))) {
>> flow_dbg(conn,
>> "Failed to subscribe to epoll for migrated socket: %s",
>> strerror_(-rc));
>> diff --git a/tcp_splice.c b/tcp_splice.c
>> index bf4ff466de07..26e9845c39ee 100644
>> --- a/tcp_splice.c
>> +++ b/tcp_splice.c
>> @@ -135,37 +135,35 @@ static uint32_t tcp_splice_conn_epoll_events(uint16_t events, unsigned sidei)
>>
>> /**
>> * tcp_splice_epoll_ctl() - Add/modify/delete epoll state from connection events
>> - * @c: Execution context
>> * @conn: Connection pointer
>> *
>> * Return: 0 on success, negative error code on failure (not on deletion)
>> */
>> -static int tcp_splice_epoll_ctl(const struct ctx *c,
>> - struct tcp_splice_conn *conn)
>> +static int tcp_splice_epoll_ctl(struct tcp_splice_conn *conn)
>> {
>> - int epollfd = flow_in_epoll(&conn->f) ? flow_epollfd(&conn->f)
>> - : c->epollfd;
>> - int m = flow_in_epoll(&conn->f) ? EPOLL_CTL_MOD : EPOLL_CTL_ADD;
>> - const union epoll_ref ref[SIDES] = {
>> - { .type = EPOLL_TYPE_TCP_SPLICE, .fd = conn->s[0],
>> - .flowside = FLOW_SIDX(conn, 0) },
>> - { .type = EPOLL_TYPE_TCP_SPLICE, .fd = conn->s[1],
>> - .flowside = FLOW_SIDX(conn, 1) }
>> - };
>> - struct epoll_event ev[SIDES] = { { .data.u64 = ref[0].u64 },
>> - { .data.u64 = ref[1].u64 } };
>> -
>> - ev[0].events = tcp_splice_conn_epoll_events(conn->events, 0);
>> - ev[1].events = tcp_splice_conn_epoll_events(conn->events, 1);
>> -
>> -
>> - if (epoll_ctl(epollfd, m, conn->s[0], &ev[0]) ||
>> - epoll_ctl(epollfd, m, conn->s[1], &ev[1])) {
>> + uint32_t events[2];
>> + int m;
>> +
>> + if (flow_in_epoll(&conn->f)) {
>> + m = EPOLL_CTL_MOD;
>> + } else {
>> + flow_epollid_set(&conn->f, EPOLLFD_ID_DEFAULT);
>> + m = EPOLL_CTL_ADD;
>> + }
>> +
>> + events[0] = tcp_splice_conn_epoll_events(conn->events, 0);
>> + events[1] = tcp_splice_conn_epoll_events(conn->events, 1);
>> +
>> + if (flow_epoll_set(m, EPOLL_TYPE_TCP_SPLICE, &conn->f, events[0],
>> + conn->s[0], 0) ||
>> + flow_epoll_set(m, EPOLL_TYPE_TCP_SPLICE, &conn->f, events[1],
>> + conn->s[1], 1)) {
>> int ret = -errno;
>> flow_perror(conn, "ERROR on epoll_ctl()");
>> + if (m == EPOLL_CTL_ADD)
>> + flow_epollid_clear(&conn->f);
>> return ret;
>> }
>> - flow_epollid_set(&conn->f, EPOLLFD_ID_DEFAULT);
>>
>> return 0;
>> }
>> @@ -205,7 +203,7 @@ static void conn_flag_do(struct tcp_splice_conn *conn,
>> }
>> }
>>
>> -#define conn_flag(c, conn, flag) \
>> +#define conn_flag(conn, flag) \
>> do { \
>> flow_trace(conn, "flag at %s:%i", __func__, __LINE__); \
>> conn_flag_do(conn, flag); \
>> @@ -213,12 +211,10 @@ static void conn_flag_do(struct tcp_splice_conn *conn,
>>
>> /**
>> * conn_event_do() - Set and log connection events, update epoll state
>> - * @c: Execution context
>> * @conn: Connection pointer
>> * @event: Connection event
>> */
>> -static void conn_event_do(const struct ctx *c, struct tcp_splice_conn *conn,
>> - unsigned long event)
>> +static void conn_event_do(struct tcp_splice_conn *conn, unsigned long event)
>> {
>> if (event & (event - 1)) {
>> int flag_index = fls(~event);
>> @@ -240,14 +236,14 @@ static void conn_event_do(const struct ctx *c, struct tcp_splice_conn *conn,
>> flow_dbg(conn, "%s", tcp_splice_event_str[flag_index]);
>> }
>>
>> - if (tcp_splice_epoll_ctl(c, conn))
>> - conn_flag(c, conn, CLOSING);
>> + if (tcp_splice_epoll_ctl(conn))
>> + conn_flag(conn, CLOSING);
>> }
>>
>> -#define conn_event(c, conn, event) \
>> +#define conn_event(conn, event) \
>> do { \
>> flow_trace(conn, "event at %s:%i",__func__, __LINE__); \
>> - conn_event_do(c, conn, event); \
>> + conn_event_do(conn, event); \
>> } while (0)
>>
>>
>> @@ -315,7 +311,7 @@ static int tcp_splice_connect_finish(const struct ctx *c,
>> if (pipe2(conn->pipe[sidei], O_NONBLOCK | O_CLOEXEC)) {
>> flow_perror(conn, "cannot create %d->%d pipe",
>> sidei, !sidei);
>> - conn_flag(c, conn, CLOSING);
>> + conn_flag(conn, CLOSING);
>> return -EIO;
>> }
>>
>> @@ -329,7 +325,7 @@ static int tcp_splice_connect_finish(const struct ctx *c,
>> }
>>
>> if (!(conn->events & SPLICE_ESTABLISHED))
>> - conn_event(c, conn, SPLICE_ESTABLISHED);
>> + conn_event(conn, SPLICE_ESTABLISHED);
>>
>> return 0;
>> }
>> @@ -376,7 +372,7 @@ static int tcp_splice_connect(const struct ctx *c, struct tcp_splice_conn *conn)
>>
>> pif_sockaddr(c, &sa, tgtpif, &tgt->eaddr, tgt->eport);
>>
>> - conn_event(c, conn, SPLICE_CONNECT);
>> + conn_event(conn, SPLICE_CONNECT);
>>
>> if (connect(conn->s[1], &sa.sa, socklen_inany(&sa))) {
>> if (errno != EINPROGRESS) {
>> @@ -385,7 +381,7 @@ static int tcp_splice_connect(const struct ctx *c, struct tcp_splice_conn *conn)
>> return -errno;
>> }
>> } else {
>> - conn_event(c, conn, SPLICE_ESTABLISHED);
>> + conn_event(conn, SPLICE_ESTABLISHED);
>> return tcp_splice_connect_finish(c, conn);
>> }
>>
>> @@ -445,7 +441,7 @@ void tcp_splice_conn_from_sock(const struct ctx *c, union flow *flow, int s0)
>> flow_trace(conn, "failed to set TCP_QUICKACK on %i", s0);
>>
>> if (tcp_splice_connect(c, conn))
>> - conn_flag(c, conn, CLOSING);
>> + conn_flag(conn, CLOSING);
>>
>> FLOW_ACTIVATE(conn);
>> }
>> @@ -494,14 +490,14 @@ void tcp_splice_sock_handler(struct ctx *c, union epoll_ref ref,
>>
>> if (events & EPOLLOUT) {
>> fromsidei = !evsidei;
>> - conn_event(c, conn, ~OUT_WAIT(evsidei));
>> + conn_event(conn, ~OUT_WAIT(evsidei));
>> } else {
>> fromsidei = evsidei;
>> }
>>
>> if (events & EPOLLRDHUP)
>> /* For side 0 this is fake, but implied */
>> - conn_event(c, conn, FIN_RCVD(evsidei));
>> + conn_event(conn, FIN_RCVD(evsidei));
>>
>> swap:
>> eof = 0;
>> @@ -536,7 +532,7 @@ retry:
>> more = SPLICE_F_MORE;
>>
>> if (conn->flags & lowat_set_flag)
>> - conn_flag(c, conn, lowat_act_flag);
>> + conn_flag(conn, lowat_act_flag);
>> }
>>
>> do
>> @@ -568,8 +564,8 @@ retry:
>> "Setting SO_RCVLOWAT %i: %s",
>> lowat, strerror_(errno));
>> } else {
>> - conn_flag(c, conn, lowat_set_flag);
>> - conn_flag(c, conn, lowat_act_flag);
>> + conn_flag(conn, lowat_set_flag);
>> + conn_flag(conn, lowat_act_flag);
>> }
>> }
>>
>> @@ -583,7 +579,7 @@ retry:
>> if (conn->read[fromsidei] == conn->written[fromsidei])
>> break;
>>
>> - conn_event(c, conn, OUT_WAIT(!fromsidei));
>> + conn_event(conn, OUT_WAIT(!fromsidei));
>> break;
>> }
>>
>> @@ -605,7 +601,7 @@ retry:
>> if ((conn->events & FIN_RCVD(sidei)) &&
>> !(conn->events & FIN_SENT(!sidei))) {
>> shutdown(conn->s[!sidei], SHUT_WR);
>> - conn_event(c, conn, FIN_SENT(!sidei));
>> + conn_event(conn, FIN_SENT(!sidei));
>> }
>> }
>> }
>> @@ -626,7 +622,7 @@ retry:
>> return;
>>
>> close:
>> - conn_flag(c, conn, CLOSING);
>> + conn_flag(conn, CLOSING);
>> }
>>
>> /**
>> @@ -762,10 +758,10 @@ void tcp_splice_timer(struct tcp_splice_conn *conn)
>> flow_trace(conn, "can't set SO_RCVLOWAT on %d",
>> conn->s[sidei]);
>> }
>> - conn_flag(c, conn, ~RCVLOWAT_SET(sidei));
>> + conn_flag(conn, ~RCVLOWAT_SET(sidei));
>> }
>> }
>>
>> flow_foreach_sidei(sidei)
>> - conn_flag(c, conn, ~RCVLOWAT_ACT(sidei));
>> + conn_flag(conn, ~RCVLOWAT_ACT(sidei));
>> }
>> diff --git a/udp_flow.c b/udp_flow.c
>> index c4cf35c2c89d..b016a8095ec6 100644
>> --- a/udp_flow.c
>> +++ b/udp_flow.c
>> @@ -74,7 +74,6 @@ static int udp_flow_sock(const struct ctx *c,
>> {
>> const struct flowside *side = &uflow->f.side[sidei];
>> uint8_t pif = uflow->f.pif[sidei];
>> - union epoll_ref ref;
>> int rc;
>> int s;
>>
>> @@ -84,14 +83,11 @@ static int udp_flow_sock(const struct ctx *c,
>> return s;
>> }
>>
>> - ref.type = EPOLL_TYPE_UDP;
>> - ref.flowside = FLOW_SIDX(uflow, sidei);
>> - ref.fd = s;
>> -
>> flow_epollid_set(&uflow->f, EPOLLFD_ID_DEFAULT);
>> -
>> - rc = epoll_add(flow_epollfd(&uflow->f), EPOLLIN, ref);
>> - if (rc < 0) {
>> + if (flow_epoll_set(EPOLL_CTL_ADD, EPOLL_TYPE_UDP, &uflow->f,
>> + EPOLLIN, s, sidei) < 0) {
>> + rc = -errno;
>> + flow_epollid_clear(&uflow->f);
>> close(s);
>> return rc;
>> }
>> --
>> 2.52.0
>>
>
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2026-01-09 9:26 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-01-08 14:01 [PATCH v2 0/5] flow: Introduce flow_epoll_set() to centralize epoll operations Laurent Vivier
2026-01-08 14:01 ` [PATCH v2 1/5] tcp: remove timer update in tcp_epoll_ctl() Laurent Vivier
2026-01-08 23:26 ` David Gibson
2026-01-08 14:01 ` [PATCH v2 2/5] udp_flow: remove unneeded epoll_ref indirection Laurent Vivier
2026-01-08 23:26 ` David Gibson
2026-01-08 14:01 ` [PATCH v2 3/5] udp_flow: Assign socket to flow inside udp_flow_sock() Laurent Vivier
2026-01-08 14:01 ` [PATCH v2 4/5] tcp_splice: Refactor tcp_splice_conn_epoll_events() to per-side computation Laurent Vivier
2026-01-08 14:01 ` [PATCH v2 5/5] flow: Introduce flow_epoll_set() to centralize epoll operations Laurent Vivier
2026-01-08 23:33 ` David Gibson
2026-01-09 9:26 ` Laurent Vivier
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).