* [PATCH v3 0/6] flow: Introduce flow_epoll_set() to centralize epoll operations
@ 2026-01-09 16:54 Laurent Vivier
2026-01-09 16:54 ` [PATCH v3 1/6] tcp: remove timer update in tcp_epoll_ctl() Laurent Vivier
` (5 more replies)
0 siblings, 6 replies; 7+ messages in thread
From: Laurent Vivier @ 2026-01-09 16:54 UTC (permalink / raw)
To: passt-dev; +Cc: Laurent Vivier
Currently, each protocol handler (TCP, TCP splice, ICMP/ping, UDP) has its
own code to add or modify file descriptors in epoll. This leads to
duplicated boilerplate across icmp.c, tcp.c, tcp_splice.c, and udp_flow.c,
with each setting up epoll_ref unions and calling epoll_ctl() with
flow-type-specific details.
This series introduces flow_epoll_set() in flow.c to handle epoll
operations for all flow types in a unified way. The function derives
the epoll type from the flow type via a mapping array, providing a
simpler interface where the caller only needs to pass the flow, command,
events, file descriptor, and side index.
The API is:
int flow_epoll_set(const struct flow_common *f, int command,
uint32_t events, int fd, unsigned int sidei);
This centralized approach will be essential for queue pair migration in
the upcoming multithreading work, where flows need to be moved between
different epoll instances owned by different threads.
Preparatory patches:
- Patch 1 removes dead timer update code in tcp_epoll_ctl()
- Patch 2 refactors TCP timer creation to use epoll_add() helper
- Patch 3 removes unneeded epoll_ref indirection in udp_flow
- Patch 4 refactors udp_flow_sock() to assign the socket internally
- Patch 5 refactors tcp_splice_conn_epoll_events() to per-side computation
Core patch:
- Patch 6 introduces flow_epoll_set() used by all protocol handlers
Changes in v3:
- Added patch 2: "tcp: cleanup timer creation" refactoring tcp_timer_ctl()
to use epoll_add() helper with improved error handling
- Removed epoll type parameter from flow_epoll_set(): the type is now
derived from the flow type via a new flow_epoll[] mapping array
- Simplified function signature to flow_epoll_set(f, command, events, fd, sidei)
- TCP timer no longer uses flow_epoll_set(), uses epoll_add() directly
- Added Reviewed-by from David Gibson to patches 1 and 3
- Series now 6 patches (was 5)
Changes in v2:
- Added patch 1: Remove dead timer update in tcp_epoll_ctl() since flow
table compaction was eliminated
- Added patch 2: Remove unnecessary epoll_ref indirection in udp_flow.c
- Dropped v1 patch 6 (compute events inside flow_epoll_set())
- Dropped v1 patch 7 (retrieve fd from flow structure)
- Added epoll_ctl() command parameter (EPOLL_CTL_ADD/EPOLL_CTL_MOD)
- Callers must set epollfd via flow_epollid_set() before calling
flow_epoll_set()
- Removed FLOW_EPOLL_TIMER_ADD/FLOW_EPOLL_TIMER_MOD constants
- Reduced series from 7 to 5 patches
Laurent Vivier (6):
tcp: remove timer update in tcp_epoll_ctl()
tcp: cleanup timer creation
udp_flow: remove unneeded epoll_ref indirection
udp_flow: Assign socket to flow inside udp_flow_sock()
tcp_splice: Refactor tcp_splice_conn_epoll_events() to per-side
computation
flow: Introduce flow_epoll_set() to centralize epoll operations
flow.c | 37 +++++++++++++++++
flow.h | 2 +
icmp.c | 10 ++---
tcp.c | 88 ++++++++++++++++++++--------------------
tcp_splice.c | 111 +++++++++++++++++++++++----------------------------
udp_flow.c | 18 +++------
6 files changed, 140 insertions(+), 126 deletions(-)
--
2.52.0
^ permalink raw reply [flat|nested] 7+ messages in thread* [PATCH v3 1/6] tcp: remove timer update in tcp_epoll_ctl() 2026-01-09 16:54 [PATCH v3 0/6] flow: Introduce flow_epoll_set() to centralize epoll operations Laurent Vivier @ 2026-01-09 16:54 ` Laurent Vivier 2026-01-09 16:54 ` [PATCH v3 2/6] tcp: cleanup timer creation Laurent Vivier ` (4 subsequent siblings) 5 siblings, 0 replies; 7+ messages in thread From: Laurent Vivier @ 2026-01-09 16:54 UTC (permalink / raw) To: passt-dev; +Cc: Laurent Vivier, David Gibson Timer update in tcp_epoll_ctl() has been added by bb708111833e ("treewide: Packet abstraction with mandatory boundary checks") because epoll_ref stores "conn - tc" that can change in tcp_table_compact(). But since e2e8219f13b8 ("flow, tcp: Consolidate flow pointer<->index helpers") we use FLOW_IDX() and since 8981a720aac4 ("flow: Avoid moving flow entries to compact table") flow table doesn't use a compaction procedure so FLOW_IDX() never changes. Updating the timer event is now a no-op, remove it from tcp_epoll_ctl(). Fixes: 8981a720aac4 ("flow: Avoid moving flow entries to compact table") Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> --- tcp.c | 12 ------------ 1 file changed, 12 deletions(-) diff --git a/tcp.c b/tcp.c index e7fa85f346b2..5141cdc7e839 100644 --- a/tcp.c +++ b/tcp.c @@ -552,18 +552,6 @@ static int tcp_epoll_ctl(const struct ctx *c, struct tcp_tap_conn *conn) flow_epollid_set(&conn->f, EPOLLFD_ID_DEFAULT); - if (conn->timer != -1) { - union epoll_ref ref_t = { .type = EPOLL_TYPE_TCP_TIMER, - .fd = conn->timer, - .flow = FLOW_IDX(conn) }; - struct epoll_event ev_t = { .data.u64 = ref_t.u64, - .events = EPOLLIN | EPOLLET }; - - if (epoll_ctl(flow_epollfd(&conn->f), EPOLL_CTL_MOD, - conn->timer, &ev_t)) - return -errno; - } - return 0; } -- 2.52.0 ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v3 2/6] tcp: cleanup timer creation 2026-01-09 16:54 [PATCH v3 0/6] flow: Introduce flow_epoll_set() to centralize epoll operations Laurent Vivier 2026-01-09 16:54 ` [PATCH v3 1/6] tcp: remove timer update in tcp_epoll_ctl() Laurent Vivier @ 2026-01-09 16:54 ` Laurent Vivier 2026-01-09 16:54 ` [PATCH v3 3/6] udp_flow: remove unneeded epoll_ref indirection Laurent Vivier ` (3 subsequent siblings) 5 siblings, 0 replies; 7+ messages in thread From: Laurent Vivier @ 2026-01-09 16:54 UTC (permalink / raw) To: passt-dev; +Cc: Laurent Vivier Refactor tcp_timer_ctl() to use the epoll_add() helper instead of manually constructing epoll_event and calling epoll_ctl() directly. Also separate the error handling for timerfd_create() failure (-1) from fd overflow (> FD_REF_MAX) to provide more specific debug messages. The overflow case now logs the actual fd value, using flow_dbg_perror() in this case was not correct (no errno set). Delay setting conn->timer until epoll_add() succeeds, removing redundant conn->timer = -1 assignments in error paths since the timer is already -1 when we enter this block. Signed-off-by: Laurent Vivier <lvivier@redhat.com> --- tcp.c | 32 +++++++++++++++++--------------- 1 file changed, 17 insertions(+), 15 deletions(-) diff --git a/tcp.c b/tcp.c index 5141cdc7e839..4b746b4ca16c 100644 --- a/tcp.c +++ b/tcp.c @@ -569,30 +569,32 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn) return; if (conn->timer == -1) { - union epoll_ref ref = { .type = EPOLL_TYPE_TCP_TIMER, - .flow = FLOW_IDX(conn) }; - struct epoll_event ev = { .data.u64 = ref.u64, - .events = EPOLLIN | EPOLLET }; - int epollfd = flow_epollfd(&conn->f); + union epoll_ref ref; int fd; fd = timerfd_create(CLOCK_MONOTONIC, 0); - if (fd == -1 || fd > FD_REF_MAX) { + if (fd == -1) { flow_dbg_perror(conn, "failed to get timer"); - if (fd > -1) - close(fd); - conn->timer = -1; return; } - conn->timer = fd; - ref.fd = conn->timer; + if (fd > FD_REF_MAX) { + flow_dbg(conn, "timer fd overflow (%d > %d)", + fd, FD_REF_MAX); + close(fd); + return; + } - if (epoll_ctl(epollfd, EPOLL_CTL_ADD, conn->timer, &ev)) { - flow_dbg_perror(conn, "failed to add timer"); - close(conn->timer); - conn->timer = -1; + ref.type = EPOLL_TYPE_TCP_TIMER; + ref.flow = FLOW_IDX(conn); + ref.fd = fd; + if (epoll_add(flow_epollfd(&conn->f), EPOLLIN | EPOLLET, + ref) < 0) { + flow_dbg(conn, "failed to add timer"); + close(fd); return; } + + conn->timer = fd; } if (conn->flags & ACK_TO_TAP_DUE) { -- 2.52.0 ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v3 3/6] udp_flow: remove unneeded epoll_ref indirection 2026-01-09 16:54 [PATCH v3 0/6] flow: Introduce flow_epoll_set() to centralize epoll operations Laurent Vivier 2026-01-09 16:54 ` [PATCH v3 1/6] tcp: remove timer update in tcp_epoll_ctl() Laurent Vivier 2026-01-09 16:54 ` [PATCH v3 2/6] tcp: cleanup timer creation Laurent Vivier @ 2026-01-09 16:54 ` Laurent Vivier 2026-01-09 16:54 ` [PATCH v3 4/6] udp_flow: Assign socket to flow inside udp_flow_sock() Laurent Vivier ` (2 subsequent siblings) 5 siblings, 0 replies; 7+ messages in thread From: Laurent Vivier @ 2026-01-09 16:54 UTC (permalink / raw) To: passt-dev; +Cc: Laurent Vivier, David Gibson The fref union was used to convert flow_sidx_t to uint32_t for assignment to ref.data. This is unnecessary since epoll_ref already contains a flowside member of type flow_sidx_t, so we can assign directly. This aligns with how icmp.c and other callers assign flow_sidx_t to epoll_ref. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> --- udp_flow.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/udp_flow.c b/udp_flow.c index 8907f2f72741..0ba788060db7 100644 --- a/udp_flow.c +++ b/udp_flow.c @@ -74,10 +74,6 @@ static int udp_flow_sock(const struct ctx *c, { const struct flowside *side = &uflow->f.side[sidei]; uint8_t pif = uflow->f.pif[sidei]; - union { - flow_sidx_t sidx; - uint32_t data; - } fref = { .sidx = FLOW_SIDX(uflow, sidei) }; union epoll_ref ref; int rc; int s; @@ -89,7 +85,7 @@ static int udp_flow_sock(const struct ctx *c, } ref.type = EPOLL_TYPE_UDP; - ref.data = fref.data; + ref.flowside = FLOW_SIDX(uflow, sidei); ref.fd = s; flow_epollid_set(&uflow->f, EPOLLFD_ID_DEFAULT); -- 2.52.0 ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v3 4/6] udp_flow: Assign socket to flow inside udp_flow_sock() 2026-01-09 16:54 [PATCH v3 0/6] flow: Introduce flow_epoll_set() to centralize epoll operations Laurent Vivier ` (2 preceding siblings ...) 2026-01-09 16:54 ` [PATCH v3 3/6] udp_flow: remove unneeded epoll_ref indirection Laurent Vivier @ 2026-01-09 16:54 ` Laurent Vivier 2026-01-09 16:54 ` [PATCH v3 5/6] tcp_splice: Refactor tcp_splice_conn_epoll_events() to per-side computation Laurent Vivier 2026-01-09 16:54 ` [PATCH v3 6/6] flow: Introduce flow_epoll_set() to centralize epoll operations Laurent Vivier 5 siblings, 0 replies; 7+ messages in thread From: Laurent Vivier @ 2026-01-09 16:54 UTC (permalink / raw) To: passt-dev; +Cc: Laurent Vivier, David Gibson Move the assignment of uflow->s[sidei] from the caller (udp_flow_new()) into udp_flow_sock() itself, placing it after the successful connect(). This is a pure refactoring with no functional change. The socket fd is now assigned within udp_flow_sock() where the socket is created, rather than requiring the caller to capture the return value. On error paths, uflow->s[sidei] remains at its initialized value of -1 rather than being set to the negative error code, which is semantically cleaner (though functionally equivalent given the >= 0 check in udp_flow_close()). Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> --- udp_flow.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/udp_flow.c b/udp_flow.c index 0ba788060db7..c4cf35c2c89d 100644 --- a/udp_flow.c +++ b/udp_flow.c @@ -105,6 +105,7 @@ static int udp_flow_sock(const struct ctx *c, flow_dbg_perror(uflow, "Couldn't connect flow socket"); return rc; } + uflow->s[sidei] = s; /* It's possible, if unlikely, that we could receive some packets in * between the bind() and connect() which may or may not be for this @@ -159,7 +160,7 @@ static flow_sidx_t udp_flow_new(const struct ctx *c, union flow *flow, flow_foreach_sidei(sidei) { if (pif_is_socket(uflow->f.pif[sidei])) - if ((uflow->s[sidei] = udp_flow_sock(c, uflow, sidei)) < 0) + if (udp_flow_sock(c, uflow, sidei) < 0) goto cancel; } -- 2.52.0 ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v3 5/6] tcp_splice: Refactor tcp_splice_conn_epoll_events() to per-side computation 2026-01-09 16:54 [PATCH v3 0/6] flow: Introduce flow_epoll_set() to centralize epoll operations Laurent Vivier ` (3 preceding siblings ...) 2026-01-09 16:54 ` [PATCH v3 4/6] udp_flow: Assign socket to flow inside udp_flow_sock() Laurent Vivier @ 2026-01-09 16:54 ` Laurent Vivier 2026-01-09 16:54 ` [PATCH v3 6/6] flow: Introduce flow_epoll_set() to centralize epoll operations Laurent Vivier 5 siblings, 0 replies; 7+ messages in thread From: Laurent Vivier @ 2026-01-09 16:54 UTC (permalink / raw) To: passt-dev; +Cc: Laurent Vivier, David Gibson The function tcp_splice_conn_epoll_events() currently takes an array of struct epoll_event and fills in the .events field for both sides using flow_foreach_sidei() loops. This works, but the function is doing two conceptually separate things at once: computing events for side 0 and computing events for side 1. The OUT_WAIT handling is particularly subtle, as it has cross-side effects: when OUT_WAIT(sidei) is set, we add EPOLLOUT to ev[sidei] but also remove EPOLLIN from ev[!sidei]. Refactor to make the function compute events for a single side at a time, taking sidei as a parameter and returning uint32_t. This makes the logic more focused and easier to follow. The cross-side effects of OUT_WAIT are preserved by checking both OUT_WAIT(sidei) and OUT_WAIT(!sidei) within each call. The caller tcp_splice_epoll_ctl() now invokes the function twice, once for each side, making the two-sided nature of the operation explicit. No functional change. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> --- tcp_splice.c | 33 ++++++++++++++------------------- 1 file changed, 14 insertions(+), 19 deletions(-) diff --git a/tcp_splice.c b/tcp_splice.c index 440522449c13..bf4ff466de07 100644 --- a/tcp_splice.c +++ b/tcp_splice.c @@ -114,29 +114,23 @@ static struct tcp_splice_conn *conn_at_sidx(flow_sidx_t sidx) * @events: Connection event flags * @ev: Events to fill in, 0 is accepted socket, 1 is connecting socket */ -static void tcp_splice_conn_epoll_events(uint16_t events, - struct epoll_event ev[]) +static uint32_t tcp_splice_conn_epoll_events(uint16_t events, unsigned sidei) { - unsigned sidei; - - flow_foreach_sidei(sidei) - ev[sidei].events = 0; + uint32_t e = 0; if (events & SPLICE_ESTABLISHED) { - flow_foreach_sidei(sidei) { - if (!(events & FIN_SENT(!sidei))) - ev[sidei].events = EPOLLIN | EPOLLRDHUP; - } - } else if (events & SPLICE_CONNECT) { - ev[1].events = EPOLLOUT; + if (!(events & FIN_SENT(!sidei))) + e = EPOLLIN | EPOLLRDHUP; + } else if (sidei == 1 && events & SPLICE_CONNECT) { + e = EPOLLOUT; } - flow_foreach_sidei(sidei) { - if (events & OUT_WAIT(sidei)) { - ev[sidei].events |= EPOLLOUT; - ev[!sidei].events &= ~EPOLLIN; - } - } + if (events & OUT_WAIT(sidei)) + e |= EPOLLOUT; + if (events & OUT_WAIT(!sidei)) + e &= ~EPOLLIN; + + return e; } /** @@ -161,7 +155,8 @@ static int tcp_splice_epoll_ctl(const struct ctx *c, struct epoll_event ev[SIDES] = { { .data.u64 = ref[0].u64 }, { .data.u64 = ref[1].u64 } }; - tcp_splice_conn_epoll_events(conn->events, ev); + ev[0].events = tcp_splice_conn_epoll_events(conn->events, 0); + ev[1].events = tcp_splice_conn_epoll_events(conn->events, 1); if (epoll_ctl(epollfd, m, conn->s[0], &ev[0]) || -- 2.52.0 ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v3 6/6] flow: Introduce flow_epoll_set() to centralize epoll operations 2026-01-09 16:54 [PATCH v3 0/6] flow: Introduce flow_epoll_set() to centralize epoll operations Laurent Vivier ` (4 preceding siblings ...) 2026-01-09 16:54 ` [PATCH v3 5/6] tcp_splice: Refactor tcp_splice_conn_epoll_events() to per-side computation Laurent Vivier @ 2026-01-09 16:54 ` Laurent Vivier 5 siblings, 0 replies; 7+ messages in thread From: Laurent Vivier @ 2026-01-09 16:54 UTC (permalink / raw) To: passt-dev; +Cc: Laurent Vivier Currently, each flow type (TCP, TCP_SPLICE, PING, UDP) has its own code to add or modify file descriptors in epoll. This leads to duplicated boilerplate code across icmp.c, tcp.c, tcp_splice.c, and udp_flow.c, each setting up epoll_ref unions and calling epoll_ctl() with flow-type-specific details. Introduce flow_epoll_set() in flow.c to handle epoll operations for all flow types in a unified way. This will be needed to migrate queue pair from an epollfd to another. Signed-off-by: Laurent Vivier <lvivier@redhat.com> --- flow.c | 37 ++++++++++++++++++++++++ flow.h | 2 ++ icmp.c | 10 ++----- tcp.c | 48 ++++++++++++++++++------------ tcp_splice.c | 82 ++++++++++++++++++++++++---------------------------- udp_flow.c | 11 ++----- 6 files changed, 111 insertions(+), 79 deletions(-) diff --git a/flow.c b/flow.c index 4f53486586cd..cefe6c8b5b24 100644 --- a/flow.c +++ b/flow.c @@ -20,6 +20,7 @@ #include "flow.h" #include "flow_table.h" #include "repair.h" +#include "epoll_ctl.h" const char *flow_state_str[] = { [FLOW_STATE_FREE] = "FREE", @@ -53,6 +54,16 @@ const uint8_t flow_proto[] = { static_assert(ARRAY_SIZE(flow_proto) == FLOW_NUM_TYPES, "flow_proto[] doesn't match enum flow_type"); +static const enum epoll_type flow_epoll[] = { + [FLOW_TCP] = EPOLL_TYPE_TCP, + [FLOW_TCP_SPLICE] = EPOLL_TYPE_TCP_SPLICE, + [FLOW_PING4] = EPOLL_TYPE_PING, + [FLOW_PING6] = EPOLL_TYPE_PING, + [FLOW_UDP] = EPOLL_TYPE_UDP, +}; +static_assert(ARRAY_SIZE(flow_epoll) == FLOW_NUM_TYPES, + "flow_epoll[] doesn't match enum flow_type"); + #define foreach_established_tcp_flow(flow) \ flow_foreach_of_type((flow), FLOW_TCP) \ if (!tcp_flow_is_established(&(flow)->tcp)) \ @@ -390,6 +401,32 @@ void flow_epollid_clear(struct flow_common *f) f->epollid = EPOLLFD_ID_INVALID; } +/** + * flow_epoll_set() - Add or modify epoll registration for a flow socket + * @f: Flow to register socket for + * @command: epoll_ctl() command: EPOLL_CTL_ADD or EPOLL_CTL_MOD + * @events: epoll events to watch for + * @fd: File descriptor to register + * @sidei: Side index of the flow + * + * Return: 0 on success, -1 on error (from epoll_ctl()) + */ +int flow_epoll_set(const struct flow_common *f, int command, uint32_t events, + int fd, unsigned int sidei) +{ + struct epoll_event ev; + union epoll_ref ref; + + ref.fd = fd; + ref.type = flow_epoll[f->type]; + ref.flowside = flow_sidx(f, sidei); + + ev.events = events; + ev.data.u64 = ref.u64; + + return epoll_ctl(flow_epollfd(f), command, fd, &ev); +} + /** * flow_epollid_register() - Initialize the epoll id -> fd mapping * @epollid: epoll id to associate to diff --git a/flow.h b/flow.h index b43b0b1dd7f2..1b78d596fb17 100644 --- a/flow.h +++ b/flow.h @@ -265,6 +265,8 @@ bool flow_in_epoll(const struct flow_common *f); int flow_epollfd(const struct flow_common *f); void flow_epollid_set(struct flow_common *f, int epollid); void flow_epollid_clear(struct flow_common *f); +int flow_epoll_set(const struct flow_common *f, int command, uint32_t events, + int fd, unsigned int sidei); void flow_epollid_register(int epollid, int epollfd); void flow_defer_handler(const struct ctx *c, const struct timespec *now); int flow_migrate_source_early(struct ctx *c, const struct migrate_stage *stage, diff --git a/icmp.c b/icmp.c index 9564c4963f7b..eb7f11be5dad 100644 --- a/icmp.c +++ b/icmp.c @@ -177,7 +177,6 @@ static struct icmp_ping_flow *icmp_ping_new(const struct ctx *c, union flow *flow = flow_alloc(); struct icmp_ping_flow *pingf; const struct flowside *tgt; - union epoll_ref ref; if (!flow) return NULL; @@ -211,13 +210,10 @@ static struct icmp_ping_flow *icmp_ping_new(const struct ctx *c, goto cancel; flow_epollid_set(&pingf->f, EPOLLFD_ID_DEFAULT); - - ref.type = EPOLL_TYPE_PING; - ref.flowside = FLOW_SIDX(flow, TGTSIDE); - ref.fd = pingf->sock; - - if (epoll_add(flow_epollfd(&pingf->f), EPOLLIN, ref) < 0) { + if (flow_epoll_set(&pingf->f, EPOLL_CTL_ADD, EPOLLIN, pingf->sock, + TGTSIDE) < 0) { close(pingf->sock); + flow_epollid_clear(&pingf->f); goto cancel; } diff --git a/tcp.c b/tcp.c index 4b746b4ca16c..e7f284be6a52 100644 --- a/tcp.c +++ b/tcp.c @@ -523,34 +523,44 @@ static uint32_t tcp_conn_epoll_events(uint8_t events, uint8_t conn_flags) /** * tcp_epoll_ctl() - Add/modify/delete epoll state from connection events - * @c: Execution context * @conn: Connection pointer * * Return: 0 on success, negative error code on failure (not on deletion) */ -static int tcp_epoll_ctl(const struct ctx *c, struct tcp_tap_conn *conn) +static int tcp_epoll_ctl(struct tcp_tap_conn *conn) { - int m = flow_in_epoll(&conn->f) ? EPOLL_CTL_MOD : EPOLL_CTL_ADD; - union epoll_ref ref = { .type = EPOLL_TYPE_TCP, .fd = conn->sock, - .flowside = FLOW_SIDX(conn, !TAPSIDE(conn)), }; - struct epoll_event ev = { .data.u64 = ref.u64 }; - int epollfd = flow_in_epoll(&conn->f) ? flow_epollfd(&conn->f) - : c->epollfd; + uint32_t events; + int m; if (conn->events == CLOSED) { - if (flow_in_epoll(&conn->f)) + if (flow_in_epoll(&conn->f)) { + int epollfd = flow_epollfd(&conn->f); + epoll_del(epollfd, conn->sock); - if (conn->timer != -1) - epoll_del(epollfd, conn->timer); + if (conn->timer != -1) + epoll_del(epollfd, conn->timer); + } + return 0; } - ev.events = tcp_conn_epoll_events(conn->events, conn->flags); + events = tcp_conn_epoll_events(conn->events, conn->flags); - if (epoll_ctl(epollfd, m, conn->sock, &ev)) - return -errno; + if (flow_in_epoll(&conn->f)) { + m = EPOLL_CTL_MOD; + } else { + flow_epollid_set(&conn->f, EPOLLFD_ID_DEFAULT); + m = EPOLL_CTL_ADD; + } - flow_epollid_set(&conn->f, EPOLLFD_ID_DEFAULT); + if (flow_epoll_set(&conn->f, m, events, conn->sock, + !TAPSIDE(conn)) < 0) { + int ret = -errno; + + if (m == EPOLL_CTL_ADD) + flow_epollid_clear(&conn->f); + return ret; + } return 0; } @@ -671,7 +681,7 @@ void conn_flag_do(const struct ctx *c, struct tcp_tap_conn *conn, } if (flag == STALLED || flag == ~STALLED) - tcp_epoll_ctl(c, conn); + tcp_epoll_ctl(conn); if (flag == ACK_FROM_TAP_DUE || flag == ACK_TO_TAP_DUE || (flag == ~ACK_FROM_TAP_DUE && (conn->flags & ACK_TO_TAP_DUE)) || @@ -728,7 +738,7 @@ void conn_event_do(const struct ctx *c, struct tcp_tap_conn *conn, } else { if (event == CLOSED) flow_hash_remove(c, TAP_SIDX(conn)); - tcp_epoll_ctl(c, conn); + tcp_epoll_ctl(conn); } if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) @@ -1744,7 +1754,7 @@ static void tcp_conn_from_tap(const struct ctx *c, sa_family_t af, conn_event(c, conn, TAP_SYN_ACK_SENT); } - tcp_epoll_ctl(c, conn); + tcp_epoll_ctl(conn); if (c->mode == MODE_VU) { /* To rebind to same oport after migration */ socklen_t sl = sizeof(sa); @@ -3986,7 +3996,7 @@ int tcp_flow_migrate_target_ext(struct ctx *c, struct tcp_tap_conn *conn, int fd tcp_send_flag(c, conn, ACK); tcp_data_from_sock(c, conn); - if ((rc = tcp_epoll_ctl(c, conn))) { + if ((rc = tcp_epoll_ctl(conn))) { flow_dbg(conn, "Failed to subscribe to epoll for migrated socket: %s", strerror_(-rc)); diff --git a/tcp_splice.c b/tcp_splice.c index bf4ff466de07..a7c04ca8652a 100644 --- a/tcp_splice.c +++ b/tcp_splice.c @@ -135,37 +135,31 @@ static uint32_t tcp_splice_conn_epoll_events(uint16_t events, unsigned sidei) /** * tcp_splice_epoll_ctl() - Add/modify/delete epoll state from connection events - * @c: Execution context * @conn: Connection pointer * * Return: 0 on success, negative error code on failure (not on deletion) */ -static int tcp_splice_epoll_ctl(const struct ctx *c, - struct tcp_splice_conn *conn) +static int tcp_splice_epoll_ctl(struct tcp_splice_conn *conn) { - int epollfd = flow_in_epoll(&conn->f) ? flow_epollfd(&conn->f) - : c->epollfd; - int m = flow_in_epoll(&conn->f) ? EPOLL_CTL_MOD : EPOLL_CTL_ADD; - const union epoll_ref ref[SIDES] = { - { .type = EPOLL_TYPE_TCP_SPLICE, .fd = conn->s[0], - .flowside = FLOW_SIDX(conn, 0) }, - { .type = EPOLL_TYPE_TCP_SPLICE, .fd = conn->s[1], - .flowside = FLOW_SIDX(conn, 1) } - }; - struct epoll_event ev[SIDES] = { { .data.u64 = ref[0].u64 }, - { .data.u64 = ref[1].u64 } }; - - ev[0].events = tcp_splice_conn_epoll_events(conn->events, 0); - ev[1].events = tcp_splice_conn_epoll_events(conn->events, 1); - - - if (epoll_ctl(epollfd, m, conn->s[0], &ev[0]) || - epoll_ctl(epollfd, m, conn->s[1], &ev[1])) { + uint32_t events[2]; + int m; + + if (flow_in_epoll(&conn->f)) { + m = EPOLL_CTL_MOD; + } else { + flow_epollid_set(&conn->f, EPOLLFD_ID_DEFAULT); + m = EPOLL_CTL_ADD; + } + + events[0] = tcp_splice_conn_epoll_events(conn->events, 0); + events[1] = tcp_splice_conn_epoll_events(conn->events, 1); + + if (flow_epoll_set(&conn->f, m, events[0], conn->s[0], 0) || + flow_epoll_set(&conn->f, m, events[1], conn->s[1], 1)) { int ret = -errno; flow_perror(conn, "ERROR on epoll_ctl()"); return ret; } - flow_epollid_set(&conn->f, EPOLLFD_ID_DEFAULT); return 0; } @@ -205,7 +199,7 @@ static void conn_flag_do(struct tcp_splice_conn *conn, } } -#define conn_flag(c, conn, flag) \ +#define conn_flag(conn, flag) \ do { \ flow_trace(conn, "flag at %s:%i", __func__, __LINE__); \ conn_flag_do(conn, flag); \ @@ -213,12 +207,10 @@ static void conn_flag_do(struct tcp_splice_conn *conn, /** * conn_event_do() - Set and log connection events, update epoll state - * @c: Execution context * @conn: Connection pointer * @event: Connection event */ -static void conn_event_do(const struct ctx *c, struct tcp_splice_conn *conn, - unsigned long event) +static void conn_event_do(struct tcp_splice_conn *conn, unsigned long event) { if (event & (event - 1)) { int flag_index = fls(~event); @@ -240,14 +232,14 @@ static void conn_event_do(const struct ctx *c, struct tcp_splice_conn *conn, flow_dbg(conn, "%s", tcp_splice_event_str[flag_index]); } - if (tcp_splice_epoll_ctl(c, conn)) - conn_flag(c, conn, CLOSING); + if (tcp_splice_epoll_ctl(conn)) + conn_flag(conn, CLOSING); } -#define conn_event(c, conn, event) \ +#define conn_event(conn, event) \ do { \ flow_trace(conn, "event at %s:%i",__func__, __LINE__); \ - conn_event_do(c, conn, event); \ + conn_event_do(conn, event); \ } while (0) @@ -315,7 +307,7 @@ static int tcp_splice_connect_finish(const struct ctx *c, if (pipe2(conn->pipe[sidei], O_NONBLOCK | O_CLOEXEC)) { flow_perror(conn, "cannot create %d->%d pipe", sidei, !sidei); - conn_flag(c, conn, CLOSING); + conn_flag(conn, CLOSING); return -EIO; } @@ -329,7 +321,7 @@ static int tcp_splice_connect_finish(const struct ctx *c, } if (!(conn->events & SPLICE_ESTABLISHED)) - conn_event(c, conn, SPLICE_ESTABLISHED); + conn_event(conn, SPLICE_ESTABLISHED); return 0; } @@ -376,7 +368,7 @@ static int tcp_splice_connect(const struct ctx *c, struct tcp_splice_conn *conn) pif_sockaddr(c, &sa, tgtpif, &tgt->eaddr, tgt->eport); - conn_event(c, conn, SPLICE_CONNECT); + conn_event(conn, SPLICE_CONNECT); if (connect(conn->s[1], &sa.sa, socklen_inany(&sa))) { if (errno != EINPROGRESS) { @@ -385,7 +377,7 @@ static int tcp_splice_connect(const struct ctx *c, struct tcp_splice_conn *conn) return -errno; } } else { - conn_event(c, conn, SPLICE_ESTABLISHED); + conn_event(conn, SPLICE_ESTABLISHED); return tcp_splice_connect_finish(c, conn); } @@ -445,7 +437,7 @@ void tcp_splice_conn_from_sock(const struct ctx *c, union flow *flow, int s0) flow_trace(conn, "failed to set TCP_QUICKACK on %i", s0); if (tcp_splice_connect(c, conn)) - conn_flag(c, conn, CLOSING); + conn_flag(conn, CLOSING); FLOW_ACTIVATE(conn); } @@ -494,14 +486,14 @@ void tcp_splice_sock_handler(struct ctx *c, union epoll_ref ref, if (events & EPOLLOUT) { fromsidei = !evsidei; - conn_event(c, conn, ~OUT_WAIT(evsidei)); + conn_event(conn, ~OUT_WAIT(evsidei)); } else { fromsidei = evsidei; } if (events & EPOLLRDHUP) /* For side 0 this is fake, but implied */ - conn_event(c, conn, FIN_RCVD(evsidei)); + conn_event(conn, FIN_RCVD(evsidei)); swap: eof = 0; @@ -536,7 +528,7 @@ retry: more = SPLICE_F_MORE; if (conn->flags & lowat_set_flag) - conn_flag(c, conn, lowat_act_flag); + conn_flag(conn, lowat_act_flag); } do @@ -568,8 +560,8 @@ retry: "Setting SO_RCVLOWAT %i: %s", lowat, strerror_(errno)); } else { - conn_flag(c, conn, lowat_set_flag); - conn_flag(c, conn, lowat_act_flag); + conn_flag(conn, lowat_set_flag); + conn_flag(conn, lowat_act_flag); } } @@ -583,7 +575,7 @@ retry: if (conn->read[fromsidei] == conn->written[fromsidei]) break; - conn_event(c, conn, OUT_WAIT(!fromsidei)); + conn_event(conn, OUT_WAIT(!fromsidei)); break; } @@ -605,7 +597,7 @@ retry: if ((conn->events & FIN_RCVD(sidei)) && !(conn->events & FIN_SENT(!sidei))) { shutdown(conn->s[!sidei], SHUT_WR); - conn_event(c, conn, FIN_SENT(!sidei)); + conn_event(conn, FIN_SENT(!sidei)); } } } @@ -626,7 +618,7 @@ retry: return; close: - conn_flag(c, conn, CLOSING); + conn_flag(conn, CLOSING); } /** @@ -762,10 +754,10 @@ void tcp_splice_timer(struct tcp_splice_conn *conn) flow_trace(conn, "can't set SO_RCVLOWAT on %d", conn->s[sidei]); } - conn_flag(c, conn, ~RCVLOWAT_SET(sidei)); + conn_flag(conn, ~RCVLOWAT_SET(sidei)); } } flow_foreach_sidei(sidei) - conn_flag(c, conn, ~RCVLOWAT_ACT(sidei)); + conn_flag(conn, ~RCVLOWAT_ACT(sidei)); } diff --git a/udp_flow.c b/udp_flow.c index c4cf35c2c89d..80b15433f0ac 100644 --- a/udp_flow.c +++ b/udp_flow.c @@ -74,7 +74,6 @@ static int udp_flow_sock(const struct ctx *c, { const struct flowside *side = &uflow->f.side[sidei]; uint8_t pif = uflow->f.pif[sidei]; - union epoll_ref ref; int rc; int s; @@ -84,14 +83,10 @@ static int udp_flow_sock(const struct ctx *c, return s; } - ref.type = EPOLL_TYPE_UDP; - ref.flowside = FLOW_SIDX(uflow, sidei); - ref.fd = s; - flow_epollid_set(&uflow->f, EPOLLFD_ID_DEFAULT); - - rc = epoll_add(flow_epollfd(&uflow->f), EPOLLIN, ref); - if (rc < 0) { + if (flow_epoll_set(&uflow->f, EPOLL_CTL_ADD, EPOLLIN, s, sidei) < 0) { + rc = -errno; + flow_epollid_clear(&uflow->f); close(s); return rc; } -- 2.52.0 ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2026-01-09 16:54 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2026-01-09 16:54 [PATCH v3 0/6] flow: Introduce flow_epoll_set() to centralize epoll operations Laurent Vivier 2026-01-09 16:54 ` [PATCH v3 1/6] tcp: remove timer update in tcp_epoll_ctl() Laurent Vivier 2026-01-09 16:54 ` [PATCH v3 2/6] tcp: cleanup timer creation Laurent Vivier 2026-01-09 16:54 ` [PATCH v3 3/6] udp_flow: remove unneeded epoll_ref indirection Laurent Vivier 2026-01-09 16:54 ` [PATCH v3 4/6] udp_flow: Assign socket to flow inside udp_flow_sock() Laurent Vivier 2026-01-09 16:54 ` [PATCH v3 5/6] tcp_splice: Refactor tcp_splice_conn_epoll_events() to per-side computation Laurent Vivier 2026-01-09 16:54 ` [PATCH v3 6/6] flow: Introduce flow_epoll_set() to centralize epoll operations Laurent Vivier
Code repositories for project(s) associated with this public inbox https://passt.top/passt This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for IMAP folder(s).