From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: passt.top; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=GcuAnoSb; dkim-atps=neutral Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by passt.top (Postfix) with ESMTPS id 363305A061B for ; Thu, 09 Oct 2025 15:04:23 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1760015062; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=XY+7WJHVp5anEFb05tHFV5iHmei3a78JiLA6X0dj3+I=; b=GcuAnoSbKqDzszhjvqSrCXBCAXm8lbXTRRYDyhstMu1tDfLFTa320m0S9RkJnUN5eRVs5f Ht1TIX3hYOct726YTBem/P42YnGzwYQ4rzl1DR5SVOumbiyL8XmS58s7MaG0RcdRJuN9vp uk2Ot0XSXl2z0hFvTKep2RVxc7lwdq4= Received: from mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-593-rn5TbdRxPyipWNZkNwhdFA-1; Thu, 09 Oct 2025 09:04:20 -0400 X-MC-Unique: rn5TbdRxPyipWNZkNwhdFA-1 X-Mimecast-MFC-AGG-ID: rn5TbdRxPyipWNZkNwhdFA_1760015060 Received: from mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.111]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id E810B195609E for ; Thu, 9 Oct 2025 13:04:19 +0000 (UTC) Received: from lenovo-t14s.redhat.com (unknown [10.45.224.166]) by mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id D4EC21800115; Thu, 9 Oct 2025 13:04:18 +0000 (UTC) From: Laurent Vivier To: passt-dev@passt.top Subject: [PATCH v3 4/6] tcp, flow: Replace per-connection in_epoll flag with threadnb in flow_common Date: Thu, 9 Oct 2025 15:04:07 +0200 Message-ID: <20251009130409.3931795-5-lvivier@redhat.com> In-Reply-To: <20251009130409.3931795-1-lvivier@redhat.com> References: <20251009130409.3931795-1-lvivier@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.111 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: 7QKqUP9hWKX3dfdntQsH85-um74qW6Z6LSN5hIoeZyw_1760015060 X-Mimecast-Originator: redhat.com Content-Transfer-Encoding: 8bit content-type: text/plain; charset="US-ASCII"; x-default=true Message-ID-Hash: FVBYBLH75NB6S63VYPAS2HRJM333QSI5 X-Message-ID-Hash: FVBYBLH75NB6S63VYPAS2HRJM333QSI5 X-MailFrom: lvivier@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: Laurent Vivier X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: The in_epoll boolean flag in tcp_tap_conn and tcp_splice_conn only tracked whether a connection was registered with epoll, not which epoll instance. This limited flexibility for future multi-epoll support. Replace the boolean with a threadnb field in flow_common that identifies which thread (and thus which epoll instance) the flow is registered with. Use FLOW_THREADNB_INVALID to indicate when a flow is not registered with any epoll instance. A threadnb_to_epollfd[] mapping table translates thread numbers to their corresponding epoll file descriptors. Add helper functions: - flow_epollfd_valid() to check if a flow is registered with epoll - flow_epollfd_get() to retrieve the epoll fd for a flow's thread - flow_epollfd_set() to register a flow with a thread's epoll instance This change also simplifies tcp_timer_ctl() and conn_flag_do() by removing the need to pass the context 'c', since the epoll fd is now directly accessible from the flow structure via flow_epollfd_get(). Signed-off-by: Laurent Vivier --- flow.c | 43 ++++++++++++++++++++++++++++++++++++++++++- flow.h | 11 +++++++++++ tcp.c | 39 +++++++++++++++++++++------------------ tcp_conn.h | 8 +------- tcp_splice.c | 24 ++++++++++++------------ 5 files changed, 87 insertions(+), 38 deletions(-) diff --git a/flow.c b/flow.c index b14e9d8b63ff..6298bbafcd51 100644 --- a/flow.c +++ b/flow.c @@ -116,6 +116,7 @@ static_assert(ARRAY_SIZE(flow_proto) == FLOW_NUM_TYPES, unsigned flow_first_free; union flow flowtab[FLOW_MAX]; static const union flow *flow_new_entry; /* = NULL */ +static int threadnb_to_epollfd[FLOW_THREADNB_SIZE]; /* Hash table to index it */ #define FLOW_HASH_LOAD 70 /* % */ @@ -347,6 +348,45 @@ static void flow_set_state(struct flow_common *f, enum flow_state state) flow_log_details_(f, LOG_DEBUG, MAX(state, oldstate)); } +/** + * flow_epollfd_valid() - Check if flow is registered with an epoll instance + * @f: Flow to check + * + * Return: true if flow is registered with epoll, false otherwise + */ +bool flow_epollfd_valid(const struct flow_common *f) +{ + return f->threadnb != FLOW_THREADNB_INVALID; +} + +/** + * flow_epollfd_get() - Get the epoll file descriptor for a flow + * @f: Flow to query + * + * Return: epoll file descriptor associated with the flow's thread + */ +int flow_epollfd_get(const struct flow_common *f) +{ + ASSERT(f->threadnb < FLOW_THREADNB_MAX); + + return threadnb_to_epollfd[f->threadnb]; +} + +/** + * flow_epollfd_set() - Set the epoll instance for a flow + * @f: Flow to update + * @threadnb: Thread number to associate with this flow + * @epollfd: epoll file descriptor for the thread + */ +void flow_epollfd_set(struct flow_common *f, int threadnb, int epollfd) +{ + ASSERT(threadnb < FLOW_THREADNB_MAX); + ASSERT(epollfd >= 0); + + threadnb_to_epollfd[threadnb] = epollfd; + f->threadnb = threadnb; +} + /** * flow_initiate_() - Move flow to INI, setting pif[INISIDE] * @flow: Flow to change state @@ -548,6 +588,7 @@ union flow *flow_alloc(void) flow_new_entry = flow; memset(flow, 0, sizeof(*flow)); + flow->f.threadnb = FLOW_THREADNB_INVALID; flow_set_state(&flow->f, FLOW_STATE_NEW); return flow; @@ -827,7 +868,7 @@ void flow_defer_handler(const struct ctx *c, const struct timespec *now) case FLOW_TCP_SPLICE: closed = tcp_splice_flow_defer(&flow->tcp_splice); if (!closed && timer) - tcp_splice_timer(c, &flow->tcp_splice); + tcp_splice_timer(&flow->tcp_splice); break; case FLOW_PING4: case FLOW_PING6: diff --git a/flow.h b/flow.h index ef138b83add8..18ae45d67720 100644 --- a/flow.h +++ b/flow.h @@ -177,6 +177,8 @@ int flowside_connect(const struct ctx *c, int s, * @type: Type of packet flow * @pif[]: Interface for each side of the flow * @side[]: Information for each side of the flow + * @threadnb: Thread number flow is registered with + * (FLOW_THREADNB_INVALID if not) */ struct flow_common { #ifdef __GNUC__ @@ -192,8 +194,14 @@ struct flow_common { #endif uint8_t pif[SIDES]; struct flowside side[SIDES]; +#define FLOW_THREADNB_BITS 8 + unsigned int threadnb:FLOW_THREADNB_BITS; }; +#define FLOW_THREADNB_SIZE (1 << FLOW_THREADNB_BITS) +#define FLOW_THREADNB_MAX (FLOW_THREADNB_SIZE - 1) +#define FLOW_THREADNB_INVALID FLOW_THREADNB_MAX + #define FLOW_INDEX_BITS 17 /* 128k - 1 */ #define FLOW_MAX MAX_FROM_BITS(FLOW_INDEX_BITS) @@ -249,6 +257,9 @@ flow_sidx_t flow_lookup_sa(const struct ctx *c, uint8_t proto, uint8_t pif, union flow; void flow_init(void); +bool flow_epollfd_valid(const struct flow_common *f); +int flow_epollfd_get(const struct flow_common *f); +void flow_epollfd_set(struct flow_common *f, int threadnb, int epollfd); void flow_defer_handler(const struct ctx *c, const struct timespec *now); int flow_migrate_source_early(struct ctx *c, const struct migrate_stage *stage, int fd); diff --git a/tcp.c b/tcp.c index c2d842bbdf4f..f9c67e2b006d 100644 --- a/tcp.c +++ b/tcp.c @@ -504,25 +504,27 @@ static uint32_t tcp_conn_epoll_events(uint8_t events, uint8_t conn_flags) */ static int tcp_epoll_ctl(const struct ctx *c, struct tcp_tap_conn *conn) { - int m = conn->in_epoll ? EPOLL_CTL_MOD : EPOLL_CTL_ADD; + int m = flow_epollfd_valid(&conn->f) ? EPOLL_CTL_MOD : EPOLL_CTL_ADD; union epoll_ref ref = { .type = EPOLL_TYPE_TCP, .fd = conn->sock, .flowside = FLOW_SIDX(conn, !TAPSIDE(conn)), }; struct epoll_event ev = { .data.u64 = ref.u64 }; + int epollfd = flow_epollfd_valid(&conn->f) ? flow_epollfd_get(&conn->f) + : c->epollfd; if (conn->events == CLOSED) { - if (conn->in_epoll) - epoll_del(c->epollfd, conn->sock); + if (flow_epollfd_valid(&conn->f)) + epoll_del(epollfd, conn->sock); if (conn->timer != -1) - epoll_del(c->epollfd, conn->timer); + epoll_del(epollfd, conn->timer); return 0; } ev.events = tcp_conn_epoll_events(conn->events, conn->flags); - if (epoll_ctl(c->epollfd, m, conn->sock, &ev)) + if (epoll_ctl(epollfd, m, conn->sock, &ev)) return -errno; - conn->in_epoll = true; + flow_epollfd_set(&conn->f, 0, epollfd); if (conn->timer != -1) { union epoll_ref ref_t = { .type = EPOLL_TYPE_TCP_TIMER, @@ -531,7 +533,8 @@ static int tcp_epoll_ctl(const struct ctx *c, struct tcp_tap_conn *conn) struct epoll_event ev_t = { .data.u64 = ref_t.u64, .events = EPOLLIN | EPOLLET }; - if (epoll_ctl(c->epollfd, EPOLL_CTL_MOD, conn->timer, &ev_t)) + if (epoll_ctl(flow_epollfd_get(&conn->f), EPOLL_CTL_MOD, + conn->timer, &ev_t)) return -errno; } @@ -540,12 +543,11 @@ static int tcp_epoll_ctl(const struct ctx *c, struct tcp_tap_conn *conn) /** * tcp_timer_ctl() - Set timerfd based on flags/events, create timerfd if needed - * @c: Execution context * @conn: Connection pointer * * #syscalls timerfd_create timerfd_settime */ -static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn) +static void tcp_timer_ctl(struct tcp_tap_conn *conn) { struct itimerspec it = { { 0 }, { 0 } }; @@ -570,7 +572,8 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn) } conn->timer = fd; - if (epoll_ctl(c->epollfd, EPOLL_CTL_ADD, conn->timer, &ev)) { + if (epoll_ctl(flow_epollfd_get(&conn->f), EPOLL_CTL_ADD, + conn->timer, &ev)) { flow_dbg_perror(conn, "failed to add timer"); close(conn->timer); conn->timer = -1; @@ -628,7 +631,7 @@ void conn_flag_do(const struct ctx *c, struct tcp_tap_conn *conn, * flags and factor this into the logic below. */ if (flag == ACK_FROM_TAP_DUE) - tcp_timer_ctl(c, conn); + tcp_timer_ctl(conn); return; } @@ -644,7 +647,7 @@ void conn_flag_do(const struct ctx *c, struct tcp_tap_conn *conn, if (flag == ACK_FROM_TAP_DUE || flag == ACK_TO_TAP_DUE || (flag == ~ACK_FROM_TAP_DUE && (conn->flags & ACK_TO_TAP_DUE)) || (flag == ~ACK_TO_TAP_DUE && (conn->flags & ACK_FROM_TAP_DUE))) - tcp_timer_ctl(c, conn); + tcp_timer_ctl(conn); } /** @@ -699,7 +702,7 @@ void conn_event_do(const struct ctx *c, struct tcp_tap_conn *conn, tcp_epoll_ctl(c, conn); if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) - tcp_timer_ctl(c, conn); + tcp_timer_ctl(conn); } /** @@ -1737,7 +1740,7 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn, seq, conn->seq_from_tap); tcp_send_flag(c, conn, ACK); - tcp_timer_ctl(c, conn); + tcp_timer_ctl(conn); if (p->count == 1) { tcp_tap_window_update(c, conn, @@ -2386,7 +2389,7 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref) if (conn->flags & ACK_TO_TAP_DUE) { tcp_send_flag(c, conn, ACK_IF_NEEDED); - tcp_timer_ctl(c, conn); + tcp_timer_ctl(conn); } else if (conn->flags & ACK_FROM_TAP_DUE) { if (!(conn->events & ESTABLISHED)) { flow_dbg(conn, "handshake timeout"); @@ -2408,7 +2411,7 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref) return; tcp_data_from_sock(c, conn); - tcp_timer_ctl(c, conn); + tcp_timer_ctl(conn); } } else { struct itimerspec new = { { 0 }, { ACT_TIMEOUT, 0 } }; @@ -3456,7 +3459,7 @@ int tcp_flow_migrate_source_ext(const struct ctx *c, if (c->migrate_no_linger) close(s); else - epoll_del(c->epollfd, s); + epoll_del(flow_epollfd_get(&conn->f), s); /* Adjustments unrelated to FIN segments: sequence numbers we dumped are * based on the end of the queues. @@ -3605,7 +3608,7 @@ static int tcp_flow_repair_connect(const struct ctx *c, return rc; } - conn->in_epoll = 0; + conn->f.threadnb = FLOW_THREADNB_INVALID; conn->timer = -1; conn->listening_sock = -1; diff --git a/tcp_conn.h b/tcp_conn.h index 38b5c541f003..81333122d531 100644 --- a/tcp_conn.h +++ b/tcp_conn.h @@ -12,7 +12,6 @@ /** * struct tcp_tap_conn - Descriptor for a TCP connection (not spliced) * @f: Generic flow information - * @in_epoll: Is the connection in the epoll set? * @retrans: Number of retransmissions occurred due to ACK_TIMEOUT * @ws_from_tap: Window scaling factor advertised from tap/guest * @ws_to_tap: Window scaling factor advertised to tap/guest @@ -36,8 +35,6 @@ struct tcp_tap_conn { /* Must be first element */ struct flow_common f; - bool in_epoll :1; - #define TCP_RETRANS_BITS 3 unsigned int retrans :TCP_RETRANS_BITS; #define TCP_MAX_RETRANS MAX_FROM_BITS(TCP_RETRANS_BITS) @@ -196,7 +193,6 @@ struct tcp_tap_transfer_ext { * @written: Bytes written (not fully written from one other side read) * @events: Events observed/actions performed on connection * @flags: Connection flags (attributes, not events) - * @in_epoll: Is the connection in the epoll set? */ struct tcp_splice_conn { /* Must be first element */ @@ -220,8 +216,6 @@ struct tcp_splice_conn { #define RCVLOWAT_SET(sidei_) ((sidei_) ? BIT(1) : BIT(0)) #define RCVLOWAT_ACT(sidei_) ((sidei_) ? BIT(3) : BIT(2)) #define CLOSING BIT(4) - - bool in_epoll :1; }; /* Socket pools */ @@ -245,7 +239,7 @@ int tcp_flow_migrate_target_ext(struct ctx *c, struct tcp_tap_conn *conn, int fd bool tcp_flow_is_established(const struct tcp_tap_conn *conn); bool tcp_splice_flow_defer(struct tcp_splice_conn *conn); -void tcp_splice_timer(const struct ctx *c, struct tcp_splice_conn *conn); +void tcp_splice_timer(struct tcp_splice_conn *conn); int tcp_conn_pool_sock(int pool[]); int tcp_conn_sock(sa_family_t af); int tcp_sock_refill_pool(int pool[], sa_family_t af); diff --git a/tcp_splice.c b/tcp_splice.c index 6f21184bdc55..8842feef0684 100644 --- a/tcp_splice.c +++ b/tcp_splice.c @@ -149,7 +149,9 @@ static void tcp_splice_conn_epoll_events(uint16_t events, static int tcp_splice_epoll_ctl(const struct ctx *c, struct tcp_splice_conn *conn) { - int m = conn->in_epoll ? EPOLL_CTL_MOD : EPOLL_CTL_ADD; + int epollfd = flow_epollfd_valid(&conn->f) ? flow_epollfd_get(&conn->f) + : c->epollfd; + int m = flow_epollfd_valid(&conn->f) ? EPOLL_CTL_MOD : EPOLL_CTL_ADD; const union epoll_ref ref[SIDES] = { { .type = EPOLL_TYPE_TCP_SPLICE, .fd = conn->s[0], .flowside = FLOW_SIDX(conn, 0) }, @@ -161,25 +163,24 @@ static int tcp_splice_epoll_ctl(const struct ctx *c, tcp_splice_conn_epoll_events(conn->events, ev); - if (epoll_ctl(c->epollfd, m, conn->s[0], &ev[0]) || - epoll_ctl(c->epollfd, m, conn->s[1], &ev[1])) { + + if (epoll_ctl(epollfd, m, conn->s[0], &ev[0]) || + epoll_ctl(epollfd, m, conn->s[1], &ev[1])) { int ret = -errno; flow_perror(conn, "ERROR on epoll_ctl()"); return ret; } - - conn->in_epoll = true; + flow_epollfd_set(&conn->f, 0, epollfd); return 0; } /** * conn_flag_do() - Set/unset given flag, log, update epoll on CLOSING flag - * @c: Execution context * @conn: Connection pointer * @flag: Flag to set, or ~flag to unset */ -static void conn_flag_do(const struct ctx *c, struct tcp_splice_conn *conn, +static void conn_flag_do(struct tcp_splice_conn *conn, unsigned long flag) { if (flag & (flag - 1)) { @@ -204,15 +205,15 @@ static void conn_flag_do(const struct ctx *c, struct tcp_splice_conn *conn, } if (flag == CLOSING) { - epoll_del(c->epollfd, conn->s[0]); - epoll_del(c->epollfd, conn->s[1]); + epoll_del(flow_epollfd_get(&conn->f), conn->s[0]); + epoll_del(flow_epollfd_get(&conn->f), conn->s[1]); } } #define conn_flag(c, conn, flag) \ do { \ flow_trace(conn, "flag at %s:%i", __func__, __LINE__); \ - conn_flag_do(c, conn, flag); \ + conn_flag_do(conn, flag); \ } while (0) /** @@ -751,10 +752,9 @@ void tcp_splice_init(struct ctx *c) /** * tcp_splice_timer() - Timer for spliced connections - * @c: Execution context * @conn: Connection to handle */ -void tcp_splice_timer(const struct ctx *c, struct tcp_splice_conn *conn) +void tcp_splice_timer(struct tcp_splice_conn *conn) { unsigned sidei; -- 2.50.1