From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: passt.top; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=cBPbNlSV; dkim-atps=neutral Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by passt.top (Postfix) with ESMTPS id 650D75A026F for ; Tue, 21 Oct 2025 15:13:58 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1761052437; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:autocrypt:autocrypt; bh=P4tI0E0u9xvuhOxZFwllGQ+ANEsCgt7pwGdyLvU/z3Y=; b=cBPbNlSVAVjlI1NnnqIHczzHdrFaZAZvl75axSh6KoK3dYtLAQA0nVyMJfIGVY4YBkux7z Y3l36IpavnpwlwBjNGZOQ7YKxGRVAdIGgPIc+lpHx33kLaZqQzvBEPZo/Apm8uoojK6gSk FuFscCEteT/JJAiau/kfBUcov7OrHQM= Received: from mail-wr1-f69.google.com (mail-wr1-f69.google.com [209.85.221.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-691-iuqabbJeOYGn4tLL3GWWAA-1; Tue, 21 Oct 2025 09:13:55 -0400 X-MC-Unique: iuqabbJeOYGn4tLL3GWWAA-1 X-Mimecast-MFC-AGG-ID: iuqabbJeOYGn4tLL3GWWAA_1761052434 Received: by mail-wr1-f69.google.com with SMTP id ffacd0b85a97d-42705afbf19so2263125f8f.1 for ; Tue, 21 Oct 2025 06:13:55 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1761052434; x=1761657234; h=content-transfer-encoding:in-reply-to:autocrypt:from :content-language:references:cc:to:subject:user-agent:mime-version :date:message-id:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=P4tI0E0u9xvuhOxZFwllGQ+ANEsCgt7pwGdyLvU/z3Y=; b=jdGo7GKtVOquRBpEFojLSuWx7ApcmM4wQbdUIPdG7WLRBmq24ILNnmRulnYhDOoG3a VIlI/SdFh8cnQ1M5whSWc8LPU59YvbrTi74yI7STGcyhpbqsYf7lLEhexsEryOr7kO0J C8V2K8EN6fM1nf1qPBf4UGJ/UrsH+uRJWIR/iJHIM4u1AgR6VQyZr8fW4L70IC5PQLCP mIG2LCaMnQ2tidThAwGAF3cTpfldX/XRghBaauhWVQ1Utgr7pET7mVrvXcna4MCC7x3q CW71SaoS77gDi0DrzOGqrWI0l+ViQgaJD+Bgg94kxIQUccPwAAGPaE/LVqex5U5IzU/T 6sOw== X-Gm-Message-State: AOJu0YzTsYC+Xac3JStuMoIIatQ486JbDZ/I6QVxx/0R3VphKBCrQfM0 SrcM0aHlnI52AjWYW7TP+qeXklfImgsoyeGx+vUf5M2RBtNcbny0IIoyFn9TxiADEazKw1u0aIO IMCQgv4mZRWb4h+Qf7LDkb9XAO9PNlVl0L55t2gFUKV6j3mZLGeReNg== X-Gm-Gg: ASbGncslqndBBpv6I3vM9TgUXPNGAQhw8mw/SEBBbqYD6eh3JQCi5EKjltmiQlA35DG PAm08XmLBNgSCXNLbG1glvKr0RvDDEt4MMy4nfgm+zFs/tO/ZNy3Ww+RzyeKyzJdBos89DSO0MU nOSfPLbk678LyDS/CH4sBdXMqEghDWcNdfGHE3KwKw9EZt4sIs42yAtJnPvEfUj5wO5lgEKMLCI 8K8vGWOn3Rh5cGRxFV7HPcL/Y6A5coHb6vlv25iKj5ZzCvctbqD/q4mt2mQ7KCHNvs3y67iFwW5 AJ4XdLM1pw1mxVaJmGAPvwYITwllqxnAUy56qGWwJ+z3XQEW9YDsHYdGl1uTfDl0/m3cray2t9w yPrcQgBTqIxUUGWiqdzEW3uWv9FCZ3dE6rbReUg== X-Received: by 2002:a05:6000:144e:b0:426:fb63:c01f with SMTP id ffacd0b85a97d-42704bf71ecmr12883436f8f.29.1761052434163; Tue, 21 Oct 2025 06:13:54 -0700 (PDT) X-Google-Smtp-Source: AGHT+IHw1bponzV2i3v3t9E2lCsh+PugTInwG45bBk9vgnRFN0DYlXVlOJ1pVUKlL5E2THbJXPyDLA== X-Received: by 2002:a05:6000:144e:b0:426:fb63:c01f with SMTP id ffacd0b85a97d-42704bf71ecmr12883408f8f.29.1761052433542; Tue, 21 Oct 2025 06:13:53 -0700 (PDT) Received: from ?IPV6:2a01:e0a:e10:ef90:343a:68f:2e91:95c? ([2a01:e0a:e10:ef90:343a:68f:2e91:95c]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-427f009a797sm20306591f8f.27.2025.10.21.06.13.52 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 21 Oct 2025 06:13:52 -0700 (PDT) Message-ID: Date: Tue, 21 Oct 2025 15:13:52 +0200 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v4 4/7] tcp, flow: Replace per-connection in_epoll flag with threadnb in flow_common To: Stefano Brivio References: <20251017103129.229412-1-lvivier@redhat.com> <20251017103129.229412-5-lvivier@redhat.com> <20251017194319.53451242@elisabeth> From: Laurent Vivier Autocrypt: addr=lvivier@redhat.com; keydata= xsFNBFYFJhkBEAC2me7w2+RizYOKZM+vZCx69GTewOwqzHrrHSG07MUAxJ6AY29/+HYf6EY2 WoeuLWDmXE7A3oJoIsRecD6BXHTb0OYS20lS608anr3B0xn5g0BX7es9Mw+hV/pL+63EOCVm SUVTEQwbGQN62guOKnJJJfphbbv82glIC/Ei4Ky8BwZkUuXd7d5NFJKC9/GDrbWdj75cDNQx UZ9XXbXEKY9MHX83Uy7JFoiFDMOVHn55HnncflUncO0zDzY7CxFeQFwYRbsCXOUL9yBtqLer Ky8/yjBskIlNrp0uQSt9LMoMsdSjYLYhvk1StsNPg74+s4u0Q6z45+l8RAsgLw5OLtTa+ePM JyS7OIGNYxAX6eZk1+91a6tnqfyPcMbduxyBaYXn94HUG162BeuyBkbNoIDkB7pCByed1A7q q9/FbuTDwgVGVLYthYSfTtN0Y60OgNkWCMtFwKxRaXt1WFA5ceqinN/XkgA+vf2Ch72zBkJL RBIhfOPFv5f2Hkkj0MvsUXpOWaOjatiu0fpPo6Hw14UEpywke1zN4NKubApQOlNKZZC4hu6/ 8pv2t4HRi7s0K88jQYBRPObjrN5+owtI51xMaYzvPitHQ2053LmgsOdN9EKOqZeHAYG2SmRW LOxYWKX14YkZI5j/TXfKlTpwSMvXho+efN4kgFvFmP6WT+tPnwARAQABzSNMYXVyZW50IFZp dmllciA8bHZpdmllckByZWRoYXQuY29tPsLBeAQTAQIAIgUCVgVQgAIbAwYLCQgHAwIGFQgC CQoLBBYCAwECHgECF4AACgkQ8ww4vT8vvjwpgg//fSGy0Rs/t8cPFuzoY1cex4limJQfReLr SJXCANg9NOWy/bFK5wunj+h/RCFxIFhZcyXveurkBwYikDPUrBoBRoOJY/BHK0iZo7/WQkur 6H5losVZtrotmKOGnP/lJYZ3H6OWvXzdz8LL5hb3TvGOP68K8Bn8UsIaZJoeiKhaNR0sOJyI YYbgFQPWMHfVwHD/U+/gqRhD7apVysxv5by/pKDln1I5v0cRRH6hd8M8oXgKhF2+rAOL7gvh jEHSSWKUlMjC7YwwjSZmUkL+TQyE18e2XBk85X8Da3FznrLiHZFHQ/NzETYxRjnOzD7/kOVy gKD/o7asyWQVU65mh/ECrtjfhtCBSYmIIVkopoLaVJ/kEbVJQegT2P6NgERC/31kmTF69vn8 uQyW11Hk8tyubicByL3/XVBrq4jZdJW3cePNJbTNaT0d/bjMg5zCWHbMErUib2Nellnbg6bc 2HLDe0NLVPuRZhHUHM9hO/JNnHfvgiRQDh6loNOUnm9Iw2YiVgZNnT4soUehMZ7au8PwSl4I KYE4ulJ8RRiydN7fES3IZWmOPlyskp1QMQBD/w16o+lEtY6HSFEzsK3o0vuBRBVp2WKnssVH qeeV01ZHw0bvWKjxVNOksP98eJfWLfV9l9e7s6TaAeySKRRubtJ+21PRuYAxKsaueBfUE7ZT 7zfOwU0EVgUmGQEQALxSQRbl/QOnmssVDxWhHM5TGxl7oLNJms2zmBpcmlrIsn8nNz0rRyxT 460k2niaTwowSRK8KWVDeAW6ZAaWiYjLlTunoKwvF8vP3JyWpBz0diTxL5o+xpvy/Q6YU3BN efdq8Vy3rFsxgW7mMSrI/CxJ667y8ot5DVugeS2NyHfmZlPGE0Nsy7hlebS4liisXOrN3jFz asKyUws3VXek4V65lHwB23BVzsnFMn/bw/rPliqXGcwl8CoJu8dSyrCcd1Ibs0/Inq9S9+t0 VmWiQWfQkz4rvEeTQkp/VfgZ6z98JRW7S6l6eophoWs0/ZyRfOm+QVSqRfFZdxdP2PlGeIFM C3fXJgygXJkFPyWkVElr76JTbtSHsGWbt6xUlYHKXWo+xf9WgtLeby3cfSkEchACrxDrQpj+ Jt/JFP+q997dybkyZ5IoHWuPkn7uZGBrKIHmBunTco1+cKSuRiSCYpBIXZMHCzPgVDjk4viP brV9NwRkmaOxVvye0vctJeWvJ6KA7NoAURplIGCqkCRwg0MmLrfoZnK/gRqVJ/f6adhU1oo6 z4p2/z3PemA0C0ANatgHgBb90cd16AUxpdEQmOCmdNnNJF/3Zt3inzF+NFzHoM5Vwq6rc1JP jfC3oqRLJzqAEHBDjQFlqNR3IFCIAo4SYQRBdAHBCzkM4rWyRhuVABEBAAHCwV8EGAECAAkF AlYFJhkCGwwACgkQ8ww4vT8vvjwg9w//VQrcnVg3TsjEybxDEUBm8dBmnKqcnTBFmxN5FFtI WlEuY8+YMiWRykd8Ln9RJ/98/ghABHz9TN8TRo2b6WimV64FmlVn17Ri6FgFU3xNt9TTEChq AcNg88eYryKsYpFwegGpwUlaUaaGh1m9OrTzcQy+klVfZWaVJ9Nw0keoGRGb8j4XjVpL8+2x OhXKrM1fzzb8JtAuSbuzZSQPDwQEI5CKKxp7zf76J21YeRrEW4WDznPyVcDTa+tz++q2S/Bp P4W98bXCBIuQgs2m+OflERv5c3Ojldp04/S4NEjXEYRWdiCxN7ca5iPml5gLtuvhJMSy36gl U6IW9kn30IWuSoBpTkgV7rLUEhh9Ms82VWW/h2TxL8enfx40PrfbDtWwqRID3WY8jLrjKfTd R3LW8BnUDNkG+c4FzvvGUs8AvuqxxyHbXAfDx9o/jXfPHVRmJVhSmd+hC3mcQ+4iX5bBPBPM oDqSoLt5w9GoQQ6gDVP2ZjTWqwSRMLzNr37rJjZ1pt0DCMMTbiYIUcrhX8eveCJtY7NGWNyx FCRkhxRuGcpwPmRVDwOl39MB3iTsRighiMnijkbLXiKoJ5CDVvX5yicNqYJPKh5MFXN1bvsB kmYiStMRbrD0HoY1kx5/VozBtc70OU0EB8Wrv9hZD+Ofp0T3KOr1RUHvCZoLURfFhSQ= In-Reply-To: <20251017194319.53451242@elisabeth> X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: x-RqB7W1ivzwuFRxNnCSg3Xxmqdj6RtC2vA0VOfhABc_1761052434 X-Mimecast-Originator: redhat.com Content-Language: en-US Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Message-ID-Hash: BCB4BSUYJGYJKVQU67CW4YJRWKNBXODT X-Message-ID-Hash: BCB4BSUYJGYJKVQU67CW4YJRWKNBXODT X-MailFrom: lvivier@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top, David Gibson X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On 17/10/2025 19:43, Stefano Brivio wrote: > Nits only: > > On Fri, 17 Oct 2025 12:31:26 +0200 > Laurent Vivier wrote: > >> The in_epoll boolean flag in tcp_tap_conn and tcp_splice_conn only tracked >> whether a connection was registered with epoll, not which epoll instance. >> This limited flexibility for future multi-epoll support. >> >> Replace the boolean with a threadnb field in flow_common that identifies >> which thread (and thus which epoll instance) the flow is registered with. >> Use FLOW_THREADNB_INVALID to indicate when a flow is not registered with >> any epoll instance. A threadnb_to_epollfd[] mapping table translates >> thread numbers to their corresponding epoll file descriptors. > > It wasn't really clear to me until I actually started digging into this > change what NB meant there. For me it's "notifier block" before > "number", but that didn't make sense either. > > Some ideas (even though maybe as David suggested this name > shouldn't have "thread" in it at all): > > - f->thread / FLOW_THREAD_INVALID > > - f->thread_id / FLOW_THREAD_ID_INVALID > > ...or f->epoll_id, reflecting David's observation? epoll_id seems reasonable. > >> Add helper functions: >> - flow_in_epoll() to check if a flow is registered with epoll >> - flow_epollfd() to retrieve the epoll fd for a flow's thread >> - flow_thread_register() to register an epoll fd with a thread >> - flow_thread_set() to set the thread number of a flow >> >> This change also simplifies tcp_timer_ctl() and conn_flag_do() by removing >> the need to pass the context 'c', since the epoll fd is now directly >> accessible from the flow structure via flow_epollfd(). >> >> Signed-off-by: Laurent Vivier >> --- >> flow.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++- >> flow.h | 12 ++++++++++++ >> passt.c | 1 + >> tcp.c | 39 ++++++++++++++++++++------------------ >> tcp_conn.h | 8 +------- >> tcp_splice.c | 24 ++++++++++++------------ >> 6 files changed, 99 insertions(+), 38 deletions(-) >> >> diff --git a/flow.c b/flow.c >> index b14e9d8b63ff..d56bae776239 100644 >> --- a/flow.c >> +++ b/flow.c >> @@ -116,6 +116,7 @@ static_assert(ARRAY_SIZE(flow_proto) == FLOW_NUM_TYPES, >> unsigned flow_first_free; >> union flow flowtab[FLOW_MAX]; >> static const union flow *flow_new_entry; /* = NULL */ > > It would be nice to have an idea of how this table is organised without > looking at how it's used, say: > > /* Table of epoll file descriptors, indexed by thread number */ > > ...or indexed by "epoll identifiers", perhaps, if we want to drop > references to threads here. I will drop reference to threads. > >> +static int threadnb_to_epollfd[FLOW_THREADNB_SIZE]; >> >> /* Hash table to index it */ >> #define FLOW_HASH_LOAD 70 /* % */ >> @@ -347,6 +348,55 @@ static void flow_set_state(struct flow_common *f, enum flow_state state) >> flow_log_details_(f, LOG_DEBUG, MAX(state, oldstate)); >> } >> >> +/** >> + * flow_in_epoll() - Check if flow is registered with an epoll instance >> + * @f: Flow to check >> + * >> + * Return: true if flow is registered with epoll, false otherwise >> + */ >> +bool flow_in_epoll(const struct flow_common *f) >> +{ >> + return f->threadnb != FLOW_THREADNB_INVALID; >> +} >> + >> +/** >> + * flow_epollfd() - Get the epoll file descriptor for a flow >> + * @f: Flow to query >> + * >> + * Return: epoll file descriptor associated with the flow's thread >> + */ >> +int flow_epollfd(const struct flow_common *f) >> +{ >> + ASSERT(f->threadnb < FLOW_THREADNB_MAX); >> + >> + return threadnb_to_epollfd[f->threadnb]; >> +} >> + >> +/** >> + * flow_thread_set() - Associate a flow with a thread >> + * @f: Flow to update >> + * @threadnb: Thread number to associate with this flow >> + */ >> +void flow_thread_set(struct flow_common *f, int threadnb) >> +{ >> + ASSERT(threadnb < FLOW_THREADNB_MAX); >> + >> + f->threadnb = threadnb; >> +} >> + >> +/** >> + * flow_thread_register() - Initialize the threadnb -> epollfd mapping >> + * @threadnb: Thread number to associate to >> + * @epollfd: epoll file descriptor for the thread >> + */ >> +void flow_thread_register(int threadnb, int epollfd) >> +{ >> + ASSERT(threadnb < FLOW_THREADNB_MAX); >> + ASSERT(epollfd >= 0); >> + >> + threadnb_to_epollfd[threadnb] = epollfd; >> +} >> + >> /** >> * flow_initiate_() - Move flow to INI, setting pif[INISIDE] >> * @flow: Flow to change state >> @@ -548,6 +598,7 @@ union flow *flow_alloc(void) >> >> flow_new_entry = flow; >> memset(flow, 0, sizeof(*flow)); >> + flow->f.threadnb = FLOW_THREADNB_INVALID; >> flow_set_state(&flow->f, FLOW_STATE_NEW); >> >> return flow; >> @@ -827,7 +878,7 @@ void flow_defer_handler(const struct ctx *c, const struct timespec *now) >> case FLOW_TCP_SPLICE: >> closed = tcp_splice_flow_defer(&flow->tcp_splice); >> if (!closed && timer) >> - tcp_splice_timer(c, &flow->tcp_splice); >> + tcp_splice_timer(&flow->tcp_splice); >> break; >> case FLOW_PING4: >> case FLOW_PING6: >> diff --git a/flow.h b/flow.h >> index ef138b83add8..700d8b32c990 100644 >> --- a/flow.h >> +++ b/flow.h >> @@ -177,6 +177,8 @@ int flowside_connect(const struct ctx *c, int s, >> * @type: Type of packet flow >> * @pif[]: Interface for each side of the flow >> * @side[]: Information for each side of the flow >> + * @threadnb: Thread number flow is registered with >> + * (FLOW_THREADNB_INVALID if not) > > You could phrase this more directly, say, "thread identifier, or > FLOW_THREAD_INVALID", or "epollfd identifier, or EPOLLFD_ID_INVALID". > > This is a structure describing flows, it's not surprising it's about > the flow. > >> */ >> struct flow_common { >> #ifdef __GNUC__ >> @@ -192,8 +194,14 @@ struct flow_common { >> #endif >> uint8_t pif[SIDES]; >> struct flowside side[SIDES]; >> +#define FLOW_THREADNB_BITS 8 >> + unsigned int threadnb:FLOW_THREADNB_BITS; >> }; >> >> +#define FLOW_THREADNB_SIZE (1 << FLOW_THREADNB_BITS) >> +#define FLOW_THREADNB_MAX (FLOW_THREADNB_SIZE - 1) >> +#define FLOW_THREADNB_INVALID FLOW_THREADNB_MAX >> + >> #define FLOW_INDEX_BITS 17 /* 128k - 1 */ >> #define FLOW_MAX MAX_FROM_BITS(FLOW_INDEX_BITS) >> >> @@ -249,6 +257,10 @@ flow_sidx_t flow_lookup_sa(const struct ctx *c, uint8_t proto, uint8_t pif, >> union flow; >> >> void flow_init(void); >> +bool flow_in_epoll(const struct flow_common *f); >> +int flow_epollfd(const struct flow_common *f); >> +void flow_thread_set(struct flow_common *f, int threadnb); >> +void flow_thread_register(int threadnb, int epollfd); >> void flow_defer_handler(const struct ctx *c, const struct timespec *now); >> int flow_migrate_source_early(struct ctx *c, const struct migrate_stage *stage, >> int fd); >> diff --git a/passt.c b/passt.c >> index af928111786b..37f2c897be84 100644 >> --- a/passt.c >> +++ b/passt.c >> @@ -285,6 +285,7 @@ int main(int argc, char **argv) >> c.epollfd = epoll_create1(EPOLL_CLOEXEC); >> if (c.epollfd == -1) >> die_perror("Failed to create epoll file descriptor"); >> + flow_thread_register(0, c.epollfd); > > Temporary, I suppose. If not, maybe it deserves its own constant, such > as EPOLLFD_ID_DEFAULT? I agree > >> if (getrlimit(RLIMIT_NOFILE, &limit)) >> die_perror("Failed to get maximum value of open files limit"); >> diff --git a/tcp.c b/tcp.c >> index db9f17c0622f..8c49852b8454 100644 >> --- a/tcp.c >> +++ b/tcp.c >> @@ -504,25 +504,27 @@ static uint32_t tcp_conn_epoll_events(uint8_t events, uint8_t conn_flags) >> */ >> static int tcp_epoll_ctl(const struct ctx *c, struct tcp_tap_conn *conn) >> { >> - int m = conn->in_epoll ? EPOLL_CTL_MOD : EPOLL_CTL_ADD; >> + int m = flow_in_epoll(&conn->f) ? EPOLL_CTL_MOD : EPOLL_CTL_ADD; >> union epoll_ref ref = { .type = EPOLL_TYPE_TCP, .fd = conn->sock, >> .flowside = FLOW_SIDX(conn, !TAPSIDE(conn)), }; >> struct epoll_event ev = { .data.u64 = ref.u64 }; >> + int epollfd = flow_in_epoll(&conn->f) ? flow_epollfd(&conn->f) >> + : c->epollfd; > > We usually align the second expression to the ternary operator, see > also example in sock_l4_sa(), say: > > int epollfd = flow_in_epoll(&conn->f) ? flow_epollfd(&conn->f) > : c->epollfd; > I do like this usually. Missed this one when I change the name of the function. >> >> if (conn->events == CLOSED) { >> - if (conn->in_epoll) >> - epoll_del(c->epollfd, conn->sock); >> + if (flow_in_epoll(&conn->f)) >> + epoll_del(epollfd, conn->sock); >> if (conn->timer != -1) >> - epoll_del(c->epollfd, conn->timer); >> + epoll_del(epollfd, conn->timer); >> return 0; >> } >> >> ev.events = tcp_conn_epoll_events(conn->events, conn->flags); >> >> - if (epoll_ctl(c->epollfd, m, conn->sock, &ev)) >> + if (epoll_ctl(epollfd, m, conn->sock, &ev)) >> return -errno; >> >> - conn->in_epoll = true; >> + flow_thread_set(&conn->f, 0); > > Not temporary I guess, maybe it's an EPOLLFD_ID_DEFAULT? I agree > >> >> if (conn->timer != -1) { >> union epoll_ref ref_t = { .type = EPOLL_TYPE_TCP_TIMER, >> @@ -531,7 +533,8 @@ static int tcp_epoll_ctl(const struct ctx *c, struct tcp_tap_conn *conn) >> struct epoll_event ev_t = { .data.u64 = ref_t.u64, >> .events = EPOLLIN | EPOLLET }; >> >> - if (epoll_ctl(c->epollfd, EPOLL_CTL_MOD, conn->timer, &ev_t)) >> + if (epoll_ctl(flow_epollfd(&conn->f), EPOLL_CTL_MOD, >> + conn->timer, &ev_t)) >> return -errno; >> } >> >> @@ -540,12 +543,11 @@ static int tcp_epoll_ctl(const struct ctx *c, struct tcp_tap_conn *conn) >> >> /** >> * tcp_timer_ctl() - Set timerfd based on flags/events, create timerfd if needed >> - * @c: Execution context >> * @conn: Connection pointer >> * >> * #syscalls timerfd_create timerfd_settime >> */ >> -static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn) >> +static void tcp_timer_ctl(struct tcp_tap_conn *conn) >> { >> struct itimerspec it = { { 0 }, { 0 } }; >> >> @@ -570,7 +572,8 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn) >> } >> conn->timer = fd; >> >> - if (epoll_ctl(c->epollfd, EPOLL_CTL_ADD, conn->timer, &ev)) { >> + if (epoll_ctl(flow_epollfd(&conn->f), EPOLL_CTL_ADD, > > I wouldn't find it outrageous if you assigned an epollfd local variable > first so that this doesn't need two lines. I will do... > >> + conn->timer, &ev)) { >> flow_dbg_perror(conn, "failed to add timer"); >> close(conn->timer); >> conn->timer = -1; >> @@ -628,7 +631,7 @@ void conn_flag_do(const struct ctx *c, struct tcp_tap_conn *conn, >> * flags and factor this into the logic below. >> */ >> if (flag == ACK_FROM_TAP_DUE) >> - tcp_timer_ctl(c, conn); >> + tcp_timer_ctl(conn); >> >> return; >> } >> @@ -644,7 +647,7 @@ void conn_flag_do(const struct ctx *c, struct tcp_tap_conn *conn, >> if (flag == ACK_FROM_TAP_DUE || flag == ACK_TO_TAP_DUE || >> (flag == ~ACK_FROM_TAP_DUE && (conn->flags & ACK_TO_TAP_DUE)) || >> (flag == ~ACK_TO_TAP_DUE && (conn->flags & ACK_FROM_TAP_DUE))) >> - tcp_timer_ctl(c, conn); >> + tcp_timer_ctl(conn); >> } >> >> /** >> @@ -699,7 +702,7 @@ void conn_event_do(const struct ctx *c, struct tcp_tap_conn *conn, >> tcp_epoll_ctl(c, conn); >> >> if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) >> - tcp_timer_ctl(c, conn); >> + tcp_timer_ctl(conn); >> } >> >> /** >> @@ -1757,7 +1760,7 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn, >> seq, conn->seq_from_tap); >> >> tcp_send_flag(c, conn, ACK); >> - tcp_timer_ctl(c, conn); >> + tcp_timer_ctl(conn); >> >> if (p->count == 1) { >> tcp_tap_window_update(c, conn, >> @@ -2406,7 +2409,7 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref) >> >> if (conn->flags & ACK_TO_TAP_DUE) { >> tcp_send_flag(c, conn, ACK_IF_NEEDED); >> - tcp_timer_ctl(c, conn); >> + tcp_timer_ctl(conn); >> } else if (conn->flags & ACK_FROM_TAP_DUE) { >> if (!(conn->events & ESTABLISHED)) { >> flow_dbg(conn, "handshake timeout"); >> @@ -2428,7 +2431,7 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref) >> return; >> >> tcp_data_from_sock(c, conn); >> - tcp_timer_ctl(c, conn); >> + tcp_timer_ctl(conn); >> } >> } else { >> struct itimerspec new = { { 0 }, { ACT_TIMEOUT, 0 } }; >> @@ -3476,7 +3479,7 @@ int tcp_flow_migrate_source_ext(const struct ctx *c, >> if (c->migrate_no_linger) >> close(s); >> else >> - epoll_del(c->epollfd, s); >> + epoll_del(flow_epollfd(&conn->f), s); >> >> /* Adjustments unrelated to FIN segments: sequence numbers we dumped are >> * based on the end of the queues. >> @@ -3625,7 +3628,7 @@ static int tcp_flow_repair_connect(const struct ctx *c, >> return rc; >> } >> >> - conn->in_epoll = 0; >> + conn->f.threadnb = FLOW_THREADNB_INVALID; >> conn->timer = -1; >> conn->listening_sock = -1; >> >> diff --git a/tcp_conn.h b/tcp_conn.h >> index 38b5c541f003..81333122d531 100644 >> --- a/tcp_conn.h >> +++ b/tcp_conn.h >> @@ -12,7 +12,6 @@ >> /** >> * struct tcp_tap_conn - Descriptor for a TCP connection (not spliced) >> * @f: Generic flow information >> - * @in_epoll: Is the connection in the epoll set? >> * @retrans: Number of retransmissions occurred due to ACK_TIMEOUT >> * @ws_from_tap: Window scaling factor advertised from tap/guest >> * @ws_to_tap: Window scaling factor advertised to tap/guest >> @@ -36,8 +35,6 @@ struct tcp_tap_conn { >> /* Must be first element */ >> struct flow_common f; >> >> - bool in_epoll :1; >> - >> #define TCP_RETRANS_BITS 3 >> unsigned int retrans :TCP_RETRANS_BITS; >> #define TCP_MAX_RETRANS MAX_FROM_BITS(TCP_RETRANS_BITS) >> @@ -196,7 +193,6 @@ struct tcp_tap_transfer_ext { >> * @written: Bytes written (not fully written from one other side read) >> * @events: Events observed/actions performed on connection >> * @flags: Connection flags (attributes, not events) >> - * @in_epoll: Is the connection in the epoll set? >> */ >> struct tcp_splice_conn { >> /* Must be first element */ >> @@ -220,8 +216,6 @@ struct tcp_splice_conn { >> #define RCVLOWAT_SET(sidei_) ((sidei_) ? BIT(1) : BIT(0)) >> #define RCVLOWAT_ACT(sidei_) ((sidei_) ? BIT(3) : BIT(2)) >> #define CLOSING BIT(4) >> - >> - bool in_epoll :1; >> }; >> >> /* Socket pools */ >> @@ -245,7 +239,7 @@ int tcp_flow_migrate_target_ext(struct ctx *c, struct tcp_tap_conn *conn, int fd >> bool tcp_flow_is_established(const struct tcp_tap_conn *conn); >> >> bool tcp_splice_flow_defer(struct tcp_splice_conn *conn); >> -void tcp_splice_timer(const struct ctx *c, struct tcp_splice_conn *conn); >> +void tcp_splice_timer(struct tcp_splice_conn *conn); >> int tcp_conn_pool_sock(int pool[]); >> int tcp_conn_sock(sa_family_t af); >> int tcp_sock_refill_pool(int pool[], sa_family_t af); >> diff --git a/tcp_splice.c b/tcp_splice.c >> index 6f21184bdc55..703bd7610890 100644 >> --- a/tcp_splice.c >> +++ b/tcp_splice.c >> @@ -149,7 +149,9 @@ static void tcp_splice_conn_epoll_events(uint16_t events, >> static int tcp_splice_epoll_ctl(const struct ctx *c, >> struct tcp_splice_conn *conn) >> { >> - int m = conn->in_epoll ? EPOLL_CTL_MOD : EPOLL_CTL_ADD; >> + int epollfd = flow_in_epoll(&conn->f) ? flow_epollfd(&conn->f) >> + : c->epollfd; > > Same as above. > > Given the new, more limited usage of c->epollfd, if it's not a temporary > thing (I couldn't quite guess what your plan is), maybe worth updating > its documentation in struct ctx? In a later patch I remove epollfd from ctx and rely on the threadnb (0 for main) to get epollfd. Thansk, Laurent