* [PATCH v2 0/6] Improvements to flow specific logging
@ 2026-06-17 3:11 David Gibson
2026-06-17 3:11 ` [PATCH v2 1/6] flow: Regularise flow specific logging helpers David Gibson
` (5 more replies)
0 siblings, 6 replies; 8+ messages in thread
From: David Gibson @ 2026-06-17 3:11 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
While working on podman bug 23739, I found some deficiences in how and
when we report various socket errors related to specific flows. Here
are some preliminary patches to improve that.
I think patches 1..3 are pretty straightforward and safe. 4/4 I think
is a good idea in principle - promoting the priority of a number of
messages now that we have ratelimiting. However, plumbing 'now' all
the places we need it resuls in a *lot* of churn, so we might want to
postpone or split it up.
Still, current drafted posted for consideration.
v2:
* Added patch 2/6 At Stefano's suggestion
* Simplified the errno Handling fix at Stefano's suggestion
* Added patch 6/6 which I noticed while revising 5/6.
* Made a handful of minor corrections based on feedback.
David Gibson (6):
flow: Regularise flow specific logging helpers
flow: Indent flow details messages
flow: Include flow details with higher priority log messages
flow, udp: Fix errno handling in udp_flow_sock()
flow, treewide: Promote priority of selected flow-linked messages
udp: Improve messages for errors getting errors
flow.c | 103 ++++++-------
flow.h | 72 ++++-----
icmp.c | 27 ++--
icmp.h | 3 +-
migrate.c | 27 ++--
migrate.h | 8 +-
passt.c | 8 +-
tcp.c | 387 ++++++++++++++++++++++++++++---------------------
tcp.h | 5 +-
tcp_buf.c | 67 +++++----
tcp_buf.h | 7 +-
tcp_conn.h | 8 +-
tcp_internal.h | 27 ++--
tcp_splice.c | 59 +++++---
tcp_splice.h | 3 +-
tcp_vu.c | 31 ++--
tcp_vu.h | 5 +-
udp.c | 61 +++++---
udp_flow.c | 18 ++-
19 files changed, 528 insertions(+), 398 deletions(-)
--
2.54.0
^ permalink raw reply [flat|nested] 8+ messages in thread* [PATCH v2 1/6] flow: Regularise flow specific logging helpers 2026-06-17 3:11 [PATCH v2 0/6] Improvements to flow specific logging David Gibson @ 2026-06-17 3:11 ` David Gibson 2026-06-17 3:11 ` [PATCH v2 2/6] flow: Indent flow details messages David Gibson ` (4 subsequent siblings) 5 siblings, 0 replies; 8+ messages in thread From: David Gibson @ 2026-06-17 3:11 UTC (permalink / raw) To: passt-dev, Stefano Brivio; +Cc: David Gibson flow.h has a collection of logging helpers that automatically include information about a specific flow. Which variants are present are a bit ad-hoc, based on what we happened to want use (e.g. there are no LOG_WARNING level versions, at present). There's also a rather awkward and only occasionally used flow_log_details_() helper to print additional log messages with more details of the flow (basically its addresses). It's particularly awkward to try to combine that with ratelimiting. Re-organise this to be based around a flow_log__() internal helper, which has bool parameters to include strerror() / perror information and/or the extra details. Add wrapper macros for all combinations of perror, ratelimiting and DEBUG/WARNING/ERR priorities. Be a little more consistent about parameter order between the various functions / macros / wrappers while we're at it. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> --- flow.c | 89 ++++++++++++++++++++++++++++------------------------------ flow.h | 70 +++++++++++++++++++++++---------------------- tcp.c | 4 +-- udp.c | 6 ++-- 4 files changed, 85 insertions(+), 84 deletions(-) diff --git a/flow.c b/flow.c index 7a5c81c2..e71f3de7 100644 --- a/flow.c +++ b/flow.c @@ -269,17 +269,20 @@ int flowside_connect(const struct ctx *c, int s, return connect(s, &sa.sa, socklen_inany(&sa)); } -/** flow_log_ - Log flow-related message +/** flow_log__ - Log flow-related message, internal helper * @f: flow the message is related to - * @newline: Append newline at the end of the message, if missing * @pri: Log priority + * @perror: Append strerror(errno) output + * @details: Add lines with flow addresses + * @state: State to display details for (used during state changes) * @fmt: Format string * @...: printf-arguments */ -void flow_log_(const struct flow_common *f, bool newline, int pri, - const char *fmt, ...) +void flow_log__(const struct flow_common *f, int pri, bool perror, bool details, + enum flow_state state, const char *fmt, ...) { const char *type_or_state; + int errno_ = errno; char msg[BUFSIZ]; va_list args; @@ -293,45 +296,41 @@ void flow_log_(const struct flow_common *f, bool newline, int pri, else type_or_state = FLOW_TYPE(f); - logmsg(newline, false, pri, + logmsg(!perror, false, pri, "Flow %u (%s): %s", flow_idx(f), type_or_state, msg); -} -/** flow_log_details_() - Log the details of a flow - * @f: flow to log - * @pri: Log priority - * @state: State to log details according to - * - * Logs the details of the flow: endpoints, interfaces, type etc. - */ -void flow_log_details_(const struct flow_common *f, int pri, - enum flow_state state) -{ - char estr0[INANY_ADDRSTRLEN], fstr0[INANY_ADDRSTRLEN]; - char estr1[INANY_ADDRSTRLEN], fstr1[INANY_ADDRSTRLEN]; - const struct flowside *ini = &f->side[INISIDE]; - const struct flowside *tgt = &f->side[TGTSIDE]; - - if (state >= FLOW_STATE_TGT) - flow_log_(f, true, pri, - "%s [%s]:%hu -> [%s]:%hu => %s [%s]:%hu -> [%s]:%hu", - pif_name(f->pif[INISIDE]), - inany_ntop(&ini->eaddr, estr0, sizeof(estr0)), - ini->eport, - inany_ntop(&ini->oaddr, fstr0, sizeof(fstr0)), - ini->oport, - pif_name(f->pif[TGTSIDE]), - inany_ntop(&tgt->oaddr, fstr1, sizeof(fstr1)), - tgt->oport, - inany_ntop(&tgt->eaddr, estr1, sizeof(estr1)), - tgt->eport); - else if (state >= FLOW_STATE_INI) - flow_log_(f, true, pri, "%s [%s]:%hu -> [%s]:%hu => ?", - pif_name(f->pif[INISIDE]), - inany_ntop(&ini->eaddr, estr0, sizeof(estr0)), - ini->eport, - inany_ntop(&ini->oaddr, fstr0, sizeof(fstr0)), - ini->oport); + if (perror) + logmsg(true, true, pri, ": %s", strerror_(errno_)); + + if (details) { + char estr0[INANY_ADDRSTRLEN], fstr0[INANY_ADDRSTRLEN]; + char estr1[INANY_ADDRSTRLEN], fstr1[INANY_ADDRSTRLEN]; + const struct flowside *ini = &f->side[INISIDE]; + const struct flowside *tgt = &f->side[TGTSIDE]; + + if (state >= FLOW_STATE_TGT) { + flow_log__(f, pri, false, false, state, +"%s [%s]:%hu -> [%s]:%hu => %s [%s]:%hu -> [%s]:%hu", + pif_name(f->pif[INISIDE]), + inany_ntop(&ini->eaddr, estr0, sizeof(estr0)), + ini->eport, + inany_ntop(&ini->oaddr, fstr0, sizeof(fstr0)), + ini->oport, + pif_name(f->pif[TGTSIDE]), + inany_ntop(&tgt->oaddr, fstr1, sizeof(fstr1)), + tgt->oport, + inany_ntop(&tgt->eaddr, estr1, sizeof(estr1)), + tgt->eport); + } else if (state >= FLOW_STATE_INI) { + flow_log__(f, pri, false, false, state, + "%s [%s]:%hu -> [%s]:%hu => ?", + pif_name(f->pif[INISIDE]), + inany_ntop(&ini->eaddr, estr0, sizeof(estr0)), + ini->eport, + inany_ntop(&ini->oaddr, fstr0, sizeof(fstr0)), + ini->oport); + } + } } /** @@ -347,10 +346,9 @@ static void flow_set_state(struct flow_common *f, enum flow_state state) assert(oldstate < FLOW_NUM_STATES); f->state = state; - flow_log_(f, true, LOG_DEBUG, "%s -> %s", flow_state_str[oldstate], + flow_log__(f, LOG_DEBUG, false, true, MAX(state, oldstate), + "%s -> %s", flow_state_str[oldstate], FLOW_STATE(f)); - - flow_log_details_(f, LOG_DEBUG, MAX(state, oldstate)); } /** @@ -551,8 +549,7 @@ norule: /* This shouldn't happen, because if there's no rule for it we should * have no listening socket that would let us get here */ - flow_dbg(flow, "Missing forward rule"); - flow_log_details_(f, LOG_DEBUG, f->state); + flow_log(flow, LOG_DEBUG, false, true, "Missing forward rule"); nofwd: flow_err(flow, "No rules to forward %s %s [%s]:%hu -> [%s]:%hu", diff --git a/flow.h b/flow.h index 2623b3db..cfa22783 100644 --- a/flow.h +++ b/flow.h @@ -280,42 +280,46 @@ int flow_migrate_source(struct ctx *c, const struct migrate_stage *stage, int flow_migrate_target(struct ctx *c, const struct migrate_stage *stage, int fd); -void flow_log_(const struct flow_common *f, bool newline, int pri, - const char *fmt, ...) - __attribute__((format(printf, 4, 5))); - -#define flow_log(f_, pri, ...) flow_log_(&(f_)->f, true, (pri), __VA_ARGS__) -#define flow_dbg(f, ...) flow_log((f), LOG_DEBUG, __VA_ARGS__) -#define flow_err(f, ...) flow_log((f), LOG_ERR, __VA_ARGS__) - -#define flow_trace(f, ...) \ +void flow_log__(const struct flow_common *f, int pri, bool perror, bool details, + enum flow_state state, const char *fmt, ...); + +#define flow_log_(f_, pri_, perror_, details_, ...) \ + flow_log__((f_), (pri_), (perror_), (details_), (f_)->state, \ + __VA_ARGS__) + +#define flow_log(flow_, pri_, perror_, details_, ...) \ + flow_log_(&(flow_)->f, (pri_), (perror_), (details_), __VA_ARGS__) + +#define flow_dbg(flow_, ...) \ + flow_log((flow_), LOG_DEBUG, false, false, __VA_ARGS__) +#define flow_warn(flow_, ...) \ + flow_log((flow_), LOG_WARNING, false, false, __VA_ARGS__) +#define flow_err(flow_, ...) \ + flow_log((flow_), LOG_ERR, false, false, __VA_ARGS__) +#define flow_trace(flow_, ...) \ do { \ if (log_trace) \ - flow_dbg((f), __VA_ARGS__); \ - } while (0) - -#define flow_log_perror_(f, pri, ...) \ - do { \ - int errno_ = errno; \ - flow_log_((f), false, (pri), __VA_ARGS__); \ - logmsg(true, true, (pri), ": %s", strerror_(errno_)); \ + flow_dbg((flow_), __VA_ARGS__); \ } while (0) -#define flow_dbg_perror(f_, ...) flow_log_perror_(&(f_)->f, LOG_DEBUG, __VA_ARGS__) -#define flow_perror(f_, ...) flow_log_perror_(&(f_)->f, LOG_ERR, __VA_ARGS__) - -void flow_log_details_(const struct flow_common *f, int pri, - enum flow_state state); -#define flow_log_details(f_, pri) \ - flow_log_details_(&((f_)->f), (pri), (f_)->f.state) -#define flow_dbg_details(f_) flow_log_details((f_), LOG_DEBUG) -#define flow_err_details(f_) flow_log_details((f_), LOG_ERR) - -#define flow_dbg_ratelimit(f, now, ...) \ - logmsg_ratelimit(flow_dbg, debug, now, f, __VA_ARGS__) -#define flow_err_ratelimit(f, now, ...) \ - logmsg_ratelimit(flow_err, err, now, f, __VA_ARGS__) -#define flow_perror_ratelimit(f, now, ...) \ - logmsg_ratelimit(flow_perror, err, now, f, __VA_ARGS__) +#define flow_dbg_perror(flow_, ...) \ + flow_log((flow_), LOG_DEBUG, true, false, __VA_ARGS__) +#define flow_warn_perror(flow_, ...) \ + flow_log((flow_), LOG_WARNING, true, false, __VA_ARGS__) +#define flow_perror(flow_, ...) \ + flow_log((flow_), LOG_ERR, true, false, __VA_ARGS__) + +#define flow_dbg_ratelimit(flow_, now_, ...) \ + logmsg_ratelimit(flow_dbg, debug, (now_), (flow_), __VA_ARGS__) +#define flow_dbg_perror_ratelimit(flow_, now_, ...) \ + logmsg_ratelimit(flow_dbg_perror, debug, (now_), (flow_), __VA_ARGS__) +#define flow_warn_ratelimit(flow_, now_, ...) \ + logmsg_ratelimit(flow_warn, warn, (now_), (flow_), __VA_ARGS__) +#define flow_warn_perror_ratelimit(flow_, now_, ...) \ + logmsg_ratelimit(flow_warn_perror, warn, (now_), (flow_), __VA_ARGS__) +#define flow_err_ratelimit(flow_, now_, ...) \ + logmsg_ratelimit(flow_err, err, (now_), (flow_), __VA_ARGS__) +#define flow_perror_ratelimit(flow_, now_, ...) \ + logmsg_ratelimit(flow_perror, err, (now_), (flow_), __VA_ARGS__) #endif /* FLOW_H */ diff --git a/tcp.c b/tcp.c index 097ab72e..3538eec2 100644 --- a/tcp.c +++ b/tcp.c @@ -1404,8 +1404,8 @@ void tcp_linger0_(const struct flow_common *f, int s) */ if (setsockopt(s, SOL_SOCKET, SO_LINGER, &linger0, sizeof(linger0)) < 0) { - flow_log_perror_(f, LOG_DEBUG, - "SO_LINGER failed, may not send RST to peer"); + flow_log_(f, LOG_DEBUG, true, false, + "SO_LINGER failed, may not send RST to peer"); } } diff --git a/udp.c b/udp.c index 375befa7..c95bb200 100644 --- a/udp.c +++ b/udp.c @@ -943,7 +943,8 @@ void udp_sock_handler(const struct ctx *c, union epoll_ref ref, if (events & EPOLLERR) { if (udp_sock_errs(c, ref.fd, ref.flowside, PIF_NONE, 0) < 0) { - flow_err(uflow, "Unrecoverable error on flow socket"); + flow_log(uflow, LOG_ERR, false, true, + "Unrecoverable error on flow socket"); goto fail; } } @@ -974,7 +975,7 @@ void udp_sock_handler(const struct ctx *c, union epoll_ref ref, udp_buf_sock_to_tap(c, s, n, tosidx); } } else { - flow_err(uflow, + flow_log(uflow, LOG_ERR, false, true, "No support for forwarding UDP from %s to %s", pif_name(pif_at_sidx(ref.flowside)), pif_name(topif)); @@ -984,7 +985,6 @@ void udp_sock_handler(const struct ctx *c, union epoll_ref ref, return; fail: - flow_err_details(uflow); udp_flow_close(c, uflow); } -- 2.54.0 ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v2 2/6] flow: Indent flow details messages 2026-06-17 3:11 [PATCH v2 0/6] Improvements to flow specific logging David Gibson 2026-06-17 3:11 ` [PATCH v2 1/6] flow: Regularise flow specific logging helpers David Gibson @ 2026-06-17 3:11 ` David Gibson 2026-06-17 3:11 ` [PATCH v2 3/6] flow: Include flow details with higher priority log messages David Gibson ` (3 subsequent siblings) 5 siblings, 0 replies; 8+ messages in thread From: David Gibson @ 2026-06-17 3:11 UTC (permalink / raw) To: passt-dev, Stefano Brivio; +Cc: David Gibson flow_log__() has the option to append an extra log message with address details of the flow, to give more context to errors or warnings. Add an indent to this part of the message, to emphasise that it's additional information related to a nearby message, rather than a message in its own right. Suggested-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> --- flow.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/flow.c b/flow.c index e71f3de7..2f1530b5 100644 --- a/flow.c +++ b/flow.c @@ -310,7 +310,7 @@ void flow_log__(const struct flow_common *f, int pri, bool perror, bool details, if (state >= FLOW_STATE_TGT) { flow_log__(f, pri, false, false, state, -"%s [%s]:%hu -> [%s]:%hu => %s [%s]:%hu -> [%s]:%hu", +" %s [%s]:%hu -> [%s]:%hu => %s [%s]:%hu -> [%s]:%hu", pif_name(f->pif[INISIDE]), inany_ntop(&ini->eaddr, estr0, sizeof(estr0)), ini->eport, @@ -323,7 +323,7 @@ void flow_log__(const struct flow_common *f, int pri, bool perror, bool details, tgt->eport); } else if (state >= FLOW_STATE_INI) { flow_log__(f, pri, false, false, state, - "%s [%s]:%hu -> [%s]:%hu => ?", + " %s [%s]:%hu -> [%s]:%hu => ?", pif_name(f->pif[INISIDE]), inany_ntop(&ini->eaddr, estr0, sizeof(estr0)), ini->eport, -- 2.54.0 ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v2 3/6] flow: Include flow details with higher priority log messages 2026-06-17 3:11 [PATCH v2 0/6] Improvements to flow specific logging David Gibson 2026-06-17 3:11 ` [PATCH v2 1/6] flow: Regularise flow specific logging helpers David Gibson 2026-06-17 3:11 ` [PATCH v2 2/6] flow: Indent flow details messages David Gibson @ 2026-06-17 3:11 ` David Gibson 2026-06-17 3:11 ` [PATCH v2 4/6] flow, udp: Fix errno handling in udp_flow_sock() David Gibson ` (2 subsequent siblings) 5 siblings, 0 replies; 8+ messages in thread From: David Gibson @ 2026-06-17 3:11 UTC (permalink / raw) To: passt-dev, Stefano Brivio; +Cc: David Gibson Currently flow_log() and related functions / macros have a 'details' parameter which indicates whether to add extra messages with details of the flow's addresses. This is still a bit awkward to invoke, and only used in a few places. Change the logic, to automatically include the details if and only if the log priority is greater than LOG_DEBUG. Rationale: If at debug log level, there are already a bunch of debug messages tracking the flow life cycle, which include those details (we make sure to retain those). It's usually pretty easy to cross reference a specific flow debug message with the flow's history including the details. If at higher log level, and we generate a flow-connected error or warning we don't have those life cycle messages. So, just giving the flow index doesn't really tell you anything about which flow tripped the error. Adding the address details make the error message significantly more useful. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> --- flow.c | 2 +- flow.h | 22 +++++++++++----------- udp.c | 5 ++--- 3 files changed, 14 insertions(+), 15 deletions(-) diff --git a/flow.c b/flow.c index 2f1530b5..596a4722 100644 --- a/flow.c +++ b/flow.c @@ -549,7 +549,7 @@ norule: /* This shouldn't happen, because if there's no rule for it we should * have no listening socket that would let us get here */ - flow_log(flow, LOG_DEBUG, false, true, "Missing forward rule"); + flow_dbg(flow, "Missing forward rule"); nofwd: flow_err(flow, "No rules to forward %s %s [%s]:%hu -> [%s]:%hu", diff --git a/flow.h b/flow.h index cfa22783..b994f54a 100644 --- a/flow.h +++ b/flow.h @@ -283,19 +283,19 @@ int flow_migrate_target(struct ctx *c, const struct migrate_stage *stage, void flow_log__(const struct flow_common *f, int pri, bool perror, bool details, enum flow_state state, const char *fmt, ...); -#define flow_log_(f_, pri_, perror_, details_, ...) \ - flow_log__((f_), (pri_), (perror_), (details_), (f_)->state, \ - __VA_ARGS__) +#define flow_log_(f_, pri_, perror_, ...) \ + flow_log__((f_), (pri_), (perror_), (pri_) > LOG_DEBUG, \ + (f_)->state, __VA_ARGS__) -#define flow_log(flow_, pri_, perror_, details_, ...) \ - flow_log_(&(flow_)->f, (pri_), (perror_), (details_), __VA_ARGS__) +#define flow_log(flow_, pri_, perror_, ...) \ + flow_log_(&(flow_)->f, (pri_), (perror_), __VA_ARGS__) #define flow_dbg(flow_, ...) \ - flow_log((flow_), LOG_DEBUG, false, false, __VA_ARGS__) + flow_log((flow_), LOG_DEBUG, false, __VA_ARGS__) #define flow_warn(flow_, ...) \ - flow_log((flow_), LOG_WARNING, false, false, __VA_ARGS__) + flow_log((flow_), LOG_WARNING, false, __VA_ARGS__) #define flow_err(flow_, ...) \ - flow_log((flow_), LOG_ERR, false, false, __VA_ARGS__) + flow_log((flow_), LOG_ERR, false, __VA_ARGS__) #define flow_trace(flow_, ...) \ do { \ if (log_trace) \ @@ -303,11 +303,11 @@ void flow_log__(const struct flow_common *f, int pri, bool perror, bool details, } while (0) #define flow_dbg_perror(flow_, ...) \ - flow_log((flow_), LOG_DEBUG, true, false, __VA_ARGS__) + flow_log((flow_), LOG_DEBUG, true, __VA_ARGS__) #define flow_warn_perror(flow_, ...) \ - flow_log((flow_), LOG_WARNING, true, false, __VA_ARGS__) + flow_log((flow_), LOG_WARNING, true, __VA_ARGS__) #define flow_perror(flow_, ...) \ - flow_log((flow_), LOG_ERR, true, false, __VA_ARGS__) + flow_log((flow_), LOG_ERR, true, __VA_ARGS__) #define flow_dbg_ratelimit(flow_, now_, ...) \ logmsg_ratelimit(flow_dbg, debug, (now_), (flow_), __VA_ARGS__) diff --git a/udp.c b/udp.c index c95bb200..2d9e2f04 100644 --- a/udp.c +++ b/udp.c @@ -943,8 +943,7 @@ void udp_sock_handler(const struct ctx *c, union epoll_ref ref, if (events & EPOLLERR) { if (udp_sock_errs(c, ref.fd, ref.flowside, PIF_NONE, 0) < 0) { - flow_log(uflow, LOG_ERR, false, true, - "Unrecoverable error on flow socket"); + flow_err(uflow, "Unrecoverable error on flow socket"); goto fail; } } @@ -975,7 +974,7 @@ void udp_sock_handler(const struct ctx *c, union epoll_ref ref, udp_buf_sock_to_tap(c, s, n, tosidx); } } else { - flow_log(uflow, LOG_ERR, false, true, + flow_err(uflow, "No support for forwarding UDP from %s to %s", pif_name(pif_at_sidx(ref.flowside)), pif_name(topif)); -- 2.54.0 ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v2 4/6] flow, udp: Fix errno handling in udp_flow_sock() 2026-06-17 3:11 [PATCH v2 0/6] Improvements to flow specific logging David Gibson ` (2 preceding siblings ...) 2026-06-17 3:11 ` [PATCH v2 3/6] flow: Include flow details with higher priority log messages David Gibson @ 2026-06-17 3:11 ` David Gibson 2026-06-17 3:11 ` [PATCH v2 5/6] flow, treewide: Promote priority of selected flow-linked messages David Gibson 2026-06-17 3:11 ` [PATCH v2 6/6] udp: Improve messages for errors getting errors David Gibson 5 siblings, 0 replies; 8+ messages in thread From: David Gibson @ 2026-06-17 3:11 UTC (permalink / raw) To: passt-dev, Stefano Brivio; +Cc: David Gibson flowside_connect() behaves much like connect(2) itself, returning -1 on error with errno set to the error code. One of the callers, in udp_flow_sock(), uses the errno code with flow_dbg_perror() *after* it's called epoll_del() and close() either of which could clobber errno. Fix that by moving the debug message earlier: there's no reason it has to wait until the epoll_del() and close(). While we're at it, adjust the comment on flowside_connect() to clarify that it sets errno (rather than, say, returning an error code). Suggested-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> --- flow.c | 2 +- udp_flow.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/flow.c b/flow.c index 596a4722..20938034 100644 --- a/flow.c +++ b/flow.c @@ -258,7 +258,7 @@ int flowside_sock_l4(const struct ctx *c, enum epoll_type type, uint8_t pif, * * Connect @s to the endpoint address and port from @tgt. * - * Return: 0 on success, negative on error + * Return: 0 on success, negative on error (setting errno) */ int flowside_connect(const struct ctx *c, int s, uint8_t pif, const struct flowside *tgt) diff --git a/udp_flow.c b/udp_flow.c index 35417bc4..31e2f8a7 100644 --- a/udp_flow.c +++ b/udp_flow.c @@ -90,11 +90,11 @@ static int udp_flow_sock(const struct ctx *c, if (flowside_connect(c, s, pif, side) < 0) { rc = -errno; + flow_dbg_perror(uflow, "Couldn't connect flow socket"); epoll_del(flow_epollfd(&uflow->f), s); close(s); - flow_dbg_perror(uflow, "Couldn't connect flow socket"); return rc; } uflow->s[sidei] = s; -- 2.54.0 ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v2 5/6] flow, treewide: Promote priority of selected flow-linked messages 2026-06-17 3:11 [PATCH v2 0/6] Improvements to flow specific logging David Gibson ` (3 preceding siblings ...) 2026-06-17 3:11 ` [PATCH v2 4/6] flow, udp: Fix errno handling in udp_flow_sock() David Gibson @ 2026-06-17 3:11 ` David Gibson 2026-06-17 5:23 ` Stefano Brivio 2026-06-17 3:11 ` [PATCH v2 6/6] udp: Improve messages for errors getting errors David Gibson 5 siblings, 1 reply; 8+ messages in thread From: David Gibson @ 2026-06-17 3:11 UTC (permalink / raw) To: passt-dev, Stefano Brivio; +Cc: David Gibson Most of out flow specific log messages are debug level for fear of flooding the logs, even when they report real error conditions that might be off significance. Now that we have the mechanisms for log message rate limiting, we can do better. Promote many flow related messages to warning or error level, with rate limiting. While we're there add ratelimiting to a handful of existing warning or error level messages. They general heuristic is to promote messages that report a failure which is not something that should be triggered by the guest doing something weird. This mostly means failures from socket operations we expect to be legitimate. Adding the ratelimiting means plumbing the 'now' timestamp through much more of the code, hence the large churn. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> --- flow.c | 14 +- flow.h | 6 +- icmp.c | 27 ++-- icmp.h | 3 +- migrate.c | 27 ++-- migrate.h | 8 +- passt.c | 8 +- tcp.c | 383 ++++++++++++++++++++++++++++--------------------- tcp.h | 5 +- tcp_buf.c | 67 +++++---- tcp_buf.h | 7 +- tcp_conn.h | 8 +- tcp_internal.h | 27 ++-- tcp_splice.c | 59 +++++--- tcp_splice.h | 3 +- tcp_vu.c | 31 ++-- tcp_vu.h | 5 +- udp.c | 47 +++--- udp_flow.c | 18 ++- 19 files changed, 439 insertions(+), 314 deletions(-) diff --git a/flow.c b/flow.c index 20938034..7ba82f54 100644 --- a/flow.c +++ b/flow.c @@ -1084,16 +1084,18 @@ static int flow_migrate_repair_all(struct ctx *c, bool enable) * @c: Execution context * @stage: Migration stage information (unused) * @fd: Migration file descriptor (unused) + * @now: Current timestamp * * Return: 0 on success, positive error code on failure */ int flow_migrate_source_pre(struct ctx *c, const struct migrate_stage *stage, - int fd) + int fd, const struct timespec *now) { int rc; (void)stage; (void)fd; + (void)now; if (flow_migrate_need_repair()) repair_wait(c); @@ -1109,11 +1111,12 @@ int flow_migrate_source_pre(struct ctx *c, const struct migrate_stage *stage, * @c: Execution context (unused) * @stage: Migration stage information (unused) * @fd: Migration file descriptor + * @now: Current timestamp * * Return: 0 on success, positive error code on failure */ int flow_migrate_source(struct ctx *c, const struct migrate_stage *stage, - int fd) + int fd, const struct timespec *now) { uint32_t count = 0; bool first = true; @@ -1184,7 +1187,7 @@ int flow_migrate_source(struct ctx *c, const struct migrate_stage *stage, * as EIO). */ foreach_established_tcp_flow(flow) { - rc = tcp_flow_migrate_source_ext(c, fd, &flow->tcp); + rc = tcp_flow_migrate_source_ext(c, fd, &flow->tcp, now); if (rc) { flow_err(flow, "Can't send extended data: %s", strerror_(-rc)); @@ -1202,11 +1205,12 @@ int flow_migrate_source(struct ctx *c, const struct migrate_stage *stage, * @c: Execution context * @stage: Migration stage information (unused) * @fd: Migration file descriptor + * @now: Current timestamp * * Return: 0 on success, positive error code on failure */ int flow_migrate_target(struct ctx *c, const struct migrate_stage *stage, - int fd) + int fd, const struct timespec *now) { uint32_t count; unsigned i; @@ -1243,7 +1247,7 @@ int flow_migrate_target(struct ctx *c, const struct migrate_stage *stage, repair_flush(c); for (i = 0; i < count; i++) { - rc = tcp_flow_migrate_target_ext(c, &flowtab[i].tcp, fd); + rc = tcp_flow_migrate_target_ext(c, &flowtab[i].tcp, fd, now); if (rc) { flow_dbg(FLOW(i), "Migration data failure, abort: %s", strerror_(-rc)); diff --git a/flow.h b/flow.h index b994f54a..94a26f37 100644 --- a/flow.h +++ b/flow.h @@ -274,11 +274,11 @@ void flow_defer_handler(const struct ctx *c, const struct timespec *now); int flow_migrate_source_early(struct ctx *c, const struct migrate_stage *stage, int fd); int flow_migrate_source_pre(struct ctx *c, const struct migrate_stage *stage, - int fd); + int fd, const struct timespec *now); int flow_migrate_source(struct ctx *c, const struct migrate_stage *stage, - int fd); + int fd, const struct timespec *now); int flow_migrate_target(struct ctx *c, const struct migrate_stage *stage, - int fd); + int fd, const struct timespec *now); void flow_log__(const struct flow_common *f, int pri, bool perror, bool details, enum flow_state state, const char *fmt, ...); diff --git a/icmp.c b/icmp.c index 18b6106a..0fe23667 100644 --- a/icmp.c +++ b/icmp.c @@ -66,8 +66,10 @@ static struct icmp_ping_flow *ping_at_sidx(flow_sidx_t sidx) * icmp_sock_handler() - Handle new data from ICMP or ICMPv6 socket * @c: Execution context * @ref: epoll reference + * @now: Current timestamp */ -void icmp_sock_handler(const struct ctx *c, union epoll_ref ref) +void icmp_sock_handler(const struct ctx *c, union epoll_ref ref, + const struct timespec *now) { struct icmp_ping_flow *pingf = ping_at_sidx(ref.flowside); const struct flowside *ini = &pingf->f.side[INISIDE]; @@ -84,7 +86,7 @@ void icmp_sock_handler(const struct ctx *c, union epoll_ref ref) n = recvfrom(ref.fd, buf, sizeof(buf), 0, &sr.sa, &sl); if (n < 0) { - flow_perror(pingf, "recvfrom() error"); + flow_perror_ratelimit(pingf, now, "recvfrom() error"); return; } @@ -142,7 +144,7 @@ void icmp_sock_handler(const struct ctx *c, union epoll_ref ref) return; unexpected: - flow_err(pingf, "Unexpected packet on ping socket"); + flow_err_ratelimit(pingf, now, "Unexpected packet on ping socket"); } /** @@ -165,12 +167,15 @@ static void icmp_ping_close(const struct ctx *c, * @id: ICMP id for the new socket * @saddr: Source address * @daddr: Destination address + * @now: Current timestamp * * Return: newly opened ping flow, or NULL on failure */ static struct icmp_ping_flow *icmp_ping_new(const struct ctx *c, sa_family_t af, uint16_t id, - const void *saddr, const void *daddr) + const void *saddr, + const void *daddr, + const struct timespec *now) { uint8_t proto = af == AF_INET ? IPPROTO_ICMP : IPPROTO_ICMPV6; uint8_t flowtype = af == AF_INET ? FLOW_PING4 : FLOW_PING6; @@ -186,10 +191,11 @@ static struct icmp_ping_flow *icmp_ping_new(const struct ctx *c, goto cancel; if (flow->f.pif[TGTSIDE] != PIF_HOST) { - flow_err(flow, "No support for forwarding %s from %s to %s", - proto == IPPROTO_ICMP ? "ICMP" : "ICMPv6", - pif_name(flow->f.pif[INISIDE]), - pif_name(flow->f.pif[TGTSIDE])); + flow_err_ratelimit( + flow, now, "No support for forwarding %s from %s to %s", + proto == IPPROTO_ICMP ? "ICMP" : "ICMPv6", + pif_name(flow->f.pif[INISIDE]), + pif_name(flow->f.pif[TGTSIDE])); goto cancel; } @@ -299,7 +305,7 @@ int icmp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af, if (flow) pingf = &flow->ping; - else if (!(pingf = icmp_ping_new(c, af, id, saddr, daddr))) + else if (!(pingf = icmp_ping_new(c, af, id, saddr, daddr, now))) return 1; tgt = &pingf->f.side[TGTSIDE]; @@ -317,7 +323,8 @@ int icmp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af, msh.msg_flags = 0; if (sendmsg(pingf->sock, &msh, MSG_NOSIGNAL) < 0) { - flow_dbg_perror(pingf, "failed to relay request to socket"); + flow_warn_perror_ratelimit(pingf, now, + "failed to relay request to socket"); } else { flow_dbg(pingf, "echo request to socket, ID: %"PRIu16", seq: %"PRIu16, diff --git a/icmp.h b/icmp.h index e8b928b6..08870dc7 100644 --- a/icmp.h +++ b/icmp.h @@ -13,7 +13,8 @@ struct ctx; struct icmp_ping_flow; -void icmp_sock_handler(const struct ctx *c, union epoll_ref ref); +void icmp_sock_handler(const struct ctx *c, union epoll_ref ref, + const struct timespec *now); int icmp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af, const void *saddr, const void *daddr, struct iov_tail *data, const struct timespec *now); diff --git a/migrate.c b/migrate.c index 8937b85f..b233dc7f 100644 --- a/migrate.c +++ b/migrate.c @@ -48,12 +48,14 @@ struct migrate_seen_addrs_v2 { * @c: Execution context * @stage: Migration stage, unused * @fd: File descriptor for state transfer + * @now: Current timestamp * * Return: 0 on success, positive error code on failure */ /* cppcheck-suppress [constParameterCallback, unmatchedSuppression] */ static int seen_addrs_source_v2(struct ctx *c, - const struct migrate_stage *stage, int fd) + const struct migrate_stage *stage, int fd, + const struct timespec *now) { struct migrate_seen_addrs_v2 addrs = { .addr6 = c->ip6.addr_seen, @@ -62,6 +64,7 @@ static int seen_addrs_source_v2(struct ctx *c, }; (void)stage; + (void)now; memcpy(addrs.mac, c->guest_mac, sizeof(addrs.mac)); @@ -76,15 +79,18 @@ static int seen_addrs_source_v2(struct ctx *c, * @c: Execution context * @stage: Migration stage, unused * @fd: File descriptor for state transfer + * @now: Current timestamp * * Return: 0 on success, positive error code on failure */ static int seen_addrs_target_v2(struct ctx *c, - const struct migrate_stage *stage, int fd) + const struct migrate_stage *stage, int fd, + const struct timespec *now) { struct migrate_seen_addrs_v2 addrs; (void)stage; + (void)now; if (read_all_buf(fd, &addrs, sizeof(addrs))) return errno; @@ -133,10 +139,11 @@ static const struct migrate_version versions[] = { * migrate_source() - Migration as source, send state to hypervisor * @c: Execution context * @fd: File descriptor for state transfer + * @now: Current timestamp * * Return: 0 on success, positive error code on failure */ -static int migrate_source(struct ctx *c, int fd) +static int migrate_source(struct ctx *c, int fd, const struct timespec *now) { const struct migrate_version *v = CURRENT_VERSION; const struct migrate_header header = { @@ -159,7 +166,7 @@ static int migrate_source(struct ctx *c, int fd) debug("Source side migration stage: %s", s->name); - if ((ret = s->source(c, s, fd))) { + if ((ret = s->source(c, s, fd, now))) { err("Source migration stage: %s: %s, abort", s->name, strerror_(ret)); return ret; @@ -209,10 +216,11 @@ static const struct migrate_version *migrate_target_read_header(int fd) * migrate_target() - Migration as target, receive state from hypervisor * @c: Execution context * @fd: File descriptor for state transfer + * @now: Current timestamp * * Return: 0 on success, positive error code on failure */ -static int migrate_target(struct ctx *c, int fd) +static int migrate_target(struct ctx *c, int fd, const struct timespec *now) { const struct migrate_version *v; const struct migrate_stage *s; @@ -227,7 +235,7 @@ static int migrate_target(struct ctx *c, int fd) debug("Target side migration stage: %s", s->name); - if ((ret = s->target(c, s, fd))) { + if ((ret = s->target(c, s, fd, now))) { err("Target migration stage: %s: %s, abort", s->name, strerror_(ret)); return ret; @@ -282,8 +290,9 @@ void migrate_request(struct ctx *c, int fd, bool target) /** * migrate_handler() - Send/receive passt internal state to/from hypervisor * @c: Execution context + * @now: Current timestamp */ -void migrate_handler(struct ctx *c) +void migrate_handler(struct ctx *c, const struct timespec *now) { int rc; @@ -294,9 +303,9 @@ void migrate_handler(struct ctx *c) c->device_state_fd, c->migrate_target); if (c->migrate_target) - rc = migrate_target(c, c->device_state_fd); + rc = migrate_target(c, c->device_state_fd, now); else - rc = migrate_source(c, c->device_state_fd); + rc = migrate_source(c, c->device_state_fd, now); migrate_close(c); diff --git a/migrate.h b/migrate.h index 844be477..515fa3a7 100644 --- a/migrate.h +++ b/migrate.h @@ -30,8 +30,10 @@ struct migrate_header { */ struct migrate_stage { const char *name; - int (*source)(struct ctx *c, const struct migrate_stage *stage, int fd); - int (*target)(struct ctx *c, const struct migrate_stage *stage, int fd); + int (*source)(struct ctx *c, const struct migrate_stage *stage, int fd, + const struct timespec *now); + int (*target)(struct ctx *c, const struct migrate_stage *stage, int fd, + const struct timespec *now); /* Add here separate rollback callbacks if needed */ }; @@ -49,6 +51,6 @@ struct migrate_version { void migrate_init(struct ctx *c); void migrate_close(struct ctx *c); void migrate_request(struct ctx *c, int fd, bool target); -void migrate_handler(struct ctx *c); +void migrate_handler(struct ctx *c, const struct timespec *now); #endif /* MIGRATE_H */ diff --git a/passt.c b/passt.c index b1d35d5b..65a07d72 100644 --- a/passt.c +++ b/passt.c @@ -258,7 +258,7 @@ static void passt_worker(void *opaque, int nfds, struct epoll_event *events) pasta_netns_quit_timer_handler(c, ref); break; case EPOLL_TYPE_TCP: - tcp_sock_handler(c, ref, eventmask); + tcp_sock_handler(c, ref, eventmask, &now); break; case EPOLL_TYPE_TCP_SPLICE: tcp_splice_sock_handler(c, ref, eventmask, &now); @@ -267,7 +267,7 @@ static void passt_worker(void *opaque, int nfds, struct epoll_event *events) tcp_listen_handler(c, ref, &now); break; case EPOLL_TYPE_TCP_TIMER: - tcp_timer_handler(c, ref); + tcp_timer_handler(c, ref, &now); break; case EPOLL_TYPE_UDP_LISTEN: udp_listen_sock_handler(c, ref, eventmask, &now); @@ -276,7 +276,7 @@ static void passt_worker(void *opaque, int nfds, struct epoll_event *events) udp_sock_handler(c, ref, eventmask, &now); break; case EPOLL_TYPE_PING: - icmp_sock_handler(c, ref); + icmp_sock_handler(c, ref, &now); break; case EPOLL_TYPE_VHOST_CMD: vu_control_handler(c->vdev, c->fd_tap, eventmask); @@ -309,7 +309,7 @@ static void passt_worker(void *opaque, int nfds, struct epoll_event *events) post_handler(c, &now); - migrate_handler(c); + migrate_handler(c, &now); } /** diff --git a/tcp.c b/tcp.c index 3538eec2..43ba4df4 100644 --- a/tcp.c +++ b/tcp.c @@ -464,17 +464,21 @@ static struct tcp_tap_conn *conn_at_sidx(flow_sidx_t sidx) * tcp_set_peek_offset() - Set SO_PEEK_OFF offset on connection if supported * @conn: Pointer to the TCP connection structure * @offset: Offset in bytes + * @now: Current timestamp * * Return: -1 when it fails, 0 otherwise. */ -int tcp_set_peek_offset(const struct tcp_tap_conn *conn, int offset) +int tcp_set_peek_offset(const struct tcp_tap_conn *conn, int offset, + const struct timespec *now) { if (!peek_offset_cap) return 0; if (setsockopt(conn->sock, SOL_SOCKET, SO_PEEK_OFF, &offset, sizeof(offset))) { - flow_perror(conn, "Failed to set SO_PEEK_OFF to %i", offset); + flow_perror_ratelimit(conn, now, + "Failed to set SO_PEEK_OFF to %i", + offset); return -1; } return 0; @@ -545,9 +549,12 @@ static int tcp_epoll_ctl(struct tcp_tap_conn *conn) * tcp_timer_ctl() - Set timerfd based on flags/events, create timerfd if needed * @c: Execution context * @conn: Connection pointer + * @now: Current timestamp + * * #syscalls timerfd_create timerfd_settime|timerfd_settime32 */ -static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn) +static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn, + const struct timespec *now) { struct itimerspec it = { { 0 }, { 0 } }; @@ -560,12 +567,13 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn) fd = timerfd_create(CLOCK_MONOTONIC, 0); if (fd == -1) { - flow_dbg_perror(conn, "failed to get timer"); + flow_perror_ratelimit(conn, now, "failed to get timer"); return; } if (fd > FD_REF_MAX) { - flow_dbg(conn, "timer fd overflow (%d > %d)", - fd, FD_REF_MAX); + flow_err_ratelimit(conn, now, + "timer fd overflow (%d > %d)", + fd, FD_REF_MAX); close(fd); return; } @@ -575,7 +583,7 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn) ref.fd = fd; if (epoll_add(flow_epollfd(&conn->f), EPOLLIN | EPOLLET, ref) < 0) { - flow_dbg(conn, "failed to add timer"); + flow_perror_ratelimit(conn, now, "failed to add timer"); close(fd); return; } @@ -622,9 +630,10 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn) * @c: Execution context * @conn: Connection pointer * @flag: Flag to set, or ~flag to unset + * @now: Current timestamp */ void conn_flag_do(const struct ctx *c, struct tcp_tap_conn *conn, - unsigned long flag) + unsigned long flag, const struct timespec *now) { if (flag & (flag - 1)) { int flag_index = fls(~flag); @@ -646,7 +655,7 @@ void conn_flag_do(const struct ctx *c, struct tcp_tap_conn *conn, * flags and factor this into the logic below. */ if (flag == ACK_FROM_TAP_DUE) - tcp_timer_ctl(c, conn); + tcp_timer_ctl(c, conn, now); return; } @@ -662,7 +671,7 @@ void conn_flag_do(const struct ctx *c, struct tcp_tap_conn *conn, if (flag == ACK_FROM_TAP_DUE || flag == ACK_TO_TAP_DUE || (flag == ~ACK_FROM_TAP_DUE && (conn->flags & ACK_TO_TAP_DUE)) || (flag == ~ACK_TO_TAP_DUE && (conn->flags & ACK_FROM_TAP_DUE))) - tcp_timer_ctl(c, conn); + tcp_timer_ctl(c, conn, now); } /** @@ -670,9 +679,10 @@ void conn_flag_do(const struct ctx *c, struct tcp_tap_conn *conn, * @c: Execution context * @conn: Connection pointer * @event: Connection event + * @now: Current timestamp */ void conn_event_do(const struct ctx *c, struct tcp_tap_conn *conn, - unsigned long event) + unsigned long event, const struct timespec *now) { int prev, new, num = fls(event); @@ -710,7 +720,7 @@ void conn_event_do(const struct ctx *c, struct tcp_tap_conn *conn, num == -1 ? "CLOSED" : tcp_event_str[num]); if ((event == TAP_FIN_RCVD) && !(conn->events & SOCK_FIN_RCVD)) { - conn_flag(c, conn, ACTIVE_CLOSE); + conn_flag(c, conn, ACTIVE_CLOSE, now); } else { if (event == CLOSED) flow_hash_remove(c, TAP_SIDX(conn)); @@ -1101,13 +1111,15 @@ static uint32_t tcp_wnd_from_sndbuf(int s, struct tcp_tap_conn *conn, * @conn: Connection pointer * @force_seq: Force ACK sequence to latest segment, instead of checking socket * @tinfo: tcp_info from kernel, can be NULL if not pre-fetched + * @now: Current timestamp * * Return: 1 if sequence or window were updated, 0 otherwise * * #syscalls ioctl */ int tcp_update_seqack_wnd(const struct ctx *c, struct tcp_tap_conn *conn, - bool force_seq, struct tcp_info_linux *tinfo) + bool force_seq, struct tcp_info_linux *tinfo, + const struct timespec *now) { uint32_t prev_wnd_to_tap = conn->wnd_to_tap << conn->ws_to_tap; uint32_t prev_ack_to_tap = conn->seq_ack_to_tap; @@ -1221,7 +1233,7 @@ int tcp_update_seqack_wnd(const struct ctx *c, struct tcp_tap_conn *conn, */ /* cppcheck-suppress [knownConditionTrueFalse, unmatchedSuppression] */ if (!conn->wnd_to_tap) - conn_flag(c, conn, ACK_TO_TAP_DUE); + conn_flag(c, conn, ACK_TO_TAP_DUE, now); out: /* Opportunistically store RTT approximation on valid TCP_INFO data */ @@ -1237,17 +1249,19 @@ out: * @c: Execution context * @conn: Connection pointer * @seq: Current ACK sequence, host order + * @now: Current timestamp */ static void tcp_update_seqack_from_tap(const struct ctx *c, - struct tcp_tap_conn *conn, uint32_t seq) + struct tcp_tap_conn *conn, uint32_t seq, + const struct timespec *now) { if (seq == conn->seq_to_tap) - conn_flag(c, conn, ~ACK_FROM_TAP_DUE); + conn_flag(c, conn, ~ACK_FROM_TAP_DUE, now); if (SEQ_GT(seq, conn->seq_ack_from_tap)) { /* Forward progress, but more data to acknowledge: reschedule */ if (SEQ_LT(seq, conn->seq_to_tap)) - conn_flag(c, conn, ACK_FROM_TAP_DUE); + conn_flag(c, conn, ACK_FROM_TAP_DUE, now); conn->retries = 0; conn->seq_ack_from_tap = seq; @@ -1258,16 +1272,18 @@ static void tcp_update_seqack_from_tap(const struct ctx *c, * tcp_rewind_seq() - Rewind sequence to tap and socket offset to current ACK * @c: Execution context * @conn: Connection pointer + * @now: Current timestamp * * Return: 0 on success, -1 on failure, with connection reset */ -static int tcp_rewind_seq(const struct ctx *c, struct tcp_tap_conn *conn) +static int tcp_rewind_seq(const struct ctx *c, struct tcp_tap_conn *conn, + const struct timespec *now) { conn->seq_to_tap = conn->seq_ack_from_tap; conn->events &= ~TAP_FIN_SENT; - if (tcp_set_peek_offset(conn, 0)) { - tcp_rst(c, conn); + if (tcp_set_peek_offset(conn, 0, now)) { + tcp_rst(c, conn, now); return -1; } @@ -1282,6 +1298,7 @@ static int tcp_rewind_seq(const struct ctx *c, struct tcp_tap_conn *conn) * @th: TCP header to update * @opts: TCP option buffer (output parameter) * @optlen: size of the TCP option buffer (output parameter) + * @now: Current timestamp * * Return: < 0 error code on connection reset, * 0 if there is no flag to send @@ -1289,7 +1306,7 @@ static int tcp_rewind_seq(const struct ctx *c, struct tcp_tap_conn *conn) */ int tcp_prepare_flags(const struct ctx *c, struct tcp_tap_conn *conn, int flags, struct tcphdr *th, struct tcp_syn_opts *opts, - size_t *optlen) + size_t *optlen, const struct timespec *now) { struct tcp_info_linux tinfo = { 0 }; socklen_t sl = sizeof(tinfo); @@ -1297,19 +1314,19 @@ int tcp_prepare_flags(const struct ctx *c, struct tcp_tap_conn *conn, if (SEQ_GE(conn->seq_ack_to_tap, conn->seq_from_tap) && !flags && conn->wnd_to_tap) { - conn_flag(c, conn, ~ACK_TO_TAP_DUE); + conn_flag(c, conn, ~ACK_TO_TAP_DUE, now); return 0; } if (getsockopt(s, SOL_TCP, TCP_INFO, &tinfo, &sl)) { - conn_event(c, conn, CLOSED); + conn_event(c, conn, CLOSED, now); return -ECONNRESET; } if (!(conn->flags & LOCAL)) tcp_rtt_dst_check(conn, &tinfo); - if (!tcp_update_seqack_wnd(c, conn, !!flags, &tinfo) && !flags) + if (!tcp_update_seqack_wnd(c, conn, !!flags, &tinfo, now) && !flags) return 0; *optlen = 0; @@ -1351,13 +1368,13 @@ int tcp_prepare_flags(const struct ctx *c, struct tcp_tap_conn *conn, if (th->ack) { if (SEQ_GE(conn->seq_ack_to_tap, conn->seq_from_tap) && conn->wnd_to_tap) - conn_flag(c, conn, ~ACK_TO_TAP_DUE); + conn_flag(c, conn, ~ACK_TO_TAP_DUE, now); else - conn_flag(c, conn, ACK_TO_TAP_DUE); + conn_flag(c, conn, ACK_TO_TAP_DUE, now); } if (th->fin) - conn_flag(c, conn, ACK_FROM_TAP_DUE); + conn_flag(c, conn, ACK_FROM_TAP_DUE, now); /* RFC 793, 3.1: "[...] and the first data octet is ISN+1." */ if (th->fin || th->syn) @@ -1371,18 +1388,19 @@ int tcp_prepare_flags(const struct ctx *c, struct tcp_tap_conn *conn, * @c: Execution context * @conn: Connection pointer * @flags: TCP flags: if not set, send segment only if ACK is due + * @now: Current timestamp * * Return: negative error code on fatal connection failure, 0 otherwise */ static int tcp_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, - int flags) + int flags, const struct timespec *now) { int ret; if (c->mode == MODE_VU) - ret = tcp_vu_send_flag(c, conn, flags); + ret = tcp_vu_send_flag(c, conn, flags, now); else - ret = tcp_buf_send_flag(c, conn, flags); + ret = tcp_buf_send_flag(c, conn, flags, now); return ret == -EAGAIN ? 0 : ret; } @@ -1413,28 +1431,32 @@ void tcp_linger0_(const struct flow_common *f, int s) * tcp_sock_rst() - Close TCP connection forcing RST on socket side * @c: Execution context * @conn: Connection pointer + * @now: Current timestamp */ -static void tcp_sock_rst(const struct ctx *c, struct tcp_tap_conn *conn) +static void tcp_sock_rst(const struct ctx *c, struct tcp_tap_conn *conn, + const struct timespec *now) { /* Force RST on socket to inform the peer */ tcp_linger0(conn, conn->sock); - conn_event(c, conn, CLOSED); + conn_event(c, conn, CLOSED, now); } /** * tcp_rst_do() - Reset a tap connection: send RST segment on both sides, close * @c: Execution context * @conn: Connection pointer + * @now: Current timestamp */ -void tcp_rst_do(const struct ctx *c, struct tcp_tap_conn *conn) +void tcp_rst_do(const struct ctx *c, struct tcp_tap_conn *conn, + const struct timespec *now) { if (conn->events == CLOSED) return; /* Send RST on tap */ - tcp_send_flag(c, conn, RST); + tcp_send_flag(c, conn, RST, now); - tcp_sock_rst(c, conn); + tcp_sock_rst(c, conn, now); } /** @@ -1459,11 +1481,13 @@ static void tcp_get_tap_ws(struct tcp_tap_conn *conn, * @c: Execution context * @conn: Connection pointer * @wnd: Window value, host order, unscaled + * @now: Current timestamp * * Return: false on zero window (not stored to wnd_from_tap), true otherwise */ static bool tcp_tap_window_update(const struct ctx *c, - struct tcp_tap_conn *conn, unsigned wnd) + struct tcp_tap_conn *conn, unsigned wnd, + const struct timespec *now) { wnd = MIN(MAX_WINDOW, wnd << conn->ws_from_tap); @@ -1474,7 +1498,7 @@ static bool tcp_tap_window_update(const struct ctx *c, * that no data beyond the updated window will be acknowledged. */ if (!wnd && SEQ_LT(conn->seq_ack_from_tap, conn->seq_to_tap)) { - tcp_rewind_seq(c, conn); + tcp_rewind_seq(c, conn, now); return false; } @@ -1600,9 +1624,11 @@ static uint16_t tcp_conn_tap_mss(const struct tcp_tap_conn *conn, * @c: Execution context * @conn: Connection entry for socket to bind * @s: Outbound TCP socket + * @now: Current timestamp */ static void tcp_bind_outbound(const struct ctx *c, - const struct tcp_tap_conn *conn, int s) + const struct tcp_tap_conn *conn, int s, + const struct timespec *now) { const struct flowside *tgt = &conn->f.side[TGTSIDE]; union sockaddr_inany bind_sa; @@ -1613,10 +1639,11 @@ static void tcp_bind_outbound(const struct ctx *c, if (bind(s, &bind_sa.sa, socklen_inany(&bind_sa))) { char sstr[INANY_ADDRSTRLEN]; - flow_dbg_perror(conn, - "Can't bind TCP outbound socket to %s:%hu", - inany_ntop(&tgt->oaddr, sstr, sizeof(sstr)), - tgt->oport); + flow_warn_perror_ratelimit( + conn, now, + "Can't bind TCP outbound socket to %s:%hu", + inany_ntop(&tgt->oaddr, sstr, sizeof(sstr)), + tgt->oport); } } @@ -1625,9 +1652,10 @@ static void tcp_bind_outbound(const struct ctx *c, if (setsockopt(s, SOL_SOCKET, SO_BINDTODEVICE, c->ip4.ifname_out, strlen(c->ip4.ifname_out))) { - flow_dbg_perror(conn, - "Can't bind IPv4 TCP socket to interface %s", - c->ip4.ifname_out); + flow_warn_perror_ratelimit( + conn, now, + "Can't bind IPv4 TCP socket to interface %s", + c->ip4.ifname_out); } } } else if (bind_sa.sa_family == AF_INET6) { @@ -1635,9 +1663,10 @@ static void tcp_bind_outbound(const struct ctx *c, if (setsockopt(s, SOL_SOCKET, SO_BINDTODEVICE, c->ip6.ifname_out, strlen(c->ip6.ifname_out))) { - flow_dbg_perror(conn, - "Can't bind IPv6 TCP socket to interface %s", - c->ip6.ifname_out); + flow_warn_perror_ratelimit( + conn, now, + "Can't bind IPv6 TCP socket to interface %s", + c->ip6.ifname_out); } } } @@ -1681,9 +1710,11 @@ static void tcp_conn_from_tap(const struct ctx *c, sa_family_t af, goto cancel; if (flow->f.pif[TGTSIDE] != PIF_HOST) { - flow_err(flow, "No support for forwarding TCP from %s to %s", - pif_name(flow->f.pif[INISIDE]), - pif_name(flow->f.pif[TGTSIDE])); + flow_err_ratelimit( + flow, now, + "No support for forwarding TCP from %s to %s", + pif_name(flow->f.pif[INISIDE]), + pif_name(flow->f.pif[TGTSIDE])); goto cancel; } @@ -1716,7 +1747,7 @@ static void tcp_conn_from_tap(const struct ctx *c, sa_family_t af, */ if (bind(s, &sa.sa, socklen_inany(&sa))) { if (errno != EADDRNOTAVAIL && errno != EACCES) - conn_flag(c, conn, LOCAL); + conn_flag(c, conn, LOCAL, now); } else { /* Not a local, bound destination, inconclusive test */ close(s); @@ -1728,10 +1759,10 @@ static void tcp_conn_from_tap(const struct ctx *c, sa_family_t af, conn->timer = -1; flow_epollid_set(&conn->f, EPOLLFD_ID_DEFAULT); if (flow_epoll_set(&conn->f, EPOLL_CTL_ADD, 0, s, TGTSIDE) < 0) { - flow_perror(flow, "Can't register with epoll"); + flow_perror_ratelimit(flow, now, "Can't register with epoll"); goto cancel; } - conn_event(c, conn, TAP_SYN_RCVD); + conn_event(c, conn, TAP_SYN_RCVD, now); conn->wnd_to_tap = WINDOW_DEFAULT; @@ -1756,11 +1787,11 @@ static void tcp_conn_from_tap(const struct ctx *c, sa_family_t af, conn->seq_to_tap = tcp_init_seq(hash, now); conn->seq_ack_from_tap = conn->seq_to_tap; - tcp_bind_outbound(c, conn, s); + tcp_bind_outbound(c, conn, s, now); if (connect(s, &sa.sa, socklen_inany(&sa))) { if (errno != EINPROGRESS) { - tcp_rst(c, conn); + tcp_rst(c, conn, now); goto cancel; } @@ -1768,10 +1799,10 @@ static void tcp_conn_from_tap(const struct ctx *c, sa_family_t af, } else { tcp_get_sndbuf(conn); - if (tcp_send_flag(c, conn, SYN | ACK)) + if (tcp_send_flag(c, conn, SYN | ACK, now)) goto cancel; - conn_event(c, conn, TAP_SYN_ACK_SENT); + conn_event(c, conn, TAP_SYN_ACK_SENT, now); } tcp_epoll_ctl(conn); @@ -1830,12 +1861,14 @@ static int tcp_sock_consume(const struct tcp_tap_conn *conn, uint32_t ack_seq) * tcp_data_from_sock() - Handle new data from socket, queue to tap, in window * @c: Execution context * @conn: Connection pointer + * @now: Current timestamp * * Return: negative on connection reset, 0 otherwise * * #syscalls recvmsg */ -static int tcp_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn) +static int tcp_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn, + const struct timespec *now) { uint32_t wnd_scaled = conn->wnd_from_tap << conn->ws_from_tap; uint32_t already_sent; @@ -1845,8 +1878,8 @@ static int tcp_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn) flow_trace(conn, "ACK sequence gap: ACK for %u, sent: %u", conn->seq_ack_from_tap, conn->seq_to_tap); conn->seq_to_tap = conn->seq_ack_from_tap; - if (tcp_set_peek_offset(conn, 0)) { - tcp_rst(c, conn); + if (tcp_set_peek_offset(conn, 0, now)) { + tcp_rst(c, conn, now); return -1; } } @@ -1855,16 +1888,16 @@ static int tcp_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn) already_sent = conn->seq_to_tap - conn->seq_ack_from_tap; if (!wnd_scaled || already_sent >= wnd_scaled) { - conn_flag(c, conn, ACK_FROM_TAP_BLOCKS); - conn_flag(c, conn, STALLED); - conn_flag(c, conn, ACK_FROM_TAP_DUE); + conn_flag(c, conn, ACK_FROM_TAP_BLOCKS, now); + conn_flag(c, conn, STALLED, now); + conn_flag(c, conn, ACK_FROM_TAP_DUE, now); return 0; } if (c->mode == MODE_VU) - return tcp_vu_data_from_sock(c, conn, already_sent); + return tcp_vu_data_from_sock(c, conn, already_sent, now); - return tcp_buf_data_from_sock(c, conn, already_sent); + return tcp_buf_data_from_sock(c, conn, already_sent, now); } /** @@ -1890,13 +1923,15 @@ static ssize_t tcp_packet_data_len(const struct tcphdr *th, size_t l4len) * @conn: Connection pointer * @p: Pool of TCP packets, with TCP headers * @idx: Index of first data packet in pool + * @now: Current timestamp * * #syscalls sendmsg * * Return: count of consumed packets */ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn, - const struct pool *p, int idx) + const struct pool *p, int idx, + const struct timespec *now) { int i, iov_i, ack = 0, fin = 0, retr = 0, keep = -1, partial_send = 0; uint16_t max_ack_seq_wnd = conn->wnd_from_tap; @@ -1933,7 +1968,7 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn, return -1; if (th->rst) { - tcp_sock_rst(c, conn); + tcp_sock_rst(c, conn, now); return 1; } @@ -1946,10 +1981,10 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn, "keep-alive sequence: %u, previous: %u", seq, conn->seq_from_tap); - if (tcp_send_flag(c, conn, ACK)) + if (tcp_send_flag(c, conn, ACK, now)) return -1; - tcp_timer_ctl(c, conn); + tcp_timer_ctl(c, conn, now); if (setsockopt(conn->sock, SOL_SOCKET, SO_KEEPALIVE, &((int){ 1 }), sizeof(int))) @@ -1957,7 +1992,7 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn, if (p->count == 1) { tcp_tap_window_update(c, conn, - ntohs(th->window)); + ntohs(th->window), now); return 1; } @@ -1983,7 +2018,7 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn, * well. */ if (!ntohs(th->window)) - tcp_rewind_seq(c, conn); + tcp_rewind_seq(c, conn, now); max_ack_seq_wnd = ntohs(th->window); max_ack_seq = ack_seq; @@ -2043,19 +2078,19 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn, /* On socket flush failure, pretend there was no ACK, try again later */ if (ack && !tcp_sock_consume(conn, max_ack_seq)) - tcp_update_seqack_from_tap(c, conn, max_ack_seq); + tcp_update_seqack_from_tap(c, conn, max_ack_seq, now); - tcp_tap_window_update(c, conn, max_ack_seq_wnd); + tcp_tap_window_update(c, conn, max_ack_seq_wnd, now); if (retr) { flow_trace(conn, "fast re-transmit, ACK: %u, previous sequence: %u", conn->seq_ack_from_tap, conn->seq_to_tap); - if (tcp_rewind_seq(c, conn)) + if (tcp_rewind_seq(c, conn, now)) return -1; - tcp_data_from_sock(c, conn); + tcp_data_from_sock(c, conn, now); } if (!iov_i) @@ -2071,7 +2106,7 @@ eintr: * Then swiftly looked away and left. */ conn->seq_from_tap = seq_from_tap; - if (tcp_send_flag(c, conn, ACK)) + if (tcp_send_flag(c, conn, ACK, now)) return -1; } @@ -2079,7 +2114,7 @@ eintr: goto eintr; if (errno == EAGAIN || errno == EWOULDBLOCK) { - if (tcp_send_flag(c, conn, ACK | DUP_ACK)) + if (tcp_send_flag(c, conn, ACK | DUP_ACK, now)) return -1; uint32_t events = tcp_conn_epoll_events(conn->events, @@ -2115,7 +2150,7 @@ out: */ if (conn->seq_dup_ack_approx != (conn->seq_from_tap & 0xff)) { conn->seq_dup_ack_approx = conn->seq_from_tap & 0xff; - if (tcp_send_flag(c, conn, ACK | DUP_ACK)) + if (tcp_send_flag(c, conn, ACK | DUP_ACK, now)) return -1; } return p->count - idx; @@ -2123,14 +2158,14 @@ out: if (ack && conn->events & TAP_FIN_SENT && conn->seq_ack_from_tap == conn->seq_to_tap) - conn_event(c, conn, TAP_FIN_ACKED); + conn_event(c, conn, TAP_FIN_ACKED, now); if (fin && !partial_send) { conn->seq_from_tap++; - conn_event(c, conn, TAP_FIN_RCVD); + conn_event(c, conn, TAP_FIN_RCVD, now); } else { - if (tcp_send_flag(c, conn, ACK_IF_NEEDED)) + if (tcp_send_flag(c, conn, ACK_IF_NEEDED, now)) return -1; } @@ -2144,13 +2179,15 @@ out: * @th: TCP header of SYN, ACK segment: caller MUST ensure it's there * @opts: Pointer to start of options * @optlen: Bytes in options: caller MUST ensure available length + * @now: Current timestamp */ static void tcp_conn_from_sock_finish(const struct ctx *c, struct tcp_tap_conn *conn, const struct tcphdr *th, - const char *opts, size_t optlen) + const char *opts, size_t optlen, + const struct timespec *now) { - tcp_tap_window_update(c, conn, ntohs(th->window)); + tcp_tap_window_update(c, conn, ntohs(th->window), now); tcp_get_tap_ws(conn, opts, optlen); /* First value is not scaled */ @@ -2163,21 +2200,21 @@ static void tcp_conn_from_sock_finish(const struct ctx *c, conn->seq_from_tap = conn->seq_init_from_tap; conn->seq_ack_to_tap = conn->seq_from_tap; - conn_event(c, conn, ESTABLISHED); - if (tcp_set_peek_offset(conn, 0)) { - tcp_rst(c, conn); + conn_event(c, conn, ESTABLISHED, now); + if (tcp_set_peek_offset(conn, 0, now)) { + tcp_rst(c, conn, now); return; } - if (tcp_send_flag(c, conn, ACK)) { - tcp_rst(c, conn); + if (tcp_send_flag(c, conn, ACK, now)) { + tcp_rst(c, conn, now); return; } /* The client might have sent data already, which we didn't * dequeue waiting for SYN,ACK from tap -- check now. */ - tcp_data_from_sock(c, conn); + tcp_data_from_sock(c, conn, now); } /** @@ -2318,7 +2355,7 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af, flow_trace(conn, "packet length %zu from tap", l4len); if (th->rst) { - tcp_sock_rst(c, conn); + tcp_sock_rst(c, conn, now); return 1; } @@ -2326,12 +2363,13 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af, conn->tap_inactive = false; if (th->ack && !(conn->events & ESTABLISHED)) - tcp_update_seqack_from_tap(c, conn, ntohl(th->ack_seq)); + tcp_update_seqack_from_tap(c, conn, ntohl(th->ack_seq), now); /* Establishing connection from socket */ if (conn->events & SOCK_ACCEPTED) { if (th->syn && th->ack && !th->fin) { - tcp_conn_from_sock_finish(c, conn, th, opts, optlen); + tcp_conn_from_sock_finish(c, conn, th, opts, optlen, + now); return 1; } @@ -2346,22 +2384,23 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af, if (!(conn->events & TAP_SYN_ACK_SENT)) goto reset; - conn_event(c, conn, ESTABLISHED); - if (tcp_set_peek_offset(conn, 0)) + conn_event(c, conn, ESTABLISHED, now); + if (tcp_set_peek_offset(conn, 0, now)) goto reset; if (th->fin) { conn->seq_from_tap++; if (shutdown(conn->sock, SHUT_WR) < 0) { - flow_dbg_perror(conn, "shutdown() failed"); + flow_warn_perror_ratelimit(conn, now, + "shutdown() failed"); goto reset; } - if (tcp_send_flag(c, conn, ACK)) + if (tcp_send_flag(c, conn, ACK, now)) goto reset; - conn_event(c, conn, SOCK_FIN_SENT); + conn_event(c, conn, SOCK_FIN_SENT, now); return 1; } @@ -2369,8 +2408,8 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af, if (!th->ack) goto reset; - if (tcp_tap_window_update(c, conn, ntohs(th->window))) - tcp_data_from_sock(c, conn); + if (tcp_tap_window_update(c, conn, ntohs(th->window), now)) + tcp_data_from_sock(c, conn, now); if (p->count - idx == 1) return 1; @@ -2394,38 +2433,40 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af, * later */ if (th->ack && !tcp_sock_consume(conn, ntohl(th->ack_seq))) - tcp_update_seqack_from_tap(c, conn, ntohl(th->ack_seq)); + tcp_update_seqack_from_tap(c, conn, ntohl(th->ack_seq), + now); if (retr) { flow_trace(conn, "fast re-transmit, ACK: %u, previous sequence: %u", ntohl(th->ack_seq), conn->seq_to_tap); - if (tcp_rewind_seq(c, conn)) + if (tcp_rewind_seq(c, conn, now)) return -1; } - if (tcp_tap_window_update(c, conn, ntohs(th->window)) || retr) - tcp_data_from_sock(c, conn); + if (tcp_tap_window_update(c, conn, ntohs(th->window), now) || + retr) + tcp_data_from_sock(c, conn, now); if (conn->seq_ack_from_tap == conn->seq_to_tap) { if (th->ack && conn->events & TAP_FIN_SENT) - conn_event(c, conn, TAP_FIN_ACKED); + conn_event(c, conn, TAP_FIN_ACKED, now); if (conn->events & SOCK_FIN_RCVD && conn->events & TAP_FIN_ACKED) - conn_event(c, conn, CLOSED); + conn_event(c, conn, CLOSED, now); } return 1; } /* Established connections accepting data from tap */ - count = tcp_data_from_tap(c, conn, p, idx); + count = tcp_data_from_tap(c, conn, p, idx, now); if (count == -1) goto reset; - conn_flag(c, conn, ~STALLED); + conn_flag(c, conn, ~STALLED, now); if (conn->seq_ack_to_tap != conn->seq_from_tap) ack_due = 1; @@ -2435,12 +2476,13 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af, struct tcp_info tinfo; if (shutdown(conn->sock, SHUT_WR) < 0) { - flow_dbg_perror(conn, "shutdown() failed"); + flow_warn_perror_ratelimit(conn, now, + "shutdown() failed"); goto reset; } - conn_event(c, conn, SOCK_FIN_SENT); - if (tcp_send_flag(c, conn, ACK)) + conn_event(c, conn, SOCK_FIN_SENT, now); + if (tcp_send_flag(c, conn, ACK, now)) goto reset; ack_due = 0; @@ -2461,7 +2503,7 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af, } if (ack_due) - conn_flag(c, conn, ACK_TO_TAP_DUE); + conn_flag(c, conn, ACK_TO_TAP_DUE, now); return count; @@ -2470,7 +2512,7 @@ reset: * remaining packets in the batch, since they'd be invalidated when our * RST is received, even if otherwise good. */ - tcp_rst(c, conn); + tcp_rst(c, conn, now); return p->count - idx; } @@ -2478,25 +2520,27 @@ reset: * tcp_connect_finish() - Handle completion of connect() from EPOLLOUT event * @c: Execution context * @conn: Connection pointer + * @now: Current timestamp */ -static void tcp_connect_finish(const struct ctx *c, struct tcp_tap_conn *conn) +static void tcp_connect_finish(const struct ctx *c, struct tcp_tap_conn *conn, + const struct timespec *now) { socklen_t sl; int so; sl = sizeof(so); if (getsockopt(conn->sock, SOL_SOCKET, SO_ERROR, &so, &sl) || so) { - tcp_rst(c, conn); + tcp_rst(c, conn, now); return; } - if (tcp_send_flag(c, conn, SYN | ACK)) { - tcp_rst(c, conn); + if (tcp_send_flag(c, conn, SYN | ACK, now)) { + tcp_rst(c, conn, now); return; } - conn_event(c, conn, TAP_SYN_ACK_SENT); - conn_flag(c, conn, ACK_FROM_TAP_DUE); + conn_event(c, conn, TAP_SYN_ACK_SENT, now); + conn_flag(c, conn, ACK_FROM_TAP_DUE, now); } /** @@ -2519,13 +2563,13 @@ static void tcp_tap_conn_from_sock(const struct ctx *c, union flow *flow, flow_epollid_set(&conn->f, EPOLLFD_ID_DEFAULT); if (flow_epoll_set(&conn->f, EPOLL_CTL_ADD, 0, s, INISIDE) < 0) { - flow_perror(flow, "Can't register with epoll"); - conn_flag(c, conn, CLOSING); + flow_perror_ratelimit(flow, now, "Can't register with epoll"); + conn_flag(c, conn, CLOSING, now); FLOW_ACTIVATE(conn); return; } - conn_event(c, conn, SOCK_ACCEPTED); + conn_event(c, conn, SOCK_ACCEPTED, now); hash = flow_hash_insert(c, TAP_SIDX(conn)); conn->seq_to_tap = tcp_init_seq(hash, now); @@ -2534,13 +2578,13 @@ static void tcp_tap_conn_from_sock(const struct ctx *c, union flow *flow, conn->wnd_from_tap = WINDOW_DEFAULT; - if (tcp_send_flag(c, conn, SYN)) { - conn_flag(c, conn, CLOSING); + if (tcp_send_flag(c, conn, SYN, now)) { + conn_flag(c, conn, CLOSING, now); FLOW_ACTIVATE(conn); return; } - conn_flag(c, conn, ACK_FROM_TAP_DUE); + conn_flag(c, conn, ACK_FROM_TAP_DUE, now); tcp_get_sndbuf(conn); @@ -2598,7 +2642,7 @@ void tcp_listen_handler(const struct ctx *c, union epoll_ref ref, switch (flow->f.pif[TGTSIDE]) { case PIF_SPLICE: case PIF_HOST: - tcp_splice_conn_from_sock(c, flow, s); + tcp_splice_conn_from_sock(c, flow, s, now); break; case PIF_TAP: @@ -2606,9 +2650,11 @@ void tcp_listen_handler(const struct ctx *c, union epoll_ref ref, break; default: - flow_err(flow, "No support for forwarding TCP from %s to %s", - pif_name(flow->f.pif[INISIDE]), - pif_name(flow->f.pif[TGTSIDE])); + flow_err_ratelimit( + flow, now, + "No support for forwarding TCP from %s to %s", + pif_name(flow->f.pif[INISIDE]), + pif_name(flow->f.pif[TGTSIDE])); goto rst; } @@ -2625,12 +2671,14 @@ cancel: * tcp_timer_handler() - timerfd events: close, send ACK, retransmit, or reset * @c: Execution context * @ref: epoll reference of timer (not connection) + * @now: Current timestamp * * #syscalls timerfd_gettime|timerfd_gettime64 * #syscalls arm:timerfd_gettime64 i686:timerfd_gettime64 * #syscalls arm:timerfd_settime64 i686:timerfd_settime64 */ -void tcp_timer_handler(const struct ctx *c, union epoll_ref ref) +void tcp_timer_handler(const struct ctx *c, union epoll_ref ref, + const struct timespec *now) { struct itimerspec check_armed = { { 0 }, { 0 } }; struct tcp_tap_conn *conn = &FLOW(ref.flow)->tcp; @@ -2643,17 +2691,17 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref) * and we just set the timer to a new point in the future: discard it. */ if (timerfd_gettime(conn->timer, &check_armed)) - flow_perror(conn, "failed to read timer"); + flow_perror_ratelimit(conn, now, "failed to read timer"); if (check_armed.it_value.tv_sec || check_armed.it_value.tv_nsec) return; if (conn->flags & ACK_TO_TAP_DUE) { - if (tcp_send_flag(c, conn, ACK_IF_NEEDED)) { - tcp_rst(c, conn); + if (tcp_send_flag(c, conn, ACK_IF_NEEDED, now)) { + tcp_rst(c, conn, now); return; } - tcp_timer_ctl(c, conn); + tcp_timer_ctl(c, conn, now); } else if (conn->flags & ACK_FROM_TAP_DUE) { if (!(conn->events & ESTABLISHED)) { unsigned int max; @@ -2662,20 +2710,20 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref) max = MIN(TCP_MAX_RETRIES, max); if (conn->retries >= max) { flow_dbg(conn, "handshake timeout"); - tcp_rst(c, conn); + tcp_rst(c, conn, now); } else { flow_trace(conn, "SYN timeout, retry"); - if (tcp_send_flag(c, conn, SYN)) { - tcp_rst(c, conn); + if (tcp_send_flag(c, conn, SYN, now)) { + tcp_rst(c, conn, now); return; } conn->retries++; - conn_flag(c, conn, SYN_RETRIED); - tcp_timer_ctl(c, conn); + conn_flag(c, conn, SYN_RETRIED, now); + tcp_timer_ctl(c, conn, now); } } else if (conn->retries == TCP_MAX_RETRIES) { flow_dbg(conn, "retransmissions count exceeded"); - tcp_rst(c, conn); + tcp_rst(c, conn, now); } else { flow_dbg(conn, "ACK timeout, retry"); @@ -2683,11 +2731,11 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref) conn->wnd_from_tap = 1; /* Zero-window probe */ conn->retries++; - if (tcp_rewind_seq(c, conn)) + if (tcp_rewind_seq(c, conn, now)) return; - tcp_data_from_sock(c, conn); - tcp_timer_ctl(c, conn); + tcp_data_from_sock(c, conn, now); + tcp_timer_ctl(c, conn, now); } } } @@ -2697,9 +2745,10 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref) * @c: Execution context * @ref: epoll reference * @events: epoll events bitmap + * @now: Current timestamp */ void tcp_sock_handler(const struct ctx *c, union epoll_ref ref, - uint32_t events) + uint32_t events, const struct timespec *now) { struct tcp_tap_conn *conn = conn_at_sidx(ref.flowside); @@ -2710,32 +2759,32 @@ void tcp_sock_handler(const struct ctx *c, union epoll_ref ref, return; if (events & EPOLLERR) { - tcp_rst(c, conn); + tcp_rst(c, conn, now); return; } conn->inactive = false; if ((conn->events & TAP_FIN_ACKED) && (events & EPOLLHUP)) { - conn_event(c, conn, CLOSED); + conn_event(c, conn, CLOSED, now); return; } if (conn->events & ESTABLISHED) { if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) - conn_event(c, conn, CLOSED); + conn_event(c, conn, CLOSED, now); if (events & (EPOLLRDHUP | EPOLLHUP)) - conn_event(c, conn, SOCK_FIN_RCVD); + conn_event(c, conn, SOCK_FIN_RCVD, now); if (events & EPOLLIN) - tcp_data_from_sock(c, conn); + tcp_data_from_sock(c, conn, now); if (events & EPOLLOUT) { tcp_epoll_ctl(conn); - if (tcp_update_seqack_wnd(c, conn, false, NULL) && - tcp_send_flag(c, conn, ACK)) { - tcp_rst(c, conn); + if (tcp_update_seqack_wnd(c, conn, false, NULL, now) && + tcp_send_flag(c, conn, ACK, now)) { + tcp_rst(c, conn, now); return; } } @@ -2745,7 +2794,7 @@ void tcp_sock_handler(const struct ctx *c, union epoll_ref ref, /* EPOLLHUP during handshake: reset */ if (events & EPOLLHUP) { - tcp_rst(c, conn); + tcp_rst(c, conn, now); return; } @@ -2755,7 +2804,7 @@ void tcp_sock_handler(const struct ctx *c, union epoll_ref ref, if (conn->events == TAP_SYN_RCVD) { if (events & EPOLLOUT) - tcp_connect_finish(c, conn); + tcp_connect_finish(c, conn, now); /* Data? Check later */ } } @@ -2939,8 +2988,8 @@ static void tcp_keepalive(struct ctx *c, const struct timespec *now) if (conn->tap_inactive) { flow_dbg(conn, "No tap activity for least %us, send keepalive", KEEPALIVE_INTERVAL); - if (tcp_send_flag(c, conn, KEEPALIVE)) - tcp_rst(c, conn); + if (tcp_send_flag(c, conn, KEEPALIVE, now)) + tcp_rst(c, conn, now); } /* Ready to check fot next interval */ @@ -2969,7 +3018,7 @@ static void tcp_inactivity(struct ctx *c, const struct timespec *now) /* No activity in this interval, reset */ flow_dbg(conn, "Inactive for at least %us, resetting", INACTIVITY_INTERVAL); - tcp_rst(c, conn); + tcp_rst(c, conn, now); } /* Ready to check fot next interval */ @@ -2985,7 +3034,7 @@ static void tcp_inactivity(struct ctx *c, const struct timespec *now) /* cppcheck-suppress [constParameterPointer, unmatchedSuppression] */ void tcp_defer_handler(struct ctx *c, const struct timespec *now) { - tcp_payload_flush(c); + tcp_payload_flush(c, now); if (timespec_diff_ms(now, &c->tcp.timer_run) < TCP_TIMER_INTERVAL) return; @@ -3532,11 +3581,13 @@ int tcp_flow_migrate_source(int fd, struct tcp_tap_conn *conn) * @c: Execution context * @fd: Descriptor for state migration * @conn: Pointer to the TCP connection structure + * @now: Current timesstamp * * Return: 0 on success, negative (not -EIO) on failure, -EIO on sending failure */ int tcp_flow_migrate_source_ext(const struct ctx *c, - int fd, const struct tcp_tap_conn *conn) + int fd, const struct tcp_tap_conn *conn, + const struct timespec *now) { uint32_t peek_offset = conn->seq_to_tap - conn->seq_ack_from_tap; struct tcp_tap_transfer_ext *t = &migrate_ext[FLOW_IDX(conn)]; @@ -3546,7 +3597,7 @@ int tcp_flow_migrate_source_ext(const struct ctx *c, /* Disable SO_PEEK_OFF, it will make accessing the queues in repair mode * weird. */ - if (tcp_set_peek_offset(conn, -1)) { + if (tcp_set_peek_offset(conn, -1, now)) { rc = -errno; goto fail; } @@ -3811,10 +3862,12 @@ out: * @c: Execution context * @conn: Connection entry to complete with extra data * @fd: Descriptor for state migration + * @now: Current timestamp * * Return: 0 on success, negative on fatal failure, but 0 on single flow failure */ -int tcp_flow_migrate_target_ext(struct ctx *c, struct tcp_tap_conn *conn, int fd) +int tcp_flow_migrate_target_ext(struct ctx *c, struct tcp_tap_conn *conn, + int fd, const struct timespec *now) { uint32_t peek_offset = conn->seq_to_tap - conn->seq_ack_from_tap; struct tcp_tap_transfer_ext t; @@ -3968,13 +4021,13 @@ int tcp_flow_migrate_target_ext(struct ctx *c, struct tcp_tap_conn *conn, int fd } } - if (tcp_set_peek_offset(conn, peek_offset)) + if (tcp_set_peek_offset(conn, peek_offset, now)) goto fail; - if (tcp_send_flag(c, conn, ACK)) + if (tcp_send_flag(c, conn, ACK, now)) goto fail; - tcp_data_from_sock(c, conn); + tcp_data_from_sock(c, conn, now); if ((rc = tcp_epoll_ctl(conn))) { flow_dbg(conn, @@ -3992,7 +4045,7 @@ fail: } conn->flags = 0; /* Not waiting for ACK, don't schedule timer */ - tcp_rst(c, conn); + tcp_rst(c, conn, now); return 0; } diff --git a/tcp.h b/tcp.h index 57deef57..b68f882d 100644 --- a/tcp.h +++ b/tcp.h @@ -18,11 +18,12 @@ struct ctx; -void tcp_timer_handler(const struct ctx *c, union epoll_ref ref); +void tcp_timer_handler(const struct ctx *c, union epoll_ref ref, + const struct timespec *now); void tcp_listen_handler(const struct ctx *c, union epoll_ref ref, const struct timespec *now); void tcp_sock_handler(const struct ctx *c, union epoll_ref ref, - uint32_t events); + uint32_t events, const struct timespec *now); int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af, const void *saddr, const void *daddr, uint32_t flow_lbl, const struct pool *p, int idx, const struct timespec *now); diff --git a/tcp_buf.c b/tcp_buf.c index 1fc49959..72c45412 100644 --- a/tcp_buf.c +++ b/tcp_buf.c @@ -106,9 +106,11 @@ void tcp_sock_iov_init(const struct ctx *c) * @conns: Array of connection pointers corresponding to queued frames * @frames: Two-dimensional array containing queued frames with sub-iovs * @num_frames: Number of entries in the two arrays to be compared + * @now: Current timestamp */ static void tcp_revert_seq(const struct ctx *c, struct tcp_tap_conn **conns, - struct iovec (*frames)[TCP_NUM_IOVS], int num_frames) + struct iovec (*frames)[TCP_NUM_IOVS], int num_frames, + const struct timespec *now) { int i; @@ -123,16 +125,17 @@ static void tcp_revert_seq(const struct ctx *c, struct tcp_tap_conn **conns, conn->seq_to_tap = seq; peek_offset = conn->seq_to_tap - conn->seq_ack_from_tap; - if (tcp_set_peek_offset(conn, peek_offset)) - tcp_rst(c, conn); + if (tcp_set_peek_offset(conn, peek_offset, now)) + tcp_rst(c, conn, now); } } /** * tcp_payload_flush() - Send out buffers for segments with data or flags * @c: Execution context + * @now: Current timestamp */ -void tcp_payload_flush(const struct ctx *c) +void tcp_payload_flush(const struct ctx *c, const struct timespec *now) { size_t m; @@ -140,7 +143,7 @@ void tcp_payload_flush(const struct ctx *c) tcp_payload_used); if (m != tcp_payload_used) { tcp_revert_seq(c, &tcp_frame_conns[m], &tcp_l2_iov[m], - tcp_payload_used - m); + tcp_payload_used - m, now); } tcp_payload_used = 0; } @@ -198,13 +201,15 @@ static void tcp_l2_buf_fill_headers(const struct ctx *c, /** * tcp_buf_send_flag() - Send segment with flags to tap (no payload) - * @c: Execution context - * @conn: Connection pointer - * @flags: TCP flags: if not set, send segment only if ACK is due + * @c: Execution context + * @conn: Connection pointer + * @flags: TCP flags: if not set, send segment only if ACK is due + * @now: Current timestamp * * Return: negative error code on connection reset, 0 otherwise */ -int tcp_buf_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags) +int tcp_buf_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags, + const struct timespec *now) { struct tcp_payload_t *payload; struct iovec *iov; @@ -223,7 +228,8 @@ int tcp_buf_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags) payload = iov[TCP_IOV_PAYLOAD].iov_base; seq = conn->seq_to_tap; ret = tcp_prepare_flags(c, conn, flags, &payload->th, - (struct tcp_syn_opts *)&payload->data, &optlen); + (struct tcp_syn_opts *)&payload->data, + &optlen, now); if (ret <= 0) return ret; @@ -253,7 +259,7 @@ int tcp_buf_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags) } if (tcp_payload_used > TCP_FRAMES_MEM - 2) - tcp_payload_flush(c); + tcp_payload_flush(c, now); return 0; } @@ -266,9 +272,11 @@ int tcp_buf_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags) * @no_csum: Don't compute IPv4 checksum, use the one from previous buffer * @seq: Sequence number to be sent * @push: Set PSH flag, last segment in a batch + * @now: Current timestamp */ static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn, - ssize_t dlen, int no_csum, uint32_t seq, bool push) + ssize_t dlen, int no_csum, uint32_t seq, bool push, + const struct timespec *now) { struct tcp_payload_t *payload; uint32_t check = IP4_CSUM; @@ -302,7 +310,7 @@ static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn, tcp_l2_buf_pad(iov); if (++tcp_payload_used > TCP_FRAMES_MEM - 1) - tcp_payload_flush(c); + tcp_payload_flush(c, now); } /** @@ -310,13 +318,14 @@ static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn, * @c: Execution context * @conn: Connection pointer * @already_sent: Number of bytes already sent to tap, but not acked + * @now: Current timestamp * * Return: negative on connection reset, 0 otherwise * * #syscalls recvmsg */ int tcp_buf_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn, - uint32_t already_sent) + uint32_t already_sent, const struct timespec *now) { uint32_t wnd_scaled = conn->wnd_from_tap << conn->ws_from_tap; int fill_bufs, send_bufs = 0, last_len, iov_rem = 0; @@ -336,12 +345,12 @@ int tcp_buf_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn, } if (tcp_prepare_iov(&mh_sock, iov_sock, already_sent, fill_bufs)) { - tcp_rst(c, conn); + tcp_rst(c, conn, now); return -1; } if (tcp_payload_used + fill_bufs > TCP_FRAMES_MEM) { - tcp_payload_flush(c); + tcp_payload_flush(c, now); /* Silence Coverity CWE-125 false positive */ tcp_payload_used = 0; @@ -361,19 +370,19 @@ int tcp_buf_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn, if (len < 0) { if (errno != EAGAIN && errno != EWOULDBLOCK) { - tcp_rst(c, conn); + tcp_rst(c, conn, now); return -errno; } if (already_sent) /* No new data and EAGAIN: set EPOLLET */ - conn_flag(c, conn, STALLED); + conn_flag(c, conn, STALLED, now); return 0; } if (!len) { if (already_sent) { - conn_flag(c, conn, STALLED); + conn_flag(c, conn, STALLED, now); } else if ((conn->events & (SOCK_FIN_RCVD | TAP_FIN_SENT)) == SOCK_FIN_RCVD) { int ret; @@ -388,14 +397,14 @@ int tcp_buf_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn, */ conn->seq_ack_to_tap = conn->seq_from_tap; - ret = tcp_buf_send_flag(c, conn, FIN | ACK); + ret = tcp_buf_send_flag(c, conn, FIN | ACK, now); if (ret) { - tcp_rst(c, conn); + tcp_rst(c, conn, now); return ret; } - conn_event(c, conn, TAP_FIN_SENT); - conn_flag(c, conn, ACK_FROM_TAP_DUE); + conn_event(c, conn, TAP_FIN_SENT, now); + conn_flag(c, conn, ACK_FROM_TAP_DUE, now); } return 0; @@ -405,18 +414,18 @@ int tcp_buf_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn, len -= already_sent; if (len <= 0) { - conn_flag(c, conn, STALLED); + conn_flag(c, conn, STALLED, now); return 0; } - conn_flag(c, conn, ~ACK_FROM_TAP_BLOCKS); - conn_flag(c, conn, ~STALLED); + conn_flag(c, conn, ~ACK_FROM_TAP_BLOCKS, now); + conn_flag(c, conn, ~STALLED, now); send_bufs = DIV_ROUND_UP(len, mss); last_len = len - (send_bufs - 1) * mss; /* Likely, some new data was acked too. */ - tcp_update_seqack_wnd(c, conn, false, NULL); + tcp_update_seqack_wnd(c, conn, false, NULL, now); /* Finally, queue to tap */ dlen = mss; @@ -430,11 +439,11 @@ int tcp_buf_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn, push = true; } - tcp_data_to_tap(c, conn, dlen, no_csum, seq, push); + tcp_data_to_tap(c, conn, dlen, no_csum, seq, push, now); seq += dlen; } - conn_flag(c, conn, ACK_FROM_TAP_DUE); + conn_flag(c, conn, ACK_FROM_TAP_DUE, now); return 0; } diff --git a/tcp_buf.h b/tcp_buf.h index 710ed377..5d31cea6 100644 --- a/tcp_buf.h +++ b/tcp_buf.h @@ -7,9 +7,10 @@ #define TCP_BUF_H void tcp_sock_iov_init(const struct ctx *c); -void tcp_payload_flush(const struct ctx *c); +void tcp_payload_flush(const struct ctx *c, const struct timespec *now); int tcp_buf_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn, - uint32_t already_sent); -int tcp_buf_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags); + uint32_t already_sent, const struct timespec *now); +int tcp_buf_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags, + const struct timespec *now); #endif /*TCP_BUF_H */ diff --git a/tcp_conn.h b/tcp_conn.h index d93ada7d..e3c55341 100644 --- a/tcp_conn.h +++ b/tcp_conn.h @@ -248,11 +248,13 @@ int tcp_flow_repair_on(struct ctx *c, const struct tcp_tap_conn *conn); int tcp_flow_repair_off(struct ctx *c, const struct tcp_tap_conn *conn); int tcp_flow_migrate_source(int fd, struct tcp_tap_conn *conn); -int tcp_flow_migrate_source_ext(const struct ctx *c, int fd, - const struct tcp_tap_conn *conn); +int tcp_flow_migrate_source_ext(const struct ctx *c, + int fd, const struct tcp_tap_conn *conn, + const struct timespec *now); int tcp_flow_migrate_target(struct ctx *c, int fd); -int tcp_flow_migrate_target_ext(struct ctx *c, struct tcp_tap_conn *conn, int fd); +int tcp_flow_migrate_target_ext(struct ctx *c, struct tcp_tap_conn *conn, + int fd, const struct timespec *now); bool tcp_flow_is_established(const struct tcp_tap_conn *conn); diff --git a/tcp_internal.h b/tcp_internal.h index c623569c..1af25182 100644 --- a/tcp_internal.h +++ b/tcp_internal.h @@ -158,27 +158,28 @@ struct tcp_syn_opts { extern char tcp_buf_discard [BUF_DISCARD_SIZE]; void conn_flag_do(const struct ctx *c, struct tcp_tap_conn *conn, - unsigned long flag); -#define conn_flag(c, conn, flag) \ + unsigned long flag, const struct timespec *now); +#define conn_flag(c, conn, flag, now) \ do { \ flow_trace(conn, "flag at %s:%i", __func__, __LINE__); \ - conn_flag_do(c, conn, flag); \ + conn_flag_do(c, conn, flag, now); \ } while (0) void conn_event_do(const struct ctx *c, struct tcp_tap_conn *conn, - unsigned long event); -#define conn_event(c, conn, event) \ + unsigned long event, const struct timespec *now); +#define conn_event(c, conn, event, now) \ do { \ flow_trace(conn, "event at %s:%i", __func__, __LINE__); \ - conn_event_do(c, conn, event); \ + conn_event_do(c, conn, event, now); \ } while (0) -void tcp_rst_do(const struct ctx *c, struct tcp_tap_conn *conn); -#define tcp_rst(c, conn) \ +void tcp_rst_do(const struct ctx *c, struct tcp_tap_conn *conn, + const struct timespec *now); +#define tcp_rst(c, conn, now) \ do { \ flow_dbg((conn), "TCP reset at %s:%i", __func__, __LINE__); \ - tcp_rst_do(c, conn); \ + tcp_rst_do(c, conn, now); \ } while (0) struct tcp_info_linux; @@ -194,11 +195,13 @@ size_t tcp_fill_headers(const struct ctx *c, struct tcp_tap_conn *conn, size_t dlen, uint32_t csum_flags, uint32_t seq); int tcp_update_seqack_wnd(const struct ctx *c, struct tcp_tap_conn *conn, - bool force_seq, struct tcp_info_linux *tinfo); + bool force_seq, struct tcp_info_linux *tinfo, + const struct timespec *now); int tcp_prepare_flags(const struct ctx *c, struct tcp_tap_conn *conn, int flags, struct tcphdr *th, struct tcp_syn_opts *opts, - size_t *optlen); -int tcp_set_peek_offset(const struct tcp_tap_conn *conn, int offset); + size_t *optlen, const struct timespec *now); +int tcp_set_peek_offset(const struct tcp_tap_conn *conn, int offset, + const struct timespec *now); int tcp_prepare_iov(struct msghdr *msg, struct iovec *iov, uint32_t already_sent, int payload_iov_cnt); diff --git a/tcp_splice.c b/tcp_splice.c index 3fd33a10..4b01f1aa 100644 --- a/tcp_splice.c +++ b/tcp_splice.c @@ -136,10 +136,12 @@ static uint32_t tcp_splice_conn_epoll_events(uint16_t events, unsigned sidei) /** * tcp_splice_epoll_ctl() - Add/modify/delete epoll state from connection events * @conn: Connection pointer + * @now: Current timestamp * * Return: 0 on success, negative error code on failure (not on deletion) */ -static int tcp_splice_epoll_ctl(struct tcp_splice_conn *conn) +static int tcp_splice_epoll_ctl(struct tcp_splice_conn *conn, + const struct timespec *now) { uint32_t events[2]; @@ -149,7 +151,7 @@ static int tcp_splice_epoll_ctl(struct tcp_splice_conn *conn) if (flow_epoll_set(&conn->f, EPOLL_CTL_MOD, events[0], conn->s[0], 0) || flow_epoll_set(&conn->f, EPOLL_CTL_MOD, events[1], conn->s[1], 1)) { int ret = -errno; - flow_perror(conn, "ERROR on epoll_ctl()"); + flow_perror_ratelimit(conn, now, "ERROR on epoll_ctl()"); return ret; } @@ -201,8 +203,10 @@ static void conn_flag_do(struct tcp_splice_conn *conn, * conn_event_do() - Set and log connection events, update epoll state * @conn: Connection pointer * @event: Connection event + * @now: Current timestamp */ -static void conn_event_do(struct tcp_splice_conn *conn, unsigned long event) +static void conn_event_do(struct tcp_splice_conn *conn, unsigned long event, + const struct timespec *now) { if (event & (event - 1)) { int flag_index = fls(~event); @@ -224,14 +228,14 @@ static void conn_event_do(struct tcp_splice_conn *conn, unsigned long event) flow_dbg(conn, "%s", tcp_splice_event_str[flag_index]); } - if (tcp_splice_epoll_ctl(conn)) + if (tcp_splice_epoll_ctl(conn, now)) conn_flag(conn, CLOSING); } -#define conn_event(conn, event) \ +#define conn_event(conn, event, now) \ do { \ flow_trace(conn, "event at %s:%i",__func__, __LINE__); \ - conn_event_do(conn, event); \ + conn_event_do(conn, event, now); \ } while (0) /** @@ -292,11 +296,13 @@ bool tcp_splice_flow_defer(struct tcp_splice_conn *conn) * tcp_splice_connect_finish() - Completion of connect() or call on success * @c: Execution context * @conn: Connection pointer + * @now: Current timestamp * * Return: 0 on success, -EIO on failure */ static int tcp_splice_connect_finish(const struct ctx *c, - struct tcp_splice_conn *conn) + struct tcp_splice_conn *conn, + const struct timespec *now) { unsigned sidei; int i = 0; @@ -314,8 +320,10 @@ static int tcp_splice_connect_finish(const struct ctx *c, if (conn->pipe[sidei][0] < 0) { if (pipe2(conn->pipe[sidei], O_NONBLOCK | O_CLOEXEC)) { - flow_perror(conn, "cannot create %d->%d pipe", - sidei, !sidei); + flow_perror_ratelimit( + conn, now, + "cannot create %d->%d pipe", + sidei, !sidei); tcp_splice_rst(conn); return -EIO; } @@ -330,7 +338,7 @@ static int tcp_splice_connect_finish(const struct ctx *c, } if (!(conn->events & SPLICE_ESTABLISHED)) - conn_event(conn, SPLICE_ESTABLISHED); + conn_event(conn, SPLICE_ESTABLISHED, now); return 0; } @@ -339,10 +347,12 @@ static int tcp_splice_connect_finish(const struct ctx *c, * tcp_splice_connect() - Create and connect socket for new spliced connection * @c: Execution context * @conn: Connection pointer + * @now: Current timestamp * * Return: 0 for connect() succeeded or in progress, negative value on error */ -static int tcp_splice_connect(const struct ctx *c, struct tcp_splice_conn *conn) +static int tcp_splice_connect(const struct ctx *c, struct tcp_splice_conn *conn, + const struct timespec *now) { const struct flowside *tgt = &conn->f.side[TGTSIDE]; sa_family_t af = inany_v4(&tgt->eaddr) ? AF_INET : AF_INET6; @@ -381,11 +391,11 @@ static int tcp_splice_connect(const struct ctx *c, struct tcp_splice_conn *conn) if (flow_epoll_set(&conn->f, EPOLL_CTL_ADD, 0, conn->s[0], 0) || flow_epoll_set(&conn->f, EPOLL_CTL_ADD, 0, conn->s[1], 1)) { int ret = -errno; - flow_perror(conn, "Cannot register to epollfd"); + flow_perror_ratelimit(conn, now, "Cannot register to epollfd"); return ret; } - conn_event(conn, SPLICE_CONNECT); + conn_event(conn, SPLICE_CONNECT, now); if (connect(conn->s[1], &sa.sa, socklen_inany(&sa))) { if (errno != EINPROGRESS) { @@ -394,8 +404,8 @@ static int tcp_splice_connect(const struct ctx *c, struct tcp_splice_conn *conn) return -errno; } } else { - conn_event(conn, SPLICE_ESTABLISHED); - return tcp_splice_connect_finish(c, conn); + conn_event(conn, SPLICE_ESTABLISHED, now); + return tcp_splice_connect_finish(c, conn, now); } return 0; @@ -435,10 +445,12 @@ static int tcp_conn_sock_ns(const struct ctx *c, sa_family_t af) * @flow: flow to initialise * @s0: Accepted (side 0) socket * @sa: Peer address of connection + * @now: Current timestamp * * #syscalls:pasta setsockopt */ -void tcp_splice_conn_from_sock(const struct ctx *c, union flow *flow, int s0) +void tcp_splice_conn_from_sock(const struct ctx *c, union flow *flow, int s0, + const struct timespec *now) { struct tcp_splice_conn *conn = FLOW_SET_TYPE(flow, FLOW_TCP_SPLICE, tcp_splice); @@ -453,7 +465,7 @@ void tcp_splice_conn_from_sock(const struct ctx *c, union flow *flow, int s0) if (setsockopt(s0, SOL_TCP, TCP_QUICKACK, &((int){ 1 }), sizeof(int))) flow_trace(conn, "failed to set TCP_QUICKACK on %i", s0); - if (tcp_splice_connect(c, conn)) + if (tcp_splice_connect(c, conn, now)) tcp_splice_rst(conn); FLOW_ACTIVATE(conn); @@ -499,7 +511,7 @@ static int tcp_splice_forward(struct ctx *c, if (readlen <= 0) { if (!readlen) /* EOF */ - conn_event(conn, FIN_RCVD(fromsidei)); + conn_event(conn, FIN_RCVD(fromsidei), now); /* We're either blocked or at EOF on the read side, and * there's nothing in the pipe so there's nothing to do @@ -551,9 +563,9 @@ static int tcp_splice_forward(struct ctx *c, * drain. */ if (conn->pending[fromsidei]) - conn_event(conn, OUT_WAIT(!fromsidei)); + conn_event(conn, OUT_WAIT(!fromsidei), now); else - conn_event(conn, ~OUT_WAIT(!fromsidei)); + conn_event(conn, ~OUT_WAIT(!fromsidei), now); if ((conn->events & FIN_RCVD(fromsidei)) && !(conn->events & FIN_SENT(!fromsidei)) && @@ -563,7 +575,7 @@ static int tcp_splice_forward(struct ctx *c, pif_name(conn->f.pif[!fromsidei])); return -1; } - conn_event(conn, FIN_SENT(!fromsidei)); + conn_event(conn, FIN_SENT(!fromsidei), now); } return 0; @@ -593,7 +605,8 @@ void tcp_splice_sock_handler(struct ctx *c, union epoll_ref ref, rc = getsockopt(ref.fd, SOL_SOCKET, SO_ERROR, &err, &sl); if (rc) - flow_perror(conn, "Error retrieving SO_ERROR"); + flow_perror_ratelimit(conn, now, + "Error retrieving SO_ERROR"); else flow_dbg_ratelimit(conn, now, "Error event on %s socket: %s", @@ -610,7 +623,7 @@ void tcp_splice_sock_handler(struct ctx *c, union epoll_ref ref, events); goto reset; } - if (tcp_splice_connect_finish(c, conn)) + if (tcp_splice_connect_finish(c, conn, now)) goto reset; } diff --git a/tcp_splice.h b/tcp_splice.h index 8a1a1f67..c24e419f 100644 --- a/tcp_splice.h +++ b/tcp_splice.h @@ -13,7 +13,8 @@ union sockaddr_inany; void tcp_splice_sock_handler(struct ctx *c, union epoll_ref ref, uint32_t events, const struct timespec *now); -void tcp_splice_conn_from_sock(const struct ctx *c, union flow *flow, int s0); +void tcp_splice_conn_from_sock(const struct ctx *c, union flow *flow, int s0, + const struct timespec *now); void tcp_splice_init(struct ctx *c); #endif /* TCP_SPLICE_H */ diff --git a/tcp_vu.c b/tcp_vu.c index 00616955..868a35b9 100644 --- a/tcp_vu.c +++ b/tcp_vu.c @@ -116,12 +116,14 @@ static int tcp_vu_send_dup(const struct ctx *c, struct vu_virtq *vq, * @c: Execution context * @conn: Connection pointer * @flags: TCP flags: if not set, send segment only if ACK is due + * @now: Current timestamp * * Return: -ECONNRESET on fatal connection error, * -EAGAIN if vhost-user buffers are unavailable, * 0 otherwise */ -int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags) +int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags, + const struct timespec *now) { struct vu_dev *vdev = c->vdev; struct vu_virtq *vq = &vdev->vq[VHOST_USER_RX_QUEUE]; @@ -158,7 +160,7 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags) ip6h = (struct ipv6hdr)L2_BUF_IP6_INIT(IPPROTO_TCP); seq = conn->seq_to_tap; - ret = tcp_prepare_flags(c, conn, flags, &th, &opts, &optlen); + ret = tcp_prepare_flags(c, conn, flags, &th, &opts, &optlen, now); if (ret <= 0) { vu_queue_rewind(vq, elem_cnt); return ret; @@ -424,11 +426,12 @@ static void tcp_vu_prepare(const struct ctx *c, struct tcp_tap_conn *conn, * @c: Execution context * @conn: Connection pointer * @already_sent: Number of bytes already sent to tap, but not acked + * @now: Current timestamp * * Return: negative on connection reset, 0 otherwise */ int tcp_vu_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn, - uint32_t already_sent) + uint32_t already_sent, const struct timespec *now) { uint32_t wnd_scaled = conn->wnd_from_tap << conn->ws_from_tap; struct vu_dev *vdev = c->vdev; @@ -455,12 +458,12 @@ int tcp_vu_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn, &elem_cnt, &frame_cnt); if (len < 0) { if (len != -EAGAIN && len != -EWOULDBLOCK) { - tcp_rst(c, conn); + tcp_rst(c, conn, now); return len; } if (already_sent) /* No new data and EAGAIN: set EPOLLET */ - conn_flag(c, conn, STALLED); + conn_flag(c, conn, STALLED, now); return 0; } @@ -468,7 +471,7 @@ int tcp_vu_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn, if (!len) { vu_queue_rewind(vq, elem_cnt); if (already_sent) { - conn_flag(c, conn, STALLED); + conn_flag(c, conn, STALLED, now); } else if ((conn->events & (SOCK_FIN_RCVD | TAP_FIN_SENT)) == SOCK_FIN_RCVD) { int ret; @@ -476,24 +479,24 @@ int tcp_vu_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn, /* See tcp_buf_data_from_sock() */ conn->seq_ack_to_tap = conn->seq_from_tap; - ret = tcp_vu_send_flag(c, conn, FIN | ACK); + ret = tcp_vu_send_flag(c, conn, FIN | ACK, now); if (ret) { - tcp_rst(c, conn); + tcp_rst(c, conn, now); return ret; } - conn_event(c, conn, TAP_FIN_SENT); - conn_flag(c, conn, ACK_FROM_TAP_DUE); + conn_event(c, conn, TAP_FIN_SENT, now); + conn_flag(c, conn, ACK_FROM_TAP_DUE, now); } return 0; } - conn_flag(c, conn, ~ACK_FROM_TAP_BLOCKS); - conn_flag(c, conn, ~STALLED); + conn_flag(c, conn, ~ACK_FROM_TAP_BLOCKS, now); + conn_flag(c, conn, ~STALLED, now); /* Likely, some new data was acked too. */ - tcp_update_seqack_wnd(c, conn, false, NULL); + tcp_update_seqack_wnd(c, conn, false, NULL, now); /* initialize headers */ /* iov_vu is an array of buffers and the buffer size can be @@ -537,7 +540,7 @@ int tcp_vu_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn, } vu_queue_notify(vdev, vq); - conn_flag(c, conn, ACK_FROM_TAP_DUE); + conn_flag(c, conn, ACK_FROM_TAP_DUE, now); return 0; } diff --git a/tcp_vu.h b/tcp_vu.h index 30e46925..6f2aa2e1 100644 --- a/tcp_vu.h +++ b/tcp_vu.h @@ -6,8 +6,9 @@ #ifndef TCP_VU_H #define TCP_VU_H -int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags); +int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags, + const struct timespec *now); int tcp_vu_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn, - uint32_t already_sent); + uint32_t already_sent, const struct timespec *now); #endif /*TCP_VU_H */ diff --git a/udp.c b/udp.c index 2d9e2f04..d23163f6 100644 --- a/udp.c +++ b/udp.c @@ -539,6 +539,7 @@ static int udp_pktinfo(struct msghdr *msg, union inany_addr *dst) * @pif: Interface on which the error occurred * (only used if @sidx == FLOW_SIDX_NONE) * @port: Local port number of @s (only used if @sidx == FLOW_SIDX_NONE) + * @now: Current timestamp * * Return: 1 if error received and processed, 0 if no more errors in queue, < 0 * if there was an error reading the queue @@ -546,7 +547,8 @@ static int udp_pktinfo(struct msghdr *msg, union inany_addr *dst) * #syscalls recvmsg */ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx, - uint8_t pif, in_port_t port) + uint8_t pif, in_port_t port, + const struct timespec *now) { char buf[PKTINFO_SPACE + RECVERR_SPACE]; const struct sock_extended_err *ee; @@ -664,12 +666,14 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx, } fail: - flow_dbg(uflow, "Can't propagate %s error from %s %s to %s %s", - str_ee_origin(ee), - pif_name(pif), - sockaddr_ntop(SO_EE_OFFENDER(ee), sastr, sizeof(sastr)), - pif_name(topif), - inany_ntop(&toside->eaddr, astr, sizeof(astr))); + flow_warn_ratelimit(uflow, now, + "Can't propagate %s error from %s %s to %s %s", + str_ee_origin(ee), + pif_name(pif), + sockaddr_ntop(SO_EE_OFFENDER(ee), + sastr, sizeof(sastr)), + pif_name(topif), + inany_ntop(&toside->eaddr, astr, sizeof(astr))); return 1; } @@ -681,11 +685,13 @@ fail: * @pif: Interface on which the error occurred * (only used if @sidx == FLOW_SIDX_NONE) * @port: Local port number of @s (only used if @sidx == FLOW_SIDX_NONE) + * @now: Current timestamp * * Return: number of errors handled, or < 0 if we have an unrecoverable error */ static int udp_sock_errs(const struct ctx *c, int s, flow_sidx_t sidx, - uint8_t pif, in_port_t port) + uint8_t pif, in_port_t port, + const struct timespec *now) { unsigned n_err = 0; socklen_t errlen; @@ -694,7 +700,7 @@ static int udp_sock_errs(const struct ctx *c, int s, flow_sidx_t sidx, assert(!c->no_udp); /* Empty the error queue */ - while ((rc = udp_sock_recverr(c, s, sidx, pif, port)) > 0) + while ((rc = udp_sock_recverr(c, s, sidx, pif, port, now)) > 0) n_err += rc; if (rc < 0) @@ -869,7 +875,7 @@ void udp_sock_fwd(const struct ctx *c, int s, int rule_hint, strerror_(-rc)); /* Clear errors & carry on */ if (udp_sock_errs(c, s, FLOW_SIDX_NONE, - frompif, port) < 0) { + frompif, port, now) < 0) { err_ratelimit(now, "UDP: Unrecoverable error on listening socket: (%s port %hu)", pif_name(frompif), port); @@ -892,9 +898,10 @@ void udp_sock_fwd(const struct ctx *c, int s, int rule_hint, } else if (flow_sidx_valid(tosidx)) { struct udp_flow *uflow = udp_at_sidx(tosidx); - flow_err(uflow, - "No support for forwarding UDP from %s to %s", - pif_name(frompif), pif_name(topif)); + flow_err_ratelimit( + uflow, now, + "No support for forwarding UDP from %s to %s", + pif_name(frompif), pif_name(topif)); discard = true; } else { warn_ratelimit(now, "Discarding datagram without flow"); @@ -942,8 +949,11 @@ void udp_sock_handler(const struct ctx *c, union epoll_ref ref, assert(!c->no_udp && uflow); if (events & EPOLLERR) { - if (udp_sock_errs(c, ref.fd, ref.flowside, PIF_NONE, 0) < 0) { - flow_err(uflow, "Unrecoverable error on flow socket"); + if (udp_sock_errs(c, ref.fd, ref.flowside, + PIF_NONE, 0, now) < 0) { + flow_err_ratelimit( + uflow, now, + "Unrecoverable error on flow socket"); goto fail; } } @@ -974,7 +984,7 @@ void udp_sock_handler(const struct ctx *c, union epoll_ref ref, udp_buf_sock_to_tap(c, s, n, tosidx); } } else { - flow_err(uflow, + flow_err_ratelimit(uflow, now, "No support for forwarding UDP from %s to %s", pif_name(pif_at_sidx(ref.flowside)), pif_name(topif)); @@ -1052,8 +1062,9 @@ int udp_tap_handler(const struct ctx *c, uint8_t pif, flow_sidx_t fromsidx = flow_sidx_opposite(tosidx); uint8_t frompif = pif_at_sidx(fromsidx); - flow_err(uflow, "No support for forwarding UDP from %s to %s", - pif_name(frompif), pif_name(topif)); + flow_err_ratelimit(uflow, now, + "No support for forwarding UDP from %s to %s", + pif_name(frompif), pif_name(topif)); return 1; } toside = flowside_at_sidx(tosidx); diff --git a/udp_flow.c b/udp_flow.c index 31e2f8a7..5a59f7f3 100644 --- a/udp_flow.c +++ b/udp_flow.c @@ -64,11 +64,13 @@ void udp_flow_close(const struct ctx *c, struct udp_flow *uflow) * @c: Execution context * @uflow: UDP flow to open socket for * @sidei: Side of @uflow to open socket for + * @now: Current timestamp * * Return: fd of new socket on success, -ve error code on failure */ static int udp_flow_sock(const struct ctx *c, - struct udp_flow *uflow, unsigned sidei) + struct udp_flow *uflow, unsigned sidei, + const struct timespec *now) { const struct flowside *side = &uflow->f.side[sidei]; uint8_t pif = uflow->f.pif[sidei]; @@ -77,7 +79,8 @@ static int udp_flow_sock(const struct ctx *c, s = flowside_sock_l4(c, EPOLL_TYPE_UDP, pif, side); if (s < 0) { - flow_dbg_perror(uflow, "Couldn't open flow specific socket"); + flow_perror_ratelimit(uflow, now, + "Couldn't open flow specific socket"); return s; } @@ -90,11 +93,11 @@ static int udp_flow_sock(const struct ctx *c, if (flowside_connect(c, s, pif, side) < 0) { rc = -errno; - flow_dbg_perror(uflow, "Couldn't connect flow socket"); + flow_warn_perror_ratelimit(uflow, now, + "Couldn't connect flow socket"); epoll_del(flow_epollfd(&uflow->f), s); close(s); - return rc; } uflow->s[sidei] = s; @@ -155,7 +158,7 @@ static flow_sidx_t udp_flow_new(const struct ctx *c, union flow *flow, flow_foreach_sidei(sidei) { if (pif_is_socket(uflow->f.pif[sidei])) - if (udp_flow_sock(c, uflow, sidei) < 0) + if (udp_flow_sock(c, uflow, sidei, now) < 0) goto cancel; } @@ -177,7 +180,7 @@ static flow_sidx_t udp_flow_new(const struct ctx *c, union flow *flow, goto cancel; } if (port != tgt->oport) { - flow_err(uflow, "Unexpected local port"); + flow_err_ratelimit(uflow, now, "Unexpected local port"); goto cancel; } } @@ -249,7 +252,8 @@ flow_sidx_t udp_flow_from_sock(const struct ctx *c, uint8_t pif, * been initiated from a socket bound to 0.0.0.0 or ::, we don't * know our address, so we have to leave it unpopulated. */ - flow_err(flow, "Invalid endpoint on UDP recvfrom()"); + flow_err_ratelimit(flow, now, + "Invalid endpoint on UDP recvfrom()"); flow_alloc_cancel(flow); return FLOW_SIDX_NONE; } -- 2.54.0 ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 5/6] flow, treewide: Promote priority of selected flow-linked messages 2026-06-17 3:11 ` [PATCH v2 5/6] flow, treewide: Promote priority of selected flow-linked messages David Gibson @ 2026-06-17 5:23 ` Stefano Brivio 0 siblings, 0 replies; 8+ messages in thread From: Stefano Brivio @ 2026-06-17 5:23 UTC (permalink / raw) To: David Gibson; +Cc: passt-dev, Anshu Kumari [Cc: Anshu, preserving quote, no remarks here from my side] On Wed, 17 Jun 2026 13:11:21 +1000 David Gibson <david@gibson.dropbear.id.au> wrote: > Most of out flow specific log messages are debug level for fear of flooding > the logs, even when they report real error conditions that might be off > significance. > > Now that we have the mechanisms for log message rate limiting, we can do > better. Promote many flow related messages to warning or error level, with > rate limiting. While we're there add ratelimiting to a handful of existing > warning or error level messages. > > They general heuristic is to promote messages that report a failure which > is not something that should be triggered by the guest doing something > weird. This mostly means failures from socket operations we expect to be > legitimate. > > Adding the ratelimiting means plumbing the 'now' timestamp through much > more of the code, hence the large churn. > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au> > --- > flow.c | 14 +- > flow.h | 6 +- > icmp.c | 27 ++-- > icmp.h | 3 +- > migrate.c | 27 ++-- > migrate.h | 8 +- > passt.c | 8 +- > tcp.c | 383 ++++++++++++++++++++++++++++--------------------- > tcp.h | 5 +- > tcp_buf.c | 67 +++++---- > tcp_buf.h | 7 +- > tcp_conn.h | 8 +- > tcp_internal.h | 27 ++-- > tcp_splice.c | 59 +++++--- > tcp_splice.h | 3 +- > tcp_vu.c | 31 ++-- > tcp_vu.h | 5 +- > udp.c | 47 +++--- > udp_flow.c | 18 ++- > 19 files changed, 439 insertions(+), 314 deletions(-) > > diff --git a/flow.c b/flow.c > index 20938034..7ba82f54 100644 > --- a/flow.c > +++ b/flow.c > @@ -1084,16 +1084,18 @@ static int flow_migrate_repair_all(struct ctx *c, bool enable) > * @c: Execution context > * @stage: Migration stage information (unused) > * @fd: Migration file descriptor (unused) > + * @now: Current timestamp > * > * Return: 0 on success, positive error code on failure > */ > int flow_migrate_source_pre(struct ctx *c, const struct migrate_stage *stage, > - int fd) > + int fd, const struct timespec *now) > { > int rc; > > (void)stage; > (void)fd; > + (void)now; > > if (flow_migrate_need_repair()) > repair_wait(c); > @@ -1109,11 +1111,12 @@ int flow_migrate_source_pre(struct ctx *c, const struct migrate_stage *stage, > * @c: Execution context (unused) > * @stage: Migration stage information (unused) > * @fd: Migration file descriptor > + * @now: Current timestamp > * > * Return: 0 on success, positive error code on failure > */ > int flow_migrate_source(struct ctx *c, const struct migrate_stage *stage, > - int fd) > + int fd, const struct timespec *now) > { > uint32_t count = 0; > bool first = true; > @@ -1184,7 +1187,7 @@ int flow_migrate_source(struct ctx *c, const struct migrate_stage *stage, > * as EIO). > */ > foreach_established_tcp_flow(flow) { > - rc = tcp_flow_migrate_source_ext(c, fd, &flow->tcp); > + rc = tcp_flow_migrate_source_ext(c, fd, &flow->tcp, now); > if (rc) { > flow_err(flow, "Can't send extended data: %s", > strerror_(-rc)); > @@ -1202,11 +1205,12 @@ int flow_migrate_source(struct ctx *c, const struct migrate_stage *stage, > * @c: Execution context > * @stage: Migration stage information (unused) > * @fd: Migration file descriptor > + * @now: Current timestamp > * > * Return: 0 on success, positive error code on failure > */ > int flow_migrate_target(struct ctx *c, const struct migrate_stage *stage, > - int fd) > + int fd, const struct timespec *now) > { > uint32_t count; > unsigned i; > @@ -1243,7 +1247,7 @@ int flow_migrate_target(struct ctx *c, const struct migrate_stage *stage, > repair_flush(c); > > for (i = 0; i < count; i++) { > - rc = tcp_flow_migrate_target_ext(c, &flowtab[i].tcp, fd); > + rc = tcp_flow_migrate_target_ext(c, &flowtab[i].tcp, fd, now); > if (rc) { > flow_dbg(FLOW(i), "Migration data failure, abort: %s", > strerror_(-rc)); > diff --git a/flow.h b/flow.h > index b994f54a..94a26f37 100644 > --- a/flow.h > +++ b/flow.h > @@ -274,11 +274,11 @@ void flow_defer_handler(const struct ctx *c, const struct timespec *now); > int flow_migrate_source_early(struct ctx *c, const struct migrate_stage *stage, > int fd); > int flow_migrate_source_pre(struct ctx *c, const struct migrate_stage *stage, > - int fd); > + int fd, const struct timespec *now); > int flow_migrate_source(struct ctx *c, const struct migrate_stage *stage, > - int fd); > + int fd, const struct timespec *now); > int flow_migrate_target(struct ctx *c, const struct migrate_stage *stage, > - int fd); > + int fd, const struct timespec *now); > > void flow_log__(const struct flow_common *f, int pri, bool perror, bool details, > enum flow_state state, const char *fmt, ...); > diff --git a/icmp.c b/icmp.c > index 18b6106a..0fe23667 100644 > --- a/icmp.c > +++ b/icmp.c > @@ -66,8 +66,10 @@ static struct icmp_ping_flow *ping_at_sidx(flow_sidx_t sidx) > * icmp_sock_handler() - Handle new data from ICMP or ICMPv6 socket > * @c: Execution context > * @ref: epoll reference > + * @now: Current timestamp > */ > -void icmp_sock_handler(const struct ctx *c, union epoll_ref ref) > +void icmp_sock_handler(const struct ctx *c, union epoll_ref ref, > + const struct timespec *now) > { > struct icmp_ping_flow *pingf = ping_at_sidx(ref.flowside); > const struct flowside *ini = &pingf->f.side[INISIDE]; > @@ -84,7 +86,7 @@ void icmp_sock_handler(const struct ctx *c, union epoll_ref ref) > > n = recvfrom(ref.fd, buf, sizeof(buf), 0, &sr.sa, &sl); > if (n < 0) { > - flow_perror(pingf, "recvfrom() error"); > + flow_perror_ratelimit(pingf, now, "recvfrom() error"); > return; > } > > @@ -142,7 +144,7 @@ void icmp_sock_handler(const struct ctx *c, union epoll_ref ref) > return; > > unexpected: > - flow_err(pingf, "Unexpected packet on ping socket"); > + flow_err_ratelimit(pingf, now, "Unexpected packet on ping socket"); > } > > /** > @@ -165,12 +167,15 @@ static void icmp_ping_close(const struct ctx *c, > * @id: ICMP id for the new socket > * @saddr: Source address > * @daddr: Destination address > + * @now: Current timestamp > * > * Return: newly opened ping flow, or NULL on failure > */ > static struct icmp_ping_flow *icmp_ping_new(const struct ctx *c, > sa_family_t af, uint16_t id, > - const void *saddr, const void *daddr) > + const void *saddr, > + const void *daddr, > + const struct timespec *now) > { > uint8_t proto = af == AF_INET ? IPPROTO_ICMP : IPPROTO_ICMPV6; > uint8_t flowtype = af == AF_INET ? FLOW_PING4 : FLOW_PING6; > @@ -186,10 +191,11 @@ static struct icmp_ping_flow *icmp_ping_new(const struct ctx *c, > goto cancel; > > if (flow->f.pif[TGTSIDE] != PIF_HOST) { > - flow_err(flow, "No support for forwarding %s from %s to %s", > - proto == IPPROTO_ICMP ? "ICMP" : "ICMPv6", > - pif_name(flow->f.pif[INISIDE]), > - pif_name(flow->f.pif[TGTSIDE])); > + flow_err_ratelimit( > + flow, now, "No support for forwarding %s from %s to %s", > + proto == IPPROTO_ICMP ? "ICMP" : "ICMPv6", > + pif_name(flow->f.pif[INISIDE]), > + pif_name(flow->f.pif[TGTSIDE])); > goto cancel; > } > > @@ -299,7 +305,7 @@ int icmp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af, > > if (flow) > pingf = &flow->ping; > - else if (!(pingf = icmp_ping_new(c, af, id, saddr, daddr))) > + else if (!(pingf = icmp_ping_new(c, af, id, saddr, daddr, now))) > return 1; > > tgt = &pingf->f.side[TGTSIDE]; > @@ -317,7 +323,8 @@ int icmp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af, > msh.msg_flags = 0; > > if (sendmsg(pingf->sock, &msh, MSG_NOSIGNAL) < 0) { > - flow_dbg_perror(pingf, "failed to relay request to socket"); > + flow_warn_perror_ratelimit(pingf, now, > + "failed to relay request to socket"); > } else { > flow_dbg(pingf, > "echo request to socket, ID: %"PRIu16", seq: %"PRIu16, > diff --git a/icmp.h b/icmp.h > index e8b928b6..08870dc7 100644 > --- a/icmp.h > +++ b/icmp.h > @@ -13,7 +13,8 @@ > struct ctx; > struct icmp_ping_flow; > > -void icmp_sock_handler(const struct ctx *c, union epoll_ref ref); > +void icmp_sock_handler(const struct ctx *c, union epoll_ref ref, > + const struct timespec *now); > int icmp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af, > const void *saddr, const void *daddr, > struct iov_tail *data, const struct timespec *now); > diff --git a/migrate.c b/migrate.c > index 8937b85f..b233dc7f 100644 > --- a/migrate.c > +++ b/migrate.c > @@ -48,12 +48,14 @@ struct migrate_seen_addrs_v2 { > * @c: Execution context > * @stage: Migration stage, unused > * @fd: File descriptor for state transfer > + * @now: Current timestamp > * > * Return: 0 on success, positive error code on failure > */ > /* cppcheck-suppress [constParameterCallback, unmatchedSuppression] */ > static int seen_addrs_source_v2(struct ctx *c, > - const struct migrate_stage *stage, int fd) > + const struct migrate_stage *stage, int fd, > + const struct timespec *now) > { > struct migrate_seen_addrs_v2 addrs = { > .addr6 = c->ip6.addr_seen, > @@ -62,6 +64,7 @@ static int seen_addrs_source_v2(struct ctx *c, > }; > > (void)stage; > + (void)now; > > memcpy(addrs.mac, c->guest_mac, sizeof(addrs.mac)); > > @@ -76,15 +79,18 @@ static int seen_addrs_source_v2(struct ctx *c, > * @c: Execution context > * @stage: Migration stage, unused > * @fd: File descriptor for state transfer > + * @now: Current timestamp > * > * Return: 0 on success, positive error code on failure > */ > static int seen_addrs_target_v2(struct ctx *c, > - const struct migrate_stage *stage, int fd) > + const struct migrate_stage *stage, int fd, > + const struct timespec *now) > { > struct migrate_seen_addrs_v2 addrs; > > (void)stage; > + (void)now; > > if (read_all_buf(fd, &addrs, sizeof(addrs))) > return errno; > @@ -133,10 +139,11 @@ static const struct migrate_version versions[] = { > * migrate_source() - Migration as source, send state to hypervisor > * @c: Execution context > * @fd: File descriptor for state transfer > + * @now: Current timestamp > * > * Return: 0 on success, positive error code on failure > */ > -static int migrate_source(struct ctx *c, int fd) > +static int migrate_source(struct ctx *c, int fd, const struct timespec *now) > { > const struct migrate_version *v = CURRENT_VERSION; > const struct migrate_header header = { > @@ -159,7 +166,7 @@ static int migrate_source(struct ctx *c, int fd) > > debug("Source side migration stage: %s", s->name); > > - if ((ret = s->source(c, s, fd))) { > + if ((ret = s->source(c, s, fd, now))) { > err("Source migration stage: %s: %s, abort", s->name, > strerror_(ret)); > return ret; > @@ -209,10 +216,11 @@ static const struct migrate_version *migrate_target_read_header(int fd) > * migrate_target() - Migration as target, receive state from hypervisor > * @c: Execution context > * @fd: File descriptor for state transfer > + * @now: Current timestamp > * > * Return: 0 on success, positive error code on failure > */ > -static int migrate_target(struct ctx *c, int fd) > +static int migrate_target(struct ctx *c, int fd, const struct timespec *now) > { > const struct migrate_version *v; > const struct migrate_stage *s; > @@ -227,7 +235,7 @@ static int migrate_target(struct ctx *c, int fd) > > debug("Target side migration stage: %s", s->name); > > - if ((ret = s->target(c, s, fd))) { > + if ((ret = s->target(c, s, fd, now))) { > err("Target migration stage: %s: %s, abort", s->name, > strerror_(ret)); > return ret; > @@ -282,8 +290,9 @@ void migrate_request(struct ctx *c, int fd, bool target) > /** > * migrate_handler() - Send/receive passt internal state to/from hypervisor > * @c: Execution context > + * @now: Current timestamp > */ > -void migrate_handler(struct ctx *c) > +void migrate_handler(struct ctx *c, const struct timespec *now) > { > int rc; > > @@ -294,9 +303,9 @@ void migrate_handler(struct ctx *c) > c->device_state_fd, c->migrate_target); > > if (c->migrate_target) > - rc = migrate_target(c, c->device_state_fd); > + rc = migrate_target(c, c->device_state_fd, now); > else > - rc = migrate_source(c, c->device_state_fd); > + rc = migrate_source(c, c->device_state_fd, now); > > migrate_close(c); > > diff --git a/migrate.h b/migrate.h > index 844be477..515fa3a7 100644 > --- a/migrate.h > +++ b/migrate.h > @@ -30,8 +30,10 @@ struct migrate_header { > */ > struct migrate_stage { > const char *name; > - int (*source)(struct ctx *c, const struct migrate_stage *stage, int fd); > - int (*target)(struct ctx *c, const struct migrate_stage *stage, int fd); > + int (*source)(struct ctx *c, const struct migrate_stage *stage, int fd, > + const struct timespec *now); > + int (*target)(struct ctx *c, const struct migrate_stage *stage, int fd, > + const struct timespec *now); > > /* Add here separate rollback callbacks if needed */ > }; > @@ -49,6 +51,6 @@ struct migrate_version { > void migrate_init(struct ctx *c); > void migrate_close(struct ctx *c); > void migrate_request(struct ctx *c, int fd, bool target); > -void migrate_handler(struct ctx *c); > +void migrate_handler(struct ctx *c, const struct timespec *now); > > #endif /* MIGRATE_H */ > diff --git a/passt.c b/passt.c > index b1d35d5b..65a07d72 100644 > --- a/passt.c > +++ b/passt.c > @@ -258,7 +258,7 @@ static void passt_worker(void *opaque, int nfds, struct epoll_event *events) > pasta_netns_quit_timer_handler(c, ref); > break; > case EPOLL_TYPE_TCP: > - tcp_sock_handler(c, ref, eventmask); > + tcp_sock_handler(c, ref, eventmask, &now); > break; > case EPOLL_TYPE_TCP_SPLICE: > tcp_splice_sock_handler(c, ref, eventmask, &now); > @@ -267,7 +267,7 @@ static void passt_worker(void *opaque, int nfds, struct epoll_event *events) > tcp_listen_handler(c, ref, &now); > break; > case EPOLL_TYPE_TCP_TIMER: > - tcp_timer_handler(c, ref); > + tcp_timer_handler(c, ref, &now); > break; > case EPOLL_TYPE_UDP_LISTEN: > udp_listen_sock_handler(c, ref, eventmask, &now); > @@ -276,7 +276,7 @@ static void passt_worker(void *opaque, int nfds, struct epoll_event *events) > udp_sock_handler(c, ref, eventmask, &now); > break; > case EPOLL_TYPE_PING: > - icmp_sock_handler(c, ref); > + icmp_sock_handler(c, ref, &now); > break; > case EPOLL_TYPE_VHOST_CMD: > vu_control_handler(c->vdev, c->fd_tap, eventmask); > @@ -309,7 +309,7 @@ static void passt_worker(void *opaque, int nfds, struct epoll_event *events) > > post_handler(c, &now); > > - migrate_handler(c); > + migrate_handler(c, &now); > } > > /** > diff --git a/tcp.c b/tcp.c > index 3538eec2..43ba4df4 100644 > --- a/tcp.c > +++ b/tcp.c > @@ -464,17 +464,21 @@ static struct tcp_tap_conn *conn_at_sidx(flow_sidx_t sidx) > * tcp_set_peek_offset() - Set SO_PEEK_OFF offset on connection if supported > * @conn: Pointer to the TCP connection structure > * @offset: Offset in bytes > + * @now: Current timestamp > * > * Return: -1 when it fails, 0 otherwise. > */ > -int tcp_set_peek_offset(const struct tcp_tap_conn *conn, int offset) > +int tcp_set_peek_offset(const struct tcp_tap_conn *conn, int offset, > + const struct timespec *now) > { > if (!peek_offset_cap) > return 0; > > if (setsockopt(conn->sock, SOL_SOCKET, SO_PEEK_OFF, > &offset, sizeof(offset))) { > - flow_perror(conn, "Failed to set SO_PEEK_OFF to %i", offset); > + flow_perror_ratelimit(conn, now, > + "Failed to set SO_PEEK_OFF to %i", > + offset); > return -1; > } > return 0; > @@ -545,9 +549,12 @@ static int tcp_epoll_ctl(struct tcp_tap_conn *conn) > * tcp_timer_ctl() - Set timerfd based on flags/events, create timerfd if needed > * @c: Execution context > * @conn: Connection pointer > + * @now: Current timestamp > + * > * #syscalls timerfd_create timerfd_settime|timerfd_settime32 > */ > -static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn) > +static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn, > + const struct timespec *now) > { > struct itimerspec it = { { 0 }, { 0 } }; > > @@ -560,12 +567,13 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn) > > fd = timerfd_create(CLOCK_MONOTONIC, 0); > if (fd == -1) { > - flow_dbg_perror(conn, "failed to get timer"); > + flow_perror_ratelimit(conn, now, "failed to get timer"); > return; > } > if (fd > FD_REF_MAX) { > - flow_dbg(conn, "timer fd overflow (%d > %d)", > - fd, FD_REF_MAX); > + flow_err_ratelimit(conn, now, > + "timer fd overflow (%d > %d)", > + fd, FD_REF_MAX); > close(fd); > return; > } > @@ -575,7 +583,7 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn) > ref.fd = fd; > if (epoll_add(flow_epollfd(&conn->f), EPOLLIN | EPOLLET, > ref) < 0) { > - flow_dbg(conn, "failed to add timer"); > + flow_perror_ratelimit(conn, now, "failed to add timer"); > close(fd); > return; > } > @@ -622,9 +630,10 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn) > * @c: Execution context > * @conn: Connection pointer > * @flag: Flag to set, or ~flag to unset > + * @now: Current timestamp > */ > void conn_flag_do(const struct ctx *c, struct tcp_tap_conn *conn, > - unsigned long flag) > + unsigned long flag, const struct timespec *now) > { > if (flag & (flag - 1)) { > int flag_index = fls(~flag); > @@ -646,7 +655,7 @@ void conn_flag_do(const struct ctx *c, struct tcp_tap_conn *conn, > * flags and factor this into the logic below. > */ > if (flag == ACK_FROM_TAP_DUE) > - tcp_timer_ctl(c, conn); > + tcp_timer_ctl(c, conn, now); > > return; > } > @@ -662,7 +671,7 @@ void conn_flag_do(const struct ctx *c, struct tcp_tap_conn *conn, > if (flag == ACK_FROM_TAP_DUE || flag == ACK_TO_TAP_DUE || > (flag == ~ACK_FROM_TAP_DUE && (conn->flags & ACK_TO_TAP_DUE)) || > (flag == ~ACK_TO_TAP_DUE && (conn->flags & ACK_FROM_TAP_DUE))) > - tcp_timer_ctl(c, conn); > + tcp_timer_ctl(c, conn, now); > } > > /** > @@ -670,9 +679,10 @@ void conn_flag_do(const struct ctx *c, struct tcp_tap_conn *conn, > * @c: Execution context > * @conn: Connection pointer > * @event: Connection event > + * @now: Current timestamp > */ > void conn_event_do(const struct ctx *c, struct tcp_tap_conn *conn, > - unsigned long event) > + unsigned long event, const struct timespec *now) > { > int prev, new, num = fls(event); > > @@ -710,7 +720,7 @@ void conn_event_do(const struct ctx *c, struct tcp_tap_conn *conn, > num == -1 ? "CLOSED" : tcp_event_str[num]); > > if ((event == TAP_FIN_RCVD) && !(conn->events & SOCK_FIN_RCVD)) { > - conn_flag(c, conn, ACTIVE_CLOSE); > + conn_flag(c, conn, ACTIVE_CLOSE, now); > } else { > if (event == CLOSED) > flow_hash_remove(c, TAP_SIDX(conn)); > @@ -1101,13 +1111,15 @@ static uint32_t tcp_wnd_from_sndbuf(int s, struct tcp_tap_conn *conn, > * @conn: Connection pointer > * @force_seq: Force ACK sequence to latest segment, instead of checking socket > * @tinfo: tcp_info from kernel, can be NULL if not pre-fetched > + * @now: Current timestamp > * > * Return: 1 if sequence or window were updated, 0 otherwise > * > * #syscalls ioctl > */ > int tcp_update_seqack_wnd(const struct ctx *c, struct tcp_tap_conn *conn, > - bool force_seq, struct tcp_info_linux *tinfo) > + bool force_seq, struct tcp_info_linux *tinfo, > + const struct timespec *now) > { > uint32_t prev_wnd_to_tap = conn->wnd_to_tap << conn->ws_to_tap; > uint32_t prev_ack_to_tap = conn->seq_ack_to_tap; > @@ -1221,7 +1233,7 @@ int tcp_update_seqack_wnd(const struct ctx *c, struct tcp_tap_conn *conn, > */ > /* cppcheck-suppress [knownConditionTrueFalse, unmatchedSuppression] */ > if (!conn->wnd_to_tap) > - conn_flag(c, conn, ACK_TO_TAP_DUE); > + conn_flag(c, conn, ACK_TO_TAP_DUE, now); > > out: > /* Opportunistically store RTT approximation on valid TCP_INFO data */ > @@ -1237,17 +1249,19 @@ out: > * @c: Execution context > * @conn: Connection pointer > * @seq: Current ACK sequence, host order > + * @now: Current timestamp > */ > static void tcp_update_seqack_from_tap(const struct ctx *c, > - struct tcp_tap_conn *conn, uint32_t seq) > + struct tcp_tap_conn *conn, uint32_t seq, > + const struct timespec *now) > { > if (seq == conn->seq_to_tap) > - conn_flag(c, conn, ~ACK_FROM_TAP_DUE); > + conn_flag(c, conn, ~ACK_FROM_TAP_DUE, now); > > if (SEQ_GT(seq, conn->seq_ack_from_tap)) { > /* Forward progress, but more data to acknowledge: reschedule */ > if (SEQ_LT(seq, conn->seq_to_tap)) > - conn_flag(c, conn, ACK_FROM_TAP_DUE); > + conn_flag(c, conn, ACK_FROM_TAP_DUE, now); > > conn->retries = 0; > conn->seq_ack_from_tap = seq; > @@ -1258,16 +1272,18 @@ static void tcp_update_seqack_from_tap(const struct ctx *c, > * tcp_rewind_seq() - Rewind sequence to tap and socket offset to current ACK > * @c: Execution context > * @conn: Connection pointer > + * @now: Current timestamp > * > * Return: 0 on success, -1 on failure, with connection reset > */ > -static int tcp_rewind_seq(const struct ctx *c, struct tcp_tap_conn *conn) > +static int tcp_rewind_seq(const struct ctx *c, struct tcp_tap_conn *conn, > + const struct timespec *now) > { > conn->seq_to_tap = conn->seq_ack_from_tap; > conn->events &= ~TAP_FIN_SENT; > > - if (tcp_set_peek_offset(conn, 0)) { > - tcp_rst(c, conn); > + if (tcp_set_peek_offset(conn, 0, now)) { > + tcp_rst(c, conn, now); > return -1; > } > > @@ -1282,6 +1298,7 @@ static int tcp_rewind_seq(const struct ctx *c, struct tcp_tap_conn *conn) > * @th: TCP header to update > * @opts: TCP option buffer (output parameter) > * @optlen: size of the TCP option buffer (output parameter) > + * @now: Current timestamp > * > * Return: < 0 error code on connection reset, > * 0 if there is no flag to send > @@ -1289,7 +1306,7 @@ static int tcp_rewind_seq(const struct ctx *c, struct tcp_tap_conn *conn) > */ > int tcp_prepare_flags(const struct ctx *c, struct tcp_tap_conn *conn, > int flags, struct tcphdr *th, struct tcp_syn_opts *opts, > - size_t *optlen) > + size_t *optlen, const struct timespec *now) > { > struct tcp_info_linux tinfo = { 0 }; > socklen_t sl = sizeof(tinfo); > @@ -1297,19 +1314,19 @@ int tcp_prepare_flags(const struct ctx *c, struct tcp_tap_conn *conn, > > if (SEQ_GE(conn->seq_ack_to_tap, conn->seq_from_tap) && > !flags && conn->wnd_to_tap) { > - conn_flag(c, conn, ~ACK_TO_TAP_DUE); > + conn_flag(c, conn, ~ACK_TO_TAP_DUE, now); > return 0; > } > > if (getsockopt(s, SOL_TCP, TCP_INFO, &tinfo, &sl)) { > - conn_event(c, conn, CLOSED); > + conn_event(c, conn, CLOSED, now); > return -ECONNRESET; > } > > if (!(conn->flags & LOCAL)) > tcp_rtt_dst_check(conn, &tinfo); > > - if (!tcp_update_seqack_wnd(c, conn, !!flags, &tinfo) && !flags) > + if (!tcp_update_seqack_wnd(c, conn, !!flags, &tinfo, now) && !flags) > return 0; > > *optlen = 0; > @@ -1351,13 +1368,13 @@ int tcp_prepare_flags(const struct ctx *c, struct tcp_tap_conn *conn, > if (th->ack) { > if (SEQ_GE(conn->seq_ack_to_tap, conn->seq_from_tap) && > conn->wnd_to_tap) > - conn_flag(c, conn, ~ACK_TO_TAP_DUE); > + conn_flag(c, conn, ~ACK_TO_TAP_DUE, now); > else > - conn_flag(c, conn, ACK_TO_TAP_DUE); > + conn_flag(c, conn, ACK_TO_TAP_DUE, now); > } > > if (th->fin) > - conn_flag(c, conn, ACK_FROM_TAP_DUE); > + conn_flag(c, conn, ACK_FROM_TAP_DUE, now); > > /* RFC 793, 3.1: "[...] and the first data octet is ISN+1." */ > if (th->fin || th->syn) > @@ -1371,18 +1388,19 @@ int tcp_prepare_flags(const struct ctx *c, struct tcp_tap_conn *conn, > * @c: Execution context > * @conn: Connection pointer > * @flags: TCP flags: if not set, send segment only if ACK is due > + * @now: Current timestamp > * > * Return: negative error code on fatal connection failure, 0 otherwise > */ > static int tcp_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, > - int flags) > + int flags, const struct timespec *now) > { > int ret; > > if (c->mode == MODE_VU) > - ret = tcp_vu_send_flag(c, conn, flags); > + ret = tcp_vu_send_flag(c, conn, flags, now); > else > - ret = tcp_buf_send_flag(c, conn, flags); > + ret = tcp_buf_send_flag(c, conn, flags, now); > > return ret == -EAGAIN ? 0 : ret; > } > @@ -1413,28 +1431,32 @@ void tcp_linger0_(const struct flow_common *f, int s) > * tcp_sock_rst() - Close TCP connection forcing RST on socket side > * @c: Execution context > * @conn: Connection pointer > + * @now: Current timestamp > */ > -static void tcp_sock_rst(const struct ctx *c, struct tcp_tap_conn *conn) > +static void tcp_sock_rst(const struct ctx *c, struct tcp_tap_conn *conn, > + const struct timespec *now) > { > /* Force RST on socket to inform the peer */ > tcp_linger0(conn, conn->sock); > - conn_event(c, conn, CLOSED); > + conn_event(c, conn, CLOSED, now); > } > > /** > * tcp_rst_do() - Reset a tap connection: send RST segment on both sides, close > * @c: Execution context > * @conn: Connection pointer > + * @now: Current timestamp > */ > -void tcp_rst_do(const struct ctx *c, struct tcp_tap_conn *conn) > +void tcp_rst_do(const struct ctx *c, struct tcp_tap_conn *conn, > + const struct timespec *now) > { > if (conn->events == CLOSED) > return; > > /* Send RST on tap */ > - tcp_send_flag(c, conn, RST); > + tcp_send_flag(c, conn, RST, now); > > - tcp_sock_rst(c, conn); > + tcp_sock_rst(c, conn, now); > } > > /** > @@ -1459,11 +1481,13 @@ static void tcp_get_tap_ws(struct tcp_tap_conn *conn, > * @c: Execution context > * @conn: Connection pointer > * @wnd: Window value, host order, unscaled > + * @now: Current timestamp > * > * Return: false on zero window (not stored to wnd_from_tap), true otherwise > */ > static bool tcp_tap_window_update(const struct ctx *c, > - struct tcp_tap_conn *conn, unsigned wnd) > + struct tcp_tap_conn *conn, unsigned wnd, > + const struct timespec *now) > { > wnd = MIN(MAX_WINDOW, wnd << conn->ws_from_tap); > > @@ -1474,7 +1498,7 @@ static bool tcp_tap_window_update(const struct ctx *c, > * that no data beyond the updated window will be acknowledged. > */ > if (!wnd && SEQ_LT(conn->seq_ack_from_tap, conn->seq_to_tap)) { > - tcp_rewind_seq(c, conn); > + tcp_rewind_seq(c, conn, now); > return false; > } > > @@ -1600,9 +1624,11 @@ static uint16_t tcp_conn_tap_mss(const struct tcp_tap_conn *conn, > * @c: Execution context > * @conn: Connection entry for socket to bind > * @s: Outbound TCP socket > + * @now: Current timestamp > */ > static void tcp_bind_outbound(const struct ctx *c, > - const struct tcp_tap_conn *conn, int s) > + const struct tcp_tap_conn *conn, int s, > + const struct timespec *now) > { > const struct flowside *tgt = &conn->f.side[TGTSIDE]; > union sockaddr_inany bind_sa; > @@ -1613,10 +1639,11 @@ static void tcp_bind_outbound(const struct ctx *c, > if (bind(s, &bind_sa.sa, socklen_inany(&bind_sa))) { > char sstr[INANY_ADDRSTRLEN]; > > - flow_dbg_perror(conn, > - "Can't bind TCP outbound socket to %s:%hu", > - inany_ntop(&tgt->oaddr, sstr, sizeof(sstr)), > - tgt->oport); > + flow_warn_perror_ratelimit( > + conn, now, > + "Can't bind TCP outbound socket to %s:%hu", > + inany_ntop(&tgt->oaddr, sstr, sizeof(sstr)), > + tgt->oport); > } > } > > @@ -1625,9 +1652,10 @@ static void tcp_bind_outbound(const struct ctx *c, > if (setsockopt(s, SOL_SOCKET, SO_BINDTODEVICE, > c->ip4.ifname_out, > strlen(c->ip4.ifname_out))) { > - flow_dbg_perror(conn, > - "Can't bind IPv4 TCP socket to interface %s", > - c->ip4.ifname_out); > + flow_warn_perror_ratelimit( > + conn, now, > + "Can't bind IPv4 TCP socket to interface %s", > + c->ip4.ifname_out); > } > } > } else if (bind_sa.sa_family == AF_INET6) { > @@ -1635,9 +1663,10 @@ static void tcp_bind_outbound(const struct ctx *c, > if (setsockopt(s, SOL_SOCKET, SO_BINDTODEVICE, > c->ip6.ifname_out, > strlen(c->ip6.ifname_out))) { > - flow_dbg_perror(conn, > - "Can't bind IPv6 TCP socket to interface %s", > - c->ip6.ifname_out); > + flow_warn_perror_ratelimit( > + conn, now, > + "Can't bind IPv6 TCP socket to interface %s", > + c->ip6.ifname_out); > } > } > } > @@ -1681,9 +1710,11 @@ static void tcp_conn_from_tap(const struct ctx *c, sa_family_t af, > goto cancel; > > if (flow->f.pif[TGTSIDE] != PIF_HOST) { > - flow_err(flow, "No support for forwarding TCP from %s to %s", > - pif_name(flow->f.pif[INISIDE]), > - pif_name(flow->f.pif[TGTSIDE])); > + flow_err_ratelimit( > + flow, now, > + "No support for forwarding TCP from %s to %s", > + pif_name(flow->f.pif[INISIDE]), > + pif_name(flow->f.pif[TGTSIDE])); > goto cancel; > } > > @@ -1716,7 +1747,7 @@ static void tcp_conn_from_tap(const struct ctx *c, sa_family_t af, > */ > if (bind(s, &sa.sa, socklen_inany(&sa))) { > if (errno != EADDRNOTAVAIL && errno != EACCES) > - conn_flag(c, conn, LOCAL); > + conn_flag(c, conn, LOCAL, now); > } else { > /* Not a local, bound destination, inconclusive test */ > close(s); > @@ -1728,10 +1759,10 @@ static void tcp_conn_from_tap(const struct ctx *c, sa_family_t af, > conn->timer = -1; > flow_epollid_set(&conn->f, EPOLLFD_ID_DEFAULT); > if (flow_epoll_set(&conn->f, EPOLL_CTL_ADD, 0, s, TGTSIDE) < 0) { > - flow_perror(flow, "Can't register with epoll"); > + flow_perror_ratelimit(flow, now, "Can't register with epoll"); > goto cancel; > } > - conn_event(c, conn, TAP_SYN_RCVD); > + conn_event(c, conn, TAP_SYN_RCVD, now); > > conn->wnd_to_tap = WINDOW_DEFAULT; > > @@ -1756,11 +1787,11 @@ static void tcp_conn_from_tap(const struct ctx *c, sa_family_t af, > conn->seq_to_tap = tcp_init_seq(hash, now); > conn->seq_ack_from_tap = conn->seq_to_tap; > > - tcp_bind_outbound(c, conn, s); > + tcp_bind_outbound(c, conn, s, now); > > if (connect(s, &sa.sa, socklen_inany(&sa))) { > if (errno != EINPROGRESS) { > - tcp_rst(c, conn); > + tcp_rst(c, conn, now); > goto cancel; > } > > @@ -1768,10 +1799,10 @@ static void tcp_conn_from_tap(const struct ctx *c, sa_family_t af, > } else { > tcp_get_sndbuf(conn); > > - if (tcp_send_flag(c, conn, SYN | ACK)) > + if (tcp_send_flag(c, conn, SYN | ACK, now)) > goto cancel; > > - conn_event(c, conn, TAP_SYN_ACK_SENT); > + conn_event(c, conn, TAP_SYN_ACK_SENT, now); > } > > tcp_epoll_ctl(conn); > @@ -1830,12 +1861,14 @@ static int tcp_sock_consume(const struct tcp_tap_conn *conn, uint32_t ack_seq) > * tcp_data_from_sock() - Handle new data from socket, queue to tap, in window > * @c: Execution context > * @conn: Connection pointer > + * @now: Current timestamp > * > * Return: negative on connection reset, 0 otherwise > * > * #syscalls recvmsg > */ > -static int tcp_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn) > +static int tcp_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn, > + const struct timespec *now) > { > uint32_t wnd_scaled = conn->wnd_from_tap << conn->ws_from_tap; > uint32_t already_sent; > @@ -1845,8 +1878,8 @@ static int tcp_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn) > flow_trace(conn, "ACK sequence gap: ACK for %u, sent: %u", > conn->seq_ack_from_tap, conn->seq_to_tap); > conn->seq_to_tap = conn->seq_ack_from_tap; > - if (tcp_set_peek_offset(conn, 0)) { > - tcp_rst(c, conn); > + if (tcp_set_peek_offset(conn, 0, now)) { > + tcp_rst(c, conn, now); > return -1; > } > } > @@ -1855,16 +1888,16 @@ static int tcp_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn) > already_sent = conn->seq_to_tap - conn->seq_ack_from_tap; > > if (!wnd_scaled || already_sent >= wnd_scaled) { > - conn_flag(c, conn, ACK_FROM_TAP_BLOCKS); > - conn_flag(c, conn, STALLED); > - conn_flag(c, conn, ACK_FROM_TAP_DUE); > + conn_flag(c, conn, ACK_FROM_TAP_BLOCKS, now); > + conn_flag(c, conn, STALLED, now); > + conn_flag(c, conn, ACK_FROM_TAP_DUE, now); > return 0; > } > > if (c->mode == MODE_VU) > - return tcp_vu_data_from_sock(c, conn, already_sent); > + return tcp_vu_data_from_sock(c, conn, already_sent, now); > > - return tcp_buf_data_from_sock(c, conn, already_sent); > + return tcp_buf_data_from_sock(c, conn, already_sent, now); > } > > /** > @@ -1890,13 +1923,15 @@ static ssize_t tcp_packet_data_len(const struct tcphdr *th, size_t l4len) > * @conn: Connection pointer > * @p: Pool of TCP packets, with TCP headers > * @idx: Index of first data packet in pool > + * @now: Current timestamp > * > * #syscalls sendmsg > * > * Return: count of consumed packets > */ > static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn, > - const struct pool *p, int idx) > + const struct pool *p, int idx, > + const struct timespec *now) > { > int i, iov_i, ack = 0, fin = 0, retr = 0, keep = -1, partial_send = 0; > uint16_t max_ack_seq_wnd = conn->wnd_from_tap; > @@ -1933,7 +1968,7 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn, > return -1; > > if (th->rst) { > - tcp_sock_rst(c, conn); > + tcp_sock_rst(c, conn, now); > return 1; > } > > @@ -1946,10 +1981,10 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn, > "keep-alive sequence: %u, previous: %u", > seq, conn->seq_from_tap); > > - if (tcp_send_flag(c, conn, ACK)) > + if (tcp_send_flag(c, conn, ACK, now)) > return -1; > > - tcp_timer_ctl(c, conn); > + tcp_timer_ctl(c, conn, now); > > if (setsockopt(conn->sock, SOL_SOCKET, SO_KEEPALIVE, > &((int){ 1 }), sizeof(int))) > @@ -1957,7 +1992,7 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn, > > if (p->count == 1) { > tcp_tap_window_update(c, conn, > - ntohs(th->window)); > + ntohs(th->window), now); > return 1; > } > > @@ -1983,7 +2018,7 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn, > * well. > */ > if (!ntohs(th->window)) > - tcp_rewind_seq(c, conn); > + tcp_rewind_seq(c, conn, now); > > max_ack_seq_wnd = ntohs(th->window); > max_ack_seq = ack_seq; > @@ -2043,19 +2078,19 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn, > > /* On socket flush failure, pretend there was no ACK, try again later */ > if (ack && !tcp_sock_consume(conn, max_ack_seq)) > - tcp_update_seqack_from_tap(c, conn, max_ack_seq); > + tcp_update_seqack_from_tap(c, conn, max_ack_seq, now); > > - tcp_tap_window_update(c, conn, max_ack_seq_wnd); > + tcp_tap_window_update(c, conn, max_ack_seq_wnd, now); > > if (retr) { > flow_trace(conn, > "fast re-transmit, ACK: %u, previous sequence: %u", > conn->seq_ack_from_tap, conn->seq_to_tap); > > - if (tcp_rewind_seq(c, conn)) > + if (tcp_rewind_seq(c, conn, now)) > return -1; > > - tcp_data_from_sock(c, conn); > + tcp_data_from_sock(c, conn, now); > } > > if (!iov_i) > @@ -2071,7 +2106,7 @@ eintr: > * Then swiftly looked away and left. > */ > conn->seq_from_tap = seq_from_tap; > - if (tcp_send_flag(c, conn, ACK)) > + if (tcp_send_flag(c, conn, ACK, now)) > return -1; > } > > @@ -2079,7 +2114,7 @@ eintr: > goto eintr; > > if (errno == EAGAIN || errno == EWOULDBLOCK) { > - if (tcp_send_flag(c, conn, ACK | DUP_ACK)) > + if (tcp_send_flag(c, conn, ACK | DUP_ACK, now)) > return -1; > > uint32_t events = tcp_conn_epoll_events(conn->events, > @@ -2115,7 +2150,7 @@ out: > */ > if (conn->seq_dup_ack_approx != (conn->seq_from_tap & 0xff)) { > conn->seq_dup_ack_approx = conn->seq_from_tap & 0xff; > - if (tcp_send_flag(c, conn, ACK | DUP_ACK)) > + if (tcp_send_flag(c, conn, ACK | DUP_ACK, now)) > return -1; > } > return p->count - idx; > @@ -2123,14 +2158,14 @@ out: > > if (ack && conn->events & TAP_FIN_SENT && > conn->seq_ack_from_tap == conn->seq_to_tap) > - conn_event(c, conn, TAP_FIN_ACKED); > + conn_event(c, conn, TAP_FIN_ACKED, now); > > if (fin && !partial_send) { > conn->seq_from_tap++; > > - conn_event(c, conn, TAP_FIN_RCVD); > + conn_event(c, conn, TAP_FIN_RCVD, now); > } else { > - if (tcp_send_flag(c, conn, ACK_IF_NEEDED)) > + if (tcp_send_flag(c, conn, ACK_IF_NEEDED, now)) > return -1; > } > > @@ -2144,13 +2179,15 @@ out: > * @th: TCP header of SYN, ACK segment: caller MUST ensure it's there > * @opts: Pointer to start of options > * @optlen: Bytes in options: caller MUST ensure available length > + * @now: Current timestamp > */ > static void tcp_conn_from_sock_finish(const struct ctx *c, > struct tcp_tap_conn *conn, > const struct tcphdr *th, > - const char *opts, size_t optlen) > + const char *opts, size_t optlen, > + const struct timespec *now) > { > - tcp_tap_window_update(c, conn, ntohs(th->window)); > + tcp_tap_window_update(c, conn, ntohs(th->window), now); > tcp_get_tap_ws(conn, opts, optlen); > > /* First value is not scaled */ > @@ -2163,21 +2200,21 @@ static void tcp_conn_from_sock_finish(const struct ctx *c, > conn->seq_from_tap = conn->seq_init_from_tap; > conn->seq_ack_to_tap = conn->seq_from_tap; > > - conn_event(c, conn, ESTABLISHED); > - if (tcp_set_peek_offset(conn, 0)) { > - tcp_rst(c, conn); > + conn_event(c, conn, ESTABLISHED, now); > + if (tcp_set_peek_offset(conn, 0, now)) { > + tcp_rst(c, conn, now); > return; > } > > - if (tcp_send_flag(c, conn, ACK)) { > - tcp_rst(c, conn); > + if (tcp_send_flag(c, conn, ACK, now)) { > + tcp_rst(c, conn, now); > return; > } > > /* The client might have sent data already, which we didn't > * dequeue waiting for SYN,ACK from tap -- check now. > */ > - tcp_data_from_sock(c, conn); > + tcp_data_from_sock(c, conn, now); > } > > /** > @@ -2318,7 +2355,7 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af, > flow_trace(conn, "packet length %zu from tap", l4len); > > if (th->rst) { > - tcp_sock_rst(c, conn); > + tcp_sock_rst(c, conn, now); > return 1; > } > > @@ -2326,12 +2363,13 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af, > conn->tap_inactive = false; > > if (th->ack && !(conn->events & ESTABLISHED)) > - tcp_update_seqack_from_tap(c, conn, ntohl(th->ack_seq)); > + tcp_update_seqack_from_tap(c, conn, ntohl(th->ack_seq), now); > > /* Establishing connection from socket */ > if (conn->events & SOCK_ACCEPTED) { > if (th->syn && th->ack && !th->fin) { > - tcp_conn_from_sock_finish(c, conn, th, opts, optlen); > + tcp_conn_from_sock_finish(c, conn, th, opts, optlen, > + now); > return 1; > } > > @@ -2346,22 +2384,23 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af, > if (!(conn->events & TAP_SYN_ACK_SENT)) > goto reset; > > - conn_event(c, conn, ESTABLISHED); > - if (tcp_set_peek_offset(conn, 0)) > + conn_event(c, conn, ESTABLISHED, now); > + if (tcp_set_peek_offset(conn, 0, now)) > goto reset; > > if (th->fin) { > conn->seq_from_tap++; > > if (shutdown(conn->sock, SHUT_WR) < 0) { > - flow_dbg_perror(conn, "shutdown() failed"); > + flow_warn_perror_ratelimit(conn, now, > + "shutdown() failed"); > goto reset; > } > > - if (tcp_send_flag(c, conn, ACK)) > + if (tcp_send_flag(c, conn, ACK, now)) > goto reset; > > - conn_event(c, conn, SOCK_FIN_SENT); > + conn_event(c, conn, SOCK_FIN_SENT, now); > > return 1; > } > @@ -2369,8 +2408,8 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af, > if (!th->ack) > goto reset; > > - if (tcp_tap_window_update(c, conn, ntohs(th->window))) > - tcp_data_from_sock(c, conn); > + if (tcp_tap_window_update(c, conn, ntohs(th->window), now)) > + tcp_data_from_sock(c, conn, now); > > if (p->count - idx == 1) > return 1; > @@ -2394,38 +2433,40 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af, > * later > */ > if (th->ack && !tcp_sock_consume(conn, ntohl(th->ack_seq))) > - tcp_update_seqack_from_tap(c, conn, ntohl(th->ack_seq)); > + tcp_update_seqack_from_tap(c, conn, ntohl(th->ack_seq), > + now); > > if (retr) { > flow_trace(conn, > "fast re-transmit, ACK: %u, previous sequence: %u", > ntohl(th->ack_seq), conn->seq_to_tap); > > - if (tcp_rewind_seq(c, conn)) > + if (tcp_rewind_seq(c, conn, now)) > return -1; > } > > - if (tcp_tap_window_update(c, conn, ntohs(th->window)) || retr) > - tcp_data_from_sock(c, conn); > + if (tcp_tap_window_update(c, conn, ntohs(th->window), now) || > + retr) > + tcp_data_from_sock(c, conn, now); > > if (conn->seq_ack_from_tap == conn->seq_to_tap) { > if (th->ack && conn->events & TAP_FIN_SENT) > - conn_event(c, conn, TAP_FIN_ACKED); > + conn_event(c, conn, TAP_FIN_ACKED, now); > > if (conn->events & SOCK_FIN_RCVD && > conn->events & TAP_FIN_ACKED) > - conn_event(c, conn, CLOSED); > + conn_event(c, conn, CLOSED, now); > } > > return 1; > } > > /* Established connections accepting data from tap */ > - count = tcp_data_from_tap(c, conn, p, idx); > + count = tcp_data_from_tap(c, conn, p, idx, now); > if (count == -1) > goto reset; > > - conn_flag(c, conn, ~STALLED); > + conn_flag(c, conn, ~STALLED, now); > > if (conn->seq_ack_to_tap != conn->seq_from_tap) > ack_due = 1; > @@ -2435,12 +2476,13 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af, > struct tcp_info tinfo; > > if (shutdown(conn->sock, SHUT_WR) < 0) { > - flow_dbg_perror(conn, "shutdown() failed"); > + flow_warn_perror_ratelimit(conn, now, > + "shutdown() failed"); > goto reset; > } > > - conn_event(c, conn, SOCK_FIN_SENT); > - if (tcp_send_flag(c, conn, ACK)) > + conn_event(c, conn, SOCK_FIN_SENT, now); > + if (tcp_send_flag(c, conn, ACK, now)) > goto reset; > > ack_due = 0; > @@ -2461,7 +2503,7 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af, > } > > if (ack_due) > - conn_flag(c, conn, ACK_TO_TAP_DUE); > + conn_flag(c, conn, ACK_TO_TAP_DUE, now); > > return count; > > @@ -2470,7 +2512,7 @@ reset: > * remaining packets in the batch, since they'd be invalidated when our > * RST is received, even if otherwise good. > */ > - tcp_rst(c, conn); > + tcp_rst(c, conn, now); > return p->count - idx; > } > > @@ -2478,25 +2520,27 @@ reset: > * tcp_connect_finish() - Handle completion of connect() from EPOLLOUT event > * @c: Execution context > * @conn: Connection pointer > + * @now: Current timestamp > */ > -static void tcp_connect_finish(const struct ctx *c, struct tcp_tap_conn *conn) > +static void tcp_connect_finish(const struct ctx *c, struct tcp_tap_conn *conn, > + const struct timespec *now) > { > socklen_t sl; > int so; > > sl = sizeof(so); > if (getsockopt(conn->sock, SOL_SOCKET, SO_ERROR, &so, &sl) || so) { > - tcp_rst(c, conn); > + tcp_rst(c, conn, now); > return; > } > > - if (tcp_send_flag(c, conn, SYN | ACK)) { > - tcp_rst(c, conn); > + if (tcp_send_flag(c, conn, SYN | ACK, now)) { > + tcp_rst(c, conn, now); > return; > } > > - conn_event(c, conn, TAP_SYN_ACK_SENT); > - conn_flag(c, conn, ACK_FROM_TAP_DUE); > + conn_event(c, conn, TAP_SYN_ACK_SENT, now); > + conn_flag(c, conn, ACK_FROM_TAP_DUE, now); > } > > /** > @@ -2519,13 +2563,13 @@ static void tcp_tap_conn_from_sock(const struct ctx *c, union flow *flow, > > flow_epollid_set(&conn->f, EPOLLFD_ID_DEFAULT); > if (flow_epoll_set(&conn->f, EPOLL_CTL_ADD, 0, s, INISIDE) < 0) { > - flow_perror(flow, "Can't register with epoll"); > - conn_flag(c, conn, CLOSING); > + flow_perror_ratelimit(flow, now, "Can't register with epoll"); > + conn_flag(c, conn, CLOSING, now); > FLOW_ACTIVATE(conn); > return; > } > > - conn_event(c, conn, SOCK_ACCEPTED); > + conn_event(c, conn, SOCK_ACCEPTED, now); > > hash = flow_hash_insert(c, TAP_SIDX(conn)); > conn->seq_to_tap = tcp_init_seq(hash, now); > @@ -2534,13 +2578,13 @@ static void tcp_tap_conn_from_sock(const struct ctx *c, union flow *flow, > > conn->wnd_from_tap = WINDOW_DEFAULT; > > - if (tcp_send_flag(c, conn, SYN)) { > - conn_flag(c, conn, CLOSING); > + if (tcp_send_flag(c, conn, SYN, now)) { > + conn_flag(c, conn, CLOSING, now); > FLOW_ACTIVATE(conn); > return; > } > > - conn_flag(c, conn, ACK_FROM_TAP_DUE); > + conn_flag(c, conn, ACK_FROM_TAP_DUE, now); > > tcp_get_sndbuf(conn); > > @@ -2598,7 +2642,7 @@ void tcp_listen_handler(const struct ctx *c, union epoll_ref ref, > switch (flow->f.pif[TGTSIDE]) { > case PIF_SPLICE: > case PIF_HOST: > - tcp_splice_conn_from_sock(c, flow, s); > + tcp_splice_conn_from_sock(c, flow, s, now); > break; > > case PIF_TAP: > @@ -2606,9 +2650,11 @@ void tcp_listen_handler(const struct ctx *c, union epoll_ref ref, > break; > > default: > - flow_err(flow, "No support for forwarding TCP from %s to %s", > - pif_name(flow->f.pif[INISIDE]), > - pif_name(flow->f.pif[TGTSIDE])); > + flow_err_ratelimit( > + flow, now, > + "No support for forwarding TCP from %s to %s", > + pif_name(flow->f.pif[INISIDE]), > + pif_name(flow->f.pif[TGTSIDE])); > goto rst; > } > > @@ -2625,12 +2671,14 @@ cancel: > * tcp_timer_handler() - timerfd events: close, send ACK, retransmit, or reset > * @c: Execution context > * @ref: epoll reference of timer (not connection) > + * @now: Current timestamp > * > * #syscalls timerfd_gettime|timerfd_gettime64 > * #syscalls arm:timerfd_gettime64 i686:timerfd_gettime64 > * #syscalls arm:timerfd_settime64 i686:timerfd_settime64 > */ > -void tcp_timer_handler(const struct ctx *c, union epoll_ref ref) > +void tcp_timer_handler(const struct ctx *c, union epoll_ref ref, > + const struct timespec *now) > { > struct itimerspec check_armed = { { 0 }, { 0 } }; > struct tcp_tap_conn *conn = &FLOW(ref.flow)->tcp; > @@ -2643,17 +2691,17 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref) > * and we just set the timer to a new point in the future: discard it. > */ > if (timerfd_gettime(conn->timer, &check_armed)) > - flow_perror(conn, "failed to read timer"); > + flow_perror_ratelimit(conn, now, "failed to read timer"); > > if (check_armed.it_value.tv_sec || check_armed.it_value.tv_nsec) > return; > > if (conn->flags & ACK_TO_TAP_DUE) { > - if (tcp_send_flag(c, conn, ACK_IF_NEEDED)) { > - tcp_rst(c, conn); > + if (tcp_send_flag(c, conn, ACK_IF_NEEDED, now)) { > + tcp_rst(c, conn, now); > return; > } > - tcp_timer_ctl(c, conn); > + tcp_timer_ctl(c, conn, now); > } else if (conn->flags & ACK_FROM_TAP_DUE) { > if (!(conn->events & ESTABLISHED)) { > unsigned int max; > @@ -2662,20 +2710,20 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref) > max = MIN(TCP_MAX_RETRIES, max); > if (conn->retries >= max) { > flow_dbg(conn, "handshake timeout"); > - tcp_rst(c, conn); > + tcp_rst(c, conn, now); > } else { > flow_trace(conn, "SYN timeout, retry"); > - if (tcp_send_flag(c, conn, SYN)) { > - tcp_rst(c, conn); > + if (tcp_send_flag(c, conn, SYN, now)) { > + tcp_rst(c, conn, now); > return; > } > conn->retries++; > - conn_flag(c, conn, SYN_RETRIED); > - tcp_timer_ctl(c, conn); > + conn_flag(c, conn, SYN_RETRIED, now); > + tcp_timer_ctl(c, conn, now); > } > } else if (conn->retries == TCP_MAX_RETRIES) { > flow_dbg(conn, "retransmissions count exceeded"); > - tcp_rst(c, conn); > + tcp_rst(c, conn, now); > } else { > flow_dbg(conn, "ACK timeout, retry"); > > @@ -2683,11 +2731,11 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref) > conn->wnd_from_tap = 1; /* Zero-window probe */ > > conn->retries++; > - if (tcp_rewind_seq(c, conn)) > + if (tcp_rewind_seq(c, conn, now)) > return; > > - tcp_data_from_sock(c, conn); > - tcp_timer_ctl(c, conn); > + tcp_data_from_sock(c, conn, now); > + tcp_timer_ctl(c, conn, now); > } > } > } > @@ -2697,9 +2745,10 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref) > * @c: Execution context > * @ref: epoll reference > * @events: epoll events bitmap > + * @now: Current timestamp > */ > void tcp_sock_handler(const struct ctx *c, union epoll_ref ref, > - uint32_t events) > + uint32_t events, const struct timespec *now) > { > struct tcp_tap_conn *conn = conn_at_sidx(ref.flowside); > > @@ -2710,32 +2759,32 @@ void tcp_sock_handler(const struct ctx *c, union epoll_ref ref, > return; > > if (events & EPOLLERR) { > - tcp_rst(c, conn); > + tcp_rst(c, conn, now); > return; > } > > conn->inactive = false; > > if ((conn->events & TAP_FIN_ACKED) && (events & EPOLLHUP)) { > - conn_event(c, conn, CLOSED); > + conn_event(c, conn, CLOSED, now); > return; > } > > if (conn->events & ESTABLISHED) { > if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) > - conn_event(c, conn, CLOSED); > + conn_event(c, conn, CLOSED, now); > > if (events & (EPOLLRDHUP | EPOLLHUP)) > - conn_event(c, conn, SOCK_FIN_RCVD); > + conn_event(c, conn, SOCK_FIN_RCVD, now); > > if (events & EPOLLIN) > - tcp_data_from_sock(c, conn); > + tcp_data_from_sock(c, conn, now); > > if (events & EPOLLOUT) { > tcp_epoll_ctl(conn); > - if (tcp_update_seqack_wnd(c, conn, false, NULL) && > - tcp_send_flag(c, conn, ACK)) { > - tcp_rst(c, conn); > + if (tcp_update_seqack_wnd(c, conn, false, NULL, now) && > + tcp_send_flag(c, conn, ACK, now)) { > + tcp_rst(c, conn, now); > return; > } > } > @@ -2745,7 +2794,7 @@ void tcp_sock_handler(const struct ctx *c, union epoll_ref ref, > > /* EPOLLHUP during handshake: reset */ > if (events & EPOLLHUP) { > - tcp_rst(c, conn); > + tcp_rst(c, conn, now); > return; > } > > @@ -2755,7 +2804,7 @@ void tcp_sock_handler(const struct ctx *c, union epoll_ref ref, > > if (conn->events == TAP_SYN_RCVD) { > if (events & EPOLLOUT) > - tcp_connect_finish(c, conn); > + tcp_connect_finish(c, conn, now); > /* Data? Check later */ > } > } > @@ -2939,8 +2988,8 @@ static void tcp_keepalive(struct ctx *c, const struct timespec *now) > if (conn->tap_inactive) { > flow_dbg(conn, "No tap activity for least %us, send keepalive", > KEEPALIVE_INTERVAL); > - if (tcp_send_flag(c, conn, KEEPALIVE)) > - tcp_rst(c, conn); > + if (tcp_send_flag(c, conn, KEEPALIVE, now)) > + tcp_rst(c, conn, now); > } > > /* Ready to check fot next interval */ > @@ -2969,7 +3018,7 @@ static void tcp_inactivity(struct ctx *c, const struct timespec *now) > /* No activity in this interval, reset */ > flow_dbg(conn, "Inactive for at least %us, resetting", > INACTIVITY_INTERVAL); > - tcp_rst(c, conn); > + tcp_rst(c, conn, now); > } > > /* Ready to check fot next interval */ > @@ -2985,7 +3034,7 @@ static void tcp_inactivity(struct ctx *c, const struct timespec *now) > /* cppcheck-suppress [constParameterPointer, unmatchedSuppression] */ > void tcp_defer_handler(struct ctx *c, const struct timespec *now) > { > - tcp_payload_flush(c); > + tcp_payload_flush(c, now); > > if (timespec_diff_ms(now, &c->tcp.timer_run) < TCP_TIMER_INTERVAL) > return; > @@ -3532,11 +3581,13 @@ int tcp_flow_migrate_source(int fd, struct tcp_tap_conn *conn) > * @c: Execution context > * @fd: Descriptor for state migration > * @conn: Pointer to the TCP connection structure > + * @now: Current timesstamp > * > * Return: 0 on success, negative (not -EIO) on failure, -EIO on sending failure > */ > int tcp_flow_migrate_source_ext(const struct ctx *c, > - int fd, const struct tcp_tap_conn *conn) > + int fd, const struct tcp_tap_conn *conn, > + const struct timespec *now) > { > uint32_t peek_offset = conn->seq_to_tap - conn->seq_ack_from_tap; > struct tcp_tap_transfer_ext *t = &migrate_ext[FLOW_IDX(conn)]; > @@ -3546,7 +3597,7 @@ int tcp_flow_migrate_source_ext(const struct ctx *c, > /* Disable SO_PEEK_OFF, it will make accessing the queues in repair mode > * weird. > */ > - if (tcp_set_peek_offset(conn, -1)) { > + if (tcp_set_peek_offset(conn, -1, now)) { > rc = -errno; > goto fail; > } > @@ -3811,10 +3862,12 @@ out: > * @c: Execution context > * @conn: Connection entry to complete with extra data > * @fd: Descriptor for state migration > + * @now: Current timestamp > * > * Return: 0 on success, negative on fatal failure, but 0 on single flow failure > */ > -int tcp_flow_migrate_target_ext(struct ctx *c, struct tcp_tap_conn *conn, int fd) > +int tcp_flow_migrate_target_ext(struct ctx *c, struct tcp_tap_conn *conn, > + int fd, const struct timespec *now) > { > uint32_t peek_offset = conn->seq_to_tap - conn->seq_ack_from_tap; > struct tcp_tap_transfer_ext t; > @@ -3968,13 +4021,13 @@ int tcp_flow_migrate_target_ext(struct ctx *c, struct tcp_tap_conn *conn, int fd > } > } > > - if (tcp_set_peek_offset(conn, peek_offset)) > + if (tcp_set_peek_offset(conn, peek_offset, now)) > goto fail; > > - if (tcp_send_flag(c, conn, ACK)) > + if (tcp_send_flag(c, conn, ACK, now)) > goto fail; > > - tcp_data_from_sock(c, conn); > + tcp_data_from_sock(c, conn, now); > > if ((rc = tcp_epoll_ctl(conn))) { > flow_dbg(conn, > @@ -3992,7 +4045,7 @@ fail: > } > > conn->flags = 0; /* Not waiting for ACK, don't schedule timer */ > - tcp_rst(c, conn); > + tcp_rst(c, conn, now); > > return 0; > } > diff --git a/tcp.h b/tcp.h > index 57deef57..b68f882d 100644 > --- a/tcp.h > +++ b/tcp.h > @@ -18,11 +18,12 @@ > > struct ctx; > > -void tcp_timer_handler(const struct ctx *c, union epoll_ref ref); > +void tcp_timer_handler(const struct ctx *c, union epoll_ref ref, > + const struct timespec *now); > void tcp_listen_handler(const struct ctx *c, union epoll_ref ref, > const struct timespec *now); > void tcp_sock_handler(const struct ctx *c, union epoll_ref ref, > - uint32_t events); > + uint32_t events, const struct timespec *now); > int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af, > const void *saddr, const void *daddr, uint32_t flow_lbl, > const struct pool *p, int idx, const struct timespec *now); > diff --git a/tcp_buf.c b/tcp_buf.c > index 1fc49959..72c45412 100644 > --- a/tcp_buf.c > +++ b/tcp_buf.c > @@ -106,9 +106,11 @@ void tcp_sock_iov_init(const struct ctx *c) > * @conns: Array of connection pointers corresponding to queued frames > * @frames: Two-dimensional array containing queued frames with sub-iovs > * @num_frames: Number of entries in the two arrays to be compared > + * @now: Current timestamp > */ > static void tcp_revert_seq(const struct ctx *c, struct tcp_tap_conn **conns, > - struct iovec (*frames)[TCP_NUM_IOVS], int num_frames) > + struct iovec (*frames)[TCP_NUM_IOVS], int num_frames, > + const struct timespec *now) > { > int i; > > @@ -123,16 +125,17 @@ static void tcp_revert_seq(const struct ctx *c, struct tcp_tap_conn **conns, > > conn->seq_to_tap = seq; > peek_offset = conn->seq_to_tap - conn->seq_ack_from_tap; > - if (tcp_set_peek_offset(conn, peek_offset)) > - tcp_rst(c, conn); > + if (tcp_set_peek_offset(conn, peek_offset, now)) > + tcp_rst(c, conn, now); > } > } > > /** > * tcp_payload_flush() - Send out buffers for segments with data or flags > * @c: Execution context > + * @now: Current timestamp > */ > -void tcp_payload_flush(const struct ctx *c) > +void tcp_payload_flush(const struct ctx *c, const struct timespec *now) > { > size_t m; > > @@ -140,7 +143,7 @@ void tcp_payload_flush(const struct ctx *c) > tcp_payload_used); > if (m != tcp_payload_used) { > tcp_revert_seq(c, &tcp_frame_conns[m], &tcp_l2_iov[m], > - tcp_payload_used - m); > + tcp_payload_used - m, now); > } > tcp_payload_used = 0; > } > @@ -198,13 +201,15 @@ static void tcp_l2_buf_fill_headers(const struct ctx *c, > > /** > * tcp_buf_send_flag() - Send segment with flags to tap (no payload) > - * @c: Execution context > - * @conn: Connection pointer > - * @flags: TCP flags: if not set, send segment only if ACK is due > + * @c: Execution context > + * @conn: Connection pointer > + * @flags: TCP flags: if not set, send segment only if ACK is due > + * @now: Current timestamp > * > * Return: negative error code on connection reset, 0 otherwise > */ > -int tcp_buf_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags) > +int tcp_buf_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags, > + const struct timespec *now) > { > struct tcp_payload_t *payload; > struct iovec *iov; > @@ -223,7 +228,8 @@ int tcp_buf_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags) > payload = iov[TCP_IOV_PAYLOAD].iov_base; > seq = conn->seq_to_tap; > ret = tcp_prepare_flags(c, conn, flags, &payload->th, > - (struct tcp_syn_opts *)&payload->data, &optlen); > + (struct tcp_syn_opts *)&payload->data, > + &optlen, now); > if (ret <= 0) > return ret; > > @@ -253,7 +259,7 @@ int tcp_buf_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags) > } > > if (tcp_payload_used > TCP_FRAMES_MEM - 2) > - tcp_payload_flush(c); > + tcp_payload_flush(c, now); > > return 0; > } > @@ -266,9 +272,11 @@ int tcp_buf_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags) > * @no_csum: Don't compute IPv4 checksum, use the one from previous buffer > * @seq: Sequence number to be sent > * @push: Set PSH flag, last segment in a batch > + * @now: Current timestamp > */ > static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn, > - ssize_t dlen, int no_csum, uint32_t seq, bool push) > + ssize_t dlen, int no_csum, uint32_t seq, bool push, > + const struct timespec *now) > { > struct tcp_payload_t *payload; > uint32_t check = IP4_CSUM; > @@ -302,7 +310,7 @@ static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn, > tcp_l2_buf_pad(iov); > > if (++tcp_payload_used > TCP_FRAMES_MEM - 1) > - tcp_payload_flush(c); > + tcp_payload_flush(c, now); > } > > /** > @@ -310,13 +318,14 @@ static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn, > * @c: Execution context > * @conn: Connection pointer > * @already_sent: Number of bytes already sent to tap, but not acked > + * @now: Current timestamp > * > * Return: negative on connection reset, 0 otherwise > * > * #syscalls recvmsg > */ > int tcp_buf_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn, > - uint32_t already_sent) > + uint32_t already_sent, const struct timespec *now) > { > uint32_t wnd_scaled = conn->wnd_from_tap << conn->ws_from_tap; > int fill_bufs, send_bufs = 0, last_len, iov_rem = 0; > @@ -336,12 +345,12 @@ int tcp_buf_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn, > } > > if (tcp_prepare_iov(&mh_sock, iov_sock, already_sent, fill_bufs)) { > - tcp_rst(c, conn); > + tcp_rst(c, conn, now); > return -1; > } > > if (tcp_payload_used + fill_bufs > TCP_FRAMES_MEM) { > - tcp_payload_flush(c); > + tcp_payload_flush(c, now); > > /* Silence Coverity CWE-125 false positive */ > tcp_payload_used = 0; > @@ -361,19 +370,19 @@ int tcp_buf_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn, > > if (len < 0) { > if (errno != EAGAIN && errno != EWOULDBLOCK) { > - tcp_rst(c, conn); > + tcp_rst(c, conn, now); > return -errno; > } > > if (already_sent) /* No new data and EAGAIN: set EPOLLET */ > - conn_flag(c, conn, STALLED); > + conn_flag(c, conn, STALLED, now); > > return 0; > } > > if (!len) { > if (already_sent) { > - conn_flag(c, conn, STALLED); > + conn_flag(c, conn, STALLED, now); > } else if ((conn->events & (SOCK_FIN_RCVD | TAP_FIN_SENT)) == > SOCK_FIN_RCVD) { > int ret; > @@ -388,14 +397,14 @@ int tcp_buf_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn, > */ > conn->seq_ack_to_tap = conn->seq_from_tap; > > - ret = tcp_buf_send_flag(c, conn, FIN | ACK); > + ret = tcp_buf_send_flag(c, conn, FIN | ACK, now); > if (ret) { > - tcp_rst(c, conn); > + tcp_rst(c, conn, now); > return ret; > } > > - conn_event(c, conn, TAP_FIN_SENT); > - conn_flag(c, conn, ACK_FROM_TAP_DUE); > + conn_event(c, conn, TAP_FIN_SENT, now); > + conn_flag(c, conn, ACK_FROM_TAP_DUE, now); > } > > return 0; > @@ -405,18 +414,18 @@ int tcp_buf_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn, > len -= already_sent; > > if (len <= 0) { > - conn_flag(c, conn, STALLED); > + conn_flag(c, conn, STALLED, now); > return 0; > } > > - conn_flag(c, conn, ~ACK_FROM_TAP_BLOCKS); > - conn_flag(c, conn, ~STALLED); > + conn_flag(c, conn, ~ACK_FROM_TAP_BLOCKS, now); > + conn_flag(c, conn, ~STALLED, now); > > send_bufs = DIV_ROUND_UP(len, mss); > last_len = len - (send_bufs - 1) * mss; > > /* Likely, some new data was acked too. */ > - tcp_update_seqack_wnd(c, conn, false, NULL); > + tcp_update_seqack_wnd(c, conn, false, NULL, now); > > /* Finally, queue to tap */ > dlen = mss; > @@ -430,11 +439,11 @@ int tcp_buf_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn, > push = true; > } > > - tcp_data_to_tap(c, conn, dlen, no_csum, seq, push); > + tcp_data_to_tap(c, conn, dlen, no_csum, seq, push, now); > seq += dlen; > } > > - conn_flag(c, conn, ACK_FROM_TAP_DUE); > + conn_flag(c, conn, ACK_FROM_TAP_DUE, now); > > return 0; > } > diff --git a/tcp_buf.h b/tcp_buf.h > index 710ed377..5d31cea6 100644 > --- a/tcp_buf.h > +++ b/tcp_buf.h > @@ -7,9 +7,10 @@ > #define TCP_BUF_H > > void tcp_sock_iov_init(const struct ctx *c); > -void tcp_payload_flush(const struct ctx *c); > +void tcp_payload_flush(const struct ctx *c, const struct timespec *now); > int tcp_buf_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn, > - uint32_t already_sent); > -int tcp_buf_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags); > + uint32_t already_sent, const struct timespec *now); > +int tcp_buf_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags, > + const struct timespec *now); > > #endif /*TCP_BUF_H */ > diff --git a/tcp_conn.h b/tcp_conn.h > index d93ada7d..e3c55341 100644 > --- a/tcp_conn.h > +++ b/tcp_conn.h > @@ -248,11 +248,13 @@ int tcp_flow_repair_on(struct ctx *c, const struct tcp_tap_conn *conn); > int tcp_flow_repair_off(struct ctx *c, const struct tcp_tap_conn *conn); > > int tcp_flow_migrate_source(int fd, struct tcp_tap_conn *conn); > -int tcp_flow_migrate_source_ext(const struct ctx *c, int fd, > - const struct tcp_tap_conn *conn); > +int tcp_flow_migrate_source_ext(const struct ctx *c, > + int fd, const struct tcp_tap_conn *conn, > + const struct timespec *now); > > int tcp_flow_migrate_target(struct ctx *c, int fd); > -int tcp_flow_migrate_target_ext(struct ctx *c, struct tcp_tap_conn *conn, int fd); > +int tcp_flow_migrate_target_ext(struct ctx *c, struct tcp_tap_conn *conn, > + int fd, const struct timespec *now); > > bool tcp_flow_is_established(const struct tcp_tap_conn *conn); > > diff --git a/tcp_internal.h b/tcp_internal.h > index c623569c..1af25182 100644 > --- a/tcp_internal.h > +++ b/tcp_internal.h > @@ -158,27 +158,28 @@ struct tcp_syn_opts { > extern char tcp_buf_discard [BUF_DISCARD_SIZE]; > > void conn_flag_do(const struct ctx *c, struct tcp_tap_conn *conn, > - unsigned long flag); > -#define conn_flag(c, conn, flag) \ > + unsigned long flag, const struct timespec *now); > +#define conn_flag(c, conn, flag, now) \ > do { \ > flow_trace(conn, "flag at %s:%i", __func__, __LINE__); \ > - conn_flag_do(c, conn, flag); \ > + conn_flag_do(c, conn, flag, now); \ > } while (0) > > > void conn_event_do(const struct ctx *c, struct tcp_tap_conn *conn, > - unsigned long event); > -#define conn_event(c, conn, event) \ > + unsigned long event, const struct timespec *now); > +#define conn_event(c, conn, event, now) \ > do { \ > flow_trace(conn, "event at %s:%i", __func__, __LINE__); \ > - conn_event_do(c, conn, event); \ > + conn_event_do(c, conn, event, now); \ > } while (0) > > -void tcp_rst_do(const struct ctx *c, struct tcp_tap_conn *conn); > -#define tcp_rst(c, conn) \ > +void tcp_rst_do(const struct ctx *c, struct tcp_tap_conn *conn, > + const struct timespec *now); > +#define tcp_rst(c, conn, now) \ > do { \ > flow_dbg((conn), "TCP reset at %s:%i", __func__, __LINE__); \ > - tcp_rst_do(c, conn); \ > + tcp_rst_do(c, conn, now); \ > } while (0) > > struct tcp_info_linux; > @@ -194,11 +195,13 @@ size_t tcp_fill_headers(const struct ctx *c, struct tcp_tap_conn *conn, > size_t dlen, uint32_t csum_flags, uint32_t seq); > > int tcp_update_seqack_wnd(const struct ctx *c, struct tcp_tap_conn *conn, > - bool force_seq, struct tcp_info_linux *tinfo); > + bool force_seq, struct tcp_info_linux *tinfo, > + const struct timespec *now); > int tcp_prepare_flags(const struct ctx *c, struct tcp_tap_conn *conn, > int flags, struct tcphdr *th, struct tcp_syn_opts *opts, > - size_t *optlen); > -int tcp_set_peek_offset(const struct tcp_tap_conn *conn, int offset); > + size_t *optlen, const struct timespec *now); > +int tcp_set_peek_offset(const struct tcp_tap_conn *conn, int offset, > + const struct timespec *now); > > int tcp_prepare_iov(struct msghdr *msg, struct iovec *iov, > uint32_t already_sent, int payload_iov_cnt); > diff --git a/tcp_splice.c b/tcp_splice.c > index 3fd33a10..4b01f1aa 100644 > --- a/tcp_splice.c > +++ b/tcp_splice.c > @@ -136,10 +136,12 @@ static uint32_t tcp_splice_conn_epoll_events(uint16_t events, unsigned sidei) > /** > * tcp_splice_epoll_ctl() - Add/modify/delete epoll state from connection events > * @conn: Connection pointer > + * @now: Current timestamp > * > * Return: 0 on success, negative error code on failure (not on deletion) > */ > -static int tcp_splice_epoll_ctl(struct tcp_splice_conn *conn) > +static int tcp_splice_epoll_ctl(struct tcp_splice_conn *conn, > + const struct timespec *now) > { > uint32_t events[2]; > > @@ -149,7 +151,7 @@ static int tcp_splice_epoll_ctl(struct tcp_splice_conn *conn) > if (flow_epoll_set(&conn->f, EPOLL_CTL_MOD, events[0], conn->s[0], 0) || > flow_epoll_set(&conn->f, EPOLL_CTL_MOD, events[1], conn->s[1], 1)) { > int ret = -errno; > - flow_perror(conn, "ERROR on epoll_ctl()"); > + flow_perror_ratelimit(conn, now, "ERROR on epoll_ctl()"); > return ret; > } > > @@ -201,8 +203,10 @@ static void conn_flag_do(struct tcp_splice_conn *conn, > * conn_event_do() - Set and log connection events, update epoll state > * @conn: Connection pointer > * @event: Connection event > + * @now: Current timestamp > */ > -static void conn_event_do(struct tcp_splice_conn *conn, unsigned long event) > +static void conn_event_do(struct tcp_splice_conn *conn, unsigned long event, > + const struct timespec *now) > { > if (event & (event - 1)) { > int flag_index = fls(~event); > @@ -224,14 +228,14 @@ static void conn_event_do(struct tcp_splice_conn *conn, unsigned long event) > flow_dbg(conn, "%s", tcp_splice_event_str[flag_index]); > } > > - if (tcp_splice_epoll_ctl(conn)) > + if (tcp_splice_epoll_ctl(conn, now)) > conn_flag(conn, CLOSING); > } > > -#define conn_event(conn, event) \ > +#define conn_event(conn, event, now) \ > do { \ > flow_trace(conn, "event at %s:%i",__func__, __LINE__); \ > - conn_event_do(conn, event); \ > + conn_event_do(conn, event, now); \ > } while (0) > > /** > @@ -292,11 +296,13 @@ bool tcp_splice_flow_defer(struct tcp_splice_conn *conn) > * tcp_splice_connect_finish() - Completion of connect() or call on success > * @c: Execution context > * @conn: Connection pointer > + * @now: Current timestamp > * > * Return: 0 on success, -EIO on failure > */ > static int tcp_splice_connect_finish(const struct ctx *c, > - struct tcp_splice_conn *conn) > + struct tcp_splice_conn *conn, > + const struct timespec *now) > { > unsigned sidei; > int i = 0; > @@ -314,8 +320,10 @@ static int tcp_splice_connect_finish(const struct ctx *c, > > if (conn->pipe[sidei][0] < 0) { > if (pipe2(conn->pipe[sidei], O_NONBLOCK | O_CLOEXEC)) { > - flow_perror(conn, "cannot create %d->%d pipe", > - sidei, !sidei); > + flow_perror_ratelimit( > + conn, now, > + "cannot create %d->%d pipe", > + sidei, !sidei); > tcp_splice_rst(conn); > return -EIO; > } > @@ -330,7 +338,7 @@ static int tcp_splice_connect_finish(const struct ctx *c, > } > > if (!(conn->events & SPLICE_ESTABLISHED)) > - conn_event(conn, SPLICE_ESTABLISHED); > + conn_event(conn, SPLICE_ESTABLISHED, now); > > return 0; > } > @@ -339,10 +347,12 @@ static int tcp_splice_connect_finish(const struct ctx *c, > * tcp_splice_connect() - Create and connect socket for new spliced connection > * @c: Execution context > * @conn: Connection pointer > + * @now: Current timestamp > * > * Return: 0 for connect() succeeded or in progress, negative value on error > */ > -static int tcp_splice_connect(const struct ctx *c, struct tcp_splice_conn *conn) > +static int tcp_splice_connect(const struct ctx *c, struct tcp_splice_conn *conn, > + const struct timespec *now) > { > const struct flowside *tgt = &conn->f.side[TGTSIDE]; > sa_family_t af = inany_v4(&tgt->eaddr) ? AF_INET : AF_INET6; > @@ -381,11 +391,11 @@ static int tcp_splice_connect(const struct ctx *c, struct tcp_splice_conn *conn) > if (flow_epoll_set(&conn->f, EPOLL_CTL_ADD, 0, conn->s[0], 0) || > flow_epoll_set(&conn->f, EPOLL_CTL_ADD, 0, conn->s[1], 1)) { > int ret = -errno; > - flow_perror(conn, "Cannot register to epollfd"); > + flow_perror_ratelimit(conn, now, "Cannot register to epollfd"); > return ret; > } > > - conn_event(conn, SPLICE_CONNECT); > + conn_event(conn, SPLICE_CONNECT, now); > > if (connect(conn->s[1], &sa.sa, socklen_inany(&sa))) { > if (errno != EINPROGRESS) { > @@ -394,8 +404,8 @@ static int tcp_splice_connect(const struct ctx *c, struct tcp_splice_conn *conn) > return -errno; > } > } else { > - conn_event(conn, SPLICE_ESTABLISHED); > - return tcp_splice_connect_finish(c, conn); > + conn_event(conn, SPLICE_ESTABLISHED, now); > + return tcp_splice_connect_finish(c, conn, now); > } > > return 0; > @@ -435,10 +445,12 @@ static int tcp_conn_sock_ns(const struct ctx *c, sa_family_t af) > * @flow: flow to initialise > * @s0: Accepted (side 0) socket > * @sa: Peer address of connection > + * @now: Current timestamp > * > * #syscalls:pasta setsockopt > */ > -void tcp_splice_conn_from_sock(const struct ctx *c, union flow *flow, int s0) > +void tcp_splice_conn_from_sock(const struct ctx *c, union flow *flow, int s0, > + const struct timespec *now) > { > struct tcp_splice_conn *conn = FLOW_SET_TYPE(flow, FLOW_TCP_SPLICE, > tcp_splice); > @@ -453,7 +465,7 @@ void tcp_splice_conn_from_sock(const struct ctx *c, union flow *flow, int s0) > if (setsockopt(s0, SOL_TCP, TCP_QUICKACK, &((int){ 1 }), sizeof(int))) > flow_trace(conn, "failed to set TCP_QUICKACK on %i", s0); > > - if (tcp_splice_connect(c, conn)) > + if (tcp_splice_connect(c, conn, now)) > tcp_splice_rst(conn); > > FLOW_ACTIVATE(conn); > @@ -499,7 +511,7 @@ static int tcp_splice_forward(struct ctx *c, > > if (readlen <= 0) { > if (!readlen) /* EOF */ > - conn_event(conn, FIN_RCVD(fromsidei)); > + conn_event(conn, FIN_RCVD(fromsidei), now); > > /* We're either blocked or at EOF on the read side, and > * there's nothing in the pipe so there's nothing to do > @@ -551,9 +563,9 @@ static int tcp_splice_forward(struct ctx *c, > * drain. > */ > if (conn->pending[fromsidei]) > - conn_event(conn, OUT_WAIT(!fromsidei)); > + conn_event(conn, OUT_WAIT(!fromsidei), now); > else > - conn_event(conn, ~OUT_WAIT(!fromsidei)); > + conn_event(conn, ~OUT_WAIT(!fromsidei), now); > > if ((conn->events & FIN_RCVD(fromsidei)) && > !(conn->events & FIN_SENT(!fromsidei)) && > @@ -563,7 +575,7 @@ static int tcp_splice_forward(struct ctx *c, > pif_name(conn->f.pif[!fromsidei])); > return -1; > } > - conn_event(conn, FIN_SENT(!fromsidei)); > + conn_event(conn, FIN_SENT(!fromsidei), now); > } > > return 0; > @@ -593,7 +605,8 @@ void tcp_splice_sock_handler(struct ctx *c, union epoll_ref ref, > > rc = getsockopt(ref.fd, SOL_SOCKET, SO_ERROR, &err, &sl); > if (rc) > - flow_perror(conn, "Error retrieving SO_ERROR"); > + flow_perror_ratelimit(conn, now, > + "Error retrieving SO_ERROR"); > else > flow_dbg_ratelimit(conn, now, > "Error event on %s socket: %s", > @@ -610,7 +623,7 @@ void tcp_splice_sock_handler(struct ctx *c, union epoll_ref ref, > events); > goto reset; > } > - if (tcp_splice_connect_finish(c, conn)) > + if (tcp_splice_connect_finish(c, conn, now)) > goto reset; > } > > diff --git a/tcp_splice.h b/tcp_splice.h > index 8a1a1f67..c24e419f 100644 > --- a/tcp_splice.h > +++ b/tcp_splice.h > @@ -13,7 +13,8 @@ union sockaddr_inany; > > void tcp_splice_sock_handler(struct ctx *c, union epoll_ref ref, > uint32_t events, const struct timespec *now); > -void tcp_splice_conn_from_sock(const struct ctx *c, union flow *flow, int s0); > +void tcp_splice_conn_from_sock(const struct ctx *c, union flow *flow, int s0, > + const struct timespec *now); > void tcp_splice_init(struct ctx *c); > > #endif /* TCP_SPLICE_H */ > diff --git a/tcp_vu.c b/tcp_vu.c > index 00616955..868a35b9 100644 > --- a/tcp_vu.c > +++ b/tcp_vu.c > @@ -116,12 +116,14 @@ static int tcp_vu_send_dup(const struct ctx *c, struct vu_virtq *vq, > * @c: Execution context > * @conn: Connection pointer > * @flags: TCP flags: if not set, send segment only if ACK is due > + * @now: Current timestamp > * > * Return: -ECONNRESET on fatal connection error, > * -EAGAIN if vhost-user buffers are unavailable, > * 0 otherwise > */ > -int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags) > +int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags, > + const struct timespec *now) > { > struct vu_dev *vdev = c->vdev; > struct vu_virtq *vq = &vdev->vq[VHOST_USER_RX_QUEUE]; > @@ -158,7 +160,7 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags) > ip6h = (struct ipv6hdr)L2_BUF_IP6_INIT(IPPROTO_TCP); > > seq = conn->seq_to_tap; > - ret = tcp_prepare_flags(c, conn, flags, &th, &opts, &optlen); > + ret = tcp_prepare_flags(c, conn, flags, &th, &opts, &optlen, now); > if (ret <= 0) { > vu_queue_rewind(vq, elem_cnt); > return ret; > @@ -424,11 +426,12 @@ static void tcp_vu_prepare(const struct ctx *c, struct tcp_tap_conn *conn, > * @c: Execution context > * @conn: Connection pointer > * @already_sent: Number of bytes already sent to tap, but not acked > + * @now: Current timestamp > * > * Return: negative on connection reset, 0 otherwise > */ > int tcp_vu_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn, > - uint32_t already_sent) > + uint32_t already_sent, const struct timespec *now) > { > uint32_t wnd_scaled = conn->wnd_from_tap << conn->ws_from_tap; > struct vu_dev *vdev = c->vdev; > @@ -455,12 +458,12 @@ int tcp_vu_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn, > &elem_cnt, &frame_cnt); > if (len < 0) { > if (len != -EAGAIN && len != -EWOULDBLOCK) { > - tcp_rst(c, conn); > + tcp_rst(c, conn, now); > return len; > } > > if (already_sent) /* No new data and EAGAIN: set EPOLLET */ > - conn_flag(c, conn, STALLED); > + conn_flag(c, conn, STALLED, now); > > return 0; > } > @@ -468,7 +471,7 @@ int tcp_vu_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn, > if (!len) { > vu_queue_rewind(vq, elem_cnt); > if (already_sent) { > - conn_flag(c, conn, STALLED); > + conn_flag(c, conn, STALLED, now); > } else if ((conn->events & (SOCK_FIN_RCVD | TAP_FIN_SENT)) == > SOCK_FIN_RCVD) { > int ret; > @@ -476,24 +479,24 @@ int tcp_vu_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn, > /* See tcp_buf_data_from_sock() */ > conn->seq_ack_to_tap = conn->seq_from_tap; > > - ret = tcp_vu_send_flag(c, conn, FIN | ACK); > + ret = tcp_vu_send_flag(c, conn, FIN | ACK, now); > if (ret) { > - tcp_rst(c, conn); > + tcp_rst(c, conn, now); > return ret; > } > > - conn_event(c, conn, TAP_FIN_SENT); > - conn_flag(c, conn, ACK_FROM_TAP_DUE); > + conn_event(c, conn, TAP_FIN_SENT, now); > + conn_flag(c, conn, ACK_FROM_TAP_DUE, now); > } > > return 0; > } > > - conn_flag(c, conn, ~ACK_FROM_TAP_BLOCKS); > - conn_flag(c, conn, ~STALLED); > + conn_flag(c, conn, ~ACK_FROM_TAP_BLOCKS, now); > + conn_flag(c, conn, ~STALLED, now); > > /* Likely, some new data was acked too. */ > - tcp_update_seqack_wnd(c, conn, false, NULL); > + tcp_update_seqack_wnd(c, conn, false, NULL, now); > > /* initialize headers */ > /* iov_vu is an array of buffers and the buffer size can be > @@ -537,7 +540,7 @@ int tcp_vu_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn, > } > vu_queue_notify(vdev, vq); > > - conn_flag(c, conn, ACK_FROM_TAP_DUE); > + conn_flag(c, conn, ACK_FROM_TAP_DUE, now); > > return 0; > } > diff --git a/tcp_vu.h b/tcp_vu.h > index 30e46925..6f2aa2e1 100644 > --- a/tcp_vu.h > +++ b/tcp_vu.h > @@ -6,8 +6,9 @@ > #ifndef TCP_VU_H > #define TCP_VU_H > > -int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags); > +int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags, > + const struct timespec *now); > int tcp_vu_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn, > - uint32_t already_sent); > + uint32_t already_sent, const struct timespec *now); > > #endif /*TCP_VU_H */ > diff --git a/udp.c b/udp.c > index 2d9e2f04..d23163f6 100644 > --- a/udp.c > +++ b/udp.c > @@ -539,6 +539,7 @@ static int udp_pktinfo(struct msghdr *msg, union inany_addr *dst) > * @pif: Interface on which the error occurred > * (only used if @sidx == FLOW_SIDX_NONE) > * @port: Local port number of @s (only used if @sidx == FLOW_SIDX_NONE) > + * @now: Current timestamp > * > * Return: 1 if error received and processed, 0 if no more errors in queue, < 0 > * if there was an error reading the queue > @@ -546,7 +547,8 @@ static int udp_pktinfo(struct msghdr *msg, union inany_addr *dst) > * #syscalls recvmsg > */ > static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx, > - uint8_t pif, in_port_t port) > + uint8_t pif, in_port_t port, > + const struct timespec *now) > { > char buf[PKTINFO_SPACE + RECVERR_SPACE]; > const struct sock_extended_err *ee; > @@ -664,12 +666,14 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx, > } > > fail: > - flow_dbg(uflow, "Can't propagate %s error from %s %s to %s %s", > - str_ee_origin(ee), > - pif_name(pif), > - sockaddr_ntop(SO_EE_OFFENDER(ee), sastr, sizeof(sastr)), > - pif_name(topif), > - inany_ntop(&toside->eaddr, astr, sizeof(astr))); > + flow_warn_ratelimit(uflow, now, > + "Can't propagate %s error from %s %s to %s %s", > + str_ee_origin(ee), > + pif_name(pif), > + sockaddr_ntop(SO_EE_OFFENDER(ee), > + sastr, sizeof(sastr)), > + pif_name(topif), > + inany_ntop(&toside->eaddr, astr, sizeof(astr))); > return 1; > } > > @@ -681,11 +685,13 @@ fail: > * @pif: Interface on which the error occurred > * (only used if @sidx == FLOW_SIDX_NONE) > * @port: Local port number of @s (only used if @sidx == FLOW_SIDX_NONE) > + * @now: Current timestamp > * > * Return: number of errors handled, or < 0 if we have an unrecoverable error > */ > static int udp_sock_errs(const struct ctx *c, int s, flow_sidx_t sidx, > - uint8_t pif, in_port_t port) > + uint8_t pif, in_port_t port, > + const struct timespec *now) > { > unsigned n_err = 0; > socklen_t errlen; > @@ -694,7 +700,7 @@ static int udp_sock_errs(const struct ctx *c, int s, flow_sidx_t sidx, > assert(!c->no_udp); > > /* Empty the error queue */ > - while ((rc = udp_sock_recverr(c, s, sidx, pif, port)) > 0) > + while ((rc = udp_sock_recverr(c, s, sidx, pif, port, now)) > 0) > n_err += rc; > > if (rc < 0) > @@ -869,7 +875,7 @@ void udp_sock_fwd(const struct ctx *c, int s, int rule_hint, > strerror_(-rc)); > /* Clear errors & carry on */ > if (udp_sock_errs(c, s, FLOW_SIDX_NONE, > - frompif, port) < 0) { > + frompif, port, now) < 0) { > err_ratelimit(now, > "UDP: Unrecoverable error on listening socket: (%s port %hu)", > pif_name(frompif), port); > @@ -892,9 +898,10 @@ void udp_sock_fwd(const struct ctx *c, int s, int rule_hint, > } else if (flow_sidx_valid(tosidx)) { > struct udp_flow *uflow = udp_at_sidx(tosidx); > > - flow_err(uflow, > - "No support for forwarding UDP from %s to %s", > - pif_name(frompif), pif_name(topif)); > + flow_err_ratelimit( > + uflow, now, > + "No support for forwarding UDP from %s to %s", > + pif_name(frompif), pif_name(topif)); > discard = true; > } else { > warn_ratelimit(now, "Discarding datagram without flow"); > @@ -942,8 +949,11 @@ void udp_sock_handler(const struct ctx *c, union epoll_ref ref, > assert(!c->no_udp && uflow); > > if (events & EPOLLERR) { > - if (udp_sock_errs(c, ref.fd, ref.flowside, PIF_NONE, 0) < 0) { > - flow_err(uflow, "Unrecoverable error on flow socket"); > + if (udp_sock_errs(c, ref.fd, ref.flowside, > + PIF_NONE, 0, now) < 0) { > + flow_err_ratelimit( > + uflow, now, > + "Unrecoverable error on flow socket"); > goto fail; > } > } > @@ -974,7 +984,7 @@ void udp_sock_handler(const struct ctx *c, union epoll_ref ref, > udp_buf_sock_to_tap(c, s, n, tosidx); > } > } else { > - flow_err(uflow, > + flow_err_ratelimit(uflow, now, > "No support for forwarding UDP from %s to %s", > pif_name(pif_at_sidx(ref.flowside)), > pif_name(topif)); > @@ -1052,8 +1062,9 @@ int udp_tap_handler(const struct ctx *c, uint8_t pif, > flow_sidx_t fromsidx = flow_sidx_opposite(tosidx); > uint8_t frompif = pif_at_sidx(fromsidx); > > - flow_err(uflow, "No support for forwarding UDP from %s to %s", > - pif_name(frompif), pif_name(topif)); > + flow_err_ratelimit(uflow, now, > + "No support for forwarding UDP from %s to %s", > + pif_name(frompif), pif_name(topif)); > return 1; > } > toside = flowside_at_sidx(tosidx); > diff --git a/udp_flow.c b/udp_flow.c > index 31e2f8a7..5a59f7f3 100644 > --- a/udp_flow.c > +++ b/udp_flow.c > @@ -64,11 +64,13 @@ void udp_flow_close(const struct ctx *c, struct udp_flow *uflow) > * @c: Execution context > * @uflow: UDP flow to open socket for > * @sidei: Side of @uflow to open socket for > + * @now: Current timestamp > * > * Return: fd of new socket on success, -ve error code on failure > */ > static int udp_flow_sock(const struct ctx *c, > - struct udp_flow *uflow, unsigned sidei) > + struct udp_flow *uflow, unsigned sidei, > + const struct timespec *now) > { > const struct flowside *side = &uflow->f.side[sidei]; > uint8_t pif = uflow->f.pif[sidei]; > @@ -77,7 +79,8 @@ static int udp_flow_sock(const struct ctx *c, > > s = flowside_sock_l4(c, EPOLL_TYPE_UDP, pif, side); > if (s < 0) { > - flow_dbg_perror(uflow, "Couldn't open flow specific socket"); > + flow_perror_ratelimit(uflow, now, > + "Couldn't open flow specific socket"); > return s; > } > > @@ -90,11 +93,11 @@ static int udp_flow_sock(const struct ctx *c, > > if (flowside_connect(c, s, pif, side) < 0) { > rc = -errno; > - flow_dbg_perror(uflow, "Couldn't connect flow socket"); > + flow_warn_perror_ratelimit(uflow, now, > + "Couldn't connect flow socket"); > > epoll_del(flow_epollfd(&uflow->f), s); > close(s); > - > return rc; > } > uflow->s[sidei] = s; > @@ -155,7 +158,7 @@ static flow_sidx_t udp_flow_new(const struct ctx *c, union flow *flow, > > flow_foreach_sidei(sidei) { > if (pif_is_socket(uflow->f.pif[sidei])) > - if (udp_flow_sock(c, uflow, sidei) < 0) > + if (udp_flow_sock(c, uflow, sidei, now) < 0) > goto cancel; > } > > @@ -177,7 +180,7 @@ static flow_sidx_t udp_flow_new(const struct ctx *c, union flow *flow, > goto cancel; > } > if (port != tgt->oport) { > - flow_err(uflow, "Unexpected local port"); > + flow_err_ratelimit(uflow, now, "Unexpected local port"); > goto cancel; > } > } > @@ -249,7 +252,8 @@ flow_sidx_t udp_flow_from_sock(const struct ctx *c, uint8_t pif, > * been initiated from a socket bound to 0.0.0.0 or ::, we don't > * know our address, so we have to leave it unpopulated. > */ > - flow_err(flow, "Invalid endpoint on UDP recvfrom()"); > + flow_err_ratelimit(flow, now, > + "Invalid endpoint on UDP recvfrom()"); > flow_alloc_cancel(flow); > return FLOW_SIDX_NONE; > } -- Stefano ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v2 6/6] udp: Improve messages for errors getting errors 2026-06-17 3:11 [PATCH v2 0/6] Improvements to flow specific logging David Gibson ` (4 preceding siblings ...) 2026-06-17 3:11 ` [PATCH v2 5/6] flow, treewide: Promote priority of selected flow-linked messages David Gibson @ 2026-06-17 3:11 ` David Gibson 5 siblings, 0 replies; 8+ messages in thread From: David Gibson @ 2026-06-17 3:11 UTC (permalink / raw) To: passt-dev, Stefano Brivio; +Cc: David Gibson If udp_sock_recverr() gets an error retrieving the error queue, or udp_sock_errs() gets an error reading the SO_ERROR sockopt, we log a message with err_perror(), That severity is reasonable - this is something going unexpectedly wrong host side, not merely a connection getting shut down because we hit a network error. However, the messages can be made more useful by linking them to the specific flow, and safer by ratelimiting them. Remove the places where we included the numerical value of the socket fd: that's rarely useful, especially now that we have the context about the flow the fd belonged to. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> --- udp.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/udp.c b/udp.c index d23163f6..60bcfb89 100644 --- a/udp.c +++ b/udp.c @@ -580,10 +580,12 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx, rc = recvmsg(s, &mh, MSG_ERRQUEUE); if (rc < 0) { + struct udp_flow *uflow = udp_at_sidx(sidx); + if (errno == EAGAIN || errno == EWOULDBLOCK) return 0; - err_perror("UDP: Failed to read error queue"); + flow_perror_ratelimit(uflow, now, "Failed to read error queue"); return -1; } @@ -693,6 +695,7 @@ static int udp_sock_errs(const struct ctx *c, int s, flow_sidx_t sidx, uint8_t pif, in_port_t port, const struct timespec *now) { + struct udp_flow *uflow = udp_at_sidx(sidx); unsigned n_err = 0; socklen_t errlen; int rc, err; @@ -709,18 +712,20 @@ static int udp_sock_errs(const struct ctx *c, int s, flow_sidx_t sidx, errlen = sizeof(err); if (getsockopt(s, SOL_SOCKET, SO_ERROR, &err, &errlen) < 0 || errlen != sizeof(err)) { - err_perror("Error reading SO_ERROR"); + flow_perror_ratelimit(uflow, now, "Error reading SO_ERROR"); return -1; /* error reading error, unrecoverable */ } if (err) { - debug("Unqueued error on UDP socket %i: %s", s, strerror_(err)); + flow_dbg(uflow, "Unqueued error on UDP socket: %s", + strerror_(err)); n_err++; } if (!n_err) { /* EPOLLERR, but no errors to clear !? */ - err("EPOLLERR event without reported errors on socket %i", s); + flow_err_ratelimit(uflow, now, + "EPOLLERR event without reported errors", s); return -1; /* no way to clear, unrecoverable */ } -- 2.54.0 ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2026-06-17 5:23 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2026-06-17 3:11 [PATCH v2 0/6] Improvements to flow specific logging David Gibson 2026-06-17 3:11 ` [PATCH v2 1/6] flow: Regularise flow specific logging helpers David Gibson 2026-06-17 3:11 ` [PATCH v2 2/6] flow: Indent flow details messages David Gibson 2026-06-17 3:11 ` [PATCH v2 3/6] flow: Include flow details with higher priority log messages David Gibson 2026-06-17 3:11 ` [PATCH v2 4/6] flow, udp: Fix errno handling in udp_flow_sock() David Gibson 2026-06-17 3:11 ` [PATCH v2 5/6] flow, treewide: Promote priority of selected flow-linked messages David Gibson 2026-06-17 5:23 ` Stefano Brivio 2026-06-17 3:11 ` [PATCH v2 6/6] udp: Improve messages for errors getting errors David Gibson
Code repositories for project(s) associated with this public inbox https://passt.top/passt This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for IMAP folder(s).