From: Stefano Brivio <sbrivio@redhat.com>
To: passt-dev@passt.top
Subject: [PATCH 16/24] tcp_splice: Close sockets right away on high number of open files
Date: Fri, 25 Mar 2022 23:52:52 +0100
Message-ID: <20220325225300.2803584-17-sbrivio@redhat.com>
In-Reply-To: <20220325225300.2803584-1-sbrivio@redhat.com>
We can't assume that the hard limit for open files is high enough
to let us defer closing sockets to a timer.
Store the value of RLIMIT_NOFILE we set at start, and use it to
tell whether we're approaching the limit with pending, spliced
TCP connections. If so, close sockets as soon as they're no longer
needed, instead of deferring this task to a timer.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
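Note, not part of the patch: a minimal sketch of the threshold check
described above, for readers who want to see the arithmetic in isolation.
The 30% figure and the factor of 6 come from the diff below; the reading
of 6 as "descriptors per spliced connection" is an assumption, and so are
the names nofile_from_limit() and under_file_pressure(). The clamping
helper is hypothetical too: the patch assigns rlim_max to an int directly.

```c
#include <limits.h>
#include <sys/resource.h>

/* Values taken from the patch */
#define FILE_PRESSURE_PERCENT	30	/* TCP_SPLICE_FILE_PRESSURE */
#define FDS_PER_SPLICED_CONN	6	/* factor used in the patch; assumed
					 * to be descriptors per connection */

/**
 * nofile_from_limit() - Convert a resource limit to int, clamping (hypothetical)
 * @lim:	Limit as reported by getrlimit(), possibly RLIM_INFINITY
 *
 * Return: limit as int, INT_MAX if it doesn't fit
 */
int nofile_from_limit(rlim_t lim)
{
	if (lim == RLIM_INFINITY || lim > (rlim_t)INT_MAX)
		return INT_MAX;

	return (int)lim;
}

/**
 * under_file_pressure() - Same comparison as tcp_splice_defer_handler()
 * @nofile:	Stored limit for open files
 * @conn_count:	Number of spliced connections currently tracked
 *
 * Return: 1 if sockets should be closed right away, 0 otherwise
 */
int under_file_pressure(int nofile, int conn_count)
{
	/* Integer division first, as in the patch: rounds the limit down */
	long max_files = (long)nofile / 100 * FILE_PRESSURE_PERCENT;

	return (long)conn_count * FDS_PER_SPLICED_CONN >= max_files;
}
```

With a limit of 1024 open files, the threshold is 1024 / 100 * 30 = 300
descriptors, so immediate closing would start at 50 spliced connections.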
passt.c | 2 +-
passt.h | 2 ++
tcp.c | 1 +
tcp_splice.c | 28 ++++++++++++++++++++++------
tcp_splice.h | 1 +
5 files changed, 27 insertions(+), 7 deletions(-)
diff --git a/passt.c b/passt.c
index 6550a22..292cf53 100644
--- a/passt.c
+++ b/passt.c
@@ -371,7 +371,7 @@ int main(int argc, char **argv)
perror("getrlimit");
exit(EXIT_FAILURE);
}
- limit.rlim_cur = limit.rlim_max;
+ c.nofile = limit.rlim_cur = limit.rlim_max;
if (setrlimit(RLIMIT_NOFILE, &limit)) {
perror("setrlimit");
exit(EXIT_FAILURE);
diff --git a/passt.h b/passt.h
index 3a62b15..9ea8f8d 100644
--- a/passt.h
+++ b/passt.h
@@ -98,6 +98,7 @@ enum passt_modes {
* @quiet: Don't print informational messages
* @foreground: Run in foreground, don't log to stderr by default
* @stderr: Force logging to stderr
+ * @nofile: Maximum number of open files (ulimit -n)
* @sock_path: Path for UNIX domain socket
* @pcap: Path for packet capture file
* @pid_file: Path to PID file, empty string if not configured
@@ -160,6 +161,7 @@ struct ctx {
int quiet;
int foreground;
int stderr;
+ int nofile;
char sock_path[UNIX_PATH_MAX];
char pcap[PATH_MAX];
char pid_file[PATH_MAX];
diff --git a/tcp.c b/tcp.c
index 384e7a6..2a5bf6e 100644
--- a/tcp.c
+++ b/tcp.c
@@ -1560,6 +1560,7 @@ void tcp_defer_handler(struct ctx *c)
{
tcp_l2_flags_buf_flush(c);
tcp_l2_data_buf_flush(c);
+ tcp_splice_defer_handler(c);
}
/**
diff --git a/tcp_splice.c b/tcp_splice.c
index d374785..b7bdfc2 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -52,6 +52,7 @@
#define TCP_SPLICE_MAX_CONNS (128 * 1024)
#define TCP_SPLICE_PIPE_POOL_SIZE 16
#define REFILL_INTERVAL 1000 /* ms, refill pool of pipes */
+#define TCP_SPLICE_FILE_PRESSURE 30 /* % of c->nofile */
/* From tcp.c */
extern int init_sock_pool4 [TCP_SOCK_POOL_SIZE];
@@ -152,6 +153,7 @@ static void tcp_splice_conn_epoll_events(uint16_t events,
*b |= (events & SPLICE_B_OUT_WAIT) ? EPOLLOUT : 0;
}
+static void tcp_splice_destroy(struct ctx *c, struct tcp_splice_conn *conn);
static int tcp_splice_epoll_ctl(struct ctx *c, struct tcp_splice_conn *conn);
/**
@@ -832,13 +834,9 @@ void tcp_splice_init(struct ctx *c)
*/
void tcp_splice_timer(struct ctx *c, struct timespec *now)
{
- int i;
-
- for (i = c->tcp.splice_conn_count - 1; i >= 0; i--) {
- struct tcp_splice_conn *conn;
-
- conn = CONN(i);
+ struct tcp_splice_conn *conn;
+ for (conn = CONN(c->tcp.splice_conn_count - 1); conn >= tc; conn--) {
if (conn->flags & SPLICE_CLOSING) {
tcp_splice_destroy(c, conn);
continue;
@@ -865,3 +863,21 @@ void tcp_splice_timer(struct ctx *c, struct timespec *now)
if (timespec_diff_ms(now, &c->tcp.refill_ts) > REFILL_INTERVAL)
tcp_splice_pipe_refill(c);
}
+
+/**
+ * tcp_splice_defer_handler() - Close connections without timer on file pressure
+ * @c: Execution context
+ */
+void tcp_splice_defer_handler(struct ctx *c)
+{
+ int max_files = c->nofile / 100 * TCP_SPLICE_FILE_PRESSURE;
+ struct tcp_splice_conn *conn;
+
+ if (c->tcp.splice_conn_count * 6 < max_files)
+ return;
+
+ for (conn = CONN(c->tcp.splice_conn_count - 1); conn >= tc; conn--) {
+ if (conn->flags & SPLICE_CLOSING)
+ tcp_splice_destroy(c, conn);
+ }
+}
diff --git a/tcp_splice.h b/tcp_splice.h
index 45ab1ca..b744ba7 100644
--- a/tcp_splice.h
+++ b/tcp_splice.h
@@ -12,3 +12,4 @@ void tcp_sock_handler_splice(struct ctx *c, union epoll_ref ref,
void tcp_splice_destroy(struct ctx *c, struct tcp_splice_conn *conn);
void tcp_splice_init(struct ctx *c);
void tcp_splice_timer(struct ctx *c, struct timespec *now);
+void tcp_splice_defer_handler(struct ctx *c);
--
2.35.1
Thread overview: 25+ messages
2022-03-25 22:52 [PATCH 00/24] Boundary-checked "packets", TCP timerfd timeouts, assorted fixes Stefano Brivio
2022-03-25 22:52 ` [PATCH 01/24] conf, util, tap: Implement --trace option for extra verbose logging Stefano Brivio
2022-03-25 22:52 ` [PATCH 02/24] pcap: Fix mistake in printed string Stefano Brivio
2022-03-25 22:52 ` [PATCH 03/24] util: Drop CHECK_SET_MIN_MAX{,_PROTO_FD} macros Stefano Brivio
2022-03-25 22:52 ` [PATCH 04/24] util: Use standard int types Stefano Brivio
2022-03-25 22:52 ` [PATCH 05/24] tcp: Refactor to use events instead of states, split out spliced implementation Stefano Brivio
2022-03-25 22:52 ` [PATCH 06/24] test/lib/video: Fill in href attributes of video shortcuts Stefano Brivio
2022-03-25 22:52 ` [PATCH 07/24] udp: Drop _splice from recv, send, sendto static buffer names Stefano Brivio
2022-03-25 22:52 ` [PATCH 08/24] udp: Split buffer queueing/writing parts of udp_sock_handler() Stefano Brivio
2022-03-25 22:52 ` [PATCH 09/24] dhcpv6, tap, tcp: Use IN6_ARE_ADDR_EQUAL instead of open-coded memcmp() Stefano Brivio
2022-03-25 22:52 ` [PATCH 10/24] udp: Use flags for local, loopback, and configured unicast binds Stefano Brivio
2022-03-25 22:52 ` [PATCH 11/24] Makefile: Enable a few hardening flags Stefano Brivio
2022-03-25 22:52 ` [PATCH 12/24] test: Add asciinema(1) as requirement for CI in README Stefano Brivio
2022-03-25 22:52 ` [PATCH 13/24] test, seccomp, Makefile: Switch to valgrind runs for passt functional tests Stefano Brivio
2022-03-25 22:52 ` [PATCH 14/24] tcp, udp, util: Enforce 24-bit limit on socket numbers Stefano Brivio
2022-03-25 22:52 ` [PATCH 15/24] tcp: Rework timers to use timerfd instead of periodic bitmap scan Stefano Brivio
2022-03-25 22:52 ` [PATCH 16/24] tcp_splice: Close sockets right away on high number of open files Stefano Brivio [this message]
2022-03-25 22:52 ` [PATCH 17/24] test/perf: Work-around for virtio_net hang before long streams from guest Stefano Brivio
2022-03-25 22:52 ` [PATCH 18/24] README: Avoid "here" links Stefano Brivio
2022-03-25 22:52 ` [PATCH 19/24] README: Update Interfaces and Availability sections Stefano Brivio
2022-03-25 22:52 ` [PATCH 20/24] tcp: Fit struct tcp_conn into a single 64-byte cacheline Stefano Brivio
2022-03-25 22:52 ` [PATCH 21/24] dhcp: Minimum option length implied by RFC 951 is 60 bytes, not 62 Stefano Brivio
2022-03-25 22:52 ` [PATCH 22/24] tcp, tcp_splice: Use less awkward syntax to swap in/out sockets from pools Stefano Brivio
2022-03-25 22:52 ` [PATCH 23/24] util: Fix function declaration style of write_pidfile() Stefano Brivio
2022-03-25 22:53 ` [PATCH 24/24] treewide: Packet abstraction with mandatory boundary checks Stefano Brivio