From: Stefano Brivio <sbrivio@redhat.com>
To: passt-dev@passt.top
Cc: David Gibson <david@gibson.dropbear.id.au>
Subject: [PATCH] tcp: Disable Nagle's algorithm (set TCP_NODELAY) on all sockets
Date: Fri, 17 Jan 2025 10:34:05 +0100
Message-ID: <20250117093405.1253554-1-sbrivio@redhat.com>

Following up on 725acd111ba3 ("tcp_splice: Set (again) TCP_NODELAY on
both sides"), David argues that, in general, we don't know what kind
of TCP traffic we're dealing with, on any side or path.

TCP segments might have been delivered to our socket with a PSH flag,
but we don't have a way to know about it.

Similarly, the guest might send us segments with PSH or URG set, but
we don't know if we should generally TCP_CORK sockets and uncork on
those flags, because that would assume they're running a Linux kernel
(and a particular version of it) matching the kernel that delivers
outbound packets for us.

Given that we can't make any assumption and everything might very well
be interactive traffic, disable Nagle's algorithm on all non-spliced
sockets as well.

After all, John Nagle himself is nowadays recommending that delayed
ACKs should never be enabled together with his algorithm, but we
don't have a practical way to ensure that our environment is free from
delayed ACKs (TCP_QUICKACK is not really usable for this purpose):

  https://news.ycombinator.com/item?id=34180239

Suggested-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
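Note, not part of the commit message: for contrast, this is roughly
what the cork-and-uncork approach discussed above could look like on a
Linux socket. It is only a sketch under stated assumptions:
sketch_set_cork() and sketch_send_corked() are hypothetical names, not
passt functions, and the 'push' argument stands in for a PSH or URG
flag seen on a segment from the guest. The patch deliberately does not
do this.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper, not part of passt: set or clear TCP_CORK (Linux only)
 *
 * Return: 0 on success, -1 with errno set on failure
 */
int sketch_set_cork(int s, int on)
{
	return setsockopt(s, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));
}

/* Hypothetical helper: queue data on a corked socket, uncork if @push is set
 *
 * Return: bytes queued, negative on failure
 */
ssize_t sketch_send_corked(int s, const void *buf, size_t len, int push)
{
	ssize_t n;

	if (sketch_set_cork(s, 1))
		return -1;

	n = send(s, buf, len, MSG_NOSIGNAL);
	if (n < 0)
		return n;

	/* Clearing the option flushes queued partial frames */
	if (push && sketch_set_cork(s, 0))
		return -1;

	return n;
}
```

Whether 'push' should follow the flags sent by the guest is exactly the
assumption the commit message says we can't make, hence TCP_NODELAY
everywhere instead.
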
tcp.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/tcp.c b/tcp.c
index 3b3193a..c570f42 100644
--- a/tcp.c
+++ b/tcp.c
@@ -756,6 +756,19 @@ static void tcp_sock_set_bufsize(const struct ctx *c, int s)
 		trace("TCP: failed to set SO_SNDBUF to %i", v);
 }
 
+/**
+ * tcp_sock_set_nodelay() - Set TCP_NODELAY option (disable Nagle's algorithm)
+ * @s:		Socket, can be -1 to avoid check in the caller
+ */
+static void tcp_sock_set_nodelay(int s)
+{
+	if (s == -1)
+		return;
+
+	if (setsockopt(s, SOL_TCP, TCP_NODELAY, &((int){ 1 }), sizeof(int)))
+		trace("TCP: failed to set TCP_NODELAY on socket %i", s);
+}
+
 /**
  * tcp_update_csum() - Calculate TCP checksum
  * @psum:	Unfolded partial checksum of the IPv4 or IPv6 pseudo-header
@@ -1285,6 +1298,7 @@ static int tcp_conn_new_sock(const struct ctx *c, sa_family_t af)
 		return -errno;
 
 	tcp_sock_set_bufsize(c, s);
+	tcp_sock_set_nodelay(s);
 
 	return s;
 }
@@ -2261,6 +2275,8 @@ static int tcp_sock_init_one(const struct ctx *c, const union inany_addr *addr,
 		return s;
 
 	tcp_sock_set_bufsize(c, s);
+	tcp_sock_set_nodelay(s);
+
 	return s;
 }
 
@@ -2322,6 +2338,8 @@ static void tcp_ns_sock_init4(const struct ctx *c, in_port_t port)
 	else
 		s = -1;
 
+	tcp_sock_set_nodelay(s);
+
 	if (c->tcp.fwd_out.mode == FWD_AUTO)
 		tcp_sock_ns[port][V4] = s;
 }
@@ -2348,6 +2366,8 @@ static void tcp_ns_sock_init6(const struct ctx *c, in_port_t port)
 	else
 		s = -1;
 
+	tcp_sock_set_nodelay(s);
+
 	if (c->tcp.fwd_out.mode == FWD_AUTO)
 		tcp_sock_ns[port][V6] = s;
 }
--
2.43.0
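
Not part of the patch: a standalone sketch of what the new helper does
to a socket, with a way to read the option back for a quick check
outside of passt. sketch_set_nodelay() and sketch_get_nodelay() are
hypothetical names, and IPPROTO_TCP is used in place of SOL_TCP (same
value on Linux) so that the sketch doesn't depend on glibc feature
macros.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper, not the passt function: set TCP_NODELAY on @s
 *
 * Return: 0 on success, -1 with errno set on failure
 */
int sketch_set_nodelay(int s)
{
	int one = 1;

	/* IPPROTO_TCP has the same value as SOL_TCP on Linux */
	return setsockopt(s, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}

/* Hypothetical helper: read TCP_NODELAY back from @s
 *
 * Return: 1 if set, 0 if clear, -1 with errno set on failure
 */
int sketch_get_nodelay(int s)
{
	socklen_t sl = sizeof(int);
	int v = 0;

	if (getsockopt(s, IPPROTO_TCP, TCP_NODELAY, &v, &sl))
		return -1;

	return v != 0;
}
```

Unlike the helper in the patch, the sketch returns the setsockopt()
result instead of logging it, and has no special case for -1, so the
caller is assumed to pass a valid socket.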