* [PATCH v2 01/11] tcp_splice: Remove redundant tcp_splice_epoll_ctl()
2023-11-07 2:42 [PATCH v2 00/11] tcp_splice: Better exploit symmetry between sides of connection David Gibson
@ 2023-11-07 2:42 ` David Gibson
2023-11-07 2:42 ` [PATCH v2 02/11] tcp_splice: Correct error handling in tcp_splice_epoll_ctl() David Gibson
` (10 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: David Gibson @ 2023-11-07 2:42 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
tcp_splice_conn_update() calls tcp_splice_epoll_ctl() twice: first ignoring
the return value, then checking it. This serves no purpose. If the first
call succeeds, the second call will do exactly the same thing again, since
nothing has changed in conn. If the first call fails, then
tcp_splice_epoll_ctl() itself will EPOLL_CTL_DEL both fds, meaning when
the second call tries to EPOLL_CTL_MOD them it will necessarily fail.
It appears that this duplication was introduced by accident in an
otherwise unrelated patch.
Fixes: bb708111 ("treewide: Packet abstraction with mandatory boundary checks")
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
tcp_splice.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/tcp_splice.c b/tcp_splice.c
index 133e825..87fbd50 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -255,7 +255,6 @@ static void conn_event_do(const struct ctx *c, struct tcp_splice_conn *conn,
*/
void tcp_splice_conn_update(const struct ctx *c, struct tcp_splice_conn *new)
{
- tcp_splice_epoll_ctl(c, new);
if (tcp_splice_epoll_ctl(c, new))
conn_flag(c, new, CLOSING);
}
--
@@ -255,7 +255,6 @@ static void conn_event_do(const struct ctx *c, struct tcp_splice_conn *conn,
*/
void tcp_splice_conn_update(const struct ctx *c, struct tcp_splice_conn *new)
{
- tcp_splice_epoll_ctl(c, new);
if (tcp_splice_epoll_ctl(c, new))
conn_flag(c, new, CLOSING);
}
--
2.41.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v2 02/11] tcp_splice: Correct error handling in tcp_splice_epoll_ctl()
2023-11-07 2:42 [PATCH v2 00/11] tcp_splice: Better exploit symmetry between sides of connection David Gibson
2023-11-07 2:42 ` [PATCH v2 01/11] tcp_splice: Remove redundant tcp_splice_epoll_ctl() David Gibson
@ 2023-11-07 2:42 ` David Gibson
2023-11-07 2:42 ` [PATCH v2 03/11] tcp_splice: Don't handle EPOLL_CTL_DEL as part of tcp_splice_epoll_ctl() David Gibson
` (9 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: David Gibson @ 2023-11-07 2:42 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
If we get an error from epoll_ctl() in tcp_splice_epoll_ctl() we goto the
'delete' path where we remove both sockets from the epoll set and return
an error. There are several problems with this:
- We 'return -errno' after the EPOLL_CTL_DEL operations, which means the
deleting epoll_ctl() calls may have overwritten the errno values which
actually triggered the failures.
- The call from conn_flag_do() occurs when the CLOSING flag is set, in
which case we go do the delete path regardless of error. In that case
the 'return errno' is meaningless since we don't expect the EPOLL_CTL_DEL
operations to fail and we ignore the return code anyway.
- All other calls to tcp_splice_epoll_ctl() check the return code and if
non-zero immediately call conn_flag(..., CLOSING) which will call
tcp_splice_epoll_ctl() again explicitly to remove the sockets from epoll.
That means removing them when the error first occurs is redundant.
- We never specifically report an error on the epoll_ctl() operations. We
just set the connection to CLOSING, more or less silently killing it.
This could make debugging difficult in the unlikely even that we get a
failure here.
Re-organise tcp_splice_epoll_ctl() to just log a message then return in the
error case, and only EPOLL_CTL_DEL when explicitly asked to with the
CLOSING flag.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
tcp_splice.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)
diff --git a/tcp_splice.c b/tcp_splice.c
index 87fbd50..6c4c664 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -182,25 +182,27 @@ static int tcp_splice_epoll_ctl(const struct ctx *c,
struct epoll_event ev_b = { .data.u64 = ref_b.u64 };
uint32_t events_a, events_b;
- if (conn->flags & CLOSING)
- goto delete;
+ if (conn->flags & CLOSING) {
+ epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->a, &ev_a);
+ epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->b, &ev_b);
+ return 0;
+ }
tcp_splice_conn_epoll_events(conn->events, &events_a, &events_b);
ev_a.events = events_a;
ev_b.events = events_b;
if (epoll_ctl(c->epollfd, m, conn->a, &ev_a) ||
- epoll_ctl(c->epollfd, m, conn->b, &ev_b))
- goto delete;
+ epoll_ctl(c->epollfd, m, conn->b, &ev_b)) {
+ int ret = -errno;
+ err("TCP (spliced): index %li, ERROR on epoll_ctl(): %s",
+ CONN_IDX(conn), strerror(errno));
+ return ret;
+ }
conn->in_epoll = true;
return 0;
-
-delete:
- epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->a, &ev_a);
- epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->b, &ev_b);
- return -errno;
}
/**
--
@@ -182,25 +182,27 @@ static int tcp_splice_epoll_ctl(const struct ctx *c,
struct epoll_event ev_b = { .data.u64 = ref_b.u64 };
uint32_t events_a, events_b;
- if (conn->flags & CLOSING)
- goto delete;
+ if (conn->flags & CLOSING) {
+ epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->a, &ev_a);
+ epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->b, &ev_b);
+ return 0;
+ }
tcp_splice_conn_epoll_events(conn->events, &events_a, &events_b);
ev_a.events = events_a;
ev_b.events = events_b;
if (epoll_ctl(c->epollfd, m, conn->a, &ev_a) ||
- epoll_ctl(c->epollfd, m, conn->b, &ev_b))
- goto delete;
+ epoll_ctl(c->epollfd, m, conn->b, &ev_b)) {
+ int ret = -errno;
+ err("TCP (spliced): index %li, ERROR on epoll_ctl(): %s",
+ CONN_IDX(conn), strerror(errno));
+ return ret;
+ }
conn->in_epoll = true;
return 0;
-
-delete:
- epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->a, &ev_a);
- epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->b, &ev_b);
- return -errno;
}
/**
--
2.41.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v2 03/11] tcp_splice: Don't handle EPOLL_CTL_DEL as part of tcp_splice_epoll_ctl()
2023-11-07 2:42 [PATCH v2 00/11] tcp_splice: Better exploit symmetry between sides of connection David Gibson
2023-11-07 2:42 ` [PATCH v2 01/11] tcp_splice: Remove redundant tcp_splice_epoll_ctl() David Gibson
2023-11-07 2:42 ` [PATCH v2 02/11] tcp_splice: Correct error handling in tcp_splice_epoll_ctl() David Gibson
@ 2023-11-07 2:42 ` David Gibson
2023-11-07 2:42 ` [PATCH v2 04/11] tcp_splice: Remove unnecessary forward declaration David Gibson
` (8 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: David Gibson @ 2023-11-07 2:42 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
tcp_splice_epoll_ctl() removes both sockets from the epoll set if called
when conn->flags & CLOSING. This will always happen immediately after
setting that flag, since conn_flag_do() makes the call itself. That's also
the _only_ time it can happen: we perform the EPOLL_CTL_DEL without
clearing the conn->in_epoll flag, meaning that any further calls to
tcp_splice_epoll_ctl() would attempt EPOLL_CTL_MOD, which would necessarily
fail since the fds are no longer in the epoll.
The EPOLL_CTL_DEL path in tcp_splice_epoll_ctl() has essentially zero
overlap with anything else the function does, so just move them to be
open coded in conn_flag_do().
This does require kernel 2.6.9 or later, in order to pass NULL as the
event structure for epoll_ctl(). However, we already require at least
3.13 to allow unprivileged user namespaces.
Given that, simply directly perform the EPOLL_CTL_DEL operations from
conn_flag_do() rather than unnecessarily multiplexini
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
tcp_splice.c | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)
diff --git a/tcp_splice.c b/tcp_splice.c
index 6c4c664..570a8c6 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -152,8 +152,10 @@ static void conn_flag_do(const struct ctx *c, struct tcp_splice_conn *conn,
}
}
- if (flag == CLOSING)
- tcp_splice_epoll_ctl(c, conn);
+ if (flag == CLOSING) {
+ epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->a, NULL);
+ epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->b, NULL);
+ }
}
#define conn_flag(c, conn, flag) \
@@ -182,12 +184,6 @@ static int tcp_splice_epoll_ctl(const struct ctx *c,
struct epoll_event ev_b = { .data.u64 = ref_b.u64 };
uint32_t events_a, events_b;
- if (conn->flags & CLOSING) {
- epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->a, &ev_a);
- epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->b, &ev_b);
- return 0;
- }
-
tcp_splice_conn_epoll_events(conn->events, &events_a, &events_b);
ev_a.events = events_a;
ev_b.events = events_b;
--
@@ -152,8 +152,10 @@ static void conn_flag_do(const struct ctx *c, struct tcp_splice_conn *conn,
}
}
- if (flag == CLOSING)
- tcp_splice_epoll_ctl(c, conn);
+ if (flag == CLOSING) {
+ epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->a, NULL);
+ epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->b, NULL);
+ }
}
#define conn_flag(c, conn, flag) \
@@ -182,12 +184,6 @@ static int tcp_splice_epoll_ctl(const struct ctx *c,
struct epoll_event ev_b = { .data.u64 = ref_b.u64 };
uint32_t events_a, events_b;
- if (conn->flags & CLOSING) {
- epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->a, &ev_a);
- epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->b, &ev_b);
- return 0;
- }
-
tcp_splice_conn_epoll_events(conn->events, &events_a, &events_b);
ev_a.events = events_a;
ev_b.events = events_b;
--
2.41.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v2 04/11] tcp_splice: Remove unnecessary forward declaration
2023-11-07 2:42 [PATCH v2 00/11] tcp_splice: Better exploit symmetry between sides of connection David Gibson
` (2 preceding siblings ...)
2023-11-07 2:42 ` [PATCH v2 03/11] tcp_splice: Don't handle EPOLL_CTL_DEL as part of tcp_splice_epoll_ctl() David Gibson
@ 2023-11-07 2:42 ` David Gibson
2023-11-07 2:42 ` [PATCH v2 05/11] tcp_splice: Avoid awkward temporaries in tcp_splice_epoll_ctl() David Gibson
` (7 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: David Gibson @ 2023-11-07 2:42 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
In tcp_splice.c we forward declare tcp_splice_epoll_ctl() then define it
later on. However, there are no circular dependencies which prevent us
from simply having the full definition in place of the forward declaration.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
tcp_splice.c | 71 +++++++++++++++++++++++++---------------------------
1 file changed, 34 insertions(+), 37 deletions(-)
diff --git a/tcp_splice.c b/tcp_splice.c
index 570a8c6..44d1e4a 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -116,8 +116,41 @@ static void tcp_splice_conn_epoll_events(uint16_t events,
*b |= (events & B_OUT_WAIT) ? EPOLLOUT : 0;
}
+/**
+ * tcp_splice_epoll_ctl() - Add/modify/delete epoll state from connection events
+ * @c: Execution context
+ * @conn: Connection pointer
+ *
+ * Return: 0 on success, negative error code on failure (not on deletion)
+ */
static int tcp_splice_epoll_ctl(const struct ctx *c,
- struct tcp_splice_conn *conn);
+ struct tcp_splice_conn *conn)
+{
+ int m = conn->in_epoll ? EPOLL_CTL_MOD : EPOLL_CTL_ADD;
+ union epoll_ref ref_a = { .type = EPOLL_TYPE_TCP, .fd = conn->a,
+ .tcp.index = CONN_IDX(conn) };
+ union epoll_ref ref_b = { .type = EPOLL_TYPE_TCP, .fd = conn->b,
+ .tcp.index = CONN_IDX(conn) };
+ struct epoll_event ev_a = { .data.u64 = ref_a.u64 };
+ struct epoll_event ev_b = { .data.u64 = ref_b.u64 };
+ uint32_t events_a, events_b;
+
+ tcp_splice_conn_epoll_events(conn->events, &events_a, &events_b);
+ ev_a.events = events_a;
+ ev_b.events = events_b;
+
+ if (epoll_ctl(c->epollfd, m, conn->a, &ev_a) ||
+ epoll_ctl(c->epollfd, m, conn->b, &ev_b)) {
+ int ret = -errno;
+ err("TCP (spliced): index %li, ERROR on epoll_ctl(): %s",
+ CONN_IDX(conn), strerror(errno));
+ return ret;
+ }
+
+ conn->in_epoll = true;
+
+ return 0;
+}
/**
* conn_flag_do() - Set/unset given flag, log, update epoll on CLOSING flag
@@ -165,42 +198,6 @@ static void conn_flag_do(const struct ctx *c, struct tcp_splice_conn *conn,
conn_flag_do(c, conn, flag); \
} while (0)
-/**
- * tcp_splice_epoll_ctl() - Add/modify/delete epoll state from connection events
- * @c: Execution context
- * @conn: Connection pointer
- *
- * Return: 0 on success, negative error code on failure (not on deletion)
- */
-static int tcp_splice_epoll_ctl(const struct ctx *c,
- struct tcp_splice_conn *conn)
-{
- int m = conn->in_epoll ? EPOLL_CTL_MOD : EPOLL_CTL_ADD;
- union epoll_ref ref_a = { .type = EPOLL_TYPE_TCP, .fd = conn->a,
- .tcp.index = CONN_IDX(conn) };
- union epoll_ref ref_b = { .type = EPOLL_TYPE_TCP, .fd = conn->b,
- .tcp.index = CONN_IDX(conn) };
- struct epoll_event ev_a = { .data.u64 = ref_a.u64 };
- struct epoll_event ev_b = { .data.u64 = ref_b.u64 };
- uint32_t events_a, events_b;
-
- tcp_splice_conn_epoll_events(conn->events, &events_a, &events_b);
- ev_a.events = events_a;
- ev_b.events = events_b;
-
- if (epoll_ctl(c->epollfd, m, conn->a, &ev_a) ||
- epoll_ctl(c->epollfd, m, conn->b, &ev_b)) {
- int ret = -errno;
- err("TCP (spliced): index %li, ERROR on epoll_ctl(): %s",
- CONN_IDX(conn), strerror(errno));
- return ret;
- }
-
- conn->in_epoll = true;
-
- return 0;
-}
-
/**
* conn_event_do() - Set and log connection events, update epoll state
* @c: Execution context
--
@@ -116,8 +116,41 @@ static void tcp_splice_conn_epoll_events(uint16_t events,
*b |= (events & B_OUT_WAIT) ? EPOLLOUT : 0;
}
+/**
+ * tcp_splice_epoll_ctl() - Add/modify/delete epoll state from connection events
+ * @c: Execution context
+ * @conn: Connection pointer
+ *
+ * Return: 0 on success, negative error code on failure (not on deletion)
+ */
static int tcp_splice_epoll_ctl(const struct ctx *c,
- struct tcp_splice_conn *conn);
+ struct tcp_splice_conn *conn)
+{
+ int m = conn->in_epoll ? EPOLL_CTL_MOD : EPOLL_CTL_ADD;
+ union epoll_ref ref_a = { .type = EPOLL_TYPE_TCP, .fd = conn->a,
+ .tcp.index = CONN_IDX(conn) };
+ union epoll_ref ref_b = { .type = EPOLL_TYPE_TCP, .fd = conn->b,
+ .tcp.index = CONN_IDX(conn) };
+ struct epoll_event ev_a = { .data.u64 = ref_a.u64 };
+ struct epoll_event ev_b = { .data.u64 = ref_b.u64 };
+ uint32_t events_a, events_b;
+
+ tcp_splice_conn_epoll_events(conn->events, &events_a, &events_b);
+ ev_a.events = events_a;
+ ev_b.events = events_b;
+
+ if (epoll_ctl(c->epollfd, m, conn->a, &ev_a) ||
+ epoll_ctl(c->epollfd, m, conn->b, &ev_b)) {
+ int ret = -errno;
+ err("TCP (spliced): index %li, ERROR on epoll_ctl(): %s",
+ CONN_IDX(conn), strerror(errno));
+ return ret;
+ }
+
+ conn->in_epoll = true;
+
+ return 0;
+}
/**
* conn_flag_do() - Set/unset given flag, log, update epoll on CLOSING flag
@@ -165,42 +198,6 @@ static void conn_flag_do(const struct ctx *c, struct tcp_splice_conn *conn,
conn_flag_do(c, conn, flag); \
} while (0)
-/**
- * tcp_splice_epoll_ctl() - Add/modify/delete epoll state from connection events
- * @c: Execution context
- * @conn: Connection pointer
- *
- * Return: 0 on success, negative error code on failure (not on deletion)
- */
-static int tcp_splice_epoll_ctl(const struct ctx *c,
- struct tcp_splice_conn *conn)
-{
- int m = conn->in_epoll ? EPOLL_CTL_MOD : EPOLL_CTL_ADD;
- union epoll_ref ref_a = { .type = EPOLL_TYPE_TCP, .fd = conn->a,
- .tcp.index = CONN_IDX(conn) };
- union epoll_ref ref_b = { .type = EPOLL_TYPE_TCP, .fd = conn->b,
- .tcp.index = CONN_IDX(conn) };
- struct epoll_event ev_a = { .data.u64 = ref_a.u64 };
- struct epoll_event ev_b = { .data.u64 = ref_b.u64 };
- uint32_t events_a, events_b;
-
- tcp_splice_conn_epoll_events(conn->events, &events_a, &events_b);
- ev_a.events = events_a;
- ev_b.events = events_b;
-
- if (epoll_ctl(c->epollfd, m, conn->a, &ev_a) ||
- epoll_ctl(c->epollfd, m, conn->b, &ev_b)) {
- int ret = -errno;
- err("TCP (spliced): index %li, ERROR on epoll_ctl(): %s",
- CONN_IDX(conn), strerror(errno));
- return ret;
- }
-
- conn->in_epoll = true;
-
- return 0;
-}
-
/**
* conn_event_do() - Set and log connection events, update epoll state
* @c: Execution context
--
2.41.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v2 05/11] tcp_splice: Avoid awkward temporaries in tcp_splice_epoll_ctl()
2023-11-07 2:42 [PATCH v2 00/11] tcp_splice: Better exploit symmetry between sides of connection David Gibson
` (3 preceding siblings ...)
2023-11-07 2:42 ` [PATCH v2 04/11] tcp_splice: Remove unnecessary forward declaration David Gibson
@ 2023-11-07 2:42 ` David Gibson
2023-11-07 2:42 ` [PATCH v2 06/11] tcp_splice: Don't pool pipes in pairs David Gibson
` (6 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: David Gibson @ 2023-11-07 2:42 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
We initialise the events_a and events_b variables with
tcp_splice_conn_epoll_events() function, then immediately copy the values
into ev_a.events and ev_b.events. We can't simply pass &ev_[ab].events to
tcp_splice_conn_epoll_events(), because struct epoll_event is packed,
leading to 'pointer may be unaligned' warnings if we attempt that.
We can, however, make tcp_splice_conn_epoll_events() take struct
epoll_event pointers rather than raw u32 pointers, avoiding the awkward
temporaries.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
tcp_splice.c | 24 +++++++++++-------------
1 file changed, 11 insertions(+), 13 deletions(-)
diff --git a/tcp_splice.c b/tcp_splice.c
index 44d1e4a..a706ebb 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -95,25 +95,26 @@ static int tcp_sock_refill_ns(void *arg);
/**
* tcp_splice_conn_epoll_events() - epoll events masks for given state
* @events: Connection event flags
- * @a: Event mask for socket with accepted connection, set on return
- * @b: Event mask for connection target socket, set on return
+ * @a: Event for socket with accepted connection, set on return
+ * @b: Event for connection target socket, set on return
*/
static void tcp_splice_conn_epoll_events(uint16_t events,
- uint32_t *a, uint32_t *b)
+ struct epoll_event *a,
+ struct epoll_event *b)
{
- *a = *b = 0;
+ a->events = b->events = 0;
if (events & SPLICE_ESTABLISHED) {
if (!(events & B_FIN_SENT))
- *a = EPOLLIN | EPOLLRDHUP;
+ a->events = EPOLLIN | EPOLLRDHUP;
if (!(events & A_FIN_SENT))
- *b = EPOLLIN | EPOLLRDHUP;
+ b->events = EPOLLIN | EPOLLRDHUP;
} else if (events & SPLICE_CONNECT) {
- *b = EPOLLOUT;
+ b->events = EPOLLOUT;
}
- *a |= (events & A_OUT_WAIT) ? EPOLLOUT : 0;
- *b |= (events & B_OUT_WAIT) ? EPOLLOUT : 0;
+ a->events |= (events & A_OUT_WAIT) ? EPOLLOUT : 0;
+ b->events |= (events & B_OUT_WAIT) ? EPOLLOUT : 0;
}
/**
@@ -133,11 +134,8 @@ static int tcp_splice_epoll_ctl(const struct ctx *c,
.tcp.index = CONN_IDX(conn) };
struct epoll_event ev_a = { .data.u64 = ref_a.u64 };
struct epoll_event ev_b = { .data.u64 = ref_b.u64 };
- uint32_t events_a, events_b;
- tcp_splice_conn_epoll_events(conn->events, &events_a, &events_b);
- ev_a.events = events_a;
- ev_b.events = events_b;
+ tcp_splice_conn_epoll_events(conn->events, &ev_a, &ev_b);
if (epoll_ctl(c->epollfd, m, conn->a, &ev_a) ||
epoll_ctl(c->epollfd, m, conn->b, &ev_b)) {
--
@@ -95,25 +95,26 @@ static int tcp_sock_refill_ns(void *arg);
/**
* tcp_splice_conn_epoll_events() - epoll events masks for given state
* @events: Connection event flags
- * @a: Event mask for socket with accepted connection, set on return
- * @b: Event mask for connection target socket, set on return
+ * @a: Event for socket with accepted connection, set on return
+ * @b: Event for connection target socket, set on return
*/
static void tcp_splice_conn_epoll_events(uint16_t events,
- uint32_t *a, uint32_t *b)
+ struct epoll_event *a,
+ struct epoll_event *b)
{
- *a = *b = 0;
+ a->events = b->events = 0;
if (events & SPLICE_ESTABLISHED) {
if (!(events & B_FIN_SENT))
- *a = EPOLLIN | EPOLLRDHUP;
+ a->events = EPOLLIN | EPOLLRDHUP;
if (!(events & A_FIN_SENT))
- *b = EPOLLIN | EPOLLRDHUP;
+ b->events = EPOLLIN | EPOLLRDHUP;
} else if (events & SPLICE_CONNECT) {
- *b = EPOLLOUT;
+ b->events = EPOLLOUT;
}
- *a |= (events & A_OUT_WAIT) ? EPOLLOUT : 0;
- *b |= (events & B_OUT_WAIT) ? EPOLLOUT : 0;
+ a->events |= (events & A_OUT_WAIT) ? EPOLLOUT : 0;
+ b->events |= (events & B_OUT_WAIT) ? EPOLLOUT : 0;
}
/**
@@ -133,11 +134,8 @@ static int tcp_splice_epoll_ctl(const struct ctx *c,
.tcp.index = CONN_IDX(conn) };
struct epoll_event ev_a = { .data.u64 = ref_a.u64 };
struct epoll_event ev_b = { .data.u64 = ref_b.u64 };
- uint32_t events_a, events_b;
- tcp_splice_conn_epoll_events(conn->events, &events_a, &events_b);
- ev_a.events = events_a;
- ev_b.events = events_b;
+ tcp_splice_conn_epoll_events(conn->events, &ev_a, &ev_b);
if (epoll_ctl(c->epollfd, m, conn->a, &ev_a) ||
epoll_ctl(c->epollfd, m, conn->b, &ev_b)) {
--
2.41.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v2 06/11] tcp_splice: Don't pool pipes in pairs
2023-11-07 2:42 [PATCH v2 00/11] tcp_splice: Better exploit symmetry between sides of connection David Gibson
` (4 preceding siblings ...)
2023-11-07 2:42 ` [PATCH v2 05/11] tcp_splice: Avoid awkward temporaries in tcp_splice_epoll_ctl() David Gibson
@ 2023-11-07 2:42 ` David Gibson
2023-11-07 2:42 ` [PATCH v2 07/11] tcp_splice: Rename sides of connection from a/b to 0/1 David Gibson
` (5 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: David Gibson @ 2023-11-07 2:42 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
To reduce latencies, the tcp splice code maintains a pool of pre-opened
pipes to use for new connections. This is structured as an array of pairs
of pipes, with each pipe, of course, being a pair of fds. Thus when we
use the pool, a single pool "slot" provides both the a->b and b->a pipes.
There's no strong reason to store the pool in pairs, though - we can
with not much difficulty instead take the a->b and b->a pipes for a new
connection independently from separate slots in the pool, or even take one
from the the pool and create the other as we need it, if there's only one
pipe left in the pool.
This marginally increases the length of code, but simplifies the structure
of the pipe pool. We should be able to re-shrink the code with later
changes, too.
In the process we also fix some minor bugs:
- If we both failed to find a pipe in the pool and to create a new one, we
didn't log an error and would silently drop the connection. That could
make debugging such a situation difficult. Add in an error message for
that case
- When refilling the pool, if we were only able to open a single pipe in
the pair, we attempted to rollback, but instead of closing the opened
pipe, we instead closed the pipe we failed to open (probably leading to
some ignored EBADFD errors).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
tcp_splice.c | 60 +++++++++++++++++++++++++++-------------------------
1 file changed, 31 insertions(+), 29 deletions(-)
diff --git a/tcp_splice.c b/tcp_splice.c
index a706ebb..7b36688 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -58,7 +58,7 @@
#include "tcp_conn.h"
#define MAX_PIPE_SIZE (8UL * 1024 * 1024)
-#define TCP_SPLICE_PIPE_POOL_SIZE 16
+#define TCP_SPLICE_PIPE_POOL_SIZE 32
#define TCP_SPLICE_CONN_PRESSURE 30 /* % of conn_count */
#define TCP_SPLICE_FILE_PRESSURE 30 /* % of c->nofile */
@@ -69,7 +69,7 @@ static int ns_sock_pool4 [TCP_SOCK_POOL_SIZE];
static int ns_sock_pool6 [TCP_SOCK_POOL_SIZE];
/* Pool of pre-opened pipes */
-static int splice_pipe_pool [TCP_SPLICE_PIPE_POOL_SIZE][2][2];
+static int splice_pipe_pool [TCP_SPLICE_PIPE_POOL_SIZE][2];
#define CONN_V6(x) (x->flags & SPLICE_V6)
#define CONN_V4(x) (!CONN_V6(x))
@@ -307,19 +307,16 @@ static int tcp_splice_connect_finish(const struct ctx *c,
conn->pipe_a_b[1] = conn->pipe_b_a[1] = -1;
for (i = 0; i < TCP_SPLICE_PIPE_POOL_SIZE; i++) {
- if (splice_pipe_pool[i][0][0] >= 0) {
- SWAP(conn->pipe_a_b[0], splice_pipe_pool[i][0][0]);
- SWAP(conn->pipe_a_b[1], splice_pipe_pool[i][0][1]);
-
- SWAP(conn->pipe_b_a[0], splice_pipe_pool[i][1][0]);
- SWAP(conn->pipe_b_a[1], splice_pipe_pool[i][1][1]);
+ if (splice_pipe_pool[i][0] >= 0) {
+ SWAP(conn->pipe_a_b[0], splice_pipe_pool[i][0]);
+ SWAP(conn->pipe_a_b[1], splice_pipe_pool[i][1]);
break;
}
}
-
if (conn->pipe_a_b[0] < 0) {
- if (pipe2(conn->pipe_a_b, O_NONBLOCK | O_CLOEXEC) ||
- pipe2(conn->pipe_b_a, O_NONBLOCK | O_CLOEXEC)) {
+ if (pipe2(conn->pipe_a_b, O_NONBLOCK | O_CLOEXEC)) {
+ err("TCP (spliced): cannot create a->b pipe: %s",
+ strerror(errno));
conn_flag(c, conn, CLOSING);
return -EIO;
}
@@ -328,6 +325,22 @@ static int tcp_splice_connect_finish(const struct ctx *c,
trace("TCP (spliced): cannot set a->b pipe size to %lu",
c->tcp.pipe_size);
}
+ }
+
+ for (; i < TCP_SPLICE_PIPE_POOL_SIZE; i++) {
+ if (splice_pipe_pool[i][0] >= 0) {
+ SWAP(conn->pipe_b_a[0], splice_pipe_pool[i][0]);
+ SWAP(conn->pipe_b_a[1], splice_pipe_pool[i][1]);
+ break;
+ }
+ }
+ if (conn->pipe_b_a[0] < 0) {
+ if (pipe2(conn->pipe_b_a, O_NONBLOCK | O_CLOEXEC)) {
+ err("TCP (spliced): cannot create b->a pipe: %s",
+ strerror(errno));
+ conn_flag(c, conn, CLOSING);
+ return -EIO;
+ }
if (fcntl(conn->pipe_b_a[0], F_SETPIPE_SZ, c->tcp.pipe_size)) {
trace("TCP (spliced): cannot set b->a pipe size to %lu",
@@ -716,12 +729,12 @@ close:
*/
static void tcp_set_pipe_size(struct ctx *c)
{
- int probe_pipe[TCP_SPLICE_PIPE_POOL_SIZE * 2][2], i, j;
+ int probe_pipe[TCP_SPLICE_PIPE_POOL_SIZE][2], i, j;
c->tcp.pipe_size = MAX_PIPE_SIZE;
smaller:
- for (i = 0; i < TCP_SPLICE_PIPE_POOL_SIZE * 2; i++) {
+ for (i = 0; i < TCP_SPLICE_PIPE_POOL_SIZE; i++) {
if (pipe2(probe_pipe[i], O_CLOEXEC)) {
i++;
break;
@@ -736,7 +749,7 @@ smaller:
close(probe_pipe[j][1]);
}
- if (i == TCP_SPLICE_PIPE_POOL_SIZE * 2)
+ if (i == TCP_SPLICE_PIPE_POOL_SIZE)
return;
if (!(c->tcp.pipe_size /= 2)) {
@@ -756,25 +769,14 @@ static void tcp_splice_pipe_refill(const struct ctx *c)
int i;
for (i = 0; i < TCP_SPLICE_PIPE_POOL_SIZE; i++) {
- if (splice_pipe_pool[i][0][0] >= 0)
+ if (splice_pipe_pool[i][0] >= 0)
break;
- if (pipe2(splice_pipe_pool[i][0], O_NONBLOCK | O_CLOEXEC))
- continue;
- if (pipe2(splice_pipe_pool[i][1], O_NONBLOCK | O_CLOEXEC)) {
- close(splice_pipe_pool[i][1][0]);
- close(splice_pipe_pool[i][1][1]);
+ if (pipe2(splice_pipe_pool[i], O_NONBLOCK | O_CLOEXEC))
continue;
- }
- if (fcntl(splice_pipe_pool[i][0][0], F_SETPIPE_SZ,
+ if (fcntl(splice_pipe_pool[i][0], F_SETPIPE_SZ,
c->tcp.pipe_size)) {
- trace("TCP (spliced): cannot set a->b pipe size to %lu",
- c->tcp.pipe_size);
- }
-
- if (fcntl(splice_pipe_pool[i][1][0], F_SETPIPE_SZ,
- c->tcp.pipe_size)) {
- trace("TCP (spliced): cannot set b->a pipe size to %lu",
+ trace("TCP (spliced): cannot set pool pipe size to %lu",
c->tcp.pipe_size);
}
}
--
@@ -58,7 +58,7 @@
#include "tcp_conn.h"
#define MAX_PIPE_SIZE (8UL * 1024 * 1024)
-#define TCP_SPLICE_PIPE_POOL_SIZE 16
+#define TCP_SPLICE_PIPE_POOL_SIZE 32
#define TCP_SPLICE_CONN_PRESSURE 30 /* % of conn_count */
#define TCP_SPLICE_FILE_PRESSURE 30 /* % of c->nofile */
@@ -69,7 +69,7 @@ static int ns_sock_pool4 [TCP_SOCK_POOL_SIZE];
static int ns_sock_pool6 [TCP_SOCK_POOL_SIZE];
/* Pool of pre-opened pipes */
-static int splice_pipe_pool [TCP_SPLICE_PIPE_POOL_SIZE][2][2];
+static int splice_pipe_pool [TCP_SPLICE_PIPE_POOL_SIZE][2];
#define CONN_V6(x) (x->flags & SPLICE_V6)
#define CONN_V4(x) (!CONN_V6(x))
@@ -307,19 +307,16 @@ static int tcp_splice_connect_finish(const struct ctx *c,
conn->pipe_a_b[1] = conn->pipe_b_a[1] = -1;
for (i = 0; i < TCP_SPLICE_PIPE_POOL_SIZE; i++) {
- if (splice_pipe_pool[i][0][0] >= 0) {
- SWAP(conn->pipe_a_b[0], splice_pipe_pool[i][0][0]);
- SWAP(conn->pipe_a_b[1], splice_pipe_pool[i][0][1]);
-
- SWAP(conn->pipe_b_a[0], splice_pipe_pool[i][1][0]);
- SWAP(conn->pipe_b_a[1], splice_pipe_pool[i][1][1]);
+ if (splice_pipe_pool[i][0] >= 0) {
+ SWAP(conn->pipe_a_b[0], splice_pipe_pool[i][0]);
+ SWAP(conn->pipe_a_b[1], splice_pipe_pool[i][1]);
break;
}
}
-
if (conn->pipe_a_b[0] < 0) {
- if (pipe2(conn->pipe_a_b, O_NONBLOCK | O_CLOEXEC) ||
- pipe2(conn->pipe_b_a, O_NONBLOCK | O_CLOEXEC)) {
+ if (pipe2(conn->pipe_a_b, O_NONBLOCK | O_CLOEXEC)) {
+ err("TCP (spliced): cannot create a->b pipe: %s",
+ strerror(errno));
conn_flag(c, conn, CLOSING);
return -EIO;
}
@@ -328,6 +325,22 @@ static int tcp_splice_connect_finish(const struct ctx *c,
trace("TCP (spliced): cannot set a->b pipe size to %lu",
c->tcp.pipe_size);
}
+ }
+
+ for (; i < TCP_SPLICE_PIPE_POOL_SIZE; i++) {
+ if (splice_pipe_pool[i][0] >= 0) {
+ SWAP(conn->pipe_b_a[0], splice_pipe_pool[i][0]);
+ SWAP(conn->pipe_b_a[1], splice_pipe_pool[i][1]);
+ break;
+ }
+ }
+ if (conn->pipe_b_a[0] < 0) {
+ if (pipe2(conn->pipe_b_a, O_NONBLOCK | O_CLOEXEC)) {
+ err("TCP (spliced): cannot create b->a pipe: %s",
+ strerror(errno));
+ conn_flag(c, conn, CLOSING);
+ return -EIO;
+ }
if (fcntl(conn->pipe_b_a[0], F_SETPIPE_SZ, c->tcp.pipe_size)) {
trace("TCP (spliced): cannot set b->a pipe size to %lu",
@@ -716,12 +729,12 @@ close:
*/
static void tcp_set_pipe_size(struct ctx *c)
{
- int probe_pipe[TCP_SPLICE_PIPE_POOL_SIZE * 2][2], i, j;
+ int probe_pipe[TCP_SPLICE_PIPE_POOL_SIZE][2], i, j;
c->tcp.pipe_size = MAX_PIPE_SIZE;
smaller:
- for (i = 0; i < TCP_SPLICE_PIPE_POOL_SIZE * 2; i++) {
+ for (i = 0; i < TCP_SPLICE_PIPE_POOL_SIZE; i++) {
if (pipe2(probe_pipe[i], O_CLOEXEC)) {
i++;
break;
@@ -736,7 +749,7 @@ smaller:
close(probe_pipe[j][1]);
}
- if (i == TCP_SPLICE_PIPE_POOL_SIZE * 2)
+ if (i == TCP_SPLICE_PIPE_POOL_SIZE)
return;
if (!(c->tcp.pipe_size /= 2)) {
@@ -756,25 +769,14 @@ static void tcp_splice_pipe_refill(const struct ctx *c)
int i;
for (i = 0; i < TCP_SPLICE_PIPE_POOL_SIZE; i++) {
- if (splice_pipe_pool[i][0][0] >= 0)
+ if (splice_pipe_pool[i][0] >= 0)
break;
- if (pipe2(splice_pipe_pool[i][0], O_NONBLOCK | O_CLOEXEC))
- continue;
- if (pipe2(splice_pipe_pool[i][1], O_NONBLOCK | O_CLOEXEC)) {
- close(splice_pipe_pool[i][1][0]);
- close(splice_pipe_pool[i][1][1]);
+ if (pipe2(splice_pipe_pool[i], O_NONBLOCK | O_CLOEXEC))
continue;
- }
- if (fcntl(splice_pipe_pool[i][0][0], F_SETPIPE_SZ,
+ if (fcntl(splice_pipe_pool[i][0], F_SETPIPE_SZ,
c->tcp.pipe_size)) {
- trace("TCP (spliced): cannot set a->b pipe size to %lu",
- c->tcp.pipe_size);
- }
-
- if (fcntl(splice_pipe_pool[i][1][0], F_SETPIPE_SZ,
- c->tcp.pipe_size)) {
- trace("TCP (spliced): cannot set b->a pipe size to %lu",
+ trace("TCP (spliced): cannot set pool pipe size to %lu",
c->tcp.pipe_size);
}
}
--
2.41.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v2 07/11] tcp_splice: Rename sides of connection from a/b to 0/1
2023-11-07 2:42 [PATCH v2 00/11] tcp_splice: Better exploit symmetry between sides of connection David Gibson
` (5 preceding siblings ...)
2023-11-07 2:42 ` [PATCH v2 06/11] tcp_splice: Don't pool pipes in pairs David Gibson
@ 2023-11-07 2:42 ` David Gibson
2023-11-07 2:42 ` [PATCH v2 08/11] tcp_splice: Exploit side symmetry in tcp_splice_timer() David Gibson
` (4 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: David Gibson @ 2023-11-07 2:42 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
Each spliced connection has two mostly, although not entirely, symmetric
sides. We currently call those "a" and "b" and have different fields in
the connection structure for each one.
We can better exploit that symmetry if we use two element arrays rather
thatn separately named fields. Do that in the places we can, and for the
others change the "a"/"b" terminology to 0/1 to match.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
tcp_conn.h | 45 +++++------
tcp_splice.c | 224 +++++++++++++++++++++++++--------------------------
2 files changed, 130 insertions(+), 139 deletions(-)
diff --git a/tcp_conn.h b/tcp_conn.h
index 0751e00..01d31d4 100644
--- a/tcp_conn.h
+++ b/tcp_conn.h
@@ -119,54 +119,47 @@ struct tcp_tap_conn {
uint32_t seq_init_from_tap;
};
+#define SIDES 2
/**
* struct tcp_splice_conn - Descriptor for a spliced TCP connection
* @c: Fields common with tcp_tap_conn
* @in_epoll: Is the connection in the epoll set?
- * @a: File descriptor number of socket for accepted connection
- * @pipe_a_b: Pipe ends for splice() from @a to @b
- * @b: File descriptor number of peer connected socket
- * @pipe_b_a: Pipe ends for splice() from @b to @a
+ * @s: File descriptor for sockets
+ * @pipe: File descriptors for pipes
* @events: Events observed/actions performed on connection
* @flags: Connection flags (attributes, not events)
- * @a_read: Bytes read from @a (not fully written to @b in one shot)
- * @a_written: Bytes written to @a (not fully written from one @b read)
- * @b_read: Bytes read from @b (not fully written to @a in one shot)
- * @b_written: Bytes written to @b (not fully written from one @a read)
+ * @read: Bytes read (not fully written to other side in one shot)
+ * @written: Bytes written (not fully written from one other side read)
*/
struct tcp_splice_conn {
/* Must be first element to match tcp_tap_conn */
struct tcp_conn_common c;
bool in_epoll :1;
- int a;
- int pipe_a_b[2];
- int b;
- int pipe_b_a[2];
+ int s[SIDES];
+ int pipe[SIDES][2];
uint8_t events;
#define SPLICE_CLOSED 0
#define SPLICE_CONNECT BIT(0)
#define SPLICE_ESTABLISHED BIT(1)
-#define A_OUT_WAIT BIT(2)
-#define B_OUT_WAIT BIT(3)
-#define A_FIN_RCVD BIT(4)
-#define B_FIN_RCVD BIT(5)
-#define A_FIN_SENT BIT(6)
-#define B_FIN_SENT BIT(7)
+#define OUT_WAIT_0 BIT(2)
+#define OUT_WAIT_1 BIT(3)
+#define FIN_RCVD_0 BIT(4)
+#define FIN_RCVD_1 BIT(5)
+#define FIN_SENT_0 BIT(6)
+#define FIN_SENT_1 BIT(7)
uint8_t flags;
#define SPLICE_V6 BIT(0)
-#define RCVLOWAT_SET_A BIT(1)
-#define RCVLOWAT_SET_B BIT(2)
-#define RCVLOWAT_ACT_A BIT(3)
-#define RCVLOWAT_ACT_B BIT(4)
+#define RCVLOWAT_SET_0 BIT(1)
+#define RCVLOWAT_SET_1 BIT(2)
+#define RCVLOWAT_ACT_0 BIT(3)
+#define RCVLOWAT_ACT_1 BIT(4)
#define CLOSING BIT(5)
- uint32_t a_read;
- uint32_t a_written;
- uint32_t b_read;
- uint32_t b_written;
+ uint32_t read[SIDES];
+ uint32_t written[SIDES];
};
/**
diff --git a/tcp_splice.c b/tcp_splice.c
index 7b36688..f405184 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -21,12 +21,12 @@
*
* - SPLICE_CONNECT: connection accepted, connecting to target
* - SPLICE_ESTABLISHED: connection to target established
- * - A_OUT_WAIT: pipe to accepted socket full, wait for EPOLLOUT
- * - B_OUT_WAIT: pipe to target socket full, wait for EPOLLOUT
- * - A_FIN_RCVD: FIN (EPOLLRDHUP) seen from accepted socket
- * - B_FIN_RCVD: FIN (EPOLLRDHUP) seen from target socket
- * - A_FIN_RCVD: FIN (write shutdown) sent to accepted socket
- * - B_FIN_RCVD: FIN (write shutdown) sent to target socket
+ * - OUT_WAIT_0: pipe to accepted socket full, wait for EPOLLOUT
+ * - OUT_WAIT_1: pipe to target socket full, wait for EPOLLOUT
+ * - FIN_RCVD_0: FIN (EPOLLRDHUP) seen from accepted socket
+ * - FIN_RCVD_1: FIN (EPOLLRDHUP) seen from target socket
+ * - FIN_SENT_0: FIN (write shutdown) sent to accepted socket
+ * - FIN_SENT_1: FIN (write shutdown) sent to target socket
*
* #syscalls:pasta pipe2|pipe fcntl armv6l:fcntl64 armv7l:fcntl64 ppc64:fcntl64
*/
@@ -79,14 +79,14 @@ static int splice_pipe_pool [TCP_SPLICE_PIPE_POOL_SIZE][2];
/* Display strings for connection events */
static const char *tcp_splice_event_str[] __attribute((__unused__)) = {
- "SPLICE_CONNECT", "SPLICE_ESTABLISHED", "A_OUT_WAIT", "B_OUT_WAIT",
- "A_FIN_RCVD", "B_FIN_RCVD", "A_FIN_SENT", "B_FIN_SENT",
+ "SPLICE_CONNECT", "SPLICE_ESTABLISHED", "OUT_WAIT_0", "OUT_WAIT_1",
+ "FIN_RCVD_0", "FIN_RCVD_1", "FIN_SENT_0", "FIN_SENT_1",
};
/* Display strings for connection flags */
static const char *tcp_splice_flag_str[] __attribute((__unused__)) = {
- "SPLICE_V6", "RCVLOWAT_SET_A", "RCVLOWAT_SET_B", "RCVLOWAT_ACT_A",
- "RCVLOWAT_ACT_B", "CLOSING",
+ "SPLICE_V6", "RCVLOWAT_SET_0", "RCVLOWAT_SET_1", "RCVLOWAT_ACT_0",
+ "RCVLOWAT_ACT_1", "CLOSING",
};
/* Forward declaration */
@@ -95,26 +95,24 @@ static int tcp_sock_refill_ns(void *arg);
/**
* tcp_splice_conn_epoll_events() - epoll events masks for given state
* @events: Connection event flags
- * @a: Event for socket with accepted connection, set on return
- * @b: Event for connection target socket, set on return
+ * @ev: Events to fill in, 0 is accepted socket, 1 is connecting socket
*/
static void tcp_splice_conn_epoll_events(uint16_t events,
- struct epoll_event *a,
- struct epoll_event *b)
+ struct epoll_event ev[])
{
- a->events = b->events = 0;
+ ev[0].events = ev[1].events = 0;
if (events & SPLICE_ESTABLISHED) {
- if (!(events & B_FIN_SENT))
- a->events = EPOLLIN | EPOLLRDHUP;
- if (!(events & A_FIN_SENT))
- b->events = EPOLLIN | EPOLLRDHUP;
+ if (!(events & FIN_SENT_1))
+ ev[0].events = EPOLLIN | EPOLLRDHUP;
+ if (!(events & FIN_SENT_0))
+ ev[1].events = EPOLLIN | EPOLLRDHUP;
} else if (events & SPLICE_CONNECT) {
- b->events = EPOLLOUT;
+ ev[1].events = EPOLLOUT;
}
- a->events |= (events & A_OUT_WAIT) ? EPOLLOUT : 0;
- b->events |= (events & B_OUT_WAIT) ? EPOLLOUT : 0;
+ ev[0].events |= (events & OUT_WAIT_0) ? EPOLLOUT : 0;
+ ev[1].events |= (events & OUT_WAIT_1) ? EPOLLOUT : 0;
}
/**
@@ -128,17 +126,17 @@ static int tcp_splice_epoll_ctl(const struct ctx *c,
struct tcp_splice_conn *conn)
{
int m = conn->in_epoll ? EPOLL_CTL_MOD : EPOLL_CTL_ADD;
- union epoll_ref ref_a = { .type = EPOLL_TYPE_TCP, .fd = conn->a,
- .tcp.index = CONN_IDX(conn) };
- union epoll_ref ref_b = { .type = EPOLL_TYPE_TCP, .fd = conn->b,
- .tcp.index = CONN_IDX(conn) };
- struct epoll_event ev_a = { .data.u64 = ref_a.u64 };
- struct epoll_event ev_b = { .data.u64 = ref_b.u64 };
+ union epoll_ref ref[SIDES] = {
+ { .type = EPOLL_TYPE_TCP, .fd = conn->s[0], .tcp.index = CONN_IDX(conn) },
+ { .type = EPOLL_TYPE_TCP, .fd = conn->s[1], .tcp.index = CONN_IDX(conn) }
+ };
+ struct epoll_event ev[SIDES] = { { .data.u64 = ref[0].u64 },
+ { .data.u64 = ref[1].u64 } };
- tcp_splice_conn_epoll_events(conn->events, &ev_a, &ev_b);
+ tcp_splice_conn_epoll_events(conn->events, ev);
- if (epoll_ctl(c->epollfd, m, conn->a, &ev_a) ||
- epoll_ctl(c->epollfd, m, conn->b, &ev_b)) {
+ if (epoll_ctl(c->epollfd, m, conn->s[0], &ev[0]) ||
+ epoll_ctl(c->epollfd, m, conn->s[1], &ev[1])) {
int ret = -errno;
err("TCP (spliced): index %li, ERROR on epoll_ctl(): %s",
CONN_IDX(conn), strerror(errno));
@@ -184,8 +182,8 @@ static void conn_flag_do(const struct ctx *c, struct tcp_splice_conn *conn,
}
if (flag == CLOSING) {
- epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->a, NULL);
- epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->b, NULL);
+ epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->s[0], NULL);
+ epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->s[1], NULL);
}
}
@@ -263,26 +261,26 @@ void tcp_splice_destroy(struct ctx *c, union tcp_conn *conn_union)
if (conn->events & SPLICE_ESTABLISHED) {
/* Flushing might need to block: don't recycle them. */
- if (conn->pipe_a_b[0] != -1) {
- close(conn->pipe_a_b[0]);
- close(conn->pipe_a_b[1]);
- conn->pipe_a_b[0] = conn->pipe_a_b[1] = -1;
+ if (conn->pipe[0][0] != -1) {
+ close(conn->pipe[0][0]);
+ close(conn->pipe[0][1]);
+ conn->pipe[0][0] = conn->pipe[0][1] = -1;
}
- if (conn->pipe_b_a[0] != -1) {
- close(conn->pipe_b_a[0]);
- close(conn->pipe_b_a[1]);
- conn->pipe_b_a[0] = conn->pipe_b_a[1] = -1;
+ if (conn->pipe[1][0] != -1) {
+ close(conn->pipe[1][0]);
+ close(conn->pipe[1][1]);
+ conn->pipe[1][0] = conn->pipe[1][1] = -1;
}
}
if (conn->events & SPLICE_CONNECT) {
- close(conn->b);
- conn->b = -1;
+ close(conn->s[1]);
+ conn->s[1] = -1;
}
- close(conn->a);
- conn->a = -1;
- conn->a_read = conn->a_written = conn->b_read = conn->b_written = 0;
+ close(conn->s[0]);
+ conn->s[0] = -1;
+ conn->read[0] = conn->written[0] = conn->read[1] = conn->written[1] = 0;
conn->events = SPLICE_CLOSED;
conn->flags = 0;
@@ -303,47 +301,47 @@ static int tcp_splice_connect_finish(const struct ctx *c,
{
int i;
- conn->pipe_a_b[0] = conn->pipe_b_a[0] = -1;
- conn->pipe_a_b[1] = conn->pipe_b_a[1] = -1;
+ conn->pipe[0][0] = conn->pipe[1][0] = -1;
+ conn->pipe[0][1] = conn->pipe[1][1] = -1;
for (i = 0; i < TCP_SPLICE_PIPE_POOL_SIZE; i++) {
if (splice_pipe_pool[i][0] >= 0) {
- SWAP(conn->pipe_a_b[0], splice_pipe_pool[i][0]);
- SWAP(conn->pipe_a_b[1], splice_pipe_pool[i][1]);
+ SWAP(conn->pipe[0][0], splice_pipe_pool[i][0]);
+ SWAP(conn->pipe[0][1], splice_pipe_pool[i][1]);
break;
}
}
- if (conn->pipe_a_b[0] < 0) {
- if (pipe2(conn->pipe_a_b, O_NONBLOCK | O_CLOEXEC)) {
- err("TCP (spliced): cannot create a->b pipe: %s",
+ if (conn->pipe[0][0] < 0) {
+ if (pipe2(conn->pipe[0], O_NONBLOCK | O_CLOEXEC)) {
+ err("TCP (spliced): cannot create 0->1 pipe: %s",
strerror(errno));
conn_flag(c, conn, CLOSING);
return -EIO;
}
- if (fcntl(conn->pipe_a_b[0], F_SETPIPE_SZ, c->tcp.pipe_size)) {
- trace("TCP (spliced): cannot set a->b pipe size to %lu",
+ if (fcntl(conn->pipe[0][0], F_SETPIPE_SZ, c->tcp.pipe_size)) {
+ trace("TCP (spliced): cannot set 0->1 pipe size to %lu",
c->tcp.pipe_size);
}
}
for (; i < TCP_SPLICE_PIPE_POOL_SIZE; i++) {
if (splice_pipe_pool[i][0] >= 0) {
- SWAP(conn->pipe_b_a[0], splice_pipe_pool[i][0]);
- SWAP(conn->pipe_b_a[1], splice_pipe_pool[i][1]);
+ SWAP(conn->pipe[1][0], splice_pipe_pool[i][0]);
+ SWAP(conn->pipe[1][1], splice_pipe_pool[i][1]);
break;
}
}
- if (conn->pipe_b_a[0] < 0) {
- if (pipe2(conn->pipe_b_a, O_NONBLOCK | O_CLOEXEC)) {
- err("TCP (spliced): cannot create b->a pipe: %s",
+ if (conn->pipe[1][0] < 0) {
+ if (pipe2(conn->pipe[1], O_NONBLOCK | O_CLOEXEC)) {
+ err("TCP (spliced): cannot create 1->0 pipe: %s",
strerror(errno));
conn_flag(c, conn, CLOSING);
return -EIO;
}
- if (fcntl(conn->pipe_b_a[0], F_SETPIPE_SZ, c->tcp.pipe_size)) {
- trace("TCP (spliced): cannot set b->a pipe size to %lu",
+ if (fcntl(conn->pipe[1][0], F_SETPIPE_SZ, c->tcp.pipe_size)) {
+ trace("TCP (spliced): cannot set 1->0 pipe size to %lu",
c->tcp.pipe_size);
}
}
@@ -379,12 +377,12 @@ static int tcp_splice_connect(const struct ctx *c, struct tcp_splice_conn *conn,
const struct sockaddr *sa;
socklen_t sl;
- conn->b = sock_conn;
+ conn->s[1] = sock_conn;
- if (setsockopt(conn->b, SOL_TCP, TCP_QUICKACK,
+ if (setsockopt(conn->s[1], SOL_TCP, TCP_QUICKACK,
&((int){ 1 }), sizeof(int))) {
trace("TCP (spliced): failed to set TCP_QUICKACK on socket %i",
- conn->b);
+ conn->s[1]);
}
if (CONN_V6(conn)) {
@@ -395,7 +393,7 @@ static int tcp_splice_connect(const struct ctx *c, struct tcp_splice_conn *conn,
sl = sizeof(addr4);
}
- if (connect(conn->b, sa, sl)) {
+ if (connect(conn->s[1], sa, sl)) {
if (errno != EINPROGRESS) {
int ret = -errno;
@@ -473,13 +471,13 @@ static void tcp_splice_dir(struct tcp_splice_conn *conn, int ref_sock,
{
if (!reverse) {
*from = ref_sock;
- *to = (*from == conn->a) ? conn->b : conn->a;
+ *to = (*from == conn->s[0]) ? conn->s[1] : conn->s[0];
} else {
*to = ref_sock;
- *from = (*to == conn->a) ? conn->b : conn->a;
+ *from = (*to == conn->s[0]) ? conn->s[1] : conn->s[0];
}
- *pipes = *from == conn->a ? conn->pipe_a_b : conn->pipe_b_a;
+ *pipes = *from == conn->s[0] ? conn->pipe[0] : conn->pipe[1];
}
/**
@@ -521,7 +519,7 @@ bool tcp_splice_conn_from_sock(const struct ctx *c,
trace("TCP (spliced): failed to set TCP_QUICKACK on %i", s);
conn->c.spliced = true;
- conn->a = s;
+ conn->s[0] = s;
if (tcp_splice_new(c, conn, ref.port, ref.pif))
conn_flag(c, conn, CLOSING);
@@ -559,10 +557,10 @@ void tcp_splice_sock_handler(struct ctx *c, struct tcp_splice_conn *conn,
}
if (events & EPOLLOUT) {
- if (s == conn->a)
- conn_event(c, conn, ~A_OUT_WAIT);
+ if (s == conn->s[0])
+ conn_event(c, conn, ~OUT_WAIT_0);
else
- conn_event(c, conn, ~B_OUT_WAIT);
+ conn_event(c, conn, ~OUT_WAIT_1);
tcp_splice_dir(conn, s, 1, &from, &to, &pipes);
} else {
@@ -570,33 +568,33 @@ void tcp_splice_sock_handler(struct ctx *c, struct tcp_splice_conn *conn,
}
if (events & EPOLLRDHUP) {
- if (s == conn->a)
- conn_event(c, conn, A_FIN_RCVD);
+ if (s == conn->s[0])
+ conn_event(c, conn, FIN_RCVD_0);
else
- conn_event(c, conn, B_FIN_RCVD);
+ conn_event(c, conn, FIN_RCVD_1);
}
if (events & EPOLLHUP) {
- if (s == conn->a)
- conn_event(c, conn, A_FIN_SENT); /* Fake, but implied */
+ if (s == conn->s[0])
+ conn_event(c, conn, FIN_SENT_0); /* Fake, but implied */
else
- conn_event(c, conn, B_FIN_SENT);
+ conn_event(c, conn, FIN_SENT_1);
}
swap:
eof = 0;
never_read = 1;
- if (from == conn->a) {
- seq_read = &conn->a_read;
- seq_write = &conn->a_written;
- lowat_set_flag = RCVLOWAT_SET_A;
- lowat_act_flag = RCVLOWAT_ACT_A;
+ if (from == conn->s[0]) {
+ seq_read = &conn->read[0];
+ seq_write = &conn->written[0];
+ lowat_set_flag = RCVLOWAT_SET_0;
+ lowat_act_flag = RCVLOWAT_ACT_0;
} else {
- seq_read = &conn->b_read;
- seq_write = &conn->b_written;
- lowat_set_flag = RCVLOWAT_SET_B;
- lowat_act_flag = RCVLOWAT_ACT_B;
+ seq_read = &conn->read[1];
+ seq_write = &conn->written[1];
+ lowat_set_flag = RCVLOWAT_SET_1;
+ lowat_act_flag = RCVLOWAT_ACT_1;
}
while (1) {
@@ -666,10 +664,10 @@ eintr:
if (never_read)
break;
- if (to == conn->a)
- conn_event(c, conn, A_OUT_WAIT);
+ if (to == conn->s[0])
+ conn_event(c, conn, OUT_WAIT_0);
else
- conn_event(c, conn, B_OUT_WAIT);
+ conn_event(c, conn, OUT_WAIT_1);
break;
}
@@ -685,31 +683,31 @@ eintr:
break;
}
- if ((conn->events & A_FIN_RCVD) && !(conn->events & B_FIN_SENT)) {
+ if ((conn->events & FIN_RCVD_0) && !(conn->events & FIN_SENT_1)) {
if (*seq_read == *seq_write && eof) {
- shutdown(conn->b, SHUT_WR);
- conn_event(c, conn, B_FIN_SENT);
+ shutdown(conn->s[1], SHUT_WR);
+ conn_event(c, conn, FIN_SENT_1);
}
}
- if ((conn->events & B_FIN_RCVD) && !(conn->events & A_FIN_SENT)) {
+ if ((conn->events & FIN_RCVD_1) && !(conn->events & FIN_SENT_0)) {
if (*seq_read == *seq_write && eof) {
- shutdown(conn->a, SHUT_WR);
- conn_event(c, conn, A_FIN_SENT);
+ shutdown(conn->s[0], SHUT_WR);
+ conn_event(c, conn, FIN_SENT_0);
}
}
- if (CONN_HAS(conn, A_FIN_SENT | B_FIN_SENT))
+ if (CONN_HAS(conn, FIN_SENT_0 | FIN_SENT_1))
goto close;
if ((events & (EPOLLIN | EPOLLOUT)) == (EPOLLIN | EPOLLOUT)) {
events = EPOLLIN;
SWAP(from, to);
- if (pipes == conn->pipe_a_b)
- pipes = conn->pipe_b_a;
+ if (pipes == conn->pipe[0])
+ pipes = conn->pipe[1];
else
- pipes = conn->pipe_a_b;
+ pipes = conn->pipe[0];
goto swap;
}
@@ -843,26 +841,26 @@ void tcp_splice_timer(struct ctx *c, union tcp_conn *conn_union)
return;
}
- if ( (conn->flags & RCVLOWAT_SET_A) &&
- !(conn->flags & RCVLOWAT_ACT_A)) {
- if (setsockopt(conn->a, SOL_SOCKET, SO_RCVLOWAT,
+ if ( (conn->flags & RCVLOWAT_SET_0) &&
+ !(conn->flags & RCVLOWAT_ACT_0)) {
+ if (setsockopt(conn->s[0], SOL_SOCKET, SO_RCVLOWAT,
&((int){ 1 }), sizeof(int))) {
trace("TCP (spliced): can't set SO_RCVLOWAT on "
- "%i", conn->a);
+ "%i", conn->s[0]);
}
- conn_flag(c, conn, ~RCVLOWAT_SET_A);
+ conn_flag(c, conn, ~RCVLOWAT_SET_0);
}
- if ( (conn->flags & RCVLOWAT_SET_B) &&
- !(conn->flags & RCVLOWAT_ACT_B)) {
- if (setsockopt(conn->b, SOL_SOCKET, SO_RCVLOWAT,
+ if ( (conn->flags & RCVLOWAT_SET_1) &&
+ !(conn->flags & RCVLOWAT_ACT_1)) {
+ if (setsockopt(conn->s[1], SOL_SOCKET, SO_RCVLOWAT,
&((int){ 1 }), sizeof(int))) {
trace("TCP (spliced): can't set SO_RCVLOWAT on "
- "%i", conn->b);
+ "%i", conn->s[1]);
}
- conn_flag(c, conn, ~RCVLOWAT_SET_B);
+ conn_flag(c, conn, ~RCVLOWAT_SET_1);
}
- conn_flag(c, conn, ~RCVLOWAT_ACT_A);
- conn_flag(c, conn, ~RCVLOWAT_ACT_B);
+ conn_flag(c, conn, ~RCVLOWAT_ACT_0);
+ conn_flag(c, conn, ~RCVLOWAT_ACT_1);
}
--
@@ -21,12 +21,12 @@
*
* - SPLICE_CONNECT: connection accepted, connecting to target
* - SPLICE_ESTABLISHED: connection to target established
- * - A_OUT_WAIT: pipe to accepted socket full, wait for EPOLLOUT
- * - B_OUT_WAIT: pipe to target socket full, wait for EPOLLOUT
- * - A_FIN_RCVD: FIN (EPOLLRDHUP) seen from accepted socket
- * - B_FIN_RCVD: FIN (EPOLLRDHUP) seen from target socket
- * - A_FIN_RCVD: FIN (write shutdown) sent to accepted socket
- * - B_FIN_RCVD: FIN (write shutdown) sent to target socket
+ * - OUT_WAIT_0: pipe to accepted socket full, wait for EPOLLOUT
+ * - OUT_WAIT_1: pipe to target socket full, wait for EPOLLOUT
+ * - FIN_RCVD_0: FIN (EPOLLRDHUP) seen from accepted socket
+ * - FIN_RCVD_1: FIN (EPOLLRDHUP) seen from target socket
+ * - FIN_SENT_0: FIN (write shutdown) sent to accepted socket
+ * - FIN_SENT_1: FIN (write shutdown) sent to target socket
*
* #syscalls:pasta pipe2|pipe fcntl armv6l:fcntl64 armv7l:fcntl64 ppc64:fcntl64
*/
@@ -79,14 +79,14 @@ static int splice_pipe_pool [TCP_SPLICE_PIPE_POOL_SIZE][2];
/* Display strings for connection events */
static const char *tcp_splice_event_str[] __attribute((__unused__)) = {
- "SPLICE_CONNECT", "SPLICE_ESTABLISHED", "A_OUT_WAIT", "B_OUT_WAIT",
- "A_FIN_RCVD", "B_FIN_RCVD", "A_FIN_SENT", "B_FIN_SENT",
+ "SPLICE_CONNECT", "SPLICE_ESTABLISHED", "OUT_WAIT_0", "OUT_WAIT_1",
+ "FIN_RCVD_0", "FIN_RCVD_1", "FIN_SENT_0", "FIN_SENT_1",
};
/* Display strings for connection flags */
static const char *tcp_splice_flag_str[] __attribute((__unused__)) = {
- "SPLICE_V6", "RCVLOWAT_SET_A", "RCVLOWAT_SET_B", "RCVLOWAT_ACT_A",
- "RCVLOWAT_ACT_B", "CLOSING",
+ "SPLICE_V6", "RCVLOWAT_SET_0", "RCVLOWAT_SET_1", "RCVLOWAT_ACT_0",
+ "RCVLOWAT_ACT_1", "CLOSING",
};
/* Forward declaration */
@@ -95,26 +95,24 @@ static int tcp_sock_refill_ns(void *arg);
/**
* tcp_splice_conn_epoll_events() - epoll events masks for given state
* @events: Connection event flags
- * @a: Event for socket with accepted connection, set on return
- * @b: Event for connection target socket, set on return
+ * @ev: Events to fill in, 0 is accepted socket, 1 is connecting socket
*/
static void tcp_splice_conn_epoll_events(uint16_t events,
- struct epoll_event *a,
- struct epoll_event *b)
+ struct epoll_event ev[])
{
- a->events = b->events = 0;
+ ev[0].events = ev[1].events = 0;
if (events & SPLICE_ESTABLISHED) {
- if (!(events & B_FIN_SENT))
- a->events = EPOLLIN | EPOLLRDHUP;
- if (!(events & A_FIN_SENT))
- b->events = EPOLLIN | EPOLLRDHUP;
+ if (!(events & FIN_SENT_1))
+ ev[0].events = EPOLLIN | EPOLLRDHUP;
+ if (!(events & FIN_SENT_0))
+ ev[1].events = EPOLLIN | EPOLLRDHUP;
} else if (events & SPLICE_CONNECT) {
- b->events = EPOLLOUT;
+ ev[1].events = EPOLLOUT;
}
- a->events |= (events & A_OUT_WAIT) ? EPOLLOUT : 0;
- b->events |= (events & B_OUT_WAIT) ? EPOLLOUT : 0;
+ ev[0].events |= (events & OUT_WAIT_0) ? EPOLLOUT : 0;
+ ev[1].events |= (events & OUT_WAIT_1) ? EPOLLOUT : 0;
}
/**
@@ -128,17 +126,17 @@ static int tcp_splice_epoll_ctl(const struct ctx *c,
struct tcp_splice_conn *conn)
{
int m = conn->in_epoll ? EPOLL_CTL_MOD : EPOLL_CTL_ADD;
- union epoll_ref ref_a = { .type = EPOLL_TYPE_TCP, .fd = conn->a,
- .tcp.index = CONN_IDX(conn) };
- union epoll_ref ref_b = { .type = EPOLL_TYPE_TCP, .fd = conn->b,
- .tcp.index = CONN_IDX(conn) };
- struct epoll_event ev_a = { .data.u64 = ref_a.u64 };
- struct epoll_event ev_b = { .data.u64 = ref_b.u64 };
+ union epoll_ref ref[SIDES] = {
+ { .type = EPOLL_TYPE_TCP, .fd = conn->s[0], .tcp.index = CONN_IDX(conn) },
+ { .type = EPOLL_TYPE_TCP, .fd = conn->s[1], .tcp.index = CONN_IDX(conn) }
+ };
+ struct epoll_event ev[SIDES] = { { .data.u64 = ref[0].u64 },
+ { .data.u64 = ref[1].u64 } };
- tcp_splice_conn_epoll_events(conn->events, &ev_a, &ev_b);
+ tcp_splice_conn_epoll_events(conn->events, ev);
- if (epoll_ctl(c->epollfd, m, conn->a, &ev_a) ||
- epoll_ctl(c->epollfd, m, conn->b, &ev_b)) {
+ if (epoll_ctl(c->epollfd, m, conn->s[0], &ev[0]) ||
+ epoll_ctl(c->epollfd, m, conn->s[1], &ev[1])) {
int ret = -errno;
err("TCP (spliced): index %li, ERROR on epoll_ctl(): %s",
CONN_IDX(conn), strerror(errno));
@@ -184,8 +182,8 @@ static void conn_flag_do(const struct ctx *c, struct tcp_splice_conn *conn,
}
if (flag == CLOSING) {
- epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->a, NULL);
- epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->b, NULL);
+ epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->s[0], NULL);
+ epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->s[1], NULL);
}
}
@@ -263,26 +261,26 @@ void tcp_splice_destroy(struct ctx *c, union tcp_conn *conn_union)
if (conn->events & SPLICE_ESTABLISHED) {
/* Flushing might need to block: don't recycle them. */
- if (conn->pipe_a_b[0] != -1) {
- close(conn->pipe_a_b[0]);
- close(conn->pipe_a_b[1]);
- conn->pipe_a_b[0] = conn->pipe_a_b[1] = -1;
+ if (conn->pipe[0][0] != -1) {
+ close(conn->pipe[0][0]);
+ close(conn->pipe[0][1]);
+ conn->pipe[0][0] = conn->pipe[0][1] = -1;
}
- if (conn->pipe_b_a[0] != -1) {
- close(conn->pipe_b_a[0]);
- close(conn->pipe_b_a[1]);
- conn->pipe_b_a[0] = conn->pipe_b_a[1] = -1;
+ if (conn->pipe[1][0] != -1) {
+ close(conn->pipe[1][0]);
+ close(conn->pipe[1][1]);
+ conn->pipe[1][0] = conn->pipe[1][1] = -1;
}
}
if (conn->events & SPLICE_CONNECT) {
- close(conn->b);
- conn->b = -1;
+ close(conn->s[1]);
+ conn->s[1] = -1;
}
- close(conn->a);
- conn->a = -1;
- conn->a_read = conn->a_written = conn->b_read = conn->b_written = 0;
+ close(conn->s[0]);
+ conn->s[0] = -1;
+ conn->read[0] = conn->written[0] = conn->read[1] = conn->written[1] = 0;
conn->events = SPLICE_CLOSED;
conn->flags = 0;
@@ -303,47 +301,47 @@ static int tcp_splice_connect_finish(const struct ctx *c,
{
int i;
- conn->pipe_a_b[0] = conn->pipe_b_a[0] = -1;
- conn->pipe_a_b[1] = conn->pipe_b_a[1] = -1;
+ conn->pipe[0][0] = conn->pipe[1][0] = -1;
+ conn->pipe[0][1] = conn->pipe[1][1] = -1;
for (i = 0; i < TCP_SPLICE_PIPE_POOL_SIZE; i++) {
if (splice_pipe_pool[i][0] >= 0) {
- SWAP(conn->pipe_a_b[0], splice_pipe_pool[i][0]);
- SWAP(conn->pipe_a_b[1], splice_pipe_pool[i][1]);
+ SWAP(conn->pipe[0][0], splice_pipe_pool[i][0]);
+ SWAP(conn->pipe[0][1], splice_pipe_pool[i][1]);
break;
}
}
- if (conn->pipe_a_b[0] < 0) {
- if (pipe2(conn->pipe_a_b, O_NONBLOCK | O_CLOEXEC)) {
- err("TCP (spliced): cannot create a->b pipe: %s",
+ if (conn->pipe[0][0] < 0) {
+ if (pipe2(conn->pipe[0], O_NONBLOCK | O_CLOEXEC)) {
+ err("TCP (spliced): cannot create 0->1 pipe: %s",
strerror(errno));
conn_flag(c, conn, CLOSING);
return -EIO;
}
- if (fcntl(conn->pipe_a_b[0], F_SETPIPE_SZ, c->tcp.pipe_size)) {
- trace("TCP (spliced): cannot set a->b pipe size to %lu",
+ if (fcntl(conn->pipe[0][0], F_SETPIPE_SZ, c->tcp.pipe_size)) {
+ trace("TCP (spliced): cannot set 0->1 pipe size to %lu",
c->tcp.pipe_size);
}
}
for (; i < TCP_SPLICE_PIPE_POOL_SIZE; i++) {
if (splice_pipe_pool[i][0] >= 0) {
- SWAP(conn->pipe_b_a[0], splice_pipe_pool[i][0]);
- SWAP(conn->pipe_b_a[1], splice_pipe_pool[i][1]);
+ SWAP(conn->pipe[1][0], splice_pipe_pool[i][0]);
+ SWAP(conn->pipe[1][1], splice_pipe_pool[i][1]);
break;
}
}
- if (conn->pipe_b_a[0] < 0) {
- if (pipe2(conn->pipe_b_a, O_NONBLOCK | O_CLOEXEC)) {
- err("TCP (spliced): cannot create b->a pipe: %s",
+ if (conn->pipe[1][0] < 0) {
+ if (pipe2(conn->pipe[1], O_NONBLOCK | O_CLOEXEC)) {
+ err("TCP (spliced): cannot create 1->0 pipe: %s",
strerror(errno));
conn_flag(c, conn, CLOSING);
return -EIO;
}
- if (fcntl(conn->pipe_b_a[0], F_SETPIPE_SZ, c->tcp.pipe_size)) {
- trace("TCP (spliced): cannot set b->a pipe size to %lu",
+ if (fcntl(conn->pipe[1][0], F_SETPIPE_SZ, c->tcp.pipe_size)) {
+ trace("TCP (spliced): cannot set 1->0 pipe size to %lu",
c->tcp.pipe_size);
}
}
@@ -379,12 +377,12 @@ static int tcp_splice_connect(const struct ctx *c, struct tcp_splice_conn *conn,
const struct sockaddr *sa;
socklen_t sl;
- conn->b = sock_conn;
+ conn->s[1] = sock_conn;
- if (setsockopt(conn->b, SOL_TCP, TCP_QUICKACK,
+ if (setsockopt(conn->s[1], SOL_TCP, TCP_QUICKACK,
&((int){ 1 }), sizeof(int))) {
trace("TCP (spliced): failed to set TCP_QUICKACK on socket %i",
- conn->b);
+ conn->s[1]);
}
if (CONN_V6(conn)) {
@@ -395,7 +393,7 @@ static int tcp_splice_connect(const struct ctx *c, struct tcp_splice_conn *conn,
sl = sizeof(addr4);
}
- if (connect(conn->b, sa, sl)) {
+ if (connect(conn->s[1], sa, sl)) {
if (errno != EINPROGRESS) {
int ret = -errno;
@@ -473,13 +471,13 @@ static void tcp_splice_dir(struct tcp_splice_conn *conn, int ref_sock,
{
if (!reverse) {
*from = ref_sock;
- *to = (*from == conn->a) ? conn->b : conn->a;
+ *to = (*from == conn->s[0]) ? conn->s[1] : conn->s[0];
} else {
*to = ref_sock;
- *from = (*to == conn->a) ? conn->b : conn->a;
+ *from = (*to == conn->s[0]) ? conn->s[1] : conn->s[0];
}
- *pipes = *from == conn->a ? conn->pipe_a_b : conn->pipe_b_a;
+ *pipes = *from == conn->s[0] ? conn->pipe[0] : conn->pipe[1];
}
/**
@@ -521,7 +519,7 @@ bool tcp_splice_conn_from_sock(const struct ctx *c,
trace("TCP (spliced): failed to set TCP_QUICKACK on %i", s);
conn->c.spliced = true;
- conn->a = s;
+ conn->s[0] = s;
if (tcp_splice_new(c, conn, ref.port, ref.pif))
conn_flag(c, conn, CLOSING);
@@ -559,10 +557,10 @@ void tcp_splice_sock_handler(struct ctx *c, struct tcp_splice_conn *conn,
}
if (events & EPOLLOUT) {
- if (s == conn->a)
- conn_event(c, conn, ~A_OUT_WAIT);
+ if (s == conn->s[0])
+ conn_event(c, conn, ~OUT_WAIT_0);
else
- conn_event(c, conn, ~B_OUT_WAIT);
+ conn_event(c, conn, ~OUT_WAIT_1);
tcp_splice_dir(conn, s, 1, &from, &to, &pipes);
} else {
@@ -570,33 +568,33 @@ void tcp_splice_sock_handler(struct ctx *c, struct tcp_splice_conn *conn,
}
if (events & EPOLLRDHUP) {
- if (s == conn->a)
- conn_event(c, conn, A_FIN_RCVD);
+ if (s == conn->s[0])
+ conn_event(c, conn, FIN_RCVD_0);
else
- conn_event(c, conn, B_FIN_RCVD);
+ conn_event(c, conn, FIN_RCVD_1);
}
if (events & EPOLLHUP) {
- if (s == conn->a)
- conn_event(c, conn, A_FIN_SENT); /* Fake, but implied */
+ if (s == conn->s[0])
+ conn_event(c, conn, FIN_SENT_0); /* Fake, but implied */
else
- conn_event(c, conn, B_FIN_SENT);
+ conn_event(c, conn, FIN_SENT_1);
}
swap:
eof = 0;
never_read = 1;
- if (from == conn->a) {
- seq_read = &conn->a_read;
- seq_write = &conn->a_written;
- lowat_set_flag = RCVLOWAT_SET_A;
- lowat_act_flag = RCVLOWAT_ACT_A;
+ if (from == conn->s[0]) {
+ seq_read = &conn->read[0];
+ seq_write = &conn->written[0];
+ lowat_set_flag = RCVLOWAT_SET_0;
+ lowat_act_flag = RCVLOWAT_ACT_0;
} else {
- seq_read = &conn->b_read;
- seq_write = &conn->b_written;
- lowat_set_flag = RCVLOWAT_SET_B;
- lowat_act_flag = RCVLOWAT_ACT_B;
+ seq_read = &conn->read[1];
+ seq_write = &conn->written[1];
+ lowat_set_flag = RCVLOWAT_SET_1;
+ lowat_act_flag = RCVLOWAT_ACT_1;
}
while (1) {
@@ -666,10 +664,10 @@ eintr:
if (never_read)
break;
- if (to == conn->a)
- conn_event(c, conn, A_OUT_WAIT);
+ if (to == conn->s[0])
+ conn_event(c, conn, OUT_WAIT_0);
else
- conn_event(c, conn, B_OUT_WAIT);
+ conn_event(c, conn, OUT_WAIT_1);
break;
}
@@ -685,31 +683,31 @@ eintr:
break;
}
- if ((conn->events & A_FIN_RCVD) && !(conn->events & B_FIN_SENT)) {
+ if ((conn->events & FIN_RCVD_0) && !(conn->events & FIN_SENT_1)) {
if (*seq_read == *seq_write && eof) {
- shutdown(conn->b, SHUT_WR);
- conn_event(c, conn, B_FIN_SENT);
+ shutdown(conn->s[1], SHUT_WR);
+ conn_event(c, conn, FIN_SENT_1);
}
}
- if ((conn->events & B_FIN_RCVD) && !(conn->events & A_FIN_SENT)) {
+ if ((conn->events & FIN_RCVD_1) && !(conn->events & FIN_SENT_0)) {
if (*seq_read == *seq_write && eof) {
- shutdown(conn->a, SHUT_WR);
- conn_event(c, conn, A_FIN_SENT);
+ shutdown(conn->s[0], SHUT_WR);
+ conn_event(c, conn, FIN_SENT_0);
}
}
- if (CONN_HAS(conn, A_FIN_SENT | B_FIN_SENT))
+ if (CONN_HAS(conn, FIN_SENT_0 | FIN_SENT_1))
goto close;
if ((events & (EPOLLIN | EPOLLOUT)) == (EPOLLIN | EPOLLOUT)) {
events = EPOLLIN;
SWAP(from, to);
- if (pipes == conn->pipe_a_b)
- pipes = conn->pipe_b_a;
+ if (pipes == conn->pipe[0])
+ pipes = conn->pipe[1];
else
- pipes = conn->pipe_a_b;
+ pipes = conn->pipe[0];
goto swap;
}
@@ -843,26 +841,26 @@ void tcp_splice_timer(struct ctx *c, union tcp_conn *conn_union)
return;
}
- if ( (conn->flags & RCVLOWAT_SET_A) &&
- !(conn->flags & RCVLOWAT_ACT_A)) {
- if (setsockopt(conn->a, SOL_SOCKET, SO_RCVLOWAT,
+ if ( (conn->flags & RCVLOWAT_SET_0) &&
+ !(conn->flags & RCVLOWAT_ACT_0)) {
+ if (setsockopt(conn->s[0], SOL_SOCKET, SO_RCVLOWAT,
&((int){ 1 }), sizeof(int))) {
trace("TCP (spliced): can't set SO_RCVLOWAT on "
- "%i", conn->a);
+ "%i", conn->s[0]);
}
- conn_flag(c, conn, ~RCVLOWAT_SET_A);
+ conn_flag(c, conn, ~RCVLOWAT_SET_0);
}
- if ( (conn->flags & RCVLOWAT_SET_B) &&
- !(conn->flags & RCVLOWAT_ACT_B)) {
- if (setsockopt(conn->b, SOL_SOCKET, SO_RCVLOWAT,
+ if ( (conn->flags & RCVLOWAT_SET_1) &&
+ !(conn->flags & RCVLOWAT_ACT_1)) {
+ if (setsockopt(conn->s[1], SOL_SOCKET, SO_RCVLOWAT,
&((int){ 1 }), sizeof(int))) {
trace("TCP (spliced): can't set SO_RCVLOWAT on "
- "%i", conn->b);
+ "%i", conn->s[1]);
}
- conn_flag(c, conn, ~RCVLOWAT_SET_B);
+ conn_flag(c, conn, ~RCVLOWAT_SET_1);
}
- conn_flag(c, conn, ~RCVLOWAT_ACT_A);
- conn_flag(c, conn, ~RCVLOWAT_ACT_B);
+ conn_flag(c, conn, ~RCVLOWAT_ACT_0);
+ conn_flag(c, conn, ~RCVLOWAT_ACT_1);
}
--
2.41.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v2 08/11] tcp_splice: Exploit side symmetry in tcp_splice_timer()
2023-11-07 2:42 [PATCH v2 00/11] tcp_splice: Better exploit symmetry between sides of connection David Gibson
` (6 preceding siblings ...)
2023-11-07 2:42 ` [PATCH v2 07/11] tcp_splice: Rename sides of connection from a/b to 0/1 David Gibson
@ 2023-11-07 2:42 ` David Gibson
2023-11-07 2:42 ` [PATCH v2 09/11] tcp_splice: Exploit side symmetry in tcp_splice_connect_finish() David Gibson
` (3 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: David Gibson @ 2023-11-07 2:42 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
tcp_splice_timer() has two very similar blocks one after another that
handle the SO_RCVLOWAT flags for the two sides of the connection. We can
deduplicate this with a loop across the two sides.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
tcp_splice.c | 27 +++++++++++----------------
1 file changed, 11 insertions(+), 16 deletions(-)
diff --git a/tcp_splice.c b/tcp_splice.c
index f405184..cadad32 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -835,30 +835,25 @@ void tcp_splice_init(struct ctx *c)
void tcp_splice_timer(struct ctx *c, union tcp_conn *conn_union)
{
struct tcp_splice_conn *conn = &conn_union->splice;
+ int side;
if (conn->flags & CLOSING) {
tcp_splice_destroy(c, conn_union);
return;
}
- if ( (conn->flags & RCVLOWAT_SET_0) &&
- !(conn->flags & RCVLOWAT_ACT_0)) {
- if (setsockopt(conn->s[0], SOL_SOCKET, SO_RCVLOWAT,
- &((int){ 1 }), sizeof(int))) {
- trace("TCP (spliced): can't set SO_RCVLOWAT on "
- "%i", conn->s[0]);
- }
- conn_flag(c, conn, ~RCVLOWAT_SET_0);
- }
+ for (side = 0; side < SIDES; side++) {
+ uint8_t set = side == 0 ? RCVLOWAT_SET_0 : RCVLOWAT_SET_1;
+ uint8_t act = side == 0 ? RCVLOWAT_ACT_0 : RCVLOWAT_ACT_1;
- if ( (conn->flags & RCVLOWAT_SET_1) &&
- !(conn->flags & RCVLOWAT_ACT_1)) {
- if (setsockopt(conn->s[1], SOL_SOCKET, SO_RCVLOWAT,
- &((int){ 1 }), sizeof(int))) {
- trace("TCP (spliced): can't set SO_RCVLOWAT on "
- "%i", conn->s[1]);
+ if ((conn->flags & set) && !(conn->flags & act)) {
+ if (setsockopt(conn->s[side], SOL_SOCKET, SO_RCVLOWAT,
+ &((int){ 1 }), sizeof(int))) {
+ trace("TCP (spliced): can't set SO_RCVLOWAT on "
+ "%i", conn->s[side]);
+ }
+ conn_flag(c, conn, ~set);
}
- conn_flag(c, conn, ~RCVLOWAT_SET_1);
}
conn_flag(c, conn, ~RCVLOWAT_ACT_0);
--
@@ -835,30 +835,25 @@ void tcp_splice_init(struct ctx *c)
void tcp_splice_timer(struct ctx *c, union tcp_conn *conn_union)
{
struct tcp_splice_conn *conn = &conn_union->splice;
+ int side;
if (conn->flags & CLOSING) {
tcp_splice_destroy(c, conn_union);
return;
}
- if ( (conn->flags & RCVLOWAT_SET_0) &&
- !(conn->flags & RCVLOWAT_ACT_0)) {
- if (setsockopt(conn->s[0], SOL_SOCKET, SO_RCVLOWAT,
- &((int){ 1 }), sizeof(int))) {
- trace("TCP (spliced): can't set SO_RCVLOWAT on "
- "%i", conn->s[0]);
- }
- conn_flag(c, conn, ~RCVLOWAT_SET_0);
- }
+ for (side = 0; side < SIDES; side++) {
+ uint8_t set = side == 0 ? RCVLOWAT_SET_0 : RCVLOWAT_SET_1;
+ uint8_t act = side == 0 ? RCVLOWAT_ACT_0 : RCVLOWAT_ACT_1;
- if ( (conn->flags & RCVLOWAT_SET_1) &&
- !(conn->flags & RCVLOWAT_ACT_1)) {
- if (setsockopt(conn->s[1], SOL_SOCKET, SO_RCVLOWAT,
- &((int){ 1 }), sizeof(int))) {
- trace("TCP (spliced): can't set SO_RCVLOWAT on "
- "%i", conn->s[1]);
+ if ((conn->flags & set) && !(conn->flags & act)) {
+ if (setsockopt(conn->s[side], SOL_SOCKET, SO_RCVLOWAT,
+ &((int){ 1 }), sizeof(int))) {
+ trace("TCP (spliced): can't set SO_RCVLOWAT on "
+ "%i", conn->s[side]);
+ }
+ conn_flag(c, conn, ~set);
}
- conn_flag(c, conn, ~RCVLOWAT_SET_1);
}
conn_flag(c, conn, ~RCVLOWAT_ACT_0);
--
2.41.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v2 09/11] tcp_splice: Exploit side symmetry in tcp_splice_connect_finish()
2023-11-07 2:42 [PATCH v2 00/11] tcp_splice: Better exploit symmetry between sides of connection David Gibson
` (7 preceding siblings ...)
2023-11-07 2:42 ` [PATCH v2 08/11] tcp_splice: Exploit side symmetry in tcp_splice_timer() David Gibson
@ 2023-11-07 2:42 ` David Gibson
2023-11-07 2:42 ` [PATCH v2 10/11] tcp_splice: Exploit side symmetry in tcp_splice_destroy() David Gibson
` (2 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: David Gibson @ 2023-11-07 2:42 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
tcp_splice_connect_finish() has two very similar blocks opening the two
pipes for each direction of the connection. We can deduplicate this with
a loop across the two sides.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
tcp_splice.c | 65 ++++++++++++++++++++--------------------------------
1 file changed, 25 insertions(+), 40 deletions(-)
diff --git a/tcp_splice.c b/tcp_splice.c
index cadad32..214bf22 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -299,50 +299,35 @@ void tcp_splice_destroy(struct ctx *c, union tcp_conn *conn_union)
static int tcp_splice_connect_finish(const struct ctx *c,
struct tcp_splice_conn *conn)
{
- int i;
-
- conn->pipe[0][0] = conn->pipe[1][0] = -1;
- conn->pipe[0][1] = conn->pipe[1][1] = -1;
-
- for (i = 0; i < TCP_SPLICE_PIPE_POOL_SIZE; i++) {
- if (splice_pipe_pool[i][0] >= 0) {
- SWAP(conn->pipe[0][0], splice_pipe_pool[i][0]);
- SWAP(conn->pipe[0][1], splice_pipe_pool[i][1]);
- break;
- }
- }
- if (conn->pipe[0][0] < 0) {
- if (pipe2(conn->pipe[0], O_NONBLOCK | O_CLOEXEC)) {
- err("TCP (spliced): cannot create 0->1 pipe: %s",
- strerror(errno));
- conn_flag(c, conn, CLOSING);
- return -EIO;
- }
+ int i = 0;
+ int side;
- if (fcntl(conn->pipe[0][0], F_SETPIPE_SZ, c->tcp.pipe_size)) {
- trace("TCP (spliced): cannot set 0->1 pipe size to %lu",
- c->tcp.pipe_size);
+ for (side = 0; side < SIDES; side++) {
+ conn->pipe[side][0] = conn->pipe[side][1] = -1;
+
+ for (; i < TCP_SPLICE_PIPE_POOL_SIZE; i++) {
+ if (splice_pipe_pool[i][0] >= 0) {
+ SWAP(conn->pipe[side][0],
+ splice_pipe_pool[i][0]);
+ SWAP(conn->pipe[side][1],
+ splice_pipe_pool[i][1]);
+ break;
+ }
}
- }
- for (; i < TCP_SPLICE_PIPE_POOL_SIZE; i++) {
- if (splice_pipe_pool[i][0] >= 0) {
- SWAP(conn->pipe[1][0], splice_pipe_pool[i][0]);
- SWAP(conn->pipe[1][1], splice_pipe_pool[i][1]);
- break;
- }
- }
- if (conn->pipe[1][0] < 0) {
- if (pipe2(conn->pipe[1], O_NONBLOCK | O_CLOEXEC)) {
- err("TCP (spliced): cannot create 1->0 pipe: %s",
- strerror(errno));
- conn_flag(c, conn, CLOSING);
- return -EIO;
- }
+ if (conn->pipe[side][0] < 0) {
+ if (pipe2(conn->pipe[side], O_NONBLOCK | O_CLOEXEC)) {
+ err("TCP (spliced): cannot create %d->%d pipe: %s",
+ side, !side, strerror(errno));
+ conn_flag(c, conn, CLOSING);
+ return -EIO;
+ }
- if (fcntl(conn->pipe[1][0], F_SETPIPE_SZ, c->tcp.pipe_size)) {
- trace("TCP (spliced): cannot set 1->0 pipe size to %lu",
- c->tcp.pipe_size);
+ if (fcntl(conn->pipe[side][0], F_SETPIPE_SZ,
+ c->tcp.pipe_size)) {
+ trace("TCP (spliced): cannot set %d->%d pipe size to %lu",
+ side, !side, c->tcp.pipe_size);
+ }
}
}
--
@@ -299,50 +299,35 @@ void tcp_splice_destroy(struct ctx *c, union tcp_conn *conn_union)
static int tcp_splice_connect_finish(const struct ctx *c,
struct tcp_splice_conn *conn)
{
- int i;
-
- conn->pipe[0][0] = conn->pipe[1][0] = -1;
- conn->pipe[0][1] = conn->pipe[1][1] = -1;
-
- for (i = 0; i < TCP_SPLICE_PIPE_POOL_SIZE; i++) {
- if (splice_pipe_pool[i][0] >= 0) {
- SWAP(conn->pipe[0][0], splice_pipe_pool[i][0]);
- SWAP(conn->pipe[0][1], splice_pipe_pool[i][1]);
- break;
- }
- }
- if (conn->pipe[0][0] < 0) {
- if (pipe2(conn->pipe[0], O_NONBLOCK | O_CLOEXEC)) {
- err("TCP (spliced): cannot create 0->1 pipe: %s",
- strerror(errno));
- conn_flag(c, conn, CLOSING);
- return -EIO;
- }
+ int i = 0;
+ int side;
- if (fcntl(conn->pipe[0][0], F_SETPIPE_SZ, c->tcp.pipe_size)) {
- trace("TCP (spliced): cannot set 0->1 pipe size to %lu",
- c->tcp.pipe_size);
+ for (side = 0; side < SIDES; side++) {
+ conn->pipe[side][0] = conn->pipe[side][1] = -1;
+
+ for (; i < TCP_SPLICE_PIPE_POOL_SIZE; i++) {
+ if (splice_pipe_pool[i][0] >= 0) {
+ SWAP(conn->pipe[side][0],
+ splice_pipe_pool[i][0]);
+ SWAP(conn->pipe[side][1],
+ splice_pipe_pool[i][1]);
+ break;
+ }
}
- }
- for (; i < TCP_SPLICE_PIPE_POOL_SIZE; i++) {
- if (splice_pipe_pool[i][0] >= 0) {
- SWAP(conn->pipe[1][0], splice_pipe_pool[i][0]);
- SWAP(conn->pipe[1][1], splice_pipe_pool[i][1]);
- break;
- }
- }
- if (conn->pipe[1][0] < 0) {
- if (pipe2(conn->pipe[1], O_NONBLOCK | O_CLOEXEC)) {
- err("TCP (spliced): cannot create 1->0 pipe: %s",
- strerror(errno));
- conn_flag(c, conn, CLOSING);
- return -EIO;
- }
+ if (conn->pipe[side][0] < 0) {
+ if (pipe2(conn->pipe[side], O_NONBLOCK | O_CLOEXEC)) {
+ err("TCP (spliced): cannot create %d->%d pipe: %s",
+ side, !side, strerror(errno));
+ conn_flag(c, conn, CLOSING);
+ return -EIO;
+ }
- if (fcntl(conn->pipe[1][0], F_SETPIPE_SZ, c->tcp.pipe_size)) {
- trace("TCP (spliced): cannot set 1->0 pipe size to %lu",
- c->tcp.pipe_size);
+ if (fcntl(conn->pipe[side][0], F_SETPIPE_SZ,
+ c->tcp.pipe_size)) {
+ trace("TCP (spliced): cannot set %d->%d pipe size to %lu",
+ side, !side, c->tcp.pipe_size);
+ }
}
}
--
2.41.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v2 10/11] tcp_splice: Exploit side symmetry in tcp_splice_destroy()
2023-11-07 2:42 [PATCH v2 00/11] tcp_splice: Better exploit symmetry between sides of connection David Gibson
` (8 preceding siblings ...)
2023-11-07 2:42 ` [PATCH v2 09/11] tcp_splice: Exploit side symmetry in tcp_splice_connect_finish() David Gibson
@ 2023-11-07 2:42 ` David Gibson
2023-11-07 2:42 ` [PATCH v2 11/11] tcp_splice: Simplify selection of socket and pipe sides in socket handler David Gibson
2023-11-07 12:45 ` [PATCH v2 00/11] tcp_splice: Better exploit symmetry between sides of connection Stefano Brivio
11 siblings, 0 replies; 13+ messages in thread
From: David Gibson @ 2023-11-07 2:42 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
tcp_splice_destroy() has some close-to-duplicated logic handling closing of
the socket and pipes for each side of the connection. We can use a loop
across the sides to reduce the duplication.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
tcp_splice.c | 32 ++++++++++++++------------------
1 file changed, 14 insertions(+), 18 deletions(-)
diff --git a/tcp_splice.c b/tcp_splice.c
index 214bf22..9f84d4f 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -258,30 +258,26 @@ void tcp_splice_conn_update(const struct ctx *c, struct tcp_splice_conn *new)
void tcp_splice_destroy(struct ctx *c, union tcp_conn *conn_union)
{
struct tcp_splice_conn *conn = &conn_union->splice;
+ int side;
- if (conn->events & SPLICE_ESTABLISHED) {
- /* Flushing might need to block: don't recycle them. */
- if (conn->pipe[0][0] != -1) {
- close(conn->pipe[0][0]);
- close(conn->pipe[0][1]);
- conn->pipe[0][0] = conn->pipe[0][1] = -1;
+ for (side = 0; side < SIDES; side++) {
+ if (conn->events & SPLICE_ESTABLISHED) {
+ /* Flushing might need to block: don't recycle them. */
+ if (conn->pipe[side][0] != -1) {
+ close(conn->pipe[side][0]);
+ close(conn->pipe[side][1]);
+ conn->pipe[side][0] = conn->pipe[side][1] = -1;
+ }
}
- if (conn->pipe[1][0] != -1) {
- close(conn->pipe[1][0]);
- close(conn->pipe[1][1]);
- conn->pipe[1][0] = conn->pipe[1][1] = -1;
+
+ if (side == 0 || conn->events & SPLICE_CONNECT) {
+ close(conn->s[side]);
+ conn->s[side] = -1;
}
- }
- if (conn->events & SPLICE_CONNECT) {
- close(conn->s[1]);
- conn->s[1] = -1;
+ conn->read[side] = conn->written[side] = 0;
}
- close(conn->s[0]);
- conn->s[0] = -1;
- conn->read[0] = conn->written[0] = conn->read[1] = conn->written[1] = 0;
-
conn->events = SPLICE_CLOSED;
conn->flags = 0;
debug("TCP (spliced): index %li, CLOSED", CONN_IDX(conn));
--
@@ -258,30 +258,26 @@ void tcp_splice_conn_update(const struct ctx *c, struct tcp_splice_conn *new)
void tcp_splice_destroy(struct ctx *c, union tcp_conn *conn_union)
{
struct tcp_splice_conn *conn = &conn_union->splice;
+ int side;
- if (conn->events & SPLICE_ESTABLISHED) {
- /* Flushing might need to block: don't recycle them. */
- if (conn->pipe[0][0] != -1) {
- close(conn->pipe[0][0]);
- close(conn->pipe[0][1]);
- conn->pipe[0][0] = conn->pipe[0][1] = -1;
+ for (side = 0; side < SIDES; side++) {
+ if (conn->events & SPLICE_ESTABLISHED) {
+ /* Flushing might need to block: don't recycle them. */
+ if (conn->pipe[side][0] != -1) {
+ close(conn->pipe[side][0]);
+ close(conn->pipe[side][1]);
+ conn->pipe[side][0] = conn->pipe[side][1] = -1;
+ }
}
- if (conn->pipe[1][0] != -1) {
- close(conn->pipe[1][0]);
- close(conn->pipe[1][1]);
- conn->pipe[1][0] = conn->pipe[1][1] = -1;
+
+ if (side == 0 || conn->events & SPLICE_CONNECT) {
+ close(conn->s[side]);
+ conn->s[side] = -1;
}
- }
- if (conn->events & SPLICE_CONNECT) {
- close(conn->s[1]);
- conn->s[1] = -1;
+ conn->read[side] = conn->written[side] = 0;
}
- close(conn->s[0]);
- conn->s[0] = -1;
- conn->read[0] = conn->written[0] = conn->read[1] = conn->written[1] = 0;
-
conn->events = SPLICE_CLOSED;
conn->flags = 0;
debug("TCP (spliced): index %li, CLOSED", CONN_IDX(conn));
--
2.41.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v2 11/11] tcp_splice: Simplify selection of socket and pipe sides in socket handler
2023-11-07 2:42 [PATCH v2 00/11] tcp_splice: Better exploit symmetry between sides of connection David Gibson
` (9 preceding siblings ...)
2023-11-07 2:42 ` [PATCH v2 10/11] tcp_splice: Exploit side symmetry in tcp_splice_destroy() David Gibson
@ 2023-11-07 2:42 ` David Gibson
2023-11-07 12:45 ` [PATCH v2 00/11] tcp_splice: Better exploit symmetry between sides of connection Stefano Brivio
11 siblings, 0 replies; 13+ messages in thread
From: David Gibson @ 2023-11-07 2:42 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
tcp_splice_sock_handler() uses the tcp_splice_dir() helper to select
which of the socket, pipe and counter fields to use depending on which
side of the connection the socket event is coming from.
Now that we are using arrays for the two sides, rather than separate named
fields, we can instead just use a variable indicating the side and use
that to index the arrays whever we need a particular side's field.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
tcp_splice.c | 81 ++++++++++++++--------------------------------------
1 file changed, 22 insertions(+), 59 deletions(-)
diff --git a/tcp_splice.c b/tcp_splice.c
index 9f84d4f..a5c1332 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -438,29 +438,6 @@ static int tcp_splice_new(const struct ctx *c, struct tcp_splice_conn *conn,
return tcp_splice_connect(c, conn, s, port);
}
-/**
- * tcp_splice_dir() - Set sockets/pipe pointers reflecting flow direction
- * @conn: Connection pointers
- * @ref_sock: Socket returned as reference from epoll
- * @reverse: Reverse direction: @ref_sock is used as destination
- * @from: Destination socket pointer to set
- * @to: Source socket pointer to set
- * @pipes: Pipe set, assigned on return
- */
-static void tcp_splice_dir(struct tcp_splice_conn *conn, int ref_sock,
- int reverse, int *from, int *to, int **pipes)
-{
- if (!reverse) {
- *from = ref_sock;
- *to = (*from == conn->s[0]) ? conn->s[1] : conn->s[0];
- } else {
- *to = ref_sock;
- *from = (*to == conn->s[0]) ? conn->s[1] : conn->s[0];
- }
-
- *pipes = *from == conn->s[0] ? conn->pipe[0] : conn->pipe[1];
-}
-
/**
* tcp_splice_conn_from_sock() - Attempt to init state for a spliced connection
* @c: Execution context
@@ -521,8 +498,7 @@ void tcp_splice_sock_handler(struct ctx *c, struct tcp_splice_conn *conn,
int s, uint32_t events)
{
uint8_t lowat_set_flag, lowat_act_flag;
- int from, to, *pipes, eof, never_read;
- uint32_t *seq_read, *seq_write;
+ int fromside, eof, never_read;
if (conn->events == SPLICE_CLOSED)
return;
@@ -538,14 +514,15 @@ void tcp_splice_sock_handler(struct ctx *c, struct tcp_splice_conn *conn,
}
if (events & EPOLLOUT) {
- if (s == conn->s[0])
+ if (s == conn->s[0]) {
conn_event(c, conn, ~OUT_WAIT_0);
- else
+ fromside = 1;
+ } else {
conn_event(c, conn, ~OUT_WAIT_1);
-
- tcp_splice_dir(conn, s, 1, &from, &to, &pipes);
+ fromside = 0;
+ }
} else {
- tcp_splice_dir(conn, s, 0, &from, &to, &pipes);
+ fromside = s == conn->s[0] ? 0 : 1;
}
if (events & EPOLLRDHUP) {
@@ -566,24 +543,16 @@ swap:
eof = 0;
never_read = 1;
- if (from == conn->s[0]) {
- seq_read = &conn->read[0];
- seq_write = &conn->written[0];
- lowat_set_flag = RCVLOWAT_SET_0;
- lowat_act_flag = RCVLOWAT_ACT_0;
- } else {
- seq_read = &conn->read[1];
- seq_write = &conn->written[1];
- lowat_set_flag = RCVLOWAT_SET_1;
- lowat_act_flag = RCVLOWAT_ACT_1;
- }
+ lowat_set_flag = fromside == 0 ? RCVLOWAT_SET_0 : RCVLOWAT_SET_1;
+ lowat_act_flag = fromside == 0 ? RCVLOWAT_ACT_0 : RCVLOWAT_ACT_1;
while (1) {
ssize_t readlen, to_write = 0, written;
int more = 0;
retry:
- readlen = splice(from, NULL, pipes[1], NULL, c->tcp.pipe_size,
+ readlen = splice(conn->s[fromside], NULL,
+ conn->pipe[fromside][1], NULL, c->tcp.pipe_size,
SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
trace("TCP (spliced): %li from read-side call", readlen);
if (readlen < 0) {
@@ -608,7 +577,8 @@ retry:
}
eintr:
- written = splice(pipes[0], NULL, to, NULL, to_write,
+ written = splice(conn->pipe[fromside][0], NULL,
+ conn->s[!fromside], NULL, to_write,
SPLICE_F_MOVE | more | SPLICE_F_NONBLOCK);
trace("TCP (spliced): %li from write-side call (passed %lu)",
written, to_write);
@@ -622,8 +592,8 @@ eintr:
readlen > (long)c->tcp.pipe_size / 10) {
int lowat = c->tcp.pipe_size / 4;
- setsockopt(from, SOL_SOCKET, SO_RCVLOWAT,
- &lowat, sizeof(lowat));
+ setsockopt(conn->s[fromside], SOL_SOCKET,
+ SO_RCVLOWAT, &lowat, sizeof(lowat));
conn_flag(c, conn, lowat_set_flag);
conn_flag(c, conn, lowat_act_flag);
@@ -632,8 +602,8 @@ eintr:
break;
}
- *seq_read += readlen > 0 ? readlen : 0;
- *seq_write += written > 0 ? written : 0;
+ conn->read[fromside] += readlen > 0 ? readlen : 0;
+ conn->written[fromside] += written > 0 ? written : 0;
if (written < 0) {
if (errno == EINTR)
@@ -645,10 +615,8 @@ eintr:
if (never_read)
break;
- if (to == conn->s[0])
- conn_event(c, conn, OUT_WAIT_0);
- else
- conn_event(c, conn, OUT_WAIT_1);
+ conn_event(c, conn,
+ fromside == 0 ? OUT_WAIT_1 : OUT_WAIT_0);
break;
}
@@ -665,14 +633,14 @@ eintr:
}
if ((conn->events & FIN_RCVD_0) && !(conn->events & FIN_SENT_1)) {
- if (*seq_read == *seq_write && eof) {
+ if (conn->read[fromside] == conn->written[fromside] && eof) {
shutdown(conn->s[1], SHUT_WR);
conn_event(c, conn, FIN_SENT_1);
}
}
if ((conn->events & FIN_RCVD_1) && !(conn->events & FIN_SENT_0)) {
- if (*seq_read == *seq_write && eof) {
+ if (conn->read[fromside] == conn->written[fromside] && eof) {
shutdown(conn->s[0], SHUT_WR);
conn_event(c, conn, FIN_SENT_0);
}
@@ -684,12 +652,7 @@ eintr:
if ((events & (EPOLLIN | EPOLLOUT)) == (EPOLLIN | EPOLLOUT)) {
events = EPOLLIN;
- SWAP(from, to);
- if (pipes == conn->pipe[0])
- pipes = conn->pipe[1];
- else
- pipes = conn->pipe[0];
-
+ fromside = !fromside;
goto swap;
}
--
@@ -438,29 +438,6 @@ static int tcp_splice_new(const struct ctx *c, struct tcp_splice_conn *conn,
return tcp_splice_connect(c, conn, s, port);
}
-/**
- * tcp_splice_dir() - Set sockets/pipe pointers reflecting flow direction
- * @conn: Connection pointers
- * @ref_sock: Socket returned as reference from epoll
- * @reverse: Reverse direction: @ref_sock is used as destination
- * @from: Destination socket pointer to set
- * @to: Source socket pointer to set
- * @pipes: Pipe set, assigned on return
- */
-static void tcp_splice_dir(struct tcp_splice_conn *conn, int ref_sock,
- int reverse, int *from, int *to, int **pipes)
-{
- if (!reverse) {
- *from = ref_sock;
- *to = (*from == conn->s[0]) ? conn->s[1] : conn->s[0];
- } else {
- *to = ref_sock;
- *from = (*to == conn->s[0]) ? conn->s[1] : conn->s[0];
- }
-
- *pipes = *from == conn->s[0] ? conn->pipe[0] : conn->pipe[1];
-}
-
/**
* tcp_splice_conn_from_sock() - Attempt to init state for a spliced connection
* @c: Execution context
@@ -521,8 +498,7 @@ void tcp_splice_sock_handler(struct ctx *c, struct tcp_splice_conn *conn,
int s, uint32_t events)
{
uint8_t lowat_set_flag, lowat_act_flag;
- int from, to, *pipes, eof, never_read;
- uint32_t *seq_read, *seq_write;
+ int fromside, eof, never_read;
if (conn->events == SPLICE_CLOSED)
return;
@@ -538,14 +514,15 @@ void tcp_splice_sock_handler(struct ctx *c, struct tcp_splice_conn *conn,
}
if (events & EPOLLOUT) {
- if (s == conn->s[0])
+ if (s == conn->s[0]) {
conn_event(c, conn, ~OUT_WAIT_0);
- else
+ fromside = 1;
+ } else {
conn_event(c, conn, ~OUT_WAIT_1);
-
- tcp_splice_dir(conn, s, 1, &from, &to, &pipes);
+ fromside = 0;
+ }
} else {
- tcp_splice_dir(conn, s, 0, &from, &to, &pipes);
+ fromside = s == conn->s[0] ? 0 : 1;
}
if (events & EPOLLRDHUP) {
@@ -566,24 +543,16 @@ swap:
eof = 0;
never_read = 1;
- if (from == conn->s[0]) {
- seq_read = &conn->read[0];
- seq_write = &conn->written[0];
- lowat_set_flag = RCVLOWAT_SET_0;
- lowat_act_flag = RCVLOWAT_ACT_0;
- } else {
- seq_read = &conn->read[1];
- seq_write = &conn->written[1];
- lowat_set_flag = RCVLOWAT_SET_1;
- lowat_act_flag = RCVLOWAT_ACT_1;
- }
+ lowat_set_flag = fromside == 0 ? RCVLOWAT_SET_0 : RCVLOWAT_SET_1;
+ lowat_act_flag = fromside == 0 ? RCVLOWAT_ACT_0 : RCVLOWAT_ACT_1;
while (1) {
ssize_t readlen, to_write = 0, written;
int more = 0;
retry:
- readlen = splice(from, NULL, pipes[1], NULL, c->tcp.pipe_size,
+ readlen = splice(conn->s[fromside], NULL,
+ conn->pipe[fromside][1], NULL, c->tcp.pipe_size,
SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
trace("TCP (spliced): %li from read-side call", readlen);
if (readlen < 0) {
@@ -608,7 +577,8 @@ retry:
}
eintr:
- written = splice(pipes[0], NULL, to, NULL, to_write,
+ written = splice(conn->pipe[fromside][0], NULL,
+ conn->s[!fromside], NULL, to_write,
SPLICE_F_MOVE | more | SPLICE_F_NONBLOCK);
trace("TCP (spliced): %li from write-side call (passed %lu)",
written, to_write);
@@ -622,8 +592,8 @@ eintr:
readlen > (long)c->tcp.pipe_size / 10) {
int lowat = c->tcp.pipe_size / 4;
- setsockopt(from, SOL_SOCKET, SO_RCVLOWAT,
- &lowat, sizeof(lowat));
+ setsockopt(conn->s[fromside], SOL_SOCKET,
+ SO_RCVLOWAT, &lowat, sizeof(lowat));
conn_flag(c, conn, lowat_set_flag);
conn_flag(c, conn, lowat_act_flag);
@@ -632,8 +602,8 @@ eintr:
break;
}
- *seq_read += readlen > 0 ? readlen : 0;
- *seq_write += written > 0 ? written : 0;
+ conn->read[fromside] += readlen > 0 ? readlen : 0;
+ conn->written[fromside] += written > 0 ? written : 0;
if (written < 0) {
if (errno == EINTR)
@@ -645,10 +615,8 @@ eintr:
if (never_read)
break;
- if (to == conn->s[0])
- conn_event(c, conn, OUT_WAIT_0);
- else
- conn_event(c, conn, OUT_WAIT_1);
+ conn_event(c, conn,
+ fromside == 0 ? OUT_WAIT_1 : OUT_WAIT_0);
break;
}
@@ -665,14 +633,14 @@ eintr:
}
if ((conn->events & FIN_RCVD_0) && !(conn->events & FIN_SENT_1)) {
- if (*seq_read == *seq_write && eof) {
+ if (conn->read[fromside] == conn->written[fromside] && eof) {
shutdown(conn->s[1], SHUT_WR);
conn_event(c, conn, FIN_SENT_1);
}
}
if ((conn->events & FIN_RCVD_1) && !(conn->events & FIN_SENT_0)) {
- if (*seq_read == *seq_write && eof) {
+ if (conn->read[fromside] == conn->written[fromside] && eof) {
shutdown(conn->s[0], SHUT_WR);
conn_event(c, conn, FIN_SENT_0);
}
@@ -684,12 +652,7 @@ eintr:
if ((events & (EPOLLIN | EPOLLOUT)) == (EPOLLIN | EPOLLOUT)) {
events = EPOLLIN;
- SWAP(from, to);
- if (pipes == conn->pipe[0])
- pipes = conn->pipe[1];
- else
- pipes = conn->pipe[0];
-
+ fromside = !fromside;
goto swap;
}
--
2.41.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v2 00/11] tcp_splice: Better exploit symmetry between sides of connection
2023-11-07 2:42 [PATCH v2 00/11] tcp_splice: Better exploit symmetry between sides of connection David Gibson
` (10 preceding siblings ...)
2023-11-07 2:42 ` [PATCH v2 11/11] tcp_splice: Simplify selection of socket and pipe sides in socket handler David Gibson
@ 2023-11-07 12:45 ` Stefano Brivio
11 siblings, 0 replies; 13+ messages in thread
From: Stefano Brivio @ 2023-11-07 12:45 UTC (permalink / raw)
To: David Gibson; +Cc: passt-dev
On Tue, 7 Nov 2023 13:42:39 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:
> For spliced connections, both "sides" are sockets, and for many
> purposes how we deal with each side is symmetric. Currently, however,
> we track the information for each side in independent fields in the
> structure, meaning we can't easily exploit that symmetry.
>
> This makes a number of reorganizations of the tcp splice code so that
> we can explot that symmetry to reduce code size. This will have some
> additional advantages when we come to integrate with the in-progress
> unified flow table.
>
> Based on top of the interface identifiers and automatic forwarding
> cleanup series.
>
> Changes since v1:
> * Small updates to comments and commit messages
>
> David Gibson (11):
> tcp_splice: Remove redundant tcp_splice_epoll_ctl()
> tcp_splice: Correct error handling in tcp_splice_epoll_ctl()
> tcp_splice: Don't handle EPOLL_CTL_DEL as part of
> tcp_splice_epoll_ctl()
> tcp_splice: Remove unnecessary forward declaration
> tcp_splice: Avoid awkward temporaries in tcp_splice_epoll_ctl()
> tcp_splice: Don't pool pipes in pairs
> tcp_splice: Rename sides of connection from a/b to 0/1
> tcp_splice: Exploit side symmetry in tcp_splice_timer()
> tcp_splice: Exploit side symmetry in tcp_splice_connect_finish()
> tcp_splice: Exploit side symmetry in tcp_splice_destroy()
> tcp_splice: Simplify selection of socket and pipe sides in socket
> handler
Applied.
--
Stefano
^ permalink raw reply [flat|nested] 13+ messages in thread