* [PATCH 01/11] tcp_splice: Remove redundant tcp_splice_epoll_ctl()
2023-10-12 1:51 [PATCH 00/11] tcp_splice: Better exploit symmetry between sides of connection David Gibson
@ 2023-10-12 1:51 ` David Gibson
2023-10-12 1:51 ` [PATCH 02/11] tcp_splice: Correct error handling in tcp_splice_epoll_ctl() David Gibson
` (9 subsequent siblings)
10 siblings, 0 replies; 23+ messages in thread
From: David Gibson @ 2023-10-12 1:51 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
tcp_splice_conn_update() calls tcp_splice_epoll_ctl() twice: first ignoring
the return value, then checking it. This serves no purpose. If the first
call succeeds, the second call will do exactly the same thing again, since
nothing has changed in conn. If the first call fails, then
tcp_splice_epoll_ctl() itself will EPOLL_CTL_DEL both fds, meaning when
the second call tries to EPOLL_CTL_MOD them it will necessarily fail.
It appears that this duplication was introduced by accident in an
otherwise unrelated patch.
Fixes: bb708111 ("treewide: Packet abstraction with mandatory boundary checks")
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
tcp_splice.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/tcp_splice.c b/tcp_splice.c
index 19f5406..fd6ce8d 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -255,7 +255,6 @@ static void conn_event_do(const struct ctx *c, struct tcp_splice_conn *conn,
*/
void tcp_splice_conn_update(const struct ctx *c, struct tcp_splice_conn *new)
{
- tcp_splice_epoll_ctl(c, new);
if (tcp_splice_epoll_ctl(c, new))
conn_flag(c, new, CLOSING);
}
--
@@ -255,7 +255,6 @@ static void conn_event_do(const struct ctx *c, struct tcp_splice_conn *conn,
*/
void tcp_splice_conn_update(const struct ctx *c, struct tcp_splice_conn *new)
{
- tcp_splice_epoll_ctl(c, new);
if (tcp_splice_epoll_ctl(c, new))
conn_flag(c, new, CLOSING);
}
--
2.41.0
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 02/11] tcp_splice: Correct error handling in tcp_splice_epoll_ctl()
2023-10-12 1:51 [PATCH 00/11] tcp_splice: Better exploit symmetry between sides of connection David Gibson
2023-10-12 1:51 ` [PATCH 01/11] tcp_splice: Remove redundant tcp_splice_epoll_ctl() David Gibson
@ 2023-10-12 1:51 ` David Gibson
2023-10-12 1:51 ` [PATCH 03/11] tcp_splice: Don't handle EPOLL_CTL_DEL as part of tcp_splice_epoll_ctl() David Gibson
` (8 subsequent siblings)
10 siblings, 0 replies; 23+ messages in thread
From: David Gibson @ 2023-10-12 1:51 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
If we get an error from epoll_ctl() in tcp_splice_epoll_ctl() we goto the
'delete' path where we remove both sockets from the epoll set and return
an error. There are several problems with this:
- We 'return -errno' after the EPOLL_CTL_DEL operations, which means the
deleting epoll_ctl() calls may have overwritten the errno values which
actually triggered the failures.
- The call from conn_flag_do() occurs when the CLOSING flag is set, in
which case we go do the delete path regardless of error. In that case
the 'return errno' is meaningless since we don't expect the EPOLL_CTL_DEL
operations to fail and we ignore the return code anyway.
- All other calls to tcp_splice_epoll_ctl() check the return code and if
non-zero immediately call conn_flag(..., CLOSING) which will call
tcp_splice_epoll_ctl() again explicitly to remove the sockets from epoll.
That means removing them when the error first occurs is redundant.
- We never specifically report an error on the epoll_ctl() operations. We
just set the connection to CLOSING, more or less silently killing it.
This could make debugging difficult in the unlikely even that we get a
failure here.
Re-organise tcp_splice_epoll_ctl() to just log a message then return in the
error case, and only EPOLL_CTL_DEL when explicitly asked to with the
CLOSING flag.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
tcp_splice.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)
diff --git a/tcp_splice.c b/tcp_splice.c
index fd6ce8d..22a854e 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -182,25 +182,27 @@ static int tcp_splice_epoll_ctl(const struct ctx *c,
struct epoll_event ev_b = { .data.u64 = ref_b.u64 };
uint32_t events_a, events_b;
- if (conn->flags & CLOSING)
- goto delete;
+ if (conn->flags & CLOSING) {
+ epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->a, &ev_a);
+ epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->b, &ev_b);
+ return 0;
+ }
tcp_splice_conn_epoll_events(conn->events, &events_a, &events_b);
ev_a.events = events_a;
ev_b.events = events_b;
if (epoll_ctl(c->epollfd, m, conn->a, &ev_a) ||
- epoll_ctl(c->epollfd, m, conn->b, &ev_b))
- goto delete;
+ epoll_ctl(c->epollfd, m, conn->b, &ev_b)) {
+ int ret = -errno;
+ err("TCP (spliced): index %li, ERROR on epoll_ctl(): %s",
+ CONN_IDX(conn), strerror(errno));
+ return ret;
+ }
conn->in_epoll = true;
return 0;
-
-delete:
- epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->a, &ev_a);
- epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->b, &ev_b);
- return -errno;
}
/**
--
@@ -182,25 +182,27 @@ static int tcp_splice_epoll_ctl(const struct ctx *c,
struct epoll_event ev_b = { .data.u64 = ref_b.u64 };
uint32_t events_a, events_b;
- if (conn->flags & CLOSING)
- goto delete;
+ if (conn->flags & CLOSING) {
+ epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->a, &ev_a);
+ epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->b, &ev_b);
+ return 0;
+ }
tcp_splice_conn_epoll_events(conn->events, &events_a, &events_b);
ev_a.events = events_a;
ev_b.events = events_b;
if (epoll_ctl(c->epollfd, m, conn->a, &ev_a) ||
- epoll_ctl(c->epollfd, m, conn->b, &ev_b))
- goto delete;
+ epoll_ctl(c->epollfd, m, conn->b, &ev_b)) {
+ int ret = -errno;
+ err("TCP (spliced): index %li, ERROR on epoll_ctl(): %s",
+ CONN_IDX(conn), strerror(errno));
+ return ret;
+ }
conn->in_epoll = true;
return 0;
-
-delete:
- epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->a, &ev_a);
- epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->b, &ev_b);
- return -errno;
}
/**
--
2.41.0
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 03/11] tcp_splice: Don't handle EPOLL_CTL_DEL as part of tcp_splice_epoll_ctl()
2023-10-12 1:51 [PATCH 00/11] tcp_splice: Better exploit symmetry between sides of connection David Gibson
2023-10-12 1:51 ` [PATCH 01/11] tcp_splice: Remove redundant tcp_splice_epoll_ctl() David Gibson
2023-10-12 1:51 ` [PATCH 02/11] tcp_splice: Correct error handling in tcp_splice_epoll_ctl() David Gibson
@ 2023-10-12 1:51 ` David Gibson
2023-11-03 16:20 ` Stefano Brivio
2023-10-12 1:51 ` [PATCH 04/11] tcp_splice: Remove unnecessary forward declaration David Gibson
` (7 subsequent siblings)
10 siblings, 1 reply; 23+ messages in thread
From: David Gibson @ 2023-10-12 1:51 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
tcp_splice_epoll_ctl() removes both sockets from the epoll set if called
when conn->flags & CLOSING. This will always happen immediately after
setting that flag, since conn_flag_do() makes the call itself. That's also
the _only_ time it can happen: we perform the EPOLL_CTL_DEL without
clearing the conn->in_epoll flag, meaning that any further calls to
tcp_splice_epoll_ctl() would attempt EPOLL_CTL_MOD, which would necessarily
fail since the fds are no longer in the epoll.
The EPOLL_CTL_DEL path in tcp_splice_epoll_ctl() has essentially zero
overlap with anything else the function does, so just move them to be
open coded in conn_flag_do().
Caveat: this does require kernel 2.6.9 or later, in order to pass NULL as
the event structure for epoll_ctl(). I _think_ we already require newer
kernels than that for other features. We could work around that if support
for such old kernels is important.
Given that, simply directly perform the EPOLL_CTL_DEL operations from
conn_flag_do() rather than unnecessarily multiplexini
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
tcp_splice.c | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)
diff --git a/tcp_splice.c b/tcp_splice.c
index 22a854e..e9ce268 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -152,8 +152,10 @@ static void conn_flag_do(const struct ctx *c, struct tcp_splice_conn *conn,
}
}
- if (flag == CLOSING)
- tcp_splice_epoll_ctl(c, conn);
+ if (flag == CLOSING) {
+ epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->a, NULL);
+ epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->b, NULL);
+ }
}
#define conn_flag(c, conn, flag) \
@@ -182,12 +184,6 @@ static int tcp_splice_epoll_ctl(const struct ctx *c,
struct epoll_event ev_b = { .data.u64 = ref_b.u64 };
uint32_t events_a, events_b;
- if (conn->flags & CLOSING) {
- epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->a, &ev_a);
- epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->b, &ev_b);
- return 0;
- }
-
tcp_splice_conn_epoll_events(conn->events, &events_a, &events_b);
ev_a.events = events_a;
ev_b.events = events_b;
--
@@ -152,8 +152,10 @@ static void conn_flag_do(const struct ctx *c, struct tcp_splice_conn *conn,
}
}
- if (flag == CLOSING)
- tcp_splice_epoll_ctl(c, conn);
+ if (flag == CLOSING) {
+ epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->a, NULL);
+ epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->b, NULL);
+ }
}
#define conn_flag(c, conn, flag) \
@@ -182,12 +184,6 @@ static int tcp_splice_epoll_ctl(const struct ctx *c,
struct epoll_event ev_b = { .data.u64 = ref_b.u64 };
uint32_t events_a, events_b;
- if (conn->flags & CLOSING) {
- epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->a, &ev_a);
- epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->b, &ev_b);
- return 0;
- }
-
tcp_splice_conn_epoll_events(conn->events, &events_a, &events_b);
ev_a.events = events_a;
ev_b.events = events_b;
--
2.41.0
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 03/11] tcp_splice: Don't handle EPOLL_CTL_DEL as part of tcp_splice_epoll_ctl()
2023-10-12 1:51 ` [PATCH 03/11] tcp_splice: Don't handle EPOLL_CTL_DEL as part of tcp_splice_epoll_ctl() David Gibson
@ 2023-11-03 16:20 ` Stefano Brivio
2023-11-04 5:56 ` David Gibson
0 siblings, 1 reply; 23+ messages in thread
From: Stefano Brivio @ 2023-11-03 16:20 UTC (permalink / raw)
To: David Gibson; +Cc: passt-dev
On Thu, 12 Oct 2023 12:51:06 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:
> tcp_splice_epoll_ctl() removes both sockets from the epoll set if called
> when conn->flags & CLOSING. This will always happen immediately after
> setting that flag, since conn_flag_do() makes the call itself. That's also
> the _only_ time it can happen: we perform the EPOLL_CTL_DEL without
> clearing the conn->in_epoll flag, meaning that any further calls to
> tcp_splice_epoll_ctl() would attempt EPOLL_CTL_MOD, which would necessarily
> fail since the fds are no longer in the epoll.
>
> The EPOLL_CTL_DEL path in tcp_splice_epoll_ctl() has essentially zero
> overlap with anything else the function does, so just move them to be
> open coded in conn_flag_do().
>
> Caveat: this does require kernel 2.6.9 or later, in order to pass NULL as
> the event structure for epoll_ctl(). I _think_ we already require newer
> kernels than that for other features.
Yes, we do, we need at least 3.13 for unprivileged user namespaces.
--
Stefano
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 03/11] tcp_splice: Don't handle EPOLL_CTL_DEL as part of tcp_splice_epoll_ctl()
2023-11-03 16:20 ` Stefano Brivio
@ 2023-11-04 5:56 ` David Gibson
0 siblings, 0 replies; 23+ messages in thread
From: David Gibson @ 2023-11-04 5:56 UTC (permalink / raw)
To: Stefano Brivio; +Cc: passt-dev
[-- Attachment #1: Type: text/plain, Size: 1384 bytes --]
On Fri, Nov 03, 2023 at 05:20:53PM +0100, Stefano Brivio wrote:
> On Thu, 12 Oct 2023 12:51:06 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
>
> > tcp_splice_epoll_ctl() removes both sockets from the epoll set if called
> > when conn->flags & CLOSING. This will always happen immediately after
> > setting that flag, since conn_flag_do() makes the call itself. That's also
> > the _only_ time it can happen: we perform the EPOLL_CTL_DEL without
> > clearing the conn->in_epoll flag, meaning that any further calls to
> > tcp_splice_epoll_ctl() would attempt EPOLL_CTL_MOD, which would necessarily
> > fail since the fds are no longer in the epoll.
> >
> > The EPOLL_CTL_DEL path in tcp_splice_epoll_ctl() has essentially zero
> > overlap with anything else the function does, so just move them to be
> > open coded in conn_flag_do().
> >
> > Caveat: this does require kernel 2.6.9 or later, in order to pass NULL as
> > the event structure for epoll_ctl(). I _think_ we already require newer
> > kernels than that for other features.
>
> Yes, we do, we need at least 3.13 for unprivileged user namespaces.
Thanks, message updated accordingly.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 04/11] tcp_splice: Remove unnecessary forward declaration
2023-10-12 1:51 [PATCH 00/11] tcp_splice: Better exploit symmetry between sides of connection David Gibson
` (2 preceding siblings ...)
2023-10-12 1:51 ` [PATCH 03/11] tcp_splice: Don't handle EPOLL_CTL_DEL as part of tcp_splice_epoll_ctl() David Gibson
@ 2023-10-12 1:51 ` David Gibson
2023-10-12 1:51 ` [PATCH 05/11] tcp_splice: Avoid awkward temporaries in tcp_splice_epoll_ctl() David Gibson
` (6 subsequent siblings)
10 siblings, 0 replies; 23+ messages in thread
From: David Gibson @ 2023-10-12 1:51 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
In tcp_splice.c we forward declare tcp_splice_epoll_ctl() then define it
later on. However, there are no circular dependencies which prevent us
from simply having the full definition in place of the forward declaration.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
tcp_splice.c | 71 +++++++++++++++++++++++++---------------------------
1 file changed, 34 insertions(+), 37 deletions(-)
diff --git a/tcp_splice.c b/tcp_splice.c
index e9ce268..439fc1d 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -116,8 +116,41 @@ static void tcp_splice_conn_epoll_events(uint16_t events,
*b |= (events & B_OUT_WAIT) ? EPOLLOUT : 0;
}
+/**
+ * tcp_splice_epoll_ctl() - Add/modify/delete epoll state from connection events
+ * @c: Execution context
+ * @conn: Connection pointer
+ *
+ * Return: 0 on success, negative error code on failure (not on deletion)
+ */
static int tcp_splice_epoll_ctl(const struct ctx *c,
- struct tcp_splice_conn *conn);
+ struct tcp_splice_conn *conn)
+{
+ int m = conn->in_epoll ? EPOLL_CTL_MOD : EPOLL_CTL_ADD;
+ union epoll_ref ref_a = { .type = EPOLL_TYPE_TCP, .fd = conn->a,
+ .tcp.index = CONN_IDX(conn) };
+ union epoll_ref ref_b = { .type = EPOLL_TYPE_TCP, .fd = conn->b,
+ .tcp.index = CONN_IDX(conn) };
+ struct epoll_event ev_a = { .data.u64 = ref_a.u64 };
+ struct epoll_event ev_b = { .data.u64 = ref_b.u64 };
+ uint32_t events_a, events_b;
+
+ tcp_splice_conn_epoll_events(conn->events, &events_a, &events_b);
+ ev_a.events = events_a;
+ ev_b.events = events_b;
+
+ if (epoll_ctl(c->epollfd, m, conn->a, &ev_a) ||
+ epoll_ctl(c->epollfd, m, conn->b, &ev_b)) {
+ int ret = -errno;
+ err("TCP (spliced): index %li, ERROR on epoll_ctl(): %s",
+ CONN_IDX(conn), strerror(errno));
+ return ret;
+ }
+
+ conn->in_epoll = true;
+
+ return 0;
+}
/**
* conn_flag_do() - Set/unset given flag, log, update epoll on CLOSING flag
@@ -165,42 +198,6 @@ static void conn_flag_do(const struct ctx *c, struct tcp_splice_conn *conn,
conn_flag_do(c, conn, flag); \
} while (0)
-/**
- * tcp_splice_epoll_ctl() - Add/modify/delete epoll state from connection events
- * @c: Execution context
- * @conn: Connection pointer
- *
- * Return: 0 on success, negative error code on failure (not on deletion)
- */
-static int tcp_splice_epoll_ctl(const struct ctx *c,
- struct tcp_splice_conn *conn)
-{
- int m = conn->in_epoll ? EPOLL_CTL_MOD : EPOLL_CTL_ADD;
- union epoll_ref ref_a = { .type = EPOLL_TYPE_TCP, .fd = conn->a,
- .tcp.index = CONN_IDX(conn) };
- union epoll_ref ref_b = { .type = EPOLL_TYPE_TCP, .fd = conn->b,
- .tcp.index = CONN_IDX(conn) };
- struct epoll_event ev_a = { .data.u64 = ref_a.u64 };
- struct epoll_event ev_b = { .data.u64 = ref_b.u64 };
- uint32_t events_a, events_b;
-
- tcp_splice_conn_epoll_events(conn->events, &events_a, &events_b);
- ev_a.events = events_a;
- ev_b.events = events_b;
-
- if (epoll_ctl(c->epollfd, m, conn->a, &ev_a) ||
- epoll_ctl(c->epollfd, m, conn->b, &ev_b)) {
- int ret = -errno;
- err("TCP (spliced): index %li, ERROR on epoll_ctl(): %s",
- CONN_IDX(conn), strerror(errno));
- return ret;
- }
-
- conn->in_epoll = true;
-
- return 0;
-}
-
/**
* conn_event_do() - Set and log connection events, update epoll state
* @c: Execution context
--
@@ -116,8 +116,41 @@ static void tcp_splice_conn_epoll_events(uint16_t events,
*b |= (events & B_OUT_WAIT) ? EPOLLOUT : 0;
}
+/**
+ * tcp_splice_epoll_ctl() - Add/modify/delete epoll state from connection events
+ * @c: Execution context
+ * @conn: Connection pointer
+ *
+ * Return: 0 on success, negative error code on failure (not on deletion)
+ */
static int tcp_splice_epoll_ctl(const struct ctx *c,
- struct tcp_splice_conn *conn);
+ struct tcp_splice_conn *conn)
+{
+ int m = conn->in_epoll ? EPOLL_CTL_MOD : EPOLL_CTL_ADD;
+ union epoll_ref ref_a = { .type = EPOLL_TYPE_TCP, .fd = conn->a,
+ .tcp.index = CONN_IDX(conn) };
+ union epoll_ref ref_b = { .type = EPOLL_TYPE_TCP, .fd = conn->b,
+ .tcp.index = CONN_IDX(conn) };
+ struct epoll_event ev_a = { .data.u64 = ref_a.u64 };
+ struct epoll_event ev_b = { .data.u64 = ref_b.u64 };
+ uint32_t events_a, events_b;
+
+ tcp_splice_conn_epoll_events(conn->events, &events_a, &events_b);
+ ev_a.events = events_a;
+ ev_b.events = events_b;
+
+ if (epoll_ctl(c->epollfd, m, conn->a, &ev_a) ||
+ epoll_ctl(c->epollfd, m, conn->b, &ev_b)) {
+ int ret = -errno;
+ err("TCP (spliced): index %li, ERROR on epoll_ctl(): %s",
+ CONN_IDX(conn), strerror(errno));
+ return ret;
+ }
+
+ conn->in_epoll = true;
+
+ return 0;
+}
/**
* conn_flag_do() - Set/unset given flag, log, update epoll on CLOSING flag
@@ -165,42 +198,6 @@ static void conn_flag_do(const struct ctx *c, struct tcp_splice_conn *conn,
conn_flag_do(c, conn, flag); \
} while (0)
-/**
- * tcp_splice_epoll_ctl() - Add/modify/delete epoll state from connection events
- * @c: Execution context
- * @conn: Connection pointer
- *
- * Return: 0 on success, negative error code on failure (not on deletion)
- */
-static int tcp_splice_epoll_ctl(const struct ctx *c,
- struct tcp_splice_conn *conn)
-{
- int m = conn->in_epoll ? EPOLL_CTL_MOD : EPOLL_CTL_ADD;
- union epoll_ref ref_a = { .type = EPOLL_TYPE_TCP, .fd = conn->a,
- .tcp.index = CONN_IDX(conn) };
- union epoll_ref ref_b = { .type = EPOLL_TYPE_TCP, .fd = conn->b,
- .tcp.index = CONN_IDX(conn) };
- struct epoll_event ev_a = { .data.u64 = ref_a.u64 };
- struct epoll_event ev_b = { .data.u64 = ref_b.u64 };
- uint32_t events_a, events_b;
-
- tcp_splice_conn_epoll_events(conn->events, &events_a, &events_b);
- ev_a.events = events_a;
- ev_b.events = events_b;
-
- if (epoll_ctl(c->epollfd, m, conn->a, &ev_a) ||
- epoll_ctl(c->epollfd, m, conn->b, &ev_b)) {
- int ret = -errno;
- err("TCP (spliced): index %li, ERROR on epoll_ctl(): %s",
- CONN_IDX(conn), strerror(errno));
- return ret;
- }
-
- conn->in_epoll = true;
-
- return 0;
-}
-
/**
* conn_event_do() - Set and log connection events, update epoll state
* @c: Execution context
--
2.41.0
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 05/11] tcp_splice: Avoid awkward temporaries in tcp_splice_epoll_ctl()
2023-10-12 1:51 [PATCH 00/11] tcp_splice: Better exploit symmetry between sides of connection David Gibson
` (3 preceding siblings ...)
2023-10-12 1:51 ` [PATCH 04/11] tcp_splice: Remove unnecessary forward declaration David Gibson
@ 2023-10-12 1:51 ` David Gibson
2023-11-03 16:21 ` Stefano Brivio
2023-10-12 1:51 ` [PATCH 06/11] tcp_splice: Don't pool pipes in pairs David Gibson
` (5 subsequent siblings)
10 siblings, 1 reply; 23+ messages in thread
From: David Gibson @ 2023-10-12 1:51 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
We initialise the events_a and events_b variables with
tcp_splice_conn_epoll_events() function, then immediately copy the values
into ev_a.events and ev_b.events. We can't simply pass &ev_[ab].events to
tcp_splice_conn_epoll_events(), because struct epoll_event is packed,
leading to 'pointer may be unaligned' warnings if we attempt that.
We can, however, make tcp_splice_conn_epoll_events() take struct
epoll_event pointers rather than raw u32 pointers, avoiding the awkward
temporaries.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
tcp_splice.c | 24 +++++++++++-------------
1 file changed, 11 insertions(+), 13 deletions(-)
diff --git a/tcp_splice.c b/tcp_splice.c
index 439fc1d..3419207 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -95,25 +95,26 @@ static int tcp_sock_refill_ns(void *arg);
/**
* tcp_splice_conn_epoll_events() - epoll events masks for given state
* @events: Connection event flags
- * @a: Event mask for socket with accepted connection, set on return
- * @b: Event mask for connection target socket, set on return
+ * @a: Event for socket with accepted connection, set on return
+ * @b: Event for connection target socket, set on return
*/
static void tcp_splice_conn_epoll_events(uint16_t events,
- uint32_t *a, uint32_t *b)
+ struct epoll_event *a,
+ struct epoll_event *b)
{
- *a = *b = 0;
+ a->events = b->events = 0;
if (events & SPLICE_ESTABLISHED) {
if (!(events & B_FIN_SENT))
- *a = EPOLLIN | EPOLLRDHUP;
+ a->events = EPOLLIN | EPOLLRDHUP;
if (!(events & A_FIN_SENT))
- *b = EPOLLIN | EPOLLRDHUP;
+ b->events = EPOLLIN | EPOLLRDHUP;
} else if (events & SPLICE_CONNECT) {
- *b = EPOLLOUT;
+ b->events = EPOLLOUT;
}
- *a |= (events & A_OUT_WAIT) ? EPOLLOUT : 0;
- *b |= (events & B_OUT_WAIT) ? EPOLLOUT : 0;
+ a->events |= (events & A_OUT_WAIT) ? EPOLLOUT : 0;
+ a->events |= (events & B_OUT_WAIT) ? EPOLLOUT : 0;
}
/**
@@ -133,11 +134,8 @@ static int tcp_splice_epoll_ctl(const struct ctx *c,
.tcp.index = CONN_IDX(conn) };
struct epoll_event ev_a = { .data.u64 = ref_a.u64 };
struct epoll_event ev_b = { .data.u64 = ref_b.u64 };
- uint32_t events_a, events_b;
- tcp_splice_conn_epoll_events(conn->events, &events_a, &events_b);
- ev_a.events = events_a;
- ev_b.events = events_b;
+ tcp_splice_conn_epoll_events(conn->events, &ev_a, &ev_b);
if (epoll_ctl(c->epollfd, m, conn->a, &ev_a) ||
epoll_ctl(c->epollfd, m, conn->b, &ev_b)) {
--
@@ -95,25 +95,26 @@ static int tcp_sock_refill_ns(void *arg);
/**
* tcp_splice_conn_epoll_events() - epoll events masks for given state
* @events: Connection event flags
- * @a: Event mask for socket with accepted connection, set on return
- * @b: Event mask for connection target socket, set on return
+ * @a: Event for socket with accepted connection, set on return
+ * @b: Event for connection target socket, set on return
*/
static void tcp_splice_conn_epoll_events(uint16_t events,
- uint32_t *a, uint32_t *b)
+ struct epoll_event *a,
+ struct epoll_event *b)
{
- *a = *b = 0;
+ a->events = b->events = 0;
if (events & SPLICE_ESTABLISHED) {
if (!(events & B_FIN_SENT))
- *a = EPOLLIN | EPOLLRDHUP;
+ a->events = EPOLLIN | EPOLLRDHUP;
if (!(events & A_FIN_SENT))
- *b = EPOLLIN | EPOLLRDHUP;
+ b->events = EPOLLIN | EPOLLRDHUP;
} else if (events & SPLICE_CONNECT) {
- *b = EPOLLOUT;
+ b->events = EPOLLOUT;
}
- *a |= (events & A_OUT_WAIT) ? EPOLLOUT : 0;
- *b |= (events & B_OUT_WAIT) ? EPOLLOUT : 0;
+ a->events |= (events & A_OUT_WAIT) ? EPOLLOUT : 0;
+ a->events |= (events & B_OUT_WAIT) ? EPOLLOUT : 0;
}
/**
@@ -133,11 +134,8 @@ static int tcp_splice_epoll_ctl(const struct ctx *c,
.tcp.index = CONN_IDX(conn) };
struct epoll_event ev_a = { .data.u64 = ref_a.u64 };
struct epoll_event ev_b = { .data.u64 = ref_b.u64 };
- uint32_t events_a, events_b;
- tcp_splice_conn_epoll_events(conn->events, &events_a, &events_b);
- ev_a.events = events_a;
- ev_b.events = events_b;
+ tcp_splice_conn_epoll_events(conn->events, &ev_a, &ev_b);
if (epoll_ctl(c->epollfd, m, conn->a, &ev_a) ||
epoll_ctl(c->epollfd, m, conn->b, &ev_b)) {
--
2.41.0
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 05/11] tcp_splice: Avoid awkward temporaries in tcp_splice_epoll_ctl()
2023-10-12 1:51 ` [PATCH 05/11] tcp_splice: Avoid awkward temporaries in tcp_splice_epoll_ctl() David Gibson
@ 2023-11-03 16:21 ` Stefano Brivio
2023-11-04 5:58 ` David Gibson
0 siblings, 1 reply; 23+ messages in thread
From: Stefano Brivio @ 2023-11-03 16:21 UTC (permalink / raw)
To: David Gibson; +Cc: passt-dev
On Thu, 12 Oct 2023 12:51:08 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:
> We initialise the events_a and events_b variables with
> tcp_splice_conn_epoll_events() function, then immediately copy the values
> into ev_a.events and ev_b.events. We can't simply pass &ev_[ab].events to
> tcp_splice_conn_epoll_events(), because struct epoll_event is packed,
> leading to 'pointer may be unaligned' warnings if we attempt that.
>
> We can, however, make tcp_splice_conn_epoll_events() take struct
> epoll_event pointers rather than raw u32 pointers, avoiding the awkward
> temporaries.
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> tcp_splice.c | 24 +++++++++++-------------
> 1 file changed, 11 insertions(+), 13 deletions(-)
>
> diff --git a/tcp_splice.c b/tcp_splice.c
> index 439fc1d..3419207 100644
> --- a/tcp_splice.c
> +++ b/tcp_splice.c
> @@ -95,25 +95,26 @@ static int tcp_sock_refill_ns(void *arg);
> /**
> * tcp_splice_conn_epoll_events() - epoll events masks for given state
> * @events: Connection event flags
> - * @a: Event mask for socket with accepted connection, set on return
> - * @b: Event mask for connection target socket, set on return
> + * @a: Event for socket with accepted connection, set on return
> + * @b: Event for connection target socket, set on return
> */
> static void tcp_splice_conn_epoll_events(uint16_t events,
> - uint32_t *a, uint32_t *b)
> + struct epoll_event *a,
> + struct epoll_event *b)
> {
> - *a = *b = 0;
> + a->events = b->events = 0;
>
> if (events & SPLICE_ESTABLISHED) {
> if (!(events & B_FIN_SENT))
> - *a = EPOLLIN | EPOLLRDHUP;
> + a->events = EPOLLIN | EPOLLRDHUP;
> if (!(events & A_FIN_SENT))
> - *b = EPOLLIN | EPOLLRDHUP;
> + b->events = EPOLLIN | EPOLLRDHUP;
> } else if (events & SPLICE_CONNECT) {
> - *b = EPOLLOUT;
> + b->events = EPOLLOUT;
> }
>
> - *a |= (events & A_OUT_WAIT) ? EPOLLOUT : 0;
> - *b |= (events & B_OUT_WAIT) ? EPOLLOUT : 0;
> + a->events |= (events & A_OUT_WAIT) ? EPOLLOUT : 0;
> + a->events |= (events & B_OUT_WAIT) ? EPOLLOUT : 0;
This should be b->events |= ..., but it's fixed in 7/11 anyway.
--
Stefano
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 05/11] tcp_splice: Avoid awkward temporaries in tcp_splice_epoll_ctl()
2023-11-03 16:21 ` Stefano Brivio
@ 2023-11-04 5:58 ` David Gibson
0 siblings, 0 replies; 23+ messages in thread
From: David Gibson @ 2023-11-04 5:58 UTC (permalink / raw)
To: Stefano Brivio; +Cc: passt-dev
[-- Attachment #1: Type: text/plain, Size: 2642 bytes --]
On Fri, Nov 03, 2023 at 05:21:34PM +0100, Stefano Brivio wrote:
> On Thu, 12 Oct 2023 12:51:08 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
>
> > We initialise the events_a and events_b variables with
> > tcp_splice_conn_epoll_events() function, then immediately copy the values
> > into ev_a.events and ev_b.events. We can't simply pass &ev_[ab].events to
> > tcp_splice_conn_epoll_events(), because struct epoll_event is packed,
> > leading to 'pointer may be unaligned' warnings if we attempt that.
> >
> > We can, however, make tcp_splice_conn_epoll_events() take struct
> > epoll_event pointers rather than raw u32 pointers, avoiding the awkward
> > temporaries.
> >
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> > tcp_splice.c | 24 +++++++++++-------------
> > 1 file changed, 11 insertions(+), 13 deletions(-)
> >
> > diff --git a/tcp_splice.c b/tcp_splice.c
> > index 439fc1d..3419207 100644
> > --- a/tcp_splice.c
> > +++ b/tcp_splice.c
> > @@ -95,25 +95,26 @@ static int tcp_sock_refill_ns(void *arg);
> > /**
> > * tcp_splice_conn_epoll_events() - epoll events masks for given state
> > * @events: Connection event flags
> > - * @a: Event mask for socket with accepted connection, set on return
> > - * @b: Event mask for connection target socket, set on return
> > + * @a: Event for socket with accepted connection, set on return
> > + * @b: Event for connection target socket, set on return
> > */
> > static void tcp_splice_conn_epoll_events(uint16_t events,
> > - uint32_t *a, uint32_t *b)
> > + struct epoll_event *a,
> > + struct epoll_event *b)
> > {
> > - *a = *b = 0;
> > + a->events = b->events = 0;
> >
> > if (events & SPLICE_ESTABLISHED) {
> > if (!(events & B_FIN_SENT))
> > - *a = EPOLLIN | EPOLLRDHUP;
> > + a->events = EPOLLIN | EPOLLRDHUP;
> > if (!(events & A_FIN_SENT))
> > - *b = EPOLLIN | EPOLLRDHUP;
> > + b->events = EPOLLIN | EPOLLRDHUP;
> > } else if (events & SPLICE_CONNECT) {
> > - *b = EPOLLOUT;
> > + b->events = EPOLLOUT;
> > }
> >
> > - *a |= (events & A_OUT_WAIT) ? EPOLLOUT : 0;
> > - *b |= (events & B_OUT_WAIT) ? EPOLLOUT : 0;
> > + a->events |= (events & A_OUT_WAIT) ? EPOLLOUT : 0;
> > + a->events |= (events & B_OUT_WAIT) ? EPOLLOUT : 0;
>
> This should be b->events |= ..., but it's fixed in 7/11 anyway.
Oops. Fixed, nonetheless.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 06/11] tcp_splice: Don't pool pipes in pairs
2023-10-12 1:51 [PATCH 00/11] tcp_splice: Better exploit symmetry between sides of connection David Gibson
` (4 preceding siblings ...)
2023-10-12 1:51 ` [PATCH 05/11] tcp_splice: Avoid awkward temporaries in tcp_splice_epoll_ctl() David Gibson
@ 2023-10-12 1:51 ` David Gibson
2023-10-12 1:51 ` [PATCH 07/11] tcp_splice: Rename sides of connection from a/b to 0/1 David Gibson
` (4 subsequent siblings)
10 siblings, 0 replies; 23+ messages in thread
From: David Gibson @ 2023-10-12 1:51 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
To reduce latencies, the tcp splice code maintains a pool of pre-opened
pipes to use for new connections. This is structured as an array of pairs
of pipes, with each pipe, of course, being a pair of fds. Thus when we
use the pool, a single pool "slot" provides both the a->b and b->a pipes.
There's no strong reason to store the pool in pairs, though - we can
with not much difficulty instead take the a->b and b->a pipes for a new
connection independently from separate slots in the pool, or even take one
from the the pool and create the other as we need it, if there's only one
pipe left in the pool.
This marginally increases the length of code, but simplifies the structure
of the pipe pool. We should be able to re-shrink the code with later
changes, too.
In the process we also fix some minor bugs:
- If we both failed to find a pipe in the pool and to create a new one, we
didn't log an error and would silently drop the connection. That could
make debugging such a situation difficult. Add in an error message for
that case
- When refilling the pool, if we were only able to open a single pipe in
the pair, we attempted to rollback, but instead of closing the opened
pipe, we instead closed the pipe we failed to open (probably leading to
some ignored EBADFD errors).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
tcp_splice.c | 60 +++++++++++++++++++++++++++-------------------------
1 file changed, 31 insertions(+), 29 deletions(-)
diff --git a/tcp_splice.c b/tcp_splice.c
index 3419207..b783326 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -58,7 +58,7 @@
#include "tcp_conn.h"
#define MAX_PIPE_SIZE (8UL * 1024 * 1024)
-#define TCP_SPLICE_PIPE_POOL_SIZE 16
+#define TCP_SPLICE_PIPE_POOL_SIZE 32
#define TCP_SPLICE_CONN_PRESSURE 30 /* % of conn_count */
#define TCP_SPLICE_FILE_PRESSURE 30 /* % of c->nofile */
@@ -69,7 +69,7 @@ static int ns_sock_pool4 [TCP_SOCK_POOL_SIZE];
static int ns_sock_pool6 [TCP_SOCK_POOL_SIZE];
/* Pool of pre-opened pipes */
-static int splice_pipe_pool [TCP_SPLICE_PIPE_POOL_SIZE][2][2];
+static int splice_pipe_pool [TCP_SPLICE_PIPE_POOL_SIZE][2];
#define CONN_V6(x) (x->flags & SPLICE_V6)
#define CONN_V4(x) (!CONN_V6(x))
@@ -307,19 +307,16 @@ static int tcp_splice_connect_finish(const struct ctx *c,
conn->pipe_a_b[1] = conn->pipe_b_a[1] = -1;
for (i = 0; i < TCP_SPLICE_PIPE_POOL_SIZE; i++) {
- if (splice_pipe_pool[i][0][0] >= 0) {
- SWAP(conn->pipe_a_b[0], splice_pipe_pool[i][0][0]);
- SWAP(conn->pipe_a_b[1], splice_pipe_pool[i][0][1]);
-
- SWAP(conn->pipe_b_a[0], splice_pipe_pool[i][1][0]);
- SWAP(conn->pipe_b_a[1], splice_pipe_pool[i][1][1]);
+ if (splice_pipe_pool[i][0] >= 0) {
+ SWAP(conn->pipe_a_b[0], splice_pipe_pool[i][0]);
+ SWAP(conn->pipe_a_b[1], splice_pipe_pool[i][1]);
break;
}
}
-
if (conn->pipe_a_b[0] < 0) {
- if (pipe2(conn->pipe_a_b, O_NONBLOCK | O_CLOEXEC) ||
- pipe2(conn->pipe_b_a, O_NONBLOCK | O_CLOEXEC)) {
+ if (pipe2(conn->pipe_a_b, O_NONBLOCK | O_CLOEXEC)) {
+ err("TCP (spliced): cannot create a->b pipe: %s",
+ strerror(errno));
conn_flag(c, conn, CLOSING);
return -EIO;
}
@@ -328,6 +325,22 @@ static int tcp_splice_connect_finish(const struct ctx *c,
trace("TCP (spliced): cannot set a->b pipe size to %lu",
c->tcp.pipe_size);
}
+ }
+
+ for (; i < TCP_SPLICE_PIPE_POOL_SIZE; i++) {
+ if (splice_pipe_pool[i][0] >= 0) {
+ SWAP(conn->pipe_b_a[0], splice_pipe_pool[i][0]);
+ SWAP(conn->pipe_b_a[1], splice_pipe_pool[i][1]);
+ break;
+ }
+ }
+ if (conn->pipe_b_a[0] < 0) {
+ if (pipe2(conn->pipe_b_a, O_NONBLOCK | O_CLOEXEC)) {
+ err("TCP (spliced): cannot create b->a pipe: %s",
+ strerror(errno));
+ conn_flag(c, conn, CLOSING);
+ return -EIO;
+ }
if (fcntl(conn->pipe_b_a[0], F_SETPIPE_SZ, c->tcp.pipe_size)) {
trace("TCP (spliced): cannot set b->a pipe size to %lu",
@@ -716,12 +729,12 @@ close:
*/
static void tcp_set_pipe_size(struct ctx *c)
{
- int probe_pipe[TCP_SPLICE_PIPE_POOL_SIZE * 2][2], i, j;
+ int probe_pipe[TCP_SPLICE_PIPE_POOL_SIZE][2], i, j;
c->tcp.pipe_size = MAX_PIPE_SIZE;
smaller:
- for (i = 0; i < TCP_SPLICE_PIPE_POOL_SIZE * 2; i++) {
+ for (i = 0; i < TCP_SPLICE_PIPE_POOL_SIZE; i++) {
if (pipe2(probe_pipe[i], O_CLOEXEC)) {
i++;
break;
@@ -736,7 +749,7 @@ smaller:
close(probe_pipe[j][1]);
}
- if (i == TCP_SPLICE_PIPE_POOL_SIZE * 2)
+ if (i == TCP_SPLICE_PIPE_POOL_SIZE)
return;
if (!(c->tcp.pipe_size /= 2)) {
@@ -756,25 +769,14 @@ static void tcp_splice_pipe_refill(const struct ctx *c)
int i;
for (i = 0; i < TCP_SPLICE_PIPE_POOL_SIZE; i++) {
- if (splice_pipe_pool[i][0][0] >= 0)
+ if (splice_pipe_pool[i][0] >= 0)
break;
- if (pipe2(splice_pipe_pool[i][0], O_NONBLOCK | O_CLOEXEC))
- continue;
- if (pipe2(splice_pipe_pool[i][1], O_NONBLOCK | O_CLOEXEC)) {
- close(splice_pipe_pool[i][1][0]);
- close(splice_pipe_pool[i][1][1]);
+ if (pipe2(splice_pipe_pool[i], O_NONBLOCK | O_CLOEXEC))
continue;
- }
- if (fcntl(splice_pipe_pool[i][0][0], F_SETPIPE_SZ,
+ if (fcntl(splice_pipe_pool[i][0], F_SETPIPE_SZ,
c->tcp.pipe_size)) {
- trace("TCP (spliced): cannot set a->b pipe size to %lu",
- c->tcp.pipe_size);
- }
-
- if (fcntl(splice_pipe_pool[i][1][0], F_SETPIPE_SZ,
- c->tcp.pipe_size)) {
- trace("TCP (spliced): cannot set b->a pipe size to %lu",
+ trace("TCP (spliced): cannot set pool pipe size to %lu",
c->tcp.pipe_size);
}
}
--
@@ -58,7 +58,7 @@
#include "tcp_conn.h"
#define MAX_PIPE_SIZE (8UL * 1024 * 1024)
-#define TCP_SPLICE_PIPE_POOL_SIZE 16
+#define TCP_SPLICE_PIPE_POOL_SIZE 32
#define TCP_SPLICE_CONN_PRESSURE 30 /* % of conn_count */
#define TCP_SPLICE_FILE_PRESSURE 30 /* % of c->nofile */
@@ -69,7 +69,7 @@ static int ns_sock_pool4 [TCP_SOCK_POOL_SIZE];
static int ns_sock_pool6 [TCP_SOCK_POOL_SIZE];
/* Pool of pre-opened pipes */
-static int splice_pipe_pool [TCP_SPLICE_PIPE_POOL_SIZE][2][2];
+static int splice_pipe_pool [TCP_SPLICE_PIPE_POOL_SIZE][2];
#define CONN_V6(x) (x->flags & SPLICE_V6)
#define CONN_V4(x) (!CONN_V6(x))
@@ -307,19 +307,16 @@ static int tcp_splice_connect_finish(const struct ctx *c,
conn->pipe_a_b[1] = conn->pipe_b_a[1] = -1;
for (i = 0; i < TCP_SPLICE_PIPE_POOL_SIZE; i++) {
- if (splice_pipe_pool[i][0][0] >= 0) {
- SWAP(conn->pipe_a_b[0], splice_pipe_pool[i][0][0]);
- SWAP(conn->pipe_a_b[1], splice_pipe_pool[i][0][1]);
-
- SWAP(conn->pipe_b_a[0], splice_pipe_pool[i][1][0]);
- SWAP(conn->pipe_b_a[1], splice_pipe_pool[i][1][1]);
+ if (splice_pipe_pool[i][0] >= 0) {
+ SWAP(conn->pipe_a_b[0], splice_pipe_pool[i][0]);
+ SWAP(conn->pipe_a_b[1], splice_pipe_pool[i][1]);
break;
}
}
-
if (conn->pipe_a_b[0] < 0) {
- if (pipe2(conn->pipe_a_b, O_NONBLOCK | O_CLOEXEC) ||
- pipe2(conn->pipe_b_a, O_NONBLOCK | O_CLOEXEC)) {
+ if (pipe2(conn->pipe_a_b, O_NONBLOCK | O_CLOEXEC)) {
+ err("TCP (spliced): cannot create a->b pipe: %s",
+ strerror(errno));
conn_flag(c, conn, CLOSING);
return -EIO;
}
@@ -328,6 +325,22 @@ static int tcp_splice_connect_finish(const struct ctx *c,
trace("TCP (spliced): cannot set a->b pipe size to %lu",
c->tcp.pipe_size);
}
+ }
+
+ for (; i < TCP_SPLICE_PIPE_POOL_SIZE; i++) {
+ if (splice_pipe_pool[i][0] >= 0) {
+ SWAP(conn->pipe_b_a[0], splice_pipe_pool[i][0]);
+ SWAP(conn->pipe_b_a[1], splice_pipe_pool[i][1]);
+ break;
+ }
+ }
+ if (conn->pipe_b_a[0] < 0) {
+ if (pipe2(conn->pipe_b_a, O_NONBLOCK | O_CLOEXEC)) {
+ err("TCP (spliced): cannot create b->a pipe: %s",
+ strerror(errno));
+ conn_flag(c, conn, CLOSING);
+ return -EIO;
+ }
if (fcntl(conn->pipe_b_a[0], F_SETPIPE_SZ, c->tcp.pipe_size)) {
trace("TCP (spliced): cannot set b->a pipe size to %lu",
@@ -716,12 +729,12 @@ close:
*/
static void tcp_set_pipe_size(struct ctx *c)
{
- int probe_pipe[TCP_SPLICE_PIPE_POOL_SIZE * 2][2], i, j;
+ int probe_pipe[TCP_SPLICE_PIPE_POOL_SIZE][2], i, j;
c->tcp.pipe_size = MAX_PIPE_SIZE;
smaller:
- for (i = 0; i < TCP_SPLICE_PIPE_POOL_SIZE * 2; i++) {
+ for (i = 0; i < TCP_SPLICE_PIPE_POOL_SIZE; i++) {
if (pipe2(probe_pipe[i], O_CLOEXEC)) {
i++;
break;
@@ -736,7 +749,7 @@ smaller:
close(probe_pipe[j][1]);
}
- if (i == TCP_SPLICE_PIPE_POOL_SIZE * 2)
+ if (i == TCP_SPLICE_PIPE_POOL_SIZE)
return;
if (!(c->tcp.pipe_size /= 2)) {
@@ -756,25 +769,14 @@ static void tcp_splice_pipe_refill(const struct ctx *c)
int i;
for (i = 0; i < TCP_SPLICE_PIPE_POOL_SIZE; i++) {
- if (splice_pipe_pool[i][0][0] >= 0)
+ if (splice_pipe_pool[i][0] >= 0)
break;
- if (pipe2(splice_pipe_pool[i][0], O_NONBLOCK | O_CLOEXEC))
- continue;
- if (pipe2(splice_pipe_pool[i][1], O_NONBLOCK | O_CLOEXEC)) {
- close(splice_pipe_pool[i][1][0]);
- close(splice_pipe_pool[i][1][1]);
+ if (pipe2(splice_pipe_pool[i], O_NONBLOCK | O_CLOEXEC))
continue;
- }
- if (fcntl(splice_pipe_pool[i][0][0], F_SETPIPE_SZ,
+ if (fcntl(splice_pipe_pool[i][0], F_SETPIPE_SZ,
c->tcp.pipe_size)) {
- trace("TCP (spliced): cannot set a->b pipe size to %lu",
- c->tcp.pipe_size);
- }
-
- if (fcntl(splice_pipe_pool[i][1][0], F_SETPIPE_SZ,
- c->tcp.pipe_size)) {
- trace("TCP (spliced): cannot set b->a pipe size to %lu",
+ trace("TCP (spliced): cannot set pool pipe size to %lu",
c->tcp.pipe_size);
}
}
--
2.41.0
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 07/11] tcp_splice: Rename sides of connection from a/b to 0/1
2023-10-12 1:51 [PATCH 00/11] tcp_splice: Better exploit symmetry between sides of connection David Gibson
` (5 preceding siblings ...)
2023-10-12 1:51 ` [PATCH 06/11] tcp_splice: Don't pool pipes in pairs David Gibson
@ 2023-10-12 1:51 ` David Gibson
2023-10-12 1:51 ` [PATCH 08/11] tcp_splice: Exploit side symmetry in tcp_splice_timer() David Gibson
` (3 subsequent siblings)
10 siblings, 0 replies; 23+ messages in thread
From: David Gibson @ 2023-10-12 1:51 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
Each spliced connection has two mostly, although not entirely, symmetric
sides. We currently call those "a" and "b" and have different fields in
the connection structure for each one.
We can better exploit that symmetry if we use two element arrays rather
thatn separately named fields. Do that in the places we can, and for the
others change the "a"/"b" terminology to 0/1 to match.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
tcp_conn.h | 45 +++++------
tcp_splice.c | 224 +++++++++++++++++++++++++--------------------------
2 files changed, 130 insertions(+), 139 deletions(-)
diff --git a/tcp_conn.h b/tcp_conn.h
index 0751e00..01d31d4 100644
--- a/tcp_conn.h
+++ b/tcp_conn.h
@@ -119,54 +119,47 @@ struct tcp_tap_conn {
uint32_t seq_init_from_tap;
};
+#define SIDES 2
/**
* struct tcp_splice_conn - Descriptor for a spliced TCP connection
* @c: Fields common with tcp_tap_conn
* @in_epoll: Is the connection in the epoll set?
- * @a: File descriptor number of socket for accepted connection
- * @pipe_a_b: Pipe ends for splice() from @a to @b
- * @b: File descriptor number of peer connected socket
- * @pipe_b_a: Pipe ends for splice() from @b to @a
+ * @s: File descriptor for sockets
+ * @pipe: File descriptors for pipes
* @events: Events observed/actions performed on connection
* @flags: Connection flags (attributes, not events)
- * @a_read: Bytes read from @a (not fully written to @b in one shot)
- * @a_written: Bytes written to @a (not fully written from one @b read)
- * @b_read: Bytes read from @b (not fully written to @a in one shot)
- * @b_written: Bytes written to @b (not fully written from one @a read)
+ * @read: Bytes read (not fully written to other side in one shot)
+ * @written: Bytes written (not fully written from one other side read)
*/
struct tcp_splice_conn {
/* Must be first element to match tcp_tap_conn */
struct tcp_conn_common c;
bool in_epoll :1;
- int a;
- int pipe_a_b[2];
- int b;
- int pipe_b_a[2];
+ int s[SIDES];
+ int pipe[SIDES][2];
uint8_t events;
#define SPLICE_CLOSED 0
#define SPLICE_CONNECT BIT(0)
#define SPLICE_ESTABLISHED BIT(1)
-#define A_OUT_WAIT BIT(2)
-#define B_OUT_WAIT BIT(3)
-#define A_FIN_RCVD BIT(4)
-#define B_FIN_RCVD BIT(5)
-#define A_FIN_SENT BIT(6)
-#define B_FIN_SENT BIT(7)
+#define OUT_WAIT_0 BIT(2)
+#define OUT_WAIT_1 BIT(3)
+#define FIN_RCVD_0 BIT(4)
+#define FIN_RCVD_1 BIT(5)
+#define FIN_SENT_0 BIT(6)
+#define FIN_SENT_1 BIT(7)
uint8_t flags;
#define SPLICE_V6 BIT(0)
-#define RCVLOWAT_SET_A BIT(1)
-#define RCVLOWAT_SET_B BIT(2)
-#define RCVLOWAT_ACT_A BIT(3)
-#define RCVLOWAT_ACT_B BIT(4)
+#define RCVLOWAT_SET_0 BIT(1)
+#define RCVLOWAT_SET_1 BIT(2)
+#define RCVLOWAT_ACT_0 BIT(3)
+#define RCVLOWAT_ACT_1 BIT(4)
#define CLOSING BIT(5)
- uint32_t a_read;
- uint32_t a_written;
- uint32_t b_read;
- uint32_t b_written;
+ uint32_t read[SIDES];
+ uint32_t written[SIDES];
};
/**
diff --git a/tcp_splice.c b/tcp_splice.c
index b783326..4a0580c 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -21,12 +21,12 @@
*
* - SPLICE_CONNECT: connection accepted, connecting to target
* - SPLICE_ESTABLISHED: connection to target established
- * - A_OUT_WAIT: pipe to accepted socket full, wait for EPOLLOUT
- * - B_OUT_WAIT: pipe to target socket full, wait for EPOLLOUT
- * - A_FIN_RCVD: FIN (EPOLLRDHUP) seen from accepted socket
- * - B_FIN_RCVD: FIN (EPOLLRDHUP) seen from target socket
- * - A_FIN_RCVD: FIN (write shutdown) sent to accepted socket
- * - B_FIN_RCVD: FIN (write shutdown) sent to target socket
+ * - OUT_WAIT_0: pipe to accepted socket full, wait for EPOLLOUT
+ * - OUT_WAIT_1: pipe to target socket full, wait for EPOLLOUT
+ * - FIN_RCVD_0: FIN (EPOLLRDHUP) seen from accepted socket
+ * - FIN_RCVD_1: FIN (EPOLLRDHUP) seen from target socket
+ * - FIN_SENT_0: FIN (write shutdown) sent to accepted socket
+ * - FIN_SENT_1: FIN (write shutdown) sent to target socket
*
* #syscalls:pasta pipe2|pipe fcntl armv6l:fcntl64 armv7l:fcntl64 ppc64:fcntl64
*/
@@ -79,14 +79,14 @@ static int splice_pipe_pool [TCP_SPLICE_PIPE_POOL_SIZE][2];
/* Display strings for connection events */
static const char *tcp_splice_event_str[] __attribute((__unused__)) = {
- "SPLICE_CONNECT", "SPLICE_ESTABLISHED", "A_OUT_WAIT", "B_OUT_WAIT",
- "A_FIN_RCVD", "B_FIN_RCVD", "A_FIN_SENT", "B_FIN_SENT",
+ "SPLICE_CONNECT", "SPLICE_ESTABLISHED", "OUT_WAIT_0", "OUT_WAIT_1",
+ "FIN_RCVD_0", "FIN_RCVD_1", "FIN_SENT_0", "FIN_SENT_1",
};
/* Display strings for connection flags */
static const char *tcp_splice_flag_str[] __attribute((__unused__)) = {
- "SPLICE_V6", "RCVLOWAT_SET_A", "RCVLOWAT_SET_B", "RCVLOWAT_ACT_A",
- "RCVLOWAT_ACT_B", "CLOSING",
+ "SPLICE_V6", "RCVLOWAT_SET_0", "RCVLOWAT_SET_1", "RCVLOWAT_ACT_0",
+ "RCVLOWAT_ACT_1", "CLOSING",
};
/* Forward declaration */
@@ -95,26 +95,24 @@ static int tcp_sock_refill_ns(void *arg);
/**
* tcp_splice_conn_epoll_events() - epoll events masks for given state
* @events: Connection event flags
- * @a: Event for socket with accepted connection, set on return
- * @b: Event for connection target socket, set on return
+ * @ev: Events to fill in, 0 is accepted socket, 1 is connecting socket
*/
static void tcp_splice_conn_epoll_events(uint16_t events,
- struct epoll_event *a,
- struct epoll_event *b)
+ struct epoll_event ev[])
{
- a->events = b->events = 0;
+ ev[0].events = ev[1].events = 0;
if (events & SPLICE_ESTABLISHED) {
- if (!(events & B_FIN_SENT))
- a->events = EPOLLIN | EPOLLRDHUP;
- if (!(events & A_FIN_SENT))
- b->events = EPOLLIN | EPOLLRDHUP;
+ if (!(events & FIN_SENT_1))
+ ev[0].events = EPOLLIN | EPOLLRDHUP;
+ if (!(events & FIN_SENT_0))
+ ev[1].events = EPOLLIN | EPOLLRDHUP;
} else if (events & SPLICE_CONNECT) {
- b->events = EPOLLOUT;
+ ev[1].events = EPOLLOUT;
}
- a->events |= (events & A_OUT_WAIT) ? EPOLLOUT : 0;
- a->events |= (events & B_OUT_WAIT) ? EPOLLOUT : 0;
+ ev[0].events |= (events & OUT_WAIT_0) ? EPOLLOUT : 0;
+ ev[1].events |= (events & OUT_WAIT_1) ? EPOLLOUT : 0;
}
/**
@@ -128,17 +126,17 @@ static int tcp_splice_epoll_ctl(const struct ctx *c,
struct tcp_splice_conn *conn)
{
int m = conn->in_epoll ? EPOLL_CTL_MOD : EPOLL_CTL_ADD;
- union epoll_ref ref_a = { .type = EPOLL_TYPE_TCP, .fd = conn->a,
- .tcp.index = CONN_IDX(conn) };
- union epoll_ref ref_b = { .type = EPOLL_TYPE_TCP, .fd = conn->b,
- .tcp.index = CONN_IDX(conn) };
- struct epoll_event ev_a = { .data.u64 = ref_a.u64 };
- struct epoll_event ev_b = { .data.u64 = ref_b.u64 };
+ union epoll_ref ref[SIDES] = {
+ { .type = EPOLL_TYPE_TCP, .fd = conn->s[0], .tcp.index = CONN_IDX(conn) },
+ { .type = EPOLL_TYPE_TCP, .fd = conn->s[1], .tcp.index = CONN_IDX(conn) }
+ };
+ struct epoll_event ev[SIDES] = { { .data.u64 = ref[0].u64 },
+ { .data.u64 = ref[1].u64 } };
- tcp_splice_conn_epoll_events(conn->events, &ev_a, &ev_b);
+ tcp_splice_conn_epoll_events(conn->events, ev);
- if (epoll_ctl(c->epollfd, m, conn->a, &ev_a) ||
- epoll_ctl(c->epollfd, m, conn->b, &ev_b)) {
+ if (epoll_ctl(c->epollfd, m, conn->s[0], &ev[0]) ||
+ epoll_ctl(c->epollfd, m, conn->s[1], &ev[1])) {
int ret = -errno;
err("TCP (spliced): index %li, ERROR on epoll_ctl(): %s",
CONN_IDX(conn), strerror(errno));
@@ -184,8 +182,8 @@ static void conn_flag_do(const struct ctx *c, struct tcp_splice_conn *conn,
}
if (flag == CLOSING) {
- epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->a, NULL);
- epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->b, NULL);
+ epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->s[0], NULL);
+ epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->s[1], NULL);
}
}
@@ -263,26 +261,26 @@ void tcp_splice_destroy(struct ctx *c, union tcp_conn *conn_union)
if (conn->events & SPLICE_ESTABLISHED) {
/* Flushing might need to block: don't recycle them. */
- if (conn->pipe_a_b[0] != -1) {
- close(conn->pipe_a_b[0]);
- close(conn->pipe_a_b[1]);
- conn->pipe_a_b[0] = conn->pipe_a_b[1] = -1;
+ if (conn->pipe[0][0] != -1) {
+ close(conn->pipe[0][0]);
+ close(conn->pipe[0][1]);
+ conn->pipe[0][0] = conn->pipe[0][1] = -1;
}
- if (conn->pipe_b_a[0] != -1) {
- close(conn->pipe_b_a[0]);
- close(conn->pipe_b_a[1]);
- conn->pipe_b_a[0] = conn->pipe_b_a[1] = -1;
+ if (conn->pipe[1][0] != -1) {
+ close(conn->pipe[1][0]);
+ close(conn->pipe[1][1]);
+ conn->pipe[1][0] = conn->pipe[1][1] = -1;
}
}
if (conn->events & SPLICE_CONNECT) {
- close(conn->b);
- conn->b = -1;
+ close(conn->s[1]);
+ conn->s[1] = -1;
}
- close(conn->a);
- conn->a = -1;
- conn->a_read = conn->a_written = conn->b_read = conn->b_written = 0;
+ close(conn->s[0]);
+ conn->s[0] = -1;
+ conn->read[0] = conn->written[0] = conn->read[1] = conn->written[1] = 0;
conn->events = SPLICE_CLOSED;
conn->flags = 0;
@@ -303,47 +301,47 @@ static int tcp_splice_connect_finish(const struct ctx *c,
{
int i;
- conn->pipe_a_b[0] = conn->pipe_b_a[0] = -1;
- conn->pipe_a_b[1] = conn->pipe_b_a[1] = -1;
+ conn->pipe[0][0] = conn->pipe[1][0] = -1;
+ conn->pipe[0][1] = conn->pipe[1][1] = -1;
for (i = 0; i < TCP_SPLICE_PIPE_POOL_SIZE; i++) {
if (splice_pipe_pool[i][0] >= 0) {
- SWAP(conn->pipe_a_b[0], splice_pipe_pool[i][0]);
- SWAP(conn->pipe_a_b[1], splice_pipe_pool[i][1]);
+ SWAP(conn->pipe[0][0], splice_pipe_pool[i][0]);
+ SWAP(conn->pipe[0][1], splice_pipe_pool[i][1]);
break;
}
}
- if (conn->pipe_a_b[0] < 0) {
- if (pipe2(conn->pipe_a_b, O_NONBLOCK | O_CLOEXEC)) {
- err("TCP (spliced): cannot create a->b pipe: %s",
+ if (conn->pipe[0][0] < 0) {
+ if (pipe2(conn->pipe[0], O_NONBLOCK | O_CLOEXEC)) {
+ err("TCP (spliced): cannot create 0->1 pipe: %s",
strerror(errno));
conn_flag(c, conn, CLOSING);
return -EIO;
}
- if (fcntl(conn->pipe_a_b[0], F_SETPIPE_SZ, c->tcp.pipe_size)) {
- trace("TCP (spliced): cannot set a->b pipe size to %lu",
+ if (fcntl(conn->pipe[0][0], F_SETPIPE_SZ, c->tcp.pipe_size)) {
+ trace("TCP (spliced): cannot set 0->1 pipe size to %lu",
c->tcp.pipe_size);
}
}
for (; i < TCP_SPLICE_PIPE_POOL_SIZE; i++) {
if (splice_pipe_pool[i][0] >= 0) {
- SWAP(conn->pipe_b_a[0], splice_pipe_pool[i][0]);
- SWAP(conn->pipe_b_a[1], splice_pipe_pool[i][1]);
+ SWAP(conn->pipe[1][0], splice_pipe_pool[i][0]);
+ SWAP(conn->pipe[1][1], splice_pipe_pool[i][1]);
break;
}
}
- if (conn->pipe_b_a[0] < 0) {
- if (pipe2(conn->pipe_b_a, O_NONBLOCK | O_CLOEXEC)) {
- err("TCP (spliced): cannot create b->a pipe: %s",
+ if (conn->pipe[1][0] < 0) {
+ if (pipe2(conn->pipe[1], O_NONBLOCK | O_CLOEXEC)) {
+ err("TCP (spliced): cannot create 1->0 pipe: %s",
strerror(errno));
conn_flag(c, conn, CLOSING);
return -EIO;
}
- if (fcntl(conn->pipe_b_a[0], F_SETPIPE_SZ, c->tcp.pipe_size)) {
- trace("TCP (spliced): cannot set b->a pipe size to %lu",
+ if (fcntl(conn->pipe[1][0], F_SETPIPE_SZ, c->tcp.pipe_size)) {
+ trace("TCP (spliced): cannot set 1->0 pipe size to %lu",
c->tcp.pipe_size);
}
}
@@ -379,12 +377,12 @@ static int tcp_splice_connect(const struct ctx *c, struct tcp_splice_conn *conn,
const struct sockaddr *sa;
socklen_t sl;
- conn->b = sock_conn;
+ conn->s[1] = sock_conn;
- if (setsockopt(conn->b, SOL_TCP, TCP_QUICKACK,
+ if (setsockopt(conn->s[1], SOL_TCP, TCP_QUICKACK,
&((int){ 1 }), sizeof(int))) {
trace("TCP (spliced): failed to set TCP_QUICKACK on socket %i",
- conn->b);
+ conn->s[1]);
}
if (CONN_V6(conn)) {
@@ -395,7 +393,7 @@ static int tcp_splice_connect(const struct ctx *c, struct tcp_splice_conn *conn,
sl = sizeof(addr4);
}
- if (connect(conn->b, sa, sl)) {
+ if (connect(conn->s[1], sa, sl)) {
if (errno != EINPROGRESS) {
int ret = -errno;
@@ -473,13 +471,13 @@ static void tcp_splice_dir(struct tcp_splice_conn *conn, int ref_sock,
{
if (!reverse) {
*from = ref_sock;
- *to = (*from == conn->a) ? conn->b : conn->a;
+ *to = (*from == conn->s[0]) ? conn->s[1] : conn->s[0];
} else {
*to = ref_sock;
- *from = (*to == conn->a) ? conn->b : conn->a;
+ *from = (*to == conn->s[0]) ? conn->s[1] : conn->s[0];
}
- *pipes = *from == conn->a ? conn->pipe_a_b : conn->pipe_b_a;
+ *pipes = *from == conn->s[0] ? conn->pipe[0] : conn->pipe[1];
}
/**
@@ -521,7 +519,7 @@ bool tcp_splice_conn_from_sock(const struct ctx *c,
trace("TCP (spliced): failed to set TCP_QUICKACK on %i", s);
conn->c.spliced = true;
- conn->a = s;
+ conn->s[0] = s;
if (tcp_splice_new(c, conn, ref.port, ref.pif))
conn_flag(c, conn, CLOSING);
@@ -559,10 +557,10 @@ void tcp_splice_sock_handler(struct ctx *c, struct tcp_splice_conn *conn,
}
if (events & EPOLLOUT) {
- if (s == conn->a)
- conn_event(c, conn, ~A_OUT_WAIT);
+ if (s == conn->s[0])
+ conn_event(c, conn, ~OUT_WAIT_0);
else
- conn_event(c, conn, ~B_OUT_WAIT);
+ conn_event(c, conn, ~OUT_WAIT_1);
tcp_splice_dir(conn, s, 1, &from, &to, &pipes);
} else {
@@ -570,33 +568,33 @@ void tcp_splice_sock_handler(struct ctx *c, struct tcp_splice_conn *conn,
}
if (events & EPOLLRDHUP) {
- if (s == conn->a)
- conn_event(c, conn, A_FIN_RCVD);
+ if (s == conn->s[0])
+ conn_event(c, conn, FIN_RCVD_0);
else
- conn_event(c, conn, B_FIN_RCVD);
+ conn_event(c, conn, FIN_RCVD_1);
}
if (events & EPOLLHUP) {
- if (s == conn->a)
- conn_event(c, conn, A_FIN_SENT); /* Fake, but implied */
+ if (s == conn->s[0])
+ conn_event(c, conn, FIN_SENT_0); /* Fake, but implied */
else
- conn_event(c, conn, B_FIN_SENT);
+ conn_event(c, conn, FIN_SENT_1);
}
swap:
eof = 0;
never_read = 1;
- if (from == conn->a) {
- seq_read = &conn->a_read;
- seq_write = &conn->a_written;
- lowat_set_flag = RCVLOWAT_SET_A;
- lowat_act_flag = RCVLOWAT_ACT_A;
+ if (from == conn->s[0]) {
+ seq_read = &conn->read[0];
+ seq_write = &conn->written[0];
+ lowat_set_flag = RCVLOWAT_SET_0;
+ lowat_act_flag = RCVLOWAT_ACT_0;
} else {
- seq_read = &conn->b_read;
- seq_write = &conn->b_written;
- lowat_set_flag = RCVLOWAT_SET_B;
- lowat_act_flag = RCVLOWAT_ACT_B;
+ seq_read = &conn->read[1];
+ seq_write = &conn->written[1];
+ lowat_set_flag = RCVLOWAT_SET_1;
+ lowat_act_flag = RCVLOWAT_ACT_1;
}
while (1) {
@@ -666,10 +664,10 @@ eintr:
if (never_read)
break;
- if (to == conn->a)
- conn_event(c, conn, A_OUT_WAIT);
+ if (to == conn->s[0])
+ conn_event(c, conn, OUT_WAIT_0);
else
- conn_event(c, conn, B_OUT_WAIT);
+ conn_event(c, conn, OUT_WAIT_1);
break;
}
@@ -685,31 +683,31 @@ eintr:
break;
}
- if ((conn->events & A_FIN_RCVD) && !(conn->events & B_FIN_SENT)) {
+ if ((conn->events & FIN_RCVD_0) && !(conn->events & FIN_SENT_1)) {
if (*seq_read == *seq_write && eof) {
- shutdown(conn->b, SHUT_WR);
- conn_event(c, conn, B_FIN_SENT);
+ shutdown(conn->s[1], SHUT_WR);
+ conn_event(c, conn, FIN_SENT_1);
}
}
- if ((conn->events & B_FIN_RCVD) && !(conn->events & A_FIN_SENT)) {
+ if ((conn->events & FIN_RCVD_1) && !(conn->events & FIN_SENT_0)) {
if (*seq_read == *seq_write && eof) {
- shutdown(conn->a, SHUT_WR);
- conn_event(c, conn, A_FIN_SENT);
+ shutdown(conn->s[0], SHUT_WR);
+ conn_event(c, conn, FIN_SENT_0);
}
}
- if (CONN_HAS(conn, A_FIN_SENT | B_FIN_SENT))
+ if (CONN_HAS(conn, FIN_SENT_0 | FIN_SENT_1))
goto close;
if ((events & (EPOLLIN | EPOLLOUT)) == (EPOLLIN | EPOLLOUT)) {
events = EPOLLIN;
SWAP(from, to);
- if (pipes == conn->pipe_a_b)
- pipes = conn->pipe_b_a;
+ if (pipes == conn->pipe[0])
+ pipes = conn->pipe[1];
else
- pipes = conn->pipe_a_b;
+ pipes = conn->pipe[0];
goto swap;
}
@@ -843,26 +841,26 @@ void tcp_splice_timer(struct ctx *c, union tcp_conn *conn_union)
return;
}
- if ( (conn->flags & RCVLOWAT_SET_A) &&
- !(conn->flags & RCVLOWAT_ACT_A)) {
- if (setsockopt(conn->a, SOL_SOCKET, SO_RCVLOWAT,
+ if ( (conn->flags & RCVLOWAT_SET_0) &&
+ !(conn->flags & RCVLOWAT_ACT_0)) {
+ if (setsockopt(conn->s[0], SOL_SOCKET, SO_RCVLOWAT,
&((int){ 1 }), sizeof(int))) {
trace("TCP (spliced): can't set SO_RCVLOWAT on "
- "%i", conn->a);
+ "%i", conn->s[0]);
}
- conn_flag(c, conn, ~RCVLOWAT_SET_A);
+ conn_flag(c, conn, ~RCVLOWAT_SET_0);
}
- if ( (conn->flags & RCVLOWAT_SET_B) &&
- !(conn->flags & RCVLOWAT_ACT_B)) {
- if (setsockopt(conn->b, SOL_SOCKET, SO_RCVLOWAT,
+ if ( (conn->flags & RCVLOWAT_SET_1) &&
+ !(conn->flags & RCVLOWAT_ACT_1)) {
+ if (setsockopt(conn->s[1], SOL_SOCKET, SO_RCVLOWAT,
&((int){ 1 }), sizeof(int))) {
trace("TCP (spliced): can't set SO_RCVLOWAT on "
- "%i", conn->b);
+ "%i", conn->s[1]);
}
- conn_flag(c, conn, ~RCVLOWAT_SET_B);
+ conn_flag(c, conn, ~RCVLOWAT_SET_1);
}
- conn_flag(c, conn, ~RCVLOWAT_ACT_A);
- conn_flag(c, conn, ~RCVLOWAT_ACT_B);
+ conn_flag(c, conn, ~RCVLOWAT_ACT_0);
+ conn_flag(c, conn, ~RCVLOWAT_ACT_1);
}
--
@@ -21,12 +21,12 @@
*
* - SPLICE_CONNECT: connection accepted, connecting to target
* - SPLICE_ESTABLISHED: connection to target established
- * - A_OUT_WAIT: pipe to accepted socket full, wait for EPOLLOUT
- * - B_OUT_WAIT: pipe to target socket full, wait for EPOLLOUT
- * - A_FIN_RCVD: FIN (EPOLLRDHUP) seen from accepted socket
- * - B_FIN_RCVD: FIN (EPOLLRDHUP) seen from target socket
- * - A_FIN_RCVD: FIN (write shutdown) sent to accepted socket
- * - B_FIN_RCVD: FIN (write shutdown) sent to target socket
+ * - OUT_WAIT_0: pipe to accepted socket full, wait for EPOLLOUT
+ * - OUT_WAIT_1: pipe to target socket full, wait for EPOLLOUT
+ * - FIN_RCVD_0: FIN (EPOLLRDHUP) seen from accepted socket
+ * - FIN_RCVD_1: FIN (EPOLLRDHUP) seen from target socket
+ * - FIN_SENT_0: FIN (write shutdown) sent to accepted socket
+ * - FIN_SENT_1: FIN (write shutdown) sent to target socket
*
* #syscalls:pasta pipe2|pipe fcntl armv6l:fcntl64 armv7l:fcntl64 ppc64:fcntl64
*/
@@ -79,14 +79,14 @@ static int splice_pipe_pool [TCP_SPLICE_PIPE_POOL_SIZE][2];
/* Display strings for connection events */
static const char *tcp_splice_event_str[] __attribute((__unused__)) = {
- "SPLICE_CONNECT", "SPLICE_ESTABLISHED", "A_OUT_WAIT", "B_OUT_WAIT",
- "A_FIN_RCVD", "B_FIN_RCVD", "A_FIN_SENT", "B_FIN_SENT",
+ "SPLICE_CONNECT", "SPLICE_ESTABLISHED", "OUT_WAIT_0", "OUT_WAIT_1",
+ "FIN_RCVD_0", "FIN_RCVD_1", "FIN_SENT_0", "FIN_SENT_1",
};
/* Display strings for connection flags */
static const char *tcp_splice_flag_str[] __attribute((__unused__)) = {
- "SPLICE_V6", "RCVLOWAT_SET_A", "RCVLOWAT_SET_B", "RCVLOWAT_ACT_A",
- "RCVLOWAT_ACT_B", "CLOSING",
+ "SPLICE_V6", "RCVLOWAT_SET_0", "RCVLOWAT_SET_1", "RCVLOWAT_ACT_0",
+ "RCVLOWAT_ACT_1", "CLOSING",
};
/* Forward declaration */
@@ -95,26 +95,24 @@ static int tcp_sock_refill_ns(void *arg);
/**
* tcp_splice_conn_epoll_events() - epoll events masks for given state
* @events: Connection event flags
- * @a: Event for socket with accepted connection, set on return
- * @b: Event for connection target socket, set on return
+ * @ev: Events to fill in, 0 is accepted socket, 1 is connecting socket
*/
static void tcp_splice_conn_epoll_events(uint16_t events,
- struct epoll_event *a,
- struct epoll_event *b)
+ struct epoll_event ev[])
{
- a->events = b->events = 0;
+ ev[0].events = ev[1].events = 0;
if (events & SPLICE_ESTABLISHED) {
- if (!(events & B_FIN_SENT))
- a->events = EPOLLIN | EPOLLRDHUP;
- if (!(events & A_FIN_SENT))
- b->events = EPOLLIN | EPOLLRDHUP;
+ if (!(events & FIN_SENT_1))
+ ev[0].events = EPOLLIN | EPOLLRDHUP;
+ if (!(events & FIN_SENT_0))
+ ev[1].events = EPOLLIN | EPOLLRDHUP;
} else if (events & SPLICE_CONNECT) {
- b->events = EPOLLOUT;
+ ev[1].events = EPOLLOUT;
}
- a->events |= (events & A_OUT_WAIT) ? EPOLLOUT : 0;
- a->events |= (events & B_OUT_WAIT) ? EPOLLOUT : 0;
+ ev[0].events |= (events & OUT_WAIT_0) ? EPOLLOUT : 0;
+ ev[1].events |= (events & OUT_WAIT_1) ? EPOLLOUT : 0;
}
/**
@@ -128,17 +126,17 @@ static int tcp_splice_epoll_ctl(const struct ctx *c,
struct tcp_splice_conn *conn)
{
int m = conn->in_epoll ? EPOLL_CTL_MOD : EPOLL_CTL_ADD;
- union epoll_ref ref_a = { .type = EPOLL_TYPE_TCP, .fd = conn->a,
- .tcp.index = CONN_IDX(conn) };
- union epoll_ref ref_b = { .type = EPOLL_TYPE_TCP, .fd = conn->b,
- .tcp.index = CONN_IDX(conn) };
- struct epoll_event ev_a = { .data.u64 = ref_a.u64 };
- struct epoll_event ev_b = { .data.u64 = ref_b.u64 };
+ union epoll_ref ref[SIDES] = {
+ { .type = EPOLL_TYPE_TCP, .fd = conn->s[0], .tcp.index = CONN_IDX(conn) },
+ { .type = EPOLL_TYPE_TCP, .fd = conn->s[1], .tcp.index = CONN_IDX(conn) }
+ };
+ struct epoll_event ev[SIDES] = { { .data.u64 = ref[0].u64 },
+ { .data.u64 = ref[1].u64 } };
- tcp_splice_conn_epoll_events(conn->events, &ev_a, &ev_b);
+ tcp_splice_conn_epoll_events(conn->events, ev);
- if (epoll_ctl(c->epollfd, m, conn->a, &ev_a) ||
- epoll_ctl(c->epollfd, m, conn->b, &ev_b)) {
+ if (epoll_ctl(c->epollfd, m, conn->s[0], &ev[0]) ||
+ epoll_ctl(c->epollfd, m, conn->s[1], &ev[1])) {
int ret = -errno;
err("TCP (spliced): index %li, ERROR on epoll_ctl(): %s",
CONN_IDX(conn), strerror(errno));
@@ -184,8 +182,8 @@ static void conn_flag_do(const struct ctx *c, struct tcp_splice_conn *conn,
}
if (flag == CLOSING) {
- epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->a, NULL);
- epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->b, NULL);
+ epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->s[0], NULL);
+ epoll_ctl(c->epollfd, EPOLL_CTL_DEL, conn->s[1], NULL);
}
}
@@ -263,26 +261,26 @@ void tcp_splice_destroy(struct ctx *c, union tcp_conn *conn_union)
if (conn->events & SPLICE_ESTABLISHED) {
/* Flushing might need to block: don't recycle them. */
- if (conn->pipe_a_b[0] != -1) {
- close(conn->pipe_a_b[0]);
- close(conn->pipe_a_b[1]);
- conn->pipe_a_b[0] = conn->pipe_a_b[1] = -1;
+ if (conn->pipe[0][0] != -1) {
+ close(conn->pipe[0][0]);
+ close(conn->pipe[0][1]);
+ conn->pipe[0][0] = conn->pipe[0][1] = -1;
}
- if (conn->pipe_b_a[0] != -1) {
- close(conn->pipe_b_a[0]);
- close(conn->pipe_b_a[1]);
- conn->pipe_b_a[0] = conn->pipe_b_a[1] = -1;
+ if (conn->pipe[1][0] != -1) {
+ close(conn->pipe[1][0]);
+ close(conn->pipe[1][1]);
+ conn->pipe[1][0] = conn->pipe[1][1] = -1;
}
}
if (conn->events & SPLICE_CONNECT) {
- close(conn->b);
- conn->b = -1;
+ close(conn->s[1]);
+ conn->s[1] = -1;
}
- close(conn->a);
- conn->a = -1;
- conn->a_read = conn->a_written = conn->b_read = conn->b_written = 0;
+ close(conn->s[0]);
+ conn->s[0] = -1;
+ conn->read[0] = conn->written[0] = conn->read[1] = conn->written[1] = 0;
conn->events = SPLICE_CLOSED;
conn->flags = 0;
@@ -303,47 +301,47 @@ static int tcp_splice_connect_finish(const struct ctx *c,
{
int i;
- conn->pipe_a_b[0] = conn->pipe_b_a[0] = -1;
- conn->pipe_a_b[1] = conn->pipe_b_a[1] = -1;
+ conn->pipe[0][0] = conn->pipe[1][0] = -1;
+ conn->pipe[0][1] = conn->pipe[1][1] = -1;
for (i = 0; i < TCP_SPLICE_PIPE_POOL_SIZE; i++) {
if (splice_pipe_pool[i][0] >= 0) {
- SWAP(conn->pipe_a_b[0], splice_pipe_pool[i][0]);
- SWAP(conn->pipe_a_b[1], splice_pipe_pool[i][1]);
+ SWAP(conn->pipe[0][0], splice_pipe_pool[i][0]);
+ SWAP(conn->pipe[0][1], splice_pipe_pool[i][1]);
break;
}
}
- if (conn->pipe_a_b[0] < 0) {
- if (pipe2(conn->pipe_a_b, O_NONBLOCK | O_CLOEXEC)) {
- err("TCP (spliced): cannot create a->b pipe: %s",
+ if (conn->pipe[0][0] < 0) {
+ if (pipe2(conn->pipe[0], O_NONBLOCK | O_CLOEXEC)) {
+ err("TCP (spliced): cannot create 0->1 pipe: %s",
strerror(errno));
conn_flag(c, conn, CLOSING);
return -EIO;
}
- if (fcntl(conn->pipe_a_b[0], F_SETPIPE_SZ, c->tcp.pipe_size)) {
- trace("TCP (spliced): cannot set a->b pipe size to %lu",
+ if (fcntl(conn->pipe[0][0], F_SETPIPE_SZ, c->tcp.pipe_size)) {
+ trace("TCP (spliced): cannot set 0->1 pipe size to %lu",
c->tcp.pipe_size);
}
}
for (; i < TCP_SPLICE_PIPE_POOL_SIZE; i++) {
if (splice_pipe_pool[i][0] >= 0) {
- SWAP(conn->pipe_b_a[0], splice_pipe_pool[i][0]);
- SWAP(conn->pipe_b_a[1], splice_pipe_pool[i][1]);
+ SWAP(conn->pipe[1][0], splice_pipe_pool[i][0]);
+ SWAP(conn->pipe[1][1], splice_pipe_pool[i][1]);
break;
}
}
- if (conn->pipe_b_a[0] < 0) {
- if (pipe2(conn->pipe_b_a, O_NONBLOCK | O_CLOEXEC)) {
- err("TCP (spliced): cannot create b->a pipe: %s",
+ if (conn->pipe[1][0] < 0) {
+ if (pipe2(conn->pipe[1], O_NONBLOCK | O_CLOEXEC)) {
+ err("TCP (spliced): cannot create 1->0 pipe: %s",
strerror(errno));
conn_flag(c, conn, CLOSING);
return -EIO;
}
- if (fcntl(conn->pipe_b_a[0], F_SETPIPE_SZ, c->tcp.pipe_size)) {
- trace("TCP (spliced): cannot set b->a pipe size to %lu",
+ if (fcntl(conn->pipe[1][0], F_SETPIPE_SZ, c->tcp.pipe_size)) {
+ trace("TCP (spliced): cannot set 1->0 pipe size to %lu",
c->tcp.pipe_size);
}
}
@@ -379,12 +377,12 @@ static int tcp_splice_connect(const struct ctx *c, struct tcp_splice_conn *conn,
const struct sockaddr *sa;
socklen_t sl;
- conn->b = sock_conn;
+ conn->s[1] = sock_conn;
- if (setsockopt(conn->b, SOL_TCP, TCP_QUICKACK,
+ if (setsockopt(conn->s[1], SOL_TCP, TCP_QUICKACK,
&((int){ 1 }), sizeof(int))) {
trace("TCP (spliced): failed to set TCP_QUICKACK on socket %i",
- conn->b);
+ conn->s[1]);
}
if (CONN_V6(conn)) {
@@ -395,7 +393,7 @@ static int tcp_splice_connect(const struct ctx *c, struct tcp_splice_conn *conn,
sl = sizeof(addr4);
}
- if (connect(conn->b, sa, sl)) {
+ if (connect(conn->s[1], sa, sl)) {
if (errno != EINPROGRESS) {
int ret = -errno;
@@ -473,13 +471,13 @@ static void tcp_splice_dir(struct tcp_splice_conn *conn, int ref_sock,
{
if (!reverse) {
*from = ref_sock;
- *to = (*from == conn->a) ? conn->b : conn->a;
+ *to = (*from == conn->s[0]) ? conn->s[1] : conn->s[0];
} else {
*to = ref_sock;
- *from = (*to == conn->a) ? conn->b : conn->a;
+ *from = (*to == conn->s[0]) ? conn->s[1] : conn->s[0];
}
- *pipes = *from == conn->a ? conn->pipe_a_b : conn->pipe_b_a;
+ *pipes = *from == conn->s[0] ? conn->pipe[0] : conn->pipe[1];
}
/**
@@ -521,7 +519,7 @@ bool tcp_splice_conn_from_sock(const struct ctx *c,
trace("TCP (spliced): failed to set TCP_QUICKACK on %i", s);
conn->c.spliced = true;
- conn->a = s;
+ conn->s[0] = s;
if (tcp_splice_new(c, conn, ref.port, ref.pif))
conn_flag(c, conn, CLOSING);
@@ -559,10 +557,10 @@ void tcp_splice_sock_handler(struct ctx *c, struct tcp_splice_conn *conn,
}
if (events & EPOLLOUT) {
- if (s == conn->a)
- conn_event(c, conn, ~A_OUT_WAIT);
+ if (s == conn->s[0])
+ conn_event(c, conn, ~OUT_WAIT_0);
else
- conn_event(c, conn, ~B_OUT_WAIT);
+ conn_event(c, conn, ~OUT_WAIT_1);
tcp_splice_dir(conn, s, 1, &from, &to, &pipes);
} else {
@@ -570,33 +568,33 @@ void tcp_splice_sock_handler(struct ctx *c, struct tcp_splice_conn *conn,
}
if (events & EPOLLRDHUP) {
- if (s == conn->a)
- conn_event(c, conn, A_FIN_RCVD);
+ if (s == conn->s[0])
+ conn_event(c, conn, FIN_RCVD_0);
else
- conn_event(c, conn, B_FIN_RCVD);
+ conn_event(c, conn, FIN_RCVD_1);
}
if (events & EPOLLHUP) {
- if (s == conn->a)
- conn_event(c, conn, A_FIN_SENT); /* Fake, but implied */
+ if (s == conn->s[0])
+ conn_event(c, conn, FIN_SENT_0); /* Fake, but implied */
else
- conn_event(c, conn, B_FIN_SENT);
+ conn_event(c, conn, FIN_SENT_1);
}
swap:
eof = 0;
never_read = 1;
- if (from == conn->a) {
- seq_read = &conn->a_read;
- seq_write = &conn->a_written;
- lowat_set_flag = RCVLOWAT_SET_A;
- lowat_act_flag = RCVLOWAT_ACT_A;
+ if (from == conn->s[0]) {
+ seq_read = &conn->read[0];
+ seq_write = &conn->written[0];
+ lowat_set_flag = RCVLOWAT_SET_0;
+ lowat_act_flag = RCVLOWAT_ACT_0;
} else {
- seq_read = &conn->b_read;
- seq_write = &conn->b_written;
- lowat_set_flag = RCVLOWAT_SET_B;
- lowat_act_flag = RCVLOWAT_ACT_B;
+ seq_read = &conn->read[1];
+ seq_write = &conn->written[1];
+ lowat_set_flag = RCVLOWAT_SET_1;
+ lowat_act_flag = RCVLOWAT_ACT_1;
}
while (1) {
@@ -666,10 +664,10 @@ eintr:
if (never_read)
break;
- if (to == conn->a)
- conn_event(c, conn, A_OUT_WAIT);
+ if (to == conn->s[0])
+ conn_event(c, conn, OUT_WAIT_0);
else
- conn_event(c, conn, B_OUT_WAIT);
+ conn_event(c, conn, OUT_WAIT_1);
break;
}
@@ -685,31 +683,31 @@ eintr:
break;
}
- if ((conn->events & A_FIN_RCVD) && !(conn->events & B_FIN_SENT)) {
+ if ((conn->events & FIN_RCVD_0) && !(conn->events & FIN_SENT_1)) {
if (*seq_read == *seq_write && eof) {
- shutdown(conn->b, SHUT_WR);
- conn_event(c, conn, B_FIN_SENT);
+ shutdown(conn->s[1], SHUT_WR);
+ conn_event(c, conn, FIN_SENT_1);
}
}
- if ((conn->events & B_FIN_RCVD) && !(conn->events & A_FIN_SENT)) {
+ if ((conn->events & FIN_RCVD_1) && !(conn->events & FIN_SENT_0)) {
if (*seq_read == *seq_write && eof) {
- shutdown(conn->a, SHUT_WR);
- conn_event(c, conn, A_FIN_SENT);
+ shutdown(conn->s[0], SHUT_WR);
+ conn_event(c, conn, FIN_SENT_0);
}
}
- if (CONN_HAS(conn, A_FIN_SENT | B_FIN_SENT))
+ if (CONN_HAS(conn, FIN_SENT_0 | FIN_SENT_1))
goto close;
if ((events & (EPOLLIN | EPOLLOUT)) == (EPOLLIN | EPOLLOUT)) {
events = EPOLLIN;
SWAP(from, to);
- if (pipes == conn->pipe_a_b)
- pipes = conn->pipe_b_a;
+ if (pipes == conn->pipe[0])
+ pipes = conn->pipe[1];
else
- pipes = conn->pipe_a_b;
+ pipes = conn->pipe[0];
goto swap;
}
@@ -843,26 +841,26 @@ void tcp_splice_timer(struct ctx *c, union tcp_conn *conn_union)
return;
}
- if ( (conn->flags & RCVLOWAT_SET_A) &&
- !(conn->flags & RCVLOWAT_ACT_A)) {
- if (setsockopt(conn->a, SOL_SOCKET, SO_RCVLOWAT,
+ if ( (conn->flags & RCVLOWAT_SET_0) &&
+ !(conn->flags & RCVLOWAT_ACT_0)) {
+ if (setsockopt(conn->s[0], SOL_SOCKET, SO_RCVLOWAT,
&((int){ 1 }), sizeof(int))) {
trace("TCP (spliced): can't set SO_RCVLOWAT on "
- "%i", conn->a);
+ "%i", conn->s[0]);
}
- conn_flag(c, conn, ~RCVLOWAT_SET_A);
+ conn_flag(c, conn, ~RCVLOWAT_SET_0);
}
- if ( (conn->flags & RCVLOWAT_SET_B) &&
- !(conn->flags & RCVLOWAT_ACT_B)) {
- if (setsockopt(conn->b, SOL_SOCKET, SO_RCVLOWAT,
+ if ( (conn->flags & RCVLOWAT_SET_1) &&
+ !(conn->flags & RCVLOWAT_ACT_1)) {
+ if (setsockopt(conn->s[1], SOL_SOCKET, SO_RCVLOWAT,
&((int){ 1 }), sizeof(int))) {
trace("TCP (spliced): can't set SO_RCVLOWAT on "
- "%i", conn->b);
+ "%i", conn->s[1]);
}
- conn_flag(c, conn, ~RCVLOWAT_SET_B);
+ conn_flag(c, conn, ~RCVLOWAT_SET_1);
}
- conn_flag(c, conn, ~RCVLOWAT_ACT_A);
- conn_flag(c, conn, ~RCVLOWAT_ACT_B);
+ conn_flag(c, conn, ~RCVLOWAT_ACT_0);
+ conn_flag(c, conn, ~RCVLOWAT_ACT_1);
}
--
2.41.0
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 08/11] tcp_splice: Exploit side symmetry in tcp_splice_timer()
2023-10-12 1:51 [PATCH 00/11] tcp_splice: Better exploit symmetry between sides of connection David Gibson
` (6 preceding siblings ...)
2023-10-12 1:51 ` [PATCH 07/11] tcp_splice: Rename sides of connection from a/b to 0/1 David Gibson
@ 2023-10-12 1:51 ` David Gibson
2023-11-03 16:21 ` Stefano Brivio
2023-10-12 1:51 ` [PATCH 09/11] tcp_splice: Exploit side symmetry in tcp_splice_connect_finish() David Gibson
` (2 subsequent siblings)
10 siblings, 1 reply; 23+ messages in thread
From: David Gibson @ 2023-10-12 1:51 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
tcp_splice_timer() has two very similar blocks one after another that
handle the SO_RCVLOWAT flags for the two sides of the connection. We can
deduplicate this with a loop across the two sides.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
tcp_splice.c | 27 +++++++++++----------------
1 file changed, 11 insertions(+), 16 deletions(-)
diff --git a/tcp_splice.c b/tcp_splice.c
index 4a0580c..259d774 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -835,30 +835,25 @@ void tcp_splice_init(struct ctx *c)
void tcp_splice_timer(struct ctx *c, union tcp_conn *conn_union)
{
struct tcp_splice_conn *conn = &conn_union->splice;
+ int side;
if (conn->flags & CLOSING) {
tcp_splice_destroy(c, conn_union);
return;
}
- if ( (conn->flags & RCVLOWAT_SET_0) &&
- !(conn->flags & RCVLOWAT_ACT_0)) {
- if (setsockopt(conn->s[0], SOL_SOCKET, SO_RCVLOWAT,
- &((int){ 1 }), sizeof(int))) {
- trace("TCP (spliced): can't set SO_RCVLOWAT on "
- "%i", conn->s[0]);
- }
- conn_flag(c, conn, ~RCVLOWAT_SET_0);
- }
+ for (side = 0; side < SIDES; side++) {
+ uint8_t set = side == 0 ? RCVLOWAT_SET_0 : RCVLOWAT_SET_1;
+ uint8_t act = side == 0 ? RCVLOWAT_ACT_0 : RCVLOWAT_ACT_1;
- if ( (conn->flags & RCVLOWAT_SET_1) &&
- !(conn->flags & RCVLOWAT_ACT_1)) {
- if (setsockopt(conn->s[1], SOL_SOCKET, SO_RCVLOWAT,
- &((int){ 1 }), sizeof(int))) {
- trace("TCP (spliced): can't set SO_RCVLOWAT on "
- "%i", conn->s[1]);
+ if ( (conn->flags & set) && !(conn->flags & act)) {
+ if (setsockopt(conn->s[side], SOL_SOCKET, SO_RCVLOWAT,
+ &((int){ 1 }), sizeof(int))) {
+ trace("TCP (spliced): can't set SO_RCVLOWAT on "
+ "%i", conn->s[side]);
+ }
+ conn_flag(c, conn, ~set);
}
- conn_flag(c, conn, ~RCVLOWAT_SET_1);
}
conn_flag(c, conn, ~RCVLOWAT_ACT_0);
--
@@ -835,30 +835,25 @@ void tcp_splice_init(struct ctx *c)
void tcp_splice_timer(struct ctx *c, union tcp_conn *conn_union)
{
struct tcp_splice_conn *conn = &conn_union->splice;
+ int side;
if (conn->flags & CLOSING) {
tcp_splice_destroy(c, conn_union);
return;
}
- if ( (conn->flags & RCVLOWAT_SET_0) &&
- !(conn->flags & RCVLOWAT_ACT_0)) {
- if (setsockopt(conn->s[0], SOL_SOCKET, SO_RCVLOWAT,
- &((int){ 1 }), sizeof(int))) {
- trace("TCP (spliced): can't set SO_RCVLOWAT on "
- "%i", conn->s[0]);
- }
- conn_flag(c, conn, ~RCVLOWAT_SET_0);
- }
+ for (side = 0; side < SIDES; side++) {
+ uint8_t set = side == 0 ? RCVLOWAT_SET_0 : RCVLOWAT_SET_1;
+ uint8_t act = side == 0 ? RCVLOWAT_ACT_0 : RCVLOWAT_ACT_1;
- if ( (conn->flags & RCVLOWAT_SET_1) &&
- !(conn->flags & RCVLOWAT_ACT_1)) {
- if (setsockopt(conn->s[1], SOL_SOCKET, SO_RCVLOWAT,
- &((int){ 1 }), sizeof(int))) {
- trace("TCP (spliced): can't set SO_RCVLOWAT on "
- "%i", conn->s[1]);
+ if ( (conn->flags & set) && !(conn->flags & act)) {
+ if (setsockopt(conn->s[side], SOL_SOCKET, SO_RCVLOWAT,
+ &((int){ 1 }), sizeof(int))) {
+ trace("TCP (spliced): can't set SO_RCVLOWAT on "
+ "%i", conn->s[side]);
+ }
+ conn_flag(c, conn, ~set);
}
- conn_flag(c, conn, ~RCVLOWAT_SET_1);
}
conn_flag(c, conn, ~RCVLOWAT_ACT_0);
--
2.41.0
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 08/11] tcp_splice: Exploit side symmetry in tcp_splice_timer()
2023-10-12 1:51 ` [PATCH 08/11] tcp_splice: Exploit side symmetry in tcp_splice_timer() David Gibson
@ 2023-11-03 16:21 ` Stefano Brivio
2023-11-04 5:59 ` David Gibson
0 siblings, 1 reply; 23+ messages in thread
From: Stefano Brivio @ 2023-11-03 16:21 UTC (permalink / raw)
To: David Gibson; +Cc: passt-dev
On Thu, 12 Oct 2023 12:51:11 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:
> tcp_splice_timer() has two very similar blocks one after another that
> handle the SO_RCVLOWAT flags for the two sides of the connection. We can
> deduplicate this with a loop across the two sides.
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> tcp_splice.c | 27 +++++++++++----------------
> 1 file changed, 11 insertions(+), 16 deletions(-)
>
> diff --git a/tcp_splice.c b/tcp_splice.c
> index 4a0580c..259d774 100644
> --- a/tcp_splice.c
> +++ b/tcp_splice.c
> @@ -835,30 +835,25 @@ void tcp_splice_init(struct ctx *c)
> void tcp_splice_timer(struct ctx *c, union tcp_conn *conn_union)
> {
> struct tcp_splice_conn *conn = &conn_union->splice;
> + int side;
>
> if (conn->flags & CLOSING) {
> tcp_splice_destroy(c, conn_union);
> return;
> }
>
> - if ( (conn->flags & RCVLOWAT_SET_0) &&
> - !(conn->flags & RCVLOWAT_ACT_0)) {
> - if (setsockopt(conn->s[0], SOL_SOCKET, SO_RCVLOWAT,
> - &((int){ 1 }), sizeof(int))) {
> - trace("TCP (spliced): can't set SO_RCVLOWAT on "
> - "%i", conn->s[0]);
> - }
> - conn_flag(c, conn, ~RCVLOWAT_SET_0);
> - }
> + for (side = 0; side < SIDES; side++) {
> + uint8_t set = side == 0 ? RCVLOWAT_SET_0 : RCVLOWAT_SET_1;
> + uint8_t act = side == 0 ? RCVLOWAT_ACT_0 : RCVLOWAT_ACT_1;
>
> - if ( (conn->flags & RCVLOWAT_SET_1) &&
> - !(conn->flags & RCVLOWAT_ACT_1)) {
> - if (setsockopt(conn->s[1], SOL_SOCKET, SO_RCVLOWAT,
> - &((int){ 1 }), sizeof(int))) {
> - trace("TCP (spliced): can't set SO_RCVLOWAT on "
> - "%i", conn->s[1]);
> + if ( (conn->flags & set) && !(conn->flags & act)) {
^
Nit: this extra whitespace is now useless.
--
Stefano
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 08/11] tcp_splice: Exploit side symmetry in tcp_splice_timer()
2023-11-03 16:21 ` Stefano Brivio
@ 2023-11-04 5:59 ` David Gibson
0 siblings, 0 replies; 23+ messages in thread
From: David Gibson @ 2023-11-04 5:59 UTC (permalink / raw)
To: Stefano Brivio; +Cc: passt-dev
[-- Attachment #1: Type: text/plain, Size: 2207 bytes --]
On Fri, Nov 03, 2023 at 05:21:53PM +0100, Stefano Brivio wrote:
> On Thu, 12 Oct 2023 12:51:11 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
>
> > tcp_splice_timer() has two very similar blocks one after another that
> > handle the SO_RCVLOWAT flags for the two sides of the connection. We can
> > deduplicate this with a loop across the two sides.
> >
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> > tcp_splice.c | 27 +++++++++++----------------
> > 1 file changed, 11 insertions(+), 16 deletions(-)
> >
> > diff --git a/tcp_splice.c b/tcp_splice.c
> > index 4a0580c..259d774 100644
> > --- a/tcp_splice.c
> > +++ b/tcp_splice.c
> > @@ -835,30 +835,25 @@ void tcp_splice_init(struct ctx *c)
> > void tcp_splice_timer(struct ctx *c, union tcp_conn *conn_union)
> > {
> > struct tcp_splice_conn *conn = &conn_union->splice;
> > + int side;
> >
> > if (conn->flags & CLOSING) {
> > tcp_splice_destroy(c, conn_union);
> > return;
> > }
> >
> > - if ( (conn->flags & RCVLOWAT_SET_0) &&
> > - !(conn->flags & RCVLOWAT_ACT_0)) {
> > - if (setsockopt(conn->s[0], SOL_SOCKET, SO_RCVLOWAT,
> > - &((int){ 1 }), sizeof(int))) {
> > - trace("TCP (spliced): can't set SO_RCVLOWAT on "
> > - "%i", conn->s[0]);
> > - }
> > - conn_flag(c, conn, ~RCVLOWAT_SET_0);
> > - }
> > + for (side = 0; side < SIDES; side++) {
> > + uint8_t set = side == 0 ? RCVLOWAT_SET_0 : RCVLOWAT_SET_1;
> > + uint8_t act = side == 0 ? RCVLOWAT_ACT_0 : RCVLOWAT_ACT_1;
> >
> > - if ( (conn->flags & RCVLOWAT_SET_1) &&
> > - !(conn->flags & RCVLOWAT_ACT_1)) {
> > - if (setsockopt(conn->s[1], SOL_SOCKET, SO_RCVLOWAT,
> > - &((int){ 1 }), sizeof(int))) {
> > - trace("TCP (spliced): can't set SO_RCVLOWAT on "
> > - "%i", conn->s[1]);
> > + if ( (conn->flags & set) && !(conn->flags & act)) {
> ^
>
> Nit: this extra whitespace is now useless.
Corrected.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 09/11] tcp_splice: Exploit side symmetry in tcp_splice_connect_finish()
2023-10-12 1:51 [PATCH 00/11] tcp_splice: Better exploit symmetry between sides of connection David Gibson
` (7 preceding siblings ...)
2023-10-12 1:51 ` [PATCH 08/11] tcp_splice: Exploit side symmetry in tcp_splice_timer() David Gibson
@ 2023-10-12 1:51 ` David Gibson
2023-10-12 1:51 ` [PATCH 10/11] tcp_splice: Exploit side symmetry in tcp_splice_destroy() David Gibson
2023-10-12 1:51 ` [PATCH 11/11] tcp_splice: Simplify selection of socket and pipe sides in socket handler David Gibson
10 siblings, 0 replies; 23+ messages in thread
From: David Gibson @ 2023-10-12 1:51 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
tcp_splice_connect_finish() has two very similar blocks opening the two
pipes for each direction of the connection. We can deduplicate this with
a loop across the two sides.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
tcp_splice.c | 65 ++++++++++++++++++++--------------------------------
1 file changed, 25 insertions(+), 40 deletions(-)
diff --git a/tcp_splice.c b/tcp_splice.c
index 259d774..99ef8a4 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -299,50 +299,35 @@ void tcp_splice_destroy(struct ctx *c, union tcp_conn *conn_union)
static int tcp_splice_connect_finish(const struct ctx *c,
struct tcp_splice_conn *conn)
{
- int i;
-
- conn->pipe[0][0] = conn->pipe[1][0] = -1;
- conn->pipe[0][1] = conn->pipe[1][1] = -1;
-
- for (i = 0; i < TCP_SPLICE_PIPE_POOL_SIZE; i++) {
- if (splice_pipe_pool[i][0] >= 0) {
- SWAP(conn->pipe[0][0], splice_pipe_pool[i][0]);
- SWAP(conn->pipe[0][1], splice_pipe_pool[i][1]);
- break;
- }
- }
- if (conn->pipe[0][0] < 0) {
- if (pipe2(conn->pipe[0], O_NONBLOCK | O_CLOEXEC)) {
- err("TCP (spliced): cannot create 0->1 pipe: %s",
- strerror(errno));
- conn_flag(c, conn, CLOSING);
- return -EIO;
- }
+ int i = 0;
+ int side;
- if (fcntl(conn->pipe[0][0], F_SETPIPE_SZ, c->tcp.pipe_size)) {
- trace("TCP (spliced): cannot set 0->1 pipe size to %lu",
- c->tcp.pipe_size);
+ for (side = 0; side < SIDES; side++) {
+ conn->pipe[side][0] = conn->pipe[side][1] = -1;
+
+ for (; i < TCP_SPLICE_PIPE_POOL_SIZE; i++) {
+ if (splice_pipe_pool[i][0] >= 0) {
+ SWAP(conn->pipe[side][0],
+ splice_pipe_pool[i][0]);
+ SWAP(conn->pipe[side][1],
+ splice_pipe_pool[i][1]);
+ break;
+ }
}
- }
- for (; i < TCP_SPLICE_PIPE_POOL_SIZE; i++) {
- if (splice_pipe_pool[i][0] >= 0) {
- SWAP(conn->pipe[1][0], splice_pipe_pool[i][0]);
- SWAP(conn->pipe[1][1], splice_pipe_pool[i][1]);
- break;
- }
- }
- if (conn->pipe[1][0] < 0) {
- if (pipe2(conn->pipe[1], O_NONBLOCK | O_CLOEXEC)) {
- err("TCP (spliced): cannot create 1->0 pipe: %s",
- strerror(errno));
- conn_flag(c, conn, CLOSING);
- return -EIO;
- }
+ if (conn->pipe[side][0] < 0) {
+ if (pipe2(conn->pipe[side], O_NONBLOCK | O_CLOEXEC)) {
+ err("TCP (spliced): cannot create %d->%d pipe: %s",
+ side, !side, strerror(errno));
+ conn_flag(c, conn, CLOSING);
+ return -EIO;
+ }
- if (fcntl(conn->pipe[1][0], F_SETPIPE_SZ, c->tcp.pipe_size)) {
- trace("TCP (spliced): cannot set 1->0 pipe size to %lu",
- c->tcp.pipe_size);
+ if (fcntl(conn->pipe[side][0], F_SETPIPE_SZ,
+ c->tcp.pipe_size)) {
+ trace("TCP (spliced): cannot set %d->%d pipe size to %lu",
+ side, !side, c->tcp.pipe_size);
+ }
}
}
--
@@ -299,50 +299,35 @@ void tcp_splice_destroy(struct ctx *c, union tcp_conn *conn_union)
static int tcp_splice_connect_finish(const struct ctx *c,
struct tcp_splice_conn *conn)
{
- int i;
-
- conn->pipe[0][0] = conn->pipe[1][0] = -1;
- conn->pipe[0][1] = conn->pipe[1][1] = -1;
-
- for (i = 0; i < TCP_SPLICE_PIPE_POOL_SIZE; i++) {
- if (splice_pipe_pool[i][0] >= 0) {
- SWAP(conn->pipe[0][0], splice_pipe_pool[i][0]);
- SWAP(conn->pipe[0][1], splice_pipe_pool[i][1]);
- break;
- }
- }
- if (conn->pipe[0][0] < 0) {
- if (pipe2(conn->pipe[0], O_NONBLOCK | O_CLOEXEC)) {
- err("TCP (spliced): cannot create 0->1 pipe: %s",
- strerror(errno));
- conn_flag(c, conn, CLOSING);
- return -EIO;
- }
+ int i = 0;
+ int side;
- if (fcntl(conn->pipe[0][0], F_SETPIPE_SZ, c->tcp.pipe_size)) {
- trace("TCP (spliced): cannot set 0->1 pipe size to %lu",
- c->tcp.pipe_size);
+ for (side = 0; side < SIDES; side++) {
+ conn->pipe[side][0] = conn->pipe[side][1] = -1;
+
+ for (; i < TCP_SPLICE_PIPE_POOL_SIZE; i++) {
+ if (splice_pipe_pool[i][0] >= 0) {
+ SWAP(conn->pipe[side][0],
+ splice_pipe_pool[i][0]);
+ SWAP(conn->pipe[side][1],
+ splice_pipe_pool[i][1]);
+ break;
+ }
}
- }
- for (; i < TCP_SPLICE_PIPE_POOL_SIZE; i++) {
- if (splice_pipe_pool[i][0] >= 0) {
- SWAP(conn->pipe[1][0], splice_pipe_pool[i][0]);
- SWAP(conn->pipe[1][1], splice_pipe_pool[i][1]);
- break;
- }
- }
- if (conn->pipe[1][0] < 0) {
- if (pipe2(conn->pipe[1], O_NONBLOCK | O_CLOEXEC)) {
- err("TCP (spliced): cannot create 1->0 pipe: %s",
- strerror(errno));
- conn_flag(c, conn, CLOSING);
- return -EIO;
- }
+ if (conn->pipe[side][0] < 0) {
+ if (pipe2(conn->pipe[side], O_NONBLOCK | O_CLOEXEC)) {
+ err("TCP (spliced): cannot create %d->%d pipe: %s",
+ side, !side, strerror(errno));
+ conn_flag(c, conn, CLOSING);
+ return -EIO;
+ }
- if (fcntl(conn->pipe[1][0], F_SETPIPE_SZ, c->tcp.pipe_size)) {
- trace("TCP (spliced): cannot set 1->0 pipe size to %lu",
- c->tcp.pipe_size);
+ if (fcntl(conn->pipe[side][0], F_SETPIPE_SZ,
+ c->tcp.pipe_size)) {
+ trace("TCP (spliced): cannot set %d->%d pipe size to %lu",
+ side, !side, c->tcp.pipe_size);
+ }
}
}
--
2.41.0
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 10/11] tcp_splice: Exploit side symmetry in tcp_splice_destroy()
2023-10-12 1:51 [PATCH 00/11] tcp_splice: Better exploit symmetry between sides of connection David Gibson
` (8 preceding siblings ...)
2023-10-12 1:51 ` [PATCH 09/11] tcp_splice: Exploit side symmetry in tcp_splice_connect_finish() David Gibson
@ 2023-10-12 1:51 ` David Gibson
2023-11-03 16:22 ` Stefano Brivio
2023-10-12 1:51 ` [PATCH 11/11] tcp_splice: Simplify selection of socket and pipe sides in socket handler David Gibson
10 siblings, 1 reply; 23+ messages in thread
From: David Gibson @ 2023-10-12 1:51 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
tcp_splice_destroy() has some close-to-duplicated logic handling closing of
the socket and ipies for each side of the connection. We can use a loop
across the sides to reduce the duplication.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
tcp_splice.c | 32 ++++++++++++++------------------
1 file changed, 14 insertions(+), 18 deletions(-)
diff --git a/tcp_splice.c b/tcp_splice.c
index 99ef8a4..239f6d2 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -258,30 +258,26 @@ void tcp_splice_conn_update(const struct ctx *c, struct tcp_splice_conn *new)
void tcp_splice_destroy(struct ctx *c, union tcp_conn *conn_union)
{
struct tcp_splice_conn *conn = &conn_union->splice;
+ int side;
- if (conn->events & SPLICE_ESTABLISHED) {
- /* Flushing might need to block: don't recycle them. */
- if (conn->pipe[0][0] != -1) {
- close(conn->pipe[0][0]);
- close(conn->pipe[0][1]);
- conn->pipe[0][0] = conn->pipe[0][1] = -1;
+ for (side = 0; side < SIDES; side++) {
+ if (conn->events & SPLICE_ESTABLISHED) {
+ /* Flushing might need to block: don't recycle them. */
+ if (conn->pipe[side][0] != -1) {
+ close(conn->pipe[side][0]);
+ close(conn->pipe[side][1]);
+ conn->pipe[side][0] = conn->pipe[side][1] = -1;
+ }
}
- if (conn->pipe[1][0] != -1) {
- close(conn->pipe[1][0]);
- close(conn->pipe[1][1]);
- conn->pipe[1][0] = conn->pipe[1][1] = -1;
+
+ if (side == 0 || conn->events & SPLICE_CONNECT) {
+ close(conn->s[side]);
+ conn->s[side] = -1;
}
- }
- if (conn->events & SPLICE_CONNECT) {
- close(conn->s[1]);
- conn->s[1] = -1;
+ conn->read[side] = conn->written[side] = 0;
}
- close(conn->s[0]);
- conn->s[0] = -1;
- conn->read[0] = conn->written[0] = conn->read[1] = conn->written[1] = 0;
-
conn->events = SPLICE_CLOSED;
conn->flags = 0;
debug("TCP (spliced): index %li, CLOSED", CONN_IDX(conn));
--
@@ -258,30 +258,26 @@ void tcp_splice_conn_update(const struct ctx *c, struct tcp_splice_conn *new)
void tcp_splice_destroy(struct ctx *c, union tcp_conn *conn_union)
{
struct tcp_splice_conn *conn = &conn_union->splice;
+ int side;
- if (conn->events & SPLICE_ESTABLISHED) {
- /* Flushing might need to block: don't recycle them. */
- if (conn->pipe[0][0] != -1) {
- close(conn->pipe[0][0]);
- close(conn->pipe[0][1]);
- conn->pipe[0][0] = conn->pipe[0][1] = -1;
+ for (side = 0; side < SIDES; side++) {
+ if (conn->events & SPLICE_ESTABLISHED) {
+ /* Flushing might need to block: don't recycle them. */
+ if (conn->pipe[side][0] != -1) {
+ close(conn->pipe[side][0]);
+ close(conn->pipe[side][1]);
+ conn->pipe[side][0] = conn->pipe[side][1] = -1;
+ }
}
- if (conn->pipe[1][0] != -1) {
- close(conn->pipe[1][0]);
- close(conn->pipe[1][1]);
- conn->pipe[1][0] = conn->pipe[1][1] = -1;
+
+ if (side == 0 || conn->events & SPLICE_CONNECT) {
+ close(conn->s[side]);
+ conn->s[side] = -1;
}
- }
- if (conn->events & SPLICE_CONNECT) {
- close(conn->s[1]);
- conn->s[1] = -1;
+ conn->read[side] = conn->written[side] = 0;
}
- close(conn->s[0]);
- conn->s[0] = -1;
- conn->read[0] = conn->written[0] = conn->read[1] = conn->written[1] = 0;
-
conn->events = SPLICE_CLOSED;
conn->flags = 0;
debug("TCP (spliced): index %li, CLOSED", CONN_IDX(conn));
--
2.41.0
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 10/11] tcp_splice: Exploit side symmetry in tcp_splice_destroy()
2023-10-12 1:51 ` [PATCH 10/11] tcp_splice: Exploit side symmetry in tcp_splice_destroy() David Gibson
@ 2023-11-03 16:22 ` Stefano Brivio
2023-11-06 2:39 ` David Gibson
0 siblings, 1 reply; 23+ messages in thread
From: Stefano Brivio @ 2023-11-03 16:22 UTC (permalink / raw)
To: David Gibson; +Cc: passt-dev
On Thu, 12 Oct 2023 12:51:13 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:
> tcp_splice_destroy() has some close-to-duplicated logic handling closing of
> the socket and ipies for each side of the connection. We can use a loop
^^^^^ pipes
> across the sides to reduce the duplication.
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> tcp_splice.c | 32 ++++++++++++++------------------
> 1 file changed, 14 insertions(+), 18 deletions(-)
>
> diff --git a/tcp_splice.c b/tcp_splice.c
> index 99ef8a4..239f6d2 100644
> --- a/tcp_splice.c
> +++ b/tcp_splice.c
> @@ -258,30 +258,26 @@ void tcp_splice_conn_update(const struct ctx *c, struct tcp_splice_conn *new)
> void tcp_splice_destroy(struct ctx *c, union tcp_conn *conn_union)
> {
> struct tcp_splice_conn *conn = &conn_union->splice;
> + int side;
>
> - if (conn->events & SPLICE_ESTABLISHED) {
> - /* Flushing might need to block: don't recycle them. */
> - if (conn->pipe[0][0] != -1) {
> - close(conn->pipe[0][0]);
> - close(conn->pipe[0][1]);
> - conn->pipe[0][0] = conn->pipe[0][1] = -1;
> + for (side = 0; side < SIDES; side++) {
> + if (conn->events & SPLICE_ESTABLISHED) {
> + /* Flushing might need to block: don't recycle them. */
> + if (conn->pipe[side][0] != -1) {
> + close(conn->pipe[side][0]);
> + close(conn->pipe[side][1]);
> + conn->pipe[side][0] = conn->pipe[side][1] = -1;
> + }
> }
> - if (conn->pipe[1][0] != -1) {
> - close(conn->pipe[1][0]);
> - close(conn->pipe[1][1]);
> - conn->pipe[1][0] = conn->pipe[1][1] = -1;
> +
> + if (side == 0 || conn->events & SPLICE_CONNECT) {
> + close(conn->s[side]);
> + conn->s[side] = -1;
> }
> - }
>
> - if (conn->events & SPLICE_CONNECT) {
> - close(conn->s[1]);
> - conn->s[1] = -1;
> + conn->read[side] = conn->written[side] = 0;
> }
>
> - close(conn->s[0]);
> - conn->s[0] = -1;
> - conn->read[0] = conn->written[0] = conn->read[1] = conn->written[1] = 0;
With this, on SPLICE_CONNECT, we would close the [0] side, but not the
[1] side. SPLICE_CONNECT means we already have an open socket for [1],
though. I think it should be:
[loop on sides]
if (side == 1 || conn->events & SPLICE_CONNECT) {
close(conn->s[side]);
conn->s[1] = -1;
}
}
and then we still need to unconditionally close conn->s[0]. Perhaps we could
take both parts outside of the loop:
if (conn->events & SPLICE_CONNECT) {
close(conn->s[1]);
conn->s[1] = -1;
}
close(conn->s[0]);
conn->s[0] = -1;
conn->read[side] = conn->written[side] = 0;
The handling for the SPLICE_ESTABLISHED case looks correct to me.
--
Stefano
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 10/11] tcp_splice: Exploit side symmetry in tcp_splice_destroy()
2023-11-03 16:22 ` Stefano Brivio
@ 2023-11-06 2:39 ` David Gibson
2023-11-06 13:21 ` Stefano Brivio
0 siblings, 1 reply; 23+ messages in thread
From: David Gibson @ 2023-11-06 2:39 UTC (permalink / raw)
To: Stefano Brivio; +Cc: passt-dev
[-- Attachment #1: Type: text/plain, Size: 3553 bytes --]
On Fri, Nov 03, 2023 at 05:22:13PM +0100, Stefano Brivio wrote:
> On Thu, 12 Oct 2023 12:51:13 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
>
> > tcp_splice_destroy() has some close-to-duplicated logic handling closing of
> > the socket and ipies for each side of the connection. We can use a loop
> ^^^^^ pipes
Oops, fixed.
> > across the sides to reduce the duplication.
> >
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> > tcp_splice.c | 32 ++++++++++++++------------------
> > 1 file changed, 14 insertions(+), 18 deletions(-)
> >
> > diff --git a/tcp_splice.c b/tcp_splice.c
> > index 99ef8a4..239f6d2 100644
> > --- a/tcp_splice.c
> > +++ b/tcp_splice.c
> > @@ -258,30 +258,26 @@ void tcp_splice_conn_update(const struct ctx *c, struct tcp_splice_conn *new)
> > void tcp_splice_destroy(struct ctx *c, union tcp_conn *conn_union)
> > {
> > struct tcp_splice_conn *conn = &conn_union->splice;
> > + int side;
> >
> > - if (conn->events & SPLICE_ESTABLISHED) {
> > - /* Flushing might need to block: don't recycle them. */
> > - if (conn->pipe[0][0] != -1) {
> > - close(conn->pipe[0][0]);
> > - close(conn->pipe[0][1]);
> > - conn->pipe[0][0] = conn->pipe[0][1] = -1;
> > + for (side = 0; side < SIDES; side++) {
> > + if (conn->events & SPLICE_ESTABLISHED) {
> > + /* Flushing might need to block: don't recycle them. */
> > + if (conn->pipe[side][0] != -1) {
> > + close(conn->pipe[side][0]);
> > + close(conn->pipe[side][1]);
> > + conn->pipe[side][0] = conn->pipe[side][1] = -1;
> > + }
> > }
> > - if (conn->pipe[1][0] != -1) {
> > - close(conn->pipe[1][0]);
> > - close(conn->pipe[1][1]);
> > - conn->pipe[1][0] = conn->pipe[1][1] = -1;
> > +
> > + if (side == 0 || conn->events & SPLICE_CONNECT) {
> > + close(conn->s[side]);
> > + conn->s[side] = -1;
> > }
> > - }
> >
> > - if (conn->events & SPLICE_CONNECT) {
> > - close(conn->s[1]);
> > - conn->s[1] = -1;
> > + conn->read[side] = conn->written[side] = 0;
> > }
> >
> > - close(conn->s[0]);
> > - conn->s[0] = -1;
> > - conn->read[0] = conn->written[0] = conn->read[1] = conn->written[1] = 0;
>
> With this, on SPLICE_CONNECT, we would close the [0] side, but not the
> [1] side. SPLICE_CONNECT means we already have an open socket for [1],
> though. I think it should be:
>
> [loop on sides]
>
> if (side == 1 || conn->events & SPLICE_CONNECT) {
> close(conn->s[side]);
> conn->s[1] = -1;
> }
> }
>
> and then we still need to unconditionally close conn->s[0]. Perhaps we could
> take both parts outside of the loop:
Uh.. I think you're misreading. In the updated code we have:
if (side == 0 || conn->events & SPLICE_CONNECT) {
close(conn->s[side]);
conn->s[side] = -1;
}
That's an OR, so we always close side 0, and we close side 1 iff we
have SPLICE_CONNECT, which matches what you're describing.
> if (conn->events & SPLICE_CONNECT) {
> close(conn->s[1]);
> conn->s[1] = -1;
> }
>
> close(conn->s[0]);
> conn->s[0] = -1;
> conn->read[side] = conn->written[side] = 0;
>
> The handling for the SPLICE_ESTABLISHED case looks correct to me.
We are relying on the fact that setting SPLICE_ESTABLISHED doesn't
clear SPLICE_CONNECT.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 10/11] tcp_splice: Exploit side symmetry in tcp_splice_destroy()
2023-11-06 2:39 ` David Gibson
@ 2023-11-06 13:21 ` Stefano Brivio
0 siblings, 0 replies; 23+ messages in thread
From: Stefano Brivio @ 2023-11-06 13:21 UTC (permalink / raw)
To: David Gibson; +Cc: passt-dev
On Mon, 6 Nov 2023 13:39:14 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:
> On Fri, Nov 03, 2023 at 05:22:13PM +0100, Stefano Brivio wrote:
> > On Thu, 12 Oct 2023 12:51:13 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >
> > > tcp_splice_destroy() has some close-to-duplicated logic handling closing of
> > > the socket and ipies for each side of the connection. We can use a loop
> > ^^^^^ pipes
>
> Oops, fixed.
>
> > > across the sides to reduce the duplication.
> > >
> > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > > ---
> > > tcp_splice.c | 32 ++++++++++++++------------------
> > > 1 file changed, 14 insertions(+), 18 deletions(-)
> > >
> > > diff --git a/tcp_splice.c b/tcp_splice.c
> > > index 99ef8a4..239f6d2 100644
> > > --- a/tcp_splice.c
> > > +++ b/tcp_splice.c
> > > @@ -258,30 +258,26 @@ void tcp_splice_conn_update(const struct ctx *c, struct tcp_splice_conn *new)
> > > void tcp_splice_destroy(struct ctx *c, union tcp_conn *conn_union)
> > > {
> > > struct tcp_splice_conn *conn = &conn_union->splice;
> > > + int side;
> > >
> > > - if (conn->events & SPLICE_ESTABLISHED) {
> > > - /* Flushing might need to block: don't recycle them. */
> > > - if (conn->pipe[0][0] != -1) {
> > > - close(conn->pipe[0][0]);
> > > - close(conn->pipe[0][1]);
> > > - conn->pipe[0][0] = conn->pipe[0][1] = -1;
> > > + for (side = 0; side < SIDES; side++) {
> > > + if (conn->events & SPLICE_ESTABLISHED) {
> > > + /* Flushing might need to block: don't recycle them. */
> > > + if (conn->pipe[side][0] != -1) {
> > > + close(conn->pipe[side][0]);
> > > + close(conn->pipe[side][1]);
> > > + conn->pipe[side][0] = conn->pipe[side][1] = -1;
> > > + }
> > > }
> > > - if (conn->pipe[1][0] != -1) {
> > > - close(conn->pipe[1][0]);
> > > - close(conn->pipe[1][1]);
> > > - conn->pipe[1][0] = conn->pipe[1][1] = -1;
> > > +
> > > + if (side == 0 || conn->events & SPLICE_CONNECT) {
> > > + close(conn->s[side]);
> > > + conn->s[side] = -1;
> > > }
> > > - }
> > >
> > > - if (conn->events & SPLICE_CONNECT) {
> > > - close(conn->s[1]);
> > > - conn->s[1] = -1;
> > > + conn->read[side] = conn->written[side] = 0;
> > > }
> > >
> > > - close(conn->s[0]);
> > > - conn->s[0] = -1;
> > > - conn->read[0] = conn->written[0] = conn->read[1] = conn->written[1] = 0;
> >
> > With this, on SPLICE_CONNECT, we would close the [0] side, but not the
> > [1] side. SPLICE_CONNECT means we already have an open socket for [1],
> > though. I think it should be:
> >
> > [loop on sides]
> >
> > if (side == 1 || conn->events & SPLICE_CONNECT) {
> > close(conn->s[side]);
> > conn->s[1] = -1;
> > }
> > }
> >
> > and then we still need to unconditionally close conn->s[0]. Perhaps we could
> > take both parts outside of the loop:
>
> Uh.. I think you're misreading. In the updated code we have:
> if (side == 0 || conn->events & SPLICE_CONNECT) {
> close(conn->s[side]);
> conn->s[side] = -1;
> }
> That's an OR, so we always close side 0, and we close side 1 iff we
> have SPLICE_CONNECT, which matches what you're describing.
Gosh, yes, sorry, I read && for some reason.
--
Stefano
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 11/11] tcp_splice: Simplify selection of socket and pipe sides in socket handler
2023-10-12 1:51 [PATCH 00/11] tcp_splice: Better exploit symmetry between sides of connection David Gibson
` (9 preceding siblings ...)
2023-10-12 1:51 ` [PATCH 10/11] tcp_splice: Exploit side symmetry in tcp_splice_destroy() David Gibson
@ 2023-10-12 1:51 ` David Gibson
2023-11-03 16:21 ` Stefano Brivio
10 siblings, 1 reply; 23+ messages in thread
From: David Gibson @ 2023-10-12 1:51 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
tcp_splice_sock_handler() uses the tcp_splice_dir() helper to select
which of the socket, pipe and counter fields to use depending on which
side of the connection the socket event is coming from.
Now that we are using arrays for the two sides, rather than separate named
fields, we can instead just use a variable indicating the side and use
that to index the arrays whever we need a particular side's field.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
tcp_splice.c | 81 ++++++++++++++--------------------------------------
1 file changed, 22 insertions(+), 59 deletions(-)
diff --git a/tcp_splice.c b/tcp_splice.c
index 239f6d2..822d15a 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -438,29 +438,6 @@ static int tcp_splice_new(const struct ctx *c, struct tcp_splice_conn *conn,
return tcp_splice_connect(c, conn, s, port);
}
-/**
- * tcp_splice_dir() - Set sockets/pipe pointers reflecting flow direction
- * @conn: Connection pointers
- * @ref_sock: Socket returned as reference from epoll
- * @reverse: Reverse direction: @ref_sock is used as destination
- * @from: Destination socket pointer to set
- * @to: Source socket pointer to set
- * @pipes: Pipe set, assigned on return
- */
-static void tcp_splice_dir(struct tcp_splice_conn *conn, int ref_sock,
- int reverse, int *from, int *to, int **pipes)
-{
- if (!reverse) {
- *from = ref_sock;
- *to = (*from == conn->s[0]) ? conn->s[1] : conn->s[0];
- } else {
- *to = ref_sock;
- *from = (*to == conn->s[0]) ? conn->s[1] : conn->s[0];
- }
-
- *pipes = *from == conn->s[0] ? conn->pipe[0] : conn->pipe[1];
-}
-
/**
* tcp_splice_conn_from_sock() - Attempt to init state for a spliced connection
* @c: Execution context
@@ -521,8 +498,7 @@ void tcp_splice_sock_handler(struct ctx *c, struct tcp_splice_conn *conn,
int s, uint32_t events)
{
uint8_t lowat_set_flag, lowat_act_flag;
- int from, to, *pipes, eof, never_read;
- uint32_t *seq_read, *seq_write;
+ int fromside, eof, never_read;
if (conn->events == SPLICE_CLOSED)
return;
@@ -538,14 +514,15 @@ void tcp_splice_sock_handler(struct ctx *c, struct tcp_splice_conn *conn,
}
if (events & EPOLLOUT) {
- if (s == conn->s[0])
+ if (s == conn->s[0]) {
conn_event(c, conn, ~OUT_WAIT_0);
- else
+ fromside = 1;
+ } else {
conn_event(c, conn, ~OUT_WAIT_1);
-
- tcp_splice_dir(conn, s, 1, &from, &to, &pipes);
+ fromside = 0;
+ }
} else {
- tcp_splice_dir(conn, s, 0, &from, &to, &pipes);
+ fromside = s == conn->s[0] ? 0 : 1;
}
if (events & EPOLLRDHUP) {
@@ -566,24 +543,16 @@ swap:
eof = 0;
never_read = 1;
- if (from == conn->s[0]) {
- seq_read = &conn->read[0];
- seq_write = &conn->written[0];
- lowat_set_flag = RCVLOWAT_SET_0;
- lowat_act_flag = RCVLOWAT_ACT_0;
- } else {
- seq_read = &conn->read[1];
- seq_write = &conn->written[1];
- lowat_set_flag = RCVLOWAT_SET_1;
- lowat_act_flag = RCVLOWAT_ACT_1;
- }
+ lowat_set_flag = fromside == 0 ? RCVLOWAT_SET_0 : RCVLOWAT_SET_1;
+ lowat_act_flag = fromside == 0 ? RCVLOWAT_ACT_0 : RCVLOWAT_ACT_1;
while (1) {
ssize_t readlen, to_write = 0, written;
int more = 0;
retry:
- readlen = splice(from, NULL, pipes[1], NULL, c->tcp.pipe_size,
+ readlen = splice(conn->s[fromside], NULL,
+ conn->pipe[fromside][1], NULL, c->tcp.pipe_size,
SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
trace("TCP (spliced): %li from read-side call", readlen);
if (readlen < 0) {
@@ -608,7 +577,8 @@ retry:
}
eintr:
- written = splice(pipes[0], NULL, to, NULL, to_write,
+ written = splice(conn->pipe[fromside][0], NULL,
+ conn->s[!fromside], NULL, to_write,
SPLICE_F_MOVE | more | SPLICE_F_NONBLOCK);
trace("TCP (spliced): %li from write-side call (passed %lu)",
written, to_write);
@@ -622,8 +592,8 @@ eintr:
readlen > (long)c->tcp.pipe_size / 10) {
int lowat = c->tcp.pipe_size / 4;
- setsockopt(from, SOL_SOCKET, SO_RCVLOWAT,
- &lowat, sizeof(lowat));
+ setsockopt(conn->s[fromside], SOL_SOCKET,
+ SO_RCVLOWAT, &lowat, sizeof(lowat));
conn_flag(c, conn, lowat_set_flag);
conn_flag(c, conn, lowat_act_flag);
@@ -632,8 +602,8 @@ eintr:
break;
}
- *seq_read += readlen > 0 ? readlen : 0;
- *seq_write += written > 0 ? written : 0;
+ conn->read[fromside] += readlen > 0 ? readlen : 0;
+ conn->written[fromside] += written > 0 ? written : 0;
if (written < 0) {
if (errno == EINTR)
@@ -645,10 +615,8 @@ eintr:
if (never_read)
break;
- if (to == conn->s[0])
- conn_event(c, conn, OUT_WAIT_0);
- else
- conn_event(c, conn, OUT_WAIT_1);
+ conn_event(c, conn,
+ fromside == 0 ? OUT_WAIT_1 : OUT_WAIT_0);
break;
}
@@ -665,14 +633,14 @@ eintr:
}
if ((conn->events & FIN_RCVD_0) && !(conn->events & FIN_SENT_1)) {
- if (*seq_read == *seq_write && eof) {
+ if (conn->read[fromside] == conn->written[fromside] && eof) {
shutdown(conn->s[1], SHUT_WR);
conn_event(c, conn, FIN_SENT_1);
}
}
if ((conn->events & FIN_RCVD_1) && !(conn->events & FIN_SENT_0)) {
- if (*seq_read == *seq_write && eof) {
+ if (conn->read[fromside] == conn->written[fromside] && eof) {
shutdown(conn->s[0], SHUT_WR);
conn_event(c, conn, FIN_SENT_0);
}
@@ -684,12 +652,7 @@ eintr:
if ((events & (EPOLLIN | EPOLLOUT)) == (EPOLLIN | EPOLLOUT)) {
events = EPOLLIN;
- SWAP(from, to);
- if (pipes == conn->pipe[0])
- pipes = conn->pipe[1];
- else
- pipes = conn->pipe[0];
-
+ fromside = !fromside;
goto swap;
}
--
@@ -438,29 +438,6 @@ static int tcp_splice_new(const struct ctx *c, struct tcp_splice_conn *conn,
return tcp_splice_connect(c, conn, s, port);
}
-/**
- * tcp_splice_dir() - Set sockets/pipe pointers reflecting flow direction
- * @conn: Connection pointers
- * @ref_sock: Socket returned as reference from epoll
- * @reverse: Reverse direction: @ref_sock is used as destination
- * @from: Destination socket pointer to set
- * @to: Source socket pointer to set
- * @pipes: Pipe set, assigned on return
- */
-static void tcp_splice_dir(struct tcp_splice_conn *conn, int ref_sock,
- int reverse, int *from, int *to, int **pipes)
-{
- if (!reverse) {
- *from = ref_sock;
- *to = (*from == conn->s[0]) ? conn->s[1] : conn->s[0];
- } else {
- *to = ref_sock;
- *from = (*to == conn->s[0]) ? conn->s[1] : conn->s[0];
- }
-
- *pipes = *from == conn->s[0] ? conn->pipe[0] : conn->pipe[1];
-}
-
/**
* tcp_splice_conn_from_sock() - Attempt to init state for a spliced connection
* @c: Execution context
@@ -521,8 +498,7 @@ void tcp_splice_sock_handler(struct ctx *c, struct tcp_splice_conn *conn,
int s, uint32_t events)
{
uint8_t lowat_set_flag, lowat_act_flag;
- int from, to, *pipes, eof, never_read;
- uint32_t *seq_read, *seq_write;
+ int fromside, eof, never_read;
if (conn->events == SPLICE_CLOSED)
return;
@@ -538,14 +514,15 @@ void tcp_splice_sock_handler(struct ctx *c, struct tcp_splice_conn *conn,
}
if (events & EPOLLOUT) {
- if (s == conn->s[0])
+ if (s == conn->s[0]) {
conn_event(c, conn, ~OUT_WAIT_0);
- else
+ fromside = 1;
+ } else {
conn_event(c, conn, ~OUT_WAIT_1);
-
- tcp_splice_dir(conn, s, 1, &from, &to, &pipes);
+ fromside = 0;
+ }
} else {
- tcp_splice_dir(conn, s, 0, &from, &to, &pipes);
+ fromside = s == conn->s[0] ? 0 : 1;
}
if (events & EPOLLRDHUP) {
@@ -566,24 +543,16 @@ swap:
eof = 0;
never_read = 1;
- if (from == conn->s[0]) {
- seq_read = &conn->read[0];
- seq_write = &conn->written[0];
- lowat_set_flag = RCVLOWAT_SET_0;
- lowat_act_flag = RCVLOWAT_ACT_0;
- } else {
- seq_read = &conn->read[1];
- seq_write = &conn->written[1];
- lowat_set_flag = RCVLOWAT_SET_1;
- lowat_act_flag = RCVLOWAT_ACT_1;
- }
+ lowat_set_flag = fromside == 0 ? RCVLOWAT_SET_0 : RCVLOWAT_SET_1;
+ lowat_act_flag = fromside == 0 ? RCVLOWAT_ACT_0 : RCVLOWAT_ACT_1;
while (1) {
ssize_t readlen, to_write = 0, written;
int more = 0;
retry:
- readlen = splice(from, NULL, pipes[1], NULL, c->tcp.pipe_size,
+ readlen = splice(conn->s[fromside], NULL,
+ conn->pipe[fromside][1], NULL, c->tcp.pipe_size,
SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
trace("TCP (spliced): %li from read-side call", readlen);
if (readlen < 0) {
@@ -608,7 +577,8 @@ retry:
}
eintr:
- written = splice(pipes[0], NULL, to, NULL, to_write,
+ written = splice(conn->pipe[fromside][0], NULL,
+ conn->s[!fromside], NULL, to_write,
SPLICE_F_MOVE | more | SPLICE_F_NONBLOCK);
trace("TCP (spliced): %li from write-side call (passed %lu)",
written, to_write);
@@ -622,8 +592,8 @@ eintr:
readlen > (long)c->tcp.pipe_size / 10) {
int lowat = c->tcp.pipe_size / 4;
- setsockopt(from, SOL_SOCKET, SO_RCVLOWAT,
- &lowat, sizeof(lowat));
+ setsockopt(conn->s[fromside], SOL_SOCKET,
+ SO_RCVLOWAT, &lowat, sizeof(lowat));
conn_flag(c, conn, lowat_set_flag);
conn_flag(c, conn, lowat_act_flag);
@@ -632,8 +602,8 @@ eintr:
break;
}
- *seq_read += readlen > 0 ? readlen : 0;
- *seq_write += written > 0 ? written : 0;
+ conn->read[fromside] += readlen > 0 ? readlen : 0;
+ conn->written[fromside] += written > 0 ? written : 0;
if (written < 0) {
if (errno == EINTR)
@@ -645,10 +615,8 @@ eintr:
if (never_read)
break;
- if (to == conn->s[0])
- conn_event(c, conn, OUT_WAIT_0);
- else
- conn_event(c, conn, OUT_WAIT_1);
+ conn_event(c, conn,
+ fromside == 0 ? OUT_WAIT_1 : OUT_WAIT_0);
break;
}
@@ -665,14 +633,14 @@ eintr:
}
if ((conn->events & FIN_RCVD_0) && !(conn->events & FIN_SENT_1)) {
- if (*seq_read == *seq_write && eof) {
+ if (conn->read[fromside] == conn->written[fromside] && eof) {
shutdown(conn->s[1], SHUT_WR);
conn_event(c, conn, FIN_SENT_1);
}
}
if ((conn->events & FIN_RCVD_1) && !(conn->events & FIN_SENT_0)) {
- if (*seq_read == *seq_write && eof) {
+ if (conn->read[fromside] == conn->written[fromside] && eof) {
shutdown(conn->s[0], SHUT_WR);
conn_event(c, conn, FIN_SENT_0);
}
@@ -684,12 +652,7 @@ eintr:
if ((events & (EPOLLIN | EPOLLOUT)) == (EPOLLIN | EPOLLOUT)) {
events = EPOLLIN;
- SWAP(from, to);
- if (pipes == conn->pipe[0])
- pipes = conn->pipe[1];
- else
- pipes = conn->pipe[0];
-
+ fromside = !fromside;
goto swap;
}
--
2.41.0
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 11/11] tcp_splice: Simplify selection of socket and pipe sides in socket handler
2023-10-12 1:51 ` [PATCH 11/11] tcp_splice: Simplify selection of socket and pipe sides in socket handler David Gibson
@ 2023-11-03 16:21 ` Stefano Brivio
2023-11-04 6:02 ` David Gibson
0 siblings, 1 reply; 23+ messages in thread
From: Stefano Brivio @ 2023-11-03 16:21 UTC (permalink / raw)
To: David Gibson; +Cc: passt-dev
On Thu, 12 Oct 2023 12:51:14 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:
> tcp_splice_sock_handler() uses the tcp_splice_dir() helper to select
> which of the socket, pipe and counter fields to use depending on which
> side of the connection the socket event is coming from.
>
> Now that we are using arrays for the two sides, rather than separate named
> fields, we can instead just use a variable indicating the side and use
> that to index the arrays whever we need a particular side's field.
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> tcp_splice.c | 81 ++++++++++++++--------------------------------------
> 1 file changed, 22 insertions(+), 59 deletions(-)
>
> diff --git a/tcp_splice.c b/tcp_splice.c
> index 239f6d2..822d15a 100644
> --- a/tcp_splice.c
> +++ b/tcp_splice.c
> @@ -438,29 +438,6 @@ static int tcp_splice_new(const struct ctx *c, struct tcp_splice_conn *conn,
> return tcp_splice_connect(c, conn, s, port);
> }
>
> -/**
> - * tcp_splice_dir() - Set sockets/pipe pointers reflecting flow direction
> - * @conn: Connection pointers
> - * @ref_sock: Socket returned as reference from epoll
> - * @reverse: Reverse direction: @ref_sock is used as destination
> - * @from: Destination socket pointer to set
> - * @to: Source socket pointer to set
> - * @pipes: Pipe set, assigned on return
> - */
> -static void tcp_splice_dir(struct tcp_splice_conn *conn, int ref_sock,
> - int reverse, int *from, int *to, int **pipes)
> -{
> - if (!reverse) {
> - *from = ref_sock;
> - *to = (*from == conn->s[0]) ? conn->s[1] : conn->s[0];
> - } else {
> - *to = ref_sock;
> - *from = (*to == conn->s[0]) ? conn->s[1] : conn->s[0];
> - }
> -
> - *pipes = *from == conn->s[0] ? conn->pipe[0] : conn->pipe[1];
> -}
> -
> /**
> * tcp_splice_conn_from_sock() - Attempt to init state for a spliced connection
> * @c: Execution context
> @@ -521,8 +498,7 @@ void tcp_splice_sock_handler(struct ctx *c, struct tcp_splice_conn *conn,
> int s, uint32_t events)
> {
> uint8_t lowat_set_flag, lowat_act_flag;
> - int from, to, *pipes, eof, never_read;
> - uint32_t *seq_read, *seq_write;
> + int fromside, eof, never_read;
>
> if (conn->events == SPLICE_CLOSED)
> return;
> @@ -538,14 +514,15 @@ void tcp_splice_sock_handler(struct ctx *c, struct tcp_splice_conn *conn,
> }
>
> if (events & EPOLLOUT) {
> - if (s == conn->s[0])
> + if (s == conn->s[0]) {
> conn_event(c, conn, ~OUT_WAIT_0);
> - else
> + fromside = 1;
> + } else {
> conn_event(c, conn, ~OUT_WAIT_1);
> -
> - tcp_splice_dir(conn, s, 1, &from, &to, &pipes);
> + fromside = 0;
> + }
> } else {
> - tcp_splice_dir(conn, s, 0, &from, &to, &pipes);
> + fromside = s == conn->s[0] ? 0 : 1;
> }
>
> if (events & EPOLLRDHUP) {
> @@ -566,24 +543,16 @@ swap:
> eof = 0;
> never_read = 1;
>
> - if (from == conn->s[0]) {
> - seq_read = &conn->read[0];
> - seq_write = &conn->written[0];
> - lowat_set_flag = RCVLOWAT_SET_0;
> - lowat_act_flag = RCVLOWAT_ACT_0;
> - } else {
> - seq_read = &conn->read[1];
> - seq_write = &conn->written[1];
> - lowat_set_flag = RCVLOWAT_SET_1;
> - lowat_act_flag = RCVLOWAT_ACT_1;
> - }
> + lowat_set_flag = fromside == 0 ? RCVLOWAT_SET_0 : RCVLOWAT_SET_1;
> + lowat_act_flag = fromside == 0 ? RCVLOWAT_ACT_0 : RCVLOWAT_ACT_1;
>
> while (1) {
> ssize_t readlen, to_write = 0, written;
> int more = 0;
>
> retry:
> - readlen = splice(from, NULL, pipes[1], NULL, c->tcp.pipe_size,
> + readlen = splice(conn->s[fromside], NULL,
> + conn->pipe[fromside][1], NULL, c->tcp.pipe_size,
> SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
> trace("TCP (spliced): %li from read-side call", readlen);
> if (readlen < 0) {
> @@ -608,7 +577,8 @@ retry:
> }
>
> eintr:
> - written = splice(pipes[0], NULL, to, NULL, to_write,
> + written = splice(conn->pipe[fromside][0], NULL,
> + conn->s[!fromside], NULL, to_write,
> SPLICE_F_MOVE | more | SPLICE_F_NONBLOCK);
> trace("TCP (spliced): %li from write-side call (passed %lu)",
> written, to_write);
> @@ -622,8 +592,8 @@ eintr:
> readlen > (long)c->tcp.pipe_size / 10) {
> int lowat = c->tcp.pipe_size / 4;
>
> - setsockopt(from, SOL_SOCKET, SO_RCVLOWAT,
> - &lowat, sizeof(lowat));
> + setsockopt(conn->s[fromside], SOL_SOCKET,
> + SO_RCVLOWAT, &lowat, sizeof(lowat));
>
> conn_flag(c, conn, lowat_set_flag);
> conn_flag(c, conn, lowat_act_flag);
> @@ -632,8 +602,8 @@ eintr:
> break;
> }
>
> - *seq_read += readlen > 0 ? readlen : 0;
> - *seq_write += written > 0 ? written : 0;
> + conn->read[fromside] += readlen > 0 ? readlen : 0;
> + conn->written[fromside] += written > 0 ? written : 0;
The extra whitespace had the function of making this look more
"tabular", now you should add a further one (or drop it altogether
depending on taste).
--
Stefano
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 11/11] tcp_splice: Simplify selection of socket and pipe sides in socket handler
2023-11-03 16:21 ` Stefano Brivio
@ 2023-11-04 6:02 ` David Gibson
0 siblings, 0 replies; 23+ messages in thread
From: David Gibson @ 2023-11-04 6:02 UTC (permalink / raw)
To: Stefano Brivio; +Cc: passt-dev
[-- Attachment #1: Type: text/plain, Size: 5644 bytes --]
On Fri, Nov 03, 2023 at 05:21:12PM +0100, Stefano Brivio wrote:
> On Thu, 12 Oct 2023 12:51:14 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
>
> > tcp_splice_sock_handler() uses the tcp_splice_dir() helper to select
> > which of the socket, pipe and counter fields to use depending on which
> > side of the connection the socket event is coming from.
> >
> > Now that we are using arrays for the two sides, rather than separate named
> > fields, we can instead just use a variable indicating the side and use
> > that to index the arrays whever we need a particular side's field.
> >
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> > tcp_splice.c | 81 ++++++++++++++--------------------------------------
> > 1 file changed, 22 insertions(+), 59 deletions(-)
> >
> > diff --git a/tcp_splice.c b/tcp_splice.c
> > index 239f6d2..822d15a 100644
> > --- a/tcp_splice.c
> > +++ b/tcp_splice.c
> > @@ -438,29 +438,6 @@ static int tcp_splice_new(const struct ctx *c, struct tcp_splice_conn *conn,
> > return tcp_splice_connect(c, conn, s, port);
> > }
> >
> > -/**
> > - * tcp_splice_dir() - Set sockets/pipe pointers reflecting flow direction
> > - * @conn: Connection pointers
> > - * @ref_sock: Socket returned as reference from epoll
> > - * @reverse: Reverse direction: @ref_sock is used as destination
> > - * @from: Destination socket pointer to set
> > - * @to: Source socket pointer to set
> > - * @pipes: Pipe set, assigned on return
> > - */
> > -static void tcp_splice_dir(struct tcp_splice_conn *conn, int ref_sock,
> > - int reverse, int *from, int *to, int **pipes)
> > -{
> > - if (!reverse) {
> > - *from = ref_sock;
> > - *to = (*from == conn->s[0]) ? conn->s[1] : conn->s[0];
> > - } else {
> > - *to = ref_sock;
> > - *from = (*to == conn->s[0]) ? conn->s[1] : conn->s[0];
> > - }
> > -
> > - *pipes = *from == conn->s[0] ? conn->pipe[0] : conn->pipe[1];
> > -}
> > -
> > /**
> > * tcp_splice_conn_from_sock() - Attempt to init state for a spliced connection
> > * @c: Execution context
> > @@ -521,8 +498,7 @@ void tcp_splice_sock_handler(struct ctx *c, struct tcp_splice_conn *conn,
> > int s, uint32_t events)
> > {
> > uint8_t lowat_set_flag, lowat_act_flag;
> > - int from, to, *pipes, eof, never_read;
> > - uint32_t *seq_read, *seq_write;
> > + int fromside, eof, never_read;
> >
> > if (conn->events == SPLICE_CLOSED)
> > return;
> > @@ -538,14 +514,15 @@ void tcp_splice_sock_handler(struct ctx *c, struct tcp_splice_conn *conn,
> > }
> >
> > if (events & EPOLLOUT) {
> > - if (s == conn->s[0])
> > + if (s == conn->s[0]) {
> > conn_event(c, conn, ~OUT_WAIT_0);
> > - else
> > + fromside = 1;
> > + } else {
> > conn_event(c, conn, ~OUT_WAIT_1);
> > -
> > - tcp_splice_dir(conn, s, 1, &from, &to, &pipes);
> > + fromside = 0;
> > + }
> > } else {
> > - tcp_splice_dir(conn, s, 0, &from, &to, &pipes);
> > + fromside = s == conn->s[0] ? 0 : 1;
> > }
> >
> > if (events & EPOLLRDHUP) {
> > @@ -566,24 +543,16 @@ swap:
> > eof = 0;
> > never_read = 1;
> >
> > - if (from == conn->s[0]) {
> > - seq_read = &conn->read[0];
> > - seq_write = &conn->written[0];
> > - lowat_set_flag = RCVLOWAT_SET_0;
> > - lowat_act_flag = RCVLOWAT_ACT_0;
> > - } else {
> > - seq_read = &conn->read[1];
> > - seq_write = &conn->written[1];
> > - lowat_set_flag = RCVLOWAT_SET_1;
> > - lowat_act_flag = RCVLOWAT_ACT_1;
> > - }
> > + lowat_set_flag = fromside == 0 ? RCVLOWAT_SET_0 : RCVLOWAT_SET_1;
> > + lowat_act_flag = fromside == 0 ? RCVLOWAT_ACT_0 : RCVLOWAT_ACT_1;
> >
> > while (1) {
> > ssize_t readlen, to_write = 0, written;
> > int more = 0;
> >
> > retry:
> > - readlen = splice(from, NULL, pipes[1], NULL, c->tcp.pipe_size,
> > + readlen = splice(conn->s[fromside], NULL,
> > + conn->pipe[fromside][1], NULL, c->tcp.pipe_size,
> > SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
> > trace("TCP (spliced): %li from read-side call", readlen);
> > if (readlen < 0) {
> > @@ -608,7 +577,8 @@ retry:
> > }
> >
> > eintr:
> > - written = splice(pipes[0], NULL, to, NULL, to_write,
> > + written = splice(conn->pipe[fromside][0], NULL,
> > + conn->s[!fromside], NULL, to_write,
> > SPLICE_F_MOVE | more | SPLICE_F_NONBLOCK);
> > trace("TCP (spliced): %li from write-side call (passed %lu)",
> > written, to_write);
> > @@ -622,8 +592,8 @@ eintr:
> > readlen > (long)c->tcp.pipe_size / 10) {
> > int lowat = c->tcp.pipe_size / 4;
> >
> > - setsockopt(from, SOL_SOCKET, SO_RCVLOWAT,
> > - &lowat, sizeof(lowat));
> > + setsockopt(conn->s[fromside], SOL_SOCKET,
> > + SO_RCVLOWAT, &lowat, sizeof(lowat));
> >
> > conn_flag(c, conn, lowat_set_flag);
> > conn_flag(c, conn, lowat_act_flag);
> > @@ -632,8 +602,8 @@ eintr:
> > break;
> > }
> >
> > - *seq_read += readlen > 0 ? readlen : 0;
> > - *seq_write += written > 0 ? written : 0;
> > + conn->read[fromside] += readlen > 0 ? readlen : 0;
> > + conn->written[fromside] += written > 0 ? written : 0;
>
> The extra whitespace had the function of making this look more
> "tabular", now you should add a further one (or drop it altogether
> depending on taste).
Ah, yes. I've restored the alignment.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 23+ messages in thread