On Fri, May 24, 2024 at 01:26:55PM -0400, Jon Maloy wrote:
> From linux-6.9.0 the kernel will contain
> commit 05ea491641d3 ("tcp: add support for SO_PEEK_OFF socket option").
>
> This new feature makes it possible to call recvmsg(MSG_PEEK) and make
> it start reading data from a given offset set by the SO_PEEK_OFF socket
> option. This way, we can avoid repeated reading of already read bytes of
> a received message, hence saving read cycles when forwarding TCP
> messages in the host->namespace direction.
>
> In this commit, we add functionality to leverage this feature when
> available, while we fall back to the previous behavior when not.
>
> Measurements with iperf3 show that throughput increases by 15-20
> percent in the host->namespace direction when this feature is used.
>
> Signed-off-by: Jon Maloy
> ---
>  tcp.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 51 insertions(+), 8 deletions(-)
>
> diff --git a/tcp.c b/tcp.c
> index 146ab8f..01898f1 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -509,6 +509,9 @@ static struct iovec tcp6_l2_iov [TCP_FRAMES_MEM][TCP_NUM_IOVS];
>  static struct iovec tcp4_l2_flags_iov [TCP_FRAMES_MEM][TCP_NUM_IOVS];
>  static struct iovec tcp6_l2_flags_iov [TCP_FRAMES_MEM][TCP_NUM_IOVS];
>  
> +/* Does the kernel support SO_PEEK_OFF? */
> +static bool peek_offset_cap;
> +
>  /* sendmsg() to socket */
>  static struct iovec tcp_iov [UIO_MAXIOV];
>  
> @@ -524,6 +527,20 @@ static_assert(ARRAY_SIZE(tc_hash) >= FLOW_MAX,
>  int init_sock_pool4 [TCP_SOCK_POOL_SIZE];
>  int init_sock_pool6 [TCP_SOCK_POOL_SIZE];
>  
> +/**
> + * tcp_set_peek_offset() - Set SO_PEEK_OFF offset on a socket if supported
> + * @s: Socket to update
> + * @offset: Offset in bytes
> + */
> +static void tcp_set_peek_offset(int s, int offset)
> +{
> +	if (!peek_offset_cap)
> +		return;
> +
> +	if (setsockopt(s, SOL_SOCKET, SO_PEEK_OFF, &offset, sizeof(offset)))
> +		err("Failed to set SO_PEEK_OFF to %i in socket %i", offset, s);

I feel like we need to reset the connection if we ever reach here.
This means that SO_PEEK_OFF is now out of sync and we apparently
can't fix it. If we keep the connection alive, we will inevitably
send incorrect data across it, which seems pretty bad.

Or maybe we think this is unlikely enough that we could just die().

Otherwise, LGTM.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson