public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: passt-dev@passt.top
Subject: Re: [PATCH] tcp: Set PSH flag for last incoming packets in a batch
Date: Wed, 22 Jan 2025 11:36:05 +1030	[thread overview]
Message-ID: <Z5BEfYXmzfADc_jj@zatzit> (raw)
In-Reply-To: <20250120234958.3247370-1-sbrivio@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 6406 bytes --]

On Tue, Jan 21, 2025 at 12:49:58AM +0100, Stefano Brivio wrote:
> So far we omitted setting PSH flags for inbound traffic altogether: as
> we ignore the nature of the data we're sending, we can't conclude that
> some data is more or less urgent. This works fine with Linux guests,
> as the Linux kernel doesn't do much with it, on input: it will
> generally deliver data to the application layer without delay.
> 
> However, with Windows, things change: if we don't set the PSH flag on
> interactive inbound traffic, we can expect long delays before the data
> is delivered to the application.
> 
> This is very visible with RDP, where packets we send on behalf of the
> RDP client are delivered with delays exceeding one second:
> 
>   $ tshark -r rdp.pcap -td -Y 'frame.number in { 33170 .. 33173 }' --disable-protocol tls
>   33170   0.030296 93.235.154.248 → 88.198.0.164 54 TCP 49012 → 3389 [ACK] Seq=13820 Ack=285229 Win=387968 Len=0
>   33171   0.985412 88.198.0.164 → 93.235.154.248 105 TCP 3389 → 49012 [PSH, ACK] Seq=285229 Ack=13820 Win=63198 Len=51
>   33172   0.030373 93.235.154.248 → 88.198.0.164 54 TCP 49012 → 3389 [ACK] Seq=13820 Ack=285280 Win=387968 Len=0
>   33173   1.383776 88.198.0.164 → 93.235.154.248 424 TCP 3389 → 49012 [PSH, ACK] Seq=285280 Ack=13820 Win=63198 Len=370
> 
> in this example (packet capture taken by passt), frame #33172 is a
> mouse event sent by the RDP client, and frame #33173 is the first
> event (display reacting to click) sent back by the server. This
> appears as a 1.4 s delay before we get frame #33173.
> 
> If we set PSH, instead:
> 
>   $ tshark -r rdp_psh.pcap -td -Y 'frame.number in { 314 .. 317 }' --disable-protocol tls
>   314   0.002503 93.235.154.248 → 88.198.0.164 170 TCP 51066 → 3389 [PSH, ACK] Seq=7779 Ack=74047 Win=31872 Len=116
>   315   0.000557 88.198.0.164 → 93.235.154.248 54 TCP 3389 → 51066 [ACK] Seq=79162 Ack=7895 Win=62872 Len=0
>   316   0.012752 93.235.154.248 → 88.198.0.164 170 TCP 51066 → 3389 [PSH, ACK] Seq=7895 Ack=79162 Win=31872 Len=116
>   317   0.011927 88.198.0.164 → 93.235.154.248 107 TCP 3389 → 51066 [PSH, ACK] Seq=79162 Ack=8011 Win=62756 Len=53
> 
> here, in frame #116, our mouse event is delivered without a delay and
> receives a response in approximately 12 ms.
> 
> Set PSH on the last segment for any batch we dequeue from the socket,
> that is, set it whenever we know that we might not be sending data to
> the same port for a while.

Huh, fascinating.

> Reported-by: NN708
> Link: https://bugs.passt.top/show_bug.cgi?id=107
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>


> ---
>  tcp_buf.c | 11 ++++++++---
>  tcp_vu.c  |  7 +++++--
>  2 files changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/tcp_buf.c b/tcp_buf.c
> index cbefa42..72d99c5 100644
> --- a/tcp_buf.c
> +++ b/tcp_buf.c
> @@ -239,9 +239,10 @@ int tcp_buf_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags)
>   * @dlen:	TCP payload length
>   * @no_csum:	Don't compute IPv4 checksum, use the one from previous buffer
>   * @seq:	Sequence number to be sent
> + * @push:	Set PSH flag, last segment in a batch
>   */
>  static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn,
> -			    ssize_t dlen, int no_csum, uint32_t seq)
> +			    ssize_t dlen, int no_csum, uint32_t seq, bool push)
>  {
>  	struct tcp_payload_t *payload;
>  	const uint16_t *check = NULL;
> @@ -268,6 +269,7 @@ static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn,
>  	payload->th.th_x2 = 0;
>  	payload->th.th_flags = 0;
>  	payload->th.ack = 1;
> +	payload->th.psh = push;
>  	iov[TCP_IOV_PAYLOAD].iov_len = dlen + sizeof(struct tcphdr);
>  	tcp_l2_buf_fill_headers(conn, iov, check, seq, false);
>  	if (++tcp_payload_used > TCP_FRAMES_MEM - 1)
> @@ -402,11 +404,14 @@ int tcp_buf_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn)
>  	seq = conn->seq_to_tap;
>  	for (i = 0; i < send_bufs; i++) {
>  		int no_csum = i && i != send_bufs - 1 && tcp_payload_used;
> +		bool push = false;
>  
> -		if (i == send_bufs - 1)
> +		if (i == send_bufs - 1) {
>  			dlen = last_len;
> +			push = true;
> +		}
>  
> -		tcp_data_to_tap(c, conn, dlen, no_csum, seq);
> +		tcp_data_to_tap(c, conn, dlen, no_csum, seq, push);
>  		seq += dlen;
>  	}
>  
> diff --git a/tcp_vu.c b/tcp_vu.c
> index a216bb1..fad7065 100644
> --- a/tcp_vu.c
> +++ b/tcp_vu.c
> @@ -289,10 +289,11 @@ static ssize_t tcp_vu_sock_recv(const struct ctx *c,
>   * @iov_cnt:		Number of entries in @iov
>   * @check:		Checksum, if already known
>   * @no_tcp_csum:	Do not set TCP checksum
> + * @push:		Set PSH flag, last segment in a batch
>   */
>  static void tcp_vu_prepare(const struct ctx *c, struct tcp_tap_conn *conn,
>  			   struct iovec *iov, size_t iov_cnt,
> -			   const uint16_t **check, bool no_tcp_csum)
> +			   const uint16_t **check, bool no_tcp_csum, bool push)
>  {
>  	const struct flowside *toside = TAPFLOW(conn);
>  	bool v6 = !(inany_v4(&toside->eaddr) && inany_v4(&toside->oaddr));
> @@ -334,6 +335,7 @@ static void tcp_vu_prepare(const struct ctx *c, struct tcp_tap_conn *conn,
>  	memset(th, 0, sizeof(*th));
>  	th->doff = sizeof(*th) / 4;
>  	th->ack = 1;
> +	th->psh = push;
>  
>  	tcp_fill_headers(conn, NULL, ip4h, ip6h, th, &payload,
>  			 *check, conn->seq_to_tap, no_tcp_csum);
> @@ -443,6 +445,7 @@ int tcp_vu_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn)
>  		struct iovec *iov = &elem[head[i]].in_sg[0];
>  		int buf_cnt = head[i + 1] - head[i];
>  		ssize_t dlen = iov_size(iov, buf_cnt) - hdrlen;
> +		bool push = i == head_cnt - 1;
>  
>  		vu_set_vnethdr(vdev, iov->iov_base, buf_cnt);
>  
> @@ -451,7 +454,7 @@ int tcp_vu_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn)
>  			check = NULL;
>  		previous_dlen = dlen;
>  
> -		tcp_vu_prepare(c, conn, iov, buf_cnt, &check, !*c->pcap);
> +		tcp_vu_prepare(c, conn, iov, buf_cnt, &check, !*c->pcap, push);
>  
>  		if (*c->pcap) {
>  			pcap_iov(iov, buf_cnt,

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

      parent reply	other threads:[~2025-01-22  1:09 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-20 23:49 [PATCH] tcp: Set PSH flag for last incoming packets in a batch Stefano Brivio
2025-01-20 23:56 ` Stefano Brivio
2025-01-22  1:06 ` David Gibson [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z5BEfYXmzfADc_jj@zatzit \
    --to=david@gibson.dropbear.id.au \
    --cc=passt-dev@passt.top \
    --cc=sbrivio@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).