Date: Thu, 26 Jan 2023 00:21:33 +0100
From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: passt-dev@passt.top
Subject: Re: [PATCH v3 00/18] RFC: Unify and simplify tap send path
Message-ID: <20230126002133.7a8eec98@elisabeth>
References: <20230106004322.985665-1-david@gibson.dropbear.id.au>
 <20230124222043.281ef58c@elisabeth>

On Wed, 25 Jan 2023 14:13:44 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Tue, Jan 24, 2023 at 10:20:43PM +0100, Stefano Brivio wrote:
> > On Fri, 6 Jan 2023 11:43:04 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> > 
> > > Although we have an abstraction for the "slow path" (DHCP, NDP) guest
> > > bound packets, the TCP and UDP forwarding paths write directly to the
> > > tap fd. However, it turns out how they send frames to the tap device
> > > is more similar than it originally appears.
> > > 
> > > This series unifies the low-level tap send functions for TCP and UDP,
> > > and makes some clean ups along the way.
> > > 
> > > This is based on my earlier outstanding series.
> > 
> > For some reason, performance tests consistently get stuck (both TCP and
> > UDP, sometimes throughput, sometimes latency tests) with this series,
> > and not without it, but I don't see any possible relationship with that.
> 
> Drat, I didn't encounter that. Any chance you could bisect to figure
> out which patch specifically seems to trigger it?

I couldn't do it conclusively, yet. :/
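
For reference, the fully automated run I'd aim for is something like
this -- just a sketch, assuming the stall reproduces within a fixed
timeout; the test entry point (test/run) and the good commit are
placeholders:

  # Sketch: bisect the stall automatically. timeout(1) exits with
  # status 124 on a hang, which "git bisect run" treats as bad.
  git bisect start
  git bisect bad                 # HEAD, with this series applied
  git bisect good <last-good>    # placeholder: commit before the series
  git bisect run timeout 600 sh -c 'make && (cd test && ./run)'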
Before "tcp: Combine two parts of passt tap send path together", no
stalls at all. After that, I'm routinely getting a stall on the
perf/passt_udp test, IPv4 host-to-guest with 256B MTU. I know, that
test is probably meaningless as a performance figure, but it helps
find issues like this, at least. :)

Yes, UDP -- the iperf3 client doesn't connect to the server, passt
doesn't crash, but it's gone (zombie) by the time I get to it. I think
it's the test scripts terminating it (even though I don't see anything
on the terminal), and script.log ends with:

2023/01/25 21:27:14 socat[3432381] E connect(5, AF=40 cid:94557 port:22, 16): Connection reset by peer
kex_exchange_identification: Connection closed by remote host
Connection closed by UNKNOWN port 65535
ssh-keygen: generating new host keys: RSA
2023/01/25 21:27:14 socat[3432390] E connect(5, AF=40 cid:94557 port:22, 16): Connection reset by peer
kex_exchange_identification: Connection closed by remote host
Connection closed by UNKNOWN port 65535
2023/01/25 21:27:14 socat[3432393] E connect(5, AF=40 cid:94557 port:22, 16): Connection reset by peer
kex_exchange_identification: Connection closed by remote host
Connection closed by UNKNOWN port 65535
2023/01/25 21:27:14 socat[3432396] E connect(5, AF=40 cid:94557 port:22, 16): Connection reset by peer
kex_exchange_identification: Connection closed by remote host
Connection closed by UNKNOWN port 65535
2023/01/25 21:27:14 socat[3432399] E connect(5, AF=40 cid:94557 port:22, 16): Connection reset by peer
kex_exchange_identification: Connection closed by remote host
Connection closed by UNKNOWN port 65535
DSA ECDSA ED25519
# Warning: Permanently added 'guest' (ED25519) to the list of known hosts.

which looks like fairly normal retries.

If I run the tests with DEBUG=1, they get stuck during UDP functional
testing, so I'm leaving that aside for a moment.

If I apply the whole series, other tests get stuck (including TCP
ones). There might be something going wrong with iperf3's (TCP)
control message exchange. I'm going to run this single test next, and
add some debugging prints here and there.

> I wonder if this could be related to the stalls I'm debugging,
> although those didn't appear on the perf tests and also occur on
> main. I have now discovered they seem to be masked by large socket
> buffer sizes - more info at https://bugs.passt.top/show_bug.cgi?id=41

Maybe the subsequent failures (or even this one) could actually be
related, triggered somehow by some change in timing. I'm still
clueless at the moment.

-- 
Stefano