From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by passt.top (Postfix) with ESMTP id A25D55A0082 for ; Thu, 10 Nov 2022 13:59:46 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1668085185; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=U+vpZd8zVFKCRhvjEIxW2gclut+sYuVO+rHf5IfTyR4=; b=dCOEgBP21r8s74GpU/Lvd1gc04BcmbSE773YTdDHACnt7C3UqTVt3ZbDPiIhXx7oGQDqiK O9aZPk0XA10tJrNiz5lZ/6HY7UQ3nM08/fokQTUbcPZlpmG53l9fSH/9X/88Q/+A37GrpY GDW/iUEdKrTG7vzylAjBT8yiFvaoR1U= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-407-GzJ7iUOBM7Cbg3E9zlRn5w-1; Thu, 10 Nov 2022 07:59:42 -0500 X-MC-Unique: GzJ7iUOBM7Cbg3E9zlRn5w-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id E4D5A101A56D; Thu, 10 Nov 2022 12:59:41 +0000 (UTC) Received: from maya.cloud.tilaa.com (ovpn-208-9.brq.redhat.com [10.40.208.9]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 757CC2023A16; Thu, 10 Nov 2022 12:59:41 +0000 (UTC) Date: Thu, 10 Nov 2022 13:59:38 +0100 From: Stefano Brivio To: David Gibson Subject: Re: [PATCH 1/2] tap: Keep stream consistent if qemu length descriptor spans two recv() calls Message-ID: <20221110135938.27d23884@elisabeth> In-Reply-To: <20221109112929.0ee6c107@elisabeth> References: <20221108085419.3220900-1-sbrivio@redhat.com> <20221108085419.3220900-2-sbrivio@redhat.com> <20221109112929.0ee6c107@elisabeth> Organization: Red Hat MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.4 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID-Hash: A4VJCHM5X6MOIUWXJFTHI5GMTECICX77 X-Message-ID-Hash: A4VJCHM5X6MOIUWXJFTHI5GMTECICX77 X-MailFrom: sbrivio@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.3 Precedence: list List-Id: Development discussion and patches for passt Archived-At: <> Archived-At: List-Archive: <> List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Wed, 9 Nov 2022 11:29:29 +0100 Stefano Brivio wrote: > On Wed, 9 Nov 2022 21:15:48 +1100 > David Gibson wrote: > > > On Tue, Nov 08, 2022 at 09:54:18AM +0100, Stefano Brivio wrote: > > > I got all paranoid after triggering a divide-by-zero general > > > protection fault in passt with a qemu version without the virtio_net > > > TX hang fix, while flooding UDP. I start thinking this was actually > > > coming from some random changes I was playing with, but before > > > reaching this conclusion I reviewed once more the relatively short > > > path in tap_handler_passt() before we start using packet_*() > > > functions, and found this. > > > > > > Never observed in practice, but artificially reproduced with changes > > > in qemu's socket interface: if we don't receive from qemu a complete > > > length descriptor in one recv() call, or if we receive a partial one > > > at the end of one call, we currently disregard the rest, which would > > > make the stream inconsistent. > > > > > > Nothing really bad happens, except that from that point on we would > > > disregard all the packets we get until, if ever, we get the stream > > > back in sync by chance. > > > > > > Force reading a complete packet length descriptor with a blocking > > > recv(), if needed -- not just a complete packet later. > > > > > > Signed-off-by: Stefano Brivio > > > > This seems an ok short term fix, but I think we want another approach > > in the slightly longer term. Read on.. > > > > > --- > > > tap.c | 18 +++++++++++++++--- > > > 1 file changed, 15 insertions(+), 3 deletions(-) > > > > > > diff --git a/tap.c b/tap.c > > > index f8314ef..11ac732 100644 > > > --- a/tap.c > > > +++ b/tap.c > > > @@ -747,14 +747,26 @@ redo: > > > return -ECONNRESET; > > > } > > > > > > - while (n > (ssize_t)sizeof(uint32_t)) { > > > - ssize_t len = ntohl(*(uint32_t *)p); > > > + while (n > 0) { > > > + ssize_t len; > > > + > > > + /* Force receiving at least a complete length descriptor to > > > + * avoid an inconsistent stream. > > > + */ > > > > Is it actually enough for this to be blocking? AFAICT, recv() on a > > stream socket, like read(), can return less data than you requested. > > It's not enough, hence the check on 'rem' afterwards, and this doesn't > cover anyway the case were qemu would decide to send one byte at a > time (because as you pointed out blocking doesn't mean we'll get the > full amount requested), which never happens in practice, though. > > > > + if (n < (ssize_t)sizeof(uint32_t)) { > > > + rem = recv(c->fd_tap, p + n, > > > + (ssize_t)sizeof(uint32_t) - n, 0); > > > + if ((n += rem) != (ssize_t)sizeof(uint32_t)) > > > + return 0; > > > + } > > > + > > > + len = ntohl(*(uint32_t *)p); > > > > > > p += sizeof(uint32_t); > > > n -= sizeof(uint32_t); > > > > > > /* At most one packet might not fit in a single read, and this > > > - * needs to be blocking. > > > + * also needs to be blocking. > > > > Same issue here (obviously not introduced by this patch, though). > > Same here. > > > > */ > > > if (len > n) { > > > rem = recv(c->fd_tap, p + n, len - n, 0); > > > > Can we handle both these cases more neatly (and without blocking > > recv()) calls, if we maintain two pointers into pkt_buf. The first > > one tracks how much we've read from the qemu socket, the second tracks > > how much has been parsed into packets. When we get an epoll > > notification on the qemu socket, we recv() and advance the first > > pointer. Then we discern as many full packets as we can, advancing > > the second pointer. > > Yes, and I actually drafted something like that, but it takes a lot of > attention and time to get it right, so I preferred to keep it simple > until now. I can file a ticket as enhancement. https://bugs.passt.top/show_bug.cgi?id=38 -- Stefano