Date: Thu, 17 Nov 2022 17:18:05 +0100
From: Stefano Brivio
To: "Richard W.M. Jones"
CC: passt-dev@passt.top
Subject: Re: [PATCH v2 3/5] passt, tap: Add --fd option
Message-ID: <20221117171805.3746f53a@elisabeth>
In-Reply-To: <20221117160243.GH7636@redhat.com>
References: <20221117122614.1269214-4-rjones@redhat.com>
 <20221117152640.2535159-1-sbrivio@redhat.com>
 <20221117153306.GF7636@redhat.com>
 <20221117164931.5585f60f@elisabeth>
 <20221117160243.GH7636@redhat.com>
Organization: Red Hat
List-Id: Development discussion and patches for passt

On Thu, 17 Nov 2022 16:02:43 +0000
"Richard W.M. Jones" wrote:

> On Thu, Nov 17, 2022 at 04:49:31PM +0100, Stefano Brivio wrote:
> > On Thu, 17 Nov 2022 15:33:06 +0000
> > "Richard W.M. Jones" wrote:
> >
> > > On Thu, Nov 17, 2022 at 04:26:40PM +0100, Stefano Brivio wrote:
> > > > From: "Richard W.M. Jones"
> > > >
> > > > This passes a fully connected stream socket to passt.
> > > >
> > > > Signed-off-by: Richard W.M. Jones
> > > > [sbrivio: reuse fd_tap instead of adding a new descriptor,
> > > >  imply --one-off on --fd, add to optstring and usage()]
> > > > Signed-off-by: Stefano Brivio
> > > > ---
> > > > v2:
> > > >  - reuse fd_tap, we don't need a separate file descriptor
> > > >  - add F to optstring and usage(), for both passt and pasta
> > > >  - imply --one-off, we can't do much once the socket is closed
> > > >
> > > > With this, the trick from 5/5 goes a bit further: passt reads
> > > > from the file descriptor passed by the wrapper.
> > >
> > > Thanks for the v2 .. I'll add it to my series and play with it.
> > >
> > > > However, we get EPOLLRDHUP right away, from the close() on
> > > > one end of the socket pair I guess. Should we just ignore
> > > > all EPOLLRDHUP events, just the first one...?
> > >
> > > Does it see the event out-of-band or does it get an in-band
> > > read(2) == 0 after finishing reading the data?
> >
> > Out-of-band, so to speak: we won't even recv() if we get EPOLLRDHUP
> > (that's handled in tap_handler()). If I do this on top of this patch:
> >
> > --- a/tap.c
> > +++ b/tap.c
> > @@ -1073,7 +1073,7 @@ void tap_sock_init(struct ctx *c)
> >  	struct epoll_event ev = { 0 };
> >
> >  	ev.data.fd = c->fd_tap;
> > -	ev.events = EPOLLIN | EPOLLET | EPOLLRDHUP;
> > +	ev.events = EPOLLIN | EPOLLET;
> >
> > Then it gets those four bytes:
> >
> > [pid 2538704] epoll_wait(5, 0x7ffedc4a6320, 8, 1000) = 1
> > [pid 2538704] recvfrom(4, 0x560797677000, 8323069, MSG_DONTWAIT, NULL, NULL) = 4
> > [pid 2538704] epoll_wait(5, [], 8, 1000) = 0
> > [pid 2538704] epoll_wait(5, 0x7ffedc4a6320, 8, 1000) = -1 EINTR (Interrupted system call)
> >
> > and does nothing with them, as expected. Two epoll_wait() calls later,
> > the syscall is interrupted, I'm not sure why and how we should react (in
> > main(), passt.c) in that case.
>
> With EPOLLRDHUP removed it's a bit deadlock-y. One case I commonly
> see is:
>
> child (qemu):
>
> 1295637 write(5, "\0\0\0\0", 4
> 1295637 <... write resumed>) = 4
> 1295637 poll([{fd=5, events=POLLIN|POLLOUT}], 1, -1
> 1295637 <... poll resumed>) = 1 ([{fd=5, revents=POLLOUT}])
> 1295637 read(3,
> 1295637 <... read resumed>"", 512) = 0
> 1295637 shutdown(5, SHUT_WR
> 1295637 <... shutdown resumed>) = 0
> 1295637 poll([{fd=5, events=POLLIN}], 1, -1
>
> then later the parent (passt):
>
> 1295636 epoll_wait(5, 0x7ffe93a894e0, 8, 1000) = 1
> 1295636 recvfrom(4, 0x5576383cf000, 8323069, MSG_DONTWAIT, NULL, NULL) = 4
> 1295636 epoll_wait(5, [], 8, 1000) = 0
> 1295636 epoll_wait(5, [], 8, 1000) = 0
> 1295636 epoll_wait(5, [], 8, 1000) = 0
> 1295636 epoll_wait(5, [], 8, 1000) = 0
> 1295636 epoll_wait(5, [], 8, 1000) = 0
> 1295636 epoll_wait(5, [], 8, 1000) = 0
> 1295636 epoll_wait(5, [], 8, 1000) = 0
> 1295636 epoll_wait(5, [], 8, 1000) = 0
> 1295636 epoll_wait(5, [], 8, 1000) = 0
> (forever)

Ah, yes, we ignore EPOLLRDHUP, and there's a one-second timeout there,
so it's actually expected.

> Removing EPOLLET (edge triggered) delivers the EPOLLIN event over and
> over again to passt as expected:
>
> 1299436 recvfrom(4, "", 8323069, MSG_DONTWAIT, NULL, NULL) = 0
> 1299436 epoll_wait(5, 0x7ffdd8a65640, 8, 1000) = 1
> 1299436 recvfrom(4, "", 8323069, MSG_DONTWAIT, NULL, NULL) = 0
> 1299436 epoll_wait(5, 0x7ffdd8a65640, 8, 1000) = 1
> 1299436 recvfrom(4, "", 8323069, MSG_DONTWAIT, NULL, NULL) = 0
> (forever)

I'm not sure that's EPOLLIN, it could be anything (we don't check
explicitly).

> but I expected that passt would exit the first time it reads 0 from
> the socket.
>
> tap_handler_passt() seems like it only considers the n<0 (error) and
> n>0 (data) cases. What do you think about checking for n == 0 and
> exiting immediately if c->one_off is true?

I think it's reasonable in any case.
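To make the n == 0 case concrete, here is a minimal, standalone sketch, not the actual tap_handler_passt() code. The names tap_read_once(), enum tap_read_result and demo_eof_after_data() are made up for illustration. It reproduces the qemu side of the trace on a socket pair (four bytes written, then shutdown(SHUT_WR)) and shows the reader getting the data first and a zero-length read afterwards:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical classification of a single read, not a passt type */
enum tap_read_result { TAP_DATA, TAP_AGAIN, TAP_EOF, TAP_ERROR };

/* Hypothetical helper: one non-blocking recv(), with n == 0 as its own case */
enum tap_read_result tap_read_once(int fd, void *buf, size_t len, ssize_t *n)
{
	assert(buf && n);

	*n = recv(fd, buf, len, MSG_DONTWAIT);
	if (*n > 0)
		return TAP_DATA;
	if (*n == 0)
		return TAP_EOF;		/* peer closed or shut down writing */
	if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
		return TAP_AGAIN;	/* nothing to read right now */
	return TAP_ERROR;
}

/* Returns 0 if the reader sees four bytes and then end-of-file */
int demo_eof_after_data(void)
{
	char buf[64];
	int sv[2], ok;
	ssize_t n;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv))
		return -1;

	/* Same sequence as the child (qemu) in the trace above */
	if (write(sv[1], "\0\0\0\0", 4) != 4 || shutdown(sv[1], SHUT_WR)) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}

	ok = tap_read_once(sv[0], buf, sizeof(buf), &n) == TAP_DATA &&
	     n == 4 &&
	     tap_read_once(sv[0], buf, sizeof(buf), &n) == TAP_EOF;

	close(sv[0]);
	close(sv[1]);
	return ok ? 0 : 1;
}
```

A caller in one-off mode could exit on TAP_EOF, and otherwise go back to waiting for a new connection. Which of the two passt should do is the decision discussed above, the sketch doesn't take a side.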
At the same time, we don't necessarily need to ignore EPOLLRDHUP, we
could also simply try to read from the socket first, that is, move this:

	if ((c->mode == MODE_PASST && tap_handler_passt(c, now)) ||
	    (c->mode == MODE_PASTA && tap_handler_pasta(c, now)))
		goto reinit;

before this:

	if (events & (EPOLLRDHUP | EPOLLHUP | EPOLLERR))
		goto reinit;

and we'll just get EBADF on recv().

If we have data to read, we won't necessarily get an EPOLLRDHUP (or
EPOLLHUP) and EPOLLIN separately, so trying to read first makes sense
for the case you described.

An explicit check on EPOLLIN is probably a good idea too: read if we
have anything to read, first, then quit if we have a reason to do so.

-- 
Stefano
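The "read first, then quit" ordering can be sketched as follows. This is a standalone illustration under my own assumptions, not the tap_handler() code: handle_tap_events() and demo_read_before_hangup() are hypothetical names, and the sketch treats a zero return from recv() as a hang-up in addition to the EPOLLRDHUP, EPOLLHUP and EPOLLERR bits.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical handler: drain readable data first, then decide on hang-up.
 * Returns 1 if the caller should close or reinitialise, 0 otherwise.
 */
int handle_tap_events(int fd, uint32_t events, size_t *drained)
{
	char buf[4096];
	int eof = 0;

	assert(drained);

	if (events & EPOLLIN) {		/* explicit check: read only if told to */
		for (;;) {
			ssize_t n = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);

			if (n > 0) {
				*drained += (size_t)n;
				continue;
			}
			if (n == 0) {
				eof = 1;	/* peer won't send anything else */
				break;
			}
			if (errno == EINTR)
				continue;
			if (errno == EAGAIN || errno == EWOULDBLOCK)
				break;		/* drained, needed with EPOLLET */
			return 1;		/* real error */
		}
	}

	return eof || (events & (EPOLLRDHUP | EPOLLHUP | EPOLLERR)) ? 1 : 0;
}

/* Returns 0 if four bytes are read before the hang-up is acted upon */
int demo_read_before_hangup(void)
{
	struct epoll_event ev = { 0 }, out;
	int sv[2], epfd, hup = -1;
	size_t drained = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv))
		return -1;

	epfd = epoll_create1(0);
	if (epfd < 0) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}

	ev.data.fd = sv[0];
	ev.events = EPOLLIN | EPOLLET | EPOLLRDHUP;

	/* Peer writes and half-closes before we get to wait for events */
	if (!epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) &&
	    write(sv[1], "\0\0\0\0", 4) == 4 &&
	    !shutdown(sv[1], SHUT_WR) &&
	    epoll_wait(epfd, &out, 1, 1000) == 1)
		hup = handle_tap_events(out.data.fd, out.events, &drained);

	close(epfd);
	close(sv[0]);
	close(sv[1]);
	return (hup == 1 && drained == 4) ? 0 : 1;
}
```

With the hang-up check placed before the read instead, the four bytes queued ahead of the shutdown() would never be consumed, which is the situation described earlier in the thread.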