From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 14 Feb 2023 13:29:01 +0100
From: Stefano Brivio
To: Michal Prívozník
Subject: Re: [libvirt PATCH] qemu: allow passt to self-daemonize
Message-ID: <20230214132901.1ca646cf@elisabeth>
In-Reply-To: <218df103-ef4b-7329-bf87-4e77c8de3e3f@redhat.com>
References: <20230208231310.1728051-1-laine@redhat.com> <20230214110813.5a6a568c@elisabeth> <218df103-ef4b-7329-bf87-4e77c8de3e3f@redhat.com>
Organization: Red Hat
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
CC: libvir-list@redhat.com, Ján Tomko, passt-dev@passt.top, Laine Stump
List-Id: Development discussion and patches for passt

On Tue, 14 Feb 2023 12:13:28 +0100
Michal Prívozník wrote:

> On 2/14/23 11:08, Stefano Brivio wrote:
> > On Tue, 14 Feb 2023 09:01:39 +0100
> > Michal Prívozník wrote:
> >
> >> On 2/9/23 00:13, Laine Stump wrote:
> >>> I initially had the passt process being started in an identical
> >>> fashion to the slirp-helper - libvirt was daemonizing the new process
> >>> and recording its pid in a pidfile. The problem with this is that,
> >>> since it is daemonized immediately, any startup error in passt happens
> >>> after the daemonization, and thus isn't seen by libvirt - libvirt
> >>> believes that the process has started successfully and continues on
> >>> its merry way. The result was that sometimes a guest would be started,
> >>> but there would be no passt process for qemu to use for network
> >>> traffic.
> >>>
> >>> Instead, we should be starting passt in the same manner we start
> >>> dnsmasq - we just exec it as normal (along with a request that passt
> >>> create the pidfile, which is just another option on the passt
> >>> commandline) and wait for the child process to exit; passt then has a
> >>> chance to parse its commandline and complete all the setup prior to
> >>> daemonizing itself; if it encounters an error and exits with a non-0
> >>> code, libvirt will see the code and know about the failure. We can
> >>> then grab the output from stderr, log that so the "user" has some idea
> >>> of what went wrong, and then fail the guest startup.
> >>>
> >>> Signed-off-by: Laine Stump
> >>> ---
> >>>  src/qemu/qemu_passt.c | 9 ++++-----
> >>>  1 file changed, 4 insertions(+), 5 deletions(-)
> >>
> >>
> >> OOOPS, somehow I've accidentally merged this. Let me post follow up patches.
> >>
> >>>
> >>> diff --git a/src/qemu/qemu_passt.c b/src/qemu/qemu_passt.c
> >>> index 0f09bf3db8..f640a69c00 100644
> >>> --- a/src/qemu/qemu_passt.c
> >>> +++ b/src/qemu/qemu_passt.c
> >>> @@ -141,24 +141,23 @@ qemuPasstStart(virDomainObj *vm,
> >>>      g_autofree char *passtSocketName = qemuPasstCreateSocketPath(vm, net);
> >>>      g_autoptr(virCommand) cmd = NULL;
> >>>      g_autofree char *pidfile = qemuPasstCreatePidFilename(vm, net);
> >>> +    g_autofree char *errbuf = NULL;
> >>>      char macaddr[VIR_MAC_STRING_BUFLEN];
> >>>      size_t i;
> >>>      pid_t pid = (pid_t) -1;
> >>>      int exitstatus = 0;
> >>>      int cmdret = 0;
> >>> -    VIR_AUTOCLOSE errfd = -1;
> >>>
> >>>      cmd = virCommandNew(PASST);
> >>>
> >>>      virCommandClearCaps(cmd);
> >>> -    virCommandSetPidFile(cmd, pidfile);
> >>> -    virCommandSetErrorFD(cmd, &errfd);
> >>> -    virCommandDaemonize(cmd);
> >>> +    virCommandSetErrorBuffer(cmd, &errbuf);
> >>>
> >>>      virCommandAddArgList(cmd,
> >>>                           "--one-off",
> >>
> >> BTW: we definitely need something better than this. IF, something goes
> >> wrong after we've executed passt but before we execute QEMU, then passt
> >> just hangs there. This is because passt clone()-s itself (i.e. creates a
> >> child process), but QEMU that would connect to the socket never comes
> >> around. Thus, the child process never sees the EOF on the socket and
> >> just hangs in there thinking there will be somebody connecting, soon.
> >
> > Okay, I see the point now -- I thought libvirtd would start passt only
> > once it knows for sure that the guest will connect to it.
>
> I'm failing to see how that would be possible. Starting a guest involves
> many actions, each one of can fail. From defensive coding POV it's
> better we have the option to kill passt.

I don't know exactly, I thought the "probing" phase would be considered
enough -- I'm not saying it's possible, just that it was my (flawed,
then) assumption.

> >> I thought this could be solved by just killing the whole process group,
> >> but the child process calls setsid(), which creates its own process
> >> group. I've managed to work around this by passing --foreground, but I'm
> >> unclear about the consequences. Though, it looks like it's still
> >> dropping caps, creating its own namespaces, etc. So this may actually be
> >> the way to go.
> >
> > I wouldn't recommend that: --foreground is really intended for
> > interactive usage and we won't be able, for example, to spawn a child
> > in a new PID namespace, which is a nice security feature, I think.
>
> Well, it's clone() that brings all the problems (well, in combination
> with setsid()).

Yes, but other than being a security feature, that's how
non-interactive executables are typically implemented.

> > I already suggested this to Laine offline: can libvirt just connect() to
> > the socket and close() it, in case QEMU doesn't start? Then passt will
> > terminate.
>
> That relies on the fact that passt isn't stuck and responds to the EOF.
There's no need for an end-of-file, just closing the socket is enough.

Any other method of terminating the process relies on passt to do or
not do something specific anyway, such as writing the correct PID file,
writing a PID file at all, not blocking SIGTERM (in case you use that),
etc.

Even if you run it with --foreground, you still rely on it correctly
parsing options and not creating new processes in new sessions.

Connecting to the socket and closing it is in the same class of
reliability, I think. Statistically speaking, we had one (embarrassing)
issue with the contents of the PID file being wrong, see passt commit
3ec02c097536 ("passt: Truncate PID file on open()"), and (so far) zero
reported issues with passt not terminating on EPOLLHUP on its socket
with --one-off.

> We certainly can do that if passt needs graceful shutdown, but mustn't
> rely on that.

It doesn't need that -- it does absolutely nothing on shutdown. I'm
just saying you can use that to terminate passt, only in case QEMU
doesn't start.

> > It should be a few (~5) lines of code, instead of all the complexity
> > potentially involved in tracking PIDs and avoiding related races, and
> > design-wise it looks clean to me (libvirtd plays for a moment the QEMU
> > role, because QEMU is not around).
>
> Well, we can place all these helper processes into a CGroup and let it
> trace PIDs. That should be race free.

The problem is where you get those PIDs from, at least if you just rely
on PID files. If you don't use PID file descriptors ("pidfd", which I
don't see used anywhere in libvirt), you could add the PID of another
process (which had its PID recycled from a passt process that
terminated meanwhile) to the cgroup, and later terminate something
unrelated.

-- 
Stefano
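
For illustration, the connect()-and-close() approach discussed above
could look roughly like the sketch below. It assumes passt is listening
on an AF_UNIX stream socket at a path the caller already knows. The
function name hangup_unix_socket() is made up for this example and is
not part of libvirt or passt, and error reporting is left to the caller.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a libvirt or passt API): connect to the UNIX
 * domain stream socket at 'path' and close the connection right away, so
 * that the listening side sees a hang-up on the accepted connection.
 *
 * Returns 0 on success, -1 on failure with errno set.
 */
static int hangup_unix_socket(const char *path)
{
    struct sockaddr_un addr;
    int fd, ret;

    /* sun_path has a fixed size: refuse paths that would be truncated */
    if (strlen(path) >= sizeof(addr.sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Briefly play the role of the client that never showed up... */
    ret = connect(fd, (struct sockaddr *)&addr, sizeof(addr));

    /* ...and hang up immediately, without sending any data */
    if (close(fd) < 0 && ret == 0)
        ret = -1;

    return ret;
}
```

A caller would invoke this only on the error path where QEMU was never
started, passing the same socket path that was given to passt. Whether
the process on the other end then exits depends on its own handling of
the hang-up (the --one-off behaviour described above).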