Date: Tue, 14 Feb 2023 16:06:53 +0100
From: Stefano Brivio <sbrivio@redhat.com>
To: Noah Gold
Cc: David Gibson, passt-dev@passt.top
Subject: Re: Improved handling of changing DNS resolvers
Message-ID: <20230214160653.14192635@elisabeth>
References: <20230121104703.3ebcc753@elisabeth> <20230202120940.2e044c4b@elisabeth>
Organization: Red Hat
List-Id: Development discussion and patches for passt

On Mon, 13 Feb 2023 18:45:20 -0800
Noah Gold wrote:

> On Thu, Feb 2, 2023 at 3:09 AM Stefano Brivio wrote:
> >
> > On Mon, 30 Jan 2023 16:11:38 -0800
> > Noah Gold wrote:
> >
> > > Sorry for the delay, I've been really busy this past week.
> > >
> > > On Sun, Jan 22, 2023 at 10:26 PM David Gibson wrote:
> > > >
> > > > On Sat, Jan 21, 2023 at 10:47:03AM +0100, Stefano Brivio wrote:
> > > > > Hi Noah,
> > > > >
> > > > > Sorry for the delay, I didn't check pending mailing list posts for a couple of days. Comments below:
> > > > >
> > > > > On Tue, 17 Jan 2023 11:50:50 -0800
> > > > > Noah Gold wrote:
> > > > >
> > > > > > Hi folks,
> > > > > >
> > > > > > libslirp and Passt have different approaches to sharing DNS resolvers with the guest system, each with their own benefits & drawbacks. On the libslirp project, we're discussing [1] how to support DNS failover. Passt already has support for this, but there is a drawback to its solution which prevents us from taking a similar approach: the resolvers are read exactly once, so if the host changes networks at runtime, the guest will not receive the updated resolvers and thus its connectivity will break.
> > > >
> > > > So, passt/pasta kinda-sorta binds itself to a particular host interface, so DNS won't be the only issue if the host changes network. For one thing, at least by default the guest gets the same IP as the host, so if the host IP changes the guest will get out of sync. We'll mostly cope with that ok, but there will be some edge cases which will break (most obviously if after the network change the guest wants to talk to something at the host's old address / its current address).
> > > >
> > > > > Right -- the main motivation behind this (other than simplicity) is that we can close /etc/resolv.conf before sandboxing.
> > > > >
> > > > > However, we could keep a handle on it, just like we do for PID and pcap files, while still unmounting the filesystem.
> > > > >
> > > > > And we could also use inotify to detect changes I guess -- we do the same to monitor namespaces in pasta mode (see pasta_netns_quit_init()).
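
Just to make the inotify part a bit more concrete -- this is a rough, untested sketch, not code taken from passt or pasta, and resolv_watch_init() is a made-up name. The descriptor would simply be added to the existing epoll loop, and on events we'd re-read the file and update whatever we forward to:

#include <sys/inotify.h>
#include <unistd.h>

/* Watch /etc/resolv.conf, return the inotify descriptor or -1.
 * Note that resolv.conf is often replaced via rename(), which a
 * watch on the file itself only reports as IN_MOVE_SELF or
 * IN_IGNORED: a real implementation would rather watch /etc and
 * filter by name, or re-add the watch when that happens.
 */
int resolv_watch_init(void)
{
        int fd = inotify_init1(IN_CLOEXEC | IN_NONBLOCK);

        if (fd < 0)
                return -1;

        if (inotify_add_watch(fd, "/etc/resolv.conf",
                              IN_CLOSE_WRITE | IN_MOVE_SELF |
                              IN_DELETE_SELF) < 0) {
                close(fd);
                return -1;
        }

        return fd;
}
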
> > > > All true, but I'm not sure those are actually the most pressing issues we'll face with a host network change.
> > > >
> > > > > > libslirp's current approach is to DNAT a single address exposed to the guest to one of the resolvers configured on the host. The problem here is that if that one resolver goes down, the guest can't resolve DNS names. We're considering changing so that instead of a single address, we expose a set of MAXNS addresses, and DNAT those 1:1 to the DNS resolvers registered with the host. Because the DNAT table lives on the host side, we can refresh the guest's resolvers whenever the host's resolvers change, but without the need to expire a DHCP lease (even with short leases, the guest will still lose connectivity for a time).
> > > > > >
> > > > > > Does this sound like an approach Passt would be open to adopting as well?
> > > > >
> > > > > Yes, definitely, patches would be very welcome.
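
For the sake of discussion, the host-side bookkeeping for that 1:1 mapping could be as simple as a fixed table -- just a sketch with made-up names, not an existing passt or libslirp data structure; MAXNS here is the usual constant from <resolv.h>:

#include <resolv.h>             /* for MAXNS */
#include <netinet/in.h>

/* One slot per address we show to the guest: the guest column is
 * fixed at start-up, the host column can be refreshed any time the
 * host's resolvers change, without the guest noticing.
 */
struct dns_map {
        struct in_addr guest;   /* address advertised to the guest */
        struct in_addr host;    /* actual resolver on the host */
        int in_use;
};

static struct dns_map dns_map[MAXNS];

/* Called whenever the host resolver list changes (e.g. on inotify) */
static void dns_map_refresh(const struct in_addr *host_dns, int n)
{
        int i;

        for (i = 0; i < MAXNS; i++) {
                if (i < n) {
                        dns_map[i].host = host_dns[i];
                        dns_map[i].in_use = 1;
                } else {
                        dns_map[i].in_use = 0;
                }
        }
}
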
> > > > Hm, that doesn't fit that easily into the passt model. For the most part we don't NAT at all, we only have a couple of special cases where we do. Because of that the problem with adding any extra NAT case is address allocation. Currently we use the host's gateway address, which mostly works but is a bit troublesome. I have some ideas I think will work better, but those don't necessarily get us more available addresses.
> > >
> > > For libslirp we have the guest on a private subnet, so pulling addresses from that pool is pretty easy. For passt, is the issue that there is no address range, or that the infrastructure to allocate from the range just doesn't exist yet?
> >
> > [David is out this and next week]
> >
> > There's no address range because it's not designed with NAT in mind, even though it can do NAT. From what we discussed with David in the past, the idea, if I recall correctly, was that you could decide to, at least, remap a particular address instead of the gateway address (more on that below) -- and perhaps something more flexible with more addresses, but not an arbitrary number of them, as passt doesn't do dynamic memory allocation.
>
> Ah okay, it's sharing the host network by default? Or at least, doing its best to pretend that's the case?

Yes -- not sharing it, just pretending. Again, that's only by default.

> > > When you say "we use the host's gateway address", what is it used for exactly? (I didn't follow the loopback example below.)
> >
> > The host's default gateway address (for both IPv4 and IPv6) is advertised, by default, as gateway address/next hop of default route, to the guest, via DHCP/NDP.
> >
> > Again by default (unless --no-map-gw is used), the guest can then use this address to refer to the host (and not its default gateway). See also the "Handling of traffic with local destination and source addresses" section in the NOTES of passt(1).
> >
> > However, this is, at the moment, unrelated to how DNS addresses are mapped: right now you can specify --dns-forward zero to two times (separately for IPv4 and IPv6) and that will forward DNS queries (with reverse mapping) to the first configured resolver.
> >
> > So, if you are happy with this kind of solution (with a NAT), you pick the addresses yourself, you don't need pools or ranges, and you would "just" need, on top of what's already available, to change, at runtime, the resolver passt forwards queries to (perhaps via inotify as I mentioned).
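
In case it helps to picture the "reverse mapping" part: conceptually it boils down to swapping one address on the way out and swapping it back on the way in. This is only an illustration with made-up names, not how passt actually implements --dns-forward:

#include <netinet/in.h>
#include <arpa/inet.h>

/* Made-up globals: the address the guest sends queries to, and the
 * resolver we currently forward them to (can change at runtime).
 */
static struct in_addr dns_fwd_guest;
static struct in_addr dns_fwd_host;

/* Guest to host: redirect queries for the mapped address */
static void dns_nat_out(struct in_addr *daddr, in_port_t dport)
{
        if (dport == htons(53) && daddr->s_addr == dns_fwd_guest.s_addr)
                *daddr = dns_fwd_host;
}

/* Host to guest: make replies appear to come from the mapped address */
static void dns_nat_in(struct in_addr *saddr, in_port_t sport)
{
        if (sport == htons(53) && saddr->s_addr == dns_fwd_host.s_addr)
                *saddr = dns_fwd_guest;
}

Changing dns_fwd_host at runtime (say, when inotify fires) would then be enough to fail over without the guest ever seeing a different resolver address.
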
> Makes sense. The trouble is when N > 2, see below.
>
> > > > > Note that David (Cc'ed) is currently working on a generalised/flexible address mapping mechanism, some kind of (simple) NAT table as far as I understood it.
> > > >
> > > > That's a bit overstating it. I'm making our current single NAT case (translating host side loopback to gateway address on the guest) more configurable. I have plans (or at least ideas) for a more generalized NAT mechanism, but I'm really not implementing that yet. What I'm doing now is kind of a soft prerequisite for that rework though (as well as useful in its own right).
> > > >
> > > > > This might even address your DNS idea already, I'm not sure, I'd wait for him to comment.
> > > >
> > > > Hadn't considered specifically that model, but it's a reasonably natural extension of it (address allocation is still a complication). I'll certainly consider this case when I do more on this.
> > >
> > > It sounds like there might be a path to using NAT, but it's not something that would be ready soon. Given that, would there be long term concerns with using NAT for DNS in the way proposed here? I understand we can't implement it now, but I'd like to understand if it's an approach we would still rather avoid, even long term.
> >
> > I don't really see an issue with it, also because, actually, we already do it. :) ...even though it's for two address pairs only (internal/external IPv4/IPv6 addresses). If that's enough for your use case (more on that below), I think we can also implement a runtime change of resolvers now.
>
> Got it. The problem with just two pairs is when the host has N DNS resolvers, and N-1 of them are broken (N > 3 is unfortunately possible on the non-Unix systems (Windows) libslirp supports). It sounds like the *future* approach for passt might be tricky if dynamic allocation is completely off the table. Is some dynamic allocation permitted at initialization time? If so, we could detect the # of resolvers and perhaps take a start address as an argument?

If really necessary, I think we could consider doing some dynamic allocation before the seccomp profile is applied. However, how many addresses of resolvers could we possibly want? 16? 32?

Reading around a bit, it looks like Windows DHCP servers generally support assigning 25 addresses. It's not the same thing, but it would suggest that 32 is a reasonable choice -- and that's something we could merrily use to size static buffers.
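
To show why I'm not worried about static sizing: 32 entries each for IPv4 and IPv6 cost 32 * (4 + 16) = 640 bytes in total. A sketch, with made-up names and a deliberately naive parser -- the real thing would read from the descriptor we keep open before sandboxing rather than calling fopen():

#include <stdio.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <netinet/in.h>

#define DNS_HOST_MAX    32      /* made-up constant, see above */

static struct in_addr  dns4[DNS_HOST_MAX];      /* 32 *  4 = 128 bytes */
static struct in6_addr dns6[DNS_HOST_MAX];      /* 32 * 16 = 512 bytes */
static int dns4_n, dns6_n;

static void dns_hosts_reload(void)
{
        char line[512], addr[INET6_ADDRSTRLEN];
        FILE *f = fopen("/etc/resolv.conf", "r");

        if (!f)
                return;

        dns4_n = dns6_n = 0;
        while (fgets(line, sizeof(line), f)) {
                if (sscanf(line, " nameserver %45s", addr) != 1)
                        continue;

                if (dns4_n < DNS_HOST_MAX &&
                    inet_pton(AF_INET, addr, &dns4[dns4_n]) == 1)
                        dns4_n++;
                else if (dns6_n < DNS_HOST_MAX &&
                         inet_pton(AF_INET6, addr, &dns6[dns6_n]) == 1)
                        dns6_n++;
        }

        fclose(f);
}
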
> > > > --
> > > > David Gibson | I'll have my music baroque, and my code
> > > > david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
> > > > | _way_ _around_!
> > > > http://www.ozlabs.org/~dgibson
> > >
> > > On Wed, Jan 25, 2023 at 9:55 AM Stefano Brivio wrote:
> > > >
> > > > On Mon, 23 Jan 2023 17:20:13 +1100
> > > > David Gibson wrote:
> > > >
> > > > > On Sat, Jan 21, 2023 at 10:47:03AM +0100, Stefano Brivio wrote:
> > > > > > Hi Noah,
> > > > > >
> > > > > > Sorry for the delay, I didn't check pending mailing list posts for a couple of days. Comments below:
> > > > > >
> > > > > > On Tue, 17 Jan 2023 11:50:50 -0800
> > > > > > Noah Gold wrote:
> > > > > >
> > > > > > > Hi folks,
> > > > > > >
> > > > > > > libslirp and Passt have different approaches to sharing DNS resolvers with the guest system, each with their own benefits & drawbacks. On the libslirp project, we're discussing [1] how to support DNS failover. Passt already has support for this, but there is a drawback to its solution which prevents us from taking a similar approach: the resolvers are read exactly once, so if the host changes networks at runtime, the guest will not receive the updated resolvers and thus its connectivity will break.
> > > > >
> > > > > So, passt/pasta kinda-sorta binds itself to a particular host interface, so DNS won't be the only issue if the host changes network. For one thing, at least by default the guest gets the same IP as the host, so if the host IP changes the guest will get out of sync. We'll mostly cope with that ok, but there will be some edge cases which will break (most obviously if after the network change the guest wants to talk to something at the host's old address / its current address).
> > > >
> > > > Noah, by the way, if your usage for DNS failover is related to a virtual machine being migrated to another host with different addressing, mind that you could simply tell qemu to connect to a new instance of passt. That's something you can't do with libslirp.
> > >
> > > It's not related to machine migration, though that's another interesting case with similar constraints. The use case I'm thinking about is for a mobile device that may experience network changes as part of its normal operation (e.g. changing wifi networks).
> >
> > So... I admit I have no idea what happens exactly when you change parts of the host configuration, this kind of use case wasn't really a priority for passt in the... past.
>
> For the use case I'm looking at (present, not passt), it's probably fine for the typical thing to happen (all open sockets time out or hit resets) since that's happening on the host anyways.
>
> > I expect it to mostly work. By default, we don't do NAT because (with default options) the address of the guest matches the address of the host. But once you change addresses and routes on the host, passt should just start doing NAT, it's implicit and not something you need to enable or disable.
> >
> > Would you have a chance to try it out in the use case you had in mind, so that we can go through any issue you might hit?
>
> I'm working exclusively with Windows at the moment, so presently this is more to make sure the adjustments we make in libslirp could be applied to passt... in the future. (Time travel aside, my vague understanding is that passt may be the successor for libslirp, at least based on the interest from the maintainers in keeping some compatibility in terms of features. I'd be very curious if someone could clarify how the two projects relate beyond solving very similar problems.)

I guess we should eventually add a FAQ section to the project website. Meanwhile, there's some kind of summary about that in slide 16 from a presentation at KVM Forum last year:
  https://static.sched.com/hosted_files/kvmforum2022/01/passt_kubevirt_kvm_forum_2022_final.pdf
(recording at: https://www.youtube.com/watch?v=U89bWP1HNgU).

By the way, a Win2k port is up for grabs at https://bugs.passt.top/show_bug.cgi?id=8. I think the complexity might be similar to a FreeBSD/Darwin port (tracked at https://bugs.passt.top/show_bug.cgi?id=6).

> Conceptually though, I'll definitely keep this thread updated if we run into issues implementing first in libslirp, as they may apply to passt as well.

Thanks!

> > > > Would that solve your problem, or your issue is specifically related to DNS failover without any VM migration playing a role?
> > >
> > > It's not related to migration, but I wonder whether there's an idea there which could be used. The approach I was taking was to make the network component resilient to network changes. But another option is to detect network changes and restart the network component. libslirp still needs a way to support exposing multiple servers though, and I wonder whether we would want to require library consumers to write network awareness into their applications as opposed to solving it for them.
> >
> > Restarting the network component has a single, fundamental advantage, I think: it's a convenient way to reset a number of states and stored information in an implicit way.
> >
> > For example, it's better to reset TCP connections (stop the process, sockets close) than to let them hang. We could reset connections explicitly, of course, but this adds a bit of complexity.
> >
> > Still, with some effort we could make an attempt at actually keeping them alive. Maybe this even works with passt already.
> >
> > So I'm not really sure what would be the best approach. Making the network component resilient to network changes, in the long term, sounds more appropriate and elegant to me.
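
If we ever go down the "resilient" path, the detection side at least is cheap. Something along these lines (again untested, made-up name, not current passt code) would report host address and route changes without polling:

#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <unistd.h>

/* Subscribe to host address and route change notifications */
int nl_monitor_open(void)
{
        struct sockaddr_nl addr = { .nl_family = AF_NETLINK };
        int s = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_ROUTE);

        if (s < 0)
                return -1;

        addr.nl_groups = RTMGRP_IPV4_IFADDR | RTMGRP_IPV6_IFADDR |
                         RTMGRP_IPV4_ROUTE  | RTMGRP_IPV6_ROUTE;

        if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
                close(s);
                return -1;
        }

        return s;
}

The hard part is of course deciding what to do with the notifications (remap addresses, refresh resolvers, reset or keep connections), not getting them.
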
> > I was just suggesting that, in the short term, restarting passt should cover whatever use case you might have.
>
> Makes sense. I agree, long term resiliency seems like the cleaner solution.

-- 
Stefano