From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 17 Oct 2024 10:31:22 +0200
From: Stefano Brivio
To: David Gibson
CC: passt-dev@passt.top
Subject: Re: [PATCH v3 4/4] fwd: Direct inbound spliced forwards to the guest's external address
Message-ID: <20241017103122.29b1afb0@elisabeth>
References: <20241002054826.1812844-1-david@gibson.dropbear.id.au>
 <20241002054826.1812844-5-david@gibson.dropbear.id.au>
 <20241009150721.63af48f6@elisabeth>
 <20241009224433.7fc28fc7@elisabeth>
 <20241016172648.666b0f8c@elisabeth>
Organization: Red Hat
List-Id: Development discussion and patches for passt

On Thu, 17 Oct 2024 12:19:58 +1100
David Gibson wrote:

> On Wed, Oct 16, 2024 at 05:26:48PM +0200, Stefano Brivio wrote:
> > On Wed, 16 Oct 2024 19:39:40 +1100
> > David Gibson wrote:
> >
> > > On Wed, Oct 16, 2024 at 04:46:52PM +1100, David Gibson wrote:
> > > > On Wed, Oct 16, 2024 at 02:15:19PM +1100, David Gibson wrote:
> > > > > On Thu, Oct 10, 2024 at 04:57:32PM +1100, David Gibson wrote:
> > > > > > On Wed, Oct 09, 2024 at 10:44:33PM +0200, Stefano Brivio wrote:
> > > > > > > On Wed, 9 Oct 2024 15:07:21 +0200
> > > > > > > Stefano Brivio wrote:
> > > > > [snip]
> > > > > > > > > @@ -447,20 +447,35 @@ uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto,
> > > > > > > > >  	    (proto == IPPROTO_TCP || proto == IPPROTO_UDP)) {
> > > > > > > > >  		/* spliceable */
> > > > > > > > >
> > > > > > > > > -		/* Preserve the specific loopback adddress used, but let the
> > > > > > > > > -		 * kernel pick a source port on the target side
> > > > > > > > > +		/* The traffic will go over the guest's 'lo' interface, but by
> > > > > > > > > +		 * default use its external address, so we don't inadvertently
> > > > > > > > > +		 * expose services that listen only on the guest's loopback
> > > > > > > > > +		 * address.  That can be overridden by --host-lo-to-ns-lo which
> > > > > > > > > +		 * will instead forward to the loopback address in the guest.
> > > > > > > > > +		 *
> > > > > > > > > +		 * In either case, let the kernel pick the source address to
> > > > > > > > > +		 * match.
> > > > > > > > >  		 */
> > > > > > > > > -		tgt->oaddr = ini->eaddr;
> > > > > > > > > +		if (inany_v4(&ini->eaddr)) {
> > > > > > > > > +			if (c->host_lo_to_ns_lo)
> > > > > > > > > +				tgt->eaddr = inany_loopback4;
> > > > > > > > > +			else
> > > > > > > > > +				tgt->eaddr = inany_from_v4(c->ip4.addr_seen);
> > > > > > > > > +			tgt->oaddr = inany_any4;
> > > > > > > > > +		} else {
> > > > > > > > > +			if (c->host_lo_to_ns_lo)
> > > > > > > > > +				tgt->eaddr = inany_loopback6;
> > > > > > > > > +			else
> > > > > > > > > +				tgt->eaddr.a6 = c->ip6.addr_seen;
> > > > > > > >
> > > > > > > > Either this...
> > > > > > > >
> > > > > > > > > +			tgt->oaddr = inany_any6;
> > > > > > > >
> > > > > > > > or this (and not something before this patch, up to 3/4) make the
> > > > > > > > "TCP/IPv6: host to ns (spliced): big transfer" test in pasta/tcp hang,
> > > > > > > > sometimes (about one in three/four runs), that's what I mistakenly
> > > > > > > > reported as coming from Laurent's series at:
> > > > > >
> > > > > > Huh, interesting.  Just got back from my leave and ran that group of
> > > > > > tests in a loop this afternoon, but didn't manage to reproduce.  I
> > > > > > have administrivia that will probably fill the rest of this week, but
> > > > > > I'll look into this as soon as I can.
> > > > >
> > > > > I reproduced the problem on passt.top, and I have a partial idea
> > > > > what's going on.  As you say it's seeming like the address (addr_seen
> > > > > == addr in this case) isn't properly ready.  This is over splice, but
> > > > > on the tap interface, I see the container sending NS messages for its
> > > > > own address - seems like it's doing DAD.  But more importantly, we're
> > > > > answering those NS messages with NA messages, because we answer all
> > > > > NS.  i.e. we're making the DAD fail.  What I'm not sure of is how this
> > > > > ever worked at all.
> > > > > --config-net makes sense, since we disable DAD,
> > > > > but our test suite has always been using NDP+DHCP instead of
> > > > > --config-net.
> > > > >
> > > > > So, AFAICT, we'll always fail guest DAD attempts, both IPv6, which
> > > > > happens most of the time and for IPv4 via ARP, which is used much more
> > > > > rarely.  I think we need to be more selective in what NS or ARP
> > > > > lookups we respond to.  The question is what approach to take:
> > > >
> > > > Hmm... no.. there's more to this.
> > > >
> > > > Usually DAD requests have :: as the source address, and we *do*
> > > > exclude those from getting replies.  In this case though, we're
> > > > getting NS requests for the assigned address from what looks like the
> > > > SLAAC address.  So, I do think it would be wise to explicitly exclude
> > > > these: we shouldn't be giving NA responses for an address that ought
> > > > to belong to the guest, even if it doesn't look like a DAD.
> > > >
> > > > But, I'm not sure what's triggering this.  Is for some reason the DHCP
> > > > address not "taking", so the container is trying to locate it on the
> > > > network instead?  Or _is_ this DAD, but under some circumstances
> > > > rather than using :: as the source address it uses another configured
> > > > address.
> > >
> > > Ok.. I've understood a bit more.  While timing is a factor here, it
> > > looks like the main reason I wasn't seeing it on my machine is what
> > > I'd consider a bug in the Debian version of the dhclient-script:
> > > when adding an IPv6 address, it returns without waiting for DAD to
> > > complete (i.e. for the address to be non-tentative).
> >
> > Oops. On one hand, I would feel inclined to propose a fix for the
> > Debian and Ubuntu packages. On the other hand, I wonder if it's
> > universally considered a bug: the DHCPv6 client did its job at that
> > point, and it's debatable whether dhclient should wait for the address
> > to be usable before forking to background.
> >
> > That is, arguably, the job of dhclient is to request and configure an
> > address. It's not a network configuration daemon. There might be many
> > other reasons why that address is unusable, and yet dhclient is not
> > responsible for them.
>
> Hrm... I guess.  Counterpoints..
>  - Most other failures to get a usable address will result in a
>    visible error
>  - dhclient has a --dad-wait-time option which seems to imply that the
>    script should wait for DAD
>  - The upstream script version waits for DAD
>
> In any case I filed a report for it
>   https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1085231
>
> > By the way, I guess it's just an issue for test scripts like this one.
>
> Why do you guess that?

Because it's kind of rare that your address changes if you use DHCPv6, I
guess, so this would be relevant almost exclusively at boot.

And, at boot, if a remote peer/client happens to try to connect to the
machine where the client is running right after an address was
assigned, it will almost certainly have a retry mechanism anyway.

-- 
Stefano