Date: Thu, 20 Feb 2025 11:14:25 +0100
From: Stefano Brivio
To: David Gibson
CC: passt-dev@passt.top
Subject: Re: [PATCH 3/3] conf: Be more precise about minimum MTUs
Message-ID: <20250220111425.66a8657e@elisabeth>
References: <20250219031429.3708026-1-david@gibson.dropbear.id.au> <20250219031429.3708026-4-david@gibson.dropbear.id.au> <20250219063728.309bf1ac@elisabeth> <20250220074540.318bee27@elisabeth>

On Thu, 20 Feb 2025 21:06:52 +1100
David Gibson wrote:

> On Thu, Feb 20, 2025 at 07:45:40AM +0100, Stefano Brivio wrote:
> > On Thu, 20 Feb 2025 14:55:30 +1100
> > David Gibson wrote:
> >
> > > On Wed, Feb 19, 2025 at 06:37:28AM +0100, Stefano Brivio wrote:
> > > > On Wed, 19 Feb 2025 14:14:29 +1100
> > > > David Gibson wrote:
> > > >
> > > > > Currently we reject the -m option if given a value less than ETH_MAX_MTU
> > > >
> > > > ETH_MIN_MTU
> > > >
> > > > > (68). That define is derived from the kernel, but its name is misleading:
> > > > > it doesn't really have anything to do with Ethernet per se, but is rather
> > > > > the minimum payload any L2 link must be able to handle in order to carry
> > > > > IPv4.
> > > >
> > > > Yes, that should be IPV4_MIN_MTU instead, but it was only added as
> > > > recently as 4.14 kernels, so I opted for ETH_MIN_MTU. A misnomer as you
> > > > pointed out, but safe.
> > >
> > > Ah, thanks, I hadn't realised that newer kernels had better named
> > > constants. When I respin I'll use matching names.
> > >
> > > > > For IPv6, it's not sufficient: that requires an MTU of at least
> > > > > 1280.
> > > > >
> > > > > Furthermore, the value of 68 is the minimum IP *fragment* size the link
> > > > > must be able to carry. Since we don't support IP fragmentation, it's not
> > > > > sufficient for us. Instead we should clamp the MTU to 576 for IPv4 - the
> > > > > minimum IP datagram size that all hosts must be able to accept.
> > > >
> > > > First off, the only assumption in RFC 791 terms we can _perhaps_ make is
> > > > that we are some kind of "module" (also called "node", could be host or
> > > > router), not a (full) host. Maybe not even a module. So, with that
> > > > regard, we don't need to be prepared to _accept_ (for ourselves as
> > > > destination) any particular datagram size.
> > > >
> > > > Second, even if all hosts need to be able to accept 576-byte datagrams,
> > > > that doesn't mean that all links need to be able to carry them. The MTU
> > > > refers _to the link_, not to what a host is able to accept.
> > >
> > > Ah... yes. I was thinking that that requirement implied that a link
> > > which can't fragment was useless if it couldn't carry 576-byte
> > > datagrams, but thinking over your examples here I realise I was
> > > mistaken.
> > >
> > > > And that's the reason why you can set 68 bytes as MTU on most network
> > > > interfaces on Linux.
> > > > We set sub-576 values ourselves in tests:
> > > >
> > > > $ grep -rn "mtu 256" *
> > > > passt_tcp:95:guest ip link set dev __IFNAME__ mtu 256
> > > > passt_vu_tcp:95:guest ip link set dev __IFNAME__ mtu 256
> > > >
> > > > That is, indeed, all hosts (not "modules") need to be able to accept
> > > > (not "forward") datagram sizes of at least 576 bytes... but that's only
> > > > assuming you can deliver those datagrams to them.
> > > >
> > > > This is not just a theoretical matter. As late as 2018, I was made
> > > > aware of a setup with several (local!) nodes with links between them
> > > > having ~380 bytes as MTU.
> > > >
> > > > Sure enough, the reason why I know about this was an issue coming from
> > > > the same flawed assumption made in kernel commit c9fefa08190f
> > > > ("ip6_tunnel: get the min mtu properly in ip6_tnl_xmit"), and fixed by
> > > > 82a40777de12 ("ip6_tunnel: use the right value for ipv4 min mtu check
> > > > in ip6_tnl_xmit").
> > > >
> > > > See also commit b4331a681822 ("vti6: Change minimum MTU to IPV4_MIN_MTU,
> > > > vti6 can carry IPv4 too") on the subject of what links can carry vs.
> > > > what endpoints should be able to forward.
> > > >
> > > > > Move the verification of the MTU's lower bound to logic specific to the IP
> > > > > versions and correct those errors.
> > > > >
> > > > > Signed-off-by: David Gibson
> > > > > ---
> > > > >  conf.c | 20 +++++++++++++++-----
> > > > >  ip.h   |  7 +++++++
> > > > >  util.h |  3 ---
> > > > >  3 files changed, 22 insertions(+), 8 deletions(-)
> > > > >
> > > > > diff --git a/conf.c b/conf.c
> > > > > index c5ee07b0..e127acc1 100644
> > > > > --- a/conf.c
> > > > > +++ b/conf.c
> > > > > @@ -1663,9 +1663,9 @@ void conf(struct ctx *c, int argc, char **argv)
> > > > >  			if (errno || *e)
> > > > >  				die("Invalid MTU: %s", optarg);
> > > > >
> > > > > -			if (mtu && (mtu < ETH_MIN_MTU || mtu > ETH_MAX_MTU)) {
> > > > > -				die("MTU %lu out of range (%u..%u)", mtu,
> > > > > -				    ETH_MIN_MTU, ETH_MAX_MTU);
> > > > > +			if (mtu > ETH_MAX_MTU) {
> > > > > +				die("MTU %lu too large (max %u)",
> > > > > +				    mtu, ETH_MAX_MTU);
> > > > >  			}
> > > > >
> > > > >  			c->mtu = mtu;
> > > > > @@ -1838,10 +1838,20 @@ void conf(struct ctx *c, int argc, char **argv)
> > > > >  	log_conf_parsed = true; /* Stop printing everything */
> > > > >
> > > > >  	nl_sock_init(c, false);
> > > > > -	if (!v6_only)
> > > > > +	if (!v6_only) {
> > > > > +		if (c->mtu < IPV4_MINMAX_DATAGRAM) {
> > > >
> > > > Now, if you want to make this symmetric with the IPv6 case, we could
> > > > also move this here... it just unnecessarily adds lines of code, and
> > > > this function is already (necessarily) rather long.
> > >
> > > Sorry, I'm not following what change you're suggesting (or discussing?).
> >
> > The exact change I quoted: moving the check on the minimum MTU to here:
> >
> > 	if (c->mtu < IPV4_MINMAX_DATAGRAM) {
> >
> > compared to doing it earlier in conf().
>
> But... the diff you're commenting on is already doing exactly that.

Right, I said that we could (anyway) move the check here as your patch
does, just to make that symmetric with IPv6, regardless of my other
considerations. I just think it's unnecessary.

> What am I missing?

Nothing, I guess.

> > > > > +			die("MTU %"PRIu16" is too small for IPv4 (minimum %u)",
> > > > > +			    c->mtu, IPV4_MINMAX_DATAGRAM);
> > > > > +		}
> > > > >  		c->ifi4 = conf_ip4(ifi4, &c->ip4);
> > > > > -	if (!v4_only)
> > > > > +	}
> > > > > +	if (!v4_only) {
> > > > > +		if (c->mtu < IPV6_MIN_MTU) {
> > > > > +			die("MTU %"PRIu16" is too small for IPv6 (minimum %u)",
> > > > > +			    c->mtu, IPV6_MIN_MTU);
> > > >
> > > > Does the fact that we don't disable IPv6 imply that IPv6 must be
> > > > working at all times? In my opinion not.
> > > >
> > > > It's also rather convenient to be able to specify '-m 200' (for
> > > > whatever test) without having to give '-4' explicitly.
> > > >
> > > > From a functionality perspective, I think warn() would be a better
> > > > choice.
> > >
> > > warn() and disable the relevant protocol. That makes sense, I'll make
> > > that change.
> >
> > I don't think it makes sense to disable IPv4, highlighting quote:
> >
> > > > Does the fact that we don't disable IPv6 imply that IPv6 must be
> > > > working at all times? In my opinion not.
> >
> > ...you can advertise a small MTU for whatever reason. The guest might
> > configure it or not. The guest might change it later on. We have no way
> > to re-enable IPv6 once it's disabled, though.
>
> Ah... good point.
>
> > So let's just do what the user says, I would suggest, and warn them
> > that it *might* not work. There is zero functionality gained by
> > disabling IPv6.
>
> Ok, I'll send a v3 which does that.

Okay, thanks.

-- 
Stefano
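
The behaviour the thread converges on (hard error only above the maximum, a warning rather than die() below the per-protocol minimums, and no protocol disabled) could be sketched roughly as follows. This is not the v3 patch: the helper mtu_check(), the SKETCH_* constants and the flag names are hypothetical, the values 68, 1280 and 65535 are taken from the discussion or assumed, and treating an MTU of 0 as "not set" is an assumption borrowed from the pre-patch check.

```c
#include <stdbool.h>
#include <stdio.h>

/* Illustrative values and names only, not passt's actual definitions */
#define SKETCH_IPV4_MIN_MTU	68	/* minimum link MTU able to carry IPv4 */
#define SKETCH_IPV6_MIN_MTU	1280	/* minimum link MTU able to carry IPv6 */
#define SKETCH_MAX_MTU		65535	/* assumed upper bound */

enum mtu_verdict {
	MTU_OK		= 0,
	MTU_WARN_V4	= 1 << 0,	/* below IPv4 minimum: IPv4 might not work */
	MTU_WARN_V6	= 1 << 1,	/* below IPv6 minimum: IPv6 might not work */
	MTU_TOO_LARGE	= 1 << 2,	/* the only hard error in this sketch */
};

/* Hypothetical helper: reports flags instead of calling die() or warn()
 * directly, and never disables a protocol. An MTU of 0 is assumed to mean
 * "not set by the user" and is accepted as is.
 */
static int mtu_check(unsigned long mtu, bool v4_enabled, bool v6_enabled)
{
	int ret = MTU_OK;

	if (mtu > SKETCH_MAX_MTU)
		return MTU_TOO_LARGE;

	if (!mtu)
		return MTU_OK;

	if (v4_enabled && mtu < SKETCH_IPV4_MIN_MTU) {
		fprintf(stderr, "MTU %lu might be too small for IPv4 (minimum %d)\n",
			mtu, SKETCH_IPV4_MIN_MTU);
		ret |= MTU_WARN_V4;
	}

	if (v6_enabled && mtu < SKETCH_IPV6_MIN_MTU) {
		fprintf(stderr, "MTU %lu might be too small for IPv6 (minimum %d)\n",
			mtu, SKETCH_IPV6_MIN_MTU);
		ret |= MTU_WARN_V6;
	}

	return ret;
}
```

With this sketch, the '-m 200' case mentioned above, with both IP versions enabled, yields MTU_WARN_V6 only: the value is accepted, the user is warned that IPv6 might not work, and nothing is disabled.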