From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: passt-dev@passt.top
Subject: Re: [PATCH 3/3] conf: Be more precise about minimum MTUs
Date: Wed, 19 Feb 2025 06:37:28 +0100 [thread overview]
Message-ID: <20250219063728.309bf1ac@elisabeth> (raw)
In-Reply-To: <20250219031429.3708026-4-david@gibson.dropbear.id.au>
On Wed, 19 Feb 2025 14:14:29 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:
> Currently we reject the -m option if given a value less than ETH_MAX_MTU
ETH_MIN_MTU
> (68). That define is derived from the kernel, but its name is misleading:
> it doesn't really have anything to do with Ethernet per se, but is rather
> the minimum payload any L2 link must be able to handle in order to carry
> IPv4.
Yes, that should be IPV4_MIN_MTU instead, but it was only added as
recently as 4.14 kernels, so I opted for ETH_MIN_MTU. A misnomer as you
pointed out, but safe.
> For IPv6, it's not sufficient: that requires an MTU of at least
> 1280.
>
> Furthermore, the value of 68 is the minimum IP *fragment* size the link
> must be able to carry. Since we don't support IP fragmentation, it's not
> sufficient for us. Instead we should clamp the MTU to 576 for IPv4 - the
> minimum IP datagram size that all hosts must be able to accept.
First off, the only assumption in RFC 791 terms we can _perhaps_ make is
that we are some kind of "module" (also called "node", could be host or
router), not a (full) host. Maybe not even a module. So, with that
regard, we don't need to be prepared to _accept_ (for ourselves as
destination) any particular datagram size.
Second, even if all hosts need to be able to accept 576-byte datagrams,
that doesn't mean that all links need to be able to carry them. The MTU
refers _to the link_, not to what a host is able to accept.
And that's the reason why you can set 68 bytes as MTU on most network
interfaces on Linux. We set sub-576 values ourselves in tests:
$ grep -rn "mtu 256" *
passt_tcp:95:guest ip link set dev __IFNAME__ mtu 256
passt_vu_tcp:95:guest ip link set dev __IFNAME__ mtu 256
That is, indeed, all hosts (not "modules") need to be able to accept
(not "forward") datagram sizes of at least 576 bytes... but that's only
assuming you can deliver those datagrams to them.
This is not just a theoretical matter. As late as 2018, I was made
aware of a setup with several (local!) nodes with links between them
having ~380 bytes as MTU.
Sure enough, the reason why I know about this was an issue coming from
the same flawed assumption made in kernel commit c9fefa08190f
("ip6_tunnel: get the min mtu properly in ip6_tnl_xmit"), and fixed by
82a40777de12 ("ip6_tunnel: use the right value for ipv4 min mtu check
in ip6_tnl_xmit").
See also commit b4331a681822 ("vti6: Change minimum MTU to IPV4_MIN_MTU,
vti6 can carry IPv4 too") on the subject of what links can carry vs.
what endpoints should be able to forward.
> Move the verification of the MTU's lower bound to logic specific to the IP
> versions and correct those errors.
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> conf.c | 20 +++++++++++++++-----
> ip.h | 7 +++++++
> util.h | 3 ---
> 3 files changed, 22 insertions(+), 8 deletions(-)
>
> diff --git a/conf.c b/conf.c
> index c5ee07b0..e127acc1 100644
> --- a/conf.c
> +++ b/conf.c
> @@ -1663,9 +1663,9 @@ void conf(struct ctx *c, int argc, char **argv)
> if (errno || *e)
> die("Invalid MTU: %s", optarg);
>
> - if (mtu && (mtu < ETH_MIN_MTU || mtu > ETH_MAX_MTU)) {
> - die("MTU %lu out of range (%u..%u)", mtu,
> - ETH_MIN_MTU, ETH_MAX_MTU);
> + if (mtu > ETH_MAX_MTU) {
> + die("MTU %lu too large (max %u)",
> + mtu, ETH_MAX_MTU);
> }
>
> c->mtu = mtu;
> @@ -1838,10 +1838,20 @@ void conf(struct ctx *c, int argc, char **argv)
> log_conf_parsed = true; /* Stop printing everything */
>
> nl_sock_init(c, false);
> - if (!v6_only)
> + if (!v6_only) {
> + if (c->mtu < IPV4_MINMAX_DATAGRAM) {
Now, if you want to make this symmetric with the IPv6 case, we could
also move this here... it just unnecessarily adds lines of code, and
this function is already (necessarily) rather long.
> + die("MTU %"PRIu16" is too small for IPv4 (minimum %u)",
> + c->mtu, IPV4_MINMAX_DATAGRAM);
> + }
> c->ifi4 = conf_ip4(ifi4, &c->ip4);
> - if (!v4_only)
> + }
> + if (!v4_only) {
> + if (c->mtu < IPV6_MIN_MTU) {
> + die("MTU %"PRIu16" is too small for IPv6 (minimum %u)",
> + c->mtu, IPV6_MIN_MTU);
Does the fact that we don't disable IPv6 imply that IPv6 must be
working at all times? In my opinion not.
It's also rather convenient to be able to specify '-m 200' (for
whatever test) without having to give '-4' explicitly.
From a functionality perspective, I think warn() would be a better
choice.
> + }
> c->ifi6 = conf_ip6(ifi6, &c->ip6);
> + }
> if ((*c->ip4.ifname_out && !c->ifi4) ||
> (*c->ip6.ifname_out && !c->ifi6))
> die("External interface not usable");
> diff --git a/ip.h b/ip.h
> index 1544dbf4..8f5262fa 100644
> --- a/ip.h
> +++ b/ip.h
> @@ -104,4 +104,11 @@ static const struct in6_addr in6addr_ll_all_nodes = {
> /* IPv4 Limited Broadcast (RFC 919, Section 7), 255.255.255.255 */
> static const struct in_addr in4addr_broadcast = { 0xffffffff };
>
> +/* Minimum IP datagram size all hosts must be prepared to accept (RFC 791) */
> +#define IPV4_MINMAX_DATAGRAM 576
> +
> +#ifndef IPV6_MIN_MTU
> +#define IPV6_MIN_MTU 1280
> +#endif
> +
> #endif /* IP_H */
> diff --git a/util.h b/util.h
> index 50e96d32..bdca5ee6 100644
> --- a/util.h
> +++ b/util.h
> @@ -34,9 +34,6 @@
> #ifndef ETH_MAX_MTU
> #define ETH_MAX_MTU USHRT_MAX
> #endif
> -#ifndef ETH_MIN_MTU
> -#define ETH_MIN_MTU 68
> -#endif
I would be fine with using IPV4_MIN_MTU by the way and adding a
fallback for it (there's no such thing as IP_MIN_MTU).
> #ifndef IP_MAX_MTU
> #define IP_MAX_MTU USHRT_MAX
> #endif
--
Stefano
next prev parent reply other threads:[~2025-02-19 5:37 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-19 3:14 [PATCH 0/3] Improve validation of --mtu option David Gibson
2025-02-19 3:14 ` [PATCH 1/3] conf: More thorough error checking when parsing " David Gibson
2025-02-19 6:56 ` Stefano Brivio
2025-02-19 3:14 ` [PATCH 2/3] conf: Use 0 instead of -1 as "unassigned" mtu value David Gibson
2025-02-19 6:56 ` Stefano Brivio
2025-02-19 3:14 ` [PATCH 3/3] conf: Be more precise about minimum MTUs David Gibson
2025-02-19 5:37 ` Stefano Brivio [this message]
2025-02-20 3:55 ` David Gibson
2025-02-20 6:45 ` Stefano Brivio
2025-02-20 10:06 ` David Gibson
2025-02-20 10:14 ` Stefano Brivio
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250219063728.309bf1ac@elisabeth \
--to=sbrivio@redhat.com \
--cc=david@gibson.dropbear.id.au \
--cc=passt-dev@passt.top \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).