Do not apply this version of the patch: I just discovered it's entirely breaking ping. Investigating... On Wed, Jul 05, 2023 at 11:04:27AM +1000, David Gibson wrote: > On Tue, Jul 04, 2023 at 01:21:04PM +0200, Stefano Brivio wrote: > > On Tue, 4 Jul 2023 14:36:23 +1000 > > David Gibson wrote: > > > > > We don't handle defragmentation of IP packets coming from the tap side, > > > and we're unlikely to any time soon (with our large MTU, it's not useful > > > for practical use cases). Currently, however, we simply ignore the > > > fragmentation flags and treat fragments as though they were whole IP > > > packets. This isn't ideal and can lead to rather cryptic behaviour if we > > > do receive IP fragments. > > > > > > Change the code to explicitly drop fragmented packets, and print a rate > > > limited warning if we do encounter them. > > > > > > Bugzilla: https://bugs.passt.top/show_bug.cgi?id=62 > > > > By the way, I silently replaced those with "Link:" in the past, just in > > case we want to automate something around it one day, to avoid > > differences between references to different bug trackers. > > Oh, ok. I'll keep that in mind. > > > Once upon a time, I wrote some scripting to automatically link HTML > > reports with (Linux kernel) commits to bug trackers, and it was quite > > painful to discover all possible spellings of "Bugzilla" plus a few > > others, hence my thought. But let me know if something speaks against > > this. > > No, that makes sense. > > > > Signed-off-by: David Gibson > > > --- > > > tap.c | 31 +++++++++++++++++++++++++++++++ > > > 1 file changed, 31 insertions(+) > > > > > > diff --git a/tap.c b/tap.c > > > index e3235299..2e6939fa 100644 > > > --- a/tap.c > > > +++ b/tap.c > > > @@ -62,6 +62,7 @@ static PACKET_POOL_NOINIT(pool_tap4, TAP_MSGS, pkt_buf); > > > static PACKET_POOL_NOINIT(pool_tap6, TAP_MSGS, pkt_buf); > > > > > > #define TAP_SEQS 128 /* Different L4 tuples in one batch */ > > > +#define FRAGMENT_MSG_RATE 10 /* # seconds between fragment warnings */ > > > > > > /** > > > * tap_send() - Send frame, with qemu socket header if needed > > > @@ -543,6 +544,32 @@ static void tap_packet_debug(const struct iphdr *iph, > > > } > > > } > > > > > > +/** > > > + * tap4_is_fragment() - Determine if a packet is an IP fragment > > > > This is actually independent from the "tap" "side", it could also be > > e.g. ipv4_is_fragment(), in util.c. Not a strong preference though, I > > guess we can also merge it as it is. > > Well.. the detection of fragments is independent, but the warning > message is specific to tap. I'm inclined to leave it as is, at least > until we have a need for this logic somewhere else, at which point we > can refactor. > > > > > > + * @iph: IPv4 header (length already validated) > > > + * @now: Current timestamp > > > + * > > > + * Return: true if iph is an IP fragment, false otherwise > > > + */ > > > +static bool tap4_is_fragment(const struct iphdr *iph, > > > + const struct timespec *now) > > > +{ > > > + if (iph->frag_off & ~IP_DF) { > > > + /* Ratelimit messages */ > > > + static time_t last_message; > > > + static unsigned num_dropped; > > > + > > > + num_dropped++; > > > + if (now->tv_sec - last_message > FRAGMENT_MSG_RATE) { > > > + warn("Can't process IPv4 fragments (%lu dropped)", num_dropped); > > > + last_message = now->tv_sec; > > > + num_dropped = 0; > > > + } > > > + return true; > > > + } > > > + return false; > > > +} > > > + > > > /** > > > * tap4_handler() - IPv4 and ARP packet handler for tap file descriptor > > > * @c: Execution context > > > @@ -591,6 +618,10 @@ resume: > > > hlen > l3_len) > > > continue; > > > > > > + /* We don't handle IP fragments, drop them */ > > > + if (tap4_is_fragment(iph, now)) > > > + continue; > > > + > > > l4_len = l3_len - hlen; > > > > > > if (iph->saddr && c->ip4.addr_seen.s_addr != iph->saddr) { > > > -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson