From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: passt.top; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=CWnlrzDd; dkim-atps=neutral Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by passt.top (Postfix) with ESMTPS id 9E1FA5A026F for ; Sun, 05 Oct 2025 17:52:47 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1759679566; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=QHQ5QFAY4Q3/CsusBIh20uN2WwTj+WO/gFw2rgnsQYI=; b=CWnlrzDd3fzVQoG1vDNE7k+1Y4XZBvHNrtIyh8eebs1qMCFlvQttXHz8kqv8THPY7USMIA +CnXjDL6QnxIvG2f8zVxqHXOFUKa4dTPzNFw0V5emTVIMo+NVdRk5uo0LSTNLxqTwfK5AY ymOVuwlIGy0uOBnucUhxlGRTbmWJOYg= Received: from mail-qt1-f200.google.com (mail-qt1-f200.google.com [209.85.160.200]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-626-H4QUk8SONfisoPEqo99Ydg-1; Sun, 05 Oct 2025 11:52:45 -0400 X-MC-Unique: H4QUk8SONfisoPEqo99Ydg-1 X-Mimecast-MFC-AGG-ID: H4QUk8SONfisoPEqo99Ydg_1759679564 Received: by mail-qt1-f200.google.com with SMTP id d75a77b69052e-4e0fcbf8eb0so100703801cf.1 for ; Sun, 05 Oct 2025 08:52:45 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1759679564; x=1760284364; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=QHQ5QFAY4Q3/CsusBIh20uN2WwTj+WO/gFw2rgnsQYI=; b=kTBYWkBjvLkyBu1mEKvaLU9Wmruc1+tQjjAetDLzHH+OzyW8Kuy3lks/x01MKw2Q2x fGgAb4DnwGoEtsa/M3vbQlNO34W+SAxV1o3Wl5CVF7o7RJtMe4xiZfy6cqHF6jH7scTu WjVkI9k2Ng3HzpQGvICbBm1/0DValt+wC2xz45DfEf1o8H/rtr1DwfKE3TbTHm+GlIW4 Wv8/dx84ThrIMpK0Bw07ILasjzSB9yCfU4laesu7XpDunfPvGvmaOuIOV1sFP42JLOec itBGq9fTltXO4JJDtgolcVfA+FRVkzNNmA4N9OsF1BuMTwuH85yyYj4y/KqHUCGbONeu w/eQ== X-Forwarded-Encrypted: i=1; AJvYcCXOX21RZq0HWIt8j11yOltSpEWpAOf1AcOyotKLlEgyn9lzxg0ZIFVeG2xFjgx5M0Q+YIrgKEvEyRg=@passt.top X-Gm-Message-State: AOJu0YyiJzwwWCje0ySjOo/rcvz98jDuCE4DKYc1Rlg6dE4zdvOp+Chk TlRNsJzoHGcsGAShfplGeAI+49g3rSiVJVhDA+uZS8/x8UMKDvKXV3ZlwFY2FukMmhpNYVMSfLk rXTZF+wdzFOboc7T7ggWtDNgdqV4D76OTNgOEum2Cl37a81HlnumdJQ== X-Gm-Gg: ASbGncvy4GbWF/77SsRCDw9h9h40Sb8pw5rJLO6DUVV2s44+82lu4hpbVkqvde7Wd7Y xNun+JKBlRFp15brr7bLlppFfeYMwTWThW1m/XWS4ilGMSHDryf53IUvWLBOt3V3L8ldED2vwny hLTYbFe4X/q619vaAyms0e59t4NbVvjICkUOYa0OyJt8uzt4aicG6PrjaP9113Y32PxUpUxCAqj 8waID4481JH4xvoH4l6l+TUQ1l4RtiEXSuhrpMLQWfJHie38fi3SlcaSaa4L8WvfBr350srsdqF VjddtoAPRupQRdaP8YCqprqO47YlcWh8UCqlGCHXU5gCrTgY7kcOhX//3QFqcftp5Qm81HM3YB3 9wcvjy+TDfguTDkE= X-Received: by 2002:ac8:574f:0:b0:4d0:78cf:7f7c with SMTP id d75a77b69052e-4e576a97739mr116883861cf.36.1759679564517; Sun, 05 Oct 2025 08:52:44 -0700 (PDT) X-Google-Smtp-Source: AGHT+IF/sYVsO6WcLIAfKYV9LckfwhDB5zADN4vBxgCx/0IShZQgGnG6b9vQZ9UruXL4XMNLuhhtTg== X-Received: by 2002:ac8:574f:0:b0:4d0:78cf:7f7c with SMTP id d75a77b69052e-4e576a97739mr116883651cf.36.1759679563991; Sun, 05 Oct 2025 08:52:43 -0700 (PDT) Received: from ?IPV6:2001:4958:2193:9901:6217:960c:2ef1:f0f3? ([2001:4958:2193:9901:6217:960c:2ef1:f0f3]) by smtp.gmail.com with ESMTPSA id d75a77b69052e-4e55a7519a6sm96108551cf.11.2025.10.05.08.52.43 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sun, 05 Oct 2025 08:52:43 -0700 (PDT) Message-ID: <82865ef4-326e-48af-ab9a-6367f2c9e023@redhat.com> Date: Sun, 5 Oct 2025 11:52:42 -0400 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v12 2/9] fwd: Add cache table for ARP/NDP contents To: David Gibson References: <20251003003412.588801-1-jmaloy@redhat.com> <20251003003412.588801-3-jmaloy@redhat.com> From: Jon Maloy In-Reply-To: X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: E7Bdv5trJaSxRcaODy6xJhB62UvNOSDsxQJFekKfzJw_1759679564 X-Mimecast-Originator: redhat.com Content-Language: en-US Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Message-ID-Hash: ACTTAQXHOVAIWI3ZVGLXWKGO2VP6FWXY X-Message-ID-Hash: ACTTAQXHOVAIWI3ZVGLXWKGO2VP6FWXY X-MailFrom: jmaloy@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: sbrivio@redhat.com, dgibson@redhat.com, passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On 2025-10-03 00:31, David Gibson wrote: > On Thu, Oct 02, 2025 at 08:34:05PM -0400, Jon Maloy wrote: >> We add a cache table to keep track of the contents of the kernel ARP >> and NDP tables. The table is fed from the just introduced netlink based [...] >> +void fwd_neigh_table_update(const struct ctx *c, const union inany_addr *addr, >> + const uint8_t *mac) >> +{ >> + struct neigh_table *t = &neigh_table; >> + struct neigh_table_entry *e; >> + ssize_t slot; >> + >> + /* MAC address might change sometimes */ >> + e = fwd_neigh_table_find(c, addr); >> + if (e) { >> + if (inany_equals(addr, &inany_from_v4(c->ip4.guest_gw))) >> + return; >> + if (inany_equals(addr, (union inany_addr *)&c->ip6.guest_gw)) >> + return; > > This doesn't make sense to me. From the way its looked up in 4/9, the > IP addresses in the table are as seen by the host, not by the guest > (we look up the table *after* applying NAT). Which makes guest_gw not > meaningful here. > > You _do_ need to handle the case that addr is loopback (which guest_gw > might be translated to), and that's handled by your dummy entries. > > The other case we might need to consider here is if the (translated) > address is the host's IP. Do we want to give out_tap_addr for that > case? Or the host's real MAC on the template interface? If the > latter will that be in the host ARP table, or would we have to handle > it specially? > This is my current understanding of this: 1) Adding the loopback entries to the neigbour table is harmless, but unnecessary. fwd_neigh_mac_get() will always return our_tap_mac when it doesn't find an entry in the table, which is the case here. This addition was a mistake. 2) Regarding the default gw, we have two cases: 2.1) guest_gw IP is the same as the host_gw IP: In this case, we'll have an event trying to inserting an entry into the table with the host_gw's real mac address. In v11, this mapping was announced to the guest, but later contradicted by all ARP/NDP messages he receives, because they all come from PASTA/PASST and uses our_tap_mac as source mac in the message header. This is probably harmless, but causes confusion in the guest and a warning wireshark. In v12, I therefore chose to add the entry but suppress the announcements for the gw, and also the updates of the mac fields from subsequent events. This is not right either. We either have to be consistent, and add the real host_gw mac in all messages sent from the purported gw address, or we just as consistently use our_tap_mac. I think the latter is simpler with less consequences for the code. I think we should just simply suppress all events from the host gw in such cases. (Here I have a question: I don't see any NAT mapping making it possible to communicate between the guest and the the host gw when guest gw IP and host gw IP are are equal. Shouldn't there be one?) 2.2) guest_gw IP is different from host_gw IP: This case is simpler. We just treat the host_gw as any other local host, add and update its entry at new events, and pass it on to the guest as an announcement at first contact, consistently using that host's true mapping. 3) Regarding the host address, we also have two cases: 3.1) guest IP is the same as the host IP (on the default interface): In this case the guest can only reach the host by using the guest_gw IP (which lands on the host), some or the other IPs on the host, or the map_host_loopback IP, as far as I understand. The host IP will never enter the neigbour table by ARP/NDP event, and there is no reason for us to add a dummy for it. 3.2) guest IP differs from the host IP. In this case we let fwd_neigh_table_init() add an entry for the host as if it were any other host, using the IP and mac addresses from the template interface. Conclusion: - We use nat_inbound() instead of nat_outbound() before consulting the table, like you suggest. - We need to manually add an entry representing the host in the case the host IP differs from the guest IP. This entry is announced. In the case the IPs are the same we don't add any entry. - Local host entries are added by ARP/NDP events, but only if nat_inbound() doesn't translate the IP address (will that ever happen?). We suppress all events from the host gw in case it has the same IP as the guest gw (unless this is covered by the nat_inbound() check?). - We may want a NAT mapping making it possible to reach the host gw in the case it has the same IP as the guest gw. This has no effect on our neighbour table. If you agree with my above analysis I can go ahead and post a v13 of the series early next week, including the above changes and fixes for your other comments to the series. ///jon >> + >> + memcpy(e->mac, mac, ETH_ALEN); >> + return; >> + } >> + >> + e = t->free; >> + if (!e) { >> + debug("Failed to allocate neighbour table entry"); >> + return; >> + } >> + t->free = e->next; >> + >> + slot = neigh_table_slot(c, addr); >> + e->next = t->slots[slot]; >> + t->slots[slot] = e; >> + >> + memcpy(&e->addr, addr, sizeof(*addr)); >> + memcpy(e->mac, mac, ETH_ALEN); >> +} >> + >> +/** >> + * fwd_neigh_table_free() - Remove an entry from a slot and add it to free list >> + * @c: Execution context >> + * @addr: IP address used to find the slot for the entry >> + */ >> +void fwd_neigh_table_free(const struct ctx *c, const union inany_addr *addr) >> +{ >> + ssize_t slot = neigh_table_slot(c, addr); >> + struct neigh_table *t = &neigh_table; >> + struct neigh_table_entry *e, **prev; >> + >> + prev = &t->slots[slot]; >> + e = t->slots[slot]; >> + while (e && !inany_equals(&e->addr, addr)) { >> + prev = &e->next; >> + e = e->next; >> + } >> + if (!e) >> + return; >> + >> + *prev = e->next; >> + e->next = t->free; >> + t->free = e; >> + memset(&e->addr, 0, sizeof(*addr)); >> + memset(e->mac, 0, ETH_ALEN); > > As Stefano noted earlier, 0xff is probably a better placeholder here, > since 00:00:00:00:00:00 is a valid MAC. > >> +} >> + >> +/** >> + * fwd_neigh_mac_get() - Look up MAC address in the ARP/NDP table >> + * @c: Execution context >> + * @addr: Neighbour IP address used as lookup key >> + * @mac: Buffer for Ethernet MAC to return, found or default value. >> + * >> + * Return: true if real MAC found, false if not found or if failure >> + */ >> +bool fwd_neigh_mac_get(const struct ctx *c, const union inany_addr *addr, >> + uint8_t *mac) >> +{ >> + const struct neigh_table_entry *e = fwd_neigh_table_find(c, addr); >> + >> + if (e) >> + memcpy(mac, e->mac, ETH_ALEN); >> + else >> + memcpy(mac, c->our_tap_mac, ETH_ALEN); >> + >> + return !!e; >> +} >> + >> +/** >> + * fwd_neigh_table_init() - Initialize the neighbour table >> + * @c: Execution context >> + */ >> +void fwd_neigh_table_init(const struct ctx *c) >> +{ >> + struct neigh_table *t = &neigh_table; >> + const uint8_t *omac = c->our_tap_mac; >> + struct neigh_table_entry *e; >> + int i; >> + >> + memset(t, 0, sizeof(*t)); >> + for (i = 0; i < NEIGH_TABLE_SIZE; i++) { >> + e = &t->entries[i]; >> + e->next = t->free; >> + t->free = e; >> + } >> + >> + /* These addresses must always map to our own MAC address */ >> + fwd_neigh_table_update(c, &inany_loopback4, omac); >> + fwd_neigh_table_update(c, &inany_loopback6, omac); > > These make sense. > >> + fwd_neigh_table_update(c, &inany_from_v4(c->ip4.guest_gw), omac); >> + fwd_neigh_table_update(c, (union inany_addr *)&c->ip6.guest_gw, omac); > > But these don't. For two reasons: > * As noted guest_gw is only meaningful guest side, but the table is > based on host side addresses. > * These will be no-ops, since you explicitly exclude these addresses > in fwd_neigh_table_update(). > >> +} >> + >> /** fwd_probe_ephemeral() - Determine what ports this host considers ephemeral >> * >> * Work out what ports the host thinks are emphemeral and record it for later >> diff --git a/fwd.h b/fwd.h >> index 65c7c96..6ca743c 100644 >> --- a/fwd.h >> +++ b/fwd.h >> @@ -56,5 +56,12 @@ uint8_t fwd_nat_from_splice(const struct ctx *c, uint8_t proto, >> const struct flowside *ini, struct flowside *tgt); >> uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto, >> const struct flowside *ini, struct flowside *tgt); >> +void fwd_neigh_table_update(const struct ctx *c, const union inany_addr *addr, >> + const uint8_t *mac); >> +void fwd_neigh_table_free(const struct ctx *c, >> + const union inany_addr *addr); >> +bool fwd_neigh_mac_get(const struct ctx *c, const union inany_addr *addr, >> + uint8_t *mac); >> +void fwd_neigh_table_init(const struct ctx *c); >> >> #endif /* FWD_H */ >> diff --git a/netlink.c b/netlink.c >> index 3fe2fdd..4be5fcf 100644 >> --- a/netlink.c >> +++ b/netlink.c >> @@ -1192,10 +1192,13 @@ static void nl_neigh_msg_read(const struct ctx *c, struct nlmsghdr *nh) >> inany_from_af(&addr, ndm->ndm_family, dst); >> inany_ntop(dst, ip_str, sizeof(ip_str)); >> >> - if (nh->nlmsg_type == RTM_NEWNEIGH && ndm->ndm_state & NUD_VALID) >> + if (nh->nlmsg_type == RTM_NEWNEIGH && ndm->ndm_state & NUD_VALID) { >> trace("neigh table update: %s / %s", ip_str, mac_str); >> - else >> + fwd_neigh_table_update(c, &addr, mac); >> + } else { >> trace("neigh table delete: %s / %s", ip_str, mac_str); >> + fwd_neigh_table_free(c, &addr); >> + } >> } >> >> /** >> -- >> 2.50.1 >> >