From mboxrd@z Thu Jan 1 00:00:00 1970
From: Laurent Vivier
To: passt-dev@passt.top
Subject: [PATCH] flow: fix podman issue #25959
Date: Wed, 30 Apr 2025 18:05:25 +0200
Message-ID: <20250430160525.3642997-1-lvivier@redhat.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset="US-ASCII"; x-default=true
CC: Laurent Vivier
List-Id: Development discussion and patches for passt

While running piHole using podman, traffic can trigger the following
assert:

  ASSERTION FAILED in flow_alloc (flow.c:521): flow->f.state == FLOW_STATE_FREE

Backtrace shows that this happens in flow_defer_handler():

  #4 0x00005610d6f5b481 flow_alloc (passt + 0xb481)
  #5 0x00005610d6f74f86 udp_flow_from_sock (passt + 0x24f86)
  #6 0x00005610d6f737c3 udp_sock_fwd (passt + 0x237c3)
  #7 0x00005610d6f74c07 udp_flush_flow (passt + 0x24c07)
  #8 0x00005610d6f752c2 udp_flow_defer (passt + 0x252c2)
  #9 0x00005610d6f5bce1 flow_defer_handler (passt + 0xbce1)

We are trying to allocate a new flow inside the loop freeing them.

Inside the loop free_head points to the first free flow entry in the
current cluster.
But if we allocate a new entry during the loop, free_head is not updated
and can now point to the entry we have just allocated.

We can fix the problem by splitting the loop in two parts:
- first part where we can close some flows and allocate some new
  flow entries,
- second part where we free the entries closed in the previous
  loop and we aggregate the free entries to merge consecutive
  clusters.

Link: https://github.com/containers/podman/issues/25959
Signed-off-by: Laurent Vivier
---
 flow.c | 107 ++++++++++++++++++++++++++++++---------------------------
 1 file changed, 57 insertions(+), 50 deletions(-)

diff --git a/flow.c b/flow.c
index 3c81cb42f921..00c1b2cc316f 100644
--- a/flow.c
+++ b/flow.c
@@ -788,6 +788,7 @@ void flow_defer_handler(const struct ctx *c, const struct timespec *now)
 {
 	struct flow_free_cluster *free_head = NULL;
 	unsigned *last_next = &flow_first_free;
+	bool to_free[FLOW_MAX] = { 0 };
 	bool timer = false;
 	union flow *flow;
 
@@ -798,9 +799,44 @@ void flow_defer_handler(const struct ctx *c, const struct timespec *now)
 
 	ASSERT(!flow_new_entry); /* Incomplete flow at end of cycle */
 
-	flow_foreach_slot(flow) {
+	/* Check which flows we might need to close first, but don't free them
+	 * yet as it's not safe to do that in the middle of flow_foreach().
+	 */
+	flow_foreach(flow) {
 		bool closed = false;
 
+		switch (flow->f.type) {
+		case FLOW_TYPE_NONE:
+			ASSERT(false);
+			break;
+		case FLOW_TCP:
+			closed = tcp_flow_defer(&flow->tcp);
+			break;
+		case FLOW_TCP_SPLICE:
+			closed = tcp_splice_flow_defer(&flow->tcp_splice);
+			if (!closed && timer)
+				tcp_splice_timer(c, &flow->tcp_splice);
+			break;
+		case FLOW_PING4:
+		case FLOW_PING6:
+			if (timer)
+				closed = icmp_ping_timer(c, &flow->ping, now);
+			break;
+		case FLOW_UDP:
+			closed = udp_flow_defer(c, &flow->udp, now);
+			if (!closed && timer)
+				closed = udp_flow_timer(c, &flow->udp, now);
+			break;
+		default:
+			/* Assume other flow types don't need any handling */
+			;
+		}
+
+		to_free[FLOW_IDX(flow)] = closed;
+	}
+
+	/* Second step: actually free the flows */
+	flow_foreach_slot(flow) {
 		switch (flow->f.state) {
 		case FLOW_STATE_FREE: {
 			unsigned skip = flow->free.n;
@@ -833,60 +869,31 @@ void flow_defer_handler(const struct ctx *c, const struct timespec *now)
 			break;
 
 		case FLOW_STATE_ACTIVE:
-			/* Nothing to do */
+			if (to_free[FLOW_IDX(flow)]) {
+				flow_set_state(&flow->f, FLOW_STATE_FREE);
+				memset(flow, 0, sizeof(*flow));
+
+				if (free_head) {
+					/* Add slot to current free cluster */
+					ASSERT(FLOW_IDX(flow) ==
+					       FLOW_IDX(free_head) + free_head->n);
+					free_head->n++;
+					flow->free.n = flow->free.next = 0;
+				} else {
+					/* Create new free cluster */
+					free_head = &flow->free;
+					free_head->n = 1;
+					*last_next = FLOW_IDX(flow);
+					last_next = &free_head->next;
+				}
+			} else {
+				free_head = NULL;
+			}
 			break;
 
 		default:
 			ASSERT(false);
 		}
-
-		switch (flow->f.type) {
-		case FLOW_TYPE_NONE:
-			ASSERT(false);
-			break;
-		case FLOW_TCP:
-			closed = tcp_flow_defer(&flow->tcp);
-			break;
-		case FLOW_TCP_SPLICE:
-			closed = tcp_splice_flow_defer(&flow->tcp_splice);
-			if (!closed && timer)
-				tcp_splice_timer(c, &flow->tcp_splice);
-			break;
-		case FLOW_PING4:
-		case FLOW_PING6:
-			if (timer)
-				closed = icmp_ping_timer(c, &flow->ping, now);
-			break;
-		case FLOW_UDP:
-			closed = udp_flow_defer(c, &flow->udp, now);
-			if (!closed && timer)
-				closed = udp_flow_timer(c, &flow->udp, now);
-			break;
-		default:
-			/* Assume other flow types don't need any handling */
-			;
-		}
-
-		if (closed) {
-			flow_set_state(&flow->f, FLOW_STATE_FREE);
-			memset(flow, 0, sizeof(*flow));
-
-			if (free_head) {
-				/* Add slot to current free cluster */
-				ASSERT(FLOW_IDX(flow) ==
-				       FLOW_IDX(free_head) + free_head->n);
-				free_head->n++;
-				flow->free.n = flow->free.next = 0;
-			} else {
-				/* Create new free cluster */
-				free_head = &flow->free;
-				free_head->n = 1;
-				*last_next = FLOW_IDX(flow);
-				last_next = &free_head->next;
-			}
-		} else {
-			free_head = NULL;
-		}
 	}
 
 	*last_next = FLOW_MAX;
-- 
2.49.0