From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from gandalf.ozlabs.org (gandalf.ozlabs.org [150.107.74.76]) by passt.top (Postfix) with ESMTPS id B38B35A026F for ; Fri, 5 Jan 2024 08:34:36 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202312; t=1704440070; bh=s5Pyj1rq+SPiusBxx1N6/oh9N35dD8cMI4ZPDJx0paw=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=kwbajsY9iIu/o4lTjAB3dKDmT9Zc1pJbBIrbRGDIPEwiMLn9jYu4mKaco4izDh9ow G6jr2xMFweEY1Ptw5+AcIkGBwbL3ZC0ShfVycWC1MNG+4x8v6I4erIQDtVrWO1ZYSi yJicv/XXq8S250r2B/jOuMf6XtUhtofibugUcU1L9OnrtASYl6j0M807WddNn6FRVT bx9GKmo7e5NVi7aN5Dgz/lzXkQC0ZhLjlvp+xljeX7YrN1nEfOkAP+RmwpyuUIfFdr W4+owY+nJDlbM3+0kOtjIkyYTldlhj0T4f/RIV5ZIXRlLc0TtgLDrwERlJQmk1QHvc n1g4QAdqhwd5Q== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4T5wFf5jSMz4wyS; Fri, 5 Jan 2024 18:34:30 +1100 (AEDT) Date: Thu, 4 Jan 2024 21:02:19 +1100 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH v3 13/13] flow: Avoid moving flow entries to compact table Message-ID: References: <20231221061549.976358-1-david@gibson.dropbear.id.au> <20231221061549.976358-14-david@gibson.dropbear.id.au> <20231228192525.7ba1ee48@elisabeth> <20231230113304.37c60a9a@elisabeth> <20240102191341.7c91dd44@elisabeth> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="FyDaohthlYaKCwwx" Content-Disposition: inline In-Reply-To: <20240102191341.7c91dd44@elisabeth> Message-ID-Hash: LYW4DTAEEXWU6JP75WLXGYLIOV4LWBWK X-Message-ID-Hash: LYW4DTAEEXWU6JP75WLXGYLIOV4LWBWK X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --FyDaohthlYaKCwwx Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Jan 02, 2024 at 07:13:41PM +0100, Stefano Brivio wrote: > On Mon, 1 Jan 2024 23:01:17 +1100 > David Gibson wrote: >=20 > > On Sat, Dec 30, 2023 at 11:33:04AM +0100, Stefano Brivio wrote: > > > On Thu, 28 Dec 2023 19:25:25 +0100 > > > Stefano Brivio wrote: > > > =20 > > > > > On Thu, 21 Dec 2023 17:15:49 +1100 > > > > > David Gibson wrote: > > > > >=20 > > > > > [...] =20 > > > > > > > > [...] > > > > > > > > I wonder if we really have to keep track of the number of (non-)ent= ries > > > > in the free "block", and if we have to be explicit about the two ca= ses. > > > >=20 > > > > I'm trying to find out if we can simplify the whole thing with slig= htly > > > > different variants, for example: =20 > > >=20 > > > So... I think the version with (explicit) blocks has this fundamental > > > advantage, on deletion: > > > =20 > > > > > + flow->f.type =3D FLOW_TYPE_NONE; > > > > > + /* Put it back in a length 1 free block, don't attempt to fully= reverse > > > > > + * flow_alloc()s steps. This will get folded together the next= time > > > > > + * flow_defer_handler runs anyway() */ > > > > > + flow->free.n =3D 1; > > > > > + flow->free.next =3D flow_first_free; > > > > > + flow_first_free =3D FLOW_IDX(flow); =20 > > >=20 > > > which is doable even without explicit blocks, but much harder to > > > follow. =20 > >=20 > > Remember this is not a general deletion, only a "cancel" of the most > > recent allocation. >=20 > Oh, I thought that was only the case for this series and you would use > that as actual deletion in another pending series (which I haven't > finished reviewing yet). No. Not allowing deletion of any entry at any time is what I'm trading off to get both O(1) allocation and (effectively) O(1) deletion. > But now I'm not sure anymore why I was thinking this... >=20 > Anyway... do we really need it, then? Can't we just mark the "failed" > flows as whatever means "closed" for a specific protocol, and clean > them up later, instead of calling cancel() right away? We could, but I'm not sure we want to. For starters, that requires protocol-specific behaviour whenever we need to back out an allocation like this. Not a big deal, since that's in protocol specific code already, but I think it's uglier than calling cancel. It also requires that the protocol specific deferred cleanup functions (e.g. tcp_flow_defer()) handle partially initialised entries. With 'cancel' we can back out just the initialisation steps we've already done (because we know where we've failed during init), then remove the entry. The deferred cleanup function only needs to deal with "complete" entries. Again, certainly possible, but IMO uglier than having 'cancel'. Finally, leaving the "cancelled" entry there until the next deferred pass means if we allocate other flows before that defer pass they'll avoid this slot. That works against trying to keep the entries reasonably compact. > > To reduce fragmentation we are keeping the linked > > list of free clusters in strictly ascending order. The logic above is > > only correct if the entry we're freeing is before any other free entry > > in the table. That will be the case for the most recent allocation, > > because we always allocatte the first free entry in the table. >=20 > I see now. This isn't entirely clear from the "Theory of Operation", > on the other hand it's probably a bad idea to overload that description > with all these details. I'll see what I can in terms of making that clearer in the description. > > > Other than the comments I have, I would go ahead with your approach. = My > > > attempt below in case you're interested. The comment at the end of > > > flow_del() shows the issue I'm facing: > > >=20 > > > struct flow_free_block { > > > /* Must be first element */ > > > struct flow_common f; > > >=20 > > > unsigned next_free; > > > unsigned next_used; > > > }; > > >=20 > > > static union flow *flow_alloc(void) > > > { > > > union flow *flow; > > >=20 > > > if (flow_first_free >=3D FLOW_MAX) > > > return NULL; > > >=20 > > > flow =3D flowtab + flow_first_free; > > >=20 > > > flow_first_free =3D flow->free.next_free; > > >=20 > > > memset(flow, 0, sizeof(*flow)); /* TODO: select region */ > > > return flow; > > > } > > >=20 > > > static void flow_del(union flow *del) > > > { > > > union flow *left, *right, *next_used =3D NULL, *first_free; =20 > > fg > > >=20 > > > del->f.type =3D FLOW_TYPE_NONE; > > >=20 > > > left =3D (FLOW_IDX(del) > 0) ? FLOW(FLOW_IDX(del) - 1) := NULL; > > > right =3D (FLOW_IDX(del) < FLOW_MAX - 1) ? FLOW(FLOW_IDX(del) + 1) := NULL; > > >=20 > > > first_free =3D flow_first_free < FLOW_MAX ? FLOW(flow_first_free) : = NULL; > > >=20 > > > if (right) { > > > if (right->f.type =3D=3D FLOW_TYPE_NONE) > > > del->free.next_used =3D right->free.next_used; > > > else > > > del->free.next_used =3D right; > > > } else { > > > del->free.next_used =3D FLOW_MAX; > > > } > > >=20 > > > if (left && left->f.type =3D=3D FLOW_TYPE_NONE) { > > > left->free.next_free =3D FLOW_IDX(del); > > > left->free.next_used =3D del->free.next_used; > > > return; > > > } > > >=20 > > > if (flow_first_free =3D=3D FLOW_MAX) { > > > flow_first_free =3D FLOW_IDX(del); > > > } else if (flow_first_free > FLOW_IDX(del)) { > > > flow_first_free =3D FLOW_IDX(del); > > > del->free.next_free =3D flow_first_free; > > > } else if (flow_first_free < FLOW_IDX(del)) { > > > ; > > > /* Walk free slots from flow_first_free until FLOW_IDX(del) to > > > * find insertion point... but that makes deletion at most O(n), > > > * perhaps O(log(n)), certainly not O(1). =20 > >=20 > > Exactly. I can't see any way (short of far more complex data > > structures) around having a linear scan somewhere, although you can > > kind of choose whether it happens on alloc or free. > >=20 > > The idea of my implementation is to have it at free time - but to > > merge it with an existing linear scan so it won't have an additional > > cost. >=20 > Right, yes, I see now. This idea is also not terribly clear from the > description, by the way, but I don't have a good proposal right now. I'll see what I can come up with. --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --FyDaohthlYaKCwwx Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmWWghwACgkQzQJF27ox 2Gd7sA/+O6phH/3o03YKaIaIYgKSb5HFw9I10dF0RbbXH4/HqkgD9oIR80Q9aEaH bNGTwybI6XGYJVz9U+ET7guHYvV4/kSDDjhgvr6prJF6AEiQxNEuwXFl3wQdKnIA 9V5cfuUDk5PJNRkcNMZHOw2fYuGh08xb3pHgu5ORMCFUkzFOCzZ1xDjjspZu3uG3 1lRKH/WMiS5PLi2SF8wX/N9nJJcxXN0DWxHkF4OAEGE8uO4wZwtVZCtE+2fRewfO 2DGK0PRz8kuwH45b5kMsRFxEPjpo1yStugBFm8/jvSxxn56pXQMBHGSJbxoy72g8 gQsOCEj5SOUnk0xe6akAWzNLB/J2ccBroM4+S3rqN8iql8JwH0W+qOlnzCZDgFyi 6gTLsSBxZcl9lcmwQ4MvqXnS3iPa0NtNTvvJc9qEPaxVy8wpkkgSbuysf2sr3SWE khV+owiY3xe+7aP6e99IN9XOoJ5oV9wfI6p2+84hSABiwvVsUCpu7GuDxbUlJxPz aG7gMDY4hWRHGOM0yKEPYrJTq9oxfcJ+zzPHgDwH88rAJca5vRX/2H+hczEwM/FR YGvhvTD5Xpw0GQqizWepEWICzpEyphAZFiwG9hJ5YvYjHApXdbIEeGfjv631Vljy MOMrBoME2BEKIpErj24MhqRySFHWAYHBpX2XocrdvGf+bznlLXQ= =VASe -----END PGP SIGNATURE----- --FyDaohthlYaKCwwx--