
Conversation

@fgdorais
Collaborator

@fgdorais fgdorais commented Oct 25, 2024

WIP replacement for Mathlib's FinEnum class.

@fgdorais fgdorais added the WIP work in progress label Oct 25, 2024
@fgdorais fgdorais force-pushed the fin-coding branch 4 times, most recently from c69d46c to 3c36b0e on October 25, 2024 01:22
@leanprover-community-bot
Collaborator

leanprover-community-bot commented Oct 25, 2024

Mathlib CI status (docs):

leanprover-community-mathlib4-bot added a commit to leanprover-community/mathlib4 that referenced this pull request Oct 25, 2024
leanprover-community-mathlib4-bot added a commit to leanprover-community/mathlib4 that referenced this pull request Oct 26, 2024
@lyphyser

lyphyser commented Oct 29, 2024

This still seems asymptotically inefficient and thus not acceptable for actual computation.

For instance:

  • (decodePi f x) has runtime linear in n. Instead it should be sublinear: keep a data structure of partial products and just compute (x / a) % b, where a is the product up to the input index i and b is the size of the Fin (f i) at that index (both a and b need to be retrieved in sublinear time from a lazy data structure; see the sketch after this list).
  • (decodeSigma f) is also linear in n. Instead it should be sublinear: binary-search an array (or search a binary tree) of partial sums to find the first member of the pair as the index, then subtract to get the second member.
  • I haven't checked the others, but in general any decode function that recurses is probably suboptimal.
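
As a concrete (hypothetical) Lean sketch of both points, assuming the partial products/sums have already been materialized; none of these names exist in the PR:

```lean
/-- Hypothetical `decodePi` core: with `a` the partial product up to index
`i` and `b` the size of `Fin (f i)`, extracting component `i` of the code
`x` is a single div/mod. -/
def piComponent (a b x : Nat) : Nat := (x / a) % b

/-- Hypothetical helper: largest `i` with `sums[i] ≤ x`, by binary search;
`partial` just spares the sketch a termination proof. -/
partial def searchLE (sums : Array Nat) (x lo hi : Nat) : Nat :=
  if hi ≤ lo + 1 then lo
  else
    let mid := (lo + hi) / 2
    if sums.getD mid 0 ≤ x then searchLE sums x mid hi
    else searchLE sums x lo mid

/-- Hypothetical `decodeSigma` core: with `sums[i] = f 0 + ⋯ + f (i-1)`,
the first component is found by binary search, the second by subtraction. -/
def sigmaComponents (sums : Array Nat) (x : Nat) : Nat × Nat :=
  let i := searchLE sums x 0 sums.size
  (i, x - sums.getD i 0)
```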

Note that both data structures also need to be computed lazily, so that decoding 0, or a small value in general, doesn't require computing the full set of partial products or partial sums (which would again take linear rather than sublinear time, and in fact might be larger than available memory). This might require using some sort of binary tree, or a chain of thunks of arrays whose sizes are increasing powers of 2, or modifying core Lean to provide some sort of ThunkArray primitive.
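
A possible shape for one lazily computed block of partial products (a sketch; `Thunk`, `Id.run`, and `Array` are core Lean, the rest is hypothetical):

```lean
/-- Hypothetical block `i` of the lazy structure: `len = 2^i` consecutive
partial products of `f` starting at `start`, seeded with `acc`, the running
product of everything before `start` (obtained by forcing the previous
block). Nothing is computed until the thunk is forced. -/
def prodBlock (f : Nat → Nat) (start len acc : Nat) : Thunk (Array Nat) :=
  Thunk.mk fun _ => Id.run do
    let mut arr := Array.mkEmpty len
    let mut p := acc
    for j in [0:len] do
      p := p * f (start + j)
      arr := arr.push p
    return arr
```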

This is important since computable functions should always be asymptotically optimal, and also because I think the "decide" tactic would end up decoding all possible values, so if those functions are linear, then decide will be quadratic, which is very bad.

@digama0
Member

digama0 commented Oct 30, 2024

I don't think it's that simple. Most functions don't have the ability to do precomputation: writing let x := precomputed; fun y => f x y doesn't do what you think it does. So you have to think about the cost of doing the calculation from scratch. Even if they could do precomputation, it's not necessarily a win: keeping an array of partial products could be hugely expensive if the number of possible values is large.
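
For illustration, a hypothetical example of the pitfall (whether `table` is shared across calls is up to the compiler, and kernel reduction can rebuild it at every unfolding):

```lean
-- This looks like "build `table` once, then close over it", but nothing
-- obliges the compiler or the kernel to share `table` across calls to
-- `squares`; after eta-expansion it may be rebuilt on every application.
def squares : Nat → Nat :=
  let table := (List.range 1024).map fun n => n * n
  fun y => table.getD y 0
```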

This is important since computable functions should always be asymptotically optimal, and also because I think the "decide" tactic would end up decoding all possible values, so if those functions are linear, then decide will be quadratic, which is very bad.

Not exactly. decide uses kernel computation, and kernel computation is pervasively cached. So calling a function which is linear time, a linear number of times, might also be linear time depending on how similar the computations are.

@fgdorais
Collaborator Author

@lyphyser I totally agree about potential optimizations. This is partly why this PR is marked WIP. Some of your thoughts are also things I am considering for the final PR.

Your optimizations focus on time only. Part of the reason for this PR is that these functions should, in principle, scale far beyond the memory capacity of a single machine, and perhaps even of physical reality! So memory is actually more of a concern than time for the "generic" implementation. This might sound highbrow, but there are incredibly large arrays of data out there that could easily overload any local storage scheme of the kind you propose.

Anyway, as far as time optimization is concerned, I think there could be a more general class that allows decoding gaps, where decode returns none sometimes. That could be used to delay range compression until the end. That would certainly help in some places, but I'm not sure the overall gain is great enough to justify it.
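
Roughly, such a class might look like this (a hypothetical sketch, not something this PR currently defines):

```lean
/-- Hypothetical "gappy" variant: `decode` may return `none`, so range
compression can be postponed until the very end. -/
class FinEncode (α : Type) where
  size : Nat
  encode : α → Fin size
  decode : Fin size → Option α
  decode_encode : ∀ a, decode (encode a) = some a
```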

@lyphyser

lyphyser commented Oct 30, 2024

Most functions don't have the ability to do precomputation, writing let x := precomputed; fun y => f x y doesn't do what you think it does

What do you mean? What does it do? This computes "precomputed" at most once in most (if not all) programming languages, and I would expect it to work that way in Lean as well under compilation (and also under reduction, why not?).

Or do you mean that it could compute "precomputed" less than once, or only partially? (That would be good.) Perhaps "precomputation" is not the right word for this, and "memoization" or "lazy computation" is a better fit.

So memory is actually more of a concern than time for the "generic" implementation

Both are a concern, which is why I suggested that the precomputed data structures need to be implemented lazily, so that getting the item at index i only requires creating a data structure of size O(i).

Note that I think this means they can't literally be simple arrays, but a lazy list of thunks of arrays, where the i-th array has 2^i items which are themselves thunks (each referencing the previous thunk) and whose concatenation forms the whole sequence, should work. A better solution would be to implement a ThunkArray in core that can grow the underlying array lazily as needed.
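
As a sketch (hypothetical names; looking up index n forces only the one block containing it):

```lean
/-- Hypothetical layout: the `i`-th array in the list has `2^i` entries. -/
structure ThunkBlocks (α : Type) where
  blocks : List (Thunk (Array α))

/-- Look up index `n`: walk O(log n) list cells, forcing a single block. -/
def ThunkBlocks.get? {α : Type} (t : ThunkBlocks α) (n : Nat) : Option α :=
  go t.blocks n 1
where
  go : List (Thunk (Array α)) → Nat → Nat → Option α
    | [], _, _ => none
    | b :: bs, k, size =>
      if k < size then (b.get)[k]? else go bs (k - size) (2 * size)
```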

There is the downside that this doesn't support the case where the data produced would exceed available memory but still be computable in the available time. I'm not sure how to support that well, and it seems less important than making decoding of all (or almost all) values linear instead of quadratic. One could band-aid over it by querying total available memory from the OS and not creating the data structure if it would be too large, although if the program was relying on sublinear running time, this will just make it hang for longer than the user is willing to wait anyway.

Not exactly. decide uses kernel computation, and kernel computation is pervasively cached

But the user might also want to use native_decide (here I'm talking about deciding statements of the form "for all x in a finitely enumerable type, P(x)", which requires decoding all values if no enumerator is provided) or otherwise use it in compiled code, so it needs to be asymptotically optimal without relying on kernel caching. And if it is asymptotically optimal under compilation, then I think it will be under reduction as well, as long as no proofs that aren't constant-sized are present (right?).

Anyway, as far as time optimization is concerned, I think there could be a more general class that allows decoding gaps, where decode returns none sometimes

Mathlib currently has this for infinite types, in the form of Encodable, which allows gaps, and Denumerable, which extends Encodable and does not allow gaps.
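
For reference, Mathlib's Encodable is roughly (field names as in Mathlib.Logic.Encodable.Basic):

```lean
class Encodable (α : Type*) where
  encode : α → ℕ
  decode : ℕ → Option α
  encodek : ∀ a, decode (encode a) = some a
```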

I think it would be good to support this.

It would also be good to support an iterator class where elements are computed one by one in order, which would be the most efficient way to decide "for all x in a FinEnum, P(x)" statements. This would be like Rust's IntoIterator/Iterator, but also including a proof that every value of the type is returned at least once, then extended by a stronger interface that guarantees each value is returned exactly once.
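
A hypothetical shape for such a class (everything here is illustrative, not an existing API):

```lean
/-- Hypothetical helper: the element emitted after `n` steps, if any. -/
def nthOut {α σ : Type} (next : σ → Option (α × σ)) : Nat → σ → Option α
  | 0, s => (next s).map Prod.fst
  | n + 1, s => (next s).bind fun p => nthOut next n p.2

/-- Hypothetical iterator-style enumerator: a Rust-like `next`, plus a proof
that every element is eventually produced (at least once). -/
class IterEnum (α : Type) where
  σ : Type
  start : σ
  next : σ → Option (α × σ)
  complete : ∀ a : α, ∃ n, nthOut next n start = some a
```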

But the equivalence-based class in the current pull request should be made asymptotically optimal anyway, even if those interfaces are present, since some use cases might need it and not be able to use the other classes.

BTW, it would also be good to have more general classes that support efficient encoding/decoding and partial enumeration of infinite countable types in addition to finite types, and ideally also encoding/decoding of an arbitrary countable subset in the case of non-countable types (this would be useful if one just wants to choose any N distinct elements computably). Some classes expressing that Zero.zero encodes to 0, One.one encodes to 1, and Inhabited.default encodes to 0 would also be useful.
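
For instance (hypothetical, assuming Mathlib's Zero and Encodable are in scope):

```lean
/-- Hypothetical law pinning the canonical element to code 0. -/
class EncodeZero (α : Type) [Zero α] extends Encodable α where
  encode_zero : encode (0 : α) = 0
```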

@leanprover-community-mathlib4-bot leanprover-community-mathlib4-bot added the merge-conflict This PR has merge conflicts with the `main` branch which must be resolved by the author. label Feb 26, 2025
leanprover-community-mathlib4-bot added a commit to leanprover-community/mathlib4 that referenced this pull request Mar 24, 2025
@leanprover-community-mathlib4-bot leanprover-community-mathlib4-bot removed the merge-conflict This PR has merge conflicts with the `main` branch which must be resolved by the author. label Mar 24, 2025
leanprover-community-mathlib4-bot added a commit to leanprover-community/mathlib4 that referenced this pull request Mar 25, 2025
leanprover-community-mathlib4-bot added a commit to leanprover-community/mathlib4 that referenced this pull request May 31, 2025
leanprover-community-mathlib4-bot added a commit to leanprover-community/mathlib4 that referenced this pull request May 31, 2025
@fgdorais fgdorais changed the title from "feat: add Data.Fin.Coding and Data.Fin.Enum" to "feat: add Data.Fin.Enum" Jul 19, 2025
@fgdorais fgdorais mentioned this pull request Jul 21, 2025
@leanprover-community-mathlib4-bot leanprover-community-mathlib4-bot added the merge-conflict This PR has merge conflicts with the `main` branch which must be resolved by the author. label Jul 23, 2025
@leanprover-community-mathlib4-bot leanprover-community-mathlib4-bot removed the merge-conflict This PR has merge conflicts with the `main` branch which must be resolved by the author. label Sep 8, 2025
@leanprover-community-mathlib4-bot leanprover-community-mathlib4-bot added the merge-conflict This PR has merge conflicts with the `main` branch which must be resolved by the author. label Oct 22, 2025