feat: add Data.Fin.Enum #1007
Conversation
Force-pushed from c69d46c to 3c36b0e.
Mathlib CI status (docs):
This still seems asymptotically inefficient and thus not acceptable for actual computation. For instance:

Note that both data structures also need to be computed lazily, so that, for instance, decoding 0, or any small value in general, doesn't require computing the whole set of partial products or partial sums (which would again be linear rather than sublinear time, and in fact might be larger than available memory). This might require using some sort of binary tree, or a set of thunks of arrays whose sizes are increasing powers of 2, or modifying core Lean to provide some sort of ThunkArray primitive. This is important since computable functions should always be asymptotically optimal, and also because I think the "decide" tactic would end up decoding all possible values, so if those functions are linear, then decide will be quadratic, which is very bad.
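For concreteness, here is a minimal Lean sketch of the chunked-thunk layout suggested above. The names (`LazyChunks`, `get?`, `getAux`) and the exact representation are illustrative assumptions, not something proposed in this PR:

```lean
/-- Sketch: a lazy sequence stored in chunks, where chunk `i` is a thunk of an
    array intended to hold `2^i` elements. Reading index `n` forces only the
    chunks needed to reach `n`, so decoding a small value touches only a small
    prefix of the precomputed data. -/
structure LazyChunks (α : Type) where
  chunk : Nat → Thunk (Array α)

namespace LazyChunks

/-- Walk the chunks until the one containing the requested index; later
    (larger) chunks are never forced. `partial` keeps the sketch short; a real
    implementation would carry a termination proof instead. -/
partial def getAux {α : Type} (s : LazyChunks α) (i rem : Nat) : Option α :=
  let size := 2 ^ i
  if rem < size then ((s.chunk i).get)[rem]?
  else getAux s (i + 1) (rem - size)

/-- Element at global index `n` of the concatenation of all chunks. -/
def get? {α : Type} (s : LazyChunks α) (n : Nat) : Option α :=
  s.getAux 0 n

end LazyChunks
```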
    
I don't think it's that simple. Most functions don't have the ability to do precomputation, writing

Not exactly.
    
@lyphyser I totally agree about potential optimizations. This is partly why this PR is marked WIP, and some of your thoughts are also things I am considering for the final PR. Your optimizations focus on time only. Part of the reason for this PR is that these functions should, in principle, work for types whose size far exceeds the memory capacity of a single machine, and perhaps even of physical reality! So memory is actually more of a concern than time for the "generic" implementation. This might sound highbrow, but there are incredibly large arrays of data out there that could easily overwhelm any kind of local storage scheme like the one you propose. Anyway, as far as time optimization is concerned, I think there could be a more general class that allows decoding gaps, where decode returns
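The comment above is cut off, so the intended signature is not recorded here. As a point of reference only, a gap-allowing finite encoding class could look roughly like Mathlib's Encodable restricted to a finite code range; all names below are illustrative, not taken from the PR:

```lean
/-- Illustrative sketch of a finite encoding class that permits gaps, modelled
    on Mathlib's `Encodable`: `decode` may return `none` for unused codes.
    Nothing here is taken from the PR itself. -/
class FinEncodable (α : Type) where
  /-- Size of the code space; may be larger than the cardinality of `α`. -/
  card : Nat
  encode : α → Fin card
  decode : Fin card → Option α
  /-- Decoding a code produced by `encode` recovers the original element. -/
  decode_encode : ∀ a : α, decode (encode a) = some a
```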
    
          
What do you mean? What does it do? This only computes "precomputed" at most once in most/all programming languages, and I would expect it to work that way in Lean as well under compilation (and also under reduction, why not?). Or do you mean that it could compute "precomputed" less than once, or only partially? (Which would be good.) Perhaps "precomputation" is not the right word for this and "memoization" or "lazy computation" is a better fit.
Both are a concern, which is why I suggested that the precomputed data structures need to be implemented lazily, so that getting the item at index i of the data structure only requires creating a data structure of size O(i). Note that I think this means they can't literally be simple arrays, but I think that a lazy list of thunks of arrays, where the i-th array has 2^i items which are themselves thunks (each referencing the previous thunk) and where the concatenation of the arrays is the whole sequence, should work. A better solution would be to implement a ThunkArray in core that can grow the underlying array lazily as needed. There is the downside that this doesn't support the case where the data produced would exceed available memory but still be computable in the available time; I'm not sure how to support that well, and it seems less important than making decoding of all/almost all values linear instead of quadratic. It's possible to band-aid over it by querying total available memory from the OS and not creating the data structure if it would be too large, although of course this will just make the program hang for longer than the user is willing to wait anyway, if the user was relying on a sublinear running time.
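A rough Lean sketch of what such a growable, memoizing array could look like in user code today. No such primitive exists in core Lean; `ThunkArray` and its fields are invented names, and a core version would presumably avoid going through `IO`:

```lean
/-- Illustrative sketch of the `ThunkArray` idea: a memoizing view of a
    generator `gen : Nat → α` whose backing array grows on demand.
    Invented for illustration; nothing like this exists in core Lean. -/
structure ThunkArray (α : Type) where
  gen   : Nat → α
  cache : IO.Ref (Array α)

namespace ThunkArray

def new {α : Type} (gen : Nat → α) : IO (ThunkArray α) := do
  return { gen, cache := (← IO.mkRef (#[] : Array α)) }

/-- Element `i`: extends the cache up to `i` on first access (cost O(i)),
    then O(1) for repeated accesses. -/
def get {α : Type} [Inhabited α] (t : ThunkArray α) (i : Nat) : IO α := do
  let mut a ← t.cache.get
  for j in [a.size : i + 1] do
    a := a.push (t.gen j)
  t.cache.set a
  return a[i]!

end ThunkArray
```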
But the user might also want to use native_decide (here I'm talking about deciding statements of the form "for all x in a finitely enumerable type, P(x)", which requires decoding all numbers if no enumerator is provided) or otherwise use it in compiled code, so it needs to be asymptotically optimal without relying on that. And if it is asymptotically optimal under compilation, then I think it will be under reduction as well, as long as no proofs that aren't constant-sized are present (right?).
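As a concrete instance of the use case being discussed, a statement like the one below is decided by enumerating every element of the finite type, so the per-element decoding cost multiplies into the cost of the whole check. The example is illustrative and assumes the usual Decidable instance for quantification over Fin is available:

```lean
-- Deciding a universally quantified statement over a finite type forces the
-- kernel (with `decide`) or the compiler (with `native_decide`) to visit
-- every element. If producing the k-th element costs O(k), the whole check
-- degrades from linear to quadratic in the cardinality of the type.
example : ∀ x : Fin 64, x.val * 2 < 128 := by decide
```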
Mathlib currently has this for infinite types, in the form of Encodable, which allows gaps, and Denumerable, which extends Encodable and does not allow gaps. I think it would be good to support this. It would also be good to support an iterable class where elements are computed one by one in order, which would be the most efficient way to decide "forall x in finenum, P(x)" statements: this would be like Rust's IntoIterator/Iterator, but also including a proof that every value in the type is returned at least once, then extended by a stronger interface that also guarantees that values are returned exactly once. But the equivalence-based class in the current pull request should be asymptotically optimal anyway, even if those interfaces are present, since some use cases might need it and not be able to use the other classes. BTW, it would also be good to have more general classes that support efficient encoding/decoding and partial enumeration of infinite countable types in addition to finite types, ideally also supporting encoding/decoding of an arbitrary countable subset in the case of non-countable types (this would be useful if one wants to just choose any N distinct elements computably); some classes expressing that Zero.zero encodes to 0, One.one encodes to 1, and Inhabited.default encodes to 0 would also be useful.
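For reference, one possible shape for the "every value returned at least once" interface sketched above, written as a Lean class. All names are invented, and a lazier, Rust-style variant would replace the indexing function with a stateful step function:

```lean
/-- Illustrative sketch of an enumeration-stream class: elements are produced
    in a fixed order, with a proof that every element of `α` appears at least
    once. A stronger variant would additionally require `next` to be
    injective, so that each element appears exactly once. Names are invented. -/
class FinStream (α : Type) where
  /-- Number of elements the stream produces. -/
  len : Nat
  /-- The element produced at step `i`. -/
  next : Fin len → α
  /-- Completeness: every value of `α` is produced at least once. -/
  complete : ∀ a : α, ∃ i, next i = a
```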
    
Co-authored-by: Kim Morrison <477956+kim-em@users.noreply.github.com>
WIP replacement for Mathlib's FinEnum class (Fin #1332).