-
-
Notifications
You must be signed in to change notification settings - Fork 368
feat: bounded size auto sharding via zarr.config
#3574
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: bounded size auto sharding via zarr.config
#3574
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3574 +/- ##
==========================================
+ Coverage 61.86% 61.90% +0.03%
==========================================
Files 85 85
Lines 10137 10150 +13
==========================================
+ Hits 6271 6283 +12
- Misses 3866 3867 +1
🚀 New features to boost your workflow:
|
|
Not sure I fully grasp what is going on with codecov, it looks like the tests run and assert the things they should and use the funcionality promised. |
|
In SpatialData we could also have a use case for this. Many users are not as experienced with the whole concept of sharding and chunking, so having the option to provide a maximum size with some explanation in documentation would be very beneficial. |
|
I was curious how @jhamman Is this true? Would |
|
@ilan-gold I think that is correct at the moment. See also pydata/xarray#9947. In dask I am working on having the interface just provide a |
tests/test_config.py
Outdated
| "array": { | ||
| "order": "C", | ||
| "write_empty_chunks": False, | ||
| "max_bytes_per_shard_for_auto_sharding": None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we find a denser name for this parameter? Something like max_shard_size_bytes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also perhaps something more accurate? Like target_shard_size_bytes? It's not meant as a bound but a target
d-v-b
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 small requests but otherwise this looks great, thanks @ilan-gold
…old/zarr-python into ig/chunk_grid_auto_config_setting
|
@d-v-b What is the take on the fact that this is pre-compression? Something I just realized messing around with this. Should we document that more clearly? |
Maybe just indicate in the docs that this setting governs the in-memory size? Also, how will this handle data types that don't have a computable size per element (like variable-length bytes)? |
closes #3568
TODO:
docs/user-guide/*.mdchanges/