-
-
Notifications
You must be signed in to change notification settings - Fork 367
add bytes dtype #3559
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
add bytes dtype #3559
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #3559 +/- ##
==========================================
- Coverage 61.89% 61.72% -0.18%
==========================================
Files 85 85
Lines 10153 10246 +93
==========================================
+ Hits 6284 6324 +40
- Misses 3869 3922 +53
🚀 New features to boost your workflow:
|
…ions for bytes / variable-length bytes; set up alias logic for variable-length bytes; parse_dtype now takes the bytes type as an alias for the bytes dtype
|
I am assuming that zarr-developers/zarr-extensions#38 will be merged soon, as it was approved by @normanrz. the The scope of this PR has broadened somewhat because we are effectively replacing one data type ( These two data types cannot co-exist in the data type registry, because they have the same zarr v2 JSON representation ( So I removed For any users who are committed to the This is not a breaking change for runtime functionality ( |
| if dtype_spec is bytes: | ||
| # Treat the bytes type as a request for the Bytes dtype | ||
| return Bytes() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
flagging this change -- parse_dtype(bytes, zarr_format = 3) will now return an instance of Bytes
| def enable_legacy_bytes_dtype() -> None: | ||
| """ | ||
| Unregister the new Bytes data type from the registry, and replace it with the | ||
| VariableLengthBytes dtype instead. Used for backwards compatibility. | ||
| """ | ||
| if ( | ||
| "bytes" in data_type_registry.contents | ||
| and "variable_length_bytes" not in data_type_registry.contents | ||
| ): | ||
| data_type_registry.unregister("bytes") | ||
| data_type_registry.register("variable_length_bytes", VariableLengthBytes) | ||
|
|
||
|
|
||
| def disable_legacy_bytes_dtype() -> None: | ||
| """ | ||
| Unregister the old VariableLengthBytes dtype from the registry, and replace it with | ||
| the new Bytes dtype. Used to reverse the effect of enable_legacy_bytes_dtype | ||
| """ | ||
| if ( | ||
| "variable_length_bytes" in data_type_registry.contents | ||
| and "bytes" not in data_type_registry.contents | ||
| ): | ||
| data_type_registry.unregister("variable_length_bytes") | ||
| data_type_registry.register("bytes", Bytes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these functions let users effectively disable the changes in this PR
| class DataTypeValidationError(ValueError): ... | ||
|
|
||
|
|
||
| class ScalarTypeValidationError(ValueError): ... | ||
|
|
||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these errors were moved to the main errors module
| class ScalarTypeValidationError(ValueError): ... | ||
|
|
||
|
|
||
| class DataTypeResolutionError(ValueError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a new error class that models problems encountered during data type resolution, i.e. the process of mapping an input to one of the available data types. This is different from data type validation, which we have defined to mean the process of mapping an input to a specific data type.
This PR adds a
Bytesdtype that is nearly identical to the existingVariableLengthBytesdtype save a few differences:"bytes"instead of"variable_length_bytes"Bytesis consistent with a published spec, instead of not being described by a spec.The latter point is the most important.
Because
Bytesis nearly identical toVariableLengthBytes, it could be a drop-in replacement forVariableLengthBytes, save for the JSON fill value encoding differences between the two codecs, which raises two questions:bytesdata type and thevariable_length_bytesdata types could be fused completely.Since zarr python 2.x was saving bytes fill values as base64-encoded strings, there's a bit of inertia there. Would also be good to hear thoughts from other implementers (@LDeakin , @jbms, @manzt )