Parquet string serialization breaks conversion to numpy #3696

@DylanModesitt

Version of Awkward Array

2.8.9

Description and code to reproduce

import awkward as ak
import numpy as np

a = ak.Array([{"foo": "bar"}, {"foo": "baz"}])
b = np.asarray(a["foo"])  # works on the in-memory array

ak.to_parquet(a, "test.parquet")
x = ak.from_parquet("test.parquet")
y = np.asarray(x["foo"])  # !!! raises the KeyError below

Results in

  File "/Users/dcm/Developer/awkward/bugex.py", line 15, in main
    y = np.asarray(x["foo"])
  File "/Users/dcm/Developer/awkward/src/awkward/highlevel.py", line 1548, in __array__
    with ak._errors.OperationErrorContext(
         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        "numpy.asarray", (self,), {"dtype": dtype, "copy": copy}
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ):
    ^
  File "/Users/dcm/Developer/awkward/src/awkward/_errors.py", line 80, in __exit__
    raise self.decorate_exception(exception_type, exception_value)
  File "/Users/dcm/Developer/awkward/src/awkward/highlevel.py", line 1553, in __array__
    return convert_to_array(self._layout, dtype=dtype, copy=copy)
  File "/Users/dcm/Developer/awkward/src/awkward/_connect/numpy.py", line 511, in convert_to_array
    out = ak.operations.to_numpy(layout, allow_missing=False)
  File "/Users/dcm/Developer/awkward/src/awkward/_dispatch.py", line 67, in dispatch
    next(gen_or_result)
    ~~~~^^^^^^^^^^^^^^^
  File "/Users/dcm/Developer/awkward/src/awkward/operations/ak_to_numpy.py", line 48, in to_numpy
    return _impl(array, allow_missing)
  File "/Users/dcm/Developer/awkward/src/awkward/operations/ak_to_numpy.py", line 60, in _impl
    return numpy_layout.to_backend_array(allow_missing=allow_missing)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dcm/Developer/awkward/src/awkward/contents/content.py", line 1126, in to_backend_array
    return self._to_backend_array(allow_missing, backend)
           ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dcm/Developer/awkward/src/awkward/contents/listoffsetarray.py", line 2064, in _to_backend_array
    backend[
    ~~~~~~~^
        "awkward_NumpyArray_prepare_utf8_to_utf32_padded",
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<2 lines>...
        _max_code_points.dtype.type,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ](
    ^
  File "/Users/dcm/Developer/awkward/src/awkward/_backends/numpy.py", line 32, in __getitem__
    return NumpyKernel(awkward_cpp.cpu_kernels.kernel[index], index)
                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^
KeyError: ('awkward_NumpyArray_prepare_utf8_to_utf32_padded', <class 'numpy.uint8'>, <class 'numpy.int32'>, <class 'numpy.int64'>)

This error occurred while calling

    numpy.asarray(
        <Array ['bar', 'baz'] type='2 * string'>
        dtype = None
        copy = None
    )

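The lookup fails because there is no `awkward_NumpyArray_prepare_utf8_to_utf32_padded` specialization registered for the (uint8, int32, int64) signature shown in the KeyError; the array read back from Parquet appears to carry int32 offsets. I think we just need to add kernel specializations for int32/uint32 offsets. I should have a PR shortly.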
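For illustration, here is a NumPy-only sketch of what I understand the kernel's job to be: take UTF-8 bytes plus list offsets and produce a fixed-width (UTF-32) NumPy string array, padded to the longest string. `utf8_to_utf32_padded` is a hypothetical helper written for this issue, not Awkward API, and it is not how the compiled kernel is implemented. It only shows that the conversion does not logically depend on the offsets dtype, which suggests the failure comes from the missing dtype signature rather than from the data.

```python
import numpy as np


def utf8_to_utf32_padded(content, offsets):
    """Hypothetical stand-in for the compiled kernel (illustration only).

    content: 1-D uint8 buffer holding the concatenated UTF-8 bytes.
    offsets: 1-D integer array of length n + 1 delimiting n strings.
    """
    content = np.asarray(content, dtype=np.uint8)
    offsets = np.asarray(offsets)  # any integer dtype: int32, uint32, int64

    # Decode each [start, stop) byte range into a Python str.
    strings = [
        content[int(start):int(stop)].tobytes().decode("utf-8")
        for start, stop in zip(offsets[:-1], offsets[1:])
    ]

    # Pad to the longest string, measured in code points rather than bytes.
    max_code_points = max((len(s) for s in strings), default=0)
    return np.array(strings, dtype=f"<U{max(max_code_points, 1)}")


# Same bytes, three different offsets dtypes.
content = np.frombuffer(b"barbaz", dtype=np.uint8)
for offsets_dtype in (np.int32, np.uint32, np.int64):
    offsets = np.array([0, 3, 6], dtype=offsets_dtype)
    out = utf8_to_utf32_padded(content, offsets)
    print(offsets_dtype.__name__, out.dtype, out.tolist())
```

All three offsets dtypes go through the same code path here. In the real library each (content, offsets, max code points) dtype combination needs its own registered specialization, which is where the int32 case falls through.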

Labels: bug (the problem described is something that must be fixed)
