### Title

Quantities with Array API support, Improved Support for Masks and Uncertainties

### Project Team

Marten van Kerkwijk

### Project Description / Scope of Work

I request continued partial buy-out from my professorship at UofT so that I can
work one day a week on projects that are too large for the time I can
otherwise commit to astropy. Specifically, I propose to:

- Facilitate Quantity becoming a container class that can handle not just
  ndarray but any type of array, e.g., also Dask and JAX arrays.
- Ensure Quantity is fully compliant with the new Quantity API being developed.
- Extend the same machinery to Masked and Distribution, so that all main astropy
  classes can use arbitrary array classes.
- Also extend the machinery to the internal arrays used by Time.
- Speed up unit conversion, and thus all of astropy, through smarter conversion
  functions and caching.
- Finish my implementation of a Variable class that tracks uncertainties and
  their correlations analytically (based on the uncertainties package).

#### Roadmap Items

I split these into direct goals of my work and items that it will enable.
Note that my goal of adding Quantity support for non-NumPy arrays includes
support for JAX and Dask arrays, and would thus meet a major requirement for
astropy as a whole to support those.

Directly addressed:

- :green_circle: Add quantity support for non-NumPy arrays.

- :large_orange_diamond: Improve interoperability between unit packages (e.g.,
  `astropy.units`, `pint`, `unyt`).

Provides a major requirement for:

- :red_square: Support JIT compilation (e.g., numba, JAX, etc.) throughout
  Astropy core and coordinated packages.

- :large_orange_diamond: Improve and/or maintain interoperability with
  performant I/O file formats and libraries such as HDF5 and Dask.

#### Project / Work / Deliverables

Prior to cycle 4, I spent about a day per week on astropy core, on reviews,
bug fixes, and development. I managed to use extra time for fairly large
developments (Quantity historically, and Masked and Uncertainty more recently,
along with fairly major contributions to Time, Table, Representation, and
numpy), but it was difficult to find enough time to actually wrap up larger
projects (at least outside sabbaticals). This changed with cycle 4 funding,
and a major part of this request is to complete some of the main parts of the
project I proposed for that cycle.

In particular, in the current cycle I have started to develop Quantity 2.0.
As proposed in [APE 25](https://github.com/astropy/astropy-APEs/pull/91), it
follows the [Array API](https://data-apis.org/array-api/), ensuring that the
new Quantity class will work with any array that supports that API. This
includes the array types that matter most, such as Dask for large, disk-based
data sets and JAX for GPU acceleration. There is a
[prototype](https://github.com/astropy/quantity-2.0), which already supports a
large part of the Array API (basically, the functions provided by numpy ufuncs)
for JAX and Dask. The work has been delayed a little, but for a good reason:
during this period, serious discussions started between the various units
packages about a shared [Quantity API](https://github.com/quantity-dev), which
we would of course want to follow.
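
To illustrate what following the Array API means for a container class, here is a minimal sketch, not the prototype's actual code. The container stores a value and a unit, and obtains the functions it computes with from the value itself, through the standard's `__array_namespace__()` method (which NumPy ≥ 2.0 arrays implement, for instance). The class `SketchQuantity`, the helper `sqrt`, and the use of plain strings as units are hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class SketchQuantity:
    """Hypothetical container pairing any Array API array with a unit."""

    value: Any  # any array implementing __array_namespace__()
    unit: str   # plain string, for illustration only

    @property
    def xp(self):
        # Each Array API array reports its own namespace, so the container
        # never needs to import numpy, dask, or jax itself.
        return self.value.__array_namespace__()

    def __add__(self, other):
        # In this toy version, addition requires identical units.
        if other.unit != self.unit:
            raise ValueError(f"cannot add {other.unit} to {self.unit}")
        return SketchQuantity(self.xp.add(self.value, other.value), self.unit)

    def __mul__(self, other):
        # The array library multiplies the values; units are concatenated.
        return SketchQuantity(self.xp.multiply(self.value, other.value),
                              f"{self.unit} {other.unit}")


def sqrt(q):
    # The square root is taken with the namespace the value belongs to.
    return SketchQuantity(q.xp.sqrt(q.value), f"({q.unit})**(1/2)")
```

With NumPy ≥ 2.0, for example, `sqrt(SketchQuantity(np.asarray([4.0, 9.0]), "m"))` holds the values `[2.0, 3.0]`; the same code would be expected to work unchanged for any other array type that implements the standard, which is what makes the container approach attractive.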

The primary goal of my proposal here is to finish the implementation, make it
compatible with the new Quantity API, ensure there are no performance
regressions, and, of course, document it all.

A nice benefit of the approach laid out in
[APE 25](https://github.com/astropy/astropy-APEs/pull/91) is that it will be
very easy to extend it to Masked and Distribution (and possibly Variable), as
those are already essentially the type of container classes that APE 25
envisions.

Furthermore, a direct benefit of Quantity being able to use array types other
than ndarray is that this will extend nearly automatically to coordinates,
since those use quantities almost exclusively (I foresee little more work
than adjusting tests!). Time will be slightly more work, as it works directly
with ndarray, but here too the path is straightforward: I can follow my
earlier work on ensuring Time can work with Masked.

Most of the above would benefit applications of astropy to large arrays, by
allowing disk-based ones and analysis on GPUs. But astropy is often used on
small arrays too, and while reviewing our own Quantity code, as well as the
ndarray code it relies on, I realized there are a number of ways in which we
can improve the performance of Quantity and Unit operations for scalars and
small arrays, mostly by reducing overhead. Some initial PRs on the numpy side
add a [fast path for scalars](https://github.com/numpy/numpy/pull/29819) and
[include array storage in the object](https://github.com/numpy/numpy/pull/29878).
On the Quantity side proper, I have a skeleton of code that would make unit
conversion substantially faster, especially if combined with caching. This
would again mostly benefit small arrays. For larger ones, too, I see a nice
path forward: the new dtype machinery of numpy provides a way to do the
scaling needed for unit conversion as part of an operation, thus avoiding the
need to create large temporary arrays.
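
Why caching helps for scalars and small arrays can be sketched with a toy model: the expensive part of a conversion is working out the scale factor between two (possibly composite) units, while applying it is a single multiplication. In the sketch below, which is an illustration only and not astropy's unit machinery, the factor for each unit pair is memoized with `functools.lru_cache`, so repeated conversions skip the decomposition. The `_BASE` table, `_decompose`, `conversion_factor`, and `convert` are hypothetical names, and only units of the form `a` or `a/b` are handled.

```python
from functools import lru_cache

# Hypothetical unit table: unit name -> (scale, base unit).
_BASE = {
    "m": (1.0, "m"),
    "km": (1000.0, "m"),
    "cm": (0.01, "m"),
    "s": (1.0, "s"),
    "h": (3600.0, "s"),
}


def _decompose(unit: str):
    """Return (scale, base) for a unit like 'km' or 'km/h' (at most one '/')."""
    num, _, den = unit.partition("/")
    scale, base = _BASE[num]
    if den:
        den_scale, den_base = _BASE[den]
        return scale / den_scale, f"{base}/{den_base}"
    return scale, base


@lru_cache(maxsize=None)
def conversion_factor(from_unit: str, to_unit: str) -> float:
    """Scale factor from from_unit to to_unit, computed once per unit pair."""
    from_scale, from_base = _decompose(from_unit)
    to_scale, to_base = _decompose(to_unit)
    if from_base != to_base:
        raise ValueError(f"{from_unit} and {to_unit} are not convertible")
    return from_scale / to_scale


def convert(value, from_unit: str, to_unit: str):
    # Fast path: identical units need no work at all.
    if from_unit == to_unit:
        return value
    # Only multiplication is used, so value may be a float or an array.
    return value * conversion_factor(from_unit, to_unit)
```

For example, `convert(36.0, "km/h", "m/s")` gives approximately `10.0`; a second call for the same unit pair is served from the cache, which `conversion_factor.cache_info()` reports as a hit.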

Finally, as an undergraduate I was taught that a number without a unit or an
uncertainty is meaningless. Quantity provides the former, and Distribution
provides a Monte Carlo-like method for the latter. But often we would simply
like to have analytic error propagation, including covariances. More than a
decade ago, I made a [PR](https://github.com/astropy/astropy/pull/3715) to
introduce a Variable class that tracks uncertainties and covariances (based on
the [uncertainties package](https://pythonhosted.org/uncertainties/), but
extended to deal natively with arrays). This has been stalled since, but I
believe it would still be very useful. A stretch goal of the current proposal
is to finally finish it.
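
The idea of tracking uncertainties together with their correlations can be illustrated with first-order (linear) error propagation: each value records how it depends on independent error sources, so correlations are kept automatically. The sketch below follows the spirit of the uncertainties package, but `LinearVar` and `covariance` are hypothetical names; it handles only scalars and three operations, and it is not the Variable class from the PR.

```python
import math
from itertools import count

# Hypothetical counter giving each independent error source a unique id.
_source_ids = count()


class LinearVar:
    """Toy first-order error propagation that keeps track of correlations."""

    def __init__(self, value, components):
        self.value = value
        # components[i] = (partial derivative w.r.t. source i) * sigma_i
        self.components = components

    @classmethod
    def independent(cls, value, sigma):
        # A new, uncorrelated source of uncertainty.
        return cls(value, {next(_source_ids): sigma})

    def _combine(self, other, value, d_self, d_other):
        # Chain rule to first order: components add, weighted by the partials.
        comps = {k: d_self * c for k, c in self.components.items()}
        for k, c in other.components.items():
            comps[k] = comps.get(k, 0.0) + d_other * c
        return LinearVar(value, comps)

    def __add__(self, other):
        return self._combine(other, self.value + other.value, 1.0, 1.0)

    def __sub__(self, other):
        return self._combine(other, self.value - other.value, 1.0, -1.0)

    def __mul__(self, other):
        return self._combine(other, self.value * other.value,
                             other.value, self.value)

    @property
    def std(self):
        return math.sqrt(sum(c * c for c in self.components.values()))


def covariance(a, b):
    # Only error sources shared by a and b contribute.
    return sum(c * b.components.get(k, 0.0) for k, c in a.components.items())
```

With `x = LinearVar.independent(10.0, 0.3)`, the difference `(x - x).std` is exactly zero, whereas treating the two operands as independent would wrongly give √2 × 0.3; likewise, `x + y` for an independent `y` is correctly correlated with `x`, with `covariance(x, x + y)` equal to 0.3².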

### Approximate Budget

I request funding to replace salary equivalent to one day a week, reducing my
regular employment at the University of Toronto correspondingly. At a
standard rate of USD 150/hour for 8 hours per week and 45 weeks, this
corresponds to USD 54,000 per year.
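
For reference, the yearly figure follows directly from the stated rate, hours, and weeks:

```math
150\ \frac{\text{USD}}{\text{h}} \times 8\ \frac{\text{h}}{\text{week}} \times 45\ \text{weeks} = 54\,000\ \text{USD}
```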

### Period of Performance

Ideally, I would be covered until June 2027, which is the end of an academic
year.

I note that the funding provides me with teaching relief, which applies to one
semester of each academic year (during that semester I spend more than one
day a week on astropy, and correspondingly less when I teach). So far, the
relevant semesters have been July to December 2024 and January to June 2026.
It may be possible to arrange for the next one to be July to December 2026,
so that most of the work is finished in 2026.
