Releases: AbsaOSS/atum-service
v0.5.1
v0.5.0
Breaking Changes ⚠️
- Dropped support for Spark 2.4 developed by @benedeki, @lsulak in #196
- Users must now run Atum Service on Spark versions later than 2.4. This change improves compatibility with newer Spark features and enhances stability.
New Features 🎉
- Created PATCH endpoint `/api/v2/partitionings/{partitioning_id}/parents` to patch the partitioning parents developed by @ABLL526, @lsulak, @benedeki in #273
- This enables users to update parent relationships of partitionings without full recreation, simplifying workflows and reducing overhead.
- Created GET endpoint `/api/v2/partitionings/{partitioning_id}/ancestors` that returns all ancestors, not just direct ones developed by @ABLL526, @lsulak, @benedeki in #305
- Users now have easy access to the full hierarchy of partitionings, supporting advanced lineage analysis and debugging.
- Created endpoints for Kubernetes developed by @salamonpavel in #354
- These endpoints make Kubernetes integration smoother, enabling better orchestration, monitoring, and automation.
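The difference between direct parents and the full ancestor chain returned by the new endpoint can be sketched with a toy parent mapping (the names and the mapping are hypothetical illustrations, not the server's schema):

```python
# Hypothetical child -> direct parents mapping; the endpoint walks it transitively.
parents = {"daily": ["monthly"], "monthly": ["yearly"], "yearly": []}

def ancestors(partitioning):
    """Return all ancestors (direct and indirect), breadth-first."""
    seen, queue = [], list(parents.get(partitioning, []))
    while queue:
        p = queue.pop(0)
        if p not in seen:
            seen.append(p)
            queue.extend(parents.get(p, []))
    return seen

print(ancestors("daily"))  # ['monthly', 'yearly'] - not just the direct parent
```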
Bugfixes 🛠
- Measures can be created with custom `measureName` developed by @salamonpavel, @benedeki in #342
- Missing support for private constructors for case classes in Scala 2.12 required measure classes to be refactored to standard classes.
- Restored protection when using custom measures
Technical ⚙️
- Refactored Server's http package developed by @salamonpavel in #332
- Refactored app structure with respect to API versioning developed by @salamonpavel in #335
- Moved ErrorMonad type bound in Reader classes to evidence based developed by @salamonpavel in #339
- Improved REST API test coverage to include ancestor and parent endpoints developed by @ABLL526 in #367
- Made Atum Service releasable under the new Maven central repository developed by @ABLL526 in #370
Full Changelog
v0.4.1
New Features 🎉
- 2 new control functions added: sum of truncated values developed by @ABLL526 in #314
- Created the `aggregatedTruncTotal` measure - only the whole part of the number is used in the control sum.
- Created the `absAggregatedTruncTotal` measure - only the absolute value of the whole part of the number is used in the control sum.
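Under the semantics described above, the two measures can be sketched in plain Python (an illustration only; the Agent computes them over Spark columns):

```python
import math

def aggregated_trunc_total(values):
    # Only the whole (truncated) part of each number enters the control sum.
    return sum(math.trunc(v) for v in values)

def abs_aggregated_trunc_total(values):
    # Only the absolute value of the whole part enters the control sum.
    return sum(abs(math.trunc(v)) for v in values)

data = [1.9, -2.7, 3.2]
print(aggregated_trunc_total(data))      # 1 + (-2) + 3 = 2
print(abs_aggregated_trunc_total(data))  # 1 + 2 + 3 = 6
```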
Bugfixes 🛠
- Fixed `has_more` flag computation in SQL functions that support pagination. This affects the associated flag in REST API response data too developed by @salamonpavel in #337
- Reader classes' signatures changed to receive MonadError instance as implicit evidence instead of type bound developed by @salamonpavel in #339
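A common way to compute such a `has_more` flag, shown here on a plain list as a sketch (the actual SQL functions may compute it differently):

```python
def fetch_page(records, limit, offset):
    # Read one row beyond the requested limit; its presence means more pages exist.
    window = records[offset:offset + limit + 1]
    return window[:limit], len(window) > limit

page, has_more = fetch_page(list(range(10)), limit=4, offset=0)
print(page, has_more)  # [0, 1, 2, 3] True
page, has_more = fetch_page(list(range(10)), limit=4, offset=8)
print(page, has_more)  # [8, 9] False
```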
Silent Live 🤫
- Endpoint `GET /partitionings/{partId}/parents` -> returns all ancestors of a partitioning developed by @ABLL526 in #305
Known issues ⚠️
- Measures can be created with custom `measureName` which saves the broken data, causing eventual problems in #342
- Dependency shading might be needed when using the Agent in a Spark environment (especially when some Hadoop dependencies are in use as well, for example if you package the application that contains the Agent and use such a JAR in the Spark Submit command). Issue in #343. Here's the suggested project code snippet for Maven:
```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>${maven.shade.plugin.version}</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
    </execution>
  </executions>
  <configuration>
    <relocations>
      <relocation>
        <pattern>okhttp3</pattern>
        <shadedPattern>shaded.okhttp3</shadedPattern>
      </relocation>
      <relocation>
        <pattern>okio</pattern>
        <shadedPattern>shaded.okio</shadedPattern>
      </relocation>
      <relocation>
        <pattern>sttp</pattern>
        <shadedPattern>shaded.sttp</shadedPattern>
      </relocation>
      <relocation>
        <pattern>cats</pattern>
        <shadedPattern>shaded.cats</shadedPattern>
      </relocation>
      <relocation>
        <pattern>shapeless</pattern>
        <shadedPattern>shaded.shapeless</shadedPattern>
      </relocation>
      <relocation>
        <pattern>kotlin</pattern>
        <shadedPattern>shaded.kotlin</shadedPattern>
      </relocation>
    </relocations>
    <filters>
      <filter>
        <artifact>*:*</artifact>
        <excludes>
          <exclude>META-INF/*.SF</exclude>
          <exclude>META-INF/*.DSA</exclude>
          <exclude>META-INF/*.RSA</exclude>
        </excludes>
      </filter>
    </filters>
  </configuration>
</plugin>
```
Full Changelog
v0.4.0
New Features 🎉
- Created a module named Reader that reads information stored on the server in #243
- Added the ability to query REST endpoints from the Reader module developed by @benedeki co-authored by @lsulak in #245
- Implement basics of `PartitioningReader`, a class to read Partitioning data developed by @benedeki in #246
- Implement basics of `FlowReader`, a class to read Flow data developed by @benedeki in #247
- GetFlowCheckpoints endpoint refactored to reverse the order of data returned and to include partitioning data developed by @salamonpavel, @benedeki in #303
- Change table `runs.checkpoints` to set column `measured_by_atum_agent` to have `NOT NULL` constraint with `DEFAULT FALSE` developed by @ABLL526 in #242
- Server now includes AWS Sts dependency by @salamonpavel in #295
- Several endpoints' Swagger documentation has been published (the endpoints had existed before, but their usage was discouraged because of the chance of change)
Known Issues ⚠️
- Dependency shading might be needed when using the Agent in a Spark environment (especially when some Hadoop dependencies are in use as well, for example if you package the application that contains the Agent and use such a JAR in the Spark Submit command). Issue in #343. Here's the suggested project code snippet for Maven:
```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>${maven.shade.plugin.version}</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
    </execution>
  </executions>
  <configuration>
    <relocations>
      <relocation>
        <pattern>okhttp3</pattern>
        <shadedPattern>shaded.okhttp3</shadedPattern>
      </relocation>
      <relocation>
        <pattern>okio</pattern>
        <shadedPattern>shaded.okio</shadedPattern>
      </relocation>
      <relocation>
        <pattern>sttp</pattern>
        <shadedPattern>shaded.sttp</shadedPattern>
      </relocation>
      <relocation>
        <pattern>cats</pattern>
        <shadedPattern>shaded.cats</shadedPattern>
      </relocation>
      <relocation>
        <pattern>shapeless</pattern>
        <shadedPattern>shaded.shapeless</shadedPattern>
      </relocation>
      <relocation>
        <pattern>kotlin</pattern>
        <shadedPattern>shaded.kotlin</shadedPattern>
      </relocation>
    </relocations>
    <filters>
      <filter>
        <artifact>*:*</artifact>
        <excludes>
          <exclude>META-INF/*.SF</exclude>
          <exclude>META-INF/*.DSA</exclude>
          <exclude>META-INF/*.RSA</exclude>
        </excludes>
      </filter>
    </filters>
  </configuration>
</plugin>
```
Full Changelog
v0.3.0
Breaking Changes 💥
- Additional data methods of `AtumContext` use REST API v2 (incompatibility of Agent 0.3.0+ with server 0.2.0) by @lsulak in #283
- Full Flyway integration developed by @benedeki in #276
New Features 🎉
- Atum server REST API v2 developed by @salamonpavel, @TebaleloS, @lsulak, @benedeki in #140
- Introduced response envelopes providing additional metadata (requestId) for REST API v2 endpoints by @salamonpavel in #197
- Replaced Json4s and Jackson serialization libraries with Circe by @TebaleloS, @salamonpavel, @benedeki in #214
- Introduced health API endpoint in the form StatusBoard projects expect by @salamonpavel in #282
- Dockerfile and application configuration verified for deployment with ZIO and Http4s web server by @salamonpavel in #274
- Dockerfile adjusted to the ZIO framework; custom configuration is now passed during docker run, i.e. independent of the sbt build and docker build by @lsulak in #279
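The response envelopes mentioned above wrap endpoint payloads together with request metadata; a minimal sketch of the idea, with illustrative field names that are not necessarily the server's exact schema:

```python
import uuid

# Hypothetical sketch of a v2 response envelope carrying a requestId;
# the field names here are illustrative, not the server's exact schema.
def envelope(data):
    return {"requestId": str(uuid.uuid4()), "data": data}

resp = envelope({"id": 1, "name": "my-partitioning"})
print(sorted(resp))  # ['data', 'requestId']
```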
Silent Live 🤫
- Introduced the Reader module to make reading of information stored in the Atum server easy by @benedeki in #248 (not published yet, only in the code-base)
- Atum server REST API v2 endpoints developed by @salamonpavel, @TebaleloS, @lsulak, @benedeki in #140
- There are numerous other endpoints implemented besides those mentioned above. We still discourage their usage, though, as they are subject to change, particularly their payloads.
Known Issues ⚠️
- Dependency shading might be needed when using the Agent in a Spark environment (especially when some Hadoop dependencies are in use as well, for example if you package the application that contains the Agent and use such a JAR in the Spark Submit command). Here's the suggested project code snippet for Maven:
```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>${maven.shade.plugin.version}</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
    </execution>
  </executions>
  <configuration>
    <relocations>
      <relocation>
        <pattern>okhttp3</pattern>
        <shadedPattern>shaded.okhttp3</shadedPattern>
      </relocation>
      <relocation>
        <pattern>okio</pattern>
        <shadedPattern>shaded.okio</shadedPattern>
      </relocation>
      <relocation>
        <pattern>sttp</pattern>
        <shadedPattern>shaded.sttp</shadedPattern>
      </relocation>
      <relocation>
        <pattern>cats</pattern>
        <shadedPattern>shaded.cats</shadedPattern>
      </relocation>
      <relocation>
        <pattern>shapeless</pattern>
        <shadedPattern>shaded.shapeless</shadedPattern>
      </relocation>
      <relocation>
        <pattern>kotlin</pattern>
        <shadedPattern>shaded.kotlin</shadedPattern>
      </relocation>
    </relocations>
    <filters>
      <filter>
        <artifact>*:*</artifact>
        <excludes>
          <exclude>META-INF/*.SF</exclude>
          <exclude>META-INF/*.DSA</exclude>
          <exclude>META-INF/*.RSA</exclude>
        </excludes>
      </filter>
    </filters>
  </configuration>
</plugin>
```
Full Changelog
v0.1.1
This version fixes the configuration of the application in the Dockerized environment.
Bugfixes 🛠
- Wrong format of the `application.properties` file by @lsulak in #130
- Renaming the `application.properties` file to be just a template and not the real one; the docker image MUST provide it by @lsulak in #131
- Bugfix: remove hardcoded `application.properties` references from the code and add the ability to use config from the `SPRING_CONFIG_LOCATION` env var by @lsulak in #132
Full Changelog: v0.1.0...v0.1.1
v0.2.0
Breaking Changes 💥
- Dropped support of Spark 2.4 by @benedeki, @lsulak, @salamonpavel in #193
- Server moved from Spring to Zio/Tapir by @salamonpavel in #145
- As the application now has the Http4s Blaze server backend included, there is no need for any servlet container like Tomcat.
- The application is packaged as a JAR file and run directly using `java -jar`.
- Server requires Java 11 platform by @salamonpavel in #151
- The groupId of the libraries changed from `za.co.absa` to `za.co.absa.atum-service`
New Features 🎉
- Flows can now be identified by their "main partitioning" - the partitioning they were created from - by @benedeki in #178
- Implemented monitoring of Atum server's runtime and of http server's communication. by @salamonpavel, @benedeki in #166
- Database functions (API) to get checkpoints of a partitioning or flow by @lsulak in #187 and @TebaleloS, @benedeki in #189
- To improve testability of the agent, the `AtumAgent` class was refactored and `CapturingDispatcher` (in-memory storage of server requests) was added by @filiphornak, @benedeki in #97
- Integration tests defined and distinguished from unit tests and added to CI/CD by @miroslavpojer in #185
- Partitioning is now checked to be in expected JSON format upon write to DB by @lsulak, @benedeki in #69
- DB login credentials are read from AWS Secrets Manager by @TebaleloS, @lsulak in #107
- Using the Fa-Db library with Doobie as engine instead of Slick by @salamonpavel in #148
- Atum server is now built using Scala 2.13 by @salamonpavel in #149
- Ability to save and retrieve Additional data (additional metadata) with `AtumContext` by @benedeki, @lsulak, @salamonpavel, @TebaleloS in #36
- Measures now require 0-n columns in their definition instead of exactly one (depending on the function nature) by @salamonpavel in #100
- `AtumContext` content is now properly read from Atum Server by @benedeki, @lsulak in #59
Bugfixes 🛠
- A request for `AtumContext` containing a custom/unknown measure will not fail anymore by @salamonpavel in #170
- Sbt cross-build fixed by @benedeki, @salamonpavel in #184
Full Changelog
v0.1.0
Initial release of the Atum service
Server
- has two endpoints:
- `/api/v1/createPartitioning` to register or retrieve a partitioning and optionally establish a relation with another partitioning
- `/api/v1/createCheckpoint` to record measurement data
- connects to Postgres DB that stores the data
- a newly created partitioning automatically contains the count function to measure
Agent
- spawn context based on key provided (partitioning)
- add measuring functions; supported now are:
- count
- distinctCount
- aggregatedTotal - sum of values in the column
- absAggregatedTotal
- hashCrc32
- provides interfaces to measure data completeness on DataFrames (create checkpoints)
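The measuring functions listed above can be sketched over a single column of plain Python values (an illustration only; the Agent computes them on Spark DataFrames, and its exact hashing scheme may differ):

```python
import zlib

def count(col): return len(col)
def distinct_count(col): return len(set(col))
def aggregated_total(col): return sum(col)                    # sum of values
def abs_aggregated_total(col): return sum(abs(v) for v in col)  # sum of absolute values
def hash_crc32(col):
    # One plausible reading of hashCrc32: sum of CRC32 checksums of the
    # stringified values (hypothetical - not necessarily the Agent's scheme).
    return sum(zlib.crc32(str(v).encode()) for v in col)

col = [3, -1, 3]
print(count(col), distinct_count(col), aggregated_total(col), abs_aggregated_total(col))
```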
Database
- created, including DB Roles and an ownership model of the database objects for the Roles
- stores and processes data related to:
- Partitioning
- Additional Data
- Measurement
- Measure Definition
- Checkpoint
- Flow - a concept of how to describe the data as they go through the systems and how different partitionings relate.