DPK Alliance

This project simplifies and extends the Data Prep Kit project to better support AI Alliance needs, especially those of the Open Trusted Data Initiative.

Data Prep Kit is a community project to democratize and accelerate unstructured data preparation for LLM app developers. With the explosive growth of LLM-enabled use cases, developers face the enormous challenge of preparing use case-specific unstructured data to fine-tune or instruct-tune LLMs, or to build RAG applications. As the variety of use cases grows, so does the need to support:

New ways of transforming the data to enhance the performance of the resulting LLMs for each specific use case.
A wide range of data scales, from laptop-scale to datacenter-scale.
Different data modalities, including language, code, vision, and multimodal data.

For more information about the implementation architecture, please read this blog post.

The main components of the framework are:

transforms - base classes for creating transforms
runtime - the runtime implementation responsible for starting and executing transforms
data access - an extensible, configurable data access implementation that covers local files, S3, and Hugging Face datasets
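
To make the split between the three components concrete, here is a minimal sketch of how they could fit together. It is an illustration only, not the project's actual API: every name in it (`Transform`, `DataAccess`, `InMemoryDataAccess`, `LowercaseTransform`, `run`) is hypothetical, and the in-memory store stands in for a real backend such as local files or S3.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any


class Transform(ABC):
    """Hypothetical base class: maps one input document to zero or more outputs."""

    @abstractmethod
    def transform(self, name: str, data: bytes) -> tuple[list[tuple[str, bytes]], dict[str, Any]]:
        """Return (list of (output name, output bytes), metadata counters)."""


class DataAccess(ABC):
    """Hypothetical storage abstraction; real backends would subclass it."""

    @abstractmethod
    def list_inputs(self) -> list[str]: ...

    @abstractmethod
    def read(self, name: str) -> bytes: ...

    @abstractmethod
    def write(self, name: str, data: bytes) -> None: ...


class InMemoryDataAccess(DataAccess):
    """Stand-in backend that keeps inputs and outputs in dictionaries."""

    def __init__(self, inputs: dict[str, bytes]) -> None:
        self.inputs = dict(inputs)
        self.outputs: dict[str, bytes] = {}

    def list_inputs(self) -> list[str]:
        return sorted(self.inputs)

    def read(self, name: str) -> bytes:
        return self.inputs[name]

    def write(self, name: str, data: bytes) -> None:
        self.outputs[name] = data


class LowercaseTransform(Transform):
    """Toy transform: lowercases UTF-8 text and reports one processed document."""

    def transform(self, name, data):
        text = data.decode("utf-8").lower()
        return [(name, text.encode("utf-8"))], {"documents": 1}


def run(transform: Transform, access: DataAccess) -> dict[str, int]:
    """Hypothetical sequential runtime: read, transform, write, sum the metadata."""
    stats: dict[str, int] = {}
    for name in access.list_inputs():
        outputs, metadata = transform.transform(name, access.read(name))
        for out_name, out_data in outputs:
            access.write(out_name, out_data)
        for key, value in metadata.items():
            stats[key] = stats.get(key, 0) + value
    return stats


access = InMemoryDataAccess({"a.txt": b"Hello", "b.txt": b"WORLD"})
stats = run(LowercaseTransform(), access)
```

In this sketch the transform never touches storage and the runtime never inspects the data, so swapping the backend or the execution strategy would leave transform code unchanged.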

We also provide several examples of framework usage:

The project's code was extensively tested using the transform testing framework.
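
As a rough illustration of fixture-based transform testing, the sketch below runs a toy transform over input/expected pairs and collects the cases that fail. It does not reproduce the project's testing framework; `strip_whitespace`, `check_transform`, and the fixture names are hypothetical.

```python
from typing import Callable


def strip_whitespace(data: bytes) -> bytes:
    """Toy transform under test: trims leading and trailing whitespace."""
    return data.strip()


def check_transform(
    transform: Callable[[bytes], bytes],
    fixtures: dict[str, tuple[bytes, bytes]],
) -> list[str]:
    """Return the names of fixtures whose output differs from the expected bytes."""
    failures = []
    for name, (given, expected) in fixtures.items():
        if transform(given) != expected:
            failures.append(name)
    return failures


# Each fixture pairs an input with the output the transform should produce.
fixtures = {
    "padded": (b"  hello  ", b"hello"),
    "clean": (b"world", b"world"),
    "newline": (b"line\n", b"line"),
}

failures = check_transform(strip_whitespace, fixtures)
```

Keeping the fixtures as plain data lets the same checker be reused for any transform with a matching signature.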

