DataMate is an enterprise-level data processing platform for model fine-tuning and retrieval-augmented generation (RAG). It supports data collection, data management, an operator marketplace, data cleaning, data synthesis, data annotation, data evaluation, and knowledge generation.
If you like this project, please give it a Star⭐️!
- Core Modules: Data Collection, Data Management, Operator Marketplace, Data Cleaning, Data Synthesis, Data Annotation, Data Evaluation, Knowledge Generation.
- Visual Orchestration: Drag-and-drop data processing workflow design.
- Operator Ecosystem: Rich built-in operators and support for custom operators.
Prerequisites:

- Git (for pulling source code)
- Make (for building and installing)
- Docker (for building images and deploying services)
- Docker-Compose (for service deployment - Docker method)
- Kubernetes (for service deployment - k8s method)
- Helm (for service deployment - k8s method)
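
Before installing, it can help to confirm that the tools listed above are actually on your `PATH`. The following is a minimal sketch, not part of DataMate itself: the function name `missing_tools` is hypothetical, and the binary names you pass to it (for example `docker-compose` or `kubectl`) are assumptions that may differ on your system.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every given command
# that cannot be found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        # `command -v` exits non-zero when the command cannot be resolved.
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}
```

For a Docker-based deployment you might call `missing_tools git make docker docker-compose`; for Kubernetes, `missing_tools git make docker kubectl helm`. Empty output means every tool was found.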
Clone the repository:

    git clone git@github.com:ModelEngine-Group/DataMate.git
    cd DataMate

Deploy the basic services:

    make install

This project supports two deployment methods: docker-compose and helm. After running the command, enter the number of the deployment method you want at the prompt. The command prints the following:

    Choose a deployment method:
    1. Docker/Docker-Compose
    2. Kubernetes/Helm
    Enter choice:

Build and install MinerU:

    make build-mineru
    make install-mineru

Deploy DeerFlow:

- Modify `runtime/deer-flow/.env.example` and add configurations for SEARCH_API_KEY and the EMBEDDING model.
- Modify `runtime/deer-flow/.conf.yaml.example` and add basic model service configurations.
- Execute `make install-deer-flow`.
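
Because the DeerFlow step depends on values you add by hand, a quick check before running `make install-deer-flow` can catch a forgotten key. The sketch below assumes the env file uses plain `KEY=value` lines (the usual dotenv convention, not confirmed by this README); the function name `has_env_key` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if FILE contains an uncommented line
# of the form KEY=<something> (optionally prefixed with `export`).
# Assumes dotenv-style KEY=value syntax. A line such as KEY= with
# nothing after the equals sign does not count as set.
has_env_key() {
    file=$1
    key=$2
    grep -Eq "^[[:space:]]*(export[[:space:]]+)?${key}=.+" "$file"
}
```

For example, `has_env_key runtime/deer-flow/.env.example SEARCH_API_KEY || echo "SEARCH_API_KEY is not set" >&2` prints a warning when the key is missing or empty. The same check could be repeated for whichever embedding-related keys your configuration uses.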
After modifying the local code, please execute the following commands to build the image and deploy using the local image.

    make build
    make install REGISTRY=""

Thank you for your interest in this project! We warmly welcome contributions from the community. Whether it's submitting bug reports, suggesting new features, or directly participating in code development, all forms of help make the project better.
• 📮 GitHub Issues: Submit bugs or feature suggestions.
• 🔧 GitHub Pull Requests: Contribute code improvements.
DataMate is open source under the MIT license. You are free to use, modify, and distribute the code of this project in compliance with the license terms.