Restructure project as monorepo. #2111
base: v3/main
Pull Request Overview
This PR restructures the codebase into a monorepo by extracting the Factory pattern into a separate package (graphrag-factory) and updating all import references. The Factory class is enhanced with singleton support and improved error messages.
Key Changes:
- Extracted Factory class into standalone `graphrag-factory` package with enhanced singleton/transient service scope support
- Updated all Factory imports from `graphrag.factory.factory` to `graphrag_factory`
- Added `--all-packages` flags to CI/CD workflows to support monorepo structure
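To make the singleton/transient distinction concrete, here is a minimal sketch of a registry-style factory with the two scopes and an error message that lists the registered strategies. It is illustrative only: the method names `register` and `create`, the `scope` parameter, and the `InMemoryCache` class are assumptions, not the actual `graphrag_factory` API.

```python
from typing import Any, Callable, Generic, TypeVar

T = TypeVar("T")


class Factory(Generic[T]):
    """Hypothetical factory sketch; names and signatures are assumptions."""

    def __init__(self) -> None:
        # strategy name -> (initializer, scope)
        self._initializers: dict[str, tuple[Callable[..., T], str]] = {}
        # strategy name -> cached instance (singleton scope only)
        self._singletons: dict[str, T] = {}

    def register(
        self, strategy: str, initializer: Callable[..., T], scope: str = "transient"
    ) -> None:
        if scope not in ("transient", "singleton"):
            raise ValueError(f"Unknown scope '{scope}'.")
        self._initializers[strategy] = (initializer, scope)
        # Re-registering a strategy drops any previously cached instance.
        self._singletons.pop(strategy, None)

    def create(self, strategy: str, **kwargs: Any) -> T:
        if strategy not in self._initializers:
            registered = ", ".join(sorted(self._initializers)) or "<none>"
            raise ValueError(
                f"Strategy '{strategy}' is not registered. "
                f"Registered strategies: {registered}"
            )
        initializer, scope = self._initializers[strategy]
        if scope == "singleton":
            # Build once, then return the same instance on every call.
            if strategy not in self._singletons:
                self._singletons[strategy] = initializer(**kwargs)
            return self._singletons[strategy]
        # Transient scope: a fresh instance per call.
        return initializer(**kwargs)


class InMemoryCache:
    """Hypothetical service used only to exercise the sketch."""

    def __init__(self, name: str = "default") -> None:
        self.name = name


cache_factory: Factory[InMemoryCache] = Factory()
cache_factory.register("memory", InMemoryCache)
cache_factory.register("shared", InMemoryCache, scope="singleton")
```

In this sketch, a singleton ignores keyword arguments on every call after the first, because the cached instance is returned as-is. A real implementation might instead key the cache on the arguments.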
Reviewed Changes
Copilot reviewed 26 out of 403 changed files in this pull request and generated 2 comments.
| File | Description | 
|---|---|
| packages/graphrag-factory/pyproject.toml | New package configuration for extracted Factory module | 
| packages/graphrag-factory/graphrag_factory/factory.py | Enhanced Factory class with singleton/transient service scopes | 
| `packages/graphrag-factory/graphrag_factory/__init__.py` | Package initialization exposing Factory class | 
| packages/graphrag-factory/README.md | Documentation for the new Factory package with usage examples | 
| packages/graphrag/graphrag/logger/factory.py | Updated Factory import to use new package | 
| packages/graphrag/graphrag/language_model/factory.py | Updated Factory import to use new package | 
| packages/graphrag/graphrag/language_model/providers/litellm/services/retry/retry_factory.py | Updated Factory import to use new package | 
| packages/graphrag/graphrag/language_model/providers/litellm/services/rate_limiter/rate_limiter_factory.py | Updated Factory import to use new package | 
| packages/graphrag/graphrag/index/input/factory.py | Updated Factory import to use new package | 
| packages/graphrag/graphrag/cache/factory.py | Updated Factory import to use new package | 
| packages/graphrag/README.md | New README for graphrag package within monorepo | 
| .vscode/launch.json | Enhanced debug configuration with user input prompts | 
| .github/workflows/*.yml | Updated CI/CD workflows to use --all-packages flag | 
| docs/examples_notebooks/*.ipynb | Formatting cleanup of import statements | 
Please include an architecture diagram in the documentation illustrating this change, along with a short explanation of what each submodule is responsible for. This will help establish clear guardrails for future development.

I might be misunderstanding, but from what I see, this change introduces two modules: Factories and GraphRAG. Could you clarify the role of the Factory module? Specifically, what’s the rationale for treating it as a standalone logical unit worth exposing independently?
    
          
Hey @AlonsoGuevara, the monorepo structure does not change the system architecture or public API surface of GraphRAG. The workflows and all the pieces still fit together and work as they always have. The monorepo structure just pulls some code out into separate, independent PyPI packages so that they can be used in isolation in our other projects. Our team's GraphRAG Monorepo loop page discusses the goals, principles, and modules to pull out. I think that might answer the questions about guardrails and future development plans, but let me know if I am misunderstanding that piece.

So far there are only two packages, but more will be pulled out of graphrag core in future PRs that I am working on. I included two packages instead of one in this PR to give a better idea of the monorepo structure and how it will look as we add more. Factory was chosen as the first package to pull out of core because it is simple, has minimal impact, and will need to exist as a package: other packages we pull out (cache, vectorstore, etc.) will rely on the base factory class. Let me know if you disagree with factory needing to be its own package and what alternative approach may be better suited. One such alternative would be to copy the factory class code into each package that needs it.
    
          
Just so that I understand correctly, what you are describing here is to have something like:

flowchart TD
    A[graphrag] -->|depends on| B[graphrag-vectorstore]
    B --> |depends on| C[graphrag-factory]
    A -->|depends on| C
    
I kind of don't like the idea of exposing a package that has only one file in it, and we would need to publish it to PyPI so that it can be used as a dependency in other packages. Also, would this mean that if, for example, I had my own custom implementation of a vector store, I would need to first register that vector store in some factory in graphrag-vectorstore and then pass that to graphrag-core?

What do you think about not having a graphrag-factory and letting graphrag-core manage the factories, so that we don't have that dependency and only graphrag-core depends on the different packages? Since graphrag-core will depend on the different vectorstore, cache, etc. packages, it will have access to the ABCs or Protocols defined there, so it would be able to create and manage all the factories it needs and register default implementations, without having to copy-paste the factories into every module. Let me know what you think :)
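As a rough illustration of the alternative proposed above, the sketch below has the core own the registry while the vector store package only contributes a Protocol and a default implementation. Every name here (`VectorStore`, `InMemoryVectorStore`, `register_vector_store`, `create_vector_store`, `PrefixVectorStore`) is hypothetical and not taken from the GraphRAG codebase.

```python
from typing import Any, Callable, Optional, Protocol


# --- would live in a hypothetical graphrag-vectorstore package ---
class VectorStore(Protocol):
    def search(self, query: str, k: int) -> list[str]: ...


class InMemoryVectorStore:
    """Default implementation; substring matching stands in for real search."""

    def __init__(self, documents: Optional[list[str]] = None) -> None:
        self.documents = list(documents or [])

    def search(self, query: str, k: int) -> list[str]:
        return [doc for doc in self.documents if query in doc][:k]


# --- would live in a hypothetical graphrag-core package ---
_vector_store_registry: dict[str, Callable[..., VectorStore]] = {}


def register_vector_store(name: str, initializer: Callable[..., VectorStore]) -> None:
    _vector_store_registry[name] = initializer


def create_vector_store(name: str, **kwargs: Any) -> VectorStore:
    if name not in _vector_store_registry:
        raise ValueError(f"Vector store '{name}' is not registered.")
    return _vector_store_registry[name](**kwargs)


# Core registers the default implementations it knows about.
register_vector_store("memory", InMemoryVectorStore)


# --- user code: a custom store is registered directly with core ---
class PrefixVectorStore:
    def __init__(self, documents: Optional[list[str]] = None) -> None:
        self.documents = list(documents or [])

    def search(self, query: str, k: int) -> list[str]:
        return [doc for doc in self.documents if doc.startswith(query)][:k]


register_vector_store("prefix", PrefixVectorStore)
store = create_vector_store("prefix", documents=["alpha", "beta", "alphabet"])
results = store.search("alp", 5)
```

Under these assumptions, a custom implementation only needs to satisfy the Protocol structurally and register once with core. The vector store package itself carries no factory and no dependency on one.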
    
          
 Fair point. GraphRAG core can and will manage some of the factories. The one other package I know that will need a factory implementation is  
 Not exactly. I did a poor job of listing out packages. My list was merely a hypothetical list of packages that may need a factory but I agree with your point that some of this management should be done by  
Why not? GitHub Actions will manage publishing to PyPI, so that's not problematic. Another approach would be to not roll our own DI container logic and lean on an existing library like https://pypi.org/project/dependency-injector/, but that is a bigger lift and there has been hesitation to do this in the past.
 I may be misunderstanding this point, but this is true regardless of where the factory lives. Whether the factory is in  If the concern is around what gets imported,  I hope I was able to address your concerns in a reasonable manner. In hindsight, I wish I kept this PR more focused and had only 1 package in the PR,   | 
    
Restructure codebase as a monorepo project.
Checklist