A comprehensive study repository for exploring LangChain, RAG (Retrieval-Augmented Generation), embeddings, and semantic vector search techniques with practical implementations.
This repository contains hands-on experiments and production-ready implementations of modern AI techniques, focusing on:
- LangChain Framework: Building sophisticated LLM applications
- RAG (Retrieval-Augmented Generation): Enhancing LLM responses with relevant context
- Vector Embeddings: Converting text into semantic representations
- Semantic Search: Finding relevant documents using meaning, not just keywords (a small cosine-similarity sketch follows this list)
- Document Processing: PDF parsing, chunking, and vectorization strategies
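To make "semantic representations" concrete, here is a minimal TypeScript sketch (not code from this repository) that embeds two sentences with OpenAI and compares them by cosine similarity. It assumes the `@langchain/openai` package is installed and `OPENAI_API_KEY` is set in the environment.

```typescript
import { OpenAIEmbeddings } from "@langchain/openai";

// Cosine similarity: ~1.0 for nearly identical meaning, ~0 for unrelated text
function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

async function demo() {
  const embeddings = new OpenAIEmbeddings();
  // embedQuery turns a string into a dense vector of numbers
  const [a, b] = await Promise.all([
    embeddings.embedQuery("How do I reset my password?"),
    embeddings.embedQuery("I forgot my login credentials"),
  ]);
  console.log(cosine(a, b)); // semantically related sentences score high
}
```

Sentences with similar meaning end up close together in vector space, which is what makes the similarity search in the examples below work.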
- PDF Processing: Extract and process text from PDF documents
- Vector Embeddings: Convert text chunks into semantic vectors using OpenAI
- Similarity Search: Find relevant content using semantic similarity
- Interactive Chat: Build a conversational interface with context-aware responses (an end-to-end sketch of this flow follows the feature list)
- Structured Data: Work with JSON-based FAQ datasets
- Multi-Category Search: Handle different types of questions (product, service, technical)
- Production-Ready Chatbot: Implement a robust FAQ answering system
- Context Retrieval: Smart document retrieval for accurate responses
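Below is a minimal end-to-end sketch of the PDF flow described above. It is not the repository's exact code: it assumes the `@langchain/community`, `@langchain/textsplitters`, and `@langchain/openai` packages plus `pdf-parse`, and it keeps vectors in an in-memory store for brevity, whereas this repository persists embeddings in PostgreSQL via Drizzle.

```typescript
// Illustrative RAG pipeline: PDF -> chunks -> embeddings -> similarity search -> answer.
// Assumes OPENAI_API_KEY is set. Not the repository's actual implementation.
import { PDFLoader } from "@langchain/community/document_loaders/fs/pdf";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import { OpenAIEmbeddings, ChatOpenAI } from "@langchain/openai";
import { MemoryVectorStore } from "langchain/vectorstores/memory";

async function askPdf(pdfPath: string, question: string): Promise<string> {
  // 1. Extract text from the PDF
  const pages = await new PDFLoader(pdfPath).load();

  // 2. Split into overlapping chunks so each embedding captures a focused passage
  const splitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000, chunkOverlap: 200 });
  const chunks = await splitter.splitDocuments(pages);

  // 3. Embed the chunks (in-memory here; the repo stores vectors in PostgreSQL)
  const store = await MemoryVectorStore.fromDocuments(chunks, new OpenAIEmbeddings());

  // 4. Retrieve the chunks most similar to the question
  const context = await store.similaritySearch(question, 4);

  // 5. Ask GPT-4 to answer using only the retrieved context
  const llm = new ChatOpenAI({ model: "gpt-4" });
  const response = await llm.invoke([
    ["system", `Answer using only this context:\n${context.map((d) => d.pageContent).join("\n---\n")}`],
    ["user", question],
  ]);
  return String(response.content);
}
```

In the repository this flow is split across `load-embeddings-pdf.ts`, `search-embeddings-pdf.ts`, and `gpt-embeddings-pdf.ts`.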
- TypeScript - Type-safe JavaScript development
- LangChain - Framework for building LLM applications
- OpenAI GPT-4 - Advanced language model for text generation
- OpenAI Embeddings - Text-to-vector conversion
- PostgreSQL - Vector database for storing embeddings
- Drizzle ORM - Type-safe database operations (a schema sketch follows this list)
- PDF-Parse - PDF document processing
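As a rough illustration of how embeddings can be stored with this stack, here is a hedged Drizzle schema sketch. The table and column names, the metadata shape, and the 1536-dimension size are assumptions, not the repository's actual definitions (those live in `src/schema.ts`); it also assumes a drizzle-orm version with the built-in `vector` column type and the pgvector extension enabled in PostgreSQL.

```typescript
// Hypothetical embeddings table -- the real schema lives in src/schema.ts.
// Assumes drizzle-orm with pgvector support and `CREATE EXTENSION vector;` in PostgreSQL.
import { pgTable, serial, text, jsonb, vector } from "drizzle-orm/pg-core";

export const documents = pgTable("documents", {
  id: serial("id").primaryKey(),
  content: text("content").notNull(),                   // the original text chunk
  metadata: jsonb("metadata"),                          // e.g. source file or FAQ category
  embedding: vector("embedding", { dimensions: 1536 }), // OpenAI embedding size
});
```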
- Node.js 18+
- PostgreSQL database
- OpenAI API key
- Clone the repository

```bash
git clone https://github.com/Natanaelvich/langchain-rag-embeddings-study.git
cd langchain-rag-embeddings-study
```

- Install dependencies

```bash
npm install
```

- Set up environment variables

```bash
cp .env.example .env
# Edit .env with your OpenAI API key and database credentials
```

- Run the examples

PDF Processing & Chat:

```bash
npm run dev src/01-introduction/gpt-embeddings-pdf.ts
```

FAQ Chatbot:

```bash
npm run dev src/02-real-world-faq/chat-faq.ts
```

```
src/
├── 01-introduction/             # Basic embeddings and PDF processing
│   ├── gpt-embeddings-pdf.ts    # Interactive chat with PDF content
│   ├── load-embeddings-pdf.ts   # PDF loading and vectorization
│   └── search-embeddings-pdf.ts # Vector search implementation
├── 02-real-world-faq/           # Production FAQ system
│   ├── chat-faq.ts              # Interactive FAQ chatbot
│   ├── load-faq-data.ts         # FAQ data loading and processing
│   └── search-faq.ts            # FAQ-specific search logic
└── schema.ts                    # Database schema definitions

tmp/
├── agents-data/                 # Sample data for agents
├── faq-data/                    # FAQ datasets (product, service, technical)
└── pdf/                         # PDF documents for processing
```
- `npm run dev` - Start development server with hot reload
- `npm run build` - Build TypeScript to JavaScript
- `npm run start` - Run built application
- `npm run test` - Run test suite
- `npm run lint` - Check code quality
- `npm run format` - Format code with Prettier
- `npm run studio` - Open Drizzle Studio for database management
- Start with `01-introduction/` to understand basic concepts
- Learn about embeddings and vector search
- Build your first RAG application
- Explore `02-real-world-faq/` for production patterns
- Understand structured data processing
- Implement multi-category search (a hedged sketch follows this list)
- Customize the implementations for your use case
- Add new data sources and processing pipelines
- Optimize performance and accuracy
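As a rough illustration of multi-category search, here is a hedged TypeScript sketch. The `FaqEntry` shape and the field names are assumptions rather than the repository's actual data format, and it uses an in-memory store where `load-faq-data.ts` and `search-faq.ts` work against PostgreSQL.

```typescript
// Hypothetical multi-category FAQ search (illustrative only).
// Field names (question/answer/category) are assumed, not taken from the repo's datasets.
import { Document } from "@langchain/core/documents";
import { OpenAIEmbeddings } from "@langchain/openai";
import { MemoryVectorStore } from "langchain/vectorstores/memory";

type FaqEntry = {
  question: string;
  answer: string;
  category: "product" | "service" | "technical";
};

async function buildFaqStore(entries: FaqEntry[]) {
  // Each FAQ entry becomes one document; the category travels along as metadata
  const docs = entries.map(
    (e) =>
      new Document({
        pageContent: `${e.question}\n${e.answer}`,
        metadata: { category: e.category },
      }),
  );
  return MemoryVectorStore.fromDocuments(docs, new OpenAIEmbeddings());
}

async function searchFaq(store: MemoryVectorStore, query: string, category?: FaqEntry["category"]) {
  // An optional metadata filter restricts results to a single category
  const filter = category ? (doc: Document) => doc.metadata.category === category : undefined;
  return store.similaritySearch(query, 3, filter);
}
```

Keeping the category in metadata lets a single index serve product, service, and technical questions while still allowing scoped queries.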
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
This project is licensed under the MIT License - see the LICENSE file for details.
⭐ Star this repository if you found it helpful for your AI/ML journey!