Memory system feature #564
Conversation
Feedback is welcome and expected! I'm particularly interested in any thoughts you might have on the current (very basic) context-building system (Airi's "brain"). I have some ideas for how to make it more efficient, and I'm looking forward to collaborating on them.
Summary of Changes
Hello @lucas-oma, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly upgrades AIRI's intelligence by integrating a robust memory system. This system allows AIRI to retain and retrieve information from past conversations, enabling it to generate more contextually aware and consistent responses. The changes span both the frontend, with improved chat history navigation, and a new dedicated backend service responsible for processing, storing, and retrieving conversational memory using advanced AI models and a PostgreSQL vector database.
Highlights
- Core Memory System: Introduces a new "Memory System" for AIRI, enabling it to access and utilize long-term context for more relevant and consistent responses.
- Context-Building Pipeline: Implements a new pipeline that runs before each LLM call, using message embeddings to query a PostgreSQL database for various context types (recent messages, relevant memory fragments, consolidated memories, entities, and goals).
- Enhanced Chat History UI: Adds functionality to load initial chat history messages and includes a "Load More History" button for infinite scrolling of past conversations.
- Comprehensive Memory Service Configuration: Introduces a dedicated settings page in the web UI for configuring the memory service, including LLM and embedding providers, connection testing, and monitoring embedding regeneration progress.
- New Backend Memory Service: Adds a standalone Node.js backend service (services/memory-service) with a PostgreSQL database (using pgvector for vector embeddings) to manage message ingestion, AI response storage, memory processing, and context building.
- Developer Tooling: Includes new pnpm scripts (dev:memory, dev:with-memory) and Docker Compose configurations for easier local development and deployment of the memory service.
Code Review
This is an impressive pull request that introduces a significant new feature: a comprehensive memory system for AIRI. The architecture is well-designed, featuring a dedicated backend service, a robust database schema with vector support, and a sophisticated context-building (RAG) pipeline. The separation of concerns, attention to performance with background processing and smart batching, and clear documentation are all commendable.
My review focuses on a few areas for improvement:
- Robustness: I've identified a potential issue in how memory fragments are linked to other data, which could be brittle.
- Consistency: There are minor inconsistencies between desktop and mobile components that could be unified.
- Code Quality: I've suggested refactoring some duplicated code and improving type safety.
- Architecture: I've also reiterated a point made by the author about a key future optimization to reduce network latency.
Overall, this is a very strong foundation for the memory system, and the proposed changes will help make it even more robust and maintainable.
// TODO [lucas-oma]: optimize this function
// Right now, it fetches context and then makes LLM call for AI response,
// this can and should be integrated into a new streamText call/function and the server API should be removed and handled
// inside the message ingestion API.
As noted in the TODO and the PR description, the current context-building flow involves an extra round trip between the client and the memory service before the final LLM call. This adds significant latency. Prioritizing the optimization to a server-side flow (client -> memory_service -> llm_provider) will be critical for a good user experience. This change would streamline the process, reduce latency, and also move API keys and complex logic entirely to the backend.
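To make the target flow concrete, here is a rough sketch of what a consolidated server-side handler could look like (all names, the `buildContext` dependency, and the OpenAI-compatible endpoint are illustrative assumptions, not the actual service API):

```ts
// Illustrative sketch only: one server-side round trip that builds context and
// calls an OpenAI-compatible provider, so the API key never leaves the backend.
type BuildContext = (message: string) => Promise<string>

async function handleUserMessage(message: string, buildContext: BuildContext): Promise<string> {
  // Reuse the existing context-building pipeline (embedding + Postgres queries).
  const context = await buildContext(message)

  // Call the LLM provider directly from the memory service.
  const res = await fetch(`${process.env.LLM_BASE_URL}/chat/completions`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${process.env.LLM_API_KEY}`,
    },
    body: JSON.stringify({
      model: process.env.LLM_MODEL,
      messages: [
        { role: 'system', content: context },
        { role: 'user', content: message },
      ],
    }),
  })

  const data = await res.json()
  // The single response the client receives already includes the memory context.
  return data.choices[0].message.content as string
}
```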
// Build entity relations if we have entities and entity relations data
if (createdEntities.length > 0 && structuredData.entityRelations) {
  const entityMap = new Map(createdEntities.map(e => [e.name, e.id]))
  const fragmentMap = new Map(createdFragments.map((f, i) => [structuredData.memoryFragments[i]?.content, f.id]))
In updateMemoryTables, you create a map from fragment content to fragment ID to link entity relations. This approach is brittle because if the LLM generates two memory fragments with identical content, the map will only store the last one, and subsequent relations might be linked incorrectly or missed. A more robust approach would be to have the LLM return relations linked by an index or a temporary ID, or to process fragments and their relations together to avoid relying on content as a unique key.
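As a sketch of the index-based alternative (assuming the LLM is prompted to reference fragments by their array position via a hypothetical `fragmentIndex` field):

```ts
// Sketch: link entity relations by fragment index instead of fragment content.
// Assumes the LLM output includes a `fragmentIndex` per relation (hypothetical).
interface EntityRelation {
  sourceEntity: string
  targetEntity: string
  fragmentIndex: number // index into structuredData.memoryFragments
}

function resolveRelations(
  relations: EntityRelation[],
  createdFragments: { id: string }[],
  entityMap: Map<string, string>,
) {
  return relations.flatMap((rel) => {
    const fragmentId = createdFragments[rel.fragmentIndex]?.id
    const sourceId = entityMap.get(rel.sourceEntity)
    const targetId = entityMap.get(rel.targetEntity)
    // Skip relations whose references cannot be resolved, instead of silently
    // linking them to the wrong fragment.
    if (!fragmentId || !sourceId || !targetId)
      return []
    return [{ sourceId, targetId, fragmentId }]
  })
}
```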
const { onBeforeMessageComposed, onTokenLiteral } = useChatStore()
// Patch for eslint lintern
console.warn(!!loadingInitialHistory)
Using console.warn to bypass linter warnings for unused variables should be avoided in production code. If the variable is truly unused, it should be removed. If it's temporarily unused but needed for the future, consider prefixing it with an underscore (e.g., _loadingInitialHistory) if your linter is configured for it, or use a specific // eslint-disable-next-line comment with a clear explanation.
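For instance (a generic sketch; the exact rule name depends on the project's ESLint configuration):

```ts
// Sketch: suppress the unused-variable rule explicitly instead of emitting a
// runtime console.warn; the rule name below is an assumption about the config.
// eslint-disable-next-line @typescript-eslint/no-unused-vars -- reserved for the upcoming loading indicator
const loadingInitialHistory = false
```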
// Load more history without changing scroll position
async function handleLoadMore() {
  if (!chatHistoryRef.value)
    return
  // Store current scroll height and position
  const oldScrollHeight = chatHistoryRef.value.scrollHeight
  const oldScrollTop = chatHistoryRef.value.scrollTop
  // Load more history
  await loadMoreHistory()
  // After new content is loaded, adjust scroll position to maintain relative position
  await nextTick()
  if (chatHistoryRef.value) {
    const newScrollHeight = chatHistoryRef.value.scrollHeight
    const heightDiff = newScrollHeight - oldScrollHeight
    chatHistoryRef.value.scrollTop = oldScrollTop + heightDiff
  }
}
The handleLoadMore implementation here calculates scroll position based on height differences. In MobileChatHistory.vue, a more robust "anchor" method is used, which tracks the first visible element. The anchor method is generally less prone to "jiggle" or incorrect scroll positioning. For consistency and robustness, consider adopting the anchor-based implementation from MobileChatHistory.vue in this component as well.
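A rough sketch of the anchor-based variant, for comparison (the `data-message-id` attribute and `loadMoreHistory` are assumptions standing in for the component's actual template and composable):

```ts
import { nextTick, ref } from 'vue'

declare function loadMoreHistory(): Promise<void> // existing composable, assumed

const chatHistoryRef = ref<HTMLElement>()

async function handleLoadMoreAnchored() {
  const container = chatHistoryRef.value
  if (!container)
    return

  // Use the first message element at or below the current scroll offset as an anchor.
  const anchor = Array.from(container.querySelectorAll<HTMLElement>('[data-message-id]'))
    .find(el => el.offsetTop >= container.scrollTop)
  const anchorOffset = anchor ? anchor.offsetTop - container.scrollTop : 0

  await loadMoreHistory()
  await nextTick()

  // Re-locate the same message after prepending history and restore its offset.
  if (anchor) {
    const sameAnchor = container.querySelector<HTMLElement>(
      `[data-message-id="${anchor.dataset.messageId}"]`,
    )
    if (sameAnchor)
      container.scrollTop = sameAnchor.offsetTop - anchorOffset
  }
}
```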
const { onBeforeMessageComposed, onTokenLiteral } = useChatStore()
// Patch for eslint lintern
console.warn(!!sending, !!streamingMessage)
Using console.warn to bypass linter warnings for unused variables should be avoided in production code. If the variables are truly unused, they should be removed. If they are temporarily unused but needed for the future, consider prefixing them with an underscore (e.g., _sending) if your linter is configured for it, or use a specific // eslint-disable-next-line comment with a clear explanation.
// Force check for changes on mount
const hasChanges
  = tempEmbeddingProvider.value !== embeddingProvider.value
    || tempEmbeddingModel.value !== embeddingModel.value
    || tempEmbeddingDim.value !== embeddingDim.value
    || tempEmbeddingApiKey.value !== embeddingApiKey.value
settingsChanged.value = hasChanges
showRegenerationWarning.value = hasChanges
The logic here to detect changes on mount appears to be redundant. The watch hook on lines 178-196 is configured with immediate: true, which means it will execute once when the component is created and perform the initial check. You can likely remove this change detection logic from onMounted to avoid code duplication and rely solely on the watcher.
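In other words, something along these lines should be sufficient on its own (a sketch; the refs are the ones already defined in this component):

```ts
import { watch } from 'vue'

// Sketch: with `immediate: true`, this single watcher also covers the initial
// check on mount, so the duplicated onMounted logic can be removed.
// (tempEmbedding*/embedding* and the two flags are the component's existing refs.)
watch(
  [tempEmbeddingProvider, tempEmbeddingModel, tempEmbeddingDim, tempEmbeddingApiKey],
  () => {
    const hasChanges
      = tempEmbeddingProvider.value !== embeddingProvider.value
        || tempEmbeddingModel.value !== embeddingModel.value
        || tempEmbeddingDim.value !== embeddingDim.value
        || tempEmbeddingApiKey.value !== embeddingApiKey.value
    settingsChanged.value = hasChanges
    showRegenerationWarning.value = hasChanges
  },
  { immediate: true },
)
```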
const oldestMessage = messages.value
  .filter(msg => msg.role !== 'system')
  .sort((a, b) => ((a as any).created_at || 0) - ((b as any).created_at || 0))[0]
The use of (a as any).created_at indicates a potential type issue. While the created_at property is added dynamically when loading history, it's not part of the base ChatMessage type, forcing the use of any. To improve type safety, consider extending the UserMessage and ChatAssistantMessage types to include an optional created_at: number property. This would remove the need for type casting and make the code more robust.
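For example, a small sketch of the suggested typing (the base message shapes below are simplified stand-ins for the project's actual chat types):

```ts
// Sketch: add an optional timestamp to the message types so the sort no longer
// needs `as any`.
interface WithTimestamp {
  /** Unix epoch (ms), set when the message is loaded from history. */
  created_at?: number
}

type UserMessage = { role: 'user', content: string } & WithTimestamp
type ChatAssistantMessage = { role: 'assistant', content: string } & WithTimestamp
type ChatMessage = UserMessage | ChatAssistantMessage | { role: 'system', content: string }

function oldestOf(messages: ChatMessage[]) {
  return messages
    .filter((msg): msg is UserMessage | ChatAssistantMessage => msg.role !== 'system')
    .sort((a, b) => (a.created_at ?? 0) - (b.created_at ?? 0))[0]
}
```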
Rebase needed.
Force-pushed from 70a9953 to e539c98.
@nekomeowww Rebased and updated. Though I see some workflows here on GitHub are failing; should I look into that?
@lucas-oma From the diagram (steps 1-8), I can see you changed how it interacts with the LLM, adding a "context builder" before the message is actually sent to the LLM. So does this mean users will be forced to use embeddings and PostgreSQL? "What is an API key" is already a problem for newbies, and what about the web-based version, like https://airi.moeru.ai/? I'm not saying the implementation is bad, but I'm trying to think like a non-tech weeb who will try to install AIRI later.
Hi @skirkru, you are right that this kind of setup might be a bit difficult for non-technical users, although for building a decent memory system I think embeddings and a vector database are needed.
My implementation is managed in the settings page and must be activated via a checkbox there; if it is not activated, the behavior is as usual. Still, you raise a valid point: maybe DuckDB-Wasm can be used instead of (or as an alternative to) Postgres, so the extra Postgres step would not be necessary. I'm not sure about its performance for vectors, though, but I can give it a try (non-technical users will probably prefer that kind of approach).
You are right, ChromaDB is needed for good long-term memory storage. Together with another guy, we are currently trying to implement and locally embed short-term and long-term memory with various AGI features (determining the user's emotions, adjusting to them, improved semantic search, storing memory by emotional parameters, and more, up to adapting the AI voice to the user's mood). But we use Rust as the base for storing all memory locally. We tried to run local servers through Python scripts, but they crashed under the incoming data volume, and besides, there is an obvious limitation on data storage, so I think Rust local storage is the best option.
Hi @lucas-oma, yes, embeddings and a vector database are essential for memory, but my main concern is non-technical people who want to test it in the browser first. For embeddings it can use the current provider list, but not all providers (like OpenRouter) provide embeddings. I think the stack should be discussed with @nekomeowww, since the current model list uses IndexedDB to store its files. Also, I think we should be able to use both database formats, selectable in the settings, so there are options both for non-tech-friendly users and for someone who needs a performant database.
How about using PGLite?
Wait, but if we use PGlite we need to rewrite all queries, as it does not support Drizzle ORM.
Would you mind checking PR #589?
They do support it: https://orm.drizzle.team/docs/connect-pglite
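For reference, a minimal sketch of wiring it up (following the linked docs; the `idb://` path is the browser-persistence variant and the database name is arbitrary):

```ts
import { PGlite } from '@electric-sql/pglite'
import { drizzle } from 'drizzle-orm/pglite'

// In the browser, PGlite can persist to IndexedDB; on Node it can use a
// directory on disk or run purely in memory.
const client = new PGlite('idb://airi-memory')
const db = drizzle({ client })

// Existing Drizzle schemas and queries should keep working against `db`.
```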
Cool. Then is it better to rewrite #597 to use PGLite? Currently it is using embedded Postgres, and the last step is packaging (considering pkg or a Tauri sidecar).
Yes. |
Rebase needed.
Thank you! And now we need one last step: implementing PGLite support :)
My idea is to support both variants, embedded Postgres and PGlite (since they have different purposes).
I am currently implementing PGLite support in #597.
Hey, sorry I've been kinda busy lately; I will return to this and related topics in the next few days. Was the PGLite feature merged into main, or is it still in development? @nekomeowww @gg582 Also, any opinions on using DuckDB instead? (Not sure about its vector capabilities, but it might be a way to integrate a local memory system for people who prefer a plug-and-play approach.)
Yes, DuckDB will be fine. I have my PGlite-merged branch on my forked repo. Can I make a PR to your fork? :) Or you can use #597.
I reopened #597 in advance.
Also, I've got a request to move this into /packages... can we do it?
But if we rewrite it in DuckDB, we should also rewrite the backup functionality, etc. I've made an export function to manage history in #597, and I think it will be fine after adding memory separation per model card. And a small request: can you write up the DB structure when it changes? I'd like to adapt my memory backup to the changed database.
So... you are still working on it, right? I'd like to get a sense of how much you have done so far.
Description
This PR introduces a Memory System for AIRI, enabling it to access and utilize long-term context to generate more relevant and consistent responses. This is still at an early stage, hence there are a lot of optimization opportunities available, and that is one of the reasons for the PR (so maintainers can provide feedback and optimization ideas).
The core functionality is a new context-building pipeline that runs before each LLM call. As detailed in the sequence diagram:
sequenceDiagram
    actor Alice as User
    participant P1 as Backend Service: <br>API Server
    participant P2 as LLM Providers<br>(multiple)
    participant P3 as Backend Service: <br>Context Building
    participant P4 as Postgres DB
    Alice ->> P1: 1. new msg
    P1 ->> P2: 2. generate_embedding(msg)
    P2 ->> P1: 3. return msg_embedding
    P1 ->> P3: 4. msg_embedding
    par 🧠 Context building
        P3 ->> P4: 5a. get last 10 messages + responses
        P3 ->> P4: 5b. get top 5 most relevant
        P3 ->> P4: 5c. get top 10 most relevant memory_fragments
        P3 ->> P4: 5d. get top 10 most memory_associated
        P3 ->> P4: 5e. get top 10 most memory_consolidated
        P3 ->> P4: 5f. get all memory_entities
        P3 ->> P4: 5g. get all memory_long_term_goals
        P3 ->> P4: 5h. get all memory_short_term_ideas
    end
    P4 ->> P3: 6. return context data
    P3 ->> P3: 7. assemble context
    P3 ->> P1: 8a. return assembled context
    P1 ->> Alice: 8b. return assembled context
    Alice ->> P2: 9. send msg + context
    P2 ->> Alice: 10. return LLM response

This is the main functional change in this PR.
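To make step 5 more concrete, here is a rough sketch of the parallel context-building queries (helper names, table names, and the final prompt format below are illustrative placeholders, not the actual service API):

```ts
// Placeholder signatures for the illustrative helpers used below.
declare function getRecentMessages(limit: number): Promise<string[]>
declare function searchBySimilarity(table: string, embedding: number[], limit: number): Promise<string[]>
declare function getAll(table: string): Promise<string[]>

// Sketch of steps 5-7: run the context queries in parallel, then assemble one
// context block that is prepended to the LLM prompt.
async function buildContext(msgEmbedding: number[]): Promise<string> {
  const [recent, fragments, consolidated, entities, goals] = await Promise.all([
    getRecentMessages(10), // 5a
    searchBySimilarity('memory_fragments', msgEmbedding, 10), // 5c (pgvector similarity)
    searchBySimilarity('memory_consolidated', msgEmbedding, 10), // 5e
    getAll('memory_entities'), // 5f
    getAll('memory_long_term_goals'), // 5g
  ])

  return [
    '## Recent conversation', ...recent,
    '## Relevant memory fragments', ...fragments,
    '## Consolidated memories', ...consolidated,
    '## Known entities', ...entities,
    '## Long-term goals', ...goals,
  ].join('\n')
}
```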
Additional Changes
This PR also includes a few nice updates:
Linked Issues
#387
Additional Context