Memory system feature #564
Conversation
Feedback is welcome and expected! I'm particularly interested in any thoughts you might have on the current (very basic) context-building system (Airi's "brain"). I have some ideas for how to make it more efficient, and I'm looking forward to collaborating on them.
Summary of Changes
Hello @lucas-oma, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly upgrades AIRI's intelligence by integrating a robust memory system. This system allows AIRI to retain and retrieve information from past conversations, enabling it to generate more contextually aware and consistent responses. The changes span both the frontend, with improved chat history navigation, and a new dedicated backend service responsible for processing, storing, and retrieving conversational memory using advanced AI models and a PostgreSQL vector database.
Highlights
- Core Memory System: Introduces a new "Memory System" for AIRI, enabling it to access and utilize long-term context for more relevant and consistent responses.
- Context-Building Pipeline: Implements a new pipeline that runs before each LLM call, using message embeddings to query a PostgreSQL database for various context types (recent messages, relevant memory fragments, consolidated memories, entities, and goals).
- Enhanced Chat History UI: Adds functionality to load initial chat history messages and includes a "Load More History" button for infinite scrolling of past conversations.
- Comprehensive Memory Service Configuration: Introduces a dedicated settings page in the web UI for configuring the memory service, including LLM and embedding providers, connection testing, and monitoring embedding regeneration progress.
- New Backend Memory Service: Adds a standalone Node.js backend service (services/memory-service) with a PostgreSQL database (using pgvector for vector embeddings) to manage message ingestion, AI response storage, memory processing, and context building.
- Developer Tooling: Includes new pnpm scripts (dev:memory, dev:with-memory) and Docker Compose configurations for easier local development and deployment of the memory service.
Code Review
This is an impressive pull request that introduces a significant new feature: a comprehensive memory system for AIRI. The architecture is well-designed, featuring a dedicated backend service, a robust database schema with vector support, and a sophisticated context-building (RAG) pipeline. The separation of concerns, attention to performance with background processing and smart batching, and clear documentation are all commendable.
My review focuses on a few areas for improvement:
- Robustness: I've identified a potential issue in how memory fragments are linked to other data, which could be brittle.
- Consistency: There are minor inconsistencies between desktop and mobile components that could be unified.
- Code Quality: I've suggested refactoring some duplicated code and improving type safety.
- Architecture: I've also reiterated a point made by the author about a key future optimization to reduce network latency.
Overall, this is a very strong foundation for the memory system, and the proposed changes will help make it even more robust and maintainable.
// TODO [lucas-oma]: optimize this function
// Right now, it fetches context and then makes LLM call for AI response,
// this can and should be integrated into a new streamText call/function and the server API should be removed and handled
// inside the message ingestion API.
As noted in the TODO and the PR description, the current context-building flow involves an extra round trip between the client and the memory service before the final LLM call. This adds significant latency. Prioritizing the optimization to a server-side flow (client -> memory_service -> llm_provider) will be critical for a good user experience. This change would streamline the process, reduce latency, and also move API keys and complex logic entirely to the backend.
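To make the target flow concrete, here is a rough sketch of what a consolidated server-side handler could look like (all names, the `buildContext` dependency, and the OpenAI-compatible endpoint are illustrative assumptions, not the actual service API):

```ts
// Illustrative sketch only: one server-side round trip that builds context and
// calls an OpenAI-compatible provider, so the API key never leaves the backend.
type BuildContext = (message: string) => Promise<string>

async function handleUserMessage(message: string, buildContext: BuildContext): Promise<string> {
  // Reuse the existing context-building pipeline (embedding + Postgres queries).
  const context = await buildContext(message)

  // Call the LLM provider directly from the memory service.
  const res = await fetch(`${process.env.LLM_BASE_URL}/chat/completions`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${process.env.LLM_API_KEY}`,
    },
    body: JSON.stringify({
      model: process.env.LLM_MODEL,
      messages: [
        { role: 'system', content: context },
        { role: 'user', content: message },
      ],
    }),
  })

  const data = await res.json()
  // The single response the client receives already includes the memory context.
  return data.choices[0].message.content as string
}
```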
// Build entity relations if we have entities and entity relations data
if (createdEntities.length > 0 && structuredData.entityRelations) {
  const entityMap = new Map(createdEntities.map(e => [e.name, e.id]))
  const fragmentMap = new Map(createdFragments.map((f, i) => [structuredData.memoryFragments[i]?.content, f.id]))
In updateMemoryTables, you create a map from fragment content to fragment ID to link entity relations. This approach is brittle because if the LLM generates two memory fragments with identical content, the map will only store the last one, and subsequent relations might be linked incorrectly or missed. A more robust approach would be to have the LLM return relations linked by an index or a temporary ID, or to process fragments and their relations together to avoid relying on content as a unique key.
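As a sketch of the index-based alternative (assuming the LLM is prompted to reference fragments by their array position via a hypothetical `fragmentIndex` field):

```ts
// Sketch: link entity relations by fragment index instead of fragment content.
// Assumes the LLM output includes a `fragmentIndex` per relation (hypothetical).
interface EntityRelation {
  sourceEntity: string
  targetEntity: string
  fragmentIndex: number // index into structuredData.memoryFragments
}

function resolveRelations(
  relations: EntityRelation[],
  createdFragments: { id: string }[],
  entityMap: Map<string, string>,
) {
  return relations.flatMap((rel) => {
    const fragmentId = createdFragments[rel.fragmentIndex]?.id
    const sourceId = entityMap.get(rel.sourceEntity)
    const targetId = entityMap.get(rel.targetEntity)
    // Skip relations whose references cannot be resolved, instead of silently
    // linking them to the wrong fragment.
    if (!fragmentId || !sourceId || !targetId)
      return []
    return [{ sourceId, targetId, fragmentId }]
  })
}
```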
const { onBeforeMessageComposed, onTokenLiteral } = useChatStore()
// Patch for eslint lintern
console.warn(!!loadingInitialHistory)
Using console.warn to bypass linter warnings for unused variables should be avoided in production code. If the variable is truly unused, it should be removed. If it's temporarily unused but needed for the future, consider prefixing it with an underscore (e.g., _loadingInitialHistory) if your linter is configured for it, or use a specific // eslint-disable-next-line comment with a clear explanation.
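For instance (a generic sketch; the exact rule name depends on the project's ESLint configuration):

```ts
// Sketch: suppress the unused-variable rule explicitly instead of emitting a
// runtime console.warn; the rule name below is an assumption about the config.
// eslint-disable-next-line @typescript-eslint/no-unused-vars -- reserved for the upcoming loading indicator
const loadingInitialHistory = false
```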
// Load more history without changing scroll position
async function handleLoadMore() {
  if (!chatHistoryRef.value)
    return
  // Store current scroll height and position
  const oldScrollHeight = chatHistoryRef.value.scrollHeight
  const oldScrollTop = chatHistoryRef.value.scrollTop
  // Load more history
  await loadMoreHistory()
  // After new content is loaded, adjust scroll position to maintain relative position
  await nextTick()
  if (chatHistoryRef.value) {
    const newScrollHeight = chatHistoryRef.value.scrollHeight
    const heightDiff = newScrollHeight - oldScrollHeight
    chatHistoryRef.value.scrollTop = oldScrollTop + heightDiff
  }
}
The handleLoadMore implementation here calculates scroll position based on height differences. In MobileChatHistory.vue, a more robust "anchor" method is used, which tracks the first visible element. The anchor method is generally less prone to "jiggle" or incorrect scroll positioning. For consistency and robustness, consider adopting the anchor-based implementation from MobileChatHistory.vue in this component as well.
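A rough sketch of the anchor-based variant, for comparison (the `data-message-id` attribute and `loadMoreHistory` are assumptions standing in for the component's actual template and composable):

```ts
import { nextTick, ref } from 'vue'

declare function loadMoreHistory(): Promise<void> // existing composable, assumed

const chatHistoryRef = ref<HTMLElement>()

async function handleLoadMoreAnchored() {
  const container = chatHistoryRef.value
  if (!container)
    return

  // Use the first message element at or below the current scroll offset as an anchor.
  const anchor = Array.from(container.querySelectorAll<HTMLElement>('[data-message-id]'))
    .find(el => el.offsetTop >= container.scrollTop)
  const anchorOffset = anchor ? anchor.offsetTop - container.scrollTop : 0

  await loadMoreHistory()
  await nextTick()

  // Re-locate the same message after prepending history and restore its offset.
  if (anchor) {
    const sameAnchor = container.querySelector<HTMLElement>(
      `[data-message-id="${anchor.dataset.messageId}"]`,
    )
    if (sameAnchor)
      container.scrollTop = sameAnchor.offsetTop - anchorOffset
  }
}
```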
const { onBeforeMessageComposed, onTokenLiteral } = useChatStore()
// Patch for eslint lintern
console.warn(!!sending, !!streamingMessage)
Using console.warn to bypass linter warnings for unused variables should be avoided in production code. If the variables are truly unused, they should be removed. If they are temporarily unused but needed for the future, consider prefixing them with an underscore (e.g., _sending) if your linter is configured for it, or use a specific // eslint-disable-next-line comment with a clear explanation.
// Force check for changes on mount
const hasChanges
  = tempEmbeddingProvider.value !== embeddingProvider.value
    || tempEmbeddingModel.value !== embeddingModel.value
    || tempEmbeddingDim.value !== embeddingDim.value
    || tempEmbeddingApiKey.value !== embeddingApiKey.value
settingsChanged.value = hasChanges
showRegenerationWarning.value = hasChanges
The logic here to detect changes on mount appears to be redundant. The watch hook on lines 178-196 is configured with immediate: true, which means it will execute once when the component is created and perform the initial check. You can likely remove this change detection logic from onMounted to avoid code duplication and rely solely on the watcher.
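In other words, something along these lines should be sufficient on its own (a sketch; the refs are the ones already defined in this component):

```ts
import { watch } from 'vue'

// Sketch: with `immediate: true`, this single watcher also covers the initial
// check on mount, so the duplicated onMounted logic can be removed.
// (tempEmbedding*/embedding* and the two flags are the component's existing refs.)
watch(
  [tempEmbeddingProvider, tempEmbeddingModel, tempEmbeddingDim, tempEmbeddingApiKey],
  () => {
    const hasChanges
      = tempEmbeddingProvider.value !== embeddingProvider.value
        || tempEmbeddingModel.value !== embeddingModel.value
        || tempEmbeddingDim.value !== embeddingDim.value
        || tempEmbeddingApiKey.value !== embeddingApiKey.value
    settingsChanged.value = hasChanges
    showRegenerationWarning.value = hasChanges
  },
  { immediate: true },
)
```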
const oldestMessage = messages.value
  .filter(msg => msg.role !== 'system')
  .sort((a, b) => ((a as any).created_at || 0) - ((b as any).created_at || 0))[0]
The use of (a as any).created_at indicates a potential type issue. While the created_at property is added dynamically when loading history, it's not part of the base ChatMessage type, forcing the use of any. To improve type safety, consider extending the UserMessage and ChatAssistantMessage types to include an optional created_at: number property. This would remove the need for type casting and make the code more robust.
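For example, a small sketch of the suggested typing (the base message shapes below are simplified stand-ins for the project's actual chat types):

```ts
// Sketch: add an optional timestamp to the message types so the sort no longer
// needs `as any`.
interface WithTimestamp {
  /** Unix epoch (ms), set when the message is loaded from history. */
  created_at?: number
}

type UserMessage = { role: 'user', content: string } & WithTimestamp
type ChatAssistantMessage = { role: 'assistant', content: string } & WithTimestamp
type ChatMessage = UserMessage | ChatAssistantMessage | { role: 'system', content: string }

function oldestOf(messages: ChatMessage[]) {
  return messages
    .filter((msg): msg is UserMessage | ChatAssistantMessage => msg.role !== 'system')
    .sort((a, b) => (a.created_at ?? 0) - (b.created_at ?? 0))[0]
}
```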
Rebase needed.
Force-pushed from 70a9953 to e539c98.
@nekomeowww Rebased and updated. Though I see some workflows here on GitHub are failing; should I look into that?
@lucas-oma From the diagram (steps 1-8), I can see you changed how it interacts with the LLM, adding a "context builder" before the message is actually sent to the LLM. So does this mean users will be forced to use embeddings and PostgreSQL? "What is an API key" is already a problem for newbies, and what about the web-based version, like https://airi.moeru.ai/? I'm not saying the implementation is bad, but I'm trying to think like a non-tech weeb who will try to install AIRI later.
Hi @skirkru, you are right that this kind of setup might be a bit difficult for non-technical users, although for building a decent memory system I think embeddings and a vector database are needed.
My implementation is managed in the settings page and must be activated via a checkbox there; if it is not activated, the behavior is as usual. Still, you raise a valid point: maybe DuckDB-Wasm can be used instead of (or as an alternative to) Postgres, so the extra Postgres step would not be necessary. I'm not sure about its performance for vectors, though, but I can give it a try (non-technical users will probably prefer that kind of approach).
You are right, ChromaDB is needed for good long-term memory storage. Together with another guy, we are currently trying to implement and locally embed short-term and long-term memory with various AGI features (determining the user's emotions, adjusting to them, improved semantic search, storing memory by emotional parameters, and more, up to adapting the AI voice to the user's mood). But we use Rust as the base for storing all memory locally. We tried to run local servers through Python scripts, but they crashed under the incoming data volume, and besides, there is an obvious limitation on data storage, so I think Rust local storage is the best option.
Hi @lucas-oma, yes, embeddings and a vector database are essential for memory, but my main concern is non-technical people who want to test it in the browser first. For embeddings it can use the current provider list, but not all providers (like OpenRouter) provide embeddings. I think the stack should be discussed with @nekomeowww, since the current model list uses IndexedDB to store its files. Also, I think we should be able to use both database formats, selectable in the settings, so there are options both for non-tech-friendly users and for someone who needs a performant database.
How about using PGLite?
Wait, but if we use PGlite we need to rewrite all queries, as it does not support Drizzle ORM.
Would you mind checking PR #589?
They do support it: https://orm.drizzle.team/docs/connect-pglite
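For reference, a minimal sketch of wiring it up (following the linked docs; the `idb://` path is the browser-persistence variant and the database name is arbitrary):

```ts
import { PGlite } from '@electric-sql/pglite'
import { drizzle } from 'drizzle-orm/pglite'

// In the browser, PGlite can persist to IndexedDB; on Node it can use a
// directory on disk or run purely in memory.
const client = new PGlite('idb://airi-memory')
const db = drizzle({ client })

// Existing Drizzle schemas and queries should keep working against `db`.
```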
Cool. Then is it better to rewrite #597 to use PGLite? Currently it is using embedded Postgres, and the last step is packaging (considering pkg or a Tauri sidecar).
Yes. |
Rebase needed.
Thank you! And now we need one last step: implementing PGLite support :)
My idea is to support both variants, embedded Postgres and PGlite (since they have different purposes).
I am currently implementing PGLite support in #597.
Hey, sorry I've been kinda busy lately; I will return to this and related topics in the next few days. Was the PGLite feature merged into main, or is it still in development? @nekomeowww @gg582 Also, any opinions on using DuckDB instead? (Not sure about its vector capabilities, but it might be a way to integrate a local memory system for people who prefer a plug-and-play approach.)
Yes, DuckDB will be fine. I have my PGlite-merged branch on my forked repo. Can I make a PR to your fork? :) Or you can use #597.
I reopened #597 in advance.
Also, I've got a request to move this into /packages... can we do it?
But if we rewrite it in DuckDB, we should also rewrite the backup functionality, etc. I've made an export function to manage history in #597, and I think it will be fine after adding memory separation per model card. And a small request: can you write up the DB structure when it changes? I'd like to adapt my memory backup to the changed database.
So... you are still working on it, right? I'd like to get a sense of how much you have done so far.
Description
This PR introduces a Memory System for AIRI, enabling it to access and utilize long-term context to generate more relevant and consistent responses. This is still at an early stage, hence there are a lot of optimization opportunities available, and that is one of the reasons for the PR (so maintainers can provide feedback and optimization ideas).
The core functionality is a new context-building pipeline that runs before each LLM call. As detailed in the sequence diagram:
sequenceDiagram
    actor Alice as User
    participant P1 as Backend Service: <br>API Server
    participant P2 as LLM Providers<br>(multiple)
    participant P3 as Backend Service: <br>Context Building
    participant P4 as Postgres DB
    Alice ->> P1: 1. new msg
    P1 ->> P2: 2. generate_embedding(msg)
    P2 ->> P1: 3. return msg_embedding
    P1 ->> P3: 4. msg_embedding
    par 🧠 Context building
        P3 ->> P4: 5a. get last 10 messages + responses
        P3 ->> P4: 5b. get top 5 most relevant
        P3 ->> P4: 5c. get top 10 most relevant memory_fragments
        P3 ->> P4: 5d. get top 10 most memory_associated
        P3 ->> P4: 5e. get top 10 most memory_consolidated
        P3 ->> P4: 5f. get all memory_entities
        P3 ->> P4: 5g. get all memory_long_term_goals
        P3 ->> P4: 5h. get all memory_short_term_ideas
    end
    P4 ->> P3: 6. return context data
    P3 ->> P3: 7. assemble context
    P3 ->> P1: 8a. return assembled context
    P1 ->> Alice: 8b. return assembled context
    Alice ->> P2: 9. send msg + context
    P2 ->> Alice: 10. return LLM response

This is the main functional change in this PR.
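To make step 5 more concrete, here is a rough sketch of the parallel context-building queries (helper names, table names, and the final prompt format below are illustrative placeholders, not the actual service API):

```ts
// Placeholder signatures for the illustrative helpers used below.
declare function getRecentMessages(limit: number): Promise<string[]>
declare function searchBySimilarity(table: string, embedding: number[], limit: number): Promise<string[]>
declare function getAll(table: string): Promise<string[]>

// Sketch of steps 5-7: run the context queries in parallel, then assemble one
// context block that is prepended to the LLM prompt.
async function buildContext(msgEmbedding: number[]): Promise<string> {
  const [recent, fragments, consolidated, entities, goals] = await Promise.all([
    getRecentMessages(10), // 5a
    searchBySimilarity('memory_fragments', msgEmbedding, 10), // 5c (pgvector similarity)
    searchBySimilarity('memory_consolidated', msgEmbedding, 10), // 5e
    getAll('memory_entities'), // 5f
    getAll('memory_long_term_goals'), // 5g
  ])

  return [
    '## Recent conversation', ...recent,
    '## Relevant memory fragments', ...fragments,
    '## Consolidated memories', ...consolidated,
    '## Known entities', ...entities,
    '## Long-term goals', ...goals,
  ].join('\n')
}
```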
Additional Changes
This PR also includes a few nice updates:
Linked Issues
#387
Additional Context