Towards Data Science

A Unified Memory Layer Across Agents, Implemented with Hooks


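TL;DR · AI summary

The article proposes a unified memory layer shared across agents and implemented through hooks, improving the portability of coding tools and the consistency of their data.

Key points

  • Use hooks to share one memory layer across agents
  • Neo4j as the persistent store
  • Hooks provide passive, deterministic logging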



that the main debate isn’t about when the next better model drops, but about who will build the right harness around them. A harness is the scaffolding around the model: _the agent loop, tool definitions, context management, memory, prompts, and workflows that turn a raw LLM into a useful product_. The model is the engine, the harness is everything that makes it actually drive. Examples of harnesses are Cursor, Claude Desktop, and others.

There’s a running debate in the AI coding tool space: does committing to a specific harness mean vendor lock-in? Memory is the sharpest edge of this. If your agent’s memory lives inside a closed harness or behind a proprietary API, you don’t really own it, and switching costs add up fast. But it doesn’t have to be that way.

The idea for this blog post is simple: keep the memory layer outside the harness, and let any harness plug into it.

Image 1: Unified agentic memory diagram

Unified agentic memory design.

In this post, I’ll show how you can build a single, shared memory layer that works across three different coding agents (Claude Code, OpenAI’s Codex, and Cursor), using hooks as the integration mechanism and Neo4j as the persistent store.

The code for hook integration is available on GitHub.

MCP tools can only get you so far with memory

MCP (Model Context Protocol) servers are the go-to answer for giving agents access to external systems. And they work. You can expose a Neo4j database as an MCP tool and let the agent query it when it decides to.

But MCP tools are agent-initiated. The model has to decide to call the tool, and it has to know when and why to do so. That means:

  • The agent needs to “remember to remember”: it must proactively decide to store something worth recalling later.
  • There’s no guarantee of consistency: one session might log everything, the next might log nothing.
  • You’re relying on the model’s judgment about what’s important for memory, in real time, while it’s busy doing something else.

What you really want is passive, deterministic logging: something that captures every session event regardless of what the model is doing, without consuming any of its context or attention.

_This is exactly what hooks give you._

Image 2

Hooks allow you to write programmatic, deterministic flows based on a predefined set of events.

Enter hooks

Hooks are shell commands that fire automatically on lifecycle events: when a session starts, when the user submits a prompt, before and after every tool use, and when the session ends. The agent doesn’t decide to call them; they run programmatically.

The key insight is that hooks are remarkably standardized across providers. Claude Code, Codex, Cursor, and others all support essentially the same lifecycle events:

  • SessionStart for when the agent session begins
  • UserPromptSubmit (or beforeSubmitPrompt in Cursor) for when the user sends a message
  • PreToolUse / PostToolUse for before and after each tool call
  • Stop for when the session ends

The hook receives a JSON payload on stdin with the session ID, event name, tool details, and user prompt. And the hook can emit JSON on stdout to inject additional context back into the conversation. Same contract, three harnesses/clients.
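As a rough sketch of the input side of that contract, a hook entry point could parse its payload like this. The payload field names (`session_id`, `hook_event_name`, `tool_name`, `prompt`) and the accepted `--client` values are assumptions for illustration; each harness documents its own schema, so check it before relying on any of them.

```python
import argparse
import json
import sys

# Hypothetical mapping from client-specific event names to one shared
# vocabulary. Only the Cursor alias mentioned in the text is listed.
EVENT_ALIASES = {
    "cursor": {"beforeSubmitPrompt": "UserPromptSubmit"},
}


def normalize_event(client: str, payload: dict) -> dict:
    """Reduce a raw hook payload to the fields the memory layer stores."""
    raw_name = payload.get("hook_event_name", "")
    name = EVENT_ALIASES.get(client, {}).get(raw_name, raw_name)
    return {
        "client": client,
        "session_id": payload.get("session_id"),
        "event": name,
        "tool": payload.get("tool_name"),   # None outside tool events
        "prompt": payload.get("prompt"),    # None outside prompt events
    }


def read_event(argv=None, stdin=None) -> dict:
    """Parse --client from argv and one JSON payload from stdin."""
    parser = argparse.ArgumentParser(description="Lifecycle hook entry point")
    parser.add_argument("--client", required=True,
                        choices=["claude", "codex", "cursor"])
    args = parser.parse_args(argv)
    payload = json.load(stdin if stdin is not None else sys.stdin)
    return normalize_event(args.client, payload)
```

A per-harness wrapper would then only need to invoke the script with its own `--client` value; everything after normalization is shared.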

There are other hooks too, things like notification events, subagent stop, or pre-compact hooks, but we won’t be using those here.

Shared memory layer

Now we need somewhere to persist the memory. Quick disclaimer: I work at Neo4j, so we will be using it in this example.

Image 3

Session structure.

The model is straightforward. Each agent session is a node, connected to a linked list of event nodes, one per hook invocation. Events are typed by the lifecycle event that triggered them: SessionStart, UserPromptSubmit, PreToolUse, PostToolUse, Stop. A session ends up as an ordered timeline of everything that happened during that run.
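One way to express this shape is a single Cypher statement that appends an event to the tail of a session’s list. The labels, the relationship types (`FIRST_EVENT`, `LAST_EVENT`, `NEXT`), and the property names below are assumptions made for this sketch, not necessarily the schema used in the accompanying repository, and the statement ignores concurrent writers to the same session.

```python
import json

# Hypothetical schema:
#   (:Session)-[:FIRST_EVENT]->(:Event)-[:NEXT]->(:Event)-[:NEXT]->...
#   (:Session)-[:LAST_EVENT]->(current tail)
APPEND_EVENT = """
MERGE (s:Session {id: $session_id})
  ON CREATE SET s.client = $client
CREATE (e:Event {type: $event_type, payload: $payload, ts: datetime()})
WITH s, e
OPTIONAL MATCH (s)-[tail:LAST_EVENT]->(prev:Event)
DELETE tail
CREATE (s)-[:LAST_EVENT]->(e)
FOREACH (p IN CASE WHEN prev IS NULL THEN [] ELSE [prev] END |
  CREATE (p)-[:NEXT]->(e))
FOREACH (_ IN CASE WHEN prev IS NULL THEN [1] ELSE [] END |
  CREATE (s)-[:FIRST_EVENT]->(e))
"""


def append_event_params(session_id: str, client: str,
                        event_type: str, payload: dict) -> dict:
    """Build the parameter map for APPEND_EVENT."""
    return {
        "session_id": session_id,
        "client": client,
        "event_type": event_type,
        # Neo4j properties cannot hold nested maps, so the raw payload
        # is stored as a JSON string.
        "payload": json.dumps(payload, sort_keys=True),
    }
```

With the official Neo4j Python driver, this could be run as `driver.execute_query(APPEND_EVENT, parameters_=append_event_params(...))`. Keeping a `LAST_EVENT` pointer means each append touches only the tail instead of walking the whole list.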

All five event types are written to the store, which gives you a complete audit trail of every session across every harness. Two of them are also injection points. SessionStart fires before the agent reads its system prompt, so anything the hook emits there gets prepended to the system prompt. That is how persistent, agent-level memory makes its way into context. UserPromptSubmit fires just before the user message is sent, and anything emitted there gets appended to the user prompt. That is the hook for turn-level context, like pulling in memories relevant to what the user just typed.
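The two injection points can be sketched as one function that decides what, if anything, to hand back to the harness. The memory contents are made up, the keyword matching is a stand-in for real search, and the output envelope (`hookSpecificOutput` with `additionalContext`) is modeled on Claude Code’s hook output format; other harnesses may expect a different envelope.

```python
import json
import re

# Hypothetical pre-computed memories, keyed by semantic path.
MEMORIES = {
    "profile/role.md": "Works on graph databases; prefers concise answers.",
    "tools/bash/common-flags.md": "Usually runs grep with -rn.",
}


def _tokens(text: str) -> set:
    """Lowercased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def context_for(event: str, prompt: str = "") -> str:
    """Pick the text to inject for a lifecycle event ('' means nothing)."""
    if event == "SessionStart":
        # Agent-level memory: everything under profile/.
        notes = [text for path, text in sorted(MEMORIES.items())
                 if path.startswith("profile/")]
        return "\n".join(notes)
    if event == "UserPromptSubmit":
        # Turn-level memory: naive overlap between prompt and path tokens.
        words = _tokens(prompt)
        hits = [text for path, text in sorted(MEMORIES.items())
                if words & _tokens(path)]
        return "\n".join(hits)
    return ""  # PreToolUse, PostToolUse, Stop: log only, inject nothing


def hook_output(event: str, prompt: str = "") -> str:
    """Serialize the injection as JSON for stdout, or '' to stay silent."""
    context = context_for(event, prompt)
    if not context:
        return ""
    return json.dumps({
        "hookSpecificOutput": {
            "hookEventName": event,
            "additionalContext": context,
        }
    })
```

Both branches only read memories that already exist, which keeps the hook free of model calls.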

So, what happens if we start a new session in any of these harnesses with active hooks, for example Cursor?

Image 4

Example interactions in Cursor

_Here is what we see if we inspect the results in Neo4j Browser._

Image 5

Example session persisted as a graph in Neo4j.

One important constraint: hooks run outside the harness’s model session. You cannot reuse the LLM the agent is talking to. If you want LLM-powered work inside a hook you have to make your own model call, which adds latency to every event the agent fires. That is why the hooks here only do two things: log events and inject pre-computed memories. They stay fast and deterministic.

Dream phase

The actual memory work happens in a separate dream phase: extracting facts from sessions, summarizing what happened, updating the graph. This is just a batch job that runs every few hours, reads the events accumulated since the last run, and writes back to the memory store. You could in principle kick off a memory update asynchronously every time a session stops, but that feels like a bit too much; a periodic batch is simpler and works fine for this demonstration.
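One simple way to track "events accumulated since the last run" is a per-session watermark: the sequence number of the last event already processed. The `Event` fields and the integer sequence numbers below are assumptions for this sketch; a timestamp would work the same way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    session_id: str
    seq: int    # position in the session's linked list, starting at 0
    type: str   # lifecycle event name, e.g. "PreToolUse"


def pending_events(events, watermarks):
    """Return events newer than their session's watermark, oldest first."""
    fresh = [e for e in events if e.seq > watermarks.get(e.session_id, -1)]
    return sorted(fresh, key=lambda e: (e.session_id, e.seq))


def advance(watermarks, processed):
    """Return new watermarks after `processed` events were distilled."""
    updated = dict(watermarks)
    for e in processed:
        updated[e.session_id] = max(updated.get(e.session_id, -1), e.seq)
    return updated
```

Advancing the watermark only after the batch has been written back means a failed run is simply retried on the next pass.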

The dream job pulls every event since the session’s last watermark, hands them to Claude along with the current memory store, and asks it to write back a small set of durable notes. The notes themselves imitate a markdown wiki, the same shape Karpathy and others have been gravitating toward for personal LLM memory and the same shape Anthropic’s skills already use: each memory is a file at a semantic path like profile/role.md, tools/bash/common-flags.md, or project/neo4j-skills.md, with YAML frontmatter on top and prose underneath. Claude is told to merge rather than append, so a path is a living document, not a log; if new events contradict an old note, the old note gets rewritten. The result is a tree of small, self-contained markdown files a future session can read cold, indistinguishable in form from a skill, just authored by the dream phase instead of by hand.
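To make the note format concrete, here is a minimal sketch of rendering a memory and storing it by path. The frontmatter fields (`title`, `updated`) are hypothetical, and the renderer assumes plain values without YAML special characters; the in-memory dict stands in for whatever store holds the files.

```python
def render_note(title: str, updated: str, body: str) -> str:
    """Render a memory as YAML frontmatter followed by prose."""
    return f"---\ntitle: {title}\nupdated: {updated}\n---\n\n{body.strip()}\n"


def upsert(store: dict, path: str, note: str) -> dict:
    """Keep one living document per path.

    Deciding how old and new facts combine is the model's job in the
    dream phase; the store simply replaces whatever was at `path`.
    """
    updated = dict(store)
    updated[path] = note
    return updated


# Example: a later run rewrites the note instead of appending to it.
store = upsert({}, "profile/role.md",
               render_note("Role", "2025-01-01", "Backend engineer."))
store = upsert(store, "profile/role.md",
               render_note("Role", "2025-02-01", "Now works on graph tooling."))
```

After both calls the store still holds a single entry at `profile/role.md`, containing only the rewritten text.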

If we run it on our example, we get the following memories created.

Image 6

Dream phase adding and editing memories.

If I now open a different harness, this time Claude Code Desktop with hooks activated, I get the following response.

Image 7

Claude Code Desktop using the unified memory layer.

Accessing the memory

The final piece of the puzzle is allowing the agent to access the memory layer. As mentioned, there are two ways to inject information into the agent: hooks and MCP tools.

Image 8

Agent interacting with the memory layer through hooks and MCP tools.

Hooks are deterministic and run at the start of every session to populate the system prompt. This is where profile information and instructions on how to use memory efficiently should go. You can also append additional context when a user prompt submission event fires, but it’s append-only; you can’t manipulate other parts of the prompt.

MCP tools, on the other hand, give the LLM direct access to the memory layer on demand. Instead of passively receiving context at startup, the agent can search for relevant memories, store new information, and update or remove existing entries. Essentially, it’s basic CRUD over the abstracted markdown files stored in Neo4j.

In the end, I think you’ll almost always need both. In this project we only have hooks, no MCP tools, but you can always just plug in the official Neo4j MCP to let the agent explore the graph.

Getting it to work

Somewhat interestingly, the way I set up the hooks was to ask the agent in each harness to install them, though I’m sure there are better approaches as well.

Image 9

Cursor agent installing hooks.

Summary

If you don’t own your memory, you don’t own your agent. Every harness today builds its own walled garden of context, preferences, and session history. Switch them and you start from zero. That doesn’t have to be the case.

Hooks break that pattern. They let you write integrations that plug into any harness from the outside, and the interface is remarkably consistent. Claude Code, Codex, and Cursor all fire the same lifecycle events: session start, prompt submission, tool use, session end. The hook receives JSON on stdin, optionally emits JSON on stdout to inject context, and that’s the entire contract. Because hooks run deterministically on every event, they don’t consume model attention or rely on the agent to decide what’s worth saving. The same two Python scripts handle all three clients; thin shell wrappers that pass a --client flag are the only per-harness glue.

The architecture has three layers:

  1. Hooks (online): passively log every event into Neo4j as a linked list per session. No model calls, no latency cost, just append.
  2. Dream phase (offline): a batch job reads accumulated events, asks Claude to distill them into durable markdown memories, and writes them back. Memories are organized by topic and merged rather than appended, so they stay current instead of growing forever.
  3. Injection (online): on the next session start in _any_ harness, profile memories are loaded into context. On each user prompt, relevant memories are searched and appended automatically.

The result is a memory layer that sits below all three harnesses, works without any of them knowing about the others, and belongs entirely to you. You can switch from Cursor to Claude Code to Codex mid-project and pick up exactly where you left off. Your agent’s understanding of who you are, what you’re working on, and how you prefer to work follows you, not the tool.

Code is available here.

_P.s.: All images are created by the author._
