Here's a cost trick most teams miss with their vector ...

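TL;DR · AI summary
Milvus proposes that two built-in mechanisms, compaction (segment merging and physical deletion) and TTL (automatic expiry), can significantly reduce vector database storage costs, especially for data with a lifecycle such as session data and time-sensitive RAG.
Key points
- Logical deletes in a vector database do not free disk space, so storage can bloat to 2–5x the live data
- Compaction physically merges segments and purges deleted vectors; run it outside peak business hours
- TTL cleans up expired data automatically; combined with compaction, it allows cost optimization with no manual operations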
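The key points above can be illustrated with a toy model. This is a minimal sketch of an append-only store with tombstones, not Milvus internals: `ToyStore`, `Row`, and every method name are hypothetical, and the numbers are made up for illustration. It shows why a delete alone does not shrink storage, and why compaction (optionally with a TTL) does.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class Row:
    pk: int
    inserted_at: float      # insertion time, in seconds
    deleted: bool = False   # tombstone flag: set on delete, row stays stored


@dataclass
class ToyStore:
    """Hypothetical append-only store used only to illustrate the idea."""
    segments: List[List[Row]] = field(default_factory=list)

    def insert_segment(self, pks: Iterable[int], now: float) -> None:
        # Every batch of inserts becomes a new segment; nothing is rewritten.
        self.segments.append([Row(pk, now) for pk in pks])

    def delete(self, pk: int) -> None:
        # Logical delete: only flags the row, it still occupies space.
        for segment in self.segments:
            for row in segment:
                if row.pk == pk:
                    row.deleted = True

    @staticmethod
    def _is_live(row: Row, now: float, ttl: Optional[float]) -> bool:
        expired = ttl is not None and (now - row.inserted_at) > ttl
        return not row.deleted and not expired

    def rows_on_disk(self) -> int:
        return sum(len(segment) for segment in self.segments)

    def live_rows(self, now: float, ttl: Optional[float] = None) -> int:
        return sum(
            1
            for segment in self.segments
            for row in segment
            if self._is_live(row, now, ttl)
        )

    def compact(self, now: float, ttl: Optional[float] = None) -> None:
        # Merge all segments into one and physically drop dead rows.
        survivors = [
            row
            for segment in self.segments
            for row in segment
            if self._is_live(row, now, ttl)
        ]
        self.segments = [survivors] if survivors else []


store = ToyStore()
store.insert_segment(range(0, 100), now=0.0)
store.insert_segment(range(100, 200), now=50.0)
for pk in range(60):
    store.delete(pk)

print(store.rows_on_disk())       # 200: deletes freed nothing
print(store.live_rows(now=60.0))  # 140: rows a query actually needs
store.compact(now=60.0, ttl=30.0)
print(store.rows_on_disk())       # 100: tombstoned and expired rows are gone
```

In this toy run, storage only drops once `compact` rewrites the segments; the TTL simply widens the set of rows that compaction is allowed to drop.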
Outline
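- Milvus storage cost optimization
  - Root cause
    - Logical deletes in an append-only system do not free disk space
    - Segments pile up → queries scan dead rows → storage bloats 2–5x
  - Key techniques
    - Compaction: merge segments + physical deletion
    - TTL: automatic time-based expiry
  - Use cases
    - Session data
    - Time-sensitive RAG
    - Workloads where data has a lifecycle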
Here's a cost trick most teams miss with their vector database: using compaction and TTL.

We've noticed most teams never think about vector storage costs until the bill spikes. In append-only systems like Milvus, deleting a vector only flags it: the data stays on disk. Segments keep piling up, queries scan rows that no longer exist, and storage quietly grows 2–5x beyond your actual live data.

Two built-in features fix this:
• Compaction: merges small segments and physically drops deleted rows. Schedule it off-peak since it's I/O-heavy.
• TTL: auto-expires old data so you skip manual cleanup entirely. Compaction handles the rest.

If you're running session data, time-sensitive RAG, or any workload where data has a shelf life, these two settings alone can cut storage costs dramatically.

Full guide on the Milvus blog: milvus.io/blog/how-to-cu
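For readers who want to try the two settings, here is a minimal sketch using the Python SDK. It assumes a recent `pymilvus` in which `MilvusClient` exposes `alter_collection_properties`, `compact`, and `get_compaction_state`; check the SDK version you run. The collection name `session_embeddings`, the URI, the 7-day TTL, and the helper names are hypothetical, and the exact state string returned by the server is an assumption.

```python
import time

# Collection-level TTL property, in seconds.
TTL_PROPERTY = "collection.ttl.seconds"


def enable_ttl(client, collection_name: str, ttl_seconds: int) -> None:
    """Ask Milvus to expire entities older than ttl_seconds."""
    client.alter_collection_properties(
        collection_name=collection_name,
        properties={TTL_PROPERTY: ttl_seconds},
    )


def compact_and_wait(
    client,
    collection_name: str,
    poll_seconds: float = 5.0,
    timeout_seconds: float = 3600.0,
) -> int:
    """Trigger a manual compaction and poll until it reports completion."""
    job_id = client.compact(collection_name)
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        state = client.get_compaction_state(job_id)
        # Assumption: the finished state is reported as a string
        # containing "Completed".
        if "completed" in str(state).lower():
            return job_id
        time.sleep(poll_seconds)
    raise TimeoutError(f"compaction job {job_id} did not finish in time")


def run_against_local_milvus() -> None:
    """Hypothetical entry point; needs a reachable Milvus instance."""
    from pymilvus import MilvusClient  # imported lazily on purpose

    client = MilvusClient(uri="http://localhost:19530")  # assumed URI
    enable_ttl(client, "session_embeddings", ttl_seconds=7 * 24 * 3600)
    compact_and_wait(client, "session_embeddings")
```

Because compaction is I/O-heavy, one option is to call `run_against_local_milvus()` from a scheduled job (cron, for example) during a low-traffic window rather than from the request path.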