What's new in Mixture-of-Experts (MoE)?

traeai has collected 4 items related to Mixture-of-Experts (MoE). The latest is "Introducing Mellum2: A 12B Mixture-of-Experts Model by JetBrains", published on the Hugging Face Blog.

Concept

Mixture-of-Experts (MoE)

Alias: MoE

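A sparsely activated neural network architecture. A router sends each input (for example, each token) to only a few of many expert subnetworks, so conditional computation lowers inference cost while preserving the model's total capacity.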
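To make the definition concrete, here is a minimal top-k routing sketch in plain Python, assuming a linear router and toy experts. The names `moe_forward` and `make_expert`, and all the numbers, are hypothetical. Real MoE layers add load-balancing losses, capacity limits and batched GPU kernels, all omitted here. The key point is that only the k selected experts are evaluated, which is where conditional computation saves work.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    # Numerically stable softmax.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: Vector, router: List[Vector],
                experts: List[Callable[[Vector], Vector]], k: int = 2) -> Tuple[Vector, List[int]]:
    # Router logits: one dot product per expert (a linear gating layer).
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in router]
    # Conditional computation: keep only the k highest-scoring experts.
    top = sorted(range(len(experts)), key=lambda e: logits[e], reverse=True)[:k]
    # Gate weights are renormalized over the selected experts only.
    gates = softmax([logits[e] for e in top])
    out = [0.0] * len(x)
    for g, e in zip(gates, top):
        y = experts[e](x)  # only the k chosen experts are ever run
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, top


calls: List[int] = []  # records which experts were actually evaluated


def make_expert(idx: int, scale: float) -> Callable[[Vector], Vector]:
    # Hypothetical toy expert: scales its input and logs the call.
    def expert(v: Vector) -> Vector:
        calls.append(idx)
        return [scale * vi for vi in v]
    return expert


experts = [make_expert(i, float(i + 1)) for i in range(4)]
router = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
out, chosen = moe_forward([1.0, 2.0], router, experts, k=2)
print(chosen, calls)  # [2, 1] [2, 1]: two of four experts ran
```

With 4 experts and k=2, half of the expert weights are untouched for this input; production models push the ratio much further, as the items below show.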

4 highly relevant items tracked

TraeAI Observations

If you read only 3

Introducing Mellum2: A 12B Mixture-of-Experts Model by JetBrains

Hugging Face Blog · Score: 9

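JetBrains has released Mellum2, a 12B-parameter MoE model that activates only 2.5B parameters per token and runs inference more than 2x faster than comparable models. It is optimized for code and text tasks and supports private deployment as well as high-frequency, low-latency scenarios such as RAG.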
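Why 2.5B active parameters out of 12B matters can be estimated with the common rule of thumb that a forward pass costs about 2 FLOPs per active parameter per token. The sketch below is back-of-the-envelope arithmetic under that assumption, comparing Mellum2 with a hypothetical dense 12B model; it is not how the reported speedup was measured.

```python
def forward_flops_per_token(active_params: float) -> float:
    # Rule of thumb: ~2 FLOPs (one multiply, one add) per active weight per token.
    # Attention cost over the context is ignored in this rough estimate.
    return 2.0 * active_params


TOTAL_PARAMS = 12e9    # Mellum2 total parameters (from the summary)
ACTIVE_PARAMS = 2.5e9  # parameters activated per token (from the summary)

dense_flops = forward_flops_per_token(TOTAL_PARAMS)  # hypothetical dense 12B model
moe_flops = forward_flops_per_token(ACTIVE_PARAMS)
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
compute_ratio = dense_flops / moe_flops

print(f"active fraction: {active_fraction:.1%}")               # 20.8%
print(f"dense/MoE compute per token: {compute_ratio:.1f}x")     # 4.8x
```

The 4.8x figure is an idealized FLOP ratio. Measured speedups are usually smaller, since token-by-token decoding is often limited by memory bandwidth and routing adds overhead.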

EMO: Pretraining mixture of experts for emergent modularity

Hugging Face Blog · Score: 9

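EMO is a mixture-of-experts model in which modularity emerges through end-to-end pretraining. It keeps near full-model performance with only 12.5% of its experts and allows experts to be combined on demand, significantly improving the efficiency and flexibility of deploying large models.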
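One way to picture "using 12.5% of the experts" at deployment time is to restrict the router to an allowed subset and renormalize the gate weights over that subset. The sketch assumes 16 experts with 2 kept (2/16 = 12.5%). The expert count, the subset, the toy scores and `masked_routing` are all hypothetical illustrations, not EMO's actual procedure, which the summary does not describe.

```python
import math
from typing import List, Sequence, Tuple


def masked_routing(logits: Sequence[float], allowed: Sequence[int],
                   k: int) -> List[Tuple[int, float]]:
    # Experts outside `allowed` are treated as removed (their weights need not be loaded).
    allowed_set = set(allowed)
    candidates = [e for e in range(len(logits)) if e in allowed_set]
    # Pick the top-k among the remaining experts.
    top = sorted(candidates, key=lambda e: logits[e], reverse=True)[:k]
    # Softmax over the selected experts only, so the gates still sum to 1.
    m = max(logits[e] for e in top)
    exps = [math.exp(logits[e] - m) for e in top]
    z = sum(exps)
    return [(e, w / z) for e, w in zip(top, exps)]


NUM_EXPERTS = 16                                # hypothetical expert count
kept = [3, 11]                                  # hypothetical subset: 2/16 = 12.5%
logits = [0.1 * i for i in range(NUM_EXPERTS)]  # toy router scores
routes = masked_routing(logits, kept, k=2)
print(routes)  # expert 15 scores highest but is excluded; 11 and 3 are used
```

Whether such a subset keeps quality depends on the experts being genuinely modular, which is the property EMO's pretraining is reported to encourage.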

Best Small Language Models on Hugging Face Right Now!

KDnuggets · Score: 8.5

This article highlights the advancements in small language models, specifically those with under 7 billion parameters, which can now run on...

Introducing Mellum2: A 12B Mixture-of-Experts Model by JetBrains

Hugging Face Blog · June 1 · 564 words (about 3 min read)

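Why it was selected: Mellum2 is a 12B-parameter MoE model that activates only 2.5B parameters per token and improves inference efficiency by more than 2x, making it well suited to high-throughput production environments.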

Featured article · #MoE #JetBrains #LargeModels #CodeGeneration #RAG · English

EMO: Pretraining mixture of experts for emergent modularity


Hugging Face Blog · May 9 · 1,748 words (about 7 min read)

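EMO is a mixture-of-experts model in which modularity emerges through end-to-end pretraining. It keeps near full-model performance with only 12.5% of its experts.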

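Why it was selected: EMO has 14B total parameters with 1B active, and reaches near full-model performance while activating only 1/8 of its experts.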

Featured article · #MixtureOfExperts #Modularity #LargeModels #AIResearch #Pretraining · Chinese

Best Small Language Models on Hugging Face Right Now!

KDnuggets · May 22 · 3,855 words (about 16 min read)

This article highlights the advancements in small language models, specifically those with under 7 billion parameters, which can now run on consumer GPUs or even laptops. It emphasizes that these models are now capable of performing tasks that were previously only achievable by much larger models, thanks to improvements in training data quality, distillation techniques, and architectural innovations like Mixture-of-Experts (MoE). The article provides a curated list of the best small language models available on Hugging Face, along with their capabilities and benchmark scores.
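As a back-of-the-envelope check on the "consumer GPUs or even laptops" claim, weight memory is roughly the parameter count times the bytes per parameter. The sketch assumes common precisions (16-bit, 8-bit, 4-bit) and counts weights only; the KV cache, activations and runtime overhead come on top, and the article's own hardware figures may differ.

```python
BYTES_PER_PARAM = {"fp16/bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    # Weights only: excludes KV cache, activations and framework overhead.
    return num_params * bytes_per_param / 1e9


for name, b in BYTES_PER_PARAM.items():
    # A 7B model is the upper end of the "under 7 billion parameters" range.
    print(f"7B @ {name}: {weight_memory_gb(7e9, b):.1f} GB")
# 7B @ fp16/bf16: 14.0 GB
# 7B @ int8: 7.0 GB
# 7B @ int4: 3.5 GB
```

At 4-bit precision the weights of a 7B model fit comfortably in the memory of many consumer GPUs and laptops, which is consistent with the article's point.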

Why it was selected: Small language models under 7 billion parameters are now capable of performing complex tasks previously reserved for much larger models.

Featured article · #Language Models #Hugging Face #AI #Machine Learning #Small Models · English

NVIDIA Nemotron 3 Ultra now available on Amazon SageMaker JumpStart


AWS Machine Learning Blog · Yesterday · 952 words (about 4 min read)

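NVIDIA Nemotron 3 Ultra is now available on Amazon SageMaker JumpStart with one-click deployment. This 550B-parameter MoE model is designed for long-horizon agents and is reported to deliver 5x faster inference at 30% lower cost, with support for a 1M-token context.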

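Why it was selected: Nemotron 3 Ultra uses a hybrid Transformer-Mamba MoE architecture that activates only 55B of its 550B total parameters, significantly reducing the compute cost of agent tasks.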
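The 550B/55B split also illustrates MoE's main trade-off: per-token compute follows the active parameters, while memory follows the total, because every expert must stay loaded. The arithmetic below uses the rule of thumb of about 2 FLOPs per active parameter per token and assumes 8-bit weights purely for illustration; the actual checkpoint precision and hardware requirements are not stated in the summary.

```python
TOTAL_PARAMS = 550e9   # total parameters (from the summary)
ACTIVE_PARAMS = 55e9   # parameters activated per token (from the summary)
BYTES_PER_PARAM = 1.0  # assumption: 8-bit weights; the real checkpoint may differ

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
# Per-token compute scales with the active parameters (~2 FLOPs per active weight)...
flops_per_token = 2 * ACTIVE_PARAMS
# ...but all experts must be resident, so weight memory scales with the total.
weight_memory_gb = TOTAL_PARAMS * BYTES_PER_PARAM / 1e9

print(f"active fraction: {active_fraction:.0%}")           # 10%
print(f"~{flops_per_token / 1e9:.0f} GFLOPs per token")    # ~110 GFLOPs per token
print(f"~{weight_memory_gb:.0f} GB of weights")            # ~550 GB of weights
```

In other words, the model computes roughly like a 55B dense model but must be hosted like a 550B one, which is one reason managed, multi-GPU deployment options matter for models of this size.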

Featured article · #Nemotron 3 Ultra #SageMaker JumpStart #Agentic AI #MoE #AWS · English

Cross-source Q&A · Mixture-of-Experts (MoE)

Answers are based on: 4 items related to Mixture-of-Experts (MoE)