🚀Introducing UniRL, an RL infra for unified multimodal models. Together with two new RL algorithms:...

Hunyuan (@TXhunyuan) · June 9, 2026

Score: 8.5

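TL;DR · AI summary

Tencent has released UniRL, a unified reinforcement learning (RL) framework that covers diffusion/flow matching models, LLMs/VLMs, and unified multimodal models, together with two new RL algorithms, DRPO and Flow-DPPO.

Key points

UniRL is a unified RL infrastructure for diffusion models, flow matching models, LLMs/VLMs, and unified multimodal models.
UniRL ships with two new RL algorithms: DRPO and Flow-DPPO.
The UniRL code is open source on GitHub.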

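Outline

§ Introduction
Announces the UniRL release and its relevance to training multimodal models.
§ The UniRL framework
UniRL is a unified reinforcement learning infrastructure that covers several model types.
§ New algorithms: DRPO and Flow-DPPO
Introduces the two new algorithms, DRPO and Flow-DPPO.
§ Open-source code
The UniRL code is open source on GitHub for developers to use and extend.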

Mind map

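UniRL reinforcement learning framework
- Supported model types
  - Diffusion models
  - Flow matching models
  - LLMs/VLMs
  - Unified multimodal models
- New algorithms
  - DRPO
  - Flow-DPPO
- Open-source code
  - GitHub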

#ReinforcementLearning #MultimodalModels #Tencent #UniRL #DRPO #Flow-DPPO

Original post

Tencent Hy on X: "🚀Introducing UniRL, an RL infra for unified multimodal models. Together with two new RL algorithms: DRPO and Flow-DPPO. One RL loop across diffusion/flow matching models, LLMs/VLMs, and unified multimodal models👇 Code: https://t.co/fhKEqqFpc8 (yes — U(you)-ni-(need) RL 😉) https://t.co/1o9Swg2biE" / X

Tencent Hy

@TencentHunyuan

🚀Introducing UniRL, an RL infra for unified multimodal models. Together with two new RL algorithms: DRPO and Flow-DPPO. One RL loop across diffusion/flow matching models, LLMs/VLMs, and unified multimodal models👇 Code: github.com/Tencent-Hunyua…

(yes — U(you)-ni-(need) RL 😉)

Made with AI

12:03 PM · Jun 9, 2026

13.7K Views
