DeepSeek-V4 Pro now available on Together AI


Published 4/29/2026


1.6T-parameter MoE reasoning model with 512K context on Together AI, controllable reasoning modes, and cached-input pricing for long-context workloads.

Authors Sonny Khan

What's New

* **DeepSeek V4 Pro on Together AI:** DeepSeek V4 Pro is now available on Together AI with a 512K-token context window for long-context reasoning workloads.
* **Large-scale MoE architecture:** DeepSeek V4 Pro uses a 1.6T-parameter Mixture-of-Experts architecture with 49B activated parameters.
* **Controllable reasoning modes:** Non-Think, Think High, and Think Max let teams choose between fast responses, deeper reasoning, and maximum reasoning effort.
* **Transparent serverless pricing:** DeepSeek V4 Pro is available at $2.10 per 1M input tokens, $0.20 per 1M cached input tokens, and $4.40 per 1M output tokens.

Long-context reasoning changes what teams can ask a model to do. Entire repositories, large document sets, long agent traces, and tool outputs can fit into the model’s working context instead of being compressed into brittle summaries. But the models that can use that much context are also the hardest to serve: a 1.6T-parameter MoE with million-token context is not something most teams want to deploy, tune, and operate themselves.

DeepSeek-V4 Pro is now available on Together AI, the AI Native Cloud, so teams can start with Serverless Inference at 512K context and move to dedicated infrastructure for full 1M context, reserved capacity, and production control. DeepSeek-V4 Flash is coming soon, giving teams another V4 option for workloads where speed and cost matter more than maximum reasoning depth.

At a glance

| Spec | Value |
| --- | --- |
| Model | DeepSeek V4 Pro on Together AI |
| Endpoint | deepseek-ai/DeepSeek-V4-Pro |
| Architecture | 1.6T-parameter MoE |
| Activated parameters | 49B |
| Context on Together AI | 512K tokens |
| Model-level context | 1M tokens |
| Reasoning modes | Non-Think, Think High, Think Max |
| Deployment | Serverless, Monthly Reserved |
| Input price | $2.10 / 1M tokens |
| Cached input price | $0.20 / 1M tokens |
| Output price | $4.40 / 1M tokens |
| Best-fit workloads | Code agents, document intelligence, long-context agents, research synthesis |

Built for long-context reasoning

DeepSeek V4 Pro is built for workloads where the model needs to reason over more than a short prompt: large repositories, long technical documents, dense retrieval bundles, tool-call histories, and research corpora.

DeepSeek V4 Pro supports million-token context at the model level; on Together AI, it is currently available with a 512K-token context window. That distinction matters because model capability and deployed serving profile are not always the same thing. Together AI is launching DeepSeek V4 Pro with a context window designed for reliable production serving, while still giving teams enough room for serious long-context workloads.
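Because the serverless window is smaller than the model-level limit, it can help to check a payload's size before sending it. The sketch below is a rough pre-flight check, not an official utility. It assumes about 4 characters per token (a crude heuristic; real tokenizers vary) and reads "512K" as 512,000 tokens. The function names and the output reserve are hypothetical.

```python
# Rough pre-flight check: will a set of documents fit in the serverless window?
# Assumptions (not from the article): ~4 characters per token, and "512K"
# interpreted as 512,000 tokens. Use the model's tokenizer for exact counts.

CONTEXT_LIMIT_TOKENS = 512_000  # assumed reading of "512K tokens"
CHARS_PER_TOKEN = 4             # crude heuristic; varies by language and content


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate from character count (ceiling division)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(documents: list[str], reserved_for_output: int = 8_000) -> bool:
    """Check whether the documents plus an output reserve fit under the assumed limit."""
    total = sum(estimate_tokens(doc) for doc in documents)
    return total + reserved_for_output <= CONTEXT_LIMIT_TOKENS


if __name__ == "__main__":
    # Two synthetic documents of about 100K and 300K estimated tokens.
    docs = ["x" * 400_000, "y" * 1_200_000]
    print(fits_in_context(docs))
```

A payload that fails this check is a candidate for trimming, retrieval, or a dedicated deployment with the full 1M context.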

The architecture also matters because long context is not only a product spec. As context grows, serving cost, memory pressure, KV cache usage, latency, and concurrency all become part of the system design. DeepSeek V4 Pro uses hybrid attention, combining Compressed Sparse Attention and Heavily Compressed Attention, with DeepSeek reporting 27% of single-token inference FLOPs and 10% of KV cache compared to DeepSeek V3.2 at million-token context.

Choose reasoning effort by workload

DeepSeek V4 Pro supports three reasoning modes, so teams can match reasoning depth to task difficulty instead of treating every request the same.

| Mode | Use when | Tradeoff |
| --- | --- | --- |
| Non-Think | Extraction, classification, simple Q&A, routine responses | Fastest path for lower-complexity tasks |
| Think High | Code planning, document analysis, multi-step reasoning | More reasoning depth for complex work |
| Think Max | Hard debugging, deep research synthesis, agentic decision points | Maximum reasoning effort; expect higher latency and token usage |

A document assistant might use Non-Think for simple extraction, Think High for conflict analysis across policies, and Think Max only when the model needs to reason through a difficult decision. A code agent might use Think High for planning a migration and Think Max for debugging a subtle cross-service failure.
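One way to apply this in an application is a small routing table that maps task types to modes. The following is a minimal sketch. The mode names come from the table above, but the task categories, the mapping, and the `choose_mode` helper are illustrative assumptions. The article does not specify how a mode is passed in an API request, so that step is deliberately left out; consult the Together AI documentation for the actual parameter.

```python
# Hypothetical task-to-mode router. Mode names are from the article; the task
# categories and this mapping are illustrative assumptions, not a Together AI API.

MODE_BY_TASK = {
    "extraction": "Non-Think",
    "classification": "Non-Think",
    "simple_qa": "Non-Think",
    "code_planning": "Think High",
    "document_analysis": "Think High",
    "hard_debugging": "Think Max",
    "research_synthesis": "Think Max",
}


def choose_mode(task_type: str, default: str = "Think High") -> str:
    """Return the reasoning mode for a task type, falling back to a middle setting."""
    return MODE_BY_TASK.get(task_type, default)


if __name__ == "__main__":
    for task in ("extraction", "hard_debugging", "unlabeled_task"):
        print(task, "->", choose_mode(task))
```

Defaulting unknown tasks to Think High is a design choice in this sketch: it avoids paying Think Max latency and token usage on every unclassified request while still giving more depth than Non-Think.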

DeepSeek reports benchmark results across coding, reasoning, long-context, and agentic tasks, including 93.5% LiveCodeBench, 90.1% GPQA Diamond, 80.6% SWE-bench Verified, 83.5% MRCR 1M, and 62.0% CorpusQA 1M.

Make repeated long-context queries cheaper with cached input pricing

Long-context systems often reuse the same large context across multiple questions: a repository snapshot, a document bundle, a policy archive, a retrieval payload, or a long agent trace. Cached input pricing makes those repeated workloads more practical.

DeepSeek V4 Pro is priced at $2.10 / 1M input tokens, with cached input at $0.20 / 1M tokens and output at $4.40 / 1M tokens. Cached input therefore costs roughly 90% less than uncached input ($0.20 versus $2.10 per 1M tokens), which matters when the expensive part of the request is a stable block of text that gets reused across follow-up analysis.

Example pattern:

1. Load a large stable context, such as a 300K-token repo summary, contract set, or policy archive.
2. Ask several follow-up questions over that same context.
3. Use cached input pricing where applicable to drastically reduce the cost of repeated analysis.
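The pattern above can be put into numbers with a short cost estimate. This is a sketch under stated assumptions: the prices are the ones listed in this post, while the question size (200 tokens), output size (1,000 tokens), number of questions (5), and the rule that the first request pays full price and later requests get the cached rate on the shared context are all hypothetical. Actual billing behavior may differ.

```python
# Estimate the cost of asking several questions over one large, stable context.
# Prices are from the article; the workload sizes and the cache-hit rule
# (first request uncached, later requests cached) are illustrative assumptions.

INPUT_PRICE = 2.10 / 1_000_000         # USD per input token
CACHED_INPUT_PRICE = 0.20 / 1_000_000  # USD per cached input token
OUTPUT_PRICE = 4.40 / 1_000_000        # USD per output token


def session_cost(context_tokens: int, question_tokens: int, output_tokens: int,
                 num_questions: int, use_cache: bool) -> float:
    """Return the total USD cost for num_questions requests over the same context."""
    total = 0.0
    for i in range(num_questions):
        if use_cache and i > 0:
            # Shared context billed at the cached rate; the new question is not cached.
            input_cost = (context_tokens * CACHED_INPUT_PRICE
                          + question_tokens * INPUT_PRICE)
        else:
            input_cost = (context_tokens + question_tokens) * INPUT_PRICE
        total += input_cost + output_tokens * OUTPUT_PRICE
    return total


if __name__ == "__main__":
    uncached = session_cost(300_000, 200, 1_000, 5, use_cache=False)
    cached = session_cost(300_000, 200, 1_000, 5, use_cache=True)
    print(f"without caching: ${uncached:.2f}")  # about $3.17
    print(f"with caching:    ${cached:.2f}")    # about $0.89
```

Under these assumptions, most of the remaining cost comes from the first, uncached request, so the savings grow as more questions are asked over the same context.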

Workload patterns

Code agents

Use DeepSeek V4 Pro when an agent needs to reason across repository slices, issue traces, internal documentation, prior tool calls, and proposed patches. Think High or Think Max is most useful for planning changes, debugging failures, or resolving cross-file dependencies.

Document intelligence

Use long context for contracts, policy sets, technical manuals, or research collections that need to be compared in one request. Non-Think can handle extraction and simple Q&A; Think High is better for conflict analysis, interpretation, and synthesis.

Long-context agent traces

Use DeepSeek V4 Pro to inspect long tool-call histories, intermediate results, and execution traces. Higher reasoning modes are most useful at decision points: when the agent needs to decide whether to continue, call another tool, revise a plan, or stop.

Research synthesis

Use DeepSeek V4 Pro for workflows that combine papers, notes, benchmark reports, retrieved documents, and prior analysis. Cached input pricing is especially useful when the same evidence set is reused across multiple questions.
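When the same evidence set is reused, it helps to keep that set in a fixed position and vary only the question. The sketch below builds message lists this way. It assumes that cache hits depend on a stable, identical leading block of input, which is common for prompt caching in general but is not stated in this post. The `build_messages` helper and the choice of a system message for the shared context are hypothetical.

```python
# Build chat messages so the large, stable evidence set always comes first and
# only the final user question changes. Assumption (not from the article):
# cached input pricing applies when the leading part of the input is identical.


def build_messages(shared_context: str, question: str) -> list[dict]:
    """Return a message list with the shared context first and the question last."""
    return [
        {"role": "system", "content": shared_context},
        {"role": "user", "content": question},
    ]


if __name__ == "__main__":
    evidence = "PAPER A: ...\nPAPER B: ...\nBENCHMARK NOTES: ..."  # placeholder text
    questions = [
        "Which findings do papers A and B disagree on?",
        "Summarize the benchmark notes in three bullet points.",
    ]
    batches = [build_messages(evidence, q) for q in questions]
    # The leading message is identical across requests; only the question differs.
    print(batches[0][0] == batches[1][0])
```

Each resulting list can be passed as the `messages` argument of a chat completion request.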

Start serverless, move to reserved capacity

DeepSeek V4 Pro is available on Together AI Serverless Inference and Monthly Reserved infrastructure. Serverless is the right starting point for evaluation, development, and variable traffic. Monthly Reserved is better for steadier production demand where teams need more predictable capacity and cost control.

For long-context workloads, the deployment path matters. Teams are not only choosing a model; they are choosing how to manage throughput, concurrency, latency, KV cache pressure, and cost as context sizes grow. Together AI gives teams a path from evaluation to production without standing up the serving stack themselves.

Try it now

DeepSeek-V4 Pro is available today on Together AI Serverless Inference and Dedicated Endpoints.

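```python
from together import Together

# The client reads the API key from the TOGETHER_API_KEY environment variable.
client = Together()

# Request a streamed response so tokens can be printed as they arrive.
stream = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Pro",
    messages=[
        {
            "role": "user",
            "content": "Prove that the square root of 2 is irrational.",
        }
    ],
    stream=True,
)

for chunk in stream:
    # Skip chunks that carry no choices.
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta

    # Print reasoning text when the delta includes it, then the answer text.
    if hasattr(delta, "reasoning") and delta.reasoning:
        print(delta.reasoning, end="", flush=True)
    if hasattr(delta, "content") and delta.content:
        print(delta.content, end="", flush=True)
```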

Start with Serverless Inference for development and evaluation. For production workloads that require full 1M context, reserved capacity, workload isolation, or more predictable throughput, contact sales to deploy DeepSeek-V4 Pro on Together AI Dedicated Inference.

Get started

→ Follow our DeepSeek-V4 quickstart to get up and running in minutes

→ View the DeepSeek-V4 Pro Model Page

→ Try DeepSeek-V4 Pro in the Playground

→ Contact Sales for Dedicated Inference deployment and volume pricing

