One Flexible Tool Beats a Hundred Dedicated Ones

The default move when you wanted an LLM agent to talk to a system at the start of 2026 was to install an MCP server for it.
GitHub. Jira. Slack. Linear. Postgres. Neo4j. Each one ships a server that exposes a tidy menu of tools, create_issue, list_pull_requests, merge_pull_request, get_repository, search_code, and so on, and you point your agent at it.
It’s a great onboarding experience. It’s also, for a surprising number of real workloads, the wrong shape.
The thesis is short: MCP design usually wraps each service as a pile of dedicated tools; a CLI hands the agent one really flexible tool. With today’s models, the flexible tool wins.

Comparison of MCP vs CLI approaches.
The two shapes ask the model to do different work. With a pile of dedicated tools, the agent just has to _pick the right one off a menu_. With a flexible tool, it has to _figure out how to put the pieces together itself_. That second part used to be the hard one. Models would hallucinate flags, lose the thread on long pipelines, misread help text, so wrapping every operation in a pre-baked tool was a sensible defense. That just isn’t true anymore. Today’s models read a --help page or SKILL.md when they need to, know the canonical CLIs from training, string together bash without supervision, and retry when they get a flag wrong. The hard part got easy, the easy part was always easy, and all those neatly-wrapped tools mostly just bloat the model’s context for nothing now.
Of course it’s not all roses and sunshine. Handing the agent a terminal also hands it a much bigger blast radius. The same flexibility that lets it compose gh | jq | xargs into something useful also lets a prompt injection talk it into something a lot worse than a hostile Cypher query. So yes, there’s a trade-off, and you have to actually think about it (sandbox, allowlist, separate OS user, read-only role at the database, the usual stuff).
_But when you can give the agent a terminal in a reasonably safe way, the flexible side still comes out ahead._
Where CLI shines
The same “wrap a service as a pile of dedicated tools” pattern shows up wherever MCP does. Postgres MCPs vs. psql. Kubernetes MCPs vs. kubectl. Filesystem MCPs vs. cat, ls, mv, grep glued by pipes. Same instinct every time, same CLI counterpart every time. And the same three failure modes too, because they aren’t really about any one product.
_Nothing in the MCP spec actually requires this approach of piling up dedicated tools. The protocol asks for typed tools, nothing more; it says nothing about how narrow each tool has to be. Implementations just gravitate toward many small narrow tools for historic reasons. You can build flexible tools that take a single expressive input the agent shapes however it wants, and most of the time you probably should._
To make it concrete, we’ll look at an example pitting Neo4j MCP server against Neo4j CLI.
_Disclaimer up front: I work at Neo4j. The choice is just convenience, but the learnings apply to most other CLIs._
The Neo4j MCP server is the official server that exposes Neo4j to agents through MCP, shipping a handful of dedicated tools like read query, write query, and get schema. neo4j.sh is the official command-line interface for Neo4j, a single binary you run in a terminal with credential profiles for each database you talk to. To keep the comparison honest, we’ll only look at the read-query and schema pair on the MCP side against the equivalent query invocation in neo4j.sh. Same operations, same database, same Cypher going over the wire. The only thing that changes is whether the agent reaches them through a typed tool schema or through a string handed to a shell.
Querying across environments
We already saw how a pile of dedicated tools eats the context window with descriptions, and that some servers now ship deferred tools to push that cost off until the agent actually reaches for them. But there’s a second multiplier nobody talks about: what happens when you want to talk to more than one instance of the same service. With MCP, the tool count doesn’t just grow with features, it grows with environments.

Connecting to multiple databases via MCP or CLI.
The agent wants a node count from dev, staging, and prod. Through MCP, you stand up a neo4j-mcp-server per environment, each one carrying its four tool schemas into the agent’s context on every turn. Three databases is twelve schemas in the model’s window, the same four schemas three times over, before the agent has done anything.
Through the CLI, it’s a for loop:
$ for c in dev staging prod-ro; do
    neo4j-cli query -c "$c" --format toon \
      "MATCH (n) RETURN count(n) AS nodes"
  done

One binary, three credential profiles, zero per-turn context cost. Adding a fourth environment is one more credential dbms add, not one more MCP server process. The same shape carries over to any “reach out to N similar things” workflow you might want: snapshotting prod before a risky deploy, diffing the schema between staging and prod, running a health check across every database the agent knows about.
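The context arithmetic on the MCP side is easy to sketch. The snippet below is only an illustration: schemas_in_context is a made-up helper name, and it assumes every per-environment server exposes the same four tools, as in the example above.

```shell
# Tool schemas sitting in the model's context on every turn, assuming one
# MCP server per environment and the same tool count on each server.
# schemas_in_context is a hypothetical helper, not part of any real tool.
schemas_in_context() {
  envs=$1
  tools_per_server=$2
  echo $(( envs * tools_per_server ))
}

# Four tools per server is the figure used in the text.
for envs in 1 3 10; do
  printf '%2d environments -> %d schemas\n' \
    "$envs" "$(schemas_in_context "$envs" 4)"
done
```

Three environments at four tools each gives the twelve schemas mentioned above; the CLI side stays flat because a credential profile adds nothing to the model's context.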
Chaining queries
Say the agent is investigating a known fraud account: from a single seed, find every account it transacted with, then find which _other_ accounts those counterparties transact with the most often. Two queries against the same database, where the second’s parameters are the output of the first.

Chaining queries
Through MCP, the model has to be the pipe. It calls read-cypher, the result comes back as a list of, say, 80 counterparty IDs, those 80 IDs sit in the model’s context now, the model formats them into the parameter for the second read-cypher call, and only then can query two run. The intermediate list rides the conversation verbatim, and every extra ID is another row of context the agent pays for whether it ever reads it again or not.
Through the CLI, the pipe is a literal |:
$ neo4j-cli query -c prod-ro --format json \
    --param "seed=acct_19f3" \
    "MATCH (:Account {id: \$seed})-[:TRANSACTED]-(c:Account)
     WHERE c.id <> \$seed
     RETURN collect(DISTINCT c.id) AS counterparties" \
  | neo4j-cli query -c prod-ro --params-from-stdin \
    "MATCH (a:Account)-[:TRANSACTED]-(b:Account)
     WHERE a.id IN \$counterparties
       AND NOT b.id IN \$counterparties + ['acct_19f3']
     RETURN b.id, count(DISTINCT a) AS edges_into_cluster
     ORDER BY edges_into_cluster DESC LIMIT 20"

--params-from-stdin reads the previous query’s JSON result and binds it as a parameter for the next. The counterparties list never enters the model’s context, so the agent’s token cost is the same whether the cluster has 5 counterparties or 500.
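The mechanics are plain shell and can be checked without a database. In the sketch below, first_stage and second_stage are hypothetical stand-ins for the two queries (they just generate and summarize numbers), and the captured variable plays the role of whatever an agent harness would hand back to the model.

```shell
# Hypothetical stand-ins for the two query stages; no database involved.
first_stage() {
  seq 1 500          # pretend these are 500 counterparty IDs
}
second_stage() {
  # Consume the intermediate rows from stdin and emit one summary line.
  awk '{ sum += $1 } END { print "rows_in=" NR, "sum=" sum }'
}

# Only the last stage's stdout is captured. The 500 intermediate rows
# travel through the pipe and are never part of this value.
visible=$(first_stage | second_stage)

# For comparison: how much text the intermediate stage produced.
intermediate_bytes=$(first_stage | wc -c)

echo "returned to caller: $visible"
echo "bytes that stayed inside the pipe: $intermediate_bytes"
```

The captured value is one short line no matter how many rows the first stage emits, which is the property the pipeline above relies on.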
This is where the shell starts to feel like a different category of tool altogether. The agent isn’t picking from a menu of operations anymore, it’s composing pipelines, and the intermediate data never has to surface. A two-step query becomes a |. A fan-out becomes a for loop. A join across two databases becomes one query piped into another with --params-from-stdin. Each of those would be three or four MCP round-trips with every intermediate result paraded through the context window, and at that point the agent has spent more tokens shuffling rows than thinking about them.
Pipe across many CLIs
Same problem, bigger scale. Say the agent wants to materialize a project’s recent GitHub issues into Neo4j: an :Issue node per ticket, a :User node per author, a :TAGGED relationship per label. The data lives in one CLI (gh), wants reshaping (jq does that), and lands in another CLI (neo4j-cli). Three different tools in one line. Through MCP, you’d hit GitHub’s MCP server for the issue list, every issue body lands in the model’s context, the model extracts the fields it wants, and write-cypher fires once per issue. Hundreds of round trips through the model, every issue body sitting in the conversation along the way.
Through the CLI, three programs in a pipe:
$ gh issue list --repo neo4j/neo4j --limit 100 \
    --json number,title,author,labels \
  | jq -c '.[]' \
  | while IFS= read -r issue; do
      neo4j-cli query --rw -c prod \
        --param "data=$issue" \
        "WITH apoc.convert.fromJsonMap(\$data) AS i
         MERGE (n:Issue {number: i.number}) SET n.title = i.title
         MERGE (u:User {login: i.author.login})
         MERGE (u)-[:OPENED]->(n)
         FOREACH (label IN i.labels |
           MERGE (l:Label {name: label.name})
           MERGE (n)-[:TAGGED]->(l))"
    done

gh pulls the issues, jq reshapes each one into a single JSON line, the while loop hands each line to neo4j-cli as a Cypher parameter. The model writes this script once and then steps off; the data flows through bash, not through the agent. A hundred issues or ten thousand, the agent’s token cost is the same.
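One detail matters when a shell loop carries JSON lines like this: plain read treats backslashes as escape characters, so an issue title containing an escaped quote can arrive as invalid JSON. The check below is a self-contained sketch with a made-up sample line; it touches neither gh nor the database.

```shell
# One JSON line with an escaped quote in the title, shaped like jq -c output.
# The content is invented for illustration.
line='{"number":7,"title":"fix \"quoted\" flag"}'

# Plain read consumes the backslashes, which breaks the JSON string.
plain=$(printf '%s\n' "$line" | { read issue; printf '%s' "$issue"; })

# IFS= read -r keeps the line exactly as it was written.
raw=$(printf '%s\n' "$line" | { IFS= read -r issue; printf '%s' "$issue"; })

printf 'read:    %s\n' "$plain"
printf 'read -r: %s\n' "$raw"
```

With read -r the line reaches the next command unchanged; without it, the backslashes in front of the inner quotes are gone and the title is no longer a valid JSON string.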
The shape generalizes well beyond GitHub. Swap gh for any other CLI that emits JSON (jira issue list, linear, curl against a webhook, your own internal dump command), swap the Cypher pattern for whatever database you’re building, and the pipeline carries. Two MCP tools can’t pipe to each other; two CLIs can, and so can ten.
Terminal control is powerful, and that’s the catch
The terminal isn’t a fixed surface, it’s the most flexible tool you can hand an agent because it composes with everything else on the box.
That power is also the catch. A flexible tool used badly does flexible damage. With great terminal access comes the obvious responsibility: sandbox the shell, allowlist the verbs you actually want, run the agent as a separate OS user, bind credentials to roles that physically can’t do the destructive thing. None of this is novel, it’s just sysadmin hygiene applied to an LLM that types fast. And if you can’t do any of that, an MCP server with a small fixed surface is still the right answer; the protocol-level guarantee that the agent can’t cat ~/.ssh/id_rsa is a real thing.
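To make “allowlist the verbs” a bit more concrete, here is a minimal sketch of a wrapper that refuses any command whose first word is not on a fixed list. The function name run_allowed and the list itself are assumptions for illustration, and this is one layer only, not a sandbox.

```shell
# Run a command only if its first word is on a fixed allowlist.
# run_allowed and the verbs below are illustrative, not a recommendation.
run_allowed() {
  case "$1" in
    echo|ls|grep|jq|gh|neo4j-cli) ;;
    *)
      printf 'blocked: %s\n' "$1" >&2
      return 126
      ;;
  esac
  # Execute the argument vector directly. Nothing is re-parsed by a shell,
  # so characters such as ; | $( ) inside arguments stay literal.
  "$@"
}

run_allowed echo "hello from the allowlist"
run_allowed rm ./scratch-file || echo "rm was refused"
```

Checking only the first word says nothing about arguments, and an allowed program can still read or write whatever the OS user can, so this belongs alongside the separate user and the read-only database role rather than in place of them.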
The broader point holds even if you stay entirely inside MCP. The reason the terminal wins isn’t that bash is special, it’s that bash is one tool with very flexible input. Pipes, variables, substitution, looping. That’s the shape worth copying. Read the terminal as MCP’s limit case and design toward it: fewer tools, each one accepting expressive input, the agent doing the composing instead of you anticipating every combination in advance. Most MCP servers are a long list of narrow endpoints because that’s how the underlying API was already shaped, not because the agent works better that way. The servers that age well will be the ones that picked a smaller, more expressive surface on purpose.
_All images in this blog post are created by the author._