The next article initially appeared on Block’s weblog and is being republished right here with the creator’s permission.
Should you’ve been following MCP, you’ve in all probability heard about instruments that are capabilities that permit AI assistants do issues like learn recordsdata, question databases, or name APIs. However there’s one other MCP function that’s much less talked about and arguably extra attention-grabbing: sampling.
Sampling flips the script. As an alternative of the AI calling your software, your software calls the AI.
Let’s say you’re constructing an MCP server that should do one thing clever like summarize a doc, translate textual content, or generate artistic content material. You’ve got three choices:
Choice 1: Hardcode the logic. Write conventional code to deal with it. This works for deterministic duties, however falls aside once you want flexibility or creativity.
Choice 2: Bake in your individual LLM. Your MCP server makes its personal calls to OpenAI, Anthropic, or no matter. This works, however now you’ve acquired API keys to handle and prices to trace, and also you’ve locked customers into your mannequin alternative.
Choice 3: Use sampling. Ask the AI that’s already related to do the considering for you. No additional API keys. No mannequin lock-in. The person’s present AI setup handles it.
How Sampling Works
When an MCP shopper like goose connects to an MCP server, it establishes a two-way channel. The server can expose instruments for the AI to name, however it could possibly additionally request that the AI generate textual content on its behalf.
Right here’s what that appears like in code (utilizing Python with FastMCP):

The ctx.pattern() name sends a immediate again to the related AI and waits for a response. From the person’s perspective, they simply referred to as a “summarize” software. However beneath the hood, that software delegated the arduous half to the AI itself.
A Actual Instance: Council of Mine
Council of Mine is an MCP server that takes sampling to an excessive. It simulates a council of 9 AI personas who debate matters and vote on one another’s opinions.
However there’s no LLM operating contained in the server. Each opinion, each vote, each little bit of reasoning comes from sampling requests again to the person’s related LLM.
The council has 9 members, every with a definite persona:
The Pragmatist – “Will this truly work?”
The Visionary – “What may this turn into?”
The Methods Thinker – “How does this have an effect on the broader system?”
The Optimist – “What’s the upside?”
The Satan’s Advocate – “What if we’re fully incorrect?”
The Mediator – “How can we combine these views?”
The Consumer Advocate – “How will actual individuals work together with this?”
The Traditionalist – “What has labored traditionally?”
The Analyst – “What does the info present?”
Every persona is outlined as a system immediate that will get prepended to sampling requests.
If you begin a debate, the server makes 9 sampling calls, one for every council member:

That temperature=0.8 setting encourages various, artistic responses. Every council member “thinks” independently as a result of every is a separate LLM name with a distinct persona immediate.
After opinions are collected, the server runs one other spherical of sampling. Every member opinions everybody else’s opinions and votes for the one which resonates most with their values:

The server parses the structured response to extract votes and reasoning.
Another sampling name generates a balanced abstract that includes all views and acknowledges the successful viewpoint.
Whole LLM calls per debate: 19
- 9 for opinions
- 9 for voting
- 1 for synthesis
All of these calls undergo the person’s present LLM connection. The MCP server itself has zero LLM dependencies.
Advantages of Sampling
Sampling allows a brand new class of MCP servers that orchestrate clever conduct with out managing their very own LLM infrastructure.
No API key administration: The MCP server doesn’t want its personal credentials. Customers convey their very own AI, and sampling makes use of no matter they’ve already configured.
Mannequin flexibility: If a person switches from GPT to Claude to a neighborhood Llama mannequin, the server mechanically makes use of the brand new mannequin.
Easier structure: MCP server builders can concentrate on constructing a software, not an AI utility. They’ll let the AI be the AI, whereas the server focuses on orchestration, knowledge entry, and area logic.
When to Use Sampling
Sampling is sensible when a software must:
- Generate artistic content material (summaries, translations, rewrites)
- Make judgment calls (sentiment evaluation, categorization)
- Course of unstructured knowledge (extract information from messy textual content)
It’s much less helpful for:
- Deterministic operations (math, knowledge transformation, API calls)
- Latency-critical paths (every pattern provides round-trip time)
- Excessive-volume processing (prices add up shortly)
The Mechanics
Should you’re implementing sampling, listed below are the important thing parameters:

The response object incorporates the generated textual content, which you’ll must parse. Council of Mine consists of sturdy extraction logic as a result of completely different LLM suppliers return barely completely different response codecs:

Safety Issues
If you’re passing person enter into sampling prompts, you’re creating a possible immediate injection vector. Council of Mine handles this with clear delimiters and specific directions:

This isn’t bulletproof, nevertheless it raises the bar considerably.
Strive It Your self
If you wish to see sampling in motion, Council of Mine is a superb playground. Ask goose to begin a council debate on any subject and watch as 9 distinct views emerge, vote on one another, and synthesize right into a conclusion all powered by sampling.

The Pragmatist – “Will this truly work?”
The Visionary – “What may this turn into?”
The Methods Thinker – “How does this have an effect on the broader system?”
The Optimist – “What’s the upside?”
The Satan’s Advocate – “What if we’re fully incorrect?”
The Mediator – “How can we combine these views?”
The Consumer Advocate – “How will actual individuals work together with this?”
The Traditionalist – “What has labored traditionally?”
The Analyst – “What does the info present?”