What the MCP? (Part 3): When Code LLMs Need Help

11 min read · 08 Jan 2026

Introduction#

Everyone's racing toward fully autonomous agents. The vision is compelling: AI that tolerates failure, recovers gracefully, and keeps marching toward its goals. And with 2,000+ MCP servers now in the registry, the tooling ecosystem is exploding.

But here's what nobody's talking about: what happens when the LLM doesn't have all the info it needs to call a tool?

The MCP folks saw this coming: the spec includes a feature called Elicitation. Most clients don't support it yet. I built Quick Call with it from day one, and only realized I was ahead of the curve when I found out Claude Code doesn't support it.

Let me show you what I mean.


The Scenario: "Send hi to Slack"#

Same request. Two very different execution paths.

Quick Call (with elicitation)#

Quick Call: one tool call, user picks channel inline

What happens:

  1. User: "Send hi to Slack"
  2. Quick Call MCP server recognizes channel is missing
  3. Server pauses, shows dropdown: "Which channel?"
  4. User picks #general
  5. Message sent

Result: One tool call. One user interaction. Done.


Claude Code (without elicitation)#

Claude Code: two tool calls, extra round-trip

What happens:

  1. User: "Send hi to Slack"
  2. Claude thinks: "I need to know which channel"
  3. Claude calls list_channels -> gets channel list back
  4. Claude presents options: "Which channel?"
  5. User types: #general
  6. Claude calls send_message(channel="#general", message="hi")
  7. Done

Result: Two tool calls. Extra tokens. Extra latency.

To be clear: Claude Code is being smart here. It figured out it needed more info and found a workaround. But it's still a workaround.

The difference? Elicitation lets the tool ask for what it needs. Without it, the LLM has to figure out how to get that info itself.

Think about it: who knows better what parameters a tool needs, the tool itself or an LLM guessing from a description? The tool, obviously. Elicitation puts the tool in control of gathering its own inputs. That's the fundamental shift.


The Cost of Being Clever#

Every Extra Tool Call = $$$#

The math is simple:

  • Each tool call = input tokens (tool definitions) + output tokens (response)
  • Extra list_channels call: ~500-1000 tokens round-trip
  • At scale: 1,000 messages/day × 500 tokens = 500K extra tokens/day

What does that cost?

| Model | Input (per 1M) | Output (per 1M) | Daily | Monthly |
|---|---|---|---|---|
| Claude Opus 4.5 | $5 | $25 | ~$5 | ~$150 |
| GPT-4o | $2.50 | $10 | ~$2 | ~$68 |

That's $70-150/month for one feature's inefficiency. Multiply by every tool that needs user input.
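The table's figures are easy to reproduce. Here's a back-of-envelope sketch; the 80/20 input/output token split is my assumption, not from the post, and a heavier output share pushes the totals toward the table's numbers:

```python
# Back-of-envelope cost of one redundant tool call per message.
# Assumes an 80/20 input/output token split (illustrative assumption).
MESSAGES_PER_DAY = 1_000
EXTRA_TOKENS_PER_CALL = 500  # low end of the 500-1000 range

def monthly_cost(input_per_1m: float, output_per_1m: float,
                 input_share: float = 0.8, days: int = 30) -> float:
    daily_tokens = MESSAGES_PER_DAY * EXTRA_TOKENS_PER_CALL
    daily = (daily_tokens * input_share * input_per_1m
             + daily_tokens * (1 - input_share) * output_per_1m) / 1_000_000
    return round(daily * days, 2)

print(monthly_cost(5.00, 25.00))  # Claude Opus 4.5 pricing
print(monthly_cost(2.50, 10.00))  # GPT-4o pricing
```

Under these assumptions the extra call costs about $135/month on Opus 4.5 and $60/month on GPT-4o, in the same ballpark as the table.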


Beyond Cost: Reliability#

Anthropic's own benchmarks tell the story. From their Opus 4.5 announcement:

Scaled tool use (MCP Atlas):

| Model | Score | Failure Rate |
|---|---|---|
| Opus 4.5 | 62.3% | ~38% |
| Sonnet 4.5 | 43.8% | ~56% |
| Opus 4.1 | 40.9% | ~59% |

Even the best model fails roughly 38% of the time on complex tool-use scenarios, and that's Opus 4.5, Anthropic's flagship. Fewer tool calls means fewer chances to fail.
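The compounding effect is easy to model: if each tool call succeeds independently with some probability, chaining calls multiplies the risk. A toy illustration (the 90% per-call figure is an assumption, not a benchmark number):

```python
# Toy model: a workflow of n independent tool calls, each succeeding with
# probability p, succeeds end-to-end with probability p**n.
# The 90% per-call figure is illustrative, not from Anthropic's benchmarks.
p = 0.90
for n in (1, 2, 3):
    print(n, round(p ** n, 3))
```

One extra call per message drops a 90% per-call success rate to 81% end-to-end; a third call drops it below 73%.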


Latency Adds Up#

Each tool call involves:

  • Model inference time
  • API round-trip
  • Response parsing

Claude Code's workaround means 2x the wait time. The user sits there while the LLM fetches the channel list, processes it, formats the question, waits for input, then makes another call.

With elicitation? The tool pauses, asks, continues. One smooth interaction.

So how do we fix this?


Where Elicitation Shines#

Use Cases#

| Scenario | Without Elicitation | With Elicitation |
|---|---|---|
| Ambiguity | Fail or guess wrong | Ask: "Which subscription to cancel?" |
| Confirmation | Proceed blindly | Ask: "Type workspace name to confirm delete" |
| Missing params | Extra tool call or error | Ask: "Enter your API key" |
| Progressive input | Front-load everything | Collect step-by-step as needed |

See It In Action#

I've open-sourced a demo app that showcases Quick Call's elicitation framework: quickcall-mcp-elicitation

The prompt is deliberately vague: "Schedule a meeting", with no title, no participants, no time. The tool collects what it needs progressively through elicitation. One tool call, multiple user inputs, zero extra LLM round-trips.

Meeting scheduler with progressive elicitation

Here's how the flow works:

Sequence diagram (User → Frontend → Backend → MCP Server): the user asks to "Schedule a meeting", the frontend forwards the request, and the backend executes the tool. Each ctx.elicit() call (first the title, then participants, duration, and time) pauses the tool, surfaces an input in the frontend, and resumes with the user's answer ("Weekly Standup") until the meeting is created and "Meeting scheduled!" comes back.

The tool pauses at each ctx.elicit() call, collects input via SSE, and resumes.

Wait, aren't those still round-trips?

Each ctx.elicit() is a round-trip between backend and frontend: SSE event out, user responds, POST back, tool resumes. But critically, it's not an LLM round-trip. The LLM calls schedule_meeting once. That single tool execution handles all user interactions internally. The LLM doesn't re-enter the loop until the tool returns.


How It Works#

Server Side: ctx.elicit()#

In your MCP tool, call ctx.elicit() when you need user input:

from typing import Optional

from fastmcp import FastMCP
from fastmcp.server.dependencies import get_context

mcp = FastMCP("meetings")

@mcp.tool()
async def schedule_meeting(title: Optional[str] = None, duration: Optional[str] = None):
    ctx = get_context()

    # Free text input
    if not title:
        result = await ctx.elicit(
            message="What should the meeting be called?",
            response_type=str,
        )
        if result.action != "accept":  # user declined or cancelled
            return {"error": "Cancelled by user"}
        title = result.data

    # Single select from options
    if not duration:
        result = await ctx.elicit(
            message="How long should the meeting be?",
            response_type=["30 minutes", "1 hour", "2 hours"],
        )
        if result.action != "accept":
            return {"error": "Cancelled by user"}
        duration = result.data

    return {"title": title, "duration": duration}

response_type determines the UI:

  • str -> text input
  • ["option1", "option2"] -> single select buttons
  • int, bool -> appropriate input fields
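On the client, this mapping can be a simple dispatch on the schema. The sketch below is purely illustrative: the widget names are hypothetical, and real clients render native controls.

```python
# Hypothetical client-side dispatch from an elicitation response_type
# to a UI widget name. Real clients render native controls instead.
def widget_for(response_type) -> str:
    if isinstance(response_type, list):
        return "single-select buttons"  # enumerated options
    return {
        str: "text input",
        int: "number input",
        bool: "checkbox",
    }.get(response_type, "text input")

print(widget_for(str))                       # text input
print(widget_for(["30 minutes", "1 hour"]))  # single-select buttons
print(widget_for(bool))                      # checkbox
```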

Client Side: Handle the pause#

When ctx.elicit() is called, your client receives an SSE event:

{
  "type": "elicitation_request",
  "elicitation_id": "chat_abc123",
  "message": "What should the meeting be called?",
  "options": null
}

Render the UI, collect input, POST back:

POST /elicitation/respond
{
  "elicitation_id": "chat_abc123",
  "response": {"action": "accept", "value": "Weekly Standup"}
}

The tool resumes from where it paused. That's it.
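Under the hood, this pause/resume loop can be modeled with a future keyed by elicitation_id. The sketch below is illustrative, not Quick Call's actual code: the names PENDING, wait_for_user, and resume are mine, and the SSE emit is elided.

```python
import asyncio

# Illustrative pause/resume core: each pending elicitation parks the tool
# coroutine on a Future until the frontend POSTs the user's answer.
PENDING: dict[str, asyncio.Future] = {}

async def wait_for_user(elicitation_id: str) -> dict:
    # Tool side: after emitting the SSE elicitation_request (elided here),
    # suspend until resume() delivers the response.
    fut = asyncio.get_running_loop().create_future()
    PENDING[elicitation_id] = fut
    return await fut

def resume(elicitation_id: str, response: dict) -> None:
    # HTTP side: invoked when POST /elicitation/respond arrives.
    fut = PENDING.pop(elicitation_id)
    if not fut.done():
        fut.set_result(response)

async def demo():
    # Simulate: the tool asks, the user answers shortly after.
    task = asyncio.create_task(wait_for_user("chat_abc123"))
    await asyncio.sleep(0.01)
    resume("chat_abc123", {"action": "accept", "value": "Weekly Standup"})
    return await task

print(asyncio.run(demo()))  # {'action': 'accept', 'value': 'Weekly Standup'}
```

The key property: the LLM never re-enters this loop; only the tool coroutine suspends and resumes.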


Current Client Support#

| Client | Elicitation | Notes |
|---|---|---|
| Claude Code | No | Issue #2799 - 106 upvotes, assigned but no timeline |
| Quick Call | Yes | Built-in from day one |
| GitHub Copilot | Yes | Shipped Dec 2025 - VS Code, VS 2026, JetBrains |
| Cursor | Yes | Shipped - supports string, number, boolean, enum schemas |

When I built Quick Call, elicitation was already available in FastMCP. I used it because making users re-prompt when a parameter was missing felt wrong. I'm looking forward to seeing Claude Code support this.


Final Thoughts#

Elicitation isn't UX polish. It's the difference between tools that ask for what they need and LLMs that scramble to figure it out themselves.

Fewer tool calls. Fewer tokens. Fewer failures. Better UX.

Cursor and Copilot already support it. Claude Code will get there. Until then, build your tools right: assume elicitation exists, and let your tools do the asking.


The MCP elicitation demo is open-sourced: quickcall-mcp-elicitation

Try Quick Call: Now with Claude Code integration -> quickcall.dev/claude-code

Catch up: Part 1: What the MCP? | Part 2: I Built Quick Call



Written by Sagar Sarkale