
You’re building an AI agent that analyzes molecular structures for drug discovery. The computation takes 45 minutes. Your current MCP server… just sits there. Waiting. Blocking. Eventually timing out.
Or maybe you’re orchestrating enterprise workflows where an agent needs to kick off three data processing jobs, then come back later to collect results. Right now? You’d have to hold the connection open for hours. That’s not realistic.
This is exactly the problem MCP Core Maintainers just solved.
SEP-1686 introduces a task primitive to the Model Context Protocol. It’s a clean “call-now, fetch-later” execution pattern that lets agents submit long-running operations, go do other work, and check back when ready. No blocking. No timeouts. No hacky workarounds.
And honestly? This changes how we build production AI systems.
Let’s dig in.
The Problem: MCP Was Built for Quick Operations
Here’s the thing about the original MCP design: it assumed tools return results quickly. You call a tool, get a response, move on. Works great for:
- Fetching a file
- Running a database query
- Making an API request
But what about:
- Analyzing 500,000 molecules through multiple inference models (30-60 minutes)
- Running comprehensive test suites across distributed environments (hours)
- Deep research operations requiring recursive web searches and synthesis
- Enterprise batch processing with hundreds of concurrent jobs
- Code migration analysis across massive codebases
Current MCP? You’re stuck. Clients don’t know if the response will ever arrive. Servers can’t signal progress. Agents can’t do anything else while waiting. The whole interaction model breaks down.

MCP Core Maintainers heard this loud and clear from real production teams, especially in healthcare, life sciences, and enterprise automation. The feedback wasn’t just “this would be nice.” It was “we literally can’t use MCP for our use cases without this.”
So they built it.
SEP-1686: Tasks as a Core Primitive
The accepted proposal introduces tasks as a first-class concept in MCP. They’re not tool-specific, not client-specific. They’re a generic primitive that works uniformly across all request types.
Here’s the mental model: instead of a tool call that blocks until completion, you submit a task. The server immediately acknowledges receipt, assigns it a task ID, and starts processing in the background. The client can then:
- Poll for status updates
- Check progress
- Retrieve results when ready
- Cancel if needed
- Do other work in the meantime
It’s the difference between calling a restaurant and waiting on hold until your food is ready versus placing an online order, going about your day, and picking it up when the notification comes through.
Simple. Powerful. Production-ready.
How It Works: The Call-Now, Fetch-Later Pattern
Let’s walk through the actual flow.
Step 1: Request Augmentation with _meta Field
When a client wants to execute a tool as a long-running task, it augments the request with metadata using the _meta field:
```json
{
  "method": "tools/call",
  "params": {
    "name": "analyze_molecules",
    "arguments": {
      "library_id": "chem_lib_42",
      "models": ["inference_v1", "inference_v2"]
    },
    "_meta": {
      "modelcontextprotocol.io/task": {
        "taskId": "task_abc123",
        "keepAlive": 3600
      }
    }
  }
}
```
Notice what’s happening here:
- taskId: Client-generated unique identifier. This enables idempotent retries. If the network fails, you can safely retry with the same ID.
- keepAlive: How long (in seconds) the server should retain results after completion. Set to null for indefinite retention.
The beauty of using _meta? It’s backward compatible. Servers that don’t understand task metadata just ignore it and process the request normally. No breaking changes.
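Here's what assembling one of these requests looks like client-side. A minimal Python sketch: `make_task_request` is a hypothetical helper of mine, but the field names match the example above.

```python
import uuid

def make_task_request(tool_name, arguments, keep_alive=3600):
    """Build a tools/call request augmented with SEP-1686 task metadata.

    The taskId is client-generated so retries with the same ID stay
    idempotent; keep_alive is the retention window in seconds (None
    means indefinite retention).
    """
    return {
        "method": "tools/call",
        "params": {
            "name": tool_name,
            "arguments": arguments,
            "_meta": {
                "modelcontextprotocol.io/task": {
                    "taskId": f"task_{uuid.uuid4().hex[:12]}",
                    "keepAlive": keep_alive,
                }
            },
        },
    }

req = make_task_request("analyze_molecules", {"library_id": "chem_lib_42"})
```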
Step 2: Task Creation Notification
The server immediately responds with acknowledgment and sends a notification:
```json
{
  "method": "notifications/tasks/created",
  "params": {
    "taskId": "task_abc123",
    "status": "submitted",
    "pollFrequency": 30
  }
}
```
This notification signals: “Your task is created. You can start polling now. I recommend checking every 30 seconds.”
This solves a subtle race condition. Without this notification, a client might start polling before the server finishes task setup. The notification acts as a synchronization point.
Step 3: Status Polling with tasks/get
The client periodically checks task status:
```json
{
  "method": "tasks/get",
  "params": {
    "taskId": "task_abc123"
  }
}
```
Response:
```json
{
  "taskId": "task_abc123",
  "status": "working",
  "progress": {
    "completed": 125000,
    "total": 500000,
    "message": "Processing molecular structure 125000/500000"
  },
  "pollFrequency": 30
}
```
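Putting steps 2 and 3 together, the client side is a small polling loop. A sketch, not SDK code: `get_status` stands in for whatever function issues the actual `tasks/get` request, and the loop honors the server-suggested `pollFrequency` when present.

```python
import time

def poll_until_done(get_status, task_id, default_interval=30, max_wait=7200):
    """Poll tasks/get until the task reaches a terminal state.

    get_status(task_id) performs the tasks/get request and returns the
    status payload as a dict. Raises TimeoutError if the task is still
    running after max_wait seconds.
    """
    terminal = {"completed", "failed", "cancelled", "unknown"}
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        payload = get_status(task_id)
        if payload["status"] in terminal:
            return payload
        # Sleep for the server's recommended interval, if it sent one
        time.sleep(payload.get("pollFrequency", default_interval))
    raise TimeoutError(f"task {task_id} still running after {max_wait}s")

# Simulated server responses: one "working" poll, then completion
statuses = iter([
    {"status": "working", "pollFrequency": 0},
    {"status": "completed", "pollFrequency": 0},
])
final = poll_until_done(lambda tid: next(statuses), "task_abc123")
```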
Step 4: Result Retrieval with tasks/result
Once status shows “completed”, the client fetches results:
```json
{
  "method": "tasks/result",
  "params": {
    "taskId": "task_abc123"
  }
}
```
Response:
```json
{
  "taskId": "task_abc123",
  "status": "completed",
  "result": {
    "analysis": {
      "candidates": 47,
      "high_confidence": 12,
      "recommended_next_steps": [...]
    }
  }
}
```
The agent gets its answer. The task is done. Everything worked without blocking for 45 minutes.
Task Lifecycle: From Submitted to Terminal States
Tasks move through a well-defined state machine:
Initial State:
- submitted — Task created, queued for execution
Active States:
- working — Currently processing
- input_required — Needs additional information from the client (think: human-in-the-loop scenarios)
Terminal States:
- completed — Successfully finished
- failed — Execution error occurred
- cancelled — Client requested cancellation
- unknown — Unexpected condition (server crashed, etc.)
The spec enforces strict transition rules. You can’t go from completed back to working. Tasks can’t jump states arbitrarily. This predictability matters for building reliable systems.
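A server can enforce those rules with a simple lookup table. The table below is my reading of the states described above, not text copied from the spec:

```python
# Allowed next states per current state; terminal states allow none.
TRANSITIONS = {
    "submitted": {"working", "cancelled", "failed"},
    "working": {"input_required", "completed", "failed", "cancelled"},
    "input_required": {"working", "cancelled", "failed"},
    "completed": set(),
    "failed": set(),
    "cancelled": set(),
    "unknown": set(),
}

def transition(current, new):
    """Return the new state, or raise if the move is illegal."""
    if new not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {new}")
    return new
```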
Core Operations: The Full API
Beyond basic status and result retrieval, the task API includes:
tasks/list — Enumerate All Tasks
```json
{
  "method": "tasks/list",
  "params": {
    "status": "working",
    "limit": 50,
    "cursor": "page_2"
  }
}
```
Returns paginated list of tasks matching criteria. Perfect for dashboards showing all active operations.
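Client-side, draining that pagination is a short loop. A sketch assuming the response carries a `nextCursor` field (that exact field name is my assumption); `list_page` stands in for the actual `tasks/list` request:

```python
def list_all_tasks(list_page, status=None, limit=50):
    """Walk tasks/list cursor pagination until exhausted.

    list_page(params) performs the request and returns a dict like
    {"tasks": [...], "nextCursor": "..." or None}.
    """
    tasks, cursor = [], None
    while True:
        params = {"limit": limit}
        if status:
            params["status"] = status
        if cursor:
            params["cursor"] = cursor
        page = list_page(params)
        tasks.extend(page["tasks"])
        cursor = page.get("nextCursor")
        if not cursor:
            return tasks

# Two simulated pages keyed by cursor
pages = {
    None: {"tasks": [{"taskId": "t1"}], "nextCursor": "page_2"},
    "page_2": {"tasks": [{"taskId": "t2"}], "nextCursor": None},
}
all_tasks = list_all_tasks(lambda p: pages[p.get("cursor")], status="working")
```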
tasks/delete — Cleanup Completed Tasks
```json
{
  "method": "tasks/delete",
  "params": {
    "taskId": "task_abc123"
  }
}
```
Explicitly removes task and results. For long-running servers managing many tasks, this is important. You don’t want completed results sitting around indefinitely consuming memory.
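Servers can pair explicit deletion with a periodic sweep that evicts terminal tasks once their keepAlive window lapses. A sketch; the `finished_at` bookkeeping field is illustrative, not mandated by the spec:

```python
import time

def sweep_expired(tasks, now=None):
    """Drop terminal tasks whose keepAlive retention window has elapsed.

    tasks maps taskId -> {"status", "finished_at", "keepAlive"}.
    A keepAlive of None means retain indefinitely, per the spec.
    Returns the list of evicted task IDs.
    """
    now = time.time() if now is None else now
    terminal = {"completed", "failed", "cancelled"}
    expired = [
        tid for tid, t in tasks.items()
        if t["status"] in terminal
        and t.get("keepAlive") is not None
        and now - t["finished_at"] > t["keepAlive"]
    ]
    for tid in expired:
        del tasks[tid]
    return expired

registry = {
    "t1": {"status": "completed", "finished_at": 100.0, "keepAlive": 60},
    "t2": {"status": "completed", "finished_at": 100.0, "keepAlive": None},
    "t3": {"status": "working", "finished_at": 0.0, "keepAlive": 60},
}
removed = sweep_expired(registry, now=200.0)
```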
Graceful Degradation
What if the server doesn’t support tasks? The client just falls back to synchronous execution. No capability negotiation needed. No complex handshakes. It just works.
This backward compatibility means you can adopt tasks incrementally without breaking existing deployments.
Security Considerations: Production-Ready from Day One
MCP Core Maintainers didn’t just design a functional system. They designed a secure one. Here’s what the spec requires:
1. Session Scoping
Tasks must be scoped to their originating session and authentication context. A task created by User A cannot be accessed by User B, even if they somehow guess the task ID.
This prevents lateral movement attacks and ensures multi-tenant isolation.
2. Rate Limiting
Servers should enforce limits on:
- Task creation rate (prevent flood attacks)
- Polling frequency (prevent DoS via rapid status checks)
- Concurrent tasks per requestor (resource exhaustion protection)
The spec recommends tracking these metrics and returning appropriate errors when limits are exceeded.
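The spec doesn't prescribe an algorithm, so here's one reasonable shape: a sliding-window limiter keyed by session, usable for any of the three limits above.

```python
import time
from collections import defaultdict, deque

class RateLimiter:
    """Sliding-window rate limiter per session.

    A sketch of one way to meet the spec's recommendation; the choice
    of algorithm and parameters is mine, not the spec's.
    """

    def __init__(self, max_events, window_seconds):
        self.max_events = max_events
        self.window = window_seconds
        self.events = defaultdict(deque)  # session_id -> event timestamps

    def allow(self, session_id, now=None):
        """Record one event if under the limit; return False if over."""
        now = time.monotonic() if now is None else now
        q = self.events[session_id]
        # Evict timestamps that have slid out of the window
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.max_events:
            return False
        q.append(now)
        return True
```

Servers would call `allow()` on each task creation or status poll and return a rate-limit error when it comes back `False`.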
3. KeepAlive Duration Management
The keepAlive parameter controls result retention. But here’s the security implication. If you set it too long, sensitive results sit around longer than necessary. If too short, legitimate clients might lose data before retrieval.
Best practice: Retrieve sensitive results promptly rather than relying on indefinite retention. For less sensitive data, longer durations improve reliability.
4. Audit Logging
Production systems should log task lifecycle events:
- Creation (who, when, what)
- Status transitions
- Result retrieval
- Deletion
This audit trail is essential for debugging, compliance, and security forensics.
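In practice this can be as simple as one structured log record per lifecycle event. A sketch; the record fields are illustrative, not a required schema:

```python
import json
import logging
import time

audit = logging.getLogger("mcp.tasks.audit")

def log_task_event(event, task_id, session_id, **extra):
    """Emit one JSON audit record per task lifecycle event.

    event is e.g. "created", "status_change", "result_retrieved",
    or "deleted"; extra carries event-specific fields.
    """
    record = {
        "ts": time.time(),
        "event": event,
        "taskId": task_id,
        "sessionId": session_id,
        **extra,
    }
    audit.info(json.dumps(record))
    return record

rec = log_task_event("created", "task_abc123", "sess_1",
                     tool="analyze_molecules")
```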
Real-World Use Cases: Why This Matters
The proposal includes six customer scenarios that drove the design. These aren’t hypothetical — they’re real requirements from production teams.
1. Healthcare Analytics
The Problem: Pharmaceutical companies analyze molecular properties to predict drug interactions. Each analysis processes hundreds of thousands of data points through multiple inference models simultaneously.
Time Required: Small molecules take 30-60 minutes. Large molecules with complex simulations can take several hours.
Why Tasks Matter: Researchers can submit analysis jobs, continue working on other compounds, and check results when ready. No blocking. No connection timeouts. The AI agent orchestrates multiple parallel analyses without manual intervention.
2. Enterprise Automation
The Problem: Financial services firms run batch processes across distributed systems: data validation, compliance checks, report generation. These processes run nightly and take hours.
Why Tasks Matter: An AI agent can dispatch dozens of concurrent jobs, monitor their progress, and aggregate results when complete. The agent doesn’t need to maintain open connections for hours. It polls periodically and handles results as they arrive.
3. Code Migration at Scale
The Problem: Migrating legacy codebases (think: millions of lines) from one framework to another. Static analysis, dependency resolution, and transformation generation take significant time.
Why Tasks Matter: Without deterministic polling, LLMs tend to hallucinate progress or make assumptions about completion. With tasks, the agent gets real status updates from the migration tool. No guessing. No hallucinations. Just facts.
4. Test Execution Platforms
The Problem: Running comprehensive test suites across distributed environments. Unit tests, integration tests, end-to-end tests. The full matrix.
Why Tasks Matter: The agent kicks off the test run, streams logs concurrently (via separate resources), and retrieves the final report when complete. Developers get real-time visibility into test execution without blocking the agent’s other workflows.
5. Deep Research Operations
The Problem: Research agents perform recursive web searches, synthesize information from multiple sources, and generate comprehensive reports. This can take 10-30 minutes per query.
Why Tasks Matter: The agent can initiate research in parallel across multiple topics, continue answering user questions, and present results as they become available. User experience improves dramatically.
6. Multi-Agent Systems
The Problem: In multi-agent architectures, one agent’s work often depends on another’s completion. If Agent A blocks waiting for Agent B, and Agent B blocks waiting for Agent C, you end up with cascading delays.
Why Tasks Matter: Agents submit work asynchronously and poll for completion. The system stays responsive. No deadlocks. No cascading delays. Clean orchestration.
Future Improvements: What’s Coming Next
The current spec is powerful, but the community already sees opportunities for enhancement.
Callback/Webhook Mechanisms
David Soria Parra (Member of Technical Staff at Anthropic and MCP co-creator) mentioned in community discussions that push notifications for task completion are being explored.
Instead of clients polling every 30 seconds, the server could push a notification when the task completes. This reduces unnecessary network traffic and improves latency.
The idea is webhook-style callbacks where the client registers a callback URL during task creation, and the server POSTs results when ready.
Intermediate Result Streaming
Some long-running tasks produce partial results over time. Imagine a code generation task that outputs functions as they’re completed, rather than waiting for the entire codebase.
Future iterations might support streaming intermediate results while the task continues processing.
Nested Task Execution
Complex workflows often involve hierarchical dependencies. Task A spawns Task B and Task C, which each spawn their own subtasks.
The spec leaves room for nested task execution, enabling agents to build sophisticated workflow DAGs.
Community Discussion: Polling vs Callbacks at Scale
The MCP community has debated the tradeoffs between polling-based and callback-based approaches.
Polling Pros:
- Simple client implementation
- No firewall/NAT issues
- Works across all network topologies
- Client controls polling frequency
Polling Cons:
- Increased network traffic
- Higher latency (bound by poll frequency)
- Server load from frequent status checks
Callback Pros:
- Immediate notification on completion
- Lower network overhead
- Better latency
Callback Cons:
- Complex client implementation
- Firewall/NAT traversal challenges
- Security implications (exposing client endpoints)
The current spec starts with polling because it’s universally compatible and simpler to implement. Future enhancements can add callbacks for clients that support them, maintaining backward compatibility.
Andrew Jefferson and others in the community have discussed scaling patterns where hybrid approaches work well — polling for initial status checks, with optional callback registration for completion events.
Implementation Patterns: Building with Tasks Today
If you’re implementing long-running task support in your MCP server, here are proven patterns:
Pattern 1: In-Memory Task Registry
For simple cases, maintain task state in memory:
```python
import threading
import time

class TaskRegistry:
    def __init__(self):
        self.tasks = {}

    def create_task(self, task_id, operation):
        self.tasks[task_id] = {
            'id': task_id,
            'status': 'submitted',
            'operation': operation,
            'result': None,
            'created_at': time.time()
        }
        # Start background execution
        threading.Thread(
            target=self._execute_task,
            args=(task_id,)
        ).start()
        return self.tasks[task_id]

    def get_status(self, task_id):
        return self.tasks.get(task_id)

    def _execute_task(self, task_id):
        task = self.tasks[task_id]
        task['status'] = 'working'
        try:
            result = task['operation']()
            task['status'] = 'completed'
            task['result'] = result
        except Exception as e:
            task['status'] = 'failed'
            task['error'] = str(e)
```
Pattern 2: Persistent Task Queue
For production systems, use a proper task queue:
```python
from celery import Celery
from celery.result import AsyncResult

app = Celery('mcp_tasks', broker='redis://localhost')

@app.task
def analyze_molecules(library_id, models):
    results = ...  # long-running analysis goes here
    return results

def create_task(task_id, tool_name, arguments):
    # Dispatch to Celery, reusing the client-generated task ID
    async_result = analyze_molecules.apply_async(
        args=[arguments['library_id'], arguments['models']],
        task_id=task_id
    )
    return {
        'taskId': task_id,
        'status': 'submitted'
    }

def get_status(task_id):
    async_result = AsyncResult(task_id)
    if async_result.ready():
        if async_result.successful():
            return {'status': 'completed', 'result': async_result.result}
        else:
            return {'status': 'failed', 'error': str(async_result.info)}
    else:
        return {'status': 'working'}
```
Pattern 3: External Workflow Orchestration
For complex multi-step workflows, integrate with workflow engines:
```typescript
import { WorkflowClient } from '@temporalio/client';

async function createTask(taskId: string, workflowType: string, args: any) {
  const client = new WorkflowClient();
  // Start Temporal workflow
  const handle = await client.start(workflowType, {
    taskQueue: 'mcp-tasks',
    workflowId: taskId,
    args: [args]
  });
  return {
    taskId,
    status: 'submitted'
  };
}

async function getStatus(taskId: string) {
  const client = new WorkflowClient();
  const handle = client.getHandle(taskId);
  try {
    const result = await handle.result();
    return { status: 'completed', result };
  } catch (err: any) {
    if (err.name === 'WorkflowNotFound') {
      return { status: 'unknown' };
    }
    return { status: 'failed', error: err.message };
  }
}
```
This pattern separates concerns. Your MCP server becomes a lightweight client to a robust workflow engine that handles execution, retries, and fault tolerance.
Observability: Monitoring Task Execution in Production
Long-running tasks need visibility. When a molecular analysis takes 45 minutes, you need to know:
- Is it actually running or stuck?
- How far along is it?
- Are there resource bottlenecks?
- What’s the completion rate?
Key Metrics to Track
Task Creation Rate: How many tasks per minute are being created? Spikes might indicate automation gone wrong or potential abuse.
Task Duration Distribution: What’s the p50, p90, p99 completion time? This helps set appropriate pollFrequency recommendations.
Status Transition Times: How long do tasks spend in each state? Long times in submitted suggest queue backlog. Long times in working might indicate performance issues.
Completion Success Rate: What percentage of tasks complete successfully vs fail or timeout?
KeepAlive Retention: Are clients retrieving results before expiration? High expiration rates suggest clients aren’t polling frequently enough.
For deeper observability beyond basic metrics, platforms like Agnost AI provide purpose-built monitoring for MCP servers. They track task execution, tool invocation patterns, agent reasoning, and end-to-end conversation flows. When you need to understand why an agent made a specific decision or how it utilized task results, having the full context is invaluable.
Adoption Path: Adding Tasks to Your MCP Server
If you maintain an MCP server, here’s how to add task support:
Step 1: Identify Long-Running Tools
Audit your existing tools. Which ones take more than a few seconds? Those are candidates for task-based execution.
Examples:
- Database queries on large datasets
- External API calls with unpredictable latency
- Compute-intensive operations
- Batch processing
Step 2: Implement Task Registry
Choose your backend:
- In-memory: Simple, good for development and low-traffic servers
- Redis: Persistent, supports distributed servers, great for production
- Database: Full auditability, good for compliance requirements
- Workflow Engine (Temporal/Airflow): Enterprise-grade reliability
Step 3: Add Task Endpoints
Implement the core operations:
- tasks/get — Status lookup
- tasks/result — Result retrieval
- tasks/list — Task enumeration
- tasks/delete — Cleanup
Step 4: Update Tool Handlers
Modify tool handlers to check for _meta.modelcontextprotocol.io/task. If present, create task and return acknowledgment. If not, execute synchronously for backward compatibility.
```python
def handle_tool_call(method, params):
    task_meta = params.get('_meta', {}).get('modelcontextprotocol.io/task')
    if task_meta:
        # Task-based execution
        task_id = task_meta['taskId']
        keep_alive = task_meta.get('keepAlive')
        task = create_task(task_id, method, params['arguments'])
        send_notification('notifications/tasks/created', task)
        return {'taskId': task_id, 'status': task['status']}
    else:
        # Synchronous execution (backward compatible)
        return execute_synchronously(method, params['arguments'])
```
Step 5: Add Observability
Instrument task lifecycle events with your observability platform. Track creation, transitions, completion, and retrieval.
Step 6: Test with Real Workloads
Before production:
- Test task creation under load
- Verify status polling works correctly
- Confirm keepAlive expiration behavior
- Validate error handling (what happens if execution fails?)
- Check multi-client scenarios (session isolation)
Step 7: Document Recommended Poll Frequency
Based on your task characteristics, recommend appropriate polling intervals in the pollFrequency field. Too frequent wastes resources. Too infrequent increases perceived latency.
A good heuristic: set pollFrequency to 10% of expected task duration. A 300-second task? Recommend polling every 30 seconds.
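That heuristic is one line of code, plus clamping so very short or very long tasks don't produce silly values. The floor and ceiling here are my own choices, not from the spec:

```python
def recommended_poll_frequency(expected_duration_s, floor=5, ceiling=300):
    """10%-of-duration heuristic, clamped to [floor, ceiling] seconds."""
    return max(floor, min(ceiling, int(expected_duration_s * 0.10)))
```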
A Note on Status: Coming Soon
Here’s the thing though. The proposal is officially accepted. The direction is locked in. But it’s not in a released version of MCP yet. This is still in development.

That said, the acceptance by MCP Core Maintainers is significant. It means the core mechanics are being built into the next version. The community is discussing implementation details. The direction is clear.
If you’re planning your MCP server architecture, you should be thinking about these patterns now. Design with tasks in mind. When the feature lands, your server will be ready.
The Bottom Line
Long-running task support isn’t just a nice-to-have feature. It’s fundamental to production AI systems.
Without it, MCP was limited to quick operations. With it, you can build:
- Healthcare AI analyzing complex datasets for hours
- Enterprise automation orchestrating overnight batch processes
- Research agents conducting deep multi-source investigations
- Multi-agent systems with clean asynchronous coordination
- Code analysis tools processing massive codebases
SEP-1686 gives us a clean, secure, backward-compatible way to handle all of this. The “call-now, fetch-later” pattern feels natural. The task lifecycle is well-defined. The security considerations are baked in.
And the best part? Graceful degradation means you can adopt this incrementally. Old clients keep working. New clients get superpowers.
MCP just became production-ready for an entirely new class of applications.
If you’re building AI agents that do real work, not just toy demos, this is the foundation you need. Start with polling. Add callbacks later if you need them. Build something amazing.
The ecosystem is moving fast. The community is engaged. The maintainers are responsive.
This is where AI infrastructure gets interesting.
Key Takeaway: SEP-1686 introduces a task primitive to MCP that enables hours-long operations through a call-now, fetch-later pattern. With proper lifecycle management, security scoping, and graceful degradation, production AI systems can now handle healthcare analytics, enterprise automation, and multi-agent orchestration without blocking or timeouts.
Resources and Further Reading
Official Specifications:
- SEP-1686: Tasks Proposal — Full specification and discussion
- MCP Specification — Official Model Context Protocol documentation
- MCP Roadmap — What’s coming next
Related Proposals:
- SEP-1391: Asynchronous Tool Execution — Alternative async approach
- Task Semantics Discussion #314 — Early community discussion
Implementation Examples:
- MCP Python SDK — Reference implementation
- MCP TypeScript SDK — JavaScript/TypeScript support
Community:
- MCP Discord — Active community, maintainers present
- Agnost AI Discord — MCP server builders, monitoring discussions
Related Reading:
- Testing MCP Servers Complete Guide — Production testing strategies
- MCP Analytics Guide — Monitoring and observability
- Google’s MCP Toolbox Deep Dive — Enterprise database integration
Join the MCP Community
Building production AI systems with MCP? You’re not alone. Join our Discord community to connect with other developers implementing long-running tasks, share patterns, debug tricky scenarios, and stay current on protocol updates.
We also built Agnost AI to give you real-time visibility into your MCP servers — task execution, tool invocation patterns, performance metrics, and conversation flows. Check it out at docs.agnost.ai or book a demo at call.agnost.ai.
Building MCP servers that handle real production workloads? We’d love to help. Let’s talk.