Long Running Tasks in MCP: The Call-Now, Fetch-Later Pattern That Changes Everything

Deep dive into SEP-1686 and how the Model Context Protocol now handles hours-long operations without blocking. Learn about task lifecycle, polling patterns, security considerations, and real production use cases from healthcare to multi-agent systems.

You’re building an AI agent that analyzes molecular structures for drug discovery. The computation takes 45 minutes. Your current MCP server… just sits there. Waiting. Blocking. Eventually timing out.

Or maybe you’re orchestrating enterprise workflows where an agent needs to kick off three data processing jobs, then come back later to collect results. Right now? You’d have to hold the connection open for hours. That’s not realistic.

This is exactly the problem MCP Core Maintainers just solved.

SEP-1686 introduces a task primitive to the Model Context Protocol. It’s a clean “call-now, fetch-later” execution pattern that lets agents submit long-running operations, go do other work, and check back when ready. No blocking. No timeouts. No hacky workarounds.

And honestly? This changes how we build production AI systems.

Let’s dig in.


The Problem: MCP Was Built for Quick Operations

Here’s the thing about the original MCP design: it assumed tools return results quickly. You call a tool, get a response, move on. Works great for:

  • Fetching a file
  • Running a database query
  • Making an API request

But what about:

  • Analyzing 500,000 molecules through multiple inference models (30-60 minutes)
  • Running comprehensive test suites across distributed environments (hours)
  • Deep research operations requiring recursive web searches and synthesis
  • Enterprise batch processing with hundreds of concurrent jobs
  • Code migration analysis across massive codebases

Current MCP? You’re stuck. Clients don’t know if the response will ever arrive. Servers can’t signal progress. Agents can’t do anything else while waiting. The whole interaction model breaks down.

MCP Core Maintainers heard this loud and clear from real production teams, especially in healthcare, life sciences, and enterprise automation. The feedback wasn’t just “this would be nice.” It was “we literally can’t use MCP for our use cases without this.”

So they built it.


SEP-1686: Tasks as a Core Primitive

The accepted proposal introduces tasks as a first-class concept in MCP. They’re not tool-specific, not client-specific. They’re a generic primitive that works uniformly across all request types.

Here’s the mental model: instead of a tool call that blocks until completion, you submit a task. The server immediately acknowledges receipt, assigns it a task ID, and starts processing in the background. The client can then:

  • Poll for status updates
  • Check progress
  • Retrieve results when ready
  • Cancel if needed
  • Do other work in the meantime

It’s the difference between calling a restaurant and waiting on hold until your food is ready versus placing an online order, going about your day, and picking it up when the notification comes through.

Simple. Powerful. Production-ready.


How It Works: The Call-Now, Fetch-Later Pattern

Let’s walk through the actual flow.

Step 1: Request Augmentation with _meta Field

When a client wants to execute a tool as a long-running task, it augments the request with metadata using the _meta field:

{
  "method": "tools/call",
  "params": {
    "name": "analyze_molecules",
    "arguments": {
      "library_id": "chem_lib_42",
      "models": ["inference_v1", "inference_v2"]
    },
    "_meta": {
      "modelcontextprotocol.io/task": {
        "taskId": "task_abc123",
        "keepAlive": 3600
      }
    }
  }
}

Notice what’s happening here:

  • taskId: Client-generated unique identifier. This enables idempotent retries. If the network fails, you can safely retry with the same ID.
  • keepAlive: How long (in seconds) the server should retain results after completion. Set to null for indefinite retention.

The beauty of using _meta? It’s backward compatible. Servers that don’t understand task metadata just ignore it and process the request normally. No breaking changes.

Step 2: Task Creation Notification

The server immediately responds with acknowledgment and sends a notification:

{
  "method": "notifications/tasks/created",
  "params": {
    "taskId": "task_abc123",
    "status": "submitted",
    "pollFrequency": 30
  }
}

This notification signals: “Your task is created. You can start polling now. I recommend checking every 30 seconds.”

This solves a subtle race condition. Without this notification, a client might start polling before the server finishes task setup. The notification acts as a synchronization point.

Step 3: Status Polling with tasks/get

The client periodically checks task status:

{
  "method": "tasks/get",
  "params": {
    "taskId": "task_abc123"
  }
}

Response:

{
  "taskId": "task_abc123",
  "status": "working",
  "progress": {
    "completed": 125000,
    "total": 500000,
    "message": "Processing molecular structure 125000/500000"
  },
  "pollFrequency": 30
}

Step 4: Result Retrieval with tasks/result

Once status shows “completed”, the client fetches results:

{
  "method": "tasks/result",
  "params": {
    "taskId": "task_abc123"
  }
}

Response:

{
  "taskId": "task_abc123",
  "status": "completed",
  "result": {
    "analysis": {
      "candidates": 47,
      "high_confidence": 12,
      "recommended_next_steps": [...]
    }
  }
}

The agent gets its answer. The task is done. Everything worked without blocking for 45 minutes.
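
Putting the four steps together, the client side is only a few lines. Here's a minimal sketch in Python. It assumes a hypothetical send_request helper that speaks JSON-RPC to the server; that helper, the hardcoded defaults, and the sleep-based loop are illustrative, not part of the spec:

import time
import uuid

def run_tool_as_task(send_request, tool_name, arguments, keep_alive=3600):
    # Client-generated ID enables idempotent retries
    task_id = f"task_{uuid.uuid4().hex}"

    # Step 1: augment the tools/call request with task metadata
    send_request("tools/call", {
        "name": tool_name,
        "arguments": arguments,
        "_meta": {
            "modelcontextprotocol.io/task": {
                "taskId": task_id,
                "keepAlive": keep_alive,
            }
        },
    })

    # Steps 2-3: poll until the task reaches a terminal state
    poll_frequency = 30
    while True:
        status = send_request("tasks/get", {"taskId": task_id})
        if status["status"] in ("completed", "failed", "cancelled", "unknown"):
            break
        # Honor the server's recommended polling interval if present
        poll_frequency = status.get("pollFrequency", poll_frequency)
        time.sleep(poll_frequency)

    # Step 4: fetch the result once the task is done
    if status["status"] == "completed":
        return send_request("tasks/result", {"taskId": task_id})
    raise RuntimeError(f"Task {task_id} ended in state {status['status']}")

In a real agent you'd interleave other work between polls instead of sleeping, but the shape of the interaction is the same.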


Task Lifecycle: From Submitted to Terminal States

Tasks move through a well-defined state machine:

Initial State:

  • submitted — Task created, queued for execution

Active States:

  • working — Currently processing
  • input_required — Needs additional information from the client (think: human-in-the-loop scenarios)

Terminal States:

  • completed — Successfully finished
  • failed — Execution error occurred
  • cancelled — Client requested cancellation
  • unknown — Unexpected condition (server crashed, etc.)

The spec enforces strict transition rules. You can’t go from completed back to working. Tasks can’t jump states arbitrarily. This predictability matters for building reliable systems.
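
A simple way to enforce those rules server-side is an explicit transition table. Here's a minimal sketch; the exact edge set is my reading of the lifecycle above, not normative spec text:

ALLOWED_TRANSITIONS = {
    "submitted": {"working", "cancelled", "failed"},
    "working": {"input_required", "completed", "failed", "cancelled"},
    "input_required": {"working", "cancelled", "failed"},
    # Terminal states: no outgoing edges
    "completed": set(),
    "failed": set(),
    "cancelled": set(),
    "unknown": set(),
}

def transition(task, new_status):
    # Reject anything the table doesn't explicitly allow
    if new_status not in ALLOWED_TRANSITIONS[task["status"]]:
        raise ValueError(f"Illegal transition: {task['status']} -> {new_status}")
    task["status"] = new_status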


Core Operations: The Full API

Beyond basic status and result retrieval, the task API includes:

tasks/list — Enumerate All Tasks

{
  "method": "tasks/list",
  "params": {
    "status": "working",
    "limit": 50,
    "cursor": "page_2"
  }
}

Returns a paginated list of tasks matching your criteria. Perfect for dashboards showing all active operations.
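
Consuming that endpoint from a client is a standard cursor loop. A quick sketch, reusing the hypothetical send_request helper from earlier; the nextCursor field name is an assumption:

def list_all_tasks(send_request, status=None):
    tasks, cursor = [], None
    while True:
        params = {"limit": 50}
        if status:
            params["status"] = status
        if cursor:
            params["cursor"] = cursor
        page = send_request("tasks/list", params)
        tasks.extend(page["tasks"])
        cursor = page.get("nextCursor")  # pagination field name assumed
        if not cursor:
            return tasks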

tasks/delete — Cleanup Completed Tasks

{
  "method": "tasks/delete",
  "params": {
    "taskId": "task_abc123"
  }
}

Explicitly removes task and results. For long-running servers managing many tasks, this is important. You don’t want completed results sitting around indefinitely consuming memory.
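
Beyond explicit deletion, one way to avoid leaking results is a periodic sweeper that applies each task's keepAlive once it reaches a terminal state. A minimal sketch; the registry shape (a dict with finishedAt timestamps) is illustrative:

import time

def sweep_expired(tasks):
    """Remove terminal tasks whose keepAlive window has elapsed."""
    now = time.time()
    for task_id, task in list(tasks.items()):
        keep_alive = task.get("keepAlive")   # None means retain indefinitely
        finished_at = task.get("finishedAt")
        if keep_alive is None or finished_at is None:
            continue
        if now - finished_at > keep_alive:
            del tasks[task_id]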

Graceful Degradation

What if the server doesn’t support tasks? The client just falls back to synchronous execution. No capability negotiation needed. No complex handshakes. It just works.

This backward compatibility means you can adopt tasks incrementally without breaking existing deployments.
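
On the client side, the fallback can be a single branch. This sketch assumes a task-aware server acknowledges with a taskId while a task-unaware server ignores _meta and returns the tool result directly; poll_for_result stands in for the polling loop sketched earlier:

def call_tool(send_request, params):
    response = send_request("tools/call", params)
    # Task-aware servers acknowledge with a taskId; task-unaware
    # servers ignored _meta and already returned the full result
    if isinstance(response, dict) and "taskId" in response:
        return poll_for_result(send_request, response["taskId"])
    return response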


Security Considerations: Production-Ready from Day One

MCP Core Maintainers didn’t just design a functional system. They designed a secure one. Here’s what the spec requires:

1. Session Scoping

Tasks must be scoped to their originating session and authentication context. A task created by User A cannot be accessed by User B, even if they somehow guess the task ID.

This prevents lateral movement attacks and ensures multi-tenant isolation.
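
In practice that means keying the task registry by session as well as task ID. A minimal sketch:

class ScopedTaskRegistry:
    def __init__(self):
        # Keyed by (session_id, task_id): a guessed ID from another
        # session never resolves
        self._tasks = {}

    def create(self, session_id, task_id, task):
        self._tasks[(session_id, task_id)] = task

    def get(self, session_id, task_id):
        task = self._tasks.get((session_id, task_id))
        if task is None:
            # Same error whether the task doesn't exist or belongs to
            # someone else: don't leak existence across tenants
            raise KeyError("task not found")
        return task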

2. Rate Limiting

Servers should enforce limits on:

  • Task creation rate (prevent flood attacks)
  • Polling frequency (prevent DoS via rapid status checks)
  • Concurrent tasks per requestor (resource exhaustion protection)

The spec recommends tracking these metrics and returning appropriate errors when limits are exceeded.
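
A sliding-window counter per requestor covers the first two limits. A rough sketch; the specific limits are arbitrary illustrations, not spec recommendations:

import time
from collections import defaultdict, deque

class RateLimiter:
    def __init__(self, max_events, window_seconds):
        self.max_events = max_events
        self.window = window_seconds
        self.events = defaultdict(deque)

    def allow(self, requestor_id):
        now = time.time()
        q = self.events[requestor_id]
        # Evict events that fell out of the window
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.max_events:
            return False
        q.append(now)
        return True

# Illustrative limits: 10 creations and 120 polls per requestor per minute
creation_limiter = RateLimiter(10, 60)
polling_limiter = RateLimiter(120, 60)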

3. KeepAlive Duration Management

The keepAlive parameter controls result retention. But here’s the security implication. If you set it too long, sensitive results sit around longer than necessary. If too short, legitimate clients might lose data before retrieval.

Best practice: Retrieve sensitive results promptly rather than relying on indefinite retention. For less sensitive data, longer durations improve reliability.

4. Audit Logging

Production systems should log task lifecycle events:

  • Creation (who, when, what)
  • Status transitions
  • Result retrieval
  • Deletion

This audit trail is essential for debugging, compliance, and security forensics.
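
Structured, append-only log lines are enough to reconstruct a task's history later. A minimal sketch using only the standard library; the field names are my own:

import json
import logging
import time

audit_log = logging.getLogger("mcp.task_audit")

def audit(event, task_id, session_id, **details):
    # One JSON object per line keeps the trail machine-parseable
    audit_log.info(json.dumps({
        "ts": time.time(),
        "event": event,  # created | transition | result_read | deleted
        "taskId": task_id,
        "sessionId": session_id,
        **details,
    }))

# e.g. audit("transition", "task_abc123", "sess_42",
#            from_status="working", to_status="completed")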


Real-World Use Cases: Why This Matters

The proposal includes six customer scenarios that drove the design. These aren’t hypothetical — they’re real requirements from production teams.

1. Healthcare Analytics

The Problem: Pharmaceutical companies analyze molecular properties to predict drug interactions. Each analysis processes hundreds of thousands of data points through multiple inference models simultaneously.

Time Required: Small molecules take 30-60 minutes. Large molecules with complex simulations can take several hours.

Why Tasks Matter: Researchers can submit analysis jobs, continue working on other compounds, and check results when ready. No blocking. No connection timeouts. The AI agent orchestrates multiple parallel analyses without manual intervention.

2. Enterprise Automation

The Problem: Financial services firms run batch processes across distributed systems: data validation, compliance checks, report generation. These processes run nightly and take hours.

Why Tasks Matter: An AI agent can dispatch dozens of concurrent jobs, monitor their progress, and aggregate results when complete. The agent doesn’t need to maintain open connections for hours. It polls periodically and handles results as they arrive.

3. Code Migration at Scale

The Problem: Migrating legacy codebases (think: millions of lines) from one framework to another. Static analysis, dependency resolution, and transformation generation take significant time.

Why Tasks Matter: Without deterministic polling, LLMs tend to hallucinate progress or make assumptions about completion. With tasks, the agent gets real status updates from the migration tool. No guessing. No hallucinations. Just facts.

4. Test Execution Platforms

The Problem: Running comprehensive test suites across distributed environments. Unit tests, integration tests, end-to-end tests. The full matrix.

Why Tasks Matter: The agent kicks off the test run, streams logs concurrently (via separate resources), and retrieves the final report when complete. Developers get real-time visibility into test execution without blocking the agent’s other workflows.

5. Deep Research Operations

The Problem: Research agents perform recursive web searches, synthesize information from multiple sources, and generate comprehensive reports. This can take 10-30 minutes per query.

Why Tasks Matter: The agent can initiate research in parallel across multiple topics, continue answering user questions, and present results as they become available. User experience improves dramatically.

6. Multi-Agent Systems

The Problem: In multi-agent architectures, one agent’s work often depends on another’s completion. If Agent A blocks waiting for Agent B, and Agent B blocks waiting for Agent C, you end up with cascading delays.

Why Tasks Matter: Agents submit work asynchronously and poll for completion. The system stays responsive. No deadlocks. No cascading delays. Clean orchestration.


Future Improvements: What’s Coming Next

The current spec is powerful, but the community already sees opportunities for enhancement.

Callback/Webhook Mechanisms

David Soria Parra (Member of Technical Staff at Anthropic and MCP co-creator) mentioned in community discussions that push notifications for task completion are being explored.

Instead of clients polling every 30 seconds, the server could push a notification when the task completes. This reduces unnecessary network traffic and improves latency.

The idea is webhook-style callbacks where the client registers a callback URL during task creation, and the server POSTs results when ready.

Intermediate Result Streaming

Some long-running tasks produce partial results over time. Imagine a code generation task that outputs functions as they’re completed, rather than waiting for the entire codebase.

Future iterations might support streaming intermediate results while the task continues processing.

Nested Task Execution

Complex workflows often involve hierarchical dependencies. Task A spawns Task B and Task C, which each spawn their own subtasks.

The spec leaves room for nested task execution, enabling agents to build sophisticated workflow DAGs.


Community Discussion: Polling vs Callbacks at Scale

The MCP community has debated the tradeoffs between polling-based and callback-based approaches.

Polling Pros:

  • Simple client implementation
  • No firewall/NAT issues
  • Works across all network topologies
  • Client controls polling frequency

Polling Cons:

  • Increased network traffic
  • Higher latency (bound by poll frequency)
  • Server load from frequent status checks

Callback Pros:

  • Immediate notification on completion
  • Lower network overhead
  • Better latency

Callback Cons:

  • Complex client implementation
  • Firewall/NAT traversal challenges
  • Security implications (exposing client endpoints)

The current spec starts with polling because it’s universally compatible and simpler to implement. Future enhancements can add callbacks for clients that support them, maintaining backward compatibility.

Andrew Jefferson and others in the community have discussed scaling patterns where hybrid approaches work well — polling for initial status checks, with optional callback registration for completion events.


Implementation Patterns: Building with Tasks Today

If you’re implementing long-running task support in your MCP server, here are proven patterns:

Pattern 1: In-Memory Task Registry

For simple cases, maintain task state in memory:

import threading
import time

class TaskRegistry:
    def __init__(self):
        self.tasks = {}
        self.lock = threading.Lock()  # guard concurrent access from poll threads

    def create_task(self, task_id, operation):
        with self.lock:
            self.tasks[task_id] = {
                'id': task_id,
                'status': 'submitted',
                'operation': operation,
                'result': None,
                'created_at': time.time()
            }

        # Start background execution in a daemon thread
        threading.Thread(
            target=self._execute_task,
            args=(task_id,),
            daemon=True
        ).start()

        return self.tasks[task_id]

    def get_status(self, task_id):
        with self.lock:
            return self.tasks.get(task_id)

    def _execute_task(self, task_id):
        task = self.tasks[task_id]
        task['status'] = 'working'

        try:
            result = task['operation']()
            # Store the result before flipping status, so a poller
            # never sees 'completed' without a result attached
            task['result'] = result
            task['status'] = 'completed'
        except Exception as e:
            task['status'] = 'failed'
            task['error'] = str(e)
Pattern 2: Persistent Task Queue

For production systems, use a proper task queue:

from celery import Celery
from celery.result import AsyncResult

# A result backend is required so status and results can be queried later
app = Celery('mcp_tasks', broker='redis://localhost', backend='redis://localhost')

@app.task
def analyze_molecules(library_id, models):
    # run_analysis is a placeholder for the real long-running work
    results = run_analysis(library_id, models)
    return results

def create_task(task_id, tool_name, arguments):
    # Dispatch to Celery, reusing the client-supplied task ID
    analyze_molecules.apply_async(
        args=[arguments['library_id'], arguments['models']],
        task_id=task_id
    )

    return {
        'taskId': task_id,
        'status': 'submitted'
    }

def get_status(task_id):
    async_result = AsyncResult(task_id, app=app)

    if async_result.ready():
        if async_result.successful():
            return {'status': 'completed', 'result': async_result.result}
        else:
            return {'status': 'failed', 'error': str(async_result.info)}
    else:
        return {'status': 'working'}

Pattern 3: External Workflow Orchestration

For complex multi-step workflows, integrate with workflow engines:

import { WorkflowClient, WorkflowNotFoundError } from '@temporalio/client';

async function createTask(taskId: string, workflowType: string, args: any) {
  const client = new WorkflowClient();

  // Start a Temporal workflow, keyed by the MCP task ID
  await client.start(workflowType, {
    taskQueue: 'mcp-tasks',
    workflowId: taskId,
    args: [args]
  });

  return {
    taskId,
    status: 'submitted'
  };
}

async function getStatus(taskId: string) {
  const client = new WorkflowClient();
  const handle = client.getHandle(taskId);

  try {
    // describe() reports execution state without blocking on completion;
    // handle.result() would block until the workflow finishes
    const description = await handle.describe();

    if (description.status.name === 'RUNNING') {
      return { status: 'working' };
    }
    if (description.status.name === 'COMPLETED') {
      // Only await the result once the workflow has finished
      return { status: 'completed', result: await handle.result() };
    }
    return { status: 'failed' };
  } catch (err) {
    if (err instanceof WorkflowNotFoundError) {
      return { status: 'unknown' };
    }
    throw err;
  }
}

This pattern separates concerns. Your MCP server becomes a lightweight client to a robust workflow engine that handles execution, retries, and fault tolerance.


Observability: Monitoring Task Execution in Production

Long-running tasks need visibility. When a molecular analysis takes 45 minutes, you need to know:

  • Is it actually running or stuck?
  • How far along is it?
  • Are there resource bottlenecks?
  • What’s the completion rate?

Key Metrics to Track

Task Creation Rate: How many tasks per minute are being created? Spikes might indicate automation gone wrong or potential abuse.

Task Duration Distribution: What’s the p50, p90, p99 completion time? This helps set appropriate pollFrequency recommendations.

Status Transition Times: How long do tasks spend in each state? Long times in submitted suggest queue backlog. Long times in working might indicate performance issues.

Completion Success Rate: What percentage of tasks complete successfully vs fail or timeout?

KeepAlive Retention: Are clients retrieving results before expiration? High expiration rates suggest clients aren’t polling frequently enough.
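
Most of these fall out of a few counters and histograms. A sketch using prometheus_client; the metric names are my own, not an established convention:

from prometheus_client import Counter, Gauge, Histogram

TASKS_CREATED = Counter(
    "mcp_tasks_created_total", "Tasks created", ["tool"])
TASKS_FINISHED = Counter(
    "mcp_tasks_finished_total", "Tasks reaching a terminal state",
    ["tool", "status"])
TASK_DURATION = Histogram(
    "mcp_task_duration_seconds", "Submission-to-terminal-state duration",
    ["tool"])
ACTIVE_TASKS = Gauge(
    "mcp_tasks_active", "Tasks currently submitted or working")

def on_task_created(tool):
    TASKS_CREATED.labels(tool=tool).inc()
    ACTIVE_TASKS.inc()

def on_task_finished(tool, status, duration_seconds):
    TASKS_FINISHED.labels(tool=tool, status=status).inc()
    TASK_DURATION.labels(tool=tool).observe(duration_seconds)
    ACTIVE_TASKS.dec()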

For deeper observability beyond basic metrics, platforms like Agnost AI provide purpose-built monitoring for MCP servers. They track task execution, tool invocation patterns, agent reasoning, and end-to-end conversation flows. When you need to understand why an agent made a specific decision or how it utilized task results, having the full context is invaluable.


Adoption Path: Adding Tasks to Your MCP Server

If you maintain an MCP server, here’s how to add task support:

Step 1: Identify Long-Running Tools

Audit your existing tools. Which ones take more than a few seconds? Those are candidates for task-based execution.

Examples:

  • Database queries on large datasets
  • External API calls with unpredictable latency
  • Compute-intensive operations
  • Batch processing

Step 2: Implement Task Registry

Choose your backend:

  • In-memory: Simple, good for development and low-traffic servers
  • Redis: Persistent, supports distributed servers, great for production
  • Database: Full auditability, good for compliance requirements
  • Workflow Engine (Temporal/Airflow): Enterprise-grade reliability

Step 3: Add Task Endpoints

Implement the core operations:

  • tasks/get — Status lookup
  • tasks/result — Result retrieval
  • tasks/list — Task enumeration
  • tasks/delete — Cleanup

Step 4: Update Tool Handlers

Modify tool handlers to check for _meta.modelcontextprotocol.io/task. If present, create task and return acknowledgment. If not, execute synchronously for backward compatibility.

def handle_tool_call(method, params):
    task_meta = params.get('_meta', {}).get('modelcontextprotocol.io/task')

    if task_meta:
        # Task-based execution: acknowledge now, run in the background
        task_id = task_meta['taskId']
        keep_alive = task_meta.get('keepAlive')

        task = create_task(task_id, method, params['arguments'], keep_alive)
        send_notification('notifications/tasks/created', task)

        return {'taskId': task_id, 'status': task['status']}
    else:
        # Synchronous execution (backward compatible)
        return execute_synchronously(method, params['arguments'])

Step 5: Add Observability

Instrument task lifecycle events with your observability platform. Track creation, transitions, completion, and retrieval.

Step 6: Test with Real Workloads

Before production:

  • Test task creation under load
  • Verify status polling works correctly
  • Confirm keepAlive expiration behavior
  • Validate error handling (what happens if execution fails?)
  • Check multi-client scenarios (session isolation)

Based on your task characteristics, recommend appropriate polling intervals in the pollFrequency field. Too frequent wastes resources. Too infrequent increases perceived latency.

A good heuristic: set pollFrequency to 10% of expected task duration. A 300-second task? Recommend polling every 30 seconds.
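
In code, that heuristic is one line plus clamping so the recommendation stays sane at both extremes. The bounds here are arbitrary:

def recommend_poll_frequency(expected_duration_seconds):
    # 10% of expected duration, clamped to [5s, 300s]
    return max(5, min(int(expected_duration_seconds * 0.10), 300))

recommend_poll_frequency(300)   # -> 30
recommend_poll_frequency(7200)  # -> 300 (cap for multi-hour jobs)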


A Note on Status: Coming Soon

Here’s the thing though. The proposal is officially accepted. The direction is locked in. But it’s not in a released version of MCP yet. This is still in development.

That said, the acceptance by MCP Core Maintainers is significant. It means the core mechanics are being built into the next version. The community is discussing implementation details. The direction is clear.

If you’re planning your MCP server architecture, you should be thinking about these patterns now. Design with tasks in mind. When the feature lands, your server will be ready.


The Bottom Line

Long-running task support isn’t just a nice-to-have feature. It’s fundamental to production AI systems.

Without it, MCP was limited to quick operations. With it, you can build:

  • Healthcare AI analyzing complex datasets for hours
  • Enterprise automation orchestrating overnight batch processes
  • Research agents conducting deep multi-source investigations
  • Multi-agent systems with clean asynchronous coordination
  • Code analysis tools processing massive codebases

SEP-1686 gives us a clean, secure, backward-compatible way to handle all of this. The “call-now, fetch-later” pattern feels natural. The task lifecycle is well-defined. The security considerations are baked in.

And the best part? Graceful degradation means you can adopt this incrementally. Old clients keep working. New clients get superpowers.

MCP just became production-ready for an entirely new class of applications.

If you’re building AI agents that do real work, not just toy demos, this is the foundation you need. Start with polling. Add callbacks later if you need them. Build something amazing.

The ecosystem is moving fast. The community is engaged. The maintainers are responsive.

This is where AI infrastructure gets interesting.


Key Takeaway: SEP-1686 introduces a task primitive to MCP that enables hours-long operations through a call-now, fetch-later pattern. With proper lifecycle management, security scoping, and graceful degradation, production AI systems can now handle healthcare analytics, enterprise automation, and multi-agent orchestration without blocking or timeouts.



Join the MCP Community

Building production AI systems with MCP? You’re not alone. Join our Discord community to connect with other developers implementing long-running tasks, share patterns, debug tricky scenarios, and stay current on protocol updates.

We also built Agnost AI to give you real-time visibility into your MCP servers — task execution, tool invocation patterns, performance metrics, and conversation flows. Check it out at docs.agnost.ai or book a demo at call.agnost.ai.


Building MCP servers that handle real production workloads? We’d love to help. Let’s talk.