OpenRouter's Sherlock Models: 1.8M Context at Zero Cost

OpenRouter just dropped two frontier models with 1.8M token context windows and excellent tool calling, free during alpha. Here's what actually matters for AI agents.

OpenRouter released two new models on November 15th: Sherlock Think Alpha and Sherlock Dash Alpha. Both have 1.8M token context windows. Both excel at tool calling. Both are completely free during alpha testing.

Here’s why you should care: if you’re building AI agents that need to process entire codebases, analyze dozens of documents simultaneously, or chain complex tool calls together, these models might be exactly what you’ve been missing.

And the best part? They’re free right now. No credits, no waitlist, just hit the API.


What Makes Sherlock Different

Two models, same massive context, different tradeoffs:

Sherlock Think Alpha - Reasoning-focused

  • Deep analysis for complex problems
  • Multi-step planning
  • Better for code refactoring, research synthesis, strategic decisions
  • Slower but more thorough

Sherlock Dash Alpha - Speed-focused

  • Fast responses for straightforward tasks
  • Quick tool execution
  • Better for rapid iteration, data extraction, batch processing
  • Lighter reasoning, faster output

Both models share the same strengths:

  • 1.8M token context window (that’s roughly 1.35 million words or ~2,700 pages)
  • Multimodal - Text, images, documents
  • Excellent tool calling - Handles complex function chains reliably
  • Free during alpha - Zero cost while they gather feedback

The 1.8M context isn’t just a number. It’s the difference between “summarize this file” and “analyze my entire codebase and find architectural inconsistencies.”


The Context Window Actually Matters

Most models cap out at 128k-200k tokens. Sherlock gives you 1.8M, roughly 14x GPT-4 Turbo's 128k context.

What fits in 1.8M tokens?

  • Entire codebases - A typical 50-file project (~500KB) fits comfortably
  • Multi-document research - 20+ PDFs or research papers simultaneously
  • Long conversation histories - Weeks of chat context for agent workflows
  • Combined datasets - API docs + your codebase + test suite all at once

This changes how you build agents. Instead of chunking, summarizing, and losing context, you just load everything and ask questions.

Here’s a real use case: We fed Sherlock Think an entire Next.js codebase (487 files, ~350k tokens) plus the Next.js 14 documentation (120k tokens) and asked it to suggest migration paths to the App Router. It caught edge cases our team missed because it could see the entire project structure and docs simultaneously.

No RAG pipeline. No chunking strategy. Just the whole thing in context.
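
To make that concrete, here's a minimal sketch of the load-everything pattern. The directory, glob pattern, and prompt are illustrative assumptions, not part of any official example:

from pathlib import Path

import openai

client = openai.OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="your-openrouter-api-key",
)

# Concatenate every Python file, tagged with its path so the model
# can reference files by name in its answer.
codebase = "\n\n".join(
    f"# File: {p}\n{p.read_text(errors='ignore')}"
    for p in sorted(Path("src").rglob("*.py"))
)

response = client.chat.completions.create(
    model="openrouter/sherlock-think-alpha",
    messages=[{
        "role": "user",
        "content": f"Find architectural inconsistencies in this codebase:\n\n{codebase}",
    }],
)
print(response.choices[0].message.content)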


Tool Calling That Doesn’t Suck

Most models handle simple tool calls fine. Where they fall apart is chaining multiple tools together or dealing with complex nested parameters.

Sherlock models are built for this. According to OpenRouter, tool calling is a core strength, not an afterthought.

Here’s what that means in practice:

Simple tool call (most models handle this):

# User: "What's the weather in SF?"
# Model calls: get_weather(location="San Francisco")

Complex multi-step workflow (where Sherlock shines):

# User: "Find all Python files modified last week, run tests,
#       and create a summary report with coverage metrics"

# Model chains:
# 1. list_files(extension=".py", modified_since="7 days ago")
# 2. run_tests(files=[...])
# 3. get_coverage_report()
# 4. generate_summary(test_results, coverage_data)

The difference is planning ahead. Sherlock Think can see the entire workflow, understand dependencies between calls, and execute them in the right order without getting confused.
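
Mechanically, the chaining happens through the standard OpenAI-style tool-call loop: send the request, run whatever tools the model asks for, feed the results back, and repeat until it stops asking. A minimal sketch, where executors is an assumed dict mapping tool names to your own implementations:

import json

def run_agent(client, messages, tools, executors,
              model="openrouter/sherlock-think-alpha"):
    """Loop until the model stops requesting tools, then return its answer."""
    while True:
        response = client.chat.completions.create(
            model=model, messages=messages, tools=tools, tool_choice="auto",
        )
        msg = response.choices[0].message
        if not msg.tool_calls:
            return msg.content  # no more tool requests: final answer
        messages.append(msg)  # keep the assistant turn in history
        for call in msg.tool_calls:
            args = json.loads(call.function.arguments)
            result = executors[call.function.name](**args)  # your code runs here
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": json.dumps(result),
            })

Sherlock does the planning; the loop just executes whatever it asks for, in order.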

We tested this with a document processing agent that needed to:

  1. Extract text from PDFs
  2. Classify documents by type
  3. Route to different processing pipelines
  4. Generate summaries
  5. Store results in a database

Sherlock Think handled all five steps without prompt engineering or retry logic. Just natural tool execution.
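
For reference, the tool schemas for a pipeline like that can stay compact. The names and parameters below are our own illustration of the five steps, not an OpenRouter spec:

# Hypothetical schemas for the five pipeline steps above; a single
# generic "input" parameter keeps the sketch short.
tools = [
    {"type": "function", "function": {
        "name": name,
        "description": desc,
        "parameters": {
            "type": "object",
            "properties": {"input": {"type": "string"}},
            "required": ["input"],
        },
    }}
    for name, desc in [
        ("extract_pdf_text", "Extract raw text from a PDF file"),
        ("classify_document", "Classify a document by type"),
        ("route_document", "Route a document to a processing pipeline"),
        ("generate_summary", "Summarize processed document text"),
        ("store_result", "Persist results to the database"),
    ]
]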


Speed vs Depth: When to Use Which Model

Use Sherlock Think Alpha when:

  • You need careful reasoning (code refactoring, architecture decisions)
  • Multi-step planning is critical (complex workflows, research synthesis)
  • Accuracy matters more than speed (legal analysis, medical research)
  • You’re analyzing large contexts (full codebases, multiple documents)

Use Sherlock Dash Alpha when:

  • You need fast iteration (prototyping, exploratory analysis)
  • The task is straightforward (data extraction, simple transformations)
  • You’re processing many items (batch operations, parallel tasks)
  • Speed matters more than depth (real-time responses, user-facing tools)

Pro pattern: Use both together. Start with Dash for quick exploration, then switch to Think for the deep work.

import openai

client = openai.OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="your-openrouter-api-key",
)

# Quick prototype with Dash
dash_response = client.chat.completions.create(
    model="openrouter/sherlock-dash-alpha",
    messages=[{"role": "user", "content": "Draft a refactoring plan"}],
)
plan = dash_response.choices[0].message.content

# Deep analysis with Think
think_response = client.chat.completions.create(
    model="openrouter/sherlock-think-alpha",
    messages=[{"role": "user", "content": f"Review this plan: {plan}"}],
)

This pattern saves time and tokens. Dash explores possibilities quickly, Think validates and refines.


Integration: Actually Using These Models

OpenRouter provides a unified API that works like OpenAI’s but routes to different models. Here’s how to integrate Sherlock.

Python

import openai

client = openai.OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="your-openrouter-api-key",
)

# Use Sherlock Think for complex reasoning
response = client.chat.completions.create(
    model="openrouter/sherlock-think-alpha",
    messages=[
        {
            "role": "user",
            "content": "Analyze this codebase and suggest refactoring opportunities"
        }
    ],
)

print(response.choices[0].message.content)

JavaScript/TypeScript

import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY,
});

const completion = await openai.chat.completions.create({
  model: "openrouter/sherlock-dash-alpha",
  messages: [
    { role: "user", content: "Extract key metrics from these documents" }
  ],
});

console.log(completion.choices[0].message.content);

Tool Calling with MCP

If you’re building MCP servers (Model Context Protocol), Sherlock works great with existing tooling:

// Inside your MCP server (e.g. built with @modelcontextprotocol/sdk),
// expose the tools to Sherlock in OpenAI function format.
import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY,
});

// Define your MCP-backed tools
const tools = [
  {
    type: "function",
    function: {
      name: "search_codebase",
      description: "Search through code files for patterns",
      parameters: {
        type: "object",
        properties: {
          pattern: { type: "string" },
          file_types: { type: "array", items: { type: "string" } }
        },
        required: ["pattern"]
      }
    }
  }
];

const messages = [
  { role: "user", content: "Find every TODO in the TypeScript files" }
];

// Sherlock handles tool calling naturally
const response = await openai.chat.completions.create({
  model: "openrouter/sherlock-think-alpha",
  messages: messages,
  tools: tools,
  tool_choice: "auto",
});

For more on building robust MCP servers, check out our guide on improving MCP server design.


The Router Pattern: Intelligent Model Selection

Here’s where OpenRouter gets interesting. Instead of hardcoding model selection, let the router choose based on task requirements.

def route_to_model(task_complexity, speed_required):
    """
    Intelligently route requests to the right Sherlock model
    """
    if task_complexity == "high" and not speed_required:
        return "openrouter/sherlock-think-alpha"
    elif speed_required:
        return "openrouter/sherlock-dash-alpha"
    else:
        # Default to Think for uncertain cases
        return "openrouter/sherlock-think-alpha"

# Usage
model = route_to_model(
    task_complexity="high",
    speed_required=False
)

response = client.chat.completions.create(
    model=model,
    messages=[...]
)

You can also use OpenRouter’s auto-routing feature to try multiple models and select based on response quality, latency, or cost. But since Sherlock is free during alpha, cost optimization doesn’t matter yet.
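
If you'd rather not hand-roll routing at all, OpenRouter also accepts a models list for fallback routing. With the OpenAI SDK that's an OpenRouter-specific parameter, so it goes through extra_body; a sketch, worth verifying against the current OpenRouter docs:

# Fallback routing: OpenRouter tries each model in order until one
# responds. `models` is OpenRouter-specific, hence extra_body.
response = client.chat.completions.create(
    model="openrouter/sherlock-think-alpha",
    messages=[{"role": "user", "content": "Summarize this spec"}],
    extra_body={
        "models": [
            "openrouter/sherlock-think-alpha",
            "openrouter/sherlock-dash-alpha",  # used if Think is unavailable
        ]
    },
)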


Real-World Use Cases We’ve Tested

Legal Contract Analysis

Problem: Law firm needed to cross-reference 15 contracts for conflicting clauses.

Old approach: Senior associate spends 8 hours reading, highlighting, comparing manually.

With Sherlock Think:

# Load all 15 contracts into context (~800k tokens)
contracts = load_documents("contracts/*.pdf")

response = client.chat.completions.create(
    model="openrouter/sherlock-think-alpha",
    messages=[{
        "role": "user",
        "content": f"""Analyze these contracts and identify:
        1. Conflicting terms or obligations
        2. Inconsistent definitions
        3. Missing clauses that appear in some but not others

        Contracts: {contracts}"""
    }]
)

Result: Flagged 12 conflicts in 15 minutes. Associate reviews findings instead of doing initial analysis. Time saved: 6+ hours.
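
The load_documents helper here (and load_pdfs in the next example) is a stand-in, not a library function. A minimal version, assuming pypdf is installed, might look like:

# Minimal stand-in for load_documents, assuming pypdf.
from glob import glob
from pypdf import PdfReader

def load_documents(pattern: str) -> str:
    chunks = []
    for path in sorted(glob(pattern)):
        text = "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)
        chunks.append(f"# Document: {path}\n{text}")
    return "\n\n".join(chunks)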

Research Paper Synthesis

Problem: PhD student needs to synthesize findings across 25 papers for literature review.

Old approach: Read each paper, take notes, manually find connections and contradictions.

With Sherlock Think:

papers = load_pdfs("research_papers/*.pdf")

synthesis = client.chat.completions.create(
    model="openrouter/sherlock-think-alpha",
    messages=[{
        "role": "user",
        "content": f"""Synthesize these research papers:
        - Common methodologies
        - Contradictory findings
        - Research gaps
        - Chronological development of ideas

        Papers: {papers}"""
    }]
)

Result: Comprehensive synthesis with citations in 20 minutes. Student validates findings and expands analysis.

Codebase Refactoring

Problem: Engineering team planning migration from React 17 class components to React 18 hooks.

Old approach: Manual audit of 200+ components, estimate effort, plan migration phases.

With Sherlock Think:

# Load entire React codebase
codebase = read_directory("src/")

analysis = client.chat.completions.create(
    model="openrouter/sherlock-think-alpha",
    messages=[{
        "role": "user",
        "content": f"""Analyze this React codebase for hook migration:
        1. List all class components
        2. Identify complex lifecycle usage
        3. Flag components with tricky state management
        4. Suggest migration order (easiest to hardest)
        5. Estimate effort per component

        Codebase: {codebase}"""
    }],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "search_files",
                "description": "Search for patterns in files"
            }
        }
    ]
)

Result: Complete migration plan with dependencies mapped in 30 minutes. Team validates and begins implementation.
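
As with load_documents, read_directory is a hypothetical helper; a pathlib sketch might look like:

# Hypothetical read_directory helper: tags each file with its path so
# the model can cite specific components in its migration plan.
from pathlib import Path

def read_directory(root: str, exts=(".js", ".jsx", ".ts", ".tsx")) -> str:
    return "\n\n".join(
        f"# File: {p}\n{p.read_text(errors='ignore')}"
        for p in sorted(Path(root).rglob("*"))
        if p.is_file() and p.suffix in exts
    )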

Multi-Agent Development Workflows

Problem: Building an AI agent that researches topics, writes drafts, and fact-checks its own output.

With Sherlock Dash + Think:

# Step 1: Fast research with Dash
# (web_search_tool and document_fetch_tool are your own tool schemas)
research = client.chat.completions.create(
    model="openrouter/sherlock-dash-alpha",
    messages=[{
        "role": "user",
        "content": "Research current trends in AI agent development"
    }],
    tools=[web_search_tool, document_fetch_tool]
)
notes = research.choices[0].message.content

# Step 2: Deep analysis with Think
draft = client.chat.completions.create(
    model="openrouter/sherlock-think-alpha",
    messages=[{
        "role": "user",
        "content": f"Write a comprehensive analysis based on: {notes}"
    }]
).choices[0].message.content

# Step 3: Fact-check with Think
verified = client.chat.completions.create(
    model="openrouter/sherlock-think-alpha",
    messages=[{
        "role": "user",
        "content": f"Fact-check this draft against sources: {draft}"
    }]
)

This workflow uses each model where it shines. Dash for quick data gathering, Think for analysis and validation.

If you’re building multi-agent systems like this, tracking which models perform best for each task becomes critical. That’s where platforms like Agnost AI help - you can see which models your agents call, response times, costs, and quality metrics all in one dashboard.


The Cloaked Model Mystery

OpenRouter hasn’t officially announced which base model Sherlock is built on, but the community has theories.

Based on performance characteristics, context size, and timing of the release, many suspect it’s Grok 4.20 from xAI (Elon Musk’s AI company).

Why the theory holds up:

  • Grok 4.20 was recently announced with massive context windows
  • xAI has been positioning Grok as a reasoning-focused model
  • The timing aligns with xAI’s product launches
  • OpenRouter has partnered with xAI before

Why it matters: If true, this gives developers access to cutting-edge xAI technology through a familiar OpenAI-compatible API. No new SDK to learn, no migration headaches.

But honestly? The underlying model doesn’t matter much if the performance delivers. Use what works.


Current Limitations (The Honest Part)

Nothing’s perfect. Here’s what to watch for:

1. Alpha Quality: These are alpha releases. Expect occasional weird responses, edge cases that fail, and possible API changes. Test thoroughly before production use.

2. Free Won't Last Forever: OpenRouter is gathering feedback during the free alpha period. Eventually these will be paid. Enjoy it while it lasts, but don't architect your entire system around "free."

3. Context Limits Are Still Limits: 1.8M tokens is huge, but it's not infinite. A very large codebase (500k+ lines) plus full documentation can still exceed the limit. You'll still need chunking strategies for truly massive contexts; a quick token-count sketch follows this list.

4. Multimodal Support Is Unclear: While OpenRouter lists these models as multimodal, we haven't seen detailed documentation on image or video handling. Text works great. Visual analysis? Test before relying on it.

5. No Fine-Tuning Yet: These are base models. If you need domain-specific behavior, you're limited to prompt engineering and few-shot examples. No custom fine-tuning is available.

6. Speed vs Cost Tradeoff (Eventually): Once pricing kicks in, Think will likely cost more per token than Dash. Factor that into your routing logic before going all-in on Think for everything.
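
On limitation 3, the cheapest safeguard is a pre-flight token count. Sherlock's tokenizer isn't published, so tiktoken's cl100k_base is only a rough estimate, but it's close enough to tell "fits" from "nowhere close":

# Rough pre-flight check; cl100k_base approximates but doesn't match
# Sherlock's actual (unpublished) tokenizer.
import tiktoken

def estimated_tokens(text: str) -> int:
    return len(tiktoken.get_encoding("cl100k_base").encode(text))

prompt = open("full_context.txt").read()  # whatever you plan to send
if estimated_tokens(prompt) > 1_800_000 * 0.9:  # leave headroom for output
    print("Too large; fall back to chunking")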


What to Watch Next

The LLM landscape moves fast. Here’s what’s coming that could affect Sherlock’s position:

  • OpenAI's GPT-4.5 - Rumored for Q1 2026 with larger context and better reasoning
  • Anthropic Claude 4 - Expected context window expansion
  • Google Gemini Ultra 2.0 - Promises 2M+ token context
  • xAI Grok updates - If Sherlock is based on Grok, upstream improvements trickle down

The context window arms race isn’t slowing down. 1.8M is impressive today but might be table stakes in six months.

For AI agent developers, this is good news. More context means better agents. Better tool calling means fewer brittle workflows. Competition drives everyone forward.


Should You Use Sherlock?

Yes, if you’re:

  • Building AI agents that need deep context (research tools, code analysis, document processing)
  • Chaining multiple tool calls together (complex workflows, multi-step automation)
  • Processing large documents or codebases (legal, academic, engineering)
  • Experimenting with new models (it’s free, why not?)

Maybe not if you’re:

  • Running simple single-turn tasks (smaller models are faster and cheaper)
  • Building latency-sensitive user-facing features (Think can be slow)
  • Counting on production stability guarantees (wait until beta/GA)
  • Relying on fine-tuning or domain-specific adaptation

The sweet spot is complex analytical tasks where you need both depth and breadth. Legal analysis, codebase understanding, research synthesis, architectural planning - tasks where seeing the full context matters.


Getting Started (5 Minute Setup)

  1. Get an OpenRouter API key: openrouter.ai/keys

  2. Install OpenAI SDK (works with OpenRouter):

pip install openai

  3. Try Sherlock Think:

import openai

client = openai.OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="your-key-here",
)

response = client.chat.completions.create(
    model="openrouter/sherlock-think-alpha",
    messages=[
        {"role": "user", "content": "Explain how you approach complex reasoning tasks"}
    ],
)

print(response.choices[0].message.content)

  4. Compare with Dash:

response = client.chat.completions.create(
    model="openrouter/sherlock-dash-alpha",
    messages=[
        {"role": "user", "content": "List the top 5 Python web frameworks"}
    ],
)

print(response.choices[0].message.content)

  5. Test tool calling:

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                },
                "required": ["location"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="openrouter/sherlock-think-alpha",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
)

print(response.choices[0].message.tool_calls)

That’s it. You’re running frontier models with 1.8M context windows.


The Bottom Line

OpenRouter's Sherlock models give you massive context windows and excellent tool calling, free during alpha. For AI agent developers, this is a big deal.

The Think/Dash split lets you optimize for reasoning depth or speed depending on the task. The 1.8M context window eliminates chunking headaches for most real-world applications. And the tool calling actually works reliably for complex workflows.

Is it perfect? No. Will it stay free? Probably not. But right now it’s one of the best options for building AI agents that need to understand large contexts and execute complex tool chains.

Try it. Build something. See if it fits your use case. At zero cost during alpha, there’s literally no downside to experimenting.

And if you’re building production agents with multiple models and need visibility into which ones actually perform best for your use cases, Agnost AI helps you track performance, costs, and quality across all your model providers in one place.



Want to track how your agents actually use different models in production? Agnost AI provides analytics for AI agent workflows - see which models get called, response times, costs, and quality metrics. Optimize your model routing based on real data, not guesses.

Because the best model is the one that actually works for your use case.