
Why Do AI Agents Speak English? The Case for Vector-Based Communication

A technical deep-dive into why we inherited natural language for agent-to-agent communication, the computational overhead it creates, and the emerging research on direct vector and latent space communication between AI agents.

Here’s a question that keeps me up at night: Why are we forcing AI agents to communicate through English when they think in 4096-dimensional vectors?

When Claude talks to another agent in a multi-agent system, it doesn’t think in English. It processes information as high-dimensional tensors, performs matrix operations, and navigates semantic spaces that have no direct linguistic analog. Yet we make it serialize those rich representations into tokens, ship them across the wire, and have the receiving agent reconstruct the semantic space from scratch.

It’s like two computers communicating by printing binary to paper, mailing it, then having the recipient OCR it back to binary.

This isn’t just inefficient. Recent research suggests we’re losing information, wasting compute, and artificially limiting what agents can coordinate about. Let’s dig into why we do this, what the alternatives are, and whether vector-based agent communication is actually viable.


The Historical Accident: Why English Became the Default

When we built the first LLM-based agents, the architecture was obvious: take GPT-3/Claude, give it tools, let it generate natural language responses. Multi-agent systems inherited this by default.

But here’s the thing: Natural language was optimized for human constraints, not computational ones. We have vocal cords that produce sequential phonemes. We parse language linearly. We need compositionality and recursion because our working memory tops out around 7 items.

LLMs have none of these constraints. They operate in continuous, high-dimensional spaces where semantic relationships are geometric. The embedding for “king” - “man” + “woman” actually equals something close to “queen” because meaning is encoded as direction in vector space.
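
You can check that geometry yourself. Here's a minimal sketch using gensim's pretrained word2vec vectors (the specific vector set is just a convenient example); with these vectors, "queen" typically comes back as the nearest neighbor.

# Toy demo of analogy arithmetic in embedding space. The vector set name is
# just an example available through gensim's downloader.
import gensim.downloader as api

vectors = api.load("word2vec-google-news-300")
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# With these vectors this typically prints [('queen', ~0.71)]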

So why English? Interpretability and legacy infrastructure. We needed humans to debug these systems. We already had prompt engineering techniques. The ecosystems (LangChain, agent frameworks, evaluation tools) all assumed text-based communication.

In a recent arXiv paper titled “Why do AI agents communicate in human language?”, researchers at UC Berkeley note:

“While this design supports interpretability and human oversight, it introduces fundamental limitations in agent-to-agent coordination, as the semantic space of natural language is structurally misaligned with the high-dimensional vector spaces in which LLMs operate.”

Translation: We’re using a lossy codec for agent communication because we designed for human observers, not agent efficiency.


The Efficiency Problem: Tokens Are Expensive

Let’s talk numbers. When agents communicate in natural language:

Token Overhead Is Brutal

  • A single agent-to-agent message might consume 150-500 tokens
  • Multi-agent systems can burn through thousands of tokens per task
  • Research shows some agent systems use 4.8M tokens to achieve 94% accuracy on GSM8K
  • Token costs scale with (number of agents) × (message length) × (conversation depth), as the back-of-envelope sketch below illustrates
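
To make that scaling concrete, here's a back-of-envelope sketch; every number in it is an illustrative assumption, not a measurement.

# Illustrative token-cost estimate for a text-based multi-agent run.
agents = 5
messages_per_agent = 20        # conversation depth
tokens_per_message = 300       # mid-range of the 150-500 estimate above
price_per_1k_tokens = 0.01     # assumed blended price in USD

total_tokens = agents * messages_per_agent * tokens_per_message
print(f"{total_tokens} tokens, ${total_tokens / 1000 * price_per_1k_tokens:.2f} per task")
# 30000 tokens, $0.30 per task, before retries or tool-call transcripts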

Inference Latency Compounds

  • Each message requires full forward pass through transformer layers
  • Token generation is sequential (autoregressive), not parallel
  • Multi-agent systems wait for complete message generation before the next agent acts

A recent study on token efficiency found that “reasoning enhancement approaches like Chain-of-Thought produce substantial token overhead due to detailed intermediate reasoning steps, which can lead to significant computational resource usage, longer running times, and increased monetary and energy costs.”

But the real killer isn’t cost. It’s information loss.


The Information Bottleneck: What We Lose in Translation

When an LLM generates natural language, it’s performing a nonlinear projection from high-dimensional semantic space to a discrete token sequence. This is inherently lossy.

Think about what happens during text generation:

  1. The model computes a probability distribution across 50k+ vocabulary tokens
  2. It samples (or argmax selects) exactly one token
  3. All the nuance in that probability distribution — the uncertainty, the alternatives, the semantic relationships — collapses to a single symbol

The CIPHER paper (Let Models Speak Ciphers: Multiagent Debate through Embeddings) demonstrates this concretely:

“The token sampling step needed when generating natural language poses a potential risk of information loss, as it uses only one token to represent the model’s belief across the entire vocabulary.”

Their solution? Skip the sampling. Have agents communicate via the raw expectation of the transformer output embeddings — basically, the probability-weighted average of all possible next tokens.

The results are striking: 0.5-5.0% accuracy improvement over natural language debate across multiple reasoning tasks, with the same model weights.

What Gets Lost in Natural Language?

When agents communicate through text, we lose:

Uncertainty Representation: “I think it’s probably X” collapses nuanced probability distributions into vague hedging language

Multi-modal Semantics: The vector representation might encode visual, spatial, or relational information that doesn’t cleanly map to words

Compositional Alternatives: The model might have multiple valid continuations with similar probabilities — natural language forces a single path

Relational Structure: Vector operations like analogies, interpolations, and projections are native to embedding space but clunky in language


What’s Actually Been Tried: The Research Frontier

The question “can agents communicate without language?” isn’t hypothetical. Multiple research groups have working implementations.

1. CIPHER: Multiagent Debate Through Embeddings (2023)

The first major breakthrough. Instead of sampling tokens, agents exchange embedding expectations — the weighted average of output embeddings before sampling.

How it works:

Traditional: hidden_state → sample_token → "Paris"
CIPHER: hidden_state → Σ(p(token) × embedding(token)) → continuous vector

The receiving agent incorporates this vector directly into its computation, preserving the full probability distribution.
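
Here's a minimal sketch of that expectation step, using a Hugging Face causal LM (GPT-2 purely as a stand-in); this illustrates the idea, not the CIPHER authors' code.

# Instead of sampling one token, send the probability-weighted average of the
# output embeddings: sum over p(token) * embedding(token).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
ids = tokenizer("The capital of France is", return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(ids).logits[:, -1, :]        # scores over the whole vocabulary
    probs = torch.softmax(logits, dim=-1)       # p(token)
    emb = model.get_input_embeddings().weight   # (vocab_size, hidden_dim)
    cipher_message = probs @ emb                # the continuous "message" vector

# cipher_message preserves the full distribution; sampling would have collapsed
# it to a single token like "Paris".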

Trade-off: Completely opaque to humans. You can’t log the communication. Debugging requires visualizing embedding space projections.

2. Communicating Activations Between Language Models (2025)

This year’s ICML paper takes a more radical approach: direct hidden state transfer.

Instead of having agents complete full forward passes independently, they (see the sketch after this list):

  1. Pause agent B’s computation at layer L
  2. Take agent A’s hidden state from a similar layer
  3. Combine them via a learned function
  4. Continue B’s forward pass with the merged state
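
A rough sketch of the idea with two copies of the same small model; the averaging step is a naive stand-in for the learned combination function described in the paper, and none of this is the authors' implementation.

# Graft agent A's layer-k activation into agent B's forward pass via a hook.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
agent_a = AutoModelForCausalLM.from_pretrained("gpt2")
agent_b = AutoModelForCausalLM.from_pretrained("gpt2")
layer_k = 6

# Steps 1-2: run A and grab its hidden state at (roughly) layer k.
ids_a = tok("Context only agent A has seen.", return_tensors="pt").input_ids
with torch.no_grad():
    h_a = agent_a(ids_a, output_hidden_states=True).hidden_states[layer_k]

# Steps 3-4: hook B's layer k so the rest of its forward pass runs on a merged state.
def merge(module, inputs, output):
    hidden = output[0]
    mixed = 0.5 * hidden + 0.5 * h_a.mean(dim=1, keepdim=True)  # naive stand-in merge
    return (mixed,) + output[1:]

handle = agent_b.transformer.h[layer_k].register_forward_hook(merge)
with torch.no_grad():
    out_b = agent_b(tok("Agent B's own prompt:", return_tensors="pt").input_ids)
handle.remove()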

Results: Up to 27% improvement over natural language communication with less than 1/4 the compute.

This is wild because it means agents aren’t even generating messages anymore. They’re directly injecting their computational state into each other’s inference process.

The catch: Requires models with compatible architectures. Can’t do this between GPT-4 and Claude unless you build adapter layers.

3. Coconut: Chain of Continuous Thought (2024)

Meta AI’s Coconut framework tackles a related problem: reasoning in latent space instead of language space.

Instead of forcing the model to articulate reasoning steps in English (“First, I’ll factor the equation…”), Coconut uses the last hidden state as a continuous thought representation and feeds it back as the next input embedding.
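
A minimal sketch of that loop, again with a Hugging Face model whose hidden size matches its embedding size (true of GPT-2); this illustrates the mechanism, not Meta's implementation.

# Feed the last hidden state back as the next input embedding: "thinking"
# without ever sampling a token.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
prompt_ids = tokenizer("Solve step by step: 12 * 7 =", return_tensors="pt").input_ids
inputs_embeds = model.get_input_embeddings()(prompt_ids)

with torch.no_grad():
    for _ in range(4):  # four continuous thoughts instead of four reasoning tokens
        out = model(inputs_embeds=inputs_embeds, output_hidden_states=True)
        thought = out.hidden_states[-1][:, -1:, :]   # last layer, last position
        inputs_embeds = torch.cat([inputs_embeds, thought], dim=1)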

Why this matters for communication: If agents can reason in continuous space, they can communicate reasoning states directly without linguistic serialization.

On ProntoQA and ProsQA (logical reasoning benchmarks), Coconut outperforms Chain-of-Thought while generating fewer tokens. The continuous thought can encode multiple reasoning paths simultaneously — essentially doing breadth-first search where CoT can only do depth-first.

4. Emergent Communication in Multi-Agent RL

Stepping outside LLMs, reinforcement learning researchers have studied emergent communication for years. Agents trained with MARL (Multi-Agent Reinforcement Learning) spontaneously develop communication protocols optimized for their tasks — and they’re never natural language.

Classic examples:

  • Robot swarms using implicit stigmergic signals (virtual pheromones) for foraging
  • Fish-inspired robots coordinating 3D movements through blue light intensity modulation
  • Trading algorithms using fixed-width binary protocols where every bit position has optimized semantic meaning

These systems achieve coordination without linguistic structure, suggesting that natural language isn’t fundamental to multi-agent cooperation.


The Compositionality Challenge: Can Vectors Actually Communicate?

Here’s where the skeptics have valid concerns. Natural language has a crucial property: compositionality.

The meaning of “The red car is fast” is constructed from “red,” “car,” “fast,” and grammatical rules. You can understand novel sentences you’ve never seen before because you compose known parts.

Do vector embeddings have this property?

Sort of. Research on compositional distributional semantics shows that simple operations (addition, multiplication) can approximate phrasal meanings:

vector("red") + vector("car") ≈ vector("red car")

But it’s imperfect. Transformer models help by learning context-dependent embeddings where compositionality emerges naturally from attention mechanisms.
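
A quick way to poke at this yourself, assuming a sentence-transformers model (the model name below is just one example); the cosine similarity is usually high but clearly short of 1.0.

# Compare the sum of word embeddings against the phrase embedding.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
red, car, red_car = model.encode(["red", "car", "red car"], normalize_embeddings=True)

composed = red + car
composed = composed / np.linalg.norm(composed)
print("cos(red + car, 'red car') =", float(np.dot(composed, red_car)))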

The deeper issue: semantic drift. As embeddings propagate through multi-agent interactions without linguistic grounding, do meanings shift? If Agent A’s “urgency” vector differs slightly from Agent B’s, does this compound over dozens of exchanges?

Research is mixed. Some studies show stable emergent semantics in RL agents. Others document drift and misalignment, especially when task distributions change.

The pragmatic answer: Hybrid approaches. Use embeddings for high-bandwidth information transfer, but include periodic linguistic synchronization to prevent drift.


The Observability Nightmare: How Do You Debug Vectors?

Let’s be real: the biggest blocker to vector-based agent communication isn’t technical feasibility. It’s operational reality.

When your multi-agent system fails, you need to know why. With natural language, you can:

  • Log the conversation
  • Replay interactions
  • Identify where the agent misunderstood
  • A/B test different prompts
  • Show traces to domain experts

With vector communication, you get… arrays of floats. Good luck explaining to your PM why the customer support agent hallucinated.

Current Approaches to Vector Observability

Dimensionality Reduction Visualization: Project embeddings to 2D/3D with t-SNE or UMAP. You can see clusters and trajectories, but interpreting what they mean requires domain knowledge and often post-hoc linguistic probing.
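
A minimal version of this with scikit-learn; message_vectors is a placeholder for whatever embeddings your agents actually exchanged.

# Project logged agent-message embeddings to 2D for visual inspection.
import numpy as np
from sklearn.manifold import TSNE

message_vectors = np.random.randn(200, 768)   # placeholder for real logged vectors
coords = TSNE(n_components=2, perplexity=30, init="pca", random_state=0).fit_transform(message_vectors)
# Scatter-plot `coords`; clusters hint at recurring message types, trajectories at drift.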

Attention Pattern Analysis: When using hidden state communication, you can visualize which parts of Agent A’s state Agent B attended to. This gives you interaction structure even without semantic meaning.

Periodic Linguistic Grounding: The hybrid approach — have agents communicate in vectors most of the time, but occasionally “summarize” their state in natural language for human logging.

Contrastive Probing: Train classifiers to predict semantic properties from embeddings. “Does this embedding represent urgency?” “Is this a question or statement?” Build up an interpretable feature space.
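
A sketch of a simple linear probe; the embeddings and labels here are placeholders for logged messages and human annotations.

# Train a probe to predict a labeled property (e.g. "is this a question?") from embeddings.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X = np.random.randn(500, 768)            # logged message embeddings
y = np.random.randint(0, 2, size=500)    # annotated property labels
probe = LogisticRegression(max_iter=1000)
print("probe accuracy:", cross_val_score(probe, X, y, cv=5).mean())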

The observability challenge is real. As one recent paper notes: “Practitioners report challenges in understanding execution behavior, debugging failures, and identifying patterns across runs, with the sheer volume of logged data making extracting insights difficult.”

OpenTelemetry is extending standards for LLM observability, but vector-based communication is still the wild west.


Protocol Standardization: The MCP, A2A, ACP, and ANP Landscape

While researchers experiment with latent communication, the industry is standardizing text-based agent protocols. Let’s survey the landscape:

Model Context Protocol (MCP)

Anthropic’s MCP, announced November 2024, focuses on connecting AI systems to data sources. It defines Resources, Tools, Prompts, and Sampling primitives — all communicated via JSON-RPC.

Communication model: Client-server, text-based, explicit tool invocation.

Limitation: Designed for human-agent interaction patterns. Agent-to-agent coordination is possible but not optimized.

Agent-to-Agent Protocol (A2A)

Google’s A2A enables peer-to-peer task outsourcing through Agent Cards — structured metadata describing agent capabilities.

Communication model: Capability-based discovery, RESTful messaging, still fundamentally linguistic.

Adoption: still early for A2A itself; the adoption story so far belongs to MCP, which OpenAI officially adopted in March 2025, followed by Google integrating it into Gemini (April 2025).

Agent Communication Protocol (ACP) and Agent Network Protocol (ANP)

IBM’s ACP and Cisco’s ANP round out the ecosystem, with ACP providing enterprise orchestration and ANP enabling decentralized agent discovery.

The pattern: All major protocols assume natural language as the communication substrate. They optimize for structure (JSON schemas, capability declarations) but not for representation (still tokens, not vectors).

Why? Because the industry prioritizes:

  1. Human auditability for compliance and safety
  2. Cross-model compatibility (GPT ↔ Claude ↔ Gemini)
  3. Legacy tooling integration

Vector-based communication would break all three.


When Does Vector Communication Actually Matter?

Let’s cut through the hype. For most multi-agent systems, natural language is fine. The overhead doesn’t matter if you’re building a customer support bot that handles 10 requests per minute.

Vector communication makes sense when:

1. High-Frequency Agent Interactions

If agents exchange dozens of messages per second (think distributed reasoning systems, real-time coordination), token overhead becomes a bottleneck. Swarm robotics and trading algorithms already do this — they use fixed-width binary protocols, not natural language.

2. Rich Semantic Transfer

When agents need to share complex, multi-dimensional concepts that don’t reduce well to language. Examples: visual scene representations, continuous control policies, probability distributions over large action spaces.

3. Low-Latency Requirements

Autoregressive text generation is inherently slow. If you need sub-100ms agent-to-agent response times, you can’t wait for token-by-token generation. Direct activation transfer wins.

4. Homogeneous Agent Architectures

If all your agents use the same model or compatible architectures, hidden state communication becomes feasible. This is common in research settings and custom deployments, less so in production multi-vendor systems.


The Hybrid Future: Multi-Modal Agent Protocols

The most promising direction isn’t “vectors vs. language” — it’s adaptive multi-modal communication.

Imagine agents that:

  • Use natural language for high-level task coordination (human-interpretable)
  • Switch to embedding exchange for dense information transfer
  • Fall back to structured protocols (JSON-RPC) for tool invocation
  • Employ direct activation transfer for real-time sub-tasks

This is already happening in research. The Latent Space Policy Optimization (LSPO) framework, for example, first maps free-form text to a discrete latent space, applies game-theoretic optimization (Counterfactual Regret Minimization) in that space, then maps back to language only when needed.

Key insight: The language space is combinatorially large, but the underlying strategy space is compact. Do the hard computation in the compact space, use language only for I/O.

Practical Implementation Path

If you’re building agent systems today and want to experiment:

Phase 1: Optimize Text

Start with natural language but minimize token usage. Use CodeAgents-style codified reasoning (less verbose than Chain-of-Thought), trajectory reduction, and fixed communication templates.

Phase 2: Add Structured Communication

Move to protocol-oriented communication (MCP, A2A) with typed messages. You’re still using text, but reducing parsing ambiguity.

Phase 3: Selective Embedding Exchange

For specific high-bandwidth channels, transmit embeddings instead of tokens. Example: When Agent A needs to share search results, send the embedding matrix directly instead of serializing documents.
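
As a sketch of what Phase 3 can look like on the wire (the transport itself is left out, and the shapes are assumptions):

# Ship an embedding matrix between agents as raw bytes instead of re-serializing text.
import io
import numpy as np

def pack(embeddings: np.ndarray) -> bytes:
    buf = io.BytesIO()
    np.save(buf, embeddings.astype(np.float16))  # halve the payload size
    return buf.getvalue()

def unpack(payload: bytes) -> np.ndarray:
    return np.load(io.BytesIO(payload))

search_result_vectors = np.random.randn(50, 768)  # placeholder for real document embeddings
payload = pack(search_result_vectors)
print(len(payload), "bytes for 50 documents")     # ~77 KB, regardless of document length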

Phase 4: Hidden State Fusion

If you control both agent architectures, implement activation transfer for performance-critical paths. Maintain text-based fallbacks for observability.


The Hard Truths We Need to Accept

After digging through the research and talking to engineers shipping agent systems, here are the uncomfortable realities:

1. Interpretability Trumps Efficiency (For Now)

Regulators, enterprise customers, and engineering teams need to understand what agents are doing. Vector-based communication is a black box. Until we have robust observability tooling, linguistic communication stays.

2. Cross-Vendor Interop Requires Common Ground

Your agents will interact with Claude, GPT-4, Gemini, and open-source models. Natural language is the only universal interface. Embedding spaces are model-specific.

3. The 80/20 Rule Applies

Most multi-agent bottlenecks aren’t communication bandwidth — they’re task decomposition, error handling, and context management. Optimizing the protocol doesn’t matter if your agent can’t figure out which tool to call.

4. Emergent Semantics Are Risky

Letting agents develop their own communication protocols (like in MARL) sounds elegant but introduces alignment risks. If agents optimize for task completion without linguistic grounding, they might coordinate on strategies we can’t audit.


What Comes Next: Research Directions Worth Watching

The field is moving fast. Here’s what I’m tracking:

Universal Embedding Spaces: Research on cross-model embedding alignment. If GPT-4 and Claude embeddings could be mapped to a shared semantic space, vector communication becomes feasible across vendors.

Learned Communication Protocols: Agents that dynamically negotiate their own protocols based on task requirements. Switch between linguistic and latent communication adaptively.

Formal Verification of Latent Communication: Can we prove properties about vector-based agent coordination? Safety guarantees without linguistic interpretability?

Quantum Agent Communication: Okay, this one’s speculative, and to be clear, entanglement alone can’t transmit information faster than light. But quantum networking and entanglement-assisted protocols could eventually offer new primitives for sharing agent state. Years away, but fascinating.


Conclusion: The Question We Should Be Asking

“Why do AI agents speak English?” assumes we’ve chosen poorly. I think the real question is:

When do the costs of linguistic communication outweigh the benefits of interpretability?

For most applications, not yet. Natural language gives us debuggability, auditability, and cross-system compatibility. The token costs and information loss are annoying but manageable.

But we’re seeing early evidence that the ceiling is real. When tasks require high-frequency coordination, dense semantic transfer, or real-time interaction, language becomes the bottleneck.

The future isn’t “agents speak vectors” — it’s agents that code-switch. Natural language for the human-interfacing layers. Structured protocols for reliable tool use. Embeddings for bandwidth-intensive transfers. Direct activation sharing for latency-critical coordination.

We inherited English from human-computer interaction, but we’re not stuck with it. The research is maturing. The tooling is emerging. The next generation of agent systems will be polyglot — fluent in language, vectors, and protocols we haven’t invented yet.


Key Takeaway: AI agents communicate in English because we optimized for human interpretability, not computational efficiency. Emerging research on embedding-based and latent space communication shows accuracy gains ranging from roughly 0.5% to 27% with dramatically reduced token costs, but at the expense of observability. The future is hybrid: agents that code-switch between linguistic, vector, and protocol-based communication depending on task requirements.



SEO Metadata

Meta Description: Why do AI agents communicate in English when they think in vectors? Explore the computational overhead of natural language, emerging research on embedding-based communication, and the future of agent-to-agent protocols.

Target Keywords:

  • Short-tail: “AI agent communication”, “vector embeddings agents”, “multi-agent systems”
  • Long-tail: “why do AI agents use natural language”, “embedding-based agent communication”, “latent space communication LLM”, “agent-to-agent protocol comparison”, “vector communication vs natural language agents”

Suggested Internal Links:

  • Model Context Protocol (MCP) implementation guides
  • Multi-agent system architecture patterns
  • LLM observability and debugging strategies

Content Stats: ~4,200 words | ~21-minute read