
Why AI-Native Products Need Auto-Generated Intents, Not Off-the-Shelf Metrics

Classic metrics like DAU and D7 retention hide what matters in AI-native products. Here is why auto-generated intents tell the real story of your agent.

Your dashboard says DAU is up and D7 retention is holding. Your AI agent is quietly failing half the people who use it, and you have no idea. The classic SaaS metrics were built for tap-and-click apps. For an AI-native product whose core loop is a conversation, they measure the wrong things, and often point in the wrong direction.

The fix is not a better retention curve. It is metrics derived from what users are actually trying to do: auto-generated intents, tracked live, so you can see resolution rate, frustration, and unmet requests instead of guessing from session counts.

Why classic product metrics break on AI-native products

The metrics most teams ship with (DAU, WAU, D7 retention, session length, feature adoption) all share one assumption: more usage equals more value. That assumption holds for a project management tool or a photo app. Every tap is a deliberate choice, and a screen view roughly maps to intent.

For an AI agent, that assumption falls apart. The interface is a conversation. A user does not click through your information architecture; they ask for something and wait to see whether your agent can deliver. Usage volume tells you almost nothing about whether the thing they asked for actually happened.

Here is the trap: a high number in a classic metric can be a symptom of failure on an AI product.

Session length is the clearest example

In a media app, longer sessions mean engagement. In an AI support agent or copilot, the user’s goal is usually to get an answer and leave. A long session often means the agent could not understand the request: the user rephrased it three times, got a wrong answer, corrected it, and finally gave up or escalated.

So you can have:

  • Rising average session length that your growth dashboard celebrates
  • …driven entirely by users fighting the agent to get a simple task done

Same number, opposite meaning. If you optimize for session length on an agent, you are optimizing for friction.
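Here is a minimal Python sketch of that inversion, using made-up data. It assumes each session is recorded as a pair of (user turns, resolved?), with turn count standing in for session length. Both the record shape and the numbers are hypothetical.

```python
from statistics import mean

# Hypothetical toy sessions: (user turns in the session, task resolved?).
# Turn count is a stand-in for session length; the numbers are made up.
week_1 = [(2, True), (3, True), (2, True), (4, False)]
week_2 = [(6, False), (7, False), (2, True), (8, False)]


def summarize(sessions):
    """Return (average turns per session, fraction of sessions resolved)."""
    avg_turns = mean(turns for turns, _ in sessions)
    resolution_rate = sum(1 for _, resolved in sessions if resolved) / len(sessions)
    return avg_turns, resolution_rate


for label, sessions in (("week 1", week_1), ("week 2", week_2)):
    avg_turns, resolution_rate = summarize(sessions)
    print(f"{label}: avg turns {avg_turns:.2f}, resolution rate {resolution_rate:.0%}")
```

On this toy data, average turns per session more than doubles, from 2.75 to 5.75, while the resolution rate falls from 75% to 25%. The number the dashboard celebrates goes up at the same moment the agent gets worse.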

Retention hides the slow churn

D7 and D30 retention tell you someone came back. They do not tell you why they came back, or that they came back angry. A user who returns four days in a row because the agent keeps failing the same setup step looks identical, on a retention chart, to a power user who loves the product. Both are “retained.” One is about to ask for a refund.

The metric that should scare you, the one that predicts the cancellation, is invisible in the standard stack.
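A retention check cannot tell these two users apart, but their conversations can. The sketch below uses a hypothetical log format of (user, day, intent, resolved) and an illustrative rule: flag any intent a user failed on for three or more distinct days. The rule, the names, and the data are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical conversation log: (user_id, day, intent, resolved).
log = [
    ("power_user", date(2024, 5, 1), "export_report", True),
    ("power_user", date(2024, 5, 3), "share_dashboard", True),
    ("power_user", date(2024, 5, 6), "export_report", True),
    ("stuck_user", date(2024, 5, 1), "setup_integration", False),
    ("stuck_user", date(2024, 5, 2), "setup_integration", False),
    ("stuck_user", date(2024, 5, 3), "setup_integration", False),
    ("stuck_user", date(2024, 5, 4), "setup_integration", False),
]


def returned_within(rows, user, days=7):
    """Retention-style check: was the user active again within `days` of first use?"""
    active_days = sorted({day for u, day, _, _ in rows if u == user})
    if not active_days:
        return False
    first = active_days[0]
    return any(0 < (day - first).days <= days for day in active_days)


def repeated_failures(rows, user, min_days=3):
    """Intents this user failed on for at least `min_days` distinct days."""
    failed_days = defaultdict(set)
    for u, day, intent, resolved in rows:
        if u == user and not resolved:
            failed_days[intent].add(day)
    return [intent for intent, days in failed_days.items() if len(days) >= min_days]


for user in ("power_user", "stuck_user"):
    print(user, "retained:", returned_within(log, user),
          "repeat failures:", repeated_failures(log, user))
```

Both users pass `returned_within`. Only `stuck_user` shows up in `repeated_failures`, flagged on `setup_integration`.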

What should you actually track for an AI-native product?

Track what the user was trying to do, and whether your agent did it. That is the whole game. Everything useful flows from a set of intents auto-generated out of real conversations: bug reports, feature requests, setup friction, pricing objections, churn signals, and whatever else is specific to your product.

Once you have intents, the metrics that matter are obvious:

Old metric (tap-and-click) | What to track for AI-native | What it tells you
Session length | Resolution rate per intent | Did the agent actually complete what the user asked?
DAU / WAU | Intent volume + trend | Which jobs are people bringing to the agent, and which are growing?
D7 retention | Frustration rate | How often users push back, repeat themselves, or express anger
Feature adoption | Unmet-request rate | Requests your agent cannot handle yet (your roadmap, ranked by demand)
Funnel drop-off | Why-they-churned breakdown | The specific intent or failure that preceded a cancellation
NPS survey | Resolution-to-frustration ratio per cohort | Real sentiment, measured per conversation rather than from a survey with a 4% response rate
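Most of these metrics start from the same raw material: conversations labeled with an intent. As one example, intent volume and trend can be computed with a `Counter` per week. The week keys, intent names, and counts below are hypothetical.

```python
from collections import Counter

# Hypothetical intent labels per conversation, grouped by ISO week.
conversations_by_week = {
    "2024-W18": ["password_reset"] * 40 + ["billing_dispute"] * 10 + ["export_csv"] * 5,
    "2024-W19": ["password_reset"] * 38 + ["billing_dispute"] * 15 + ["export_csv"] * 12,
}


def intent_trend(prev_week, this_week):
    """Return {intent: (this week's volume, relative change vs. previous week)}."""
    prev, curr = Counter(prev_week), Counter(this_week)
    trend = {}
    for intent in prev.keys() | curr.keys():
        before, after = prev[intent], curr[intent]
        # A brand-new intent has no baseline, so its growth is reported as infinite.
        change = (after - before) / before if before else float("inf")
        trend[intent] = (after, change)
    return trend


trend = intent_trend(conversations_by_week["2024-W18"], conversations_by_week["2024-W19"])
# Fastest-growing intents first.
for intent, (volume, change) in sorted(trend.items(), key=lambda kv: -kv[1][1]):
    print(f"{intent:16} {volume:4d} {change:+.0%}")
```

Sorting by change surfaces `export_csv` (up 140%) and `billing_dispute` (up 50%) as the growing jobs, even though `password_reset` still has the most volume.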

A few of these deserve a closer look.

Resolution rate per intent

This is your North Star for an agent. Not “did the user have a session,” but “of the people who came to do X, what fraction left with X done.” Slice it by intent and the truth jumps out: your agent resolves 91% of password resets and 38% of billing disputes. Now you know exactly where to spend the next sprint, and it is not a guess.
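Resolution rate per intent is a grouped ratio. Here is a minimal sketch. It assumes each conversation has already been labeled with an intent and a boolean `resolved` outcome; how that outcome is judged is outside the scope of this example, and the records are hypothetical.

```python
from collections import defaultdict

# Hypothetical labeled conversations: (intent, resolved).
records = [
    ("password_reset", True), ("password_reset", True), ("password_reset", False),
    ("billing_dispute", False), ("billing_dispute", True),
    ("billing_dispute", False), ("billing_dispute", False),
]


def resolution_rate_per_intent(rows):
    """For each intent, the fraction of conversations that got the job done."""
    totals, resolved = defaultdict(int), defaultdict(int)
    for intent, ok in rows:
        totals[intent] += 1
        resolved[intent] += int(ok)
    return {intent: resolved[intent] / totals[intent] for intent in totals}


# Lowest resolution rate first: the top of this list is where the next sprint goes.
for intent, rate in sorted(resolution_rate_per_intent(records).items(), key=lambda kv: kv[1]):
    print(f"{intent:16} {rate:.0%}")
```

Here `billing_dispute` resolves at 25% and `password_reset` at about 67%, so billing goes to the top of the list.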

Frustration rate

Conversations carry signal that clicks never do. People say “no, that’s not what I meant,” “this is the third time,” “just give me a human.” That is a measurable, per-turn frustration signal. Track it as a rate per intent and you get an early-warning system that fires weeks before the retention chart dips.
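As a crude first pass, you can match the quoted phrases directly. This sketch counts the share of user turns per intent that hit one of a few regular expressions. The pattern list is a hypothetical starting point. A production system would more likely use a classifier or an LLM judge, which can catch paraphrases.

```python
import re
from collections import defaultdict

# Hypothetical frustration markers, taken from the phrases quoted above.
FRUSTRATION_RE = re.compile(
    r"\bthat[’']?s not what i meant\b"
    r"|\bthird time\b"
    r"|\b(?:give me|talk to|speak to) a human\b",
    re.IGNORECASE,
)


def is_frustrated(turn):
    """True if a user turn contains one of the frustration markers."""
    return FRUSTRATION_RE.search(turn) is not None


def frustration_rate_per_intent(conversations):
    """conversations: iterable of (intent, [user turns]).
    Returns the share of user turns per intent that look frustrated."""
    turns, flagged = defaultdict(int), defaultdict(int)
    for intent, user_turns in conversations:
        for turn in user_turns:
            turns[intent] += 1
            flagged[intent] += int(is_frustrated(turn))
    return {intent: flagged[intent] / turns[intent] for intent in turns}


sample = [
    ("billing_dispute", ["I was charged twice", "No, that's not what I meant",
                         "This is the third time I'm asking", "Just give me a human"]),
    ("password_reset", ["I forgot my password", "Thanks, that worked"]),
]
print(frustration_rate_per_intent(sample))
```

Keyword matching misses sarcasm and paraphrase, so treat it as a baseline. It is still enough to turn frustration into a per-intent rate you can chart over time.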

Unmet-request rate

Every time a user asks for something your agent cannot do, that is a roadmap vote with a real person attached. Most teams throw this data away because their logs are unstructured and nobody reads 40,000 transcripts. Aggregated into intents, unmet requests become a demand-ranked backlog. We have seen teams reprioritize an entire quarter based on this one number.
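Once unmet requests carry an intent label, the demand-ranked backlog is simply a count sorted by frequency. The sketch assumes a hypothetical `unsupported` outcome that marks conversations where the agent could not do what was asked. How that outcome gets detected is left open.

```python
from collections import Counter

# Hypothetical rows: (intent, outcome). "unsupported" means the agent could not
# do what was asked; the outcome labels are assumptions for illustration.
rows = [
    ("export_to_sheets", "unsupported"),
    ("export_to_sheets", "unsupported"),
    ("export_to_sheets", "unsupported"),
    ("bulk_refund", "unsupported"),
    ("password_reset", "resolved"),
    ("password_reset", "resolved"),
]


def unmet_request_rate(rows):
    """Share of all conversations that asked for something the agent cannot do."""
    return sum(1 for _, outcome in rows if outcome == "unsupported") / len(rows)


def demand_ranked_backlog(rows):
    """Unsupported intents ordered from most to least requested."""
    unmet = Counter(intent for intent, outcome in rows if outcome == "unsupported")
    return unmet.most_common()


print(f"unmet-request rate: {unmet_request_rate(rows):.0%}")
print(demand_ranked_backlog(rows))
```

`Counter.most_common()` returns intents from most to least requested, which is the ranked backlog.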

Why-they-churned breakdown

Instead of a churn rate, you get a churn reason. Not “we lost 6% this month” but “we lost 6% this month, and 70% of them hit the same broken multi-step onboarding the agent never recovered from.” One is a number to report. The other is a thing you can fix on Monday.
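One way to turn churn into churn reasons is to attribute each cancellation to the last unresolved conversation before it. The attribution rule, the data shapes, and the sample users below are all assumptions for illustration. Other rules, such as the most frequent failure in the final weeks, are equally defensible.

```python
from collections import Counter
from datetime import date

# Hypothetical per-user conversations: (day, intent, resolved).
conversations = {
    "u1": [(date(2024, 6, 1), "onboarding_import", False)],
    "u2": [(date(2024, 6, 2), "onboarding_import", False),
           (date(2024, 6, 3), "onboarding_import", False)],
    "u3": [(date(2024, 6, 1), "billing_dispute", False)],
    "u4": [(date(2024, 6, 5), "password_reset", True)],
}
# Hypothetical cancellation dates for churned users (u4 did not churn).
cancellations = {"u1": date(2024, 6, 4), "u2": date(2024, 6, 6), "u3": date(2024, 6, 2)}


def last_failure_before(convs, cancel_day):
    """Intent of the most recent unresolved conversation on or before cancellation."""
    failures = [(day, intent) for day, intent, resolved in convs
                if not resolved and day <= cancel_day]
    if not failures:
        return "no_failed_conversation"
    return max(failures, key=lambda f: f[0])[1]


def churn_breakdown(convs_by_user, cancel_days):
    """Share of churned users attributed to each last-failed intent."""
    reasons = Counter(
        last_failure_before(convs_by_user.get(user, []), cancel_day)
        for user, cancel_day in cancel_days.items()
    )
    total = sum(reasons.values())
    return {reason: count / total for reason, count in reasons.items()}


print(churn_breakdown(conversations, cancellations))
```

On this sample, two of the three churned users last failed on `onboarding_import`. The breakdown therefore reads roughly 67% onboarding and 33% billing.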

How do auto-generated intents actually work in practice?

The objection I hear is fair: “We do not have the team to hand-label intents across thousands of conversations.” Correct. Nobody does. That is exactly why the intents have to be generated automatically from the conversations themselves, not defined in a planning doc six months ago.

The flow looks like this:

  1. Connect to your agent and read every conversation, regardless of LLM or framework.
  2. Auto-generate intents specific to your product instead of forcing a generic taxonomy onto it.
  3. Track those intents live: resolution, frustration, unmet requests, churn precursors.
  4. Turn the findings into fixes against your system prompts, agent harness, and W&B configs, as pull requests you review and merge.
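How step 2 works internally is not spelled out here. As a toy stand-in for LLM- or embedding-based intent generation, the sketch below groups raw user messages by word overlap (Jaccard similarity) in a single greedy pass. The stopword list, the 0.3 threshold, and the sample messages are arbitrary assumptions. A real pipeline would also name each cluster and merge near-duplicates.

```python
import re

# Illustrative filler words to ignore; an assumption, not a tuned list.
STOPWORDS = {"i", "my", "the", "a", "to", "can", "you", "me", "how", "do", "is", "it", "please"}


def tokens(text):
    """Lowercased set of content words in a message."""
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS}


def jaccard(a, b):
    """Word-overlap similarity between two sets, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_into_intents(messages, threshold=0.3):
    """Greedy single pass: join the first cluster whose vocabulary is similar enough,
    otherwise start a new cluster. Returns the member messages of each cluster."""
    clusters = []  # each: {"vocab": set of words, "members": list of messages}
    for msg in messages:
        words = tokens(msg)
        for cluster in clusters:
            if jaccard(words, cluster["vocab"]) >= threshold:
                cluster["members"].append(msg)
                cluster["vocab"] |= words  # let the cluster's vocabulary grow
                break
        else:
            clusters.append({"vocab": set(words), "members": [msg]})
    return [cluster["members"] for cluster in clusters]


messages = [
    "I forgot my password",
    "reset my password please",
    "I was charged twice this month",
    "why was I charged twice",
    "password reset link not working",
]
for i, group in enumerate(group_into_intents(messages), start=1):
    print(f"intent candidate {i}: {group}")
```

On these five messages the sketch yields two groups: three password-reset messages and two double-charge messages. No taxonomy was written in advance, which is the point of generating intents from the conversations themselves.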

This is the part most analytics tools miss. Surfacing that billing disputes resolve at 38% is useful. Opening a PR that rewrites the billing-flow system prompt to fix it, then letting you approve it, closes the loop. That loop, from conversations to intents to metrics to merged fixes, is the infrastructure for self-improving AI agents, and it is the reason we built Agnost AI the way we did.

You do not need a data team to start. Setup is roughly three lines of SDK code or an OpenTelemetry hook, and you can see your first intents within a couple of minutes.

Where this is heading

The teams shipping the best agents right now have already stopped reporting DAU to their boards for the AI surface. They report resolution rate per intent and frustration trend, because those move the business and the vanity metrics do not. Off-the-shelf dashboards will not disappear; they are fine for the parts of your product that are still buttons and screens. But the conversational core needs its own instrument, and that instrument is built on intents.

Expect “resolution rate” to become as standard for AI products as conversion rate is for e-commerce. The teams that get there first will be the ones who can answer “why did this user churn” with a sentence instead of a shrug.

FAQ

Is session length ever a useful metric for an AI agent? Rarely as a top-line goal. For a companion or entertainment agent, longer can be good. For anything task-oriented (support, copilots, internal tools), shorter sessions with high resolution rates are the win. Always read session length alongside resolution rate, never alone.

Do I have to throw out DAU and retention entirely? No. Keep them for the non-conversational parts of your product and as coarse health checks. Just stop treating them as the measure of whether your agent is working. Resolution rate per intent and frustration rate are far better proxies for AI-native value.

How is this different from tagging conversations with predefined categories? Predefined categories assume you already know what users want, which on a new AI product you usually do not. Auto-generated intents come from the actual conversations, so they catch the requests and failure modes you never thought to tag, including the ones quietly driving churn.

If you are running an AI agent in production and your dashboard still leads with DAU, you are measuring the old product, not the one you actually shipped. Agnost AI reads your conversations, generates the intents that matter for your product, and opens the pull requests to fix what they reveal, so the agent gets better while you sleep.