Why Time in App Is a Misleading Metric for AI Companion Products

Time in app is the go-to engagement metric for consumer apps. For AI companions, it's one of the most misleading numbers you can track. Here's what it's hiding and what to measure instead.

Here’s a scenario I’ve heard from at least a dozen teams building AI companion products.

They ship a new feature. Time in app goes up. The team cheers. They tell investors the product is getting stickier. Six weeks later, retention tanks and nobody can explain it.

What happened? The feature made conversations slower to resolve. Users spent more time trying to get what they needed. Time in app went up because the product got worse, not because it got better.

That’s the trap. And it’s one of the most expensive mistakes you can make as a founder building in this category.

Two Spidermen pointing at each other

^ your “engagement is up” notification and your actual user experience, sharing a dashboard in perfect ignorance of each other


Why time in app made sense for the products that popularized it

This isn’t a bad metric by nature. It’s a metric that worked beautifully for a specific type of product and then got copy-pasted everywhere else without anyone stopping to ask whether the underlying logic still held.

For social media, streaming, and games, time in app is genuinely meaningful. The value those products create is directly proportional to the time spent in them. More time on a social feed means more content consumed, more ad impressions served, more revenue generated. More time on a streaming platform means more of the subscription’s value is being delivered. More time in a game means the player is more likely to pay for the next expansion. The metric maps cleanly to business outcomes because in those products, user attention IS the product.

The companies that made time in app a standard KPI were optimizing attention-based businesses. The metric made sense for them because their revenue model ran on time.

AI companion products share exactly none of these properties.

Your companion product doesn’t serve ads. It doesn’t earn more revenue per hour of engagement. And here’s the critical part: the value your product creates isn’t in the duration of the conversation. It’s in the outcome of the conversation. A user who spends 8 minutes reaching something meaningful got more value than a user who spent 40 minutes going in circles. Time in app can’t see the difference.


The 3 ways time in app misleads you for AI companions

Mislead 1: High time can mean high frustration

Imagine two users, both spending 40 minutes in your companion app today.

User A is deep in it. They’ve been working through something real. The AI is following the thread, picking up on context, helping them feel genuinely understood. They didn’t even notice 40 minutes went by. That’s a product success.

User B has spent 40 minutes trying to get the AI to understand what they’re asking. Rephrasing. Trying again. Getting responses that are technically coherent but emotionally off. Politely persisting because they want this to work. They’re frustrated in a way they can’t quite articulate but will definitely remember.

Pull up your time in app metric.

Both users look identical.

This isn’t an edge case. Across the conversations we’ve analyzed at Agnost, frustrated users in AI companion products exhibit elevated session times compared to satisfied users in similar contexts, because frustration in a natural language product looks like continued engagement. Users don’t hit a dead end and bounce. They try harder. They rephrase. They come back one more time. That persistence is real but it isn’t the signal you want.

Mislead 2: Low time can mean high value

A user who spends 8 minutes having a conversation that genuinely shifts something for them got enormous value from your product. A user who spends 8 minutes getting confused answers and then quietly closes the app got negative value.

Same number. Completely opposite product outcomes.

The companies that figured this out in other contexts (support tooling being the obvious parallel) stopped optimizing for handle time and started optimizing for resolution. Shorter is often better if the outcome was achieved. The same logic applies here. An AI companion that understands a user so well it can respond to the heart of what they’re saying in fewer turns isn’t underperforming. It’s working exactly as intended.

Mislead 3: Rising time can mean a less efficient product

Dog sitting in burning room saying "this is fine"

^ your product team, looking at time in app trending up after the last model change and calling it a win

This one’s the sneaky one.

In a genuinely healthy companion product, what should happen over time is that the AI gets better at understanding a specific user. The model builds context. Conversations become more efficient. The same depth of connection and value happens in less time because the AI doesn’t need to re-establish what it already knows.

If your time in app is consistently trending upward across your active user base, you have to ask: is this because users are finding the product MORE valuable and spending more time in it by choice? Or is it because the product is less efficient at delivering value and requires more effort from users to get what they need?

Both show up as an upward trend. Most teams assume the former. Often it’s the latter.

Improving AI performance should, over time, produce a specific pattern: session value up, session time flat or down. If you’re only tracking session time, you’ll never see this. You might even pull back on improvements that are actually working.


What time in app optimizes you toward (and why that’s bad)

This is the Goodhart’s Law problem applied directly to companion products.

When a measure becomes a target, it ceases to be a good measure. The moment your product team has time in app as a KPI they’re accountable for, you’ve created a set of incentives that point in exactly the wrong direction.

To grow time in app, you are incentivized to:

  • Make the AI slower to get to the point
  • Add more conversational back-and-forth before reaching resolution
  • Reduce how efficiently the AI understands context (inadvertently, just by not prioritizing it)
  • Build features that keep users talking rather than features that help users feel heard

Some of this sounds extreme when stated bluntly. But this is genuinely what teams drift toward when they’re chasing a time-based engagement metric. It’s not malicious. It’s just metric gravity. You optimize for what you’re measuring.

The product that wins in the AI companion category isn’t the one that consumes the most of a user’s time. It’s the one that creates the most value in the time it has. Those are not the same thing and optimizing for one actively undermines the other.

Surprised Pikachu

^ PMs realizing their “engagement growth” initiative made the AI worse at actually helping people


The metrics that actually measure companion value

Okay, so what should you track instead? Here’s the framework we’ve arrived at after watching a lot of teams navigate this.

Intent Resolution Rate, companion version. Did the user feel genuinely heard and helped in this session? You can’t always ask directly (though post-session micro-surveys have better response rates than most teams expect). But you can proxy it: does the user return for a different topic in the next session, suggesting the previous one resolved? Do they deepen the thread in a follow-up session rather than restart from scratch? Did their message sentiment shift positively over the course of the conversation? IRR is the single most meaningful quality signal for a companion product and most teams aren’t tracking it at all.
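To make the behavioral proxies concrete, here is a minimal sketch in Python. The `Session` schema, the upstream topic labels, and the per-session sentiment fields are all assumptions for illustration, not a prescribed implementation:

```python
from dataclasses import dataclass

@dataclass
class Session:
    user_id: str
    topic: str              # assumed to come from an upstream topic classifier
    start_sentiment: float  # sentiment of early messages, -1..1 (assumed field)
    end_sentiment: float    # sentiment of closing messages, -1..1 (assumed field)

def intent_resolution_proxy(sessions: list[Session]) -> float:
    """Proxy IRR for one user's session history: a session counts as
    'resolved' if the next session opens a different topic (the user
    moved on) or sentiment shifted positively within the session."""
    if not sessions:
        return 0.0
    resolved = 0
    for i, s in enumerate(sessions):
        topic_moved_on = i + 1 < len(sessions) and sessions[i + 1].topic != s.topic
        sentiment_lifted = s.end_sentiment > s.start_sentiment
        if topic_moved_on or sentiment_lifted:
            resolved += 1
    return resolved / len(sessions)
```

Real deployments would combine more signals than these two, but even this crude proxy separates "came back because it worked" from "came back because it didn't."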

Conversation quality score. Are the user’s exchanges getting more personal, more specific, more meaningful over time? A user who starts with surface-level messages in week one and is sharing real emotional context by week four is on a depth trajectory that predicts long-term retention. This trend is completely invisible in time-based metrics but legible in conversation-level analysis.

Emotional outcome signals. This one requires natural language analysis but it’s worth building. Does the user’s tone shift across the session? A user who arrives tense (shorter sentences, guarded language, qualified statements) and leaves calmer (more expansive messages, warmer tone, closing positively) had a session that created real value. That’s the outcome you’re selling. Measure it.
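A lexicon-based version is far too crude for production, but it shows the shape of the measurement: compare tone at the start of the session against tone at the end. The word lists below are toy placeholders standing in for a real sentiment model:

```python
# Toy lexicons -- placeholders for a real sentiment model.
POSITIVE = {"thanks", "better", "good", "glad", "helpful"}
NEGATIVE = {"frustrated", "confused", "annoyed", "stuck", "worse"}

def tone(message: str) -> float:
    """Naive per-message tone: (positive - negative hits) per word."""
    words = message.lower().split()
    if not words:
        return 0.0
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return score / len(words)

def session_tone_shift(user_messages: list[str], window: int = 3) -> float:
    """Average tone of the last `window` user messages minus the first
    `window`. Positive values suggest the user left calmer than they
    arrived -- the outcome the section above describes."""
    if len(user_messages) < 2:
        return 0.0
    head = user_messages[:window]
    tail = user_messages[-window:]
    return (sum(map(tone, tail)) / len(tail)) - (sum(map(tone, head)) / len(head))
```

The design choice that matters is the delta, not the absolute score: a user who arrives at -0.4 and leaves at +0.1 had a better session than one who sits flat at +0.2 throughout.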

Conversation continuity. Does the AI pick up on prior context in a way the user notices and appreciates? Return messages that reference something from a previous session (“you mentioned last time that…”) are a strong signal of a product that’s delivering on its core promise. The absence of this, users having to re-explain themselves every session, is one of the most common reasons companions lose users who seemed engaged on paper.
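One crude way to instrument this, purely as a sketch: check whether a reply reuses a phrase the user shared in an earlier session. The raw n-gram matching here is a stand-in for real entity or fact matching:

```python
def recalls_prior_session(reply: str, prior_user_messages: list[str], n: int = 4) -> bool:
    """Crude continuity check: does the assistant's reply reuse any
    n-word phrase from a previous session's user messages? A real
    system would match entities and facts, not surface n-grams."""
    def ngrams(text: str) -> set[tuple[str, ...]]:
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    reply_grams = ngrams(reply)
    return any(reply_grams & ngrams(m) for m in prior_user_messages)
```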

None of these metrics show up in your standard analytics stack. They live inside the conversations.


How to transition your dashboard off time in app

Don’t delete time in app from your dashboard immediately. I know I’ve spent this whole post arguing it’s misleading, and it is, but it still has coarse value as a baseline engagement proxy. The goal isn’t to eliminate it. The goal is to stop treating it as a north star.

Here’s the practical path.

Start by building a session quality score alongside time in app. The simplest version is a composite of Intent Resolution Rate (proxied by behavioral signals), session depth relative to prior sessions for that user, and a basic frustration signal (message repetition rate is the easiest to instrument). Run both metrics in parallel.
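As a sketch of that simplest version, here is what the composite might look like. The weights, the depth cap, and the 0.8 similarity threshold are illustrative assumptions, not recommendations:

```python
from difflib import SequenceMatcher

def repetition_rate(user_messages: list[str], threshold: float = 0.8) -> float:
    """Share of messages that closely rephrase the previous one --
    the simplest frustration signal to instrument."""
    if len(user_messages) < 2:
        return 0.0
    repeats = sum(
        SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold
        for a, b in zip(user_messages, user_messages[1:])
    )
    return repeats / (len(user_messages) - 1)

def session_quality_score(resolved: bool, depth_ratio: float,
                          user_messages: list[str]) -> float:
    """Composite of the three signals above: IRR proxy, session depth
    relative to the user's prior sessions, and a frustration penalty.
    Weights are illustrative placeholders."""
    frustration = repetition_rate(user_messages)
    depth = min(depth_ratio, 2.0) / 2.0  # cap depth gains at 2x prior sessions
    return 0.5 * float(resolved) + 0.3 * depth + 0.2 * (1.0 - frustration)
```

Running this next to time in app costs almost nothing and is enough to make the divergence described below visible.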

Within one quarter, you’ll see them diverge in ways that are genuinely interesting. The sessions with high time and high quality score are your product working well. The sessions with high time and low quality score are your failure modes: the user stuck in a loop, spending time without getting value. These are the sessions you need to understand and fix. They’re currently invisible in your time in app metric.

Once you can see the divergence clearly, gradually shift your optimization targets. Start using quality score (not time) as the input to product prioritization decisions. Use time only to catch dramatic anomalies. Within two quarters, most teams find that quality score is a far more predictive leading indicator of retention than time in app ever was.

The thing nobody tells you: this transition usually surfaces a handful of product issues that were hiding in plain sight. Specific intent categories with high time and low quality. Specific user cohorts that look engaged but are actually stuck. These aren’t new problems. They were there the whole time. You just couldn’t see them through the time-in-app lens.


Wrapping it up

Time in app was a great metric for the products that invented it. Those products were selling attention. The more time users spent, the more value those businesses captured.

AI companion products aren’t selling attention. They’re selling outcomes. Connection. Understanding. Progress. The occasional profound 8-minute conversation that a user will still be thinking about tomorrow.

You can’t optimize for those outcomes by measuring time. You can only optimize for them by measuring what actually happened in the conversation.

The teams building winning companion products right now are the ones who figured this out. They stopped asking “did the user spend time here?” and started asking “did the user get something they couldn’t get anywhere else?” That question can’t be answered by session length. It can only be answered by conversation data.

If you want to see your companion product through this lens and you don’t have weeks to build the pipeline from scratch, this is exactly what Agnost was built for. Quality scoring, IRR tracking, emotional signal detection, conversation continuity analysis, all running natively on your conversation data. Take a look at agnost.ai.

Hackerman meme coding at multiple screens confidently

^ you, after replacing time in app with a quality score and seeing your actual product health for the first time


TL;DR: Time in app worked for attention-based businesses because their value was proportional to time spent. AI companion products create value through outcomes, not duration. High time can mean frustration. Low time can mean enormous value. Optimizing for time in app actively incentivizes making your companion worse at resolving user needs. Replace it with Intent Resolution Rate, conversation quality score, emotional outcome signals, and conversation continuity. Your retention chart will thank you.

Reading Time: ~9 min