The Problem With Tracking Conversations Like Pageviews

Your session numbers look great. Your users are churning. Here's why event-based analytics was never built for conversational AI products, and what to do instead.

Picture this. You’re a PM at an AI startup, six months post-launch. You open the dashboard on a Monday morning and everything looks… fine? Session count is up 20% week over week. Average session length is 4 minutes and 30 seconds. DAU is climbing. You screenshot it and drop it in the investor update Slack channel.

Then you look at retention.

Week 4 retention is 12%. Week 8 is 4%. Users are showing up, having conversations, and disappearing. The metrics say engagement is strong. The business says something is very wrong.

Here’s the thing nobody tells you when you ship your first AI product: you’ve been tracking conversations like pageviews, and that’s why your dashboard lies to you every single morning.

Person staring at metrics dashboard looking confused

^ every AI PM on Monday morning when the numbers look good but retention is falling off a cliff


The Pageview Was Built for a World Where Content Sits Still

The pageview metric was invented in the mid-90s to answer one question: did someone look at this thing? That’s it. A newspaper prints a story. Did you open it? Click. Pageview logged. The content doesn’t change based on what you do. It just sits there. You either consumed it or you didn’t.

This mental model spread everywhere. Clicks, sessions, time-on-page, bounce rate, page depth. All of it built on the same foundational assumption: the product is a static artifact and the user is moving through it. Engagement equals consumption. More clicks means more engagement. More engagement means more value.

That assumption held for 25 years. It made analytics what it is today.

And then we shipped products where the product itself responds to what the user says. The entire premise collapsed, and most teams haven’t noticed yet.

A conversation is NOT a static artifact. It’s a two-sided, dynamic exchange with internal structure. The meaning of a conversation lives in the sequence: what was asked, how it was answered, what happened next, whether the user got what they actually came for. None of that shows up in an event stream.


What You Lose When You Log conversation_started and conversation_ended

When you instrument your AI product the way you’d instrument a web app, here’s what your event log looks like:

conversation_started   { user_id: 123, timestamp: 10:04:12 }
message_sent           { user_id: 123, turn: 1 }
message_sent           { user_id: 123, turn: 2 }
message_sent           { user_id: 123, turn: 3 }
conversation_ended     { user_id: 123, duration: 4m32s }

Looks reasonable. Now let me tell you three stories that produce EXACTLY that log:

Story A: User asked a coding question. Got a perfect answer on the first try. Asked a follow-up. Got a clarification. Said “thanks” and left satisfied. Done in 4 minutes.

Story B: User asked a coding question. Got a wrong answer. Rephrased it. Got another wrong answer. Rephrased it again. Got frustrated. Closed the tab mid-sentence. The session ended because the browser closed.

Story C: User got stuck in a hallucination loop. The AI kept confidently answering the wrong question. User tried 3 different phrasings to correct it. Eventually gave up, went to Stack Overflow, solved it in 90 seconds, and never came back.

Same event log. Completely different outcomes. Story A is product-market fit. Story B is a quality problem. Story C is churn-in-progress.

Your dashboard shows three successful 4-minute sessions.
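You can verify the flattening for yourself. Here's a minimal Python sketch (the transcripts and the `to_event_log` helper are invented for illustration, not a real tracking SDK) showing that all three stories collapse into byte-identical event streams:

```python
def to_event_log(user_messages, duration):
    """Flatten a conversation to the events a pageview-style tracker records.
    Everything about the content of the turns is thrown away."""
    events = [("conversation_started", {"user_id": 123})]
    for turn in range(1, len(user_messages) + 1):
        events.append(("message_sent", {"user_id": 123, "turn": turn}))
    events.append(("conversation_ended", {"user_id": 123, "duration": duration}))
    return events

# Story A: resolved on the first try, polite close.
story_a = ["How do I parse JSON in Go?", "Follow-up on nested fields", "thanks!"]
# Story B: wrong answers, two rephrasings, tab closed mid-sentence.
story_b = ["How do I parse JSON in Go?", "No, I meant nested JSON", "Let me try again"]
# Story C: hallucination loop, corrective phrasings, user gives up for good.
story_c = ["Parse JSON in Go?", "That's not what I asked", "One more try"]

logs = [to_event_log(s, "4m32s") for s in (story_a, story_b, story_c)]
# All three stories produce identical event logs.
assert logs[0] == logs[1] == logs[2]
```

The content of every turn is discarded at instrumentation time, so no downstream query can recover the difference.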

Surprised Pikachu

^ me realizing all three of those users look identical in Amplitude


The Metrics That Fall Out of Event Tracking Are Measuring the Wrong Thing

MAU, DAU, session length, conversation depth: they all carry the same assumption, that more activity equals more value. They're proxies for engagement, and engagement was a useful proxy for value when the product was a website.

For a conversational AI product, that proxy breaks in a specific way.

A user who’s stuck in a loop is highly engaged. They’re sending messages, they’re spending time in the app, the session is long, the turn count is high. Every traditional metric flags them as a power user. They’re actually one bad conversation away from canceling.

And on the other side: your best users might have short, efficient conversations. They get what they need, they leave. Low session time, low turn count, low “engagement.” The metrics say they’re casual users. They’re actually the ones who just had a moment of genuine product value.

We track millions of conversations across the products using Agnost, and this pattern is consistent. The users with the highest session time are not the happiest users. Sometimes they’re the most frustrated ones.

Here’s a quick comparison of what most teams are tracking versus what actually tells you something useful:

What You’re Tracking Now    | What It Actually Tells You
Session count               | How many conversations started
Session length              | How long users spent (includes frustration time)
Turn count                  | How many messages were exchanged
DAU/MAU ratio               | How often users open the product
Bounce rate                 | Did they leave quickly

What You Should Be Tracking | What It Actually Tells You
Task completion rate        | Did the user accomplish what they came for
First-turn resolution       | Did they get a useful answer without rephrasing
Repetition rate             | Did the user repeat themselves (frustration signal)
Drop-off by turn            | Where in the conversation did users give up
Resolution + return         | Did a resolved conversation lead to coming back

The right column requires you to understand conversation structure. Not just that turns happened, but what kind of turns they were, what the user was trying to do, and whether they succeeded.
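As a hedged sketch of what computing the right column looks like (the record fields below are assumptions for illustration, not a real schema), each conversation just needs an outcome annotation:

```python
# Hypothetical annotated conversation records; field names are illustrative.
conversations = [
    {"turns": 2, "resolved": True,  "resolved_on_turn": 1,    "returned": True},
    {"turns": 6, "resolved": False, "resolved_on_turn": None, "returned": False},
    {"turns": 3, "resolved": True,  "resolved_on_turn": 3,    "returned": True},
    {"turns": 4, "resolved": False, "resolved_on_turn": None, "returned": True},
]

total = len(conversations)
# Did the user accomplish what they came for?
task_completion_rate = sum(c["resolved"] for c in conversations) / total
# Did they get a useful answer without rephrasing?
first_turn_resolution = sum(c["resolved_on_turn"] == 1 for c in conversations) / total
# Did a resolved conversation lead to coming back?
resolution_and_return = sum(c["resolved"] and c["returned"] for c in conversations) / total

print(task_completion_rate)   # 0.5
print(first_turn_resolution)  # 0.25
print(resolution_and_return)  # 0.5
```

The math is trivial; the hard part is producing the `resolved` annotation, which is exactly what conversation-native analytics has to do.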


The Right Mental Model: A Conversation Is a Task Attempt

Here’s the reframe that changes how you think about everything.

A conversation is not a page view. It’s not even a session in the traditional sense. A conversation is a task attempt. Your user showed up with an intent, they tried to accomplish something, and either they succeeded or they didn’t.

The question isn’t “did they have a conversation?” The question is “did they complete the task they came for?”

This is how you evaluate a support agent. You don’t track how many tickets they opened. You track how many tickets got resolved. You don’t track how long each call lasted. You track whether the customer’s problem was actually solved.

Your AI product is no different. The conversation is just the medium through which the task attempt happens. The analytics should reflect that.

Once you make this shift, a bunch of things become clearer:

A short conversation can be a great outcome (efficient resolution) or a terrible one (user gave up immediately). Context determines which. A long conversation can be deep engagement or a frustration spiral. Again, context.

You need an analytics layer that understands which is which.


What Conversation-Native Analytics Actually Looks Like

This isn’t hypothetical. The data layer exists; it just hasn’t been productized well for AI teams.

Conversation-native analytics starts with the assumption that a conversation has structure. There’s an intent, there’s a sequence of attempts to address that intent, and there’s an outcome. Every turn in the conversation is evidence about how well that structure is working.

Practically, this means tracking things like:

Resolution signals. Did the conversation end in a way that suggests the user got what they wanted? Positive sentiment at close, a follow-up that builds on a previous answer, the user returning within 48 hours for a related question. These are positive signals. The user rephrasing the same question three times, dropping off mid-thread, or immediately going back to the start — those are failure signals.

Repetition detection. When a user says the same thing twice in slightly different words, that’s almost always a signal that the AI didn’t understand them the first time. That’s a quality failure, and it should show up in your metrics, not get averaged out into session length.
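A crude version of this is a few lines of stdlib Python. This is a sketch, not a production detector: the 0.75 similarity threshold is an assumed value, and a real system would use something smarter than character-level similarity.

```python
import difflib

def repeated(prev_msg, msg, threshold=0.75):
    """Flag when a user restates the same request in slightly different words.
    The threshold is an assumption for illustration, not a tuned value."""
    a, b = prev_msg.lower().strip(), msg.lower().strip()
    return difflib.SequenceMatcher(None, a, b).ratio() >= threshold

def repetition_rate(user_messages, threshold=0.75):
    """Fraction of consecutive user-message pairs that look like rephrasings."""
    pairs = list(zip(user_messages, user_messages[1:]))
    repeats = sum(repeated(a, b, threshold) for a, b in pairs)
    return repeats / max(len(pairs), 1)

msgs = [
    "How do I reset my API key?",
    "How can I reset my API key?",  # near-duplicate: the model missed it
    "Thanks, that worked",
]
print(repetition_rate(msgs))  # 0.5
```

Even this naive detector surfaces the frustration signal that session length averages away.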

Drop-off by position. Where in the conversation are users giving up? If 40% of your conversations end after turn 3 with no resolution signal, you have a specific problem at turn 3. That’s fixable. You just need to see it.
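The drop-off view is just a histogram over final turns, restricted to conversations with no resolution signal. A minimal sketch, with made-up data shaped like that 40%-at-turn-3 scenario:

```python
from collections import Counter

# Hypothetical (final_turn, resolved) pairs for ended conversations.
ended = [
    (1, True), (2, True), (3, False), (3, False),
    (3, False), (3, False), (3, True), (4, False),
    (5, True), (6, False),
]

# Count where conversations end WITHOUT a resolution signal.
unresolved_exits = Counter(turn for turn, resolved in ended if not resolved)
total = len(ended)
for turn, count in sorted(unresolved_exits.items()):
    print(f"turn {turn}: {count / total:.0%} of conversations end unresolved here")
```

A spike at one position points at a specific failure (a bad clarification prompt, a context-window cliff, a tool call that silently fails) rather than a vague "retention problem".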

Intent clusters. Grouping conversations by what users are actually trying to do, not just by what feature they touched or what page they landed on. This is how you find out that 30% of your support conversations are asking the same question your docs don’t answer well.
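To make the idea concrete, here's a deliberately crude stdlib sketch. A real system would cluster on embeddings; this stand-in just buckets first messages by their content words, and the stopword list is an invented example:

```python
from collections import defaultdict

# Tiny illustrative stopword list; a real one would be much larger.
STOPWORDS = {"how", "do", "i", "the", "a", "my", "to", "can", "is", "what", "in", "not"}

def crude_intent_key(first_message):
    """Very crude intent key: the first two content words, sorted.
    A stand-in for embedding-based clustering, for illustration only."""
    words = [w.strip("?.,!").lower() for w in first_message.split()]
    content = sorted(w for w in words if w and w not in STOPWORDS)
    return " ".join(content[:2])

clusters = defaultdict(list)
for msg in [
    "How do I reset my password?",
    "Password reset not working",
    "Can I export my data to CSV?",
    "How do I reset the password",
]:
    clusters[crude_intent_key(msg)].append(msg)

print({k: len(v) for k, v in clusters.items()})
```

Three differently worded messages land in one "password reset" bucket; that bucket's size, not any session metric, is what tells you which doc to fix.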

None of this is magic. It’s just applying the right model to the data you already have. The raw conversation data is there. The question is whether your analytics layer knows how to read it.

Hackerman meme person coding intensely

^ building conversation-native analytics after realizing your entire dashboard has been lying to you


Where This Is All Going

The teams I see winning in AI products right now share one thing: they’ve stopped optimizing for engagement and started optimizing for resolution. They measure success at the conversation level, not the session level. They know their task completion rate, their first-turn resolution rate, and exactly where in a typical conversation their product loses users.

They’ve built or found tools that treat the conversation as the primary unit of analysis, not an aggregate of events.

The rest of the industry is still screenshotting their MAU charts and wondering why retention is hard.

The shift is coming. Conversational AI is forcing a new analytics paradigm the same way mobile forced a new one ten years ago. The teams that figure this out first will have a meaningful advantage, not just in product quality, but in being able to diagnose and fix problems before they show up in churn numbers.


Wrapping It Up

If your core product loop is a conversation, your core analytics primitive should be a conversation, not an event.

Event-based tracking gives you a number. That number tells you almost nothing about whether your product is actually working. It’s like evaluating a movie by counting the frames. Technically accurate. Completely useless.

The conversation is the unit. Resolution is the metric. Task completion is the outcome you’re optimizing for.

Everything else is noise.

This is exactly what we built Agnost to solve — analytics designed from the ground up for conversational products, so you can stop guessing and start knowing. Try it here.

Success kid fist pump meme

^ shipping your first fix after seeing actual conversation-level data for the first time


TL;DR: Event-based analytics tracks whether conversations happened. Conversation-native analytics tracks whether they worked. You need the second one.

Reading Time: ~7 min