Your users almost never file feature requests. But they tell your AI agent exactly what they want, dozens of times a day, in phrases like “can you also…” and “is there a way to…”. To find feature requests from conversations, you detect those request-shaped intents automatically, cluster the near-duplicates, then rank them by frequency and the revenue sitting behind them.
The hard part is not detection. The hard part is separating a real feature gap from your agent simply misunderstanding the question. Get that wrong and you ship things nobody asked for while ignoring the asks that actually churn accounts.
This is a method piece. Here is how to do it without drowning in logs.
Why feature requests hide inside conversations
Public roadmaps and feedback boards capture maybe the loudest 2% of your users. The motivated ones. The ones who already love you enough to log in and type a request into a form.
Everybody else just talks to the agent. They hit a wall, ask the agent to do the thing, get told “I can’t do that yet,” and move on. That moment is a feature request. It just never gets written down anywhere a PM will see it.
The conversation is the highest-signal feedback channel you have, because the request shows up at the exact moment of intent. The user wanted something, right now, badly enough to ask. No survey fatigue. No “what would you pay for” hypotheticals. Real demand, timestamped.
The catch: it is buried in unstructured text across thousands of threads, mixed in with greetings, off-topic questions, and the agent’s own failures. Keyword search won’t save you here. “Can you” matches half your transcripts and tells you nothing.
What does a request-shaped intent actually look like?
Before you can cluster anything, you need to reliably flag the turns that carry a request. In practice they fall into a handful of patterns:
- Direct asks: “Can you also export this to CSV?” / “Is there a way to schedule these?”
- Capability probes: “Do you support Slack notifications?” / “Does this work with Salesforce?”
- Workaround signals: “I usually have to copy this into a spreadsheet to…” (they are describing the gap you should close)
- Comparative gaps: “Tool X lets me do Y, can you do that?”
- Frustrated repeats: the user rephrases the same blocked ask three times in one session
That last one is gold and almost everyone misses it. A single phrasing might be ambiguous. The same user trying three different ways to get the agent to do something is not ambiguous at all. That is intensity, and intensity is a ranking signal.
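A rough way to surface frustrated repeats is to count how many user turns in a session were immediately followed by an agent refusal. The sketch below is a proxy, not a definitive detector: the `(role, text)` session shape, the refusal phrases, and the threshold of three are all assumptions for illustration, and it does not check that the blocked asks are the same ask.

```python
# Hypothetical refusal phrases; replace with whatever your agent actually says.
REFUSAL_MARKERS = ("can't", "cannot", "not supported", "unable to")


def is_refusal(agent_reply: str) -> bool:
    """True if the agent's reply looks like a refusal (simple phrase match)."""
    text = agent_reply.lower().replace("\u2019", "'")  # normalize curly apostrophes
    return any(marker in text for marker in REFUSAL_MARKERS)


def blocked_ask_count(session) -> int:
    """Count user turns that were directly followed by an agent refusal.

    `session` is an ordered list of (role, text) tuples, role being
    "user" or "agent" (an assumed shape, not a standard one).
    """
    count = 0
    for (role, _), (next_role, next_text) in zip(session, session[1:]):
        if role == "user" and next_role == "agent" and is_refusal(next_text):
            count += 1
    return count


def is_frustrated_repeat(session, threshold: int = 3) -> bool:
    """Flag sessions where the user was blocked at least `threshold` times."""
    return blocked_ask_count(session) >= threshold
```

Sessions flagged this way are worth a closer look, and the count itself can feed the intensity signal when you rank.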
You can classify these with a cheap LLM pass over each conversation turn. Tag every turn as request / question / complaint / smalltalk / other. Don’t overthink the taxonomy. The goal is a clean bucket of request-shaped turns you can work with, not a perfect ontology.
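Here is a minimal sketch of that tagging pass. The `complete` argument is a hypothetical stand-in for whatever LLM client you use (it takes a prompt string and returns the model’s text reply), and the prompt wording is an assumption you should tune.

```python
from typing import Callable, List

LABELS = ("request", "question", "complaint", "smalltalk", "other")

# Assumed prompt wording; adjust for your model and product.
PROMPT = (
    "Classify the user's message into exactly one label from: "
    + ", ".join(LABELS)
    + ".\nReply with the label only.\n\nMessage: {turn}"
)


def classify_turn(turn: str, complete: Callable[[str], str]) -> str:
    """Return one of LABELS for a single user turn."""
    reply = complete(PROMPT.format(turn=turn)).strip().lower()
    # Models sometimes add punctuation or extra words, so match on the prefix
    # and fall back to "other" when nothing matches.
    for label in LABELS:
        if reply.startswith(label):
            return label
    return "other"


def request_turns(turns: List[str], complete: Callable[[str], str]) -> List[str]:
    """Keep only the turns tagged as requests."""
    return [t for t in turns if classify_turn(t, complete) == "request"]
```

Keeping the model call behind a plain function makes it easy to swap providers or to test the pipeline with a fake.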
How do you separate real gaps from agent misunderstandings?
Here is the trap. “Is there a way to do X?” can mean two completely different things:
- X genuinely does not exist in your product. Real feature request.
- X exists, but the agent didn’t know about it, surfaced it badly, or hallucinated that it couldn’t.
If you treat every “is there a way to” as a feature request, you’ll flood your roadmap with phantom demand for features you already shipped. We’ve watched teams almost build a duplicate of an existing integration because the agent kept telling users it wasn’t supported.
The way through is to cross-check each candidate request against two things: what your product can actually do, and what the agent said next. A simple decision table:
| User asks for X | Agent’s response | Verdict |
|---|---|---|
| X does not exist | “I can’t do that” | True feature gap |
| X exists | “I can’t do that” | Agent / prompt bug, not a feature request |
| X exists | gives wrong steps | Agent knowledge gap |
| X is ambiguous | asks a clarifying question | Needs human review |
Rows two and three are not feature requests at all. They are agent failures wearing a feature request costume. They still matter (a lot), but they get fixed in your system prompt and tooling, not on your product roadmap. Routing them correctly is half the value of this whole exercise.
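The table translates almost directly into a lookup. In this sketch the response labels (`"refused"`, `"wrong_steps"`, `"clarifying_question"`) and the use of `None` for an ambiguous capability are assumed encodings, and sending every combination the table does not cover to human review is our assumption too.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    TRUE_GAP = "true feature gap"
    AGENT_BUG = "agent / prompt bug"
    KNOWLEDGE_GAP = "agent knowledge gap"
    NEEDS_REVIEW = "needs human review"


# (does X exist?, what the agent did next) -> verdict, one entry per table row.
# None means it is unclear whether X exists.
_DECISIONS = {
    (False, "refused"): Verdict.TRUE_GAP,
    (True, "refused"): Verdict.AGENT_BUG,
    (True, "wrong_steps"): Verdict.KNOWLEDGE_GAP,
    (None, "clarifying_question"): Verdict.NEEDS_REVIEW,
}


def route(capability_exists: Optional[bool], agent_response: str) -> Verdict:
    """Map a candidate request to a verdict; unknown combinations go to review."""
    return _DECISIONS.get((capability_exists, agent_response), Verdict.NEEDS_REVIEW)


def belongs_on_roadmap(capability_exists: Optional[bool], agent_response: str) -> bool:
    """Only true feature gaps are roadmap items; the rest go to agent fixes or review."""
    return route(capability_exists, agent_response) is Verdict.TRUE_GAP
```

Whether a capability exists has to come from somewhere you trust, such as a maintained capability list, not from the agent’s own answer.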
Cluster, then rank by what actually moves the business
Once you have a clean set of true requests, you’ll notice the same ask shows up in fifty different phrasings. “Export to CSV,” “download as spreadsheet,” “get this in Excel,” “can I pull this into a sheet.” Same request. Cluster them semantically so they collapse into one line item with a count of 50, not fifty separate tickets.
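One simple way to collapse the duplicates is greedy threshold clustering over embedding vectors: each request joins the most similar existing cluster if it clears a cosine-similarity threshold, otherwise it starts a new one. This is a sketch of one possible technique, not the only one. The `embed` function is a hypothetical stand-in for your embedding model, and the `0.8` threshold is an assumption you would tune on your own data.

```python
import math
from typing import Callable, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_requests(
    texts: List[str],
    embed: Callable[[str], Sequence[float]],
    threshold: float = 0.8,
) -> List[List[str]]:
    """Group texts whose embeddings are close to an existing cluster centroid."""
    clusters = []  # each item: {"centroid": [...], "members": [...]}
    for text in texts:
        vec = list(embed(text))
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"centroid": vec, "members": [text]})
        else:
            # Update the centroid as a running mean of its members.
            n = len(best["members"])
            best["centroid"] = [(c * n + v) / (n + 1) for c, v in zip(best["centroid"], vec)]
            best["members"].append(text)
    return [cluster["members"] for cluster in clusters]
```

The length of each returned group is the count for that line item. Because requests are assigned one at a time, the same function can keep absorbing new requests into existing themes.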
Then rank. And please, do not rank by raw frequency alone. Frequency tells you what is common, not what is valuable. The ranking that actually drives decisions combines:
- Frequency: how many distinct users asked
- Intensity: repeats within a session, escalations, frustration markers
- Revenue exposure: the ARR of the accounts asking, weighted toward expansion and churn-risk accounts
- Stall correlation: did the conversation end in drop-off, downgrade, or a “never mind” right after the blocked ask
A request from three enterprise accounts that all stalled out afterward beats a request from two hundred free-tier users who kept using the product anyway. That is the difference between a feature analysis and a feature board nobody trusts.
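To make that concrete, here is one way to fold the four signals into a single score. Everything numeric here is hypothetical: the weights, the log scaling of counts and ARR, and the example figures are illustrative assumptions, not a recommended formula.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestCluster:
    name: str
    distinct_users: int         # frequency
    repeats_per_session: float  # intensity
    arr_exposed: float          # ARR of asking accounts, assumed already weighted
                                # toward expansion and churn-risk accounts
    stall_rate: float           # share (0 to 1) of conversations that stalled
                                # right after the blocked ask


# Hypothetical weights; tune them against your own data.
WEIGHTS = {"frequency": 1.0, "intensity": 1.0, "revenue": 1.0, "stall": 5.0}


def priority(cluster: RequestCluster) -> float:
    """Weighted sum of the four signals; logs damp very large counts and ARR."""
    return (
        WEIGHTS["frequency"] * math.log1p(cluster.distinct_users)
        + WEIGHTS["intensity"] * cluster.repeats_per_session
        + WEIGHTS["revenue"] * math.log1p(cluster.arr_exposed)
        + WEIGHTS["stall"] * cluster.stall_rate
    )


def rank(clusters):
    """Highest priority first."""
    return sorted(clusters, key=priority, reverse=True)


# Made-up numbers mirroring the example above.
enterprise = RequestCluster("Salesforce sync", 3, 2.0, 300_000.0, 1.0)
free_tier = RequestCluster("CSV export", 200, 1.0, 0.0, 0.0)
print([c.name for c in rank([free_tier, enterprise])])
```

With these assumed weights, the three stalled enterprise accounts outrank the two hundred free-tier users. Change the weights and the order can flip, which is why they deserve an explicit, reviewable place in your code.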
When teams run this properly, the volume is genuinely surprising. One team, Odysser, surfaced over 1,000 distinct feature requests hiding in their agent chats, the overwhelming majority of which had never been filed anywhere. They weren’t getting fewer requests than they thought. They were just blind to them.
Turning this from a one-time audit into a living loop
A quarterly export-and-cluster exercise is better than nothing. But it ages fast, and by the time you’ve finished the analysis the next batch of conversations is already piling up.
The real win is making this continuous: every conversation gets classified as it happens, requests get clustered into existing themes automatically, and your top-ranked gaps stay current without anyone running a script. This is exactly the loop Agnost AI is built to run. It reads every conversation your agent has, auto-generates intents like feature requests and setup friction that are specific to your product, and tracks them live so you see which gaps are growing and which accounts they’re tied to. When the issue is actually an agent misunderstanding rather than a missing feature, it opens a pull request against your system prompt or harness to fix it, and you review and merge.
The point is to stop treating conversations as logs you grep when something breaks, and start treating them as your most honest product backlog.
FAQ
How is this different from a feedback board like Canny?
Feedback boards capture the small slice of users motivated enough to file something. This method captures demand from the 98% who never will, by reading what they already told your agent. Use both. Boards for explicit, prioritized asks; conversation mining for the silent majority and the requests people don’t realize are requests.
Won’t an LLM over-flag everything as a feature request?
Yes, if you skip the cross-check step. The fix is the decision table above: compare each candidate against what your product can actually do and what the agent said next. That filters out the “feature request” that is really just the agent forgetting a feature exists, which is the single biggest source of false positives.
What’s the minimum to get started this week?
Run a one-time LLM classification pass over your last month of transcripts, bucket the request-shaped turns, cluster the duplicates, and tag each cluster with the asking account’s plan tier. You’ll have a ranked list by Friday. Make it continuous after you’ve seen it work once.
If you’d rather not stitch this pipeline together by hand, Agnost AI does it continuously across every conversation and tracks each request cluster against the accounts and revenue behind it. It’s free to start, integrates in about two minutes, and works with any LLM or framework.