AI Customer Interview Analysis: 3 Techniques Most Product Teams Miss
Most AI synthesis that product teams do boils down to: "Upload your transcripts, ask for themes, then synthesize." If that.
It sounds reasonable. It doesn't work.
I've spent the last two years testing AI analysis methods across 150+ studies—with synthetic users, real participants, and consulting clients and students at companies like Canva, Figma, and Meta.
I've seen what most teams produce. And I've seen what's possible when you understand how these models actually work.
The gap is enormous.
Here's what's actually going on—and three techniques that will fundamentally change your results.
Why "Upload Everything and Ask for Themes" Fails
When you dump 8 transcripts into ChatGPT and ask for patterns, you get statistical averaging disguised as insight.
Here's what I mean: LLMs process your entire input as one context. When Customer A mentions checkout friction in a high-urgency B2B context and Customer B mentions it while casually browsing, the model doesn't maintain those distinctions. It sees "checkout friction: 2 mentions" and reports a theme.
You lose the why. You lose the when. You lose everything that makes an insight actionable.
This isn't a prompting problem. It's a fundamental limitation of how these models handle information density. And no amount of "be specific" or "consider context" instructions will fix it.
The solution isn't better prompts. It's a different architecture.
Process each transcript as an isolated analysis. Keep customer contexts completely separate until you—not the AI—decide how to synthesize across them.
This sounds obvious. Almost no one does it.
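If you want to script this rather than paste into a chat window, the architecture is simple: one call per transcript, with synthesis deferred to you. Here's a minimal sketch in Python, assuming the OpenAI SDK (any model client works the same way); the model name, prompts, and interview list are placeholders you'd swap for your own.

```python
# Minimal sketch: one isolated analysis per transcript, no cross-customer
# blending until a human decides what to synthesize. Assumes the OpenAI
# Python SDK; model name, prompts, and the interviews list are placeholders.
from openai import OpenAI

client = OpenAI()

def analyze_one(transcript: str, customer_context: str) -> str:
    """Analyze a single interview in its own context window."""
    prompt = (
        f"Customer context: {customer_context}\n\n"
        "Analyze ONLY this interview. Do not generalize beyond this customer.\n\n"
        f"{transcript}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Placeholder data: swap in your own (transcript, context) pairs.
interviews = [
    ("<transcript A text>", "B2B admin, high-urgency procurement"),
    ("<transcript B text>", "Consumer, casual browsing"),
]

# Each result stays separate; you read them all before deciding which
# patterns are real enough to synthesize across customers.
analyses = [analyze_one(t, ctx) for t, ctx in interviews]
```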
Three Techniques That Actually Work
1. Prime With Your Own Analysis (Don't Ask AI to Start From Scratch)
This is the technique that changed everything for my clients.
Most people ask AI to analyze their transcript cold. That's backwards.
Instead: review the transcript yourself first. Identify 3-5 opportunities you see. Then give those to the AI as a starting point.
The prompt structure:
“Here's a customer interview transcript. I've done an initial analysis and identified these potential opportunities:
1. [Your opportunity]
2. [Your opportunity]
3. [Your opportunity]
Review the transcript and:
- Strengthen, weaken, or refine each opportunity based on evidence
- Identify 1-2 opportunities I may have missed
- Flag anything that contradicts my interpretations”
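If you're scripting this, the priming prompt is just string assembly around opportunities you wrote down before the AI saw anything. A minimal sketch, again assuming the OpenAI Python SDK; the example opportunities, file name, and model are placeholders.

```python
# Minimal sketch: prime the model with your own opportunities and ask it to
# pressure-test them. Assumes the OpenAI Python SDK; the model name, file
# name, and example opportunities are placeholders.
from openai import OpenAI

client = OpenAI()

transcript = open("interview_03.txt").read()  # placeholder path

# Your read of the interview, written BEFORE the AI sees anything.
my_opportunities = [
    "Onboarding feels slow for admin users",
    "Pricing page language creates confusion about seat limits",
    "Export workflow forces a manual workaround",
]

numbered = "\n".join(f"{i}. {opp}" for i, opp in enumerate(my_opportunities, 1))
prompt = (
    "Here's a customer interview transcript. I've done an initial analysis "
    "and identified these potential opportunities:\n\n"
    f"{numbered}\n\n"
    "Review the transcript and:\n"
    "- Strengthen, weaken, or refine each opportunity based on evidence\n"
    "- Identify 1-2 opportunities I may have missed\n"
    "- Flag anything that contradicts my interpretations\n\n"
    f"TRANSCRIPT:\n{transcript}"
)

resp = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
)
print(resp.choices[0].message.content)
```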
Why this works:
You're not outsourcing your thinking.
You're pressure-testing it.
The AI becomes an aggressive devil's advocate against your own hypotheses instead of a summary machine generating plausible-sounding themes.
Teams who ask AI to analyze cold get: "Users want faster onboarding."
Teams who prime with their own analysis get:
"Your hypothesis about onboarding friction is partially supported—Customer explicitly mentioned time pressure in lines 34-38.
However, the deeper issue is likely trust, not speed. See lines 52-57 where they describe hesitation about data security despite completing onboarding quickly."
One of those changes product decisions. The other gets ignored for telling us what we already knew five years ago.
2. The Verification Loop (AI Auditing Itself)
Here's something most people don't realize: AI will confidently cite evidence that doesn't exist.
In my testing, roughly 25-30% of "supporting quotes" that ChatGPT provides are either paraphrased beyond recognition or completely fabricated. Not because the model is broken—because it's optimizing for coherent, helpful-sounding output.
The fix is a two-step verification loop:
Step 1: Generate findings with citations
“Analyze this transcript. For each opportunity you identify, include:
- The exact quote (verbatim, in quotation marks)
- The line numbers where it appears
- Your confidence level (high/medium/low)”
—
Step 2: Verify against source
“Review each finding above. Go back to the original transcript and confirm:
- Does the exact quote appear verbatim?
- Do the line numbers match?
- Is the interpretation justified by surrounding context?
Flag any findings where the evidence doesn't hold up.”
—
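If you run the loop in code instead of a chat window, you can also layer a purely mechanical check on top of the AI's self-audit. A minimal sketch, assuming the OpenAI Python SDK and that Step 1 returns a JSON array of findings; the model name, prompts, and field names are placeholders.

```python
# Minimal sketch of the two-step verification loop, plus a deterministic
# quote check. Assumes the OpenAI Python SDK; model name, prompt wording,
# file name, and the findings format are placeholders you would adapt.
import json
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

transcript = open("interview_01.txt").read()  # placeholder path

# Step 1: generate findings with verbatim quotes and line numbers.
findings = ask(
    "Analyze this transcript. For each opportunity, return a JSON array of "
    "objects with 'opportunity', 'quote' (verbatim), 'lines', and 'confidence'.\n\n"
    + transcript
)

# Step 2: have the model audit its own findings against the source.
audit = ask(
    "Review each finding below against the original transcript. Confirm the "
    "quote appears verbatim, the line numbers match, and the interpretation "
    "is justified by surrounding context. Flag anything that doesn't hold up.\n\n"
    f"FINDINGS:\n{findings}\n\nTRANSCRIPT:\n{transcript}"
)

# Belt and braces: a mechanical check that each cited quote actually exists,
# assuming Step 1 returned valid JSON.
def missing_quotes(findings_json: str, source: str) -> list[str]:
    return [f["quote"] for f in json.loads(findings_json) if f["quote"] not in source]
```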
This catches hallucinations before they become "insights" you present to stakeholders.
I've had this loop surface that an AI-generated "key finding" was based on a quote that didn't exist anywhere in the transcript. Without verification, that fake insight would have influenced product decisions.
3. Few-Shot Priming (Show the Model What Good Looks Like)
This is the most underutilized technique in AI research analysis.
LLMs are pattern-matching machines. If you show them examples of your desired output quality, they'll match that pattern. If you don't, they default to generic summarization.
Build a "golden example"—one transcript you've analyzed exceptionally well.
Include:
- The raw transcript
- Your analysis output in exactly the format you want
- Annotations explaining why you made certain interpretations
Then include this as context when analyzing new transcripts:
“Here's an example of how I analyze customer interviews:
[Your golden example]
Now analyze this new transcript using the same depth, format, and reasoning approach:
[New transcript]”
—
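In a scripted workflow, few-shot priming is just prepending the golden example to every new request. A minimal sketch, assuming the OpenAI Python SDK and that your golden example lives in a couple of local files; the file names and model are placeholders.

```python
# Minimal sketch: prepend your golden example (transcript + annotated analysis)
# to every new analysis request. Assumes the OpenAI Python SDK; file names
# and model are placeholders.
from openai import OpenAI

client = OpenAI()

golden_transcript = open("golden_transcript.txt").read()  # placeholder paths
golden_analysis = open("golden_analysis.txt").read()
new_transcript = open("interview_07.txt").read()

prompt = (
    "Here's an example of how I analyze customer interviews:\n\n"
    f"TRANSCRIPT:\n{golden_transcript}\n\n"
    f"ANALYSIS (with annotations explaining my reasoning):\n{golden_analysis}\n\n"
    "Now analyze this new transcript using the same depth, format, and "
    "reasoning approach:\n\n"
    f"{new_transcript}"
)

resp = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
)
print(resp.choices[0].message.content)
```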
The quality jump from few-shot priming is dramatic. You're essentially training the model on your analytical standards in real-time.
Teams using few-shot priming produce analyses that sound like they came from a senior researcher. Teams prompting cold get book-report summaries.
The Compound Effect
Each technique helps independently. Combined, they create a feedback loop:
- Priming with your own analysis forces you to engage with the data
- Verification catches errors before they compound
- Few-shot examples raise the quality ceiling on every analysis
This is how my students go from "AI gives me useless summaries" to "AI is the best research collaborator I've ever had."
The teams still uploading transcripts and asking for themes? They're getting exactly what they've always gotten—tidy lists of findings that don't change anything.
Same tool. Completely different results.
The Real Gap
Here's what I've learned from teaching this to hundreds of senior researchers and PMs:
The gap isn't access to AI anymore. Everyone has ChatGPT. Everyone has Claude.
The gap is in the approach to using it.
Understanding why models fail at synthesis.
Knowing which techniques actually move the needle.
Building workflows that use AI's speed without sacrificing the rigor your toughest decisions depend on.
That's the system I teach in AI Analysis for PMs—the complete workflow from transcripts and survey responses to insights you can back hard decisions with.
It includes full workflows, prompt templates, and verification frameworks, and every live session takes you from raw data to insights you can confidently share with stakeholders the next day.