When we started building our first AI products in 2017, we were operating in uncharted territory. There was no playbook for 'prompt engineering'; that concept wouldn't emerge for years. Instead, we were learning hard lessons about early language models: discovering why our early assistant prototypes would unexpectedly go off piste, and fielding questions from onlookers about why small language models embedded in mobile apps didn't feel more like Scarlett Johansson's famous AI avatar.
Fast forward to today, and we're helping ASX-listed companies create highly impactful LLM-powered systems that handle vast amounts of valuable interactions.
The technology has transformed the way ambitious organisations are now creating new experiences for customers, driving efficiency and growth internally and augmenting the value that their people and data offer.
In working across a diverse range of AI projects over the last few years, we've developed some principles that, while not set in stone, are robustly discussed every day in our work.
We've found that applying product development principles to internal AI initiatives—which forms a significant portion of our work—is just as valuable as using them for customer-facing AI applications.
So whether you're leading AI strategy from the C-suite or building it in the trenches, we think these eight principles will help you ship AI products that users actually adopt and businesses actually benefit from:
1. User Jobs-to-be-Done, Not AI Jobs-to-be-Done
Traditional product thinking always had us start with user needs and build features to meet them. With AI, we still start with user needs, but we also have to figure out where AI actually adds value.
The biggest mistake we see people make? Falling in love with what the AI can (or might be able to) do instead of what users need it to do.
Your users don't care that your LLM can write poetry, count the Rs in strawberry and explain quantum physics. They care about getting their specific job done faster, better, cheaper, or even more creatively. Before you write a single prompt, map out the core user journey and identify the specific friction points where AI can genuinely help.
Additionally, when it comes to AI workflow design, we constantly need to question why things are currently done a certain way. In 2025, there are stacks of ways to reimagine customer experiences and organisational workflows in light of what's now possible that simply wasn't yesterday.
2. AI Product Owners: business and technology
The most successful AI implementations we've seen at Move 37 have one thing in common: dedicated business and technology product owners who can make decisions in real-time.
Not just stakeholders or reviewers, but actual decision-makers with skin in the game. To be clear, these people come in many forms: CEOs of medium-sized businesses, CTOs, CMOs, Product Innovation Leads and so on.
Here's why this matters more than ever with AI products:
a. AI decisions happen at the intersection of business logic and technical possibility.
When your LLM suggests three different approaches to handling a customer complaint, you need someone who understands both the business implications ("Will this response align with our brand voice and customer retention goals?") and the technical constraints ("Can we reliably detect when to use this response pattern, and what's our fallback if the confidence score is low?").
b. The judgment calls are constant and nuanced:
Should the AI be more helpful or more cautious in this scenario?
When should we escalate to humans vs. push for automation?
How do we balance personalisation with privacy?
What level of AI confidence threshold makes business sense?
How do we handle edge cases that represent 1% of interactions but 50% of user frustration?
AI product development is a team sport that requires both business alignment and technical discernment, at the same time.
3. Deterministic Outcomes from Non-Deterministic Systems
One of the biggest challenges in working with business and technology teams is developing a shared understanding of the characteristics of the AI in play in any given system and job-to-be-done. We never rely solely on an LLM's training data to deliver value in a system. Our collaborative Move 37/client teams work hard at context engineering and all that it entails (managing local data, memory, user personalisation, state and heaps more). In traditional software engineering, the aim was always to build predictable systems with consistent outputs. But with an LLM and the surrounding engineering, the opportunity is far greater.
So something we've learned when working with generative AI is that we need to:
Embrace variability while engineering reliability.
LLMs are probabilistic by nature, so they'll give you different outputs for the same input. This drives traditional product people crazy, but it's a feature, not a bug.
The key is building systems that deliver consistent user outcomes even when the underlying AI responses vary.
So consider your new reliability framework:
Input standardisation: Control what goes into the model
Output validation: Check what comes out before it reaches users
Graceful degradation: Have fallbacks when AI fails
Human-in-the-loop triggers: Know when to escalate to humans
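The four layers above can be sketched as a single wrapper around a model call. This is a minimal illustration, not our production code: `call` is any model function you supply, and the banned-phrase check, retry count and fallback message are placeholder assumptions.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class AIResult:
    text: str
    escalate: bool = False  # human-in-the-loop trigger

def normalise_input(user_text: str) -> str:
    """Input standardisation: collapse whitespace and cap length."""
    return " ".join(user_text.split())[:2000]

def is_valid(output: str) -> bool:
    """Output validation: reject empty or off-policy responses (illustrative check)."""
    banned = ("as an ai language model",)
    return bool(output.strip()) and not any(b in output.lower() for b in banned)

def reliable_answer(
    user_text: str,
    call: Callable[[str], str],
    fallback: str = "Let me connect you with a team member.",
    max_retries: int = 2,
) -> AIResult:
    prompt = normalise_input(user_text)
    for _ in range(max_retries):
        try:
            output = call(prompt)
        except Exception:
            continue  # graceful degradation: retry on transient failure
        if is_valid(output):
            return AIResult(text=output)
    # All attempts failed validation: degrade gracefully and flag for a human
    return AIResult(text=fallback, escalate=True)
```

The point is that the user always gets a consistent outcome (a valid answer or a clean hand-off), even though the model's raw outputs vary.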
Initially, think of it like managing a brilliant but inexperienced intern. You give them clear instructions, check their work, and have backup plans.
4. Progressive Prompt Architecture (Your New MVP Strategy)
Forget trying to build the AI assistant that does everything. Start with one specific task, nail the prompt and context engineering, then gradually expand. We call this "progressive prompt architecture": a bit like progressive web apps, but for AI capabilities.
A potential expansion path can look like this:
Single-shot reliability: get one specific task working really well
Context expansion: Add relevant context, memory and other state
Multi-turn interactions: Enable back-and-forth interactions where needed
Workflow integration: Connect multiple AI capabilities and external systems
Autonomous operation: Reduce human oversight where appropriate
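One lightweight way to enforce this expansion path is to gate capabilities behind explicit stages, so nothing ships before its prerequisites are solid. A minimal sketch; the stage names mirror the list above, and the feature-to-stage mapping is a hypothetical example.

```python
from enum import IntEnum

class Stage(IntEnum):
    SINGLE_SHOT = 1   # one task, done reliably
    CONTEXT = 2       # context, memory and state added
    MULTI_TURN = 3    # back-and-forth interactions
    WORKFLOW = 4      # multiple capabilities and external systems
    AUTONOMOUS = 5    # reduced human oversight

# Illustrative mapping: each capability unlocks at a given stage
FEATURES = {
    "memory": Stage.CONTEXT,
    "conversation": Stage.MULTI_TURN,
    "external_tools": Stage.WORKFLOW,
    "auto_approve": Stage.AUTONOMOUS,
}

def enabled(feature: str, current: Stage) -> bool:
    """A capability is available only once its stage has been reached."""
    return current >= FEATURES[feature]
```

A gate like this also gives product owners a shared vocabulary for "where are we on this capability, and what unlocks next".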
5. Evaluation-Driven Development
There's not a day (or even hour) that goes by without Pan, our co-founder/CTO, mentioning evals. In traditional product development we'd largely focus on user behaviour and business metrics to assess the successful implementation of an app or platform.
But with AI, we need to measure AI performance before it impacts these same user metrics.
Traditional A/B testing isn't enough for AI products. You need evaluation pipelines that catch AI failures before they become user failures. This means building evaluation frameworks that test:
Correctness: Is the AI giving accurate information?
Relevance: Is the response actually helpful for this user's context?
Safety: Could this output cause harm or violate policies?
Consistency: Would the AI handle similar inputs similarly?
Latency: Is it fast enough for the user experience?
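A basic eval pipeline over those dimensions can be as simple as a loop over labelled cases. This is a sketch under obvious assumptions: `model` stands in for your real model call, correctness here is a substring match (real evals use richer graders), and the latency budget is arbitrary.

```python
import time

def run_evals(model, cases, max_latency_s=2.0):
    """Score a model against a small eval set on correctness, safety and latency."""
    results = []
    for case in cases:
        start = time.perf_counter()
        output = model(case["input"])
        latency = time.perf_counter() - start
        results.append({
            # Correctness: naive substring check as a stand-in for a real grader
            "correct": case["expected"].lower() in output.lower(),
            # Safety: no case-specific blocked terms in the output
            "safe": not any(t in output.lower() for t in case.get("blocked", [])),
            # Latency: within the user-experience budget
            "fast": latency <= max_latency_s,
        })
    n = len(results)
    return {k: sum(r[k] for r in results) / n for k in ("correct", "safe", "fast")}
```

Run it in CI on every prompt or model change, so AI regressions surface before they become user regressions.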
Pro tip: be careful about overfitting in evaluation. If you’re achieving 100% accuracy it’s likely that your product is deeply broken or you are tracking the wrong metrics.
6. Context is Your Product Moat
I get the feeling most people are across this these days, but it's still worth saying out loud: your unfair advantage is the context you provide to your models.
Every company has access to the same foundation models. Your competitive moat isn't the LLM itself, it's the proprietary context, data, and domain knowledge you feed into it.
There are a lot of different aspects to context but some common lenses include:
User context: Who is this person and what's their history?
Situational context: What's happening right now? What is the job to be done?
Domain context: What are the rules, constraints, and best practices?
Organisational context: How does this fit into broader company processes?
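In practice, these lenses become layers in the prompt you assemble before the user's request ever reaches the model. A minimal sketch of that assembly step; the section headings and markdown-style formatting are illustrative choices, not a standard.

```python
def build_context(user: str, situational: str, domain: str, organisational: str) -> str:
    """Layer the four context lenses into one prompt preamble, skipping empty ones."""
    sections = [
        ("User context", user),
        ("Situational context", situational),
        ("Domain context", domain),
        ("Organisational context", organisational),
    ]
    return "\n\n".join(f"## {title}\n{body}" for title, body in sections if body)
```

Everything that feeds those four arguments (your CRM, your knowledge base, your policies) is the context pipeline worth investing in.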
The companies winning with AI are engineering better context. Invest in your context pipeline like you would invest in your core product features.
7. Human-AI Collaboration Patterns (Not Replacement)
There’s a reason we have the phrase “Human in the loop” embroidered on our Move 37 bomber jackets. It’s because since day one we understood that our role as product creators was to amplify human capabilities with AI superpowers via augmentation.
The most successful AI products we've built don't replace humans, they make humans superhuman. Instead of thinking "how can AI do this job," think "how can AI help humans do this job better."
Some proven collaboration patterns:
AI drafts, human refines: AI creates first version, human polishes
Human steers, AI executes: Human provides direction, AI handles execution
AI suggests, human decides: AI provides options, human makes final call
AI monitors, human intervenes: AI handles routine cases, escalates edge cases
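Two of these patterns are simple enough to sketch directly. These are toy illustrations of the routing logic only; the confidence threshold and function names are hypothetical.

```python
def ai_suggests_human_decides(options: list, human_choice: int) -> str:
    """'AI suggests, human decides': the model proposes options, a person picks one."""
    return options[human_choice]

def ai_monitors_human_intervenes(confidence: float, threshold: float = 0.8) -> str:
    """'AI monitors, human intervenes': routine high-confidence cases are handled
    automatically; low-confidence edge cases are escalated to a person."""
    return "auto" if confidence >= threshold else "escalate"
```

The design point is that the human's role is explicit in the code path, not an afterthought bolted on when the model misbehaves.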
Your users probably don't want to be replaced by AI. They want to be empowered by it. At a high level, design for augmentation, not automation (even if some specific tasks become automated).
8. Cost-Performance Product Trade-offs
AI products introduce a new dimension to the traditional quality-speed-cost triangle. Every AI interaction has a marginal cost, and model performance often comes with exponential price increases. We always need to keep an eye on the balance between model capability, latency, and cost per interaction.
Your new optimisation framework:
Model tiering: Use smaller models for simple tasks, larger models for complex ones
Caching strategies: Don't re-compute what you've already computed
Prompt optimisation: efficient prompt engineering = lower costs and faster responses
Result recycling: Similar inputs should reuse previous outputs when appropriate
For example, we built a content authoring system that uses a small, fast model for obvious cases (clearly appropriate or clearly inappropriate content) and only escalates ambiguous cases to a more expensive, sophisticated model. This reduced costs by 80% while maintaining accuracy.
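A tiered router like the one described can be sketched in a few lines. The two model functions here are cheap stand-ins (not real model calls), and the 0.9 confidence threshold is an illustrative assumption; the structure, small model first, cache everything, escalate only ambiguous cases, is the point.

```python
from functools import lru_cache

CONFIDENT = 0.9  # illustrative threshold above which the small model's verdict stands

def small_model(text: str) -> tuple[str, float]:
    """Stand-in for a cheap, fast classifier returning (label, confidence)."""
    if "spam" in text:
        return "reject", 1.0          # clearly inappropriate
    if len(text) < 20:
        return "approve", 0.95        # clearly fine
    return "approve", 0.5             # ambiguous: not confident

def large_model(text: str) -> tuple[str, float]:
    """Stand-in for an expensive, more capable model."""
    return ("reject" if "spam" in text else "approve"), 1.0

@lru_cache(maxsize=1024)           # caching: never re-compute an identical input
def moderate(text: str) -> str:
    """Model tiering: cheap model first, escalate only ambiguous cases."""
    label, confidence = small_model(text)
    if confidence >= CONFIDENT:
        return label
    return large_model(text)[0]    # escalation path for the ambiguous minority
```

Because only the ambiguous slice of traffic ever reaches the expensive model, most of the cost savings come from the routing, with the cache picking up repeat inputs on top.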
In Summary: The Meta-Principle is Build for Humans, Scale with AI
After eight years and a bunch of AI products for ourselves and client partners, here's what we've learned: the best AI products feel magical to users and boring to engineers.
Users shouldn't have to think about prompts, tokens, or model limitations. They should just get their job done better than before. Our product and engineering teams, meanwhile, should have robust systems, clear evaluation metrics, and predictable cost structures.
Now that hardly any organisations need to train their own models, the companies that will capture advantage are the ones that apply these kinds of well-considered product development principles to their AI initiatives.
What's your experience been? How are you thinking about building AI products? What principles would you add to this list?
Move 37 helps ambitious organisations build AI products and tools that drive impact. If you're wrestling with any of these challenges, reach out, we'd love to share what we've learned.