I Built an AI Coach in 2 Months. AI Engineering is Engineering.

Two months ago, ARDA was just an idea born from a 20-year-old dream: an AI coach that could provide the brutally honest, reality-based relationship advice I wished I’d had as a young man. Today, it’s a deployed, full-stack application.

This wasn’t a miracle. It was a sprint of intense AI engineering, a process that was far more about taming the beast than simply wiring up an API. This is the real story of what it takes to build a truly intelligent, context-aware AI system.

The First Mistake: Naive RAG and the “Similarity” Trap

My initial approach was the textbook one: a simple Retrieval-Augmented Generation (RAG) pipeline. The idea was to take a user’s question, find “similar” chunks of text in my vast knowledge base by embedding similarity, and feed those chunks to the AI.
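
For context, here is a minimal sketch of what that first pipeline looked like. The names (`Chunk`, `EmbeddingClient`) are illustrative assumptions, not ARDA’s actual code: embed the question, rank pre-embedded chunks by cosine similarity, and hand the top hits to the model.

```java
import java.util.Comparator;
import java.util.List;

// Illustrative naive RAG retrieval: rank knowledge chunks by cosine similarity
// to the question embedding and feed the top matches to the prompt.
record Chunk(String text, float[] embedding) {}

interface EmbeddingClient {
    float[] embed(String text);   // assumed wrapper around an embeddings endpoint
}

class NaiveRetriever {

    private final List<Chunk> knowledgeBase;   // pre-embedded chunks
    private final EmbeddingClient embedder;

    NaiveRetriever(List<Chunk> knowledgeBase, EmbeddingClient embedder) {
        this.knowledgeBase = knowledgeBase;
        this.embedder = embedder;
    }

    List<Chunk> topK(String question, int k) {
        float[] q = embedder.embed(question);
        return knowledgeBase.stream()
                .sorted(Comparator.comparingDouble((Chunk c) -> cosine(q, c.embedding())).reversed())
                .limit(k)
                .toList();
    }

    private static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb) + 1e-9);
    }
}
```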

It was a complete failure.

The system was stupid. It would find superficial keyword matches but miss the deep, underlying principles. It was like a research assistant who could find books with the word “king” in the title but couldn’t tell you the first thing about Machiavelli. I realized that for a system as nuanced as ARDA, a simple RAG pipeline was not enough.

The Pivot: Brute-Force Context

This led to a radical pivot. Instead of spoon-feeding the AI small, “similar” chunks of knowledge, I decided to give it the entire library. I would leverage the massive context windows of modern models to inject ARDA’s entire philosophical DNA into every single conversation.

My job transformed from “developer” to “knowledge distiller.” I spent weeks meticulously deconstructing dozens of foundational texts – from Doc Love to Daniel Priestley to Jordan Peterson – distilling them to first principles and structuring them into a tiered JSON knowledge base, ensuring no critical nuance was lost. The tiers break down like this (a rough sketch of the injection follows the list):

  • The BASE_ASSISTANT gets an 85k token knowledge dump.
  • The WINGMAN_ASSISTANT gets 120k.
  • The FULL_COACH gets the entire 150k+ token library.
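
The injection itself can be as blunt as this sketch suggests. The file names and token counts per file are assumptions for illustration; the point is that each tier bakes its slice of the knowledge base directly into the system prompt instead of retrieving chunks.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Illustrative tiered knowledge injection: every assistant tier gets its own
// slice of the JSON knowledge base prepended to the system prompt.
enum AssistantTier {
    BASE_ASSISTANT(List.of("core-principles.json")),                                          // ~85k tokens
    WINGMAN_ASSISTANT(List.of("core-principles.json", "field-guides.json")),                  // ~120k tokens
    FULL_COACH(List.of("core-principles.json", "field-guides.json", "deep-theory.json"));     // 150k+ tokens

    private final List<String> knowledgeFiles;

    AssistantTier(List<String> knowledgeFiles) {
        this.knowledgeFiles = knowledgeFiles;
    }

    String buildSystemPrompt(Path knowledgeDir, String constitution) {
        StringBuilder prompt = new StringBuilder(constitution);
        for (String file : knowledgeFiles) {
            try {
                prompt.append("\n\n--- KNOWLEDGE: ").append(file).append(" ---\n")
                      .append(Files.readString(knowledgeDir.resolve(file)));
            } catch (Exception e) {
                throw new IllegalStateException("Missing knowledge file: " + file, e);
            }
        }
        return prompt.toString();
    }
}
```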

The Model Gauntlet: Finding the Right “Brain” for the Job

Injecting the full context was a breakthrough, but it revealed the next major challenge: not all models are created equal. I ran a gauntlet of tests:

  • GPT-5: Brilliant, creative, and utterly undisciplined. It would constantly break character, defaulting to a helpful, therapeutic, mainstream persona. It was a genius that couldn’t follow orders. Useless for ARDA.
  • Gemini 2.5 Pro and Flash: The massive context windows were seductive, but like GPT-5, both would sometimes wander off track, losing the core thread of the ARDA philosophy.
  • Claude Sonnet 4: Impressively good at adhering to a large set of initial instructions. It was a disciplined soldier, excellent for scaffolding and following architectural principles. A strong contender. VERY EXPENSIVE.
  • GPT-4.1-mini: The unexpected champion. While GPT-4.1 was marginally better at analysis, it was significantly slower and more expensive. GPT-4.1-mini hit the perfect sweet spot: it was disciplined enough to stay in character, fast enough for a real-time chat experience, and cost-effective enough to build a viable business on.

The final architecture became a multi-model “special forces” team, orchestrated with LangChain4j:

  • The Router (GPT-5-micro): Solid, fast, and reliable for the simple task of classifying user intent.
  • The Translator (GPT-5-mini): Exceptionally good at its one job.
  • The Coach (GPT-4.1-mini): The workhorse, the disciplined core of the operation.
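
A simplified sketch of the orchestration idea follows. The `LlmClient` interface and the routing rubric are my own illustrative stand-ins, not ARDA’s actual LangChain4j wiring: a cheap router classifies intent, then the request is dispatched to the specialist model for that job.

```java
// Illustrative multi-model orchestration: a fast, cheap router classifies the
// user's intent, then the message is dispatched to the right specialist model.
interface LlmClient {
    String complete(String systemPrompt, String userMessage); // assumed wrapper around a chat-completions API
}

enum Intent { COACHING, TRANSLATION, SMALL_TALK }

class CoachOrchestrator {

    private final LlmClient router;      // fast, cheap classifier
    private final LlmClient translator;  // single-purpose translation model
    private final LlmClient coach;       // the workhorse, loaded with the full knowledge base
    private final String coachSystemPrompt;

    CoachOrchestrator(LlmClient router, LlmClient translator, LlmClient coach, String coachSystemPrompt) {
        this.router = router;
        this.translator = translator;
        this.coach = coach;
        this.coachSystemPrompt = coachSystemPrompt;
    }

    String handle(String userMessage) {
        Intent intent = classify(userMessage);
        return switch (intent) {
            case TRANSLATION -> translator.complete("Translate the user's message to English.", userMessage);
            case COACHING, SMALL_TALK -> coach.complete(coachSystemPrompt, userMessage);
        };
    }

    private Intent classify(String userMessage) {
        String label = router.complete(
                "Classify the message as COACHING, TRANSLATION or SMALL_TALK. Reply with the label only.",
                userMessage).trim().toUpperCase();
        try {
            return Intent.valueOf(label);
        } catch (IllegalArgumentException e) {
            return Intent.COACHING; // default to the coach if the router is unsure
        }
    }
}
```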

I am also exploring self-hosted models; I don’t want a critical piece of the system to depend on a single vendor.

The “In-Memory Database”: Conversation as Context

Early on, I considered modeling the user’s situation in a complex set of database entities. I quickly realized this was a fool’s errand. I would be constantly translating text to entities and back to text. Overengineering much?

The solution was simpler and more powerful: the conversation history is the database. The entire, ongoing chat log is passed to the AI on every turn – today’s large context windows make that affordable. This allows the AI to “remember” its previous diagnoses, track the user’s progress, and detect changes in his situation or proficiency level without a complex persistence layer.
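
A minimal sketch of that idea (reusing the hypothetical `LlmClient` wrapper from the earlier sketch; message shape and rendering are simplified assumptions): store the raw transcript per session and replay it, in full, on every turn.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative "conversation as context": the whole transcript is replayed on every
// turn, so the model can track diagnoses and progress without a structured schema.
record Message(String role, String content) {}   // role: "user" or "assistant"

class ConversationSession {

    private final List<Message> history = new ArrayList<>();
    private final LlmClient coach;        // same hypothetical wrapper as above
    private final String systemPrompt;    // the "constitution" plus knowledge base

    ConversationSession(LlmClient coach, String systemPrompt) {
        this.coach = coach;
        this.systemPrompt = systemPrompt;
    }

    String send(String userMessage) {
        history.add(new Message("user", userMessage));
        String reply = coach.complete(systemPrompt, render(history)); // full history, every turn
        history.add(new Message("assistant", reply));
        return reply;
    }

    private static String render(List<Message> messages) {
        StringBuilder sb = new StringBuilder();
        for (Message m : messages) {
            sb.append(m.role().toUpperCase()).append(": ").append(m.content()).append("\n");
        }
        return sb.toString();
    }
}
```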

How Do I Test This?

Oh, this was maybe 40% of the effort. I have a batch of 150 input “stories” where I already knew what the answers should be. I also went to Reddit for fresh content to see how the AI would respond, and I found many gaps that way; some were added to the test batch. With any major rebuild of the system prompt or a reorg of the knowledge base, I would run the batch again and verify the AI’s responses.
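
A sketch of how that regression batch can be driven (the story format and the “expected themes” pass criterion are assumptions for illustration, reusing the hypothetical `LlmClient` from the earlier sketches): replay each story against the coach and flag responses that miss the themes a good answer should contain, then review the flagged ones by hand.

```java
import java.util.List;

// Illustrative regression harness: replay the story batch after every major
// prompt or knowledge-base change and flag responses missing expected themes.
record TestStory(String id, String userStory, List<String> expectedThemes) {}

class RegressionRunner {

    private final LlmClient coach;       // same hypothetical wrapper as above
    private final String systemPrompt;

    RegressionRunner(LlmClient coach, String systemPrompt) {
        this.coach = coach;
        this.systemPrompt = systemPrompt;
    }

    void run(List<TestStory> stories) {
        int flagged = 0;
        for (TestStory story : stories) {
            String reply = coach.complete(systemPrompt, story.userStory()).toLowerCase();
            List<String> missing = story.expectedThemes().stream()
                    .filter(theme -> !reply.contains(theme.toLowerCase()))
                    .toList();
            if (!missing.isEmpty()) {
                flagged++;
                System.out.printf("REVIEW %s – missing themes: %s%n", story.id(), missing);
            }
        }
        System.out.printf("%d/%d stories flagged for review%n", flagged, stories.size());
    }
}
```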

This can absolutely be automated in the future. I am even thinking of adding LIVE VERIFICATION of the AI responses – did it address the user’s question directly? Did it empathize if the user was in pain? Did it introduce new (and appropriate) information? Did it reframe the situation in a productive way, so the user can see their own way out?

But live verification would mean longer response times. We’ll see where this goes.
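
If live verification ever lands, it could look something like this hedged sketch (the rubric and the judge model are my assumptions, again reusing the hypothetical `LlmClient`): a second, cheaper model grades the draft reply against a checklist before it is sent, at the cost of one extra round-trip.

```java
// Illustrative live verification: a judge model grades the draft reply
// against a small rubric before it is returned to the user.
class ResponseVerifier {

    private final LlmClient judge;   // same hypothetical wrapper as above

    ResponseVerifier(LlmClient judge) {
        this.judge = judge;
    }

    boolean passes(String userMessage, String draftReply) {
        String rubric = """
                Answer PASS or FAIL. FAIL if the reply does not address the user's question directly,
                shows no empathy when the user is clearly in pain, adds no new and appropriate
                information, or fails to reframe the situation in a productive way.
                """;
        String verdict = judge.complete(rubric, "USER: " + userMessage + "\nREPLY: " + draftReply);
        return verdict.trim().toUpperCase().startsWith("PASS");
    }
}
```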

Conclusion: AI Engineering is the New Frontier

In the end, building ARDA was less of an application engineering challenge and more of an AI engineering challenge. The Spring Boot backend and React frontend were straightforward, thanks to the Clean Architecture principles I had mastered on previous projects. That solid foundation gave me the freedom to iterate on the AI component at an incredible pace, going from idea to deployment in just two months.

The final lesson is this: building a powerful AI product today is not about having the most complex backend. It’s about a relentless, empirical process of:

  1. Distilling and structuring deep domain knowledge.
  2. Selecting the right model for the specific job, not just the biggest one.
  3. Engineering a master prompt that acts as an unshakeable constitution for the AI’s “mind.”
  4. Testing, verifying, and then verifying some more, because even a small tweak can cause unexpected breakdowns.

It’s a new kind of engineering, a hybrid of librarian, psychologist, and architect. And we are only just getting started.
