A month ago, I wrote about building ARDA in record time. The system worked. It was deployed. Users were getting advice.
But “working” is not the same as “working well.” And as I learned in my previous article on technical debt, the phrase that should cause the most panic isn’t “it’s broken” – it’s “don’t touch it, it works.”
ARDA was starting to show cracks. The single-brain architecture that got us to launch was hitting its breaking point. This is the story of how I re-engineered the entire AI pipeline – and why the results were worth the pain.
The Breaking Point: One Assistant, Too Many Jobs
The original ARDA architecture was elegant in its simplicity: one powerful AI model with a massive system prompt containing the entire framework. Give it a user’s situation, and it would:
- Determine if the question was on-topic
- Detect who was asking (male or female perspective)
- Assess relationship timelines and stages
- Evaluate interest levels
- Identify red and green flags
- Score male behaviors (confidence, self-control, challenge, humor)
- Assess female attitude (integrity, giving nature, flexibility)
- Identify missing critical information
- AND THEN compose a coherent, personalized coaching response
That’s nine complex cognitive tasks for a single AI call. And it was starting to fail in predictable ways.
The system prompt was a 15,000+ word constitution trying to govern an impossibly broad mandate. The AI was constantly making trade-offs: focus on analysis or focus on advice? Be concise or be thorough? Follow the framework strictly or adapt to edge cases?
The worst failures came when women asked questions. The system was trained primarily on male-perspective coaching. When a woman would ask about her relationship situation, the AI would sometimes get confused about which role it was analyzing. It would fumble the perspective, mixing up who should be leading and who should be supporting. The advice wasn’t just mediocre – it was sometimes backwards.
The Realization: This is a Systems Engineering Problem
My background is in high-stakes systems engineering – avionics, medical devices, payment security. In those domains, you never give one component too many responsibilities. You decompose. You isolate. You create clear boundaries.
I had violated my own principles.
The solution was obvious once I saw it: treat the AI pipeline like a distributed system. Break the monolithic “super-brain” into a team of specialized analyzers, each with one focused job.
The New Architecture: A Special Forces Team
The redesigned pipeline became a three-stage orchestration:
Stage 1: The Router (Fast Triage)
A lightweight model (gpt-5-nano) with one job: classify the user’s intent. Is this on-topic? Which tier of coaching does it need? This triage completes in a few seconds at most.
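Conceptually, the router’s contract is tiny: message in, structured decision out. Here is a minimal sketch of that contract; the record name, tier labels, and the keyword heuristic are illustrative stand-ins for the real gpt-5-nano call, not the production logic:

```java
// Sketch of the Stage 1 router contract. In production this is a model
// call via LangChain4j; the keyword heuristic below is only a stand-in.
public class Router {

    // Structured output of the triage step (field names are illustrative).
    public record RoutingDecision(boolean onTopic, String tier) {}

    public static RoutingDecision route(String userMessage) {
        String m = userMessage.toLowerCase();
        boolean onTopic = m.contains("relationship") || m.contains("date")
                || m.contains("girlfriend") || m.contains("boyfriend");
        // Deeper coaching tier for longer, more involved situations.
        String tier = userMessage.length() > 200 ? "full-analysis" : "quick-advice";
        return new RoutingDecision(onTopic, tier);
    }
}
```

The important design point is the narrow output: the router never composes advice, it only emits a decision the rest of the pipeline can branch on.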
Stage 2: The Analyzer Battalion (Parallel Extraction)
Seven specialized analyzers run in parallel, each with a focused prompt and dedicated knowledge base:
- Polarity Analyzer: Determines user perspective (masculine leader vs feminine partner)
- Timeline Analyzer: Maps relationship stages and identifies failure patterns
- Missing Data Analyzer: Identifies gaps in the user’s story
- Interest Level Analyzer: Quantifies female interest (the core metric)
- Attitude Matrix Analyzer: Scores integrity, giving nature, flexibility
- Male Behaviors Analyzer: Evaluates confidence, self-control, challenge, humor
- Flags Analyzer: Catalogs red flags and green flags
Each analyzer produces structured JSON output. Each completes in 5-8 seconds (occasionally up to 15), and because they execute in parallel rather than waiting for each other, total latency is bounded by the slowest analyzer, not the sum of all seven.
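The fan-out above maps naturally onto `CompletableFuture`. A minimal sketch, assuming each analyzer is a model call that returns JSON (the `runAnalyzer` body here is a placeholder for the real LangChain4j invocation):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of the Stage 2 fan-out: seven analyzers launched in parallel,
// results collected once the slowest one finishes.
public class AnalyzerBattalion {

    static final List<String> ANALYZERS = List.of(
            "polarity", "timeline", "missingData", "interestLevel",
            "attitudeMatrix", "maleBehaviors", "flags");

    // Placeholder for one analyzer's model call; returns structured JSON.
    static String runAnalyzer(String name, String userStory) {
        return "{\"analyzer\":\"" + name + "\",\"input_chars\":" + userStory.length() + "}";
    }

    public static Map<String, String> analyzeAll(String userStory) {
        ExecutorService pool = Executors.newFixedThreadPool(ANALYZERS.size());
        try {
            Map<String, CompletableFuture<String>> futures = new LinkedHashMap<>();
            for (String name : ANALYZERS) {
                futures.put(name, CompletableFuture.supplyAsync(
                        () -> runAnalyzer(name, userStory), pool));
            }
            // Wait for the slowest analyzer, not the sum of all of them.
            CompletableFuture.allOf(futures.values().toArray(new CompletableFuture[0])).join();
            Map<String, String> reports = new LinkedHashMap<>();
            futures.forEach((name, f) -> reports.put(name, f.join()));
            return reports;
        } finally {
            pool.shutdown();
        }
    }
}
```

This is the structural payoff of the decomposition: each analyzer is an independent unit of work that can be scheduled, retried, and observed on its own.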
Stage 3: The Coach (Synthesis and Advice)
The final AI (gpt-4.1-mini) receives:
- The user’s original question
- The vast knowledge base
- The complete conversation history
- All seven analyzer reports as structured data
Its job is now singular and clear: compose a coaching response that addresses the user’s actual question, using the pre-analyzed data as its foundation.
The system prompt shrinks from 15,000 words to about 4,000. The cognitive load drops by two-thirds.
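With the analysis done upstream, assembling the coach’s input becomes mechanical. A minimal sketch of that assembly step; the section headings are illustrative, not the production template:

```java
import java.util.Map;

// Sketch of Stage 3 input assembly: the coach receives the question, the
// conversation history, and all analyzer reports as one structured context.
public class CoachPromptBuilder {

    public static String build(String question, String history,
                               Map<String, String> analyzerReports) {
        StringBuilder sb = new StringBuilder();
        sb.append("## User question\n").append(question).append("\n\n");
        sb.append("## Conversation history\n").append(history).append("\n\n");
        sb.append("## Analyzer reports (structured JSON)\n");
        analyzerReports.forEach((name, json) ->
                sb.append("- ").append(name).append(": ").append(json).append('\n'));
        return sb.toString();
    }
}
```

The coach no longer has to derive any of this itself; it reads pre-computed facts and spends its entire budget on synthesis.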
The Implementation: Clean Architecture Saves the Day
This re-architecture would have been a nightmare in a tangled codebase. But ARDA was built on Clean Architecture principles from day one.
Each analyzer is isolated with retry logic and individual database commits. If one analyzer fails, the others continue. The frontend receives real-time progress updates via Server-Sent Events.
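The isolation-plus-retry pattern is simple in code. A minimal sketch, assuming a fixed number of attempts (the real pipeline also commits each analyzer’s result to the database independently, which this sketch omits):

```java
import java.util.function.Supplier;

// Sketch of per-analyzer retry isolation: a failing analyzer is retried a
// few times, and a final failure stays contained to that one analyzer.
public class Retry {

    public static <T> T withRetry(int maxAttempts, Supplier<T> call) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e; // a real implementation would log and back off here
            }
        }
        throw last;
    }
}
```

Wrapping each analyzer’s future body in `withRetry` is what lets one flaky model call fail without taking the other six down with it.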
The refactor took three intense days. Zero breaking changes to the API. Zero downtime.
The Results: Night and Day Difference
The improvement was immediate and dramatic:
Before (One prompt to do everything):
- Generic, sometimes confused responses
- Frequent perspective errors with female users
- Missed critical red flags and data points in complex situations, compounded by an inability to identify what information was missing
- Female attitude analysis mistakenly applied to male partners
After (specialized analyzers feed data to the assistant):
- Precise, data-grounded advice
- Perfect handling of female-perspective questions (polarity analyzer routes to the correct system prompt)
- Comprehensive flag detection in every response
- Consistent, structured analysis regardless of complexity
- Every relationship stage mapped and every gap in the data explicitly surfaced
The polarity analyzer solved the female-user problem completely. Now when a woman asks for advice, the system:
- Detects she’s asking from the feminine perspective
- Loads a completely different system prompt tuned for that role
- Analyzes her situation through the correct lens
- Provides advice appropriate to her position
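The routing step above reduces to selecting a system prompt by detected polarity. A minimal sketch; the enum values mirror the article’s two perspectives, while the prompt strings stand in for the real prompt resources:

```java
// Sketch of polarity-based prompt selection. The prompt text here is a
// hypothetical placeholder for the role-specific system prompts.
public class PromptRouter {

    public enum Polarity { MASCULINE_LEADER, FEMININE_PARTNER }

    public static String systemPromptFor(Polarity polarity) {
        // In production these would be loaded from versioned prompt resources.
        return switch (polarity) {
            case MASCULINE_LEADER -> "You are coaching a man leading the relationship...";
            case FEMININE_PARTNER -> "You are coaching a woman in the feminine role...";
        };
    }
}
```

Because the switch is exhaustive over the enum, adding a new perspective later forces a compile-time decision about which prompt serves it.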
The Lesson: Separation of Concerns at the AI Level
This refactor reinforced a principle I’ve learned throughout my career: the fundamental patterns of good engineering transcend the technology.
Whether you’re building avionics software or an AI coaching system, the same rules apply:
- Single Responsibility: Each component should have one job
- Clear Boundaries: Inputs and outputs should be explicit
- Testability: You should be able to verify each piece independently
- Composability: Complex behavior emerges from simple, focused pieces
The AI revolution hasn’t changed these principles. If anything, it’s made them more critical.
Looking Forward: The Pipeline is Just the Beginning
The analyzer architecture opens new possibilities:
- Personalized Knowledge Injection: Each analyzer can pull specific framework knowledge relevant to its analysis
- Adaptive Depth: Another analyzer can pinpoint the exact knowledge base entries that apply, effectively acting as a RAG selector
- Live Verification: Additional analyzers could check the final response for quality (empathy, directness, actionability)
- Historical Tracking: Analyzers can reference previous assessments to track user progress over time
But the core lesson remains: when you’re building AI systems, think like a systems engineer. Don’t ask one brain to do nine jobs. Build a team.
Technical Note: The complete pipeline runs on Spring Boot with LangChain4j orchestration, using a mix of OpenAI models (gpt-5-nano for routing, gpt-4.1-mini for coaching) and parallel execution via Java’s CompletableFuture. The frontend is React with Server-Sent Events for real-time progress updates.