A month ago, I wrote about building ARDA in record time. The system worked. It was deployed. Users were getting advice.
But “working” is not the same as “working well.” And as I learned in my previous article on technical debt, the phrase that should cause the most panic isn’t “it’s broken” – it’s “don’t touch it, it works.”
ARDA was starting to show cracks. The single-brain architecture that got us to launch was hitting its breaking point. This is the story of how I re-engineered the entire AI pipeline – and why the results were worth the pain.
The Breaking Point: One Assistant, Too Many Jobs
The original ARDA architecture was elegant in its simplicity: one powerful AI model with a massive system prompt containing the entire framework. Give it a user’s situation, and it would:
- Determine if the question was on-topic
- Detect who was asking (male or female perspective)
- Assess relationship timelines and stages
- Evaluate interest levels
- Identify red and green flags
- Score male behaviors (confidence, self-control, challenge, humor)
- Assess female attitude (integrity, giving nature, flexibility)
- Identify missing critical information
- AND THEN compose a coherent, personalized coaching response
That’s nine complex cognitive tasks for a single AI call. And it was starting to fail in predictable ways.
The system prompt was a 15,000+ word constitution trying to govern an impossibly broad mandate. The AI was constantly making trade-offs: focus on analysis or focus on advice? Be concise or be thorough? Follow the framework strictly or adapt to edge cases?
The worst failures came when women asked questions. The system was trained primarily on male-perspective coaching. When a woman would ask about her relationship situation, the AI would sometimes get confused about which role it was analyzing. It would fumble the perspective, mixing up who should be leading and who should be supporting. The advice wasn’t just mediocre – it was sometimes backwards.
The Realization: This is a Systems Engineering Problem
My background is in high-stakes systems engineering – avionics, medical devices, payment security. In those domains, you never give one component too many responsibilities. You decompose. You isolate. You create clear boundaries.
I had violated my own principles.
The solution was obvious once I saw it: treat the AI pipeline like a distributed system. Break the monolithic “super-brain” into a team of specialized analyzers, each with one focused job.
The New Architecture: A Special Forces Team
The redesigned pipeline became a three-stage orchestration:
Stage 1: The Router (Fast Triage)
A lightweight model (gpt-5-nano) with one job: classify the user’s intent. Is this on-topic? Which tier of coaching does it need? This triage completes in a few seconds at most.
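Conceptually, the router’s contract is tiny: message in, structured decision out. Here is a minimal sketch of that contract; the record name, tier labels, and the keyword heuristic are illustrative stand-ins for the real gpt-5-nano call, not the production logic:

```java
// Sketch of the Stage 1 router contract. In production this is a model
// call via LangChain4j; the keyword heuristic below is only a stand-in.
public class Router {

    // Structured output of the triage step (field names are illustrative).
    public record RoutingDecision(boolean onTopic, String tier) {}

    public static RoutingDecision route(String userMessage) {
        String m = userMessage.toLowerCase();
        boolean onTopic = m.contains("relationship") || m.contains("date")
                || m.contains("girlfriend") || m.contains("boyfriend");
        // Deeper coaching tier for longer, more involved situations.
        String tier = userMessage.length() > 200 ? "full-analysis" : "quick-advice";
        return new RoutingDecision(onTopic, tier);
    }
}
```

The important design point is the narrow output: the router never composes advice, it only emits a decision the rest of the pipeline can branch on.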
Stage 2: The Analyzer Battalion (Parallel Extraction)
Seven specialized analyzers run in parallel, each with a focused prompt and dedicated knowledge base:
- Polarity Analyzer: Determines user perspective (masculine leader vs feminine partner)
- Timeline Analyzer: Maps relationship stages and identifies failure patterns
- Missing Data Analyzer: Identifies gaps in the user’s story
- Interest Level Analyzer: Quantifies female interest (the core metric)
- Attitude Matrix Analyzer: Scores integrity, giving nature, flexibility
- Male Behaviors Analyzer: Evaluates confidence, self-control, challenge, humor
- Flags Analyzer: Catalogs red flags and green flags
Each analyzer produces structured JSON output. Each completes in 5-8 seconds (occasionally up to 15), and because they execute in parallel rather than waiting for each other, total latency is bounded by the slowest analyzer, not the sum of all seven.
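The fan-out above maps naturally onto `CompletableFuture`. A minimal sketch, assuming each analyzer is a model call that returns JSON (the `runAnalyzer` body here is a placeholder for the real LangChain4j invocation):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of the Stage 2 fan-out: seven analyzers launched in parallel,
// results collected once the slowest one finishes.
public class AnalyzerBattalion {

    static final List<String> ANALYZERS = List.of(
            "polarity", "timeline", "missingData", "interestLevel",
            "attitudeMatrix", "maleBehaviors", "flags");

    // Placeholder for one analyzer's model call; returns structured JSON.
    static String runAnalyzer(String name, String userStory) {
        return "{\"analyzer\":\"" + name + "\",\"input_chars\":" + userStory.length() + "}";
    }

    public static Map<String, String> analyzeAll(String userStory) {
        ExecutorService pool = Executors.newFixedThreadPool(ANALYZERS.size());
        try {
            Map<String, CompletableFuture<String>> futures = new LinkedHashMap<>();
            for (String name : ANALYZERS) {
                futures.put(name, CompletableFuture.supplyAsync(
                        () -> runAnalyzer(name, userStory), pool));
            }
            // Wait for the slowest analyzer, not the sum of all of them.
            CompletableFuture.allOf(futures.values().toArray(new CompletableFuture[0])).join();
            Map<String, String> reports = new LinkedHashMap<>();
            futures.forEach((name, f) -> reports.put(name, f.join()));
            return reports;
        } finally {
            pool.shutdown();
        }
    }
}
```

This is the structural payoff of the decomposition: each analyzer is an independent unit of work that can be scheduled, retried, and observed on its own.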
Stage 3: The Coach (Synthesis and Advice)
The final AI (gpt-4.1-mini) receives:
- The user’s original question
- The vast knowledge base
- The complete conversation history
- All seven analyzer reports as structured data
Its job is now singular and clear: compose a coaching response that addresses the user’s actual question, using the pre-analyzed data as its foundation.
The system prompt shrinks from 15,000 words to about 4,000. The cognitive load drops by two-thirds.
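With the analysis done upstream, assembling the coach’s input becomes mechanical. A minimal sketch of that assembly step; the section headings are illustrative, not the production template:

```java
import java.util.Map;

// Sketch of Stage 3 input assembly: the coach receives the question, the
// conversation history, and all analyzer reports as one structured context.
public class CoachPromptBuilder {

    public static String build(String question, String history,
                               Map<String, String> analyzerReports) {
        StringBuilder sb = new StringBuilder();
        sb.append("## User question\n").append(question).append("\n\n");
        sb.append("## Conversation history\n").append(history).append("\n\n");
        sb.append("## Analyzer reports (structured JSON)\n");
        analyzerReports.forEach((name, json) ->
                sb.append("- ").append(name).append(": ").append(json).append('\n'));
        return sb.toString();
    }
}
```

The coach no longer has to derive any of this itself; it reads pre-computed facts and spends its entire budget on synthesis.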
The Implementation: Clean Architecture Saves the Day
This re-architecture would have been a nightmare in a tangled codebase. But ARDA was built on Clean Architecture principles from day one.
Each analyzer is isolated with retry logic and individual database commits. If one analyzer fails, the others continue. The frontend receives real-time progress updates via Server-Sent Events.
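The isolation-plus-retry pattern is simple in code. A minimal sketch, assuming a fixed number of attempts (the real pipeline also commits each analyzer’s result to the database independently, which this sketch omits):

```java
import java.util.function.Supplier;

// Sketch of per-analyzer retry isolation: a failing analyzer is retried a
// few times, and a final failure stays contained to that one analyzer.
public class Retry {

    public static <T> T withRetry(int maxAttempts, Supplier<T> call) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e; // a real implementation would log and back off here
            }
        }
        throw last;
    }
}
```

Wrapping each analyzer’s future body in `withRetry` is what lets one flaky model call fail without taking the other six down with it.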
The refactor took three intense days. Zero breaking changes to the API. Zero downtime.
The Results: Night and Day Difference
The improvement was immediate and dramatic:
Before (One prompt to do everything):
- Generic, sometimes confused responses
- Frequent perspective errors with female users
- Missed critical red flags and data points in complex situations, compounded by an inability to identify what information was missing
- Female attitude analysis mistakenly applied to male partners
After (specialized analyzers feed data to the assistant):
- Precise, data-grounded advice
- Perfect handling of female-perspective questions (polarity analyzer routes to the correct system prompt)
- Comprehensive flag detection in every response
- Consistent, structured analysis regardless of complexity
- Every relationship stage mapped and every gap in the data explicitly surfaced
The polarity analyzer solved the female-user problem completely. Now when a woman asks for advice, the system:
- Detects she’s asking from the feminine perspective
- Loads a completely different system prompt tuned for that role
- Analyzes her situation through the correct lens
- Provides advice appropriate to her position
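The routing step above reduces to selecting a system prompt by detected polarity. A minimal sketch; the enum values mirror the article’s two perspectives, while the prompt strings stand in for the real prompt resources:

```java
// Sketch of polarity-based prompt selection. The prompt text here is a
// hypothetical placeholder for the role-specific system prompts.
public class PromptRouter {

    public enum Polarity { MASCULINE_LEADER, FEMININE_PARTNER }

    public static String systemPromptFor(Polarity polarity) {
        // In production these would be loaded from versioned prompt resources.
        return switch (polarity) {
            case MASCULINE_LEADER -> "You are coaching a man leading the relationship...";
            case FEMININE_PARTNER -> "You are coaching a woman in the feminine role...";
        };
    }
}
```

Because the switch is exhaustive over the enum, adding a new perspective later forces a compile-time decision about which prompt serves it.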
The Lesson: Separation of Concerns at the AI Level
This refactor reinforced a principle I’ve learned throughout my career: the fundamental patterns of good engineering transcend the technology.
Whether you’re building avionics software or an AI coaching system, the same rules apply:
- Single Responsibility: Each component should have one job
- Clear Boundaries: Inputs and outputs should be explicit
- Testability: You should be able to verify each piece independently
- Composability: Complex behavior emerges from simple, focused pieces
The AI revolution hasn’t changed these principles. If anything, it’s made them more critical.
Looking Forward: The Pipeline is Just the Beginning
The analyzer architecture opens new possibilities:
- Personalized Knowledge Injection: Each analyzer can pull specific framework knowledge relevant to its analysis
- Adaptive Depth: Another analyzer can pinpoint the exact knowledge base entries that apply, effectively acting as a RAG selector
- Live Verification: Additional analyzers could check the final response for quality (empathy, directness, actionability)
- Historical Tracking: Analyzers can reference previous assessments to track user progress over time
But the core lesson remains: when you’re building AI systems, think like a systems engineer. Don’t ask one brain to do nine jobs. Build a team.
Technical Note: The complete pipeline runs on Spring Boot with LangChain4j orchestration, using a mix of OpenAI models (gpt-5-nano for routing, gpt-4.1-mini for coaching) and parallel execution via Java’s CompletableFuture. The frontend is React with Server-Sent Events for real-time progress updates.