Introducing FRAKTAG: The Fractal Knowledge Engine
By Andrei Roman
Principal Architect, The Foundry
Five days ago, I started building FRAKTAG. One day of Claude Code sprinting through infrastructure. Four days of manual testing, refactoring the data model, and building the UI. It is live. It works. And it solves a problem that has been grinding against my workflow for months.
FRAKTAG is a Fractal Knowledge Engine. It replaces flat RAG (Retrieval-Augmented Generation) with structural navigation. Instead of word-matching through vector soup, it builds a hierarchical map of your knowledge and lets AI agents zoom through it like a human would scan a table of contents.
The Problem: Context Amnesia
Standard AI interactions suffer from Context Amnesia. Models forget the big picture as conversation windows scroll. You spend half your prompts re-explaining project structure. You copy-paste architectural decisions from old chat logs. You lose hours reconstructing context that should already exist.
Existing tools handle storage (files) and search (vectors), but fail at organization and synthesis. A git repo has files. A vector database has chunks. Neither has a map.
I needed something that structures information exactly how a Senior Engineer does: broad concepts at the root, architectural maps in the middle, implementation details at the leaves.
The Pivot: From Automatic to Human-Supervised
I started out too ambitious. Automatic splitting. Automatic tree placement. Automatic everything. That was a mistake.
Testing with real documents revealed the bottleneck: ingestion quality determines retrieval quality. Garbage in, garbage out. The AI was making bad splitting decisions. Creating malformed trees. Missing context boundaries.
So I pivoted. FRAKTAG is now human-supervised. The AI proposes. The human approves. Every split. Every placement. Every decision logged in an audit trail.
The Data Model: Pure Folders and Strict Types
FRAKTAG enforces a strict taxonomy:
Folders are pure structure. They organize; they carry no content of their own. A folder can hold either subfolders or documents, but never both. This constraint prevents chaos.
Documents are leaf nodes. They contain full text. They can have fragments (chunks from splitting), but fragments are forever tied to their parent document. Think of fragments as technical children, not organizational children.
Every node has a Title (the label) and a Gist (a semantic summary). Both. Always. The title is the label; the gist is the README for the branch beneath it. This duality is what makes the content-only-in-leaves constraint workable: an agent can decide whether to descend into a folder by reading its gist, without the folder holding content itself.
Content only lives in leaf folders. You cannot dump a document into a folder that has subfolders. This forces intentional organization.
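The taxonomy above can be sketched in TypeScript. This is a hedged illustration; the type and function names are my own, not FRAKTAG's actual schema:

```typescript
// Illustrative node taxonomy: folders organize, documents hold text,
// fragments are splitting children tied to a parent document.
type NodeKind = "folder" | "document" | "fragment";

interface KnowledgeNode {
  id: string;
  kind: NodeKind;
  title: string;      // the label
  gist: string;       // the semantic summary ("the README")
  parentId?: string;
  childIds: string[];
  content?: string;   // only documents and fragments carry text
}

// Enforce the constraint: a folder may hold subfolders or documents, not both.
function canAddChild(
  folder: KnowledgeNode,
  childKind: NodeKind,
  all: Map<string, KnowledgeNode>
): boolean {
  if (folder.kind !== "folder") return false;
  const kinds = new Set(folder.childIds.map((id) => all.get(id)!.kind));
  if (childKind === "folder") return !kinds.has("document");
  if (childKind === "document") return !kinds.has("folder");
  return false; // fragments attach to documents, never directly to folders
}
```

A guard like `canAddChild` is one way to make the subfolders-or-content rule impossible to violate at write time rather than a convention to remember.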
The Ingestion Workflow
Upload: Drag-and-drop a document into the UI.
Programmatic Splitting: The system detects natural boundaries. Headers (H1, H2, H3). Horizontal rules. Page markers (for PDFs). Numbered sections (1., A., I., 1.1.). Custom regex patterns. No AI calls yet. Just deterministic logic.
Human Review: The UI shows a document minimap. Visual preview of where splits fall. You can merge adjacent sections. Split individual sections further. Edit titles and content directly. The AI can assist with splitting, but you approve everything.
AI Placement Proposal: Once splits are finalized, the AI analyzes each section and proposes a target folder. It provides reasoning and confidence scores. You can override. You can create new folders inline if needed.
Commit: Documents and fragments are created. Full audit trail persists to disk. Every decision tagged with actor (HUMAN, AI, SYSTEM) and timestamp.
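The programmatic splitting step can be sketched deterministically, with no AI calls. This is a minimal illustration assuming markdown-style input; the boundary patterns are examples, not FRAKTAG's actual rules:

```typescript
// Example boundary patterns: headers, horizontal rules, numbered sections.
const BOUNDARY_PATTERNS: RegExp[] = [
  /^#{1,3}\s+/,         // H1-H3 headers
  /^---\s*$/,           // horizontal rules
  /^\d+(\.\d+)*\.\s+/,  // numbered sections: 1., 1.1.
];

// Split text into sections at detected boundaries, purely by pattern match.
function splitSections(text: string): string[] {
  const sections: string[] = [];
  let current: string[] = [];
  for (const line of text.split("\n")) {
    if (BOUNDARY_PATTERNS.some((p) => p.test(line)) && current.length > 0) {
      sections.push(current.join("\n"));
      current = [];
    }
    current.push(line);
  }
  if (current.length > 0) sections.push(current.join("\n"));
  return sections;
}
```

Because this pass is pure pattern matching, the proposed splits are reproducible, and the human review step only ever corrects deterministic output.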
FRAKTAG vs. Classic RAG
Classic RAG is Vector Soup. It chops documents into arbitrary 500-character chunks. Throws them into a flat pile. When you search, it retrieves the top 5 chunks that sound like your query. The flaw: it lacks structural awareness. It might return a paragraph from Chapter 1 and a paragraph from Chapter 10, completely missing the bridge between them.
FRAKTAG uses RAG as a seed, not the answer. Vector search finds a few relevant nodes. Then the system looks around the tree neighborhood. It reads parent context. It checks sibling nodes. It explores child fragments. It understands that Chunk A is the parent of Chunk B.
The retrieval phase also scans the entire tree structure (titles and gists only, not full content) to identify branches the vector search missed. Experience shows: vector search always misses relevant pieces. The graph topology catches them.
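The seed-and-expand idea can be sketched as follows, assuming a simple node shape; `expandNeighborhood` is a hypothetical helper, not FRAKTAG's API, and the vector-search seed is taken as given:

```typescript
// Minimal tree node: enough structure for neighborhood expansion.
interface TreeNode {
  id: string;
  parentId?: string;
  childIds: string[];
}

// Given vector-search seed hits, gather parent context, sibling nodes,
// and child fragments around each seed.
function expandNeighborhood(
  seedIds: string[],
  nodes: Map<string, TreeNode>
): Set<string> {
  const selected = new Set<string>(seedIds);
  for (const id of seedIds) {
    const node = nodes.get(id);
    if (!node) continue;
    if (node.parentId) selected.add(node.parentId); // parent context
    const parent = node.parentId ? nodes.get(node.parentId) : undefined;
    for (const sib of parent?.childIds ?? []) selected.add(sib); // siblings
    for (const child of node.childIds) selected.add(child);      // fragments
  }
  return selected;
}
```

The point of the sketch: one seed hit pulls in its structural neighborhood, so Chapter 1 and Chapter 10 hits arrive with the bridge material between them.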
Shout-Out: Steve Yegge's Beads
I was inspired by Steve Yegge's Beads project. Beads is a task execution engine for AI agents. It solves dementia (agents forgetting what step they are on). It tracks dependencies. It manages time and execution.
FRAKTAG solves the opposite problem. Beads is the task engine (the what). FRAKTAG is the knowledge engine (the how and the why). Beads manages execution. FRAKTAG manages truth. Beads tracks what needs to be done. FRAKTAG tracks what is known.
In a fully realized system, Beads drives the car. FRAKTAG holds the map.
What Works Now
The Oracle: answering complex questions by synthesizing multiple disparate sources. Ask "How does the Gas Town architecture impact our current AWS deployment strategy?" and FRAKTAG navigates the tree, assembles relevant nodes from multiple branches, and delivers a coherent answer with source attribution.
The UI shows streaming responses. Sources appear as they are discovered. The answer streams in real-time. Full audit trail downloadable.
Knowledge Base Portability
FRAKTAG is designed for portability. Each knowledge base is self-contained:
kb.json defines identity and organizing principles.
trees/ contains one or more tree structures (different organizational views over the same content).
content/ holds immutable content atoms (deduplicated via SHA-256).
indexes/ stores vector embeddings.
audit.log tracks every decision.
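The SHA-256 deduplication in content/ can be sketched as content-addressable storage. This is an illustration, not FRAKTAG's implementation: an in-memory Map stands in for the on-disk directory, and the class name is my own:

```typescript
import { createHash } from "node:crypto";

// Content-addressable store: the address of an atom is the SHA-256 of its
// bytes, so identical content is stored exactly once.
class ContentStore {
  private atoms = new Map<string, string>();

  put(content: string): string {
    const hash = createHash("sha256").update(content, "utf8").digest("hex");
    if (!this.atoms.has(hash)) this.atoms.set(hash, content);
    return hash;
  }

  get(hash: string): string | undefined {
    return this.atoms.get(hash);
  }

  get size(): number {
    return this.atoms.size;
  }
}
```

Addressing by hash is also what makes the knowledge base safe to copy or git-version: the same content always maps to the same filename, on any machine.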
Copy a folder. Move a knowledge base. Git version it. Mount from external drives. No external databases. No vendor lock-in.
The Architecture
FRAKTAG uses hexagonal architecture. The core logic is pure TypeScript. It does not know if it runs on AWS Lambda or locally.
Two deployment stacks: Cloud (AWS Lambda, S3, Cognito, OpenAI) for production systems needing infinite scale. Local (Node.js, Ollama, filesystem) for privacy-conscious deployments. Your second brain runs on your hardware.
The database stores metadata and graph structure as JSON. Content lives in content-addressable storage (CAS). This keeps queries fast and enables full-text search when needed.
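The hexagonal split can be sketched as a port interface with swappable adapters. All names here are illustrative assumptions, not FRAKTAG's actual API:

```typescript
// Port: the core depends only on this interface, never on AWS or fs APIs.
interface StoragePort {
  read(key: string): Promise<string | undefined>;
  write(key: string, value: string): Promise<void>;
}

// Core logic: pure, unaware of whether it runs on Lambda or locally.
async function saveGist(
  storage: StoragePort,
  nodeId: string,
  gist: string
): Promise<void> {
  await storage.write(`gists/${nodeId}`, gist);
}

// Adapter: in-memory here for illustration; a real local adapter would wrap
// the filesystem, and a cloud adapter would wrap S3.
class MemoryStorage implements StoragePort {
  private data = new Map<string, string>();
  async read(key: string) {
    return this.data.get(key);
  }
  async write(key: string, value: string) {
    this.data.set(key, value);
  }
}
```

Swapping Cloud for Local then means swapping adapters at the edge; the core never changes.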
What Is Next
FRAKTAG is functional. It ingests. It retrieves. It validates. The core mechanics work. Already deployed for personal use and for private AI infrastructure projects.
Next steps:
Conversation memory (multi-turn Q&A with context retention).
Question and answer caching (why recompute answers to identical questions?).
Cloud deployment via AWS CDK.
Repo: https://github.com/andreirx/FRAKTAG
Steve Yegge's Beads: https://steve-yegge.medium.com