Forensic AI Studio
Private Tool · Not a Product


I built a private AI investigator for a real legal case. It ingests evidence, maps entity relationships, analyzes audio recordings, and reasons across 100K+ documents in real time.

155 TypeScript files · 45 API routes · 28+ agent tools · 25 DB tables

What Is This?

This is a full-stack AI investigation platform I built for personal use during a complex legal case. It's not a product, not a demo — it's a working tool I use daily. The codebase is a Next.js 16 app with a PostgreSQL + pgvector backend, a 28-tool AI agent layer, a voice interface, and a deep research orchestrator. Everything is purpose-built for one job: finding the truth in a mountain of evidence.

Evidence Ingestion

Parses PDF, DOCX, ODT, EML, HTML, CSV, WhatsApp SQLite DBs, and audio/video files. Chunks, embeds, and extracts named entities automatically.
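The chunking step can be sketched roughly like this. This is a minimal illustration only: the chunk size, overlap, and function names are assumptions, not the app's actual parameters, and the real pipeline would chunk per format (PDF pages, email bodies, chat messages) before embedding.

```typescript
// Illustrative fixed-size chunking with overlap. Sizes are assumptions.
interface Chunk {
  index: number;
  text: string;
}

function chunkText(text: string, size = 1000, overlap = 200): Chunk[] {
  const chunks: Chunk[] = [];
  let start = 0;
  let index = 0;
  while (start < text.length) {
    chunks.push({ index: index++, text: text.slice(start, start + size) });
    if (start + size >= text.length) break; // last chunk reached the end
    start += size - overlap; // step forward, keeping `overlap` chars of context
  }
  return chunks;
}
```

Each chunk would then be embedded and stored alongside its source attribution, so search hits can always point back to the original document.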

AI Agent with 28+ Tools

A persistent agent that can search evidence, recall memories, run deep research, analyze audio, query Gmail/Drive, browse the web, and execute multi-step investigations.

Audio Intelligence

Sends audio recordings to Gemini's audio modality (50–80K tokens per file). Extracts transcripts, speaker labels, entities, and forensic findings. Results cached to avoid re-spending tokens.

Capability Map

Every module is production-quality, purpose-built, and wired together into a single coherent system.

Evidence Browser [DONE]

  • Search + filter 169 ingested documents
  • Chunk viewer with source attribution
  • Inline ingestion panel
  • pgvector semantic search + GIN full-text
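How the semantic and full-text result sets get combined isn't specified above; one common approach is a weighted score merge, sketched below. The 0.7/0.3 weighting and the assumption of pre-normalized scores are illustrative, not the app's actual ranking.

```typescript
// Illustrative merge of pgvector semantic hits and GIN full-text hits.
// Assumes both sides return scores normalized to [0, 1].
interface Hit {
  docId: string;
  score: number;
}

function mergeHits(semantic: Hit[], fullText: Hit[], wSem = 0.7): Hit[] {
  const combined = new Map<string, number>();
  for (const h of semantic) {
    combined.set(h.docId, (combined.get(h.docId) ?? 0) + wSem * h.score);
  }
  for (const h of fullText) {
    combined.set(h.docId, (combined.get(h.docId) ?? 0) + (1 - wSem) * h.score);
  }
  return [...combined.entries()]
    .map(([docId, score]) => ({ docId, score }))
    .sort((a, b) => b.score - a.score);
}
```

Documents that appear in both result sets naturally float to the top, which is the main reason to run the two searches in parallel rather than pick one.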
Entity Graph [DONE]

  • d3-force relationship visualization
  • 8 entity types (people, companies, …)
  • Greek + English NER via Gemini Flash
  • Entity CRM with profile editor
Media Lab [DONE]

  • Waveform viewer + transport bar
  • Gemini audio modality analysis
  • Speaker diarization + labeling
  • Checkpoint cache (no re-spend)
Deep Research [DONE]

  • 5-phase orchestrator: Plan → Gather → Synthesize → Iterate → Report
  • Tavily web search + local evidence
  • Gmail / Drive integration
  • Gemini Deep Research (optional)
Vector Memory [DONE]

  • Semantic dedup (cosine > 0.85 → merge)
  • 5 categories: entity, fact, decision, preference, focus
  • Sub-millisecond in-memory search
  • Lightweight context hint (~50 tokens)
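Sub-millisecond search is plausible because every memory vector sits in process memory, so a lookup is just a linear cosine scan. A minimal sketch (names and shapes are illustrative):

```typescript
// Illustrative in-memory cosine search over cached memory embeddings.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

interface Memory {
  id: string;
  embedding: number[];
  text: string;
}

// Score every memory against the query embedding and return the top k.
function recallTopK(query: number[], memories: Memory[], k = 5) {
  return memories
    .map((m) => ({ ...m, score: cosine(query, m.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

A brute-force scan like this stays fast at memory-store scale (thousands of rows); the HNSW indexes in Postgres matter for the much larger document chunk tables.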
Context Manager [DONE]

  • Budget-aware layered context
  • Dynamic turn window (4–20 turns)
  • Gemini explicit cache (75% cost reduction)
  • Handles 1M → 128K model switches
Integrity & Tamper [PARTIAL]

  • SHA-256 hash verification per document
  • Integrity checks table + trust scores
  • Immutable audit log
  • Audio spectral analysis (planned)
Timeline [DONE]

  • Normalized event timeline
  • Cross-referenced with entities
  • Filterable by event type
  • Auto-population from ingested data (planned)
Voice Interface [DONE]

  • Whisper STT
  • ElevenLabs TTS (sentence-level streaming)
  • Gemini TTS (free tier fallback)
  • Voice settings: provider, model, voice, mode

The Agent's Toolbox

28+ tools across 7 categories. The agent decides which to call, in what order, and how to chain results.

Memory

  • recall_memory: Semantic search over stored memories
  • save_memory: Explicitly store a fact or preference
  • manage_memory: Delete / edit / prune the memory store

Knowledge & Investigation

  • fast_context: Parallel semantic + FTS + metadata search
  • investigate: Cross-reference multiple sources
  • create_investigation: Multi-step research plan

Deep Research

  • deep_research: Launch a 5-phase background research task
  • check_research: Poll task status
  • followup_research: Cheap follow-up on Gemini DR results

Google Workspace

  • search_gmail / read_gmail / send_gmail: Full Gmail access
  • search_drive / read_drive_file: Google Drive
  • list_calendar_events: Calendar
  • search_contacts: Contacts

Audio Intelligence

  • analyze_audio: Gemini audio modality — listen + analyze
  • get_analysis: Retrieve cached analysis results

Self-Inspection

  • check_app_state: App activity snapshot
  • read_traces: Inspect its own behavior logs
  • adjust_persona: Propose a behavioral change
  • send_to_inbox / read_inbox: Message the IDE agent (Cascade)

System Architecture


User Message
    │
    ▼
context-manager.ts  ──── Budget-aware layered context
    │                    ┌─ Compass (90K, Gemini cache)
    │                    ├─ Persona (~820 tokens)
    │                    ├─ Memory Hint (~50 tokens)
    │                    ├─ Session Summary (~300 tokens)
    │                    └─ Recent Turns (4–20, dynamic)
    │
    ▼
AI SDK streamText()  ──── Multi-model: Gemini / Claude / GPT
    │
    ├── Tool Calls ──────► 28+ tools execute in parallel
    │       │
    │       ├── fast_context    → pgvector + FTS search
    │       ├── analyze_audio   → Gemini audio modality
    │       ├── deep_research   → 5-phase orchestrator
    │       ├── search_gmail    → Google OAuth
    │       └── recall_memory   → in-memory cosine search
    │
    ▼
Assistant Response
    │
    ▼
memory-extractor.ts  ──── Async: extract + store memories
    │
    ▼
memory_vectors (PG)  ──── HNSW indexed, semantic dedup

Database

  • PostgreSQL + pgvector
  • 25 tables
  • HNSW vector indexes
  • GIN full-text indexes

AI Models

  • Gemini 2.0 Flash (primary)
  • Claude 3.5 Sonnet
  • GPT-4o
  • OpenAI embeddings (1536d)

Stack

  • Next.js 16 App Router
  • Vercel AI SDK 6
  • Drizzle ORM
  • Google Cloud Storage

Key Engineering Decisions

Problem: Flat memory tables capped out and were dumped entirely into the context window.

Solution: Vector-backed memory: each memory is a separate row with its own embedding. Semantic dedup (cosine > 0.85 → merge). The context gets a 50-token hint instead of a 1,200-token dump.

Impact: 96% context reduction for the memory layer.

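The dedup rule reduces to a threshold check at write time. A minimal sketch, assuming the "merge" keeps the existing row and skips the insert (the real system may combine texts instead):

```typescript
// Sketch of semantic dedup: skip inserting a new memory when an existing
// row is within cosine similarity > 0.85 of the candidate embedding.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

interface MemoryRow {
  id: string;
  embedding: number[];
}

// Returns the near-duplicate row to merge into, or null if the candidate
// is distinct enough to insert as a new row.
function findMergeTarget(
  candidate: number[],
  existing: MemoryRow[],
  threshold = 0.85,
): MemoryRow | null {
  for (const row of existing) {
    if (cosine(candidate, row.embedding) > threshold) return row;
  }
  return null;
}
```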
Problem: Fixed turn windows (e.g. "last 20 turns") broke on model switches from 1M to 128K tokens.

Solution: Budget-aware dynamic allocation: 30% reserved for reasoning, 15% for tool results, 55% for context. The turn window binary-searches for the maximum number of turns that fits the remaining budget.

Impact: Seamless model switching without context overflow.

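The turn-window binary search might look like this. Token counts per turn are assumed to be precomputed by a real tokenizer; the 4–20 bounds come from the dynamic window described above, everything else is illustrative:

```typescript
// Sketch of budget-aware turn selection: binary-search for the largest
// number of recent turns whose combined token count fits the budget.
function maxTurnsThatFit(
  turnTokens: number[],   // tokens per turn, oldest first
  budget: number,         // tokens remaining for the turn window
  minTurns = 4,
  maxTurns = 20,
): number {
  const tokensForLast = (n: number) =>
    turnTokens.slice(-n).reduce((sum, t) => sum + t, 0);

  let lo = minTurns;
  let hi = Math.min(maxTurns, turnTokens.length);
  let best = minTurns; // floor: never drop below the minimum window
  while (lo <= hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (tokensForLast(mid) <= budget) {
      best = mid;   // fits: try a larger window
      lo = mid + 1;
    } else {
      hi = mid - 1; // too big: shrink
    }
  }
  return best;
}
```

Because only the window size changes on a model switch, the rest of the layered context (compass, persona, memory hint) stays stable across the 1M → 128K transition.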
Problem: Implicit Gemini caching didn't guarantee cache hits, wasting tokens on the 90K-token compass document.

Solution: Explicit Gemini cache API: upload the compass, persona, and tools once, then reference them by name on subsequent requests.

Impact: 75% token cost reduction per request.

Problem: Re-analyzing the same audio file burned 50–80K tokens every time.

Solution: A document_analyses table tracks completed analyses, and analyze_audio checks this cache before spending tokens. First analysis: 30–60s. Subsequent: 100ms.

Impact: Effectively free repeated audio queries.

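The checkpoint cache is a lookup-before-spend pattern. A sketch with an in-memory map standing in for the document_analyses table (the table name is from the text above; the function shapes are illustrative):

```typescript
// Sketch of the analyze-once pattern: consult the cache before spending
// audio tokens on a fresh model call.
type Analyzer = (fileId: string) => Promise<string>;

const analysisCache = new Map<string, string>(); // stands in for document_analyses

async function analyzeAudioCached(fileId: string, runModel: Analyzer): Promise<string> {
  const cached = analysisCache.get(fileId);
  if (cached !== undefined) return cached; // ~100ms DB lookup in the real system

  const result = await runModel(fileId);   // first run: 50-80K tokens, 30-60s
  analysisCache.set(fileId, result);
  return result;
}
```

Keying the cache by file hash rather than filename would also let the integrity layer detect when a "re-uploaded" recording is actually a different file.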
Problem: A custom message converter stripped tool invocations from history, causing models to "forget" their tools.

Solution: Use the AI SDK's convertToModelMessages, which preserves all parts: tool-call, tool-result, reasoning, and file parts.

Impact: Reliable multi-turn tool use across all models.

Performance Characteristics

  • Memory search (in-memory cosine similarity): <1ms
  • Embedding cache hit (LRU cache, 5-min TTL): 0ms
  • Context build (depends on turn count): 10–50ms
  • Session summarization (Gemini Flash, every 10 turns): 500–1500ms
  • Audio analysis, cached (DB lookup only): 100ms
  • Audio analysis, first run (50–80K audio tokens): 30–60s
  • Deep research, quick (depth 1): 15–30s
  • Deep research, full (depth 3 + Gemini DR): 3–5 min

Why Build This?

Real Stakes, Real Data

This wasn't built to impress anyone. It was built because I needed it. The evidence was real, the legal case was real, and the existing tools weren't good enough. Building under real constraints produces better engineering than building for demos.

Proof of Depth

When I say I build production AI systems, this is what I mean. Not a chatbot wrapper. A full-stack AI platform with a custom context manager, vector memory, audio intelligence, deep research orchestrator, and 28 agent tools — all wired together and actually used.

The Hardest AI Problems Are Personal

Reasoning across 100K+ documents, detecting tampered audio, mapping entity relationships across years of evidence — these are the same problems enterprises pay millions to solve. I built the solution for myself first.

Full Tech Stack

Next.js 16 · TypeScript · Vercel AI SDK 6 · PostgreSQL · pgvector · Drizzle ORM · Google Gemini 2.0 · Anthropic Claude · OpenAI GPT-4o · Whisper STT · ElevenLabs TTS · Gemini TTS · Google Cloud Storage · NextAuth.js · Google OAuth · Tavily Search · d3-force · shadcn/ui · TailwindCSS · Gemini Explicit Cache · HNSW Indexes

© 2026 Systems Engineer | AI Ecosystems Specialist — Built with Next.js & Tailwind
