
Forensic AI Studio
I built a private AI investigator for a real legal case. It ingests evidence, maps entity relationships, analyzes audio recordings, and reasons across 100K+ documents in real time.
What Is This?
This is a full-stack AI investigation platform I built for personal use during a complex legal case. It's not a product, not a demo — it's a working tool I use daily. The codebase is a Next.js 16 app with a PostgreSQL + pgvector backend, a 28-tool AI agent layer, a voice interface, and a deep research orchestrator. Everything is purpose-built for one job: finding the truth in a mountain of evidence.
Evidence Ingestion
Parses PDF, DOCX, ODT, EML, HTML, CSV, WhatsApp SQLite DBs, and audio/video files. Chunks, embeds, and extracts named entities automatically.
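Regardless of format, every file goes through the same loop: parse to text, chunk, embed, and run NER. A minimal sketch of that loop (the parser, chunker, NER helpers, and chunk parameters are stand-ins for the real modules):

// Rough shape of the ingestion loop; every helper here is a stand-in.
import { readFile } from "node:fs/promises";

async function ingestDocument(path: string) {
  const raw = await readFile(path);
  const text = await extractText(raw, path);            // format-specific parser (PDF, DOCX, EML, ...)
  const chunks = chunkText(text, { maxTokens: 512, overlap: 64 });

  for (const chunk of chunks) {
    const embedding = await embedChunk(chunk);           // 1536-d OpenAI embedding
    await db.insert(chunksTable).values({ sourcePath: path, content: chunk, embedding });
  }

  const entities = await extractEntities(text);          // Gemini Flash NER (Greek + English)
  await upsertEntities(entities);                        // feeds the entity graph
}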
AI Agent with 28+ Tools
A persistent agent that can search evidence, recall memories, run deep research, analyze audio, query Gmail/Drive, browse the web, and execute multi-step investigations.
Audio Intelligence
Sends audio recordings to Gemini's audio modality (50–80K tokens per file). Extracts transcripts, speaker labels, entities, and forensic findings. Results cached to avoid re-spending tokens.
Capability Map
Every module is production-quality, purpose-built, and wired together into a single coherent system.
Evidence Browser
- Search + filter 169 ingested documents
- Chunk viewer with source attribution
- Inline ingestion panel
- pgvector semantic search + GIN full-text
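Under the hood, browser search is one hybrid query: pgvector cosine similarity blended with GIN full-text rank. A sketch of what that can look like through Drizzle's raw SQL helper (table names, column names, and weights are assumptions):

import { sql } from "drizzle-orm";
import { db } from "@/lib/db";                                      // assumed db instance

// Hybrid search sketch: blend cosine similarity with full-text rank.
export async function hybridSearch(query: string, queryEmbedding: number[], limit = 20) {
  const vec = `[${queryEmbedding.join(",")}]`;                      // pgvector literal
  return db.execute(sql`
    SELECT id, document_id, content,
           1 - (embedding <=> ${vec}::vector)                AS semantic_score,
           ts_rank(tsv, plainto_tsquery('simple', ${query})) AS fts_score
    FROM chunks
    ORDER BY 0.7 * (1 - (embedding <=> ${vec}::vector))
           + 0.3 * ts_rank(tsv, plainto_tsquery('simple', ${query})) DESC
    LIMIT ${limit}
  `);
}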
Entity Graph
- d3-force relationship visualization
- 8 entity types (people, companies, …)
- Greek + English NER via Gemini Flash
- Entity CRM with profile editor
Media Lab
- Waveform viewer + transport bar
- Gemini audio modality analysis
- Speaker diarization + labeling
- Checkpoint cache (no re-spend)
Deep Research
- 5-phase orchestrator: Plan → Gather → Synthesize → Iterate → Report
- Tavily web search + local evidence
- Gmail / Drive integration
- Gemini Deep Research (optional)
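The orchestrator is essentially a loop over those five phases: plan once, then gather, synthesize, and iterate until nothing important is missing, then report. A stripped-down sketch (every helper is illustrative):

type Finding = { source: string; claim: string };

async function deepResearch(question: string, maxIterations = 3) {
  const plan = await planQueries(question);                  // Phase 1: Plan
  let findings: Finding[] = [];

  for (let i = 0; i < maxIterations; i++) {
    const batch = await gather(plan, findings);              // Phase 2: Tavily + local evidence + Gmail/Drive
    findings = await synthesize(question, findings, batch);  // Phase 3: merge into working notes
    const gaps = await findGaps(question, findings);         // Phase 4: Iterate, deciding what is still missing
    if (gaps.length === 0) break;
    plan.queries.push(...gaps);
  }

  return writeReport(question, findings);                    // Phase 5: Report
}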
Vector Memory
- Semantic dedup (cosine > 0.85 → merge)
- 5 categories: entity, fact, decision, preference, focus
- Sub-millisecond in-memory search
- Lightweight context hint (~50 tokens)
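Recall itself is nothing exotic: a cosine scan over the in-memory copy of the vectors, which is why it stays sub-millisecond at this scale. Sketch (types simplified):

interface Memory { id: string; text: string; category: string; embedding: number[] }

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Top-k recall over the in-memory memory store.
function recallMemory(queryEmbedding: number[], memories: Memory[], k = 5): Memory[] {
  return memories
    .map(m => ({ m, score: cosine(queryEmbedding, m.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map(x => x.m);
}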
Context Manager
- Budget-aware layered context
- Dynamic turn window (4–20 turns)
- Gemini explicit cache (75% cost reduction)
- Handles 1M → 128K model switches
Integrity & Tamper
- SHA-256 hash verification per document
- Integrity checks table + trust scores
- Immutable audit log
- Audio spectral analysis (planned)
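Hash verification is deliberately boring: hash on ingest, re-hash on read, compare, and write the result to the audit trail. Sketch (the audit helper is assumed):

import { createHash } from "node:crypto";

export function sha256(data: Buffer): string {
  return createHash("sha256").update(data).digest("hex");
}

// Re-hash on read and compare against the hash stored at ingest time.
export async function verifyDocument(documentId: string, data: Buffer, storedHash: string) {
  const currentHash = sha256(data);
  const ok = currentHash === storedHash;
  await recordIntegrityCheck({ documentId, ok, currentHash, storedHash }); // assumed audit-log helper
  return ok;
}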
Timeline
- Normalized event timeline
- Cross-referenced with entities
- Filterable by event type
- Auto-population from ingested data (planned)
Voice Interface
- Whisper STT
- ElevenLabs TTS (sentence-level streaming)
- Gemini TTS (free tier fallback)
- Voice settings: provider, model, voice, mode
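Sentence-level streaming just means TTS starts as soon as the first complete sentence arrives instead of waiting for the whole reply. A simplified version of the splitter (synthesize stands in for either provider):

// Buffer streamed text deltas and flush complete sentences as they arrive.
async function* sentences(deltas: AsyncIterable<string>) {
  let buffer = "";
  for await (const delta of deltas) {
    buffer += delta;
    const parts = buffer.split(/(?<=[.!?;])\s+/);
    buffer = parts.pop() ?? "";
    yield* parts;
  }
  if (buffer.trim()) yield buffer;
}

async function speak(deltas: AsyncIterable<string>) {
  for await (const sentence of sentences(deltas)) {
    await synthesize(sentence);  // ElevenLabs or Gemini TTS behind one interface (assumed)
  }
}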
The Agent's Toolbox
28+ tools across 7 categories. The agent decides which to call, in what order, and how to chain results.
Memory
- recall_memory: Semantic search over stored memories
- save_memory: Explicitly store a fact or preference
- manage_memory: Delete / edit / prune the memory store
Knowledge & Investigation
- fast_context: Parallel semantic + FTS + metadata search
- investigate: Cross-reference multiple sources
- create_investigation: Multi-step research plan
Deep Research
- deep_research: Launch a 5-phase background research task
- check_research: Poll task status
- followup_research: Cheap follow-up on Gemini DR results
Google Workspace
- search_gmail / read_gmail / send_gmail: Full Gmail access
- search_drive / read_drive_file: Google Drive
- list_calendar_events: Calendar
- search_contacts: Contacts
Audio Intelligence
- analyze_audio: Gemini audio modality — listen + analyze
- get_analysis: Retrieve cached analysis results
Self-Inspection
- check_app_state: App activity snapshot
- read_traces: Inspect own behavior logs
- adjust_persona: Propose behavioral changes
- send_to_inbox / read_inbox: Message the IDE agent (Cascade)
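Every entry in the toolbox follows the same AI SDK pattern: a description, a zod input schema, and an execute function; the model only ever sees the schema. Roughly (the helpers are assumptions, and the exact schema field name depends on the SDK version):

import { tool } from "ai";
import { z } from "zod";

// Illustrative shape of one toolbox entry.
export const recallMemoryTool = tool({
  description: "Semantic search over stored memories",
  inputSchema: z.object({
    query: z.string().describe("What to look for"),
    limit: z.number().int().min(1).max(20).default(5),
  }),
  execute: async ({ query, limit }) => {
    const embedding = await embedQuery(query);      // assumed embedding helper
    return searchMemoryVectors(embedding, limit);   // assumed vector search helper
  },
});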
System Architecture
User Message
    │
    ▼
context-manager.ts ──── Budget-aware layered context
    │    ┌─ Compass (90K, Gemini cache)
    │    ├─ Persona (~820 tokens)
    │    ├─ Memory Hint (~50 tokens)
    │    ├─ Session Summary (~300 tokens)
    │    └─ Recent Turns (4–20, dynamic)
    │
    ▼
AI SDK streamText() ──── Multi-model: Gemini / Claude / GPT
    │
    ├── Tool Calls ──────► 28+ tools execute in parallel
    │       │
    │       ├── fast_context  → pgvector + FTS search
    │       ├── analyze_audio → Gemini audio modality
    │       ├── deep_research → 5-phase orchestrator
    │       ├── search_gmail  → Google OAuth
    │       └── recall_memory → in-memory cosine search
    │
    ▼
Assistant Response
    │
    ▼
memory-extractor.ts ──── Async: extract + store memories
    │
    ▼
memory_vectors (PG) ──── HNSW indexed, semantic dedup
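Condensed into code, the diagram is one route handler: assemble the layered context, hand it to streamText() with the tool registry, and stream the result back. A sketch (helper names follow recent AI SDK releases; buildLayeredContext and the tools import are stand-ins for the real modules):

import { streamText, convertToModelMessages, type UIMessage } from "ai";
import { google } from "@ai-sdk/google";
import { tools } from "@/lib/agent/tools";              // assumed path to the 28+ tool registry

export async function POST(req: Request) {
  const { messages }: { messages: UIMessage[] } = await req.json();

  // Budget-aware layered context: compass + persona + memory hint + session summary.
  const system = await buildLayeredContext();           // assumed context-manager entry point

  const result = streamText({
    model: google("gemini-2.0-flash"),                   // swappable for Claude / GPT
    system,
    messages: convertToModelMessages(messages),          // keeps tool-call / tool-result parts intact
    tools,
  });

  return result.toUIMessageStreamResponse();
}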
Database
- PostgreSQL + pgvector
- 25 tables
- HNSW vector indexes
- GIN full-text indexes
AI Models
- Gemini 2.0 Flash (primary)
- Claude 3.5 Sonnet
- GPT-4o
- OpenAI embeddings (1536d)
Stack
- Next.js 16 App Router
- Vercel AI SDK 6
- Drizzle ORM
- Google Cloud Storage
Key Engineering Decisions
Problem: flat memory tables capped out and got dumped wholesale into the context window.
Solution: vector-backed memory, where each memory is a separate row with its own embedding. Semantic dedup merges near-duplicates (cosine > 0.85 → merge), and the context gets a 50-token hint instead of a 1,200-token dump.
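The write path is insert-or-merge: embed the new fact, look up the nearest existing memory, and only insert if nothing is close enough. Sketch (helpers assumed):

const DEDUP_THRESHOLD = 0.85;

// Insert-or-merge: fold near-duplicates into the existing row instead of adding a new one.
async function saveMemory(text: string, category: string) {
  const embedding = await embed(text);                 // assumed embedding helper
  const nearest = await nearestMemory(embedding);      // assumed HNSW-backed lookup: { id, similarity } | null
  if (nearest && nearest.similarity > DEDUP_THRESHOLD) {
    return mergeMemory(nearest.id, text, embedding);   // update the existing memory
  }
  return insertMemory({ text, category, embedding });  // new row with its own vector
}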
Problem: fixed turn windows (e.g. 'last 20 turns') broke when switching from 1M-token to 128K-token models.
Solution: budget-aware dynamic allocation. 30% of the window is reserved for reasoning, 15% for tool results, and 55% for context; the turn window binary-searches for the maximum number of turns that fits the remaining budget.
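In code, that is a fixed budget split plus a binary search over how many recent turns still fit. Sketch (the Turn type and the countTokens estimator are assumptions):

// 55% of the window goes to context; the turn window fills whatever the fixed layers leave over.
function fitTurnWindow(turns: Turn[], contextWindow: number, fixedContextTokens: number): number {
  const turnBudget = Math.floor(contextWindow * 0.55) - fixedContextTokens;

  // Binary search for the largest suffix of recent turns that fits the budget.
  let lo = 0, hi = Math.min(turns.length, 20);
  while (lo < hi) {
    const mid = Math.ceil((lo + hi) / 2);
    if (countTokens(turns.slice(-mid)) <= turnBudget) lo = mid;
    else hi = mid - 1;
  }
  return Math.max(lo, 4);  // floor of 4 turns, matching the dynamic 4–20 window
}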
Problem: implicit Gemini caching didn't guarantee cache hits, wasting tokens on the 90K-token compass document.
Solution: the explicit Gemini cache API. The compass, persona, and tool definitions are uploaded once and referenced by name on subsequent requests.
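The flow mirrors Gemini's cachedContents API: create the cache once with the big static pieces, then reference it by name on every request. Roughly, with the Google GenAI SDK (treat the field names as illustrative rather than verbatim project code):

import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Upload the static pieces once; the cache survives across requests for its TTL.
const cache = await ai.caches.create({
  model: "gemini-2.0-flash-001",
  config: {
    displayName: "compass-and-persona",
    systemInstruction: personaText,                                    // assumed variable
    contents: [{ role: "user", parts: [{ text: compassText }] }],      // the 90K-token compass (assumed variable)
    ttl: "3600s",
  },
});

// Later requests reference the cache by name instead of re-sending 90K tokens.
const response = await ai.models.generateContent({
  model: "gemini-2.0-flash-001",
  contents: "Who are the key actors in the compass document?",
  config: { cachedContent: cache.name },
});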
Problem: re-analyzing the same audio file burned 50–80K tokens every time.
Solution: a document_analyses table tracks completed analyses, and analyze_audio checks it before spending tokens. First analysis: 30–60s. Subsequent lookups: ~100ms.
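The cache check is a plain table lookup before the expensive call. Sketch with Drizzle (schema and helper names assumed):

import { and, eq } from "drizzle-orm";
import { db } from "@/lib/db";                               // assumed db instance
import { documentAnalyses } from "@/lib/db/schema";          // assumed schema export

export async function analyzeAudio(documentId: string) {
  // Checkpoint cache: return the stored analysis instead of re-spending 50–80K tokens.
  const [cached] = await db
    .select()
    .from(documentAnalyses)
    .where(and(eq(documentAnalyses.documentId, documentId), eq(documentAnalyses.status, "complete")))
    .limit(1);
  if (cached) return cached.result;                          // the ~100ms path

  const result = await runGeminiAudioAnalysis(documentId);   // 30–60s, 50–80K tokens (assumed helper)
  await db.insert(documentAnalyses).values({ documentId, status: "complete", result });
  return result;
}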
Problem: a custom message converter stripped tool invocations from history, causing models to 'forget' their tools.
Solution: the AI SDK's convertToModelMessages, which preserves all message parts: tool-call, tool-result, reasoning, and file parts.
Why Build This?
Real Stakes, Real Data
This wasn't built to impress anyone. It was built because I needed it. The evidence was real, the legal case was real, and the existing tools weren't good enough. Building under real constraints produces better engineering than building for demos.
Proof of Depth
When I say I build production AI systems, this is what I mean. Not a chatbot wrapper. A full-stack AI platform with a custom context manager, vector memory, audio intelligence, deep research orchestrator, and 28 agent tools — all wired together and actually used.
The Hardest AI Problems Are Personal
Reasoning across 100K+ documents, detecting tampered audio, mapping entity relationships across years of evidence — these are the same problems enterprises pay millions to solve. I built the solution for myself first.