
Forensic AI Studio
I built a private AI investigator for a real legal case. It ingests evidence, maps entity relationships, analyzes audio recordings, and reasons across 100K+ documents in real time.
What Is This?
This is a full-stack AI investigation platform I built for personal use during a complex legal case. It's not a product, not a demo — it's a working tool I use daily. The codebase is a Next.js 16 app with a PostgreSQL + pgvector backend, a 28-tool AI agent layer, a voice interface, and a deep research orchestrator. Everything is purpose-built for one job: finding the truth in a mountain of evidence.
Evidence Ingestion
Parses PDF, DOCX, ODT, EML, HTML, CSV, WhatsApp SQLite DBs, and audio/video files. Chunks, embeds, and extracts named entities automatically.
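Regardless of format, every file goes through the same loop: parse to text, chunk, embed, and run NER. A minimal sketch of that loop (the parser, chunker, NER helpers, and chunk parameters are stand-ins for the real modules):

// Rough shape of the ingestion loop; every helper here is a stand-in.
import { readFile } from "node:fs/promises";

async function ingestDocument(path: string) {
  const raw = await readFile(path);
  const text = await extractText(raw, path);            // format-specific parser (PDF, DOCX, EML, ...)
  const chunks = chunkText(text, { maxTokens: 512, overlap: 64 });

  for (const chunk of chunks) {
    const embedding = await embedChunk(chunk);           // 1536-d OpenAI embedding
    await db.insert(chunksTable).values({ sourcePath: path, content: chunk, embedding });
  }

  const entities = await extractEntities(text);          // Gemini Flash NER (Greek + English)
  await upsertEntities(entities);                        // feeds the entity graph
}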
AI Agent with 28+ Tools
A persistent agent that can search evidence, recall memories, run deep research, analyze audio, query Gmail/Drive, browse the web, and execute multi-step investigations.
Audio Intelligence
Sends audio recordings to Gemini's audio modality (50–80K tokens per file). Extracts transcripts, speaker labels, entities, and forensic findings. Results cached to avoid re-spending tokens.
Capability Map
Every module is production-quality, purpose-built, and wired together into a single coherent system.
Evidence Browser
- Search + filter 169 ingested documents
- Chunk viewer with source attribution
- Inline ingestion panel
- pgvector semantic search + GIN full-text
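Under the hood, browser search is one hybrid query: pgvector cosine similarity blended with GIN full-text rank. A sketch of what that can look like through Drizzle's raw SQL helper (table names, column names, and weights are assumptions):

import { sql } from "drizzle-orm";
import { db } from "@/lib/db";                                      // assumed db instance

// Hybrid search sketch: blend cosine similarity with full-text rank.
export async function hybridSearch(query: string, queryEmbedding: number[], limit = 20) {
  const vec = `[${queryEmbedding.join(",")}]`;                      // pgvector literal
  return db.execute(sql`
    SELECT id, document_id, content,
           1 - (embedding <=> ${vec}::vector)                AS semantic_score,
           ts_rank(tsv, plainto_tsquery('simple', ${query})) AS fts_score
    FROM chunks
    ORDER BY 0.7 * (1 - (embedding <=> ${vec}::vector))
           + 0.3 * ts_rank(tsv, plainto_tsquery('simple', ${query})) DESC
    LIMIT ${limit}
  `);
}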
Entity Graph
- d3-force relationship visualization
- 8 entity types (people, companies, …)
- Greek + English NER via Gemini Flash
- Entity CRM with profile editor
Media Lab
- Waveform viewer + transport bar
- Gemini audio modality analysis
- Speaker diarization + labeling
- Checkpoint cache (no re-spend)
Deep Research
- 5-phase orchestrator: Plan → Gather → Synthesize → Iterate → Report
- Tavily web search + local evidence
- Gmail / Drive integration
- Gemini Deep Research (optional)
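The orchestrator is essentially a loop over those five phases: plan once, then gather, synthesize, and iterate until nothing important is missing, then report. A stripped-down sketch (every helper is illustrative):

type Finding = { source: string; claim: string };

async function deepResearch(question: string, maxIterations = 3) {
  const plan = await planQueries(question);                  // Phase 1: Plan
  let findings: Finding[] = [];

  for (let i = 0; i < maxIterations; i++) {
    const batch = await gather(plan, findings);              // Phase 2: Tavily + local evidence + Gmail/Drive
    findings = await synthesize(question, findings, batch);  // Phase 3: merge into working notes
    const gaps = await findGaps(question, findings);         // Phase 4: Iterate, deciding what is still missing
    if (gaps.length === 0) break;
    plan.queries.push(...gaps);
  }

  return writeReport(question, findings);                    // Phase 5: Report
}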
Vector Memory
- Semantic dedup (cosine > 0.85 → merge)
- 5 categories: entity, fact, decision, preference, focus
- Sub-millisecond in-memory search
- Lightweight context hint (~50 tokens)
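Recall itself is nothing exotic: a cosine scan over the in-memory copy of the vectors, which is why it stays sub-millisecond at this scale. Sketch (types simplified):

interface Memory { id: string; text: string; category: string; embedding: number[] }

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Top-k recall over the in-memory memory store.
function recallMemory(queryEmbedding: number[], memories: Memory[], k = 5): Memory[] {
  return memories
    .map(m => ({ m, score: cosine(queryEmbedding, m.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map(x => x.m);
}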
Context Manager
- Budget-aware layered context
- Dynamic turn window (4–20 turns)
- Gemini explicit cache (75% cost reduction)
- Handles 1M → 128K model switches
Integrity & Tamper
- SHA-256 hash verification per document
- Integrity checks table + trust scores
- Immutable audit log
- Audio spectral analysis (planned)
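Hash verification is deliberately boring: hash on ingest, re-hash on read, compare, and write the result to the audit trail. Sketch (the audit helper is assumed):

import { createHash } from "node:crypto";

export function sha256(data: Buffer): string {
  return createHash("sha256").update(data).digest("hex");
}

// Re-hash on read and compare against the hash stored at ingest time.
export async function verifyDocument(documentId: string, data: Buffer, storedHash: string) {
  const currentHash = sha256(data);
  const ok = currentHash === storedHash;
  await recordIntegrityCheck({ documentId, ok, currentHash, storedHash }); // assumed audit-log helper
  return ok;
}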
Timeline
- Normalized event timeline
- Cross-referenced with entities
- Filterable by event type
- Auto-population from ingested data (planned)
Voice Interface
- Whisper STT
- ElevenLabs TTS (sentence-level streaming)
- Gemini TTS (free tier fallback)
- Voice settings: provider, model, voice, mode
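Sentence-level streaming just means TTS starts as soon as the first complete sentence arrives instead of waiting for the whole reply. A simplified version of the splitter (synthesize stands in for either provider):

// Buffer streamed text deltas and flush complete sentences as they arrive.
async function* sentences(deltas: AsyncIterable<string>) {
  let buffer = "";
  for await (const delta of deltas) {
    buffer += delta;
    const parts = buffer.split(/(?<=[.!?;])\s+/);
    buffer = parts.pop() ?? "";
    yield* parts;
  }
  if (buffer.trim()) yield buffer;
}

async function speak(deltas: AsyncIterable<string>) {
  for await (const sentence of sentences(deltas)) {
    await synthesize(sentence);  // ElevenLabs or Gemini TTS behind one interface (assumed)
  }
}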
The Agent's Toolbox
28+ tools across 7 categories. The agent decides which to call, in what order, and how to chain results.
Memory
- recall_memory: Semantic search over stored memories
- save_memory: Explicitly store a fact or preference
- manage_memory: Delete / edit / prune the memory store
Knowledge & Investigation
- fast_context: Parallel semantic + FTS + metadata search
- investigate: Cross-reference multiple sources
- create_investigation: Multi-step research plan
Deep Research
- deep_research: Launch a 5-phase background research task
- check_research: Poll task status
- followup_research: Cheap follow-up on Gemini DR results
Google Workspace
- search_gmail / read_gmail / send_gmail: Full Gmail access
- search_drive / read_drive_file: Google Drive
- list_calendar_events: Calendar
- search_contacts: Contacts
Audio Intelligence
- analyze_audio: Gemini audio modality — listen + analyze
- get_analysis: Retrieve cached analysis results
Self-Inspection
- check_app_state: App activity snapshot
- read_traces: Inspect own behavior logs
- adjust_persona: Propose behavioral changes
- send_to_inbox / read_inbox: Message the IDE agent (Cascade)
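Every entry in the toolbox follows the same AI SDK pattern: a description, a zod input schema, and an execute function; the model only ever sees the schema. Roughly (the helpers are assumptions, and the exact schema field name depends on the SDK version):

import { tool } from "ai";
import { z } from "zod";

// Illustrative shape of one toolbox entry.
export const recallMemoryTool = tool({
  description: "Semantic search over stored memories",
  inputSchema: z.object({
    query: z.string().describe("What to look for"),
    limit: z.number().int().min(1).max(20).default(5),
  }),
  execute: async ({ query, limit }) => {
    const embedding = await embedQuery(query);      // assumed embedding helper
    return searchMemoryVectors(embedding, limit);   // assumed vector search helper
  },
});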
System Architecture
User Message
    │
    ▼
context-manager.ts ──── Budget-aware layered context
    │    ┌─ Compass (90K, Gemini cache)
    │    ├─ Persona (~820 tokens)
    │    ├─ Memory Hint (~50 tokens)
    │    ├─ Session Summary (~300 tokens)
    │    └─ Recent Turns (4–20, dynamic)
    │
    ▼
AI SDK streamText() ──── Multi-model: Gemini / Claude / GPT
    │
    ├── Tool Calls ──────► 28+ tools execute in parallel
    │       │
    │       ├── fast_context  → pgvector + FTS search
    │       ├── analyze_audio → Gemini audio modality
    │       ├── deep_research → 5-phase orchestrator
    │       ├── search_gmail  → Google OAuth
    │       └── recall_memory → in-memory cosine search
    │
    ▼
Assistant Response
    │
    ▼
memory-extractor.ts ──── Async: extract + store memories
    │
    ▼
memory_vectors (PG) ──── HNSW indexed, semantic dedup
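Condensed into code, the diagram is one route handler: assemble the layered context, hand it to streamText() with the tool registry, and stream the result back. A sketch (helper names follow recent AI SDK releases; buildLayeredContext and the tools import are stand-ins for the real modules):

import { streamText, convertToModelMessages, type UIMessage } from "ai";
import { google } from "@ai-sdk/google";
import { tools } from "@/lib/agent/tools";              // assumed path to the 28+ tool registry

export async function POST(req: Request) {
  const { messages }: { messages: UIMessage[] } = await req.json();

  // Budget-aware layered context: compass + persona + memory hint + session summary.
  const system = await buildLayeredContext();           // assumed context-manager entry point

  const result = streamText({
    model: google("gemini-2.0-flash"),                   // swappable for Claude / GPT
    system,
    messages: convertToModelMessages(messages),          // keeps tool-call / tool-result parts intact
    tools,
  });

  return result.toUIMessageStreamResponse();
}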
Database
- PostgreSQL + pgvector
- 25 tables
- HNSW vector indexes
- GIN full-text indexes
AI Models
- Gemini 2.0 Flash (primary)
- Claude 3.5 Sonnet
- GPT-4o
- OpenAI embeddings (1536d)
Stack
- Next.js 16 App Router
- Vercel AI SDK 6
- Drizzle ORM
- Google Cloud Storage
Key Engineering Decisions
Problem: flat memory tables capped out and got dumped wholesale into the context window.
Solution: vector-backed memory, where each memory is a separate row with its own embedding. Semantic dedup merges near-duplicates (cosine > 0.85 → merge), and the context gets a 50-token hint instead of a 1,200-token dump.
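The write path is insert-or-merge: embed the new fact, look up the nearest existing memory, and only insert if nothing is close enough. Sketch (helpers assumed):

const DEDUP_THRESHOLD = 0.85;

// Insert-or-merge: fold near-duplicates into the existing row instead of adding a new one.
async function saveMemory(text: string, category: string) {
  const embedding = await embed(text);                 // assumed embedding helper
  const nearest = await nearestMemory(embedding);      // assumed HNSW-backed lookup: { id, similarity } | null
  if (nearest && nearest.similarity > DEDUP_THRESHOLD) {
    return mergeMemory(nearest.id, text, embedding);   // update the existing memory
  }
  return insertMemory({ text, category, embedding });  // new row with its own vector
}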
Problem: fixed turn windows (e.g. 'last 20 turns') broke when switching from 1M-token to 128K-token models.
Solution: budget-aware dynamic allocation. 30% of the window is reserved for reasoning, 15% for tool results, and 55% for context; the turn window binary-searches for the maximum number of turns that fits the remaining budget.
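In code, that is a fixed budget split plus a binary search over how many recent turns still fit. Sketch (the Turn type and the countTokens estimator are assumptions):

// 55% of the window goes to context; the turn window fills whatever the fixed layers leave over.
function fitTurnWindow(turns: Turn[], contextWindow: number, fixedContextTokens: number): number {
  const turnBudget = Math.floor(contextWindow * 0.55) - fixedContextTokens;

  // Binary search for the largest suffix of recent turns that fits the budget.
  let lo = 0, hi = Math.min(turns.length, 20);
  while (lo < hi) {
    const mid = Math.ceil((lo + hi) / 2);
    if (countTokens(turns.slice(-mid)) <= turnBudget) lo = mid;
    else hi = mid - 1;
  }
  return Math.max(lo, 4);  // floor of 4 turns, matching the dynamic 4–20 window
}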
Problem: implicit Gemini caching didn't guarantee cache hits, wasting tokens on the 90K-token compass document.
Solution: the explicit Gemini cache API. The compass, persona, and tool definitions are uploaded once and referenced by name on subsequent requests.
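The flow mirrors Gemini's cachedContents API: create the cache once with the big static pieces, then reference it by name on every request. Roughly, with the Google GenAI SDK (treat the field names as illustrative rather than verbatim project code):

import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Upload the static pieces once; the cache survives across requests for its TTL.
const cache = await ai.caches.create({
  model: "gemini-2.0-flash-001",
  config: {
    displayName: "compass-and-persona",
    systemInstruction: personaText,                                    // assumed variable
    contents: [{ role: "user", parts: [{ text: compassText }] }],      // the 90K-token compass (assumed variable)
    ttl: "3600s",
  },
});

// Later requests reference the cache by name instead of re-sending 90K tokens.
const response = await ai.models.generateContent({
  model: "gemini-2.0-flash-001",
  contents: "Who are the key actors in the compass document?",
  config: { cachedContent: cache.name },
});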
Problem: re-analyzing the same audio file burned 50–80K tokens every time.
Solution: a document_analyses table tracks completed analyses, and analyze_audio checks it before spending tokens. First analysis: 30–60s. Subsequent lookups: ~100ms.
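The cache check is a plain table lookup before the expensive call. Sketch with Drizzle (schema and helper names assumed):

import { and, eq } from "drizzle-orm";
import { db } from "@/lib/db";                               // assumed db instance
import { documentAnalyses } from "@/lib/db/schema";          // assumed schema export

export async function analyzeAudio(documentId: string) {
  // Checkpoint cache: return the stored analysis instead of re-spending 50–80K tokens.
  const [cached] = await db
    .select()
    .from(documentAnalyses)
    .where(and(eq(documentAnalyses.documentId, documentId), eq(documentAnalyses.status, "complete")))
    .limit(1);
  if (cached) return cached.result;                          // the ~100ms path

  const result = await runGeminiAudioAnalysis(documentId);   // 30–60s, 50–80K tokens (assumed helper)
  await db.insert(documentAnalyses).values({ documentId, status: "complete", result });
  return result;
}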
Problem: a custom message converter stripped tool invocations from history, causing models to 'forget' their tools.
Solution: the AI SDK's convertToModelMessages, which preserves all message parts: tool-call, tool-result, reasoning, and file parts.
Why Build This?
Real Stakes, Real Data
This wasn't built to impress anyone. It was built because I needed it. The evidence was real, the legal case was real, and the existing tools weren't good enough. Building under real constraints produces better engineering than building for demos.
Proof of Depth
When I say I build production AI systems, this is what I mean. Not a chatbot wrapper. A full-stack AI platform with a custom context manager, vector memory, audio intelligence, deep research orchestrator, and 28 agent tools — all wired together and actually used.
The Hardest AI Problems Are Personal
Reasoning across 100K+ documents, detecting tampered audio, mapping entity relationships across years of evidence — these are the same problems enterprises pay millions to solve. I built the solution for myself first.