Session Ingestion
Session ingestion captures design decisions from your AI coding conversations (Claude Code transcripts) and writes them into the knowledge graph.
How It Works
The pipeline has three phases:
Phase 0 — Preprocess. Reads the raw JSONL transcript, compresses turns (strips tool calls, collapses long outputs), and identifies which files were touched during the session.
Phase 1 — Segment. Sends the compressed conversation to Claude, which splits it into logical segments — each representing a distinct task or discussion topic. Each segment is tagged with whether it likely contains design decisions.
Phase 2 — Extract. For each approved segment, the system pulls in code structure context from the graph (callers, callees, file structure), then asks Claude to extract specific decisions with anchoring information.
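The Phase 0 compression step can be sketched roughly as follows. This is an illustrative assumption, not the pipeline's actual code: the `Turn` shape, the `compressTurns` name, and the truncation limit are all hypothetical.

```typescript
// Hypothetical sketch of Phase 0 turn compression (names and limit are assumptions).
interface Turn {
  role: "user" | "assistant" | "tool";
  content: string;
}

const MAX_OUTPUT_CHARS = 500; // assumed truncation limit for long outputs

// Strip tool-call turns entirely; collapse long outputs to a short preview.
function compressTurns(turns: Turn[]): Turn[] {
  return turns
    .filter((t) => t.role !== "tool")
    .map((t) =>
      t.content.length > MAX_OUTPUT_CHARS
        ? { ...t, content: t.content.slice(0, MAX_OUTPUT_CHARS) + " [truncated]" }
        : t
    );
}
```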
```
~/.claude/projects/*.jsonl
        ↓
Phase 0: Parse & compress turns
        ↓
Phase 1: LLM segments the conversation
        ↓
User approves which segments to analyze
        ↓
Phase 2: Per-segment deep extraction + graph context
        ↓
Write DecisionContext nodes to Memgraph
        ↓
Create PENDING_COMPARISON edges
        ↓
(later) npm run connect → build relationship edges
```

Where Transcripts Come From
Claude Code stores conversation transcripts as JSONL files in:
```
~/.claude/projects/<hashed-project-dir>/<session-id>.jsonl
```

Each line is a message (user, assistant, or tool call). The ingestion pipeline reads these directly; no export step is needed.
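Parsing such a transcript can be sketched as below. The message shape is an assumption for illustration; real entries carry more fields than `role` and `content`.

```typescript
import { readFileSync } from "node:fs";

// Assumed minimal message shape; actual transcript entries have more fields.
interface TranscriptMessage {
  role?: string;
  content?: unknown;
}

// One JSON object per line; blank lines are skipped.
function parseTranscript(jsonl: string): TranscriptMessage[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as TranscriptMessage);
}

function readTranscript(path: string): TranscriptMessage[] {
  return parseTranscript(readFileSync(path, "utf8"));
}
```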
Usage
```sh
# Process all new sessions across all projects
npm run ingest:sessions:v2

# Process only sessions from a specific project
npm run ingest:sessions:v2 -- --project bite-me-website

# Process a specific session by ID
npm run ingest:sessions:v2 -- --session abc123

# Auto-approve all segments that have decisions (skip interactive prompt)
npm run ingest:sessions:v2 -- --auto-approve

# Dry run — Phase 0 only, no LLM calls (useful for previewing)
npm run ingest:sessions:v2 -- --dry-run

# Re-process a previously ingested session
npm run ingest:sessions:v2 -- --force --session abc123

# Control concurrency for Phase 2 extraction
npm run ingest:sessions:v2 -- --concurrency 3
```

Interactive Approval
By default, after Phase 1 segments the conversation, you'll see a list like:
```
[abc12345] bite-me-website
  42 turns | 5 files | ~12000 tokens

🔍 Phase 1: Segmenting...
✓ 4 segments (2 with decisions):

  [1] ✅ Turn 1-12: Refactored auth middleware to use JWT
      Hints: chose JWT over session cookies, trade-off discussion
  [2] ❌ Turn 13-20: Fixed CSS layout bug
  [3] ✅ Turn 21-35: Designed rate limiting strategy
      Hints: Redis vs in-memory, sliding window approach
  [4] ❌ Turn 36-42: Updated README

Analyze which? (all / 1,3 / none):
```

You can select specific segments by number, analyze all, or skip. Use --auto-approve to automatically analyze all segments tagged with decisions.
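The prompt's selection input could be parsed along these lines. This is a hypothetical helper, not the actual implementation; segment numbers are 1-based, matching the displayed list.

```typescript
// Hypothetical parser for the approval prompt ("all" / "1,3" / "none").
// Returns the 1-based segment numbers to analyze.
function parseSelection(input: string, segmentCount: number): number[] {
  const trimmed = input.trim().toLowerCase();
  if (trimmed === "none" || trimmed === "") return [];
  if (trimmed === "all") {
    return Array.from({ length: segmentCount }, (_, i) => i + 1);
  }
  return trimmed
    .split(",")
    .map((s) => parseInt(s.trim(), 10))
    .filter((n) => Number.isInteger(n) && n >= 1 && n <= segmentCount);
}
```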
State Tracking
Processed sessions are tracked in data/ingested-sessions-v2.json. Each entry records:
- Session ID
- Number of segments found and approved
- Number of decisions extracted
- Decision IDs (for re-processing with --force)
On subsequent runs, only new (unprocessed) sessions are picked up. Use --force to re-process a session — old decisions are deleted and replaced.
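The skip-already-processed logic amounts to something like the sketch below; the entry field names are assumptions based on what the state file records, not the file's exact schema.

```typescript
// Assumed shape of an entry in data/ingested-sessions-v2.json (illustrative).
interface IngestedSession {
  sessionId: string;
  segmentsFound: number;
  segmentsApproved: number;
  decisionsExtracted: number;
  decisionIds: string[];
}

// Only sessions absent from the state file are processed, unless --force is set.
function selectSessions(
  discovered: string[],
  state: IngestedSession[],
  force: boolean
): string[] {
  const done = new Set(state.map((s) => s.sessionId));
  return discovered.filter((id) => force || !done.has(id));
}
```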
Large Sessions
Sessions exceeding ~80,000 tokens are automatically split into overlapping chunks for Phase 1 segmentation. The overlap (5 turns) prevents decisions at chunk boundaries from being missed. Segments from different chunks are deduplicated before Phase 2.
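A rough sketch of that chunking strategy: the ~80,000-token budget and 5-turn overlap come from the text above; the per-turn token estimates and everything else here are illustrative assumptions.

```typescript
const CHUNK_TOKEN_BUDGET = 80_000; // threshold from the docs (~80k tokens)
const OVERLAP_TURNS = 5;           // overlap from the docs (5 turns)

// Split turns into chunks under the token budget, each chunk re-including
// the last OVERLAP_TURNS turns of the previous one so decisions spanning a
// boundary are seen by at least one chunk. turnTokens[i] is an assumed
// per-turn token estimate supplied by the caller.
function chunkTurns(
  turnTokens: number[],
  budget = CHUNK_TOKEN_BUDGET
): Array<[number, number]> {
  const chunks: Array<[number, number]> = []; // [startIndex, endIndexExclusive]
  let start = 0;
  while (start < turnTokens.length) {
    let end = start;
    let used = 0;
    while (end < turnTokens.length && used + turnTokens[end] <= budget) {
      used += turnTokens[end];
      end++;
    }
    if (end === start) end = start + 1; // a single oversized turn: take it anyway
    chunks.push([start, end]);
    if (end >= turnTokens.length) break;
    start = Math.max(end - OVERLAP_TURNS, start + 1);
  }
  return chunks;
}
```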
After Ingestion
Ingestion creates DecisionContext nodes and PENDING_COMPARISON edges. To build the relationship graph (CAUSED_BY, DEPENDS_ON, etc.), run:
```sh
npm run connect
```

See CLI Reference for details.