
Session Ingestion

Session ingestion captures design decisions from your AI coding conversations (Claude Code transcripts) and writes them into the knowledge graph.

How It Works

The pipeline has three phases:

Phase 0 — Preprocess. Reads the raw JSONL transcript, compresses turns (strips tool calls, collapses long outputs), and identifies which files were touched during the session.
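The compression step can be sketched as follows. This is a minimal sketch, not the pipeline's actual code: the turn shape and the truncation threshold are assumptions.

```typescript
// Sketch of Phase 0 turn compression (turn shape and threshold are
// assumptions — the real transcript schema may differ).
interface Turn {
  role: "user" | "assistant" | "tool";
  content: string;
}

const MAX_CONTENT_CHARS = 2000; // hypothetical collapse threshold

function compressTurns(turns: Turn[]): Turn[] {
  return turns
    // Strip tool-call turns entirely
    .filter((t) => t.role !== "tool")
    // Collapse long outputs to a truncated preview
    .map((t) =>
      t.content.length > MAX_CONTENT_CHARS
        ? { ...t, content: t.content.slice(0, MAX_CONTENT_CHARS) + " …[truncated]" }
        : t
    );
}
```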

Phase 1 — Segment. Sends the compressed conversation to Claude, which splits it into logical segments — each representing a distinct task or discussion topic. Each segment is tagged with whether it likely contains design decisions.
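A segment coming out of Phase 1 might look like the following; the field names here are illustrative, not the pipeline's actual output schema.

```typescript
// Illustrative shape of a Phase 1 segment (field names assumed).
interface Segment {
  startTurn: number;
  endTurn: number;
  summary: string;
  hasDecisions: boolean;   // the "likely contains design decisions" tag
  decisionHints: string[]; // e.g. "chose JWT over session cookies"
}

// --auto-approve effectively selects every segment tagged with decisions.
function autoApprove(segments: Segment[]): Segment[] {
  return segments.filter((s) => s.hasDecisions);
}
```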

Phase 2 — Extract. For each approved segment, the system pulls in code structure context from the graph (callers, callees, file structure), then asks Claude to extract specific decisions with anchoring information.
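Assembling that graph context into a prompt might be sketched like this; the lookup result shape and the rendered format are assumptions.

```typescript
// Sketch of rendering code-structure context for one segment
// (the graph lookup shape is an assumption).
interface FileContext {
  path: string;
  callers: string[]; // functions that call into this file's symbols
  callees: string[]; // functions this file's symbols call
}

function renderGraphContext(files: FileContext[]): string {
  return files
    .map(
      (f) =>
        `## ${f.path}\n` +
        `callers: ${f.callers.join(", ") || "(none)"}\n` +
        `callees: ${f.callees.join(", ") || "(none)"}`
    )
    .join("\n\n");
}
```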

```
~/.claude/projects/*.jsonl
        ↓
Phase 0: Parse & compress turns
        ↓
Phase 1: LLM segments the conversation
        ↓
User approves which segments to analyze
        ↓
Phase 2: Per-segment deep extraction + graph context
        ↓
Write DecisionContext nodes to Memgraph
        ↓
Create PENDING_COMPARISON edges
        ↓
(later) npm run connect → build relationship edges
```

Where Transcripts Come From

Claude Code stores conversation transcripts as JSONL files in:

~/.claude/projects/<hashed-project-dir>/<session-id>.jsonl

Each line is a message (user, assistant, or tool call). The ingestion pipeline reads these directly — no export step needed.
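Reading a transcript is then just line-by-line JSON parsing; a minimal sketch, with the message schema assumed:

```typescript
// Minimal JSONL reader: one JSON message per line (schema assumed).
interface TranscriptMessage {
  type: string; // "user", "assistant", or a tool-call variant
  [key: string]: unknown;
}

function parseTranscript(jsonl: string): TranscriptMessage[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim().length > 0) // skip blank lines
    .map((line) => JSON.parse(line) as TranscriptMessage);
}
```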

Usage

```bash
# Process all new sessions across all projects
npm run ingest:sessions:v2

# Process only sessions from a specific project
npm run ingest:sessions:v2 -- --project bite-me-website

# Process a specific session by ID
npm run ingest:sessions:v2 -- --session abc123

# Auto-approve all segments that have decisions (skip interactive prompt)
npm run ingest:sessions:v2 -- --auto-approve

# Dry run — Phase 0 only, no LLM calls (useful for previewing)
npm run ingest:sessions:v2 -- --dry-run

# Re-process a previously ingested session
npm run ingest:sessions:v2 -- --force --session abc123

# Control concurrency for Phase 2 extraction
npm run ingest:sessions:v2 -- --concurrency 3
```

Interactive Approval

By default, after Phase 1 segments the conversation, you'll see a list like:

```
[abc12345] bite-me-website
    42 turns | 5 files | ~12000 tokens
    🔍 Phase 1: Segmenting...
    ✓ 4 segments (2 with decisions):
    [1] ✅ Turn 1-12:  Refactored auth middleware to use JWT
        Hints: chose JWT over session cookies, trade-off discussion
    [2] ❌ Turn 13-20: Fixed CSS layout bug
    [3] ✅ Turn 21-35: Designed rate limiting strategy
        Hints: Redis vs in-memory, sliding window approach
    [4] ❌ Turn 36-42: Updated README

    Analyze which? (all / 1,3 / none):
```

You can select specific segments by number, analyze all, or skip. Use --auto-approve to automatically analyze all segments tagged with decisions.
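Parsing that answer ("all", a comma-separated list, or "none") can be sketched as:

```typescript
// Sketch of parsing the approval prompt answer: "all", "1,3", or "none".
// Returns 1-based segment numbers, with out-of-range entries dropped.
function parseApproval(answer: string, segmentCount: number): number[] {
  const a = answer.trim().toLowerCase();
  if (a === "none" || a === "") return [];
  if (a === "all") {
    return Array.from({ length: segmentCount }, (_, i) => i + 1);
  }
  return a
    .split(",")
    .map((part) => Number.parseInt(part.trim(), 10))
    .filter((n) => Number.isInteger(n) && n >= 1 && n <= segmentCount);
}
```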

State Tracking

Processed sessions are tracked in data/ingested-sessions-v2.json. Each entry records:

  • Session ID
  • Number of segments found and approved
  • Number of decisions extracted
  • Decision IDs (for re-processing with --force)

On subsequent runs, only new (unprocessed) sessions are picked up. Use --force to re-process a session — old decisions are deleted and replaced.
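The skip/re-process selection can be sketched as below; the entry shape is a simplification of the fields listed above.

```typescript
// Simplified state entry — mirrors the fields listed above.
interface IngestedSession {
  sessionId: string;
  segmentsFound: number;
  segmentsApproved: number;
  decisionIds: string[];
}

// Without --force, only sessions absent from the state file are processed.
function selectSessions(
  discovered: string[],
  state: IngestedSession[],
  force: boolean
): string[] {
  if (force) return discovered;
  const done = new Set(state.map((e) => e.sessionId));
  return discovered.filter((id) => !done.has(id));
}
```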

Large Sessions

Sessions exceeding ~80,000 tokens are automatically split into overlapping chunks for Phase 1 segmentation. The overlap (5 turns) prevents decisions at chunk boundaries from being missed. Segments from different chunks are deduplicated before Phase 2.
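The chunking can be sketched as follows. The real pipeline splits by a token budget (~80k tokens); this sketch splits by turn count for simplicity, keeping the 5-turn overlap.

```typescript
const OVERLAP_TURNS = 5; // per the docs: 5-turn overlap between chunks

// Split turns into overlapping windows so a decision spanning a chunk
// boundary appears whole in at least one chunk.
function chunkTurns<T>(turns: T[], chunkSize: number): T[][] {
  if (turns.length <= chunkSize) return [turns];
  const chunks: T[][] = [];
  let start = 0;
  while (start < turns.length) {
    chunks.push(turns.slice(start, start + chunkSize));
    if (start + chunkSize >= turns.length) break;
    start += chunkSize - OVERLAP_TURNS;
  }
  return chunks;
}
```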

After Ingestion

Ingestion creates DecisionContext nodes and PENDING_COMPARISON edges. To build the relationship graph (CAUSED_BY, DEPENDS_ON, etc.), run:

```bash
npm run connect
```

See CLI Reference for details.
