Mem-Oracle

Architecture

Understanding how Mem-Oracle works under the hood.

System Overview


Component Details

Client Layer

  • Claude Code: Primary integration via plugin hooks
  • OpenCode: Alternative editor integration
  • CLI: Direct command-line access

Integration Layer

  • Plugin Hooks: Lifecycle event handlers for automatic doc injection
  • MCP Server: Exposes Mem-Oracle over the Model Context Protocol for explicit tool calls

Service Layer

  • Worker Service: HTTP server handling all requests
  • Orchestrator: Coordinates indexing and retrieval operations

Processing Pipeline

| Component | Responsibility |
|---|---|
| Fetcher | HTTP requests with caching and rate limiting |
| Extractor | HTML/Markdown parsing and content extraction |
| Chunker | Splits content into semantic chunks |
| Crawler | Discovers and queues linked pages |
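The Chunker's exact splitting rules are not documented on this page. As a minimal sketch, assuming chunks break on blank-line paragraph boundaries and are capped at a hypothetical `maxChars` length, a splitter that records the `startIndex`/`endIndex` offsets used by the `Chunk` model could look like this. `ChunkSpan`, `paragraphSpans`, `makeChunk`, and `chunkText` are illustrative names, not Mem-Oracle's API.

```typescript
// Hypothetical chunk shape: the chunk text plus its character offsets in the source.
interface ChunkSpan {
  content: string;
  startIndex: number; // inclusive
  endIndex: number; // exclusive
}

// Find non-blank paragraphs separated by one or more blank lines.
function paragraphSpans(text: string): Array<[number, number]> {
  const spans: Array<[number, number]> = [];
  const separator = /\n\s*\n/g;
  let start = 0;
  let match: RegExpExecArray | null;
  while ((match = separator.exec(text)) !== null) {
    if (text.slice(start, match.index).trim()) spans.push([start, match.index]);
    start = match.index + match[0].length;
  }
  if (text.slice(start).trim()) spans.push([start, text.length]);
  return spans;
}

function makeChunk(text: string, start: number, end: number): ChunkSpan {
  return { content: text.slice(start, end), startIndex: start, endIndex: end };
}

// Greedily merge paragraphs until adding the next one would exceed maxChars.
// A single paragraph longer than maxChars is kept whole rather than cut mid-sentence.
function chunkText(text: string, maxChars = 1000): ChunkSpan[] {
  const chunks: ChunkSpan[] = [];
  let start = -1;
  let end = -1;
  for (const [s, e] of paragraphSpans(text)) {
    if (start >= 0 && e - start <= maxChars) {
      end = e; // paragraph still fits: extend the current chunk over it
    } else {
      if (start >= 0) chunks.push(makeChunk(text, start, end));
      start = s; // otherwise start a new chunk at this paragraph
      end = e;
    }
  }
  if (start >= 0) chunks.push(makeChunk(text, start, end));
  return chunks;
}
```

Splitting on paragraphs keeps each chunk semantically coherent, and storing offsets lets a retrieved chunk be traced back to its exact position in the page.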

Embedding Layer

| Provider | Type | Use Case |
|---|---|---|
| Local | TF-IDF | No API required, fast |
| OpenAI | Neural | High quality, general purpose |
| Voyage | Neural | Optimized for code |
| Cohere | Neural | Multi-language support |
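To illustrate why the Local provider needs no API, here is a minimal sketch of a TF-IDF "embedding" computed entirely in process: one dimension per vocabulary term, weighted by term frequency times a smoothed inverse document frequency, then L2-normalized. The `EmbeddingProvider` interface, the `TfIdfProvider` class, and the tokenizer are hypothetical; Mem-Oracle's real provider interface and weighting may differ, and the neural providers would call their vendors' APIs instead.

```typescript
// Hypothetical provider interface; the real Mem-Oracle provider API may differ.
interface EmbeddingProvider {
  embed(texts: string[]): number[][];
}

// Lowercase and split into alphanumeric tokens.
const tokenize = (s: string): string[] => s.toLowerCase().match(/[a-z0-9_]+/g) ?? [];

class TfIdfProvider implements EmbeddingProvider {
  private vocab = new Map<string, number>(); // term -> dimension index
  private idf: number[] = [];

  constructor(corpus: string[]) {
    // Count in how many documents each term appears.
    const docFreq = new Map<string, number>();
    for (const doc of corpus) {
      for (const term of new Set(tokenize(doc))) {
        docFreq.set(term, (docFreq.get(term) ?? 0) + 1);
      }
    }
    for (const [term, df] of docFreq) {
      this.vocab.set(term, this.idf.length);
      // Smoothed IDF: log((1 + N) / (1 + df)) + 1, so every known term keeps a positive weight.
      this.idf.push(Math.log((1 + corpus.length) / (1 + df)) + 1);
    }
  }

  embed(texts: string[]): number[][] {
    return texts.map((text) => {
      const vec = new Array<number>(this.idf.length).fill(0);
      const tokens = tokenize(text);
      for (const t of tokens) {
        const i = this.vocab.get(t);
        if (i !== undefined) vec[i] += 1 / tokens.length; // term frequency
      }
      for (let i = 0; i < vec.length; i++) vec[i] *= this.idf[i];
      // L2-normalize so cosine similarity reduces to a dot product.
      const norm = Math.sqrt(vec.reduce((s, x) => s + x * x, 0)) || 1;
      return vec.map((x) => x / norm);
    });
  }
}
```

Terms never seen in the corpus contribute nothing, which is the main quality gap compared with the neural providers: TF-IDF only matches shared vocabulary, not meaning.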

Storage Layer

| Store | Purpose |
|---|---|
| SQLite | Docset and page metadata |
| Vector Store | Embedding vectors (JSON files) |
| Content Cache | Raw fetched content |
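The `Page` model stores a `contentHash` next to the metadata in SQLite. Assuming it is a hex SHA-256 digest of the raw fetched body (the hash function is not specified here), a re-index could compare hashes and skip re-chunking and re-embedding pages whose content has not changed. `hashContent` and `needsReindex` are hypothetical helper names.

```typescript
import { createHash } from "node:crypto";

// Assumption: contentHash is the hex SHA-256 digest of the raw fetched body.
function hashContent(body: string): string {
  return createHash("sha256").update(body, "utf8").digest("hex");
}

// Hypothetical helper: only re-chunk and re-embed when the content actually changed.
// A missing stored hash (never indexed) always triggers indexing.
function needsReindex(storedHash: string | undefined, body: string): boolean {
  return storedHash !== hashContent(body);
}
```

Embedding calls are usually the most expensive step for the neural providers, so a cheap hash comparison before them is a natural place to save work.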

Indexing Flow

Seed-First Strategy

  1. Immediate: Index the seed page synchronously.
  2. Background: Crawl and index discovered pages asynchronously.

This way, the seed page is searchable right away while indexing of the rest of the docset continues in the background.
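A minimal sketch of the seed-first strategy, assuming a hypothetical `IndexPage` callback that fetches, extracts, chunks, and embeds one page and returns the in-scope links the Crawler found on it. The real Orchestrator API is not documented on this page; `indexDocset`, `crawl`, and `onBackgroundDone` are illustrative names.

```typescript
// Hypothetical: indexes one page and returns the links discovered on it
// (assumed to be already filtered to the docset's baseUrl).
type IndexPage = (url: string) => Promise<string[]>;

async function indexDocset(
  seedUrl: string,
  indexPage: IndexPage,
  onBackgroundDone?: (indexedCount: number) => void,
): Promise<string[]> {
  // 1. Immediate: await the seed so its chunks are searchable before we return.
  const discovered = await indexPage(seedUrl);

  // 2. Background: crawl the remaining pages without blocking the caller.
  //    Pass a copy so the caller's array is not mutated by the queue.
  void crawl([...discovered], new Set([seedUrl]), indexPage).then((n) => onBackgroundDone?.(n));

  return discovered;
}

// Breadth-first crawl; each URL is indexed at most once.
async function crawl(queue: string[], seen: Set<string>, indexPage: IndexPage): Promise<number> {
  let indexed = 0;
  while (queue.length > 0) {
    const url = queue.shift()!;
    if (seen.has(url)) continue;
    seen.add(url);
    try {
      queue.push(...(await indexPage(url)));
      indexed++;
    } catch {
      // A failing page would be recorded with status 'failed'; the crawl carries on.
    }
  }
  return indexed;
}
```

Returning after the seed page keeps the first query fast, while the `seen` set prevents link cycles (for example, a page linking back to the seed) from being re-indexed.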

Retrieval Flow


Data Flow


Data Models

Docset

interface Docset {
  id: string;
  name: string;
  baseUrl: string;
  seedSlug: string;
  status: 'indexing' | 'complete' | 'error';
  createdAt: Date;
  updatedAt: Date;
}

Page

interface Page {
  id: string;
  docsetId: string;
  url: string;
  title: string;
  status: 'pending' | 'indexed' | 'failed';
  contentHash: string;
  indexedAt: Date;
}

Chunk

interface Chunk {
  id: string;
  pageId: string;
  content: string;
  startIndex: number;
  endIndex: number;
  embedding: number[];
}
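Because each `Chunk` carries its `embedding` and the vector store keeps embeddings in plain JSON files, retrieval can be sketched as a linear scan. This minimal example assumes cosine similarity and brute-force top-k ranking; the actual scoring Mem-Oracle uses is not documented here, and `SearchableChunk`, `cosine`, and `topK` are hypothetical names.

```typescript
// Subset of the Chunk fields needed for ranking.
interface SearchableChunk {
  id: string;
  content: string;
  embedding: number[];
}

// Cosine similarity; returns 0 when either vector is all zeros.
// Assumes both vectors come from the same provider and have the same length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Brute-force top-k: score every chunk, sort by descending score, keep the best k.
function topK(query: number[], chunks: SearchableChunk[], k = 5) {
  return chunks
    .map((chunk) => ({ chunk, score: cosine(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

A full scan is O(n) per query, which is usually acceptable for a single developer's docsets; an approximate nearest-neighbor index would only pay off at much larger sizes.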

File Structure

Directory Structure
~/.mem-oracle/
├── config.json           # User configuration
├── metadata.db           # SQLite database
├── cache/                # Fetched content cache
│   └── {hash}.html      
├── vectors/              # Vector embeddings
│   └── {docsetId}/
│       └── {pageId}.json
├── worker.pid            # Worker process ID
└── worker.log            # Worker logs
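For scripts that inspect this state directly, the layout in the tree above can be mirrored with a few path helpers. This is a sketch only: the `paths` object and its member names are hypothetical, and `{hash}`, `{docsetId}`, and `{pageId}` are passed in as plain strings.

```typescript
import { homedir } from "node:os";
import { join } from "node:path";

// Root of all Mem-Oracle state, as shown in the tree above.
const ROOT = join(homedir(), ".mem-oracle");

// Hypothetical helpers mirroring the directory layout; names are illustrative only.
const paths = {
  config: () => join(ROOT, "config.json"),
  metadataDb: () => join(ROOT, "metadata.db"),
  cachedPage: (hash: string) => join(ROOT, "cache", `${hash}.html`),
  vectors: (docsetId: string, pageId: string) => join(ROOT, "vectors", docsetId, `${pageId}.json`),
  workerPid: () => join(ROOT, "worker.pid"),
  workerLog: () => join(ROOT, "worker.log"),
};
```

Using `path.join` rather than string concatenation keeps the helpers correct on Windows, where the separator is a backslash.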
