External Embedding API Support

MCP Memory Service can use external OpenAI-compatible embedding APIs instead of running embedding models locally. This is useful for:

  • Shared infrastructure: Run a single embedding service for multiple MCP instances
  • Resource efficiency: Offload GPU/CPU intensive embedding to dedicated servers
  • Model flexibility: Use embedding models not available in SentenceTransformers
  • Hosted services: Use OpenAI, Cohere, or other embedding APIs

⚠️ Storage Backend Compatibility

External embedding APIs are currently only supported with the sqlite_vec storage backend.

If you're using the hybrid or cloudflare backend, the external API will NOT be used:

  • Hybrid: The SQLite-vec side falls back to local models, while the Cloudflare side uses Workers AI
  • Cloudflare: Always uses Workers AI (@cf/baai/bge-base-en-v1.5)

To use external embedding APIs, configure:

export MCP_MEMORY_STORAGE_BACKEND=sqlite_vec
export MCP_EXTERNAL_EMBEDDING_URL=http://localhost:8890/v1/embeddings

Support for external APIs with Hybrid/Cloudflare backends is planned for a future release.
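A quick preflight check can catch the case where the URL is set but the backend silently ignores it. The following is a minimal sketch, not part of MCP Memory Service: the function name is hypothetical, and it treats an unset backend variable as "not confirmed" rather than guessing the service's default.

```python
from typing import Mapping


def external_embeddings_active(env: Mapping[str, str]) -> bool:
    """Return True only when both documented preconditions are met.

    Hypothetical helper: it mirrors the rule stated above (sqlite_vec
    backend plus a configured URL) and nothing else.
    """
    backend = env.get("MCP_MEMORY_STORAGE_BACKEND", "").strip()
    url = env.get("MCP_EXTERNAL_EMBEDDING_URL", "").strip()
    # Assumption: an unset backend counts as "not confirmed", because this
    # sketch does not know which backend the service picks by default.
    return backend == "sqlite_vec" and bool(url)
```

Passing `os.environ` checks the current shell; passing a plain dict makes the rule easy to test.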

Configuration

Set these environment variables to enable external embeddings:

# Required: API endpoint URL
export MCP_EXTERNAL_EMBEDDING_URL=http://localhost:8890/v1/embeddings

# Optional: Model name (default: nomic-embed-text)
export MCP_EXTERNAL_EMBEDDING_MODEL=nomic-embed-text

# Optional: API key for authenticated endpoints
export MCP_EXTERNAL_EMBEDDING_API_KEY=sk-xxx
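To illustrate how these three variables fit together, here is a minimal sketch of reading them in Python. The class and function names are hypothetical, and sending the key as a Bearer token is an assumption based on the OpenAI convention, not a statement about how the service builds its requests.

```python
import os
from dataclasses import dataclass
from typing import Dict, Mapping, Optional


@dataclass(frozen=True)
class ExternalEmbeddingConfig:
    url: str
    model: str
    api_key: Optional[str]


def load_config(env: Mapping[str, str] = os.environ) -> Optional[ExternalEmbeddingConfig]:
    """Return the external embedding settings, or None if no URL is set."""
    url = env.get("MCP_EXTERNAL_EMBEDDING_URL", "").strip()
    if not url:
        return None  # external embeddings are not enabled
    return ExternalEmbeddingConfig(
        url=url,
        # Default model name as documented above.
        model=env.get("MCP_EXTERNAL_EMBEDDING_MODEL", "nomic-embed-text"),
        api_key=env.get("MCP_EXTERNAL_EMBEDDING_API_KEY") or None,
    )


def build_headers(cfg: ExternalEmbeddingConfig) -> Dict[str, str]:
    """Build HTTP headers; the Bearer scheme is an assumption."""
    headers = {"Content-Type": "application/json"}
    if cfg.api_key:
        headers["Authorization"] = f"Bearer {cfg.api_key}"
    return headers
```

Only the URL is required; the model falls back to the documented default and the Authorization header is omitted when no key is set.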

Supported Backends

vLLM

vLLM provides high-performance inference with an OpenAI-compatible API.

# Start vLLM with an embedding model
vllm serve nomic-ai/nomic-embed-text-v1.5 --port 8890

# Configure MCP Memory Service
export MCP_EXTERNAL_EMBEDDING_URL=http://localhost:8890/v1/embeddings
export MCP_EXTERNAL_EMBEDDING_MODEL=nomic-ai/nomic-embed-text-v1.5

Ollama

Ollama provides easy local model deployment.

# Pull the embedding model (Ollama loads it on first request)
ollama pull nomic-embed-text

# Configure MCP Memory Service
export MCP_EXTERNAL_EMBEDDING_URL=http://localhost:11434/v1/embeddings
export MCP_EXTERNAL_EMBEDDING_MODEL=nomic-embed-text

Text Embeddings Inference (TEI)

TEI is HuggingFace's optimized embedding server.

# Start TEI
docker run --gpus all -p 8080:80 \
  ghcr.io/huggingface/text-embeddings-inference:latest \
  --model-id nomic-ai/nomic-embed-text-v1.5

# Configure MCP Memory Service
export MCP_EXTERNAL_EMBEDDING_URL=http://localhost:8080/v1/embeddings
export MCP_EXTERNAL_EMBEDDING_MODEL=nomic-ai/nomic-embed-text-v1.5

OpenAI

export MCP_EXTERNAL_EMBEDDING_URL=https://api.openai.com/v1/embeddings
export MCP_EXTERNAL_EMBEDDING_MODEL=text-embedding-3-small
export MCP_EXTERNAL_EMBEDDING_API_KEY=sk-xxx

Embedding Dimension Compatibility

⚠️ Important: The embedding dimension must match your database schema.

Model                     Dimensions
nomic-embed-text          768
text-embedding-3-small    1536
text-embedding-3-large    3072
all-MiniLM-L6-v2          384
all-mpnet-base-v2         768

If you're migrating from a local model to an external API (or vice versa), ensure the dimensions match or you'll need to re-embed your memories.
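The check can be done before switching models. The sketch below hard-codes the table above and compares it against the dimension your database already uses; the function name is hypothetical, and how you obtain the stored dimension is left to you.

```python
from typing import Optional

# Dimensions copied from the table above.
MODEL_DIMENSIONS = {
    "nomic-embed-text": 768,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "all-MiniLM-L6-v2": 384,
    "all-mpnet-base-v2": 768,
}


def dimension_mismatch(stored_dim: int, new_model: str) -> Optional[bool]:
    """Return True if new_model's dimension differs from the stored one.

    Returns None when the model is not in the table, so the caller can
    probe the endpoint instead of trusting a guess.
    """
    new_dim = MODEL_DIMENSIONS.get(new_model)
    if new_dim is None:
        return None
    return new_dim != stored_dim
```

For example, `dimension_mismatch(384, "nomic-embed-text")` reports a mismatch, so a database built with all-MiniLM-L6-v2 would need re-embedding. Matching dimensions only avoid the schema error: vectors produced by different models are generally not comparable, so re-embedding is still advisable when the model changes.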

Fallback Behavior

If the external API is unavailable at startup, MCP Memory Service will fall back to local embedding models (ONNX → SentenceTransformer → Hash embeddings).

To require external embeddings without fallback, you can set:

export MCP_MEMORY_USE_ONNX=false  # Disable ONNX fallback
# To disable the SentenceTransformer fallback, don't install sentence-transformers
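The fallback order described above can be pictured as a simple decision chain. This is an illustrative sketch only: the function and its boolean inputs are hypothetical, and it models the documented order rather than the service's actual code.

```python
def select_embedding_backend(
    external_available: bool,
    onnx_enabled: bool,
    sentence_transformers_installed: bool,
) -> str:
    """Pick a backend following the documented order:
    external API, then ONNX, then SentenceTransformer, then hash embeddings.
    """
    if external_available:
        return "external"
    if onnx_enabled:
        return "onnx"
    if sentence_transformers_installed:
        return "sentence_transformer"
    # Last step of the chain as documented.
    return "hash"
```

Under the order as documented, hash embeddings remain the final step even with both local backends disabled; this sketch does not confirm what the service actually does in that case, so check the startup logs to see which backend was selected.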

Performance Considerations

  • Batching: The adapter batches requests (default 32 sentences) for efficiency
  • Caching: Embedding models are cached per API URL + model combination
  • Timeout: Default 30-second timeout per request (configurable)
  • Retry: Currently no automatic retry; failures fall back to local models
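As a rough picture of what batching means here, the sketch below splits a list of sentences into groups of at most 32, one group per API request. The helper name is hypothetical and the adapter's real implementation may differ.

```python
from typing import Iterator, List, Sequence


def iter_batches(sentences: Sequence[str], batch_size: int = 32) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size sentences."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(sentences), batch_size):
        yield list(sentences[start:start + batch_size])
```

With the default size, 70 sentences become three requests of 32, 32, and 6 sentences, so the 30-second timeout applies to each of those requests rather than to the whole set.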

Troubleshooting

Connection refused

ConnectionError: Cannot connect to external embedding API at http://localhost:8890/v1/embeddings
  • Verify the embedding service is running
  • Check the port is correct
  • Ensure no firewall is blocking the connection
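To tell "service not running" apart from "wrong URL", a plain TCP check against the host and port in the URL is often enough. This is a minimal standard-library sketch; both helper names are hypothetical.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit


def endpoint_host_port(url: str) -> Tuple[str, int]:
    """Extract host and port from the endpoint URL, using scheme defaults."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"no host in URL: {url!r}")
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default_port


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `is_port_open(*endpoint_host_port("http://localhost:8890/v1/embeddings"))` returning False means nothing is accepting connections on port 8890, which points to the service or the port number rather than the API path.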

Dimension mismatch

RuntimeError: Dimension mismatch for inserted vector. Expected 768 dimensions but received 384.
  • The external model produces different dimensions than your database
  • Either change the model or migrate your database

Authentication error

ConnectionError: API returned status 401: Unauthorized
  • Set MCP_EXTERNAL_EMBEDDING_API_KEY environment variable
  • Verify the API key is valid

API Compatibility

The adapter expects an OpenAI-compatible /v1/embeddings endpoint:

Request:

{
  "input": ["text to embed", "another text"],
  "model": "model-name"
}

Response:

{
  "data": [
    {"index": 0, "embedding": [0.1, 0.2, ...]},
    {"index": 1, "embedding": [0.3, 0.4, ...]}
  ]
}
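If you are writing your own compatible server or testing one, the request and response shapes above can be exercised with a few lines of Python. This is a sketch with hypothetical helper names; sorting by `index` is a defensive choice in this example, not a documented behavior of the adapter.

```python
import json
from typing import Any, Dict, List, Sequence


def build_request_body(texts: Sequence[str], model: str) -> str:
    """Serialize the request in the shape shown above."""
    return json.dumps({"input": list(texts), "model": model})


def parse_embeddings(response: Dict[str, Any], expected_count: int) -> List[List[float]]:
    """Return embeddings ordered to match the input texts."""
    # Sort by "index" so the result lines up with the inputs even if the
    # server returns items out of order.
    items = sorted(response["data"], key=lambda item: item["index"])
    if len(items) != expected_count:
        raise ValueError(
            f"expected {expected_count} embeddings, got {len(items)}"
        )
    return [item["embedding"] for item in items]
```

A server whose response passes `parse_embeddings` for the inputs you sent matches the response shape shown above.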