External Embedding API Support

MCP Memory Service can use external OpenAI-compatible embedding APIs instead of running embedding models locally. This is useful for:

Shared infrastructure: Run a single embedding service for multiple MCP instances
Resource efficiency: Offload GPU/CPU intensive embedding to dedicated servers
Model flexibility: Use embedding models not available in SentenceTransformers
Hosted services: Use OpenAI, Cohere, or other embedding APIs

⚠️ Storage Backend Compatibility

External embedding APIs are currently only supported with the sqlite_vec storage backend.

If you're using hybrid or cloudflare backends, the external API will NOT be used:

Hybrid: The SQLite-vec side falls back to local models, and the Cloudflare side uses Workers AI
Cloudflare: Always uses Workers AI (@cf/baai/bge-base-en-v1.5)

To use external embedding APIs, configure:

export MCP_MEMORY_STORAGE_BACKEND=sqlite_vec
export MCP_EXTERNAL_EMBEDDING_URL=http://localhost:8890/v1/embeddings

Support for external APIs with Hybrid/Cloudflare backends is planned for a future release.

Configuration

Set these environment variables to enable external embeddings:

# Required: API endpoint URL
export MCP_EXTERNAL_EMBEDDING_URL=http://localhost:8890/v1/embeddings

# Optional: Model name (default: nomic-embed-text)
export MCP_EXTERNAL_EMBEDDING_MODEL=nomic-embed-text

# Optional: API key for authenticated endpoints
export MCP_EXTERNAL_EMBEDDING_API_KEY=sk-xxx
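
For readers who want to see how these three variables relate, here is a minimal Python sketch of reading them. The `ExternalEmbeddingConfig` class and `load_external_embedding_config` function are hypothetical names used for illustration only, not the service's actual code; the default model name is the one documented above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ExternalEmbeddingConfig:
    """Hypothetical container for the three documented settings."""
    url: str
    model: str
    api_key: Optional[str]


def load_external_embedding_config(
    env: Mapping[str, str] = os.environ,
) -> Optional[ExternalEmbeddingConfig]:
    """Return the external embedding settings, or None if no URL is set."""
    url = env.get("MCP_EXTERNAL_EMBEDDING_URL", "").strip()
    if not url:
        # The URL is the only required variable; without it nothing is enabled.
        return None
    return ExternalEmbeddingConfig(
        url=url,
        # Documented default model name.
        model=env.get("MCP_EXTERNAL_EMBEDDING_MODEL", "nomic-embed-text"),
        # An empty or missing key is treated as "no authentication".
        api_key=env.get("MCP_EXTERNAL_EMBEDDING_API_KEY") or None,
    )
```

Calling `load_external_embedding_config()` with no arguments reads the current process environment; passing a plain dict makes the behavior easy to check in isolation.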

Supported Backends

vLLM

vLLM provides high-performance inference with an OpenAI-compatible API.

# Start vLLM with an embedding model
vllm serve nomic-ai/nomic-embed-text-v1.5 --port 8890

# Configure MCP Memory Service
export MCP_EXTERNAL_EMBEDDING_URL=http://localhost:8890/v1/embeddings
export MCP_EXTERNAL_EMBEDDING_MODEL=nomic-ai/nomic-embed-text-v1.5

Ollama

Ollama provides easy local model deployment.

# Pull the embedding model
ollama pull nomic-embed-text

# Configure MCP Memory Service
export MCP_EXTERNAL_EMBEDDING_URL=http://localhost:11434/v1/embeddings
export MCP_EXTERNAL_EMBEDDING_MODEL=nomic-embed-text

Text Embeddings Inference (TEI)

TEI is Hugging Face's optimized embedding server.

# Start TEI
docker run --gpus all -p 8080:80 \
  ghcr.io/huggingface/text-embeddings-inference:latest \
  --model-id nomic-ai/nomic-embed-text-v1.5

# Configure MCP Memory Service
export MCP_EXTERNAL_EMBEDDING_URL=http://localhost:8080/v1/embeddings
export MCP_EXTERNAL_EMBEDDING_MODEL=nomic-ai/nomic-embed-text-v1.5

OpenAI

export MCP_EXTERNAL_EMBEDDING_URL=https://api.openai.com/v1/embeddings
export MCP_EXTERNAL_EMBEDDING_MODEL=text-embedding-3-small
export MCP_EXTERNAL_EMBEDDING_API_KEY=sk-xxx

Embedding Dimension Compatibility

⚠️ Important: The embedding dimension must match your database schema.

Model	Dimensions
nomic-embed-text	768
text-embedding-3-small	1536
text-embedding-3-large	3072
all-MiniLM-L6-v2	384
all-mpnet-base-v2	768

If you're migrating from a local model to an external API (or vice versa), ensure the dimensions match or you'll need to re-embed your memories.
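
Before switching models, it can help to compare dimensions explicitly. The sketch below copies the values from the table above; `KNOWN_DIMENSIONS`, `check_dimension`, and `same_dimension` are hypothetical helpers for illustration, not part of the service.

```python
from typing import Sequence

# Dimensions copied from the table above.
KNOWN_DIMENSIONS = {
    "nomic-embed-text": 768,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "all-MiniLM-L6-v2": 384,
    "all-mpnet-base-v2": 768,
}


def check_dimension(embedding: Sequence[float], expected_dim: int) -> None:
    """Raise ValueError if the vector length differs from the schema dimension."""
    actual = len(embedding)
    if actual != expected_dim:
        raise ValueError(
            f"Embedding has {actual} dimensions but the database expects "
            f"{expected_dim}; switch models or re-embed existing memories."
        )


def same_dimension(old_model: str, new_model: str) -> bool:
    """Return True if both listed models produce vectors of the same length."""
    return KNOWN_DIMENSIONS[old_model] == KNOWN_DIMENSIONS[new_model]
```

Matching dimensions only mean the vectors fit the schema. Vectors from different models are generally not comparable with each other, so re-embedding may still be worth considering when the model changes.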

Fallback Behavior

If the external API is unavailable at startup, MCP Memory Service will fall back to local embedding models (ONNX → SentenceTransformer → Hash embeddings).

To require external embeddings without fallback, you can set:

export MCP_MEMORY_USE_ONNX=false  # Disable ONNX fallback
# To disable the SentenceTransformer fallback, leave sentence-transformers uninstalled
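
The fallback order described in this section can be pictured as trying providers in sequence until one initializes. The sketch below is only an illustration under that assumption; `select_embedder` and the factory callables are hypothetical and do not reflect the service's internal code.

```python
from typing import Callable, List, Sequence, Tuple

# An embedder maps a list of texts to a list of vectors.
Embedder = Callable[[Sequence[str]], List[List[float]]]


def select_embedder(
    candidates: Sequence[Tuple[str, Callable[[], Embedder]]],
) -> Tuple[str, Embedder]:
    """Return the first provider whose factory succeeds, with its name.

    `candidates` is an ordered list such as
    [("external", ...), ("onnx", ...), ("sentence-transformers", ...), ("hash", ...)].
    """
    errors = []
    for name, factory in candidates:
        try:
            return name, factory()
        except Exception as exc:  # Any startup failure moves on to the next provider.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("No embedding provider available: " + "; ".join(errors))
```

Removing a provider from the candidate list has the same effect as disabling that fallback: if every remaining provider fails, the error surfaces instead of being silently masked.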

Performance Considerations

Batching: The adapter batches requests (default 32 sentences) for efficiency
Caching: Embedding models are cached per API URL + model combination
Timeout: Default 30-second timeout per request (configurable)
Retry: Currently no automatic retry; failures fall back to local models
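
To illustrate the batching behavior, here is a minimal sketch that splits a list of texts into chunks of at most 32 items, one chunk per API request. The `iter_batches` helper is a hypothetical name; the adapter's own batching code may differ.

```python
from typing import Iterator, List, Sequence


def iter_batches(texts: Sequence[str], batch_size: int = 32) -> Iterator[List[str]]:
    """Yield consecutive slices of `texts`, each with at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield list(texts[start:start + batch_size])
```

With the default size, 70 texts would be sent as three requests: two full batches and one partial batch.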

Troubleshooting

Connection refused

ConnectionError: Cannot connect to external embedding API at http://localhost:8890/v1/embeddings

Verify the embedding service is running
Check that the port is correct
Ensure no firewall is blocking the connection
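
One way to run the checks above is a plain TCP probe against the host and port in the configured URL. This is a diagnostic sketch with hypothetical helper names (`endpoint_address`, `can_connect`); it only confirms that something is listening, not that the endpoint speaks the embeddings API.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit


def endpoint_address(url: str) -> Tuple[str, int]:
    """Extract (host, port) from a URL, using the scheme's default port if none is given."""
    parts = urlsplit(url)
    if not parts.hostname:
        raise ValueError(f"URL has no host: {url!r}")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def can_connect(url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    try:
        with socket.create_connection(endpoint_address(url), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and DNS failures.
        return False
```

If `can_connect("http://localhost:8890/v1/embeddings")` returns False, the service is not reachable on that port from the current machine.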

Dimension mismatch

RuntimeError: Dimension mismatch for inserted vector. Expected 768 dimensions but received 384.

The external model produces different dimensions than your database
Either change the model or migrate your database

Authentication error

ConnectionError: API returned status 401: Unauthorized

Set the MCP_EXTERNAL_EMBEDDING_API_KEY environment variable
Verify the API key is valid

API Compatibility

The adapter expects an OpenAI-compatible /v1/embeddings endpoint:

Request:

{
  "input": ["text to embed", "another text"],
  "model": "model-name"
}

Response:

{
  "data": [
    {"index": 0, "embedding": [0.1, 0.2, ...]},
    {"index": 1, "embedding": [0.3, 0.4, ...]}
  ]
}
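
To check an endpoint against this contract independently of the service, a small standard-library client along these lines may help. It is a sketch under stated assumptions: the function names are hypothetical, the `Authorization: Bearer` header follows the OpenAI convention rather than anything confirmed about the adapter, and the 30-second timeout mirrors the documented default.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional, Sequence


def build_request(
    url: str, texts: Sequence[str], model: str, api_key: Optional[str] = None
) -> urllib.request.Request:
    """Build a POST request with the JSON body shown in the request example."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumption: bearer-token authentication, as used by the OpenAI API.
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps({"input": list(texts), "model": model}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def parse_embeddings(payload: Dict[str, Any]) -> List[List[float]]:
    """Return the vectors ordered by their `index` field."""
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(
    url: str,
    texts: Sequence[str],
    model: str,
    api_key: Optional[str] = None,
    timeout: float = 30.0,
) -> List[List[float]]:
    """Send one embeddings request and return one vector per input text."""
    request = build_request(url, texts, model, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_embeddings(json.load(response))
```

Sorting by `index` guards against servers that return items out of order, so the result lines up with the input list.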
