Polyglot inference library for fully offline, text-only embedding and chat generation. It targets CPU-only Linux, as well as Windows and ARM64.
This repository hosts implementations in multiple languages. Java comes first; Go follows. Both implementations produce wire-compatible artifacts and identical observable behavior.
| Language | Status | Path |
|---|---|---|
| Java | in development (Phase 1) | java/ |
| Go | planned | go/ |
Phase 1 is library-only: embedding via ONNX Runtime with bge-small-en-v1.5, and chat generation via a forked llama.cpp Java binding with Qwen 2.5-0.5B-Instruct (the default model). The HTTP/OpenAI-compatible layer is Phase 2.
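As a rough illustration of the embedding path, the sketch below shows the post-processing commonly applied to bge-style model output: take the hidden state of the first (CLS) token, then L2-normalize it so that a dot product between two embeddings equals their cosine similarity. This is a minimal sketch under assumptions, not this library's API: the class name `EmbeddingPostProcess` and its methods are hypothetical, and the ONNX Runtime call that would produce the per-token hidden states is deliberately omitted.

```java
// Hypothetical helper for illustration only; not part of this repository's API.
public final class EmbeddingPostProcess {
    private EmbeddingPostProcess() {}

    /**
     * CLS pooling: return a copy of the first token's hidden state.
     * tokenStates is assumed to be shaped [tokens][hiddenSize].
     */
    public static float[] clsPool(float[][] tokenStates) {
        if (tokenStates == null || tokenStates.length == 0) {
            throw new IllegalArgumentException("expected at least one token");
        }
        return tokenStates[0].clone();
    }

    /** Scale a vector to unit L2 norm; an all-zero vector is returned as zeros. */
    public static float[] l2Normalize(float[] v) {
        double sumSq = 0.0;
        for (float x : v) {
            sumSq += (double) x * x;
        }
        double norm = Math.sqrt(sumSq);
        float[] out = new float[v.length];
        if (norm == 0.0) {
            return out;
        }
        for (int i = 0; i < v.length; i++) {
            out[i] = (float) (v[i] / norm);
        }
        return out;
    }

    public static void main(String[] args) {
        // Toy input: two tokens with a hidden size of 2 (real models use far more).
        float[][] states = { {3f, 4f}, {1f, 1f} };
        float[] embedding = l2Normalize(clsPool(states));
        System.out.println(embedding[0] + " " + embedding[1]);
    }
}
```

Keeping this step in plain Java, separate from the model runtime, is one way to make it easy to reproduce identically in a second language implementation.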
- docs/ARCHITECTURE.md: cross-language design
- docs/WIRE_FORMAT.md: JSON shapes shared across languages
- docs/MODEL_REGISTRY.md: canonical model IDs
- java/: Java implementation
Apache 2.0; see LICENSE.