Contact

Contact HaxiTAG for enterprise services, consulting, and product trials.

Friday, July 3, 2026

No More AI “Hallucinating with Confidence”: Enterprise-Grade Knowledge Computation Engine Yueli KGM Is Officially Open Source

Large language models are undeniably smart, but why do they still “hallucinate” at the most critical moments?

A risk-control model spits out a plausible regulation that simply doesn’t exist. An internal knowledge Q&A returns a “close enough” but inaccurate compliance explanation. A technical document search pulls in an outdated version without warning…

This is not about the model being subpar — it’s about model inference lacking factual grounding.

Today, we are officially releasing the Yueli Knowledge Computation Engine (Yueli KGM Computing) — a dynamic scheduling middleware that combines a self-hosted inference orchestration layer with a compatibility gateway, open-sourced under the MIT license on GitHub. Its mission: to provide a deterministic knowledge anchor for trustworthy LLM reasoning.


Core Proposition: Boundaries First, Uniqueness Second

Yueli KGM Computing (hereafter KGM) is not yet another LLM application framework, nor is it meant to replace the vLLM or LangChain you are already using. Its positioning is crystal clear:

Self-hosted native inference + inference orchestration and compatibility gateway. You can self-deploy models suitable for enterprise private deployment scenarios and perform local inference; you can also orchestrate a mix of local inference and cloud MaaS services, with KGM acting as the gateway and orchestration layer, offloading the primary compute to external inference services. Furthermore, you can extend and develop on top of the open-source codebase to define your own enterprise scheduling routes and workflows.

In one sentence:

yueli-kgm-computing is the knowledge infrastructure layer for enterprise AI applications, making LLMs more trustworthy and reliable.

KGM is purpose-built for the following scenarios:

  • Enterprise intelligent application development and algorithm service foundation
  • Structured extraction of private enterprise data and automated knowledge graph construction
  • Anchoring LLM inference to knowledge graph fact nodes (reducing hallucinations, increasing traceability)
  • Unified semantic computation for multimodal content
  • Providing standardized knowledge APIs for enterprise AI applications, supporting full-stack private deployment
  • Unified encapsulation, offering data audit cost control and data security assurance

What’s Delivered: Four Clear Capability Lines

Once installed, @haxitag/yueli-kgm-computing gives you four things:

① Dual-Protocol HTTP Surface

Within the same process, two API sets are exposed simultaneously: OpenAI-compatible (/v1/chat/completions) and Anthropic-compatible (/v1/messages). Tool semantics are bidirectionally mapped — OpenAI’s tool_calls and Anthropic’s tool_use are automatically converted at the gateway layer.

For the business side, this means: one Base URL, two industry protocols, and zero client-side awareness of upstream differences.

② KGM Extension: Orchestration Knobs on the Same Request Body

In a standard OpenAI/Anthropic request body, you can optionally carry a kgm field that serves as a “progressive enhancement switch”. When omitted, KGM operates in passthrough mode (directly proxying the upstream SSE). When orchestration signals are present, it automatically switches to bridge streaming mode (KGM assembles SSE segments, injecting intermediate semantics from knowledge graphs, retrieval, tools, etc.).

This is a diversion approach, not an either/or choice — traffic that doesn’t need orchestration pays no orchestration cost.

③ Managed Runtime Control Plane

Artifact pulling, runtime lifecycle management, and inference-related metrics are all brought under unified management. KGM knows where each model artifact is, what state it’s in, and which runtime it runs on — you get an operable, observable control plane, not just a forwarding router.

④ In-Process Native Inference Engine

KGM includes its own NativeRuntimeEngine, capable of performing tensor forward pass and decoding within the same process. It’s important to note that this is fundamentally different from “replacing a vLLM cluster.” KGM honestly documents a four-tier capability boundary (A/B/C/D) in its docs, clearly indicating which paths are production-suitable and which are for regression validation.

Architects and developers: first connect an external engine to establish a passthrough baseline, then evaluate whether you need the in-process Native engine for target models.


Orchestration Core: How Cognitive Augmentation Works

KGM’s main execution pipeline breaks down “cognitive augmentation” into configurable, observable modules:

Context Management: runs memory retrieval, graph queries, and conversation history retrieval in parallel. Stable parts are cached, dynamic parts are incrementally updated — highly effective in multi-turn dialog scenarios.

Memory Management: a separated short-term/long-term memory system, written and retrieved via API, implicitly triggered within the ContextBuilder path.

Knowledge Graph Augmentation: triggered via kgm.graph.enabled=true, injects graph sub-query results into the context before inference so that retrieval results carry “contextual relationships” rather than relying solely on similarity.

Tool Orchestration: server-side multi-turn execution compatible, parses intent and executes tool calls, with responses carrying an audit trail. It also supports delegating tool execution to an external sandbox, enabling a “tool gating” design.


Multi-Provider Access: Unified Management of 30+ Mainstream LLM Providers

KGM covers 30+ mainstream LLM providers through LlmProviderFactory — from OpenAI, DeepSeek, Anthropic Claude, and Google Gemini to Alibaba Cloud Bailian, Volcano Ark, Zhipu GLM, Baidu Qianfan, and on-premise options like Ollama, vLLM, SGLang, and LM Studio. Switching is done with a single environment variable.

When multi-routing strategy is enabled, you can implement auditable routing rules through declarative JSON configuration — for instance, “sensitive tasks → intranet Ollama / complex reasoning → vLLM / long-context → OpenRouter.”


Production Deployment: Enterprise-Grade Engineering Reliability

KGM already delivers production-grade capabilities:

  • Structured Logging: JSON format, automatic sensitive data masking
  • Unified Error Handling: custom error types, stack traces hidden in production
  • Circuit Breaker: circuit breaker pattern for external service calls, with monitorable state
  • Database: SQLite for development, PostgreSQL recommended for production
  • Observability: Prometheus-compatible metrics (latency, time-to-first-token, tokens per second, KV cache memory usage, queue depth, etc.)
  • Graceful Shutdown: SIGTERM/SIGINT handling, completing in-flight requests

A minimal production startup takes just 5 minutes, and a Web Playground is included for managing skills, MCP connectors, and output templates.


Division of Labor with the Open-Source Ecosystem: Not Competition, but Layering

The most common question from technical teams — “I’m already using LangChain/LlamaIndex/vLLM. Do I still need KGM?” — has a clear answer: they operate at different layers.

DimensionLangChainLlamaIndexvLLMYueli KGM
Primary PositioningLLM app frameworkData retrieval frameworkHigh-perf inference engineInference orchestration + compatibility gateway
Unified Dual ProtocolRequires DIYRequires DIYNot providedNative dual protocol, bidirectional tool mapping
Diversion DesignNoneNoneN/ANative diversion — no orchestration, no added cost
Managed Control PlaneNoneNoneStandalone serviceNative control plane
Knowledge Graph-constrained ReasoningNeeds custom integrationBasic supportNoneNative KGM, deep integration

Recommended composition patterns:

  • vLLM/SGLang as the compute backbone + KGM as the protocol compatibility and orchestration layer
  • LangChain/LlamaIndex as the application logic layer + KGM as the underlying unified HTTP entry point
  • Dify or BotFactory as the low-code workflow layer + KGM for model routing and key management

Uniqueness with Engineering Honesty: Six Points, Fully Documented

Summarized from the repository’s capabilities.md:

Single self-hosted surface unifying two industry protocols: reduces dual-stack maintenance costs

“Diversion” rather than “either/or”: passthrough for non-orchestrated requests avoids unnecessary traffic rewriting

KGM extension as a progressive switch: supports an integration path of “proxy first, augment later”

Managed Runtime + cross-format recognition: oriented toward model asset governance, not just HTTP pass-through

Honest Native layered narrative (A→D): reduces the industry misconception that “parsing a config means it can run”

Operable: Prometheus metrics and automatic route auditing give the gateway layer SRE-grade observability


Value Propositions for Different Audiences

Enterprise IT Decision-Makers and Architects

KGM doesn’t solve “make the model smarter” — it solves “make enterprise AI infrastructure governable, auditable, and replaceable.”

Any LLM provider can be switched via an environment variable. Any business application only needs to interface with KGM’s unified API. This is an engineering path to reduce vendor lock-in risk and build an evolvable AI infrastructure.

Suggested evaluation path: deploy KGM for a single scenario (internal knowledge Q&A or API unification), establish a /metrics and passthrough baseline, verify observability and routing capabilities, and then decide whether to enable KGM extensions for orchestration-enhanced phases.

Enterprise Service Technical Teams and Software Engineers

KGM’s core engineering design philosophy: configuration-driven, not code-driven. Multi-provider routing is a JSON rule. Skills and MCP connectors are Playground configurations. A great deal of “dirty work” is abstracted into declarative configurations, eliminating the need to reinvent the wheel for every project.

MaaS Providers and Cloud Compute Vendors

KGM’s ProviderType registry already covers 30+ vendors — zero-cost out-of-the-box integration. KGM’s declarative routing, key management, circuit breaker, and Prometheus metrics mean customers can manage multi-cloud compute allocation within a single observable control plane.

As a middleware layer for protocol compatibility, orchestration routing, and control plane, KGM is open-sourced under MIT, available as an NPM package and source code. It supports low-complexity integration across different tech stacks including Golang, Python, and Rust, with unrestricted production deployment and modification rights.

The enterprise service solutions built by the HaxiTAG team are also integrated on top of yueli-kgm-computing. We welcome peers, partners, and talented developers to build upon KGM for private deployments, integration services, and industry solutions.


Final Words

Integrating AI into enterprise applications and production systems is no longer about “who has the strongest model,” but “who can make models work stably, trustworthily, and auditably in real business scenarios.”

yueli-kgm-computing’s answer: use the determinism of knowledge graphs to constrain the probabilistic nature of large language models.

This is not a minor technical patch — it’s the essential path for enterprise AI to move from “the lab” to “production.”

Related topic: