Malicious GGUF models could give attackers RCE on SGLang inference servers


A critical flaw in SGLang could let attackers turn a standard GGUF model file into a remote code execution path on AI inference servers. The bug, tracked as CVE-2026-5760, affects SGLang’s /v1/rerank endpoint and can execute attacker-controlled Python code when the server loads a poisoned model and processes a rerank request.

The issue matters because it shifts risk from prompts to model files themselves. An attacker does not need direct shell access to the target server. They only need a victim to load a malicious model into SGLang, whether through a manual admin action or an automated deployment workflow pulling models from a public source.

As more companies deploy open source models in production, this bug highlights a growing supply-chain problem in AI infrastructure. If a model artifact can carry executable abuse logic inside metadata, then “download and serve” becomes a much riskier workflow than many teams assume.

How the vulnerability works

CERT/CC says the flaw sits in SGLang’s reranking endpoint at /v1/rerank. The vulnerable path renders a model-supplied tokenizer.chat_template, and that template can contain a Jinja2 server-side template injection payload. When the rerank endpoint processes it, the payload runs in the context of the SGLang service.

The root cause is unsandboxed template rendering. CERT/CC says SGLang uses jinja2.Environment() in its get_jinja_env() path instead of a sandboxed alternative, which allows arbitrary Python execution during template rendering. The GitHub advisory describes the same issue and ties it directly to model-supplied chat templates.
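The difference between the two environments can be shown outside SGLang. The sketch below is not SGLang's actual code; it renders the same server-side template injection probe with a default jinja2.Environment and with the sandboxed class CERT/CC recommends. The probe walks from a template object into Python module internals, which the sandbox refuses:

```python
from jinja2 import Environment
from jinja2.exceptions import SecurityError
from jinja2.sandbox import ImmutableSandboxedEnvironment

# A classic Jinja2 SSTI probe: walk from a template-level object into Python
# internals via __init__.__globals__. A real payload would continue from here
# to os-level calls; the probe alone is enough to show the difference.
PROBE = "{{ self.__init__.__globals__ }}"

# Default environment: the attribute walk succeeds and module globals leak
# into the rendered output.
leaked = Environment().from_string(PROBE).render()

# Sandboxed environment: underscore-prefixed attribute access is treated as
# unsafe, and rendering fails with a SecurityError instead of leaking internals.
blocked = False
try:
    ImmutableSandboxedEnvironment().from_string(PROBE).render()
except SecurityError:
    blocked = True

print("plain env leaked internals:", "jinja2" in leaked)
print("sandbox blocked the probe:", blocked)
```

A working exploit extends the same attribute walk until it reaches something callable, which is why the sandbox, rather than payload filtering, is the recommended fix.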

That makes this a server-side template injection problem with infrastructure-level consequences. Once triggered, the payload can run operating system commands, access local data, and, depending on the privileges the SGLang process holds, open the door to lateral movement, data theft, or service disruption.

Why GGUF model files become the delivery path

The attack chain is straightforward. A malicious actor creates a GGUF model with a crafted tokenizer.chat_template, adds a trigger phrase that routes the request through the vulnerable reranker logic, and waits for a victim to load the model into SGLang. When any request later hits /v1/rerank, the server renders the template and executes the payload.

The proof-of-concept published by Stuart Beck shows exactly that pattern. Its README says the exploit uses a malicious GGUF file, a trigger phrase for SGLang’s Qwen3 reranker handling, and a Jinja2 escape path that reaches os.popen() to run arbitrary OS commands on the inference host.

This does not mean every GGUF model is dangerous. The real risk appears when teams load untrusted or weakly vetted models into a vulnerable SGLang deployment and expose the rerank functionality to users or applications that can trigger the bad template path.

Why this is more than an AI app bug

The vulnerability sits inside model-serving infrastructure, not only inside an application feature. That changes the blast radius. If the model server runs with access to internal data, service credentials, GPU nodes, or shared storage, a poisoned model can become a stepping stone to a broader compromise. That reading follows from CERT/CC’s impact statement, which says successful exploitation could lead to host compromise, lateral movement, data exfiltration, or denial of service.

The bug also fits a pattern security teams have started to watch more closely. Stuart Beck’s PoC notes that CVE-2026-5760 falls into the same general vulnerability class as the earlier “Llama Drama” issue in llama-cpp-python, where unsafe template handling also exposed model-serving environments.

That is why this finding matters even beyond SGLang. It shows that model metadata, templates, and serving frameworks can become a practical attack surface when developers treat model files as passive content rather than partially trusted code and configuration.

At a glance

CVE: CVE-2026-5760
Affected component: SGLang /v1/rerank endpoint
Root cause: Unsandboxed Jinja2 template rendering
Delivery method: Malicious GGUF model file with a crafted tokenizer.chat_template
Attack trigger: Request to the rerank endpoint after the poisoned model is loaded
Potential impact: Arbitrary code execution, host compromise, lateral movement, data theft, denial of service

What defenders should do now

The first priority is to stop loading untrusted models into vulnerable SGLang environments. Teams that automatically pull models from public repositories should treat this as a supply-chain risk and review whether those pipelines allow unverified GGUF artifacts into production.

CERT/CC recommends using ImmutableSandboxedEnvironment instead of jinja2.Environment() when rendering chat templates. That is the clearest mitigation published so far. CERT/CC also notes that it did not receive a response or patch from the project maintainers during coordination, so defenders should not assume a vendor fix was already available at publication time.
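In practice, CERT/CC’s recommendation amounts to swapping the environment class. A hypothetical patched factory is sketched below; the name get_jinja_env comes from the advisory, but the options shown are illustrative, not SGLang’s actual configuration:

```python
from jinja2.sandbox import ImmutableSandboxedEnvironment

def get_jinja_env() -> ImmutableSandboxedEnvironment:
    # Build chat-template environments from the immutable sandbox so that
    # model-supplied templates cannot reach Python internals or mutate
    # objects during rendering. The keyword options here are illustrative.
    return ImmutableSandboxedEnvironment(trim_blocks=True, lstrip_blocks=True)

# Benign chat templates still render normally under the sandbox.
env = get_jinja_env()
rendered = env.from_string("Hello, {{ name }}!").render(name="world")
print(rendered)
```

The sandbox restricts attribute access during rendering rather than parsing, so legitimate templates that only format messages and variables keep working unchanged.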

Organizations should also isolate model-serving infrastructure more aggressively. In practice, that means running inference services with minimal privileges, limiting network reach, avoiding broad secrets exposure, and separating experimental model pipelines from production-facing inference servers. This guidance follows from the RCE impact described by CERT/CC and the supply-chain delivery path shown in the PoC.

  • Stop loading unverified GGUF models into SGLang deployments.
  • Assume /v1/rerank is high risk if the environment can load third-party models.
  • Replace unsandboxed Jinja2 rendering with ImmutableSandboxedEnvironment.
  • Restrict inference server privileges, outbound connectivity, and access to sensitive secrets or internal systems.
  • Review automated model-ingestion workflows for trust, provenance, and approval controls.
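For teams that cannot patch or re-deploy immediately, a coarse pre-load check can flag chat templates that reference Python internals before a model is served. The sketch below is an illustrative heuristic, not a complete defense; a determined attacker can obfuscate payloads, and sandboxed rendering remains the real fix:

```python
import re

# Substrings that legitimate chat templates have no reason to contain but
# that Jinja2 SSTI payloads commonly rely on. Heuristic only: absence of a
# match does not prove a template is safe.
SUSPICIOUS = re.compile(
    r"__(?:globals|subclasses|builtins|import|init|class|mro)__"
    r"|os\.popen|subprocess|getattr\s*\("
)

def template_looks_suspicious(chat_template: str) -> bool:
    """Return True if a model-supplied chat template matches known SSTI markers."""
    return SUSPICIOUS.search(chat_template) is not None

# A normal message-formatting template passes; an escape-style probe is flagged.
benign = "{% for m in messages %}{{ m.role }}: {{ m.content }}\n{% endfor %}"
probe = "{{ self.__init__.__globals__['os'].popen('id').read() }}"
print(template_looks_suspicious(benign), template_looks_suspicious(probe))
```

A check like this fits naturally into a model-ingestion pipeline as one gate among several, alongside provenance and approval controls.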

FAQ

What is CVE-2026-5760?

CVE-2026-5760 is a remote code execution flaw in SGLang’s reranking endpoint. It allows a malicious model file to execute arbitrary Python code through an unsafe Jinja2 chat template rendering path.

Does an attacker need direct access to the server?

Not necessarily. The attack can work if a victim loads a malicious GGUF model into SGLang and later triggers the vulnerable /v1/rerank path.

Are public model repositories part of the risk?

Yes. CERT/CC says the attacker can create a malicious model for SGLang, and the PoC describes a scenario where the victim downloads and loads the model from a public source such as Hugging Face.

Is there a published fix?

CERT/CC recommends switching to ImmutableSandboxedEnvironment for chat template rendering. At publication, CERT/CC said it had not received a maintainer response or patch during coordination.
