Kali Linux shows how to run AI-assisted pentesting fully offline with Ollama, 5ire, and MCP Kali Server

Home » News

Yash

News

4 min. read

Published on March 11, 2026

Kali Linux has published a new guide that shows how to run AI-assisted penetration testing fully offline, without sending prompts, targets, or tool output to a cloud service. The setup uses a local GPU, Ollama for model serving, MCP Kali Server for tool access, and 5ire as the desktop client that connects everything together.

That matters for red teams, lab environments, and anyone handling sensitive testing data. Kali’s guide makes the case that local inference can reduce privacy and operational security concerns because the model, the tool bridge, and the interface all stay on the same machine.

BEST SPRING 2026 DEALS

Editor's Choice

Private Internet Access

Access content across the globe at the highest speed rate.

70% of our readers choose Private Internet Access

70% of our readers choose ExpressVPN

ExpressVPN

Browse the web from multiple devices with industry-standard security protocols.

Nord VPN

Faster dedicated servers for specific actions (currently at summer discounts)

Kali’s test system used an NVIDIA GeForce GTX 1060 with 6 GB of VRAM. The team switched to NVIDIA’s proprietary driver stack so CUDA could handle inference, and the guide shows nvidia-smi reporting driver version 550.163.01 and CUDA 12.4 after setup.

The local model layer runs through Ollama. Kali describes Ollama as a wrapper for llama.cpp, and the guide uses it to pull and serve models small enough to fit inside the system’s VRAM budget. The team tested llama3.1:8b, llama3.2:3b, and qwen3:4b, all chosen because they support tools, which is required for MCP-driven command execution.

The security tooling side comes from mcp-kali-server, which Kali says is already available in its repositories. In the guide, the package exposes a local Flask-based API on 127.0.0.1:5000 and checks for tools such as nmap, gobuster, dirb, and nikto before the MCP client connects.

Because Ollama does not natively support MCP, Kali uses 5ire as the bridge. The guide installs 5ire as an AppImage, enables Ollama as the provider, turns on tool support for each model, and registers /usr/bin/mcp-server as the local tool endpoint inside the GUI.

Kali then validates the full stack with a simple real-world test. Using qwen3:4b, the assistant receives a natural-language prompt asking for a TCP scan of scanme.nmap.org on ports 80, 443, 21, and 22, then invokes nmap through the MCP chain and returns structured results while ollama ps shows GPU-backed processing.

How the offline Kali AI setup works

Component	Role in the stack
NVIDIA GPU + CUDA	Runs local model inference
Ollama	Serves local LLMs
Supported models	`llama3.1:8b`, `llama3.2:3b`, `qwen3:4b`
MCP Kali Server	Exposes pentesting tools through a local API
5ire	Desktop client that connects Ollama to MCP tools

What Kali actually installed

NVIDIA proprietary drivers for CUDA-backed inference
Ollama as a persistent systemd service
Local models with tool support enabled
mcp-kali-server plus tools such as nmap, dirb, gobuster, nikto, hydra, john, sqlmap, and wpscan
5ire AppImage configured to use Ollama and /usr/bin/mcp-server

Why this matters for pentesting teams

Kali is not pitching this as a magic autopwn button. The guide shows a practical way to let a local language model interpret natural-language requests and trigger standard security tools on the same machine. That can help with repetitive tasks, lab work, internal testing, and training environments where sending data to outside AI services is not acceptable. This is an inference from Kali’s setup and validation flow.

The hardware limit also matters. Kali chose models that fit a 6 GB VRAM card, which means the setup is accessible on older consumer hardware, but it also means teams need to manage expectations around model size and speed. The tradeoff here is clear: lower recurring service risk in exchange for local hardware constraints. This is an inference based on the published hardware and model sizes in the guide.

Key takeaways

Kali’s new guide shows a fully local AI-assisted pentesting workflow with no cloud dependency.
The stack relies on Ollama for local models, MCP Kali Server for tool access, and 5ire as the MCP-capable desktop client.
Kali tested the setup on an NVIDIA GTX 1060 6 GB system with CUDA enabled.
The published demo used natural language to trigger an nmap scan against scanme.nmap.org.

FAQ

What is new in Kali Linux’s latest AI security guide?

Kali’s latest guide focuses on running the full LLM-assisted testing stack locally with Ollama, 5ire, and MCP Kali Server, instead of relying on cloud AI services.

Does this setup require an NVIDIA GPU?

Kali’s published walkthrough uses an NVIDIA GPU with CUDA support, specifically a GeForce GTX 1060 6 GB. The guide presents that as the reference hardware for local inference.

What models did Kali test?

The guide tested llama3.1:8b, llama3.2:3b, and qwen3:4b. Kali says tool support was a required feature for model selection.

What does MCP Kali Server do?

It acts as the bridge between the language model workflow and Kali’s local security tools by exposing them through a local API and MCP server process.

Yash

I am a Business Analytics student with a strong interest in publishing well-researched and data-driven news articles. I focus on analyzing trends in business, finance, and technology to create clear, accurate, and engaging content for readers. I enjoy transforming complex data and information into simple, meaningful stories that help audiences understand current developments. With analytical thinking and attention to detail, I aim to deliver credible and insightful news that adds real value to readers.

Readers help support VPNCentral. We may get a commission if you buy through our links.

Improve this guide

User forum

0 messages

Sort by:

How the offline Kali AI setup works

What Kali actually installed

Why this matters for pentesting teams

Key takeaways

FAQ

Leave a Reply Cancel reply