Building Nemesis AI Agents: A Tale of Many (Inter)faces
- stelladelpilar
When we set out to integrate AI capabilities into Nemesis, our Breach and Attack Simulation platform, we faced a common challenge: how do you build a flexible AI agent that supports multiple LLM backends (OpenAI, Claude/Anthropic, and future models) while exposing them through different user interfaces tailored to distinct workflows?
The answer lay in creating a clean separation of concerns with two layers of abstraction: a backend interface that normalizes different AI providers, and dual UI interfaces that serve different user needs.
The Architecture: One Brain, Two Faces
At the core sits NemesisAssistant, our AI agent wrapper that abstracts away the complexity of different LLM backends (OpenAI/AutoGen and Claude Agent SDK). Think of it as a universal translator—it speaks the same language to our applications regardless of which AI model powers it underneath.
This assistant connects to our security platform through the Model Context Protocol (MCP), giving it access to:
- Real-time assessment data and results
- MITRE ATT&CK technique coverage analysis
- Threat intelligence and detection gaps
- Compliance reporting capabilities
The key insight? Security teams don't want to write queries or navigate dashboards when under pressure. They want to ask: "Which credential theft techniques are we vulnerable to?" or "Show me prevention gaps from last week's ransomware simulation."
Real-Time Chatbot Interface
Our chatbot interface handles high-concurrency, ephemeral connections—perfect for embedded chat widgets or mobile apps where users expect instant responses. This interface is embedded directly into the Nemesis platform, allowing analysts to access AI assistance without leaving their current workflow. They can ask questions about the assessment they're viewing, request analysis of results on screen, or get guidance on configurations—all within the same browser context.
WebSocket architecture for real-time interaction:
We expose a single WebSocket endpoint that provides bidirectional streaming, enabling both real-time response delivery and cancellation of long-running tasks.
The bidirectional nature also enables a keep-alive mechanism through periodic ping/pong messages. This is essential in enterprise environments where load balancers, firewalls, and proxy servers often terminate idle connections after 30-60 seconds. Without keep-alive, an analyst asking a complex question that takes 2 minutes to fully answer would see their connection silently dropped mid-response.
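To make the mechanics concrete, here is a minimal sketch of such an endpoint, assuming a FastAPI WebSocket route; the /ws/chat path, the JSON message shapes, and the EchoBackend stand-in are illustrative rather than the actual Nemesis implementation.

import asyncio
import json

from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()
PING_INTERVAL = 20  # seconds; keeps load balancers and proxies from dropping "idle" connections


class EchoBackend:
    """Stand-in for the real agent backend; streams the message back word by word."""

    async def stream_response(self, message: str):
        for word in message.split():
            yield word + " "


backend = EchoBackend()


@app.websocket("/ws/chat")
async def chat_ws(websocket: WebSocket):
    await websocket.accept()

    async def keep_alive():
        # Application-level ping so intermediaries see traffic even during long generations
        while True:
            await asyncio.sleep(PING_INTERVAL)
            await websocket.send_json({"type": "ping"})

    ping_task = asyncio.create_task(keep_alive())
    try:
        while True:
            data = json.loads(await websocket.receive_text())
            if data.get("type") == "pong":
                continue  # client answered a ping; nothing else to do
            # Stream tokens back as they arrive from the agent backend
            async for token in backend.stream_response(data["message"]):
                await websocket.send_json({"type": "token", "content": token})
            await websocket.send_json({"type": "done"})
    except WebSocketDisconnect:
        pass
    finally:
        ping_task.cancel()

In the real service, the stand-in backend is replaced by the AgentBackend abstraction described later in this post.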
Resource management:
We support streaming cancellation, which immediately terminates the current task and cleans up its resources.
The interface also implements idle timeouts and maximum connection duration limits to prevent resource leaks from forgotten browser tabs—critical when you're paying per-token for LLM calls.
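Both behaviors reduce to a couple of asyncio primitives. A minimal sketch, with the helper names and timeout values as illustrative placeholders:

import asyncio
from typing import Optional


async def cancel_current_task(current_task: Optional[asyncio.Task]) -> None:
    """Invoked when the client sends a cancel message: stop the in-flight LLM stream now."""
    if current_task and not current_task.done():
        current_task.cancel()
        try:
            await current_task  # let the task's finally blocks release resources
        except asyncio.CancelledError:
            pass


async def receive_with_idle_timeout(websocket, idle_timeout: float = 300.0) -> str:
    """Raise TimeoutError on idle connections, so forgotten tabs stop costing tokens."""
    return await asyncio.wait_for(websocket.receive_text(), timeout=idle_timeout)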
Advanced AI Operator Interface
For deeper investigative work, our advanced operator interface provides a persistent, shareable conversation experience backed by PostgreSQL. We built it on Chainlit, a framework designed for AI chat applications, and integrated it deeply with our platform's authentication and authorization system.
The authentication challenge was significant. Chainlit has its own authentication mechanisms when running as a stand-alone application, but we needed it to respect our platform's existing JWT-based authentication and multi-tenant access controls. The solution was implementing Chainlit's authentication callback to intercept the WebSocket handshake, extract our JWT tokens from cookies, validate them against our internal authentication service, and only then create a Chainlit session.
Token validation happens in two steps: first, we extract the JWT from our platform’s auth cookie, then call our validation endpoint. Only after successful validation do we create a chainlit.User object that Chainlit recognizes.
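As a rough sketch, Chainlit's header-based auth callback can be wired up along these lines; the nemesis_jwt cookie name, the validation URL, and the claim fields are assumptions for illustration, not the actual Nemesis values.

from typing import Dict, Optional

import chainlit as cl
import httpx

AUTH_VALIDATE_URL = "http://auth-service/auth/validate"  # hypothetical internal endpoint


@cl.header_auth_callback
def auth_from_platform_jwt(headers: Dict) -> Optional[cl.User]:
    # Step 1: pull the platform JWT out of the cookies sent with the WebSocket handshake
    raw_cookies = headers.get("cookie", "")
    cookies = dict(
        pair.strip().split("=", 1) for pair in raw_cookies.split(";") if "=" in pair
    )
    token = cookies.get("nemesis_jwt")  # assumed cookie name
    if not token:
        return None  # no session: Chainlit rejects the connection

    # Step 2: validate the token against the platform's auth service
    resp = httpx.post(AUTH_VALIDATE_URL, json={"token": token})
    if resp.status_code != 200:
        return None

    claims = resp.json()
    # Only now hand Chainlit a user object it recognizes
    return cl.User(identifier=claims["sub"], metadata={"tenant": claims.get("tenant")})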
JWT tokens expire, so we implemented proactive token refresh by calling our platform's refresh endpoint before the token expires. This ensures analysts never experience session interruptions during critical investigations.
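One way to schedule that refresh is a small background task that renews the token shortly before expiry; the refresh endpoint and the session dict shape below are hypothetical.

import asyncio
import time

import httpx

REFRESH_URL = "http://auth-service/auth/refresh"  # hypothetical internal endpoint
REFRESH_MARGIN = 60                               # renew one minute before expiry


async def keep_token_fresh(session: dict) -> None:
    """Background task: refresh the JWT before it expires so the chat session never breaks."""
    while True:
        sleep_for = max(session["expires_at"] - time.time() - REFRESH_MARGIN, 0)
        await asyncio.sleep(sleep_for)
        async with httpx.AsyncClient() as client:
            resp = await client.post(REFRESH_URL, json={"token": session["token"]})
        resp.raise_for_status()
        session.update(resp.json())  # expects fresh "token" and "expires_at" fields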
Message history restoration was another challenge. When analysts share conversation URLs with colleagues, they expect context to carry over. We serialize the full conversation history into a single context-priming message:
transcript = "=== Previous Conversation Context ===\n"
for msg in chat_history:
    transcript += f"[{msg['role'].upper()}]: {msg['content']}\n"
This approach works for different backends like Claude and OpenAI without requiring backend-specific state management, though it does increase token consumption proportional to conversation length.
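For illustration, the same serialization can be folded into a small helper that prepends the transcript to the next user turn; the history-of-dicts shape and the closing marker are assumptions, not the exact Nemesis code.

def build_context_message(chat_history: list[dict], new_message: str) -> str:
    """Prepend the serialized history to the next user turn so any backend sees prior context."""
    transcript = "=== Previous Conversation Context ===\n"
    for msg in chat_history:
        transcript += f"[{msg['role'].upper()}]: {msg['content']}\n"
    return f"{transcript}=== End of Context ===\n\n{new_message}"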
RAG Knowledge Base: Starting Simple, Planning for Scale
To ground the AI agent in Nemesis-specific knowledge, we implemented a Retrieval-Augmented Generation (RAG) system as a custom agent tool. The agent can query platform documentation, best practices, and troubleshooting guides using semantic similarity search.
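Conceptually, the tool boils down to a nearest-neighbor lookup over precomputed embeddings. Here is a minimal sketch using cosine similarity; the function signature and in-memory store are assumptions for illustration.

import numpy as np


def query_knowledge_base(query_embedding: np.ndarray,
                         doc_embeddings: np.ndarray,
                         documents: list[str],
                         top_k: int = 3) -> list[str]:
    """Return the top_k documentation chunks most similar to the query."""
    # Normalize so dot products become cosine similarities
    q = query_embedding / np.linalg.norm(query_embedding)
    d = doc_embeddings / np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
    scores = d @ q
    best = np.argsort(scores)[::-1][:top_k]
    return [documents[int(i)] for i in best]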
As the next evolutionary step we're designing a comprehensive knowledge management system with:
- Dynamic ingestion pipelines — Automatically generate embeddings from updated documentation and assessment results
- Intra-tenant data isolation — Nemesis environments are standalone, and even within a single environment you can compartmentalize into isolated organizations. Each organization's custom scenarios, internal runbooks, and threat intelligence are stored separately with proper access controls
- Public and private knowledge bases — Shared security best practices remain globally accessible; organization-specific data (vulnerable assets, remediation procedures, incident postmortems) stays private
The beauty of our tool-based architecture is that this upgrade will be transparent to the agent. The query_knowledge_base tool interface remains the same; the backend just gets smarter.
Configuration-Driven Architecture
Everything we've described is fully configurable via YAML. You can swap LLM backends, enable/disable MCP servers, configure built-in tools, adjust system prompts, and control authentication flows without touching code.
Example configuration structure:
agent:
  type: claudeagentsdk
  claudeagentsdk:
    model: claude-sonnet-4-5
    max_turns: 1000
    disallowed_tools:
      - Bash    # Block shell execution
      - Write   # Block file modifications
      - Delete  # Block file deletion
  builtin_tools:
    web_search:
      enabled: true
    read:
      enabled: true
  system_message: |
    You are Nemesis, an expert AI security consultant...
    [extensive prompt engineering here]

mcp_servers:
  enabled: true
  servers:
    - name: nemesis_observer
      enabled: true
      transport: http
      url: ${NEMESIS_OBSERVER_MCP_URL}
      tools:
        - "mcp__nemesis_observer__list_assessments"
        - "mcp__nemesis_observer__fetch_atomic_runs"
    - name: ti_threat_intel
      enabled: true
      transport: http
      url: ${TI_MCP_URL}
      tools:
        - "mcp__ti__get_report"
        - "mcp__ti__search_actors"

knowledge_base:
  ...

tools:
  query_knowledge_base:
    enabled: true
  security_tools:
    threat_analysis:
      enabled: true
    vulnerability_assessment:
      enabled: true
This configuration-first approach means:
- Backend flexibility — Switch LLM backends by changing agent.type (see the sketch after this list)
- Tool governance — Security teams control which operations the AI can perform
- Environment-specific configs — Development uses mock MCP servers; production connects to live platform APIs
- Feature toggles — Enable/disable capabilities (web search, code execution, threat intelligence) per deployment
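A rough sketch of what that backend switch can look like at startup; the concrete backend classes and the openai_autogen key are placeholders, not the real Nemesis implementations.

import yaml


# Stand-ins for the concrete AgentBackend implementations (names are illustrative)
class ClaudeAgentBackend:
    def __init__(self, cfg: dict):
        self.cfg = cfg


class AutoGenBackend:
    def __init__(self, cfg: dict):
        self.cfg = cfg


BACKENDS = {
    "claudeagentsdk": ClaudeAgentBackend,
    "openai_autogen": AutoGenBackend,  # hypothetical key for the OpenAI/AutoGen backend
}


def build_backend(config_path: str):
    """Select the backend class from agent.type; swapping backends is a one-line YAML change."""
    with open(config_path) as f:
        config = yaml.safe_load(f)
    agent_cfg = config["agent"]
    backend_cls = BACKENDS[agent_cfg["type"]]
    return backend_cls(agent_cfg.get(agent_cfg["type"], {}))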
The Backend Abstraction Layer
Both UI interfaces rely on a common AgentBackend abstraction that handles:
- Streaming responses — Yielding tokens as they arrive for responsive UX
- History management — Maintaining conversation context across turns
- Resource cleanup — Shutting down cleanly and releasing connections, processes, and pending tasks
Here's the contract both backends implement:
from abc import ABC, abstractmethod
from typing import AsyncGenerator, Optional

from autogen_core import CancellationToken  # cancellation primitive from the AutoGen backend


class AgentBackend(ABC):
    @abstractmethod
    async def stream_response(
        self, message: str, cancellation_token: Optional[CancellationToken] = None
    ) -> AsyncGenerator[str, None]:
        """Stream response tokens or control markers like [[TOOL:name]]"""

    @abstractmethod
    async def restore_conversation_history(self) -> None:
        """Restore prior conversation context"""

    @abstractmethod
    async def shutdown(self) -> None:
        """
        Shutdown the backend and cleanup resources.

        This includes:
        - Closing connections to LLM APIs
        - Terminating MCP server processes (if using stdio transport)
        - Releasing memory and file handles
        - Cancelling pending async tasks
        """
The real power emerges when the agent calls MCP tools. For example, when an analyst asks about failed prevention tests:
# Agent internally calls via MCP:
mcp__nemesis_observer__get_assessment_atomics_by_run_id(
    run_id=latest_run,
    filter_status="PREVENTED_FALSE"
)
The UI streams back:
[[TOOL:get_assessment_atomics_by_run_id]] # Tool usage indicator
Then the actual analysis results, formatted naturally by the LLM.
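On the consuming side, a small router can split those control markers from answer text. A sketch assuming the [[TOOL:name]] convention from the backend contract above, with show_tool and show_text as placeholder UI callbacks:

import re
from typing import AsyncIterable, Callable

TOOL_MARKER = re.compile(r"^\[\[TOOL:(?P<name>[^\]]+)\]\]$")


async def render_stream(
    stream: AsyncIterable[str],
    show_tool: Callable[[str], None],
    show_text: Callable[[str], None],
) -> None:
    """Route control markers to a tool indicator and everything else to the chat view."""
    async for chunk in stream:
        match = TOOL_MARKER.match(chunk.strip())
        if match:
            show_tool(match.group("name"))  # e.g. "get_assessment_atomics_by_run_id"
        else:
            show_text(chunk)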
Why This Matters for Security
Traditional BAS platforms force security teams to learn proprietary query languages or click through dozens of filters to answer simple questions. Our AI agent approach inverts this:
- Natural language queries replace complex filters
- Contextual follow-ups enable investigative workflows ("Now show me mitigation steps")
- Threat intelligence integration via web search tools provides external context
- Automated report generation from raw assessment data
Most importantly, the dual-interface architecture means:
- SOC analysts get instant answers via the chatbot interface during active incidents
- Security engineers can deep-dive with persistent operator sessions, sharing findings with stakeholders via URLs
- Executives receive AI-generated summaries without touching the underlying complexity
What We Learned
- Token refresh must be proactive — Waiting until expiration causes conversation interruptions
- Backend abstraction pays off — Switching from OpenAI to Claude takes one config change and no interface rewrites; we're ready for future models and even custom LLM hosting
- Configuration is king — YAML-driven architecture lets security teams control AI behavior without developer involvement
- Knowledge is evolving, not static — Start with simple embedding stores, but design for dynamic knowledge ingestion from day one