Building Nemesis AI Agents: A Tale of Many (Inter)faces

When we set out to integrate AI capabilities into Nemesis, our Breach and Attack Simulation platform, we faced a common challenge: how do you build a flexible AI agent that supports multiple LLM backends (OpenAI, Anthropic's Claude, and future models) while exposing it through different user interfaces tailored to distinct workflows?

The answer lay in creating a clean separation of concerns with two layers of abstraction: a backend interface that normalizes different AI providers, and dual UI interfaces that serve different user needs.




The Architecture: One Brain, Two Faces

At the core sits NemesisAssistant, our AI agent wrapper that abstracts away the complexity of different LLM backends (OpenAI/AutoGen and Claude Agent SDK). Think of it as a universal translator—it speaks the same language to our applications regardless of which AI model powers it underneath.

This assistant connects to our security platform through the Model Context Protocol (MCP), giving it access to:

  • Real-time assessment data and results

  • MITRE ATT&CK technique coverage analysis

  • Threat intelligence and detection gaps

  • Compliance reporting capabilities

The key insight? Security teams don't want to write queries or navigate dashboards when under pressure. They want to ask: "Which credential theft techniques are we vulnerable to?" or "Show me prevention gaps from last week's ransomware simulation."




Real-Time Chatbot Interface

Our chatbot interface handles high-concurrency, ephemeral connections—perfect for embedded chat widgets or mobile apps where users expect instant responses. This interface is embedded directly into the Nemesis platform, allowing analysts to access AI assistance without leaving their current workflow. They can ask questions about the assessment they're viewing, request analysis of results on screen, or get guidance on configurations—all within the same browser context.

WebSocket architecture for real-time interaction:

We expose a single WebSocket endpoint that provides bidirectional streaming, enabling both real-time response delivery and cancellation of long-running tasks.

The bidirectional nature also enables a keep-alive mechanism through periodic ping/pong messages. This is essential in enterprise environments where load balancers, firewalls, and proxy servers often terminate idle connections after 30-60 seconds. Without keep-alive, an analyst asking a complex question that takes 2 minutes to fully answer would see their connection silently dropped mid-response.
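A minimal sketch of the keep-alive side, assuming only an async `send` callable (such as a WebSocket send method) rather than any particular framework; the 20-second default interval is illustrative:

```python
import asyncio
import time


class KeepAlivePinger:
    """Sends periodic pings so intermediaries (load balancers, proxies)
    don't silently drop a connection while a long response streams."""

    def __init__(self, send, interval: float = 20.0):
        self._send = send          # async callable, e.g. websocket.send_json
        self._interval = interval  # must be below the infrastructure's idle cutoff
        self._task = None

    async def _loop(self):
        while True:
            await asyncio.sleep(self._interval)
            await self._send({"type": "ping", "ts": time.time()})

    def start(self):
        self._task = asyncio.create_task(self._loop())

    async def stop(self):
        if self._task is not None:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass
```

The pinger runs alongside the response stream, so a two-minute answer keeps receiving traffic even when no tokens are in flight.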

Resource management:

We support streaming cancellation, which immediately terminates the current task and cleans up its resources.

The interface also implements idle timeouts and maximum connection duration limits to prevent resource leaks from forgotten browser tabs—critical when you're paying per-token for LLM calls.
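Cancellation and idle timeouts can be combined around the token stream itself; in this sketch the `token_source` async iterator and the timeout value are illustrative:

```python
import asyncio


async def stream_with_cancellation(token_source, cancel: asyncio.Event,
                                   idle_timeout: float = 120.0):
    """Yield tokens until the client cancels or the source goes idle.

    Stopping early also stops billing: no further LLM tokens are consumed
    once the generator returns.
    """
    it = token_source.__aiter__()
    while not cancel.is_set():
        try:
            token = await asyncio.wait_for(it.__anext__(), timeout=idle_timeout)
        except StopAsyncIteration:
            return  # response finished normally
        except asyncio.TimeoutError:
            return  # idle too long: reclaim the connection's resources
        yield token
```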




Advanced AI Operator Interface

For deeper investigative work, our advanced operator interface provides a persistent, shareable conversation experience backed by PostgreSQL. We built it on Chainlit, a framework designed for AI chat applications, and integrated it deeply with our platform's authentication and authorization system.

The authentication challenge was significant. Chainlit has its own authentication mechanisms when running as a stand-alone application, but we needed it to respect our platform's existing JWT-based authentication and multi-tenant access controls. The solution was implementing Chainlit's authentication callback to intercept the WebSocket handshake, extract our JWT tokens from cookies, validate them against our internal authentication service, and only then create a Chainlit session.

Token validation happens in two steps: first, we extract the JWT from our platform’s auth cookie, then call our validation endpoint. Only after successful validation do we create a chainlit.User object that Chainlit recognizes.
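The extraction-and-validation step can be sketched as a pure function; the cookie name `nemesis_auth` and the `validate` callable (standing in for the call to our internal auth service) are assumptions, and in the real callback the returned claims would be wrapped in a chainlit.User:

```python
from http.cookies import SimpleCookie
from typing import Callable, Optional


def user_from_cookie_header(cookie_header: str,
                            validate: Callable[[str], Optional[dict]],
                            cookie_name: str = "nemesis_auth"):
    """Extract the platform JWT from a Cookie header and validate it.

    Returns the validated user claims, or None to reject the handshake.
    Cookie name and claim shape are illustrative.
    """
    cookies = SimpleCookie()
    cookies.load(cookie_header or "")
    morsel = cookies.get(cookie_name)
    if morsel is None:
        return None  # no token present: refuse to create a session
    return validate(morsel.value)
```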

JWT tokens expire, so we implemented proactive token refresh by calling our platform's refresh endpoint before the token expires. This ensures analysts never experience session interruptions during critical investigations.
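One way to schedule the proactive refresh is to read the token's exp claim and refresh with a safety margin; the 60-second margin here is illustrative:

```python
import base64
import json
import time


def seconds_until_refresh(jwt_token: str, margin: float = 60.0) -> float:
    """How long to wait before refreshing, based on the JWT's exp claim.

    Refreshing `margin` seconds early means the analyst never hits an
    expired-token error mid-conversation.
    """
    payload_b64 = jwt_token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore base64 padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return max(0.0, claims["exp"] - time.time() - margin)
```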

Message history restoration was another challenge. When analysts share conversation URLs with colleagues, they expect context to carry over. We serialize the full conversation history into a single context-priming message:


transcript = "=== Previous Conversation Context ===\n"

for msg in chat_history:

    transcript += f"[{msg['role'].upper()}]: {msg['content']}\n"


This approach works for different backends like Claude and OpenAI without requiring backend-specific state management, though it does increase token consumption proportional to conversation length.




RAG Knowledge Base: Starting Simple, Planning for Scale

To ground the AI agent in Nemesis-specific knowledge, we implemented a Retrieval-Augmented Generation (RAG) system as a custom agent tool. The agent can query platform documentation, best practices, and troubleshooting guides using semantic similarity search.
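A minimal sketch of what such a tool can look like under the hood, with a pluggable `embed` function standing in for whatever embedding model the deployment uses and documents paired with precomputed embeddings:

```python
import math
from typing import Callable, Sequence


def query_knowledge_base(question: str,
                         docs: Sequence[tuple],
                         embed: Callable[[str], Sequence[float]],
                         top_k: int = 3):
    """Rank stored document chunks by cosine similarity to the question.

    `docs` pairs each chunk's text with its precomputed embedding vector;
    both the store and the embedder are illustrative stand-ins.
    """
    q = embed(question)

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0

    ranked = sorted(docs, key=lambda d: cosine(q, d[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]
```

Because the agent only sees the tool's name and signature, the naive store above can later be swapped for the full knowledge management system without the agent noticing.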

As the next evolutionary step we're designing a comprehensive knowledge management system with:

  • Dynamic ingestion pipelines — Automatically generate embeddings from updated documentation and assessment results

  • Intra-tenant data isolation — Nemesis environments are standalone, and even within a single environment data can be compartmentalized into isolated organizations. Each organization's custom scenarios, internal runbooks, and threat intelligence are stored separately with proper access controls

  • Public and private knowledge bases — Shared security best practices remain globally accessible; organization-specific data (vulnerable assets, remediation procedures, incident postmortems) stay private

The beauty of our tool-based architecture is that this upgrade will be transparent to the agent. The query_knowledge_base tool interface remains the same; the backend just gets smarter.




Configuration-Driven Architecture

Everything we've described is fully configurable via YAML. You can swap LLM backends, enable/disable MCP servers, configure built-in tools, adjust system prompts, and control authentication flows without touching code.


Example configuration structure:


agent:

  type: claudeagentsdk

  

  claudeagentsdk:

    model: claude-sonnet-4-5

    max_turns: 1000

    disallowed_tools:

      - Bash          # Block shell execution

      - Write         # Block file modifications

      - Delete        # Block file deletion

    

    builtin_tools:

      web_search:

        enabled: true

      read:

        enabled: true

  

  system_message: |

    You are Nemesis, an expert AI security consultant...

    [extensive prompt engineering here]


mcp_servers:

  enabled: true

  servers:

    - name: nemesis_observer

      enabled: true

      transport: http

      url: ${NEMESIS_OBSERVER_MCP_URL}

      tools:

        - "mcp__nemesis_observer__list_assessments"

        - "mcp__nemesis_observer__fetch_atomic_runs"

    

    - name: ti_threat_intel

      enabled: true

      transport: http

      url: ${TI_MCP_URL}

      tools:

        - "mcp__ti__get_report"

        - "mcp__ti__search_actors"


knowledge_base:

   ...


tools:

  query_knowledge_base:

    enabled: true

  security_tools:

    threat_analysis:

      enabled: true

    vulnerability_assessment:

      enabled: true


This configuration-first approach means:

  • Backend flexibility — Switch LLM backends by changing agent.type

  • Tool governance — Security teams control which operations the AI can perform

  • Environment-specific configs — Development uses mock MCP servers; production connects to live platform APIs

  • Feature toggles — Enable/disable capabilities (web search, code execution, threat intelligence) per deployment
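A sketch of how agent.type can drive backend selection; the registry and factory names are illustrative, and each registered factory would construct a class implementing the AgentBackend contract described below:

```python
from typing import Any, Callable, Dict

# Maps agent.type values to backend factories (names are assumptions).
BACKEND_REGISTRY: Dict[str, Callable[[dict], Any]] = {}


def register_backend(name: str):
    """Decorator registering a backend factory under an agent.type value."""
    def deco(factory):
        BACKEND_REGISTRY[name] = factory
        return factory
    return deco


def make_backend(config: dict):
    """Construct the backend named by agent.type, handing it only its own
    sub-section of the parsed YAML config."""
    agent_cfg = config["agent"]
    backend_type = agent_cfg["type"]
    try:
        factory = BACKEND_REGISTRY[backend_type]
    except KeyError:
        raise ValueError(f"unknown agent.type: {backend_type!r}")
    return factory(agent_cfg.get(backend_type, {}))
```

Switching providers then really is a one-line YAML change, as long as a factory for the new type is registered.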




The Backend Abstraction Layer

Both UI interfaces rely on a common AgentBackend abstraction that handles:

  1. Streaming responses — Yielding tokens as they arrive for responsive UX

  2. History management — Maintaining conversation context across turns

  3. Resource cleanup


Here's the contract both backends implement:


from abc import ABC, abstractmethod
from typing import AsyncGenerator, Optional

# CancellationToken is the chosen backend SDK's cancellation handle
class AgentBackend(ABC):

    @abstractmethod

    async def stream_response(

        self, message: str, cancellation_token: Optional[CancellationToken] = None

    ) -> AsyncGenerator[str, None]:

        """Stream response tokens or control markers like [[TOOL:name]]"""

        

    @abstractmethod

    async def restore_conversation_history(self) -> None:

        """Restore prior conversation context"""

    

    @abstractmethod

    async def shutdown(self) -> None:

        """

        Shutdown the backend and cleanup resources.

        

        This includes:

        - Closing connections to LLM APIs

        - Terminating MCP server processes (if using stdio transport)

        - Releasing memory and file handles

        - Cancelling pending async tasks

        """

The real power emerges when the agent calls MCP tools. For example, when an analyst asks about failed prevention tests:

# Agent internally calls via MCP:

mcp__nemesis_observer__get_assessment_atomics_by_run_id(

    run_id=latest_run,

    filter_status="PREVENTED_FALSE"

)

The UI streams back:

[[TOOL:get_assessment_atomics_by_run_id]]  # Tool usage indicator

Then the actual analysis results, formatted naturally by the LLM.
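A small helper is enough for the UI to separate these control markers from displayable text; it assumes only the [[TOOL:name]] convention shown above:

```python
import re

TOOL_MARKER = re.compile(r"\[\[TOOL:([^\]]+)\]\]")


def split_stream_chunk(chunk: str):
    """Split a streamed chunk into (tool names, displayable text) so the
    UI can render a tool-usage indicator instead of the raw marker."""
    tools = TOOL_MARKER.findall(chunk)
    text = TOOL_MARKER.sub("", chunk)
    return tools, text
```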




Why This Matters for Security

Traditional BAS platforms force security teams to learn proprietary query languages or click through dozens of filters to answer simple questions. Our AI agent approach inverts this:

  • Natural language queries replace complex filters

  • Contextual follow-ups enable investigative workflows ("Now show me mitigation steps")

  • Threat intelligence integration via web search tools provides external context

  • Automated report generation from raw assessment data

Most importantly, the dual-interface architecture means:

  • SOC analysts get instant answers via the chatbot interface during active incidents

  • Security engineers can deep-dive with persistent operator sessions, sharing findings with stakeholders via URLs

  • Executives receive AI-generated summaries without touching the underlying complexity




What We Learned

  1. Token refresh must be proactive — Waiting until expiration causes conversation interruptions

  2. Backend abstraction pays off — Switching from OpenAI to Claude takes one config change and no interface rewrites; we're ready for future models and even custom LLM hosting

  3. Configuration is king — YAML-driven architecture lets security teams control AI behavior without developer involvement

  4. Knowledge is evolving, not static — Start with simple embedding stores, but design for dynamic knowledge ingestion from day one


 
 
