
NexSev: AI-Powered Incident Response

LangChain · Ollama · Next.js · Python · FastAPI · MCP · Slack · Zendesk

HackWeek-led build: LLMs, LangChain agents, and MCP tools to automate RCA, CANs, and knowledge retrieval—cutting post-Sev1 documentation time by ~40% for HashiCorp/IBM APJ.


How we used LLMs and agentic workflows to cut Sev1 documentation time by ~40%


The problem: incident response is a time sink

If you've been on-call for enterprise infrastructure, you know the pattern: an alert fires, you troubleshoot, you resolve—then the administrative work begins: detailed Root Cause Analysis (RCA), Customer Action Notices (CANs), searching past tickets, and updating the knowledge base.

At HashiCorp, our support team was spending 2+ hours per Sev1 incident on post-resolution work. Across dozens of critical incidents per month, that added up to weeks of manual documentation.

During IBM's AI HackWeek, I led a cross-functional team to build NexSev—an AI-powered incident response assistant that automates those workflows.

The vision: an AI teammate for support engineers

The goal was not to replace engineers—it was to handle the tedious parts so people can focus on solving customer problems.

What we wanted NexSev to do:

  1. Auto-generate RCA documentation from incident data and resolution notes
  2. Draft Customer Action Notices with the right technical depth and tone
  3. Surface relevant historical solutions from the knowledge base when similar issues arise
  4. Provide real-time troubleshooting guidance during active incidents

In short: turn collective institutional knowledge into an always-available assistant.

Architecture: agentic workflows + knowledge retrieval

Tech stack

  • LLMs: Llama 3.1 and IBM Granite (via Ollama for local hosting)
  • Orchestration: LangChain for agentic workflows
  • Knowledge base: Vector embeddings of historical Zendesk tickets
  • Integrations: Custom MCP (Model Context Protocol) tools for Slack and Zendesk
  • Frontend: Next.js for document generation and review
  • Backend: Python for LLM orchestration and API integrations

Why this stack

Local LLMs (Ollama): Customer data had to stay inside our infrastructure. Running Llama 3.1 and Granite locally meant sensitive context did not leave the environment.

MCP: Custom tools gave the model access to Zendesk (tickets, customer context, notes), internal knowledge (past RCAs, runbooks), and Slack (updates and commands).

Agentic design: Instead of one brittle mega-prompt, the assistant could decide what to fetch, call tools, synthesize drafts, and pause for human review before finalizing.
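The tool layer described above can be sketched as a small registry the agent calls by name. This is a simplified stand-in for MCP tool registration (the production system uses the MCP SDK; `get_ticket`, `search_knowledge`, and the return shapes here are illustrative, not the real NexSev tools):

```python
from typing import Callable

TOOLS: dict[str, Callable[..., dict]] = {}

def tool(fn: Callable[..., dict]) -> Callable[..., dict]:
    # Register a function so the agent can discover and call it by name --
    # the same shape an MCP server exposes to the model.
    TOOLS[fn.__name__] = fn
    return fn

@tool
def get_ticket(ticket_id: str) -> dict:
    # Stand-in for a Zendesk API call.
    return {"id": ticket_id, "subject": "TFE plan failures", "status": "solved"}

@tool
def search_knowledge(query: str) -> dict:
    # Stand-in for the vector-store lookup over past RCAs and runbooks.
    return {"query": query, "hits": ["RCA-2024-017", "runbook/tfe-postgres"]}

# The agent picks a tool by name, supplies arguments it extracted from the
# conversation, then pauses for human review before anything is published.
result = TOOLS["get_ticket"](ticket_id="ZD-4821")
```

The key property is that the model never talks to Zendesk or Slack directly: every side effect goes through a named, inspectable tool call.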

Implementation: from concept to production

Phase 1: Knowledge base retrieval

Historical incidents were unstructured and noisy. Our approach:

  1. Extract resolved Sev1 and Sev2 tickets from Zendesk
  2. Clean and chunk descriptions, resolution notes, and RCAs where available
  3. Embed with a lightweight model and store in a vector database
  4. At incident time, retrieve by error patterns, components (TFE, PostgreSQL, Redis, VCS), and environment (cloud, topology)
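The retrieval step above can be sketched end to end with a toy embedding function standing in for the real model (the function names and the bag-of-characters embedding are illustrative only; production used a proper sentence-embedding model and a vector database):

```python
import math

def embed(text: str) -> list[float]:
    # Toy bag-of-characters embedding, normalized to unit length.
    # A real pipeline would call a sentence-embedding model here.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Both vectors are unit-normalized, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

def retrieve(query: str, tickets: list[dict], k: int = 2) -> list[dict]:
    # Rank historical tickets by similarity to the live incident text.
    q = embed(query)
    ranked = sorted(tickets, key=lambda t: cosine(q, t["vector"]), reverse=True)
    return ranked[:k]

# Index a few resolved tickets, then query with a live error pattern.
corpus = [
    {"id": "ZD-101", "text": "PostgreSQL connection pool exhausted on TFE"},
    {"id": "ZD-102", "text": "Redis eviction storm after failover"},
    {"id": "ZD-103", "text": "VCS webhook timeout during plan phase"},
]
for t in corpus:
    t["vector"] = embed(t["text"])

hits = retrieve("TFE PostgreSQL pool errors", corpus, k=1)
```

The same embed-then-rank shape scales up directly: swap the toy embedding for a real model and the in-memory list for Chroma.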

Phase 2: RCA generation

A strong RCA needs accuracy, timeline, real root cause (not symptoms), remediation, and prevention. We combined ticket data with retrieved similar incidents, used structured outputs (function calling), and presented a draft for engineer refinement.
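The structured-output side can be sketched as a typed schema with a completeness check. The field names below are assumptions for illustration, not the production schema, but they show the useful property: a draft can be validated for missing sections before it ever reaches an engineer.

```python
from dataclasses import dataclass

@dataclass
class RCADraft:
    summary: str
    timeline: list[str]
    root_cause: str
    remediation: str
    prevention: str

    def missing_sections(self) -> list[str]:
        # Flag empty sections so a draft never ships with gaps -- this is
        # exactly the check that catches the timelines and prevention
        # items engineers historically forgot under pressure.
        names = ("summary", "timeline", "root_cause", "remediation", "prevention")
        return [name for name in names if not getattr(self, name)]

draft = RCADraft(
    summary="TFE outage caused by exhausted PostgreSQL connection pool",
    timeline=["02:10 alert fired", "02:25 pool saturation confirmed", "03:05 resolved"],
    root_cause="Connection pool sized below peak plan/apply concurrency",
    remediation="Raised pool size and restarted affected services",
    prevention="",  # deliberately empty: review would block approval here
)
gaps = draft.missing_sections()  # -> ['prevention']
```

With function calling, the model is asked to emit this schema directly rather than free text, so validation is mechanical.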

The key insight: don't chase perfect AI copy—ship a strong first draft an engineer can polish in minutes instead of writing from scratch for an hour.

Phase 3: Slack integration

Support engineers live in Slack during incidents. We shipped slash-style workflows such as:

  • /nexsev analyze — suggest troubleshooting steps for the active incident
  • /nexsev rca [ticket-id] — RCA draft for a resolved incident
  • /nexsev can [ticket-id] — CAN draft
  • /nexsev search [query] — knowledge base search

The bot used MCP-backed tools to pull Zendesk context, search knowledge, generate documents, and post results back to the channel.
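The command routing behind those slash commands can be sketched as a small dispatcher. The handler bodies here are placeholders (the real bot wires each action into the MCP-backed tools via Bolt for Python), but the parse-and-dispatch shape is the same:

```python
def parse_command(text: str) -> tuple[str, str]:
    # "/nexsev rca ZD-123" arrives from Slack as text="rca ZD-123";
    # split the action from its (optional) argument.
    action, _, arg = text.strip().partition(" ")
    return action, arg.strip()

# Illustrative handlers; in production each calls into the tool layer.
HANDLERS = {
    "analyze": lambda arg: "Suggested troubleshooting steps for the active incident",
    "rca":     lambda arg: f"RCA draft queued for {arg}",
    "can":     lambda arg: f"CAN draft queued for {arg}",
    "search":  lambda arg: f"Knowledge base results for '{arg}'",
}

def dispatch(text: str) -> str:
    action, arg = parse_command(text)
    handler = HANDLERS.get(action)
    if handler is None:
        return f"Unknown command '{action}'. Try: {', '.join(sorted(HANDLERS))}"
    return handler(arg)

reply = dispatch("rca ZD-4821")  # -> "RCA draft queued for ZD-4821"
```

Keeping parsing separate from the handlers means new commands are a one-line addition to the table.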

Phase 4: Next.js review interface

Long-form review needed more than a thread. We built a Next.js app to review and edit RCAs and CANs, track approval state, and export finalized documents to Zendesk—closing the loop so engineer edits inform prompt improvements over time.
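One way to close that feedback loop, sketched with illustrative names: diff the AI draft against the engineer-approved version and record which sections needed human rework, so prompt improvements can target the weakest sections.

```python
def edited_sections(draft: dict[str, str], final: dict[str, str]) -> list[str]:
    # Compare the AI draft against the approved version and return the
    # sections the engineer had to change before publishing.
    return [k for k in draft if draft[k] != final.get(k, "")]

draft = {"root_cause": "Pool exhaustion", "prevention": "TBD"}
final = {"root_cause": "Pool exhaustion", "prevention": "Add pool saturation alerting"}
reworked = edited_sections(draft, final)  # -> ['prevention']
```

Aggregated over many incidents, these per-section edit rates tell you exactly which prompts to iterate on.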

Impact: ~40% reduction in incident documentation time

After rollout to our APJ support team, we measured:

  • RCA: ~90 minutes average → ~30 minutes to review and finalize AI-generated drafts (~67% reduction)
  • CAN: ~30 minutes → ~10 minutes to finalize (~67% reduction)
  • Knowledge retrieval: automated and surfaced during incidents instead of ad hoc search

Overall: roughly 40% less time on post-incident documentation, with better consistency, fewer missed sections (timelines, prevention), and less cognitive load on engineers during high-pressure events.

Lessons learned

  • Local LLMs are viable in production when privacy matters—fast enough, no per-token bill, and room to customize.
  • Agentic flows beat one-shot prompts for reliability when context must be gathered from multiple systems.
  • Human-in-the-loop is non-negotiable for customer-facing quality—draft, review, then publish.
  • Adoption follows the workflow—Slack integration mattered as much as model quality.

What's next

NexSev is in production for APJ; the roadmap includes expansion to other regions, proactive signals, tighter feedback loops for fine-tuning, and deeper observability integrations (e.g., Datadog and Prometheus).

Technical details

Reference stack

Backend:
- Python 3.11
- LangChain (agent orchestration)
- Ollama (local LLMs)
- FastAPI (API)
- Vector DB (Chroma)

Frontend:
- Next.js (App Router)
- TypeScript
- Tailwind CSS
- shadcn/ui

Integrations:
- Slack SDK (Bolt for Python)
- Zendesk API
- Custom MCP tools

Models:
- Llama 3.1 (8B / 70B)
- IBM Granite 13B

Implementation tips:

  • Start with retrieval before generation
  • Use structured outputs
  • Bake review into v1
  • Prioritize low-friction entry (Slack first)

Conclusion

NexSev began as a HackWeek experiment and became a production system that saves the team significant time each month while improving consistency. The direction that resonates most: use AI for the repetitive work so engineers can stay focused on hard technical problems and customer outcomes.

Syed Ibtihaj is a Senior Product Support Engineer at HashiCorp/IBM, specializing in Terraform Enterprise and Kubernetes infrastructure. He also runs Webifex Labs, shipping Next.js products for clients from Melbourne, Australia.
