Enterprise Playbooks

Step-by-step guides for common enterprise scenarios. Each playbook walks you through a complete implementation from planning to production, with checkpoints and success criteria at every stage.

Follow the steps within each playbook in order, completing each step before moving to the next. Time estimates assume a team of 2–3 people who are already familiar with the platform.

Playbook 1: First Agent Deployment

Timeline: 2–3 weeks
Team: Agent Manager + Knowledge Editor + Governance Admin
Difficulty: Beginner

This playbook takes you from zero to a production-deployed agent, following all governance and quality requirements.

Define the agent's scope and persona. Document the agent's purpose, target audience, knowledge domains, tone of voice, and escalation triggers. Write this as a one-page brief that stakeholders can review. A well-defined scope prevents scope creep and makes evaluation straightforward.
Set up RBAC roles for the project team. Assign the Agent Manager role to the person building the agent, Knowledge Editor to the person curating content, and ensure a Governance Admin is available for approvals. Verify separation of duties is respected.
Configure knowledge sources. Navigate to Knowledge Ops > Sources and add your first data source. Start with a single, high-quality source (e.g., your product documentation). Configure the crawl schedule and set up include/exclude URL patterns to focus on relevant content only.
Run the first knowledge pipeline. Trigger a manual pipeline run and monitor it in Pipeline Runs. Verify all stages complete successfully. Review extracted content in the Review tab, checking for extraction quality, correct formatting, and PII detection results.
Approve knowledge and map agent coverage. Review and approve content in the Review queue. Then go to Agent Coverage and assign the approved knowledge domains to your agent. Verify full coverage with no gaps in the coverage matrix.
Create the agent. Navigate to Agent Workforce and create a new agent. Configure the agent's identity (name, avatar, persona description), assign the knowledge domains, set the system prompt, and configure response parameters (temperature, max tokens, citation style).
Build the golden set. Create at least 50 in-scope question-answer pairs that cover the agent's knowledge domains; for a 50-pair set, include 20 easy, 15 medium, and 15 hard questions. Add 5 out-of-scope questions on top of these to test graceful decline behavior. Have domain experts (not the agent builder) create these.
Run the first evaluation. Navigate to Agent Readiness and run a full evaluation against your golden set. Review scores across all 7 dimensions. If any required threshold is not met, iterate on the agent's prompt, knowledge, or retrieval configuration and re-evaluate.
Configure safety gates. Enable all default safety gates (PII detection, toxicity, hallucination, topic boundary, prompt injection). Test each gate by crafting adversarial inputs in the evaluation sandbox. Verify gates trigger correctly and block unsafe responses.
Pass the production gate. Run a final evaluation and confirm all automated checks pass. Complete the manual review checkpoints: sample conversation review, edge case verification, persona consistency, and escalation path testing.
Submit for governance approval. The production gate generates a deployment request that enters the governance Approval Queue. The Governance Admin reviews the evaluation results, safety gate configuration, and knowledge coverage before approving.
Deploy to production with canary. Deploy the agent with a canary rollout (10% of traffic initially). Monitor the Trust Overview dashboard, violation timeline, and user feedback for the first 48 hours. If metrics are stable, increase to 50%, then 100%.
Set up ongoing monitoring. Configure alerts for trust score drops, safety gate trigger rate increases, and negative feedback spikes. Schedule weekly evaluation runs against your golden set. Assign an Operator to monitor the inbox and handle escalations.
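
The golden-set step above asks for a specific difficulty mix. A minimal sketch of a check you could run before the first evaluation, assuming the golden set is available as a list of records with a `difficulty` field (the field name, category labels, and record shape are hypothetical, not an Operanix export format):

```python
from collections import Counter

# Minimum counts per category, taken from the golden-set step above.
REQUIRED_MIX = {"easy": 20, "medium": 15, "hard": 15, "out_of_scope": 5}


def golden_set_gaps(cases):
    """Return {category: number_missing} for categories below the minimum.

    `cases` is a list of dicts with a hypothetical "difficulty" key.
    """
    counts = Counter(case["difficulty"] for case in cases)
    return {
        category: minimum - counts[category]
        for category, minimum in REQUIRED_MIX.items()
        if counts[category] < minimum
    }


if __name__ == "__main__":
    sample = (
        [{"difficulty": "easy"}] * 20
        + [{"difficulty": "medium"}] * 15
        + [{"difficulty": "hard"}] * 12
        + [{"difficulty": "out_of_scope"}] * 5
    )
    print(golden_set_gaps(sample))  # {'hard': 3}
```

An empty result means the set meets the minimum mix; anything else tells the domain experts which categories still need questions.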

Success criteria: Agent deployed to production with composite evaluation score above 0.80, safety score above 0.90, zero critical violations in first 48 hours, and canary rollout completed without quality regression.
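
The canary step and the success criteria above can be combined into a simple promotion rule. The sketch below is illustrative only: the `CanaryMetrics` type is hypothetical, applying the 48-hour observation window at every stage is an assumption (the playbook states it only for the first stage), and rolling back to 0% on a failed check is also an assumption, since the playbook does not define rollback behavior.

```python
from dataclasses import dataclass

# Traffic percentages from the canary step: 10% -> 50% -> 100%.
CANARY_STAGES = [10, 50, 100]


@dataclass
class CanaryMetrics:
    composite_score: float    # composite evaluation score, 0.0-1.0
    safety_score: float       # safety score, 0.0-1.0
    critical_violations: int  # critical violations in the observation window
    hours_observed: float     # time spent at the current traffic level


def next_traffic_percent(current: int, metrics: CanaryMetrics) -> int:
    """Return the traffic percentage to use next.

    Rolls back to 0 on any failed criterion (assumption), holds at the
    current stage until 48 hours have passed, and otherwise advances
    one stage.
    """
    healthy = (
        metrics.composite_score > 0.80
        and metrics.safety_score > 0.90
        and metrics.critical_violations == 0
    )
    if not healthy:
        return 0
    if metrics.hours_observed < 48:
        return current
    later = [stage for stage in CANARY_STAGES if stage > current]
    return later[0] if later else current
```

For example, `next_traffic_percent(10, CanaryMetrics(0.85, 0.95, 0, 48))` advances to 50, while the same metrics after only 12 hours hold the rollout at 10.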

Playbook 2: Knowledge Pipeline Setup

Timeline: 1–2 weeks
Team: Knowledge Editor + Compliance Officer
Difficulty: Intermediate

This playbook establishes a production-grade knowledge pipeline with multiple sources, compliance controls, and automated refresh cycles.

Audit existing content sources. Inventory all potential knowledge sources: documentation sites, help centers, internal wikis, support ticket archives, product specifications, and training materials. Prioritize by relevance, quality, and update frequency. Document each source's owner, refresh cadence, and sensitivity level.
Configure PII detection rules. Before ingesting any content, configure PII detection in the compliance gate settings. Add any custom PII patterns specific to your organization (employee IDs, internal codes, patient identifiers). Run a test scan against a sample document to verify detection accuracy.
Set up data sources incrementally. Add sources one at a time, starting with the highest quality source. For each source: configure the connector, set crawl parameters (depth, rate limit, include/exclude patterns), and trigger a test crawl. Verify extraction quality before adding the next source.
Establish the review workflow. Assign Knowledge Editors to the review queue. Define review criteria: content must be accurate, current, free of PII, and relevant to at least one agent's domain. Set up review SLAs (e.g., content must be reviewed within 48 hours of extraction).
Configure the Tenant Intelligence pipeline. Review the 12-stage pipeline settings. Adjust entity extraction models to your domain (e.g., enable medical entity recognition for healthcare, financial entity recognition for fintech). Set contradiction detection sensitivity and freshness TTLs appropriate to your content types.
Set up deduplication thresholds. The default semantic similarity threshold of 0.92 works well for most content. If you have many similar but distinct documents (e.g., per-product FAQs), consider raising the threshold to 0.95 to prevent over-aggressive merging.
Configure compliance gate rules. Define which compliance frameworks apply to your content (SOC 2, HIPAA, GDPR, internal policies). Set sensitivity classification rules. Configure the approval chain for regulated content (Knowledge Editor → Compliance Officer → Governance Admin).
Run the full pipeline end-to-end. Trigger a complete pipeline run across all configured sources. Monitor each of the 7 stages. Review the output: check entity extraction quality, verify deduplication decisions, confirm compliance classification accuracy.
Validate retrieval quality. Run the Training & Eval stage and verify retrieval metrics meet targets (Recall@5 ≥ 0.90, MRR ≥ 0.80, P95 latency ≤ 200 ms). If metrics fall short, investigate chunk size settings, embedding model selection, and index configuration.
Set up automated crawl schedules. Configure recurring crawl schedules for each source based on update frequency. Documentation sites: daily. Knowledge bases: every 6 hours. CRM exports: weekly. Set up pipeline health monitoring alerts.
Publish the initial knowledge base. After all content has passed through the pipeline and compliance gate, initiate the publish process. Use the full approval chain. Create a versioned snapshot for rollback capability.
Document the pipeline for your team. Create internal documentation covering: which sources are configured, crawl schedules, review responsibilities, compliance requirements, and escalation procedures for pipeline failures.
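
To make the retrieval targets in the validation step concrete, here is a minimal sketch of how Recall@5 and MRR are commonly computed. It uses the standard textbook definitions; the platform's Training & Eval stage may define or aggregate them differently, and the document IDs are made up for illustration.

```python
def recall_at_k(retrieved, relevant, k=5):
    """Fraction of relevant document IDs that appear in the top-k results."""
    if not relevant:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & set(relevant)) / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    relevant = set(relevant)
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(queries):
    """Average Recall@5 and MRR over (retrieved, relevant) pairs."""
    n = len(queries)
    recall = sum(recall_at_k(r, rel) for r, rel in queries) / n
    mrr = sum(reciprocal_rank(r, rel) for r, rel in queries) / n
    return {"recall_at_5": recall, "mrr": mrr}


if __name__ == "__main__":
    queries = [
        (["d1", "d7", "d3"], ["d1"]),        # relevant doc ranked first
        (["d4", "d2", "d9"], ["d2", "d8"]),  # one of two relevant docs, rank 2
    ]
    metrics = evaluate(queries)
    print(metrics)  # {'recall_at_5': 0.75, 'mrr': 0.75}
    print(metrics["recall_at_5"] >= 0.90 and metrics["mrr"] >= 0.80)  # False
```

In this toy run both metrics miss the targets, which is the situation where the playbook suggests revisiting chunk size, embedding model, and index configuration.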

Success criteria: All sources connected and crawling on schedule, PII detection configured and tested, compliance gate rules defined, retrieval metrics meeting targets, and first knowledge base published with full approval chain.
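
The deduplication step earlier in this playbook recommends raising the similarity threshold when documents are similar but distinct. The sketch below shows why, using plain cosine similarity over toy two-dimensional vectors. How the platform actually computes semantic similarity is not documented here, so treat the function names and vectors as illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def duplicate_pairs(embeddings, threshold=0.92):
    """Return (id_a, id_b) pairs whose similarity meets the threshold."""
    ids = sorted(embeddings)
    pairs = []
    for i, id_a in enumerate(ids):
        for id_b in ids[i + 1:]:
            if cosine_similarity(embeddings[id_a], embeddings[id_b]) >= threshold:
                pairs.append((id_a, id_b))
    return pairs


if __name__ == "__main__":
    # Toy embeddings: a near-identical copy, plus a distinct per-product FAQ.
    embeddings = {
        "faq_product_a": [1.0, 0.0],
        "faq_product_a_copy": [1.0, 0.05],
        "faq_product_b": [5.0, 2.0],
    }
    print(duplicate_pairs(embeddings, threshold=0.92))
    print(duplicate_pairs(embeddings, threshold=0.95))
```

With these toy vectors, the default 0.92 threshold flags all three pairs, including the distinct `faq_product_b`, while 0.95 flags only the true near-copy.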

Playbook 3: Governance Implementation

Timeline: 2–4 weeks
Team: Governance Admin + Platform Owner + Compliance Officer
Difficulty: Advanced

This playbook establishes a comprehensive AI governance framework across your organization.

Define governance objectives. Document what governance means for your organization. Common objectives: ensure agent safety, maintain regulatory compliance, provide audit trail for accountability, control AI risk, and enable responsible scaling. Align objectives with your existing corporate governance framework.
Design the RBAC model. Map your organizational structure to Operanix's 8 roles. Identify who fills each role. Ensure separation of duties: the person building agents should not be the sole approver. For large organizations, create custom roles that match your department structure.
Implement RBAC assignments. Assign roles to all team members. Enable MFA for all users (enforce at tenant level). Set session timeout and idle timeout policies. Configure time-limited access for external consultants or auditors.
Configure governance policies. Create policies for each risk category: content safety, data privacy, topic boundaries, response quality, and regulatory compliance. Start with policies in "warn" mode (log but don't block) for 2 weeks to baseline false positive rates before switching to "block" mode.
Set up safety gates. Enable all built-in safety gates. Configure custom PII patterns, toxicity sensitivity levels, and topic boundary rules. Add industry-specific regulatory gates if applicable (healthcare disclaimers, financial advice restrictions). Test each gate with adversarial examples.
Build approval chains. Configure approval chains for all high-impact actions: agent deployment, knowledge publication, policy changes, RBAC modifications, and workflow activation. Set SLAs and auto-escalation rules. Assign backup approvers for each chain step.
Set up the audit trail. Configure audit retention period (minimum 1 year; 7 years for regulated industries). Set up audit trail integrity verification. Configure alerts for unusual audit patterns. Create saved filter presets for common audit review scenarios.
Establish the trust score baseline. Run the trust score calculation and record your initial baseline across all 5 dimensions. Set target scores for 30, 60, and 90 days. Configure alerts for score drops exceeding 5 points.
Configure compliance exports. Set up the compliance export templates matching your regulatory requirements (SOC 2, HIPAA, GDPR, or custom). Schedule automatic monthly export generation. Designate a Compliance Officer to review exports and file them with your compliance documentation.
Train the team. Conduct training sessions for each role: Agent Managers on the approval process, Knowledge Editors on compliance tagging, Operators on escalation handling, and Analysts on dashboard interpretation. Document procedures in your internal runbook.
Run a governance dry run. Simulate a full governance cycle: create a test agent, push it through the knowledge pipeline, run evaluations, trigger the production gate, go through the approval chain, and deploy. Identify bottlenecks and process gaps.
Go live and monitor. Switch policies from "warn" to "block" mode. Monitor the Trust Overview dashboard daily for the first month. Hold weekly governance review meetings to address violations, tune policies, and adjust approval SLAs based on actual team capacity.
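
Before switching a policy from "warn" to "block", the two-week baseline from the policy configuration step needs to be turned into a false-positive rate. A minimal sketch, assuming each warn-mode event has been reviewed and labeled, and taking "false-positive rate" to mean the share of triggers a reviewer judged unjustified (both the event shape and that definition are assumptions):

```python
from collections import defaultdict


def false_positive_rates(warn_events):
    """Per-policy false-positive rate from reviewed warn-mode events.

    Each event is a dict with hypothetical keys:
      "policy"         - policy name
      "true_violation" - reviewer verdict (True if the warning was justified)
    """
    totals = defaultdict(int)
    false_positives = defaultdict(int)
    for event in warn_events:
        totals[event["policy"]] += 1
        if not event["true_violation"]:
            false_positives[event["policy"]] += 1
    return {policy: false_positives[policy] / totals[policy] for policy in totals}


def ready_for_block_mode(warn_events, max_rate=0.05):
    """Policies whose false-positive rate is below the 5% success criterion."""
    rates = false_positive_rates(warn_events)
    return sorted(policy for policy, rate in rates.items() if rate < max_rate)
```

Policies that do not appear in the result are candidates for tuning during the weekly governance review rather than for an immediate switch to block mode.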

Success criteria: All 8 RBAC roles assigned with MFA, policies active in block mode with false-positive rate below 5%, approval chains configured with SLAs, trust score of 75 or higher (Strong posture), and compliance export generated successfully.
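
The trust score baseline step calls for alerts on drops exceeding 5 points. A minimal sketch of that comparison, assuming scores are on a 0-100 scale and that the alert applies per dimension (the playbook does not say whether the threshold is per dimension or overall, and the dimension names used in the example are hypothetical):

```python
def score_drop_alerts(baseline, current, max_drop=5.0):
    """Return {dimension: drop} for dimensions that fell by more than max_drop.

    `baseline` and `current` map dimension names to scores on a 0-100 scale.
    """
    return {
        dimension: baseline[dimension] - current[dimension]
        for dimension in baseline
        if baseline[dimension] - current[dimension] > max_drop
    }


if __name__ == "__main__":
    baseline = {"safety": 82, "compliance": 78}
    current = {"safety": 75, "compliance": 74}
    print(score_drop_alerts(baseline, current))  # {'safety': 7}
```

A drop of exactly 5 points does not alert here, matching the wording "exceeding 5 points".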

Playbook 4: Enterprise Rollout

Timeline: 8–12 weeks
Team: Platform Owner + all roles
Difficulty: Advanced

This playbook guides a phased enterprise rollout from pilot to full production across multiple departments and agent types.

Executive alignment and success criteria. Secure executive sponsorship by defining clear success metrics: target number of agents, expected deflection rate, cost savings projections, and compliance requirements. Set quarterly OKRs for the AI operations program. Document the business case and expected ROI timeline.
Pilot department selection. Choose a pilot department with high-volume, well-documented processes (typically customer support or IT help desk). The pilot should have a willing champion, existing knowledge base, measurable outcomes, and tolerance for iteration. Avoid starting with regulated departments.
Complete prerequisites. Before the pilot, complete three foundational playbooks: Knowledge Pipeline Setup, Governance Implementation, and First Agent Deployment. These establish the infrastructure, processes, and team skills needed for a successful rollout.
Pilot deployment (Weeks 1–4). Deploy the first agent to the pilot department with a canary rollout. Start with 10% of incoming queries, increasing to 50% after 1 week if metrics are stable. Assign dedicated Operators to monitor conversations, handle escalations, and provide daily feedback summaries.
Pilot measurement and iteration (Weeks 3–6). Measure pilot results against success criteria: resolution rate, response quality scores, escalation rate, user satisfaction, and agent accuracy. Iterate on knowledge, prompts, and safety gates based on findings. Run weekly evaluation cycles against an expanded golden set.
Second wave planning (Week 4). Based on pilot learnings, select 2–3 additional departments for the second wave. Identify shared knowledge (company policies, HR info) that can be centralized, and department-specific knowledge that needs separate pipelines. Plan RBAC assignments for new team members.
Knowledge scaling (Weeks 5–7). Expand the knowledge pipeline to cover second-wave departments. Set up new sources, configure domain-specific PII patterns, and establish review workflows for each department's content. Run the Tenant Intelligence pipeline to build cross-department entity relationships.
Second wave deployment (Weeks 6–9). Deploy agents for the second wave departments using the same canary rollout pattern. Each agent follows the full deployment playbook: golden set creation, evaluation, safety gate configuration, production gate, and governance approval.
Workflow automation (Weeks 7–10). Once agents are handling conversations reliably, add workflow integrations. Start with read-only workflows (lookup a ticket, check order status) before enabling write operations (create ticket, update record). Follow the principle of least privilege for agent workflow permissions.
Cross-agent orchestration (Weeks 8–11). Configure agent handoff rules for queries that span departments. Set up the routing layer to direct queries to the most appropriate agent based on intent classification. Test cross-agent scenarios in the evaluation sandbox before enabling in production.
Full rollout preparation (Weeks 9–11). Compile rollout results into an executive report covering: agents deployed, queries handled, resolution rates, cost impact, compliance status, and trust scores. Get executive sign-off to proceed to full rollout. Train remaining departments' teams on their roles.
Full production rollout (Weeks 10–12). Deploy remaining agents across all planned departments. Increase traffic allocation to 100% for mature agents. Establish ongoing operations: weekly evaluation runs, monthly governance reviews, quarterly compliance exports, and continuous golden set expansion from production feedback.
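
The workflow automation step recommends starting with read-only workflows and applying least privilege. One way to picture this is a deny-by-default allowlist. The sketch below is an illustration only: the action names echo the examples in that step, and the grant structure is hypothetical rather than the platform's permission model.

```python
from enum import Enum


class Access(Enum):
    READ = "read"
    WRITE = "write"


# Hypothetical workflow actions and the access level each one needs.
ACTION_ACCESS = {
    "lookup_ticket": Access.READ,
    "check_order_status": Access.READ,
    "create_ticket": Access.WRITE,
    "update_record": Access.WRITE,
}


def read_only_grants(actions):
    """Filter a set of requested actions down to the read-only ones."""
    return {a for a in actions if ACTION_ACCESS.get(a) is Access.READ}


def is_allowed(agent_grants, agent, action):
    """Allow an action only if the agent was explicitly granted it.

    Unknown agents and unknown actions are denied by default.
    """
    return action in ACTION_ACCESS and action in agent_grants.get(agent, set())


if __name__ == "__main__":
    requested = {"lookup_ticket", "create_ticket"}
    grants = {"support_agent": read_only_grants(requested)}
    print(is_allowed(grants, "support_agent", "lookup_ticket"))  # True
    print(is_allowed(grants, "support_agent", "create_ticket"))  # False
```

Write actions would be added to an agent's grants explicitly, one at a time, once the read-only phase has proven reliable.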

Success criteria: All planned agents deployed and handling 100% of assigned query volume. Composite evaluation scores above 0.80 across all agents. Trust score at Strong (75+) or Exemplary (90+) posture. Executive report approved. Ongoing operations processes documented and staffed.
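
The measurable parts of these success criteria can be checked mechanically when compiling the executive report. A minimal sketch, assuming you have each agent's composite score and the tenant trust score on hand; only the Strong (75+) and Exemplary (90+) bands come from this playbook, and the label for lower scores is a placeholder.

```python
def trust_posture(score):
    """Map a 0-100 trust score to a posture label.

    Only Strong (75+) and Exemplary (90+) are defined in the playbook;
    "Below Strong" is a placeholder for everything else.
    """
    if score >= 90:
        return "Exemplary"
    if score >= 75:
        return "Strong"
    return "Below Strong"


def rollout_blockers(agent_scores, trust_score, min_composite=0.80):
    """List reasons the rollout does not yet meet the success criteria."""
    blockers = [
        f"{agent}: composite score {score:.2f} is not above {min_composite:.2f}"
        for agent, score in sorted(agent_scores.items())
        if score <= min_composite
    ]
    if trust_score < 75:
        blockers.append(f"trust score {trust_score} is below Strong (75+)")
    return blockers


if __name__ == "__main__":
    scores = {"support": 0.86, "it_helpdesk": 0.78}
    print(trust_posture(81))  # Strong
    print(rollout_blockers(scores, 81))
    # ['it_helpdesk: composite score 0.78 is not above 0.80']
```

An empty list means the quantitative criteria are met; the remaining criteria (report approval, staffing) still need human sign-off.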

Playbook 5: Compliance Audit Preparation

Timeline: 2–3 weeks before audit
Team: Compliance Officer + Governance Admin + Platform Owner
Difficulty: Intermediate

This playbook prepares your Operanix deployment for a SOC 2, HIPAA, or GDPR compliance audit.

Identify audit scope and framework. Confirm with your auditor which framework applies (SOC 2 Type II, HIPAA, GDPR, or custom). Determine the audit period (typically 6–12 months of evidence). Identify which Operanix components are in scope: agents, knowledge pipeline, governance, workflows, or all.
Run an audit trail integrity check. Navigate to Governance > Audit Timeline and run the integrity verification. Confirm the cryptographic hash chain is intact for the entire audit period. Document the verification result with timestamp and screenshot for the evidence package.
Generate compliance export packages. Go to Governance > Compliance Export and generate the appropriate report package (SOC 2, HIPAA, or GDPR). Review the generated evidence for completeness. Fill in any gaps by exporting additional audit trail segments or configuration snapshots.
Review RBAC configuration. Export the current RBAC role assignments. Verify that separation of duties is enforced. Check for any stale access (users who have left or changed roles). Document the role assignment rationale and last review date. Confirm MFA is enforced for all users.
Document data flows. Create a data flow diagram showing how data enters Operanix (knowledge sources), how it is processed (pipeline stages), where it is stored (databases, indices), how it is accessed (agent queries, API calls), and how it exits (workflow actions, exports). Annotate with encryption and access control details.
Review data retention compliance. Verify that data retention policies are configured and enforced. Check that no data exists beyond its retention period. Document retention settings for each data type. Confirm that deletion events are logged in the audit trail.
Compile safety gate evidence. Export safety gate configuration and trigger history for the audit period. Document each gate's purpose, configuration, and effectiveness metrics (trigger rate, false positive rate). Show examples of gates correctly blocking unsafe content.
Prepare incident response documentation. Compile any security incidents from the audit period with root cause analysis and remediation actions. If no incidents occurred, document the monitoring and detection controls that were in place. Include tabletop exercise results.
Review vendor and subprocessor documentation. Compile the list of third-party services used by Operanix (LLM providers, cloud infrastructure, integrations). Document data processing agreements, security certifications, and data residency for each. Ensure all subprocessors are disclosed per GDPR requirements.
Conduct a pre-audit self-assessment. Walk through the audit checklist with your team. For each control, verify evidence exists, is accurate, and is accessible. Identify any gaps and remediate before the auditor arrives. Flag items that need auditor discussion.
Prepare the evidence package. Organize all evidence into a structured package: executive summary, control descriptions, testing evidence, exception documentation, and remediation plans for any findings. Include cryptographic hashes for integrity verification.
Brief the team. Ensure team members who may be interviewed by the auditor understand their roles, the controls they operate, and where to find supporting evidence. Prepare a contact sheet with each team member's areas of responsibility.
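
When briefing the team, it can help to show what the integrity check on a cryptographic hash chain actually verifies. The sketch below illustrates the general technique with SHA-256: each entry's hash covers the previous entry's hash, so altering any entry breaks every link after it. The entry format is invented for illustration and is not the platform's internal audit format.

```python
import hashlib
import json


def entry_hash(prev_hash, payload):
    """SHA-256 over the previous hash plus a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def build_chain(payloads, genesis="0" * 64):
    """Link payloads into a chain of {"payload", "prev_hash", "hash"} entries."""
    chain, prev = [], genesis
    for payload in payloads:
        digest = entry_hash(prev, payload)
        chain.append({"payload": payload, "prev_hash": prev, "hash": digest})
        prev = digest
    return chain


def first_broken_index(chain, genesis="0" * 64):
    """Return the index of the first invalid entry, or None if intact."""
    prev = genesis
    for index, entry in enumerate(chain):
        expected = entry_hash(prev, entry["payload"])
        if entry["prev_hash"] != prev or entry["hash"] != expected:
            return index
        prev = entry["hash"]
    return None


if __name__ == "__main__":
    chain = build_chain([
        {"event": "role_assigned"},
        {"event": "agent_deployed"},
        {"event": "policy_changed"},
    ])
    print(first_broken_index(chain))  # None
    chain[1]["payload"]["event"] = "tampered"
    print(first_broken_index(chain))  # 1
```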

Success criteria: Audit trail integrity verified, compliance export generated and reviewed, RBAC documentation complete, data flow diagram current, all evidence organized in a structured package, and team briefed on their roles in the audit process.
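
The evidence package step asks for cryptographic hashes so the auditor can verify integrity. A minimal sketch of one way to produce and re-check such a manifest with SHA-256 from the Python standard library; the directory layout and function names are illustrative assumptions, not a platform feature.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=65536):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(package_dir):
    """Map each file's path (relative to the package) to its SHA-256 digest."""
    root = Path(package_dir)
    return {
        path.relative_to(root).as_posix(): sha256_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(package_dir, manifest):
    """Return paths that were added, removed, or modified since the manifest."""
    current = build_manifest(package_dir)
    return sorted(
        path
        for path in set(manifest) | set(current)
        if manifest.get(path) != current.get(path)
    )
```

Store the manifest outside the package directory (otherwise it shows up as an added file on re-check) and hand it to the auditor alongside the evidence.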

Playbook 6: Agent Quality Improvement

Timeline: Ongoing (2-week improvement cycles)
Team: Agent Manager + Knowledge Editor
Difficulty: Intermediate

This playbook establishes a continuous improvement cycle for agents already in production. Use it when evaluation scores plateau, user feedback declines, or you want to systematically improve agent performance.

Baseline current performance. Run a full evaluation and record scores across all 7 dimensions. Export the evaluation report as your improvement baseline. Note the specific cases where the agent scored lowest and the dimensions with the most room for improvement.
Analyze production feedback. Go to Agent Readiness > Feedback Loop and review the last 30 days of user feedback. Cluster negative feedback by topic: incorrect answers, missing information, wrong tone, too verbose, failed to escalate, or PII exposure. Identify the top 3 failure categories by volume.
Investigate root causes. For each top failure category, trace the issue to its root cause. Common root causes and their fixes:
- Incorrect answers → Knowledge gap (add missing content) or retrieval failure (tune chunk size/embedding model)
- Missing information → Source not crawled (add source) or content not approved (check review queue)
- Wrong tone → System prompt needs refinement (update persona description)
- Too verbose → Prompt instruction adjustment (add conciseness directive)
- Failed escalation → Escalation triggers misconfigured (expand trigger conditions)
Expand the golden set. Add 10–15 new test cases based on production failures. Convert high-confidence negative feedback into golden set entries. Ensure new cases cover the identified failure categories. Adjust the difficulty distribution to match where the agent struggles, for example by adding more hard cases if it fails on complex queries.
Apply knowledge improvements. If root cause analysis points to knowledge gaps: add new sources, update stale content, or improve extraction quality. Run the knowledge pipeline for changed sources. Verify retrieval metrics after the update (Recall@5, MRR). Publish through the standard approval chain.
Tune the agent configuration. If root cause analysis points to agent behavior: refine the system prompt, adjust temperature settings, update citation format, or modify response length constraints. Make one change at a time to isolate the impact of each adjustment.
Run A/B evaluation. Use Agent Readiness > A/B Comparisons to test the improved agent against the baseline. Run both versions against the same golden set (including new cases). Compare per-dimension scores and identify any regressions introduced by the changes.
Review hallucination report. Check the Hallucination Report for any new hallucination patterns. Trace each hallucination to its cause: missing knowledge (add content), conflicting sources (resolve contradictions in the knowledge base), or retrieval failure (tune retrieval parameters).
Pass the production gate. If the A/B comparison shows improvement without regressions, run a full evaluation against the production gate thresholds. Complete the manual review checkpoints. Submit for governance approval through the standard approval chain.
Deploy with canary. Deploy the improved agent version with a canary rollout (10% traffic). Monitor trust score, safety gate triggers, and user feedback for 24–48 hours. Compare metrics against the previous version. If stable, promote to 100%.
Document and share learnings. Record what was changed, why, and what impact it had on evaluation scores. Share findings with other Agent Managers so improvements can be applied across the agent fleet. Update internal best practices documentation.
Schedule the next improvement cycle. Set a calendar reminder for the next 2-week improvement cycle. Over time, the feedback loop will surface fewer issues as agent quality improves. When improvement plateaus, shift focus to expanding agent capabilities rather than fixing quality issues.
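
The A/B evaluation step compares per-dimension scores and looks for regressions. A minimal sketch of that comparison, assuming both runs produce a mapping from dimension name to a 0.0-1.0 score (the dimension names in the example are placeholders; the playbook does not list all 7):

```python
def compare_versions(baseline, candidate, tolerance=0.0):
    """Split per-dimension score changes into improvements and regressions.

    Both arguments map dimension names to scores on a 0.0-1.0 scale.
    Changes within +/- tolerance are treated as noise and ignored.
    """
    improvements, regressions = {}, {}
    for dimension, base_score in baseline.items():
        delta = round(candidate[dimension] - base_score, 4)
        if delta > tolerance:
            improvements[dimension] = delta
        elif delta < -tolerance:
            regressions[dimension] = delta
    return improvements, regressions


if __name__ == "__main__":
    baseline = {"accuracy": 0.78, "groundedness": 0.85, "safety": 0.93}
    candidate = {"accuracy": 0.84, "groundedness": 0.83, "safety": 0.93}
    improved, regressed = compare_versions(baseline, candidate)
    print(improved)   # {'accuracy': 0.06}
    print(regressed)  # {'groundedness': -0.02}
```

In this example the candidate would not proceed to the production gate, because the accuracy gain comes with a groundedness regression.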

Success criteria: Composite evaluation score improved by at least 0.03 over baseline, top failure category reduced by 50% or more, no regressions in safety or groundedness dimensions, expanded golden set covers identified failure patterns, and canary deployment completed successfully.
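
The quantitative success criteria above can be expressed as a single check at the end of each cycle. This is a sketch under stated assumptions: failure counts are raw feedback counts for the top category before and after the cycle, and `regressions` is a mapping of dimension name to negative score change, such as the output of your own A/B comparison.

```python
def cycle_succeeded(
    baseline_composite,
    new_composite,
    top_failures_before,
    top_failures_after,
    regressions,
):
    """Check the measurable success criteria for one improvement cycle.

    `regressions` maps dimension names to negative score deltas.
    """
    # Composite score improved by at least 0.03 (rounded to avoid float noise).
    improved = round(new_composite - baseline_composite, 4) >= 0.03
    # Top failure category reduced by 50% or more.
    failures_halved = top_failures_after <= 0.5 * top_failures_before
    # No regressions in the safety or groundedness dimensions.
    protected_ok = not ({"safety", "groundedness"} & set(regressions))
    return improved and failures_halved and protected_ok
```

For example, `cycle_succeeded(0.80, 0.84, 40, 18, {})` passes, while the same numbers with `{"safety": -0.01}` fail because of the safety regression. The golden set coverage and canary criteria still need manual confirmation.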

Choosing the Right Playbook

Scenario	Start With
Brand new to Operanix	First Agent Deployment → Knowledge Pipeline Setup → Governance Implementation
Scaling beyond pilot	Enterprise Rollout
Upcoming compliance audit	Compliance Audit Preparation
Agent quality needs improvement	Agent Quality Improvement
Setting up for a new department	Knowledge Pipeline Setup → First Agent Deployment