Knowledge Operations Guide

The Operanix Knowledge Operations pipeline transforms raw enterprise data into verified, agent-ready knowledge. This guide covers the seven-step pipeline, the two-layer RAG architecture, the compliance gate, and the Tenant Intelligence system.

Knowledge Operations is the foundation of every Operanix agent. Agents can only answer questions grounded in knowledge that has passed through this pipeline, ensuring accuracy and compliance at every step.

Pipeline Overview

The knowledge pipeline processes enterprise data through seven sequential stages, each with built-in quality gates and compliance checks. Data enters as raw sources and exits as verified, indexed knowledge ready for agent consumption.

Step | Stage | Purpose | Output
1 | Sources | Connect and configure data sources | Raw content streams
2 | Review | Human review of extracted content | Approved documents
3 | Agent Coverage | Map knowledge to agent domains | Coverage assignments
4 | Training & Eval | Fine-tune retrieval and validate quality | Trained embeddings
5 | Tenant Intelligence | 12-stage deep enrichment pipeline | Enriched knowledge graph
6 | Publish | Deploy to production with compliance sign-off | Live knowledge base
7 | Pipeline Runs | Monitor execution, logs, and health | Audit trail

Step 1: Sources

The Sources tab is where you connect enterprise data to the knowledge pipeline. Operanix supports a wide range of source types, each with configurable crawl schedules and extraction settings.

Supported Source Types

Crawl Configuration

Scheduling

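Each source can be configured with a crawl schedule: hourly, daily, weekly, or a custom cron expression. The pipeline tracks content hashes to skip unchanged documents, minimizing compute and API costs. The example below crawls docs.example.com daily at 02:00 (cron "0 2 * * *") to a link depth of 3, limited to the /docs/ and /api/ paths and capped at 500 pages.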

{
  "source": "web_crawl",
  "url": "https://docs.example.com",
  "schedule": "0 2 * * *",
  "depth": 3,
  "include_patterns": ["/docs/*", "/api/*"],
  "exclude_patterns": ["/blog/*", "/changelog/*"],
  "respect_robots": true,
  "max_pages": 500
}
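The hash-based skip described under Scheduling can be sketched as follows. This is a minimal illustration under stated assumptions, not the Operanix implementation: it assumes the pipeline keeps a map from document URL to the SHA-256 hash of the content seen on the previous crawl, and the names shouldReprocess and lastHashes are hypothetical.

```javascript
// Sketch (hypothetical names): skip re-processing when a crawled document's
// SHA-256 content hash matches the hash recorded on the previous crawl.
const crypto = require("crypto");

function sha256(text) {
  return crypto.createHash("sha256").update(text, "utf8").digest("hex");
}

// lastHashes: Map of url -> hex hash from the previous crawl (assumed storage).
function shouldReprocess(url, content, lastHashes) {
  const hash = sha256(content);
  if (lastHashes.get(url) === hash) {
    return false; // unchanged since the last crawl: skip extraction and embedding
  }
  lastHashes.set(url, hash); // remember the new hash for the next crawl
  return true;
}

const lastHashes = new Map();
const url = "https://docs.example.com/docs/intro";
console.log(shouldReprocess(url, "Hello", lastHashes));  // true  (first crawl)
console.log(shouldReprocess(url, "Hello", lastHashes));  // false (unchanged)
console.log(shouldReprocess(url, "Hello!", lastHashes)); // true  (content changed)
```

This sketch records the new hash before the document is processed; a production pipeline would more likely record it only after extraction succeeds, so a failed run is retried on the next crawl.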

Step 2: Review

Every piece of extracted content enters the Review queue before it can proceed through the pipeline. This human-in-the-loop stage ensures that only relevant, accurate, and appropriate content reaches your agents.

Review Workflow

Content that contains detected PII is held in the review queue and cannot be published until the PII is redacted or an authorized compliance officer approves the exception.
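The PII hold above reduces to a small publish predicate. A minimal sketch, assuming each review item carries hypothetical piiDetected, piiRedacted, and exception fields; it covers only the PII rule, not the rest of the review decision.

```javascript
// Sketch (hypothetical fields): an item with detected PII may be published only
// after the PII is redacted or a compliance officer approves an exception.
function canPublish(item) {
  if (!item.piiDetected) return true; // no PII: the PII hold does not apply
  if (item.piiRedacted) return true;  // PII removed: hold released
  const exception = item.exception;
  return Boolean(
    exception &&
      exception.status === "approved" &&
      exception.approverRole === "compliance_officer"
  );
}

console.log(canPublish({ piiDetected: false }));                   // true
console.log(canPublish({ piiDetected: true }));                    // false
console.log(canPublish({ piiDetected: true, piiRedacted: true })); // true
console.log(
  canPublish({
    piiDetected: true,
    exception: { status: "approved", approverRole: "compliance_officer" },
  })
); // true
```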

Step 3: Agent Coverage

After review, approved content must be mapped to one or more agents. The Agent Coverage tab provides a matrix view showing which knowledge domains are assigned to which agents.

Coverage Matrix

The coverage matrix displays agents on one axis and knowledge domains on the other. Each cell shows a coverage status:

Auto-Assignment

Enable auto-assignment to let Operanix automatically map new knowledge to agents based on their configured specialization domains. Auto-assigned content still appears in the coverage dashboard for manual review.
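One way to picture auto-assignment is as an overlap test between a knowledge item's domain tags and each agent's specialization domains. The data model below (domains arrays, a source: "auto" marker) is a hypothetical sketch, not the Operanix matching logic, which may weigh domains differently.

```javascript
// Sketch (hypothetical data model): assign a new knowledge item to every agent
// whose configured specialization domains overlap the item's domain tags.
function autoAssign(item, agents) {
  const assignments = [];
  for (const agent of agents) {
    const shared = item.domains.filter((d) => agent.domains.includes(d));
    if (shared.length > 0) {
      // Marked "auto" so the assignment still appears in the coverage dashboard for review
      assignments.push({ agentId: agent.id, itemId: item.id, domains: shared, source: "auto" });
    }
  }
  return assignments;
}

const agents = [
  { id: "billing-agent", domains: ["billing", "pricing"] },
  { id: "support-agent", domains: ["troubleshooting", "integrations"] },
];
const item = { id: "doc-42", domains: ["pricing", "integrations"] };
console.log(autoAssign(item, agents).map((a) => a.agentId).join(", ")); // billing-agent, support-agent
```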

Step 4: Training & Evaluation

Once knowledge is mapped to agents, the Training & Eval stage validates that agents can actually retrieve and use the knowledge correctly.

Retrieval Training

Evaluation Metrics

Metric | Target | Description
Recall@5 | ≥ 0.90 | Fraction of evaluation queries whose correct chunk appears in the top 5 results
MRR | ≥ 0.80 | Mean reciprocal rank of the correct chunk across evaluation queries
Latency P95 | ≤ 200 ms | 95th-percentile retrieval time
Chunk relevance | ≥ 0.85 | LLM-judged relevance of the top chunk to the query
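The first three metrics in the table can be computed from an evaluation run as sketched below. The sketch assumes each evaluation query has exactly one correct chunk ID and uses hypothetical field names (correctId, ranked); the nearest-rank percentile is one of several common percentile definitions, and the Operanix evaluator may use another.

```javascript
// Sketch: Recall@k, MRR, and P95 latency over an evaluation set.
// Assumes one correct chunk per query; field names are hypothetical.
function recallAtK(results, k) {
  const hits = results.filter((r) => r.ranked.slice(0, k).includes(r.correctId)).length;
  return hits / results.length;
}

function meanReciprocalRank(results) {
  const total = results.reduce((sum, r) => {
    const rank = r.ranked.indexOf(r.correctId); // 0-based; -1 if never retrieved
    return sum + (rank === -1 ? 0 : 1 / (rank + 1));
  }, 0);
  return total / results.length;
}

// Nearest-rank percentile: smallest sample with at least p% of samples at or below it
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const index = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, index)];
}

const results = [
  { correctId: "c1", ranked: ["c1", "c7", "c3"] },                   // rank 1
  { correctId: "c2", ranked: ["c9", "c2", "c4"] },                   // rank 2
  { correctId: "c5", ranked: ["c8", "c6", "c0", "c1", "c3", "c5"] }, // rank 6
  { correctId: "c4", ranked: ["c3", "c1"] },                         // not retrieved
];
console.log(recallAtK(results, 5));       // 0.5
console.log(meanReciprocalRank(results)); // (1 + 1/2 + 1/6 + 0) / 4 ≈ 0.4167
console.log(percentile([120, 80, 150, 210, 95], 95)); // 210 (milliseconds)
```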

Step 5: Tenant Intelligence

The Tenant Intelligence pipeline is a 12-stage deep enrichment process that transforms raw knowledge into a richly connected knowledge graph. This is the most compute-intensive stage and runs asynchronously.

12-Stage Pipeline

Stages 1–4: Extraction

  • Stage 1: Entity extraction — Identifies products, features, people, organizations, dates, and domain-specific entities using NER models tuned to your industry.
  • Stage 2: Relationship mapping — Detects relationships between entities (e.g., "Product X integrates with Service Y") and builds an entity graph.
  • Stage 3: Topic clustering — Groups related chunks into coherent topics using hierarchical clustering. Topics become navigable categories in the knowledge base.
  • Stage 4: Sentiment & intent analysis — Tags content with sentiment polarity and detected user intent (informational, transactional, navigational).
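The entity graph produced by Stage 2 can be pictured as an adjacency list built from (subject, relation, object) triples. The triple format and relation names below are assumptions for illustration; the source only states that detected relationships are assembled into an entity graph.

```javascript
// Sketch: build an adjacency-list entity graph from (subject, relation, object)
// triples like those emitted by relationship mapping. Triple format is assumed.
function buildEntityGraph(triples) {
  const graph = new Map(); // entity -> array of { relation, target }
  for (const [subject, relation, object] of triples) {
    if (!graph.has(subject)) graph.set(subject, []);
    if (!graph.has(object)) graph.set(object, []); // register targets as nodes too
    graph.get(subject).push({ relation, target: object });
  }
  return graph;
}

const graph = buildEntityGraph([
  ["Product X", "integrates_with", "Service Y"],
  ["Product X", "has_feature", "SSO"],
]);
console.log(graph.get("Product X").map((e) => `${e.relation} -> ${e.target}`).join("; "));
// integrates_with -> Service Y; has_feature -> SSO
```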

Stages 5–8: Enrichment

  • Stage 5: Gap detection — Identifies topics mentioned but not fully covered. Generates gap reports with suggested content to author.
  • Stage 6: Contradiction detection — Cross-references facts across documents to find conflicting statements (e.g., different pricing on two pages).
  • Stage 7: Freshness scoring — Assigns decay scores based on content age, update frequency, and domain volatility. Stale content is flagged for re-crawl or manual update.
  • Stage 8: Cross-reference linking — Creates bidirectional links between related chunks, enabling agents to follow context chains when answering complex queries.
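Stage 6 can be approximated by grouping extracted facts on (entity, attribute) and flagging any group that holds more than one distinct value, which catches the "different pricing on two pages" case. The fact shape here is hypothetical, and a real detector would also need to normalize values (for example, "$49" vs "49 USD") before comparing them.

```javascript
// Sketch: flag contradictions by grouping facts on (entity, attribute) and
// reporting groups with more than one distinct value. Fact shape is assumed.
function findContradictions(facts) {
  const groups = new Map();
  for (const fact of facts) {
    const key = `${fact.entity}::${fact.attribute}`;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(fact);
  }
  const conflicts = [];
  for (const group of groups.values()) {
    const distinctValues = new Set(group.map((f) => f.value));
    if (distinctValues.size > 1) {
      conflicts.push({
        entity: group[0].entity,
        attribute: group[0].attribute,
        values: [...distinctValues],
        sources: group.map((f) => f.source), // pages to show in the conflict report
      });
    }
  }
  return conflicts;
}

const conflicts = findContradictions([
  { entity: "Pro plan", attribute: "monthly_price", value: "$49", source: "/pricing" },
  { entity: "Pro plan", attribute: "monthly_price", value: "$59", source: "/docs/plans" },
  { entity: "Pro plan", attribute: "seat_limit", value: "25", source: "/pricing" },
]);
console.log(conflicts.length);                 // 1
console.log(conflicts[0].values.join(" vs ")); // $49 vs $59
```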

Stages 9–12: Quality & Compliance

  • Stage 9: Deduplication — Deterministic deduplication using content hashing (SHA-256) and semantic similarity. Near-duplicates (similarity > 0.92) are merged, keeping the version with the higher quality score; the more recent version wins ties.
  • Stage 10: Compliance classification — Automated classification against configured compliance frameworks (SOC 2, HIPAA, GDPR, PCI-DSS). Content that triggers compliance rules is routed to the compliance gate.
  • Stage 11: Quality scoring — Each chunk receives a composite quality score based on completeness, clarity, accuracy confidence, and source authority.
  • Stage 12: Index promotion — Final stage packages the enriched knowledge graph and promotes it to the production index with a versioned snapshot for rollback.
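The composite quality score from Stage 11 can be sketched as a weighted mean of the four named factors. The source does not specify the formula: the equal weights and the assumption that each factor is normalized to [0, 1] are hypothetical choices for illustration.

```javascript
// Sketch: composite quality score as a weighted mean of the four Stage 11 factors.
// Assumes each factor is normalized to [0, 1]; the equal weights are hypothetical.
const QUALITY_WEIGHTS = {
  completeness: 0.25,
  clarity: 0.25,
  accuracyConfidence: 0.25,
  sourceAuthority: 0.25,
};

function qualityScore(factors, weights = QUALITY_WEIGHTS) {
  let weighted = 0;
  let totalWeight = 0;
  for (const [name, weight] of Object.entries(weights)) {
    const value = factors[name];
    if (typeof value !== "number" || value < 0 || value > 1) {
      throw new RangeError(`factor ${name} must be a number in [0, 1]`);
    }
    weighted += weight * value;
    totalWeight += weight;
  }
  return weighted / totalWeight; // dividing by the total keeps unnormalized weights valid
}

console.log(
  qualityScore({ completeness: 0.9, clarity: 0.8, accuracyConfidence: 0.7, sourceAuthority: 1.0 })
); // ≈ 0.85
```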

Two-Layer RAG Architecture

Operanix uses a two-layer retrieval-augmented generation (RAG) architecture that combines structured entity retrieval with document chunk retrieval to improve answer accuracy.

Layer 1: Structured Entities

The first retrieval layer queries the entity graph built during Tenant Intelligence. When an agent receives a question, it first identifies relevant entities (products, features, policies) and retrieves their structured attributes and relationships. This layer provides precise, factual answers for entity-centric queries.
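A Layer 1 lookup can be sketched as matching entity names against the query and returning their structured attributes and relationships. Plain substring matching on entity names stands in here for the query-analysis step and is a simplifying assumption, as is the shape of the entity store.

```javascript
// Sketch: Layer 1 entity lookup. Finds entities whose names appear in the query
// and returns their attributes and relationships. Substring matching and the
// entity store shape are simplifying assumptions.
function entityLookup(query, entityStore) {
  const q = query.toLowerCase();
  return entityStore
    .filter((entity) => q.includes(entity.name.toLowerCase()))
    .map(({ name, attributes, relations }) => ({ name, attributes, relations }));
}

const entityStore = [
  {
    name: "Product X",
    attributes: { plan: "Pro", monthlyPrice: "$49" },
    relations: [["integrates_with", "Service Y"]],
  },
  { name: "Service Y", attributes: { category: "messaging" }, relations: [] },
];
const hits = entityLookup("Does Product X support SSO?", entityStore);
console.log(hits.map((e) => e.name).join(", ")); // Product X
console.log(hits[0].attributes.monthlyPrice);    // $49
```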

Layer 2: Document Chunks

The second layer performs traditional vector similarity search against the chunk index. Results from both layers are merged, re-ranked using a cross-encoder model, and passed to the LLM with source attribution metadata.
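The merge-and-re-rank step can be sketched as: combine both layers' candidates, drop duplicates by ID, score each candidate against the query, and keep the top results with their sources. The cross-encoder is passed in as a function because the actual model is not specified; the toy word-overlap scorer below is only a stand-in so the example runs, and all names are hypothetical.

```javascript
// Sketch: merge Layer 1 (entity) and Layer 2 (chunk) candidates, de-duplicate by
// ID, re-rank with a scoring function, and keep source attribution for citations.
// crossEncoderScore(query, text) stands in for a real cross-encoder (hypothetical).
function mergeAndRerank(query, entityResults, chunkResults, crossEncoderScore, topK = 5) {
  const byId = new Map();
  for (const r of [...entityResults, ...chunkResults]) {
    if (!byId.has(r.id)) byId.set(r.id, r); // first occurrence wins
  }
  return [...byId.values()]
    .map((r) => ({ ...r, score: crossEncoderScore(query, r.text) })) // keeps r.source
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

// Toy scorer for illustration only: counts query words that occur in the text.
const toyScore = (query, text) =>
  query.toLowerCase().split(/\s+/).filter((w) => text.toLowerCase().includes(w)).length;

const ranked = mergeAndRerank(
  "product x pricing",
  [{ id: "e1", text: "Product X: pricing $49/month", source: "entity-graph" }],
  [
    { id: "c9", text: "Product X overview", source: "docs/overview" },
    { id: "e1", text: "Product X: pricing $49/month", source: "entity-graph" },
  ],
  toyScore
);
console.log(ranked.map((r) => `${r.id}:${r.source}`).join(", ")); // e1:entity-graph, c9:docs/overview
```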

The two-layer approach improves answer accuracy by 23% on entity-centric queries compared to chunk-only RAG, while maintaining equivalent performance on open-ended questions.

Retrieval Flow

User Query
  |
  v
[Query Analysis] -- extract entities, intent, keywords
  |
  +--> [Layer 1: Entity Graph] -- structured lookup
  |         |
  +--> [Layer 2: Vector Search] -- semantic similarity
  |         |
  v         v
[Merge & Re-rank] -- cross-encoder scoring
  |
  v
[Compliance Filter] -- remove restricted content
  |
  v
[LLM Generation] -- grounded response with citations

Compliance Gate

The compliance gate is a mandatory checkpoint that sits between the knowledge pipeline and production deployment. No knowledge reaches agents without passing through this gate.

Gate Checks

The compliance gate cannot be bypassed. Even admin users must go through the gate. All gate decisions are logged to the immutable audit trail with the reviewer's identity and timestamp.
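To illustrate what "immutable" can mean for gate decisions, the sketch below chains each audit entry to the SHA-256 hash of the previous one, so altering any past entry breaks verification. Hash chaining is one common tamper-evidence technique and is an assumption here, not a description of how the Operanix audit trail is stored; the AuditLog class and its methods are hypothetical.

```javascript
// Sketch (assumed technique): tamper-evident audit log using a SHA-256 hash chain.
// Each entry stores the previous entry's hash; editing any entry breaks verify().
const crypto = require("crypto");

class AuditLog {
  constructor() {
    this.entries = [];
  }

  static hashEntry({ prevHash, reviewer, decision, itemId, timestamp }) {
    return crypto
      .createHash("sha256")
      .update(JSON.stringify([prevHash, reviewer, decision, itemId, timestamp]))
      .digest("hex");
  }

  record(reviewer, decision, itemId, timestamp = new Date().toISOString()) {
    const last = this.entries[this.entries.length - 1];
    const entry = { prevHash: last ? last.hash : "GENESIS", reviewer, decision, itemId, timestamp };
    entry.hash = AuditLog.hashEntry(entry);
    this.entries.push(Object.freeze(entry)); // reviewer identity and timestamp are part of the hash
    return entry;
  }

  verify() {
    let prevHash = "GENESIS";
    for (const entry of this.entries) {
      if (entry.prevHash !== prevHash || AuditLog.hashEntry(entry) !== entry.hash) return false;
      prevHash = entry.hash;
    }
    return true;
  }
}

const trail = new AuditLog();
trail.record("alice@example.com", "approved", "doc-42", "2024-01-01T00:00:00Z");
trail.record("bob@example.com", "rejected", "doc-43", "2024-01-02T00:00:00Z");
console.log(trail.verify()); // true
```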

Deterministic Deduplication

Operanix employs a two-phase deduplication strategy to prevent duplicate knowledge from reaching agents:

Phase 1: Hash-Based (Exact Match)

Every ingested document and chunk is assigned a SHA-256 content hash. Before insertion, the hash is checked against the existing index. Exact matches are skipped immediately, before any embedding comparison is run.

Phase 2: Semantic Similarity (Near-Duplicate)

For content that passes hash deduplication, a fast embedding comparison identifies near-duplicates. Content pairs with cosine similarity above 0.92 are flagged. The system keeps the version with the higher quality score, using the more recent timestamp to break ties, and writes a merge record to the audit trail.

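// Deduplication decision logic: compare a candidate chunk with an existing one.
// The caller applies the result and writes a merge record to the audit trail on "replace".
const NEAR_DUPLICATE_THRESHOLD = 0.92;

const cosineSimilarity = (a, b) => {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const norm = (v) => Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return dot / (norm(a) * norm(b));
};

function dedupDecision(candidate, existing) {
  // Phase 1: exact match on the SHA-256 content hash
  if (candidate.contentHash === existing.contentHash) return { action: "skip", reason: "exact_duplicate" };
  // Phase 2: near-duplicate; keep the higher quality score, break ties by recency
  if (cosineSimilarity(candidate.embedding, existing.embedding) > NEAR_DUPLICATE_THRESHOLD) {
    const q = candidate.qualityScore - existing.qualityScore;
    if (q > 0 || (q === 0 && candidate.updatedAt > existing.updatedAt)) {
      return { action: "replace", reason: q > 0 ? "higher_quality" : "more_recent" };
    }
    return { action: "skip", reason: "near_duplicate_lower_quality" };
  }
  return { action: "insert" };
}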

Step 6: Publish

The Publish stage deploys reviewed, enriched, and compliance-cleared knowledge to the production environment.
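Stage 12 promotes each build as a versioned snapshot so a release can be rolled back, and the pipeline table lists compliance sign-off as part of Publish. A minimal in-memory sketch of both ideas follows; SnapshotRegistry, publish, and rollback are hypothetical names, and a real deployment would persist snapshots and switch an index alias rather than hold chunks in memory.

```javascript
// Sketch (hypothetical API): keep every published snapshot plus a pointer to the
// live version, so publishing and rollback are pointer moves, not rebuilds.
class SnapshotRegistry {
  constructor() {
    this.snapshots = new Map(); // version -> shallow-frozen snapshot object
    this.history = [];          // versions in publish order
    this.liveVersion = null;
  }

  publish(version, chunks, signedOffBy) {
    if (this.snapshots.has(version)) throw new Error(`version ${version} already exists`);
    if (!signedOffBy) throw new Error("compliance sign-off is required to publish");
    this.snapshots.set(version, Object.freeze({ version, chunks: [...chunks], signedOffBy }));
    this.history.push(version);
    this.liveVersion = version;
  }

  rollback() {
    const index = this.history.indexOf(this.liveVersion);
    if (index <= 0) throw new Error("no earlier snapshot to roll back to");
    this.liveVersion = this.history[index - 1];
    return this.liveVersion;
  }

  live() {
    return this.snapshots.get(this.liveVersion);
  }
}

const registry = new SnapshotRegistry();
registry.publish("v1", ["chunk-a"], "compliance@example.com");
registry.publish("v2", ["chunk-a", "chunk-b"], "compliance@example.com");
console.log(registry.rollback());           // v1
console.log(registry.live().chunks.length); // 1
```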

Step 7: Pipeline Runs

The Pipeline Runs tab provides full observability into every execution of the knowledge pipeline.

Run Dashboard

Pipeline runs are integrated with the Operanix audit trail. Every run, stage execution, approval, and publish event is recorded with full traceability for compliance audits.

Best Practices